ML-As-3

Problem 1: Hard-margin SVM. (18 pts)

You are given the following two sets of data points, each belonging to one of the two classes (class 1 and class -1):

  • Class 1 (labeled as $+1$): $(1,2), (2,3)$
  • Class $-1$ (labeled as $-1$): $(2,1), (3,2)$

Please find the optimal separating hyperplane using a linear SVM and derive the equation of the hyperplane. Assume the hard-margin SVM.

1. Write down the formulation of SVM, including the separating hyperplane, the constraints and the final optimization problem with parameters. (4 pts)

The hyperplane is defined through $w$ and $b$ as a set of points such that

$$H = \{x \mid w^T x + b = 0\}$$

  • $w = (w_1, w_2, \ldots, w_n)$: weight vector
  • $b$: scalar bias

subject to the constraints

$$y_i (w^T x_i + b) \geq 1, \quad \forall i$$

  • $y_i$ is the class label of $x_i$

Final optimization problem:

$$\min_{w,b} \ \frac{1}{2} \|w\|^2$$

2. Write down the Lagrangian form for this problem using the parameters and Lagrange multipliers. Please also write out its dual form. (10 pts)

The Lagrangian form:

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$$

where $\alpha_i \geq 0$ are the Lagrange multipliers associated with each constraint.

The dual form of the optimization problem:

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

subject to

$$\sum_{i=1}^{n} \alpha_i y_i = 0 \quad \text{and} \quad \alpha_i \geq 0 \ \forall i$$
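As a sanity check, this dual can be solved numerically. Below is a minimal sketch using NumPy and SciPy's SLSQP solver on the Problem 1 data; the variable names are illustrative and not part of the assignment. For this dataset the optimal $w$ is unique even though the $\alpha$'s need not be.

```python
# Minimal numerical check of the hard-margin dual (illustrative names).
import numpy as np
from scipy.optimize import minimize

X = np.array([[1, 2], [2, 3], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

# Q_ij = y_i y_j x_i^T x_j, so the dual objective is sum(a) - 0.5 a^T Q a.
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(a):
    return 0.5 * a @ Q @ a - a.sum()  # negate to maximize via a minimizer

cons = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bounds = [(0, None)] * len(y)                  # alpha_i >= 0
res = minimize(neg_dual, x0=np.ones(4), bounds=bounds, constraints=cons)

alpha = res.x
w = (alpha * y) @ X
print(alpha)  # the alphas need not be unique for this dataset...
print(w)      # ...but w is, and should come out near (-1, 1)
```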

3. Assume that the Lagrange multipliers $\alpha_i$ are all 0.5 and that the point $(1,2)$ is a support vector, for ease of calculation. Please calculate the values of the weight vector $w$ and bias $b$. Write out the explicit form of the hyperplane. (4 pts)

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i$$

Plugging in the given values:

$$w = 0.5 \times 1 \times (1,2) + 0.5 \times 1 \times (2,3) + 0.5 \times (-1) \times (2,1) + 0.5 \times (-1) \times (3,2) = (-1, 1)$$

$$b = y_j - \sum_{i=1}^{m} \alpha_i y_i x_i^T x_j$$

Since the support vector is $x_j = (1,2)$, we have $y_j = 1$, so

$$b = 1 - \left( 0.5 \times 1 \times (1,2)^T (1,2) + 0.5 \times 1 \times (2,3)^T (1,2) + 0.5 \times (-1) \times (2,1)^T (1,2) + 0.5 \times (-1) \times (3,2)^T (1,2) \right) = 1 - (2.5 + 4 - 2 - 3.5) = 0$$

The explicit form of the hyperplane:

$$H = \{x \mid w^T x + b = 0\} = \{x \mid (-1,1)^T x = 0\}$$

i.e., $-x_1 + x_2 = 0$.
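The arithmetic above is easy to verify in a few lines of NumPy; this is only a check of the hand calculation, using the assumed $\alpha_i = 0.5$ and $x_j = (1,2)$ as the support vector.

```python
# Recompute w and b from the assumed multipliers (all alpha_i = 0.5).
import numpy as np

X = np.array([[1, 2], [2, 3], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)
alpha = np.full(4, 0.5)

w = (alpha * y) @ X                  # w = sum_i alpha_i y_i x_i
b = y[0] - (alpha * y) @ (X @ X[0])  # b = y_j - sum_i alpha_i y_i x_i^T x_j, j = 0
print(w, b)                          # (-1, 1) and 0: the hyperplane -x1 + x2 = 0
```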

Problem 2: Soft-margin SVM. (20 pts)

Suppose we have the data points $x \in \mathbb{R}^{n \times d}$ with corresponding labels $y \in \mathbb{R}^n$. We want to use a soft-margin SVM to classify these data points with a regularization parameter $C = 1$.

1. Write down the formulation of the soft-margin SVM for this problem using $w$, $x$, $y$, $b$ and $\xi$. Write out explicitly their dimensions. (3 pts)

For a soft-margin SVM, the optimization problem can be formulated as follows:

$$\min_{w,b,\xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

subject to:

$$y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad \forall i$$

where:

  • $w \in \mathbb{R}^d$ is the weight vector,
  • $b \in \mathbb{R}$ is the bias,
  • $\xi \in \mathbb{R}^n$ is the vector of slack variables, and
  • $C = 1$ is the regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error.

Dimensions:

  • $w$ has dimension $d \times 1$,
  • $x$ has dimension $n \times d$,
  • $y$ has dimension $n \times 1$,
  • $b$ is a scalar,
  • $\xi$ has dimension $n \times 1$.

2. Write down the Lagrangian form and derive the dual for the problem. Write down the detailed derivation steps. (12 pts)

The Lagrangian is:

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \mu_i \xi_i$$

where $\alpha_i \geq 0$ and $\mu_i \geq 0$ are Lagrange multipliers for the constraints $y_i (w^T x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, respectively.

To derive the dual problem, we take the partial derivatives of $L$ with respect to $w$, $b$, and $\xi$ and set them to zero:

Partial derivative with respect to $w$:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0 \quad \Rightarrow \quad w = \sum_{i=1}^{n} \alpha_i y_i x_i$$

Partial derivative with respect to $b$:

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0 \quad \Rightarrow \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$

Partial derivative with respect to $\xi_i$:

$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \mu_i = 0 \quad \Rightarrow \quad \mu_i = C - \alpha_i$$

Since $\mu_i \geq 0$, this gives $\alpha_i \leq C$.

By substituting $w = \sum_{i=1}^{n} \alpha_i y_i x_i$ back into the Lagrangian (the $b$ and $\xi$ terms vanish by the conditions above), we obtain the dual problem:

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i^T x_j)$$

subject to:

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C$$

3. Obtain the decision boundary. (3 pts)

The decision boundary is given by:

$$w^T x + b = 0$$

where $w = \sum_{i=1}^{n} \alpha_i y_i x_i$ from the dual problem. To classify a new point $x$, we use the decision function:

$$f(x) = \operatorname{sign}\left( \left( \sum_{i=1}^{n} \alpha_i y_i x_i \right)^T x + b \right)$$
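For reference, a fitted soft-margin SVM exposes exactly these dual quantities. The sketch below, assuming scikit-learn and purely illustrative synthetic data, recovers $w = \sum_i \alpha_i y_i x_i$ from `dual_coef_` (which stores $\alpha_i y_i$ for the support vectors) and checks it against the fitted `coef_`.

```python
# Sketch: fit a linear soft-margin SVM (C = 1) and rebuild w from the duals.
# The synthetic two-blob data here is illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)) + 2, rng.normal(0, 1, (20, 2)) - 2])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors, so the primal
# weight vector is just their weighted sum over support_vectors_.
w = clf.dual_coef_ @ clf.support_vectors_
b = clf.intercept_
print(np.allclose(w, clf.coef_))       # True: both give the same w
print(np.sign(X @ w.ravel() + b)[:5])  # decision rule sign(w^T x + b)
```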

4. Explain why ξ disappears in the dual. (2 pts)

In the dual formulation, $\xi$ disappears because the stationarity condition $\partial L / \partial \xi_i = C - \alpha_i - \mu_i = 0$ makes its coefficient vanish: collecting the $\xi_i$ terms in the Lagrangian gives $(C - \alpha_i - \mu_i)\xi_i = 0$. The slack variables are therefore eliminated when this condition is substituted back, and their effect is fully captured by the box constraint on the dual variables, $0 \leq \alpha_i \leq C$.

Problem 3: Kernel SVM. (17 pts)

Consider the following 2D dataset with four training points:

$$x_1 = (1,2), \ y_1 = 1 \qquad x_2 = (2,3), \ y_2 = 1 \qquad x_3 = (3,1), \ y_3 = -1 \qquad x_4 = (4,3), \ y_4 = -1$$

We want to use the polynomial kernel $k(x_i, x_j) = (x_i^T x_j + 1)^2$ to classify these data points with a soft-margin SVM. The regularization parameter is $C = 1$.

To solve Problem 3 on Kernel SVM, let’s go through each part step-by-step.

1. Compute the Kernel Matrix K (6 pts)

$$k(x_i, x_j) = (x_i^T x_j + 1)^2$$

The kernel matrix $K$ is a $4 \times 4$ matrix where each element $K_{ij} = k(x_i, x_j)$. Let's compute each entry using the given kernel function.

$$
\begin{aligned}
K_{11} &= k(x_1, x_1) = ((1 \cdot 1 + 2 \cdot 2) + 1)^2 = 36 \\
K_{12} &= k(x_1, x_2) = ((1 \cdot 2 + 2 \cdot 3) + 1)^2 = 81 \\
K_{13} &= k(x_1, x_3) = ((1 \cdot 3 + 2 \cdot 1) + 1)^2 = 36 \\
K_{14} &= k(x_1, x_4) = ((1 \cdot 4 + 2 \cdot 3) + 1)^2 = 121 \\
K_{22} &= k(x_2, x_2) = ((2 \cdot 2 + 3 \cdot 3) + 1)^2 = 196 \\
K_{23} &= k(x_2, x_3) = ((2 \cdot 3 + 3 \cdot 1) + 1)^2 = 100 \\
K_{24} &= k(x_2, x_4) = ((2 \cdot 4 + 3 \cdot 3) + 1)^2 = 324 \\
K_{33} &= k(x_3, x_3) = ((3 \cdot 3 + 1 \cdot 1) + 1)^2 = 121 \\
K_{34} &= k(x_3, x_4) = ((3 \cdot 4 + 1 \cdot 3) + 1)^2 = 256 \\
K_{44} &= k(x_4, x_4) = ((4 \cdot 4 + 3 \cdot 3) + 1)^2 = 676
\end{aligned}
$$

Since K is symmetric, we can fill in the remaining entries by symmetry:

$$K = \begin{pmatrix} 36 & 81 & 36 & 121 \\ 81 & 196 & 100 & 324 \\ 36 & 100 & 121 & 256 \\ 121 & 324 & 256 & 676 \end{pmatrix}$$
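Since $K_{ij} = (x_i^T x_j + 1)^2$, the whole matrix can be verified by squaring $X X^T + 1$ elementwise; the NumPy sketch below is just such a check of the hand computation.

```python
# Compute the polynomial kernel matrix K_ij = (x_i^T x_j + 1)^2 in one shot.
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 1], [4, 3]], dtype=float)
K = (X @ X.T + 1) ** 2
print(K.astype(int))
# [[ 36  81  36 121]
#  [ 81 196 100 324]
#  [ 36 100 121 256]
#  [121 324 256 676]]
```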

2. Set up the Dual Optimization Problem. You can use the results from Problem 2. (4 pts)

Using the results from Problem 2, the dual problem for a soft-margin SVM with a kernel function becomes:

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K_{ij}$$

subject to:

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C$$

where C=1 in this problem.

3. Suppose the Lagrange multipliers $\alpha$'s are

$\alpha_1 = 0.0182$, $\alpha_2 = 0.0068$, $\alpha_3 = 0.0250$, and $\alpha_4 = 0$,

and $x_3$ is a support vector. Calculate the bias $b$. (2 pts)

The bias $b$ is calculated as:

$$b = y_j - \sum_{i=1}^{n} \alpha_i y_i K_{ij}$$

where we can use $x_3$ (with $y_3 = -1$) as the support vector.

Substitute $j = 3$:

$$b = y_3 - \sum_{i=1}^{4} \alpha_i y_i K_{i3}$$

Calculating each term in the summation:

$$
\begin{aligned}
\alpha_1 y_1 K_{13} &= 0.0182 \times 1 \times 36 = 0.6552 \\
\alpha_2 y_2 K_{23} &= 0.0068 \times 1 \times 100 = 0.68 \\
\alpha_3 y_3 K_{33} &= 0.0250 \times (-1) \times 121 = -3.025 \\
\alpha_4 y_4 K_{43} &= 0 \quad (\text{since } \alpha_4 = 0)
\end{aligned}
$$

Summing these values:

$$\sum_{i=1}^{4} \alpha_i y_i K_{i3} = 0.6552 + 0.68 - 3.025 + 0 = -1.6898$$

$$b = -1 - (-1.6898) = -1 + 1.6898 = 0.6898$$
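The same bias computation can be reproduced in NumPy as a quick check (the array names are mine):

```python
# Recompute the bias using x_3 (index 2) as the support vector:
# b = y_3 - sum_i alpha_i y_i K_i3.
import numpy as np

y = np.array([1, 1, -1, -1], dtype=float)
alpha = np.array([0.0182, 0.0068, 0.0250, 0.0])
K_col3 = np.array([36, 100, 121, 256], dtype=float)  # column j = 3 of K

b = y[2] - (alpha * y) @ K_col3
print(b)  # 0.6898
```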

4. Classify a New Point $x_5 = (2, 1)$ using the learned kernel SVM model. (5 pts)

To classify the point $x_5 = (2, 1)$, we use the decision function:

$$f(x_5) = \sum_{i=1}^{n} \alpha_i y_i k(x_i, x_5) + b$$

Let's compute each $k(x_i, x_5)$:

  1. $k(x_1, x_5) = ((1 \cdot 2 + 2 \cdot 1) + 1)^2 = 25$
  2. $k(x_2, x_5) = ((2 \cdot 2 + 3 \cdot 1) + 1)^2 = 64$
  3. $k(x_3, x_5) = ((3 \cdot 2 + 1 \cdot 1) + 1)^2 = 64$
  4. $k(x_4, x_5) = ((4 \cdot 2 + 3 \cdot 1) + 1)^2 = 144$

Now, calculate $f(x_5)$:

$$f(x_5) = \alpha_1 y_1 k(x_1, x_5) + \alpha_2 y_2 k(x_2, x_5) + \alpha_3 y_3 k(x_3, x_5) + \alpha_4 y_4 k(x_4, x_5) + b$$

Substitute the values:

$$f(x_5) = (0.0182 \times 1 \times 25) + (0.0068 \times 1 \times 64) + (0.0250 \times (-1) \times 64) + (0 \times (-1) \times 144) + 0.6898$$

Calculate each term:

$$0.0182 \times 25 = 0.455, \quad 0.0068 \times 64 = 0.4352, \quad 0.0250 \times 64 = 1.6$$

Adding them up with $b$:

$$f(x_5) = 0.455 + 0.4352 - 1.6 + 0.6898 = -0.02$$

Since $f(x_5) < 0$, we classify $x_5$ as belonging to class $-1$.
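The full decision-function evaluation for $x_5$ can likewise be reproduced in a few lines of NumPy, again purely as a verification of the hand calculation:

```python
# Evaluate f(x5) = sum_i alpha_i y_i k(x_i, x5) + b for x5 = (2, 1).
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 1], [4, 3]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)
alpha = np.array([0.0182, 0.0068, 0.0250, 0.0])
b = 0.6898

x5 = np.array([2, 1], dtype=float)
k = (X @ x5 + 1) ** 2     # kernel values: [25, 64, 64, 144]
f = (alpha * y) @ k + b
print(f)                  # approx -0.02, so x5 is assigned to class -1
```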