ML-As-3

Problem 1: Hard-margin SVM. (18 pts)

You are given the following two sets of data points, each belonging to one of the two classes (class 1 and class -1):

  • Class 1 (labeled as $+1$): $(1,2), (2,3)$
  • Class $-1$ (labeled as $-1$): $(2,1), (3,2)$

Please find the optimal separating hyperplane using a linear SVM and derive the equation of the hyperplane. Assume the hard-margin SVM.

1. Write down the formulation of SVM, including the separating hyperplane, the constraints and the final optimization problem with parameters. (4 pts)

The hyperplane is defined through $w$ and $b$ as a set of points such that

$$H = \{x \mid w^T x + b = 0\}$$

  • $w = (w_1, w_2, \ldots, w_n)$: weight vector
  • $b$: scalar bias

subject to the constraints

$$y_i (w^T x_i + b) \geq 1, \quad \forall i$$

  • $y_i$ is the class label of $x_i$

Final optimization problem:

$$\min_{w,b} \ \frac{1}{2} \|w\|^2$$

2. Write down the Lagrangian form for this problem using the parameters and Lagrange multipliers. Please also write out its dual form. (10 pts)

The Lagrangian form:

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$$

where $\alpha_i \geq 0$ are the Lagrange multipliers associated with each constraint.

The dual form of the optimization problem:

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

subject to

$$\sum_{i=1}^{n} \alpha_i y_i = 0 \quad \text{and} \quad \alpha_i \geq 0 \ \forall i$$
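As a sanity check, this dual can be solved numerically. Below is a minimal sketch using NumPy and SciPy's SLSQP solver on the Problem 1 data; the variable names are illustrative and not part of the assignment. For this dataset the optimal $w$ is unique even though the $\alpha$'s need not be.

```python
# Minimal numerical check of the hard-margin dual (illustrative names).
import numpy as np
from scipy.optimize import minimize

X = np.array([[1, 2], [2, 3], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

# Q_ij = y_i y_j x_i^T x_j, so the dual objective is sum(a) - 0.5 a^T Q a.
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(a):
    return 0.5 * a @ Q @ a - a.sum()  # negate to maximize via a minimizer

cons = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bounds = [(0, None)] * len(y)                  # alpha_i >= 0
res = minimize(neg_dual, x0=np.ones(4), bounds=bounds, constraints=cons)

alpha = res.x
w = (alpha * y) @ X
print(alpha)  # the alphas need not be unique for this dataset...
print(w)      # ...but w is, and should come out near (-1, 1)
```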

3. Assume that the Lagrange multipliers $\alpha_i$ are all 0.5 and that the point $(1,2)$ is a support vector, for ease of calculation. Please calculate the values of the weight vector $w$ and bias $b$. Write out the explicit form of the hyperplane. (4 pts)

$$w = \sum_{i=1}^{m} \alpha_i y_i x_i$$

Plugging in the given values:

$$w = 0.5 \times 1 \times (1,2) + 0.5 \times 1 \times (2,3) + 0.5 \times (-1) \times (2,1) + 0.5 \times (-1) \times (3,2) = (-1, 1)$$

$$b = y_j - \sum_{i=1}^{m} \alpha_i y_i x_i^T x_j$$

Since the support vector is $x_j = (1,2)$, we have $y_j = 1$, so

$$b = 1 - \left( 0.5 \times 1 \times (1,2)^T (1,2) + 0.5 \times 1 \times (2,3)^T (1,2) + 0.5 \times (-1) \times (2,1)^T (1,2) + 0.5 \times (-1) \times (3,2)^T (1,2) \right) = 1 - (2.5 + 4 - 2 - 3.5) = 0$$

The explicit form of the hyperplane:

$$H = \{x \mid w^T x + b = 0\} = \{x \mid (-1,1)^T x = 0\}$$

i.e., $-x_1 + x_2 = 0$.
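The arithmetic above is easy to verify in a few lines of NumPy; this is only a check of the hand calculation, using the assumed $\alpha_i = 0.5$ and $x_j = (1,2)$ as the support vector.

```python
# Recompute w and b from the assumed multipliers (all alpha_i = 0.5).
import numpy as np

X = np.array([[1, 2], [2, 3], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)
alpha = np.full(4, 0.5)

w = (alpha * y) @ X                  # w = sum_i alpha_i y_i x_i
b = y[0] - (alpha * y) @ (X @ X[0])  # b = y_j - sum_i alpha_i y_i x_i^T x_j, j = 0
print(w, b)                          # (-1, 1) and 0: the hyperplane -x1 + x2 = 0
```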

Problem 2: Soft-margin SVM. (20 pts)

Suppose we have the data points $x \in \mathbb{R}^{n \times d}$ with corresponding labels $y \in \mathbb{R}^n$. We want to use a soft-margin SVM to classify these data points with a regularization parameter $C = 1$.

1. Write down the formulation of the soft-margin SVM for this problem using $w$, $x$, $y$, $b$ and $\xi$. Write out explicitly their dimensions. (3 pts)

For a soft-margin SVM, the optimization problem can be formulated as follows:

$$\min_{w,b,\xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$$

subject to:

$$y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad \forall i$$

where:

  • $w \in \mathbb{R}^d$ is the weight vector,
  • $b \in \mathbb{R}$ is the bias,
  • $\xi \in \mathbb{R}^n$ is the vector of slack variables, and
  • $C = 1$ is the regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error.

Dimensions:

  • $w$ has dimension $d \times 1$,
  • $x$ has dimension $n \times d$,
  • $y$ has dimension $n \times 1$,
  • $b$ is a scalar,
  • $\xi$ has dimension $n \times 1$.

2. Write down the Lagrangian form and derive the dual for the problem. Write down the detailed derivation steps. (12 pts)

The Lagrangian is:

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \mu_i \xi_i$$

where $\alpha_i \geq 0$ and $\mu_i \geq 0$ are Lagrange multipliers for the constraints $y_i (w^T x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, respectively.

To derive the dual problem, we take the partial derivatives of $L$ with respect to $w$, $b$, and $\xi$ and set them to zero:

Partial derivative with respect to $w$:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0 \quad \Rightarrow \quad w = \sum_{i=1}^{n} \alpha_i y_i x_i$$

Partial derivative with respect to $b$:

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0 \quad \Rightarrow \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$

Partial derivative with respect to $\xi_i$:

$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \mu_i = 0 \quad \Rightarrow \quad \mu_i = C - \alpha_i$$

Since $\mu_i \geq 0$, this gives $\alpha_i \leq C$.

By substituting $w = \sum_{i=1}^{n} \alpha_i y_i x_i$ back into the Lagrangian (the $b$ and $\xi$ terms vanish by the conditions above), we obtain the dual problem:

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i^T x_j)$$

subject to:

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C$$

3. Obtain the decision boundary. (3 pts)

The decision boundary is given by:

$$w^T x + b = 0$$

where $w = \sum_{i=1}^{n} \alpha_i y_i x_i$ from the dual problem. To classify a new point $x$, we use the decision function:

$$f(x) = \operatorname{sign}\left( \left( \sum_{i=1}^{n} \alpha_i y_i x_i \right)^T x + b \right)$$
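For reference, a fitted soft-margin SVM exposes exactly these dual quantities. The sketch below, assuming scikit-learn and purely illustrative synthetic data, recovers $w = \sum_i \alpha_i y_i x_i$ from `dual_coef_` (which stores $\alpha_i y_i$ for the support vectors) and checks it against the fitted `coef_`.

```python
# Sketch: fit a linear soft-margin SVM (C = 1) and rebuild w from the duals.
# The synthetic two-blob data here is illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)) + 2, rng.normal(0, 1, (20, 2)) - 2])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors, so the primal
# weight vector is just their weighted sum over support_vectors_.
w = clf.dual_coef_ @ clf.support_vectors_
b = clf.intercept_
print(np.allclose(w, clf.coef_))       # True: both give the same w
print(np.sign(X @ w.ravel() + b)[:5])  # decision rule sign(w^T x + b)
```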

4. Explain why ξ disappears in the dual. (2 pts)

In the dual formulation, $\xi$ disappears because the stationarity condition $\partial L / \partial \xi_i = C - \alpha_i - \mu_i = 0$ makes its coefficient vanish: collecting the $\xi_i$ terms in the Lagrangian gives $(C - \alpha_i - \mu_i)\xi_i = 0$. The slack variables are therefore eliminated when this condition is substituted back, and their effect is fully captured by the box constraint on the dual variables, $0 \leq \alpha_i \leq C$.

Problem 3: Kernel SVM. (17 pts)

Consider the following 2D dataset with four training points:

$$x_1 = (1,2), \ y_1 = 1 \qquad x_2 = (2,3), \ y_2 = 1 \qquad x_3 = (3,1), \ y_3 = -1 \qquad x_4 = (4,3), \ y_4 = -1$$

We want to use the polynomial kernel $k(x_i, x_j) = (x_i^T x_j + 1)^2$ to classify these data points with a soft-margin SVM. The regularization parameter is $C = 1$.

To solve Problem 3 on Kernel SVM, let’s go through each part step-by-step.

1. Compute the Kernel Matrix K (6 pts)

$$k(x_i, x_j) = (x_i^T x_j + 1)^2$$

The kernel matrix $K$ is a $4 \times 4$ matrix where each element $K_{ij} = k(x_i, x_j)$. Let's compute each entry using the given kernel function.

$$
\begin{aligned}
K_{11} &= k(x_1, x_1) = ((1 \cdot 1 + 2 \cdot 2) + 1)^2 = 36 \\
K_{12} &= k(x_1, x_2) = ((1 \cdot 2 + 2 \cdot 3) + 1)^2 = 81 \\
K_{13} &= k(x_1, x_3) = ((1 \cdot 3 + 2 \cdot 1) + 1)^2 = 36 \\
K_{14} &= k(x_1, x_4) = ((1 \cdot 4 + 2 \cdot 3) + 1)^2 = 121 \\
K_{22} &= k(x_2, x_2) = ((2 \cdot 2 + 3 \cdot 3) + 1)^2 = 196 \\
K_{23} &= k(x_2, x_3) = ((2 \cdot 3 + 3 \cdot 1) + 1)^2 = 100 \\
K_{24} &= k(x_2, x_4) = ((2 \cdot 4 + 3 \cdot 3) + 1)^2 = 324 \\
K_{33} &= k(x_3, x_3) = ((3 \cdot 3 + 1 \cdot 1) + 1)^2 = 121 \\
K_{34} &= k(x_3, x_4) = ((3 \cdot 4 + 1 \cdot 3) + 1)^2 = 256 \\
K_{44} &= k(x_4, x_4) = ((4 \cdot 4 + 3 \cdot 3) + 1)^2 = 676
\end{aligned}
$$

Since K is symmetric, we can fill in the remaining entries by symmetry:

$$K = \begin{pmatrix} 36 & 81 & 36 & 121 \\ 81 & 196 & 100 & 324 \\ 36 & 100 & 121 & 256 \\ 121 & 324 & 256 & 676 \end{pmatrix}$$
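Since $K_{ij} = (x_i^T x_j + 1)^2$, the whole matrix can be verified by squaring $X X^T + 1$ elementwise; the NumPy sketch below is just such a check of the hand computation.

```python
# Compute the polynomial kernel matrix K_ij = (x_i^T x_j + 1)^2 in one shot.
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 1], [4, 3]], dtype=float)
K = (X @ X.T + 1) ** 2
print(K.astype(int))
# [[ 36  81  36 121]
#  [ 81 196 100 324]
#  [ 36 100 121 256]
#  [121 324 256 676]]
```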

2. Set up the Dual Optimization Problem. You can use the results from Problem 2. (4 pts)

Using the results from Problem 2, the dual problem for a soft-margin SVM with a kernel function becomes:

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K_{ij}$$

subject to:

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C$$

where C=1 in this problem.

3. Suppose the Lagrange multipliers $\alpha$'s are

$\alpha_1 = 0.0182$, $\alpha_2 = 0.0068$, $\alpha_3 = 0.0250$, and $\alpha_4 = 0$,

and $x_3$ is a support vector. Calculate the bias $b$. (2 pts)

The bias $b$ is calculated as:

$$b = y_j - \sum_{i=1}^{n} \alpha_i y_i K_{ij}$$

where we can use $x_3$ (with $y_3 = -1$) as the support vector.

Substitute $j = 3$:

$$b = y_3 - \sum_{i=1}^{4} \alpha_i y_i K_{i3}$$

Calculating each term in the summation:

$$
\begin{aligned}
\alpha_1 y_1 K_{13} &= 0.0182 \times 1 \times 36 = 0.6552 \\
\alpha_2 y_2 K_{23} &= 0.0068 \times 1 \times 100 = 0.68 \\
\alpha_3 y_3 K_{33} &= 0.0250 \times (-1) \times 121 = -3.025 \\
\alpha_4 y_4 K_{43} &= 0 \quad (\text{since } \alpha_4 = 0)
\end{aligned}
$$

Summing these values:

$$\sum_{i=1}^{4} \alpha_i y_i K_{i3} = 0.6552 + 0.68 - 3.025 + 0 = -1.6898$$

$$b = -1 - (-1.6898) = -1 + 1.6898 = 0.6898$$
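The same bias computation can be reproduced in NumPy as a quick check (the array names are mine):

```python
# Recompute the bias using x_3 (index 2) as the support vector:
# b = y_3 - sum_i alpha_i y_i K_i3.
import numpy as np

y = np.array([1, 1, -1, -1], dtype=float)
alpha = np.array([0.0182, 0.0068, 0.0250, 0.0])
K_col3 = np.array([36, 100, 121, 256], dtype=float)  # column j = 3 of K

b = y[2] - (alpha * y) @ K_col3
print(b)  # 0.6898
```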

4. Classify a New Point $x_5 = (2, 1)$ using the learned kernel SVM model. (5 pts)

To classify the point $x_5 = (2, 1)$, we use the decision function:

$$f(x_5) = \sum_{i=1}^{n} \alpha_i y_i k(x_i, x_5) + b$$

Let's compute each $k(x_i, x_5)$:

  1. $k(x_1, x_5) = ((1 \cdot 2 + 2 \cdot 1) + 1)^2 = 25$
  2. $k(x_2, x_5) = ((2 \cdot 2 + 3 \cdot 1) + 1)^2 = 64$
  3. $k(x_3, x_5) = ((3 \cdot 2 + 1 \cdot 1) + 1)^2 = 64$
  4. $k(x_4, x_5) = ((4 \cdot 2 + 3 \cdot 1) + 1)^2 = 144$

Now, calculate $f(x_5)$:

$$f(x_5) = \alpha_1 y_1 k(x_1, x_5) + \alpha_2 y_2 k(x_2, x_5) + \alpha_3 y_3 k(x_3, x_5) + \alpha_4 y_4 k(x_4, x_5) + b$$

Substitute the values:

$$f(x_5) = (0.0182 \times 1 \times 25) + (0.0068 \times 1 \times 64) + (0.0250 \times (-1) \times 64) + (0 \times (-1) \times 144) + 0.6898$$

Calculate each term:

$$0.0182 \times 25 = 0.455, \quad 0.0068 \times 64 = 0.4352, \quad 0.0250 \times 64 = 1.6$$

Adding them up with $b$:

$$f(x_5) = 0.455 + 0.4352 - 1.6 + 0.6898 = -0.02$$

Since $f(x_5) < 0$, we classify $x_5$ as belonging to class $-1$.
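The full decision-function evaluation for $x_5$ can likewise be reproduced in a few lines of NumPy, again purely as a verification of the hand calculation:

```python
# Evaluate f(x5) = sum_i alpha_i y_i k(x_i, x5) + b for x5 = (2, 1).
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 1], [4, 3]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)
alpha = np.array([0.0182, 0.0068, 0.0250, 0.0])
b = 0.6898

x5 = np.array([2, 1], dtype=float)
k = (X @ x5 + 1) ** 2     # kernel values: [25, 64, 64, 144]
f = (alpha * y) @ k + b
print(f)                  # approx -0.02, so x5 is assigned to class -1
```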