Support vector machines: Maximum-margin linear classifiers
Topics we’ll cover
1 The margin of a linear classifier
2 Maximizing the margin
3 A convex optimization problem
4 Support vectors
Improving upon the Perceptron
For a linearly separable data set there are, in general, many possible separating hyperplanes, and the Perceptron is guaranteed to find one of them.

Is there a better, more systematic choice of separator? The one with the most buffer around it, for instance?
The learning problem
Given: training data (x^(1), y^(1)), …, (x^(n), y^(n)) ∈ R^d × {−1, +1}.
Find: w ∈ R^d and b ∈ R such that y^(i)(w · x^(i) + b) > 0 for all i.

By scaling w and b, we can equivalently ask for

y^(i)(w · x^(i) + b) ≥ 1 for all i.
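To see why the rescaling works (a standard step the slide leaves implicit): strict separability means the worst-case value of the left-hand side is some ε > 0, and dividing w and b by ε turns that worst case into 1:

$$\varepsilon = \min_i \; y^{(i)}\big(w \cdot x^{(i)} + b\big) > 0,
\qquad \tilde{w} = w/\varepsilon,\; \tilde{b} = b/\varepsilon
\;\;\Longrightarrow\;\;
y^{(i)}\big(\tilde{w} \cdot x^{(i)} + \tilde{b}\big) \ge 1 \text{ for all } i.$$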
Maximizing the margin
Given: training data (x^(1), y^(1)), …, (x^(n), y^(n)) ∈ R^d × {−1, +1}.
Find: w ∈ R^d and b ∈ R such that

y^(i)(w · x^(i) + b) ≥ 1 for all i.

[Figure: a separating hyperplane w · x + b = 0 with the parallel boundaries w · x + b = 1 and w · x + b = −1, each at distance γ from it]
Maximize the margin γ.
A formula for the margin
Close-up of a point z on the positive boundary.
[Figure: close-up of a point z on the boundary w · x + b = 1, at distance γ from the hyperplane w · x + b = 0]
A quick calculation shows that γ = 1/‖w‖. In short: to maximize the margin, minimize ‖w‖.
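The quick calculation is the point-to-hyperplane distance formula, spelled out here for completeness. Since z lies on the positive boundary, w · z + b = 1, so

$$\gamma = \operatorname{dist}\big(z,\; \{x : w \cdot x + b = 0\}\big)
= \frac{|w \cdot z + b|}{\|w\|} = \frac{1}{\|w\|}.$$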
Maximum-margin linear classifier
• Given (x^(1), y^(1)), …, (x^(n), y^(n)) ∈ R^d × {−1, +1}, solve

min_{w ∈ R^d, b ∈ R} ‖w‖²
s.t.: y^(i)(w · x^(i) + b) ≥ 1 for all i = 1, 2, …, n

• This is a convex optimization problem:
  • convex objective function
  • linear constraints
• This means that:
  • the optimal solution can be found efficiently (see the sketch below)
  • duality gives us information about the solution
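To make "found efficiently" concrete, here is a minimal sketch that transcribes the problem above into a generic convex solver. The cvxpy library is my choice for illustration; the slides do not prescribe any particular solver.

```python
import numpy as np
import cvxpy as cp

def hard_margin_svm(X, y):
    """Solve min ||w||^2 s.t. y_i (w . x_i + b) >= 1 for all i.

    X: (n, d) array of points; y: (n,) array of labels in {-1, +1}.
    Assumes the data are linearly separable; otherwise the program is infeasible.
    """
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    # One linear constraint per training point: y_i (w . x_i + b) >= 1.
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    # Convex quadratic objective: ||w||^2.
    problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
    problem.solve()
    return w.value, b.value
```

Any quadratic-programming solver would do equally well; cvxpy is used here only because it lets the code mirror the math almost line for line.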
Support vectors
Support vectors: the training points that lie right on the margin, i.e. those with y^(i)(w · x^(i) + b) = 1.

[Figure: the hyperplane w · x + b = 0 with its margin boundaries; αᵢ is nonzero only for the support vectors on those boundaries]

w = ∑_{i=1}^n αᵢ y^(i) x^(i) is a function of just the support vectors, where the αᵢ are the dual variables of the convex program above.
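As an illustration (my own sketch using scikit-learn's SVC, which the slides do not mention), a fitted linear SVM exposes exactly these quantities: the support vectors and the products αᵢ y^(i):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny separable toy data: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1e6)  # a very large C approximates the hard margin
clf.fit(X, y)

print(clf.support_)          # indices of the support vectors
print(clf.support_vectors_)  # the support vectors themselves
# dual_coef_ stores alpha_i * y_i for the support vectors, so w is
# their weighted sum -- "a function of just the support vectors".
w = clf.dual_coef_ @ clf.support_vectors_
assert np.allclose(w, clf.coef_)
```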
Small example: Iris data set
Fisher’s iris data
150 data points from three classes:
• iris setosa
• iris versicolor
• iris virginica
Four measurements: petal width/length, sepal width/length
Small example: Iris data set
Two features: sepal width and petal width.
Two classes: setosa (red circles) and versicolor (black triangles).

[Figure: scatter plot of the two classes in the (sepal width, petal width) plane; the two classes are linearly separable]
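A sketch of how one might reproduce this setup from scikit-learn's bundled copy of the iris data (the column indices below reflect how load_iris orders the four measurements; the code is illustrative, not from the slides):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
# Keep setosa (target 0) and versicolor (target 1) only.
mask = iris.target < 2
# Columns: 0 sepal length, 1 sepal width, 2 petal length, 3 petal width.
X = iris.data[mask][:, [1, 3]]                # sepal width, petal width
y = np.where(iris.target[mask] == 0, -1, +1)  # setosa -> -1, versicolor -> +1

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
print(clf.coef_, clf.intercept_)
print(clf.support_vectors_)                   # the points right on the margin
```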
Soft maximum-margin linear classifier

• Given (x^(1), y^(1)), …, (x^(n), y^(n)) ∈ R^d × {−1, +1}, solve

min_{w ∈ R^d, b ∈ R, ξ ∈ R^n} ‖w‖² + C ∑_i ξᵢ
s.t.: y^(i)(w · x^(i) + b) ≥ 1 − ξᵢ for all i = 1, 2, …, n
      ξ ≥ 0

• Allows for violation of the constraints:
  • the model pays for each violation through its slack variable ξᵢ (see the sketch below)
  • works with non-separable data!
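The soft-margin program is the hard-margin sketch above plus slack variables; here is the same cvxpy transcription (again my own illustration; note that some libraries, e.g. scikit-learn's SVC, scale the objective slightly differently, which only amounts to a rescaling of C):

```python
import cvxpy as cp

def soft_margin_svm(X, y, C=1.0):
    """Solve min ||w||^2 + C*sum(xi) s.t. y_i (w . x_i + b) >= 1 - xi_i, xi >= 0."""
    n, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    xi = cp.Variable(n)  # one slack variable per training point
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
    objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
    cp.Problem(objective, constraints).solve()
    return w.value, b.value, xi.value
```

At the optimum, a point with ξᵢ = 0 satisfies the original margin constraint; 0 < ξᵢ < 1 means it sits inside the margin but is still correctly classified; ξᵢ > 1 means it is misclassified.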