Review: Support vector machines (lazebnik/fall09/lec06_svm.pdf, 2009-09-14)
TRANSCRIPT
Review: Support vector machines

Margin optimization:

\min_{(w, w_0)} \frac{1}{2}\|w\|^2
subject to y_i(w_0 + w^T x_i) - 1 \ge 0, \quad i = 1, \dots, n.

What are support vectors?

COMP 875 Machine learning techniques and image analysis
Review: Support vector machines (separable case)

Margin optimization:

\min_{(w, w_0)} \frac{1}{2}\|w\|^2
subject to y_i(w_0 + w^T x_i) - 1 \ge 0, \quad i = 1, \dots, n.

Add the constraints as terms to the objective function:

\min_{(w, w_0)} \frac{1}{2}\|w\|^2 + \sum_{i=1}^n \max_{\alpha_i \ge 0} \alpha_i \left[ 1 - y_i(w_0 + w^T x_i) \right]

What if y_i(w_0 + w^T x_i) > 1 (not a support vector) at the solution? Then we must have \alpha_i = 0.
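To see why moving the constraints into the objective changes nothing at the optimum, evaluate the inner maximization for a fixed margin value m_i = y_i(w_0 + w^T x_i) (a standard Lagrangian argument, consistent with the slide's claim):

\max_{\alpha_i \ge 0} \alpha_i (1 - m_i) =
\begin{cases}
0, & m_i \ge 1 \quad \text{(constraint satisfied; attained at } \alpha_i = 0\text{)},\\
+\infty, & m_i < 1 \quad \text{(constraint violated)}.
\end{cases}

So the penalized objective agrees with \frac{1}{2}\|w\|^2 on the feasible set and is infinite elsewhere, which is why \alpha_i = 0 whenever the constraint holds strictly.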
Review: SVM optimization (separable case)

\max_{\{\alpha_i \ge 0\}} \min_{(w, w_0)} \underbrace{\left\{ \frac{1}{2}\|w\|^2 + \sum_{i=1}^n \alpha_i \left[ 1 - y_i(w_0 + w^T x_i) \right] \right\}}_{L(w, w_0; \alpha)}

First, we fix \alpha and minimize L(w, w_0; \alpha) w.r.t. w, w_0:

\frac{\partial}{\partial w} L(w, w_0; \alpha) = w - \sum_{i=1}^n \alpha_i y_i x_i = 0,
\frac{\partial}{\partial w_0} L(w, w_0; \alpha) = -\sum_{i=1}^n \alpha_i y_i = 0.

Hence w(\alpha) = \sum_{i=1}^n \alpha_i y_i x_i, \quad \sum_{i=1}^n \alpha_i y_i = 0.
Review: SVM optimization (separable case)

w(\alpha) = \sum_{i=1}^n \alpha_i y_i x_i, \quad \sum_{i=1}^n \alpha_i y_i = 0.

Now we can substitute this solution into the objective:

\max_{\{\alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0\}} \left\{ \frac{1}{2}\|w(\alpha)\|^2 + \sum_{i=1}^n \alpha_i \left[ 1 - y_i(w_0(\alpha) + w(\alpha)^T x_i) \right] \right\}
= \max_{\{\alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0\}} \left\{ \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j \right\}.
Review: SVM optimization (separable case)

Dual optimization problem:

\max_\alpha \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j
subject to \sum_{i=1}^n \alpha_i y_i = 0, \quad \alpha_i \ge 0 for all i = 1, \dots, n.

Solving this quadratic program yields the optimal \alpha. We substitute it back to get w:

w = w(\alpha) = \sum_{i=1}^n \alpha_i y_i x_i

What is the structure of the solution?
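As a sketch of what "solving this quadratic program" can look like in practice, here is a minimal example on a hypothetical four-point separable dataset, using scipy's generic SLSQP solver (real SVM implementations use specialized solvers such as SMO; the data and variable names here are illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: two points per class, linearly separable
X = np.array([[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Q[i, j] = y_i y_j x_i^T x_j, the matrix in the dual objective
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_dual(a):
    # Negative of the dual objective (SLSQP minimizes)
    return -(a.sum() - 0.5 * a @ Q @ a)

constraints = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * n                            # alpha_i >= 0
res = minimize(neg_dual, np.zeros(n), bounds=bounds, constraints=constraints)
alpha = res.x

# Recover w from the expansion, and w0 from a support vector (alpha_k > 0)
w = (alpha * y) @ X
k = int(np.argmax(alpha))
w0 = y[k] - w @ X[k]
print("alpha:", np.round(alpha, 3), "w:", np.round(w, 3), "w0:", round(w0, 3))
```

On this dataset the solver drives most \alpha_i to zero; only the points nearest the boundary (the support vectors) keep \alpha_i > 0, which is exactly the "structure of the solution" the slide asks about.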
Review: SVM classification

w = \sum_{\alpha_i > 0} \alpha_i y_i x_i.

Given a test example x, how is it classified?

y = \mathrm{sign}(w_0 + w^T x) = \mathrm{sign}\left( w_0 + \Big( \sum_{\alpha_i > 0} \alpha_i y_i x_i \Big)^T x \right) = \mathrm{sign}\left( w_0 + \sum_{\alpha_i > 0} \alpha_i y_i x_i^T x \right)

The classifier is based on an expansion in terms of dot products of x with the support vectors.
Review: Non-separable case

What if the training data are not linearly separable?

Basic idea: minimize

\frac{1}{2}\|w\|^2 + C \cdot (\text{penalty for violating margin constraints}).
Non-separable case

Rewrite the constraints with slack variables \xi_i \ge 0:

\min_{(w, w_0)} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \xi_i
subject to y_i(w_0 + w^T x_i) - 1 + \xi_i \ge 0.

Whenever the margin is \ge 1 (original constraint satisfied), \xi_i = 0.
Whenever the margin is < 1 (constraint violated), pay a linear penalty: \xi_i = 1 - y_i(w_0 + w^T x_i).

Penalty function: \max\left(0,\ 1 - y_i(w_0 + w^T x_i)\right)
Review: Hinge loss

\max\left(0,\ 1 - y_i(w_0 + w^T x_i)\right)
Connection between SVMs and logistic regression

Support vector machines:
Hinge loss: \max\left(0,\ 1 - y_i(w_0 + w^T x_i)\right)

Logistic regression:
P(y_i \mid x_i; w, w_0) = \frac{1}{1 + e^{-y_i(w_0 + w^T x_i)}}
Log loss: \log\left(1 + e^{-y_i(w_0 + w^T x_i)}\right)
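The two losses can be compared directly as functions of the margin m = y_i(w_0 + w^T x_i); a small sketch (the margin values are arbitrary):

```python
import numpy as np

m = np.linspace(-2.0, 2.0, 5)       # margins: negative = misclassified

hinge = np.maximum(0.0, 1.0 - m)    # SVM hinge loss
log_loss = np.log1p(np.exp(-m))     # logistic regression log loss

for mi, h, l in zip(m, hinge, log_loss):
    print(f"m={mi:+.1f}  hinge={h:.3f}  log={l:.3f}")
```

Both penalize small or negative margins, but the hinge loss is exactly zero once m >= 1, while the log loss stays positive everywhere.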
Review: Non-separable case (Source: G. Shakhnarovich)

Dual problem:

\max_\alpha \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j x_i^T x_j
subject to \sum_{i=1}^n \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C for all i = 1, \dots, n.

\alpha_i = 0: not a support vector.
0 < \alpha_i < C: SV on the margin, \xi_i = 0.
\alpha_i = C: over the margin, either misclassified (\xi_i > 1) or not (0 < \xi_i \le 1).
Nonlinear SVM

General idea: map the original input space into a high-dimensional feature space where the data is separable.
Example of nonlinear mapping

Not separable in 1D; separable in 2D.

What is \phi(x)? \phi(x) = (x, x^2).
Example of nonlinear mapping (Source: G. Shakhnarovich)

Consider the mapping:

\phi: [x_1, x_2]^T \to [1,\ \sqrt{2}x_1,\ \sqrt{2}x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}x_1 x_2]^T.

The (linear) SVM classifier in the feature space:

y = \mathrm{sign}\left( w_0 + \sum_{\alpha_i > 0} \alpha_i y_i \phi(x_i)^T \phi(x) \right)

The dot product in the feature space:

\phi(x)^T \phi(z) = 1 + 2x_1 z_1 + 2x_2 z_2 + x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 = \left(1 + x^T z\right)^2.
Dot products and feature space (Source: G. Shakhnarovich)

We defined a nonlinear mapping into feature space

\phi: [x_1, x_2]^T \to [1,\ \sqrt{2}x_1,\ \sqrt{2}x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}x_1 x_2]^T

and saw that \phi(x)^T \phi(z) = K(x, z) using the kernel

K(x, z) = \left(1 + x^T z\right)^2.

That is, we can calculate dot products in the feature space implicitly, without ever writing out the feature expansion!
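This identity is easy to verify numerically; a minimal check of the degree-2 mapping above (the point values are arbitrary):

```python
import numpy as np

def phi(v):
    # Explicit feature map phi: [x1, x2] -> 6-dimensional feature space
    x1, x2 = v
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

def K(x, z):
    # Degree-2 polynomial kernel (1 + x^T z)^2
    return (1.0 + x @ z) ** 2

x = np.array([0.5, -1.5])
z = np.array([2.0, 0.3])
print(phi(x) @ phi(z), K(x, z))  # the two values coincide
```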
The kernel trick (Source: G. Shakhnarovich)

Replace dot products in the SVM formulation with kernel values.

The optimization problem:

\max_\alpha \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i,j=1}^n \alpha_i \alpha_j y_i y_j K(x_i, x_j)

Need to compute pairwise kernel values for the training data.

The classifier now defines a nonlinear decision boundary in the original space:

y = \mathrm{sign}\left( w_0 + \sum_{\alpha_i > 0} \alpha_i y_i K(x_i, x) \right)

Need to compute K(x_i, x) for all support vectors x_i.
Mercer's kernels (Source: G. Shakhnarovich)

What kind of function K is a valid kernel, i.e. such that there exists a feature space \phi(x) in which K(x, z) = \phi(x)^T \phi(z)?

Theorem due to Mercer (1930s): K must be
continuous;
symmetric: K(x, z) = K(z, x);
positive definite: for any x_1, \dots, x_N, the kernel matrix K = [K(x_i, x_j)]_{i,j=1}^N must be positive definite.
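Mercer's condition can be spot-checked numerically: build the kernel matrix on a set of points and inspect its eigenvalues. A sketch using the polynomial kernel from the earlier slides (random data, illustrative only; note that a single sample can only falsify positive definiteness, never prove it):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))          # 10 random 2-D points

# Kernel matrix for K(x, z) = (1 + x^T z)^2
Kmat = (1.0 + X @ X.T) ** 2

eigs = np.linalg.eigvalsh(Kmat)       # eigenvalues of a symmetric matrix
print("min eigenvalue:", eigs.min())  # nonnegative, up to rounding error
```

A clearly negative eigenvalue here would prove that K is not a Mercer kernel.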
Some popular kernels (Source: G. Shakhnarovich)

The linear kernel: K(x, z) = x^T z. This leads to the original, linear SVM.

The polynomial kernel: K(x, z; c, d) = (c + x^T z)^d. We can write the expansion explicitly, by concatenating powers up to d and multiplying by appropriate weights.
Example: SVM with polynomial kernel (Source: G. Shakhnarovich) (figure)
Radial basis function kernel (Source: G. Shakhnarovich)

K(x, z; \sigma) = \exp\left(-\frac{1}{\sigma^2}\|x - z\|^2\right).

The RBF kernel is a measure of similarity between two examples.
The mapping \phi(x) is infinite-dimensional!
What is the role of the parameter \sigma?

Consider \sigma \to 0. Then K(x, z; \sigma) \to 1 if x = z and 0 if x \ne z. The SVM simply "memorizes" the training data (overfitting, lack of generalization).
What about \sigma \to \infty? Then K(x, z) \to 1 for all x, z. The SVM underfits.
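Both limits are visible numerically (the points here are chosen arbitrarily):

```python
import numpy as np

def rbf(x, z, sigma):
    # RBF kernel K(x, z; sigma) = exp(-||x - z||^2 / sigma^2)
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])

print(rbf(x, z, 0.1))    # ~0: tiny sigma, distinct points look totally dissimilar
print(rbf(x, z, 100.0))  # ~1: huge sigma, all points look alike
print(rbf(x, x, 0.1))    # exactly 1: a point is always fully similar to itself
```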
SVM with RBF (Gaussian) kernels (Source: G. Shakhnarovich)

Note: some support vectors here are not close to the boundary. (figure)
Making more Mercer kernels

Let K_1 and K_2 be Mercer kernels. Then the following functions are also Mercer kernels:

K(x, z) = K_1(x, z) + K_2(x, z)
K(x, z) = a K_1(x, z) (a is a positive scalar)
K(x, z) = K_1(x, z) K_2(x, z)
K(x, z) = x^T B z (B is a symmetric positive semi-definite matrix)

Multiple kernel learning: learn kernel combinations as part of the SVM optimization.
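These closure rules can also be spot-checked on Gram matrices: the sum, positive scaling, and elementwise (Hadamard) product of PSD kernel matrices remain PSD. A sketch on random data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))

K1 = X @ X.T                  # Gram matrix of the linear kernel
K2 = (1.0 + X @ X.T) ** 2     # Gram matrix of a degree-2 polynomial kernel

combos = {"sum": K1 + K2, "scaled": 2.0 * K1, "product": K1 * K2}
for name, Kc in combos.items():
    # Minimum eigenvalue should be nonnegative, up to rounding error
    print(name, np.linalg.eigvalsh(Kc).min())
```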
![Page 51: Review: Support vector machines - Computer Sciencelazebnik/fall09/lec06_svm.pdf · 2009-09-14 · Review: Support vector machines Margin optimization min (w;w 0) 1 2 kwk2 subject](https://reader034.vdocument.in/reader034/viewer/2022042317/5f06a6087e708231d4190a9a/html5/thumbnails/51.jpg)
Multi-class SVMs
Various “direct” formulations exist, but they are not widelyused in practice. It is more common to obtain multi-classclassifiers by combining two-class SVMs in various ways.
One vs. others:
Training: learn an SVM for each class vs. the others.
Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value.

One vs. one:
Training: learn an SVM for each pair of classes.
Testing: each learned SVM “votes” for a class to assign to the test example.
Other strategies: error-correcting output codes, decision trees/DAGs, etc.
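A minimal sketch of the two testing rules (plain Python; the decision values and pairwise votes below are made-up inputs standing in for the outputs of trained SVMs):

```python
from collections import Counter

def one_vs_others_predict(decision_values):
    # decision_values: {class label: decision value of that class's SVM}.
    # Assign the class whose SVM returns the highest decision value.
    return max(decision_values, key=decision_values.get)

def one_vs_one_predict(votes):
    # votes: one class label per pairwise SVM; the majority vote wins.
    return Counter(votes).most_common(1)[0][0]

print(one_vs_others_predict({"cat": 0.3, "dog": 1.2, "bird": -0.5}))  # dog
print(one_vs_one_predict(["cat", "dog", "dog"]))                      # dog
```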
Support Vector Regression (Source: G. Shakhnarovich)
The model: f(x) = w0 + w^T x

Instead of a margin around the predicted decision boundary, we have an ε-tube around the predicted function.
[Figure: the ε-tube around the regression function — curves y(x), y(x) + ε, and y(x) − ε plotted against x — and the ε-insensitive loss, which is zero on [−ε, ε] and increases linearly outside that interval.]
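The ε-insensitive loss is zero inside the tube and grows linearly outside it; a one-line sketch:

```python
def eps_insensitive_loss(residual, eps):
    # Zero for |residual| <= eps (inside the tube), linear outside.
    return max(0.0, abs(residual) - eps)

print(eps_insensitive_loss(0.05, 0.1))  # 0.0 (inside the tube)
print(eps_insensitive_loss(0.30, 0.1))  # about 0.2 (0.30 - 0.1)
```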
Support Vector Regression
Optimization: introduce constraints and slack variables for going above or below the tube.
min_{w0, w}  (1/2) ‖w‖^2 + C Σ_{i=1}^n (ξi + ξ̂i)

subject to  (w0 + w^T xi) − yi ≤ ε + ξi,
            yi − (w0 + w^T xi) ≤ ε + ξ̂i,
            ξi, ξ̂i ≥ 0,  i = 1, . . . , n.
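To make the objective concrete: for any fixed candidate (w0, w), the smallest feasible slacks satisfy ξi + ξ̂i = max(0, |f(xi) − yi| − ε), so the primal objective can be evaluated directly. An illustrative plain-Python sketch (an evaluator, not a solver):

```python
def svr_objective(w0, w, X, y, eps, C):
    # (1/2)||w||^2 + C * sum_i (xi_i + xihat_i), using the smallest
    # slacks that make the constraints feasible for this (w0, w).
    norm_sq = sum(wj * wj for wj in w)
    total_slack = 0.0
    for x_i, y_i in zip(X, y):
        f = w0 + sum(wj * xj for wj, xj in zip(w, x_i))
        total_slack += max(0.0, abs(f - y_i) - eps)  # xi_i + xihat_i
    return 0.5 * norm_sq + C * total_slack

X, y = [[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0]
# Perfect fit: all residuals inside the tube, so only the norm term remains.
print(svr_objective(0.0, [1.0], X, y, eps=0.1, C=1.0))  # 0.5
```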
Support Vector Regression: Dual problem
max_α  Σ_{i=1}^n (α̂i − αi) yi − ε Σ_{i=1}^n (α̂i + αi) − (1/2) Σ_{i,j=1}^n (α̂i − αi)(α̂j − αj) xi^T xj,

subject to  0 ≤ αi, α̂i ≤ C,  Σ_{i=1}^n (α̂i − αi) = 0,  i = 1, . . . , n.
Note that at the solution we must have ξi ξ̂i = 0 and αi α̂i = 0.

We can let βi = α̂i − αi and simplify:
max_β  Σ_{i=1}^n yi βi − ε Σ_{i=1}^n |βi| − (1/2) Σ_{i,j=1}^n βi βj xi^T xj,

subject to  −C ≤ βi ≤ C,  Σ_{i=1}^n βi = 0,  i = 1, . . . , n.
Then f(x) = w0* + Σ_{i=1}^n βi* xi^T x, where w0* is chosen so that f(xi) − yi = −ε for any i with 0 < βi* < C.
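Given dual solutions βi*, prediction is just a β-weighted sum of inner products with the training points. A plain-Python sketch with a linear kernel (the β values below are made-up numbers standing in for a solved QP):

```python
def svr_predict(x, X_train, beta, w0):
    # f(x) = w0 + sum_i beta_i * (x_i^T x), linear kernel.
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    return w0 + sum(b_i * dot(x_i, x) for b_i, x_i in zip(beta, X_train))

X_train = [[1.0], [2.0]]
beta = [1.0, -1.0]   # feasible: each in [-C, C] and sum(beta) = 0
print(svr_predict([3.0], X_train, beta, w0=0.5))  # 0.5 + 3 - 6 = -2.5
```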
SVM: summary (Source: G. Shakhnarovich)
Main ideas:
Large margin classification
The kernel trick

What does the complexity/generalization ability of SVMs depend on?

The constant C
Choice of kernel and kernel parameters
Number of support vectors
A crucial component: good QP solver.
Tons of off-the-shelf packages.
Advantages and disadvantages of SVMs
Advantages:
One of the most successful ML techniques!
Good generalization ability for small training sets
Kernel trick is powerful and flexible
Margin-based formalism can be extended to a large class of problems (regression, structured prediction, etc.)
Disadvantages:
Computational and storage complexity of training: quadratic in the size of the training set

In the worst case, can degenerate to nearest neighbor (every training point a support vector), but is much slower to train
No standard direct multi-class formulation (two-class SVMs must be combined)