Topics in Algorithms 2007
Ramesh Hariharan
Support Vector Machines
Machine Learning
How do we learn good separators for two classes of points?
The separator could be linear or non-linear.
Maximize margin of separation
Support Vector Machines
Hyperplane (take |w| = 1): for every point x on the hyperplane,
w.x = |w||x| cos(ø) = |x| cos(ø) = constant = -b
so the hyperplane is the set of points x with w.x + b = 0.
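The claim above is easy to check numerically; a minimal sketch with made-up values (w, b, and the two points are hypothetical, chosen to lie on one hyperplane):

```python
import numpy as np

# For unit w, every x on the hyperplane w.x + b = 0 has the same
# projection onto w, namely w.x = -b.
w = np.array([0.6, 0.8])            # |w| = 1
b = -2.0
# Two points on the hyperplane w.x + b = 0 (i.e. 0.6 x1 + 0.8 x2 = 2):
p1 = np.array([0.0, 2.5])
p2 = np.array([2.0, 1.0])
print(w @ p1, w @ p2)               # both equal -b = 2.0
```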
Support Vector Machines
Margin of separation (|w| = 1):
x Є Blue: w.x + b >= Δ
x Є Red:  w.x + b <= -Δ

maximize 2Δ over w, b, Δ

The separating hyperplane is w.x + b = 0; the margin boundaries are
w.x + b = Δ and w.x + b = -Δ.
Support Vector Machines
Eliminate Δ by dividing through by Δ:
x Є Blue: (w/Δ).x + (b/Δ) >= 1
x Є Red:  (w/Δ).x + (b/Δ) <= -1

Let w' = w/Δ and b' = b/Δ; then |w'| = |w|/Δ = 1/Δ,
so maximizing 2Δ is the same as minimizing |w'|.
Support Vector Machines
Perfect Separation Formulation

x Є Blue: w'.x + b' >= 1
x Є Red:  w'.x + b' <= -1

minimize |w'|/2 over w', b'
or equivalently
minimize (w'.w')/2 over w', b'
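A minimal numpy sketch of what the hard-margin constraints say, on hypothetical toy data with a hypothetical (not necessarily optimal) candidate w, b:

```python
import numpy as np

# Check the hard-margin constraints w.x + b >= 1 (Blue) and
# w.x + b <= -1 (Red) for a candidate separator, and evaluate (w.w)/2.
blue = np.array([[2.0, 2.0], [3.0, 3.0]])
red  = np.array([[0.0, 0.0], [-1.0, 0.0]])

w = np.array([1.0, 1.0])   # candidate normal (assumed, for illustration)
b = -2.0                   # candidate offset

blue_ok = np.all(blue @ w + b >= 1)
red_ok  = np.all(red @ w + b <= -1)
objective = 0.5 * (w @ w)  # the quantity being minimized
print(blue_ok, red_ok, objective)  # True True 1.0
```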
Support Vector Machines
Formulation allowing for misclassification
Hard constraints:
x Є Blue: w.x + b >= 1
x Є Red:  -(w.x + b) >= 1
minimize (w.w)/2 over w, b

Relaxed with slack variables ξi >= 0:
xi Є Blue: w.xi + b >= 1 - ξi
xi Є Red:  -(w.xi + b) >= 1 - ξi
minimize (w.w)/2 + C Σ ξi over w, b, ξi
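The relaxed objective can be evaluated directly: for fixed w, b the smallest slack satisfying each constraint is ξi = max(0, 1 - yi(w.xi + b)). A sketch on hypothetical data (X, y, w, b, C all made up):

```python
import numpy as np

# Soft-margin objective (w.w)/2 + C * sum(xi_i), with the slacks
# taken as small as the constraints allow.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.5, 1.5]])
y = np.array([+1, +1, -1, -1])          # +1 = Blue, -1 = Red

w = np.array([1.0, 1.0]); b = -2.0; C = 1.0
margins = y * (X @ w + b)               # y_i (w.x_i + b)
xi = np.maximum(0.0, 1.0 - margins)     # smallest feasible slack per point
obj = 0.5 * (w @ w) + C * xi.sum()
print(xi, obj)                          # only the last point needs slack
```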
Support Vector Machines
Duality
Primal:
yi (w.xi + b) + ξi >= 1,  ξi >= 0,  yi = ±1 (class label)
minimize (w.w)/2 + C Σ ξi over w, b, ξi

Dual:
maximize Σ λi - ( Σi Σj λi λj yi yj (xi.xj) )/2 over λi
subject to Σ λi yi = 0,  λi >= 0,  -λi >= -C  (i.e. 0 <= λi <= C)
Support Vector Machines
Duality: Primal = Lagrangian Primal

If the primal is feasible then Primal = Lagrangian Primal.

Primal:
yi (w.xi + b) + ξi >= 1,  ξi >= 0,  yi = ±1 (class label)
min (w.w)/2 + C Σ ξi over w, b, ξi

Lagrangian Primal:
min over w, b, ξi of  max over λi, αi >= 0 of
(w.w)/2 + C Σ ξi - Σi λi (yi (w.xi + b) + ξi - 1) - Σi αi (ξi - 0)
Support Vector Machines
Lagrangian Primal Lagrangian Dual
Langrangian Primal >= Lagrangian Dual
>=
min maxw,b,ξi λi, αi >=0
(w.w)/2+ C Σξi
-Σiλi(yi (wxi+b)+ξi-1) -Σiαi(ξi -0)
Lagrangian Primal
max minλi, αi >=0 w,b,ξi
(w.w)/2+ C Σξi
-Σiλi(yi (wxi+b)+ξi-1)-Σiαi(ξi -0)
Lagrangian Dual
Support Vector Machines

Proof that Lagrangian Primal >= Lagrangian Dual

Consider a 2d matrix (rows indexed by the min variables, columns by the max variables).
LP: find the max in each row, then take the smallest of these row maxima.
LD: find the min in each column, then take the largest of these column minima.
Every row maximum is >= every column minimum, so LP >= LD.
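The matrix picture above is easy to verify numerically; a small sketch on a random matrix (the matrix itself is arbitrary, just for illustration):

```python
import numpy as np

# Minimax inequality on a matrix: the smallest row maximum (LP)
# is always >= the largest column minimum (LD).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 7))

LP = A.max(axis=1).min()   # max in each row, then the smallest of these
LD = A.min(axis=0).max()   # min in each column, then the largest of these
print(LP >= LD)            # True
```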
Support Vector Machines

Can Lagrangian Primal = Lagrangian Dual?

Proof: consider w*, b*, ξi* optimal for the primal.
Find λi, αi >= 0 such that:
- minimizing over w, b, ξi gives w*, b*, ξi*
- Σi λi (yi (w*.xi + b*) + ξi* - 1) = 0
- Σi αi (ξi* - 0) = 0

Then the Lagrangian Dual
max over λi, αi >= 0 of  min over w, b, ξi of
(w.w)/2 + C Σ ξi - Σi λi (yi (w.xi + b) + ξi - 1) - Σi αi (ξi - 0)
attains the primal value at these λi, αi.
Support Vector Machines

Can Lagrangian Primal = Lagrangian Dual?

Consider w*, b*, ξi* optimal for the primal.
Find λi, αi >= 0 such that
Σi λi (yi (w*.xi + b*) + ξi* - 1) = 0
Σi αi (ξi* - 0) = 0
Since every term in these sums is non-negative, equivalently:
ξi* > 0 implies αi = 0
yi (w*.xi + b*) + ξi* - 1 != 0 implies λi = 0
Support Vector Machines

Can Lagrangian Primal = Lagrangian Dual?

Consider w*, b*, ξi* optimal for the primal.
Find λi, αi >= 0 such that minimizing the Lagrangian over w, b, ξi gives w*, b*, ξi*:
at w*, b*, ξi* we need
∂/∂wj = 0,  ∂/∂ξi = 0,  ∂/∂b = 0,
and the second derivatives should be non-negative everywhere.
Support Vector Machines

Can Lagrangian Primal = Lagrangian Dual?

Consider w*, b*, ξi* optimal for the primal.
Find λi, αi >= 0 such that minimizing over w, b, ξi gives w*, b*, ξi*.
Setting the first derivatives of the Lagrangian to zero:
∂/∂w:  w* - Σi λi yi xi = 0
∂/∂b:  -Σi λi yi = 0
∂/∂ξi: -λi - αi + C = 0
The second derivatives are always non-negative.
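The stationarity conditions translate directly into computations; a sketch with hypothetical data and assumed dual variables (X, y, lam, C are all made up, chosen so the feasibility conditions hold):

```python
import numpy as np

# From the derivative conditions: w = sum_i lambda_i y_i x_i,
# sum_i lambda_i y_i = 0, and alpha_i = C - lambda_i must be >= 0.
X = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 0.0], [1.0, -1.0]])
y = np.array([+1, +1, -1, -1])
lam = np.array([0.5, 0.5, 0.5, 0.5])      # assumed dual variables
C = 1.0

assert abs(lam @ y) < 1e-12               # sum lambda_i y_i = 0
alpha = C - lam                           # from -lambda_i - alpha_i + C = 0
assert np.all(alpha >= 0)
w = (lam * y) @ X                         # w = sum lambda_i y_i x_i
print(w)                                  # [1. 1.]
```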
Support Vector Machines

Can Lagrangian Primal = Lagrangian Dual?

Consider w*, b*, ξi* optimal for the primal.
We need λi, αi >= 0 such that:
ξi* > 0 implies αi = 0
yi (w*.xi + b*) + ξi* - 1 != 0 implies λi = 0
w* - Σi λi yi xi = 0
-Σi λi yi = 0
-λi - αi + C = 0
Such λi, αi >= 0 always exist!
Support Vector Machines

Proof that appropriate Lagrange multipliers always exist

Roll all primal variables into w and all Lagrange multipliers into λ:

min f(w) subject to Xw >= y

Lagrangian Primal: min over w of  max over λ >= 0 of  f(w) - λ.(Xw - y)
Lagrangian Dual:   max over λ >= 0 of  min over w of  f(w) - λ.(Xw - y)
Support Vector Machines

Proof that appropriate Lagrange multipliers always exist

At the optimum w* we have Xw* >= y; call a row of X tight if the
corresponding inequality holds with equality. We claim there is a λ >= 0,
zero outside the tight rows, with

X^T λ = Grad(f) at w*

i.e. Grad(f) at w* lies in the cone of non-negative combinations of the
tight row vectors of X.

Claim: this is satisfiable. If not, there is a direction h with Xh >= 0 on
the tight rows and Grad(f).h < 0 (a direction separating Grad(f) from the
cone). Then w* + h is feasible and f(w* + h) < f(w*) for small enough h,
contradicting the optimality of w*.
Support Vector Machines

Finally, the Lagrange Dual

Start from
max over λi, αi >= 0 of  min over w, b, ξi of
(w.w)/2 + C Σ ξi - Σi λi (yi (w.xi + b) + ξi - 1) - Σi αi (ξi - 0)

and substitute the stationarity conditions:
w - Σi λi yi xi = 0
-Σi λi yi = 0
-λi - αi + C = 0

Rewritten in final dual form:
maximize Σ λi - ( Σi Σj λi λj yi yj (xi.xj) )/2 over λi
subject to Σ λi yi = 0,  λi >= 0,  -λi >= -C
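The final dual depends on the data only through the dot products xi.xj. A minimal sketch that evaluates it on a hypothetical two-point problem (data, C, and the feasible λ are all made up; for this symmetric pair λ = (1/4, 1/4) happens to satisfy the constraints):

```python
import numpy as np

# Dual objective: sum_i lambda_i - (1/2) sum_ij lambda_i lambda_j y_i y_j (x_i.x_j)
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([+1, -1])
C = 10.0

K = X @ X.T                               # Gram matrix of dot products
def dual_obj(lam):
    return lam.sum() - 0.5 * lam @ (np.outer(y, y) * K) @ lam

lam = np.array([0.25, 0.25])              # feasible: lam@y == 0, 0 <= lam <= C
w = (lam * y) @ X                         # recover w = sum lambda_i y_i x_i
print(dual_obj(lam), w)                   # 0.25 [0.5 0.5]
```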
Support Vector Machines

Karush-Kuhn-Tucker conditions

Final dual form:
maximize Σ λi - ( Σi Σj λi λj yi yj (xi.xj) )/2 over λi
subject to Σ λi yi = 0,  λi >= 0,  -λi >= -C

Complementary slackness and stationarity:
Σi λi (yi (w*.xi + b*) + ξi* - 1) = 0
Σi αi (ξi* - 0) = 0
-λi - αi + C = 0

Consequences:
If ξi* > 0, then αi = 0 and λi = C.
If yi (w*.xi + b*) + ξi* - 1 > 0, then λi = 0 and ξi* = 0.
If 0 < λi < C, then yi (w*.xi + b*) = 1.
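The three KKT cases can be illustrated with hypothetical optimal values (lam, xi, and the margins below are all assumed, one per case, not computed from real data):

```python
import numpy as np

# One point per KKT case: margin violator, interior point, support vector.
C = 1.0
lam  = np.array([1.0, 0.0, 0.4])    # assumed dual variables
xi   = np.array([0.5, 0.0, 0.0])    # assumed slacks
marg = np.array([0.5, 3.0, 1.0])    # assumed y_i (w*.x_i + b*)

alpha = C - lam                     # from -lambda_i - alpha_i + C = 0
# Case 1: xi* > 0, so alpha_i = 0 and lambda_i = C (margin violator).
assert alpha[0] == 0.0 and lam[0] == C
# Case 2: slack margin constraint, so lambda_i = 0 and xi* = 0.
assert lam[1] == 0.0 and xi[1] == 0.0
# Case 3: 0 < lambda_i < C, so the point sits exactly on the margin.
assert 0 < lam[2] < C and marg[2] == 1.0
print("KKT cases consistent")
```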