Proximal Support Vector Machine Classifiers
KDD 2001, San Francisco, August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Data Mining Institute
University of Wisconsin - Madison
Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: points of the two classes A+ and A− separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$; the margin between the planes, measured along the normal $w$, is $2/\|w\|_2$.]
Proximal Support Vector Machines: Fitting the Data Using Two Parallel Bounding Planes

[Figure: the same two planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, now acting as proximal planes around which the points of A+ and A− cluster, pushed as far apart as possible ($2/\|w\|_2$).]
Standard Support Vector Machine Formulation

The margin is maximized by minimizing $\frac{1}{2}\|(w,\gamma)\|_2^2$. Solve the quadratic program for some $\nu > 0$:

$$\min_{w,\gamma,y} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \;\; y \ge 0, \qquad \text{(QP)}$$

where the diagonal matrix $D$ with $D_{ii} = \pm 1$ denotes A+ or A− membership.
PSVM Formulation

We have from the QP SVM formulation, with the inequality constraint replaced by an equality:

$$\min_{w,\gamma,y} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e.$$

This simple but critical modification changes the nature of the optimization problem tremendously! Solving the equality constraint for $y = e - D(Aw - e\gamma)$ in terms of $w$ and $\gamma$, and substituting into the objective, gives the unconstrained problem:

$$\min_{w,\gamma} \; \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2.$$
Advantages of New Formulation
Objective function remains strongly convex
An explicit exact solution can be written in terms of the problem data
PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space
Exact leave-one-out correctness can be obtained in terms of the problem data
Linear PSVM

We want to solve:

$$\min_{w,\gamma} \; \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2.$$

Setting the gradient equal to zero gives a nonsingular system of linear equations, whose solution is the desired PSVM classifier.
Linear PSVM Solution

Here, $H = [A \;\; -e]$, and setting the gradient of the objective to zero yields:

$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De.$$

The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$; $n$ is usually much smaller than $m$.
Linear Proximal SVM Algorithm

Input $A, D$.
Define $H = [A \;\; -e]$.
Calculate $v = H'De$.
Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$.
Classifier: $\mathrm{sign}(x'w - \gamma)$.
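The five steps above can be sketched in a few lines of NumPy (a minimal illustration, not the authors' code; the function name `linear_psvm` and the toy data are my own for the example):

```python
import numpy as np

def linear_psvm(A, d, nu=1.0):
    """Linear Proximal SVM.

    A  : m x n matrix of data points, one per row
    d  : length-m vector of +1/-1 labels (the diagonal of D)
    nu : positive regularization parameter
    Returns (w, gamma) for the classifier sign(x'w - gamma).
    """
    m, n = A.shape
    e = np.ones(m)
    H = np.hstack([A, -e[:, None]])        # H = [A  -e],  m x (n+1)
    v = H.T @ (d * e)                      # v = H'De
    # Solve the (n+1) x (n+1) system (I/nu + H'H) [w; gamma] = v
    z = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, v)
    return z[:n], z[n]

# Toy example: two well-separated point clouds.
A = np.array([[2., 2.], [3., 2.], [2., 3.], [3., 3.],
              [-2., -2.], [-3., -2.], [-2., -3.], [-3., -3.]])
d = np.array([1., 1., 1., 1., -1., -1., -1., -1.])
w, gamma = linear_psvm(A, d, nu=1.0)
pred = np.sign(A @ w - gamma)
```

Note that the only work beyond forming $H'H$ is a single linear solve in dimension $n+1$, independent of the number of points $m$.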
Nonlinear PSVM Formulation

By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:

$$\min_{u,\gamma} \; \frac{\nu}{2}\|e - D(AA'Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2.$$

Replacing $AA'$ by a nonlinear kernel $K(A,A')$:

$$\min_{u,\gamma} \; \frac{\nu}{2}\|e - D(K(A,A')Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2.$$
For comparison, the linear PSVM (linear separating surface $x'w = \gamma$):

$$\min_{w,\gamma,y} \; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y = e. \qquad \text{(QP)}$$
The Nonlinear Classifier

The nonlinear classifier is:

$$K(x',A')Du = \gamma,$$

where $K$ is a nonlinear kernel, $K(A,A') : R^{m \times n} \times R^{n \times m} \longmapsto R^{m \times m}$, e.g. the Gaussian (radial basis) kernel:

$$K(A,A')_{ij} = \exp\left(-\mu\|A_i - A_j\|_2^2\right), \quad i,j = 1,\dots,m.$$

The $ij$-entry of $K(A,A')$ represents the "similarity" of the data points $A_i$ and $A_j$.
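The Gaussian kernel matrix above can be computed for all pairs at once in NumPy (a sketch with my own function name; `mu` is the kernel-width parameter $\mu$):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """Gaussian kernel matrix K with K[i, j] = exp(-mu * ||A_i - B_j||^2).

    A: m x n, B: k x n  ->  m x k matrix of pairwise "similarities".
    """
    # ||a - b||^2 = ||a||^2 - 2 a'b + ||b||^2, evaluated for all pairs at once
    sq = (A**2).sum(1)[:, None] - 2.0 * A @ B.T + (B**2).sum(1)[None, :]
    return np.exp(-mu * np.maximum(sq, 0.0))  # clamp tiny negative round-off

A = np.array([[0., 0.], [1., 0.]])
K = gaussian_kernel(A, A, mu=0.5)   # K[0, 1] = exp(-0.5 * 1)
```

Identical points get similarity 1, and similarity decays to 0 as points move apart, which is the sense in which the $ij$-entry measures closeness of $A_i$ and $A_j$.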
Nonlinear PSVM

Defining $H$ slightly differently, $H = [K(A,A') \;\; -e]$, and setting the gradient equal to zero as in the linear case, we obtain:

$$\begin{bmatrix} u \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De.$$

Here the linear system to solve is of size $(m+1) \times (m+1)$. However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.
Linear vs. Nonlinear Proximal SVM Algorithm

Linear:
Input $A, D$.
Define $H = [A \;\; -e]$.
Calculate $v = H'De$.
Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$.
Classifier: $\mathrm{sign}(x'w - \gamma)$.

Nonlinear (same steps, with $K = K(A,A')$ in place of $A$ and $u$ in place of $w$):
Input $A, D$.
Define $K = K(A,A')$ and $H = [K \;\; -e]$.
Calculate $v = H'De$.
Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} u \\ \gamma \end{bmatrix} = v$ (this $u$ corresponds to $Du$ in the dual representation $w = A'Du$).
Classifier: $\mathrm{sign}(K(x',A')u - \gamma)$.
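The nonlinear column can be sketched analogously (again a minimal NumPy illustration with my own names; the XOR-style points are a toy example of data no linear classifier can separate):

```python
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    # K[i, j] = exp(-mu * ||A_i - B_j||^2)
    sq = (A**2).sum(1)[:, None] - 2.0 * A @ B.T + (B**2).sum(1)[None, :]
    return np.exp(-mu * np.maximum(sq, 0.0))

def nonlinear_psvm(A, d, nu=100.0, mu=1.0):
    """Nonlinear PSVM: returns (u, gamma) for the classifier
    sign(K(x', A')u - gamma)."""
    m = A.shape[0]
    e = np.ones(m)
    K = gaussian_kernel(A, A, mu)          # K = K(A, A'),  m x m
    H = np.hstack([K, -e[:, None]])        # H = [K  -e],  m x (m+1)
    v = H.T @ (d * e)                      # v = H'De
    # Solve the (m+1) x (m+1) system (I/nu + H'H) [u; gamma] = v
    z = np.linalg.solve(np.eye(m + 1) / nu + H.T @ H, v)
    return z[:m], z[m]

# XOR-style data: not linearly separable.
A = np.array([[1., 1.], [-1., -1.], [1., -1.], [-1., 1.]])
d = np.array([1., 1., -1., -1.])
u, gamma = nonlinear_psvm(A, d, nu=100.0, mu=1.0)
pred = np.sign(gaussian_kernel(A, A, mu=1.0) @ u - gamma)
```

The structure is identical to the linear version; only the matrix passed into the solve changes, which is why the system grows to size $m+1$ and why reduced kernel (RSVM) techniques become attractive for large $m$.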