Least Squares Support Vector Machine Classifier
DESCRIPTION
This presentation discusses the Support Vector Machine classifier, with a main focus on the Least Squares Support Vector Machine classifier.
TRANSCRIPT
Least Squares Support Vector Machines
Rajkumar Singh
November 25, 2012
Table of Contents
Support Vector Machines
Least Squares Support Vector Machine Classifier
Conclusion
Support Vector Machines
- SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.
- SVMs were introduced by Boser, Guyon, and Vapnik at COLT-92.
- Initially popularized in the NIPS community, SVMs are now an important and active field of machine learning research.
What is SVM?
- SVMs are learning systems that:
  - use a hypothesis space of linear functions
  - in a high-dimensional feature space (kernel functions)
  - are trained with a learning algorithm from optimization theory (Lagrange)
  - implement a learning bias derived from statistical learning theory (generalization)
Support Vector Machines for Classification
Given a training set of N data points {y_k, x_k}_{k=1}^N, the support vector method approach aims at constructing a classifier of the form

y(x) = sign[ ∑_{k=1}^N α_k y_k ψ(x, x_k) + b ]   (1)

where x_k ∈ R^n is the k-th input pattern, y_k ∈ R is the k-th output, the α_k are positive real constants, and b is a real constant. Typical kernel choices are

ψ(x, x_k) = x_k^T x                      (linear SVM)
ψ(x, x_k) = (x_k^T x + 1)^d              (polynomial SVM of degree d)
ψ(x, x_k) = exp(−‖x − x_k‖_2^2 / σ^2)    (RBF SVM)
ψ(x, x_k) = tanh(κ x_k^T x + θ)          (two-layer neural SVM)

where σ, θ, κ are constants.
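As a concrete illustration (not part of the original slides), the four kernel choices can be written directly in NumPy; the parameter defaults below (d = 2, σ = 1, κ = 1, θ = 0) are arbitrary placeholders:

```python
import numpy as np

# The kernel choices psi(x, x_k) listed above, on NumPy vectors.
# Defaults d=2, sigma=1.0, kappa=1.0, theta=0.0 are placeholders.

def linear(x, xk):
    return xk @ x                                  # x_k^T x

def polynomial(x, xk, d=2):
    return (xk @ x + 1.0) ** d                     # (x_k^T x + 1)^d

def rbf(x, xk, sigma=1.0):
    return np.exp(-np.sum((x - xk) ** 2) / sigma ** 2)

def neural(x, xk, kappa=1.0, theta=0.0):
    return np.tanh(kappa * (xk @ x) + theta)       # two-layer neural SVM

x, xk = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear(x, xk), polynomial(x, xk))            # -1.5 0.25
```

Any of these can be substituted for ψ(x, x_k) in classifier (1) without changing the rest of the derivation.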
SVMs for Classification
The classifier is constructed as follows. One assumes that

ω^T φ(x_k) + b ≥ +1, if y_k = +1
ω^T φ(x_k) + b ≤ −1, if y_k = −1   (2)

which is equivalent to

y_k [ω^T φ(x_k) + b] ≥ 1, k = 1, ..., N   (3)

where φ(·) is a non-linear function which maps the input space into a higher-dimensional space. In order to allow violations of (3), in case a separating hyperplane in this high-dimensional space does not exist, slack variables ξ_k are introduced such that

y_k [ω^T φ(x_k) + b] ≥ 1 − ξ_k, k = 1, ..., N
ξ_k ≥ 0, k = 1, ..., N   (4)
SVMs for Classification
According to the structural risk minimization principle, the risk bound is minimized by formulating the optimization problem

min_{ω,ξ_k} J_1(ω, ξ_k) = (1/2) ω^T ω + c ∑_{k=1}^N ξ_k   (5)

subject to (4). Therefore, one constructs the Lagrangian

L_1(ω, b, ξ_k; α_k, v_k) = J_1(ω, ξ_k) − ∑_{k=1}^N α_k {y_k [ω^T φ(x_k) + b] − 1 + ξ_k} − ∑_{k=1}^N v_k ξ_k   (6)

by introducing Lagrange multipliers α_k ≥ 0, v_k ≥ 0 (k = 1, ..., N). The solution is given by the saddle point of the Lagrangian, computed from

max_{α_k, v_k} min_{ω, b, ξ_k} L_1(ω, b, ξ_k; α_k, v_k).   (7)
SVMs for Classification
From (7) one obtains

∂L_1/∂ω = 0 → ω = ∑_{k=1}^N α_k y_k φ(x_k)
∂L_1/∂b = 0 → ∑_{k=1}^N α_k y_k = 0
∂L_1/∂ξ_k = 0 → 0 ≤ α_k ≤ c, k = 1, ..., N   (8)
which leads to the solution of the following quadratic programming problem:

max_{α_k} Q_1(α_k; φ(x_k)) = −(1/2) ∑_{k,l=1}^N y_k y_l φ(x_k)^T φ(x_l) α_k α_l + ∑_{k=1}^N α_k   (9)

such that ∑_{k=1}^N α_k y_k = 0 and 0 ≤ α_k ≤ c, k = 1, ..., N. The function φ(x_k) in (9) is related to ψ(x, x_k) by imposing

φ(x)^T φ(x_k) = ψ(x, x_k).   (10)
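Relation (10) can be checked numerically for a case where φ is known explicitly: for the polynomial kernel of degree d = 2 on R², one standard feature map is φ(x) = [1, √2·x₁, √2·x₂, x₁², x₂², √2·x₁x₂]. A small sketch (illustrative, not from the slides):

```python
import numpy as np

def phi(x):
    # explicit feature map for psi(x, z) = (x^T z + 1)^2 on R^2
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def psi(x, z):
    return (x @ z + 1.0) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# phi(x)^T phi(z) equals psi(x, z), as imposed by (10)
print(phi(x) @ phi(z), psi(x, z))   # both 4.0
```

The point of (10) is that in (9) one never needs φ itself: only the inner products ψ(x_k, x_l) enter the problem.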
Note that for the two-layer neural SVM, Mercer's condition only holds for certain parameter values of κ and θ. The classifier (3) is designed by solving

max_{α_k} Q_1(α_k; ψ(x_k, x_l)) = −(1/2) ∑_{k,l=1}^N y_k y_l ψ(x_k, x_l) α_k α_l + ∑_{k=1}^N α_k   (11)

subject to the constraints in (9). One does not have to calculate ω nor φ(x_k) in order to determine the decision surface, and the solution to (11) is global. Further, it can be shown that hyperplanes (3) satisfying the constraint ‖ω‖_2 ≤ a have a VC-dimension h which is bounded by

h ≤ min([r^2 a^2], n) + 1   (12)

where [·] denotes the integer part and r is the radius of the smallest ball containing the points φ(x_1), ..., φ(x_N). Such a ball is found by defining the Lagrangian

L_2(r, q, λ_k) = r^2 − ∑_{k=1}^N λ_k (r^2 − ‖φ(x_k) − q‖_2^2)   (13)
SVMs for Classification
In (13), q is the center of the ball and the λ_k are positive Lagrange multipliers. Here q = ∑_k λ_k φ(x_k), where the λ_k follow from

max_{λ_k} Q_2(λ_k; φ(x_k)) = −∑_{k,l=1}^N φ(x_k)^T φ(x_l) λ_k λ_l + ∑_{k=1}^N λ_k φ(x_k)^T φ(x_k)   (14)

such that ∑_{k=1}^N λ_k = 1 and λ_k ≥ 0, k = 1, ..., N. Based on (10), Q_2 can also be expressed in terms of ψ(x_k, x_l). Finally, one selects a support vector machine with a given VC dimension by solving (11) and computing the bound (12).
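The dual problem (11) can be handed to any QP routine. The sketch below uses scipy.optimize.minimize (SLSQP) with an RBF kernel on a hypothetical two-blob toy dataset; the values c = 10 and σ = 1 are arbitrary illustration choices, not recommendations:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# hypothetical toy data: two well-separated Gaussian blobs, labels -1/+1
X = np.vstack([rng.normal(-2, 0.5, (10, 2)), rng.normal(2, 0.5, (10, 2))])
y = np.array([-1.0] * 10 + [1.0] * 10)
N, c, sigma = len(y), 10.0, 1.0

# kernel matrix K_kl = psi(x_k, x_l) with the RBF kernel
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / sigma ** 2)
H = (y[:, None] * y[None, :]) * K

# maximizing Q1 in (11) == minimizing (1/2) a^T H a - sum(a)
obj = lambda a: 0.5 * a @ H @ a - a.sum()
cons = {"type": "eq", "fun": lambda a: a @ y}      # sum_k alpha_k y_k = 0
res = minimize(obj, np.zeros(N), method="SLSQP",
               bounds=[(0.0, c)] * N, constraints=cons,
               options={"maxiter": 500})
alpha = res.x

# recover b from an unbounded support vector (0 < alpha_k < c)
k = int(np.argmax((alpha > 1e-6) & (alpha < c - 1e-6)))
b = y[k] - (alpha * y) @ K[:, k]

def decision(x):                                   # classifier (1)
    kx = np.exp(-np.sum((X - x) ** 2, axis=1) / sigma ** 2)
    return np.sign((alpha * y) @ kx + b)
```

On such separable data most α_k come out at or near zero: this sparseness of the QP solution is what the least squares formulation below trades away.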
Least Squares Support Vector Machines
The least squares version of the SVM classifier is obtained by formulating the classification problem as

min_{ω,b,e} J_3(ω, b, e) = (1/2) ω^T ω + γ (1/2) ∑_{k=1}^N e_k^2   (15)

subject to the equality constraints

y_k [ω^T φ(x_k) + b] = 1 − e_k, k = 1, ..., N.   (16)
The Lagrangian is defined as

L_3(ω, b, e; α) = J_3(ω, b, e) − ∑_{k=1}^N α_k {y_k [ω^T φ(x_k) + b] − 1 + e_k}   (17)

where the α_k are Lagrange multipliers. The conditions for optimality,

∂L_3/∂ω = 0 → ω = ∑_{k=1}^N α_k y_k φ(x_k)
∂L_3/∂b = 0 → ∑_{k=1}^N α_k y_k = 0
∂L_3/∂e_k = 0 → α_k = γ e_k, k = 1, ..., N
∂L_3/∂α_k = 0 → y_k [ω^T φ(x_k) + b] − 1 + e_k = 0, k = 1, ..., N   (18)
can be written as the solution to the following set of linear equations:

[ I   0   0   −Z^T ] [ω]   [0]
[ 0   0   0   −Y^T ] [b] = [0]
[ 0   0   γI  −I   ] [e]   [0]
[ Z   Y   I    0   ] [α]   [1]   (19)

where Z = [φ(x_1)^T y_1; ...; φ(x_N)^T y_N], Y = [y_1; ...; y_N], 1 = [1; ...; 1], e = [e_1; ...; e_N], and α = [α_1; ...; α_N].
Least Squares Support Vector Machines
After elimination of ω and e, the solution is given by

[ 0   −Y^T            ] [b]   [0]
[ Y   ZZ^T + γ^{-1} I ] [α] = [1]   (20)
Mercer's condition can be applied again to the matrix Ω = ZZ^T, where

Ω_kl = y_k y_l φ(x_k)^T φ(x_l) = y_k y_l ψ(x_k, x_l).   (21)
Hence the classifier (1) is found by solving the linear set of equations (20)-(21) instead of a quadratic programming problem. The parameters of the kernel, such as σ for the RBF kernel, can be optimally chosen according to (12). By (18), the support values α_k are proportional to the errors at the data points, while in the standard SVM case (9) most values are equal to zero. Hence one could rather speak of a support value spectrum in the least squares case.
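Because only a linear system is involved, the whole LS-SVM training step fits in a few lines of NumPy. The sketch below builds Ω from (21) with an RBF kernel, solves (20), and evaluates classifier (1); the toy data and the values γ = 10, σ = 1 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical toy data: two Gaussian blobs with labels -1/+1
X = np.vstack([rng.normal(-2, 0.5, (10, 2)), rng.normal(2, 0.5, (10, 2))])
y = np.array([-1.0] * 10 + [1.0] * 10)
N, gamma, sigma = len(y), 10.0, 1.0

# Omega_kl = y_k y_l psi(x_k, x_l) from (21), with the RBF kernel
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Omega = (y[:, None] * y[None, :]) * np.exp(-sq / sigma ** 2)

# linear system (20): [[0, -y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]
A = np.zeros((N + 1, N + 1))
A[0, 1:] = -y
A[1:, 0] = y
A[1:, 1:] = Omega + np.eye(N) / gamma
rhs = np.concatenate([[0.0], np.ones(N)])
sol = np.linalg.solve(A, rhs)
b, alpha = sol[0], sol[1:]

def classify(x):                                   # classifier (1)
    kx = np.exp(-np.sum((X - x) ** 2, axis=1) / sigma ** 2)
    return np.sign((alpha * y) @ kx + b)
```

Unlike the QP solution, here every α_k = γ e_k is generally nonzero, which is the support value spectrum described above.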
Conclusion
- Due to the equality constraints, a set of linear equations has to be solved instead of a quadratic programming problem.
- Mercer's condition is applied as in other SVMs.
- The least squares SVM with an RBF kernel is readily found, with excellent generalization performance and low computational cost.
References
1. J.A.K. Suykens and J. Vandewalle, "Least Squares Support Vector Machine Classifiers".