Least Squares Support Vector Machine Classifier
DESCRIPTION
This presentation discusses the Support Vector Machine classifier, with a main focus on the Least Squares Support Vector Machine classifier.
TRANSCRIPT
Least Squares Support Vector Machines
Rajkumar Singh
November 25, 2012
Table of Contents
Support Vector Machines
Least Squares Support Vector Machine Classifier
Conclusion
Support Vector Machines
- SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.
- SVMs were introduced by Boser, Guyon, and Vapnik at COLT-92.
- Initially popularized in the NIPS community, SVMs are now an important and active field of machine learning research.
What is SVM?
- SVMs are learning systems that:
  - use a hypothesis space of linear functions
  - in a high-dimensional feature space (kernel functions)
  - are trained with a learning algorithm from optimization theory (Lagrange)
  - implement a learning bias derived from statistical learning theory (generalization)
Support Vector Machines for Classification
Given a training set of N data points {y_k, x_k}_{k=1}^N, the support vector method approach aims at constructing a classifier of the form

y(x) = sign[ ∑_{k=1}^N α_k y_k ψ(x, x_k) + b ]   (1)

where x_k ∈ R^n is the k-th input pattern, y_k ∈ R is the k-th output, the α_k are positive real constants, and b is a real constant. Typical kernel choices are

ψ(x, x_k) = x_k^T x                      (linear SVM)
ψ(x, x_k) = (x_k^T x + 1)^d              (polynomial SVM of degree d)
ψ(x, x_k) = exp(−‖x − x_k‖_2^2 / σ^2)    (RBF SVM)
ψ(x, x_k) = tanh(κ x_k^T x + θ)          (two-layer neural SVM)

where σ, θ, κ are constants.
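As a concrete illustration (not part of the original slides), the four kernel choices can be written directly in NumPy; the parameter defaults below (d = 2, σ = 1, κ = 1, θ = 0) are arbitrary placeholders:

```python
import numpy as np

# The kernel choices psi(x, x_k) listed above, on NumPy vectors.
# Defaults d=2, sigma=1.0, kappa=1.0, theta=0.0 are placeholders.

def linear(x, xk):
    return xk @ x                                  # x_k^T x

def polynomial(x, xk, d=2):
    return (xk @ x + 1.0) ** d                     # (x_k^T x + 1)^d

def rbf(x, xk, sigma=1.0):
    return np.exp(-np.sum((x - xk) ** 2) / sigma ** 2)

def neural(x, xk, kappa=1.0, theta=0.0):
    return np.tanh(kappa * (xk @ x) + theta)       # two-layer neural SVM

x, xk = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear(x, xk), polynomial(x, xk))            # -1.5 0.25
```

Any of these can be substituted for ψ(x, x_k) in classifier (1) without changing the rest of the derivation.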
SVMs for Classification
The classifier is constructed as follows. One assumes that

ω^T φ(x_k) + b ≥ +1, if y_k = +1
ω^T φ(x_k) + b ≤ −1, if y_k = −1   (2)

which is equivalent to

y_k [ω^T φ(x_k) + b] ≥ 1, k = 1, ..., N   (3)

where φ(·) is a non-linear function which maps the input space into a higher-dimensional space. In order to allow violations of (3), in case a separating hyperplane in this high-dimensional space does not exist, slack variables ξ_k are introduced such that

y_k [ω^T φ(x_k) + b] ≥ 1 − ξ_k, k = 1, ..., N
ξ_k ≥ 0, k = 1, ..., N   (4)
SVMs for Classification
According to the structural risk minimization principle, the risk bound is minimized by formulating the optimization problem

min_{ω,ξ_k} J_1(ω, ξ_k) = (1/2) ω^T ω + c ∑_{k=1}^N ξ_k   (5)

subject to (4). Therefore, one constructs the Lagrangian

L_1(ω, b, ξ_k; α_k, v_k) = J_1(ω, ξ_k) − ∑_{k=1}^N α_k {y_k [ω^T φ(x_k) + b] − 1 + ξ_k} − ∑_{k=1}^N v_k ξ_k   (6)

by introducing Lagrange multipliers α_k ≥ 0, v_k ≥ 0 (k = 1, ..., N). The solution is given by the saddle point of the Lagrangian, computed from

max_{α_k, v_k} min_{ω, b, ξ_k} L_1(ω, b, ξ_k; α_k, v_k).   (7)
SVMs for Classification
From (7) one obtains

∂L_1/∂ω = 0 → ω = ∑_{k=1}^N α_k y_k φ(x_k)
∂L_1/∂b = 0 → ∑_{k=1}^N α_k y_k = 0
∂L_1/∂ξ_k = 0 → 0 ≤ α_k ≤ c, k = 1, ..., N   (8)
which leads to the solution of the following quadratic programming problem:

max_{α_k} Q_1(α_k; φ(x_k)) = −(1/2) ∑_{k,l=1}^N y_k y_l φ(x_k)^T φ(x_l) α_k α_l + ∑_{k=1}^N α_k   (9)

such that ∑_{k=1}^N α_k y_k = 0 and 0 ≤ α_k ≤ c, k = 1, ..., N. The function φ(x_k) in (9) is related to ψ(x, x_k) by imposing

φ(x)^T φ(x_k) = ψ(x, x_k).   (10)
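Relation (10) can be checked numerically for a case where φ is known explicitly: for the polynomial kernel of degree d = 2 on R², one standard feature map is φ(x) = [1, √2·x₁, √2·x₂, x₁², x₂², √2·x₁x₂]. A small sketch (illustrative, not from the slides):

```python
import numpy as np

def phi(x):
    # explicit feature map for psi(x, z) = (x^T z + 1)^2 on R^2
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def psi(x, z):
    return (x @ z + 1.0) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# phi(x)^T phi(z) equals psi(x, z), as imposed by (10)
print(phi(x) @ phi(z), psi(x, z))   # both 4.0
```

The point of (10) is that in (9) one never needs φ itself: only the inner products ψ(x_k, x_l) enter the problem.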
Note that for the two-layer neural SVM, Mercer's condition only holds for certain parameter values of κ and θ. The classifier (3) is designed by solving

max_{α_k} Q_1(α_k; ψ(x_k, x_l)) = −(1/2) ∑_{k,l=1}^N y_k y_l ψ(x_k, x_l) α_k α_l + ∑_{k=1}^N α_k   (11)

subject to the constraints in (9). One does not have to calculate ω nor φ(x_k) in order to determine the decision surface, and the solution to (11) is global. Further, it can be shown that hyperplanes (3) satisfying the constraint ‖ω‖_2 ≤ a have a VC-dimension h which is bounded by

h ≤ min([r^2 a^2], n) + 1   (12)

where [·] denotes the integer part and r is the radius of the smallest ball containing the points φ(x_1), ..., φ(x_N). Such a ball is found by defining the Lagrangian

L_2(r, q, λ_k) = r^2 − ∑_{k=1}^N λ_k (r^2 − ‖φ(x_k) − q‖_2^2)   (13)
SVMs for Classification
In (13), q is the center of the ball and the λ_k are positive Lagrange multipliers. Here q = ∑_k λ_k φ(x_k), where the λ_k follow from

max_{λ_k} Q_2(λ_k; φ(x_k)) = −∑_{k,l=1}^N φ(x_k)^T φ(x_l) λ_k λ_l + ∑_{k=1}^N λ_k φ(x_k)^T φ(x_k)   (14)

such that ∑_{k=1}^N λ_k = 1 and λ_k ≥ 0, k = 1, ..., N. Based on (10), Q_2 can also be expressed in terms of ψ(x_k, x_l). Finally, one selects a support vector machine with a given VC dimension by solving (11) and computing the bound (12).
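The dual problem (11) can be handed to any QP routine. The sketch below uses scipy.optimize.minimize (SLSQP) with an RBF kernel on a hypothetical two-blob toy dataset; the values c = 10 and σ = 1 are arbitrary illustration choices, not recommendations:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# hypothetical toy data: two well-separated Gaussian blobs, labels -1/+1
X = np.vstack([rng.normal(-2, 0.5, (10, 2)), rng.normal(2, 0.5, (10, 2))])
y = np.array([-1.0] * 10 + [1.0] * 10)
N, c, sigma = len(y), 10.0, 1.0

# kernel matrix K_kl = psi(x_k, x_l) with the RBF kernel
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / sigma ** 2)
H = (y[:, None] * y[None, :]) * K

# maximizing Q1 in (11) == minimizing (1/2) a^T H a - sum(a)
obj = lambda a: 0.5 * a @ H @ a - a.sum()
cons = {"type": "eq", "fun": lambda a: a @ y}      # sum_k alpha_k y_k = 0
res = minimize(obj, np.zeros(N), method="SLSQP",
               bounds=[(0.0, c)] * N, constraints=cons,
               options={"maxiter": 500})
alpha = res.x

# recover b from an unbounded support vector (0 < alpha_k < c)
k = int(np.argmax((alpha > 1e-6) & (alpha < c - 1e-6)))
b = y[k] - (alpha * y) @ K[:, k]

def decision(x):                                   # classifier (1)
    kx = np.exp(-np.sum((X - x) ** 2, axis=1) / sigma ** 2)
    return np.sign((alpha * y) @ kx + b)
```

On such separable data most α_k come out at or near zero: this sparseness of the QP solution is what the least squares formulation below trades away.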
Least Squares Support Vector Machines
The least squares version of the SVM classifier is obtained by formulating the classification problem as

min_{ω,b,e} J_3(ω, b, e) = (1/2) ω^T ω + γ (1/2) ∑_{k=1}^N e_k^2   (15)

subject to the equality constraints

y_k [ω^T φ(x_k) + b] = 1 − e_k, k = 1, ..., N.   (16)
The Lagrangian is defined as

L_3(ω, b, e; α) = J_3(ω, b, e) − ∑_{k=1}^N α_k {y_k [ω^T φ(x_k) + b] − 1 + e_k}   (17)

where the α_k are Lagrange multipliers. The conditions for optimality,

∂L_3/∂ω = 0 → ω = ∑_{k=1}^N α_k y_k φ(x_k)
∂L_3/∂b = 0 → ∑_{k=1}^N α_k y_k = 0
∂L_3/∂e_k = 0 → α_k = γ e_k, k = 1, ..., N
∂L_3/∂α_k = 0 → y_k [ω^T φ(x_k) + b] − 1 + e_k = 0, k = 1, ..., N   (18)
can be written as the solution to the following set of linear equations:

[ I   0   0   −Z^T ] [ω]   [0]
[ 0   0   0   −Y^T ] [b] = [0]
[ 0   0   γI  −I   ] [e]   [0]
[ Z   Y   I    0   ] [α]   [1]   (19)

where Z = [φ(x_1)^T y_1; ...; φ(x_N)^T y_N], Y = [y_1; ...; y_N], 1 = [1; ...; 1], e = [e_1; ...; e_N], and α = [α_1; ...; α_N].
Least Squares Support Vector Machines
After elimination of ω and e, the solution is given by

[ 0   −Y^T            ] [b]   [0]
[ Y   ZZ^T + γ^{-1} I ] [α] = [1]   (20)
Mercer's condition can be applied again to the matrix Ω = ZZ^T, where

Ω_kl = y_k y_l φ(x_k)^T φ(x_l) = y_k y_l ψ(x_k, x_l).   (21)
Hence the classifier (1) is found by solving the linear set of equations (20)-(21) instead of a quadratic programming problem. The parameters of the kernel, such as σ for the RBF kernel, can be optimally chosen according to (12). By (18), the support values α_k are proportional to the errors at the data points, while in the standard SVM case (9) most values are equal to zero. Hence one could rather speak of a support value spectrum in the least squares case.
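Because only a linear system is involved, the whole LS-SVM training step fits in a few lines of NumPy. The sketch below builds Ω from (21) with an RBF kernel, solves (20), and evaluates classifier (1); the toy data and the values γ = 10, σ = 1 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical toy data: two Gaussian blobs with labels -1/+1
X = np.vstack([rng.normal(-2, 0.5, (10, 2)), rng.normal(2, 0.5, (10, 2))])
y = np.array([-1.0] * 10 + [1.0] * 10)
N, gamma, sigma = len(y), 10.0, 1.0

# Omega_kl = y_k y_l psi(x_k, x_l) from (21), with the RBF kernel
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Omega = (y[:, None] * y[None, :]) * np.exp(-sq / sigma ** 2)

# linear system (20): [[0, -y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]
A = np.zeros((N + 1, N + 1))
A[0, 1:] = -y
A[1:, 0] = y
A[1:, 1:] = Omega + np.eye(N) / gamma
rhs = np.concatenate([[0.0], np.ones(N)])
sol = np.linalg.solve(A, rhs)
b, alpha = sol[0], sol[1:]

def classify(x):                                   # classifier (1)
    kx = np.exp(-np.sum((X - x) ** 2, axis=1) / sigma ** 2)
    return np.sign((alpha * y) @ kx + b)
```

Unlike the QP solution, here every α_k = γ e_k is generally nonzero, which is the support value spectrum described above.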
Conclusion
- Due to the equality constraints, a set of linear equations has to be solved instead of a quadratic programming problem.
- Mercer's condition is applied as in other SVMs.
- The least squares SVM with an RBF kernel is readily found, with excellent generalization performance and low computational cost.
References
1. J.A.K. Suykens and J. Vandewalle, "Least Squares Support Vector Machine Classifiers".