exploiting duality (particularly the dual of svm) m. pawan kumar visual geometry group

Exploiting Duality (Particularly the dual of SVM)

M. Pawan Kumar

VISUAL GEOMETRY GROUP

PART I : General duality theory

• Basics of Mathematical Optimization

• The algebra

• The geometry

• Examples

PART II : Solving the SVM dual

• General Decomposition Algorithm

• Good Working Set

• Implementation Details

Mathematical Optimization

min f0(x)

s.t. fi(x) ≤ 0

hi(x) = 0

Objective function

Inequality constraints

Equality constraints

x is a feasible point if fi(x) ≤ 0 and hi(x) = 0 for all i

x is a strictly feasible point if fi(x) < 0 and hi(x) = 0 for all i

Feasible region - set of all feasible points

Convex Optimization

min f0(x)

s.t. fi(x) ≤ 0

hi(x) = 0

Objective function



Objective function is convex

Feasible region is convex

Convex set??? Convex function???

Convex Set

Endpoints: x1, x2

Line Segment: $c\,x_1 + (1 - c)\,x_2$, $c \in [0,1]$

Convex Set

For all line segments with endpoints x1, x2 in the set,

all points on the line segment lie within the set

Non-Convex Set

[Figure: a non-convex set containing the endpoints x1 and x2]
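The definition can be probed numerically. The sketch below samples points c x1 + (1 - c) x2 on a segment and tests membership. The two sets (a disc and a ring) and all function names are made up for illustration. Sampling can only disprove convexity, never prove it.

```python
import math

def in_disc(p, radius=1.0):
    """Membership test for a disc centred at the origin (a convex set)."""
    return math.hypot(p[0], p[1]) <= radius

def in_ring(p, inner=0.5, outer=1.0):
    """Membership test for a ring (not convex, because of the hole)."""
    return inner <= math.hypot(p[0], p[1]) <= outer

def segment_stays_inside(member, x1, x2, steps=50):
    """Check c*x1 + (1-c)*x2 for c on a uniform grid in [0, 1]."""
    for k in range(steps + 1):
        c = k / steps
        p = (c * x1[0] + (1 - c) * x2[0], c * x1[1] + (1 - c) * x2[1])
        if not member(p):
            return False
    return True

# Two points on opposite sides of the hole; their midpoint is the origin.
x1, x2 = (-0.75, 0.0), (0.75, 0.0)
disc_ok = segment_stays_inside(in_disc, x1, x2)
ring_ok = segment_stays_inside(in_ring, x1, x2)
print(disc_ok, ring_ok)
```

Both endpoints belong to both sets, but the segment passes through the hole of the ring, so only the disc passes the check.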

Examples of Convex Sets

• Line Segment (between x1 and x2)

• Line (through x1 and x2)

• Hyperplane: aTx - b = 0

• Halfspace: aTx - b ≤ 0

• Second-order Cone: ||x|| ≤ t

[Figure: second-order cone with axes x1, x2 and t]

Operations that Preserve Convexity: Intersection

Example: Polyhedron / Polytope (an intersection of halfspaces and hyperplanes)

Operations that Preserve Convexity: Affine Transformation

$x \mapsto A x + b$

Convex Function

[Figure: f(x) plotted against x, with x1 and x2 marked on the x-axis]

Blue point always lies above red point

Convex Function

f( c x1 + (1 - c) x2 ) ≤ c f(x1) + (1 - c) f(x2), for all $c \in [0,1]$

Domain of f(.) has to be convex

Convex Function

f is convex: f( c x1 + (1 - c) x2 ) ≤ c f(x1) + (1 - c) f(x2)

-f(.) is concave
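A quick numerical check of the inequality f(c x1 + (1 - c) x2) ≤ c f(x1) + (1 - c) f(x2) is sketched below. The helper name and the test functions are illustrative choices, not part of the slides. A sampled check like this can expose a violation but cannot certify convexity.

```python
import math

def violates_convexity(f, x1, x2, steps=20, tol=1e-12):
    """True if f(c*x1 + (1-c)*x2) > c*f(x1) + (1-c)*f(x2) for some sampled c."""
    for k in range(steps + 1):
        c = k / steps
        lhs = f(c * x1 + (1 - c) * x2)
        rhs = c * f(x1) + (1 - c) * f(x2)
        if lhs > rhs + tol:
            return True
    return False

def square(x):
    return x * x          # convex

def neg_square(x):
    return -(x * x)       # concave, since square is convex

square_violated = violates_convexity(square, -1.0, 3.0)
neg_square_violated = violates_convexity(neg_square, -1.0, 3.0)
sine_violated = violates_convexity(math.sin, 0.0, math.pi)
print(square_violated, neg_square_violated, sine_violated)
```

On these inputs the square function shows no violation, while its negation and the sine on [0, π] (both concave there) do.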

Convex Function

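Once-differentiable functions

$f(y) + \nabla f(y)^T (x - y) \le f(x)$

[Figure: f(x) plotted against x, together with the tangent $f(y) + \nabla f(y)^T (x - y)$ at the point (y, f(y))]

Twice-differentiable functions

$\nabla^2 f(x) \succeq 0$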

Convex Function and Convex Sets

[Figure: f(x) plotted against x, with its epigraph]

Epigraph of a convex function is a convex set

Examples of Convex Functions

Linear function: $a^T x$

p-Norm functions: $(|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$, $p \ge 1$

Quadratic functions: $x^T Q x$, $Q \succeq 0$

Operations that Preserve ConvexityNon-negative weighted sum

$w_1 f_1(x) + w_2 f_2(x) + \dots$, with $w_i \ge 0$

Example: $x^T Q x + a^T x + b$, $Q \succeq 0$

Operations that Preserve ConvexityPointwise maximum

$\max \{ f_1(x), f_2(x) \}$

Pointwise minimum of concave functions is concave
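The pointwise-maximum rule can be illustrated on sampled points. In this sketch the three functions, the sample points and the helper names are arbitrary choices made for illustration. The pointwise minimum of the same convex functions is included as a contrast: it need not be convex.

```python
def pointwise_max(*fs):
    """Return the function x -> max_k f_k(x)."""
    return lambda x: max(f(x) for f in fs)

def pointwise_min(*fs):
    """Return the function x -> min_k f_k(x)."""
    return lambda x: min(f(x) for f in fs)

def convex_on_samples(f, points, steps=10, tol=1e-12):
    """Check the convexity inequality for every pair of sample points."""
    for x1 in points:
        for x2 in points:
            for k in range(steps + 1):
                c = k / steps
                if f(c * x1 + (1 - c) * x2) > c * f(x1) + (1 - c) * f(x2) + tol:
                    return False
    return True

# Three convex functions: two affine, one quadratic.
f1 = lambda x: 2.0 * x + 1.0
f2 = lambda x: -x
f3 = lambda x: 0.5 * x * x - 2.0

points = [-4.0, -1.5, 0.0, 0.5, 3.0]
max_is_convex = convex_on_samples(pointwise_max(f1, f2, f3), points)
min_is_convex = convex_on_samples(pointwise_min(f1, f2, f3), points)
print(max_is_convex, min_is_convex)
```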

Convex Optimization

min f0(x)

s.t. fi(x) ≤ 0

hi(x) = 0

Objective function



Objective function is convex

Feasible region is convex








Lagrangian

min f0(x)

s.t. fi(x) ≤ 0

hi(x) = 0

$L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$, $\lambda_i \ge 0$

Lagrangian Dual

$L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$, $\lambda_i \ge 0$

$g(\lambda, \nu) = \min_{x \in D} L(x, \lambda, \nu)$

$D$ is the intersection of the domains of $f_0$, $f_i$ and $h_i$

Lagrangian Dual

$g(\lambda, \nu) = \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$, $\lambda_i \ge 0$

Pointwise minimum of affine (concave) functions of $(\lambda, \nu)$

Dual function is concave

Lagrangian Dual

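$p^* = \min f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$

$g(\lambda, \nu) = \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$, $\lambda_i \ge 0$

For any feasible x, $\lambda_i f_i(x) \le 0$ and $\nu_i h_i(x) = 0$, so the Lagrangian is at most $f_0(x)$. Hence

$p^* \ge g(\lambda, \nu)$ for all $(\lambda, \nu)$ with $\lambda_i \ge 0$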

The Dual Problem

The lower bound could be far from p*

Best lower bound?

$d^* = \max_{\lambda \ge 0,\, \nu} \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$

Easy to obtain (maximization of a concave function)

p* - d* ≥ 0 Duality Gap
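A small worked example may help here. The toy problem below (minimize x² subject to 1 - x ≤ 0) is an illustration chosen for this sketch, not taken from the slides. Its Lagrangian is x² + λ(1 - x), minimized over x at x = λ/2, which gives the dual function g(λ) = λ - λ²/4.

```python
def f0(x):
    return x * x

def f1(x):
    return 1.0 - x          # constraint f1(x) <= 0, i.e. x >= 1

def lagrangian(x, lam):
    return f0(x) + lam * f1(x)

def dual(lam):
    """g(lam) = min_x L(x, lam); for this toy problem the minimiser is x = lam / 2."""
    return lagrangian(lam / 2.0, lam)

p_star = f0(1.0)                               # the smallest feasible x is 1
lams = [k / 100.0 for k in range(0, 501)]      # grid over lam in [0, 5]
values = [dual(lam) for lam in lams]
d_star = max(values)                           # best lower bound on the grid
best_lam = lams[values.index(d_star)]
print(p_star, d_star, best_lam)
```

Every g(λ) on the grid stays at or below p*, and the best bound is attained at λ = 2, where the duality gap is zero for this problem.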

The Geometric Interpretation

$G = \{ (u, v, t) = (f_i(x), h_i(x), f_0(x)) : x \in D \}$

[Figure: the set G drawn with axes u and t, with p* marked]

The Geometric Interpretation

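$(\lambda, \nu, 1)^T (u, v, t) \ge g(\lambda, \nu)$ for all $(u, v, t) \in G$

[Figure: the set G drawn with axes u and t, with p*, g(λ) and d* marked]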

The Duality Gap

$p^* = \min f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$

$d^* = \max_{\lambda \ge 0,\, \nu} \min_x \left( f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x) \right)$

$p^* \ge d^*$

The Duality Gap

p* - d* Duality Gap

p* - d* ≥ 0 Weak Duality

p* - d* = 0 Strong Duality

Strong Duality

Problem is convex

There exists a strictly feasible point

Slater’s Condition

Taken care of by most solvers

At Strong Duality

$f_0(x^*) = g(\lambda^*, \nu^*)$

$= \min_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right)$

$\le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*)$

$\le f_0(x^*)$

Inequalities hold with equality, so:

$x^*$ minimizes the Lagrangian at $(\lambda^*, \nu^*)$

$\lambda_i^* f_i(x^*) = 0$

KKT Conditions

Primal feasible: $f_i(x^*) \le 0$, $h_i(x^*) = 0$

Dual feasible: $\lambda_i^* \ge 0$

Complementary Slackness: $\lambda_i^* f_i(x^*) = 0$

Stationarity: $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$

Necessary conditions for strong duality

Necessary and sufficient for convex problems
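The four conditions can be checked mechanically. The sketch below does so for a toy problem chosen for illustration (minimize x² subject to 1 - x ≤ 0, with one multiplier λ). The function name and the problem are assumptions of this example, not part of the slides.

```python
def kkt_residuals(x, lam):
    """KKT residuals for the toy problem: min x^2  s.t.  1 - x <= 0."""
    primal = max(0.0, 1.0 - x)           # violation of f1(x) <= 0
    dual = max(0.0, -lam)                # violation of lam >= 0
    slackness = abs(lam * (1.0 - x))     # complementary slackness lam * f1(x) = 0
    stationarity = abs(2.0 * x - lam)    # d/dx [x^2 + lam * (1 - x)] = 0
    return primal, dual, slackness, stationarity

at_optimum = kkt_residuals(1.0, 2.0)
at_unconstrained_min = kkt_residuals(0.0, 0.0)
print(at_optimum, at_unconstrained_min)
```

All residuals vanish at (x, λ) = (1, 2). The unconstrained minimizer x = 0 satisfies stationarity with λ = 0 but fails primal feasibility.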








Linear Program

min cTx

s.t. A x = b

x ≥ 0
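As a worked derivation for this linear program, the dual follows from the general recipe by writing x ≥ 0 as -x ≤ 0 with multiplier λ, and using ν for the equality constraint. The symbols λ and ν follow the Lagrangian notation of the earlier slides. This is the standard textbook derivation, given here as a sketch rather than as the slide's own content.

```latex
\begin{aligned}
L(x, \lambda, \nu) &= c^T x - \lambda^T x + \nu^T (A x - b) \\
                   &= -b^T \nu + (c + A^T \nu - \lambda)^T x, \qquad \lambda \ge 0 \\
g(\lambda, \nu) &= \min_x L(x, \lambda, \nu) =
  \begin{cases}
    -b^T \nu & \text{if } c + A^T \nu - \lambda = 0 \\
    -\infty  & \text{otherwise}
  \end{cases} \\
\text{Dual:} \quad & \max_{\nu} \; -b^T \nu \quad \text{s.t.} \quad A^T \nu + c \ge 0
\end{aligned}
```

The Lagrangian is affine in x, so its minimum is finite only when the coefficient of x vanishes. Eliminating λ ≥ 0 from that condition gives the inequality constraint of the dual.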

QCQP

min $(1/2)\, x^T P_0 x + q_0^T x + r_0$

s.t. $(1/2)\, x^T P_i x + q_i^T x + r_i \le 0$

Entropy Maximization

min ∑i xi log(xi)

s.t. A x ≤ b

∑i xi = 1

The SVM Framework

Points $X = \{x_i\}$

Labels $y = \{y_i\}$, $y_i \in \{-1, +1\}$

Hyperplane: $w^T x + b = 0$

Margin: $2 / \|w\|$

min $\frac{1}{2} w^T w + C \sum_i \xi_i$

s.t. $y_i (w^T x_i + b) \ge 1 - \xi_i$

$\xi_i \ge 0$

Convex Quadratic Program

The SVM Dual

min $\frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$

s.t. $\alpha^T y = 0$

$0 \le \alpha \le C \mathbf{1}$

$Q_{ij} = y_i y_j x_i^T x_j = y_i y_j k(x_i, x_j)$
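To make the dual concrete, the sketch below builds Q from a kernel and evaluates the dual objective and its constraints. The two-point data set, the linear kernel and all function names are made up for illustration.

```python
def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def build_Q(X, y, kernel=linear_kernel):
    """Q[i][j] = y_i * y_j * k(x_i, x_j)."""
    n = len(X)
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

def dual_objective(alpha, Q):
    """(1/2) alpha^T Q alpha - alpha^T 1."""
    n = len(alpha)
    quad = sum(alpha[i] * Q[i][j] * alpha[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)

def is_feasible(alpha, y, C, tol=1e-9):
    """Check alpha^T y = 0 and 0 <= alpha <= C."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return in_box and balanced

# Toy data (made up): one positive and one negative point on the real line.
X = [(1.0,), (-1.0,)]
y = [1, -1]
Q = build_Q(X, y)
alpha = [0.5, 0.5]
obj = dual_objective(alpha, Q)
print(Q, obj, is_feasible(alpha, y, C=1.0))
```

For this data the constraint α^T y = 0 forces α1 = α2 = a, the objective reduces to 2a² - 2a, and a = 0.5 minimizes it.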








The SVM Dual

min $\frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$

s.t. $\alpha^T y = 0$

$0 \le \alpha \le C \mathbf{1}$

Choose ‘q’ variables (the set B). Fix the rest.

Change unfixed variables, satisfying constraints, to decrease objective function (small problem).

Repeat.

Minimum ‘q’? Till when? Best set B?

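KKT Conditions

min $\frac{1}{2} \alpha^T Q \alpha - \alpha^T \mathbf{1}$

s.t. $\alpha^T y = 0$ (multiplier $\lambda^{eq}$)

$0 \le \alpha \le C \mathbf{1}$ (multipliers $\lambda_i^{lo}$, $\lambda_i^{up}$)

$-\mathbf{1} + Q\alpha + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$, where $g(\alpha) = Q\alpha$

$\lambda_i^{lo} \alpha_i = 0$, $\lambda_i^{up} (\alpha_i - C) = 0$

$\lambda_i^{lo} \ge 0$, $\lambda_i^{up} \ge 0$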

KKT Conditions

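$-\mathbf{1} + g(\alpha) + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$

$\lambda_i^{lo} \alpha_i = 0$, $\lambda_i^{up} (\alpha_i - C) = 0$

$\lambda_i^{lo} \ge 0$, $\lambda_i^{up} \ge 0$

For all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$

For all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$

For all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$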

KKT Conditions

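$-\mathbf{1} + g(\alpha) + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$

$\lambda_i^{lo} \alpha_i = 0$, $\lambda_i^{up} (\alpha_i - C) = 0$

$\lambda_i^{lo} \ge 0$, $\lambda_i^{up} \ge 0$

$g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$

$g_i(\alpha^t) = g_i(\alpha^{t-1}) + y_i \sum_{j \in B} (\alpha_j^t - \alpha_j^{t-1})\, y_j k(x_i, x_j)$

Best set of ‘q’ variables (Working set)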
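The incremental update of g only touches the working set B, because only those α values change between iterations. The sketch below compares it with a full recomputation. The data, the linear kernel and the function names are illustrative assumptions.

```python
def kernel(a, b):
    """Linear kernel, used here only as an example."""
    return sum(ai * bi for ai, bi in zip(a, b))

def full_g(alpha, X, y):
    """g_i(alpha) = y_i * sum_j alpha_j y_j k(x_i, x_j), recomputed from scratch."""
    n = len(X)
    return [y[i] * sum(alpha[j] * y[j] * kernel(X[i], X[j]) for j in range(n))
            for i in range(n)]

def update_g(g_old, alpha_old, alpha_new, B, X, y):
    """Add only the contribution of the indices j in the working set B."""
    return [g_old[i] + y[i] * sum((alpha_new[j] - alpha_old[j]) * y[j] * kernel(X[i], X[j])
                                  for j in B)
            for i in range(len(X))]

# Made-up data: four points in the plane.
X = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -2.0)]
y = [1, 1, -1, -1]
alpha_old = [0.0, 0.0, 0.0, 0.0]
alpha_new = [0.5, 0.0, 0.5, 0.0]    # only indices 0 and 2 changed; alpha^T y stays 0
B = [0, 2]

g_old = full_g(alpha_old, X, y)
g_incremental = update_g(g_old, alpha_old, alpha_new, B, X, y)
g_reference = full_g(alpha_new, X, y)
print(g_incremental, g_reference)
```

The incremental version needs q kernel evaluations per index instead of n, which is the point of keeping g up to date rather than recomputing it.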








Working Set

$g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$

d : feasible direction of descent

$\alpha^t = \alpha^{t-1} + d$

Choose steepest descent direction

First order approximation of objective: $(-\mathbf{1} + g(\alpha^{t-1}))^T d$

Working Set

$\min_d \; (-\mathbf{1} + g(\alpha^{t-1}))^T d$

s.t. $y^T d = 0$

$d_i \ge 0$ if $\alpha_i^{t-1} = 0$

$d_i \le 0$ if $\alpha_i^{t-1} = C$

$|\{ d_i : d_i \ne 0 \}| = q$

$-1 \le d_i \le 1$

Working Set

$s_i = y_i (-1 + g_i(\alpha^{t-1}))$

Sort according to decreasing values of $s_i$

Choose q/2 from the top for which $0 < \alpha_i^{t-1} < C$, or $d_i = -y_i$ satisfies feasibility of direction

Choose q/2 from the bottom for which $0 < \alpha_i^{t-1} < C$, or $d_i = y_i$ satisfies feasibility of direction
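The selection rule can be sketched in a few lines. The helper names and the numbers for α and g below are made up for illustration, and the code is a plain reading of the rule above rather than SVMlight's actual implementation. A free variable (0 < α_i < C) passes the feasibility test for either sign of d_i, so both cases of the rule reduce to one check.

```python
def direction_ok(d_i, alpha_i, C):
    """d_i must be >= 0 when alpha_i is at 0 and <= 0 when alpha_i is at C."""
    if alpha_i <= 0.0 and d_i < 0:
        return False
    if alpha_i >= C and d_i > 0:
        return False
    return True

def select_working_set(alpha, g, y, C, q):
    """Pick q/2 indices from the top and q/2 from the bottom of the sorted s_i."""
    n = len(alpha)
    s = [y[i] * (-1.0 + g[i]) for i in range(n)]
    order = sorted(range(n), key=lambda i: s[i], reverse=True)   # decreasing s_i
    top = [i for i in order if direction_ok(-y[i], alpha[i], C)][: q // 2]
    bottom = [i for i in reversed(order)
              if i not in top and direction_ok(y[i], alpha[i], C)][: q // 2]
    d = [0] * n
    for i in top:
        d[i] = -y[i]
    for i in bottom:
        d[i] = y[i]
    return top + bottom, d

# Made-up state: alpha satisfies alpha^T y = 0 with C = 1; g is arbitrary.
y = [1, 1, -1, -1]
alpha = [0.0, 1.0, 1.0, 0.0]
g = [0.2, 1.5, -0.3, 0.9]
B, d = select_working_set(alpha, g, y, C=1.0, q=2)
first_order_change = sum((-1.0 + g[i]) * d[i] for i in range(len(d)))
print(B, d, first_order_change)
```

Index 2 has the largest s_i but is skipped at the top because α_2 = C forbids d_2 = +1. With equal numbers taken from the top and the bottom, y^T d = 0 holds automatically.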





Shrinking

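For all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$

For all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$

For all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$

If $\lambda_i^{lo} > 0$ or $\lambda_i^{up} > 0$ for n consecutive iterations

Drop i from problem (temporarily)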
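One way to picture the shrinking rule is a per-variable counter of consecutive iterations with a strictly positive multiplier estimate. In the sketch below the estimates come from the two bound cases above: λ_i^lo = -1 + g_i + λ^eq y_i when α_i = 0, and λ_i^up = -(-1 + g_i + λ^eq y_i) when α_i = C. The function names, the way λ^eq is supplied and the numbers are assumptions of this example, not SVMlight's code.

```python
def multiplier_estimates(alpha_i, g_i, y_i, lam_eq, C):
    """Estimate (lam_lo, lam_up) for one variable from the stationarity condition."""
    r = -1.0 + g_i + lam_eq * y_i
    lam_lo = r if alpha_i <= 0.0 else 0.0     # only defined at the lower bound
    lam_up = -r if alpha_i >= C else 0.0      # only defined at the upper bound
    return lam_lo, lam_up

def update_shrink_counts(counts, alpha, g, y, lam_eq, C):
    """Increase the counter of variables that look stuck at a bound, reset the others."""
    for i in range(len(alpha)):
        lo, up = multiplier_estimates(alpha[i], g[i], y[i], lam_eq, C)
        counts[i] = counts[i] + 1 if (lo > 0 or up > 0) else 0
    return counts

def active_indices(counts, n_consecutive):
    """Variables that stay in the (temporarily) reduced problem."""
    return [i for i, c in enumerate(counts) if c < n_consecutive]

# Made-up state, held fixed for three iterations.
alpha = [0.0, 1.0, 0.5]
g = [1.5, 0.2, 1.0]
y = [1, -1, 1]
counts = [0, 0, 0]
for _ in range(3):
    counts = update_shrink_counts(counts, alpha, g, y, lam_eq=0.0, C=1.0)
active = active_indices(counts, n_consecutive=3)
print(counts, active)
```

Because the removal is temporary, a real solver would re-check the dropped variables against the optimality conditions before terminating.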

Caching

Kernel evaluation can be expensive

Cache them in a least-recently-used manner

Choose q’ variables where cache available
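A least-recently-used cache of kernel rows can be sketched with the standard library's OrderedDict. The class and method names are hypothetical, and the actual cache in SVMlight is not reproduced here.

```python
from collections import OrderedDict

class KernelRowCache:
    """Keeps at most `capacity` kernel rows, evicting the least recently used one."""

    def __init__(self, X, kernel, capacity):
        self.X = X
        self.kernel = kernel
        self.capacity = capacity
        self.rows = OrderedDict()
        self.misses = 0

    def row(self, i):
        """Return [k(x_i, x_j) for all j], computing it only on a cache miss."""
        if i in self.rows:
            self.rows.move_to_end(i)           # mark row i as most recently used
            return self.rows[i]
        self.misses += 1
        values = [self.kernel(self.X[i], xj) for xj in self.X]
        self.rows[i] = values
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)      # drop the least recently used row
        return values

    def is_cached(self, i):
        """Lets working set selection prefer variables whose rows are cached."""
        return i in self.rows

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

cache = KernelRowCache([(0.0,), (1.0,), (2.0,)], dot, capacity=2)
cache.row(0)
cache.row(1)
cache.row(0)             # hit: row 0 becomes the most recently used
last = cache.row(2)      # miss: row 1 is evicted
print(cache.misses, cache.is_cached(0), cache.is_cached(1), last)
```

The `is_cached` query is what a solver would use to bias the choice of working set variables toward rows that need no new kernel evaluations.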

Results

Those who have used SVMlight: You know that it works very well.

Those who haven’t used SVMlight: It works very well. See paper. Download.

Questions???
