Post on 21-Dec-2015
Exploiting Duality (Particularly the Dual of SVM)
M. Pawan Kumar
VISUAL GEOMETRY GROUP
PART I : General duality theory
• Basics of Mathematical Optimization
• The algebra
• The geometry
• Examples
PART II : Solving the SVM dual
• General Decomposition Algorithm
• Good Working Set
• Implementation Details
Mathematical Optimization
min f0(x) (objective function)
s.t. fi(x) ≤ 0 (inequality constraints)
hi(x) = 0 (equality constraints)
x is a feasible point if fi(x) ≤ 0 and hi(x) = 0 for all i
x is a strictly feasible point if fi(x) < 0 and hi(x) = 0 for all i
Feasible region - set of all feasible points
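The definitions of feasible and strictly feasible points can be checked numerically. This is a minimal sketch, assuming the constraints are given as Python callables; the names `is_feasible`, `ineqs`, `eqs` and the tolerance `tol` are illustrative and not part of the talk.

```python
def is_feasible(x, ineqs, eqs, strict=False, tol=1e-9):
    """True if f_i(x) <= 0 (or < 0 when strict) and h_i(x) = 0 for all i."""
    if strict:
        ineq_ok = all(f(x) < 0 for f in ineqs)
    else:
        ineq_ok = all(f(x) <= tol for f in ineqs)
    eq_ok = all(abs(h(x)) <= tol for h in eqs)
    return ineq_ok and eq_ok


# Toy constraint set (an assumption for illustration): f1(x) = 1 - x <= 0.
ineqs = [lambda x: 1.0 - x]
eqs = []
print(is_feasible(1.0, ineqs, eqs))               # boundary point: feasible
print(is_feasible(1.0, ineqs, eqs, strict=True))  # but not strictly feasible
print(is_feasible(2.0, ineqs, eqs, strict=True))  # interior point
```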
Convex Optimization
min f0(x) (objective function)
s.t. fi(x) ≤ 0 (inequality constraints)
hi(x) = 0 (equality constraints)
Objective function is convex
Feasible region is convex
What is a convex set? What is a convex function?
Convex Set
Line segment with endpoints x1 and x2: c x1 + (1 - c) x2, c ∈ [0,1]
Convex Set
A set is convex if, for all line segments with endpoints x1, x2 in the set, all points on the line segment lie within the set
Non-Convex Set
At least one line segment with endpoints x1, x2 in the set contains points outside the set
Examples of Convex Sets
• Line segment between x1 and x2
• Line through x1 and x2
• Hyperplane aTx - b = 0
• Halfspace aTx - b ≤ 0
• Second-order cone ||x|| ≤ t
Operations that Preserve Convexity
• Intersection (e.g. polyhedron / polytope: an intersection of halfspaces)
• Affine transformation x → Ax + b
Convex Function
f( c x1 + (1 - c) x2 ) ≤ c f(x1) + (1 - c) f(x2) for all c ∈ [0,1]
The chord between (x1, f(x1)) and (x2, f(x2)) always lies on or above the graph of f
Domain of f(.) has to be convex
-f(.) is concave
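The defining inequality can be probed by sampling points on a segment. This is a minimal sketch, assuming NumPy and scalar inputs; `violates_convexity` is an illustrative name. Sampling can only refute convexity on a given segment, it cannot prove it.

```python
import numpy as np


def violates_convexity(f, x1, x2, num=101, tol=1e-9):
    """True if f(c*x1 + (1-c)*x2) > c*f(x1) + (1-c)*f(x2) for some sampled c."""
    for c in np.linspace(0.0, 1.0, num):
        lhs = f(c * x1 + (1 - c) * x2)
        rhs = c * f(x1) + (1 - c) * f(x2)
        if lhs > rhs + tol:
            return True
    return False


print(violates_convexity(lambda x: x ** 2, -1.0, 2.0))  # x^2 is convex
print(violates_convexity(np.sin, 0.0, np.pi))           # sin is concave on [0, pi]
```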
Convex Function
Once-differentiable functions: f(y) + ∇f(y)T (x - y) ≤ f(x)
The tangent at (y, f(y)) lies on or below the graph of f
Twice-differentiable functions: ∇²f(x) ⪰ 0
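The second-order condition can be checked numerically for a quadratic. This is a minimal sketch, assuming NumPy; the helper name `is_psd` and its tolerance are illustrative. For f(x) = xT Q x the Hessian is Q + QT, so the test is applied to the symmetric part of Q.

```python
import numpy as np


def is_psd(Q, tol=1e-10):
    """Check positive semidefiniteness via the eigenvalues of the symmetric part."""
    Q = np.asarray(Q, dtype=float)
    sym = 0.5 * (Q + Q.T)
    return bool(np.linalg.eigvalsh(sym).min() >= -tol)


print(is_psd([[2.0, 0.0], [0.0, 1.0]]))   # convex quadratic
print(is_psd([[1.0, 0.0], [0.0, -1.0]]))  # indefinite, so not convex
```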
Convex Function and Convex Sets
The epigraph { (x, t) : f(x) ≤ t } of a convex function is a convex set
Examples of Convex Functions
Linear function aTx
p-Norm functions (|x1|^p + |x2|^p + ... + |xn|^p)^(1/p), p ≥ 1
Quadratic functions xT Q x, Q ⪰ 0
Operations that Preserve Convexity
Non-negative weighted sum
w1 f1(x) + w2 f2(x) + ..., wi ≥ 0
Example: xT Q x + aTx + b, Q ⪰ 0
Operations that Preserve Convexity
Pointwise maximum
max { f1(x), f2(x), ... }
Pointwise minimum of concave functions is concave
Lagrangian
min f0(x)
s.t. fi(x) ≤ 0
hi(x) = 0
L(x,λ,ν) = f0(x) + ∑i λi fi(x) + ∑i νi hi(x), λi ≥ 0
Lagrangian Dual
L(x,λ,ν) = f0(x) + ∑i λi fi(x) + ∑i νi hi(x), λi ≥ 0
g(λ,ν) = min_{x ∈ D} L(x,λ,ν)
D is the intersection of the domains of f0, fi and hi
For fixed x, L(x,λ,ν) is affine in (λ,ν), so g is a pointwise minimum of affine (concave) functions
Dual function is concave
Lower bound: p* ≥ g(λ,ν) for all (λ,ν) with λi ≥ 0, where p* is the optimal value of
min f0(x)
s.t. fi(x) ≤ 0
hi(x) = 0
The Dual Problem
The lower bound could be far from p*
Best lower bound?
d* = max_{λ ≥ 0, ν} min_x ( f0(x) + ∑i λi fi(x) + ∑i νi hi(x) )
Easy to obtain (maximizing the concave function g is a convex problem)
p* - d* ≥ 0 Duality Gap
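A one-dimensional example makes the lower-bound property concrete. This is a minimal sketch, assuming NumPy and a toy problem that is not from the talk: min x^2 s.t. 1 - x ≤ 0, whose primal optimum is p* = 1 at x = 1. The dual function is worked out by hand in the docstring and then maximized on a grid.

```python
import numpy as np


def g(lam):
    """Dual function of: min x^2  s.t.  1 - x <= 0.

    L(x, lam) = x^2 + lam * (1 - x) is minimized over x at x = lam / 2,
    which gives g(lam) = lam - lam^2 / 4.
    """
    return lam - lam ** 2 / 4.0


p_star = 1.0                         # primal optimum, attained at x = 1
lams = np.linspace(0.0, 4.0, 401)    # grid over lam >= 0
vals = g(lams)

# Every dual value is a lower bound on p*.
assert np.all(vals <= p_star + 1e-12)

best_lam = lams[np.argmax(vals)]     # maximizer of the dual on the grid
d_star = vals.max()                  # best lower bound
print(best_lam, d_star, p_star - d_star)
```

For this convex problem with a strictly feasible point (for example x = 2), the best lower bound meets p*, so the duality gap is zero up to grid resolution.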
The Geometric Interpretation
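G = { (u, v, t) = (fi(x), hi(x), f0(x)) : x ∈ D }
[Figure: the set G drawn in the (u, t) plane, with p* marked on the t axis]
(λ, ν, 1)T (u, v, t) ≥ g(λ, ν) for all (u, v, t) ∈ G
[Figure: the set G in the (u, t) plane, with p*, g(λ) and d* marked on the t axis]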
The Duality Gap
p* = min f0(x)
s.t. fi(x) ≤ 0
hi(x) = 0
d* = max_{λ ≥ 0, ν} min_x ( f0(x) + ∑i λi fi(x) + ∑i νi hi(x) )
p* ≥ d*
p* - d* : Duality Gap
p* - d* ≥ 0 : Weak Duality
p* - d* = 0 : Strong Duality
Strong Duality
Slater’s Condition (sufficient for strong duality):
Problem is convex
There exists a strictly feasible point
Taken care of by most solvers
At Strong Duality
f0(x*) = g(λ*, ν*)
= min_x ( f0(x) + ∑i λi* fi(x) + ∑i νi* hi(x) )
≤ f0(x*) + ∑i λi* fi(x*) + ∑i νi* hi(x*)
≤ f0(x*)
Inequalities hold with equality
x* minimizes the Lagrangian at (λ*, ν*)
λi* fi(x*) = 0 for all i
KKT Conditions
Primal feasible: fi(x*) ≤ 0, hi(x*) = 0
Dual feasible: λi* ≥ 0
Complementary Slackness: λi* fi(x*) = 0
Stationarity: ∇f0(x*) + ∑i λi* ∇fi(x*) + ∑i νi* ∇hi(x*) = 0
Necessary conditions for strong duality
Necessary and sufficient for convex problems
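The four KKT conditions can be verified directly on a small problem. This is a minimal sketch for a toy problem that is not from the talk: min x^2 s.t. f1(x) = 1 - x ≤ 0, with no equality constraints. The function name and tolerance are illustrative.

```python
def kkt_satisfied(x, lam, tol=1e-9):
    """KKT check for: min x^2  s.t.  f1(x) = 1 - x <= 0."""
    f1 = 1.0 - x
    grad_f0 = 2.0 * x    # derivative of x^2
    grad_f1 = -1.0       # derivative of 1 - x
    primal_feasible = f1 <= tol
    dual_feasible = lam >= -tol
    complementary = abs(lam * f1) <= tol
    stationary = abs(grad_f0 + lam * grad_f1) <= tol
    return primal_feasible and dual_feasible and complementary and stationary


print(kkt_satisfied(1.0, 2.0))  # the optimal primal-dual pair
print(kkt_satisfied(2.0, 0.0))  # feasible, but the gradient condition fails
```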
Linear Program
min cTx
s.t. A x = b
x ≥ 0
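As a worked instance of the Lagrangian machinery from Part I, the dual of this linear program can be derived as follows. This is the standard textbook derivation rather than a transcription of the slide; λ is the multiplier for x ≥ 0 (written as -x ≤ 0) and ν the multiplier for Ax = b.

```latex
\begin{aligned}
L(x,\lambda,\nu) &= c^T x - \lambda^T x + \nu^T (Ax - b)
                  = -b^T \nu + (c + A^T \nu - \lambda)^T x, \\
g(\lambda,\nu) &= \min_x L(x,\lambda,\nu) =
\begin{cases}
-b^T \nu & \text{if } c + A^T \nu - \lambda = 0, \\
-\infty  & \text{otherwise,}
\end{cases} \\
\text{dual:}\quad & \max_{\nu}\; -b^T \nu
\quad \text{s.t.} \quad A^T \nu + c \ge 0 .
\end{aligned}
```

The constraint of the dual comes from eliminating λ ≥ 0 in the condition c + A^T ν - λ = 0.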
QCQP
min (1/2) xT P0 x + q0T x + r0
s.t. (1/2) xT Pi x + qiT x + ri ≤ 0
Entropy Maximization
min ∑i xi log(xi)
s.t. A x ≤ b
∑i xi = 1
The SVM Framework
Points X = {xi}
Labels y = {yi}, yi ∈ {-1, +1}
Separating hyperplane wTx + b = 0
Margin 2/||w||
min (1/2) wTw + C ∑i ξi
s.t. yi (wTxi + b) ≥ 1 - ξi
ξi ≥ 0
Convex Quadratic Program
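For a fixed (w, b) the smallest slacks that satisfy both constraints are ξi = max(0, 1 - yi (wTxi + b)), which lets the primal objective be evaluated directly. This is a minimal sketch, assuming NumPy and a linear model; the function name and the two-point data set are illustrative.

```python
import numpy as np


def primal_objective(w, b, X, y, C):
    """(1/2) w^T w + C * sum_i xi_i, using the smallest feasible slacks."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)  # xi_i >= 0 and margin_i >= 1 - xi_i
    return 0.5 * w @ w + C * xi.sum(), xi


# Two 1-D points (illustrative): x = -1 labelled -1, x = +1 labelled +1.
X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])
value, xi = primal_objective(np.array([1.0]), 0.0, X, y, C=1.0)
print(value, xi)  # both points sit exactly on the margin, so no slack is needed
```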
The SVM Dual
min (1/2) αT Q α - αT 1
s.t. αT y = 0
0 ≤ α ≤ C 1
Qij = yi yj xiT xj = yi yj k(xi, xj)
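The dual objective and its constraints translate almost line for line into array code. This is a minimal sketch, assuming NumPy and a linear kernel by default; the function names and the two-point data set are illustrative.

```python
import numpy as np


def dual_objective(alpha, X, y, kernel=lambda a, b: a @ b):
    """Return (1/2) alpha^T Q alpha - alpha^T 1 and Q, with Q_ij = y_i y_j k(x_i, x_j)."""
    n = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    Q = np.outer(y, y) * K
    return 0.5 * alpha @ Q @ alpha - alpha.sum(), Q


def is_dual_feasible(alpha, y, C, tol=1e-9):
    """Check alpha^T y = 0 and 0 <= alpha <= C."""
    in_box = np.all(alpha >= -tol) and np.all(alpha <= C + tol)
    return bool(abs(alpha @ y) <= tol and in_box)


# Two 1-D points (illustrative): x = -1 labelled -1, x = +1 labelled +1.
X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])
alpha = np.array([0.5, 0.5])
value, Q = dual_objective(alpha, X, y)
print(value, is_dual_feasible(alpha, y, C=1.0))
```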
The SVM Dual
min (1/2) αT Q α - αT 1
s.t. αT y = 0
0 ≤ α ≤ C 1
Choose ‘q’ variables. Fix the rest.
Change unfixed variables, satisfying constraints, to decrease objective function (small problem).
Repeat. What is the minimum ‘q’? Until when?
Best set B?
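The loop described above can be sketched for the smallest working set, q = 2. This is not the method of the talk or of SVMlight: it swaps in an SMO-style analytic update with a maximal-violating-pair selection, and it assumes NumPy, a precomputed kernel matrix `K`, and that both classes are present. All names are illustrative.

```python
import numpy as np


def solve_svm_dual(K, y, C, eps=1e-6, max_iter=10000):
    """Decomposition with q = 2 for: min (1/2) a^T Q a - a^T 1, y^T a = 0, 0 <= a <= C."""
    n = len(y)
    Q = np.outer(y, y) * K
    alpha = np.zeros(n)
    grad = -np.ones(n)                       # gradient Q alpha - 1 at alpha = 0
    for _ in range(max_iter):
        score = -y * grad
        # Variables that may move in direction +y_i (up) or -y_i (low).
        up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
        low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
        i = np.flatnonzero(up)[np.argmax(score[up])]
        j = np.flatnonzero(low)[np.argmin(score[low])]
        gap = score[i] - score[j]
        if gap < eps:                        # no violating pair left: stop
            break
        # Exact minimizer along d_i = y_i, d_j = -y_j, then clip to the box.
        eta = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        t = gap / eta
        t = min(t, C - alpha[i] if y[i] > 0 else alpha[i])
        t = min(t, alpha[j] if y[j] > 0 else C - alpha[j])
        delta_i, delta_j = y[i] * t, -y[j] * t
        alpha[i] += delta_i
        alpha[j] += delta_j
        grad += Q[:, i] * delta_i + Q[:, j] * delta_j
    return alpha


# Four 1-D points with a linear kernel (illustrative data).
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
alpha = solve_svm_dual(X @ X.T, y, C=10.0)
print(np.round(alpha, 3))
```

Two variables is the smallest set that can change while keeping αT y = 0, which is why the sketch moves them in opposite label-weighted directions. The stopping rule here is a gap between the most violating pair; other criteria are possible.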
KKT Conditions
min (1/2) αT Q α - αT 1
s.t. αT y = 0 (multiplier λ^eq)
0 ≤ α ≤ C 1 (multipliers λi^lo for αi ≥ 0 and λi^up for αi ≤ C)
-1 + Qα + λ^eq y - λ^lo + λ^up = 0, with g(α) = Qα
λi^lo αi = 0, λi^up (αi - C) = 0
λi^lo ≥ 0, λi^up ≥ 0
For all 0 < αi < C: -1 + gi(α) + λ^eq yi = 0
For all αi = 0: -1 + gi(α) + λ^eq yi - λi^lo = 0
For all αi = C: -1 + gi(α) + λ^eq yi + λi^up = 0
KKT Conditions
gi(α) = yi ∑j αj yj k(xi,xj)
gi(α^t) = gi(α^(t-1)) + yi ∑_{j ∈ B} (αj^t - αj^(t-1)) yj k(xi,xj)
Best set of ‘q’ variables (Working set)
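The incremental update of g only needs the kernel columns of the variables in the working set B. This is a minimal sketch, assuming NumPy and a precomputed kernel matrix `K`; the function names and data are illustrative. It compares the update against a full recomputation.

```python
import numpy as np


def full_g(K, y, alpha):
    """g_i(alpha) = y_i * sum_j alpha_j y_j k(x_i, x_j), computed from scratch."""
    return y * (K @ (alpha * y))


def update_g(g_prev, K, y, alpha_new, alpha_old, B):
    """Incremental update that touches only the kernel columns indexed by B."""
    B = np.asarray(B)
    delta = (alpha_new[B] - alpha_old[B]) * y[B]
    return g_prev + y * (K[:, B] @ delta)


X = np.array([[-2.0], [-1.0], [1.0], [2.0]])  # illustrative data
y = np.array([-1.0, -1.0, 1.0, 1.0])
K = X @ X.T                                    # linear kernel matrix
alpha_old = np.zeros(4)
alpha_new = np.array([0.0, 0.5, 0.5, 0.0])     # only indices 1 and 2 changed
g_new = update_g(full_g(K, y, alpha_old), K, y, alpha_new, alpha_old, B=[1, 2])
print(np.allclose(g_new, full_g(K, y, alpha_new)))
```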
Working Set
gi(α) = yi ∑j αj yj k(xi,xj)
d : feasible direction of descent
α^t = α^(t-1) + d
Choose steepest descent direction
First order approximation of objective: (-1 + g(α^(t-1)))T d
Working Set
min_d (-1 + g(α^(t-1)))T d
s.t. yT d = 0
di ≥ 0 if αi^(t-1) = 0
di ≤ 0 if αi^(t-1) = C
Card{ di : di ≠ 0 } = q
-1 ≤ di ≤ 1
Working Set
si = yi (-1 + gi(α^(t-1)))
Sort according to decreasing values of si
Choose q/2 from top if 0 < αi^(t-1) < C, or di = -yi satisfies feasibility of direction
Choose q/2 from bottom if 0 < αi^(t-1) < C, or di = yi satisfies feasibility of direction
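The sorting heuristic can be sketched as follows. This is a minimal sketch, assuming NumPy and an even q; `select_working_set` and its tie-breaking (a stable sort) are illustrative choices, not taken from the talk. A free variable passes the feasibility test for either sign of di, so the two cases on the slide collapse into one check.

```python
import numpy as np


def select_working_set(g, alpha, y, C, q):
    """Pick q indices and a direction d using the sorted scores s_i."""
    s = y * (-1.0 + g)
    order = np.argsort(-s, kind="stable")   # indices by decreasing s_i
    d = np.zeros(len(y))

    def feasible(i, di):
        # d_i >= 0 is required when alpha_i = 0, d_i <= 0 when alpha_i = C.
        return not (alpha[i] <= 0 and di < 0) and not (alpha[i] >= C and di > 0)

    top = [int(i) for i in order if feasible(i, -y[i])][: q // 2]
    for i in top:
        d[i] = -y[i]
    bottom = [int(i) for i in order[::-1]
              if int(i) not in top and feasible(i, y[i])][: q // 2]
    for i in bottom:
        d[i] = y[i]
    return sorted(top + bottom), d


# Illustrative state: four points, all alpha_i = 0, hence g = 0.
y = np.array([-1.0, -1.0, 1.0, 1.0])
alpha = np.zeros(4)
g = np.zeros(4)
B, d = select_working_set(g, alpha, y, C=10.0, q=2)
print(B, d, (-1.0 + g) @ d)
```

Taking q/2 entries with di = -yi and q/2 with di = yi gives yT d = 0 automatically, since each pair contributes -1 and +1.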
Shrinking
For all 0 < αi < C: -1 + gi(α) + λ^eq yi = 0
For all αi = 0: -1 + gi(α) + λ^eq yi - λi^lo = 0
For all αi = C: -1 + gi(α) + λ^eq yi + λi^up = 0
If λi^lo > 0 or λi^up > 0 for n consecutive iterations
Drop i from problem (temporarily)
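The three cases above give a way to estimate the multipliers from the current α, which is what the shrinking test needs. This is a minimal sketch, assuming NumPy and at least one free variable (0 < αi < C); the function names, the averaging of λ^eq over the free variables, and the threshold `n` are illustrative assumptions.

```python
import numpy as np


def bound_multipliers(g, alpha, y, C, tol=1e-9):
    """Estimate lam_eq, lam_lo, lam_up from the three KKT cases."""
    free = (alpha > tol) & (alpha < C - tol)
    # Free variables satisfy -1 + g_i + lam_eq * y_i = 0, so lam_eq = y_i * (1 - g_i).
    lam_eq = np.mean(y[free] * (1.0 - g[free]))
    r = -1.0 + g + lam_eq * y
    lam_lo = np.where(alpha <= tol, r, 0.0)       # alpha_i = 0: lam_lo_i = r_i
    lam_up = np.where(alpha >= C - tol, -r, 0.0)  # alpha_i = C: lam_up_i = -r_i
    return lam_eq, lam_lo, lam_up


def update_shrink_counts(counts, lam_lo, lam_up, tol=1e-9):
    """Count consecutive iterations with a strictly positive bound multiplier."""
    strictly_bound = (lam_lo > tol) | (lam_up > tol)
    return np.where(strictly_bound, counts + 1, 0)


# Illustrative state: four points, alpha at the optimum of a toy problem.
y = np.array([-1.0, -1.0, 1.0, 1.0])
alpha = np.array([0.0, 0.5, 0.5, 0.0])
g = np.array([2.0, 1.0, 1.0, 2.0])   # g = Q alpha for that toy problem
lam_eq, lam_lo, lam_up = bound_multipliers(g, alpha, y, C=10.0)
counts = update_shrink_counts(np.zeros(4, dtype=int), lam_lo, lam_up)
n = 1                                 # hypothetical threshold
active = counts < n                   # variables kept in the reduced problem
print(lam_lo, active)
```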
Caching
Kernel evaluation can be expensive
Cache them in a least-recently-used manner
Choose q’ variables where cache available
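A least-recently-used cache for kernel evaluations can be sketched with the standard library. This is a minimal sketch, assuming NumPy, a linear kernel and caching of whole kernel columns; it illustrates the idea only and is not a description of how SVMlight organizes its cache.

```python
from functools import lru_cache

import numpy as np

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])  # illustrative data


@lru_cache(maxsize=2)            # keep at most two kernel columns in memory
def kernel_column(j):
    """Return k(x_i, x_j) for all i (linear kernel, as an assumption)."""
    return X @ X[j]


kernel_column(0)
kernel_column(1)
kernel_column(0)                 # served from the cache
kernel_column(2)                 # evicts column 1, the least recently used
info = kernel_column.cache_info()
print(info.hits, info.misses, info.currsize)
```

The cached arrays are shared between callers, so code using this pattern should treat them as read-only.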
Results
Those who have used SVMlight:
You know that it works very well.
Those who haven’t used SVMlight:
It works very well. See paper. Download.
Questions?