ITERATIVE METHODS AND REGULARIZATION
IN THE DESIGN OF FAST ALGORITHMS
Lorenzo Orecchia, MIT Math
A unified framework for optimization and online learning
beyond Multiplicative Weight Updates
Talk Outline: A Tale of Two Halves

PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING
• Online Linear Optimization
• Online Linear Optimization over the Simplex and Multiplicative Weight Updates (MWUs)
• A Regularization Framework to Generalize MWUs: Follow the Regularized Leader
MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE
(Optimization: Regularized Updates ↔ Online Learning: Multiplicative Weight Updates (MWUs))

PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW
• Non-smooth vs Smooth Convex Optimization
• Non-smooth Convex Optimization Reduces to Online Linear Optimization
• Application: Understanding Undirected Maxflow Algorithms Based on MWUs
MESSAGE: FASTEST ALGORITHMS REQUIRE A PRIMAL-DUAL APPROACH
TCS Applications of MWUs
• Fast algorithms for solving specific LPs and SDPs: maximum flow problems [PST], [GK], [F], [CKMST]; covering-packing problems [PST]; oblivious routing [R], [M]
• Fast approximation algorithms based on LP and SDP relaxations: Maxcut [AK]; graph partitioning problems [AK], [S], [OSV]
• Proof techniques: Hardcore Lemma [BHK]; QIP = PSPACE [W]; derandomization [Y]
… and more
Machine Learning meets Optimization meets TCS
These techniques have been rediscovered multiple times in different fields: machine learning, convex optimization, TCS.
Three surveys emphasizing the different viewpoints and literatures:
1) ML: Prediction, Learning, and Games by Cesa-Bianchi and Lugosi
2) Optimization: Lectures on Modern Convex Optimization by Ben-Tal and Nemirovski
3) TCS: The Multiplicative Weights Update Method: a Meta-Algorithm and Applications by Arora, Hazan and Kale
REGULARIZATION 101
What is Regularization?
Regularization is a fundamental technique in optimization
OPTIMIZATION PROBLEM → WELL-BEHAVED OPTIMIZATION PROBLEM
• Stable optimum
• Unique optimal solution
• Smoothness conditions
• …
Benefits of Regularization in Learning and Statistics:
• Prevents overfitting
• Increases stability
• Decreases sensitivity to random noise
Ingredients: regularizer F, parameter λ > 0.
Example: Regularization Helps Stability

Consider a convex set S ⊂ R^n and a linear optimization problem:
f(c) = argmin_{x∈S} c^T x
The optimal solution f(c) may be very unstable under perturbation of c:
‖c′ − c‖ ≤ δ, and yet ‖f(c′) − f(c)‖ ≫ δ.
Example: Regularization Helps Stability

Now consider the same convex set S ⊂ R^n and a regularized linear optimization problem:
f(c) = argmin_{x∈S} c^T x + F(x),
where F is σ-strongly convex. Then:
‖c′ − c‖ ≤ δ implies ‖f(c′) − f(c)‖ ≤ δ/σ.
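This stability phenomenon is easy to check numerically. Below is a minimal sketch (not from the talk) contrasting the unregularized and entropy-regularized minimizers over the simplex; `sigma` plays the role of the strong-convexity parameter σ, and the specific vectors are made-up toy data:

```python
import numpy as np

def argmin_linear(c):
    # Unregularized: the optimum is always a vertex of the simplex.
    x = np.zeros_like(c)
    x[np.argmin(c)] = 1.0
    return x

def argmin_regularized(c, sigma=1.0):
    # Entropy regularizer F(x) = sum_i x_i log x_i (1-strongly convex w.r.t. l1);
    # the minimizer of c^T x + sigma * F(x) over the simplex is a soft-max.
    w = np.exp(-c / sigma)
    return w / w.sum()

c  = np.array([1.0, 1.0 + 1e-6, 2.0])   # two nearly-tied directions
cp = np.array([1.0 + 2e-6, 1.0, 2.0])   # perturbation of size ~2e-6

# The unregularized optimum jumps between vertices under a tiny perturbation...
jump = np.abs(argmin_linear(cp) - argmin_linear(c)).sum()
# ...while the regularized optimum moves by O(delta / sigma).
drift = np.abs(argmin_regularized(cp) - argmin_regularized(c)).sum()
print(jump, drift)
```

The unregularized solution moves by the full simplex diameter while the regularized one barely moves, matching the δ/σ bound.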
ONLINE LINEAR OPTIMIZATION
AND
MULTIPLICATIVE WEIGHT UPDATES
Online Linear Minimization

SETUP: Convex set X ⊆ R^n, generic norm ‖·‖, repeated game over T rounds. At round t:
• ALGORITHM plays the current solution x^(t) ∈ X.
• ADVERSARY reveals the current linear objective, a loss vector ℓ^(t) ∈ R^n with ‖ℓ^(t)‖* ≤ ρ.
• ALGORITHM suffers loss ℓ^(t)T x^(t), then plays the updated solution x^(t+1) ∈ X; ADVERSARY reveals a new loss vector ℓ^(t+1).

GOAL: Update x^(t) to minimize regret, the average algorithm's loss minus the a posteriori optimum:
(1/T)·Σ_{t=1}^T ℓ^(t)T x^(t) − min_{x∈X} (1/T)·Σ_{t=1}^T ℓ^(t)T x
Simplex Case: Learning with Experts

SETUP: Simplex X ⊆ R^n under the ℓ1 norm. At round t:
• ALGORITHM plays p^(t), a distribution over the dimensions, i.e., the experts.
• ADVERSARY reveals the experts' losses ℓ^(t), with ‖ℓ^(t)‖∞ ≤ ρ.
• ALGORITHM suffers expected loss E_{i←p^(t)}[ℓ_i^(t)] = p^(t)T ℓ^(t), then plays the updated distribution p^(t+1).
Simplex Case: Multiplicative Weight Updates

MULTIPLICATIVE WEIGHT UPDATE:
Weights: w_i^(t+1) = (1−ε)^{ℓ_i^(t)} · w_i^(t), with w^(1) = 1⃗
Distribution: p_i^(t+1) = w_i^(t+1) / Σ_{j=1}^n w_j^(t+1)

The parameter ε ∈ (0,1) interpolates between CONSERVATIVE (ε near 0) and AGGRESSIVE (ε near 1) updates.
MWUs: Unraveling the Update

Update: p_i^(t+1) ∝ w_i^(t+1) = (1−ε)^{ℓ_i^(t)} · w_i^(t)
Unraveling the recursion, the WEIGHT is exponential in the CUMULATIVE LOSS:
w_i^(t+1) = (1−ε)^{Σ_{s=1}^t ℓ_i^(s)}
MWUs: Regret Bound

Update: p_i^(t+1) ∝ w_i^(t+1) = (1−ε)^{ℓ_i^(t)} · w_i^(t)

For ‖ℓ^(t)‖∞ ≤ ρ and ε < 1/2, the algorithm's regret is bounded by
L̂ − L* ≤ ρ·log n/(εT) + ρε
Start-up penalty: ρ·log n/(εT). Penalty for being greedy: ρε.
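The update and its regret bound are easy to exercise empirically. A minimal sketch (illustrative, not the talk's code) runs MWUs against random losses and checks L̂ − L* ≤ ρ log n/(εT) + ρε:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, eps, rho = 10, 2000, 0.1, 1.0

w = np.ones(n)                        # weights, w^(1) = all-ones
losses = rng.uniform(0, rho, (T, n))  # adversary's losses, ||l^(t)||_inf <= rho
alg_loss, cum = 0.0, np.zeros(n)

for t in range(T):
    p = w / w.sum()                   # distribution p^(t)
    alg_loss += p @ losses[t]         # algorithm's expected loss this round
    cum += losses[t]                  # experts' cumulative losses
    w *= (1 - eps) ** losses[t]       # multiplicative weight update

L_hat, L_star = alg_loss / T, cum.min() / T
bound = rho * np.log(n) / (eps * T) + rho * eps
print(L_hat - L_star, "<=", bound)
```

The realized regret is typically far below the worst-case bound; the bound itself holds for any loss sequence.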
ONLINE LINEAR OPTIMIZATION BEYOND MWUs
A REGULARIZATION FRAMEWORK
MWUs: Proof Sketch of Regret Bound

Update: p_i^(t+1) ∝ w_i^(t+1) = (1−ε)^{Σ_{s=1}^t ℓ_i^(s)}
Potential function: Φ^(t+1) = log_{1−ε} Σ_{i=1}^n w_i^(t+1)

• The proof is a potential function argument.
• The potential function bounds the loss of the best expert (log_{1−ε} is decreasing):
Φ^(t+1) ≤ log_{1−ε} max_{i=1}^n w_i^(t+1) = min_{i=1}^n Σ_{s=1}^t ℓ_i^(s)
• The potential function is related to the algorithm's performance:
Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T p^(t) − ε

DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?
MWUs AND APPLICATIONS
Designing a Regularized Update

GOAL: Design an update and its potential function analysis.
QUESTION: Choice of potential function?
DESIDERATA: 1) lower-bounds the best expert's loss; 2) tracks the algorithm's performance.
Designing a Regularized Update

Attempt 1 – FOLLOW THE LEADER: cumulative loss L^(t) = Σ_{s=1}^t ℓ^(s)

x^(t+1) = argmin_{x∈X} x^T L^(t)   (pick the best current solution)
Φ^(t+1) = min_{x∈X} x^T L^(t)   (potential is the current best loss)

Fails if the best expert changes drastically. How can we make the update more stable?
Regularized Update: Definition

Attempt 2 – FOLLOW THE REGULARIZED LEADER:

x^(t+1) = argmin_{x∈X} x^T L^(t) + η·F(x)
Φ^(t+1) = min_{x∈X} x^T L^(t) + η·F(x)

Properties of the regularizer F(x):
1. Convex, differentiable
2. σ-strongly convex w.r.t. the norm ‖·‖
Parameter η ≥ 0, to be determined.

These properties are actually sufficient to get a regret bound.
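As a concrete instance away from the simplex, here is a small sketch (an illustration under stated assumptions, not code from the talk) of Follow the Regularized Leader over the Euclidean unit ball with F(x) = ½‖x‖², which is 1-strongly convex w.r.t. the ℓ2 norm; the choice η ≈ ρ√T balances the two penalty terms of the standard analysis:

```python
import numpy as np

def ftrl_step(L, eta):
    # Follow the Regularized Leader with F(x) = (1/2)||x||_2^2 over the unit ball:
    #   x^(t+1) = argmin_{||x||_2 <= 1}  x^T L + eta * F(x).
    # The unconstrained minimizer is -L/eta; it is then projected onto the ball.
    x = -L / eta
    return x / max(np.linalg.norm(x), 1.0)

rng = np.random.default_rng(1)
n, T, rho = 5, 1000, 1.0
eta = rho * np.sqrt(T)          # balances start-up penalty vs greediness penalty
L = np.zeros(n)                 # cumulative loss vector L^(t)
x = np.zeros(n)                 # x^(1) = 0 lies in the ball
alg_loss = 0.0
for t in range(T):
    loss = rng.uniform(-rho, rho, n) / np.sqrt(n)   # ensures ||loss||_2 <= rho
    alg_loss += loss @ x
    L += loss
    x = ftrl_step(L, eta)

best_fixed = -np.linalg.norm(L)  # min over the ball of L^T x is -||L||_2
avg_regret = (alg_loss - best_fixed) / T
print(avg_regret)                # small: O(rho / sqrt(T))
```

The same template works for any σ-strongly convex F; only the argmin computation changes.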
Regularized Update: Analysis

The potential lower-bounds the best cumulative loss, up to a regularization error of η·max_{x∈X} F(x):

Φ^(t+1) = min_{x∈X} L^(t)T x + η·F(x) ≤ min_{x∈X} L^(t)T x + η·max_{x∈X} F(x)
Tracking the Algorithm: Proof by Picture

Define f^(t+1)(x) = L^(t)T x + η·F(x), so that Φ^(t+1) = f^(t+1)(x^(t+1)) and Φ^(t) = f^(t)(x^(t)).

Notice: f^(t+1)(x) − f^(t)(x) = ℓ^(t)T x   (the latest loss vector)

Compare the algorithm's loss ℓ^(t)T x^(t) with the change in potential:
Φ^(t+1) − Φ^(t) = f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) + ℓ^(t)T x^(t)

Want: f^(t+1)(x^(t)) ≈ f^(t+1)(x^(t+1)), so that the increase in potential tracks the algorithm's loss.
Regularization in Action

f^(t+1)(x) = L^(t)T x + η·F(x) is (η·σ)-strongly convex: REGULARIZATION.
Strong convexity gives a quadratic lower bound to f^(t+1) around its minimizer.

The objectives change slowly, since f^(t+1) − f^(t) is linear with gradient ℓ^(t) and ‖ℓ^(t)‖* is bounded, so the minimizers are close: STABILITY.
‖x^(t+1) − x^(t)‖ ≤ ‖ℓ^(t)‖*/(η·σ)
Analysis: Progress in One Iteration

Since x^(t) minimizes f^(t), we have ∇f^(t+1)(x^(t)) = ℓ^(t), and f^(t+1) is (η·σ)-strongly convex, so
f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) ≥ ℓ^(t)T(x^(t+1) − x^(t)) + (ησ/2)·‖x^(t+1) − x^(t)‖²
≥ −‖ℓ^(t)‖*·‖x^(t+1) − x^(t)‖ + (ησ/2)·‖x^(t+1) − x^(t)‖²
≥ −‖ℓ^(t)‖²*/(2ησ)

Therefore:
Φ^(t+1) − Φ^(t) = f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) + ℓ^(t)T x^(t) ≥ ℓ^(t)T x^(t) − ‖ℓ^(t)‖²*/(2ησ)
Completing the Analysis

Progress in one iteration (regret at iteration t):
Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T x^(t) − ‖ℓ^(t)‖²*/(2ση)

Telescoping sum:
Φ^(T+1) ≥ Σ_{t=1}^T ℓ^(t)T x^(t) + Φ^(1) − T·max_t ‖ℓ^(t)‖²*/(2ση)

Final regret bound, for ‖ℓ^(t)‖* ≤ ρ:
(1/T)·(Σ_{t=1}^T ℓ^(t)T x^(t) − min_{x∈X} Σ_{t=1}^T ℓ^(t)T x) ≤ (η/T)·(max_{x∈X} F(x) − min_{x∈X} F(x)) + ρ²/(2ση)

Start-up penalty: (η/T)·(max F − min F). Penalty for being greedy: ρ²/(2ση).
SAME TYPE OF BOUND AS FOR MWUs.
Reinterpreting MWUs

Potential function: Φ^(t+1) = min_{p≥0, Σ_i p_i=1} p^T L^(t) + η·Σ_{i=1}^n p_i log p_i

Regularizer: F(p) = Σ_{i=1}^n p_i log p_i is negative entropy; F(p) is 1-strongly convex w.r.t. ‖·‖_1.

Update (SOFT-MAX):
p^(t+1) = argmin_{p≥0, Σ_i p_i=1} p^T L^(t) + η·Σ_{i=1}^n p_i log p_i, i.e.,
p_i^(t+1) = e^{−L_i^(t)/η} / Σ_{j=1}^n e^{−L_j^(t)/η} = (1−ε)^{L_i^(t)} / Σ_{j=1}^n (1−ε)^{L_j^(t)}
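Numerically, the soft-max solution of the entropy-regularized problem coincides with the multiplicative update when the parameters are matched by 1 − ε = e^{−1/η}; a quick illustrative check (toy losses, assumed parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, eta = 6, 50, 2.0
eps = 1 - np.exp(-1 / eta)       # matched so that (1 - eps)^L = e^(-L/eta)

L = np.zeros(n)                  # cumulative losses
w = np.ones(n)                   # MWU weights
for t in range(T):
    loss = rng.uniform(0, 1, n)
    L += loss
    w *= (1 - eps) ** loss       # multiplicative weight update

p_mwu  = w / w.sum()
p_ftrl = np.exp(-L / eta) / np.exp(-L / eta).sum()   # soft-max of cumulative loss
print(np.allclose(p_mwu, p_ftrl))
```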
Beyond MWUs: Which Regularizer?

Regret bound, optimizing over η:
(1/T)·(Σ_{t=1}^T ℓ^(t)T x^(t) − min_{x∈X} Σ_{t=1}^T ℓ^(t)T x) ≤ ρ·√(2·(max_{x∈X} F(x) − min_{x∈X} F(x))) / √(σT)

The best choice of regularizer and norm minimizes
max_t ‖ℓ^(t)‖²* · (max_{x∈X} F(x) − min_{x∈X} F(x)) / σ.

Negative entropy with the ℓ1-norm is approximately optimal for the simplex.
QUESTION: Are other regularizers ever useful?
Different Regularizers in Algorithm Design

QUESTION 1: Are other regularizers, besides entropy, ever useful? YES! Applications:

• Graph Partitioning and Random Walks: spectral algorithms for Balanced Separator running in time Õ(m). Uses the random-walk framework and SDP MWUs; different walks correspond to different regularizers for the eigenvector problem:
  – F(X) = Tr(X log X) (SDP MWU) ↔ Heat Kernel Random Walk
  – F(X) = Tr(X^p), p-norm, 1 ≤ p ≤ ∞ ↔ Lazy Random Walk
  – F(X) = Tr(X^{1/2}) (NEW REGULARIZER) ↔ Personalized PageRank
  [Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012]

• Sparsification: ε-spectral sparsifiers with O(n log n/ε²) edges [Spielman, Srivastava 2008], via a matrix concentration bound equivalent to SDP MWUs; ε-spectral sparsifiers with O(n/ε²) edges [Batson, Spielman, Srivastava 2009], which can be interpreted as a different regularizer: F(X) = Tr(X^{1/2}).

• Many more in online learning: bandit online learning [AHR], …
NON-SMOOTH CONVEX OPTIMIZATION
REDUCES TO
ONLINE LINEAR OPTIMIZATION
Convex Optimization Setup

min_{x∈X} f(x), with f convex and differentiable, and X ⊆ R^n a closed, convex set.

NON-SMOOTH: f is ρ-Lipschitz continuous: ∀x ∈ X, ‖∇f(x)‖* ≤ ρ.
SMOOTH: f has an L-Lipschitz continuous gradient: ∀x, y ∈ X, ‖∇f(y) − ∇f(x)‖* ≤ L‖y − x‖.

In the smooth case, a gradient step is guaranteed to decrease the function value:
f(x^(t+1)) ≤ f(x^(t)) − ‖∇f(x^(t))‖²*/(2L)

In the non-smooth case, there is NO GRADIENT STEP GUARANTEE: ONLY A DUAL GUARANTEE.
Non-Smooth Setup: Dual Approach

min_{x∈X} f(x), with f convex, differentiable, and ρ-Lipschitz continuous: ∀x ∈ X, ‖∇f(x)‖* ≤ ρ.

APPROACH: Each iterate x^(t) provides an upper bound and a lower bound on the optimum value f(x*):
UPPER BOUND: f(x^(t)) ≥ f(x*)
LOWER BOUND: f(x*) ≥ f(x^(t)) + ∇f(x^(t))^T(x* − x^(t))   (by convexity)

WE CAN WEAKEN THE DIFFERENTIABILITY ASSUMPTION: SUBGRADIENTS SUFFICE.
Take convex combinations of the upper bounds and of the lower bounds with weights γ_t:

UPPER: (1/Σ_{t=1}^T γ_t)·(Σ_{t=1}^T γ_t f(x^(t))) ≥ f(x*)
LOWER: f(x*) ≥ (1/Σ_{t=1}^T γ_t)·[Σ_{t=1}^T γ_t (f(x^(t)) + ∇f(x^(t))^T(x* − x^(t)))]

HOW TO UPDATE THE ITERATES? HOW TO CHOOSE THE WEIGHTS?
Reduction to Online Linear Minimization

Fix the weights γ_t to be uniform for simplicity. Subtracting the lower bound from the upper bound:

DUALITY GAP: [(1/T)·Σ_{t=1}^T f(x^(t))] − f(x*) ≤ (1/T)·Σ_{t=1}^T ∇f(x^(t))^T(x^(t) − x*)

The right-hand side is a sum of LINEAR functions of the iterates, so we are in the ONLINE SETUP: the ALGORITHM plays x^(t) ∈ X, and the ADVERSARY answers with the loss vector ℓ^(t) = ∇f(x^(t)), i.e., the loss vector is the gradient at the current iterate.

Recall that by assumption: ‖ℓ^(t)‖* = ‖∇f(x^(t))‖* ≤ ρ.

Comparing against x = x* in the definition of regret:
(1/T)·Σ_{t=1}^T ∇f(x^(t))^T(x^(t) − x*) ≤ REGRET
Final Bound

RESULTING ALGORITHM: MIRROR DESCENT

Error bound with a σ-strongly-convex regularizer F:
ε_MD ≤ ρ·√(2·(max_{x∈X} F(x) − min_{x∈X} F(x))) / √(σT)

ASYMPTOTICALLY OPTIMAL BY AN INFORMATION-COMPLEXITY LOWER BOUND.
Non-Smooth Optimization over Simplex

With the regularizer F taken to be negative entropy, and ‖∇f(x^(t))‖∞ ≤ ρ:
ε_MD ≤ ρ·√(2·log n) / √T

RESULTING ALGORITHM: MIRROR DESCENT OVER SIMPLEX = MWU
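A compact sketch of entropy mirror descent over the simplex (illustrative only; the piecewise-linear objective f(p) = max_j (Mᵀp)_j, i.e., a matrix game, and all parameter choices are assumptions for the demo):

```python
import numpy as np

def mirror_descent_simplex(grad, n, T, rho):
    # Entropy mirror descent (= MWU) for non-smooth minimization over the simplex.
    step = np.sqrt(2 * np.log(n) / T) / rho   # fixed step from the regret analysis
    p = np.ones(n) / n
    avg = np.zeros(n)
    for _ in range(T):
        avg += p / T                          # average iterate carries the guarantee
        g = grad(p)                           # subgradient, ||g||_inf <= rho
        p = p * np.exp(-step * g)             # multiplicative / soft-max update
        p /= p.sum()
    return avg

# Example: minimize f(p) = max_j (M^T p)_j over the simplex (a matrix game).
rng = np.random.default_rng(3)
M = rng.uniform(0, 1, (8, 5))
grad = lambda p: M[:, np.argmax(M.T @ p)]     # subgradient = column achieving the max
p_bar = mirror_descent_simplex(grad, n=8, T=5000, rho=1.0)
print((M.T @ p_bar).max())                    # close to the minimax value of the game
```

The averaged iterate achieves f(p̄) within ρ√(2 log n / T) of the optimum, matching the slide's bound.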
APPLICATIONS IN ALGORITHM DESIGN
Warm-up Example: Linear Programming

LP feasibility problem: given A ∈ R^{m×n}, does there exist x ∈ X with b − Ax ≥ 0?
Easy constraints (x ∈ X): maintain feasibility. Hard constraints (b − Ax ≥ 0): require fixing.

Convert into a non-smooth optimization problem over the simplex:
min_{p∈Δ_m} max_{x∈X} p^T(b − Ax)

Non-differentiable objective: f(p) = max_{x∈X} p^T(b − Ax)   (best response to the dual solution p)

f admits subgradients: for all p, a best response x_p with p^T(b − Ax_p) ≥ 0 yields (b − Ax_p) ∈ ∂f(p). (The subgradient is the slack in the constraints.)

If we can pick x_p such that ‖b − Ax_p‖∞ ≤ ρ, then ε_MD ≤ ρ·√(2·log m)/√T, i.e., T ≤ 2·ρ²·log m/ε² iterations suffice.
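As a toy instance of this reduction (everything here, the box X = [0,1]ⁿ, the specific A, b, and the parameters, is an illustrative assumption, not from the talk), MWUs on the dual combined with a best-response oracle on the primal yield an average solution violating each constraint by at most ≈ ε:

```python
import numpy as np

def lp_feasibility(A, b, T, rho):
    # Dual player: MWU / entropy mirror descent over the simplex of constraints.
    # Primal player: best response over the box X = [0,1]^n.
    # Target: x in X with b - Ax >= 0 (approximately).
    m, n = A.shape
    step = np.sqrt(2 * np.log(m) / T) / rho
    p = np.ones(m) / m
    x_avg = np.zeros(n)
    for _ in range(T):
        x = (p @ A <= 0).astype(float)   # best response: maximize p^T (b - Ax)
        slack = b - A @ x                 # subgradient of f(p) = max_x p^T (b - Ax)
        p *= np.exp(-step * slack)        # shift dual weight toward violated constraints
        p /= p.sum()
        x_avg += x / T
    return x_avg

# x1 + x2 <= 1.2 and -2*x1 + x2 <= 0.5 over the box [0,1]^2 (a feasible system).
A = np.array([[1.0, 1.0], [-2.0, 1.0]])
b = np.array([1.2, 0.5])
x_bar = lp_feasibility(A, b, T=4000, rho=2.5)  # rho bounds ||b - Ax||_inf over the box
print((b - A @ x_bar).min())                   # >= -eps for eps ~ rho*sqrt(2 log m / T)
```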
MWU and s-t Maxflow

Maxflow feasibility for value F over an undirected graph G with incidence matrix B:
∀e ∈ E: F·|f_e|/c_e ≤ 1, and B^T f = e_s − e_t (will enforce this).

Turn into a non-smooth minimization problem over the simplex:
f(p) = min_{B^T f = e_s − e_t} Σ_{e∈E} p_e·(F·|f_e|/c_e − 1)

The best response f_p is a shortest s-t path with lengths p_e/c_e.
For any p, if f_p has length > 1, the problem is infeasible; otherwise, the following is a subgradient:
∂f(p)_e = F·|(f_p)_e|/c_e − 1

Unfortunately, the width can be large: ‖∂f(p)‖∞ ≤ F/c_min, giving [PST 91]: T = O(F·log n/(ε²·c_min)).
Width Reduction: Make the Primal Nicer

PROBLEM: This bound is optimal for this specific formulation; improving it needs a primal argument.
SOLUTION: Regularize the primal:
f(p) = min_{B^T f = e_s − e_t} F·Σ_{e∈E} (f_e/c_e)·(p_e + ε/m) − 1

REGULARIZATION ERROR: εF
NEW WIDTH: ‖∂f(p)‖∞ ≤ m/ε
ITERATION BOUND: [GK 98] T = O(m·log n/ε²)
Electrical Flow Approach [CKMST]

A different formulation yields the basis for the CKMST algorithm:
∀e ∈ E: F·f_e²/c_e² ≤ 1, and B^T f = e_s − e_t (will enforce this).

Non-smooth optimization problem:
f(p) = min_{B^T f = e_s − e_t} Σ_{e∈E} p_e·(F·f_e²/c_e² − 1)

The best response is an electrical flow f_p. Original width: ‖∂f(p)‖∞ ≤ m.

Regularize the primal:
f(p) = min_{B^T f = e_s − e_t} F·Σ_{e∈E} (f_e²/c_e²)·(p_e + ε/m) − 1

New width: ‖∂f(p)‖∞ ≤ √(m/ε)
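The best response in this formulation is an electrical flow, computable from a weighted Laplacian system. A small dense-algebra sketch (illustrative only: the graph and resistances are made up, and practical implementations such as CKMST rely on fast approximate Laplacian solvers rather than a pseudoinverse):

```python
import numpy as np

# Edge-vertex incidence matrix B (|E| x |V|) of a small undirected graph.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n, m = 4, len(edges)
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = 1.0, -1.0

def electrical_flow(r, s, t):
    # Best response of the quadratic formulation: minimize sum_e r_e f_e^2
    # subject to B^T f = e_s - e_t. Solution: f = R^{-1} B phi, where the
    # vertex potentials phi solve L phi = e_s - e_t with L = B^T R^{-1} B
    # the weighted graph Laplacian.
    Rinv = np.diag(1.0 / r)
    L = B.T @ Rinv @ B
    demand = np.zeros(n)
    demand[s], demand[t] = 1.0, -1.0
    phi = np.linalg.pinv(L) @ demand   # exact since demand is orthogonal to 1
    return Rinv @ B @ phi

r = np.ones(m)            # resistances r_e, standing in for p_e / c_e^2 up to scaling
f = electrical_flow(r, s=0, t=3)
print(B.T @ f)            # flow conservation: equals e_s - e_t
```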
Conclusion: Take-away Messages

• Regularization is a powerful tool for the design of fast algorithms.
• Most iterative algorithms can be understood as regularized updates: MWUs, width reduction, interior point, gradient descent, …
• These methods perform well in practice, and regularization also helps eliminate noise.
• ULTIMATE GOAL: the development of a library of iterative methods for fast graph algorithms. Regularization plays a fundamental role in this effort.
THE END – THANK YOU