Transcript
Page 1: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

ITERATIVE METHODS AND REGULARIZATION

IN THE DESIGN OF FAST ALGORITHMS

Lorenzo Orecchia, MIT Math

A unified framework for optimization and online learning

beyond Multiplicative Weight Updates

Pages 2-4:

Talk Outline: A Tale of Two Halves

PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING

• Online Linear Optimization

• Online Linear Optimization over Simplex and Multiplicative Weight Updates (MWUs)

• A Regularization Framework to generalize MWUs: Follow the Regularized Leader

MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE

Optimization: Regularized Updates

Online Learning: Multiplicative Weight Updates (MWUs)

PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW

• Non-smooth vs Smooth Convex Optimization

• Non-smooth Convex Optimization reduces to Online Linear Optimization

• Application: Understanding Undirected Maxflow algorithms based on MWUs

MESSAGE: FASTEST ALGORITHMS REQUIRE PRIMAL-DUAL APPROACH

Page 5:

TOC Applications of MWUs

Fast Algorithms for solving specific LPs and SDPs:

• Maximum Flow problems [PST], [GK], [F], [CKMST]

• Covering-packing problems [PST]

• Oblivious routing [R], [M]

Fast Approximation Algorithms based on LP and SDP relaxations:

• Maxcut [AK]

• Graph Partitioning Problems [AK], [S], [OSV]

Proof Technique:

• Hardcore Lemma [BHK]

• QIP = PSPACE [W]

• Derandomization [Y]

… and more

Page 6:

Machine Learning meets Optimization meets TCS

These techniques have been rediscovered multiple times in different fields:

Machine Learning, Convex Optimization, TCS

Three surveys emphasizing the different viewpoints and literatures:

1) ML: Prediction, Learning, and Games by Cesa-Bianchi and Lugosi

2) Optimization: Lectures on Modern Convex Optimization by Ben-Tal and Nemirovski

3) TCS: The Multiplicative Weights Update Method: a Meta-Algorithm and Applications by Arora, Hazan and Kale

Page 7:

REGULARIZATION 101

Pages 8-9:

What is Regularization?

Regularization is a fundamental technique in optimization

[Diagram: OPTIMIZATION PROBLEM → WELL-BEHAVED OPTIMIZATION PROBLEM, via Regularizer $F$ with parameter $\lambda > 0$]

WELL-BEHAVED OPTIMIZATION PROBLEM:

• Stable optimum

• Unique optimal solution

• Smoothness conditions

Benefits of Regularization in Learning and Statistics:

• Prevents overfitting

• Increases stability

• Decreases sensitivity to random noise

Page 10:

Example: Regularization Helps Stability

Consider a convex set $S \subset \mathbb{R}^n$ and a linear optimization problem:

$f(c) = \arg\min_{x \in S} c^T x$

The optimal solution $f(c)$ may be very unstable under perturbation of $c$:

$\|c' - c\| \le \delta$ and $\|f(c') - f(c)\| \gg \delta$

[Figure: the set $S$ with two nearby cost vectors $c$, $c'$ and far-apart optima $f(c)$, $f(c')$]

Pages 11-12:

Example: Regularization Helps Stability

Consider a convex set $S \subset \mathbb{R}^n$ and a regularized linear optimization problem

$f(c) = \arg\min_{x \in S} c^T x + F(x)$

where $F$ is $\sigma$-strongly convex.

Then: $\|c' - c\| \le \delta$ implies $\|f(c') - f(c)\| \le \frac{\delta}{\sigma}$

[Figure: the curves $c^T x + F(x)$ and $c'^T x + F(x)$ with minimizers $f(c)$ and $f(c')$; the two curves differ by a linear term with $\|\text{slope}\| \le \delta$]
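The stability claim on these slides can be checked numerically. The sketch below is an illustration under assumptions not in the talk: it takes $S$ to be the unit Euclidean ball and $F(x) = \frac{\sigma}{2}\|x\|_2^2$ (which is $\sigma$-strongly convex in the Euclidean norm), and all function names are hypothetical.

```python
import math

def linear_argmin_ball(c):
    """argmin of c^T x over the unit Euclidean ball: the boundary point -c/||c||."""
    n = math.hypot(*c)
    return [-ci / n for ci in c]

def regularized_argmin_ball(c, sigma):
    """argmin of c^T x + (sigma/2)*||x||^2 over the unit Euclidean ball.

    The objective equals (sigma/2)*||x + c/sigma||^2 up to a constant, so the
    constrained minimizer is the projection of -c/sigma onto the ball.
    """
    x = [-ci / sigma for ci in c]
    n = math.hypot(*x)
    return x if n <= 1.0 else [xi / n for xi in x]

def dist(a, b):
    """Euclidean distance between two points."""
    return math.hypot(*(ai - bi for ai, bi in zip(a, b)))

# Two cost vectors that are very close to each other (assumed example data).
c, c_prime = [1e-3, 0.0], [-1e-3, 0.0]
delta = dist(c, c_prime)
sigma = 1.0

unstable_gap = dist(linear_argmin_ball(c), linear_argmin_ball(c_prime))
stable_gap = dist(regularized_argmin_ball(c, sigma),
                  regularized_argmin_ball(c_prime, sigma))
```

Without the regularizer the optimum jumps across the whole ball even though the costs moved by only `delta`; with it, the optimum moves by at most `delta / sigma`.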

Page 13:

ONLINE LINEAR OPTIMIZATION

AND

MULTIPLICATIVE WEIGHT UPDATES

Pages 14-19:

Online Linear Minimization

SETUP: Convex set $X \subseteq \mathbb{R}^n$, generic norm $\|\cdot\|$, repeated game over $T$ rounds.

At round $t$:

1) ALGORITHM plays the current solution $x^{(t)} \in X$.

2) ADVERSARY reveals the current linear objective (loss vector) $\ell^{(t)} \in \mathbb{R}^n$ with $\|\ell^{(t)}\|_* \le \rho$.

3) Algorithm's loss is $(\ell^{(t)})^T x^{(t)}$.

4) ALGORITHM picks the updated solution $x^{(t+1)} \in X$, and ADVERSARY picks the new loss vector $\ell^{(t+1)} \in \mathbb{R}^n$ with $\|\ell^{(t+1)}\|_* \le \rho$.

GOAL: update $x^{(t)}$ to minimize regret

$\frac{1}{T} \sum_{t=1}^{T} (\ell^{(t)})^T x^{(t)} - \min_{x \in X} \frac{1}{T} \sum_{t=1}^{T} (\ell^{(t)})^T x$

(average algorithm's loss minus a posteriori optimum)

Pages 20-23:

Simplex Case: Learning with Experts

SETUP: Simplex $X \subseteq \mathbb{R}^n$ under $\ell_1$ norm.

At round $t$:

1) ALGORITHM plays $p^{(t)}$, a distribution over dimensions, i.e. experts.

2) ADVERSARY reveals the experts' losses $\ell^{(t)}$ with $\|\ell^{(t)}\|_\infty \le \rho$.

3) Algorithm's loss is $E_{i \leftarrow p^{(t)}}\left[\ell_i^{(t)}\right] = (p^{(t)})^T \ell^{(t)}$.

4) ALGORITHM updates the distribution to $p^{(t+1)}$.

Pages 24-27:

Simplex Case: Multiplicative Weight Updates

ALGORITHM plays $p^{(t)}$; ADVERSARY reveals $\ell^{(t)}$.

MULTIPLICATIVE WEIGHT UPDATE

Weights: $w_i^{(t+1)} = (1 - \epsilon)^{\ell_i^{(t)}} w_i^{(t)}$, $w^{(1)} = \vec{1}$

Distribution: $p_i^{(t+1)} = \frac{w_i^{(t+1)}}{\sum_{j=1}^{n} w_j^{(t+1)}}$

$\epsilon \in (0, 1)$: values near 0 are CONSERVATIVE, values near 1 are AGGRESSIVE
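The weight and distribution updates above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `mwu` and the example loss sequence are assumptions.

```python
def mwu(loss_vectors, eps):
    """Run Multiplicative Weight Updates on a sequence of loss vectors.

    Returns the distributions p^(1), ..., p^(T+1) and the algorithm's
    total loss sum_t p^(t) . l^(t).
    """
    n = len(loss_vectors[0])
    w = [1.0] * n                      # w^(1) is the all-ones vector
    total_loss = 0.0
    history = []
    for loss in loss_vectors:
        s = sum(w)
        p = [wi / s for wi in w]       # normalize weights to a distribution
        history.append(p)
        total_loss += sum(pi * li for pi, li in zip(p, loss))
        # multiplicative update: w_i <- (1 - eps)^(l_i) * w_i
        w = [wi * (1.0 - eps) ** li for wi, li in zip(w, loss)]
    s = sum(w)
    history.append([wi / s for wi in w])
    return history, total_loss

# Assumed example: expert 0 never incurs loss, the other two always lose 1.
losses = [[0.0, 1.0, 1.0]] * 50
dists, alg_loss = mwu(losses, eps=0.1)
```

The first distribution is uniform, and after 50 rounds almost all probability mass sits on expert 0. A larger `eps` would concentrate faster (aggressive), a smaller one more slowly (conservative).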

Page 28:

MWUs: Unraveling the Update

ALGORITHM plays $p^{(t)}$; ADVERSARY reveals $\ell^{(t)}$.

Update: $p_i^{(t+1)} \propto w_i^{(t+1)} = (1 - \epsilon)^{\ell_i^{(t)}} \cdot w_i^{(t)}$

Unraveled: $w_i^{(t+1)} = (1 - \epsilon)^{\sum_{s=1}^{t} \ell_i^{(s)}}$

[Figure: WEIGHT $w_i^{(t+1)}$ plotted against CUMULATIVE LOSS $\sum_{s=1}^{t} \ell_i^{(s)}$]

Pages 29-30:

MWUs: Regret Bound

ALGORITHM plays $p^{(t)}$; ADVERSARY reveals $\ell^{(t)}$.

Update: $p_i^{(t+1)} \propto w_i^{(t+1)} = (1 - \epsilon)^{\ell_i^{(t)}} \cdot w_i^{(t)}$

For $\|\ell^{(t)}\|_\infty \le \rho$ and $\epsilon < \frac{1}{2}$:

$\hat{L} - L^\star \le \frac{\rho \log n}{\epsilon T} + \rho \epsilon$

• $\hat{L} - L^\star$: Algorithm's Regret

• $\frac{\rho \log n}{\epsilon T}$: Start-up Penalty

• $\rho \epsilon$: Penalty for being greedy
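A short worked step, not on the slides, shows how the two penalties trade off. The start-up penalty shrinks as $\epsilon$ grows while the greediness penalty grows, so the bound is minimized where they are equal. This assumes $T \ge 4 \log n$, so that the resulting $\epsilon$ respects the requirement $\epsilon < \frac{1}{2}$ (up to the boundary case).

```latex
\frac{\rho \log n}{\epsilon T} = \rho \epsilon
\quad\Longleftrightarrow\quad
\epsilon = \sqrt{\frac{\log n}{T}},
\qquad\text{which gives}\qquad
\hat{L} - L^\star \;\le\; 2 \rho \sqrt{\frac{\log n}{T}} .
```

With this choice the average regret decays like $1/\sqrt{T}$ and depends only logarithmically on the number of experts $n$.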

Page 31:

ONLINE LINEAR OPTIMIZATION BEYOND MWUs

A REGULARIZATION FRAMEWORK

Pages 32-35:

MWUs: Proof Sketch of Regret Bound

Update: $p_i^{(t+1)} \propto w_i^{(t+1)} = (1 - \epsilon)^{\sum_{s=1}^{t} \ell_i^{(s)}}$

• Proof is potential function argument

$\Phi^{(t+1)} = \log_{1-\epsilon} \sum_{i=1}^{n} w_i^{(t+1)}$

• Potential function bounds loss of best expert

$\Phi^{(t+1)} \le \min_{i=1}^{n} \log_{1-\epsilon} w_i^{(t+1)} = \min_{i=1}^{n} \left( \sum_{s=1}^{t} \ell_i^{(s)} \right)$

• Potential function is related to algorithm's performance

$\Phi^{(t+1)} - \Phi^{(t)} \ge \left( (\ell^{(t)})^T p^{(t)} \right) - \epsilon$

DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?

Pages 36-40:

MWUs AND APPLICATIONS

Designing a Regularized Update

GOAL: Design an update and its potential function analysis

QUESTION: Choice of potential function?

DESIDERATA: 1) lower bounds best expert's loss

2) tracks algorithm's performance

Attempt 1 – FOLLOW THE LEADER:

Cumulative loss: $L^{(t)} = \sum_{s=1}^{t} \ell^{(s)}$

Pick best current solution: $x^{(t+1)} = \arg\min_{x \in X} x^T L^{(t)}$

Potential is current best loss: $\Phi^{(t+1)} = \min_{x \in X} x^T L^{(t)}$

Fails if the best expert changes drastically

How to make update more stable?
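The failure mode of Follow the Leader can be reproduced with two experts and an alternating loss sequence. The sketch below is an illustration with assumed data (the function name and the loss sequence are not from the talk): the leader flips every round, and the algorithm always plays the expert that is about to lose.

```python
def follow_the_leader(loss_vectors):
    """Play the expert with the smallest cumulative loss so far.

    Ties are broken toward the lowest index. Returns the algorithm's total
    loss and the total loss of the best single expert in hindsight.
    """
    n = len(loss_vectors[0])
    cumulative = [0.0] * n
    alg_loss = 0.0
    for loss in loss_vectors:
        leader = min(range(n), key=lambda i: cumulative[i])
        alg_loss += loss[leader]
        cumulative = [c + l for c, l in zip(cumulative, loss)]
    return alg_loss, min(cumulative)

# Assumed adversarial sequence: a small first loss breaks the tie, then the
# losses alternate so that the current leader is always the one penalized.
T = 101
losses = [[0.5, 0.0]] + [
    [0.0, 1.0] if t % 2 == 0 else [1.0, 0.0] for t in range(2, T + 1)
]
alg_loss, best_loss = follow_the_leader(losses)
avg_regret = (alg_loss - best_loss) / T
```

Here the algorithm loses about 1 per round while the best fixed expert loses about 1/2 per round, so the average regret stays near 1/2 instead of vanishing as $T$ grows.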

Pages 41-42:

MWUs AND APPLICATIONS

Regularized Update: Definition

QUESTION: Choice of potential function?

DESIDERATA: 1) lower bounds best expert's loss

2) tracks algorithm's performance

Attempt 2 – FOLLOW THE REGULARIZED LEADER:

$x^{(t+1)} = \arg\min_{x \in X} x^T L^{(t)} + \eta \cdot F(x)$

$\Phi^{(t+1)} = \min_{x \in X} x^T L^{(t)} + \eta \cdot F(x)$

Parameter $\eta \ge 0$, TBD

Properties of Regularizer $F(x)$:

1. Convex, differentiable

2. $\sigma$-strongly convex w.r.t. norm

These properties are actually sufficient to get a regret bound

Pages 43-45:

MWUs AND APPLICATIONS

Regularized Update: Analysis

QUESTION: Choice of potential function?

DESIDERATA: 1) lower bounds best expert's loss

2) tracks algorithm's performance

Attempt 2 – FOLLOW THE REGULARIZED LEADER:

$x^{(t+1)} = \arg\min_{x \in X} x^T L^{(t)} + \eta \cdot F(x)$

$\Phi^{(t+1)} = \min_{x \in X} x^T L^{(t)} + \eta \cdot F(x)$

Parameter $\eta \ge 0$, TBD

Properties of Regularizer $F(x)$:

1. Convex, differentiable

2. $\sigma$-strongly convex w.r.t. norm

Lower bound on the best loss, up to the regularization error $\eta \cdot \max_{x \in X} F(x)$:

$\Phi^{(t+1)} \le \min_{x \in X} (L^{(t)})^T x + \eta \cdot \max_{x \in X} F(x)$

Tracking the algorithm's performance: ?

Here $f^{(t+1)}(x)$ denotes the regularized objective $x^T L^{(t)} + \eta \cdot F(x)$.

Pages 46-50:

Tracking the Algorithm: Proof by Picture

Define: $f^{(t+1)}(x) = (L^{(t)})^T x + \eta \cdot F(x)$

Notice: $f^{(t+1)}(x) - f^{(t)}(x) = (\ell^{(t)})^T x$, where $\ell^{(t)}$ is the latest loss vector

Compare: $(\ell^{(t)})^T x^{(t)}$ and $\Phi^{(t+1)} - \Phi^{(t)}$

$\Phi^{(t+1)} - \Phi^{(t)} = f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) + (\ell^{(t)})^T x^{(t)}$

Want: $f^{(t+1)}(x^{(t)}) \approx f^{(t+1)}(x^{(t+1)})$

[Figure: the curves $f^{(t)}(x)$ and $f^{(t+1)}(x)$ against $x$, with minimizers $x^{(t)}$ and $x^{(t+1)}$, minimum values $\Phi^{(t)}$ and $\Phi^{(t+1)}$, and the gap $(\ell^{(t)})^T x^{(t)}$ between the curves at $x^{(t)}$]

Pages 51-53:

Regularization in Action

$f^{(t+1)}(x) = (L^{(t)})^T x + \eta \cdot F(x)$

REGULARIZATION: $f^{(t)}$ is $(\eta \cdot \sigma)$-strongly-convex

STABILITY: $\|f^{(t+1)} - f^{(t)}\|_* = \|\ell^{(t)}\|_*$ implies $\|x^{(t+1)} - x^{(t)}\| \le \frac{\|\ell^{(t)}\|_*}{\eta \cdot \sigma}$

[Figure: the curves $f^{(t)}(x)$ and $f^{(t+1)}(x)$ against $x$, with $x^{(t)}$, $x^{(t+1)}$, $\Phi^{(t)}$, $\Phi^{(t+1)}$, the gap $(\ell^{(t)})^T x^{(t)}$, and a quadratic lower bound to $f^{(t+1)}$]

Pages 54-55:

MWUs AND APPLICATIONS

Analysis: Progress in One Iteration

$\Phi^{(t+1)} - \Phi^{(t)} = f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) + (\ell^{(t)})^T x^{(t)}$

$\nabla f^{(t+1)}(x^{(t)}) = \ell^{(t)}$

$f^{(t+1)}$ is $(\eta \cdot \sigma)$-strongly-convex

$\|x^{(t+1)} - x^{(t)}\| \le \frac{\|\ell^{(t)}\|_*}{\eta \cdot \sigma}$

$f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) \ge (\ell^{(t)})^T (x^{(t+1)} - x^{(t)}) + \frac{\eta \cdot \sigma}{2} \|x^{(t+1)} - x^{(t)}\|^2$

$\ge -\|\ell^{(t)}\|_* \|x^{(t+1)} - x^{(t)}\| + \frac{\eta \cdot \sigma}{2} \|x^{(t+1)} - x^{(t)}\|^2 \ge -\frac{\|\ell^{(t)}\|_*^2}{2 \eta \cdot \sigma}$

(the last step minimizes the quadratic over the value of $\|x^{(t+1)} - x^{(t)}\|$)

Pages 56-59:

MWUs AND APPLICATIONS

Completing the Analysis

Progress in one iteration:

$\Phi^{(t+1)} - \Phi^{(t)} \ge (\ell^{(t)})^T x^{(t)} - \frac{\|\ell^{(t)}\|_*^2}{2 \sigma \eta}$

Regret at iteration t: the term $\frac{\|\ell^{(t)}\|_*^2}{2 \sigma \eta}$

Telescopic sum:

$\Phi^{(T+1)} \ge \sum_{t=1}^{T} (\ell^{(t)})^T x^{(t)} + \Phi^{(1)} - T \cdot \frac{\max_t \|\ell^{(t)}\|_*^2}{2 \eta \cdot \sigma}$

Final regret bound, with regularizer $F$ and $\|\ell^{(t)}\|_* \le \rho$:

$\frac{1}{T} \left( \sum_{t=1}^{T} (\ell^{(t)})^T x^{(t)} - \min_{x \in X} \sum_{t=1}^{T} (\ell^{(t)})^T x \right) \le \frac{\eta}{T} \cdot \left( \max_{x \in X} F(x) - \min_{x \in X} F(x) \right) + \frac{\rho^2}{2 \sigma \eta}$

• $\frac{\eta}{T} \cdot (\max_{x \in X} F(x) - \min_{x \in X} F(x))$: Start-up Penalty

• $\frac{\rho^2}{2 \sigma \eta}$: Penalty for being greedy

SAME TYPE OF BOUND AS FOR MWUs
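The final regret bound can be sanity-checked on a concrete instance. The sketch below is an illustration under assumptions not in the talk: $X$ is the unit Euclidean ball, $F(x) = \frac{1}{2}\|x\|_2^2$ (so $\sigma = 1$ and $\max F - \min F = \frac{1}{2}$), and the loss sequence and function names are hypothetical.

```python
import math

def ftrl_euclidean_ball(loss_vectors, eta):
    """Follow the Regularized Leader on the unit ball with F(x) = 0.5*||x||^2.

    The update argmin_x x.L + eta*F(x) over the ball is the projection of
    -L/eta onto the ball. Returns the average regret over all rounds.
    """
    n = len(loss_vectors[0])
    L = [0.0] * n                       # cumulative loss seen so far
    alg_loss = 0.0
    for loss in loss_vectors:
        x = [-Li / eta for Li in L]
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm > 1.0:                  # project back onto the unit ball
            x = [xi / norm for xi in x]
        alg_loss += sum(li * xi for li, xi in zip(loss, x))
        L = [Li + li for Li, li in zip(L, loss)]
    best = -math.sqrt(sum(Li * Li for Li in L))   # min of x.L over the ball
    return (alg_loss - best) / len(loss_vectors)

def regret_bound(eta, T, f_range, rho, sigma):
    """Start-up penalty plus greediness penalty from the final regret bound."""
    return eta / T * f_range + rho ** 2 / (2.0 * sigma * eta)

# Assumed loss sequence with Euclidean norm at most rho = 1.
T = 200
losses = [
    [math.cos(0.3 * t), math.sin(0.3 * t)] if t % 3 else [1.0, 0.0]
    for t in range(T)
]
eta = math.sqrt(T)
avg_regret = ftrl_euclidean_ball(losses, eta)
bound = regret_bound(eta, T, f_range=0.5, rho=1.0, sigma=1.0)
```

On this instance the measured average regret should sit below the bound, which evaluates to $1/\sqrt{T}$ for the chosen `eta`.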

Pages 60-62:

MWUs AND APPLICATIONS

Reinterpreting MWUs

Potential function: $\Phi^{(t+1)} = \min_{p \ge 0, \sum_i p_i = 1} p^T L^{(t)} + \eta \cdot \sum_{i=1}^{n} p_i \log p_i$

Regularizer: $F(p) = \sum_{i=1}^{n} p_i \log p_i$ is negative entropy

$F(p)$ is 1-strongly-convex w.r.t. $\|\cdot\|_1$

Update: $p^{(t+1)} = \arg\min_{p \ge 0, \sum_i p_i = 1} p^T L^{(t)} + \eta \cdot \sum_{i=1}^{n} p_i \log p_i$

SOFT-MAX: $p_i^{(t+1)} = \frac{e^{-\frac{1}{\eta} L_i^{(t)}}}{\sum_{j=1}^{n} e^{-\frac{1}{\eta} L_j^{(t)}}} = \frac{(1 - \epsilon)^{L_i^{(t)}}}{\sum_{j=1}^{n} (1 - \epsilon)^{L_j^{(t)}}}$, with $1 - \epsilon = e^{-1/\eta}$
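The identification of MWUs with entropy-regularized Follow the Regularized Leader can be checked numerically. The sketch below is an illustration with assumed names and data: it compares the soft-max closed form against normalized MWU weights under the correspondence $1 - \epsilon = e^{-1/\eta}$, and evaluates the regularized objective at a few points.

```python
import math

def entropy_ftrl(L, eta):
    """Soft-max minimizer of p.L + eta * sum_i p_i log p_i over the simplex."""
    m = min(L)                                   # shift for numerical stability
    z = [math.exp(-(Li - m) / eta) for Li in L]
    s = sum(z)
    return [zi / s for zi in z]

def mwu_distribution(L, eps):
    """Normalized MWU weights (1 - eps)^(L_i) for cumulative losses L."""
    w = [(1.0 - eps) ** Li for Li in L]
    s = sum(w)
    return [wi / s for wi in w]

def objective(p, L, eta):
    """Linear loss plus eta times negative entropy (0 log 0 treated as 0)."""
    linear = sum(pi * Li for pi, Li in zip(p, L))
    neg_entropy = sum(pi * math.log(pi) for pi in p if pi > 0)
    return linear + eta * neg_entropy

L = [3.0, 1.0, 4.0, 1.5]                 # assumed cumulative losses
eta = 2.0
eps = 1.0 - math.exp(-1.0 / eta)         # the matching MWU parameter
p_ftrl = entropy_ftrl(L, eta)
p_mwu = mwu_distribution(L, eps)
uniform = [0.25] * 4
other = [0.1, 0.6, 0.1, 0.2]
```

The two distributions agree up to floating-point error, and the soft-max point has a lower regularized objective than the other feasible points tried.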

Pages 63-64:

MWUs AND APPLICATIONS

Beyond MWUs: which regularizer?

Regret bound: optimizing over $\eta$

$\frac{1}{T} \left( \sum_{t=1}^{T} (\ell^{(t)})^T x^{(t)} - \min_{x \in X} \sum_{t=1}^{T} (\ell^{(t)})^T x \right) \le \frac{\rho \sqrt{2 \cdot (\max_{x \in X} F(x) - \min_{x \in X} F(x))}}{\sqrt{\sigma T}}$

Best choice of regularizer and norm minimizes

$\frac{\max_t \|\ell^{(t)}\|_*^2 \cdot (\max_{x \in X} F(x) - \min_{x \in X} F(x))}{\sigma}$

Negative entropy with $\ell_1$-norm is approximately optimal for simplex

QUESTION: are other regularizers ever useful?
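The optimization over $\eta$ can be made explicit. The sketch below is an illustration with assumed function names: it minimizes the bound $\frac{\eta}{T}(\max F - \min F) + \frac{\rho^2}{2\sigma\eta}$ in closed form and evaluates it for negative entropy on the simplex, where $\max F - \min F = \log n$ and $\sigma = 1$ with respect to the $\ell_1$ norm.

```python
import math

def ftrl_bound(eta, T, f_range, rho, sigma):
    """Regret bound before optimizing over eta."""
    return eta / T * f_range + rho ** 2 / (2.0 * sigma * eta)

def optimal_eta(T, f_range, rho, sigma):
    """Value of eta that equalizes the two penalty terms."""
    return rho * math.sqrt(T / (2.0 * sigma * f_range))

def optimized_bound(T, f_range, rho, sigma):
    """Regret bound after optimizing over eta."""
    return rho * math.sqrt(2.0 * f_range / (sigma * T))

# Assumed instance: n experts, losses bounded by rho in the infinity norm.
n, T, rho, sigma = 16, 1000, 1.0, 1.0
f_range = math.log(n)        # negative entropy ranges over [-log n, 0]
eta_star = optimal_eta(T, f_range, rho, sigma)
best = ftrl_bound(eta_star, T, f_range, rho, sigma)
```

Moving `eta` away from `eta_star` in either direction increases the bound, and the optimized value matches $\rho \sqrt{2 \log n / T}$ for this instance.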

Pages 65-67:

Different Regularizers in Algorithm Design

QUESTION 1:

Are other regularizers, besides entropy, ever useful?

YES! Applications:

• Graph Partitioning and Random Walks

Spectral algorithms for balanced separator running in time $\tilde{O}(m)$

Uses random-walk framework and SDP MWUs

Different walks correspond to different regularizers for eigenvector problem

Regularizers: $F(X) = \mathrm{Tr}(X \log X)$ (SDP MWU); $F(X) = \mathrm{Tr}(X^p)$ (p-norm, $1 \le p \le \infty$); $F(X) = \mathrm{Tr}(X^{1/2})$ (NEW REGULARIZER)

Random walks: Heat Kernel Random Walk; Lazy Random Walk; Personalized PageRank

[Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012]

• Sparsification

$\epsilon$-spectral-sparsifiers with $O\left(\frac{n \log n}{\epsilon^2}\right)$ edges

Uses Matrix concentration bound equivalent to SDP MWUs

[Spielman, Srivastava 2008]

$\epsilon$-spectral-sparsifiers with $O\left(\frac{n}{\epsilon^2}\right)$ edges

Can be interpreted as different regularizer: $F(X) = \mathrm{Tr}(X^{1/2})$

[Batson, Spielman, Srivastava 2009]

• Many more in Online Learning

Bandit Online Learning [AHR], …

Page 68:

NON-SMOOTH CONVEX OPTIMIZATION

REDUCES TO

ONLINE LINEAR OPTIMIZATION

Page 72: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Convex Optimization Setup

$\min_{x \in X} f(x)$

$f$ convex, differentiable

$X \subseteq \mathbb{R}^n$ closed, convex set

NON-SMOOTH: $\rho$-Lipschitz continuous

$\forall x \in X, \; \|\nabla f(x)\|_* \le \rho$

NO GRADIENT STEP GUARANTEE

ONLY DUAL GUARANTEE

SMOOTH: $L$-Lipschitz continuous gradient

$\forall x, y \in X, \; \|\nabla f(y) - \nabla f(x)\|_* \le L \|y - x\|$

Gradient step is guaranteed to decrease function value:

$f(x^{(t+1)}) \le f(x^{(t)}) - \frac{\|\nabla f(x^{(t)})\|_*^2}{2L}$

[Figure: iterates $x^{(t)}$, $x^{(t+1)}$]
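The smooth-case guarantee can be checked numerically. The following is a minimal sketch, assuming the unconstrained Euclidean setting (X is all of R^n, both norms are the 2-norm) and a hypothetical quadratic test function; `Q`, `gradient_step`, and the step size 1/L are illustrative choices, not taken from the slides.

```python
import numpy as np

def gradient_step(grad, x, L):
    """One Euclidean gradient step with step size 1/L."""
    return x - grad(x) / L

# Hypothetical smooth test function: f(x) = 0.5 * x^T Q x with Q positive definite.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
L = np.linalg.eigvalsh(Q).max()  # the gradient is L-Lipschitz in the 2-norm

x = np.array([1.0, -2.0])
x_next = gradient_step(grad, x, L)

decrease = f(x) - f(x_next)                           # actual progress of one step
guaranteed = np.linalg.norm(grad(x)) ** 2 / (2 * L)   # progress promised by smoothness
```

For a non-smooth function no such per-step decrease is available, which is why the slides switch to a dual (lower-bound) argument.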

Page 74: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Non-Smooth Setup: Dual Approach

$\min_{x \in X} f(x)$

$f$ convex, differentiable

$X \subseteq \mathbb{R}^n$ closed, convex set

$\rho$-Lipschitz continuous: $\forall x \in X, \; \|\nabla f(x)\|_* \le \rho$

[Figure: iterates $x^{(t)}$, $x^{(t+1)}$, $x^{(t+2)}$]

APPROACH: Each iterate solution provides a lower bound and an upper bound

$f(x^{(t)}) \ge f(x^*)$

$f(x^*) \ge f(x^{(t)}) + \nabla f(x^{(t)})^T (x^* - x^{(t)})$

CAN WEAKEN DIFFERENTIABILITY ASSUMPTION: SUBGRADIENTS SUFFICE

Page 77: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Non-Smooth Setup: Dual Approach

[Figure: iterates $x^{(t)}$, $x^{(t+1)}$, $x^{(t+2)}$]

APPROACH: Each iterate solution provides a lower bound and an upper bound

$f(x^{(t)}) \ge f(x^*)$

$f(x^*) \ge f(x^{(t)}) + \nabla f(x^{(t)})^T (x^* - x^{(t)})$

Take convex combination of both upper bounds and lower bounds with weights $\gamma_t$

UPPER: $\frac{1}{\sum_{t=1}^T \gamma_t} \left( \sum_{t=1}^T \gamma_t f(x^{(t)}) \right) \ge f(x^*)$

LOWER: $f(x^*) \ge \frac{1}{\sum_{t=1}^T \gamma_t} \left[ \sum_{t=1}^T \gamma_t \left( f(x^{(t)}) + \nabla f(x^{(t)})^T (x^* - x^{(t)}) \right) \right]$

HOW TO UPDATE ITERATES?

HOW TO CHOOSE WEIGHTS?

Page 78: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

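Reduction to Online Linear Minimization

Fix weights $\gamma_t$ to be uniform for simplicity:

UPPER: $\frac{1}{\sum_{t=1}^T \gamma_t} \left( \sum_{t=1}^T \gamma_t f(x^{(t)}) \right) \ge f(x^*)$

LOWER: $f(x^*) \ge \frac{1}{\sum_{t=1}^T \gamma_t} \left[ \sum_{t=1}^T \gamma_t \left( f(x^{(t)}) + \nabla f(x^{(t)})^T (x^* - x^{(t)}) \right) \right]$

DUALITY GAP: $\left[ \frac{\sum_{t=1}^T \gamma_t f(x^{(t)})}{\sum_{t=1}^T \gamma_t} \right] - f(x^*) \le \sum_{t=1}^T -\nabla f(x^{(t)})^T (x^* - x^{(t)})$

LINEAR FUNCTION (the right-hand side)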

Page 80: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Reduction to Online Linear Minimization

Fix weights $\gamma_t$ to be uniform for simplicity:

DUALITY GAP: $\left[ \frac{\sum_{t=1}^T \gamma_t f(x^{(t)})}{\sum_{t=1}^T \gamma_t} \right] - f(x^*) \le \sum_{t=1}^T -\nabla f(x^{(t)})^T (x^* - x^{(t)})$

ONLINE SETUP

ALGORITHM: $x^{(t)} \in X$

ADVERSARY: $\ell^{(t)} = -\nabla f(x^{(t)})$ (Loss vector is gradient)

Recall that by assumption:

$\|\ell^{(t)}\|_* = \|\nabla f(x^{(t)})\|_* \le \rho$

Page 81: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

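Reduction to Online Linear Minimization

Fix weights $\gamma_t$ to be uniform for simplicity:

DUALITY GAP: $\left[ \sum_{t=1}^T \frac{1}{T} f(x^{(t)}) \right] - f(x^*) \le \frac{1}{T} \cdot \sum_{t=1}^T -\nabla f(x^{(t)})^T (x^* - x^{(t)})$

ONLINE SETUP

ALGORITHM: $x^{(t)} \in X$

ADVERSARY: $\ell^{(t)} = -\nabla f(x^{(t)})$ (Loss vector is gradient)

Recall that by assumption:

$\|\ell^{(t)}\|_* = \|\nabla f(x^{(t)})\|_* \le \rho$

$\frac{1}{T} \cdot \sum_{t=1}^T -\nabla f(x^{(t)})^T (x^* - x^{(t)}) = \text{REGRET}$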
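The reduction rests on one fact: by convexity, the average optimality gap of any sequence of iterates is at most the average linearized regret against x*. A minimal sketch of that inequality follows, assuming a hypothetical non-smooth objective with known minimizer `c` and arbitrary (here random) iterates; none of these names come from the slides.

```python
import numpy as np

c = np.array([0.5, -1.0, 2.0])  # minimizer x* of the toy objective below

def f(x):
    """Toy convex, non-smooth objective with f(c) = 0."""
    d = x - c
    return np.abs(d).sum() + 0.5 * d @ d

def subgrad(x):
    """A subgradient of f at x."""
    d = x - c
    return np.sign(d) + d

rng = np.random.default_rng(0)
iterates = rng.normal(size=(50, 3))  # any points work: the bound is pure convexity

# Left-hand side: uniform average of f(x_t) minus f(x*).
avg_gap = np.mean([f(x) for x in iterates]) - f(c)
# Right-hand side: (1/T) * sum_t  -grad f(x_t)^T (x* - x_t).
avg_regret = np.mean([subgrad(x) @ (x - c) for x in iterates])
```

Since the inequality holds for every sequence, any online algorithm with small regret against linear losses immediately yields a small optimization error.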

Page 83: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Final Bound

ONLINE SETUP

ALGORITHM: $x^{(t)} \in X$

ADVERSARY: $\ell^{(t)} = -\nabla f(x^{(t)})$ (Loss vector is gradient)

Recall that by assumption:

$\|\ell^{(t)}\|_* = \|\nabla f(x^{(t)})\|_* \le \rho$

$\sum_{t=1}^T -\nabla f(x^{(t)})^T (x^* - x^{(t)}) = \text{REGRET}$

RESULTING ALGORITHM: MIRROR DESCENT

Error bound with $\sigma$-strongly-convex regularizer $F$:

$\epsilon_{MD} \le \frac{\rho \sqrt{2 \cdot (\max_{x \in X} F(x) - \min_{x \in X} F(x))}}{\sqrt{\sigma T}}$

ASYMPTOTICALLY OPTIMAL BY INFORMATION COMPLEXITY LOWER BOUND
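As an illustration of the error bound, here is a minimal sketch of mirror descent in its simplest instance: the Euclidean regularizer F(x) = 0.5·||x||², which is 1-strongly convex, over a ball of radius R. It reads the bound as ρ·sqrt(2·(max F − min F)/(σT)), which for this F equals ρR/sqrt(T). The objective, the step size, and all names are assumptions made for the example.

```python
import numpy as np

def projected_subgradient(subgrad, n, R, rho, T):
    """Mirror descent with F(x) = 0.5*||x||_2^2 over the ball {||x||_2 <= R}.

    Returns the uniform average of the iterates x_1, ..., x_T.
    """
    eta = R / (rho * np.sqrt(T))      # step size balancing the two regret terms
    x = np.zeros(n)                   # minimizer of F over the ball
    avg = np.zeros(n)
    for _ in range(T):
        avg += x / T
        x = x - eta * subgrad(x)      # gradient step in the dual (= primal here)
        norm = np.linalg.norm(x)
        if norm > R:                  # Euclidean projection back onto the ball
            x *= R / norm
    return avg

# Hypothetical non-smooth objective f(x) = ||x - c||_1 with minimizer c inside the ball.
c = np.array([0.3, -0.2, 0.1])
f = lambda x: np.abs(x - c).sum()
subgrad = lambda x: np.sign(x - c)

n, R, T = 3, 1.0, 400
rho = np.sqrt(n)                      # 2-norm bound on the subgradients sign(x - c)
x_bar = projected_subgradient(subgrad, n, R, rho, T)

error = f(x_bar) - f(c)
bound = rho * np.sqrt(2 * (R**2 / 2) / (1.0 * T))   # sigma = 1, max F - min F = R^2 / 2
```

Swapping the regularizer (and the matching projection) changes both the norm in which ρ is measured and the range max F − min F, which is the trade-off the talk exploits.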

Page 84: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Non-Smooth Optimization over Simplex

$\epsilon_{MD} \le \frac{\rho \sqrt{2 \log n}}{\sqrt{T}}$

RESULTING ALGORITHM:

MIRROR DESCENT OVER SIMPLEX = MWU

Regularizer $F$ is negative entropy, with $\|\nabla f(x^{(t)})\|_\infty \le \rho$
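A minimal sketch of mirror descent over the simplex with the negative-entropy regularizer, which is the multiplicative weight update. It also tracks the averaged upper and lower bounds from the dual approach, so the duality gap can be compared with ρ·sqrt(2 log n / T). The piecewise-linear objective f(p) = max_i (Cp)_i, the matrix `C`, and the step size are assumptions chosen for illustration.

```python
import numpy as np

def mwu_minimize(C, T):
    """Entropic mirror descent (MWU) for f(p) = max_i (C p)_i over the simplex.

    Returns (upper, lower, rho): averaged upper bound, certified lower bound
    on min f, and the infinity-norm bound on the subgradients.
    """
    n = C.shape[1]
    rho = np.abs(C).max()
    eta = np.sqrt(2 * np.log(n) / T) / rho
    p = np.full(n, 1.0 / n)               # uniform start = minimizer of negative entropy
    upper = 0.0                           # average of f(p_t)
    lin_const = 0.0                       # average of f(p_t) - g_t . p_t
    lin_grad = np.zeros(n)                # average of g_t
    for _ in range(T):
        vals = C @ p
        i = int(np.argmax(vals))
        g = C[i]                          # a subgradient of f at p
        upper += vals[i] / T
        lin_const += (vals[i] - g @ p) / T
        lin_grad += g / T
        p = p * np.exp(-eta * g)          # multiplicative update
        p /= p.sum()                      # renormalize onto the simplex
    # Minimum over the simplex of the averaged linear lower bound.
    lower = lin_const + lin_grad.min()
    return upper, lower, rho

rng = np.random.default_rng(0)
C = rng.uniform(-1.0, 1.0, size=(5, 4))
T = 500
upper, lower, rho = mwu_minimize(C, T)
gap_bound = rho * np.sqrt(2 * np.log(C.shape[1]) / T)
```

The pair (`upper`, `lower`) brackets the optimal value, so the run certifies its own accuracy without knowing the minimizer.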

Page 85: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

APPLICATIONS IN ALGORITHM DESIGN

Page 86: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Warm-up Example: Linear Programming

LP Feasibility problem

$A \in \mathbb{R}^{m \times n}$, $\exists? \, x \in X : b - Ax \ge 0$

Easy constraints ($x \in X$): Maintain feasible

Hard constraints ($b - Ax \ge 0$): Require fixing

Page 88: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Warm-up Example: Linear Programming

LP Feasibility problem

$A \in \mathbb{R}^{m \times n}$, $\exists? \, x \in X : b - Ax \ge 0$

Convert into non-smooth optimization problem over simplex:

$\min_{p \in \Delta_m} \max_{x \in X} p^T (b - Ax)$

Non-differentiable objective:

$f(p) = \max_{x \in X} p^T (b - Ax)$

Best response to dual solution $p$

Page 89: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Warm-up Example: Linear Programming

LP Feasibility problem

$A \in \mathbb{R}^{m \times n}$, $\exists? \, x \in X : b - Ax \ge 0$

Convert into non-smooth optimization problem over simplex:

$\min_{p \in \Delta_m} \max_{x \in X} p^T (b - Ax)$

Non-differentiable objective:

$f(p) = \max_{x \in X} p^T (b - Ax)$

Admits subgradients, for all $p$:

$x_p : p^T (b - A x_p) \ge 0, \quad (b - A x_p) \in \partial f(p)$

Subgradient is slack in constraints

Page 90: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Warm-up Example: Linear Programming

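LP Feasibility problem

$A \in \mathbb{R}^{m \times n}$, $\exists? \, x \in X : b - Ax \ge 0$

Convert into non-smooth optimization problem over simplex:

$\min_{p \in \Delta_m} \max_{x \in X} p^T (b - Ax)$

Non-differentiable objective:

$f(p) = \max_{x \in X} p^T (b - Ax)$

Admits subgradients, for all $p$:

$x_p : p^T (b - A x_p) \ge 0, \quad (b - A x_p) \in \partial f(p)$

If we can pick $x_p$ such that $\|b - A x_p\|_\infty \le \rho$, then

$\epsilon_{MD} \le \frac{\rho \sqrt{2 \log n}}{\sqrt{T}}$

$T \le \frac{2 \cdot \rho^2 \cdot \log n}{\epsilon^2}$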
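The LP reduction can be sketched end to end on a toy instance. This is a minimal sketch, assuming X is the unit box [0,1]^n so that the best response has a closed form; the matrix `A`, vector `b`, the helper names, and the use of log m (the number of hard constraints, i.e. the dimension of the simplex) in the iteration count are assumptions of this example.

```python
import numpy as np

def mwu_lp_feasibility(A, b, oracle, rho, eps):
    """MWU over the constraints of  exists x in X : b - A x >= 0.

    Returns an averaged point x_bar with b - A x_bar >= -eps componentwise,
    or None if some p certifies infeasibility.
    """
    m = len(b)
    T = int(np.ceil(2 * rho**2 * np.log(m) / eps**2))
    eta = np.sqrt(2 * np.log(m) / T) / rho
    p = np.full(m, 1.0 / m)
    x_avg = np.zeros(A.shape[1])
    for _ in range(T):
        x = oracle(p)                     # best response to dual solution p
        slack = b - A @ x                 # subgradient of f at p
        if p @ slack < -1e-12:
            return None                   # even the best x violates the p-weighted constraint
        x_avg += x / T
        p = p * np.exp(-eta * slack)      # violated constraints gain weight
        p /= p.sum()
    return x_avg

# Toy instance: x1 + x2 <= 1, x1 >= 0.4, x2 >= 0.4, written as b - A x >= 0.
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, -0.4, -0.4])

def box_oracle(p):
    """Maximize p^T (b - A x) over x in [0,1]^n: set x_j = 1 where (A^T p)_j < 0."""
    return (A.T @ p < 0).astype(float)

eps = 0.1
rho = 1.0                                 # max of ||b - A x||_inf over the box
x_bar = mwu_lp_feasibility(A, b, box_oracle, rho, eps)
```

Each oracle answer is a vertex of the box and may badly violate individual constraints; only the average is approximately feasible, which is why the width ρ governs the number of iterations.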

Page 91: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

MWU and s-t Maxflow

Maximum flow feasibility for value $F$ over undirected graph $G$ with incidence matrix $B$:

$\forall e \in E, \; \frac{F \cdot |f_e|}{c_e} \le 1$

$B^T f = e_s - e_t$ (Will enforce this)

Turn into non-smooth minimization problem over simplex:

$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} p_e \cdot \left( \frac{F \cdot |f_e|}{c_e} - 1 \right)$

Best response $f_p$ is shortest s-t path with lengths $p_e / c_e$.

For any $p$, if $f_p$ has length > 1, there is no subgradient, i.e. problem is infeasible.

Otherwise, the following is a subgradient:

$\partial f(p)_e = \frac{F \cdot |(f_p)_e|}{c_e} - 1$

Unfortunately, width can be large:

$\|\partial f(p)\|_\infty \le \frac{F}{c_{\min}}$

[PST 91] $T = O\left( \frac{F \log n}{\epsilon^2 c_{\min}} \right)$
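The best-response step for s-t maxflow can be sketched as a shortest-path computation. The following is a minimal sketch, assuming a small hypothetical undirected graph given as an edge list; `shortest_path_edges` and `maxflow_subgradient` are illustrative names, and the sketch routes one unit of flow along the path as in the constraint $B^T f = e_s - e_t$.

```python
import heapq

def shortest_path_edges(n, edges, lengths, s, t):
    """Dijkstra on an undirected graph; returns edge indices of a shortest s-t path."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    dist = [float("inf")] * n
    prev = [None] * n                     # (previous node, edge index used)
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, idx in adj[u]:
            nd = d + lengths[idx]
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = (u, idx)
                heapq.heappush(heap, (nd, v))
    path, node = [], t
    while node != s:                      # walk back from t to s
        node, idx = prev[node]
        path.append(idx)
    return path[::-1]

def maxflow_subgradient(n, edges, cap, p, F, s, t):
    """Subgradient entries F*|f_e|/c_e - 1 for the unit flow on the shortest path."""
    lengths = [p[e] / cap[e] for e in range(len(edges))]
    on_path = set(shortest_path_edges(n, edges, lengths, s, t))
    return [F * (1.0 if e in on_path else 0.0) / cap[e] - 1.0
            for e in range(len(edges))]

# Two parallel s-t routes: 0-1-3 with capacity 1 and 0-2-3 with capacity 2.
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
cap = [1.0, 1.0, 2.0, 2.0]
p = [0.25, 0.25, 0.25, 0.25]
g = maxflow_subgradient(4, edges, cap, p, F=1.0, s=0, t=3)
```

With uniform weights the high-capacity route is shorter, so it receives the flow. Routing everything on one path is what makes the width scale with $F / c_{\min}$.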

Page 92: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

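Width Reduction: make function nicer

PROBLEM: Optimal for this specific formulation

$\|\partial f(p)\|_\infty \le \frac{F}{c_{\min}}$

SOLUTION: Regularize primal

$f(p) = \min_{B^T f = e_s - e_t} F \cdot \sum_{e \in E} \frac{|f_e|}{c_e} \left( p_e + \frac{\epsilon}{m} \right) - 1$

[Figure: iterates $x^{(t)}$, $x^{(t+1)}$, $x^{(t+2)}$]

NEED PRIMAL ARGUMENT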

Page 93: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

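Width Reduction: make primal nicer

PROBLEM: Optimal for this specific formulation

$\|\partial f(p)\|_\infty \le \frac{F}{c_{\min}}$

SOLUTION: Regularize primal

$f(p) = \min_{B^T f = e_s - e_t} F \cdot \sum_{e \in E} \frac{|f_e|}{c_e} \left( p_e + \frac{\epsilon}{m} \right) - 1$

REGULARIZATION ERROR: $\epsilon F$

NEW WIDTH: $\|\partial f(p)\|_\infty \le \frac{m}{\epsilon}$

ITERATION BOUND: [GK 98] $T = O\left( \frac{m \log n}{\epsilon^2} \right)$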

Page 95: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Electrical Flow Approach [CKMST]

Different formulation yields basis for CKMST algorithm:

$\forall e \in E, \; \frac{F \cdot f_e^2}{c_e^2} \le 1$

$B^T f = e_s - e_t$ (Will enforce this)

Non-smooth optimization problem:

$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} p_e \cdot \left( \frac{F \cdot f_e^2}{c_e^2} - 1 \right)$

Best response is electrical flow $f_p$

Original width: $\|\partial f(p)\|_\infty \le m$

Page 96: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Electrical Flow Approach [CKMST]

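Different formulation yields basis for CKMST algorithm:

$\forall e \in E, \; \frac{F \cdot f_e^2}{c_e^2} \le 1$

$B^T f = e_s - e_t$ (Will enforce this)

Non-smooth optimization problem:

$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} p_e \cdot \left( \frac{F \cdot f_e^2}{c_e^2} - 1 \right)$

Regularize primal:

$f(p) = \min_{B^T f = e_s - e_t} F \cdot \sum_{e \in E} \frac{f_e^2}{c_e^2} \left( p_e + \frac{\epsilon}{m} \right) - 1$

$\|\partial f(p)\|_\infty \le \sqrt{\frac{m}{\epsilon}}$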
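The electrical-flow best response can be sketched with a dense Laplacian solve. This is a minimal sketch, assuming resistances r_e = (p_e + ε/m) / c_e², a small hypothetical graph, and a least-squares solve in place of a fast Laplacian solver. The congestion check uses sqrt((1+ε)·m/ε), a slightly looser constant than the slide's sqrt(m/ε), which follows from comparing energies with a feasible flow; that constant is an assumption of this sketch.

```python
import numpy as np

def electrical_flow(n, edges, resistances, s, t):
    """Unit s-t electrical flow: minimizes sum_e r_e f_e^2 subject to B^T f = e_s - e_t."""
    m = len(edges)
    B = np.zeros((m, n))                  # edge-vertex incidence matrix
    for e, (u, v) in enumerate(edges):
        B[e, u], B[e, v] = 1.0, -1.0
    conductance = 1.0 / np.asarray(resistances)
    lap = B.T @ (conductance[:, None] * B)            # weighted graph Laplacian
    demand = np.zeros(n)
    demand[s], demand[t] = 1.0, -1.0
    # The Laplacian is singular; lstsq returns a valid potential vector
    # because the demand sums to zero on a connected graph.
    phi = np.linalg.lstsq(lap, demand, rcond=None)[0]
    flow = conductance * (B @ phi)                    # Ohm's law on each edge
    return flow, B, demand

# Hypothetical graph with unit capacities; its s-t max flow is 2, so F = 1.5 is feasible.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
cap = np.ones(len(edges))
m, eps, F = len(edges), 0.5, 1.5
p = np.array([0.6, 0.1, 0.1, 0.1, 0.1])               # current dual weights on the simplex

resistances = (p + eps / m) / cap**2
flow, B, demand = electrical_flow(4, edges, resistances, s=0, t=3)

congestion = F * np.abs(flow) / cap
width_bound = np.sqrt((1 + eps) * m / eps)
```

The ε/m term keeps every resistance bounded away from zero, so no single edge can carry a very large share of the flow; this is the regularization that lowers the width from m to roughly sqrt(m/ε).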

Page 97: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

Conclusion: Take-away messages

• Regularization is a powerful tool for the design of fast algorithms.

• Most iterative algorithms can be understood as regularized updates:

MWUs, Width Reduction, Interior Point, Gradient Descent, ...

• Perform well in practice. Regularization also helps eliminate noise.

• ULTIMATE GOAL:

Development of a library of iterative methods for fast graph algorithms.

Regularization plays a fundamental role in this effort

Page 98: ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN …cs-people.bu.edu/orecchia/files/talks/FOCS13-workshop.pdfTalk Outline: A Tale of Two Halves PART 1: REGULARIZATION AND ITERATIVE

THE END – THANK YOU

