
ITERATIVE METHODS AND REGULARIZATION

IN THE DESIGN OF FAST ALGORITHMS

Lorenzo Orecchia, MIT Math

A unified framework for optimization and online learning

beyond Multiplicative Weight Updates


Talk Outline: A Tale of Two Halves

PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING

• Online Linear Optimization

• Online Linear Optimization over Simplex and Multiplicative Weight Updates (MWUs)

• A Regularization Framework to generalize MWUs: Follow the Regularized Leader

MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE

PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW

• Non-smooth vs Smooth Convex Optimization

•Non-smooth Convex Optimization reduces to Online Linear Optimization

• Application: Understanding Undirected Maxflow algorithms based on MWUs

MESSAGE: FASTEST ALGORITHMS REQUIRE PRIMAL-DUAL APPROACH

TOC Applications of MWUs

Fast algorithms for solving specific LPs and SDPs:

• Maximum flow problems [PST], [GK], [F], [CKMST]

• Covering-packing problems [PST]

• Oblivious routing [R], [M]

Fast approximation algorithms based on LP and SDP relaxations:

• Maxcut [AK]

• Graph partitioning problems [AK], [S], [OSV]

Proof techniques:

• Hardcore Lemma [BHK]

• QIP = PSPACE [W]

• Derandomization [Y]

… and more

Machine Learning meets Optimization meets TCS

These techniques have been rediscovered multiple times in different fields: Machine Learning, Convex Optimization, TCS.

Three surveys emphasizing the different viewpoints and literatures:

1) ML: Prediction, Learning, and Games by Cesa-Bianchi and Lugosi

2) Optimization: Lectures on Modern Convex Optimization by Ben-Tal and Nemirovski

3) TCS: The Multiplicative Weights Update Method: a Meta-Algorithm and Applications by Arora, Hazan and Kale

REGULARIZATION 101

What is Regularization?

Regularization is a fundamental technique in optimization: it turns an OPTIMIZATION PROBLEM into a WELL-BEHAVED OPTIMIZATION PROBLEM, with:

• Stable optimum

• Unique optimal solution

• Smoothness conditions

Benefits of regularization in learning and statistics:

• Prevents overfitting

• Increases stability

• Decreases sensitivity to random noise

Ingredients: a regularizer F and a parameter λ > 0.

Example: Regularization Helps Stability

Consider a convex set S ⊂ R^n and a linear optimization problem:

f(c) = argmin_{x ∈ S} c^T x

The optimal solution f(c) may be very unstable under perturbation of c:

‖c′ − c‖ ≤ δ and yet ‖f(c′) − f(c)‖ ≫ δ

Now consider instead the regularized linear optimization problem

f(c) = argmin_{x ∈ S} c^T x + F(x)

where F is σ-strongly convex. Then:

‖c′ − c‖ ≤ δ implies ‖f(c′) − f(c)‖ ≤ δ/σ
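A small numeric sketch of this stability phenomenon (not from the talk, just an illustration): take S to be the two-dimensional simplex, where the unregularized argmin is a vertex that jumps under a tiny perturbation of c, while the entropy-regularized optimum (which has the softmax closed form) moves by O(δ/σ).

```python
import numpy as np

def f_linear(c):
    # argmin over the simplex of c^T x: puts all mass on the smallest coordinate
    x = np.zeros_like(c)
    x[np.argmin(c)] = 1.0
    return x

def f_regularized(c, sigma=1.0):
    # argmin over the simplex of c^T x + sigma * sum_i x_i log x_i
    # (negative entropy is sigma-strongly convex w.r.t. ||.||_1 on the simplex);
    # the minimizer is the softmax of -c/sigma
    w = np.exp(-c / sigma)
    return w / w.sum()

c  = np.array([1.0, 1.0 + 1e-6])
c2 = np.array([1.0, 1.0 - 1e-6])   # perturbation of size delta ~ 2e-6

# Unregularized: the argmin jumps between two vertices, ||f(c2) - f(c)||_1 = 2
jump = np.abs(f_linear(c2) - f_linear(c)).sum()

# Regularized: the optimum moves by only O(delta / sigma)
shift = np.abs(f_regularized(c2) - f_regularized(c)).sum()
```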

ONLINE LINEAR OPTIMIZATION

AND

MULTIPLICATIVE WEIGHT UPDATES

Online Linear Minimization

SETUP: Convex set X ⊆ R^n, generic norm ‖·‖, repeated game over T rounds.

At round t:

• ALGORITHM plays the current solution x(t) ∈ X

• ADVERSARY reveals the current linear objective, i.e. a loss vector ℓ(t) ∈ R^n with ‖ℓ(t)‖_* ≤ ρ

• The algorithm suffers loss ℓ(t)^T x(t)

• ALGORITHM plays the updated solution x(t+1) ∈ X, ADVERSARY reveals a new loss vector ℓ(t+1), and so on

GOAL: update x(t) to minimize regret (average algorithm's loss minus the a posteriori optimum):

Regret = (1/T) Σ_{t=1}^T ℓ(t)^T x(t) − min_{x ∈ X} (1/T) Σ_{t=1}^T ℓ(t)^T x

Simplex Case: Learning with Experts

SETUP: Simplex X ⊆ R^n under the ℓ1 norm. At round t:

• ALGORITHM plays p(t), a distribution over the n dimensions, i.e. over experts

• ADVERSARY reveals the experts' losses ℓ(t), with ‖ℓ(t)‖_∞ ≤ ρ

• The algorithm's loss is E_{i ∼ p(t)}[ℓ(t)_i] = p(t)^T ℓ(t)

• ALGORITHM updates the distribution to p(t+1)

Simplex Case: Multiplicative Weight Updates

At each round, the ALGORITHM plays p(t) and the ADVERSARY reveals ℓ(t).

MULTIPLICATIVE WEIGHT UPDATE:

Weights: w(t+1)_i = (1 − ε)^{ℓ(t)_i} · w(t)_i, with w(1) = the all-ones vector

Distribution: p(t+1)_i = w(t+1)_i / Σ_{j=1}^n w(t+1)_j

The step parameter ε ∈ (0, 1) interpolates between CONSERVATIVE (ε near 0) and AGGRESSIVE (ε near 1) updates.

MWUs: Unraveling the Update

Update: p(t+1)_i ∝ w(t+1)_i = (1 − ε)^{ℓ(t)_i} · w(t)_i

Unraveling the recursion, the WEIGHT is an exponential of the CUMULATIVE LOSS:

w(t+1)_i = (1 − ε)^{Σ_{s=1}^t ℓ(s)_i}

MWUs: Regret Bound

For ‖ℓ(t)‖_∞ ≤ ρ and ε < 1/2, the algorithm's regret satisfies:

L̂ − L* ≤ (ρ log n)/(εT) + ρε

The first term is a start-up penalty; the second is a penalty for being greedy.
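The update and bound above can be sketched in a few lines; this is an illustrative implementation (with losses in [0, 1], so ρ = 1) that checks the average regret against the bound log n/(εT) + ε:

```python
import numpy as np

def mwu(losses, eps):
    """Multiplicative Weight Updates over the simplex.

    losses: T x n array of per-round expert losses in [0, 1].
    Returns (total expected algorithm loss, best expert's total loss).
    """
    T, n = losses.shape
    w = np.ones(n)                      # w(1) = all-ones vector
    alg_loss = 0.0
    for l in losses:
        p = w / w.sum()                 # p(t)_i = w(t)_i / sum_j w(t)_j
        alg_loss += p @ l               # expected loss p(t)^T l(t)
        w *= (1.0 - eps) ** l           # w(t+1)_i = (1-eps)^{l(t)_i} w(t)_i
    return alg_loss, losses.sum(axis=0).min()

rng = np.random.default_rng(0)
T, n, eps = 4000, 10, 0.05
losses = rng.random((T, n))
alg, best = mwu(losses, eps)

# Regret bound from the slides, with rho = 1
avg_regret = (alg - best) / T
bound = np.log(n) / (eps * T) + eps
```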

ONLINE LINEAR OPTIMIZATION BEYOND MWUs

A REGULARIZATION FRAMEWORK

MWUs: Proof Sketch of Regret Bound

Update: p(t+1)_i ∝ w(t+1)_i = (1 − ε)^{Σ_{s=1}^t ℓ(s)_i}

Potential function: Φ(t+1) = log_{1−ε} Σ_{i=1}^n w(t+1)_i

• The proof is a potential function argument.

• The potential function bounds the loss of the best expert (note log_{1−ε} is decreasing, so summing the weights lower-bounds the minimum cumulative loss):

Φ(t+1) ≤ log_{1−ε} min_{i=1}^n w(t+1)_i = min_{i=1}^n (Σ_{s=1}^t ℓ(s)_i)

• The potential function is related to the algorithm's performance:

Φ(t+1) − Φ(t) ≥ ℓ(t)^T p(t) − ε

DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?

MWUs AND APPLICATIONS

Designing a Regularized Update

GOAL: Design an update and its potential function analysis.

QUESTION: Choice of potential function?

DESIDERATA: 1) lower bounds the best expert's loss; 2) tracks the algorithm's performance.

Attempt 1 – FOLLOW THE LEADER: use the cumulative loss L(t) = Σ_{s=1}^t ℓ(s):

x(t+1) = argmin_{x ∈ X} x^T L(t)    (pick the best current solution)

Φ(t+1) = min_{x ∈ X} x^T L(t)    (potential is the current best loss)

This fails if the best expert changes drastically. How can we make the update more stable?
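The failure mode of Follow the Leader can be made concrete with the classic two-expert instance with alternating losses (an illustrative example, not from the slides): FTL chases yesterday's leader, pays roughly 1 every round, and suffers regret linear in T.

```python
import numpy as np

def ftl(losses):
    """Follow the Leader over the simplex: put all mass on the expert
    with smallest cumulative loss so far (ties -> lowest index)."""
    T, n = losses.shape
    L = np.zeros(n)            # cumulative loss L(t)
    total = 0.0
    for l in losses:
        x = np.zeros(n)
        x[np.argmin(L)] = 1.0  # argmin_x x^T L(t): a vertex of the simplex
        total += x @ l
        L += l
    return total

# Bad instance: two experts with alternating losses.
T = 1000
losses = np.zeros((T, 2))
losses[0] = [0.5, 0.0]
losses[1::2] = [0.0, 1.0]
losses[2::2] = [1.0, 0.0]

total = ftl(losses)                    # FTL pays ~1 per round: ~T
best = losses.sum(axis=0).min()        # best expert pays ~T/2
regret = total - best                  # linear in T
```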

Regularized Update: Definition

Attempt 2 – FOLLOW THE REGULARIZED LEADER:

x(t+1) = argmin_{x ∈ X} x^T L(t) + η · F(x)

Φ(t+1) = min_{x ∈ X} x^T L(t) + η · F(x)

Properties of the regularizer F(x):

1. Convex, differentiable

2. σ-strongly convex w.r.t. the norm ‖·‖

Parameter η ≥ 0, to be determined. These properties are actually sufficient to get a regret bound.

Regularized Update: Analysis

The potential still lower-bounds the best loss, up to a regularization error:

Φ(t+1) ≤ min_{x ∈ X} L(t)^T x + η · max_{x ∈ X} F(x)

Tracking the Algorithm: Proof by Picture

Define: f(t+1)(x) = L(t)^T x + η · F(x), so that Φ(t+1) = f(t+1)(x(t+1)) and Φ(t) = f(t)(x(t)).

Notice: f(t+1)(x) − f(t)(x) = ℓ(t)^T x, the latest loss vector.

Compare Φ(t+1) − Φ(t) with ℓ(t)^T x(t). We want Φ(t+1) − Φ(t) ≈ ℓ(t)^T x(t), which holds when f(t+1)(x(t)) ≈ f(t+1)(x(t+1)), since:

Φ(t+1) − Φ(t) = f(t+1)(x(t+1)) − f(t+1)(x(t)) + ℓ(t)^T x(t)

Regularization in Action

f(t+1)(x) = L(t)^T x + η · F(x)

REGULARIZATION: f(t) is (η · σ)-strongly convex, which gives a quadratic lower bound to f(t+1) around its minimizer.

STABILITY: the consecutive objectives differ by a linear term, with ‖∇f(t+1) − ∇f(t)‖_* = ‖ℓ(t)‖_*, so their minimizers are close:

‖x(t+1) − x(t)‖ ≤ ‖ℓ(t)‖_* / (η · σ)
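The stability bound can be checked numerically (an illustrative check, not from the talk): for entropic FTRL on the simplex, F = negative entropy is σ = 1 strongly convex w.r.t. ‖·‖_1, the dual norm is ‖·‖_∞, and consecutive minimizers should satisfy ‖x(t+1) − x(t)‖_1 ≤ ‖ℓ(t)‖_∞ / η.

```python
import numpy as np

def ftrl_step(L, eta):
    # Entropic FTRL minimizer: softmax of -L/eta (numerically stabilized)
    w = np.exp(-L / eta - (-L / eta).max())
    return w / w.sum()

rng = np.random.default_rng(1)
eta, n, T = 5.0, 8, 200

L = np.zeros(n)
ok = True
for _ in range(T):
    l = rng.uniform(-1, 1, size=n)      # loss with ||l||_inf <= 1
    x_old = ftrl_step(L, eta)
    L += l
    x_new = ftrl_step(L, eta)
    # stability bound: ||x(t+1) - x(t)||_1 <= ||l(t)||_inf / (eta * sigma)
    ok &= np.abs(x_new - x_old).sum() <= np.abs(l).max() / eta + 1e-12
```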

Analysis: Progress in One Iteration

At the old minimizer, ∇f(t+1)(x(t)) = ℓ(t), and by stability ‖x(t+1) − x(t)‖ ≤ ‖ℓ(t)‖_* / (η · σ).

Since f(t+1) is (η · σ)-strongly convex:

f(t+1)(x(t+1)) − f(t+1)(x(t)) ≥ ℓ(t)^T (x(t+1) − x(t)) + (η σ / 2) ‖x(t+1) − x(t)‖²

Therefore:

Φ(t+1) − Φ(t) = f(t+1)(x(t+1)) − f(t+1)(x(t)) + ℓ(t)^T x(t)

≥ ℓ(t)^T x(t) − ‖ℓ(t)‖_* ‖x(t+1) − x(t)‖ + (η σ / 2) ‖x(t+1) − x(t)‖² ≥ ℓ(t)^T x(t) − ‖ℓ(t)‖²_* / (2 η σ)

MWUs AND APPLICATIONS

Completing the Analysis

Progress in one iteration (the regret at iteration t):

Φ(t+1) − Φ(t) ≥ ℓ(t)^T x(t) − ‖ℓ(t)‖²_* / (2 σ η)

Telescoping sum:

Φ(T+1) ≥ Σ_{t=1}^T ℓ(t)^T x(t) + Φ(1) − T · max_t ‖ℓ(t)‖²_* / (2 η σ)

Final regret bound, with regularizer F and ‖ℓ(t)‖_* ≤ ρ:

(1/T) (Σ_{t=1}^T ℓ(t)^T x(t) − min_{x ∈ X} Σ_{t=1}^T ℓ(t)^T x) ≤ (η/T) · (max_{x ∈ X} F(x) − min_{x ∈ X} F(x)) + ρ² / (2 σ η)

A start-up penalty plus a penalty for being greedy: SAME TYPE OF BOUND AS FOR MWUs.

MWUs AND APPLICATIONS

Reinterpreting MWUs

Regularizer: F(p) = Σ_{i=1}^n p_i log p_i is the negative entropy; F is 1-strongly convex w.r.t. ‖·‖_1 over the simplex.

Potential function:

Φ(t+1) = min_{p ≥ 0, Σ p_i = 1} p^T L(t) + η · Σ_{i=1}^n p_i log p_i

Update:

p(t+1) = argmin_{p ≥ 0, Σ p_i = 1} p^T L(t) + η · Σ_{i=1}^n p_i log p_i

This has the closed-form SOFT-MAX solution, which for (1 − ε) = e^{−1/η} is exactly the MWU:

p(t+1)_i = e^{−L(t)_i / η} / Σ_{j=1}^n e^{−L(t)_j / η} = (1 − ε)^{L(t)_i} / Σ_{j=1}^n (1 − ε)^{L(t)_j}

MWUs AND APPLICATIONS

Beyond MWUs: Which Regularizer?

Regret bound, optimizing over η:

(1/T) (Σ_{t=1}^T ℓ(t)^T x(t) − min_{x ∈ X} Σ_{t=1}^T ℓ(t)^T x) ≤ ρ √(2 · (max_{x ∈ X} F(x) − min_{x ∈ X} F(x))) / √(σ T)

The best choice of regularizer and norm minimizes

max_t ‖ℓ(t)‖²_* · (max_{x ∈ X} F(x) − min_{x ∈ X} F(x)) / σ

Negative entropy with the ℓ1-norm is approximately optimal for the simplex.

QUESTION: are other regularizers ever useful?

Different Regularizers in Algorithm Design

QUESTION 1: Are other regularizers, besides entropy, ever useful?

YES! Applications:

Graph Partitioning and Random Walks

Spectral algorithms for balanced separator running in time Õ(m). Uses the random-walk framework and SDP MWUs. Different walks correspond to different regularizers for the eigenvector problem:

• F(X) = Tr(X log X) (SDP MWU) — Heat Kernel Random Walk

• F(X) = Tr(X^p) (p-norm, 1 ≤ p ≤ ∞) — Lazy Random Walk

• F(X) = Tr(X^{1/2}) (NEW REGULARIZER) — Personalized PageRank

[Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012]

Sparsification

• ε-spectral-sparsifiers with O(n log n / ε²) edges [Spielman, Srivastava 2008]; uses a matrix concentration bound equivalent to SDP MWUs.

• ε-spectral-sparsifiers with O(n / ε²) edges [Batson, Spielman, Srivastava 2009]; can be interpreted as a different regularizer: F(X) = Tr(X^{1/2}).

Many more in Online Learning: Bandit Online Learning [AHR], …

NON-SMOOTH CONVEX OPTIMIZATION REDUCES TO ONLINE LINEAR OPTIMIZATION

Convex Optimization Setup

min_{x ∈ X} f(x), with f convex and differentiable, X ⊆ R^n a closed, convex set.

NON-SMOOTH: f is ρ-Lipschitz continuous: ∀x ∈ X, ‖∇f(x)‖_* ≤ ρ

SMOOTH: f has an L-Lipschitz continuous gradient: ∀x, y ∈ X, ‖∇f(y) − ∇f(x)‖_* ≤ L ‖y − x‖

In the smooth case, a gradient step is guaranteed to decrease the function value:

f(x(t+1)) ≤ f(x(t)) − ‖∇f(x(t))‖²_* / (2L)

In the non-smooth case there is NO GRADIENT STEP GUARANTEE — ONLY A DUAL GUARANTEE.

Non-Smooth Setup: Dual Approach

f convex, differentiable, ρ-Lipschitz continuous: ∀x ∈ X, ‖∇f(x)‖_* ≤ ρ; X ⊆ R^n closed, convex; min_{x ∈ X} f(x).

APPROACH: each iterate x(t) provides both an upper bound and a lower bound on the optimum f(x*):

UPPER: f(x(t)) ≥ f(x*)

LOWER: f(x*) ≥ f(x(t)) + ∇f(x(t))^T (x* − x(t))    (by convexity)

CAN WEAKEN THE DIFFERENTIABILITY ASSUMPTION: SUBGRADIENTS SUFFICE.

Take convex combinations of the upper bounds and of the lower bounds, with weights γ_t:

UPPER: (1 / Σ_{t=1}^T γ_t) (Σ_{t=1}^T γ_t f(x(t))) ≥ f(x*)

LOWER: f(x*) ≥ (1 / Σ_{t=1}^T γ_t) [Σ_{t=1}^T γ_t (f(x(t)) + ∇f(x(t))^T (x* − x(t)))]

HOW TO UPDATE THE ITERATES? HOW TO CHOOSE THE WEIGHTS?

Reduction to Online Linear Minimization

Fix the weights γ_t to be uniform for simplicity. Subtracting the lower bound from the upper bound:

DUALITY GAP: [Σ_{t=1}^T (1/T) f(x(t))] − f(x*) ≤ (1/T) · Σ_{t=1}^T −∇f(x(t))^T (x* − x(t))

The right-hand side is a LINEAR FUNCTION of the iterates, which sets up an online game:

ONLINE SETUP: the ALGORITHM plays x(t) ∈ X; the ADVERSARY answers with the loss vector ℓ(t) = ∇f(x(t)), the gradient at the current iterate. Recall that by assumption:

‖ℓ(t)‖_* = ‖∇f(x(t))‖_* ≤ ρ

And the duality gap is controlled by the regret:

(1/T) · Σ_{t=1}^T −∇f(x(t))^T (x* − x(t)) = (1/T) · Σ_{t=1}^T ℓ(t)^T (x(t) − x*) ≤ REGRET

Final Bound

RESULTING ALGORITHM: MIRROR DESCENT. Error bound with a σ-strongly-convex regularizer F:

ε_MD ≤ ρ √(2 · (max_{x ∈ X} F(x) − min_{x ∈ X} F(x)) / σ) / √T

ASYMPTOTICALLY OPTIMAL BY THE INFORMATION COMPLEXITY LOWER BOUND.

Non-Smooth Optimization over Simplex

With regularizer F the negative entropy and ‖∇f(x(t))‖_∞ ≤ ρ:

ε_MD ≤ ρ √(2 · log n) / √T

RESULTING ALGORITHM: MIRROR DESCENT OVER SIMPLEX = MWU
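A minimal sketch of entropic mirror descent over the simplex, in the lazy/FTRL form derived above (the toy objective f(p) = ‖p − q‖_1 and the target q are illustrative assumptions, not from the talk); the averaged iterate should satisfy f(p̄) − f* ≤ ρ √(2 log n / T).

```python
import numpy as np

def mirror_descent_simplex(subgrad, n, T, rho):
    """Entropic mirror descent (lazy/FTRL form) over the simplex.

    subgrad(p) returns a subgradient of the objective at p, with
    ||subgrad(p)||_inf <= rho. Returns the averaged iterate.
    """
    eta = rho * np.sqrt(T / (2 * np.log(n)))   # optimized FTRL parameter
    L = np.zeros(n)                            # cumulative loss vector L(t)
    avg = np.zeros(n)
    for _ in range(T):
        w = np.exp(-L / eta - (-L / eta).max())  # softmax, stabilized
        p = w / w.sum()
        avg += p / T
        L += subgrad(p)                        # feed the gradient as loss l(t)
    return avg

# Toy non-smooth objective: f(p) = ||p - q||_1, minimized at p = q.
q = np.array([0.5, 0.3, 0.2])
f = lambda p: np.abs(p - q).sum()
g = lambda p: np.sign(p - q)                   # subgradient, ||.||_inf <= 1
p_avg = mirror_descent_simplex(g, n=3, T=4000, rho=1.0)
```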

APPLICATIONS IN ALGORITHM DESIGN

Warm-up Example: Linear Programming

LP feasibility problem: given A ∈ R^{m×n}, decide ∃? x ∈ X : b − Ax ≥ 0.

Easy constraints (x ∈ X): maintain feasible. Hard constraints (b − Ax ≥ 0): require fixing.

Convert into a non-smooth optimization problem over the simplex:

min_{p ∈ Δ_m} max_{x ∈ X} p^T (b − Ax)

Non-differentiable objective: f(p) = max_{x ∈ X} p^T (b − Ax), whose inner maximizer x_p is the best response to the dual solution p.

The objective admits subgradients: for all p, the best response x_p satisfies p^T (b − Ax_p) ≥ 0 (when the problem is feasible), and the subgradient is the slack in the constraints:

(b − Ax_p) ∈ ∂f(p)

If we can pick x_p such that ‖b − Ax_p‖_∞ ≤ ρ, then the mirror descent bound ε_MD ≤ ρ √(2 · log n) / √T yields the iteration count

T ≤ 2 ρ² log n / ε²
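The LP feasibility scheme can be sketched as follows. This is a hypothetical toy implementation, not the talk's oracle: the easy set X is taken to be the box [0,1]^n, the greedy coordinate-wise best response and the tiny test instance are illustrative assumptions.

```python
import numpy as np

def lp_feasibility_mwu(A, b, T=2000, eps=0.05):
    """Sketch of MWU-based LP feasibility: ?exists x in [0,1]^n with
    b - A x >= 0. The dual p is a distribution over the m constraints,
    updated multiplicatively on the slacks; the primal answer is the
    average of the best responses x_p."""
    m, n = A.shape
    w = np.ones(m)
    x_avg = np.zeros(n)
    for _ in range(T):
        p = w / w.sum()
        # Best response over the box: maximize p^T(b - Ax) coordinate-wise
        x = (p @ A < 0).astype(float)
        x_avg += x / T
        slack = b - A @ x            # subgradient of f at p
        w *= (1 - eps) ** slack      # MWU on the dual: up-weight violated rows
    return x_avg

# Tiny feasible instance: x1 + x2 <= 1.5 and x1 >= 0.2
A = np.array([[1.0, 1.0], [-1.0, 0.0]])
b = np.array([1.5, -0.2])
x = lp_feasibility_mwu(A, b)
violation = np.max(A @ x - b)        # small for a feasible instance
```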

MWU and s-t Maxflow

Maximum flow feasibility for value F over an undirected graph G with incidence matrix B:

∀e ∈ E: F · |f_e| / c_e ≤ 1    (hard constraints)

B^T f = e_s − e_t    (will enforce this)

Turn into a non-smooth minimization problem over the simplex:

f(p) = min_{B^T f = e_s − e_t} Σ_{e ∈ E} p_e · (F · |f_e| / c_e − 1)

The best response f_p is a shortest s-t path with lengths p_e / c_e. For any p, if f_p has length > 1, the problem is infeasible. Otherwise, the following is a subgradient:

∂f(p)_e = F · |(f_p)_e| / c_e − 1

Unfortunately, the width can be large:

‖∂f(p)‖_∞ ≤ F / c_min

[PST 91]: T = O(F log n / (ε² c_min))

Width Reduction: Make the Primal Nicer

PROBLEM: [PST 91] is optimal for this specific formulation.

SOLUTION: Regularize the primal to reduce the width ‖∂f(p)‖_∞ ≤ F / c_min. Regularized objective:

f(p) = min_{B^T f = e_s − e_t} F · Σ_{e ∈ E} (f_e / c_e) · (p_e + ε/m) − 1

This changes the analysis: a PRIMAL ARGUMENT is now needed.

REGULARIZATION ERROR: ε F

NEW WIDTH: ‖∂f(p)‖_∞ ≤ m / ε

ITERATION BOUND: [GK 98] T = O(m log n / ε²)

Electrical Flow Approach [CKMST]

A different formulation yields the basis for the CKMST algorithm:

∀e ∈ E: F · f_e² / c_e² ≤ 1    (hard constraints)

B^T f = e_s − e_t    (will enforce this)

Non-smooth optimization problem:

f(p) = min_{B^T f = e_s − e_t} Σ_{e ∈ E} p_e · (F · f_e² / c_e² − 1)

The best response is an electrical flow f_p. Original width:

‖∂f(p)‖_∞ ≤ m

Regularize the primal:

f(p) = min_{B^T f = e_s − e_t} F · Σ_{e ∈ E} (f_e² / c_e²) · (p_e + ε/m) − 1

NEW WIDTH: ‖∂f(p)‖_∞ ≤ √(m / ε)

Conclusion: Take-away Messages

• Regularization is a powerful tool for the design of fast algorithms.

• Most iterative algorithms can be understood as regularized updates: MWUs, width reduction, interior point methods, gradient descent, …

• They perform well in practice. Regularization also helps eliminate noise.

• ULTIMATE GOAL: development of a library of iterative methods for fast graph algorithms. Regularization plays a fundamental role in this effort.

THE END – THANK YOU
