Accelerating Nesterov’s Method for Strongly Convex Functions

Hao Chen, Xiangrui Meng

MATH301, 2011

Hao Chen, Xiangrui Meng Accelerating Nesterov’s Method for Strongly Convex Functions

Outline

1 The Gap

2 Quadratic Functions

3 General Strongly Convex Functions

1 The Gap

Our talk begins with a tiny gap

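For any $x_0 \in \mathbb{R}^\infty$ and any constants $\mu > 0$, $L > \mu$, there exists a function $f \in \mathcal{S}^{\infty,1}_{\mu,L}$ such that for any first-order method we have

$$f(x_k) - f^* \ge \frac{\mu}{2}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2k}\|x_0 - x^*\|^2, \qquad \kappa = \frac{L}{\mu}.$$

Nesterov’s method generates a sequence $\{x_k\}_{k=0}^\infty$ such that

$$f(x_k) - f^* \le L\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}}\right)^{k}\|x_0 - x^*\|^2, \qquad \kappa = \frac{L}{\mu}.$$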

At a closer look, the gap is not tiny

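Assume that $\kappa$ is large and, for simplicity, that $\|x_0 - x^*\| = 1$. Given a small tolerance $\varepsilon > 0$, to make $f(x_k) - f^* < \varepsilon$ the “ideal” first-order method needs

$$K^* = \frac{\log\varepsilon - \log\frac{\mu}{2}}{2\log\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}} \approx \log\frac{1}{\varepsilon}\cdot\frac{\sqrt{\kappa}}{4}$$

iterations. Nesterov’s method needs

$$K = \frac{\log\varepsilon - \log L}{\log\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}}} \approx \log\frac{1}{\varepsilon}\cdot\sqrt{\kappa}$$

iterations, which is 4 times as many as the ideal number.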
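As a quick sanity check of the factor 4, the sketch below evaluates both iteration counts and the ratio of their leading terms for a few condition numbers. It is a minimal sketch with hypothetical helper names (`ideal_iterations`, `nesterov_iterations`, `asymptotic_ratio`), and it assumes $\|x_0 - x^*\| = 1$ so that this factor drops out of both bounds.

```python
import math

def ideal_iterations(eps, mu, L):
    """K* implied by the lower bound (assumes ||x0 - x*|| = 1)."""
    s = math.sqrt(L / mu)
    q = (s - 1) / (s + 1)
    return (math.log(eps) - math.log(mu / 2)) / (2 * math.log(q))

def nesterov_iterations(eps, mu, L):
    """K implied by Nesterov's upper bound (assumes ||x0 - x*|| = 1)."""
    s = math.sqrt(L / mu)
    q = (s - 1) / s
    return (math.log(eps) - math.log(L)) / math.log(q)

def asymptotic_ratio(kappa):
    """K / K* per unit of log(1/eps); it tends to 4 as kappa grows."""
    s = math.sqrt(kappa)
    return 2 * math.log((s - 1) / (s + 1)) / math.log((s - 1) / s)

for kappa in (1e2, 1e4, 1e6):
    print(f"kappa = {kappa:.0e}: K/K* per log(1/eps) = {asymptotic_ratio(kappa):.4f}")
print(ideal_iterations(1e-8, 1.0, 1e4), nesterov_iterations(1e-8, 1.0, 1e4))
```

The ratio approaches 4 from below; at moderate $\varepsilon$ the $\log L$ and $\log\frac{\mu}{2}$ terms also contribute, so the finite-$\varepsilon$ counts need not differ by exactly 4.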

Can we reduce the gap?

Can we reduce the gap

for quadratic functions?

$$\text{minimize } f(x) = \tfrac{1}{2}x^TAx - b^Tx, \qquad \mu I_n \preceq A \preceq L I_n.$$

In this case we do have an “ideal” method, the conjugate gradient method, which attains the optimal convergence rate.

for general strongly convex functions?

$$\text{minimize } f(x), \qquad f \in \mathcal{S}_{\mu,L}.$$

2 Quadratic Functions

Nesterov’s constant step scheme, III

0. Choose $y_0 = x_0 \in \mathbb{R}^n$.

1. $k$-th iteration ($k \ge 0$):

$$x_{k+1} = y_k - h f'(y_k), \qquad y_{k+1} = x_{k+1} + \beta(x_{k+1} - x_k),$$

where $h = \frac{1}{L}$ and $\beta = \frac{1-\sqrt{\mu h}}{1+\sqrt{\mu h}}$.
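The scheme translates directly into a few lines of NumPy. This is a minimal sketch, not the authors' code: `nesterov_constant_step` is a hypothetical name, the test problem is a made-up diagonal quadratic with $\mu = 1$, $L = 100$, and the optional `h`, `beta` arguments let other choices be tried.

```python
import numpy as np

def nesterov_constant_step(grad, x0, mu, L, iters=500, h=None, beta=None):
    """Constant step scheme III.

    x_{k+1} = y_k - h f'(y_k),  y_{k+1} = x_{k+1} + beta (x_{k+1} - x_k).
    Defaults follow the slide: h = 1/L, beta = (1 - sqrt(mu h)) / (1 + sqrt(mu h)).
    """
    if h is None:
        h = 1.0 / L
    if beta is None:
        beta = (1.0 - np.sqrt(mu * h)) / (1.0 + np.sqrt(mu * h))
    x = np.array(x0, dtype=float)          # step 0: y_0 = x_0
    y = x.copy()
    for _ in range(iters):
        x_next = y - h * grad(y)           # gradient step from the extrapolated point
        y = x_next + beta * (x_next - x)   # momentum (extrapolation) step
        x = x_next
    return x

# Usage: f(x) = 1/2 x^T D x - b^T x with D = diag(d), so mu = 1 and L = 100.
d = np.linspace(1.0, 100.0, 50)
b = np.ones(50)
x_hat = nesterov_constant_step(lambda z: d * z - b, np.zeros(50), mu=1.0, L=100.0)
print(np.linalg.norm(x_hat - b / d))
```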

Q: Is Nesterov’s choice of h and β optimal?

On quadratic functions

When minimizing a quadratic function

$$f(x) = \tfrac{1}{2}x^TAx - b^Tx,$$

Nesterov’s updates become

0. Choose $y_0 = x_0 = 0$.

1. $k$-th iteration ($k \ge 0$):

$$x_{k+1} = y_k - h(Ay_k - b), \qquad y_{k+1} = x_{k+1} + \beta(x_{k+1} - x_k).$$

Eigendecomposition

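Let $A = V\Lambda V^T$ be the eigendecomposition of $A$. Define $\bar{x}_k = V^Tx_k$ and $\bar{y}_k = V^Ty_k$ for all $k$, and $\bar{b} = V^Tb$. Then Nesterov’s updates can be written as

0. Choose $\bar{y}_0 = \bar{x}_0 = 0$.

1. $k$-th iteration ($k \ge 0$):

$$\bar{x}_{k+1} = \bar{y}_k - h(\Lambda\bar{y}_k - \bar{b}), \qquad \bar{y}_{k+1} = \bar{x}_{k+1} + \beta(\bar{x}_{k+1} - \bar{x}_k).$$

Since $\Lambda$ is diagonal, the updates are actually element-wise:

$$\bar{x}_{k+1,i} = \bar{y}_{k,i} - h(\lambda_i\bar{y}_{k,i} - \bar{b}_i), \qquad \bar{y}_{k+1,i} = \bar{x}_{k+1,i} + \beta(\bar{x}_{k+1,i} - \bar{x}_{k,i}), \qquad i = 1,\dots,n.$$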
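A small numerical check that the change of variables really decouples the iteration: run the original update and the element-wise update side by side and compare $V^Tx_k$ with $\bar{x}_k$. The random orthogonal $V$, the eigenvalues, $b$ and the iteration count are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, L = 20, 1.0, 50.0
# Test data: symmetric A = V diag(lam) V^T with eigenvalues in [mu, L].
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(mu, L, n)
A = V @ np.diag(lam) @ V.T
b = rng.standard_normal(n)

h = 1.0 / L
beta = (1 - np.sqrt(mu * h)) / (1 + np.sqrt(mu * h))

x = np.zeros(n); y = x.copy()              # original coordinates
xb = np.zeros(n); yb = xb.copy()           # xbar = V^T x, ybar = V^T y
bb = V.T @ b                               # bbar = V^T b

for _ in range(30):
    x_next = y - h * (A @ y - b)
    y = x_next + beta * (x_next - x)
    x = x_next
    xb_next = yb - h * (lam * yb - bb)     # lam * yb is Lambda @ ybar, element-wise
    yb = xb_next + beta * (xb_next - xb)
    xb = xb_next

print(np.max(np.abs(V.T @ x - xb)))        # agreement up to rounding
```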

Recurrence relation

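We can eliminate the sequence $\{\bar{y}_k\}$ from the update scheme:

$$\begin{aligned}
\bar{x}_{k+1,i} &= \bar{y}_{k,i} - h(\lambda_i\bar{y}_{k,i} - \bar{b}_i)\\
&= \bigl(\bar{x}_{k,i} + \beta(\bar{x}_{k,i} - \bar{x}_{k-1,i})\bigr) - h\bigl(\lambda_i(\bar{x}_{k,i} + \beta(\bar{x}_{k,i} - \bar{x}_{k-1,i})) - \bar{b}_i\bigr)\\
&= (1+\beta)(1-\lambda_ih)\bar{x}_{k,i} - \beta(1-\lambda_ih)\bar{x}_{k-1,i} + h\bar{b}_i.
\end{aligned}$$

Let $e_k = V^T(x_k - x^*) = V^T(x_k - V\Lambda^{-1}V^Tb) = \bar{x}_k - \Lambda^{-1}\bar{b}$ for all $k$. Since $\bar{x}^* = \Lambda^{-1}\bar{b}$ is a fixed point of the same recurrence, subtracting it gives the following recurrence relation on the error:

$$e_{k+1,i} = (1+\beta)(1-\lambda_ih)e_{k,i} - \beta(1-\lambda_ih)e_{k-1,i}.$$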
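The recurrence can be checked numerically: generate the errors from the element-wise updates, then regenerate them from the two-term recurrence seeded with $e_0$ and $e_1$. This is a sketch with hypothetical eigenvalues and a made-up stand-in for $\bar{b}$; the two lists should agree up to rounding.

```python
import numpy as np

mu, L = 1.0, 50.0
lam = np.linspace(mu, L, 10)          # hypothetical eigenvalues lambda_i
bb = np.arange(1.0, 11.0)             # stands in for bbar = V^T b
h = 1.0 / L
beta = (1 - np.sqrt(mu * h)) / (1 + np.sqrt(mu * h))
x_star = bb / lam                     # Lambda^{-1} bbar

# Errors e_k from the element-wise updates, starting from y_0 = x_0 = 0.
x = np.zeros_like(bb); y = x.copy()
errs = [x - x_star]
for _ in range(40):
    x_next = y - h * (lam * y - bb)
    y = x_next + beta * (x_next - x)
    x = x_next
    errs.append(x - x_star)

# The same errors from e_{k+1} = a e_k - c e_{k-1}, seeded with e_0 and e_1.
a = (1 + beta) * (1 - lam * h)
c = beta * (1 - lam * h)
rec = [errs[0], errs[1]]
for k in range(1, 40):
    rec.append(a * rec[k] - c * rec[k - 1])

print(max(np.max(np.abs(r - e)) for r, e in zip(rec, errs)))
```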

Characteristic equation

The characteristic equation for the recurrence relation is given by

$$\xi_i^2 = (1+\beta)(1-\lambda_ih)\xi_i - \beta(1-\lambda_ih).$$

Denote the two roots by $\xi_{i,1}$ and $\xi_{i,2}$, and assume they are distinct for simplicity. The general solution is given by

$$e_{k,i} = C_{i,1}\xi_{i,1}^k + C_{i,2}\xi_{i,2}^k.$$

Let $C_i = |C_{i,1}| + |C_{i,2}|$ and $\theta_i = \max\{|\xi_{i,1}|, |\xi_{i,2}|\}$. We have

$$|e_{k,i}| \le C_i\theta_i^k.$$

Hence,

$$\|x_k - x^*\|_2^2 = \|\bar{x}_k - \bar{x}^*\|_2^2 = \sum_i |e_{k,i}|^2 \le \sum_i C_i^2\theta_i^{2k} \le C\theta^{2k},$$

where $C = \sum_i C_i^2$ and $\theta = \max_i\theta_i$.
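Since $\theta_i$ depends only on $\lambda_i$, $h$ and $\beta$, it can be tabulated from the quadratic formula. The sketch below (hypothetical helper `root_moduli`, test values $\mu = 1$, $L = 100$) uses Nesterov's $h = 1/L$ and $\beta$ and reports the worst $\theta_i$ over a grid standing in for the eigenvalues; for this choice the worst case sits at $\lambda = \mu$, where the two roots coincide.

```python
import numpy as np

def root_moduli(h, beta, lam):
    """Largest |xi| among the roots of xi^2 - (1+beta)(1-lam h) xi + beta(1-lam h) = 0."""
    lam = np.asarray(lam, dtype=float)
    a = (1 + beta) * (1 - lam * h)
    c = beta * (1 - lam * h)
    disc = np.sqrt((a * a - 4 * c).astype(complex))   # complex sqrt handles complex roots
    return np.maximum(np.abs((a + disc) / 2), np.abs((a - disc) / 2))

mu, L = 1.0, 100.0
lam = np.linspace(mu, L, 1001)                 # grid standing in for the eigenvalues
h = 1.0 / L
beta = (1 - np.sqrt(mu * h)) / (1 + np.sqrt(mu * h))
theta_i = root_moduli(h, beta, lam)
print(theta_i.max(), lam[theta_i.argmax()])    # worst-case theta and where it occurs
```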

Finding the optimal convergence rate

Our problem becomes

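$$\begin{aligned}
&\text{minimize} && \theta\\
&\text{subject to} && \theta \ge |\xi_1(\lambda)|,\ |\xi_2(\lambda)| \quad \forall \lambda \in [\mu, L],
\end{aligned}$$

where $\xi_1(\lambda)$ and $\xi_2(\lambda)$ are the roots of

$$\xi^2 = (1+\beta)(1-\lambda h)\xi - \beta(1-\lambda h),$$

and $h$, $\beta$ and $\theta$ are the variables.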
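The min-max problem can be explored with a crude numerical search. This sketch is not how the optimum reported on the next slides was derived: it assumes $\beta$ is tied to $h$ through $\beta(h) = \frac{1-\sqrt{\mu h}}{1+\sqrt{\mu h}}$, scans $h$ over $[\frac{1}{L}, \frac{2}{L+\mu}]$, and evaluates the worst root modulus on a grid of $\lambda$. The helper `worst_theta` and the values $\mu = 1$, $L = 100$ are hypothetical.

```python
import numpy as np

def worst_theta(h, beta, mu, L, n_lam=401):
    """Max over a lam grid on [mu, L] of the largest root modulus."""
    lam = np.linspace(mu, L, n_lam)
    a = (1 + beta) * (1 - lam * h)
    c = beta * (1 - lam * h)
    disc = np.sqrt((a * a - 4 * c).astype(complex))
    return np.max(np.maximum(np.abs((a + disc) / 2), np.abs((a - disc) / 2)))

mu, L = 1.0, 100.0
kappa = L / mu
beta_of = lambda h: (1 - np.sqrt(mu * h)) / (1 + np.sqrt(mu * h))   # assumed coupling
hs = np.linspace(1.0 / L, 2.0 / (L + mu), 2000)
thetas = [worst_theta(h, beta_of(h), mu, L) for h in hs]
best = int(np.argmin(thetas))
h_best, theta_best = hs[best], thetas[best]
print(h_best, 4 / (3 * L + mu))
print(theta_best, 1 - 2 / np.sqrt(3 * kappa + 1))
```

Within this one-parameter family the worst case is attained at $\lambda = \mu$ or $\lambda = L$, so including both endpoints in the $\lambda$ grid is what matters.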

Special cases

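If $\beta = 0$, we are doing gradient descent. The optimal rate is given by $\theta^* = \frac{L-\mu}{L+\mu}$, attained at $h^* = \frac{2}{L+\mu}$.

If $h = \frac{1}{L}$, the optimal rate is given by

$$\theta^* = 1 - \sqrt{\mu h} = 1 - \sqrt{\frac{\mu}{L}}, \quad \text{attained at} \quad \beta^* = \frac{1-\sqrt{\mu h}}{1+\sqrt{\mu h}} = \frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}},$$

which confirms Nesterov’s choice.

Q: Why do we choose $h = \frac{1}{L}$? For a function whose gradient is Lipschitz continuous with constant $L$, a gradient step of size $h$ guarantees the decrease $f(y) - f(y - hf'(y)) \ge \frac{h(2-Lh)}{2}\|f'(y)\|^2$, and $h = \frac{1}{L}$ maximizes this guaranteed decrease.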
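Both special cases can be checked with the same worst-case root computation. The sketch below (hypothetical helper, test values $\mu = 1$, $L = 100$) evaluates $\beta = 0$ with $h = \frac{2}{L+\mu}$ and $h = \frac{1}{L}$ with $\beta^*$, and also nudges $\beta$ away from $\beta^*$ in both directions to show that either change makes the worst case larger.

```python
import numpy as np

def worst_theta(h, beta, mu, L, n_lam=2001):
    """Largest root modulus of xi^2 - (1+beta)(1-lam h) xi + beta(1-lam h) over a lam grid."""
    lam = np.linspace(mu, L, n_lam)
    a = (1 + beta) * (1 - lam * h)
    c = beta * (1 - lam * h)
    disc = np.sqrt((a * a - 4 * c).astype(complex))
    return float(np.max(np.maximum(np.abs((a + disc) / 2), np.abs((a - disc) / 2))))

mu, L = 1.0, 100.0
gd = worst_theta(2 / (L + mu), 0.0, mu, L)                 # beta = 0, h = 2/(L + mu)
beta_star = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
nes = worst_theta(1 / L, beta_star, mu, L)                 # h = 1/L, beta = beta*
worse_low = worst_theta(1 / L, 0.95 * beta_star, mu, L)    # beta slightly too small
worse_high = worst_theta(1 / L, 1.05 * beta_star, mu, L)   # beta slightly too large
print(gd, (L - mu) / (L + mu))
print(nes, 1 - np.sqrt(mu / L))
print(worse_low, worse_high)
```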

The optimal convergence rate

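By considering all combinations of $h$ and $\beta$, we reach the following optimal solution:

$$h^* = \frac{4}{3L+\mu} \quad \left(\text{the harmonic mean of } \frac{1}{L} \text{ and } \frac{2}{L+\mu}\right), \qquad \beta^* = \frac{1-\sqrt{\mu h^*}}{1+\sqrt{\mu h^*}},$$

$$\theta^* = 1 - \sqrt{\mu h^*} = 1 - \frac{2}{\sqrt{3\kappa+1}}.$$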
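Spelling out the harmonic mean shows where the closed forms come from; the last two steps use $\kappa = L/\mu$:

$$h^* = \frac{2}{\,L + \frac{L+\mu}{2}\,} = \frac{4}{3L+\mu}, \qquad \mu h^* = \frac{4\mu}{3L+\mu} = \frac{4}{3\kappa+1}, \qquad \theta^* = 1 - \sqrt{\mu h^*} = 1 - \frac{2}{\sqrt{3\kappa+1}}.$$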

Comparing the convergence rates

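Nesterov’s method ($h = \frac{1}{L}$):

$$\|x_k - x^*\| \le C\left(1 - \frac{1}{\sqrt{\kappa}}\right)^k\|x_0 - x^*\|.$$

Note that this is better than the convergence rate we have on general strongly convex functions, where $f(x_k) - f^* \le L\left(1 - \frac{1}{\sqrt{\kappa}}\right)^k\|x_0 - x^*\|^2$ together with $f(x_k) - f^* \ge \frac{\mu}{2}\|x_k - x^*\|^2$ only gives $\|x_k - x^*\| \le \sqrt{2\kappa}\left(1 - \frac{1}{\sqrt{\kappa}}\right)^{k/2}\|x_0 - x^*\|$.

Nesterov’s method ($h = \frac{4}{3L+\mu}$):

$$\|x_k - x^*\| \le C\left(1 - \frac{2}{\sqrt{3\kappa+1}}\right)^k\|x_0 - x^*\|.$$

Conjugate gradient:

$$\|x_k - x^*\|_A \le 2\left(1 - \frac{2}{\sqrt{\kappa}+1}\right)^k\|x_0 - x^*\|_A.$$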
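To get a feel for what these rates mean in practice, the sketch below counts how many iterations each bound needs to shrink the error by a factor of $10^{-6}$ at $\kappa = 10^4$. It ignores the constants $C$ and $2$ (an assumption), and the names are hypothetical.

```python
import math

def iterations_for(rate, reduction):
    """Smallest k with rate**k <= reduction (leading constants ignored)."""
    return math.ceil(math.log(reduction) / math.log(rate))

kappa, reduction = 1e4, 1e-6
rates = {
    "Nesterov, h = 1/L": 1 - 1 / math.sqrt(kappa),
    "Nesterov, h = 4/(3L + mu)": 1 - 2 / math.sqrt(3 * kappa + 1),
    "conjugate gradient": 1 - 2 / (math.sqrt(kappa) + 1),
}
iters = {name: iterations_for(r, reduction) for name, r in rates.items()}
for name, k in iters.items():
    print(f"{name:28s} {k:6d}")
```

For large $\kappa$ the ratio between the two Nesterov variants tends to $2/\sqrt{3} \approx 1.15$, while conjugate gradient needs about half as many iterations as $h = \frac{1}{L}$.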

What’s happening on the eigenspace

Figure: Error along eigendirections ($|e_{k,i}|$)

The model problem

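$$\text{minimize } f(x) = \tfrac{1}{2}x^TAx - b^Tx,$$

where

$$A = \begin{pmatrix} 2 & -1 & & \\ -1 & \ddots & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{pmatrix} + \delta I_n \in \mathbb{R}^{n\times n}, \qquad b = \mathrm{randn}(n, 1) \in \mathbb{R}^n.$$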

We chose $n = 10^6$ and $\delta = 0.05$.
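A scaled-down version of the model problem can be set up without forming $A$. The eigenvalues of the tridiagonal part are known in closed form, $2 - 2\cos\frac{j\pi}{n+1}$ for $j = 1,\dots,n$, which gives $\mu$ and $L$ exactly. The sketch uses $n = 1000$ instead of the slides' size, a fixed random seed and hypothetical helper names; it prints residuals and makes no claim about reproducing the figures.

```python
import numpy as np

def model_matvec(v, delta):
    """A v for A = tridiag(-1, 2, -1) + delta * I, without forming A."""
    out = (2.0 + delta) * v
    out[1:] -= v[:-1]
    out[:-1] -= v[1:]
    return out

def model_mu_L(n, delta):
    """Extreme eigenvalues 2 - 2 cos(j pi / (n + 1)) + delta for j = 1 and j = n."""
    j = np.array([1.0, float(n)])
    lam = 2.0 - 2.0 * np.cos(j * np.pi / (n + 1)) + delta
    return lam[0], lam[1]

def nesterov(b, delta, mu, L, h, iters):
    """Constant step scheme with beta = (1 - sqrt(mu h)) / (1 + sqrt(mu h)), y_0 = x_0 = 0."""
    beta = (1 - np.sqrt(mu * h)) / (1 + np.sqrt(mu * h))
    x = np.zeros_like(b)
    y = x.copy()
    for _ in range(iters):
        x_next = y - h * (model_matvec(y, delta) - b)
        y = x_next + beta * (x_next - x)
        x = x_next
    return x

n, delta = 1000, 0.05                      # much smaller n than on the slide
rng = np.random.default_rng(0)
b = rng.standard_normal(n)
mu, L = model_mu_L(n, delta)
for name, h in [("h = 1/L", 1 / L), ("h = 4/(3L+mu)", 4 / (3 * L + mu))]:
    x = nesterov(b, delta, mu, L, h, iters=300)
    print(name, np.linalg.norm(model_matvec(x, delta) - b))
```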

Figure: $\|x_k - x^*\|$

Figure: $f(x_k) - f^*$

3 General Strongly Convex Functions

Back to Nesterov’s proof

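A pair of sequences $\{\phi_k(x)\}_{k\ge0}$ and $\{\lambda_k\}_{k\ge0}$, $\lambda_k \ge 0$, is called an estimate sequence of the function $f(x)$ if $\lambda_k \to 0$ and for any $x \in \mathbb{R}^n$ and all $k \ge 0$ we have

$$\phi_k(x) \le (1-\lambda_k)f(x) + \lambda_k\phi_0(x).$$

If for a sequence $\{x_k\}$ we have

$$f(x_k) \le \phi_k^* \equiv \min_{x\in\mathbb{R}^n}\phi_k(x),$$

then $f(x_k) - f^* \le \lambda_k[\phi_0(x^*) - f^*] \to 0$.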
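The last implication is a one-line chain: the first inequality is the assumption, the second holds because $\phi_k^*$ is the minimum of $\phi_k$, and the third is the estimate-sequence property at $x = x^*$:

$$f(x_k) \le \phi_k^* \le \phi_k(x^*) \le (1-\lambda_k)f^* + \lambda_k\phi_0(x^*) \;\Longrightarrow\; f(x_k) - f^* \le \lambda_k\bigl[\phi_0(x^*) - f^*\bigr].$$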

A useful estimate sequence provided by Nesterov

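$$\lambda_{k+1} = (1-\alpha_k)\lambda_k,$$

$$\phi_{k+1}(x) = (1-\alpha_k)\phi_k(x) + \alpha_k\left[f(y_k) + \langle f'(y_k), x - y_k\rangle + \frac{\mu}{2}\|x - y_k\|^2\right],$$

where

$\{y_k\}$ is an arbitrary sequence in $\mathbb{R}^n$.

$\alpha_k \in (0, 1)$ and $\sum_{k=0}^\infty \alpha_k = \infty$.

$\lambda_0 = 1$.

$\phi_0$ is an arbitrary function on $\mathbb{R}^n$.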

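A specific choice of $\phi_0(x)$

$$\phi_0(x) \equiv \phi_0^* + \frac{\gamma_0}{2}\|x - v_0\|^2,$$

and set $x_0 = v_0$, $\phi_0^* = f(x_0)$. The previous estimate sequence becomes

$$\phi_k(x) \equiv \phi_k^* + \frac{\gamma_k}{2}\|x - v_k\|^2$$

with

$$\begin{aligned}
\gamma_{k+1} &= (1-\alpha_k)\gamma_k + \alpha_k\mu,\\
v_{k+1} &= \bigl[(1-\alpha_k)\gamma_kv_k + \alpha_k\mu y_k - \alpha_kf'(y_k)\bigr]/\gamma_{k+1},\\
\phi_{k+1}^* &= (1-\alpha_k)\phi_k^* + \alpha_kf(y_k) - \frac{\alpha_k^2}{2\gamma_{k+1}}\|f'(y_k)\|^2\\
&\quad + \frac{\alpha_k(1-\alpha_k)\gamma_k}{\gamma_{k+1}}\left(\frac{\mu}{2}\|y_k - v_k\|^2 + \langle f'(y_k), v_k - y_k\rangle\right).
\end{aligned}$$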
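Because these closed forms are easy to mistype, a numerical cross-check is useful: build $\phi_{k+1}$ directly from its definition and compare it with $\phi_{k+1}^* + \frac{\gamma_{k+1}}{2}\|x - v_{k+1}\|^2$ at random points. The quadratic $f$, the current $\phi_k$ (its $\phi_k^*$, $\gamma_k$, $v_k$), $y_k$ and $\alpha_k$ are all hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu = 5, 0.5
# Test data: a strongly convex quadratic f(x) = 1/2 x^T Q x - c^T x with Q >= mu I.
M = rng.standard_normal((n, n))
Q = M @ M.T + mu * np.eye(n)
c = rng.standard_normal(n)
f = lambda x: 0.5 * x @ Q @ x - c @ x
grad = lambda x: Q @ x - c

# Current phi_k(x) = phi_star + gamma/2 ||x - v||^2, a point y_k and a weight alpha_k.
phi_star, gamma, v = 3.0, 2.0, rng.standard_normal(n)
y, alpha = rng.standard_normal(n), 0.3
g = grad(y)

# Closed-form updates.
gamma_new = (1 - alpha) * gamma + alpha * mu
v_new = ((1 - alpha) * gamma * v + alpha * mu * y - alpha * g) / gamma_new
phi_star_new = ((1 - alpha) * phi_star + alpha * f(y)
                - alpha**2 / (2 * gamma_new) * (g @ g)
                + alpha * (1 - alpha) * gamma / gamma_new
                * (mu / 2 * ((y - v) @ (y - v)) + g @ (v - y)))

def phi_next_direct(x):
    """phi_{k+1}(x) from its definition: (1 - alpha) phi_k(x) + alpha * [model at y_k]."""
    phi_k = phi_star + gamma / 2 * ((x - v) @ (x - v))
    model = f(y) + g @ (x - y) + mu / 2 * ((x - y) @ (x - y))
    return (1 - alpha) * phi_k + alpha * model

def phi_next_closed(x):
    return phi_star_new + gamma_new / 2 * ((x - v_new) @ (x - v_new))

for x in rng.standard_normal((3, n)):
    print(phi_next_direct(x), phi_next_closed(x))
```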

Let the update be

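$$x_{k+1} = y_k - h_kf'(y_k)$$

and use the inequalities (the induction hypothesis $\phi_k^* \ge f(x_k)$ with strong convexity, and the descent bound for a gradient with Lipschitz constant $L$)

$$\phi_k^* \ge f(x_k) \ge f(y_k) + \langle f'(y_k), x_k - y_k\rangle + \frac{\mu}{2}\|x_k - y_k\|^2,$$

$$f(x_{k+1}) \le f(y_k) - \frac{h_k(2-Lh_k)}{2}\|f'(y_k)\|^2.$$

We have

$$\begin{aligned}
\phi_{k+1}^* - f(x_{k+1}) \ge{}& \left(-\frac{\alpha_k^2}{2\gamma_{k+1}} + \frac{h_k(2-Lh_k)}{2}\right)\|f'(y_k)\|^2\\
&+ (1-\alpha_k)\left\langle f'(y_k), \frac{\alpha_k\gamma_k}{\gamma_{k+1}}(v_k - y_k) + (x_k - y_k)\right\rangle\\
&+ \frac{\mu(1-\alpha_k)}{2}\left(\frac{\alpha_k\gamma_k}{\gamma_{k+1}}\|v_k - y_k\|^2 + \|x_k - y_k\|^2\right).
\end{aligned}$$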

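Nesterov’s choice:

$$y_k = \frac{\alpha_k\gamma_kv_k + \gamma_{k+1}x_k}{\gamma_k + \alpha_k\mu}, \qquad h_k = \frac{1}{L}.$$

Because $\gamma_k + \alpha_k\mu = \alpha_k\gamma_k + \gamma_{k+1}$, this $y_k$ makes the inner-product term vanish, and $h_k = \frac{1}{L}$ makes the coefficient of $\|f'(y_k)\|^2$ equal to $\frac{1}{2L} - \frac{\alpha_k^2}{2\gamma_{k+1}}$.

Take $\gamma_0 \ge \mu$. Since $\gamma_{k+1} = (1-\alpha_k)\gamma_k + \alpha_k\mu$, we have $\gamma_k \ge \mu$ for all $k$.

Hence $\alpha_k$ can be as large as $\sqrt{\mu/L}$ at each step (then $\alpha_k^2 = \mu/L \le \gamma_{k+1}/L$), which leads to the convergence rate $1 - \sqrt{\mu/L} = 1 - 1/\sqrt{\kappa}$.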
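Putting the pieces together gives a scheme that can be run while tracking $\phi_k^*$, so the invariant $f(x_k) \le \phi_k^*$ used in the argument can be observed directly. This is a sketch under stated assumptions: $\alpha_k \equiv \sqrt{\mu/L}$ as on this slide, $\gamma_0 \ge \mu$ is a free parameter, and the function name and diagonal test problem are hypothetical.

```python
import numpy as np

def estimate_sequence_scheme(f, grad, x0, mu, L, gamma0=None, iters=400):
    """Hypothetical sketch: y_k from Nesterov's choice, h_k = 1/L, alpha_k = sqrt(mu/L).

    Also returns the pairs (f(x_k), phi_k^*) so that f(x_k) <= phi_k^* can be checked.
    """
    gamma = mu if gamma0 is None else gamma0
    assert gamma >= mu, "the argument needs gamma_0 >= mu"
    alpha = np.sqrt(mu / L)
    x = np.array(x0, dtype=float)
    v = x.copy()                               # v_0 = x_0
    phi_star = f(x)                            # phi_0^* = f(x_0)
    history = [(f(x), phi_star)]
    for _ in range(iters):
        gamma_new = (1 - alpha) * gamma + alpha * mu
        y = (alpha * gamma * v + gamma_new * x) / (gamma + alpha * mu)
        g = grad(y)
        # phi*_{k+1} and v_{k+1} from the closed-form updates (using the old v and gamma)
        phi_star = ((1 - alpha) * phi_star + alpha * f(y)
                    - alpha**2 / (2 * gamma_new) * (g @ g)
                    + alpha * (1 - alpha) * gamma / gamma_new
                    * (mu / 2 * ((y - v) @ (y - v)) + g @ (v - y)))
        v = ((1 - alpha) * gamma * v + alpha * mu * y - alpha * g) / gamma_new
        x = y - g / L                          # gradient step with h_k = 1/L
        gamma = gamma_new
        history.append((f(x), phi_star))
    return x, history

d = np.linspace(1.0, 100.0, 40)
b = np.ones(40)
f = lambda z: 0.5 * z @ (d * z) - b @ z
grad = lambda z: d * z - b
x_hat, hist = estimate_sequence_scheme(f, grad, np.zeros(40), mu=1.0, L=100.0, gamma0=10.0)
print(np.linalg.norm(x_hat - b / d))
```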

A simplified version

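$$\gamma_k \equiv \mu, \qquad h_k \equiv \frac{1}{L},$$

$$y_k = \frac{\alpha_kv_k + x_k}{\alpha_k + 1}, \qquad v_k - y_k = \frac{v_k - x_k}{\alpha_k + 1}, \qquad x_k - y_k = \frac{\alpha_k(x_k - v_k)}{\alpha_k + 1},$$

$$\phi_{k+1}^* - f(x_{k+1}) \ge \left(-\frac{\alpha_k^2}{2\mu} + \frac{1}{2L}\right)\|f'(y_k)\|^2 + \frac{\mu\alpha_k(1-\alpha_k)}{2(1+\alpha_k)}\|x_k - v_k\|^2.$$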

$\|x_k - v_k\|^2 / \|f'(y_k)\|^2$

Figure: $f(x) = \frac{1}{2}\|Ax - b\|^2 + \lambda\cdot\mathrm{smooth}(\|x\|_1, \tau) + \frac{\mu}{2}\|x\|^2$

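$$\frac{\mu\alpha_k(1-\alpha_k)}{2(1+\alpha_k)}\cdot\frac{\|x_k - v_k\|^2}{\left\|f'\!\left(\frac{\alpha_kv_k + x_k}{\alpha_k + 1}\right)\right\|^2} - \frac{\alpha_k^2}{2\mu} \ge -\frac{1}{2L}$$

Since the decay rate is $\prod_k(1-\alpha_k)$, we want to find a large $\alpha_k$ such that this inequality holds.

Evaluating $f'\!\left(\frac{\alpha_kv_k + x_k}{\alpha_k + 1}\right)$ is time-consuming (each candidate $\alpha_k$ requires a new gradient), so we hope our first guess of $\alpha_k$ is good.

Note that $\|f'(y_k)\|$ tends to decrease, so our procedure is to find an $\alpha_k \ge \sqrt{\mu/L}$ such that

$$\frac{\mu\alpha_k(1-\alpha_k)}{2(1+\alpha_k)}\cdot\frac{\|x_k - v_k\|^2}{\|f'(y_{k-1})\|^2} - \frac{\alpha_k^2}{2\mu}$$

is large; such an $\alpha_k$ usually makes the inequality hold. Replacing $\|f'(y_k)\|$ by the typically larger $\|f'(y_{k-1})\|$ makes this estimate conservative, and $\alpha_k = \sqrt{\mu/L}$ always satisfies the inequality.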
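The procedure can be sketched as follows. This is one possible reading, not the authors' implementation: the candidate set (a grid on $[\sqrt{\mu/L}, 0.99]$), the rule "take the largest candidate whose proxy inequality holds", and the fallback to $\alpha_k = \sqrt{\mu/L}$ when the check with the true gradient fails are all assumptions, and `adaptive_nesterov` is a hypothetical name. Because every accepted $\alpha_k$ satisfies the inequality, $f(x_k) \le \phi_k^*$ is preserved and the returned `lam` $= \prod_k(1-\alpha_k)$ bounds the decay of $f(x_k) - f^*$.

```python
import numpy as np

def adaptive_nesterov(grad, x0, mu, L, iters=400, n_grid=200):
    """Hypothetical sketch of the adaptive simplified scheme (gamma_k = mu, h_k = 1/L).

    Each step tries the largest grid value of alpha whose proxy inequality holds
    (||f'(y_{k-1})||^2 in place of ||f'(y_k)||^2), re-checks it with the true
    gradient, and falls back to alpha = sqrt(mu/L) if the check fails.
    Returns the final iterate and lam = prod_k (1 - alpha_k).
    """
    a_safe = np.sqrt(mu / L)                   # here -a^2/(2 mu) + 1/(2L) = 0: always admissible
    grid = np.linspace(a_safe, 0.99, n_grid)   # assumed candidate set
    x = np.array(x0, dtype=float)
    v = x.copy()
    g_prev_sq = None
    lam = 1.0

    def margin(a, dxv_sq, g_sq):
        # Lower bound on phi*_{k+1} - f(x_{k+1}) from the slide, multiplied out so
        # that no division by ||f'||^2 is needed; >= 0 means the step is admissible.
        return (mu * a * (1 - a) / (2 * (1 + a)) * dxv_sq
                + (1 / (2 * L) - a * a / (2 * mu)) * g_sq)

    for _ in range(iters):
        dxv_sq = float((x - v) @ (x - v))
        a = a_safe
        if g_prev_sq:                          # cheap guess from the previous gradient
            ok = grid[margin(grid, dxv_sq, g_prev_sq) >= 0]
            if ok.size:
                a = float(ok.max())
        y = (a * v + x) / (1 + a)
        g = grad(y)
        g_sq = float(g @ g)
        if a > a_safe and margin(a, dxv_sq, g_sq) < 0:
            a = a_safe                         # guess failed: one extra gradient evaluation
            y = (a * v + x) / (1 + a)
            g = grad(y)
            g_sq = float(g @ g)
        # x_{k+1} = y_k - f'(y_k)/L; v_{k+1} is the v update with gamma_k = gamma_{k+1} = mu
        x, v = y - g / L, (1 - a) * v + a * y - (a / mu) * g
        lam *= 1 - a
        g_prev_sq = g_sq
    return x, lam

# Usage on a hypothetical diagonal quadratic with mu = 1, L = 100.
rng = np.random.default_rng(0)
d = np.linspace(1.0, 100.0, 50)
b = rng.standard_normal(50)
x_hat, lam = adaptive_nesterov(lambda z: d * z - b, np.zeros(50), mu=1.0, L=100.0)
print(np.linalg.norm(x_hat - b / d), lam)
```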

$\|f'(y_k)\|$

Figure: $f(x) = \frac{1}{2}\|Ax - b\|^2 + \lambda\cdot\mathrm{smooth}(\|x\|_1, \tau) + \frac{\mu}{2}\|x\|^2$

Test 1: smooth-BPDN

The first test is a smooth version of Basis Pursuit De-Noising:

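$$\text{minimize } f(x) = \frac{1}{2}\|Ax - b\|^2 + \lambda\cdot\mathrm{smooth}(\|x\|_1, \tau) + \frac{\mu}{2}\|x\|^2,$$

where we set

$$A = \frac{1}{\sqrt{n}}\cdot\mathrm{randn}(m, n), \quad m = 1000,\ n = 3000, \qquad \lambda = 0.2,\ \tau = 0.001,\ \mu = 0.01.$$

$x^*$ is a random sparse vector with 125 nonzeros and $b = Ax^* + \varepsilon$. We use the following estimate for $L$:

$$L = \left(1 + \sqrt{\frac{m}{n}}\right)^2 + \frac{\lambda}{\tau} + \mu \approx 202.50.$$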
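The estimate of $L$ can be reproduced from the stated parameters. The sketch assumes the smoothing is Huber-like, so its curvature is at most $1/\tau$ and contributes $\lambda/\tau$, and that $\|A\|_2^2 \approx (1+\sqrt{m/n})^2$ for $A = \mathrm{randn}(m,n)/\sqrt{n}$ (the Marchenko-Pastur edge). The empirical check uses a smaller matrix with the same aspect ratio, and `estimate_L` is a hypothetical name.

```python
import numpy as np

def estimate_L(m, n, lam, tau, mu):
    """(1 + sqrt(m/n))^2 + lam/tau + mu.

    Assumes the smoothed l1 term has curvature at most 1/tau and that
    ||A||_2^2 is close to (1 + sqrt(m/n))^2 for A = randn(m, n)/sqrt(n).
    """
    return (1 + np.sqrt(m / n)) ** 2 + lam / tau + mu

print(round(estimate_L(1000, 3000, 0.2, 0.001, 0.01), 2))

# Empirical check of the ||A||_2^2 part on a smaller instance with m/n = 1/3.
rng = np.random.default_rng(0)
m, n = 200, 600
A = rng.standard_normal((m, n)) / np.sqrt(n)
print(np.linalg.norm(A, 2) ** 2, (1 + np.sqrt(m / n)) ** 2)
```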

Figure: $\|x_k - x^*\|$

Figure: $f(x_k) - f^*$

Test 2: anisotropic bowl

The second test is

$$\text{minimize } f(x) = \sum_{i=1}^n i\cdot x_i^4 + \frac{1}{2}\|x\|^2 \quad \text{subject to } \|x\| \le \tau.$$

We choose $n = 500$ and $\tau = 4$. $x_0$ is randomly chosen from the boundary $\{x \mid \|x\| = \tau\}$. For this problem, we have

$$L = 12n\tau^2 + 1 = 96001, \qquad \mu = 1.$$
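The test function, its gradient and the Hessian bound are simple enough to write down directly. The Hessian is diagonal with entries $12\,i\,x_i^2 + 1$, so on the ball its largest entry is $12n\tau^2 + 1$, attained at $x = \tau e_n$. The slides do not say how the constraint $\|x\| \le \tau$ is enforced; the Euclidean projection below is an assumption (one natural choice for a projected variant), and all helper names are hypothetical.

```python
import numpy as np

def bowl_f(x):
    """f(x) = sum_i i * x_i^4 + 1/2 ||x||^2, with indices starting at 1."""
    i = np.arange(1, x.size + 1)
    return float(np.sum(i * x**4) + 0.5 * x @ x)

def bowl_grad(x):
    i = np.arange(1, x.size + 1)
    return 4 * i * x**3 + x

def hessian_diag(x):
    """The Hessian is diagonal with entries 12 i x_i^2 + 1 (so mu = 1)."""
    i = np.arange(1, x.size + 1)
    return 12 * i * x**2 + 1

def project_ball(x, tau):
    """Euclidean projection onto {x : ||x|| <= tau} (assumed constraint handling)."""
    nrm = np.linalg.norm(x)
    return x if nrm <= tau else (tau / nrm) * x

n, tau = 500, 4.0
e_n = np.zeros(n); e_n[-1] = tau               # maximizer of the largest Hessian entry
print(hessian_diag(e_n).max(), 12 * n * tau**2 + 1)

# Random starting point on the boundary ||x0|| = tau, as in the test.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(n)
x0 *= tau / np.linalg.norm(x0)
```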

Figure: $\|x_k - x^*\|$

Figure: $f(x_k) - f^*$

Test 3: back to quadratic functions

Let’s check the performance of the adaptive algorithm on quadratic functions:

$$\text{minimize } f(x) = \tfrac{1}{2}x^TAx - b^Tx.$$

We choose $A \sim \frac{1}{m}W_n(I_n, m)$, where $n = 4500$ and $m = 5000$. We use the following estimates for $L$ and $\mu$:

$$L = \left(1 + \sqrt{\frac{n}{m}}\right)^2, \qquad \mu = \left(1 - \sqrt{\frac{n}{m}}\right)^2.$$
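The Wishart test matrix and the $L$, $\mu$ estimates can be generated as below; the estimates are the Marchenko-Pastur edges for the ratio $n/m$. The sketch uses a problem 5 times smaller than the slides' ($n = 900$, $m = 1000$, the same ratio $0.9$) so the eigenvalue check runs quickly, and the helper names are hypothetical.

```python
import numpy as np

def wishart_quadratic(n, m, rng):
    """A = G^T G / m with G an m-by-n standard Gaussian matrix, i.e. A ~ W_n(I_n, m) / m."""
    G = rng.standard_normal((m, n))
    return G.T @ G / m

def mp_bounds(n, m):
    """(mu, L) estimates: (1 - sqrt(n/m))^2 and (1 + sqrt(n/m))^2."""
    r = np.sqrt(n / m)
    return (1 - r) ** 2, (1 + r) ** 2

rng = np.random.default_rng(0)
n, m = 900, 1000
A = wishart_quadratic(n, m, rng)
ev = np.linalg.eigvalsh(A)                 # ascending eigenvalues
mu_est, L_est = mp_bounds(n, m)
print(ev[0], mu_est)
print(ev[-1], L_est)
print(L_est / mu_est)                      # estimated condition number
```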

Figure: $\|x_k - x^*\|$

Figure: $f(x_k) - f^*$

Comparing with TFOCS(AT)

Figure: $\|x_k - x^*\|$

Figure: $f(x_k) - f^*$

Final thoughts

The convergence rate of Nesterov’s method depends on the problem type. For quadratic problems, the speed is doubled: the factor $(1 - 1/\sqrt{\kappa})^k$ bounds $\|x_k - x^*\|$ rather than $\|x_k - x^*\|^2$.

There is room to improve Nesterov’s optimal gradient method on strongly convex functions.

Whether we can improve Nesterov’s method universally (with a theoretical proof) remains an open question.
