Subgradient and Bundle Methods

DESCRIPTION

A short first seminar on subgradient and bundle methods for nonsmooth optimization.

TRANSCRIPT
Subgradient and Bundle Methods
For optimization of convex nonsmooth functions

April 1, 2009
Motivation

Many naturally occurring problems are nonsmooth:
- Hinge loss
- The feasible region of a convex minimization problem
- Piecewise linear functions

Even when a function approximating a nonsmooth function is analytically smooth, it may be "numerically nonsmooth".
Methods for nonsmooth optimization

- Approximate by a series of smooth functions
- Reformulate the problem, adding constraints so that the objective is smooth
- Subgradient methods
- Cutting plane methods
- Moreau-Yosida regularization
- Bundle methods
- UV-decomposition
Definition: an extension of gradients

For a convex differentiable function f(x), for all x, y:

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)   (1)

By analogy, a subgradient of f at x is defined as any g ∈ Rⁿ such that for all y:

f(y) ≥ f(x) + gᵀ(y − x)   (2)

The set of all subgradients of f at x is denoted ∂f(x).
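The defining inequality (2) can be checked numerically on a toy example. The sketch below (my example, not from the slides) uses f(x) = |x|, for which sign(x) is a valid subgradient everywhere (and any g ∈ [−1, 1] is valid at x = 0):

```python
# Check the subgradient inequality f(y) >= f(x) + g*(y - x)
# for f(x) = |x|, using the subgradient g = sign(x).
import random

def f(x):
    return abs(x)

def subgradient(x):
    # sign(x) is a valid subgradient; at x = 0, any g in [-1, 1] works
    # and we pick g = 0.
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0

random.seed(0)
for _ in range(1000):
    x = random.uniform(-5, 5)
    y = random.uniform(-5, 5)
    g = subgradient(x)
    assert f(y) >= f(x) + g * (y - x) - 1e-12
print("subgradient inequality holds at all sampled points")
```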
Some facts from convex analysis

- A convex function is always subdifferentiable, i.e. a subgradient exists at every point (of the interior of its domain).
- Directional derivatives also exist at every point.
- If a convex function f is differentiable at x, its only subgradient there is the gradient: ∂f(x) = {∇f(x)}.
- Subgradients are lower bounds for directional derivatives: f′(x; d) = sup_{g∈∂f(x)} ⟨g, d⟩.
- Further, d is a descent direction iff gᵀd < 0 for all g ∈ ∂f(x).
Properties (without proof)

- ∂(f₁ + f₂)(x) = ∂f₁(x) + ∂f₂(x)
- ∂(αf)(x) = α ∂f(x) for α ≥ 0
- g(x) = f(Ax + b) ⇒ ∂g(x) = Aᵀ ∂f(Ax + b)
- Local minimum ⇒ 0 ∈ ∂f(x). However, for f(x) = |x| an oracle returns the subgradient 0 only at x = 0 itself, so waiting for the oracle to return 0 is not a good way to find minima.
Subgradient method: algorithm

The subgradient method is NOT a descent method!

x^(k+1) = x^(k) − α_k g^(k),  with α_k ≥ 0 and g^(k) ∈ ∂f(x^(k))

Since the iterates need not decrease f, keep the best value seen so far:

f_best^(k) = min{ f_best^(k−1), f(x^(k)) }

No line search is performed; the step lengths α_k are usually fixed ahead of time.
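The update and the running best value can be sketched on a toy nonsmooth problem (my example: f(x₁, x₂) = |x₁| + 2|x₂|, minimum 0 at the origin) with a diminishing step size α_k = 1/k:

```python
# A minimal sketch of the subgradient method on the nonsmooth function
# f(x) = |x1| + 2|x2| (minimum 0 at the origin), with diminishing
# step size alpha_k = 1/k and the running best value f_best.
def f(x):
    return abs(x[0]) + 2 * abs(x[1])

def subgradient(x):
    sgn = lambda t: 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)
    return [sgn(x[0]), 2 * sgn(x[1])]

def subgradient_method(x0, n_iters=2000):
    x = list(x0)
    f_best = f(x)
    for k in range(1, n_iters + 1):
        g = subgradient(x)
        alpha = 1.0 / k  # nonsummable diminishing step size
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        f_best = min(f_best, f(x))  # iterates need not descend, so keep the best
    return f_best

result = subgradient_method([3.0, -2.5])
print("f_best after 2000 iterations:", result)
```

Note that f(x^(k)) oscillates once the iterates are near the minimizer, which is exactly why f_best must be tracked separately.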
Step lengths

Commonly used step-length rules:

- Constant step size: α_k = α
- Constant step length: α_k = γ/‖g^(k)‖₂ (so that ‖x^(k+1) − x^(k)‖₂ = γ)
- Square summable but not summable step sizes: α_k ≥ 0, Σ_{k=1}^∞ α_k² < ∞, Σ_{k=1}^∞ α_k = ∞
- Nonsummable diminishing step sizes: α_k ≥ 0, lim_{k→∞} α_k = 0, Σ_{k=1}^∞ α_k = ∞
- Nonsummable diminishing step lengths: α_k = γ_k/‖g^(k)‖₂ with γ_k ≥ 0, lim_{k→∞} γ_k = 0, Σ_{k=1}^∞ γ_k = ∞
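The qualitative difference between the rules can be seen in a small experiment (a toy of my choosing, f(x) = |x| with subgradient sign(x)): a constant step size stalls at an accuracy proportional to the step, while a nonsummable diminishing rule keeps improving:

```python
# Compare a constant step size with a nonsummable diminishing one on
# f(x) = |x|. The subgradient is sign(x); f* = 0 at x = 0.
def run(step_size_fn, x0=1.3, n_iters=5000):
    x, f_best = x0, abs(x0)
    for k in range(1, n_iters + 1):
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
        x -= step_size_fn(k) * g
        f_best = min(f_best, abs(x))
    return f_best

best_const = run(lambda k: 0.3)          # constant step size: stalls near 0.1
best_dimin = run(lambda k: 1.0 / k**0.5) # diminishing 1/sqrt(k): keeps improving
print(best_const, best_dimin)
```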
Convergence result

Assume there exists G such that the subgradient norms are bounded, i.e. ‖g^(k)‖₂ ≤ G for all k (this holds, for example, when f is Lipschitz continuous with constant G).

Result:

f_best^(k) − f* ≤ ( dist(x^(1), X*)² + G² Σ_{i=1}^k α_i² ) / ( 2 Σ_{i=1}^k α_i )

The proof works by bounding how the distance ‖x^(k) − x*‖ to an optimal point evolves.
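The bound can be checked numerically at every iteration. In the sketch below (a toy instance: f(x) = |x|, so f* = 0, G = 1, and dist(x^(1), X*) = |x0|) the bound holds for all k, as the result guarantees:

```python
# Verify f_best^(k) - f* <= (R^2 + G^2*sum(a_i^2)) / (2*sum(a_i)) at each k
# for f(x) = |x|, f* = 0, G = 1, R = |x0| = dist(x0, X*), a_k = 1/sqrt(k).
import math

def check_bound(x0=4.0, n_iters=1000):
    G, f_star, R = 1.0, 0.0, abs(x0)
    x, f_best = x0, abs(x0)
    sum_a, sum_a2 = 0.0, 0.0
    for k in range(1, n_iters + 1):
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
        a = 1.0 / math.sqrt(k)
        x -= a * g
        f_best = min(f_best, abs(x))
        sum_a += a
        sum_a2 += a * a
        bound = (R**2 + G**2 * sum_a2) / (2 * sum_a)
        assert f_best - f_star <= bound + 1e-12  # the convergence result
    return f_best, bound

print(check_bound())
```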
Convergence for commonly used step lengths

- Constant step size α_k = h: f_best^(k) converges to within G²h/2 of optimal
- Constant step length γ = h: f_best^(k) converges to within Gh/2 of optimal
- Square summable but not summable step sizes: f_best^(k) → f*
- Nonsummable diminishing step sizes: f_best^(k) → f*
- Nonsummable diminishing step lengths: f_best^(k) → f*

With R ≥ dist(x^(1), X*):

f_best^(k) − f* ≤ ( R² + G² Σ_{i=1}^k α_i² ) / ( 2 Σ_{i=1}^k α_i )

Minimizing this bound for a fixed number of steps k gives the optimal choice α_i = R/(G√k), for which the bound becomes RG/√k; reaching accuracy ε therefore takes (RG/ε)² steps.
Variations

- If the optimal value f* is known (e.g. it is known to be 0, but the minimizer is not), use Polyak's step length:
  α_k = ( f(x^(k)) − f* ) / ‖g^(k)‖₂²
- Projected subgradient, for minimizing f(x) subject to x ∈ C:
  x^(k+1) = P( x^(k) − α_k g^(k) ), where P is projection onto C
- Alternating projections: find a point in the intersection of two convex sets
- Heavy-ball method:
  x^(k+1) = x^(k) − α_k g^(k) + β_k ( x^(k) − x^(k−1) )
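The projected subgradient variant can be sketched as follows, on a toy problem of my choosing: minimize f(x) = |x₁ − 3| + |x₂ + 2| over the unit ball C = {x : ‖x‖₂ ≤ 1}, where P is Euclidean projection onto C:

```python
# Projected subgradient method: each plain subgradient step is followed
# by Euclidean projection back onto the feasible set C (the unit ball).
import math

def f(x):
    return abs(x[0] - 3.0) + abs(x[1] + 2.0)

def subgradient(x):
    sgn = lambda t: 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)
    return [sgn(x[0] - 3.0), sgn(x[1] + 2.0)]

def project(x):  # Euclidean projection onto the unit ball
    norm = math.hypot(x[0], x[1])
    return x if norm <= 1.0 else [x[0] / norm, x[1] / norm]

x = [0.0, 0.0]
f_best, best_x = float("inf"), x
for k in range(1, 3000):
    g = subgradient(x)
    alpha = 1.0 / math.sqrt(k)  # nonsummable diminishing step size
    x = project([x[i] - alpha * g[i] for i in range(2)])
    if f(x) < f_best:
        f_best, best_x = f(x), x

# for this instance the optimum is x* = (1, -1)/sqrt(2), f* = 5 - sqrt(2)
print(best_x, f_best)
```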
Pros
- Can be applied immediately to a wide variety of problems, especially when the required accuracy is not very high
- Low memory usage
- Often allows distributed implementations when the objective is decomposable

Cons
- Slower than second-order methods
Cutting plane method

- Again consider the problem: minimize f(x) subject to x ∈ C
- Construct an approximate model of f from the points x_i visited so far and their subgradients g_i:
  f̂(x) = max_{i∈I} ( f(x_i) + g_iᵀ(x − x_i) )
- Minimize the model over C to obtain the next iterate; evaluate f(x) and a subgradient g there
- Update the model and repeat until the desired accuracy is reached
- Numerically unstable
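A minimal 1D sketch of the loop above (my toy instance: f(x) = |x − 1| over C = [−4, 6]): each iteration adds a cut, then minimizes the piecewise-linear model exactly by checking the interval endpoints and all pairwise intersections of the cuts:

```python
# 1D cutting plane method for f(x) = |x - 1| over C = [-4, 6].
def f(x):
    return abs(x - 1.0)

def g(x):  # a subgradient of f
    return 1.0 if x > 1.0 else (-1.0 if x < 1.0 else 0.0)

lo, hi = -4.0, 6.0
cuts = []   # (slope, intercept): cut(y) = slope*y + intercept
x = lo      # starting point

for _ in range(30):
    fx, gx = f(x), g(x)
    cuts.append((gx, fx - gx * x))  # the cut f(y) >= f(x) + g(x)(y - x)
    # the piecewise-linear model attains its minimum at an interval
    # endpoint or at an intersection of two cuts
    candidates = [lo, hi]
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            (a1, b1), (a2, b2) = cuts[i], cuts[j]
            if a1 != a2:
                xi = (b2 - b1) / (a1 - a2)
                if lo <= xi <= hi:
                    candidates.append(xi)
    model = lambda y: max(a * y + b for a, b in cuts)
    x = min(candidates, key=model)  # minimize the model over C

print(x, f(x))
```

The early iterates jump between distant points (here from −4 to 6 on the first step) before the model pins down the minimizer, which illustrates the instability noted above.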
Moreau-Yosida regularization

Idea: solve a series of smooth convex problems to minimize f(x).

F(x) = min_{y∈Rⁿ} { f(y) + (λ/2)‖y − x‖² }

p(x) = argmin_{y∈Rⁿ} { f(y) + (λ/2)‖y − x‖² }

F(x) is differentiable, with ∇F(x) = λ(x − p(x)), and it has the same minimizers and minimum value as f.

The inner minimization is done using the dual.

Cutting plane method + Moreau-Yosida regularization = bundle methods
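For f(y) = |y| the regularization is available in closed form, which makes the gradient formula easy to check. In the sketch below (my example; the slide's λ/2 convention), p(x) is soft-thresholding and F is a smooth Huber-like function:

```python
# Moreau-Yosida regularization of f(y) = |y|: p(x) is soft-thresholding,
# F(x) is the smooth envelope, and grad F(x) = lam*(x - p(x)).
lam = 2.0

def prox(x):
    # argmin_y |y| + (lam/2)(y - x)^2  ->  soft-threshold at 1/lam
    t = 1.0 / lam
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def F(x):  # Moreau envelope of |.|
    p = prox(x)
    return abs(p) + (lam / 2.0) * (p - x) ** 2

# check grad F(x) = lam*(x - p(x)) against a central finite difference
for x in [-3.0, -0.2, 0.0, 0.1, 2.5]:
    h = 1e-6
    numeric_grad = (F(x + h) - F(x - h)) / (2 * h)
    analytic_grad = lam * (x - prox(x))
    assert abs(numeric_grad - analytic_grad) < 1e-4
print("grad F(x) = lam*(x - p(x)) verified")
```

Note F is quadratic near 0 and linear far away, so the kink of |y| at 0 has been smoothed while the minimizer (y = 0) is unchanged.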
![Page 71: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/71.jpg)
Elementary Bundle Method
As before, f is assumed to be Lipschitz continuous.
At a generic iteration we maintain a “bundle” ⟨y_i, f(y_i), s_i, α_i⟩.
![Page 73: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/73.jpg)
Elementary Bundle Method
Follow the Cutting Plane Method, but use M-Y Regularization for building the model:

y_{k+1} = argmin_{y ∈ R^n} f̂_k(y) + (µ_k/2) ‖y − x̂_k‖²
δ_k = f(x̂_k) − [f̂_k(y_{k+1}) + (µ_k/2) ‖y_{k+1} − x̂_k‖²] ≥ 0

If δ_k < δ, stop.
If f(x̂_k) − f(y_{k+1}) ≥ m δ_k: Serious Step, x̂_{k+1} = y_{k+1}
else: Null Step, x̂_{k+1} = x̂_k
f̂_{k+1}(y) = max{ f̂_k(y), f(y_{k+1}) + ⟨s_{k+1}, y − y_{k+1}⟩ }
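A minimal sketch of these steps, again on the illustrative example f(x) = |x|. In 1-D the proximal subproblem is a scalar minimization rather than a QP; µ_k, m, the tolerance, and the starting point are our own choices:

```python
from scipy.optimize import minimize_scalar

# Elementary bundle method sketch on f(x) = |x|.
f = lambda x: abs(x)
subgrad = lambda x: 1.0 if x >= 0 else -1.0

mu, m, tol = 1.0, 0.5, 1e-6
xhat = 2.0
bundle = [(xhat, f(xhat), subgrad(xhat))]  # entries (y_i, f(y_i), s_i)

def model(y):  # cutting-plane model: max_i f(y_i) + s_i * (y - y_i)
    return max(fi + si * (y - yi) for (yi, fi, si) in bundle)

for k in range(100):
    # Proximal subproblem: minimize model(y) + (mu/2)(y - xhat)^2.
    res = minimize_scalar(lambda y: model(y) + 0.5 * mu * (y - xhat) ** 2,
                          bounds=(-10.0, 10.0), method='bounded',
                          options={'xatol': 1e-10})
    y_new = res.x
    delta = f(xhat) - (model(y_new) + 0.5 * mu * (y_new - xhat) ** 2)
    if delta < tol:
        break
    if f(xhat) - f(y_new) >= m * delta:  # serious step: move the center
        xhat = y_new
    # else: null step, xhat unchanged; either way, enrich the bundle
    bundle.append((y_new, f(y_new), subgrad(y_new)))

print(xhat, f(xhat))
```

Note how null steps still add a cut, so the model improves even when the center x̂ does not move; that is what distinguishes this from a plain proximal-point iteration.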
![Page 79: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/79.jpg)
Convergence
Either the algorithm makes a finite number of Serious Steps and then only Null Steps: if k₀ is the last Serious Step and µ_k is nondecreasing, then δ_k → 0.
Or it makes an infinite number of Serious Steps: then

∑_{k ∈ K_s} δ_k ≤ (f(x̂_0) − f*) / m

so δ_k → 0.
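The bound over the serious steps K_s follows by telescoping the descent test; a one-line sketch:

```latex
% Each serious step k \in K_s passes the test
%   f(\hat{x}_k) - f(\hat{x}_{k+1}) = f(\hat{x}_k) - f(y_{k+1}) \ge m\,\delta_k ,
% and the centers do not move during null steps, so summing over K_s telescopes:
m \sum_{k \in K_s} \delta_k
  \;\le\; \sum_{k \in K_s} \bigl( f(\hat{x}_k) - f(\hat{x}_{k+1}) \bigr)
  \;\le\; f(\hat{x}_0) - f^* .
```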
![Page 85: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/85.jpg)
Variations
Replace ‖y − x‖² by (y − x)^T M_k (y − x): still differentiable.
Conjugate Gradient methods arise as a slight modification of the algorithm (see [5]).
Variable Metric Methods [10].
M_k = u_k I for Diagonal Variable Metric Methods.
Bundle-Newton Methods.
![Page 90: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/90.jpg)
Summary
Nonsmooth convex optimization has been explored since the 1960s. The original subgradient methods were introduced by Naum Shor; bundle methods were developed more recently.
Subgradient Methods are simple but slow, unless distributed, which is the predominant current application.
Bundle Methods solve a bounded QP at each iteration, which is slow, but they need fewer iterations; they are preferred for applications where the oracle cost is high.
![Page 95: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/95.jpg)
For Further Reading I
Naum Z. Shor
Minimization Methods for Non-differentiable Functions.
Springer-Verlag, 1985.

Boyd and Vandenberghe
Convex Optimization.
Cambridge University Press.

A. Ruszczyński
Nonlinear Optimization.
Princeton University Press.

Wikipedia
en.wikipedia.org/wiki/Subgradient_method
![Page 96: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/96.jpg)
For Further Reading II
Marko Mäkelä
Survey of Bundle Methods, 2009.
http://www.informaworld.com/smpp/content~db=all~content=a713741700

Alexandre Belloni
An Introduction to Bundle Methods.
http://web.mit.edu/belloni/www/LecturesIntroBundle.pdf

John E. Mitchell
Cutting Plane and Subgradient Methods, 2005.
http://www.optimization-online.org/DB_HTML/2009/05/2298.html
![Page 97: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/97.jpg)
For Further Reading III
Stephen Boyd
Lecture Notes on Subgradient Methods.
http://www.stanford.edu/class/ee392o/subgrad_method.pdf

Alexander J. Smola, S. V. N. Vishwanathan, Quoc V. Le
Bundle Methods for Machine Learning, 2007.
http://books.nips.cc/papers/files/nips20/NIPS2007_0470.pdf

C. Lemaréchal
Variable Metric Bundle Methods, 1997.
http://www.springerlink.com/index/3515WK428153171N.pdf

Quoc Le, Alexander Smola
Direct Optimization of Ranking Measures, 2007.
http://arxiv.org/abs/0704.3359
![Page 98: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/98.jpg)
For Further Reading IV
S. V. N. Vishwanathan, A. Smola
Quasi-Newton Methods for Efficient Large-Scale Machine Learning.
http://portal.acm.org/ft_gateway.cfm?id=1390309&type=pdf
and www.stat.purdue.edu/~vishy/talks/LBFGS.pdf