Subgradient and Bundle Methods

DESCRIPTION

A short first seminar on subgradient and bundle methods for nonsmooth optimization.

TRANSCRIPT
Subgradient and Bundle Methods
For optimization of convex nonsmooth functions

April 1, 2009
Motivation

Many naturally occurring problems are nonsmooth:
- Hinge loss
- The feasible region of a convex minimization problem
- Piecewise linear functions

Even when a function approximating a nonsmooth function is analytically smooth, it may be "numerically nonsmooth".
Methods for nonsmooth optimization

- Approximate by a series of smooth functions
- Reformulate the problem, adding constraints so that the objective is smooth
- Subgradient methods
- Cutting plane methods
- Moreau-Yosida regularization
- Bundle methods
- UV-decomposition
Definition: an extension of gradients

For a convex differentiable function f(x), for all x, y:

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)   (1)

By analogy, a subgradient of f at x is defined as any g ∈ Rⁿ such that for all y:

f(y) ≥ f(x) + gᵀ(y − x)   (2)

The set of all subgradients of f at x is denoted ∂f(x).
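The defining inequality (2) can be checked numerically on a toy example. The sketch below (my example, not from the slides) uses f(x) = |x|, for which sign(x) is a valid subgradient everywhere (and any g ∈ [−1, 1] is valid at x = 0):

```python
# Check the subgradient inequality f(y) >= f(x) + g*(y - x)
# for f(x) = |x|, using the subgradient g = sign(x).
import random

def f(x):
    return abs(x)

def subgradient(x):
    # sign(x) is a valid subgradient; at x = 0, any g in [-1, 1] works
    # and we pick g = 0.
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0

random.seed(0)
for _ in range(1000):
    x = random.uniform(-5, 5)
    y = random.uniform(-5, 5)
    g = subgradient(x)
    assert f(y) >= f(x) + g * (y - x) - 1e-12
print("subgradient inequality holds at all sampled points")
```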
Some facts from convex analysis

- A convex function is always subdifferentiable, i.e. a subgradient exists at every point (of the interior of its domain).
- Directional derivatives also exist at every point.
- If a convex function f is differentiable at x, its only subgradient there is the gradient: ∂f(x) = {∇f(x)}.
- Subgradients are lower bounds for directional derivatives: f′(x; d) = sup_{g∈∂f(x)} ⟨g, d⟩.
- Further, d is a descent direction iff gᵀd < 0 for all g ∈ ∂f(x).
Properties (without proof)

- ∂(f₁ + f₂)(x) = ∂f₁(x) + ∂f₂(x)
- ∂(αf)(x) = α ∂f(x) for α ≥ 0
- g(x) = f(Ax + b) ⇒ ∂g(x) = Aᵀ ∂f(Ax + b)
- Local minimum ⇒ 0 ∈ ∂f(x). However, for f(x) = |x| an oracle returns the subgradient 0 only at x = 0 itself, so waiting for the oracle to return 0 is not a good way to find minima.
Subgradient method: algorithm

The subgradient method is NOT a descent method!

x^(k+1) = x^(k) − α_k g^(k),  with α_k ≥ 0 and g^(k) ∈ ∂f(x^(k))

Since the iterates need not decrease f, keep the best value seen so far:

f_best^(k) = min{ f_best^(k−1), f(x^(k)) }

No line search is performed; the step lengths α_k are usually fixed ahead of time.
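The update and the running best value can be sketched on a toy nonsmooth problem (my example: f(x₁, x₂) = |x₁| + 2|x₂|, minimum 0 at the origin) with a diminishing step size α_k = 1/k:

```python
# A minimal sketch of the subgradient method on the nonsmooth function
# f(x) = |x1| + 2|x2| (minimum 0 at the origin), with diminishing
# step size alpha_k = 1/k and the running best value f_best.
def f(x):
    return abs(x[0]) + 2 * abs(x[1])

def subgradient(x):
    sgn = lambda t: 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)
    return [sgn(x[0]), 2 * sgn(x[1])]

def subgradient_method(x0, n_iters=2000):
    x = list(x0)
    f_best = f(x)
    for k in range(1, n_iters + 1):
        g = subgradient(x)
        alpha = 1.0 / k  # nonsummable diminishing step size
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        f_best = min(f_best, f(x))  # iterates need not descend, so keep the best
    return f_best

result = subgradient_method([3.0, -2.5])
print("f_best after 2000 iterations:", result)
```

Note that f(x^(k)) oscillates once the iterates are near the minimizer, which is exactly why f_best must be tracked separately.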
Step lengths

Commonly used step-length rules:

- Constant step size: α_k = α
- Constant step length: α_k = γ/‖g^(k)‖₂ (so that ‖x^(k+1) − x^(k)‖₂ = γ)
- Square summable but not summable step sizes: α_k ≥ 0, Σ_{k=1}^∞ α_k² < ∞, Σ_{k=1}^∞ α_k = ∞
- Nonsummable diminishing step sizes: α_k ≥ 0, lim_{k→∞} α_k = 0, Σ_{k=1}^∞ α_k = ∞
- Nonsummable diminishing step lengths: α_k = γ_k/‖g^(k)‖₂ with γ_k ≥ 0, lim_{k→∞} γ_k = 0, Σ_{k=1}^∞ γ_k = ∞
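The qualitative difference between the rules can be seen in a small experiment (a toy of my choosing, f(x) = |x| with subgradient sign(x)): a constant step size stalls at an accuracy proportional to the step, while a nonsummable diminishing rule keeps improving:

```python
# Compare a constant step size with a nonsummable diminishing one on
# f(x) = |x|. The subgradient is sign(x); f* = 0 at x = 0.
def run(step_size_fn, x0=1.3, n_iters=5000):
    x, f_best = x0, abs(x0)
    for k in range(1, n_iters + 1):
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
        x -= step_size_fn(k) * g
        f_best = min(f_best, abs(x))
    return f_best

best_const = run(lambda k: 0.3)          # constant step size: stalls near 0.1
best_dimin = run(lambda k: 1.0 / k**0.5) # diminishing 1/sqrt(k): keeps improving
print(best_const, best_dimin)
```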
Convergence result

Assume there exists G such that the subgradient norms are bounded, i.e. ‖g^(k)‖₂ ≤ G for all k (this holds, for example, when f is Lipschitz continuous with constant G).

Result:

f_best^(k) − f* ≤ ( dist(x^(1), X*)² + G² Σ_{i=1}^k α_i² ) / ( 2 Σ_{i=1}^k α_i )

The proof works by bounding how the distance ‖x^(k) − x*‖ to an optimal point evolves.
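The bound can be checked numerically at every iteration. In the sketch below (a toy instance: f(x) = |x|, so f* = 0, G = 1, and dist(x^(1), X*) = |x0|) the bound holds for all k, as the result guarantees:

```python
# Verify f_best^(k) - f* <= (R^2 + G^2*sum(a_i^2)) / (2*sum(a_i)) at each k
# for f(x) = |x|, f* = 0, G = 1, R = |x0| = dist(x0, X*), a_k = 1/sqrt(k).
import math

def check_bound(x0=4.0, n_iters=1000):
    G, f_star, R = 1.0, 0.0, abs(x0)
    x, f_best = x0, abs(x0)
    sum_a, sum_a2 = 0.0, 0.0
    for k in range(1, n_iters + 1):
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
        a = 1.0 / math.sqrt(k)
        x -= a * g
        f_best = min(f_best, abs(x))
        sum_a += a
        sum_a2 += a * a
        bound = (R**2 + G**2 * sum_a2) / (2 * sum_a)
        assert f_best - f_star <= bound + 1e-12  # the convergence result
    return f_best, bound

print(check_bound())
```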
Convergence for commonly used step lengths

- Constant step size α_k = h: f_best^(k) converges to within G²h/2 of optimal
- Constant step length γ = h: f_best^(k) converges to within Gh/2 of optimal
- Square summable but not summable step sizes: f_best^(k) → f*
- Nonsummable diminishing step sizes: f_best^(k) → f*
- Nonsummable diminishing step lengths: f_best^(k) → f*

With R ≥ dist(x^(1), X*):

f_best^(k) − f* ≤ ( R² + G² Σ_{i=1}^k α_i² ) / ( 2 Σ_{i=1}^k α_i )

Minimizing this bound for a fixed number of steps k gives the optimal choice α_i = R/(G√k), for which the bound becomes RG/√k; reaching accuracy ε therefore takes (RG/ε)² steps.
Variations

- If the optimal value f* is known (e.g. it is known to be 0, but the minimizer is not), use Polyak's step length:
  α_k = ( f(x^(k)) − f* ) / ‖g^(k)‖₂²
- Projected subgradient, for minimizing f(x) subject to x ∈ C:
  x^(k+1) = P( x^(k) − α_k g^(k) ), where P is projection onto C
- Alternating projections: find a point in the intersection of two convex sets
- Heavy-ball method:
  x^(k+1) = x^(k) − α_k g^(k) + β_k ( x^(k) − x^(k−1) )
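The projected subgradient variant can be sketched as follows, on a toy problem of my choosing: minimize f(x) = |x₁ − 3| + |x₂ + 2| over the unit ball C = {x : ‖x‖₂ ≤ 1}, where P is Euclidean projection onto C:

```python
# Projected subgradient method: each plain subgradient step is followed
# by Euclidean projection back onto the feasible set C (the unit ball).
import math

def f(x):
    return abs(x[0] - 3.0) + abs(x[1] + 2.0)

def subgradient(x):
    sgn = lambda t: 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)
    return [sgn(x[0] - 3.0), sgn(x[1] + 2.0)]

def project(x):  # Euclidean projection onto the unit ball
    norm = math.hypot(x[0], x[1])
    return x if norm <= 1.0 else [x[0] / norm, x[1] / norm]

x = [0.0, 0.0]
f_best, best_x = float("inf"), x
for k in range(1, 3000):
    g = subgradient(x)
    alpha = 1.0 / math.sqrt(k)  # nonsummable diminishing step size
    x = project([x[i] - alpha * g[i] for i in range(2)])
    if f(x) < f_best:
        f_best, best_x = f(x), x

# for this instance the optimum is x* = (1, -1)/sqrt(2), f* = 5 - sqrt(2)
print(best_x, f_best)
```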
Pros
- Can be applied immediately to a wide variety of problems, especially when the required accuracy is not very high
- Low memory usage
- Often allows distributed implementations when the objective is decomposable

Cons
- Slower than second-order methods
Cutting plane method

- Again consider the problem: minimize f(x) subject to x ∈ C
- Construct an approximate model of f from the points x_i visited so far and their subgradients g_i:
  f̂(x) = max_{i∈I} ( f(x_i) + g_iᵀ(x − x_i) )
- Minimize the model over C to obtain the next iterate; evaluate f(x) and a subgradient g there
- Update the model and repeat until the desired accuracy is reached
- Numerically unstable
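A minimal 1D sketch of the loop above (my toy instance: f(x) = |x − 1| over C = [−4, 6]): each iteration adds a cut, then minimizes the piecewise-linear model exactly by checking the interval endpoints and all pairwise intersections of the cuts:

```python
# 1D cutting plane method for f(x) = |x - 1| over C = [-4, 6].
def f(x):
    return abs(x - 1.0)

def g(x):  # a subgradient of f
    return 1.0 if x > 1.0 else (-1.0 if x < 1.0 else 0.0)

lo, hi = -4.0, 6.0
cuts = []   # (slope, intercept): cut(y) = slope*y + intercept
x = lo      # starting point

for _ in range(30):
    fx, gx = f(x), g(x)
    cuts.append((gx, fx - gx * x))  # the cut f(y) >= f(x) + g(x)(y - x)
    # the piecewise-linear model attains its minimum at an interval
    # endpoint or at an intersection of two cuts
    candidates = [lo, hi]
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            (a1, b1), (a2, b2) = cuts[i], cuts[j]
            if a1 != a2:
                xi = (b2 - b1) / (a1 - a2)
                if lo <= xi <= hi:
                    candidates.append(xi)
    model = lambda y: max(a * y + b for a, b in cuts)
    x = min(candidates, key=model)  # minimize the model over C

print(x, f(x))
```

The early iterates jump between distant points (here from −4 to 6 on the first step) before the model pins down the minimizer, which illustrates the instability noted above.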
Moreau-Yosida regularization

Idea: solve a series of smooth convex problems to minimize f(x).

F(x) = min_{y∈Rⁿ} { f(y) + (λ/2)‖y − x‖² }

p(x) = argmin_{y∈Rⁿ} { f(y) + (λ/2)‖y − x‖² }

F(x) is differentiable, with ∇F(x) = λ(x − p(x)), and it has the same minimizers and minimum value as f.

The inner minimization is done using the dual.

Cutting plane method + Moreau-Yosida regularization = bundle methods
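For f(y) = |y| the regularization is available in closed form, which makes the gradient formula easy to check. In the sketch below (my example; the slide's λ/2 convention), p(x) is soft-thresholding and F is a smooth Huber-like function:

```python
# Moreau-Yosida regularization of f(y) = |y|: p(x) is soft-thresholding,
# F(x) is the smooth envelope, and grad F(x) = lam*(x - p(x)).
lam = 2.0

def prox(x):
    # argmin_y |y| + (lam/2)(y - x)^2  ->  soft-threshold at 1/lam
    t = 1.0 / lam
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def F(x):  # Moreau envelope of |.|
    p = prox(x)
    return abs(p) + (lam / 2.0) * (p - x) ** 2

# check grad F(x) = lam*(x - p(x)) against a central finite difference
for x in [-3.0, -0.2, 0.0, 0.1, 2.5]:
    h = 1e-6
    numeric_grad = (F(x + h) - F(x - h)) / (2 * h)
    analytic_grad = lam * (x - prox(x))
    assert abs(numeric_grad - analytic_grad) < 1e-4
print("grad F(x) = lam*(x - p(x)) verified")
```

Note F is quadratic near 0 and linear far away, so the kink of |y| at 0 has been smoothed while the minimizer (y = 0) is unchanged.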
![Page 71: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/71.jpg)
Elementary Bundle Method
As before, f is assumed to be Lipschitz continuous.
At a generic iteration we maintain a “bundle” ⟨y_i, f(y_i), s_i, α_i⟩.
![Page 73: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/73.jpg)
Elementary Bundle Method
Follow the Cutting Plane Method, but use M-Y Regularization for building the model:

y_{k+1} = argmin_{y ∈ R^n} f̂_k(y) + (µ_k/2) ‖y − x̂_k‖²
δ_k = f(x̂_k) − [f̂_k(y_{k+1}) + (µ_k/2) ‖y_{k+1} − x̂_k‖²] ≥ 0

If δ_k < δ, stop.
If f(x̂_k) − f(y_{k+1}) ≥ m δ_k: Serious Step, x̂_{k+1} = y_{k+1}
else: Null Step, x̂_{k+1} = x̂_k
f̂_{k+1}(y) = max{ f̂_k(y), f(y_{k+1}) + ⟨s_{k+1}, y − y_{k+1}⟩ }
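A minimal sketch of these steps, again on the illustrative example f(x) = |x|. In 1-D the proximal subproblem is a scalar minimization rather than a QP; µ_k, m, the tolerance, and the starting point are our own choices:

```python
from scipy.optimize import minimize_scalar

# Elementary bundle method sketch on f(x) = |x|.
f = lambda x: abs(x)
subgrad = lambda x: 1.0 if x >= 0 else -1.0

mu, m, tol = 1.0, 0.5, 1e-6
xhat = 2.0
bundle = [(xhat, f(xhat), subgrad(xhat))]  # entries (y_i, f(y_i), s_i)

def model(y):  # cutting-plane model: max_i f(y_i) + s_i * (y - y_i)
    return max(fi + si * (y - yi) for (yi, fi, si) in bundle)

for k in range(100):
    # Proximal subproblem: minimize model(y) + (mu/2)(y - xhat)^2.
    res = minimize_scalar(lambda y: model(y) + 0.5 * mu * (y - xhat) ** 2,
                          bounds=(-10.0, 10.0), method='bounded',
                          options={'xatol': 1e-10})
    y_new = res.x
    delta = f(xhat) - (model(y_new) + 0.5 * mu * (y_new - xhat) ** 2)
    if delta < tol:
        break
    if f(xhat) - f(y_new) >= m * delta:  # serious step: move the center
        xhat = y_new
    # else: null step, xhat unchanged; either way, enrich the bundle
    bundle.append((y_new, f(y_new), subgrad(y_new)))

print(xhat, f(xhat))
```

Note how null steps still add a cut, so the model improves even when the center x̂ does not move; that is what distinguishes this from a plain proximal-point iteration.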
![Page 79: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/79.jpg)
Convergence
Either the algorithm makes a finite number of Serious Steps and then only Null Steps: if k₀ is the last Serious Step and µ_k is nondecreasing, then δ_k → 0.
Or it makes an infinite number of Serious Steps: then

∑_{k ∈ K_s} δ_k ≤ (f(x̂_0) − f*) / m

so δ_k → 0.
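The bound over the serious steps K_s follows by telescoping the descent test; a one-line sketch:

```latex
% Each serious step k \in K_s passes the test
%   f(\hat{x}_k) - f(\hat{x}_{k+1}) = f(\hat{x}_k) - f(y_{k+1}) \ge m\,\delta_k ,
% and the centers do not move during null steps, so summing over K_s telescopes:
m \sum_{k \in K_s} \delta_k
  \;\le\; \sum_{k \in K_s} \bigl( f(\hat{x}_k) - f(\hat{x}_{k+1}) \bigr)
  \;\le\; f(\hat{x}_0) - f^* .
```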
![Page 85: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/85.jpg)
Variations
Replace ‖y − x‖² by (y − x)^T M_k (y − x): still differentiable.
Conjugate Gradient methods arise as a slight modification of the algorithm (see [5]).
Variable Metric Methods [10].
M_k = u_k I for Diagonal Variable Metric Methods.
Bundle-Newton Methods.
![Page 90: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/90.jpg)
Summary
Nonsmooth convex optimization has been explored since the 1960s. The original subgradient methods were introduced by Naum Shor; bundle methods were developed more recently.
Subgradient Methods are simple but slow, unless distributed, which is the predominant current application.
Bundle Methods solve a bounded QP at each iteration, which is slow, but they need fewer iterations; they are preferred for applications where the oracle cost is high.
![Page 95: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/95.jpg)
For Further Reading I
Naum Z. Shor
Minimization Methods for Non-differentiable Functions.
Springer-Verlag, 1985.

Boyd and Vandenberghe
Convex Optimization.
Cambridge University Press.

A. Ruszczyński
Nonlinear Optimization.
Princeton University Press.

Wikipedia
en.wikipedia.org/wiki/Subgradient_method
![Page 96: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/96.jpg)
For Further Reading II
Marko Mäkelä
Survey of Bundle Methods, 2009.
http://www.informaworld.com/smpp/content~db=all~content=a713741700

Alexandre Belloni
An Introduction to Bundle Methods.
http://web.mit.edu/belloni/www/LecturesIntroBundle.pdf

John E. Mitchell
Cutting Plane and Subgradient Methods, 2005.
http://www.optimization-online.org/DB_HTML/2009/05/2298.html
![Page 97: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/97.jpg)
For Further Reading III
Stephen Boyd
Lecture Notes on Subgradient Methods.
http://www.stanford.edu/class/ee392o/subgrad_method.pdf

Alexander J. Smola, S. V. N. Vishwanathan, Quoc V. Le
Bundle Methods for Machine Learning, 2007.
http://books.nips.cc/papers/files/nips20/NIPS2007_0470.pdf

C. Lemaréchal
Variable Metric Bundle Methods, 1997.
http://www.springerlink.com/index/3515WK428153171N.pdf

Quoc Le, Alexander Smola
Direct Optimization of Ranking Measures, 2007.
http://arxiv.org/abs/0704.3359
![Page 98: Subgradient and Bundle methods](https://reader034.vdocument.in/reader034/viewer/2022042518/546b160db4af9fba128b4b3c/html5/thumbnails/98.jpg)
For Further Reading IV
S. V. N. Vishwanathan, A. Smola
Quasi-Newton Methods for Efficient Large-Scale Machine Learning.
http://portal.acm.org/ft_gateway.cfm?id=1390309&type=pdf
and www.stat.purdue.edu/~vishy/talks/LBFGS.pdf