A Secant Method for Nonsmooth Optimization
Asef Nazari
CSIRO Melbourne
CARMA Workshop on Optimization, Nonlinear Analysis, Randomness and Risk, Newcastle, Australia
12 July, 2014
Asef Nazari A Secant Method for Nonsmooth Optimization
Outline

1 Background
  The Problem
  Optimization Algorithms and Components
  Subgradient and Subdifferential

2 Secant Method
  Definitions
  Optimality Condition and Descent Direction
  The Secant Algorithm
  Numerical Results
The Problem

Unconstrained Optimization Problem

min_{x ∈ Rn} f(x)

f : Rn → R, locally Lipschitz.

Why just unconstrained?
Transforming Constrained into Unconstrained

Constrained Optimization Problem

min_{x ∈ Y} f(x), where Y ⊂ Rn.

Distance function

dist(x, Y) = min_{y ∈ Y} ‖y − x‖

Theory of penalty functions (under some conditions):

min_{x ∈ Rn} f(x) + σ dist(x, Y)

f(x) + σ dist(x, Y) is a nonsmooth function.
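The penalty reformulation can be sketched numerically. The objective f, the feasible set Y (a unit ball here, chosen because its distance function has a closed form), and σ are illustrative choices, not from the talk:

```python
import numpy as np

# Hypothetical illustration: minimize f over the unit ball Y = {x : ||x|| <= 1}
# via the nonsmooth penalty f(x) + sigma * dist(x, Y).

def f(x):
    return float(np.sum((x - 2.0) ** 2))  # smooth objective, unconstrained min at (2, 2)

def dist_to_ball(x):
    # distance to the unit ball has the closed form max(||x|| - 1, 0)
    return max(np.linalg.norm(x) - 1.0, 0.0)

def penalized(x, sigma=10.0):
    # nonsmooth: dist_to_ball has a kink on the boundary ||x|| = 1
    return f(x) + sigma * dist_to_ball(x)

x_inside = np.array([0.5, 0.0])   # feasible: penalty term vanishes
x_outside = np.array([3.0, 0.0])  # infeasible: penalty term is active
```

The penalty is exact on Y and grows linearly outside it, which is precisely why the combined objective is nonsmooth even when f is smooth.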
Sources of Nonsmooth Problems

Minimax Problem

min_{x ∈ Rn} f(x), where f(x) = max_{1 ≤ i ≤ m} fi(x)

System of Nonlinear Equations

fi(x) = 0, i = 1, …, m; we often solve

min_{x ∈ Rn} ‖f(x)‖, where f(x) = (f1(x), …, fm(x)).
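A minimal illustration of why a minimax objective is nonsmooth: with the hypothetical choices f1(x) = x and f2(x) = −x, the max is |x|, whose one-sided slopes disagree at 0.

```python
# f(x) = max(f1, f2) with f1(x) = x, f2(x) = -x is just |x|: smooth pieces,
# but a kink where the active function switches.

def f(x):
    return max(x, -x)

h = 1e-6
slope_right = (f(0.0 + h) - f(0.0)) / h  # one-sided slope from the right: +1
slope_left = (f(0.0) - f(0.0 - h)) / h   # one-sided slope from the left:  -1
```

Since the two one-sided slopes differ, no gradient exists at 0, and gradient-based methods need the subdifferential machinery developed next.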
Structure of Optimization Algorithms

Step 1. Initial point x0 ∈ Rn
Step 2. Termination criteria
Step 3. Find a descent direction dk at xk
Step 4. Find a step size αk with f(xk + αk dk) < f(xk)
Step 5. Loop: set xk+1 = xk + αk dk and go to Step 2.
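The five steps above can be sketched on a smooth toy problem; the quadratic f and the fixed step size are illustrative assumptions, not part of the talk:

```python
import numpy as np

# Minimal five-step descent loop on a hypothetical smooth quadratic.

def f(x):
    return float(np.sum(x ** 2))

def grad(x):
    return 2.0 * x

x = np.array([3.0, -4.0])           # Step 1: initial point x0
for _ in range(100):
    g = grad(x)
    if np.linalg.norm(g) <= 1e-8:   # Step 2: termination criterion
        break
    d = -g                          # Step 3: descent direction
    alpha = 0.25                    # Step 4: step size with f(x + alpha d) < f(x)
    x = x + alpha * d               # Step 5: update and loop
```

The secant method keeps this skeleton and replaces Steps 3 and 4 with rules that survive nonsmoothness.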
Classification of Algorithms Based on dk and αk

Directions
dk = −∇f(xk): Steepest Descent Method
dk = −H−1(xk)∇f(xk): Newton Method
dk = −B−1∇f(xk): Quasi-Newton Method

Step sizes: h(α) = f(xk + α dk)
1 Exactly solve h′(α) = 0: exact line search
2 Loosely solve it: inexact line search

Trust region methods
When the Objective Function Is Smooth

d is a descent direction if f′(xk, d) = 〈∇f(xk), d〉 < 0

−∇f(xk) is the steepest descent direction

‖∇f(xk)‖ ≤ ε is a good stopping criterion (first-order necessary condition)
Subgradient

Basic inequality for a convex differentiable function:

f(x) ≥ f(y) + ∇f(y)ᵀ(x − y)   ∀x ∈ Dom(f)
Subgradient

g ∈ Rn is a subgradient of a convex function f at y if

f(x) ≥ f(y) + gᵀ(x − y)   ∀x ∈ Dom(f)

Example: for f(x) = |x|,
if x < 0, g = −1
if x > 0, g = 1
if x = 0, g ∈ [−1, 1]
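The defining inequality for f(x) = |x| can be checked numerically; the grid and tolerance below are illustrative choices.

```python
# A subgradient of f(x) = |x| and a check of f(x) >= f(y) + g * (x - y) for all x.

def subgradient_abs(y):
    if y < 0:
        return -1.0
    if y > 0:
        return 1.0
    return 0.0  # at y = 0 any g in [-1, 1] works; 0 is one valid choice

def inequality_holds(y, xs):
    g = subgradient_abs(y)
    # small tolerance only to absorb floating-point rounding
    return all(abs(x) >= abs(y) + g * (x - y) - 1e-12 for x in xs)

grid = [i / 10.0 for i in range(-50, 51)]
```

Geometrically, each subgradient gives an affine function that supports the graph of |x| from below; at the kink there is a whole fan of such supports.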
Subdifferential

The set of all subgradients of f at x, denoted ∂f(x), is called the subdifferential.

For f(x) = |x|: ∂f(0) = [−1, 1]; for x > 0, ∂f(x) = {1}; for x < 0, ∂f(x) = {−1}.

f is differentiable at x ⟺ ∂f(x) = {∇f(x)}

For Lipschitz functions (Clarke):

∂f(x0) = conv{lim ∇f(xi) : xi → x0, ∇f(xi) exists}
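Clarke's limiting-gradient construction can be illustrated for f(x) = |x| at x0 = 0; the sampling sequences below are illustrative.

```python
# f(x) = |x| is differentiable everywhere except x = 0, with gradient sign(x).

def grad_abs(x):
    return 1.0 if x > 0 else -1.0  # valid for x != 0

# sample gradients along sequences x_i -> 0 from each side of the kink
from_right = [grad_abs(10.0 ** (-k)) for k in range(1, 8)]
from_left = [grad_abs(-(10.0 ** (-k))) for k in range(1, 8)]

# the limits of these gradient sequences are +1 and -1;
# conv{-1, +1} = [-1, 1] recovers the Clarke subdifferential at 0
lo, hi = min(from_left), max(from_right)
```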
Optimality Condition

For a smooth convex function:

f(x∗) = inf_{x ∈ Dom(f)} f(x) ⟺ 0 = ∇f(x∗)

For a nonsmooth convex function:

f(x∗) = inf_{x ∈ Dom(f)} f(x) ⟺ 0 ∈ ∂f(x∗)
Directional Derivative

In the smooth case:

f′(xk, d) = lim_{λ→0+} [f(xk + λd) − f(xk)] / λ = 〈∇f(xk), d〉

In the nonsmooth case:

f′(xk, d) = lim_{λ→0+} [f(xk + λd) − f(xk)] / λ = sup_{g ∈ ∂f(xk)} 〈g, d〉
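The sup formula can be checked numerically for f(x) = max(x1, x2) at a kink point, where ∂f = conv{(1,0), (0,1)}; the point and direction used are illustrative.

```python
# At x = (1, 1) the subdifferential of f(x) = max(x1, x2) is conv{(1,0), (0,1)},
# so the sup of <g, d> over it is attained at a vertex: f'(x, d) = max(d1, d2).

def f(x):
    return max(x)

def dir_deriv_formula(d):
    return max(d[0], d[1])  # sup over conv{(1,0), (0,1)} of <g, d>

def dir_deriv_numeric(x, d, lam=1e-7):
    # forward difference quotient approximating the limit lambda -> 0+
    return (f([x[0] + lam * d[0], x[1] + lam * d[1]]) - f(x)) / lam

x = [1.0, 1.0]
d = [0.3, -0.7]
```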
Descent Direction

In the smooth case, d is a descent direction if f′(xk, d) = 〈∇f(xk), d〉 < 0 (−∇f(xk) is the steepest descent direction).

In the nonsmooth case, d is a descent direction if f′(xk, d) < 0. It is proved that

d = −argmin_{g ∈ ∂f (xk)} ‖g‖
In Summary

Optimality condition:

f(x∗) = inf_{x ∈ Dom(f)} f(x) ⟺ 0 ∈ ∂f(x∗)

Directional derivative:

f′(xk, d) = lim_{λ→0+} [f(xk + λd) − f(xk)] / λ = sup_{g ∈ ∂f(xk)} 〈g, d〉

Steepest descent direction:

d = −argmin_{g ∈ ∂f (xk)} ‖g‖
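When the subdifferential is approximated by the convex hull of finitely many subgradients, the argmin above becomes a small quadratic problem. A Frank-Wolfe sketch (the set G and iteration budget are illustrative assumptions):

```python
import numpy as np

# Minimum-norm element of conv(rows of G) via Frank-Wolfe iterations.

def min_norm_point(G, iters=2000):
    w = G[0].astype(float).copy()        # start at a vertex, stay in conv(G)
    for _ in range(iters):
        # vertex minimizing the linearization <w, g> over the rows of G
        v = G[np.argmin(G @ w)]
        if w @ (w - v) <= 1e-12:         # duality gap: zero at the minimizer
            break
        # exact line search for min_t ||(1 - t) w + t v||^2 on [0, 1]
        diff = v - w
        t = np.clip(-(w @ diff) / (diff @ diff), 0.0, 1.0)
        w = w + t * diff
    return w

G = np.array([[1.0, 0.0], [0.0, 1.0]])   # e.g. subgradients at a kink of max(x1, x2)
w = min_norm_point(G)
```

For this G the minimizer is the midpoint (0.5, 0.5); its negative is the steepest descent direction predicted by the summary above.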
Methods for Nonsmooth Problems

The way we treat ∂f(x) leads to different types of algorithms in nonsmooth optimization:

Subgradient method (one subgradient at each iteration)
Bundle method (a bundle of subgradients at each iteration)
Gradient sampling method
Smoothing techniques
Definition of r-secant

Let S1 = {g ∈ Rn : ‖g‖ = 1} be the unit sphere in Rn. For g ∈ S1 define

gmax = max{|gi|, i = 1, …, n}.

Take g ∈ S1 with gj = gmax, and v ∈ ∂f(x + rg). The vector s = s(x, g, r) ∈ Rn with

s = (s1, …, sn), si = vi for i = 1, …, n, i ≠ j,

and

sj = [f(x + rg) − f(x) − r Σ_{i=1, i≠j}^n si gi] / (r gj)

is called an r-secant of the function f at the point x in the direction g.
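The construction can be sketched on a smooth test function, where v = ∇f(x + rg) is the only subgradient at x + rg; the function, point, and direction below are illustrative.

```python
import numpy as np

# r-secant of a smooth test function: copy the gradient in every coordinate
# except j, then choose s_j so the secant reproduces the function increment.

def f(x):
    return float(np.sum(x ** 2))

def grad(x):
    return 2.0 * x

def r_secant(x, g, r):
    j = int(np.argmax(np.abs(g)))   # coordinate achieving g_max
    v = grad(x + r * g)             # v in the subdifferential at x + rg
    s = v.copy()
    others = sum(s[i] * g[i] for i in range(len(g)) if i != j)
    s[j] = (f(x + r * g) - f(x) - r * others) / (r * g[j])
    return s

x = np.array([1.0, -2.0])
g = np.array([0.6, 0.8])            # unit vector
r = 0.1
s = r_secant(x, g, r)

# the mean value identity f(x + rg) - f(x) = r <s, g> holds by construction
lhs = f(x + r * g) - f(x)
rhs = r * float(s @ g)
```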
Facts about r-secants

If n = 1,

s = [f(x + rg) − f(x)] / (rg)

Mean Value Theorem for r-secants:

f(x + rg) − f(x) = r〈s(x, g, r), g〉

The set of all possible r-secants of the function f at the point x is

S_r f(x) = {s ∈ Rn : ∃g ∈ S1 : s = s(x, g, r)}

1 It is compact for r > 0.
2 The map (x, r) ↦ S_r f(x), r > 0, is closed and upper semicontinuous.
A New Set

S^c_0 f(x) = conv{v ∈ Rn : ∃(g ∈ S1, rk → +0, k → +∞) : v = lim_{k→+∞} s(x, g, rk)}

For a function f that is regular and semismooth at a point x ∈ Rn:

∂f(x) = S^c_0 f(x).
Optimality Condition

Let x ∈ Rn be a local minimizer of the function f, and let f be directionally differentiable at x. Then

0 ∈ S^c_0 f(x).

x ∈ Rn is an r-stationary point of a function f on Rn if 0 ∈ S^c_r f(x).

x ∈ Rn is an (r, δ)-stationary point of a function f on Rn if 0 ∈ S^c_r f(x) + Bδ, where

Bδ = {v ∈ Rn : ‖v‖ ≤ δ}.
Descent Direction

If x ∈ Rn is not an r-stationary point of a function f on Rn, then

0 ∉ S^c_r f(x),

and we can compute a descent direction using the set S^c_r f(x).

Let x ∈ Rn and, for given r > 0,

min{‖v‖ : v ∈ S^c_r f(x)} = ‖v0‖ > 0.

Then for g0 = −‖v0‖−1 v0,

f(x + r g0) − f(x) ≤ −r‖v0‖.

This leads to the subproblem

minimize ‖v‖²
subject to v ∈ S^c_r f(x).
An Algorithm for the Descent Direction (Alg 1)

Step 1. Compute an r-secant s1 = s(x, g1, r). Set W1(x) = {s1} and k = 1.

Step 2. Solve ‖wk‖² = min{‖w‖² : w ∈ co Wk(x)}. If ‖wk‖ ≤ δ, then stop. Otherwise go to Step 3.

Step 3. Set gk+1 = −‖wk‖−1 wk.

Step 4. If f(x + r gk+1) − f(x) ≤ −c r‖wk‖, then stop. Otherwise go to Step 5.

Step 5. Compute the r-secant sk+1 = s(x, gk+1, r) with respect to the direction gk+1, construct the set Wk+1(x) = co{Wk(x) ∪ {sk+1}}, set k = k + 1 and go to Step 2.
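In one dimension co Wk(x) is an interval and Step 2 has a closed form, so Alg 1 can be sketched in full for f(x) = |x|; the parameters r, δ, c and the iteration cap are illustrative assumptions.

```python
# Alg 1 in 1-D for f(x) = |x|: the r-secant is (f(x + rg) - f(x)) / (rg)
# with g in {-1, +1}, and the min-norm point of an interval is explicit.

def f(x):
    return abs(x)

def secant_1d(x, g, r):
    return (f(x + r * g) - f(x)) / (r * g)

def min_norm_interval(W):
    lo, hi = min(W), max(W)
    if lo <= 0.0 <= hi:
        return 0.0                          # 0 lies in co W
    return lo if lo > 0.0 else hi           # nearest endpoint otherwise

def alg1(x, r=0.1, delta=1e-3, c=0.2):
    W = [secant_1d(x, 1.0, r)]              # Step 1
    for _ in range(10):
        w = min_norm_interval(W)            # Step 2
        if abs(w) <= delta:
            return None                     # near-stationary: no direction
        g = -w / abs(w)                     # Step 3
        if f(x + r * g) - f(x) <= -c * r * abs(w):
            return g                        # Step 4: descent direction found
        W.append(secant_1d(x, g, r))        # Step 5: enrich W and repeat
    return None

d_far = alg1(5.0)   # away from the kink: finds the direction toward 0
d_min = alg1(0.0)   # at the minimizer: secants +1 and -1 bracket 0, so it stops
```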
The Secant Method: Finding an (r, δ)-Stationary Point (Alg 2)

Step 1. Choose x0 ∈ Rn and set k = 0.

Step 2. Apply Alg 1 at x = xk. It returns either ‖vk‖ ≤ δ or the search direction gk = −‖vk‖−1 vk.

Step 3. If ‖vk‖ ≤ δ, then stop. Otherwise go to Step 4.

Step 4. Set xk+1 = xk + σk gk, where σk is defined as follows:

σk = argmax{σ ≥ 0 : f(xk + σ gk) − f(xk) ≤ −c2 σ‖vk‖}.

Set k = k + 1 and go to Step 2.
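The argmax in Step 4 can be approximated by doubling σ while the sufficient-decrease condition still holds; this discretization and the test values below are assumptions for illustration, not the talk's exact rule.

```python
# Doubling search for the Step 4 step size: keep the largest tried sigma with
# f(x + sigma g) - f(x) <= -c2 * sigma * ||v_k||.

def step_size(f, x, g, v_norm, r=0.1, c2=0.2, max_doublings=40):
    sigma, best = r, 0.0
    for _ in range(max_doublings):
        if f(x + sigma * g) - f(x) <= -c2 * sigma * v_norm:
            best = sigma        # condition holds: accept and try a larger step
            sigma *= 2.0
        else:
            break               # decrease condition failed: stop doubling
    return best

# usage on f(x) = |x| from x = 5 with direction g = -1 and ||v_k|| = 1
sigma_k = step_size(abs, 5.0, -1.0, 1.0)
```

Note the accepted step may overshoot the minimizer (here it steps past 0) as long as the decrease condition holds; the outer loop corrects this on later iterations.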
The Secant Method (Alg 3)

Step 1. Choose x0 ∈ Rn and set k = 0.

Step 2. Apply Alg 2 to xk with r = rk and δ = δk. This algorithm terminates after a finite number of iterations p > 0 and finds an (rk, δk)-stationary point xk+1.

Step 3. Set k = k + 1 and go to Step 2.
Convergence Theorem

Theorem. Assume that the function f is locally Lipschitz and the level set L(x0) is bounded for the starting point x0 ∈ Rn. Then every accumulation point of the sequence {xk} belongs to the set X0 = {x ∈ Rn : 0 ∈ ∂f(x)}.
Results

[Numerical results were presented as figures in the original slides and are not preserved in this transcript.]