
Basics of Local Optimization

J. Peter, M. Bompard - ONERA The French Aerospace Lab.

ANADE Workshop – Cambridge, September 24th 2014

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Mathematics and Mechanics

• Slides adapted from those of an RTO lecture series

• RTO-AVT-167 group Strategies for Optimization and Automated Design of Gas Turbine Engines

2008-2010

• Lecture notes available on the S&T web site: Local Search Methods for Design in Aeronautics

M. Bompard, J. Peter (46 pages), with more details than in these slides

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Mathematics and Mechanics

• Classical optimization problems in Mechanics

. Minimize the drag of an aircraft under constraints (on lift, pitching moment...)

. Maximize the total pressure of a supersonic aircraft air intake

. Maximize the efficiency of a turbomachinery blade under constraints (on surge margin for a compressor / on aerothermal behaviour for a turbine)

• General mathematical counterpart

. All possible shapes = an infinite-dimensional search space

. Partial differential equations, unsolvable for complex shapes and B.C.

• Need for simplification - discretization and parametrization

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Mathematics in finite dimension and Mechanics

• Classical optimization problems in Mechanics

• Using numerical analysis and parametrization

. Mesh (some hundreds of thousands / millions of points) - baseline shape

. Numerical analysis (finite volumes for (RANS) equations...)

. Design parameters describing the possible deformed shapes

• Solvable, finite dimensional mathematics

• Optimization with a large design space and a complex direct problem is very expensive

. Sampling the whole design space and computing all points: unaffordable

. Efficient optimization algorithms required

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Local and global optimization - Mathematics (1/3)

• Definitions and notations

. α vector of parameters

. Dα design space = region of R^nf associated with the parametrization

. Mechanics - objective : function that has to be maximized/minimized

→ Maths - function J (α) to minimize

. Mechanics - constraints: functions that have to be kept lower/higher than some value → Maths - functions Gk(α) that should verify Gk(α) ≤ 0.

• With or without equality constraints ?

. Whenever it is possible:

→ Change the parametrization
→ Define a design space of smaller dimension in the subspace defined by the former equality constraint

. No equality constraints are considered in this course

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Local and global optimization - Mathematics (2/3)

• Global optimization

. Goal = find the best solution/shape α∗ over the whole design space

• Local optimization

. Goal = find a shape α∗, best shape over its neighborhood

. α∗ to be found in the vicinity of an initial value of parameter α0

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Local and global optimization - Mathematics (3/3)

• Global optimization

Seek for α∗ in Dα such that J (α∗) = Min J (α) on Dα

∀k ∈ [1, nc] Gk(α∗) ≤ 0.

• Local optimization: α∗ minimizes J over a neighborhood of α∗

Seek for α∗ in Dα such that J (α∗) = Min J (α) on Vα∗

(Vα∗ neighborhood of α∗)

∀k ∈ [1, nc] Gk(α∗) ≤ 0.

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Local and global optimization - Mechanics (1/2)

• For all shapes α

. Mesh of the configuration X(α)
. State variables (displacement, flowfield) W(α)
. Discrete equations of Mechanics (RANS...) link X and W

R(W (α), X(α)) = 0

Functions of interest depend on W and X

J (α) = J(W (α), X(α)) Gk(α) = gk(W (α), X(α))

→ See lecture notes for the assumption behind “W function of α”

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Local and global optimization - Mechanics (2/2)

• Global optimization

Seek for α∗ in Dα such that

J (α∗) = J(W (α∗), X(α∗)) = Min J (α) over Dα

∀k ∈ [1, nc] gk(W (α∗), X(α∗)) ≤ 0.

• Local optimization

Seek for α∗ in Dα such that

J (α∗) = J(W (α∗), X(α∗)) = Min J (α) over Vα∗

(Vα∗neighborhood of α∗)

∀k ∈ [1, nc] gk(W (α∗), X(α∗)) ≤ 0.

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Classification of optimization Algorithms and associated subjects

• Global optimization

. Genetic and evolutionary algorithms

. Simulated annealing, ant colony, particle swarm...

. Use of surrogate models for global optimization

→ broad other topic

• Local optimization

. Simplex method

. Descent methods

. Sensitivity computation for descent methods

→ Topics of this lesson

• How to combine local and global optimization algorithms

→ important other topic

Basics of Local Optimization ANADE Workshop 24/09/14

OPTIMIZATION

Outline of the lecture on local optimization

• The KKT (Karush-Kuhn-Tucker) conditions for local optimality

• Simplex method

• Descent methods

Basics of Local Optimization ANADE Workshop 24/09/14

KKT CONDITION FOR LOCAL OPTIMALITY

Outline of the lecture on local optimization

• The KKT (Karush-Kuhn-Tucker) conditions for local optimality

. KKT Conditions

. Solving KKT conditions in R2

• Simplex method

• Descent methods

Basics of Local Optimization ANADE Workshop 24/09/14

KKT CONDITION FOR LOCAL OPTIMALITY

Simple cases

• Infinite design space (Dα = R^nf), no constraints, J of C² regularity

• Conditions for local optimality

. Necessary condition for local optimality at α∗: ∇J(α∗) = 0

. Sufficient condition for local optimality at α∗: ∇J(α∗) = 0 and H(α∗) (Hessian of J) positive definite

• Conditions for global optimality

. Necessary condition for global optimality at α∗: ∇J(α∗) = 0

. Sufficient condition for global optimality at α∗: ∇J(α∗) = 0 and H(α) positive definite over R^nf


Basics of Local Optimization ANADE Workshop 24/09/14

KKT CONDITION FOR LOCAL OPTIMALITY

General case (1/2)

• Bounded design space

. Supposed to be a parallelepiped

Dα = [α1,l, α1,u] × [α2,l, α2,u] × [α3,l, α3,u] × ... × [αnf,l, αnf,u]

. Consider the bounds as 2nf additional constraints

Gnc+1(α) = α1,l − α1

Gnc+2(α) = α1 − α1,u

Gnc+3(α) = α2,l − α2

...

Gnc+2nf(α) = αnf − αnf,u

. Total number of constraints n′c = nc + 2nf

Basics of Local Optimization ANADE Workshop 24/09/14

KKT CONDITION FOR LOCAL OPTIMALITY

General case (2/2)

• KKT necessary condition for local optimality at α∗ (without the Lagrangian)

α∗ is an admissible point and there exist n′c real numbers λ∗j such that

∇J(α∗) + Σj λ∗j ∇Gj(α∗) = 0

∀j: λ∗j Gj(α∗) = 0 and λ∗j ≥ 0

• Define Lagrangian L(α, λ1, · · · , λn′c) = J (α) + ΣλjGj(α)

• Other form of KKT necessary condition for local optimality in α∗

α∗ is an admissible point and there exist n′c real numbers λ∗j such that

∇αL(α∗, λ∗1, · · · , λ∗n′c) = 0

∀j: λ∗j Gj(α∗) = 0 and λ∗j ≥ 0
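
As a quick illustration (not part of the original slides), the sketch below checks these KKT conditions numerically at a candidate pair (α∗, λ∗), using finite-difference gradients; function names and tolerances are illustrative choices of mine.

# Sketch: numerical check of the KKT conditions at a candidate (alpha*, lambda*).
import numpy as np

def fd_grad(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

def check_kkt(J, G_list, alpha, lam, tol=1e-5):
    """True if (alpha, lam) satisfies the KKT conditions up to tol."""
    alpha = np.asarray(alpha, float)
    grad_L = fd_grad(J, alpha) + sum(l * fd_grad(G, alpha) for l, G in zip(lam, G_list))
    stationary = np.linalg.norm(grad_L) <= tol        # grad of the Lagrangian vanishes
    feasible = all(G(alpha) <= tol for G in G_list)   # Gj(alpha) <= 0
    complementary = all(abs(l * G(alpha)) <= tol for l, G in zip(lam, G_list))
    nonnegative = all(l >= -tol for l in lam)         # lambda_j >= 0
    return stationary and feasible and complementary and nonnegative

# Example: the first problem of the next slides, alpha* = (2, 1), lambda* = (2,)
J  = lambda a: a[0]**2 + 3 * a[1]**2
g1 = lambda a: 7 - 2 * a[0] - 3 * a[1]
print(check_kkt(J, [g1], [2.0, 1.0], [2.0]))          # expected: True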

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

Considered simple problems

• Minimization in R² - useful illustrative plots.

• Minimization of α1² + 3α2² with different constraints

/ To improve readability, change of notation: (α, β) = (α1, α2)

• Minimization of α² + 3β² in R² subject to

/ g1(α, β) = 7 − 2α − 3β ≤ 0
/ g2(α, β) = 1 − α + β² ≤ 0 (lecture notes only)
/ g3(α, β) = 4 − α − 2β ≤ 0
/ g4(α, β) = 1 − α + β² ≤ 0

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

With one linear constraint

• Considered problem:

Min α² + 3β²

subject to g1(α, β) = 7 − 2α − 3β ≤ 0

• Corresponding Lagrangian Function:

L(α, β, λ1) = α² + 3β² + λ1(7 − 2α − 3β)

• KKT conditions:

∂L/∂α (α∗, β∗, λ∗1) = 2α∗ − 2λ∗1 = 0

∂L/∂β (α∗, β∗, λ∗1) = 6β∗ − 3λ∗1 = 0

g1(α∗, β∗) ≤ 0

λ∗1 g1(α∗, β∗) = 0

λ∗1 ≥ 0

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

First problem : solving KKT conditions

• First case: g1 is active ⇒ g1(α∗, β∗) = 0

2α∗ − 2λ∗1 = 0
6β∗ − 3λ∗1 = 0
7 − 2α∗ − 3β∗ = 0

⇒ (α∗, β∗, λ∗1) = (2, 1, 2)

• Second case: g1 is not active ⇒ λ∗1 = 0

2α∗ = 0
6β∗ = 0

⇒ (α∗, β∗) = (0, 0) ⇒ g1(α∗, β∗) = 7 > 0 (not admissible)

• Unique solution: (α∗, β∗) = (2, 1) with λ∗1 = 2
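
For readers who want to reproduce this case analysis, a small SymPy sketch is given below; the symbol names are mine and the snippet is only a convenience check of the algebra above.

# Sketch: reproduce the case analysis of the first problem with SymPy.
import sympy as sp

a, b, lam = sp.symbols('alpha beta lambda1', real=True)
L = a**2 + 3*b**2 + lam*(7 - 2*a - 3*b)       # Lagrangian of the first problem

# First case: g1 active (7 - 2 alpha - 3 beta = 0)
active = sp.solve([sp.diff(L, a), sp.diff(L, b), 7 - 2*a - 3*b], [a, b, lam], dict=True)
print(active)                                  # [{alpha: 2, beta: 1, lambda1: 2}]

# Second case: g1 inactive (lambda1 = 0) -> stationary point of J alone
inactive = sp.solve([2*a, 6*b], [a, b], dict=True)
print(inactive, ' g1(0,0) = 7 > 0, so this case is discarded')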

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

First problem - solution

Solution of first problem: (α∗, β∗) = (2, 1), λ∗1 = 2

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

Second problem

• Minimization in R² - useful illustrative plots.

• To improve readability, change of notation: (α, β) = (α1, α2)

• Considered problem:

Min α² + 3β²

subject to g3(α, β) = 4 − α − 2β ≤ 0

and g4(α, β) = 1 − α + β² ≤ 0

• Corresponding Lagrangian function:

L(α, β, λ3, λ4) = α² + 3β² + λ3(4 − α − 2β) + λ4(1 − α + β²)

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

Second problem - Writing KKT-conditions (2)

• KKT conditions:

∂L/∂α (α∗, β∗, λ∗3, λ∗4) = 2α∗ − λ∗3 − λ∗4 = 0

∂L/∂β (α∗, β∗, λ∗3, λ∗4) = 6β∗ − 2λ∗3 + 2λ∗4 β∗ = 0

g3(α∗, β∗) ≤ 0

g4(α∗, β∗) ≤ 0

λ∗3 g3(α∗, β∗) = 0

λ∗4 g4(α∗, β∗) = 0

λ∗3 ≥ 0

λ∗4 ≥ 0

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

Second problem - solving KKT conditions (1)

• First case: the two constraints are active ⇒ g3(α∗, β∗) = g4(α∗, β∗) = 0

2α∗ − λ∗3 − λ∗4 = 0
2β∗(3 + λ∗4) − 2λ∗3 = 0
4 − α∗ − 2β∗ = 0
1 − α∗ + (β∗)² = 0

⇒ (α∗, β∗, λ∗3, λ∗4) = (2, 1, 7/2, 1/2)

Solve the order-two equation in β∗; discard (α∗, β∗, λ∗3, λ∗4) = (10, −3, 69/2, −29/2) since λ∗4 < 0

• Second case: only constraint g3 is active ⇒ g3(α∗, β∗) = 0, λ∗4 = 0

2α∗ − λ∗3 = 0
6β∗ − 2λ∗3 = 0
4 − α∗ − 2β∗ = 0

⇒ (α∗, β∗, λ∗3) = (12/7, 8/7, 24/7) ⇒ g4(α∗, β∗) = 29/49 > 0 (not admissible)

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

Second problem - solving KKT conditions (2)

• Third case: only constraint g4 is active ⇒ g4(α∗, β∗) = 0, λ∗3 = 0

2α∗ − λ∗4 = 0
2β∗(3 + λ∗4) = 0
1 − α∗ + (β∗)² = 0

⇒ (α∗, β∗, λ∗4) = (1, 0, 2) ⇒ g3(α∗, β∗) = 3 > 0 (not admissible)

• Fourth case: the two constraints are inactive ⇒ λ∗3 = λ∗4 = 0

2α∗ = 0
6β∗ = 0

⇒ (α∗, β∗) = (0, 0) ⇒ g3(α∗, β∗) = 4 > 0 (not admissible)

• Unique solution: (α∗, β∗) = (2, 1) with λ∗3 = 7/2 and λ∗4 = 1/2
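
The same optimum can be recovered numerically; the sketch below uses SciPy's SLSQP solver (an SQP method, see the end of this lecture) on the second problem. The starting point is an arbitrary admissible guess.

# Sketch: confirm the analytic optimum of the second problem with SciPy's SLSQP.
# SciPy expects inequality constraints written as c(x) >= 0, hence the sign change.
import numpy as np
from scipy.optimize import minimize

J = lambda x: x[0]**2 + 3 * x[1]**2
cons = ({'type': 'ineq', 'fun': lambda x: -(4 - x[0] - 2 * x[1])},   # g3(x) <= 0
        {'type': 'ineq', 'fun': lambda x: -(1 - x[0] + x[1]**2)})    # g4(x) <= 0

res = minimize(J, x0=np.array([3.0, 1.0]), method='SLSQP', constraints=cons)
print(res.x)   # expected to be close to the analytic optimum (2, 1)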

Basics of Local Optimization ANADE Workshop 24/09/14

SOLVING KKT CONDITIONS FOR ELEMENTARY PROBLEMS

Example - Figure

Solution of example: (α∗, β∗) = (2, 1), λ∗3 = 7/2, λ∗4 = 1/2

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD

Outline of the lecture on local optimization

• The KKT (Karush-Kuhn-Tucker) conditions for local optimality

• Simplex method

/ Definition of “simplex” or “polytope”

/ Transformation of the simplex

/ Algorithm

• Descent methods

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD

Introduction

• Deterministic and simple nonlinear local optimization algorithm:

/ Introduced by Nelder and Mead (1965)

/ "Direct method": uses only values of the objective function

/ Efficient (low memory and low computational cost), robust (very tolerant to

noise) and easy to code

• Based on the concept of “simplex” or “polytope”

/ Polytope = set of nd + 1 vertices (a triangle in 2D, a tetrahedron in 3D)

• Principle

/ Choice of an initial simplex

/ Successive transformation of simplex to adapt it to the objective function

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD

Starting simplex - possible transformations

• Values of J computed for all points

• Points of simplex sorted so that J (α1) ≤ J (α2) ≤ ... ≤ J (αnd+1)

• Principle = substitute a better point for the worst point αnd+1

• Define α the center of gravity of (α1, α2, ..., αnd)

• Additional points evaluated during iteration

/ (always) αr reflection of the worst point w.r.t. α: αr = (1 + a)α − a αnd+1

/ (possibly) αe expansion towards αr (b > 1): αe = b αr + (1 − b)α

/ (possibly) αc contraction in [α, αr]: αc = c α + (1 − c)αr

/ (possibly) αc contraction in [α, αnd+1]: αc = c α + (1 − c)αnd+1

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD - MOVING THE WORST POINT (in R3)

Red: α4 (worst point), blue: α1 (best point), green: tested new point

[Three panels: original simplex; reflection; reflection and expansion]

Reflection: αr = (1 + a)α − a α4 (a: reflection coefficient)

Reflection and expansion: αe = b αr + (1 − b)α (b: expansion coefficient)

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD - MOVING THE WORST POINT (in R3)

Red: α4 (worst point), blue: α1 (best point), green: tested new point

[Three panels: original simplex; contraction; contraction in all directions]

Contraction: αc = c α + (1 − c) α(.) (c: contraction coefficient)

Contraction in all directions: αi = d αi + (1 − d) α1, i = 2, ..., nd + 1 (d: shrinking coefficient)

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD

An iteration of Simplex (1/2) - excellent/good cases

• Sorted simplex : J (α1) ≤ J (α2) ≤ ... ≤ J (αnd+1)

α the center of gravity of (α1, α2, ..., αnd)

• Define αr = (1 + a)α− aαnd+1 Compute J (αr)

• If J (αr) < J (α1) (very good point), define αe by expansion towards αr

If J(αe) < J(αr), αe is selected to replace αnd+1

Else αr is selected to replace αnd+1

• If J (α1) ≤ J(αr) < J(αnd), αr is accepted to replace αnd+1

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD

An iteration of Simplex (2/2) - bad cases

• Sorted simplex : J (α1) ≤ J (α2) ≤ ... ≤ J (αnd+1). α the center of gravity

• If J (αnd) ≤ J (αr) < J (αnd+1), define αc by contraction of αr

If J(αc) < J(αr) then αc is selected to replace αnd+1

Otherwise αr is selected to replace αnd+1

• If J (αnd+1) ≤ J (αr) (very bad point), define αc by contraction of αnd+1

If J (αc) < J (αnd+1) then αc is selected to replace αnd+1

Otherwise, make an internal contraction of the polytope: the points αi, i = 2, ..., nd + 1 are contracted

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD

Starting and stopping simplex - parameters

• Choice of initial simplex:

/ An initial point α1 is chosen (for example, randomly)

/ The simplex is built around this point: αi+1 = α1 + λi ei, i = 1, ..., nd

→ The vectors ei define a basis of R^nd

→ Scalar factors λi, (i = 1, ..., nd) are constants

• Stopping criterion: (1/nd) Σi=1..nd ||αi^(k+1) − αi^k||² < ε (displacement of the polytope small enough)

• Standard value of parameters: (a, b, c, d) = (1, 2, 1/2, 1/2)
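
A compact sketch of the iteration described above is given below, with the standard coefficients (a, b, c, d) = (1, 2, 1/2, 1/2). The initial-simplex size, the stopping test and the contraction logic are slightly simplified with respect to the slides.

# Sketch of a compact Nelder-Mead variant; names and stopping test are illustrative.
import numpy as np

def nelder_mead(J, x0, lam=0.1, a=1.0, b=2.0, c=0.5, d=0.5, eps=1e-8, itmax=500):
    nd = len(x0)
    S = [np.array(x0, float)] + [np.array(x0, float) + lam * e
                                 for e in np.eye(nd)]          # initial simplex
    for _ in range(itmax):
        S.sort(key=J)                                          # J(S[0]) <= ... <= J(S[nd])
        xbar = np.mean(S[:-1], axis=0)                         # centroid of the best nd points
        xr = (1 + a) * xbar - a * S[-1]                        # reflection
        if J(xr) < J(S[0]):                                    # very good point: try expansion
            xe = b * xr + (1 - b) * xbar
            S[-1] = xe if J(xe) < J(xr) else xr
        elif J(xr) < J(S[-2]):                                 # good point: accept reflection
            S[-1] = xr
        else:                                                  # bad point: contraction
            xref = xr if J(xr) < J(S[-1]) else S[-1]
            xc = c * xbar + (1 - c) * xref
            if J(xc) < J(S[-1]):
                S[-1] = xc
            else:                                              # shrink towards the best point
                S = [S[0]] + [d * x + (1 - d) * S[0] for x in S[1:]]
        if np.max([np.linalg.norm(x - S[0]) for x in S]) < eps:
            break
    return min(S, key=J)

rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
print(nelder_mead(rosen, [0.8, 0.8]))   # should approach the true minimum (1, 1)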

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD

Rosenbrock Banana Function

Figure 1: f(x, y) = (1 − x)² + 100(y − x²)²

Basics of Local Optimization ANADE Workshop 24/09/14

SIMPLEX METHOD

Example (with a slightly different α)

[Four panels on f(x, y) = (1 − x)² + 100(y − x²)², showing the simplex at iterations 1-2, 10-11, 20-21 and 30-31, closing in on the true minimum at (1, 1)]

Figure 2: Iterations on Rosenbrock banana function

Basics of Local Optimization ANADE Workshop 24/09/14

DESCENT METHODS

Sensitivity computation for descent methods

• The KKT (Karush-Kuhn-Tucker) conditions for local optimality

• Simplex method

• Descent methods

/ Line search (1D descent)

/ Unconstrained multi-D descent

/ Constrained multi-D descent

Basics of Local Optimization ANADE Workshop 24/09/14

LINE SEARCH

PRESENTATION

• One part of a classical optimization algorithm (at iteration k):

/ Compute a direction dk
/ Compute a step tk in direction dk (USE LINE SEARCH)
/ Update the current solution: αk+1 = αk + tk dk

• Objective of the line search:

/ Find the step t which minimizes q(t) = J(αk + t dk)
/ Use a reasonable number of function evaluations

Basics of Local Optimization ANADE Workshop 24/09/14

LINE SEARCH

List of methods

• Line search

/ Find step t which minimizes q(t) = J (αk + tdk)

• Academic line-search methods

/ Require large number of function/derivative evaluations

/ Methods of order 0 : golden number method...

/ Methods of order 1 : Wolfe line-search...

→ See lecture notes

• Other line-search methods

/ Robust and not too expensive

/ Polynomial approximations

→ Presented next slides

Basics of Local Optimization ANADE Workshop 24/09/14

LINE SEARCH

Academic methods - methods of order 0

• Naive method: search for the minimum by successive tries and refinement of the search range in the vicinity of a local minimum

• Algorithm

/ Begin with a search interval [a0, b0]
/ At each step k, choose two search points t−k and t+k in [ak−1, bk−1]
/ If q(t−k) ≤ q(t+k) ⇒ [ak, bk] = [ak−1, t+k]
/ If q(t+k) < q(t−k) ⇒ [ak, bk] = [t−k, bk−1]
/ Stop when |ak − bk| ≤ ε

• Need a strategy to choose search points

Basics of Local Optimization ANADE Workshop 24/09/14

LINE SEARCH

Methods of order 0 - three equal parts

• Three equal parts: t−k = tL + (1/3)(tR − tL) and t+k = tL + (2/3)(tR − tL)

⇒ linear convergence with rate √(2/3) ≃ 0.82

[Plot of exp(-x) + 0.3*exp(x) on [0, 1] with the successive intervals [a0,b0], [a1,b1], [a2,b2], [a3,b3] and the search points of iterations 1-3]

Figure 3: First iterations of the method

Basics of Local Optimization ANADE Workshop 24/09/14

LINE SEARCH

Methods of order 0 - Golden number

• Use of the golden number (λ = (1 + √5)/2): t−k = (λ tL + tR)/(1 + λ) and t+k = (tL + λ tR)/(1 + λ)

⇒ linear convergence with rate 1/λ ≃ 0.62

[Plot of exp(-x) + 0.3*exp(x) on [0, 1] with the successive intervals [a0,b0], [a1,b1], [a2,b2], [a3,b3] and the search points of iterations 1-3]

Figure 4: First iterations of the method
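
A minimal sketch of the golden-number search on the figure's test function q(t) = exp(-t) + 0.3 exp(t) is shown below; the interval and tolerance are illustrative, and the two points are recomputed at each step rather than reused as an optimized implementation would do.

# Sketch of the golden-number bracketing search described above.
import math

def golden_section(q, tL, tR, eps=1e-6):
    lam = (1 + math.sqrt(5)) / 2                  # golden number
    while abs(tR - tL) > eps:
        tm = (lam * tL + tR) / (1 + lam)          # t^- point
        tp = (tL + lam * tR) / (1 + lam)          # t^+ point
        if q(tm) <= q(tp):
            tR = tp                               # keep [tL, t^+]
        else:
            tL = tm                               # keep [t^-, tR]
    return 0.5 * (tL + tR)

q = lambda t: math.exp(-t) + 0.3 * math.exp(t)
print(golden_section(q, 0.0, 1.0))   # exact minimizer: 0.5*ln(1/0.3) ≈ 0.602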

Basics of Local Optimization ANADE Workshop 24/09/14

LINE SEARCH

Polynomial approximations

• One of the most effective techniques:

/ Requiring only a few function evaluations

/ Can lead to a very poor approximation for highly nonlinear functions

• Example: approximation with a polynomial of degree 2, q(t) = a0 + a1 t + a2 t²

/ Evaluation of q and q′ at t0 = 0 ⇒ (a0, a1) = (q(t0), q′(t0))
/ Evaluation of q at another point t1 ⇒ a2 = (q(t1) − a0 − a1 t1) / t1²
/ Identification of the minimum: t∗ = −a1 / (2 a2)

• Approximation with a polynomial of degree n requires n − 1 function evaluations (in

addition to the evaluation of function and derivative in t0 = 0)
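
The degree-2 case can be written in a few lines; in the sketch below the trial step t1 and the test function are illustrative choices.

# Sketch of the degree-2 polynomial line search described above.
import math

def quadratic_step(q, dq0, t1):
    """One quadratic fit: uses q(0), q'(0) and q(t1); returns the model minimizer."""
    a0 = q(0.0)
    a1 = dq0                       # q'(0), assumed available (e.g. directional derivative)
    a2 = (q(t1) - a0 - a1 * t1) / t1**2
    return -a1 / (2.0 * a2)        # t* = -a1 / (2 a2), valid when a2 > 0

q   = lambda t: math.exp(-t) + 0.3 * math.exp(t)
dq0 = -1.0 + 0.3                   # q'(0) for this test function
print(quadratic_step(q, dq0, 1.0)) # rough estimate of the minimizer (about 0.60)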

Basics of Local Optimization ANADE Workshop 24/09/14

LINE SEARCH

Polynomial approximations- Example

[Plot of exp(-x) + 0.3*exp(x) on [0, 1] with its degree-2 polynomial approximation, the computed solution and the exact solution]

Figure 5: Exact solution and computed solution by polynomial approximations

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Presentation

• Objective: search for an optimum of the objective function over the whole space R^nd

• General principle:

/ Start with an initial point α0

/ At each iteration k, current iterate is modified

→ Compute a descent direction dk

→ Compute a step in this direction tk (for basic method, tk = 1)

→ Update approximation by αk+1 = αk + tkdk

/ End when a criterion (most often on the gradient) is satisfied

• Presented methods:

/ Steepest descent method

/ Conjugate gradient method

/ Newton method

/ Quasi-Newton method

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Steepest descent method

• Most intuitive, basic method: dk = −∇J (αk) and tk = 1

/ Easy to implement

/ But short-sighted

/ Zig-zagging behaviour

• Optimal step version : dk = −∇J (αk) and optimal tk (line-search)

• Requires more information to be efficient
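
A minimal steepest-descent sketch on the Rosenbrock function is given below to make the slow, zig-zagging behaviour easy to reproduce; the step-halving rule is a crude stand-in of mine for a proper line search.

# Sketch of steepest descent with a crude backtracking step.
import numpy as np

def steepest_descent(J, gradJ, x0, t0=1.0, eps=1e-6, itmax=5000):
    x = np.array(x0, float)
    for _ in range(itmax):
        d = -gradJ(x)                          # d_k = -grad J(x_k)
        if np.linalg.norm(d) < eps:
            break
        t = t0
        while J(x + t * d) >= J(x) and t > 1e-12:
            t *= 0.5                           # halve the step until J decreases
        x = x + t * d
    return x

rosen  = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grosen = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                             200 * (x[1] - x[0]**2)])
print(steepest_descent(rosen, grosen, [0.0, 0.0]))   # crawls slowly towards (1, 1)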

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Steepest descent method: first example

Figure 6: Steepest descent on Rosenbrock function

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Steepest descent method: second example

Figure 7: Steepest descent on ellipsoid like function

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Conjugate gradient method (1/2)

• Minimization of (1/2) αT H α + bT α, with matrix H positive definite

/ one unique solution
/ also the solution of the linear system Hα + b = 0
/ well-known linear algebra algorithm (Hestenes & Stiefel, 1952)
/ the current descent direction combines the function gradient and the former direction
/ if steepest descent is used instead, very sensitive to the condition number of H

• Extensions

/ Linear algebra : different types of H (bi-CGSTAB, GMRES...)

/ Optimization : non quadratic functions

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Conjugate gradient method (2/2)

• Improve descent direction compared to steepest descent

/ Use a descent direction dk conjugate to dk−1 in the sense of H (as in the linear algebra algorithm: dk−1T H dk = 0)

dk = −∇J(αk) + βk dk−1 with βk = [(∇J(αk))T H dk−1] / [(dk−1)T H dk−1]

/ Use an optimal step (line search), unlike the linear algebra algorithm

/ Extend to the non-quadratic case:

• Fletcher-Reeves: βk = ‖∇J(αk)‖² / ‖∇J(αk−1)‖²

• Polak-Ribière: βk = [‖∇J(αk)‖² − (∇J(αk))T ∇J(αk−1)] / ‖∇J(αk−1)‖²

• More efficient methods require information about the second-order derivatives (Newton method)
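
A sketch of the non-quadratic conjugate gradient with the Polak-Ribière βk is given below; the backtracking step is a crude surrogate for the optimal step (a Wolfe line search would be used in practice), and the max(0, ·) restart is a common safeguard, not something from the slides.

# Sketch of nonlinear conjugate gradient (Polak-Ribiere) with a simple backtracking step.
import numpy as np

def conjugate_gradient(J, gradJ, x0, eps=1e-6, itmax=2000):
    x = np.array(x0, float)
    g = gradJ(x)
    d = -g
    for _ in range(itmax):
        if np.linalg.norm(g) < eps:
            break
        t = 1.0
        while J(x + t * d) >= J(x) and t > 1e-12:
            t *= 0.5                                        # crude optimal-step surrogate
        x_new = x + t * d
        g_new = gradJ(x_new)
        beta = max(0.0, (g_new @ (g_new - g)) / (g @ g))    # Polak-Ribiere (with restart)
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

rosen  = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grosen = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                             200 * (x[1] - x[0]**2)])
print(conjugate_gradient(rosen, grosen, [0.0, 0.0]))        # heads towards (1, 1)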

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Conjugate gradient method: one example

Figure 8: Conjugate gradient on Rosenbrock function

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Newton method (1)

• Originally a method to find roots of equation z(α) = 0

/ Approximate z by a linear expansion: z(αk + dk) ≃ z(αk) + ∇z(αk) dk

/ Solve the equation z(αk) + ∇z(αk) dk = 0 → increment dk = −[∇z(αk)]⁻¹ z(αk)

/ Update the current approximation: αk+1 = αk + dk

• The same method can be used as an optimization algorithm by searching for the roots of the gradient of the objective function

/ The descent direction is computed by: dk = −[∇²J(αk)]⁻¹ ∇J(αk)
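
A minimal Newton sketch on the Rosenbrock function (hand-coded gradient and Hessian) is shown below; it assumes a starting point close enough to the solution, in line with the remarks on the next slide.

# Sketch of the Newton iteration applied to the gradient of J.
import numpy as np

def newton(gradJ, hessJ, x0, eps=1e-10, itmax=50):
    x = np.array(x0, float)
    for _ in range(itmax):
        g = gradJ(x)
        if np.linalg.norm(g) < eps:
            break
        d = -np.linalg.solve(hessJ(x), g)     # d_k = -[Hess J(x_k)]^-1 grad J(x_k)
        x = x + d                             # full step t_k = 1
    return x

grosen = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                             200 * (x[1] - x[0]**2)])
hrosen = lambda x: np.array([[2 - 400 * (x[1] - 3 * x[0]**2), -400 * x[0]],
                             [-400 * x[0],                     200.0]])
print(newton(grosen, hrosen, [1.1, 1.2]))     # converges quickly to (1, 1) near the solution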

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Newton method (2)

• Reminder: αk+1 = αk − [∇²J(αk)]⁻¹ ∇J(αk)

• Main advantage: convergence in a neighborhood of the solution

/ Superlinear in general
/ Quadratic (the number of exact digits is doubled at each iteration) if J has C³ regularity

• Drawbacks

/ The Hessian is required: an explicit form is unavailable in general and its numerical computation is very expensive
/ In high-dimensional spaces, the solution of a linear system (at each iteration) is CPU demanding
/ Violent divergence far from the optimal point

• Quasi-Newton methods developed to circumvent these drawbacks

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Quasi-Newton method

• Improvement over the Newton method based on two main ideas

/ Adding a line-search process to the algorithm:

→ At each iteration, compute an optimal step by minimizing q(t) = J(αk + t dk) (cf. the line-search methods above)
→ The Hessian of the objective function needs to be positive definite

/ Approximate the inverse of the Hessian by a matrix H

• Algorithm:

/ Start from an initial point α0 with H initialized to a positive definite matrix

/ While ||∇J || > ε

→ Compute the descent direction: dk = −Hk ∇J(αk)
→ Line search (initialized with t = 1)

→ Update current iterate: αk+1 = αk + tdk

→ Compute the new approximation Hk+1

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Quasi-Newton method: update H

• Method presented here: BFGS (Broyden-Fletcher-Goldfarb-Shanno)

• The most used quasi-Newton algorithm

• Keep the matrix positive definite

• Update H by the following formula:

sk = αk+1 − αk

yk = ∇J(αk+1) − ∇J(αk)

Hk+1 = Hk − [sk (yk)T Hk + Hk yk (sk)T] / [(yk)T sk] + [1 + ((yk)T Hk yk) / ((yk)T sk)] · [sk (sk)T] / [(yk)T sk]
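
The update can be coded directly from this formula; the sketch below combines it with a crude backtracking step (a Wolfe line search would normally be used) and skips the update when (yk)T sk is too small, so that H stays positive definite.

# Sketch of BFGS with the inverse-Hessian update given above.
import numpy as np

def bfgs(J, gradJ, x0, eps=1e-8, itmax=200):
    x = np.array(x0, float)
    H = np.eye(x.size)                              # initial positive definite H
    g = gradJ(x)
    for _ in range(itmax):
        if np.linalg.norm(g) < eps:
            break
        d = -H @ g                                  # d_k = -H_k grad J(alpha_k)
        t = 1.0
        while J(x + t * d) >= J(x) and t > 1e-12:
            t *= 0.5                                # crude backtracking step
        x_new = x + t * d
        g_new = gradJ(x_new)
        s, y = x_new - x, g_new - g
        ys = y @ s
        if ys > 1e-12:                              # keep H positive definite
            Hy = H @ y
            H = (H - (np.outer(s, Hy) + np.outer(Hy, s)) / ys
                   + (1.0 + (y @ Hy) / ys) * np.outer(s, s) / ys)
        x, g = x_new, g_new
    return x

rosen  = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grosen = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                             200 * (x[1] - x[0]**2)])
print(bfgs(rosen, grosen, [0.0, 0.0]))              # expected to approach (1, 1)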

Basics of Local Optimization ANADE Workshop 24/09/14

UNCONSTRAINED MULTI-D DESCENT

Quasi-Newton method: one example

Figure 9: Quasi-Newton method (BFGS) on Rosenbrock function

Basics of Local Optimization ANADE Workshop 24/09/14

CONSTRAINED MULTI-D DESCENT

Reminder

• Solve the mathematical problem defined at the beginning of the talk

• Local optimization problem :

Seek for α∗ in Dα such that J (α∗) = Min J (α) on Vα∗

(Vα∗ neighborhood of α∗)

∀k ∈ [1, nc] Gk(α∗) ≤ 0.

• Restricted to differentiable functions

• Four methods

/ Sequential linear programming→ see lecture notes

/ Method of centers→ see lecture notes

/ Feasible direction method

/ Sequential quadratic programming

Basics of Local Optimization ANADE Workshop 24/09/14

CONSTRAINED MULTI-D DESCENT

Method of feasible direction (1/4)

• Two-step “non-linear” algorithm

/ Search for a descent direction (taking into account the non-linearity of the active constraints)
/ Line search along the descent direction

• Definition of descent direction

/ Goal: find the best possible descent direction, avoiding the following “trap”:

→ In R², current point α0, objective J, one CONVEX ACTIVE constraint G1.

The solution of “Find d (a priori bounded) minimizing ∇J(α0)·d subject to ∇G1(α0)·d < 0” leads to an inadmissible α1 (see plot of slide N+3)

Basics of Local Optimization ANADE Workshop 24/09/14

CONSTRAINED MULTI-D DESCENT

Method of feasible direction (2/4)

• Definition of descent direction d

• Avoid trap described on previous slide

/ A good requirement associated with an active convex constraint is ∇G1(α0)·d + θj < 0 with θj > 0

• Need to link the decrease of J and Gj along d

• “Good” feasible direction search

/ Maximize β, find dk (a priori bounded), such that

. ∇J (α0).dk + β ≤ 0.

. ∇Gj(α0)·dk + θj β ≤ 0 ∀j such that Gj(α0) = 0

Basics of Local Optimization ANADE Workshop 24/09/14

CONSTRAINED MULTI-D DESCENT

Method of feasible direction (3/4)

• Algorithm :

/ Set k = 0; an initial iterate α0 and a stopping tolerance ε are given.

/ WHILE KKT-conditions not satisfied

Maximize β, find dk (a priori bounded), such that

. ∇J (αk).dk + β ≤ 0.

. ∇Gj(αk)·dk + θj β ≤ 0 ∀j such that Gj(αk) = 0

Minimize q(t) = J(αk + t dk) (one-dimensional search)

Update αk+1 = αk + t dk

Set k = k + 1

/ END WHILE
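
Since the direction-finding sub-problem is a linear program, it can be sketched with an LP solver; below, θj and the a priori bound |di| ≤ 1 are illustrative choices of mine, and the test evaluates the sub-problem at the optimum of the first KKT example, where the best β is zero.

# Sketch of the direction-finding sub-problem posed as a linear program.
import numpy as np
from scipy.optimize import linprog

def feasible_direction(gradJ, grad_active_G, theta=1.0):
    """Maximize beta s.t. gradJ.d + beta <= 0, gradGj.d + theta*beta <= 0, |d_i| <= 1."""
    n = len(gradJ)
    # Unknowns z = (d_1, ..., d_n, beta); maximize beta <=> minimize -beta.
    c = np.zeros(n + 1); c[-1] = -1.0
    A = [np.append(gradJ, 1.0)]                       # gradJ . d + beta <= 0
    for gG in grad_active_G:
        A.append(np.append(gG, theta))                # gradGj . d + theta*beta <= 0
    bounds = [(-1.0, 1.0)] * n + [(None, None)]       # a priori bound on d, beta free
    res = linprog(c, A_ub=np.array(A), b_ub=np.zeros(len(A)), bounds=bounds)
    return res.x[:n], res.x[-1]

# Toy check at alpha0 = (2, 1) of the first KKT example (g1 is active there):
gJ  = np.array([4.0, 6.0])      # grad of alpha^2 + 3 beta^2 at (2, 1)
gG1 = np.array([-2.0, -3.0])    # grad of 7 - 2 alpha - 3 beta
d, beta = feasible_direction(gJ, [gG1])
print(d, beta)                  # best beta is 0: no feasible descent direction (KKT point)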

Basics of Local Optimization ANADE Workshop 24/09/14

CONSTRAINED MULTI-D DESCENT

Method of feasible direction (4/4)

Figure 10: Method of feasible direction in R2

Basics of Local Optimization ANADE Workshop 24/09/14

CONSTRAINED MULTI-D DESCENT

Sequential quadratic programming (SQP) (1/2)

• Solve a sequence of quadratic minimization with linear constraints

• Quadratic minimization of Q(d) = J(αk) + ∇J(αk)·d + (1/2) dT B d

/ B: approximate Hessian matrix

• Set of linear constraints: ∇Gj(αk)·d + δj Gj(αk) ≤ 0, j = 1, ..., nc

/ δj = 1 for a strictly respected constraint (Gj(αk) < 0): the value of the constraint may increase
/ δj ∈ [0, 1] for a violated constraint (Gj(αk) > 0): forces a decrease of (the first-order expansion of) Gj

Basics of Local Optimization ANADE Workshop 24/09/14

CONSTRAINED MULTI-D DESCENT

Sequential quadratic programming (SQP) (2/2)

• Algorithm :

/ Set k = 0; an initial iterate α0, a stopping tolerance ε and an initial approximate Hessian matrix B0 are given.

/ WHILE KKT-conditions not satisfied

Find dk which

. minimizes J(αk) + ∇J(αk)·d + (1/2) dT Bk d
. subject to ∇Gj(αk)·d + δj Gj(αk) ≤ 0, j = 1, ..., nc

Update αk+1 = αk + dk

Build Bk+1 (for example, from the BFGS formula)

Set k = k + 1

/ END WHILE
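
To make the structure of one SQP iteration concrete, the sketch below builds the quadratic model and the linearized constraints and solves the sub-problem with SciPy (a dedicated QP solver would normally be used); δj = 1 and B = I are illustrative simplifications of mine, not values from the slides.

# Sketch of the QP sub-problem of one SQP iteration, on the second KKT example.
import numpy as np
from scipy.optimize import minimize

def sqp_step(gradJ, B, G_vals, G_grads, delta=1.0):
    """Solve: min gradJ.d + 0.5 d^T B d  s.t.  gradGj.d + delta*Gj <= 0."""
    Q = lambda d: gradJ @ d + 0.5 * d @ B @ d
    cons = [{'type': 'ineq', 'fun': (lambda d, gg=gg, gv=gv: -(gg @ d + delta * gv))}
            for gg, gv in zip(G_grads, G_vals)]       # SciPy wants c(d) >= 0
    res = minimize(Q, np.zeros(len(gradJ)), method='SLSQP', constraints=cons)
    return res.x

# One step from alpha_k = (3, 2), where g3 is respected and g4 is violated:
xk = np.array([3.0, 2.0])
gJ = np.array([2 * xk[0], 6 * xk[1]])                           # grad J at alpha_k
G_vals  = [4 - xk[0] - 2 * xk[1], 1 - xk[0] + xk[1]**2]         # g3(alpha_k), g4(alpha_k)
G_grads = [np.array([-1.0, -2.0]), np.array([-1.0, 2 * xk[1]])] # grad g3, grad g4
d = sqp_step(gJ, np.eye(2), G_vals, G_grads)
print(xk + d)                                                   # the update alpha_{k+1} = alpha_k + d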