Static optimization: unconstrained problems
Graduate course on Optimal and Robust Control (spring12)
Zdenek Hurak
Department of Control Engineering
Faculty of Electrical Engineering
Czech Technical University in Prague
February 19, 2013
Lecture outline
Derivative-free optimization
  - Nelder-Mead simplex method
Derivative-based optimization
  - Line search methods
    - Methods for line search (step length)
    - Methods for descent direction search
  - Trust region methods
Numerical algorithms for unconstrained optimization
The key classification:
- methods based on derivatives
- derivative-free methods (Nelder-Mead)
Derivative-free methods: Nelder-Mead simplex method
Not to be confused with the simplex method in linear programming!
[Figure: evolution of the simplex through steps 1, 2, 3]
fminsearch() in Matlab
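A minimal usage sketch (the Rosenbrock test function and starting point below are illustrative choices, not from the slides):

    % Minimize the Rosenbrock function with the Nelder-Mead simplex method.
    f = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;  % classic test function
    x0 = [-1.2; 1];                                 % common starting point
    [xmin, fval] = fminsearch(f, x0);               % derivative-free minimization
    % xmin is close to [1; 1], the global minimizer.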
Derivative-based methods
- Line search methods
- Trust region methods
Line search methods
1. descent direction search ... $d_k$
2. line search (step length determination) ... $\alpha_k$
$$x_{k+1} = x_k + \alpha_k d_k$$
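The two steps combine into the generic loop sketched below (descent_direction and step_length are hypothetical placeholders for any of the methods that follow):

    % Generic line-search loop: pick a direction, then a step length.
    x = x0;
    for k = 1:maxit
        d = descent_direction(f, x);        % e.g. -gradient, Newton, quasi-Newton
        alpha = step_length(f, x, d);       % e.g. golden section, Armijo
        x = x + alpha*d;
        if norm(alpha*d) < tol, break; end  % a simple terminal condition
    end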
Methods for line search
1. Fibonacci, golden section
2. Bisection
3. Newton
4. Inexact line search
Fibonacci search
Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, ...
Fix the number of intervals at the beginning. Say, 13:
[Figure: interval subdivided at the Fibonacci points 1, 2, 3, 5, 8, 13 on the x axis, with f(x) above]
Start by evaluating f(x) at x = 5 and x = 8. Need 4 evaluations (13 is the n = 4th Fibonacci number in this counting). In general: n − 2 steps, with the uncertainty reduced to $(b - a)/F_n$; the improvement in the uncertainty per step is $F_{n-1}/F_n$.
$$\lim_{n\to\infty} F_{n-1}/F_n = \frac{1}{(1+\sqrt{5})/2} \approx 0.618$$
Golden section search
$$d_{k+1}/d_k \approx 0.618$$
[Figure: bracket [a, b] on the x axis with interior evaluation points x_1, x_2 of f(x)]
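A minimal implementation sketch (the bracket [a, b] and the tolerance tol are assumed inputs):

    % Golden section search for a minimum of f on [a, b].
    function xmin = golden_section(f, a, b, tol)
        r = (sqrt(5) - 1)/2;                    % 0.618..., so that r^2 = 1 - r
        x1 = b - r*(b - a);  x2 = a + r*(b - a);
        f1 = f(x1);          f2 = f(x2);
        while (b - a) > tol
            if f1 < f2                          % minimizer is in [a, x2]
                b = x2;  x2 = x1;  f2 = f1;
                x1 = b - r*(b - a);  f1 = f(x1);
            else                                % minimizer is in [x1, b]
                a = x1;  x1 = x2;  f1 = f2;
                x2 = a + r*(b - a);  f2 = f(x2);
            end
        end
        xmin = (a + b)/2;
    end

Only one new function evaluation is needed per iteration, because one interior point is reused; this reuse is exactly what the ratio 0.618 buys.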
Speed of convergence: order of convergence
Order p of convergence of the sequence $\{r_k\}$ to $r^*$:
$$0 \le \limsup_{k\to\infty} \frac{|r_{k+1} - r^*|}{|r_k - r^*|^p} < \infty$$
Examples:
- $r_k = a^k$, $0 < a < 1$ (order 1)
- $r_k = a^{(2^k)}$, $0 < a < 1$ (order 2)
Linear convergence
$$\lim_{k\to\infty} \frac{|r_{k+1} - r^*|}{|r_k - r^*|} = \beta < 1$$
Geometric series: $r_k = c\,\beta^k$.
Two linearly converging algorithms can be compared through their convergence ratios $\beta$.
For $\beta = 0$: superlinear convergence. For $\beta = 1$: sublinear convergence. Example: $r_k = 1/k$.
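A quick numerical check of the two example sequences from the previous slide (the choice a = 0.5 is arbitrary):

    % Empirical convergence ratios: a^k is linear, a^(2^k) is quadratic.
    a = 0.5;  k = (1:6)';
    lin  = a.^k;                              % linearly converging sequence
    quad = a.^(2.^k);                         % quadratically converging sequence
    disp(lin(2:end)./lin(1:end-1))            % tends to a (= beta, order p = 1)
    disp(quad(2:end)./quad(1:end-1).^2)       % stays at 1 (order p = 2)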
Bisection method
[Figure: bracket [a, b] of f(x) halved at the midpoint x_1]
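For minimization, the bisection is run on the sign of the derivative; a sketch (the derivative handle df is an assumed input, with df(a) < 0 < df(b)):

    % Bisection on f' to locate a stationary point inside [a, b].
    function xmin = bisection(df, a, b, tol)
        while (b - a) > tol
            m = (a + b)/2;
            if df(m) > 0
                b = m;          % minimizer lies to the left of m
            else
                a = m;          % minimizer lies to the right of m
            end
        end
        xmin = (a + b)/2;
    end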
Newton's method for line search
Approximate the function by a parabola (use $f(x_k)$, $f'(x_k)$ and $f''(x_k)$):
$$q(x) = f(x_k) + f'(x_k)(x - x_k) + \tfrac{1}{2} f''(x_k)(x - x_k)^2$$
Finding the minimum of the approximating function can be done analytically:
$$0 = q'(x) = f'(x_k) + f''(x_k)(x - x_k)$$
$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$$
Newtons method for line search
[Figure: f(x) on [0, 1] together with its quadratic approximation $q(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2$]
Another look at Newton's method: equation solving
Solving g(x) = 0:
[Figure: tangent to g(x) at x_k with slope g'(x_k), intersecting the x axis at x_{k+1}]
$$x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}$$
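As a sketch (g and its derivative dg are assumed function handles):

    % Newton's iteration for solving g(x) = 0.
    function x = newton_root(g, dg, x0, tol, maxit)
        x = x0;
        for k = 1:maxit
            step = g(x)/dg(x);
            x = x - step;
            if abs(step) < tol, break; end  % stop when the update is negligible
        end
    end

For line search, apply it with g = f' and dg = f'', which recovers the minimization formula on the previous slide.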
Quadratic convergence of Newton's method
Let's stay with the equation-solving formulation. Since $g(x^*) = 0$,
$$x_{k+1} - x^* = x_k - x^* - \frac{g(x_k) - g(x^*)}{g'(x_k)} = \frac{g(x^*) - g(x_k) - g'(x_k)(x^* - x_k)}{g'(x_k)} = \frac{1}{2}\,\frac{g''(\xi)}{g'(x_k)}\,(x_k - x^*)^2,$$
where the last step is Taylor's theorem with some $\xi$ between $x_k$ and $x^*$. With $k_1 \ge |g''|$ and $k_2 \le |g'|$ near $x^*$,
$$|x_{k+1} - x^*| \le \frac{k_1}{2 k_2}\,|x_k - x^*|^2.$$
Methods for descent direction search
1. steepest descent
2. Newton
3. quasi-Newton
4. conjugate direction / conjugate gradient method (CG)
Steepest descent
Condition for a descent direction:
$$d_k^{\mathsf T} \nabla f(x_k) < 0$$
Recall the geometric interpretation of an inner product:
$$|d_k^{\mathsf T} \nabla f(x_k)| = \|d_k\|\,\|\nabla f(x_k)\| \cos\theta$$
The steepest descent:
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$$
Steepest descent applied to quadratic cost
$$f(x) = \tfrac{1}{2}\, x^{\mathsf T} Q x - b^{\mathsf T} x$$
Find the $\alpha$ that minimizes $f(x_k - \alpha \nabla f_k)$:
$$f(x_k - \alpha \nabla f_k) = \tfrac{1}{2}(x_k - \alpha \nabla f_k)^{\mathsf T} Q (x_k - \alpha \nabla f_k) - b^{\mathsf T}(x_k - \alpha \nabla f_k)$$
Using that the gradient is $\nabla f(x) = Qx - b$, we get upon differentiation with respect to $\alpha$:
$$\alpha_k = \frac{\nabla f_k^{\mathsf T} \nabla f_k}{\nabla f_k^{\mathsf T} Q \nabla f_k}$$
Hence the steepest descent method is
$$x_{k+1} = x_k - \frac{\nabla f_k^{\mathsf T} \nabla f_k}{\nabla f_k^{\mathsf T} Q \nabla f_k}\, \nabla f_k$$
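A minimal sketch for the quadratic case (the particular Q, b and starting point are illustrative):

    % Steepest descent with the exact step for f(x) = 0.5*x'*Q*x - b'*x.
    Q = [20 5; 5 2];  b = [14; 6];  x = [40; -100];  % ill-conditioned example
    for k = 1:500
        g = Q*x - b;                    % gradient of the quadratic cost
        if norm(g) < 1e-8, break; end
        alpha = (g'*g)/(g'*Q*g);        % exact minimizer along -g
        x = x - alpha*g;
    end
    % x approaches Q\b; plotting the iterates reproduces the zigzag
    % pattern shown on the next slide.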
Zigzagging of the steepest descent method
[Figure: contour plot of a badly scaled quadratic cost (levels 2500 to 22500) over x_1, x_2 in [-100, 100], with the steepest descent iterates zigzagging toward the minimum]
The convergence rate can be poor, depending on the scaling.
Newton's search (also Newton-Raphson)
Idea: the function to be minimized is approximated locally by a quadratic function, and this approximating function is minimized exactly.
$$x_{k+1} = x_k - [\underbrace{\nabla^2 f(x_k)}_{\text{Hessian}}]^{-1}\, \underbrace{\nabla f(x_k)}_{\text{gradient}}$$
Local convergence is guaranteed, but not global!
Solving symmetric positive definite linear equations
Solve
$$Ax = b$$
by the Cholesky factorization
$$A = LL^{\mathsf T}.$$
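This is how the Newton step is computed in practice, without forming the inverse Hessian; a sketch (H and g stand for the Hessian and gradient at the current iterate):

    % Newton step: solve H*d = -g via Cholesky instead of inverting H.
    R = chol(H);           % H = R'*R with R upper triangular (errors if H not PD)
    d = -(R \ (R' \ g));   % forward solve with R', then backward solve with R
    x = x + d;             % undamped Newton update

The failure of chol() doubles as a cheap positive-definiteness test, which connects to the modifications below.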
Modifications of Newton's search: damping
A search parameter $\alpha_k$ is introduced:
$$x_{k+1} = x_k - \alpha_k [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$$
Modifications of Newton's search: positive definiteness
A positive definite matrix $M_k$ is used instead of the Hessian:
$$x_{k+1} = x_k - \alpha_k M_k^{-1} \nabla f(x_k)$$
Interpretation in the scalar nonlinear equation case:
[Figure: g(x) with the Newton tangent replaced by a line of modified slope]
Another approach:
$$B_k = \nabla^2 f(x_k) + E_k,$$
where $E_k = 0$ if $\nabla^2 f(x_k)$ is sufficiently positive definite; otherwise $E_k$ is chosen so that $B_k \succ 0$.
Quasi-Newton
From the definition of the Hessian,
$$\nabla^2 f_k\, \underbrace{(x_{k+1} - x_k)}_{s_k} \approx \underbrace{\nabla f_{k+1} - \nabla f_k}_{y_k}$$
Find a matrix $B_{k+1}$ that mimics the Hessian behaviour above (the secant condition):
$$B_{k+1} s_k = y_k$$
Typically two requirements:
- symmetry (as the Hessian)
- a low-rank update between the steps
Popular updates in quasi-Newton methods
Symmetric rank-one (SR1):
$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\mathsf T}}{(y_k - B_k s_k)^{\mathsf T} s_k}$$
BFGS:
$$B_{k+1} = B_k - \frac{B_k s_k s_k^{\mathsf T} B_k}{s_k^{\mathsf T} B_k s_k} + \frac{y_k y_k^{\mathsf T}}{y_k^{\mathsf T} s_k}$$
As the inverse of $B_k$ is what is actually needed, the update can be applied to the inverse directly:
- DFP (Davidon, Fletcher and Powell)
Other updates keep the Hessian in factored form:
$$H = R R^{\mathsf T} \quad \text{(Cholesky factorization)}$$
Matlab chol() function
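A sketch of the BFGS iteration (illustrative only: the line search is reduced to a unit step, whereas a practical implementation would pair BFGS with a Wolfe line search; grad is an assumed gradient handle):

    % Quasi-Newton iteration with the BFGS update of B.
    B = eye(numel(x));                   % initial Hessian approximation
    g = grad(x);
    for k = 1:maxit
        s = -(B \ g);                    % quasi-Newton step with alpha = 1
        x = x + s;
        gnew = grad(x);  y = gnew - g;
        if y'*s > 1e-12                  % curvature condition keeps B > 0
            B = B - (B*(s*s')*B)/(s'*B*s) + (y*y')/(y'*s);
        end
        g = gnew;
        if norm(g) < tol, break; end
    end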
Conjugate gradient directions
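The slide leaves the details to the lecture; a minimal sketch of the linear conjugate gradient method for the quadratic cost $f(x) = \tfrac{1}{2}x^{\mathsf T}Qx - b^{\mathsf T}x$ used earlier:

    % Conjugate gradient method for 0.5*x'*Q*x - b'*x, i.e. for Q*x = b.
    x = zeros(size(b));
    r = b - Q*x;                         % residual, equals -gradient
    d = r;                               % first direction: steepest descent
    for k = 1:numel(b)                   % at most n steps in exact arithmetic
        alpha = (r'*r)/(d'*Q*d);         % exact step length along d
        x = x + alpha*d;
        rnew = r - alpha*(Q*d);
        beta = (rnew'*rnew)/(r'*r);      % makes the new d Q-conjugate to the old
        d = rnew + beta*d;
        r = rnew;
    end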
Inexact line search
- Armijo
- Goldstein
- Wolfe
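The slide gives only the names; for orientation, the standard forms of these conditions (with $\phi(\alpha) = f(x_k + \alpha d_k)$ and constants $0 < c_1 < c_2 < 1$; the statements follow the common textbook convention, not the slides):
$$\text{Armijo (sufficient decrease):}\quad \phi(\alpha) \le \phi(0) + c_1 \alpha\, \phi'(0)$$
$$\text{Goldstein:}\quad \phi(0) + (1 - c_1)\,\alpha\, \phi'(0) \le \phi(\alpha) \le \phi(0) + c_1 \alpha\, \phi'(0)$$
$$\text{Wolfe (adds curvature):}\quad \phi'(\alpha) \ge c_2\, \phi'(0)$$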
Intuitive approach: step size reduction
Start with an initial step size s. If the corresponding vector $x_k + s d$ does not yield an improved (smaller) value of $f(\cdot)$, that is, if
$$f(x_k + s d) \ge f(x_k),$$
reduce the step size, possibly by a fixed factor. Repeat.
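As a sketch (the initial step and the reduction factor rho are illustrative choices):

    % Plain step-size reduction (no sufficient-decrease test yet).
    s = 1;  rho = 0.5;
    while f(x + s*d) >= f(x)
        s = rho*s;                      % shrink until some improvement appears
        if s < 1e-12, break; end        % guard against an endless loop
    end
    x = x + s*d;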
However, convergence to a minimum is not guaranteed:
$$f(x) = \begin{cases} \dfrac{3(1-x)^2}{4} - 2(1-x) & \text{if } x > 1,\\[4pt] \dfrac{3(1+x)^2}{4} - 2(1+x) & \text{if } x < -1,\\[4pt] x^2 - 1 & \text{if } -1 \le x \le 1. \end{cases}$$
[Figure: graph of f on [-2, 2]]
$$x_{k+1} = x_k - 1 \cdot \nabla f(x_k)$$
On $[-1, 1]$ we have $\nabla f(x) = 2x$, so this unit-step iteration maps $x_k$ to $-x_k$: the iterates merely oscillate and never approach the minimizer.
Armijo's condition
Goldstein's condition
Wolfe's condition
Terminal conditions
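Typical terminal conditions (standard practice; the slide leaves them to discussion) test the gradient and the progress: $\|\nabla f(x_k)\| \le \varepsilon$, $\|x_{k+1} - x_k\| \le \varepsilon$, or $|f(x_{k+1}) - f(x_k)| \le \varepsilon$.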
Trust region methods
Recall
$$f(x_k + p) \approx f(x_k) + \nabla f^{\mathsf T}(x_k)\, p + \tfrac{1}{2}\, p^{\mathsf T} \nabla^2 f(x_k)\, p$$
We seek the minimum of the quadratic model function
$$m_k(p) = f(x_k) + \nabla f^{\mathsf T}(x_k)\, p + \tfrac{1}{2}\, p^{\mathsf T} B_k\, p$$
subject to
$$\|p\| \le \Delta_k.$$
For $B_k = \nabla^2 f(x_k)$: the trust-region Newton method.
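One classical approximate solution of this constrained subproblem is the Cauchy point, the model minimizer along the steepest descent direction within the region; a sketch (g, B and Delta stand for the gradient, the model Hessian and the radius):

    % Cauchy point for the trust-region subproblem (Nocedal & Wright, Ch. 4).
    function p = cauchy_point(g, B, Delta)
        gBg = g'*B*g;
        tau = 1;                                   % step to the boundary
        if gBg > 0
            tau = min(1, norm(g)^3/(Delta*gBg));   % interior minimizer if closer
        end
        p = -tau*(Delta/norm(g))*g;                % scaled steepest-descent step
    end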
Trust region methods
[Figure: contours of f(x) and of the model m_k(x), the trust region, the line search direction, and the trust region step]
Software
1. Optimization Toolbox for Matlab: fminunc() (trust-region Newton), fminsearch() (Nelder-Mead simplex)
2. UnconstrainedProblems package for Mathematica: FindMinimumPlot, FindMinimum
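A usage sketch for fminunc with a user-supplied gradient (option names follow recent Optimization Toolbox releases and may differ in older ones):

    % Minimize Rosenbrock with fminunc, supplying the gradient.
    fun = @(x) deal(100*(x(2)-x(1)^2)^2 + (1-x(1))^2, ...      % function value
                   [-400*(x(2)-x(1)^2)*x(1) - 2*(1-x(1));      % gradient
                     200*(x(2)-x(1)^2)]);
    opts = optimoptions('fminunc', 'Algorithm', 'trust-region', ...
                        'SpecifyObjectiveGradient', true);
    [xmin, fval] = fminunc(fun, [-1.2; 1], opts);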
Summary
- line search methods: direction search, step length determination, Newton methods, quasi-Newton
- trust region methods