
Chapter 3 UNCONSTRAINED OPTIMIZATION

Prof. S.S. Jang
Department of Chemical Engineering
National Tsing-Hua University

Contents

Optimality Criteria

Direct Search Methods
• Nelder and Mead (Simplex Search)
• Hooke and Jeeves (Pattern Search)
• Powell's Method (Conjugate Direction Search)

Gradient-Based Methods
• Cauchy's Method (Steepest Descent)
• Newton's Method
• Conjugate Gradient Method (The Fletcher-Reeves Method)
• Quasi-Newton Method – Variable Metric Method

3-1 The Optimality Criteria

Given a function (objective function) f(x), where x ∈ R^N, consider the expansion about a point x*:

f(x^* + \Delta x) = f(x^*) + \nabla f(x^*)^T \Delta x + \tfrac{1}{2}\,\Delta x^T \nabla^2 f(x^*)\,\Delta x + O(\|\Delta x\|^3)

Stationary condition: \nabla f(x^*) = 0

Sufficient minimum criterion: \nabla^2 f(x^*) positive definite

Sufficient maximum criterion: \nabla^2 f(x^*) negative definite

Definiteness

A real quadratic form Q = x^T C x and its symmetric matrix C = [c_jk] are said to be positive definite (p.d.) if Q > 0 for all x ≠ 0. A necessary and sufficient condition for p.d. is that all of the leading principal minors

C_1 = c_{11},\quad C_2 = \det\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix},\quad C_3 = \det\begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix},\ \ldots,\ C_n = \det C

are positive. C is negative definite (n.d.) if −C is p.d.
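A quick numeric check of this condition in MATLAB; the matrix C below is only a made-up example:

C = [4 1 0; 1 3 1; 0 1 2];                              % made-up symmetric matrix
minors = arrayfun(@(k) det(C(1:k,1:k)), 1:size(C,1))    % leading principal minors
isPosDef = all(minors > 0)                              % p.d. iff all minors are positive
eig(C)                                                  % equivalently: all eigenvalues positive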

Example: f(x) = 2x1^2 + 4x1x2^3 − 10x1x2 + x2^2

x=linspace(-3,3);y=x;
for i=1:100
  for j=1:100
    z(i,j)=2*x(i)^2+4*x(i)*y(j)^3-10*x(i)*y(j)+y(j)^2;
  end
end
mesh(x,y,z)
contour(x,y,z,100)

Example- Continued (mesh and contour)

Example-Continued

At x* = (0, 0):

\nabla f(x^*) = \begin{bmatrix} 4x_1 + 4x_2^3 - 10x_2 \\ 12x_1x_2^2 - 10x_1 + 2x_2 \end{bmatrix}_{x=x^*} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

\nabla^2 f(x^*) = \begin{bmatrix} 4 & 12x_2^2 - 10 \\ 12x_2^2 - 10 & 24x_1x_2 + 2 \end{bmatrix}_{x=x^*} = \begin{bmatrix} 4 & -10 \\ -10 & 2 \end{bmatrix} = H

H is indefinite (det H = 8 − 100 < 0), so x* is a non-optimum (saddle) point.
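A one-line check of the indefiniteness in MATLAB, using the Hessian obtained above:

H = [4 -10; -10 2];   % Hessian at x* = (0,0)
eig(H)                % eigenvalues of opposite sign => H is indefinite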

Mesh and Contour – Himmelblau's function: f(x) = (x1^2 + x2 − 11)^2 + (x1 + x2^2 − 7)^2

The optimality criteria may only identify a local minimum in this case.

x=linspace(-5,5);y=x;
for i=1:100
  for j=1:100
    z(i,j)=(x(i)^2+y(j)-11)^2+(x(i)+y(j)^2-7)^2;
  end
end
mesh(x,y,z)
contour(x,y,z,80)
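Himmelblau's function has four local minima, so a local method converges to different points from different starting guesses. A minimal illustration with fminsearch; the starting points below are arbitrary choices:

f = @(x) (x(1)^2 + x(2) - 11)^2 + (x(1) + x(2)^2 - 7)^2;
starts = [3 2; -3 3; -3 -3; 3 -2];                  % made-up starting points
for k = 1:size(starts,1)
    xopt = fminsearch(f, starts(k,:));              % converges to a nearby local minimum
    fprintf('start (%5.1f,%5.1f) -> minimum (%7.4f,%7.4f)\n', starts(k,:), xopt);
end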

Mesh and Contour - Continued

(Figure: mesh and contour plots of Himmelblau's function over x1, x2 ∈ [−5, 5].)

Example: f(x) = x1^2 + 25x2^2

Rosenbrock's Banana Function: f(x, y) = 100(y − x^2)^2 + (1 − x)^2


Example: Curve Fitting

From theoretical considerations, it is believed that the dependent variable y is related to the variable x via the two-parameter function:

y = \frac{k_1 x}{1 + k_2 x}

The parameters k1 and k2 are to be determined by a least-squares fit of the following experimental data:

x     y
1.0   1.05
2.0   1.25
3.0   1.55
4.0   1.59

Problem Formulation and mesh, contour

\min_{k_1,k_2} f(k_1,k_2) = \left(1.05 - \frac{k_1 \cdot 1}{1 + k_2 \cdot 1}\right)^2 + \left(1.25 - \frac{2k_1}{1 + 2k_2}\right)^2 + \left(1.55 - \frac{3k_1}{1 + 3k_2}\right)^2 + \left(1.59 - \frac{4k_1}{1 + 4k_2}\right)^2
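This least-squares objective can be coded as a MATLAB function file; a minimal sketch of what the fittings objective used later with fminsearch might look like (the file name fittings.m is assumed from that call):

function f = fittings(k)
% Least-squares objective for the model y = k(1)*x/(1 + k(2)*x)
x = [1.0 2.0 3.0 4.0];
y = [1.05 1.25 1.55 1.59];
yhat = k(1)*x ./ (1 + k(2)*x);     % model predictions
f = sum((y - yhat).^2);            % sum of squared residuals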

The Importance of One-dimensional Problem - The Iterative Optimization Procedure

Consider an objective function min f(x) = x1^2 + x2^2, with an initial point x0 = (−4, −1) and a direction (1, 0). What is the optimum along this direction, i.e., x1 = x0 + α(1, 0)? This is a one-dimensional search for α.
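A minimal sketch of this one-dimensional search in MATLAB, using fminbnd for the scalar minimization (the search interval is an arbitrary choice):

f  = @(x) x(1)^2 + x(2)^2;                 % objective
x0 = [-4 -1];  d = [1 0];                  % base point and direction
phi = @(alpha) f(x0 + alpha*d);            % objective restricted to the line
alpha_opt = fminbnd(phi, 0, 10)            % alpha* = 4
x1 = x0 + alpha_opt*d                      % x1 = (0, -1)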

3-2 Steepest Descent Approach – Cauchy's Method

A direction d = [d1, d2, ..., dn]' is an n-dimensional vector. A line search is an approach in which, from a base point x0, we find α* such that x0 + α*d is an optimum along d.

Theorem: the direction d = −[∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]' = −∇f is the steepest descent direction.

Proof: Consider a two-dimensional system f(x1, x2). For a step of length ds, ds^2 = dx1^2 + dx2^2, and df = (∂f/∂x1)dx1 + (∂f/∂x2)dx2 = (∂f/∂x1)dx1 + (∂f/∂x2)(ds^2 − dx1^2)^{1/2}. To maximize the change df for a fixed ds, set d(df)/dx1 = 0. It can be shown that dx1/ds = (∂f/∂x1)/((∂f/∂x1)^2 + (∂f/∂x2)^2)^{1/2} and dx2/ds = (∂f/∂x2)/((∂f/∂x1)^2 + (∂f/∂x2)^2)^{1/2}, i.e., the direction of greatest change is along the gradient, and the steepest descent direction is its negative.

Example: f(x) = 2x1^3 + 4x1x2^3 − 10x1x2 + x2^3

Steepest descent search from the initial point (2, 2): the gradient is

\nabla f = \begin{bmatrix} 6x_1^2 + 4x_2^3 - 10x_2 \\ 12x_1x_2^2 - 10x_1 + 3x_2^2 \end{bmatrix} = \begin{bmatrix} 36 \\ 88 \end{bmatrix} \text{ at } (2, 2),

so the steepest descent direction is d = −∇f = −[36, 88]'.
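A minimal MATLAB check of this direction, together with one bounded line search along it (the fminbnd interval is an arbitrary choice):

f     = @(x) 2*x(1)^3 + 4*x(1)*x(2)^3 - 10*x(1)*x(2) + x(2)^3;
gradf = @(x) [6*x(1)^2 + 4*x(2)^3 - 10*x(2); 12*x(1)*x(2)^2 - 10*x(1) + 3*x(2)^2];
x0 = [2; 2];
d  = -gradf(x0)                               % d = -[36; 88]
alpha = fminbnd(@(a) f(x0 + a*d), 0, 0.1);    % one-dimensional search along d
x1 = x0 + alpha*d                             % first steepest-descent iterate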

Quadratic Function Properties

Property #1: Optimal line search for a quadratic function. Let

f(x) = f(x^k) + \nabla f^{kT}(x - x^k) + \tfrac{1}{2}(x - x^k)^T H (x - x^k), \qquad x = x^k + \alpha s^k.

Then

\frac{df}{d\alpha} = \nabla f^{kT} s^k + \alpha\, s^{kT} H s^k = 0 \quad\Rightarrow\quad \alpha^* = -\frac{\nabla f^{kT} s^k}{s^{kT} H s^k}.
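A quick numeric check of this step-length formula on a quadratic (H, b, the base point, and the direction below are made-up values):

H = [4 1; 1 3];  b = [-1; 2];                    % made-up quadratic f = b'x + 0.5 x'Hx
f = @(x) b'*x + 0.5*x'*H*x;
x = [2; -1];  s = [1; 1];                        % made-up base point and direction
g = b + H*x;                                     % gradient at x
alpha_star = -(g'*s)/(s'*H*s)                    % closed-form optimal step (-7/9)
alpha_num  = fminbnd(@(a) f(x + a*s), -10, 10)   % numerical line search agrees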

Quadratic Function Properties- Continued

Property #2: ∇f^{k+1} is orthogonal to s^k for a quadratic function.

Let f(x) = a + b^T x + \tfrac{1}{2} x^T H x, so \nabla f = b + H x and \nabla f^{k+1} = b + H x^{k+1}. Since x^{k+1} = x^k + \alpha^* s^k with \alpha^* chosen by an exact line search,

0 = \frac{d}{d\alpha} f(x^k + \alpha s^k)\Big|_{\alpha^*} = s^{kT}\,\nabla f(x^{k+1}) = s^{kT}\!\left(b + H x^{k+1}\right),

that is, s^{kT}\nabla f^{k+1} = 0: the new gradient is orthogonal to the previous search direction.

3-2 Newton’s Method – Best Convergence approaching the optimum

Let

f(x) \cong f(x^k) + \nabla f^{kT}(x - x^k) + \tfrac{1}{2}(x - x^k)^T H^k (x - x^k), \qquad \Delta x = x - x^k.

We want \nabla f(x^*) = 0:

\nabla f \cong \nabla f^k + H^k (x - x^k) = 0 \quad\Rightarrow\quad x^{k+1} = x^k - (H^k)^{-1}\nabla f^k.

Define the Newton direction d^k = -(H^k)^{-1}\nabla f^k, so that x^{k+1} = x^k + d^k.

3-2 Newton’s Method – Best Convergence approaching the optimum

At x^0 = (2, 2):

\nabla f = \begin{bmatrix} 6x_1^2 + 4x_2^3 - 10x_2 \\ 12x_1x_2^2 - 10x_1 + 3x_2^2 \end{bmatrix} = \begin{bmatrix} 36 \\ 88 \end{bmatrix}, \qquad H = \begin{bmatrix} 12x_1 & 12x_2^2 - 10 \\ 12x_2^2 - 10 & 24x_1x_2 + 6x_2 \end{bmatrix} = \begin{bmatrix} 24 & 38 \\ 38 & 108 \end{bmatrix}

d = -H^{-1}\nabla f = \begin{bmatrix} -0.4739 \\ -0.6481 \end{bmatrix}
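The same Newton direction computed in MATLAB:

x = [2; 2];
g = [6*x(1)^2 + 4*x(2)^3 - 10*x(2); 12*x(1)*x(2)^2 - 10*x(1) + 3*x(2)^2];   % gradient = [36; 88]
H = [12*x(1), 12*x(2)^2-10; 12*x(2)^2-10, 24*x(1)*x(2)+6*x(2)];             % Hessian = [24 38; 38 108]
d = -H\g                                                                    % Newton direction ~ [-0.4739; -0.6481]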

Comparison of Newton’s Method and Cauchy’s Method

Remarks

Newton's method is much more efficient than the steepest descent approach, especially when the starting point is close to the optimum.

Newton's method requires the Hessian to be positive definite; otherwise the search may diverge.

Evaluating the Hessian requires second derivatives of the objective function, so the number of function evaluations increases drastically if numerical derivatives are used.

Steepest descent is very inefficient around the optimum, but very efficient at the very beginning of the search.

Conclusion

Cauchy's method is seldom used because of its inefficiency around the optimum. Newton's method is also seldom used because of its requirement for second derivatives. A useful approach should lie somewhere in between.

3-3 Method of Powell’s Conjugate Direction

Definition: Let u and v denote two vectors in E^n. They are said to be orthogonal if their scalar product is zero, i.e. u^T v = 0. Similarly, u and v are said to be mutually conjugate with respect to a matrix A if u and Av are orthogonal, i.e. u^T A v = 0.

Theorem: If directions conjugate with respect to the Hessian are used, any quadratic function of n variables that has a minimum can be minimized in n steps.

Powell's Conjugate Direction Method - Continued

Corollary (generation of conjugate vectors - the extended parallel subspace property): Given a direction d and two initial points x^(1) and x^(2), for the quadratic

f(x) = a + b^T x + \tfrac{1}{2}\, x^T H x

minimize along d from each starting point:

y^{(1)} = x^{(1)} + \lambda_1 d \quad\text{with}\quad d^T \nabla f(y^{(1)}) = d^T\!\left(b + H y^{(1)}\right) = 0 \qquad (1)

y^{(2)} = x^{(2)} + \lambda_2 d \quad\text{with}\quad d^T\!\left(b + H y^{(2)}\right) = 0 \qquad (2)

Subtracting (1) from (2) yields d^T H\,(y^{(2)} - y^{(1)}) = 0, i.e., (y^{(2)} - y^{(1)}) is conjugate to d with respect to H.
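A minimal numeric illustration of this property (H, b, d, and the two starting points below are made-up values; the line minimization uses the closed-form step from Property #1):

H = [4 1; 1 3];  b = [-1; 2];                       % made-up quadratic f = a + b'x + 0.5 x'Hx
d  = [1; 2];  x1 = [0; 0];  x2 = [3; -1];           % direction and two starting points
lineMin = @(x) x - ((d'*(b + H*x))/(d'*H*d))*d;     % exact minimizer along d from x
y1 = lineMin(x1);  y2 = lineMin(x2);
conj_check = d' * H * (y2 - y1)                     % ~ 0: (y2 - y1) is conjugate to d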

Powell’s Algorithm

Step 1: Define x0 and set N linearly independent directions, say s^(i) = e^(i).
Step 2: Perform N+1 one-dimensional searches along s^(N), s^(1), ..., s^(N).
Step 3: Form the conjugate direction d using the extended parallel subspace property.
Step 4: Set s^(1) ← s^(2), s^(2) ← s^(3), ..., s^(N−1) ← s^(N), s^(N) ← d, and go to Step 2.

Powell's Algorithm (MATLAB)

function opt=powell_n_dim(x0,n,eps,tol,opt_func)
% Powell's conjugate direction search; one_dim_pw performs the one-dimensional search
xx=x0;
obj_old=feval(opt_func,x0);
s=zeros(n,n);
obj=obj_old; df=100; absdx=100;
for i=1:n; s(i,i)=1; end                % start with the coordinate directions
while df>eps | absdx>tol                % stop when both the objective change and the step are small
    x_old=xx;
    for i=1:n+1                         % N+1 one-dimensional searches
        if(i==n+1) j=1; else j=i; end
        ss=s(:,j);
        alopt=one_dim_pw(xx,ss',opt_func);
        xx=xx+alopt*ss';
        if(i==1)   y1=xx; end
        if(i==n+1) y2=xx; end
    end
    d=y2-y1;                            % conjugate direction (extended parallel subspace property)
    nn=norm(d,'fro');
    for i=1:n; d(i)=d(i)/nn; end
    dd=d';
    alopt=one_dim_pw(xx,dd',opt_func);
    xx=xx+alopt*d
    dx=xx-x_old;
    plot(xx(1),xx(2),'ro')
    absdx=norm(dx,'fro');
    obj=feval(opt_func,xx);
    df=abs(obj-obj_old);
    obj_old=obj;
    for i=1:n-1; s(:,i)=s(:,i+1); end   % shift the direction set and append d
    s(:,n)=dd;
end
opt=xx

Example: f(x) = x1^2 + 25x2^2

Example 2: Rosenbrock’s function

Example: fittings

x0=[3 3];opt=fminsearch('fittings',x0)

opt =

2.0884 1.0623

Fittings

Function Fminsearch >> help fminsearch

FMINSEARCH Multidimensional unconstrained nonlinear minimization (Nelder-Mead).

X = FMINSEARCH(FUN,X0) starts at X0 and finds a local minimizer X of the function FUN. FUN accepts input X and returns a scalar function value F evaluated at X. X0 can be a scalar, vector or matrix.

X = FMINSEARCH(FUN,X0,OPTIONS) minimizes with the default optimization parameters replaced by values in the structure OPTIONS, created with the OPTIMSET function. See OPTIMSET for details. FMINSEARCH uses these options: Display, TolX, TolFun, MaxFunEvals, and MaxIter.

X = FMINSEARCH(FUN,X0,OPTIONS,P1,P2,...) provides for additional arguments which are passed to the objective function, F=feval(FUN,X,P1,P2,...). Pass an empty matrix for OPTIONS to use the default values. (Use OPTIONS = [] as a place holder if no options are set.)

[X,FVAL] = FMINSEARCH(...) returns the value of the objective function, described in FUN, at X.

[X,FVAL,EXITFLAG] = FMINSEARCH(...) returns a string EXITFLAG that describes the exit condition of FMINSEARCH. If EXITFLAG is: 1 then FMINSEARCH converged with a solution X. 0 then the maximum number of iterations was reached.

[X,FVAL,EXITFLAG,OUTPUT] = FMINSEARCH(...) returns a structure OUTPUT with the number of iterations taken in OUTPUT.iterations.

Examples: FUN can be specified using @: X = fminsearch(@sin,3) finds a minimum of the SIN function near 3. In this case, SIN is a function that returns a scalar function value SIN evaluated at X. FUN can also be an inline object: X = fminsearch(inline('norm(x)'),[1;2;3]) returns a minimum near [0;0;0].

FMINSEARCH uses the Nelder-Mead simplex (direct search) method.

See also OPTIMSET, FMINBND, @, INLINE.

Pipeline design

Minimize the total annual cost C(D, P1, P2, L) [$/yr] of a gas pipeline of diameter D and length L operating between inlet pressure P1 and outlet pressure P2 (capital plus compression costs), subject to a flow correlation for the flow rate Q [SCF/hr] in terms of P1^2 − P2^2, D, the friction factor f, and L.

3-4 Conjugate Gradient Method - The Fletcher-Reeves Method (FRP)

Given s^1 = −∇f^1 and s^2 = −∇f^2 + β s^1, it can be shown that if

\beta = \frac{\nabla f^{2T}\,\nabla f^2}{\nabla f^{1T}\,\nabla f^1},

where s^2 is started at the optimum of the line search along the direction s^1, and f ≅ a + b^T x + ½ x^T H x, then s^2 is conjugate to s^1 with respect to H.

Most general iteration:

s^{(k)} = -\nabla f(x^{(k)}) + \frac{\|\nabla f(x^{(k)})\|^2}{\|\nabla f(x^{(k-1)})\|^2}\; s^{(k-1)}

Example: (The Fletcher-Reeves Method for Rosenbrock’s function )
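A minimal Fletcher-Reeves sketch for Rosenbrock's function (the line search uses fminbnd; the starting point, bounds, iteration limit, and tolerance are arbitrary choices):

f    = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
grad = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];
x = [-1.2; 1];                                  % arbitrary starting point
g = grad(x);  s = -g;                           % first direction: steepest descent
for k = 1:200
    alpha = fminbnd(@(a) f(x + a*s), 0, 1);     % line search along s
    x = x + alpha*s;
    gnew = grad(x);
    if norm(gnew) < 1e-6, break, end
    beta = (gnew'*gnew)/(g'*g);                 % Fletcher-Reeves beta
    s = -gnew + beta*s;
    g = gnew;
end
x, f(x)                                         % approaches the minimum at (1, 1)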

3-5 Quasi-Newton's Method (Variable Metric Method)

The variable metric approach uses an approximate Hessian matrix without taking second derivatives of the objective function; the Newton direction is approximated by x^{k+1} = x^k − α_k A_k ∇f_k.

When A_k = I, this is Cauchy's direction; when A_k = H^{-1}, it is Newton's direction. In the variable metric approach, A_k is updated iteratively to approximate the inverse of the Hessian.

3-5 Quasi-Newton's Method (Variable Metric Method) - Continued

Given the objective function f(x) ≅ a + b^T x + ½ x^T H x, then ∇f ≅ b + Hx, and

\nabla f^{k+1} - \nabla f^k \cong H(x^{k+1} - x^k), \quad\text{or}\quad x^{k+1} - x^k = H^{-1} g^k, \qquad\text{where } g^k = \nabla f^{k+1} - \nabla f^k.

Let A^{k+1} = H^{-1} = A^k + \Delta A^k, so x^{k+1} - x^k = (A^k + \Delta A^k) g^k and \Delta A^k g^k = \Delta x^k - A^k g^k.

Finally,

\Delta A^k = \frac{\Delta x^k\, y^T}{y^T g^k} - \frac{A^k g^k\, z^T}{z^T g^k}

Quasi-Newton’s Method – Continued

Choosing y = Δx^k and z = A^k g^k gives the Davidon-Fletcher-Powell (DFP) method.

Broyden-Fletcher-Shanno (BFS) Method

Summary:
In the first run, given the initial conditions and convergence criteria, evaluate the gradient.
The first direction is the negative of the gradient, with A^k = I.
In the iteration phase, evaluate the new gradient and find g^k and Δx^k, then compute ΔA^k using the equation on the previous slide.
The new direction is s^{k+1} = −A^{k+1} ∇f^{k+1}.
Check convergence, and continue the iteration phase.

A^{(k+1)} = \left(I - \frac{\Delta x^{(k)}\, g^{(k)T}}{\Delta x^{(k)T} g^{(k)}}\right) A^{(k)} \left(I - \frac{g^{(k)}\, \Delta x^{(k)T}}{\Delta x^{(k)T} g^{(k)}}\right) + \frac{\Delta x^{(k)}\, \Delta x^{(k)T}}{\Delta x^{(k)T} g^{(k)}}

Example: (The BFS Method for Rosenbrock’s function )
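A minimal quasi-Newton sketch for Rosenbrock's function using the BFS inverse-Hessian update above (line search via fminbnd; the starting point, bounds, iteration limit, and tolerance are arbitrary choices):

f    = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
grad = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1)); 200*(x(2) - x(1)^2)];
x = [-1.2; 1];  A = eye(2);  g = grad(x);       % A = I gives Cauchy's direction at the start
for k = 1:200
    s = -A*g;                                   % quasi-Newton direction
    alpha = fminbnd(@(a) f(x + a*s), 0, 1);     % line search along s
    dx = alpha*s;  x = x + dx;
    gnew = grad(x);
    if norm(gnew) < 1e-6, break, end
    dg  = gnew - g;
    rho = 1/(dx'*dg);                           % BFS update of the inverse-Hessian estimate
    A = (eye(2) - rho*(dx*dg'))*A*(eye(2) - rho*(dg*dx')) + rho*(dx*dx');
    g = gnew;
end
x, f(x)                                         % approaches the minimum at (1, 1)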

Heuristic Approaches (1): The Simplex Approach - Main Idea

Rules:
1. Straddling: if the vertex selected for rejection is the one generated in the previous iteration, choose instead the vertex with the next-highest function value.
2. Cycling: if a given vertex remains unchanged for more than M iterations, reduce the simplex size by some factor.
3. Termination: the search is terminated when the simplex gets small enough.

(Figure: the simplex reflection from the base point to the trial point.)

Example: Rosenbrock Function

Simplex Approach: Nelder and Mead

(Figure: the new point x_new is generated on the line through the worst vertex x^(h) and the centroid x_c of the remaining vertices, for reflection, expansion, and contraction.)

x_new = x_c + θ(x_c − x^(h)), where x^(h) is the worst vertex, x^(g) the next-to-worst vertex, x^(l) the best vertex, and x_c the centroid of all vertices except x^(h):

(a) Normal reflection (θ = 1): f(x^(l)) ≤ f(x_new) ≤ f(x^(g))
(b) Expansion (θ > 1): f(x_new) < f(x^(l))
(c) Contraction (0 < θ < 1): f(x_new) > f(x^(g)) and f(x_new) ≤ f(x^(h))
(d) Contraction (−1 < θ < 0): f(x_new) > f(x^(h))
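A minimal sketch of one reflection step (θ = 1) in MATLAB, using Rosenbrock's function from the example and a made-up starting simplex:

f = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;   % Rosenbrock, as in the example
V = [0 0; 1 0; 0 1]';                            % simplex vertices as columns (made-up)
fv = [f(V(:,1)), f(V(:,2)), f(V(:,3))];
[~, h] = max(fv);                                % index of the worst vertex x^(h)
xc = mean(V(:, setdiff(1:3, h)), 2);             % centroid of the remaining vertices
theta = 1;                                       % normal reflection
xnew = xc + theta*(xc - V(:,h))                  % reflected trial point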

Comparison of Different Approaches for Solving the Rosenbrock Function

Method                   Subprogram                   Optimum found     Number of iterations   Cost function value   Number of function calls
Cauchy's                 Steepest_decent_rosenbrock   0.9809  0.9622    537                    3.6379e-004           11021
Newton's                 Newton_method_rosenbrock     0.9993  0.9985    9                      1.8443e-006           N/A
Powell's                 Pw_rosenbrock_main           1.0000  0.9999    10                     3.7942e-009           799
Fletcher-Reeves          Frp_rosenbrock_main          0.9906  0.9813    8                      8.7997e-005           155
Broyden-Fletcher-Shanno  Bfs_rosenbrock_main          1.0000  1.0001    17                     1.0128e-004           388
Nelder/Mead              fminsearch                   1.0000  1.0000    N/A                    3.7222e-010           103

Heuristic Approach (Reduction Sampling with Random Search)

Step 1: Given an initial sampling radius and an initial center of the search region, generate N random points within the region.
Step 2: Take the best of the N samples as the new center.
Step 3: Reduce the radius of the search region; if the radius is not yet small enough, go back to Step 2, otherwise stop.
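A minimal MATLAB sketch of this reduction-sampling random search (Himmelblau's function is used as a test objective; the radius, shrink factor, and sample size are arbitrary choices):

f = @(x) (x(1)^2 + x(2) - 11)^2 + (x(1) + x(2)^2 - 7)^2;     % made-up test objective
center = [0 0];  radius = 5;  N = 200;                       % initial center, radius, sample size
while radius > 1e-4
    X = repmat(center, N, 1) + radius*(2*rand(N,2) - 1);     % N random points in the region
    fx = zeros(N,1);
    for i = 1:N, fx(i) = f(X(i,:)); end
    [fbest, ibest] = min(fx);
    if fbest < f(center), center = X(ibest,:); end           % best sample becomes the new center
    radius = 0.7*radius;                                     % shrink the search region
end
center, f(center)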