

Page 1:

Multivariate Unconstrained Optimisation

• First we consider algorithms for functions for which derivatives are not available.

• Could try to extend direct method such as Golden Section:

[Figure: Golden Section bracketing of a minimum on the interval [a, b] in one dimension, and the corresponding grid of (x, y) points in two dimensions.]

Number of function evaluations increases as eⁿ, where n is the number of dimensions.

Page 2:

The Polytope Algorithm

• This is a direct search method.

• Also known as the "simplex" method.

• In the n-dimensional case, at each stage we have n+1 points x1, x2, …, xn+1 such that:

F(x1) ≤ F(x2) ≤ … ≤ F(xn+1)

• The algorithm seeks to replace the worst point, xn+1, with a better one.

• The xi lie at the vertices of an n-dimensional polytope.

Page 3:

The Polytope Algorithm 2

• The new point is formed by reflecting the worst point through the centroid of the best n vertices:

c = (1/n)(x1 + x2 + … + xn)

• Mathematically the new point can be written:

xr = c + α(c - xn+1)

where α > 0 is the reflection coefficient.

• In two dimensions the polytope is a triangle; in three dimensions it is a tetrahedron.

Page 4:

Polytope Example

• For n = 2 we have three points at each step.

[Figure: triangle with vertices x1, x2 and x3 (the worst point); the reflected point xr = c + α(c - x3) lies beyond the centroid c of x1 and x2.]

Page 5:

Detailed Polytope Algorithm

1. Evaluate Fr = F(xr). If F1 ≤ Fr ≤ Fn, then xr replaces xn+1.

2. If Fr < F1 then xr is the new best point; we assume the direction of reflection is "good" and attempt to expand the polytope in that direction by defining the point

xe = c + β(xr - c)

where β > 1. If Fe < Fr then xe replaces xn+1; otherwise xr replaces xn+1.

Page 6:

Detailed Polytope Algorithm 2

3. If Fr > Fn then the polytope is too big and we attempt to contract it by defining:

xc = c + γ(xn+1 - c)  if Fr ≥ Fn+1

xc = c + γ(xr - c)  if Fr < Fn+1

where 0 < γ < 1. If Fc < min(Fr, Fn+1) then xc replaces xn+1; otherwise a further contraction is done. A sketch of one full iteration in MATLAB follows.
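Steps 1–3 can be collected into a single MATLAB sketch of one iteration (an illustrative implementation, not the slides' own code; the function name polytope_step and the row-per-vertex layout of X are assumptions):

function X = polytope_step(F, X, alpha, beta, gamma)
% One polytope iteration. X is an (n+1)-by-n matrix of vertices, one per
% row; F is a function handle; alpha, beta, gamma are the reflection,
% expansion and contraction coefficients.
n  = size(X, 2);
Fv = zeros(n+1, 1);
for i = 1:n+1
    Fv(i) = F(X(i,:));
end
[Fv, order] = sort(Fv);            % relabel so that F(x1) <= ... <= F(xn+1)
X  = X(order, :);
c  = mean(X(1:n, :), 1);           % centroid of the best n vertices
xr = c + alpha*(c - X(n+1, :));    % reflect the worst point
Fr = F(xr);
if Fv(1) <= Fr && Fr <= Fv(n)      % step 1: accept the reflection
    X(n+1, :) = xr;
elseif Fr < Fv(1)                  % step 2: new best point, try to expand
    xe = c + beta*(xr - c);
    if F(xe) < Fr
        X(n+1, :) = xe;
    else
        X(n+1, :) = xr;
    end
else                               % step 3: polytope too big, contract
    if Fr >= Fv(n+1)
        xc = c + gamma*(X(n+1, :) - c);
    else
        xc = c + gamma*(xr - c);
    end
    if F(xc) < min(Fr, Fv(n+1))
        X(n+1, :) = xc;
    end                            % otherwise a further contraction follows
end
end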

Page 7:

MATLAB Example Polytope

>> banana = @(x)10*(x(2)-x(1)^2)^2+(1-x(1))^2;

>> [x,fval] = fminsearch(banana,[-1.2, 1],optimset('Display','iter'))
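Here fminsearch is MATLAB's implementation of the Nelder–Mead polytope method; with the 'Display','iter' option it prints, at each iteration, the best function value and which operation (reflect, expand, contract) was applied.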

Page 8:

Polytope Example by Hand

F(x, y) = [the formula for this test function is illegible in the source; its values at the polytope vertices are listed in the steps below]

Page 9:

Polytope Example

• Start with equilateral triangle:

x1 = (0, 0), x2 = (0, 0.5), x3 = (√3, 1)/4

• Take α = 1, β = 1.5, and γ = 0.5.

Page 10:

Polytope Example: Step 1

• Polytope is

i 1 2 3

xi (0,0) (0,0.5) (0.433,0.25)

F(xi) 9.7918 7.3153 4.8601

• Worst point is x1, c = (x2+ x3)/2 = (0.2165,0.375)

• Relabel points: x3 → x1, x1 → x3

• xr = c + α(c - x3) = (0.433,0.75) and F(xr) = 3.6774

• F(xr) < F(x1), so xr is the new best point; try to expand.

• xe = c + β(xr - c) = (0.5413,0.9375) and F(xe) = 3.1086

• F(xe) < F(xr), so accept the expansion.
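These numbers are easy to reproduce in MATLAB (a quick check using the relabelled points; not from the slides):

>> x1 = [0.433 0.25]; x2 = [0 0.5]; x3 = [0 0];
>> c  = (x1 + x2)/2          % centroid of best two points: (0.2165, 0.375)
>> xr = c + 1.0*(c - x3)     % reflection with alpha = 1: (0.433, 0.75)
>> xe = c + 1.5*(xr - c)     % expansion with beta = 1.5: (0.5413, 0.9375)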

Page 11:

After Step 1

Page 12:

Polytope Example: Step 2

• Polytope is

i 1 2 3

xi (0.433,0.25) (0,0.5) (0.5413,0.9375)

F(xi) 4.8601 7.3153 3.1086

• Worst point is x2, c = (x1+ x3)/2 = (0.4871,0.5938)

• Relabel points: x3 → x1, x2 → x3, x1 → x2

• xr = c + α(c - x3) = (0.9743,0.6875) and F(xr) = 2.0093

• F(xr) < F(x1), so xr is the new best point; try to expand.

• xe = c + β(xr - c) = (1.2179,0.7344) and F(xe) = 2.2837

• F(xe) > F(xr), so reject the expansion; keep xr.

Page 13:

After Step 2

Page 14:

Polytope Example: Step 3

• Polytope is

i 1 2 3

xi (0.5413,0.9375) (0.433,0.25) (0.9743,0.6875)

F(xi) 3.1086 4.8601 2.0093

• Worst point is x2, c = (x1+ x3)/2 = (0.7578,0.8125)

• Relabel points: x3 → x1, x2 → x3, x1 → x2

• xr = c + α(c - x3) = (1.0826,1.375) and F(xr) = 3.1199

• F(xr) > F(x2), so polytope is too big. Need to contract.

• xc = c + γ(xr - c) = (0.9202,1.0938) and F(xc) = 2.2476

• F(xc) < F(xr), so accept the contraction.

Page 15:

After Step 3

Page 16:

Polytope Example: Step 4

• Polytope is

i 1 2 3

xi (0.9743,0.6875) (0.5413,0.9375) (0.9202,1.0938)

F(xi) 2.0093 3.1086 2.2476

• Worst point is x2, c = (x1+ x3)/2 = (0.9472,0.8906)

• Relabel points: x3 → x2, x2 → x3

• xr = c + α(c - x3) = (1.3532,0.8438) and F(xr) = 2.7671

• F(xr) > F(x2), so polytope is too big. Need to contract.

• xc = c + γ(xr - c) = (1.1502,0.8672) and F(xc) = 2.1391

• F(xc) < F(xr), so accept the contraction.

Page 17:

After Step 4

Page 18:

Polytope Example: Step 5

• Polytope is

i 1 2 3

xi (0.9743,0.6875) (0.9202,1.0938) (1.1502,0.8672)

F(xi) 2.0093 2.2476 2.1391

• Worst point is x2, c = (x1+ x3)/2 = (1.0622,0.7773)

• Relabel points: x3 → x2, x2 → x3

• xr = c + α(c - x3) = (1.2043,0.4609) and F(xr) = 2.6042

• F(xr) ≥ F(x3), so polytope is too big. Need to contract.

• xc = c + γ(x3 - c) = (0.9912,0.9355) and F(xc) = 2.0143

• F(xc) < F(xr), so accept the contraction.

Page 19:

After Step 5

Page 20:

Polytope Example: Step 6

• Polytope is

i 1 2 3

xi (0.9743,0.6875) (1.1502,0.8672) (0.9912,0.9355)

F(xi) 2.0093 2.1391 2.0143

• Worst point is x2, c = (x1+ x3)/2 = (0.9827,0.8117)

• Relabel points: x3 → x2, x2 → x3

• xr = c + α(c - x3) = (0.8153,0.7559) and F(xr) = 2.1314

• F(xr) > F(x2), so polytope is too big. Need to contract.

• xc = c + γ(xr - c) = (0.8990,0.7837) and F(xc) = 2.0012

• F(xc) < F(xr), so accept the contraction.

Page 21:

Polytope Example: Final Result

• So after 6 steps the best estimate of the minimum is x = (0.8990,0.7837) for which F(x)=2.0012.

Page 22:

Alternating Variables Method

• Start from the point x = (a1, a2, …, an).

• Take the first variable, x1, and minimise F(x1, a2, …, an) with respect to x1. This gives x1 = a1′.

• Take the second variable, x2, and minimise F(a1′, x2, a3, …, an) with respect to x2. This gives x2 = a2′.

• Continue with each variable in turn until a minimum is reached.

Page 23:

AVM in Two Dimensions

[Figure: AVM search path in two dimensions, zig-zagging along the coordinate directions from the point labelled "Start" towards the minimum.]

• Method of minimisation over each variable can be any univariate method.

Page 24:

AVM Example in 2D

• Minimise F(x,y) = x² + y² + xy - 2x - 4y

∂F/∂x = 2x + y - 2 = 0 ⟹ x = (2 - y)/2

∂F/∂y = x + 2y - 4 = 0 ⟹ y = (4 - x)/2

• Start at (0,0).

Page 25:

AVM Example in 2D

x        y        F(x,y)      |error|

0 0 0 4

1 0 -1 3

1 1.5 -3.25 0.75

0.25 1.5 -3.8125 0.1875

0.25 1.875 -3.953 0.047

0.0625 1.875 -3.988 0.012

0.0625 1.968 -3.9971 0.0029

0.0156 1.968 -3.9992 0.0008

0.0156 1.992 -3.9998 0.0002

0.004 1.992 -3.99995 0.00005

0.004 1.998 -3.99999 0.00001

• Here |error| = |F(x,y) + 4|, since the minimum value is F(0,2) = -4.
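This table can be reproduced with a minimal MATLAB sketch using the closed-form line minimisers x = (2 - y)/2 and y = (4 - x)/2 derived above (the loop count is arbitrary):

x = 0; y = 0;                          % start at (0,0)
for k = 1:6
    x = (2 - y)/2;                     % minimise F over x with y fixed
    y = (4 - x)/2;                     % minimise F over y with x fixed
    F = x^2 + y^2 + x*y - 2*x - 4*y;
    fprintf('%8.4f  %8.4f  %10.5f\n', x, y, F);
end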

Page 26:

Definition of Gradient Vector

• The gradient vector is:

g(x) = ( ∂F/∂x1 , ∂F/∂x2 , … , ∂F/∂xn )ᵀ

• The gradient vector is also written as ∇F(x).

Page 27:

Definition of Hessian Matrix

• The Hessian matrix is defined as:

G(x) = [ ∂²F(x)/∂x1²     ∂²F(x)/∂x1∂x2   …   ∂²F(x)/∂x1∂xn
         ∂²F(x)/∂x2∂x1   ∂²F(x)/∂x2²     …   ∂²F(x)/∂x2∂xn
           ⋮               ⋮                   ⋮
         ∂²F(x)/∂xn∂x1   ∂²F(x)/∂xn∂x2   …   ∂²F(x)/∂xn²   ]

• The Hessian matrix is symmetric, and is also written as ∇²F(x).

Page 28:

Conditions for a Minimum of a Multivariate Function

1. |g(x*)| = 0. That is, all partial derivatives are zero.

2. G(x*) is positive definite. That is, xᵀG(x*)x > 0 for all vectors x ≠ 0.

• The second condition implies that the eigenvalues of G(x*) are strictly positive.
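In practice, positive definiteness is often tested with a Cholesky factorisation rather than by computing eigenvalues; a minimal MATLAB sketch (the matrix G here is illustrative):

G = [2 1; 1 4];          % a candidate Hessian
[~, p] = chol(G);        % p == 0 if and only if G is positive definite
isMinimum = (p == 0)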

Page 29:

Stationary Points

• If g(x*)=0 then x* is said to be a stationary point.

• There are 3 types of stationary point:

1. Minimum, e.g., x² + y² at (0,0)

2. Maximum, e.g., 1 - x² - y² at (0,0)

3. Saddle point, e.g., x² - y² at (0,0)

Page 30:

Definition: Level Surface

• F(x) = constant defines a "level surface".

• For different values of the constant we generate different level surfaces.

• For example, in 3-D suppose F(x,y,z) = x²/4 + y²/9 + z²/4.

• F(x,y,z) = constant is then an ellipsoidal surface centred on the origin.

• Thus, the level surfaces are a series of concentric ellipsoidal surfaces.

• The gradient vector at a point x is normal to the level surface passing through x.

Page 31:

Definition: Tangent Hyperplane

• For a differentiable multivariate function, F, the tangent hyperplane at the point xt on the surface F(x)=constant is normal to the gradient vector.

Page 32:

Definition: Quadratic Function

• If the Hessian matrix of F is constant then F is said to be a quadratic function.

• In this case F can be expressed as:

F(x) = (1/2)xᵀGx + cᵀx + β

for a constant matrix G, vector c, and scalar β.

• ∇F(x) = Gx + c and ∇²F(x) = G.

Page 33:

Example Quadratic Function

• F(x,y) = x² + 2y² + xy - x + 2y

F(x,y) = (1/2) xᵀ G x + cᵀ x,  with G = [2 1; 1 4] and c = (-1, 2)ᵀ

• The gradient vector is zero at a stationary point, so Gx + c = 0 there.

• Need to solve Gx = -c to find the stationary point:

x* = -G⁻¹c = (6/7, -5/7)ᵀ
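This calculation is easily reproduced in MATLAB (a small check, not from the slides):

>> G = [2 1; 1 4]; c = [-1; 2];
>> xstar = G\(-c)            % stationary point: (6/7, -5/7) = (0.8571, -0.7143)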

Page 34:

Hessian Matrix Again

• We can predict the behaviour of a general nonlinear function near a stationary point, x*, by looking at the eigenvalues of the Hessian matrix.

• Let uj and λj denote the jth eigenvector and eigenvalue of G.

• If λj > 0 the function will increase as we move away from x* in direction uj.

• If λj < 0 the function will decrease as we move away from x* in direction uj.

• If λj = 0 the function will stay constant as we move away from x* in direction uj.

Page 35:

Example Again

F(x,y) = (1/2) xᵀ G x + cᵀ x,  with G = [2 1; 1 4] and c = (-1, 2)ᵀ (as before)

• λ1 = 1.5858 and λ2 = 4.4142, so F increases as we move away from the stationary point at (6/7, -5/7)ᵀ.

• So the stationary point is a minimum.
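The quoted eigenvalues can be verified with:

>> lambda = eig([2 1; 1 4])   % -> 1.5858 and 4.4142; both positive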

Page 36:

Example in 4D

G = [ 2   1   0   2
      1   2  -1   3
      0  -1   4   1
      2   3   1  -2 ],    c = (-1, 2, 3, 4)ᵀ

• In MATLAB:

>> c = [-1 2 3 4]';
>> G = [2 1 0 2; 1 2 -1 3; 0 -1 4 1; 2 3 1 -2];
>> x = G\(-c)
>> [u,lambda] = eig(G)

Page 37:

Descent Methods

• Seek a general algorithm for unconstrained minimisation of a smooth multivariate function.

• Require that F decreases at each iteration.

• A method that imposes this type of condition is called a descent method.

Page 38:

A General Descent Algorithm

Let xk be the current iterate.

1. If converged then quit; xk is the estimate of the minimum.

2. Compute a nonzero vector pk giving the direction of search.

3. Compute a positive scalar step length, αk, for which F(xk + αk pk) < F(xk).

4. New estimate of minimum is xk+1 = xk + αk pk. Increment k by 1, and go to step 1.
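A skeleton of this loop in MATLAB (a sketch with illustrative names; the bracketed line search via fminbnd over [0, 1] is an assumption, not part of the slides):

function x = descent(F, gradF, x, tol, maxit)
% Generic descent loop. Here pk is chosen as the steepest-descent
% direction, but any pk with gradF(x)'*pk < 0 will do.
for k = 1:maxit
    g = gradF(x);
    if norm(g) < tol            % step 1: converged?
        break
    end
    p = -g;                     % step 2: search direction pk
    alpha = fminbnd(@(a) F(x + a*p), 0, 1);   % step 3: step length
    x = x + alpha*p;            % step 4: new estimate xk+1
end
end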

Page 39:

Method of Steepest Descent

• The direction in which F decreases most steeply is -∇F, so we use this as the search direction.

• The new iterate is xk+1 = xk - αk∇F(xk), where αk is a non-negative scalar chosen so that xk+1 is the minimum point along the line from xk in the direction -∇F(xk).

• Thus, αk minimises F(xk - α∇F(xk)) with respect to α.

Page 40:

Steepest Descent Algorithm

• Initialise: x0, k = 0

• Loop: u = ∇F(xk)

  if |u| = 0 then quit

  else minimise h(α) = F(xk - αu) to get αk

  xk+1 = xk - αk u

  k = k + 1

  if (not finished) go to Loop
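Applied to the example on the next slides, this loop can be written as a short MATLAB sketch (the exact line minimisation is replaced by fminbnd on [0, 1], an assumption, not the slides' own method):

F  = @(v) v(1)^3 + v(2)^3 - 2*v(1)^2 + 3*v(2)^2 - 8;
gF = @(v) [3*v(1)^2 - 4*v(1); 3*v(2)^2 + 6*v(2)];   % gradient of F
x  = [1; -1];                                       % starting point x0
for k = 1:10
    u = gF(x);
    if norm(u) < 1e-8, break; end
    alpha = fminbnd(@(a) F(x - a*u), 0, 1);         % minimise h(alpha)
    x = x - alpha*u;
end
x                                                   % -> (4/3, 0)'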

Page 41:

Example

• F(x,y) = x³ + y³ - 2x² + 3y² - 8

g(x,y) = ( 3x² - 4x , 3y² + 6y )ᵀ,    G(x,y) = [ 6x - 4    0
                                                 0         6y + 6 ]

∇F(x,y) = 0 gives 3x² - 4x = 0, so x = 0 or 4/3; and 3y² + 6y = 0, so y = 0 or -2.

(x,y) G Type

(0,0) Indefinite Saddle point

(0,-2) Negative definite Maximum

(4/3,0) Positive definite Minimum

(4/3,-2) Indefinite Saddle point

Page 42:

Solve with Steepest Descent

• Take x0 = (1, -1)ᵀ, then ∇F(x0) = (-1, -3)ᵀ.

• h(α) ≡ F(x0 - α∇F(x0)) = F(1+α, -1+3α) = (1+α)³ + (3α-1)³ - 2(1+α)² + 3(3α-1)² - 8

• Minimise h(α) with respect to α:

dh/dα = 3(1+α)² + 9(3α-1)² - 4(1+α) + 18(3α-1) = 84α² + 2α - 10 = 0

• So α = 1/3 or -5/14. α must be nonnegative, so α = 1/3.
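The roots of the step-length quadratic can be checked with:

>> roots([84 2 -10])    % -> 0.3333 and -0.3571, i.e. 1/3 and -5/14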

Page 43:

Solve with Steepest Descent

• x1 = x0 - α∇F(x0) = (1, -1)ᵀ - (-1/3, -1)ᵀ = (4/3, 0)ᵀ.

• This is the exact minimum.

• We were lucky that the search direction at x0 points directly towards (4/3 0)T.

• Usually we would need to do more than one iteration to get a good solution.

Page 44:

Newton’s Method

• Approximate F locally by a quadratic function and minimise this exactly.

• Taylor’s Theorem:

F(x) ≈ F(xk) + (g(xk))ᵀ(x - xk) + (1/2)(x - xk)ᵀG(xk)(x - xk)
     = F(xk) - (g(xk))ᵀxk + (1/2)xkᵀG(xk)xk + (g(xk) - G(xk)xk)ᵀx + (1/2)xᵀG(xk)x

• The RHS is minimised when

g(xk) - G(xk)xk + G(xk)xk+1 = 0

• So

xk+1 = xk - [G(xk)]⁻¹g(xk)

Search direction is -[G(xk)]⁻¹g(xk) and step length is 1.

Page 45:

Newton’s Method Example

• Rosenbrock’s function:

F(x,y) = 10(y - x²)² + (1 - x)²

g(x,y) = ( -40x(y - x²) - 2(1 - x) , 20(y - x²) )ᵀ

G(x,y) = [ 120x² - 40y + 2    -40x
           -40x               20   ]

• Use Newton’s Method starting at (-1.2, 1)ᵀ.

Page 46:

MATLAB Solution

>> F = @(x,y)10*(y-x^2)^2+(1-x)^2

>> fgrad1 = @(x,y)-40*x*(y-x^2)-2*(1-x)

>> fgrad2 = @(x,y)20*(y-x^2)

>> G11 = @(x,y)120*x^2-40*y+2

>> x = [-1.2;1]

>> x = x - inv([G11(x(1),x(2)) -40*x(1); -40*x(1) 20])*[fgrad1(x(1),x(2)) fgrad2(x(1),x(2))]'
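Wrapped in a loop, the same step reproduces the iteration table on the next slide (a sketch; the anonymous functions mirror the gradient and Hessian given on the previous slide):

F = @(v) 10*(v(2) - v(1)^2)^2 + (1 - v(1))^2;
g = @(v) [-40*v(1)*(v(2) - v(1)^2) - 2*(1 - v(1)); 20*(v(2) - v(1)^2)];
G = @(v) [120*v(1)^2 - 40*v(2) + 2, -40*v(1); -40*v(1), 20];
x = [-1.2; 1];
for k = 1:7
    x = x - G(x)\g(x);        % Newton step: solve G(xk) p = -g(xk)
    fprintf('%8.4f  %8.4f  %12.4e\n', x(1), x(2), F(x));
end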

Page 47:

MATLAB Iterations

x y F(x,y)

-1.2 1 6.7760

-0.9755 0.9012 3.9280

0.0084 -0.9679 10.3533

0.0571 0.0009 0.8892

0.9573 0.1060 6.5695

0.9598 0.9212 0.0016

1.0000 0.9984 2.62×10⁻⁵

1.0000 1.0000 2.41×10⁻¹⁴

Page 48:

Notes on Newton’s Method

• Newton’s Method converges quadratically if the quadratic model is a good fit to the objective function.

• Problems arise if the quadratic model is not a good fit outside a small neighbourhood of the current point.