Multivariate Unconstrained Optimisation
• First we consider algorithms for functions for which derivatives are not available.
• Could try to extend a direct method such as Golden Section:
[Figure: Golden Section bracketing on an interval [a, b] in one dimension, and the corresponding region in the (x, y) plane in two dimensions.]
Number of function evaluations increases as eⁿ, where n is the number of dimensions.
The Polytope Algorithm
• This is a direct search method.
• Also known as the “simplex” method.
• In the n-dimensional case, at each stage we have n+1 points x1, x2, …, xn+1 such that:
F(x1) ≤ F(x2) ≤ … ≤ F(xn+1)
• The algorithm seeks to replace the worst point, xn+1, with a better one.
• The xi lie at the vertices of an n-dimensional polytope.
The Polytope Algorithm 2
• The new point is formed by reflecting the worst point through the centroid of the best n vertices:
$$c = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Mathematically the new point can be written:
xr = c + α(c − xn+1), where α > 0 is the reflection coefficient.
• In two dimensions the polytope is a triangle; in three dimensions it is a tetrahedron.
Polytope Example
• For n = 2 we have three points at each step.
[Figure: reflection in two dimensions — the worst point x3 is reflected through the centroid c of x1 and x2 to give xr = c + α(c − x3).]
Detailed Polytope Algorithm
1. Evaluate Fr ≡ F(xr). If F1 ≤ Fr ≤ Fn, then xr replaces xn+1.
2. If Fr < F1 then xr is the new best point, and we assume the direction of reflection is “good” and attempt to expand the polytope in that direction by defining the point
xe = c + β(xr − c)
where β > 1 is the expansion coefficient. If Fe < Fr then xe replaces xn+1; otherwise xr replaces xn+1.
Detailed Polytope Algorithm 2
3. If Fr > Fn then the polytope is too big and we attempt to contract it by defining:
xc = c + γ(xn+1 − c) if Fr ≥ Fn+1
xc = c + γ(xr − c) if Fr < Fn+1
where 0 < γ < 1 is the contraction coefficient. If Fc < min(Fr, Fn+1) then xc replaces xn+1; otherwise a further contraction is done. The three cases are collected in the sketch below.
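• The three cases condense into a short MATLAB function. This is a minimal sketch for illustration (the name polytope_step and the shape of its arguments are my own choices, not from the slides); MATLAB's fminsearch, used on the next slide, implements a variant of the same method.

function X = polytope_step(F, X, alpha, beta, gamma)
% One iteration of the polytope method. X is an (n+1)-by-n matrix whose
% rows are the vertices; F is a function handle taking a row vector.
n = size(X, 2);
Fv = zeros(n+1, 1);
for i = 1:n+1, Fv(i) = F(X(i,:)); end
[Fv, idx] = sort(Fv);              % order so that F(x1) <= ... <= F(xn+1)
X = X(idx, :);
c  = mean(X(1:n, :), 1);           % centroid of the best n vertices
xr = c + alpha*(c - X(n+1, :));    % reflected point
Fr = F(xr);
if Fr < Fv(1)                      % new best point: attempt an expansion
    xe = c + beta*(xr - c);
    if F(xe) < Fr, X(n+1,:) = xe; else, X(n+1,:) = xr; end
elseif Fr <= Fv(n)                 % middling point: accept the reflection
    X(n+1,:) = xr;
else                               % polytope too big: contract
    if Fr >= Fv(n+1), xc = c + gamma*(X(n+1,:) - c);
    else,             xc = c + gamma*(xr - c);
    end
    if F(xc) < min(Fr, Fv(n+1))
        X(n+1,:) = xc;             % otherwise a further contraction follows
    end
end
end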
MATLAB Example Polytope
>> banana = @(x)10*(x(2)-x(1)^2)^2+(1-x(1))^2;
>> [x,fval] = fminsearch(banana,[-1.2, 1],optimset('Display','iter'))
Polytope Example by Hand
• Minimise a given test function F(x, y) of two variables. [Formula not recoverable from the source; the function values quoted in the steps below are as given.]
Polytope Example
• Start with an equilateral triangle:
x1 = (0, 0), x2 = (0, 0.5), x3 = (√3, 1)/4
• Take α = 1, β = 1.5, and γ = 0.5.
Polytope Example: Step 1
• Polytope is
i 1 2 3
xi (0,0) (0,0.5) (0.433,0.25)
F(xi) 9.7918 7.3153 4.8601
• Worst point is x1, c = (x2+ x3)/2 = (0.2165,0.375)
• Relabel points: x3 → x1, x1 → x3.
• xr = c + α(c − x3) = (0.433, 0.75) and F(xr) = 3.6774
• F(xr) < F(x1), so xr is the new best point; try to expand.
• xe = c + β(xr − c) = (0.5413, 0.9375) and F(xe) = 3.1086
• F(xe) < F(xr), so accept the expansion.
[Figure: the polytope after Step 1.]
Polytope Example: Step 2
• Polytope is
i 1 2 3
xi (0.433,0.25) (0,0.5) (0.5413,0.9375)
F(xi) 4.8601 7.3153 3.1086
• Worst point is x2, c = (x1+ x3)/2 = (0.4871,0.5938)
• Relabel points: x3 → x1, x2 → x3, x1 → x2.
• xr = c + α(c − x3) = (0.9743, 0.6875) and F(xr) = 2.0093
• F(xr) < F(x1), so xr is the new best point; try to expand.
• xe = c + β(xr − c) = (1.2179, 0.7344) and F(xe) = 2.2837
• F(xe) > F(xr), so reject the expansion; xr replaces x3.
[Figure: the polytope after Step 2.]
Polytope Example: Step 3
• Polytope is
i 1 2 3
xi (0.5413,0.9375) (0.433,0.25) (0.9743,0.6875)
F(xi) 3.1086 4.8601 2.0093
• Worst point is x2, c = (x1+ x3)/2 = (0.7578,0.8125)
• Relabel points: x3 → x1, x2 → x3, x1 → x2.
• xr = c + α(c − x3) = (1.0826, 1.375) and F(xr) = 3.1199
• F(xr) > F(x2), so the polytope is too big; we need to contract.
• xc = c + γ(xr − c) = (0.9202, 1.0938) and F(xc) = 2.2476
• F(xc) < F(xr), so accept the contraction.
[Figure: the polytope after Step 3.]
Polytope Example: Step 4
• Polytope is
i 1 2 3
xi (0.9743,0.6875) (0.5413,0.9375) (0.9202,1.0938)
F(xi) 2.0093 3.1086 2.2476
• Worst point is x2, c = (x1+ x3)/2 = (0.9472,0.8906)
• Relabel points: x3 → x2, x2 → x3.
• xr = c + α(c − x3) = (1.3532, 0.8438) and F(xr) = 2.7671
• F(xr) > F(x2), so the polytope is too big; we need to contract.
• xc = c + γ(xr − c) = (1.1502, 0.8672) and F(xc) = 2.1391
• F(xc) < F(xr), so accept the contraction.
[Figure: the polytope after Step 4.]
Polytope Example: Step 5
• Polytope is
i 1 2 3
xi (0.9743,0.6875) (0.9202,1.0938) (1.1502,0.8672)
F(xi) 2.0093 2.2476 2.1391
• Worst point is x2, c = (x1+ x3)/2 = (1.0622,0.7773)
• Relabel points: x3 → x2, x2 → x3.
• xr = c + α(c − x3) = (1.2043, 0.4609) and F(xr) = 2.6042
• F(xr) ≥ F(x3), so the polytope is too big; we need to contract.
• xc = c + γ(x3 − c) = (0.9912, 0.9355) and F(xc) = 2.0143
• F(xc) < F(xr), so accept the contraction.
[Figure: the polytope after Step 5.]
Polytope Example: Step 6
• Polytope is
i 1 2 3
xi (0.9743,0.6875) (1.1502,0.8672) (0.9912,0.9355)
F(xi) 2.0093 2.1391 2.0143
• Worst point is x2, c = (x1+ x3)/2 = (0.9827,0.8117)
• Relabel points: x3 → x2, x2 → x3.
• xr = c + α(c − x3) = (0.8153, 0.7559) and F(xr) = 2.1314
• F(xr) > F(x2), so the polytope is too big; we need to contract.
• xc = c + γ(xr − c) = (0.8990, 0.7837) and F(xc) = 2.0012
• F(xc) < F(xr), so accept the contraction.
Polytope Example: Final Result
• So after 6 steps the best estimate of the minimum is x = (0.8990,0.7837) for which F(x)=2.0012.
Alternating Variables Method
• Start from the point x = (a1, a2, …, an).
• Take the first variable, x1, and minimise F(x1, a2, …, an) with respect to x1. This gives x1 = a1′.
• Take the second variable, x2, and minimise F(a1′, x2, a3, …, an) with respect to x2. This gives x2 = a2′.
• Continue with each variable in turn until the minimum is reached.
AVM in Two Dimensions
[Figure: zigzag path of AVM iterates from the starting point to the minimum.]
• Method of minimisation over each variable can be any univariate method.
AVM Example in 2D
• Minimise F(x,y) = x² + y² + xy − 2x − 4y
$$\frac{\partial F}{\partial x} = 2x + y - 2 = 0 \;\Rightarrow\; x = \frac{2 - y}{2}, \qquad \frac{\partial F}{\partial y} = x + 2y - 4 = 0 \;\Rightarrow\; y = \frac{4 - x}{2}$$
• Start at (0,0).
AVM Example in 2D
x y F(x,y) |error| (= |F − F*|, where F* = −4 is the minimum value)
0 0 0 4
1 0 -1 3
1 1.5 -3.25 0.75
0.25 1.5 -3.8125 0.1875
0.25 1.875 -3.953 0.047
0.0625 1.875 -3.988 0.012
0.0625 1.968 -3.9971 0.0029
0.0156 1.968 -3.9992 0.0008
0.0156 1.992 -3.9998 0.0002
0.004 1.992 -3.99995 0.00005
0.004 1.998 -3.99999 0.00001
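• The table can be reproduced in a few lines of MATLAB using the closed-form coordinate minimisers derived above (a sketch for illustration, not code from the slides):

% AVM for F(x,y) = x^2 + y^2 + x*y - 2*x - 4*y, starting at (0,0).
% Minimising over one variable with the other held fixed gives the
% closed forms x = (2 - y)/2 and y = (4 - x)/2 used below.
F = @(x,y) x.^2 + y.^2 + x.*y - 2*x - 4*y;
x = 0; y = 0;
fprintf('%8.4f %8.4f %10.5f\n', x, y, F(x,y));
for k = 1:5
    x = (2 - y)/2;                 % minimise over x, y fixed
    fprintf('%8.4f %8.4f %10.5f\n', x, y, F(x,y));
    y = (4 - x)/2;                 % minimise over y, x fixed
    fprintf('%8.4f %8.4f %10.5f\n', x, y, F(x,y));
end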
Definition of Gradient Vector
• The gradient vector is:
$$g(x) = \begin{pmatrix} \dfrac{\partial F}{\partial x_1} \\ \dfrac{\partial F}{\partial x_2} \\ \vdots \\ \dfrac{\partial F}{\partial x_n} \end{pmatrix}$$
• The gradient vector is also written as ∇F(x).
Definition of Hessian Matrix
• The Hessian matrix is defined as:
$$G(x) = \begin{pmatrix} \dfrac{\partial^2 F(x)}{\partial x_1^2} & \dfrac{\partial^2 F(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 F(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 F(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 F(x)}{\partial x_n \partial x_1} & \dfrac{\partial^2 F(x)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_n^2} \end{pmatrix}$$
• The Hessian matrix is symmetric, and is also written as ∇²F(x).
Conditions for a Minimum of a Multivariate Function
1. |g(x*)| = 0. That is, all partial derivatives are zero.
2. G(x*) is positive definite. That is, xᵀG(x*)x > 0 for all vectors x ≠ 0.
• The second condition implies that the eigenvalues of G(x*) are strictly positive.
Stationary Points
• If g(x*)=0 then x* is said to be a stationary point.
• There are 3 types of stationary point:
1. Minimum, e.g., x² + y² at (0,0)
2. Maximum, e.g., 1 − x² − y² at (0,0)
3. Saddle point, e.g., x² − y² at (0,0)
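• These classifications are easy to check numerically from the eigenvalues of the Hessian (a quick illustration; the constant Hessians below are worked out by hand from the three examples):

eig([ 2 0; 0  2])   % Hessian of x^2 + y^2:     all positive -> minimum
eig([-2 0; 0 -2])   % Hessian of 1 - x^2 - y^2: all negative -> maximum
eig([ 2 0; 0 -2])   % Hessian of x^2 - y^2:     mixed signs  -> saddle point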
Definition: Level Surface
• F(x) = constant defines a “level surface”.
• For different values of the constant we generate different level surfaces.
• For example, in 3-D suppose F(x,y,z) = x²/4 + y²/9 + z²/4.
• Then F(x,y,z) = constant is an ellipsoidal surface centred on the origin.
• Thus, the level surfaces are a series of concentric ellipsoidal surfaces.
• The gradient vector at a point x is normal to the level surface passing through x.
Definition: Tangent Hyperplane
• For a differentiable multivariate function, F, the tangent hyperplane at the point xt on the surface F(x)=constant is normal to the gradient vector.
Definition: Quadratic Function
• If the Hessian matrix of F is constant then F is said to be a quadratic function.
• In this case F can be expressed as:
F(x) = (1/2)xᵀGx + cᵀx + γ
for a constant matrix G, vector c, and scalar γ. Then
∇F(x) = Gx + c and ∇²F(x) = G.
Example Quadratic Function
• F(x,y) = x² + 2y² + xy − x + 2y
$$F(x,y) = \frac{1}{2}\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$
• The gradient vector is zero at a stationary point, so Gx + c = 0 there.
• Need to solve Gx = −c to find the stationary point:
x* = −G⁻¹c = (6/7, −5/7)ᵀ
Hessian Matrix Again
• We can predict the behaviour of a general nonlinear function near a stationary point, x*, by looking at the eigenvalues of the Hessian matrix.
• Let uj and λj denote the jth eigenvector and eigenvalue of G.
• If λj > 0 the function will increase as we move away from x* in the direction uj.
• If λj < 0 the function will decrease as we move away from x* in the direction uj.
• If λj = 0 the function will stay constant as we move away from x* in the direction uj.
Example Again
$$F(x,y) = \frac{1}{2}\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$
• λ1 = 1.5858 and λ2 = 4.4142, so F increases as we move away from the stationary point at (6/7, −5/7)ᵀ in every direction.
• So the stationary point is a minimum.
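• Both the stationary point and the eigenvalues quoted above are easily verified in MATLAB (a quick check, not part of the original slides):

G = [2 1; 1 4];  c = [-1; 2];
xstar = G\(-c)      % (6/7, -5/7)' = (0.8571, -0.7143)'
eig(G)              % 1.5858 and 4.4142: both positive, so a minimum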
Example in 4D
$$G = \begin{pmatrix} 2 & 1 & 0 & 2 \\ 1 & 2 & -1 & 3 \\ 0 & -1 & 4 & 1 \\ 2 & 3 & 1 & -2 \end{pmatrix}, \qquad c = \begin{pmatrix} -1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$$
• In MATLAB:
>> c = [-1 2 3 4]';
>> G = [2 1 0 2; 1 2 -1 3; 0 -1 4 1; 2 3 1 -2];
>> x = G\(-c)
>> [u,lambda] = eig(G)
Descent Methods
• Seek a general algorithm for unconstrained minimisation of a smooth multivariate function.
• Require that F decreases at each iteration.
• A method that imposes this type of condition is called a descent method.
A General Descent Algorithm
• Let xk be the current iterate.
1. If converged then quit; xk is the estimate of the minimum.
2. Compute a nonzero vector pk giving the direction of search.
3. Compute a positive scalar step length, αk, for which F(xk + αk pk) < F(xk).
4. The new estimate of the minimum is xk+1 = xk + αk pk. Increment k by 1, and go to step 1.
Method of Steepest Descent
• The direction in which F decreases most steeply is −∇F, so we use this as the search direction.
• The new iterate is xk+1 = xk − αk∇F(xk), where αk is a non-negative scalar chosen so that xk+1 is the minimum point along the line from xk in the direction −∇F(xk).
• Thus, αk minimises F(xk − α∇F(xk)) with respect to α.
Steepest Descent Algorithm
• Initialise: x0, k=0
• Loop: u = ∇F(xk)
if ‖u‖ = 0 then quit
else minimise h(α) = F(xk − αu) to get αk
xk+1 = xk − αk u
k = k + 1
if (not finished) go to Loop
Example
• F(x,y) = x³ + y³ − 2x² + 3y² − 8
$$\nabla F(x,y) = \begin{pmatrix} 3x^2 - 4x \\ 3y^2 + 6y \end{pmatrix}, \qquad G(x,y) = \begin{pmatrix} 6x - 4 & 0 \\ 0 & 6y + 6 \end{pmatrix}$$
∇F(x,y) = 0 gives 3x² − 4x = 0, so x = 0 or 4/3; and 3y² + 6y = 0, so y = 0 or −2.
(x,y) G Type
(0,0) Indefinite Saddle point
(0,-2) Negative definite Maximum
(4/3,0) Positive definite Minimum
(4/3,-2) Indefinite Saddle point
Solve with Steepest Descent
• Take x0 = (1, −1)ᵀ; then ∇F(x0) = (−1, −3)ᵀ.
• h(α) ≡ F(x0 − α∇F(x0)) = F(1 + α, −1 + 3α) = (1+α)³ + (3α−1)³ − 2(1+α)² + 3(3α−1)² − 8
• Minimise h(α) with respect to α:
dh/dα = 3(1+α)² + 9(3α−1)² − 4(1+α) + 18(3α−1) = 84α² + 2α − 10 = 0
• So α = 1/3 or α = −5/14. α must be nonnegative, so α = 1/3.
Solve with Steepest Descent
• x1 = x0 − α∇F(x0) = (1, −1)ᵀ − (−1/3, −1)ᵀ = (4/3, 0)ᵀ.
• This is the exact minimum.
• We were lucky that the search direction at x0 points directly towards (4/3 0)T.
• Usually we would need to do more than one iteration to get a good solution.
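• The whole iteration is easy to script. Below is a minimal steepest-descent loop for this example, using fminbnd for the univariate line search (the bracket [0, 1] for α is my own choice):

F = @(v) v(1)^3 + v(2)^3 - 2*v(1)^2 + 3*v(2)^2 - 8;
g = @(v) [3*v(1)^2 - 4*v(1); 3*v(2)^2 + 6*v(2)];   % gradient vector
x = [1; -1];                        % starting point x0
for k = 1:10
    u = g(x);
    if norm(u) < 1e-8, break; end   % gradient (almost) zero: converged
    h = @(a) F(x - a*u);            % objective along the search line
    ak = fminbnd(h, 0, 1);          % line search gives alpha_k (1/3 here)
    x = x - ak*u;
end
x                                   % returns (4/3, 0)', the exact minimum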
Newton’s Method
• Approximate F locally by a quadratic function and minimise this exactly.
• Taylor’s Theorem:
F(x) ≈ F(xk) + g(xk)ᵀ(x − xk) + (1/2)(x − xk)ᵀG(xk)(x − xk)
= F(xk) − g(xk)ᵀxk + (1/2)xkᵀG(xk)xk + (g(xk) − G(xk)xk)ᵀx + (1/2)xᵀG(xk)x
• The RHS is minimised when
g(xk) − G(xk)xk + G(xk)xk+1 = 0
• So
xk+1 = xk − [G(xk)]⁻¹g(xk)
• The search direction is −[G(xk)]⁻¹g(xk) and the step length is 1.
Newton’s Method Example
• Rosenbrock’s function:
F(x,y) = 10(y − x²)² + (1 − x)²
$$g(x,y) = \begin{pmatrix} -40x(y - x^2) - 2(1 - x) \\ 20(y - x^2) \end{pmatrix}, \qquad G(x,y) = \begin{pmatrix} 120x^2 - 40y + 2 & -40x \\ -40x & 20 \end{pmatrix}$$
• Use Newton’s Method starting at (−1.2, 1)ᵀ.
MATLAB Solution
>> F = @(x,y)10*(y-x^2)^2+(1-x)^2
>> fgrad1 = @(x,y)-40*x*(y-x^2)-2*(1-x)
>> fgrad2 = @(x,y)20*(y-x^2)
>> G11 = @(x,y)120*x^2-40*y+2
>> x = [-1.2;1]
>> x = x - inv([G11(x(1),x(2)) -40*x(1); -40*x(1) 20])*[fgrad1(x(1),x(2)) fgrad2(x(1),x(2))]'
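• The same update is clearer wrapped in a loop (my restructuring of the commands above, using backslash rather than inv); it reproduces the iterates tabulated on the next slide:

F = @(v) 10*(v(2)-v(1)^2)^2 + (1-v(1))^2;
g = @(v) [-40*v(1)*(v(2)-v(1)^2) - 2*(1-v(1)); 20*(v(2)-v(1)^2)];
G = @(v) [120*v(1)^2 - 40*v(2) + 2, -40*v(1); -40*v(1), 20];
x = [-1.2; 1];
fprintf('%8.4f %8.4f %12.4e\n', x(1), x(2), F(x));
for k = 1:7
    x = x - G(x)\g(x);             % Newton step: solve G(x)*p = g(x)
    fprintf('%8.4f %8.4f %12.4e\n', x(1), x(2), F(x));
end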
MATLAB Iterations
x y F(x,y)
-1.2 1 6.7760
-0.9755 0.9012 3.9280
0.0084 -0.9679 10.3533
0.0571 0.0009 0.8892
0.9573 0.1060 6.5695
0.9598 0.9212 0.0016
1.0000 0.9984 2.62×10⁻⁵
1.0000 1.0000 2.41×10⁻¹⁴
Notes on Newton’s Method
• Newton’s Method converges quadratically if the quadratic model is a good fit to the objective function.
• Problems arise if the quadratic model is not a good fit outside a small neighbourhood of the current point.