Multivariate Unconstrained Optimisation
• First we consider algorithms for functions for which derivatives are not available.
• Could try to extend a direct method such as Golden Section:
[Figure: Golden Section bracketing on an interval [a, b] in one dimension, and the corresponding region in the (x, y) plane in two dimensions.]
Number of function evaluations increases as eⁿ, where n is the number of dimensions.
The Polytope Algorithm
• This is a direct search method.
• Also known as the “simplex” method.
• In the n-dimensional case, at each stage we have n+1 points x1, x2, …, xn+1 such that:
F(x1) ≤ F(x2) ≤ … ≤ F(xn+1)
• The algorithm seeks to replace the worst point, xn+1, with a better one.
• The xi lie at the vertices of an n-dimensional polytope.
The Polytope Algorithm 2
• The new point is formed by reflecting the worst point through the centroid of the best n vertices:
$$c = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Mathematically the new point can be written:
xr = c + α(c − xn+1), where α > 0 is the reflection coefficient.
• In two dimensions the polytope is a triangle; in three dimensions it is a tetrahedron.
Polytope Example
• For n = 2 we have three points at each step.
[Figure: reflection in two dimensions — the worst point x3 is reflected through the centroid c of x1 and x2 to give xr = c + α(c − x3).]
Detailed Polytope Algorithm
1. Evaluate Fr ≡ F(xr). If F1 ≤ Fr ≤ Fn, then xr replaces xn+1.
2. If Fr < F1 then xr is the new best point, and we assume the direction of reflection is “good” and attempt to expand the polytope in that direction by defining the point
xe = c + β(xr − c)
where β > 1 is the expansion coefficient. If Fe < Fr then xe replaces xn+1; otherwise xr replaces xn+1.
Detailed Polytope Algorithm 2
3. If Fr > Fn then the polytope is too big and we attempt to contract it by defining:
xc = c + γ(xn+1 − c) if Fr ≥ Fn+1
xc = c + γ(xr − c) if Fr < Fn+1
where 0 < γ < 1 is the contraction coefficient. If Fc < min(Fr, Fn+1) then xc replaces xn+1; otherwise a further contraction is done. The three cases are collected in the sketch below.
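• The three cases condense into a short MATLAB function. This is a minimal sketch for illustration (the name polytope_step and the shape of its arguments are my own choices, not from the slides); MATLAB's fminsearch, used on the next slide, implements a variant of the same method.

function X = polytope_step(F, X, alpha, beta, gamma)
% One iteration of the polytope method. X is an (n+1)-by-n matrix whose
% rows are the vertices; F is a function handle taking a row vector.
n = size(X, 2);
Fv = zeros(n+1, 1);
for i = 1:n+1, Fv(i) = F(X(i,:)); end
[Fv, idx] = sort(Fv);              % order so that F(x1) <= ... <= F(xn+1)
X = X(idx, :);
c  = mean(X(1:n, :), 1);           % centroid of the best n vertices
xr = c + alpha*(c - X(n+1, :));    % reflected point
Fr = F(xr);
if Fr < Fv(1)                      % new best point: attempt an expansion
    xe = c + beta*(xr - c);
    if F(xe) < Fr, X(n+1,:) = xe; else, X(n+1,:) = xr; end
elseif Fr <= Fv(n)                 % middling point: accept the reflection
    X(n+1,:) = xr;
else                               % polytope too big: contract
    if Fr >= Fv(n+1), xc = c + gamma*(X(n+1,:) - c);
    else,             xc = c + gamma*(xr - c);
    end
    if F(xc) < min(Fr, Fv(n+1))
        X(n+1,:) = xc;             % otherwise a further contraction follows
    end
end
end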
MATLAB Example Polytope
>> banana = @(x)10*(x(2)-x(1)^2)^2+(1-x(1))^2;
>> [x,fval] = fminsearch(banana,[-1.2, 1],optimset('Display','iter'))
Polytope Example by Hand
• Minimise a given test function F(x, y) of two variables. [Formula not recoverable from the source; the function values quoted in the steps below are as given.]
Polytope Example
• Start with an equilateral triangle:
x1 = (0, 0), x2 = (0, 0.5), x3 = (√3, 1)/4
• Take α = 1, β = 1.5, and γ = 0.5.
Polytope Example: Step 1
• Polytope is
i 1 2 3
xi (0,0) (0,0.5) (0.433,0.25)
F(xi) 9.7918 7.3153 4.8601
• Worst point is x1, c = (x2+ x3)/2 = (0.2165,0.375)
• Relabel points: x3 → x1, x1 → x3.
• xr = c + α(c − x3) = (0.433, 0.75) and F(xr) = 3.6774
• F(xr) < F(x1), so xr is the new best point; try to expand.
• xe = c + β(xr − c) = (0.5413, 0.9375) and F(xe) = 3.1086
• F(xe) < F(xr), so accept the expansion.
[Figure: the polytope after Step 1.]
Polytope Example: Step 2
• Polytope is
i 1 2 3
xi (0.433,0.25) (0,0.5) (0.5413,0.9375)
F(xi) 4.8601 7.3153 3.1086
• Worst point is x2, c = (x1+ x3)/2 = (0.4871,0.5938)
• Relabel points: x3 → x1, x2 → x3, x1 → x2.
• xr = c + α(c − x3) = (0.9743, 0.6875) and F(xr) = 2.0093
• F(xr) < F(x1), so xr is the new best point; try to expand.
• xe = c + β(xr − c) = (1.2179, 0.7344) and F(xe) = 2.2837
• F(xe) > F(xr), so reject the expansion; xr replaces x3.
[Figure: the polytope after Step 2.]
Polytope Example: Step 3
• Polytope is
i 1 2 3
xi (0.5413,0.9375) (0.433,0.25) (0.9743,0.6875)
F(xi) 3.1086 4.8601 2.0093
• Worst point is x2, c = (x1+ x3)/2 = (0.7578,0.8125)
• Relabel points: x3 → x1, x2 → x3, x1 → x2.
• xr = c + α(c − x3) = (1.0826, 1.375) and F(xr) = 3.1199
• F(xr) > F(x2), so the polytope is too big; we need to contract.
• xc = c + γ(xr − c) = (0.9202, 1.0938) and F(xc) = 2.2476
• F(xc) < F(xr), so accept the contraction.
[Figure: the polytope after Step 3.]
Polytope Example: Step 4
• Polytope is
i 1 2 3
xi (0.9743,0.6875) (0.5413,0.9375) (0.9202,1.0938)
F(xi) 2.0093 3.1086 2.2476
• Worst point is x2, c = (x1+ x3)/2 = (0.9472,0.8906)
• Relabel points: x3 → x2, x2 → x3.
• xr = c + α(c − x3) = (1.3532, 0.8438) and F(xr) = 2.7671
• F(xr) > F(x2), so the polytope is too big; we need to contract.
• xc = c + γ(xr − c) = (1.1502, 0.8672) and F(xc) = 2.1391
• F(xc) < F(xr), so accept the contraction.
[Figure: the polytope after Step 4.]
Polytope Example: Step 5
• Polytope is
i 1 2 3
xi (0.9743,0.6875) (0.9202,1.0938) (1.1502,0.8672)
F(xi) 2.0093 2.2476 2.1391
• Worst point is x2, c = (x1+ x3)/2 = (1.0622,0.7773)
• Relabel points: x3 → x2, x2 → x3.
• xr = c + α(c − x3) = (1.2043, 0.4609) and F(xr) = 2.6042
• F(xr) ≥ F(x3), so the polytope is too big; we need to contract.
• xc = c + γ(x3 − c) = (0.9912, 0.9355) and F(xc) = 2.0143
• F(xc) < F(xr), so accept the contraction.
[Figure: the polytope after Step 5.]
Polytope Example: Step 6
• Polytope is
i 1 2 3
xi (0.9743,0.6875) (1.1502,0.8672) (0.9912,0.9355)
F(xi) 2.0093 2.1391 2.0143
• Worst point is x2, c = (x1+ x3)/2 = (0.9827,0.8117)
• Relabel points: x3 → x2, x2 → x3.
• xr = c + α(c − x3) = (0.8153, 0.7559) and F(xr) = 2.1314
• F(xr) > F(x2), so the polytope is too big; we need to contract.
• xc = c + γ(xr − c) = (0.8990, 0.7837) and F(xc) = 2.0012
• F(xc) < F(xr), so accept the contraction.
Polytope Example: Final Result
• So after 6 steps the best estimate of the minimum is x = (0.8990,0.7837) for which F(x)=2.0012.
Alternating Variables Method
• Start from the point x = (a1, a2, …, an).
• Take the first variable, x1, and minimise F(x1, a2, …, an) with respect to x1. This gives x1 = a1′.
• Take the second variable, x2, and minimise F(a1′, x2, a3, …, an) with respect to x2. This gives x2 = a2′.
• Continue with each variable in turn until the minimum is reached.
AVM in Two Dimensions
[Figure: zigzag path of AVM iterates from the starting point to the minimum.]
• Method of minimisation over each variable can be any univariate method.
AVM Example in 2D
• Minimise F(x,y) = x² + y² + xy − 2x − 4y
$$\frac{\partial F}{\partial x} = 2x + y - 2 = 0 \;\Rightarrow\; x = \frac{2 - y}{2}, \qquad \frac{\partial F}{\partial y} = x + 2y - 4 = 0 \;\Rightarrow\; y = \frac{4 - x}{2}$$
• Start at (0,0).
AVM Example in 2D
x y F(x,y) |error| (= |F − F*|, where F* = −4 is the minimum value)
0 0 0 4
1 0 -1 3
1 1.5 -3.25 0.75
0.25 1.5 -3.8125 0.1875
0.25 1.875 -3.953 0.047
0.0625 1.875 -3.988 0.012
0.0625 1.968 -3.9971 0.0029
0.0156 1.968 -3.9992 0.0008
0.0156 1.992 -3.9998 0.0002
0.004 1.992 -3.99995 0.00005
0.004 1.998 -3.99999 0.00001
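• The table can be reproduced in a few lines of MATLAB using the closed-form coordinate minimisers derived above (a sketch for illustration, not code from the slides):

% AVM for F(x,y) = x^2 + y^2 + x*y - 2*x - 4*y, starting at (0,0).
% Minimising over one variable with the other held fixed gives the
% closed forms x = (2 - y)/2 and y = (4 - x)/2 used below.
F = @(x,y) x.^2 + y.^2 + x.*y - 2*x - 4*y;
x = 0; y = 0;
fprintf('%8.4f %8.4f %10.5f\n', x, y, F(x,y));
for k = 1:5
    x = (2 - y)/2;                 % minimise over x, y fixed
    fprintf('%8.4f %8.4f %10.5f\n', x, y, F(x,y));
    y = (4 - x)/2;                 % minimise over y, x fixed
    fprintf('%8.4f %8.4f %10.5f\n', x, y, F(x,y));
end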
Definition of Gradient Vector
• The gradient vector is:
$$g(x) = \begin{pmatrix} \dfrac{\partial F}{\partial x_1} \\ \dfrac{\partial F}{\partial x_2} \\ \vdots \\ \dfrac{\partial F}{\partial x_n} \end{pmatrix}$$
• The gradient vector is also written as ∇F(x).
Definition of Hessian Matrix
• The Hessian matrix is defined as:
$$G(x) = \begin{pmatrix} \dfrac{\partial^2 F(x)}{\partial x_1^2} & \dfrac{\partial^2 F(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 F(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 F(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 F(x)}{\partial x_n \partial x_1} & \dfrac{\partial^2 F(x)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_n^2} \end{pmatrix}$$
• The Hessian matrix is symmetric, and is also written as ∇²F(x).
Conditions for a Minimum of a Multivariate Function
1. |g(x*)| = 0. That is, all partial derivatives are zero.
2. G(x*) is positive definite. That is, xᵀG(x*)x > 0 for all vectors x ≠ 0.
• The second condition implies that the eigenvalues of G(x*) are strictly positive.
Stationary Points
• If g(x*)=0 then x* is said to be a stationary point.
• There are 3 types of stationary point:
1. Minimum, e.g., x² + y² at (0,0)
2. Maximum, e.g., 1 − x² − y² at (0,0)
3. Saddle point, e.g., x² − y² at (0,0)
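• These classifications are easy to check numerically from the eigenvalues of the Hessian (a quick illustration; the constant Hessians below are worked out by hand from the three examples):

eig([ 2 0; 0  2])   % Hessian of x^2 + y^2:     all positive -> minimum
eig([-2 0; 0 -2])   % Hessian of 1 - x^2 - y^2: all negative -> maximum
eig([ 2 0; 0 -2])   % Hessian of x^2 - y^2:     mixed signs  -> saddle point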
Definition: Level Surface
• F(x) = constant defines a “level surface”.
• For different values of the constant we generate different level surfaces.
• For example, in 3-D suppose F(x,y,z) = x²/4 + y²/9 + z²/4.
• Then F(x,y,z) = constant is an ellipsoidal surface centred on the origin.
• Thus, the level surfaces are a series of concentric ellipsoidal surfaces.
• The gradient vector at a point x is normal to the level surface passing through x.
Definition: Tangent Hyperplane
• For a differentiable multivariate function, F, the tangent hyperplane at the point xt on the surface F(x)=constant is normal to the gradient vector.
Definition: Quadratic Function
• If the Hessian matrix of F is constant then F is said to be a quadratic function.
• In this case F can be expressed as:
F(x) = (1/2)xᵀGx + cᵀx + γ
for a constant matrix G, vector c, and scalar γ. Then
∇F(x) = Gx + c and ∇²F(x) = G.
Example Quadratic Function
• F(x,y) = x² + 2y² + xy − x + 2y
$$F(x,y) = \frac{1}{2}\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$
• The gradient vector is zero at a stationary point, so Gx + c = 0 there.
• Need to solve Gx = −c to find the stationary point:
x* = −G⁻¹c = (6/7, −5/7)ᵀ
Hessian Matrix Again
• We can predict the behaviour of a general nonlinear function near a stationary point, x*, by looking at the eigenvalues of the Hessian matrix.
• Let uj and λj denote the jth eigenvector and eigenvalue of G.
• If λj > 0 the function will increase as we move away from x* in the direction uj.
• If λj < 0 the function will decrease as we move away from x* in the direction uj.
• If λj = 0 the function will stay constant as we move away from x* in the direction uj.
Example Again
$$F(x,y) = \frac{1}{2}\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} -1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$$
• λ1 = 1.5858 and λ2 = 4.4142, so F increases as we move away from the stationary point at (6/7, −5/7)ᵀ in every direction.
• So the stationary point is a minimum.
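• Both the stationary point and the eigenvalues quoted above are easily verified in MATLAB (a quick check, not part of the original slides):

G = [2 1; 1 4];  c = [-1; 2];
xstar = G\(-c)      % (6/7, -5/7)' = (0.8571, -0.7143)'
eig(G)              % 1.5858 and 4.4142: both positive, so a minimum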
Example in 4D
$$G = \begin{pmatrix} 2 & 1 & 0 & 2 \\ 1 & 2 & -1 & 3 \\ 0 & -1 & 4 & 1 \\ 2 & 3 & 1 & -2 \end{pmatrix}, \qquad c = \begin{pmatrix} -1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$$
• In MATLAB:
>> c = [-1 2 3 4]';
>> G = [2 1 0 2; 1 2 -1 3; 0 -1 4 1; 2 3 1 -2];
>> x = G\(-c)
>> [u,lambda] = eig(G)
Descent Methods
• Seek a general algorithm for unconstrained minimisation of a smooth multivariate function.
• Require that F decreases at each iteration.
• A method that imposes this type of condition is called a descent method.
A General Descent Algorithm
• Let xk be the current iterate.
1. If converged then quit; xk is the estimate of the minimum.
2. Compute a nonzero vector pk giving the direction of search.
3. Compute a positive scalar step length, αk, for which F(xk + αk pk) < F(xk).
4. The new estimate of the minimum is xk+1 = xk + αk pk. Increment k by 1, and go to step 1.
Method of Steepest Descent
• The direction in which F decreases most steeply is −∇F, so we use this as the search direction.
• The new iterate is xk+1 = xk − αk∇F(xk), where αk is a non-negative scalar chosen so that xk+1 is the minimum point along the line from xk in the direction −∇F(xk).
• Thus, αk minimises F(xk − α∇F(xk)) with respect to α.
Steepest Descent Algorithm
• Initialise: x0, k=0
• Loop: u = ∇F(xk)
if ‖u‖ = 0 then quit
else minimise h(α) = F(xk − αu) to get αk
xk+1 = xk − αk u
k = k + 1
if (not finished) go to Loop
Example
• F(x,y) = x³ + y³ − 2x² + 3y² − 8
$$\nabla F(x,y) = \begin{pmatrix} 3x^2 - 4x \\ 3y^2 + 6y \end{pmatrix}, \qquad G(x,y) = \begin{pmatrix} 6x - 4 & 0 \\ 0 & 6y + 6 \end{pmatrix}$$
∇F(x,y) = 0 gives 3x² − 4x = 0, so x = 0 or 4/3; and 3y² + 6y = 0, so y = 0 or −2.
(x,y) G Type
(0,0) Indefinite Saddle point
(0,-2) Negative definite Maximum
(4/3,0) Positive definite Minimum
(4/3,-2) Indefinite Saddle point
Solve with Steepest Descent
• Take x0 = (1, −1)ᵀ; then ∇F(x0) = (−1, −3)ᵀ.
• h(α) ≡ F(x0 − α∇F(x0)) = F(1 + α, −1 + 3α) = (1+α)³ + (3α−1)³ − 2(1+α)² + 3(3α−1)² − 8
• Minimise h(α) with respect to α:
dh/dα = 3(1+α)² + 9(3α−1)² − 4(1+α) + 18(3α−1) = 84α² + 2α − 10 = 0
• So α = 1/3 or α = −5/14. α must be nonnegative, so α = 1/3.
Solve with Steepest Descent
• x1 = x0 − α∇F(x0) = (1, −1)ᵀ − (−1/3, −1)ᵀ = (4/3, 0)ᵀ.
• This is the exact minimum.
• We were lucky that the search direction at x0 points directly towards (4/3 0)T.
• Usually we would need to do more than one iteration to get a good solution.
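• The whole iteration is easy to script. Below is a minimal steepest-descent loop for this example, using fminbnd for the univariate line search (the bracket [0, 1] for α is my own choice):

F = @(v) v(1)^3 + v(2)^3 - 2*v(1)^2 + 3*v(2)^2 - 8;
g = @(v) [3*v(1)^2 - 4*v(1); 3*v(2)^2 + 6*v(2)];   % gradient vector
x = [1; -1];                        % starting point x0
for k = 1:10
    u = g(x);
    if norm(u) < 1e-8, break; end   % gradient (almost) zero: converged
    h = @(a) F(x - a*u);            % objective along the search line
    ak = fminbnd(h, 0, 1);          % line search gives alpha_k (1/3 here)
    x = x - ak*u;
end
x                                   % returns (4/3, 0)', the exact minimum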
Newton’s Method
• Approximate F locally by a quadratic function and minimise this exactly.
• Taylor’s Theorem:
F(x) ≈ F(xk) + g(xk)ᵀ(x − xk) + (1/2)(x − xk)ᵀG(xk)(x − xk)
= F(xk) − g(xk)ᵀxk + (1/2)xkᵀG(xk)xk + (g(xk) − G(xk)xk)ᵀx + (1/2)xᵀG(xk)x
• The RHS is minimised when
g(xk) − G(xk)xk + G(xk)xk+1 = 0
• So
xk+1 = xk − [G(xk)]⁻¹g(xk)
• The search direction is −[G(xk)]⁻¹g(xk) and the step length is 1.
Newton’s Method Example
• Rosenbrock’s function:
F(x,y) = 10(y − x²)² + (1 − x)²
$$g(x,y) = \begin{pmatrix} -40x(y - x^2) - 2(1 - x) \\ 20(y - x^2) \end{pmatrix}, \qquad G(x,y) = \begin{pmatrix} 120x^2 - 40y + 2 & -40x \\ -40x & 20 \end{pmatrix}$$
• Use Newton’s Method starting at (−1.2, 1)ᵀ.
MATLAB Solution
>> F = @(x,y)10*(y-x^2)^2+(1-x)^2
>> fgrad1 = @(x,y)-40*x*(y-x^2)-2*(1-x)
>> fgrad2 = @(x,y)20*(y-x^2)
>> G11 = @(x,y)120*x^2-40*y+2
>> x = [-1.2;1]
>> x = x - inv([G11(x(1),x(2)) -40*x(1); -40*x(1) 20])*[fgrad1(x(1),x(2)) fgrad2(x(1),x(2))]'
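• The same update is clearer wrapped in a loop (my restructuring of the commands above, using backslash rather than inv); it reproduces the iterates tabulated on the next slide:

F = @(v) 10*(v(2)-v(1)^2)^2 + (1-v(1))^2;
g = @(v) [-40*v(1)*(v(2)-v(1)^2) - 2*(1-v(1)); 20*(v(2)-v(1)^2)];
G = @(v) [120*v(1)^2 - 40*v(2) + 2, -40*v(1); -40*v(1), 20];
x = [-1.2; 1];
fprintf('%8.4f %8.4f %12.4e\n', x(1), x(2), F(x));
for k = 1:7
    x = x - G(x)\g(x);             % Newton step: solve G(x)*p = g(x)
    fprintf('%8.4f %8.4f %12.4e\n', x(1), x(2), F(x));
end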
MATLAB Iterations
x y F(x,y)
-1.2 1 6.7760
-0.9755 0.9012 3.9280
0.0084 -0.9679 10.3533
0.0571 0.0009 0.8892
0.9573 0.1060 6.5695
0.9598 0.9212 0.0016
1.0000 0.9984 2.62×10⁻⁵
1.0000 1.0000 2.41×10⁻¹⁴
Notes on Newton’s Method
• Newton’s Method converges quadratically if the quadratic model is a good fit to the objective function.
• Problems arise if the quadratic model is not a good fit outside a small neighbourhood of the current point.