-
Example 1: Optimization Problem
[Figure: contour plot of the objective over a_1, a_2 in [-5, 5]]
J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 3
Overview of Multivariate Optimization Topics
- Problem definition
- Algorithms
  - Cyclic coordinate method
  - Steepest descent
  - Conjugate gradient algorithms
  - PARTAN
  - Newton's method
  - Levenberg-Marquardt
- Concise, subjective summary
Example 1: Optimization Problem
[Figure: contour plot of the objective over a_1, a_2 in [-5, 5]]
Multivariate Optimization Overview
The unconstrained optimization problem is a generalization of the line search problem
Find a vector a* such that
    a* = argmin_a f(a)
Note that there are no constraints on a
Example: find the vector of coefficients (w ∈ R^(p×1)) that minimizes the average absolute error of a linear model
Akin to a blind person trying to find their way to the bottom of a valley in a multidimensional landscape
We want to reach the bottom with the minimum number of cane taps
Also vaguely similar to taking core samples for oil prospecting
-
Example 1: Optimization Problem
[Figure: surface plot of the objective]
Example 1: Optimization Problem
[Figure: surface plot of the objective]
Example 1: MATLAB Code
function [] = OptimizationProblem();
%==============================================================================
% User-Specified Parameters
%==============================================================================
x = -5:0.05:5;
y = -5:0.05:5;
%==============================================================================
% Evaluate the Function
%==============================================================================
[X,Y] = meshgrid(x,y);
[Z,G] = OptFn(X,Y);
functionName = 'OptimizationProblem';
fileIdentifier = fopen([functionName '.tex'],'w');
%==============================================================================
% Contour Map
%==============================================================================
figure;
FigureSet(2,'Slides');
contour(x,y,Z,50);
xlabel('a_1');
ylabel('a_2');
zoom on;
AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Contour');
Example 1: Optimization Problem
[Figure: surface plot of the objective]
-
case 1, view(45,10);
case 2, view(-55,22);
case 3, view(-131,10);
otherwise, error('Not implemented.');
end
fileName = sprintf('%s-%s%d',functionName,'Surface',c1);
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');
end
%==============================================================================
% List the MATLAB Code
%==============================================================================
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: MATLAB Code}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\t\\matlabcode{Matlab/%s.m}\n',functionName);
fclose(fileIdentifier);
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\stepcounter{exc}\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');
%==============================================================================
% Quiver Map
%==============================================================================
figure;
FigureSet(1,'Slides');
axis([-5 5 -5 5]);
contour(x,y,Z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
hold on;
xCoarse = -5:0.5:5;
yCoarse = -5:0.5:5;
[X,Y] = meshgrid(xCoarse,yCoarse);
[ZCoarse,GCoarse] = OptFn(X,Y);
nr = size(X,1);
dzx = GCoarse(1:nr,1:nr);
dzy = GCoarse(nr + (1:nr),1:nr);
quiver(xCoarse,yCoarse,dzx,dzy);
hold off;
xlabel('a_1');
ylabel('a_2');
zoom on;
Global Optimization?
In general, all optimization algorithms find a local minimum in as few steps as possible
There are also global optimization algorithms based on ideas such as
- Evolutionary computing
- Genetic algorithms
- Simulated annealing
None of these guarantee convergence in a finite number of iterations
All require a lot of computation
AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Quiver');
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');
%==============================================================================
% 3D Maps
%==============================================================================
figure;
set(gcf,'Renderer','zbuffer');
FigureSet(1,'Slides');
h = surf(x,y,Z);
set(h,'LineStyle','None');
xlabel('a_1');
ylabel('a_2');
shading interp;
grid on;
AxisSet(8);
hl = light('Position',[0,0,30]);
set(hl,'Style','Local');
set(h,'BackFaceLighting','unlit');
material dull;
for c1=1:3
switch c1
-
Cyclic Coordinate Method
1. For i = 1 to p,
   a_i := argmin_λ f([a_1, a_2, ..., a_{i-1}, λ, a_{i+1}, ..., a_p])
2. Loop to 1 until convergence
+ Simple to implement
+ Each line search can be performed semi-globally to avoid shallow local minima
+ Can be used with nominal variables
+ f(a) can be discontinuous
+ No gradient required
- Very slow compared to gradient-based optimization algorithms
- Usually only practical when the number of parameters, p, is small
There are modified versions with faster convergence
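The loop above can be sketched in a few lines. This is a minimal Python sketch (the course examples use MATLAB): the coupled quadratic objective and the sampled semi-global 1-D search below are made-up stand-ins for the OptFn and LineSearch routines used in the examples.

```python
def f(a):
    # hypothetical coupled quadratic test objective, minimum near (1.55, -2.19)
    return (a[0] - 1.0)**2 + 2.0*(a[1] + 2.0)**2 + 0.5*a[0]*a[1]

def argmin_along(f, a, i, lo=-5.0, hi=5.0, n=1000):
    """Sampled semi-global 1-D minimization over coordinate i, others held fixed."""
    best_x, best_f = a[i], f(a)
    for k in range(n + 1):
        x = lo + (hi - lo)*k/n
        trial = list(a)
        trial[i] = x
        fx = f(trial)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x

def cyclic_coordinate(f, a, sweeps=20):
    a = list(a)
    for _ in range(sweeps):          # 2. loop until convergence (fixed count here)
        for i in range(len(a)):      # 1. for i = 1 to p
            a[i] = argmin_along(f, a, i)
    return a

a = cyclic_coordinate(f, [-3.0, 1.0])
```

Note that no gradient is ever evaluated, which is the method's main appeal; the price is that each sweep needs p full 1-D searches.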
Optimization Comments
Ideally, when we construct models we should favor those which can be optimized with few shallow local minima and reasonable computation
Graphically, you can think of the function to be minimized as the elevation in a complicated high-dimensional landscape
The problem is to find the lowest point
The most common approach is to go downhill
The gradient points in the most uphill direction
The steepest downhill direction is the opposite of the gradient
Most optimization algorithms use a line search algorithm
The methods mostly differ only in the way that the direction of descent is generated
Example 2: Cyclic Coordinate Method
[Figure: contour plot with the search path, X and Y in [-5, 5]]
Optimization Algorithm Outline
The basic steps of these algorithms are as follows:
1. Pick a starting vector a
2. Find the direction of descent, d
3. Move in that direction until a minimum is found:
   λ := argmin_λ f(a + λd)
   a := a + λd
4. Loop to 2 until convergence
Most of the theory of these algorithms is based on quadratic surfaces
Near local minima, this is a good approximation
Note that the functions should (must) have continuous gradients (almost) everywhere
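The four steps above can be sketched directly. This is an illustrative Python sketch, not code from the course: the quadratic objective, its gradient, and the crude sampled line search are assumed stand-ins.

```python
def f(a):
    # hypothetical quadratic test objective, minimum at (1, -2)
    return (a[0] - 1.0)**2 + 2.0*(a[1] + 2.0)**2

def grad(a):
    return [2.0*(a[0] - 1.0), 4.0*(a[1] + 2.0)]

def line_search(f, a, d, lam_max=2.0, n=400):
    """Step 3: lambda := argmin_lambda f(a + lambda*d), by coarse sampling."""
    best_lam, best_f = 0.0, f(a)
    for k in range(1, n + 1):
        lam = lam_max*k/n
        fl = f([ai + lam*di for ai, di in zip(a, d)])
        if fl < best_f:
            best_lam, best_f = lam, fl
    return best_lam

a = [-3.0, 1.0]                          # 1. pick a starting vector a
for _ in range(50):                      # 4. loop until convergence (fixed count)
    d = [-gi for gi in grad(a)]          # 2. direction of descent (steepest here)
    lam = line_search(f, a, d)
    a = [ai + lam*di for ai, di in zip(a, d)]   # 3. a := a + lambda*d
```

The methods in the rest of these slides keep this skeleton and change only how d is generated in step 2.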
-
Example 2: Cyclic Coordinate Method
[Figure: Euclidean position error vs. iteration]
Example 2: Cyclic Coordinate Method
[Figure: zoomed contour plot of the search path, X and Y axes]
Example 2: Relevant MATLAB Code
function [] = CyclicCoordinate();
%clear all;
close all;
ns = 26;
x = -3;
y = 1;
b0 = -1;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,dzx,dzy] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
for cnt = 2:ns,
    if rem(cnt,2)==1,
        d = [1 0]; % Along x direction
    else
        d = [0 1]; % Along y direction
    end;
    [b,fmin] = LineSearch([x y],d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
Example 2: Cyclic Coordinate Method
[Figure: function value vs. iteration]
-
print -depsc CyclicCoordinateContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)).^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc CyclicCoordinatePositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
    a(cnt,:) = [x y];
    f(cnt) = fmin;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
print -depsc CyclicCoordinateErrorLinear;
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc CyclicCoordinateContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-1.5 + (-2:0.05:2),-1.5 + (-2:0.05:2));
[z,dzx,dzy] = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
-
Example 3: Steepest Descent
[Figure: contour plot with the search path, X and Y in [-5, 5]]
Steepest Descent
The gradient of the function f(a) is defined as the vector of partial derivatives:
    ∇_a f(a) ≜ [ ∂f(a)/∂a_1   ∂f(a)/∂a_2   ...   ∂f(a)/∂a_p ]^T
It can be shown that the gradient, ∇_a f(a), points in the direction of maximum ascent
The negative of the gradient, -∇_a f(a), points in the direction of maximum descent
A vector d is a direction of descent if there exists a δ such that f(a + λd) < f(a) for all 0 < λ < δ
It can also be shown that d is a direction of descent if (∇_a f(a))^T d < 0
The algorithm of steepest descent uses d = -∇_a f(a)
The most fundamental of all algorithms for minimizing a continuously differentiable function
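The descent-direction condition is easy to check numerically. A minimal sketch with an assumed quadratic objective (not from the slides): the negative gradient always satisfies the condition, while a direction orthogonal to the gradient does not.

```python
def f(a):
    # assumed test objective, minimum at (1, -2)
    return (a[0] - 1.0)**2 + 2.0*(a[1] + 2.0)**2

def grad(a):
    return [2.0*(a[0] - 1.0), 4.0*(a[1] + 2.0)]

def is_descent_direction(a, d):
    # d is a descent direction if (grad f(a))^T d < 0
    g = grad(a)
    return sum(gi*di for gi, di in zip(g, d)) < 0.0

a = [-3.0, 1.0]
neg_grad = [-gi for gi in grad(a)]       # steepest descent direction
sideways = [grad(a)[1], -grad(a)[0]]     # orthogonal to the gradient
```

A small step along any direction with (∇f)^T d < 0 lowers f; for the orthogonal direction the inner product is exactly zero, so it is not a descent direction.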
Example 3: Steepest Descent
[Figure: zoomed contour plot of the search path, X and Y axes]
Steepest Descent
+ Very stable algorithm
- Can converge very slowly once near the local minima where the surface is approximately quadratic
-
Example 3: Relevant MATLAB Code
function [] = SteepestDescent();
%clear all;
close all;
ns = 26;
x = -3;
y = 1;
b0 = 0.01;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
d = -g/norm(g);
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y],d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    [z,g] = OptFn(x,y);
    d = -g;
    d = d/norm(d);
Example 3: Steepest Descent
[Figure: function value vs. iteration]
    a(cnt,:) = [x y];
    f(cnt) = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
[zopt zopt2]
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
Example 3: Steepest Descent Method
[Figure: Euclidean position error vs. iteration]
-
AxisSet(8);
print -depsc SteepestDescentErrorLinear;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc SteepestDescentContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-1.6 + (-0.5:0.01:0.5),-1.7 + (-0.5:0.01:0.5));
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
Conjugate Gradient Algorithms
1. Take a steepest descent step
2. For i = 2 to p
   λ := argmin_λ f(a + λd)
   a := a + λd
   g_i := ∇f(a)
   β := (g_i^T g_i)/(g_{i-1}^T g_{i-1})
   d := -g_i + βd
3. Loop to 1 until convergence
Based on quadratic approximations of f
Called the Fletcher-Reeves method
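A minimal Python sketch of the iteration above on an assumed quadratic objective (the function, gradient, and finely sampled "exact" line search are illustrative stand-ins for the course's OptFn and LineSearch):

```python
def f(a):
    # assumed quadratic objective, minimum at (1, -2)
    return (a[0] - 1.0)**2 + 2.0*(a[1] + 2.0)**2

def grad(a):
    return [2.0*(a[0] - 1.0), 4.0*(a[1] + 2.0)]

def dot(u, v):
    return sum(x*y for x, y in zip(u, v))

def line_search(f, a, d, lam_max=2.0, n=4000):
    # fine sampling stands in for a (near-)perfect 1-D minimization
    best_lam, best_f = 0.0, f(a)
    for k in range(1, n + 1):
        lam = lam_max*k/n
        fl = f([ai + lam*di for ai, di in zip(a, d)])
        if fl < best_f:
            best_lam, best_f = lam, fl
    return best_lam

a, p = [-3.0, 1.0], 2
for _ in range(3):                       # 3. loop until convergence
    g = grad(a)
    d = [-x for x in g]                  # 1. steepest descent step
    lam = line_search(f, a, d)
    a = [ai + lam*di for ai, di in zip(a, d)]
    for i in range(2, p + 1):            # 2. for i = 2 to p
        g_old, g = g, grad(a)
        beta = dot(g, g)/dot(g_old, g_old)      # Fletcher-Reeves beta
        d = [-g[j] + beta*d[j] for j in range(p)]
        lam = line_search(f, a, d)
        a = [ai + lam*di for ai, di in zip(a, d)]
```

On a p-dimensional quadratic with perfect line searches, one pass of steps 1-2 already reaches the minimum; the sampled search here only gets close, so a few outer loops polish the result.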
AxisSet(8);
print -depsc SteepestDescentContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)).^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc SteepestDescentPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
-
Example 4: Fletcher-Reeves Conjugate Gradient
[Figure: function value vs. iteration]
Example 4: Fletcher-Reeves Conjugate Gradient
[Figure: contour plot with the search path, X and Y in [-5, 5]]
Example 4: Fletcher-Reeves Conjugate Gradient
[Figure: Euclidean position error vs. iteration]
Example 4: Fletcher-Reeves Conjugate Gradient
[Figure: zoomed contour plot of the search path, X and Y axes]
-
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc FletcherReevesContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
Example 4: Relevant MATLAB Code
function [] = FletcherReeves();
%clear all;
close all;
ns = 26;
x = -3;
y = 1;
b0 = 0.01;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
d = -g/norm(g); % First direction
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y],d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    go = g; % Old gradient
    [z,g] = OptFn(x,y);
    beta = (g'*g)/(go'*go);
AxisSet(8);
print -depsc FletcherReevesContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)).^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc FletcherReevesPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
    d = -g + beta*d;
    a(cnt,:) = [x y];
    f(cnt) = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
-
Example 5: Polak-Ribiere Conjugate Gradient
[Figure: contour plot with the search path, X and Y in [-5, 5]]
AxisSet(8);
print -depsc FletcherReevesErrorLinear;
Example 5: Polak-Ribiere Conjugate Gradient
[Figure: zoomed contour plot of the search path, X and Y axes]
Conjugate Gradient Algorithms Continued
There is also a variant called Polak-Ribiere where
   β := ((g_i - g_{i-1})^T g_i)/(g_{i-1}^T g_{i-1})
+ Only requires the gradient
+ Converges in a finite number of steps when f(a) is quadratic and perfect line searches are used
- Less stable numerically than steepest descent
- Sensitive to inexact line searches
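The only change from Fletcher-Reeves is the β formula. A small Python sketch contrasting the two (the gradient values are made up): when the gradient stops changing between iterations, the Polak-Ribiere β collapses to zero, which restarts the iteration along the steepest descent direction, while the Fletcher-Reeves β does not.

```python
def dot(u, v):
    return sum(x*y for x, y in zip(u, v))

def beta_fletcher_reeves(g, g_old):
    # beta = g_i^T g_i / g_{i-1}^T g_{i-1}
    return dot(g, g)/dot(g_old, g_old)

def beta_polak_ribiere(g, g_old):
    # beta = (g_i - g_{i-1})^T g_i / g_{i-1}^T g_{i-1}
    diff = [gi - go for gi, go in zip(g, g_old)]
    return dot(diff, g)/dot(g_old, g_old)

g_old = [3.0, -4.0]
g_stalled = [3.0, -4.0]     # no progress: gradient unchanged from last iteration
```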
-
Example 5: MATLAB Code
function [] = PolakRibiere();
%clear all;
close all;
ns = 26;
x = -3;
y = 1;
b0 = 0.01;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
d = -g/norm(g); % First direction
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y],d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    go = g; % Old gradient
    [z,g] = OptFn(x,y);
    beta = ((g-go)'*g)/(go'*go);
Example 5: Polak-Ribiere Conjugate Gradient
[Figure: function value vs. iteration]
    d = -g + beta*d;
    a(cnt,:) = [x y];
    f(cnt) = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
Example 5: Polak-Ribiere Conjugate Gradient
[Figure: Euclidean position error vs. iteration]
-
AxisSet(8);
print -depsc PolakRibiereErrorLinear;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc PolakRibiereContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
Parallel Tangents (PARTAN)
1. First gradient step
   d := -∇f(a);  λ := argmin_λ f(a + λd);  s_p := λd;  a := a + s_p
2. Gradient step
   d_g := -∇f(a);  λ := argmin_λ f(a + λd_g);  s_g := λd_g;  a := a + s_g
3. Conjugate step
   d_p := s_p + s_g;  λ := argmin_λ f(a + λd_p);  s_p := λd_p;  a := a + s_p
4. Loop to 2 until convergence
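A minimal Python sketch of the steps above on an assumed quadratic objective (the objective, gradient, and sampled line search are illustrative stand-ins for the course's OptFn and LineSearch):

```python
def f(a):
    # assumed quadratic objective, minimum at (1, -2)
    return (a[0] - 1.0)**2 + 2.0*(a[1] + 2.0)**2

def grad(a):
    return [2.0*(a[0] - 1.0), 4.0*(a[1] + 2.0)]

def line_search(f, a, d, lam_max=2.0, n=2000):
    best_lam, best_f = 0.0, f(a)
    for k in range(1, n + 1):
        lam = lam_max*k/n
        fl = f([ai + lam*di for ai, di in zip(a, d)])
        if fl < best_f:
            best_lam, best_f = lam, fl
    return best_lam

def search(f, a, d):
    """Line search along d; returns the new point and the step s = lambda*d."""
    lam = line_search(f, a, d)
    s = [lam*di for di in d]
    return [ai + si for ai, si in zip(a, s)], s

a = [-3.0, 1.0]
a, sp = search(f, a, [-g for g in grad(a)])                # 1. first gradient step
for _ in range(5):                                         # 4. loop until convergence
    a, sg = search(f, a, [-g for g in grad(a)])            # 2. gradient step
    a, sp = search(f, a, [x + y for x, y in zip(sp, sg)])  # 3. conjugate step
```

Note that only gradients and line searches are needed; the conjugate direction s_p + s_g comes free from the two previous steps.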
AxisSet(8);
print -depsc PolakRibiereContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)).^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PolakRibierePositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
-
Example 6: PARTAN
[Figure: zoomed contour plot of the search path, X and Y axes]
PARTAN Concept
[Figure: PARTAN search path through points a0, a1, ..., a7]
First two steps are steepest descent
Thereafter, each iteration consists of two steps
1. Search along the direction d_i = a_i - a_{i-2}, where a_i is the current point and a_{i-2} is the point from two steps ago
2. Search in the direction of the negative gradient, d_i = -∇f(a_i)
Example 6: PARTAN
[Figure: function value vs. iteration]
Example 6: PARTAN
[Figure: contour plot with the search path, X and Y in [-5, 5]]
-
cnt = 2;
while cnt < ns,
    if bp > 0, % Line search in conjugate direction was successful
        fprintf('P :');
        x = xg + bp*d(1);
        y = yg + bp*d(2);
Example 6: PARTAN
[Figure: Euclidean position error vs. iteration]
    else % Could not move - do another gradient update
        cnt = cnt + 1;
        a(cnt,:) = a(cnt-1,:);
        f(cnt) = f(cnt-1);
        if cnt==ns,
            break;
        end;
        fprintf('G2:');
        [z,g] = OptFn(xg,yg);
        d = -g/norm(g); % Direction
        [bp,fmin] = LineSearch([xg yg],d,b0,ls);
        x = xg + bp*d(1);
        y = yg + bp*d(2);
    end;
    % Update anchor point
    xa = xg;
    ya = yg;
    cnt = cnt + 1;
    a(cnt,:) = [x y];
    f(cnt) = OptFn(x,y);
    fprintf(' %d %5.3f\n',cnt,f(cnt));
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
Example 6: MATLAB Code
function [] = Partan();
%clear all;
close all;
ns = 26;
x = -3;
y = 1;
b0 = 0.01;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
xa = x;
ya = y;
% First step - substitute for a conjugate step
d = -g/norm(g); % First direction
[bp,fmin] = LineSearch([x y],d,b0,100);
x = x + bp*d(1); % Stand-in for a conjugate step
y = y + bp*d(2);
a(2,:) = [x y];
f(2) = fmin;
-
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PartanPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PartanErrorLinear;
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
PARTAN Pros and Cons
[Figure: PARTAN search path through points a0, a1, ..., a7]
+ For quadratic functions, converges in a finite number of steps
+ Easier to implement than 2nd order methods
+ Can be used with a large number of parameters
+ Each (composite) step is at least as good as steepest descent
+ Tolerant of inexact line searches
- Each (composite) step requires two line searches
AxisSet(8);
print -depsc PartanContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc PartanContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)).^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
-
Example 7: Newton's with Steepest Descent Safeguard
[Figure: zoomed contour plot of the search path, X and Y axes]
Newton's Method
   a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)
where ∇f(a_k) is the gradient and H(a_k) is the Hessian of f(a),
   H(a_k) ≜
   [ ∂²f(a)/∂a_1²       ∂²f(a)/∂a_1∂a_2   ...   ∂²f(a)/∂a_1∂a_p ]
   [ ∂²f(a)/∂a_2∂a_1    ∂²f(a)/∂a_2²      ...   ∂²f(a)/∂a_2∂a_p ]
   [       ...                 ...         ...         ...       ]
   [ ∂²f(a)/∂a_p∂a_1    ∂²f(a)/∂a_p∂a_2   ...   ∂²f(a)/∂a_p²    ]
Based on a quadratic approximation of the function f(a)
If f(a) is quadratic, converges in one step
If H(a) is positive-definite, the problem is well defined near local minima where f(a) is nearly quadratic
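Since the point of the update above is that a quadratic is solved in one step, here is a minimal Python sketch with an assumed coupled quadratic (the Hessian and gradient are written by hand, and a 2x2 solve stands in for the general matrix inverse):

```python
def grad(a):
    # gradient of the assumed f(a) = (a1-1)^2 + 2(a2+2)^2 + 0.5*a1*a2
    return [2.0*(a[0] - 1.0) + 0.5*a[1], 4.0*(a[1] + 2.0) + 0.5*a[0]]

H = [[2.0, 0.5],
     [0.5, 4.0]]        # constant Hessian of the quadratic

def solve2(H, b):
    """Solve the 2x2 system H x = b by Cramer's rule (stands in for inv(H))."""
    det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
    return [(b[0]*H[1][1] - b[1]*H[0][1])/det,
            (H[0][0]*b[1] - H[1][0]*b[0])/det]

a = [-3.0, 1.0]
step = solve2(H, grad(a))                  # H(a_k)^{-1} grad f(a_k)
a = [a[i] - step[i] for i in range(2)]     # a_{k+1} = a_k - H^{-1} grad f(a_k)
```

Because this f is exactly quadratic, the gradient at the new point is zero to machine precision: one step suffices.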
Example 7: Newton's with Steepest Descent Safeguard
[Figure: function value vs. iteration]
Example 7: Newton's with Steepest Descent Safeguard
[Figure: contour plot with the search path, X and Y in [-5, 5]]
-
    y = y + b*d(2);
    [z,g,H] = OptFn(x,y);
    a(cnt,:) = [x y];
    f(cnt) = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
Example 7: Newton's with Steepest Descent Safeguard
[Figure: Euclidean position error vs. iteration]
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc NewtonsContourA;

figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.0+(-1:0.02:1),-2.4+(-1:0.02:1));
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
Example 7: Relevant MATLAB Code
function [] = Newtons();
%clear all;
close all;

ns = 100;
x  = -3; % Starting x
y  =  1; % Starting y
b0 =  1;

a = zeros(ns,2);
f = zeros(ns,1);
[z,g,H] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
for cnt = 2:ns,
   d = -inv(H)*g;
   if d'*g>0, % Revert to steepest descent if d is not a descent direction
      %fprintf('(%2d of %2d) Min. Eig:%5.3f Reverting...\n',cnt,ns,min(eig(H)));
      d = -g;
   end;
   d = d/norm(d);
   [b,fmin] = LineSearch([x y],d,b0,100);
   %a(cnt,:) = (a(cnt-1,:)' - inv(H)*g)'; % Pure Newton's method
   x = x + b*d(1);
Newton's Method Pros and Cons

a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)

+ Very fast convergence near local minima
- Not guaranteed to converge (may actually diverge)
- Requires the p × p Hessian
- Requires a p × p matrix inverse that takes O(p^3) operations
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc NewtonsContourB;

figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)).^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc NewtonsPositionError;

figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
Levenberg-Marquardt

1. Determine if μ_k I + H(a_k) is positive definite. If not, μ_k := 4μ_k and repeat.

2. Solve the following equation for a_{k+1}:

   [μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)

3. Compute the ratio of the actual to the predicted reduction:

   r_k = (f(a_k) - f(a_{k+1})) / (q(a_k) - q(a_{k+1}))

   where q(a) is the quadratic approximation of f(a) based on f(a_k), ∇f(a_k), and H(a_k)

4. If r_k < 0.25, then μ_{k+1} := 4μ_k
   If r_k > 0.75, then μ_{k+1} := ½μ_k
   If r_k ≤ 0, then a_{k+1} := a_k

5. If not converged, k := k + 1 and loop to 1.
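The five steps above can be sketched in plain Python (stdlib only). This is a hedged illustration, not the lecture's MATLAB: the quadratic test function, starting point, and initial damping value are assumptions, and the 2×2 solves are done elementwise because the example Hessian is diagonal.

```python
# Levenberg-Marquardt sketch for the assumed test function
# f(a) = (a1 - 1)^2 + 10*(a2 + 2)^2, whose minimizer is [1, -2].

def f(a):
    return (a[0] - 1.0)**2 + 10.0*(a[1] + 2.0)**2

def grad(a):
    return [2.0*(a[0] - 1.0), 20.0*(a[1] + 2.0)]

H = [[2.0, 0.0], [0.0, 20.0]]  # constant (diagonal) Hessian of this f

def levenberg_marquardt(a, mu=1.0, iters=20):
    for _ in range(iters):
        g = grad(a)
        if g[0] == 0.0 and g[1] == 0.0:
            break                      # stationary point: converged
        # Step 1: grow mu until mu*I + H is positive definite
        # (H is diagonal, so its eigenvalues are the diagonal entries)
        while min(mu + H[0][0], mu + H[1][1]) <= 0.0:
            mu *= 4.0
        # Step 2: solve [mu*I + H] d = -grad f(a); diagonal, so elementwise
        d = [-g[0]/(mu + H[0][0]), -g[1]/(mu + H[1][1])]
        a_new = [a[0] + d[0], a[1] + d[1]]
        # Step 3: actual vs. predicted reduction, with the quadratic model
        # q(a + d) = f(a) + g'd + 0.5 d'Hd
        predicted = -(g[0]*d[0] + g[1]*d[1]
                      + 0.5*(H[0][0]*d[0]**2 + H[1][1]*d[1]**2))
        r = (f(a) - f(a_new)) / predicted
        # Step 4: adapt mu; keep the old point if the step gave no decrease
        if r < 0.25:
            mu *= 4.0
        elif r > 0.75:
            mu /= 2.0
        if r > 0.0:
            a = a_new
    return a

a = levenberg_marquardt([-3.0, 1.0])
print(a)  # converges toward the minimizer near [1, -2]
```

Because this f is quadratic, the model agreement ratio r stays near 1, so mu shrinks and the iterates approach the pure Newton step.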
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc NewtonsErrorLinear;
Example 8: Levenberg-Marquardt Conjugate Gradient
[Figure: zoomed contour plot of the trajectory near the minimum, X from 1.5 to 2.5, Y from -3.5 to -2.5]
Levenberg-Marquardt Comments

Similar to Newton's method
Has safety provisions for regions where the quadratic approximation is inappropriate
Compare:

   Newton's: a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)
   LM:       [μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)

If μ_k = 0, these are equivalent
As μ_k → ∞, the step a_{k+1} - a_k approaches the steepest descent direction
μ_k is chosen to ensure that the smallest eigenvalue of μ_k I + H(a_k) is positive and sufficiently large
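The two limiting cases can be checked numerically. In this sketch the 2×2 Hessian and gradient are arbitrary illustrative values: with the damping term at zero the LM step equals the Newton step -H^{-1} ∇f, and for very large damping it approaches the scaled steepest descent step -∇f / mu.

```python
def lm_step(mu, H, g):
    """Solve [mu*I + H] d = -g for a 2x2 H via the closed-form inverse."""
    a, b = H[0][0] + mu, H[0][1]
    c, e = H[1][0], H[1][1] + mu
    det = a*e - b*c
    return [(-e*g[0] + b*g[1]) / det, (c*g[0] - a*g[1]) / det]

H = [[2.0, -1.0], [-1.0, 3.0]]   # assumed positive-definite Hessian
g = [4.0, -6.0]                  # assumed gradient

newton = lm_step(0.0, H, g)      # pure Newton step
damped = lm_step(1e6, H, g)      # approximately [-g1/mu, -g2/mu]
print(newton)  # [-1.2, 1.6]
print(damped)
```

For intermediate damping the step interpolates between the two directions, which is why the method degrades gracefully where the quadratic model is poor.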
Example 8: Levenberg-Marquardt Conjugate Gradient
[Figure: function value vs. iteration, over 25 iterations]
Example 8: Levenberg-Marquardt Conjugate Gradient
[Figure: contour plot of the search trajectory, X and Y from -5 to 5]
   y = a(cnt,2);
   zo = zn; % Old function value
   zn = OptFn(x,y);
   xd = (a(cnt,:)-ap);
   qo = zo;
   qn = zn + g'*xd' + 0.5*xd*H*xd';
   if qo==qn, % Test for convergence
      x = a(cnt,1);
      y = a(cnt,2);
      a(cnt:ns,:) = ones(ns-cnt+1,1)*[x y];
      f(cnt:ns,:) = OptFn(x,y);
      break;
   end;
   r = (zo-zn)/(qo-qn);
   if r<0.25, % Poor model agreement: increase the damping
      eta = eta*4;
   end;
   if r>0.50, % 0.75 is recommended, but much slower
      eta = eta/2;
   end;
   if zn>zo, % Back up
      a(cnt,:) = a(cnt-1,:);
   else
      ap = a(cnt,:);
   end;
   x = a(cnt,1);
Example 8: Levenberg-Marquardt Conjugate Gradient
[Figure: Euclidean position error vs. iteration, over 25 iterations]
   y = a(cnt,2);
   a(cnt,:) = [x y];
   f(cnt) = OptFn(x,y);
   %disp([cnt a(cnt,:) f(cnt) r eta])
end;

[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);

[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);

figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
Example 8: Relevant MATLAB Code
function [] = LevenbergMarquardt();
%clear all;
close all;

ns  = 26;
x   = -3; % Starting x
y   =  1; % Starting y
eta = 0.0001;

a = zeros(ns,2);
f = zeros(ns,1);
[zn,g,H] = OptFn(x,y);
a(1,:) = [x y];
f(1) = zn;
ap = [x y]; % Previous point
for cnt = 2:ns,
   [zn,g,H] = OptFn(x,y);
   while min(eig(eta*eye(2)+H))<=0, % Ensure eta*I + H is positive definite
      eta = eta*4;
   end;
   a(cnt,:) = (a(cnt-1,:)' - inv(eta*eye(2)+H)*g)'; % Reconstructed: solve [eta*I+H]*d = -g
   x = a(cnt,1);
set(gca,'Box','Off');
AxisSet(8);
print -depsc LevenbergMarquardtErrorLinear;
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc LevenbergMarquardtContourA;

figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
Levenberg-Marquardt Pros and Cons

[μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)

  Many equivalent formulations
+ No line search required
+ Can be used with approximations to the Hessian
+ Extremely fast convergence (2nd order)
- Requires the gradient and Hessian (or approximate Hessian)
- Requires O(p^3) operations for each solution of the key equation
zoom on;
AxisSet(8);
print -depsc LevenbergMarquardtContourB;

figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)).^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc LevenbergMarquardtPositionError;

figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
Optimization Algorithm Summary

Algorithm             Convergence  Stable  ∇f(a)  H(a)  LS
Cyclic Coordinate     Slow         Y       N      N     Y
Steepest Descent      Slow         Y       Y      N     Y
Conjugate Gradient    Fast         N       Y      N     Y
PARTAN                Fast         Y       Y      N     Y
Newton's Method       Very Fast    N       Y      Y     N
Levenberg-Marquardt   Very Fast    Y       Y      Y     N

LS = requires a line search