
Optimization Techniques

M. Fatih Amasyalı

What is optimization?

Optimization examples

Optimization Techniques

• Derivative techniques
  – Gradient descent
  – Newton-Raphson
  – LM
  – BFGS

• Non-derivative techniques
  – Hill climbing
  – Genetic algorithms
  – Simulated annealing

Linear, nonlinear, convex functions

• f is a function ℝⁿ → ℝ.

• For every x1, x2 ∈ ℝⁿ, f is linear if f(x1 + x2) = f(x1) + f(x2) and f(r*x1) = r*f(x1).

• f(x) = 2*x is linear and affine. f(x) = 2*x + 3 is affine but not linear.

• f(x) = 3 is constant, but not linear.

• f is convex if λ*f(x1) + β*f(x2) ≥ f(λ*x1 + β*x2), where λ + β = 1, λ ≥ 0, β ≥ 0.

Linear, nonlinear, convex functions

• f(x1) = 3*x1 + 3
• f(x1, x2) = a*x1 + b*x2 + c
• f(x1, x2) = x2^2 + x1^2 + 3*x1
• f(x) = sin(x) is convex when x ∈ [?, ?]
• Concave?
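
The convexity inequality above can be spot-checked numerically for any of these examples. The following sketch is not from the original slides; the function f, the interval [a, b], and the tolerance are illustrative assumptions:

% Numerical convexity spot-check (illustrative sketch; f, [a,b], and the
% tolerance are assumptions, not from the slides).
f = @(x) x.^2;              % function under test; try others from the slide
a = -5; b = 5;              % interval to sample
violated = false;
for trial = 1:10000
    x1 = a + (b-a)*rand;
    x2 = a + (b-a)*rand;
    lambda = rand;          % beta = 1 - lambda, so lambda + beta = 1
    lhs = lambda*f(x1) + (1-lambda)*f(x2);
    rhs = f(lambda*x1 + (1-lambda)*x2);
    if lhs < rhs - 1e-12    % small tolerance for round-off
        violated = true;
        break;
    end
end
if violated
    fprintf('convexity violated on [%g, %g]\n', a, b);
else
    fprintf('no violations found on [%g, %g]\n', a, b);
end

A single violation proves f is not convex on the interval; finding none only suggests convexity.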

Commonsense

• Use the derivative.
• Go in the negative direction of the derivative.
• But what step size?

General Algorithm

• initial guess
• While (improvement is significant) and (maximum number of iterations is not exceeded):
  – improve the result
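
As a sketch, the template above translates into MATLAB roughly as follows (the names f, improve, and tol are illustrative placeholders, not from the slides):

% Generic iterative-improvement template (illustrative sketch).
f = @(x) x.^2;                 % objective to minimize
improve = @(x) x - 0.1*(2*x);  % one improvement step (here: a gradient step)
x = -4.9;                      % initial guess
tol = 1e-5;                    % "improvement is significant" threshold
max_iter = 200;                % iteration budget
for k = 1:max_iter
    x_next = improve(x);
    if abs(x_next - x) <= tol  % improvement no longer significant
        break;
    end
    x = x_next;
end
fprintf('stopped after %d iterations at x = %g\n', k, x);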

Gradient Descent

• Using only the sign of the derivative
• Using also the magnitude of the derivative
• Effect of the step size
• Effect of the starting point
• The local-minimum problem

% The gradient descent algorithm with constant step size
% f(x)=x^2, with derivative f'(x)=2x
clear all; close all;

% plot objective function
x = -5:0.01:5;
y = zeros(1, length(x));
for i = 1:length(x)
    y(i) = x(i)^2;
end
plot(x, y);

x_old = 0;
x_new = -4.9;          % the algorithm starts at x = -4.9
eps = 0.05;            % step size
precision = 0.00001;   % stopping condition 1
max_iter = 200;        % stopping condition 2
Xs = zeros(1, max_iter);
Ys = zeros(1, max_iter);

i = 1;
while abs(x_new - x_old) > precision && max_iter >= i
    Xs(i) = x_new;
    Ys(i) = x_new^2;
    hold on;
    plot(Xs(i), Ys(i), 'r*');
    text(Xs(i), Ys(i), int2str(i));
    x_old = x_new;
    df = 2 * x_old;
    x_new = x_old - eps * sign(df);   % gradient sign
    i = i + 1;
end
figure;
plot(Xs(1:i-1));
xlabel('iteration');
ylabel('x values');

gradient_desc_1.m

Result

• f(x) = x^2
• f'(x) = 2*x
• Max_iter = 200
• Constant step size = 0.05
• Starting point = -4.9

[Figure: f(x) = x^2 with the iterates marked 1 through 200 along the curve.]

[Figure: x values vs. iteration; x climbs from -4.9 to about 0.]

• It reaches the maximum iteration number.

• It reached 0 at about iteration 100, then zig-zags: once |x| < 0.05, the constant-size step overshoots 0 on every update.

• It selects 0 or 0.05 as the point where f is minimum.

[Figure: x values vs. iteration, with a zoomed view of iterations 140-152 showing the terminal oscillation between about 0 and 0.05.]

Starting point=4.9

[Figure: same run from starting point 4.9; x descends from 4.9 to about 0 over 200 iterations.]

Step size=0.1

[Figure: step size 0.1 on f(x) = x^2; iterates on the curve and x values vs. iteration.]

• It reaches the maximum iteration number.

• It reached 0 at about iteration 50, then zig-zags.

• It selects 0 or 0.1 as the point where f is minimum.

Step size=0.01

• It reaches the maximum iteration number.

• It does not reach 0: 200 steps of size 0.01 cover only 2 units, so from -4.9 the run ends near -2.9.

• It selects -2.91 as the minimum point.

• Given more iterations, it would eventually zig-zag.

[Figure: step size 0.01; x values climb only from -4.9 to about -2.9 within 200 iterations.]

Gradient Descent

• Using also the magnitude of the derivative:
  x_new = x_old - eps * df;
  instead of
  x_new = x_old - eps * sign(df);
• Can we reduce the required number of iterations?

gradient_desc_2.m
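
gradient_desc_2.m itself is not reproduced in this transcript. Assuming it matches gradient_desc_1.m except for the update line, its core loop would look like this sketch:

% Core loop of gradient_desc_2.m (a sketch; assumes the same setup as
% gradient_desc_1.m with only the update line changed).
x_old = 0; x_new = -4.9;      % starting point
eps = 0.05;                   % step size
precision = 0.00001;          % stopping condition 1
max_iter = 200;               % stopping condition 2
i = 1;
while abs(x_new - x_old) > precision && max_iter >= i
    x_old = x_new;
    df = 2 * x_old;               % f'(x) = 2x
    x_new = x_old - eps * df;     % magnitude-based step (was: sign(df))
    i = i + 1;
end
fprintf('stopped after %d iterations at x = %g\n', i-1, x_new);

Because the step now shrinks with |f'(x)|, the iterates slow down smoothly near the minimum instead of oscillating with a fixed stride.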

[Figure: magnitude-based steps on f(x) = x^2; x converges smoothly from -4.9 to 0 in about 120 iterations.]

• Step size = 0.05
• Max_iter = 200
• The number of iterations is 120.
• It does not zig-zag.
• It selects 0 ± 10^-5 as the point where f is minimum.

gradient_desc_2.m

• Step size = 0.1
• Max_iter = 200
• The number of iterations is 60.
• It does not zig-zag.
• It selects 0 ± 10^-5 as the point where f is minimum.

[Figure: step size 0.1; x converges to 0 in about 60 iterations.]

gradient_desc_2.m

• Step size = 0.01
• Max_iter = 200
• It reaches the maximum iteration number.
• It does not zig-zag.
• It selects 0 ± 10^-2 as the point where f is minimum.
• It requires more iterations.

[Figure: step size 0.01; after 200 iterations x is still approaching 0.]

gradient_desc_2.m

• Step size = 0.7
• Max_iter = 200
• The number of iterations is 16.
• It does not zig-zag.
• It selects 0 ± 10^-5 as the point where f is minimum.

[Figure: step size 0.7; x converges to 0 in 16 iterations.]

gradient_desc_2.m

• Step size = 0.99
• Max_iter = 200
• It reached the maximum iteration number.
• It zig-zags, but the zig-zag magnitude decreases.
• It selects 0 ± 10^-1 as the point where f is minimum.
• It requires more iterations.

[Figure: step size 0.99; x alternates in sign with slowly decreasing magnitude over 200 iterations.]

gradient_desc_2.m

• Step size = 1
• Max_iter = 200
• It reached the maximum iteration number.
• It does not converge.

[Figure: step size 1; x values alternate in sign, with the plotted scale reaching the order of 10^16 by iteration 200.]

Starting point matters?

• f(x) = x^4 - 3x^3 + 2, with derivative f'(x) = 4x^3 - 9x^2
• It has two critical points: x = 0 (a plateau, not an extremum) and x = 9/4 (the minimum).
• gradient_desc_3.m
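
gradient_desc_3.m is likewise not reproduced in the transcript; a minimal sketch, assuming it reuses the gradient_desc_2.m loop with the new derivative:

% Sketch of gradient_desc_3.m (assumption: the gradient_desc_2.m loop
% applied to f(x) = x^4 - 3x^3 + 2).
x_old = 0; x_new = 3.9;        % try -1.9 to see the plateau at x = 0
eps = 0.01;                    % step size
precision = 1e-5; max_iter = 200;
i = 1;
while abs(x_new - x_old) > precision && max_iter >= i
    x_old = x_new;
    df = 4*x_old^3 - 9*x_old^2;   % f'(x)
    x_new = x_old - eps * df;
    i = i + 1;
end
fprintf('stopped after %d iterations at x = %g\n', i-1, x_new);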

Starting point=3.9

[Figure: f(x) = x^4 - 3x^3 + 2 with iterates from starting point 3.9; x descends from 3.9 to about 2.25 (= 9/4) in roughly 42 iterations.]

Starting point=-1.9

[Figure: from starting point -1.9, x rises toward the plateau at x = 0 and is still short of it after 200 iterations.]

Step size = 0.01

• Step size = 0.05 can handle the plateau, but this is not guaranteed.

• Step size = 0.1 is very big for this function; it does not converge.

[Figure: with step size 0.1, the iterates jump wildly within the first 9 iterations and do not converge.]

Gradient descent

• Static zig-zag (hopeless) vs. slow convergence (with hope)

• Very small step size: slow convergence and the plateau problem

• Very big step size: no convergence

• What counts as a big or small step size depends on the objective function.
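
The last point is easy to demonstrate. The sketch below (not from the slides) reruns the magnitude-based update on f(x) = x^2 with a range of step sizes and reports where each run ends:

% Step-size sweep on f(x) = x^2 (illustrative sketch, not from the slides).
df = @(x) 2*x;                       % derivative of f(x) = x^2
for eps = [0.01 0.05 0.1 0.7 0.99 1.0]
    x_old = 0; x_new = -4.9; i = 1;
    while abs(x_new - x_old) > 1e-5 && i <= 200
        x_old = x_new;
        x_new = x_old - eps * df(x_old);
        i = i + 1;
    end
    fprintf('eps = %4.2f: %3d iterations, final x = %g\n', eps, i-1, x_new);
end

On a different objective the same step sizes would behave differently, which is exactly why the choice must be tuned per function.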
