
Static and Dynamic Optimization (42111)

Build. 303b, room 048
Section for Dynamical Systems
Dept. of Applied Mathematics and Computer Science
The Technical University of Denmark

Email: nkpo@dtu.dk
Phone: +45 4525 3356
Mobile: +45 9351 1161

2019-11-24 14:37

Lecture 12: Stochastic Dynamic Programming

1 / 29

Outline of lecture

Recap: L11 Deterministic Dynamic Programming (D)

Dynamic Programming (C)

Stochastics (Random variable)

Stochastic Dynamic Programming

Booking profiles

Stochastic Bellman

Stochastic optimal stepping (SDD)

Reading guidance: DO p. 83-92.

2 / 29

Dynamic Programming (D)

Find a sequence of decisions $u_i$, $i = 0, 1, \ldots, N-1$, which takes the system

$$x_{i+1} = f_i(x_i, u_i) \qquad x_0 = x_0$$

along a trajectory, such that the cost function

$$J = \phi(x_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i)$$

is minimized.

3 / 29

Dynamic Programming

The Bellman function (the optimal cost to go) is defined as:

$$V_i(x_i) = \min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})$$

and is a function of the present state, $x_i$, and index, $i$. In particular

$$V_N(x_N) = \phi_N(x_N)$$

Theorem

The Bellman function $V_i$ is given by the backwards recursion

$$V_i(x_i) = \min_{u_i} \Big[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \Big] \qquad x_{i+1} = f_i(x_i, u_i) \quad x_0 = x_0$$

with the boundary condition

$$V_N(x_N) = \phi_N(x_N)$$

The Bellman equation is a functional equation; it gives a sufficient condition, and $V_0(x_0) = J^*$.

4 / 29

Dynamic programming

$$u_i^* = \arg\min_{u_i} \Big[ \underbrace{L_i(x_i, u_i) + V_{i+1}\big(\underbrace{f_i(x_i, u_i)}_{x_{i+1}}\big)}_{W_i(x_i, u_i)} \Big]$$

If a maximization problem: min → max.
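As a sketch of how the backwards recursion turns into a program (the tabular "schematic method" of the next slide), the Python fragment below computes $V_i$ and $u_i^*$ over finite state and decision grids. The names f, L, phi, states and decisions are illustrative placeholders, not course code.

```python
# Minimal tabular sketch of the backwards Bellman recursion.
def dynamic_programming(f, L, phi, states, decisions, N):
    """Return Bellman tables V[i][x] and optimal decisions policy[i][x]."""
    V = {N: {x: phi(x) for x in states}}
    policy = {}
    for i in range(N - 1, -1, -1):               # backwards: i = N-1, ..., 0
        V[i], policy[i] = {}, {}
        for x in states:
            best_w, best_u = float("inf"), None
            for u in decisions:
                x_next = f(i, x, u)
                if x_next not in V[i + 1]:       # leaves the grid: infeasible
                    continue
                w = L(i, x, u) + V[i + 1][x_next]   # W_i(x, u)
                if w < best_w:
                    best_w, best_u = w, u
            V[i][x], policy[i][x] = best_w, best_u
    return V, policy
```

V[0][x0] then equals the optimal cost $J^*$, and the optimal trajectory is recovered forwards via $u_i = \text{policy}[i][x_i]$.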

5 / 29

Type of solutions

[Figure: the Bellman function $V_t(x)$ plotted as a surface over the state $x$ and time $i$.]

Fish bone method (graphical method)

Schematic method (tables) → programming

Analytical (e.g. separation of variables)

Analytical:

Guess the functional form of $V_i(x)$, i.e. up to a number of parameters. Check whether it satisfies the Bellman equation. This results in a (number of) recursion(s) for the parameter(s).

6 / 29

Continuous Dynamic Programming

Find the input function $u_t$, $t \in \mathbb{R}$ (more precisely $\{u\}_0^T$), that takes the system

$$\dot{x}_t = f_t(x_t, u_t) \qquad x_0 = x_0 \qquad t \in [0, T] \tag{1}$$

such that the cost function

$$J = \phi_T(x_T) + \int_0^T L_t(x_t, u_t) \, dt \tag{2}$$

is minimized. Define the truncated performance index (cost to go)

$$J_t(x_t, \{u\}_t^T) = \phi_T(x_T) + \int_t^T L_s(x_s, u_s) \, ds$$

The Bellman function (optimal cost to go) is defined by

$$V_t(x_t) = \min_{\{u\}_t^T} \Big[ J_t(x_t, \{u\}_t^T) \Big]$$

We have the following theorem, which states a sufficient condition.

Theorem

The Bellman function $V_t(x_t)$ satisfies the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right] \tag{3}$$

This is a PDE with the boundary condition

$$V_T(x_T) = \phi_T(x_T)$$

7 / 29

Continuous Dynamic Programming

Proof.

In discrete time we have the Bellman equation

$$V_i(x_i) = \min_{u_i} \Big[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \Big]$$

with the boundary condition $V_N(x_N) = \phi_N(x_N)$. Identify the discrete index $i$ with time $t$, and $i + 1$ with $t + \Delta t$. Then

$$V_t(x_t) = \min_{u_t} \left[ \int_t^{t+\Delta t} L_t(x_t, u_t) \, dt + V_{t+\Delta t}(x_{t+\Delta t}) \right]$$

Apply a Taylor expansion on $V_{t+\Delta t}(x_{t+\Delta t})$:

$$V_t(x_t) = \min_{u_t} \left[ L_t(x_t, u_t) \Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t \, \Delta t + \frac{\partial V_t(x_t)}{\partial t} \Delta t + o(|\Delta t|) \right]$$

8 / 29

Continuous Dynamic Programming

Proof.

$$V_t(x_t) = \min_{u_t} \left[ L_t(x_t, u_t) \Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t \, \Delta t + \frac{\partial V_t(x_t)}{\partial t} \Delta t + o(|\Delta t|) \right]$$

(just a copy)

Collect the terms which do not depend on the decision ($u_t$):

$$V_t(x_t) = V_t(x_t) + \frac{\partial V_t(x_t)}{\partial t} \Delta t + \min_{u_t} \left[ L_t(x_t, u_t) \, \Delta t + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \, \Delta t \right] + o(|\Delta t|)$$

In the limit $\Delta t \to 0$ (and after dividing by $\Delta t$):

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

9 / 29

The HJB equation:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

(just a copy)

The Hamiltonian function

$$H_t(x_t, u_t, \lambda_t) = L_t(x_t, u_t) + \lambda_t^T f_t(x_t, u_t)$$

The HJB equation can also be formulated as

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} H_t\left(x_t, u_t, \frac{\partial V_t(x_t)}{\partial x}\right)$$

Link to Pontryagin's maximum principle:

$$\lambda_t^T = \frac{\partial V_t(x_t)}{\partial x}$$

$$\dot{x}_t = f_t(x_t, u_t) \qquad \text{(state equation)}$$

$$-\dot{\lambda}_t^T = \frac{\partial}{\partial x_t} H_t \qquad \text{(costate equation)}$$

$$u_t = \arg\min_{u_t} \big[ H_t \big] \qquad \text{(optimality condition)}$$

10 / 29

Motion control

Consider the system

$$\dot{x}_t = u_t \qquad x_0 = x_0$$

and the performance index

$$J = \frac{1}{2} p x_T^2 + \int_0^T \frac{1}{2} u_t^2 \, dt$$

The HJB equation, (3), gives:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ \frac{1}{2} u_t^2 + \frac{\partial V_t(x_t)}{\partial x} u_t \right] \qquad V_T(x_T) = \frac{1}{2} p x_T^2$$

The minimization can be carried out and gives a solution w.r.t. $u_t$, which is

$$u_t = -\frac{\partial V_t(x_t)}{\partial x}$$

So if the Bellman function is known, the control action (the decision) can be determined from it. If the result above is inserted in the HJB equation, we get

$$-\frac{\partial V_t(x_t)}{\partial t} = \frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2 - \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2 = -\frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2$$

which is a partial differential equation with the boundary condition

$$V_T(x_T) = \frac{1}{2} p x_T^2$$

11 / 29

PDE:

$$-\frac{\partial V_t(x_t)}{\partial t} = -\frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2$$

(just a copy)

Inspired by the boundary condition, we guess a candidate function of the type

$$V_t(x) = \frac{1}{2} s_t x^2$$

where the time dependence is in the function $s_t$. Since

$$\frac{\partial V}{\partial x} = s_t x \qquad \frac{\partial V}{\partial t} = \frac{1}{2} \dot{s}_t x^2$$

the following equation

$$-\frac{1}{2} \dot{s}_t x^2 = -\frac{1}{2} (s_t x)^2$$

must be valid for any $x$, i.e. we can find $s_t$ by solving the ODE

$$\dot{s}_t = s_t^2 \qquad s_T = p$$

backwards. This is actually (a simple version of) the continuous-time Riccati equation. The solution can be found analytically or by means of numerical methods. Knowing the function $s_t$, we can find the control input

$$u_t = -\frac{\partial V_t(x_t)}{\partial x} = -s_t x_t$$
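The ODE $\dot{s}_t = s_t^2$, $s_T = p$ separates and can be integrated backwards in closed form, giving $s_t = p / (1 + p(T - t))$. The sketch below compares this closed form with a numerical backward sweep and simulates the closed loop $\dot{x}_t = -s_t x_t$; the parameter values ($p$, $T$, the step size) are illustrative.

```python
# Minimal sketch: solve s' = s^2 backwards from s(T) = p, then apply
# the feedback u_t = -s_t x_t. Parameter values are illustrative.
import numpy as np

p, T, dt = 2.0, 5.0, 1e-3
ts = np.arange(0.0, T + dt, dt)

# Closed-form solution of the scalar Riccati equation s' = s^2, s(T) = p.
s_exact = p / (1.0 + p * (T - ts))

# Explicit Euler sweep, stepping backwards from the terminal condition.
s_num = np.empty_like(ts)
s_num[-1] = p
for k in range(len(ts) - 1, 0, -1):
    s_num[k - 1] = s_num[k] - dt * s_num[k] ** 2

# Closed-loop simulation of x' = u = -s_t x from x_0 = 1.
x = np.empty_like(ts)
x[0] = 1.0
for k in range(len(ts) - 1):
    x[k + 1] = x[k] + dt * (-s_exact[k] * x[k])

print(np.max(np.abs(s_exact - s_num)))  # small discretization error
print(x[-1])                            # state driven towards zero
```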

12 / 29

Stochastic Dynamic Programming

13 / 29

The Bank loan

Deterministic:

$$x_{i+1} = (1 + r) x_i - u_i \qquad x_0 = x_0$$

Stochastic:

$$x_{i+1} = (1 + r_i) x_i - u_i \qquad x_0 = x_0$$

[Figure: rate of interest (%) over time (months).]

14 / 29

[Figure: bank balance (×10^4) over time (years), two simulations.]
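A small simulation in the spirit of these figures; the payment plan, horizon and rate distribution are invented for illustration, not the numbers behind the plots.

```python
# Sketch of the bank-loan recursion x_{i+1} = (1 + r_i) x_i - u_i,
# deterministic vs stochastic monthly interest. Numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, x0, u = 120, 50_000.0, 600.0      # months, initial balance, monthly payment
r_det = 0.005                        # fixed monthly rate
x_det, x_sto = [x0], [x0]
for i in range(N):
    x_det.append((1 + r_det) * x_det[-1] - u)
    r_i = max(0.0, rng.normal(r_det, 0.002))   # random monthly rate
    x_sto.append((1 + r_i) * x_sto[-1] - u)
print(round(x_det[-1]), round(x_sto[-1]))      # the two balances differ
```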

15 / 29

Discrete Random Variable

$$X \in \{x^1, x^2, \ldots, x^m\} \subset \mathbb{R}^n$$

$$p_k = P\{X = x^k\} \ge 0 \qquad \sum_{k=1}^m p_k = 1$$

[Figure: bar plot of a probability mass function $p_k$.]

$$E\{X\} = \sum_{k=1}^m p_k x^k \qquad E\{g(X)\} = \sum_{k=1}^m p_k g(x^k)$$
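A quick numerical check of these formulas, with a made-up pmf:

```python
# E{X} and E{g(X)} for a discrete random variable; the pmf is made up.
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]            # probabilities, sum to 1
g = lambda x: x ** 2
EX  = sum(p * x for p, x in zip(ps, xs))
EgX = sum(p * g(x) for p, x in zip(ps, xs))
print(EX, EgX)                        # 3.0 10.0
```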

16 / 29

Stochastic Dynamic Programming

Consider the problem of minimizing (in some sense):

$$J = \phi_N(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i)$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = x_0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathcal{V}_i \qquad (x_N, e_N) \in \mathcal{V}_N$$

$e_i$ might be vectors reflecting model errors or direct stochastic effects.

17 / 29

Ranking performance Indexes

When $e_i$ and other quantities are stochastic variables, what do we mean by one strategy being better than another?

In a deterministic situation we mean that

$$J_1 > J_2$$

($J_1$ and $J_2$ being the objective functions for strategies 1 and 2).

In a stochastic situation we can choose the definition

$$E\{J_1\} > E\{J_2\}$$

but others do exist. This choice reflects some kind of average consideration.

18 / 29

Example: Booking profiles

Normally a plane is overbooked, i.e. more tickets are sold than the number of seats, $\bar{x}_N$. Let $x_i$ be the number of sold tickets at the beginning of day $i$, with the days indexed $0, 1, 2, \ldots, N$.

If $x_N < \bar{x}_N$ we have empty seats - money out the window. If $x_N > \bar{x}_N$ we have to pay compensations - also money out the window.

So we want to find a strategy such that we are minimizing:

$$E\Big\{ \phi(x_N - \bar{x}_N) \Big\}$$

Let $w_i$ be the number of requests for a ticket on day $i$ (with probability $P\{w_i = k\} = p_k$), and let $v_i$ be the number of cancellations on day $i$ (with probability $P\{v_i = k\} = q_k$).

Dynamics:

$$x_{i+1} = x_i + \min(u_i, w_i) - v_i \qquad e_i = \begin{bmatrix} w_i \\ v_i \end{bmatrix}$$

Decision information: $u_i(x_i)$.
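A rough Monte Carlo sketch of this model under a fixed booking-limit policy $u_i(x_i) = \max(0, b - x_i)$. The request and cancellation distributions, the seat count, the limit $b$ and the penalty $\phi$ are all invented for illustration.

```python
# Simulate x_{i+1} = x_i + min(u_i, w_i) - v_i under a booking-limit
# policy and estimate E{phi(x_N - seats)}. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
N, seats, b = 30, 150, 165              # horizon, plane capacity, limit

def simulate():
    x = 0                               # sold tickets
    for i in range(N):
        u = max(0, b - x)               # accept up to the booking limit
        w = rng.poisson(8)              # ticket requests on day i
        v = rng.binomial(x, 0.02)       # cancellations among ticket holders
        x = x + min(u, w) - v
    return x

phi = lambda d: 400 * max(d, 0) + 250 * max(-d, 0)  # overbooked vs empty
cost = np.mean([phi(simulate() - seats) for _ in range(10_000)])
print(cost)   # Monte Carlo estimate of E{phi(x_N - seats)}
```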

19 / 29

Stochastic Bellman Equation

Consider the problem of minimizing:

$$J = E\Big\{ \phi(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i) \Big\}$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = x_0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathcal{V}_i \qquad (x_N, e_N) \in \mathcal{V}_N$$

Theorem

The Bellman function (optimal cost to go), $V_i(x_i)$, is given by the (backward) recursion:

$$V_i(x_i) = \min_{u_i} E\Big\{ L_i(x_i, u_i, e_i) + V_{i+1}(x_{i+1}) \Big\} \qquad x_{i+1} = f_i(x_i, u_i, e_i)$$

$$V_N(x_N) = E\Big\{ \phi_N(x_N, e_N) \Big\}$$

where the optimization is subject to the constraints and the available information.

20 / 29

Discrete (SDD) case

If $e_i$ is discrete, i.e.

$$e_i \in \{e_i^1, e_i^2, \ldots, e_i^m\} \qquad p_i^k = P\{e_i = e_i^k\} \qquad k = 1, 2, \ldots, m$$

then the stochastic Bellman equation can be expressed as

$$V_i(x_i) = \min_{u_i} \underbrace{\sum_{k=1}^m p_i^k \Big[ L_i(x_i, u_i, e_i^k) + V_{i+1}\big(\underbrace{f_i(x_i, u_i, e_i^k)}_{x_{i+1}}\big) \Big]}_{W_i(x_i, u_i)}$$

with boundary condition

$$V_N(x_N) = \sum_{k=1}^m p_N^k \, \phi_N(x_N, e_N^k)$$

The entries in the scheme below are now expected values (i.e. weighted sums).

W_i:          u_i = 0   u_i = 1   u_i = 2   u_i = 3   | V_i(x_i)   u_i^*(x_i)
x_i = 0          .         .         .         .     |     .           .
x_i = 1          .         .         .         .     |     .           .
x_i = 2          .         .         .         .     |     .           .
x_i = 3          .         .         .         .     |     .           .
x_i = 4          .         .         .         .     |     .           .

21 / 29

Optimal stochastic stepping (SDD)

Consider the system

$$x_{i+1} = x_i + u_i + e_i \qquad x_0 = 2$$

where

$$e_i \in \{-1, 0, 1\} \qquad u_i \in \{-1, 0, 1\} \qquad x_i \in \{-2, -1, 0, 1, 2\}$$

(decisions must keep the state inside its range; infeasible combinations are marked ∞ below) and the probabilities $p_i^k = P\{e_i = e^k\}$ depend on the state:

p_i^k:        e_i = -1   e_i = 0   e_i = 1
x_i = -2         0         1/2       1/2
x_i = -1         0         1/2       1/2
x_i =  0        1/2         0        1/2
x_i =  1        1/2        1/2        0
x_i =  2        1/2        1/2        0

$$J = E\Big\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \Big\}$$

Notice: no stochastic components in the cost.

22 / 29

Optimal stochastic stepping (SDD)

Firstly, from

$$J = E\Big\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \Big\}$$

(no stochastics in the cost) we establish $V_4(x_4) = x_4^2$. We are assuming perfect state information.

x_4    V_4
 -2     4
 -1     1
  0     0
  1     1
  2     4

23 / 29

Optimal stochastic stepping (SDD)

Then we establish the $W_3(x_3, u_3)$ function (the cost to go):

$$W_3(x_3, u_3) = \sum_{k=1}^m p_3^k \Big[ L_3(x_3, u_3, e_3^k) + V_4\big(f_3(x_3, u_3, e_3^k)\big) \Big]$$

$$W_3(x_3, u_3) = p_3^1 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^1) \big] + p_3^2 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^2) \big] + p_3^3 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^3) \big]$$

where $x_3^2 + u_3^2 = L_3(x_3, u_3, e_3^k)$ and $x_3 + u_3 + e_3^k = f_3(x_3, u_3, e_3^k)$, or more compactly:

$$W_3(x_3, u_3) = x_3^2 + u_3^2 + p_3^1 V_4(x_3 + u_3 + e_3^1) + p_3^2 V_4(x_3 + u_3 + e_3^2) + p_3^3 V_4(x_3 + u_3 + e_3^3)$$

24 / 29

Optimal stochastic stepping (SDD)

$$W_3(x_3, u_3) = \sum_{k=1}^{3} p^k \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^k) \big]$$

For example, for $x_3 = 0$, $u_3 = -1$, with the outcomes $(e^k, p^k)$ indicated on the right:

$$W_3(0, -1) = \tfrac{1}{2} \big[ 0^2 + (-1)^2 + V_4(0 - 1 - 1) \big] \qquad (-1, \tfrac{1}{2})$$

$$+\; 0 \cdot \big[ 0^2 + (-1)^2 + V_4(0 - 1 + 0) \big] \qquad (0, 0)$$

$$+\; \tfrac{1}{2} \big[ 0^2 + (-1)^2 + V_4(0 - 1 + 1) \big] \qquad (1, \tfrac{1}{2})$$

$$= \tfrac{1}{2}(1 + 4) + 0 + \tfrac{1}{2}(1 + 0) = 3$$

W_3:          u_3 = -1   u_3 = 0   u_3 = 1
x_3 = -2         ∞         6.5       5.5
x_3 = -1        4.5        1.5       2.5
x_3 =  0         3          1         3
x_3 =  1        2.5        1.5       4.5
x_3 =  2        5.5        6.5        ∞

x_4    V_4
 -2     4
 -1     1
  0     0
  1     1
  2     4

(just for reference)

25 / 29

Optimal stochastic stepping (SDD)

W_3:          u_3 = -1   u_3 = 0   u_3 = 1   | V_3(x_3)   u_3^*(x_3)
x_3 = -2         ∞         6.5       5.5     |    5.5         1
x_3 = -1        4.5        1.5       2.5     |    1.5         0
x_3 =  0         3          1         3      |     1          0
x_3 =  1        2.5        1.5       4.5     |    1.5         0
x_3 =  2        5.5        6.5        ∞      |    5.5        -1

W_2:          u_2 = -1   u_2 = 0   u_2 = 1   | V_2(x_2)   u_2^*(x_2)
x_2 = -2         ∞         7.5       6.25    |   6.25         1
x_2 = -1        5.5        2.25      3.25    |   2.25         0
x_2 =  0        4.25       1.5       4.25    |   1.5          0
x_2 =  1        3.25       2.25      5.5     |   2.25         0
x_2 =  2        6.25       7.5        ∞      |   6.25        -1

26 / 29

Optimal stochastic stepping (SDD)

W_1:          u_1 = -1   u_1 = 0   u_1 = 1   | V_1(x_1)   u_1^*(x_1)
x_1 = -2         ∞         8.25      6.88    |   6.88         1
x_1 = -1        6.25       2.88      3.88    |   2.88         0
x_1 =  0        4.88       2.25      4.88    |   2.25         0
x_1 =  1        3.88       2.88      6.25    |   2.88         0
x_1 =  2        6.88       8.25       ∞      |   6.88        -1

W_0:          u_0 = -1   u_0 = 0   u_0 = 1   | V_0(x_0)   u_0^*(x_0)
x_0 =  2        7.56       8.88       ∞      |   7.56        -1

Trace back: $u_i^*(x_i)$. A feedback solution, not a time function.
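The tables above can be reproduced by iterating the discrete stochastic Bellman equation backwards. A minimal sketch using the example's grids and state-dependent probabilities:

```python
# Discrete stochastic Bellman recursion for the stepping example:
# x_{i+1} = x_i + u_i + e_i, J = E{x_4^2 + sum_i x_i^2 + u_i^2}.
import math

states, decisions, N = [-2, -1, 0, 1, 2], [-1, 0, 1], 4
# State-dependent pmf of e_i over the outcomes (-1, 0, 1).
prob = {-2: (0, .5, .5), -1: (0, .5, .5), 0: (.5, 0, .5),
         1: (.5, .5, 0),  2: (.5, .5, 0)}

V = {x: x ** 2 for x in states}            # V_4(x_4) = x_4^2
policy = []
for i in range(N - 1, -1, -1):             # i = 3, 2, 1, 0
    Vn, un = {}, {}
    for x in states:
        best_w, best_u = math.inf, None
        for u in decisions:
            W, feasible = x ** 2 + u ** 2, True
            for e, p in zip((-1, 0, 1), prob[x]):
                if p == 0:
                    continue
                xn = x + u + e
                if xn not in V:            # leaves the grid: infeasible
                    feasible = False
                    break
                W += p * V[xn]             # expected cost to go
            if feasible and W < best_w:
                best_w, best_u = W, u
        Vn[x], un[x] = best_w, best_u
    V, policy = Vn, [un] + policy

print(V[2], policy[0][2])   # V_0(2) = 7.5625, u_0^*(2) = -1
```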

27 / 29

Deterministic setting ($x_{i+1} = x_i + u_i$, $i = 0, \ldots, 3$): an open-loop sequence

i    :   0    1    2    3
u_i* :  -1   -1    0    0

Stochastic setting ($x_{i+1} = x_i + u_i + e_i$, $i = 0, \ldots, 3$): a feedback policy

x_i        u_0^*   u_1^*   u_2^*   u_3^*
 -2          -       1       1       1
 -1          -       0       0       0
  0          -       0       0       0
  1          -       0       0       0
  2         -1      -1      -1      -1

(only $x_0 = 2$ occurs at stage 0)

28 / 29

Concluding remarks

Discrete state and decision space.

Approximation. Grid covering state and decision space.

Curse of dimensionality - combinatorial explosion.

29 / 29
