
Static and Dynamic Optimization (42111)

Build. 303b, room 048
Section for Dynamical Systems
Dept. of Applied Mathematics and Computer Science
The Technical University of Denmark

Email: nkpo@dtu.dk
Phone: +45 4525 3356
Mobile: +45 9351 1161

2019-11-24 14:37

Lecture 12: Stochastic Dynamic Programming

1 / 29

Outline of lecture

Recap: L11 Deterministic Dynamic Programming (D)

Dynamic Programming (C)

Stochastics (Random variable)

Stochastic Dynamic Programming

Booking profiles

Stochastic Bellman

Stochastic optimal stepping (SDD)

Reading guidance: DO p. 83-92.

2 / 29

Dynamic Programming (D)

Find a sequence of decisions $u_i$, $i = 0, 1, \ldots, N-1$, which takes the system

$$x_{i+1} = f_i(x_i, u_i) \qquad x_0 = x_0$$

along a trajectory, such that the cost function

$$J = \phi(x_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i)$$

is minimized.

3 / 29

Dynamic Programming

The Bellman function (the optimal cost to go) is defined as:

$$V_i(x_i) = \min_{u_i^{N-1}} J_i(x_i, u_i^{N-1})$$

and is a function of the present state, $x_i$, and index, $i$. In particular

$$V_N(x_N) = \phi_N(x_N)$$

Theorem

The Bellman function $V_i$ is given by the backwards recursion

$$V_i(x_i) = \min_{u_i} \Big[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \Big] \qquad x_{i+1} = f_i(x_i, u_i) \quad x_0 = x_0$$

with the boundary condition

$$V_N(x_N) = \phi_N(x_N)$$

The Bellman equation is a functional equation; it gives a sufficient condition, and $V_0(x_0) = J^*$.

4 / 29

Dynamic programming

$$u_i^* = \arg\min_{u_i} \Big[ \underbrace{L_i(x_i, u_i) + V_{i+1}\big(\underbrace{f_i(x_i, u_i)}_{x_{i+1}}\big)}_{W_i(x_i, u_i)} \Big]$$

If a maximization problem: min → max.
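As a sketch of how the backwards recursion turns into a program (the tabular "schematic method" of the next slide), the Python fragment below computes $V_i$ and $u_i^*$ over finite state and decision grids. The names f, L, phi, states and decisions are illustrative placeholders, not course code.

```python
# Minimal tabular sketch of the backwards Bellman recursion.
def dynamic_programming(f, L, phi, states, decisions, N):
    """Return Bellman tables V[i][x] and optimal decisions policy[i][x]."""
    V = {N: {x: phi(x) for x in states}}
    policy = {}
    for i in range(N - 1, -1, -1):               # backwards: i = N-1, ..., 0
        V[i], policy[i] = {}, {}
        for x in states:
            best_w, best_u = float("inf"), None
            for u in decisions:
                x_next = f(i, x, u)
                if x_next not in V[i + 1]:       # leaves the grid: infeasible
                    continue
                w = L(i, x, u) + V[i + 1][x_next]   # W_i(x, u)
                if w < best_w:
                    best_w, best_u = w, u
            V[i][x], policy[i][x] = best_w, best_u
    return V, policy
```

V[0][x0] then equals the optimal cost $J^*$, and the optimal trajectory is recovered forwards via $u_i = \text{policy}[i][x_i]$.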

5 / 29

Type of solutions

[Figure: the Bellman function $V_t(x)$ plotted as a surface over the state $x$ and time $i$.]

Fish bone method (graphical method)

Schematic method (tables) → programming

Analytical (e.g. separation of variables)

Analytical:

Guess the functional form of $V_i(x)$, i.e. up to a number of parameters. Check whether it satisfies the Bellman equation. This results in a (number of) recursion(s) for the parameter(s).

6 / 29

Continuous Dynamic Programming

Find the input function $u_t$, $t \in \mathbb{R}$ (more precisely $\{u\}_0^T$), that takes the system

$$\dot{x}_t = f_t(x_t, u_t) \qquad x_0 = x_0 \qquad t \in [0, T] \tag{1}$$

such that the cost function

$$J = \phi_T(x_T) + \int_0^T L_t(x_t, u_t) \, dt \tag{2}$$

is minimized. Define the truncated performance index (cost to go)

$$J_t(x_t, \{u\}_t^T) = \phi_T(x_T) + \int_t^T L_s(x_s, u_s) \, ds$$

The Bellman function (optimal cost to go) is defined by

$$V_t(x_t) = \min_{\{u\}_t^T} \Big[ J_t(x_t, \{u\}_t^T) \Big]$$

We have the following theorem, which states a sufficient condition.

Theorem

The Bellman function $V_t(x_t)$ satisfies the Hamilton-Jacobi-Bellman equation

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right] \tag{3}$$

This is a PDE with the boundary condition

$$V_T(x_T) = \phi_T(x_T)$$

7 / 29

Continuous Dynamic Programming

Proof.

In discrete time we have the Bellman equation

$$V_i(x_i) = \min_{u_i} \Big[ L_i(x_i, u_i) + V_{i+1}(x_{i+1}) \Big]$$

with the boundary condition $V_N(x_N) = \phi_N(x_N)$. Identify the discrete index $i$ with time $t$, and $i + 1$ with $t + \Delta t$. Then

$$V_t(x_t) = \min_{u_t} \left[ \int_t^{t+\Delta t} L_t(x_t, u_t) \, dt + V_{t+\Delta t}(x_{t+\Delta t}) \right]$$

Apply a Taylor expansion on $V_{t+\Delta t}(x_{t+\Delta t})$:

$$V_t(x_t) = \min_{u_t} \left[ L_t(x_t, u_t) \Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t \, \Delta t + \frac{\partial V_t(x_t)}{\partial t} \Delta t + o(|\Delta t|) \right]$$

8 / 29

Continuous Dynamic Programming

Proof.

$$V_t(x_t) = \min_{u_t} \left[ L_t(x_t, u_t) \Delta t + V_t(x_t) + \frac{\partial V_t(x_t)}{\partial x} f_t \, \Delta t + \frac{\partial V_t(x_t)}{\partial t} \Delta t + o(|\Delta t|) \right]$$

(just a copy)

Collect the terms which do not depend on the decision ($u_t$):

$$V_t(x_t) = V_t(x_t) + \frac{\partial V_t(x_t)}{\partial t} \Delta t + \min_{u_t} \left[ L_t(x_t, u_t) \, \Delta t + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \, \Delta t \right] + o(|\Delta t|)$$

In the limit $\Delta t \to 0$ (and after dividing by $\Delta t$):

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

9 / 29

The HJB equation:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ L_t(x_t, u_t) + \frac{\partial V_t(x_t)}{\partial x} f_t(x_t, u_t) \right]$$

(just a copy)

The Hamiltonian function

$$H_t(x_t, u_t, \lambda_t) = L_t(x_t, u_t) + \lambda_t^T f_t(x_t, u_t)$$

The HJB equation can also be formulated as

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} H_t\left(x_t, u_t, \frac{\partial V_t(x_t)}{\partial x}\right)$$

Link to Pontryagin's maximum principle:

$$\lambda_t^T = \frac{\partial V_t(x_t)}{\partial x}$$

$$\dot{x}_t = f_t(x_t, u_t) \qquad \text{(state equation)}$$

$$-\dot{\lambda}_t^T = \frac{\partial}{\partial x_t} H_t \qquad \text{(costate equation)}$$

$$u_t = \arg\min_{u_t} \big[ H_t \big] \qquad \text{(optimality condition)}$$

10 / 29

Motion control

Consider the system

$$\dot{x}_t = u_t \qquad x_0 = x_0$$

and the performance index

$$J = \frac{1}{2} p x_T^2 + \int_0^T \frac{1}{2} u_t^2 \, dt$$

The HJB equation, (3), gives:

$$-\frac{\partial V_t(x_t)}{\partial t} = \min_{u_t} \left[ \frac{1}{2} u_t^2 + \frac{\partial V_t(x_t)}{\partial x} u_t \right] \qquad V_T(x_T) = \frac{1}{2} p x_T^2$$

The minimization can be carried out and gives a solution w.r.t. $u_t$, which is

$$u_t = -\frac{\partial V_t(x_t)}{\partial x}$$

So if the Bellman function is known, the control action (the decision) can be determined from it. If the result above is inserted in the HJB equation, we get

$$-\frac{\partial V_t(x_t)}{\partial t} = \frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2 - \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2 = -\frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2$$

which is a partial differential equation with the boundary condition

$$V_T(x_T) = \frac{1}{2} p x_T^2$$

11 / 29

PDE:

$$-\frac{\partial V_t(x_t)}{\partial t} = -\frac{1}{2} \left[ \frac{\partial V_t(x_t)}{\partial x} \right]^2$$

(just a copy)

Inspired by the boundary condition, we guess a candidate function of the type

$$V_t(x) = \frac{1}{2} s_t x^2$$

where the time dependence is in the function $s_t$. Since

$$\frac{\partial V}{\partial x} = s_t x \qquad \frac{\partial V}{\partial t} = \frac{1}{2} \dot{s}_t x^2$$

the following equation

$$-\frac{1}{2} \dot{s}_t x^2 = -\frac{1}{2} (s_t x)^2$$

must be valid for any $x$, i.e. we can find $s_t$ by solving the ODE

$$\dot{s}_t = s_t^2 \qquad s_T = p$$

backwards. This is actually (a simple version of) the continuous-time Riccati equation. The solution can be found analytically or by means of numerical methods. Knowing the function $s_t$, we can find the control input

$$u_t = -\frac{\partial V_t(x_t)}{\partial x} = -s_t x_t$$
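The ODE $\dot{s}_t = s_t^2$, $s_T = p$ separates and can be integrated backwards in closed form, giving $s_t = p / (1 + p(T - t))$. The sketch below compares this closed form with a numerical backward sweep and simulates the closed loop $\dot{x}_t = -s_t x_t$; the parameter values ($p$, $T$, the step size) are illustrative.

```python
# Minimal sketch: solve s' = s^2 backwards from s(T) = p, then apply
# the feedback u_t = -s_t x_t. Parameter values are illustrative.
import numpy as np

p, T, dt = 2.0, 5.0, 1e-3
ts = np.arange(0.0, T + dt, dt)

# Closed-form solution of the scalar Riccati equation s' = s^2, s(T) = p.
s_exact = p / (1.0 + p * (T - ts))

# Explicit Euler sweep, stepping backwards from the terminal condition.
s_num = np.empty_like(ts)
s_num[-1] = p
for k in range(len(ts) - 1, 0, -1):
    s_num[k - 1] = s_num[k] - dt * s_num[k] ** 2

# Closed-loop simulation of x' = u = -s_t x from x_0 = 1.
x = np.empty_like(ts)
x[0] = 1.0
for k in range(len(ts) - 1):
    x[k + 1] = x[k] + dt * (-s_exact[k] * x[k])

print(np.max(np.abs(s_exact - s_num)))  # small discretization error
print(x[-1])                            # state driven towards zero
```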

12 / 29

Stochastic Dynamic Programming

13 / 29

The Bank loan

Deterministic:

$$x_{i+1} = (1 + r) x_i - u_i \qquad x_0 = x_0$$

Stochastic:

$$x_{i+1} = (1 + r_i) x_i - u_i \qquad x_0 = x_0$$

[Figure: rate of interest (%) over time (months).]

14 / 29

[Figure: bank balance (×10^4) over time (years), two simulations.]
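A small simulation in the spirit of these figures; the payment plan, horizon and rate distribution are invented for illustration, not the numbers behind the plots.

```python
# Sketch of the bank-loan recursion x_{i+1} = (1 + r_i) x_i - u_i,
# deterministic vs stochastic monthly interest. Numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, x0, u = 120, 50_000.0, 600.0      # months, initial balance, monthly payment
r_det = 0.005                        # fixed monthly rate
x_det, x_sto = [x0], [x0]
for i in range(N):
    x_det.append((1 + r_det) * x_det[-1] - u)
    r_i = max(0.0, rng.normal(r_det, 0.002))   # random monthly rate
    x_sto.append((1 + r_i) * x_sto[-1] - u)
print(round(x_det[-1]), round(x_sto[-1]))      # the two balances differ
```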

15 / 29

Discrete Random Variable

$$X \in \{x^1, x^2, \ldots, x^m\} \subset \mathbb{R}^n$$

$$p_k = P\{X = x^k\} \ge 0 \qquad \sum_{k=1}^m p_k = 1$$

[Figure: bar plot of a probability mass function $p_k$.]

$$E\{X\} = \sum_{k=1}^m p_k x^k \qquad E\{g(X)\} = \sum_{k=1}^m p_k g(x^k)$$
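A quick numerical check of these formulas, with a made-up pmf:

```python
# E{X} and E{g(X)} for a discrete random variable; the pmf is made up.
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]            # probabilities, sum to 1
g = lambda x: x ** 2
EX  = sum(p * x for p, x in zip(ps, xs))
EgX = sum(p * g(x) for p, x in zip(ps, xs))
print(EX, EgX)                        # 3.0 10.0
```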

16 / 29

Stochastic Dynamic Programming

Consider the problem of minimizing (in some sense):

$$J = \phi_N(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i)$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = x_0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathcal{V}_i \qquad (x_N, e_N) \in \mathcal{V}_N$$

$e_i$ might be vectors reflecting model errors or direct stochastic effects.

17 / 29

Ranking performance Indexes

When $e_i$ and other quantities are stochastic variables, what do we mean by one strategy being better than another?

In a deterministic situation we mean that

$$J_1 > J_2$$

($J_1$ and $J_2$ being the objective functions for strategies 1 and 2).

In a stochastic situation we can choose the definition

$$E\{J_1\} > E\{J_2\}$$

but others do exist. This choice reflects some kind of average consideration.

18 / 29

Example: Booking profiles

Normally a plane is overbooked, i.e. more tickets are sold than the number of seats, $\bar{x}_N$. Let $x_i$ be the number of sold tickets at the beginning of day $i$, with the days indexed $0, 1, 2, \ldots, N$.

If $x_N < \bar{x}_N$ we have empty seats - money out the window. If $x_N > \bar{x}_N$ we have to pay compensations - also money out the window.

So we want to find a strategy such that we are minimizing:

$$E\Big\{ \phi(x_N - \bar{x}_N) \Big\}$$

Let $w_i$ be the number of requests for a ticket on day $i$ (with probability $P\{w_i = k\} = p_k$), and let $v_i$ be the number of cancellations on day $i$ (with probability $P\{v_i = k\} = q_k$).

Dynamics:

$$x_{i+1} = x_i + \min(u_i, w_i) - v_i \qquad e_i = \begin{bmatrix} w_i \\ v_i \end{bmatrix}$$

Decision information: $u_i(x_i)$.
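A rough Monte Carlo sketch of this model under a fixed booking-limit policy $u_i(x_i) = \max(0, b - x_i)$. The request and cancellation distributions, the seat count, the limit $b$ and the penalty $\phi$ are all invented for illustration.

```python
# Simulate x_{i+1} = x_i + min(u_i, w_i) - v_i under a booking-limit
# policy and estimate E{phi(x_N - seats)}. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
N, seats, b = 30, 150, 165              # horizon, plane capacity, limit

def simulate():
    x = 0                               # sold tickets
    for i in range(N):
        u = max(0, b - x)               # accept up to the booking limit
        w = rng.poisson(8)              # ticket requests on day i
        v = rng.binomial(x, 0.02)       # cancellations among ticket holders
        x = x + min(u, w) - v
    return x

phi = lambda d: 400 * max(d, 0) + 250 * max(-d, 0)  # overbooked vs empty
cost = np.mean([phi(simulate() - seats) for _ in range(10_000)])
print(cost)   # Monte Carlo estimate of E{phi(x_N - seats)}
```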

19 / 29

Stochastic Bellman Equation

Consider the problem of minimizing:

$$J = E\Big\{ \phi(x_N, e_N) + \sum_{i=0}^{N-1} L_i(x_i, u_i, e_i) \Big\}$$

subject to

$$x_{i+1} = f_i(x_i, u_i, e_i) \qquad x_0 = x_0$$

and the constraints

$$(x_i, u_i, e_i) \in \mathcal{V}_i \qquad (x_N, e_N) \in \mathcal{V}_N$$

Theorem

The Bellman function (optimal cost to go), $V_i(x_i)$, is given by the (backward) recursion:

$$V_i(x_i) = \min_{u_i} E\Big\{ L_i(x_i, u_i, e_i) + V_{i+1}(x_{i+1}) \Big\} \qquad x_{i+1} = f_i(x_i, u_i, e_i)$$

$$V_N(x_N) = E\Big\{ \phi_N(x_N, e_N) \Big\}$$

where the optimization is subject to the constraints and the available information.

20 / 29

Discrete (SDD) case

If $e_i$ is discrete, i.e.

$$e_i \in \{e_i^1, e_i^2, \ldots, e_i^m\} \qquad p_i^k = P\{e_i = e_i^k\} \qquad k = 1, 2, \ldots, m$$

then the stochastic Bellman equation can be expressed as

$$V_i(x_i) = \min_{u_i} \underbrace{\sum_{k=1}^m p_i^k \Big[ L_i(x_i, u_i, e_i^k) + V_{i+1}\big(\underbrace{f_i(x_i, u_i, e_i^k)}_{x_{i+1}}\big) \Big]}_{W_i(x_i, u_i)}$$

with boundary condition

$$V_N(x_N) = \sum_{k=1}^m p_N^k \, \phi_N(x_N, e_N^k)$$

The entries in the scheme below are now expected values (i.e. weighted sums).

W_i:          u_i = 0   u_i = 1   u_i = 2   u_i = 3   | V_i(x_i)   u_i^*(x_i)
x_i = 0          .         .         .         .     |     .           .
x_i = 1          .         .         .         .     |     .           .
x_i = 2          .         .         .         .     |     .           .
x_i = 3          .         .         .         .     |     .           .
x_i = 4          .         .         .         .     |     .           .

21 / 29

Optimal stochastic stepping (SDD)

Consider the system

$$x_{i+1} = x_i + u_i + e_i \qquad x_0 = 2$$

where

$$e_i \in \{-1, 0, 1\} \qquad u_i \in \{-1, 0, 1\} \qquad x_i \in \{-2, -1, 0, 1, 2\}$$

(decisions must keep the state inside its range; infeasible combinations are marked ∞ below) and the probabilities $p_i^k = P\{e_i = e^k\}$ depend on the state:

p_i^k:        e_i = -1   e_i = 0   e_i = 1
x_i = -2         0         1/2       1/2
x_i = -1         0         1/2       1/2
x_i =  0        1/2         0        1/2
x_i =  1        1/2        1/2        0
x_i =  2        1/2        1/2        0

$$J = E\Big\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \Big\}$$

Notice: no stochastic components in the cost.

22 / 29

Optimal stochastic stepping (SDD)

Firstly, from

$$J = E\Big\{ x_4^2 + \sum_{i=0}^{3} x_i^2 + u_i^2 \Big\}$$

(no stochastics in the cost) we establish $V_4(x_4) = x_4^2$. We are assuming perfect state information.

x_4    V_4
 -2     4
 -1     1
  0     0
  1     1
  2     4

23 / 29

Optimal stochastic stepping (SDD)

Then we establish the $W_3(x_3, u_3)$ function (the cost to go):

$$W_3(x_3, u_3) = \sum_{k=1}^m p_3^k \Big[ L_3(x_3, u_3, e_3^k) + V_4\big(f_3(x_3, u_3, e_3^k)\big) \Big]$$

$$W_3(x_3, u_3) = p_3^1 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^1) \big] + p_3^2 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^2) \big] + p_3^3 \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e_3^3) \big]$$

where $x_3^2 + u_3^2 = L_3(x_3, u_3, e_3^k)$ and $x_3 + u_3 + e_3^k = f_3(x_3, u_3, e_3^k)$, or more compactly:

$$W_3(x_3, u_3) = x_3^2 + u_3^2 + p_3^1 V_4(x_3 + u_3 + e_3^1) + p_3^2 V_4(x_3 + u_3 + e_3^2) + p_3^3 V_4(x_3 + u_3 + e_3^3)$$

24 / 29

Optimal stochastic stepping (SDD)

$$W_3(x_3, u_3) = \sum_{k=1}^{3} p^k \big[ x_3^2 + u_3^2 + V_4(x_3 + u_3 + e^k) \big]$$

For example, for $x_3 = 0$, $u_3 = -1$, with the outcomes $(e^k, p^k)$ indicated on the right:

$$W_3(0, -1) = \tfrac{1}{2} \big[ 0^2 + (-1)^2 + V_4(0 - 1 - 1) \big] \qquad (-1, \tfrac{1}{2})$$

$$+\; 0 \cdot \big[ 0^2 + (-1)^2 + V_4(0 - 1 + 0) \big] \qquad (0, 0)$$

$$+\; \tfrac{1}{2} \big[ 0^2 + (-1)^2 + V_4(0 - 1 + 1) \big] \qquad (1, \tfrac{1}{2})$$

$$= \tfrac{1}{2}(1 + 4) + 0 + \tfrac{1}{2}(1 + 0) = 3$$

W_3:          u_3 = -1   u_3 = 0   u_3 = 1
x_3 = -2         ∞         6.5       5.5
x_3 = -1        4.5        1.5       2.5
x_3 =  0         3          1         3
x_3 =  1        2.5        1.5       4.5
x_3 =  2        5.5        6.5        ∞

x_4    V_4
 -2     4
 -1     1
  0     0
  1     1
  2     4

(just for reference)

25 / 29

Optimal stochastic stepping (SDD)

W_3:          u_3 = -1   u_3 = 0   u_3 = 1   | V_3(x_3)   u_3^*(x_3)
x_3 = -2         ∞         6.5       5.5     |    5.5         1
x_3 = -1        4.5        1.5       2.5     |    1.5         0
x_3 =  0         3          1         3      |     1          0
x_3 =  1        2.5        1.5       4.5     |    1.5         0
x_3 =  2        5.5        6.5        ∞      |    5.5        -1

W_2:          u_2 = -1   u_2 = 0   u_2 = 1   | V_2(x_2)   u_2^*(x_2)
x_2 = -2         ∞         7.5       6.25    |   6.25         1
x_2 = -1        5.5        2.25      3.25    |   2.25         0
x_2 =  0        4.25       1.5       4.25    |   1.5          0
x_2 =  1        3.25       2.25      5.5     |   2.25         0
x_2 =  2        6.25       7.5        ∞      |   6.25        -1

26 / 29

Optimal stochastic stepping (SDD)

W_1:          u_1 = -1   u_1 = 0   u_1 = 1   | V_1(x_1)   u_1^*(x_1)
x_1 = -2         ∞         8.25      6.88    |   6.88         1
x_1 = -1        6.25       2.88      3.88    |   2.88         0
x_1 =  0        4.88       2.25      4.88    |   2.25         0
x_1 =  1        3.88       2.88      6.25    |   2.88         0
x_1 =  2        6.88       8.25       ∞      |   6.88        -1

W_0:          u_0 = -1   u_0 = 0   u_0 = 1   | V_0(x_0)   u_0^*(x_0)
x_0 =  2        7.56       8.88       ∞      |   7.56        -1

Trace back: $u_i^*(x_i)$. A feedback solution, not a time function.
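The tables above can be reproduced by iterating the discrete stochastic Bellman equation backwards. A minimal sketch using the example's grids and state-dependent probabilities:

```python
# Discrete stochastic Bellman recursion for the stepping example:
# x_{i+1} = x_i + u_i + e_i, J = E{x_4^2 + sum_i x_i^2 + u_i^2}.
import math

states, decisions, N = [-2, -1, 0, 1, 2], [-1, 0, 1], 4
# State-dependent pmf of e_i over the outcomes (-1, 0, 1).
prob = {-2: (0, .5, .5), -1: (0, .5, .5), 0: (.5, 0, .5),
         1: (.5, .5, 0),  2: (.5, .5, 0)}

V = {x: x ** 2 for x in states}            # V_4(x_4) = x_4^2
policy = []
for i in range(N - 1, -1, -1):             # i = 3, 2, 1, 0
    Vn, un = {}, {}
    for x in states:
        best_w, best_u = math.inf, None
        for u in decisions:
            W, feasible = x ** 2 + u ** 2, True
            for e, p in zip((-1, 0, 1), prob[x]):
                if p == 0:
                    continue
                xn = x + u + e
                if xn not in V:            # leaves the grid: infeasible
                    feasible = False
                    break
                W += p * V[xn]             # expected cost to go
            if feasible and W < best_w:
                best_w, best_u = W, u
        Vn[x], un[x] = best_w, best_u
    V, policy = Vn, [un] + policy

print(V[2], policy[0][2])   # V_0(2) = 7.5625, u_0^*(2) = -1
```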

27 / 29

Deterministic setting ($x_{i+1} = x_i + u_i$, $i = 0, \ldots, 3$): an open-loop sequence

i    :   0    1    2    3
u_i* :  -1   -1    0    0

Stochastic setting ($x_{i+1} = x_i + u_i + e_i$, $i = 0, \ldots, 3$): a feedback policy

x_i        u_0^*   u_1^*   u_2^*   u_3^*
 -2          -       1       1       1
 -1          -       0       0       0
  0          -       0       0       0
  1          -       0       0       0
  2         -1      -1      -1      -1

(only $x_0 = 2$ occurs at stage 0)

28 / 29

Concluding remarks

Discrete state and decision space.

Approximation. Grid covering state and decision space.

Curse of dimensionality - combinatorial explosion.

29 / 29
