dynamic programming and optimal control€¦ · overview dynamic programming algorithm (dpa)...

||

Final Recitation

09.01.2017Rajan Gill, Weixuan Zhang 1

Dynamic Programming and Optimal

Control

||

Overview

Dynamic Programming Algorithm (DPA)

Deterministic Systems and the Shortest Path (SP)

Infinite Horizon Problems, Stochastic SP

Deterministic Continuous-Time Optimal Control


Outline

|| 09.01.2017Rajan Gill, Weixuan Zhang 3

Overview

||

Basic Problem

Alternative Problem Formulation

Reformulations

Time lag, correlated disturbances, forecasts, …



||

Basic idea: Principle of Optimality

Algorithm:

Minimizing the recursion equation for each and gives

us the optimal policy:



||

Consider now problems where

is a finite set,

No disturbance .

Convert DP to SP (and vice versa)

DP:

SP:

Viterbi Algorithm


Deterministic Systems and the Shortest Path

||

DP finds all optimal paths to end node. Sometimes not

needed.

Exploit structure of these problems to come up with

efficient algorithms for solving shortest path problems:


Deterministic Systems and the Shortest Path

Label Correcting Algorithm

Step 1: Remove a node i from OPEN and for each child j of i,

execute step 2.

Step 2: If di + aij < min{dj,dT}, set dj = di + aij and set i to be the

parent of j. In addition, if j≠T, place j in OPEN if it is not already in

OPEN, while if j=T, set dT to the new value di+aiT.

Step 3: If OPEN is empty, terminate; else go to step 1.

||

Consider time-invariant system with infinite horizon:

Optimal policy is stationary:

Optimal cost solves Bellman’s equation:


Infinite Horizon Problems

||

Stochastic Shortest Path problems:

Cost-free termination state :

a policy and an integer such that:

Infinite Horizon Problems: Stochastic Shortest Path


||

Value iteration:

Step 1: Choose an initial guess .

Step 2: Update cost values with the value iteration formula:

Step 3: If converged for all , terminate. Else go to step 2.



||

Policy iteration:

Step 1: Choose an initial stationary policy .

Step 2: Policy evaluation (compute cost of current policy):

Step 3: Policy improvement (find a better policy):

Step 4: If for all , terminate. Else go to step 2.



(lin. sys. of eq.)

||

Linear programming:

Optimal cost solves the following linear program:

For each admissible pair we get one linear constraint



||

Discounted problems:

Discounted cost:

Infinite Horizon Problems: Discounted Problems


||

Basic Problem

No noise: deterministic.

Goal: Find an admissible control trajectory , ,

and corresponding state trajectory which minimize

the cost.

Solution is found by HJB or Minimum Principle.



||

Hamilton-Jacobi-Bellman Equation (cont.-time analog to DPA)

“Derived” by discretizing and taking limits of DPA.

Partial differential equation. Very hard to solve!

Usually guess a solution and proof that is satisfies HJB.

Sufficient condition.

Optimal policy: that minimize RHS of HJB.



||

Minimum Principle (Only finds optimal solution for a specific initial condition )

Define Hamiltonian:

Then:

Only necessary conditions.

Various extensions (e.g. fixed terminal state, …).



dynamic programming and optimal control€¦ · overview dynamic programming algorithm (dpa)...

Documents