tutorial: a unified modeling and algorithmic framework for

156
Tutorial: A Unified Modeling and Algorithmic Framework for Optimization under Uncertainty Informs Optimization Society Meeting March 23, 2018 Warren B. Powell Princeton University Department of Operations Research and Financial Engineering © 2016 Warren B. Powell, Princeton University

Upload: others

Post on 27-Oct-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tutorial: A Unified Modeling and Algorithmic Framework for

Tutorial: A Unified Modeling and Algorithmic Framework for Optimization under Uncertainty

Informs Optimization Society Meeting

March 23, 2018

Warren B. Powell

Princeton UniversityDepartment of Operations Research

and Financial Engineering

© 2016 Warren B. Powell, Princeton University

Page 2: Tutorial: A Unified Modeling and Algorithmic Framework for

Drug discovery

Designing molecules

» X and Y are sites where we can hang substituents to change the behavior of the molecule. We approximate the performance using a linear belief model:

0

ij ijsites i substituents j

Y X

» How to sequence experiments to learn the best molecule as quickly as possible?

Page 3: Tutorial: A Unified Modeling and Algorithmic Framework for

Real-time logisticsUber/Lyft» Provides real-time, on-demand

transportation.» Drivers are encouraged to enter or leave

the system using pricing signals and informational guidance.

Decisions:» How to price to get the right balance of

drivers relative to customers.» Real-time management of drivers.» Policies (rules for managing drivers,

customers, …)

Page 4: Tutorial: A Unified Modeling and Algorithmic Framework for

Ride sharing

RidersCars

Page 5: Tutorial: A Unified Modeling and Algorithmic Framework for

t t+1 t+2

Ride sharing

Page 6: Tutorial: A Unified Modeling and Algorithmic Framework for

t t+1 t+2

Ride sharing

Page 7: Tutorial: A Unified Modeling and Algorithmic Framework for

t t+1 t+2

Ride sharing

Page 8: Tutorial: A Unified Modeling and Algorithmic Framework for

Matching buyers with sellersNow we have a logistic curve for each origin-destination pair (i,j)

Number of offers for each (i,j) pair is relatively small.Need to generalize the learning across hundreds to thousands of markets.

0

0( , | )1

aij ij ij

aij ij ij

p aY

p a

eP p ae

Buyer Seller

Offered price

Page 9: Tutorial: A Unified Modeling and Algorithmic Framework for

Meeting variability with portfolios of generationwith mixtures of dispatchability

Page 10: Tutorial: A Unified Modeling and Algorithmic Framework for

Storage applicationsHow much energy to store in a battery to handle the volatility of wind and spot prices to meet demands?

Page 11: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

Canonical problemsElements of a dynamic modelAn energy storage illustrationSolution strategies and problem classesModeling uncertaintyDesigning policies

Page 12: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

Canonical problemsElements of a dynamic modelAn energy storage illustrationSolution strategies and problem classesModeling uncertaintyDesigning policies

Page 13: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Decision trees

Page 14: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Stochastic search (derivative based)» Basic problem:

» Stochastic gradient

» Asymptotic convergence:

» Finite time performance

max ( , )x F x W

1 1( , )n n n nn xx x F x W

*lim ( , ) ( , )nn F x W F x W

Manufacturing network (x=design)Unit commitment problem (x=day ahead decisions)Inventory system (x=design, replenishment policy)Battery system (x=choice of material)Patient treatment cost (x=drug, treatments)Trucking company (x=fleet size and mix)

,max ( , ) where is an algorithm (or policy)nF x W

Page 15: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Ranking and selection (derivative free)» Basic problem:

» We need to design a policy that finds a design given by

» We refer to this objective as maximizing the final reward.

1 ,...,max ( , )Mx x x F x W

( )nX S

,max ,NF x W

,Nx

Page 16: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Multi-armed bandit problems» We learn the reward from playing each

“arm”

» We need to find a policy for playing machine x that maximizes:

where

We refer to this problem as maximizing cumulative reward.

( )nX S

11

0

max ( ( ), )N

n n

n

F X S W

1 "winnings"State of knowledge

( )

n

n

n n

WSx X S Choose next “arm” to play

New information

What we know about each slot machine

Page 17: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

(Discrete) Markov decision processes» Bellman’s optimality equation

» This is also the same as solving

where the optimal policy has the form

1 1

1 1 1'

( ) min ( , ) ( ) |

min ( , ) ( ' | , ) ( )

t

t

t t a t t t t t

a t t t t t t ts

V S C S a V S S

C S a p S s S a V S

A

A

{ }( )1 1( ) arg min ( , ) ( ) | ,tt x t t t t t tX S C S x V S S xp

+ += +

00

min , ( ) |T

t t tt

E C S X S S

Page 18: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Optimal stopping I» Model:

• Exogenous process:

• Decision:

• Reward:

» Optimization problem:

where is a “stopping time” (or )

1 If we stop and sell at time ( )

0 Otherwise t

tX

1 2, ,..., Sequence of stock pricesTp p p

Price received if we stop at time tp t

max p X " measurable function"tF

Page 19: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Optimal stopping II» Model:

• Exogenous process:

• State:

• Policy:

» Optimization problem:

1 ( | )

0 Otherwise t t

t t

p pX S

1 2

1

, ,..., Sequence of stock prices(1 )

T

t t t

p p pp p p

0 0max ( | ) max ( | )

T T

t t t tt t

p X S p X S

1 if we are holding asset, 0 otherwise.( , , )

t

t t t t

RS R p p

=

=

Page 20: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Linear quadratic regulation (LQR)» A popular optimal control problem in engineering

involves solving:

» where:

» Possible to show that the optimal policy looks like:

where is a complicated function of Q and R.

0 ,...,

0min ( ) ( )

T

TT T

u u t t t tt

x Qx u Ru

1

State at time Control at time (must be measurable)

( , ) ( is random at time )

t

t t

t t t t t

x tu t F

x f x u w w t

*( )t t t tU x K x

tK

Page 21: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Stochastic programming» A (two-stage) stochastic programming problem

where

» This is the canonical form of stochastic programming, which might also be written over multiple periods:

0 0 0 0 0 1min ( , )x X c x Q x

1 10 1 ( ) ( ) 1 1( , ( )) min ( ) ( )x XQ x c x

0 01

min ( ) ( ) ( )T

t tt

c x p c x

Page 22: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Stochastic programming» A (two-stage) stochastic programming policy

where

» This is the canonical form of stochastic programming, which might also be written over multiple periods:

1min ( , )t tx X t t t tc x Q x

1 11 ( ) ( ) 1 1( , ( )) min ( ) ( )t tt t x X t tQ x c x

' '' 1

min ( ) ( ) ( )t t

t H

t t t tt ttt t

c x p c x

Page 23: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

A robust optimization problem would be written

» This means finding the best design x for the worst outcome w in an “uncertainty set”

» This has been adapted to multiperiod problems

( )min max ( , )x X w F x w W

0 1 0,..., ( ,..., ) ( ) ' ' '' 0

min max ( )T

T

x x w w t t tt

c w x

( )

,..., ( ,..., ) ( ) ' ' ''

min max ( )t t H t t H

t H

x x w w t t tt t

c w x

Page 24: Tutorial: A Unified Modeling and Algorithmic Framework for

Canonical problems

Why do we need a unified framework?

» The classical frameworks and algorithms are fragile.

» Small changes to problems invalidate optimality conditions, or make algorithmic approaches intractable.

» Practitioners need robust approaches that will provide high quality solutions for all problems.

Page 25: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

Canonical problemsElements of a dynamic modelAn energy storage illustrationSolution strategies problem classesModeling uncertaintyDesigning policies

Page 26: Tutorial: A Unified Modeling and Algorithmic Framework for

Storage applicationsHow much energy to store in a battery to handle the volatility of wind and spot prices to meet demands?

Page 27: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling

For deterministic problems, we speak the language of mathematical programming» Linear programming:

» For time-staged problems

min x cx

0Ax b

x

1 1

0

t t t t t

t t t

t

A x B x bD x u

x

0 ,...,0

minT

T

x x t tt

c x

Arguably Dantzig’s biggest contribution, more so than the simplex algorithm, was his articulation of optimization problems in a standard format, which has given algorithmic researchers a common language.

Page 28: Tutorial: A Unified Modeling and Algorithmic Framework for

Stochastic programming

Markov decision processes

Reinforcement learning

Optimal control

Model predictive

control

Robust optimization

Approximate dynamic

programming

Online computation

Simulation optimization

Stochastic search

Decision

analysis

Stochastic control

Simulation optimization

DynamicProgramming

andcontrol

Optimal learning

Banditproblems

Page 29: Tutorial: A Unified Modeling and Algorithmic Framework for

Stochastic programming

Markov decision processes

Reinforcement learning

Optimal control

Model predictive

control

Robust optimization

Approximate dynamic

programming

Online computation

Simulation optimization

Stochastic search

Decision

analysis

Stochastic control

Simulation optimization

DynamicProgramming

andcontrol

Optimal learning

Banditproblems

Page 30: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling

Before we can solve complex problems, we have to know how to think about them.

The biggest challenge when making decisions under uncertainty is modeling.

Min E {cx}Ax = bx > 0

Mathematician

Software

Organize classlibraries, and set up

communications and databases

Page 31: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling

We lack a standard language for modeling sequential, stochastic decision problems.» In the slides that follow, we propose to model problems

along five fundamental dimensions:

• State variables• Decision variables• Exogenous information• Transition function• Objective function

» This framework draws heavily from Markov decision processes and the control theory communities, but it is not the standard form used anywhere.

Page 32: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling dynamic problems

The state variable:

Controls community "Information state"Operations research/MDP/Computer science , , System state, where: Resource state (physical state) Loca

t

t t t t

t

x

S R I BR

tion/status of truck/train/plane Energy in storage Information state Prices Weather Belief state ("state of knowl

t

t

I

B

edge") Belief about traffic delays Belief about the status of equipment

Page 33: Tutorial: A Unified Modeling and Algorithmic Framework for

The state variable

Illustrating state variables» A deterministic graph

1

2

3

4

5

6

7

8

9

10

11

12.6

8.4

9.2 3.6

8.1

17.4

15.9

16.5 20.2

13.5

8.912.7

15.9

2.34.5

7.3

9.65.7

?tS ( ) 6t tS N

Page 34: Tutorial: A Unified Modeling and Algorithmic Framework for

The state variable

Illustrating state variables» A stochastic graph

1

2

3

4

5

6

7

8

9

10

11

3.6

8.1

13.5

8.9

12.712.6

8.4

9.2

15.9

?tS

Page 35: Tutorial: A Unified Modeling and Algorithmic Framework for

The state variable

Illustrating state variables» A stochastic graph

1

2

3

4

5

6

7

8

9

10

11

3.6

8.1

13.5

8.9

12.712.6

8.4

9.2

15.9

?tS , ,, 6, 12.7,8.9,13.5tt t t N j j

S N c

tR

tI

Page 36: Tutorial: A Unified Modeling and Algorithmic Framework for

The state variable

Illustrating state variables» A stochastic graph with left turn penalties

1

2

3

4

5

6

7

8

9

10

11

3.6

8.1

13.5

8.9

12.6

8.4

9.2

15.9

?tS , , 1, , 6, 12,7,8.9,13.5 ,3tt t t N j tj

S N c N

tR

tI

12.7( .7)

Page 37: Tutorial: A Unified Modeling and Algorithmic Framework for

The state variable

Variant of problem in Puterman (2005):» Find best path from 1 to 11 that minimizes the second

highest arc cost along the path:

» If the traveler is at node 9, what is her state?

1 3

2

4

6

5

7

8

8

1212

14

87

3

11 14

11

8

9

8

10

112

4

510

15

815

( , highest,second highest) (9,15,12)t tS N ?tS

Page 38: Tutorial: A Unified Modeling and Algorithmic Framework for

The state variables

What is a state variable?» Bellman’s classic text on dynamic programming (1957)

describes the state variable with:• “… we have a physical system characterized at any stage by a

small set of parameters, the state variables.”

» The most popular book on dynamic programming (Puterman, 2005, p.18) “defines” a state variable with the following sentence:

• “At each decision epoch, the system occupies a state.”

» Wikipedia:• A state variable is one of the set of variables that are used to

describe the mathematical ‘state’ of a dynamical system

Page 39: Tutorial: A Unified Modeling and Algorithmic Framework for

The state variable

My definition of a state variable:

» The first depends on a policy. The second depends only on the problem (and includes the constraints).

» Using either definition, all properly modeled problems are Markovian!

Page 40: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling dynamic problems

Decisions:Markov decision processes/Computer science Discrete actionControl theory Low-dimensional continuous vectorOperations research Usually a discrete or continuous but high-dimensional

t

t

t

a

u

x

vector of decisions.

At this point, we do not specify to make a decision.Instead, we define the function ( ) (or ( ) or ( )), where specifies the type of policy. " " carries informationabout the type of functi

howX s A s U s

.on , and any tunable parameters ff

Page 41: Tutorial: A Unified Modeling and Algorithmic Framework for

The decision variables

Styles of decisions» Binary

» Finite

» Continuous scalar

» Continuous vector

» Discrete vector

» Categorical

0,1x X

1,2,...,x X M

,x X a b

1( ,..., ), K kx x x x

1( ,..., ), K kx x x x

1( ,..., ), is a category (e.g. red/green/blue)I ix a a a

Page 42: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling dynamic problems

Exogenous information:

New information that first became known at time

ˆ ˆ ˆˆ = , , ,

ˆ Equipment failures, delays, new arrivals New drivers being hired to the network

ˆ New customer demands

t

t t t t

t

t

W t

R D p E

R

D

ˆ Changes in pricesˆ Information about the environment (temperature, ...) t

t

p

E

Note: Any variable indexed by t is known at time t. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.

1 2Below, we let represent a sequence of actual observations , ,....

refers to a sample realization of the random variable .t t

W WW W

Page 43: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling dynamic problems

The transition function

1 1

1 1

1 1

1 1

( , , )ˆ Inventoriesˆ Spot pricesˆ Market demands

Mt t t t

t t t t

t t t

t t t

S S S x W

R R x Rp p p

D D D

Also known as the:“System model”“State transition model”“Plant model”“Plant equation”“State equation”

“Transfer function”“Transformation function”“Law of motion”“Model”“transition function”

For many applications, these equations are unknown. This is known as “model-free” dynamic programming.

Page 44: Tutorial: A Unified Modeling and Algorithmic Framework for

1 00

max , ( ), |

T

t t t t tt

E C S X S W S

Modeling stochastic, dynamic problemsThe universal objective function

Given a system model (transition function)

and a stochastic process:

Now we just have to find the best policy.

Decision function (policy)State variable

Contribution function

Finding the best policy

Expectation over allrandom outcomes

1 1, , ( )Mt t t tS S S x W

Initial state variableNew information

0 1 2, , ,..., TS W W W

Page 45: Tutorial: A Unified Modeling and Algorithmic Framework for

1 00

max , ( ), |

T

t t t t tt

E C S X S W S

Modeling stochastic, dynamic problemsThe universal objective function

Given a system model (transition function)

and a stochastic process:

Now we just have to find the best policy.

1 1, , ( )Mt t t tS S S x W

0 1 2, , ,..., TS W W W

Page 46: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling Deterministic » Objective function

» Decision variables:

» Constraints: • at time t

• Transition function

Stochastic» Objective function

» Policy

» Constraints at time t

» Transition function

» Exogenous information

1 00

max , ( ), |

T

t t t t tt

E C S X S W S0 ,..., 0min

T

T

t tx x tc x

( )t t t tx X S

1 1t t t tR b B x 1 1, ,Mt t t tS S S x W

0 1 2( , , ,..., )TS W W W

0t t t

t

A x Rx

t

0 ,..., Tx x :X S

Page 47: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

Canonical problemsElements of a dynamic modelAn energy storage illustrationSolution strategies and problem classesModeling uncertaintyDesigning policies

Page 48: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

Consider a basic energy storage problem:

» We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.

Page 49: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

A model of our problem

» State variables

» Decision variables

» Exogenous information

» Transition function

» Objective function

Page 50: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

State variables

» We will present the full model, accumulating the information we need in the state variable.

» We will highlight information we need as we proceed. This information will make up our state variable.

E

G

B

L

Page 51: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

Decision variables

» Constraints;

E

G

B

L

, , , , ,EL EB GL GB BLt t t t t tx x x x x x

Page 52: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

Exogenous information

E

G

B

L

' '

' '

ˆ Change in energy from wind between 1 and

Noise in the price process between 1 and

Forecast of demand provided by vendor at time

Provided exogenously

Differenc

tp

tD

tt t

D Dt tt t tDt

E t t

t t

f D t

f f

e between actual demand and forecast

tW

Page 53: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

Transition function

E

G

B

L

1 1

1 0 1 1 2 2 1 1

1 , 1 1

1

ˆ

( )t t t

p T pt t t t t t t t t t t

D Dt t t t

battery batteryt t t

E E E

p p p p p

D f

R R x

1

2

t

t t

t

pp p

p

Page 54: Tutorial: A Unified Modeling and Algorithmic Framework for

Learning in stochastic optimization

Updating the demand parameter» Let be the new price and let

» We update our estimate using our recursive least squares equations:

1 11

1t t t t t

t

B pq q eg+ +

+

= -

( )1 1

11

1

( | ) , 1 ( )

1 ( )

t t t t t

Tt t t t t t

t

Tt t t t

F p p

B B B p p B

p B p

e q

g

g

+ +

++

+

= -

= -

= +

0 1 1 2 2( | ) ( )price Tt t t t t t t t t t tF p p p p p

Page 55: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

Objective function

E

G

B

L

( , ) GB GLt t t t tC S x p x x

1 00

min ( , ( ), ) | T

t t t t tt

C S X S W S

Page 56: Tutorial: A Unified Modeling and Algorithmic Framework for

1 0 1 1 2 2 1

1 , 1 1

1 11

1 ( )

pt t t t t t t t

D Dt t t t

t t t t tt

p p p p

D f

B x

An energy storage problem

State variables» Cost function

» Decision functionConstraints:

» Transition function

Price of electricitytp

1 2, , , ( , , ) , , ( , )Dt t t t t t t t t tS E L R p p p f B

Page 57: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

Elements of a dynamic modelAn energy storage illustrationSolution strategies and problem classesModeling uncertaintyDesigning policies

Page 58: Tutorial: A Unified Modeling and Algorithmic Framework for

Solution strategies and problem classesSpecial structure» There are special cases where we can solve

exactly. But not very many.

Sampled problems (SAA, scenario trees)» If the only problem is that we cannot compute the expectation, we

might solve a sampled approximation

Adaptive learning algorithms» This is what we have to turn to for most problems, and is the focus

of this tutorial.

max ( , )x F x W

ˆmax ( , )x F x W

Page 59: Tutorial: A Unified Modeling and Algorithmic Framework for

Solution strategies and problem classes

State independent problems» The problem does not depend on the state of the system.

» The only state variable is what we know (or believe) about the unknown function , called the belief state , so .

State dependent problems» Now the problem may depend on what we know at time t:

» Now the state is

max ( , ) min( , )xF x W p x W cx

( , )F x W

0max ( , , ) min( , )

tx R tC S x W p x W cx

Page 60: Tutorial: A Unified Modeling and Algorithmic Framework for

Solution strategies and problem classes

Offline (final reward)» We can iteratively search for the best solution, but only

care about the final answer.» Asymptotic formulation:

» Finite horizon formulation:

Online (cumulative reward)» We have to learn as we go

max ( , )xF x W

,max ( , )NF x W

11

0

max ( ( ), )N

n n

n

F X S W

üïïïïïýïïïïïþ

“ranking and selection”or

“stochastic search”

Page 61: Tutorial: A Unified Modeling and Algorithmic Framework for

Solution strategies and problem classes

Page 62: Tutorial: A Unified Modeling and Algorithmic Framework for

Solution strategies and problem classes

Page 63: Tutorial: A Unified Modeling and Algorithmic Framework for

Solution strategies and problem classes

Learning policies:Approximate dynamic programmingQ-learningSDDP…

Page 64: Tutorial: A Unified Modeling and Algorithmic Framework for

Solution strategies and problem classes

“Online” (cumulative reward) dynamic programming is recognized as the “dynamic programming problem,” but the entire literature on solving dynamic programs describes class (4) problems. This appears to be an open problem.

Page 65: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

Canonical problemsElements of a dynamic modelAn energy storage illustrationSolution strategies and problem classesModeling uncertaintyDesigning policies

Page 66: Tutorial: A Unified Modeling and Algorithmic Framework for

Modeling uncertainty

Observational uncertaintyPrognostic uncertainty (forecasting)Experimental noise/variabilityTransitional uncertaintyInferential uncertaintyModel uncertaintySystematic exogenous uncertaintyControl/implementation uncertaintyAlgorithmic noiseGoal uncertainty

Modeling uncertainty in the context of stochastic optimization is a relatively untapped area of research.

Page 67: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

Canonical problemsElements of a dynamic modelAn energy storage illustrationSolution strategies and problem classesModeling uncertaintyDesigning policies

Page 68: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policies

We have to start by describing what we mean by a policy.» Definition:

A policy is a mapping from a state to an action. … any mapping.

How do we search over an arbitrary space of policies?

Page 69: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policies

Two fundamental strategies:

1) Policy search – Search over a class of functions for making decisions to optimize some metric.

2) Lookahead approximations – Approximate the impact of a decision now on the future.

0( , )0

max , ( | ) |f f

T

t t tf Ft

E C S X S S

*' ' ' 1

' 1( ) arg max ( , ) max ( , ( )) | | ,

t

T

t t x t t t t t t t tt t

X S C S x C S X S S S x

Page 70: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policiesPolicy search:1a) Policy function approximations (PFAs)

• Lookup tables – “when in this state, take this action”

• Parametric functions– Order-up-to policies: if inventory is less than s, order up to S.– Affine policies -– Neural networks

• Locally/semi/non parametric– Requires optimizing over local regions

1b) Cost function approximations (CFAs)• Optimizing a deterministic model modified to handle uncertainty

(buffer stocks, schedule slack)

( )( | ) arg max ( , | )

t t

CFAt t tx

X S C S x

X

( | )PFAt tx X S

( | ) ( )PFAt t f f t

f Fx X S S

Page 71: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policiesLookahead approximations – Approximate the impact of a decision now on the future:» An optimal policy (based on looking ahead):

2a) Approximating the value of being in a downstream state using machine learning (“value function approximations”)

*1 1

1 1

( ) arg max ( , ) ( ) | ,

( ) arg max ( , ) ( ) | ,

arg max ( , ) ( )

t

t

t

t t x t t t t t t

VFAt t x t t t t t t

x xx t t t t

X S C S x V S S x

X S C S x V S S x

C S x V S

*' ' ' 1

' 1( ) arg max ( , ) max ( , ( )) | | ,

t

T

t t x t t t t t t t tt t

X S C S x C S X S S S x

Page 72: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policiesLookahead approximations – Approximate the impact of a decision now on the future:» An optimal policy (based on looking ahead):

2a) Approximating the value of being in a downstream state using machine learning (“value function approximations”)

*1 1

1 1

( ) arg max ( , ) ( ) | ,

( ) arg max ( , ) ( ) | ,

arg max ( , ) ( )

t

t

t

t t x t t t t t t

VFAt t x t t t t t t

x xx t t t t

X S C S x V S S x

X S C S x V S S x

C S x V S

*' ' ' 1

' 1( ) arg max ( , ) max ( , ( )) | | ,

t

T

t t x t t t t t t t tt t

X S C S x C S X S S S x

Page 73: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policiesLookahead approximations – Approximate the impact of a decision now on the future:» An optimal policy (based on looking ahead):

2a) Approximating the value of being in a downstream state using machine learning (“value function approximations”)

*1 1

1 1

( ) arg max ( , ) ( ) | ,

( ) arg max ( , ) ( ) | ,

arg max ( , ) ( )

t

t

t

t t x t t t t t t

VFAt t x t t t t t t

x xx t t t t

X S C S x V S S x

X S C S x V S S x

C S x V S

*' ' ' 1

' 1( ) arg max ( , ) max ( , ( )) | | ,

t

T

t t x t t t t t t t tt t

X S C S x C S X S S S x

Page 74: Tutorial: A Unified Modeling and Algorithmic Framework for

The ultimate lookahead policy is optimal*

' ' ' 1' 1

( ) arg max ( , ) max ( , ( )) | | ,

t

T

t t x t t t t t t t tt t

X S C S x C S X S S S x

Designing policies

Page 75: Tutorial: A Unified Modeling and Algorithmic Framework for

*' ' ' 1

' 1( ) arg max ( , ) max ( , ( )) | | ,

t

T

t t x t t t t t t t tt t

X S C S x C S X S S S x

Designing policies

The ultimate lookahead policy is optimal

Expectations that we cannot compute

Maximization that we cannot compute

Page 76: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policies

The ultimate lookahead policy is optimal

» 2b) Instead, we have to solve an approximation called the lookahead model:

» A lookahead policy works by approximating the lookahead model.

*' ' ' 1

' 1( ) arg max ( , ) max ( , ( )) | | ,

t

T

t t x t t t t t t t tt t

X S C S x C S X S S S x

*' ' ' , 1

' 1( ) arg max ( , ) max ( , ( )) | | ,

t

t H

t t x t t tt tt tt t t t tt t

X S C S x C S X S S S x

Page 77: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policiesTypes of lookahead approximations » One-step lookahead – Widely used in pure learning

policies:• Bayes greedy/naïve Bayes• Expected improvement• Value of information (knowledge gradient)

» Multi-step lookahead• Deterministic lookahead, also known as model predictive

control, rolling horizon procedure• Stochastic lookahead:

– Two-stage (widely used in stochastic linear programming)– Multistage

» Monte carlo tree search (MCTS) for discrete action spaces

» Multistage scenario trees (stochastic linear programming) – typically not tractable.

Page 78: Tutorial: A Unified Modeling and Algorithmic Framework for

1) Policy function approximations (PFAs)» Lookup tables, rules, parametric/nonparametric functions

2) Cost function approximation (CFAs)»

3) Policies based on value function approximations (VFAs)»

4) Direct lookahead policies (DLAs)» Deterministic lookahead/rolling horizon proc./model predictive control

» Chance constrained programming

» Stochastic lookahead /stochastic prog/Monte Carlo tree search

» “Robust optimization”

Four (meta)classes of policies

( )( | ) arg max ( , | )

t t

CFAt t tx

X S C S x

X

( ) arg max ( , ) ( , )t

VFA x xt t x t t t t t tX S C S x V S S x

' '' 1

( ) arg max ( , ) ( ) ( ( ), ( ))t

TLA St t tt tt tt tt

t tX S C S x p C S x

xtt , xt ,t1,..., xt ,tT

,' '( ),..., ' 1

( ) arg max min ( , ) ( ( ), ( ))ttt t t H

TLA ROt t tt tt tt ttw Wx x t t

X S C S x C S w x w

[ ( )] 1t tP A x f W

Polic

y se

arch

,' ',..., ' 1

( ) arg max ( , ) ( , )

tt t t H

TLA Dt t tt tt tt ttx x t t

X S C S x C S x

Look

ahea

dap

prox

imat

ions

Page 79: Tutorial: A Unified Modeling and Algorithmic Framework for

1) Policy function approximations (PFAs)» Lookup tables, rules, parametric/nonparametric functions

2) Cost function approximation (CFAs)»

3) Policies based on value function approximations (VFAs)»

4) Direct lookahead policies (DLAs)» Deterministic lookahead/rolling horizon proc./model predictive control

» Chance constrained programming

» Stochastic lookahead /stochastic prog/Monte Carlo tree search

» “Robust optimization”

Four (meta)classes of policies

( )( | ) arg max ( , | )

t t

CFAt t tx

X S C S x

X

,' ',..., ' 1

( ) arg max ( , ) ( , )

tt t t H

TLA Dt t tt tt tt ttx x t t

X S C S x C S x

( ) arg max ( , ) ( , )t

VFA x xt t x t t t t t tX S C S x V S S x

' '' 1

( ) arg max ( , ) ( ) ( ( ), ( ))t

TLA St t tt tt tt tt

t tX S C S x p C S x

xtt , xt ,t1,..., xt ,tT

,' '( ),..., ' 1

( ) arg max min ( , ) ( ( ), ( ))ttt t t H

TLA ROt t tt tt tt ttw Wx x t t

X S C S x C S w x w

[ ( )] 1t tP A x f W

Func

tion

appr

ox.

Page 80: Tutorial: A Unified Modeling and Algorithmic Framework for

1) Policy function approximations (PFAs)» Lookup tables, rules, parametric/nonparametric functions

2) Cost function approximation (CFAs)»

3) Policies based on value function approximations (VFAs)»

4) Direct lookahead policies (DLAs)» Deterministic lookahead/rolling horizon proc./model predictive control

» Chance constrained programming

» Stochastic lookahead /stochastic prog/Monte Carlo tree search

» “Robust optimization”

Four (meta)classes of policies

( )( | ) arg max ( , | )

t t

CFAt t tx

X S C S x

X

,' ',..., ' 1

( ) arg max ( , ) ( , )

tt t t H

TLA Dt t tt tt tt ttx x t t

X S C S x C S x

( ) arg max ( , ) ( , )t

VFA x xt t x t t t t t tX S C S x V S S x

' '' 1

( ) arg max ( , ) ( ) ( ( ), ( ))t

TLA St t tt tt tt tt

t tX S C S x p C S x

xtt , xt ,t1,..., xt ,tT

,' '( ),..., ' 1

( ) arg max min ( , ) ( ( ), ( ))ttt t t H

TLA ROt t tt tt tt ttw Wx x t t

X S C S x C S w x w

[ ( )] 1t tP A x f W

Imbe

dded

opt

imiz

atio

n

Page 81: Tutorial: A Unified Modeling and Algorithmic Framework for

Learning problems

Functions we have to learn:» Approximating the objective

.» Designing a policy .» A value function approximation

.» Designing a cost function approximation:

• The objective function ̅ , | .• The constraints |

» Approximating the transition function

Page 82: Tutorial: A Unified Modeling and Algorithmic Framework for

Approximation strategies

Approximation strategies» Lookup tables

• Independent beliefs • Correlated beliefs

» Linear parametric models• Linear models • Sparse-linear• Tree regression

» Nonlinear parametric models• Logistic regression• Neural networks

» Nonparametric models• Gaussian process regression• Kernel regression• Support vector machines• Deep neural networks

Page 83: Tutorial: A Unified Modeling and Algorithmic Framework for

Designing policies

Finding the best policy» We have to first articulate our classes of policies

» So minimizing over means:

» We then have to pick an objective such as

or

10 0

max , ( | ) ( | ),T T

t t t tt t

C S X S F X S W

, , ,

Parameters that characterize each family.f

f PFAs CFAs VFAs DLAs

, ff

max , ( , )T T TC S X F X W

Page 84: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

The four classes of policies» Policy function approximations (PFAs)» Cost function approximations (CFAs)» Value function approximations (VFAs)» Direct lookahead policies (DLAs)» A hybrid lookahead/CFA

Page 85: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

The four classes of policies» Policy function approximations (PFAs)» Cost function approximations (CFAs)» Value function approximations (VFAs)» Direct lookahead policies (DLAs)» A hybrid lookahead/CFA

Page 86: Tutorial: A Unified Modeling and Algorithmic Framework for

Policy function approximations

Battery arbitrage – When to charge, when to discharge, given volatile LMPs

Page 87: Tutorial: A Unified Modeling and Algorithmic Framework for

0.00

20.00

40.00

60.00

80.00

100.00

120.00

140.00

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71

Grid operators require that batteries bid charge and discharge prices, an hour in advance.

We have to search for the best values for the policy parameters

DischargeCharge

Charge Dischargeand .

Policy function approximations

Page 88: Tutorial: A Unified Modeling and Algorithmic Framework for

Policy function approximations

Our policy function might be the parametric model (this is nonlinear in the parameters):

charge

charge discharge

charge

1 if ( | ) 0 if

1 if

t

t t

t

pX S p

p

Energy in storage:

Price of electricity:

Page 89: Tutorial: A Unified Modeling and Algorithmic Framework for

Policy function approximations

Finding the best policy» We need to maximize

» We cannot compute the expectation, so we run simulations:

DischargeCharge

0

max ( ) , ( | )T

tt t t

tF C S X S

Page 90: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

The four classes of policies» Policy function approximations (PFAs)» Cost function approximations (CFAs)» Value function approximations (VFAs)» Direct lookahead policies (DLAs)» A hybrid lookahead/CFA

Page 91: Tutorial: A Unified Modeling and Algorithmic Framework for

Cost function approximations

Lookup table» We can organize potential catalysts into groups» Scientists using domain knowledge can estimate

correlations in experiments between similar catalysts.

Page 92: Tutorial: A Unified Modeling and Algorithmic Framework for

92

Cost function approximations

Correlated beliefs: Testing one material teaches us about other materials

1 2 3 4 4 5

Page 93: Tutorial: A Unified Modeling and Algorithmic Framework for

Cost function approximations

Designing a policy» Examples of policies for making decisions:

• Interval estimation:

• Upper confidence bounding

• Thompson sampling:

• Knowledge gradient (expected value of information):

( | ) arg max Std. dev. of IE n IE n IE n n nx x x x xX S

ˆ ˆ( ) arg max ( , )TS n n n n nx x x x xX S N

log( | ) arg max No. of times is tested.UCB n UCB n UCB nx x xn

x

nX S N xN

, 1( ) max ( , ( )) max ( , )KG n n ny yx E F y B x F y B

Page 94: Tutorial: A Unified Modeling and Algorithmic Framework for

94

Cost function approximations

Picking means we are evaluating each choice at the mean.

1 2 3 4 4 5

Page 95: Tutorial: A Unified Modeling and Algorithmic Framework for

95

Cost function approximations

Picking means we are evaluating each choice at the 95th percentile.

1 2 3 4 4 5

Page 96: Tutorial: A Unified Modeling and Algorithmic Framework for

Cost function approximationsOptimizing the policy» We optimize to maximize:

where

Notes:» This can handle any belief model,

including correlated beliefs, nonlinear belief models.

» All we require is that we be able to simulate a policy.

( | ) arg max ( , )n IE n IE n IE n n n nx x x x xx X S S

,max ( ) ,IEIE NF F x W

Page 97: Tutorial: A Unified Modeling and Algorithmic Framework for

Cost function approximations

Inventory management

» How much product should I order to anticipate future demands?

» Need to accommodate different sources of uncertainty.

• Market behavior• Transit times• Supplier uncertainty• Product quality

Page 98: Tutorial: A Unified Modeling and Algorithmic Framework for

Cost function approximations

Imagine that we want to purchase parts from different suppliers. Let be the amount of product we purchase at time t from supplier p to meet forecasted demand . We would solve

» This assumes our demand forecast is accurate.

tD

tpx

tD

( ) arg maxtt t x p tp

p PX S c x

subject to

0

tp tp P

tp p

tp

x D

x u

x

t

Page 99: Tutorial: A Unified Modeling and Algorithmic Framework for

Imagine that we want to purchase parts from different suppliers. Let be the amount of product we purchase at time t from supplier p to meet forecasted demand . We would solve

» This is a “parametric cost function approximation”

( )

Reserve

buffer

( | ) arg max

subject to

tt t p tpx

p P

tp tp P

tp p

tp

X S c x

x D

x u

x

( )t

Cost function approximations

tD

tpx

Reserve

Buffer stock

Page 100: Tutorial: A Unified Modeling and Algorithmic Framework for

A general way of creating CFAs:» Define our policy:

subject to

» We tune by optimizing:

Cost function approximations

( ) arg max ( , | )t x t tX C S x

0

min ( ) ( , ( ))T

t tt

F C S X

( )Ax b

Parametricallymodified costs

Parametricallymodified constraints

Page 101: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

The four classes of policies» Policy function approximations (PFAs)» Cost function approximations (CFAs)» Value function approximations (VFAs)» Direct lookahead policies (DLAs)» A hybrid lookahead/CFA

Page 102: Tutorial: A Unified Modeling and Algorithmic Framework for

Locational marginal prices on the gridLMPs – Locational marginal prices

$977/MW !!!

Page 103: Tutorial: A Unified Modeling and Algorithmic Framework for

Slide 103

Grid level storage Optimizing battery storage

Slide 103

Page 104: Tutorial: A Unified Modeling and Algorithmic Framework for

Slide 104

Grid level storage Optimizing battery storage

Slide 104

Page 105: Tutorial: A Unified Modeling and Algorithmic Framework for

Slide 105

Value function approximations

Slide 105

Page 106: Tutorial: A Unified Modeling and Algorithmic Framework for

Slide 106

Value function approximationsMonday

Time :05 :10 :15 :20

Slide 106

Page 107: Tutorial: A Unified Modeling and Algorithmic Framework for

Slide 107

Value function approximationsMonday

Time :05 :10 :15 :20

Slide 107

Page 108: Tutorial: A Unified Modeling and Algorithmic Framework for

Slide 108

Value function approximationsMonday

Time :05 :10 :15 :20

Slide 108

Page 109: Tutorial: A Unified Modeling and Algorithmic Framework for

Exploiting concavity

Derivatives are used to estimate a piecewise linear approximation

( )t tV R

tR

Slide 109

Page 110: Tutorial: A Unified Modeling and Algorithmic Framework for

Approximate value iteration

Step 1: Start with a pre-decision state Step 2: Solve the deterministic optimization using

an approximate value function:

to obtain . Step 3: Update the value function approximation

Step 4: Obtain Monte Carlo sample of andcompute the next pre-decision state:

Step 5: Return to step 1.

, 1 ,1 1 1 1 1 1 ˆ( ) (1 ) ( )n x n n x n n

t t n t t n tV S V S v

1 ,ˆ min ( , ) ( ( , ) )n n n M x nt x t t t t t tv C S x V S S x

ntS

( )ntW

1 1( , , ( ))n M n n nt t t tS S S x W

Simulation

Deterministicoptimization

Recursivestatistics

“on policy learning”

nx

Page 111: Tutorial: A Unified Modeling and Algorithmic Framework for

Approximate value iteration

Step 1: Start with a pre-decision state Step 2: Solve the deterministic optimization using

an approximate value function:

to obtain . Step 3: Update the value function approximation

Step 4: Obtain Monte Carlo sample of andcompute the next pre-decision state:

Step 5: Return to step 1.

, 1 ,1 1 1 1 1 1 ˆ( ) (1 ) ( )n x n n x n n

t t n t t n tV S V S v

1 ,ˆ min ( , ) ( ( , ) )n n n M x nt x t t t t t tv C S x V S S x

ntS

( )ntW

1 1( , , ( ))n M n n nt t t tS S S x W

Simulation

Deterministicoptimization

Recursivestatistics

nx

Page 112: Tutorial: A Unified Modeling and Algorithmic Framework for

1200000

1300000

1400000

1500000

1600000

1700000

1800000

1900000

0 100 200 300 400 500 600 700 800 900 1000

Obj

ectiv

e fu

nctio

n

Iterations

Approximate dynamic programming

… a typical performance graph.

Page 113: Tutorial: A Unified Modeling and Algorithmic Framework for

The value of grid level storage

Without storage

Page 114: Tutorial: A Unified Modeling and Algorithmic Framework for

The value of grid level storage

With storage

Page 115: Tutorial: A Unified Modeling and Algorithmic Framework for

“I think you give a too rosy a picture of ADP….”Andy Barto, in comments on a paper (2009)

“Is the RL glass half full, or half empty?” Rich Sutton, NIPS workshop, (2014)

Approximate value functions can work very well, but you need structure to guide the learning process. ADP needs benchmarks and careful tuning.

Page 116: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

The four classes of policies» Policy function approximations (PFAs)» Cost function approximations (CFAs)» Value function approximations (VFAs)» Direct lookahead policies (DLAs)» A hybrid lookahead/CFA

Page 117: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Planning your next chess move:

» You put your finger on the piece while you think about moves into the future. This is a lookahead policy, illustrated for a problem with discrete actions.

Page 118: Tutorial: A Unified Modeling and Algorithmic Framework for
Page 119: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Decision trees:

Page 120: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Modeling lookahead policies» Lookahead policies solve a lookahead model, which is an

approximation of the future.» It is important to understand the difference between the:

• Base model – this is the model we are trying to solve by finding the best policy. This is usually some form of simulator.

• The lookahead model, which is our approximation of the future to help us make better decisions now.

» The base model is typically a simulator, or it might be the real world.

Page 121: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Lookahead models use five classes of approximations:» Horizon truncation – Replacing a longer horizon problem

with a shorter horizon» Stage aggregation – Replacing multistage problems with

two-stage approximation.» Outcome aggregation/sampling – Simplifying the

exogenous information process» Discretization – Of time, states and decisions» Dimensionality reduction – We may ignore some variables

(such as forecasts) in the lookahead model that we capture in the base model (these become latent variables in the lookahead model).

Page 122: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Lookahead policies are the trickiest to model:» We create “tilde variables” for the lookahead model:

» All variables are indexed by t (when the lookaheadmodel is being generated) and t’ (the time within the lookahead model).

, '

, '

, , 1 ,

Approximated state variable (e.g coarse discretization)Decision we plan on implementing at time ' when we are

planning at time , ' , 1,...,

, ,...,

t t

t t

t t t t t t t H

Sx t

t t t t t H

x x x x

, '

, '

, '

Approximation of information processForecast of costs at time ' made at time

Forecast of right hand sides for time ' made at time

t t

t t

t t

Wc t t

b t t

Page 123: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

We can use this notation to create a policy based on our lookahead model:

» Simplest lookahead is deterministic.

*' ' ' , 1

' 1( ) arg max ( , ) max ( , ( )) | | ,

t H

t t t t tt tt tt t t t tt t

X S C S x C S X S S S x

Restricted/simplified set of policies

Simplified/discretized set of state variables

Simplified/discretized set of decision variables

Sampled set of realizations (or deterministic);Aggregated staging of decisions and information

Limited horizon

Page 124: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Deterministic lookahead

Stochastic lookahead (with two-stage approximation)

' '' 1

( ) arg max ( , ) ( , )T

LA Dt t tt tt tt tt

t t

X S C S x C S x

xtt , xt ,t1,..., xt ,tT

' '' 1

( ) arg max ( , ) ( ) ( ( ), ( ))t

TLA St t tt tt tt tt

t tX S C S x p C S x

xtt , xt ,t1,..., xt ,tT

Scenario trees

Page 125: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Lookahead policies peek into the future» Optimize over deterministic lookahead model

The real process

. . . .

The

look

ahea

dm

odel

t 1t 2t 3t

Page 126: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Lookahead policies peek into the future» Optimize over deterministic lookahead model

The real process

. . . .

The

look

ahea

dm

odel

t 1t 2t 3t

Page 127: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Lookahead policies peek into the future» Optimize over deterministic lookahead model

The real process

. . . .

The

look

ahea

dm

odel

t 1t 2t 3t

Page 128: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Lookahead policies peek into the future» Optimize over deterministic lookahead model

The real process

. . . .

The

look

ahea

dm

odel

t 1t 2t 3t

Page 129: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Stochastic lookahead» Here, we approximate the information model by using a

Monte Carlo sample to create a scenario tree: 1am 2am 3am 4am 5am …..

Change in wind speed

Change in wind speed

Change in wind speed

Page 130: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

We can then simulate this lookahead policy over time:

. . . .

t 1t 2t 3t

The

look

ahea

dm

odel

The base model

Page 131: Tutorial: A Unified Modeling and Algorithmic Framework for

. . . .

t 1t 2t 3t

Lookahead policies

We can then simulate this lookahead policy over time:

The

look

ahea

dm

odel

The base model

Page 132: Tutorial: A Unified Modeling and Algorithmic Framework for

. . . .

t 1t 2t 3t

Lookahead policies

We can then simulate this lookahead policy over time:

The

look

ahea

dm

odel

The base model

Page 133: Tutorial: A Unified Modeling and Algorithmic Framework for

. . . .

t 1t 2t 3t

Lookahead policies

We can then simulate this lookahead policy over time:

The

look

ahea

dm

odel

The base model

Page 134: Tutorial: A Unified Modeling and Algorithmic Framework for

Learning damaged networks

XX

X

134callcall

call

call

134

call

call

Which way should the truck go? X

X

X

call

Page 135: Tutorial: A Unified Modeling and Algorithmic Framework for

Decision Outcome Decision Outcome Decision

Page 136: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Monte Carlo tree search:

C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S. Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–49, March 2012.

Page 137: Tutorial: A Unified Modeling and Algorithmic Framework for
Page 138: Tutorial: A Unified Modeling and Algorithmic Framework for

Lookahead policies

Notes:

» Solving stochastic lookahead policies can be hard!

» … but this is still just a lookahead policy which is a class of rolling horizon heuristic.

» Even if solving the lookahead model is hard, an optimal solution of a lookahead model (even a stochastic one) is (with rare exceptions) not an optimal policy.

Page 139: Tutorial: A Unified Modeling and Algorithmic Framework for

Outline

The four classes of policies» Policy function approximations (PFAs)» Cost function approximations (CFAs)» Value function approximations (VFAs)» Direct lookahead policies (DLAs)» A hybrid lookahead/CFA

Page 140: Tutorial: A Unified Modeling and Algorithmic Framework for

Parametric cost function approximation

An energy storage problem:

Page 141: Tutorial: A Unified Modeling and Algorithmic Framework for

Parametric cost function approximationForecasts evolve over time as new information arrives:

Actual

Rolling forecasts, updated each hour.

Forecast made at midnight:

Page 142: Tutorial: A Unified Modeling and Algorithmic Framework for

Parametric cost function approximation

Benchmark policy – Deterministic lookahead

' ' ' '

' ' '

' ' '

max' ' '

' ' 'arg

' 'arg

' '

wd rd gd Dtt tt tt tt

gd gr Gtt tt tt

rd rgtt tt tt

wr grtt tt ttwr wd Ett tt ttwr gr ch ett ttrd rg disch ett tt

x x x f

x x f

x x R

x x R R

x x f

x x

x x

b

g

g

+ + £

+ £

+ £

+ £ -

+ £

+ £

+ £

Page 143: Tutorial: A Unified Modeling and Algorithmic Framework for

Parametric cost function approximation

Parametric cost function approximations» Replace the constraint

with:» Lookup table modified forecasts (one adjustment term for

each time in the future):

» Exponential function for adjustments (just two parameters)

» Constant adjustment (one parameter)

't t

' ' ' 'wr wd Ett tt t t ttx x f

'wrttx '

wdttx

2 ( ' )

' ' 1 '

t twr wd Ett tt ttx x e f

' ' 'wr wd Ett tt ttx x f

Page 144: Tutorial: A Unified Modeling and Algorithmic Framework for

Parametric cost function approximation

Optimizing the CFA:» Let be a simulation of our policy given by

» We then compute the gradient with respect to

» The parameter is found using a classical stochastic gradient algorithm:

We tested several stepsize formulas and found that ADAGRAD worked best:

( , ) ( , )F F

( , )F

0

( , ) ( ), ( ( ) | )T

t t tt

F C S X S

1 1( , )n n n nnF

( )21

' 0

( , )t

n t x t ttt

G F x WGh

ae +

=

= = +

å

Page 145: Tutorial: A Unified Modeling and Algorithmic Framework for

Parametric cost function approximation

Optimizing the CFA:» We compute the gradient by applying the chain rule

where the interaction from one time period to the next is captured using

» Assuming there are no integer variables, these equations are quite easy to compute.

Page 146: Tutorial: A Unified Modeling and Algorithmic Framework for

0fs = 10fs =

20fs = 30fs =Lookup table

Constant parameterExponential function

Page 147: Tutorial: A Unified Modeling and Algorithmic Framework for

Parametric cost function approximation

Improvement over deterministic benchmark:

Lookup tableExponential

Constant

Page 148: Tutorial: A Unified Modeling and Algorithmic Framework for

Parametric cost function approximation

The parametric CFA represents a fundamental rethinking of the modeling of stochastic programming problems:» From thinking of the lookahead model as the objective

function:

» To acknowledging that the lookahead model is a policy for solving the base model…

…. which is a simulator where we do not have to make any of the standard approximations required in stochastic programming.

0 01

max ( ) ( ) ( )T

t tt

c x p c x

1 00

max , ( | ), |T

t t t t tt

E C S X S W S

Page 149: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

Consider a basic energy storage problem:

» We are going to show that with minor variations in the characteristics of this problem, we can make each class of policy work best.

Page 150: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

We can create distinct flavors of this problem:» Problem class 1 – Best for PFAs

• Highly stochastic (heavy tailed) electricity prices• Stationary data

» Problem class 2 – Best for CFAs• Stochastic prices and wind (but not heavy tailed)• Stationary data

» Problem class 3 - Best for VFAs• Stochastic wind and prices (but not too random)• Time varying loads, but inaccurate wind forecasts

» Problem class 4 – Best for deterministic lookaheads• Relatively low noise problem with accurate forecasts

» Problem class 5 – A hybrid policy worked best here• Stochastic prices and wind, nonstationary data, noisy forecasts.

Page 151: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problemThe policies» The PFA:

• Charge battery when price is below p1• Discharge when price is above p2

» The CFA• Optimize over a horizon H; maintain upper and lower bounds (u, l)

for every time period except the first (note that this is a hybrid with a lookahead).

» The VFA• Piecewise linear, concave value function in terms of energy, indexed

by time.» The lookahead (deterministic)

• Optimize over a horizon H (only tunable parameter) using forecasts of demand, prices and wind energy

» The lookahead CFA• Use a lookahead policy (deterministic), but with a tunable parameter

that improves robustness.

Page 152: Tutorial: A Unified Modeling and Algorithmic Framework for

An energy storage problem

Each policy is best on certain problems» Results are percent of posterior optimal solution

» … any policy might be best depending on the data.Joint research with Prof. Stephan Meisel, University of Muenster, Germany.

Page 153: Tutorial: A Unified Modeling and Algorithmic Framework for

Stochastic programming

Markov decision processes

Reinforcement learning

Optimal control

Model predictive

control

Robust optimization

Approximate dynamic

programming

Online computation

Simulation optimization

Stochastic search

Decision

analysis

Stochastic control

Simulation optimization

DynamicProgramming

andcontrol

Optimal learning

Banditproblems

© Warren Powell 2017

Page 154: Tutorial: A Unified Modeling and Algorithmic Framework for

Multi-armed bandits/optimal learning

Reinforcementlearning/ADP

Stochastic search

Simulationoptimization

Stochastic programming

Optimal controlMarkov decisionprocesses

Lookahead policies (DLAs)

Stochastic gradients

© Warren Powell 2017

Ranking andselection

Policy search(PFAs)

Cost functionapprox. (CFAs)

Value function approx. (VFAs)

Page 155: Tutorial: A Unified Modeling and Algorithmic Framework for

Theory

Applications

Computation

Modeling

Page 156: Tutorial: A Unified Modeling and Algorithmic Framework for

Thank you!http://www.castlelab.princeton.edu/

A tutorial on this topic is available at the top of

http://www.castlelab.princeton.edu/jungle/