planning in factored hybrid-mdpsnau/adversarial/program/monday/hauskrecht.pdfplanning in factored...

9
Planning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh Branislav Kveton Intelligent Systems Program, University of Pittsburgh Carlos Guestrin Department of Computer Science, CMU

Upload: vuongngoc

Post on 18-Mar-2018

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

Planning in Factored Hybrid-MDPs

Milos HauskrechtDepartment of Computer Science, University of Pittsburgh

Branislav KvetonIntelligent Systems Program, University of Pittsburgh

Carlos GuestrinDepartment of Computer Science, CMU

Page 2: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

Factored MDPs

• Limitations Standard MDP formulation permits only flat discrete state and

action spaces Real-world problems are complex, factored, and involve

continuous quantities• Factored MDPs with discrete components

State and action spaces grow exponentially in the number ofvariables

Traditional methods for solving MDPs are polynomial in the sizeof state and action spaces

• Factored MDPs with continuous components State and action spaces are infinitely large optimal value function or policy may not have finite support Naive discretization of continuous variables often leads to an

exponential complexity of solution methods

Page 3: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

Hybrid Factored MDPs

• A hybrid factored MDP(HMDP) is a 4-tuple M =(X, A, P, R): X = (XD, XC) is a vector of

state variables (discrete orcontinuous)

A = (AD, AC) is a vector ofaction variables (discreteor continuous)

P and R are factoredtransition and rewardmodels represented by adynamic Bayesian network(DBN)

X1

A1

X2

A2

X’1

X’2

R1

X3

A3

X’3

R2

t t + 1

Page 4: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

Optimal Value Function and Policy

• Optimal policy π∗ maximizes the infinite horizondiscounted reward:

• Optimal value function V∗ is a fixed point of the Bellmanequation:

• Linear value function approximation A compact representation of an MDP does not guarantee the

same structure in the optimal value function or policy Approximation by a linear combination of |w| basis functions:

( )( )

πγ∑

=π tt

0t

t ,RE xx

( ) ( ) ( ) ( )

′′′γ+= ∑ ∫

′ ′

∗∗

D C

CdV,|PRsupVx xa

xxaxxax,x

( )∑=i

iiifw)(V xxw

Page 5: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

Hybrid Approximate Linear Programming

• Hybrid approximate linear programming (HALP)formulation:

( ) ( )AaXx

axaxw

∈∈∀

≥−

α

∑∑

,

0,R,Fw:to subject

wminimize

iii

iii

Basis function relevance weight:

Difference between the basis function fi(x) andits discounted backprojection:

( ) ( ) ( ) ( )∑ ∫′ ′

′′′γ−=D C

Ciii df,|Pf,Fx x

xxaxxxax

( ) ( )∑ ∫ψ=αD C

Cii dfx x

xxx

Page 6: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

HALP formulation contains infinite number of constraints,

Methods:• Monte Carlo (Hauskrecht, Kveton, 2004):

Sample constraint space• ε−HALP (Guestrin, Hauskrecht, Kveton, 2004):

Discretize constraint space on the regular grid Take advantage of the structure Cutting plane methods on discrete state space Exponential in the treewidth of the problem

• Direct cutting plane methods (Kveton, Hauskrecht, 2005) Stochastic optimizations for finding a maximally violated

constraint

HALP solutions

( ) ( ) AaXxaxax ∈∈∀≥−∑ , ,0,R,Fwi

ii

Page 7: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

Scale-up potential

• Problems with over 25 state and 20 action variables

Large irrigation network

Outflowregulation

device

Inflowregulation

device

Irrigation channelrepresented by a

continuous variable

Regulation devicerepresented by a

discrete action node

A1

A2

A4

A5

A6

A7

A3

A8

A9

A10

A11

A12

A13

A14

A15

X1

X2

X3

X4

X5

X6

X7

X8

X9

X10

X11

X12

X13

X14

X15 X16

X17

Page 8: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

Traffic Management

• Recent surveys show Americans spend billions of hours in city traffic jams Congestions have grown in urban areas of every size Congestions have increasing economic impact The problem is too complex and growing rapidly No technology has provided a definite solution yet

• Our approach Automatic traffic management to ease congestions Traffic management exhibits local structure with long-term global

consequences

Page 9: Planning in Factored Hybrid-MDPsnau/adversarial/program/monday/hauskrecht.pdfPlanning in Factored Hybrid-MDPs Milos Hauskrecht Department of Computer Science, University of Pittsburgh

Automatic Traffic Management

XF-1 – # cars on Forbes Avebetween S Bouquet St and

Bigelow Blvd

XF-2 – # cars on Forbes Ave betweenBigelow Blvd and Schenley Dr

XF-3 – # cars on ForbesAve between Schenley Dr

and S Bellefield Ave

AF-1 – traffic lightsat Forbes Ave and

Bigelow Blvd

AF-2 – trafficlights at Forbes

Ave andSchenley Dr

XB-1 – # cars on Bigelow Blvdbetween 5th Ave and Forbes Ave