efficient solution algorithms for factored mdps by carlos guestrin, daphne koller, ronald parr,...

Efficient Solution Algorithms for Factored MDPs

by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman

Presented by Arkady Epshteyn

Problem with MDPs

• Exponential number of states
• Example: Sysadmin Problem
• 4 computers: M1, M2, M3, M4
• Each machine is working or has failed.
• State space: $2^4 = 16$ states
• 8 actions: whether to reboot each machine or not
• Reward: depends on the number of working machines

Factored Representation

• Transition model: DBN
• Reward model: a sum of local terms, each depending on only a few state variables:

$$R(x) = \sum_{j=1}^{k} r_j(x)$$
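A minimal Python sketch of a factored SysAdmin model, to make the DBN and the additive reward concrete. Everything here is an assumption for illustration: a ring topology in which machine i's next status depends only on itself and its left neighbour, made-up failure probabilities, and an action encoded as the set of machines to reboot. Each machine contributes one small CPD instead of a row in a 16 x 16 transition table.

```python
import itertools

# Hypothetical ring-shaped SysAdmin DBN; all probabilities are made-up assumptions.
P_FAIL_OK_NEIGHBOUR = 0.05   # failure prob. while the left neighbour works
P_FAIL_BAD_NEIGHBOUR = 0.30  # failure prob. while the left neighbour has failed


def p_working_next(i, x, reboot):
    """P(X_i' = working | X_i, X_{i-1}, a): the local CPD of machine i.

    x is a tuple of booleans (True = working); reboot is the set of machines
    rebooted by the action (a hypothetical action encoding).
    """
    if i in reboot:
        return 1.0
    if not x[i]:
        return 0.0  # assumption: a failed machine stays down until rebooted
    left_ok = x[(i - 1) % len(x)]
    return 1.0 - (P_FAIL_OK_NEIGHBOUR if left_ok else P_FAIL_BAD_NEIGHBOUR)


def transition_prob(x, x_next, reboot):
    """P(x' | x, a) as the product of the per-machine CPDs (DBN factorisation)."""
    prob = 1.0
    for i, working in enumerate(x_next):
        p = p_working_next(i, x, reboot)
        prob *= p if working else 1.0 - p
    return prob


def reward(x):
    """Factored reward R(x) = sum_j r_j(x), with r_j(x) = 1 if machine j works."""
    return sum(1 for working in x if working)


n = 4
states = list(itertools.product([False, True], repeat=n))
# A flat model needs a |states| x |states| table per action; the DBN needs
# one small CPD per machine (parents: itself and its left neighbour).
print(len(states), len(states) ** 2)  # 16 256
```

Summing `transition_prob` over all 16 next states gives 1 for any state and action, a quick check that the per-machine CPDs multiply into a valid joint distribution.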

Approximate Value Function

• Linear value function:

$$V(x) = \sum_{j=1}^{k} w_j h_j(x)$$

• Basis functions:

$h_i(X_i = \text{true}) = 1$

$h_i(X_i = \text{false}) = 0$

$h_0 = 1$
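With indicator basis functions, the value function is stored as n + 1 weights rather than $2^n$ numbers. A short sketch with hypothetical weights; `make_indicator_basis` and `linear_value` are illustrative names, and machines are 0-indexed in the code.

```python
def make_indicator_basis(n):
    """h_0(x) = 1 and h_i(x) = 1 if X_i is true else 0, one per machine."""
    basis = [lambda x: 1.0]
    for i in range(n):
        # Bind i as a default argument so each lambda keeps its own index.
        basis.append(lambda x, i=i: 1.0 if x[i] else 0.0)
    return basis


def linear_value(weights, basis, x):
    """V(x) = sum_j w_j h_j(x): n + 1 weights stand in for a 2^n-entry table."""
    return sum(w * h(x) for w, h in zip(weights, basis))


basis = make_indicator_basis(4)
weights = [1.0, 2.0, 2.0, 2.0, 2.0]  # hypothetical weights
print(linear_value(weights, basis, (True, False, True, True)))  # 7.0
```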

Markov Decision Processes

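For a fixed policy $\pi$:

$$V_\pi(x) = R_{\pi(x)}(x) + \gamma \sum_{x'} P_{\pi(x)}(x' \mid x)\, V_\pi(x')$$

The optimal value function $V^*$:

$$V^*(x) = \max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^*(x') \Big]$$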
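The optimality equation can be solved by repeatedly applying its right-hand side (value iteration). This sketch uses a made-up two-state, two-action MDP (failed/working, wait/reboot) stored as flat tables, which is exactly the representation whose size grows exponentially with the number of variables.

```python
# Tiny flat MDP for illustration only (hypothetical numbers):
# state 0 = machine failed, state 1 = machine working;
# action 0 = wait, action 1 = reboot.  P[a][x][x'] and R[a][x].
P = [
    [[1.0, 0.0], [0.1, 0.9]],  # wait: a working machine fails with prob. 0.1
    [[0.0, 1.0], [0.0, 1.0]],  # reboot: the machine is working at the next step
]
R = [
    [0.0, 1.0],  # wait: reward 1 while working
    [0.0, 0.0],  # reboot: no reward this step
]
GAMMA = 0.9


def bellman_backup(V, P, R, gamma):
    """V(x) <- max_a [R_a(x) + gamma * sum_x' P_a(x'|x) V(x')] for every x."""
    n = len(V)
    return [
        max(R[a][x] + gamma * sum(P[a][x][y] * V[y] for y in range(n))
            for a in range(len(P)))
        for x in range(n)
    ]


def value_iteration(P, R, gamma, tol=1e-12):
    """Apply the backup until successive value functions agree to within tol."""
    V = [0.0] * len(P[0])
    while True:
        V_new = bellman_backup(V, P, R, gamma)
        if max(abs(u - v) for u, v in zip(V_new, V)) < tol:
            return V_new
        V = V_new


V_star = value_iteration(P, R, GAMMA)
```

For these numbers the fixed point is $V^*(\text{working}) = 1/0.109 \approx 9.17$ and $V^*(\text{failed}) = 0.9\, V^*(\text{working})$.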

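Solving MDPs, Method 1: Policy Iteration

• Value determination:

$$V_{\pi_t}(x) = R_{\pi_t(x)}(x) + \gamma \sum_{x'} P_{\pi_t(x)}(x' \mid x)\, V_{\pi_t}(x')$$

• Policy improvement:

$$\pi_{t+1}(x) = \arg\max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V_{\pi_t}(x') \Big]$$

• Polynomial in the number of states N
• Exponential in the number of variables K (N = $2^K$ for K binary variables)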
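A direct sketch of the two steps: value determination solves the linear system $(I - \gamma P_\pi)V = R_\pi$, and policy improvement takes the greedy action. Assumptions: a hypothetical two-state MDP (0 = failed, 1 = working; action 0 = wait, action 1 = reboot), a hand-written Gaussian elimination to stay dependency-free, and a tie-breaking rule that keeps the current action.

```python
P = [[[1.0, 0.0], [0.1, 0.9]], [[0.0, 1.0], [0.0, 1.0]]]  # P[a][x][x'], hypothetical
R = [[0.0, 1.0], [0.0, 0.0]]                              # R[a][x], hypothetical
GAMMA = 0.9


def solve_linear(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def evaluate_policy(policy, P, R, gamma):
    """Value determination: solve (I - gamma * P_pi) V = R_pi exactly."""
    n = len(policy)
    A = [[(1.0 if x == y else 0.0) - gamma * P[policy[x]][x][y] for y in range(n)]
         for x in range(n)]
    b = [R[policy[x]][x] for x in range(n)]
    return solve_linear(A, b)


def policy_iteration(P, R, gamma, eps=1e-12):
    n_states, n_actions = len(P[0]), len(P)
    policy = [0] * n_states
    while True:
        V = evaluate_policy(policy, P, R, gamma)

        def q(x, a):
            return R[a][x] + gamma * sum(P[a][x][y] * V[y] for y in range(n_states))

        new_policy = []
        for x in range(n_states):
            best = max(range(n_actions), key=lambda a: q(x, a))
            # Keep the current action on (near-)ties so the loop cannot cycle.
            keep = q(x, policy[x]) >= q(x, best) - eps
            new_policy.append(policy[x] if keep else best)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration(P, R, GAMMA)
```

For these numbers it settles on "reboot when failed, wait when working". Each evaluation solves an N x N system, which is polynomial in N but exponential in the number of state variables.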

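Solving MDPs, Method 2: Linear Programming

Intuition: compare with the fixed point of $V^*(x)$:

$$V^*(x) = \max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^*(x') \Big]$$

Every feasible V satisfies V ≥ (Bellman backup of V), which implies V ≥ V*; minimizing a positively weighted sum of the $V_i$ therefore recovers V*.

$$\begin{aligned}
\text{Variables:}\quad & V_1, \ldots, V_N \\
\text{Minimize:}\quad & \sum_i \alpha(x_i)\, V_i, \qquad \alpha(x_i) > 0 \;\; \forall i \\
\text{Subject to:}\quad & V_i \ge R_a(x_i) + \gamma \sum_j P_a(x_j \mid x_i)\, V_j \qquad \forall i, a
\end{aligned}$$

• Polynomial in the number of states N
• Exponential in the number of variables K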
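The LP can be written down mechanically: one variable per state and one inequality per (state, action) pair. This sketch builds the matrices in the "minimise c @ V subject to A_ub @ V <= b_ub" layout used by common LP solvers, for a hypothetical two-state MDP. It does not solve the LP; it only checks the known optimal values against the constraints.

```python
def build_exact_lp(P, R, gamma, alpha):
    """LP in the form  minimise c @ V  subject to  A_ub @ V <= b_ub.

    Each Bellman constraint V_i >= R_a(x_i) + gamma * sum_j P_a(x_j|x_i) V_j
    is rewritten as  gamma * sum_j P_a(x_j|x_i) V_j - V_i <= -R_a(x_i),
    giving one row per (action, state) pair: N * |A| rows in total.
    """
    n = len(alpha)
    A_ub, b_ub = [], []
    for a in range(len(P)):
        for i in range(n):
            A_ub.append([gamma * P[a][i][j] - (1.0 if i == j else 0.0) for j in range(n)])
            b_ub.append(-R[a][i])
    return list(alpha), A_ub, b_ub


def slacks(A_ub, b_ub, V):
    """b_ub - A_ub @ V: non-negative entries mean V satisfies every constraint."""
    return [b - sum(r * v for r, v in zip(row, V)) for row, b in zip(A_ub, b_ub)]


# Hypothetical two-state MDP (0 = failed, 1 = working; 0 = wait, 1 = reboot).
P = [[[1.0, 0.0], [0.1, 0.9]], [[0.0, 1.0], [0.0, 1.0]]]  # P[a][x][x']
R = [[0.0, 1.0], [0.0, 0.0]]                              # R[a][x]
c, A_ub, b_ub = build_exact_lp(P, R, 0.9, alpha=[0.5, 0.5])
```

With SciPy installed, the same triple could be passed to `scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))`; the explicit bounds matter because `linprog` defaults to non-negative variables.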

Value Function Approximation

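$$\begin{aligned}
\text{Variables:}\quad & w_1, \ldots, w_k \\
\text{Minimize:}\quad & \sum_x \alpha(x) \sum_i w_i h_i(x), \qquad \alpha(x) > 0 \;\; \forall x \\
\text{Subject to:}\quad & \sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x') \qquad \forall x, a
\end{aligned}$$

This is the exact LP with $V(x)$ replaced by $\sum_i w_i h_i(x)$: k variables instead of N, but still one constraint per state and action.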
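Substituting $V(x) = \sum_i w_i h_i(x)$ shrinks the variables to k weights, but a direct construction still loops over every state. A sketch under assumptions: model access through hypothetical callables `P(x, a)` (returning a successor distribution), `R(x, a)`, and `alpha(x)`, with a two-state example whose exact V* happens to be representable by a constant plus an indicator basis.

```python
def build_approx_lp(states, actions, P, R, basis, gamma, alpha):
    """Approximate LP over the k weights:  minimise c @ w  s.t.  A_ub @ w <= b_ub.

    Each (x, a) constraint
        sum_i w_i h_i(x) >= R_a(x) + gamma * sum_x' P_a(x'|x) sum_i w_i h_i(x')
    becomes one row with entries gamma * sum_x' P_a(x'|x) h_i(x') - h_i(x).
    Note the explicit loop over every state: k variables, but |X| * |A| rows.
    """
    c = [sum(alpha(x) * h(x) for x in states) for h in basis]
    A_ub, b_ub = [], []
    for a in actions:
        for x in states:
            successors = P(x, a)
            A_ub.append([gamma * sum(p * h(y) for y, p in successors.items()) - h(x)
                         for h in basis])
            b_ub.append(-R(x, a))
    return c, A_ub, b_ub


# Hypothetical two-state MDP (states: 0 = failed, 1 = working; actions: 0 = wait, 1 = reboot).
TABLE = {(0, 0): {0: 1.0}, (1, 0): {0: 0.1, 1: 0.9}, (0, 1): {1: 1.0}, (1, 1): {1: 1.0}}
basis = [lambda x: 1.0, lambda x: float(x)]  # h_0 = 1 and an indicator of "working"
c, A_ub, b_ub = build_approx_lp(
    states=[0, 1], actions=[0, 1],
    P=lambda x, a: TABLE[(x, a)],
    R=lambda x, a: 1.0 if (x == 1 and a == 0) else 0.0,
    basis=basis, gamma=0.9, alpha=lambda x: 0.5)
```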


Objective function



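• Objective function polynomial in the number of basis functions:

$$\sum_x \alpha(x) \sum_i w_i h_i(x) = \sum_i w_i \sum_x \alpha(x)\, h_i(x) = \sum_i w_i \sum_{c_i \in \mathrm{Dom}(C_i)} \alpha(c_i)\, h_i(c_i) = \sum_i w_i\, \alpha_i$$

where $C_i$ is the scope of $h_i$, $\alpha(c_i) = \sum_{x:\, x[C_i] = c_i} \alpha(x)$ is the marginal of $\alpha$ on that scope, and $\alpha_i = \sum_{c_i} \alpha(c_i)\, h_i(c_i)$.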
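A numeric check of this rewrite: the objective coefficient $\alpha_i$ computed over all $2^n$ states equals the one computed from the marginal of $\alpha$ over the scope of $h_i$. Assumed for illustration: $\alpha$ is a product of independent per-variable factors (so its marginals are easy to write down), and $h_i$ is a hypothetical function of $X_1$ and $X_3$ (indices 0 and 2 in the code).

```python
import itertools
import math

# Assumed product-form state-relevance weights alpha(x) = prod_v q_v(x_v),
# so the marginal alpha(c_i) over a scope is the product over that scope only.
Q = [0.2, 0.5, 0.7, 0.9]  # hypothetical P(X_v = 1) for each of 4 variables


def alpha(x):
    return math.prod(q if xv else 1.0 - q for q, xv in zip(Q, x))


def alpha_marginal(scope, c):
    """alpha(c_i): sum of alpha(x) over all x that agree with c on the scope."""
    return math.prod(Q[v] if cv else 1.0 - Q[v] for v, cv in zip(scope, c))


def coeff_bruteforce(h, scope, n=4):
    """alpha_i = sum_x alpha(x) h_i(x): 2^n terms."""
    return sum(alpha(x) * h(tuple(x[v] for v in scope))
               for x in itertools.product([0, 1], repeat=n))


def coeff_factored(h, scope):
    """alpha_i = sum_{c_i} alpha(c_i) h_i(c_i): only 2^|C_i| terms."""
    return sum(alpha_marginal(scope, c) * h(c)
               for c in itertools.product([0, 1], repeat=len(scope)))


def h(c):
    """Hypothetical basis function on the scope (X_1, X_3)."""
    return c[0] + 2.0 * c[1]


scope = (0, 2)
```

Here both routes give $0.2 + 2 \cdot 0.7 = 1.6$; only the factored one avoids enumerating the full state space.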

Each Constraint: Backprojection



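$$\sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x') = \sum_i w_i \sum_{x'} P_a(x' \mid x)\, h_i(x') = \sum_i w_i\, g_i^a(x)$$

$$g_i^a(x) = E_a\big[h_i(x') \mid x\big] = E_a\big[h_i(c_i') \mid x\big] = E_a\big[h_i(c_i') \mid \mathrm{pa}_a(C_i')\big]$$

where $C_i'$ is the scope of $h_i$ at the next time step and $\mathrm{pa}_a(C_i')$ are its parents in the DBN for action $a$, so each backprojection $g_i^a$ depends on only a few variables.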
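The backprojection can be computed from the CPDs of the scope variables alone. This sketch compares that local computation with brute-force summation over all next states, for a hypothetical ring DBN (one fixed action) in which $X_j'$ has parents $X_j$ and $X_{j-1}$, and a hypothetical basis function over $X_1', X_2'$ (indices 0 and 1 in the code). It also assumes no arcs within the next time slice, so $P(c' \mid x)$ is a product of per-variable CPDs.

```python
import itertools

N = 4


def p_next(j, x):
    """Hypothetical CPD P(X_j' = 1 | x) for one action; parents are X_j and X_{j-1}."""
    if x[j] and x[(j - 1) % N]:
        return 0.95
    return 0.6 if x[j] else 0.1


def prob_of(assignment, variables, x):
    """Product of per-variable CPDs (valid with no intra-slice arcs)."""
    prob = 1.0
    for j, val in zip(variables, assignment):
        p = p_next(j, x)
        prob *= p if val else 1.0 - p
    return prob


def backproject_factored(h, scope, x):
    """g(x) = sum over the 2^|C| scope assignments c' of P(c' | x) h(c')."""
    return sum(prob_of(c, scope, x) * h(c)
               for c in itertools.product([0, 1], repeat=len(scope)))


def backproject_bruteforce(h, scope, x):
    """g(x) = sum over all 2^N next states x' of P(x' | x) h(x'[scope])."""
    return sum(prob_of(y, range(N), x) * h(tuple(y[j] for j in scope))
               for y in itertools.product([0, 1], repeat=N))


def h(c):
    """Hypothetical basis function on (X_1', X_2'): 1 if both machines work."""
    return 1.0 if c[0] and c[1] else 0.0


scope = (0, 1)
```

The factored version only ever reads the parents of the scope variables, here $X_4, X_1, X_2$ (indices 3, 0, 1), so changing $X_3$ leaves $g$ unchanged.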

Representing Exponentially Many Constraints



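$$\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x') \qquad \forall x, a$$

$$\iff 0 \ge \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \qquad \forall x, a$$

$$\iff 0 \ge \max_x \Big\{ \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \Big\} \qquad \forall a$$

The exponentially many constraints for each action collapse into a single constraint on a maximum over x.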

Restricted Domain

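$$\begin{aligned}
0 &\ge \max_x \Big\{ \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \Big\} \\
  &= \max_x \Big\{ \sum_i w_i \big[\gamma\, \underbrace{g_i^a(x)}_{1} - \underbrace{h_i(x)}_{2}\big] + \underbrace{R_a(x)}_{3} \Big\} \\
  &= \max_x \Big\{ \sum_i w_i f_i(x) + \sum_j r_j(x) \Big\}
\end{aligned}$$

where $f_i(x) = \gamma\, g_i^a(x) - h_i(x)$ and $R_a(x) = \sum_j r_j(x)$.

1. Backprojection: depends on few variables
2. Basis function
3. Reward function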
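The scope of each $f_i$ follows directly from the DBN structure: the parents (under action $a$) of the next-step copies of $h_i$'s variables, plus $h_i$'s own variables. A small sketch; the `parents` dictionary layout and the ring example are assumptions.

```python
def f_scope(basis_scope, parents):
    """Scope of f_i = gamma * g_i^a - h_i.

    g_i^a depends on the parents of the next-step copies of h_i's variables,
    and h_i on its own scope, so f_i depends on their union.
    parents[v] is the set of current-step parents of X_v' (assumed layout).
    """
    backprojection_scope = set().union(*(parents[v] for v in basis_scope))
    return backprojection_scope | set(basis_scope)


# Hypothetical ring DBN: X_v' has parents X_v and X_{v-1}.
ring = {v: {v, (v - 1) % 4} for v in range(4)}
print(sorted(f_scope({0, 1}, ring)))  # [0, 1, 3]
```

Even with four variables, each $f_i$ touches only two or three of them, which is what the variable elimination step exploits.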

Variable Elimination

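$$\begin{aligned}
&\max_x \Big[ \sum_i w_i f_i(x) + \sum_j r_j(x) \Big] \\
&= \max_{x_1, x_2, x_3, x_4} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + r_1(x_2, x_4) + r_2(x_3, x_4) \big] \\
&= \max_{x_1, x_2, x_3} \Big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + \max_{x_4} \big[ r_1(x_2, x_4) + r_2(x_3, x_4) \big] \Big] \\
&= \max_{x_1, x_2, x_3} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + e_1(x_2, x_3) \big]
\end{aligned}$$

where $e_1(x_2, x_3) = \max_{x_4} \big[ r_1(x_2, x_4) + r_2(x_3, x_4) \big]$.

• Similar to variable elimination in Bayesian networks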
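Variable elimination for max-sum over table factors, on the same structure as the slide ($w_1 f_1(x_1, x_2)$, $w_2 f_2(x_1, x_3)$, $r_1(x_2, x_4)$, $r_2(x_3, x_4)$), eliminating $x_4$ first. The factor values are made up; binary variables and dict-based tables are simplifications for the sketch. The cost is exponential only in the largest scope created during elimination.

```python
import itertools

# A factor is (scope, table): scope is a tuple of variable names and table maps
# each assignment tuple (aligned with scope) to a float.


def make_factor(scope, fn):
    return scope, {a: fn(*a) for a in itertools.product([0, 1], repeat=len(scope))}


def eliminate(factors, var):
    """Replace every factor mentioning var by e(.) = max over var of their sum."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    new_scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
    table = {}
    for a in itertools.product([0, 1], repeat=len(new_scope)):
        ctx = dict(zip(new_scope, a))
        best = float("-inf")
        for val in (0, 1):
            ctx[var] = val
            best = max(best, sum(t[tuple(ctx[v] for v in s)] for s, t in touching))
        table[a] = best
    return rest + [(new_scope, table)]


def max_by_elimination(factors, order):
    for var in order:
        factors = eliminate(factors, var)
    # Every variable is gone, so the remaining factors have empty scope.
    return sum(t[()] for _, t in factors)


factors = [
    make_factor(("x1", "x2"), lambda x1, x2: 2.0 * (1.0 if x1 == x2 else -0.5)),  # w1 f1
    make_factor(("x1", "x3"), lambda x1, x3: -1.0 * (x1 - 0.3 * x3)),             # w2 f2
    make_factor(("x2", "x4"), lambda x2, x4: 0.5 * x2 * x4),                       # r1
    make_factor(("x3", "x4"), lambda x3, x4: 1.0 - x3 - 0.2 * x4),                 # r2
]
best = max_by_elimination(factors, order=["x4", "x3", "x2", "x1"])
```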

Maximization as Linear Constraints

$$e_1(x_2, x_3) = \max_{x_4} \big[ r_1(x_2, x_4) + r_2(x_3, x_4) \big]$$

Equivalent to constraints:

$$e_1(x_2, x_3) \ge r_1(x_2, x_4) + r_2(x_3, x_4) \qquad \forall\, x_2, x_3, x_4$$

(one constraint per joint assignment of $x_2, x_3, x_4$, with a new LP variable $e_1(x_2, x_3)$ for each assignment of $x_2, x_3$)

• Exponential in the size of each function’s domain, not the number of states
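The conversion of one elimination step into LP constraints can be enumerated directly. In this sketch the $r_j$ are hypothetical fixed functions; in the full LP the $f_i$ terms would also appear, multiplied by the weight variables, and the constraints would stay linear in those weights.

```python
import itertools


def r1(x2, x4):
    """Hypothetical reward component."""
    return 0.5 * x2 * x4


def r2(x3, x4):
    """Hypothetical reward component."""
    return 1.0 - x3 - 0.2 * x4


def max_as_constraints(r1, r2):
    """One constraint e(x2, x3) >= r1(x2, x4) + r2(x3, x4) per assignment.

    Returned as ((x2, x3), rhs) pairs: one new LP variable e(x2, x3) per
    (x2, x3) and 2^3 = 8 constraints, exponential only in the 3 variables
    these functions mention, not in the total number of state variables.
    """
    return [((x2, x3), r1(x2, x4) + r2(x3, x4))
            for x2, x3, x4 in itertools.product([0, 1], repeat=3)]


constraints = max_as_constraints(r1, r2)
```

The smallest value of $e_1(x_2, x_3)$ that satisfies its constraints is exactly $\max_{x_4}[r_1 + r_2]$, which is why the constraint form is equivalent inside the LP.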

Factored LP: Scaling

Rule-based Representation

Approximate Value Function

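$$V(x) = \sum_{j=1}^{k} w_j h_j(x) = \sum_{j=1}^{k} w_j h_j(x_1, x_2, x_3, x_4) = \sum_{j=1}^{k} w_j \sum_i \mathrm{Rule}_i^{h_j}(x_1, x_2, x_3, x_4)$$

Example, $h_1$ as a decision tree: the root tests $x_1$; one branch ends in the leaf 0, the other tests $x_3$, with leaves 5 and 0.6. Each leaf is a rule:

$\mathrm{Rule}_1$ (context on $x_1$): 0

$\mathrm{Rule}_2$ (context on $x_1, x_3$): 5

$\mathrm{Rule}_3$ (context on $x_1, x_3$): 0.6

Notice: compact representation (2 of the 4 variables, 3 rules instead of 16 table entries)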
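A sketch of rules as (context, value) pairs, where a function's value at x is the sum of the values of the rules whose context x satisfies. The concrete polarities chosen for $h_1$ (which value of $x_1$ leads to the leaf 0, and which value of $x_3$ gives 5) are an assumption for illustration; the names `matches`, `rule_function`, and `rule_based_value` are hypothetical.

```python
# A rule is (context, value); context maps a variable index to its required value.
# Polarities below are assumed for illustration.
H1_RULES = [
    ({1: True}, 0.0),             # Rule_1: context on x1 only
    ({1: False, 3: True}, 5.0),   # Rule_2: context on x1 and x3
    ({1: False, 3: False}, 0.6),  # Rule_3: context on x1 and x3
]


def matches(context, x):
    """True if the assignment x agrees with every literal in the context."""
    return all(x[v] == val for v, val in context.items())


def rule_function(rules, x):
    """h(x) = sum_i Rule_i(x): the values of all rules whose context x satisfies."""
    return sum(value for context, value in rules if matches(context, x))


def rule_based_value(weights, rule_sets, x):
    """V(x) = sum_j w_j sum_i Rule_i^{h_j}(x)."""
    return sum(w * rule_function(rules, x) for w, rules in zip(weights, rule_sets))


x = {1: False, 2: True, 3: True, 4: False}
print(rule_based_value([2.0], [H1_RULES], x))  # 10.0
```

Because the three contexts are mutually exclusive and cover all states, exactly one rule of $h_1$ fires for any x.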

Summing Over Rules

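$$V(x) = \sum_{j=1}^{k} w_j \sum_i \mathrm{Rule}_i^{h_j}(x_1, x_2, x_3, x_4)$$

[Figure: adding two tree-structured (rule-based) functions]
• $h_1(x)$: tests $x_1$; one branch is the leaf $u_1$, the other tests $x_3$ (leaves $u_2$, $u_3$)
• $h_2(x)$: tests $x_2$; one branch is the leaf $u_4$, the other tests $x_1$ (leaves $u_5$, $u_6$)
• $h_1(x) + h_2(x)$: tests $x_2$
  • on the $u_4$ branch: tests $x_1$, giving the leaf $u_1 + u_4$ or a test on $x_3$ (leaves $u_2 + u_4$, $u_3 + u_4$)
  • on the other branch: tests $x_1$, giving the leaf $u_1 + u_5$ or a test on $x_3$ (leaves $u_2 + u_6$, $u_3 + u_6$)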
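Adding two tree-structured functions can be done by splitting on one tree's tests and restricting the other tree to each branch. A minimal sketch with hypothetical branch polarities and integer stand-ins for $u_1, \ldots, u_6$; trees are nested tuples `(variable, low_branch, high_branch)` and leaves are numbers.

```python
def restrict(tree, var, val):
    """The sub-function of tree obtained by fixing var = val."""
    if not isinstance(tree, tuple):
        return tree
    v, low, high = tree
    if v == var:
        return restrict(high if val else low, var, val)
    return (v, restrict(low, var, val), restrict(high, var, val))


def add_trees(t1, t2):
    """A tree for t1(x) + t2(x), splitting on t1's tests first."""
    if isinstance(t1, tuple):
        v, low, high = t1
        return (v, add_trees(low, restrict(t2, v, 0)), add_trees(high, restrict(t2, v, 1)))
    if isinstance(t2, tuple):
        v, low, high = t2
        return (v, add_trees(t1, low), add_trees(t1, high))
    return t1 + t2


def evaluate(tree, x):
    """Follow the tests in tree under the assignment x (a dict) down to a leaf."""
    while isinstance(tree, tuple):
        v, low, high = tree
        tree = high if x[v] else low
    return tree


# Hypothetical leaf values and branch polarities for the two trees in the figure.
u1, u2, u3, u4, u5, u6 = 1, 2, 3, 4, 5, 6
h1 = ("x1", u1, ("x3", u2, u3))  # x1: leaf u1, or test x3
h2 = ("x2", u4, ("x1", u5, u6))  # x2: leaf u4, or test x1
total = add_trees(h1, h2)
```

This version tests $x_1$ at the root rather than $x_2$, but it ends with the same six leaves as the figure: $u_1+u_4$, $u_1+u_5$, $u_2+u_4$, $u_2+u_6$, $u_3+u_4$, $u_3+u_6$.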

Multiplying over Rules

• Analogous construction

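• Needed for the backprojection, where transition probabilities multiply basis functions:

$$0 \ge \max_x \Big\{ \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \Big\} \qquad \forall a$$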

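Rule-based Maximization

$$0 \ge \max_x \Big\{ \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \Big\} \qquad \forall a$$

[Figure: eliminating $x_2$ from a tree-structured function]
• Before: tests $x_1$; one branch is the leaf $u_1$, the other tests $x_2$, giving the leaf $u_2$ or a test on $x_3$ (leaves $u_3$, $u_4$)
• Eliminate $x_2$
• After: tests $x_1$; one branch is the leaf $u_1$, the other tests $x_3$ (leaves $\max(u_2, u_3)$, $\max(u_2, u_4)$)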
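Maximizing out a variable works the same way as adding trees: restrict the tree to each value of the variable and combine the two results with `max` instead of `+`. A sketch with hypothetical numeric leaves ($u_1 = 1$, $u_2 = 5$, $u_3 = 3$, $u_4 = 7$) laid out like the figure; `combine` and `max_out` are illustrative names.

```python
def restrict(tree, var, val):
    """The sub-function of tree obtained by fixing var = val."""
    if not isinstance(tree, tuple):
        return tree
    v, low, high = tree
    if v == var:
        return restrict(high if val else low, var, val)
    return (v, restrict(low, var, val), restrict(high, var, val))


def combine(t1, t2, op):
    """A tree for op(t1(x), t2(x)), splitting on t1's tests first."""
    if isinstance(t1, tuple):
        v, low, high = t1
        return (v, combine(low, restrict(t2, v, 0), op), combine(high, restrict(t2, v, 1), op))
    if isinstance(t2, tuple):
        v, low, high = t2
        return (v, combine(t1, low, op), combine(t1, high, op))
    return op(t1, t2)


def max_out(tree, var):
    """A tree for max over var of tree(x)."""
    return combine(restrict(tree, var, 0), restrict(tree, var, 1), max)


def evaluate(tree, x):
    while isinstance(tree, tuple):
        v, low, high = tree
        tree = high if x[v] else low
    return tree


u1, u2, u3, u4 = 1, 5, 3, 7                     # hypothetical leaf values
tree = ("x1", ("x2", u2, ("x3", u3, u4)), u1)   # x1: test x2, or leaf u1
result = max_out(tree, "x2")
print(result)  # ('x1', ('x3', 5, 7), 1)
```

The result keeps the tree shape of the figure: under $x_3$ the leaves are $\max(u_2, u_3) = 5$ and $\max(u_2, u_4) = 7$, and $x_2$ no longer appears.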

Rule-based Linear Program

• Backprojection and objective function: handled in a similar way

• All the operations (summation, multiplication, maximization) keep the rule representation intact

• $\sum_j w_j \sum_i \mathrm{Rule}_i^{h_j}(x_1, x_2, x_3, x_4)$ is a linear function of the weights $w_j$

Conclusions

• Compact representation can be exploited to solve MDPs with exponentially many states efficiently.

• Still NP-complete in the worst case.

• The factored solution may increase the size of the LP when the number of states is small (but it scales better).

• Success depends on the choice of the basis functions for value approximation and the factored decomposition of rewards and transition probabilities.
