![Page 1: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/1.jpg)
Efficient Solution Algorithms for Factored MDPs
by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman
Presented by Arkady Epshteyn
![Page 2: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/2.jpg)
Problem with MDPs
• Exponential number of states
• Example: Sysadmin Problem
• 4 computers: M1, M2, M3, M4
• Each machine is working or has failed.
• State space: $2^4 = 16$ states
• 8 actions: whether to reboot each machine or not
• Reward: depends on the number of working machines
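To make the exponential growth concrete, the following minimal sketch enumerates the joint states of a set of machines, each either working or failed. The function name `enumerate_states` is an illustrative assumption and does not come from the slides.

```python
from itertools import product

def enumerate_states(num_machines):
    """Return every joint state as a tuple of booleans (True = working)."""
    return list(product([True, False], repeat=num_machines))

states = enumerate_states(4)
print(len(states))  # number of joint states for 4 machines

# The count doubles with every machine that is added.
for n in (4, 10, 20):
    print(n, 2 ** n)
```

Any algorithm that stores one value per joint state inherits this growth, which is what the factored representation is meant to avoid.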
![Page 3: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/3.jpg)
Factored Representation
• Transition model: DBN
• Reward model: $R(x) = \sum_{j=1}^{k} r_j(x)$
![Page 4: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/4.jpg)
Approximate Value Function
• Linear value function: $V(x) = \sum_{j=1}^{k} w_j h_j(x)$
• Basis functions:
$h_i(X_i = \text{true}) = 1$
$h_i(X_i = \text{false}) = 0$
$h_0 = 1$
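The linear value function can be sketched in a few lines. This is an illustration only: the weights are made-up numbers, and the helper names `make_indicator` and `linear_value` are assumptions, not part of the slides.

```python
def make_indicator(i):
    """Basis h_i: 1 if machine i is working in state x, else 0."""
    return lambda x: 1.0 if x[i] else 0.0

def linear_value(weights, basis, x):
    """V(x) = sum_j w_j * h_j(x)."""
    return sum(w * h(x) for w, h in zip(weights, basis))

# h_0 is the constant basis; the other four are per-machine indicators.
basis = [lambda x: 1.0] + [make_indicator(i) for i in range(4)]
weights = [0.5, 1.0, 1.0, 2.0, 2.0]  # hypothetical weights, for illustration only

x = (True, False, True, True)  # machine 2 has failed
value = linear_value(weights, basis, x)
print(value)
```

Only five weights describe the value of all 16 states, so the number of unknowns grows with the number of basis functions rather than with the state space.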
![Page 5: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/5.jpg)
Markov Decision Processes
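For a fixed policy $\pi$:
$$V_\pi(x) = R_{\pi(x)}(x) + \gamma \sum_{x'} P_{\pi(x)}(x' \mid x)\, V_\pi(x')$$
The optimal value function $V^*$:
$$V^*(x) = \max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^*(x') \Big]$$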
![Page 6: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/6.jpg)
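Solving MDP
Method 1: Policy Iteration
• Value determination:
$$V_{\pi^{(t)}}(x) = R_{\pi^{(t)}(x)}(x) + \gamma \sum_{x'} P_{\pi^{(t)}(x)}(x' \mid x)\, V_{\pi^{(t)}}(x')$$
• Policy improvement:
$$\pi^{(t+1)}(x) = \arg\max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V_{\pi^{(t)}}(x') \Big]$$
• Polynomial in the number of states N
• Exponential in the number of variables K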
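The two steps can be sketched on a tiny explicit MDP. Everything concrete here is assumed for illustration: the two states (0 = working, 1 = failed), the actions `wait` and `reboot`, the probabilities, the rewards, and the discount factor. Value determination is done by repeated sweeps rather than an exact linear solve, which is a simplification.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical 2-state, 2-action MDP: P[a][x][x2] = P_a(x2 | x), R[a][x] = R_a(x).
P = {
    "wait":   [[0.9, 0.1], [0.0, 1.0]],
    "reboot": [[1.0, 0.0], [1.0, 0.0]],
}
R = {"wait": [1.0, 0.0], "reboot": [0.0, 0.0]}
STATES = [0, 1]

def backup(a, x, V):
    """One-step lookahead: R_a(x) + gamma * sum_x' P_a(x'|x) V(x')."""
    return R[a][x] + GAMMA * sum(P[a][x][x2] * V[x2] for x2 in STATES)

def evaluate(policy, sweeps=2000):
    """Value determination by repeated sweeps (an exact linear solve also works)."""
    V = [0.0 for _ in STATES]
    for _ in range(sweeps):
        V = [backup(policy[x], x, V) for x in STATES]
    return V

def improve(V):
    """Policy improvement: pick the greedy action in every state."""
    return [max(P, key=lambda a: backup(a, x, V)) for x in STATES]

def policy_iteration():
    policy = ["wait" for _ in STATES]
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy

policy, V = policy_iteration()
print(policy, V)
```

Both steps loop over every state explicitly, which is why the cost is polynomial in N but exponential in the number of state variables.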
![Page 7: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/7.jpg)
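Solving MDP
Method 2: Linear Programming
Variables: $V_1, \ldots, V_N$
Minimize: $\sum_{x_i} \alpha(x_i)\, V_i$, where $\alpha(x_i) > 0$ for all $x_i$
Subject to: $V_i \ge R_a(x_i) + \gamma \sum_j P_a(x_j \mid x_i)\, V_j$, for all $x_i, a$
Intuition: compare with the fixed point of V(x):
$$V^*(x) = \max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^*(x') \Big]$$
• Polynomial in the number of states N
• Exponential in the number of variables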
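The link between the LP and the fixed point can be checked numerically without an LP solver. The sketch below uses an assumed 2-state MDP and an assumed discount factor; it computes $V^*$ by value iteration and then tests the LP constraints directly. The names `rhs` and `lp_feasible` are illustrative.

```python
GAMMA = 0.9  # assumed discount factor
# Hypothetical 2-state MDP: P[a][i][j] = P_a(x_j | x_i), R[a][i] = R_a(x_i).
P = {"wait": [[0.9, 0.1], [0.0, 1.0]], "reboot": [[1.0, 0.0], [1.0, 0.0]]}
R = {"wait": [1.0, 0.0], "reboot": [0.0, 0.0]}
N = 2

def rhs(a, i, V):
    """Right-hand side of the LP constraint for state x_i and action a."""
    return R[a][i] + GAMMA * sum(P[a][i][j] * V[j] for j in range(N))

def lp_feasible(V, tol=1e-9):
    """True if V_i >= R_a(x_i) + gamma * sum_j P_a(x_j|x_i) V_j for all i, a."""
    return all(V[i] >= rhs(a, i, V) - tol for i in range(N) for a in P)

# Value iteration converges to the fixed point V*.
V_star = [0.0] * N
for _ in range(2000):
    V_star = [max(rhs(a, i, V_star) for a in P) for i in range(N)]

V_loose = [v + 1.0 for v in V_star]  # still feasible, but with a larger objective
V_small = [v - 1.0 for v in V_star]  # violates at least one constraint

print(lp_feasible(V_star), lp_feasible(V_loose), lp_feasible(V_small))
```

Anything above $V^*$ stays feasible and anything below it breaks a constraint, so minimizing a positively weighted sum of the $V_i$ pushes the solution down onto $V^*$.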
![Page 8: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/8.jpg)
Value Function Approximation
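Approximate LP (weights replace the per-state values):
Variables: $w_1, \ldots, w_k$
Minimize: $\sum_x \alpha(x) \sum_i w_i h_i(x)$, where $\alpha(x) > 0$ for all $x$
Subject to: $\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$, for all $x, a$
Exact LP, for comparison:
Variables: $V_1, \ldots, V_N$
Minimize: $\sum_{x_i} \alpha(x_i)\, V_i$, where $\alpha(x_i) > 0$ for all $x_i$
Subject to: $V_i \ge R_a(x_i) + \gamma \sum_j P_a(x_j \mid x_i)\, V_j$, for all $x_i, a$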
![Page 9: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/9.jpg)
Objective function
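Variables: $w_1, \ldots, w_k$
Minimize: $\sum_x \alpha(x) \sum_i w_i h_i(x)$, where $\alpha(x) > 0$ for all $x$
Subject to: $\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$, for all $x, a$
• The objective function is polynomial in the number of basis functions:
$$\sum_x \alpha(x) \sum_i w_i h_i(x) = \sum_i w_i \sum_x \alpha(x)\, h_i(x) = \sum_i w_i \sum_{c_i \in C_i} \alpha(c_i)\, h_i(c_i),$$
where $c_i$ ranges over the assignments to the variables $C_i$ that $h_i$ depends on, and $\alpha(c_i) = \sum_{x \text{ consistent with } c_i} \alpha(x)$ is the marginal of $\alpha$ over those variables.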
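The regrouping of the objective can be checked on a small example. The sketch below assumes uniform state relevance weights and a made-up basis function whose scope is two of four binary variables; the names `restrict`, `h_local`, and `scope` are illustrative. The marginal is computed by enumeration here only so that it can be compared with the brute-force sum; with a factored $\alpha$ (uniform, for instance) it would be available directly.

```python
from itertools import product

N_VARS = 4
STATES = list(product([0, 1], repeat=N_VARS))
alpha = {x: 1.0 / len(STATES) for x in STATES}  # assumed uniform weights

# Hypothetical basis function h_i with scope C_i = (X_0, X_2).
scope = (0, 2)

def h_local(c):
    """h_i as a function of its scope only: 1 when both variables are 1."""
    return 1.0 if c == (1, 1) else 0.0

def restrict(x, scope):
    """Project a full state onto the variables in scope."""
    return tuple(x[v] for v in scope)

# Brute force: sum over all 2^N joint states.
brute = sum(alpha[x] * h_local(restrict(x, scope)) for x in STATES)

# Factored: marginalize alpha onto C_i, then sum over the assignments c_i.
marginal = {}
for x in STATES:
    c = restrict(x, scope)
    marginal[c] = marginal.get(c, 0.0) + alpha[x]
factored = sum(marginal[c] * h_local(c) for c in marginal)

print(brute, factored)
```

The factored sum has one term per assignment to the scope of $h_i$ (four here), regardless of how many other state variables exist.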
![Page 10: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/10.jpg)
Each Constraint: Backprojection
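Variables: $w_1, \ldots, w_k$
Minimize: $\sum_x \alpha(x) \sum_i w_i h_i(x)$, where $\alpha(x) > 0$ for all $x$
Subject to: $\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$, for all $x, a$
$$\sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x') = \sum_i w_i \sum_{x'} P_a(x' \mid x)\, h_i(x')$$
$$\sum_{x'} P_a(x' \mid x)\, h_i(x') = E[h_i(x') \mid x] = E[h_i(c_i') \mid x] = E[h_i(c_i') \mid \mathrm{pa}(c_i')]$$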
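The point of the backprojection is that the expectation of $h_i$ at the next step depends only on the DBN parents of the variables in its scope. A minimal sketch follows, assuming a three-machine ring DBN with a made-up conditional probability table for a single fixed action; `PARENTS`, `p_working`, and the numbers inside it are all hypothetical.

```python
from itertools import product

N_VARS = 3
# Hypothetical ring DBN: X_i' depends on its neighbor X_{i-1} and on X_i.
PARENTS = {i: ((i - 1) % N_VARS, i) for i in range(N_VARS)}

def p_working(i, parent_values):
    """Assumed CPD: P(X_i' = 1 | parents), for one fixed action."""
    neighbor, own = parent_values
    return 0.05 + 0.6 * own + 0.3 * own * neighbor

def transition_prob(x_next, x):
    """P(x'|x) as a product of per-variable CPDs (the DBN factorization)."""
    prob = 1.0
    for i in range(N_VARS):
        p1 = p_working(i, tuple(x[p] for p in PARENTS[i]))
        prob *= p1 if x_next[i] == 1 else 1.0 - p1
    return prob

def h(x):
    """Basis function with scope {X_2}: indicator that machine 2 works."""
    return float(x[2])

def backprojection_brute(x):
    """g(x) = sum_x' P(x'|x) h(x'), summing over all joint next states."""
    return sum(transition_prob(xn, x) * h(xn)
               for xn in product([0, 1], repeat=N_VARS))

def backprojection_factored(x):
    """Same quantity, touching only the parents of X_2'."""
    return p_working(2, tuple(x[p] for p in PARENTS[2]))

for x in product([0, 1], repeat=N_VARS):
    print(x, backprojection_brute(x), backprojection_factored(x))
```

The factored version never enumerates next states, so its cost depends on the number of parents of the scope rather than on the size of the state space.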
![Page 11: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/11.jpg)
Representing Exponentially Many Constraints
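Variables: $w_1, \ldots, w_k$
Minimize: $\sum_x \alpha(x) \sum_i w_i h_i(x)$, where $\alpha(x) > 0$ for all $x$
Subject to: $\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x')$, for all $x, a$
The constraints can be rewritten step by step:
$$\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x'), \quad \forall x, a$$
$$0 \ge \sum_i w_i \Big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \Big] + R_a(x), \quad \forall x, a$$
$$0 \ge \max_x \Big\{ \sum_i w_i \Big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \Big] + R_a(x) \Big\}, \quad \forall a$$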
![Page 12: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/12.jpg)
Restricted Domain
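$$0 \ge \max_x \Big\{ \sum_i w_i \Big[ \underbrace{\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x')}_{1} - \underbrace{h_i(x)}_{2} \Big] + \underbrace{R_a(x)}_{3} \Big\}, \quad \forall a$$
$$= \max_x \Big\{ \sum_i w_i \big[ \gamma\, g_i^a(x) - h_i(x) \big] + R_a(x) \Big\}$$
$$= \max_x \Big\{ \sum_i w_i f_i(x) + \sum_j r_j(x) \Big\}$$
where $g_i^a(x) = \sum_{x'} P_a(x' \mid x)\, h_i(x')$ is the backprojection of $h_i$.
1. Backprojection: depends on few variables
2. Basis function
3. Reward function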
![Page 13: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/13.jpg)
Variable Elimination
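$$\max_x \Big\{ \sum_i w_i f_i(x) + \sum_j r_j(x) \Big\}$$
$$= \max_{x_1, x_2, x_3, x_4} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + r_1(x_2, x_4) + r_2(x_3, x_4) \big]$$
$$= \max_{x_1, x_2, x_3} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + \max_{x_4} [ r_1(x_2, x_4) + r_2(x_3, x_4) ] \big]$$
$$= \max_{x_1, x_2, x_3} \big[ w_1 f_1(x_1, x_2) + w_2 f_2(x_1, x_3) + e_1(x_2, x_3) \big]$$
where $e_1(x_2, x_3) = \max_{x_4} [ r_1(x_2, x_4) + r_2(x_3, x_4) ]$
- Similar to variable elimination in Bayesian networks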
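The elimination step for $x_4$ can be reproduced on small tables. In the sketch below the local functions and weights are made-up numbers, and only the single step from the derivation (eliminating $x_4$) is shown, not a full elimination ordering.

```python
from itertools import product

# Hypothetical local functions over binary variables, stored as tables.
w1, w2 = 2.0, -1.0
f1 = {(a, b): a + 2 * b for a, b in product([0, 1], repeat=2)}  # f1(x1, x2)
f2 = {(a, b): 3 * a * b for a, b in product([0, 1], repeat=2)}  # f2(x1, x3)
r1 = {(a, b): a - b for a, b in product([0, 1], repeat=2)}      # r1(x2, x4)
r2 = {(a, b): 2 * b - a for a, b in product([0, 1], repeat=2)}  # r2(x3, x4)

def brute_force_max():
    """Maximize over all 2^4 joint assignments."""
    return max(w1 * f1[x1, x2] + w2 * f2[x1, x3] + r1[x2, x4] + r2[x3, x4]
               for x1, x2, x3, x4 in product([0, 1], repeat=4))

def eliminate_x4_then_max():
    """Push the max over x4 inward: only r1 and r2 mention x4."""
    e1 = {(x2, x3): max(r1[x2, x4] + r2[x3, x4] for x4 in (0, 1))
          for x2, x3 in product([0, 1], repeat=2)}
    return max(w1 * f1[x1, x2] + w2 * f2[x1, x3] + e1[x2, x3]
               for x1, x2, x3 in product([0, 1], repeat=3))

print(brute_force_max(), eliminate_x4_then_max())
```

Repeating the same step for the remaining variables keeps every intermediate table as small as the scope of the functions that mention the eliminated variable.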
![Page 14: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/14.jpg)
Maximization as Linear Constraints
$$e_1(x_2, x_3) = \max_{x_4} [ r_1(x_2, x_4) + r_2(x_3, x_4) ]$$
Equivalent to constraints:
$$e_1(x_2, x_3) \ge r_1(x_2, x_4) + r_2(x_3, x_4) \quad \text{for every assignment to } x_2, x_3, x_4$$
• Exponential in the size of each function's domain, not the number of states
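The constraint set for one eliminated variable can be generated mechanically. The sketch below uses made-up reward tables and an illustrative helper `tightest_e1`; in the actual LP each `e1[x2, x3]` would be an LP variable, whereas here the rewards are constants, so the smallest feasible value can be read off directly.

```python
from itertools import product

# Hypothetical reward tables over binary variables.
r1 = {(a, b): a - b for a, b in product([0, 1], repeat=2)}      # r1(x2, x4)
r2 = {(a, b): 2 * b - a for a, b in product([0, 1], repeat=2)}  # r2(x3, x4)

# One constraint per joint assignment to (x2, x3, x4):
#   e1[x2, x3] >= r1[x2, x4] + r2[x3, x4]
constraints = [((x2, x3), r1[x2, x4] + r2[x3, x4])
               for x2, x3, x4 in product([0, 1], repeat=3)]

def tightest_e1(constraints):
    """Smallest e1 satisfying every constraint, which is the max over x4."""
    e1 = {}
    for key, lower_bound in constraints:
        e1[key] = max(e1.get(key, lower_bound), lower_bound)
    return e1

e1 = tightest_e1(constraints)
print(len(constraints), e1)
```

Eight constraints and four new values are needed here, counts that depend only on the three variables involved and not on the total number of state variables.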
![Page 15: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/15.jpg)
Factored LP: Scaling
![Page 16: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/16.jpg)
Rule-based Representation
![Page 17: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/17.jpg)
Approximate Value Function
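$$V(x) = \sum_{j=1}^{k} w_j h_j(x) = \sum_{j=1}^{k} w_j h_j(x_1, x_2, x_3, x_4) = \sum_{j=1}^{k} w_j \sum_{\mathrm{Rule}_i \in h_j} \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$$
[Figure: $h_1$ drawn as a tree that branches on $x_1$ and then on $x_3$, with leaf values 0, 5, and 0.6.]
$h_1$ as rules:
$\mathrm{Rule}_1$: $x_1$ : 0
$\mathrm{Rule}_2$: $x_1, x_3$ : 5
$\mathrm{Rule}_3$: $x_1, x_3$ : 0.6
Notice: compact representation (2/4 variables, 3/16 rules)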
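A rule can be stored as a context (a partial assignment) together with a value. The sketch below encodes the three rules of $h_1$ in that way. The truth value attached to each variable in a context is an assumption made for illustration, as are the helper names `consistent` and `evaluate_rules`.

```python
from itertools import product

# Each rule is (context, value); a context maps a variable name to the
# truth value it requires. The polarities below are assumed for illustration.
h1_rules = [
    ({"x1": False}, 0.0),              # Rule 1
    ({"x1": True, "x3": False}, 5.0),  # Rule 2
    ({"x1": True, "x3": True}, 0.6),   # Rule 3
]

def consistent(context, state):
    """A rule applies when the state agrees with every variable in its context."""
    return all(state[var] == val for var, val in context.items())

def evaluate_rules(rules, state):
    """Sum the values of all applicable rules (exactly one here)."""
    return sum(value for context, value in rules if consistent(context, state))

state = {"x1": True, "x2": False, "x3": True, "x4": False}
value = evaluate_rules(h1_rules, state)
print(value)

# The three rules cover all 16 states, one applicable rule per state.
names = ["x1", "x2", "x3", "x4"]
coverage = [sum(consistent(c, dict(zip(names, bits))) for c, _ in h1_rules)
            for bits in product([False, True], repeat=4)]
print(coverage)
```

Three rules over two variables stand in for a table with 16 entries, which is the compactness the slide points out.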
![Page 18: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/18.jpg)
Summing Over Rules
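$$V(x) = \sum_{j=1}^{k} w_j \sum_{\mathrm{Rule}_i \in h_j} \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$$
[Figure: the rule tree $h_1(x)$ over $x_1, x_3$ with leaves u1, u2, u3, plus the rule tree $h_2(x)$ over $x_2, x_1$ with leaves u4, u5, u6, equals a combined tree over $x_1, x_2, x_3$ with leaves u1+u4, u2+u6, u3+u6, u5+u1, u2+u4, u3+u4.]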
![Page 19: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/19.jpg)
Multiplying over Rules
• Analogous construction
$$0 \ge \max_x \Big\{ \sum_i w_i \Big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \Big] + R_a(x) \Big\}, \quad \forall a$$
![Page 20: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/20.jpg)
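Rule-based Maximization
$$0 \ge \max_x \Big\{ \sum_i w_i \Big[ \gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x) \Big] + R_a(x) \Big\}, \quad \forall a$$
[Figure: a rule tree over $x_1, x_2, x_3$ with leaves u1, u2, u3, u4. Eliminating $x_2$ gives a tree over $x_1, x_3$ with leaves u1, max(u2,u3), max(u2,u4).]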
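The elimination shown in the figure can be checked by writing both trees as functions. The leaf values and the branch polarities below are assumptions chosen for illustration; only the tree shapes and the resulting leaves max(u2,u3) and max(u2,u4) come from the slide.

```python
from itertools import product

u1, u2, u3, u4 = 1.0, 4.0, 2.0, 7.0  # hypothetical leaf values

def f(x1, x2, x3):
    """Original tree: branch on x1, then x2, then x3 (assumed polarities)."""
    if not x1:
        return u1
    if not x2:
        return u2
    return u3 if not x3 else u4

def f_without_x2(x1, x3):
    """Tree after maximizing out x2, written directly in its smaller form."""
    if not x1:
        return u1
    return max(u2, u3) if not x3 else max(u2, u4)

for x1, x3 in product([False, True], repeat=2):
    eliminated = max(f(x1, x2, x3) for x2 in (False, True))
    print(x1, x3, eliminated, f_without_x2(x1, x3))
```

The result is again a small tree, so the maximization can be repeated variable by variable without expanding to a full table.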
![Page 21: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/21.jpg)
Rule-based Linear Program
• Backprojection, objective function – handled in a similar way
• All the operations (summation, multiplication, maximization) – keep rule representation intact
• $\sum_j w_j \sum_{\mathrm{Rule}_i \in h_j} \mathrm{Rule}_i(x_1, x_2, x_3, x_4)$ is a linear function
![Page 22: Efficient Solution Algorithms for Factored MDPs by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman Presented by Arkady Epshteyn](https://reader034.vdocument.in/reader034/viewer/2022051115/56649ef65503460f94c099ff/html5/thumbnails/22.jpg)
Conclusions
• Compact representation can be exploited to solve MDPs with exponentially many states efficiently.
• Still NP-complete in the worst case.
• Factored solution may increase the size of the LP when the number of states is small (but it scales better).
• Success depends on the choice of the basis functions for value approximation and the factored decomposition of rewards and transition probabilities.