planning under uncertainty. 573 core topics agency problem spaces search knowledge representation...
Post on 21-Dec-2015
![Page 1: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/1.jpg)
Planning Under Uncertainty
![Page 2: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/2.jpg)
573 Core Topics
Agency
Problem Spaces
Search
Knowledge Representation
Reinforcement Learning
Inference
Planning
Supervised Learning
Logic-Based
Probabilistic
![Page 3: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/3.jpg)
Administrivia
• Reading for today’s class: ch 17 thru 17.3
• Reading for Thursday: ch 21 Reinforcement learning
• Problem Set: extension until Monday 11/10, 8am; one additional problem
• Tues 11/11 – no class
• Thurs 11/13 – midterm: in class, closed book, may bring 1 sheet of 8.5x11” paper
![Page 4: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/4.jpg)
Semantics
• Syntax: a description of the legal arrangements of symbols (defines “sentences”)
• Semantics: what the arrangement of symbols means in the world
[Figure: diagram relating Sentences (Representation) to Models (World) through Semantics, with Inference between sentences]
![Page 5: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/5.jpg)
Propositional Logic: SEMANTICS
• “Interpretation” (or “possible world”)
• Specifically, TRUTH TABLES
  Assignment to each variable either T or F
  Assignment of T or F to each connective
  Think “function mapping from P to T (or F)”

| P | Q | P ∧ Q |
|---|---|-------|
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | F |
• Does PQ |= PQ ?
![Page 6: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/6.jpg)
First Order Logic
• Syntax more complex
• Semantics more complex Specifically, the mappings are more complex (And the range of the mappings is more
complex)
![Page 7: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/7.jpg)
Models
• Depiction of one possible “real-world” model
![Page 8: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/8.jpg)
Interpretations = Mappings: syntactic tokens → model elements
Depiction of one possible interpretation, assuming
Constants: Richard, John
Functions: Leg(p,l)
Relations: On(x,y), King(p)
![Page 9: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/9.jpg)
Interpretations = Mappings: syntactic tokens → model elements
Another interpretation, same assumptions
Constants: Richard, John
Functions: Leg(p,l)
Relations: On(x,y), King(p)
![Page 10: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/10.jpg)
Satisfiability, Validity, & Entailment
• S is valid if it is true in all interpretations
• S is satisfiable if it is true in some interpretation
• S is unsatisfiable if it is false in all interpretations
• S1 entails S2 (written S1 |= S2) if, for all interpretations where S1 is true, S2 is also true
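For propositional sentences, the three definitions can be checked mechanically by enumerating every interpretation. The sketch below is only an illustration: sentences are modeled as Python functions from an interpretation (a dict of truth values) to a boolean, and the helper names are hypothetical rather than part of the course material.

```python
from itertools import product

def interpretations(variables):
    """Yield every assignment of True/False to the given variables."""
    for values in product([True, False], repeat=len(variables)):
        yield dict(zip(variables, values))

def is_valid(sentence, variables):
    # True in all interpretations.
    return all(sentence(i) for i in interpretations(variables))

def is_satisfiable(sentence, variables):
    # True in at least one interpretation.
    return any(sentence(i) for i in interpretations(variables))

def entails(s1, s2, variables):
    # Every interpretation that makes s1 true also makes s2 true.
    return all(s2(i) for i in interpretations(variables) if s1(i))

VARS = ["P", "Q"]
p_and_q = lambda i: i["P"] and i["Q"]
p_or_q = lambda i: i["P"] or i["Q"]
p_or_not_p = lambda i: i["P"] or not i["P"]

print(entails(p_and_q, p_or_q, VARS))   # True
print(entails(p_or_q, p_and_q, VARS))   # False
print(is_valid(p_or_not_p, VARS))       # True
```

Enumeration visits 2^n interpretations for n variables, so it is only practical for small sentences.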
![Page 11: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/11.jpg)
Simple proof from def of conditional probability:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
Bayes rules!
Previously, in 573…
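The proof from the definition of conditional probability can be spelled out in two lines. This is the standard derivation, written with the same H (hypothesis) and E (evidence) symbols and assuming P(E) > 0:

$$P(H \mid E)\,P(E) \;=\; P(H \wedge E) \;=\; P(E \mid H)\,P(H)$$

$$\Rightarrow\quad P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)}$$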
![Page 12: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/12.jpg)
© Daniel S. Weld
Joint Distribution
• All you need: can answer any question
  Inference by enumeration
• But… exponential in both time & space
• Solution: exploit conditional independence
![Page 13: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/13.jpg)
Joint Distribution
• All you need to know
• Can answer any question
  Inference by enumeration
• But… exponential in both time & space
• Solution: exploit conditional independence
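Inference by enumeration can be sketched in a few lines: store the full joint as a table and answer a query by summing the entries consistent with the evidence. The three variables and all numbers below are hypothetical, chosen only so that the table sums to 1; the function name `query` is likewise an assumption.

```python
from itertools import product

VARS = ("B", "E", "A")  # hypothetical variables: burglary, earthquake, alarm

# Hypothetical numbers used only to build a joint table that sums to 1.
P_B, P_E = 0.05, 0.10
P_A = {(True, True): 0.9, (True, False): 0.2,
       (False, True): 0.85, (False, False): 0.01}  # keyed by (e, b)

# The explicit joint has 2**3 = 8 entries; it doubles with every added variable.
joint = {}
for b, e, a in product([True, False], repeat=3):
    pa = P_A[(e, b)]
    joint[(b, e, a)] = ((P_B if b else 1 - P_B)
                        * (P_E if e else 1 - P_E)
                        * (pa if a else 1 - pa))

def query(var, evidence):
    """Return Pr(var = True | evidence) by summing matching joint entries."""
    def total(extra):
        want = {**evidence, **extra}
        return sum(p for world, p in joint.items()
                   if all(world[VARS.index(v)] == val for v, val in want.items()))
    return total({var: True}) / total({})

print(query("B", {}))            # prior probability of B
print(query("B", {"A": True}))   # posterior after observing A
```

Both the table size and the summation grow as 2^n in the number of variables, which is the exponential cost in time and space noted above.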
![Page 14: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/14.jpg)
Sample Bayes Net
Nodes: Earthquake, Burglary, Alarm, Nbr1Calls, Nbr2Calls, Radio

Pr(B=t) = 0.05, Pr(B=f) = 0.95

| E, B | Pr(A \| E,B) | Pr(¬A \| E,B) |
|---|---|---|
| e, b | 0.9 | 0.1 |
| e, ¬b | 0.2 | 0.8 |
| ¬e, b | 0.85 | 0.15 |
| ¬e, ¬b | 0.01 | 0.99 |
![Page 15: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/15.jpg)
Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X))
Burglary
Alarm Sounded
Earthquake
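The Markov blanket definition translates directly into set operations on a DAG. In this sketch the network is a dict mapping each node to its parents; the edge set is an assumption modeled on the usual burglary-alarm example, since the transcript does not preserve the arrows of the slide's figure.

```python
def markov_blanket(node, parents):
    """Parents, children, and the children's other parents of `node`."""
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | co_parents) - {node}

# Assumed structure: node -> list of parents.
PARENTS = {
    "Earthquake": [],
    "Burglary": [],
    "Radio": ["Earthquake"],
    "Alarm": ["Earthquake", "Burglary"],
    "Nbr1Calls": ["Alarm"],
    "Nbr2Calls": ["Alarm"],
}

print(sorted(markov_blanket("Burglary", PARENTS)))  # ['Alarm', 'Earthquake']
print(sorted(markov_blanket("Alarm", PARENTS)))
```

Under this assumed structure Earthquake enters Burglary's blanket only because both are parents of Alarm, which is the Par(Childs(X)) term.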
![Page 16: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/16.jpg)
Inference in BNs
• We generally want to compute Pr(X), or Pr(X|E) where E is (conjunctive) evidence
• The graphical independence representation yields efficient inference, organized by network shape
• Two simple algorithms: variable elimination (VE) and junction trees
• Approximate: Markov chain sampling
![Page 17: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/17.jpg)
Learning
• Parameter Estimation: Maximum Likelihood (ML), Maximum A Posteriori (MAP), Bayesian
• Learning Parameters for a Bayesian Network
• Learning Structure of Bayesian Networks
• Naïve Bayes Models
• Hidden Variables (later)
![Page 18: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/18.jpg)
Parameter Estimation Summary
| Estimate | Prior | Hypothesis |
|---|---|---|
| Maximum Likelihood Estimate | Uniform | The most likely |
| Maximum A Posteriori Estimate | Any | The most likely |
| Bayesian Estimate | Any | Weighted combination |
![Page 19: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/19.jpg)
Parameter Estimation and Bayesian Networks
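| E | B | R | A | J | M |
|---|---|---|---|---|---|
| T | F | T | T | F | T |
| F | F | F | F | F | T |
| F | T | F | T | T | T |
| F | F | F | T | T | T |
| F | T | F | F | F | F |
| ... | | | | | |

P(A|E,B) = ?
P(A|E,¬B) = ?
P(A|¬E,B) = ?
P(A|¬E,¬B) = ?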
Prior Beta(2,3) + data = Beta(3,4)
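A short sketch of how a Beta prior combines with counts, and how the three estimators differ on the same data. The prior Beta(2,3) comes from the slide; the counts (one positive, one negative observation) are a hypothetical example, and the function names are assumptions.

```python
def posterior(a, b, pos, neg):
    """A Beta(a, b) prior updated with pos/neg counts gives Beta(a+pos, b+neg)."""
    return a + pos, b + neg

def ml_estimate(pos, neg):
    # Maximum likelihood ignores the prior entirely.
    return pos / (pos + neg)

def map_estimate(a, b, pos, neg):
    # Mode of the posterior Beta distribution (requires a+pos > 1 and b+neg > 1).
    a2, b2 = posterior(a, b, pos, neg)
    return (a2 - 1) / (a2 + b2 - 2)

def bayes_estimate(a, b, pos, neg):
    # Posterior mean: a weighted combination over all hypotheses.
    a2, b2 = posterior(a, b, pos, neg)
    return a2 / (a2 + b2)

print(posterior(2, 3, 1, 1))        # (3, 4)
print(ml_estimate(1, 1))            # 0.5
print(map_estimate(2, 3, 1, 1))
print(bayes_estimate(2, 3, 1, 1))
```

With a uniform prior Beta(1,1) plus data, `map_estimate` reduces to `ml_estimate`, which matches the "Uniform" entry for maximum likelihood in the summary table.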
![Page 20: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/20.jpg)
Structure Learning as Search
• Local Search
  1. Start with some network structure
  2. Try to make a change (add or delete or reverse edge)
  3. See if the new network is any better
• What should the initial state be?
  Uniform prior over random networks? Based on prior knowledge? Empty network?
• How do we evaluate networks?
![Page 21: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/21.jpg)
Naïve Bayes
[Figure: a Class Value node with children F1, F2, F3, …, FN-2, FN-1, FN]
• Assume that features are conditionally independent given the class variable
• Works well in practice
• Forces probabilities towards 0 and 1
![Page 22: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/22.jpg)
Naïve Bayes for Text
• P(spam | w1 … wn)
• Independence assumption?
Spam?
apple dictator Nigeria…
![Page 23: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/23.jpg)
Naïve Bayes for Text
• Modeled as generating a bag of words for a document in a given category by repeatedly sampling with replacement from a vocabulary V = {w1, w2, …, wm} based on the probabilities P(wj | ci).
• Smooth probability estimates with Laplace m-estimates assuming a uniform distribution over all words (p = 1/|V|) and m = |V|.
  Equivalent to a virtual sample of seeing each word in each category exactly once.
![Page 24: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/24.jpg)
Text Naïve Bayes Algorithm(Train)
Let V be the vocabulary of all words in the documents in D
For each category ci ∈ C
    Let Di be the subset of documents in D in category ci
    P(ci) = |Di| / |D|
    Let Ti be the concatenation of all the documents in Di
    Let ni be the total number of word occurrences in Ti
    For each word wj ∈ V
        Let nij be the number of occurrences of wj in Ti
        Let P(wj | ci) = (nij + 1) / (ni + |V|)
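The training procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: documents are pre-tokenized word lists, the toy corpus is invented, and `train_naive_bayes` is a hypothetical name.

```python
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (words, category) pairs. Returns (vocab, priors, cond)."""
    vocab = {w for words, _ in docs for w in words}
    categories = {c for _, c in docs}
    priors, cond = {}, {}
    for c in categories:
        docs_c = [words for words, cat in docs if cat == c]
        priors[c] = len(docs_c) / len(docs)                 # P(ci) = |Di| / |D|
        counts = Counter(w for words in docs_c for w in words)
        n_c = sum(counts.values())                          # ni: word occurrences in Ti
        # Laplace smoothing: (nij + 1) / (ni + |V|)
        cond[c] = {w: (counts[w] + 1) / (n_c + len(vocab)) for w in vocab}
    return vocab, priors, cond

# Invented toy corpus.
docs = [("cheap pills cheap".split(), "spam"),
        ("meeting at noon".split(), "ham"),
        ("cheap meeting".split(), "ham")]
vocab, priors, cond = train_naive_bayes(docs)
print(priors["spam"], cond["spam"]["cheap"])
```

Because of the +1 smoothing, a word never seen in a category still gets a nonzero probability, and each category's word probabilities sum to 1 over the vocabulary.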
![Page 25: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/25.jpg)
Text Naïve Bayes Algorithm(Test)
Given a test document X
Let n be the number of word occurrences in X
Return the category:

$$\operatorname*{argmax}_{c_i \in C} \; P(c_i) \prod_{k=1}^{n} P(a_k \mid c_i)$$

where a_k is the word occurring in the kth position in X
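The decision rule can be sketched directly: score each category by its prior times the product of per-word probabilities and return the best one. The probability tables below are hand-written, hypothetical values, and skipping words outside the vocabulary is an assumption the slide does not address.

```python
import math

def classify(words, priors, cond):
    """Return the category maximizing P(c) * prod_k P(word_k | c)."""
    def score(c):
        # Assumption: words missing from the table are ignored.
        return priors[c] * math.prod(cond[c][w] for w in words if w in cond[c])
    return max(priors, key=score)

# Hypothetical, hand-written probability tables.
priors = {"spam": 0.5, "ham": 0.5}
cond = {
    "spam": {"cheap": 0.6, "meeting": 0.1, "noon": 0.3},
    "ham":  {"cheap": 0.1, "meeting": 0.5, "noon": 0.4},
}

print(classify(["cheap", "cheap", "noon"], priors, cond))  # spam
print(classify(["meeting", "noon"], priors, cond))         # ham
```

Taking the raw product is faithful to the formula but becomes numerically fragile on long documents.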
![Page 26: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/26.jpg)
Naïve Bayes Time Complexity
• Training Time: O(|D|Ld + |C||V|), where Ld is the average length of a document in D.
  Assumes V and all Di, ni, and nij are pre-computed in O(|D|Ld) time during one pass through all of the data.
  Generally just O(|D|Ld), since usually |C||V| < |D|Ld.
• Test Time: O(|C| Lt), where Lt is the average length of a test document.
• Very efficient overall, linearly proportional to the time needed to just read in all the data.
![Page 27: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/27.jpg)
Easy to Implement
• But…
• If you do… it probably won’t work…
![Page 28: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/28.jpg)
Probabilities: Important Detail!
Any more potential problems here?
• $P(\text{spam} \mid E_1 \dots E_n) = \prod_i P(\text{spam} \mid E_i)$
We are multiplying lots of small numbers: danger of underflow! $0.5^{57} \approx 7 \times 10^{-18}$
Solution? Use logs and add! $p_1 \cdot p_2 = e^{\log(p_1) + \log(p_2)}$
Always keep in log form
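The underflow problem is easy to reproduce. In this sketch the probabilities are hypothetical (100 factors of 1e-5): the direct product collapses to exactly 0.0 in double precision, while the sum of logs stays finite and can still be compared across classes.

```python
import math

probs = [1e-5] * 100          # hypothetical per-word probabilities

product = 1.0
for p in probs:
    product *= p              # true value 1e-500 is below the smallest double

log_score = sum(math.log(p) for p in probs)

print(product)                # 0.0
print(log_score)              # a finite negative number
print(0.5 ** 57)              # roughly 7e-18
```

Since log is monotonic, the class with the largest log score is also the class with the largest product, so the argmax can be taken without ever leaving log form.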
![Page 29: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/29.jpg)
Naïve Bayes Posterior Probabilities
• Classification results of naïve Bayes (i.e. the class with maximum posterior probability) are usually fairly accurate (?!?!?)
• However, due to the inadequacy of the conditional independence assumption:
  Actual posterior-probability estimates are not accurate.
  Output probabilities are generally very close to 0 or 1.
![Page 30: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/30.jpg)
573 Core Topics
Agency
Problem Spaces
Search
Knowledge Representation
Reinforcement Learning
Inference
Planning
Supervised Learning
Logic-Based
Probabilistic
![Page 31: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/31.jpg)
Planning
Percepts Actions
What action next?
Static
Fully Observable
Stochastic
Instantaneous
Full
Perfect
Planning under uncertainty
Environment
![Page 32: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/32.jpg)
Models of Planning
| Observation \ Uncertainty | Deterministic | Disjunctive | Probabilistic |
|---|---|---|---|
| Complete | Classical | Contingent | MDP |
| Partial | ??? | Contingent | POMDP |
| None | ??? | Conformant | POMDP |
![Page 33: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/33.jpg)
Awkward
• Never any defn of what an MDP is!
• Missed prob strips defn. Should have done that before the 2DBN.
• All the assumptions about a Markov model could be simplified and eliminated – took too long!
![Page 34: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/34.jpg)
Defn: Markov Model
Q: set of states
init prob distribution
A: transition probability distribution
![Page 35: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/35.jpg)
E.g. Predict Web Behavior
Q: set of states (Pages)
init prob distribution (Likelihood of site entry point)
A: transition probability distribution (User navigation model)
When will visitor leave site?
![Page 36: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/36.jpg)
E.g. Predict Robot’s Behavior
Q: set of states
init prob distribution
A: transition probability distribution
Will it attack Dieter?
![Page 37: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/37.jpg)
Probability Distribution, A
• Forward Causality: the probability of $s_t$ does not depend directly on values of future states.
• Probability of new state could depend on the history of states visited: $\Pr(s_t \mid s_{t-1}, s_{t-2}, \dots, s_0)$
• Markovian Assumption: $\Pr(s_t \mid s_{t-1}, s_{t-2}, \dots, s_0) = \Pr(s_t \mid s_{t-1})$
• Stationary Model Assumption: $\Pr(s_t \mid s_{t-1}) = \Pr(s_k \mid s_{k-1})$ for all k.
![Page 38: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/38.jpg)
Representing A
Q: set of states
init prob distribution
A: transition probabilities
How can we represent these?
[Figure: a state graph over s0 … s6 and a transition matrix with rows and columns s0 … s6. Entry p12 is the probability of transitioning from s1 to s2. ∑ ?]
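One way to represent A is as a matrix whose entry in row i, column j is the probability of moving from state i to state j; each row is then a distribution over next states and must sum to 1. The three-state chain below is a hypothetical example, and the helper names are assumptions.

```python
import random

# Hypothetical 3-state chain: A[i][j] = Pr(next = j | current = i).
A = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.0, 1.0]]
init = [1.0, 0.0, 0.0]        # initial distribution: start in state 0

def is_stochastic(matrix, tol=1e-9):
    """Every row must be non-negative and sum to 1."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in matrix)

def step(dist, matrix):
    """Push a distribution over states one time step forward."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def sample_path(init, matrix, length, rng):
    """Sample a state sequence using only the current state (Markov assumption)."""
    state = rng.choices(range(len(init)), weights=init)[0]
    path = [state]
    for _ in range(length - 1):
        state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
        path.append(state)
    return path

print(is_stochastic(A))       # True
print(step(init, A))          # [0.5, 0.5, 0.0]
print(sample_path(init, A, 5, random.Random(0)))
```

The same matrix is reused at every step, which is the stationary model assumption in code form.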
![Page 39: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/39.jpg)
Factoring Q
• Represent Q simply as a set of states?
• Is there internal structure?
  Consider a robot domain. What is the state space?
[Figure: states s0 … s6]
![Page 40: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/40.jpg)
A Factored domain
• Six Boolean Variables : has_user_coffee (huc) , has_robot_coffee (hrc), robot_is_wet (w), has_robot_umbrella (u), raining (r), robot_in_office (o)
• How many states?
2^6 = 64
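The state count can be confirmed by enumerating every joint assignment to the six Boolean variables. The short variable names are the abbreviations given on the slide; the dict representation of a state is an assumption for illustration.

```python
from itertools import product

VARIABLES = ["huc", "hrc", "w", "u", "r", "o"]

# One state per assignment of True/False to all six variables.
states = [dict(zip(VARIABLES, values))
          for values in product([False, True], repeat=len(VARIABLES))]

print(len(states))    # 64
print(states[0])      # the state in which every variable is False
```

Each additional Boolean variable doubles the number of states, which is why a flat enumeration of Q stops scaling quickly.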
![Page 41: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/41.jpg)
Representing Compactly
Q: set of states
init prob distribution
How represent this efficiently?
[Figure: states s0 … s6, and a Bayes net over the variables r, u, hrc, w]
With a Bayes net (of course!)
![Page 42: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/42.jpg)
Representing A Compactly
Q: set of states
init prob distribution
A: transition probabilities
[Figure: states s0 … s6, and a transition matrix with rows and columns s0 … s35 containing the entry p12]
How big is matrix version of A? 4096
![Page 43: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/43.jpg)
2-Dynamic Bayesian Network
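[Figure: two-slice network with variables huc, hrc, w, u, r, o at time T and again at time T+1; the conditional probability tables at T+1 have 8, 4, 16, 4, 2, and 2 entries]
Total values required to represent transition probability table = 36
vs. 4096 required for a complete state probability table?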
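The two totals are simple arithmetic. The transcript does not preserve which table belongs to which variable, so only the sum is reproduced here; the flat-matrix figure follows from the six Boolean variables.

$$8 + 4 + 16 + 4 + 2 + 2 = 36 \qquad\text{vs.}\qquad 2^6 \times 2^6 = 64 \times 64 = 4096$$

One plausible reading, stated as an assumption, is that a variable with k parents in the previous slice contributes a table of $2^{k+1}$ entries (both truth values for each of the $2^k$ parent settings), which would make the sizes 2, 4, 8, and 16 correspond to 0, 1, 2, and 3 parents.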
![Page 44: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/44.jpg)
Dynamic Bayesian Network
[Figure: variables huc, hrc, w, u, r, o at time T and at time T+1, plus a network over the same variables at time 0]
Also known as a Factored Markov Model
Defined formally as
* Set of random vars
* BN for initial state
* 2-layer DBN for transitions
![Page 45: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/45.jpg)
Observability
• Full Observability
• Partial Observability
• No Observability
![Page 46: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/46.jpg)
Reward/cost
• Each action has an associated cost.
• Agent may accrue rewards at different stages. A reward may depend on:
  the current state
  the (current state, action) pair
  the (current state, action, next state) triplet
• Additivity assumption: costs and rewards are additive.
• Reward accumulated = R(s0) + R(s1) + R(s2) + …
![Page 47: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/47.jpg)
Horizon
• Finite: plan till t stages.
  Reward = $R(s_0) + R(s_1) + R(s_2) + \dots + R(s_t)$
• Infinite: the agent never dies.
  The reward $R(s_0) + R(s_1) + R(s_2) + \dots$ could be unbounded.
  Discounted reward: $R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \dots$
  Average reward: $\lim_{n \to \infty} \frac{1}{n} \sum_i R(s_i)$
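A small numeric sketch of why discounting tames the infinite-horizon sum: with a constant reward of 1 per step the undiscounted total grows without bound, while the discounted total approaches 1/(1-γ). The reward sequences and function names here are hypothetical.

```python
def discounted_return(rewards, gamma):
    """R(s0) + gamma*R(s1) + gamma^2*R(s2) + ... over a finite prefix."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def average_reward(rewards):
    """(1/n) * sum of rewards, the finite-n version of the average criterion."""
    return sum(rewards) / len(rewards)

print(discounted_return([1, 2, 3], 0.5))      # 2.75
print(discounted_return([1] * 1000, 0.9))     # close to 1 / (1 - 0.9) = 10
print(sum([1] * 1000))                        # undiscounted total keeps growing
print(average_reward([1, 2, 3]))              # 2.0
```

For rewards bounded by R_max and 0 ≤ γ < 1, the discounted sum is bounded by R_max/(1-γ), so expected values remain comparable across policies.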
![Page 48: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/48.jpg)
Goal for an MDP
• Find a policy which maximizes expected discounted reward over an infinite horizon for a fully observable Markov decision process.
Why shouldn’t the planner find a plan??
What is a policy??
![Page 49: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/49.jpg)
Optimal value of a state
• Define V*(s), the `value of a state’, as the maximum expected discounted reward achievable from this state.
• Value of state if we force it to do action “a” right now, but let it act optimally later:
  $$Q^*(a,s) = R(s) + c(a) + \gamma \sum_{s' \in S} \Pr(s' \mid a,s)\, V^*(s')$$
• V* should satisfy the following equation:
  $$V^*(s) = \max_{a \in A} Q^*(a,s) = R(s) + \max_{a \in A} \Big\{ c(a) + \gamma \sum_{s' \in S} \Pr(s' \mid a,s)\, V^*(s') \Big\}$$
![Page 50: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/50.jpg)
Value iteration
• Start with an arbitrary assignment of values to each state (or use an admissible heuristic).
• Iterate over the set of states and in each iteration improve the value function as follows (`Bellman Backup’):
  $$V_{t+1}(s) = R(s) + \max_{a \in A} \Big\{ c(a) + \gamma \sum_{s' \in S} \Pr(s' \mid a,s)\, V_t(s') \Big\}$$
• Stop the iteration appropriately. $V_t$ approaches V* as t increases.
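A minimal sketch of value iteration using the same ingredients as the backup equation: state reward R(s), action term c(a), discount γ, and transition probabilities Pr(s'|a,s). The two-state MDP, its numbers, the zero initial values, and the tolerance-based stopping test are all assumptions for illustration; c(a) is entered as a non-positive number because the equation adds it to the reward.

```python
def q_value(s, a, V, R, c, P, gamma):
    """R(s) + c(a) + gamma * sum over s' of Pr(s'|a,s) * V(s')."""
    return R[s] + c[a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())

def value_iteration(states, actions, R, c, P, gamma, eps=1e-6):
    V = {s: 0.0 for s in states}                     # arbitrary initial values
    while True:
        new_V = {s: max(q_value(s, a, V, R, c, P, gamma) for a in actions)
                 for s in states}                    # one Bellman backup per state
        change = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if change < eps:                             # assumed stopping rule
            return V

# Hypothetical two-state MDP.
states, actions = ["low", "high"], ["stay", "go"]
R = {"low": 0.0, "high": 1.0}
c = {"stay": 0.0, "go": -0.1}
P = {"low":  {"stay": {"low": 1.0}, "go": {"high": 0.8, "low": 0.2}},
     "high": {"stay": {"high": 1.0}, "go": {"low": 1.0}}}

V = value_iteration(states, actions, R, c, P, gamma=0.9)
print(V)
```

For this toy problem the fixed point can be checked by hand: staying in `high` forever is worth 1/(1-0.9) = 10, and `low` satisfies V = -0.1 + 0.9(0.8·10 + 0.2·V), giving V = 7.1/0.82.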
![Page 51: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/51.jpg)
Bellman Backup
[Figure: from state s, actions a1, a2, a3 lead to successor states valued by Vn; each action yields Qn+1(s,a), and Vn+1(s) is the max over actions]
![Page 52: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/52.jpg)
Stopping Condition
• ε-convergence: a value function is ε-optimal if the error (residue) at every state is less than ε.
  $\text{Residue}(s) = |V_{t+1}(s) - V_t(s)|$
  Stop when $\max_{s \in S} \text{Residue}(s) < \varepsilon$
![Page 53: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/53.jpg)
Complexity of value iteration
• One iteration takes O(|S|²|A|) time.
• Number of iterations required : poly(|S|,|A|,1/(1-γ))
• Overall, the algo is polynomial in state space!
• Thus exponential in number of state vars.
![Page 54: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/54.jpg)
Computation of optimal policy
• Given the value function V*(s), for each state, do Bellman backups and the action which maximises the inner product term is the optimal action.
Optimal policy is stationary (time independent) – intuitive for infinite horizon case.
![Page 55: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/55.jpg)
Policy evaluation
• Given a policy Π: S → A, find value of each state using this policy.
• $$V^{\Pi}(s) = R(s) + c(\Pi(s)) + \gamma \sum_{s' \in S} \Pr(s' \mid \Pi(s), s)\, V^{\Pi}(s')$$
• This is a system of linear equations involving |S| variables.
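Because the policy fixes the action in every state, the evaluation equation is linear in the unknown values and can be solved exactly. The sketch below rearranges it to (I − γP_Π)V = R + c_Π and solves with a small Gauss-Jordan routine; the two-state MDP, the policy, and the helper names are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def evaluate_policy(states, policy, R, c, P, gamma):
    idx = {s: i for i, s in enumerate(states)}
    M = [[0.0] * len(states) for _ in states]
    b = []
    for s in states:
        a = policy[s]
        M[idx[s]][idx[s]] += 1.0                     # identity term
        for s2, p in P[s][a].items():
            M[idx[s]][idx[s2]] -= gamma * p          # minus gamma * Pr(s'|policy(s), s)
        b.append(R[s] + c[a])
    x = solve(M, b)
    return {s: x[idx[s]] for s in states}

# Hypothetical two-state MDP and policy.
states = ["low", "high"]
R = {"low": 0.0, "high": 1.0}
c = {"stay": 0.0, "go": -0.1}
P = {"low":  {"stay": {"low": 1.0}, "go": {"high": 0.8, "low": 0.2}},
     "high": {"stay": {"high": 1.0}, "go": {"low": 1.0}}}
policy = {"low": "go", "high": "stay"}

V_pi = evaluate_policy(states, policy, R, c, P, gamma=0.9)
print(V_pi)
```

A direct solve costs on the order of |S|³ operations, so this is only a reasonable choice for small state spaces.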
![Page 56: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/56.jpg)
Bellman’s principle of optimality
• A policy Π is optimal if VΠ(s) ≥ VΠ’(s) for all policies Π’ and all states s є S.
• Rather than finding the optimal value function, we can try and find the optimal policy directly, by doing a policy space search.
![Page 57: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/57.jpg)
Policy iteration
• Start with any policy (Π0).
• Iterate:
  Policy evaluation: for each state find $V^{\Pi_i}(s)$.
  Policy improvement: for each state s, find action a* that maximises $Q^{\Pi_i}(a,s)$.
    If $Q^{\Pi_i}(a^*,s) > V^{\Pi_i}(s)$, let $\Pi_{i+1}(s) = a^*$; else let $\Pi_{i+1}(s) = \Pi_i(s)$.
• Stop when $\Pi_{i+1} = \Pi_i$.
• Typically converges in fewer iterations than value iteration, but each policy evaluation step is more expensive.
![Page 58: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/58.jpg)
Modified Policy iteration
• Rather than evaluating the actual value of policy by solving system of equations, approximate it by using value iteration with fixed policy.
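A sketch of the modified scheme: the evaluation step runs a fixed number of backups under the current policy instead of solving the linear system, and the improvement step switches an action only when it is strictly better. The toy MDP, the number of sweeps, and the small tie-breaking tolerance are assumptions.

```python
def backup(s, a, V, R, c, P, gamma):
    return R[s] + c[a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())

def modified_policy_iteration(states, actions, R, c, P, gamma, sweeps=20):
    policy = {s: actions[0] for s in states}         # start with any policy
    V = {s: 0.0 for s in states}
    while True:
        # Approximate policy evaluation: value iteration with the policy fixed.
        for _ in range(sweeps):
            V = {s: backup(s, policy[s], V, R, c, P, gamma) for s in states}
        # Policy improvement: switch only if an action is strictly better.
        new_policy = {}
        for s in states:
            best = max(actions, key=lambda a: backup(s, a, V, R, c, P, gamma))
            current = backup(s, policy[s], V, R, c, P, gamma)
            better = backup(s, best, V, R, c, P, gamma) > current + 1e-12
            new_policy[s] = best if better else policy[s]
        if new_policy == policy:
            return policy, V
        policy = new_policy

# Hypothetical two-state MDP.
states, actions = ["low", "high"], ["stay", "go"]
R = {"low": 0.0, "high": 1.0}
c = {"stay": 0.0, "go": -0.1}
P = {"low":  {"stay": {"low": 1.0}, "go": {"high": 0.8, "low": 0.2}},
     "high": {"stay": {"high": 1.0}, "go": {"low": 1.0}}}

policy, V = modified_policy_iteration(states, actions, R, c, P, gamma=0.9)
print(policy)
```

The returned values are only approximate because evaluation is truncated, but on this toy problem the policy itself settles after a single improvement step.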
![Page 59: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/59.jpg)
RTDP iteration
• Start with initial belief and initialize value of each belief as the heuristic value.
• For current belief:
  Save the action that minimises the current state value in the current policy.
  Update the value of the belief through Bellman Backup.
• Apply the minimum action and then randomly pick an observation.
• Go to next belief assuming that observation.
• Repeat until goal is achieved.
![Page 60: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/60.jpg)
Fast RTDP convergence
• What are the advantages of RTDP?
• What are the disadvantages of RTDP?
How to speed up RTDP?
![Page 61: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/61.jpg)
Other speedups
• Heuristics
• Aggregations
• Reachability Analysis
![Page 62: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/62.jpg)
Going beyond full observability
• In execution phase, we are uncertain where we are,
• but we have some idea of where we can be.
• A belief state = ?
![Page 63: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/63.jpg)
Models of Planning
| Observation \ Uncertainty | Deterministic | Disjunctive | Probabilistic |
|---|---|---|---|
| Complete | Classical | Contingent | MDP |
| Partial | ??? | Contingent | POMDP |
| None | ??? | Conformant | POMDP |
![Page 64: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/64.jpg)
Speedups
• Reachability Analysis
• More informed heuristic
![Page 65: Planning Under Uncertainty. 573 Core Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning Supervised](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a48681/html5/thumbnails/65.jpg)
Algorithms for search
• A*: works for sequential solutions.
• AO*: works for acyclic solutions.
• LAO*: works for cyclic solutions.
• RTDP: works for cyclic solutions.