learning probabilistic hierarchical task networks to capture user preferences
DESCRIPTION
A riddle for you: What is the magic idea in planning that is at once more efficient and has higher complexity than vanilla planners? Learning Probabilistic Hierarchical Task Networks to Capture User Preferences. Nan Li, Subbarao Kambhampati, and Sungwook Yoon - PowerPoint PPT Presentation

TRANSCRIPT
LEARNING PROBABILISTIC HIERARCHICAL TASK NETWORKS TO CAPTURE USER PREFERENCES
Nan Li, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281
[email protected], [email protected], [email protected]
Thanks to William Cushing
A riddle for you:
What is the magic idea in planning that is at once more efficient and has higher complexity than vanilla planners?
TWO TALES OF HTN PLANNING

|           | Tale 1              | Tale 2              |
|-----------|---------------------|---------------------|
| Role      | Abstraction         | Preference handling |
| Aim       | Efficiency          | Quality             |
| Direction | Top-down            | Bottom-up           |
| Learning  | Most prior work     | Our work            |
LEARNING USER PLAN PREFERENCES

Observed plans (with counts):
- Pbus: Getin(bus, source), Buyticket(bus), Getout(bus, dest) (2 times)
- Ptrain: Buyticket(train), Getin(train, source), Getout(train, dest) (8 times)
- Phike: Hitchhike(source, dest) (0 times)

Hitchhike? No way!
LEARNING USER PREFERENCES AS pHTNs

Given a set O of plans executed by the user, find a generative model Hl:

Hl = argmax_H p(O | H)

Probabilistic Hierarchical Task Networks (pHTNs):

S  → 0.2, A1 B1
S  → 0.8, A2 B2
B1 → 1.0, A2 A3
B2 → 1.0, A1 A3
A1 → 1.0, Getin
A2 → 1.0, Buyticket
A3 → 1.0, Getout
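The grammar on this slide can be read as a generative model: expanding S top-down yields the bus plan with probability 0.2 and the train plan with probability 0.8. A minimal sketch of such a sampler (the dictionary encoding and function names are ours, for illustration, not the paper's implementation):

```python
import random

# The slide's pHTN as a probabilistic grammar: each non-terminal maps
# to a list of (probability, expansion) pairs; symbols absent from the
# dictionary are primitive actions.
GRAMMAR = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S"):
    """Expand `symbol` top-down, choosing schemas by their probabilities."""
    if symbol not in GRAMMAR:  # primitive action: emit it
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, expansion in GRAMMAR[symbol]:
        acc += prob
        if r <= acc:
            return [a for child in expansion for a in sample_plan(child)]
    # numerical fallback: use the last schema
    return [a for child in GRAMMAR[symbol][-1][1] for a in sample_plan(child)]
```

Sampling many plans reproduces the observed 8-to-2 train/bus preference in expectation.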
LEARNING pHTNs

HTNs can be seen as providing a grammar of desired solutions:
- Actions ↔ Words
- Plans ↔ Sentences
- HTNs ↔ Grammar
- HTN learning ↔ Grammar induction

pHTN learning by probabilistic context-free grammar (pCFG) induction. Assumptions: actions are parameter-less and unconditional.

S  → 0.2, A1 B1
S  → 0.8, A2 B2
B1 → 1.0, A2 A3
B2 → 1.0, A1 A3
A1 → 1.0, Getin
A2 → 1.0, Buyticket
A3 → 1.0, Getout
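Under this grammar-induction view, the probability a pHTN assigns to a plan is the inside probability familiar from pCFG parsing. A sketch, assuming rules are split into binary (non-terminal) and terminal (primitive-action) schemas as on the slide; the interface is ours, for illustration:

```python
def inside_probability(plan, binary, terminal, start="S"):
    """CYK-style inside computation: p(plan | grammar).
    `binary` maps head -> [(prob, left, right)]; `terminal` maps
    head -> [(prob, action)]."""
    n = len(plan)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    # Width-1 spans: terminal schemas that emit the observed action.
    for i, a in enumerate(plan):
        for head, rules in terminal.items():
            for p, act in rules:
                if act == a:
                    chart[i][i + 1][head] = chart[i][i + 1].get(head, 0.0) + p
    # Wider spans: combine adjacent sub-spans via binary schemas.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for head, rules in binary.items():
                    for p, left, right in rules:
                        pl = chart[i][k].get(left, 0.0)
                        pr = chart[k][j].get(right, 0.0)
                        if pl and pr:
                            chart[i][j][head] = chart[i][j].get(head, 0.0) + p * pl * pr
    return chart[0][n].get(start, 0.0)
```

On the slide's grammar this returns 0.2 for the bus plan and 0.8 for the train plan.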
A TWO-STEP ALGORITHM
• Greedy Structure Hypothesizer: Hypothesizes the
schema structure
• Expectation-Maximization (EM) Phase: Refines schema
probabilities Removes redundant
schemas
Generalizes Inside-Outside Algorithm (Lary & Young, 1990)
GREEDY STRUCTURE HYPOTHESIZER

Structure learning: bottom-up; prefers recursive to non-recursive schemas.
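One way to picture bottom-up structure hypothesizing is repeated chunking of frequently adjacent action pairs into new non-primitive actions. The sketch below does exactly that, BPE-style; it is illustrative only and omits the recursive-schema preference the actual Greedy Structure Hypothesizer applies (names like `B1` are our invention):

```python
from collections import Counter

def greedy_chunk(plans, max_schemas=10):
    """Bottom-up structure-learning sketch: repeatedly introduce a new
    non-primitive action for the most frequent adjacent symbol pair."""
    schemas, plans = {}, [list(p) for p in plans]
    for i in range(max_schemas):
        pairs = Counter(tuple(p[j:j + 2]) for p in plans for j in range(len(p) - 1))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # no pair recurs: stop hypothesizing
            break
        name = "B%d" % (i + 1)  # hypothetical non-terminal name
        schemas[name] = pair
        plans = [replace_pair(p, pair, name) for p in plans]
    return schemas, plans

def replace_pair(plan, pair, name):
    """Rewrite every occurrence of `pair` in `plan` as the new symbol."""
    out, j = [], 0
    while j < len(plan):
        if tuple(plan[j:j + 2]) == pair:
            out.append(name)
            j += 2
        else:
            out.append(plan[j])
            j += 1
    return out
```

On the bus/train observations from earlier slides, this folds each three-step plan into a single top-level schema.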
![Page 8: Learning Probabilistic Hierarchical Task Networks to Capture User Preferences](https://reader035.vdocument.in/reader035/viewer/2022062309/568146ca550346895db40524/html5/thumbnails/8.jpg)
EM PHASE

E step: compute each plan's parse trees, keeping the most probable parse tree.
M step: update the selection probabilities of schemas of the form s: ai → p, aj ak.
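The M step's update can be pictured as maximum-likelihood re-estimation from rule-usage counts gathered over the E step's parse trees: each schema's selection probability becomes its usage count divided by the number of times its head symbol was expanded. A sketch under that assumption (the counting interface is ours; the paper's algorithm generalizes Inside-Outside rather than counting hard parses):

```python
from collections import Counter

def m_step(parse_rule_uses):
    """Re-estimate selection probabilities from parse counts.
    `parse_rule_uses` maps (head, expansion_tuple) -> usage count."""
    head_totals = Counter()
    for (head, _), n in parse_rule_uses.items():
        head_totals[head] += n
    # p(schema) = uses of this schema / total expansions of its head
    return {rule: n / head_totals[rule[0]] for rule, n in parse_rule_uses.items()}
```

Fed the 2 bus and 8 train parses, this recovers the 0.2 / 0.8 split on the two S schemas.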
EVALUATION

Ideal: user studies (too hard). Our approach:
- Assume H* represents the user's preferences
- Generate observed plans O using H* (H* → O)
- Learn Hl from O (O → Hl)
- Compare H* and Hl via the plan distributions they induce (H* → T*, Hl → Tl)

Syntactic similarity is not important; only the distribution is. We use the KL divergence between the distributions T* and Tl, which measures the distance between distributions.

Domains: randomly generated domains, Logistics Planning, Gold Miner.

Pipeline: H* → P1, P2, …, Pn → Learner → Hl
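The comparison metric itself is standard: KL(T* || Tl) over plans. A minimal sketch, assuming both distributions are finite dictionaries and Tl covers the support of T*:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) * log(p(x) / q(x)).
    Here p is the plan distribution induced by H*, q the one induced
    by Hl. Terms with p(x) == 0 contribute nothing."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)
```

The divergence is 0 exactly when the learned model reproduces the user's plan distribution, and grows as the two distributions drift apart.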
RATE OF LEARNING AND CONCISENESS

Rate of learning: more training plans yield better schemas.
Conciseness:
- Small domains: learned schemas contain only 1 or 2 more non-primitive actions than H*
- Large domains: many more non-primitive actions; refine structure learning?

Randomly generated domains.
EFFECTIVENESS OF EM
• Compare greedy schemas with learned schemas• EM step is very effective in capturing user preferences
Randomly Generated Domains
“BENCHMARK” DOMAINS

Logistics Planning:
- H*: move by plane or truck; prefer plane; prefer fewer steps
- KL divergence: 0.04
- Recovers plane > truck and fewer steps > more steps

Gold Miner:
- H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall
- KL divergence: 0.52
- Reproduces the basic strategy
CONCLUSIONS & EXTENSIONS

- Learn user plan preferences: the learned HTNs capture preferences rather than domain abstractions
- Evaluate predictive power: compare distributions rather than structure
- Preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car
- See "Learning user plan preferences obfuscated by feasibility constraints," ICAPS'09