Learning Procedural Planning Knowledge in Complex Environments

Douglas Pearson
douglas.pearson@threepenny.net
March 2004
Characterizing the Learner
Learners can be placed on two axes: Method (deliberate vs. implicit) and Knowledge Representation (declarative vs. procedural).

• Symbolic learners – deliberate method, declarative KR
• Reinforcement learning – implicit method, procedural KR
• IMPROV – deliberate method, procedural KR

Simple environments suit simpler agents with weak, slower learning; complex environments demand complex agents with strong, faster learning.

Complex environments:
• Actions: duration & conditional
• Sensing: limited, noisy, delayed
• Task: timely response
• Domain: change over time, large state space
Why Limit Knowledge Access?
• Procedural – can only be accessed by executing it
• Declarative – can answer when it will execute and what it will do

Problems with declarative access:
• Availability
  – If (x^5 + 3x^3 – 5x^2 + 2) > 7 then Action
  – Chains of rules: A -> B -> C -> Action
• Efficiency
  – O(size of knowledge base) or worse
  – The agent slows down as it learns more

IMPROV's representation:
• Sets of production rules for operator preconditions and actions
• Assume the learner can only execute rules
• But allow declarative knowledge to be added when it is efficient to do so
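The access distinction can be illustrated with a minimal sketch (all names and the rule format are illustrative, not IMPROV's actual representation): a procedural rule is opaque and can only be run, while a declarative rule exposes its conditions as inspectable data.

```python
# Hypothetical sketch: procedural vs. declarative access to a rule.

def procedural_rule(state):
    """Opaque: the only way to learn anything about it is to execute it."""
    if state["speed"] > 30 and state["light"] == "red":
        return "brake"
    return None  # no proposal

class DeclarativeRule:
    """Transparent: conditions and action are inspectable data structures."""
    def __init__(self, conditions, action):
        self.conditions = conditions  # e.g. [("speed", ">", 30)]
        self.action = action

    def will_fire(self, state):
        """Answer 'when will it execute?' without running the action."""
        ops = {">": lambda a, b: a > b, "==": lambda a, b: a == b}
        return all(ops[op](state[f], v) for f, op, v in self.conditions)

state = {"speed": 40, "light": "red"}
print(procedural_rule(state))              # can only observe the output
rule = DeclarativeRule([("speed", ">", 30), ("light", "==", "red")], "brake")
print(rule.will_fire(state), rule.action)  # can query condition and action
```

Note the efficiency trade-off the slide mentions: answering queries over a declarative rule base costs time proportional to its size, which is why IMPROV assumes execution-only access by default.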
Focusing on Part of the Problem
[Graph: task performance (0% to 100%) as a function of knowledge. The initial rule base supplies the representation; the domain knowledge built on top of it is what IMPROV learns.]
The Problem
• Cast the learning problem as
  – Error detection (incomplete or incorrect knowledge)
  – Error correction (fixing or adding knowledge)
• But with only limited, procedural access
• Aim is to support learning in complex, scalable agents/environments.
Error Detection Problem
Plan (built from existing, possibly incorrect, knowledge):

  S1 (Speed 30) -> S2 (Speed 10) -> S3 (Speed 0) -> S4 (Speed 30)
How to monitor the plan during execution without direct knowledge access?
Error Detection Solution

• Direct monitoring – not possible
• Instead, detect lack of progress toward the goal
  – No rules matching, or conflicting rules

  S1 (Speed 30) -> S2 (Speed 10) -> S3 (Speed 0) -> S4: engine stalls, no proposal

• Does not predict the behavior of the world (useful in stochastic environments)
• But gives no implicit notion of solution quality
• Domain-specific error conditions can be added – but are not required
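A minimal sketch of this progress-based detection (rule and state formats are illustrative): an error is signaled purely from rule behavior, with no inspection of rule contents and no model of the world.

```python
# Hypothetical sketch of progress-based error detection.

def detect_error(rules, state):
    """Signal an error from rule behavior alone: an impasse (no proposal)
    or a conflict (multiple distinct proposals) means lack of progress."""
    proposals = [r(state) for r in rules]
    proposals = [p for p in proposals if p is not None]
    if not proposals:
        return "impasse: no rule proposes an operator"
    if len(set(proposals)) > 1:
        return "conflict: rules propose " + str(sorted(set(proposals)))
    return None  # exactly one proposal: keep executing

rules = [
    lambda s: "brake" if s["speed"] > 0 else None,
    lambda s: None,  # the engine-stall state matches no rule
]
print(detect_error(rules, {"speed": 0}))   # impasse -> monitoring flags an error
print(detect_error(rules, {"speed": 30}))  # one proposal -> no error
```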
IMPROV’s Recovery Method
[Flowchart]

Search: Replan -> Execute, recording [State, Op -> Result] at each step -> repeat until the goal is reached.

On failure, Learning: Identify the incorrect operator(s) -> Train the inductive learner -> Change the domain knowledge -> return to Replan.
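The recovery cycle can be sketched as a loop (every function name here is a hypothetical placeholder for the corresponding component, not IMPROV's actual interface):

```python
# Hypothetical sketch of the execute/detect/correct cycle.

def improv_loop(plan, knowledge, execute, at_goal, detect_error,
                identify_incorrect_ops, train_learner, replan):
    """Execute the plan, recording [State, Op -> Result] episodes; on failure,
    assign blame, retrain, patch the domain knowledge, and replan."""
    episodes = []
    while True:
        for op in plan:
            state, result = execute(op)
            episodes.append((state, op, result))
            if at_goal(result):
                return knowledge
            if detect_error(result):
                bad_ops = identify_incorrect_ops(episodes)
                knowledge = train_learner(knowledge, episodes, bad_ops)
                plan = replan(knowledge)
                break  # restart execution with the corrected plan
```

The recorded episodes serve double duty: they drive credit assignment (which operator failed) and become the training instances for the inductive learner.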
Finding the Incorrect Operator(s)
Speed 30 -> Speed 10 -> Speed 0 -> Speed 30
Speed 30 -> Speed 10 -> Speed 0 -> Change-Gear -> Speed 30

Change-Gear is over-specific; Speed-0 is over-general.
By waiting, the learner can do better credit assignment.
Learning to Correct the Operator
• Collect a set of training instances: [State, Operator -> Result]
• Differences between states can be identified:

  State A: Speed = 40, Light = green, Self = car, Other = car
  State B: Speed = 40, Light = green, Self = car, Other = ambulance

• These differences are used as a default bias when training the inductive learner
• Preconditions are learned as a classification problem (predict the operator from the state)
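The difference-based bias can be sketched as follows (illustrative only; IMPROV trains a full inductive learner, whereas this helper merely surfaces the candidate features): compare states where the operator succeeded against states where it failed, and keep the features that separate them.

```python
# Hypothetical sketch: find features that discriminate success from failure.

def discriminating_features(success_states, failure_states):
    """Features whose values separate states where the operator worked
    from states where it failed -- candidate precondition fixes."""
    diffs = {}
    for feat in success_states[0]:
        ok = {s[feat] for s in success_states}
        bad = {s[feat] for s in failure_states}
        if ok.isdisjoint(bad):
            diffs[feat] = (ok, bad)
    return diffs

worked = [{"speed": 40, "light": "green", "self": "car", "other": "car"}]
failed = [{"speed": 40, "light": "green", "self": "car", "other": "ambulance"}]
print(discriminating_features(worked, failed))
# speed, light, and self agree across both states, so only "other"
# discriminates -- the bias points the learner at that feature first
```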
K-Incremental Learning
• Collect a set of k instances
• Then train the inductive learner

Instance set size spectrum:

  1 (reinforcement learners) ... k1 (till correction: IMPROV) ... k2 (till unique cause: EXPO) ... n (non-incremental learners)

K-incremental learning:
– k does not grow over time => incremental behavior
– Better decisions about what to discard when generalizing
– When doing "active learning", bad early learning can really hurt
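The batching idea can be sketched as a buffer that retrains whenever k instances have accumulated (the class and its training hook are placeholders, not IMPROV's implementation):

```python
# Hypothetical sketch of k-incremental learning: batch k instances, then train.

class KIncrementalLearner:
    def __init__(self, k, train_fn):
        self.k = k                # fixed batch size: does not grow over time
        self.train_fn = train_fn  # any inductive learner taking a batch
        self.buffer = []
        self.model = None

    def observe(self, state, operator, result):
        """Buffer an instance; retrain once k instances have accumulated."""
        self.buffer.append((state, operator, result))
        if len(self.buffer) >= self.k:
            # A batch of k instances gives the learner more evidence about
            # what to discard when generalizing; k=1 degenerates to purely
            # incremental updates, k=n to a non-incremental learner.
            self.model = self.train_fn(self.buffer)
            self.buffer.clear()
        return self.model
```

Because k is fixed, memory and per-update cost stay bounded, preserving incremental behavior while avoiding the rash early generalizations that hurt active learning.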
Extending to Operator Actions
Speed 30 -> Speed 0 -> Speed 20

Decompose into an operator hierarchy:

  Speed 30 -(Brake)-> Speed 0 -(Release)-> Speed 20
  Brake decomposes into: Slow -5 -> Slow -10 -> Slow -10 -> Slow 0

The hierarchy terminates with operators that modify a single symbol.
Correcting Actions
Expected effects of braking:        Slow -5, Slow -10, Slow -10
Observed effects of braking on ice: Slow -2, Slow -4, Slow -6

=> Failure

Use the correction method to change the preconditions of these sub-operators.
Change Procedural Actions
Changing the effects of Brake:

  Specialize Slow -5:  Braking & slow=0 & ice => reject slow -5
  Generalize Slow -2:  Braking & slow=0 & ice => propose slow -2

Supports complex actions:
• Actions with durations (a sequence of operators)
• Conditional actions (branches in the sequence of operators)
• Multiple simultaneous effects
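A minimal sketch of this specialize/generalize correction (the rule format and matcher are illustrative): the expected effect is rejected in the new context while the observed one is proposed, so the same precondition machinery corrects actions.

```python
# Hypothetical sketch: correcting Brake's effects on ice by rewriting
# sub-operator preconditions (rule format is illustrative).

def propose_effects(rules, state):
    """Apply propose/reject rules; rejections override proposals."""
    proposed, rejected = set(), set()
    for conds, verdict, effect in rules:
        if all(state.get(f) == v for f, v in conds.items()):
            (proposed if verdict == "propose" else rejected).add(effect)
    return proposed - rejected

rules = [
    ({"braking": True}, "propose", "slow -5"),              # original model
    ({"braking": True, "ice": True}, "reject", "slow -5"),  # specialize slow -5
    ({"braking": True, "ice": True}, "propose", "slow -2"), # generalize slow -2
]
print(propose_effects(rules, {"braking": True, "ice": False}))  # {'slow -5'}
print(propose_effects(rules, {"braking": True, "ice": True}))   # {'slow -2'}
```

On dry roads the original effect survives; on ice the specialized rejection and generalized proposal redirect the model, without touching the original rule.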
IMPROV Summary
On the Method × KR axes (deliberate vs. implicit; declarative/non-incremental vs. procedural/incremental): symbolic learners are deliberate with declarative, non-incremental KR; reinforcement learning is implicit with procedural, incremental KR; IMPROV is deliberate yet procedural and incremental.

IMPROV supports:
• Powerful agents
  – Multiple goals
  – Faster, deliberate learning
• Complex environments
  – Noise
  – Complex actions
  – Dynamic environments

k-Incremental learning:
• Improved credit assignment
  – Which operator
  – Which feature

A general, weak, deliberate learner assuming only procedural access:
• General-purpose error detection
• A general correction method applied to both preconditions and actions
• Nice re-use of the precondition learner to learn actions
• Easy to add domain-specific knowledge to make the method stronger
Redux: Diagram-based Example-driven Knowledge Acquisition
Douglas Pearson
March 2004
1. User specifies desired behavior
2. User selects features – defining the rules

Later, ML will be used to guess this initial feature set.
3. Compare desired with rules
Desired: Move-through(door1) -> Turn-to-face(threat1) -> Shoot(threat1)
Actual:  Move-through(door1) -> Turn-to-face(neutral1) -> Shoot(neutral1)
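The comparison step can be sketched as a trace diff (a deliberately simple helper; Redux's actual analysis is richer): find the first operator where the rules' behavior departs from the desired behavior.

```python
# Hypothetical sketch: locate where the actual trace diverges from desired.

def first_divergence(desired, actual):
    """Return (index, desired_op, actual_op) at the first mismatch, or None."""
    for i, (d, a) in enumerate(zip(desired, actual)):
        if d != a:
            return i, d, a
    return None

desired = ["Move-through(door1)", "Turn-to-face(threat1)", "Shoot(threat1)"]
actual  = ["Move-through(door1)", "Turn-to-face(neutral1)", "Shoot(neutral1)"]
print(first_divergence(desired, actual))
# (1, 'Turn-to-face(threat1)', 'Turn-to-face(neutral1)')
```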
4. Identify and correct problems
• Detect differences between the desired behavior and the rules:
  – Overgeneral preconditions
  – Conflicts within a scenario
  – Conflicts between scenarios
  – Choice points where there is no guidance
  – etc.
• All of these errors are detected automatically when a rule is created
5. Fast rule creation by expert
[Diagram] The expert defines behavior with diagram-based examples, which feed a library of validated behavior examples (rules such as A -> B; C -> D; E, J -> F; G, A, C -> H; E, G -> I; J, K -> L). The engineer's analysis and generation tools detect inconsistencies, generalize, generate rules, and simulate execution, producing executable code that runs in the simulation environment.