![Page 1: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/1.jpg)
Confidence Based Autonomy: Policy Learning by Demonstration
Manuela M. Veloso
Thanks to Sonia Chernova
Computer Science Department
Carnegie Mellon University
Grad AI – Spring 2013
![Page 2: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/2.jpg)
Task Representation
• Robot state (sensor data): $s = (f_1, \ldots, f_n)$
• Robot actions: $A = \{a_1, \ldots, a_k\}$
• Training dataset: $D = \{(s_i, a_i) : a_i \in A,\ i = 1, \ldots, n\}$
• Policy as classifier $C : s \to (a_p, db, c_{db})$ (e.g., Gaussian Mixture Model, Support Vector Machine)
  – policy action $a_p$
  – decision boundary $db$ with greatest confidence for the query
  – classification confidence $c_{db}$ w.r.t. the decision boundary

[Figure: feature space with axes $f_1$ and $f_2$; labels: $s$, $db$, $c_{db}$, $a_p$]
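The classifier interface above can be sketched as follows. This is a minimal illustration, not the classifier used in the slides: a toy nearest-centroid model stands in for the Gaussian Mixture Model or Support Vector Machine, and the names `train_centroids` and `classify`, the representation of a decision boundary as a pair of actions, and the margin-based confidence are all assumptions made for the example.

```python
import math
from collections import defaultdict


def train_centroids(D):
    """Average the states demonstrated for each action.

    D is a list of (state, action) pairs; states are equal-length tuples of floats.
    """
    sums, counts = {}, defaultdict(int)
    for s, a in D:
        if a not in sums:
            sums[a] = [0.0] * len(s)
        sums[a] = [x + y for x, y in zip(sums[a], s)]
        counts[a] += 1
    return {a: tuple(x / counts[a] for x in sums[a]) for a in sums}


def classify(centroids, s):
    """Return (a_p, db, c_db) for the state query s.

    Assumption: the decision boundary db is named by the two closest actions,
    and c_db is a normalized margin in [0, 1] (0 means s lies on the boundary).
    """
    ranked = sorted(centroids, key=lambda a: math.dist(s, centroids[a]))
    a_p = ranked[0]
    if len(ranked) == 1:
        return a_p, (a_p,), 1.0
    d1 = math.dist(s, centroids[ranked[0]])
    d2 = math.dist(s, centroids[ranked[1]])
    db = tuple(sorted((ranked[0], ranked[1])))
    c_db = (d2 - d1) / (d1 + d2) if d1 + d2 > 0 else 0.0
    return a_p, db, c_db


# Toy dataset with two features per state.
D = [((0.0, 0.0), "left"), ((0.0, 2.0), "left"),
     ((4.0, 0.0), "right"), ((4.0, 2.0), "right")]
centroids = train_centroids(D)
a_p, db, c_db = classify(centroids, (1.0, 1.0))
```

Any classifier that can report which boundary a query is closest to, and how confident it is with respect to that boundary, could replace the toy model here.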
![Page 3: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/3.jpg)
Confidence-Based Autonomy Assumptions
• Teacher understands and can demonstrate the task
• High-level task learning
  – Discrete actions
  – Non-negligible action duration
• State space contains all information necessary to learn the task policy
• Robot is able to stop to request demonstration
  – … however, the environment may continue to change
![Page 4: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/4.jpg)
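Confident Execution

[Flowchart: over time the robot passes through states $s_1, s_2, s_3, s_4, \ldots, s_i, \ldots, s_t$. For the current state $s_i$, the policy returns $(a_p, db, c_{db})$ and the robot decides: request demonstration?
  – No: execute action $a_p$
  – Yes: request demonstration $a_d$, add training point $(s_i, a_d)$, relearn classifier, execute action $a_d$]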
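The Confident Execution loop can be sketched roughly as below. The function and parameter names (`confident_execution`, `learn`, `needs_demo`, `teacher`, `execute`) are hypothetical, and the toy pieces used to drive it (a one-dimensional 1-nearest-neighbor policy and a fixed confidence threshold of 0.5) are assumptions chosen only to make the example runnable.

```python
def confident_execution(states, D, learn, needs_demo, teacher, execute):
    """Run the loop over a sequence of states, growing the dataset D in place.

    learn(D) returns a policy; policy(s) returns (a_p, db, c_db);
    needs_demo(s, db, c_db, D) decides whether to request a demonstration.
    """
    policy = learn(D) if D else None
    n_requests = 0
    for s in states:
        ask = policy is None  # with no data yet, always ask the teacher
        if not ask:
            a_p, db, c_db = policy(s)
            ask = needs_demo(s, db, c_db, D)
        if ask:
            a_d = teacher(s)      # request demonstration
            D.append((s, a_d))    # add training point (s_i, a_d)
            policy = learn(D)     # relearn classifier
            execute(a_d)          # execute the demonstrated action
            n_requests += 1
        else:
            execute(a_p)          # execute the policy action autonomously
    return policy, n_requests


def learn_1nn(D):
    """Toy policy: copy the action of the nearest demonstrated state."""
    def policy(s):
        s_near, a_near = min(D, key=lambda p: abs(p[0] - s))
        return a_near, None, 1.0 / (1.0 + abs(s_near - s))
    return policy


executed = []
D = []
policy, n_requests = confident_execution(
    states=[-2.0, 2.0, -1.9, 1.8, 0.1],
    D=D,
    learn=learn_1nn,
    needs_demo=lambda s, db, c_db, D: c_db < 0.5,  # fixed threshold, for illustration
    teacher=lambda s: "left" if s < 0 else "right",
    execute=executed.append,
)
```

In this run the robot asks for help on the first two states and on the ambiguous state near zero, and acts autonomously on the two states that lie close to earlier demonstrations.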
![Page 5: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/5.jpg)
Demonstration Selection
• When should the robot request a demonstration?
  – To obtain useful training data
  – To restrict autonomy in areas of uncertainty
![Page 6: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/6.jpg)
Fixed Confidence Threshold
• Why not apply a fixed classification confidence threshold?
  – Example: $\tau_{\mathrm{conf}} = 0.5$
  – Simple
  – How to select a good threshold value?

[Figure: example state queries $s$]
![Page 7: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/7.jpg)
Confident Execution Demonstration Selection
• Distance parameter $\tau_{\mathrm{dist}}$
  – Used to identify outliers and unexplored regions of state space
• Set of confidence parameters $\tau_{\mathrm{conf}}$
  – Used to identify ambiguous state regions in which more than one action is applicable
![Page 8: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/8.jpg)
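Confident Execution Distance Parameter
• Distance parameter $\tau_{\mathrm{dist}}$

Given $D = \{(s_i, a_i) : a_i \in A,\ i = 1, \ldots, n\}$,

$$\tau_{\mathrm{dist}} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{NND}(p_i, D)$$

where

$$\mathrm{NND}(p, D) = \min_{1 \le j \le n} \mathrm{dist}(\hat{p}, \hat{s}_j)$$

Given state query $s$, request demonstration if $\mathrm{NND}(s, D) > \tau_{\mathrm{dist}}$

[Figure: state query $s$ with its nearest-neighbor distance $\mathrm{NND}(s, D)$ compared against $\tau_{\mathrm{dist}}$]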
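A minimal sketch of the distance test, under stated assumptions: Euclidean distance is used for $\mathrm{dist}$, the hats on $\hat{p}$ and $\hat{s}_j$ are ignored (states are used as given), and each training point is excluded from its own nearest-neighbor search so that its distance is not trivially zero. The slide does not spell out any of these choices, and the function names are hypothetical.

```python
import math


def nnd(p, states):
    """Nearest-neighbor distance from point p to a non-empty list of states."""
    return min(math.dist(p, s) for s in states)


def distance_threshold(D):
    """Average nearest-neighbor distance over the training states in D.

    Assumption: leave-one-out, so a point is never its own nearest neighbor.
    Requires at least two training points.
    """
    states = [s for s, _ in D]
    total = 0.0
    for i, p in enumerate(states):
        others = states[:i] + states[i + 1:]
        total += nnd(p, others)
    return total / len(states)


def request_by_distance(s, D, tau_dist):
    """True if the query s is farther from the data than tau_dist."""
    return nnd(s, [p for p, _ in D]) > tau_dist


D = [((0.0, 0.0), "stay"), ((1.0, 0.0), "stay"), ((3.0, 0.0), "merge")]
tau_dist = distance_threshold(D)                        # (1 + 1 + 2) / 3
near = request_by_distance((1.5, 0.0), D, tau_dist)     # inside explored region
far = request_by_distance((10.0, 0.0), D, tau_dist)     # unexplored region
```

A query that lands in a sparsely demonstrated part of the state space triggers a request even when the classifier itself reports high confidence.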
![Page 9: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/9.jpg)
Confident Execution Confidence Parameters
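• Set of confidence parameters $\tau_{\mathrm{conf}}$
  – One for each decision boundary $db$

Given $D = \{(s_i, a_i) : a_i \in A,\ i = 1, \ldots, n\}$ and classifier $C : s \to (a_p, db, c_{db})$,

$$\tau_{\mathrm{conf}_{db}} = \frac{1}{|M_{db}|} \sum_{i=1}^{|M_{db}|} \mathrm{conf}_{db}(s_i)$$

where

$$M_{db} = \{(s_i, a_i, a_{p_i}, \mathrm{conf}_{db}(s_i)) : a_{p_i} \neq a_i\}$$

Given state query $s$, request demonstration if $\mathrm{conf}_{db}(s) < \tau_{\mathrm{conf}_{db}}$

[Figure: state query $s$ near decision boundaries $db$]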
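The per-boundary thresholds can be sketched as the average confidence of the misclassified training points on each boundary. The names below are hypothetical, the one-dimensional classifier is a toy, and treating a boundary with no misclassified points as having threshold 0 (never request) is an assumption; the slide does not say what happens in that case.

```python
from collections import defaultdict


def confidence_thresholds(D, classify):
    """Map each decision boundary to the mean confidence of its misclassified points.

    classify(s) returns (a_p, db, conf).
    """
    M = defaultdict(list)
    for s_i, a_i in D:
        a_p, db, conf = classify(s_i)
        if a_p != a_i:  # keep only points the classifier gets wrong
            M[db].append(conf)
    return {db: sum(confs) / len(confs) for db, confs in M.items()}


def request_by_confidence(s, classify, tau_conf, default=0.0):
    """True if the confidence for s falls below its boundary's threshold."""
    _, db, conf = classify(s)
    return conf < tau_conf.get(db, default)


def toy_classify(s):
    """Toy 1-D classifier with a single boundary at s = 0."""
    a_p = "left" if s < 0 else "right"
    return a_p, "left|right", min(1.0, abs(s))


# Two of the four labels disagree with the toy classifier.
D = [(-2.0, "left"), (-0.2, "right"), (0.4, "left"), (3.0, "right")]
tau_conf = confidence_thresholds(D, toy_classify)
ambiguous = request_by_confidence(0.1, toy_classify, tau_conf)
clear = request_by_confidence(2.0, toy_classify, tau_conf)
```

Because the threshold is derived from where the classifier actually makes mistakes, boundaries in noisy regions end up with higher thresholds than clean ones.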
![Page 10: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/10.jpg)
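Confident Execution

[Flowchart: for the current state $s_i$, the policy returns $(a_p, db, c_{db})$. Request demonstration if $\mathrm{conf}_{db}(s_i) < \tau_{\mathrm{conf}_{db}}$ or $\mathrm{NND}(s_i, D) > \tau_{\mathrm{dist}}$.
  – No: execute action $a_p$
  – Yes: request demonstration $a_d$, add training point $(s_i, a_d)$, relearn classifier, execute action $a_d$]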
![Page 11: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/11.jpg)
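Confidence-Based Autonomy

[Flowchart with two components:
  – Confident Execution: for the current state $s_i$, the policy returns $(a_p, db, c_{db})$. If no demonstration is requested, execute action $a_p$; otherwise request demonstration $a_d$, add training point $(s_i, a_d)$, relearn classifier, execute action $a_d$
  – Corrective Demonstration: the teacher provides a corrective action $a_c$; add training point $(s_i, a_c)$ and relearn classifier]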
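The Corrective Demonstration step can be sketched as a small update applied after the robot acts on its own. Everything named here is hypothetical: `apply_correction`, the `teacher_correction` callback that returns `None` when the teacher has no objection, and the lookup-table learner used to make the example runnable.

```python
def apply_correction(s_i, a_p, D, policy, learn, teacher_correction):
    """Let the teacher override an autonomously chosen action.

    Returns (policy, corrected). D is updated in place when a correction is given.
    """
    a_c = teacher_correction(s_i, a_p)
    if a_c is None:
        return policy, False   # teacher accepts the robot's action
    D.append((s_i, a_c))       # add training point (s_i, a_c)
    return learn(D), True      # relearn classifier


def learn_lookup(D):
    """Toy learner: remember the latest action shown for each state."""
    table = dict(D)
    return lambda s: table.get(s, "stay in lane")


def teacher_correction(s, a):
    """Toy teacher: objects only to staying behind a car."""
    if s == "car ahead" and a == "stay in lane":
        return "merge left"
    return None


D = []
policy = learn_lookup(D)
first_action = policy("car ahead")
policy, corrected = apply_correction(
    "car ahead", first_action, D, policy, learn_lookup, teacher_correction)
second_action = policy("car ahead")
policy, corrected_again = apply_correction(
    "car ahead", second_action, D, policy, learn_lookup, teacher_correction)
```

Corrections reach mistakes that the robot makes with high confidence, which the request thresholds alone would never flag.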
![Page 12: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/12.jpg)
Evaluation in Driving Domain
Introduced by Abbeel and Ng, 2004

Task: Teach the agent to drive on the highway
  – Fixed driving speed
  – Pass slower cars and avoid collisions

State:
  – current lane
  – nearest car lane 1
  – nearest car lane 2
  – nearest car lane 3

Actions:
  – merge left
  – merge right
  – stay in lane
![Page 13: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/13.jpg)
Evaluation in Driving Domain
| Demonstration Selection Method | # Demonstrations | Collision Timesteps |
|---|---|---|
| “Teacher knows best” | 1300 | 2.7% |
| Confident Execution, fixed conf | 1016 | 3.8% |
| Confident Execution, dist & mult. conf | 504 | 1.9% |
| CBA | 703 | 0% |
CBA Final Policy
![Page 14: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/14.jpg)
Demonstrations Over Time
[Figure: demonstrations over time, with series for Total Demonstrations, Confident Execution, and Corrective Demonstration]
![Page 15: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/15.jpg)
![Page 16: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/16.jpg)
Summary
Confidence-Based Autonomy algorithm
  – Confident Execution demonstration selection
  – Corrective Demonstration
![Page 17: Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University](https://reader036.vdocument.in/reader036/viewer/2022062713/56649f4d5503460f94c6e234/html5/thumbnails/17.jpg)
What did we do today?
• (PO)MDPs: need to generate a good policy
  – Assumes the agent has some method for estimating its state (given current belief state and action, observation, where do I think I am now?)
  – How do we estimate this?
• Discrete latent states → HMMs (simplest DBNs)
• Continuous latent states, observed states drawn from Gaussian, linear dynamical system → Kalman filters
  – (Assumptions relaxed by Extended Kalman Filter, etc.)
• Not analytic → particle filters
  – Take weighted samples (“particles”) of an underlying distribution
• We’ve mainly looked at policies for discrete state spaces
• For continuous state spaces, can use LfD:
  – ML gives us a good-guess action based on past actions
  – If we’re not confident enough, ask for help!