Post on 05-Feb-2016
Learning Behavioral Parameterization
Using Spatio-Temporal Case-Based Reasoning
Maxim Likhachev, Michael Kaess, and Ronald C. Arkin
Mobile Robot Laboratory
Georgia Tech
This research was funded under the DARPA MARS program.
Motivation
• Constant parameterization of robotic behavior results in inefficient robot performance
• Manual selection of the “right” parameters is difficult and tedious work
Motivation (cont’d)
• Use of Case-Based Reasoning (CBR) methodology
– an automatic selection of optimal parameters at run-time (ICRA’01)
– each case is a set of behavioral parameters indexed by environmental features
[Figure: two example environments, labeled “front-obstructed” case and “clear-to-goal” case]
Motivation for the Current Research
• The CBR module
– improves robot performance (in simulations and on real robots)
– avoids the manual configuration of behavioral parameters
• The CBR module still required the creation of a case library, which
– is dependent on a robot architecture
– needs extensive experimentation to optimize cases
– requires a good understanding of how CBR works
• Solution: extend the CBR module to learn
– new cases from scratch or optimize existing cases
– in a separate training process or during missions
Related Work
• Use of Case-Based Reasoning in the selection of behavioral parameters
– ACBARR [Georgia Tech ’92], SINS [Georgia Tech ’93]
– KINS [Chagas and Hallam]
• Automatic optimization of behavioral parameters
– genetic programming (e.g., GA-ROBOT [Ram, et al.])
– reinforcement learning (e.g., Learning Momentum [Lee, et al.])
Behavioral Control and CBR Module
CBR Module controls (case output parameters):
• Weights for each behavior
• BiasMove Vector
• Noise Persistence
• Obstacle Sphere
Case Indices: Environmental Features
Spatial features: traversability vector
• split environment into K = 4 angular regions
• compute obstacle density within each region
• transform the density into traversability
Temporal features:
• Short-term velocity towards the goal
• Long-term velocity towards the goal
[Figure: two example environments annotated with their feature vectors]
Example 1: Vspatial: f0=0.92, f1=0.58, f2=1.00, f3=0.68; Vtemporal: ShortTerm Rs=1.0, LongTerm Rl=0.7
Example 2: Vspatial: f0=0.02, f1=0.22, f2=0.63, f3=0.02; Vtemporal: ShortTerm Rs=0.01, LongTerm Rl=1.0
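The three spatial-feature steps (split into K angular regions, compute obstacle density, transform into traversability) might be sketched as follows. This is only an illustration: the obstacle format, the sensing range `max_range`, the placement of region 0, and the transform `traversability = 1 - density` are all assumptions, since the slides do not give the actual formulas.

```python
import math

def traversability_vector(obstacles, k=4, max_range=5.0):
    """Return k traversability values in [0, 1], one per angular region.

    obstacles: list of (x, y, radius) tuples in the robot frame
    (a hypothetical input format chosen for this sketch).
    """
    region_width = 2.0 * math.pi / k
    region_area = math.pi * max_range ** 2 / k  # area of one angular sector
    covered = [0.0] * k
    for x, y, radius in obstacles:
        if math.hypot(x, y) > max_range:
            continue  # obstacle center lies outside the sensed disc
        angle = math.atan2(y, x) % (2.0 * math.pi)
        # Clamp guards against angle rounding up to exactly 2*pi.
        region = min(int(angle / region_width), k - 1)
        covered[region] += math.pi * radius ** 2
    # Assumed transform: traversability = 1 - density, clamped at 0.
    return [max(0.0, 1.0 - c / region_area) for c in covered]
```

A region with no obstacles gets traversability 1.0, and the value drops towards 0 as the region fills up, which matches the direction of the example vectors above.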
Overview of non-learning CBR Module
[Figure: block diagram of the non-learning CBR module; the data flow is:]
current environment → Feature Identification → spatial & temporal feature vectors
Case Library → all the cases in the library
→ Spatial Features Vector Matching (1st stage of Case Selection) → set of spatially matching cases
→ Temporal Features Vector Matching (2nd stage of Case Selection) → set of spatially and temporally matching cases
→ Random Selection Process (3rd stage of Case Selection) → best matching case
→ Case switching Decision tree → best matching or currently used case
→ Case Adaptation → case ready for application
→ Case Application → case output parameters (behavioral assemblage parameters)
Making the CBR Module Learn
[Figure: block diagram of the learning CBR module; the data flow is:]
current environment → Feature Identification → spatial & temporal feature vectors
Case Library → all the cases in the library
→ Spatial Features Vector Matching (1st stage of Case Selection) → set of spatially matching cases
→ Temporal Features Vector Matching (2nd stage of Case Selection) → set of spatially and temporally matching cases
→ Random Selection Biased by Case Success and Spatial and Temporal Similarities → best matching case
→ Case switching Decision tree → best matching or currently used case
→ Old Case Performance Evaluation (last K cases → last K cases with adjusted performance history)
→ New Case Creation (if necessary) → new or existing best matching case
→ Case Adaptation → case ready for application
→ Case Application → case output parameters (behavioral assemblage parameters)
Extensive Exploration of Cases: Modified Case Selection Process
• Random selection of cases with the probability of the selection proportional to:
– spatial similarity with the environment (1st step)
– temporal similarity with the environment (2nd step)
– weighted sum of the case's past performance and spatial and temporal similarities (3rd step)
[Figure: three plots of P(selection), each ranging from 0.0 to 1.0]
1st step: P(selection) vs. spatial similarity for cases C1 to C5 → set of spatially matching cases: {C1, C2, C4}
2nd step: P(selection) vs. temporal similarity for C1, C2, C4 → set of spatially & temporally matching cases: {C1, C4}
3rd step: P(selection) vs. weighted sum of spatial and temporal similarities and case success for C1, C4 → best matching case: C1
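A minimal sketch of the three-step randomized selection described on this slide is given below. The `Case` fields, the rule "keep a case with probability equal to its similarity divided by the best similarity in the set", and the weights `w_spatial`, `w_temporal`, `w_success` are assumptions made for illustration; the slides only state that selection probability is proportional to the listed quantities.

```python
import random
from dataclasses import dataclass

@dataclass
class Case:
    name: str
    spatial_sim: float   # similarity to the current environment, in [0, 1]
    temporal_sim: float  # similarity to the current environment, in [0, 1]
    success: float       # case_success, assumed normalized to [0, 1]

def keep_randomly(cases, score, rng):
    """Keep each case with probability score/best (assumed rule).

    The best-scoring case is always kept, so the result is never empty.
    """
    best = max(score(c) for c in cases)
    if best <= 0.0:
        return list(cases)
    return [c for c in cases if rng.random() < score(c) / best]

def select_case(library, rng, w_spatial=0.3, w_temporal=0.3, w_success=0.4):
    # 1st step: random selection biased by spatial similarity.
    spatial_set = keep_randomly(library, lambda c: c.spatial_sim, rng)
    # 2nd step: random selection biased by temporal similarity.
    both_set = keep_randomly(spatial_set, lambda c: c.temporal_sim, rng)
    # 3rd step: one case drawn with probability proportional to the
    # weighted sum of similarities and past performance.
    weights = [w_spatial * c.spatial_sim + w_temporal * c.temporal_sim
               + w_success * c.success for c in both_set]
    if sum(weights) <= 0.0:
        return rng.choice(both_set)
    return rng.choices(both_set, weights=weights, k=1)[0]
```

Because every step is random, a case that is not the single best match still has a chance of being applied, which is what allows the library to be explored rather than always exploiting the current favorite.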
Positive and Negative Reinforcement: Case Performance Evaluation
• Criterion for the evaluation of case performance: the average velocity with which the robot approaches its goal during the application of the case
– opportunities for intermediate case performance evaluations
– may not always be the right criterion
  • such cases exhibit no positive velocity towards the goal
  • the evaluation of the performance is delayed by K (=2) cases
– case_success (represents case performance) is:
  • increased if the average velocity is increased or sustained high
  • decreased otherwise
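The reinforcement rule and the K-case delay could be sketched as below. The step size, the "sustained high" threshold, the [0, 1] range of case_success, and the exact reading of "delayed by K cases" (a case is evaluated once K further cases have started) are assumptions of this sketch, not values from the slides.

```python
from collections import deque

def updated_success(success, prev_velocity, velocity, step=0.1, high=0.8):
    """Raise case_success if velocity increased or stayed high, else lower it."""
    improved = velocity > prev_velocity
    sustained_high = velocity >= high
    delta = step if (improved or sustained_high) else -step
    return min(1.0, max(0.0, success + delta))

class DelayedEvaluator:
    """Evaluates a case only after k further cases have been applied."""

    def __init__(self, k=2):
        self.k = k
        self.queue = deque()  # (case name, average velocity when it started)

    def on_case_switch(self, new_case, avg_velocity, success):
        """Record the switch and evaluate the case applied k switches ago.

        avg_velocity: average velocity towards the goal measured so far.
        success: dict mapping case name to case_success (updated in place).
        """
        self.queue.append((new_case, avg_velocity))
        if len(self.queue) > self.k:
            old_case, velocity_before = self.queue.popleft()
            success[old_case] = updated_success(
                success[old_case], velocity_before, avg_velocity)
```

The delay gives a case credit for progress that only shows up later, for example when it first steers the robot around an obstacle and the velocity towards the goal rises afterwards.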
Maximization of Reinforcement: Case Adaptation
• Maximize case_success as a noisy function of case output parameters (behavioral assemblage parameters)
– maintain the adaptation vector A(C) for each case C
– if the last series of adaptations resulted in an increase of case_success, then continue the adaptation:
O(C) = O(C) + A(C)
– otherwise switch the direction of the adaptation, add a random component, and scale proportionally to case_success:
A(C) = −λ·A(C) + ν·R
O(C) = O(C) + A(C)
where O(C) is the vector of output parameters of case C, R is a random vector, and λ and ν are scaling coefficients
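One adaptation step of this rule might look as follows. This is a sketch under assumptions: the two scaling coefficients are called `lam` and `nu` here (names chosen for illustration), how they depend on case_success is not given in the slides, so they are passed in as plain numbers, and the random component is drawn uniformly from [-1, 1] per parameter.

```python
import random

def adapt_case(outputs, adaptation, success_increased,
               lam=0.5, nu=0.1, rng=random):
    """Return (new_outputs, new_adaptation) after one adaptation step.

    outputs:    O(C), the case output parameters, as a list of floats
    adaptation: A(C), the adaptation vector, same length as outputs
    """
    if not success_increased:
        # Reverse the search direction and add a random component:
        # A(C) = -lam * A(C) + nu * R, with R assumed uniform in [-1, 1].
        adaptation = [-lam * a + nu * rng.uniform(-1.0, 1.0)
                      for a in adaptation]
    # In both branches the outputs move along the (possibly new) vector:
    # O(C) = O(C) + A(C).
    outputs = [o + a for o, a in zip(outputs, adaptation)]
    return outputs, adaptation
```

With `lam` below 1 the reversed step is shorter than the one that failed, so repeated failures shrink the search around the current parameters while the random term keeps it from stalling.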
Maximization of Reinforcement: Case Adaptation (cont’d)
• Incorporate prior knowledge into the search:
– fixed adaptation of the Noise_Gain and Noise_Persistence parameters based on the short- and long-term velocities of the robot
• Constrain the search:
– limit Obstacle_Gain to be higher than the sum of the other schema gains (to avoid collisions)
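The constraint on Obstacle_Gain can be illustrated with a small check applied after each adaptation step. Enforcing it by raising Obstacle_Gain (rather than lowering the other gains), the `margin` value, and the gain name `MoveToGoal_Gain` in the usage example are assumptions of this sketch.

```python
def constrain_obstacle_gain(gains, margin=0.01):
    """Return a copy of gains with Obstacle_Gain above the sum of the others.

    gains: dict mapping schema gain names to values; must contain
    the key "Obstacle_Gain".
    """
    others = sum(value for name, value in gains.items()
                 if name != "Obstacle_Gain")
    fixed = dict(gains)
    if fixed["Obstacle_Gain"] <= others:
        # Assumed repair: lift Obstacle_Gain just above the sum.
        fixed["Obstacle_Gain"] = others + margin
    return fixed
```

For example, `constrain_obstacle_gain({"Obstacle_Gain": 1.0, "MoveToGoal_Gain": 1.0, "Noise_Gain": 0.5})` raises Obstacle_Gain above 1.5 and leaves the other gains unchanged.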
The Growth of the Case Library: Case Creation Decision
• To avoid divergence a new case is created whenever:
– case_success of the selected case is high and spatial and temporal similarities with the environment are low to moderate
– case_success of the selected case is low to moderate and spatial and temporal similarities are low
• Limit the maximum size of the library (10 in this work)
• New case is initialized with:
– the spatial and temporal features of the environment
– the output parameter values of the selected case
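The creation decision and the initialization of a new case might be sketched as follows. The numeric thresholds for "high", "moderate", and "low", and the use of the larger of the two similarities as the combined similarity, are assumptions; only the library limit of 10 comes from the slide.

```python
def should_create_case(case_success, spatial_sim, temporal_sim, library_size,
                       max_size=10, high_success=0.7,
                       low_sim=0.3, moderate_sim=0.7):
    """Decide whether to create a new case instead of reusing the selected one."""
    if library_size >= max_size:
        return False  # library is full
    # Assumed combination: both similarities must be at or below a threshold.
    similarity = max(spatial_sim, temporal_sim)
    if case_success >= high_success:
        # A successful case is protected: do not adapt it towards an
        # environment it only matches weakly; branch off a new case instead.
        return similarity <= moderate_sim
    # A weaker case is reused unless the match is poor.
    return similarity <= low_sim

def new_case(env_spatial, env_temporal, selected_outputs):
    """Initialize a case from the environment features and the selected case's outputs."""
    return {
        "spatial": list(env_spatial),
        "temporal": list(env_temporal),
        "outputs": dict(selected_outputs),
    }
```

Copying the selected case's output parameters gives the new case a reasonable starting point, and the adaptation step then tunes it for the environment that triggered its creation.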
Experimental Analysis: Example
Learning CBR: first run (starting with an empty library)
Experimental Analysis: Example
Learning CBR: a run after 54 training runs on various environments
• library of ten cases was learned
• 36 percent shorter travel distance
[Figure: robot run with two annotated regions: “A case of a ‘clear-to-goal’ strategy is learned for such environments” and “A case of a ‘squeezing’ strategy is learned for such environments”]
Experiments: Statistical Results
Simulation results (after 250 training runs for learning CBR system)
[Figure: bar charts comparing the non-adaptive, CBR, and learning CBR systems by average number of steps and by mission completion rate (0% to 100%); chart labels: Heterogeneous environment, Homogeneous environment, 15% Obstacle density, 20% Obstacle density]
Real Robot Experiments: In Progress
• RWI ATRV-Jr
• Sensors:
– SICK laser scanners in front and back
– Compass
– Gyroscope
• Experiments in progress, no statistical results yet
Conclusions
• New and existing cases are learned and optimized during a training process or as part of mission executions
• Performance:
– substantially better than that of a non-adaptive system
– comparable to that of a non-learning CBR system
• Neither manual selection of behavioral parameters nor careful creation and optimization of a case library is required from a user
• Future Work
– real robot experiments
– case “forgetting” component
– integration with other adaptation & learning methods (e.g., Learning Momentum, RL for Behavioral Assemblage Selection)