Post on 17-Jan-2016
1
Joseph Xu
Soar Workshop 2012
Learning Modal Continuous Models
2
Setting: Continuous Environment
• Input to the agent is a set of objects with continuous properties
– Position, rotation, scaling, ...
• Output is a fixed-length vector of continuous numbers
• Agent runs in lock-step with environment
• Fully observable
[Figure: agent-environment loop. The environment sends the agent an input describing objects A and B, each with position (px, py, pz) and rotation (rx, ry, rz) values; the agent returns an output vector, e.g. (-9.0, 5.8).]
3
Levels of Problem Solving
[Table: levels of problem solving, ranging from slower task completion with specific solutions to faster task completion with general solutions.
Problem Solving Method: Motor Babbling; Continuous Sampling Methods (RRT); Symbolic Model Free Methods (RL); Symbolic Planning.
Knowledge Required: None; Goal Recognition; Continuous Model; Symbolic Abstraction; Symbolic Model.]
4
Continuous Model Learning
• Learn a function f(x, u) = y
• x: current continuous state vector
• u: current output vector
• y: state vector in the next time step
[Figure: the continuous state X and the output U map to the next state Y.]
5
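Locally Weighted Regression
[Figure: for a query (x, u) with motor command left voltage: -0.6, right voltage: 1.2 and an unknown next state, the k nearest neighbors are fit with a weighted linear regression.]
$f(x, u) = \sum_i w_i x_i + \sum_j w_j u_j$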
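The locally weighted regression step can be sketched as follows. This is a minimal illustration and not the implementation used in the talk: the function name `lwr_predict`, the Gaussian kernel, the `bandwidth` parameter, and the intercept column are all assumptions added for the example.

```python
import numpy as np

def lwr_predict(X, U, Y, x_query, u_query, k=5, bandwidth=1.0):
    """Predict the next state y for a query (x, u) from its k nearest neighbors.

    X: (N, dx) states, U: (N, du) outputs, Y: (N, dy) next states.
    """
    Z = np.hstack([X, U])                     # each training row is [x, u]
    z = np.concatenate([x_query, u_query])
    dist = np.linalg.norm(Z - z, axis=1)      # Euclidean distance to the query
    idx = np.argsort(dist)[:k]                # indices of the k nearest neighbors
    # Assumed kernel: Gaussian weights, so closer neighbors count more.
    w = np.exp(-0.5 * (dist[idx] / bandwidth) ** 2)
    A = np.hstack([Z[idx], np.ones((len(idx), 1))])   # assumed intercept column
    sw = np.sqrt(w)[:, None]
    # Weighted least squares: minimize sum_n w_n * ||A_n @ coef - Y_n||^2.
    coef, *_ = np.linalg.lstsq(A * sw, Y[idx] * sw, rcond=None)
    return np.append(z, 1.0) @ coef
```

Because the neighbors are chosen by Euclidean distance alone, the fit mixes examples regardless of how the objects were interacting in each one.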
Problems with LWR
• Euclidean distance doesn’t capture relational similarity
• Averages over neighbors exhibiting different types of interactions
6
[Figure: a query point and four neighbors.]
Problems with LWR
7
[Figure: a query, two neighbors, and the resulting prediction.]
Modal Models
• Object behavior can be categorized into different modes
– Behavior within a single mode is usually simple and smooth (inertia, gravity, etc.)
– Behavior across modes can be discontinuous and complex (collisions, drops)
– Modes can often be distinguished by discrete spatial relationships between objects
• Learn two-level models composed of:
– A classifier that determines the active mode using spatial relationships
– A set of linear functions (initial hypothesis), one for each mode
8
[Figure: a scene is passed to the mode classifier, which selects among the Mode 1, Mode 2, and Mode 3 models to produce a prediction.]
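The two-level structure (a mode classifier in front of per-mode linear functions) might look roughly like this. The class names `LinearModel` and `ModalModel`, the relation dictionary, and the example modes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class LinearModel:
    weights: List[float]
    bias: float = 0.0

    def predict(self, features: Sequence[float]) -> float:
        # Linear function of the continuous features.
        return sum(w * f for w, f in zip(self.weights, features)) + self.bias

@dataclass
class ModalModel:
    # Level 1: maps discrete spatial relations to a mode id.
    classify: Callable[[Dict[str, int]], int]
    # Level 2: one linear function per mode.
    modes: Dict[int, LinearModel]

    def predict(self, relations: Dict[str, int], features: Sequence[float]) -> float:
        mode = self.classify(relations)
        return self.modes[mode].predict(features)

# Hypothetical example: predict a pushed block's next x from [pushed_x, dx].
model = ModalModel(
    classify=lambda rel: 2 if rel["touch"] else 1,
    modes={
        1: LinearModel([1.0, 0.0]),   # not touching: block stays where it is
        2: LinearModel([1.0, 1.0]),   # touching: block moves with the command
    },
)
```

Keeping the classifier and the per-mode functions separate means a discontinuity between modes never has to be represented by any single linear function.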
Unsupervised Learning of Modes From Data
9
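[Figure: the environment switches between Mode 1 and Mode 2 over time. Each training example pairs continuous features (e.g. 0.5, 1.1, -0.2, 4, 17) with a target y (e.g. 21.9). Expectation Maximization recovers Learned Mode 1 and Learned Mode 2 from this data.]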
10
Expectation Maximization
• Expectation: Assuming your current model parameters are correct, what is the likelihood that model m generated data point i?
• Maximization: Assuming each data point was generated by the most probable model, modify each model’s parameters to maximize the likelihood of generating the data
• Iterate until convergence to local maximum
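The loop described above can be sketched as a hard-assignment EM over linear models. This is an illustrative sketch under stated assumptions, not the talk's implementation: the name `fit_modes`, the random initial assignment, and the equal-variance Gaussian noise assumption (which makes "most probable model" equal to "smallest squared residual") are all choices made for the example.

```python
import numpy as np

def fit_modes(X, y, n_modes=2, n_iters=50, seed=0):
    """Fit n_modes linear models to (X, y) by alternating assignment and refit."""
    rng = np.random.default_rng(seed)
    A = np.hstack([X, np.ones((len(X), 1))])         # features plus intercept
    assign = rng.integers(n_modes, size=len(X))      # assumed random start
    coefs = np.zeros((n_modes, A.shape[1]))
    for _ in range(n_iters):
        # Maximization: refit each mode on the points currently assigned to it.
        for m in range(n_modes):
            rows = assign == m
            if rows.sum() >= A.shape[1]:             # skip under-determined modes
                coefs[m], *_ = np.linalg.lstsq(A[rows], y[rows], rcond=None)
        # Expectation (hard): under equal-variance Gaussian noise, the most
        # likely mode for a point is the one with the smallest squared residual.
        resid = (A @ coefs.T - y[:, None]) ** 2
        new_assign = resid.argmin(axis=1)
        if np.array_equal(new_assign, assign):       # converged to a local optimum
            break
        assign = new_assign
    return coefs, assign
```

As the slide notes, this only converges to a local maximum, so the recovered modes depend on the starting assignment.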
Learning Classifier
11
[Figure: at each time step the scene is encoded as binary spatial relations between A and B, e.g. left-of(A,B) = 1, right-of(A,B) = 0, on-top(A,B) = 0, touch(A,B) = 0. These attribute vectors, paired with the mode labels (class) produced by Expectation Maximization (Learned Mode 1, Learned Mode 2), form the classifier training data.]
Learning Classifier
12
[Figure: the classifier training data (attributes and class) is used to learn a decision tree that tests touch(A, B) and left-of(A, B), with leaves labeled mode 1, mode 2, and mode 2.]
Use a linear model for items in the same mode
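Learning a mode classifier from binary spatial relations can be illustrated with a small entropy-based decision tree. The slides do not say which tree learner was used, so the ID3-style splitting here is an assumption, and the function names `build_tree` and `classify` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """rows: list of {relation: 0 or 1}; labels: mode ids from the EM step."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority                              # leaf: a mode id

    def split_entropy(attr):
        total = 0.0
        for value in (0, 1):
            sub = [l for r, l in zip(rows, labels) if r[attr] == value]
            if sub:
                total += len(sub) / len(labels) * entropy(sub)
        return total

    best = min(attrs, key=split_entropy)             # lowest remaining entropy
    rest = [a for a in attrs if a != best]
    children = {}
    for value in (0, 1):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        if pairs:
            sub_rows, sub_labels = zip(*pairs)
            children[value] = build_tree(list(sub_rows), list(sub_labels), rest)
        else:
            children[value] = majority               # no data: fall back to majority
    return (best, children)

def classify(tree, row):
    # Internal nodes are (attribute, children) tuples; leaves are mode ids.
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree
```

At prediction time the leaf reached by `classify` selects which mode's linear function to apply.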
13
Prediction Accuracy Experiment
• 2 Block Environment
– Agent has two outputs (dx, dy) which control the x and y offsets of the controlled block at every time step
– The pushed block can’t be moved except by pushing it with the controlled block
– Blocks are always axis-aligned and there’s no momentum
• Training
– Instantiate Soar agent in a variety of spatial configurations
– Run 10 time steps, each step is a training example
• Testing
– Instantiate Soar agent in some configuration
– Check accuracy of prediction for next time step
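To make the training procedure concrete, the sketch below collects one (x, u, y) example per time step from a rollout. The environment is a deliberately simplified, hypothetical 1-D stand-in for the 2-block task (unit-width blocks, a single dx output, pushing only from the left); the actual experiment used two outputs (dx, dy), and the names `step` and `collect_examples` are assumptions.

```python
def step(state, dx):
    """Hypothetical 1-D simplification: state = (controlled_x, pushed_x)."""
    controlled, pushed = state
    new_controlled = controlled + dx
    # The pushed block moves only when the unit-width controlled block,
    # approaching from the left, would otherwise overlap it. No momentum.
    if controlled < pushed and new_controlled + 1.0 > pushed:
        pushed = new_controlled + 1.0
    return (new_controlled, pushed)

def collect_examples(start, commands):
    """Run one scenario; each time step yields a training example (x, u, y)."""
    examples = []
    state = start
    for dx in commands:
        nxt = step(state, dx)
        examples.append((state, dx, nxt))
        state = nxt
    return examples

# One training scenario of 10 time steps, as in the experiment description.
examples = collect_examples((0.0, 3.0), [1.0] * 10)
```

Even in this toy version the data contains two regimes (pushed block stationary versus pushed block moving), which is the kind of structure the modal model is meant to separate.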
14
Prediction Accuracy – Pushed Block
[Figure: average error (log scale, 1E-8 to 1E+4) versus number of training scenarios (10 to 80), with series MM x, MM y, SM x, and SM y.]
15
Classification Performance
[Figure: number of classification errors (0 to 9) versus number of training scenarios (0 to 90), with series X errors and Y errors.]
16
Prediction Performance Without Classification Errors
[Figure: average error (log scale, 1E-08 to 1E+04) versus number of training scenarios (0 to 90), with series Best X, Best Y, Real X, and Real Y.]
18
Symbolic Abstraction
• Lump continuous states sharing symbolic properties into a single symbolic state
• Should be Predictable
– Planning requires accurate model (ex. STRIPS operators)
– Tends to require more states, more symbolic properties
• Should be General
– Fast planning and transferable solutions
– Tends to require fewer states, fewer symbolic properties
[Figure: continuous configurations of C1 and C2 grouped into two symbolic states. S1: intersect(C1, C2). S2: ~intersect(C1, C2).]
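Lumping continuous states into symbolic states can be sketched as a function from object geometry to the set of predicates that hold. The `Box` type, the axis-aligned overlap test, and the function `abstract_state` are assumptions for illustration; only the `intersect` predicate name comes from the slide.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

class Box(NamedTuple):
    x: float   # lower-left corner
    y: float
    w: float   # width
    h: float   # height

def intersect(a: Box, b: Box) -> bool:
    # Assumed definition: axis-aligned rectangles with overlapping interiors.
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)

def abstract_state(scene: Dict[str, Box]) -> FrozenSet[Tuple[str, str, str]]:
    """Map a continuous scene to the set of intersect predicates that are true."""
    names = sorted(scene)
    return frozenset(
        ("intersect", p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if intersect(scene[p], scene[q])
    )
```

Two scenes with different coordinates map to the same symbolic state whenever the same predicates hold, which is what lets a plan found in one configuration transfer to another.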
19
Symbolic Abstraction
• Hypothesis: a contiguous region of continuous space that shares a single behavioral mode is a good abstract state
– Planning within modes is simple because of linear behavior
– Combinatorial search occurs at the symbolic level
• Spatial predicates used in the continuous model decision tree are a reasonable approximation
20
Abstraction Experiment
• 3 blocks, goal is to push c2 to t
• Demonstrate a solution trace to agent
• Agent stores sequence of abstract states in solution in epmem
• Agent tries to follow plan in analogous task
• Abstraction should include predicates about c1, c2, t, avoid predicates about d1, d2, d3
[Figure: two task layouts containing blocks C1 and C2, target t, and objects d1, d2, d3.]
21
Generalization Performance
[Figure: bar chart of number of tasks solved (axis 0 to 30, 80 tasks total) by abstraction type. Abstraction types: Learned, 10 Rnd, 40 Rnd, 80 Rnd, All. Bar values as extracted: 28.1, 1.7, 7, 10.1, 10.3. Annotation: (16 average).]
22
Conclusions
• For continuous environments with interacting objects, modal models are more general and accurate than a uniform model
• The relationships that distinguish between modes serve as a useful symbolic abstraction over continuous state
• All this work takes Soar toward being able to autonomously learn and improve behavior in continuous environments
23
Evaluation
Coal
• Scaling issues: linear regression is exponential in the number of objects
• Linear modes are insufficient for more complex physics such as bouncing -> catastrophic failure
Nuggets
• Modal model learning is more accurate and general than uniform models
• Abstraction learning results are promising, but preliminary