Learning Modal Continuous Models

Joseph Xu

Soar Workshop 2012

Setting: Continuous Environment

• Input to the agent is a set of objects with continuous properties
  – Position, rotation, scaling, ...

• Output is fixed-length vector of continuous numbers

• Agent runs in lock-step with environment

• Fully observable

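[Figure: Agent and Environment exchanging data. Example object properties: (px, py, rx, ry) = (0.2, 1.2, 0.0, 0.0); (px, py, pz) = (3.4, 3.9, 0.0); (rx, ry, rz) = (0.0, 0.0, 0.0). Example Output vector: (0.0, 0.2).]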

Levels of Problem Solving

[Table: columns Problem Solving Method, Knowledge Required, Characteristics.
Problem Solving Method: Motor Babbling; Continuous Sampling Methods (RRT); Symbolic Model Free Methods (RL); Symbolic Planning.
Knowledge Required: None; Goal Recognition; Continuous Model; Symbolic Abstraction; Symbolic Model.
Characteristics: ranging from Slower Task Completion, Specific Solutions to Faster Task Completion, General Solutions.]

Continuous Model Learning

• Learn a function y = f(x, u)

• x: current continuous state vector

• u: current output vector

• y: state vector in next time step

[Figure: x and u are the inputs to the learned function, y is its output.]

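Locally Weighted Regression

[Figure: a query motor command (left voltage: -0.6, right voltage: 1.2) with unknown outcome; its k nearest neighbors are passed to a weighted linear regression, which gives the prediction f(x, u) as a weighted linear combination of x and u.]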
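A minimal sketch of locally weighted regression as outlined on this slide: find the k nearest stored examples, weight them by distance, and fit a linear function to them. The Gaussian kernel, the `bandwidth` and `ridge` parameters, and the function names are assumptions made for illustration and are not taken from the talk. The sketch predicts one component of y at a time, with x and u concatenated into a single input tuple.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lwr_predict(query, inputs, targets, k=5, bandwidth=1.0, ridge=1e-9):
    """Predict one scalar target at `query` from its k nearest neighbors.

    `inputs` holds concatenated (x, u) tuples, `targets` one component of y.
    The Gaussian kernel and the small ridge term are illustrative choices.
    """
    dists = [math.dist(query, p) for p in inputs]
    nearest = sorted(range(len(inputs)), key=lambda i: dists[i])[:k]
    weights = [math.exp(-((dists[i] / bandwidth) ** 2)) for i in nearest]
    rows = [list(inputs[i]) + [1.0] for i in nearest]  # trailing 1.0 is the bias
    d = len(rows[0])
    # Weighted normal equations: (X^T W X + ridge * I) beta = X^T W y
    xtwx = [
        [
            sum(w * r[p] * r[q] for w, r in zip(weights, rows))
            + (ridge if p == q else 0.0)
            for q in range(d)
        ]
        for p in range(d)
    ]
    xtwy = [
        sum(w * r[p] * targets[i] for w, r, i in zip(weights, rows, nearest))
        for p in range(d)
    ]
    beta = solve_linear_system(xtwx, xtwy)
    return sum(coef * v for coef, v in zip(beta, list(query) + [1.0]))
```

On data that is globally linear, for example targets generated by `2*x - u + 0.5` on a grid, the local fit should recover the underlying function closely at any query point inside the grid. The weaknesses listed on the following slide appear when the neighbors come from different kinds of interaction.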

Problems with LWR

• Euclidean distance doesn’t capture relational similarity

• Averages over neighbors exhibiting different types of interactions

[Figure: several neighbors and the prediction obtained from them.]

Modal Models

• Object behavior can be categorized into different Modes
  – Behavior within a single mode is usually simple and smooth (inertia, gravity, etc.)
  – Behaviors across modes can be discontinuous and complex (collisions, drops)
  – Modes can often be distinguished by discrete spatial relationships between objects

• Learn two-level models composed of:
  – A classifier that determines the active mode using spatial relationships
  – A set of linear functions (initial hypothesis), one for each mode

[Figure: Scene -> Classifier -> Mode 1 model / Mode 2 model / Mode 3 model -> Prediction]
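The two-level structure can be sketched as a classifier that picks a mode from spatial relations, followed by a lookup of that mode's linear function. Everything concrete below is hypothetical: the mode names, the hand-written rule standing in for a learned classifier, the state layout `[pushed_x, dx]`, and the weights.

```python
from typing import Callable, Dict, Sequence, Tuple

LinearModel = Tuple[Sequence[float], float]  # (weights, bias)


def classify_mode(relations: Dict[str, bool]) -> str:
    """Hypothetical hand-written rule standing in for a learned classifier."""
    if relations["touch(A,B)"] and relations["left-of(A,B)"]:
        return "pushing"
    return "apart"


# Assumed state layout: [pushed_x, dx]. The weights are illustrative only.
MODE_MODELS: Dict[str, LinearModel] = {
    "apart": ([1.0, 0.0], 0.0),    # pushed block keeps its position
    "pushing": ([1.0, 1.0], 0.0),  # pushed block moves by dx
}


def predict_next(
    state: Sequence[float],
    relations: Dict[str, bool],
    classifier: Callable[[Dict[str, bool]], str] = classify_mode,
    models: Dict[str, LinearModel] = MODE_MODELS,
) -> float:
    """Level 1: choose the active mode. Level 2: apply its linear function."""
    weights, bias = models[classifier(relations)]
    return sum(w * s for w, s in zip(weights, state)) + bias


touching = {"touch(A,B)": True, "left-of(A,B)": True}
separate = {"touch(A,B)": False, "left-of(A,B)": True}
print(predict_next([2.0, 0.5], touching))
print(predict_next([2.0, 0.5], separate))
```

Keeping the discontinuity in the classifier means each per-mode function stays linear, which is the property the slide relies on.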

Unsupervised Learning of Modes From Data

[Figure: the Environment generates training data from Mode 1 and Mode 2 as rows of continuous features (example row: 0.5, 1.1, -0.2, 4, 17, with 21.9 in a separate column); Expectation Maximization recovers Learned Mode 1 and Learned Mode 2.]

• Expectation: Assuming your current model parameters are correct, what is the likelihood that the model m generated data point i?

• Maximization: Assuming each data point was generated by the most probable model, modify each model’s parameters to maximize likelihood of generating data

• Iterate until convergence to local maximum
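The loop above can be sketched for the simplest case, where each mode is a line y = a*x + b over a single feature. This sketch assumes equal-variance Gaussian noise, so the most probable model for a point is the one with the smallest squared residual. That assumption, the hard assignment, and all names are illustrative and are not taken from the talk.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, mean_y - a * mean_x


def squared_residual(model, point):
    a, b = model
    x, y = point
    return (y - (a * x + b)) ** 2


def em_linear_modes(points, initial_models, max_iters=50):
    """Alternate hard assignment (E) and per-mode refitting (M)."""
    models = list(initial_models)
    assignment = None
    for _ in range(max_iters):
        # E-step: assign each point to the model most likely to have
        # generated it (smallest squared residual under the noise assumption).
        new_assignment = [
            min(range(len(models)), key=lambda m: squared_residual(models[m], p))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: a local optimum
        assignment = new_assignment
        # M-step: refit each model to the points assigned to it.
        for m in range(len(models)):
            members = [p for p, k in zip(points, assignment) if k == m]
            if len(members) >= 2:
                models[m] = fit_line(members)
    return models, assignment


# Synthetic data from two made-up modes: y = 2x + 1 and y = -x + 40.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
data += [(float(x), -1.0 * x + 40.0) for x in range(10)]
learned, labels = em_linear_modes(data, [(0.0, 0.0), (0.0, 30.0)])
print(learned, labels)
```

As the slide notes, the procedure only reaches a local maximum, so the result depends on the initial models. Modes whose lines cross inside the data range are noticeably harder to separate than the well-separated pair used here.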

Learning Classifier

• The continuous features of each training example (e.g. 0.5, 1.1, -0.2, 4, 17, 21.9) are converted to spatial relations between objects A and B: left-of(A,B) = 1, right-of(A,B) = 0, on-top(A,B) = 0, touch(A,B) = 0

• The relations form a binary attribute vector (e.g. 1000101011011); the class is the learned mode assigned to that example (Learned Mode 1 or Learned Mode 2, e.g. 1111222211)

• Classifier training data: attributes, class

[Figure: learned decision tree testing touch(A, B) and left-of(A, B), with leaves mode 1, mode 2, mode 2.]

• Use linear model for items in same mode
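A small sketch of learning a decision tree over binary spatial-relation attributes, with the mode label as the class. The slide shows a tree that tests touch(A, B) and left-of(A, B) but does not say how it was learned, so the ID3-style entropy split used here is an assumption, and the four training rows are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or a tuple (attribute, if_0, if_1)."""
    if len(set(labels)) == 1 or not attributes:
        return majority(labels)

    def split_entropy(attr):
        # Weighted entropy of the labels after splitting on `attr`.
        total = 0.0
        for value in (0, 1):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            if subset:
                total += len(subset) / len(labels) * entropy(subset)
        return total

    best = min(attributes, key=split_entropy)
    remaining = [a for a in attributes if a != best]
    branches = []
    for value in (0, 1):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        if not pairs:
            branches.append(majority(labels))
        else:
            branches.append(
                build_tree([r for r, _ in pairs], [l for _, l in pairs], remaining)
            )
    return (best, branches[0], branches[1])


def classify(tree, row):
    while isinstance(tree, tuple):
        attr, if_zero, if_one = tree
        tree = if_one if row[attr] else if_zero
    return tree


# Made-up training rows: attributes are spatial relations, class is the mode.
ATTRS = ["touch(A,B)", "left-of(A,B)", "on-top(A,B)"]
rows = [
    {"touch(A,B)": 1, "left-of(A,B)": 1, "on-top(A,B)": 0},
    {"touch(A,B)": 1, "left-of(A,B)": 0, "on-top(A,B)": 0},
    {"touch(A,B)": 0, "left-of(A,B)": 1, "on-top(A,B)": 0},
    {"touch(A,B)": 0, "left-of(A,B)": 0, "on-top(A,B)": 0},
]
modes = [1, 2, 2, 2]
tree = build_tree(rows, modes, ATTRS)
print(tree)
```

The predicted mode then selects which per-mode linear function makes the continuous prediction.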

Prediction Accuracy Experiment

• 2 Block Environment
  – Agent has two outputs (dx, dy) which control the x and y offsets of the controlled block at every time step
  – The pushed block can’t be moved except by pushing it with the controlled block
  – Blocks are always axis-aligned, there’s no momentum

• Training
  – Instantiate Soar agent in a variety of spatial configurations
  – Run 10 time steps, each step is a training example

• Testing
  – Instantiate Soar agent in some configuration
  – Check accuracy of prediction for next time step
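To make the setup concrete, here is a rough sketch of a two-block world with these properties and of how one 10-step run yields 10 training examples (x, u, y). The actual simulator is not described in the slides, so the unit block size, the per-axis contact rule, and all function names are assumptions. Blocks are represented by their center coordinates and must not overlap at the start.

```python
EPS = 1e-9  # tolerance so that blocks that exactly touch do not count as overlapping


def overlaps(a, b, size):
    """True if two axis-aligned square blocks of side `size` overlap."""
    return abs(a[0] - b[0]) < size - EPS and abs(a[1] - b[1]) < size - EPS


def step(controlled, pushed, dx, dy, size=1.0):
    """Advance one time step. No momentum: only this step's (dx, dy) matters."""
    c, p = list(controlled), list(pushed)
    for axis, delta in ((0, dx), (1, dy)):
        c[axis] += delta
        if delta != 0 and overlaps(c, p, size):
            # Assumed contact rule: shove the pushed block along the axis of
            # motion just far enough that the two blocks touch.
            p[axis] = c[axis] + (size if delta > 0 else -size)
    return tuple(c), tuple(p)


def collect_examples(controlled, pushed, outputs):
    """Run one scenario; every time step gives one (x, u, y) training example."""
    examples = []
    for dx, dy in outputs:
        next_c, next_p = step(controlled, pushed, dx, dy)
        x = controlled + pushed  # current state vector
        y = next_c + next_p      # state vector in the next time step
        examples.append((x, (dx, dy), y))
        controlled, pushed = next_c, next_p
    return examples


run = collect_examples((0.0, 0.0), (3.0, 0.0), [(0.5, 0.0)] * 10)
for x, u, y in run:
    print(x, u, y)
```

In this run the pushed block stays still for the first steps and starts moving once contact is made, which is the discontinuity a modal model is meant to capture.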

Prediction Accuracy – Pushed Block

[Plot: x-axis Training Scenarios (10 to 80); y-axis with lowest tick 1E-8; series MM x, MM y, SM x, SM y.]

Classification Performance

[Plot: x-axis Training Scenarios (0 to 90); series X errors, Y errors.]

Prediction Performance Without Classification Errors

[Plot: x-axis Training Scenarios (0 to 90); y-axis with lowest tick 1E-08; series Best X, Best Y, Real X, Real Y.]


Symbolic Abstraction

• Lump continuous states sharing symbolic properties into a single symbolic state

• Should be Predictable
  – Planning requires accurate model (ex. STRIPS operators)
  – Tends to require more states, more symbolic properties

• Should be General
  – Fast planning and transferable solutions
  – Tends to require fewer states, fewer symbolic properties

[Figure: continuous positions of C1 grouped into two symbolic states, S1: intersect(C1, C2) and S2: ~intersect(C1, C2).]

• Hypothesis: contiguous regions of continuous space that share a single behavioral mode are good abstract states
  – Planning within modes is simple because of linear behavior
  – Combinatorial search occurs at symbolic level

• Spatial predicates used in continuous model decision tree are a reasonable approximation
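A minimal sketch of the lumping step: a continuous scene is mapped to the set of spatial predicates that hold in it, so that all scenes with the same predicate set become one symbolic state. The box representation `(min_x, min_y, max_x, max_y)`, the `abstract_state` helper, and the example coordinates are assumptions for illustration; only the `intersect(C1, C2)` predicate comes from the slide.

```python
def intersect(a, b):
    """True if two axis-aligned boxes (min_x, min_y, max_x, max_y) overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def abstract_state(objects, predicates):
    """Map a continuous scene to the frozenset of predicates that hold in it.

    `objects` maps names to boxes; `predicates` maps names to symmetric
    binary tests, so each unordered pair of objects is checked once.
    """
    names = sorted(objects)
    facts = set()
    for pred_name, test in predicates.items():
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                if test(objects[first], objects[second]):
                    facts.add((pred_name, first, second))
    return frozenset(facts)


PREDICATES = {"intersect": intersect}

scene_a = {"C1": (0.0, 0.0, 2.0, 2.0), "C2": (1.0, 1.0, 3.0, 3.0)}
scene_b = {"C1": (5.0, 5.0, 7.0, 7.0), "C2": (6.5, 5.0, 8.0, 7.0)}
scene_c = {"C1": (0.0, 0.0, 1.0, 1.0), "C2": (4.0, 4.0, 5.0, 5.0)}

# scene_a and scene_b differ continuously but share one symbolic state (S1);
# scene_c falls into the other state (S2), where intersect(C1, C2) is false.
print(abstract_state(scene_a, PREDICATES) == abstract_state(scene_b, PREDICATES))
print(abstract_state(scene_c, PREDICATES))
```

Restricting `PREDICATES` to the relations tested in the learned decision tree is one way to read the last bullet: fewer predicates give fewer, more general states.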

Abstraction Experiment

• 3 blocks, goal is to push c2 to t

• Demonstrate a solution trace to agent

• Agent stores sequence of abstract states in solution in epmem

• Agent tries to follow plan in analogous task

• Abstraction should include predicates about c1, c2, t, avoid predicates about d1, d2, d3

[Figure: block layout showing c1, c2, and d3.]

Generalization Performance

[Bar chart: x-axis Abstraction Type (Learned, 10 Rnd, 40 Rnd, 80 Rnd, All); 80 Tasks Total; surviving axis and value labels: 0, 30, 28.1, 10.1, 10.3; annotation "(16 average)".]

Conclusions

• For continuous environments with interacting objects, modal models are more general and accurate than a uniform model

• The relationships that distinguish between modes serve as a useful symbolic abstraction over continuous state

• All this work takes Soar toward being able to autonomously learn and improve behavior in continuous environments

Evaluation

Coal

• Scaling issues: linear regression is exponential in number of objects

• Linear modes are insufficient for more complex physics such as bouncing -> catastrophic failure

Nuggets

• Modal model learning is more accurate and general than uniform models

• Abstraction learning results are promising, but preliminary
