LABORATORY FOR PERCEPTUAL ROBOTICS • UNIVERSITY OF MASSACHUSETTS AMHERST • DEPARTMENT OF COMPUTER SCIENCE
Learning Prospective Robot Behavior
Shichao Ou and Roderic Grupen
Laboratory for Perceptual Robotics, University of Massachusetts Amherst
A Developmental Approach
• Infant Learning
  – In stages
    • Maturation processes
  – Parents provide constrained learning contexts
    • Protect
    • Easy → Complex
      – Motion mobile for newborns
      – Use brightly colored, easy to pick up objects
      – Use building blocks
      – Association of words and objects
Application in Robotics
• Framework for Robot Developmental Learning
  – Role of teacher: set up learning contexts that make the target concept conspicuous
  – Role of robot: acquire concepts, generalize to new contexts by autonomous exploration, provide feedback
• Control Basis
  – Robot actions are created using combinations of <σ, φ, τ>
  – Establish stages of learning by time-varying constraints on resources
    • Easy → Complex
Example
• Learning to Reach for Objects
  – Stage 1: SearchTrack
    • Focus attention using a single brightly colored object (σ)
    • Limit DOF (τ) to use head ONLY
  – Stage 2: ReachGrab
    • Limit DOF (τ) to use one arm ONLY
  – Stage 3: Handedness, Scale-Sensitive
Hart et al., 2008
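The staging in this example can be read as a progressively relaxed constraint on which <σ, φ, τ> combinations the robot may form. The sketch below is only an illustration under assumptions: it treats σ as a sensory resource, φ as a control objective, and τ as an effector resource, and every name in it (`SENSORS`, `OBJECTIVES`, `EFFECTORS`, `STAGES`, `enumerate_actions`) is hypothetical rather than taken from the control basis software. Whether the head remains available in Stage 2 is also an assumption.

```python
from itertools import product

# Hypothetical resource names, chosen for illustration only.
SENSORS = {"bright_object", "hand_position"}      # sigma
OBJECTIVES = {"track", "reach"}                   # phi
EFFECTORS = {"head", "left_arm", "right_arm"}     # tau


def enumerate_actions(sensors, objectives, effectors):
    """Return every <sigma, phi, tau> triple allowed by the given resources."""
    return sorted(product(sorted(sensors), sorted(objectives), sorted(effectors)))


# Each stage constrains the resources available to the robot.
# Stage contents are assumptions loosely modeled on the slide.
STAGES = [
    {"sensors": {"bright_object"}, "objectives": {"track"},
     "effectors": {"head"}},                                   # Stage 1
    {"sensors": {"bright_object"}, "objectives": {"track", "reach"},
     "effectors": {"head", "left_arm"}},                       # Stage 2
    {"sensors": SENSORS, "objectives": OBJECTIVES,
     "effectors": EFFECTORS},                                  # Stage 3
]


def actions_per_stage(stages):
    """List the candidate actions for each stage, easiest stage first."""
    return [
        enumerate_actions(s["sensors"], s["objectives"], s["effectors"])
        for s in stages
    ]


if __name__ == "__main__":
    for number, actions in enumerate(actions_per_stage(STAGES), start=1):
        print(f"stage {number}: {len(actions)} candidate actions")
```

Restricting resources early keeps the set of candidate actions small, which is one way to read the "Easy → Complex" progression: the learner explores a handful of combinations before the full set is unlocked.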
Prospective Learning
• Infants adapt to new situations by prospectively looking ahead, predicting failure, and then learning a repair strategy
Robot Prospective Learning with Human Guidance
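[Figure: a learned policy drawn as a chain of states S0, S1, …, Si, …, Sj, …, Sn linked by actions a0, a1, …, ai-1, ai, …, aj-1, aj, …, an-1. A feature f with an indicator g taking values 0 or 1 marks two branches, g(f)=1 and g(f)=0, and a sub-task with states Si1, …, Sij, …, Sin is attached to the chain. Label: Challenge.]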
A 2D Navigation Domain Problem
• 30x30 map
• 6 doors, randomly closed
• 6 buttons
• 1 start and 1 goal
• 3-bit door sensor on robot
Flat Learning Results
• Flat Q-Learning
  – 5-bit state
    • (x, y, door-bit1, door-bit2, door-bit3)
  – 4 actions
    • up, down, left, right
  – Reward
    • 1 for reaching the goal
    • -0.01 for every step taken
  – Learning parameters
    • α=0.1, γ=1.0, ε=0.1
• Learned solutions after 30,000 episodes
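For reference, the flat learner above can be sketched as tabular ε-greedy Q-learning with the slide's parameters and rewards. This is a minimal sketch under assumptions, not the authors' implementation: it uses a small door-free grid instead of the 30x30 map, states are plain (x, y) tuples rather than the full (x, y, door-bit1, door-bit2, door-bit3) tuple, the goal step is assumed to earn only the goal reward, and the function names (`step`, `q_update`, `train`) are hypothetical.

```python
import random

# Parameters and rewards as listed on the slide.
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1
GOAL_REWARD, STEP_REWARD = 1.0, -0.01
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state, action, size, goal):
    """Hypothetical door-free grid: move one cell, clamped at the map edge."""
    x, y = state
    dx, dy = ACTIONS[action]
    nxt = (min(max(x + dx, 0), size - 1), min(max(y + dy, 0), size - 1))
    if nxt == goal:
        return nxt, GOAL_REWARD, True
    return nxt, STEP_REWARD, False


def q_update(q, state, action, reward, next_state, done):
    """One tabular Q-learning backup; unseen entries default to 0.0."""
    best_next = 0.0 if done else max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return q[(state, action)]


def train(size=5, start=(0, 0), goal=(4, 4), episodes=200, max_steps=200, seed=0):
    """Epsilon-greedy Q-learning; returns the table {(state, action): value}."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    q = {}
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.choice(names)          # explore
            else:                                   # exploit current estimates
                action = max(names, key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done = step(state, action, size, goal)
            q_update(q, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    table = train()
    print(f"{len(table)} state-action values learned")
```

Because the table is keyed on the whole state tuple, adding the three door bits multiplies the number of entries the flat learner must fill in, which is consistent with the large episode count reported on the slide.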
Prospective Learning
• Stage 1
  – All doors open
  – Constrain resources to use only (x, y) sensors
  – Allow the agent to learn a policy from start to goal
[Figure: the learned policy as a chain of states S0, S1, …, Si, …, Sj, …, Sn with actions Right, Down, Right, Right, Up, Right, Right.]
Prospective Learning
• Stage 2
  – Close 1 door
  – Robot learns the cause of the failure
  – Robot backtracks and finds an earlier indicator of this cause
  – Create a sub-task
  – Learn a new policy for the sub-task
  – Resume original policy
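The Stage 2 steps can be sketched as a small repair loop over an existing policy. This is a hedged illustration only: the slides do not specify how the indicator is found or how the sub-task is learned, so the sketch assumes the robot can compare door-sensor readings from a successful run and a failed run, and it substitutes a fixed, hypothetical `press_button` action for the learned sub-task policy. All function names are invented for the example.

```python
def first_failure(outcomes):
    """Index of the first action whose expected effect did not occur, else None."""
    for j, ok in enumerate(outcomes):
        if not ok:
            return j
    return None


def earliest_indicator(success_features, failure_features, failure_step):
    """Backtrack from the failure to the earliest step of the contiguous run
    in which the sensed feature differs from the successful run."""
    i = failure_step
    if failure_features[i] == success_features[i]:
        return None                      # no indicator at the failure itself
    while i > 0 and failure_features[i - 1] != success_features[i - 1]:
        i -= 1
    return i


def g(feature):
    """Hypothetical indicator: 1 if the first door bit reads closed, else 0."""
    return 1 if feature[0] == 1 else 0


def run_with_repair(policy, features, indicator_step, sub_task):
    """Follow the original policy; at the indicator step, if g(f) == 1,
    run the sub-task first and then resume the original policy."""
    executed = []
    for i, action in enumerate(policy):
        if i == indicator_step and g(features[i]) == 1:
            executed.extend(sub_task)
        executed.append(action)
    return executed


if __name__ == "__main__":
    policy = ["Right", "Down", "Right", "Right", "Up", "Right", "Right"]
    open_run = [(0, 0, 0)] * 7                       # door bits with all doors open
    closed_run = [(0, 0, 0), (0, 0, 0), (1, 0, 0), (1, 0, 0), (1, 0, 0)]
    j = first_failure([True, True, True, True, False])
    i = earliest_indicator(open_run, closed_run, j)
    padded = closed_run + [(0, 0, 0)] * (len(policy) - len(closed_run))
    print(run_with_repair(policy, padded, i, ["press_button"]))
```

In this sketch the original action sequence is left untouched and the sub-task is spliced in only when the indicator fires, which mirrors the "resume original policy" step.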
Prospective Learning Results
Learned solutions in fewer than 2,000 episodes
Humanoid Robot Manipulation Domain
• Benefits of Prospective Learning
– Adapts to new contexts while keeping most of the existing policy
– Automatically generates sub-goals
– Sub-tasks can be learned in a completely different state space
– Supports interactive learning
Conclusion
• A developmental view of robot learning
• A framework that enables interactive, incremental learning in stages
• An extension of the control basis learning framework using the idea of prospective learning