Learning Prospective Robot Behavior

Shichao Ou and Roderic Grupen
Laboratory for Perceptual Robotics, Department of Computer Science, University of Massachusetts Amherst

Post on 01-Feb-2016

Page 1: Learning Prospective Robot Behavior

LABORATORY FOR PERCEPTUAL ROBOTICS • UNIVERSITY OF MASSACHUSETTS AMHERST • DEPARTMENT OF COMPUTER SCIENCE

Learning Prospective Robot Behavior

Shichao Ou and Roderic Grupen

Laboratory for Perceptual Robotics, University of Massachusetts Amherst

Page 2: Learning Prospective Robot Behavior

A Developmental Approach

• Infant Learning
  – In stages
    • Maturation processes
  – Parents provide constrained learning contexts
    • Protect
    • Easy → Complex
  – Motion mobile for newborns
  – Use brightly colored, easy-to-pick-up objects
  – Use building blocks
  – Association of words and objects

Page 3: Learning Prospective Robot Behavior

Application in Robotics

• Framework for Robot Developmental Learning
  – Role of teacher: set up learning contexts that make the target concept conspicuous
  – Role of robot: acquire concepts, generalize to new contexts by autonomous exploration, provide feedback
• Control Basis
  – Robot actions are created using combinations of <σ, φ, τ>
  – Establish stages of learning by time-varying constraints on resources
    • Easy → Complex

Page 4: Learning Prospective Robot Behavior

Example

• Learning to Reach for Objects
  – Stage 1: SearchTrack
    • Focus attention using a single brightly colored object (σ)
    • Limit DOF (τ) to use head ONLY
  – Stage 2: ReachGrab
    • Limit DOF (τ) to use one arm ONLY
  – Stage 3: Handedness, Scale-Sensitive

Hart et al., 2008
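One way to picture the staging in this example is as a filter over the resources from which <σ, φ, τ> combinations are drawn. The sketch below is illustrative only. The resource inventories, the potential-function names (`track`, `reach`), the choice of `left_arm` for Stage 2, and the `Stage` class are all hypothetical; none of them come from the slides or from Hart et al.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical resource inventories; the slides do not list them.
SENSORS = ["bright_object", "other_object"]      # sigma: sensory resources
POTENTIALS = ["track", "reach"]                  # phi: hypothetical controller names
EFFECTORS = ["head", "left_arm", "right_arm"]    # tau: effector resources (DOF)


@dataclass(frozen=True)
class Stage:
    """A learning stage, expressed as the sensors and effectors it allows."""
    name: str
    sensors: frozenset
    effectors: frozenset


def available_actions(stage):
    """Enumerate every <sigma, phi, tau> combination the stage permits."""
    return [
        (s, p, t)
        for s, p, t in product(SENSORS, POTENTIALS, EFFECTORS)
        if s in stage.sensors and t in stage.effectors
    ]


# Stage 1: one bright object, head only. Stage 2: one arm only (assumed left).
stage1 = Stage("SearchTrack", frozenset({"bright_object"}), frozenset({"head"}))
stage2 = Stage("ReachGrab", frozenset({"bright_object"}), frozenset({"left_arm"}))

total = len(SENSORS) * len(POTENTIALS) * len(EFFECTORS)
for stage in (stage1, stage2):
    print(stage.name, len(available_actions(stage)), "of", total)
```

Relaxing a stage then amounts to widening its `sensors` or `effectors` sets, which enlarges the set of actions the learner can explore.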

Page 5: Learning Prospective Robot Behavior

Prospective Learning

• Infants adapt to new situations by prospectively looking ahead, predicting failure, and then learning a repair strategy

Page 6: Learning Prospective Robot Behavior

Robot Prospective Learning with Human Guidance

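[Figure: the learned policy drawn as a chain of states S0, S1, ..., Si, ..., Sj, ..., Sn linked by actions a0, a1, ..., ai-1, ai, ..., aj-1, aj, ..., an-1. A challenge introduces a feature f with a binary indicator g (values 0 and 1); the branches are labeled g(f)=1 and g(f)=0, and a sub-task with states Si1, ..., Sij, ..., Sin is attached to the chain.]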

Page 7: Learning Prospective Robot Behavior

A 2D Navigation Domain Problem

• 30x30 map
• 6 doors, randomly closed
• 6 buttons
• 1 start and 1 goal
• 3-bit door sensor on robot

Page 8: Learning Prospective Robot Behavior

Flat Learning Results

• Flat Q-Learning
  – 5-component state
    • (x, y, door-bit1, door-bit2, door-bit3)
  – 4 actions
    • up, down, left, right
  – Reward
    • 1 for reaching the goal
    • -0.01 for every step taken
  – Learning parameters
    • α=0.1, γ=1.0, ε=0.1
• Learned solutions after 30,000 episodes
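The flat learner's update rule can be sketched as ordinary tabular Q-learning with ε-greedy exploration, using the rewards and parameters listed above. This is a minimal sketch under stated assumptions, not the authors' implementation: the slides do not describe the door and button mechanics, so they are omitted and the state is just (x, y); the 5x5 map, the 200-step episode cap, and giving the final step the goal reward alone are also assumptions made to keep the example small.

```python
import random

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1    # values from the slide
GOAL_REWARD, STEP_REWARD = 1.0, -0.01    # values from the slide


def step(state, action, size, goal):
    """Move one cell, clipped to the map; return (next_state, reward, done)."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), size - 1)
    y = min(max(state[1] + dy, 0), size - 1)
    nxt = (x, y)
    if nxt == goal:
        return nxt, GOAL_REWARD, True
    return nxt, STEP_REWARD, False


def q_learning(size, start, goal, episodes, max_steps=200, seed=0):
    """Tabular Q-learning; missing table entries are read as 0.0."""
    rng = random.Random(seed)
    q = {}
    names = list(ACTIONS)
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            # epsilon-greedy action selection
            if rng.random() < EPSILON:
                action = rng.choice(names)
            else:
                action = max(names, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action, size, goal)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in names)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, size, start, goal, max_steps=200):
    """Follow the greedy policy from start and return the visited states."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state, _, _ = step(state, action, size, goal)
        path.append(state)
    return path


q = q_learning(size=5, start=(0, 0), goal=(4, 4), episodes=2000)
path = greedy_path(q, 5, (0, 0), (4, 4))
print("reached goal:", path[-1] == (4, 4), "in", len(path) - 1, "steps")
```

Extending the state key to (x, y, door-bit1, door-bit2, door-bit3) multiplies the table by the number of door-bit combinations, which is one plausible reason the flat learner needs so many episodes.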

Page 9: Learning Prospective Robot Behavior

Prospective Learning

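• Stage 1
  – All doors open
  – Constrain resources to use only (x,y) sensors
  – Allow the agent to learn a policy from start to goal

[Figure: the learned policy as a chain of states S0, S1, ..., Si, ..., Sj, ..., Sn with actions Right, Down, Right, Right, Up, Right, Right]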
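Restricting Stage 1 to the (x,y) sensors can be read as projecting the full state onto a smaller one before it reaches the learner. The short sketch below shows that projection and the resulting upper bounds on the number of distinct states; the function name `project_xy` is hypothetical, and the counts ignore any cells that walls make unreachable.

```python
MAP_SIZE = 30    # 30x30 map, from the slides
DOOR_BITS = 3    # 3-bit door sensor, from the slides


def project_xy(full_state):
    """Drop the door bits so the learner only sees (x, y)."""
    x, y, *_door_bits = full_state
    return (x, y)


# Upper bounds on the number of distinct states in each representation.
full_states = MAP_SIZE * MAP_SIZE * 2 ** DOOR_BITS
stage1_states = MAP_SIZE * MAP_SIZE

print(project_xy((3, 7, 0, 1, 1)))   # (3, 7)
print(full_states, stage1_states)    # 7200 900
```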

Page 10: Learning Prospective Robot Behavior

Prospective Learning

• Stage 2
  – Close 1 door
  – Robot learns the cause of the failure
  – Robot backtracks and finds an earlier indicator of this cause

Page 11: Learning Prospective Robot Behavior

Prospective Learning

• Stage 2
  – Close 1 door
  – Robot learns the cause of the failure
  – Robot backtracks and finds an earlier indicator of this cause
  – Create a sub-task
  – Learn a new policy for the sub-task

Page 12: Learning Prospective Robot Behavior

Prospective Learning

• Stage 2
  – Close 1 door
  – Robot learns the cause of the failure
  – Robot backtracks and finds an earlier indicator of this cause
  – Create a sub-task
  – Learn a new policy for the sub-task
  – Resume original policy
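The Stage 2 steps can be sketched as two small operations: walk back from the failure to the earliest point at which it was already predictable, then splice a sub-task in at that point and resume the original policy. This is a hedged illustration, not the authors' algorithm. The slides do not say how the indicator is identified, so `predicts_failure`, the example door-bit trace, and the sub-task actions are all hypothetical, and the splice assumes the sub-task returns the robot to the state where it branched.

```python
def backtrack_to_indicator(trace, failure_index, predicts_failure):
    """Walk back from the failure and return the earliest step in the unbroken
    run of observations that already predicted it (None if there is none)."""
    earliest = None
    for i in range(failure_index, -1, -1):
        if not predicts_failure(trace[i]):
            break
        earliest = i
    return earliest


def repaired_plan(original_actions, branch_index, subtask_actions):
    """Keep the original actions and insert the sub-task at the branch point.

    Assumes the sub-task ends in the state where it started, so the original
    policy can resume unchanged.
    """
    return (original_actions[:branch_index]
            + subtask_actions
            + original_actions[branch_index:])


# Stage 1 action sequence, as shown on the Stage 1 slide.
original = ["Right", "Down", "Right", "Right", "Up", "Right", "Right"]

# Hypothetical door-sensor readings in states S0..S4; the action taken in S4
# fails because a door is closed.
door_bits = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0)]
failure_index = 4


def predicts_failure(bits):
    """Hypothetical indicator: the second door bit signals the closed door."""
    return bits[1] == 1


branch = backtrack_to_indicator(door_bits, failure_index, predicts_failure)
subtask = ["Down", "Up"]    # hypothetical detour, e.g. to a button and back
plan = repaired_plan(original, branch, subtask)
print(branch, plan)
```

Only the spliced segment is new; every other action comes from the Stage 1 policy, and the sub-task itself could be learned with a different state representation.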

Page 13: Learning Prospective Robot Behavior

Prospective Learning Results

Learned solutions in fewer than 2,000 episodes

Page 14: Learning Prospective Robot Behavior

Humanoid Robot Manipulation Domain

• Benefits of Prospective Learning

– Adapts to new contexts while retaining most of the existing policy

– Automatically generates sub-goals

– Sub-tasks can be learned in a completely different state space

– Supports interactive learning

Page 15: Learning Prospective Robot Behavior

Conclusion

• A developmental view of robot learning

• A framework that enables interactive, incremental learning in stages

• An extension of the control basis learning framework using the idea of prospective learning