
Page 1:

Human-Robot Interaction and Learning From Human Demonstration

Maja J Matarić, Chad Jenkins, Monica Nicolescu, Evan Drumwright, and Chi-Wei Chu

University of Southern California
Interaction Lab / Robotics Research Lab

Center for Robotics and Embedded Systems

http://robotics.usc.edu/~agents/Mars/mars.html

Page 2:

Motivation & Approach

Goals:
Natural human-robot interaction in various domains.
Robot programming & learning by imitation (mobile & humanoid).

General Approach:
Use the intrinsic behavior repertoire to facilitate control, human-robot interaction, and learning.
Use a human interactive training method.
Use human data-driven training methods.

Page 3:

Philosophy: Modularity & Interaction

Complex control is represented as a combination of lower-dimensional, composable building blocks: behaviors. These are the abstraction for interaction.
Representation is deictic & action-embedded.
Perception is classification; learning is refinement, enhancement & composition of building blocks.
Interaction is action-embedded:
humans know the robot's behavior repertoire;
robots map human input onto the repertoire, for prediction & learning.
Human-robot communication is action-based. Intervention is treated as high-priority perceptual input.

Page 4:

Learning From People

Learn from humans in two "natural" modes:

A human teacher/trainer demonstrates a skill to a robot, which learns from one or a few (single-digit) trials by mapping observations onto existing behaviors.

A corpus of human data is provided off-line, and statistical learning is used to derive new behaviors (humanoid).

Page 5:

Previous Developments

A Hierarchical Abstract Behavior Architecture: representation & execution of complex, sequential, hierarchically structured tasks.
An algorithm for on-line learning of task representations from experienced demonstrations.
Validated the architecture and learning algorithm:
execution of tasks with hierarchical structure and long behavioral sequences;
learning of complex tasks from both human and robot teachers.

Page 6:

Hierarchical Abstract Behavior Architecture

[Figure: extended behavior-based architecture, with behaviors receiving sensory input from the environment.]

Extended behavior-based architecture.
Flexible activation conditions (dependency links between behaviors) allow for behavior reusability.
Tasks are represented as (hierarchical) behavior networks.
Sequential & opportunistic execution.
Support for automated generation (task learning).

M. N. Nicolescu, M. J Matarić, "A Hierarchical Architecture for Behavior-Based Robots", First International Joint Conference on Autonomous Agents and Multiagent Systems, July 15-19, 2002.
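To make the architecture concrete, here is a minimal sketch of a behavior network node with the three dependency-link types named later in the talk (ordering, enabling, permanent); the class names and method signatures are illustrative assumptions, not the published implementation.

```python
# Minimal sketch of a behavior network with typed dependency links.
# Class names and signatures are illustrative, not the published API.
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List

class LinkType(Enum):
    ORDERING = 1    # predecessor's goals must have been met at some point
    ENABLING = 2    # predecessor's goals must hold when this behavior starts
    PERMANENT = 3   # predecessor's goals must hold throughout execution

@dataclass
class Link:
    predecessor: "Behavior"
    kind: LinkType

@dataclass
class Behavior:
    name: str
    goal_met: Callable[[dict], bool]   # goal predicate over current observations
    links: List[Link] = field(default_factory=list)
    achieved: bool = False             # latched once the goal first fires

    def active(self, obs: dict) -> bool:
        """A behavior may fire once all its dependency links are satisfied."""
        for link in self.links:
            if link.kind is LinkType.ORDERING and not link.predecessor.achieved:
                return False
            if link.kind in (LinkType.ENABLING, LinkType.PERMANENT) \
                    and not link.predecessor.goal_met(obs):
                return False
        return True
```

Because activation depends only on link satisfaction, behaviors can be reused across tasks by rewiring their links rather than rewriting their code.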

Page 7:

Learning from Experienced Demonstrations

Goal: learn a high-level task representation in terms of the robot's own skills.

Approach:
A teacher-following strategy, with active participation in the demonstration.
The robot is equipped with a set of basic skills.
The teacher is aware of these skills and of what observations the robot can gather.
There is a mapping between what the robot sees and what it can perform.
The status (met/not met) of each behavior's goals is continuously monitored.
The teacher may signal moments in time relevant to the task.

[Figure: goals met -> behavior fires; the observation-behavior mapping.]

M. N. Nicolescu, M. J Matarić, "Experience-Based Representation Construction: Learning from Human and Robot Teachers", IEEE/RSJ International Conference on Intelligent Robots and Systems, October 29 - November 3, 2001.
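A hedged sketch of the observation-to-behavior mapping during teacher-following, reusing the Behavior class from the previous sketch; the function name and stream format are illustrative assumptions.

```python
# Hedged sketch: while following the teacher, log each behavior whose goal
# predicate fires, yielding a candidate task sequence. Illustrative only.
def observe_demonstration(behaviors, observation_stream):
    """Map what the robot sees onto its own behavior repertoire."""
    sequence = []
    for obs in observation_stream:                  # one snapshot per timestep
        for b in behaviors:
            if not b.achieved and b.goal_met(obs):  # goals monitored continuously
                b.achieved = True
                sequence.append(b.name)             # observation-behavior mapping
    return sequence                                 # e.g. ['A', 'C', 'B', 'F']
```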

Page 8:

Recent Developments

Motivation:
The current approach leads to a correct but possibly overspecialized task representation, which is a problem in changing environments.

Approach: refine the learned task representations through:
generalizing over multiple (but few) demonstrations, with new demonstrations incorporated into the existing task representation;
providing feedback during task execution, on unnecessary/missing parts of the task.

Page 9:

Generalization Problem

It is hard to learn a task from only one trial: limited sensing capabilities, the quality of the teacher's demonstration, and particularities of the environment all interfere.
The problem is similar to inferring a regular expression (or its FSA equivalent) from examples.
A small number of demonstrations is desired, so statistical techniques are not applicable.
Main learning inaccuracies:
learning irrelevant steps (false positives);
omitting relevant steps (false negatives).

[Figure: training example sequences over behaviors A, B, C, F (e.g. A-C-B-F-A and A-B-F-C-A) and the generalization to be inferred from them.]

Page 10:

Generalization Approach

Demonstrate the same task in different/similar environments.
Construct a task representation that:
encodes the specifics of each given example;
captures the parts common to all demonstrations.
Compute a measure of similarity (common steps) between different examples: the longest common subsequence (LCS) between the topological representations, computed in O(nm) time.
Merge the common nodes.

M. N. Nicolescu, M. J Matarić, "Natural Methods for Robot Task Learning: Instructive Demonstrations, Generalization and Practice", Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, July 14-18, 2003.
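The similarity measure itself is the textbook O(nm) dynamic program; a minimal version, with the backtrack that recovers the common steps, looks like this (the helper name is ours):

```python
# Standard O(n*m) dynamic program for the longest common subsequence of two
# demonstrated behavior sequences, used to find their common steps.
def lcs(x, y):
    n, m = len(x), len(y)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    common, i, j = [], n, m          # backtrack to recover the common steps
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            common.append(x[i - 1]); i -= 1; j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return common[::-1]

# The two training sequences from the next slide share A, B, F, A:
lcs(list("ACBFA"), list("ABFCA"))   # -> ['A', 'B', 'F', 'A']
```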

Page 11:

Illustration of Approach

[Figure: two training sequences, A-C-B-F-A and A-B-F-C-A, with their LCS dynamic-programming table; the longest common sequence A, B, F, A; and the generalized network that merges these common nodes while keeping the differing steps (C) as alternative branches.]

Page 12:

Merging Additional Demonstrations

[Figure: an existing graph and a new example being merged; common nodes are shared, differing steps branch.]

For each subsequent example, compute the LCS between the new example and all possible paths in the graph.
The dynamic-programming method computes the LCS at each level (depth) in the graph.
The LCS is computed only for the differing parts of the paths.
The LCS table is kept as a linked list of arrays.
The longest of the paths is selected and its common nodes are merged.
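A loose sketch of the merge step, reusing the lcs helper above; representing paths as flat name sequences and merging by name is a simplification of the real graph operation.

```python
# Hedged sketch: pick the existing path most similar to the new example
# (longest LCS), then treat the common steps as shared nodes and the
# remaining steps of each sequence as alternative branches.
def merge_demonstration(graph_paths, new_example):
    """graph_paths: all root-to-leaf behavior-name paths in the current graph;
    new_example: behavior-name sequence from the new demonstration."""
    best_path = max(graph_paths, key=lambda p: len(lcs(p, new_example)))
    common = lcs(best_path, new_example)
    shared = set(common)
    branches = [[s for s in best_path if s not in shared],
                [s for s in new_example if s not in shared]]
    return common, branches      # shared nodes + the two differing branches
```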

Page 13:

Behavior Network Execution

[Figure: a behavior network whose activation conditions accumulate as the expressions AC + A, (AC + A)B, and (AC + A)BF.]

Computing the preconditions for each behavior is similar to computing the regular expression from an FSA representation.
Added capability for disjunctive & conjunctive activation conditions.
The types of dependencies between behaviors (ordering, enabling, permanent) are computed from the two merged behavior networks.
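A minimal sketch of disjunctive & conjunctive activation conditions, with the precondition kept in disjunctive normal form; the representation is an illustrative assumption, chosen to mirror the regular-expression view (the precondition (AC + A) for B reads "A and C achieved, or A achieved alone").

```python
# Hedged sketch: a precondition as a disjunction of conjunctions over
# predecessor behaviors that have met their goals.
def precondition_met(dnf, achieved):
    """dnf: list of conjunctions, each a list of behavior names;
    achieved: set of behaviors whose goals have been met."""
    return any(all(b in achieved for b in conj) for conj in dnf)

# B fires after the path A,C or after A alone (precondition AC + A):
precondition_met([["A", "C"], ["A"]], {"A"})   # True
precondition_met([["A", "C"], ["A"]], {"C"})   # False
```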

Page 14:

Generalization

Task: go to either the Green or the Light Green target, pick up the Orange box, go to the Yellow and Red targets, go to the Pink target, drop the box there, and come back to the Light Green target.

None of the demonstrations corresponds to the desired task; each contains incorrect steps and inconsistencies.

Page 15:

Generalization Experiments

[Figure: first demonstration, the environment, the learned topology, and the robot's performance.]

Notes: all observations are treated as relevant; there is no trajectory learning; the result is not a reactive policy.

Page 16:

Generalization Experiments (II)

[Figure: third human demonstration and the robot's performance after the 1st, 2nd, and 3rd demonstrations.]

Page 17:

Refining Task Representations Through Feedback

Feedback is given through speech:
unnecessary task steps ("bad"): remove the steps from the network;
missing task steps ("new" ... "continue"): add the newly demonstrated steps to the network.

[Figure: "bad" feedback deletes unnecessary steps from the learned network; a "new" ... "continue" demonstration splices new steps (M, N) into it.]

M. N. Nicolescu, M. J Matarić, "Natural Methods for Robot Task Learning: Instructive Demonstrations, Generalization and Practice", Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, July 14-18, 2003.
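A minimal sketch of how the spoken cues might edit a learned task; a flat step list and this cue format are simplifying assumptions (the actual representation is a behavior network).

```python
# Hedged sketch of feedback-driven refinement: 'bad' deletes a step,
# 'new' splices in a demonstrated sub-sequence closed by 'continue'.
def apply_feedback(steps, cues):
    """steps: learned task steps; cues: list of feedback events, each
    ('bad', step) or ('new', position, [steps...])."""
    refined = list(steps)
    for cue in cues:
        if cue[0] == "bad":            # unnecessary step: remove it
            refined.remove(cue[1])
        elif cue[0] == "new":          # missing steps: add them to the network
            _, pos, demo = cue
            refined[pos:pos] = demo
    return refined

# Echoing the figure: delete C and F, then demonstrate M, N:
apply_feedback(list("ACBFA"), [("bad", "C"), ("bad", "F"), ("new", 2, ["M", "N"])])
# -> ['A', 'B', 'M', 'N', 'A']
```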

Page 18:

Practice and Feedback Experiments

[Figure: third demonstration, practice run & feedback, topology refinement, and the robot's performance.]

Page 19:

Practice and Feedback Experiments (II)

[Figure: first demonstration, practice run & feedback, topology refinement, and the robot's performance.]

Page 20:

Summary

The generalization method incorporates multiple demonstrations into a single behavior network representation and helps detect relevant/irrelevant observations.
Simple feedback cues can be used for:
providing instructive demonstrations;
refining task representations learned from direct demonstration or generalization.

Page 21:

Learning from Motion Data

Goal:
Automatically derive both primitive and high-level behaviors from human motion data.
Use behaviors as a substrate for generating robot motion and for predicting/classifying human activity.

Method:
A corpus of human motion data (motion capture).
Dimension reduction to extract behaviors.

Page 22:

Motion Segmentation

Extract short motion sequences.

Previous methods:
Manual (slow, tedious).
Z-function (discrete motion only; modified by Peters for Robonaut data).

New method, the kinematic centroid:
Assume limbs are pendulums.
A greedy method determines the "end" of each pendulum swing.
Appropriate for highly dynamic motion exhibiting large swings.

O. C. Jenkins, M. J Matarić, "Automated Modularization of Human Motion into Actions and Behaviors", USC Center for Robotics and Embedded Systems Technical Report CRES-02-002.
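A minimal sketch of the kinematic-centroid idea under stated assumptions: average the limb's joint positions into a per-frame centroid and greedily cut a segment where the centroid's excursion from the segment start stops growing. The array shapes and min_len heuristic are illustrative, not the published algorithm's exact rules.

```python
# Hedged sketch of kinematic-centroid segmentation for one limb.
import numpy as np

def kinematic_centroid_segments(limb_positions, min_len=5):
    """limb_positions: (T, J, 3) positions of the limb's J joints over T frames."""
    centroid = limb_positions.mean(axis=1)      # (T, 3) pendulum "bob" per frame
    cuts = [0]
    T = len(centroid)
    for t in range(1, T - 1):
        d_here = np.linalg.norm(centroid[t] - centroid[cuts[-1]])
        d_next = np.linalg.norm(centroid[t + 1] - centroid[cuts[-1]])
        # End of swing: excursion from the segment start stops growing.
        if d_next < d_here and t - cuts[-1] >= min_len:
            cuts.append(t)
    cuts.append(T - 1)
    return cuts          # frame indices bounding the motion segments
```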

Page 23:

Previous Work

Earlier efforts involved:
Application of PCA dimension reduction to arm motion data.
K-means clustering to uncover primitive behaviors.

Limitations:
Linear PCA is applied to nonlinear motion data.
PCA does not capture temporal dependencies.
Clustering decomposes the PCA space, but the primitives have no intuitive meaning or theme, and it is difficult to compose them into higher-level behaviors.

Page 24:

Current Work

Use Isomap for nonlinear dimension reduction [Tenenbaum et al. 2000].
Extend Isomap to handle temporal dependencies.
Cluster separable motion groups with bounding-box clustering.
Interpolate within each cluster to represent new motion.
Use further dimension-reduction iterations to derive high-level behaviors.

O. C. Jenkins, M. J Matarić, "Deriving Action and Behavior Primitives from Human Motion Data", IEEE/RSJ International Conference on Intelligent Robots and Systems, September 30 - October 4, 2002.

Page 25:

[Figure: an example motion performing a sequence of two high-level behaviors, X and Y, built from primitives A-F. PCA and spatial Isomap leave the spatially similar C and F confounded (C/F); spatio-temporal Isomap separates all six primitives; a second spatio-temporal Isomap iteration recovers the high-level behaviors X and Y.]

Page 26:

Spatio-temporal Dimension Reduction

Isomap extracts the underlying (nonlinear) structure of data, e.g. a 2D spherical manifold from 3D position data.
We extend Isomap for temporal data using common temporal neighbors (CTN): CTN observes that sequence B is preceded by sequence A and followed by sequence C, which resolves spatially similar sequences.
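A rough sketch of how a CTN-style neighborhood could be wired into Isomap: temporally adjacent frames are connected, and spatial neighbors whose successors are also neighbors have their distances shrunk so that repeated executions of one motion collapse together. This is our reading of the idea, not the published algorithm; k, ctn_scale, and the MDS step are illustrative.

```python
# Hedged sketch of spatio-temporal Isomap with common temporal neighbors.
import numpy as np
from scipy.sparse.csgraph import shortest_path

def st_isomap(X, k=10, ctn_scale=0.1, dim=2):
    """X: (T, D) motion frames in time order. Returns a (T, dim) embedding."""
    T = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # pairwise distances
    W = np.full((T, T), np.inf)                            # inf = no edge
    knn = np.argsort(D, axis=1)[:, 1:k + 1]                # k spatial neighbors
    for i in range(T):
        for j in knn[i]:
            # Common temporal neighbors: the successors are neighbors too.
            if i + 1 < T and j + 1 < T and (j + 1) in knn[i + 1]:
                W[i, j] = ctn_scale * D[i, j]              # pull CTN pairs close
            else:
                W[i, j] = D[i, j]
        if i + 1 < T:
            W[i, i + 1] = D[i, i + 1]                      # temporal adjacency
    G = shortest_path(W, directed=False)                   # geodesic distances
    H = np.eye(T) - np.ones((T, T)) / T                    # classical MDS on G
    B = -0.5 * H @ (G ** 2) @ H
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
```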

Page 27:

High-level Behaviors

Extract high-level behaviors by applying spatio-temporal Isomap again to the sequence of primitives.

Page 28:

Derived High-level Behaviors

[Figure: derived high-level behaviors, including arm waving and punching, over the numbered primitives.]

Page 29:

Primitive Motion Synthesis

Use interpolation between motion sequences to generate new variations.
Interpolation provides a form of parameterization for a primitive.

[Figure: trajectories of hand positions produced by interpolation; blue/red are the motions grouped into a primitive, black/magenta are new motion variations.]
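A minimal sketch of interpolation as parameterization: resample each exemplar of a primitive onto a common time base, then blend with mixing weights to synthesize a new variation. The function and its parameters are illustrative assumptions.

```python
# Hedged sketch: synthesize a new variation of a primitive by blending
# its exemplar trajectories with a mixing weight per exemplar.
import numpy as np

def blend_exemplars(exemplars, weights, n_samples=100):
    """exemplars: list of (T_i, DOF) joint-angle arrays; weights sum to 1."""
    u = np.linspace(0.0, 1.0, n_samples)
    resampled = []
    for traj in exemplars:
        t = np.linspace(0.0, 1.0, len(traj))
        # Resample each degree of freedom onto the common time base.
        resampled.append(np.stack([np.interp(u, t, traj[:, d])
                                   for d in range(traj.shape[1])], axis=1))
    return sum(w * r for w, r in zip(weights, resampled))   # (n_samples, DOF)
```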

Page 30:

High-level Motion Synthesis

A behavior can be used to synthesize a variation on the input motion. Synthesis uses segment concatenation.

[Figure: synthesized arm waving, punching, and "Cabbage Patch" dancing.]

Page 31:

Primitives as Forward Models

Through eager evaluation, the span of motion variations can be realized for each primitive.
Consequently, a nonlinear forward model can be produced for each primitive:
used for motion synthesis, given an initial posture;
experimenting with motion classification via Kalman gains.

[Figure: PCA view of a primitive's flow field in joint-angle space.]
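A minimal sketch of a flow-field forward model built from a primitive's sampled variations; the nearest-neighbor lookup stands in for whatever nonlinear regressor the real system uses.

```python
# Hedged sketch: store (posture, displacement) pairs from a primitive's
# variations and roll the model out from an initial posture.
import numpy as np

class FlowFieldModel:
    def __init__(self, trajectories):
        """trajectories: list of (T, DOF) joint-angle variations of one primitive."""
        self.points = np.concatenate([t[:-1] for t in trajectories])
        self.flows = np.concatenate([np.diff(t, axis=0) for t in trajectories])

    def step(self, posture):
        """Predict the next posture via the nearest stored flow vector."""
        nearest = np.argmin(np.linalg.norm(self.points - posture, axis=1))
        return posture + self.flows[nearest]

    def synthesize(self, posture, n_steps):
        """Motion synthesis given an initial posture."""
        out = [np.asarray(posture, dtype=float)]
        for _ in range(n_steps):
            out.append(self.step(out[-1]))
        return np.stack(out)
```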

Page 32:

Summary: Learning from Motion Data

Strengths of the current approach:
Derives suitable behaviors for nonlinear motion data with temporal dependencies.
Segmentation techniques allow for full automation.
New variations on derived behaviors can be synthesized.
Flow-field forward models can be produced for each primitive, allowing smooth motion synthesis and motion classification.

Future work:
Validation on better motion data (always).
Derivation of primitives from NASA Robonaut motion.
Integration with task-directed control mechanisms.
Posture-atomic primitive derivation.

Page 33:

Humanoid control via parameterized trajectories

● Free-space control of humanoid robots.
● A set of exemplar trajectories:
  ● represent the Cartesian extrema of a single behavior;
  ● the trajectories are in joint space.
● New movements are produced via interpolation:
  ● representative of the original behavior;
  ● selected by a mixing parameter.

Page 34:

What's good about this?

● Very few exemplars of a behavior may be needed to model that behavior.
● For dextrous robotic control, this is easier than explicit programming or optimal-control methods.
● Trajectories can be represented compactly:
  ● RBF approximation can represent complex (i.e. very nonlinear) trajectories with high fidelity using little storage.
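A minimal sketch of the compactness claim: fit Gaussian RBF weights to a joint trajectory by least squares and reconstruct from the small weight vector. The basis count and width are illustrative choices, not the system's settings.

```python
# Hedged sketch of RBF trajectory compression (one degree of freedom).
import numpy as np

def fit_rbf(trajectory, n_centers=20, width=0.05):
    """trajectory: (T,) joint angles over normalized time [0, 1]."""
    t = np.linspace(0.0, 1.0, len(trajectory))
    centers = np.linspace(0.0, 1.0, n_centers)
    Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))
    weights, *_ = np.linalg.lstsq(Phi, trajectory, rcond=None)
    return centers, weights          # n_centers numbers instead of T samples

def eval_rbf(centers, weights, t, width=0.05):
    """Reconstruct the trajectory at times t in [0, 1]."""
    Phi = np.exp(-((np.atleast_1d(t)[:, None] - centers[None, :]) ** 2)
                 / (2 * width ** 2))
    return Phi @ weights
```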

Page 35:

Robotic control via parametric primitives

● The precondition(s) for a primitive must first be met.
● A time duration for the primitive is then selected.
● The primitive is then executed open-loop:
  ● closed-loop controllers will be investigated in the future.
● Control operates at the kinematic level only:
  ● position and/or velocity commands are sent to a low-level controller.

Page 36:

Activity recognition via primitives

● Primitives serve to model a behavior.
● This model can be used to recognize the behavior.
● We built a Bayesian classifier to recognize a set of five primitives from mocap & simulator data:
  ● rate of false negatives: 3.39%;
  ● rate of false positives: 0.06%;
  ● more data is needed for validation.
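As a hedged illustration of the kind of Bayesian classifier this could be, here is a Gaussian class-conditional sketch over fixed-length motion feature vectors; the features and model structure are assumptions, not the classifier actually evaluated above.

```python
# Hedged sketch: one Gaussian model per primitive, maximum-posterior decision.
import numpy as np

class BayesPrimitiveClassifier:
    def fit(self, features_by_class):
        """features_by_class: dict primitive_name -> (N_i, D) feature matrix."""
        self.stats = {name: (X.mean(axis=0), X.var(axis=0) + 1e-6)
                      for name, X in features_by_class.items()}
        return self

    @staticmethod
    def log_likelihood(x, mean, var):
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

    def predict(self, x):
        """Return the primitive with the highest posterior (uniform priors)."""
        return max(self.stats,
                   key=lambda name: self.log_likelihood(x, *self.stats[name]))
```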

Page 37:

Markerless Kinematic Model and Motion Capture

In addition, a kinematic model is estimated for the subject.
We leverage recent voxel-carving techniques for constructing 3D point volumes of moving subjects from multiple calibrated cameras.

[Figure: voxel carving.]

Page 38:

Nonlinear Spherical Shells

NSS is a simple means for volume skeletonization: it extracts a pose-independent principal curve (the skeleton curve) from a captured volume.

[Figure: pipeline from the original volume, via dimension reduction to a pose-independent "Da Vinci" zero posture, through spherical-shell partitioning, clustering and linking, to projection of the skeleton curve onto the original volume.]

Page 39:

Model and Pose Estimation

Using each volume and its skeleton curve, a kinematic model is estimated for each frame in a motion.
A single model is then estimated for the whole sequence by aligning the frame-specific models and identifying common joints using the density of aligned joints.

[Figure: alignment of the frame-specific models.]

Page 40:

Result for Human Waving

Page 41:

Result for Synthetic Volumes

[Figure: original kinematics and motion; derived kinematics and motion; snapshot of the synthetic volume for a single frame.]

Page 42:

Conclusions

Goal: use the behavior substrate to facilitate action-embedded human-robot interaction, control, and learning.

Recent successes:
Generalization of multiple (but few) demonstrations into a single behavior network representation.
Use of simple feedback cues for refining learned tasks and for faster learning.
Automatic derivation of behaviors from human motion data.

Work in progress:
Validation of the generalization and human-robot interaction methods in more elaborate experimental setups.
Validation of the derivation method on Robonaut data.

Info, videos, papers: http://robotics.usc.edu/projects/mars/