DARPA ITO/MARS Project Update

Vanderbilt University

A Software Architecture and Tools for Autonomous Robots that Learn on Mission

K. Kawamura, M. Wilkes, R. A. Peters II, D. Gaines

Vanderbilt University

Center for Intelligent Systems

http://shogun.vuse.vanderbilt.edu/CIS/IRL/

12 January 2000

Vanderbilt MARS Team

• Kaz Kawamura, Professor of Electrical & Computer Engineering. MARS responsibility - PI, Integration

• Dan Gaines, Asst. Professor of Computer Science. MARS responsibility - Reinforcement Learning

• Alan Peters, Assoc. Professor of Electrical Engineering. MARS responsibility - DataBase Associative Memory, Sensory EgoSphere

• Mitch Wilkes, Assoc. Professor of Electrical Engineering. MARS responsibility - System Status Evaluation

• Jim Baumann, Nichols Research. MARS responsibility - Technical Consultant

Sponsoring Agency: Army Strategic Defense Command

A Software Architecture and Tools for Autonomous Mobile Robots That Learn on Mission

NEW IDEAS:

• Learning with a DataBase Associative Memory

• Sensory EgoSphere

• Attentional Network

• Robust System Status Evaluation

IMPACT:

• Mission-level interaction between the robot and a Human Commander.

• Enable automatic acquisition of skills and strategies.

• Simplify robot training via intuitive interfaces - program by example.

SCHEDULE (Year 1, Year 2):

• IMA agents and schema

• Learning algorithms

• Test Demo

• Final Demo

• Demo III

GRAPHIC:

[Figure: IMA agent diagram with nodes COMM, LEARNING, CMDR, SQUAD 1, SQUAD 2, ..., SQUAD N, SELF, and ENVIR]

Project Goal

Develop a software control system for autonomous mobile robots that can:

1. accept mission-level plans from a human commander,

2. learn from experience to modify existing behaviors or to add new behaviors, and

3. share that knowledge with other robots.

Project Approach

• Use the Intelligent Machine Architecture (IMA) to map the problem to a set of agents.

• Develop System Status Evaluation (SSE) for self-diagnosis and to assess task outcomes for learning.

• Develop learning algorithms that use and adapt prior knowledge and behaviors and acquire new ones.

• Develop Sensory EgoSphere, behavior and task descriptions, and memory association algorithms that enable learning on mission.

MARS Project: The Robots

• ISAC

• HelpMate

• ATRV-Jr.

The IMA Software Agent Structure of a Single Robot

[Figure: IMA agents of a single robot: Communications Agent, Act./Learning Agent, Commander Agent, Squad Agents 1 through n, Self Agent, and Environment Agent]

Robust System Status Analysis

• Timing information from communication between components and agents will be used.

• Timing patterns will be modeled.

• Deviations from normal indicate “discomfort.”

• Discomfort measures will be combined to provide system status information.
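The bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how timing patterns are modeled or how discomfort is combined, so the mean/standard-deviation model, the `z / (1 + z)` score, the weighted average, and all names (`TimingModel`, `system_status`) and sample delays are hypothetical.

```python
from statistics import mean, stdev


class TimingModel:
    """Assumed model: normal update delay summarized by mean and standard deviation."""

    def __init__(self, baseline_delays_ms):
        self.mu = mean(baseline_delays_ms)
        self.sigma = stdev(baseline_delays_ms)

    def discomfort(self, delay_ms):
        """Score in [0, 1]: 0 at the normal delay, approaching 1 for large deviations."""
        if self.sigma == 0:
            return 0.0 if delay_ms == self.mu else 1.0
        z = abs(delay_ms - self.mu) / self.sigma
        return z / (1.0 + z)


def system_status(discomforts, weights=None):
    """Combine per-component discomfort scores into one weighted average."""
    if weights is None:
        weights = {name: 1.0 for name in discomforts}
    total = sum(weights[name] for name in discomforts)
    return sum(weights[name] * score for name, score in discomforts.items()) / total


# Invented baseline delays (ms) for two components.
arm = TimingModel([100, 110, 90, 105, 95])
hand = TimingModel([50, 52, 48, 51, 49])

# The arm updates on time; the hand update arrives far later than usual.
scores = {"arm": arm.discomfort(100), "hand": hand.discomfort(80)}
status = system_status(scores)
```

A squashed z-score keeps every component on the same 0 to 1 scale, so components with very different normal delays can be averaged without one dominating.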

What Do We Measure?

• Visual Servoing Component: error vs. time

• Arm Agent: error vs. time, proximity to unstable points

• Camera Head Agent: 3D gaze point vs. time

• Tracking Agent: target location vs. time

• Vector Signals/Motion Links: log when data is updated

Update Delay Histograms

[Figures: three update delay histograms for the Arm Agent and one for the Hand Agent; x-axis: delay (units of 10 ms); y-axis: frequency]
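An update delay histogram of this kind could be built from logged update times roughly as follows. This is a minimal sketch: the log format, the function names, and the sample timestamps are assumptions, and only the 10 ms bin width comes from the axis label.

```python
from collections import Counter


def update_delays(timestamps_ms):
    """Delays between consecutive logged updates."""
    return [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def delay_histogram(delays_ms, bin_width_ms=10):
    """Count delays per bin; bin k covers [k * bin_width_ms, (k + 1) * bin_width_ms)."""
    return Counter(int(delay // bin_width_ms) for delay in delays_ms)


# Invented log of update times in milliseconds.
log = [0, 100, 205, 300, 398, 560]
hist = delay_histogram(update_delays(log))
```

Here the delays are 100, 105, 95, 98, and 162 ms, so bins 9 and 10 each hold two samples and bin 16 holds the one late update.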

Commander Interface

Obstacle Avoidance

Planning/Learning Objectives

• Integrated Learning and Planning

– learn skills, strategies and world dynamics

– handle large state spaces

– transfer learned knowledge to new tasks

– exploit a priori knowledge

• Combine Deliberative and Reactive Planning

– exploit predictive models and a priori knowledge

– adapt given actual experiences

– make cost-utility trade-offs

Overview of Approach

Example: Different Terrains

Generate Abstract Map

• Nodes selected based on learned action models

• Each node represents a navigation skill

Generate Plan in Abstract Network

• Plan makes cost-utility trade-offs

• Plans updated during execution
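One way to picture planning in an abstract network is a shortest-path search over a small graph whose edge costs stand in for learned action models. The sketch below is an assumption throughout: it uses Dijkstra's algorithm rather than the project's planner, and the node names, costs, utilities, and the `utility - cost_weight * cost` trade-off are invented for illustration.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; graph maps node -> {neighbor: cost}."""
    frontier = [(0.0, start, [start])]
    settled = {}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue
        settled[node] = cost
        for neighbor, step_cost in graph.get(node, {}).items():
            heapq.heappush(frontier, (cost + step_cost, neighbor, path + [neighbor]))
    return float("inf"), []


def choose_goal(graph, start, utilities, cost_weight=1.0):
    """Pick the goal with the best net value (utility minus weighted path cost)."""
    scored = []
    for goal, utility in utilities.items():
        cost, path = shortest_path(graph, start, goal)
        scored.append((utility - cost_weight * cost, goal, path))
    return max(scored)


# Invented abstract map: A-B-D follows a road, A-C-D crosses mud.
graph = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"D": 3.0},
    "C": {"D": 8.0},
    "D": {},
}
cost, plan = shortest_path(graph, "A", "D")

# Cost-utility trade-off: D is worth more, but B is cheaper to reach.
best = choose_goal(graph, "A", {"B": 8.0, "D": 10.0})

# Plan update during execution: the mud segment turns out to be easy, so replan.
graph["C"]["D"] = 1.0
new_cost, new_plan = shortest_path(graph, "A", "D")
```

Replanning is just a second search over the same graph after an edge cost changes, which is why keeping the abstract map small matters.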

Planning/Learning Status

• Action Model Learning

– adapted MissionLab to allow experimentation (terrain conditions)

– using regression trees to build action models

• Plan Generation

– developed prototype Spreading Activation Network (SAN)

– using it to evaluate the potential of SANs for plan generation
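To make the regression-tree idea concrete, here is a small pure-Python tree that predicts traversal speed from terrain features. It is a sketch under assumptions: the features (roughness, slope), the speed values, and the function names are invented, the splitting rule is ordinary squared-error minimization, and nothing here uses or mirrors MissionLab.

```python
def sse(values):
    """Sum of squared errors around the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, min_leaf=2, depth=0, max_depth=3):
    """rows: list of feature tuples; targets: list of floats (one per row)."""
    leaf = {"value": sum(targets) / len(targets)}
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return leaf
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None or best[0] >= sse(targets):
        return leaf  # no split reduces the error
    _, f, threshold = best
    lo = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    hi = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in lo], [t for _, t in lo],
                           min_leaf, depth + 1, max_depth),
        "right": build_tree([r for r, _ in hi], [t for _, t in hi],
                            min_leaf, depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its mean target value."""
    while "value" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]


# Invented experience: (roughness, slope in degrees) -> observed speed in m/s.
rows = [(0.1, 0), (0.2, 2), (0.1, 5), (0.8, 1), (0.9, 3), (0.7, 0)]
speeds = [1.0, 0.9, 1.1, 0.3, 0.2, 0.4]
model = build_tree(rows, speeds)
smooth_speed = predict(model, (0.15, 1))
rough_speed = predict(model, (0.85, 2))
```

With this data the tree splits once, on roughness, and predicts about 1.0 m/s on smooth ground and about 0.3 m/s on rough ground. Such predictions could serve as edge costs in an abstract map.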

Role of ISAC in MARS

• Inspired by the structure of vertebrate brains

• A fundamental human-robot interaction model

• Sensory attention and memory association

• Learning sensory-motor coordination (SMC) patterns

• Learning the attributes of objects through SMC

ISAC is a testbed for learning complex, autonomous behaviors by a robot under human tutelage.

System Architecture

[Figure: system architecture diagram with labels Human, Robot, Human Agent, Self Agent, Software System, IMA Primitive Agent (A), and Hardware I/O]

Next Up: Peer Agent

We are currently developing the peer agent.

The peer agent encapsulates the robot’s understanding of and interaction with other (peer) robots.

System Architecture: High Level Agents

[Figure: high-level agents: human agent, self agent, peer agents, environment agent, and object agents]

Due to the flat connectivity of IMA primitives, all high-level agents can communicate directly if desired.
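Flat connectivity can be illustrated with a shared registry in which any agent addresses any other by name, with no router or hierarchy in between. This is a toy sketch, not IMA's actual mechanism; the `Agent` class, the registry, and the messages are hypothetical.

```python
class Agent:
    """Toy agent that can message any other registered agent directly."""

    def __init__(self, name, registry):
        self.name = name
        self.inbox = []
        self.registry = registry
        registry[name] = self

    def send(self, recipient, message):
        """Deliver straight to the named agent's inbox."""
        self.registry[recipient].inbox.append((self.name, message))


registry = {}
human = Agent("human", registry)
self_agent = Agent("self", registry)
peer = Agent("peer", registry)

human.send("self", "start task")
peer.send("human", "status ok")
```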

Robot Learning Procedure

• The human programs a task by sequencing component behaviors via speech and gesture commands.

• The robot records the behavior sequence as a finite state machine (FSM) and all sensory-motor time-series (SMTS).

• Repeated trials are run. The human provides reinforcement feedback.

• The robot uses Hebbian learning to find correlations in the SMTS and to delete spurious info.
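The Hebbian step can be sketched as follows. The slides do not give the update rule, so this assumes a plain covariance-style Hebbian update on mean-centered signals followed by thresholding; the function names, learning rate, threshold, and sample time-series are all invented.

```python
def hebbian_weights(sensor_series, motor_series, rate=0.1):
    """Accumulate w[i][j] += rate * s_i(t) * m_j(t) over mean-centered samples."""

    def centered(series):
        n = len(series)
        means = [sum(column) / n for column in zip(*series)]
        return [[v - m for v, m in zip(row, means)] for row in series]

    s, m = centered(sensor_series), centered(motor_series)
    w = [[0.0] * len(m[0]) for _ in s[0]]
    for s_t, m_t in zip(s, m):
        for i, s_i in enumerate(s_t):
            for j, m_j in enumerate(m_t):
                w[i][j] += rate * s_i * m_j
    return w


def prune(w, threshold):
    """Keep only (sensor, motor) pairs whose weight magnitude reaches the threshold."""
    return {(i, j): v
            for i, row in enumerate(w)
            for j, v in enumerate(row)
            if abs(v) >= threshold}


# Invented SMTS: sensor 0 moves with the motor signal, sensor 1 does not.
sensors = [[0, 1], [1, 1], [0, 0], [1, 0]]
motors = [[0], [1], [0], [1]]
weights = hebbian_weights(sensors, motors)
kept = prune(weights, threshold=0.05)
```

Only the pair (sensor 0, motor 0) survives pruning; the uncorrelated sensor 1 accumulates zero weight and is dropped as spurious.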

Robot Learning (cont’d)

• The robot extracts task-dependent SMC info from the behavior sequence and the Hebbian-thinned data.

• SMC occurs by associating sensory-motor events with behavior nodes in the FSMs.

• The FSM is transformed into a spreading activation network (SAN).

• The SAN becomes a task record in the database associative memory (DBAM) and is subject to further refinements.
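A much simplified picture of the FSM-to-SAN step: each FSM state becomes a node, each transition becomes a link, and activation injected at one node spreads along the links with decay. This is an illustration only. The transformation used in the project is not described in the slides, and the behavior names, events, decay factor, and function names are assumptions.

```python
def fsm_to_san(transitions):
    """transitions: list of (state, event, next_state) -> node -> [(neighbor, event)]."""
    san = {}
    for src, event, dst in transitions:
        san.setdefault(src, []).append((dst, event))
        san.setdefault(dst, [])
    return san


def spread(san, source, steps=3, decay=0.5):
    """Inject activation 1.0 at source, then pass a decayed share along links each step."""
    activation = {node: 0.0 for node in san}
    activation[source] = 1.0
    frontier = {source: 1.0}
    for _ in range(steps):
        incoming = {}
        for node, energy in frontier.items():
            links = san[node]
            for neighbor, _event in links:
                share = decay * energy / len(links)
                incoming[neighbor] = incoming.get(neighbor, 0.0) + share
        for node, energy in incoming.items():
            activation[node] += energy
        frontier = incoming
    return activation


# Invented pick-up task recorded as an FSM.
fsm = [
    ("reach", "object_seen", "grasp"),
    ("grasp", "contact", "lift"),
    ("grasp", "slip", "reach"),
    ("lift", "lifted", "done"),
]
san = fsm_to_san(fsm)
activation = spread(san, "reach")
```

After three steps, `grasp` holds the most spread activation (0.5625), followed by `lift` (0.125) and `done` (0.0625), which reflects how many links each behavior is from the starting node.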

Human Agent: Human Detection

Human Agent: Recognition

Human Agent: Face Tracking

Schedule

YEAR ONE (months 1-12)

• Requirement Analysis/Concept Development

• IMA (A/C) Deployment for HelpMate

• IMA (A/C) Deployment for ATRV-Jr.

• Robust System Status Analysis

• Reinforcement Learning

• Develop EgoSphere and DBAM

• Demo Scenario: Simple human-robot (HR) interaction
