DARPA ITO/MARS Project Update
Vanderbilt University
A Software Architecture and Tools for Autonomous Robots that Learn on Mission
K. Kawamura, M. Wilkes, R. A. Peters II, D. Gaines
Vanderbilt University Center for Intelligent Systems
http://shogun.vuse.vanderbilt.edu/CIS/IRL/
12 January 2000
Vanderbilt MARS Team
• Kaz Kawamura, Professor of Electrical & Computer Engineering. MARS responsibility - PI, Integration
• Dan Gaines, Asst. Professor of Computer Science. MARS responsibility - Reinforcement Learning
• Alan Peters, Assoc. Professor of Electrical Engineering. MARS responsibility - DataBase Associative Memory, Sensory EgoSphere
• Mitch Wilkes, Assoc. Professor of Electrical Engineering. MARS responsibility - System Status Evaluation
• Jim Baumann, Nichols Research. MARS responsibility - Technical Consultant
Sponsoring Agency: Army Strategic Defense Command
A Software Architecture and Tools for Autonomous Mobile Robots That Learn on Mission
NEW IDEAS:
• Learning with a DataBase Associative Memory
• Sensory EgoSphere
• Attentional Network
• Robust System Status Evaluation
IMPACT:
• Mission-level interaction between the robot and a Human Commander.
• Enable automatic acquisition of skills and strategies.
• Simplify robot training via intuitive interfaces - program by example.
SCHEDULE (Year 1, Year 2):
• IMA agents and schema
• Learning algorithms
• Test Demo
• Final Demo
• Demo III
GRAPHIC: [Figure: IMA agent diagram. Labels: COMM, LEARNING, CMDR, SQUAD 1, SQUAD 2, ..., SQUAD N, SELF, ENVIR]
Project Goal
Develop a software control system for autonomous mobile robots that can:
1. accept mission-level plans from a human commander,
2. learn from experience to modify existing behaviors or to add new behaviors, and
3. share that knowledge with other robots.
Project Approach
• Use IMA to map the problem to a set of agents.
• Develop System Status Evaluation (SSE) for self-diagnosis and to assess task outcomes for learning.
• Develop learning algorithms that use and adapt prior knowledge and behaviors and acquire new ones.
• Develop Sensory EgoSphere, behavior and task descriptions, and memory association algorithms that enable learning on mission.
MARS Project: The Robots
• ISAC
• HelpMate
• ATRV-Jr.
The IMA Software Agent Structure of a Single Robot
[Figure: IMA agent diagram. Labels: Communications Agent, Act./Learning Agent, Commander Agent, Squad Agent 1, Squad Agent 2, ..., Squad Agent n, Self Agent, Environment Agent]
Robust System Status Analysis
• Timing information from communication between components and agents will be used.
• Timing patterns will be modeled.
• Deviations from normal indicate “discomfort.”
• Discomfort measures will be combined to provide system status information.
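The bullets above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's SSE implementation: the timing pattern is modeled as a mean and standard deviation, discomfort is the deviation in standard deviations, and the combination is a weighted average. The function names, the agent names, and the delay values are all hypothetical.

```python
from statistics import mean, stdev

def fit_timing_model(delays_ms):
    """Model a normal timing pattern as (mean, standard deviation)."""
    return mean(delays_ms), stdev(delays_ms)

def discomfort(delay_ms, model):
    """Deviation of one observed delay from normal, in standard deviations."""
    mu, sigma = model
    return abs(delay_ms - mu) / sigma if sigma > 0 else 0.0

def system_status(discomforts, weights=None):
    """Combine per-agent discomfort measures into one weighted average."""
    if weights is None:
        weights = {name: 1.0 for name in discomforts}
    total = sum(weights[name] for name in discomforts)
    return sum(weights[name] * d for name, d in discomforts.items()) / total

# Hypothetical update delays (ms) logged while the system runs normally.
normal_arm = [100, 110, 90, 105, 95, 100, 102, 98]
normal_hand = [200, 210, 190, 205, 195, 200, 198, 202]
arm_model = fit_timing_model(normal_arm)
hand_model = fit_timing_model(normal_hand)

# The arm updates on time; the hand update arrives far later than usual.
current = {
    "arm": discomfort(100, arm_model),
    "hand": discomfort(400, hand_model),
}
status = system_status(current)
```

A histogram or other density model of the delays could replace the mean and standard deviation without changing the overall structure.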
What Do We Measure?
• Visual Servoing Component: error vs. time
• Arm Agent: error vs. time, proximity to unstable points
• Camera Head Agent: 3D gaze point vs. time
• Tracking Agent: target location vs. time
• Vector Signals/Motion Links: log when data is updated
[Figure: Update Delay Histogram (Arm Agent), three panels. x-axis: Delay (10 ms); y-axis: Frequency]
[Figure: Update Delay Histogram (Hand Agent). x-axis: Delay (10 ms); y-axis: Frequency]
Commander Interface
Obstacle Avoidance
Planning/Learning Objectives
• Integrated Learning and Planning
– learn skills, strategies and world dynamics
– handle large state spaces
– transfer learned knowledge to new tasks
– exploit a priori knowledge
• Combine Deliberative and Reactive Planning
– exploit predictive models and a priori knowledge
– adapt given actual experiences
– make cost-utility trade-offs
Overview of Approach
Example: Different Terrains
Generate Abstract Map
• Nodes selected based on learned action models
• Each node represents a navigation skill
Generate Plan in Abstract Network
• Plan makes cost-utility trade-offs
• Plans updated during execution
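As a rough illustration of planning in an abstract network, the sketch below runs Dijkstra's algorithm over a small graph and picks the goal with the best utility minus predicted cost. It is an assumption-laden example, not the project's planner: the node names, edge costs, utilities, and the `best_plan` helper are hypothetical, and the edge costs stand in for predictions from learned action models.

```python
import heapq

def cheapest_paths(graph, start):
    """Dijkstra: lowest predicted cost from start to every reachable node."""
    cost = {start: 0.0}
    parent = {start: None}
    frontier = [(0.0, start)]
    while frontier:
        c, node = heapq.heappop(frontier)
        if c > cost[node]:
            continue  # stale queue entry
        for nxt, edge_cost in graph.get(node, {}).items():
            new_cost = c + edge_cost
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(frontier, (new_cost, nxt))
    return cost, parent

def best_plan(graph, start, goal_utility):
    """Pick the reachable goal that maximizes utility minus path cost."""
    cost, parent = cheapest_paths(graph, start)
    reachable = [g for g in goal_utility if g in cost]
    goal = max(reachable, key=lambda g: goal_utility[g] - cost[g])
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return goal, path[::-1], goal_utility[goal] - cost[goal]

# Hypothetical abstract map: each edge cost is a predicted traversal cost.
graph = {
    "start": {"road": 2.0, "mud": 1.0},
    "road": {"site_A": 2.0, "site_B": 6.0},
    "mud": {"site_A": 8.0},
}
goal_utility = {"site_A": 10.0, "site_B": 12.0}
goal, path, net = best_plan(graph, "start", goal_utility)

# During execution the road to site_A turns out to be much worse: replan.
graph["road"]["site_A"] = 9.0
new_goal, new_path, new_net = best_plan(graph, "start", goal_utility)
```

With the initial costs the plan heads to site_A by road; after the cost update, the trade-off shifts and the plan switches to site_B.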
Planning/Learning Status
• Action Model Learning
– adapted MissionLab to allow experimentation (terrain conditions)
– using regression trees to build action models
• Plan Generation
– developed prototype Spreading Activation Network
– using it to evaluate the potential of SAN for plan generation
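To make the regression-tree idea concrete, here is a minimal sketch of the smallest possible regression tree: a single split chosen to minimize squared error. It is not the project's code. The feature (terrain slope in degrees), the target (achieved speed), and the sample values are hypothetical, and a real action model would use deeper trees and more features.

```python
def sse(values):
    """Sum of squared errors around the mean of a list of numbers."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(xs, ys):
    """Fit a one-split regression tree.

    Returns (threshold, left_mean, right_mean) minimizing squared error.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (err, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def predict(stump, x):
    """Predict the target for feature value x."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical experience: terrain slope (degrees) vs. achieved speed (m/s).
slopes = [0, 5, 10, 20, 25, 30]
speeds = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]
model = fit_stump(slopes, speeds)
```

Predictions from such a model could supply the cost of an action for a given terrain condition.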
Role of ISAC in MARS
• Inspired by the structure of vertebrate brains
• a fundamental human-robot interaction model
• sensory attention and memory association
• learning sensory-motor coordination (SMC) patterns
• learning the attributes of objects through SMC
ISAC is a testbed for learning complex, autonomous behaviors by a robot under human tutelage.
System Architecture
[Figure: system architecture diagram. Labels: Human, Human Agent, Robot, Self Agent, Software System, IMA Primitive Agent, Hardware I/O]
Next Up: Peer Agent
We are currently developing the peer agent.
The peer agent encapsulates the robot’s understanding of and interaction with other (peer) robots.
System Architecture: High Level Agents
[Figure: high-level agents. Labels: human agent, self agent, peer agent (x2), environment agent, object agent (x2)]
Due to the flat connectivity of IMA primitives, all high level agents can communicate directly if desired.
Robot Learning Procedure
• The human programs a task by sequencing component behaviors via speech and gesture commands.
• The robot records the behavior sequence as a finite state machine (FSM) and all sensory-motor time-series (SMTS).
• Repeated trials are run. The human provides reinforcement feedback.
• The robot uses Hebbian learning to find correlations in the SMTS and to delete spurious info.
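The Hebbian step can be sketched as follows. This is a hedged illustration, not the procedure used on the robot: it applies a covariance-style Hebbian rule (signals are mean-centered first, which is an assumption), and it treats a signal as spurious when none of its weights reaches a threshold. The signal names, sample values, learning rate, and threshold are all hypothetical.

```python
def hebbian_weights(series, rate=0.1):
    """Accumulate a Hebbian weight for every pair of signals.

    series maps a signal name to a list of samples of equal length.
    The weight grows when two mean-centered signals move together.
    """
    names = sorted(series)
    n = len(series[names[0]])
    centered = {}
    for name in names:
        avg = sum(series[name]) / n
        centered[name] = [v - avg for v in series[name]]
    weights = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            w = 0.0
            for t in range(n):
                w += rate * centered[a][t] * centered[b][t]
            weights[(a, b)] = w
    return weights

def thin(series, weights, threshold):
    """Keep only signals that take part in at least one strong association."""
    keep = set()
    for (a, b), w in weights.items():
        if abs(w) >= threshold:
            keep.update((a, b))
    return {name: vals for name, vals in series.items() if name in keep}

# Hypothetical sensory-motor time-series recorded during one trial.
smts = {
    "gripper_force": [0, 0, 1, 1, 0, 0, 1, 1],
    "object_contact": [0, 0, 1, 1, 0, 0, 1, 1],
    "ambient_light": [1, 0, 1, 0, 1, 0, 1, 0],
}
weights = hebbian_weights(smts)
thinned = thin(smts, weights, threshold=0.1)
```

Here the force and contact signals rise and fall together and are kept, while the light signal correlates with neither and is dropped. Human reinforcement feedback could scale `rate` per trial, which this sketch leaves out.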
Robot Learning (cont’d)
• The robot extracts task dependent SMC info from the behavior sequence and the Hebbian-thinned data.
• SMC occurs by associating sensory-motor events with behavior nodes in the FSMs.
• The FSM is transformed into a spreading activation network (SAN).
• The SAN becomes a task record in the DataBase Associative Memory (DBAM) and is subject to further refinements.
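The slides do not say how the FSM is transformed, so the following is only one plausible sketch: each behavior becomes a node, each FSM transition becomes a weighted link, and activation spreads along links while decaying at each step. The behavior names, link weight, decay factor, and function names are hypothetical.

```python
def fsm_to_san(transitions, weight=0.5):
    """Turn FSM transitions (src, dst) into weighted activation links."""
    links = {}
    for src, dst in transitions:
        links.setdefault(src, {})[dst] = weight
        links.setdefault(dst, {})  # make sure every node is present
    return links

def spread(links, activation, steps=1, decay=0.5):
    """Spread activation along links; each node's own activation decays."""
    for _ in range(steps):
        nxt = {node: decay * activation.get(node, 0.0) for node in links}
        for src, outs in links.items():
            for dst, w in outs.items():
                nxt[dst] += w * activation.get(src, 0.0)
        activation = nxt
    return activation

# Hypothetical behavior sequence recorded as an FSM.
fsm = [("reach", "grasp"), ("grasp", "lift"), ("lift", "hand_over")]
san = fsm_to_san(fsm)

# Activate the first behavior and let activation spread for two steps.
act = spread(san, {"reach": 1.0}, steps=2)
```

After two steps the activation has moved down the sequence, with "grasp" the most active node. Refinement of a task record could then amount to adjusting link weights.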
Human Agent: Human Detection
Human Agent: Recognition
Human Agent: Face Tracking
Schedule
YEAR ONE (months 1 to 12)
• Requirement Analysis/Concept Development
• IMA (A/C) Deployment for HelpMate
• IMA (A/C) Deployment for ATRV Jr.
• Robust System Status Analysis
• Reinforcement Learning
• Develop Egosphere and DBAM
• Demo Scenario – Simple HR interaction