Transcript
Page 1: Adaptive Intelligent Mobile Robotics

Leslie Pack Kaelbling

Artificial Intelligence Laboratory

MIT

Page 2: Pyramid

• Addressing problem at multiple levels (pyramid):
  • Planning
  • Built-in Behaviors
  • Learning

Page 3: Built-in Behaviors

Goal: general-purpose, robust visually guided local navigation

• optical flow for depth information
• finding the floor
  • optical flow information
  • Horswill's ground-plane method
• build local occupancy grids
• navigate given the grid
  • reactive methods
  • dynamic programming (a minimal sketch follows)
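To make the grid-based step concrete, here is a minimal sketch (not from the slides) of dynamic programming over a local occupancy grid: value iteration assigns every free cell its steps-to-goal, and the robot repeatedly moves to the neighboring cell with the lowest value. The grid encoding, unit step cost, and 4-connected neighborhood are illustrative assumptions.

```python
import numpy as np

def steps_to_goal(occupancy, goal):
    """Value iteration over a local occupancy grid.

    occupancy: 2D bool array, True where a cell is blocked.
    goal: (row, col) index of the goal cell.
    Returns an array of steps to the goal (inf if unreachable).
    """
    rows, cols = occupancy.shape
    value = np.full((rows, cols), np.inf)
    value[goal] = 0.0
    for _ in range(rows * cols):          # enough sweeps to converge
        updated = value.copy()
        for r in range(rows):
            for c in range(cols):
                if occupancy[r, c] or (r, c) == goal:
                    continue
                neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                best = min(value[n] for n in neighbors
                           if 0 <= n[0] < rows and 0 <= n[1] < cols)
                updated[r, c] = 1.0 + best
        if np.array_equal(updated, value):
            break
        value = updated
    return value

def next_cell(value, pos):
    """Greedy step: move to the in-bounds neighbor with the lowest value."""
    r, c = pos
    neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    in_bounds = [n for n in neighbors
                 if 0 <= n[0] < value.shape[0] and 0 <= n[1] < value.shape[1]]
    return min(in_bounds, key=lambda n: value[n])
```

A reactive method would instead map the current grid directly to a steering command without computing the full value table.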

Page 4: Reactive Obstacle Avoidance

Standard method in mobile robotics is to use potential fields

• attractive force toward goal
• repulsive forces away from obstacles
• robot moves in direction given by resultant force (a minimal sketch follows)
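A minimal sketch of the potential-field rule just listed; the gain constants and the repulsion falloff are illustrative assumptions, not values from this work.

```python
import numpy as np

def potential_field_heading(robot, goal, obstacles,
                            k_att=1.0, k_rep=0.5, influence=2.0):
    """Resultant of an attractive force toward the goal and repulsive
    forces away from obstacles within an influence radius.

    robot, goal: 2D positions; obstacles: iterable of 2D positions.
    Returns a unit vector giving the commanded direction of motion.
    """
    robot = np.asarray(robot, dtype=float)
    goal = np.asarray(goal, dtype=float)
    force = k_att * (goal - robot)                    # attraction to goal
    for obs in obstacles:
        offset = robot - np.asarray(obs, dtype=float)
        dist = np.linalg.norm(offset)
        if 0.0 < dist < influence:
            # Repulsion grows sharply as the obstacle gets closer.
            force += k_rep * (1.0 / dist - 1.0 / influence) * offset / dist**2
    norm = np.linalg.norm(force)
    return force / norm if norm > 0.0 else force
```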

New method for non-holonomic robots: control a point at the front of the robot (rather than its center), which can be treated as holonomic

Page 5: Human Obstacle Avoidance

Control law based on visual angle and distance to goal and obstacles

Parameters set based on experiments with humans in large free-walking VR environment
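A control law of this general form appears in published models of human steering dynamics (e.g. Fajen and Warren); the sketch below is illustrative, and the constants are placeholders rather than the experimentally fitted parameters.

```python
import math

def heading_acceleration(phi, phi_dot, goal, obstacles,
                         b=3.0, k_g=7.0, c1=0.4, c2=0.4,
                         k_o=200.0, c3=6.0, c4=1.0):
    """Second-order steering dynamics: the goal attracts the heading in
    proportion to the heading error (attraction weakens with goal
    distance), and each obstacle repels it (repulsion weakens with both
    angular offset and distance).

    phi, phi_dot: current heading and heading rate (rad, rad/s).
    goal: (psi_g, d_g), direction and distance of the goal.
    obstacles: iterable of (psi_o, d_o) pairs.
    Returns the commanded angular acceleration of the heading.
    """
    psi_g, d_g = goal
    acc = -b * phi_dot                                        # damping
    acc -= k_g * (phi - psi_g) * (math.exp(-c1 * d_g) + c2)   # goal attraction
    for psi_o, d_o in obstacles:                              # obstacle repulsion
        acc += (k_o * (phi - psi_o)
                * math.exp(-c3 * abs(phi - psi_o))
                * math.exp(-c4 * d_o))
    return acc
```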

Page 6: Humans are Smooth!

Page 7: Behavior Learning

Typical RL methods require far too much data to be practical in an online setting. We address the problem with:

• strong generalization techniques
  • locally weighted regression (sketched below)
  • “skeptical” Q-Learning

• bootstrapping from human-supplied policy
  • need not be optimal and might be very wrong
  • shows learner “interesting” parts of the space
  • “bad” initial policies might be more effective
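A minimal sketch of locally weighted regression as a Q-value function approximator, assuming stored (input, target) pairs where the input is a state-action vector; the Gaussian kernel, bandwidth, and ridge term are illustrative choices, not necessarily those used in JAQL.

```python
import numpy as np

def lwr_predict(query, inputs, targets, bandwidth=0.5, ridge=1e-6):
    """Locally weighted linear regression.

    query: 1D array, the point to evaluate (e.g. a state-action vector).
    inputs: (n, d) array of stored inputs.
    targets: (n,) array of stored targets (e.g. Q-value estimates).
    Returns the prediction of a weighted linear fit centered on `query`.
    """
    query = np.asarray(query, dtype=float)
    X = np.asarray(inputs, dtype=float)
    y = np.asarray(targets, dtype=float)
    # Gaussian kernel weights: nearby samples dominate the local fit.
    w = np.exp(-np.sum((X - query) ** 2, axis=1) / (2.0 * bandwidth ** 2))
    A = np.hstack([X, np.ones((X.shape[0], 1))])   # linear model + intercept
    AtW = A.T * w                                  # weight each sample
    beta = np.linalg.solve(AtW @ A + ridge * np.eye(A.shape[1]), AtW @ y)
    return float(np.append(query, 1.0) @ beta)
```

The “skeptical” element corresponds roughly to refusing to trust predictions at query points poorly supported by the stored data (for example, when the total kernel weight is very small) and deferring to the supplied policy there; that check is omitted from this sketch.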

Page 8: Two Learning Phases

[Figure: Phase One. Block diagram of the Learning System, the Supplied Control Policy, and the Environment, exchanging action (A), reward (R), and observation (O) signals.]

Page 9: Two Learning Phases

[Figure: Phase Two. The same block diagram of the Learning System, the Supplied Control Policy, and the Environment with action (A), reward (R), and observation (O) signals.]

Page 10: New Results

Drive to goal, avoiding obstacles in visual field

Inputs (6 dimensions):
• heading and distance to goal
• image coordinates of two obstacles

Output:
• steering angle

Reward:
• +10 for getting to goal; -5 for running over obstacle

Training: simple policy that avoids one obstacle
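A small sketch of how this task description might be encoded for the learner; only the six input dimensions and the +10/-5 reward values come from the slide, while the names and function signatures are illustrative assumptions.

```python
import numpy as np

GOAL_REWARD = 10.0       # +10 for getting to the goal (from the slide)
OBSTACLE_PENALTY = -5.0  # -5 for running over an obstacle (from the slide)

def make_input(goal_heading, goal_distance, obs1_uv, obs2_uv):
    """6-D input: heading and distance to the goal plus the image
    coordinates (u, v) of two obstacles."""
    return np.array([goal_heading, goal_distance,
                     obs1_uv[0], obs1_uv[1],
                     obs2_uv[0], obs2_uv[1]], dtype=float)

def reward(reached_goal, hit_obstacle):
    """Sparse reward; the learner's single output is a steering angle."""
    return (GOAL_REWARD if reached_goal else 0.0) + \
           (OBSTACLE_PENALTY if hit_obstacle else 0.0)
```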

Page 11: Robot’s View

Page 12: Local Navigation

[Chart: average steps to goal (0 to 250) vs. training runs (-25 to 15), comparing JAQL, the Optimal policy, and the Trainer, with Phase 1 and Phase 2 marked.]

Page 13: Map Learning

Robot learns high-level structure of environment:
• topological maps appropriate for large-scale structure (sketched below)
• low-level behaviors induce topology
• based on previous work using sonar
• vision changes problem dramatically
  • no more problems with many states looking the same
  • now same state always looks different!
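A minimal sketch of the kind of topological map this implies: places as nodes annotated with visual signatures, and low-level behaviors as the edges connecting them. The class and method names are illustrative assumptions, not the project's actual representation.

```python
from collections import defaultdict, deque

class TopologicalMap:
    """Places as nodes, low-level behaviors as edges between them."""

    def __init__(self):
        self.signature = {}              # place -> stored visual signature
        self.edges = defaultdict(dict)   # place -> {behavior: next place}

    def add_place(self, place, signature):
        self.signature[place] = signature

    def add_transition(self, place, behavior, next_place):
        """Running `behavior` (e.g. "follow-corridor") from `place`
        reliably ends at `next_place`."""
        self.edges[place][behavior] = next_place

    def plan(self, start, goal):
        """Breadth-first search for a behavior sequence from start to goal."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            place, path = frontier.popleft()
            if place == goal:
                return path
            for behavior, nxt in self.edges[place].items():
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [behavior]))
        return None
```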

Page 14: Sonar-Based Map Learning

[Figure: sonar-based map learning results, comparing the Data with the True Model.]

Page 15: Current Issues in Map Learning

• segmenting space into “rooms”
• detecting doors and corridor openings
• representation of places
  • stored images
  • gross 3D structure
  • features for image and structure matching

Page 16: Large Simulation Domain

Use for learning and large-scale experimentation that is impractical on a real robot

• built using video-game engine
• large multi-story building
• packages to deliver
• battery power management
• other agents (to survey)
• dynamically appearing items to collect
• general Bayes-net specification so it can be used widely as a test bed

Page 17: Hierarchical MDP Planning

The large simulated domain has unspeakably many primitive states. Use a hierarchical representation for planning:
• logarithmic improvement in planning times
• some loss of optimality of plans

Existing work on planning and learning given a hierarchy:
• temporal abstraction: macro actions
• spatial abstraction: aggregated states

Where does the hierarchy come from?
• combined spatial and temporal abstraction
• top-down splitting approach

Page 18: Region-Based Hierarchies

Divide the state space into regions (sketched below):
• each region is a single abstract state at the next level
• policies for moving through regions are abstract actions at the next level
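A minimal sketch of the region abstraction, assuming the partition into regions is already given: crossing a region boundary defines an abstract action at the next level. The data structures are illustrative assumptions.

```python
from collections import defaultdict

def build_abstract_level(region_of, transitions):
    """Lift low-level transitions to the region level.

    region_of: dict low-level state -> region id.
    transitions: iterable of (state, action, next_state) triples.
    Returns dict region -> set of neighboring regions; each neighbor
    corresponds to one abstract action ("traverse to that region"),
    to be implemented by a policy computed within the region.
    """
    neighbors = defaultdict(set)
    for s, _, s_next in transitions:
        r, r_next = region_of[s], region_of[s_next]
        if r != r_next:                 # boundary crossing = abstract action
            neighbors[r].add(r_next)
    return neighbors
```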

Page 19: Choosing Macros

Given a choice of a region, what is a good set of macro actions for traversing it?

• existing approaches guarantee optimality with a number of macros exponential in the number of exit states

• our method is approximate, but works well when there are no large rewards inside the region

Page 20: Point-Source Rewards

• Compute a value function for each possible exit state, offline
• Online, given a new valuation of all exit states, quickly combine the value functions to determine a near-optimal action (a minimal sketch follows)
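A minimal sketch of one plausible reading of this scheme: each cached value function V_e is computed offline with a unit reward at exit e, and online the new exit valuations scale and combine them (here by taking the per-state maximum) to get an approximate value function for greedy action selection. The actual combination rule used in this work may differ.

```python
import numpy as np

def combined_value(exit_rewards, cached_value_functions):
    """Combine per-exit value functions under new exit valuations.

    exit_rewards: dict exit id -> reward currently available at that exit.
    cached_value_functions: dict exit id -> array V_e over the region's
        states, computed offline with a unit reward at exit e.
    Returns an approximate value function over the region's states.
    """
    scaled = [r * cached_value_functions[e] for e, r in exit_rewards.items()]
    return np.max(np.stack(scaled), axis=0)   # best exit at each state

def greedy_action(state, actions, successor, value):
    """Choose the action whose successor has the highest combined value."""
    return max(actions, key=lambda a: value[successor(state, a)])
```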

Page 21: Approximation is Good

[Chart: value (0 to 700) vs. distance between point sources (0 to 1000), comparing the Optimal solution with the Point Source approximation.]

Page 22: How to Use the Hierarchy

Off-line:
• Decompose the environment into abstract states
• Compute macro operators

On-line:
• Given a new goal, assign values to exits at the highest level
• Propagate values at each level
• In the current low-level region, choose an action (a sketch follows)
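A two-level sketch of the on-line steps just listed, under simplifying assumptions (cached per-exit value functions at both levels, deterministic transitions, dictionary-based data structures); all names here are illustrative.

```python
def choose_action(state, current_region, goal_exit_values,
                  abstract_vfs, exit_destination, low_level_vfs,
                  actions, successor):
    """Two-level version of the on-line procedure above.

    goal_exit_values: dict top-level exit -> value under the new goal.
    abstract_vfs: dict top-level exit -> dict region -> cached value
        (computed offline with a unit reward at that exit).
    exit_destination: dict region -> dict exit state -> neighboring
        region reached through that exit.
    low_level_vfs: dict region -> dict exit state -> dict low-level
        state -> cached value (offline, unit reward at that exit).
    actions, successor: low-level action set and transition function.
    """
    # 1. Assign values to regions by combining the cached abstract
    #    value functions under the new goal's exit values.
    region_value = {}
    for exit_id, reward in goal_exit_values.items():
        for region, v in abstract_vfs[exit_id].items():
            region_value[region] = max(region_value.get(region, float("-inf")),
                                       reward * v)
    # 2. Propagate one level down: each exit of the current region is
    #    worth the value of the region it leads into.
    exit_value = {e: region_value[dest]
                  for e, dest in exit_destination[current_region].items()}
    # 3. In the current low-level region, combine cached value
    #    functions and act greedily.
    def value(s):
        return max(exit_value[e] * low_level_vfs[current_region][e][s]
                   for e in exit_value)
    return max(actions, key=lambda a: value(successor(state, a)))
```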

Page 23: What Makes a Decomposition Good?

Trade off:
• decrease in off-line planning time
• decrease in on-line planning time
• decrease in value of actions

We can articulate this criterion formally but…

… we can’t solve it

Current research on reasonable approximations

Page 24: Next Steps

Low-level:
• apply JAQL to tune obstacle avoidance behaviors

Map learning:
• landmark selection and representation
• visual detection of openings

Hierarchy:
• algorithm for constructing decomposition
• test hierarchical planning on huge simulated domain

