CS 188: Artificial Intelligence, Spring 2010
Advanced Applications:
Robotics
Pieter Abbeel – UC Berkeley
A few slides from Sebastian Thrun, Dan Klein
Announcements
• Project 5 due Thursday: Classification!
• Contest!
  • Tournaments every night.
  • Final tournament: we will use submissions received by Thursday, May 6, 11pm.
Estimation: Laplace Smoothing
• Laplace's estimate (extended): pretend you saw every outcome k extra times:
  P_LAP,k(x) = (c(x) + k) / (N + k|X|)
• What's Laplace with k = 0? (The maximum-likelihood estimate.)
• k is the strength of the prior
• Laplace for conditionals: smooth each conditional independently:
  P_LAP,k(x | y) = (c(x, y) + k) / (c(y) + k|X|)
• Example observations: H H T
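The smoothed estimate can be sketched in a few lines (a minimal sketch; the function name and use of `Counter` are illustrative, not from the slides):

```python
from collections import Counter

def laplace_estimate(counts, domain, k=1):
    """Laplace-smoothed estimate: pretend every outcome in the domain
    was seen k extra times.  P_LAP,k(x) = (c(x) + k) / (N + k * |X|)."""
    n = sum(counts.values())
    return {x: (counts.get(x, 0) + k) / (n + k * len(domain)) for x in domain}

# The slide's coin-flip data: H, H, T
counts = Counter(["H", "H", "T"])
print(laplace_estimate(counts, ["H", "T"], k=0))  # k = 0 is the maximum-likelihood estimate
print(laplace_estimate(counts, ["H", "T"], k=1))  # {'H': 0.6, 'T': 0.4}
```

As k grows the estimate approaches uniform, which is why k acts as the strength of the prior.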
So Far: Foundational Methods
Now: Advanced Applications
Robotic Control Tasks
• Perception / Tracking
  • Where exactly am I?
  • What's around me?
• Low-Level Control
  • How to move the robot and/or objects from position A to position B
• High-Level Control
  • What are my goals?
  • What are the optimal high-level actions?
Robot folds towels
• [pile of 5 video]
[Maitin-Shepard, Cusumano-Towner, Lei & Abbeel, 2010]
Low-Level Planning
• Low-level: move from configuration A to configuration B
A Simple Robot Arm
• Configuration Space
  • What are the natural coordinates for specifying the robot's configuration?
  • These are the configuration space coordinates
  • Can't necessarily control all degrees of freedom directly
• Work Space
  • What are the natural coordinates for specifying the effector tip's position?
  • These are the work space coordinates
Coordinate Systems
• Workspace:
  • The world's (x, y) system
  • Obstacles specified here
• Configuration space:
  • The robot's state
  • Planning happens here
  • Obstacles can be projected to here
Obstacles in C-Space
• What / where are the obstacles?
• Remaining space is free space
Two-link manipulator
[figure: two-link arm with link lengths d_1, d_2, joint angles α_1, α_2, and effector tip at (x, y)]
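The mapping from configuration-space coordinates (α_1, α_2) to the work-space coordinates (x, y) in the figure is the arm's forward kinematics; a minimal sketch (function and parameter names are illustrative):

```python
import math

def forward_kinematics(alpha1, alpha2, d1=1.0, d2=1.0):
    """Two-link arm: joint angles alpha1, alpha2 (radians) and link
    lengths d1, d2 map to the effector tip's work-space position (x, y)."""
    x = d1 * math.cos(alpha1) + d2 * math.cos(alpha1 + alpha2)
    y = d1 * math.sin(alpha1) + d2 * math.sin(alpha1 + alpha2)
    return x, y

print(forward_kinematics(0.0, 0.0))           # arm stretched along x: (2.0, 0.0)
print(forward_kinematics(0.0, math.pi / 2))   # elbow bent 90 degrees: about (1.0, 1.0)
```

The inverse direction (work space back to configuration space) is harder: it can have several solutions (elbow-up vs. elbow-down) or none, which is one reason planning happens in configuration space.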
Example Obstacles in C-Space: Two-Link Manipulator
• Demo: http://www-inst.eecs.berkeley.edu/~cs188/fa08/demos/robot.html
Probabilistic Roadmaps
• Idea: sample random points as nodes in a visibility graph
• This gives probabilistic roadmaps
• Very successful in practice
• Lets you add points where you need them
• With insufficient points, you get incomplete or weird paths
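The roadmap idea can be sketched as: sample free points, connect pairs that are close and mutually visible, then search the graph. A minimal sketch in the unit square with circular obstacles (all names and parameters are illustrative):

```python
import math, random
from collections import deque

def in_obstacle(p, obstacles):
    """obstacles: list of circles (cx, cy, radius)."""
    return any(math.hypot(p[0] - cx, p[1] - cy) < r for cx, cy, r in obstacles)

def segment_clear(p, q, obstacles, steps=25):
    """Visibility test: sample points along the segment p-q."""
    return all(not in_obstacle((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])), obstacles)
               for t in (i / steps for i in range(steps + 1)))

def build_prm(start, goal, obstacles, n_samples=60, radius=0.5, seed=0):
    """Sample random free points; link nearby, mutually visible pairs."""
    rng = random.Random(seed)
    nodes = [start, goal]
    while len(nodes) < n_samples + 2:
        p = (rng.random(), rng.random())
        if not in_obstacle(p, obstacles):
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if (math.dist(nodes[i], nodes[j]) < radius
                    and segment_clear(nodes[i], nodes[j], obstacles)):
                edges[i].append(j)
                edges[j].append(i)
    return nodes, edges

def find_path(nodes, edges):
    """BFS from node 0 (start) to node 1 (goal); None if the samples were insufficient."""
    parent = {0: None}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        if u == 1:
            path = []
            while u is not None:
                path.append(nodes[u])
                u = parent[u]
            return path[::-1]
        for v in edges[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None
```

More samples and a larger connection radius make success more likely at the cost of more visibility checks; too few samples gives exactly the incomplete or weird paths noted above.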
Perception
1. Find a point seen in two camera views
2. Find its 3D coordinates by intersecting the two viewing rays
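Step 2 can be sketched as the midpoint of the closest points between the two viewing rays, since noisy rays rarely intersect exactly (a minimal sketch with pure-Python vector helpers; all names are illustrative):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate(p1, d1, p2, d2):
    """Rays r_i(t) = p_i + t * d_i from the two camera centers.
    Solve for the closest point on each ray, return the midpoint."""
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    w = [x - y for x, y in zip(p2, p1)]          # vector between camera centers
    denom = a * c - b * b                        # zero only if the rays are parallel
    t1 = (c * dot(d1, w) - b * dot(d2, w)) / denom
    t2 = (b * dot(d1, w) - a * dot(d2, w)) / denom
    q1 = [p + t1 * d for p, d in zip(p1, d1)]    # closest point on ray 1
    q2 = [p + t2 * d for p, d in zip(p2, d2)]    # closest point on ray 2
    return [(x + y) / 2 for x, y in zip(q1, q2)]

# Two cameras 2 m apart, both looking at the point (1, 0, 1):
print(triangulate([0, 0, 0], [1, 0, 1], [2, 0, 0], [-1, 0, 1]))  # [1.0, 0.0, 1.0]
```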
Glossed Over
• Calibration of camera and robot
• Recognition of corners
• More generally: visual feedback during all manipulations
• How should we move the corners to obtain the desired result?
Now: Advanced Applications
Motivating Example
• How do we specify a task like this?
[demo: autorotate / tictoc]
Autonomous Helicopter Flight
• Control inputs:
  • a_lon: main rotor longitudinal cyclic pitch control (affects pitch rate)
  • a_lat: main rotor lateral cyclic pitch control (affects roll rate)
  • a_coll: main rotor collective pitch (affects main rotor thrust)
  • a_rud: tail rotor collective pitch (affects tail rotor thrust)
Autonomous Helicopter Setup
• On-board inertial measurement unit (IMU)
• Position estimate
• Send out controls to helicopter
HMM for Tracking the Helicopter
• An HMM over the helicopter's state variables; since transitions and observations are continuous with Gaussian noise, filtering is done with a Kalman filter (KF)
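With continuous Gaussian state and observations, HMM filtering specializes to the Kalman filter; a 1-D predict/update cycle as a minimal sketch (a random-walk motion model is assumed for brevity):

```python
def kalman_step(mu, var, z, process_var, obs_var):
    """One 1-D Kalman filter cycle under a random-walk motion model.
    mu, var: current belief N(mu, var); z: new observation."""
    # Predict: the state drifts, so uncertainty grows.
    mu_pred, var_pred = mu, var + process_var
    # Update: blend prediction and observation by their relative certainty.
    gain = var_pred / (var_pred + obs_var)       # Kalman gain in [0, 1]
    mu_new = mu_pred + gain * (z - mu_pred)
    var_new = (1 - gain) * var_pred
    return mu_new, var_new

# Belief N(0, 1), observation z = 1 with equal noise: belief moves halfway.
print(kalman_step(0.0, 1.0, 1.0, process_var=0.0, obs_var=1.0))  # (0.5, 0.5)
```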
Helicopter MDP
• State: position, orientation, velocity, angular rate
• Actions (control inputs):
  • a_lon: main rotor longitudinal cyclic pitch control (affects pitch rate)
  • a_lat: main rotor lateral cyclic pitch control (affects roll rate)
  • a_coll: main rotor collective pitch (affects main rotor thrust)
  • a_rud: tail rotor collective pitch (affects tail rotor thrust)
• Transitions (dynamics): s_{t+1} = f(s_t, a_t) + w_t
  [f encodes helicopter dynamics]
  [w is a probabilistic noise model]
• Can we solve the MDP yet?
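The transition model can be sketched directly from the equation above; a toy double-integrator stands in for the real helicopter dynamics f, and all names are illustrative:

```python
import random

def simulate_step(s, a, f, noise_std=0.0, rng=random):
    """s_{t+1} = f(s_t, a_t) + w_t, with w_t drawn i.i.d. Gaussian per dimension."""
    return [x + rng.gauss(0.0, noise_std) for x in f(s, a)]

def toy_dynamics(s, a, dt=0.1):
    """Stand-in for f: 1-D position/velocity state, action = acceleration."""
    pos, vel = s
    return [pos + vel * dt, vel + a * dt]

print(simulate_step([0.0, 1.0], 0.0, toy_dynamics))  # noise_std=0: [0.1, 1.0]
```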
Problem: What’s the Reward?
• Rewards for hovering
• Rewards for "Tic-Toc"?
• Problem: what's the target trajectory?
• Just write it down by hand?
[demo: hover]
[demo: bad]
Helicopter Apprenticeship?
[demo: unaligned]
Probabilistic Alignment
• The intended trajectory satisfies the dynamics.
• Each expert trajectory is a noisy observation of one of the hidden states.
• But we don't know exactly which one.
[diagram: hidden intended trajectory aligned to expert demonstrations via time indices]
[Coates, Abbeel & Ng, 2008]
Alignment of Samples
• Result: inferred sequence is much cleaner!
[demo: alignment]
Final Behavior
[demo: airshow]
Now: Advanced Applications
Quadruped
• Reward function trades off 25 features.
• R(x) = w⊤φ(x)
[Kolter, Abbeel & Ng, 2008]
Find the Reward Function for Foot Placements
• R(x) = w⊤φ(x)
• Find the reward function that makes the demonstrations better than all other paths by some margin:
min_{w,ξ}  ‖w‖₂² + Σ_i ξ^(i)
s.t.  ∀i, ∀x:  Σ_t w⊤φ(x_t^(i)) ≥ Σ_t w⊤φ(x_t) + 1 − ξ^(i)
• Demonstrate a path across the "training terrain"
• Run apprenticeship learning to find a set of weights w
• Receive the "testing terrain" (a height map)
• Find a policy for crossing the testing terrain
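The optimization above can be approximated by subgradient descent on its hinge-loss form: repeatedly find the highest-scoring alternative path, and if it violates the margin, move w toward the demonstration's features. A minimal sketch with per-path feature sums precomputed (names and step sizes are illustrative, not the authors' implementation):

```python
def score(w, feats):
    return sum(wi * fi for wi, fi in zip(w, feats))

def learn_reward_weights(demo_feats, alt_feats, steps=500, lr=0.05, reg=0.01):
    """demo_feats: sum_t phi(x_t^(i)) for the demonstrated path.
    alt_feats: feature sums for alternative paths (the 'for all x' constraints)."""
    w = [0.0] * len(demo_feats)
    for _ in range(steps):
        # Most-violated constraint: the best-scoring alternative path.
        best = max(alt_feats, key=lambda f: score(w, f))
        slack = score(w, best) + 1.0 - score(w, demo_feats)
        for j in range(len(w)):
            g = 2.0 * reg * w[j]                 # subgradient of the ||w||_2^2 term
            if slack > 0:                        # margin constraint is active
                g += best[j] - demo_feats[j]
            w[j] -= lr * g
    return w

w = learn_reward_weights(demo_feats=[1.0, 0.0], alt_feats=[[0.0, 1.0]])
print(score(w, [1.0, 0.0]) > score(w, [0.0, 1.0]))  # demonstration now scores higher: True
```

In the real quadruped setting the "all other paths" constraint is handled by re-planning under the current w rather than enumerating paths.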
High-Level Control
[videos: without learning vs. with learned reward function]
Now: Advanced Applications
Autonomous Vehicles
Autonomous vehicle slides adapted from Sebastian Thrun
Grand Challenge: Barstow, CA, to Primm, NV
• 150-mile off-road robot race across the Mojave desert
• Natural and man-made hazards
• No driver, no remote control
• No dynamic passing

Inside an Autonomous Car
[diagram: 5 lasers, camera, radar, E-stop, GPS, GPS compass, 6 computers, IMU, steering motor, control screen]
Sensors: Laser Readings
Readings: No Obstacles
[diagram: laser height differences ΔZ between nearby readings]
Readings: Obstacles
Raw Measurements: 12.6% false positives
Obstacle Detection
• Trigger if |Z_i − Z_j| > 15 cm for nearby readings Z_i, Z_j
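Scanning that trigger over a window of nearby readings gives a simple detector (a minimal sketch; heights in meters, window size illustrative):

```python
def detect_obstacles(heights, threshold=0.15, window=2):
    """Flag pairs of nearby laser height readings differing by > threshold.
    heights: ground heights Z_i (meters) at consecutive scan positions."""
    hits = []
    for i in range(len(heights)):
        for j in range(i + 1, min(i + 1 + window, len(heights))):
            if abs(heights[i] - heights[j]) > threshold:   # |Z_i - Z_j| > 15 cm
                hits.append((i, j))
    return hits

# Flat ground, then a ~30 cm step up: the step edge triggers.
print(detect_obstacles([0.00, 0.02, 0.30, 0.31]))  # [(0, 2), (1, 2), (1, 3)]
```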
Probabilistic Error Model
[diagram: HMM with hidden states x_t, x_{t+1}, x_{t+2}, observations z_t, z_{t+1}, z_{t+2}, and GPS/IMU inputs at each step]

HMMs for Detection
• HMM inference: 0.02% false positives (vs. 12.6% for raw measurements)
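The false-positive reduction comes from filtering the obstacle state over time instead of trusting single readings; the forward algorithm over a binary state as a minimal sketch (all probabilities are illustrative, not the race system's values):

```python
def hmm_filter(evidence, transition, prior):
    """Forward-algorithm filtering for a binary state (0 = free, 1 = obstacle).
    evidence: per timestep [P(z_t | free), P(z_t | obstacle)].
    transition[s][s2] = P(s_{t+1} = s2 | s_t = s)."""
    belief = list(prior)
    for e in evidence:
        # Predict: push the belief through the (persistent) transition model.
        predicted = [sum(belief[s] * transition[s][s2] for s in (0, 1)) for s2 in (0, 1)]
        # Update: weight by the evidence likelihood and normalize.
        unnorm = [predicted[s] * e[s] for s in (0, 1)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
    return belief

T = [[0.99, 0.01], [0.01, 0.99]]       # obstacles persist between scans
prior = [0.9, 0.1]
noisy = [0.2, 0.8]                     # one obstacle-looking reading
one = hmm_filter([noisy], T, prior)[1]
five = hmm_filter([noisy] * 5, T, prior)[1]
print(one < five)                      # repeated evidence, not one reading, drives detection
```

A single spurious reading barely moves the belief, while consistent readings over several scans push it toward certainty, which is how filtering suppresses false positives.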