
HARDWARE AND SOFTWARE SYSTEMS

FOR PERSONAL ROBOTS

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Morgan L. Quigley

December 2012


© 2012 by Morgan Lewis Quigley. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License: http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/gq378mt7634


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Andrew Ng, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

J Kenneth Salisbury, Jr

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Pieter Abbeel

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

Robots play a major role in precision manufacturing, continually performing economically justifiable tasks with superhuman speed and reliability. In contrast, deployments of advanced "personal" robots in home or office environments have been stymied by difficult hardware and software challenges. Among many others, these challenges have included cost, reliability, perceptual capability, and software interoperability. This thesis will describe a series of hardware and software systems designed in response to these challenges and towards the long-range goal of creating general-purpose robots that will be useful and practical in everyday environments.

First, several low-cost robot subsystems will be described, including systems for indoor localization, short-range object recognition, and inertial joint encoding, as demonstrated on prototype low-cost manipulators. Next, the design of a low-cost, highly capable robotic hand, which incorporates all of the aforementioned hardware and software subsystems, will be described in detail. Finally, the thesis will describe a robot software system developed for the STanford AI Robot (STAIR) project, and its evolution into the Robot Operating System (ROS), a widely used robot software framework designed to ease collaboration between disparate research communities to create integrative, embodied AI systems.


Acknowledgments

Countless people contributed to the work described in this thesis. In alphabetical order, the co-authors of the publications which led to this thesis were Alan Asbeck, Siddharth Batra, Eric Berger, Reuben Brewer, Adam Coates, Ken Conley, Josh Faust, Tully Foote, Brian P. Gerkey, Stephen Gould, Ellen Klingbeil, Quoc Le, Jeremy Leibs, Andrew Y. Ng, Vijay Pradeep, Curt Salisbury, Sai P. Soundaraj, David Stavens, Sebastian Thrun, Ashley Wellman, and Rob Wheeler.

The work described in this thesis benefited enormously from the collaboration of researchers at Willow Garage, Inc., and Sandia National Laboratories. At Stanford, my officemates Zico Kolter, Honglak Lee, Adam Coates, Alan Asbeck, and Anya Petrovskaya patiently answered my never-ending barrage of questions. Faculty members including Ken Salisbury, Pieter Abbeel, Oussama Khatib, Fei-Fei Li, and Mark Cutkosky provided critical assistance and support through many phases of this work. And of course, none of this would have been possible without the many years of patient guidance provided by my advisor, Andrew Y. Ng.


Contents

Abstract

Acknowledgments

1 Introduction
1.1 Personal Robots
1.2 Contributions
1.3 Outline
1.4 First Published Appearances

2 Low-cost Indoor Localization
2.1 Introduction
2.2 Related Work
2.3 Approach
2.3.1 Robotic SLAM
2.3.2 Obtaining Training Data
2.3.3 Camera Sensor Model
2.3.4 WiFi Sensor Model
2.3.5 Localization
2.4 Results
2.5 Summary

3 High-resolution Depth Sensing
3.1 Introduction
3.2 Related Work
3.3 Laser Line Scanning for Robotics
3.3.1 Fundamentals
3.3.2 Hardware Considerations
3.3.3 Calibration
3.4 Object Detection
3.4.1 Sliding Windows
3.4.2 Learning the Classifiers
3.5 Door Opening
3.6 Inventory-Control Experiment
3.7 Summary

4 Inertial Joint Encoding
4.1 Introduction
4.2 Related Work
4.3 State Estimation
4.3.1 Estimation via EKF
4.3.2 Point estimates
4.4 Calibration
4.5 Controlling a low-cost manipulator
4.6 Experiments
4.6.1 PR2 Alpha State Estimation
4.6.2 Low-cost Manipulator Torque Control
4.6.3 PR2 Alpha Position Control
4.7 Summary

5 A Compliant Low-cost Robotic Manipulator
5.1 Introduction
5.2 Related Work
5.3 Design
5.3.1 Actuation overview
5.3.2 Tradeoffs of using stepper motors
5.3.3 Distal actuation
5.3.4 Inertia and stiffness
5.3.5 Low-cost manufacturing
5.4 Series Compliance
5.5 Sensing
5.6 Performance
5.7 Control and Software
5.8 Demonstration Application
5.9 Summary

6 A Low-cost Robotic Hand
6.1 Introduction
6.2 Related Work
6.3 High-level Design
6.3.1 Robustness
6.3.2 Actuation
6.3.3 Hand Frame
6.4 Sensor Suite
6.4.1 Joint Encoding
6.4.2 Contact Geometry
6.4.3 Tactile Sensing
6.4.4 Strain
6.4.5 Visual Sensing
6.5 Wiring Elimination
6.5.1 Motor Wiring
6.5.2 Phalange Wiring
6.5.3 Interconnect Wiring
6.6 Computational Systems
6.7 Teleoperation Interfaces
6.8 Summary

7 STAIR and Switchyard
7.1 Introduction
7.2 STAIR: Hardware Systems
7.2.1 STAIR 1
7.2.2 STAIR 2
7.3 Switchyard
7.4 Approach
7.4.1 Message-Passing Topology
7.4.2 Operation
7.5 Fetch a Stapler
7.6 Summary

8 ROS: A Robot Operating System
8.1 Overview
8.2 Design Goals
8.2.1 Peer-to-Peer
8.2.2 Multi-lingual
8.2.3 Tools-based
8.2.4 Thin
8.2.5 Free and Open-Source
8.3 Nomenclature
8.4 Use Cases
8.4.1 Debugging a single node
8.4.2 Logging and playback
8.4.3 Packaged subsystems
8.4.4 Collaborative Development
8.4.5 Visualization and Monitoring
8.4.6 Composition of functionality
8.4.7 Transformations
8.5 Summary

9 Conclusions

Bibliography


List of Tables

2.1 Quantitative results for the tracking task
5.1 Measured properties of the manipulator
5.2 Part cost breakdown of the arm


List of Figures

2.1 Left: the STAIR robot, used to acquire maps. Right: a laser backpack, used to provide ground-truth localization of a pedestrian carrying the low-cost sensor package.

2.2 2D map of the environment used in these experiments, as produced by GMapping and the robot shown in Figure 2.1.

2.3 A typical rendering of the particle filter used to localize the "ground truth" pedestrian using rearward LIDAR. The green LIDAR scan is rendered from the most likely particle in the filter.

2.4 Example image regions corresponding to the "visual words" used during image matching.

2.5 Empirical justification of the Gaussian + uniform model of the WiFi power measurements. The plot shows the frequency of power measurement deviations from their respective means. This dataset was gathered while sitting stationary for 60 seconds, and includes 34 transmitters, most of which were observed 25 times.

2.6 Pre-computed nearest-neighbor prediction of the WiFi signal strength of a particular MAC address at any point in the environment. The walls of the environment are overlaid for clarity.

2.7 Visualization of the unified vision + WiFi localization system. Upper-left shows the particle cloud, which is overshadowed by the centroid of the particle distribution (yellow) and the ground-truth position (cyan crosshairs). Right shows the current camera image, with SURF keypoints circled. Lower-left shows the joint likelihood of the WiFi observations. Extreme lower-left visualizes the histogram of the bag-of-words image representation.

2.8 Pedestrian motion model, shown after a one-second integration. Without odometry, the particle filter must generate sufficient diversity in its hypotheses to handle corners.

2.9 Images from the training set (top) differed from images in the test set (bottom) due to illumination changes and typical furniture re-arranging.

2.10 The ground-truth LIDAR track of the 1-kilometer test set used for quantitative evaluation. The test set contained 62 corner turns, and a mixture of navigating tight corridors and open meeting spaces. Distances are in meters.

2.11 Histograms of localization errors during the tracking benchmark on a continuous 1-kilometer test set. Errors are measured with respect to LIDAR ground-truth. The SURF and HoG performance is for the global (1-level) spatial pyramid. Adding WiFi to SURF slightly decreases its long-term average tracking accuracy. The color histogram performs poorly.

2.12 Global localization performance. The localization systems were started with a uniform prior at 200 different starting points in the test set. Errors against ground-truth were averaged at each timestep to show the expected convergence properties of each system. All methods show improvement as more observations are incorporated. The combination of WiFi and the best visual algorithm (3-level spatial pyramid of SURF descriptors) produces the best performance.

3.1 Several off-axis views of a raw scan of a coffee mug obtained by our scanner from 1.2 meters away. The 5mm-thick handle is prominently visible. Approximately 1000 points of the scan are on the surface of the coffee mug, despite the fact that it comprises only 5% of the horizontal field-of-view of the scan.

3.2 Clutter makes scene understanding from only 2D visual images difficult, even in a relatively simple office environment, as many of the strong edges are not those which suggest the 3D structure of the scene.

3.3 A vertical (green) laser line projected by the robot at left is deformed as it strikes objects in the scene.

3.4 A prototype laser line scanner on the STAIR 1 robot. The laser and its rotary stage are mounted in the upper-right. Images are captured by the camera in the lower-left.

3.5 Image channels considered by the patch-selection algorithm, along with the typical appearance of a coffee mug. Top: intensity image. Middle: gradient image. Bottom: depth image.

3.6 Examples of localized patches from the coffee-mug dictionary. Left: Intensity patches. Middle: Gradient patches. Right: Depthmap patches.

3.7 Precision-recall curves for mugs (left), disposable cups (middle), and staplers (right). The blue solid curve is for our method; the red dashed curve is for vision-only detectors. Scores are computed at each threshold by first removing overlapping detections. A true-positive is counted if any detection overlaps with our hand-labeled ground truth by more than 50%. Any detection that does not overlap with a ground-truth object of the correct class is considered a false-positive. Average Precision measures 11-point interpolated area under the recall vs. precision curve. Greater area under the curve is better.

3.8 After localizing the door handle in the 3D point cloud, the robot can plan a path to the handle and open the door.

3.9 Detecting coffee mugs in cluttered environments. The detector correctly ignored the paper cup to the right of the coffee mug.

3.10 The inventory-gathering experiment required autonomous navigation (green track), autonomous door opening, and 20 laser scans of desks in the four offices shown above. The robot position at each scan is shown by the red circles, and the field-of-view of each scan is indicated by the yellow triangles. The locations of the detected coffee mugs are indicated by the orange circles. This figure was generated entirely automatically, using the SLAM output for the map and the localization log for the robot track and sensing positions, which allow the coffee-mug detections to be transformed into the global map frame.

4.1 Two manipulators used to demonstrate the utility of accelerometer-based sensing. Left: Willow Garage PR2 Alpha. Right: a prototype low-cost manipulator.

4.2 Accelerometers are present in the F0, F2, and F3 links of the robotic finger, creating a 2-DOF estimation problem between F0 and F2, and a 1-DOF estimation problem between F2 and F3.

4.3 Left: an unpowered arm used to evaluate the calibration potential of the accelerometer-based sensing approach. Right: touching the end effector to points on the calibration board.

4.4 Hold-out test set error during the optimization convergence on the prototype manipulator. The horizontal axis shows the iteration number, and the vertical axis shows the mean of the miscalibrations. Numerical optimization drives the average error from 11mm to 2mm.

4.5 Hold-out test set error during the optimization convergence on the Willow Garage PR2 Alpha. The horizontal axis shows the iteration number, and the vertical axis shows the mean error in joint angle estimates of the shoulder lift and the upper arm roll. The optimization drives the average error from 0.1 deg to 0.02 deg.

4.6 Shoulder and wrist of the demonstration manipulator.

4.7 Elbow and gripper of the demonstration manipulator.

4.8 Accelerometers were attached to the upper arm, forearm, and gripper of a PR2 Alpha robot.

4.9 Tracking the forearm roll of the robot shown in Figure 4.8, showing the encoder ground-truth (red) against the joint angle estimate from the accelerometers (blue).

4.10 Closed-loop control of a low-cost manipulator using only accelerometers. Two joints are shown. Desired state is plotted in red. Output of the accelerometer-based state estimation algorithm is plotted in blue. The vertical axis denotes joint angles in radians; the horizontal axis denotes time.

4.11 Differences between each stopping position of the arm and their respective cluster centroids, in the XY plane (left) and the XZ plane (right), as measured by an optical tracker. 14 trials were run, all of which appear on this plot.

4.12 In this experiment, accelerometer-based state estimation was used to generate relative joint position commands, allowing a position-controlled robot to repeatedly grasp a doorknob.

4.13 Time series of one PR2 joint as the manipulator undergoes relative joint angle commands from the accelerometer-based sensing scheme and simple setpoint interpolation to derive small step commands.

5.1 The low-cost compliant manipulator described in this chapter. A spatula was used as the end effector in the demonstration application. For ease of prototyping, lasercut plywood was used as the primary structural material.

5.2 Actuation scheme for each of the proximal four DOF.

5.3 Cable routes (solid) and belt routes (dashed) for the shoulder lift, shoulder roll, and elbow joints. All belt routes rotate about the shoulder lift joint. The elbow cables twist about the shoulder roll axis inside a hollow shaft. Best viewed in color.

5.4 Compact servos are used to actuate the distal three joints.

5.5 Diagram of the series compliance. Left: compliant coupling with no external force. Right: an applied force causes rotation against the locked driven wheel.

5.6 Stiffness of the elbow. Hysteresis is exhibited due to the polyurethane in the series compliance. The joint was quasi-statically moved through 70% of its normal operating range.

5.7 Repeatability test results. Measurement accuracy is ±0.1 mm.

5.8 Step responses for each of the major types of actuators of the robot. Top: the shoulder-lift joint, a NEMA-34 stepper motor. Middle: the elbow joint, a NEMA-23 stepper motor. Bottom: the wrist yaw joint, a rigidly coupled Robotis RX-64 servo. Note that the timescales change on each plot.

5.9 Low-cost MEMS inertial sensors affixed to the teleoperator's torso, upper arm, lower arm, and hand to estimate desired end-effector positions.

5.10 Playing chess via teleoperation.

5.11 Demonstration task: making pancakes.

6.1 Each finger module has three motors at the proximal end of the module, shown at left in the figure.

6.2 The hand frame and its set of identical finger modules, which dock magnetically or with retaining bolts.

6.3 The motor module (aluminum, at right) separates from the rest of the finger module (plastic, at center) by simply removing a few bolts. Cable tension is not affected.

6.4 Hand frame variations. Finger modules are unchanged.

6.5 The locations of the accelerometers are illustrated by the red circles.

6.6 Soft tactile pads allow conformal grasping of small objects.

6.7 To achieve mechanical robustness while still exhibiting conforming properties, the skin consists of a tougher thin outer layer above a very soft and thick inner layer.

6.8 Cross-section rendering showing transflective sensors embedded in the finger pads.

6.9 Tactile array implemented as a rigid-flex PCB.

6.10 Flat test fixture for the tactile array.

6.11 Raw sensor response of repeatedly loading and unloading a 2-gram US penny onto the skin assembly shown in Figure 6.10.

6.12 Trinocular camera boards holding direct-solder lens modules (left) and fully-assembled camera flex circuit boards (right).

6.13 Left: beam-steered pico projector affixed to the side of a robotic hand. Right: depth image constructed using this apparatus.

6.14 Difference images of polarity-inversion bar codes produced by the pico projector, amplified 5x.

6.15 Left: laser line generator mounted on a robotic finger. Middle: laser line sweeping across the scene. Right: typical frame and image-difference processing.

6.16 Two scenes showing typical scans of the fingertip-mounted laser scanner.

6.17 Demonstration of unaided stereo (left), texture-assisted stereo (center), and laser line scanning (right) on an artificial scene with very little texture.

6.18 Left: initial prototype hand. Right: final prototype, after extensive design work to eliminate loose wires.

6.19 Stackup of outrunner brushless motors, controller board, and heatsink.

6.20 Finger Motor Controller Board (FMCB).

6.21 Two pairs of steel cables, shown in red and yellow, actuate the distal phalanges. The electrical implementation is shown at bottom.

6.22 Simplified schematic of the multiplexing of power and half-duplex data over the pair of conductors running the length of the finger. RS-485 transceivers are connected to the D+/D- nodes; F2 and F3 power supplies are connected to the V+/V- nodes. Bus power is supplied from the FMCB (left).

6.23 Left: illustration of the connectors in the palm and one finger module base. Right: the resulting hand, which features no loose wires.

6.24 Data bus topology of the robotic hand.

6.25 An exoskeletal glove designed to precisely measure the movements of the teleoperator, with a kinematic structure similar to the robotic hand.

6.26 Various grasps and manipulation postures achieved during teleoperation.

7.1 Left: the STAIR 1 robot. Right: the STAIR 2 robot.

7.2 Pan-tilt-zoom (PTZ) camera control graph.

7.3 Graph of the original STAIR "fetch a stapler" demonstration. The large red text indicates the tasks performed by various regions; those annotations were not a functional part of the graph.

7.4 The STAIR 1 robot picking up a stapler.

8.1 A typical ROS network configuration.

8.2 An automatically-generated rendering of a running ROS system.


Chapter 1

Introduction

This thesis addresses personal robots—that is, robots designed to be owned by individuals and operated for their benefit, as opposed to robots that are operated as capital equipment for industrial applications. This nomenclature is modeled on the paradigm shift seen in computing in the late 1970s, when organization-owned mainframes and minicomputers gave way to the era of the personal computer and its corresponding massive shifts in application domains, user base, and societal importance. Although the legendary success of personal computing has caused virtually every field of endeavor to wish to recreate its trajectory, the field of advanced robotics has been described as primed for a similar paradigm shift. Realizing the exciting possibilities for increases in economic productivity, as well as the numerous potential opportunities for societal improvement, will require advances in a variety of fields of robotics research and development. Among many others, these challenges include cost, reliability, perceptual capability, and software interoperability. This thesis presents a series of experiments and prototypes designed to explore these issues.

1.1 Personal Robots

Personal robots are a long-standing dream of artificial intelligence. As such, the concept has attracted massive amounts of research and development over several decades. Although the term is often used in a variety of contexts, broadly speaking, a personal robot is intended to be owned, operated by, or of assistance to, an individual. Although many subsystems and areas of expertise can be shared, the domain of personal robotics is distinct from the established field of industrial robotics.

Industrial robotics, often described as automation, values precision, repeatability, and reliability. Systems designed for production-line operation can have decades-long deployments and thus must be developed with long-term system stability as a primary design consideration [31]. In contrast, envisioned applications of personal robots often involve difficult perceptual challenges, reasoning under considerable uncertainty, and close interactions with humans. Both domains certainly involve robots—sensors and actuators connected by algorithms—but the differences are significant.

In addition to these environmental and task-space distinctions, many envisioned applications of personal robotics are far more cost-sensitive than typical industrial-robot applications. In contrast to the superhuman performance achieved by many deployed industrial-automation systems, many envisioned tasks for personal robots are currently performed manually, and thus can be readily evaluated in terms of labor cost. For mass-market acceptance of personal robots beyond entertainment value, this presents an upper bound on the cost of the system.

Although the mass manufacture of any product, including personal robots, can lead to enormous cost savings, additional cost reduction can be achieved by specifically designing robotic systems for low-cost operation. This entails co-design of software and hardware, with the interplay generally seeking to increase the complexity of software to compensate for a reduction in the complexity and/or precision of hardware, a trade-off often justified by noting that extraordinarily complex software can be perfectly copied at zero cost.

The work presented in this thesis was intended to address several challenges currently limiting deployments of personal robots. The hardware systems were developed specifically to drive down system cost while still attaining a level of performance deemed sufficient to accomplish a variety of tasks envisioned for future personal robots. The subsystem costs of several state-of-the-art service robots were analyzed to select areas for cost reduction, resulting in the identification of several research directions: a low-cost localization system that does not require a scanning laser rangefinder, and the development of sensing techniques, calibration methods, and fabrication techniques to create low-cost manipulator arms and hands. The results from these low-cost explorations are presented in this thesis.

Even if cost is no object, real-world usage of advanced personal robots must face the perceptual challenges associated with the unstructured and highly variable environments of typical homes and offices. As a result, this thesis includes a series of experiments designed to demonstrate that high-resolution depth sensing can dramatically improve the reliability of object recognition in the environments envisioned for future deployments of personal robots, and can do so at reasonable cost. These results led to the integration of high-resolution depth sensing systems into the robotic hand that will be presented in Chapter 6.

Finally, numerous challenges are created by the sheer volume of software required to create general-purpose robotic systems of the complexity necessary to handle unstructured environments. Software systems of this scale require the contributions of massive numbers of researchers and engineers, typically working in parallel and often in disparate groups. Such challenges are common to any large-scale engineering endeavor. However, the desire to improve the productivity of robotics software developers, along with the stability of the resulting composite systems and the re-usability of individual robotic software subsystems, resulted in the creation of a series of robotics software frameworks as part of this thesis. These frameworks were used to implement all of the hardware systems described in the thesis, and all of the integrative demonstrations described in this work. The Robot Operating System (ROS) has since become widely used in the personal-robot research community as well as in other robotics domains. The design goals and development of ROS and its predecessors will be described in detail in the final two chapters of this thesis.

1.2 Contributions

The contributions of this thesis are in two areas. First, a series of hardware design techniques for low-cost robotics was developed, culminating in the design of a low-cost, high-performance robotic hand. Second, a series of robotics software frameworks was designed, implemented, refined, and freely released to facilitate collaboration and promote interoperability in the robotics research and development community.

1.3 Outline

The thesis will proceed in the following manner:

Chapter 2: Low-cost Indoor Localization describes a low-cost system for indoor localization. Localization is a critical component of virtually all mobile robotic systems, and the system described in this chapter was purposely designed using only low-cost sensors suitable for localizing and tracking personal electronics and robotics in typical indoor environments.

Chapter 3: High-resolution Depth Sensing presents a high-resolution, short-range depth-sensing system mounted on a mobile manipulator. Experimental results are then provided for using this system to perform two tasks expected to be necessary for personal robots: opening interior doors, and recognizing objects in cluttered environments.

Chapter 4: Inertial Joint Encoding describes a method for estimating the kinematic state of manipulator arms using low-cost 3D MEMS accelerometers of the type commonly found in mobile phones. Implementations are shown on both low-cost and high-precision manipulators.

Chapter 5: A Compliant Low-cost Robotic Manipulator presents the design and implementation of a low-cost robotic arm with compliant joints. The manipulator is then demonstrated performing a cooking task.

Chapter 6: A Low-cost Robotic Hand presents the design and implementation of a fully-actuated robotic hand, which integrates all of the low-cost design concepts detailed in the previous chapters and provides an integrated suite of visual, tactile, and inertial sensors.

Chapter 7: STAIR and Switchyard presents an overview of the STanford AI Robot (STAIR) project to create a home and office assistant robot, and of Switchyard, the initial software integration framework which emerged from that effort.

Chapter 8: ROS: A Robot Operating System describes the development, design, and features of the Robot Operating System (ROS), a much larger software integration framework intended to support personal robotics.

1.4 First Published Appearances

Much of the following material has been previously published. Chapter 2 is derived from [77]. Chapter 3 is derived from [73]. Chapter 4 is an extension of [75]. Chapter 5 is derived from [72]. The work described in Chapter 6 has not previously been published. Chapters 7 and 8 provide a more detailed discussion of work originally published in [74] and [76].


Chapter 2

Low-cost Indoor Localization

2.1 Introduction

Of all the subsystems of a personal robot, localization is perhaps the one most frequently glossed over as a solved problem. Indeed, there is a vast literature on the subject, dating back to the earliest mobile-robot experiments. Robust navigation systems are now widely available and are often based around Bayes filter variants which observe the world through laser range-finders and robot odometry [96].

However, most fielded localization systems involve high-precision, expensive sensors such as laser range sensors or high-quality inertial units, or extensive infrastructure. Although these systems have been usefully employed in a variety of settings, the domain of personal robotics entails a cost sensitivity that is difficult to achieve using time-of-flight laser range sensors. In contrast, this chapter describes a localization system which employs sensing technologies found in commodity cellphones: integrated CMOS imagers, accelerometers, and WiFi radios. This sensor suite is quite different from the canonical laser range finder and odometer found in most robot localization tasks, and presents a different (though related) set of challenges.

The system consists of two components: a mapping platform and a mobile localization platform. The mapping platform is a typical modern robot, as shown in Figure 2.1. Like many robots of this size class, it can autonomously acquire maps of indoor environments and align them with off-the-shelf SLAM algorithms. The focus of this chapter is the mobile localization system, which can autonomously localize itself against pre-made maps using only low-cost sensors. The method requires no additional environment instrumentation or modification beyond standard, widely-deployed WiFi infrastructure. In practice, the mapping platform is used once for each environment. The resulting map may then be used by many roaming robots, bringing high-quality localization to these low-cost sensor platforms.

The implementation was tested by evaluating its accuracy against ground-truth results acquired using the backpack-mounted sensing system shown in Figure 2.1. Using this data, the method was shown to provide sub-meter precision with low-cost, consumer-grade sensors and without environment modification or instrumentation. WiFi is shown to be excellent for quick global convergence, but camera data performs better for precise position tracking; sensor fusion combines the best aspects of both systems. The localization system was tested in a typical office environment, where the map and localization data were collected at different times of day and on different days, with the environment allowed to undergo typical daily changes. The system offers numerous potential applications, including the localization of low-cost personal robot platforms in typical home and office environments.

2.2 Related Work

The literature on localizing a robot (or other rigid sensor platform) against a map is vast. [96] provides a comprehensive literature review, which is summarized and extended in this section. The idea goes back at least as far as the robot Odysseus [90], which compared sensor measurements in a local grid to a global map and competed at the National Conference on Artificial Intelligence (AAAI) in 1992. A continuum of algorithms exists across a variety of sensor and map configurations. [52] used sonar to detect coarse landmarks in maps and localize with an extended Kalman filter (EKF). Later, grid-based methods were developed. In contrast to EKFs, these methods represented the posterior as a histogram and were not constrained to Gaussian noise assumptions. Grid-based methods usually relied on landmarks, however. Grid-based localization was used successfully in sewer pipes [37], in a museum [11], and in an office environment [89]. [95] used learning to determine the best landmarks for reliable localization. Most recently, Monte Carlo Localization (MCL) [18] was developed, replacing landmarks with raw measurements and the histogram posterior with particles. In a hybrid of ideas between MCL and grid-based methods, [43] introduced MCL with features. Several papers have utilized MCL with cameras, including [51, 83, 106, 46, 105]. Others have localized by direct image matching, without using a probabilistic filter or motion model [85, 108]. Localization with signal-strength mechanisms such as WiFi has been studied in the literature as well [6, 25, 21, 53, 36], including systems that bootstrap automatically without an explicit map-making step [57, 7].

There are several key differences between the system described in this chapter and the previous literature. First, the sensor suite is intentionally limited to commodity parts whose economies of scale allow price points in the tens-of-dollars range: consumer-grade MEMS sensors, an integrated CMOS camera, and a WiFi radio. Second, while previous work exists on low-cost sensors such as WiFi or cameras, these sensors were usually studied individually, whereas the system described in this chapter employs probabilistic sensor fusion. As long as the sources of measurement uncertainty, such as noise and bias, are conditionally independent, combining multiple sensors will have a positive impact on performance. This is true even if the inexpensive sensors are highly noisy. The data shows that while WiFi offers fast global convergence, cameras provide more precise tracking. Sensor fusion allows the best of both.

2.3 Approach

The system is based around three levels of sensing and inference. The first two are used for offline map-building, and the third is used for online localization. These stages are described in detail in the following sections.

Figure 2.1: Left: the STAIR robot, used to acquire maps. Right: a laser backpack, used to provide ground-truth localization of a pedestrian carrying the low-cost sensor package.

2.3.1 Robotic SLAM

The first tier of the system captures the 3D structure of the environment. This is performed by a robotic platform equipped with three LIDAR scanners and a panoramic camera, as shown in Figure 2.1. To build up a 2D map of the environment and correct the odometry of the robot, a horizontal LIDAR is used with the GMapping SLAM system, an efficient open-source implementation of grid-based FastSLAM [34].

The GMapping system was used out-of-the-box to produce the 2D map shown in Figure 2.2. The robot path corresponding to this map was then used to project the vertical and diagonal LIDAR clouds into 3D by backprojecting rays through the rectified images into the LIDAR cloud. The robotic mapping phase of the system is thus able to map the 3D structure and visual texture of the environment. However, this alone is not enough to permit localization via a low-cost sensor suite; what is needed next is a precise sensor model of how the low-cost sensors behave in the environment of interest. This is handled by the next phase of the system.

Figure 2.2: 2D map of the environment used in these experiments, as produced by GMapping and the robot shown in Figure 2.1.

2.3.2 Obtaining Training Data

Non-parametric methods are a simple way to capture the complex phenomena observed by the low-cost sensors. For example, it would be difficult to parametrically model the various radio-frequency (RF) propagation effects that can occur with WiFi signal power in typical indoor environments. Issues such as occlusion/shadowing from building structural elements, interference between multiple access points, directionality of the transmit and receive antennas, etc., result in a complex power distribution pattern. Similarly, the camera of a smartphone captures an enormously complex stream of data. A simple (indeed, perhaps the simplest) way to predict these complex observations is to acquire many observations from a large number of known positions in the environment.

Obtaining training data for these non-parametric techniques is non-trivial: the location of each observation (WiFi signal power or camera image) must be known for it to be useful to subsequent localization algorithms. Low-cost localization systems can be used to localize any number of items, ranging from low-cost robots to wheeled equipment to pedestrians. In this experiment, pedestrian localization was examined in detail, although the resulting system could be used equally well (indeed, perhaps with even better results) on wheeled vehicles.

Because the pedestrian's body can have an effect on the received signal strength (e.g., when the person's body is directly between the receiving and transmitting antennas), a system was created for accurately localizing pedestrians, and was used to obtain training data for non-parametric modeling of the spatial RF signal power.

Figure 2.3: A typical rendering of the particle filter used to localize the "ground truth" pedestrian using rearward LIDAR. The green LIDAR scan is rendered from the most likely particle in the filter.

To localize the pedestrian, a rearward-facing laser range finder was affixed to a backpack, as shown in Figure 2.1. A particle filter was then employed to fuse the laser observations with a crude motion model of a walking pedestrian. This is more challenging than the canonical robot localization task, since mobile robots typically have odometry which is locally stable. In contrast, the pedestrian-localization system used in this experiment only knows whether a person is walking or not; at the time of writing, low-cost MEMS accelerometers are far too noisy to simply doubly integrate to produce position estimates. Instead, a simple sliding-window classifier was used on the spectra of the acceleration vectors to detect when to apply a "walking" motion model. Low-cost magnetometers were found to be challenging in the test environment, a steel-framed building with many computers, power cables, and other electronic equipment capable of inducing local magnetic disturbances. While this testing environment may have been particularly unfriendly, similar local magnetic perturbations would likely also pose challenges to efforts relying heavily on magnetometer data in similar indoor environments.
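As a rough sketch of such a walking detector (not the implementation used in this work), a window of accelerometer samples can be classified by the spectral energy in a typical gait band; the sample rate, band limits, and threshold below are illustrative assumptions:

```python
import numpy as np

def is_walking(accel_window, fs=50.0, band=(1.0, 3.0), threshold=0.5):
    """Classify a short window of 3-axis accelerometer samples as
    walking/not-walking from the spectrum of the acceleration magnitude.
    accel_window: (N, 3) array of recent samples at sample rate fs (Hz).
    fs, band, and threshold are illustrative, not values from this work."""
    mag = np.linalg.norm(accel_window, axis=1)
    mag = mag - mag.mean()                      # remove gravity / DC offset
    power = np.abs(np.fft.rfft(mag)) ** 2       # power spectrum of the window
    freqs = np.fft.rfftfreq(mag.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Walking if the gait band holds most of the non-DC spectral energy.
    return power[in_band].sum() > threshold * power[1:].sum()
```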

As the accelerometer and magnetometer can only give a coarse measurement of the path of a pedestrian, the LIDAR-based pedestrian particle filter relies heavily on frequent "gentle" resamplings of the particle cloud. More specifically, the measurement model has a far higher uniform component than is typical for mobile robot filters, and incorporates measurements from every laser scan, in order to correctly track the pedestrian through turns. A typical rendering of the particles is shown in Figure 2.3.

Figure 2.4: Example image regions corresponding to the "visual words" used during image matching.

2.3.3 Camera Sensor Model

The literature on place recognition using visual images contains many proposed methods. For these experiments, three different approaches were selected from the recent computer vision literature: a "bag of words" method using SURF descriptors of interest points [16, 5], a "bag of words" method using HoG descriptors on a dense uniform grid [17], and a color-histogram method [111]. The first two methods were further augmented by adding a spatial pyramid [50]. These methods will be briefly described in the following paragraphs.

In the bag of words model, first a dictionary of "visual words" is constructed. This is done by extracting SURF [5] descriptors from a large set of images captured of the target environment and quantizing the descriptors using k-means clustering. The resulting 128-dimensional cluster centroids are stored with indices 1 to k. Figure 2.4 shows image patches whose descriptors are at the centers of clusters computed by k-means. Then, given an image, the "bag of words" representation can be calculated in the following way:

• Extract SURF descriptors from the image

• Map each descriptor to the index of the nearest centroid in the dictionary

• Construct a histogram with the frequency counts for each index, i.e., the number of descriptors that were mapped to each index

Although the histogram discards all of the geometric information about the locations of the descriptors in the image, the histograms have nevertheless been shown to function effectively as compact descriptions of the image content.
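A minimal sketch of this histogram construction, assuming the SURF descriptors and the k-means dictionary have already been computed (the function name and the normalization choice are illustrative, not the exact implementation used in these experiments):

```python
import numpy as np

def bag_of_words(descriptors, centroids):
    """Quantize descriptors against the k-means dictionary and return
    the visual-word frequency histogram.
    descriptors: (N, 128) SURF descriptors from one image
    centroids:   (k, 128) cluster centers from k-means"""
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                   # nearest "visual word" index
    hist = np.bincount(words, minlength=centroids.shape[0]).astype(float)
    return hist / hist.sum()                    # normalized frequency counts
```

The normalization mirrors the illumination-invariance step for the SURF-based method described later in this section.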

The HoG-based method used a similar approach. However, instead of using descriptors of interest points, the image was sampled on a dense grid. As a result, the number of HoG descriptors extracted from each image was always the same. To produce a data compression similar to that of the SURF-based method, we chose to extract HoG descriptors from 32x32 blocks arranged on a 15x20 grid across the image. This resulted in 300 HoG descriptors per image, which was similar to the average number of SURF keypoints found in the same images using the OpenCV SURF implementation. As before, k-means was used to quantize the HoG descriptors, and then histograms of the quantized descriptors were built for each image.

As previously mentioned, the "vanilla" bag of words algorithm discards the spatial configuration of the descriptors in the image plane. The "spatial pyramid" approach is one proposed method to incorporate coarse spatial information, and is fully developed in [50]. This method repeatedly subdivides the image into quadrants, and constructs histograms for each quadrant on each level. For example, a two-level spatial pyramid would have one global histogram for the whole image, and one histogram for each quadrant, for a total of five histograms. Similarly, the three-level pyramid has 1 + 4 + 16 = 21 histograms. This approach has been shown to offer improved performance over the single-histogram technique, at the cost of correspondingly increased memory and computation requirements.
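A sketch of the pyramid construction, given descriptor pixel coordinates and quantized word indices; the image dimensions, level count, and function name are illustrative parameters rather than the exact implementation:

```python
import numpy as np

def spatial_pyramid(xy, words, k, levels=3, width=640, height=480):
    """Concatenated bag-of-words histograms over a spatial pyramid.
    xy:    (N, 2) pixel coordinates of the descriptors
    words: (N,) visual-word index of each descriptor
    k:     dictionary size; levels=3 gives 1 + 4 + 16 = 21 histograms."""
    hists = []
    for level in range(levels):
        cells = 2 ** level                          # cells per image side
        cx = np.minimum((xy[:, 0] * cells / width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells / height).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                sel = (cx == i) & (cy == j)         # descriptors in this cell
                hists.append(np.bincount(words[sel], minlength=k))
    return np.concatenate(hists).astype(float)
```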

For a radically different approach, and to serve as a baseline, a color-histogram technique was also implemented. This technique is conceptually much simpler: the image is first converted to hue-saturation-value (HSV) space, after which a histogram is constructed of the hue values of all pixels in the image. HSV space is used to provide some invariance to illumination changes. The resulting representation is essentially a polar histogram of the color wheel, with the intent of capturing the colors of the walls, ceiling, and floor coverings, which often change depending on the region of the indoor environment viewed by the camera.
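A minimal sketch of this baseline using OpenCV's HSV conversion; the bin count is an illustrative choice, not the setting used in these experiments:

```python
import cv2
import numpy as np

def hue_histogram(bgr_image, bins=36):
    """Baseline color representation: a normalized histogram of hue values
    in HSV space (a polar histogram of the color wheel)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hue = hsv[:, :, 0].astype(int)          # OpenCV 8-bit hue lies in [0, 180)
    hist = np.bincount(hue.ravel() * bins // 180, minlength=bins).astype(float)
    return hist / hist.sum()
```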

To use these image representations in a localization filter, it is necessary to produce

an estimate of the probability that an image representation z was produced from pose

x. To compute this probability using an approach analogous to that of laser range-

finders, a textured 3D model of the world would need to be projected into the camera

frame of each particle in the particle filter, followed by the computation of some sort

of distance function. This would be computationally difficult, even on high-power

GPU hardware. Instead, a coarse, yet experimentally justified, approximation was

used: p(z|x) was estimated through a nearest-neighbor lookup on the training-set

images $y_i^{\mathrm{img}}$ and poses $y_i^{\mathrm{pose}}$ in histogram space. A histogram distance metric was

augmented with a penalty for using images that were far from the candidate pose x.

Intuitively, if the pose x is at the exact position of a pose in the training set, and the corresponding image histograms are identical, p(z|x) should be very high. Furthermore, p(z|x) should fall off smoothly as the image and pose start to differ from the training image histogram $y_i^{\mathrm{hist}}$ and training image pose $y_i^{\mathrm{pose}}$, so that query images taken near (but not exactly on) the poses of the training images will still receive a significant probability. Conversely, if the query image z is significantly different from the training image $y_i^{\mathrm{hist}}$, or the candidate pose x is significantly different from the map image pose $y_i^{\mathrm{pose}}$, the probability should be very small.

Various probability distributions were tested, and it was found experimentally

that the heavy tails of a Laplacian distribution were better suited for this sensor than

a Gaussian distribution. Two parameters, λ1 and λ2, allow for independent scaling


[Figure 2.5 plot: x-axis "RSSI difference from means"; y-axis "Frequency (2-second meas. intervals)"; title "25 stationary observations of 34 transmitters".]

Figure 2.5: Empirical justification of the Gaussian + uniform model of the WiFi power measurements. The plot shows the frequency of power measurement deviations from their respective means. This dataset was gathered while sitting stationary for 60 seconds, and includes 34 transmitters, most of which were observed 25 times.

between the histogram distance and the pose distance. Penalization was added for

yaw deviation, as the query image and the training image should be pointed in nearly

the same direction for comparison to be meaningful. The combined model first finds

the nearest neighbor, using the aforementioned weighted distance metric, and then

models that distance as a zero-mean Laplacian distribution:

$$p(z|x) \propto \exp\left( -\frac{\min_i \left[ \lambda_1 \left\| z - y_i^{\mathrm{img}} \right\|_1 + \lambda_2 \left\| x - y_i^{\mathrm{pose}} \right\|_2 \right]}{\sigma} \right) \tag{2.1}$$

Large changes in ambient illumination will cause low-cost cameras to have numer-

ous artifacts, such as higher noise when the camera gain must be raised in dim lighting

to maintain a short exposure time. This, in turn, will cause a different number of in-

terest points to be found in the image, resulting in vertical shifts of the histogram.

To provide some measure of invariance to global illumination for the SURF-based

method, the image histograms were normalized before computing their distance.
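A compact sketch of this sensor model, following Eq. (2.1) (the yaw penalty is omitted for brevity, and all parameter values are illustrative):

```python
import numpy as np

def camera_likelihood(z_hist, x_pose, train_hists, train_poses,
                      lam1=1.0, lam2=1.0, sigma=1.0):
    """Nearest-neighbor lookup with a Laplacian fall-off, as in Eq. (2.1)."""
    d_img = np.abs(train_hists - z_hist).sum(axis=1)       # L1 histogram distance
    d_pose = np.linalg.norm(train_poses - x_pose, axis=1)  # L2 pose distance
    d_min = np.min(lam1 * d_img + lam2 * d_pose)
    return np.exp(-d_min / sigma)                          # proportional to p(z|x)
```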


2.3.4 WiFi Sensor Model

WiFi signal power measurements do not suffer from the correspondence-matching

problem often associated with robotic sensors. Signal power measurements from

scanning WiFi radios are returned with the transmitter’s MAC address, a 48-bit

number unique to the hardware device (barring pathological spoofing cases). Thus,

even though the power measurement is noisy, WiFi observations can provide excellent

context for global localization.

To simplify the probabilistic treatment, conditional independence of the WiFi sig-

nals was assumed. This assumption is impossible to justify without access to the con-

figuration and firmware of the WiFi radio, and we suspect that the assumption does

not strictly hold. For example, if two WiFi radios are broadcasting on the same channel,

a nearby radio may mask the presence of a more distant radio, or in some infras-

tructure deployments, the same radio may broadcast more than one MAC address.

However, assuming conditional independence was found experimentally to provide

a useful likelihood function, and has the significant added benefit of computational

simplicity.

To model the WiFi noise, a Gaussian distribution summed with a uniform distribution was used. This was empirically justified by the stationary observations

shown in Figure 2.5, which were gathered from 34 transmitters over 60 seconds.

There is a Gaussian-like bump around the expected mean, and a small number of

large deviations on both sides. More formally, for a set of signal power measurements $z_i$ and a robot pose $x$,

$$p(z|x) \propto \prod_i \exp\left( -\frac{\left\| z_i - h_i(x) \right\|_2^2}{\sigma^2} \right) \tag{2.2}$$

where $h_i(x)$ is the predicted power measurement for transmitter i at pose x. To

make this prediction, we simply employ nearest-neighbor over the training set: since

each observation in the training set occurred at a known location (thanks to the laser

scanner employed at training time), we build up a pre-computed map of the nearest-

neighbor prediction of the WiFi signal power levels. A sample nearest-neighbor map

is shown in Figure 2.6. A nearest-neighbor map was computed for each MAC address


Figure 2.6: Pre-computed nearest-neighbor prediction of the WiFi signal strength of a particular MAC address at any point in the environment. The walls of the environment are overlaid for clarity.

seen in the training set. With these maps, the computation of p(z|x) is linear in the

number of MAC addresses in z.
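A sketch of this computation, assuming the per-MAC nearest-neighbor maps have been rasterized into 2D grids at some resolution (all names and parameter values here are illustrative, not the actual implementation):

```python
import numpy as np

def wifi_likelihood(scan, pose, nn_maps, res=0.25, sigma=6.0, uniform=1e-3):
    """Gaussian + uniform model of Eq. (2.2) over one WiFi scan.

    scan:    dict mapping MAC address -> measured power
    nn_maps: dict mapping MAC address -> 2D grid of predicted power h_i(x),
             rasterized at `res` meters per cell
    """
    logp = 0.0
    for mac, power in scan.items():
        grid = nn_maps.get(mac)
        if grid is None:
            continue                       # transmitter unseen at training time
        gy = int(np.clip(pose[1] / res, 0, grid.shape[0] - 1))
        gx = int(np.clip(pose[0] / res, 0, grid.shape[1] - 1))
        h = grid[gy, gx]                   # nearest-neighbor power prediction
        gauss = np.exp(-(power - h) ** 2 / sigma ** 2)
        logp += np.log(gauss + uniform)    # uniform term absorbs RF anomalies
    return np.exp(logp)
```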

2.3.5 Localization

Once the sensor models are acquired, they can be incorporated into a particle filter,

to introduce temporal constraints on the belief state and to fuse the models in a

systematic fashion. The particle filter used was Monte Carlo Localization (MCL)

as described in [18]. The update step of the particle filter requires a motion model.

As previously described, magnetometers were found to be unreliable in the steel-

framed building used in these experiments, preventing reliable direct observation of

the heading changes of the low-cost sensor package. Instead, a motion model was

used to continually hypothesize motions of the pedestrian.

The motion model was empirically developed to match the trajectories observed by

the laser-equipped ground-truth pedestrian. The motion model assumes that pedes-

trians usually travel in the direction they are facing, and this direction usually does

not change. This behavior was modeled by sampling the future heading from a Gaus-

sian distribution N1 centered on the previous heading. The velocity of the pedestrian

was sampled from a Gaussian distribution N2 with a mean of 1.2 meters/second,

which was empirically found using the LIDAR-based pedestrian localizer. These dis-

tributions are summed with a 2-D zero-mean Gaussian N3 to encourage diversity in


Figure 2.7: Visualization of the unified vision + WiFi localization system. Upper-left shows the particle cloud, which is overshadowed by the centroid of the particle distribution (yellow) and the ground-truth position (cyan crosshairs). Right shows the current camera image, with SURF keypoints circled. Lower-left shows the joint likelihood of the WiFi observations. Extreme lower-left visualizes the histogram of the bag-of-words representation of the image.

the particle filter. More formally, to sample from the motion model,

$$v' = R_\theta \begin{bmatrix} \mathcal{N}_2(\mu_{\mathrm{vel}}, \sigma_1) \\ 0 \end{bmatrix} \tag{2.3}$$

$$\begin{bmatrix} x' \\ y' \\ \theta' \end{bmatrix} = \begin{bmatrix} x \\ y \\ \theta \end{bmatrix} + \begin{bmatrix} \mathcal{N}_3(0, \sigma_2) + v' \\ \mathcal{N}_1(0, \sigma_3) \end{bmatrix} \tag{2.4}$$

The parameters to this model were tuned in the LIDAR-based localization sce-

nario, where the time between each laser scan was 27 milliseconds. To scale up to

the larger intervals seen in the WiFi- and camera-based filters, particles were simply

propagated through the previous equations the appropriate number of times. Run-

ning the model for one second produces the particle distribution shown in Figure 2.8


[Figure 2.8 plot: "Motion Model"; x-axis "longitudinal axis (meters)"; y-axis "lateral axis (meters)".]

Figure 2.8: Pedestrian motion model, shown after a one-second integration. Without odometry, the particle filter must generate sufficient diversity in its hypotheses to handle corners.

(dimensions in meters).
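A sketch of one propagation step of this model per Eqs. (2.3)-(2.4); µvel = 1.2 m/s follows the text, while dt and the σ values shown are illustrative assumptions:

```python
import numpy as np

def propagate(particles, dt=0.027, mu_vel=1.2, s1=0.2, s2=0.05, s3=0.1):
    """One motion-model step; particles is an (n, 3) array of (x, y, theta)."""
    n = particles.shape[0]
    v = np.random.normal(mu_vel, s1, n) * dt      # sampled forward speed, Eq. (2.3)
    theta = particles[:, 2]
    # R_theta rotates the forward velocity into the world frame
    particles[:, 0] += v * np.cos(theta) + np.random.normal(0, s2, n)
    particles[:, 1] += v * np.sin(theta) + np.random.normal(0, s2, n)
    particles[:, 2] += np.random.normal(0, s3, n) # heading diffusion, Eq. (2.4)
    return particles
```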

The motion model also encodes the fact that the target cannot go through walls.

As a result, when the target platform passes an intersection of corridors, particles are

rapidly generated to implicitly cover the possibility that the pedestrian has turned.

As is common practice in particle filters, to prevent premature convergence of the

particles during global localization, and to handle unmodeled effects when tracking

(e.g., lens flare when facing a sunbeam from a window, passers-by occluding the

camera, RF anomalies, etc.), a uniform distribution was added to the measurement

models described in the previous section.

2.4 Results

To quantify the performance of the system, two data sets were collected from the

second floor of the Stanford University Computer Science Department. The data sets

were approximately 13 minutes long and contain paths approximately one kilometer


Figure 2.9: Images from the training set (top) differed from images in the test set (bottom) due to illumination changes and typical furniture re-arranging.

long. The first data set was recorded in the daytime, and the second data set was

recorded at night, several days later, as shown in Figure 2.9. In the interim,

many chairs were moved around in meeting spaces, different office doors were open (or

shut), and some clutter was moved. No effort was made to normalize the environment

between testing and training, other than to ensure that interior lights were turned

on. However, no major renovations, redecorations, or organized clean-up occurred in

the intervening days.

The first data set was used solely for training the sensor models. The second

data set was used to generate localization estimates using the models learned from

the first data set. The backpack shown in Figure 2.1 was worn while collecting both

datasets, to permit quantitative analysis of localization errors of the low-cost sensors

with respect to the LIDAR localization scheme, which is the best available estimate

of ground truth.



Figure 2.10: The ground-truth LIDAR track of the 1-kilometer test set used for quantitative evaluation. The test set contained 62 corner turns, and a mixture of navigating tight corridors and open meeting spaces. Distances are in meters.

Data was collected on a laptop carried by a pedestrian. A handheld Logitech

Webcam Pro 9000 was run at 640x480, 30 frames per second, and the raw YUV

frames were recorded to disk. The internal WiFi card (Intel 4965) of a Dell m1330

laptop provided the WiFi data. Code was adapted from the Linux “iwlist” command

to scan the RF environment every two seconds. Accelerometer data was provided by

a handheld MicroStrain 3DM-GX2 at 100 Hz. All of these sensors are comparable in

performance to those in high-end smartphones; a laptop was used simply for ease of

data-collection and storage.

Empirical evaluation was performed on several vision methods, on WiFi by itself, and finally on a combination of WiFi and the empirically best vision method.

The first evaluation benchmark is a histogram of the localization errors observed

on a 1-kilometer path through the test environment. This test dataset included 62

corners, and is shown in Figure 2.10.

Table 2.1: Quantitative results for the tracking task

Metric                   WiFi   SURF   HoG    Color   SURF+WiFi
mean error (m)           1.81   0.73   4.31   10.90   0.78
std. dev. of error (m)   0.99   0.57   9.15   10.65   0.64

The results of evaluating single-level SURF and dense HoG, color-histogram, WiFi,

and SURF+WiFi on the “tracking” benchmark are shown in Figure 2.11. The SURF


Figure 2.11: Histograms of localization errors during the tracking benchmark on a continuous 1-kilometer test set. Errors are measured with respect to LIDAR ground-truth. The SURF and HoG performance is for the global (1-level) spatial pyramid. Adding WiFi to SURF slightly decreases its long-term average tracking accuracy. The color histogram performs poorly.

method outperforms the WiFi method, and the combined SURF+WiFi system per-

forms no better than the SURF-only system. The dense HoG method does signif-

icantly worse, and the color histogram method performs poorly. The quantitative

results are shown in Table 2.1.

The second benchmark measures the speed of global localization by averaging the

localization error as a function of the time since the localizer was started. These

results were computed by starting the localization systems on the test data at 200

regularly-spaced starting points. A graphical plot is shown in Figure 2.12.

This benchmark reveals an interesting duality of the sensor suite: the WiFi sys-

tem, thanks to having an intrinsic solution to the correspondence problem, can quickly

achieve a mean error of 3-4 meters. However, due to the many sources of noise in the

WiFi signal power measurement, the WiFi-only system cannot obtain meter-level per-

formance. In contrast, the best visual methods (2- and 3-level SURF spatial pyramids)

are able to obtain excellent tracking results, but take much longer to converge due to


[Figure 2.12 plot: title "Global localization performance (200 different starting positions)"; x-axis "Time (seconds), 1 picture per second, 1 WiFi scan per 1.8 seconds"; y-axis "Mean localization error (meters)"; series: 1-, 2-, and 3-level SURF; 1-, 2-, and 3-level dense HoG; color histogram; WiFi; WiFi + 3-level SURF.]

Figure 2.12: Global localization performance. The localization systems were started with a uniform prior at 200 different starting points in the test set. Errors against ground-truth were averaged at each timestep to show the expected convergence properties of each system. All methods show improvement as more observations are incorporated. The combination of WiFi and the best visual algorithm (3-level spatial pyramid of SURF descriptors) produces the best performance.

the repetitive nature of some regions of the test environment (e.g., long corridors), or

the inherent ambiguity of some starting positions (e.g., facing the end of a corridor).

Probabilistic fusion of the best visual method (3-level SURF spatial pyramid)

and the WiFi measurements produces a system that combines the strengths of both

modalities: quick global convergence to an approximate position fix, followed by

precise tracking. The particle filter performs the sensor fusion automatically, using

the sensor models and the motion model.


2.5 Summary

This chapter presented a precision indoor localization system which uses only low-

cost sensors that are several orders of magnitude less expensive and less accurate than those

typically found on a research robot. The method requires no environment instrumen-

tation or modification. The chapter also described the implementation and testing of

the system, demonstrating its effectiveness at sub-meter localization in a test environ-

ment. The results indicate sensor fusion is essential, as WiFi is effective for fast global

convergence, whereas computer vision is preferred for high-precision tracking. Such

a system could be readily ported to a low-cost personal robot navigating in typical

home and office environments, enabling “reasonable” localization performance suffi-

cient to permit autonomous navigation, at a small fraction of the cost of canonical solutions employing scanning laser rangefinders.


Chapter 3

High-resolution Depth Sensing

3.1 Introduction

Personal robots are expected to operate in the unstructured environments of typical

homes and workplaces. Although humans effortlessly use complex objects in such

environments, this domain is enormously more complex than the highly engineered, closed workcells of typical successful industrial

robots. The previous chapter presented a system intended to allow low-cost robots to

localize themselves in such environments, but localization is only part of the problem:

viable personal robots must also accomplish useful work. For many applications, this

will require robust perception capabilities to recognize objects in cluttered everyday

scenes.

This chapter will describe a series of experiments showing the utility of high-

resolution 3D sensing on mobile manipulators. Just as the change from sonar-based

sensing to laser-based sensing enabled drastic improvement of navigation and mapping

systems in mobile robotics, the results shown in this chapter suggest that dramatically

improving the quality of depth estimation on mobile manipulators could facilitate new

classes of algorithms and higher levels of performance, as shown in Figure 3.1.

It must be noted that these experiments were performed when the state of the art

of 3D data on mobile manipulators was either tilting time-of-flight laser scanners, or

relatively noisy time-of-flight 3D cameras. As such, the usage of depth information on


large mobile manipulators was not yet standard practice. Since the original publica-

tion of these experiments, the field has been upended by the advent of low-cost depth

cameras typified by the Microsoft Kinect. However, the experiments of this chapter

remain relevant and the overall argument is still valid: high-resolution, high-fidelity

depth data can dramatically increase the utility of mobile manipulators in everyday

environments. In support of this idea, this chapter presents two scenarios where high-

accuracy 3D data is useful to large mobile manipulators operating in the cluttered

environments that one would expect to find in deployments of personal robots.

The first scenario involves object detection. In many tasks, a mobile manipulator

needs to search for an object class in a cluttered environment. This problem is chal-

lenging when only visual information is given to the system: variations in background,

lighting, scene structure, and object orientation exacerbate an already-difficult prob-

lem. This chapter demonstrates that augmenting state-of-the-art computer vision

techniques with high-resolution 3D information results in higher precision and recall

than is achievable by either modality alone.

The second scenario involves manipulator trajectory planning. The following sec-

tions demonstrate closed-loop perception and manipulation of door handles using

information from both visual images and the 3D scanner. The high-resolution 3D

information helps ensure that the trajectory planner keeps the manipulator clear of

the door while still contacting the door handle.

An application experiment is then presented which combines these capabilities

to perform a simple inventory-control task. The mobile manipulator enters several

offices, searches for an object class, and records the detected locations.

3.2 Related Work

Augmenting computer vision algorithms with 3D sensing has the potential to reduce

some of the difficulties inherent in image-only object recognition. Prior work has

shown that low-resolution depth information can improve object detection by remov-

ing object classifications which are inconsistent with training data. For example,

objects are usually not floating in the air, some object classes are unlikely to be on


Figure 3.1: Several off-axis views of a raw scan of a coffee mug obtained by our scanner from 1.2 meters away. The 5mm-thick handle is prominently visible. Approximately 1000 points of the scan are on the surface of the coffee mug, despite the fact that it comprises only 5% of the horizontal field-of-view of the scan.

the floor, and many object classes have upper and lower bounds on their absolute

size [33].

However, if a depth sensor’s noise is comparable to the size of the target object

classes, it will be hard-pressed to provide more than contextual cues. The difference

between a stapler and a coffee mug, for example, is only several centimeters in each

dimension. Indeed, many objects designed for manipulation by human hands tend

to be similarly sized and placed; thus, using depth information to distinguish among

them requires sub-centimeter accuracy.

Unfortunately, many current sensing technologies have noise figures on the cen-

timeter level when measuring from 1-2 meters away. Ranging devices based on

time-of-flight, for example, tend to have centimeter-level noise due to the extremely

short timescales involved [61]. Additionally, time-of-flight ranging systems can intro-

duce depth artifacts correlated with the reflectance or surface normal of the target

object [24].

In contrast, the accuracy of passive stereo cameras is limited by the ability to find

precise feature matches. Stereo vision can be significantly improved using global-

optimization techniques [93], but the fundamental problem remains: many surfaces,

particularly in artificial environments, do not possess sufficient texture to permit

robust feature matching (e.g., a blank piece of paper). Efforts have recently been


Figure 3.2: Clutter makes scene understanding from only 2D visual images difficult, even in a relatively simple office environment, as many of the strong edges are not those which suggest the 3D structure of the scene.

made to combine passive stereo with time-of-flight cameras [112], but the resulting

noise figures still tend to be larger than what is achievable using a laser line scanner.

Active vision techniques use yet another approach: they project patterns onto

the scene using a video projector, and observe deformations of the patterns in a

camera to infer depth [109]. Besides the difficulties inherent in overcoming ambient

light simultaneously over a large area, the projected image must be at least roughly

focused, and thus depth of field is limited by the optical geometry. However, this

is a field of active research and great strides have been made in recent years. The

PrimeSense sensors, typified in the Microsoft Kinect depth camera, represent the state

of the art at the time of writing. A pattern is projected onto the scene in near-infrared, and its deformation by the scene is decoded using a custom ASIC implementation

to deliver 30-fps performance in a low-power, compact design.

This brief summary of the limitations of alternative 3D sensing modalities is bound

to change with the continual progress being made in each of the respective areas of

inquiry. The work in this chapter seeks to explore the potential benefits of highly

accurate 3D data for mobile manipulators. As the various 3D modalities continue to

improve, their data could be used by the algorithms described in this chapter. However,

none of the extant 3D modalities are currently able to match the resolution and sharp


depth discontinuities that emerge from a triangulation-based laser line scanner. For

the purposes of this study, several laser line triangulation systems were constructed

to explore how high-quality 3D data can improve the performance of mobile manip-

ulation.

Laser line triangulation was selected because millimeter-level accuracy is readily

achievable. This is on the order of the accuracy we have been able to achieve in sensor-

to-manipulator calibration; further increases in sensing accuracy would thus continue

to improve the perceptual capabilities of a robot, but not necessarily improve the

performance of its hand-eye calibration.

Laser line scanners have proven useful in manufacturing, as is well documented

both in the research literature [55] and by the numerous products available in the

marketplace. They have been often used in fixed settings, where objects are placed

on a rotary table in front of the scanner [44] or flow by on conveyor belts. Low-

cost implementations have been designed which rely on a known background pattern

instead of precision hardware. Triangulation-based laser scanners have also been

used on planetary rovers to model rough terrain [60], to find curbs for autonomous

vehicles [62] and to model archaeological sites and works of art [54].

Numerous out-of-the-box triangulation systems are commercially available for

imaging small objects. However, many of these systems emphasize high accuracy

(< 0.1mm), often sacrificing depth of field. To be of most use to a mobile manipula-

tor, the sensor needs to cover the entire workspace of the manipulator, and “extra”

sensing range is helpful in determining how to move the platform so that a nearby

object will enter the workspace of the manipulator.

3.3 Laser Line Scanning for Robotics

3.3.1 Fundamentals

The geometry of the laser-line triangulation scheme is well studied and is repeated here only for completeness. Many variants of the underlying concepts are possible. In the

scanners described in this chapter, a rotating vertical laser line is directed into the


scene. An image formed on a rigid, horizontally-offset camera shows a line which is

deformed by the depth variations of the scene (Figures 3.3 and 3.4). On each scanline

of the image, the centroid of the laser slice is detected and used to define a ray from

the camera origin through the image plane and into the scene. This ray is intersected

with the plane of laser light defined by the angle of the laser stage, its axis of rotation,

and the 3D translation from the laser stage to the camera origin. The intersection

of the plane and pixel ray produces a single 3D point directly in the image frame,

thus avoiding the depthmap-to-image registration problem, since the 3D point cloud

is defined directly in the camera image plane.
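The core geometric step is a ray/plane intersection. A minimal sketch, with the pixel ray and the laser plane both expressed in the camera frame (the vector names are illustrative):

```python
import numpy as np

def triangulate(ray_dir, plane_point, plane_normal):
    """Intersect a back-projected pixel ray (from the camera origin) with the
    laser plane; returns a 3D point in the camera frame, or None."""
    denom = np.dot(plane_normal, ray_dir)
    if abs(denom) < 1e-9:
        return None                        # ray is parallel to the laser plane
    t = np.dot(plane_normal, plane_point) / denom
    return t * ray_dir if t > 0 else None  # keep only points in front of the camera
```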

The vertical angular resolution of the point cloud is limited by the vertical res-

olution of the camera, whereas the horizontal resolution is determined by the

rotational speed of the laser, the frame rate of the camera, and the horizontal field

of view. Depth resolution is likewise determined by a variety of factors, including

the ratio between the horizontal resolution of the camera and the field of view, the

precision of the shaft encoder on the laser stage, the ability to achieve horizontal

sub-pixel interpolation, the horizontal offset between the camera and the laser, and

the distance of the object from the camera.

3.3.2 Hardware Considerations

The prototype apparatus acquired roughly 600 images during each scan. The hori-

zontal field of view was approximately 70 degrees, and was overscanned by 10 degrees

to accommodate the depth variations of the scene. As a result, the laser line

moved approximately 0.15 degrees per frame.

The prototype scanner onboard the robot required six seconds to gather its 600

images, which were buffered in RAM on a computer onboard the robot. Subsequent image processing and triangulation steps required an additional four seconds. Such a

slow rate of acquisition means that the scanner cannot be used in fast-moving scenes.

This is a fundamental limitation of this type of laser line scanning. However, addi-

tional implementation effort could result in dramatic speedups, e.g., moving to (very)

high-speed cameras and/or performing the image processing on a graphics processor


Figure 3.3: A vertical (green) laser line projected by the robot at left is deformed as it strikes objects in the scene.

(GPU).

3.3.3 Calibration

The automatic checkerboard-finding algorithm and nonlinear solver implemented in

OpenCV [8] were used to estimate the camera intrinsics. To estimate the extrinsic

calibration between the camera and the laser, first the translation and rotation were

roughly measured by hand. Then, a flat board was scanned that was marked with

several points whose relative planar distances had been carefully measured. The lo-

cations of these points in the camera image were found and recorded. Then, the

calibration error could be quantified: an error function was created as the sum of planarity measures as well as deviations from the measured distances on the cal-

ibration board. The calibration board was imaged from several angles to cover the

workspace of the scanner. A numerical optimization routine was then used to min-

imize the sum of the errors while perturbing the parameters, randomly restarting

many times to explore many different local minima.

The resulting calibration held true except at the extreme edges of the camera


Figure 3.4: A prototype laser line scanner on the STAIR 1 robot. The laser and its rotary stage are mounted in the upper-right. Images are captured by the camera in the lower-left.

view, which was hypothesized to be due to lens effects not captured in the standard

radial and tangential distortion models. Away from the edges of the image, the

scanner shows errors in the range of 1mm when imaging flat surfaces such as doors,

desks, and walls.

To calibrate the manipulator to the scanner, it was necessary to estimate the

6D transform between the manipulator base and the camera frame. To accomplish

this, the end effector of the manipulator was touched to several points on a test

board which were easily identifiable in the camera frame. Each time the end effector

touched a target point, the forward-kinematics estimate of the end effector position

was recorded. Finally, a numerical optimization routine was employed to improve the

hand-measured estimate of the 6D transform. The resulting calibration accuracy was

approximately 5mm throughout the workspace of the manipulator.


Figure 3.5: Image channels considered by the patch-selection algorithm, along with the typical appearance of a coffee mug. Top: intensity image. Middle: gradient image. Bottom: depth image.

3.4 Object Detection

Once the scanner was calibrated, it was ready to be employed to improve the perfor-

mance of object detection. For many robotics applications, this is a critical subgoal

of a larger task: for example, in order to grasp an object, it is first necessary to detect

its presence (or absence) and localize it in the workspace.

Although the laser-line scanner geometry results in the production of depth es-

timates in the image plane, these estimates do not lie on a regular grid due to the

sub-pixel horizontal interpolation being used to estimate the center of the laser stripe.


Furthermore, some regions of the depth image will be more dense than others, de-

pending on the relative direction of the surface normal and the distance to the surface.

Thus, the depth maps were resampled so that exactly one depth estimate was pro-

duced for each RGB pixel. In the experiments described in this chapter, this was

achieved through bilinear interpolation. The resulting depthmap can then be consid-

ered as another channel in the image.

3.4.1 Sliding Windows

Sliding-window methods attempt to probabilistically match a rectangular window of

the image with a collection of features local to the window. These features are very

small “patches” of the window. The classifier can be viewed as a “black box” which

returns a high probability if the window tightly bounds an instance of the target

object class, and a low probability otherwise. To perform object detection across

an entire image, the window is shifted through all possible locations in the image at

several spatial scales.

An extension of the sliding-window approach was used to combine information

from the visual and depth channels. Similar to the state-of-the-art approach of Tor-

ralba et al. [98], the features used by the probabilistic classifier were derived from

a learned “patch dictionary.” Each patch was a very small rectangular subregion

randomly selected from a set of hand-labeled training examples. The channels con-

sidered were the original (intensity) image, the gradient image (a transformation of

the original image: edges become bright, flat regions become dark), and the depth

map discussed in the previous section. The patches were drawn separately from these

three channels, and probabilistically represented the visual appearance (intensity or

edge pattern) and shape (depth profile) of a small region of the object class, as shown

in Figure 3.5.

Combined, the patches provided a generalized representation of the entire object

class that is robust to occlusion and appearance or shape variation. Each dictionary

entry contained the patch g, its location within the window containing the positive

example w, and the channel from which it was drawn c (intensity, gradient, or depth).


Figure 3.6: Examples of localized patches from the coffee-mug dictionary. Left: Intensity patches. Middle: Gradient patches. Right: Depthmap patches.

A patch response for a particular window was computed by measuring the similarity

of the corresponding region within the window to the stored patch.

More formally, let the image window be represented by three channels $\{I^i, I^g, I^d\}$ corresponding to intensity, gradient, and depth, respectively. Then the patch response for patch $p = \langle g, w, c \rangle$ is

$$\max_{w'} d_c(I^c_{w'}, g)$$

where $d_c(\cdot)$ was a similarity metric defined for each channel. To improve robustness to minor spatial variations, $w'$ ranged over a $7 \times 7$ pixel grid centered on the original patch location from the training set. This allowed the patches to "slide" slightly within the

window being tested.

The similarity between patches was computed using normalized cross-correlation.

The intensity and gradient channels were normalized by subtracting the average

(mean) from the window. The depth channel was normalized by subtracting the

median depth.
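A sketch of a single patch response under these conventions (normalized cross-correlation, maximized over a 7 × 7 grid of offsets; the names and margins are illustrative):

```python
import numpy as np

def patch_response(channel, g, cx, cy, is_depth=False):
    """Max normalized cross-correlation of stored patch g against the channel,
    evaluated over a 7x7 grid of offsets around (cx, cy)."""
    ph, pw = g.shape
    # depth patches are normalized by the median, others by the mean
    p = g - (np.median(g) if is_depth else g.mean())
    best = -np.inf
    for dy in range(-3, 4):
        for dx in range(-3, 4):
            y, x = cy + dy, cx + dx
            win = channel[y:y + ph, x:x + pw]
            if win.shape != g.shape:
                continue                   # offset fell outside the window
            w = win - (np.median(win) if is_depth else win.mean())
            denom = np.linalg.norm(w) * np.linalg.norm(p)
            if denom > 0:
                best = max(best, float((w * p).sum() / denom))
    return best
```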

3.4.2 Learning the Classifiers

The preceding discussion assumed that the classifiers were already known. This

section will describe how the classifiers were built from training data.

For each object class, a binary gentle-boost classifier [28] was learned over two-split

decision stumps in the following steps (a sketch of the retraining loop follows the list):


Figure 3.7: Precision-recall curves for mugs (left), disposable cups (middle), and staplers (right). Blue solid curve is for our method; red dashed curve is for vision-only detectors. Scores are computed at each threshold by first removing overlapping detections. A true-positive is counted if any detection overlaps with our hand-labeled ground truth by more than 50%. Any detection that does not overlap with a groundtruth object of the correct class is considered a false-positive. Average Precision measures 11-point interpolated area under the recall vs. precision curve. Greater area under the curve is better.

• Construct a training set by cropping positive examples and random negative

windows from the training images.

• Build an initial patch dictionary by randomly sampling regions from the positive

training images, and compute patch responses over our training set.

• Learn a gentle-boost classifier given these responses.

• Trim the dictionary to remove all patches that were not selected by boosting.

• Run the classifier over the training images and augment the set of negative

examples with any false-positives found.

• Repeat the training process with these new negative examples to obtain the

final classifier.
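A high-level sketch of this loop with the classifier details abstracted behind callables (all names are illustrative; the actual gentle-boost implementation is not reproduced here):

```python
def train_detector(fit, find_false_positives, positives, negatives, rounds=2):
    """Iteratively retrain a detector, mining hard negatives between rounds.

    fit(pos, neg)             -> classifier trained on patch responses
    find_false_positives(clf) -> false-positive windows on the training images
    """
    clf = fit(positives, negatives)
    for _ in range(rounds - 1):
        negatives = negatives + find_false_positives(clf)  # hard-negative mining
        clf = fit(positives, negatives)
    return clf
```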

Since we are learning two-split decision stumps, our classifiers are able to learn

correlations between visual features (intensity patterns and edges) and object shape

(depth). Example patches from a coffee-mug classifier for the three image channels are


shown in Figure 3.6. This figure shows a typical sample of 12 of the approximately 50 patches selected by the algorithm.

Five-fold cross-validation was performed to evaluate the performance of the de-

tectors and compare them against state-of-the-art detectors that did not use depth

information. The dataset consisted of 150 cluttered office scenes, with several objects

in each scene. The training procedure outlined above was used for each detector and

the average performance was calculated over the hold-out sets. Results for coffee

mugs, disposable cups, and staplers are shown in Figure 3.7 and the following table:

                    Mug             Cup             Stapler
                    3D      2D      3D      2D      3D      2D
Max. F-1 Score      0.932   0.798   0.922   0.919   0.662   0.371
Average Precision   0.885   0.801   0.879   0.855   0.689   0.299

The results suggest that 3D information helps eliminate false positives.

The 2D detectors seldom miss instances of their trained object class. Instead, the

typical problem is that they can collect a variety of disparate cues from shadows

or unrelated objects that together match enough of the localized patches that the

sliding-window detector considers it a high-probability detection.

The 3D information can help in this regard: the training process often automati-

cally selects relatively large, uniform depth patches. Effectively, this associates higher

probabilities to windows which tightly bound a single object rather than a collection

of several disparate objects. Because the approach described in this chapter did

not normalize for depth variation inside a patch, but only subtracted its median, the depth

patches thus also encoded a measure of the absolute size of an object. These depth

cues were not explicitly expressed in the visual-light image, and as is common in

machine learning systems, presenting a richer set of features to the classifier helped

to boost performance.

3.5 Door Opening

Automation of a large variety of home and office tasks will require that robots rou-

tinely open and pass through doors. For example, at the end of a workday a typical


Figure 3.8: After localizing the door handle in the 3D point cloud, the robot can plan a path to the handle and open the door.

office building will have tens or hundreds of closed doors that must be opened if

the robot is to clean the building or search for an item. The ability to open a door

thus needs to be another primitive in the robot’s navigation toolbox, alongside path

planning and localization. In this section, a door-opening system is summarized, to

emphasize the utility of high-resolution 3D sensing for mobile manipulation.

Door opening requires manipulating the door handle without colliding with the

door. The operation of a typical door latch does not allow more than a centimeter

or two of positioning error, as the end effector is continually in close proximity to

the (rigid) door surface. Thus the door-opening task, like any grasping task where

target objects are identified in a camera, tests not only sensing accuracy but also the

calibration between the sensing system and the manipulator.

To test the utility of high-accuracy 3D sensing in this task, a system was con-

structed that used a hand-annotated map to mark the locations of doors on a 2D

building floor plan. If the robot needed to pass through one of the marked doorways,

it used the triangulation-based laser scanner described in this chapter to scan the

door. From this scan, the robot used a classifier trained on hundreds of door handles


Figure 3.9: Detecting coffee mugs in cluttered environments. The detector correctly ignored the paper cup to the right of the coffee mug.

to localize the handle and classify the door as right-handed or left-handed. The robot

then drove to a location where its manipulator could reach the door handle, planned

a manipulator path to the edge of the handle, and pressed on the handle to unlatch

the door, as shown in Figure 3.8. Once the door was unlatched and partially opened,

the robot was able to drive through the door by pushing it fully open as its chassis

(slowly) came into contact with the now-unlatched door.

High-resolution point clouds assisted in planning collision-free manipulator paths

to the door handle. Unlike some 3D sensing modalities which effectively “low-pass”

the depth map as part of the sensing process, the laser-line triangulation process did

not smooth out depth discontinuities, such as those between the door handle and the

door immediately behind it. As a result, the door handle stood out sharply in the

3D data, making path planning and recognition considerably easier.


Figure 3.10: The inventory-gathering experiment required autonomous navigation (green track), autonomous door opening, and 20 laser scans of desks in the four offices shown above. The robot position at each scan is shown by the red circles, and the field-of-view of each scan is indicated by the yellow triangles. The locations of the detected coffee mugs are indicated by the orange circles. This figure was entirely automatically generated, using the SLAM output for the map and the localization log for the robot track and sensing positions, which allow the coffee-mug detections to be transformed into the global map frame.

3.6 Inventory-Control Experiment

To demonstrate the utility of these two uses of the laser-line triangulation scanner

on our mobile manipulator, the object-detection and door-opening algorithms were

combined to form an inventory-taking system. Such a system could be envisioned in

a future home, perhaps cataloging the locations of every object in the house at night

so that the robot could instantly respond to human queries about the location of

commonly-misplaced objects. Workplace applications could include inventory-taking

in retail stores, safety inspections in industry, or location verification of movable

equipment in, e.g., hospitals.

In this system, a high-level planner sequenced a standard 2D navigation stack, the

door-opening system, and the object-detection system, which together allow the robot

to take an inventory of an object class in a cluttered office building with closed (but


unlocked) doors. The system was implemented using the ROS software framework,

which will be described in Chapter 8. A building map was created offline using the

GMapping SLAM toolkit [34], using LIDAR and odometry data. The resulting map

was hand-annotated to mark the locations of doors and desks. The runtime navigation

stack was derived from the Player localization and planning modules, which perform particle-filter localization and online path planning that unifies obstacle-avoidance and goal-seeking behaviors.

As necessary during the inventory-taking sequence, control switches to the door-

opening system discussed in the previous section, after which control is returned to

the 2D navigation stack.

A representative run of the inventory-gathering system is shown in Figure 3.10. During this run, 25 coffee mugs were spread throughout the search area. The 3D-

enhanced object detector found 24 of them, without any false positives. In contrast,

the image-only detector was only able to find 15 of the mugs, while also reporting 19

false positives. The robot was also seeking to identify disposable paper cups among

the clutter of the environment. The mug-inventory and cup-inventory results are

compared against ground truth in the following tables for both the integrated 3D

detectors and the 2D-only detectors.

3D-Enhanced Detectors

OBJECT   COUNT   HIT   ERROR   RECALL   PREC.
Mug      25      24    0       0.96     1.00
Cup      10      8     2       0.80     0.80

2D-Only Detectors

OBJECT   COUNT   HIT   ERROR   RECALL   PREC.
Mug      25      15    19      0.60     0.441
Cup      10      8     4       0.80     0.67

3.7 Summary

As shown by the PR curves obtained when using the 3D information versus 2D

alone, incorporating high-quality 3D information into the sensing scheme of a mobile

manipulator can increase its robustness when operating in a cluttered environment.


The door-opening task shows that high-quality 3D data can help accomplish motion

planning by accurately sensing the immediate vicinity of the robot.

These experiments were conducted using a simple laser-line triangulation device

which was constructed to provide high-quality depth data which preserved sharp

depth discontinuities and avoided depth-to-image registration difficulties. Many other

depth-measurement modalities exist, such as popular depth cameras using systems

from PrimeSense and other manufacturers, which are far easier to procure and op-

erate, and offer much faster speeds. These experiments were conducted before the

general availability of such off-the-shelf depth cameras. However, the data quality of

such cameras, as quantified by point density and noise, has yet to match that provided

by simple laser-line triangulation scanners, which operate at a much slower rate. Still, the quality of depth measurements produced by low-cost depth cameras is rapidly

increasing, and thus the algorithms presented in this chapter have the possibility of

becoming more applicable to commodity depth cameras in future personal robots.

High-resolution depth data can also be obtained by simply reducing the range

between the scanner and the targets of interest, as many depth-sensing modalities,

including stereo vision, have error modes which are a function of range. The results

of the work presented in this chapter inspired the integration of depth sensing into a

robotic hand which will be presented in Chapter 6, where extremely high resolution

3D point clouds can be acquired by “flying” the robotic hand near the objects of

interest.

Regardless of how such depth data is acquired, the results presented in this chapter

suggest that high-resolution depth processing is a powerful technique to address the

variability and clutter found in the everyday environments likely to be encountered

by personal robots.


Chapter 4

Inertial Joint Encoding

4.1 Introduction

The previous chapter advocated the use of high-precision 3D data to improve object

detection and manipulation of large robots equipped with manipulator arms. This

chapter moves further down the system, addressing the sensing needs of the manipu-

lator arms themselves while controlling system cost, as future deployments of personal

robots are envisioned to be extremely cost sensitive.

This chapter presents a sensing strategy which uses a series of 3D MEMS ac-

celerometers to provide a low-cost method for estimating the kinematic configuration

of robot manipulators. The approach mounts at least one 3D accelerometer for each

pair of joints. Then, the joint angles are inferred using either point estimates or

through a tracking method using an Extended Kalman Filter (EKF). Because of its

low cost, low power, and negligible volumetric requirements, this system of joint po-

sition estimation can be used to augment an existing robotic sensor suite, or it can

function as the primary sensor for a low-cost robotic manipulator.


Figure 4.1: Two manipulators used to demonstrate the utility of accelerometer-based sensing. Left: Willow Garage PR2 Alpha. Right: a prototype low-cost manipulator.

4.2 Related Work

Inertial sensing has been used extensively in recent years for human motion capture.

Several companies provide small, lightweight, and networkable inertial units which in-

tegrate accelerometers, gyroscopes, and magnetometers, along with the required supporting

electronics. These systems are easily attached to human limbs and torsos and are used

extensively in film or video-game character animation [91], virtual reality [27], [26],

or human activity recognition [3], among many existing and proposed applications.

Accelerometers are used in the attitude determination systems of virtually every

aerial vehicle, as well as for a variety of unconstrained navigation applications includ-

ing underwater vehicle guidance. Mass-market accelerometer applications began with

collision-sensing devices for automotive air-bag systems, but inertial sensors are now

found in an ever-increasing array of products, including mobile phones, tablet com-

puters, digital cameras, video game controllers, laptop hard drives, and so on. The

devices are now so small and power-efficient that new applications are continually

emerging.


In the robotics literature, accelerometers have also found other uses in robot navi-

gation beyond attitude determination. For example, when strapped to legged robots,

spectral analysis of accelerometer readings can be used to classify the walking sur-

face to aid in gait selection and tuning [103]. For robotic manipulation, prior work

includes simulation results of configuration estimation [67], employing strapdown in-

ertial units on heavy equipment [30], kinematic calibration [12], and the creation

of a fault-detection system [1]. In [58], accelerometers were doubly-integrated using

an Extended Kalman Filter (EKF) to estimate the configuration of a SCARA-type

manipulator undergoing high-dynamic motions. The use of accelerometers in flexible-

link robots was proposed in [56]. A human-robot system was proposed in [65] which

incorporated the attachment of accelerometers and other sensors to a human tele-

operator.

4.3 State Estimation

The vast majority of robotic manipulators use shaft encoders of some variety (optical,

magnetic, resistive, capacitive, etc.) to determine the kinematic configuration of the

manipulator. Shaft encoders may be placed on the joints themselves, or they may

be used on the motor shafts prior to the speed-reducing elements, to gain increased

resolution at the cost of not observing any unmodeled behavior in any downstream

transmissions or linkages. In contrast, this chapter discusses a sensing scheme based

solely on 3D MEMS accelerometers, and does not require any electromechanical shaft

encoders to produce estimates of the kinematic configuration of the manipulator.

As will be discussed in Chapter 6, such volumetric considerations become critical in

heavily-constrained design environments such as robotic hands.

In static conditions, a 3-axis accelerometer essentially returns a 3D vector pointed

away from the center of the earth. Since the length of this vector is fixed at 1g in

the static case, a static 3-axis accelerometer only exhibits two degrees of freedom.

Thus, at least one accelerometer is required for every two rotary joints in a robotic

manipulator. However, incorporating one accelerometer per rotary joint will increase

the robustness and accuracy of the joint-angle estimates and eliminate the potential


for multiple point-estimate solutions. These effects will be discussed in the following

sections.

It is important to note that given a vector of 3D accelerometer readings and

knowledge of the kinematics of the system, it is possible to estimate joint angles in

all cases except measurement singularities where an axis of rotation is vertical. As an

axis of rotation approaches the vertical, inference accuracy is reduced by the loss of

effective resolution, as the noise floor of the measurement engulfs the projection of the

gravity vector onto the joint axis. In an all-accelerometer sensing scheme, this set of

configurations must be avoided, and the severity of this limitation is dependent on the

kinematics of the manipulator and the required task. For example, accelerometer-

only sensing will completely fail on SCARA-type manipulators which always have

vertical joint axes. However, for the anthropomorphic manipulators considered in

this chapter, vertical joint configurations are readily avoidable for most degrees of

freedom. Alternatively, excursions through vertical or near-vertical joint orientations

could be gracefully handled by augmenting the accelerometer measurements with

magnetometers, back-EMF sensing, angular rate sensors, or shaft encoders of various

types.

4.3.1 Estimation via EKF

This section describes a method of producing a coherent estimate of the manipulator

state using accelerometers attached to each link of the kinematic chain. This is a

sensor-fusion problem: each accelerometer, by itself, can only determine the direction

of a gravity-directed vector. However, by using a priori knowledge of the kinematic

constraints of the manipulator, it is possible to produce a unified state estimate. In

this section, this is achieved by an Extended Kalman Filter (EKF). An augmented

forward kinematics function is used to predict the accelerometer readings given the

current belief state. The EKF algorithm then updates the belief state after observing

the true measurement. The following paragraphs will discuss this process in more

detail.

Numerous strategies can be employed to infer the manipulator configuration from


accelerometer readings. An EKF is used in this chapter due to its relatively simple

implementation and fast on-line performance. A detailed discussion of the EKF

algorithm is beyond the scope of this chapter and is presented in many excellent

texts [96]. This section only discusses the aspects of the EKF implementation unique

to this work.

To encourage smooth estimates of the joint position, the state space is defined to

include the joint angles, velocities, and accelerations:

$$x = \begin{bmatrix} \theta \\ \dot{\theta} \\ \ddot{\theta} \end{bmatrix} \tag{4.1}$$

where θ represents the joint angles of the manipulator. The state (plant) update

function implements numerical integration and assumes constant acceleration:

f(x) = \begin{bmatrix} \theta + \Delta t\,\dot{\theta} \\ \dot{\theta} + \Delta t\,\ddot{\theta} \\ \ddot{\theta} \end{bmatrix}    (4.2)

The measurement function needs to predict sensor measurements z given a state

x. As is often the case in EKF implementations, the measurement function could

be made arbitrarily complex to capture more and more properties of the system.

However, experiments showed that a measurement function which only predicts the

acceleration due to gravity was sufficient to handle the low-frequency regime of the

manipulators considered in this work. Adding additional terms to capture centripetal

accelerations and linear accelerations induced by joint-angle accelerations did not

change the performance, as will be discussed in Section 4.6.

To predict the measurement z_i of a particular 3D accelerometer \alpha_i given the state x,

z_i = R^{\alpha_i}_i R^i_{i-1}(x) \cdots R^1_0(x) R^0_w \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}    (4.3)


where R^{\alpha_i}_i represents the rotation from the accelerometer frame to the frame attached to link i, the rotations R^i_{i-1}(x) are the rotations between the link frames, and R^0_w is the rotation from the base of the manipulator to the world. The following paragraphs describe these rotations in more detail.

R^{\alpha_i}_i is determined by how the accelerometer is physically mounted on link i; this mounting rotation can be estimated during calibration.

R^i_{i-1}(x) is determined by the axial orientation of links i and i-1, as well as by the joint angle \theta_i present in the state x. The link twist can be statically estimated during calibration, but the joint angle must be recursively estimated by the EKF.

R^0_w is the rotation between the base of the manipulator and the gravitational frame. If the manipulator is stationary, this rotation is constant and can be estimated by static calibration. There are numerous situations where the R^0_w rotation is not constant and needs to be estimated, but they lie in extreme domains of robotics: manipulation on vehicles traveling rapidly across rough terrain, on spacecraft, or aboard ships in rough seas, for example. These situations are far beyond the scope of this chapter, but could be readily addressed by estimating R^0_w through various means.

Using the state-update and measurement functions, the EKF algorithm produces

a recursive estimate of the mean and covariance of the state as more timesteps and

observations are experienced. The computational requirements of an EKF of this

size are not a concern: even when computing numerical derivatives of the update and

measurement functions, updating the 18-state EKF described in this section at 100 Hz

only required 3% of a single core of a desktop CPU at time of writing. This efficiency

was due in part to the usage of the Eigen C++ library for efficient numerical code

generation, but significant speedups could also be found through analytical derivatives

of the update equation, if such proved necessary for a given target application and

computational resources.
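As a rough illustration of the filter just described, the following Python sketch implements a generic EKF with the constant-acceleration plant of Equation 4.2 and numerical Jacobians. The measurement function mapping joint angles to predicted accelerometer readings (Equation 4.3) is kinematics-specific and is assumed to be supplied by the caller; all names and noise matrices are placeholders rather than the implementation described above.

    import numpy as np

    def numerical_jacobian(fn, x, eps=1e-6):
        """Finite-difference Jacobian of fn at x."""
        y0 = np.asarray(fn(x))
        J = np.zeros((y0.size, x.size))
        for k in range(x.size):
            dx = np.zeros_like(x)
            dx[k] = eps
            J[:, k] = (np.asarray(fn(x + dx)) - y0) / eps
        return J

    class AccelerometerEKF:
        """EKF with per-joint state [theta, theta_dot, theta_ddot]."""
        def __init__(self, n_joints, dt, measure_fn, Q, R):
            self.n, self.dt = n_joints, dt
            self.measure = measure_fn            # joint angles -> stacked accel readings
            self.x = np.zeros(3 * n_joints)      # [theta; theta_dot; theta_ddot]
            self.P = np.eye(3 * n_joints)
            self.Q, self.R = Q, R

        def f(self, x):
            """Constant-acceleration plant model (Equation 4.2)."""
            n, dt = self.n, self.dt
            th, thd, thdd = x[:n], x[n:2*n], x[2*n:]
            return np.concatenate([th + dt * thd, thd + dt * thdd, thdd])

        def step(self, z):
            # Predict with the plant model.
            F = numerical_jacobian(self.f, self.x)
            self.x = self.f(self.x)
            self.P = F @ self.P @ F.T + self.Q
            # Update against the gravity-only measurement prediction (Equation 4.3).
            h = lambda x: self.measure(x[:self.n])
            H = numerical_jacobian(h, self.x)
            S = H @ self.P @ H.T + self.R
            K = self.P @ H.T @ np.linalg.inv(S)
            self.x = self.x + K @ (z - h(self.x))
            self.P = (np.eye(3 * self.n) - K @ H) @ self.P
            return self.x[:self.n]               # current joint-angle estimate

For a six-joint arm instrumented in this way, the state vector has 18 entries, matching the 18-state filter mentioned above.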

4.3.2 Point estimates

The previous section described an EKF-based method to estimate a joint vector of

a manipulator given a vector of accelerometer readings distributed throughout the

manipulator. This section will describe an alternative method, which can produce


Figure 4.2: Accelerometers are present in the F0, F2, and F3 links of the robotic finger, creating a 2-DOF estimation problem between F0 and F2, and a 1-DOF estimation problem between F2 and F3.

estimates of the joint vector given only a single set of accelerometer readings. The

tradeoff, however, is that point estimates do not provide filtering of the measure-

ment noise, and, depending on the accelerometer configuration, there may be multiple equally likely estimates of the joint vector. These issues will be

discussed in detail in the following sections.

Both the 1-DOF and the 2-DOF estimation scenarios arise in the joint-state esti-

mation of the robotic fingers described in Chapter 6. For clarity of discussion, these

cases are presented here, although the inter-related electrical and mechanical issues

will be described in Chapter 6. As shown in Figure 4.2, each finger presents a 1-DOF

and a 2-DOF estimation problem.

The simplest closed-form solution arises when two accelerometers are separated by

a single joint. In the robotic finger shown in Figure 4.2, this occurs when estimating

the F2-F3 joint angle. In this case, the joint angle can be estimated by taking the

difference between the arctangents of the projection of the gravity vectors of each

accelerometer into the plane orthogonal to the joint axis. The accuracy of this esti-

mate varies as a function of the angle between the joint axis and the gravity vector:

the accuracy is maximal when the joint axis is orthogonal to gravity (as depicted in


Figure 4.2), and it degenerates to contain no information when the joint axis is paral-

lel to gravity. As a result, the configuration of the system must be considered when

evaluating the information content of the estimate, and joint axes near the vertical

will likely be dominated by sensor noise.
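A minimal sketch of this 1-DOF estimate follows, assuming for illustration that the joint axis and both gravity readings are expressed in a common frame; in practice, the distal reading must first be rotated through the known mounting and link transforms.

    import numpy as np

    def one_dof_angle(a_prox, a_dist, axis):
        """Estimate a single joint angle from the gravity vectors measured on
        the links on either side of it (e.g., the F2-F3 joint)."""
        axis = axis / np.linalg.norm(axis)
        # Build an orthonormal basis (u, v) for the plane orthogonal to the axis.
        u = np.cross(axis, [0.0, 0.0, 1.0])
        if np.linalg.norm(u) < 1e-9:          # axis parallel to z: pick another seed
            u = np.cross(axis, [0.0, 1.0, 0.0])
        u = u / np.linalg.norm(u)
        v = np.cross(axis, u)
        # Difference of the arctangents of the in-plane gravity projections.
        ang = lambda a: np.arctan2(np.dot(a, v), np.dot(a, u))
        return ang(a_dist) - ang(a_prox)       # caller may wrap to (-pi, pi]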

Although the geometry is somewhat more tedious, a pair of accelerometers separated by two orthogonal rotary joints can also produce a closed-form solution; depending on the joint configurations with respect to gravity and the joint limits, up to four possible solutions may exist. In the robotic finger shown in Fig-

ure 4.2, this estimation scenario occurs when estimating the pair of joint angles con-

necting links F0 and F2, as accelerometers could not be fitted within the highly constrained volume of the connecting link.

The closed-form solution requires a lemma presented in [42] during an exposition of inverse kinematics, where a similar trigonometric problem often recurs; it is reproduced here for completeness. Specifically, an equation of the form

a \cos\psi + b \sin\psi = c    (4.4)

has a closed-form solution using the following change of variables:

a = r \sin\gamma    (4.5)

b = r \cos\gamma    (4.6)

thus,

r = \sqrt{a^2 + b^2}    (4.7)

\gamma = \mathrm{atan2}(a, b)    (4.8)

giving zero, one, or two solutions:

\psi = \mathrm{atan2}\left(c, \pm\sqrt{r^2 - c^2}\right) - \mathrm{atan2}(a, b)    (4.9)


Equation 4.9 will prove useful in the derivation of a closed-form point solution

to the 2-DOF separation between the accelerometers on F0 and F2 in the robotic

finger shown in Figure 4.2. Let the joint vector of the finger be defined as (0, 0, 0)

when the finger is in the fully outstretched posture shown in Figure 4.2. Furthermore,

let θ represent the proximal joint angle about the z axis, and φ represent the next

rotation about the y axis, as shown in Figure 4.2. Finally, let α0 and α2 define the

accelerometer vectors in links F0 and F2, respectively. Assuming the system is at

rest, and only the gravity vector is observed on the sensors, accelerometer α2 can

then be written as a function of accelerometer α0:

\vec{\alpha}_2 = R_y(\phi)\, R_z(\theta)\, \vec{\alpha}_0    (4.10)

= \begin{bmatrix} c_\phi & 0 & -s_\phi \\ 0 & 1 & 0 \\ s_\phi & 0 & c_\phi \end{bmatrix} \begin{bmatrix} c_\theta & s_\theta & 0 \\ -s_\theta & c_\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \vec{\alpha}_0    (4.11)

\vec{\alpha}_2 = \begin{bmatrix} c_\phi\left(\alpha_{0x} c_\theta + \alpha_{0y} s_\theta\right) - \alpha_{0z} s_\phi \\ -\alpha_{0x} s_\theta + \alpha_{0y} c_\theta \\ s_\phi\left(\alpha_{0x} c_\theta + \alpha_{0y} s_\theta\right) + \alpha_{0z} c_\phi \end{bmatrix}    (4.12)

The middle row of Equation 4.12 is of the form of Equation 4.4 and thus has solutions of the form of Equation 4.9, by substituting a = \alpha_{0y}, b = -\alpha_{0x}, and c = \alpha_{2y}. This substitution results in up to two possible estimates of the proximal joint angle \theta:

\theta = \mathrm{atan2}\left(\alpha_{2y}, \pm\sqrt{\alpha_{0x}^2 + \alpha_{0y}^2 - \alpha_{2y}^2}\right) - \mathrm{atan2}\left(\alpha_{0y}, -\alpha_{0x}\right)    (4.13)

Depending on the joint limits of the mechanism, it is often possible to eliminate

one of the estimates θ. For example, the proximal joint of the robotic finger shown in

Figure 4.2 has a range of motion of ± 90 degrees. Assuming that the estimation of θ

can be reduced to a single solution due to kinematic constraints, the value of \phi, the distal joint of the pair, can then be estimated using either the top or bottom row of Equation 4.12, with another substitution into Equation 4.9. Using the top row


of Equation 4.12, the substitutions are a = \alpha_{0x} c_\theta + \alpha_{0y} s_\theta, b = -\alpha_{0z}, and c = \alpha_{2x}, resulting in the following expression for \phi:

\phi = \mathrm{atan2}\left(\alpha_{2x}, \pm\sqrt{\left(\alpha_{0x} c_\theta + \alpha_{0y} s_\theta\right)^2 + \alpha_{0z}^2 - \alpha_{2x}^2}\right) - \mathrm{atan2}\left(\alpha_{0x} c_\theta + \alpha_{0y} s_\theta, -\alpha_{0z}\right)    (4.14)
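The following sketch strings Equations 4.13 and 4.14 together into a candidate-enumeration routine. The variable names, the angle-wrapping helper, and the joint-limit filtering are illustrative assumptions consistent with the plus-or-minus 90 degree range cited above.

    import numpy as np

    def wrap(angle):
        """Wrap an angle to (-pi, pi]."""
        return (angle + np.pi) % (2.0 * np.pi) - np.pi

    def two_dof_point_estimates(a0, a2, theta_limits=(-np.pi / 2, np.pi / 2)):
        """Enumerate candidate (theta, phi) pairs from accelerometer readings
        a0 (link F0) and a2 (link F2), per Equations 4.13 and 4.14."""
        candidates = []
        disc_t = a0[0]**2 + a0[1]**2 - a2[1]**2      # under the radical in Eq. 4.13
        if disc_t < 0.0:
            return candidates
        for sign_t in (1.0, -1.0):
            theta = wrap(np.arctan2(a2[1], sign_t * np.sqrt(disc_t))
                         - np.arctan2(a0[1], -a0[0]))
            if not (theta_limits[0] <= theta <= theta_limits[1]):
                continue                              # pruned by the joint limits
            a_ = a0[0] * np.cos(theta) + a0[1] * np.sin(theta)
            disc_p = a_**2 + a0[2]**2 - a2[0]**2      # under the radical in Eq. 4.14
            if disc_p < 0.0:
                continue
            for sign_p in (1.0, -1.0):
                phi = wrap(np.arctan2(a2[0], sign_p * np.sqrt(disc_p))
                           - np.arctan2(a_, -a0[2]))
                candidates.append((theta, phi))
        return candidates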

4.4 Calibration

The accelerometer-based sensing framework described in this chapter can be used to control a low-cost manipulator lacking other sensors, as well as to provide

an additional absolute configuration sensor on a manipulator equipped with another

primary sensing system. The second application can be useful when the primary

sensing modality is incremental, and the manipulator goes through a homing sequence

after initial power-up. An initial estimate of the manipulator state, even if far less

accurate than the primary sensing modality, could allow for a more efficient homing

sequence. The calibration accuracy achievable using accelerometer-based sensing in

both scenarios is evaluated in the following sections.

Depending on the fabrication methods employed to construct the manipulator

in question, some of the static link parameters may not be known a priori to high

precision. In addition, the accelerometer axes may have internal misalignments, and

additional misalignments are incurred as the chip is soldered to its circuit board

and that circuit board is subsequently mounted to the robotic manipulator. These

misalignments are cumulative, and can result in rotations of several degrees between

the measurement axes and the axes of the kinematic frames. Fortunately, these

mechanical imperfections are static, and thus can be modeled and calibrated away.

To demonstrate a calibration technique for low-cost manipulators using accelerom-

eters as a primary sensing modality, an unactuated arm was constructed, shown in

Figure 4.3. This 4-dof unpowered arm has a roughly spherical shoulder and a single

elbow joint, with link lengths similar to those of the manipulator shown in Figure 4.1. The

same accelerometers were used as on the powered manipulator shown in Figure 4.1.


Figure 4.3: Left: an unpowered arm used to evaluate the calibration potential of the accelerometer-based sensing approach. Right: touching the end effector to points on the calibration board.

The calibration scheme for the low-cost prototype was loosely derived from the

checkerboard-based calibration method widely used in computer vision [110]. A pla-

nar calibration board was placed in the workspace of the manipulator. The end

effector of the manipulator was touched to its corners, each time recording the corre-

sponding accelerometer readings. Then, the board was translated and rotated out of

plane, and the process was repeated to collect a dataset of 20 different measurements

covering the manipulator workspace. Because the sizes of the checkers are known

to high precision, these measurements served to provide scale and skew constraints

to the calibration. This data was augmented by collecting a very large number of

accelerometer readings of manipulator configurations where the end effector was in

contact with a large planar surface such as a tabletop. This large dataset served to

provide planarity constraints.


Formally, the optimization problem can be written as:

\arg\min_{L,R}\; g_1(\alpha, L, R) + \lambda_2 g_2(\alpha, L, R) + \lambda_3 g_3(R)    (4.15)

where \alpha are the accelerometer readings in the test set, L represents the estimated link parameters, and R represents the rotation matrices modeling the misalignment of each accelerometer frame with respect to its link frame.

The first term enforces the known scale of the calibration board:

g_1(\alpha, L, R) = \sum_{i,j} \left\| \hat{d}_{ij} - d_{ij} \right\|    (4.16)

where \hat{d}_{ij} is the distance between the estimated positions of the manipulator:

\hat{d}_{ij} = \left\| FK(\alpha_i, L, R) - FK(\alpha_j, L, R) \right\|    (4.17)

where FK applies the forward-kinematics joint-angle estimation algorithm described in the previous section to produce estimates of the end-effector positions from the accelerometer readings and the link parameters L. The subscripts i and j identify samples from the training set which were gathered from recorded positions on the calibration checkerboard pattern, and which therefore correspond to end-effector positions whose ground-truth distance d_{ij} is known.

The second term of the calibration optimization function, g2, corresponds to the

planarity constraint imposed on the large number of manipulator configurations in

which the end-effector was touching a tabletop which is assumed to be planar. Let

P be the plane fitted by taking the vectors corresponding to the top two singular values of Y^T Y, where Y is the n \times 3 matrix whose rows y_i consist of the end-effector positions calculated using the estimated calibration. The sum of the distances between the estimated end-effector positions and their projections onto the fitted plane provides a measure of the severity of the miscalibration: in an ideal calibration, this sum would be zero. Thus:


Figure 4.4: Hold-out test set error during optimization convergence on the prototype manipulator. The horizontal axis shows the iteration number, and the vertical axis shows the mean of the miscalibrations. Numerical optimization drives the average error from 11 mm to 2 mm.

g_2(\alpha, L, R) = \sum_{y_i} \left\| y_i - \mathrm{proj}_P(y_i) \right\|    (4.18)

The final term of the optimization function, g_3, encourages the misalignment rotation matrices R_i to remain orthonormal during the optimization:

g_3(R) = \sum_i \left\| R_i^T R_i - I_3 \right\|    (4.19)

To evaluate the utility of an accelerometer-based sensing scheme for a robot which

already has high-precision configuration sensing, data was collected using a Willow

Garage PR2 Alpha robot. This robot has two 7-DOF arms equipped with optical

shaft encoders on the motors driving each joint. The manipulator was already well calibrated, and the link parameters were known a priori to high precision. In this case,

the calibration task is simplified, needing to estimate only the rotations Ri of the

mounting of each accelerometer on its respective link. The joint angles from the shaft

encoders were treated as ground truth, and compared with estimates produced from


Figure 4.5: Hold-out test set error during optimization convergence on the Willow Garage PR2 Alpha. The horizontal axis shows the iteration number, and the vertical axis shows the mean error in the joint angle estimates of the shoulder lift and the upper arm roll. The optimization drives the average error from 0.1 deg to 0.02 deg.

the accelerometer readings to calibrate the misalignment rotations. The resulting optimization problem, shown below, is similar to Equation 4.15:

\arg\min_R\; \sum_i \left\| \theta_i - \hat{\theta}_i(\alpha, R_i) \right\| + \lambda \sum_i \left\| R_i^T R_i - I_3 \right\|    (4.20)

where \theta_i is the joint angle of the manipulator as given by the shaft encoders and \hat{\theta}_i is the estimate based on the accelerometer readings. \hat{\theta}_i is computed by solving for the joint angles using inverse kinematics on pairs of links.

At time of writing, the implementation required approximately one hour of CPU

time to reach convergence using the simplex-based optimization technique imple-

mented by the MATLAB fminsearch function. As is always the case with nonconvex

optimization, a good starting point was necessary to achieve a reasonable solution.

Starting from the ideal parameters of the manipulator CAD models, with perfect sensor alignment assumed, proved sufficient to obtain reasonable solutions from the test data.

To evaluate the performance of the calibration methods quantitatively, we al-

lowed the optimization algorithm to use training data containing several manipulator


positions and checkerboard orientations, and maintained a hold-out test set of sev-

eral other orientations for evaluation purposes. The results demonstrate a significant

calibration accuracy improvement over the initial starting point of the CAD models.

Figure 4.4 shows the convergence of the algorithm from an initial mean error of 11mm

down to 2mm on the prototype unpowered manipulator. Figure 4.5 shows conver-

gence of the algorithm from a mean error of 0.1 degrees in the shoulder lift joint to

0.02 degrees on the Willow Garage PR2 Alpha.

On the low-cost unpowered manipulator, the resulting 2mm average end-effector

localization error is an order of magnitude worse than what is reported by manufac-

turers of high-quality robotic manipulators sensed by shaft encoders. However, it is

more accurate than the best camera-manipulator calibration we have been able to

achieve in several years of constructing and calibrating vision-guided systems with

high-performance manipulators. We anticipate that this level of calibration error will not be the limiting factor in using low-cost localization approaches in a complete robotic system.

4.5 Controlling a low-cost manipulator

To explore the feasibility of low-cost manipulator control using a purely accelerometer-

based control scheme, a 6-dof manipulator was constructed under a strict budgetary

constraint of $1000 USD (Figures 4.1, 4.6, 4.7). As this manipulator incorporates

several unconventional design features, its design is summarized in this section for

completeness. Mass-produced parts were employed wherever possible, such as au-

tomotive windshield-wiper DC gearmotors and skateboard bearings, and fabrication

cost was reduced by the use of laser-cut materials.

The shoulder operated in a spherical RRR configuration with a remote center of

motion to allow the second and third motors to operate in a direct-drive configuration

to achieve a minimal part count. Although the shoulder motors were powerful and

low-cost, they exhibited some cogging due to their ferrous cores, which had detrimental effects on the low-speed control of these joints. Unfortunately, powerful low-cost


Figure 4.6: Shoulder and wrist of the demonstration manipulator.

Figure 4.7: Elbow and gripper of the demonstration manipulator.

DC brushed motors suffer almost universally from cogging, and addressing this be-

havior is a challenge in low-cost arm design.

A friction differential drive (Figure 4.6) provided the wrist with pitch and roll

degrees of freedom. The differential was created by belt-driven rubber wheels pressed

firmly against a thin aluminum veneer. Such assemblies are low-cost, durable, and

effective at transmitting high torques. The friction drive provided zero backlash and

an inherent safety limit: torque overloads resulted in slippage rather than damage to

the drivetrain.

The elbow is driven via belt from a large motor in the shoulder. The gripper was

fabricated from lasercut polypropylene to avoid the mechanical complexity of discrete

parts, using flexures to create a durable, zero-backlash 4-bar mechanism. The thin

belts visible in Figure 4.7 were used only to turn potentiometers for position feedback.

Linux-based software was written using the open-source Robot Operating System


Figure 4.8: Accelerometers were attached to the upper arm, forearm, and gripper of a PR2 Alpha robot.

(ROS) platform, which will be presented in Chapter 8. ROS modules were writ-

ten for firmware communication (via the Linux usb-serial driver), state estimation,

joint-space control, teleoperation via joysticks, trajectory recording, and trajectory

playback/looping.

4.6 Experiments

In this section, a series of experiments are presented that quantify the performance

of accelerometer-based state estimation and closed-loop control on two robots: the

low-cost manipulator discussed in the previous section, and a Willow Garage PR2

Alpha, a high-precision 7-dof manipulator. Accelerometers were designed into the

motor control boards distributed throughout four links of the low-cost manipulator.

The PR2 Alpha was outfitted with strapdown accelerometers, as shown in Figure 4.8.

4.6.1 PR2 Alpha State Estimation

To quantify the performance of the calibration and joint-tracking systems, accelerom-

eters were affixed to a Willow Garage PR2 Alpha robot (Figure 4.8). This 7-dof


Figure 4.9: Tracking the forearm roll of the robot shown in Figure 4.8, showing the encoder ground truth (red) against the joint angle estimate from the accelerometers (blue).

manipulator is equipped with high-quality shaft encoders, which served as ground

truth for this experiment. Its kinematic configuration includes a vertical first joint

(the “shoulder yaw”), which was not estimated in these experiments. This joint axis

is always parallel to gravity, and thus lies in a measurement singularity.

Figure 4.9 demonstrates the tracking performance of one joint (the forearm roll)

as the manipulator was smoothly moved through its workspace. The following ta-

ble (in degrees) shows the mean error throughout the trajectory, measured as the

difference between the shaft encoder readings and the joint state estimates from the

accelerometers.

             Shoulder Lift   Upper Arm Roll   Elbow Flex
Error (deg)      0.965           0.926           1.590

This experiment was done under near-ideal conditions: the PR2 Alpha arm uses

spring balances for gravity compensation and small coreless motors [107]. The mech-

anism is thus extremely smooth and well-behaved, avoiding any transients or other

anomalies as it travels the workspace.


Figure 4.10: Closed-loop control of a low-cost manipulator using only accelerometers. Two joints are shown. Desired state is plotted in red. Output of the accelerometer-based state estimation algorithm is plotted in blue. Vertical axis denotes joint angles in radians; horizontal axis denotes time.

4.6.2 Low-cost Manipulator Torque Control

Manipulators equipped with shaft encoders can often ignore the state-estimation

problem when designing a control scheme, as the quality of the state estimate is

often independent of the state of the robot. In contrast, the quality of the state es-

timates inferred from the accelerometers can vary wildly with the configuration and

velocity of the manipulator. This section discusses the ramifications of this property

on the behavior of a low-cost manipulator equipped only with accelerometers and DC

gearmotors.

For this experiment, a proportional-integral (PI) controller was wrapped around

the state estimates produced by the EKF described in Section 4.3.1. To reduce the

efforts required of the PI controller, active gravity compensation was implemented,

using the Jacobian to compute the feed-forward torques necessary to float the manip-

ulator links, as is common practice [15]. Representative trajectories for the first and

second joints in the manipulator (taken simultaneously) are shown in Figure 4.10.

The responses of the other joints were similar.
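A minimal sketch of this control loop follows, with assumed gains and an assumed 100 Hz rate; the gravity feed-forward torque is assumed to be computed elsewhere via the Jacobian, as described above.

    import numpy as np

    DT = 0.01  # assumed 100 Hz control rate

    class PIController:
        def __init__(self, kp, ki, tau_limit):
            self.kp, self.ki, self.tau_limit = kp, ki, tau_limit
            self.integral = 0.0

        def update(self, theta_desired, theta_estimated, tau_gravity):
            """PI control wrapped around the EKF joint estimate, plus the
            feed-forward gravity-compensation torque."""
            err = theta_desired - theta_estimated
            self.integral += err * DT
            tau = self.kp * err + self.ki * self.integral + tau_gravity
            # Clamping the command also provides the "ramp down torque" escape
            # hatch noted below when the estimator leaves its stable regime.
            return np.clip(tau, -self.tau_limit, self.tau_limit)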

The accelerometer-only sensing scheme was found to break down under high dy-

namic conditions. More specifically, linear accelerations induced by angular joint

accelerations are not modeled in the measurement prediction of Equation 4.3, and


Figure 4.11: Differences between each stopping position of the arm and their respective cluster centroids, in the XY plane (left) and the XZ plane (right), as measured by an optical tracker. 14 trials were run, all of which appear on this plot.

neither are the centripetal accelerations induced by the angular joint velocities. In-

terestingly, adding those terms did not improve the high-dynamic performance of the

manipulator, nor did adding a derivative term to the PI controller. Although it is

difficult to speculate without supporting experiments, it is possible that for such a

system to remain stable, it must have a higher system bandwidth, calibration level,

or measurement SNR.

As a result of the previous observation, the low-cost manipulator could become

unstable in high-dynamic situations. Furthermore, the measurement model of Equa-

tion 4.3 does not model contact forces; as a result, large contact forces may also

induce instability. In either case, stability can be regained by quickly ramping down

motor torques, which effectively slows the manipulator down until it re-enters a stable

region of the coupled sensing and control systems.

To quantify the repeatability of the low-cost manipulator, an active optical track-

ing device (Atracsys EasyTrack500) was used to obtain ground-truth position data of

the end effector. The tracking system has an accuracy of 0.2mm. The manipulator

was placed in front of the optical tracker, with the optical beacon attached to the

gripper tip.

The manipulator was commanded to cycle between two joint configurations which

required motion of at least 30 degrees on all six joints of the manipulator (excluding


Figure 4.12: In this experiment, accelerometer-based state estimation was used to generate relative joint position commands, allowing a position-controlled robot to repeatedly grasp a doorknob.

the gripper fingers), resulting in end-effector travel of 34cm. The optical-tracking data

was analyzed to extract the points where the manipulator had stopped, resulting

in one cluster for each of the target positions. Figure 4.11 shows an estimate of

the repeatability of the manipulator, as measured by the deviation of each of these

stopping positions from the mean of its cluster. The mean deviation was 18.6mm,

and the maximum deviation was 33.9mm.

4.6.3 PR2 Alpha Position Control

A final experiment used the PR2 Alpha arm in a doorknob-grasping task (Fig-

ure 4.12). To avoid the instabilities witnessed in the low-cost manipulator experiment,

the accelerometer-based state estimator was used to control only the low-frequency

trajectory of the manipulator. The shaft encoders and internal high-frequency control

loops of the PR2 electrical system were used to stabilize the high-frequency behav-

ior. The accelerometer-based controller sent relative joint angle commands to the

PR2. As such, this controller could be used without shaft encoders on any manipula-

tor with stable position-based actuators, such as stepper motor-based manipulators.


Figure 4.13: Time series of one PR2 joint as the manipulator undergoes relative joint angle commands from the accelerometer-based sensing scheme, with simple setpoint interpolation to derive small step commands.

Importantly, because the accelerometers fly on the actual links of the robot, their

measurements are not corrupted by link droop or cable stretch.

The accelerometer-based controller, stabilized by incremental joint encoders, was

used to control an arm of a PR2 Alpha robot. The arm was driven through a sequence

of control points to repeatedly grasp a door handle in front of the robot. Trajectory

tracking in this position-controlled scenario is shown in Figure 4.13.

4.7 Summary

The work described in this chapter was motivated by the ever-increasing precision

of consumer-grade MEMS accelerometers, and the observation that some anticipated

future domains of robotics, such as home service robots, possess many sources of error

beyond manipulator repeatability. In such scenarios, a reduction in the repeatability

of the manipulator may not drastically increase the overall system error figure.


In general, an accelerometer-only sensing strategy removes complexity from the

electromechanical mechanisms and increases the complexity of the calibration and

control software. This strategy is motivated by the observation that complex software,

unlike complex hardware, can be replicated at no cost.

However, because adding accelerometers to an existing manipulator design is me-

chanically simple and incurs very little cost, particularly for robots already equipped

with circuit boards distributed throughout the kinematic structure, this sensing ap-

proach is also suitable as a backup, or auxiliary, sensing strategy for manipulators

equipped with shaft encoders. The accelerometers could then be used to bootstrap the

power-up sequence of manipulators equipped with relative shaft encoders: regardless

of the configuration of the manipulator at power-up, an accelerometer-driven EKF

will quickly converge to a reasonable estimate of the manipulator configuration. After

an accelerometer-based joint configuration is estimated, the manipulator could use its

incremental encoders to safely and quickly reach the homing flags for each joint. Fur-

thermore, accelerometers can provide information about impacts, drivetrain health

(through spectral analysis), and a continual “sanity check” for the incremental en-

coders.

This chapter presented a low-cost sensing scheme based on 3D MEMS accelerome-

ters. The system produces coherent, absolute estimates of the kinematic configuration

of a manipulator. Experiments were performed to quantify the performance of this

scheme using both high-precision and low-cost manipulators. The accelerometer-

based sensing algorithm can be readily applied to any manipulator to augment its

state estimation with very little hardware cost and trivial mechanical complexity.


Chapter 5

A Compliant Low-cost Robotic

Manipulator

5.1 Introduction

Many extant robotic manipulators are very expensive, due to high-precision actuators

and custom machining of components. Indeed, at time of writing, the cost of the

manipulator arms and hands together make up the bulk of the cost of many state-of-

the-art research robots. This observation led to the work described in this chapter,

wherein several fabrication and actuation methods are described which endeavor to

offer significant reduction in total system cost of robotic manipulators.

The underlying motivation is that robotic manipulation research, and eventually

real-world deployments of personal robots, can advance more rapidly if robotic arms

of “sufficient” performance were available at a greatly reduced cost. Increased afford-

ability can lead to wider adoption, which in turn can lead to faster progress—a trend

seen in numerous other fields [14]. However, drastic cost reduction requires numerous

design tradeoffs and compromises, many of which are justifiable only in a far different set of application domains than the closed-workcell factory domains in which previous

generations of robotic manipulators have seen economic success.


Figure 5.1: The low-cost compliant manipulator described in this chapter. A spatula was used as the end effector in the demonstration application. For ease of prototyping, lasercut plywood was used as the primary structural material.

Closed-workcell production line applications demand manipulators with high de-

grees of repeatability, precision, and reliability. Mass-production welding and paint-

ing robots provide stereotypical examples of this application domain, and provide

enormous commercial value. However, although high-precision manipulators excel at

tasks for which they can be pre-programmed, robotic manipulators have not become

commonplace in the unstructured domains of typical homes and workplaces. In such

domains, the requirements of absolute precision and speed become overshadowed by

the cost and safety concerns of the consumer market. A major effort of this thesis is

to develop techniques which can drive down the cost of personal robots. As robots

become less expensive, an increasingly varied array of applications becomes feasible,

which could lead to home and service robots capable of performing enough tasks to

provide economic justification for their large-scale deployment.

There are numerous dimensions over which robotic arms can be evaluated, includ-

ing backlash, payload, speed, bandwidth, repeatability, compliance, human safety,

and cost, to name a few. In robotics research, some of these dimensions are more

important than others: for grasping and object manipulation, high repeatability and


low backlash are important. Payload must be sufficient to lift the objects under study.

Human safety is critical if the manipulator is to be used in close proximity to people

or in classroom settings.

Some areas of robotics research require high-bandwidth, high-speed manipulators.

However, in many research settings, speed and bandwidth may be less important.

For example, in object manipulation, service robotics, or other tasks making use of

complex vision processing and motion planning, large amounts of time are typically

required for computation. This results in the actual robot motion requiring a small

percentage of the total task time. Additionally, in many laboratory settings, manip-

ulator speed is often deliberately limited to give the experimenters time to respond

to accidental collisions or unintended motions.

A shipped product must include overhead, additional design expenditures, testing costs, packaging, and possibly technical support, making direct comparisons difficult between commercial product prices and the parts cost of research prototypes.

However, this chapter includes the parts cost of the manipulator in order to give

a rough idea of the possible cost reduction as compared to commercially-available

manipulators at time of writing. Experiments are then presented which demonstrate

that repeatability on the order of millimeters can be achieved with low-cost fabrication

technologies.

A set of design goals were selected to guide development, intending to produce a

manipulator whose performance is comparable to commercially-available manipula-

tors currently used for personal-robot research:

• Human-scale workspace

• 7 Degrees of freedom (DOF)

• Payload of at least 2 kg (4.4 lb.)

• Human-safety considerations:

– Compliant or easily backdrivable

– Flying mass under 4 kg


• Repeatability under 3 mm

• Maximum speed of at least 1.0 m/s

• Zero backlash

To meet these goals while remaining sensitive to system cost, a manipulator was

designed and prototyped which employs low-cost stepper motors in conjunction with

timing belt and cable drives to achieve backlash-free performance. This design trades

off the cost of expensive, compact gearheads for increased system volume, mass, and

power consumption. To improve human safety, a series-elastic design was used, in

combination with minimizing the flying mass of the arm by keeping the motors at or

close to ground. The resulting prototype is shown in Figure 5.1.

A brief outline of this chapter is as follows. Section 5.2 gives an overview of some

other robotic arms used in robotics research. Section 5.3 presents an overview of the

design of the arm, and discusses tradeoffs in its actuation scheme. Section 5.4 dis-

cusses the series compliance scheme, and Sections 5.5, 5.6, and 5.7 discuss its sensing

scheme, performance, and control, respectively. Section 5.8 discusses application of

the robotic arm to a pancake-making task, followed by a conclusion.

5.2 Related Work

The field of robotic manipulation has produced a vast number of manipulator designs

over the past several decades. A full survey of the field would be immense and beyond

the scope of this work. The following discussion covers some of the widely-used and/or

influential robotic arms used in personal robotics research at time of writing, many

with unique features and design criteria intended to function in the personal robotics

domain.

The Barrett WAM [78, 94] is a popular cable-driven robot well-known for its back-

drivability and smooth, fast operation. It is capable of high speed (3 m/s) operation,

advertises 2 mm repeatability, and achieves zero-backlash performance through the

use of cable reductions and cable differentials. Very high mechanical bandwidth is


achieved through these cable reductions and the relatively low flying mass of the arm,

as the large shoulder and elbow motors are grounded.

The Meka A2 arm is series-elastic and intended for human interaction. Other

custom-made robots with series-elastic arms include Cog, Domo, Obrero, Twendy-

One, and the Agile Arm [9, 22, 99, 39, 71]. The Meka and Twendy-One arms achieve

zero-backlash performance by using harmonic drive gearheads. The Cog arms employ

planetary gearboxes, whereas Domo, Obrero, and the Agile Arm use ballscrews. These

robots use various mechanisms to provide generous series elasticity, and thus tend to

have a relatively low mechanical bandwidth of less than 5 Hz due to series compliance.

However, this bandwidth limitation has not appeared to restrict their use in research

in various manipulation domains, and appears to be offset by the increased margin

of safety.

A different approach is taken by several arms developed at the Stanford AI lab us-

ing a “macro-mini” approach, combining large, low-bandwidth series-elastic actuators

with small, high-bandwidth electric motors [114, 88].

The Willow Garage PR2 robot takes yet another approach to safety: a unique

3-DOF gravity-compensation mechanism allows the arms to passively float in any

configuration. Because the large masses of the arms are always supported, only

relatively small motors with small gear reductions are needed to move the arms and

support payloads. These small actuators can be easily backdriven and thus help

improve human safety, especially when combined with soft coverings on the arms.

The prior set of designs can be contrasted with the DLR-LWR III arm [38], Schunk

Lightweight Arm [84], and Robonaut 1 [2], which all use motors directly mounted to

each joint and harmonic-drive gearheads to provide high control bandwidth with zero

backlash. These arms have higher payloads than the other arms discussed in this

section, ranging from 3-14 kg. However, human safety is only possible with active

control systems, as these arms have relatively large flying masses (close to 14 kg for

the DLR-LWR). These systems have fewer “inherent” safety features than the arms

mentioned previously, but their high bandwidths allow for tight coupling with distal

force/torque sensors to stop the arms extremely quickly after collisions are detected.

Of the robotic arms discussed previously, those that are commercially available are


relatively expensive, with end-user purchase prices well above $100,000 USD at time

of writing. However, there are a few examples of low-cost robotic manipulators used

in research. The arms on the Dynamaid robot [92] are constructed from ROBOTIS

Dynamixels, which are lightweight and compact self-contained actuator modules. The

Dynamaid robot has a human-scale workspace, but a lower payload (1 kg) than the

class of arms discussed previously. The 5-DOF KUKA YouBot arm is targeted at

is a new 5-DOF arm for robotics research [48]. It has a comparatively small work

envelope of just over 0.5 m3, repeatability of 0.1 mm, and payload of 0.5 kg and

employs custom, compact motors and gearheads.

More relevant to the manipulator described in this chapter, countless robot arms

have been constructed using stepper motors. Pierrot and Dombre [69, 20] discuss

how stepper motors contribute to the human safety of the Hippocrate and Dermarob

medical robots, because the steppers will remain stationary in the event of electronics

failure, as compared to traditional DC motors, which may continue rotating, depend-

ing on the type of subsystem failure. More importantly, stepper motors are often

operated relatively close to their maximum torque, as compared to DC or BLDC

motors which typically have a much higher stall torque than the torque used for con-

tinuous operation. In the marketplace at time of writing, ST Robotics offers a number

of stepper-driven robotic arms which advertise sub-mm repeatability. However, these

manipulators are intended for industrial applications, and thus do not have integrated

human-safety features. Various other small, non-compliant manipulators made in the 1980s and 1990s for the educational market were driven by stepper motors. For

example, the 5-DOF Armdroid robots had 0.6m reach and used steppers with timing

belts for gear reduction, followed by cables to connect to the rest of the arm.

5.3 Design

The manipulator presented in this chapter has an approximately spherical shoulder

and an approximately spherical wrist, connected by a single-DOF elbow. The joint

limits and topology were designed to enable the robot to perform manipulation tasks

when mounted near table-height, as opposed to anthropomorphic arms, which must


hang down from the shoulder and require the base of the arm to be mounted some

distance above the workspace, with a correspondingly higher center of mass (and fall

hazard) of the resulting structure. The shoulder-lift joint has nearly 180 degrees of

motion, allowing the arm to reach objects on the floor and also work comfortably on

tabletops. A summary of the measured properties and performance of the manipulator

is shown in table 5.1.

Table 5.1: Measured properties of the manipulatorLength 1.0m to wrist

Total mass 11.4 kgFlying mass 2.0 kg

Maximum payload 2.0 kgMaximum end-effector speed 1.5 m/s

Repeatability 3 mm

5.3.1 Actuation overview

Figure 5.2 shows the actuation scheme for the proximal four DOF. These joints are

driven by stepper motors. Speed reduction is realized through timing belts and cable

circuits, and is followed by a series-elastic coupling to each joint. Creating speed

reduction through timing belts and cable circuits results in low friction, low stiction,

and zero backlash, enabling the arm to make small incremental motions of less than

0.5mm in all configurations. Additionally, there is no gearing to damage under ap-

plied external impulse forces. This leads to a low-cost but relatively high performance

actuation scheme. A downside to this scheme, however, is that the reduction mech-

anisms occupy a relatively large volume, making the proximal portion of the arm

somewhat large.

The two-stage reduction of a timing belt followed by a cable circuit accomplishes

not only a larger gear reduction than a single stage, but also enables the motors to be

located closer to ground. The motors for the two most proximal DOF are grounded,

and the motors for the elbow and upperarm roll joints are located one DOF away

from ground. By placing the relatively heavy stepper motors close to ground, the


Figure 5.2: Actuation scheme for each of the proximal four DOF.

flying mass of the arm is greatly reduced: below the second (shoulder pitch) joint,

the flying mass of the arm is 2.0 kg. For comparison, a typical adult male human

arm has a flying mass of 3.4 kg [13].

The two-stage reduction scheme leads to coupled motions of the first four joints.

However, this coupling is purely linear and can easily be compensated using a software

feedforward term. The routes of the timing belts and cables can be seen in Figure 5.3.

Following the timing belts and cable circuits, the proximal four DOF have series elastic

couplings between the cable capstan and the output link, as discussed in Section 5.4. These couplings provide intrinsic compliance to the arm, as well as torque sensing (Section 5.5).
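Because the coupling among the proximal joints is linear and constant, the feedforward compensation reduces to a single matrix multiply, as in the sketch below; the matrix entries shown are made up for illustration, since the real values follow from the specific belt and cable routing.

    import numpy as np

    C = np.array([            # motor motion induced per joint motion (assumed values)
        [1.0, 0.0, 0.0, 0.0],
        [0.1, 1.0, 0.0, 0.0],
        [0.0, 0.2, 1.0, 0.0],
        [0.0, 0.0, 0.1, 1.0],
    ])

    def joint_to_motor(joint_angles):
        """Motor-space command that realizes the desired joint angles."""
        return C @ np.asarray(joint_angles)

    def motor_to_joint(motor_angles):
        """Invert the (constant, invertible) coupling for state estimation."""
        return np.linalg.solve(C, np.asarray(motor_angles))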

The three distal joints are driven by Dynamixel RX-64 actuator modules. These

joints do not have compliant features aside from software torque limits. However,

the compliance of the proximal four joints allows the end effector to be displaced in

Cartesian space in three dimensions, outside of kinematic singularities where only two

dimensions will be compliant.

5.3.2 Tradeoffs of using stepper motors

Using stepper motors as actuators offers several advantages. Because they are similar

to brushless motors, but have many more poles than a typical brushless motor, stepper

motors excel at providing large torques at low speeds, which is the target regime of

the arm. They require a relatively low gear reduction, which can be accomplished

with timing belts and cable drives, as in the design presented in this chapter. In

this design, the effective reductions were 6, 10, 13, and 13, respectively, for the first

four joints. DC motors, for comparison, generally require a significantly larger gear


Figure 5.3: Cable routes (solid) and belt routes (dashed) for the shoulder lift, shoulder roll, and elbow joints. All belt routes rotate about the shoulder lift joint. The elbow cables twist about the shoulder roll axis inside a hollow shaft. Best viewed in color.

reduction that would be either susceptible to backlash or incur considerable cost.

Stepper motors can also act as electromagnetic clutches, improving safety if large

forces are accidentally applied at the output. If a force is applied that causes a stepper

to exceed its holding torque, the stepper motor will slip and the arm will move some

distance until the stepper can re-engage. In the controllers developed for this arm,

the holding torque is approximately 60% more than the maximum moving torque

(and hence the maximum payload of the arm): large enough to avoid unintentional

slipping, but small enough to provide a form of force-limiting.

However, there are several downsides of the stepper motors acting as an electro-

magnetic clutch. First, if a stepper motor slips, the arm may need to be re-homed.

The manipulator uses joint-angle encoders for state estimation, so closed-loop position


Figure 5.4: Compact servos are used to actuate the distal three joints.

control can still occur after a slip. However, force sensing will be miscalibrated (see

section 5.5). Second, the arm may move suddenly away from the impact if a stepper

motor slips. The arm only slips if relatively large amounts of force are applied, and

after a slip the steppers initially provide little resistance until the rotor slows down

sufficiently to be re-locked by the stator fields. The moving arm may collide with

other objects or people. In the proposed design, this risk was addressed by reducing

the flying mass of the arm as much as possible. Adding backshaft encoders to the

stepper motors would enable tracking of the rotor position during rotor slippage, and

thus enable faster stoppage of a slipping motor. Whether or not the additional cost is

justified depends on the task and the anticipated frequency of unintended high-speed

collisions. As envisioned in the design, stepper slips occur only as a final layer of

safety, and thus are not anticipated to be a frequent operational mode.

5.3.3 Distal actuation

The actuation scheme of the proposed manipulator uses series-elastic actuators (SEA)

in the proximal 4 joints, but directly-coupled actuators for the distal 3 joints. The

bandwidth of the distal 3 joints is somewhat higher than the bandwidth of the prox-

imal 4 joints, permitting a restricted set of higher-frequency motions. This is similar

to that described in [113], which employs a macro-mini actuation scheme for the most

proximal DOF and conventional actuators for the more distal DOF.

The decision to use non-SEA distal actuators was made primarily to reduce the flying


mass and volume of the forearm. However, one risk is that the gears of these distal

joints must absorb shock loads delivered or received by the end effectors.

5.3.4 Inertia and stiffness

One important tradeoff with a series-elastic manipulator is between the arm inertia and the series-elastic stiffness. Consider a single-joint arm with moment of inertia I [kg m^2] driven by a rotary joint with torsional stiffness k_\theta [N m/radian]. The arm will oscillate at its natural frequency, f_0 = \frac{1}{2\pi}\sqrt{k_\theta / I}. If the arm has a low inertia or the

series elastic coupling is stiff, the motor driving the arm may not have enough torque

or bandwidth to compensate for this oscillation. Pratt and Williamson [70] suggest

increasing the arm’s inertia to eliminate this effect; other options are to reduce the

spring constant; include damping in the series-elastic coupling; or increase bandwidth

by decreasing the motor gear reduction, at the cost of a lower payload. For human-safe

robotic arms with low inertia, this issue can be significant.

In the arm described in this chapter, considering the elbow joint, the natural frequency is around f_0 = 5.1 Hz, with k_\theta = 86 N m/radian and I = 0.083 kg m^2. This is close to the bandwidth of the motors with the selected gear reduction.

5.3.5 Low-cost manufacturing

Several methods were used to reduce the cost of the manipulator. With similar

speed/torque delivery, planetary-geared motors of similar cost to the stepper motors

used in this design typically exhibit at least one degree of backlash. At time of writing,

actuators using harmonic drives that are capable of these speeds and torques often

cost ten times as much as a stepper motor. For cost concerns, stepper motors were

used for the proximal four joints.

To control the low-volume manufacturing cost of the design, it was realized pri-

marily by lasercutting 5-ply plywood. Although high-volume manufacturing meth-

ods such as injection molding would enable drastically lower incremental costs, these

methods typically incur significant tooling expenses which require thousands of units

to justify. Laser cutting, in contrast, is “toolless” and thus allows low-volume runs


Table 5.2: Part cost breakdown of the arm

Motors: steppers    $700
Motors: servos      $1335
Electronics         $750
Hardware            $960
Encoders            $390
Total               $4135

to be economically realized. The lasercutter used for these experiments (a 500-watt

Beam Dynamics OmniBeam 500) can produce tolerances of 0.025mm, and excellent

results were also achieved with an Epilog Legend Helix 24 (45 Watt) laser cutter.

Dovetailing of the wood pieces was done to enable them to press-fit together, and

flanged bearings and shafts were also press-fit into holes. It is unknown how the

wooden structure would respond to large temperature and humidity variations, but

in a typical lab environment these are held relatively constant. Wood is an excellent

material for rapid prototyping, and is rigid enough to meet the repeatability design

requirements. Further experiments and iterations of the proximal joints were done us-

ing laser-cut sheet aluminum, which verified that the results presented in this chapter

are replicable on more durable materials.

The lower arm of the robot was made of folded lasercut aluminum. Although

folded metal structures typically cannot achieve the extreme precision of custom-

machined 3D parts, calibration techniques can be readily used to compensate for

manufacturing variances.

All parts other than the lasercut structures were off-the-shelf components. A

breakdown of the parts cost for the robot is shown in Table 5.2. Not included in

this list are the costs of laser cutter time and assembly time; laser cutting would

require approximately 2.5 hours and assembly would take approximately 15 hours

for additional copies of the manipulator. The costs shown in Table 5.2 were those

incurred for the creation of the prototype described in this chapter, from a variety

of domestic suppliers. Efforts at locating low-cost suppliers would likely result in a

dramatic reduction of the part cost.


Figure 5.5: Diagram of the series compliance. Left: compliant coupling with no external force. Right: an applied force causes rotation against the locked driven wheel.

5.4 Series Compliance

As mentioned previously, the manipulator employs compliant couplings in the prox-

imal four joints. This provides an increased measure of safety, allows the arm to be

compliant even though the stepper motors are not backdrivable, and is used for force

sensing by measuring the deflection across the compliant members.

A diagram of the compliant coupling is shown in Figure 5.5. Its operation is

similar to the elastic couplings described in [49, 4, 100]. At the joint, a capstan used

in the cable circuit (labeled 1 in Figure 5.5) floats on bearings on the same shaft as the

output link (2). The capstan is connected to the output link only via the compliant

elements. Two plates connected to the output link extend through the middle of the

capstan, which has two interior cutouts. Each cutout contains a polyurethane tube

(3), which is compressed between the plate from the output link and the side of the

cutout in the capstan. In Figure 5.5(right), the capstan (4) is held stationary while

an external force (F) is applied. This causes one polyurethane tube (5) to compress

while the other (6) expands. The polyurethane tubes are initially pre-compressed to

slightly more than half of their maximum possible compression, so they will always

remain in compression as the output link moves with respect to the capstan.

Polyurethane was used to provide some mechanical damping of the joint, which

gives the arm some hysteresis but helps dampen oscillations. However, springs could


Figure 5.6: Stiffness of the elbow. Hysteresis is exhibited due to the polyurethane in the series compliance. The joint was quasi-statically moved through 70% of its normal operating range.

readily be used in their place. Tubes were used instead of rods or balls to give the

output links around 4 degrees of compliance in each direction, which requires several

millimeters of travel. Figure 5.6 shows the stiffness and hysteresis of the compliant

coupling in the elbow joint.

5.5 Sensing

As discussed in previous sections, the first four joints of the manipulator are actuated

by relatively large stepper motors embedded in the base and shoulder. The intrinsic

stability of stepper motors forms a key aspect of the sensing strategy: assuming

the stepper motors do not slip, the series of step motions the motors undergo can be

precisely integrated to give the input displacement to the series-elastic coupling. Joint

angles are measured directly at the links using optical encoders. The deflection of

the compliant element can thus be measured as the difference of the (post-reduction)

motor position and the joint angle, permitting force sensing via the inferred


Figure 5.7: Repeatability test results. Measurement accuracy is ±0.1 mm.

deformation of the compliant elements in the couplings.
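To make this computation concrete, the following C++ sketch estimates joint torque from the series-elastic deflection. All constants are illustrative assumptions rather than values from this design: the stiffness would come from a calibration such as the one in Figure 5.6, and the offset term corresponds to the index-pulse calibration described next.

#include <cstdio>

// Illustrative constants only; real values would come from the drivetrain
// geometry and a stiffness calibration like that of Figure 5.6.
constexpr double RAD_PER_STEP = 2.0 * 3.14159265358979 / 2000.0; // per motor step
constexpr double REDUCTION = 20.0;   // assumed belt + cable reduction
constexpr double K_COUPLING = 350.0; // assumed coupling stiffness, N*m/rad

// Joint torque inferred from the series-elastic deflection: the difference
// between the integrated (post-reduction) motor position, corrected by the
// calibrated static offset, and the link-side optical encoder angle.
double estimate_joint_torque(long motor_steps, double step_offset_rad,
                             double joint_encoder_rad) {
  const double motor_rad = motor_steps * RAD_PER_STEP / REDUCTION;
  const double deflection = (motor_rad - step_offset_rad) - joint_encoder_rad;
  return K_COUPLING * deflection;  // linear spring model; ignores hysteresis
}

int main() {
  // e.g. 150 steps past the calibrated offset with the link held at zero:
  std::printf("tau = %.2f N*m\n", estimate_joint_torque(150, 0.0, 0.0));
}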

Integration of motor step counts occurs on embedded microcontrollers located

in the first two links of the manipulator. This integration commences at power-up,

and thus the motor step integration is best seen as a relative position estimate. To

estimate the position offsets, enabling comparison with the (indexed) absolute joint-

angle encoders, the manipulator is driven to the index pulses and held stationary.

The stepper count when the manipulator is stationary at all encoder index pulses can

be taken as a static offset to permit force-sensing calibration, barring hysteresis or

plastic deformation of the compliant elements.

The distal three joints are actuated by Robotis Dynamixel RX-64 servos, which

feature internal potentiometers with a usable range of 300 degrees. The potentiometer

voltage is internally sampled by the servo.

To simplify the manipulator wiring, the stepper-motor drivers and servos share

a common RS-485 bus and data protocol. Sensors are sampled and actuators are

commanded at 100 Hertz.


Figure 5.8: Step responses (joint angle in degrees versus time in seconds, plotted against the commanded target) for each of the major types of actuators of the robot. Top, the shoulder-lift joint, a NEMA-34 stepper motor. Middle, the elbow joint, a NEMA-23 stepper motor. Bottom, the wrist yaw joint, a rigidly coupled Robotis RX-64 servo. Note that timescales change on each plot.

5.6 Performance

The performance of the manipulator was measured on several metrics. Closed-loop repeatability was tested by repeatedly moving the arm between a home position and eight

locations distributed far apart in the workspace. The repeatability at the home

position is shown in Figure 5.7, where the position of the arm is plotted each time

after it returned from a distant location, as measured by an external high-precision

optical tracking system with 0.1mm accuracy.

The encoders can register changes of 0.036 degrees, which corresponds to 0.64mm


Figure 5.9: Low-cost MEMS inertial sensors affixed to the teleoperator's torso, upper arm, lower arm, and hand to estimate desired end-effector positions.

at the base joint with the arm fully extended. The stepper motor at the base joint

can command changes of 0.52mm at the end effector. Moving down the arm, each

subsequent motor can command sequentially finer motions due to increased effective

gear ratios and shorter distances to the end effector.
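As a sanity check on these figures (a back-calculation from the quoted numbers, not an additional measurement), the arc-length relation $s = r\theta$ ties the encoder resolution to the end-effector resolution:

\[
r = \frac{s}{\theta} = \frac{0.64\ \text{mm}}{0.036^\circ \cdot (\pi/180)} \approx 1.0\ \text{m},
\]

i.e., the quoted resolution at full extension corresponds to a lever arm of roughly one meter from the base axis to the end effector.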

Payload was measured by adding mass to the end effector until the shoulder step-

per motors slipped when slowly moving through the worst-case fully outstretched

configuration. Maximum velocity was measured by commanding the fully-extended

arm to move upwards at the maximum rate of the stepper motor controllers, while ob-

serving the end-effector velocity with an optical tracking system. These experiments

demonstrated a maximum payload of 2.0 kg and a maximum velocity of 1.5 m/s.

Due to the ability of the encoders to register very small motions and the soft com-

pliance of the arm, force sensing can be accomplished by measuring the displacement

of the arm. With the arm fully outstretched, masses of 15 grams reliably induce an

angular deflection large enough to be observed by the shoulder-lift joint encoder.


Figure 5.10: Playing chess via teleoperation.

5.7 Control and Software

The manipulator is controlled using standard techniques: closed-loop PID control in

joint space is achieved using joint encoders. Inverse kinematics using the OROCOS-

KDL library [10] allows Cartesian control while respecting joint limits. Nullspace

control is numerically computed on-the-fly using the Eigen C++ library [35] to con-

tinually push the joints away from singular configurations. As previously discussed,

the proximal joints are coupled due to their belt and cable routes. Linear feedforward

terms were added to the joint-space controller to decouple the kinematics.
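The nullspace computation can be illustrated with a short Eigen sketch. The damped-least-squares factor and the secondary-task velocity below are assumptions for illustration, not parameters of the actual controller.

#include <Eigen/Dense>
#include <iostream>

using Mat = Eigen::MatrixXd;
using Vec = Eigen::VectorXd;

Vec redundant_ik_step(const Mat& J,     // 6x7 manipulator Jacobian
                      const Vec& xdot,  // desired 6-D Cartesian velocity
                      const Vec& qdot0) // secondary task, e.g. a gradient
                                        // pushing away from singularities
{
  const double lambda = 0.05;  // damped-least-squares factor (assumed)
  const Mat I6 = Mat::Identity(6, 6);
  // Damped pseudoinverse of J, computed via a 6x6 LDLT solve.
  const Mat Jpinv = J.transpose() *
      (J * J.transpose() + lambda * lambda * I6).ldlt().solve(I6);
  const Mat N = Mat::Identity(7, 7) - Jpinv * J;  // nullspace projector
  return Jpinv * xdot + N * qdot0;  // task motion + projected secondary motion
}

int main() {
  const Mat J = Mat::Random(6, 7);  // stand-in for a real Jacobian
  Vec xdot = Vec::Zero(6);
  xdot(2) = 0.1;                    // e.g. raise the end effector at 0.1 m/s
  const Vec qdot0 = Vec::Zero(7);   // no secondary objective in this example
  std::cout << redundant_ik_step(J, xdot, qdot0).transpose() << std::endl;
}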

System integration and visualization are performed by the Robot Operating System (ROS) [76] under Linux to ease debugging. ROS will be fully described in Chapter 8; in this context, it was employed for its support of hot-swapping software modules and its management of setup and teardown of an ensemble of peer-to-peer data connections.

It was thus possible to easily swap the underlying controllers to support additional features, such as improved force sensing or simulated Cartesian compliance.

To demonstrate the ability of the manipulator to perform various tasks, a low-

cost teleoperation system was created similar to that described in [64]. Compact,

inexpensive USB devices containing MEMS inertial sensors and magnetometers were


Figure 5.11: Demonstration task: making pancakes.

affixed to a shirt, allowing easy estimation of the posture of the teleoperator (Fig-

ure 5.9), which in turn was used to generate inverse-kinematics joint-angle targets for

the manipulator. This control stack was used, among other things, to play a game of

chess (Figure 5.10) to demonstrate teleoperation involving fine motions.

5.8 Demonstration Application

To explore the feasibility of real-world use of the proposed manipulator, a demon-

stration “chefbot” application was created. For this application, the manipulator was

equipped with an end effector consisting of a distal roll axis connected to a spatula

and spoon. The unpowered manipulator was moved by hand through a trajectory

that scooped pancake batter out of a basin, poured two pancakes on a griddle, flipped

them at the appropriate time, and finally deposited the pancakes onto a serving plate

(Figure 5.11).

Joint-space waypoints were recorded via keypress while the manipulator was being

manually moved. The intrinsic compliance of the manipulator simplified the software:


only simple moving-setpoint control with linear joint-space interpolation was neces-

sary in order to obtain reliable autonomous task completion. During the scraping

operations, firm contact between the spatula and the griddle surface was maintained

by virtue of the series-elastic shoulder and elbow, in addition to the compliance of

the spatula. As a result, neither high-bandwidth control nor accurate force/torque

sensors were required at the end effector to manage the contact force.
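A minimal sketch of such a playback controller follows. The 7-DOF joint vector and the fixed per-segment duration are assumptions for illustration, not recorded parameters of the chefbot demonstration.

#include <array>
#include <cstddef>
#include <cstdio>
#include <vector>

using Joints = std::array<double, 7>;

// Moving setpoint: linear joint-space interpolation between recorded
// waypoints, evaluated at time t >= 0 and fed to the joint-space PID loop.
Joints setpoint_at(const std::vector<Joints>& wp, double t,
                   double sec_per_segment) {
  if (wp.empty()) return Joints{};
  const double s = t / sec_per_segment;          // fractional segment index
  const std::size_t i = static_cast<std::size_t>(s);
  if (i + 1 >= wp.size()) return wp.back();      // hold the final waypoint
  const double a = s - static_cast<double>(i);   // blend factor in [0, 1)
  Joints q{};
  for (std::size_t j = 0; j < q.size(); ++j)
    q[j] = (1.0 - a) * wp[i][j] + a * wp[i + 1][j];
  return q;
}

int main() {
  const std::vector<Joints> wp = {Joints{}, Joints{{0.4, 0.2, 0, 0, 0, 0, 0}}};
  const Joints q = setpoint_at(wp, 1.0, 2.0);    // halfway through segment 0
  std::printf("q0 = %.2f, q1 = %.2f\n", q[0], q[1]);  // prints 0.20, 0.10
}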

5.9 Summary

This chapter presented the design of a low-cost 7-DOF arm intended for personal-

robotics research tasks and explorations. Low cost was the result of a series of trade-

offs. For speed reduction, the cost of an expensive gearhead was traded for the volume

and complexity of a timing belt and zero-backlash cable drive circuit. For the prox-

imal actuators, the cost and potential backlash of a highly-reduced gearmotor were

traded for the relatively large static power requirements of stepper motors. These

design tradeoffs were chosen for the envisioned application of research robots inter-

acting with unstructured environments such as a typical home or workplace, where

the safety of intrinsic mechanical compliance is an important design consideration.

The cost-controlling tradeoffs described in this chapter were made as an exploration

of designing affordable compliant manipulators, an area of research which, to date,

has received little attention, but which could have a large impact on the speed of

adoption of robots into high-volume consumer markets.


Chapter 6

A Low-cost Robotic Hand

6.1 Introduction

This chapter describes the design and development of a low-cost robotic hand de-

veloped for the DARPA Advanced Robotic Manipulation (ARM) program. Some of

the highest-level design decisions were driven by program requirements: most im-

portantly, the hand was required to be self-contained. Virtually all animals realize

phalange motion through a network of tendons connected to lower-arm muscles. In

contrast, this research program sought to create robotic hand designs which can be

easily swapped among a variety of extant robotic manipulators. This required a fully

self-contained hand design, and made for numerous volumetric challenges, which will

be described in this chapter. This requirement of self-contained actuation was the

primary driver of the mechatronic topology. Other influential program goals included

low cost and a reasonable level of robustness, which will be extensively discussed in

this chapter.

The hand was designed in close collaboration with Dr. Curt Salisbury of Sandia

National Laboratories, who realized the mechanical design and assembly. As such,

the mechanical work is not claimed as original work by the author for this thesis, and

is described herein only for completeness and to explain the co-designed electronics.

The electrical and software portions of the hand were designed and implemented by

the author, and are presented in full detail.


Figure 6.1: Each finger module has three motors at the proximal end of the module, shown at left in the figure.

6.2 Related Work

Robotic hands have long been a field of intense interest. Some of the seminal previ-

ous designs include the Salisbury Hand [79], the Utah/MIT Hand [41], the Shadow

Hand [87], and the Barrett Hand [94]. Among more recent designs, a high-level clas-

sification can be made between fully-actuated and under-actuated kinematics. Many

arguments have been made for and against each approach. Although implementation

decisions can greatly affect performance and capabilities, in general, under-actuated

hands can be simpler, less voluminous, and less massive, while still capable of grasp-

ing a large variety of objects and performing relatively simple in-hand manipulations

such as pushing a button on a handheld electronic device. In contrast, fully-actuated

designs involve greater complexity, typically requiring more volume

and mass, but offer the maximum possible kinematic capabilities and permit in-hand

manipulations such as complex finger gaiting. The hand described in this chapter

falls into the latter category: it is a fully-actuated, 12 DOF design.

6.3 High-level Design

The hand was designed with low cost and robustness in mind. Towards these goals,

the hand was designed as a collection of identical finger modules that attach to a

hand frame. Each finger module is a self-contained three-axis micro-manipulator, as

shown in Figure 6.1. The results of the design effort towards low-cost and robustness

will be described in detail in the following sections.


Figure 6.2: The hand frame and its set of identical finger modules, which dock magnetically or with retaining bolts.

6.3.1 Robustness

Robustness is a relative term that is used in many contexts and can be difficult to

quantify without extensive destructive testing. However, as a general concept, robust-

ness implies the ability of a system to gracefully absorb and recover from situations

that exceed the normal operating conditions of the system.

Several key design decisions were made to increase the robustness of the hand. As

shown in Figure 6.2, the finger modules connect to the hand frame via a mechanical

fuse, exhibiting binary compliance. When large forces or torques are experienced,

for example if a robotic manipulator crashes the hand into a rigid object, the finger

modules separate cleanly from the hand frame, rather than attempting to absorb


the overload force or torque and convey it back to the manipulator arm. Binary

compliance can be achieved in a variety of ways. During the design and prototyping

process, both magnetic decoupling and easily-replaced mechanical breakaway features

were employed.

An alternative strategy to improve robustness is to support field repairs by un-

trained personnel. Towards this goal, the finger modules are designed to be easily re-

placed in the event of mechanical or electrical failure. Electrical connectivity between

the finger modules and the hand frame is achieved with spring contacts, avoiding the

need for seating a delicate connector. Mechanical installation consists of tightening a

pair of easily-accessible screws into their respective mechanical breakaway features.

6.3.2 Actuation

Extremely compact fingers and knuckles are desired in a robotic hand, to permit

natural manipulation of objects and tools commonly found in human-designed envi-

ronments. Towards the goal of slender fingers and knuckles, the motors were placed

as proximally as possible while still meeting the design requirement of a self-contained

hand. Furthermore, as discussed in the previous section, each finger module needed

to be self-contained to support the goal of separable fingers for robustness.

These constraints resulted in a design where each finger module contained a col-

umn of three motors at its base, as shown in Figure 6.1. To maximize torque density

of the motors, brushless “outrunner” motors were employed. The “outrunner” motor

configuration is essentially an “inside-out” brushless motor: the stator is at the center

of the motor, and is enveloped by a ring of permanent magnets spinning around it.

The magnetic moment arm, and thus the resultant torque, is as large as possible for

a given motor diameter, a critical factor in such a space-constrained design scenario.

Unfortunately, the geometry of “outrunner” motors requires significant thought

to be given to the thermal path of the motor. Because the permanent-magnet rotor

envelops the stator coils, significant heat is generated inside the motor. To sink this

heat away from the motors, the stator of each motor is affixed to a heatsink inside

the finger module base. As will be described in detail in Section 6.5, the stators


Figure 6.3: The motor module (aluminum at right) separates from the rest of the finger module (plastic at center) by simply removing a few bolts. Cable tension is not affected.

and shaft supports of the “outrunner” motors pass through interior cutouts in the

motor control boards to conserve volume and eliminate wiring. Similarly, the rotor

shafts pass through the shaft supports and heatsink, terminating with small pinion

gears located outside of the heatsink. For thermal conductivity, the heatsink and a

protective cap over the rotors are constructed out of machined aluminum. As shown

in Figure 6.3, this “motor module” is fully self-contained and cleanly separates from

the finger module using six bolts. As such, it can be swapped without requiring any

adjustments to the cable drive mechanisms, as the cable tension is borne by the gear

reduction and its supporting input and output bearings.


After the motor, typical cable-driven robotic assemblies employ a gear reduction

followed by a capstan upon which mechanical drive cables wrap and terminate. How-

ever, the volume and mass of robotic fingers are severely constrained, making this

traditional approach undesirable. To permit the gear reduction to also serve as the

cable capstan, the output shaft of the gear reduction is rigidly coupled or “grounded”

to the finger module chassis, and the input side of the gear reduction housing is

supported by a thin-section bearing.

To maximize torque density, a low-cost planetary gear reduction was selected.

Like many planetary gear reductions, the selected unit was rotationally symmetric

about the co-axial input and output shafts. Additionally, the exterior of the ring

gear of the planetary gear reduction was slightly larger than a common metric thin-

section bearing size. Disassembling the gear reduction and turning the ring gear on a lathe allowed the reduction to be precisely mated to the thin-section bearing, which in turn was press-fit into the finger module assembly. As a result, the exteriors of the gear reductions function as capstans for the mechanical drive cables actuating the distal two degrees of freedom of each finger module. The first degree of freedom, being co-axial with the first motor, does not require a cable drive and is

rigidly attached to its respective gear reduction.

6.3.3 Hand Frame

Because the mechanical complexity of the hand is contained inside the finger modules,

a variety of hand geometries can be realized by printing different hand frames and

simply bolting the fingers into them, as shown in Figure 6.4.

6.4 Sensor Suite

The hand is equipped with a variety of proprioceptive and exteroceptive sensors. The

proprioception capabilities include joint-angle, tactile, strain, and thermal sensing

schemes, whereas the exteroceptive sensing capability is provided by a small camera

array. These sensing modalities will be described in detail in the following sections.


Figure 6.4: Hand frame variations. Finger modules are unchanged.

6.4.1 Joint Encoding

The volumetric constraints of robotic fingers increase the difficulty of sensing their

joint angles. In Chapter 4, an argument was made in favor of utilizing MEMS ac-

celerometers to reduce the overall system cost versus the traditional solutions of

optical or magnetic encoder discs. However, in the case of a robotic finger, an additional argument against traditional disc-based solutions can be made: they simply cannot fit within the structures while satisfying mechanical constraints. The finger knuckles are a critical stress concentration that must be mitigated by relatively massive mechanical structures, owing to the somewhat poor material properties of rapid-prototyping plastic at the time of writing.

As a result, a combination of MEMS inertial sensors and forward-propagation of motor encoders was employed to create a compact, low-cost encoding scheme.

Three-dimensional accelerometers were placed on the circuit boards in the base of

the finger module and the F2 and F3 phalanges, as shown in Figure 6.5. In static

conditions, these accelerometers measure the direction of the gravity vector in each

respective inertial frame.

By projecting the difference between the direction of the gravity vectors onto the

constraints of the kinematic chain, the joint angles can be inferred in many cases.

The F2-F3 joint angle is relatively easy to infer, as accelerometers are positioned


Figure 6.5: The locations of the accelerometers are illustrated by the red circles.

on either side of this 1-D joint. As such, the joint angle is difficult to measure only when the joint axis approaches vertical.
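For this easy case, the computation can be sketched directly: project both measured gravity vectors onto the plane perpendicular to the joint axis and take the signed angle between the projections. The axis and sign convention in the following C++ sketch are assumptions for illustration.

#include <Eigen/Dense>
#include <cmath>
#include <cstdio>

using Vec3 = Eigen::Vector3d;

// g_f2, g_f3: gravity directions measured in the F2 and F3 frames.
// axis:       the joint axis, expressed identically in both frames.
double one_dof_joint_angle(const Vec3& g_f2, const Vec3& g_f3,
                           const Vec3& axis) {
  const Vec3 a = axis.normalized();
  // Project each gravity vector onto the plane perpendicular to the axis.
  // When gravity is nearly parallel to the axis (joint axis near vertical),
  // these projections vanish and the estimate degrades, as noted above.
  const Vec3 u = (g_f2 - g_f2.dot(a) * a).normalized();
  const Vec3 v = (g_f3 - g_f3.dot(a) * a).normalized();
  return std::atan2(u.cross(v).dot(a), u.dot(v));
}

int main() {
  const Vec3 a(1, 0, 0);   // horizontal joint axis (assumed)
  const Vec3 g2(0, 0, -1);
  const Vec3 g3(0, std::sin(M_PI / 6), -std::cos(M_PI / 6));  // 30 deg apart
  std::printf("%.1f deg\n", one_dof_joint_angle(g2, g3, a) * 180.0 / M_PI);
}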

In contrast, the proximal two joints (F0-F1 and F1-F2) are significantly more

difficult to estimate. Because of the kinematic advantages of reducing the offset

between the two proximal orthogonal joints, F1 was made as compact as possible.

Its interior volume is fully occupied by four internal cable pulleys and the shaft and

bearings for the F1-F2 joint. As a result, it was prohibitively difficult to place an

accelerometer in this link, and thus the joint angles for the two proximal joints must be inferred by observing only the gravity vectors in F0 and F2, as shown in Figure 6.5.

The closed-form solution for the proximal two joint angles as a function of the

accelerometer readings in the F0 and F2 links is somewhat complex and was provided

in Chapter 4. This closed-form solution can produce up to four estimates. However,

applying range-of-motion constraints typically results in only one or two potential

solutions, and a variety of tracking and filtering schemes can be employed to choose

between them.


Figure 6.6: Soft tactile pads allow conformal grasping of small objects.

Figure 6.7: To achieve mechanical robustness while still exhibiting conforming properties, the skin consists of a tougher thin outer layer above a very soft and thick inner layer.

6.4.2 Contact Geometry

Soft coverings of manipulator surfaces can greatly simplify grasping and manipula-

tion tasks. As demonstrated throughout the animal kingdom, biological manipulator

surfaces such as hands, feet, paws, tentacles, etc., conform to the object or envi-

ronment. Flexible surfaces create far larger contact patches than those created by

rigid manipulator surfaces, allowing for partial or full envelopment of small objects,

as shown in Figure 6.6. This effect is particularly pronounced when small objects

must be manipulated in fingertip grasps, such as writing implements or keys. Even

when handling larger objects, however, grasp stability is greatly improved when the

contact patch is a large surface rather than the line contacts or even point contacts

produced when rigid manipulator surfaces attempt to grasp hard objects.


From an engineering perspective, however, it is often difficult to fabricate a surface

which exhibits conformal properties while being mechanically robust. To address this

challenge, a multi-layer silicone finger skin was developed. Conformal properties are

provided by a relatively thick layer of very soft Shore OO 10 silicone gel. This soft

gel is coated by a thin and somewhat stiffer coating of Shore A40 silicone. Because

the junction between these layers is between two layers of silicone, very high bond

strength can be achieved using silicone adhesives. The result is somewhat analogous

to animal skin, where relatively stiff layers of dermis and epidermis protect the soft

layers of flesh underneath. In both cases, the skin is highly compliant in the direction

of its surface normal, but significantly stiffer under shear loading conditions. This

allows for shear forces to be exhibited on objects and the environment, even while

the manipulator surface is conforming to the object or environment. A cutaway of

this construction is shown in Figure 6.7.

6.4.3 Tactile Sensing

Robotic manipulation, whether for simple grasping or complex in-hand manipulation,

involves managing fingertip forces while maintaining contact with objects. As such,

high-resolution tactile data can be extremely useful. Towards this objective, and with

the overall design goals of low cost and robustness in mind, a novel tactile scheme

was developed and implemented.

As described in the previous section, the finger pads are constructed using a

multi-layer construction to exhibit mechanical toughness against strain loads and

mechanical compliance under normal loads. By measuring the deflection of the soft

inner layer, the contact forces can be estimated.

To observe this deflection, a layer of clear silicone was added below the soft inner

silicone layer shown in Figure 6.7, and an array of transflective photosensors was embedded below this clear layer, as shown in Figure 6.8. Transflective photosensors consist of an LED and phototransistor pair inside a single package. The LED and

phototransistor are arranged with a vergence angle such that the photocurrent varies

significantly depending on the reflectivity of objects located within a few millimeters


Figure 6.8: Cross-section rendering showing transflective sensors embedded in the finger pads.

Figure 6.9: Tactile array implemented as a rigid-flex PCB.

of the device.

To improve the optical properties as much as possible, the outer (tough) layer

is cast in black silicone, and the middle (soft) layer is cast in white silicone. As a result, when observed from below through the clear silicone layer, the proximity of the nearest white surface changes from approximately 1.5 mm to 0.25 mm, depending on the external forces applied to the silicone assembly. This variation in the proximity of the white layer produces a varying photocurrent, which is directed through a

transimpedance amplifier and low-pass filter before being digitized by a 16-bit analog-

to-digital converter.
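One plausible way to linearize these readings is a per-taxel piecewise-linear calibration table, sketched below. All table values are invented for illustration; real tables would be measured per taxel against known loads.

#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Map a raw 16-bit ADC count from one transflective photosensor to an
// approximate deflection of the white layer via piecewise-linear lookup.
struct CalPoint { uint16_t counts; double deflection_mm; };

constexpr std::array<CalPoint, 4> kCal = {{
  {  2000, 0.00 },  // unloaded: white layer ~1.5 mm from the sensor
  { 20000, 0.40 },
  { 45000, 0.90 },
  { 62000, 1.25 },  // heavily loaded: white layer ~0.25 mm away
}};

double deflection_mm(uint16_t raw) {
  if (raw <= kCal.front().counts) return kCal.front().deflection_mm;
  if (raw >= kCal.back().counts) return kCal.back().deflection_mm;
  for (std::size_t i = 1; i < kCal.size(); ++i)
    if (raw < kCal[i].counts) {
      const double a = double(raw - kCal[i - 1].counts) /
                       double(kCal[i].counts - kCal[i - 1].counts);
      return (1 - a) * kCal[i - 1].deflection_mm + a * kCal[i].deflection_mm;
    }
  return kCal.back().deflection_mm;
}

int main() {
  std::printf("%.2f mm\n", deflection_mm(30000));  // prints 0.60 mm
}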

Rigid-flex circuit boards were created to fit this circuitry into the space constraints

of the robotic fingers. Rigid-flex constructions allow for high-density, multi-layer cir-

cuitry on a portion of the assembly, with a subset of the copper layers then continuing


Figure 6.10: Flat test fixture for the tactile array.

outside the rigid fiberglass section and protruding while being covered only by flexible

polyimide film. As shown in Figure 6.9, the rigid portion includes the vast majority

of the components and is routed on six layers, whereas the flexible portion includes

only the photosensor array. When installed into the robotic finger, the rigid portion

resides inside the finger volume, and the flex portion wraps around the slightly conical

outside of the finger core, covered with a protective plastic “window frame” to secure

the photosensors against shear loads. This assembly is then covered with the layers

of clear, white, and black silicone. A flat test fixture of this construction is shown in

Figure 6.10.

By varying the durometers and thicknesses of each respective silicone layer, a va-

riety of sensor characteristics can be tuned, such as sensitivity, range, and mechanical

toughness. For the robotic hand described in this work, the layer thicknesses and

durometers were chosen experimentally to seek a balance between these properties to

allow sensing of handheld tool manipulation. A representative plot of the raw sensor

response to repeated cycles of loading and unloading a 2-gram US Penny is shown in

Figure 6.11, demonstrating that these 2-gram loads are far above the noise floor of

the sensor.

6.4.4 Strain

To provide a direct measurement of the force of the grasp closure, a strain gage

was affixed to the center of the F2 link. This gage measures the bending of the

F2 link as it is loaded under the combined torques of J2 and J3. Because of space

constraints in the F2 link, the strain is measured using a single gage in a quarter-bridge


Figure 6.11: Raw sensor response of repeatedly loading and unloading a 2-gram US Penny onto the skin assembly shown in Figure 6.10.

configuration. The differential bridge voltage is measured by a 24-bit analog-to-digital

converter after amplification.
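For reference, the standard quarter-bridge relation (a textbook formula, not one stated in this section) connects the bridge voltage to the strain $\varepsilon$; with gage factor $GF$ and excitation voltage $V_{ex}$,

\[
\frac{V_o}{V_{ex}} \;=\; \frac{GF\,\varepsilon}{4 + 2\,GF\,\varepsilon} \;\approx\; \frac{GF\,\varepsilon}{4},
\]

so for the small strains seen in bending, the amplified and digitized voltage is, to first order, proportional to the strain of the F2 link, and a single calibration constant suffices to map ADC counts to grasp force.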

6.4.5 Visual Sensing

Although tactile and strain sensing can be of great utility in managing and maintain-

ing contact forces, a key challenge in robotic manipulation is knowing where contact

forces should be applied. A wide variety of approaches have been applied to this

problem over decades of research, with many approaches utilizing vision-based data

as their sensory input. However, key challenges to all vision-based methods include

calibration and occlusion.

The calibration problem is simple to describe: namely, if visual or depth data

is obtained from head-mounted sensors on a humanoid robot, this information must

be propagated through typically nine joints (a pan/tilt head followed by a 7-DOF

arm) to arrive in the frame of the end effector. Angular errors and coordinate-frame

misalignments are cumulative, resulting in errors on the order of one centimeter even


on precisely machined, carefully calibrated humanoid robots. Although far superior

results can be obtained by the extremely rigid structures found in industrial robots,

and particularly by closed kinematic chains, the fundamental problem remains: prop-

agating sensory data through long kinematic chains is inherently difficult.

Occlusion is even simpler to describe: visual data is obtained by a 2-dimensional

projection of the scene onto an imaging device. By definition, it is impossible to see

through opaque objects. As a result, form-closure grasps of objects typically involve

placing at least one contact point on a region of the object that cannot be directly

perceived from a single viewpoint. A variety of strategies has emerged to predict

where these occluded grasp points should be placed. However, the ability to perceive

in the hand frame allows all sides of an object to be imaged by “flying” the hand

around the workspace. Such “eye-in-hand” systems have been proposed by a number

of researchers over the years. Recent developments in depth-sensing algorithms, com-

bined with ever-increasing electronics density, demonstrated the technical possibility

and utility of creating an in-hand stereo vision system.

A key design challenge of the in-hand stereo vision system was to create a system

which did not add to the volume of the hand, while simultaneously not compro-

mising the structural integrity of the mechatronic assembly. A first prototype was

constructed using mobile-phone camera modules. Due to global demand for “camera

phones,” hundreds of millions of camera modules are manufactured annually, with

massive market pressures pushing towards continual improvements in cost, volume,

and power.

Two approaches toward using cell-phone camera modules are shown in Figure 6.12,

using circuit boards which fit behind the palm surface of the hand and carry cam-

era modules on the tips of long protrusions reaching between the finger modules to

avoid introducing structural issues. The first effort aimed to maximize density

by soldering camera modules directly to this large circuit board, but the (relatively)

large metal lens-holding structures were found to cause manufacturability issues with

standard surface-mount electronics production tools. As a result, the second effort

shown in Figure 6.12 employed the standard mobile-phone fabrication technique of

pre-assembled camera modules and high-density connectors.


Figure 6.12: Trinocular camera boards holding direct-solder lens modules (left) and fully-assembled camera flex circuit boards (right).

As demonstrated by the recent explosion of perception research using low-cost 3D

sensors, depth data can significantly improve the capabilities of perception systems.

A variety of modalities have been explored over the past several decades. Passive

stereo systems are notorious for failing in sparse artificial scenes having low texture.

Unfortunately, this is a common situation in the envisioned use cases for robotic

hands. As a result, various active-sensing modalities were prototyped for the in-hand

vision system.

Pico Projector

The first prototype was based on a beam-steered pico projector. These devices use

oscillating MEMS mirrors to sweep laser beams across their images, and are compact

enough to mount on the side of a robot hand, as shown in Figure 6.13.

As described in several papers in the literature on full-size projectors, the projector can be used to create difference images of bar patterns at various horizontal scales,


Figure 6.13: Left: beam-steered pico projector affixed to the side of a robotic hand. Right: depth image constructed using this apparatus.

as shown in Figure 6.14. Pixel-wise depth estimates are then obtained by classify-

ing each pixel of each frame as {0, 1, indeterminate}, and converting the resulting

binary string to its unique plane emerging from the projector. Intrinsic calibration

of the camera produces a ray for each pixel, which intersects this projector plane to

produce a 3D estimate. Figure 6.13 shows the composite depth image produced by

the images of Figure 6.14.
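The geometry of this final step can be sketched compactly. The plane and ray values below are illustrative, not calibration data from this system.

#include <Eigen/Dense>
#include <cmath>
#include <cstdio>

using Vec3 = Eigen::Vector3d;

// Intersect the calibrated camera ray for a pixel with the projector plane
// decoded from that pixel's bit string, both in camera coordinates.
// Plane: n.x = d. Ray: x = t * dir from the camera origin. Returns t < 0
// when the ray is nearly parallel to the plane (no reliable intersection).
double intersect_ray_plane(const Vec3& dir, const Vec3& n, double d) {
  const double denom = n.dot(dir);
  if (std::abs(denom) < 1e-9) return -1.0;
  return d / denom;
}

int main() {
  const Vec3 ray = Vec3(0.1, -0.05, 1.0).normalized();  // from intrinsics
  const Vec3 n(0.8, 0.0, -0.6);  // decoded projector plane (assumed)
  const double d = -0.3;
  const double t = intersect_ray_plane(ray, n, d);
  if (t > 0.0) {
    const Vec3 p = t * ray;      // the per-pixel 3-D point estimate
    std::printf("%.3f %.3f %.3f\n", p.x(), p.y(), p.z());
  }
}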

Although this method has some benefits, particularly that its independent pixel-

wise estimation preserves sharp depth discontinuities, it also has several drawbacks

that became apparent during the prototyping phase. First and foremost, it relies on

using the pico projector to overcome ambient light. Safety regulations dictate that

handheld laser projectors must be eye-safe; as a result, they are limited to approximately 20

lumens, spread evenly across the entire scene. Although effective in somewhat dimly

lit rooms, this technique would not scale well to the variable lighting conditions of

many envisioned applications of the robot hand.


Figure 6.14: Difference images of polarity-inversion bar codes produced by the pico projector, amplified 5x.

Fingertip Laser Line Scanner

As a result of the limited ambient-light robustness of the pico projector, a simpler

scheme was next prototyped using a laser line generator on the back of a finger.

This finger was then slowly moved in such a way as to sweep the laser line across the

scene, following the simple and well-known geometry and image-processing techniques

of laser line scanners. As shown in Figure 6.15, this technique is relatively simple to

implement. Because the optical energy of the laser is collapsed onto a 1-D line, instead

of trying to cover a 2-D surface as in the pico projector, there is a significant gain in

signal-to-noise ratio simply due to the higher concentration of laser light for a given

optical power limit.


Figure 6.15: Left: laser line generator mounted on robotic finger. Middle: laser line sweeping across the scene. Right: typical frame and image-difference processing.

As expected from this modality, the finger-based prototype of this technique

showed the advantages and drawbacks typical of laser line scanners, as previously

described in Chapter 3. Fine detail emerges readily, as shown by the coffee mug scan

in Figure 6.16. However, scans are time consuming. The scan shown in Figure 6.16

was acquired in approximately one minute and assembled from 300 positions of the

finger-mounted laser line generator.

Speckle Generator

As a final prototype experiment, a laser speckle generator was employed to produce

unstructured light. As noted widely throughout the literature and demonstrated by

contemporary robots such as the Willow Garage PR2, the injection of unstructured

texture into scenes can significantly boost the performance of passive stereo vision

systems. This is particularly true in artificial scenes containing featureless walls and

objects, which can cause block-matching stereo algorithms to fail.

A demonstration of the utility of this method is shown in Figure 6.17 on a scene

of an envisioned future application of the robotic hand: grasping a coffee mug from a

table. As is common in artificial workspaces, the scene offers little texture for passive

stereo block-matching, hence producing few points on the surface of the coffee mug


Figure 6.16: Two scenes showing typical scans of the fingertip-mounted laser scanner.

itself. Laser line scanning (Figure 6.17, right-most column) produces hundreds of

thousands of points, but at the cost of acquiring three hundred images. Injecting

texture into the scene via laser diffraction (Figure 6.17, middle column) offers a

useful intermediate level of point cloud density with the benefits of a single-frame

point-cloud acquisition time.

6.5 Wiring Elimination

Wiring issues are a common cause of failure in robotic systems. Unfortunately, proper

design for wires that flex during normal operation often involves large structures to

increase the bend radius of moving wires, or capturing wires in hollow chains traveling

through carefully-designed routes to stay clear of moving parts. For the design of

a fully-actuated robotic hand, such volumetric requirements are prohibitive, as the

majority of the interior volume is already occupied by electro-mechanical systems. To

address these issues, significant design effort was expended to eliminate all moving


Figure 6.17: Demonstration of unaided stereo (left), texture-assisted stereo (center), and laser line scanning (right) on an artificial scene with very little texture.

wires from the design.

The result of this design effort can be seen in Figure 6.18, which compares the

initial hand prototype, which utilized pre-packaged motors and hall-effect encoders,

with the custom motor assemblies of the final design, which eliminated all loose wiring.

The wiring-reduction steps in the hand design can be categorized into three de-

sign efforts. First, wiring involved in the sensing and commutation of the motors.

Second, wiring to provide power and communications between the base of each finger

and sensor arrays mounted in the phalanges. Third, wiring connecting the major

subsystems of the hand. The approaches used to meet each of these challenges were

quite different, and will be detailed in the following sections.


Figure 6.18: Left: initial prototype hand. Right: final prototype, after extensive design work to eliminate loose wires.

6.5.1 Motor Wiring

As briefly mentioned in Section 6.3.2, the hand is actuated by a set of “outrunner”

brushless motors. These motors consist of a three-phase stator winding surrounded by

a permanent-magnet rotor ring. The direction of currents through the stator windings

must be commutated through a six-step cycle in synchrony with the position of the

rotor. Such motors are often packaged as an assembly featuring a small circuit board

underneath the stator that is only slightly larger than the rotor diameter, which

contains the necessary hall-effect switches to provide clocking for the commutation

sequence. Unfortunately, utilizing such pre-assembled motors can result in a design

such as that shown in the left side of Figure 6.18, where a great deal of wiring must be

run between the motor assemblies and the motor controllers. In a compact high-DOF

design environment such as a 12-motor hand, this can become difficult to manage.

To resolve this issue, as well as reduce the volume of the overall design, custom

motor controllers were designed. To control the fabrication cost of the design, com-

mercial off-the-shelf (COTS) low-cost outrunner brushless motors were employed.


Figure 6.19: Stackup of outrunner brushless motors, controller board, and heatsink.

These COTS motors were disassembled, after which their stators and rotors were

re-assembled around the custom motor controllers. As previously discussed in Sec-

tion 6.3.2, each finger module contains three outrunner brushless motors. Corre-

spondingly, each motor controller board contains the necessary hall effect sensors,

amplifiers, computation, and communications resources to drive three motors.

The motor control boards are mounted in between the rotors and the finger

heatsink, as shown in Figure 6.19. This placement allows the hall-effect sensors

for brushless commutation to be placed on the same circuit board as the amplifiers,

reducing the cost, volume, and reliability issues associated with bringing these sensor

inputs to off-board amplifiers.

The remainder of the motor-controller wiring shown in Figure 6.18 was eliminated

by soldering the stator windings directly to the motor control boards. Traces on the

motor control boards were brought to open pads directly underneath the termination

points of the magnet wire phases on the stators, allowing direct and secure connections

inside the volume of the enveloping rotors and avoiding any chance of wires being

ingested into the assembly.

On the motor control boards themselves, integrated motor drivers were utilized.


Figure 6.20: Finger Motor Controller Board (FMCB)

The ST L6229Q drivers were selected primarily for their compact size and high level

of integration, as they contain commutation logic, current control loops, and the three

half-bridges required for brushless commutation within a 5mm x 5mm package. Even

with the large interior cutouts required in the motor board for the stators to pass

through, the resulting layout easily fits within the volume of the finger module, as

shown in Figure 6.20.

However, one penalty for such a high level of power-stage integration is the rela-

tively high bridge resistance of approximately 2 ohms. To manage the heat dissipation

associated with this resistance, the amplifiers are mounted on the bottom side of the

motor control board, which is subsequently clamped to an aluminum thermal plate,

as shown in Figure 6.19.

To protect the high-speed outrunner rotors from damage, the back of the finger

module is covered by an aluminum shell which bolts to the thermal plate. The

shell also significantly increases the surface area available for heat convection. These

thermal features allow the fingers to operate without the associated reliability issues


of forced-air cooling, despite their relatively high power density and scale of electro-

mechanical integration.

6.5.2 Phalange Wiring

One of the key design challenges of a dense, fully-actuated robotic hand is to create

reliable and compact electrical connections out to the fingertip sensors. The F1 link, which connects the orthogonal joints J1 and J2, is particularly challeng-

ing since its internal volume is fully occupied by structural material, bearings, and

shafts for the four cable lengths passing through it. Because F1 is the most proximal

finger link, it sees the highest mechanical stresses. Thus, unfortunate tradeoffs are

created by any effort to remove structural material to create free space for electrical

conductors.

Several prototypes were created using thin 0.5mm-pitch flat-flex cable (FFC),

which has become a standard interconnect method for mass-market products such as

flip phones and hinged laptop screens. However, after the failure of several prototypes,

it was determined that although FFC wiring excels at single-DOF applications, such

as the consumer products listed previously, it is difficult to create satisfactory routes

and service loops for FFC cables in space-constrained multi-DOF settings such as the

F1 links of the fingers.

To bypass the challenges associated with creating routes for copper wiring in

parallel with the steel mechanical tendons, a series of experiments and prototypes

were constructed which use the steel mechanical tendons as electrical conductors.

The resulting system consists of several carefully co-designed mechanical and electrical

components, and is able to multiplex DC power and multi-megabit half-duplex data

onto the pair of steel tendons which pass through the F1 link.

To permit bidirectional and symmetric torque generation at each joint, the F2 and

F3 links each house terminations of a pair of steel tendons. The tendons wrap multiple

times around their respective floating gear reduction in the base of the finger module,

as described in Section 6.3.2. To eliminate the possibility of tendons slipping on their

respective gear reduction, each cable is carefully unwound for a few millimeters at its


Figure 6.21: Two pairs of steel cables actuate the distal phalanges, shown in red and yellow. Electrical implementation is shown at bottom.

midpoint, and a small screw passes through the cable and into a hole tapped into the

side of the gear reduction.

This mechanical cable actuation system was created to produce maximal torque

density with commodity low-cost parts. However, it can also be exploited to pass

power and data through the difficult design environment of the F1 link. As shown

in Figure 6.21, there are two steel cables running through F1: the cable wrapping

around M2 (illustrated in yellow) and the cable wrapping around M3 (illustrated in

red). Through their retaining screws, these cables are electrically coupled to the ring

gear of their respective gear reductions. These planetary gear reductions have metal

ball bearings and a metal output shaft. As a result of the cable tension preload, the

cables are therefore electrically connected to the output shaft of their respective gear

reductions. Each output shaft, in turn, is electrically connected to the metal retaining

features and setscrews which lock them relative to the plastic front cover of the finger

module chassis. By connecting wires to these retaining features, it is thus possible to


Figure 6.22: Simplified schematic of the multiplexing of power and half-duplex data over the pair of conductors running the length of the finger. RS-485 transceivers are connected to the D+/D- nodes; F2 and F3 power supplies are connected to the V+/V- nodes. Bus power is supplied from the FMCB (left).

obtain electrical connections to the F2 and F3 links, without requiring any flexible

copper conductors.

The steel mechanical tendons and the various mechanical features mentioned in

the previous section incur electrical penalties including both resistance and the po-

tential for intermittent connectivity as the ball bearings in the planetary output stage

are loaded and unloaded. The electrical resistance of the conductor chain totals 1.5

ohms; this would potentially be an issue if this transmission line were intended to drive

motors. However, since only sensors and relatively energy-efficient microcontrollers

are receiving power, the losses are manageable. The potential for intermittent con-

nectivity is addressed by capacitors on the distal sensors, allowing them to maintain

power through the momentary glitches. Data drops are handled by using a protocol

similar to USB, where each packet is protected by checksums and slightly spaced in

time, to allow for fast and unambiguous re-synchronization of all transceivers.
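A toy illustration of such framing follows; the start marker, field layout, and checksum choice are all assumptions for illustration, not the actual bus protocol.

#include <cstdint>
#include <cstdio>
#include <vector>

// Frame layout (assumed): start marker, length, payload, additive checksum.
// A receiver that sees a bad checksum discards the frame and waits for the
// next inter-packet gap to re-synchronize.
std::vector<uint8_t> frame_packet(const std::vector<uint8_t>& payload) {
  std::vector<uint8_t> f;
  f.push_back(0x42);  // assumed start-of-frame marker
  f.push_back(static_cast<uint8_t>(payload.size()));
  uint16_t sum = 0;
  for (uint8_t b : payload) {
    f.push_back(b);
    sum = static_cast<uint16_t>(sum + b);
  }
  f.push_back(static_cast<uint8_t>(sum & 0xff));  // checksum, low byte first
  f.push_back(static_cast<uint8_t>(sum >> 8));
  return f;
}

int main() {
  for (uint8_t b : frame_packet({0x01, 0x02, 0x03}))
    std::printf("%02x ", b);
  std::printf("\n");
}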

Although this conductor chain is mechanically robust, it only has two conductors.

The phalange sensors require both power and data connections to be useful. Thus,

a multiplexing scheme is employed to transmit both power and half-duplex RS-485

data across the pair of conductors. By reversing the standard connections to one

inductor of a common-mode choke, it becomes a “differential-mode choke,” which


strongly attenuates differential-mode signals. Thus, the pair of conductors can have

DC power flowing through it, but still act as a transmission line for data. The RS-485

transceivers on the motor controller and the distal and proximal phalanges are then

capacitively coupled to this conductor pair. Each transceiver has a transmitter-enable

line that is controlled by its corresponding microcontroller. Similar to USB and other

half-duplex architectures, bus traffic follows a simple master/slave scheduling method

to prevent collisions. A simplified schematic diagram is shown in Figure 6.22.

A key requirement of this multiplexing scheme is a data stream with zero DC off-

set. Otherwise, the data will attempt to “drive” the power line slightly higher or lower,

resulting in either saturation of the inductors or overloading the active transceiver’s

output stage. To avoid this condition, the microcontrollers in each module were specif-

ically selected to include Manchester encoders. The Manchester encoding scheme is

a simple method of removing DC bias in a data stream by replacing each “0” or “1”

bit with a “0-1” or “1-0” pair of chips. Because the chipping rate is now twice the

data rate, effectively half the bandwidth has been lost. This condition is consider-

ably improved in more contemporary 8b/10b coding standards, among many others.

However, space constraints in the phalanges, the desire to use commodity low-cost

microcontrollers, and the sufficiency of its throughput for tactile data transmission

resulted in the selection of Manchester coding for these experiments.
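To make the scheme concrete, a software sketch of the encoding follows. The hand relies on the microcontrollers' built-in hardware Manchester encoders; this C++ version is purely illustrative.

#include <cstdint>
#include <cstdio>
#include <vector>

// Each data bit becomes two chips, so every bit period is DC-balanced at
// the cost of half the bandwidth, as described above.
std::vector<uint8_t> manchester_encode(const std::vector<uint8_t>& data) {
  std::vector<uint8_t> chips;  // one chip (0 or 1) per element
  for (uint8_t byte : data)
    for (int b = 7; b >= 0; --b) {
      const int bit = (byte >> b) & 1;
      chips.push_back(bit ? 1 : 0);  // "1" -> 1-0 chips, "0" -> 0-1 chips,
      chips.push_back(bit ? 0 : 1);  // following the convention above
    }
  return chips;
}

int main() {
  // 0xA5 = 10100101 encodes to the chip stream 1001100101100110:
  for (uint8_t c : manchester_encode({0xA5}))
    std::printf("%d", int(c));
  std::printf("\n");
}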

6.5.3 Interconnect Wiring

The design challenges and approaches taken to address the motor and phalange wiring

issues were somewhat specific to the design environment of a robotic hand with many

degrees of freedom. However, there are other interconnection challenges that are com-

mon to any high-density electronic system: connecting the various planes of circuitry

in the palm. These circuit boards do not move with respect to each other, allowing

standard high-density connectors to be employed. The cameras, palm board, FPGA

board, and power board all interconnect using 0.5mm-pitch, dual-row connectors.

The internal mechanical structures holding these circuit boards were the result of

careful and interactive electrical and mechanical co-design. The end result is that


installing the circuit boards into the hand causes their connectors to blind-mate, after

which the boards are locked into place using mechanical fasteners.

In contrast to the static interconnections of the palm circuit boards, the fingers are

designed to separate harmlessly from the palm in the event of mechanical overload.

This requires electrical connections which have sufficient capacity to power the motors

of a finger module and disconnect in “5.5” degrees of freedom: all three axes of torque

overload, and all axes of force overload except along the ray pointing from the finger

module to its mating surface on the back of the palm. The many potential modes

of mechanical finger/palm disconnection ruled out many standard connectors, which

would shear or otherwise be destroyed. As a result, a pogo-pin based approach was

employed. Four spring-loaded pogo pins protruding from the front face of each finger

module compress against suitably-placed gold-plated pads on a circuit board attached

to the back of the palm, carrying DC power and RS-485 data between the palm and

the finger module. As the finger disconnects during a mechanical overload, these

pogo pins simply slide or otherwise unload from their respective contact pads. The

circuitry immediately downstream and upstream of this spring-loaded connection is

designed to handle the transients associated with momentary connections or shorts of

these pins and their neighbors. As a result, mechanical breakaway can occur without

mechanical or electrical damage to the finger/palm connection.

Though it is difficult to render a three-dimensional interconnect topology in two

dimensions, Figure 6.23 illustrates the interconnection of the circuit boards in the

palm, as well as the motor control board of a single finger. The “pogo board” shown

connecting a finger socket to a finger motor board is simply an interconnect board

used to create rigid “wiring” between pogo pins protruding from the front of the

finger module to the finger motor control board, which must reside on a plane near

the back of the finger for the reasons described in Section 6.5.1.

6.6 Computational Systems

The hand contains a distributed network of processors: three ARM Cortex-M3 micro-

controllers in each finger module, one ARM Cortex-M4 in the palm, and one Xilinx


Figure 6.23: Left: illustration of the connectors in the palm and one finger module base. Right: the resulting hand, which features no loose wires.

Spartan-6 FPGA in the palm. These computational resources are connected through

a hybrid star-like topology, as illustrated in Figure 6.24. The microcontroller in the

base of each finger module has an RS-485 connection to a dedicated UART in the

palm microcontroller, as well as an RS-485 transceiver connected to its phalange

data/power bus, which has two slave controllers on the respective F2 and F3 links,

as described in Section 6.5.2. The palm microcontroller has a variety of outside-

facing peripherals, such as a CANbus isolator, 10/100 ethernet, and a USB isolator,

as well as an internal SPI link to the FPGA, which in turn has a high-bandwidth

connection to a gigabit ethernet physical-layer transceiver (PHY). This architecture

was designed to allow a variety of connectivity options to upstream hardware in the

often-noisy electrical environment of a robotic arm.

Because of the relatively large number of microcontrollers, attention was given to

designing methods to allow batch programming of the entire hand. It would be un-

fortunate, for example, if firmware updates required an operator to manually connect
programming adapters to all twelve ARM processors in the fingers.

Figure 6.24: Data bus topology of the robotic hand.

To prevent this sit-

uation, a custom bootloader was written that is compatible with the packet structures

used during normal operation of the hand. This bootloader waits for commands for

several seconds on power-up before booting the application image. Electrical power

to the nodes on each data bus can be shut down electronically by its upstream pro-

cessor. This allows a reprogramming cycle to be triggered at any point by forcing a

hard reset of the downstream processors, catching their respective bootloaders, and

transmitting a new application flash image. Through this method, a single script on

an attached computer can reprogram all twelve finger microcontrollers.
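To make the sequence concrete, the following is a minimal sketch of the host-side reprogramming loop. The helper functions, bus numbering, and image filename are hypothetical stand-ins; the actual script and RS-485 packet formats are not reproduced here.

#include <cstdio>

// Hypothetical helpers standing in for the real RS-485 packet protocol.
static void power_cycle_bus(int finger) {
  std::printf("power-cycling finger bus %d\n", finger);  // upstream MCU cuts bus power
}
static bool catch_bootloader(int finger) {
  return true;  // stub: the bootloader accepts commands for several seconds after reset
}
static bool send_flash_image(int finger, const char *path) {
  std::printf("flashing %s to finger %d\n", path, finger);
  return true;  // stub: stream the application image, packet by packet
}

int main() {
  // Three microcontrollers per finger module, four fingers: twelve targets.
  for (int finger = 0; finger < 4; ++finger) {
    power_cycle_bus(finger);        // hard-reset the downstream processors
    if (catch_bootloader(finger))   // catch them inside the bootloader window
      send_flash_image(finger, "finger_app.bin");
  }
  return 0;
}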

6.7 Teleoperation Interfaces

A variety of teleoperation interfaces were created to subjectively evaluate the perfor-

mance of the robotic hand with a collection of test objects. First, several off-the-shelf

gloves based on resistive flex-sensors were employed. The gloves used during these

tests, however, were challenging to calibrate for precise movements, and had tenden-

cies to experience “mechanical crosstalk” between the flex sensors in the proximal
knuckles of the glove as they gradually migrated in between the knuckles of the tele-
operator during extended experimental sessions.

Figure 6.25: An exoskeletal glove designed to precisely measure the movements of the teleoperator, with a kinematic structure similar to the robotic hand.

To improve teleoperation accuracy, an exo-skeletal glove was designed and inter-

faced to the robotic hand, as shown in Figure 6.25. Through a series of four-bar

linkages, this glove directly measures the relative orientations of the tele-operator’s

fingers, and was designed to have kinematics similar to the robotic hand. Limited,

subjective evaluations supported the hypothesis that direct measurement of the con-

figuration of the exoskeletal structure allowed for higher-precision teleoperation, as

compared to inferring the position of the tele-operator’s fingers through resistive flex-

sensor measurements.

The exo-skeletal glove allowed for high-dimensional movements to be expressed by

the tele-operator, such as finger-gaiting an object off a work surface and into a grasp.

However, the exo-skeletal glove required a significant amount of operator attention

to perform basic tasks such as perfectly cylindrical or spherical grasps.

This was partially because the range of motion of the robotic hand is significantly

super-human, requiring amplification of the motions of the exo-skeletal hand. The
resulting mapping required mental work on the part of the tele-operator and could
be surprisingly challenging.

Figure 6.26: Various grasps and manipulation postures achieved during tele-operation.

To achieve super-human, reliable execution of “simple” grasps, an eigengrasp-

based interface was developed. Eigengrasps for canonical grasps such as cylindri-

cal, spherical, and prismatic grasps (among others) were experimentally determined.

Commodity gamepads were then utilized to traverse the eigengrasps by holding var-

ious buttons and moving the gamepad axes. Although this interface allowed quick

traversals of canonical grasps, the thumb-joystick interaction of commodity gamepads

caused difficulties in producing small, precise motions. The eigengrasp interface was

then used more successfully when mapped to the sliders of commodity audio-mixing

panels, which are widely available, low cost, and can connect to host computers over

USB and other common protocols. The relatively long travel of the sliders, and the

absence of spring-return features, allowed the mixing-panel eigengrasp interface to

more easily perform tasks in subjective evaluations.
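The mapping underlying these interfaces is straightforward: each slider scales one eigengrasp basis vector, and the scaled vectors are summed onto a rest posture to produce joint-angle targets. The following is a minimal sketch of this computation; the joint count, number of sliders, and basis contents are placeholder assumptions rather than the experimentally determined values used on the hand.

#include <array>
#include <cstddef>

constexpr int kNumJoints = 12;  // placeholder joint count
using JointVector = std::array<double, kNumJoints>;

// q = q_rest + sum_i s_i * e_i, where s_i is a slider position in [0, 1]
// and e_i is the i-th eigengrasp basis vector.
JointVector eigengrasp_pose(const JointVector &rest,
                            const std::array<JointVector, 2> &basis,
                            const std::array<double, 2> &sliders) {
  JointVector q = rest;
  for (std::size_t i = 0; i < basis.size(); ++i)
    for (int j = 0; j < kNumJoints; ++j)
      q[j] += sliders[i] * basis[i][j];
  return q;
}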

Demonstrations of the robotic hand were performed by attaching the hand to

the wrist of a Barrett WAM arm, which was tele-operated using a straightforward

master-slave configuration with a second Barrett WAM arm. A second tele-operator

operated the hand using the exo-skeletal glove for high-dimensional tasks such as

finger gaiting, and the mixing-panel eigengrasp interface for simpler tasks such as

spherical grasps. A representative sampling of grasps and manipulations achieved

during these demonstrations is shown in Figure 6.26.

6.8 Summary

A low-cost and robust robotic hand was designed, prototyped through numerous

iterations, and demonstrated performing a variety of grasping tasks and simple ma-

nipulations. The design of this robotic hand was described in detail, focusing on its

electrical systems.

Chapter 7

STAIR and Switchyard

The previous chapters presented hardware subsystems intended to support the cre-

ation of low-cost robotic systems capable of performing a variety of tasks envisioned

of a future personal robot. However, the previous chapters did not directly address

the challenge of creating the immense volume of software required for a general-

purpose robot to successfully operate autonomously, or even semi-autonomously, in

the unstructured environments of typical homes and offices. To address this chal-

lenge, several robotics software integration frameworks were created as part of the

STanford AI Robot (STAIR) project.

This chapter provides a brief overview of the STAIR project and describes the

design of Switchyard, its original software integration framework. Switchyard was

created to support the rapid integration efforts of dozens of contributors to the STAIR

project, as the project quickly grew to include numerous parallel development efforts

for various robot subsystems.

7.1 Introduction

The STanford Artificial Intelligence Robot (STAIR) project was a long-running effort

by many researchers at Stanford University, led by Professor Andrew Ng. The long-

term goal of the STAIR project was to develop technologies necessary for the future

emergence of viable home and office assistant robots.

As concrete steps towards this goal, several large-scale demonstrations were cre-

ated to encourage the repeated integration of a large number of complex software

and hardware subsystems. These demonstrations included fetching items in response

to verbal commands, navigating multiple floors of office buildings using elevators,

and performing inventory-taking tasks in cluttered environments. Carrying out these

tasks involved the integration of large software subsystems for navigation, spoken

dialog, visual object detection, and robotic grasping, among others.

At the AAAI 2007 Mobile Robot Exhibition, videos were presented of the STAIR

robot performing a “fetch a stapler” demonstration. In the creation of these demon-

strations, having a consistent software framework was found to be critical to building

a robotic system as complex as STAIR. The following sections describe

these technical challenges in detail.

7.2 STAIR: Hardware Systems

The first robots built for the STAIR project were named simply STAIR 1 and

STAIR 2. Although each robot had a manipulator arm on a mobile base, they differed

in virtually all implementation details.

7.2.1 STAIR 1

The STAIR 1 robot, shown in Figure 7.1, was constructed using largely off-the-shelf

components. The robot was built atop a Segway RMP-100. The robot arm was

a Neuronics Katana 6M-180, equipped with a parallel plate gripper at the end of

a position-controlled arm with 5 degrees of freedom. Sensors used in the demon-

strations included Point Grey Bumblebee stereo and trinocular cameras, a SICK

LMS-291 laser scanner in the horizontal plane, and a Sony EVI-D100 pan-tilt-zoom

video camera. For various experiments, additional laser scanners were mounted in

vertical and diagonal planes, as well as atop panning stages, to obtain full 3D point

clouds during both stationary and mobile settings.

The sensors were primarily mounted on an aluminum-extrusion frame bolted to
the table of the Segway base.

Figure 7.1: Left: the STAIR 1 robot. Right: the STAIR 2 robot.

The self-balancing capabilities of the Segway were not

used; rather, an additional aluminum frame was constructed which added wheels to

the front and back of the robot to provide static stability. This was done as a practical

measure to avoid damage in the event of an emergency stop, at the cost of increasing

the footprint of the robot by approximately 20cm fore and aft.

STAIR 1 was powered by a deep-cycle 12-volt battery feeding an array of DC-

DC converters, which produced the various DC voltages required for the sensing and

computational subsystems. An off-the-shelf automatic battery charger carried by the

robot allowed the 12-volt power rail to function as a large-capacity uninterruptable

power supply (UPS), thereby enabling the computers and sensors to remain running

as AC power was removed for mobile experiments. The power system allowed for

approximately two hours of runtime on typical system loads.

Onboard computation was provided by a Pentium-M machine running Linux and

a Pentium-4 machine running Windows. These machines were connected via an on-

board ethernet switch and, via an 802.11g wireless bridge, to workstations distributed

throughout the wired building network.

7.2.2 STAIR 2

The STAIR 2 platform is also shown in Figure 7.1. Its original wheeled base (com-

prising the bottom 10 inches of the robot) was designed and constructed by Reuben

Brewer of the Stanford Biorobotics Laboratory. This base had four steerable turrets,

each of which contained two independently-driven wheels. As a result, the platform

could holonomically translate in any direction, turn in place, or translate and rotate

simultaneously. Desired motions in the robot’s coordinate system were translated

by a dedicated XScale processor on a Gumstix Verdex board into motor commands.

This mobile base was eventually replaced by a Segway RMP-400 holonomic base con-

structed from four Mecanum wheels, which provided similar capabilities for holonomic

motion.

The mechanical structure of STAIR 2 was also provided by an aluminum-extrusion

frame, which supported a Barrett WAM arm and a variety of sensors. The WAM

arm is a well-known manipulation system with seven degrees of freedom and a 3-

fingered hand. A dedicated onboard Linux PC was used to control the arm. For

perception, the robot was equipped with a Point Grey Bumblebee2 stereo camera,

with perceptual computation carried out on a second onboard Linux machine. The

power and networking systems were similar to STAIR 1: an ethernet switch linked the

onboard computers, and an 802.11g wireless bridge provided connectivity to various

offboard workstations.

7.3 Switchyard

Because of the large number of researchers contributing to the project and the parallel

development of its various subsystems, considerable effort was devoted to improving

the mechanisms used to facilitate collaborative robot software design, debugging, and

distribution.

Many other researchers have worked in this area of robot software systems, produc-

ing notable robotics frameworks such as Player/Stage [29], CARMEN [66], MCA [82],

Tekkotsu [97], Microsoft Robotics Studio, and many others.

After investigating these existing frameworks, we determined that our platform

and goals differed sufficiently from those of the designers of other frameworks that

implementing a purpose-built framework would be worthwhile. Specifically, our hard-

ware systems comprised a heterogeneous collection of operating systems and network

topologies. This implied requirements of both cross-platform support and peer-to-

peer messaging, a combination which (at the time) was uncommon. Further, we desired to reduce

boilerplate code and simplify the messaging interface as much as possible, an admit-

tedly vague and subjective goal, yet one which significantly impacts the perceived

difficulty of adopting a software framework. Finally, the project required a collection

of tools for multi-machine process management, hierarchical subsystem composition,

and efficient message recording and playback.

This collection of requirements was subjectively determined to differ sufficiently

from existing projects to justify the creation of a new framework. As such, the Switch-

yard project was started. The following sections describe its design requirements and

goals.

Parallel Processing

The STAIR demonstration applications ran on large, highly capable robots, and

required carrying out a considerable amount of computation. The software had ele-

ments with soft-realtime requirements, such as obstacle avoidance, and longer-running

planning and scene analysis tasks. The onboard computational resources of the robots

could not support all of the required computations, resulting in the usage of several

offboard machines to support each experiment.

Modularity

Because the STAIR project involved dozens of researchers contributing to a sizable

code base, it was important to enforce modularity between software components so

that components could be debugged and verified in isolation as much as possible.

OS-neutral

Most of the computational resources used by the STAIR project were Linux work-

stations and servers. However, a few onboard sensors were only supplied with propri-

etary Windows drivers. As a result, the software system was required to operate on

both Linux and Windows, with data streams flowing transparently between them.

Robot-Independent

The STAIR project originally included two robots with significantly different hard-

ware. This encouraged the creation of software which was as robot-independent as

possible, to limit the size and complexity of the code base. Although some software

modules functioned as device drivers and thus were tied to hardware, the architec-

tural goal was to create as many software modules as possible which only sent and

received hardware-independent messages.

Clean

As with any large software project, research progress is significantly easier if the

software is clean, streamlined, and as short as possible.

7.4 Approach

The Switchyard framework was designed to meet the aforementioned requirements.

It supported parallel processing through message passing along a user-defined, task-

specific graph of peer-to-peer connections between software modules.

Modularity was enforced through the classical operating system process model:

each software module executes as a process on some CPU on the network. TCP was

chosen for message passing due to its support on all modern operating systems and

networking hardware, and its lossless, in-order delivery allowed simple parsers that usually

did not need to handle re-synchronization.

From an aesthetic standpoint, the library was in the form of C++ classes which

each module extended to provide the required functionality. Networking, routing,

and scheduling code were abstracted away from the client software modules, allowing

most modules to have very little boilerplate code. The peer-to-peer connections were

encoded in hierarchical XML files.

These design choices were certainly debatable, and indeed, lessons learned from

Switchyard soon led to another software framework which will be described in the

next chapter. However, for completeness, the following sections will provide details

of Switchyard’s design and operation, as used in video demonstrations shown at the

AAAI 2007 Mobile Robot Exhibition.

7.4.1 Message-Passing Topology

Switchyard set up a “virtual cluster” of computers on top of an existing cluster of

networked machines. The loose term “virtual cluster” is meant to indicate that a

subset of machines on a local-area network operated as a cohesive group during a run

of the robot.

Master Server

One computer in the virtual cluster was chosen to be the master server. Importantly,

the master server did not process all traffic flowing through the virtual cluster. Such

routing patterns would result in a star network topology, which is highly inefficient

for networks with heterogeneous connections between machines.

As a concrete example, consider a STAIR robot with an onboard ethernet switch

connecting several machines onboard the robot, as well as a wireless bridge connect-

ing the onboard ethernet segment to the building network, which in turn consists of

many ethernet switches connecting many more machines. Network throughput be-

tween machines on either side of the wireless link is excellent, but throughput across

the wireless link is often slow, latent, and even intermittent, as the robot moves

through the building and encounters regions of poor radio coverage. For simplicity,

the Switchyard framework was designed around the concept of a single master server

that maintains data structures defining the peer-to-peer connections, and informs

new machines joining the virtual cluster which processes to launch, and where to

find their peers. By definition, the master server must reside on only one side of the

wireless link, and if it were to process all data messages, the wireless link would become
a bottleneck, grinding throughput to a halt across the entire virtual cluster, even if the data-heavy message

flows are fully contained on the subnets on either side of the wireless bridge.

As a result of this routing challenge, the master server was only used to automate

the startup, shutdown, and interconnection of the virtual cluster. Data payloads

sent between software modules always traveled on peer-to-peer TCP connections.

On startup, the master server loaded an XML description of the desired connection

graph—the topology of the virtual cluster—and automated its creation, as described

in the following section.

Startup

A simple process-launching program ran on every machine that was part of the virtual

cluster. This program was started with the IP address of the master server and

its “virtual machine name,” which was not necessarily its host name. The process

launcher connected to the master server, announced its name, received back a list

of processes that were to be launched, and forked them. This step was only for

convenience and to reduce manual scripting requirements; if a process needed to be

launched manually (for example, inside a debugging tool), it could be excluded from

the automatic-launch list.

The actual software modules themselves were invoked with the IP address of the

master server, a “virtual process name,” which was not necessarily the executable

image filename, and an available TCP port on which to open a server socket. Processes

started a server on their assigned port, connected to the master server, announced

their name, and received back a list of other processes with which to establish peer-

to-peer connections. The processes then automatically connected to their peers, and

started sending and receiving peer-to-peer messages.

Message streams

Message streams in Switchyard were always unidirectional and asynchronous with

respect to any other modules. From a data-flow perspective, a running system was

visualized as a directed graph, with processes corresponding to graph nodes, and

peer-to-peer connections corresponding to directed edges in the graph. The sending

(or “upstream”) node sent messages at any time. Each message stream could have

any number of downstream receiving nodes. Data was always sent in self-contained

messages such as images, laser scans, maps, matrices, or navigation-waypoint lists.

Although each of these messages could have been divided into smaller units (for
example, images and matrices could be divided into rows or blocks), downstream nodes

would likely have had to reconstruct the original logical unit (e.g., a full image or

matrix) before processing could begin. Thus, to reduce code size and complexity in the

receiving node, Switchyard only presented the user code with entire messages; message

assembly and inflation to user-space data structures were hidden from programmers

using the system.

To simplify the code required for typical uses of the system, a C++ class hierarchy

was provided, together with a set of macros that reduce the typical boilerplate to a

single-line macro instantiation. Abstract superclasses contained all networking and

sequencing code required to transmit byte blocks of arbitrary size. These superclasses

were then derived and specialized to create each type of data flow in the system, which

included:

• 2D, 3D, and 6D points

• Visual and depth images

• 2D navigation waypoint paths

• Grid maps for navigation

• 2D particle clouds for localization

• Configuration-space manipulator coordinates and paths

• Text strings

• Audio samples

• Miscellaneous simple tokens for state-machine sequencing

Each subclass contained only the code necessary to serialize and deserialize its

data structure to and from a byte stream. These methods are implemented as C++

virtual functions, which allows the higher-level scheduling code to invoke the serialize

and deserialize methods without needing to know what is actually being transmitted.

This use of C++ polymorphism significantly reduced the code size and complexity

required for implementing message routing and re-synchronization.
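A minimal sketch of this class structure is shown below, using a simple two-dimensional point message as the concrete type. The names and framing details are illustrative, not Switchyard's actual source.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Abstract superclass: the routing code moves opaque byte blocks and calls
// these virtuals without knowing the concrete message type.
class Message {
public:
  virtual ~Message() {}
  virtual void serialize(std::vector<uint8_t> &buf) const = 0;
  virtual void deserialize(const uint8_t *data, std::size_t len) = 0;
};

// Concrete message type: only field-ordering code lives here.
class Point2D : public Message {
public:
  double x = 0.0, y = 0.0;
  void serialize(std::vector<uint8_t> &buf) const override {
    buf.resize(2 * sizeof(double));
    std::memcpy(&buf[0], &x, sizeof x);
    std::memcpy(&buf[0] + sizeof x, &y, sizeof y);
  }
  void deserialize(const uint8_t *data, std::size_t len) override {
    if (len < 2 * sizeof(double))
      return;  // defensive: the framework delivers complete frames
    std::memcpy(&x, data, sizeof x);
    std::memcpy(&y, data + sizeof x, sizeof y);
  }
};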

Because the computation graph ran asynchronously, whenever a process was ready

to send data to its downstream peers, it invoked a framework-provided function which

did the following:

1. The serialize method of the message subclass serialized the message data

structure to its linear representation.

2. The length of the serialization was sent to downstream peers via TCP.

3. The serialization itself was sent to downstream peers via TCP.

On the downstream side, the framework continually performed the following steps:

1. The serialized size of a message was received and adequate space allocated, if

necessary, to buffer it.

2. The serialization itself was received and buffered.

3. The deserialize virtual method in the message subclass inflated the data

structure.

4. A virtual function was called to notify the receiving process that a data structure

was ready for processing.

To avoid race conditions, each data flow had a mutex which was automatically

locked during the inflation and processing of each message. Thus, the data-processing

code was not required to be re-entrant, which often simplified its implementation. The

framework silently dropped incoming messages if the process had not finished handling

the previous message, thus implementing an effective message queue of length 1.
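This drop-if-busy policy can be sketched with a per-stream try-lock, reusing the hypothetical Message class from the previous sketch; the framework's actual thread and socket plumbing is omitted.

#include <cstdint>
#include <mutex>
#include <vector>

// One instance per incoming message stream.
class InboundStream {
public:
  explicit InboundStream(Message &msg) : msg_(msg) {}

  // Called by the stream's network thread for each complete frame received.
  void on_frame(const std::vector<uint8_t> &frame) {
    std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
    if (!lock.owns_lock())
      return;  // previous message still being handled: silently drop this one
    msg_.deserialize(&frame[0], frame.size());
    // ...invoke the process's virtual message-ready callback here...
  }

private:
  Message &msg_;
  std::mutex mutex_;  // locked during inflation and user processing
};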

Data Flow Registration

To initialize a Switchyard process, message streams were required to be instantiated

and registered with the framework. This was typically done in the process constructor,

with a single line of C++ code for each message stream.

If a process was producing data on a message stream, the following actions were

taken by the framework:

• The data flow name, IP address, and port number of the sending process were

sent to the master server, which maintained a static map between processes in

the (static) system topology, and their current network location.

• As message-receiving processes connected to message-producing processes, the

client socket connections were spun off and stored in internal data structures.

• Whenever a message-sending process wished to send data down a message

stream, the framework would invoke the corresponding serialize method for

the message type, and send the serialized message to all downstream peers, as

discussed previously.

If a process was receiving data on a message stream, the following actions were

taken by the framework:

• A thread was spun off to handle the network communications of the message

stream asynchronously from the main thread of the process.

• This thread attempted to connect via TCP with the processes that produced

messages on the particular stream in question, using the peer connection mapping

maintained by the master server.

• Once connected to a peer process, the thread announced the data flow it wished

to receive.

• The thread then synchronized to the data stream and invoked user code to

process each incoming message, as discussed in previous sections.

By organizing the behavior in this manner, the entire peer-to-peer messaging

scheme was automatically orchestrated by the framework, saving a great deal of bug-

prone, repetitive networking and sequencing code in each process.

Configuration Ports

Many robotics software modules have startup parameters that need to be configured

for a particular system run. For example, a map server has access to many map files,

a laser scanner can be configured in a variety of resolutions, and so on. Following
the precedent set in the Player/Stage framework, the graph file itself can optionally
contain startup parameters which, at runtime, will override the default values hard-
coded in each process.

Figure 7.2: Pan-tilt-zoom (PTZ) camera control graph

As reduction of C++ boilerplate code was a design goal, the configuration ports

were implemented by a table of pointers, which was built up in the constructor of each

process. A single line of C++ code registers a particular variable (e.g. a string or a

double) with the framework, which then handled requests from the master server to

overwrite the values of variables. Because this mechanism was used only for startup

configuration, not run-time parameter changes, such behavior could not cause race

conditions.
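A sketch of such a pointer table is shown below; the class and method names are illustrative rather than Switchyard's actual API. In a process constructor, a camera driver might then call register_port("compression_quality", &quality_), matching the port element in the XML graph example of the next section.

#include <map>
#include <string>

class ConfigPorts {
public:
  // One call per parameter, typically from the process constructor.
  void register_port(const std::string &name, double *var) { doubles_[name] = var; }
  void register_port(const std::string &name, std::string *var) { strings_[name] = var; }

  // Invoked when the master server sends a startup override.
  bool override_port(const std::string &name, const std::string &value) {
    auto s = strings_.find(name);
    if (s != strings_.end()) { *s->second = value; return true; }
    auto d = doubles_.find(name);
    if (d != doubles_.end()) { *d->second = std::stod(value); return true; }
    return false;  // unknown port name: ignore
  }

private:
  std::map<std::string, double *> doubles_;
  std::map<std::string, std::string *> strings_;
};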

7.4.2 Operation

To run an experiment or demonstration, the following steps occurred:

Graph Design

The machines available to run the experiment were described in the XML graph file,

by hostname or IP address. Then, the software processes needed for the experiment

or demonstration were either used off-the-shelf, or written from scratch. Finally, the

connections between the processes were defined in the XML graph file.

As a concrete example, the following XML graph file routed video from machine

“1” to machine “2” and allowed remote pan-tilt-zoom (PTZ) control.

<graph>
  <comp name="1">
    <proc name="video_display"/>
    <proc name="ptz_control"/>
  </comp>
  <comp name="2">
    <proc name="ptz_camera">
      <port name="compression_quality" value="40"/>
    </proc>
  </comp>
  <conn from="1.ptz_control.ptz" to="2.ptz_camera.ptz"/>
  <conn from="2.ptz_camera.frames" to="1.video_display.frames"/>
</graph>

When this graph was run, it caused two processes to start on computer 1, and

one process to start on computer 2. Once the processes were running, they connected

to each other automatically and messages started flowing. A visualization of this

simple graph is shown in Figure 7.2. It allowed machine “1” (typically offboard the

robot) to view a video stream and command the camera to pan, tilt, and zoom. The

camera device driver was running on machine “2” (typically onboard the robot) and

the video was transmitted across the network as individual JPEG-compressed images

of configurable quality.

The “fetch an item” demonstration involved a much larger graph. As shown in

Figure 7.3, this graph involved 21 processes running on 3 machines: two onboard

the robot and one offboard. The modular software structure allowed parallel devel-

opment of many modules in much smaller, even trivial, graphs, by disparate groups

of contributors, without requiring coordinated synchronous development and testing

efforts. Unit testing in this fashion thus reduced development time and helped to

isolate bugs.

Figure 7.3: Graph of the original STAIR “fetch a stapler” demonstration. The large red text indicates the tasks performed by various regions; those annotations were not a functional part of the graph.

The asynchronous nature of the Switchyard framework is also shown in Figure 7.3.

Cycles in this graph would create potential deadlocks if the graph were to operate

synchronously. To keep things simple, the framework enforced no synchronization:

any and all synchronous behavior was implemented using local state variables in

modules which required it.

7.5 Fetch a Stapler

This section will discuss how the previously-described software and hardware systems

were used to perform a “fetch an item” demonstration. First, a user verbally asked the

robot to fetch an item, in this case, a stapler. In response to this spoken command, the

robot navigated to the area containing the item, found it using visual object detection,

applied a learned grasping strategy to pick up the object, and finally navigated back

to the user to deliver the item.

Figure 7.3 shows the detailed organization of the components used in this demon-

stration. “Computer 1” was an offboard Linux machine, “Computer 2” was an on-

board Windows machine, and “Computer 3” was an onboard Linux machine. The

number in parentheses after each process name indicates which computer ran the
process, and the directed edges in the graph show the message streams.

The processes used could be subdivided into roughly five main functional par-

titions, which were implemented using Switchyard by five largely disjoint teams of

researchers.

Spoken Dialog

In prior work on the STAIR project [47], a Markov Decision Process (MDP) model was
developed to provide robust management of human-robot dialog. However, because the dialog

requirements of the “fetch an item” demonstration were so simple, this behavior

was approximated by using the CMU Sphinx speech-recognition system [102] with a

grammar that included numerous variations of “STAIR, please fetch the [object] from

the [location].” When such a phrase was recognized, the robot gave a simple verbal

acknowledgment using the Festival speech-synthesis system.

Unfortunately, fan noise propagating throughout the metal-framed robot resulted

in unusable audio quality from the robot’s onboard omnidirectional microphones. To

produce usable audio data, a lapel-clip microphone was worn by the experimenter.

The audio signal was transmitted via analog FM to a nearby computer, where it was

digitized using the PortAudio library inside the audio_mic process, resulting in

a continual message stream of digital audio samples.

The digital-audio message stream was received by the sphinx_transcribe pro-

cess, which utilized the CMU Sphinx speech-recognition system to continuously pro-

cess the incoming digital-audio stream and attempt to recognize a simple grammar.

When the system recognized a “fetch an item” command, it passed the command to

the fetch_item_director process, which functioned as a central planner. In the “fetch

an item” demonstration, verbal acknowledgment of the command was generated using

the Festival speech-synthesis system, by the tts_festival process running onboard

the robot, which verbalized text strings sent to it along a message stream.

Navigation

A comprehensive navigation architecture which enabled STAIR to navigate in indoor

environments and open doors was described in [68]. However, for the “fetch an item”

video demonstration shown at AAAI 2007, portions of this system were replaced to

increase speed and robustness to changing environments. The navigation system used

a Voronoi-based global planner and an implementation of VFH+ [101] to avoid local

obstacles. The robot used a map of the Gates building on the Stanford campus, which

was built offline using the DP-SLAM algorithm [23].

As shown in Figure 7.3, the navigation and localization module used a collection of

independent processes to stream laser-scanner readings, perform localization against

a map, generate low-level commands to control the robot base, and so on. The

processes with soft-realtime requirements were run onboard the robot (machine 3),

while processes running on longer time-scales, such as the Voronoi-based path planner,

were run on a more powerful offboard machine (machine 1).

Object detection

The visual object detection system used by STAIR combined a wide-angle camera
and a steerable pan-tilt-zoom (PTZ) camera to form a foveal-peripheral imaging
system. This subsystem was described

in detail in [32], but is briefly described here for completeness.

Since object detection is significantly easier from high resolution images than

from low resolution ones, the high-resolution images provided by the steerable pan-

tilt-zoom camera significantly improve the accuracy of the visual object detection

algorithm. (Note that, in contrast, obtaining high resolution, zoomed-in images this

way would not have been possible if we were performing object detection on images

downloaded off the Internet.) In our demonstration, a fast offboard machine (com-

puter 1) was responsible for steering our robot’s PTZ camera to obtain high resolution

images of selected regions, and for running the object recognition algorithm. An on-

board machine (computer 2) was used to run the low-level device drivers responsible

for steering the camera and for taking/streaming images. Our object recognition

system was built using image features described in [86].

Grasping

To pick up the object, the robot used the grasping algorithm developed in [80, 81].

The robot used a stereo camera to acquire an image of the object to be grasped. Using
the visual appearance of the object, a learned classifier then selected a good “grasp

point”—i.e., a 3D position at which to attempt to pick up the object. The algorithm

for choosing a grasp point was trained on a large set of labeled natural and synthetic

images of a variety of household objects. Although this training set did not include

staplers, the learned feature set was robust enough to generalize to staplers. The low-

level drivers for the camera and the robot arm were run onboard the robot (computer

2); the slower algorithm for finding a grasp point was run offboard (computer 1). An

example of the robot executing a grasp of the stapler is shown in Figure 7.4.

7.6 Summary

This chapter described the hardware and software systems that allowed the STAIR

robot to perform the “fetch a stapler” demonstration, emphasizing its software frame-

work. The Switchyard framework provided a uniform set of conventions for commu-

nications across processes, and allowed different research teams to write software in

parallel for many different modules. Using Switchyard, these modules were then easy

to execute simultaneously and in a distributed fashion across a small set of onboard
and offboard computers.

Figure 7.4: The STAIR 1 robot picking up a stapler.

After extensive use of Switchyard in the Stanford AI Lab,

many lessons were learned which contributed to the Robot Operating System (ROS),

which will be described in the next chapter.

Chapter 8

ROS: A Robot Operating System

The previous chapter provided an overview of the STAIR project, focusing on Switch-

yard, its software-integration platform, which was created to facilitate collaboration

among the project contributors. Although the Switchyard framework was used in the

Stanford AI Laboratory for a variety of research projects and integration efforts, some

areas of potential improvement were quickly identified. These included the static na-

ture of the message-passing graph, the requirement of hand-coding additional message

types, and the lack of software tools and infrastructure to permit radical scaling of

the number of contributors.

As a result of these shortcomings, which were deemed fundamental in nature, a

new project was started: the Robot Operating System (ROS), a collaboration between

the author and numerous other researchers, including many at Willow Garage, Inc.

This chapter describes the motivations, design, and implementation of ROS, an open-

source software framework intended to facilitate collaborative development of complex

robot software systems. ROS is not an operating system in the traditional sense of

processor scheduling and low-level resource management; instead, ROS provides a

structured communications layer above the host operating systems of heterogeneous

compute clusters of various sizes and capabilities.

8.1 Overview

As discussed in the previous chapter, writing software for robots is difficult, particu-

larly as the scale and scope of robotics continues to grow. Different types of robots can

have wildly varying hardware, making code reuse nontrivial. On top of this, the sheer

size of the required code can be daunting, as it must contain a deep stack starting

from driver-level software and continuing up through perception, abstract reasoning,

and beyond. Since the required breadth of expertise is well beyond the capabilities

of any single researcher, robotics software architectures must also support large-scale

software integration efforts.

To meet these challenges, robotics researchers have created a variety of frameworks

to manage complexity and facilitate rapid prototyping of software for experiments,

resulting in the many robotic software systems currently used in academia and indus-

try [45]. Each of these frameworks was designed for a particular purpose, perhaps in

response to perceived weaknesses of other available frameworks, or to place emphasis

on aspects which were seen as most important in the design process.

ROS, the framework described in this chapter, is also the product of tradeoffs and

prioritizations made during its design cycle. Its emphasis on large-scale integrative

robotics research is intended to be useful in a wide variety of situations as robotic

systems grow ever more complex. This chapter discusses the design goals of ROS, how

its implementation works towards them, and demonstrates how ROS handles several

common use cases of robotics software development.

8.2 Design Goals

ROS is not the best framework for all robotics software. The field of robotics is far

too broad for a single solution. For example, there are applications and domains with

strict power, size, or cost constraints that preclude using POSIX-based computers

capable of running ROS. There are also applications for which mission-critical safety

considerations are paramount and drive the entire system design. In contrast, ROS

was designed to meet a specific set of challenges encountered when developing large-

scale service robots as part of the STAIR project [74] at Stanford University and the

Personal Robots Program [107] at Willow Garage. Although the resulting architecture

has found use in a variety of robotics domains, it is important to note its design

environment from the outset.

The philosophical goals of ROS can be summarized as:

• Peer-to-peer

• Tools-based

• Multi-lingual

• Thin

• Free and Open-Source

The following sections will elaborate on these philosophies, showing how they

influenced the design and implementation of ROS.

8.2.1 Peer-to-Peer

A system built using ROS consists of a number of processes, potentially running on

a number of different computers, connected at runtime in a peer-to-peer topology.

Although frameworks based on a central server (e.g., CARMEN [66]) can also realize

the benefits of the multi-process and multi-host design, a central data server is prob-

lematic if the computers are connected in a heterogeneous network, or if very high

data rates are involved.

For example, on the large service robots for which ROS was designed, there are

typically several onboard computers connected via wired ethernet. This network seg-

ment is bridged via wireless LAN to high-power offboard machines that are running

computation-intensive tasks such as computer vision or speech recognition (Figure

8.1). Running the central server either onboard or offboard would result in unnec-

essary traffic flowing across the (slow) wireless link, because many message routes
are fully contained in the subnets either onboard or offboard the robot. In contrast,
peer-to-peer connectivity, combined with buffering, throttling, and “fanout” software
modules where necessary, can allow topologies to be readily crafted that achieve
full utilization of the wired links on either side of the wireless link, with only
low-bandwidth, latency-tolerant data traveling across the link.

Figure 8.1: A typical ROS network configuration

8.2.2 Multi-lingual

When writing code, many individuals have preferences for some programming lan-

guages above others. These preferences are the result of personal tradeoffs between

programming time, ease of debugging, syntax, runtime efficiency, and a host of other

reasons, both technical and cultural. For these reasons, ROS was designed to be

language-neutral. ROS software is often written in C++, Python, or Java, although

language ports in various states of completion exist to a variety of other languages,

including LISP, C#, and Octave/MATLAB.

The ROS specification is at the messaging layer, not any deeper. Peer-to-peer

connection negotiation and configuration occurs in XML-RPC, for which reasonable

implementations exist in most major languages. Rather than provide a C-based

implementation with stub interfaces generated for all major languages, ROS has been

natively implemented in each target language to better follow the conventions of each

language. However, in some cases it is expedient to add support for a new language

by wrapping an existing library. For example, the Octave client is implemented by

wrapping the ROS C++ library.

To support cross-language development, ROS uses a simple, language-neutral in-

terface definition language (IDL) to describe the messages sent between modules.

The IDL uses (very) short text files to describe fields of each message, and allows

composition of messages, as illustrated by the complete IDL file for a joystick state

message:

Header header

float32[] axes

int32[] buttons

Code generators for each supported language then generate native message im-

plementations which “feel” like native objects and are automatically serialized and

deserialized by ROS as messages are sent and received. This saves considerable pro-

grammer time and errors; the previous 3-line IDL file is automatically expanded to

hundreds of lines of tedious code in C++, Python, Java, LISP, and Octave.
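As a usage sketch, the generated C++ class for the joystick message above can be published with only a few lines of roscpp; the node name, topic name, and publication rate here are illustrative, and the example assumes the message has been generated as sensor_msgs::Joy.

#include <ros/ros.h>
#include <sensor_msgs/Joy.h>

int main(int argc, char **argv) {
  ros::init(argc, argv, "joy_publisher_example");
  ros::NodeHandle nh;
  ros::Publisher pub = nh.advertise<sensor_msgs::Joy>("joy", 10);

  ros::Rate rate(10);  // publish at 10 Hz
  while (ros::ok()) {
    sensor_msgs::Joy msg;
    msg.header.stamp = ros::Time::now();
    msg.axes.assign(2, 0.0f);  // generated float32[] fields behave like std::vector
    msg.buttons.assign(4, 0);  // generated int32[] fields likewise
    pub.publish(msg);          // serialization happens automatically
    rate.sleep();
  }
  return 0;
}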

Because the messages are generated automatically from such simple text files, it

is easy to enumerate new types of messages. At time of writing, the known publicly-

accessible ROS-based software ecosystem contains over two thousand types of mes-

sages. The ease of creating new message types facilitates easier refactoring of larger

software modules into a set of smaller ones. In turn, this allows for finer-grained

debugging, load balancing, and unit testing.

The code generators thus help produce a language-neutral message processing

scheme where different languages can be mixed and matched as desired. Each module

of a ROS network can be written in the language which best fits its (often competing)

design objectives of programmer expertise, performance requirements, and ease of

maintenance.

8.2.3 Tools-based

In an effort to manage the complexity of ROS, it follows a microkernel-inspired design,

where a large number of small tools are used to build and run the various ROS

components. This is in contrast to a monolithic type of design, where a single complex

program is used to provide an integrated development environment (IDE).

The suite of tools provided with ROS perform a variety of tasks to speed up devel-

opment and debugging. These include navigating the (very large) source code forest,

getting and setting configuration parameters, visualizing the peer-to-peer connection

topology, measuring bandwidth utilization, graphically plotting message data, auto-

generating documentation, and so on. Although core services such as a global clock

and a global logging mechanism could have been implemented inside the master mod-

ule, for virtually all decisions of this type, such functionality was pushed into separate

modules. The rationale for such actions was that the loss in efficiency was more than

offset by gains in stability and complexity management.

8.2.4 Thin

As was eloquently described in [59], most existing robotics software projects contain

drivers or algorithms which could be reusable outside of the project. Unfortunately,

due to a variety of reasons, much of this code has become so entangled with mid-

dleware that it is difficult to “extract” its functionality and re-use it outside of its

original context.

To combat this tendency, driver and algorithm development in ROS is strongly

encouraged to occur in standalone libraries that have no dependencies on ROS. The

ROS build system performs modular builds inside the source code tree, and its use

of CMake makes it comparatively easy to follow this “thin” ideology. ROS modules

are strongly encouraged, though not required, to place virtually all complexity in

libraries. These libraries can then be wrapped by small executables which expose

library functionality to ROS, to allow for easier code extraction and reuse beyond

the original goals. As an added benefit, unit testing is often far easier when code is

factored into libraries, as standalone test programs can be written to exercise various

features of the library against pre-defined task sequences or known-good datasets.

ROS re-uses code from numerous other open-source projects, such as the drivers,

navigation system, and simulators from the Player project [104], vision algorithms

from OpenCV [8], and planning algorithms from OpenRAVE [19], among hundreds of

such examples. In each case, ROS is used only to expose various configuration options

and to route data into and out of the respective software, with as little wrapping or

patching as possible. To benefit from the continual community improvements, the

ROS build system can automatically update source code from external repositories,

apply patches, and perform various other modifications to external source trees.

8.2.5 Free and Open-Source

The full source code of ROS is publicly available. This is critical to facilitate de-

bugging at all levels of the software stack. While proprietary environments such as

Microsoft Robotics Studio [40] and Webots [63] have many commendable attributes,

for some tasks there is simply no substitute for a fully open platform. This is par-

ticularly true when hardware and many levels of software are being designed and

debugged in parallel.

ROS is distributed under the terms of the BSD license, which allows the devel-

opment of both non-commercial and commercial projects. ROS passes data between

modules using inter-process communications, and does not require that modules link

together in the same executable. Systems built around ROS can use fine-grain li-

censing of their various components: individual modules can incorporate software

protected by various licenses ranging from GPL to BSD to proprietary, but license

“contamination” ends at the module boundary.

8.3 Nomenclature

The fundamental concepts of the ROS implementation are nodes, messages, topics,

and services. These terms are used similarly to those in the previous chapter describ-

ing Switchyard. For clarity, they are described in the following paragraphs.

Nodes are processes that perform computation. ROS is designed to be mod-

ular at a fine-grained scale: a system is typically composed of many nodes, with

many systems involving several hundred nodes. In this context, the term “node” is

interchangeable with “software module.” The use of the term “node” arises from vi-

sualizations of ROS-based systems at runtime. As illustrated in the previous chapter,

when many nodes are running, it is convenient to render the peer-to-peer commu-

nications as a graph, with processes as graph nodes and the peer-to-peer links as

arcs.

Nodes communicate with each other by passing messages. A message is a strictly

typed data structure. Standard primitive types (integer, floating point, boolean,

etc.) are supported, as are arrays of primitive types and constants. Messages can be

composed of other messages, and arrays of other messages, nested arbitrarily deep.

Pairs of messages, termed the request and response, form a service.

A node sends a message by publishing it to a given topic, which is simply a string

such as “odometry” or “map.” A node that is interested in a certain kind of data

will subscribe to the appropriate topic. There may be multiple concurrent publishers

and subscribers for a single topic, and a single node may publish and/or subscribe to

multiple topics. In general, publishers and subscribers are not aware of each others’

existence.

The simplest communications are along pipelines:

microphone → speech recognition → dialog manager → speech synthesis → speaker
However, graphs are usually far more complex, and typically contain cycles and nu-

merous one-to-many or many-to-many connections.

Although the topic-based publish-subscribe model is a flexible communications

paradigm, its asynchronous “broadcast” routing scheme can be overly complex for

simple synchronous transactions. In ROS, a simple synchronous transaction is called

a service, and is defined by a name and a pair of strictly typed messages: one for

the request and one for the response. This is analogous to web services, which are

typically defined by URIs and have request and response documents of well-defined

types. Note that, unlike topics, only one node can advertise a service of any particular

name: there can only be one service called “classify_image”, for example, just as there

can only be one web service at any given URI.
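Service types are described with the same IDL: the request message is written above a divider and the response below it. A hypothetical definition for the classify_image service mentioned above might read as follows (the field choices are illustrative):

# classify_image.srv: request fields above the divider, response fields below
sensor_msgs/Image image
---
string label
float32 confidence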

8.4 Use Cases

This section describes a number of common scenarios encountered when using

robotic software frameworks. The open architecture of ROS allows for the creation

of a wide variety of tools. Describing the ROS approach to these use cases will also

introduce a number of the tools designed to be used with ROS.

8.4.1 Debugging a single node

When performing robotics research, often the scope of work is limited to a well-defined

area of the system, such as a node which performs some type of planning, reasoning,

perception, or control. However, to bring up a robotic system for experiments, a

much larger software ecosystem must exist. For example, to do vision-based grasping

experiments, drivers must be running for the camera(s) and manipulator(s), and any

number of intermediate processing nodes (e.g., object recognition, pose detection,

trajectory generation) also must be up and running. This adds a significant amount

of difficulty and overhead to integrative robotics research.

ROS is designed to minimize the difficulty of debugging in such settings: its strictly

modular structure allows nodes undergoing active development to run alongside pre-

existing, well-debugged nodes. Because nodes connect to each other at runtime, the

graph can be dynamically modified. In the previous example of vision-based grasping,

a graph with perhaps a dozen nodes is required to provide the “infrastructure.” This

infrastructure graph can be started and left running during an entire experimental

session. Only the node(s) undergoing source code modification need to be periodi-

cally restarted after each recompilation or parameter adjustment, at which time ROS

silently handles the graph modifications to disconnect and reconnect the program

after its relaunch. This can result in a massive increase in productivity, particularly

as the robotic system in question becomes ever more complex and interconnected.

To emphasize, altering the computation graph in ROS often amounts to simply

starting or stopping a POSIX process. In debugging settings, this is typically done at

the command line or in a debugger. The ease of inserting and removing nodes from

a running ROS-based system is one of its most powerful and fundamental features.

8.4.2 Logging and playback

Research in robotic perception is often done most conveniently with logged sensor

data, to permit controlled comparisons of various algorithms and to simplify the

experimental procedure. ROS supports this approach by providing generic logging

and playback functionality. Any ROS message stream can be dumped to disk and

later replayed. Importantly, this can all be done at the command line; it requires no

modification of the source code of any pieces of software in the graph.

For example, the following network graph could be quickly set up to collect a

dataset for visual-odometry research:


[Graph: camera and robot nodes publishing data streams consumed by logger and visualizer nodes.]

The resulting message dump can be played back into a different graph, which

contains the node under development:

[Graph: logger → vision research → visualizer]

As before, node instantiation can be performed simply by launching a process; it

can be done at the command line, in a debugger, from a script, etc.
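The logging and playback steps themselves are also shell commands; a sketch using the rosbag tool, with illustrative topic and file names:

    rosbag record -O session.bag /camera/image_raw /robot/odom   # dump topics to disk
    rosbag play session.bag                                      # replay into the graph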

To facilitate logging and monitoring of systems distributed across many hosts, the

rosconsole library builds upon the Apache project’s log4cxx system to provide a

convenient and elegant logging interface, allowing printf-style diagnostic messages

to be routed through the network to a single stream called rosout.

8.4.3 Packaged subsystems

Some areas of robotics research, such as indoor robot navigation, have matured to the

point where “out of the box” algorithms can work reasonably well. ROS leverages

the algorithms implemented in the Player project to provide a navigation system,

producing this graph:


[Graph: a navigation subsystem comprising robot, laser, map, localization, and planner nodes.]

Although each node can be run from the command line, repeatedly typing the

commands to launch the processes can get tedious, particularly with large subgraphs.

To allow for “packaged” functionality such as a navigation system, ROS provides a

tool called roslaunch, which reads an XML description of a graph and instantiates

the graph on the cluster, optionally on specific hosts. The end-user experience of

launching a navigation system then boils down to

roslaunch navstack.xml

and a single Ctrl-C will gracefully close all processes described in the XML document.

This functionality can also significantly aid sharing and reuse of large demonstrations

of integrative robotics research, as the set-up and tear-down of large distributed

systems can be encoded once by its creator, and subsequently inserted and removed

from other ROS systems.
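A navstack.xml of this kind might be sketched as follows; the <launch> and <node> elements follow the roslaunch schema, while the package and node names here are illustrative:

    <launch>
      <node pkg="map_server" type="map_server" name="map" args="floorplan.yaml" />
      <node pkg="amcl" type="amcl" name="localization" />
      <node pkg="nav_planner" type="planner" name="planner" />
    </launch>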

8.4.4 Collaborative Development

Due to the vast scope of robotics and artificial intelligence, collaboration between

researchers is necessary in order to build large systems. To support collaborative


development, the ROS software system is organized into packages. The definition

of “package” is deliberately open-ended: a ROS package is simply a directory which

contains an XML file describing the package and stating any dependencies.
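For instance, a manifest.xml at the root of a package might be as simple as the following sketch, with illustrative dependencies:

    <package>
      <description brief="demo">An example ROS package.</description>
      <license>BSD</license>
      <depend package="roscpp" />
      <depend package="std_msgs" />
    </package>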

A collection of ROS packages is a directory tree with ROS packages at the leaves:

a ROS package repository may thus contain an arbitrarily complex scheme of sub-

directories. For example, one ROS repository has root directories including “nav,”

“vision,” and “motion planning,” each of which contains many packages as subdirec-

tories.

ROS provides a utility called rospack to query and inspect the code tree, search

dependencies, find packages by name, etc. A set of shell expansions called rosbash

is provided for convenience, accelerating command-line navigation of the system.

The rospack utility is designed to support simultaneous development across mul-

tiple ROS package repositories. Environment variables are used to define the roots

of local copies of ROS package repositories, and rospack crawls the package trees as

necessary. Recursive builds, supported by the rosmake utility, allow for cross-package

library dependencies.
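Typical command-line usage of these utilities looks like the following; the package name is illustrative:

    rospack find amcl      # print the filesystem path of a package
    rospack depends amcl   # list its dependencies, recursively
    roscd amcl             # rosbash expansion: cd into the package directory
    rosmake amcl           # recursively build the package and its dependencies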

The open-ended nature of ROS packages allows for great variation in their struc-

ture and purpose: some ROS packages wrap existing software, such as Player or

OpenCV, automating their builds and exporting their functionality. Some packages

build nodes for use in ROS graphs, other packages provide libraries and standalone

executables, and still others provide scripts to automate demonstrations and tests.

The packaging system is meant to partition the building of ROS-based software into

small, manageable chunks, each of which can be maintained and developed on its own

schedule by its own team of developers.

At time of writing, several thousand ROS packages exist across over one hundred

publicly-viewable repositories, and hundreds more likely exist in private repositories at

various institutions and companies. The ROS distributions, both binary and source-

code, are available via the ROS website:

http://ros.org

Additional packages are found on other sites. Known publicly-viewable code reposi-

tories are regularly queried and indexed by a crawling engine, with a searchable index


available on the ROS website.

8.4.5 Visualization and Monitoring

While designing and debugging robotics software, it often becomes necessary to ob-

serve some state while the system is running. Although printf is a familiar technique

for debugging programs on a single machine, this technique can be difficult to extend

to large-scale distributed systems, and can become unwieldy for general-purpose mon-

itoring.

Instead, ROS can exploit the dynamic nature of the connectivity graph to “tap

into” any message stream on the system. Furthermore, the decoupling between pub-

lishers and subscribers allows for the creation of general-purpose visualizers. Simple

programs can be written which subscribe to a particular topic name and plot a par-

ticular type of data, such as laser scans or images. However, a more powerful concept

is a visualization program which uses a plugin architecture: this is done in the rviz

program, which is distributed with ROS. Visualization panels can be dynamically in-

stantiated to view a large variety of datatypes, such as images, point clouds, geometric

primitives (such as object recognition results), robot poses, and trajectories. Plugins can easily be written to display additional types of data.

A native ROS port is provided for Python, a dynamically-typed language sup-

porting introspection. Using Python, a powerful utility called rostopic was written

to filter messages using expressions supplied on the command line, resulting in an in-

stantly customizable “message tap” which can convert any portion of any data stream

into a text stream. These text streams can be piped to other UNIX command-line

tools such as grep, sed, and awk, to create complex monitoring tools without writing

any code.
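For example, the following commands require no code beyond the shell; the topic names are illustrative:

    rostopic list                                   # enumerate active topics
    rostopic hz /camera/image_raw                   # measure a topic's publish rate
    rostopic echo /mechanism_state | grep torque    # tap a stream and filter as text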

Similarly, a tool called rxplot provides the functionality of a virtual oscilloscope,

plotting any variable in real-time as a time series, again through the use of Python

introspection and expression evaluation.


8.4.6 Composition of functionality

In ROS, a “stack” of software is a cluster of nodes that does something useful, as

was illustrated in the navigation example. As previously described, ROS is able to

instantiate a cluster of nodes with a single command, once the cluster is described in

an XML file. However, sometimes multiple instantiations of a cluster are desired. For

example, in multi-robot experiments, a navigation stack will be needed for each robot

in the system, and robots with humanoid torsos will likely need to instantiate two

identical arm controllers. ROS supports this by allowing nodes and entire roslaunch

cluster-description files to be pushed into a child namespace, thus ensuring that

there can be no name collisions. Essentially, this prepends a string (the namespace)

to all node, topic, and service names, without requiring any modification to the code

of the node or cluster. Figure 8.2 shows a hierarchical multi-robot control system

constructed by simply instantiating multiple navigation stacks, each in its own namespace.

Figure 8.2 was automatically generated by the rxgraph tool, which can inspect

and monitor any ROS graph at runtime. Its output renders nodes as ovals, topics as

squares, and connectivity as arcs.
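The namespace push can be expressed directly in a roslaunch description; a sketch, where the namespace names and the nav_demo package are illustrative:

    <launch>
      <group ns="robot1">
        <include file="$(find nav_demo)/navstack.xml" />
      </group>
      <group ns="robot2">
        <include file="$(find nav_demo)/navstack.xml" />
      </group>
    </launch>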

8.4.7 Transformations

Robotic systems often need to track spatial relationships for a variety of reasons:

between a mobile robot and some fixed frame of reference for localization, between

the various sensor frames and manipulator frames, or to place frames on target objects

for control purposes.

To simplify and unify the treatment of spatial frames, a transformation system has

been written for ROS, called tf. The tf system constructs a dynamic transformation

tree which relates all frames of reference in the system. As information streams in

from the various subsystems of the robot (joint encoders, localization algorithms,

etc.), the tf system can produce streams of transformations between nodes on the

tree by constructing a path between the desired nodes and performing the necessary

calculations.


Figure 8.2: An automatically-generated rendering of a running ROS system

For example, the tf system can be used to easily generate point clouds in a sta-

tionary “map” frame from laser scans received by a tilting laser scanner on a moving

robot. As another example, consider a two-armed robot: the tf system can stream

the transformation from a wrist camera on one robotic arm to the moving tool tip of

the second arm of the robot. These types of computations can be tedious, error-prone,

and difficult to debug when coded by hand, but the tf implementation, combined with

the dynamic messaging infrastructure of ROS, allows for an automated, systematic

approach.
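Querying the tree from the Python client library can be sketched as follows; the frame names are illustrative:

    import rospy
    import tf

    rospy.init_node('tf_demo')
    listener = tf.TransformListener()
    # block until the /map -> /base_link transform becomes available
    listener.waitForTransform('/map', '/base_link', rospy.Time(0), rospy.Duration(4.0))
    # trans is a 3-vector, rot a quaternion, relating the two frames
    (trans, rot) = listener.lookupTransform('/map', '/base_link', rospy.Time(0))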


8.5 Summary

ROS follows a philosophy of modular, tools-based open-source software development.

Its open-ended design was intended to be readily extensible by other researchers, to

build robot software systems which can be useful to a variety of hardware platforms,

research settings, and runtime requirements. At time of writing, it has become a

popular framework for human-scale service robotics, as well as for a variety of other robotics domains.


Chapter 9

Conclusions

This thesis presented a series of hardware and software systems designed to facilitate

personal robotics. In the context of this thesis, “personal robotics” was taken to im-

ply robotic systems that are primarily owned by, or perform work directly benefiting,

the end users. The linguistic borrowing of the term personal from the computing

industry was intended to highlight the nature of this emerging application domain.

Like early personal computers, the application space is still being explored, and the

economic and societal impacts of highly capable personal robots appear to be promis-

ing, but remain unclear. As a result, the work described in this thesis investigated

two key challenges in contemporary personal robotics: hardware cost and software

interoperability.

The over-arching theme of the hardware systems contained in this thesis was to

enable low-cost mechanical subsystems by increasing the complexity of the software,

firmware, or electrical designs. The justification for these efforts was simple: highly

complex software can be mass-produced essentially at zero cost, run on ever-cheaper

computing systems, and more easily incorporate future improvements by collabora-

tors. In contrast, high-precision mechanical assemblies require massive economies of

scale and significant investment in static tooling in order to obtain cost reduction.

The robotic hand described in Chapter 6 incorporated numerous design elements

from the preceding chapters, and served to illustrate the potential of highly dense

and tightly co-designed low-cost mechanical and electrical systems.


The software systems presented in this thesis were explicitly designed to facilitate

the re-use of software among the robotics community, and have resulted in a significant

increase in the quality and quantity of interoperable open-source robotics software.

The Robot Operating System (ROS) framework was designed to have sufficient per-

formance and scalability to manage the high-level, non-realtime intercommunications

of a complex, human-scale robot. In addition, ROS was designed to be integration-

friendly by reducing the effort required to create and package software modules, and

equally important, to subsequently incorporate community-contributed collections of

these software modules to create a customized state-of-the-art robot software system.

The integration-friendly design elements detailed in Chapter 8 have allowed the ROS

software ecosystem to benefit from thousands of contributors.

ROS is not the highest-performing framework for modular robotics software, as

measured by metrics of messaging throughput, latency, or jitter. There are many

technical improvements that can and should be made in future work. In particular,

many opportunities exist for new frameworks and collaboration methodologies in the

low-level, hard-realtime subdomains of robot software systems.

However, the various performance limitations of ROS have not prevented its adop-

tion in the robotics community. Instead, ROS provides messaging performance that

is “good enough” for a variety of medium- to high-level, non-realtime tasks, such as

the perceptive and deliberative layers of embodied AI systems. Rather than allo-

cating the time and resources required to implement the highest possible messaging

performance in ROS, massive effort was expended to reduce barriers to technical col-

laboration among scientists and engineers. This included numerous design cycles and

efforts to simplify the fundamental messaging API, web-based documentation and in-

dexing tools, and extensive efforts by collaborators at Willow Garage, Inc. and other

institutions to provide hands-on training to thousands of individuals, freely provide

fully-supported hardware platforms through the PR2 Beta Program, and organize

numerous technical conferences and workshops. The result of these efforts is a large

and growing community, comprising academic institutions, corporations, government

agencies, and other interested parties, who are now creating software of increasing

interoperability and generality.


The work presented in this thesis, comprising low-cost hardware design method-

ologies and collaborative open software systems, suggests several directions for con-

tinued research towards the long-term goal of widespread, general-purpose personal

robots. In particular, the performance of low-cost, low-power embedded systems con-

tinues to increase rapidly, blurring the traditional performance boundaries between

microcontrollers and microprocessors. Exploiting the capability of these architec-

tures for real-time processing, while still providing an integration-friendly program-

ming and debugging environment, would allow increased community interaction with

lower levels of the system stack. This points towards opportunities for the creation of

new software frameworks intended for memory-constrained, resource-limited environ-

ments, and which are specifically designed to support massive collaboration. Creating

interoperable bridges between open low-level software frameworks and high-level soft-

ware systems such as ROS is an exciting avenue for future research and development.


Bibliography

[1] H. Aldridge and J.-N. Juang. Joint Position Sensor Fault Tolerance in Robot

Systems using Cartesian Accelerometers. AIAA Guidance, Navigation, and

Control Conference, 1996.

[2] RO Ambrose, H. Aldridge, RS Askew, RR Burridge, W. Bluethmann,

M. Diftler, C. Lovchik, D. Magruder, and F. Rehnmark. Robonaut: NASA’s

Space Humanoid. IEEE Intelligent Systems and Their Applications, 15(4):57–

63, 2000.

[3] L. Bao and S. Intille. Activity Recognition from User-Annotated Acceleration

Data. Pervasive Computing, 3001/2004, 2004.

[4] Y. Bar-Cohen and C.L. Breazeal. Biologically-Inspired Intelligent Robots. So-

ciety of Photo Optical, 2003.

[5] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-Up Robust Features

(SURF). Computer Vision and Image Understanding, 2008.

[6] M. Berna, B. Lisien, B. Sellner, G. Gordon, F. Pfenning, and S. Thrun. A

learning algorithm for localizing people based on wireless signal strength that

uses labeled and unlabeled data. In Proc. of the International Joint Conference

on Artificial Intelligence (IJCAI), 2003.

[7] P. Bolliger. Redpin - adaptive, zero-configuration indoor localization through

user collaboration. In Proc. of the First ACM Int. Workshop on Mobile Entity

Localization and Tracking in GPS-less Environments, 2008.


[8] G. Bradski and A. Kaehler. Learning OpenCV. O’Reilly, 2008.

[9] R. Brooks, C. Breazeal, M. Marjanovic, B. Scassellati, and M. Williamson.

The Cog Project: Building a Humanoid Robot. Computation for metaphors,

analogy, and agents, pages 52–87, 1999.

[10] H. Bruyninckx. Open Robot Control Software: the OROCOS Project. In IEEE

International Conference on Robotics and Automation, pages 2523–2528, 2001.

[11] W. Burgard, A. B. Cremers, D. Fox, D. Haehnel, G. Lakemeyer, D. Schulz,

W. Steiner, and S. Thrun. Experiences with an Interactive Museum Tour-Guide

Robot. Artificial Intelligence, 114:3–55, 1999.

[12] G. Canepa, J. Hollerbach, and A. Boelen. Kinematic Calibration by Means

of a Triaxial Accelerometer. IEEE International Conference on Robotics and

Automation, 1994.

[13] RF Chandler, CE Clauser, JT McConville, HM Reynolds, and JW Young. In-

vestigation of Inertial Properties of the Human Body. NTIS, National Technical

Information Service, 1975.

[14] C. M. Christensen. The Innovator’s Dilemma: When New Technologies Cause

Great Firms to Fail. Harvard Business Press, 1997.

[15] J. Craig. Introduction to Robotics: Mechanics and Control, 3rd ed. Pearson

Prentice Hall, 2005.

[16] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual Cate-

gorization with Bags of Keypoints. In Workshop on Statistical Learning in

Computer Vision, ECCV, pages 1–22, 2004.

[17] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR),

2005.


[18] F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo Localization

for Mobile Robots. In Proc. of the International Conference on Robotics and

Automation (ICRA), 1999.

[19] R. Diankov and J. Kuffner. The Robotic Busboy: Steps Towards Developing a

Mobile Robotic Home Assistant. In Intelligent Autonomous Systems, volume 10,

2008.

[20] E. Dombre, G. Duchemin, P. Poignet, and F. Pierrot. Dermarob: a Safe Robot

for Reconstructive Surgery. IEEE Transactions on Robotics and Automation,

19(5):876–884, 2003.

[21] F. Duvallet and A. Tews. WiFi Position Estimation in Industrial Environments

Using Gaussian Processes. In Proc. of the IEEE/RSJ International Conference

on Intelligent Robots and Systems (IROS), 2008.

[22] A. Edsinger-Gonzales and J. Weber. Domo: A Force Sensing Humanoid Robot

for Manipulation Research. In 2004 4th IEEE/RAS International Conference

on Humanoid Robots, pages 273–291, 2004.

[23] A. Eliazar and R. Parr. Hierarchical Linear/Constant Time SLAM Using Parti-

cle Filters for Dense Maps. In Neural Information Processing Systems (NIPS),

2005.

[24] D. Falie and V. Buzuloiu. Noise Characteristics of 3D Time-of-Flight Cameras.

In International Symposium on Signals, Circuits, and Systems, 2007.

[25] B. Ferris, D. Haehnel, and D. Fox. Gaussian Processes for Signal Strength-

Based Location Estimation. In Proc. of Robotics: Science and Systems (RSS),

2006.

[26] D. Fontaine, D. David, and Y. Caritu. Sourceless Human Body Motion Capture.

Smart Objects Conference, 2003.

[27] E. Foxlin and L. Naimark. VIS-Tracker: A Wearable Vision-Inertial Self-

Tracker. IEEE Virtual Reality Conference, 2003.


[28] J. Friedman, T. Hastie, and R. Tibshirani. Additive Logistic Regression: a

Statistical View of Boosting. In Technical report, Dept. of Statistics, Stanford

University, 1998.

[29] B. Gerkey, R. Vaughan, and A. Howard. The Player/Stage Project: Tools for

Multi-Robot and Distributed Sensor Systems. In International Conference on

Advanced Robotics (ICRA), 2003.

[30] F. Ghassemi, S. Tafazoli, P.D. Lawrence, and K. Hashtrudi-Zaad. An

Accelerometer-Based Joint Angle Sensor for Heavy-Duty Manipulators. IEEE

International Conference on Robotics and Automation, 2002.

[31] K. Goldberg. What is Automation? IEEE Transactions on Automation Science

and Engineering, 9(1):1–2, 2012.

[32] S. Gould, J. Arfvidsson, A. Kaehler, B. Sapp, M. Meissner, G. Bradski,

P. Baumstarck, S. Chung, and A. Y. Ng. Peripheral-Foveal Vision for Real-

time Object Recognition and Tracking in Video. In Twentieth International

Joint Conference on Artificial Intelligence (IJCAI-07), 2007.

[33] S. Gould, P. Baumstarck, M. Quigley, A. Y. Ng, and D. Koller. Integrating Vi-

sual and Range Data for Robotic Object Detection. In European Conference on

Computer Vision (ECCV) workshop on Multi-camera and Multi-modal Sensor

Fusion Algorithms and Applications (M2SFA2), 2008.

[34] G. Grisetti, C. Stachniss, and W. Burgard. Improved Techniques for Grid Map-

ping with Rao-Blackwellized Particle Filters. IEEE Transactions on Robotics,

2006.

[35] G. Guennebaud, B. Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.

[36] A. Haeberlen, E. Flannery, A. Ladd, A. Rudys, D. Wallach, and L. Kavraki.

Practical robust localization over large-scale 802.11 wireless networks. In Proc.

of the International Conference on Mobile Computing and Networking, 2004.


[37] J. Hertzberg and F. Kirchner. Landmark-based Autonomous Navigation in

Sewerage Pipes. In Proc. of the First Euromicro Workshop on Advanced Mobile

Robotics, 1996.

[38] G. Hirzinger, N. Sporer, A. Albu-Schaffer, M. Hahnle, R. Krenn, A. Pascucci,

and M. Schedl. DLR’s Torque-Controlled Light Weight Robot III: Are We Reaching the Technological Limits Now? In Proceedings of the IEEE International

Conference on Robotics and Automation, volume 2, pages 1710–1716, 2002.

[39] H. Iwata, S. Kobashi, T. Aono, and S. Sugano. Design of Anthropomorphic

4-DOF Tactile Interaction Manipulator with Passive Joints. Intelligent Robots

and Systems (IROS 2005), pages 1785–1790, Aug. 2005.

[40] J. Jackson. Microsoft Robotics Studio: A Technical Introduction. In IEEE

Robotics and Automation Magazine, Dec. 2007. http://msdn.microsoft.com/en-us/robotics.

[41] S. Jacobsen, E. Iversen, D. Knutti, R. Johnson, and R. Biggers. Design of the

Utah/MIT Dextrous Hand. In ICRA, 1986.

[42] R. Jazar. Theory of Applied Robotics, 2nd Ed. Springer, 2010. The author of

this thesis is indebted to Sonny Chan for pointing to this solution.

[43] P. Jensfelt, D. Austin, O. Wijk, and M. Anderson. Feature-Based Condensation

for Mobile Robot Localization. In Proc. of the International Conference on

Robotics and Automation (ICRA), pages 2531–2537, 2000.

[44] J. Jezouin, P. Saint-Marc, and G. Medioni. Building an Accurate Range Finder

with Off-the-Shelf Components. In Proceedings of CVPR, 1988.

[45] J. Kramer and M. Scheutz. Development Environments for Autonomous Mobile

Robots: A Survey. Autonomous Robots, 22(2):101–132, 2007.


[46] B. Krose, N. Vlassis, and R. Bunschoten. Omnidirectional Vision for

Appearance-Based Robot Localization. In Revised Papers from the Interna-

tional Workshop on Sensor Based Intelligent Robots, pages 39–50. Springer-

Verlag, 2002.

[47] F. Krsmanovic, Curtis Spencer, Daniel Jurafsky, and A. Y. Ng. Have we met?

MDP based speaker ID for robust dialog. In Ninth International Conference on

Spoken Language Processing (InterSpeech-ICSLP), 2006.

[48] KUKA. youBot Arm, 2010.

[49] A. Kumpf. Explorations in Low-Cost Compliant Robotics. Master’s thesis,

Massachusetts Institute of Technology, 2007.

[50] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyra-

mid Matching for Recognizing Natural Scene Categories. In IEEE Conference

on Computer Vision and Pattern Recognition (CVPR), 2006.

[51] S. Lenser and M. Veloso. Sensor Resetting Localization for Poorly Modelled

Mobile Robots. In Proc. of the International Conference on Robotics and Au-

tomation (ICRA), 2000.

[52] J. J. Leonard and H. F. Durrant-Whyte. Mobile Robot Localization by Tracking

Geometric Beacons. IEEE Transactions on Robotics and Automation, 7:376–

382, 1991.

[53] J. Letchner, D. Fox, and A. LaMarca. Large-Scale Localization from Wireless

Signal Strength. In Proc. of the National Conference on Artificial Intelligence

(AAAI), 2005.

[54] M. Levoy, K. Pulli, B. Curless, S. Rusinkiewicz, D. Koller, L. Pereira, M. Ginz-

ton, S. Anderson, J. Davis, J. Ginsberg, J. Shade, and D. Fulk. The Digital

Michelangelo Project: 3D Scanning of Large Statues. In SIGGRAPH, 2000.


[55] J. Li, J. Zhu, Y. Guo, X. Lin, K. Duan, Y. Wang, and Q. Tang. Calibration

of a Portable Laser 3-D Scanner Used by a Robot and Its Use in Measurement.

Optical Engineering, 47(1), 2008.

[56] Y. F. Li and X. B. Chen. End-Point Sensing and State Observation of a Flexible-

Link Robot. IEEE/ASME Transactions on Mechatronics, 6(3), 2001.

[57] H. Lim, L. Kung, J. Hou, and H. Luo. Zero-configuration, Robust Indoor

Localization: Theory and Experimentation. In Proc. of IEEE INFOCOM, 2006.

[58] H. Liu and G. Pang. Accelerometers for Mobile Robot Positioning. IEEE

Transactions on Industry Applications, 37(3), 2001.

[59] A. Makarenko, A. Brooks, and T. Kaupp. On the Benefits of Making Robotic

Software Frameworks Thin. In IROS, November 2007.

[60] L. Matthies, T. Balch, and B. Wilcox. Fast Optical Hazard Detection for Plan-

etary Rovers using Multiple Spot Laser Triangulation. In ICRA, 1997.

[61] S. May, B. Werner, H. Surmann, and K. Pervolz. 3D Time-of-Flight Cameras

for Mobile Robotics. In IROS, 2006.

[62] C. Mertz, J. Kozar, J. R. Miller, and C. Thorpe. Eye-safe Laser Line Striper

for Outside Use. In IEEE Intelligent Vehicle Symposium, 2001.

[63] O. Michel. Webots: a Powerful Realistic Mobile Robots Simulator. In Proc. of

the Second Intl. Workshop on RoboCup. Springer-Verlag, 1998.

[64] N. Miller, O. C. Jenkins, M. Kallmann, and M. J. Mataric. Motion Cap-

ture from Inertial Sensing for Untethered Humanoid Teleoperation. In 2004

4th IEEE/RAS International Conference on Humanoid Robots, pages 547–565,

2004.

[65] N. Miller, O.C. Jenkins, M. Kallman, and M. Mataric. Motion Capture from

Inertial Sensing for Untethered Humanoid Teleoperation. International Journal

of Humanoid Robotics, 2008.


[66] M. Montemerlo, N. Roy, and S. Thrun. Perspectives on Standardization in

Mobile Robot Programming: The Carnegie Mellon Navigation (CARMEN)

Toolkit. In IEEE/RSJ International Conference on Intelligent Robots and Sys-

tems (IROS), 2003.

[67] K. Parsa, J. Angeles, and A. Misra. Pose-and-Twist Estimation of a Rigid

Body Using Accelerometers. IEEE International Conference on Robotics and

Automation, 2001.

[68] A. Petrovskaya and A. Y. Ng. Probabilistic Mobile Manipulation in Dynamic

Environments, with Application to Opening Doors. In International Joint Con-

ference on Artificial Intelligence (IJCAI), 2007.

[69] F. Pierrot, E. Dombre, E. Degoulange, L. Urbain, P. Caron, S. Boudet,

J. Gariepy, and J. Megnien. Hippocrate: a Safe Robot Arm for Medical Appli-

cations with Force Feedback. Medical Image Analysis, 3(3):285–300, 1999.

[70] G.A. Pratt and M.M. Williamson. Series Elastic Actuators. In Proceedings

of the IEEE/RSJ International Conference on Intelligent Robots and Systems

(IROS-95), volume 1, pages 399–406, 1995.

[71] J. Pratt, B. Krupp, and C. Morse. Series Elastic Actuators for High Fidelity

Force Control. Industrial Robot: An International Journal, 29(3):234–241, 2002.

[72] M. Quigley, A. Asbeck, and A. Y. Ng. A Low-cost Compliant 7-DOF Robotic

Manipulator. In IEEE International Conference on Robotics and Automation

(ICRA), 2011.

[73] M. Quigley, S. Batra, S. Gould, E. Klingbeil, Q. Le, A. Wellman, and A. Y.

Ng. High-Accuracy 3D Sensing for Mobile Manipulation: Improving Object

Detection and Door Opening. In International Conference on Robotics and

Automation (ICRA), 2009.

[74] M. Quigley, E. Berger, and A. Y. Ng. STAIR: Hardware and Software Archi-

tecture. In AAAI Robotics Workshop, 2007.


[75] M. Quigley, R. Brewer, S. P. Soundararaj, V. Pradeep, Q. Le, and A. Y. Ng. Low-

cost Accelerometers for Robotic Manipulator Perception. In IEEE Conference

on Intelligent Robots and Systems (IROS), 2010.

[76] M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote, J. Leibs, E. Berger,

R. Wheeler, and A. Y. Ng. ROS: an open-source Robot Operating System.

In Open-Source Software workshop of the International Conference on Robotics

and Automation (ICRA), 2009.

[77] M. Quigley, D. Stavens, and S. Thrun. Sub-meter Indoor Localization in Un-

modified Environments with Inexpensive Sensors. In International Conference

on Intelligent Robots and Systems (IROS), 2010.

[78] B. Rooks. The Harmonious Robot. Industrial Robot: An International Journal,

33(2):125–130, 2006.

[79] J. K. Salisbury and J. J. Craig. Articulated Hands: Force Control and Kine-

matic Issues. The International Journal of Robotics Research, 1(1), 1982.

[80] A. Saxena, J. Driemeyer, J. Kearns, and A. Y. Ng. Robotic Grasping of Novel

Objects. In Neural Information Processing Systems (NIPS), 2006.

[81] A. Saxena, J. Driemeyer, J. Kearns, C. Osondu, and A. Y. Ng. Learning to

Grasp Novel Objects Using Vision. In International Symposium on Experimen-

tal Robotics (ISER), 2006.

[82] K.-U. Scholl, J. Albiez, and B. Gassmann. MCA - An Expandable Modular

Controller Architecture. In 3rd Real-Time Linux Workshop, Milan, Italy, 2001.

[83] D. Schulz and D. Fox. Bayesian Color Estimation for Adaptive Vision-Based

Robot Localization. In Proc. of the IEEE/RSJ International Conference on

Intelligent Robots and Systems (IROS), 2004.

[84] Schunk. 7-DOF LWA Manipulator, 2010.


[85] S. Se, D. Lowe, and J. Little. Vision-Based Global Localization and Mapping

for Mobile Robots. IEEE Transactions on Robotics, 21(3):364–375, 2005.

[86] T. Serre, L. Wolf, and T. Poggio. Object Recognition with Features Inspired by

Visual Cortex. In IEEE Conference on Computer Vision and Pattern Recogni-

tion, 2005.

[87] Shadow Robot Company, Ltd. Shadow Hand, 2012.

[88] D. Shin, I. Sardellitti, and O. Khatib. A Hybrid Actuation Approach for

Human-Friendly Robot Design. In IEEE Int. Conf. on Robotics and Automation

(ICRA 2008), Pasadena, USA, pages 1741–1746, 2008.

[89] R. G. Simmons, J. Fernandez, R. Goodwin, S. Koenig, and J. O’Sullivan.

Lessons Learned from Xavier. IEEE Robotics and Automation Magazine, 7:33–

39, 2000.

[90] R. G. Simmons, S. Thrun, C. Athanassiou, J. Cheng, L. Chrisman, R. Good-

win, G. T. Hsu, and H. Wan. Odysseus: An Autonomous Mobile Robot. AI

Magazine, 1992.

[91] R. Slyper and J. Hodgins. Action Capture with Accelerometers. Eurograph-

ics/ACM SIGGRAPH Symposium on Computer Animation, 2008.

[92] J. Stuckler, M. Schreiber, and S. Behnke. Dynamaid, an Anthropomorphic

Robot for Research on Domestic Service Applications. In Proc. of the 4th

European Conference on Mobile Robots (ECMR), 2009.

[93] J. Sun, N. Zheng, and H. Shum. Stereo Matching Using Belief Propagation.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(7), 2003.

[94] Barrett Technology. http://www.barrett.com.

[95] S. Thrun. Bayesian landmark learning for mobile robot localization. Machine

Learning, 33, 1998.

[96] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2005.


[97] E. Tira-Thompson. Tekkotsu: A Rapid Development Framework for Robotics.

Master’s Thesis, Carnegie Mellon University, 2004.

[98] A. Torralba, K. Murphy, and W. Freeman. Sharing Visual Features for Multi-

class and Multiview Object Detection. In Neural Information Processing Sys-

tems (NIPS), 2007.

[99] E. Torres-Jara. Obrero: A Platform for Sensitive Manipulation. In 2005

5th IEEE-RAS International Conference on Humanoid Robots, pages 327–332,

2005.

[100] N. G. Tsagarakis, M. Laffranchi, B. Vanderborght, and D. G. Caldwell. A Com-

pact Soft Actuator Unit for Small Scale Human Friendly Robots. In IEEE In-

ternational Conference on Robotics and Automation Conference (ICRA), pages

4356–4362, 2009.

[101] I. Ulrich and J. Borenstein. VFH+: Reliable Obstacle Avoidance for Fast

Mobile Robots. In IEEE International Conference on Robotics and Automation

(ICRA), May 1998.

[102] Carnegie Mellon University. CMU Sphinx Open Source Toolkit for Speech

Recognition, 2012.

[103] D. Vail and M. Veloso. Learning from Accelerometer Data on a Legged Robot.

IFAC/EURON Symposium on Intelligent Autonomous Vehicles, 2004.

[104] R. Vaughan and B. Gerkey. Reusable Robot Code and the Player/Stage Project.

In Davide Brugali, editor, Software Engineering for Experimental Robotics,

Springer Tracts on Advanced Robotics, pages 267–289. Springer, 2007.

[105] N. Vlassis, B. Terwijn, and B. Krose. Auxiliary Particle Filter Robot Localiza-

tion from High-Dimensional Sensor Observations. In Proc. of the International

Conference on Robotics and Automation (ICRA), 2002.


[106] J. Wolf, W. Burgard, and H. Burkhardt. Robust Vision-Based Localization by

Combining an Image Retrieval System with Monte Carlo Localization. IEEE

Transactions on Robotics and Automation, 2005.

[107] K.A. Wyrobek, E.H. Berger, H. Van der Loos, and J.K. Salisbury. Towards a Per-

sonal Robotics Development Platform: Rationale and Design of an Intrinsically

Safe Personal Robot. In Proc. IEEE Int. Conf. on Robotics and Automation,

pages 2165–2170, 2008.

[108] N. Yazawa, H. Uchiyama, H. Saito, M. Servieres, and G. Moreau. Image-Based

View Localization System Retrieving from a Panorama Database by SURF. In

Proc. of the IAPR Conference on Machine Vision Applications, 2009.

[109] S. Zhang and P. Huang. High-Resolution, Real-Time Three-Dimensional Shape

Measurement. Optical Engineering, 45(12), 2006.

[110] Z. Zhang. A Flexible New Technique for Camera Calibration. IEEE Transac-

tions on Pattern Analysis and Machine Intelligence, 22, 2000.

[111] C. Zhou, Y. Wei, and T. Tan. Mobile Robot Self-Localization Based on Global

Visual Appearance Features. In IEEE International Conference on Robotics

and Automation, 2003.

[112] J. Zhu, L. Wang, R. Yang, and J. Davis. Fusion of Time-of-Flight Depth and

Stereo for High Accuracy Depth Maps. In Proceedings of CVPR, 2008.

[113] M. Zinn, O. Khatib, B. Roth, and J. K. Salisbury. Playing it safe: A New Actu-

ation Concept for Human-Friendly Robot Design. IEEE Robotics & Automation

Magazine, 11(2):12–21, 2004.

[114] M. Zinn, B. Roth, O. Khatib, and J. K. Salisbury. A New Actuation Approach

for Human Friendly Robot Design. The International Journal of Robotics Re-

search, 23(4-5):379, 2004.