
Estimation, planning, and mapping for autonomous flight using an RGB-D camera in GPS-denied environments

The International Journal of Robotics Research 31(11) 1320–1343. © The Author(s) 2012. Reprints and permission: sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/0278364912455256. ijr.sagepub.com

Abraham Bachrach1, Samuel Prentice1, Ruijie He1, Peter Henry2, Albert S Huang1, Michael Krainin2, Daniel Maturana3, Dieter Fox2 and Nicholas Roy1

Abstract
RGB-D cameras provide both color images and per-pixel depth estimates. The richness of this data and the recent development of low-cost sensors have combined to present an attractive opportunity for mobile robotics research. In this paper, we describe a system for visual odometry and mapping using an RGB-D camera, and its application to autonomous flight. By leveraging results from recent state-of-the-art algorithms and hardware, our system enables 3D flight in cluttered environments using only onboard sensor data. All computation and sensing required for local position control are performed onboard the vehicle, reducing the dependence on an unreliable wireless link to a ground station. However, even with accurate 3D sensing and position estimation, some parts of the environment have more perceptual structure than others, leading to state estimates that vary in accuracy across the environment. If the vehicle plans a path without regard to how well it can localize itself along that path, it runs the risk of becoming lost or worse. We show how the belief roadmap algorithm (Prentice and Roy 2009), a belief space extension of the probabilistic roadmap algorithm, can be used to plan vehicle trajectories that incorporate the sensing model of the RGB-D camera. We evaluate the effectiveness of our system for controlling a quadrotor micro air vehicle, demonstrate its use for constructing detailed 3D maps of an indoor environment, and discuss its limitations.

Keywords
Aerial robotics, sensing and perception, computer vision, motion planning

1. Introduction

Unmanned aerial vehicles (UAVs) rely on accurate knowledge of their position for decision-making and control. As a result, considerable investment has been made towards improving the availability of the global positioning infrastructure, including utilizing satellite-based GPS systems and developing algorithms to use existing RF signals such as WiFi. However, most indoor environments and many parts of the urban canyon remain without access to external positioning systems, limiting the ability of current autonomous UAVs to fly through these areas.

Localization using sonar ranging (Leonard and Durrant-Whyte 1991), laser ranging (Thrun et al. 2000) or camera sensing (Se et al. 2002) has been used extremely successfully on a number of ground robots and is now essentially a commodity technology. Previously, we have developed algorithms for MAV flight in cluttered environments using laser rangefinders (Bachrach et al. 2009a) and stereo cameras (Achtelik et al. 2009). Laser rangefinders that are currently available in form factors appropriate for use on a MAV are very high precision, but only provide range measurements along a plane around the sensor. Since these sensors can only detect objects that intersect the sensing plane, they are most useful in environments characterized by vertical structures, and less so in more complex scenes.

Structured light RGB-D cameras are based upon stereo techniques, and thus share many properties with stereo cameras. The primary differences lie in the range and spatial density of depth data. Since RGB-D cameras illuminate a scene with a structured light pattern, they can estimate depth in areas with poor visual texture, but are range-limited by their projectors.

1 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA
2 Department of Computer Science and Engineering, University of Washington, USA
3 The Robotics Institute, Carnegie Mellon University, USA

Corresponding author:
Nicholas Roy, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Email: [email protected]


Fig. 1. Our quadrotor micro air vehicle (MAV). The RGB-D camera is mounted at the base of the vehicle, tilted slightly down.

This paper presents our approach to providing an autonomous micro air vehicle with fast and reliable state estimates and a 3D map of its environment by using an onboard RGB-D camera and inertial measurement unit (IMU). Together, these allow the MAV to safely operate in cluttered, GPS-denied indoor environments. The control of a micro air vehicle requires accurate estimation of not only the position of the vehicle but also the velocity – estimates that our algorithms are able to provide. Estimating a vehicle's 3D motion from sensor data typically consists of estimating its relative motion at each time step by aligning successive sensor measurements such as laser scans or RGB-D frames, a process most often known as "visual odometry" when comparing camera or RGB-D images.

Given knowledge of the relative motion of the vehicle from sensor frame to sensor frame, the 3D trajectory of the vehicle in the environment can be estimated by integrating the relative motion estimates over time. Given knowledge of the vehicle position in the environment, the locations of obstacles in each sensor frame can also be used to construct a global map. While often useful for local position control and stability, visual odometry methods suffer from long-term drift and are not suitable for building large-scale maps. To solve this problem, we also demonstrate how our previous work on RGB-D mapping (Henry et al. 2010) can be incorporated to detect loop closures, correct for accumulated drift and maintain a representation of consistent pose estimates over the history of previous frames.

However, the fast dynamics of UAVs impose stringent requirements on state estimation accuracy. The RGB-D sensor has a limited range and field of view, which strongly affects its reliability for state estimation on UAVs. When position information is temporarily not available from the onboard sensors, the state estimate will diverge from the true state much faster than on a ground vehicle, giving the UAV greater sensitivity to sensor limitations as it moves through the environment. Our approach to addressing this sensitivity is based on the belief roadmap (BRM) algorithm (Prentice and Roy 2007; He et al. 2008a; Prentice and Roy 2009). The BRM is a generalization of the probabilistic roadmap (PRM) algorithm (Kavraki et al. 1996), performing searches in the belief space of the vehicle efficiently by using the symplectic form of the Kalman filter (Abou-Kandil 2003) to find the minimum expected cost path for the vehicle.

In this paper, we provide two primary contributions. Firstly, we provide a systematic experimental analysis of how the best practices in visual odometry using an RGB-D camera enable the control of a micro air vehicle; this experimental analysis integrates a range of previous results in visual odometry. We show that visual odometry can be used with the RGB-D camera to perform local map construction, and we adopt previous results in RGB-D simultaneous localization and mapping (SLAM) for global loop closure (Henry et al. 2010). Secondly, we give an extension of our previous work in BRM planning (Prentice and Roy 2009) for a quadrotor helicopter (Figure 1), providing a novel sampling strategy that adapts to the information available in the environment, a detailed comparison to other sampling strategies, and an experimental comparison using the RGB-D camera. We describe our overall system, justify the design decisions made, provide a ground-truth evaluation, and discuss its capabilities and limitations. We conclude the paper with a demonstration of the quadrotor helicopter using the BRM algorithm to navigate autonomously in indoor environments.

This paper extends preliminary results given by Huang et al. (2011) and by He et al. (2008a), demonstrating the RGB-D mapping algorithm and the BRM algorithm. We give additional algorithmic details regarding estimation and mapping, provide the extension of the BRM to other sensor modalities such as the RGB-D camera, and give a more thorough experimental analysis in real-world environments.

2. Vehicle position estimation

The problem we address is that of quadrotor helicopter navigation. The quadrotor must use the onboard RGB-D sensor to estimate its own position (local estimation), build a dense 3D model of the environment (global simultaneous localization and mapping) and use this model to plan trajectories through the environment.

Our algorithms are implemented on the vehicle shown in Figure 1. The vehicle is a Pelican quadrotor manufactured by Ascending Technologies GmbH. The vehicle has a maximal dimension of 70 cm and a payload of up to 1000 g. We mounted a stripped-down Microsoft Kinect sensor and connected it to the onboard flight computer. The flight computer, developed by the Pixhawk project at ETH Zurich (Meier et al. 2011), is a 1.86 GHz Core2Duo processor with 4 GB of RAM. The computer is powerful enough to allow all of the real-time estimation and control algorithms to run onboard the vehicle.

Following our previous work, we developed a system that decouples the real-time local state estimation from the global SLAM. Figure 2 depicts the organization of the processes and control flow of the system, adapted from Bachrach et al. (2011).


Fig. 2. Block diagram of the overall system. The local state estimates are computed from visual odometry (Section 2.2). To correct for drift of these local estimates, the estimator periodically incorporates position corrections provided by the SLAM algorithm (Section 2.3).

The local state estimates are computed from visual odometry (Section 2.2), and to correct for drift in these local estimates the estimator periodically incorporates position corrections provided by the SLAM algorithm (Section 2.3). This architecture allows the SLAM algorithm to use much more processing time than would be possible if the state estimates from the SLAM algorithm were directly used to control the vehicle. Although our current system architecture runs the SLAM and planner processes offboard the vehicle, the feasibility of full onboard operation without a link to a ground station has been shown previously by Shen et al. (2011). While their previous work used a laser sensor rather than an RGB-D camera, our SLAM process is not significantly more computationally costly. Additionally, we will show in this paper that each query of our planner is not significantly more costly than a conventional shortest-path planner. The major computational difference is the graph construction, which can be amortized over all future queries.

If the UAV does not have access to perfect state knowledge, such as from external sources (e.g. motion capture, GPS, etc.), it can localize itself by first using sensors to measure environmental features and then by registering the measurements against a pre-existing map. To control the quadrotor, we integrated the new visual odometry and RGB-D mapping algorithms into our system previously developed around 2D laser scan-matching and SLAM (Bachrach et al. 2009a, 2011).

We keep the SLAM process separate from the real-time control loop, instead having it provide corrections for the real-time position estimates. Since these position corrections are delayed significantly from when the measurement upon which they were based was taken, we must account for this delay when we incorporate the correction by retroactively modifying the appropriate position estimate in the state history. All future state estimates are then recomputed from this corrected position, resulting in globally consistent real-time state estimates. By incorporating the SLAM corrections after the fact, we allow the real-time state estimates to be processed with low enough delay to control the MAV, while still incorporating the information from SLAM to ensure drift-free position estimation.
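To make the mechanism concrete, the following is a minimal sketch of how a delayed SLAM correction can be folded back into the state history. It assumes the filter keeps a short history of timestamped poses together with the relative motion (odometry increment) that produced each one; the data layout and the compose() operator are illustrative assumptions rather than the implementation used on the vehicle.

```python
# Sketch of delayed-correction handling, under the assumptions stated above.
# history: list of {'t': time, 'pose': pose, 'delta': relative motion since the
# previous entry}, sorted by time. compose(pose, delta) applies a relative
# motion to a pose (e.g. an SE(3) product).
from bisect import bisect_left

def apply_delayed_correction(history, t_corr, corrected_pose, compose):
    times = [h['t'] for h in history]
    # Find the stored state that the (delayed) SLAM correction refers to.
    i = min(bisect_left(times, t_corr), len(history) - 1)
    history[i]['pose'] = corrected_pose
    # Replay the stored odometry increments to recompute all later poses.
    for j in range(i + 1, len(history)):
        history[j]['pose'] = compose(history[j - 1]['pose'], history[j]['delta'])
    # The newest entry is now the globally consistent real-time estimate.
    return history[-1]['pose']
```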

2.1. State estimation

Throughout this work, the vehicle navigation and control is assumed to depend on an estimate of the position and velocity of the vehicle given by the state vector x. The instantaneous state of the vehicle xt is not directly observable but must be inferred from noisy measurements z1:t and is a function of the control actions u1:t. In order to mitigate the effects of noisy measurements, we use Bayesian filtering to infer a probability distribution p(xt | u1:t, z1:t) over the state, expressed as

$$p(x_t \mid u_{1:t}, z_{1:t}) = \frac{1}{Z}\, p(z_t \mid x_t) \int_{S} p(x_t \mid u_t, x_{t-1})\, p(x_{t-1})\, \mathrm{d}x_{t-1} \qquad (1)$$

where Z is a normalizer. Equation (1), known as the Bayes' filter, provides an efficient, recursive way to update the state distribution after each action ut and measurement zt. We use the Kalman family of Bayes' filters to perform data fusion (cf. Figure 2), integrating measurements from multiple sensors, e.g. the onboard IMU, the visual odometry and the global SLAM process.

The visual odometry and mapping processes described in Sections 2.2 and 2.3 are of course state estimators in their own right, but we treat these processes as additional measurements. However, Bayesian filtering requires a sensor model characterizing the uncertainty of each measurement. Since our visual odometry and mapping algorithms do not provide an explicit model of the uncertainty of the posterior state estimate, we fit a constant noise model, acquired from data, to each estimator. This is clearly a limiting assumption, as the accuracy and uncertainty of both processes depend on the instantaneous measurements. We are currently examining the problem of learning adaptive noise models for these processes.

While the overall structure of the state estimation process does not change, we report results using different filtering techniques to instantiate equation (1), depending on the specific sensors. For the experimental analysis of the visual odometry using the RGB-D camera, we use the extended Kalman filter (EKF), described in Appendix A. For the laser-based validation of the extensions to the BRM algorithm, we use the unscented Kalman filter (UKF), described in Appendix B.
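As a concrete illustration of how equation (1) is instantiated, the sketch below shows an EKF-style predict/update cycle with a simplified linear constant-velocity process model and a position-only measurement treated with a constant noise model, in the spirit of the discussion above. The state layout, matrices, and function names are assumptions for illustration, not the vehicle's actual models.

```python
# Minimal EKF sketch of the Bayes filter in equation (1), assuming a state
# x = [position (3), velocity (3)] and a position-only measurement with a
# constant noise covariance Q (e.g. fitted to a visual odometry estimator).
import numpy as np

def ekf_predict(mu, Sigma, u, dt, R):
    """Process update: propagate mean and covariance through the motion model."""
    G = np.eye(6)
    G[:3, 3:] = dt * np.eye(3)                      # position integrates velocity
    Bu = np.vstack([0.5 * dt**2 * np.eye(3), dt * np.eye(3)])
    mu_bar = G @ mu + Bu @ u                        # u treated as an acceleration
    Sigma_bar = G @ Sigma @ G.T + R                 # R: process noise
    return mu_bar, Sigma_bar

def ekf_update(mu_bar, Sigma_bar, z, Q):
    """Measurement update: fuse a position measurement z with noise Q."""
    H = np.hstack([np.eye(3), np.zeros((3, 3))])
    S = H @ Sigma_bar @ H.T + Q
    K = Sigma_bar @ H.T @ np.linalg.inv(S)
    mu = mu_bar + K @ (z - H @ mu_bar)
    Sigma = (np.eye(6) - K @ H) @ Sigma_bar
    return mu, Sigma
```

In a setup like this, the visual odometry increments and the SLAM corrections would each enter through an update step of this form with their own fitted constant noise covariance, as described above.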

2.2. Visual odometry

The goal of visual odometry is to estimate the relative position change of the camera between pairs of successive images. Each of these estimated position changes can then be incorporated into the overall state estimate using a specific formulation of equation (1), as described in Section 2.1. The visual odometry algorithm that we have developed is based around a standard stereo visual odometry pipeline, with components adapted from existing algorithms. While most visual odometry algorithms follow a common architecture, a large number of variations and specific approaches exist, each with its own attributes.


Fig. 3. The input RGB-D data to the visual odometry algorithm alongside the detected feature matches. The top row images are from time t, the bottom row images are from time t + 1. The left column is the depth image, and the middle column is the corresponding RGB image. The right-hand column shows the pixels that are matched between frames, where inlier feature matches are drawn in blue and outliers are drawn in red.

A contribution of this paper is to specify the steps of our visual odometry algorithm, compare the alternatives for each step, and provide a reference implementation. (Our visual odometry implementation is open source and available for download at http://fovis.googlecode.com.) In this section we specify these steps and provide the experimental comparison of each step in the visual odometry pipeline. Our overall algorithm is most closely related to the approaches taken by Mei et al. (2009) and Howard (2008), and consists of the following sequence of operations:

1. Image preprocessing. An RGB-D image is first acquired from the RGB-D camera (Figure 3). The RGB component of the image is converted to grayscale and smoothed with a Gaussian kernel of σ = 0.85. A Gaussian pyramid is then constructed to enable more robust feature detection at different scales. Each level of the pyramid corresponds to one octave in scale space. Features at the higher scales generally correspond to larger image structures in the scene, which makes them more repeatable and robust to motion blur.

2. Feature extraction. Features are extracted at each level of the Gaussian pyramid using the FAST feature detector (Rosten and Drummond 2006). The threshold for the FAST detector is chosen adaptively using a simple proportional controller to ensure that a sufficient number of features are detected in each frame. The depth corresponding to each feature is also extracted from the depth image. Features that do not have an associated depth are discarded. To maintain a more uniform distribution of features, each pyramid level is discretized into 80 × 80 pixel buckets, and the 25 features in each bucket with the strongest FAST corner scores are retained.

3. Initial rotation estimation. For small motions such as those encountered in successive image frames, the majority of a feature's apparent motion in the image plane is caused by 3D rotation. Estimating this rotation allows us to constrain the search window when matching features between frames. We use the technique proposed by Mei et al. (2009) to compute an initial rotation by directly minimizing the sum of squared pixel errors between down-sampled versions of the current and previous frames.
One could also use an IMU or a dynamics model of the vehicle to compute this initial motion estimate; however, the increased generality of the image-based estimate is preferable, while providing sufficient performance. An alternative approach would be to use a coarse-to-fine motion estimation that iteratively estimates motion from each level of the Gaussian pyramid, as proposed by Johnson et al. (2008).

4. Feature matching. Each feature is assigned an 80-byte descriptor consisting of the brightness values of the 9 × 9 pixel patch around the feature, normalized to zero mean and omitting the bottom-right pixel. The omission of one pixel results in a descriptor length more suitable for vectorized instructions. Features are then matched across frames by comparing their feature descriptor values using a mutual-consistency check (Nistér et al. 2004). The match score between two features is the sum of absolute differences (SAD) of their feature descriptors (Howard 2008), which can be quickly computed using single-instruction multiple-data (SIMD) instructions such as Intel SSE2. A feature match is declared when two features have the lowest-scoring SAD with each other, and they lie within the search window defined by the initial rotation estimation.
Once an initial match is found, the feature location in the newest frame is refined to obtain a subpixel match. Refinement is computed by minimizing the sum of squared errors of the descriptors, using an efficient second-order minimization (ESM) technique (Benhimane and Malis 2004) to solve the iterative nonlinear least-squares problem. We also use SIMD instructions to speed up this process.

5. Inlier detection. Although the constraints imposed by the initial rotation estimation substantially reduce the rate of incorrect feature matches between frames, an additional step is necessary to further prune away bad matches. We follow Howard's approach of computing a graph of consistent feature matches, and then using a greedy algorithm to approximate the maximal clique in the graph (Hirschmuller et al. 2002; Howard 2008).
The graph is constructed according to the fact that rigid body motions are distance-preserving operations – the Euclidean distance between two features at one time should match their distance at another time. Thus, each pair of matched features across frames is a vertex in the graph, and an edge is formed between two such pairs of matched features if the 3D distance between the features does not change substantially from the prior frame to the subsequent frame. For a static scene, the set of inliers makes up the maximal clique of consistent matches. The max-clique search is approximated by starting with an empty set of matched feature pairs and iteratively adding the matched feature pair with the greatest degree that is consistent with all matched feature pairs already in the clique (Figure 3); a sketch of this greedy procedure appears after this list. Overall, this algorithm has a runtime quadratic in the number of matched feature pairs, but runs very quickly due to the speed of the consistency checking. In our experimental analysis, we compare this approach to RANSAC-based methods (Nistér et al. 2004; Konolige et al. 2007).

6. Motion estimation. The final motion estimate is computed from the matched features in three steps. First, Horn's absolute orientation method provides an initial estimate by minimizing the Euclidean distances between the inlier feature matches (Horn 1987). Second, the motion estimate is refined by minimizing the feature reprojection error using a nonlinear least-squares solver (Benhimane and Malis 2004). This refinement step implicitly accounts for the increase in depth uncertainty with range, due to the fact that the depth estimates are computed by stereo matching in image space. Finally, feature matches exceeding a fixed reprojection error threshold are discarded from the inlier set and the motion estimate is refined once again.
To reduce short-scale drift, we additionally use a keyframe technique. Motion is estimated by comparing the newest frame against a reference frame. If the camera motion relative to the reference frame is successfully computed with a sufficient number of inlier features, then the reference frame is not changed. Otherwise, the newest frame replaces the reference frame after the estimation is finished. If motion estimation against the reference frame fails, then the motion estimation is tried again with the second most recent frame. This simple heuristic serves to eliminate drift in situations where the camera viewpoint does not vary significantly, a technique especially useful when hovering.
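The greedy consistency-graph procedure of step 5 can be summarized in a few lines. The sketch below assumes each candidate match carries the feature's 3D position in the previous and current frames; the distance tolerance is an illustrative value, not the system's actual parameter.

```python
# Sketch of greedy max-clique inlier detection (step 5), under the assumptions
# stated above. matches: list of (p_prev, p_cur) 3D points as numpy arrays.
import numpy as np

def greedy_inliers(matches, dist_tol=0.05):
    n = len(matches)
    if n == 0:
        return []
    # Two matches are consistent if the inter-feature distance is preserved
    # between frames (rigid-body motion is distance preserving).
    consistent = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            d_prev = np.linalg.norm(matches[i][0] - matches[j][0])
            d_cur = np.linalg.norm(matches[i][1] - matches[j][1])
            consistent[i, j] = consistent[j, i] = abs(d_prev - d_cur) < dist_tol
    # Greedily grow a clique, visiting matches in order of decreasing degree.
    order = np.argsort(-consistent.sum(axis=1))
    clique = [int(order[0])]
    for k in order[1:]:
        if all(consistent[k, c] for c in clique):
            clique.append(int(k))
    return clique  # indices of the matches kept as inliers
```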

2.2.1. Visual odometry performance
There are a variety of visual odometry methods, and the existing literature is often unclear about the advantages and limitations of each. We present results comparing a number of these approaches and analyze their performance. As is true in many domains, the tradeoffs can often be characterized as increased accuracy at the expense of additional computational requirements. In some cases, the additional cost is greatly offset by the improved accuracy.

Fig. 4. Panorama photograph of the motion capture room used to conduct our ground-truth experiments. Visual feature density varies substantially throughout this room.

We conducted a number of experiments using a motion capture system that provides 120 Hz ground-truth measurements of the MAV's position and attitude. The motion capture environment can be characterized as a single room approximately 11 × 7 × 4 m in size, lit by overhead fluorescent lights and with a wide variation of visual clutter – one wall is blank and featureless, and the others have a varying number of objects and visual features (see Figure 4). While this is not a large volume, it is representative of many confined, indoor spaces, and the motion capture system provides the opportunity to directly compare against ground truth.

We recorded a dataset of the MAV flying various patterns through the environment, designed to challenge vision-based approaches to the point of failure; it includes motion blur and feature-poor images, as would commonly be encountered indoors and under moderate lighting conditions. Substantial movements in X, Y, Z, and yaw were all recorded, with small deviations in roll and pitch. We numerically differentiated the motion capture measurements to obtain the vehicle's ground-truth 3D velocities, and compared them to the velocities and trajectories estimated by the visual odometry and mapping algorithms. Table 1 shows the performance of our integrated approach, and its behavior when adjusting different aspects of the algorithm. Each experiment varied a single aspect of our approach. We present the mean velocity error magnitude, the overall computation time per RGB-D frame and the gross failure rate. We define a gross failure to be any instance where the visual odometry algorithm was either unable to produce a motion estimate (e.g. due to insufficient feature matches) or where the error in the estimated 3D velocities exceeded a fixed threshold of 1 m/s. Timing results were computed on a 2.67 GHz laptop computer.
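For reference, the error metrics in Table 1 can be computed along the following lines, assuming motion capture positions and visual odometry velocity estimates resampled onto common timestamps; the names and the handling of failed frames are illustrative assumptions.

```python
# Sketch of the evaluation metrics (mean velocity error and gross failure
# rate), under the assumptions stated above.
import numpy as np

def evaluate(mocap_pos, mocap_t, vo_vel, fail_thresh=1.0):
    """mocap_pos: (N, 3) positions; mocap_t: (N,) times; vo_vel: (N, 3)
    estimated velocities (NaN rows where no estimate was produced)."""
    gt_vel = np.gradient(mocap_pos, mocap_t, axis=0)   # numerical differentiation
    err = np.linalg.norm(vo_vel - gt_vel, axis=1)
    gross = np.isnan(err) | (err > fail_thresh)        # no estimate, or > 1 m/s error
    return err[~gross].mean(), 100.0 * gross.mean()    # mean error, % gross failures
```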

Visual odometry variations

Figure 5 shows the overall flow of the visual odometry process, including the different variants that we assessed for the following steps of the process:

Inlier detection. RANSAC-based methods (Nistér et al. 2004) are more commonly used than the greedy max-clique approach. We tested against two RANSAC schemes: traditional RANSAC and preemptive RANSAC (Nistér 2005).

Experimentally, we saw that for a fixed amount of computation time, RANSAC often performed worse than the greedy max-clique approach when a high fraction of the matches were outliers. In such cases, many iterations are required to find the correct hypothesis with high probability. Each of these iterations involves generating a motion hypothesis from three randomly selected matches and testing the consistency of the remaining matched features against this hypothesis, evaluated in terms of reprojection error. The evaluation becomes especially onerous with a large number of matches.


Fig. 5. Overall process flow of visual odometry. The different alternatives for each step of the process are shown in each box, and the best performing algorithm is highlighted in yellow.

Preemptive RANSAC attempts to speed up RANSAC by avoiding excessive scoring of wrong motion hypotheses, but still must bear the cost of evaluating hypotheses in terms of reprojection error. In our experiments, when allocated a comparable amount of computation time (by using 500 hypotheses), greedy max-clique outperformed both by virtue of its very fast consistency checking.

Initial rotation estimation. A good initial rotation estimate can help constrain the feature matching process and reduce the number of incorrect feature matches. Disabling the rotation estimate results in a slightly faster runtime, but more frequent estimation failures.

Gaussian pyramid levels. Detecting and matching features on different levels of a Gaussian pyramid provides resilience against motion blur and helps track larger features.

Reprojection error. We compared unidirectional motion refinement, which minimizes the reprojection error of newly detected features onto the reference frame, against bidirectional refinement, which additionally minimizes the reprojection error of reference features projected onto the new frame. We also compared a standard Gauss–Newton optimization technique with ESM (Benhimane and Malis 2004). Bidirectional refinement does provide slightly more accuracy without substantially greater cost, and we found no significant difference between Gauss–Newton and ESM.

Feature window size. As expected, larger feature windows result in more successful motion estimation at the cost of additional computation time. Interestingly, a very small window size of 3 × 3 yielded reasonable performance, a behavior we attribute to the constraints provided by the initial rotation estimate.

Subpixel refinement, adaptive thresholding, and feature bucketing. We found that the accuracy improvements afforded by subpixel feature refinement outweighed its additional computational cost. While the lighting in the motion capture experiments did not substantially change, adaptive thresholding still yielded a lower failure rate. We would expect the accuracy difference to be greater when flying through more varied lighting conditions. Finally, without feature bucketing, the feature detector often detected clusters of closely spaced features, which in turn confused the matching process and resulted in both slower speeds and decreased accuracy.
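As an illustration of the bucketing referred to above (and in step 2 of the pipeline), the sketch below keeps only the strongest FAST corners in each 80 × 80 pixel bucket; the keypoint tuple layout is an assumption for illustration.

```python
# Sketch of feature bucketing, under the assumptions stated above.
# keypoints: iterable of (u, v, score) FAST corners for one pyramid level.
from collections import defaultdict

def bucket_features(keypoints, bucket_size=80, per_bucket=25):
    buckets = defaultdict(list)
    for u, v, score in keypoints:
        buckets[(int(u) // bucket_size, int(v) // bucket_size)].append((u, v, score))
    kept = []
    for feats in buckets.values():
        feats.sort(key=lambda f: f[2], reverse=True)   # strongest FAST scores first
        kept.extend(feats[:per_bucket])
    return kept
```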

Real-world environments

The failure rate reported in Table 1 leads to the conclusion that our algorithm is unlikely to have been capable of autonomously flying the MAV during the entire recorded trajectory through this environment with artificially sparse perceptual features. We also compared the performance of our algorithm in a more realistic environment with richer visual features. Example images of the sparse environment and the real-world environment are shown in Figures 6(a) and 6(b), respectively.

In Table 2, we observe that the performance of the algorithm in the real-world environment was significantly better than in the sparse environment. In the sparse environment, many of the gross failures are due to the blank wall on one side of the room – no state estimation process based on visual features can overcome this problem.


Table 1. Comparison of various approaches on a challenging dataset. The error was measured using a high-resolution motion capture system for ground truth.

                                     Velocity error (m/s)   % gross failures   Total time (ms)
Our approach                         0.387 ± 0.004          3.39               14.7
Inlier detection
  RANSAC                             0.412 ± 0.005          6.05               15.3
  Preemptive RANSAC                  0.414 ± 0.005          5.91               14.9
  Greedy max-clique (our approach)   0.387 ± 0.004          3.39               14.7
Initial rotation estimate
  None                               0.388 ± 0.004          4.22               13.6
Gaussian pyramid levels
  1                                  0.387 ± 0.004          5.17               17.0
  2                                  0.385 ± 0.004          3.52               15.1
  3 (our approach)                   0.387 ± 0.004          3.39               14.7
  4                                  0.387 ± 0.004          3.50               14.5
Reprojection error minimization
  Bidir. Gauss–Newton                0.387 ± 0.004          3.24               14.7
  Bidir. ESM (our approach)          0.387 ± 0.004          3.39               14.7
  Unidir. Gauss–Newton               0.391 ± 0.004          3.45               14.6
  Unidir. ESM                        0.391 ± 0.004          3.47               14.6
  Absolute orientation only          0.467 ± 0.005          10.97              14.4
Feature window size
  3                                  0.391 ± 0.004          5.96               12.8
  5                                  0.388 ± 0.004          4.24               13.7
  7                                  0.388 ± 0.004          3.72               14.2
  9 (our approach)                   0.387 ± 0.004          3.39               14.7
  11                                 0.388 ± 0.004          3.42               15.7
Subpixel feature refinement
  No refinement                      0.404 ± 0.004          5.13               13.1
Adaptive FAST threshold
  Fixed threshold (10)               0.385 ± 0.004          3.12               15.3
Feature grid/bucketing
  No grid                            0.398 ± 0.004          4.02               24.6

Fig. 6. Comparison of sample images from a sparse, challenging dataset (a) and a more realistic dataset (b).

To specifically address this problem and to ensure the safety of the vehicle, we will turn to the planning algorithms presented in Section 3.

2.3. Mapping

Visual odometry provides locally accurate pose estimates; however, global consistency is needed for metric map generation and navigation over long time-scales.

Table 2. Comparison of performance in a sparse, challenging dataset and a more realistic dataset. We use the same modules and parameters as described above in both environments.

                          Velocity error (m/s)   % gross failures
Sparse environment        0.387 ± 0.004          3.39
Realistic environment     0.08 ± 0.004           0

We therefore integrate our visual odometry system with our previous work in RGB-D mapping (Henry et al. 2010) to provide corrections to the estimated position changes of the camera between loop closures. Just as with the visual odometry, each of these estimated position changes can then be incorporated into the overall state estimate using a specific formulation of equation (1), as described in Section 2.1. This section focuses on the key decisions required for real-time operation; we refer readers to our previous publication for details of the original algorithm that emphasizes mapping accuracy (Henry et al. 2010).

Unlike the local pose estimates needed for maintaining stable flight, map updates and global pose updates are not required at a high frequency and can therefore be processed on an offboard computer. The MAV transmits RGB-D data to an offboard laptop, which detects loop closures, computes global pose corrections, and constructs a 3D log-likelihood occupancy grid map. For coarse navigation, we found that a grid map with 10 cm resolution provided a useful balance between map size and precision. Each depth image from the camera is downsampled to 128 × 96 prior to a voxel map update to increase the update speed, resulting in spacing between depth pixels of approximately 5 cm at a range of 6 m. Incorporating a single frame into the voxel map currently takes approximately 1.5 ms.
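A minimal sketch of such a voxel update is given below, assuming a sparse grid of occupancy evidence keyed by integer voxel indices and standard pinhole back-projection. The intrinsics handling, the particular evidence increment, the use of a log-odds representation, and the omission of free-space ray clearing are simplifying assumptions rather than the system's actual implementation.

```python
# Sketch of incorporating one downsampled depth image into a sparse voxel grid,
# under the assumptions stated above.
import numpy as np

VOXEL = 0.10     # 10 cm grid resolution
L_OCC = 0.85     # evidence increment per hit (assumed value)

def update_voxel_map(grid, depth, K, T_world_cam, stride=5):
    """grid: dict (i,j,k) -> float; depth: HxW metric depth image;
    K: 3x3 intrinsics; T_world_cam: 4x4 camera pose; stride downsamples pixels."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    for v in range(0, h, stride):
        for u in range(0, w, stride):
            z = depth[v, u]
            if not np.isfinite(z) or z <= 0:
                continue
            # Back-project the pixel and transform it into the world frame.
            p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
            p_world = T_world_cam @ p_cam
            key = tuple((p_world[:3] // VOXEL).astype(int))
            grid[key] = grid.get(key, 0.0) + L_OCC
    return grid
```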

As before, we adopt a keyframe approach to loop closure – new RGB-D frames are matched against a small set of keyframes to detect loop closures, using a fast image matching procedure (Henry et al. 2010). New keyframes are added when the accumulated motion since the previous keyframe exceeds either 10° in rotation or 25 cm in translation. When a new keyframe is constructed, a RANSAC procedure over FAST keypoints (Rosten and Drummond 2006) compares the new keyframe to keyframes occurring more than 4 s prior. As loop closure requires matching non-sequential frames, we obtain putative keypoint matches using Calonder randomized tree descriptors (Calonder et al. 2008). A new keypoint is considered a possible match to an earlier frame if the ratio of the L2 distance to the most similar descriptor in the earlier frame to the distance to the next most similar descriptor is less than 0.6. RANSAC inlier matches establish a relative pose between the frames, which is accepted if there are at least 10 inliers. Matches with a reprojection error below a fixed threshold are considered inliers. The final refined relative pose between keyframes is obtained by solving a two-frame sparse bundle adjustment (SBA) system, which minimizes overall reprojection error.


To keep the loop closure detection near constant time as the map grows, we limit the keyframes against which the new keyframe is checked. First, we only use keyframes whose pose differs from the new frame (according to the existing estimates) by at most 90° in rotation and 5 m in translation. We also use Nistér's vocabulary tree approach (Nistér and Stewenius 2006), which uses a quantized "bag of visual words" model to rapidly determine the 15 most likely loop closure candidates. Keyframes that pass these tests are matched against new frames, and matching is terminated after the first successful loop closure. On each successful loop closure, a new constraint is added to a pose graph, which is then optimized using TORO (Grisetti et al. 2007a). Pose graph optimization is typically fast, converging in roughly 30 ms. Corrected pose estimates are then transmitted back to the vehicle, along with any updated voxel maps.
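The keyframe bookkeeping and candidate gating described in this subsection can be sketched as follows. The thresholds mirror the values in the text, while the keyframe record and the vocabulary-tree ranking function are assumed interfaces rather than the system's actual API.

```python
# Sketch of keyframe creation and loop-closure candidate gating, under the
# assumptions stated above.
import numpy as np

TRANS_KF, ROT_KF = 0.25, np.radians(10.0)   # new-keyframe thresholds (25 cm, 10 deg)
TRANS_LC, ROT_LC = 5.0, np.radians(90.0)    # candidate gating thresholds (5 m, 90 deg)

def needs_new_keyframe(accum_trans, accum_rot):
    """Add a keyframe once the accumulated motion exceeds either threshold."""
    return accum_trans > TRANS_KF or abs(accum_rot) > ROT_KF

def loop_closure_candidates(new_kf, keyframes, vocab_rank, now,
                            min_age=4.0, top_k=15):
    """keyframes: records with fields t, position (3,), yaw.
    vocab_rank(query, candidates): candidates ordered by visual-word similarity."""
    def yaw_diff(a, b):
        return abs((a - b + np.pi) % (2.0 * np.pi) - np.pi)
    # Only keyframes old enough and close enough under the current estimates.
    gated = [k for k in keyframes
             if now - k.t > min_age
             and np.linalg.norm(k.position - new_kf.position) < TRANS_LC
             and yaw_diff(k.yaw, new_kf.yaw) < ROT_LC]
    # Vocabulary-tree shortlist of the most likely loop-closure candidates.
    return vocab_rank(new_kf, gated)[:top_k]
```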

Greater global map consistency can be achieved using a sparse bundle adjustment technique that optimizes over all matched features across all frames (Konolige 2010). However, this is a much slower approach and not yet suitable for real-time operation.

3. Trajectory planning

The visual odometry and SLAM processes in the previous sections described how to estimate the position of the vehicle and the environment around it, but did not describe how the vehicle should move to explore that environment. We assume that the vehicle is holonomic and that we have full control authority, allowing us to treat the trajectory planning problem as a kinematic motion planning problem. Our UAV uses an onboard IMU and processor to auto-stabilize the helicopter's pitch and roll axes (Gurdan et al. 2007). As a result, we model the configuration space as C = R³ × S¹: three dimensions for the UAV's position, and one for the UAV's yaw angle, where C denotes the configuration space (Lozano-Perez 1983), the space of all vehicle poses. Cfree ⊂ C is the set of all collision-free poses (based on a known map M of obstacles and the dimensions of the UAV) and Cobst ⊂ C is the set of poses resulting in collision with obstacles, so that C ≡ Cfree ∪ Cobst. Exploring an unknown environment is often modelled as a problem of coverage, where the objective is to visit all reachable states or "frontiers" that lie on the boundary of the known free space (Yamauchi et al. 1998; Kollar and Roy 2008). Therefore, given the current vehicle state x0 ∈ Cfree and the partial map of the environment, the planning problem is to find a sequence of actions to move the vehicle from state x0 to a frontier state xg ∈ Cfree without collisions.

The probabilistic roadmap (PRM) is a well-known algorithm (Kavraki et al. 1996) for planning in problems of more than two or three dimensions, in which a discrete graph is used to approximate the connectivity of Cfree. The PRM builds the graph by sampling a set of states randomly from C (adding the start state x0 and goal state xg), and then evaluating each sampled state for membership in Cfree. Samples that lie within Cfree constitute the nodes of the PRM graph, and edges are placed between nodes where a straight-line path between them also lies entirely within Cfree. Given the PRM graph, a feasible, collision-free path can be found using a standard graph search algorithm from the start node to the goal node.
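For concreteness, a bare-bones PRM construction under these definitions might look like the sketch below; the pose sampling and collision-checking callbacks are assumed interfaces, and a standard graph search (e.g. Dijkstra or A*) is then run on the resulting graph from the start node to the goal node.

```python
# Minimal PRM construction sketch, under the assumptions stated above.
# (The start and goal states would be appended to the node set before
# connecting edges, as described in the text.)
import numpy as np

def build_prm(sample_pose, is_free, segment_free, n_nodes=500, k_neighbors=10):
    """sample_pose(): random pose in C; is_free(x): x lies in C_free;
    segment_free(a, b): the straight line between a and b stays in C_free."""
    nodes = []
    while len(nodes) < n_nodes:                 # rejection-sample C_free
        x = np.asarray(sample_pose())
        if is_free(x):
            nodes.append(x)
    edges = {i: [] for i in range(n_nodes)}
    for i in range(n_nodes):
        dists = [np.linalg.norm(nodes[i] - nodes[j]) for j in range(n_nodes)]
        for j in np.argsort(dists)[1:k_neighbors + 1]:
            if segment_free(nodes[i], nodes[j]):
                edges[i].append((int(j), float(dists[j])))
    return nodes, edges
```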

However, the PRM and its variants are not yet well suited to the problem of a GPS-denied UAV, in that executing a plan requires a controller that can follow each straight-line edge joining two successive graph nodes in the planned path. If the UAV executing the plan does not have a good estimate of its state, it may not be able to determine when it has arrived at a graph node and is to start following a new edge. Even more seriously, UAV stability typically depends on accurate estimates of higher-order variables such as velocity. Without environmental feedback, velocity estimates from an inertial measurement unit (IMU) can quickly drift, causing catastrophic control failures. We therefore need the motion planner to generate plans that ensure accurate state estimation along the planned path. Recall from Section 2.1 that our state estimates for control are provided by the Bayes filter of equation (1), which generates a distribution over possible states by fusing measurements from multiple sensors. Since these distributions are provided by the Kalman family of filters, the posterior distributions will be Gaussian, characterized by a mean state μ and a covariance Σ. By planning in the belief space (or space of distributions), the planner can distinguish between future state estimates where the covariance will be small (i.e. the vehicle has high confidence in its mean state estimate) and future state estimates where the covariance will be large (i.e. the mean state estimate is uncertain). To address the problem of planning in belief space, we use the belief roadmap (BRM) algorithm, first presented by Prentice and Roy (2007), and summarize the algorithm in the following section.

3.1. Belief space planning

When planning in belief space, a naive approach would be to treat the belief space as a high-dimensional configuration space with some dimensions given by the covariance of the belief, and then directly apply the probabilistic roadmap. Assuming the beliefs are provided by a variant of the Kalman filter as described in Section 2.1, this approach would require sampling beliefs directly from the space of Gaussian distributions (μ, Σ) over the state, adding the initial belief b0 to the set of graph nodes, placing edges between pairs of beliefs (bi, bj) for which a controller exists that can take the vehicle from belief bi to bj, and then carrying out a graph search as before to find a path that leads to a belief with maximum probability at the goal. However, the direct application of the unmodified PRM to the belief space has some obvious failures, which can be addressed by the following modifications to the PRM algorithm.

Firstly, in a Gaussian belief space, every belief has some (small) probability that the robot is at the goal state, hence a different objective function is required. In order to incorporate the full Gaussian distribution in our planning process, we continue to search for a shortest-path trajectory, but add the additional constraint that the uncertainty of the belief must be small throughout the path, that is, the trace of the covariance of the helicopter's belief tr(Σ) < ε, where ε is some defined safety parameter and Σ is the covariance of the UAV's state estimate. (Note that, depending on the problem statement, covariance terms such as velocity and orientation may or may not be included in the overall objective. A variety of alternatives to this objective function are discussed in the original BRM paper (Prentice and Roy 2009).)

To plan efficiently, the BRM uses the fact that each Gaussian belief bt is a combination of some μ and some Σ, where the reachability of μ and Σ can be calculated separately. Under mild assumptions of unbiased motion and sensor models, the reachability of any μ is a function of the vehicle kinematics and the environmental structure, just as in the PRM. For any μ that is reachable from the μ0 of the initial distribution, the corresponding reachable covariance can be predicted by propagating the initial covariance Σ0 along the path using an iterative application of the motion and sensing models (see equations (8) and (9) in Appendix A). Therefore, to construct a graph of the reachable belief space, the planner first samples a set of mean poses {μi} from Cfree using the standard pose sampling of the PRM algorithm, and places an edge eij between pairs (μi, μj) if the straight line between the poses is collision free. A forward search can then be used to search for a path through the graph, but each step of the search computes the posterior covariance at each node in addition to the cost-to-go.

Covariance propagation requires multiple EKF updates along each edge eij. While the cost of propagating the covariances along each edge in the graph is a constant multiplier of the asymptotic search complexity, it can still dominate the overall search time.

To reduce this computational cost, the BRM uses the property that the covariance of a Kalman filter-based state estimator can be factored as Σ = BC⁻¹, which allows the two steps of the process and measurement update for the EKF to be expressed as linear updates of separate Bt and Ct block matrices, each a linear function of Bt−1 and Ct−1. The complete EKF equations are given in Appendix A, but the key quantities for propagation of the covariance are the process Jacobian Gt, the process noise Rt, the measurement Jacobian Mt, and the measurement noise Qt. The linear forms of the process and measurement update do not depend on the specific factorization, so we can use a trivial initial factorization B0 = Σ0, C0 = I, such that

$$\Psi_t \triangleq \begin{bmatrix} B \\ C \end{bmatrix}_t = \begin{bmatrix} 0 & I \\ I & M \end{bmatrix}_t \begin{bmatrix} 0 & G^{-T} \\ G & RG^{-T} \end{bmatrix}_t \begin{bmatrix} B \\ C \end{bmatrix}_{t-1} \qquad (2)$$

where Ψt is defined to be the stacked block matrix [B; C]t consisting of the covariance factors, and

$$\zeta_t \triangleq \begin{bmatrix} W & X \\ Y & Z \end{bmatrix}_t$$

is defined to be the one-step linear operator on the covariance factors, equivalent to applying the process model and the measurement model. We recover the posterior covariance from the posterior factors as Σt = BtCt⁻¹.

The EKF approximation assumes that the measurement function is locally linear, which is exactly the approximation that the Jacobian is locally constant. As a result, whenever the EKF assumptions hold, we can assume that Mt is constant and known a priori. By multiplying Ψt−1 by a series of matrices ζt:T, we can compute the posterior covariance ΣT from T − t matrix multiplications and a single matrix inversion on CT. This allows us to determine ζt for any point along a trajectory, and the linearity of the update allows us to combine multiple ζt matrices into a single, one-step update for the covariance along the entire length of a trajectory.

Algorithm 1 Belief roadmap (BRM) algorithm.
Require: Start belief (μ0, Σ0), goal μgoal and map C
1: Sample poses {xi} from Cfree to build belief graph node set {ni} such that ni = {μ = xi, Σ = ∅}
2: Create edge set {eij} between nodes (ni, nj) if the straight-line path between (ni[μ], nj[μ]) is collision free
3: Build one-step transfer functions {ζij} ∀ eij ∈ {eij}
4: Augment each node ni with the best path p = ∅ to ni, such that ni = {μ, Σ, p}
5: Create search queue with initial position and covariance Q ← n0 = {μ0, Σ0, ∅}
6: while Q is not empty do
7:   Pop n ← Q
8:   if n = ngoal then
9:     Continue
10:  end if
11:  for all n′ such that ∃ en,n′ and not n′ ∈ n[p] do
12:    Compute one-step update Ψ′ = ζn,n′ · Ψ, where Ψ = [n[Σ]; I]
13:    Σ′ ← Ψ′₁₁ · (Ψ′₂₁)⁻¹
14:    if tr(Σ′) < tr(n′[Σ]) then
15:      n′ ← {n′[μ], Σ′, n[p] ∪ {n′}}
16:      Push n′ → Q
17:    end if
18:  end for
19: end while
20: return ngoal[p]

Algorithm 1 describes the complete belief roadmap algorithm. Steps 2 and 3 of the algorithm constitute a pre-processing phase in which each edge is labeled with the transfer function ζij that allows each covariance to be propagated in a single step. By pre-computing the transfer function for each edge, the search complexity for belief space planning becomes comparable to configuration space planning.
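The pre-computation and use of the one-step transfer functions can be sketched as follows, assuming the per-step quantities Gt, Rt and Mt appearing in equation (2) are available for each EKF step along an edge; the function names are illustrative.

```python
# Sketch of building, composing, and applying the one-step covariance transfer
# functions of equation (2), under the assumptions stated above.
import numpy as np

def one_step_operator(G, R, M):
    """Build zeta_t from the process Jacobian G, process noise R and
    measurement information M for a single step (state dimension n)."""
    n = G.shape[0]
    I, Z = np.eye(n), np.zeros((n, n))
    Ginv_T = np.linalg.inv(G).T
    left = np.block([[Z, I], [I, M]])
    right = np.block([[Z, Ginv_T], [G, R @ Ginv_T]])
    return left @ right

def edge_transfer(zetas):
    """Compose the per-step operators along an edge (chronological order;
    later steps multiply on the left)."""
    zeta = np.eye(zetas[0].shape[0])
    for z in zetas:
        zeta = z @ zeta
    return zeta

def propagate(zeta_edge, Sigma0):
    """Propagate a prior covariance through an edge: Psi = [Sigma0; I],
    then Sigma = B C^-1 from the stacked result."""
    n = Sigma0.shape[0]
    Psi = zeta_edge @ np.vstack([Sigma0, np.eye(n)])
    B, C = Psi[:n], Psi[n:]
    return B @ np.linalg.inv(C)
```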


3.1.1. Belief space planning using the unscented Kalman filter
The critical step of the BRM algorithm is the construction of the transfer function, which depends on the terms Rt and Mt, the projections of the process and measurement noise terms into the state space. Rt and Mt represent the information lost due to motion and the information gained due to measurements, respectively (again, see equations (8) and (9) in Appendix A). When using the extended Kalman filter (EKF) to perform state estimation, these terms are trivial to compute. However, the EKF is not always a feasible form of Bayesian filtering, especially when linearizing the control or measurement functions results in a poor approximation. One recent alternative to the EKF is the unscented Kalman filter (UKF) (Julier et al. 1995), which uses a set of 2n + 1 deterministic samples, known as "sigma points", from an assumed Gaussian density both to represent the probability density of a space of dimensionality n and to directly measure the relevant motion and measurement covariances. Appendix B provides a formal description of the UKF, and how to recover the information gain matrix Mt.

One concern is that the change in information modelled by Mt is constant in the Kalman filter and approximated as constant in the extended Kalman filter (assuming locally constant Jacobians), but in the UKF model the change in information depends on the specific prior Σt. Different choices of Σt for equation (27) may result in different one-step transfer functions. Nevertheless, the approximation error can be seen experimentally to be small. Figure 7(a) compares the covariances computed using the full UKF update with the covariances computed using the one-step transfer function for a range of motions and randomized initial conditions. The induced error is low; the traces of the posterior covariances computed with the one-step transfer function using the Mt matrix calculated in equation (27) closely match the posterior covariances from the full UKF model. Figure 7(b) shows a distribution of the ratio of the approximation errors to the magnitudes of the information gain, where 7000 trials were performed using 100 different priors and a range of initial conditions and trajectories to calculate the Mt matrix. The error induced in the one-step transfer function by using a constant Mt is less than 2% with a significance of p = 0.955, indicating low sensitivity to the choice of prior over a range of operating conditions.

3.2. Sampling in belief space

The original belief roadmap formulation presented by Prentice and Roy (2007, 2009) assumed some base sampling strategy for generating the graph through belief space. As the number of samples and the density of the graph grow, the BRM planning process will find increasingly low-covariance paths and is probabilistically complete. However, as the density of the graph grows, the cost of searching the graph will also grow; searching the graph has time complexity O(b^d) for b edges per node and a path of length d edges. We can reduce this complexity by minimizing the size of the graph, using a sampling strategy that generates nodes that reflect the useful part of the belief space.

The optimal sampling strategy would generate samples that lie only on the optimal path to the goal; this would of course require knowing the optimal path beforehand. However, some samples are more likely to be useful than others: vehicle poses that generate measurements with high information value are much more likely to lie on the optimal path than vehicle poses that generate measurements with little information.

3.2.1. Sensor uncertainty sampling
If poses are initially sampled from C uniformly, but are accepted or rejected with probability proportional to the expected information gain from sensing at each point, the graph will still converge to one that maintains the connectivity of the free space. However, the distribution of the graph nodes will be biased towards locations that generate sensor measurements which maximize the localization accuracy of the vehicle. We call this sampling strategy sensor uncertainty (SU) sampling, after the "sensor uncertainty field" defined by Takeda and Latombe (1992). The SU field is a mapping from location x to the expected information gain, x → I(x), where the information gain is measured as the difference in entropy of the prior and posterior distributions, which in the Gaussian case depends on the eigenvalues of the covariance. However, examining the information filter form of the measurement update in equation (12), we can see that the posterior covariance results from adding a fixed amount of information Mt to the prior information; the information gain therefore grows with the size of Mt. We can efficiently approximate the SU field using the size of Mt, such as tr(Mt) (that is, proportional to the average of the eigenvalues of Mt, Fedorov 1972), rather than explicitly computing the posterior covariance and the resulting information gain. Finally, building the complete SU field is computationally expensive in practice; by sampling from this field in building the BRM graph, we gain the benefits of focusing the search on the states that lead to high information gain without the cost of explicitly building the SU field.

Figure 8(a) shows the ground floor of MIT's Stata Center, with a 3D view of this environment in Figure 8(b). The environment has dimensions of 13 × 23 m. The helicopter is equipped with a simulated RGB-D camera that is able to sense features, represented by the green crosses on the walls. We initially assume an unrealistically poor sensor model to highlight the variations of the different algorithms, specifically that the camera has a 2 m range and a 30° field-of-view in the pitch and the yaw directions. A feature that falls within the helicopter's field-of-view will then generate a noisy measurement of the feature's distance and bearing relative to the helicopter's pose. The measurement noise is assumed to be Gaussian, with 1 m² variance for the range measurement and 1 rad² angular variance, and is independent of the distance between the feature and the helicopter.


Fig. 7. (a) Comparison of the trace of the covariance from full UKF filtering and the trace of the covariance from the one-step transfer function using the UKF Mt matrix. (b) Distribution of the ratio of errors induced by computing the Mt matrix for the one-step transfer function using a constant prior.


Fig. 8. (a) MIT's Stata Center, ground floor. (b) 3D model of the unstructured, GPS-denied environment. The green crosses are the known positions of visual features to be used for localization. Each wall has a different density of visual features, though the features are assumed to be uniformly distributed on each wall. (c) Sensor uncertainty field for a fixed height and RGB-D camera orientation. The darker cells indicate locations expected to produce greater information gain.

(We also assume perfect data association, which for the purposes of experimental assessment allows us to control the noise of the sensor by varying terms in the measurement matrix. This is clearly an unrealistic assumption, but our simulation experiments with noisy data association did not have significantly different results and are not presented.) By rotating the helicopter's yaw orientation, the planner can direct the camera in different directions, thereby enabling the helicopter to localize itself in the environment using the features in its field-of-view and a given map. This sensor model is unrealistic in terms of the maximum range and the constant noise model, but serves to illustrate how our planning approach achieves varying accuracy at the goal. In the subsequent sections, we show results for a more accurate sensor model in terms of reaching the goal.

To create the corresponding sensor uncertainty field shown in Figure 8(c), the trace of the information gain matrix, tr(M), is evaluated at each location (x, y) in Cfree for a fixed height, yaw, pitch, and roll angle. Here, the camera is assumed to be pointing eastwards at a fixed height. Note that the full SU field has the same dimensionality as C and would require computing the information gain for every coordinate in Cfree. (The 2D slice of the sensor uncertainty field shown in Figure 8(c) is given only to illustrate the concept.) We can, however, evaluate the information gain of a specific position efficiently, allowing us to draw samples randomly from Cfree and accept them based on their information gains tr(M). The intensities of the cells in the map in Figure 8(c) correspond to the information gain, where darker map cells indicate locations that are expected to produce greater information gain. For instance, the region in the center of the map has high information gain because of the high concentration of features along the walls in that region. The information gain increases with the distance to each wall, because the number of features in the field of view increases more than the growing covariance, until the distance to the wall is greater than the maximum range of the sensor. Locations where the associated sensor measurement is expected to detect more than one obstacle in the map also tend to have higher information gain compared to those that encounter just one obstacle. Remember that we do not need to simulate actual measurements; in order to compute tr(M) we only need the measurement Jacobians.

Environmental adaptation

Note that the values of tr(M) do not form a proper distribution, so we cannot accept or reject samples trivially according to tr(M). Additionally, the range of values of tr(M) will vary across environments, depending on how easy the given world is to localize in.


Therefore, for a specific environment, and for k samples x1, ..., xk with corresponding information gains tr(M1), ..., tr(Mk), we estimate a normal distribution over the information such that tr(Mk) ∼ N(μk, σk), where (μk, σk) are the sample mean and variance of tr(M1:k). We then perform rejection sampling according to this distribution. For a new sample xk, we draw a rejection threshold Pk according to the latest sampled normal distribution, and we retain the sample if tr(Mk) > Pk; otherwise we reject it. This provides an online method for estimating the distribution of information in the environment, and allows us to bias the accepted samples towards areas with greater expected information gain relative to the rest of the environment. By performing rejection sampling with a Gaussian proposal distribution, we preserve the support of the distribution over the entire configuration space. The measure provided by sensor uncertainty sampling changes the performance of the sampler (as discussed by Hsu et al. 2006) relative to sampling strategies with other measures, but does not affect the completeness of the algorithm.
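A minimal sketch of this environmental-adaptation step is given below; it is illustrative only, with sample_free_pose and info_gain standing in for the map sampler and whichever information measure is used. The sketch keeps running statistics of the information values seen so far and draws a fresh Gaussian rejection threshold for each candidate sample, as described above.

```python
import random

def su_rejection_sampling(sample_free_pose, info_gain, n_nodes):
    """Accept samples biased towards high expected information gain.

    sample_free_pose(): returns a uniformly drawn collision-free pose.
    info_gain(pose): returns tr(M) (or a normalized I(x)) at that pose.
    """
    accepted, values = [], []
    while len(accepted) < n_nodes:
        pose = sample_free_pose()
        g = info_gain(pose)
        values.append(g)
        if len(values) < 2:
            accepted.append(pose)          # too few samples to estimate the distribution yet
            continue
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        threshold = random.gauss(mean, var ** 0.5)   # rejection threshold drawn from N(mean, var)
        if g > threshold:
            accepted.append(pose)
    return accepted
```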

3.2.2. Unscented Kalman filter sampling When using a state estimator that does not directly compute Mt, recall that we can recover Mt from the prior distribution p(x) and the posterior distribution p(x|z), for example from the unscented Kalman filter as in Section 3.1. However, recovering Mt from the UKF prior and posterior requires inverting the St matrix, with complexity O(|Z|³), where |Z| is the number of measurements. In EKF-SU sampling, we were able to avoid this complexity because the Mt matrix could be computed directly from the measurement Jacobians. Given the number of measurements and the large number of samples that must be evaluated for information gain, inverting St may be computationally expensive. We therefore sample according to the traditional information gain

I(x) = H(p(x)) − H(p(x | z))    (3)

where the entropy is H(p(x)) = −∫ p(x) log p(x) dx. Note that we do not necessarily know which measurement we will receive at a given x, so we approximate equation (3) using the most likely measurement, z = argmax_z p(z|x). Given that we have assumed that the belief of the helicopter's position is represented as a Gaussian distribution, H(p(x)) is cheaper to compute than Mt. In addition, since our analysis (Figure 7(b)) suggested that the measure of information gain was statistically insensitive to the choice of prior, we use a constant prior p(x) with fixed covariance to evaluate sensor uncertainty, such that H(p(x)) = P0 and

I(x) = P0 − H(p(x | z))    (4)

where p(z|x) is calculated according to the UKF algorithm by simulating the sensor measurement at the sample's location and finding the probability of observing that measurement at each of the sigma points.

Algorithm 2 UKF sensor uncertainty sampling algorithm
Require: Map C, number of samples N, constant prior P0
 1: while size of graph < N do
 2:   Sample a pose xk from C with equal probability
 3:   if xk ∈ Cfree then
 4:     Simulate expected sensor measurement z at xk
 5:     Generate sigma points χi about xk according to constant prior P0, creating prior distribution p(xk)
 6:     Calculate information gain I(xk) = P0 − H(p(z|xk))
 7:     Normalize I(xk) such that I(xk) ∈ [0, 1]
 8:     Update mean of I: μk = (1/k) Σ_{m=1}^{k} I(xm)
 9:     Update variance of I: σk = (1/(k−1)) Σ_{m=1}^{k} (I(xm) − μk)²
10:     Sample threshold Pk from normal distribution N(μk, σk)
11:     if I(xk) > Pk then
12:       Add xk to graph with probability I(xk)
13:     end if
14:   end if
15: end while
16: return graph

Table 3. Average time (in ms) to compute tr(Mt) and I(x).

          Number of measurements
          90        300       500
tr(Mt)    0.731     2.84      7.23
I(x)      0.0743    0.187     0.289

In general, the lower the probability of observation at the neighboring sigma points, the smaller the entropy of the posterior distribution, and therefore the greater the information gain. We normalize the information gain I(x) so that it lies in the range [0, 1] by dividing by P0. Similar to our approach for EKF-SU sampling, we then estimate a normal distribution over the information gain such that I(x) ∼ N(μk, σk), where (μk, σk) are the sample mean and variance of I(x). Finally, we choose a rejection threshold Pk according to this normal distribution, and accept the sample if I(x) > Pk. Algorithm 2 summarizes the UKF-SU sampling strategy.
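Because the prior and posterior are Gaussian, the entropies in equation (4) reduce to closed-form functions of the covariance determinants, H(N(μ, Σ)) = 0.5 log((2πe)^n det Σ). The sketch below is an illustration under that Gaussian assumption, with hypothetical function names; it also assumes the constant prior is broad enough that P0 > 0, so that dividing by P0 yields a sensible normalization.

```python
import numpy as np

def gaussian_entropy(Sigma):
    """Differential entropy of an n-dimensional Gaussian with covariance Sigma."""
    n = Sigma.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)

def normalized_information_gain(Sigma_prior, Sigma_posterior):
    """I(x) = P0 - H(p(x|z)), normalized to [0, 1] by dividing by P0 (equation (4))."""
    P0 = gaussian_entropy(Sigma_prior)            # entropy of the constant prior (assumed > 0)
    I = P0 - gaussian_entropy(Sigma_posterior)    # information gained by the measurement
    return min(max(I / P0, 0.0), 1.0)
```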

Table 3 shows the computational benefit of rejection sampling according to the information gain I(x) rather than a measure on Mt. We evaluated the time taken to generate samples for a range of measurement set sizes using the two different rejection sampling functions. Regardless of the number of measurements, computing I(x) was roughly an order of magnitude faster than computing tr(Mt), a savings that can be significant for large graphs.

Figure 9(a) shows the samples drawn according to the sensor uncertainty. Observe that the sampling density is highest in the same region as the dark region in Figure 8(c) (the center of the map), and is lowest far from any environmental structure, which consequently provides little or no localization information.


For comparison, Figure 9(b) shows the samples drawn according to a uniform sampling strategy.

In practice, the differences in sampling strategies can result in different paths and correspondingly different uncertainties. Figure 9(a) shows that the sampler avoided placing samples in the large open space of the environment, and that the samples are clustered near the walls. As a result, the paths created by sensor uncertainty sampling tend to stay in regions with high information gain, since the samples were probabilistically chosen based on the amount of information gain each was expected to provide. The uniform sampling strategy also attempts to find a low-uncertainty path, but the lack of samples in the regions with high information gain results in a path with higher uncertainty.

3.2.3. Alternate sampling strategies In the figures, the SU sampler appears to put samples close to obstacles, so a sampler that simply samples close to obstacle boundaries might seem to do equally well. To test this hypothesis and evaluate the effectiveness of the SU sampling strategy, we compared it with other sampling strategies that have gained popularity in the literature. Although these algorithms were proposed to improve the performance of the PRM algorithm, they can also be used to test the performance of the SU strategy in the BRM context. In this section, we first describe three alternative sampling strategies (uniform, Gaussian, and bridge), before reporting the results of BRM path planning using each of these strategies. We will see that the SU sampling strategy in fact leads to better performance than sampling strategies that rely on obstacle boundaries.

Uniform sampling. Uniform sampling is the most basic sampling strategy, used by the majority of sampling-based techniques. This algorithm does not use any known information about the world, but merely samples C uniformly and adds the samples that are in Cfree to the graph. By employing a simple collision-check function, the uniform sampling strategy is a very efficient means of obtaining the general connectivity of a given map. Figure 9(b) shows an example of the samples generated using this sampling method; note that, unlike the SU sampler, the uniform sampler places samples in the middle of the open area. With a small number of samples (100), there are not enough samples facing the environmental structure and near enough to the structure to localize well. As a result, the predicted covariances of the planned path are much larger than those for the planned path using the SU samples (Figure 9(a)). Even though the same planner and objective function are used in both figures, the different samples result in dramatically different performance, especially for small numbers of samples.

Gaussian sampling. A significant limitation of the uniform sampling strategy is that it often fails to represent important regions of Cfree; for instance, difficult regions such as narrow corridors and areas around obstacles may not be sampled unless a large number of samples are used, incurring a large computational cost.

(a) SU Sampling (b) Uniform Sampling

Fig. 9. (a) Distribution of 100 samples (shown in red) drawn using sensor uncertainty sampling. (b) Distribution of 100 uniformly drawn samples. In both figures, the dark circles are the 1-σ ellipses of the covariance. Smaller circles are higher-certainty positions. Note that this is a bird's-eye view and the helicopter can fly over some obstacles. Also note that each sample is a point in R^3 × S^1 × S^1; the SU samples have a high bias towards sensor orientations that face the environmental features. In both figures, the paths are found using the BRM, but because the uniform sampling strategy has many more samples with orientations that do not point towards the environmental features, the overall uncertainty is much higher.

Boor et al. (1999) present a Gaussian sampling strategy that attempts to give better coverage of the difficult parts of the free configuration space, especially those areas that are close to obstacles. Gaussian sampling biases samples towards obstacles in the environment, which, in the context of the BRM, would seem to be a reasonable approximation for areas with higher information gain. The algorithm first uniformly samples C to obtain a sample x_k^1, regardless of whether it is in Cfree or Cobs. A distance d and direction θ are then chosen according to normal distributions, and a second sample x_k^2 is generated at a location d away from x_k^1 in the direction of θ. The two samples are then tested to determine whether they belong to Cfree or Cobs; if the samples are in different subspaces of C, the sample that is in Cfree is added to the graph. For the purposes of our evaluation, using the general intuition that the samples should be within viewing range of the obstacles and features in the environment, we set σ, the standard deviation of the distribution on d, to be the maximum range of the sensor used for localization. Figure 10(a) shows an example set of samples generated by the Gaussian sampling strategy.
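The Gaussian sampling step described above can be sketched as follows; this is illustrative only, with sample_uniform_pose and in_free_space standing in for the map sampler and collision checker, and with σ set to the maximum sensor range as in our evaluation.

```python
import math
import random

def gaussian_sample(sample_uniform_pose, in_free_space, sigma, max_tries=1000):
    """Gaussian sampling in the style of Boor et al. (1999): keep the free-space
    member of a nearby sample pair only when the pair straddles an obstacle boundary."""
    for _ in range(max_tries):
        x1 = sample_uniform_pose()                          # drawn from all of C (free or obstacle)
        d = abs(random.gauss(0.0, sigma))                   # distance drawn from N(0, sigma^2)
        theta = random.uniform(0.0, 2.0 * math.pi)          # direction drawn uniformly
        x2 = (x1[0] + d * math.cos(theta), x1[1] + d * math.sin(theta))
        free1, free2 = in_free_space(x1), in_free_space(x2)
        if free1 != free2:                                  # one sample in C_free, one in C_obs
            return x1 if free1 else x2
    return None
```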

The intent of the Gaussian sampler is to place samples near environmental structure, which may seem similar in spirit to the SU sampler.


(a) 100 samples generated from Gaussian sampling (b) 100 samples generated from bridge sampling

Fig. 10. (a) Distribution of 100 samples drawn using Gaussian sampling. (b) Distribution of 100 samples drawn using bridge sampling.

However, nearness to obstacles alone is insufficient: a sample may be placed near an obstacle that is uninformative in terms of perceptual features, or may be placed in an orientation that is not helpful. With a small number of samples (100), there are not enough samples to provide useful localization. As a result, just as with the uniform sampler, the predicted covariances of the planned path are much larger than those for the planned path using the SU samples.

Bridge sampling. A third algorithm addresses a specific problem encountered by many sampling strategies: the inability to identify narrow passages through a given environment. Being able to find narrow passages through Cfree is often critical to finding good paths in many motion planning problems. However, narrow passages are also the hardest for a randomized sampler to find and add to the graph, requiring strategies that are biased towards finding paths in narrow passages.

To address this problem, the bridge test for sampling narrow passages was developed by Hsu et al. (2003). The key idea is to only add a sample to the graph when it is found to lie between two obstacles. Two samples, x_k^1 and x_k^2, are first sampled from the map environment, with x_k^2 being drawn from a normal distribution with mean x_k^1 and a given variance σ. If both samples are found to be in Cobs, the midpoint between the two samples is then generated and tested for collisions. This midpoint is added to the graph if it is in Cfree. For reasons similar to the Gaussian sampling strategy, σ was set to twice the maximum sensor range. Figure 10(b) shows an example set of samples generated using the bridge strategy. Just as with the uniform and Gaussian samplers, the predicted covariances of the planned path are much larger than those for the planned path using the SU samples.
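A corresponding sketch of the bridge test is shown below, again illustrative and reusing the hypothetical sample_uniform_pose and in_free_space helpers; σ is set to twice the maximum sensor range, as in our evaluation.

```python
import random

def bridge_sample(sample_uniform_pose, in_free_space, sigma, max_tries=1000):
    """Bridge test of Hsu et al. (2003): accept the midpoint of two obstacle samples
    when that midpoint lies in free space (i.e. inside a narrow passage)."""
    for _ in range(max_tries):
        x1 = sample_uniform_pose()
        x2 = (random.gauss(x1[0], sigma), random.gauss(x1[1], sigma))  # x2 ~ N(x1, sigma^2 I)
        if in_free_space(x1) or in_free_space(x2):
            continue                                 # both endpoints must be in C_obs
        mid = ((x1[0] + x2[0]) / 2.0, (x1[1] + x2[1]) / 2.0)
        if in_free_space(mid):
            return mid                               # the midpoint bridges the two obstacles
    return None
```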

3.3. Comparison of sampling strategies: Simulated camera-equipped helicopter

We tested the effectiveness of the SU sampling strategy against the alternative sampling algorithms described above by running experiments on the system described in Section 3.2.1: a helicopter navigating in the simulated environment of MIT's Stata Center ground floor, as shown in Figure 8(b).

We first observe that the SU strategy is particularly useful when there is variability in the information available to the sensors throughout the environment. As discussed previously, our initial simulated experiments were performed with an unrealistically poor RGB-D camera model in which the sensor's capability is particularly limited, such that finding paths that maximize information gain along the path becomes even more critical. Figure 11 compares the performance of the SU and uniform sampling strategies under different noise and sensor limitation conditions. When the control and measurement noise is doubled and the maximum sensor range is reduced (Figure 11(b)), the resulting uncertainty for both sampling strategies increases. However, the emphasis on finding samples with high information gain under the SU sampling strategy reduces the effect of the noisier conditions, resulting in a greater absolute difference in uncertainty between the two sampling strategies.

Next, to compare the different sampling strategies and illustrate the performance of the BRM strategy, we randomly selected five start and goal positions in the map where the straight-line distance between the start and goal was at least 8.53 m, with an average of 13.66 m. For each start-goal combination, we sampled the environment using each of the four sampling strategies and a range of sample set sizes. After creating a graph of nodes from these samples, the BRM and PRM planning strategies were executed and the performance of the resulting plans compared. For each start-goal combination, sampling strategy, and sample set size, the experiment was repeated 30 times.

Figure 12 shows the advantage of planning with the BRM and of sampling using the sensor uncertainty sampling strategy. This figure reports the performance of each of the sampling strategies and planning methods, using a fixed start and goal for all trials, over a range of sample set sizes, where performance is measured by the percentage of trials that failed to find a feasible path (bar graphs), as well as by the average trace of the helicopter's expected covariance at the goal after executing the planned path (line graphs).

Table 4 shows a comparison of the sampling strategies across various initial start and goal positions using 100 samples. An infeasible path was defined as one where the covariance of the state estimate was greater than a threshold. The results suggest not only that the BRM substantially outperforms the PRM in producing paths with lower uncertainty, but also that the SU sampling strategy allows us to achieve better paths with fewer samples.


(a) Low Noise (b) High Noise

[Line plots of goal uncertainty, tr(Σ_goal), versus number of samples for BRM with uniform sampling and BRM with SU sampling, under standard and high noise parameters.]

Fig. 11. Performance of SU and uniform sampling strategies under different noise conditions. The high noise scenario has double the control and measurement noise relative to the low noise model, as well as a 25% reduction in maximum sensor range. The bar plots under each graph show the percentage of feasible paths that each algorithm was able to find.

[Line plot of goal uncertainty, tr(Σ_goal), versus number of samples for PRM with uniform sampling and for BRM with Gaussian, bridge, uniform, and SU sampling.]

Fig. 12. Comparison of the different sampling strategies and planning methods. All trials presented in this graph used the same start and goal to perform the search. Each data point represents 30 trials, and the error bars represent the standard error of the computed matrix trace. The bar graphs along the bottom of the figure show, for each sampling-planning strategy and number of samples, the percentage of the 30 trials that failed to find a path that satisfied the constraints on the covariance (our true objective function). The line graph plots the trace of the helicopter's expected covariance at the goal for the different sampling strategies, sample sizes, and planning methods. Lower covariance at the goal typically corresponds to more accurate performance, and is often used as an alternate objective function for motion planning in belief space.

Regardless of the initial conditions, the SU sampling strategy consistently outperformed the other sampling strategies, in terms of both the percentage of paths found and the expected uncertainty at the goal.

Table 4. Performance of different sampling strategies across different paths, using 100 samples.

                         Path 1   Path 2   Path 3   Path 4   Path 5
Uniform    % success     100      96.6     100      100      100
           Final cov     17.87    22.60    2.22     19.11    1.48
SU         % success     100      96.6     100      100      100
           Final cov     12.38    11.36    1.99     12.39    1.39
Gaussian   % success     96.6     96.6     100      93.1     89.7
           Final cov     23.89    17.89    17.2     22.16    1.41
Bridge     % success     100      3.5      17.2     100      13.8
           Final cov     21.58    13.48    2.33     21.32    1.36

These results also emphasize that SU sampling is not equivalent to simply sampling close to obstacle boundaries.

Table 5 shows a comparison of the performance and time costs of different combinations of sampling and planning strategies. The conventional PRM method is unsurprisingly the fastest algorithm, but suffers from poor localization performance. The BRM incurs additional time complexity when building the graph; in particular, the BRM with SU sampling incurs the largest time penalty in building the graph because of the need to calculate the information gain of every potential sample. However, the graph construction is a one-time operation whose cost can be amortized across multiple queries.

Finally, we replaced the sensor model with a more realistic RGB-D model and a more accurate map of a real environment (shown in Figure 17(a)). We modeled the RGB-D sensor as a Microsoft Kinect with a 4 m maximum range, a 57° field of view in yaw and 43° in pitch, and a Gaussian noise model that is a function of depth, such that σ = 1.5 × 10⁻⁵ d² (Khoshelham 2011). Note that for position estimation, we found experimentally that the noise model of the individual features had little effect; the dominant effect was the number of available features.


Table 5. Performance and time costs of different planners.

                           Trace cov    Graph build    Path search
                           at goal      time (s)       time (s)
PRM, uniform sampling      56.38        0.79           0.15
BRM, uniform sampling      19.11        110.75         0.38
BRM, SU sampling           12.31        323.12         0.55
BRM, Gaussian sampling     22.82        88.14          0.21
BRM, bridge sampling       21.92        178.58         0.30

Figure 13 shows the performance of the different algorithms using the RGB-D camera model. Figure 13(a) is the relevant measure for our application: the ability to find paths that satisfy the constraint on the covariance. We see that even with very few samples, the BRM is able to find a feasible path. For comparison with Figure 12, we also provide the covariance at the goal. We see that the BRM algorithms are consistently able to find lower-covariance trajectories. (The absolute values of the covariances in Figures 12 and 13(b) are not comparable because the sensor model and state space are different, and so different overall uncertainties are feasible.)

An examination of Table 5 also shows that SU sampling does incur some additional cost, in that the time to build the graph is higher than for the other sampling techniques. In a planning scenario with repeated queries, this cost can be amortized over future queries. In a single-query setting, if there is little variability in the information available across the environment, then the uniform sampling strategy may be preferable; however, there are relatively few real-world scenarios with so little variability. Nevertheless, in domains where the available sensors are powerful enough to provide accurate localization everywhere and only a single planning query is to be generated, the uniform sampling strategy may be more efficient.

4. Indoor navigation results

In addition to evaluating the visual odometry algorithms reported in Section 2.2, we conducted a number of autonomous flight experiments in the motion capture system and in larger environments. In these experiments, the vehicle flew autonomously with state estimates provided by the algorithms presented in this paper.

Figure 14 shows an example where the MAV was commanded to hover at a target point using the RGB-D camera, along with statistics about how well it achieved this goal. The ground-truth trajectory and performance measures were recorded with the motion capture system.

4.1. Laser-based validation of belief space navigation

We performed a number of experiments onboard an actual quadrotor helicopter to demonstrate the properties of our navigation in belief space.

The vehicle was commanded through the environment by a human operator selecting destination waypoints using a graphical interface. The purpose of these experiments was to characterize the ability of the MAV to maintain a desired position and to follow a planned trajectory. We initially validated our results by building on our previous work (Bachrach et al. 2009b), which used a Hokuyo UTM-30LX laser rangefinder for navigation and localization. The UTM-30LX is a planar laser rangefinder that provides a 240° field-of-view at 40 Hz, up to an effective range of 30 m. The laser is mounted in the x-y plane of the helicopter, and we modified the laser to optically redirect 20° of its field-of-view to provide a small set of range measurements in the (downward) z direction. From a single scan, the vehicle is therefore able to estimate its position, yaw orientation, and altitude with respect to environmental features. We have shown previously that the measurement of the ground plane is relatively noisy, although sufficient for altitude control.

We performed navigation experiments in two environments: the first floor of MIT's Stata Center, a wide indoor walkway environment (Figure 15(a)), and the first floor of MIT's Walker Memorial building, an old gymnasium whose open space is often used for banquets and functions (Figure 16(a)). For these two environments, we focused on demonstrating the BRM algorithm's ability to generate paths that allow the helicopter to remain well-localized. We did not compare the BRM's performance to the PRM algorithm, to avoid potential loss of control and crashes resulting from inaccurate state estimation. Instead, we artificially limited the range of the laser rangefinder for both planning and state estimation; we demonstrate the effect of different sensor range limits on the planned paths and the need for incorporating sensor characteristics when planning, before moving to mapping using the RGB-D camera.

For each of these environments, we first generated a 2D map of the environment using SLAM technology that has been reported previously (Bachrach et al. 2009b). While it may appear that localization using a 2D map is difficult when the helicopter pitches and rolls, we have also reported previously that 2D localization is relatively robust to changes in pitch and roll (Bachrach et al. 2009b). Figures 15(b) and 16(b) show the 2D maps of both environments, as well as the SU fields indicating areas of sensor uncertainty, computed according to equation (4). Note, however, that the SU field is never actually constructed; instead, SU samples are generated via rejection sampling.

For each environment, two different paths were generated, each corresponding to a different maximum range for the laser rangefinder. Different maximum ranges affect the helicopter's ability to localize itself, thus affecting the paths generated by the BRM algorithm. Figures 15(c) and 16(c) show the helicopter's trajectories based on the paths generated by the BRM algorithm for the different sensor configurations.


(a) Paths Found (b) Goal Uncertainty

Fig. 13. Performance of the Kinect camera model in the environment shown in Figure 17. (a) The number of trials (out of 60) that found a feasible path, as a function of the number of samples. The BRM using the SU sampler found a feasible path 100% of the time except when constrained to using 10 samples. (b) For comparison with Figure 12, we also provide a comparison of the trace of the helicopter's expected covariance at the goal (line graph) and the percentage of feasible paths that each algorithm was able to find (bar graph along the bottom).

Metric
Duration: 90 s
Mean speed: 0.10 m/s
Mean pos. deviation: 6.2 cm
Max pos. deviation: 19 cm

Fig. 14. A plot showing the ground-truth trajectory of the vehicle during position hold. The red dot near the center is the origin around which the vehicle was hovering. The vehicle was controlled using visual odometry, and its position was measured with a motion capture system.

For the experiments along the office walkway, the cyan path denotes the trajectory when the sensor range was set to 5 m, while the pink path denotes the trajectory for the 10 m configuration. For the open indoor environment, the cyan path represents the 8 m configuration, while the pink path represents the trajectory when the sensor range was 30 m. Due to the absence of a motion capture system, all paths were recorded based on the helicopter's state estimate from its localization module, and the helicopter's ability to reach the goal location was verified by visual inspection.

In all of these scenarios, the helicopter successfully executed the paths generated by the BRM algorithm, and the actual and desired paths matched closely regardless of the range limits. In addition, the path generated for the laser with the shorter maximum range was always longer than that for the laser with the longer maximum range. In general, when the sensor is of higher quality, more of the environment is well-localizable, and hence the planned path more closely approximates the shortest-path trajectory. In contrast, a low-quality sensor requires more careful planning to ensure that the vehicle remains well-localized, often resulting in a longer path.

We examined how the helicopter would perform if the BRM had assumed a better sensor than was actually available, which allowed us to assess the effect of the sensor model on the planning process. To avoid potential crashes, we did not perform this analysis on actual hardware, but instead modified the raw laser data from the earlier experiments. Specifically, we post-processed the raw laser data from the experiments shown in Figures 15 and 16, truncating the laser data to a shorter maximum range than was actually available or was known to the BRM during planning. We then re-estimated the vehicle's state along the trajectory using the modified laser data, and evaluated whether the helicopter was able to remain well-localized. In both cases, the vehicle followed a trajectory that did not contain enough information for it to stay well-localized, since the truncation to a shorter maximum range removed a considerable amount of information from the sensor signal. Additionally, in both cases, the state estimate became sufficiently uncertain that the vehicle control would likely have become unstable. The crosses in Figures 15(c) and 16(c) indicate the point at which the helicopter was no longer able to localize itself, which we determined as the point where tr(Σ) became greater than 1.


(a) Wide office walkway (b) SU field (c) Executed paths for sensor with 5 m (cyan) and 10 m (pink) max range

Fig. 15. Helicopter experiments in an office environment. (a) View of environment. (b) SU field of the environment. The lighter regions indicate areas of higher sensor uncertainty. Gray regions indicate obstacles in the environment. (c) BRM paths executed when the laser range was set to 5 m (cyan) and 10 m (pink). The helicopter was able to successfully navigate both planned paths, traveling 44.05 m and 36.28 m, respectively. The red cross denotes where the state estimation would have failed if the 10 m path were attempted using the 5 m sensor.

Given the helicopter's strict requirements for localizability, where it is essential to be well-localized at every step, the crosses effectively mark the points where the helicopter would have crashed given the planned path and the modified sensor characteristics. It is therefore critical that sensor limitations are accurately incorporated when planning under uncertainty.

4.2. Belief space navigation using the RGB-D camera

We also demonstrated the use of the BRM algorithm for navigation on the helicopter using the RGB-D camera. Figure 17(a) shows an example environment containing an open space, where the center of the environment is out of range of the RGB-D camera. Additionally, the left side of the environment (in the picture) is essentially featureless. In Figure 17(b), we see that the sensor uncertainty field reflects the absence of information along this wall.

Figure 17(c) shows the paths generated by the shortest-path planner (green) and the BRM planner using the RGB-D sensor model (pink), with the corresponding covariances of the state estimator drawn on top of each trajectory. As expected, we see that the covariances of the state estimate grow along the shortest path, but stay tightly bounded along the BRM trajectory.

(a) Open, indoor environment (b) SU field (c) Executed paths for sensor with 8 m (cyan) and 30 m (pink) max range

Fig. 16. Helicopter experiments in a large open indoor environment. (a) View of environment. (b) SU field of the environment. The lighter regions indicate areas of higher sensor uncertainty. Gray regions indicate obstacles in the environment. (c) BRM paths executed when the laser range was set to 8 m (cyan) and 30 m (pink). The helicopter was able to successfully navigate both planned paths, traveling 36.58 m and 32.21 m, respectively. The red cross denotes where the state estimation would have failed if the 30 m path were attempted using the 8 m sensor.

Fig. 17. Helicopter experiments in a large open indoor environment. (a) View of environment. (b) SU field of the environment (slice at 0° yaw). The lighter regions indicate areas of higher sensor uncertainty. Gray regions indicate obstacles in the environment. (c) BRM paths using the RGB-D model (maximum range 4 m, pink) and laser (maximum range 30 m, green).


4.3. Mapping using the RGB-D camera

Finally, we experimentally validated our mapping algorithms using the RGB-D camera. We have flown in a number of locations around the MIT campus and at the Intel Research office in Seattle. Two such experiments are shown in Figure 18. As the MAV covered greater distances, the RGB-D mapping algorithm limited the global drift of its position estimates by detecting loop closures and correcting the trajectory estimates. The trajectory history was then combined with the RGB-D sensor data to automatically generate maps that are useful both for a human operator's situational awareness and for autonomous path planning and decision making. While ground-truth position estimates are not available, the quality of the state estimates computed by our system is evident in the rendered point cloud.

Fig. 18. Trajectories flown by the MAV in two navigation experiments.

Fig. 19. Voxel maps for the environments in Figure 18. (a) Dense maximum-likelihood occupancy voxel map of the environment depicted in Figure 18(a), false-colored by height. Unknown/unobserved cells are also tracked, but not depicted here. (b) A voxel map of the environment in Figure 18(b) allows the vehicle to plan a collision-free 3D trajectory (green).

Figure 19(a) shows an occupancy voxel map populated using the dense depth data provided by the RGB-D sensor. These occupancy maps can be used for autonomous path planning and navigation in highly cluttered environments, enabling flight through tight passageways and in close proximity to obstacles. Figure 19(b) shows a rendering of the MAV's internal state estimate as it flew through the environment depicted in Figure 18(b). While these renderings are not necessary for obstacle avoidance, they serve to provide a human operator with greater situational awareness of the MAV's surrounding environment.

5. Related work

This paper uses techniques from previous results, adapted to the domain of an RGB-D camera. Specifically, the visual odometry uses the FAST feature detector developed by Rosten and Drummond (2006), the rotation estimation described by Mei et al. (2009), the feature matching described by Nistér et al. (2004), and the inlier detection described by Howard (2008). Our contribution in this area is an experimental analysis of best practices compared with a range of alternative techniques. The mapping process integrates the visual odometry for local map construction and builds on the previous results of Henry et al. (2010) for global loop closure. Finally, the planning process builds on the belief roadmap algorithm (Prentice and Roy 2009) and previous initial results in sensor uncertainty sampling (He et al. 2008b). We also provide a novel sampling strategy that adapts to the information available in the environment, and detailed comparisons with other sampling strategies.

5.1. Visual odometry

Visual odometry refers to the process of estimating a vehicle's 3D motion from visual imagery alone, and dates back to Moravec's work on the Stanford cart (Moravec 1980). The basic algorithm used by Moravec and others is to identify features of interest in each camera frame, estimate the depth to each feature (typically using stereo), match features across time frames, and then estimate the rigid body transformation that best aligns the features over time. Since then, a great deal of progress has been made in all aspects of visual odometry. Common feature detectors in modern real-time algorithms include Harris corners (Harris and Stephens 1988) and FAST features (Rosten and Drummond 2006), which are relatively quick to compute and resilient against small viewpoint changes. Methods for robustly matching features across frames include RANSAC-based methods (Nistér et al. 2004; Konolige et al. 2007; Johnson et al. 2008) and graph-based consistency algorithms (Howard 2008). In the motion estimation process, techniques have ranged from directly minimizing Euclidean distances between matched features (Horn 1987) to minimizing the pixel reprojection error instead of 3D distance (Nistér et al. 2004). When computation constraints permit, bundle adjustment has been shown to help reduce integrated drift (Konolige et al. 2007).

Visual odometry estimates local motion and generally has unbounded global drift. To bound estimation error, it can be integrated with simultaneous localization and mapping (SLAM) algorithms, which employ loop-closing techniques to detect when a vehicle revisits a previous location. Most recent visual SLAM methods rely on fast image matching techniques (Snavely et al. 2006; Newman et al. 2009) for loop closure. As loops are detected, a common approach is to construct a pose graph representing the spatial relationships between positions of the robot during its trajectory and environmental features, creating constraints that link previous poses.


Optimization of this pose graph results in a globally aligned set of frames (Olson et al. 2006; Grisetti et al. 2007b; Kaess et al. 2008). For increased visual consistency, sparse bundle adjustment (SBA) (Triggs et al. 2000) can be used to simultaneously optimize the poses and the locations of observed features.

5.2. MAVs and visual navigation

The primary focus in the visual odometry communities has been on ground vehicles; however, there has been a significant amount of research on using visual state estimation for the control of MAVs. For larger outdoor helicopters, several researchers have demonstrated various levels of autonomy using vision-based state estimates (Buskey et al. 2004; Kelly and Sukhatme 2007). While many of the challenges for such vehicles are similar to those for smaller indoor MAVs, the payload and flight environments are quite different. For smaller MAVs operating in indoor environments, a number of researchers have used monocular camera sensors for control (Celik et al. 2008; Steder et al. 2008; Ahrens et al. 2009; Blösch et al. 2010). Ko et al. (2007) use the iMote2 technology and the UKF for state estimation in aerial vehicles, and Valenti et al. (2006) were the first to demonstrate reliable navigation and position estimation on quadrotor helicopters. However, these algorithms require specific assumptions about the environment (such as known patterns) to obtain the unknown scale factor inherent in using a monocular camera. Previous work in our group used a stereo camera to stabilize a MAV in unknown indoor environments (Achtelik et al. 2009), but the computation had to be performed offboard, and no higher-level mapping or SLAM was performed.

Finally, there has been considerable work in using laser rangefinders for MAV navigation and control (He et al. 2008b; Bachrach et al. 2009a; Grzonka et al. 2009; Shen et al. 2011), with the limitations discussed earlier in this paper. Laser range-finding onboard helicopters is also not a novel technology (Thrun et al. 2003; Mejias et al. 2006), and more recently, a number of quadrotor configurations have been developed (Angeletti et al. 2008; Grzonka et al. 2009) that are similar to the design first proposed by He et al. (2008a).

5.3. Visual mapping

Our objective is not only alignment and registration, but also building 3D models with both shape and appearance information. In the vision and graphics communities, a large body of work exists on alignment and registration of images for 3D modeling and dense scene reconstruction (e.g. Pollefeys et al. 2008). However, our focus is primarily on scene modeling for robot perception and planning, and secondarily for human situational awareness (e.g. for a human supervisor commanding the MAV). Strobl et al. (2009) combined a time-of-flight camera with a stereo camera to build 3D object models in real time.

Kim et al. (2009) used a set of time-of-flight cameras in a fixed, calibrated configuration with no temporal alignment of sensor streams. Se and Jasiobedzki (2008) used a stereo camera combined with SIFT features to create 3D models of environments, but made no provision for loop closure or global consistency. Newcombe and Davison (2010) developed an impressive system for real-time dense 3D reconstruction with a monocular camera, although their system is still limited to small, feature-rich scenes.

There has also been a large amount of work on dense reconstruction from videos (e.g. Pollefeys et al. 2008) and photos (e.g. Debevec et al. 1996; Furukawa and Ponce 2009), mostly of objects or outdoor scenes. One interesting line of work (Furukawa et al. 2009) attacks the arguably harder problem of indoor reconstruction, using a Manhattan-world assumption to fit simple geometric models for visualization purposes. Such approaches are computationally demanding and not very robust in feature-sparse environments.

5.4. Motion planning under uncertainty

Modern approaches to planning with incomplete state information are typically based on the partially observable Markov decision process (POMDP) model or on a graph search through belief space (Bonet and Geffner 2000). While the POMDP provides a general framework for belief space planning, the complexity of the solution grows exponentially with the length of the policy and the number of potential observations. Approximation algorithms exist to mitigate the problem of scalability (Pineau et al. 2003; Smith and Simmons 2004), but these techniques still face computational issues in addressing large problems. Other papers have incorporated sensor characteristics for planning (Taïx et al. 2008), though the algorithm assumes that a collision-free path already exists, and focuses on determining the best landmarks to associate with each part of the path. Van den Berg et al. (2010) proposed using a distribution over state estimates with a conventional rapidly exploring random tree to generate motion plans, although this approach is not complete and can fail to find feasible plans. Bry and Roy (2011) proposed the rapidly exploring random belief tree, which tracks a distribution over state estimates along with the conventional Kalman filter covariance using an incremental sampling technique to refine trajectories; this approach is strongly related to the BRM algorithm.

The extended Kalman filter and unscented Kalman filter have been used extensively, especially for state estimation. The symplectic form (and the related Hamiltonian form) of the covariance update has been reported before, most recently by Mourikis and Roumeliotis (2006). Planning algorithms have also incorporated these filters to generate paths that are robust to sensor uncertainty (Brooks et al. 2006; Gonzalez and Stentz 2007).


However, without the efficient covariance update presented in this paper, the deterministic search performed by these planning algorithms is computationally expensive.

6. Conclusion

This paper presented an experimental analysis of our approach to enabling autonomous flight using an RGB-D sensor. Our system combines visual odometry techniques from the existing literature with our previous work on autonomous flight and mapping, and is able to conduct all sensing and computation required for local position control onboard the vehicle. Using the RGB-D sensor, our system is able to plan complex 3D paths in cluttered environments while retaining a high degree of situational awareness. Additionally, we showed how the belief roadmap algorithm (Prentice and Roy 2007, 2009) can be used to plan trajectories that incorporate a predictive model of sensing, allowing the planner to minimize the positional error of the helicopter at the goal using an efficient graph search. The original BRM algorithm assumed an extended Kalman filter model for position estimation; we showed how this algorithm can be extended to use the unscented Kalman filter and provided a new sampling strategy for UKF position estimation. We concluded with an experimental validation of our overall system for both laser- and RGB-D-based navigation and mapping.

Funding

This research was supported by the Office of Naval Research (grant number MURI N00014-07-1-0749), Science of Autonomy (program N00014-09-1-0641) and the Army Research Office under the MAST CTA. Peter Henry and Dieter Fox were supported by ONR (MURI grant number N00014-09-1-1052) and by the NSF (contract number IIS-0812671), as well as through collaborative participation in the Robotics Consortium sponsored by the US Army Research Laboratory (Agreement W911NF-10-2-0016). Ruijie He was supported by the Republic of Singapore Armed Forces. Nicholas Roy was supported by the National Science Foundation Division of Information and Intelligent Systems (grant number 0546467).

References

Abou-Kandil H (2003) Matrix Riccati Equations in Control and Systems Theory. Birkhauser.

Achtelik M, Bachrach A, He R, Prentice S, and Roy N (2009) Stereo vision and laser odometry for autonomous helicopters in GPS-denied indoor environments. In Proceedings of the SPIE unmanned systems technology XI, volume 7332. Orlando, FL.

Ahrens S, Levine D, Andrews G, and How J (2009) Vision-based guidance and control of a hovering vehicle in unknown, GPS-denied environments. In IEEE international conference on robotics and automation, pp. 2643–2648.

Angeletti G, Valente JP, Iocchi L, and Nardi D (2008) Autonomous indoor hovering with a quadrotor. In Workshop proceedings of SIMPAR, pp. 472–481. Venice, Italy.

Bachrach A, He R, and Roy N (2009a) Autonomous flight in unknown indoor environments. International Journal of Micro Air Vehicles 1(4): 217–228.

Bachrach A, He R, and Roy N (2009b) Autonomous flight in unstructured and unknown indoor environments. In Proceedings of EMAV.

Bachrach A, Prentice S, He R, and Roy N (2011) Range – robust autonomous navigation in GPS-denied environments. Journal of Field Robotics 28(5): 644–666.

Benhimane S and Malis E (2004) Improving vision-based control using efficient second-order minimization techniques. In IEEE international conference on robotics and automation.

Blösch M, Weiss S, Scaramuzza D, and Siegwart R (2010) Vision based MAV navigation in unknown and unstructured environments. In IEEE international conference on robotics and automation, pp. 21–28.

Bonet B and Geffner H (2000) Planning with incomplete information as heuristic search in belief space. In Proceedings of AIPS.

Boor V, Overmars M, and van der Stappen A (1999) Gaussian sampling strategy for probabilistic roadmap planners. In Proceedings of ICRA.

Brooks A, Makarenko A, Williams S, and Durrant-Whyte H (2006) Parametric POMDPs for planning in continuous state spaces. Robotics and Autonomous Systems 54(11): 887–897.

Bry A and Roy N (2011) Rapidly-exploring random belief trees for motion planning under uncertainty. In Proceedings of ICRA.

Buskey G, Roberts J, Corke P, and Wyeth G (2004) Helicopter automation using a low-cost sensing system. Computing Control Engineering Journal 15(2): 8–9.

Calonder M, Lepetit V, and Fua P (2008) Keypoint signatures for fast learning and recognition. In European conference on computer vision, pp. 58–71.

Celik K, Chung SJ, and Somani A (2008) Mono-vision corner SLAM for indoor navigation. In IEEE international conference on electro/information technology, pp. 343–348.

Debevec P, Taylor CJ, and Malik J (1996) Modeling and rendering architecture from photographs: A hybrid geometry and image-based approach. In SIGGRAPH.

Fedorov V (1972) Theory of Optimal Experiments. New York: Academic Press.

Furukawa Y, Curless B, Seitz S, and Szeliski R (2009) Reconstructing building interiors from images. In Proceedings of the international conference on computer vision.

Furukawa Y and Ponce J (2009) Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1362–1376.

Gonzalez J and Stentz A (2007) Planning with uncertainty in position using high-resolution maps. In Proceedings of ICRA.

Grisetti G, Grzonka S, Stachniss C, Pfaff P, and Burgard W (2007a) Estimation of accurate maximum likelihood maps in 3D. In IEEE international conference on intelligent robots and systems.

Grisetti G, Stachniss C, Grzonka S, and Burgard W (2007b) A tree parameterization for efficiently computing maximum likelihood maps using gradient descent. In Proceedings of robotics: Science and systems.


Grzonka S, Grisetti G, and Burgard W (2009) Towards a navigation system for autonomous indoor flying. In IEEE international conference on robotics and automation. Kobe, Japan.

Gurdan D, Stumpf J, Achtelik M, Doth K, Hirzinger G, and Rus D (2007) Energy-efficient autonomous four-rotor flying robot controlled at 1 kHz. In Proceedings of ICRA.

Harris C and Stephens M (1988) A combined corner and edge detector. In Alvey vision conference, pp. 147–151.

He R, Prentice S, and Roy N (2008a) Planning in information space for a quadrotor helicopter in a GPS-denied environment. In Proceedings of ICRA.

He R, Prentice S, and Roy N (2008b) Planning in information space for a quadrotor helicopter in a GPS-denied environment. In IEEE international conference on robotics and automation, pp. 1814–1820. Los Angeles, CA.

Henry P, Krainin M, Herbst E, Ren X, and Fox D (2010) RGB-D mapping: Using depth cameras for dense 3D modeling of indoor environments. In International symposium on experimental robotics.

Hirschmuller H, Innocent P, and Garibaldi J (2002) Fast, unconstrained camera motion estimation from stereo without tracking and robust statistics. In Proceedings of the international conference on control, automation, robotics and vision, volume 2, pp. 1099–1104.

Horn BKP (1987) Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America 4(4): 629–642.

Howard A (2008) Real-time stereo visual odometry for autonomous ground vehicles. In IEEE international conference on intelligent robots and systems.

Hsu D, Jiang T, Reif J, and Sun Z (2003) The bridge test for sampling narrow passages with probabilistic roadmap planners. In Proceedings of ICRA.

Hsu D, Latombe J, and Kurniawati H (2006) On the probabilistic foundations of probabilistic roadmap planning. International Journal of Robotics Research 25(7): 627–643.

Huang AS, Bachrach A, Henry P, Krainin M, Maturana D, Fox D, and Roy N (2011) Visual odometry and mapping for autonomous flight using an RGB-D camera. In Proceedings of the international symposium of robotics research (ISRR). Flagstaff, AZ.

Johnson AE, Goldberg SB, Cheng Y, and Matthies LH (2008) Robust and efficient stereo feature tracking for visual odometry. In IEEE international conference on robotics and automation. Pasadena, CA.

Julier S, Uhlmann J, and Durrant-Whyte H (1995) A new approach for filtering nonlinear systems. In Proceedings of ACC.

Kaess M, Ranganathan A, and Dellaert F (2008) iSAM: Incremental smoothing and mapping. IEEE Transactions on Robotics 24(6): 1365–1378.

Kavraki LE, Svestka P, Latombe J-C, and Overmars M (1996) Probabilistic roadmaps for path planning in high dimensional configuration spaces. IEEE Transactions on Robotics and Automation 12(4): 566–580.

Kelly J and Sukhatme GS (2007) An experimental study of aerial stereo visual odometry. In Proceedings of the symposium on intelligent autonomous vehicles. Toulouse, France.

Khoshelham K (2011) Accuracy analysis of Kinect depth data. In ISPRS workshop laser scanning, volume XXXVIII.

Kim YM, Theobalt C, Diebel J, Kosecka J, Micusik B, and Thrun S (2009) Multi-view image and ToF sensor fusion for dense 3D reconstruction. In Workshop on 3-D digital imaging and modeling (3DIM).

Ko J, Klein D, Fox D, and Hähnel D (2007) GP-UKF: Unscented Kalman filters with Gaussian process prediction and observation models. In Proceedings of IROS.

Kollar T and Roy N (2008) Trajectory optimization using reinforcement learning for map exploration. International Journal of Robotics Research 27(2): 175–197.

Konolige K (2010) Sparse sparse bundle adjustment. In Proceedings of the British machine vision conference (BMVC).

Konolige K, Agrawal M, and Sola J (2007) Large-scale visual odometry for rough terrain. In International symposium on robotics research. Hiroshima, Japan.

Leonard J and Durrant-Whyte H (1991) Mobile robot localization by tracking geometric beacons. IEEE Transactions on Robotics and Automation 7(3): 376–382.

Lozano-Perez T (1983) Spatial planning: A configuration space approach. IEEE Transactions on Computers C-32(2).

Mei C, Sibley G, Cummins M, Newman P, and Reid I (2009) A constant time efficient stereo SLAM system. In British machine vision conference.

Meier L, Tanskanen P, Fraundorfer F, and Pollefeys M (2011) Pixhawk: A system for autonomous flight using onboard computer vision. In IEEE international conference on robotics and automation.

Mejias LO, Saripalli S, Cervera P, and Sukhatme GS (2006) Visual servoing of an autonomous helicopter in urban areas using feature tracking. Journal of Field Robotics 23(3): 185–199.

Moravec H (1980) Obstacle avoidance and navigation in the real world by a seeing robot rover. PhD thesis, Stanford University.

Mourikis A and Roumeliotis S (2006) Optimal sensor scheduling for resource constrained localization of mobile robot formations. IEEE Transactions on Robotics 22(5): 917–931.

Newcombe R and Davison A (2010) Live dense reconstruction with a single moving camera. In Computer vision and pattern recognition, pp. 1498–1505.

Newman P, Sibley G, Smith M, Cummins M, Harrison A, Mei C, Posner I, Shade R, Schröter D, Murphy L, Churchill W, Cole D, and Reid I (2009) Navigating, recognising and describing urban spaces with vision and laser. International Journal of Robotics Research 28(11–12): 1406–1433.

Nistér D (2005) Preemptive RANSAC for live structure and motion estimation. Machine Vision and Applications 16: 321–329.

Nistér D, Naroditsky O, and Bergen J (2004) Visual odometry. In Computer vision and pattern recognition, pp. 652–659. Washington, DC.

Nistér D and Stewenius H (2006) Scalable recognition with a vocabulary tree. In Computer vision and pattern recognition.

Olson E, Leonard J, and Teller S (2006) Fast iterative optimization of pose graphs with poor initial estimates. In IEEE international conference on robotics and automation, pp. 2262–2269.

Pineau J, Gordon G, and Thrun S (2003) Point-based value iteration: An anytime algorithm for POMDPs. In Proceedings of IJCAI.

Pollefeys M, Nister D, Frahm J-M, Akbarzadeh A, Mordohai P, Clipp B, Engels C, Gallup D, Kim S-J, Merrell P, Salmi C, Sinha S, Talton B, Wang L, Yang Q, Stewenius H, Yang R, Welch G, and Towles H (2008) Detailed real-time urban 3D reconstruction from video. International Journal of Computer Vision 72(2): 143–167.


R, Welch G, and Towles H (2008) Detailed real-time urban3D reconstruction from video. International Journal ComputerVision 72(2): 143–167.

Prentice S and Roy N (2007) The belief roadmap: Efficientplanning in linear POMDPs by factoring the covariance. InProceedings of ISRR.

Prentice S and Roy N (2009) The belief roadmap: Efficient plan-ning in belief space by factoring the covariance. The Interna-tional Journal of Robotics Research 28(11–12): 1448.

Rosten E and Drummond T (2006) Machine learning for high-speed corner detection. In European conference on computervision.

Se S and Jasiobedzki P (2008) Stereo-vision based 3D modelingand localization for unmanned vehicles. International Journalof Intelligent Control and Systems 13(1): 47–58.

Se S, Lowe D, and Little J (2002) Mobile robot localizationand mapping with uncertainty using scale-invariant visuallandmarks. International Journal of Robotics Research 21(8):735–758.

Shen S, Michael N, and Kumar V (2011) Autonomous multi-floorindoor navigation with a computationally constrained MAV.In IEEE international conference on robotics and automation.Shanghai, China.

Smith T and Simmons R (2004) Heuristic search value iterationfor POMDPs. In Proceedings of UAI.

Snavely N, Seitz S, and Szeliski R (2006) Photo tourism: Explor-ing photo collections in 3D. In ACM transactions on graphics(proceedings of SIGGRAPH).

Steder B, Grisetti G, Stachniss C, and Burgard W (2008) VisualSLAM for flying vehicles. IEEE Transactions on Robotics24(5): 1088–1093.

Strobl K, Mair E, Bodenmuller T, Kielhofer S, Sepp W, Suppa M,Burschka D, and Hirzinger G (2009) The self-referenced DLR3D-modeler. In IEEE international conference on intelligentrobots and systems, pp. 21–28.

Taïx M, Malti A, and Lamiraux F (2008) Planning robust landmarks for sensor based motion. In European robotics symposium, pp. 195–204. Springer.

Takeda H and Latombe J (1992) Sensory uncertainty field for mobile robot navigation. In Proceedings of ICRA.

Thrun S, Diel M, and Hähnel D (2003) Scan alignment and 3D surface modeling with a helicopter platform. In Proceedings of FSR.

Thrun S, Fox D, Burgard W, and Dellaert F (2000) Robust Monte Carlo localization for mobile robots. Artificial Intelligence 128(1–2): 99–141.

Thrun S, Liu Y, Koller D, Ng AY, Ghahramani Z, and Durrant-Whyte H (2004) Simultaneous localization and mapping with sparse extended information filters. International Journal of Robotics Research 23(7–8): 693–716.

Triggs B, McLauchlan P, Hartley R, and Fitzgibbon A (2000) Bundle adjustment – a modern synthesis. Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms, Corfu, Greece, 21–22 September 1999, Proceedings (Lecture Notes in Computer Science), pp. 153–177.

Valenti M, Bethke B, Fiore G, How JP, and Feron E (2006) Indoor multi-vehicle flight testbed for fault detection, isolation, and recovery. In Proceedings of AIAA GNC.

Van den Berg J, Abbeel P, and Goldberg K (2010) LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information. In Proceedings of Robotics: Science and Systems.

Yamauchi B, Schultz A, and Adams W (1998) Mobile robot exploration and map-building with continuous localization. In Proceedings of the 1998 IEEE international conference on robotics and automation, volume 4, pp. 3715–3720.

Appendix A. The extended Kalman filter

For reference, we provide a description of the extended Kalman filter equations. Bayesian filtering is one of the most robust methods of localization (Leonard and Durrant-Whyte 1991), in which a probability distribution p(x_t | u_{1:t}, z_{1:t}) is inferred over the (unknown) vehicle state x_t at time t following a series of noisy actions u_{1:t} and measurements z_{1:t}. With some standard assumptions about the actions and observations, the posterior distribution (or belief) can be expressed as

p(x_t | u_{1:t}, z_{1:t}) = \frac{1}{Z} \, p(z_t | x_t) \int_S p(x_t | u_t, x_{t-1}) \, p(x_{t-1}) \, dx_{t-1}    (5)

where Z is a normalizer. Equation (5), known as the Bayes filter, provides an efficient, recursive way to update the state distribution.

The Kalman filter is a form of Bayes filtering that assumes that all probability distributions are Gaussian such that p(x_t) = N(\mu_t, \Sigma_t) with mean \mu_t and covariance \Sigma_t, and that the transition and observation Gaussians are linearly parameterized by the state and control. The extended Kalman filter (EKF) allows the same inference algorithm to operate with non-linear transition and observation functions by linearizing these functions around the current mean estimate. More formally, the next state x_t and observation z_t are given by the following functions

x_t = g(x_{t-1}, u_t, w_t), \quad w_t \sim N(0, W_t)    (6)

and

z_t = h(x_t, q_t), \quad q_t \sim N(0, Q_t)    (7)

where u_t is a control action, and w_t and q_t are random, unobservable noise variables. The EKF computes the state distribution at time t in two steps: a process update based only on the control input u_t, leading to an estimate p(x_t) = N(\bar{\mu}_t, \bar{\Sigma}_t), and a measurement update to complete the estimate of p(x_t) = N(\mu_t, \Sigma_t). The process step follows as

\bar{\mu}_t = g(\mu_{t-1}, u_t), \quad \bar{\Sigma}_t = G_t \Sigma_{t-1} G_t^T + V_t W_t V_t^T    (8)

where G_t is the Jacobian of g with respect to x and V_t is the Jacobian of g with respect to w. For convenience, we denote R_t \triangleq V_t W_t V_t^T. Similarly, the measurement step follows as

\mu_t = \bar{\mu}_t + K_t (z_t - H_t \bar{\mu}_t), \quad \Sigma_t = (I - K_t H_t) \bar{\Sigma}_t    (9)

where H_t is the Jacobian of h with respect to x and K_t is known as the Kalman gain, given by

K_t = \bar{\Sigma}_t H_t^T \left( H_t \bar{\Sigma}_t H_t^T + Q_t \right)^{-1}    (10)
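As a concrete illustration, the following is a minimal sketch (not the authors' implementation) of the EKF process and measurement updates of equations (8)–(10), written in Python with numpy. The function and argument names (ekf_predict, ekf_update, and the user-supplied g, h, and Jacobians G, V, H evaluated at the current mean) are hypothetical and chosen only for readability.

    import numpy as np

    def ekf_predict(mu, Sigma, u, g, G, V, W):
        # Equation (8): propagate the mean through g and the covariance
        # through the Jacobians, adding the projected process noise R = V W V^T.
        mu_bar = g(mu, u)
        R = V @ W @ V.T
        Sigma_bar = G @ Sigma @ G.T + R
        return mu_bar, Sigma_bar

    def ekf_update(mu_bar, Sigma_bar, z, h, H, Q):
        # Equations (9)-(10): Kalman gain, then mean and covariance update.
        S = H @ Sigma_bar @ H.T + Q
        K = Sigma_bar @ H.T @ np.linalg.inv(S)
        mu = mu_bar + K @ (z - h(mu_bar))
        Sigma = (np.eye(len(mu_bar)) - K @ H) @ Sigma_bar
        return mu, Sigma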


An alternate form of the EKF represents the distribution p(x_t | u_{1:t}, z_{1:t}) by an information vector \xi_t and information matrix \Omega_t \triangleq \Sigma_t^{-1}. The information form may be more efficient to compute in domains where the information matrix is sparse (Thrun et al. 2004). The information matrix update can be written as

\bar{\Omega}_t = \bar{\Sigma}_t^{-1} = \left( G_t \Sigma_{t-1} G_t^T + R_t \right)^{-1}    (11)

\Omega_t = \bar{\Omega}_t + H_t^T Q_t^{-1} H_t    (12)

For convenience, we denote M_t \triangleq H_t^T Q_t^{-1} H_t such that \Omega_t = \bar{\Omega}_t + M_t.
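The information-form bookkeeping of equations (11)–(12) is equally compact in code. The sketch below, again in Python with numpy and with hypothetical names, shows how the predicted information matrix and the measurement information gain M_t could be computed under the same assumptions as the previous sketch.

    import numpy as np

    def information_update(Sigma_prev, G, R, H, Q):
        # Equation (11): predicted information matrix.
        Omega_bar = np.linalg.inv(G @ Sigma_prev @ G.T + R)
        # M_t = H^T Q^{-1} H is the information gained from the measurement.
        M = H.T @ np.linalg.inv(Q) @ H
        # Equation (12): posterior information matrix.
        Omega = Omega_bar + M
        return Omega, M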

Appendix B. The unscented Kalman filter

The critical step of the BRM algorithm is the construction of the transfer function, which depends on the terms R_t and M_t, the projections of the process and measurement noise terms into the state space. R_t and M_t also represent the information lost due to motion, and the information gained due to measurements, respectively. When using the extended Kalman filter to perform state estimation, these terms are trivial to compute. However, the EKF is not always a feasible form of Bayesian filtering, especially when linearizing the control or measurement functions results in a poor approximation.

One recent alternative to the EKF is the unscented Kalman filter (UKF) (Julier et al. 1995), which uses a set of 2n + 1 deterministic samples, known as "sigma points", from an assumed Gaussian density both to represent the probability density of a space of dimensionality n and to directly measure the relevant motion and measurement covariances. These samples are generated according to

X_t^0 = \mu_{t-1}    (13)

X_t^i = \mu_{t-1} + \left( \sqrt{(n + \lambda) \Sigma_t} \right)_i, \quad i = 1, \ldots, n    (14)

X_t^i = \mu_{t-1} - \left( \sqrt{(n + \lambda) \Sigma_t} \right)_i, \quad i = n + 1, \ldots, 2n    (15)

where \left( \sqrt{(n + \lambda) \Sigma_t} \right)_i is the ith column of the root of the matrix. Each sigma point X^i has an associated weight w_m^i used when computing the mean, and w_c^i is the weight used when computing the covariance, such that \sum_{i=0}^{2n} w_c^i = 1 and \sum_{i=0}^{2n} w_m^i = 1. The weights and the \lambda parameters are chosen to match the mean and variance of the assumed Gaussian distribution; the mechanism for choosing these parameters can be found in Julier et al. (1995). The samples are propagated according to the non-linear process model such that

\bar{X}_t^i = g(X_t^i, u, 0)    (16)

generating the process mean and covariance

\bar{\mu}_t = \sum_{i=0}^{2n} w_m^i \bar{X}_t^i    (17)

\bar{\Sigma}_t = \sum_{i=0}^{2n} w_c^i (\bar{X}_t^i - \bar{\mu}_t)(\bar{X}_t^i - \bar{\mu}_t)^T + R_t    (18)
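As with the EKF, a short sketch can make the unscented transform concrete. The code below is a non-authoritative Python/numpy sketch with hypothetical names (sigma_points, ukf_predict, and caller-supplied weight vectors wm and wc); it generates the sigma points of equations (13)–(15) using a Cholesky factor as the matrix square root and computes the predicted moments of equations (16)–(18).

    import numpy as np

    def sigma_points(mu, Sigma, lam):
        # Equations (13)-(15): 2n+1 deterministic samples of N(mu, Sigma).
        n = len(mu)
        root = np.linalg.cholesky((n + lam) * Sigma)   # one valid matrix square root
        X = [mu]
        X += [mu + root[:, i] for i in range(n)]
        X += [mu - root[:, i] for i in range(n)]
        return np.array(X)                             # shape (2n+1, n)

    def ukf_predict(mu, Sigma, u, g, R, lam, wm, wc):
        # Equations (16)-(18): propagate the sigma points through the
        # process model g and recover the predicted mean and covariance.
        X = sigma_points(mu, Sigma, lam)
        X_bar = np.array([g(x, u) for x in X])         # equation (16)
        mu_bar = wm @ X_bar                            # equation (17)
        diff = X_bar - mu_bar
        Sigma_bar = (wc[:, None] * diff).T @ diff + R  # equation (18)
        return mu_bar, Sigma_bar, X_bar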

The measurement function uses the process mean and covariance to create sigma points in the measurement space, which are then used to generate the posterior mean and covariance (\mu_t, \Sigma_t), such that

Z_t^i = h(\bar{X}_t^i, 0), \quad \mu_t^z = \sum_{i=0}^{2n} w_m^i Z_t^i    (19)

S_t = \left( \sum_{i=0}^{2n} w_m^i (Z_t^i - \mu_t^z)(Z_t^i - \mu_t^z)^T \right) + Q_t    (20)

K_t = \left( \sum_{i=0}^{2n} w_c^i (\bar{X}_t^i - \bar{\mu}_t)(Z_t^i - \mu_t^z)^T \right) S_t^{-1}    (21)

\mu_t = \bar{\mu}_t + K_t (z_t - \mu_t^z)    (22)

\Sigma_t = \bar{\Sigma}_t - K_t S_t K_t^T    (23)

Note that R_t \triangleq V_t W_t V_t^T and Q_t are the same process and measurement noise terms from the EKF definition given in equations (6)–(10). The advantage of the UKF is that the process and measurement functions are not projected into the state space by a linearization; instead, the unscented transform computes the moments of the process and measurement distributions in the state space itself. As a result, the UKF eliminates the need for linearization and captures the distribution accurately up to the second order, rather than the first-order fidelity of the EKF.
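The corresponding measurement update, equations (19)–(23), can be sketched in the same hypothetical Python/numpy style, continuing from the ukf_predict sketch above; here h is the measurement function and Q the measurement noise covariance.

    import numpy as np

    def ukf_update(mu_bar, Sigma_bar, X_bar, z, h, Q, wm, wc):
        Z = np.array([h(x) for x in X_bar])                 # sigma points in measurement space
        mu_z = wm @ Z                                       # equation (19)
        dz = Z - mu_z
        dx = X_bar - mu_bar
        S = (wm[:, None] * dz).T @ dz + Q                   # equation (20)
        K = (wc[:, None] * dx).T @ dz @ np.linalg.inv(S)    # equation (21)
        mu = mu_bar + K @ (z - mu_z)                        # equation (22)
        Sigma = Sigma_bar - K @ S @ K.T                     # equation (23)
        return mu, Sigma, S, K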

Although the UKF provides a mechanism for efficiently tracking the posterior distribution as a Gaussian while avoiding linearization of the measurement model, the UKF no longer calculates the M_t matrix, which is a critical piece of the individual transfer functions \zeta_t. Nevertheless, we can still recover M_t from the UKF update directly by working in the information form and noticing that M_t is the information gain due to measurement z_t. We can combine equation (12) and equation (23):

\Omega_t = \bar{\Omega}_t + M_t    (24)

\Rightarrow M_t = \Omega_t - \bar{\Omega}_t    (25)

            = \Sigma_t^{-1} - \bar{\Sigma}_t^{-1}    (26)

            = (\bar{\Sigma}_t - K_t S_t K_t^T)^{-1} - \bar{\Sigma}_t^{-1}    (27)

In order to calculate the M_t matrix for a series of points along a trajectory, we therefore generate a prior covariance and compute the posterior covariance as in equation (23). The UKF is still a projection of the measurement noise into the state space, but it is a more accurate projection than an explicit linearization of the measurement model. Additionally, the UKF covariance update does not depend on the actual measurement received, exactly as in the EKF.
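As a final sketch under the same assumptions as above, the recovery of M_t in equations (24)–(27) reduces to a difference of information matrices; the helper name measurement_information is hypothetical.

    import numpy as np

    def measurement_information(Sigma_bar, K, S):
        # Equation (23): posterior covariance from the UKF update.
        Sigma = Sigma_bar - K @ S @ K.T
        # Equations (24)-(27): information gained from the measurement.
        M = np.linalg.inv(Sigma) - np.linalg.inv(Sigma_bar)
        return M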
