RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments
Peter Henry [1], Michael Krainin [1], Evan Herbst [1], Xiaofeng Ren [2], and Dieter Fox [1,2]
[1] University of Washington Computer Science and Engineering  [2] Intel Labs Seattle
RGB-D Data
Related Work

SLAM:
- [Davison et al., PAMI 2007] (monocular)
- [Konolige et al., IJR 2010] (stereo)
- [Pollefeys et al., IJCV 2007] (multi-view stereo)
- [Borrmann et al., RAS 2008] (3D laser)
- [May et al., JFR 2009] (ToF sensor)

Loop closure detection:
- [Nister et al., CVPR 2006]
- [Paul et al., ICRA 2010]

Photo collections:
- [Snavely et al., SIGGRAPH 2006]
- [Furukawa et al., ICCV 2009]
System Overview
1. Frame-to-frame alignment
2. Loop closure detection and global consistency
3. Map representation
Alignment (3D SIFT + RANSAC)
SIFT features (from the image) lifted into 3D (using depth)
RANSAC alignment requires no initial estimate
Requires sufficient matched features
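The slide above can be sketched as a minimal RANSAC loop over putative 3D feature matches: sample three correspondences, fit a rigid transform in closed form (Kabsch/SVD), and keep the hypothesis with the most inliers. This is a hedged illustration, not the paper's implementation; the sample size, iteration count, and inlier threshold are arbitrary choices.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch/SVD)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def ransac_align(src, dst, iters=500, thresh=0.03, seed=0):
    """RANSAC over 3D-3D feature matches: sample 3 pairs, fit a rigid
    transform, score by inlier count, then refit on the best inlier set.
    No initial estimate is required."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = rigid_transform(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    R, t = rigid_transform(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

Because only three matches are needed per hypothesis, the loop fails exactly as the next slide shows: when there are too few correct matches for any sampled triple to be outlier-free.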
RANSAC Failure Case
Alignment (ICP)
Does not need visual features or color
Requires sufficient geometry and initial estimate
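A minimal point-to-plane ICP iteration, as a hedged sketch (brute-force nearest neighbours instead of a k-d tree, and not the paper's implementation). It shows why ICP needs an initial estimate: correspondences come from proximity under the current guess.

```python
import numpy as np

def icp_point_to_plane(src, dst, dst_normals, iters=20):
    """Point-to-plane ICP sketch. Each iteration: (1) match every source point
    to its nearest destination point (brute force; use a k-d tree in practice),
    (2) solve the linearized least-squares system for a small rotation
    (rx, ry, rz) and translation, (3) compose with the current estimate."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        cur = src @ R.T + t
        nn = np.argmin(((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1), axis=1)
        q, n = dst[nn], dst_normals[nn]
        # residual ~= (p - q).n + rot.(p x n) + trans.n  for small rotations
        A = np.hstack([np.cross(cur, n), n])      # N x 6 Jacobian
        b = ((q - cur) * n).sum(axis=1)
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        rx, ry, rz = x[:3]
        dR = np.array([[1.0, -rz, ry], [rz, 1.0, -rx], [-ry, rx, 1.0]])
        U, _, Vt = np.linalg.svd(dR)              # project back onto SO(3)
        dR = U @ Vt
        R, t = dR @ R, dR @ t + x[3:]
    return R, t
```

If the scene lacks geometric variation (e.g. a flat wall), the 6x6 normal equations become ill-conditioned, which is the failure case on the next slide.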
ICP Failure Case
Joint Optimization (RGBD-ICP)
Combines fixed RANSAC feature matches with a dense point-to-plane term in a single error function
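One way to write the combined objective (a sketch reconstructed from memory; the paper's exact notation and normalization may differ). Here T is the rigid camera transform, A_f the fixed feature associations, A_d the dense point-to-plane associations, and alpha the trade-off weight swept in Experiment 1 below:

```latex
T^{*} \;=\; \operatorname*{arg\,min}_{T}\;
\alpha \cdot \frac{1}{|A_f|} \sum_{i \in A_f} \bigl\lVert T(f_s^{\,i}) - f_t^{\,i} \bigr\rVert^{2}
\;+\;
(1-\alpha) \cdot \frac{1}{|A_d|} \sum_{j \in A_d}
\Bigl( \bigl( T(p_s^{\,j}) - p_t^{\,j} \bigr) \cdot n_t^{\,j} \Bigr)^{2}
```

Setting alpha = 0 reduces this to pure point-to-plane ICP and alpha = 1 to pure feature alignment, matching the endpoints of the alpha sweep in the results.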
Loop Closure
Sequential alignments accumulate error
Revisiting a previous location results in an inconsistent map
Loop Closure Detection
Detect by running RANSAC against previous frames
Pre-filter options (for efficiency):
- Only a subset of frames (keyframes):
  - Every nth frame
  - New keyframe when RANSAC fails directly against the previous keyframe
- Only keyframes with a similar estimated 3D pose
- Place recognition using a vocabulary tree
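The pose-based pre-filter in the list above can be sketched as follows. The thresholds and the `min_gap` parameter are illustrative assumptions, not values from the talk:

```python
import numpy as np

def loop_closure_candidates(poses, cur_idx, max_dist=3.0,
                            max_angle=np.pi / 3, min_gap=30):
    """Keep only earlier keyframes whose estimated pose is close to the
    current one, so the expensive RANSAC check runs on a few candidates.
    `poses` is a list of (R, t) world-from-camera estimates; recent frames
    (within `min_gap`) are skipped since they are not true loop closures."""
    Rc, tc = poses[cur_idx]
    candidates = []
    for k in range(cur_idx - min_gap + 1):
        Rk, tk = poses[k]
        if np.linalg.norm(tk - tc) > max_dist:
            continue
        # relative rotation angle from the trace of Rc^T Rk
        cos_a = np.clip((np.trace(Rc.T @ Rk) - 1.0) / 2.0, -1.0, 1.0)
        if np.arccos(cos_a) <= max_angle:
            candidates.append(k)
    return candidates
```

Only the surviving candidates would then be passed to the RANSAC matcher from the alignment stage.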
Loop Closure Correction
TORO [Grisetti 2007]:
- Constraints between camera locations in a pose graph
- Maximum likelihood global camera poses
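A translation-only toy version illustrates what pose-graph optimization does (TORO additionally handles rotations and uses a stochastic gradient scheme; this sketch just solves the equivalent linear least-squares problem with pose 0 anchored at the origin):

```python
import numpy as np

def optimize_pose_graph(n, constraints):
    """Toy 1D pose graph: each constraint (i, j, z) says pose_j - pose_i
    should equal the measured offset z. With pose 0 fixed at the origin,
    the maximum-likelihood poses (Gaussian noise, equal weights) are the
    solution of a linear least-squares system over the remaining poses."""
    A = np.zeros((len(constraints), n - 1))
    b = np.zeros(len(constraints))
    for r, (i, j, z) in enumerate(constraints):
        if i > 0:
            A[r, i - 1] = -1.0
        if j > 0:
            A[r, j - 1] = 1.0
        b[r] = z
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.concatenate([[0.0], x])
```

For example, three unit odometry steps summing to 3.0 plus a loop-closure measurement of 2.4 yield poses that spread the 0.6 of accumulated drift evenly (0.15 per constraint), which is exactly the qualitative effect of the correction step.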
Map Representation: Surfels
Surface Elements [Pfister 2000, Weise 2009, Krainin 2010]
Circular surface patches
Accumulate color / orientation / size information
Incremental, independent updates
Incorporate occlusion reasoning
750 million points reduced to 9 million surfels
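A toy surfel with weighted running-average updates shows how such a reduction works: repeated observations of the same patch are folded into one element instead of stored as new points. The fields and update rules here are illustrative assumptions, not the exact scheme of the cited papers.

```python
import numpy as np

class Surfel:
    """Minimal surfel sketch: a circular surface patch accumulating
    position, color, orientation, and size from repeated observations."""

    def __init__(self, pos, normal, color, radius):
        self.pos = np.asarray(pos, float)
        self.normal = np.asarray(normal, float)
        self.color = np.asarray(color, float)
        self.radius = float(radius)
        self.weight = 1.0  # number of observations folded in so far

    def update(self, pos, normal, color, radius):
        """Fold a new measurement into weighted running averages; keep the
        smallest observed radius (the closest, highest-resolution view)."""
        w = self.weight
        self.pos = (w * self.pos + pos) / (w + 1)
        self.color = (w * self.color + color) / (w + 1)
        n = w * self.normal + normal
        self.normal = n / np.linalg.norm(n)
        self.radius = min(self.radius, radius)
        self.weight = w + 1
```

Because each update touches only one surfel, updates are incremental and independent, as the slide notes; occlusion reasoning (removing surfels observed to be free space) would sit on top of this.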
Experiment 1: RGBD-ICP
16 locations marked in the environment (~7 m apart)
Camera on tripod carried between and placed at markers
Error between ground-truth distance and the accumulated distance from frame-to-frame alignment

α weight:        0.0 (ICP)   0.2    0.5    0.8    1.0 (RANSAC)
Mean error (m):  0.22        0.19   0.16   0.18   1.29
Experiment 2: Robot Arm
Camera held in a WAM arm for ground-truth pose

Method              Mean Translational Error (m)   Mean Angular Error (deg)
RANSAC              0.168                          4.5
ICP                 0.144                          3.9
RGBD-ICP (α=0.5)    0.123                          3.3
RGBD-ICP + TORO     0.069                          2.8
Map Accuracy: Architectural Drawing
Map Accuracy: 2D Laser Map
Conclusion
Kinect-style depth cameras have recently become available as consumer products
RGB-D Mapping can generate rich 3D maps using these cameras
RGBD-ICP combines visual and shape information for robust frame-to-frame alignment
Global consistency achieved via loop closure detection and optimization (RANSAC, TORO, SBA)
Surfels provide a compact map representation
Open Questions
Which are the best features to use?
How to find more loop closure constraints between frames?
What is the right representation (point clouds, surfels, meshes, volumetric, geometric primitives, objects)?
How to generate increasingly photorealistic maps?
Autonomous exploration for map completeness?
Can we use these rich 3D maps for semantic mapping?
Thank You!