Università degli Studi di Firenze
COMPUTATIONAL VISION GROUP
Robust Selective Stereo SLAM without Loop Closure and Bundle Adjustment
Fabio Bellavia†, Marco Fanfani‡, Fabio Pazzaglia† and Carlo Colombo‡, University of Florence, Italy†{bellavia.fabio, fabio.pazzaglia}@gmail.com, ‡{marco.fanfani, carlo.colombo}@unifi.it
KEYPOINTS DETECTION AND MATCHING
The HarrisZ detector and the sGLOH descriptorare used to extract and match respectively sta-ble corner features. Spatial and temporal con-strains have been imposed to refine the candi-dates matches.
• Keypoint matches between the frame fi andfj must satisfy the spatial constraint imposedby the epipolar rectification (yellow band) aswell as the temporal flow motion restriction(orange cone).
• Furthermore, the four matching points mustform the loop chain C (dotted line)
C =((xl
i,xri ), (x
li,x
lj), (x
lj ,x
rj), (x
ri ,x
rj))
• Finally, candidate matches are refined by run-ning four distinct RANSAC and a subset ofchain matches Ci,j ⊆ {C} is selected.
Left
Right
Time
• In the ideal case, points xlj , xr
j in frame fj mustcoincide with the projections x̃l
j , x̃rj of points
xli, x
ri in fi obtained by triangulation of Xi,j in
order for the chain C to be consistent with thepose Pi,j . Due to data noise, in the real casethe distances ‖ x̃l
j − xlj ‖ and ‖ x̃r
j − xrj ‖ are
minimized by RANSAC on matches Ci,j .
POSE ESTIMATION CONSTRAINED BY TEMPORAL FLOW
• The uncertainty of matches in the image planesis lower bounded by the image resolution (red)and is propagated to the 3D points. For thesame resolution limits, 3D point locations canassume an higher range Xi,j of values (darkgray quadrilateral) when estimated by closeframes fi and fj , while the possible locationsXi,w are more circumscribed (blue quadrilat-eral) in the case of distant frames fi and fw.
• An adaptive threshold δm is defined on the pre-vious observation to discard bad frames con-taining slow motion information. The frame fjis discarded and the next frame fj+1 is testedif it contains a high percentage of fixed pointmatches, i.e. for which the distance on the im-age is below a threshold δf .
• Examples of good frames are presented in thefigures below, where good fixed and unfixedmatches are shown in blue and light violet, re-spectively, while wrong correspondences arereported in cyan.
• Finally, a pose smoothing constraint betweenframes is introduced, so that the current rela-tive pose estimation Pi,j cannot abruptly varyfrom the previous one. This is achieved by im-posing that the relative rotation around the ori-gin between the two incremental rotations Rz,i
and Ri,j is bounded, as well as for the corre-sponding translation directions tz,i and ti,j .
• This last constraint can resolve issues in thecase of no camera movement or when mov-ing objects crossing the camera path cover thescene.
SSLAM
Our Selective SLAM (SSLAM) framework com-bines a robust loop chain matching scheme fortracking keypoints with an effective frame se-lection strategy.
• SSLAM relies on the observation that thepose estimation errors propagate from theuncertainty of the 3D points, higher for dis-tant points corresponding to matches withlow temporal flow disparity.
• SSLAM does not require any loop closure orbundle adjustment.
• SSLAM alternates between keypoint match-ing between consecutive SLAM frames andthe estimation of the relative camera poses.
REFERENCES
1. A. Geiger et al., “StereoScan: Dense 3D reconstruc-tion in real-time”, IEEE Intelligent Vehicles Sympo-sium (2011).
2. A. Geiger et al., “Are we ready for autonomousdriving? The KITTI vision benchmark suite”, Com-puter Vision and Pattern Recognition (2012).
FUNDINGThis work has been carried out during the THE-SAURUS project, founded by Regione Toscana(Italy) in the framework of the “FAS” program 2007-2013.
RESULTS
Path Length [m]
Tra
nsla
tion
Err
or [
%]
0 200 400 600 8001
1.52
2.53
3.5
Speed [km/h]
Rot
atio
n E
rror
[de
g/m
]
0 50 1000
0.01
0.02
0.03
0.04
0.05
0.06
0.07 10
Speed [km/h]
Tra
nsla
tion
Err
or [
%] 8
6
4
2
00 50 100
Path Length [m]
Rot
atio
n E
rror
[de
g/m
]
0 200 400 600 8000
0.01
0.02
0.03
0.04VISO2-S/200
SSLAM/500
SSLAM/15
SSLAM/3
SSLAM /500
Different versions of SSLAM have been com-pared with VISO2-S [1], which implements a sim-ilar loop chain matching scheme and RANSACpose estimation on the first 11 sequences of theKITTI vision benchmark suite [2], which provideground-truth data. SSLAM? refers to SSLAMwithout the adaptive frame discard detection, thenumber of RANSAC iterations are also reported.
• Both the different versions of the proposedmethod provide results better than VISO2-S ac-cording to the average translation and rotationerrors for increasing path length and speed.
• The chain loop matching scheme together withthe chosen keypoint detector and descriptor isrobust even for long paths, without requiringbundle adjustment or loop closure detection.
• Dropping low temporal flow disparity framesin SSLAM improves on the standard pose esti-mation used in SSLAM?, allowing the trackingof longer paths.
• Results for SSLAM/15 and SSLAM/500 areequivalent, while SSLAM/3 obtains inferior
results but similar to those obtained bySSLAM?/500, giving an evidence of the robust-ness of the methods.
• Estimated paths of the different SLAM methodon the KITTI dataset show that both SSLAMand SSLAM? paths are closer to the ground-truth than VISO2-S.
x [m]
z [
m]
-300 -200 -100 0 100 200 300
0
100
200
300
400
500
x [m]
z [m
]
-300 -200 -100 0 100 200 300 400
0
100
200
300
400
500VISO2-S/200
SSLAM/500
SSLAM/15
SSLAM/3
SSLAM /500
Ground-truth