robust selective stereo slam without loop closure and ...cvg.dsi.unifi.it/img/sslam-poster.pdf ·...

Università degli Studi di Firenze

COMPUTATIONAL VISION GROUP

Robust Selective Stereo SLAM without Loop Closure and Bundle Adjustment

Fabio Bellavia†, Marco Fanfani‡, Fabio Pazzaglia† and Carlo Colombo‡, University of Florence, Italy†{bellavia.fabio, fabio.pazzaglia}@gmail.com, ‡{marco.fanfani, carlo.colombo}@unifi.it

KEYPOINTS DETECTION AND MATCHING

The HarrisZ detector and the sGLOH descriptorare used to extract and match respectively sta-ble corner features. Spatial and temporal con-strains have been imposed to refine the candi-dates matches.

• Keypoint matches between the frame fi andfj must satisfy the spatial constraint imposedby the epipolar rectification (yellow band) aswell as the temporal flow motion restriction(orange cone).

• Furthermore, the four matching points mustform the loop chain C (dotted line)

C =((xl

i,xri ), (x

li,x

lj), (x

lj ,x

rj), (x

ri ,x

rj))

• Finally, candidate matches are refined by run-ning four distinct RANSAC and a subset ofchain matches Ci,j ⊆ {C} is selected.

Left

Right

Time

• In the ideal case, points xlj , xr

j in frame fj mustcoincide with the projections x̃l

j , x̃rj of points

xli, x

ri in fi obtained by triangulation of Xi,j in

order for the chain C to be consistent with thepose Pi,j . Due to data noise, in the real casethe distances ‖ x̃l

j − xlj ‖ and ‖ x̃r

j − xrj ‖ are

minimized by RANSAC on matches Ci,j .

POSE ESTIMATION CONSTRAINED BY TEMPORAL FLOW

• The uncertainty of matches in the image planesis lower bounded by the image resolution (red)and is propagated to the 3D points. For thesame resolution limits, 3D point locations canassume an higher range Xi,j of values (darkgray quadrilateral) when estimated by closeframes fi and fj , while the possible locationsXi,w are more circumscribed (blue quadrilat-eral) in the case of distant frames fi and fw.

• An adaptive threshold δm is defined on the pre-vious observation to discard bad frames con-taining slow motion information. The frame fjis discarded and the next frame fj+1 is testedif it contains a high percentage of fixed pointmatches, i.e. for which the distance on the im-age is below a threshold δf .

• Examples of good frames are presented in thefigures below, where good fixed and unfixedmatches are shown in blue and light violet, re-spectively, while wrong correspondences arereported in cyan.

• Finally, a pose smoothing constraint betweenframes is introduced, so that the current rela-tive pose estimation Pi,j cannot abruptly varyfrom the previous one. This is achieved by im-posing that the relative rotation around the ori-gin between the two incremental rotations Rz,i

and Ri,j is bounded, as well as for the corre-sponding translation directions tz,i and ti,j .

• This last constraint can resolve issues in thecase of no camera movement or when mov-ing objects crossing the camera path cover thescene.

SSLAM

Our Selective SLAM (SSLAM) framework com-bines a robust loop chain matching scheme fortracking keypoints with an effective frame se-lection strategy.

• SSLAM relies on the observation that thepose estimation errors propagate from theuncertainty of the 3D points, higher for dis-tant points corresponding to matches withlow temporal flow disparity.

• SSLAM does not require any loop closure orbundle adjustment.

• SSLAM alternates between keypoint match-ing between consecutive SLAM frames andthe estimation of the relative camera poses.

REFERENCES

1. A. Geiger et al., “StereoScan: Dense 3D reconstruc-tion in real-time”, IEEE Intelligent Vehicles Sympo-sium (2011).

2. A. Geiger et al., “Are we ready for autonomousdriving? The KITTI vision benchmark suite”, Com-puter Vision and Pattern Recognition (2012).

FUNDINGThis work has been carried out during the THE-SAURUS project, founded by Regione Toscana(Italy) in the framework of the “FAS” program 2007-2013.

RESULTS

Path Length [m]

Tra

nsla

tion

Err

or [

%]

0 200 400 600 8001

1.52

2.53

3.5

Speed [km/h]

Rot

atio

n E

rror

[de

g/m

]

0 50 1000

0.01

0.02

0.03

0.04

0.05

0.06

0.07 10

Speed [km/h]

Tra

nsla

tion

Err

or [

%] 8

6

4

2

00 50 100

Path Length [m]

Rot

atio

n E

rror

[de

g/m

]

0 200 400 600 8000

0.01

0.02

0.03

0.04VISO2-S/200

SSLAM/500

SSLAM/15

SSLAM/3

SSLAM /500

Different versions of SSLAM have been com-pared with VISO2-S [1], which implements a sim-ilar loop chain matching scheme and RANSACpose estimation on the first 11 sequences of theKITTI vision benchmark suite [2], which provideground-truth data. SSLAM? refers to SSLAMwithout the adaptive frame discard detection, thenumber of RANSAC iterations are also reported.

• Both the different versions of the proposedmethod provide results better than VISO2-S ac-cording to the average translation and rotationerrors for increasing path length and speed.

• The chain loop matching scheme together withthe chosen keypoint detector and descriptor isrobust even for long paths, without requiringbundle adjustment or loop closure detection.

• Dropping low temporal flow disparity framesin SSLAM improves on the standard pose esti-mation used in SSLAM?, allowing the trackingof longer paths.

• Results for SSLAM/15 and SSLAM/500 areequivalent, while SSLAM/3 obtains inferior

results but similar to those obtained bySSLAM?/500, giving an evidence of the robust-ness of the methods.

• Estimated paths of the different SLAM methodon the KITTI dataset show that both SSLAMand SSLAM? paths are closer to the ground-truth than VISO2-S.

x [m]

z [

m]

-300 -200 -100 0 100 200 300

0

100

200

300

400

500

x [m]

z [m

]

-300 -200 -100 0 100 200 300 400

0

100

200

300

400

500VISO2-S/200

SSLAM/500

SSLAM/15

SSLAM/3

SSLAM /500

Ground-truth

robust selective stereo slam without loop closure and ...cvg.dsi.unifi.it/img/sslam-poster.pdf ·...

Documents