robust selective stereo slam without loop closure and ...cvg.dsi.unifi.it/img/sslam-poster.pdf ·...

1
Università degli Studi di Firenze COMPUTATIONAL VISION GROUP Robust Selective Stereo SLAM without Loop Closure and Bundle Adjustment Fabio Bellavia , Marco Fanfani , Fabio Pazzaglia and Carlo Colombo , University of Florence, Italy {bellavia.fabio, fabio.pazzaglia}@gmail.com, {marco.fanfani, carlo.colombo}@unifi.it K EYPOINTS DETECTION AND MATCHING The HarrisZ detector and the sGLOH descriptor are used to extract and match respectively sta- ble corner features. Spatial and temporal con- strains have been imposed to refine the candi- dates matches. Keypoint matches between the frame f i and f j must satisfy the spatial constraint imposed by the epipolar rectification (yellow band) as well as the temporal flow motion restriction (orange cone). Furthermore, the four matching points must form the loop chain C (dotted line) C = ( (x l i , x r i ), (x l i , x l j ), (x l j , x r j ), (x r i , x r j ) ) Finally, candidate matches are refined by run- ning four distinct RANSAC and a subset of chain matches C i,j ⊆ {C} is selected. Left Right Time In the ideal case, points x l j , x r j in frame f j must coincide with the projections e x l j , e x r j of points x l i , x r i in f i obtained by triangulation of X i,j in order for the chain C to be consistent with the pose P i,j . Due to data noise, in the real case the distances k e x l j - x l j k and k e x r j - x r j k are minimized by RANSAC on matches C i,j . P OSE ESTIMATION CONSTRAINED BY TEMPORAL FLOW The uncertainty of matches in the image planes is lower bounded by the image resolution (red) and is propagated to the 3D points. For the same resolution limits, 3D point locations can assume an higher range X i,j of values (dark gray quadrilateral) when estimated by close frames f i and f j , while the possible locations X i,w are more circumscribed (blue quadrilat- eral) in the case of distant frames f i and f w . An adaptive threshold δ m is defined on the pre- vious observation to discard bad frames con- taining slow motion information. The frame f j is discarded and the next frame f j +1 is tested if it contains a high percentage of fixed point matches, i.e. for which the distance on the im- age is below a threshold δ f . Examples of good frames are presented in the figures below, where good fixed and unfixed matches are shown in blue and light violet, re- spectively, while wrong correspondences are reported in cyan. Finally, a pose smoothing constraint between frames is introduced, so that the current rela- tive pose estimation P i,j cannot abruptly vary from the previous one. This is achieved by im- posing that the relative rotation around the ori- gin between the two incremental rotations R z,i and R i,j is bounded, as well as for the corre- sponding translation directions t z,i and t i,j . This last constraint can resolve issues in the case of no camera movement or when mov- ing objects crossing the camera path cover the scene. SSLAM Our Selective SLAM (SSLAM) framework com- bines a robust loop chain matching scheme for tracking keypoints with an effective frame se- lection strategy. SSLAM relies on the observation that the pose estimation errors propagate from the uncertainty of the 3D points, higher for dis- tant points corresponding to matches with low temporal flow disparity. SSLAM does not require any loop closure or bundle adjustment. SSLAM alternates between keypoint match- ing between consecutive SLAM frames and the estimation of the relative camera poses. R EFERENCES 1. A. Geiger et al., “StereoScan: Dense 3D reconstruc- tion in real-time”, IEEE Intelligent Vehicles Sympo- sium (2011). 2. A. Geiger et al., “Are we ready for autonomous driving? The KITTI vision benchmark suite”, Com- puter Vision and Pattern Recognition (2012). F UNDING This work has been carried out during the THE- SAURUS project, founded by Regione Toscana (Italy) in the framework of the “FAS” program 2007- 2013. R ESULTS Path Length [m] Translation Error [%] 0 200 400 600 800 1 1.52 2.53 3.5 Speed [km/h] Rotation Error [deg/m] 0 50 100 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 10 Speed [km/h] Translation Error [%] 8 6 4 2 0 0 50 100 Path Length [m] Rotation Error [deg/m] 0 200 400 600 800 0 0.01 0.02 0.03 0.04 VISO2-S/200 SSLAM/500 SSLAM/15 SSLAM/3 SSLAM /500 Different versions of SSLAM have been com- pared with VISO2-S [1], which implements a sim- ilar loop chain matching scheme and RANSAC pose estimation on the first 11 sequences of the KITTI vision benchmark suite [2], which provide ground-truth data. SSLAM ? refers to SSLAM without the adaptive frame discard detection, the number of RANSAC iterations are also reported. Both the different versions of the proposed method provide results better than VISO2-S ac- cording to the average translation and rotation errors for increasing path length and speed. The chain loop matching scheme together with the chosen keypoint detector and descriptor is robust even for long paths, without requiring bundle adjustment or loop closure detection. Dropping low temporal flow disparity frames in SSLAM improves on the standard pose esti- mation used in SSLAM ? , allowing the tracking of longer paths. Results for SSLAM/15 and SSLAM/500 are equivalent, while SSLAM/3 obtains inferior results but similar to those obtained by SSLAM ? /500, giving an evidence of the robust- ness of the methods. Estimated paths of the different SLAM method on the KITTI dataset show that both SSLAM and SSLAM ? paths are closer to the ground- truth than VISO2-S. x [m] z [m] -300 -200 -100 0 100 200 300 0 100 200 300 400 500 x [m] z [m] -300 -200 -100 0 100 200 300 400 0 100 200 300 400 500 VISO2-S/200 SSLAM/500 SSLAM/15 SSLAM/3 SSLAM /500 Ground-truth

Upload: others

Post on 06-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Robust Selective Stereo SLAM without Loop Closure and ...cvg.dsi.unifi.it/img/SSLAM-poster.pdf · Università degli Studi di Firenze COMPUTATIONAL VISION GROUP Robust Selective Stereo

Università degli Studi di Firenze

COMPUTATIONAL VISION GROUP

Robust Selective Stereo SLAM without Loop Closure and Bundle Adjustment

Fabio Bellavia†, Marco Fanfani‡, Fabio Pazzaglia† and Carlo Colombo‡, University of Florence, Italy†{bellavia.fabio, fabio.pazzaglia}@gmail.com, ‡{marco.fanfani, carlo.colombo}@unifi.it

KEYPOINTS DETECTION AND MATCHING

The HarrisZ detector and the sGLOH descriptorare used to extract and match respectively sta-ble corner features. Spatial and temporal con-strains have been imposed to refine the candi-dates matches.

• Keypoint matches between the frame fi andfj must satisfy the spatial constraint imposedby the epipolar rectification (yellow band) aswell as the temporal flow motion restriction(orange cone).

• Furthermore, the four matching points mustform the loop chain C (dotted line)

C =((xl

i,xri ), (x

li,x

lj), (x

lj ,x

rj), (x

ri ,x

rj))

• Finally, candidate matches are refined by run-ning four distinct RANSAC and a subset ofchain matches Ci,j ⊆ {C} is selected.

Left

Right

Time

• In the ideal case, points xlj , xr

j in frame fj mustcoincide with the projections x̃l

j , x̃rj of points

xli, x

ri in fi obtained by triangulation of Xi,j in

order for the chain C to be consistent with thepose Pi,j . Due to data noise, in the real casethe distances ‖ x̃l

j − xlj ‖ and ‖ x̃r

j − xrj ‖ are

minimized by RANSAC on matches Ci,j .

POSE ESTIMATION CONSTRAINED BY TEMPORAL FLOW

• The uncertainty of matches in the image planesis lower bounded by the image resolution (red)and is propagated to the 3D points. For thesame resolution limits, 3D point locations canassume an higher range Xi,j of values (darkgray quadrilateral) when estimated by closeframes fi and fj , while the possible locationsXi,w are more circumscribed (blue quadrilat-eral) in the case of distant frames fi and fw.

• An adaptive threshold δm is defined on the pre-vious observation to discard bad frames con-taining slow motion information. The frame fjis discarded and the next frame fj+1 is testedif it contains a high percentage of fixed pointmatches, i.e. for which the distance on the im-age is below a threshold δf .

• Examples of good frames are presented in thefigures below, where good fixed and unfixedmatches are shown in blue and light violet, re-spectively, while wrong correspondences arereported in cyan.

• Finally, a pose smoothing constraint betweenframes is introduced, so that the current rela-tive pose estimation Pi,j cannot abruptly varyfrom the previous one. This is achieved by im-posing that the relative rotation around the ori-gin between the two incremental rotations Rz,i

and Ri,j is bounded, as well as for the corre-sponding translation directions tz,i and ti,j .

• This last constraint can resolve issues in thecase of no camera movement or when mov-ing objects crossing the camera path cover thescene.

SSLAM

Our Selective SLAM (SSLAM) framework com-bines a robust loop chain matching scheme fortracking keypoints with an effective frame se-lection strategy.

• SSLAM relies on the observation that thepose estimation errors propagate from theuncertainty of the 3D points, higher for dis-tant points corresponding to matches withlow temporal flow disparity.

• SSLAM does not require any loop closure orbundle adjustment.

• SSLAM alternates between keypoint match-ing between consecutive SLAM frames andthe estimation of the relative camera poses.

REFERENCES

1. A. Geiger et al., “StereoScan: Dense 3D reconstruc-tion in real-time”, IEEE Intelligent Vehicles Sympo-sium (2011).

2. A. Geiger et al., “Are we ready for autonomousdriving? The KITTI vision benchmark suite”, Com-puter Vision and Pattern Recognition (2012).

FUNDINGThis work has been carried out during the THE-SAURUS project, founded by Regione Toscana(Italy) in the framework of the “FAS” program 2007-2013.

RESULTS

Path Length [m]

Tra

nsla

tion

Err

or [

%]

0 200 400 600 8001

1.52

2.53

3.5

Speed [km/h]

Rot

atio

n E

rror

[de

g/m

]

0 50 1000

0.01

0.02

0.03

0.04

0.05

0.06

0.07 10

Speed [km/h]

Tra

nsla

tion

Err

or [

%] 8

6

4

2

00 50 100

Path Length [m]

Rot

atio

n E

rror

[de

g/m

]

0 200 400 600 8000

0.01

0.02

0.03

0.04VISO2-S/200

SSLAM/500

SSLAM/15

SSLAM/3

SSLAM /500

Different versions of SSLAM have been com-pared with VISO2-S [1], which implements a sim-ilar loop chain matching scheme and RANSACpose estimation on the first 11 sequences of theKITTI vision benchmark suite [2], which provideground-truth data. SSLAM? refers to SSLAMwithout the adaptive frame discard detection, thenumber of RANSAC iterations are also reported.

• Both the different versions of the proposedmethod provide results better than VISO2-S ac-cording to the average translation and rotationerrors for increasing path length and speed.

• The chain loop matching scheme together withthe chosen keypoint detector and descriptor isrobust even for long paths, without requiringbundle adjustment or loop closure detection.

• Dropping low temporal flow disparity framesin SSLAM improves on the standard pose esti-mation used in SSLAM?, allowing the trackingof longer paths.

• Results for SSLAM/15 and SSLAM/500 areequivalent, while SSLAM/3 obtains inferior

results but similar to those obtained bySSLAM?/500, giving an evidence of the robust-ness of the methods.

• Estimated paths of the different SLAM methodon the KITTI dataset show that both SSLAMand SSLAM? paths are closer to the ground-truth than VISO2-S.

x [m]

z [

m]

-300 -200 -100 0 100 200 300

0

100

200

300

400

500

x [m]

z [m

]

-300 -200 -100 0 100 200 300 400

0

100

200

300

400

500VISO2-S/200

SSLAM/500

SSLAM/15

SSLAM/3

SSLAM /500

Ground-truth