
Robust Point Cloud Based Reconstruction of Large-Scale Outdoor Scenes

Ziquan Lan Zi Jian Yew Gim Hee Lee

Department of Computer Science, National University of Singapore

{ziquan, zijian.yew, gimhee.lee}@comp.nus.edu.sg

Abstract

Outlier feature matches and loop-closures that survived front-end data association can lead to catastrophic failures in the back-end optimization of large-scale point cloud based 3D reconstruction. To alleviate this problem, we propose a probabilistic approach for robust back-end optimization in the presence of outliers. More specifically, we model the problem as a Bayesian network and solve it using the Expectation-Maximization algorithm. Our approach leverages on a long-tail Cauchy distribution to suppress outlier feature matches in the odometry constraints, and a Cauchy-Uniform mixture model with a set of binary latent variables to simultaneously suppress outlier loop-closure constraints and outlier feature matches in the inlier loop-closure constraints. Furthermore, we show that by using a Gaussian-Uniform mixture model, our approach degenerates to the formulation of a state-of-the-art approach for robust indoor reconstruction. Experimental results demonstrate that our approach has comparable performance with the state-of-the-art on a benchmark indoor dataset, and outperforms it on a large-scale outdoor dataset. Our source code can be found on the project website¹.

1. Introduction

Point cloud reconstruction of outdoor scenes has many important applications such as 3D architectural modeling, terrestrial surveying, Simultaneous Localization and Mapping (SLAM) for autonomous vehicles, etc. Compared to images, point clouds from 3D scanners exhibit less variation under different weather or lighting conditions, e.g., summer and winter (Fig. 1), or day and night (Fig. 5). Furthermore, the depths of point clouds from 3D scanners are more accurate than those of image-based reconstructions. Consequently, point clouds from 3D scanners are preferred for large-scale outdoor 3D reconstructions.

¹ https://github.com/ziquan111/RobustPCLReconstruction

[Figure 1. Reconstruction of a 1 km route traversed in two different seasons: summer (orange) and winter (blue). The outlier (red links) loop-closures significantly outnumber the inliers (green links). The four zoomed-in point clouds on the right are reconstructed by different methods: Odometry, Choi et al. [5] with identity covariance, Choi et al. [5], and Ours (Sec. 4).]

Most existing methods for 3D reconstruction follow a two-step approach: a front-end data association step and a back-end optimization step. More specifically, data association is used to establish feature matches [30] in point cloud fragments for registration, and loop-closures [26] between point cloud fragments for pose-graph [21] optimization. Unfortunately, no existing algorithm for feature matching and loop-closure detection guarantees complete elimination of outliers. Although outlier feature matches are usually handled with RANSAC-based geometric verification [16, 30], such pairwise checks do not consider global consistency. In addition, the numerous efforts on improving the accuracy of loop-closure detection [6, 8, 20, 26] are not completely free from false positives. Many back-end optimization algorithms [13, 14, 21] are based on non-linear least-squares that lack the robustness to cope with outliers. A small number of outliers can consequently lead to catastrophic failures in the 3D reconstructions. Several prior works focus on disabling outlier loop-closures in the back-end optimization [5, 15, 25]. However, these methods do not consider the effect of the outlier feature matches, with the exception of [34], which solves global geometric registration in a very small-scale problem setting.

The main contribution of this paper is a probabilistic approach for robust back-end optimization that handles outliers from a weak front-end data association in large-scale point cloud based reconstructions. Our approach simultaneously suppresses outlier feature matches and loop-closures. To this end, we model our robust point cloud reconstruction problem as a Bayesian network. The global poses of the point cloud fragments are the unknown parameters, and the odometry and loop-closure constraints are the observed variables. A binary latent variable is assigned to each loop-closure constraint; it determines whether a loop-closure constraint is an inlier or an outlier. We model the feature matches in the odometry constraints with a long-tail Cauchy distribution to gain robustness to outlier matches. Additionally, we use a Cauchy-Uniform mixture model for the loop-closure constraints, where the Uniform and Cauchy distributions model the outlier loop-closures and the feature matches in the inlier loop-closures, respectively. In contrast to many existing back-end optimizers that use rigid transformations as the odometry and loop-closure constraints [5, 14, 15, 21, 25], we use the distances between feature matches to exert direct influence on these matches.

We use the Expectation-Maximization (EM) algorithm [3, 15] to find the globally consistent poses of the point cloud fragments (Sec. 4). The EM algorithm iterates between the Expectation and Maximization steps. In the Expectation step, the posterior of a loop-closure constraint being an inlier is updated. In the Maximization step, a locally optimal solution for the global poses is found by maximizing the expected complete data log-likelihood over the posterior from the Expectation step. We also generalize our approach to solve reconstruction problems with an easier setting (Sec. 5), in which a strong assumption is imposed: the odometry and inlier loop-closure constraints are free from outlier feature matches. We show that by using a Gaussian-Uniform mixture model, our approach degenerates to the formulation of a state-of-the-art approach for robust indoor reconstruction [5]. Fig. 1 shows an example of the reconstruction result of our method compared to other methods in the presence of outliers.

2. Related Work

Reconstruction of outdoor scenes has been studied in [22, 23]. Schops et al. [23] propose a set of filtering steps to detect and discard unreliable depth measurements acquired from an RGB-D camera. However, loop-closures are not detected, and this can lead to reconstruction failures. Relying on very accurate GPS/INS, Pollefeys et al. [22] propose a 3D reconstruction system from RGB images. However, the GPS/INS signal may be unavailable or unreliable, especially on cloudy days or in urban canyons. Our work relies on neither GPS/INS nor RGB images. In contrast, we focus on reconstruction from point cloud data acquired from 3D scanners, which is less sensitive to weather or lighting changes. There are also many works on indoor scene reconstruction. Since the seminal KinectFusion [18], there have been several follow-up algorithms [4, 19, 27]. Unfortunately, these methods do not detect loop-closures. Nonetheless, there are many RGB-D reconstruction methods with loop-closure detection [5, 7, 10, 11, 24, 28, 31, 32, 33].

Choi et al. [5] achieve the state-of-the-art performance for indoor reconstruction with robust loop-closure. However, they assume no outlier feature matches in the odometry and inlier loop-closure constraints. We relax this assumption to achieve robust feature matching. More specifically, [5] estimates a switch variable [25] for each loop-closure constraint using line processes [2]. Outlier loop-closures are disabled by setting the respective switch variables to zero. Additional switch prior terms are imposed and chosen empirically [25] to prevent the trivial solution of removing all loop-closure constraints. In comparison, our approach does not require the additional prior terms. We estimate the posterior of a loop-closure being an inlier constraint in the Expectation step, as shown in Sec. 4. The EM approach is also used by Lee et al. [15]. However, they solve a robust pose-graph optimization problem without coping with the feature matches for reconstruction.

3. Overview

In this section, we provide an overview of our reconstruction pipeline, which consists of four main components: point cloud fragment construction, point cloud registration, loop-closure detection, and robust reconstruction with EM.

Point cloud fragment construction. A single scan from a 3D scanner, e.g., LiDAR, contains a limited number of points. We integrate multiple consecutive scans with odometry readings obtained from dead reckoning, e.g., the Inertial Navigation System (INS) [17], to form local point cloud fragments. A set of 3D features is then extracted from each point cloud fragment using [30].

Point cloud registration. The top $k_1$ feature matches between two consecutive point cloud fragments $F_i$ and $F_{i+1}$ are retained as the odometry constraint $X_{i,i+1}$. Since consecutive fragments overlap sufficiently by construction [17, 30], we define $X_{i,i+1}$ as a reliable constraint, but note that it can contain outlier feature matches.
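The paper does not spell out how the top $k_1$ matches are selected; the following is a minimal sketch assuming matches are ranked by descriptor distance between the 3D features of [30]. The function name top_k_matches, the array layout, and the default $k_1 = 200$ (the value quoted for the outdoor experiments in Sec. 6.2) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def top_k_matches(desc_i, desc_j, k=200):
    """Illustrative sketch (not the authors' code): keep the k feature
    matches with the smallest descriptor distance between two fragments.

    desc_i, desc_j : (Ni, D) and (Nj, D) arrays of feature descriptors.
    Returns (k, 2) index pairs into the two fragments.
    """
    # Pairwise Euclidean distances between descriptors (fine for a sketch,
    # but memory-heavy for very large fragments).
    d = np.linalg.norm(desc_i[:, None, :] - desc_j[None, :, :], axis=-1)
    # Nearest neighbour in fragment j for every feature in fragment i.
    nn_j = d.argmin(axis=1)
    nn_dist = d[np.arange(len(desc_i)), nn_j]
    # Retain the k matches with the smallest descriptor distance.
    best_i = np.argsort(nn_dist)[:k]
    return np.stack([best_i, nn_j[best_i]], axis=1)
```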

Loop-closure detection. It is inefficient to perform an exhaustive pairwise registration for large-scale outdoor scenes with many point cloud fragments. Hence, we perform point cloud based place-recognition [26] to identify a set of candidate loop-closures. We retain the top $k_2$ potential loop-closures for each fragment and remove the duplicates. For each loop-closure between fragments $F_i$ and $F_j$, we keep the set of top $k_1$ feature matches denoted as $Y_{ij}$. We define $Y_{ij}$ as a loop-closure constraint, which can either be an inlier or an outlier. Similar to the odometry constraint, an inlier loop-closure $Y_{ij}$ can also contain outlier feature matches.

Robust reconstruction with EM. The constraints from point cloud registration and loop-closure detection can contain outliers. In particular, both odometry and loop-closure constraints can contain outlier feature matches. Moreover, many detected loop-closures are false positives. In the next section, we describe our probabilistic modeling approach to simultaneously suppress outlier feature matches and false loop-closures. The EM algorithm is used to solve for the globally consistent fragment poses. Optional refinement using ICP can be applied to further improve the global point cloud registration.

4. Robust Reconstruction with EM

We model the robust reconstruction problem as a Bayesian network, shown in Fig. 2. Let $T = [T_1, ..., T_i, ..., T_N]^\top$, where $T_i \in SE(3)$, denote the $N$ fragment poses, $X = [X_{12}, ..., X_{i,i+1}, ..., X_{N-1,N}]^\top$ denote the $N-1$ odometry constraints obtained from point cloud registration, and $Y = [..., Y_{ij}, ...]^\top$ denote the $M$ loop-closure constraints obtained from loop-closure detection. We explicitly assign the loop-closure constraints into two clusters that represent the inliers and outliers. For each loop-closure constraint $Y_{ij}$, we introduce a corresponding assignment variable $Z_{ij} = [Z_{ij,in}, Z_{ij,out}]^\top \in \{[1, 0]^\top, [0, 1]^\top\}$. $Z_{ij}$ is a one-hot vector: $Z_{ij,in} = 1$ and $Z_{ij,out} = 1$ assign $Y_{ij}$ as an inlier and an outlier loop-closure constraint, respectively. We use $Z = [..., Z_{ij}, ...]^\top$ to denote the assignment variables. $T$ is the unknown parameter, $Z$ is the latent variable, and $X$ and $Y$ are both observed variables.

Robust reconstruction can be solved by finding the Maximum a Posteriori (MAP) solution of $p(T|X, Y)$. However, the MAP solution involves an intractable step of marginalization over the latent variable $Z$. We circumvent this problem by using the EM algorithm, which maximizes the expected complete data log-likelihood over the posterior of the latent variables. The EM algorithm iterates between the Expectation and Maximization steps. In the Expectation step, we use $T^{old}$, i.e., the fragment poses solved from the previous iteration, to find the posterior distribution of the latent variable $Z$,
$$p(Z|Y, T^{old}) = \frac{p(Y|Z, T^{old})\, p(Z|T^{old})}{p(Y|T^{old})}, \tag{1}$$
in which $Z$ does not depend on $X$, since they are conditionally independent given $Y$ according to the Bayesian network in Fig. 2.

In the Maximization step, the posterior distribution (Eq. (1)) is used to update $T$ by maximizing the expectation of the complete data log-likelihood denoted by
$$Q_{EM} := \sum_Z p(Z|Y, T^{old}) \ln p(X, Y, Z|T) \tag{2}$$
$$= \underbrace{\ln p(X|T)}_{Q_X} + \underbrace{\sum_Z p(Z|Y, T^{old}) \ln p(Y, Z|T)}_{Q_Y}.$$
We define $Q_X$ as the term with the odometry constraints, and $Q_Y$ as the term with the loop-closure constraints.

[Figure 2. Bayesian network representation of the robust reconstruction problem. $T_{i+1}$, $T_i$ and $T_j$ are fragment poses. $X_{i,i+1}$ is an odometry constraint. $Y_{ij}$ is a loop-closure constraint. $Z_{ij}$ is an assignment variable. $N-1$ and $M$ indicate the numbers of odometry constraints and loop-closure constraints, respectively.]

Initialization. The unknown parameters, i.e., the global poses $T$ of the $N$ fragments, are initialized with the relative poses computed from the odometry constraints $X$ using ICP. Other dead reckoning methods such as wheel odometry and/or INS readings can also be used.
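As a summary of the procedure above, a minimal sketch of the EM loop follows. The helper callables stand in for the Expectation step (Sec. 4.3) and the Maximization step (Sec. 4.4); the names and structure are illustrative, not the authors' implementation, and a convergence check could replace the fixed iteration count.

```python
def run_em(T_init, X, Y, e_step, m_step, n_iters=10):
    """Skeleton of the EM iteration in Sec. 4 (illustrative only).

    T_init : fragment poses initialized from the odometry constraints
    X, Y   : odometry and loop-closure constraints (sets of feature matches)
    e_step : callable(T, X, Y) -> inlier posteriors P_in of Eq. (10),
             with Theta learned from the odometry constraints via Eq. (13)
    m_step : callable(T, X, Y, P_in) -> poses minimizing Eq. (19)
    """
    T = T_init
    for _ in range(n_iters):
        P_in = e_step(T, X, Y)       # Expectation: update inlier posteriors
        T = m_step(T, X, Y, P_in)    # Maximization: update fragment poses
    return T
```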

4.1. Modeling Odometry Constraints

Odometry constraints are obtained from point cloud registration between two consecutive point cloud fragments. Recall that an odometry constraint $X_{i,i+1}$ is a set of feature matches between fragments $F_i$ and $F_{i+1}$, which can contain outlier matches. To gain robustness, we model each feature match $(\mathbf{p}, \mathbf{q}) \in X_{i,i+1}$ with a long-tail multivariate Cauchy distribution. Assuming these feature matches are independent and identically distributed (i.i.d.), we take the geometric mean of their product to get
$$p(X_{i,i+1}|T) = \Bigg( \prod_{(\mathbf{p},\mathbf{q}) \in X_{i,i+1}} \mathrm{Cauchy}_{i,i+1}(\mathbf{p},\mathbf{q}) \Bigg)^{\frac{1}{|X_{i,i+1}|}}, \tag{3}$$
where
$$\mathrm{Cauchy}_{i,i+1}(\mathbf{p},\mathbf{q}) = \frac{1}{\pi^2 \sqrt{\det\Sigma}\,\big(1 + d_\Sigma^2(T_i\mathbf{p}, T_{i+1}\mathbf{q})\big)^2}, \tag{4}$$
in which we assume an isotropic covariance $\Sigma = \sigma^2 I$ with scale $\sigma$, and $d_\Sigma$ denotes the Mahalanobis distance such that
$$d_\Sigma^2(T_i\mathbf{p}, T_{i+1}\mathbf{q}) = (T_i\mathbf{p} - T_{i+1}\mathbf{q})^\top \Sigma^{-1} (T_i\mathbf{p} - T_{i+1}\mathbf{q}). \tag{5}$$
The value of $\sigma$ is set based on the density of extracted features. For example, $\sigma = 0.5\,$m in the outdoor dataset.
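For concreteness, a small numpy sketch of the averaged Cauchy log term of one constraint is given below; this quantity reappears as $A_{ij}$ in Eq. (8) and $A_{i,i+1}$ in Eq. (12). The function name and array layout are our own choices, not the authors' code.

```python
import numpy as np

def cauchy_error(T_i, T_j, matches, sigma=0.5):
    """Averaged log term (1/|X|) * sum log(1 + ||T_i p - T_j q||^2 / sigma^2)
    of one constraint; sigma = 0.5 m is the value quoted for the outdoor data.

    T_i, T_j : (4, 4) homogeneous fragment poses
    matches  : (K, 6) array, each row is a match [p_xyz, q_xyz] in fragment frames
    """
    p = matches[:, :3] @ T_i[:3, :3].T + T_i[:3, 3]   # T_i p
    q = matches[:, 3:] @ T_j[:3, :3].T + T_j[:3, 3]   # T_j q
    sq = np.sum((p - q) ** 2, axis=1)                 # squared match distances
    return np.mean(np.log1p(sq / sigma ** 2))

# Up to an additive constant, the log of Eq. (3) is then
# ln p(X_{i,i+1} | T) = -ln(pi^2 sigma^3) - 2 * cauchy_error(T_i, T_j, matches).
```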

4.2. Modeling Loop-Closure Constraints

A loop-closure constraint $Y_{ij}$ is the set of feature matches between fragments $F_i$ and $F_j$. We propose to use a Cauchy-Uniform mixture model to cope with (1) the outlier loop-closure constraints and (2) the outlier feature matches in the inlier loop-closure constraints.

To distinguish between inlier and outlier loop-closures, we model the distribution of the assignment variable $Z_{ij}$ as a Bernoulli distribution defined by the inlier probability $\lambda \in [0, 1]$,
$$p(Z_{ij}) = \lambda^{Z_{ij,in}} (1-\lambda)^{Z_{ij,out}}. \tag{6}$$
Next, we use the Cauchy and Uniform distributions to model the inlier and outlier loop-closure constraints, respectively.

Cauchy distribution – inlier loop-closure constraints. The inlier loop-closure constraints can contain outlier feature matches. We use the same multivariate Cauchy distribution as in Eq. (3) and further reorganize the terms. We define $C_{ij}$ for brevity, such that
$$C_{ij} := p(Y_{ij}|Z_{ij,in}=1, T) = \pi^{-2}\sigma^{-3} e^{-2A_{ij}}, \tag{7}$$
in which
$$A_{ij} = \frac{1}{|Y_{ij}|} \sum_{(\mathbf{p},\mathbf{q})\in Y_{ij}} \ln\bigg(1 + \frac{\|T_i\mathbf{p} - T_j\mathbf{q}\|^2}{\sigma^2}\bigg), \tag{8}$$
and $|Y_{ij}|$ denotes the number of feature matches in $Y_{ij}$.

Uniform distribution – outlier loop-closure constraints. We model the outlier loop-closure constraints with a uniform distribution defined by a constant probability $u \in (0, 1)$,
$$p(Y_{ij}|Z_{ij,out}=1, T) = u. \tag{9}$$

4.3. Expectation Step

Recall that the Expectation step is evaluated using Eq. (1). Plugging Eq. (6), (7) and (9) into the Bayes' formula, we obtain the posterior of being an inlier loop-closure constraint,
$$P^{in}_{ij} := p(Z_{ij,in}=1|Y, T) = \frac{\Theta}{\Theta + e^{2A_{ij}}}, \tag{10}$$
where
$$\Theta = \frac{\lambda}{(1-\lambda)\, u\, \pi^2 \sigma^3}. \tag{11}$$
The constant $\Theta$ consists of two distribution parameters: $\lambda$ is the probability of being an inlier loop-closure, and $u$ is the constant probability of uniformly sampling a random loop-closure; both are difficult to set manually for different datasets. Hence, we propose to estimate $\Theta$ from the input data. More specifically, we learn $\Theta$ from the odometry constraints, since all odometry constraints are effectively inlier loop-closure constraints.

The process to learn $\Theta$ is as follows. First, for each odometry constraint $X_{i,i+1}$, we denote its corresponding error term as $m_{i,i+1} = e^{2A_{i,i+1}}$ (analogous to Eq. (10)), where
$$A_{i,i+1} = \frac{1}{|X_{i,i+1}|} \sum_{(\mathbf{p},\mathbf{q})\in X_{i,i+1}} \ln\bigg(1 + \frac{\|T_i\mathbf{p} - T_{i+1}\mathbf{q}\|^2}{\sigma^2}\bigg). \tag{12}$$
Next, we compute the median error, denoted as $\bar{m}$. Since we regard all odometry constraints as inlier loop-closure constraints, we let
$$\frac{\Theta}{\Theta + \bar{m}} = p, \tag{13}$$
where we set $p = 90\%$, meaning that a loop-closure $Y_{ij}$ with a small error ($e^{2A_{ij}} < \bar{m}$) is very likely to be an inlier ($P^{in}_{ij} > p$). Finally, we solve for $\Theta$ using Eq. (13).
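A minimal sketch of this Expectation step is given below, assuming the per-constraint errors $A$ have already been computed as in the earlier cauchy_error sketch; the function names are illustrative, not the authors' code.

```python
import numpy as np

def learn_theta(A_odom, p=0.9):
    """Learn Theta from the odometry constraints (Eq. (13)).

    A_odom : array of A_{i,i+1} values (Eq. (12)), one per odometry constraint
    """
    m_bar = np.median(np.exp(2.0 * np.asarray(A_odom)))   # median of m = e^{2A}
    return p * m_bar / (1.0 - p)                           # Theta / (Theta + m_bar) = p

def posterior_inlier(A_loop, theta):
    """Posterior of each loop-closure being an inlier (Eq. (10))."""
    return theta / (theta + np.exp(2.0 * np.asarray(A_loop)))
```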

4.4. Maximization Step

In the Maximization step, we solve for the $T$ that maximizes $Q_{EM} = Q_X + Q_Y$, where $Q_X$ and $Q_Y$ are the shorthand notations defined in Eq. (2). These two terms are evaluated independently, and then optimized jointly.

Evaluate $Q_X$. Assuming the odometry constraints in $X$ are i.i.d., the joint probability of all odometry constraints is given by
$$p(X|T) = \prod_{i=1}^{N-1} p(X_{i,i+1}|T). \tag{14}$$
Substituting the joint probability of the feature matches within each odometry constraint (Eq. (3)), we can rewrite $Q_X$ as
$$Q_X = -2\sum_{i=1}^{N-1} A_{i,i+1} + \mathrm{const}. \tag{15}$$

Evaluate $Q_Y$. Using the product rule, the joint probability of the loop-closure constraints and their corresponding assignment variables can be written as $p(Y, Z|T) = p(Z)\, p(Y|Z, T)$. Plugging Eq. (6), (7) and (9) in, we have
$$p(Y, Z|T) = \prod_{i,j} (\lambda\, C_{ij})^{Z_{ij,in}}\, \big((1-\lambda)\, u\big)^{Z_{ij,out}}. \tag{16}$$
With the joint probability from Eq. (16) and the posterior from Eq. (10), we can rewrite $Q_Y$ as
$$Q_Y = \sum_{i,j} P^{in}_{ij} \ln C_{ij} + \mathrm{const}, \tag{17}$$
which can be further expanded to
$$Q_Y = -2\sum_{i,j} P^{in}_{ij}\, A_{ij} + \mathrm{const}. \tag{18}$$

Maximize $Q_{EM}$. The maximization of $Q_X + Q_Y$ can be reformulated into a non-linear least-squares problem with the following objective function
$$\underset{T}{\arg\min}\; \sum_{i,j} \frac{P^{in}_{ij}}{|Y_{ij}|} \sum_{(\mathbf{p},\mathbf{q})\in Y_{ij}} \ln\bigg(1+\frac{\|T_i\mathbf{p}-T_j\mathbf{q}\|^2}{\sigma^2}\bigg) + \sum_{i=1}^{N-1} \frac{1}{|X_{i,i+1}|} \sum_{(\mathbf{p},\mathbf{q})\in X_{i,i+1}} \ln\bigg(1+\frac{\|T_i\mathbf{p}-T_{i+1}\mathbf{q}\|^2}{\sigma^2}\bigg), \tag{19}$$
which can be easily optimized using the sparse Cholesky solver in Google Ceres [1]. The computational complexity is cubic in the total number of feature matches.
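The sketch below only evaluates the objective of Eq. (19) for fixed poses, reusing the averaged log term from the earlier cauchy_error sketch; the actual minimization over $T \in SE(3)$ is done with Ceres in the paper, and a Python version would additionally need a pose parameterization (e.g., a local se(3) increment per fragment) on top of this evaluation. The data structures and names are our own assumptions.

```python
import numpy as np

def log_term(T_i, T_j, matches, sigma=0.5):
    """(1/|matches|) * sum log(1 + ||T_i p - T_j q||^2 / sigma^2)."""
    p = matches[:, :3] @ T_i[:3, :3].T + T_i[:3, 3]
    q = matches[:, 3:] @ T_j[:3, :3].T + T_j[:3, 3]
    return np.mean(np.log1p(np.sum((p - q) ** 2, axis=1) / sigma ** 2))

def em_objective(T, odom, loops, P_in, sigma=0.5):
    """Objective of Eq. (19) evaluated at fixed poses T.

    T     : dict fragment index -> (4, 4) pose
    odom  : list of (i, matches) for the constraints X_{i,i+1}
    loops : dict (i, j) -> matches for the constraints Y_{ij}
    P_in  : dict (i, j) -> inlier posterior from Eq. (10)
    """
    cost = sum(log_term(T[i], T[i + 1], m, sigma) for i, m in odom)
    cost += sum(P_in[(i, j)] * log_term(T[i], T[j], m, sigma)
                for (i, j), m in loops.items())
    return cost
```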

5. Generalization using EM

In the previous section, we solved the problem in which the constraints are contaminated with outlier feature matches. In this section, we study a problem with an easier setting where the correct loop-closure constraints contain no outlier feature matches. Recall that the long-tail multivariate Cauchy distribution is used to gain robustness against outlier feature matches. We replace the multivariate Cauchy distribution with a multivariate Gaussian distribution for the easier problem without outlier feature matches, and show that our EM formulation degenerates to the formulation of a state-of-the-art approach for robust indoor reconstruction [5]. To avoid repetition, we only highlight the major differences from the previous section. Each analogous term is augmented with a superscript $G$ that stands for "Gaussian".

Odometry constraints. Replacing the multivariate Cauchy distribution in Eq. (3) with a multivariate Gaussian distribution, we have
$$p^G(X_{i,i+1}|T) = \Bigg(\prod_{(\mathbf{p},\mathbf{q})\in X_{i,i+1}} \mathrm{Gauss}_{i,i+1}(\mathbf{p},\mathbf{q})\Bigg)^{\frac{1}{|X_{i,i+1}|}}, \tag{20}$$
where
$$\mathrm{Gauss}_{i,i+1}(\mathbf{p},\mathbf{q}) = \frac{\exp\big(-\tfrac{1}{2}\, d_\Sigma^2(T_i\mathbf{p}, T_{i+1}\mathbf{q})\big)}{\sqrt{(2\pi)^3 \det\Sigma}}, \tag{21}$$
and $\Sigma$ and $d_\Sigma$ remain unchanged.

Loop-closure constraints. We note that the Bernoulli distribution in Eq. (6) still holds, and the major changes start from Eq. (7). Using the multivariate Gaussian distribution, we have
$$G_{ij} := p^G(Y_{ij}|Z_{ij,in}=1, T) = (\sqrt{2\pi}\,\sigma)^{-3}\, e^{-\frac{B_{ij}}{2\sigma^2}}, \tag{22}$$
in which
$$B_{ij} = \frac{1}{|Y_{ij}|}\sum_{(\mathbf{p},\mathbf{q})\in Y_{ij}} \|T_i\mathbf{p} - T_j\mathbf{q}\|^2, \tag{23}$$
and $|Y_{ij}|$ is the number of feature matches. We note that $B_{ij}$ is a sum of squared errors that can lead to arithmetic overflow in the $e^{B_{ij}}$ term in the posterior of the latent variable $Z_{ij}$ (analogous to Eq. (10)). In contrast, there is no arithmetic overflow in the $e^{2A_{ij}}$ term in Eq. (10), since $A_{ij}$ from Eq. (8) is a sum of log errors. We propose to alleviate the arithmetic overflow problem by using a Pareto distribution that approximates $G_{ij}$ as
$$G_{ij} \approx \frac{x_0}{B_{ij}^2}, \tag{24}$$
where $x_0 > 0$ is a scale parameter. For outlier loop-closures, the uniform distribution in Eq. (9) still holds.

Expectation step. Using the approximation of $G_{ij}$ in Eq. (24), the posterior from Eq. (10) becomes
$$P^{in,G}_{ij} \approx \frac{\Theta^G}{\Theta^G + B_{ij}^2}, \tag{25}$$
where
$$\Theta^G = \frac{x_0\, \lambda}{(1-\lambda)\, u}. \tag{26}$$
It becomes apparent in $P^{in,G}_{ij}$ that the arithmetic overflow problem is alleviated by the replacement of $e^{B_{ij}}$ with $B_{ij}^2$.

In the previous section, $\Theta$ in Eq. (13) is learned from the median error $\bar{m}$ of all the error terms $m_{i,i+1} = e^{2A_{i,i+1}}$ in the odometry constraints. Unfortunately, the median error $\bar{m}^G$ from $m^G_{i,i+1} = B_{i,i+1}^2$ becomes uninformative because we assume no outlier feature matches, i.e., $\bar{m}^G \to 0$ since $m^G_{i,i+1} = B_{i,i+1}^2 \to 0$. Despite the absence of outlier feature matches, $\|T_i\mathbf{p} - T_j\mathbf{q}\|$ is upper bounded by some threshold $\epsilon$. Hence, the mean error term can be directly estimated from Eq. (23) as $\bar{m}^G = \epsilon^2$. Subsequently, let
$$\frac{\Theta^G}{\Theta^G + \bar{m}^G} = p, \tag{27}$$
where we set $p = 90\%$ and solve for $\Theta^G$. We set $\epsilon = 0.05\,$m for our experiments on the indoor dataset (see next section) based on the typical magnitude of sensor noise.
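A corresponding sketch for the Gaussian-Uniform variant is given below; it mirrors the earlier learn_theta/posterior sketch, with names of our own choosing. Note that working with $B_{ij}^2$ instead of an exponential of $B_{ij}$ keeps the posterior finite even for very large errors.

```python
def theta_gaussian(eps=0.05, p=0.9):
    """Estimate Theta^G (Eq. (27)) with the mean error taken as eps^2,
    where eps = 0.05 m is the sensor-noise bound used for the indoor dataset."""
    m_bar_g = eps ** 2
    return p * m_bar_g / (1.0 - p)      # solve Theta^G / (Theta^G + m_bar_g) = p

def posterior_inlier_gaussian(B_ij, theta_g):
    """Inlier posterior under the Pareto approximation (Eq. (25))."""
    return theta_g / (theta_g + B_ij ** 2)
```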


                                      Living room 1   Living room 2   Office 1   Office 2   Average
Before pruning        Recall (%)           61.2            49.7          64.4       61.5      59.2
                      Precision (%)        27.2            17.0          19.2       14.9      19.6
Choi et al. [5]       Recall (%)           57.6            49.7          63.3       60.7      57.8
(after pruning)       Precision (%)        95.1            97.4          98.3      100.0      97.7
Ours (Sec. 5)         Recall (%)           58.7            48.4          63.9       61.5      58.1
(after pruning)       Precision (%)        97.0            94.9          96.6       93.6      95.4

Table 1. Results of robust optimization on the indoor dataset. Our method shows comparable results with the state-of-the-art.

                     Living room 1   Living room 2   Office 1   Office 2   Average
Whelan et al. [27]        0.22            0.14          0.13       0.13      0.16
Kerl et al. [12]          0.21            0.06          0.11       0.10      0.12
SUN3D [29]                0.09            0.07          0.13       0.09      0.10
Choi et al. [5]           0.04            0.07          0.03       0.04      0.05
Ours (Sec. 5)             0.06            0.09          0.05       0.04      0.06
GT Trajectory             0.04            0.04          0.03       0.03      0.04

Table 2. Reconstruction accuracy on the indoor dataset. The entries are the mean distances of each model to its respective ground-truth surface (in meters). Our proposed method shows comparable results with the state-of-the-art and outperforms the rest.

Maximization step. Finally, we reformulate the maximization problem as a non-linear least-squares problem with the following objective function
$$\underset{T}{\arg\min}\; \sum_{i,j} \frac{P^{in,G}_{ij}}{|Y_{ij}|} \sum_{(\mathbf{p},\mathbf{q})\in Y_{ij}} \|T_i\mathbf{p}-T_j\mathbf{q}\|^2 + \sum_{i=1}^{N-1} \frac{1}{|X_{i,i+1}|} \sum_{(\mathbf{p},\mathbf{q})\in X_{i,i+1}} \|T_i\mathbf{p}-T_{i+1}\mathbf{q}\|^2, \tag{28}$$
which is similar to the formulation in [5] with two minor differences. First, we average the squared errors over the number of feature matches, but [5] does not. Second, we estimate the posterior $P^{in,G}_{ij}$ by iterating between the Expectation and Maximization steps, whereas [5] estimates it using line processes [2]. It is important to note that Eq. (28) is derived from the original Gaussian formulation in Eq. (22) instead of the Pareto approximation in Eq. (24).

6. Evaluation

We use the experimental results from two datasets for the comparison between our approach and the state-of-the-art approach [5]. The first dataset is from small-scale indoor scenes with no outlier feature matches in the odometry and inlier loop-closure constraints, and the second dataset is from large-scale outdoor scenes with outlier feature matches. Our Gaussian-Uniform EM (Sec. 5) and Cauchy-Uniform EM (Sec. 4) are evaluated on the small-scale indoor and large-scale outdoor datasets, respectively.

6.1. Small-Scale Indoor Scenes

The “Augmented ICL-NUIM Dataset”, provided and augmented by [9] and [5], respectively, is used as the small-scale indoor dataset. This dataset is generated from synthetic indoor environments and includes two models: a living room and an office. There are two RGB-D image sequences for each model, resulting in a total of four test cases. To ensure a fair comparison, we follow the same evaluation criteria and experimental settings as [5].

Results. Tab. 1 compares the average recall and precision of the loop-closures (1) before pruning, (2) for [5] after pruning, and (3) for our method after pruning. Here, “before pruning” refers to the loop-closures from the loop-closure detection, and “after pruning” refers to the inlier loop-closures after robust optimization. It can be seen that the average precision and recall of our method are comparable to [5]. This is an expected result since we showed in Sec. 5 that our method degenerates to the method in [5], with minor differences, in the absence of outlier feature matches. We further evaluate the reconstruction accuracy of the final model using the error metric proposed in [9], i.e., the mean distance of the reconstructed surfaces to the ground-truth surfaces. Tab. 2 compares the reconstruction accuracy of our method to other existing approaches. In addition, as suggested in [5], the reconstruction accuracy of the model obtained by fusing the input depth images with the ground-truth trajectory (denoted as GT Trajectory in Tab. 2) is reported for reference. As expected, our method shows comparable results with the state-of-the-art on the indoor dataset.

6.2. Large-Scale Outdoor Scenes

[Figure 3. Trajectories on the 1 km route. Each trajectory (blue) is overlaid with the GPS/INS trajectory (green). The red asterisk indicates the position of the 1st fragment pose. Panels: Pure odometry, Choi et al. [5] (identity covariance), Choi et al. [5], Ours (Sec. 4).]

[Figure 4. Trajectories on the city-scale route. Each trajectory (blue) is overlaid with the GPS/INS trajectory (green). A zoomed-in region is shown on the top right corner of each trajectory. The red asterisk indicates the position of the 1st fragment pose. Panels: Pure odometry, Choi et al. [5] (identity covariance), Choi et al. [5], Ours (Sec. 4).]

The large-scale outdoor dataset is based on the “Oxford Robotcar Dataset” [17]. It consists of 3D point clouds captured with a LiDAR sensor mounted on a car that repeatedly drives through Oxford, UK, at different times over a year. We select two different driving routes from the dataset: a short route (about 1 km) and a long route (city-scale). Furthermore, we take two traversals at different times for each route, resulting in four traversals in total. Unlike the synthetic indoor dataset, there is no ground truth of the surface geometry. We therefore evaluate the trajectory accuracy against the GPS/INS readings as an indirect measure of reconstruction accuracy. We prepare the dataset as follows:

• Point cloud fragments. We integrate the push-broom 2D LiDAR scans and their corresponding INS readings into 3D point clouds. We segment the data into fragments of 30 m radius at every 10 m interval. Each fragment is then downsampled using a VoxelGrid filter with a grid size of 0.2 m (a minimal downsampling sketch is given after this list). 242 and 1770 fragments are constructed for the 1 km route and the city-scale route, respectively.

• Odometry trajectory. The odometry trajectory is disconnected due to discontinuous INS data since we are combining two traversals. We simulate the odometry trajectory via geometric registrations between consecutive point cloud fragments, and manually identify one linkage transformation between the two traversals. We also check the entire odometry trajectory to ensure that there are no remaining erroneous transformations. The resulting odometry trajectory is used to initialize the fragment poses, $T$.

• Odometry constraints. For every two consecutive frames along the odometry trajectory, we perform point cloud registration as described in Sec. 3. Specifically, we extract 1024 features for each fragment, and collect the top 200 feature matches to form an odometry constraint. Note that the feature matches are selected without additional geometric verification, and they can contain outliers. 241 and 1769 odometry constraints are constructed for the 1 km route and the city-scale route, respectively.

• Loop-closure constraints. We perform loop-closure detection as described in Sec. 3. We take every 5th fragment along the trajectory as a keyframe fragment; loop-closures are detected among the selected keyframe fragments. For the 1 km route, we find the top 5 loop-closures for each keyframe fragment and then remove the duplicates. For the city-scale route, we find the top 10 loop-closures for each keyframe fragment and then remove the duplicates. 171 and 1438 loop-closure constraints are constructed for the 1 km and city-scale routes, respectively. The outlier loop-closure ratio is more than 80% for both routes.
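As referenced in the first bullet, below is a minimal numpy sketch of a centroid-based voxel-grid filter with a 0.2 m grid. It assumes the standard centroid-per-voxel behaviour (as in PCL's VoxelGrid) and is not the authors' preprocessing code.

```python
import numpy as np

def voxel_downsample(points, voxel=0.2):
    """Keep one centroid per occupied voxel of size `voxel` (in meters).

    points : (N, 3) array of xyz coordinates of one fragment
    """
    keys = np.floor(points / voxel).astype(np.int64)        # voxel index per point
    _, inv, counts = np.unique(keys, axis=0,
                               return_inverse=True, return_counts=True)
    inv = inv.reshape(-1)                                    # flatten for safety across numpy versions
    centroids = np.zeros((counts.size, 3))
    np.add.at(centroids, inv, points)                        # sum the points in each voxel
    return centroids / counts[:, None]                       # centroid per voxel
```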

Baseline Methods. We compare the effectiveness of our approach with two baseline methods based on [5]: a stronger and a weaker baseline. The stronger baseline encodes uncertainty information of the feature matches between two fragments into a covariance matrix. The feature matches used to construct the covariance matrix are those within 1 m apart after geometric registration. Refer to [5] for more details on the covariance matrix. The covariance matrix of the weaker baseline is set to identity, i.e., no uncertainty information on the feature matches. The relative poses between the point cloud fragments computed from ICP are used as the odometry and loop-closure constraints in the baseline methods.

[Figure 5. Reconstruction results on the city-scale route traversed in day and night. The two columns of zoomed-in point clouds are reconstructed based on different trajectories: Odometry, Choi et al. [5] with identity covariance, Choi et al. [5], and Ours (Sec. 4). Scene images are shown for reference.]

                                         1 km    City-scale
Odometry                                 1.85       11.81
Choi et al. [5] (identity covariance)  123.24      207.93
Choi et al. [5]                          1.97       50.92
Ours (Sec. 4)                            1.34        2.45

Table 3. Reconstruction accuracy on the outdoor dataset. Each entry is the mean distance of the estimated poses to the GPS/INS ground truth (in meters).

Results. Tab. 3 summarizes the mean distances of the estimated poses to the GPS/INS trajectory as an indirect measure of the reconstruction accuracy on the 1 km and city-scale outdoor datasets. Fig. 3 and 4 show the plots of the trajectories. We align the first five fragment poses with the GPS/INS trajectory; error measurements start after the 5th fragment pose. The results show that the accuracy increases when more information about the feature matches is considered in the optimization process. We can see from Tab. 3, and Fig. 3 and 4, that the weaker baseline ([5] with uninformative identity covariance), which uses no information from the feature matches, gives the worst performance. The stronger baseline ([5] with informative covariance matrix), which encodes information from the feature matches in the covariance matrix, shows better performance. In contrast, our method, which directly takes the feature matches as the odometry and loop-closure constraints, outperforms both baselines. Furthermore, Fig. 1 and 5 show reconstruction results for qualitative evaluation. It can be seen from the bottom left and right plots in Fig. 5 that our method produces the sharpest reconstructions of the 3D point clouds.

7. Conclusion

In this paper, we proposed a probabilistic approach for robust point cloud reconstruction of large-scale outdoor scenes. Our approach leverages on a Cauchy-Uniform mixture model to simultaneously suppress outlier feature matches and loop-closures. Moreover, we showed that by using a Gaussian-Uniform mixture model, our approach degenerates to the formulation of a state-of-the-art approach for robust indoor reconstruction. We verified our proposed methods on both indoor and outdoor benchmark datasets.

Acknowledgement

This work is supported in part by a Singapore MOE Tier 1 grant R-252-000-A65-114.


References

[1] S. Agarwal, K. Mierle, and others. Ceres Solver. http://ceres-solver.org.
[2] M. J. Black and A. Rangarajan. On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. In IJCV, 1996.
[3] G. Celeux and G. Govaert. A classification EM algorithm for clustering and two stochastic versions. In CSDA, 1992.
[4] J. Chen, D. Bautembach, and S. Izadi. Scalable real-time volumetric surface reconstruction. In TOG, 2013.
[5] S. Choi, Q.-Y. Zhou, and V. Koltun. Robust reconstruction of indoor scenes. In CVPR, 2015.
[6] M. Cummins and P. Newman. Appearance-only SLAM at large scale with FAB-MAP 2.0. In IJRR, 2011.
[7] F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard. 3-D mapping with an RGB-D camera. In T-RO, 2014.
[8] D. Galvez-Lopez and J. D. Tardos. Real-time loop detection with bags of binary words. In IROS, 2011.
[9] A. Handa, T. Whelan, J. McDonald, and A. J. Davison. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM. In ICRA, 2014.
[10] P. Henry, D. Fox, A. Bhowmik, and R. Mongia. Patch volumes: Segmentation-based consistent mapping with RGB-D cameras. In 3DV, 2013.
[11] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. In IJRR, 2012.
[12] C. Kerl, J. Sturm, and D. Cremers. Dense visual SLAM for RGB-D cameras. In IROS, 2013.
[13] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. In T-IT, 2001.
[14] R. Kummerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o: A general framework for graph optimization. In ICRA, 2011.
[15] G. H. Lee, F. Fraundorfer, and M. Pollefeys. Robust pose-graph loop-closures with expectation-maximization. In IROS, 2013.
[16] G. H. Lee and M. Pollefeys. Unsupervised learning of threshold for geometric verification in visual-based loop-closure. In ICRA, 2014.
[17] W. Maddern, G. Pascoe, C. Linegar, and P. Newman. 1 year, 1000 km: The Oxford RobotCar dataset. In IJRR, 2017.
[18] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, 2011.
[19] M. Nießner, M. Zollhofer, S. Izadi, and M. Stamminger. Real-time 3D reconstruction at scale using voxel hashing. In TOG, 2013.
[20] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In CVPR, 2006.
[21] E. Olson, J. Leonard, and S. Teller. Fast iterative alignment of pose graphs with poor initial estimates. In ICRA, 2006.
[22] M. Pollefeys, D. Nister, J.-M. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S.-J. Kim, P. Merrell, et al. Detailed real-time urban 3D reconstruction from video. In IJCV, 2008.
[23] T. Schops, T. Sattler, C. Hane, and M. Pollefeys. Large-scale outdoor 3D reconstruction on a mobile device. In CVIU, 2017.
[24] F. Steinbrucker, C. Kerl, and D. Cremers. Large-scale multi-resolution surface reconstruction from RGB-D sequences. In ICCV, 2013.
[25] N. Sunderhauf and P. Protzel. Switchable constraints for robust pose graph SLAM. In IROS, 2012.
[26] M. A. Uy and G. H. Lee. PointNetVLAD: Deep point cloud based retrieval for large-scale place recognition. In CVPR, 2018.
[27] T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and J. McDonald. Robust real-time visual odometry for dense RGB-D mapping. In ICRA, 2013.
[28] T. Whelan, M. Kaess, J. J. Leonard, and J. McDonald. Deformation-based loop closure for large scale dense RGB-D SLAM. In IROS, 2013.
[29] J. Xiao, A. Owens, and A. Torralba. SUN3D: A database of big spaces reconstructed using SfM and object labels. In ICCV, 2013.
[30] Z. J. Yew and G. H. Lee. 3DFeat-Net: Weakly supervised local 3D features for point cloud registration. In ECCV, 2018.
[31] Q.-Y. Zhou and V. Koltun. Dense scene reconstruction with points of interest. In TOG, 2013.
[32] Q.-Y. Zhou and V. Koltun. Simultaneous localization and calibration: Self-calibration of consumer depth cameras. In CVPR, 2014.
[33] Q.-Y. Zhou, S. Miller, and V. Koltun. Elastic fragments for dense scene reconstruction. In ICCV, 2013.
[34] Q.-Y. Zhou, J. Park, and V. Koltun. Fast global registration. In ECCV, 2016.
