robust point cloud based reconstruction of large-scale...
TRANSCRIPT
Robust Point Cloud Based Reconstruction of Large-Scale Outdoor Scenes
Ziquan Lan Zi Jian Yew Gim Hee Lee
Department of Computer Science, National University of Singapore
{ziquan, zijian.yew, gimhee.lee}@comp.nus.edu.sg
Abstract
Outlier feature matches and loop-closures that survived
front-end data association can lead to catastrophic fail-
ures in the back-end optimization of large-scale point cloud
based 3D reconstruction. To alleviate this problem, we pro-
pose a probabilistic approach for robust back-end optimiza-
tion in the presence of outliers. More specifically, we model
the problem as a Bayesian network and solve it using the
Expectation-Maximization algorithm. Our approach lever-
ages on a long-tail Cauchy distribution to suppress outlier
feature matches in the odometry constraints, and a Cauchy-
Uniform mixture model with a set of binary latent variables
to simultaneously suppress outlier loop-closure constraints
and outlier feature matches in the inlier loop-closure con-
straints. Furthermore, we show that by using a Gaussian-
Uniform mixture model, our approach degenerates to the
formulation of a state-of-the-art approach for robust indoor
reconstruction. Experimental results demonstrate that our
approach has comparable performance with the state-of-
the-art on a benchmark indoor dataset, and outperforms it
on a large-scale outdoor dataset. Our source code can be
found on the project website 1.
1. Introduction
Point cloud reconstruction of outdoor scenes has many
important applications such as 3D architectural modeling,
terrestrial surveying, Simultaneous Localization and Map-
ping (SLAM) for autonomous vehicles, etc. Compared to
images, point clouds from 3D scanners exhibit less variation
under different weather or lighting conditions, e.g., summer
and winter (Fig. 1), or day and night (Fig. 5). Further-
more, the depths of point clouds from 3D scanners are more
accurate than image-based reconstructions. Consequently,
point clouds from 3D scanners are preferred for large-scale
outdoor 3D reconstructions. Most existing methods for 3D
reconstruction are solved via a two-step approach: a front-
end data association step and a back-end optimization step.
More specifically, data association is used to establish fea-
1https://github.com/ziquan111/RobustPCLReconstruction
Odom
etry
Choi
et a
l. [5]
, Id
entity
cov.
Choi
et a
l. [5]
Ours
(Sec
. 4)
Scene during first pass Scene during second pass
Detected loop-closures
Figure 1. Reconstruction of a 1km route traversed in two dif-
ferent seasons: summer (orange) and winter (blue). The outlier
(red links) loop-closures significantly outnumber the inliers (green
links). Four zoomed-in point clouds on the right are reconstructed
from different methods.
ture matches [30] in point cloud fragments for registration,
and loop-closures [26] between point cloud fragments for
pose-graph [21] optimization. Unfortunately, no existing
algorithm for feature matching and loop-closure detection
guarantees complete elimination of outliers. Although out-
lier feature matches are usually handled with RANSAC-
based geometric verification [16, 30], such pairwise checks
do not consider global consistency. In addition, the numer-
ous efforts on improving the accuracy in loop-closure de-
tection [6, 8, 20, 26] are not completely free from false pos-
itives. Many back-end optimization algorithms [13, 14, 21]
are based on non-linear least-squares that lack the robust-
ness to cope with outliers. A small number of outliers
would consequently lead to catastrophic failures in the 3D
reconstructions. Several prior works focus on disabling out-
lier loop-closures in the back-end optimization [5, 15, 25].
However, these methods do not consider the effect from
the outlier feature matches with the exception of [34] that
solves global geometric registration in a very small-scale
problem setting.
The main contribution of this paper is a probabilistic
9690
approach for robust back-end optimization to handle out-
liers from a weak front-end data association in large-scale
point cloud based reconstructions. Our approach simultane-
ously suppresses outlier feature matches and loop-closures.
To this end, we model our robust point cloud reconstruc-
tion problem as a Bayesian network. The global poses
of the point cloud fragments are the unknown parameters,
and odometry and loop-closure constraints are the observed
variables. A binary latent variable is assigned to each loop-
closure constraint; it determines whether a loop-closure
constraint is an inlier or outlier. We model feature matches
in the odometry constraints with a long-tail Cauchy distri-
bution to gain robustness to outlier matches. Additionally,
we use a Cauchy-Uniform mixture model for loop-closure
constraints. The uniform and Cauchy distributions model
outlier loop-closures and the feature matches in inlier loop-
closures, respectively. In contrast to many existing back-
end optimizers that use rigid transformations as the odome-
try and loop-closure constraints [5, 14, 15, 21, 25], we use
the distances between feature matches to exert direct influ-
ence on these matches.
We use the Expectation-Maximization (EM) algorithm
[3, 15] to find the globally consistent poses of the point
cloud fragments (Sec. 4). The EM algorithm iterates be-
tween the Expectation and Maximization steps. In the Ex-
pectation step, the posterior of a loop-closure constraint be-
ing an inlier is updated. In the Maximization step, a local
optimal solution for the global poses is found from maxi-
mizing the expected complete data log-likelihood over the
posterior from the expectation step. We also generalize
our approach to solve reconstruction problems with an eas-
ier setting (Sec. 5). In particular, a strong assumption is
imposed: odometry and inlier loop-closure constraints are
free from outlier feature matches. We show that by using
a Gaussian-Uniform mixture model, our approach degen-
erates to the formulation of a state-of-the-art approach for
robust indoor reconstruction [5]. Fig. 1 shows an example
of the reconstruction result with our method compared to
other methods in the presence of outliers.
2. Related Work
Reconstruction of outdoor scenes has been studied in
[22, 23]. Schops et al. [23] propose a set of filtering steps to
detect and discard unreliable depth measurements acquired
from a RGB-D camera. However, loop-closures is not de-
tected and this can lead to reconstruction failures. Rely-
ing on very accurate GPS/INS, Pollefeys et al. [22] pro-
pose a 3D reconstruction system from RGB images. How-
ever, GPS/INS signal may be unavailable or unreliable, es-
pecially on cloudy days or in urban canyons. Our work
relies on neither GPS/INS nor RGB images. In contrast,
we focus on reconstruction from point cloud data acquired
from 3D scanners that is less sensitive to weather or light-
ing changes. There are also many works on indoor scene
reconstruction. Since the seminal KinectFusion [18], there
are several follow-up algorithms [4, 19, 27]. Unfortunately,
these methods do not detect loop-closures. Nonetheless,
there are many RGB-D reconstruction methods with loop-
closure detection [5, 7, 10, 11, 24, 28, 31, 32, 33].
Choi et al. [5] achieve the state-of-the-art performance
for indoor reconstruction with robust loop-closure. How-
ever, they assume no outlier feature matches in the odom-
etry and inlier loop-closure constraints. We relax this as-
sumption to achieve robust feature matching. More specif-
ically, [5] estimates a switch variable [25] for each loop-
closure constraint using line processes [2]. Outlier loop-
closures are disabled by setting the respective switch vari-
ables to zero. Additional switch prior terms are imposed
and chosen empirically [25] to prevent a trivial solution of
removing all loop-closure constraints. In comparison, our
approach does not require the additional prior terms. We
estimate the posterior of a loop-closure being an inlier con-
straint in the Expectation step shown in Sec. 4. The EM ap-
proach is also used by Lee et al. [15]. However, they solve
a robust pose-graph optimization problem without coping
with the feature matches for reconstruction.
3. Overview
In this section, we provide an overview of our recon-
struction pipeline that consists of four main components:
point cloud fragment construction, point cloud registration,
loop-closure detection, and robust reconstruction with EM.
Point cloud fragment construction. A single scan from
a 3D scanner, e.g. LiDAR, contains limited number of
points. We integrate multiple consecutive scans with odom-
etry readings obtained from dead reckoning e.g., the Iner-
tial Navigation System (INS) [17] to form local point cloud
fragments. A set of 3D features is then extracted from each
point cloud fragment using [30].
Point cloud registration. The top k1 feature matches be-
tween two consecutive point cloud fragments Fi and Fi+1
are retained as the odometry constraint Xi,i+1. Since
consecutive fragments overlap sufficiently by construction
[17, 30], we define Xi,i+1 as a reliable constraint but note
that it can contain outlier feature matches.
Loop-closure detection. It is inefficient to perform an ex-
haustive pairwise registration for large-scale outdoor scenes
with many point cloud fragments. Hence, we perform point
cloud based place-recognition [26] to identify a set of can-
didate loop-closures. We retain the top k2 potential loop-
closures for each fragment and remove the duplicates. For
each loop-closure between fragments Fi and Fj , we keep
9691
the set of top k1 feature matches denoted as Yij . We define
Yij as a loop-closure constraint, which can either be an in-
lier or outlier. Similar to the odometry constraint, an inlier
loop-closure Yij can also contain outlier feature matches.
Robust reconstruction with EM. The constraints from
point cloud registration and loop-closure detection can con-
tain outliers. In particular, both odometry and loop-closure
constraints can contain outlier feature matches. Moreover,
many detected loop-closures are false positives. In the next
section, we describe our probabilistic modeling approach to
simultaneously suppress outlier feature matches and false
loop-closures. The EM algorithm is used to solve for the
globally consistent fragment poses. Optional refinement us-
ing ICP can be applied to further improve the global point
cloud registration.
4. Robust Reconstruction with EM
We model the robust reconstruction problem as a
Bayesian network shown in Fig. 2. Let T =[T1, ..., Ti, ..., TN ]⊤, where Ti ∈ SE(3), denote the N frag-
ment poses, X = [X12, ..., Xi,i+1, ..., XN−1,N ]⊤ denote
the N −1 odometry constraints obtained in point cloud reg-
istration, and Y = [..., Yij , ...]⊤ denote the M loop-closure
constraints obtained in loop-closure detection. We explic-
itly assign the loop-closure constraints into 2 clusters that
represent the inliers and outliers. For each loop-closure con-
straint Yij , we introduce a corresponding assignment vari-
able Zij = [Zij,in, Zij,out]⊤ ∈ {[1, 0]⊤, [0, 1]⊤}. Zij is a
one-hot vector: Zij,in = 1 and Zij,out = 1 assigns assigns
Yij as an inlier and outlier loop-closure constraint, respec-
tively. We use Z = [..., Zij , ...]⊤ to denote the assignment
variables. T is the unknown parameter, Z is the latent vari-
able, and X and Y are both observed variables.
Robust reconstruction can be solved as finding the Max-
imum a Posterior (MAP) solution of p(T |X,Y ). However,
the MAP solution involves an intractable step of marginal-
ization over the latent variable Z. We circumvent this prob-
lem by using the EM algorithm that takes the maximiza-
tion of the expected complete data log-likelihood over the
posterior of the latent variables. The EM algorithm iterates
between the Expectation and Maximization steps. In the
Expectation step, we use T old, i.e., fragment poses solved
from the previous iteration to find the posterior distribution
of the latent variable Z,
p(Z|Y, T old) =p(Y |Z, T old)p(Z|T old)
p(Y |T old), (1)
in which Z does not depend on X , since they are condition-
ally independent given Y according to the Bayesian net-
work in Fig. 2.
In the Maximization step, the posterior distribution (Eq.
(1)) is used to update T by maximizing the expectation of
Figure 2. Bayesian network representation of the robust recon-
struction problem. Ti+1, Ti and Tj are fragment poses. Xi,i+1
is an odometry constraint. Yij is a loop-closure constraint. Zij
is an assignment variable. N − 1 and M indicate the numbers of
odometry constraints and loop-closure constraints respectively.
the complete data log-likelihood denoted by
QEM : =∑
Z
p(Z|Y, T old) ln p(X,Y, Z|T ) (2)
= ln p(X|T )︸ ︷︷ ︸
QX
+∑
Z
p(Z|Y, T old) ln p(Y, Z|T )︸ ︷︷ ︸
QY
.
We define QX for the term with odometry constraints, and
QY for the term with loop-closure constraints.
Initialization. The unknown parameters, i.e., global
poses T of the N fragments, are initialized with the rel-
ative poses computed from odometry constraints X using
ICP. Other dead reckoning methods such as wheel odome-
try and/or INS readings can also be used.
4.1. Modeling Odometry Constraints
Odometry constraints are obtained from point cloud reg-
istration between two consecutive point cloud fragments.
Recall that an odometry constraint Xi,i+1 is a set of feature
matches between fragments Fi and Fi+1, which can contain
outlier matches. To gain robustness, we model each feature
match (p,q) ∈ Xi,i+1 with a long-tail multivariate Cauchy
distribution. Suppose these feature matches are indepen-
dent and identically distributed (i.i.d.), we take a geometric
mean over their product to get
p(Xi,i+1|T ) =( ∏
(p,q)∈Xi,i+1
Cauchyi,i+1(p,q)) 1
|Xi,i+1|
,
(3)
where
Cauchyi,i+1(p,q) =1
π2√detΣ(1 + d2Σ(Tip, Ti+1q))2
,
(4)
which we assume an isotropic covariance Σ = σ2I with
scale σ, and dΣ denotes the Mahalanobis distance such that
d2Σ(Tip, Ti+1q) = (Tip− Ti+1q)⊤Σ−1(Tip− Ti+1q).
(5)
9692
The value of σ is set based on the density of extracted fea-
tures. For example, σ = 0.5m in the outdoor dataset.
4.2. Modeling LoopClosure Constraints
A loop-closure constraint Yij is the set of feature
matches between fragments Fi and Fj . We propose to use a
Cauchy-Uniform mixture model to cope with the (1) outlier
loop-closure constraints and (2) outlier feature matches in
the inlier loop-closure constraints.
To distinguish between inlier and outlier loop-closures,
we model the distribution of assignment variable Z as a
Bernoulli distribution defined by the inlier probability λ ∈[0, 1],
p(Zij) = λZij,in(1− λ)Zij,out . (6)
Next, we use two distributions: Cauchy and Uniform dis-
tributions to model the inlier and outlier loop-closure con-
straints, respectively.
Cauchy distribution – inlier loop-closure constraints.
The inlier loop-closure constraints can contain outlier fea-
ture matches. We use the same multivariate Cauchy dis-
tribution as Eq. (3) and further reorganize the terms. We
define Cij for brevity, such that
Cij := p(Yij |Zij,in = 1, T ) = π−2σ−3e−2Aij , (7)
in which
Aij =1
|Yij |∑
(p,q)∈Yij
ln(1 +‖Tip− Tjq‖2
σ2), (8)
and |Yij | denotes the number of feature matches in Yij .
Uniform distribution – outlier loop-closure constraints.
We model the outlier loop-closure constraints with a uni-
form distribution defined by a constant probability u ∈(0, 1),
p(Yij |Zij,out = 1, T ) = u. (9)
4.3. Expectation Step
Recall that the expectation step is evaluated in Eq. (1).
Plugging Eq. (6), (7) and (9) into the Bayes’ formula, we
obtain the posterior of being an inlier loop-closure con-
straint,
Pinij := p(Zij,in = 1|Y, T ) = Θ
Θ+ e2Aij, (10)
where
Θ =λ
(1− λ)uπ2σ3. (11)
The constant Θ consists of two distribution parameters: λ
is the probability of being an inlier loop-closure; u is the
constant probability to uniformly sample a random loop-
closure, which are difficult to set manually based on differ-
ent datasets. Hence, we propose to estimate Θ based on the
input data. More specifically, we learn Θ from the odome-
try constraints, since all odometry constraints are effectively
inlier loop-closure constraints.
The process to learn Θ is as follows. First, for each
odometry constraint Xi,i+1, we denote its corresponding er-
ror term mi,i+1 = e2Ai,i+1 (analogous to Eq. (10)), where
Ai,i+1 =1
|Xi,i+1|∑
(p,q)∈Xi,i+1
ln(1 +‖Tip− Ti+1q‖2
σ2).
(12)
Next, we compute the median error denoted as m. Since we
regard all odometry constraints as inlier loop-closure con-
straints, letΘ
Θ+ m= p, (13)
where we set p = 90%, meaning that a loop-closure Yij
with a small error (e2Aij < m) is very likely to be an inlier
(Pinij > p). Finally, we solve for Θ using Eq. (13).
4.4. Maximization Step
In the maximization step, we solve for T that maximizes
QEM = QX + QY , where QX and QY are shorthand no-
tations defined in Eq. (2). These two terms are evaluated
independently, and then optimized jointly.
Evaluate QX . Assuming the odometry constraints in X
are i.i.d., the joint probability of all odometry constraints is
given by
p(X|T ) =N−1∏
i=1
p(Xi,i+1|T ). (14)
Substituting the joint probability of the feature matches
within each odometry constraint (Eq. (3)), we can rewrite
QX as
QX = −2
N−1∑
i=1
Ai,i+1 + const. (15)
Evaluate QY . Using the product rule, the joint prob-
ability of loop-closure constraints and their correspond-
ing assignment variables can be written as p(Y, Z|T ) =p(Z)p(Y |Z, T ). Plugging Eq. (6), (7) and (9) in, we have
p(Y, Z|T ) =∏
i,j
(λ Cij)Zij,in
((1− λ)u
)Zij,out. (16)
We can rewrite QY as
QY =∑
i,j
Pinij lnCij + const, (17)
9693
with the joint probability from Eq. (16) and the posterior
from Eq. (10), which can be further expanded to
QY = −2∑
i,j
Pinij Aij + const. (18)
Maximize QEM . The maximization of QX +QY can be
reformulated into a non-linear least-squares problem with
the following objective function
argminT
∑
i,j
Pinij
|Yij |∑
(p,q)∈Yij
ln(1 +‖Tip− Tjq‖2
σ2) (19)
+
N−1∑
i=1
1
|Xi,i+1|∑
(p,q)∈Xi,i+1
ln(1 +‖Tip− Ti+1q‖2
σ2),
which can be easily optimized using the sparse Cholesky
solver in Google Ceres [1]. The computation complexity is
cubic to the total number of feature matches.
5. Generalization using EM
In the previous section, we solved the problem when
constraints are contaminated with outlier feature matches.
In this section, we study a problem with an easier setting
where correct loop-closure constraints contain no outlier
feature matches. Recall that long-tail multivariate Cauchy
distribution is used to gain robustness against outlier fea-
ture matches. We replace the multivariate Cauchy distribu-
tion with a multivariate Gaussian distribution for the easier
problem without outlier feature matches, and show that our
EM formulation degenerates to the formulation of a state-
of-the-art approach for robust indoor reconstruction [5]. To
avoid repetition, we only highlight the major differences to
the previous section. Each analogous term is augmented
with a superscript G that stands for “Gaussian”.
Odometry constraints. Replacing the multivariate
Cauchy distribution in Eq. (3) with a multivariate Gaussian
distribution, we have
pG(Xi,i+1|T ) =( ∏
(p,q)∈Xi,i+1
Gaussi,i+1(p,q)) 1
|Xi,i+1|
,
(20)
where
Gaussi,i+1(p,q) =exp
(− 1
2d2Σ(Tip, Ti+1q)
)
√
(2π)3 detΣ, (21)
and Σ and dΣ remain unchanged.
Loop-closure constraints. We note that the Bernoulli
distribution in Eq. (6) still holds, and the major changes
start from Eq. (7). Using the multivariate Gaussian distri-
bution, we have
Gij := pG(Yij |Zij,in = 1, T ) = (√2πσ)−3e−
Bij
2σ2 , (22)
in which
Bij =1
|Yij |∑
(p,q)∈Yij
‖Tip− Tjq‖2, (23)
and |Yij | is the number of feature matches. We note that
Bij is a sum-of-square errors that can lead to arithmetic
overflow in the eBij term from the posterior of the latent
variable Zij (analogous to Eq. (10)). In contrast, there is
no arithmetic overflow in the eAij term from Eq. (10) since
Aij from Eq. (8) is a sum-of-log errors. We propose to al-
leviate the arithmetic overflow problem by using a Pareto
distribution that approximates Gij as
Gij ≈x0
B2ij
, (24)
where x0 > 0 is a scale parameter. For outlier loop-
closures, the uniform distribution in Eq. (9) still holds.
Expectation step. Using the approximation of Gij in Eq.
(24), the posterior from Eq. (10) becomes
Pinij
G ≈ ΘG
ΘG +B2ij
, (25)
where
ΘG =x0λ
(1− λ)u. (26)
It becomes apparent in PinG
ij that the arithmetic overflow
problem is alleviated by the replacement of eBij with B2ij .
In the previous section, Θ in Eq. (13) is learned from the
median error m of all the error terms mi,i+1 = e2Ai,i+1
in the odometry constraints. Unfortunately, the median er-
ror mG from mGi,i+1 = B2
i,i+1 becomes uninformative be-
cause we assume no outlier feature matches, i.e., mG → 0since mG
i,i+1 = B2i,i+1 → 0. Despite the absence of outlier
feature matches, ‖Tip − Tjq‖ is upper bounded by some
threshold, ǫ. Hence, the mean error term can be directly
estimated from Eq. (23) as mG = ǫ2. Subsequently, let
ΘG
ΘG + mG= p, (27)
where we set p = 90% and solve for ΘG. We set ǫ = 0.05m
for our experiments on the indoor dataset (see next section)
based on the typical magnitude of sensor noise.
9694
Living room 1 Living room 2 Office 1 Office 2 Average
Before pruningRecall(%) 61.2 49.7 64.4 61.5 59.2
Precision(%) 27.2 17.0 19.2 14.9 19.6
Choi et al. [5] Recall(%) 57.6 49.7 63.3 60.7 57.8
after pruning Precision(%) 95.1 97.4 98.3 100.0 97.7
Ours (Sec. 5) Recall(%) 58.7 48.4 63.9 61.5 58.1
after pruning Precision(%) 97.0 94.9 96.6 93.6 95.4
Table 1. Results of robust optimization on the indoor dataset. Our method shows comparable result with the state-of-the-art.
Living room 1 Living room 2 Office 1 Office 2 Average
Whelan et al. [27] 0.22 0.14 0.13 0.13 0.16
Kerl et al. [12] 0.21 0.06 0.11 0.10 0.12
SUN3D [29] 0.09 0.07 0.13 0.09 0.10
Choi et al. [5] 0.04 0.07 0.03 0.04 0.05
Ours (Sec. 5) 0.06 0.09 0.05 0.04 0.06
GT Trajectory 0.04 0.04 0.03 0.03 0.04
Table 2. Reconstruction accuracy on the indoor dataset. The entries are the mean distances of each model to its respective ground-truth
surface (in meters). Our proposed method shows comparable result with the state-of-the-art and outperforms the rest.
Maximization step. Finally, we reformulate the maxi-
mization problem as a non-linear least-squares problem
with the following objective function
argminT
∑
i,j
Pinij
G
|Yij |∑
(p,q)∈Yij
‖Tip− Tjq‖2 (28)
+
N−1∑
i=1
1
|Xi,i+1|∑
(p,q)∈Xi,i+1
‖Tip− Ti+1q‖2,
which is similar to the formulation in [5] with two minor
differences. First, we average the square errors over the
number of feature matches but [5] does not. Second, we
estimate the posterior Pinij
Gby iterating between the Ex-
pectation and Maximization steps but [5] estimates it using
line processes [2]. It is important to note that Eq. (28) is
derived from the original Gaussian formulation in Eq. (22)
instead of the Pareto approximation in Eq. (24).
6. Evaluation
We use the experimental results from two datasets for
the comparison between our approach and the state-of-the-
art approach [5]. The first dataset is from small-scale in-
door scenes with no outlier feature matches in the odom-
etry and inlier loop-closure constraints, and the second
dataset is from large-scale outdoor scenes with outlier fea-
ture matches. Our Gaussian-Uniform EM (Sec. 5) and
Cauchy-Uniform EM (Sec. 4) are evaluated on the small-
scale indoor and large-scale outdoor datasets, respectively.
6.1. SmallScale Indoor Scenes
The “Augmented ICL-NUIM Dataset” provided and
augmented by [9] and [5], respectively, is used as the small-
scale indoor dataset. This dataset is generated from syn-
thetic indoor environments and includes two models: a liv-
ing room and an office. There are two RGB-D image se-
quences for each model, resulting in a total of four test
cases. To ensure fair comparison, we follow the same eval-
uation criteria and experimental settings as [5].
Results. Tab. 1 shows the comparison of the average re-
call and precision of the loop-closures on (1) before prun-
ing, (2) [5] after pruning and (3) our method after prun-
ing. Here, “before pruning” refers the loop-closures from
the loop-closure detection, and “after pruning” refers to the
inlier loop-closures after robust optimization. It can be seen
that the average precision and recall of our method is com-
parable to [5]. This is an expected result since we showed in
Sec. 5 that our method degenerates to the method in [5] with
minor differences in the absence of outlier feature matches.
We further evaluate the reconstruction accuracy of the final
model using the error metric proposed in [9], i.e., the mean
distance of the reconstructed surfaces to the ground truth
surfaces. Tab. 2 shows the comparison of the reconstruc-
tion accuracy of our method to other existing approaches.
In addition, as suggested in [5], the reconstruction accu-
racy of the model obtained from fusing the input depth im-
ages with the ground truth trajectory (denoted as GT Trajec-
tory in Tab. 2) is reported for reference. As expected, our
method shows comparable result with the state-of-the-art on
the indoor dataset.
6.2. LargeScale Outdoor Scenes
The large-scale outdoor dataset is based on the “Oxford
Robotcar Dataset” [17]. It consists of 3D point clouds cap-
tured with a LiDAR sensor mounted on a car that repeatedly
9695
Pure odometry Choi et al. [5] (identity covariance) Choi et al. [5] Ours (Sec. 4)Figure 3. Trajectories on the 1km route. Each trajectory (blue) is overlaid with the GPS/INS trajectory (green). Red asterisk indicates the
position of the 1st fragment pose.
Choi et al. [5] Ours (Sec. 4)Pure odometry Choi et al. [5] (identity covariance)Figure 4. Trajectories on the city-scale route. Each trajectory (blue) is overlaid with the GPS/INS trajectory (green). A zoomed-in region
is shown on the top right corner for each trajectory. Red asterisk indicates the position of the 1st fragment pose.
drives through Oxford, UK, at different times over a year.
We select two different driving routes from the dataset, a
short route (about 1km) and a long route (city-scale). Fur-
thermore, we take two traversals at different times for each
route, resulting four traversals in total. Unlike the syn-
thetic indoor dataset, there is no ground truth of the sur-
face geometry. We evaluate the trajectory accuracy against
the GPS/INS readings as an indirect measurement of recon-
struction accuracy. We prepare the dataset as follows:
• Point cloud fragments. We integrate the push-broom 2D
LiDAR scans and their corresponding INS readings into
the 3D point clouds. We segment the data into fragments
with 30m radius for every 10m interval. Each fragment
is then downsampled using a VoxelGrid filter with a grid
size of 0.2m. 242 and 1770 fragments are constructed for
the 1km route and the city-scale route, respectively.
• Odometry trajectory. The odometry trajectory is dis-
connected due to discontinuous INS data since we are
combining two traversals. We simulate the odometry tra-
jectory via geometric registrations between consecutive
point cloud fragments, and manually identify one link-
age transformation between the two traversals. We also
check the entire odometry trajectory to ensure that there
are no remaining erroneous transformations. The result-
ing odometry trajectory is used to initialize the fragment
poses, T .
• Odometry constraints. For every two consecutive
frames along the odometry trajectory, we perform point
cloud registration as described in Sec. 3. Specifically, we
extract 1024 features for each fragment, and collect the
top 200 feature matches to form an odometry constraint.
Note that the feature matches are selected without addi-
tional geometric verification, and it can contain outliers.
241 and 1769 odometry constraints are constructed for
the 1km route and the city-scale route, respectively.
• Loop-closure constraints. We perform loop-closure de-
tection as described in Sec. 3. We take every 5th frag-
ment along the trajectory as a keyframe fragment; loop-
closures are detected among the selected keyframe frag-
ments. For the 1km route, we find the top 5 loop-closures
for each keyframe fragment and then remove the dupli-
cates. For the city-scale route, we find the top 10 loop-
closures for each keyframe fragment and then remove the
duplicates. 171 and 1438 loop-closure constraints are
constructed for the 1km and city-scale route, respectively.
The outlier loop-closure ratio is more than 80% for both
routes.
Baseline Methods. We compare the effectiveness of our
approach with two baseline methods based on [5]: a
stronger and a weaker baseline. The stronger baseline en-
codes uncertainty information of the feature matches be-
tween two fragments into a covariance matrix. The feature
matches used to construct the covariance matrix are those
within 1m apart after geometric registration. Refer to [5] for
the more details on the covariance matrix. The covariance
matrix of the weaker baseline is set to identity, i.e., no un-
certainty information on the feature matches. The relative
poses between the point cloud fragments computed from
9696
Odom
etry
Choi
et a
l. [5]
, Id.
cov.
Choi
et a
l. [5]
Ours
(Sec
. 4)
Odom
etry
Choi
et a
l. [5]
, Id.
cov.
Choi
et a
l. [5]
Ou
rs (S
ec. 4
)
Scene images Scene images
Figure 5. Reconstruction results on the city-scale route traversed in day and night. The two columns of zoomed-in point clouds are
reconstructed based on different trajectories.
1km City-scale
Odometry 1.85 11.81
Choi et al. [5]123.24 207.93
(identity covariance)
Choi et al. [5] 1.97 50.92
Ours (Sec. 4) 1.34 2.45
Table 3. Reconstruction accuracy on outdoor dataset. Each entry
is the mean distance of the estimated poses to the GPS/INS ground
truth (in meters).
ICP are used as the odometry and loop-closure constraints
in the baseline methods.
Results. Tab. 3 summarizes the mean distances of the
estimated poses to the GPS/INS trajectory as an indirect
measure of the reconstruction accuracy on the 1km and
city-scale outdoor datasets. Fig. 3 and 4 show the plots
of the trajectories. We align the first five fragment poses
with the GPS/INS trajectory, error measurements start af-
ter the 5th fragment pose. The results show that the ac-
curacy increases when more information about the feature
matches is considered in the optimization process. We can
see from Tab. 3, and Fig. 3 and 4 that the weaker baseline
([5] with uninformative identity covariance) without infor-
mation of the feature matches gives the worst performance.
The stronger baseline ([5] with informative covariance ma-
trix) that encodes information of feature matches using the
covariance matrix shows better performance. In contrast,
our method that directly takes feature matches as the odom-
etry and loop-closure constraints outperforms the two base-
lines. Furthermore, Fig. 1 and 5 show reconstruction results
for qualitative evaluation. It can be seen from the bottom
left and right plots in Fig. 5 that our method produces the
sharpest reconstructions of the 3D point clouds.
7. Conclusion
In this paper, we proposed a probabilistic approach for
robust point cloud reconstruction of large-scale outdoor
scenes. Our approach leverages on a Cauchy-Uniform
mixture model to simultaneously suppress outlier feature
matches and loop-closures. Moreover, we showed that by
using a Gaussian-Uniform mixture model, our approach de-
generates to the formulation of a state-of-the-art approach
for robust indoor reconstruction. We verified our proposed
methods on both indoor and outdoor benchmark datasets.
Acknowledgement
This work is supported in part by a Singapore MOE Tier
1 grant R-252-000-A65-114.
9697
References
[1] S. Agarwal, K. Mierle, and Others. Ceres solver. http:
//ceres-solver.org.
[2] M. J. Black and A. Rangarajan. On the unification of line
processes, outlier rejection, and robust statistics with appli-
cations in early vision. In IJCV, 1996.
[3] G. Celeux and G. Govaert. A classification em algorithm for
clustering and two stochastic versions. In CSDA, 1992.
[4] J. Chen, D. Bautembach, and S. Izadi. Scalable real-time
volumetric surface reconstruction. In TOG, 2013.
[5] S. Choi, Q.-Y. Zhou, and V. Koltun. Robust reconstruction
of indoor scenes. In ICCV, 2015.
[6] M. Cummins and P. Newman. Appearance-only slam at large
scale with fab-map 2.0. In IJRR, 2011.
[7] F. Endres, J. Hess, J. Sturm, D. Cremers, and W. Burgard.
3-d mapping with an rgb-d camera. In T-RO, 2014.
[8] D. Galvez-Lopez and J. D. Tardos. Real-time loop detection
with bags of binary words. In IROS, 2011.
[9] A. Handa, T. Whelan, J. McDonald, and A. J. Davison. A
benchmark for RGB-D visual odometry, 3D reconstruction
and SLAM. In ICRA, 2014.
[10] P. Henry, D. Fox, A. Bhowmik, and R. Mongia. Patch vol-
umes: Segmentation-based consistent mapping with rgb-d
cameras. In 3DV, 2013.
[11] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. Rgb-
d mapping: Using kinect-style depth cameras for dense 3d
modeling of indoor environments. In IJRR, 2012.
[12] C. Kerl, J. Sturm, and D. Cremers. Dense visual slam for
rgb-d cameras. In IROS, 2013.
[13] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor
graphs and the sum-product algorithm. In T-IT, 2001.
[14] R. Kummerle, G. Grisetti, H. Strasdat, K. Konolige, and
W. Burgard. g 2 o: A general framework for graph opti-
mization. In ICRA, 2011.
[15] G. H. Lee, F. Fraundorfer, and M. Pollefeys. Robust
pose-graph loop-closures with expectation-maximization. In
IROS, 2013.
[16] G. H. Lee and M. Pollefeys. Unsupervised learning of thresh-
old for geometric verification in visual-based loop-closure.
In ICRA, 2014.
[17] W. Maddern, G. Pascoe, C. Linegar, and P. Newman. 1 year,
1000 km: The oxford robotcar dataset. In IJRR, 2017.
[18] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux,
D. Kim, A. J. Davison, P. Kohi, J. Shotton, S. Hodges, and
A. Fitzgibbon. Kinectfusion: Real-time dense surface map-
ping and tracking. In ISMAR, 2011.
[19] M. Nießner, M. Zollhofer, S. Izadi, and M. Stamminger.
Real-time 3d reconstruction at scale using voxel hashing. In
TOG, 2013.
[20] D. Nister and H. Stewenius. Scalable recognition with a vo-
cabulary tree. In CVPR, 2006.
[21] E. Olson, J. Leonard, and S. Teller. Fast iterative alignment
of pose graphs with poor initial estimates. In ICRA, 2006.
[22] M. Pollefeys, D. Nister, J.-M. Frahm, A. Akbarzadeh,
P. Mordohai, B. Clipp, C. Engels, D. Gallup, S.-J. Kim,
P. Merrell, et al. Detailed real-time urban 3d reconstruction
from video. In IJCV, 2008.
[23] T. Schops, T. Sattler, C. Hane, and M. Pollefeys. Large-
scale outdoor 3d reconstruction on a mobile device. In CVIU,
2017.
[24] F. Steinbrucker, C. Kerl, and D. Cremers. Large-scale multi-
resolution surface reconstruction from rgb-d sequences. In
ICCV, 2013.
[25] N. Sunderhauf and P. Protzel. Switchable constraints for ro-
bust pose graph slam. In IROS, 2012.
[26] M. A. Uy and G. H. Lee. PointNetVLAD: Deep Point
Cloud Based Retrieval for Large-Scale Place Recognition.
In CVPR, 2018.
[27] T. Whelan, H. Johannsson, M. Kaess, J. J. Leonard, and
J. McDonald. Robust real-time visual odometry for dense
rgb-d mapping. In ICRA, 2013.
[28] T. Whelan, M. Kaess, J. J. Leonard, and J. McDonald.
Deformation-based loop closure for large scale dense rgb-d
slam. In IROS, 2013.
[29] J. Xiao, A. Owens, and A. Torralba. Sun3d: A database
of big spaces reconstructed using sfm and object labels. In
ICCV, 2013.
[30] Z. J. Yew and G. H. Lee. 3DFeat-Net: Weakly Supervised
Local 3D Features for Point Cloud Registration. In ECCV,
2018.
[31] Q.-Y. Zhou and V. Koltun. Dense scene reconstruction with
points of interest. In TOG, 2013.
[32] Q.-Y. Zhou and V. Koltun. Simultaneous localization and
calibration: Self-calibration of consumer depth cameras. In
CVPR, 2014.
[33] Q.-Y. Zhou, S. Miller, and V. Koltun. Elastic fragments for
dense scene reconstruction. In ICCV, 2013.
[34] Q.-Y. Zhou, J. Park, and V. Koltun. Fast global registration.
In ECCV, 2016.
9698