arXiv:2108.12876v1 [cs.CV] 29 Aug 2021
Solving Viewing Graph Optimization for Simultaneous Position and Rotation Registration
Seyed-Mahdi Nasiri · Reshad Hosseini · Hadi Moradi
Received: date / Accepted: date
Abstract A viewing graph is a set of unknown camera poses, as the vertices, and the observed relative motions, as the edges. Solving the viewing graph is an essential step in a Structure-from-Motion procedure, where a set of relative motions is obtained from a collection of 2D images. Almost all methods in the literature solve for the rotations separately, through a rotation averaging process, and use them for solving the positions. Obtaining the positions is the challenging part because the translation observations only give the direction of the motions. It becomes more challenging when the set of edges comprises pairwise translation observations between both near and far cameras. In this paper an iterative method is proposed that overcomes these issues. A method is also proposed which obtains the rotations and positions simultaneously. Experimental results show the state-of-the-art performance of the proposed methods.
S.M. Nasiri
E-mail: [email protected]
School of ECE, College of Engineering, University of Tehran, Tehran, Iran

R. Hosseini
E-mail: [email protected]
Tel.: +98-21-82089799
School of ECE, College of Engineering, University of Tehran, Tehran, Iran
School of Computer Science, Institute of Research in Fundamental Sciences (IPM), Tehran, Iran

H. Moradi
E-mail: [email protected]
Tel.: +98-21-82084960
School of ECE, College of Engineering, University of Tehran, Tehran, Iran
Intelligent Systems Research Institute, SKKU, South Korea
Keywords Viewing Graph · Structure-from-Motion · Camera Pose Registration
1 Introduction
Structure-from-Motion (SfM) refers to a process in which a set of 3D points is reconstructed from their projections on a given set of images. Almost all SfM techniques can be classified into two categories: sequential or incremental techniques [27,1,8,36,9,15,28], and global or batch techniques [16,19,2,7,30].
As incremental approaches often suffer from large drift error, global methods are recommended to achieve better accuracy and consistent models. The
global methods consist of the following main steps: 1)
Pairwise image registration: Estimating relative rota-
tions and movement directions between pairs of images
using feature points [31,22,17]. The result is a graph,
i.e. the viewing graph, whose edges are the relative
pose measurements. 2) Solving the viewing graph: Cam-
era poses estimation [23,30,16,37,13,6,4], which is the
topic of this paper. 3) Triangulation: Reconstructing 3D points by triangulating corresponding points from the calculated camera poses [14,29,5]. 4) Bundle adjustment:
Camera poses and 3D points refinement by reprojection
error minimization [32,18].
Since the relative translation observations only contain the movement directions and not their scale, solving the viewing graph is the challenging step in the SfM
process. To simplify the problem, almost all viewing
graph solvers determine the rotations using only the
rotation observations and then use them to compute
the camera positions [35,11,23,16]. Obtaining a set of
rotations from their relative observations is known as
rotation averaging problem and has been studied in the
computer vision community [21,12,3,13].
The rotations obtained by solving the rotation av-
eraging problem are used to change the representation
of direction observations from local coordinates to a
global coordinate system. The camera location estima-
tion problem is to find camera positions from these rel-
ative directions in the global coordinate frame. A so-
lution is to find camera locations which minimize the
squared sum of direction errors [35]. The difficulty of
solving this non-linear cost function led most of the
available methods in the literature to change the cost
function from the direction error to the displacement er-
ror. To this end, unknown scale factors are added to the
problem which should be estimated besides the camera
poses. Some of these approaches used the cross product
of solution translations with direction observations as
displacement error [11,2]. These approaches suffer from
the property of cross product distance that decreases for
an angle error of more than 90 degrees. The constrained
least-squared [34,33], and least-unsquared chordal dis-
tances [24,23] are other commonly used cost functions.
Using the displacement error instead of direction error
biases the errors in proportion to the length of edges.
In this paper, we show that the constrained least-squares formulation for solving camera positions has an inherent limitation that degrades its performance. We,
therefore, propose an iterative algorithm to solve the original displacement-error or direction-error cost functions, starting from the solution of the constrained least-squares formulation. Experimental results show the efficacy of the proposed methods. We also propose iterative methods for solving simultaneous position and rotation registration. These methods solve a pose graph optimization (PGO) problem in each step which, thanks to the recent success of PGO solvers, can be solved efficiently and with high accuracy. Experimental results show that our proposed methods significantly outperform the most accurate methods proposed in the literature.
2 Preliminaries
2.1 Viewing-Graph
Given n views or images from a scene, a subset of the n(n−1)/2 pairwise relative motions can be estimated. The camera
poses, as the vertices, and the pairwise relative motions,
as the observed edges, construct the viewing graph.
Each relative motion observation comprises a relative
rotation and a relative direction. Relative directions are
vectors from an endpoint of an edge to another one and
are represented in local coordinate frames of vertices.
A set of vertices, V , and its relevant set of edges, E,
form a viewing graph G = {V,E}. The parameters of
the ith vertex, vi, are the position and the orientation
of the ith camera with respect to a global coordinate
frame, i.e. vi = {pi, Ri}. An edge between the ith and jth vertices, represented by eij, comprises a noisy measurement of the relative rotation, Rij ← Rj^T Ri, and a noisy measurement of the direction of movement in the ith coordinate frame, i.e. γlij ← Ri^T (pj − pi)/‖pj − pi‖.
Solving a viewing graph means finding unknown parameters of the vertices, i.e. camera poses, that match the edges, i.e. relative observations, as much as possible. Mathematically speaking, the viewing graph optimization problem solves for the poses that minimize the following mismatch cost function,
min_{pi,Ri} ∑_{eij={i,j}∈E} dR²(Rij, Rj^T Ri) + dd²(γij, (pj − pi)/‖pj − pi‖),  (1)
where dR(·) and dd(·) are distance metrics and γij = Ri γlij. Weights can be added to the cost function to handle different confidence levels of the measurements and to balance the errors of the two parts of the cost function.
2.2 Common Metrics
Rotation distance metrics are widely studied under the title of rotation averaging [13,6]. The common metric, which is also used in our proposed methods, is the Frobenius norm of the difference of the rotation matrices:
dR (R1, R2) = ‖R1 −R2‖F . (2)
Several distance metrics can also be used for the di-
rection part of the cost function, i.e. dd(.). Some cases
are listed in [35]. The commonly used metrics are the
chordal and the orthogonal distances which are formu-
lated as follows:
Chordal: dd (u,v) = ‖u− v‖ (3)
Orthogonal: dd (u,v) = ‖Pu⊥(v)‖ = ‖u× v‖ (4)
where Pu⊥ is the projector onto the orthogonal com-
plement of the span of u and “×” represents the cross
product.
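For reference, the three metrics above can be written down directly. The following is a small sketch of ours in NumPy (not the paper's MATLAB implementation), where `u` and `v` are assumed to be unit direction vectors:

```python
import numpy as np

def d_rotation(R1, R2):
    """Eq. (2): Frobenius norm of the difference of two rotation matrices."""
    return np.linalg.norm(R1 - R2, ord="fro")

def d_chordal(u, v):
    """Eq. (3): chordal distance between two unit direction vectors."""
    return np.linalg.norm(u - v)

def d_orthogonal(u, v):
    """Eq. (4): norm of v projected onto the orthogonal complement of u;
    for unit vectors this equals ||u x v|| = sin(angle(u, v))."""
    return np.linalg.norm(np.cross(u, v))
```

For unit vectors separated by angle θ, the chordal distance is 2 sin(θ/2), which grows monotonically on [0°, 180°], while the orthogonal distance is sin θ, which peaks at 90° and then decreases; this difference is the root of the limitation discussed in Section 3.2.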
Common approaches separate the two parts of the
cost function. The first part, i.e. the rotation averaging
problem [13,21], is independent of the positions and can be solved separately to obtain the vertices' orientations. The rotation averaging problem is defined as:
min_{Ri} ∑_{eij={i,j}∈E} dR²(Rij, Rj^T Ri).  (5)
Fig. 1 The geometric interpretation of different error criteria. a) Chordal distance of directions error: ‖γij − (pj − pi)/‖pj − pi‖‖. b) Orthogonal distance of directions error: ‖γij × (pj − pi)/‖pj − pi‖‖. c) Chordal distance of displacements error: ‖(‖pj − pi‖γij) − (pj − pi)‖. d) Orthogonal distance of displacements error: ‖γij × (pj − pi)‖.
The obtained orientations are used to calculate the γijs from the γlijs. The second part of the cost function is then a function of the positions only and is known as the “camera location estimation” problem. The objective function is designed to minimize the directions error and is formulated as:
min_{pi} ∑_{eij={i,j}∈E} dd²(γij, (pj − pi)/‖pj − pi‖).  (6)
There are global scale and translation ambiguities in the solutions of (6), and if the optimization method can cope with the over-parametrization, there is no need to remove them. Some existing methods, however, use constraints to resolve the ambiguities. The constraint ∑ pi = 0, or fixing the first vertex at the origin, is commonly used to remove the translation ambiguity, and constraints like ∑ ‖pi‖² = 1 or ‖pi − pj‖ > 1, ∀(i,j) ∈ E can be used to remove the scale ambiguity.
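As an illustration of ours (not from the paper), the two gauge fixes above can be applied to a stacked n×3 position matrix:

```python
import numpy as np

def fix_gauge(P):
    """Remove the translation and scale ambiguities of (6):
    translate so that sum(p_i) = 0, then rescale so that
    sum(||p_i||^2) = 1 (the Frobenius norm of the stacked matrix)."""
    P = P - P.mean(axis=0)           # translation: sum(p_i) = 0
    return P / np.linalg.norm(P)     # scale: sum(||p_i||^2) = 1
```

Any solution of (6) can be normalized this way after the fact, which is why the ambiguities are harmless if the solver itself tolerates the over-parametrization.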
A different approach to find camera locations is to
minimize the displacements error instead of directions
error. In this case the problem is formulated as:
min_{pi} ∑_{eij={i,j}∈E} dd²(‖pj − pi‖γij, pj − pi).  (7)
In this case, it is important to consider a constraint like ∑ ‖pi‖² = 1 to prevent the trivial solution of placing all camera positions at the same point.
The combinations of two distance metrics, i.e. or-
thogonal and chordal, and the two aforementioned ap-
proaches, i.e. minimizing directions or displacements er-
ror, form a set of four different error criteria which were
used by different methods to solve the camera loca-
tion estimation problem. The geometric interpretation
of these error criteria is shown in Fig. 1.
3 Limitations of Existing Methods
In this section, we first discuss a limitation of an existing method for solving the cost function of (7). The method is fast and gives relatively good results, and we use it as the initialization procedure of our proposed methods. We then show the limitation of the orthogonal distance, and therefore of the existing methods in the literature that use this cost function.
3.1 Linear constraint problem
A common approach to solving the camera locations is to solve the following alternative problem:
min_{pi,λij} ∑_{eij={i,j}∈E} ‖λij γij − (pj − pi)‖²,  (8)
where the scale factors, λijs, are unknown parameters.
For λij = ‖pj−pi‖, problem (8) is the same as (7) with
the chordal distance, which minimizes the displacement
error, i.e. the Euclidean distance between the endpoints of the vectors λij γij and pj − pi. In practice, the constraint λij = ‖pj − pi‖ is dropped and replaced by the linear constraint λij ≥ 1, which also handles the scale ambiguity. Like [3], we refer to the problem of (8) with the constraint λij ≥ 1 as the constrained least-squares (CLS) problem.
The cost function of (8) is quadratically related to the global scale, i.e. the size of the vertex set: reducing the global scale reduces the cost quadratically. Obviously, setting the global scale to zero results in zero cost with the trivial solution {pi = 0, ∀i ∈ V}. However, the constraint λij ≥ 1 prevents a solver from reaching this trivial solution. In fact, the λijs represent the distances between the vertices, and the constraint prevents the vertices from getting too close to each other.
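Since (8) with λij ≥ 1 is a bound-constrained linear least-squares problem, it can be prototyped directly. The sketch below is our illustration (with SciPy's `lsq_linear` standing in for whatever solver an actual implementation uses, and a toy 3-camera graph of our own); camera 0 is pinned at the origin and the unknowns are stacked as x = [p1, p2, λ01, λ02, λ12]:

```python
import numpy as np
from scipy.optimize import lsq_linear

def unit(v):
    return v / np.linalg.norm(v)

# Toy 3-camera graph with exact direction observations in the global frame.
P_true = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
edges = [(0, 1), (0, 2), (1, 2)]
gammas = [unit(P_true[j] - P_true[i]) for i, j in edges]

# Residual per edge: lam_ij * gamma_ij - (p_j - p_i); p0 is pinned at 0.
A = np.zeros((9, 9))
for k, (i, j) in enumerate(edges):
    r = slice(3 * k, 3 * k + 3)
    if i > 0:
        A[r, 3 * (i - 1):3 * i] = np.eye(3)    # +p_i from -(p_j - p_i)
    if j > 0:
        A[r, 3 * (j - 1):3 * j] = -np.eye(3)   # -p_j
    A[r, 6 + k] = gammas[k]                    # +lam_ij * gamma_ij
lb = [-np.inf] * 6 + [1.0] * 3                 # the CLS constraint lam_ij >= 1
res = lsq_linear(A, np.zeros(9), bounds=(lb, np.inf))
p1, p2, lams = res.x[0:3], res.x[3:6], res.x[6:9]
```

With noise-free directions the recovered geometry matches the truth up to the global scale fixed by the smallest λij; with noisy data, as argued above, many λijs instead get stuck at the lower bound 1.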
At first glance, it seems that the constraint causes
the global scale to be determined according to the min-
imum distance between the vertices. This implies that
the smallest scale factor, i.e. the minimum λij corre-
sponding to the minimum distance of the vertices, is
set to one and the other scale factors are estimations of
the distance of their endpoints divided by the minimum
distance of the vertices. But in practice, a significant
number of λijs are exactly set to one. In fact, λijs for
a subset of edges, with shorter length than the others,
are set to one. This will increase the cost on the subset
of edges, but will reduce the global scale and reduce
the cost on the other edges with longer length. There-
fore, the cost is decreased, but λijs and consequently
the positions deviate from their real values.
Fig. 2 compares the λijs obtained by CLS and by one of our proposed methods on a dataset. In this figure,
Fig. 2 Comparing the obtained scale factors, i.e. λijs, in CLS and Alg. 1C on the “Tower of London” dataset. A significant number of the scale factors obtained by CLS stick to the lower bound and are set to 1, while the scale factors obtained by Alg. 1C match the distances between estimated camera positions.
there are a large number of vertex pairs with small ‖pj − pi‖ for which λij is set to 1 and therefore does not match ‖pj − pi‖. Furthermore, it shows that for CLS the mismatches also occur for large λijs. In contrast, in our proposed method, the λijs and their corresponding ‖pj − pi‖ match completely.
3.2 Orthogonal metric problem
Orthogonal distance is a common distance metric used in the literature for solving the camera locations. It is an easier metric to optimize, and one can obtain interesting convergence results. It can be shown that the following cost function, which includes the multipliers λij, can be solved iteratively and at the minimum equals the orthogonal distance metric (see the Appendix for more details).
min_{pi,λij} ∑_{eij={i,j}∈E} ‖γij − λij(pj − pi)‖².  (9)
The orthogonal distance decreases for an angle error
of more than 90 degrees. This means that the angle be-
tween γij and (pj−pi), and also the Euclidean distance
between them, can be increased while their orthogonal
distance is reduced. This can lead to undesirable solutions in methods based on this distance metric.
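A two-line numerical check of ours makes the failure mode concrete: for unit vectors the orthogonal distance ‖u × v‖ equals sin θ, so a direction that is nearly opposite to the observation scores a smaller error than a perpendicular one:

```python
import numpy as np

u = np.array([1.0, 0.0, 0.0])

def dir_at(deg):
    """Unit vector in the xy-plane at the given angle from u."""
    t = np.deg2rad(deg)
    return np.array([np.cos(t), np.sin(t), 0.0])

d_90 = np.linalg.norm(np.cross(u, dir_at(90)))    # sin(90 deg)  = 1.0
d_170 = np.linalg.norm(np.cross(u, dir_at(170)))  # sin(170 deg) ~ 0.17
```

A solver minimizing (4) can therefore prefer placing a vertex in almost exactly the wrong direction, which is what happens to vertices 5 and 6 in the example below.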
We construct a simple example to demonstrate the
problem of the orthogonal distance. Fig. 3 shows a graph with six vertices. Vertices 5 and 6, placed near
Fig. 3 The results of applying the CLS algorithm (green plus), the cross product minimization of [11] (purple cross), and the orthogonal projection minimization of [10] (orange stars). The graph vertices and edges are shown as blue dots and green lines.
the center of the graph, are closer to each other than the
other vertices. The rotation observations are assumed to be exact and the direction observations are perturbed by 5 degrees of error. The cross product method of [11], the ShapeFit method of [10], and the CLS algorithm are applied to the problem and the results are shown in the
figure. The relative positions of the first four vertices are close to their true relative positions for all three methods, and these vertices are located at the four corners of a rhombus. The positions of vertices 5 and 6 estimated by CLS are far from their true relative positions, but they are still inside the rhombus of the first four vertices. In contrast, the positions of vertices 5 and 6 estimated by the other two methods fall outside the rhombus, in a configuration where their positions relative to one of the vertices 1 and 2 point in the opposite direction of the true relative direction. Note that when the angle between the relative position and the direction observation is about 180 degrees, i.e. they point in opposite directions, the orthogonal distance between them is very small. Therefore, the cost value obtained by methods based on the orthogonal distance is minimized, even though the relative positions of some vertices are completely inconsistent with their true positions relative to the other vertices.
4 The Proposed Methods
To overcome the weakness of the linear constraint discussed in Section 3.1, we propose an iterative solver that is used in two different approaches, minimizing the sum of squared displacements error or the sum of squared directions error, to solve the viewing graph problem. The main idea is to use the λijs obtained by CLS in the first iteration and then replace them with fixed coefficients according to the positions obtained in the last iteration. We are not able to prove the
Algorithm 1
Input: Edges of a viewing graph (Rij, γlij, ∀{i,j} ∈ E)
       approach: “C” minimizing the commonly used objective function, which comprises the displacements error;
                 “O” minimizing the original objective function, which comprises the directions error.
Output: Estimated poses of vertices (Ri, pi, ∀i ∈ V)
1: Ri ← RS(Rij)  ▷ Find initial orientations by RS
2: γij ← Ri γlij, ∀ek = {i,j} ∈ E  ▷ Move directions to the global coordinate frame
3: G′ = {E′, V′} ← 1DSFM(G = {E, V})  ▷ Remove outliers by 1DSFM
4: {pi} ← CLS(γij)  ▷ Solve CLS to obtain positions
5: repeat
6:   if approach = “C” then
7:     λij ← ‖pj − pi‖, ∀e′k = {i,j} ∈ E′
8:     {pi} ← LS1(λij γij)  ▷ Solve (8) with fixed λijs to update positions
9:   else if approach = “O” then
10:    λij ← 1/‖pj − pi‖, ∀e′k = {i,j} ∈ E′
11:    {pi} ← LS2(λij, γij)  ▷ Solve (9) with fixed λijs to update positions
12:  end if
13: until convergence
14: return Ri, pi, ∀i ∈ V′
convergence of the iterative solvers, but we observe experimentally that all of our iterative solvers converge.
To minimize the sum of squared displacements error, the formulation of (8) is employed and the λijs are set to the distances between the corresponding endpoints of the vertices i and j obtained in the last iteration of the algorithm. Therefore, upon convergence, the algorithm yields λij = ‖pj − pi‖, where the pis are the estimated camera positions, and we are thus solving the actual optimization problem of (7).
The proposed algorithm, summarized in Alg. 1, consists of the following steps:
1. Use the RS algorithm [20] to solve (5) and find the
orientation of the vertices. (RS uses the Frobenius
norm for dR(.))
2. Use 1DSFM algorithm [35] to remove the outliers.
3. Solve (8) under the constraint λij ≥ 1 to estimate the pis.
4. Replace the parameters λij with fixed coefficients λij = ‖pj − pi‖.
5. Solve (8) with the fixed λijs, which is now an unconstrained linear least-squares problem. (We refer to it as LS1 in Alg. 1.)
6. Repeat the last two steps until convergence.
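The loop in steps 4–6 can be sketched in a few lines of NumPy. This is our illustration of Alg. 1C's inner iteration, not the paper's implementation; camera 0 is pinned at the origin so that the unconstrained LS1 system has a unique solution:

```python
import numpy as np

def ls1(edges, gammas, lams, n):
    """LS1: solve (8) for positions with the lambda_ij held fixed,
    i.e. stack the linear equations p_j - p_i = lam_ij * gamma_ij."""
    A = np.zeros((3 * len(edges), 3 * (n - 1)))
    b = np.zeros(3 * len(edges))
    for k, (i, j) in enumerate(edges):
        r = slice(3 * k, 3 * k + 3)
        if i > 0:
            A[r, 3 * (i - 1):3 * i] = -np.eye(3)
        if j > 0:
            A[r, 3 * (j - 1):3 * j] = np.eye(3)
        b[r] = lams[k] * gammas[k]
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.vstack([np.zeros(3), x.reshape(-1, 3)])  # camera 0 at origin

def alg1c_iterations(edges, gammas, P, iters=30):
    """Steps 4-6 of Alg. 1C: lam_ij <- ||p_j - p_i||, then re-solve LS1."""
    for _ in range(iters):
        lams = [np.linalg.norm(P[j] - P[i]) for i, j in edges]
        P = ls1(edges, gammas, lams, len(P))
    return P
```

Note that any configuration whose edge directions match the γijs exactly is a fixed point of this map, consistent with the convergence condition λij = ‖pj − pi‖ stated above.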
To minimize the directions error instead of displace-
ments error, we can use the formulation of (9). Therefore, the 4th and 5th steps of the algorithm are replaced by:
4. Replace parameters λij with fixed coefficients λij =
1/‖pj − pi‖.
Algorithm 2
Input: Edges of a viewing graph (Rij, γlij, ∀ek = {i,j} ∈ E)
       approach: “C” minimizing the commonly used objective function, which comprises the displacements error;
                 “O” minimizing the original objective function, which comprises the directions error.
Output: Estimated poses of vertices (Ri, pi, ∀i ∈ V)
1: Ri ← RS(Rij)  ▷ Find initial orientations by RS
2: γij ← Ri γlij, ∀ek = {i,j} ∈ E  ▷ Move directions to the global coordinate frame
3: G′ = {E′, V′} ← 1DSFM(G = {E, V})  ▷ Remove outliers by 1DSFM
4: {pi} ← CLS(γij)  ▷ Solve CLS to obtain positions
5: repeat
6:   γij ← Ri γlij, ∀ek = {i,j} ∈ E  ▷ Move directions to the global coordinate frame according to the updated rotations
7:   if approach = “C” then
8:     λij ← ‖pj − pi‖, ∀e′k = {i,j} ∈ E′
9:     wij ← 1, ∀e′k = {i,j} ∈ E′
10:  else if approach = “O” then
11:    λij ← ‖pj − pi‖, ∀e′k = {i,j} ∈ E′
12:    wij ← 1/‖pj − pi‖², ∀e′k = {i,j} ∈ E′
13:  end if
14:  {pi, Ri} ← PS(λij γlij, Rij, wij)  ▷ Use PS [20] to update positions and orientations
15: until convergence
16: return Ri, pi, ∀i ∈ V′
5. Solve (9) with the fixed λijs, which is now an unconstrained linear least-squares problem. (We refer to it as LS2 in Alg. 1.)
Fixing the parameters λij converts the direction observations to translation observations. This means that the given viewing graph is converted to a Pose-Graph, and the problem turns into a Pose-Graph optimization problem:
min_{pi,Ri} ∑_{eij={i,j}∈E} ‖Rij − Rj^T Ri‖F² + wij ‖λij γij − (pj − pi)‖².  (10)
The Pose-Graph optimization literature offers a wide variety of algorithms, including ones that solve for the orientations and positions of the vertices simultaneously [20,25]. This was the motivation for our second method, which optimizes the positions and the orientations together. The idea is to replace the 5th step of the algorithm with the state-of-the-art PS algorithm proposed in [20] to find the vertices' poses. The proposed algorithm is summarized in Alg. 2.
In both of our proposed algorithms, we use the displacement error in the cost function without any constraint to avoid the trivial solution. Since we use a good initialization obtained by the CLS algorithm, we do not see any
problems in the experiments. Because of the scale problem, using the displacement error in the full viewing graph loss, which was solved by turning the problem into a PGO in our method, seems unjustified. But using this error in the full viewing graph loss helps in getting an important insight that becomes clear later in the experiments.
5 Experiments
To evaluate the proposed algorithms, the challenging
real datasets of [35] are used. The camera pose estimates computed by Bundler [27] and presented in [35] are considered as the ground truth. In this section, Alg. 1C and Alg. 2C stand for using the displacements error in the proposed algorithms, and Alg. 1O and Alg. 2O stand for using the directions error in each iteration of the algorithms.¹
For the first step, the weakness of the constraint λij ≥ 1, discussed in Section 3.1, is shown experimentally. To this end, the orientations of the cameras are estimated by solving the rotation averaging problem of (5) using RS [20]. Outliers are removed by the 1DSFM algorithm, and then the positions are estimated by CLS, i.e. by solving (8) under the constraint λij ≥ 1. The results are compared to those of Alg. 1C. Fig. 4 compares the histograms of the ratio of the scale factors to the corresponding vertex distances obtained by CLS and Alg. 1C on different datasets. It can be seen that all scale factors obtained by Alg. 1C, unlike those of CLS, match their corresponding vertex distances.
In the following, the recovery performance of the proposed algorithms is evaluated in comparison to the most accurate algorithms proposed in the literature. The compared methods are the Cross method of [11], the CLS method used in [33,34], and the LUD method proposed in [23]. Since the problem has natural global rotation, translation, and scale ambiguities, a scaled Euclidean transformation should be found that aligns the camera positions to the ground truth as much as possible. Mathematically, we solve

min_{R,t,s} ∑_{i=1}^{n} ‖(s R pi + t) − p̂i‖²,  (11)
where R, t, and s are the rotation matrix, translation vector, and scale parameter of the scaled Euclidean transformation, and the p̂is are the ground-truth positions. The comparison criteria are the mean distance error (ē), the median distance error (ẽ), and the root-mean-square distance error (ê). The root-mean-square distance error
¹ An efficient implementation of the proposed algorithms in MATLAB is available via http://visionlab.ut.ac.ir/resources/vgorls.zip
is given by

ê = √( (1/m) ∑_{i=1}^{m} ‖pi − p̂i‖² ),  (12)

where m is the number of edges.
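The alignment of (11) has a closed-form solution (an Umeyama-style similarity registration). The sketch below is our illustration of how the error of (12) can then be computed, where `P` holds the estimated positions and `Q` the ground truth:

```python
import numpy as np

def align_similarity(P, Q):
    """Closed-form solution of (11): scale s, rotation R, translation t
    minimizing sum ||(s R p_i + t) - q_i||^2 (Umeyama-style alignment)."""
    mp, mq = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - mp, Q - mq
    U, S, Vt = np.linalg.svd(Qc.T @ Pc)     # cross-covariance of the clouds
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:           # keep R a proper rotation
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (Pc ** 2).sum()
    t = mq - s * R @ mp
    return s, R, t

def rmse(P, Q):
    """Root-mean-square distance error of (12) after alignment."""
    s, R, t = align_similarity(P, Q)
    aligned = (s * (R @ P.T)).T + t
    return np.sqrt(((aligned - Q) ** 2).sum(axis=1).mean())
```

Aligning before measuring is essential: without it, the arbitrary global gauge of the estimate would dominate the reported error.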
The histograms of the rotation and direction measurement errors, before and after applying 1DSFM, for Tower of London are shown in Fig. 5. The figure shows that most of the outliers were removed and the output contains few outliers. Similar results were obtained for the other datasets.
The results of various methods in different datasets
are listed in Table 1. The results show that in almost all
datasets, the proposed methods outperform the others.
Further, the comparison of Alg. 1 and Alg. 2 demonstrates that minimizing the cost function over the rotations and translations simultaneously results in better camera localization. Comparing the results of Alg. 1C to Alg. 1O and of Alg. 2C to Alg. 2O shows that, although the differences are small, minimizing the displacements error instead of the directions error results in better camera localization. Because of the scale problem discussed in the section on the proposed methods, using Alg. 2C is not justified. But interestingly, this algorithm shows the best performance, which indicates that the scales obtained by solving the problem with our proposed method are accurate. This shows the importance of using correct weights for combining the two terms of the viewing graph optimization problem.
6 Conclusion
In this paper, we proposed iterative methods for solving the camera location estimation problem. One of our proposed methods fixes the unknown scale factors, i.e. the λijs, which appear in the common cost function (8), in each iteration. Hence, the solution is obtained by solving iterative least-squares problems. The scale factors in each iteration are set to the distances between the corresponding vertices in the previous iteration. We observed experimentally that this helps the final scale factors converge to the distances of the corresponding vertices, i.e. λij → ‖pj − pi‖. This solves the problem of the CLS algorithm, which bunches up a significant number of scale factors at the lower bound.
Fixing the parameters λijs converts the direction
observations to translation observations and converts
the given viewing graph to a Pose-Graph. This moti-
vated us to use the state-of-the-art method of PS [20]
to obtain rotations and positions of cameras simulta-
neously and to propose the second algorithm. Obtain-
ing the best results by solving the full cost of viewing
Fig. 4 Comparing the scale factors obtained by CLS and Alg. 1C on different real datasets. The figures show the histograms of log10(λij/‖pi − pj‖) obtained by the methods on the different datasets.
Table 1 Performance comparison of various methods on the original datasets. The table lists the number of edges, m, and the number of vertices, n, after removing the outliers using 1DSFM. The comparison criteria are the root mean square error ê, the mean distance error ē, and the median distance error ẽ; each triple below is ê / ē / ẽ.

Dataset (m, n)                   | Cross             | CLS               | LUD               | Alg. 1C           | Alg. 1O           | Alg. 2C           | Alg. 2O
Alamo (46353, 479)               | 15.94/12.96/12.52 | 5.36/2.89/1.78    | 4.89/2.56/1.43    | 5.46/2.57/1.18    | 5.39/2.61/1.30    | 5.14/2.17/0.93    | 4.83/2.11/0.95
Ellis Island (4512, 193)         | 30.85/26.18/23.26 | 11.19/6.87/4.22   | 10.92/6.17/3.57   | 10.00/5.71/3.07   | 10.86/6.17/3.28   | 10.00/5.37/2.65   | 10.29/5.50/2.68
Madrid Metropolis (4150, 253)    | 51.45/35.65/29.23 | 21.66/14.65/11.36 | 20.63/13.47/10.16 | 18.55/12.23/8.51  | 20.33/13.20/10.01 | 16.90/10.95/7.05  | 17.28/10.98/6.79
Montreal Notre Dame (26330, 385) | 12.13/11.14/11.99 | 2.70/1.82/1.13    | 2.62/1.76/1.10    | 2.54/1.66/1.01    | 2.87/1.84/1.05    | 2.45/1.59/0.88    | 2.60/1.67/0.96
Notre Dame (45048, 501)          | 13.32/9.81/7.83   | 6.26/4.00/2.97    | 6.27/3.70/2.48    | 5.71/3.18/2.05    | 5.84/3.27/2.20    | 5.28/2.71/1.60    | 5.39/2.78/1.68
NYC Library (5614, 260)          | 15.29/13.80/14.05 | 4.31/3.14/2.28    | 4.04/2.82/1.88    | 5.06/3.55/2.41    | 4.60/3.15/2.05    | 4.85/3.03/1.92    | 4.87/2.96/1.82
Piazza del Popolo (12027, 234)   | 16.43/14.29/12.56 | 3.30/2.24/1.48    | 3.10/2.03/1.30    | 3.04/1.95/1.17    | 3.31/2.11/1.18    | 2.35/1.47/0.94    | 2.52/1.55/0.91
Tower of London (7539, 371)      | 76.41/66.35/60.20 | 39.01/26.75/19.02 | 36.52/24.32/15.37 | 34.16/20.49/11.86 | 38.13/25.15/16.79 | 32.99/18.19/9.26  | 35.24/20.69/10.96
Union Square (5919, 468)         | 18.73/14.84/10.81 | 12.91/9.02/6.61   | 12.67/8.68/6.27   | 13.17/9.07/6.74   | 12.89/9.02/6.89   | 12.80/8.43/5.75   | 12.90/8.71/6.19
Vienna Cathedral (36729, 624)    | 34.58/27.46/22.41 | 24.27/18.81/14.12 | 22.84/17.62/13.17 | 20.90/15.59/10.85 | 25.45/19.20/13.15 | 19.56/14.42/10.06 | 23.32/17.29/11.99
Yorkminster (10249, 317)         | 27.97/13.74/5.94  | 15.99/8.68/5.77   | 17.09/7.85/4.99   | 15.00/7.64/4.74   | 14.89/7.45/4.81   | 14.39/6.88/4.05   | 14.27/6.70/3.81
graph optimization problem is an important result. It is a step toward solving the challenging viewing graph optimization problem, and it calls on other researchers to put emphasis on developing algorithms that consider both terms of the viewing graph optimization problem together.
Both algorithms can solve the original cost function in which the chordal distance of the directions is minimized. But the experiments show that minimizing the commonly used displacement distance metric (weighting the directions error by the edge lengths) results in better performance on real datasets. This is an interesting result, because the displacement distance metric has a scale problem, which can be more important in the case of the full viewing graph optimization solved using our Alg. 2C. The best performance of Alg. 2C shows the importance of having good weights for combining the two terms of the viewing graph optimization problem. An interesting future direction of work is obtaining the covariance of the noise to derive accurate weights for combining the two terms in the loss function of viewing graph optimization.
We observed experimentally that all our proposed
algorithms converge. Another important line of future
Fig. 5 Histograms of the rotation and direction measurement errors, before and after applying the 1DSFM algorithm, for the “Tower of London” dataset.
research that we are undertaking is developing theoret-
ical convergence results for our proposed algorithms.
Appendix
The author of [11] proposed an iteratively reweighted algorithm for minimizing the orthogonal distance of the directions error, i.e. for finding the solution of (6) with the orthogonal distance. The strategy is to minimize the orthogonal distance of the displacements error, which can be solved in closed form, and then to reweight the errors by the squared distances of the edges. Although upon convergence the algorithm yields weights equal to the edge lengths, there is no guarantee that the algorithm converges.
We use a novel formulation in which the solution is directly the minimizer of (6) with the orthogonal distance metric. We use the cost function of (9), which places the multipliers λij in front of the second term. Problem (9) does not need any constraint, and the minimizer of its cost function is directly the solution of (6) with the orthogonal distance metric.
Suppose that the set {λ*ij, p*i} is the optimal solution of (9). Then we have, for every eij = {i,j} ∈ E and every λij ∈ ℝ,

‖γij − λ*ij(p*j − p*i)‖² ≤ ‖γij − λij(p*j − p*i)‖².  (13)

Since λij(p*j − p*i) ranges over all points of the line through the origin along p*j − p*i, it can be concluded from (13) that λ*ij(p*j − p*i) is the nearest point of this line to the endpoint of γij. The nearest point is obtained by projecting γij onto the line, and its Euclidean distance to γij equals the norm of the cross product of γij and (p*j − p*i)/‖p*j − p*i‖.
Problem (9) can be solved by a two-block coordinate descent approach. The algorithm repeats the following two steps: in one step, the cost is optimized over the pi's while the λij's are fixed, and in the next step the cost is optimized over the λij's while the pi's are fixed. Both steps are linear least-squares problems and can be solved in closed form. It is easy to show that the proposed coordinate descent algorithm satisfies the convergence conditions of Theorem 1 of [26]. Therefore, the algorithm finds the minimizer of the sum of squared orthogonal distances of direction errors. Although we were able to prove a convergence result for minimizing the orthogonal distance, this method performed poorly in our experiments.
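A minimal NumPy sketch of this two-block scheme follows. The function names, the gauge handling via the minimum-norm least-squares solution, and the random initialization are our assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def cost(p, lam, edges, gammas):
    """Sum of squared residuals ||gamma_ij - lambda_ij (p_j - p_i)||^2."""
    return sum(np.sum((gammas[k] - lam[k] * (p[j] - p[i])) ** 2)
               for k, (i, j) in enumerate(edges))

def two_block_descent(edges, gammas, n, iters=50, seed=0):
    """Alternate closed-form least-squares updates over {lambda_ij} and {p_i}."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(n, 3))
    lam = np.ones(len(edges))
    for _ in range(iters):
        # Block 1: with p fixed, the optimal lambda_ij projects gamma_ij
        # onto the direction p_j - p_i (a scalar least-squares problem).
        for k, (i, j) in enumerate(edges):
            d = p[j] - p[i]
            lam[k] = gammas[k] @ d / max(d @ d, 1e-12)
        # Block 2: with the lambdas fixed, the positions solve a linear
        # least-squares system; lstsq returns the minimum-norm minimizer,
        # which resolves the translation ambiguity.
        A = np.zeros((3 * len(edges), 3 * n))
        b = np.concatenate(gammas)
        for k, (i, j) in enumerate(edges):
            A[3*k:3*k+3, 3*j:3*j+3] = lam[k] * np.eye(3)
            A[3*k:3*k+3, 3*i:3*i+3] = -lam[k] * np.eye(3)
        p = np.linalg.lstsq(A, b, rcond=None)[0].reshape(n, 3)
    return p, lam
```

Since each block is minimized exactly, the cost is non-increasing over full sweeps, which is the monotonicity property used when invoking Theorem 1 of [26].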
References
1. Agarwal, S., Furukawa, Y., Snavely, N., Simon, I., Curless, B., Seitz, S.M., Szeliski, R.: Building Rome in a day. Communications of the ACM 54(10), 105–112 (2011)
2. Arie-Nachimson, M., Kovalsky, S.Z., Kemelmacher-Shlizerman, I., Singer, A., Basri, R.: Global motion estimation from point matches. In: International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pp. 81–88. IEEE (2012)
3. Arrigoni, F., Magri, L., Rossi, B., Fragneto, P., Fusiello, A.: Robust absolute rotation estimation via low-rank and sparse matrix decomposition. In: International Conference on 3D Vision, pp. 491–498. IEEE (2014)
4. Arrigoni, F., Rossi, B., Fragneto, P., Fusiello, A.: Robust synchronization in SO(3) and SE(3) via low-rank and sparse matrix decomposition. Computer Vision and Image Understanding 174, 95–113 (2018)
5. Byrod, M., Josephson, K., Astrom, K.: Improving numerical accuracy of Gröbner basis polynomial equation solvers. In: International Conference on Computer Vision, pp. 449–456. IEEE (2007)
6. Chatterjee, A., Govindu, V.M.: Robust relative rotation averaging. IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4), 958–972 (2017)
7. Crandall, D., Owens, A., Snavely, N., Huttenlocher, D.: Discrete-continuous optimization for large-scale structure from motion. In: Conference on Computer Vision and Pattern Recognition, pp. 3001–3008. IEEE (2011)
8. Frahm, J.M., Fite-Georgel, P., Gallup, D., Johnson, T., Raguram, R., Wu, C., Jen, Y.H., Dunn, E., Clipp, B., Lazebnik, S., et al.: Building Rome on a cloudless day. In: European Conference on Computer Vision, pp. 368–381. Springer (2010)
9. Furukawa, Y., Curless, B., Seitz, S.M., Szeliski, R.: Towards internet-scale multi-view stereo. In: Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1434–1441. IEEE (2010)
10. Goldstein, T., Hand, P., Lee, C., Voroninski, V., Soatto, S.: ShapeFit and ShapeKick for robust, scalable structure from motion. In: European Conference on Computer Vision, pp. 289–304. Springer (2016)
11. Govindu, V.M.: Combining two-view constraints for motion estimation. In: Computer Society Conference on Computer Vision and Pattern Recognition, pp. 218–225. IEEE (2001)
12. Hartley, R., Aftab, K., Trumpf, J.: L1 rotation averaging using the Weiszfeld algorithm. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3041–3048. IEEE (2011)
13. Hartley, R., Trumpf, J., Dai, Y., Li, H.: Rotation averaging. International Journal of Computer Vision 103(3), 267–305 (2013)
14. Hartley, R.I., Sturm, P.: Triangulation. Computer Vision and Image Understanding 68(2), 146–157 (1997)
15. Havlena, M., Torii, A., Knopp, J., Pajdla, T.: Randomized structure from motion based on atomic 3D models from camera triplets. In: Conference on Computer Vision and Pattern Recognition, pp. 2874–2881. IEEE (2009)
16. Jiang, N., Cui, Z., Tan, P.: A global linear method for camera pose registration. In: International Conference on Computer Vision, pp. 481–488. IEEE (2013)
17. Kukelova, Z., Bujnak, M., Pajdla, T.: Polynomial eigenvalue solutions to the 5-pt and 6-pt relative pose problems. In: British Machine Vision Conference, vol. 2, pp. 56.1–56.10 (2008)
18. Lourakis, M.I.A., Argyros, A.A.: SBA: A software package for generic sparse bundle adjustment. ACM Transactions on Mathematical Software (TOMS) 36(1), 1–30 (2009)
19. Moulon, P., Monasse, P., Marlet, R.: Global fusion of relative motions for robust, accurate and scalable structure from motion. In: International Conference on Computer Vision, pp. 3248–3255. IEEE (2013)
20. Nasiri, S.M., Hosseini, R., Moradi, H.: Novel parameterization for Gauss–Newton methods in 3D pose graph optimization. IEEE Transactions on Robotics (2020)
21. Nasiri, S.M., Moradi, H., Hosseini, R.: A linear least square initialization method for 3D pose graph optimization problem. In: International Conference on Robotics and Automation, pp. 2474–2479. IEEE (2018)
22. Nister, D.: An efficient solution to the five-point relative pose problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(6), 756–770 (2004)
23. Ozyesil, O., Singer, A.: Robust camera location estimation by convex programming. In: Conference on Computer Vision and Pattern Recognition, pp. 2674–2683. IEEE (2015)
24. Ozyesil, O., Singer, A., Basri, R.: Stable camera motion estimation using convex programming. SIAM Journal on Imaging Sciences 8(2), 1220–1262 (2015)
25. Rosen, D.M., Carlone, L., Bandeira, A.S., Leonard, J.J.: SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group. The International Journal of Robotics Research 38(2-3), 95–125 (2019)
26. Rouzban, S.M., Hosseini, R.: A rate of convergence for two-block coordinate descent. arXiv preprint arXiv:1901.08794 (2019)
27. Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3D. In: ACM SIGGRAPH 2006 Papers, pp. 835–846 (2006)
28. Snavely, N., Seitz, S.M., Szeliski, R.: Skeletal graphs for efficient structure from motion. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)
29. Stewenius, H., Schaffalitzky, F., Nister, D.: How hard is 3-view triangulation really? In: International Conference on Computer Vision, vol. 1, pp. 686–693. IEEE (2005)
30. Sweeney, C., Sattler, T., Hollerer, T., Turk, M., Pollefeys, M.: Optimizing the viewing graph for structure-from-motion. In: International Conference on Computer Vision, pp. 801–809. IEEE (2015)
31. Torr, P.H., Zisserman, A.: MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding 78(1), 138–156 (2000)
32. Triggs, B., McLauchlan, P.F., Hartley, R.I., Fitzgibbon, A.W.: Bundle adjustment—a modern synthesis. In: International Workshop on Vision Algorithms, pp. 298–372. Springer (1999)
33. Tron, R., Vidal, R.: Distributed image-based 3-D localization of camera sensor networks. In: Conference on Decision and Control held jointly with Chinese Control Conference, pp. 901–908. IEEE (2009)
34. Tron, R., Vidal, R.: Distributed 3-D localization of camera sensor networks from 2-D image measurements. IEEE Transactions on Automatic Control 59(12), 3325–3340 (2014)
35. Wilson, K., Snavely, N.: Robust global translations with 1DSfM. In: European Conference on Computer Vision, pp. 61–75. Springer (2014)
36. Wu, C.: Towards linear-time incremental structure from motion. In: International Conference on 3D Vision, pp. 127–134. IEEE (2013)
37. Zhu, S., Zhang, R., Zhou, L., Shen, T., Fang, T., Tan, P., Quan, L.: Very large-scale global SfM by distributed motion averaging. In: Conference on Computer Vision and Pattern Recognition, pp. 4568–4577. IEEE (2018)