ORIGINAL RESEARCH PAPER
Real-time camera orientation estimation based on vanishing point tracking under Manhattan World assumption
Wael Elloumi • Sylvie Treuillet • Rémy Leconge
Received: 3 June 2013 / Accepted: 20 March 2014 / Published online: 4 April 2014
© Springer-Verlag Berlin Heidelberg 2014
Abstract This paper proposes a real-time pipeline for estimating the camera orientation based on vanishing points for indoor navigation assistance on a Smartphone. The orientation of the embedded camera relies on the ability to find a reliable triplet of orthogonal vanishing points. The proposed pipeline introduces a novel sampling strategy over finite and infinite vanishing points, with a random sample consensus-based line clustering and a tracking along the video sequence, to enforce accuracy and robustness by extracting the three most pertinent orthogonal directions while preserving a short processing time for real-time application. Experiments on real images and video sequences acquired with a Smartphone show that the proposed strategy for selecting orthogonal vanishing points is pertinent, as our algorithm gives better results than the recently published RNS optimal method, in particular for the yaw angle, which is essential for the navigation task.
Keywords Vanishing point · Real-time camera orientation · RANSAC algorithm · Navigation assistance · Smartphone
1 Introduction
This paper deals with a practical approach for real-time
camera orientation for indoor navigation assistance on a
Smartphone. Pedestrian navigation assistance is a highly
challenging task that is increasingly needed in various
types of applications such as visually impaired guidance,
emergency intervention, tourism, etc. Despite the growing
research interest in this topic, there is no effective
operational system yet. This can be explained by several
technology roadblocks. Besides the absence of existing
cartography, the main obstacle remains the availability and
accuracy of the required position information.
Nearly all Smartphones have a built-in GPS sensor, but it
is still insufficient for pedestrian positioning because of its
low accuracy. Besides, it is often ineffective in urban areas
or in indoor environments, where satellite signal
reception is undermined. The alternative solution consists
in combining measurements from various additional self-contained
sensors (magnetometers, accelerometers, digital compass,
WiFi, Bluetooth, etc.) with a Kalman filter. The problem
with these alternative positioning methods is the
integration process, which results in large cumulative drift
errors (even over very short periods of time). Errors due to
electromagnetic perturbations by electric or ferrous mate-
rials are also often reported in related works [14]. In this
context, embedded computer vision is likely the most
attractive technique to increase positioning accuracy, since
high-quality cameras are available in Smartphones and
image processing is applicable in all environments requir-
ing no additional infrastructure installation. One conve-
nient way to compute a real-time camera orientation in
urban area is based on vanishing points.
Under perspective projection, parallel lines in the 3D scene
intersect in the image plane at a so-called vanishing point.
Electronic supplementary material The online version of this article (doi:10.1007/s11554-014-0419-9) contains supplementary material, which is available to authorized users.
W. Elloumi (✉) · S. Treuillet · R. Leconge
Laboratoire PRISME, Université d'Orléans, 12 rue de Blois,
45067 Orléans Cedex, France
e-mail: [email protected]
S. Treuillet
e-mail: [email protected]
R. Leconge
e-mail: [email protected]
J Real-Time Image Proc (2017) 13:669–684
DOI 10.1007/s11554-014-0419-9
Unlike finite vanishing points, whose coordinates are
determined in the image plane, the infinite ones, which lie
very far from the image center, are defined by a direction. In
man-made urban environments, many line segments are
oriented along three orthogonal directions aligned with the
global reference frame. Under this so-called Manhattan
World assumption, vanishing lines or points are pertinent
visual cues to estimate the camera orientation [1, 9, 13, 16,
18, 20, 24]. This is an interesting alternative to structure
and motion estimation based on feature matching, a sen-
sitive problem in computer vision. The orientation matrix
of a calibrated camera, parameterized with three angles,
may be efficiently computed from three noise-free
orthogonal vanishing points. Vanishing points are also used
for camera calibration [18, 26, 30]. However, these tech-
niques rely on the ability to find a robust orthogonal triplet
of vanishing points in a real image.
The estimation of the camera orientation is generally
computed in a single image. Few works address the
tracking along a video sequence [4, 20]. In a previous work
[11], we proposed a solution to achieve an accurate esti-
mation of the camera orientation based on a sampling
strategy among finite and infinite vanishing points extrac-
ted with random sample consensus (RANSAC)-based line
clustering.
This paper extends the previous conference paper. The
principal additional contribution of this paper is a real-time
implementation on a Smartphone. The experimental results
have been significantly extended to evaluate our method
against other state-of-the-art methods on real static images. The
performance evaluation additionally includes longer video
sequences acquired by a handheld Smartphone. The paper
also gives more explanations and clarifications than the
earlier conference paper.
The paper is organized as follows. A review of related
work is proposed in Sect. 2. Section 3 presents our method
for selecting and filtering three reliable orthogonal van-
ishing points which are used for estimating the camera
orientation. Experimental results are shown in Sect. 4, and
Sect. 5 concludes the paper.
2 Related work
Over the last 30 years, a broad literature has developed on the subject
of vanishing point (VP) computation. In
this section we review some of the most relevant works. The
first approaches used the Hough transform and accumula-
tion methods [6, 7, 19]. The efficiency of these methods
highly depends on the discretization of the accumulation
space and they are not robust in the presence of outliers
[29]. Furthermore, they do not consider the orthogonality
of the resulting VP. An exhaustive search method may take
care of the constraint of orthogonality [27], but is not
adapted for real-time application.
Few authors prefer to work on the raw pixels [10, 20];
published methods mainly work on straight lines extracted
from image. According to the mathematical formalization
of VP, some variants exist in the choice of the workspace:
image plane [27, 30], Hough transform [7, 31], projective
plane [25, 26] or Gaussian sphere [3, 8, 18, 19]. Using
Gaussian unit sphere or projective plane allows treating
equally finite and infinite VP, unlike image plane. This is a
well-suited representation for simultaneously clustering
lines that converge at multiple VP by using a probabilistic
expectation–maximization (EM) joint optimization
approach [1, 9, 18, 23, 25]. These approaches address the
misclassification and optimality issues, but the initializa-
tion and grouping are the determining factors of their
efficiency.
Recently, many authors have adopted robust estimation
based on RANSAC [12]. These approaches consider
intersections of line segments as VP hypotheses and then
iteratively cluster the parallel lines consistent with each
hypothesis [4, 13, 15, 22, 28, 32]. A variant by J-linkage
algorithm has been used by [30]. By dismissing the outli-
ers, the RANSAC-based classifiers are much more robust
than accumulative methods, limited by the size of the
accumulator cell, and give a more precise position of the
VP. They have been used to initialize EM estimators to
converge to the correct VP. Other optimal solutions rely on
analytical approaches, often based on time-consuming algo-
rithms [2, 5, 17, 24]. In this last paper, it is interesting to
note that, even if they are nondeterministic, the RANSAC-
based approaches yield results comparable to exhaus-
tive search in terms of the number of clustered lines. So, it remains
a very good approach for extracting the VP candidates, in
addition with a judicious strategy for selecting a triplet
consistent with the orthogonality constraint.
Despite numerous papers dedicated to the straight line
clustering to compute adequate vanishing points, this prob-
lem remains an open issue for real-time application in video
sequences. The estimation of the camera orientation is gen-
erally computed in a single image. As a summary, Table 1
classifies the most relevant works according to their search
space of VP (workspace) and the type of approach used.
To achieve an accurate estimation of the camera orien-
tation on video sequences acquired with a handheld
Smartphone, we develop a real-time pipeline based on
previous work [11]. Our proposed framework operates in
the image space, unlike [4] which uses the projection onto
the Gauss sphere, a workspace well adapted to omnidi-
rectional camera. It uses line segments as primitives, which
are easier image features to handle for VP extraction.
In addition, it allows a joint process of estimation and
classification of VP based on RANSAC. To preserve a
short processing time, only few VP candidates are selected
by splitting finite and infinite VP extracted with a RAN-
SAC-based line clustering. Indeed, the RANSAC algorithm
allows a robust classification of vanishing lines and an
accurate detection of their intersection. This classification
is easy to implement, fast, and requires no initialization, unlike
probabilistic models. Moreover, camera orientation track-
ing is reinforced by filtering the VP along the video
sequence.
3 Proposed pipeline
The proposed pipeline is given in Fig. 1. To achieve an
accurate estimation of the camera orientation based on
three reliable orthogonal VP, the pipeline is composed of
four steps. The first one consists of dominant line extrac-
tion from the detected image edges. The second one con-
sists of selecting a triplet of VP by dominant line clustering
with RANSAC. At this step, we introduce a guided search
strategy to select only three reliable orthogonal VP that
represent the orientation of the camera relative to 3D world
reference frame. Another contribution is the vanishing
point filtering performed along the video sequences (step 3)
to enforce the robustness of the camera orientation com-
putation (step 4). The next sections give some details and
justifications about each block.
3.1 Dominant line detection
Some pre-processing is introduced to improve the quality
and the robustness of the detected edges in the case of an
embedded camera: first, a histogram equalization harmo-
nizes the distribution of brightness levels in the image.
Secondly, a geometric correction of lens distortion is done
assuming that all camera parameters are estimated by
prerequisite calibration. To find the dominant lines, we first
detect edges using a Canny edge detector. Then, we use the
progressive probabilistic Hough transform (PPHT) [21], a
form of an adaptive probabilistic HT. The PPHT is faster
than both standard Hough transform and probabilistic
Hough transform which makes it well suited for real-time
application. Furthermore, this method enables the compu-
tation of line segment length. So, only line segments whose
length is greater than a threshold p1 are selected as domi-
nant lines to estimate VP. The PPHT algorithm requires
five parameters: two of them for the resolution of the
accumulator, or Hough space (ρ = 1 pixel and θ = 3°); the minimum number of intersections to detect a line in the
accumulator (d = 50); the minimum length of the selected line segments (p1 = 30 pixels); and the
maximum gap between two points to be considered on the
same line (p2 = 5 pixels). Note that all these parameters have been set empirically.
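As an illustration, the length filter on the PPHT output might look like the following (a minimal Python sketch, not the paper's C++ implementation; the segment format mirrors what OpenCV's `cv2.HoughLinesP` returns after reshaping, and `p1` is the threshold named in the text):

```python
import math

def segment_length(seg):
    # seg = ((x1, y1), (x2, y2)), e.g. one row of cv2.HoughLinesP output
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)

def dominant_lines(segments, p1=30.0):
    # keep only segments longer than p1 (30 pixels in the paper)
    return [s for s in segments if segment_length(s) > p1]
```

In a full pipeline, the `segments` list would come from `cv2.HoughLinesP(edges, rho=1, theta=..., threshold=50, minLineLength=30, maxLineGap=5)` applied to the Canny edge map.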
3.2 Vanishing point candidates
To provide three VP, aligned with the three main orthog-
onal directions of the Manhattan World, the most intuitive
method is to detect the intersection of dominant lines in
images. Two types of VP are considered: the finite ones
correspond to line interception that may be defined by
coordinates in the image plane; the infinite ones for lines
intercepting very far from the image center are modeled by
Table 1 Classification of vanishing point estimation algorithms
Type of approach Search space of vanishing points
Image Hough transform Gaussian sphere Projective plane
Accumulation methods Rother [27] Cantoni et al. [7] Boulanger et al. [6]
Tuytelaars et al. [31] Shufelt [29]
Lutton et al. [19]
Barnard [3]
RANSAC-based methods Saurer et al. [28] Hwangbo and Kanade [15]
Tardif [30] Förstner [13]
Kalantari et al. [17] Bazin and Pollefeys [4]
Micusik et al. [22]
Wildenauer and Vincze [32]
Elloumi et al. [11]
Probabilistic methods Bazin et al. [5] Kosecka and Zhang [18] Nieto and Salgado [25]
Mirzaei and Roumeliotis [24] Antone and Teller [1] Pflugfelder and Bischof [26]
Barinova et al. [2] Collins and Weiss [8]
Mingawa et al. [23]
a direction. Unlike the Gauss sphere, working in the image
plane needs this differentiation between finite and infinite
points, but is faster since no transformation is required.
Recently, numerous authors have adopted RANSAC as a simple
and powerful method to provide a partition of parallel
straight lines into clusters by pruning outliers [13, 15, 22,
28, 32]. The process starts by randomly selecting two lines
to generate a VP hypothesis. Then, all lines consistent with
this hypothesis are grouped together to optimize the VP
estimate. Once a dominant VP is detected, all the associ-
ated lines are removed and the process is repeated to detect
the next dominant VP. The principal drawback of this
sequential search is that no orthogonality constraint is
imposed for selecting a reliable set of three VP to compute
the camera orientation. Very recent works propose optimal
estimates of three orthogonal VP by an analytical approach
based on a multivariate polynomial system solver [24] or
by optimization approach based on interval analysis theory
[5], but at the expenses of complex time-consuming
algorithms.
In this work, we introduce a guided search strategy to
extract a limited number of reliable VP while enforcing the
orthogonality constraint, in conjunction with RANSAC.
3.3 VP sampling strategy
In the context of indoor navigation, the main orthogonal
directions in the Manhattan World generally consist of a vertical
one (often associated with an infinite VP) and two horizontal
ones (associated with finite or infinite VP). So, we propose a
fast and robust sampling strategy taking advantage of the
differentiation between finite and infinite VP.
Three different possible configurations depending on the
alignment of the image plane with the 3D urban scene are
considered: (1) one finite and two infinite VP, (2) two finite
and one infinite VP, (3) three finite VP. The first two
configurations are common, unlike the third. More details
about the computation of the camera orientation depending
on these three configurations will be given in Sect. 3.5. To
avoid confusion between finite VP and infinite ones, VP
detection is carried out in two steps: first, we detect finite
VP; then, we detect the infinite ones. For a robust selection
of VP, the proposed algorithm starts with the detection of
three finite candidates. For this, we use RANSAC to randomly
choose a pair of line segments $(l_i, l_j)$ from the set of
unlabeled lines L. The intersection of these two lines provides
a potential finite VP ($fvp_{pot}$). To extract the lines participating
in $fvp_{pot}$, we compute a consensus score given by Eqs. (1) and (2):

$$\mathrm{Score} = \sum_{i=0}^{n} f(v, l_i) \quad (1)$$

where n is the number of dominant lines of the set L, and

$$f(v, l_i) = \begin{cases} 1, & d(v, l_i) < \delta \\ 0, & \text{otherwise} \end{cases} \quad (2)$$
Fig. 1 Overview of the proposed algorithm
where $d(v, l_i)$ is the Euclidean distance from the finite VP candidate $v$ to the line $l_i$.
These three steps are repeated k times, where k is the number
of random combinations for selecting a pair of line segments
from L (the number of iterations performed by the algorithm).
Note that, for better performance, k must be proportional to
the number of line segments. Specifically, in our experiments
this parameter is set to $k = 2 \times \mathrm{card}(L)$. The potential
VP ($fvp_{pot}$) with the highest number of participant lines is
selected as the first finite VP ($v_1$). All dominant lines
associated with $v_1$ are labeled and excluded from L, and the
process is repeated twice to detect $v_2$ and $v_3$.
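The finite-VP sampling loop described above can be sketched in pure Python (an illustrative sketch, not the paper's C++ implementation; the segment format and helper names are our own, with lines handled as homogeneous coefficients):

```python
import math
import random

def line_through(p, q):
    # homogeneous coefficients (a, b, c) of the line a*x + b*y + c = 0
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def cross(l1, l2):
    # cross product: homogeneous intersection point of two lines
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    return (b1 * c2 - b2 * c1, a2 * c1 - a1 * c2, a1 * b2 - a2 * b1)

def point_line_dist(pt, seg):
    # Euclidean distance d(v, l_i) used in the consensus score (Eq. 2)
    a, b, c = line_through(*seg)
    x, y = pt
    return abs(a * x + b * y + c) / math.hypot(a, b)

def ransac_finite_vp(segments, delta=4.0, rng=random):
    # k = 2 * card(L) iterations, as set in the paper
    best_vp, best_inliers = None, []
    for _ in range(2 * len(segments)):
        si, sj = rng.sample(segments, 2)
        x, y, w = cross(line_through(*si), line_through(*sj))
        if abs(w) < 1e-9:   # (near-)parallel pair: an infinite VP, skipped here
            continue
        vp = (x / w, y / w)
        inliers = [s for s in segments if point_line_dist(vp, s) < delta]
        if len(inliers) > len(best_inliers):
            best_vp, best_inliers = vp, inliers
    return best_vp, best_inliers
```

For the second and third finite VP, the inlier segments of $v_1$ would be removed from L and the loop repeated, as described in the text.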
The procedure used to detect two infinite VP is similar
to that of finite VP except that the aim is to identify the
vanishing line direction instead of their intersection. For
this, we select randomly a line segment from the remaining
dominant lines of the set L. Then, we compute its direction
which gives us a potential infinite VP ($ivp_{pot}$). The lines
associated with $ivp_{pot}$ are obtained using the consensus
score given by Eqs. (1) and (3):

$$f(v, l_i) = \begin{cases} 1, & \min\!\big(\angle(\vec{v}, \vec{l_i}),\ \angle(\vec{l_i}, \vec{v})\big) < \tau \\ 0, & \text{otherwise} \end{cases} \quad (3)$$

where $\angle(\vec{v}, \vec{l_i})$ is the angle between the infinite VP direction
from the image center and the line $l_i$ to test in image
space. Note that, in this case, $k = \mathrm{card}(L)$: k is the number
of random combinations for selecting a single line segment from
L (the number of iterations performed by the algorithm).
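The angular consensus for infinite VP can be sketched as follows (an illustrative pure-Python sketch under our own segment format; orientations are treated as undirected, which is what the minimum over the two angle differences in Eq. (3) achieves):

```python
import math
import random

def seg_angle(seg):
    # undirected orientation of a segment, in [0, pi)
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def angular_dist(a, b):
    # Eq. (3): minimum of the two possible orientation differences
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def ransac_infinite_vp(segments, tau=math.radians(4), rng=random):
    # k = card(L) single-segment draws, as set in the paper
    best_dir, best_inliers = None, []
    for _ in range(len(segments)):
        direction = seg_angle(rng.choice(segments))
        inliers = [s for s in segments
                   if angular_dist(direction, seg_angle(s)) < tau]
        if len(inliers) > len(best_inliers):
            best_dir, best_inliers = direction, inliers
    return best_dir, best_inliers
```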
At the end of this procedure, we obtain three finite VP
and two infinite VP. The three VP that best satisfy the
orthogonality constraint are selected as shown in Fig. 2.
The criteria used in the consensus score (Eq. 1) for
clustering lines by RANSAC are different depending on
each category. Unlike the finite VP whose coordinates may
be determined in the image plane, the infinite VP are
generally represented as a direction. For finite VP, the
consensus score is based on a distance between the can-
didate straight line and the intersecting point (Eq. 2). All
lines whose distance is below a fixed threshold δ are
considered as participants (δ is equal to 4 pixels in our
experiments). For infinite VP, it uses an angular distance
between the direction of the candidate straight line and the
direction representing the infinite VP (Eq. 3). The threshold
τ is equal to 4° in our experiments.

To avoid redundant VP candidates, we introduce a
supplementary constraint: the VP must be far enough from each
other. We impose a minimum angular distance between their
directions from the image center (the threshold is set to 30°
for finite VP and 60° for infinite ones).

By separating finite from infinite VP, the sampling
strategy provides the most significant of them without
giving more importance to one or the other category (we
enforce having at least one candidate finite). Furthermore,
this VP sampling strategy is faster as we detect only five
reliable VP candidates against generally much more for the
previous published methods.
Among the five candidates selected before, only three
VP whose directions from the optical center are mutually
orthogonal have to be accepted, including at least one finite VP. We
adopt the following heuristic: (1) choose the finite VP with
the highest consensus score, (2) select two other VP (finite
or infinite) based on their orthogonality to the first one, and
considering their consensus score as a second criterion.
Finally, we identify the vertical VP and the two horizontal
ones. In our application, we assume that the camera is kept
upright: we identify the vertical VP as that which presents
the closest direction to the vertical direction from the
image center. The two remaining VP are thus horizontal.
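The selection heuristic above can be sketched as follows (a minimal sketch under assumptions of our own: the candidate format, the 10° orthogonality tolerance, and the helper names are illustrative, not the paper's):

```python
import math

def vp_dir(vp, f):
    # unit 3-D direction of a VP from the optical centre
    # (principal point assumed at the image origin)
    if vp["finite"]:
        x, y = vp["pos"]
        v = (x, y, -f)
    else:
        dx, dy = vp["dir"]
        v = (dx, dy, 0.0)
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def pick_orthogonal_triplet(candidates, f, tol=math.radians(10)):
    # candidates: dicts with keys "finite", "pos" or "dir", and "score"
    finite = [c for c in candidates if c["finite"]]
    first = max(finite, key=lambda c: c["score"])       # step (1): best finite VP
    d0 = vp_dir(first, f)
    ortho = []
    for c in candidates:
        if c is first:
            continue
        dot = sum(a * b for a, b in zip(d0, vp_dir(c, f)))
        angle = math.acos(max(-1.0, min(1.0, dot)))
        if abs(math.pi / 2 - angle) < tol:              # step (2): near-orthogonal
            ortho.append(c)
    ortho.sort(key=lambda c: c["score"], reverse=True)  # score as second criterion
    return [first] + ortho[:2]
```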
3.4 Vanishing point filtering
Once the whole described algorithm has been processed for the
first frame of the video sequence (N > 1), the VP positions
can be tracked from one frame to another. Indeed, VP
positions or directions are only slightly modified in video
sequences, or even in a list of successive frames. So, we
introduce a filtering test to check the consistency between
the positions of the estimated VP in frame N and those
Fig. 2 Illustration of our VP sampling strategy. a Image of dominant lines, b detection of the three pertinent finite VP, c detection of the two pertinent infinite VP, d selection of the three orthogonal VP
estimated in frame N − 1. For this, we use the distance
between the positions of the VP for the finite ones,
$d(v_N, v_{N-1})$, and the angle between the VP directions for
the infinite ones, $\angle(\vec{v}_N, \vec{v}_{N-1})$. When a VP is not coherent
with its previous position or direction, it is re-estimated
taking into account its previous position or direction and
using the remaining unclassified lines. Hence, aberrant VP
are discarded and replaced by new VP that are, at the same
time, consistent with the previous ones and satisfy the
orthogonality constraint.
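The per-frame consistency test can be sketched as follows (an illustrative sketch; the thresholds `d_max` and `ang_max` are assumed values of our own, as the paper does not state them, and the candidate format matches the earlier sketches):

```python
import math

def direction_of(vp):
    # 2-D direction of a VP from the image centre (principal point at origin)
    dx, dy = vp["pos"] if vp["finite"] else vp["dir"]
    n = math.hypot(dx, dy)
    return (dx / n, dy / n)

def consistent(vp_now, vp_prev, d_max=20.0, ang_max=math.radians(5)):
    # finite/finite: positional distance d(v_N, v_{N-1});
    # otherwise: angle between VP directions, sign-insensitive,
    # which also covers the infinite-becomes-finite case in the text
    if vp_now["finite"] and vp_prev["finite"]:
        (x1, y1), (x2, y2) = vp_now["pos"], vp_prev["pos"]
        return math.hypot(x1 - x2, y1 - y2) < d_max
    d1, d2 = direction_of(vp_now), direction_of(vp_prev)
    dot = abs(d1[0] * d2[0] + d1[1] * d2[1])
    return math.acos(min(1.0, dot)) < ang_max
```

A VP failing this test would be re-estimated from the still-unclassified lines, as described above.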
However, in a video sequence, it can happen that,
because of the camera motion, an infinite vanishing point
becomes finite or vice versa. This situation occurs especially
when turning, and it affects one of the
horizontal VP. There are two possible situations:
1. A finite VP becomes infinite To address this problem,
we predict the position and the type (finite or infinite)
of the VP using the position of the horizontal finite VP
that gives the camera heading. After verifying the
consistency of this horizontal finite VP, we compute
the distance d that separates its positions in frames N and N − 1. Then, we predict the position of the
second horizontal VP. Finally, we can determine the
type of the second horizontal VP depending on
whether the midpoint of the segment that joins its
predicted position to the first horizontal VP is located
inside the image plane or not. For a VP that goes away
from the center of the image, the midpoint coordinates
are outside the image and the VP will next be
considered as infinite.
2. An infinite VP becomes finite This means that the
intersection point of the lines associated with the VP is no
longer very far from the image center. Therefore, the
VP is detected as finite. In this case, the consistency
criterion for infinite VP remains valid since we can
compute the angular distance between the VP direc-
tions. For this, we just estimate the direction of the VP
in the current image from the image center (finite VP)
and compare it to its previous direction (infinite VP).
Vanishing point filtering is efficient, since it makes our
algorithm much more stable and robust, as will be shown in
the experiments section, making the standard
optimization steps after RANSAC unnecessary. Once the three most
reliable VP are extracted in the image, the camera orientation
is computed frame by frame as described in the next section.
3.5 Computation of the camera orientation
This part is directly inspired by [6] to compute the camera
orientation from the three VP, supposed to be orthogonal.
We use the directions of the detected VP, which correspond
to the camera orientation, to compute the rotation
matrix $(\vec{u}, \vec{v}, \vec{w})$. The vectors $\vec{u}$, $\vec{v}$ and $\vec{w}$ represent three
orthogonal directions of the scene, respectively: the first
horizontal direction, the vertical direction and the second
horizontal direction. They need to satisfy the following
orthonormal relations:

$$\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{w} = \vec{w} \cdot \vec{u} = 0, \qquad \|\vec{u}\| = \|\vec{v}\| = \|\vec{w}\| = 1 \quad (4)$$
Figure 3 depicts an example of the camera orientation
estimation with three orthogonal VP (two finite and one
infinite VP). $(\vec{X}, \vec{Y}, \vec{Z})$ is the camera coordinate frame,
$(\alpha, \psi, \rho)$ are the three Euler angles relating to the camera,
representing the yaw, the pitch and the roll, respectively, $V_1$
and $V_2$ are the two finite VP and $\vec{I}_1$ the infinite VP direction.
The estimation of the camera orientation depends on the
VP configurations [6].
3.5.1 One finite and two infinite VP
This situation is the most frequent one. It occurs when the
image plane is aligned with two axes of the world coordinate
frame, as shown in Fig. 4. Let V be the finite VP and
f the focal length. The direction of V can be expressed as
$\vec{OV} = (V_{1x}, V_{1y}, -f)^T$, whereas the directions of the infinite
VP, in image space, are $\vec{I}_1 = (I_{1x}, I_{1y}, 0)^T$ and
$\vec{I}_2 = (I_{2x}, I_{2y}, 0)^T$. The vectors of the rotation matrix are
given by the following system of equations [6]:

$$\begin{cases} \vec{u} = \left(I_{1x},\ I_{1y},\ \dfrac{I_{1x} V_{1x} + I_{1y} V_{1y}}{f}\right)^T \\[6pt] \vec{v} = \left(I_{2x},\ I_{2y},\ \dfrac{I_{2x} V_{1x} + I_{2y} V_{1y}}{f}\right)^T \\[6pt] \vec{w} = -\vec{OV} = \left(-V_{1x},\ -V_{1y},\ f\right)^T \end{cases} \quad (5)$$
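Eq. (5) can be checked numerically with a minimal sketch (helper names are ours; the rows are normalized afterwards, since Eq. (4) requires unit norm):

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def rotation_from_vp(finite_vp, inf1, inf2, f):
    # Eq. (5): one finite VP (V1x, V1y) and two infinite VP directions I1, I2
    V1x, V1y = finite_vp
    I1x, I1y = inf1
    I2x, I2y = inf2
    u = (I1x, I1y, (I1x * V1x + I1y * V1y) / f)
    v = (I2x, I2y, (I2x * V1x + I2y * V1y) / f)
    w = (-V1x, -V1y, f)                    # -OV
    # normalize so the rows satisfy the unit-norm part of Eq. (4)
    return normalize(u), normalize(v), normalize(w)
```

By construction, the third component of $\vec{u}$ (and $\vec{v}$) makes its dot product with $\vec{w}$ vanish, which is why Eq. (5) yields mutually orthogonal directions for ideal data.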
3.5.2 Two finite and one infinite VP
This situation happens when the image plane is aligned
with only one of the three axes of the world coordinate
frame (Fig. 3). Let $V_1$ and $V_2$ be the two finite VP of
directions $\vec{OV}_1 = (V_{1x}, V_{1y}, -f)^T$ and
$\vec{OV}_2 = (V_{2x}, V_{2y}, -f)^T$. Since there are two finite horizontal VP,
we set $\vec{w}$ to the closest VP to the image center. The vector $\vec{v}$
is obtained by cross product, as shown in the system of
equations below [6]:

$$\begin{cases} \vec{u} = \left(-V_{1x},\ -V_{1y},\ f\right)^T \\ \vec{w} = \left(-V_{2x},\ -V_{2y},\ f\right)^T \\ \vec{v} = \vec{u} \times \vec{w} \end{cases} \quad (6)$$
3.5.3 Three finite VP
This last configuration is the least frequent one. It occurs
when there is no alignment between the image plane and
the world coordinate frame as depicted in Fig. 5.
Let $V_1$, $V_2$ and $V_3$ be the three finite VP of directions
$\vec{OV}_1 = (V_{1x}, V_{1y}, -f)^T$, $\vec{OV}_2 = (V_{2x}, V_{2y}, -f)^T$ and $\vec{OV}_3 =
(V_{3x}, V_{3y}, -f)^T$. We start by setting $\vec{v}$ to the VP whose
direction is closest to the vertical direction. We then set $\vec{w}$
to the closest VP to the image center. In the system of Eq.
(7) [6], we assume that $V_2$ is the vertical VP and $V_3$ is
closest to the image center:

$$\begin{cases} \vec{u} = \left(-V_{1x},\ -V_{1y},\ f\right)^T \\ \vec{v} = \left(-V_{2x},\ -V_{2y},\ f\right)^T \\ \vec{w} = \left(-V_{3x},\ -V_{3y},\ f\right)^T \end{cases} \quad (7)$$
4 Experimental results
This section presents the performance evaluation of the
proposed method on real static images and video
sequences.
4.1 Accuracy study
For comparison purposes, we have tested our algorithm on
the public York Urban Database (YUD) against two other
approaches for estimating the VP quoted in the related
work section:

• A Hough-based accumulation approach, ATIP, proposed by [6].
• An analytical method, RNS, recently published by [24], which provides optimal least-squares estimates of three orthogonal vanishing points using a RANSAC-based classification of lines.
This database provides the original images, camera
calibration parameters, ground truth line segments and the
three Manhattan vanishing points to compute the three
Euler angles relating to the camera orientation for each
image [10]. Figure 6 illustrates some orthogonal vanishing
points and their associated parallel lines extracted by our
algorithm on some images pulled out of the YUD. Van-
ishing lines are colored according to the vanishing point
they meet at.
Table 2 presents a comparative study of the angular
distance from ground truth (GT) of the camera orientation
Fig. 4 Computation of the camera orientation with one finite VP and two infinite VP
Fig. 5 Computation of the camera orientation with three finite VP
Fig. 3 Camera coordinate frame
computed, respectively, with the proposed framework,
RNS method, and the ATIP method. The average and
standard deviation of the angular distance are performed on
a set of 50 images for the three angles. The three last rows
of Table 2 give the number of times the distance exceeds a
fixed value of 2°, 5° and 10°, respectively. Our method
produces accurate estimates of the camera orientation, since
the angular distance remains below 2° for most images.
The ATIP method is obviously less reliable, as it may
misclassify parallel lines in the presence of outliers and
does not include the orthogonality constraints between the
vanishing points. By using RANSAC-based classification
of lines, the other methods remove the outliers from the
parallel groups.
The RNS method gives the best result for the pitch
angle, but it is interesting to note that our method is sig-
nificantly better for the yaw and roll angles. The yaw is
actually essential for a pedestrian navigation task, since it
gives the camera viewing direction. This may be explained
by our strategy for selecting orthogonal vanishing points
that are distant enough from each other without confusion
between finite and infinite points.
The angular deviation of the three Euler angles esti-
mated by our proposed framework, RNS and ATIP meth-
ods, compared to the GT, is reported in Fig. 7a. Figure 7b,
c provides the maximum and the minimum angular devi-
ation of the estimated vanishing points from the GT,
respectively. We note that, using our proposed framework,
78 % of the tested images provide an angular deviation
error below 2°, while the RNS and ATIP methods give only 68 and 18 %, respectively.
Our method and ATIP have been implemented using
Visual C++ and the OpenCV library, an open-source
library specialized in computer vision applications. The
Fig. 6 Examples of some triplets of orthogonal VP detected by our algorithm on images from YUD. Magenta lines correspond to the vertical VP, while blue and green lines correspond to the two horizontal VP
Table 2 Accuracy validation of the camera orientation computation
Our method ang. dist. from GT RNS ang. dist. from GT ATIP ang. dist. from GT
Pitch Yaw Roll Pitch Yaw Roll Pitch Yaw Roll
Average 1.38 0.75 0.69 0.74 1.70 1.81 3.80 4.36 1.80
Standard deviation 1.57 0.60 0.65 0.62 1.82 1.88 4.74 5.69 2.67
>2°   8   3   1    1   13   16   23   27   11
>5°   2   0   0    0    2    2   14   13    6
>10°  0   0   0    0    1    1    5    4    1
Comparative study of the methods with regard to average and standard deviation of the angular distance from the GT (in degrees)
full results of the RNS method are available in a technical
report provided online by the authors (http://umn.edu/~faraz/vp).

Concerning the processing time for estimating the three
orthogonal vanishing points on YUD images of size
640 × 480 pixels, our algorithm takes ~86 ms against
~1.12 s for the ATIP method on an Intel Core 2 Duo processor,
without considering the step of line extraction. For
RNS, the authors mention that their current C++
implementation of the symbolic-numeric method takes
~290 ms to find the solutions on a Core 2 Duo processor.
4.2 Tracking the camera orientation on a Smartphone
To show the efficiency of our algorithm for tracking the
camera orientation, we acquired real video sequences with a
handheld Smartphone. Figure 8 illustrates our experimen-
tal prototype. The whole image processing chain presented
Fig. 7 Comparison of the accuracy of our proposed framework against the RNS and ATIP methods. Cumulative histograms of: a the angular deviation of the three Euler angles from the GT (in degrees), b the maximum angular deviation from the GT and c the minimum angular deviation from the GT
above has been implemented on an Android-based Smartphone
(Samsung GALAXY S III) using the OpenCV4Android
SDK. The Smartphone's camera was first calibrated
using Bouguet's camera calibration toolbox
(http://www.vision.caltech.edu/bouguetj/calib_doc/).
Figures 9 and 10 depict some typical results of vanish-
ing point extraction for two sequences acquired at the
premises of Polytech' Orléans. The first one, referred to as
''Seq1'', is composed of 475 frames of size 320 × 240
pixels. The second one, referred to as ''Seq2'', contains 478
Fig. 8 Our experimental prototype
Fig. 9 Examples of detection and tracking of triplets of orthogonal
vanishing points and their associated lines in the video sequence
referred to as ''Seq1''. Vanishing lines are colored according to the
vanishing point at which they meet. Below each example image, a 3D
vector representation illustrating the estimated angles is shown
images. The vanishing lines are shown colored according to the vanishing point they have been associated with. For each example image, the camera orientation is shown below by a 3D vector representation.
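As an illustrative sketch of how such an orientation can be derived from a triplet of finite orthogonal vanishing points (assuming a calibrated pinhole camera with intrinsic matrix K; the function names, the ZYX Euler convention and the SVD re-orthogonalization are our assumptions, not the paper's exact implementation):

```python
import numpy as np

def orientation_from_vps(K, vps):
    """Build a rotation matrix whose columns are the unit directions of
    three orthogonal vanishing points, back-projected through K.
    Note: each VP direction is only defined up to sign; a real system
    must disambiguate signs, e.g. by temporal consistency."""
    dirs = []
    for vp in vps:
        d = np.linalg.solve(K, np.array([vp[0], vp[1], 1.0]))
        dirs.append(d / np.linalg.norm(d))
    R = np.column_stack(dirs)
    # Re-orthogonalize with an SVD to absorb detection noise
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt

def euler_angles(R):
    """Pitch, yaw, roll (degrees) from a rotation matrix, ZYX convention."""
    pitch = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0])))
    yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    return pitch, yaw, roll
```

With noise-free synthetic vanishing points obtained by projecting the columns of a known rotation through K, the recovered matrix matches the ground truth up to numerical precision.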
Figures 11 and 12 compare the evolution of the pitch, yaw and roll angles of the camera orientation along the two sequences when applying our method with and without the vanishing point filtering (VPF). The VPF yields a smoother and more reliable estimation of the camera orientation along the video sequence. Since the VPF removes some aberrant vanishing points, keeping only those that are consistent, a more accurate camera orientation is obtained.
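The idea of rejecting temporally inconsistent estimates can be sketched as follows (an illustrative simplification with a hypothetical threshold; the paper's VPF operates on the vanishing points themselves, not on the Euler angles):

```python
import numpy as np

def filter_orientation(prev_angles, new_angles, max_jump_deg=10.0):
    """Reject an aberrant estimate: if any Euler angle jumps by more than
    max_jump_deg between consecutive frames, keep the previous estimate.
    prev_angles is None for the first frame of the sequence."""
    if prev_angles is None:
        return new_angles
    jump = np.abs(np.asarray(new_angles) - np.asarray(prev_angles))
    if np.any(jump > max_jump_deg):
        return prev_angles  # aberrant: hold the last consistent orientation
    return new_angles
```

Applied frame by frame, such a consistency test suppresses the isolated spikes visible in the unfiltered curves while leaving smooth camera motion untouched.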
To illustrate the efficiency of the proposed sampling strategy of vanishing points based on line clustering with RANSAC, Figs. 13 and 14 show the evolution of the number of vanishing lines extracted along the two sequences. These two figures report the total number of vanishing lines as well as the number of lines contributing to the three VPs (inliers). They also depict the subset of lines associated with the two horizontal VPs and the subset associated with the vertical VP. Clearly, the RANSAC-based classification of lines removes the outliers.
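A minimal sketch of such a RANSAC-based clustering, for a single finite vanishing point (illustrative only: the function names, the two-line hypothesis and the inlier threshold are our assumptions, and the paper's sampling strategy over finite and infinite VPs is richer):

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points, normalized so that
    |l . (x, y, 1)| is the point-line distance in pixels."""
    l = np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
    return l / np.linalg.norm(l[:2])

def ransac_vp(lines, iters=200, thresh=2.0, rng=None):
    """RANSAC sketch: hypothesize a VP as the intersection of two random
    lines and keep the hypothesis supported by the most lines (distance
    below thresh). Returns the VP (x, y, 1) and the inlier index list."""
    rng = np.random.default_rng(rng)
    best_vp, best_inliers = None, []
    for _ in range(iters):
        i, j = rng.choice(len(lines), size=2, replace=False)
        vp = np.cross(lines[i], lines[j])   # intersection of the pair
        if abs(vp[2]) < 1e-9:               # near-infinite VP: skip here
            continue
        vp = vp / vp[2]
        inliers = [k for k, l in enumerate(lines) if abs(l @ vp) < thresh]
        if len(inliers) > len(best_inliers):
            best_vp, best_inliers = vp, inliers
    return best_vp, best_inliers
```

Lines that support no dominant vanishing point are simply never counted as inliers, which is the outlier-removal effect visible in Figs. 13 and 14.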
The whole algorithm for estimating the three orthogonal
vanishing points including line segment detection and
orientation estimation runs in about 100 ms per frame of size 320 × 240 pixels on the Galaxy S III, depending on
the scene and the number of detected lines.
5 Conclusion
Selecting three reliable orthogonal vanishing points in images, corresponding to the Manhattan directions, is a prerequisite for an accurate estimation of the
Fig. 10 Examples of detection and tracking of triplets of orthogonal vanishing points and their associated lines in the video sequence referred to as "Seq2". Line segments are colored according to the vanishing point at which they meet. Below each example image, a 3D vector representation illustrates the estimated angles
Fig. 11 Smoothing effect of the vanishing point filtering (VPF) on the estimation of the camera's orientation (pitch, yaw and roll angles) along the video sequence referred to as "Seq1"
Fig. 12 Smoothing effect of the vanishing point filtering (VPF) on the estimation of the camera's orientation (pitch, yaw and roll angles) along the video sequence referred to as "Seq2"
camera orientation. In this paper, we present and test a
solution for real-time application on a Smartphone. Our
algorithm relies on a novel sampling strategy among finite
and infinite vanishing points and a filtering along a video
sequence. The performance of our algorithm is validated
using real static images and video sequences. Experimental
results on real images show that the adopted strategy for
selecting three reliable distant and orthogonal vanishing
points in conjunction with RANSAC performs well in
practice. We obtain competitive results for camera orientation estimation compared with the best state-of-the-art analytical method. Moreover, our method is well adapted to the navigation task, since the estimation is significantly better for the yaw angle, which gives the pedestrian heading.
Furthermore, our method was tested on videos captured in real conditions of pedestrian navigation with a handheld Smartphone. It proved robust for tracking the camera orientation over video sequences. The results show that the proposed filtering efficiently dismisses aberrant vanishing points along the sequence.
In the future, the presented camera orientation estimation will be part of a blind pedestrian navigation system. The vision-based module is intended to limit the drift error by using key frames along the path and to refine the approximate position by integrating the heading of the user between successive key frames. Indeed, the heading information is essential for guiding a pedestrian during navigation.
Fig. 13 Evolution of the total number of lines extracted in images and the number of lines, respectively, associated with the horizontal and vertical VPs along the video sequence referred to as "Seq1"
Fig. 14 Evolution of the total number of lines extracted in images and the number of lines, respectively, associated with the horizontal and vertical VPs along the video sequence referred to as "Seq2"
Acknowledgments This study is supported by HERON Technologies SAS and the Conseil Général du LOIRET. The authors gratefully acknowledge the contribution of Kamel Guissous and Aladine Chetouani for their help with Smartphone coding.
References
1. Antone, M., Teller, S.: Automatic recovery of relative camera rotations for urban scenes. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), Hilton Head Island, South Carolina, 13–15 June 2000. IEEE Computer Society (2000)
2. Barinova, O., Lempitsky, V., Tretiakand, E., Kohli P.: Geometric
image parsing in man-made environments. In: Proceedings of the
11th European Conference on computer vision (ECCV), Crete,
Greece, 5–11 September 2010. Lecture Notes in Computer Sci-
ence, vol. 6312. Springer, Berlin Heidelberg (2010)
3. Barnard, S.T.: Interpreting perspective images. Artif. Intell.
21(4), 435–462 (1983)
4. Bazin, J.C., Pollefeys, M.: 3-line RANSAC for orthogonal vanishing point detection. In: Proceedings of the IEEE/RSJ International Conference on intelligent robots and systems (IROS 2012), Vilamoura, Algarve, Portugal, 7–12 October (2012)
5. Bazin, J.C., Seo, Y., Demonceaux, C., Vasseur, P., Ikeuchi, K.,
Kweon, I., Pollefeys, M.: Globally optimal line clustering and
vanishing points estimation in a Manhattan world. In: Proceed-
ings of the IEEE International Conference on computer vision
and pattern recognition (CVPR), Providence, Rhode Island, USA,
16–21 June 2012. IEEE Computer Society (2012)
6. Boulanger, K., Bouatouch, K., Pattanaik, S.: ATIP: a tool for 3D
navigation inside a single image with automatic camera calibra-
tion. In: Proceedings of the EG UK Conference on theory and
practice of computer graphics, Middlesbrough, UK, 20–22 June
(2006)
7. Cantoni, V., Lombardi, L., Porta, M., Sicard, N.: Vanishing point
detection: representation analysis and new approaches. In: Pro-
ceedings of the IEEE International Conference on image analysis
and processing (ICIAP), 26–28 September 2001, Palermo, Italy
2001. IEEE Computer Society (2001)
8. Collins, R.T., Weiss, R.S.: Vanishing point calculation as statis-
tical inference on the unit sphere. In: Proceedings of the 3rd
International Conference on computer vision (ICCV), Osaka,
Japan, 4–7 December, 1990. IEEE (1990)
9. Coughlan, J.M., Yuille, A.L.: Manhattan world: compass
direction from a single image by Bayesian inference. In: Pro-
ceedings of the 7th IEEE International Conference on computer
vision (ICCV), Kerkyra, Greece, 20–27 September 1999. IEEE
(1999)
10. Denis, P., Elder, J.H., Estrada, F.: Efficient edge-based methods
for estimating Manhattan frames in urban imagery. In: Proceed-
ings of the 10th European Conference on computer vision
(ECCV), Marseille, France, 12–18 October 2008. Lecture Notes
in Computer Science, vol. 5303. Springer, Berlin Heidelberg
(2008)
11. Elloumi, W., Treuillet, S., Leconge, R.: Tracking orthogonal
vanishing points in video sequences for a reliable camera orien-
tation in Manhattan world. In: Proceedings of the 5th Interna-
tional Congress on image and signal processing (CISP 2012),
Chongqing, China, 16–18 October (2012)
12. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981). doi:10.1145/358669.358692
13. Förstner, W.: Optimal vanishing point detection and rotation
estimation of single images from a Legoland scene. In: Pro-
ceedings of the ISPRS symposium commission III PCV. S, Paris,
1–3 September (2010)
14. Hide, C., Moore, T., Botterill, T.: Low cost vision-aided IMU for pedestrian navigation. J. Glob. Pos. Syst. 10(1), 3–10 (2011). doi:10.5081/jgps.10.1.3
15. Hwangbo, M., Kanade, T.: Visual-inertial UAV attitude estima-
tion using urban scene regularities. In: Proceedings of the IEEE
International Conference on robotics and automation (ICRA
2011), Shanghai, China, 9–13 May, 2011. IEEE (2011)
16. Kalantari, M., Hashemi, A., Jung, F., Guédon, J.P.: A new solution to the relative orientation problem using only 3 points and the vertical direction. J. Math. Imaging Vis. 39(3), 259–268 (2011). doi:10.1007/s10851-010-0234-2
17. Kalantari, M., Jung, F., Guédon, J.P.: Precise, automatic and fast method for vanishing point detection. Photogramm. Rec. 24(127), 246–263 (2009)
18. Kosecka, J., Zhang, W.: Video compass. In: Proceedings of the
7th European Conference on computer vision (ECCV), Copen-
hagen, Denmark, 27 May–2 June 2002. Lecture Notes in Com-
puter Science, vol. 2353. Springer, Berlin Heidelberg (2002)
19. Lutton, E., Maitre, H., Lopez-Krahe, J.: Contribution to the
determination of vanishing points using Hough transform. IEEE
Trans. Pattern Anal. Mach. Intell. 16(4), 430–438 (1994)
20. Martins, A., Aguiar, P., Figueiredo, M.: Orientation in Manhattan
world: equiprojective classes and sequential estimation. IEEE
Trans. Pattern Anal. Mach. Intell. 27, 822–826 (2005)
21. Matas, J., Galambos, C., Kittler, J.V.: Robust detection of lines
using the progressive probabilistic Hough transform. Comput. Vis. Image Underst. 78(1), 119–137 (2000)
22. Micusik, B., Wildenauer, H., Vincze, M.: Towards detection of
orthogonal planes in monocular images of indoor environments.
In: Proceedings of the IEEE International Conference on robotics
and automation (ICRA 2008), Pasadena, California, USA, 19–23
May, 2008. IEEE (2008)
23. Minagawa, A., Tagawa, N., Moriya, T., Gotoh, T.: Vanishing point and vanishing line estimation with line clustering. IEICE Trans. Inf. Syst. E83-D(7), 1574–1582 (2000)
24. Mirzaei, F.M., Roumeliotis, S.I.: Optimal estimation of vanishing
points in a Manhattan world. In: Proceedings of the 13th IEEE
International Conference on computer vision (ICCV), Barcelona,
Spain, 6–13 November 2011. IEEE (2011)
25. Nieto, M., Salgado, L.: Simultaneous estimation of vanishing
points and their converging lines using the EM algorithm. Pattern
Recogn. Lett. 32(14), 1691–1700 (2011)
26. Pflugfelder, R., Bischof, H.: Online auto-calibration in man-made
world. In: Proceedings of the digital image computing: tech-
niques and applications (DICTA), Cairns, Australia. 6–8
December 2005. IEEE Computer Society (2005)
27. Rother, C.: A new approach for vanishing point detection in
architectural environments. In: Proceedings of the 11th British
machine vision conference (BMVC), University of Bristol, 11–14
September (2000)
28. Saurer, O., Fraundorfer, F., Pollefeys, M.: Visual localization
using global visual features and vanishing points. In: Proceedings
of the CLEF (notebook papers/LABs/workshops), Padua, Italy,
20–23 September (2010)
29. Shufelt, J.A.: Performance evaluation and analysis of vanishing
point detection techniques. IEEE Trans. Pattern Anal. Mach.
Intell. 21(3), 282–288 (1999)
30. Tardif, J.-P.: Non-iterative approach for fast and accurate vanishing point detection. In: Proceedings of the 12th IEEE International Conference on computer vision (ICCV), Kyoto, Japan,
27 September–4 October 2009. IEEE (2009)
31. Tuytelaars, T., Gool, L.J.V., Proesmans, M., Moons, T.: A cas-
caded Hough transform as an aid in aerial image interpretation.
In: Proceedings of the 6th IEEE International Conference on
computer vision (ICCV), Bombay, India, 4–7 January 1998.
IEEE Computer Society (1998)
32. Wildenauer, H., Vincze, M.: Vanishing point detection in com-
plex man-made worlds. In: Proceedings of the International
Conference on image analysis and processing (ICIAP), Modena,
Palazzo Ducale, 10–13 September (2007)
Wael Elloumi received his master's degree in computer science from the INSA of Rouen, France, in 2008. In 2012, he obtained his Ph.D. degree in computer vision from the University of Orleans, France. Currently, he is a postdoctoral researcher at the PRISME Laboratory, Image and Vision team, of the University of Orleans. His research interests include computer vision, image matching, 3D mapping, vanishing points and their applications to indoor navigation and assistive systems.
Sylvie Treuillet received her master's degree in electrical engineering from the University of Clermont-Ferrand, France, in 1988. From 1988 to 1990, she worked in a private company in Montpellier developing image processing and pattern recognition methods for automated cytogenetics. She then joined the academic research laboratory LASMEA and was awarded her Ph.D. degree in computer vision in 1993 at the University of Clermont-Ferrand, France. Since 1994, she has been an assistant professor in computer sciences and electronic engineering at the Ecole Polytechnique of the University of Orleans, France. Her research interests include computer vision for 3D object modeling, pattern recognition, and color image analysis for biomedical or industrial applications.
Rémy Leconge received his Ph.D. degree in 2002 from the University of Bourgogne, France. He is an associate professor in the Image and Vision department of the PRISME laboratory at the University of Orleans, France. His research interests are focused on edge detection, multiscale analysis, motion estimation, and tracking.