
ORIGINAL RESEARCH PAPER

Real-time camera orientation estimation based on vanishing point tracking under Manhattan World assumption

Wael Elloumi · Sylvie Treuillet · Rémy Leconge

Received: 3 June 2013 / Accepted: 20 March 2014 / Published online: 4 April 2014

© Springer-Verlag Berlin Heidelberg 2014

J Real-Time Image Proc (2017) 13:669–684. DOI 10.1007/s11554-014-0419-9

Abstract This paper proposes a real-time pipeline for estimating the camera orientation from vanishing points, intended for indoor navigation assistance on a Smartphone. The orientation of the embedded camera relies on the ability to find a reliable triplet of orthogonal vanishing points. The proposed pipeline introduces a novel sampling strategy over finite and infinite vanishing points, combining RANSAC-based line clustering with tracking along the video sequence; this improves accuracy and robustness by extracting the three most pertinent orthogonal directions while keeping the processing time short enough for real-time use. Experiments on real images and video sequences acquired with a Smartphone show that the proposed strategy for selecting orthogonal vanishing points is pertinent: our algorithm gives better results than the recently published optimal RNS method, in particular for the yaw angle, which is essential for the navigation task.

Keywords Vanishing point · Real-time camera orientation · RANSAC algorithm · Navigation assistance · Smartphone

1 Introduction

This paper presents a practical approach to real-time camera orientation estimation for indoor navigation assistance on a Smartphone. Pedestrian navigation assistance is a highly challenging task that is increasingly needed in various applications such as visually impaired guidance, emergency intervention and tourism. Despite the growing research interest in this topic, there is no effective operational system. This can be explained by several technology roadblocks: besides the absence of existing cartography, the main obstacle remains the availability and accuracy of the required position information.

Nearly all Smartphones have a built-in GPS sensor, but it is still insufficient for pedestrian positioning because of its low accuracy. It is also often ineffective in urban areas or indoor environments where satellite signal reception is degraded. The alternative solution consists in fusing measurements from various additional self-contained sensors (magnetometers, accelerometers, digital compass, WiFi, Bluetooth, etc.) with a Kalman filter. The problem with these alternative positioning methods is the integration process, which accumulates large drift errors even over very short periods of time. Errors due to electromagnetic perturbations from electric or ferrous materials are also often reported in related works [14]. In this context, embedded computer vision is likely the most attractive technique for increasing positioning accuracy, since high-quality cameras are available in Smartphones and image processing is applicable in all environments, requiring no additional infrastructure. One convenient way to compute a real-time camera orientation in an urban area is based on vanishing points.

Under perspective projection, parallel lines in the 3D scene intersect in the image plane at a so-called vanishing point.

Electronic supplementary material The online version of this article (doi:10.1007/s11554-014-0419-9) contains supplementary material, which is available to authorized users.

W. Elloumi (✉) · S. Treuillet · R. Leconge
Laboratoire PRISME, Université d'Orléans, 12 rue de Blois, 45067 Orléans Cedex, France
e-mail: [email protected]

S. Treuillet
e-mail: [email protected]

R. Leconge
e-mail: [email protected]


Unlike finite vanishing points, whose coordinates are determined in the image plane, the infinite ones, which lie very far from the image center, are defined by a direction. In man-made urban environments, many line segments are oriented along three orthogonal directions aligned with the global reference frame. Under this so-called Manhattan World assumption, vanishing lines or points are pertinent visual cues for estimating the camera orientation [1, 9, 13, 16, 18, 20, 24]. This is an interesting alternative to structure and motion estimation based on feature matching, a sensitive problem in computer vision. The orientation matrix of a calibrated camera, parameterized with three angles, may be efficiently computed from three noise-free orthogonal vanishing points. Vanishing points are also used for camera calibration [18, 26, 30]. However, these techniques rely on the ability to find a robust orthogonal triplet of vanishing points in a real image.

The estimation of the camera orientation is generally computed in a single image; few works address tracking along a video sequence [4, 20]. In a previous work [11], we proposed a solution to achieve an accurate estimation of the camera orientation based on a sampling strategy among finite and infinite vanishing points extracted with random sample consensus (RANSAC)-based line clustering.

This paper extends that conference paper. Its principal additional contribution is a real-time implementation on a Smartphone. The experimental results have been significantly extended to evaluate our method against other state-of-the-art methods on real static images, and the performance evaluation additionally contains longer video sequences acquired by a handheld Smartphone. The paper also gives more explanations and clarifications than the earlier conference paper.

The paper is organized as follows. A review of related work is proposed in Sect. 2. Section 3 presents our method for selecting and filtering three reliable orthogonal vanishing points, which are used for estimating the camera orientation. Experimental results are shown in Sect. 4, and Sect. 5 concludes the paper.

2 Related work

Over the last 30 years, a broad literature has addressed vanishing point (VP) computation. In this section we review some of the most relevant works. The first approaches used the Hough transform and accumulation methods [6, 7, 19]. The efficiency of these methods highly depends on the discretization of the accumulation space, and they are not robust in the presence of outliers [29]. Furthermore, they do not consider the orthogonality of the resulting VP. An exhaustive search method may take the orthogonality constraint into account [27], but is not suited to real-time applications.

Few authors prefer to work on the raw pixels [10, 20]; published methods mainly work on straight lines extracted from the image. Depending on the mathematical formalization of VP, some variants exist in the choice of workspace: image plane [27, 30], Hough transform [7, 31], projective plane [25, 26] or Gaussian sphere [3, 8, 18, 19]. Using the Gaussian unit sphere or the projective plane allows finite and infinite VP to be treated equally, unlike the image plane. This is a well-suited representation for simultaneously clustering lines that converge at multiple VP by using a probabilistic expectation-maximization (EM) joint optimization approach [1, 9, 18, 23, 25]. These approaches address the misclassification and optimality issues, but the initialization and grouping are the determining factors of their efficiency.

Recently, many authors have adopted robust estimation based on RANSAC [12]. These approaches consider intersections of line segments as VP hypotheses and then iteratively cluster the parallel lines consistent with each hypothesis [4, 13, 15, 22, 28, 32]. A variant based on the J-linkage algorithm has been used by [30]. By dismissing the outliers, the RANSAC-based classifiers are much more robust than accumulative methods, which are limited by the size of the accumulator cell, and give a more precise position of the VP. They have been used to initialize EM estimators so that they converge to the correct VP. Other optimal solutions rely on analytical approaches, often based on time-consuming algorithms [2, 5, 17, 24]. In this last paper, it is interesting to note that, even though they are nondeterministic, the RANSAC-based approaches achieve results comparable to exhaustive search in terms of the number of clustered lines. RANSAC therefore remains a very good approach for extracting VP candidates, in combination with a judicious strategy for selecting a triplet consistent with the orthogonality constraint.

Despite the numerous papers dedicated to straight line clustering for computing adequate vanishing points, this problem remains an open issue for real-time application in video sequences: the estimation of the camera orientation is generally computed in a single image. As a summary, Table 1 classifies the most relevant works according to their VP search space (workspace) and the type of approach used.

To achieve an accurate estimation of the camera orientation on video sequences acquired with a handheld Smartphone, we develop a real-time pipeline based on our previous work [11]. The proposed framework operates in the image space, unlike [4], which uses the projection onto the Gauss sphere, a workspace well adapted to omnidirectional cameras. It uses line segments as primitives, which are easier image information to handle for VP extraction. In addition, it allows a joint process of estimation and classification of VP based on RANSAC.


To preserve a short processing time, only a few VP candidates are selected by splitting finite and infinite VP extracted with RANSAC-based line clustering. Indeed, the RANSAC algorithm allows a robust classification of vanishing lines and an accurate detection of their intersection. This classification is easy to implement, fast and, unlike probabilistic models, requires no initialization. Moreover, camera orientation tracking is reinforced by filtering the VP along the video sequence.

3 Proposed pipeline

The proposed pipeline is given in Fig. 1. To achieve an accurate estimation of the camera orientation based on three reliable orthogonal VP, the pipeline is composed of four steps. The first one consists of dominant line extraction from the detected image edges. The second one consists of selecting a triplet of VP by dominant line clustering with RANSAC. At this step, we introduce a guided search strategy to select only three reliable orthogonal VP that represent the orientation of the camera relative to the 3D world reference frame. Another contribution is the vanishing point filtering performed along the video sequences (step 3) to enforce the robustness of the camera orientation computation (step 4). The next sections give details and justifications for each block.

3.1 Dominant line detection

Some pre-processing is introduced to improve the quality and robustness of the detected edges in the case of an embedded camera: first, a histogram equalization harmonizes the distribution of brightness levels in the image; secondly, a geometric correction of the lens distortion is applied, assuming that all camera parameters have been estimated by a prerequisite calibration. To find the dominant lines, we first detect edges using the Canny detector. Then, we use the progressive probabilistic Hough transform (PPHT) [21], a form of adaptive probabilistic HT. The PPHT is faster than both the standard and the probabilistic Hough transforms, which makes it well suited for real-time application. Furthermore, this method enables the computation of line segment length, so only line segments whose length is greater than a threshold p1 are selected as dominant lines to estimate VP. The PPHT algorithm requires five parameters: two of them for the resolution of the accumulator, i.e., the Hough space (ρ = 1 pixel and θ = 3°); the minimum number of intersections to detect a line in the accumulator (d = 50); the minimum length of the selected line segments (p1 = 30 pixels); and the maximum gap between two points to be considered on the same line (p2 = 5 pixels). Note that all these parameters have been set empirically.
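As an illustration, a minimal sketch of this detection step with OpenCV (the library used in our implementation) is given below. The Canny thresholds are an assumption, since the paper only fixes the PPHT parameters, and the function name detectDominantLines is purely illustrative:

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

std::vector<cv::Vec4i> detectDominantLines(const cv::Mat& frame,
                                           const cv::Mat& K,
                                           const cv::Mat& distCoeffs) {
    cv::Mat gray, equalized, undistorted, edges;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, equalized);                    // harmonize brightness
    cv::undistort(equalized, undistorted, K, distCoeffs); // lens correction
    cv::Canny(undistorted, edges, 50, 150);               // thresholds assumed
    // PPHT with the paper's settings: rho = 1 px, theta = 3 deg,
    // accumulator threshold d = 50, p1 = 30 px, p2 = 5 px.
    std::vector<cv::Vec4i> lines;
    cv::HoughLinesP(edges, lines, 1.0, 3.0 * CV_PI / 180.0, 50, 30.0, 5.0);
    return lines;
}
```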

3.2 Vanishing point candidates

To provide three VP aligned with the three main orthogonal directions of the Manhattan World, the most intuitive method is to detect the intersections of dominant lines in images. Two types of VP are considered: the finite ones correspond to line intersections that may be defined by coordinates in the image plane; the infinite ones, for lines intersecting very far from the image center, are modeled by a direction.

Table 1 Classification of vanishing point estimation algorithms

Type of approach | Image | Hough transform | Gaussian sphere | Projective plane
Accumulation methods | Rother [27] | Cantoni et al. [7]; Tuytelaars et al. [31] | Boulanger et al. [6]; Shufelt [29]; Lutton et al. [19]; Barnard [3] | -
RANSAC-based methods | Saurer et al. [28]; Tardif [30]; Kalantari et al. [17]; Micusik et al. [22]; Wildenauer and Vincze [32]; Elloumi et al. [11] | - | Hwangbo and Kanade [15]; Förstner [13]; Bazin and Pollefeys [4] | -
Probabilistic methods | Bazin et al. [5]; Mirzaei and Roumeliotis [24]; Barinova et al. [2] | - | Kosecka and Zhang [18]; Antone and Teller [1]; Collins and Weiss [8]; Mingawa et al. [23] | Nieto and Salgado [25]; Pflugfelder and Bischof [26]


Unlike on the Gauss sphere, working in the image plane requires this differentiation between finite and infinite points, but it is faster since no transformation is required.

Recently, numerous authors have adopted RANSAC as a simple and powerful method to partition parallel straight lines into clusters by pruning outliers [13, 15, 22, 28, 32]. The process starts by randomly selecting two lines to generate a VP hypothesis. Then, all lines consistent with this hypothesis are grouped together to optimize the VP estimate. Once a dominant VP is detected, all the associated lines are removed and the process is repeated to detect the next dominant VP. The principal drawback of this sequential search is that no orthogonality constraint is imposed when selecting a reliable set of three VP to compute the camera orientation. Very recent works propose optimal estimates of three orthogonal VP through an analytical approach based on a multivariate polynomial system solver [24] or an optimization approach based on interval analysis theory [5], but at the expense of complex, time-consuming algorithms.

In this work, we introduce a guided search strategy, in conjunction with RANSAC, to extract a limited number of reliable VP while enforcing the orthogonality constraint.

3.3 VP sampling strategy

In the context of indoor navigation, the main orthogonal directions of the Manhattan World generally consist of a vertical one (often associated with an infinite VP) and two horizontal ones (associated with finite or infinite VP). We therefore propose a fast and robust sampling strategy that takes advantage of the differentiation between finite and infinite VP.

Three possible configurations are considered, depending on the alignment of the image plane with the 3D urban scene: (1) one finite and two infinite VP, (2) two finite and one infinite VP, (3) three finite VP. The first two configurations are common, unlike the third. More details about the computation of the camera orientation for these three configurations are given in Sect. 3.5. To avoid confusion between finite and infinite VP, detection is carried out in two steps: first we detect the finite VP, then the infinite ones. For a robust selection of VP, the proposed algorithm starts with the detection of three finite candidates. For this, we use RANSAC to randomly choose a pair of line segments (l_i, l_j) from the set of unlabeled lines L. The intersection of these two lines provides a potential finite VP (fvp_pot). To extract the lines participating in fvp_pot, we compute a consensus score given by Eqs. (1) and (2):

$$\mathrm{Score} = \sum_{i=0}^{n} f(v, l_i) \qquad (1)$$

where n is the number of dominant lines in the set L, and

$$f(v, l_i) = \begin{cases} 1, & d(v, l_i) < \delta \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

Fig. 1 Overview of the proposed algorithm


where d(v, l_i) is the Euclidean distance from the finite VP candidate v to the line l_i.

These three steps are repeated k times, where k is the number of random combinations drawn when selecting a pair of line segments from L (i.e., the number of iterations performed by the algorithm). Note that, for better performance, k must be proportional to the number of line segments; in our experiments, it is set to k = 2 × card(L). The potential VP (fvp_pot) with the highest number of participating lines is selected as the first finite VP (v1). All dominant lines associated with v1 are labeled and excluded from L, and the process is repeated twice to detect v2 and v3.
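The following sketch condenses this finite-VP search (Eqs. 1-2 plus the sequential removal of inliers). The Segment type, the helpers and the use of std::rand are illustrative choices, not the paper's code; only the threshold δ = 4 pixels and the k = 2 × card(L) draws follow the text:

```cpp
#include <opencv2/core.hpp>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Segment { cv::Point2f a, b; };

// Homogeneous line through a segment's endpoints.
static cv::Vec3f lineOf(const Segment& s) {
    return cv::Vec3f(s.a.x, s.a.y, 1.f).cross(cv::Vec3f(s.b.x, s.b.y, 1.f));
}

// Euclidean distance d(v, l) from point v to line l (used in Eq. 2).
static float pointLineDist(const cv::Point2f& v, const cv::Vec3f& l) {
    return std::abs(l[0] * v.x + l[1] * v.y + l[2]) / std::hypot(l[0], l[1]);
}

// One finite-VP detection pass; the caller removes the inliers from L and
// calls it again to find v2 and v3.
static bool detectFiniteVP(const std::vector<Segment>& L, float delta,
                           cv::Point2f& bestVP, std::vector<int>& bestInliers) {
    const int n = (int)L.size();
    if (n < 2) return false;
    const int k = 2 * n;                                  // k = 2 * card(L) draws
    int bestScore = 0;
    for (int it = 0; it < k; ++it) {
        int i = std::rand() % n, j = std::rand() % n;
        if (i == j) continue;
        cv::Vec3f x = lineOf(L[i]).cross(lineOf(L[j]));   // intersection point
        if (std::abs(x[2]) < 1e-6f) continue;             // near-parallel pair
        cv::Point2f vp(x[0] / x[2], x[1] / x[2]);         // hypothesis fvp_pot
        std::vector<int> inliers;                         // consensus score, Eq. 1
        for (int m = 0; m < n; ++m)
            if (pointLineDist(vp, lineOf(L[m])) < delta) inliers.push_back(m);
        if ((int)inliers.size() > bestScore) {
            bestScore = (int)inliers.size();
            bestVP = vp;
            bestInliers = inliers;
        }
    }
    return bestScore > 2;   // demand support beyond the hypothesis pair itself
}
```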

The procedure used to detect the two infinite VP is similar to that for the finite VP, except that the aim is to identify the vanishing line direction instead of an intersection. For this, we randomly select a line segment from the remaining dominant lines of the set L. We then compute its direction, which gives us a potential infinite VP (ivp_pot). The lines associated with ivp_pot are obtained using the consensus score given by Eqs. (1) and (3):

$$f(v, l_i) = \begin{cases} 1, & \min\!\left(\angle(\vec{v}, \vec{l_i}),\ \angle(\vec{l_i}, \vec{v})\right) < \tau \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where ∠(v⃗, l⃗_i) is the angle between the infinite VP direction from the image center and the line l_i under test in image space. Note that, in this case, k = card(L), the number of random draws for selecting a line segment from L (i.e., the number of iterations performed by the algorithm).
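A matching sketch for the infinite-VP consensus (Eq. 3) is given below; direction vectors are compared through their unsigned angle, and τ = 4° is the threshold quoted later in this section. Again, all identifiers are illustrative:

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Segment { cv::Point2f a, b; };

// Unsigned angle in [0, pi/2] between two undirected 2D directions,
// i.e. min(angle(d1, d2), angle(d2, d1)) of Eq. (3).
static float undirectedAngle(const cv::Point2f& d1, const cv::Point2f& d2) {
    float dot = std::abs(d1.x * d2.x + d1.y * d2.y);
    float n = std::hypot(d1.x, d1.y) * std::hypot(d2.x, d2.y);
    return std::acos(std::min(1.f, dot / n));
}

static bool detectInfiniteVP(const std::vector<Segment>& L, float tauRad,
                             cv::Point2f& bestDir, std::vector<int>& bestInliers) {
    const int n = (int)L.size();
    if (n == 0) return false;
    int bestScore = 0;
    for (int it = 0; it < n; ++it) {                      // k = card(L) draws
        const Segment& s = L[std::rand() % n];
        cv::Point2f dir(s.b.x - s.a.x, s.b.y - s.a.y);    // hypothesis ivp_pot
        std::vector<int> inliers;                         // consensus score, Eq. 1
        for (int m = 0; m < n; ++m) {
            cv::Point2f dm(L[m].b.x - L[m].a.x, L[m].b.y - L[m].a.y);
            if (undirectedAngle(dir, dm) < tauRad) inliers.push_back(m);
        }
        if ((int)inliers.size() > bestScore) {
            bestScore = (int)inliers.size();
            bestDir = dir;
            bestInliers = inliers;
        }
    }
    return bestScore > 1;
}
```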

At the end of this procedure, we obtain three finite VP and two infinite VP. The three VP that best satisfy the orthogonality constraint are selected, as shown in Fig. 2.

The criteria used in the consensus score (Eq. 1) for clustering lines with RANSAC differ between the two categories. Unlike the finite VP, whose coordinates may be determined in the image plane, the infinite VP are represented by a direction. For finite VP, the consensus score is based on the distance between the candidate straight line and the intersection point (Eq. 2): all lines whose distance is below a fixed threshold δ are considered as participants (δ is equal to 4 pixels in our experiments). For infinite VP, it uses an angular distance between the direction of the candidate straight line and the direction representing the infinite VP (Eq. 3); the threshold τ is equal to 4° in our experiments.

To avoid redundant VP candidates, we introduce the supplementary constraint that candidates be far enough from each other: we impose a minimum angular distance between their directions from the image center (the threshold is set to 30° for finite VP and 60° for infinite ones).

By separating finite from infinite VP, the sampling strategy provides the most significant candidates of both categories without giving more importance to one or the other (we enforce having at least one finite candidate). Furthermore, this VP sampling strategy is faster, as we detect only five reliable VP candidates, whereas previously published methods generally detect many more.

Among the five candidates selected above, only the three VP whose directions from the optical center are orthogonal are accepted, including at least one finite VP. We adopt the following heuristic: (1) choose the finite VP with the highest consensus score; (2) select two other VP (finite or infinite) based on their orthogonality to the first one, considering their consensus score as a second criterion. Finally, we identify the vertical VP and the two horizontal ones. In our application, we assume that the camera is kept upright: we identify the vertical VP as the one whose direction from the image center is closest to the vertical. The two remaining VP are thus horizontal.
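One way to express this heuristic is sketched below, under the assumption that each candidate carries a unit direction from the optical center ((x, y, -f) normalized for a finite VP, (d_x, d_y, 0) for an infinite one) and its consensus score; the Candidate type and the greedy tie-breaking are our illustrative choices:

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

struct Candidate {
    cv::Vec3f dir;   // unit direction from the optical center
    int score;       // RANSAC consensus score
    bool finite;
};

static std::vector<Candidate> selectTriplet(std::vector<Candidate> cands) {
    // 1) Seed with the finite VP of highest consensus score (the sampling
    //    stage guarantees at least one finite candidate).
    std::sort(cands.begin(), cands.end(),
              [](const Candidate& a, const Candidate& b) {
                  if (a.finite != b.finite) return a.finite;
                  return a.score > b.score;
              });
    std::vector<Candidate> triplet{cands.front()};
    cands.erase(cands.begin());
    // 2) Greedily add the two candidates most orthogonal to those already
    //    kept (dot products nearest zero), the score breaking near-ties.
    while (triplet.size() < 3 && !cands.empty()) {
        auto best = std::min_element(cands.begin(), cands.end(),
            [&](const Candidate& a, const Candidate& b) {
                float ea = 0.f, eb = 0.f;
                for (const Candidate& t : triplet) {
                    ea += std::abs(t.dir.dot(a.dir));
                    eb += std::abs(t.dir.dot(b.dir));
                }
                if (ea != eb) return ea < eb;
                return a.score > b.score;
            });
        triplet.push_back(*best);
        cands.erase(best);
    }
    return triplet;
}
```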

3.4 Vanishing point filtering

Once the whole algorithm described above has been processed for the first frame of the video sequence, the VP positions can be tracked from one frame to the next (N > 1). Indeed, VP positions or directions are only slightly modified between frames of a video sequence or a list of successive frames. We therefore introduce a filtering test that checks the consistency between the VP estimated in frame N and those estimated in frame N − 1. For this, we use the distance between positions for the finite VP, d(v_N, v_{N-1}), and the angle between directions for the infinite ones, ∠(v⃗_N, v⃗_{N-1}). When a VP is not coherent with its previous position or direction, it is re-estimated taking its previous position or direction into account and using the remaining unclassified lines. Hence, aberrant VP are discarded and replaced by new VP that are both consistent with the previous ones and satisfy the orthogonality constraint.
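A minimal sketch of this per-frame consistency test follows; the two thresholds are assumptions, since the paper does not publish their values:

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>

struct TrackedVP {
    bool finite;
    cv::Point2f pos;   // image position (finite VP)
    cv::Point2f dir;   // direction from the image center (infinite VP,
                       // or a finite VP whose type may have changed)
};

// Consistency between the VP estimated in frame N and in frame N-1.
static bool isConsistent(const TrackedVP& cur, const TrackedVP& prev,
                         float maxDistPx, float maxAngleRad) {
    if (cur.finite && prev.finite) {                 // d(v_N, v_{N-1})
        float dx = cur.pos.x - prev.pos.x, dy = cur.pos.y - prev.pos.y;
        return std::hypot(dx, dy) < maxDistPx;
    }
    // Otherwise compare directions, which stays valid when a VP changes type.
    float dot = std::abs(cur.dir.x * prev.dir.x + cur.dir.y * prev.dir.y);
    float n = std::hypot(cur.dir.x, cur.dir.y) *
              std::hypot(prev.dir.x, prev.dir.y);
    return std::acos(std::min(1.f, dot / n)) < maxAngleRad;
}
```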

However, in a video sequence, camera motion may turn an infinite vanishing point into a finite one, or vice versa. This happens especially when turning, and it affects one of the horizontal VP. There are two possible situations:

1. A finite VP becomes infinite. To address this problem, we predict the position and the type (finite or infinite) of the VP using the position of the horizontal finite VP that gives the camera heading. After verifying the consistency of this horizontal finite VP, we compute the distance d that separates its positions in frames N and N − 1. We then predict the position of the second horizontal VP. Finally, we determine the type of the second horizontal VP depending on whether the midpoint of the segment joining its predicted position to the first horizontal VP lies inside the image plane or not. For a VP that moves away from the center of the image, the midpoint coordinates fall outside the image and the VP will subsequently be considered infinite (a minimal sketch of this midpoint test is given after this list).

2. An infinite VP becomes finite. This means that the intersection point of the lines associated with the VP is no longer very far from the image center, so the VP is detected as finite. In this case, the consistency criterion for infinite VP remains valid, since we can still compute the angular distance between the VP directions: we simply estimate the direction of the VP in the current image from the image center (finite VP) and compare it to its previous direction (infinite VP).
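The midpoint test of situation 1 can be sketched as follows; the names and the bounds check are illustrative:

```cpp
#include <opencv2/core.hpp>

// Decide whether the predicted second horizontal VP should switch to the
// infinite type: the test uses the midpoint of the segment joining its
// predicted position to the first horizontal VP.
static bool becomesInfinite(const cv::Point2f& predictedVP,
                            const cv::Point2f& firstHorizontalVP,
                            const cv::Size& imageSize) {
    cv::Point2f mid = 0.5f * (predictedVP + firstHorizontalVP);
    // A midpoint outside the image plane means the VP is moving away from
    // the image center and is subsequently treated as infinite.
    bool inside = mid.x >= 0.f && mid.x < (float)imageSize.width &&
                  mid.y >= 0.f && mid.y < (float)imageSize.height;
    return !inside;
}
```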

Vanishing point filtering is efficient: it makes our algorithm much more stable and robust, as will be shown in the experiments section, and it renders the standard post-RANSAC optimization steps unnecessary. Once the three most reliable VP are extracted in the image, the camera orientation is computed frame by frame, as described in the next section.

3.5 Computation of the camera orientation

This part is directly inspired by [6] to compute the camera orientation from the three VP, which are supposed to be orthogonal. We use the directions of the detected VP, which correspond to the camera orientation, to compute the rotation matrix (u⃗, v⃗, w⃗). The vectors u⃗, v⃗ and w⃗ represent three orthogonal directions of the scene, respectively: the first horizontal direction, the vertical direction and the second horizontal direction. They must satisfy the following orthonormal relations:

$$\begin{cases} \vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{w} = \vec{w} \cdot \vec{u} = 0 \\ \|\vec{u}\| = \|\vec{v}\| = \|\vec{w}\| = 1 \end{cases} \qquad (4)$$

Figure 3 depicts an example of the camera orientation estimation with three orthogonal VP (two finite and one infinite VP). (X⃗, Y⃗, Z⃗) is the camera coordinate frame; (α, ψ, ρ) are the three Euler angles of the camera, representing the yaw, pitch and roll, respectively; V1 and V2 are the two finite VP and I⃗_1 is the infinite VP direction. The estimation of the camera orientation depends on the VP configuration [6].

3.5.1 One finite and two infinite VP

This situation is the most frequent one. It occurs when the image plane is aligned with two axes of the world coordinate frame, as shown in Fig. 4. Let V be the finite VP and f the focal length. The direction of V can be expressed as $\overrightarrow{OV} = (V_{1x}, V_{1y}, -f)^T$, whereas the directions of the infinite VP, in image space, are $\vec{I_1} = (I_{1x}, I_{1y}, 0)^T$ and $\vec{I_2} = (I_{2x}, I_{2y}, 0)^T$. The vectors of the rotation matrix are given by the following system of equations [6]:

$$\begin{cases} \vec{u} = \left(I_{1x},\ I_{1y},\ \dfrac{I_{1x}V_{1x} + I_{1y}V_{1y}}{f}\right)^{T} \\[6pt] \vec{v} = \left(I_{2x},\ I_{2y},\ \dfrac{I_{2x}V_{1x} + I_{2y}V_{1y}}{f}\right)^{T} \\[6pt] \vec{w} = -\overrightarrow{OV} = \left(-V_{1x},\ -V_{1y},\ f\right)^{T} \end{cases} \qquad (5)$$

3.5.2 Two finite and one infinite VP

This situation happens when the image plane is aligned with only one of the three axes of the world coordinate frame (Fig. 3). Let V1 and V2 be the two finite VP, with directions $\overrightarrow{OV_1} = (V_{1x}, V_{1y}, -f)^T$ and $\overrightarrow{OV_2} = (V_{2x}, V_{2y}, -f)^T$. Since there are two finite horizontal VP, we set $\vec{w}$ from the VP closest to the image center. The vector $\vec{v}$ is obtained by cross product, as shown in the system of equations below [6]:

$$\begin{cases} \vec{u} = \left(-V_{1x},\ -V_{1y},\ f\right)^{T} \\ \vec{w} = \left(-V_{2x},\ -V_{2y},\ f\right)^{T} \\ \vec{v} = \vec{u} \times \vec{w} \end{cases} \qquad (6)$$


3.5.3 Three finite VP

This last configuration is the least frequent one. It occurs when there is no alignment between the image plane and the world coordinate frame, as depicted in Fig. 5. Let V1, V2 and V3 be the three finite VP, with directions $\overrightarrow{OV_1} = (V_{1x}, V_{1y}, -f)^T$, $\overrightarrow{OV_2} = (V_{2x}, V_{2y}, -f)^T$ and $\overrightarrow{OV_3} = (V_{3x}, V_{3y}, -f)^T$. We start by setting $\vec{v}$ from the VP whose direction is closest to the vertical direction. We then set $\vec{w}$ from the VP closest to the image center. In the system of Eq. (7) [6], we assume that V2 is the vertical VP and V3 is closest to the image center:

$$\begin{cases} \vec{u} = \left(-V_{1x},\ -V_{1y},\ f\right)^{T} \\ \vec{v} = \left(-V_{2x},\ -V_{2y},\ f\right)^{T} \\ \vec{w} = \left(-V_{3x},\ -V_{3y},\ f\right)^{T} \end{cases} \qquad (7)$$
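For completeness, a conventional extraction of the three Euler angles from the resulting rotation matrix is sketched below. The paper does not spell out its Euler convention, so a standard Z-Y-X (yaw-pitch-roll) decomposition is assumed here:

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Yaw/pitch/roll from a rotation matrix, assuming a Z-Y-X convention
// (an assumption: the paper does not state its exact convention).
static void eulerZYX(const cv::Matx33f& R,
                     float& yaw, float& pitch, float& roll) {
    pitch = std::asin(-R(2, 0));              // rotation about the Y axis
    yaw   = std::atan2(R(1, 0), R(0, 0));     // rotation about the Z axis
    roll  = std::atan2(R(2, 1), R(2, 2));     // rotation about the X axis
}
```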

4 Experimental results

This section presents the performance evaluation of the proposed method on real static images and video sequences.

4.1 Accuracy study

For comparison purposes, we have tested our algorithm on the public York Urban Database (YUD) against two other approaches for estimating the VP cited in the related work section:

• A Hough-based accumulation approach, ATIP, proposed by [6].
• An analytical method, RNS, recently published by [24], that provides optimal least-squares estimates of three orthogonal vanishing points using a RANSAC-based classification of lines.

This database provides the original images, camera calibration parameters, ground truth line segments and the three Manhattan vanishing points needed to compute the three Euler angles of the camera orientation for each image [10]. Figure 6 illustrates some orthogonal vanishing points and their associated parallel lines extracted by our algorithm on images drawn from the YUD. Vanishing lines are colored according to the vanishing point at which they meet.

Table 2 presents a comparative study of the angular distance from the ground truth (GT) of the camera orientation computed, respectively, with the proposed framework, the RNS method and the ATIP method.

Fig. 4 Computation of the camera orientation with one finite VP and two infinite VP

Fig. 5 Computation of the camera orientation with three finite VP

Fig. 3 Camera coordinate frame


The average and standard deviation of the angular distance are computed over a set of 50 images for the three angles. The three last rows of Table 2 give the number of times the distance exceeds a fixed value of 2°, 5° and 10°, respectively. Our method produces accurate estimates of the camera orientation, since the angular distance remains below 2° for most images. The ATIP method is clearly less reliable, as it may misclassify parallel lines in the presence of outliers and does not enforce the orthogonality constraint between the vanishing points. By using a RANSAC-based classification of lines, the other methods remove the outliers from the parallel groups.

The RNS method gives the best result for the pitch angle, but it is interesting to note that our method is significantly better for the yaw and roll angles. The yaw is essential for a pedestrian navigation task, since it gives the camera viewing direction. This result may be explained by our strategy of selecting orthogonal vanishing points that are distant enough from each other, without confusion between finite and infinite points.

The angular deviation of the three Euler angles estimated by our proposed framework and by the RNS and ATIP methods, compared to the GT, is reported in Fig. 7a. Figure 7b, c provides the maximum and minimum angular deviation of the estimated vanishing points from the GT, respectively. Using our proposed framework, 78 % of the tested images yield an angular deviation error below 2°, while the RNS and ATIP methods reach only 68 and 18 %, respectively.

Our method and ATIP have been implemented using Visual C++ and the OpenCV library, an open-source library specialized in computer vision applications.

Fig. 6 Examples of triplets of orthogonal VP detected by our algorithm on images from the YUD. Magenta lines correspond to the vertical VP, while blue and green lines correspond to the two horizontal VP

Table 2 Accuracy validation of the camera orientation computation

Angular distance from GT | Our method (Pitch / Yaw / Roll) | RNS (Pitch / Yaw / Roll) | ATIP (Pitch / Yaw / Roll)
Average | 1.38 / 0.75 / 0.69 | 0.74 / 1.70 / 1.81 | 3.80 / 4.36 / 1.80
Standard deviation | 1.57 / 0.60 / 0.65 | 0.62 / 1.82 / 1.88 | 4.74 / 5.69 / 2.67
>2° | 8 / 3 / 1 | 1 / 13 / 16 | 23 / 27 / 11
>5° | 2 / 0 / 0 | 0 / 2 / 2 | 14 / 13 / 6
>10° | 0 / 0 / 0 | 0 / 1 / 1 | 5 / 4 / 1

Comparative study of the methods with regard to the average and standard deviation of the angular distance from the GT (in degrees)


The full results of the RNS method are available in a technical report provided online by the authors (http://umn.edu/~faraz/vp).

Concerning the processing time for estimating the three orthogonal vanishing points on YUD images of size 640 × 480 pixels, our algorithm takes about 86 ms, against about 1.12 s for the ATIP method, on an Intel Core 2 Duo processor, without considering the line extraction step. For RNS, the authors mention that their C++ implementation of the symbolic-numeric method takes about 290 ms to find the solutions on a Core 2 Duo processor.

4.2 Tracking the camera orientation on a Smartphone

To show the efficiency of our algorithm for tracking the camera orientation, we acquired real video sequences with a handheld Smartphone. Figure 8 illustrates our experimental prototype.

Fig. 7 Comparison of the accuracy of our proposed framework against the RNS and ATIP methods. Cumulative histograms of: a the angular deviation of the three Euler angles from the GT (in degrees), b the maximum angular deviation from the GT and c the minimum angular deviation from the GT


The whole image processing chain presented above has been implemented on an Android-based Smartphone (Samsung GALAXY S III) using the OpenCV4Android SDK. The Smartphone's camera was first calibrated using the camera calibration toolbox proposed by Bouguet (http://www.vision.caltech.edu/bouguetj/calib_doc/).

Figures 9 and 10 depict some typical results of vanishing point extraction for two sequences acquired on the premises of Polytech' Orléans. The first one, referred to as "Seq1", is composed of 475 frames of size 320 × 240 pixels. The second one, referred to as "Seq2", contains 478 images.

    Fig. 8 Our experimental prototype

Fig. 9 Examples of detection and tracking of triplets of orthogonal vanishing points and their associated lines in the video sequence referred to as "Seq1". Vanishing lines are colored according to the vanishing point at which they meet. Below each example image, a 3D vector representation illustrates the estimated angles


The vanishing lines are shown colored according to the vanishing point with which they have been associated. For each example image, the camera orientation is shown below by a 3D vector representation.

Figures 11 and 12 compare the evolution of the pitch, yaw and roll angles of the camera orientation along the two sequences when applying our method with and without the vanishing point filtering (VPF). The VPF produces a smoother run and a more reliable estimation of the camera orientation along the video sequence. Since the VPF removes aberrant vanishing points, keeping only those that are consistent, we obtain a more accurate camera orientation.

To illustrate the efficiency of the proposed vanishing point sampling strategy based on line clustering with RANSAC, Figs. 13 and 14 show the evolution of the number of vanishing lines extracted along the two sequences. These figures plot the total number of vanishing lines as well as the number of lines participating in the three VP (inliers). They also depict the subset of lines associated with the two horizontal VP and the subset associated with the vertical VP. It is clear that, by using a RANSAC-based classification of lines, the method removes the outliers.

The whole algorithm for estimating the three orthogonal vanishing points, including line segment detection and orientation estimation, runs in about 100 ms per frame of size 320 × 240 pixels on the Galaxy S III, depending on the scene and the number of detected lines.

5 Conclusion

Selecting three reliable orthogonal vanishing points in images, corresponding to the Manhattan directions, is a prerequisite to achieving an accurate estimation of the camera orientation.

Fig. 10 Examples of detection and tracking of triplets of orthogonal vanishing points and their associated lines in the video sequence referred to as "Seq2". Line segments are colored according to the vanishing point at which they meet. Below each example image, a 3D vector representation illustrates the estimated angles


Fig. 11 Smoothing effect of the vanishing point filtering (VPF) on the estimation of the camera's orientation (pitch, yaw and roll angles) along the video sequence referred to as "Seq1"


Fig. 12 Smoothing effect of the vanishing point filtering (VPF) on the estimation of the camera's orientation (pitch, yaw and roll angles) along the video sequence referred to as "Seq2"


In this paper, we present and test a solution for real-time application on a Smartphone. Our algorithm relies on a novel sampling strategy among finite and infinite vanishing points and on a filtering along the video sequence. The performance of our algorithm is validated on real static images and video sequences. Experimental results on real images show that the adopted strategy of selecting three reliable, distant and orthogonal vanishing points in conjunction with RANSAC performs well in practice. We obtain competitive results for camera orientation estimation compared with the best analytical method of the state of the art. Indeed, our method is well adapted to the navigation task, since the estimation is significantly better for the yaw angle, which gives the pedestrian heading.

Furthermore, our method was tested on videos captured in real conditions of pedestrian navigation with a handheld Smartphone. It has proved robust for tracking the camera orientation throughout video sequences, and the results show that the proposed filtering efficiently dismisses aberrant vanishing points along the sequence.

In the future, the presented camera orientation estimation will be part of a blind pedestrian navigation system. The vision-based module is devoted to limiting the drift error by using key frames along the path and to refining the approximate position by integrating the heading of the user between successive key frames. This heading information is very important for guiding a pedestrian during navigation.

Fig. 13 Evolution of the total number of lines extracted in images and the number of lines associated, respectively, with the horizontal and vertical VP along the video sequence referred to as "Seq1"

Fig. 14 Evolution of the total number of lines extracted in images and the number of lines associated, respectively, with the horizontal and vertical VP along the video sequence referred to as "Seq2"


Acknowledgments This study is supported by HERON Technologies SAS and the Conseil Général du LOIRET. The authors gratefully acknowledge the contribution of Kamel Guissous and Aladine Chetouani for their help concerning the Smartphone coding.

References

1. Antone, M., Teller, S.: Automatic recovery of relative camera rotations for urban scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hilton Head Island, South Carolina, 13–15 June 2000. IEEE Computer Society (2000)
2. Barinova, O., Lempitsky, V., Tretiak, E., Kohli, P.: Geometric image parsing in man-made environments. In: Proceedings of the 11th European Conference on Computer Vision (ECCV), Crete, Greece, 5–11 September 2010. Lecture Notes in Computer Science, vol. 6312. Springer, Berlin Heidelberg (2010)
3. Barnard, S.T.: Interpreting perspective images. Artif. Intell. 21(4), 435–462 (1983)
4. Bazin, J.C., Pollefeys, M.: 3-line RANSAC for orthogonal vanishing point detection. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2012), Vilamoura, Algarve, Portugal, 7–12 October (2012)
5. Bazin, J.C., Seo, Y., Demonceaux, C., Vasseur, P., Ikeuchi, K., Kweon, I., Pollefeys, M.: Globally optimal line clustering and vanishing point estimation in a Manhattan world. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, USA, 16–21 June 2012. IEEE Computer Society (2012)
6. Boulanger, K., Bouatouch, K., Pattanaik, S.: ATIP: a tool for 3D navigation inside a single image with automatic camera calibration. In: Proceedings of the EG UK Conference on Theory and Practice of Computer Graphics, Middlesbrough, UK, 20–22 June (2006)
7. Cantoni, V., Lombardi, L., Porta, M., Sicard, N.: Vanishing point detection: representation analysis and new approaches. In: Proceedings of the IEEE International Conference on Image Analysis and Processing (ICIAP), Palermo, Italy, 26–28 September 2001. IEEE Computer Society (2001)
8. Collins, R.T., Weiss, R.S.: Vanishing point calculation as statistical inference on the unit sphere. In: Proceedings of the 3rd International Conference on Computer Vision (ICCV), Osaka, Japan, 4–7 December 1990. IEEE (1990)
9. Coughlan, J.M., Yuille, A.L.: Manhattan world: compass direction from a single image by Bayesian inference. In: Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV), Kerkyra, Greece, 20–27 September 1999. IEEE (1999)
10. Denis, P., Elder, J.H., Estrada, F.: Efficient edge-based methods for estimating Manhattan frames in urban imagery. In: Proceedings of the 10th European Conference on Computer Vision (ECCV), Marseille, France, 12–18 October 2008. Lecture Notes in Computer Science, vol. 5303. Springer, Berlin Heidelberg (2008)
11. Elloumi, W., Treuillet, S., Leconge, R.: Tracking orthogonal vanishing points in video sequences for a reliable camera orientation in Manhattan world. In: Proceedings of the 5th International Congress on Image and Signal Processing (CISP 2012), Chongqing, China, 16–18 October (2012)
12. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981). doi:10.1145/358669.358692
13. Förstner, W.: Optimal vanishing point detection and rotation estimation of single images from a Legoland scene. In: Proceedings of the ISPRS Symposium Commission III PCV, Paris, 1–3 September (2010)
14. Hide, C., Moore, T., Botterill, T.: Low cost vision-aided IMU for pedestrian navigation. J. Glob. Position. Syst. 10(1), 3–10 (2011). doi:10.5081/jgps.10.1.3
15. Hwangbo, M., Kanade, T.: Visual-inertial UAV attitude estimation using urban scene regularities. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2011), Shanghai, China, 9–13 May 2011. IEEE (2011)
16. Kalantari, M., Hashemi, A., Jung, F., Guédon, J.P.: A new solution to the relative orientation problem using only 3 points and the vertical direction. J. Math. Imaging Vis. 39(3), 259–268 (2011). doi:10.1007/s10851-010-0234-2
17. Kalantari, M., Jung, F., Guédon, J.P.: Precise, automatic and fast method for vanishing point detection. Photogramm. Rec. 24(127), 246–263 (2009)
18. Kosecka, J., Zhang, W.: Video compass. In: Proceedings of the 7th European Conference on Computer Vision (ECCV), Copenhagen, Denmark, 27 May–2 June 2002. Lecture Notes in Computer Science, vol. 2353. Springer, Berlin Heidelberg (2002)
19. Lutton, E., Maitre, H., Lopez-Krahe, J.: Contribution to the determination of vanishing points using Hough transform. IEEE Trans. Pattern Anal. Mach. Intell. 16(4), 430–438 (1994)
20. Martins, A., Aguiar, P., Figueiredo, M.: Orientation in Manhattan world: equiprojective classes and sequential estimation. IEEE Trans. Pattern Anal. Mach. Intell. 27, 822–826 (2005)
21. Matas, J., Galambos, C., Kittler, J.V.: Robust detection of lines using the progressive probabilistic Hough transform. Comput. Vis. Image Underst. (Special Issue on Robust Statistical Techniques in Image Understanding) 78(1), 119–137 (2000)
22. Micusik, B., Wildenauer, H., Vincze, M.: Towards detection of orthogonal planes in monocular images of indoor environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), Pasadena, California, USA, 19–23 May 2008. IEEE (2008)
23. Mingawa, A., Tagawa, N., Moriya, T., Gotoh, T.: Vanishing point and vanishing line estimation with line clustering. IEICE Trans. Inf. Syst. E83-D(7), 1574–1582 (2000)
24. Mirzaei, F.M., Roumeliotis, S.I.: Optimal estimation of vanishing points in a Manhattan world. In: Proceedings of the 13th IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011. IEEE (2011)
25. Nieto, M., Salgado, L.: Simultaneous estimation of vanishing points and their converging lines using the EM algorithm. Pattern Recogn. Lett. 32(14), 1691–1700 (2011)
26. Pflugfelder, R., Bischof, H.: Online auto-calibration in man-made worlds. In: Proceedings of Digital Image Computing: Techniques and Applications (DICTA), Cairns, Australia, 6–8 December 2005. IEEE Computer Society (2005)
27. Rother, C.: A new approach for vanishing point detection in architectural environments. In: Proceedings of the 11th British Machine Vision Conference (BMVC), University of Bristol, 11–14 September (2000)
28. Saurer, O., Fraundorfer, F., Pollefeys, M.: Visual localization using global visual features and vanishing points. In: Proceedings of CLEF (Notebook Papers/LABs/Workshops), Padua, Italy, 20–23 September (2010)
29. Shufelt, J.A.: Performance evaluation and analysis of vanishing point detection techniques. IEEE Trans. Pattern Anal. Mach. Intell. 21(3), 282–288 (1999)
30. Tardif, J.-P.: Non-iterative approach for fast and accurate vanishing point detection. In: Proceedings of the 12th IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, 27 September–4 October 2009. IEEE (2009)
31. Tuytelaars, T., Gool, L.J.V., Proesmans, M., Moons, T.: A cascaded Hough transform as an aid in aerial image interpretation. In: Proceedings of the 6th IEEE International Conference on Computer Vision (ICCV), Bombay, India, 4–7 January 1998. IEEE Computer Society (1998)
32. Wildenauer, H., Vincze, M.: Vanishing point detection in complex man-made worlds. In: Proceedings of the International Conference on Image Analysis and Processing (ICIAP), Modena, Palazzo Ducale, 10–13 September (2007)

Wael Elloumi received his master's degree in computer science from the INSA of Rouen, France, in 2008. In 2012, he obtained his Ph.D. degree in computer vision from the University of Orleans, France. Currently, he is a postdoctoral researcher at the PRISME Laboratory, Image and Vision team of the University of Orleans. His research interests include computer vision, image matching, 3D mapping, vanishing points and their applications to indoor navigation and assistive systems.

Sylvie Treuillet received her master's degree in electrical engineering from the University of Clermont-Ferrand, France, in 1988. From 1988 to 1990, she worked in a private company in Montpellier developing image processing and pattern recognition methods in automatic cytogenetics. She then joined the academic research laboratory LASMEA and was awarded her Ph.D. degree in computer vision in 1993 at the University of Clermont-Ferrand, France. Since 1994, she has been an assistant professor in computer science and electronic engineering at the Ecole Polytechnique of the University of Orleans, France. Her research interests include computer vision for 3D object modeling, pattern recognition, and color image analysis for biomedical or industrial applications.

Rémy Leconge received his Ph.D. degree in 2002 from the University of Bourgogne, France. He is an associate professor in the Image and Vision department of the PRISME laboratory at the University of Orleans, France. His research interests are focused on edge detection, multiscale analysis, motion estimation and tracking.

