
ORIGINAL RESEARCH PAPER

Real-time camera orientation estimation based on vanishing point tracking under Manhattan World assumption

Wael Elloumi · Sylvie Treuillet · Rémy Leconge

Received: 3 June 2013 / Accepted: 20 March 2014 / Published online: 4 April 2014

© Springer-Verlag Berlin Heidelberg 2014

J Real-Time Image Proc (2017) 13:669–684. DOI 10.1007/s11554-014-0419-9

Abstract This paper proposes a real-time pipeline for estimating the camera orientation from vanishing points, intended for indoor navigation assistance on a Smartphone. The orientation of the embedded camera relies on the ability to find a reliable triplet of orthogonal vanishing points. The proposed pipeline introduces a novel sampling strategy over finite and infinite vanishing points, combining RANSAC-based line clustering with tracking along the video sequence; this improves accuracy and robustness by extracting the three most pertinent orthogonal directions while keeping the processing time short enough for real-time use. Experiments on real images and video sequences acquired with a Smartphone show that the proposed strategy for selecting orthogonal vanishing points is pertinent: our algorithm gives better results than the recently published optimal RNS method, in particular for the yaw angle, which is essential for the navigation task.

Keywords Vanishing point · Real-time camera orientation · RANSAC algorithm · Navigation assistance · Smartphone

1 Introduction

This paper presents a practical approach to real-time camera orientation estimation for indoor navigation assistance on a Smartphone. Pedestrian navigation assistance is a highly challenging task that is increasingly needed in various applications such as visually impaired guidance, emergency intervention and tourism. Despite the growing research interest in this topic, there is no effective operational system. This can be explained by several technology roadblocks: besides the absence of existing cartography, the main obstacle remains the availability and accuracy of the required position information.

Nearly all Smartphones have a built-in GPS sensor, but it is still insufficient for pedestrian positioning because of its low accuracy. It is also often ineffective in urban areas or indoor environments where satellite signal reception is degraded. The alternative solution consists in fusing measurements from various additional self-contained sensors (magnetometers, accelerometers, digital compass, WiFi, Bluetooth, etc.) with a Kalman filter. The problem with these alternative positioning methods is the integration process, which accumulates large drift errors even over very short periods of time. Errors due to electromagnetic perturbations from electric or ferrous materials are also often reported in related works [14]. In this context, embedded computer vision is likely the most attractive technique for increasing positioning accuracy, since high-quality cameras are available in Smartphones and image processing is applicable in all environments, requiring no additional infrastructure. One convenient way to compute a real-time camera orientation in an urban area is based on vanishing points.

Under perspective projection, parallel lines in the 3D scene intersect in the image plane at a so-called vanishing point.

Electronic supplementary material The online version of this article (doi:10.1007/s11554-014-0419-9) contains supplementary material, which is available to authorized users.

W. Elloumi (✉) · S. Treuillet · R. Leconge
Laboratoire PRISME, Université d'Orléans, 12 rue de Blois, 45067 Orléans Cedex, France
e-mail: [email protected]

S. Treuillet
e-mail: [email protected]

R. Leconge
e-mail: [email protected]


Unlike finite vanishing points, whose coordinates are determined in the image plane, the infinite ones, which lie very far from the image center, are defined by a direction. In man-made urban environments, many line segments are oriented along three orthogonal directions aligned with the global reference frame. Under this so-called Manhattan World assumption, vanishing lines or points are pertinent visual cues for estimating the camera orientation [1, 9, 13, 16, 18, 20, 24]. This is an interesting alternative to structure and motion estimation based on feature matching, a sensitive problem in computer vision. The orientation matrix of a calibrated camera, parameterized with three angles, may be efficiently computed from three noise-free orthogonal vanishing points. Vanishing points are also used for camera calibration [18, 26, 30]. However, these techniques rely on the ability to find a robust orthogonal triplet of vanishing points in a real image.

The estimation of the camera orientation is generally computed in a single image; few works address tracking along a video sequence [4, 20]. In a previous work [11], we proposed a solution to achieve an accurate estimation of the camera orientation based on a sampling strategy among finite and infinite vanishing points extracted with random sample consensus (RANSAC)-based line clustering.

This paper extends that conference paper. Its principal additional contribution is a real-time implementation on a Smartphone. The experimental results have been significantly extended to evaluate our method against other state-of-the-art methods on real static images, and the performance evaluation additionally contains longer video sequences acquired by a handheld Smartphone. The paper also gives more explanations and clarifications than the earlier conference paper.

The paper is organized as follows. A review of related work is proposed in Sect. 2. Section 3 presents our method for selecting and filtering three reliable orthogonal vanishing points, which are used for estimating the camera orientation. Experimental results are shown in Sect. 4, and Sect. 5 concludes the paper.

2 Related work

Over the last 30 years, a broad literature has addressed vanishing point (VP) computation. In this section we review some of the most relevant works. The first approaches used the Hough transform and accumulation methods [6, 7, 19]. The efficiency of these methods highly depends on the discretization of the accumulation space, and they are not robust in the presence of outliers [29]. Furthermore, they do not consider the orthogonality of the resulting VP. An exhaustive search method may take the orthogonality constraint into account [27], but is not suited to real-time applications.

Few authors prefer to work on the raw pixels [10, 20]; published methods mainly work on straight lines extracted from the image. Depending on the mathematical formalization of VP, some variants exist in the choice of workspace: image plane [27, 30], Hough transform [7, 31], projective plane [25, 26] or Gaussian sphere [3, 8, 18, 19]. Using the Gaussian unit sphere or the projective plane allows finite and infinite VP to be treated equally, unlike the image plane. This is a well-suited representation for simultaneously clustering lines that converge at multiple VP by using a probabilistic expectation-maximization (EM) joint optimization approach [1, 9, 18, 23, 25]. These approaches address the misclassification and optimality issues, but the initialization and grouping are the determining factors of their efficiency.

Recently, many authors have adopted robust estimation based on RANSAC [12]. These approaches consider intersections of line segments as VP hypotheses and then iteratively cluster the parallel lines consistent with each hypothesis [4, 13, 15, 22, 28, 32]. A variant based on the J-linkage algorithm has been used by [30]. By dismissing the outliers, the RANSAC-based classifiers are much more robust than accumulative methods, which are limited by the size of the accumulator cell, and give a more precise position of the VP. They have been used to initialize EM estimators so that they converge to the correct VP. Other optimal solutions rely on analytical approaches, often based on time-consuming algorithms [2, 5, 17, 24]. In this last paper, it is interesting to note that, even though they are nondeterministic, the RANSAC-based approaches achieve results comparable to exhaustive search in terms of the number of clustered lines. RANSAC therefore remains a very good approach for extracting VP candidates, in combination with a judicious strategy for selecting a triplet consistent with the orthogonality constraint.

Despite the numerous papers dedicated to straight line clustering for computing adequate vanishing points, this problem remains an open issue for real-time application in video sequences: the estimation of the camera orientation is generally computed in a single image. As a summary, Table 1 classifies the most relevant works according to their VP search space (workspace) and the type of approach used.

To achieve an accurate estimation of the camera orientation on video sequences acquired with a handheld Smartphone, we develop a real-time pipeline based on our previous work [11]. The proposed framework operates in the image space, unlike [4], which uses the projection onto the Gauss sphere, a workspace well adapted to omnidirectional cameras. It uses line segments as primitives, which are easier image information to handle for VP extraction. In addition, it allows a joint process of estimation and classification of VP based on RANSAC.


To preserve a short processing time, only a few VP candidates are selected by splitting finite and infinite VP extracted with RANSAC-based line clustering. Indeed, the RANSAC algorithm allows a robust classification of vanishing lines and an accurate detection of their intersection. This classification is easy to implement, fast and, unlike probabilistic models, requires no initialization. Moreover, camera orientation tracking is reinforced by filtering the VP along the video sequence.

3 Proposed pipeline

The proposed pipeline is given in Fig. 1. To achieve an accurate estimation of the camera orientation based on three reliable orthogonal VP, the pipeline is composed of four steps. The first one consists of dominant line extraction from the detected image edges. The second one consists of selecting a triplet of VP by dominant line clustering with RANSAC. At this step, we introduce a guided search strategy to select only three reliable orthogonal VP that represent the orientation of the camera relative to the 3D world reference frame. Another contribution is the vanishing point filtering performed along the video sequences (step 3) to enforce the robustness of the camera orientation computation (step 4). The next sections give details and justifications for each block.

3.1 Dominant line detection

Some pre-processing is introduced to improve the quality and robustness of the detected edges in the case of an embedded camera: first, a histogram equalization harmonizes the distribution of brightness levels in the image; secondly, a geometric correction of the lens distortion is applied, assuming that all camera parameters have been estimated by a prerequisite calibration. To find the dominant lines, we first detect edges using the Canny detector. Then, we use the progressive probabilistic Hough transform (PPHT) [21], a form of adaptive probabilistic HT. The PPHT is faster than both the standard and the probabilistic Hough transforms, which makes it well suited for real-time application. Furthermore, this method enables the computation of line segment length, so only line segments whose length is greater than a threshold p1 are selected as dominant lines to estimate VP. The PPHT algorithm requires five parameters: two of them for the resolution of the accumulator, i.e., the Hough space (ρ = 1 pixel and θ = 3°); the minimum number of intersections to detect a line in the accumulator (d = 50); the minimum length of the selected line segments (p1 = 30 pixels); and the maximum gap between two points to be considered on the same line (p2 = 5 pixels). Note that all these parameters have been set empirically.
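As an illustration, a minimal sketch of this detection step with OpenCV (the library used in our implementation) is given below. The Canny thresholds are an assumption, since the paper only fixes the PPHT parameters, and the function name detectDominantLines is purely illustrative:

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

std::vector<cv::Vec4i> detectDominantLines(const cv::Mat& frame,
                                           const cv::Mat& K,
                                           const cv::Mat& distCoeffs) {
    cv::Mat gray, equalized, undistorted, edges;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, equalized);                    // harmonize brightness
    cv::undistort(equalized, undistorted, K, distCoeffs); // lens correction
    cv::Canny(undistorted, edges, 50, 150);               // thresholds assumed
    // PPHT with the paper's settings: rho = 1 px, theta = 3 deg,
    // accumulator threshold d = 50, p1 = 30 px, p2 = 5 px.
    std::vector<cv::Vec4i> lines;
    cv::HoughLinesP(edges, lines, 1.0, 3.0 * CV_PI / 180.0, 50, 30.0, 5.0);
    return lines;
}
```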

3.2 Vanishing point candidates

To provide three VP aligned with the three main orthogonal directions of the Manhattan World, the most intuitive method is to detect the intersections of dominant lines in images. Two types of VP are considered: the finite ones correspond to line intersections that may be defined by coordinates in the image plane; the infinite ones, for lines intersecting very far from the image center, are modeled by a direction.

Table 1 Classification of vanishing point estimation algorithms

Type of approach | Image | Hough transform | Gaussian sphere | Projective plane
Accumulation methods | Rother [27] | Cantoni et al. [7]; Tuytelaars et al. [31] | Boulanger et al. [6]; Shufelt [29]; Lutton et al. [19]; Barnard [3] | -
RANSAC-based methods | Saurer et al. [28]; Tardif [30]; Kalantari et al. [17]; Micusik et al. [22]; Wildenauer and Vincze [32]; Elloumi et al. [11] | - | Hwangbo and Kanade [15]; Förstner [13]; Bazin and Pollefeys [4] | -
Probabilistic methods | Bazin et al. [5]; Mirzaei and Roumeliotis [24]; Barinova et al. [2] | - | Kosecka and Zhang [18]; Antone and Teller [1]; Collins and Weiss [8]; Mingawa et al. [23] | Nieto and Salgado [25]; Pflugfelder and Bischof [26]


Unlike on the Gauss sphere, working in the image plane requires this differentiation between finite and infinite points, but it is faster since no transformation is required.

Recently, numerous authors have adopted RANSAC as a simple and powerful method to partition parallel straight lines into clusters by pruning outliers [13, 15, 22, 28, 32]. The process starts by randomly selecting two lines to generate a VP hypothesis. Then, all lines consistent with this hypothesis are grouped together to optimize the VP estimate. Once a dominant VP is detected, all the associated lines are removed and the process is repeated to detect the next dominant VP. The principal drawback of this sequential search is that no orthogonality constraint is imposed when selecting a reliable set of three VP to compute the camera orientation. Very recent works propose optimal estimates of three orthogonal VP through an analytical approach based on a multivariate polynomial system solver [24] or an optimization approach based on interval analysis theory [5], but at the expense of complex, time-consuming algorithms.

In this work, we introduce a guided search strategy, in conjunction with RANSAC, to extract a limited number of reliable VP while enforcing the orthogonality constraint.

3.3 VP sampling strategy

In the context of indoor navigation, the main orthogonal directions of the Manhattan World generally consist of a vertical one (often associated with an infinite VP) and two horizontal ones (associated with finite or infinite VP). We therefore propose a fast and robust sampling strategy that takes advantage of the differentiation between finite and infinite VP.

Three possible configurations are considered, depending on the alignment of the image plane with the 3D urban scene: (1) one finite and two infinite VP, (2) two finite and one infinite VP, (3) three finite VP. The first two configurations are common, unlike the third. More details about the computation of the camera orientation for these three configurations are given in Sect. 3.5. To avoid confusion between finite and infinite VP, detection is carried out in two steps: first we detect the finite VP, then the infinite ones. For a robust selection of VP, the proposed algorithm starts with the detection of three finite candidates. For this, we use RANSAC to randomly choose a pair of line segments (l_i, l_j) from the set of unlabeled lines L. The intersection of these two lines provides a potential finite VP (fvp_pot). To extract the lines participating in fvp_pot, we compute a consensus score given by Eqs. (1) and (2):

$$\mathrm{Score} = \sum_{i=0}^{n} f(v, l_i) \qquad (1)$$

where n is the number of dominant lines in the set L, and

$$f(v, l_i) = \begin{cases} 1, & d(v, l_i) < \delta \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

Fig. 1 Overview of the proposed algorithm


where d(v, l_i) is the Euclidean distance from the finite VP candidate v to the line l_i.

These three steps are repeated k times, where k is the number of random combinations drawn when selecting a pair of line segments from L (i.e., the number of iterations performed by the algorithm). Note that, for better performance, k must be proportional to the number of line segments; in our experiments, it is set to k = 2 × card(L). The potential VP (fvp_pot) with the highest number of participating lines is selected as the first finite VP (v1). All dominant lines associated with v1 are labeled and excluded from L, and the process is repeated twice to detect v2 and v3.
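The following sketch condenses this finite-VP search (Eqs. 1-2 plus the sequential removal of inliers). The Segment type, the helpers and the use of std::rand are illustrative choices, not the paper's code; only the threshold δ = 4 pixels and the k = 2 × card(L) draws follow the text:

```cpp
#include <opencv2/core.hpp>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Segment { cv::Point2f a, b; };

// Homogeneous line through a segment's endpoints.
static cv::Vec3f lineOf(const Segment& s) {
    return cv::Vec3f(s.a.x, s.a.y, 1.f).cross(cv::Vec3f(s.b.x, s.b.y, 1.f));
}

// Euclidean distance d(v, l) from point v to line l (used in Eq. 2).
static float pointLineDist(const cv::Point2f& v, const cv::Vec3f& l) {
    return std::abs(l[0] * v.x + l[1] * v.y + l[2]) / std::hypot(l[0], l[1]);
}

// One finite-VP detection pass; the caller removes the inliers from L and
// calls it again to find v2 and v3.
static bool detectFiniteVP(const std::vector<Segment>& L, float delta,
                           cv::Point2f& bestVP, std::vector<int>& bestInliers) {
    const int n = (int)L.size();
    if (n < 2) return false;
    const int k = 2 * n;                                  // k = 2 * card(L) draws
    int bestScore = 0;
    for (int it = 0; it < k; ++it) {
        int i = std::rand() % n, j = std::rand() % n;
        if (i == j) continue;
        cv::Vec3f x = lineOf(L[i]).cross(lineOf(L[j]));   // intersection point
        if (std::abs(x[2]) < 1e-6f) continue;             // near-parallel pair
        cv::Point2f vp(x[0] / x[2], x[1] / x[2]);         // hypothesis fvp_pot
        std::vector<int> inliers;                         // consensus score, Eq. 1
        for (int m = 0; m < n; ++m)
            if (pointLineDist(vp, lineOf(L[m])) < delta) inliers.push_back(m);
        if ((int)inliers.size() > bestScore) {
            bestScore = (int)inliers.size();
            bestVP = vp;
            bestInliers = inliers;
        }
    }
    return bestScore > 2;   // demand support beyond the hypothesis pair itself
}
```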

The procedure used to detect the two infinite VP is similar to that for the finite VP, except that the aim is to identify the vanishing line direction instead of an intersection. For this, we randomly select a line segment from the remaining dominant lines of the set L. We then compute its direction, which gives us a potential infinite VP (ivp_pot). The lines associated with ivp_pot are obtained using the consensus score given by Eqs. (1) and (3):

$$f(v, l_i) = \begin{cases} 1, & \min\!\left(\angle(\vec{v}, \vec{l_i}),\ \angle(\vec{l_i}, \vec{v})\right) < \tau \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where ∠(v⃗, l⃗_i) is the angle between the infinite VP direction from the image center and the line l_i under test in image space. Note that, in this case, k = card(L), the number of random draws for selecting a line segment from L (i.e., the number of iterations performed by the algorithm).
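A matching sketch for the infinite-VP consensus (Eq. 3) is given below; direction vectors are compared through their unsigned angle, and τ = 4° is the threshold quoted later in this section. Again, all identifiers are illustrative:

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

struct Segment { cv::Point2f a, b; };

// Unsigned angle in [0, pi/2] between two undirected 2D directions,
// i.e. min(angle(d1, d2), angle(d2, d1)) of Eq. (3).
static float undirectedAngle(const cv::Point2f& d1, const cv::Point2f& d2) {
    float dot = std::abs(d1.x * d2.x + d1.y * d2.y);
    float n = std::hypot(d1.x, d1.y) * std::hypot(d2.x, d2.y);
    return std::acos(std::min(1.f, dot / n));
}

static bool detectInfiniteVP(const std::vector<Segment>& L, float tauRad,
                             cv::Point2f& bestDir, std::vector<int>& bestInliers) {
    const int n = (int)L.size();
    if (n == 0) return false;
    int bestScore = 0;
    for (int it = 0; it < n; ++it) {                      // k = card(L) draws
        const Segment& s = L[std::rand() % n];
        cv::Point2f dir(s.b.x - s.a.x, s.b.y - s.a.y);    // hypothesis ivp_pot
        std::vector<int> inliers;                         // consensus score, Eq. 1
        for (int m = 0; m < n; ++m) {
            cv::Point2f dm(L[m].b.x - L[m].a.x, L[m].b.y - L[m].a.y);
            if (undirectedAngle(dir, dm) < tauRad) inliers.push_back(m);
        }
        if ((int)inliers.size() > bestScore) {
            bestScore = (int)inliers.size();
            bestDir = dir;
            bestInliers = inliers;
        }
    }
    return bestScore > 1;
}
```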

At the end of this procedure, we obtain three finite VP and two infinite VP. The three VP that best satisfy the orthogonality constraint are selected, as shown in Fig. 2.

The criteria used in the consensus score (Eq. 1) for clustering lines with RANSAC differ between the two categories. Unlike the finite VP, whose coordinates may be determined in the image plane, the infinite VP are represented by a direction. For finite VP, the consensus score is based on the distance between the candidate straight line and the intersection point (Eq. 2): all lines whose distance is below a fixed threshold δ are considered as participants (δ is equal to 4 pixels in our experiments). For infinite VP, it uses an angular distance between the direction of the candidate straight line and the direction representing the infinite VP (Eq. 3); the threshold τ is equal to 4° in our experiments.

To avoid redundant VP candidates, we introduce the supplementary constraint that candidates be far enough from each other: we impose a minimum angular distance between their directions from the image center (the threshold is set to 30° for finite VP and 60° for infinite ones).

By separating finite from infinite VP, the sampling strategy provides the most significant candidates of both categories without giving more importance to one or the other (we enforce having at least one finite candidate). Furthermore, this VP sampling strategy is faster, as we detect only five reliable VP candidates, whereas previously published methods generally detect many more.

Among the five candidates selected above, only the three VP whose directions from the optical center are orthogonal are accepted, including at least one finite VP. We adopt the following heuristic: (1) choose the finite VP with the highest consensus score; (2) select two other VP (finite or infinite) based on their orthogonality to the first one, considering their consensus score as a second criterion. Finally, we identify the vertical VP and the two horizontal ones. In our application, we assume that the camera is kept upright: we identify the vertical VP as the one whose direction from the image center is closest to the vertical. The two remaining VP are thus horizontal.
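One way to express this heuristic is sketched below, under the assumption that each candidate carries a unit direction from the optical center ((x, y, -f) normalized for a finite VP, (d_x, d_y, 0) for an infinite one) and its consensus score; the Candidate type and the greedy tie-breaking are our illustrative choices:

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

struct Candidate {
    cv::Vec3f dir;   // unit direction from the optical center
    int score;       // RANSAC consensus score
    bool finite;
};

static std::vector<Candidate> selectTriplet(std::vector<Candidate> cands) {
    // 1) Seed with the finite VP of highest consensus score (the sampling
    //    stage guarantees at least one finite candidate).
    std::sort(cands.begin(), cands.end(),
              [](const Candidate& a, const Candidate& b) {
                  if (a.finite != b.finite) return a.finite;
                  return a.score > b.score;
              });
    std::vector<Candidate> triplet{cands.front()};
    cands.erase(cands.begin());
    // 2) Greedily add the two candidates most orthogonal to those already
    //    kept (dot products nearest zero), the score breaking near-ties.
    while (triplet.size() < 3 && !cands.empty()) {
        auto best = std::min_element(cands.begin(), cands.end(),
            [&](const Candidate& a, const Candidate& b) {
                float ea = 0.f, eb = 0.f;
                for (const Candidate& t : triplet) {
                    ea += std::abs(t.dir.dot(a.dir));
                    eb += std::abs(t.dir.dot(b.dir));
                }
                if (ea != eb) return ea < eb;
                return a.score > b.score;
            });
        triplet.push_back(*best);
        cands.erase(best);
    }
    return triplet;
}
```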

3.4 Vanishing point filtering

Once the whole algorithm described above has been processed for the first frame of the video sequence, the VP positions can be tracked from one frame to the next (N > 1). Indeed, VP positions or directions are only slightly modified between frames of a video sequence or a list of successive frames. We therefore introduce a filtering test that checks the consistency between the VP estimated in frame N and those estimated in frame N − 1. For this, we use the distance between positions for the finite VP, d(v_N, v_{N-1}), and the angle between directions for the infinite ones, ∠(v⃗_N, v⃗_{N-1}). When a VP is not coherent with its previous position or direction, it is re-estimated taking its previous position or direction into account and using the remaining unclassified lines. Hence, aberrant VP are discarded and replaced by new VP that are both consistent with the previous ones and satisfy the orthogonality constraint.
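A minimal sketch of this per-frame consistency test follows; the two thresholds are assumptions, since the paper does not publish their values:

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>

struct TrackedVP {
    bool finite;
    cv::Point2f pos;   // image position (finite VP)
    cv::Point2f dir;   // direction from the image center (infinite VP,
                       // or a finite VP whose type may have changed)
};

// Consistency between the VP estimated in frame N and in frame N-1.
static bool isConsistent(const TrackedVP& cur, const TrackedVP& prev,
                         float maxDistPx, float maxAngleRad) {
    if (cur.finite && prev.finite) {                 // d(v_N, v_{N-1})
        float dx = cur.pos.x - prev.pos.x, dy = cur.pos.y - prev.pos.y;
        return std::hypot(dx, dy) < maxDistPx;
    }
    // Otherwise compare directions, which stays valid when a VP changes type.
    float dot = std::abs(cur.dir.x * prev.dir.x + cur.dir.y * prev.dir.y);
    float n = std::hypot(cur.dir.x, cur.dir.y) *
              std::hypot(prev.dir.x, prev.dir.y);
    return std::acos(std::min(1.f, dot / n)) < maxAngleRad;
}
```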

However, in a video sequence, camera motion may turn an infinite vanishing point into a finite one, or vice versa. This happens especially when turning, and it affects one of the horizontal VP. There are two possible situations:

1. A finite VP becomes infinite. To address this problem, we predict the position and the type (finite or infinite) of the VP using the position of the horizontal finite VP that gives the camera heading. After verifying the consistency of this horizontal finite VP, we compute the distance d that separates its positions in frames N and N − 1. We then predict the position of the second horizontal VP. Finally, we determine the type of the second horizontal VP depending on whether the midpoint of the segment joining its predicted position to the first horizontal VP lies inside the image plane or not. For a VP that moves away from the center of the image, the midpoint coordinates fall outside the image and the VP will subsequently be considered infinite (a minimal sketch of this midpoint test is given after this list).

2. An infinite VP becomes finite. This means that the intersection point of the lines associated with the VP is no longer very far from the image center, so the VP is detected as finite. In this case, the consistency criterion for infinite VP remains valid, since we can still compute the angular distance between the VP directions: we simply estimate the direction of the VP in the current image from the image center (finite VP) and compare it to its previous direction (infinite VP).
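The midpoint test of situation 1 can be sketched as follows; the names and the bounds check are illustrative:

```cpp
#include <opencv2/core.hpp>

// Decide whether the predicted second horizontal VP should switch to the
// infinite type: the test uses the midpoint of the segment joining its
// predicted position to the first horizontal VP.
static bool becomesInfinite(const cv::Point2f& predictedVP,
                            const cv::Point2f& firstHorizontalVP,
                            const cv::Size& imageSize) {
    cv::Point2f mid = 0.5f * (predictedVP + firstHorizontalVP);
    // A midpoint outside the image plane means the VP is moving away from
    // the image center and is subsequently treated as infinite.
    bool inside = mid.x >= 0.f && mid.x < (float)imageSize.width &&
                  mid.y >= 0.f && mid.y < (float)imageSize.height;
    return !inside;
}
```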

Vanishing point filtering is efficient: it makes our algorithm much more stable and robust, as will be shown in the experiments section, and it renders the standard post-RANSAC optimization steps unnecessary. Once the three most reliable VP are extracted in the image, the camera orientation is computed frame by frame, as described in the next section.

3.5 Computation of the camera orientation

This part is directly inspired by [6] to compute the camera orientation from the three VP, which are supposed to be orthogonal. We use the directions of the detected VP, which correspond to the camera orientation, to compute the rotation matrix (u⃗, v⃗, w⃗). The vectors u⃗, v⃗ and w⃗ represent three orthogonal directions of the scene, respectively: the first horizontal direction, the vertical direction and the second horizontal direction. They must satisfy the following orthonormal relations:

$$\begin{cases} \vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{w} = \vec{w} \cdot \vec{u} = 0 \\ \|\vec{u}\| = \|\vec{v}\| = \|\vec{w}\| = 1 \end{cases} \qquad (4)$$

Figure 3 depicts an example of the camera orientation estimation with three orthogonal VP (two finite and one infinite VP). (X⃗, Y⃗, Z⃗) is the camera coordinate frame; (α, ψ, ρ) are the three Euler angles of the camera, representing the yaw, pitch and roll, respectively; V1 and V2 are the two finite VP and I⃗_1 is the infinite VP direction. The estimation of the camera orientation depends on the VP configuration [6].

3.5.1 One finite and two infinite VP

This situation is the most frequent one. It occurs when the image plane is aligned with two axes of the world coordinate frame, as shown in Fig. 4. Let V be the finite VP and f the focal length. The direction of V can be expressed as $\overrightarrow{OV} = (V_{1x}, V_{1y}, -f)^T$, whereas the directions of the infinite VP, in image space, are $\vec{I_1} = (I_{1x}, I_{1y}, 0)^T$ and $\vec{I_2} = (I_{2x}, I_{2y}, 0)^T$. The vectors of the rotation matrix are given by the following system of equations [6]:

$$\begin{cases} \vec{u} = \left(I_{1x},\ I_{1y},\ \dfrac{I_{1x}V_{1x} + I_{1y}V_{1y}}{f}\right)^{T} \\[6pt] \vec{v} = \left(I_{2x},\ I_{2y},\ \dfrac{I_{2x}V_{1x} + I_{2y}V_{1y}}{f}\right)^{T} \\[6pt] \vec{w} = -\overrightarrow{OV} = \left(-V_{1x},\ -V_{1y},\ f\right)^{T} \end{cases} \qquad (5)$$

3.5.2 Two finite and one infinite VP

This situation happens when the image plane is aligned with only one of the three axes of the world coordinate frame (Fig. 3). Let V1 and V2 be the two finite VP, with directions $\overrightarrow{OV_1} = (V_{1x}, V_{1y}, -f)^T$ and $\overrightarrow{OV_2} = (V_{2x}, V_{2y}, -f)^T$. Since there are two finite horizontal VP, we set $\vec{w}$ from the VP closest to the image center. The vector $\vec{v}$ is obtained by cross product, as shown in the system of equations below [6]:

$$\begin{cases} \vec{u} = \left(-V_{1x},\ -V_{1y},\ f\right)^{T} \\ \vec{w} = \left(-V_{2x},\ -V_{2y},\ f\right)^{T} \\ \vec{v} = \vec{u} \times \vec{w} \end{cases} \qquad (6)$$


3.5.3 Three finite VP

This last configuration is the least frequent one. It occurs when there is no alignment between the image plane and the world coordinate frame, as depicted in Fig. 5. Let V1, V2 and V3 be the three finite VP, with directions $\overrightarrow{OV_1} = (V_{1x}, V_{1y}, -f)^T$, $\overrightarrow{OV_2} = (V_{2x}, V_{2y}, -f)^T$ and $\overrightarrow{OV_3} = (V_{3x}, V_{3y}, -f)^T$. We start by setting $\vec{v}$ from the VP whose direction is closest to the vertical direction. We then set $\vec{w}$ from the VP closest to the image center. In the system of Eq. (7) [6], we assume that V2 is the vertical VP and V3 is closest to the image center:

$$\begin{cases} \vec{u} = \left(-V_{1x},\ -V_{1y},\ f\right)^{T} \\ \vec{v} = \left(-V_{2x},\ -V_{2y},\ f\right)^{T} \\ \vec{w} = \left(-V_{3x},\ -V_{3y},\ f\right)^{T} \end{cases} \qquad (7)$$
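For completeness, a conventional extraction of the three Euler angles from the resulting rotation matrix is sketched below. The paper does not spell out its Euler convention, so a standard Z-Y-X (yaw-pitch-roll) decomposition is assumed here:

```cpp
#include <opencv2/core.hpp>
#include <cmath>

// Yaw/pitch/roll from a rotation matrix, assuming a Z-Y-X convention
// (an assumption: the paper does not state its exact convention).
static void eulerZYX(const cv::Matx33f& R,
                     float& yaw, float& pitch, float& roll) {
    pitch = std::asin(-R(2, 0));              // rotation about the Y axis
    yaw   = std::atan2(R(1, 0), R(0, 0));     // rotation about the Z axis
    roll  = std::atan2(R(2, 1), R(2, 2));     // rotation about the X axis
}
```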

4 Experimental results

This section presents the performance evaluation of the proposed method on real static images and video sequences.

4.1 Accuracy study

For comparison purposes, we have tested our algorithm on the public York Urban Database (YUD) against two other approaches for estimating the VP cited in the related work section:

• A Hough-based accumulation approach, ATIP, proposed by [6].
• An analytical method, RNS, recently published by [24], that provides optimal least-squares estimates of three orthogonal vanishing points using a RANSAC-based classification of lines.

This database provides the original images, camera calibration parameters, ground truth line segments and the three Manhattan vanishing points needed to compute the three Euler angles of the camera orientation for each image [10]. Figure 6 illustrates some orthogonal vanishing points and their associated parallel lines extracted by our algorithm on images drawn from the YUD. Vanishing lines are colored according to the vanishing point at which they meet.

Table 2 presents a comparative study of the angular distance from the ground truth (GT) of the camera orientation computed, respectively, with the proposed framework, the RNS method and the ATIP method.

Fig. 4 Computation of the camera orientation with one finite VP and two infinite VP

Fig. 5 Computation of the camera orientation with three finite VP

Fig. 3 Camera coordinate frame


The average and standard deviation of the angular distance are computed over a set of 50 images for the three angles. The three last rows of Table 2 give the number of times the distance exceeds a fixed value of 2°, 5° and 10°, respectively. Our method produces accurate estimates of the camera orientation, since the angular distance remains below 2° for most images. The ATIP method is clearly less reliable, as it may misclassify parallel lines in the presence of outliers and does not enforce the orthogonality constraint between the vanishing points. By using a RANSAC-based classification of lines, the other methods remove the outliers from the parallel groups.

The RNS method gives the best result for the pitch angle, but it is interesting to note that our method is significantly better for the yaw and roll angles. The yaw is essential for a pedestrian navigation task, since it gives the camera viewing direction. This result may be explained by our strategy of selecting orthogonal vanishing points that are distant enough from each other, without confusion between finite and infinite points.

The angular deviation of the three Euler angles estimated by our proposed framework and by the RNS and ATIP methods, compared to the GT, is reported in Fig. 7a. Figure 7b, c provides the maximum and minimum angular deviation of the estimated vanishing points from the GT, respectively. Using our proposed framework, 78 % of the tested images yield an angular deviation error below 2°, while the RNS and ATIP methods reach only 68 and 18 %, respectively.

Our method and ATIP have been implemented using Visual C++ and the OpenCV library, an open-source library specialized in computer vision applications.

Fig. 6 Examples of triplets of orthogonal VP detected by our algorithm on images from the YUD. Magenta lines correspond to the vertical VP, while blue and green lines correspond to the two horizontal VP

Table 2 Accuracy validation of the camera orientation computation

Angular distance from GT | Our method (Pitch / Yaw / Roll) | RNS (Pitch / Yaw / Roll) | ATIP (Pitch / Yaw / Roll)
Average | 1.38 / 0.75 / 0.69 | 0.74 / 1.70 / 1.81 | 3.80 / 4.36 / 1.80
Standard deviation | 1.57 / 0.60 / 0.65 | 0.62 / 1.82 / 1.88 | 4.74 / 5.69 / 2.67
>2° | 8 / 3 / 1 | 1 / 13 / 16 | 23 / 27 / 11
>5° | 2 / 0 / 0 | 0 / 2 / 2 | 14 / 13 / 6
>10° | 0 / 0 / 0 | 0 / 1 / 1 | 5 / 4 / 1

Comparative study of the methods with regard to the average and standard deviation of the angular distance from the GT (in degrees)


The full results of the RNS method are available in a technical report provided online by the authors (http://umn.edu/~faraz/vp).

Concerning the processing time for estimating the three orthogonal vanishing points on YUD images of size 640 × 480 pixels, our algorithm takes about 86 ms, against about 1.12 s for the ATIP method, on an Intel Core 2 Duo processor, without considering the line extraction step. For RNS, the authors mention that their C++ implementation of the symbolic-numeric method takes about 290 ms to find the solutions on a Core 2 Duo processor.

4.2 Tracking the camera orientation on a Smartphone

To show the efficiency of our algorithm for tracking the camera orientation, we acquired real video sequences with a handheld Smartphone. Figure 8 illustrates our experimental prototype.

Fig. 7 Comparison of the accuracy of our proposed framework against the RNS and ATIP methods. Cumulative histograms of: a the angular deviation of the three Euler angles from the GT (in degrees), b the maximum angular deviation from the GT and c the minimum angular deviation from the GT


The whole image processing chain presented above has been implemented on an Android-based Smartphone (Samsung GALAXY S III) using the OpenCV4Android SDK. The Smartphone's camera was first calibrated using the camera calibration toolbox proposed by Bouguet (http://www.vision.caltech.edu/bouguetj/calib_doc/).

Figures 9 and 10 depict some typical results of vanishing point extraction for two sequences acquired on the premises of Polytech' Orléans. The first one, referred to as "Seq1", is composed of 475 frames of size 320 × 240 pixels. The second one, referred to as "Seq2", contains 478 images.

    Fig. 8 Our experimental prototype

Fig. 9 Examples of detection and tracking of triplets of orthogonal vanishing points and their associated lines in the video sequence referred to as "Seq1". Vanishing lines are colored according to the vanishing point at which they meet. Below each example image, a 3D vector representation illustrates the estimated angles


The vanishing lines are shown colored according to the vanishing point with which they have been associated. For each example image, the camera orientation is shown below by a 3D vector representation.

Figures 11 and 12 compare the evolution of the pitch, yaw and roll angles of the camera orientation along the two sequences when applying our method with and without the vanishing point filtering (VPF). The VPF produces a smoother run and a more reliable estimation of the camera orientation along the video sequence. Since the VPF removes aberrant vanishing points, keeping only those that are consistent, we obtain a more accurate camera orientation.

To illustrate the efficiency of the proposed vanishing point sampling strategy based on line clustering with RANSAC, Figs. 13 and 14 show the evolution of the number of vanishing lines extracted along the two sequences. These figures plot the total number of vanishing lines as well as the number of lines participating in the three VP (inliers). They also depict the subset of lines associated with the two horizontal VP and the subset associated with the vertical VP. It is clear that, by using a RANSAC-based classification of lines, the method removes the outliers.

The whole algorithm for estimating the three orthogonal vanishing points, including line segment detection and orientation estimation, runs in about 100 ms per frame of size 320 × 240 pixels on the Galaxy S III, depending on the scene and the number of detected lines.

5 Conclusion

Selecting three reliable orthogonal vanishing points in images, corresponding to the Manhattan directions, is a prerequisite to achieving an accurate estimation of the camera orientation.

Fig. 10 Examples of detection and tracking of triplets of orthogonal vanishing points and their associated lines in the video sequence referred to as "Seq2". Line segments are colored according to the vanishing point at which they meet. Below each example image, a 3D vector representation illustrates the estimated angles


Fig. 11 Smoothing effect of the vanishing point filtering (VPF) on the estimation of the camera's orientation (pitch, yaw and roll angles) along the video sequence referred to as "Seq1"


Fig. 12 Smoothing effect of the vanishing point filtering (VPF) on the estimation of the camera's orientation (pitch, yaw and roll angles) along the video sequence referred to as "Seq2"


In this paper, we present and test a solution for real-time application on a Smartphone. Our algorithm relies on a novel sampling strategy among finite and infinite vanishing points and on a filtering along the video sequence. The performance of our algorithm is validated on real static images and video sequences. Experimental results on real images show that the adopted strategy of selecting three reliable, distant and orthogonal vanishing points in conjunction with RANSAC performs well in practice. We obtain competitive results for camera orientation estimation compared with the best analytical method of the state of the art. Indeed, our method is well adapted to the navigation task, since the estimation is significantly better for the yaw angle, which gives the pedestrian heading.

Furthermore, our method was tested on videos captured in real conditions of pedestrian navigation with a handheld Smartphone. It has proved robust for tracking the camera orientation throughout video sequences, and the results show that the proposed filtering efficiently dismisses aberrant vanishing points along the sequence.

In the future, the presented camera orientation estimation will be part of a blind pedestrian navigation system. The vision-based module is devoted to limiting the drift error by using key frames along the path and to refining the approximate position by integrating the heading of the user between successive key frames. This heading information is very important for guiding a pedestrian during navigation.

Fig. 13 Evolution of the total number of lines extracted in images and the number of lines associated, respectively, with the horizontal and vertical VP along the video sequence referred to as "Seq1"

Fig. 14 Evolution of the total number of lines extracted in images and the number of lines associated, respectively, with the horizontal and vertical VP along the video sequence referred to as "Seq2"


Acknowledgments This study is supported by HERON Technologies SAS and the Conseil Général du LOIRET. The authors gratefully acknowledge the contribution of Kamel Guissous and Aladine Chetouani for their help concerning the Smartphone coding.

References

1. Antone, M., Teller, S.: Automatic recovery of relative camera rotations for urban scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hilton Head Island, South Carolina, 13–15 June 2000. IEEE Computer Society (2000)
2. Barinova, O., Lempitsky, V., Tretiak, E., Kohli, P.: Geometric image parsing in man-made environments. In: Proceedings of the 11th European Conference on Computer Vision (ECCV), Crete, Greece, 5–11 September 2010. Lecture Notes in Computer Science, vol. 6312. Springer, Berlin Heidelberg (2010)
3. Barnard, S.T.: Interpreting perspective images. Artif. Intell. 21(4), 435–462 (1983)
4. Bazin, J.C., Pollefeys, M.: 3-line RANSAC for orthogonal vanishing point detection. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2012), Vilamoura, Algarve, Portugal, 7–12 October (2012)
5. Bazin, J.C., Seo, Y., Demonceaux, C., Vasseur, P., Ikeuchi, K., Kweon, I., Pollefeys, M.: Globally optimal line clustering and vanishing point estimation in a Manhattan world. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, USA, 16–21 June 2012. IEEE Computer Society (2012)
6. Boulanger, K., Bouatouch, K., Pattanaik, S.: ATIP: a tool for 3D navigation inside a single image with automatic camera calibration. In: Proceedings of the EG UK Conference on Theory and Practice of Computer Graphics, Middlesbrough, UK, 20–22 June (2006)
7. Cantoni, V., Lombardi, L., Porta, M., Sicard, N.: Vanishing point detection: representation analysis and new approaches. In: Proceedings of the IEEE International Conference on Image Analysis and Processing (ICIAP), Palermo, Italy, 26–28 September 2001. IEEE Computer Society (2001)
8. Collins, R.T., Weiss, R.S.: Vanishing point calculation as statistical inference on the unit sphere. In: Proceedings of the 3rd International Conference on Computer Vision (ICCV), Osaka, Japan, 4–7 December 1990. IEEE (1990)
9. Coughlan, J.M., Yuille, A.L.: Manhattan world: compass direction from a single image by Bayesian inference. In: Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV), Kerkyra, Greece, 20–27 September 1999. IEEE (1999)
10. Denis, P., Elder, J.H., Estrada, F.: Efficient edge-based methods for estimating Manhattan frames in urban imagery. In: Proceedings of the 10th European Conference on Computer Vision (ECCV), Marseille, France, 12–18 October 2008. Lecture Notes in Computer Science, vol. 5303. Springer, Berlin Heidelberg (2008)
11. Elloumi, W., Treuillet, S., Leconge, R.: Tracking orthogonal vanishing points in video sequences for a reliable camera orientation in Manhattan world. In: Proceedings of the 5th International Congress on Image and Signal Processing (CISP 2012), Chongqing, China, 16–18 October (2012)
12. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981). doi:10.1145/358669.358692
13. Förstner, W.: Optimal vanishing point detection and rotation estimation of single images from a Legoland scene. In: Proceedings of the ISPRS Symposium Commission III PCV, Paris, 1–3 September (2010)
14. Hide, C., Moore, T., Botterill, T.: Low cost vision-aided IMU for pedestrian navigation. J. Glob. Position. Syst. 10(1), 3–10 (2011). doi:10.5081/jgps.10.1.3
15. Hwangbo, M., Kanade, T.: Visual-inertial UAV attitude estimation using urban scene regularities. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2011), Shanghai, China, 9–13 May 2011. IEEE (2011)
16. Kalantari, M., Hashemi, A., Jung, F., Guédon, J.P.: A new solution to the relative orientation problem using only 3 points and the vertical direction. J. Math. Imaging Vis. 39(3), 259–268 (2011). doi:10.1007/s10851-010-0234-2
17. Kalantari, M., Jung, F., Guédon, J.P.: Precise, automatic and fast method for vanishing point detection. Photogramm. Rec. 24(127), 246–263 (2009)
18. Kosecka, J., Zhang, W.: Video compass. In: Proceedings of the 7th European Conference on Computer Vision (ECCV), Copenhagen, Denmark, 27 May–2 June 2002. Lecture Notes in Computer Science, vol. 2353. Springer, Berlin Heidelberg (2002)
19. Lutton, E., Maitre, H., Lopez-Krahe, J.: Contribution to the determination of vanishing points using Hough transform. IEEE Trans. Pattern Anal. Mach. Intell. 16(4), 430–438 (1994)
20. Martins, A., Aguiar, P., Figueiredo, M.: Orientation in Manhattan world: equiprojective classes and sequential estimation. IEEE Trans. Pattern Anal. Mach. Intell. 27, 822–826 (2005)
21. Matas, J., Galambos, C., Kittler, J.V.: Robust detection of lines using the progressive probabilistic Hough transform. Comput. Vis. Image Underst. (Special Issue on Robust Statistical Techniques in Image Understanding) 78(1), 119–137 (2000)
22. Micusik, B., Wildenauer, H., Vincze, M.: Towards detection of orthogonal planes in monocular images of indoor environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), Pasadena, California, USA, 19–23 May 2008. IEEE (2008)
23. Mingawa, A., Tagawa, N., Moriya, T., Gotoh, T.: Vanishing point and vanishing line estimation with line clustering. IEICE Trans. Inf. Syst. E83-D(7), 1574–1582 (2000)
24. Mirzaei, F.M., Roumeliotis, S.I.: Optimal estimation of vanishing points in a Manhattan world. In: Proceedings of the 13th IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011. IEEE (2011)
25. Nieto, M., Salgado, L.: Simultaneous estimation of vanishing points and their converging lines using the EM algorithm. Pattern Recogn. Lett. 32(14), 1691–1700 (2011)
26. Pflugfelder, R., Bischof, H.: Online auto-calibration in man-made worlds. In: Proceedings of Digital Image Computing: Techniques and Applications (DICTA), Cairns, Australia, 6–8 December 2005. IEEE Computer Society (2005)
27. Rother, C.: A new approach for vanishing point detection in architectural environments. In: Proceedings of the 11th British Machine Vision Conference (BMVC), University of Bristol, 11–14 September (2000)
28. Saurer, O., Fraundorfer, F., Pollefeys, M.: Visual localization using global visual features and vanishing points. In: Proceedings of CLEF (Notebook Papers/LABs/Workshops), Padua, Italy, 20–23 September (2010)
29. Shufelt, J.A.: Performance evaluation and analysis of vanishing point detection techniques. IEEE Trans. Pattern Anal. Mach. Intell. 21(3), 282–288 (1999)
30. Tardif, J.-P.: Non-iterative approach for fast and accurate vanishing point detection. In: Proceedings of the 12th IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, 27 September–4 October 2009. IEEE (2009)
31. Tuytelaars, T., Gool, L.J.V., Proesmans, M., Moons, T.: A cascaded Hough transform as an aid in aerial image interpretation. In: Proceedings of the 6th IEEE International Conference on Computer Vision (ICCV), Bombay, India, 4–7 January 1998. IEEE Computer Society (1998)
32. Wildenauer, H., Vincze, M.: Vanishing point detection in complex man-made worlds. In: Proceedings of the International Conference on Image Analysis and Processing (ICIAP), Modena, Palazzo Ducale, 10–13 September (2007)

Wael Elloumi received his master's degree in computer science from the INSA of Rouen, France, in 2008. In 2012, he obtained his Ph.D. degree in computer vision from the University of Orleans, France. Currently, he is a postdoctoral researcher at the PRISME Laboratory, Image and Vision team of the University of Orleans. His research interests include computer vision, image matching, 3D mapping, vanishing points and their applications to indoor navigation and assistive systems.

Sylvie Treuillet received her master's degree in electrical engineering from the University of Clermont-Ferrand, France, in 1988. From 1988 to 1990, she worked in a private company in Montpellier developing image processing and pattern recognition methods in automatic cytogenetics. She then joined the academic research laboratory LASMEA and was awarded her Ph.D. degree in computer vision in 1993 at the University of Clermont-Ferrand, France. Since 1994, she has been an assistant professor in computer science and electronic engineering at the Ecole Polytechnique of the University of Orleans, France. Her research interests include computer vision for 3D object modeling, pattern recognition, and color image analysis for biomedical or industrial applications.

Rémy Leconge received his Ph.D. degree in 2002 from the University of Bourgogne, France. He is an associate professor in the Image and Vision department of the PRISME laboratory at the University of Orleans, France. His research interests are focused on edge detection, multiscale analysis, motion estimation and tracking.

