UNIVERSITY OF CALGARY
Computer Vision based Indoor Navigation Utilizing Information from Planar Surfaces
by
Neha Dawar
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE
DEGREE OF MASTER OF SCIENCE
GRADUATE PROGRAM IN ELECTRICAL AND COMPUTER ENGINEERING
CALGARY, ALBERTA
SEPTEMBER, 2014
© Neha Dawar 2014
Abstract
Traditional wireless-signalling-based outdoor navigation techniques generally perform poorly in
indoor environments due to low signal strength and multipath distortions. Computer vision (CV)
sensors, owing to their low cost and high performance, have gained enormous interest for indoor
navigation in recent years.
CV based 6DOF trajectory estimation is understood to be a computationally intensive ill-posed
problem. Drastic simplification and enhanced robustness are possible in scenarios where camera
observed features are constrained to a plane, such as a floor surface. Furthermore, if the features
have geometric patterns, such as a regularly tiled surface, significantly more powerful constraints
can be implemented. Exploration of such constraints is the aim of this thesis. Experimental
results show that centimeter level accuracy in trajectory estimation can be achieved for arbitrary
camera motion spanning several meters. As shown in this thesis, this accuracy results from the
constraints imposed by the observed planar features.
Acknowledgements
First of all, I would like to express my deepest appreciation and thanks to my supervisor, Dr.
John Nielsen, and my co-supervisor, Dr. Gérard Lachapelle, for providing me with this
opportunity to be a part of one of the most renowned research groups in navigation. I would like
to thank you for your valuable support and wisdom in encouraging my research. Without your
guidance and assistance, this research and the results achieved would not have been possible.
I would like to thank my friend and team member, Yuqi Li, for all the educational discussions
and guidance throughout the course of my research. A special thanks to my friend, Tushar
Sharma, for his continuous help and assistance with taking the readings for the experimental
verification of the work. I would like to acknowledge the Electrical and Computer Engineering
Department for making this research possible.
Finally, I would like to thank my parents, who, despite being far away, have always provided
me with great advice and encouragement. I am highly grateful to them for always being
supportive of my studies and for always being a source of enthusiasm for me.
Table of Contents
Abstract ..... ii
Acknowledgements ..... iii
Table of Contents ..... iv
List of Tables ..... vi
List of Figures and Illustrations ..... viii
List of Symbols, Abbreviations and Nomenclature ..... xiv
CHAPTER ONE: INTRODUCTION ..... 1
1.1 Introduction to Navigation ..... 1
1.2 Existing Indoor Navigation Techniques ..... 3
1.3 Integration with Computer Vision ..... 4
1.4 Objectives ..... 7
1.5 Contributions ..... 8
1.6 Organization ..... 9
CHAPTER TWO: BACKGROUND ..... 11
2.1 The Geometric Model ..... 12
2.2 Transformations ..... 16
2.2.1 Affine Transformation ..... 16
2.2.2 Perspective Transformation ..... 18
2.3 Feature points ..... 22
2.3.1 Examples of feature detection ..... 25
2.4 Optical Flow ..... 28
2.4.1 Example of optical flow using Lucas Kanade Pyramid ..... 34
CHAPTER THREE: PROPOSED ALGORITHM ..... 37
3.1 Camera Calibration ..... 37
3.1.1 Intrinsic Camera Parameters ..... 38
3.1.2 Distortion Parameters ..... 40
3.1.3 Calibration and distortion mitigation ..... 42
3.2 Image Pre-processing ..... 43
3.2.1 Gaussian Smoothing ..... 45
3.2.2 Edge Detection ..... 47
3.2.3 Thresholding ..... 50
3.3 Hough Lines ..... 52
3.4 Proposed 4DOF egomotion algorithm ..... 55
3.4.1 Least Squares estimation ..... 61
3.4.2 Kalman Filter estimation ..... 63
3.4.3 Estimation of camera motion from the transformation matrix ..... 65
3.5 Proposed 6DOF algorithm for rectangular patterned surface ..... 66
3.6 Proposed 6DOF algorithm for any planar surface ..... 75
CHAPTER FOUR: EXPERIMENTAL VERIFICATION ..... 84
4.1 4DOF algorithm verification ..... 85
4.1.1 Verification on simulated videos ..... 86
4.1.2 Verification using stereoscopic view ..... 95
4.1.3 Back Projection Verification ..... 100
4.1.4 Verification based on known trajectory ..... 104
4.2 Verification of 6DOF algorithm for rectangular patterned surfaces ..... 108
4.2.1 Results of tilt removal on rectangular tiled floor ..... 109
4.2.2 Verification based on stereoscopic view ..... 110
4.2.3 Verification based on back projection ..... 115
4.2.4 Verification based on a known trajectory ..... 117
4.3 Verification of 6DOF algorithm for camera directed at any planar surface ..... 120
4.3.1 Verification using stereoscopic view ..... 120
4.3.2 Verification based on known trajectory ..... 125
4.3.3 Comparison of trajectory estimation on patterned and concrete surfaces ..... 127
CHAPTER FIVE: CONCLUSIONS AND FUTURE WORK ..... 132
5.1 Conclusions ..... 132
5.2 Future Work ..... 135
REFERENCES ..... 137
List of Tables
Table 2.1 Definitions to understand the geometric model ............................................................ 12
Table 3.1 Intrinsic parameters of the Bumblebee stereo camera .................................................. 43
Table 4.1 RMS variations in trajectories of two cameras moving together estimated using the 4DOF algorithm .................................................................................................................... 97
Table 4.2 RMS variations in azimuthal rotations of two cameras moving together estimated using the 4DOF algorithm .................................................................................................... 98
Table 4.3 RMS variations in trajectories of left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm .................................................................................. 100
Table 4.4 RMS variations in azimuthal rotations of left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm ...................................................................... 100
Table 4.5 RMS errors in camera trajectory obtained for a circular motion using the 4DOF algorithm ............................................................................................................................. 107
Table 4.6 RMS errors in azimuthal rotation obtained for circular motion using the 4DOF algorithm ............................................................................................................................. 108
Table 4.7 RMS variations in trajectories of cameras moving together obtained using the 6DOF algorithm for rectangular patterns ............................................................................ 111
Table 4.8 RMS variations in azimuthal rotations of cameras moving together obtained using the 6DOF algorithm for rectangular patterns ...................................................................... 112
Table 4.9 RMS variations in trajectories of the left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns ................................. 113
Table 4.10 RMS variations in azimuthal rotations of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns ................................. 114
Table 4.11 RMS errors in trajectory obtained for a circular motion using the 6DOF algorithm for rectangular patterns ....................................................................................................... 119
Table 4.12 RMS errors in the azimuthal rotation obtained for a circular motion using the 6DOF algorithm for rectangular patterns ............................................................................ 120
Table 4.13 RMS variations in long range trajectories of the two sensors of the Bumblebee camera obtained using the 6DOF algorithm ....................................................................... 121
Table 4.14 RMS variations in long range azimuthal rotations of two sensors of the Bumblebee camera obtained using the 6DOF algorithm .................................................... 122
Table 4.15 RMS variations in the trajectories of the sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................. 123
Table 4.16 RMS variations in azimuthal rotations of sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................. 125
Table 4.17 RMS errors in trajectory obtained for a circular motion using the 6DOF algorithm 127
Table 4.18 RMS errors in azimuthal rotation obtained for a circular motion using the 6DOF algorithm ............................................................................................................................. 127
Table 4.19 RMS errors in trajectory obtained for circular motion on a patterned surface using the 6DOF algorithm ............................................................................................................ 129
Table 4.20 RMS errors of trajectory obtained for circular motion on a concrete surface using the 6DOF algorithm ............................................................................................................ 129
Table 4.21 RMS errors in azimuthal rotation obtained for circular motion on a patterned surface using the 6DOF algorithm ...................................................................................... 131
Table 4.22 RMS errors of azimuthal rotation obtained for circular motion on a concrete surface using the 6DOF algorithm ...................................................................................... 131
List of Figures and Illustrations
Figure 1.1 Multipath scenario in urban canyon .............................................................................. 2
Figure 1.2 Positioning based on triangulation ................................................................................ 3
Figure 1.3 Examples of planar surfaces .......................................................................................... 5
Figure 1.4 Examples of patterned surfaces ..................................................................................... 6
Figure 2.1 Imaging model for pinhole camera [22] ...................................................................... 13
Figure 2.2 Frontal imaging model for pinhole camera ................................................................. 14
Figure 2.3 Projection of 3D point on the camera image plane ..................................................... 15
Figure 2.4 An example of an affine transformed image ............................................................... 16
Figure 2.5 Affine transformation .................................................................................................. 17
Figure 2.6 An example of perspective transformation .................................................................. 19
Figure 2.7 Illustration of suitable and unsuitable feature points ................................................... 22
Figure 2.8 Wedge corners deviating from 90° providing low quality feature points ................... 23
Figure 2.9 Poor quality feature points at circular arcs .................................................................. 23
Figure 2.10 Corner feature points ................................................................................................. 25
Figure 2.11 Derivative images for corner features ....................................................................... 26
Figure 2.12 Plot of the larger eigenvalues of Q for 90° features ................................ 26
Figure 2.13 Plot of the smaller eigenvalues of Q for 90° features .............................. 27
Figure 2.14 Corner detection of simple geometric shapes ............................................................ 28
Figure 2.15 Side view of Gaussian pulse at time t and t+dt ......................................................... 32
Figure 2.16 Top view of Gaussian pulse at time t and t+dt .......................................................... 32
Figure 2.17 Spatial derivative of the Gaussian pulse in x and y directions .................................. 33
Figure 2.18 Time derivative of the Gaussian pulse ...................................................................... 33
Figure 2.19 Pyramid structure of images in Lucas Kanade Pyramid algorithm ........................... 34
Figure 2.20 Plot of Gaussian pulses at different levels of pyramid .............................................. 35
Figure 2.21 Contour of Gaussian pulse at the second level of pyramid ....................................... 36
Figure 3.1 Effects of radial distortion ........................................................................................... 41
Figure 3.2 Images of different orientations of checkerboard captured using a camera ................ 42
Figure 3.3 Undistortion of the image of a tiled floor .................................................................... 43
Figure 3.4 Kernel based image processing ................................................................................... 44
Figure 3.5 Plot of Gaussian filter kernel ....................................................................................... 46
Figure 3.6 Results of Gaussian filtering ....................................................................................... 47
Figure 3.7 Result of Canny edge detection ................................................................................... 50
Figure 3.8 Results of thresholding applied to an image. ............................................................... 51
Figure 3.9 Binary thresholding applied to a tiled surface ............................................................. 52
Figure 3.10 Parameters of a line ................................................................................................... 53
Figure 3.11 Plot of lines passing through a point ......................................................................... 53
Figure 3.12 Probability mapping of points in the image for line detection .................................. 54
Figure 3.13 Hough lines on an image of rectangle ....................................................................... 55
Figure 3.14 Line detection on a patterned surface ........................................................................ 55
Figure 3.15 Results of GF2T on concrete and tiled surfaces ........................................................ 56
Figure 3.16 Result of feature detection on a tiled floor based on GF2T and Hough lines ........... 57
Figure 3.17 Two-way optical flow ................................................................................................ 58
Figure 3.18 Translation error induced by tilts in the camera ........................................................ 68
Figure 3.19 Images of a tiled floor with tilt-free and tilted camera .............................................. 68
Figure 3.20 Possible grid selected on the tiled floor image .......................................................... 69
Figure 3.21 Feature points at the corners of the selection ............................................................ 69
Figure 3.22 Mapping from a tilted image to tilt-compensated image ........................................... 70
Figure 3.23 Example of grid shifting using the tilt compensation algorithm ............................... 73
Figure 3.24 Flow chart of the proposed 6DOF egomotion algorithm for rectangular patterned surface ................................................................................................................................... 74
Figure 4.1 Bumblebee stereo camera ............................................................................................ 84
Figure 4.2 First and last frame of uniformly translating rectangle ............................................... 86
Figure 4.3 Camera translations for the simulated case of uniformly translating rectangle .......... 87
Figure 4.4 Plot of estimated trajectory as compared to the true trajectory for the simulation of uniformly translating rectangle ............................................................................................. 88
Figure 4.5 Azimuthal rotation for the simulated case of uniformly translating rectangle ............ 89
Figure 4.6 Height of camera from planar surface for the simulated case of uniformly translating rectangle .............................................................................................................. 89
Figure 4.7 Frames of the simulated video of a uniformly rotating rectangle ............................... 90
Figure 4.8 Trajectory of the camera for the simulation of uniformly rotating rectangle .............. 90
Figure 4.9 Azimuthal rotation of the camera for the simulation of uniformly rotating rectangle ................................................................................................................................ 91
Figure 4.10 Frames of the simulated video of uniformly rotating rectangle ................................ 92
Figure 4.11 Trajectory for the simulation of camera when changing height ................................ 92
Figure 4.12 Plot of azimuthal rotation for the simulation of camera when changing height ....... 93
Figure 4.13 Plot of height estimate for the simulation of camera when changing height ............ 93
Figure 4.14 Comparison of trajectory estimation using Least Squares and Kalman filtering ...... 95
Figure 4.15 Setup of two cameras moving together on a floor surface ........................................ 96
Figure 4.16 Trajectories of two cameras moving together estimated using the 4DOF algorithm ............................................................................................................................... 97
Figure 4.17 Azimuthal rotations of two cameras moving together estimated using the 4DOF algorithm ............................................................................................................................... 98
Figure 4.18 Trajectories of the left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm .................................................................................................... 99
Figure 4.19 Azimuthal rotations of the left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm .................................................................................... 99
Figure 4.20 Scattered plot of back projection errors obtained for the iPhone camera using the 4DOF algorithm .................................................................................................................. 102
Figure 4.21 Scattered plot of back projection errors obtained for the Nexus camera using the 4DOF algorithm .................................................................................................................. 102
Figure 4.22 Scattered plot of back projection errors obtained for the Bumblebee camera using the 4DOF algorithm ............................................................................................................ 103
Figure 4.23 Scattered plot of back projection errors obtained for the Bumblebee camera based on the Kalman filter estimation of 4DOF algorithm ................................................. 104
Figure 4.24 Turntable used to move the camera in a circular motion ........................................ 105
Figure 4.25 Newmark RT-5 motorized rotatory stage ................................................................ 105
Figure 4.26 Newmark NSC-1 motion controller ........................................................................ 105
Figure 4.27 iPhone camera mounted on the turntable ................................................................ 106
Figure 4.28 Camera trajectory obtained for a circular motion using the 4DOF algorithm ........ 107
Figure 4.29 Azimuthal rotation obtained for circular camera motion using 4DOF algorithm ... 108
Figure 4.30 Result of tilt removal algorithm .............................................................................. 109
Figure 4.31 Result of tilt removal algorithm .............................................................................. 110
Figure 4.32 Setup of cameras mounted at different tilt angles on the cart ................................. 110
Figure 4.33 Trajectories of two cameras moving together obtained using the 6DOF algorithm for rectangular patterns ....................................................................................................... 111
Figure 4.34 Azimuthal rotations of two cameras moving together obtained using the 6DOF algorithm for rectangular patterns ....................................................................................... 112
Figure 4.35 Trajectories of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns ............................................................................ 113
Figure 4.36 Azimuthal rotations of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns ............................................................ 114
Figure 4.37 Scattered plot of back projection errors obtained for the iPhone camera using the 6DOF algorithm for rectangular patterns ............................................................................ 115
Figure 4.38 Scattered plot of back projection errors obtained for the Nexus camera using the 6DOF algorithm for rectangular patterns ............................................................................ 116
Figure 4.39 Scattered plot of back projection errors obtained for the Bumblebee camera using the 6DOF algorithm for rectangular patterns ...................................................................... 116
Figure 4.40 Scattered plot of back projection errors for the Bumblebee camera considering 6DOF estimation using Kalman filtering ............................................................................ 117
Figure 4.41 Camera mounted on the turntable at a certain tilt .................................................... 118
Figure 4.42 Camera trajectory obtained for a circular motion using the 6DOF algorithm for rectangular patterns ............................................................................................................. 118
Figure 4.43 Azimuthal rotation obtained for a circular motion using the 6DOF algorithm for rectangular patterns ............................................................................................................. 119
Figure 4.44 Trajectory for long range motion of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................................. 121
Figure 4.45 Azimuthal rotation for long range motion of stereoscopic camera obtained using the 6DOF algorithm ............................................................................................................ 122
Figure 4.46 Trajectories of the two sensors of stereoscopic camera obtained using the 6DOF algorithm ............................................................................................................................. 123
Figure 4.47 Tilts in x direction for the sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................................. 124
Figure 4.48 Tilts in y direction for the sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................................. 124
Figure 4.49 Azimuthal rotations of the sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................................. 125
Figure 4.50 Camera trajectory obtained for a circular motion using the 6DOF algorithm ........ 126
Figure 4.51 Azimuthal rotation obtained for a circular camera motion using the 6DOF algorithm ............................................................................................................................. 126
Figure 4.52 Trajectory obtained for circular motion on a patterned surface using the 6DOF algorithm ............................................................................................................. 128
Figure 4.53 Trajectory obtained for circular motion on a concrete surface using the 6DOF algorithm ............................................................................................................................. 128
Figure 4.54 Azimuthal rotation obtained for circular motion on a patterned surface using the 6DOF algorithm .................................................................................................................. 130
Figure 4.55 Azimuthal rotation obtained for circular motion on a concrete surface using the 6DOF algorithm .................................................................................................................. 130
Figure 5.1 Example of planar structure not repeating its pattern ................................................ 136
List of Symbols, Abbreviations and Nomenclature
AOA Angle of Arrival
AGPS Assisted Global Positioning System
CV Computer Vision
DOF Degrees of Freedom
FOV Field of View
GF2T Good Features to Track
GNSS Global Navigation Satellite System
GPS Global Positioning System
LOS Line-of-Sight
MMSE Minimum Mean Square Error
RANSAC Random Sample Consensus
RFID Radio Frequency Identification
RMS Root Mean Square
RSS Received Signal Strength
SNR Signal-to-Noise Ratio
SLAM Simultaneous Localization and Mapping
SVD Singular Value Decomposition
TDOA Time Difference of Arrival
TOA Time of Arrival
UHF Ultra High Frequency
UWB Ultra Wideband
WLAN Wireless Local Area Network
Chapter One: Introduction
1.1 Introduction to Navigation
Navigation has become an integral part of our lives. Information provided by navigation
services, such as our current location, the time to a destination, and possible routes, saves a
considerable amount of time and effort in our busy schedules. There are several available Global
Navigation Satellite System (GNSS) based technologies that use wireless signalling to perform
navigation. For instance, the Global Positioning System (GPS) is widely used to provide
location-based services to users across the globe. The European GALILEO and Russian
GLONASS are also becoming functional to provide these services. While these technologies are
effective for positioning in outdoor environments, their performance is quite unsatisfactory when
used indoors. Many indoor positioning applications require sub-meter accuracy to be practical.
Being subject to low signal-to-noise ratio (SNR) and multipath distortions, wireless signals are
unable to meet these requirements [1].
It takes several seconds for a standard GPS receiver to acquire the satellites. In addition, the
initial acquisition requires a clear view of the sky and high signal strength [2]. In indoor
environments and urban canyons, where the view of the sky is obstructed and the received signal
is weak, the acquisition time is extended significantly. Poor signal strength not only increases the
time to acquire a satellite but also makes it difficult to decode the navigation data from the
satellite [3].
The direct unobstructed signals from the satellites are referred to as line-of-sight (LOS) signals.
Due to free space loss, LOS signals already have a very low SNR [1]. Multipath in outdoor
environments results in reflected signals that are weaker than the LOS signals. However, in
indoor environments and urban canyons, the reflected signals may be stronger than the LOS
signals. In such environments, in order to acquire the weak LOS signals, it is important to
remove the strong reflected signals first [3]. Figure 1.1 shows the multipath scenario in urban
canyons, where orange lines represent the LOS signals and green lines represent the reflected
signals.
Figure 1.1 Multipath scenario in urban canyon
In a standard GPS receiver, no prior information about the satellites is available at the time of
acquisition. Assisted GPS (AGPS) was developed to overcome the conventional GPS problems
of excessive acquisition time and poor performance in weak SNR conditions [1-5]. In AGPS, the
handset's wireless network provides information about the GPS signal that the handset will
receive [6]. This not only reduces the acquisition time, but also enables the detection of signals
having low SNR. However, in order to use the
services of AGPS, it is important that the GPS device is always connected to a cellular network
[7].
1.2 Existing Indoor Navigation Techniques
There are various indoor positioning techniques available, for instance, triangulation, scene
analysis (location fingerprinting) and proximity devices [8]. These techniques use the existing
wireless technologies like Wireless Local Area Networks (WLAN), Radio Frequency
Identification (RFID), Ultra Wideband (UWB), Bluetooth and Ultra High Frequency (UHF) for
positioning. In triangulation, the distances from three different access points are used to estimate
the 2D position of the navigating object, as shown in Figure 1.2. Distance measurements can be
based on various metrics like time of arrival (TOA), time difference of arrival (TDOA), angle of
arrival (AOA) and received signal strength (RSS) [8-9].
Figure 1.2 Positioning based on triangulation
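The range-based 2D position fix described above reduces to a small least squares problem once the squared range equations are differenced. The sketch below is illustrative only (the function name and anchor coordinates are made up, not from this thesis):

```python
import numpy as np

def trilaterate_2d(anchors, ranges):
    """Estimate a 2D position from ranges to 3 or more anchors.

    Subtracting the first squared-range equation from the others
    linearizes the problem into A p = b, solved by least squares.
    """
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    # From ||p - a_i||^2 = r_i^2 minus ||p - a_0||^2 = r_0^2:
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (ranges[0] ** 2 - ranges[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1)
         - np.sum(anchors[0] ** 2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Three access points and noiseless ranges to the point (2, 3)
aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true = np.array([2.0, 3.0])
r = [np.linalg.norm(true - np.array(a)) for a in aps]
print(trilaterate_2d(aps, r))  # close to [2, 3]
```

With noisy TOA or RSS derived ranges, stacking more than three anchors into the same system gives a least squares fix rather than an exact intersection.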
Scene analysis, also known as location fingerprinting, is based on RSS measurements and is
divided into two stages: offline stage and online stage. In the offline stage, the features or the
fingerprints of the surrounding are gathered based on the RSS measurements from the access
nodes. In the online stage, the measured fingerprints are matched against the a priori collected
fingerprints to estimate the location [8][10].
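A toy sketch of the two stages might look as follows; every coordinate and RSS value below is made up purely for illustration:

```python
import numpy as np

# Offline stage: RSS fingerprints (dBm) from three access points,
# surveyed at four known spots. All values are hypothetical.
fingerprints = {
    (0.0, 0.0): [-40, -70, -75],
    (5.0, 0.0): [-70, -42, -72],
    (0.0, 5.0): [-72, -71, -41],
    (5.0, 5.0): [-68, -60, -58],
}

def locate(rss_online):
    """Online stage: return the surveyed spot whose stored fingerprint
    is nearest (in Euclidean RSS distance) to the live measurement."""
    rss_online = np.asarray(rss_online, dtype=float)
    return min(fingerprints,
               key=lambda spot: np.linalg.norm(
                   np.asarray(fingerprints[spot], dtype=float) - rss_online))

print(locate([-69, -44, -70]))  # nearest stored fingerprint: (5.0, 0.0)
```

Practical systems replace the single nearest neighbour with k-nearest-neighbour averaging or probabilistic matching, but the offline/online split is the same.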
The proximity algorithm looks for the antenna in whose vicinity the navigating object lies. If
there is more than one such antenna, it selects the one with the strongest received signal. Based
on this information, it provides the relative position of the navigating object with respect to that
antenna.
Although these techniques provide solutions to the indoor navigation problem, they are more
suitable for open areas. Moreover, they require dedicated positioning infrastructure that is
expensive and time-consuming to deploy. The demand for high accuracy indoor positioning has
therefore led to the development of new algorithms for trajectory estimation. Due to their high
performance and low cost, computer vision (CV) based sensors have gained enormous interest in
the field of indoor navigation.
1.3 Integration with Computer Vision
CV based sensors and algorithms provide highly accurate trajectory estimation in the presence of
anchor node based landmark features. When such features are unavailable, stationary features of
opportunity are used in Simultaneous Localization and Mapping (SLAM) to support trajectory
estimation. While CV observables provide high accuracy for short trajectories, the estimation
drifts as the distance increases. As a result, CV observables complement the GNSS based
observations, which are more accurate over large distances. Hence, any smartphone having both
a camera and GPS allows for the integration of CV based trajectory estimation and GNSS based
navigation.
There are many techniques available that use cameras to estimate the egomotion or the self-
motion of the camera. As an example, [11-12] present a method to estimate the egomotion by
tracking lines in the images. [13-14] perform feature correspondence and random sample
consensus (RANSAC) based egomotion estimation. An algorithm to find the camera motion
based on scale invariant image features is presented in [15], while [16] makes use of stereo
vision with an iterative closest point scheme to estimate egomotion. [17] presents a method to
estimate the
trajectory utilizing the structure of the environment.
This thesis presents a robust 6 degrees of freedom (DOF) egomotion estimation of a camera
directed at a planar surface, for instance a floor, ceiling or wall. The surface may be a concrete
surface or a highly patterned surface, as shown in Figure 1.3. It will be shown that while
capturing a concrete unpatterned surface, only random features of opportunity are available; as a
result, the trajectory estimate drifts over distance. If the camera is instead facing a patterned
surface, it can utilize the additional information about the structure of the pattern to improve the
performance of the trajectory estimation algorithm.
(a) Concrete floor with random features of opportunity
(b) Tiled floor with patterned features of opportunity
Figure 1.3 Examples of planar surfaces
Any well-defined patterns on floors, ceilings or walls, which provide more structural
information, may be used to improve the accuracy of trajectory estimation. Random feature
points on concrete result in potential difficulties in detecting feature point correspondences,
while on patterned surfaces, the intersections of the grout lines provide much higher SNR,
resulting in higher trajectory estimation accuracy. Some other examples of patterns on planar
surfaces are shown in Figure 1.4.
Figure 1.4 Examples of patterned surfaces
A camera directed at a planar surface can undergo a perspective transformation, that is, it might
undergo rotation and translation. 6DOF estimation deals with estimating the translations along,
and the rotations about, the three perpendicular axes. The rotation in the xy plane is referred to as
the azimuthal rotation, while the rotations in the yz and xz planes are referred to as the tilts, since
they result from tilting the camera away from its optical axis.
During its motion, the camera picks up some features of opportunity on the planar surface and
the perspective transformation of the features determines the motion of the camera. Detection of
tilts in the camera from the estimated rotation matrix is generally an ill-posed CV problem. A
camera platform generally has a MEMS accelerometer and rate gyro that could be used to
estimate the camera tilts. However, a complication here is that the accelerometer is unable to
distinguish between the gravity vector and receiver acceleration relative to an inertial frame. The
proposed CV algorithm instead determines the tilts in the camera, with reference to the planar
surface, using the homography of the features of opportunity. The structure of the patterned
surfaces enhances the SNR such that robust tilt detection is possible. The novelty of this research
lies in the use of patterned surfaces to improve the accuracy of trajectory estimation using CV
observables.
1.4 Objectives
The primary hypothesis of this research is that accurate 6DOF trajectory estimation can be
performed with minimal processing effort if the observed features are planar and have some
regular structure. In order to address this hypothesis, the major objectives of this research can
be summarized as follows:
1. Extraction of rotation and translation vectors from the perspective transformation that the
features of opportunity undergo in the consecutive frames. The estimation of rotation and
translation from the transformation will be based on Least Squares [18] or Kalman Filter [19-
20] estimations, depending upon the information about the motion that is available to us.
2. Comparison of Least Squares and Kalman Filter estimation of the camera motion.
3. Estimation and compensation of tilts in the camera from the transformation of features of
opportunity.
4. Estimation of 6 DOF trajectory of the camera from the motion of observed planar features of
opportunity in the consecutive frames.
5. Verifying if the structure of the surface improves the trajectory estimation by providing a
comparison of trajectory estimation in case of concrete and regular patterned surfaces.
1.5 Contributions
A novel 6DOF algorithm, which is partitioned into 2DOF and 4DOF estimations, is proposed for
rectangular patterned surfaces. The 2DOF algorithm estimates and compensates the absolute tilts
in the camera while the 4DOF algorithm determines the camera translations and azimuthal
rotation. A general 6DOF algorithm that determines the differential translations, azimuthal
rotation and tilts in the camera facing any planar surface is presented. This algorithm is also
partitioned into a sequence of 2DOF and 4DOF estimations. The 2DOF estimation compensates
the differential tilts while the 4DOF estimation determines the relative rotation and translations
between two camera positions. It is shown that the use of patterns on planar surfaces can
improve the performance of trajectory estimation of the camera. Drifts in the estimated trajectory
can be reduced in the case of patterned surfaces as opposed to concrete surfaces. The 6DOF
egomotion of the camera can be determined from the motion of the feature points on the planar
surfaces. A
paper titled “Indoor Navigation based on Computer Vision utilizing Information from Patterned
Surfaces” based on this concept was presented and will be published in the proceedings of the
ION (Institute of Navigation) GNSS+ conference held in September 2014 in Tampa, Florida.
1.6 Organization
In this chapter, we have discussed the basic concepts of navigation and problems with using
wireless signalling for indoor navigation. Some techniques used for indoor navigation are
discussed and positioning using CV observables is introduced. The rest of the thesis is organized
as follows:
• In Chapter 2, the necessary definitions and algorithms to understand the proposed algorithm
for 6DOF camera egomotion are developed. The pinhole model of the camera for image
formation is explained. An introduction to the different transformations that the images can
undergo is provided, followed by the concept of feature extraction and correspondence.
• Chapter 3 explains the proposed algorithm for trajectory estimation. The process of camera
calibration using a chessboard pattern and image pre-processing techniques are explained in
this chapter. A robust 4DOF algorithm for extraction of motion based on Least Squares and
Kalman filtering is provided. A 6DOF algorithm based on 2DOF tilt compensation and
4DOF trajectory estimation is proposed for rectangular patterned surfaces. Finally, a 6DOF
egomotion estimation algorithm based on the motion of features of opportunities is provided
for any general planar surface.
• Chapter 4 provides the experimental verification of the proposed algorithms. Various
experiments are performed: the algorithms are first verified using a stereo camera, and then a
back projection method is used to verify them.
Finally, the comparison of estimated trajectory is provided against the true trajectory. Results
of trajectory estimation using Least Squares are compared against those using Kalman
filtering, depending on the information available regarding the motion of the camera. A
comparison is provided for trajectory estimation on a patterned and concrete surface.
• Finally, Chapter 5 concludes the thesis and provides some suggestions for future work.
Chapter Two: Background
Navigation finds applications in various indoor facilities like airports, hospitals, shopping centers
and malls. Various GNSS based technologies, being subject to multipath distortions and low
SNR, are unsuitable for use in indoor environments. Hence, in this research, we use cameras to
perform navigation in indoor environments. A camera observes the motion of objects that fall in
its field of view (FOV) and based on that information, it estimates its own motion, known as the
ego-motion, with respect to the surroundings.
In order to understand the ego-motion of the camera, it is important to understand the concept of
reference frames used to reach the trajectory estimation algorithm. An object in the FOV of the
camera is mapped from the world reference frame onto the camera reference frame, which is
further mapped onto the image plane of the camera to obtain the image of the object. Based on
the mapping of object obtained on the camera image plane, transformation of an object in two
consecutive frames is determined. The translation and rotation of the object in two consecutive
frames is obtained from this transformation. The estimated translation and rotation vectors are
hence used to determine the motion of the camera in two consecutive frames.
This chapter introduces some necessary definitions important to understand the proposed
algorithm. Firstly, the geometric model is introduced and types of transformations are defined.
Determination of transformation between two consecutive frames of the video sequence is done
on the basis of motion of some points, known as feature points, on the image frame. So, after
explaining the transformations, the concept of feature points extraction is explained and an
algorithm to obtain the feature points is introduced. The final section presents an algorithm to
find the correspondence of feature points in two consecutive frames, so that they can be used to
obtain the underlying transformation.
2.1 The Geometric Model
We will consider a pinhole camera model to understand the geometry behind image formation
[21]. An image is the representation of visual perception on a two-dimensional matrix. Hence,
image formation involves the mapping of a 3D point in the world frame onto the 2D image frame
of the camera. To understand this mapping, we will first consider the projection of the point from
the world frame to the 3D camera frame, which will then be mapped onto the 2D camera image
plane. Some definitions and notations to understand the concept are given in Table 2.1.
Table 2.1 Definitions to understand the geometric model

OC              Camera origin in the world reference frame
OW              World origin in the world reference frame
{XW, YW, ZW}    Directional unit vectors of the world reference frame
{XC, YC, ZC}    Directional unit vectors of the camera reference frame
{x, y}          Unit vectors of the camera image plane
P               Source point in a generic coordinate system (world or camera)
PW              Position vector from the world origin to P
PC              Position vector from the camera origin to P

A left-hand coordinate system is used for both camera and world coordinates.
Source point, P, is referenced in the world coordinate frame as PW, denoted in vector form as

PW = [xw, yw, zw]^T    (2.1)

such that PW = xw XW + yw YW + zw ZW.

Similarly, P is referenced as PC in the camera coordinate frame, where

PC = [xc, yc, zc]^T    (2.2)

such that PC = xc XC + yc YC + zc ZC.
Figure 2.1 shows the imaging model for a pinhole camera. The image of a point, P, is formed
where the ray passing through P and the camera optical center intersects the image plane [22].
Note that the distance of the image plane from the camera optical center is referred to as the focal
length, denoted by f.
Figure 2.1 Imaging model for pinhole camera [22]
Based on this pinhole model, using similar triangles, we obtain

x = −f xc/zc    (2.3)

y = −f yc/zc    (2.4)
The negative signs indicate that the image appears to be upside down in the image plane. This
effect can be overcome by placing the image plane in front of the optical axis, as shown in
Figure 2.2.
Figure 2.2 Frontal imaging model for pinhole camera
Using this updated model, we obtain

x = f xc/zc    (2.5)

y = f yc/zc    (2.6)
The coordinates {xc , yc , zc} are the homogeneous projection coordinates while {x, y} are the
non-homogeneous coordinates.
The overall geometric model considering a pinhole camera is shown in Figure 2.3. The 3D
mapping from the world frame to the camera frame is based on rotation and translation operators,
represented by R and T, respectively. Directions of both R and T can be referenced with respect
to world or camera frames.
Figure 2.3 Projection of 3D point on the camera image plane
Considering translation followed by rotation and reference to the world coordinate frame, the
translation vector is given as

T = OC − OW    (2.7)

and the rotation matrix is given by the projection of the unit vectors of {XW, YW, ZW} onto
{XC, YC, ZC} as

R = [XW·XC  YW·XC  ZW·XC;  XW·YC  YW·YC  ZW·YC;  XW·ZC  YW·ZC  ZW·ZC]    (2.8)
Based on these operators, the projective transformation from world reference frame to the
camera reference frame is given as [23]
PC = R(PW −T ) (2.9)
The non-homogeneous coordinates of projection can be obtained using equations (2.5) and (2.6).
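Equations (2.9) and (2.5)-(2.6) can be chained into a single projection step. The short sketch below is illustrative only; the rotation, camera position and focal length are made-up values, not from this thesis:

```python
import numpy as np

def project_point(P_w, R, T, f):
    """Map a world point onto the image plane of a pinhole camera:
    P_c = R (P_w - T)  (eq. 2.9), then
    x = f * xc / zc,  y = f * yc / zc  (eqs. 2.5-2.6)."""
    P_c = R @ (np.asarray(P_w, dtype=float) - np.asarray(T, dtype=float))
    return f * P_c[0] / P_c[2], f * P_c[1] / P_c[2]

# Illustrative setup: no rotation, camera 2 units behind the z = 0 plane
R = np.eye(3)
T = np.array([0.0, 0.0, -2.0])
x, y = project_point([1.0, 1.0, 0.0], R, T, f=1.0)
print(x, y)  # 0.5 0.5
```

For points on a planar surface (zw = 0), repeated application of this mapping is exactly what the perspective transformation matrix H of Section 2.2.2 summarizes.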
It is important to note that the above projective transformation relation was obtained by
considering that the world coordinate frame is first translated and then rotated to obtain the
camera coordinate frame. Had it been the other way around, the transformation would change
accordingly.
2.2 Transformations
By undergoing scaling, skewing, rotation and translation, an image is transformed or mapped
onto another image. While undergoing transformation, if an arbitrary parallelogram is mapped
onto another parallelogram, then the transformation is said to be an affine transformation. The
mapping of any quadrilateral to any other quadrilateral is referred to as the perspective
transformation. So, it can be said that any affine transformation is a perspective transformation
but not every perspective transformation is an affine transformation.
2.2.1 Affine Transformation
An affine transformation between a set of images is the result of image translation, scaling or
azimuthal rotation. An example of an affine transformed image resulting from translation,
azimuthal rotation and scaling is shown in Figure 2.4.
Figure 2.4 An example of an affine transformed image
Affine transformations can be visualized as a parallelogram ABCD in plane mapped onto
another parallelogram PQRS , as shown in Figure 2.5.
Figure 2.5 Affine transformation
Let {(xA, yA), (xB, yB), (xC, yC), (xD, yD)} represent the vertices of parallelogram ABCD and
{(xP, yP), (xQ, yQ), (xR, yR), (xS, yS)} represent the vertices of parallelogram PQRS. The affine
transformation between the corresponding points (xA, yA) and (xP, yP) is given by the
following equation:
[xP, yP]^T = [a  b  c;  d  e  f] [xA, yA, 1]^T    (2.10)
The matrix with entries {a, b, c, d, e, f} is termed the affine transformation matrix between the
two images, which is to be determined here.
Similarly, writing the affine transformation for other corresponding set of points on the two
parallelograms and rearranging the equations, we obtain the following equations for determining
the affine transformation variables:
[xP, xQ, xR, xS]^T = [xA yA 1; xB yB 1; xC yC 1; xD yD 1] [a, b, c]^T    (2.11)

[yP, yQ, yR, yS]^T = [xA yA 1; xB yB 1; xC yC 1; xD yD 1] [d, e, f]^T    (2.12)
Since each of the above equations has only 3 unknowns, only 3 of the 4 corners of the
parallelogram are needed. Considering the first three pairs of points, we obtain
[xP, xQ, xR]^T = [xA yA 1; xB yB 1; xC yC 1] [a, b, c]^T    (2.13)

[yP, yQ, yR]^T = [xA yA 1; xB yB 1; xC yC 1] [d, e, f]^T    (2.14)
By solving (2.13) and (2.14), we obtain the affine transformation between the two images.
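As a sanity check of equations (2.13)-(2.14), the two 3-by-3 systems can be solved directly. The point values below are illustrative; OpenCV's cv2.getAffineTransform performs the same computation:

```python
import numpy as np

def affine_from_3_points(src, dst):
    """Solve eqs. (2.13)-(2.14): recover the 2x3 affine matrix
    mapping three source points onto three destination points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    G = np.hstack([src, np.ones((3, 1))])   # rows of the form [x y 1]
    abc = np.linalg.solve(G, dst[:, 0])     # eq. (2.13)
    def_ = np.linalg.solve(G, dst[:, 1])    # eq. (2.14)
    return np.vstack([abc, def_])

# Pure translation by (2, 3) as an easily checked example
src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (3, 3), (2, 4)]
M = affine_from_3_points(src, dst)
print(M)  # [[1. 0. 2.], [0. 1. 3.]]
```

The solve fails only when the three source points are collinear, in which case G is singular and the affine transformation is not uniquely determined.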
2.2.2 Perspective Transformation
Unlike affine transformations, perspective transformations are more general as they map any
quadrilateral onto any other quadrilateral with arbitrary scaling, rotation, translation and
skewing. This is because they account for the tilting of the camera while capturing the images,
which is not considered in affine transforms. An example of images that underwent a perspective
transformation is shown in Figure 2.6.
Figure 2.6 An example of perspective transformation
Considering the rotation matrix, R, to be represented as a matrix of row vectors,

R = [R1; R2; R3]    (2.15)

we can write equation (2.9) as

PC = [R1; R2; R3] (PW − T)    (2.16)
By substituting PC and PW, equation (2.16) can be written as

[xc, yc, zc]^T = [R1, −R1·T;  R2, −R2·T;  R3, −R3·T] [xw, yw, zw, 1]^T    (2.17)
For a planar surface, like a floor, ceiling or wall, we have zw = 0.
Hence, we can write

[xc, yc, zc]^T = H [xw, yw, 1]^T    (2.18)

where H is a perspective transformation matrix, which can be defined as

H = [H11 H12 H13; H21 H22 H23; H31 H32 H33]    (2.19)
The objective here is to determine the elements of H. Using the pinhole camera model, from
(2.5) and (2.6), we have

x = f xc/zc = f (H11 xw + H12 yw + H13) / (H31 xw + H32 yw + H33)    (2.20)

y = f yc/zc = f (H21 xw + H22 yw + H23) / (H31 xw + H32 yw + H33)    (2.21)
On rearranging, we obtain

H31 x xw + H32 x yw + H33 x = H11 f xw + H12 f yw + H13 f    (2.22)

H31 y xw + H32 y yw + H33 y = H21 f xw + H22 f yw + H23 f    (2.23)

This can be expressed as

ux b = 0    (2.24)

uy b = 0    (2.25)

where

b = [H11, H12, H13, H21, H22, H23, H31, H32, H33]^T    (2.26)

ux = [−f xw, −f yw, −f, 0, 0, 0, x xw, x yw, x]    (2.27)

uy = [0, 0, 0, −f xw, −f yw, −f, y xw, y yw, y]    (2.28)
For a set of 4 points on the quadrilateral, we have 8 equations, which can be written as

[ux1; uy1; … ; ux4; uy4] b = 0    (2.29)

Defining

U = [ux1; uy1; … ; ux4; uy4]    (2.30)

we have U b = 0, where 0 is the 8-element zero vector.
While we have 8 constraints here, the number of variables in b is 9. The homogeneous solution
of (2.29) can therefore be obtained by performing the singular value decomposition (SVD) of U:
the right singular vector corresponding to the zero singular value gives the values of b. It is
important to note that this singular vector is arbitrarily scalable, which means that H is obtained
only to within a scaling factor. Hence, a set of 4 feature points is sufficient to determine the
perspective warping matrix. In practice, however, the feature point correspondences are noisy
and distorted, so additional feature points are usually needed to determine the warping matrix
reliably.
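The SVD-based recovery of b can be sketched as follows. The correspondences below are synthetic (a pure one-unit translation of the world plane, with f = 1), chosen only so the recovered H is easy to check:

```python
import numpy as np

def homography_svd(world_pts, image_pts, f=1.0):
    """Stack the ux, uy rows of eqs. (2.27)-(2.28) for each
    correspondence and take the right singular vector of U belonging
    to the smallest singular value, i.e. the null-space solution of
    U b = 0 in eqs. (2.29)-(2.30)."""
    rows = []
    for (xw, yw), (x, y) in zip(world_pts, image_pts):
        rows.append([-f*xw, -f*yw, -f, 0, 0, 0, x*xw, x*yw, x])
        rows.append([0, 0, 0, -f*xw, -f*yw, -f, y*xw, y*yw, y])
    U = np.array(rows, dtype=float)
    _, _, Vt = np.linalg.svd(U)
    H = Vt[-1].reshape(3, 3)  # singular values come sorted descending
    return H / H[2, 2]        # fix the arbitrary scale factor

# Synthetic check: the image is the world plane shifted by one unit in x,
# so the true H is [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
world = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [(1, 0), (2, 0), (2, 1), (1, 1)]
H = homography_svd(world, image)
print(np.round(H, 6))
```

With noisy correspondences, more than four points are stacked into U and the same smallest-singular-vector solution becomes the least squares estimate of b.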
2.3 Feature points
A feature point is a small sub-region of the image intensity field with structure that makes it
resolvable in two orthogonal directions, such that it is suitable for tracking from one frame to the
next. For instance, an edge between two corners is ambiguous along one of the directions and
hence is not suitable for use as a feature point. Similarly, a sub-region homogeneous in both
directions contains no useful feature points. A corner point, however, is clearly a suitable feature
point since displacement in two orthogonal directions can easily be resolved using it. Figure 2.7
provides a detailed illustration of suitable and unsuitable feature points.
Figure 2.7 Illustration of suitable and unsuitable feature points
Deviation of a corner from 90° results in feature point quality degradation. Hence, feature points
at corners deviating from 90°, as shown in Figure 2.8, will perform worse than 90° corners.
Points on circular arcs, for instance circles or ellipses, provide poor quality feature points since
the gradient along the tangent is very low; hence they are ambiguous in one direction, as shown
in Figure 2.9.
Figure 2.8 Wedge corners deviating from 90° providing low quality feature points
Figure 2.9 Poor quality feature points at circular arcs
For an image with intensity function I(x, y), let Ix(x, y) and Iy(x, y) represent the partial
derivatives of the intensity function along the x and y directions respectively, given as

Ix(x, y) = ∂I(x, y)/∂x    (2.31)

Iy(x, y) = ∂I(x, y)/∂y    (2.32)
Now consider a small sub-region window W of the image. The partial derivatives are found
within the boundaries of the window for each position of the window on the image. If Ix(x, y)
and Iy(x, y) are small throughout W, the intensity function is featureless in the sub-region. A
second possibility is that |Ix(x, y)| and |Iy(x, y)| are moderately high, indicating the presence of a
potential feature. However, a high correlation between Ix(x, y) and Iy(x, y) implies single
dimensionality, hence the presence of an edge, which is not suitable for tracking. The third
possible situation is that Ix(x, y) and Iy(x, y) are reasonably high and not highly correlated,
indicating the presence of a suitable two-dimensional feature.
Hence, the presence of a suitable feature point depends on the covariance of the random
functions Ix(x, y) and Iy(x, y) within the sub-region window. Assuming that the window W has
Nx pixels in the x direction and Ny pixels in the y direction, let us define the functions Ixx, Ixy
and Iyy as follows:

Ixx = Ew[Ix(x, y)²] = (1/(Nx Ny)) Σ_{i=0}^{Nx−1} Σ_{j=0}^{Ny−1} Ix(i, j)²    (2.33)

Ixy = Ew[Ix(x, y) Iy(x, y)] = (1/(Nx Ny)) Σ_{i=0}^{Nx−1} Σ_{j=0}^{Ny−1} Ix(i, j) Iy(i, j)    (2.34)

Iyy = Ew[Iy(x, y)²] = (1/(Nx Ny)) Σ_{i=0}^{Nx−1} Σ_{j=0}^{Ny−1} Iy(i, j)²    (2.35)
Here W has pixels indexed as 0 ≤ i < Nx and 0 ≤ j < Ny .
The covariance of the intensity gradient functions Ix(x, y) and Iy(x, y) averaged over W results
in the Q matrix, given as

Q = [Ixx  Ixy;  Ixy  Iyy]    (2.36)
Next, we determine the eigenvalues of Q, denoted λ1 and λ2 with λ1 < λ2. Both are real and
non-negative since Q is symmetric and positive semidefinite. One high eigenvalue indicates
single dimensionality, hence the presence of an edge, which is unsuitable for use as a feature
point. If both eigenvalues are high, a usable feature point is present within the window. Two low
eigenvalues are indicative of a featureless region.
Based on the eigenvalues of the Q matrix, there are various criteria for feature point detection
[24-25]. In this research, we have used Shi and Tomasi's good features to track (GF2T) [26] for
feature detection. Its criterion is that the smaller eigenvalue being greater than a minimum
threshold indicates the presence of a good feature point.
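The minimum-eigenvalue score can be sketched directly from equation (2.36) using central-difference gradients. This brute-force version is for illustration only; OpenCV's cv2.goodFeaturesToTrack adds thresholding and non-maximum suppression on top of the same score:

```python
import numpy as np

def shi_tomasi_score(I, win=3):
    """Smaller eigenvalue of the gradient covariance matrix Q
    (eq. 2.36) for every win x win window: the GF2T quality score."""
    I = I.astype(float)
    Ix = np.zeros_like(I)
    Iy = np.zeros_like(I)
    Ix[:, 1:-1] = I[:, 2:] - I[:, :-2]   # central differences in x
    Iy[1:-1, :] = I[2:, :] - I[:-2, :]   # central differences in y
    h, w = I.shape
    r = win // 2
    score = np.zeros_like(I)
    for i in range(r, h - r):
        for j in range(r, w - r):
            gx = Ix[i-r:i+r+1, j-r:j+r+1].ravel()
            gy = Iy[i-r:i+r+1, j-r:j+r+1].ravel()
            Q = np.array([[gx @ gx, gx @ gy],
                          [gx @ gy, gy @ gy]]) / win**2
            score[i, j] = np.linalg.eigvalsh(Q)[0]  # smaller eigenvalue
    return score

# A white square on a black background: only its corners should score well
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
s = shi_tomasi_score(img)
i, j = np.unravel_index(np.argmax(s), s.shape)
print((i, j))  # lands at one of the square's corners
```

Along the square's edges one eigenvalue is large and the other is zero, so only the four corners receive a non-trivial score, matching the discussion above.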
2.3.1 Examples of feature detection
We consider simple 90° corners, as shown in Figure 2.10, for corner detection based on the
eigenvalues of Q.
(a) Side view of features
(b) Top view of features
Figure 2.10 Corner feature points
Considering the partial derivatives of the intensity field given as

Ix(x, y) = I(x+1, y) − I(x−1, y)    (2.37)

Iy(x, y) = I(x, y+1) − I(x, y−1)    (2.38)
the derivative images are shown in Figure 2.11.
(a) Derivative in x
(b) Derivative in y
Figure 2.11 Derivative images for corner features
The Q matrix is obtained using a 3× 3 window and its eigenvalues are calculated. Figure 2.12
and Figure 2.13 show the plots for the larger and smaller eigenvalues of Q respectively, which
will be used to find the corner points of the given intensity field.
Figure 2.12 Plot of the larger eigenvalues of Q for 90° features
Figure 2.13 Plot of the smaller eigenvalues of Q for 90° features
From Figure 2.12 and Figure 2.13, it can be seen that at the edges of the rectangle, one of the
eigenvalues is large while the other one is very small because of high correlation between the
directional derivatives. Hence, as per the feature point detection algorithm, the edges cannot be
used as suitable features. It can be observed from these figures that it is only at the corners of the
rectangle that both the eigenvalues of Q are large. As a result, corners will be detected as quality
feature points.
Next, we show some examples of feature detection based on the GF2T routine of OpenCV.
Figure 2.14 shows some geometric shapes to which the routine was applied to obtain the feature
points. As can be seen from the figure, the rectangle, star and line segment yield correct corner
detection. However, the corners of the ellipse, circle and rounded-corner rectangle are not
accurately identified. For the shapes whose corners are not properly identified, this indicates that
the feature points are of poor quality and hence not suitable for tracking purposes.
Figure 2.14 Corner detection of simple geometric shapes
2.4 Optical Flow
Having obtained the feature points in an image, the next step is to determine the correspondence
of feature points in the two images, that is, how feature points in one image relate to those in
another image. The purpose of finding this correspondence is to obtain the motion of an object
through a set of video frames, also known as optical flow.
Optical flow algorithms assume that the feature points are almost time invariant and rely on the
following conditions for accurate optical flow estimation [27]:
1. Brightness Consistency: The brightness of pixels remains consistent between two
consecutive frames.
2. Temporal Persistence: The feature points move in very small increments between two
consecutive frames of the video.
3. Spatial Coherence: Neighbouring points on the image belong to the same surface and
have similar motion.
To understand the optical flow of feature points in two consecutive frames, we first consider a
one-dimensional intensity field I(x, t) at time t. From the condition of time invariance, we obtain

dI(x, t)/dt = (∂I/∂x)(dx/dt) + ∂I/∂t = 0    (2.39)

From here we obtain the flow velocity as

vx = dx/dt = −(∂I/∂t)/(∂I/∂x) = −It/Ix    (2.40)
Now consider a general two-dimensional intensity function I(x, y, t) at time t. By taking the total
derivative, we obtain

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0    (2.41)

Ix vx + Iy vy + It = 0    (2.42)
where vx and vy are the velocities of optical flow in x and y directions respectively. Here we
have two unknowns, which cannot be resolved using one space-time observation. Hence we need
at least two linearly independent equations to solve for vx and vy . Typically, a window is used in
the neighbourhood of the feature, resulting in a set of over-determined equations for the
estimation of vx and vy . The best fit for vx and vy is obtained by using Least Squares
estimation.
Equation (2.42) can be written as

[Ix  Iy] [vx, vy]^T = −It    (2.43)
For K points in the window, indexed as k ∈ [1, 2, ..., K], we have

[Ix,1 Iy,1; Ix,2 Iy,2; … ; Ix,K Iy,K] [vx, vy]^T = −[It,1, It,2, …, It,K]^T    (2.44)
The second subscript in the above equation indicates the pixel position in the window where the
spatial and time derivatives are taken. Now, let

A = [Ix,1 Iy,1; Ix,2 Iy,2; … ; Ix,K Iy,K]    (2.45)

M = −[It,1, It,2, …, It,K]^T    (2.46)

and

P = [vx, vy]^T    (2.47)

where A is called the design matrix, M the measurement matrix and P the parameter matrix.
The square error for the form M = AP is given as

e = (M − AP)^T (M − AP)    (2.48)

For least squares estimation of the parameter matrix, we want to minimize the square error e,
which requires

∂e/∂P = 0    (2.49)

Hence we obtain

∂e/∂P = (∂/∂P)[(M − AP)^T (M − AP)] = −2(M − AP)^T A = 0    (2.50)

which gives

−M^T A + P^T A^T A = 0

P^T A^T A = M^T A

Multiplying on the right by (A^T A)^{−1} and transposing, we obtain

P = (A^T A)^{−1} (A^T M)    (2.51)

Hence, we obtain the optical flow in the x and y directions for a specified sub-region by using
Least Squares estimation.
As an example, consider a Gaussian pulse propagating in time with a velocity equivalent to 3
points in x direction and 2 points in y direction per 0.05 sec. The plot for the Gaussian pulse at
time t and t+dt is shown in Figure 2.15. Figure 2.16 shows the top view of the two pulses. The
spatial derivatives of the pulse in x and y directions are shown in Figure 2.17. Figure 2.18 shows
the time derivative of the pulse.
Using the time and spatial derivatives of the Gaussian pulse and using least squares, we obtain
the optical flow velocity in x and y directions to be 2.9 and 1.9 respectively, which are close to
the actual velocities of 3 and 2 in x and y, respectively.
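This worked example can be reproduced numerically. The sketch below is an illustrative reconstruction, not the code used in this thesis: the grid size, pulse width and window are assumed, and velocities are expressed in grid points per frame.

```python
import numpy as np

# Least-squares optical flow of (2.44)-(2.51) applied to a translating
# Gaussian pulse; grid size, pulse width and the true velocities are
# illustrative assumptions, not values taken from the thesis.
n = 101
x, y = np.meshgrid(np.arange(n), np.arange(n))
vx_true, vy_true = 3.0, 2.0                 # displacement in grid points per frame

def gaussian(cx, cy, sigma=8.0):
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

I0 = gaussian(50, 50)                       # pulse at time t
I1 = gaussian(50 + vx_true, 50 + vy_true)   # pulse at time t + dt

# Central-difference spatial derivatives (3.17) and temporal derivative
Ix = (np.roll(I0, -1, axis=1) - np.roll(I0, 1, axis=1)) / 2.0
Iy = (np.roll(I0, -1, axis=0) - np.roll(I0, 1, axis=0)) / 2.0
It = I1 - I0

# Stack every pixel of a window around the pulse into the form M = A P (2.44)
w = slice(30, 71)
A = np.column_stack([Ix[w, w].ravel(), Iy[w, w].ravel()])
M = -It[w, w].ravel()
vx, vy = np.linalg.lstsq(A, M, rcond=None)[0]   # P = (A^T A)^-1 A^T M  (2.51)
print(vx, vy)
```

As in the thesis's own example, the estimate comes out slightly below the true displacement, because the brightness-constancy linearization only approximates a finite shift.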
Figure 2.15 Side view of Gaussian pulse at time t and t+dt ((a) pulse at time t; (b) pulse at time t+dt)

Figure 2.16 Top view of Gaussian pulse at time t and t+dt ((a) pulse at time t; (b) pulse at time t+dt)

Figure 2.17 Spatial derivative of the Gaussian pulse in x and y directions ((a) derivative in x; (b) derivative in y)

Figure 2.18 Time derivative of the Gaussian pulse
Based on this concept, there are various algorithms to find the optical flow of an image. In this
research, we have employed the Lucas Kanade Pyramid (LKP) [28][29] algorithm to determine
the optical flow. This algorithm uses an image pyramid to establish the correspondence of feature
points in a set of images. The initial images are first smoothed and decimated to obtain smaller
images. Further smoothing and decimation is applied to the resulting images such that we obtain
a pyramid of images, as shown in Figure 2.19. The correspondence is first determined at the
topmost level of the pyramid. The next level is then shifted by the displacement vector obtained
at the first level and again correspondence is established. The displacement at the third level is
determined by the sum of the displacements at the first two levels, and the process is repeated
until the bottom level is reached.
Figure 2.19 Pyramid structure of images in Lucas Kanade Pyramid algorithm
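The smoothing-and-decimation loop described above can be sketched as follows. The 5-tap binomial smoothing kernel and the image size are assumptions made for illustration; they are not details taken from this thesis.

```python
import numpy as np

# Sketch of the image pyramid used by the Lucas Kanade Pyramid algorithm:
# each level is a smoothed and 2x-decimated copy of the level below it.
# The 5-tap binomial kernel [1 4 6 4 1]/16 is an assumed (but common) choice.
def smooth(img):
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    # separable filtering with edge replication, rows first then columns
    pad = np.pad(img, 2, mode='edge')
    rows = sum(k[i] * pad[:, i:i + img.shape[1]] for i in range(5))
    cols = sum(k[i] * rows[i:i + img.shape[0], :] for i in range(5))
    return cols

def build_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(smooth(pyr[-1])[::2, ::2])   # smooth, then decimate by 2
    return pyr

base = np.random.rand(64, 64)
pyr = build_pyramid(base, 4)
print([p.shape for p in pyr])   # four levels: 64x64 down to 8x8
```

The correspondence search itself then runs from the coarsest level downward, warping each level by the accumulated displacement as described above.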
2.4.1 Example of optical flow using Lucas Kanade Pyramid
We reconsider the example of a Gaussian pulse propagating in time. Let the optical flow velocity
in x be equivalent to 1 point and that in y be equivalent to 2 points. We consider a 4 level Lucas
Kanade Pyramid to find the optical flow for this pulse. Figure 2.20 shows the original pulse at
the bottommost level and the decimated pulse at the topmost level of a level 4 pyramid.
(a) Pulse at bottommost level
(b) Decimated pulse at the topmost level
Figure 2.20 Plot of Gaussian pulses at different levels of pyramid
The optical flow velocity is calculated between the pulses at time t and t+dt at their topmost
level. The velocity at this level for this example is found to be 0.4 and 0.9 in the x and y directions
respectively, that is, vx = 0.4 and vy = 0.9. Now, the next level of the pyramid at time t+dt is shifted
by a displacement vector of vx·dt in the x direction and vy·dt in the y direction and compared to the
second level at time t. Figure 2.21 shows a close view of the pulse contour at time t and the shifted
pulse at time t+dt at the second level.
Similarly, the pulse at the third level will be shifted by a value equal to the cumulative
displacement vector of the first two levels. This process is repeated until we reach the
bottommost level. In this case, the velocity calculated at the final level is equal to vx = 1 and
vy = 2, which corresponds to the actual optical flow velocity.
(a) Pulse at time t
(b) Pulse at time t+dt, shifted by the displacement vector
Figure 2.21 Contour of Gaussian pulse at the second level of pyramid
It is important to note that in the Lucas Kanade Pyramid algorithm, for a correct estimate of the
flow, the displacement between two frames should be less than the minimum distance between two
feature points. Also, as stated earlier, the brightness of the pixels should not change between two
consecutive frames.
Chapter Three: Proposed Algorithm
Estimation of the 6DOF egomotion of a camera consists of estimating the translation and rotation
along the three perpendicular coordinate axes. The sequence of camera images is taken as the
input and a set of corresponding feature points is obtained on the image frames. Based on the
transformation of feature points from frame to frame, the underlying trajectory of the camera is
estimated. Before performing the trajectory estimation based on the motion of feature points, the
camera is calibrated in order to compensate for the lens distortion and to establish the camera
intrinsic matrix, which specifies the scaling factor and optical center of the camera. After
calibrating the camera, the images captured by the camera are pre-processed to remove noise and
to prepare them in a form in which they can be used for better extraction of features. Finally, based
on the transformation of feature points from one frame to the next, different algorithms to
estimate the trajectory of the camera are proposed.
3.1 Camera Calibration
In chapter 2, the concept of mapping of a point from the world reference frame to the camera
image plane was introduced. The mapping deals with the estimation of only the extrinsic
parameters of the camera involving the rotation matrix and the translation vector. However, there
are intrinsic parameters of the camera, particularly the focal length and the principal point, and
distortion parameters that need to be determined. The intrinsic and distortion parameters of the
camera remain constant with time. Hence, they are estimated prior to using a camera for the
estimation of extrinsic parameters. The determination of intrinsic and distortion parameters is
done using camera calibration.
3.1.1 Intrinsic Camera Parameters
As shown in chapter 2, with the assumption of a pinhole camera model, any 3D point in a plane in
the FOV of the camera can be projected onto a 2D point on the image plane of the camera. For a
point P, the relationship between the point observed in the camera reference frame and that
observed in the camera image plane is given by equations (2.5) and (2.6), which can be rewritten
as
[x; y] = (f/zc) [xc; yc]     (3.1)
This can be represented as
zc [x; y; 1] = [f 0 0; 0 f 0; 0 0 1] [xc; yc; zc]     (3.2)
It is important to note that when we do the transformation from a world coordinate frame to the
camera coordinate frame, the parameters are in terms of metric units, that is, millimeters. The
metric units need to be scaled to pixels. Let sx and sy be the scales of this conversion. Hence,
(3.2) can be changed to
zc [x; y; 1] = [sx 0 0; 0 sy 0; 0 0 1] [f 0 0; 0 f 0; 0 0 1] [xc; yc; zc]     (3.3)
All the pixels in an image are specified with respect to the top left corner, indicated as (0, 0),
whereas (x, y) are still specified with respect to the principal point, that is, the point where the
camera optical axis intersects with the image plane [22]. Hence, two new parameters, ox and oy,
are introduced to shift the origin of the image reference frame from the principal point to the top
left corner.

x' = x + ox
y' = y + oy     (3.4)
Also, we replace zc by an arbitrary scalar λ. Hence, we obtain

λ [x'; y'; 1] = [sx·f 0 ox; 0 sy·f oy; 0 0 1] [xc; yc; zc]     (3.5)
We can write sx·f = fx and sy·f = fy to obtain

λ [x'; y'; 1] = [fx 0 ox; 0 fy oy; 0 0 1] [xc; yc; zc]     (3.6)
The above 3× 3 matrix is called the Intrinsic Camera Matrix.
Hence, from (2.18) and (3.6), the homography can be represented as [27]

λ [x'; y'; 1] = H_int · H [xw; yw; zw; 1]     (3.7)

where H_int is the intrinsic camera matrix and H is the extrinsic matrix.
Hence, determination of the intrinsic camera matrix using camera calibration is the estimation of
the parameters {fx, fy, ox, oy}.
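The projection of (3.6) can be sketched as a small routine. The focal lengths and principal point below are hypothetical values for illustration, not the calibrated parameters reported later in Table 3.1.

```python
import numpy as np

# Sketch of the pinhole projection with an intrinsic matrix as in (3.6).
# fx, fy, ox, oy are assumed values, not a real calibration result.
fx, fy, ox, oy = 1000.0, 1000.0, 320.0, 240.0
H_int = np.array([[fx, 0.0, ox],
                  [0.0, fy, oy],
                  [0.0, 0.0, 1.0]])

def project(p_cam):
    """Map a 3D point in the camera frame to pixel coordinates (x', y')."""
    uvw = H_int @ p_cam        # homogeneous image point, scaled by lambda = z_c
    return uvw[:2] / uvw[2]    # divide out lambda

print(project(np.array([0.1, -0.05, 2.0])))   # -> [370. 215.]
```

Note that a point on the optical axis maps exactly to the principal point (ox, oy), which is the role those two parameters play in (3.4)-(3.6).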
3.1.2 Distortion Parameters
Due to manufacturing defects, there are various kinds of distortions in a camera lens. The two
main kinds of distortions in the lens are radial distortions and tangential distortions. These
distortions can be represented by approximate interpolative models, which make these distortions
invertible.
Radial distortion refers to the image deformation along the radial direction from a point known
as the center of distortion [30]. It causes inward or outward bulging of the image, resulting in
pincushion or barrel effect, as shown in Figure 3.1. Radial distortion is not seen at the image
center but it increases as we move away from the center [27].
Let (xd, yd) represent the coordinates of a distorted image point and (xu, yu) represent the
coordinates of the corresponding undistorted or corrected image point. The scaling of the points
between the distorted and undistorted images is given by the radial distortion model equation
described in [31] as

xu = xd (1 + k1·rd^2 + k2·rd^4)
yu = yd (1 + k1·rd^2 + k2·rd^4)     (3.8)

where k1 and k2 are the radial distortion coefficients and rd refers to the radial distance given as

rd = sqrt(xd^2 + yd^2)     (3.9)
(a) Original Image
(b) Barrel distortion
(c) Pincushion distortion
Figure 3.1 Effects of radial distortion
The second common form of distortion is tangential distortion, which is caused by
manufacturing defects resulting in the lens not being parallel to the image plane. The model for
tangential distortion is given by [27]

xu = xd + [2·p1·yd + p2·(rd^2 + 2·xd^2)]
yu = yd + [p1·(rd^2 + 2·yd^2) + 2·p2·xd]     (3.10)

where p1 and p2 are the tangential distortion coefficients.
Hence, the estimation of the distortion coefficients using camera calibration is the estimation of
the parameters {k1, k2, p1, p2}.
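The two correction models can be sketched together as follows. The coefficient values are made up for illustration, and applying the radial scaling and the tangential terms in a single step is a modeling choice of this sketch rather than a detail taken from the thesis.

```python
import numpy as np

# Sketch of the radial (3.8) and tangential (3.10) correction models applied
# to a distorted point. k1, k2, p1, p2 are assumed values; real values come
# from camera calibration.
k1, k2 = -0.25, 0.07          # radial coefficients (assumed)
p1, p2 = 0.001, -0.0005       # tangential coefficients (assumed)

def undistort_point(xd, yd):
    rd2 = xd ** 2 + yd ** 2                     # rd^2 from (3.9)
    radial = 1.0 + k1 * rd2 + k2 * rd2 ** 2     # radial scaling of (3.8)
    # tangential terms as written in (3.10)
    dx = 2.0 * p1 * yd + p2 * (rd2 + 2.0 * xd ** 2)
    dy = p1 * (rd2 + 2.0 * yd ** 2) + 2.0 * p2 * xd
    return xd * radial + dx, yd * radial + dy

xu, yu = undistort_point(0.3, -0.2)
print(xu, yu)
```

Consistent with the text above, a point at the center of distortion (0, 0) is left unchanged, and the correction grows with the radial distance.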
3.1.3 Calibration and distortion mitigation
We have employed the 2D plane based calibration described in [32] to estimate the intrinsic and
distortion parameters of the camera. In this calibration, we have used the checkerboard pattern as
the 2D planar surface, which is viewed at different orientations, as shown in Figure 3.2 as an
example. The images of the checkerboard at various orientations are taken using the camera to be
calibrated.
Figure 3.2 Images of different orientations of checkerboard captured using a camera
The main purpose of estimating the distortion coefficients is to be able to invert the distortion in
an image to compensate for the effects of radial and tangential distortions. Based on the code
available in [33], the camera calibration was performed using the images of a checkerboard
pattern captured by a Bumblebee stereo camera. As a result of calibration, the obtained intrinsic
parameters of this camera are shown in Table 3.1. Using the models for radial and tangential
distortions mentioned above, the un-distortion of a sequence of images captured by this camera
was performed. The result of the un-distortion performed on the image of a tiled floor is shown
in Figure 3.3.
Table 3.1 Intrinsic parameters of the Bumblebee stereo camera
Parameter   Value (pixels)
fx          1976.9
fy          1957.5
ox          308.6
oy          145.6
(a) Original captured image
(b) Image obtained after applying lens distortion compensation
Figure 3.3 Undistortion of the image of a tiled floor
3.2 Image Pre-processing
Pre-processing refers to the process required to prepare an image so that it can be used for further
analysis of feature detection, correspondence and trajectory estimation [34]. Pre-processing is
done using kernel based operations, which are accomplished using 2D correlation operations.
The source image is acted upon by a rectangle shaped kernel operator to obtain the destination
image. For each window position, the content of the kernel is correlated with the image content
within the boundaries of the window and the result is stored at the image pixel point coincident
with the kernel point denoted as the anchor point. For example, as shown in Figure 3.4, for a
square shaped kernel with the anchor point at the center, the result of the correlation operation will
be stored at the pixel in the destination image which is coincident with the center of the kernel.
Figure 3.4 Kernel based image processing
Consider a kernel H with a kernel window of width nx and height ny. Let a and b represent
the anchor point index of the kernel in the x and y directions respectively, and let h(i, j) be the
value of the kernel at pixel point (i, j). The kernel is represented in matrix form as

H = [h(0,0) h(1,0) ... h(nx−1,0); h(0,1) h(1,1) ... h(nx−1,1); ... ; h(0,ny−1) h(1,ny−1) ... h(nx−1,ny−1)]     (3.11)
Let Isrc(i, j) represent the pixel intensity of the source image at (i, j) and Idst(i, j) represent the
intensity of the destination image at pixel location (i, j).
The expression for a 2D correlation of the source image with the kernel is given as

Idst(n, m) = Σ_{i=0}^{nx−1} Σ_{j=0}^{ny−1} h(i, j) · Isrc(n + i − a, m + j − b)     (3.12)
An immediate problem with sliding correlation is that there are times when the indices of the
source image move out of the support domain. In such cases, the approach that we follow is the
extension of the boundary intensity values to the outside of the image such that the
correlation is defined for all indices of the source image. For (Nx, Ny) being the size of the source
image, the boundary extension can be expressed numerically as

Isrc(i, j) = Isrc(0, j)        for i < 0
Isrc(i, j) = Isrc(Nx − 1, j)   for i ≥ Nx
Isrc(i, j) = Isrc(i, 0)        for j < 0
Isrc(i, j) = Isrc(i, Ny − 1)   for j ≥ Ny     (3.13)
The problem with any boundary extension is that we are creating a feature that is not a part of the
original image. These artifacts are minimized by taking care not to infer image features at
boundary for eventual egomotion observables.
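The correlation of (3.12) together with the boundary extension of (3.13) can be sketched directly, with out-of-range indices clamped to the nearest boundary pixel. The kernel and test image below are illustrative.

```python
import numpy as np

# Sketch of the kernel correlation (3.12) with the boundary extension (3.13):
# out-of-range source indices are clamped to the nearest boundary pixel.
def correlate2d(src, kernel, anchor):
    ny, nx = kernel.shape          # kernel height and width
    a, b = anchor                  # anchor point index (a in x, b in y)
    Ny, Nx = src.shape
    dst = np.zeros_like(src, dtype=float)
    for m in range(Ny):
        for n in range(Nx):
            s = 0.0
            for j in range(ny):
                for i in range(nx):
                    # boundary extension of (3.13) via index clamping
                    ii = min(max(n + i - a, 0), Nx - 1)
                    jj = min(max(m + j - b, 0), Ny - 1)
                    s += kernel[j, i] * src[jj, ii]
            dst[m, n] = s
    return dst

img = np.arange(16.0).reshape(4, 4)
box = np.ones((3, 3)) / 9.0                  # simple averaging kernel
out = correlate2d(img, box, anchor=(1, 1))
print(out)
```

Because of the normalization of the kernel and the boundary replication, a constant image passes through unchanged, so the extension introduces no artificial features for flat regions; structured content near the border is still affected, as noted above.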
3.2.1 Gaussian Smoothing
Smoothing is a low pass filtering operation that is used to suppress the higher spatial
frequencies of the image, which are generally corrupted by noise. Smoothing is also used to
decrease the image resolution, as in the case of the Lucas Kanade Pyramid [27]. There are various
smoothing operations available for blurring an image by low pass filtering, for instance
simple blur, Gaussian blur, median blur, etc. In this research, we have used Gaussian blur for
image smoothing. Gaussian blurring or filtering is done by applying a Gaussian kernel to the
source array to obtain the result, which is stored in the destination array. The anchor point in the
case of Gaussian smoothing is always the center point of the 2D kernel. The kernel weights the
source image by a weighted average in which the center pixel has the highest weight and the
weights decrease away from the center according to the Gaussian distribution. The 2D Gaussian
kernel can be represented mathematically as
h(i, j) = C · exp{ −(i − (nx−1)/2)^2 / (2σx^2) } · exp{ −(j − (ny−1)/2)^2 / (2σy^2) }     (3.14)
where σ x and σ y represent the standard deviation of the Gaussian kernel in the x and y
directions, respectively. A plot of the Gaussian smoothing kernel is shown in Figure 3.5.
Figure 3.5 Plot of Gaussian filter kernel
Figure 3.6 shows some results of Gaussian smoothing with different window sizes applied
to an image. It is important to be careful with the parameters of Gaussian smoothing since,
besides removing noise from the image, it also blurs the edges.
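The kernel of (3.14) can be constructed as below. The window size and standard deviations are illustrative, and the constant C is chosen here so that the weights sum to one, which preserves the mean image brightness; the thesis does not specify C, so this normalization is an assumption.

```python
import numpy as np

# Sketch of the 2D Gaussian kernel of (3.14); nx, ny, sigma_x and sigma_y
# are illustrative parameters. C is set so the weights sum to one.
def gaussian_kernel(nx, ny, sigma_x, sigma_y):
    i = np.arange(nx) - (nx - 1) / 2.0       # offsets from the anchor in x
    j = np.arange(ny) - (ny - 1) / 2.0       # offsets from the anchor in y
    gx = np.exp(-i ** 2 / (2.0 * sigma_x ** 2))
    gy = np.exp(-j ** 2 / (2.0 * sigma_y ** 2))
    k = np.outer(gy, gx)                     # separable product, rows = y
    return k / k.sum()                       # normalization constant C

k = gaussian_kernel(5, 5, sigma_x=1.0, sigma_y=1.0)
print(k.round(3))
```

The separable structure of (3.14) is what the code exploits: the 2D kernel is the outer product of two 1D Gaussians, which is also why Gaussian filtering can be applied as two cheap 1D passes.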
Figure 3.6 Results of Gaussian filtering. (a) shows the original image; (b), (c) and (d) show the
results of Gaussian filtering applied with filter kernels of size 13x13, 17x17 and 21x21,
respectively.
3.2.2 Edge Detection
In this research, particularly in the case of patterned surfaces, a significant amount of information
is extracted from the edges of the image. Edge detection is therefore an important step in image
pre-processing to gather information about the structure of the surface. Among the various
available edge detection operators, like the Sobel edge detector [35], Canny edge detector [36],
Roberts Cross operator [37] and Prewitt operator [38], the most common is the Canny edge
detector, which is the routine we have employed in this research. The reason for using the Canny
edge detector is that it is computationally very efficient and generally provides satisfactory
results in the experimental work. It is dependent on the kernel-based gradients of the image.
Hence, before describing Canny edge detection, we will briefly review kernel based
spatial derivative operations.
If we take the derivative of an image in a particular direction, we obtain its edges in one
particular direction. For instance, by taking the derivative in the y direction, we obtain all the
horizontal edges of the image. Similarly, the derivative in x direction provides us with the edges
in the vertical direction.
The expressions for a one-dimensional derivative of a discrete signal are generalized in the
following different ways:

Forward differencing:
v(n) ≈ du(n)/dn = u(n+1) − u(n)     (3.15)

Backward differencing:
v(n) ≈ du(n)/dn = u(n) − u(n−1)     (3.16)

Central differencing:
v(n) ≈ du(n)/dn = (u(n+1) − u(n−1)) / 2     (3.17)
Forward differencing advances the derivative and backward differencing delays the derivative.
However, central differencing provides an unbiased estimate of the derivative. Hence, based
on central differencing, the kernel for the gradient in x, used to obtain the derivative image Ix, is
given as [−1/2 0 1/2] with the anchor point at the center.

Similarly, the kernel for the gradient in y, used to obtain the derivative image Iy, is given by the
column kernel [−1/2; 0; 1/2] with the anchor point at the center.

Based on these spatial derivatives, the Canny edge detector computes the gradient magnitude Ie,
which is given as

Ie = sqrt(Ix^2 + Iy^2)     (3.18)
Ie is then compared against two thresholds, λhigh and λlow. Depending on how Ie at a particular
pixel compares to these thresholds, the corresponding binary output is assigned at that particular
pixel:

Idst(i, j) = 1 if Ie(i, j) > λhigh
Idst(i, j) = 0 if Ie(i, j) < λlow
Idst(i, j) = 1 if λlow < Ie(i, j) < λhigh and one of the neighbours is higher than λhigh
An example of Canny edge detection is shown in Figure 3.7.
(a) Original image
(b) Result of Canny edge detector applied to it
Figure 3.7 Result of Canny edge detection
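The gradient and double-threshold stages can be sketched as below. This is a simplified illustration rather than the full Canny routine: non-maximum suppression is omitted, only one neighbour-promotion pass is shown, and the thresholds and test image are assumed values.

```python
import numpy as np

# Simplified sketch of the thresholding stage of Canny edge detection:
# gradient magnitude (3.18) followed by the two-threshold hysteresis rule.
def canny_like(img, lo, hi):
    # central-difference gradients (3.17) with the [-1/2 0 1/2] kernels
    Ix = np.zeros_like(img); Iy = np.zeros_like(img)
    Ix[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    Iy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    Ie = np.sqrt(Ix ** 2 + Iy ** 2)

    strong = Ie > hi
    weak = (Ie > lo) & ~strong
    out = strong.copy()
    # promote weak pixels that touch a strong neighbour (one pass shown;
    # the full algorithm iterates until no more pixels are promoted)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= weak & np.roll(np.roll(strong, dy, 0), dx, 1)
    return out.astype(np.uint8)

img = np.zeros((8, 8)); img[:, 4:] = 10.0     # vertical step edge
edges = canny_like(img, lo=1.0, hi=4.0)
print(edges)
```

On the synthetic step edge, only the two columns adjacent to the intensity jump exceed λhigh, matching the expectation that the derivative in x lights up vertical edges.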
3.2.3 Thresholding
Thresholding refers to the technique of segmenting an image to extract the desired content
such that it is separated from undesirable pixels. It is done in such a way that each pixel is
either accepted or rejected depending on whether it falls above or below a predefined threshold.
The purpose of thresholding is to segment an image such that the pixels associated with
the object in the image appear above a threshold while the background appears below the
threshold. There are various methods that use image histograms to estimate the threshold of an
image [39-41].
Given the image and threshold value, thresholding can be performed in various ways, like binary
thresholding, binary inverse thresholding, threshold to zero, threshold to zero inverse and
threshold truncate. In binary thresholding, the pixels with intensity above the threshold are
assigned the maximum value while those below it are assigned zero. Binary inverse thresholding
works in a reverse fashion, where the intensities above the threshold are assigned zero and those
below it are assigned the maximum value. In threshold to zero, the pixels with intensity values
above the threshold remain unchanged while those below it are assigned a zero value, and vice-
versa in threshold to zero inverse. Finally, in threshold truncate, the pixel intensities above the
threshold are clipped to the threshold value while those below it remain unchanged. Figure 3.8
shows the results of various kinds of thresholding operations applied to an image.
(a) Original Image
(b) Binary Thresholding
(c) Binary Inverse Thresholding
(d) Threshold to Zero
(e) Threshold to Zero Inverse
(f) Threshold Truncate
Figure 3.8 Results of thresholding applied to an image.
In this research, we have used binary thresholding to highlight the structure of a patterned
surface for which the structure is not clearly defined, so that the information from the pattern can
be used to improve the accuracy of the algorithm. An example is shown in Figure 3.9. However,
thresholding itself is not reliable in isolating the lines in the image, since its performance is
dependent on a variety of factors like brightness and contrast of the tiles, etc.
(a) Original image
(b) Thresholded image
Figure 3.9 Binary thresholding applied to a tiled surface
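The five modes can be sketched as a single pixel-wise routine. The threshold, maximum value and test image are illustrative, and the truncate mode follows the common convention of clipping to the threshold value.

```python
import numpy as np

# Sketch of the five thresholding modes described above, applied pixel-wise.
# thresh and max_val are illustrative parameters.
def threshold(img, thresh, max_val, mode):
    above = img > thresh
    if mode == 'binary':        return np.where(above, max_val, 0)
    if mode == 'binary_inv':    return np.where(above, 0, max_val)
    if mode == 'to_zero':       return np.where(above, img, 0)
    if mode == 'to_zero_inv':   return np.where(above, 0, img)
    if mode == 'truncate':      return np.where(above, thresh, img)
    raise ValueError(mode)

img = np.array([[10, 120], [200, 60]])
print(threshold(img, 100, 255, 'binary'))
```

For the tiled-floor application above, only the binary mode is used: grout lines darker than the threshold map to zero while tile interiors map to the maximum value, which highlights the pattern structure.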
3.3 Hough Lines
For a patterned surface, it is necessary to find the position of lines of the pattern so that extra
information can be gathered from it. This is accomplished using Hough lines [42]. The Hough
lines algorithm is a standard algorithm that detects the lines in an image by exploiting the
parameters of the line [43]. Rather than using the slope-intercept parameters, it uses the angle-
radius parameters, which makes the computation simple [44].
Consider any line that is extended into an infinite line in the 2D plane. It can be represented by two
variables, ρ and θ, where ρ is the perpendicular distance from the origin to the line and θ is
the angle of the line, as shown in Figure 3.10.
Figure 3.10 Parameters of a line
In an edge detected image, for each point in the intensity field, the contour of all potential {ρ,θ}
combinations that the point can belong to is plotted. Figure 3.11 shows how a point may belong
to multiple lines. In the contour, each line is weighted by the intensity of the point. The plot is
then thresholded to find the {ρ,θ} of the line segment. The peak is a monotonic function of the
line length and the intensity.
Figure 3.11 Plot of lines passing through a point
For example, consider a line on an edge detected image as shown in Figure 3.12(a). For each
point that is lit up, a contour of the possible {ρ,θ} combinations is plotted. For instance, Figure
3.12(b) shows some example contours of four points on the line. The final probability map will
have peaks, which will correspond to the more likely line segments. Hence, we can extract the
line segments in the image by proper thresholding.
Figure 3.13 shows an example of lines detected in the image based on the Hough lines transform.
It is important to note that the lines can be detected either using edge detected or thresholded
image. Figure 3.14 shows the result of line detection on a patterned surface resulting from low
pass filtering, edge detection and then the Hough transform.
Figure 3.12 Probability mapping of points in the image for line detection. (a) A line in an edge
detected image, with 4 points selected on the line to show the contour mapping on the {ρ,θ} plot.
(b) Mapping of the contours of possible {ρ,θ} combinations for each point in the edge detected
image to find the peak.
Figure 3.13 Hough lines on an image of rectangle
(a) Original image
(b) Lines detected in the original image
Figure 3.14 Line detection on a patterned surface
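The voting scheme described above can be sketched with a minimal accumulator. The angle and radius resolutions are assumed, and no peak thresholding beyond a single argmax is shown.

```python
import numpy as np

# Minimal sketch of the Hough transform in (rho, theta) parameters: every
# edge pixel votes for all lines it could lie on, and peaks in the
# accumulator correspond to likely line segments.
def hough_lines(edge_img, n_theta=180):
    h, w = edge_img.shape
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((2 * rho_max, n_theta), dtype=int)
    ys, xs = np.nonzero(edge_img)
    for x, y in zip(xs, ys):
        # rho = x cos(theta) + y sin(theta) for every candidate angle
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + rho_max, np.arange(n_theta)] += 1
    return acc, thetas, rho_max

edges = np.zeros((50, 50), dtype=np.uint8)
edges[25, 5:45] = 1                         # horizontal line at y = 25
acc, thetas, rho_max = hough_lines(edges)
r, t = np.unravel_index(acc.argmax(), acc.shape)
print(r - rho_max, np.degrees(thetas[t]))   # -> 25 90.0
```

The accumulator peak is exactly the "monotonic function of the line length and the intensity" noted above: all 40 lit pixels vote into the single bin (ρ = 25, θ = 90°), while their votes at other angles spread across many ρ bins.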
3.4 Proposed 4DOF egomotion algorithm
Before considering the camera undergoing a full perspective transformation, we will consider the
case where the camera captures the planar surface such that the optical axis of the camera is
always perpendicular to the planar surface. That is, the camera can undergo translation in the x
and y directions, change in height (translation in z) and azimuthal rotation, but it cannot undergo
tilts in x and y. Hence, the motion of the camera will exhibit 4DOF.
The proposed algorithm to determine the 4DOF camera motion takes a pair of consecutive
images captured by the camera as input. A certain sequence of steps is carried out on each pair to
estimate the relative transformation between them. In each pair of consecutive images, we will
denote the image at time t-1 as the prior image and the one at time t as the post image.
To implement egomotion, the calibrated consecutive images are first gray-scaled and Gaussian
smoothing is applied to them. The next step is to extract the rotation and translation of the
camera based on the feature points observed. As mentioned in chapter 2, we have used Shi and
Tomasi’s GF2T [26] to detect the features of the prior image. Figure 3.15 shows the results of
the GF2T feature detector on a concrete and tiled floor surface.
(a) Features on a concrete floor
(b) Features on a tiled floor
Figure 3.15 Results of GF2T on concrete and tiled surfaces
For a patterned surface, we have used Hough lines along with GF2T to obtain the feature points
at the intersections of lines. The lines in the patterned surface add extra constraints, which help to
eliminate outlier feature points that are not a part of the pattern and hence obtain a rich set of
feature points. The result of feature detection based on GF2T and Hough lines for a tiled floor
surface is shown in Figure 3.16.
(a) Original image
(b) Feature detection on original image
Figure 3.16 Result of feature detection on a tiled floor based on GF2T and Hough lines
Having obtained the feature points in the prior image, we need to find the corresponding set of
feature points in the post image, which is done based on the optical flow algorithm employed by
Lucas Kanade Pyramid (LKP) [29]. We have employed a two-way optical flow algorithm [45] to
find the correspondence of feature points. In a two-way optical flow, the correspondence is first
established in the forward direction, that is, from prior image to post image. Then with the
corresponding points obtained in the post image, the correspondence is established in the
backward direction, that is, from post image to prior image. The correspondences that do not
match in the two directions are discarded as outliers. Let (xa, ya) represent the feature
points in the prior image obtained using GF2T and (xb, yb) represent the corresponding feature
points in the post image obtained using LKP. Then, using the points obtained in the post image,
we apply LKP in the backward direction and obtain the corresponding feature points in the prior
image. Let (xc, yc) represent the feature points in the prior image obtained as a result of the
backward optical flow. Only those points will be retained that satisfy the following equation [45]:

(xa − xc)^2 + (ya − yc)^2 < σ^2     (3.19)

The rest of the points will be discarded. As given by [45], we have taken σ = 0.2 pixels to
implement a two-way optical flow, since it gives reasonable results for practical
implementations. Figure 3.17 shows an example of a two-way optical flow. The green donuts
represent the points that were retained by the two-way optical flow while the red donuts denote
the points that were discarded as a result of the two-way optical flow since the correspondences
in two directions did not agree.
(a) Original image in prior frame
(b) Result of two way optical flow with post frame
Figure 3.17 Two-way optical flow
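Once the forward and backward point sets are available, the consistency test of (3.19) reduces to a simple distance check. The point coordinates below are synthetic stand-ins for GF2T/LKP output; σ = 0.2 pixels follows the value quoted above.

```python
import numpy as np

# Sketch of the two-way optical flow consistency check of (3.19): a feature
# is kept only if the backward-tracked point returns to within sigma of the
# original prior-image location.
sigma = 0.2                                   # pixels, as used in the thesis

def twoway_filter(pts_a, pts_c, sigma=sigma):
    """pts_a: features in the prior image; pts_c: backward-tracked points."""
    d2 = np.sum((pts_a - pts_c) ** 2, axis=1)
    return d2 < sigma ** 2                    # boolean mask of retained points

pts_a = np.array([[10.0, 10.0], [40.0, 25.0], [70.0, 50.0]])
pts_c = np.array([[10.1, 10.1], [40.05, 24.9], [72.0, 50.0]])  # last one drifts
keep = twoway_filter(pts_a, pts_c)
print(keep)   # -> [ True  True False]
```

The third point fails the check because its backward track lands 2 pixels away, mimicking the red (discarded) donuts of Figure 3.17.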
Depending on how the feature points have moved from one frame to the next, the underlying
transformation can be calculated. We have considered Least Squares [46] and Kalman filter
[46-47] estimation of the transformation matrix. The differential motion of the camera between
two frames can be estimated based on this transformation matrix. When the motion of the camera
and the covariance matrix of the measurement of the feature points are unknown, Least Squares
estimation is used for the transformation matrix. The Best Linear Unbiased Estimator [46] is used
when the covariance matrix of the measurements is known. However, when both the covariance
of the measurements and the dynamic model of the motion of the camera are known, Kalman
filtering is used as the motion estimator.
At time t, let the translation in the x and y directions between the world and camera frames of
reference be denoted by (Tx^t, Ty^t). Let the perpendicular distance from the optical center to the
planar surface be denoted as h^t and the counterclockwise azimuthal rotation be represented as
az^t. Let the kth feature point at time t be denoted as (xw,k^t, yw,k^t) in the world reference frame,
(xc,k^t, yc,k^t, zc,k^t) in the camera reference frame and (xk^t, yk^t) in the camera image plane.
Based on (2.9) we obtain

[xc,k^t; yc,k^t; zc,k^t] = [cos(az^t) sin(az^t) 0; −sin(az^t) cos(az^t) 0; 0 0 1] ( [xw,k^t; yw,k^t; 0] − [Tx^t; Ty^t; −h^t] )     (3.20)
From here we have zc,k^t = h^t. From (2.5) and (2.6), we obtain

[xk^t; yk^t] = (f/h^t) [cos(az^t) sin(az^t) −Tx^t'; −sin(az^t) cos(az^t) −Ty^t'] [xw,k^t; yw,k^t; 1]     (3.21)

where

Tx^t' = cos(az^t)·Tx^t + sin(az^t)·Ty^t
Ty^t' = −sin(az^t)·Tx^t + cos(az^t)·Ty^t     (3.22)
An important assumption made here is that the feature points in the world frame remain
stationary in time. Hence, we can remove the time index t from the world coordinates to obtain
[xk^t; yk^t] = (f/h^t) [cos(az^t) sin(az^t) −Tx^t'; −sin(az^t) cos(az^t) −Ty^t'] [xw,k; yw,k; 1]     (3.23)
Similarly, if (Tx^{t−1}, Ty^{t−1}) represents the translation between the world and camera reference
frames at time t−1, h^{t−1} represents the perpendicular distance from the optical center to the
planar surface and az^{t−1} represents the counterclockwise azimuthal rotation between the two
coordinate frames, the affine transformation between them is given as

[xk^{t−1}; yk^{t−1}] = (f/h^{t−1}) [cos(az^{t−1}) sin(az^{t−1}) −Tx^{t−1}'; −sin(az^{t−1}) cos(az^{t−1}) −Ty^{t−1}'] [xw,k; yw,k; 1]     (3.24)
(3.24)
From (3.23) and (3.24), we obtain the transformation between the feature points in two
consecutive frames as

[xk^t; yk^t] = (h^{t−1}/h^t) [cos(Δaz^t) sin(Δaz^t) −ΔTx^t; −sin(Δaz^t) cos(Δaz^t) −ΔTy^t] [xk^{t−1}; yk^{t−1}; 1]     (3.25)

where

Δaz^t = az^t − az^{t−1}
ΔTx^t = cos(az^t)·(Tx^t − Tx^{t−1}) + sin(az^t)·(Ty^t − Ty^{t−1})
ΔTy^t = −sin(az^t)·(Tx^t − Tx^{t−1}) + cos(az^t)·(Ty^t − Ty^{t−1})     (3.26)
Note that (3.25) can be viewed as the following relationship between the feature points in the two
frames:

p2 = R(p1 − T)     (3.27)

where p2 refers to the feature points in the post frame and p1 refers to the feature points in the
prior frame. R and T denote the relative rotation matrix and relative translation vector,
respectively.
For the sake of simplicity, let us write

(h^{t−1}/h^t)·cos(Δaz^t) = cΔ^t
(h^{t−1}/h^t)·sin(Δaz^t) = sΔ^t
(h^{t−1}/h^t)·ΔTx^t = ΔTx^t'
(h^{t−1}/h^t)·ΔTy^t = ΔTy^t'     (3.28)
Hence, equation (3.25) becomes

[xk^t; yk^t] = [cΔ^t sΔ^t −ΔTx^t'; −sΔ^t cΔ^t −ΔTy^t'] [xk^{t−1}; yk^{t−1}; 1]     (3.29)
This equation can be solved for the parameters {cΔ^t, sΔ^t, ΔTx^t', ΔTy^t'} using either Least
Squares or a Kalman filter.
3.4.1 Least Squares estimation
A Least Squares estimator provides an estimate of the differential rotation and translation of the
feature points from one frame to the next without assuming any dynamic model. (3.29) can be
rearranged in Least Squares notation as

[xk^t; yk^t] = [xk^{t−1} yk^{t−1} −1 0; yk^{t−1} −xk^{t−1} 0 −1] [cΔ^t; sΔ^t; ΔTx^t'; ΔTy^t']     (3.30)
For K quality feature points, we obtain an over-determined set of constraints for the
parameter vector {cΔ^t, sΔ^t, ΔTx^t', ΔTy^t'}. This set of constraints gives us the M = AP form,
which is solvable using Least Squares. Here M represents the measurement vector, A the system
matrix and P the parameter vector. Hence, for K feature points, we write (3.30) as
[x1^t; y1^t; ... ; xK^t; yK^t] = [x1^{t−1} y1^{t−1} −1 0; y1^{t−1} −x1^{t−1} 0 −1; ... ; xK^{t−1} yK^{t−1} −1 0; yK^{t−1} −xK^{t−1} 0 −1] [cΔ^t; sΔ^t; ΔTx^t'; ΔTy^t']     (3.31)

where the left-hand vector is M, the stacked matrix is A and the parameter vector is P.
The Least Squares solution for this form is given by [46]

P = (A^T A)^{−1} A^T M     (3.32)
Note that we solve for the 4 parameters {cΔ_t, sΔ_t, ΔTx_t', ΔTy_t'} instead of the 3 parameters
{Δaz_t, ΔTx_t', ΔTy_t'}. The former is preferred because the resulting Least Squares formulation
is linear and can be solved directly. This estimation provides only three parameters of camera
motion, namely the translations in x and y and the azimuthal rotation. The fourth parameter, the
height of the camera above the planar surface, will be estimated by imposing the constraint
between sin(Δaz_t) and cos(Δaz_t), as will be seen shortly.
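The stacked system (3.31) and the closed-form solution (3.32) are straightforward to implement.
The following NumPy sketch (the function name and array layout are illustrative choices, not code
from the thesis) builds A and M from K point correspondences and solves for P:

```python
import numpy as np

def estimate_4dof_ls(prev_pts, curr_pts):
    """Least Squares estimate of {c, s, dTx', dTy'} from K >= 2 feature
    correspondences, following the M = AP form of (3.30)-(3.32).
    prev_pts, curr_pts: (K, 2) arrays of feature coordinates."""
    K = prev_pts.shape[0]
    M = curr_pts.reshape(-1)            # [x_1^t, y_1^t, ..., x_K^t, y_K^t]
    A = np.zeros((2 * K, 4))
    A[0::2, 0] = prev_pts[:, 0]         # x-rows:  [x^{t-1}, y^{t-1}, -1, 0]
    A[0::2, 1] = prev_pts[:, 1]
    A[0::2, 2] = -1.0
    A[1::2, 0] = prev_pts[:, 1]         # y-rows:  [y^{t-1}, -x^{t-1}, 0, -1]
    A[1::2, 1] = -prev_pts[:, 0]
    A[1::2, 3] = -1.0
    # lstsq computes the same solution as P = (A^T A)^{-1} A^T M,
    # but in a numerically stable way.
    P, *_ = np.linalg.lstsq(A, M, rcond=None)
    return P                            # [c, s, dTx', dTy']
```

Since each feature contributes two rows, any K ≥ 2 (non-degenerate) points over-determine the
four parameters.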
3.4.2 Kalman Filter estimation
Kalman filter estimation assumes a statistical state-update model and is used when the process and
measurement covariance matrices are specified; otherwise, the Least Squares algorithm is used. To
solve (3.29) with a Kalman filter, the state vector, S_t, at time t is given as
    S_t = [cΔ_t, sΔ_t, ΔTx_t', ΔTy_t']^T    (3.33)
The observation vector, X_t, and the measurement matrix, H_t, are given as

    X_t = [x_1^t, y_1^t, …, x_K^t, y_K^t]^T    (3.34)

    H_t = [ x_1^{t−1}    y_1^{t−1}   −1    0
            y_1^{t−1}   −x_1^{t−1}    0   −1
                ⋮            ⋮        ⋮    ⋮
            x_K^{t−1}    y_K^{t−1}   −1    0
            y_K^{t−1}   −x_K^{t−1}    0   −1 ]    (3.35)
Let A be the transition matrix and Q and C be the process and measurement noise covariances
respectively. Let Ŝ_{t|t−1} be the estimate of the state vector at time t based on previous
measurements and Ŝ_{t|t} be the updated estimate after the correction. Let M_{t|t−1} represent the
predicted Minimum Mean Square Error (MMSE) matrix and M_{t|t} the corrected MMSE matrix, and let
K_t represent the Kalman gain at the t-th time instant. The state vector at a particular time
instant can be solved using the following recursive Kalman filter process stated in [46-47].
Prediction:

    Ŝ_{t|t−1} = A Ŝ_{t−1|t−1}    (3.36)

Prediction MMSE:

    M_{t|t−1} = A M_{t−1|t−1} A^T + Q    (3.37)

Kalman Gain:

    K_t = M_{t|t−1} H_t^T (C_t + H_t M_{t|t−1} H_t^T)^{−1}    (3.38)

Correction:

    Ŝ_{t|t} = Ŝ_{t|t−1} + K_t (X_t − H_t Ŝ_{t|t−1})    (3.39)

Corrected MMSE:

    M_{t|t} = (I − K_t H_t) M_{t|t−1}    (3.40)
The viability of Kalman filtering in this context depends on having a reasonably accurate state
update model, A, and good approximations of the covariance matrices Q and C. Q is based on the
known statistics of the camera trajectory. C can be estimated from the variance of the feature
points in the image, which depends on the quality of the camera as well as the lighting and the
contrast of the tiles.
Note from equation (3.29) that there are only two observables per time step per feature point,
but a number of feature points are observed in each frame. If a frame has too few feature points
available, the Kalman filter can still produce an estimate from the reduced set of observations,
whereas Least Squares requires enough points to determine the four parameters.
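One prediction/correction cycle of (3.36)-(3.40) can be sketched as follows (a minimal
illustration with assumed function and variable names; A, Q and C must be supplied per the
discussion above):

```python
import numpy as np

def kalman_step(S, M, X, H, A, Q, C):
    """One prediction/correction cycle of (3.36)-(3.40).
    S: (4,) state estimate; M: (4,4) its MMSE matrix;
    X: (2K,) observation vector; H: (2K,4) measurement matrix;
    A: (4,4) transition; Q: (4,4) process noise; C: (2K,2K) measurement noise."""
    S_pred = A @ S                                            # (3.36)
    M_pred = A @ M @ A.T + Q                                  # (3.37)
    Kg = M_pred @ H.T @ np.linalg.inv(C + H @ M_pred @ H.T)   # (3.38)
    S_new = S_pred + Kg @ (X - H @ S_pred)                    # (3.39)
    M_new = (np.eye(len(S)) - Kg @ H) @ M_pred                # (3.40)
    return S_new, M_new
```

With a diffuse prior (large M) and small measurement noise, a single step reduces to essentially
the Least Squares solution, which illustrates the connection between the two estimators.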
3.4.3 Estimation of camera motion from the transformation matrix
Having obtained the parameter vector {cΔ_t, sΔ_t, ΔTx_t', ΔTy_t'} from Least Squares or Kalman
filtering, the overall 4DOF egomotion of the camera can be obtained. Let az,t represent the total
counter-clockwise azimuthal rotation that the camera has undergone at time t starting from its
initial position:

    az,t = az,t−1 + tan^{−1}(sΔ_t / cΔ_t)    (3.41)
Let x_{a,t} and y_{a,t} represent the translations of the camera from the first frame in the x and
y directions at time t. They are given by

    [ x_{a,t} ]   [ x_{a,t−1} ]   [ Tx_t − Tx_{t−1} ]
    [ y_{a,t} ] = [ y_{a,t−1} ] + [ Ty_t − Ty_{t−1} ]    (3.42)
From (3.26), we have

    [ ΔTx_t ]   [  cos(az_t)   sin(az_t) ]  [ Tx_t − Tx_{t−1} ]
    [ ΔTy_t ] = [ −sin(az_t)   cos(az_t) ]  [ Ty_t − Ty_{t−1} ]    (3.43)
This can be written as

    [ Tx_t − Tx_{t−1} ]   [ cos(az_t)   −sin(az_t) ]  [ ΔTx_t ]
    [ Ty_t − Ty_{t−1} ] = [ sin(az_t)    cos(az_t) ]  [ ΔTy_t ]    (3.44)
Also, from (3.28), we have ΔTx_t = (h_t/h_{t−1}) ΔTx_t' and ΔTy_t = (h_t/h_{t−1}) ΔTy_t'.
Substituting, we obtain

    [ Tx_t − Tx_{t−1} ]                 [ cos(az_t)   −sin(az_t) ]  [ ΔTx_t' ]
    [ Ty_t − Ty_{t−1} ] = (h_t/h_{t−1}) [ sin(az_t)    cos(az_t) ]  [ ΔTy_t' ]    (3.45)
Hence, from (3.42) and (3.45), we obtain the translations of the camera as

    [ x_{a,t} ]   [ x_{a,t−1} ]                 [ cos(az_t)   −sin(az_t) ]  [ ΔTx_t' ]
    [ y_{a,t} ] = [ y_{a,t−1} ] + (h_t/h_{t−1}) [ sin(az_t)    cos(az_t) ]  [ ΔTy_t' ]    (3.46)
To obtain the height, h_t, of the camera at time t, we consider

    (cΔ_t)^2 + (sΔ_t)^2 = (h_{t−1}/h_t)^2 (cos^2(Δaz_t) + sin^2(Δaz_t))    (3.47)

Thus,

    sqrt((cΔ_t)^2 + (sΔ_t)^2) = h_{t−1}/h_t    (3.48)

Hence the height of the camera above the planar surface is given as

    h_t = h_{t−1} / sqrt((cΔ_t)^2 + (sΔ_t)^2)    (3.49)
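The per-frame accumulation of (3.41), (3.46) and (3.49) into a full 4DOF pose can be sketched as
follows (an illustrative sketch; the function name and trajectory container are assumptions, not
thesis code):

```python
import numpy as np

def accumulate_pose(params_seq, h0=1.0):
    """Accumulate the 4DOF camera pose from per-frame parameters
    (c, s, dTx', dTy'), following (3.41), (3.46) and (3.49).
    Returns a list of (x, y, h, az) tuples, one per frame."""
    az, h, x, y = 0.0, h0, 0.0, 0.0
    traj = [(x, y, h, az)]
    for c, s, dtx, dty in params_seq:
        az += np.arctan2(s, c)                       # (3.41)
        scale = 1.0 / np.hypot(c, s)                 # h_t / h_{t-1}, from (3.48)
        # (3.46): rotate the scaled differential translation into the
        # world frame and add it to the accumulated position.
        R = np.array([[np.cos(az), -np.sin(az)],
                      [np.sin(az),  np.cos(az)]])
        dx, dy = scale * (R @ np.array([dtx, dty]))
        x, y = x + dx, y + dy
        h *= scale                                   # (3.49)
        traj.append((x, y, h, az))
    return traj
```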
3.5 Proposed 6DOF algorithm for rectangular patterned surface
The 6DOF egomotion algorithm proposed in this section works for a camera directed at a rectangular
patterned surface while moving in any random fashion, and exploits the grid of grout lines
associated with the pattern. The egomotion is defined with respect to the camera center. The
desired parameters are the differential translation vectors, as these are the parameters that
ultimately define the trajectory. The tilt angles and the azimuthal angles are nuisance
parameters, but they must be estimated and removed for proper trajectory estimation.
In this proposed algorithm, the egomotion estimation is robustly partitioned into a sequence of
2DOF and 4DOF estimations. The 2DOF algorithm uses the sequence of raw camera images to estimate
and compensate the camera tilt angles such that the camera image plane becomes coplanar with the
planar surface. Tilt compensation adjusts the images, or projects them via a perspective mapping,
to effectively remove the tilt angles. The compensated feature points are then used to extract
the camera translation and azimuthal rotation with the 4DOF algorithm, which takes the sequence
of tilt-compensated images as input and uses Least Squares or Kalman filtering to estimate the
differential translation, azimuthal rotation and change in height between two consecutive image
frames, as described in Section 3.4.
The 6DOF algorithm is partitioned into 2DOF and 4DOF stages because the 2DOF stage compensates
for the camera tilts, which if left uncompensated cause the trajectory to drift very quickly,
while the 4DOF stage uses Least Squares or Kalman filter estimation to find the trajectory. The
partition therefore increases the robustness of the trajectory estimation as compared to
performing a complete 6DOF estimation directly.
A wheeled robot moving over a smooth surface perceives a simpler 2D environment as opposed to the
general 3D environment; consequently, its motion can be fully characterised by three degrees of
freedom. However, a person navigating with a handheld smart phone will invariably tilt it
significantly. Uncompensated tilt induces a translation error in the trajectory, as shown in
Figure 3.18. The trajectory estimation is quite sensitive to the camera tilt: if the tilt is not
compensated, the trajectory will drift off very quickly. Even if the tilt angles average out over
the whole trajectory, azimuthal rotation interspersed with non-zero tilt angles will cause the
overall egomotion estimate to drift, since rotation transformations do not commute.
Figure 3.18 Translation error induced by tilts in the camera
The proposed algorithm exploits the significant structure of the uniformly patterned surfaces to
map it to a tilt-compensated surface. A grid on the rectangular patterned surface is selected from
the first captured frame and the constraints on the grid are used to map this grid onto a tilt-
compensated grid such that the whole frame is transformed to an image that appears to be like
the one captured by a camera with no tilt. Figure 3.19 shows an example of how a tiled floor
would appear when captured from a camera with and without tilt.
(a) Image of tiled floor with tilt-free camera
(b) Image of tiled floor with tilted camera
Figure 3.19 Images of a tiled floor with tilt-free and tilted camera
To implement tilt removal and compensation, we find the feature points at the intersections of
the pattern lines in the gray-scaled, Gaussian-smoothed first captured frame by employing Hough
lines and GF2T. Based on the feature points obtained, a grid of tiles with an equal number of
tiles in the two perpendicular directions is selected, as shown in Figure 3.20. Green circles in
the figure represent feature points at line intersections. The corner feature points of the
selected grid are used to map it onto a perfect rectangle, as shown in Figure 3.21 and Figure
3.22. An assumption made here is that the tiled surface, or rectangular patterned surface in
general, is uniform but can have local imperfections and irregularities. The mapping transforms
the whole image into a tilt-compensated image.
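As a sketch of this feature-extraction step, the intersections of detected grout lines can be
computed directly from their (ρ, θ) parameters. The helper below is illustrative pure NumPy; in
the actual pipeline the (ρ, θ) pairs would come from OpenCV's cv2.HoughLines on the gray-scaled,
Gaussian-smoothed frame, and the candidates would be refined with cv2.goodFeaturesToTrack (GF2T):

```python
import numpy as np

def hough_intersections(lines, shape):
    """Intersect pairs of Hough lines given as (rho, theta), where a line
    satisfies x*cos(theta) + y*sin(theta) = rho, and keep only the
    intersections inside the image. These serve as candidate grout-line
    feature points."""
    h, w = shape
    pts = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            (r1, t1), (r2, t2) = lines[i], lines[j]
            A = np.array([[np.cos(t1), np.sin(t1)],
                          [np.cos(t2), np.sin(t2)]])
            if abs(np.linalg.det(A)) < 1e-6:   # near-parallel lines: skip
                continue
            x, y = np.linalg.solve(A, np.array([r1, r2]))
            if 0 <= x < w and 0 <= y < h:
                pts.append((float(x), float(y)))
    return pts
```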
Figure 3.20 Possible grid selected on the tiled floor image
Figure 3.21 Feature points at the corners of the selection (shown by yellow circles)
Figure 3.22 Mapping from a tilted image to tilt-compensated image
The perspective relation between the tilted and tilt-compensated images is given by

    p_tilt = R(p_tilt_free − T)    (3.50)

where p_tilt represents the feature points of the tilted image and p_tilt_free represents the
feature points of the tilt-compensated image.
Based on the perspective rotation matrix between the two images, we can determine the camera
rotations about the perpendicular axes. Let ax and ay represent the counter-clockwise
differential tilts between two consecutive frames about the x and y axes respectively, and let az
represent the counter-clockwise azimuthal rotation about the z axis. The respective rotation
matrices Rx, Ry and Rz about x, y and z are given by

    Rx = [ 1       0          0
           0    cos(ax)    sin(ax)
           0   −sin(ax)    cos(ax) ]    (3.51)
    Ry = [ cos(ay)   0   −sin(ay)
              0      1       0
           sin(ay)   0    cos(ay) ]    (3.52)

    Rz = [  cos(az)   sin(az)   0
           −sin(az)   cos(az)   0
               0         0      1 ]    (3.53)
The overall rotation matrix, R, between the two frames is defined by the following order:

    R = Rx Ry Rz    (3.54)

This gives

    R = [ cos(ay)cos(az)                           cos(ay)sin(az)                          −sin(ay)
          sin(ax)sin(ay)cos(az) − cos(ax)sin(az)   sin(ax)sin(ay)sin(az) + cos(ax)cos(az)   sin(ax)cos(ay)
          cos(ax)sin(ay)cos(az) + sin(ax)sin(az)   cos(ax)sin(ay)sin(az) − sin(ax)cos(az)   cos(ax)cos(ay) ]    (3.55)
If R(i, j) represents the (i, j)-th element of the rotation matrix, the tilt angles of the camera
can be obtained using the following equations:

    az = tan^{−1}( R(1,2) / R(1,1) )    (3.56)

    ay = cos^{−1}( R(1,1) / cos(az) ) = cos^{−1}( R(1,2) / sin(az) )    (3.57)

    ax = cos^{−1}( (R(3,1)cos(az) + R(3,2)sin(az)) / sin(ay) )
       = sin^{−1}( (R(2,1)cos(az) + R(2,2)sin(az)) / sin(ay) )    (3.58)

Thus, ax and ay are the required tilts that are removed from the image so that the image can be
used for the determination of the affine egomotion.
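The construction (3.51)-(3.54) and the extraction (3.56)-(3.58) can be sketched as follows. This
is an illustrative sketch (function names are not from the thesis) assuming angles smaller than
π/2 in magnitude and ay ≠ 0, so that the sin(ay) division is well defined:

```python
import numpy as np

def rot_xyz(ax, ay, az):
    """Rotation matrices (3.51)-(3.53) composed as R = Rx Ry Rz (3.54)."""
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), np.sin(ax)],
                   [0, -np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, -np.sin(ay)],
                   [0, 1, 0],
                   [np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), np.sin(az), 0],
                   [-np.sin(az), np.cos(az), 0],
                   [0, 0, 1]])
    return Rx @ Ry @ Rz

def angles_from_R(R):
    """Recover (ax, ay, az) via (3.56)-(3.58)."""
    az = np.arctan(R[0, 1] / R[0, 0])                              # (3.56)
    ay = np.arccos(np.clip(R[0, 0] / np.cos(az), -1.0, 1.0))       # (3.57)
    ax = np.arcsin((R[1, 0] * np.cos(az) + R[1, 1] * np.sin(az))
                   / np.sin(ay))                                   # (3.58)
    return ax, ay, az
```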
Hence, using these angles, we determine the rotation matrix that will be used for tilt
compensation. Let the tilt compensation rotation matrix be denoted R', given as

    R' = Ry^{−1} Rx^{−1}    (3.59)

Transforming the tilted image with this rotation, we obtain

    R' p_tilt = Ry^{−1} Rx^{−1} p_tilt
              = Ry^{−1} Rx^{−1} R (p_tilt_free − T)
              = Rz (p_tilt_free − T)    (3.60)
We thus obtain the tilt-compensated image, which appears like one captured by a camera whose
optical axis is perpendicular to the planar surface. Hence, we obtain a relation similar to
(3.27), where the two consecutive images differ only by a translation and a rotation in azimuth.
As a result, we can solve the 4DOF estimation using the algorithm proposed in Section 3.4.
Having performed the tilt compensation on the first frame captured by the camera, we must
compensate the tilt in all subsequent frames and, at the same time, estimate the affine
transformation between each pair of consecutive frames. To compensate for the tilt in the
consecutive frames, we track the square grid of tiles selected in the first frame using the
Lucas-Kanade pyramid and determine the tilts between the two consecutive frames. The tilt
obtained is added to the tilt of the prior frame to obtain the cumulative tilt of the post frame.
The cumulative tilt obtained at a particular frame is then used to remove the camera tilt from
that frame. Having removed the camera tilt from a pair of consecutive frames, we use the 4DOF
egomotion algorithm of Section 3.4 to obtain the differential translation and azimuthal rotation
of the camera. As a result, we have the complete 6DOF trajectory estimation of the camera.
A problem faced while estimating the tilt using the algorithm above is that at times a portion of
the selected grid moves out of the FOV of the camera, in which case the Lucas-Kanade pyramid will
fail to track the grid. To account for tracked feature points moving out of the FOV, the
algorithm adapts by selecting a new grid within the FOV, which is subsequently tracked in the
consecutive frames. That is, whenever a corner feature point of the selected grid moves out of
the FOV, the selected grid shifts in such a way that the number of tiles of the new grid in the
two perpendicular directions remains the same. For instance, if the top left corner of the
selected grid is observed moving out of the FOV across the left edge of the image, the algorithm
shifts the whole grid to the right by one tile. Figure 3.23 shows an example of this grid
shifting. In this example, as the top left corner of the grid moves closer to the top edge of the
frame, the algorithm causes the grid to shift down in the next frame.
(a) Selected grid in the prior frame
(b) Grid shifted down by one tile in the post frame
Figure 3.23 Example of grid shifting using the tilt compensation algorithm
Hence, the perspective between the two frames will be found based on the shifted grid of tiles.
The flow chart of the overall 6DOF egomotion algorithm proposed in this section is shown in
Figure 3.24.
Figure 3.24 Flow chart of the proposed 6DOF egomotion algorithm for rectangular patterned
surface. The flow chart comprises the following steps: (1) find the camera tilt in the first
frame using a grid of tiles; (2) find the tilt in the post frame and obtain the tilt-compensated
post frame; (3) use Hough lines and GF2T to find the feature points at the intersection of lines
in the tilt-compensated prior image; (4) use the Lucas-Kanade pyramid to find the corresponding
feature points in the tilt-compensated post frame; (5) use Least Squares or Kalman filtering to
obtain the differential translation and rotation between the frames; (6) based on the obtained
differential rotation and translation, estimate the camera trajectory.
3.6 Proposed 6DOF algorithm for any planar surface
Like the algorithm proposed in Section 3.5, this algorithm is partitioned into a sequence of 2DOF
and 4DOF algorithms. The 2DOF algorithm finds the relative tilt between consecutive frames and
transforms the images such that there is no relative tilt between the two images. The 4DOF
algorithm then estimates the translation and azimuthal rotation between the two frames based on a
Least Squares estimation method.
As in the algorithms of the previous sections, the frames captured by the camera are
pre-processed and good features are extracted from the planar surface. For concrete surfaces, Shi
and Tomasi's GF2T is used for feature extraction, while the Hough lines algorithm along with GF2T
is used to determine quality feature points on patterned surfaces. Given the feature points in
the prior frame, the corresponding feature points are extracted from the post frame using a
two-way optical flow. Based on the perspective between the two frames, the rotation and
translation of the camera are determined.
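The two-way (forward-backward) consistency check can be sketched as follows. The function below
abstracts away the actual tracking, which would be done twice with cv2.calcOpticalFlowPyrLK
(forward, then backward); the function name and default threshold are illustrative assumptions:

```python
import numpy as np

def two_way_filter(pts0, pts1_fwd, pts0_bwd, max_err=1.0):
    """Forward-backward consistency check for feature correspondences.
    pts0: (K, 2) points in the prior frame; pts1_fwd: their forward-tracked
    positions in the post frame; pts0_bwd: those positions tracked back into
    the prior frame. A correspondence is kept only if the round trip lands
    within max_err pixels of the starting point."""
    err = np.linalg.norm(pts0 - pts0_bwd, axis=1)
    keep = err < max_err
    return pts0[keep], pts1_fwd[keep], keep
```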
For this algorithm, instead of starting with the perspective between the camera and world
coordinate frames, we consider the homography between the camera coordinate frames in the two
positions. Let P_C,1 = [x_c,1  y_c,1  z_c,1]^T and P_C,2 = [x_c,2  y_c,2  z_c,2]^T represent the
coordinates of the feature points with respect to the camera reference frame in the two camera
positions, and let the perspective transformation between the two frames be defined as

    P_C,2 = R P_C,1 + T    (3.61)

where the rotation matrix is given by the order R = RΔz RΔy RΔx. The operator Δ is used here to
signify that R is the relative rotation between the two frames, and T represents the relative
translation between the two frames.
Let N = [n1, n2, n3]^T represent the unit vector perpendicular to the planar surface with respect
to the camera reference frame at the first camera position, and let d represent the distance from
the optical center of the first camera position to the planar surface [22]. Thus,

    N^T P_C,1 = n1 x_c,1 + n2 y_c,1 + n3 z_c,1 = d    (3.62)

Hence, we can write

    (1/d) N^T P_C,1 = 1    (3.63)

Substituting in equation (3.61), we get

    P_C,2 = R P_C,1 + T · 1
          = R P_C,1 + T · ((1/d) N^T P_C,1)
          = (R + (1/d) T N^T) P_C,1    (3.64)
Let the transformation matrix between the two frames be represented by H and given as

    H = R + (1/d) T N^T    (3.65)

Hence, we have

    P_C,2 = H P_C,1    (3.66)
If p1 and p2 represent the feature points corresponding to P_C,1 and P_C,2 in the respective
image planes at the two positions, then from (2.5) and (2.6) we have, for some constants λ1 and
λ2,

    p1 = λ1 P_C,1 = [x1  y1  1]^T    (3.67)

    p2 = λ2 P_C,2 = [x2  y2  1]^T    (3.68)

Thus, for a new constant λ, we can write

    p2 = λ H p1    (3.69)

Hence, given p1 and p2, H can be determined to within a scale factor.
For the determination of H, consider p2 × p2 = p̂2 p2 = 0, such that

    p̂2 H p1 = 0    (3.70)

where any vector r = [a  b  c]^T can be represented by the skew-symmetric matrix operator r̂ as

    r̂ = [  0   −c    b
           c    0   −a
          −b    a    0 ]    (3.71)
From (3.70) and (3.71), we have

    [  0    −1    y2 ]  [ H11  H12  H13 ]  [ x1 ]
    [  1     0   −x2 ]  [ H21  H22  H23 ]  [ y1 ]  =  0    (3.72)
    [ −y2   x2    0  ]  [ H31  H32  H33 ]  [ 1  ]

From here, we obtain the following three equations:

    −x1 H21 − y1 H22 − H23 + x1 y2 H31 + y1 y2 H32 + y2 H33 = 0
     x1 H11 + y1 H12 + H13 − x1 x2 H31 − y1 x2 H32 − x2 H33 = 0
    −x1 y2 H11 − y1 y2 H12 − y2 H13 + x1 x2 H21 + y1 x2 H22 + x2 H23 = 0    (3.73)
This gives us

    [    0       0      0    −x1     −y1    −1    x1 y2    y1 y2    y2  ]  [ H11 ]
    [   x1      y1      1      0       0     0   −x1 x2   −y1 x2   −x2  ]  [ H12 ]  =  0    (3.74)
    [ −x1 y2  −y1 y2   −y2   x1 x2   y1 x2   x2     0        0      0   ]  [  ⋮  ]
                                                                           [ H33 ]
For K quality feature points, we can stack the above equations and solve for H using SVD. The
right singular vector corresponding to the smallest singular value gives the vector of elements
of H. The second singular value of H should be 1, which provides a way of normalizing it [22].
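This SVD solution is the classical direct linear transform. A minimal sketch (illustrative
names; it stacks two independent rows of (3.73) per point, which is sufficient since the third
row is a linear combination of the other two):

```python
import numpy as np

def estimate_H(p1, p2):
    """Estimate the homography H (up to scale) from K >= 4 point
    correspondences. p1, p2: (K, 2) arrays of image coordinates.
    The solution is the right singular vector of the stacked system
    corresponding to the smallest singular value."""
    rows = []
    for (x1, y1), (x2, y2) in zip(p1, p2):
        rows.append([x1, y1, 1, 0, 0, 0, -x1 * x2, -y1 * x2, -x2])
        rows.append([0, 0, 0, -x1, -y1, -1, x1 * y2, y1 * y2, y2])
    A = np.asarray(rows, float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)        # null-space direction of A
    return H / np.linalg.norm(H)    # fix the free scale (unit norm)
```

Here the free scale is fixed by unit Frobenius norm for simplicity; the thesis instead normalizes
H so that its second singular value equals 1, per [22].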
Having obtained H from the observed feature points in the two planes, the next step is to
estimate the relative tilts from the estimated H. From (3.69), we can write

    [ x2 ]       [ H11  H12  H13 ]  [ x1 ]
    [ y2 ] = λ   [ H21  H22  H23 ]  [ y1 ]    (3.75)
    [ 1  ]       [ H31  H32  H33 ]  [ 1  ]

This gives us

    x2 = λ(H11 x1 + H12 y1 + H13)
    y2 = λ(H21 x1 + H22 y1 + H23)    (3.76)
If {ΔTx, ΔTy, ΔTz} represents the relative translation between the two frames and
{Δax, Δay, Δaz} the relative rotation, then from (3.61) we have

    [ x_c,2 ]   [ R11  R12  R13 ]  [ x_c,1 ]   [ ΔTx ]
    [ y_c,2 ] = [ R21  R22  R23 ]  [ y_c,1 ] + [ ΔTy ]    (3.77)
    [ z_c,2 ]   [ R31  R32  R33 ]  [ z_c,1 ]   [ ΔTz ]

Hence,

    x_c,2 = R11 x_c,1 + R12 y_c,1 + R13 z_c,1 + ΔTx
    y_c,2 = R21 x_c,1 + R22 y_c,1 + R23 z_c,1 + ΔTy    (3.78)

Since x1 = λ1 x_c,1 and x2 = λ2 x_c,2, we can re-write these equations as

    x2 = λ(R11 x1 + R12 y1 + R13 + λ1 ΔTx)
    y2 = λ(R21 x1 + R22 y1 + R23 + λ1 ΔTy)    (3.79)

Comparing (3.76) and (3.79), we obtain the following equalities:

    R11 = H11,  R12 = H12,  R21 = H21,  R22 = H22    (3.80)
Considering the definitions of Rx, Ry and Rz given in equations (3.51)-(3.53) and the order of
rotation R = RΔz RΔy RΔx, we have

    R11 =  cos(Δaz)cos(Δay)
    R12 =  sin(Δaz)cos(Δay)
    R21 = −cos(Δaz)sin(Δay)sin(Δax) − sin(Δaz)cos(Δay)
    R22 = −sin(Δaz)sin(Δay)sin(Δax) + cos(Δaz)cos(Δax)    (3.81)

Considering the frame rate to be sufficiently high for the tilt angles to be small, we can obtain
the relative tilt angles, Δax and Δay, from these equations. We can neglect the terms containing
sin(Δax)sin(Δay) since these angles are very small. Hence the tilts can be obtained as

    Δay = cos^{−1}( sqrt(R(1,1)^2 + R(1,2)^2) )

    Δax = cos^{−1}( sqrt(R(1,1)^2 + R(1,2)^2) · R(2,2) / R(1,1) )    (3.82)
Hence, from these equations, we can obtain the relative tilt between two consecutive frames, and
we can compensate for the tilt such that there is only translation and azimuthal rotation between
the prior and post frames. Equation (3.61) can be written as

    P_C,2 = RΔz RΔy RΔx P_C,1 + T    (3.83)

We can compensate for the tilt between the images by transforming the prior frame by RΔy RΔx. Let
P_C,3 be the camera coordinates of the new image so obtained. As a result, we have only
translation and azimuthal rotation between the two frames:

    P_C,2 = RΔz P_C,3 + T    (3.84)

RΔz and T can be solved for using Least Squares estimation. It is important to note that in this
case we cannot solve the 4DOF estimation with the Least Squares model proposed in Section 3.4,
because that model considered translation followed by rotation, whereas here we started with
rotation followed by translation.
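Under the small-angle assumption, the tilt recovery (3.80)-(3.82) reduces to a few lines. This is
a rough sketch (the function name is illustrative, and the approximation error grows with the
inter-frame rotation, so the loose test tolerance below reflects the approximation, not a bug):

```python
import numpy as np

def differential_tilts(H):
    """Small-angle estimate of the relative tilts (Δax, Δay) from the
    estimated homography, using R11 = H[0,0], R12 = H[0,1] and
    R22 = H[1,1] per (3.80) and (3.82). Valid only when the inter-frame
    rotations are small (high frame rate)."""
    c_day = np.sqrt(H[0, 0] ** 2 + H[0, 1] ** 2)     # ~ cos(Δay)
    day = np.arccos(np.clip(c_day, -1.0, 1.0))
    dax = np.arccos(np.clip(c_day * H[1, 1] / H[0, 0], -1.0, 1.0))
    return dax, day
```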
We can re-write equation (3.84) as

    [ x_c,2 ]   [  cos(Δaz)   sin(Δaz)   0 ]  [ x_c,3 ]   [ ΔTx ]
    [ y_c,2 ] = [ −sin(Δaz)   cos(Δaz)   0 ]  [ y_c,3 ] + [ ΔTy ]    (3.85)
    [ z_c,2 ]   [     0          0       1 ]  [ z_c,3 ]   [ ΔTz ]
We thus obtain

    z_c,2 = z_c,3 + ΔTz

    [ x_c,2 ]   [  cos(Δaz)   sin(Δaz) ]  [ x_c,3 ]   [ ΔTx ]
    [ y_c,2 ] = [ −sin(Δaz)   cos(Δaz) ]  [ y_c,3 ] + [ ΔTy ]    (3.86)
Dividing both sides by z_c,2 z_c,3, we obtain

    (1/z_c,3) [x2; y2] = (1/z_c,2) [cos(Δaz)  sin(Δaz); −sin(Δaz)  cos(Δaz)] [x3; y3]
                         + (1/(z_c,2 z_c,3)) [ΔTx; ΔTy]    (3.87)
This can be rearranged to obtain

    [ x2 ]                 [  cos(Δaz)   sin(Δaz)   ΔTx/z_c,3 ]  [ x3 ]
    [ y2 ] = (z_c,3/z_c,2) [ −sin(Δaz)   cos(Δaz)   ΔTy/z_c,3 ]  [ y3 ]
                                                                 [ 1  ]    (3.88)
Let (z_c,3/z_c,2)cos(Δaz) = cΔ, (z_c,3/z_c,2)sin(Δaz) = sΔ, ΔTx/z_c,2 = TΔx and
ΔTy/z_c,2 = TΔy. Hence the equation can be written as

    [ x2 ]   [  cΔ   sΔ   TΔx ]  [ x3 ]
    [ y2 ] = [ −sΔ   cΔ   TΔy ]  [ y3 ]
                                 [ 1  ]    (3.89)
It can be re-arranged in Least Squares notation as

    [ x2 ]   [ x3    y3   1   0 ]  [ cΔ  ]
    [ y2 ] = [ y3   −x3   0   1 ]  [ sΔ  ]
                                   [ TΔx ]
                                   [ TΔy ]    (3.90)
For K quality feature points, we obtain an over-determined set of equations given as

    M = [x2^1, y2^1, …, x2^K, y2^K]^T,

    A = [ x3^1    y3^1   1   0
          y3^1   −x3^1   0   1
            ⋮       ⋮    ⋮   ⋮
          x3^K    y3^K   1   0
          y3^K   −x3^K   0   1 ],

    P = [cΔ, sΔ, TΔx, TΔy]^T    (3.91)
The superscripts denote the feature point index. The Least Squares solution of this equation is
given as P = (A^T A)^{−1} A^T M. Once we obtain the parameters {cΔ, sΔ, TΔx, TΔy}, we can solve
for the differential translations and azimuthal rotation.
The azimuthal rotation is given as

    Δaz = tan^{−1}(sΔ / cΔ)    (3.92)
The translation along the z axis follows from

    z_c,2 = z_c,3 / sqrt(cΔ^2 + sΔ^2)    (3.93)

The translations along the x and y axes are given by

    ΔTx = (z_c,3 / sqrt(cΔ^2 + sΔ^2)) TΔx
    ΔTy = (z_c,3 / sqrt(cΔ^2 + sΔ^2)) TΔy    (3.94)
Hence we obtain the remaining 4DOF motion of the camera. This algorithm works for all kinds of
planar surfaces, concrete and structured. Structured surfaces provide very high quality feature
points at the intersections of lines, whereas the feature points on a concrete surface are
ambiguous. As a result, we observe drifts in the trajectory when the planar surface is concrete,
while the structure of the patterned surfaces provides an extra constraint that is used for
better trajectory estimation.
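The recovery of the remaining motion from the estimated parameters per (3.92)-(3.94) can be
sketched as follows (illustrative names; z3 is assumed to come from the previous frame's depth
estimate):

```python
import numpy as np

def recover_4dof(c, s, tdx, tdy, z3):
    """Recover the remaining 4DOF from {cΔ, sΔ, TΔx, TΔy} per
    (3.92)-(3.94). z3 is the camera depth z_{c,3} above the plane."""
    daz = np.arctan2(s, c)           # (3.92)
    z2 = z3 / np.hypot(c, s)         # (3.93)
    dTx = z2 * tdx                   # (3.94), since TΔx = ΔTx / z_{c,2}
    dTy = z2 * tdy
    return daz, z2, dTx, dTy
```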
The major difference between this algorithm and the 6DOF algorithm proposed in Section 3.5 is
that the algorithm of Section 3.5 uses the constraint of the rectangular structure to estimate
the absolute tilts of the camera and compensates for them so that the images appear as if taken
from a camera exhibiting no tilt, while the present 6DOF algorithm works for any general planar
surface. Due to the absence of constraints on the structure of the planar surface, it can only
find the differential tilts between two consecutive image frames and compensates for the tilts
such that there is no relative tilt between the frames.
An important consideration in all the above algorithms is that, even if the floor is concrete,
there must always be some good trackable feature points available for trajectory estimation.
Chapter Four: Experimental Verification
In this chapter, the various algorithms proposed in the previous chapter are verified
experimentally in an indoor environment using different validation schemes. A variety of floor
surfaces are used to estimate the camera trajectory. The algorithms are tested using three
different types of camera. Two of the cameras used for egomotion estimation are a standard iPhone
5S camera and an LG Nexus 4 camera; both contain an 8-megapixel sensor with a video frame rate of
30 frames per second. The third type of camera employed for the verification of the egomotion
algorithms is a Point Grey Bumblebee stereo camera. This camera, shown in Figure 4.1, consists of
two camera sensors, a left sensor and a right sensor, each of 1.3 megapixels. Hence, using the
Bumblebee camera, we can obtain video from two different sensors at the same time, which can be
used to verify the accuracy of the algorithms, as will be seen in the following sections.
Figure 4.1 Bumblebee stereo camera
This chapter comprises various sets of experiments performed to verify the three proposed
algorithms, as listed below.
The 4DOF algorithm is verified with
• Simulated videos
• Experiments involving two cameras moving together in an affine fashion on any kind of planar
floor surface.
• Back-projection error calculation for three different cameras moving in an affine fashion
without tilts on any planar floor surface.
• An experiment involving the RMS error calculation of the trajectory of an iPhone moving in a
circular fashion without any tilt on a planar surface.
The 6DOF algorithm for rectangular patterns is verified with
• Experiments involving two cameras moving together in any random fashion on a rectangular
patterned floor surface.
• Back-projection error calculation for three different cameras moving randomly on a rectangular
patterned floor surface.
• An experiment involving the RMS error calculation of the trajectory of an iPhone moving in a
circular fashion, with tilts, on a rectangular tiled floor.
The 6DOF algorithm for any planar surface is verified with
• Experiments involving two cameras moving together in any random fashion on any planar floor
surface.
• An experiment involving the RMS error calculation of an iPhone trajectory moving in a circular
fashion, with tilts, on any planar surface.
4.1 4DOF algorithm verification
To verify the 4DOF egomotion estimation algorithm, we first apply the algorithm to simulated
videos of a rectangle moving in different known fashions. The 4DOF algorithm estimates the
translation of the camera, its azimuthal rotation and its change in height with respect to the
planar surface. The estimated trajectory is then compared to the known true trajectory.
Having verified the algorithm for the simulated cases, various experiments are performed in which
a camera is moved in a random fashion over planar surfaces, concrete or patterned, and the
trajectory is estimated. Care must be taken that the camera is not tilted and that the optical
axis remains perpendicular to the planar surface; otherwise, as shown in Chapter 3, the tilts of
the camera will induce translation errors causing drifts in the trajectory estimate. Different
methods are used to verify the accuracy of the trajectory estimation.
4.1.1 Verification on simulated videos
The algorithm is verified on simulated videos by plotting the estimates against the actual
parameters. The root mean square (RMS) error of the estimated trajectory is calculated with
reference to the known true trajectory. We start with the simulated video of a rectangle
translating uniformly in the x and y directions. The translation is taken to be 1 pixel per frame
in both x and y. Snapshots of the first and last frames of the video are shown in Figure 4.2. In
this case, we expect the translation to be linear in the x and y directions, and the azimuthal
rotation and change in height to be zero throughout.
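The RMS trajectory error used throughout this chapter can be computed as follows (a minimal
sketch with an assumed array layout and function name):

```python
import numpy as np

def trajectory_rms(estimated, true):
    """RMS position error between an estimated and a true trajectory,
    each given as an (N, 2) sequence of (x, y) positions. This is the
    error measure used to report figures such as 0.0083 pixels."""
    est = np.asarray(estimated, float)
    ref = np.asarray(true, float)
    return float(np.sqrt(np.mean(np.sum((est - ref) ** 2, axis=1))))
```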
Figure 4.2 First and last frames of the uniformly translating rectangle
The affine transformation between each pair of consecutive frames is calculated using the Least
Squares estimation algorithm proposed in Chapter 3. Based on this transformation, the rotation
and translation of the camera are estimated and plotted. Figure 4.3(a) plots the estimated
translation of the camera in the x direction against the known true translation. Similarly,
Figure 4.3(b) plots the estimated translation in the y direction versus the true translation. It
can be seen that the two trajectories are approximately the same because of the absence of noise
in the video.
(a) Translation in x
(b) Translation in y
Figure 4.3 Camera translations for the simulated case of the uniformly translating rectangle
The overall trajectory of the camera in terms of translation in x and y is shown in Figure 4.4.
The RMS error of the trajectory was found to be 0.0083 pixels. As expected, the motion of the
camera comes out to be linear in x and y.
Figure 4.4 Plot of estimated trajectory as compared to the true trajectory for the simulation of
the uniformly translating rectangle
The azimuthal rotation of the camera, which is expected to be zero throughout the video, is shown
in Figure 4.5. Considering the initial height of the camera from the planar surface to be 1 unit,
the height of the camera over the whole video is plotted in Figure 4.6.
The RMS errors for the azimuthal rotation and the camera height from the planar surface were also
calculated. The RMS error of the azimuthal rotation was found to be 1.67e-05 radians (9.56e-04°),
while that of the height was found to be 8.37e-06 units.
Figure 4.5 Azimuthal rotation for the simulated case of uniformly translating rectangle
Figure 4.6 Height of camera from planar surface for the simulated case of uniformly translating rectangle
Next, we consider the simulation of a rectangle rotating uniformly about its center, as shown in
Figure 4.7. A complete 360° rotation of the rectangle about the optical axis is considered.
Figure 4.7 Frames of the simulated video of a uniformly rotating rectangle
For this simulation, we plot the trajectory in terms of the translation of the camera in x and y.
Since the simulation considers a camera rotating about its optical axis, the camera is expected
not to change its position in x and y. Figure 4.8 shows the trajectory for one complete rotation
of the camera about its optical axis. The plot of the expected and estimated azimuthal rotations
is shown in Figure 4.9.
Figure 4.8 Trajectory of the camera for the simulation of uniformly rotating rectangle
Figure 4.9 Azimuthal rotation of the camera for the simulation of uniformly rotating rectangle
The RMS error of the trajectory in the case of uniform rotation was found to be 0.9482 pixels.
From Figure 4.8 and Figure 4.9, it can be seen that the drifts are not random; rather, they are
uniform and systematic. One possible reason for this systematic drift is a uniform bias in the
Lucas-Kanade based feature correspondence: due to the pixel quantization of the simulation,
Lucas-Kanade finds only a close (not exact) match for the feature points in consecutive images,
resulting in a small estimation drift. This drift is carried forward and added to further drifts
in the correspondences, resulting in a systematic overall drift.
Now we consider the simulation of a uniformly shrinking and expanding rectangle, which
indicates a change in the height of the camera. When the camera increases its height above the
planar surface, the rectangle shrinks about the camera optical center. Likewise, when the height
decreases, the rectangle expands. We consider uniform shrinking of the rectangle for the first
few frames, uniform expansion for the next few frames, and no change for the last few frames.
Figure 4.10 shows some of the frames of the rectangle undergoing shrinking/expanding.
Figure 4.10 Frames of the simulated video of the uniformly shrinking/expanding rectangle
The overall trajectory for this case is shown in Figure 4.11. As expected, the camera remains
static in x and y for the entire video.
Figure 4.11 Trajectory for the simulation of camera when changing height
The RMS error of the estimated trajectory with respect to the true trajectory is found to be
0.1284 pixels. Azimuthal rotation, expected to be zero throughout, is shown in Figure 4.12.
Figure 4.13 shows the plot of the estimated height using the proposed 4DOF estimation
algorithm. The RMS error of the height is found to be 6.0252e-04 units.
Figure 4.12 Plot of azimuthal rotation for the simulation of camera when changing height
Figure 4.13 Plot of height estimate for the simulation of camera when changing height
All the above results were obtained using Least Squares estimation of the trajectory. Since the
motion of the camera is known in all the simulations, that is, the statistical model of camera
motion is available, a Kalman filter can also be used for trajectory estimation. We therefore
estimate the trajectory using Kalman filtering and compare the results with those obtained using
Least Squares.
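The Least Squares step can be made concrete with a short sketch. The following numpy code is an illustration (the function name and parameterization are my own, not the thesis implementation) of how a 4DOF update, that is, translation, azimuthal rotation, and a scale factor tied to camera height, can be posed as a linear least-squares problem over feature correspondences:

```python
import numpy as np

def estimate_4dof(p, q):
    """Least-squares fit of q ~ s*R(theta)*p + t from 2D correspondences.

    With a = s*cos(theta) and b = s*sin(theta), the model is linear:
        x' = a*x - b*y + tx
        y' = b*x + a*y + ty
    Returns (theta, s, t); the scale s reflects a change in camera height.
    """
    n = p.shape[0]
    A = np.zeros((2 * n, 4))
    # Interleave one x-equation and one y-equation per correspondence.
    A[0::2] = np.column_stack([p[:, 0], -p[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([p[:, 1],  p[:, 0], np.zeros(n), np.ones(n)])
    rhs = q.reshape(-1)                       # [x1', y1', x2', y2', ...]
    a, b, tx, ty = np.linalg.lstsq(A, rhs, rcond=None)[0]
    theta = np.arctan2(b, a)                  # azimuthal rotation
    s = np.hypot(a, b)                        # scale (height change)
    return theta, s, np.array([tx, ty])
```

Given at least two correspondences, the four parameters are fully determined; with many noisy correspondences the least-squares solution averages out the feature-matching error.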
We consider the case of the uniformly translating rectangle to estimate the trajectory using Least
Squares and the Kalman filter. Figure 4.14(a) shows the trajectory estimates for the first few
frames of the video, while Figure 4.14(b) shows the estimates for the middle frames. These two
plots illustrate how the Kalman filter starts with a larger drift in the trajectory than Least Squares
estimation but, as time proceeds, tracks the true trajectory and moves closer to it. This is because
Kalman filtering is a recursive process of prediction and correction: even if the initial predicted
values of the state vector do not correspond to the actual values, the estimate approaches the
true values as the number of measurements increases.
(a) Trajectory estimation during first few frames
(b) Trajectory estimation for later frames
Figure 4.14 Comparison of trajectory estimation using Least Squares and Kalman filtering
Here, too, we observe a constant residual error in the trajectory estimation, which might be due
to pixel quantization. In this case, the RMS error of the Least Squares trajectory was found to be
0.0083 pixels, while that of the Kalman filter trajectory was 0.0054 pixels. The Kalman filter
performs better here because the camera motion model is known. The following sections show
that when the motion of the camera is random, Least Squares estimation performs better than
Kalman filter estimation.
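The prediction/correction behaviour described above can be illustrated with a minimal constant-velocity Kalman filter. This generic numpy sketch (the state model, noise values, and function name are illustrative assumptions, not the filter used in the thesis) shows how early estimates can drift while later ones converge onto the true motion:

```python
import numpy as np

def kalman_track(zs, dt=1.0, q=1e-4, r=1e-2):
    """Constant-velocity Kalman filter over scalar position measurements zs.

    State x = [position, velocity]. Each step predicts with the motion model,
    then corrects with the measurement; the estimate converges as
    measurements accumulate even from a poor initial state.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
    H = np.array([[1.0, 0.0]])              # only position is observed
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    x = np.zeros(2)
    P = np.eye(2)
    out = []
    for z in zs:
        x = F @ x                           # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ (np.array([z]) - H @ x) # correct
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0])
    return np.array(out)
```

Run on a uniform ramp of positions, the filtered estimate lags at first (the initial state is wrong) and then locks onto the true trajectory, mirroring the behaviour seen in Figure 4.14.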
4.1.2 Verification using stereoscopic view
For this experiment, we use two different cameras, the iPhone camera and the Nexus camera, at
the same time. The two cameras are fastened together and mounted on a cart so that they face
the floor surface. The cart moves the cameras over the floor such that their optical axes are
perpendicular to the planar surface. The setup is shown in Figure 4.15.
(a) Front view
(b) Top view
Figure 4.15 Setup of two cameras moving together on a floor surface
The cameras are calibrated using the calibration procedure described in Chapter 3. The cart is
moved for around 3.6 m and videos of the floor surface are recorded using the two cameras
simultaneously. The estimated trajectories of the two cameras are plotted in terms of their
translations in x and y. Although the two cameras independently observe different feature
points, they move in the same fashion and should therefore produce the same trajectory. The
RMS variation between the two trajectories is calculated at various instants of the motion.
Figure 4.16 shows the resultant trajectories from the two cameras. The RMS variations at
different instants of time are given in Table 4.1.
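The RMS variation between two estimated trajectories can be computed as in the following sketch (illustrative numpy code; the function name and (N, 2) array layout are assumptions):

```python
import numpy as np

def rms_variation(traj_a, traj_b, upto=None):
    """RMS of the point-wise distance between two (N, 2) trajectories.

    Evaluated over the first `upto` frames, mirroring how the variation is
    tabulated at frames 50, 100, 500, ... in Table 4.1.
    """
    d = traj_a[:upto] - traj_b[:upto]
    return float(np.sqrt(np.mean(np.sum(d * d, axis=1))))
```

Evaluating it at increasing frame counts reproduces a per-instant variation table like Table 4.1.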
Figure 4.16 Trajectories of two cameras moving together estimated using the 4DOF algorithm
Table 4.1 RMS variations in trajectories of two cameras moving together estimated using the 4DOF algorithm
Frame Number    RMS variation (centimeters)
50              0.19
100             0.42
500             1.28
1000            2.51
1500            2.90
2000            3.53
The RMS variation of the entire trajectory was found to be 3.8 cm for 3.6 m of camera motion.
The azimuthal rotations of the two cameras are plotted in Figure 4.17 and their RMS variations
at various instants of motion are shown in Table 4.2. The overall RMS variation in the azimuthal
rotations was found to be 0.08 radians (4.58°) for 3.6 m.
Figure 4.17 Azimuthal rotations of two cameras moving together estimated using the 4DOF algorithm
Table 4.2 RMS variations in azimuthal rotations of two cameras moving together estimated using the 4DOF algorithm
Frame Number    RMS variation (radians)
50              0.00
100             0.00
500             0.07
1000            0.08
1500            0.08
2000            0.08
In the second part of this experiment, instead of using two different cameras, we mount the Point
Grey’s Bumblebee stereo camera on the cart such that there is no tilt associated with the camera.
Videos of the floor surface are taken in the left and right camera simultaneously. The trajectories
and the azimuthal rotations of the two cameras estimated using the 4DOF algorithm are plotted
in Figure 4.18 and Figure 4.19 respectively. Table 4.3 and Table 4.4 provide the RMS variations
in the trajectories and azimuthal rotations of the two cameras.
Figure 4.18 Trajectories of the left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm
Figure 4.19 Azimuthal rotations of the left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm
Table 4.3 RMS variations in trajectories of left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm
Frame Number    RMS variation (centimeters)
50              0.01
100             0.10
300             0.90
500             2.29
800             2.89
1000            3.16
Table 4.4 RMS variations in azimuthal rotations of left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm
Frame Number    RMS variation (radians)
50              0.00
100             0.00
300             0.01
500             0.01
800             0.01
1000            0.01
The total RMS variation in the trajectories for 3 m of camera motion was calculated to be 3.2 cm.
The RMS variation in the azimuthal rotation for the same length of camera motion was found to
be 0.01 radians (0.57°).
4.1.3 Back Projection Verification
This experiment aims to find the error in the trajectory of a camera moving against a planar
surface in a random fashion. Since the camera moves randomly in 3D space, it is not convenient
to determine its true trajectory for comparison with the estimated trajectory. Hence, the back
projection method [48] is used to determine the accuracy of the algorithm.
In the back projection method, for a pair of consecutive frames, the transformation of the
camera motion from the prior frame to the post frame is estimated using Least Squares. The
translation and rotation of the camera are extracted from the obtained transformation matrix,
and an inverse transformation matrix is computed, which maps the post frame back to the prior
frame. The post image is transformed with this inverse transformation matrix to obtain a new
image, and the back projection error is calculated between the feature points of the new image
and the feature points of the prior image.
For our analysis, we pair each frame with the frame nine frames later to compute the back
projection error, that is, the frame at time t is paired with the frame at time t+9. All the
differential translations and rotations up to that frame are accumulated to obtain the overall
transformation between the frame at time t and the one at time t+9. The frame at time t+9 is
back projected with the inverse transformation matrix, and the back projection error is computed
against the frame at time t. A scatter plot of the back projection errors is produced and the
standard deviation of the errors is calculated.
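The accumulation and inversion steps above can be sketched as follows, assuming each differential estimate is a 2D rigid transform (rotation plus translation) expressed as a homogeneous matrix; the helper names are illustrative, not from the thesis:

```python
import numpy as np

def rigid(theta, tx, ty):
    """3x3 homogeneous matrix for a 2D rotation followed by a translation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])

def back_projection_error(est_steps, pts_t, pts_t9):
    """Back projection residuals over an accumulated multi-frame transform.

    `est_steps` holds the estimated per-frame (theta, tx, ty) between frames
    t..t+9. Their product maps frame t into frame t+9; its inverse maps the
    observed frame-(t+9) feature points back into frame t for comparison.
    """
    T = np.eye(3)
    for theta, tx, ty in est_steps:     # accumulate differential transforms
        T = rigid(theta, tx, ty) @ T
    ph = np.column_stack([pts_t9, np.ones(len(pts_t9))])
    back = (np.linalg.inv(T) @ ph.T).T[:, :2]
    return back - pts_t                 # per-point back projection error
```

With perfect per-frame estimates the residuals are zero; with real estimates the scatter of these residuals is exactly what Figures 4.20 to 4.23 visualize.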
We started with the iPhone camera moving randomly, without any tilt, against a floor surface.
Figure 4.20 shows the scatter plot of back projection errors obtained for its motion. The standard
deviation of the errors was calculated to be 0.33 cm in the x direction and 0.54 cm in the y
direction for 8.5 m of camera motion.
Figure 4.20 Scatter plot of back projection errors obtained for the iPhone camera using the 4DOF algorithm
Next we moved the Nexus camera in a random fashion against the floor surface to perform the
back projection experiment. Figure 4.21 shows the scatter plot of the back projection errors. The
standard deviation was calculated to be 0.41 cm in the x direction and 0.41 cm in the y direction
for 8.5 m of random camera motion.
Figure 4.21 Scatter plot of back projection errors obtained for the Nexus camera using the 4DOF algorithm
Finally, the stereoscopic camera is used to verify the algorithm with the back projection method.
The scatter plot of errors is shown in Figure 4.22. The standard deviation of the back projection
errors was calculated to be 0.12 cm in the x direction and 0.15 cm in the y direction for 7 m of
camera motion involving translation and azimuthal rotation.
Figure 4.22 Scatter plot of back projection errors obtained for the Bumblebee camera using the 4DOF algorithm
It is interesting to note the quantization effect present in the iPhone and Nexus scatter plots of
Figure 4.20 and Figure 4.21 but absent from the Bumblebee scatter plot of Figure 4.22. A likely
reason for this artefact is that the frame rate of the Bumblebee camera is much higher than that
of the iPhone and Nexus cameras. As a result, the differentials between consecutive camera
positions are much smaller for the Bumblebee camera, which might explain the absence of the
quantization effect.
Having obtained the scatter plot for back projection based on Least Squares, we now use
Kalman filtering to obtain the scatter plot and calculate the errors. For the motion of the
stereoscopic camera, the resulting scatter plot is shown in Figure 4.23. Since the motion model
is unknown in this case, Kalman filtering performs worse than Least Squares: the standard
deviations of the back projection errors are slightly higher than those obtained with Least
Squares for the same number of frames, namely 0.13 cm in the x direction and 0.17 cm in the y
direction.
Figure 4.23 Scatter plot of back projection errors obtained for the Bumblebee camera based on the Kalman filter estimation of the 4DOF algorithm
4.1.4 Verification based on known trajectory
This experiment considers a deterministic motion of the camera to verify the accuracy of the
4DOF algorithm. The camera is moved in a circular fashion by using a turntable, as shown in
Figure 4.24. The turntable consists of a Newmark RT-5 motorized rotary stage, shown in Figure
4.25 and a Newmark NSC-1 motion controller, shown in Figure 4.26, which controls the velocity
of the circular motion.
Figure 4.24 Turntable used to move the camera in a circular motion
Figure 4.25 Newmark RT-5 motorized rotary stage
Figure 4.26 Newmark NSC-1 motion controller
For this experiment, we mounted the iPhone at a certain distance from the rotary stage, as shown
in Figure 4.27, and set the controller to rotate the stage at a speed of 1° per second. The turntable
was rotated through 120° and the video was recorded.
Figure 4.27 iPhone camera mounted on the turntable
The trajectory of the camera motion was plotted using Least Squares estimation and compared
against the actual trajectory. Figure 4.28 shows the plot of the camera trajectory obtained using
the 4DOF algorithm against the actual trajectory. The starting point of the actual trajectory is not
known exactly and is inferred from the estimated trajectory as the radius of motion and angle
moved are known. Hence, the actual trajectory might differ slightly from the one shown in the
figure. A uniform drift is observed in the trajectory, which might be due to slight tilts in the
camera resulting from mounting errors.
The RMS errors of the trajectory are obtained at various instants of time and are shown in Table
4.5. The RMS error for the complete motion of around 2 m was found to be 2.2 cm.
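A known circular trajectory of this kind is straightforward to synthesize for comparison. The sketch below is illustrative numpy code (the 0.95 m radius in the usage note is my assumption, chosen only so that a 120° sweep spans roughly 2 m of arc); it generates ground-truth positions and the RMS error of an estimate against them:

```python
import numpy as np

def circular_truth(radius_m, deg_per_frame, n_frames, start_deg=0.0):
    """Ground-truth (x, y) positions of a camera mounted `radius_m` from the
    rotary-stage axis, sampled once per frame of a constant-rate rotation."""
    ang = np.deg2rad(start_deg + deg_per_frame * np.arange(n_frames))
    return radius_m * np.column_stack([np.cos(ang), np.sin(ang)])

def rms_error(est, truth):
    """RMS of the point-wise distance between estimate and ground truth."""
    d = est - truth
    return float(np.sqrt(np.mean(np.sum(d * d, axis=1))))
```

For example, `circular_truth(0.95, 1/30.0, 3600)` would correspond to a 1° per second rotation sampled at an assumed 30 frames per second over a 120° sweep; evaluating `rms_error` over growing frame counts yields a per-instant error table like Table 4.5.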
Figure 4.28 Camera trajectory obtained for a circular motion using the 4DOF algorithm
Table 4.5 RMS errors in camera trajectory obtained for a circular motion using the 4DOF algorithm
Frame Number    Error (centimeters)
100             0.11
500             0.86
1000            1.50
2000            1.98
3000            2.12
The azimuthal rotation of the trajectory is plotted against the actual azimuthal rotation in Figure
4.29. Table 4.6 shows the azimuthal rotation RMS errors. The total RMS error in azimuthal
rotation for 2 m of motion on the turntable was found to be 0.02 radians (1.14°).
Figure 4.29 Azimuthal rotation obtained for circular camera motion using 4DOF algorithm
Table 4.6 RMS errors in azimuthal rotation obtained for circular motion using the 4DOF algorithm
Frame Number    Error (radians)
100             0.00
500             0.00
1000            0.00
2000            0.01
3000            0.01
The few centimeters of error accumulated over several meters of camera motion confirm the
high accuracy of the algorithm.
4.2 Verification of 6DOF algorithm for rectangular patterned surfaces
As with the 4DOF motion estimation algorithm, this algorithm is verified through various
experiments in which the three different cameras are moved in a random fashion. In this case,
the camera moves over a rectangular patterned floor surface, and there is no restriction that the
camera axis be perpendicular to the planar surface. Hence, the camera might be tilted while
capturing frames of the rectangular tiled floor. The algorithm first removes the tilt from the
frames using the 2DOF algorithm and then performs the 4DOF egomotion estimation.
4.2.1 Results of tilt removal on rectangular tiled floor
Before verifying the algorithm for trajectory estimation, the results for tilt removal are
presented. As stated in Section 3.5, the 2DOF tilt removal algorithm works by choosing a square
grid on the patterned surface and mapping it onto a perfect square to estimate the tilt. The
inverse of the calculated tilt is then applied to the image to obtain a tilt compensated image.
Figure 4.30 and Figure 4.31 show example images to which tilt removal is applied and the
corresponding tilt compensated images.
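The square-to-square mapping can be expressed as a four-point homography estimated with the direct linear transform (DLT). The following numpy sketch illustrates this standard technique and is not the thesis code; the function names are my own:

```python
import numpy as np

def homography_4pt(src, dst):
    """DLT estimate of the homography mapping 4 src points onto 4 dst points.

    In the tilt-removal context, src would be the corners of an observed
    (tilt-distorted) tile quadrilateral and dst the corners of a perfect
    square; applying the inverse homography to the image compensates the tilt.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of the 8x9 system (smallest
    # singular vector), reshaped to 3x3 and normalized.
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, pts):
    """Apply a homography to (N, 2) points with perspective division."""
    ph = np.column_stack([pts, np.ones(len(pts))])
    q = (H @ ph.T).T
    return q[:, :2] / q[:, 2:3]
```

Four point pairs give exactly eight constraints, so the homography is determined up to scale; mapping a distorted quadrilateral onto the unit square recovers the perspective (tilt) component.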
(a) Original image
(b) Tilt compensated image
Figure 4.30 Result of tilt removal algorithm
(a) Original image
(b) Tilt compensated image
Figure 4.31 Result of tilt removal algorithm
4.2.2 Verification based on stereoscopic view
As in Section 4.1.2, we performed the experiment in which two different cameras were mounted
together on a cart and moved. To exercise the tilt removal, the two cameras were mounted on the
cart with different tilts, as shown in Figure 4.32. For the algorithm to be verified, the 2DOF
algorithm should compensate for the different tilts of the two cameras and yield similar
trajectories. Figure 4.33 shows the trajectory estimates for the two cameras, and their calculated
RMS variations are given in Table 4.7.
Figure 4.32 Setup of cameras mounted at different tilt angles on the cart
Figure 4.33 Trajectories of two cameras moving together obtained using the 6DOF algorithm for rectangular patterns
Table 4.7 RMS variations in trajectories of cameras moving together obtained using the 6DOF algorithm for rectangular patterns
Frame Number    RMS variation (centimeters)
50              0.28
100             0.22
500             2.62
1000            2.64
1500            3.00
2000            3.90
The RMS variation of the entire trajectory was found to be 4.9 cm for 8.5 m of camera motion.
The azimuthal rotations of the two cameras are plotted in Figure 4.34 and the RMS variations are
given in Table 4.8. The RMS variation in azimuthal rotations for the entire motion was found to
be 0.04 radians (2.29°).
Figure 4.34 Azimuthal rotations of two cameras moving together obtained using the 6DOF algorithm for rectangular patterns
Table 4.8 RMS variations in azimuthal rotations of cameras moving together obtained using the 6DOF algorithm for rectangular patterns
Frame Number    RMS variation (radians)
50              0.00
100             0.00
500             0.01
1000            0.02
1500            0.03
2000            0.04
Next, we moved the Bumblebee stereo camera in a tilted fashion randomly against the
rectangular tiled floor. The trajectories of the left and right camera sensors are shown in Figure
4.35 and their RMS variations at various instants of the video are shown in Table 4.9. The RMS
variation of the entire motion of 4.8 m is found to be 2.3 cm.
Figure 4.35 Trajectories of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns
Table 4.9 RMS variations in trajectories of the left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns
Frame Number    RMS variation (centimeters)
50              0.05
100             0.07
500             1.51
1000            2.44
1500            2.56
2000            2.28
The azimuthal rotations of the left and right sensors of the Bumblebee camera are plotted in
Figure 4.36. The RMS variations at various times are shown in Table 4.10. The total RMS
variation for 4.8 m of camera motion was 0.02 radians (1.14°). A sudden significant deviation
can be seen in the azimuthal rotation around frame 1000. This might be caused by strong
features present in the frames of one camera, significantly affecting its estimate, while being
absent from the frames of the second camera, resulting in the sudden deviation.
Figure 4.36 Azimuthal rotations of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns
Table 4.10 RMS variations in azimuthal rotations of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns
Frame Number    RMS variation (radians)
50              0.00
100             0.00
500             0.01
1000            0.01
1500            0.02
2000            0.02
4.2.3 Verification based on back projection
We performed the verification of the 6DOF algorithm using the back projection method
described in Section 4.1.3. Figure 4.37 shows the scatter plot of the back projection errors for a
random motion of the iPhone camera on a rectangular tiled surface. The standard deviation of
the errors for 3 m of iPhone camera motion was calculated to be 0.69 cm in the x direction and
0.88 cm in the y direction.
Figure 4.37 Scatter plot of back projection errors obtained for the iPhone camera using the 6DOF algorithm for rectangular patterns
The scatter plot of the back projection errors obtained for the Nexus 4 camera is shown in
Figure 4.38. For 3 m of camera motion, the standard deviation of the errors was calculated to be
0.42 cm in the x direction and 0.59 cm in the y direction.
Figure 4.38 Scatter plot of back projection errors obtained for the Nexus camera using the 6DOF algorithm for rectangular patterns
Figure 4.39 shows the scatter plot of back projection errors for 4.8 m of Bumblebee camera
motion. The standard deviations of the errors in the x and y directions are 0.17 cm and 0.53 cm
respectively.
Figure 4.39 Scatter plot of back projection errors obtained for the Bumblebee camera using the 6DOF algorithm for rectangular patterns
All the above plots were obtained using Least Squares estimation for the 4DOF stage of the
algorithm. For the Bumblebee camera motion, we also used Kalman filtering to estimate the
egomotion and plot the back projection errors. Figure 4.40 shows the resulting scatter plot; the
standard deviations of the errors were 0.23 cm in the x direction and 0.86 cm in the y direction.
Figure 4.40 Scatter plot of back projection errors for the Bumblebee camera considering 6DOF estimation using Kalman filtering
Hence, we verify that when the motion model of the camera is known, Kalman filtering
performs better, as shown in Section 4.1.1. However, when the model of camera motion is
unknown, Least Squares estimation gives better results than Kalman filter estimation.
4.2.4 Verification based on a known trajectory
We again used the turntable to obtain a known trajectory of the camera motion. In this case, the
iPhone was mounted with a certain tilt at the end of the turntable shaft, as shown in Figure 4.41.
Figure 4.41 Camera mounted on the turntable at a certain tilt
The turntable was rotated at about 1° per second through 180°. Figure 4.42 shows the plot of the
trajectory, which is compared against the true known trajectory to calculate the RMS errors. The
RMS errors in the trajectory at various instants of its motion are shown in Table 4.11. The total
RMS error for around 2.3 m of motion is 1.6 cm.
Figure 4.42 Camera trajectory obtained for a circular motion using the 6DOF algorithm for rectangular patterns
Table 4.11 RMS errors in trajectory obtained for a circular motion using the 6DOF algorithm for rectangular patterns
Frame Number    Error (centimeters)
100             0.59
500             1.57
1000            1.92
1500            1.68
2000            1.51
The azimuthal rotation of the camera over the semi-circle traversed is plotted in Figure 4.43.
The RMS errors obtained when comparing with the actual azimuthal rotation are shown in Table
4.12. The error for the entire 2.3 m of motion is 0.05 radians (2.86°).
Figure 4.43 Azimuthal rotation obtained for a circular motion using the 6DOF algorithm for rectangular patterns
Table 4.12 RMS errors in the azimuthal rotation obtained for a circular motion using the 6DOF algorithm for rectangular patterns
Frame Number    Error (radians)
100             0.01
500             0.03
1000            0.04
1500            0.04
2000            0.05
4.3 Verification of 6DOF algorithm for camera directed at any planar surface
For the verification of this algorithm, we used the cameras on both tiled and concrete surfaces,
moving them in a random fashion with some tilt to obtain the trajectory of motion. Different
experimental trials were performed and the results verified using several methods. The
trajectory estimation on a concrete surface was also compared with the trajectory estimation on
a patterned surface.
4.3.1 Verification using stereoscopic view
We will use a long range motion to verify the accuracy of this algorithm. The Bumblebee
stereoscopic camera was moved in a random fashion for a distance of 16 m on a concrete floor
and trajectories of the left and right sensors are plotted using the 6DOF algorithm as shown in
Figure 4.44. Their RMS variations are shown in Table 4.13. The RMS variation for the entire
motion of 16 m was found to be 9.9 cm.
Figure 4.44 Trajectory for long range motion of stereoscopic camera obtained using the 6DOF algorithm
Table 4.13 RMS variations in long range trajectories of the two sensors of the Bumblebee camera obtained using the 6DOF algorithm
Frame Number    RMS variation (centimeters)
50              0.01
100             0.08
500             1.36
1000            5.24
1500            7.55
2000            9.88
The azimuthal rotations for the left and right sensors of the camera are shown in Figure 4.45.
Variations in the rotation at various instants during the motion are shown in Table 4.14. The
RMS variation for the entire 16 m of motion was found to be 0.10 radians (5.72°).
Figure 4.45 Azimuthal rotation for long range motion of stereoscopic camera obtained using the 6DOF algorithm
Table 4.14 RMS variations in long range azimuthal rotations of two sensors of the Bumblebee camera obtained using the 6DOF algorithm
Frame Number    RMS variation (radians)
50              0.00
100             0.00
500             0.02
1000            0.06
1500            0.08
2000            0.10
Next we move the Bumblebee stereoscopic camera in a random fashion with some tilt. The
trajectories of the two sensors for this case are shown in Figure 4.46 and the calculated RMS
variations are given in Table 4.15. The total RMS variation for around 3.2 m of motion was
calculated to be 0.9 cm.
Figure 4.46 Trajectories of the two sensors of stereoscopic camera obtained using the 6DOF algorithm
Table 4.15 RMS variations in the trajectories of the sensors of stereoscopic camera obtained using the 6DOF algorithm
Frame Number    RMS variation (centimeters)
50              0.01
100             0.12
200             0.54
300             0.92
Whichever way the camera is tilted, the left and right sensors should reflect an equal amount of
tilt. Hence, the tilts of the two camera sensors in the x and y directions are plotted in Figure 4.47
and Figure 4.48 respectively. The RMS variations in the tilt angles between the two trajectories
were estimated and found to be 0.00 radians in both the x and y directions for around 3.2 m of
camera motion, hence insignificant.
Figure 4.47 Tilts in x direction for the sensors of stereoscopic camera obtained using the 6DOF algorithm
Figure 4.48 Tilts in y direction for the sensors of stereoscopic camera obtained using the 6DOF algorithm
Finally, the azimuthal rotations calculated for the two sensors are plotted in Figure 4.49 and their
RMS variations are given in Table 4.16. The variation for the overall motion was estimated to be
0.01 radians (0.57°).
Figure 4.49 Azimuthal rotations of the sensors of stereoscopic camera obtained using the 6DOF algorithm
Table 4.16 RMS variations in azimuthal rotations of sensors of stereoscopic camera obtained using the 6DOF algorithm
Frame Number    RMS variation (radians)
50              0.00
100             0.00
200             0.01
300             0.01
4.3.2 Verification based on known trajectory
As with the known trajectory verification of the previous two algorithms, we used the turntable
to obtain a deterministic trajectory for a tilted camera, for comparison with the trajectory
estimated by the 6DOF algorithm. The algorithm first compensates for the differential tilts
between two consecutive frames and then estimates the trajectory and azimuthal rotation. The
true and estimated trajectories for 160° of rotation are shown in Figure 4.50, and the actual and
estimated azimuthal rotations are plotted in Figure 4.51.
Figure 4.50 Camera trajectory obtained for a circular motion using the 6DOF algorithm
Figure 4.51 Azimuthal rotation obtained for a circular camera motion using the 6DOF algorithm
The RMS errors in the estimated trajectory and in the estimated azimuthal rotation calculated at
various instants of motion are shown in Table 4.17 and Table 4.18. The total RMS error in
trajectory was found to be 1.5 cm and that in azimuthal rotation was found to be 0.03 radians
(1.71°) for 2.5 m of rotation.
Table 4.17 RMS errors in trajectory obtained for a circular motion using the 6DOF algorithm
Frame Number    Error (centimeters)
100             0.60
500             1.14
1000            1.03
1500            0.92
2000            1.15
Table 4.18 RMS errors in azimuthal rotation obtained for a circular motion using the 6DOF algorithm
Frame Number    Error (radians)
100             0.01
500             0.02
1000            0.02
1500            0.02
2000            0.03
4.3.3 Comparison of trajectory estimation on patterned and concrete surfaces
In this subsection, we show how the results of trajectory estimation are affected when the floor
is patterned instead of concrete. We performed the known trajectory experiment using the same
camera on a concrete floor and on a tiled floor under the same lighting conditions, moving the
camera over the same distance. The trajectories obtained on the two floors were plotted and
their RMS errors calculated. Figure 4.52 and Figure 4.53 show the trajectories estimated for the
camera motion on the tiled and concrete floors respectively.
Figure 4.52: Trajectory obtained for circular motion on a patterned surface using the 6DOF algorithm
Figure 4.53 Trajectory obtained for circular motion on a concrete surface using the 6DOF algorithm
The RMS error calculated for 2.5 m of camera motion on tiled floor was found to be 2.5 cm and
that on concrete floor was found to be 5.2 cm. The RMS errors in the trajectories at several
instants during camera motion on tiled and concrete floor are shown in Table 4.19 and Table
4.20 respectively.
Table 4.19 RMS errors in trajectory obtained for circular motion on a patterned surface using the 6DOF algorithm
Frame Number    Error (centimeters)
100             0.35
500             0.53
1000            1.20
1500            1.95
2000            2.41
Table 4.20 RMS errors of trajectory obtained for circular motion on a concrete surface using the 6DOF algorithm
Frame Number    Error (centimeters)
1000            0.17
2000            0.93
3000            1.03
4000            1.86
5000            3.65
The azimuthal rotations for the circular camera motion for tiled and concrete floor are shown in
Figure 4.54 and Figure 4.55 and their corresponding RMS errors are given in Table 4.21 and
Table 4.22. The RMS errors for 2.5 m of motion were found to be 0.02 radians (1.14°) for tiled
floor and 0.03 radians (1.71°) for concrete floor.
Figure 4.54 Azimuthal rotation obtained for circular motion on a patterned surface using the 6DOF algorithm
Figure 4.55 Azimuthal rotation obtained for circular motion on a concrete surface using the 6DOF algorithm
Table 4.21 RMS errors in azimuthal rotation obtained for circular motion on a patterned surface using the 6DOF algorithm
Frame Number    Error (radians)
100             0.01
500             0.01
1000            0.01
1500            0.01
2000            0.02
Table 4.22 RMS errors of azimuthal rotation obtained for circular motion on a concrete surface using the 6DOF algorithm
Frame Number    Error (radians)
1000            0.00
2000            0.02
3000            0.03
4000            0.04
5000            0.03
The performance of the egomotion estimation algorithm improves significantly on a patterned
surface, since the lines of the pattern add extra information that yields higher quality feature
points, whereas the ambiguity of the concrete surface results in lower quality feature points and
causes the trajectory estimate to drift from its actual value.
Across the various trials with different cameras, the trajectory estimation errors amount to a few
centimeters for several meters of camera motion, demonstrating the high accuracy of the
proposed egomotion algorithms.
Chapter Five: Conclusions and Future Work
This thesis provides robust 6DOF algorithms for the estimation of camera trajectory by making
use of the feature points on a planar surface. The estimated trajectory of the camera can be used
further to improve the performance of indoor navigation. This chapter provides a summary of the
thesis, which includes the contributions of this research. It also provides suggestions for future
work, which could improve the performance of the proposed algorithms.
5.1 Conclusions
This research aims to address the hypothesis stated in Chapter 1 that accurate trajectory
estimation can be achieved if the observed feature points are planar and the estimation can be
further improved if the features are on patterned surfaces.
The contributions of this research to support this hypothesis can be summarized as follows:
• The process of image formation and image transformation was explained. Methods of
feature point extraction and correspondence were introduced, which were used in the
proposed algorithms for the extraction and correspondence of feature points on the planar
surfaces.
• It was determined that in order to achieve centimeter level accuracy for trajectory
estimation, it is necessary to accurately compensate for the lens distortion of the camera. An
efficient method of doing this based on the chessboard camera calibration was introduced
and implemented in the overall routine.
• It was identified that for the extraction of high quality feature points from a planar surface,
noise needs to be removed and the structure of the surface needs to be highlighted. Thus,
various methods of pre-processing the image to remove noise and highlight the features
were established. These methods include Gaussian smoothing, edge detection and image
thresholding. On a patterned surface, rich features can be obtained at the intersections of lines.
Hence, a method to extract the lines in the image based on the Hough transform was
introduced and implemented.
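The standard (rho, theta) parameterization behind such a line extraction can be sketched as a minimal NumPy vote accumulator; this is an illustrative sketch, not the implementation used in the thesis:

```python
import numpy as np

def hough_accumulator(edges, n_theta=180):
    """Vote in (rho, theta) space for every edge pixel of a binary image.
    rho is offset by the image diagonal so accumulator indices are non-negative."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for j, th in enumerate(thetas):
        # each edge pixel votes for the line rho = x cos(theta) + y sin(theta)
        rho = np.round(xs * np.cos(th) + ys * np.sin(th)).astype(int)
        np.add.at(acc, (rho + diag, j), 1)
    return acc, diag
```

Peaks in the accumulator correspond to lines in the image; intersecting the extracted lines then yields the high-quality feature points described above.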
• A 4DOF algorithm for trajectory estimation based on Least Squares and Kalman filtering
was proposed for the cases where the camera is held in such a way that its optical axis is
perpendicular to the planar surface. The algorithm estimates the translations and azimuthal
rotation that the camera undergoes based on how the features on the planar surface move
from frame to frame.
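The least-squares core of such a frame-to-frame step, recovering an in-plane rotation and translation from matched feature points, can be sketched as a 2-D orthogonal Procrustes problem. This NumPy sketch is illustrative only; height/scale handling and the Kalman variant are omitted:

```python
import numpy as np

def estimate_rotation_translation(p, q):
    """Least-squares 2-D rotation R and translation t such that q ~ R p + t.
    p, q: (N, 2) matched feature coordinates in the pre and post frames."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    H = (p - pc).T @ (q - qc)                 # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T        # proper rotation (det = +1)
    t = qc - R @ pc
    return R, t
```

The azimuthal rotation follows as atan2(R[1,0], R[0,0]); running this over successive frame pairs and accumulating the motion yields a trajectory estimate.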
• A 6DOF algorithm was proposed for estimating the egomotion of a camera moving
arbitrarily over a rectangular patterned surface. The constraints on the structure of
the surface provide a means to estimate the absolute tilts in the camera. The proposed
algorithm compensates for the camera tilts and estimates the trajectory of the camera from
the motion of feature points in the tilt compensated images.
• A 6DOF algorithm was proposed that estimates the egomotion of a camera moving arbitrarily
over any planar surface, concrete or tiled. The algorithm first estimates the differential
tilts between the two camera positions and compensates for this differential tilt. The tilt
compensated images are then used to estimate the relative translations and azimuthal
rotation between the two camera positions.
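Tilt compensation of this kind can be viewed as warping the image with the pure-rotation homography H = K R K⁻¹, which re-renders pixels as if the camera had been rotated to face the plane squarely. A minimal sketch, in which the intrinsic matrix K and the rotation used below are illustrative assumptions:

```python
import numpy as np

def rotation_homography(K, R):
    """Homography induced by a pure camera rotation R: H = K R K^-1."""
    return K @ R @ np.linalg.inv(K)

def warp_points(H, pts):
    """Apply a 3x3 homography to (N, 2) pixel coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]            # de-homogenize
```

Warping the feature points (or the whole image) with the tilt rotation reduces the remaining estimation to the in-plane translations and azimuthal rotation.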
• Simulated videos were used to verify the accuracy of the 4DOF algorithm for trajectory
estimation. The RMS errors calculated for the estimated trajectories were found to be of the
order of a few millimeters. A comparison was provided between the estimates based on
Least Squares and Kalman filtering and it was shown that when the motion model of the
camera is known, Kalman filtering performs better than Least Squares, while in all other
cases Least Squares exhibits better performance.
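The Kalman filtering variant compared here can be illustrated with a generic 1-D constant-velocity filter. This is a textbook sketch, not the thesis filter, and the process/measurement noise values q and r are assumed:

```python
import numpy as np

def kalman_constant_velocity(zs, dt=1.0, q=1e-4, r=0.01):
    """1-D constant-velocity Kalman filter; state is [position, velocity],
    and only the position is measured."""
    F = np.array([[1.0, dt], [0.0, 1.0]])    # motion model
    H = np.array([[1.0, 0.0]])               # measurement model
    Q = q * np.eye(2)
    R = np.array([[r]])
    x, P = np.zeros(2), np.eye(2)
    out = []
    for z in zs:
        x = F @ x                             # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                   # update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        out.append(x.copy())
    return np.array(out)
```

When the motion genuinely follows the assumed model, as in this sketch, the filter tracks both position and velocity; when it does not, the model mismatch is what lets plain Least Squares win.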
• Two different cameras moving together were used to verify that the proposed algorithms
provide similar estimation of trajectory for two independent cameras moving in the same
fashion. The RMS variations were calculated in the estimated trajectories of the two cameras
and were found to be of the order of a few centimeters. Also, the RMS errors in the estimated
azimuthal rotations of the two cameras were calculated.
• A back projection method was used to verify the proposed algorithms. Based on the
estimated rotation and translation, the feature points in the post frame were back projected,
and the errors in the back projected feature points were calculated. Scatter plots of the back
projection error were obtained, and the millimeter level standard deviations support highly
accurate trajectory estimation using the proposed algorithms.
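This verification metric amounts to re-applying the estimated motion and measuring per-point residuals (equivalently, back projecting the post-frame features with the inverse motion). A hypothetical NumPy sketch, with illustrative names:

```python
import numpy as np

def backprojection_errors(p_pre, p_post, R, t):
    """Project the pre-frame features with the estimated (R, t) and return
    per-point residual distances against the observed post-frame features."""
    predicted = p_pre @ R.T + t
    return np.linalg.norm(predicted - p_post, axis=1)
```

The standard deviation of these residuals, plotted as a scatter of error vectors, is the quantity reported above.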
• The performance of the algorithms was evaluated based on a known trajectory. A camera
was moved in a circular fashion using a rotary stage and a controller and the trajectory was
estimated based on the proposed algorithms. The RMS errors in the trajectories were
calculated against the actual trajectory and were found to be a few centimeters for several
meters of camera motion.
• A comparison was provided for trajectory estimation on a concrete and patterned surface
based on known camera motion. It was shown that the RMS errors in the trajectory and
azimuthal rotation were lower on the patterned surface than on the concrete surface, which
demonstrates that the structure of the patterned surface adds information that improves the
performance of trajectory estimation.
The centimeter level agreement in the experiments involving two cameras moving together, the
low standard deviations of the back projection errors, and the few centimeters of RMS error over
several meters of camera motion in the turntable experiment all indicate the high accuracy of the
proposed algorithms.
5.2 Future Work
The egomotion algorithms discussed in this thesis could be further improved to provide more
robust trajectory estimation. Some potential future work includes:
• As discussed in Chapter 1, CV based algorithms provide good trajectory estimation over
short ranges but are subject to long term drift, whereas GNSS and other wireless signals
perform well over long ranges with fewer drift issues. Integrating CV observables with
GNSS could therefore be an important step towards attaining very high accuracy in indoor
navigation.
• Another important task that could be performed in the future is the integration of the CV
observables and wireless signals with data from inertial measurements for data fusion of
indoor location observables.
• This research focused on performing indoor navigation based on the features from a planar
surface. The algorithm proposed here could be extended to curved surfaces, steps of multiple
levels and intersecting walls.
• For planar structures like the one shown in Figure 5.1, where the pattern of the surface
does not repeat itself, it is possible to estimate the absolute position of the camera.
Figure 5.1 Example of a planar structure that does not repeat its pattern
Repeating tiles on a planar surface are ambiguous with respect to the absolute camera
location, but with an arrangement of tiles whose pattern never repeats, this ambiguity can be
resolved. Hence, using local patterns to determine the camera position is an interesting task
that could be implemented in the future based on the algorithms proposed herein.
• The trajectory estimated using the proposed algorithms could be used to perform
beamforming under line of sight conditions.