ucalgary 2014 dawar neha · 2017. 6. 27. · neha dawar a thesis ... cv based 6dof trajectory...

UNIVERSITY OF CALGARY

Computer Vision based Indoor Navigation Utilizing Information from Planar Surfaces

by

Neha Dawar

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF MASTER OF SCIENCE

GRADUATE PROGRAM IN ELECTRICAL AND COMPUTER ENGINEERING

CALGARY, ALBERTA

SEPTEMBER, 2014

© Neha Dawar 2014

ii

Abstract

Traditional wireless signalling based outdoor navigation techniques generally result in

unsatisfactory performance for indoor environments due to low signal strength and multipath

distortions. Computer vision (CV) sensors, due to their low cost and high performance, have

gained enormous interest in indoor navigation over the past years.

CV based 6DOF trajectory estimation is understood to be a computationally intensive ill-posed

problem. Drastic simplification and enhanced robustness are possible in scenarios where camera

observed features are constrained to a plane, such as a floor surface. Furthermore, if the features

have geometric patterns, such as a regularly tiled surface, significantly more powerful constraints

can be implemented. Exploration of such constraints is the aim of this thesis. Experimental

results show that centimeter level accuracy in trajectory estimation can be achieved for arbitrary

camera motion spanning several meters. As shown in this thesis, this accuracy is a result of

constraints due to planar features observed.

iii

Acknowledgements

First of all, I would like to express my deepest appreciation and thanks to my supervisor, Dr.

John Nielsen, and my co-supervisor, Dr. Gérard Lachapelle, for providing me with this

opportunity to be a part of one of the most renowned research groups in navigation. I would like

to thank you for your valuable support and wisdom in encouraging my research. Without your

guidance and assistance, this research and the results achieved would not have been possible.

I would like to thank my friend and team member, Yuqi Li, for all the educational discussions

and guidance throughout the course of my research. A special thanks to my friend, Tushar

Sharma, for his continuous help and assistance with taking the readings for the experimental

verification of the work. I would like to acknowledge the Electrical and Computer Engineering

Department for making this research possible.

Finally, I would like to thank my parents, who despite of being far away, have always provided

me with great advices and encouragement. I am highly grateful to them for always being

supportive of my studies and for always being a source of enthusiasm for me.

iv

Table of Contents

Abstract ............................................................................................................................... ii Acknowledgements ............................................................................................................ iii Table of Contents ............................................................................................................... iv List of Tables ..................................................................................................................... vi List of Figures and Illustrations ....................................................................................... viii List of Symbols, Abbreviations and Nomenclature ......................................................... xiv

INTRODUCTION ..................................................................................1 CHAPTER ONE:1.1 Introduction to Navigation .........................................................................................1 1.2 Existing Indoor Navigation Techniques ....................................................................3 1.3 Integration with Computer Vision .............................................................................4 1.4 Objectives ..................................................................................................................7 1.5 Contributions .............................................................................................................8 1.6 Organization ...............................................................................................................9

BACKGROUND .................................................................................11 CHAPTER TWO:2.1 The Geometric Model ..............................................................................................12 2.2 Transformations .......................................................................................................16

2.2.1 Affine Transformation .....................................................................................16 2.2.2 Perspective Transformation .............................................................................18

2.3 Feature points ...........................................................................................................22 2.3.1 Examples of feature detection .........................................................................25

2.4 Optical Flow ............................................................................................................28 2.4.1 Example of optical flow using Lucas Kanade Pyramid ..................................34

PROPOSED ALGORITHM ............................................................37 CHAPTER THREE:3.1 Camera Calibration ..................................................................................................37

3.1.1 Intrinsic Camera Parameters ............................................................................38 3.1.2 Distortion Parameters ......................................................................................40 3.1.3 Calibration and distortion mitigation ...............................................................42

3.2 Image Pre-processing ...............................................................................................43 3.2.1 Gaussian Smoothing ........................................................................................45 3.2.2 Edge Detection ................................................................................................47 3.2.3 Thresholding ....................................................................................................50

3.3 Hough Lines .............................................................................................................52 3.4 Proposed 4DOF egomotion algorithm .....................................................................55

3.4.1 Least Squares estimation .................................................................................61 3.4.2 Kalman Filter estimation .................................................................................63 3.4.3 Estimation of camera motion from the transformation matrix ........................65

3.5 Proposed 6DOF algorithm for rectangular patterned surface ..................................66 3.6 Proposed 6DOF algorithm for any planar surface ...................................................75

EXPERIMENTAL VERIFICATION ................................................84 CHAPTER FOUR:4.1 4DOF algorithm verification ...................................................................................85

v

4.1.1 Verification on simulated videos .....................................................................86 4.1.2 Verification using stereoscopic view ...............................................................95 4.1.3 Back Projection Verification .........................................................................100 4.1.4 Verification based on known trajectory ........................................................104

4.2 Verification of 6DOF algorithm for rectangular patterned surfaces ......................108 4.2.1 Results of tilt removal on rectangular tiled floor ...........................................109 4.2.2 Verification based on stereoscopic view .......................................................110 4.2.3 Verification based on back projection ...........................................................115 4.2.4 Verification based on a known trajectory ......................................................117

4.3 Verification of 6DOF algorithm for camera directed at any planar surface ..........120 4.3.1 Verification using stereoscopic view .............................................................120 4.3.2 Verification based on known trajectory ........................................................125 4.3.3 Comparison of trajectory estimation on patterned and concrete surfaces .....127

CONCLUSIONS AND FUTURE WORK ........................................132 CHAPTER FIVE:5.1 Conclusions ............................................................................................................132 5.2 Future Work ...........................................................................................................135

REFERENCES ................................................................................................................137

vi

List of Tables

Table 2.1 Definitions to understand the geometric model ............................................................ 12

Table 3.1 Intrinsic parameters of the Bumblebee stereo camera .................................................. 43

Table 4.1 RMS variations in trajectories of two cameras moving together estimated using the 4DOF algorithm .................................................................................................................... 97

Table 4.2 RMS variations in azimuthal rotations of two cameras moving together estimated using the 4DOF algorithm .................................................................................................... 98

Table 4.3 RMS variations in trajectories of left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm .................................................................................. 100

Table 4.4 RMS variations in azimuthal rotations of left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm ...................................................................... 100

Table 4.5 RMS errors in camera trajectory obtained for a circular motion using the 4DOF algorithm ............................................................................................................................. 107

Table 4.6 RMS errors in azimuthal rotation obtained for circular motion using the 4DOF algorithm ............................................................................................................................. 108

Table 4.7 RMS variations in trajectories of cameras moving together obtained using the 6DOF algorithm for rectangular patterns ............................................................................ 111

Table 4.8 RMS variations in azimuthal rotations of cameras moving together obtained using the 6DOF algorithm for rectangular patterns ...................................................................... 112

Table 4.9 RMS variations in trajectories of the left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns ................................. 113

Table 4.10 RMS variations in azimuthal rotations of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns ................................. 114

Table 4.11 RMS errors in trajectory obtained for a circular motion using the 6DOF algorithm for rectangular patterns ....................................................................................................... 119

Table 4.12 RMS errors in the azimuthal rotation obtained for a circular motion using the 6DOF algorithm for rectangular patterns ............................................................................ 120

Table 4.13 RMS variations in long range trajectories of the two sensors of the Bumblebee camera obtained using the 6DOF algorithm ....................................................................... 121

Table 4.14 RMS variations in long range azimuthal rotations of two sensors of the Bumblebee camera obtained using the 6DOF algorithm .................................................... 122

vii

Table 4.15 RMS variations in the trajectories of the sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................. 123

Table 4.16 RMS variations in azimuthal rotations of sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................. 125

Table 4.17 RMS errors in trajectory obtained for a circular motion using the 6DOF algorithm 127

Table 4.18 RMS errors in azimuthal rotation obtained for a circular motion using the 6DOF algorithm ............................................................................................................................. 127

Table 4.19 RMS errors in trajectory obtained for circular motion on a patterned surface using the 6DOF algorithm ............................................................................................................ 129

Table 4.20 RMS errors of trajectory obtained for circular motion on a concrete surface using the 6DOF algorithm ............................................................................................................ 129

Table 4.21 RMS errors in azimuthal rotation obtained for circular motion on a patterned surface using the 6DOF algorithm ...................................................................................... 131

Table 4.22 RMS errors of azimuthal rotation obtained for circular motion on a concrete surface using the 6DOF algorithm ...................................................................................... 131

viii

List of Figures and Illustrations

Figure 1.1 Multipath scenario in urban canyon .............................................................................. 2

Figure 1.2 Positioning based on triangulation ................................................................................ 3

Figure 1.3 Examples of planar surfaces .......................................................................................... 5

Figure 1.4 Examples of patterned surfaces ..................................................................................... 6

Figure 2.1 Imaging model for pinhole camera [22] ...................................................................... 13

Figure 2.2 Frontal imaging model for pinhole camera ................................................................. 14

Figure 2.3 Projection of 3D point on the camera image plane ..................................................... 15

Figure 2.4 An example of an affine transformed image ............................................................... 16

Figure 2.5 Affine transformation .................................................................................................. 17

Figure 2.6 An example of perspective transformation .................................................................. 19

Figure 2.7 Illustration of suitable and unsuitable feature points ................................................... 22

Figure 2.8 Wedge corners deviating from 90o providing low quality feature points ................... 23

Figure 2.9 Poor quality feature points at circular arcs .................................................................. 23

Figure 2.10 Corner feature points ................................................................................................. 25

Figure 2.11 Derivative images for corner features ....................................................................... 26

Figure 2.12 Plot of the larger eigenvalues of Q for 90! features ................................................ 26

Figure 2.13 Plot of the smaller eigenvalues of Q for 90! features .............................................. 27

Figure 2.14 Corner detection of simple geometric shapes ............................................................ 28

Figure 2.15 Side view of Gaussian pulse at time t and t+dt ......................................................... 32

Figure 2.16 Top view of Gaussian pulse at time t and t+dt .......................................................... 32

Figure 2.17 Spatial derivative of the Gaussian pulse in x and y directions .................................. 33

Figure 2.18 Time derivative of the Gaussian pulse ...................................................................... 33

Figure 2.19 Pyramid structure of images in Lucas Kanade Pyramid algorithm ........................... 34

ix

Figure 2.20 Plot of Gaussian pulses at different levels of pyramid .............................................. 35

Figure 2.21 Contour of Gaussian pulse at the second level of pyramid ....................................... 36

Figure 3.1 Effects of radial distortion ........................................................................................... 41

Figure 3.2 Images of different orientations of checkerboard captured using a camera ................ 42

Figure 3.3 Undistortion of the image of a tiled floor .................................................................... 43

Figure 3.4 Kernel based image processing ................................................................................... 44

Figure 3.5 Plot of Gaussian filter kernel ....................................................................................... 46

Figure 3.6 Results of Gaussian filtering ....................................................................................... 47

Figure 3.7 Result of Canny edge detection ................................................................................... 50

Figure 3.8 Results of thresholding applied to an image. ............................................................... 51

Figure 3.9 Binary thresholding applied to a tiled surface ............................................................. 52

Figure 3.10 Parameters of a line ................................................................................................... 53

Figure 3.11 Plot of lines passing through a point ......................................................................... 53

Figure 3.12 Probability mapping of points in the image for line detection .................................. 54

Figure 3.13 Hough lines on an image of rectangle ....................................................................... 55

Figure 3.14 Line detection on a patterned surface ........................................................................ 55

Figure 3.15 Results of GF2T on concrete and tiled surfaces ........................................................ 56

Figure 3.16 Result of feature detection on a tiled floor based on GF2T and Hough lines ........... 57

Figure 3.17 Two-way optical flow ................................................................................................ 58

Figure 3.18 Translation error induced by tilts in the camera ........................................................ 68

Figure 3.19 Images of a tiled floor with tilt-free and tilted camera .............................................. 68

Figure 3.20 Possible grid selected on the tiled floor image .......................................................... 69

Figure 3.21 Feature points at the corners of the selection ............................................................ 69

Figure 3.22 Mapping from a tilted image to tilt-compensated image ........................................... 70

x

Figure 3.23 Example of grid shifting using the tilt compensation algorithm ............................... 73

Figure 3.24 Flow chart of the proposed 6DOF egomotion algorithm for rectangular patterned surface ................................................................................................................................... 74

Figure 4.1 Bumblebee stereo camera ............................................................................................ 84

Figure 4.2 First and last frame of uniformly translating rectangle ............................................... 86

Figure 4.3 Camera translations for the simulated case of uniformly translating rectangle .......... 87

Figure 4.4 Plot of estimated trajectory as compared to the true trajectory for the simulation of uniformly translating rectangle ............................................................................................. 88

Figure 4.5 Azimuthal rotation for the simulated case of uniformly translating rectangle ............ 89

Figure 4.6 Height of camera from planar surface for the simulated case of uniformly translating rectangle .............................................................................................................. 89

Figure 4.7 Frames of the simulated video of a uniformly rotating rectangle ............................... 90

Figure 4.8 Trajectory of the camera for the simulation of uniformly rotating rectangle .............. 90

Figure 4.9 Azimuthal rotation of the camera for the simulation of uniformly rotating rectangle ................................................................................................................................ 91

Figure 4.10 Frames of the simulated video of uniformly rotating rectangle ................................ 92

Figure 4.11 Trajectory for the simulation of camera when changing height ................................ 92

Figure 4.12 Plot of azimuthal rotation for the simulation of camera when changing height ....... 93

Figure 4.13 Plot of height estimate for the simulation of camera when changing height ............ 93

Figure 4.14 Comparison of trajectory estimation using Least Squares and Kalman filtering ...... 95

Figure 4.15 Setup of two cameras moving together on a floor surface ........................................ 96

Figure 4.16 Trajectories of two cameras moving together estimated using the 4DOF algorithm ............................................................................................................................... 97

Figure 4.17 Azimuthal rotations of two cameras moving together estimated using the 4DOF algorithm ............................................................................................................................... 98

Figure 4.18 Trajectories of the left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm .................................................................................................... 99

xi

Figure 4.19 Azimuthal rotations of the left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm .................................................................................... 99

Figure 4.20 Scattered plot of back projection errors obtained for the iPhone camera using the 4DOF algorithm .................................................................................................................. 102

Figure 4.21 Scattered plot of back projection errors obtained for the Nexus camera using the 4DOF algorithm .................................................................................................................. 102

Figure 4.22 Scattered plot of back projection errors obtained for the Bumblebee camera using the 4DOF algorithm ............................................................................................................ 103

Figure 4.23 Scattered plot of back projection errors obtained for the Bumblebee camera based on the Kalman filter estimation of 4DOF algorithm ................................................. 104

Figure 4.24 Turntable used to move the camera in a circular motion ........................................ 105

Figure 4.25 Newmark RT-5 motorized rotatory stage ................................................................ 105

Figure 4.26 Newmark NSC-1 motion controller ........................................................................ 105

Figure 4.27 iPhone camera mounted on the turntable ................................................................ 106

Figure 4.28 Camera trajectory obtained for a circular motion using the 4DOF algorithm ........ 107

Figure 4.29 Azimuthal rotation obtained for circular camera motion using 4DOF algorithm ... 108

Figure 4.30 Result of tilt removal algorithm .............................................................................. 109

Figure 4.31 Result of tilt removal algorithm .............................................................................. 110

Figure 4.32 Setup of cameras mounted at different tilt angles on the cart ................................. 110

Figure 4.33 Trajectories of two cameras moving together obtained using the 6DOF algorithm for rectangular patterns ....................................................................................................... 111

Figure 4.34 Azimuthal rotations of two cameras moving together obtained using the 6DOF algorithm for rectangular patterns ....................................................................................... 112

Figure 4.35 Trajectories of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns ............................................................................ 113

Figure 4.36 Azimuthal rotations of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns ............................................................ 114

Figure 4.37 Scattered plot of back projection errors obtained for the iPhone camera using the 6DOF algorithm for rectangular patterns ............................................................................ 115

xii

Figure 4.38 Scattered plot of back projection errors obtained for the Nexus camera using the 6DOF algorithm for rectangular patterns ............................................................................ 116

Figure 4.39 Scattered plot of back projection errors obtained for the Bumblebee camera using the 6DOF algorithm for rectangular patterns ...................................................................... 116

Figure 4.40 Scattered plot of back projection errors for the Bumblebee camera considering 6DOF estimation using Kalman filtering ............................................................................ 117

Figure 4.41 Camera mounted on the turntable at a certain tilt .................................................... 118

Figure 4.42 Camera trajectory obtained for a circular motion using the 6DOF algorithm for rectangular patterns ............................................................................................................. 118

Figure 4.43 Azimuthal rotation obtained for a circular motion using the 6DOF algorithm for rectangular patterns ............................................................................................................. 119

Figure 4.44 Trajectory for long range motion of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................................. 121

Figure 4.45 Azimuthal rotation for long range motion of stereoscopic camera obtained using the 6DOF algorithm ............................................................................................................ 122

Figure 4.46 Trajectories of the two sensors of stereoscopic camera obtained using the 6DOF algorithm ............................................................................................................................. 123

Figure 4.47 Tilts in x direction for the sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................................. 124

Figure 4.48 Tilts in y direction for the sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................................. 124

Figure 4.49 Azimuthal rotations of the sensors of stereoscopic camera obtained using the 6DOF algorithm .................................................................................................................. 125

Figure 4.50 Camera trajectory obtained for a circular motion using the 6DOF algorithm ........ 126

Figure 4.51 Azimuthal rotation obtained for a circular camera motion using the 6DOF algorithm ............................................................................................................................. 126

Figure 4.52: Trajectory obtained for circular motion on a patterned surface using the 6DOF algorithm ............................................................................................................................. 128

Figure 4.53 Trajectory obtained for circular motion on a concrete surface using the 6DOF algorithm ............................................................................................................................. 128

xiii

Figure 4.54 Azimuthal rotation obtained for circular motion on a patterned surface using the 6DOF algorithm .................................................................................................................. 130

Figure 4.55 Azimuthal rotation obtained for circular motion on a concrete surface using the 6DOF algorithm .................................................................................................................. 130

Figure 5.1 Example of planar structure not repeating its pattern ................................................ 136

xiv

List of Symbols, Abbreviations and Nomenclature

AOA Angle of Arrival AGPS Assisted Global Positioning System CV Computer Vision DOF Degrees of Freedom FOV Field of View GF2T Good Features to Track GNSS Global Navigation Satellite System GPS Global Positioning System LOS MMSE RANSAC

Line-of-Sight Minimum Mean Square Error Random Sample Consensus

RFID Radio Frequency Identification RMS Root Mean Square RSS Received Signal Strength SNR Signal-to-Noise Ratio SLAM Simultaneous Localization and Mapping SVD Singular Value Decomposition TDOA Time Difference of Arrival TOA Time of Arrival UHF Ultra High Frequency UWB Ultra Wideband WLAN Wireless Local Area Network

1

Introduction Chapter One:

1.1 Introduction to Navigation

Navigation has become an integral part of our lives. Information about our current location, time

to a destination, possible routes to a destination provided by navigation services save

considerable amount of time and effort in our busy schedules. There are several available Global

Navigation Satellite System (GNSS) based technologies that use wireless signalling to perform

navigation. For instance, the Global Positioning System (GPS) is widely used to provide

location-based services to the users all across the globe. The European GALILEO and Russian

GLONASS are also becoming functional to provide these services. While these technologies are

effective for positioning in outdoor environments, their performance is quite unsatisfactory when

used for indoors. Indoor environments require sub meter level accuracy for many positioning

applications to be practical. Being subject to low signal-to-noise ratio (SNR) and multipath

distortions, wireless signals are not able to meet these requirements [1].

It takes several seconds for a standard GPS receiver to acquire the satellites. In addition to that,

the initial acquisition requires clear view of the sky and high signal strength [2]. In indoor

environments and urban canyons, where the view of the sky is not clear and the strength of the

received signal is weak, the acquisition time is extended significantly. Poor signal strength not

only increases the time to acquire the satellite but also makes it difficult to decode the navigation

data from the satellite [3].

The direct unobstructed signals from the satellite are referred as the Line-of-sight (LOS) signals.

Due to the free space loss, LOS signals already have very low SNR [1]. Multipath in outdoor

environments results in reflected signals that are weaker than the LOS signals. However in

indoor environments and urban canyons, the reflected signals may be stronger than the LOS

2

signals. In such environments, in order to acquire the weak LOS signals, it is important to

remove the strong reflected signals first [3]. Figure 1.1 shows the multipath scenario in urban

canyons. Orange lines represent the line of sight (LOS) signals while the green lines represent

the reflected signals.

Figure 1.1 Multipath scenario in urban canyon

In standard GPS receivers, no prior information about the satellite is available at the GPS

receiver at the time of acquisition. The advanced technology to overcome the conventional GPS

problems of excessive acquisition time and ineffective performance in weak SNR conditions is

assisted GPS (AGPS) [1-5]. In AGPS, wireless network of the handset provides information

about the GPS signal that the handset will receive [6]. It not only reduces the acquisition time,

but also enables the detection of signals having low SNR. However, in order to be able to use the

GPS Receiver

3

services of AGPS, it is important that the GPS device is always connected to a cellular network

[7].

1.2 Existing Indoor Navigation Techniques

There are various indoor positioning techniques available, for instance, triangulation, scene

analysis (location fingerprinting) and proximity devices [8]. These techniques use the existing

wireless technologies like Wireless Local Area Networks (WLAN), Radio Frequency

Identification (RFID), Ultra Wideband (UWB), Bluetooth and Ultra High Frequency (UHF) for

positioning. In triangulation, the distance from three different access points is used to estimate

the 2D position of the navigating object, as shown in Figure 1.2. Distance measurements can be

based on various metrics like time of arrival (TOA), time difference of arrival (TDOA), angle of

arrival (AOA) and received signal strength (RSS) [8-9].

Figure 1.2 Positioning based on triangulation

Scene analysis, also known as location fingerprinting, is based on RSS measurements and is

divided into two stages: offline stage and online stage. In the offline stage, the features or the

fingerprints of the surrounding are gathered based on the RSS measurements from the access

AN2

AN1

AN3

Estimated Location

4

nodes. In the online stage, the collected fingerprints are matched with a priori fingerprints to find

the estimated location [8][10].

Proximity algorithm looks for the antenna in whose vicinity the navigating object lies. If there

are more than one such antenna, it looks for the one with the strongest signal strength. Based on

this information, it provides the relative position of the navigating object with respect to the

antenna.

Although these techniques provide solution to the indoor navigation problem, they are more

suitable for open areas. Moreover, they require positioning infrastructure that are much more

expensive and time-consuming to deploy. Thus, the demand for high accuracy indoor positioning

requires highly accurate trajectory estimation, which had led to the development of new

algorithms for indoor positioning. Due to their high performance and low cost, computer vision

(CV) based sensors have gained enormous interest in the field of indoor navigation.

1.3 Integration with Computer Vision

CV based sensors and algorithms provide highly accurate trajectory estimation in the presence of

anchor node based landmark features. When such features are unavailable, stationary features of

opportunity are used in Simultaneous Localization and Mapping (SLAM) to support trajectory

estimation. While CV observables provide high accuracy for short trajectories, the estimation

drifts as the distance increases. As a result, CV observables complement the GNSS based

observations, which are more accurate for large distances. Hence, any smart phone having both a

camera and GPS allows for the integration of CV based trajectory estimation and GNSS based

navigation.

There are many techniques available that use cameras to estimate the egomotion or the self-

motion of the camera. As an example, [11-12] present a method to estimate the egomotion by

5

tracking the lines in the images. [13-14] perform feature correspondence and random sample

consensus (RANSAC) based egomotion. An algorithm to find the camera motion based on scale

invariant image features is presented in [15], while [16] makes use of stereo-vision with an

iterative closest point scheme to perform egomotion. [17] presents a method to estimate the

trajectory utilizing the structure of the environment.

This thesis presents a robust 6 degrees of freedom (DOF) ego-motion estimation of a camera,

directed to a planar surface, for instance floor, ceiling or wall surface. The surface may be a

concrete surface or a highly patterned surface, as shown in Figure 1.3. It will be shown that while

capturing a concrete unpatterned surface, only random features of opportunity are available. As a

result, the estimation of trajectory drifts over distance. While if the camera is facing a patterned

surface, then it can utilize the additional information regarding the structure of the patterned

surface to ameliorate the performance of trajectory estimation algorithm.

(a) Concrete floor with random features of opportunity

(b) Tiled floor with patterned features of opportunity

Figure 1.3 Examples of planar surfaces

6

Any well-defined patterns on floors, ceilings or walls, which provide more structural

information, may be used to improve the accuracy of trajectory estimation. Concrete random

points result in potential difficulties in detecting feature point correspondences while in patterned

surfaces, the points of the intersecting grout lines provide much higher SNR, resulting in higher

trajectory estimation accuracy. Some other examples of patterns on the planar surfaces are shown

in Figure 1.4.

Figure 1.4 Examples of patterned surfaces

7

A camera directed at a planar surface can undergo perspective transformation, that is, it might

undergo rotation and translation. 6DOF estimation deals with the estimation of the translations

and rotations along the three perpendicular axes. The rotation in the xy plane is referred to as the

azimuthal rotation, while the rotations in the yz and xz plane are referred as the tilts, since they

are the result of the tilting of camera about its optical axis.

During its motion, the camera picks up some features of opportunity on the planar surface and

the perspective transformation of the features determines the motion of the camera. Detection of

tilts in the camera from the estimated rotation matrix is generally an ill posed CV problem. A

camera generally has a MEMS sensor accelerometer and rate gyrometer that could be used to

estimate the tilts in the camera. However, a complication here is that the accelerometer is unable

to distinguish between gravity vector and receiver acceleration relative to an inertial frame. The

proposed CV algorithm also determines the tilts in the camera, with reference to the planar

surface, using the homography of the features of opportunity. The structure of the patterned

surfaces enhances the SNR such that robust tilt detection is possible. The novelty of this research

lies in the use of patterned surfaces to improve the accuracy of trajectory estimation using CV

observables.

1.4 Objectives

This primary hypothesis of this research is that we can perform accurate 6DOF trajectory

estimation with minimal processing effort if the observed features are planar and if they have

some good structure. In order to address this hypothesis, the major objectives of this research can

be summarized as follows:

1. Extraction of rotation and translation vectors from the perspective transformation that the

features of opportunity undergo in the consecutive frames. The estimation of rotation and

8

translation from the transformation will be based on Least Squares [18] or Kalman Filter [19-

20] estimations, depending upon the information about the motion that is available to us.

2. Comparison of Least Squares and Kalman Filter estimation of the camera motion.

3. Estimation and compensation of tilts in the camera from the transformation of features of

opportunity.

4. Estimation of 6 DOF trajectory of the camera from the motion of observed planar features of

opportunity in the consecutive frames.

5. Verifying if the structure of the surface improves the trajectory estimation by providing a

comparison of trajectory estimation in case of concrete and regular patterned surfaces.

1.5 Contributions

A novel 6DOF algorithm, which is partitioned into 2DOF and 4DOF estimations, is proposed for

rectangular patterned surfaces. The 2DOF algorithm estimates and compensates the absolute tilts

in the camera while the 4DOF algorithm determines the camera translations and azimuthal

rotation. A general 6DOF algorithm that determines the differential translations, azimuthal

rotation and tilts in the camera facing any planar surface is presented. This algorithm is also

partitioned into a sequence of 2DOF and 4DOF estimations. The 2DOF estimation compensates

the differential tilts while the 4DOF estimation determines the relative rotation and translations

between two camera positions. It is shown that the use of patterns on planar surfaces can

improve the performance of trajectory estimation of the camera. Drifts in the estimated trajectory

can be reduced in case of patterned surfaces as opposed to concrete surfaces. 6 DOF egomotion

of the camera can be determined from the motion of the feature points on the planar surfaces. A

paper titled “Indoor Navigation based on Computer Vision utilizing Information from Patterned

9

Surfaces” based on this concept was presented and will be published in the proceedings of the

ION (Institute of Navigation) GNSS+ conference held in September 2014 in Tampa, Florida.

1.6 Organization

In this chapter, we have discussed the basic concepts of navigation and problems with using

wireless signalling for indoor navigation. Some techniques used for indoor navigation are

discussed and positioning using CV observables is introduced. The rest of the thesis is organized

as follows:

• In Chapter 2, the necessary definitions and algorithms to understand the proposed algorithm

for 6DOF camera egomotion are developed. The pinhole model of the camera for image

formation is explained. An introduction to the different transformations that the images can

undergo is provided, followed by the concept of feature extraction and correspondence.

• Chapter 3 explains the proposed algorithm for trajectory estimation. The process of camera

calibration using a chessboard pattern and image pre-processing techniques are explained in

this chapter. A robust 4DOF algorithm for extraction of motion based on Least Squares and

Kalman filtering is provided. A 6DOF algorithm based on 2DOF tilt compensation and

4DOF trajectory estimation is proposed for rectangular patterned surfaces. Finally, a 6DOF

egomotion estimation algorithm based on the motion of features of opportunities is provided

for any general planar surface.

• Chapter 4 provides the experimental verification of the proposed algorithm. Various different

experiments are performed to verify the proposed algorithms. The algorithms are verified

using the stereo camera. Then, a back projection method is used to verify the algorithms.

Finally, the comparison of estimated trajectory is provided against the true trajectory. Results

of trajectory estimation using Least Squares are compared against those using Kalman

10

filtering, depending on the information available regarding the motion of the camera. A

comparison is provided for trajectory estimation on a patterned and concrete surface.

• Finally chapter 5 concludes the thesis and provides some suggestions for future work.

11

Background Chapter Two:

Navigation finds applications in various indoor facilities like airports, hospitals, shopping centers

and malls. Various GNSS based technologies, being subject to multipath distortions and low

SNR, are unsuitable for use in indoor environments. Hence, in this research, we use cameras to

perform navigation in indoor environments. A camera observes the motion of objects that fall in

its field of view (FOV) and based on that information, it estimates its own motion, known as the

ego-motion, with respect to the surroundings.

In order to understand the ego-motion of the camera, it is important to understand the concept of

reference frames used to reach the trajectory estimation algorithm. An object in the FOV of the

camera is mapped from the world reference frame onto the camera reference frame, which is

further mapped onto the image plane of the camera to obtain the image of the object. Based on

the mapping of object obtained on the camera image plane, transformation of an object in two

consecutive frames is determined. The translation and rotation of the object in two consecutive

frames is obtained from this transformation. The estimated translation and rotation vectors are

hence used to determine the motion of the camera in two consecutive frames.

This chapter introduces some necessary definitions important to understand the proposed

algorithm. Firstly, the geometric model is introduced and types of transformations are defined.

Determination of transformation between two consecutive frames of the video sequence is done

on the basis of motion of some points, known as feature points, on the image frame. So, after

explaining the transformations, the concept of feature points extraction is explained and an

algorithm to obtain the feature points is introduced. The final section presents an algorithm to

find the correspondence of feature points in two consecutive frames, so that they can be used to

obtain the underlying transformation.

12

2.1 The Geometric Model

We will consider a pinhole camera model to understand the geometry behind image formation

[21]. An image is the representation of visual perception on a two-dimensional matrix. Hence,

image formation involves the mapping of a 3D point in the world frame onto the 2D image frame

of the camera. To understand this mapping, we will first consider the projection of the point from

the world frame to the 3D camera frame, which will then be mapped onto the 2D camera image

plane. Some definitions and notations to understand the concept are given in Table 2.1.

Table 2.1 Definitions to understand the geometric model

OC Camera origin in world reference frame OW World origin in world reference frame

{XW ,YW ,ZW } Directional unit vector of world reference frame

{XC ,YC ,ZC} Directional unit vector of camera reference frame {x, y} Unit vector of camera image plane

P Source point in generic coordinate system (world or camera)

PW Position vector from world origin to P

PC Position vector from camera origin to P

A left hand coordinate system is used for both camera and world coordinates.

Source point, P, is referenced in the world coordinate frame as PW , denoted in vector form as

PW =xwywzw

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.1)

such that PW = xwXW + ywYW + zwZW .

Similarly, P is referenced as PC in the camera coordinate frame, where

13

PC =xcyczc

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(2.2)

such that PC = xcXC + ycYC + zcZC .

Figure 2.1 shows the imaging model for a pinhole camera. The image of a point, P, is formed

where the ray passing through P and the camera optical center intersects the image plane [22].

Note that the distance of the image plane from the camera optical center is referred to as the focal

length, denoted by f.

Figure 2.1 Imaging model for pinhole camera [22]

Based on this pinhole model, using similar triangles, we obtain

x = − f xczc

(2.3)

y = − f yczc

(2.4)

XC

OC ZC

YC

y

x P

Image of P

f

Image plane

14

The negative signs indicate that the image appears to be upside down in the image plane. This

effect can be overcome by placing the image plane in front of the optical axis, as shown in

Figure 2.2.

Figure 2.2 Frontal imaging model for pinhole camera

Using this updated model, we obtain

x = f xczc

(2.5)

y = f yczc

(2.6)

The coordinates {xc , yc , zc} are the homogeneous projection coordinates while {x, y} are the

non-homogeneous coordinates.

The overall geometric model considering a pinhole camera is shown in Figure 2.3. The 3D

mapping from the world frame to the camera frame is based on rotation and translation operators,

represented by R and T, respectively. Directions of both R and T can be referenced with respect

to world or camera frames.

y

x

XC

OC

ZC

YC

Image of P f

P

Image plane

15

Figure 2.3 Projection of 3D point on the camera image plane

Considering translation followed by rotation and reference to the world coordinate frame, the

translation vector is given as

T =OC −OW (2.7)

and the rotation matrix is given by the projection of unit vectors of {XW ,YW ,ZW } onto

{XC ,YC ,ZC} as

R =XW ⋅XC YW ⋅XC ZW ⋅XC

XW ⋅YC YW ⋅YC ZW ⋅YCXW ⋅ZC YW ⋅ZC ZW ⋅ZC

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.8)

Based on these operators, the projective transformation from world reference frame to the

camera reference frame is given as [23]

PC = R(PW −T ) (2.9)

The non-homogeneous coordinates of projection can be obtained using equations (2.5) and (2.6).

P

PC PW

YW

ZW

XW

OW

XC

ZC

YC

OC x

y

Image of P on the image plane

16

It is important to note that the above mentioned projective transformation relation was obtained

considering that the world coordinate frame is first translated and then rotated to obtain the

camera coordinate frame. Had it been the other way around, the transformation will change

accordingly.

2.2 Transformations

By undergoing scaling, skewing, rotation and translation, an image is transformed or mapped

onto another image. While undergoing transformation, if an arbitrary parallelogram is mapped

onto another parallelogram, then the transformation is said to be an affine transformation. The

mapping of any quadrilateral to any other quadrilateral is referred to as the perspective

transformation. So, it can be said that any affine transformation is a perspective transformation

but not every perspective transformation is an affine transformation.

2.2.1 Affine Transformation

An affine transformation between a set of images is the result of image translation, scaling or

azimuthal rotation. An example of an affine transformed image resulting from translation,

azimuthal rotation and scaling is shown in Figure 2.4.

Figure 2.4 An example of an affine transformed image

17

Affine transformations can be visualized as a parallelogram ABCD in plane mapped onto

another parallelogram PQRS , as shown in Figure 2.5.

Figure 2.5 Affine transformation

Let{(xA , yA ),(xB , yB ),(xC , yC ),(xD , yD )} represent the vertices of parallelogram ABCD and

{(xP , yP ),(xQ , yQ ),(xR , yR ),(xS , yS )} represent the vertices of parallelogram PQRS. The affine

transformation between corresponding set of points (xA , yA} and (xP , yP} is given by the

following equation:

xPyP

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

a b c

d e f

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

xAyA1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.10)

The matrix with variables {a,b,c,d,e, f } is termed as the affine transformation matrix between

the two images, which is to be determined here.

Similarly, writing the affine transformation for other corresponding set of points on the two

parallelograms and rearranging the equations, we obtain the following equations for determining

the affine transformation variables:

B

A

C D

Image 1

P Q

R

S

Image 2

18

xPxQxRxS

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

=

xA yA 1

xB yB 1

xC yC 1

xD yD 1

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

abc

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.11)

yPyQyRyS

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

=

xA yA 1

xB yB 1

xC yC 1

xD yD 1

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

def

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.12)

Since in both the above equations, we have 3 variables that we need to determine, hence we need

only 3 out of 4 corners of the parallelogram. Considering the first three set of points, we obtain

xPxQxR

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

=xA yA 1xB yB 1xC yC 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

abc

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.13)

yPyQyR

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

=xA yA 1xB yB 1xc yC 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

def

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.14)

By solving (2.13) and (2.14), we obtain the affine transformation between the two images.

2.2.2 Perspective Transformation

Unlike affine transformations, perspective transformations are much more complicated as they

map any quadrilateral onto any other quadrilateral with arbitrary scaling, rotation, translation and

skewing. This is because it considers the tilting of camera while capturing the images, which is

19

not considered in affine transforms. An example of images that underwent perspective

transformation is shown in Figure 2.6.

Figure 2.6 An example of perspective transformation

Considering the rotation matrix, R, to be represented as a matrix of row vectors as

R =R1R2R3

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.15)

we can write equation (2.9) as

PC =R1R2R3

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥[PW −T ] (2.16)

By substituting PC and PW , equation (2.16) can be written as

xcyczc

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥=

R1R2R3

−R1T−R2T−R3T

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

xwywzw1

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

(2.17)

20

For a planar surface, like a floor, ceiling or wall, we have zw = 0 .

Hence, we can write

xcyczc

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥= H

xwyw1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.18)

where H is a perspective transformation matrix, which can be defined as

H =H11 H12 H13

H21 H22 H23

H 31 H 32 H 33

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(2.19)

The objective here is to determine the elements of H. Using the pinhole camera model, from

(2.5) and (2.6), we have

x = f xczc

= f H11xw + H12yw + H13

H 31xw + H 32yw + H 33

(2.20)

y = f yczc

= f H21xw + H22yw + H23

H 31xw + H 32yw + H 33

(2.21)

On rearranging, we obtain

H 31xxw + H 32xyw + H 33x = H11 fxw + H12 fyw + H13 f (2.22)

H 31yxw + H 32yyw + H 33y = H21 fxw + H22 fyw + H23 f (2.23)

This can be expressed as

uxb = 0 (2.24)

uyb = 0 (2.25)

where

b = H11 H12 H13 H21 H22 H23 H 31 H 32 H 33⎡⎣

⎤⎦T

(2.26)

21

ux = − fxw − fyw − f 0 0 0 xxw xyw x⎡⎣

⎤⎦ (2.27)

uy = 0 0 0 − fxw − fyw − f yxw yyw y⎡⎣

⎤⎦ (2.28)

For a set of 4 points on the quadrilateral, we have 8 equations, which can be written as

ux1

uy1

!ux4

uy4

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

b =

00!00

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

(2.29)

Defining

U =

ux1

uy1

!ux4

uy4

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

(2.30)

we have Ub = 08 .

While we have 8 constraints here, the number of variables in b is 9. So, the homogenous solution

of (2.29) can be obtained by performing the singular value decomposition (SVD) of U. The right

singular vector corresponding to the singular value of zero will correspond to the values of b. It

is important to note that the singular vector is arbitrarily scalable, which means that H is obtained

to within a scaling factor. Hence, a set of 4 feature points is sufficient to determine the

perspective warping matrix. However, the feature point correspondences are noisy and distorted.

Hence, we usually need additional feature points to determine the warping matrix.

22

2.3 Feature points

A feature point is a small sub-region of image intensity field with a structure that makes it

definable in two orthogonal directions such that it is suitable for tracking from one frame to the

next. For instance, an edge between the corners is ambiguous in one of the directions. Hence, it

is not suitable for use as a feature point. Similarly, a sub-region homogeneous in both the

directions contains no useful feature points. However a corner point is clearly a suitable feature

point since the displacement in two orthogonal directions can easily be resolved using a corner

point. Figure 2.7 provides a detailed illustration of suitable and unsuitable feature points.

Figure 2.7 Illustration of suitable and unsuitable feature points

Deviation of corners from 90! feature results in the feature point quality degradation. Hence the

feature points at corners deviating from 90! , as shown in Figure 2.8, will perform worse than

90! corners.

Points at circular arcs, for instance circles or ellipses, provide poor quality feature points since

the gradient along the tangent is very low, hence they are ambiguous in one direction as shown in

Figure 2.9.

Suitable feature point resolvable in both the directions

Unsuitable feature point ambiguous in both directions

Unsuitable feature point ambiguous in one direction

23

Figure 2.8 Wedge corners deviating from 90o providing low quality feature points

Figure 2.9 Poor quality feature points at circular arcs

For an image with intensity function I(x, y) , let Ix (x, y) and Iy(x, y) represent the partial

derivatives of the intensity function along x and y directions respectively, given as

Ix (x, y) =∂I(x, y)∂x

(2.31)

Iy(x, y) =∂I(x, y)∂y

(2.32)

Now consider a small sub-region window W of the image. The partial derivatives are found

within the boundaries of the window for each position of window on the image. If Ix (x, y) and

Iy(x, y) are small throughout W, it indicates that the intensity function is featureless in the sub-

region. A second possibility is that | Ix (x, y) | and | Iy(x, y) | are moderately high, indicating the

Feature points weaker than a 90° corner

Ambiguity in the direction of tangent

24

presence of a potential feature. However, a high correlation between Ix (x, y) and Iy(x, y) imply

single dimensionality, hence the presence of an edge, which is not suitable for tracking. The third

possible situation is that Ix (x, y) and Iy(x, y) are reasonably high and not highly correlated,

indicating the presence of a suitable two-dimensional feature.

Hence, the presence of a suitable feature point is dependent on the covariance of the random

functions Ix (x, y) and Iy(x, y) within the sub-region window. Assuming that the window W has

Nx pixels in the x direction and Ny pixels in the y direction, let us define the functions Ixx , Ixy

and Iyy as follows:

Ixx = Ew[Ix (x, y)2 ]= 1

NxNy

Ix (i, j)2

j=0

Ny−1

∑i=0

Nx−1

∑ (2.33)

Ixy = Ew[Ix (x, y)Iy(x, y)]=1

NxNy

Ix (i, j)Iy(i, j)j=0

Ny−1

∑i=0

Nx−1

∑ (2.34)

Iyy = Ew[Iy(x, y)2 ]= 1

NxNy

Iy(i, j)2

j=0

Ny−1

∑i=0

Nx−1

∑ (2.35)

Here W has pixels indexed as 0 ≤ i < Nx and 0 ≤ j < Ny .

The covariance of the intensity gradient functions Ix (x, y) and Iy(x, y) averaged over W results

in the Q matrix given as

Q =Ixx IxyIxy Iyy

⎡

⎣⎢⎢

⎤

⎦⎥⎥

(2.36)

Next, we determine the eigenvalues of Q denoted as λ1 and λ2 such that λ1 < λ2 , both of which

are real and positive since Q is symmetric. One high eigenvalue indicates single dimensionality,

25

hence the presence of an edge which is unsuitable for use as a feature point. However, if both the

eigenvalues are high then it indicates the presence of a usable feature point within the window.

Both low eigenvalues are indicative of a featureless region.

Based on the eigenvalues of the Q matrix, there are various criteria for feature point detection

[24-25]. In this research, we have used Shi and Tomasi’s good features to track (GF2T) [26] for

feature detection. It is based on the criteria that the smaller eigenvalue greater than a minimum

threshold is indicative of the presence of a good feature point.

2.3.1 Examples of feature detection

We consider simple 90! corners, as shown in Figure 2.10, for corner detection based on the

eigenvalues of Q .

(a) Side view of features

(b) Top view of features

Figure 2.10 Corner feature points

Considering the partial derivatives of the intensity field given as

Ix (x, y) = I(x +1, y)− I(x −1, y) (2.37)

Iy(x, y) = I(x, y +1)− I(x, y −1) (2.38)

26

the derivative images are shown in Figure 2.11.

(a) Derivative in x

(b) Derivative in y

Figure 2.11 Derivative images for corner features

The Q matrix is obtained using a 3× 3 window and its eigenvalues are calculated. Figure 2.12

and Figure 2.13 show the plots for the larger and smaller eigenvalues of Q respectively, which

will be used to find the corner points of the given intensity field.

Figure 2.12 Plot of the larger eigenvalues of Q for 90! features

27

Figure 2.13 Plot of the smaller eigenvalues of Q for 90! features

From Figure 2.12 and Figure 2.13, it can be seen that at the edges of the rectangle, one of the

eigenvalues is large while the other one is very small because of high correlation between the

directional derivatives. Hence, as per the feature point detection algorithm, the edges cannot be

used as suitable features. It can be observed from these figures that it is only at the corners of the

rectangle that both the eigenvalues of Q are large. As a result, corners will be detected as quality

feature points.

Next, we show some examples of feature detection based on the GF2T routine of openCV.

Figure 2.14 shows some geometric shapes for which the routine was applied to obtain the feature

points. As can be seen from the figure, rectangle, star and line segment provide correct corner or

feature detection. However, the corners for ellipse, circle and rectangle with rounded corners are

not accurately identified.

For the shapes for which the corners are not properly identified, it is an indication that the feature

points are poor quality, hence not suitable for tracking purpose.

28

Figure 2.14 Corner detection of simple geometric shapes

2.4 Optical Flow

Having obtained the feature points in an image, the next step is to determine the correspondence

of feature points in the two images, that is, how feature points in one image relate to those in

another image. The purpose of finding this correspondence is to obtain the motion of an object

through a set of video frames, also known as optical flow.

Optical flow algorithms assume that the feature points are almost time invariant and follow the

conditions for accurate optical flow estimation as follows [27]:

1. Brightness Consistency: It means that the brightness of pixels remains consistence

between two consecutive frames.

2. Temporal Persistence: The feature points move in very small increments between two

consecutive frames of the video.

3. Spatial Coherence: Neighbouring points on the image belong to the same surface and

have similar motion.

29

To understand the optical flow of feature points in two consecutive frames, we first consider one

dimensional intensity field, given as I(x, t) at time t . From the condition of time invariance, we

obtain

dI(x,t)dt

= ∂I∂x

⋅ dxdt

+ ∂I∂t

= 0 (2.39)

From here we obtain the flow velocity as

vx =dxdt

= −

∂I∂t∂I∂x

= − ItIx

(2.40)

Now consider a general two-dimensional intensity function at time t given as I(x, y,t) . By

taking the total derivative, we obtain

∂I∂x

⋅ dxdt

+ ∂I∂y

⋅ dydt

+ ∂I∂t

= 0 (2.41)

Ixvx + Iyvy + It = 0 (2.42)

where vx and vy are the velocities of optical flow in x and y directions respectively. Here we

have two unknowns, which cannot be resolved using one space-time observation. Hence we need

at least two linearly independent equations to solve for vx and vy . Typically, a window is used in

the neighbourhood of the feature, resulting in a set of over-determined equations for the

estimation of vx and vy . The best fit for vx and vy is obtained by using Least Squares

estimation.

Equation (2.42) can be written as

Ix Iy⎡⎣

⎤⎦

vxvy

⎡

⎣⎢⎢

⎤

⎦⎥⎥= − It[ ] (2.43)

30

For k points in the window, indexed such that k ∈ 1,2,...,K[ ] , we have

Ix,1Ix,2!Ix,K

Iy,1Iy,2!Iy,K

⎡

⎣

⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥

vxvy

⎡

⎣⎢⎢

⎤

⎦⎥⎥= −

It,1It,2!It,K

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

(2.44)

The second subscript in above equation indicates the pixel position in the window where spatial

and time derivatives are taken. Now, let

A =

Ix,1Ix,2!Ix,K

Iy,1Iy,2!Iy,K

⎡

⎣

⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥

(2.45)

M = −

It,1It,2!It,K

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

(2.46)

and

P =vxvy

⎡

⎣⎢⎢

⎤

⎦⎥⎥

(2.47)

where A is called the design matrix, M the measurement matrix and P is called the parameter

matrix.

The square error for the form M = AP is given as

e = (M − AP)T (M − AP) (2.48)

31

For least squares estimation of the parameter matrix, we want to minimize the square error, e,

which can be done as

∂e∂P

= 0 (2.49)

Hence we obtain

∂e∂P

= ∂∂P(M − AP)T (M − AP) = −(M − AP)T A = 0 (2.50)

−MTA + PT AT A = 0

PT AT A = MTA

Multiplying by (AT A)−1 , we obtain

PT = MTA(AT A)−1

P = (AT A)−1(ATM ) (2.51)

Hence, we obtain the optical flow in x and y directions for a specified sub-region by using Least

Squares estimation.

As an example, consider a Gaussian pulse propagating in time with a velocity equivalent to 3

points in x direction and 2 points in y direction per 0.05 sec. The plot for the Gaussian pulse at

time t and t+dt is shown in Figure 2.15. Figure 2.16 shows the top view of the two pulses. The

spatial derivatives of the pulse in x and y directions are shown in Figure 2.17. Figure 2.18 shows

the time derivative of the pulse.

Using the time and spatial derivatives of the Gaussian pulse and using least squares, we obtain

the optical flow velocity in x and y directions to be 2.9 and 1.9 respectively, which are close to

the actual velocities of 3 and 2 in x and y, respectively.

32

(a) Pulse at time t

(b) Pulse at time t+dt

Figure 2.15 Side view of Gaussian pulse at time t and t+dt

(a) Pulse at time t

(b) Pulse at time t+dt

Figure 2.16 Top view of Gaussian pulse at time t and t+dt

33

(a) Derivative in x

(b) Derivative in y

Figure 2.17 Spatial derivative of the Gaussian pulse in x and y directions

Figure 2.18 Time derivative of the Gaussian pulse

Based on this concept, there are various algorithms to find the optical flow of an image. In this

research, we have employed the Lucas Kanade Pyramid (LKP) [28][29] algorithm to determine

the optical flow. This algorithm uses image pyramid to establish the correspondence of feature

points in a set of images. The initial images are first smoothened and decimated to obtain smaller

images. Further smoothing and decimation is applied to the obtained images such that we obtain

34

a pyramid of images, as shown in Figure 2.19. The correspondence is first determined at the

topmost level of the pyramid. The next level is now shifted by the displacement vector obtained

in the first level and again correspondence is established. The displacement at the third level is

determined by the sum of displacements at the first two levels and the process is repeated until

the bottom level is reached.

Figure 2.19 Pyramid structure of images in Lucas Kanade Pyramid algorithm

2.4.1 Example of optical flow using Lucas Kanade Pyramid

We reconsider the example of a Gaussian pulse propagating in time. Let the optical flow velocity

in x be equivalent to 1 point and that in y be equivalent to 2 points. We consider a 4 level Lucas

Pyramid of first image Pyramid of second image

Find correspondence

Warp and upsample

Find correspondence

…

35

Kanade Pyramid to find the optical flow for this pulse. Figure 2.20 shows the original pulse at

the bottommost level and decimated pulse at the topmost level of a level 4 pyramid.

(a) Pulse at bottommost level

(b) Decimated pulse at the topmost level

Figure 2.20 Plot of Gaussian pulses at different levels of pyramid

The optical flow velocity is calculated between the pulses at time t and t+dt in their topmost

level. The velocity at this level for this example is found to be 0.4 and 0.9 in x and y direction

respectively, that is vx = 0.4 and vy = 0.9 . Now, the next level of pyramid at time t+dt is shifted

by displacement vector of vxdt in x direction and vydt in y direction and compared to the second

level at time t. Figure 2.21 shows a close view of the pulse contour at time t and shifted pulse at

time t+dt in the second level.

Similarly, the pulse at the third level will be shifted by a value equal to the cumulative

displacement vector of the first two levels. This process is repeated until we reach the

bottommost level. In this case, the velocity calculated at the final level is equal to vx = 1 and

vy = 2 , which corresponds to the actual optical flow velocity.

36

(a) Pulse at time t

(b) Pulse at time t+dt, shifted by the displacement vector

Figure 2.21 Contour of Gaussian pulse at the second level of pyramid

It is important to note that in Lucas Kanade Pyramid algorithm, for correct estimate of the flow,

the displacement between two frames should be less than the minimum distance between two

feature points. Also, as stated earlier, the brightness of the pixels should not change between two

consecutive frames.

37

Proposed Algorithm Chapter Three:

Estimation of the 6DOF egomotion of a camera consists of estimating the translation and rotation

along the three perpendicular coordinate axes. The sequence of camera images is taken as the

input and a set of corresponding feature points are obtained on the image frames. Based on the

transformation of feature points from frame to frame, the underlying trajectory of the camera is

estimated. Before performing the trajectory estimation based on the motion of feature points, the

camera is calibrated in order to compensate for the lens distortion and to establish the camera

intrinsic matrix, which specifies the scaling factor and optical center of the camera. After

calibrating the camera, the images captured by the camera are pre-processed to remove noise and

to prepare the images in a form they can be used for better extraction of features. Finally, based

on the transformation of feature points from one frame to the other, different algorithms to

estimate the trajectory of the camera are proposed.

3.1 Camera Calibration

In chapter 2, the concept of mapping of a point from the world reference frame to the camera

image plane was introduced. The mapping deals with the estimation of only the extrinsic

parameters of the camera involving the rotation matrix and the translation vector. However, there

are intrinsic parameters of the camera, particularly the focal length and the principal point, and

distortion parameters that need to be determined. The intrinsic and distortion parameters of the

camera remain constant with time. Hence, they are estimated prior to using a camera for the

estimation of extrinsic parameters. The determination of intrinsic and distortion parameters is

done using camera calibration.

38

3.1.1 Intrinsic Camera Parameters

As shown in chapter 2, with the assumption of a pinhole camera model, any 3D point in plane in

the FOV of camera can be projected onto a 2D point on the image plane of the camera. For a

point P, the relationship between the point observed in the camera reference frame and that

observed in the camera image plane is given by equations (2.5) and (2.6), which can be re-

written as

xy

⎡

⎣⎢⎢

⎤

⎦⎥⎥= fzc

xcyc

⎡

⎣⎢⎢

⎤

⎦⎥⎥

(3.1)

This can be represented as

zcxy1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥=

f 0 00 f 00 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

xcyczc

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.2)

It is important to note that when we do the transformation from a world coordinate frame to the

camera coordinate frame, the parameters are in term of metric units, that is, millimeters. The

metric unit needs to be scaled to pixels. Let sx and sy be the scale of this conversion. Hence,

(3.2) can be changed to

zcxy1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥=

sx 0 00 sy 0

0 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

f 0 00 f 00 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

xcyczc

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.3)

All the pixels in an image are specified with respect to the top left corner, indicated as (0, 0)

whereas (x, y) are still specified with respect to the principle point, that is the point where the

camera optical axis intersects with the image plane [22]. Hence, two new parameters, ox and oy

39

are introduced to shift the origin of image reference frame from the principle axis to the top left

corner.

x ' = x + oxy ' = y + oy

(3.4)

Also, we replace zc by an arbitrary scalar λ . Hence, we obtain

λx 'y '1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥=

sx f 0 ox0 sy f oy0 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

xcyczc

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.5)

We can write sx f = fx and sy f = fy to obtain

λx 'y '1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥=

fx 0 ox0 fy oy0 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

xcyczc

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.6)

The above 3× 3 matrix is called the Intrinsic Camera Matrix.

Hence, from (2.18) and (3.6), the homography can be represented as [27]

λx 'y '1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥= H intH

xwywzw1

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

(3.7)

where H int is the intrinsic camera matrix and H is the extrinsic matrix.

Hence, determination of the intrinsic camera matrix using camera calibration is the estimation of

the parameters fx, fy,ox,oy{ } .

40

3.1.2 Distortion Parameters

Due to manufacturing defects, there are various kinds of distortions in a camera lens. The two

main kinds of distortions in the lens are radial distortions and tangential distortions. These

distortions can be represented by approximate interpolative models, which make these distortions

invertible.

Radial distortion refers to the image deformation along the radial direction from a point known

as the center of distortion [30]. It causes inward or outward bulging of the image, resulting in

pincushion or barrel effect, as shown in Figure 3.1. Radial distortion is not seen at the image

center but it increases as we move away from the center [27].

Let (xd, yd ) represent the coordinates of distorted image point and (xu, yu ) represent the

coordinates of corresponding undistorted or corrected image point. The scaling of the points

between distorted and undistorted images is given by the radial distortion model equation

described in [31] as

xu = xd (1+ k1rd

2 + k2rd4 )

yu = yd (1+ k1rd2 + k2rd

4 ) (3.8)

where k1 and k2 are the radial distortion coefficients and rd refers to the radial distance given as

rd = xd2 + yd

2 (3.9)

41

(a) Original Image

(b) Barrel distortion

(c) Pincushion distortion

Figure 3.1 Effects of radial distortion

The second common form of distortion is the tangential distortion, which is caused by

manufacturing defects resulting in the lens not being parallel to the image plane. The model for

tangential distortion is given by [27]

xu = xd + 2p1yd + p2 rd

2 + 2xd2( )⎡⎣ ⎤⎦

yu = yd + p1 rd2 + 2yd

2( )+ 2p2xd⎡⎣ ⎤⎦ (3.10)

42

where p1 and p2 are the tangential distortion coefficients.

Hence, the estimation of distortion coefficients using camera calibration is the estimation of the

parameters k1,k2, p1, p2{ } .

3.1.3 Calibration and distortion mitigation

We have employed the 2D plane based calibration described in [32] to estimate the intrinsic and

distortion parameters of the camera. In this calibration, we have used the checkerboard pattern as

the 2D planar surface, which is viewed at different orientations, as shown in Figure 3.2 as an

example. The images of the checkerboard at various orientations are taken using the camera to be

calibrated.

Figure 3.2 Images of different orientations of checkerboard captured using a camera

The main purpose of estimating the distortion coefficients is to be able to invert the distortion in

an image to compensate for the effects of radial and tangential distortions. Based on the code

available in [33], the camera calibration was performed using the images of a checkerboard

pattern captured by a Bumblebee stereo camera. As a result of calibration, the obtained intrinsic

parameters of this camera are shown in Table 3.1. Using the models for radial and tangential

distortions mentioned above, the un-distortion of a sequence of images captured by this camera

43

was performed. The result of the un-distortion performed on the image of a tiled floor is shown

in Figure 3.3.

Table 3.1 Intrinsic parameters of the Bumblebee stereo camera

Parameter Value (pixels) fx 1976.9 fy 1957.5 ox 308.6 oy 145.6

(a) Original captured image

(b) Image obtained after applying lens distortion compensation

Figure 3.3 Undistortion of the image of a tiled floor

3.2 Image Pre-processing

Pre-processing refers to the process required to prepare an image so that it can be used for further

analysis of feature detection, correspondence and trajectory estimation [34]. Pre-processing is

done using kernel based operations, which are accomplished using 2D correlation operations.

The source image is acted upon by a rectangle shaped kernel operator to obtain the destination

image. For each window position, the content of the kernel is correlated with the image content

44

within the boundaries of the window and the result is stored at the image pixel point coincident

with the kernel point denoted as the anchor point. For example, as shown in Figure 3.4, for a

square shaped kernel with the anchor point at the center, the result of correlation operator will be

stored at the pixel in the destination image which is coincident with the center of the kernel.

Figure 3.4 Kernel based image processing

Consider a kernel H with width of the kernel window nx and height ny . Let a and b represent

the anchor point index of the kernel in x and y directions respectively and h(i, j) be the value

of kernel at pixel point (i, j) . The kernel is represented in the matrix form as

H =

h(0, 0) h(1, 0) ! h(nx −1,0)h(0,1) h(1,1) ! h(nx −1,1)! ! " !

h(0,ny −1) h(1,ny −1) ! h(nx −1,ny −1)

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

(3.11)

Let Isrc (i, j) represents the pixel intensity of source image at (i, j) and Idst (i, j) represents the

intensity of destination image at pixel location (i, j) .

Anchor point

Kernel with anchor point at the center

Result of this convolution stored at this pixel point

45

The expression for a 2D correlation of the source image with the kernel is given as

Idst (n,m) = h(i, j)Isrc (n + i − a,m + j − b)j=0

ny−1

∑i=0

nx−1

∑ (3.12)

An immediate problem with sliding correlation is that there are times when the indices of source

image move out of the support domain. In such cases, the approach that we follow is the

extension of background intensity values to the outside boundary of the image such that the

correlation is defined for all indices of the source image. For (Nx,Ny ) being the size of source

image, the background extension can be expressed numerically as

Isrc (i, j) = Isrc (0, j) for i < 0

Isrc (i, j) = Isrc (Nx, j) for i ≥ Nx (3.13)

Isrc (i, j) = Isrc (i, 0) for j < 0

Isrc (i, j) = Isrc (i,Ny ) for j ≥ Ny

The problem with any boundary extension is that we are creating a feature that is not a part of the

original image. These artifacts are minimized by taking care not to infer image features at

boundary for eventual egomotion observables.

3.2.1 Gaussian Smoothing

Smoothing is a low pass filtering operation that is used to supresses the higher spatial

frequencies of the image that are generally corrupted by noise. Smoothing is also used to

decrease the image resolution, as in case of Lucas Kanade Pyramid [27]. There are various

smoothing operations available for blurring of the image by low pass filtering, for instance

simple blur, Gaussian blur, median blur, etc. In this research, we have used Gaussian blur for

image smoothing. Gaussian blurring or filtering is done by applying a Gaussian kernel to the

46

source array to obtain the result, which is stored in the destination array. The anchor point in the

case of Gaussian smoothing is always the center point of the 2D kernel. It weighs the source

image based on a weighted average where the center pixel has higher weight and the weight

decreases away from the center according to the Gaussian distribution. The 2D Gaussian kernel

can be represented mathematically as

h(i, j) = C exp −i − (nx −1) / 2( )2

2σ x2

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪exp −

j − (ny −1) / 2( )22σ y

2

⎧⎨⎪

⎩⎪

⎫⎬⎪

⎭⎪ (3.14)

where σ x and σ y represent the standard deviation of the Gaussian kernel in the x and y

directions, respectively. A plot of the Gaussian smoothing kernel is shown in Figure 3.5.

Figure 3.5 Plot of Gaussian filter kernel

Figure 3.6 shows some results of Gaussian smoothing with different sizes of the window applied

to an image. It is important to be careful with the parameters of Gaussian smoothing, since

besides removing noise from the image, it also gives blurry definition to the edges.

47

(a)

(b)

(c)

(d)

Figure 3.6 Results of Gaussian filtering (a) shows the original image. (b), (c) and (d) show the results of Gaussian filtering applied with the filter kernel of size 13x13, 17x17 and

21x21 respectively.

3.2.2 Edge Detection

In this research, particularly in case of patterned surfaces, significant amount of information is

extracted from the edges of the image. So, edge detection is an important step in image pre-

processing to gather information about the structure of the surface. Among the various available

edge detection operators, like the Sobel edge detector [35], Canny edge detector [36], Roberts

48

Cross operator [37] and Prewitt operator [38], the most common edge detector is the Canny edge

detection, which is the routine we have employed in this research. The reason for using Canny

edge detector is that it is computationally very efficient and generally provides satisfactory

results in the experimental work. It is dependent on the kernel-based gradients of the image.

Hence, before going to the Canny edge detection, we will throw some light on the kernel based

spatial derivative operations.

If we take the derivative of an image in a particular direction, we obtain its edges in one

particular direction. For instance, by taking the derivative in the y direction, we obtain all the

horizontal edges of the image. Similarly, the derivative in x direction provides us with the edges

in the vertical direction.

The expressions for a one-dimensional derivative of a discrete signal are generalized in

following different ways:

Forward differencing:

v(n) ≈ du(n)dn

= u(n +1)− u(n) (3.15)

Backward differencing:

v(n) ≈ du(n)dn

= u(n)− u(n −1) (3.16)

Central differencing:

v(n) ≈ du(n)dn

= u(n +1)− u(n −1)2

(3.17)

Forward differencing advances the derivative and backward differencing delays the derivative.

However, the central differencing provides an unbiased estimate of the derivative. Hence, based

49

on the central differencing, the kernel for gradient in x to obtain the derivative image, Ix , is

given as − 12

0 12

⎡

⎣⎢

⎤

⎦⎥ with the anchor point at the center.

Similarly, the kernel for gradient in y to obtain the derivative image, Iy , is given by

− 12012

⎡

⎣

⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥

with the anchor point at the center.

Based on these spatial derivatives, the Canny edge detector computes which is given as

Ie = Ix2 + Iy

2 (3.18)

Ie is then compared against two thresholds, λhigh and λlow . Depending on how Ie at a particular

pixel compares to these thresholds, the corresponding binary output is assigned at that particular

pixel.

Idst (i, j) =1 if Ie(i, j) > λhigh

Idst (i, j) = 0 if Ie(i, j) < λlow

Idst (i, j) =1 if λlow < Ie(i, j) < λhigh and one of the neighbours is higher than λhigh .

An example of Canny edge detection is shown in Figure 3.7.

Ie

50

(a) Original image

(b) Result of Canny edge detector applied to it

Figure 3.7 Result of Canny edge detection

3.2.3 Thresholding

Thresholding refers to the technique of segmenting an image to extract the desired clutter in an

image such that it is separated from undesirable pixels. It is done in a way that each pixel is

either accepted or rejected depending on whether it fall above a predefined threshold or below it.

The purpose of doing thresholding is to segment an image such that the pixels associated with

the object in the image appear above a threshold while the background appears below the

threshold. There are a various methods that use image histograms to estimate the threshold of an

image [39-41].

Given the image and threshold value, thresholding can be performed in various ways like binary

thresholding, binary inverse thresholding, threshold to zero, threshold to zero inverse and

threshold truncate. In binary thresholding, the pixels with intensity above the threshold are

assigned the maximum value while those below it are assigned zero. Binary inverse thresholding

works in a reverse fashion where the intensities above the threshsold are assigned zero and those

51

below it are assigned the maximum value. In threshold to zero, the pixels with intensity values

above the threshold remain unchanged while those below it are assigned a zero value and vice-

versa in threshold to zero inverse. Finally, in threshold truncate, the pixel intensities above the

threshold get the maximum value while those below it remain unchanged. Figure 3.8 shows the

results of various kinds of thresholding operations applied to an image.

(a) Original Image

(b) Binary Thresholding

(c) Binary Inverse Thresholding

(d) Threshold to Zero

(e) Threshold to Zero Inverse

(f) Threshold Truncate

Figure 3.8 Results of thresholding applied to an image.

In this research, we have used binary thresholding to highlight the structure of a patterned

surface for which the structure is not clearly defined, so that the information from the pattern can

be used to improve the accuracy of the algorithm. An example is shown in Figure 3.9. However,

thresholding itself is not reliable in isolating the lines in the image, since its performance is

dependent on a variety of factors like brightness and contrast of the tiles, etc.

52

(a) Original image

(b) Thresholded image

Figure 3.9 Binary thresholding applied to a tiled surface

3.3 Hough Lines

For a patterned surface, it is necessary to find the position of lines of the pattern so that extra

information can be gathered from it. This is accomplished using Hough lines [42]. The Hough

lines algorithm is a standard algorithm that detects the lines in an image by exploiting the

parameters of the line [43]. Rather than using the slope-intercept parameters, it uses the angle-

radius parameters, which makes the computation simple [44].

Consider any line that is extended into an infinite line in 2D plane. It can be represented by 2

variables, ρ and θ , where ρ is the perpendicular distance from the origin to the line and θ is

the angle of the line, as shown in Figure 3.10.

53

Figure 3.10 Parameters of a line

In an edge detected image, for each point in the intensity field, the contour of all potential {ρ,θ}

combinations that the point can belong to is plotted. Figure 3.11 shows how a point may belong

to multiple lines. In the contour, each line is weighted by the intensity of the point. The plot is

then thresholded to find the {ρ,θ} of the line segment. The peak is a monotonic function of the

line length and the intensity.

Figure 3.11 Plot of lines passing through a point

ρ

θ

x

y

x

y

Point

54

For example, consider a line on an edge detected image as shown in Figure 3.12(a). For each

point that is lit up, a contour of the possible {ρ,θ} combinations is plotted. For instance, Figure

3.12(b) shows some example contours of four points on the line. The final probability map will

have peaks, which will correspond to the more likely line segments. Hence, we can extract the

line segments in the image by proper thresholding.

Figure 3.13 shows an example of lines detected in the image based on the Hough lines transform.

It is important to note that the lines can be detected either using edge detected or thresholded

image. Figure 3.14 shows the result of line detection on a patterned surface resulting from low

pass filtering, edge detection and then the Hough transform.

(a) A line in an edge detected image. 4 points are selected on the line to show

the contour mapping on {ρ,θ} plot.

(b) Mapping of contours of possible {ρ,θ}

combinations for each point in the edge detected image to find the peak.

Figure 3.12 Probability mapping of points in the image for line detection

1

2

3

4

55

Figure 3.13 Hough lines on an image of rectangle

(a) Original image

(b) Lines detected in the original image

Figure 3.14 Line detection on a patterned surface

3.4 Proposed 4DOF egomotion algorithm

Before considering the camera undergoing full perspective transformation, we will consider the

case where the camera is capturing the planar surface such that the optical axis of the camera is

always perpendicular to the planar surface. That is, the camera can undergo translation in x and y

56

directions, change in height (translation in z) and azimuthal rotation, but it cannot undergo tilts in

x and y. Hence, the motion of the camera will exhibit 4DOF.

The proposed algorithm to determine the 4DOF camera motion takes a pair of consecutive

images captured by the camera as input. A certain sequence of steps is carried out on each pair to

estimate the relative transformation between them. In each pair of consecutive images, we will

denote the image at time t-1 as prior image and the one at time t as post image.

To implement egomotion, the calibrated consecutive images are first gray-scaled and Gaussian

smoothing is applied to them. The next step is to extract the rotation and translation of the

camera based on the feature points observed. As mentioned in chapter 2, we have used Shi and

Tomasi’s GF2T [26] to detect the features of the prior image. Figure 3.15 shows the results of

the GF2T feature detector on a concrete and tiled floor surface.

(a) Features on a concrete floor

(b) Features on a tiled floor

Figure 3.15 Results of GF2T on concrete and tiled surfaces

For a patterned surface, we have used Hough lines along with GF2T to obtain the feature points

at the intersection of lines. The lines in the patterned surface add extra constraints, which helps to

eliminate outlier feature points that are not a part of the pattern and hence obtain a rich set of

57

feature points. The result of feature detection based on GF2T and Hough lines for a tiled floor

surface is shown in Figure 3.16.

(a) Original image

(b) Feature detection on original image

Figure 3.16 Result of feature detection on a tiled floor based on GF2T and Hough lines

Having obtained the feature points in the prior image, we need to find the corresponding set of

feature points in the post image, which is done based on the optical flow algorithm employed by

Lucas Kanade Pyramid (LKP) [29]. We have employed a two-way optical flow algorithm [45] to

find the correspondence of feature points. In a two-way optical flow, the correspondence is first

established in the forward direction, that is, from prior image to post image. Then with the

corresponding points obtained in the post image, the correspondence is established in the

backward direction, that is, from post image to prior image. The correspondences that do not

match in the two directions are discarded as being outliers. Let xa, ya( ) represent the feature

points in the prior image obtained using GF2T and xb, yb( ) represent the corresponding feature

points in the post image obtained using LKP. Then using the points obtained in the post image,

we apply LKP in a backward direction and obtain the corresponding feature points in the prior

58

image. Let xc, yc( ) represent the feature points in the prior image obtained as a result of

backward optical flow. Only those points will be retained that satisfy the following equation

[45]:

xA − xC( )2 + yA − yC( )2 <σ 2 (3.19)

The rest of the points will be discarded. As given by [45], we have taken σ = 0.2 pixels to

implement a two-way optical flow, since it gives reasonable results for practical

implementations. Figure 3.17 shows an example of a two-way optical flow. The green donuts

represent the points that were retained by the two-way optical flow while the red donuts denote

the points that were discarded as a result of the two-way optical flow since the correspondences

in two directions did not agree.

(a) Original image in prior frame

(b) Result of two way optical flow with post frame

Figure 3.17 Two-way optical flow

Depending on how the feature points have moved from one frame to the next, the underlying

transformation can be calculated. We have considered Least Squares [46] and Kalman filter [46-

47] estimation of the transformation matrix. The differential motion of the camera between two

59

frames can be estimated based on this transformation matrix. When the motion of the camera and

the covariance matrix of the measurement of feature points is unknown, then Least Squares

estimation is used for the transformation matrix. Best Linear Unbiased estimator [46] is used

when the covariance matrix of measurements is known. However, when both the covariance of

the measurements as well as the dynamic model of the motion of camera are known, Kalman

filtering is used as the motion estimator.

At time t, let the translation in x and y directions between the world and camera frame of

reference be denoted by Txt,Ty

t( ) . Let the perpendicular distance from the optical center to the

planar surface be denoted as ht and the counterclockwise azimuthal rotation be represented as

azt . Let the kth feature point at time t be denoted as xw,kt , yw,k

t( ) in the world reference frame,

xc,kt , yc,k

t , zc,kt( ) in the camera reference frame and xk

t , ykt( ) in the camera image plane. Based on

(2.9) we obtain

xc,kt

yc,kt

zc,kt

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

=cos(azt ) sin(azt ) 0−sin(azt ) cos(azt ) 0

0 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

xw,kt

yw,kt

0

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

−Txt

Tyt

−ht

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

⎛

⎝

⎜⎜⎜⎜

⎞

⎠

⎟⎟⎟⎟

(3.20)

From here we have zc,kt = ht . From (2.5) and (2.6), we obtain

xkt

ykt

⎡

⎣⎢⎢

⎤

⎦⎥⎥= fht

cos(azt ) sin(azt ) −Txt '

−sin(azt ) cos(azt ) −Tyt '

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

xw,kt

yw,kt

1

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.21)

where

Txt ' = cos(azt )Tx

t + sin(azt )Tyt

Tyt ' = −sin(azt )Tx

t + cos(azt )Tyt (3.22)

60

An important assumption made here is that the feature points in the world frame remain

stationary in time. Hence, we can remove the time index t from the world coordinates to obtain

xkt

ykt

⎡

⎣⎢⎢

⎤

⎦⎥⎥= fht

cos(azt ) sin(azt ) −Txt '

−sin(azt ) cos(azt ) −Tyt '

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

xw,kyw,k1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.23)

Similarly, if Txt−1,Ty

t−1( ) represents the translation between the world and camera reference frame

at time t-1, ht−1 represents the perpendicular distance from the optical center to the planar

surface and azt−1 represents the counter clockwise azimuthal rotation between the two coordinate

frames, the affine transformation between them is given as

xkt−1

ykt−1

⎡

⎣⎢⎢

⎤

⎦⎥⎥= fht−1

cos(azt−1) sin(azt−1) −Txt−1'

−sin(azt−1) cos(azt−1) −Tyt−1'

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

xw,kyw,k1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.24)

From (3.23) and (3.24), we obtain the transformation between feature points in two consecutive

frames as

xkt

ykt

⎡

⎣⎢⎢

⎤

⎦⎥⎥= h

t−1

htcos(Δazt ) sin(Δazt ) −ΔTx

t

−sin(Δazt ) cos(Δazt ) −ΔTyt

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

xkt−1

ykt−1

1

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.25)

where

Δazt = azt − azt−1

ΔTxt = cos(azt )(Tx

t −Txt−1)+ sin(azt )(Ty

t −Tyt−1)

ΔTyt = −sin(azt )(Tx

t −Txt−1)+ cos(azt )(Ty

t −Tyt−1)

(3.26)

Note that (3.25) can be viewed as the following relationship between the feature points in two

frames.

61

p2 = R(p1 −T ) (3.27)

where p2 refers to the feature points in the post frame and p1 refers to the feature points in the

prior frame. R and T denote the relative rotation matrix and relative translation vector

respectively.

For the sake of simplicity, let us write

ht−1htcos(Δazt ) = cΔ

t

ht−1htsin(Δazt ) = sΔ

t

ht−1ht

ΔTxt = ΔTx

t '

ht−1ht

ΔTyt = ΔTy

t '

(3.28)

Hence, equation (3.25) becomes

xkt

ykt

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

cΔt sΔ

t −ΔTxt '

−sΔt cΔ

t −ΔTyt '

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

xkt−1

ykt−1

1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.29)

This equation can be solved for the parameters {cΔt , sΔ

t ,ΔTxt ',ΔTy

t '} using either Least Squares or

Kalman filter.

3.4.1 Least Squares estimation

A Least Squares estimator provides an estimate of the differential rotation and translation of the

feature points from one frame to the next without assuming any dynamic model. (3.29) can be

rearranged in Least Squares notation as

62

xkt

ykt

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

xkt−1 yk

t−1 −1 0

ykt−1 −xk

t−1 0 −1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

cΔt

sΔt

ΔTxt '

ΔTyt '

⎡

⎣

⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥

(3.30)

For K number of quality feature points, we obtain an over-determined set of constraints for the

parameter vector {cΔt , sΔ

t ,ΔTxt ' ,ΔTy

t '} . These set of constraints give us the M = AP form, which

is solvable using Least Squares. Here M represents the measurement vector, A the system matrix

and P the measurement matrix. Hence, for K feature points, we have (3.30) as

x1t

y1t

!xKt

yKt

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

M"#$ %$

=

x1t−1 y1

t−1 −1 0

y1t−1 −x1

t−1 0 −1!

xKt−1 yK

t−1 −1 0

yKt−1 −xK

t−1 0 −1

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

A" #$$$$ %$$$$

cΔt

sΔt

Txt '

Tyt '

⎡

⎣

⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥

P"#$ %$

(3.31)

The Least Squares solution for this form is given by [46]

P = (ATA)−1ATM (3.32)

It can be noted here that we are solving for 4 parameters {cΔt , sΔ

t ,ΔTxt ' ,ΔTy

t '} instead of 3

parameters {Δazt,ΔTxt ',ΔTy

t '} . The former is preferred as the Least Squares formulation is linear

and can be solved directly. This estimation will provide us with only three parameters of camera

motion, that is translations in x and y and azimuthal rotation. The fourth parameter, that is the

height of the camera from the planar surface, will be estimated by imposing the constraint

between sin(Δazt ) and cos(Δazt ) , as will be seen shortly.

63

3.4.2 Kalman Filter estimation

Kalman filter estimation assumes a statistical update model. It is used when we have a specified

covariance matrix. For rest of the cases, the Least Squares algorithm is used. For solving (3.29)

using Kalman filter, the state vector, St , at time t is given as

St =

cΔt

sΔt

Txt '

Tyt '

⎡

⎣

⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥

(3.33)

The observation vector, Xt , and the measurement matrix, Ht are given as

Xt =

x1t

y1t

!xKt

yKt

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

(3.34)

Ht =

x1t−1 y1

t−1 −1 0

y1t−1 −x1

t−1 0 −1!

xKt−1 yK

t−1 −1 0

yKt−1 −xK

t−1 0 −1

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

(3.35)

Let A be the transition matrix and Q and C be the process and measurement noise covariances

respectively. Let S∧

t|t−1 be the estimate of state vector at time t based on previous measurements

and S∧

t|t be the updated estimate after the correction. Let Mt|t−1 represents the predicted Minimum

Mean Square Error (MMSE) and Mt|t represents the corrected MMSE. Let Kt represents the

64

Kalman gain at tth time instant. The state vector at a particular time instant can be solved using

the following recursive process of Kalman Filter stated in [46-47].

Prediction:

S∧

t|t−1 = AS∧

t−1|t−1 (3.36)

Prediction MMSE:

Mt|t−1 = AMt−1|t−1AT +Q (3.37)

Kalman Gain:

Kt = Mt|t−1HtT (Ct + HtMt|t−1Ht

T )−1 (3.38)

Correction:

S∧

t|t = S∧

t|t−1+ Kt (Xt −Ht S∧

t|t−1) (3.39)

Corrected MMSE:

Mt|t = 1−KtHt( )Mt|t−1 (3.40)

The viability of using Kalman filtering in this context depends on having a reasonably accurate

state update model, A, and good approximations of the covariance matrices Q and C. Q is based

on the known statistics of the statistical camera trajectory. C can be estimated from the variance

of feature points in the image, which depends on the quality of the camera as well as lighting and

contrast of tiles.

We note here that from equation (3.29), we have only two observables per time step per feature

point but there are a number of feature points observed in each frame. However, if any frame has

too few feature points available, then Kalman filter estimation can even be made with fewer

observations but Least Squares will not be able to estimate the parameters.

65

3.4.3 Estimation of camera motion from the transformation matrix

Having obtained the parameter vector, {cΔt , sΔ

t ,ΔTxt ' ,ΔTy

t '} from Least Squares or Kalman

filtering, the overall 4DOF egomotion of the camera can be obtained. Let az,t represent the total

counter clockwise azimuthal rotation that the camera has undergone at time t starting from its

initial position.

az,t = az,t−1 + tan−1 sΔ

t

cΔt

⎛⎝⎜

⎞⎠⎟

(3.41)

Let xa,t and ya,t represent the translations of the camera from the first frame in x and y

directions at time t. They are given by

xa,tya,t

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

xa,t−1ya,t−1

⎡

⎣⎢⎢

⎤

⎦⎥⎥+

Txt −Tx

t−1

Tyt −Ty

t−1

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥

(3.42)

From (3.26), we have

ΔTx

t

ΔTyt

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥=

cos(azt ) sin(azt )−sin(azt ) cos(azt )

⎡

⎣⎢⎢

⎤

⎦⎥⎥

Txt −Tx

t−1

Tyt −Ty

t−1

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥

(3.43)

This can be written as

Tx

t −Txt−1

Tyt −Ty

t−1

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥=

cos(azt ) −sin(azt )sin(azt ) cos(azt )

⎡

⎣⎢⎢

⎤

⎦⎥⎥

ΔTxt

ΔTyt

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥

(3.44)

Also, we have ΔTxt = ht

ht−1ΔTx

t ' and ΔTyt = ht

ht−1ΔTy

t ' . Substituting it, we obtain

Tx

t −Txt−1

Tyt −Ty

t−1

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥= htht−1


⎡

⎣⎢⎢

⎤

⎦⎥⎥

ΔTxt '

ΔTyt '

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥

(3.45)

Hence, from (3.42) and (3.45), we obtain the translations of the camera as

66

xa,tya,t

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

xa,t−1ya,t−1

⎡

⎣⎢⎢

⎤

⎦⎥⎥+ htht−1


⎡

⎣⎢⎢

⎤

⎦⎥⎥

ΔTxt '

ΔTyt '

⎡

⎣

⎢⎢

⎤

⎦

⎥⎥

(3.46)

To obtain the height, ht , of the camera at time t, we consider

(cΔt )2 + (sΔ

t )2 = ht−1ht

⎛⎝⎜

⎞⎠⎟

2

cos(azt )2 + sin(azt )2( ) (3.47)

Thus,

(cΔ

t )2 + (sΔt )2 = ht−1

ht

⎛⎝⎜

⎞⎠⎟

2

ht−1ht

= (cΔt )2 + (sΔ

t )2 (3.48)

Hence the height of the camera from the planar surface is given as

ht =ht−1

(cΔt )2 + (sΔ

t )2 (3.49)

3.5 Proposed 6DOF algorithm for rectangular patterned surface

The 6DOF egomotion algorithm proposed in this section works for cameras directed at the

rectangular patterned surfaces moving in any random fashion and exploits the grid of grout lines

associated with the pattern. The egomotion is with respect to the camera center. Desired

parameter are the differential translation vectors, as these are the parameters that ultimately

define the trajectory. The tilt angles and the azimuthal angles are the nuisance parameters, but

they need to be removed for proper trajectory estimation.

In this proposed algorithm, the egomotion estimation is robustly partitioned into a sequence of

2DOF and 4DOF estimations. The 2DOF algorithm uses the sequence of raw camera images to

estimate and compensate the camera tilt angles such that the camera image plane is coplanar with

67

the planar surface. Tilt compensation adjusts the images or projects it via a perspective mapping

to effectively remove the tilt angles. The compensated feature points are then used for extraction

of camera translation and azimuthal rotation using the 4DOF algorithm. The 4DOF algorithm

takes the sequence of tilt-compensated images as the input and uses Least Squares or Kalman

Filtering to estimate the differential translation, azimuthal rotation and change in height between

two consecutive image frames, as described in Section 3.4.

The reason for partitioning the 6DOF algorithm into 2DOF and 4DOF algorithms is that the

2DOF algorithm compensates for the tilts in the camera, which if left uncompensated will drift

the trajectory very quickly and the 4DOF algorithm uses Least Squares and Kalman Filter

estimation to find the trajectory. Hence, the partition increases the robustness in trajectory

estimation as compared to when a complete 6DOF estimation is performed.

A wheeled robot moving over a smooth surface perceives a simpler 2D environment as opposed

to the general 3D environment. Consequently, its motion can be fully characterised in 3D.

However, a person navigating with a handheld smart phone will invariably tilt it significantly.

Uncompensated tilt induces a translation error in the trajectory, as shown in Figure 3.18. The

trajectory estimation is quite sensitive to the camera tilt. If the tilt is not compensated, the

trajectory will drift off very quickly. Even if the tilt angles average out over the whole trajectory,

there is azimuthal rotation inter-dispersed with non-zero tilt angles that the overall egomotion

estimation will drift since the tilt transformation is not commutative.

68

Figure 3.18 Translation error induced by tilts in the camera

The proposed algorithm exploits the significant structure of the uniformly patterned surfaces to

map it to a tilt-compensated surface. A grid on the rectangular patterned surface is selected from

the first captured frame and the constraints on the grid are used to map this grid onto a tilt-

compensated grid such that the whole frame is transformed to an image that appears to be like

the one captured by a camera with no tilt. Figure 3.19 shows an example of how a tiled floor

would appear when captured from a camera with and without tilt.

(a) Image of tiled floor with tilt-free camera

(b) Image of tiled floor with tilted camera

Figure 3.19 Images of a tiled floor with tilt-free and tilted camera

Initial camera position with no tilt Final position of the camera that has not translated but only tilted

Translation

69

For implementation of tilt removal and compensation, we find the feature points at the

intersection of pattern lines on the gray scaled and Gaussian smoothed first captured frame by

employing Hough lines and GF2T. Based on the feature points obtained, a grid of tiles with

equal number of tiles in the two perpendicular directions is selected, as shown in Figure 3.20.

Green circles in the figure represent feature points at line intersections. The corner feature points

of the selected grid are used to map it onto a perfect rectangle, as shown in Figure 3.21 and

Figure 3.22. An assumption made here is that the tiled surface or rectangular patterned surface,

in particular, is uniform but can have local imperfections and irregularities. The mapping will

map the whole image into tilt-compensated image.

Figure 3.20 Possible grid selected on the tiled floor image

Figure 3.21 Feature points at the corners of the selection (shown by yellow circles)

70

Figure 3.22 Mapping from a tilted image to tilt-compensated image

The perspective relation between the tilted and tilt compensated images is given by

ptilt = R(ptilt _ free −T ) (3.50)

where ptilt represents the feature points of tilted image and ptilt _ free represents the feature points

of tilt-compensated image.

Based on the perspective rotation matrix between the two images, we can determine the camera

rotations in perpendicular directions. Let ax and ay represent the counter clockwise differential

tilts between two consecutive frames along x and y axes respectively and az represents the

counter clockwise azimuthal rotation along z axis. The respective rotation matrices, Rx , Ry and

Rz along x, y and z will be given by

Rx =1 0 00 cos(ax ) sin(ax )0 −sin(ax ) cos(ax )

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.51)

71

Ry =cos(ay ) 0 −sin(ay )

0 1 0sin(ay ) 0 cos(ay )

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.52)

Rz =cos(az ) sin(az ) 0−sin(az ) cos(az ) 00 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.53)

The overall rotation matrix, R, between the two frames is defined by the following order:

R = RxRyRz (3.54)

This gives

R =

cos(ay )cos(az ) cos(ay )sin(az ) −sin(ay )

sin(ax )sin(ay )cos(az )− cos(ax )sin(az ) sin(ax )sin(ay )sin(az )+ cos(ax )cos(az ) sin(ax )cos(ay )

cos(ax )sin(ay )cos(az )+ sin(ax )sin(az ) cos(ax )sin(ay )sin(az )− sin(ax )cos(az ) cos(ax )cos(ay )

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.55)

If R(i, j) represents the (i, j)th element of the rotation matrix, the tilt angles in the camera can be

obtained using the following equations:

az = tan−1 R(1,2)

R(1,1)⎛⎝⎜

⎞⎠⎟

(3.56)

ay = cos−1 R(1,1)cos(az )

⎛⎝⎜

⎞⎠⎟= cos−1 R(1,2)

sin(az )⎛⎝⎜

⎞⎠⎟

(3.57)

ax = cos−1 R(3,1)cos(az )+ R(3,2)sin(az )

sin(ay )⎛

⎝⎜⎞

⎠⎟

= sin−1 R(2,1)cos(az )+ R(2,2)sin(az )sin(ay )

⎛

⎝⎜⎞

⎠⎟

(3.58)

Thus, ax and ay are the required tilts that are removed from the image so that the image can be

used for the determination of affine egomotion.

72

Hence, using these angles, we determine the rotation matrix that will be used for tilt-

compensation. Let the tilt compensation rotation matrix be denoted as R ' , which will be given as

R ' = Ry−1Rx

−1 (3.59)

Transforming the tilted image with this rotation, we obtain

R ' ptilt = Ry−1Rx

−1ptilt= Ry

−1Rx−1R(ptilt _ free −T )

= Rz (ptilt _ free −T ) (3.60)

We, thus, obtain the tilt-compensated image, which appears to be like the one captured with a

camera whose optical axis is perpendicular to the planar surface. Hence, we obtain a relation

similar to (3.27), where the two consecutive images have translation and rotation in azimuth. As

a result, we can solve for the 4DOF estimation by using the algorithm proposed in section 3.4.

Now, having performed the tilt compensation on the first frame captured by the camera, we have

to compensate the tilt in all the consecutive frames and at the same time, we have to estimate the

affine transformation between each pair of consecutive frames. To compensate for the tilt in the

consecutive frames, we track the square grid of tiles selected in the first frame using the Lucas

Kanade Pyramid and determine the tilts between the two consecutive frames. The tilt obtained is

added to the tilt in the prior frame to obtain the cumulative tilt of the post frame. The cumulative

tilt obtained at a particular frame is used to remove the camera tilt from that particular frame.

Having removed the camera tilt from a pair of consecutive frames, we use the 4DOF egomotion

algorithm mentioned in section 3.4 to obtain the differential translation and azimuthal rotation of

the camera. As a result, we have the complete 6DOF trajectory estimation of the camera.

A problem faced while estimating the tilt using the algorithm mentioned above is that there are

times when a portion of the selected grid moves out of the FOV of the camera. As a result, Lucas

73

Kanade Pyramid will fail to track the grid. In order to account for the tracked feature points

moving out of the FOV, the algorithm adapts by selecting a new grid in the FOV which is

subsequently tracked in the consecutive frames. That is, whenever a corner feature point of the

selected grid moves out of the FOV, the selected grid shifts in such a way that the number of tiles

of the new grid in the two perpendicular directions remains the same. For instance, if it is

observed that the top left corner of the selected grid is moving out of the FOV from the left edge

of the image, the algorithm shifts the whole grid to the right by one tile. Figure 3.23 shows an

example of this grid shifting. In this example, as the top left corner of the grid moves closer to

the top edge of the frame, the algorithm causes the grid to shift down in the next frame.

(a) Selected grid in the prior frame

(b) Grid shifted down by one tile in the post frame

Figure 3.23 Example of grid shifting using the tilt compensation algorithm

Hence, the perspective between the two frames will be found based on the shifted grid of tiles.

The flow chart of the overall 6DOF egomotion algorithm proposed in this section is shown in

Figure 3.24.

74

Figure 3.24 Flow chart of the proposed 6DOF egomotion algorithm for rectangular patterned surface

Start

Find the camera tilt in the first frame using a grid of tiles

Find the tilt in the post frame and obtain tilt-compensated post frame

Using Hough lines and GF2T to find the feature points at the intersection of lines in tilt-

compensated prior image

Use Lucas Kanade Pyramid to find the corresponding feature points in tilt-compensated

post frame

Use Least Squares or Kalman filtering to obtain the differential translation and rotation between

the frames

Based on the obtained differential rotation and translation, estimate the camera trajectory

Stop

75

3.6 Proposed 6DOF algorithm for any planar surface

Like the algorithm proposed in section 3.5, this algorithm is also partitioned into a sequence of

2DOF and 4DOF algorithms. The 2DOF algorithm finds the relative tilt between the consecutive

frames and transforms the images in such a way that there is not relative tilt between the two

images. The 4DOF algorithm then estimates the translation and azimuthal rotation between the

two frames based on a Least Squares estimation method.

Like the algorithms mentioned in the above sections, the frames captured by the camera are pre-

processed and good features are extracted from the planar surface. For concrete surfaces, Shi and

Tomasi’s GF2T is used for feature extraction while the Hough lines algorithm along with GF2T

is used for the determination of quality feature points in the patterned surfaces. With the

available feature points in the prior frame, the corresponding feature points are extracted from

the post frame using a two-way optical flow. Based on the perspective between the two frames,

the rotation and translation of the camera are determined.

For this algorithm, instead of starting with the perspective between the camera and world

coordinate frames, we will consider the homography between the camera coordinate frames in

two positions. Let PC ,1 = xc,1 yc,1 zc,1⎡⎣

⎤⎦T

and PC,2 = xc,2 yc,2 zc,2⎡⎣⎢

⎤⎦⎥T

represent the

coordinates of the feature points with respect to the camera reference frame in the two camera

positions. Let the perspective transformation between the two frames be defined as

PC,2 = RPC,1 +T (3.61)

where the rotation matrix is given by the order R = RΔzRΔyRΔx . Operator Δ is used here to

signify that R is the relative rotation between the two frames. T represents the relative translation

between the two frames.

76

Let N = [n1,n2,n3]T represents the unit vector perpendicular to the planar surface with respect to

the camera reference frame at first camera position and d represents the distance from the optical

center of first camera position to the planar surface [22]. Thus,

NTPC,1 = n1xc,1 + n2xc,2 + n3xc,3 = d (3.62)

Hence, we can write

1dNTPC,1 =1 (3.63)

Substituting in equation (3.61), we get

PC,2 = RPC,1 +T ⋅1

= RPC,1 +T ⋅ 1dNTPC,1

⎛⎝⎜

⎞⎠⎟

= R + 1dTNT⎛

⎝⎜⎞⎠⎟ PC,1

(3.64)

Let the transformation matrix between the two frames be represented by H and given as

H = R + 1dTNT (3.65)

Hence, we have

PC,2 = HPC,1 (3.66)

If p1 and p2 represent the feature points corresponding to PC,1 and PC,2 in the respective image

planes in two positions, from (2.5) and (2.6) we have for some constants λ1 and λ2

p1 = λ1PC,1 =x1y11

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.67)

77

p2 = λ2PC,2 =x2y21

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.68)

Thus, for a new constant λ , we can write

p2 = λHp1 (3.69)

Hence, provided p1 and p2 , H can be determined to within a scale factor.

For the determination of H , let us consider p2 × p2 = p2∧p2 = 0 such that

p∧

2 Hp1 = 0 (3.70)

where any vector r = a b c⎡⎣

⎤⎦T

can be represented as the skew symmetric matrix operator,

r∧

as

r∧

=0 −c bc 0 −a−b a 0

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.71)

From (3.70) and (3.71), we have

0 −1 y21 0 −x2

− y2 x2 0

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

H11 H12 H13H21 H22 H23

H31 H32 H33

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

x1y11

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

= 0 (3.72)

From here, we obtain the following three equations.

−x1H21 − y1H22 − H23 + x1y2H31 + y1y2H32 + y2H33 = 0x1H11 + y1H12 + H13 − x1x2H31 − y1x2H32 − x2H33 = 0−x1y2H11 − y1y2H12 − y2H13 + x1x2H21 + y1x2H22 + x2H23 = 0

(3.73)

This gives us

78

0 0 0x1 y1 1

−x1y2 −y1y2 −y2

−x1 −y1 −10 0 0x1x2 y1x2 x2

x1y2 y1y2 y2−x1x2 −y1x2 −x20 0 0

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

H11

H12

H13

H21

H22

H23

H31

H32

H33

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

= 0 (3.74)

For K number of quality feature points, we can solve the above equation and solve for H by

using SVD. The right singular vector corresponding to the smallest singular value will

correspond to the vector of elements of H . The second singular value of H should be 1, which

provides a way of normalizing it [22].

Having obtained H from the observed feature points in the two planes, the next step is to estimate

the relative tilts from the estimated H. From (3.69), we can write

x2y21

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥= λ

H11 H12 H13

H21 H22 H23

H31 H32 H33

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

x1y11

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.75)

This gives us

x2 = λ(H11x1 + H12y1 + H13)y2 = λ(H21x1 + H22y1 + H23)

(3.76)

If {ΔTx,ΔTy,ΔTz} represents the relative translation between the two frames and {Δax,Δay,Δaz}

represents the relative rotation, then from (3.61), we have

79

xc,2yc,2zc,2

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

=R11 R12 R13R21 R22 R23R31 R32 R33

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

xc,1yc,1zc,1

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

+

ΔTxΔTyΔTz

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.77)

Hence,

xc,2 = R11xc,1 + R12yc,1 + R13zc,1 + ΔTxyc,2 = R21xc,1 + R22yc,1 + R23zc,2 + ΔTy

(3.78)

Since x1 = λ1xc,1 and x2 = λ2xc,2 , we can re-write the equation as

x2 = λ(R11x1 + R12y1 + R13 + λ1ΔTx )y2 = λ(R21x1 + R22y1 + R23 + λ1ΔTy )

(3.79)

Comparing (3.76) and (3.79), we obtain the following equations:

R11 = H11

R12 = H12

R21 = H21

R22 = H22

(3.80)

Considering the definitions of Rx , Ry and Rz given in the equations (3.51)-(3.53) and the order

of rotation R = RΔzRΔyRΔx , we have

R11 = cos(Δaz )cos(Δay )R12 = sin(Δaz )cos(Δay )R21 = −cos(Δaz )sin(Δay )sin(Δax )− sin(Δaz )cos(Δay )R22 = −sin(Δaz )sin(Δay )sin(Δax )+ cos(Δaz )cos(Δax )

(3.81)

Considering the frame rate to be sufficiently high for the tilt angles to be small, we can obtain the

relative tilt angles, Δax and Δay , from these equations. We can neglect the terms containing

sin(Δax )sin(Δay ) since these angles are very small. Hence the tilts can be obtained as

80

Δay = cos−1 R(1,1)2 + R(1, 2)2( )

Δax = cos−1

R(1,1)2 + R(1, 2)2( )R(2, 2)R(1,1)

⎛

⎝

⎜⎜

⎞

⎠

⎟⎟

(3.82)

Hence, from these equations, we can obtain the relative tilt between two consecutive frames. We

can compensate for the tilt such that there is only translation and azimuthal rotation between the

prior and post frame. Equation (3.61) can be written as

PC,2 = RΔzRΔyRΔxPC,1 +T (3.83)

We can compensate for the tilt between the images by transforming the prior frame by RΔyRΔx .

Let PC,3 be the camera coordinates of new image obtained. As a result, we have only translation

and azimuthal rotation between the two frames as

PC,2 = RΔzPC,3 +T (3.84)

RΔz and T can be solved for by using least squares estimation. It is important to note that in this

case, we cannot solve for the 4DOF estimation with the Least Squares model proposed in section

3.4 because in that model we considered translation followed by rotation. But in this case, we

started with rotation, followed by translation.

We can re-write equation (3.84) as

xc,2yc,2zc,2

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

=cos(Δaz ) sin(Δaz ) 0−sin(Δaz ) cos(Δaz ) 0

0 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

xc,3yc,3zc,3

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

+

ΔTxΔTyΔTz

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

(3.85)

We thus obtain

81

zc,2 = zc,3 + ΔTz

xc,2yc,2

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

cos(Δaz ) sin(Δaz )−sin(Δaz ) cos(Δaz )

⎡

⎣⎢⎢

⎤

⎦⎥⎥

xc,3yc,3

⎡

⎣⎢⎢

⎤

⎦⎥⎥+

ΔTxΔTy

⎡

⎣⎢⎢

⎤

⎦⎥⎥

(3.86)

Dividing both sides by zc,2zc,3 , we obtain

1zc,3

x2y2

⎡

⎣⎢⎢

⎤

⎦⎥⎥= 1zc,2


⎡

⎣⎢⎢

⎤

⎦⎥⎥

x3y3

⎡

⎣⎢⎢

⎤

⎦⎥⎥+ 1zc,2zc,3

ΔTxΔTy

⎡

⎣⎢⎢

⎤

⎦⎥⎥

(3.87)

This can be rearranged to obtain

x2y2

⎡

⎣⎢⎢

⎤

⎦⎥⎥=zc,3zc,2


ΔTxzc,3

ΔTyzc,3

⎡

⎣

⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥

x3y31

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.88)

Let zc,3zc,2cos(Δaz ) = cΔ ,

zc,3zc,2sin(Δaz ) = sΔ ,

ΔTxzc,2

= TΔx and ΔTyzc,2

= TΔy . Hence the equation can be

written as

x2y2

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

cΔ sΔ−sΔ cΔ

TΔxTΔy

⎡

⎣⎢⎢

⎤

⎦⎥⎥

x3y31

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

(3.89)

It can be re-arranged in Least Squares notation as

x2y2

⎡

⎣⎢⎢

⎤

⎦⎥⎥=

x3 y3 1 0

y3 −x3 0 1

⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥

cΔsΔTΔxTΔy

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

(3.90)

For K number of quality feature points, we obtain an over-determined set of equations given as

82

x21

y21

!x2K

y2K

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

M"#$ %$

=

x31 y3

1 1 0

y31 −x3

1 0 1!

x3K y3

K 1 0

y3K −x3

K 0 1

⎡

⎣

⎢⎢⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥⎥⎥

A" #$$$ %$$$

cΔsΔTΔxTΔy

⎡

⎣

⎢⎢⎢⎢⎢

⎤

⎦

⎥⎥⎥⎥⎥

P"#$ %$

(3.91)

The superscripts denote the number of feature point. The Least Squares solution of this equation

is given as P = (ATA)−1ATM . Once we obtain the parameters {cΔ, sΔ,TΔx,TΔy} , we can solve for

the differential translations and azimuthal rotation.

The azimuthal rotation is given as

Δaz = tan−1 sΔ

cΔ

⎛⎝⎜

⎞⎠⎟

(3.92)

Translation along z axis is given as

zc,2 =zc,3cΔ2 + sΔ

2 (3.93)

Translations along x and y axis are given by

ΔTx =zc,3cΔ2 + sΔ

2TΔx

ΔTy =zc,3cΔ2 + sΔ

2TΔy

(3.94)

Hence we obtain the remaining 4DOF motion of the camera. This algorithm works for all kinds

of planar surfaces, concrete and structured. Structured surfaces provide very high quality feature

points at the intersection of lines. However the feature points on a concrete surface are

ambiguous. As a result, we observe drifts in the trajectory when the planar surface is concrete

83

while the structure of the pattern surfaces provides extra constraint, which is used for the better

trajectory estimation.

The major difference of this algorithm from the 6DOF algorithm proposed in section 3.5 is that

the algorithm proposed in section 3.5 uses the constraint on the rectangular structure to estimate

the absolute tilts in the camera and compensates for the tilts in a way that the images appear to be

like the ones taken from a camera exhibiting no tilt, while this 6DOF works for any general

planar surface. Due of the absence of constraints on the structure of the planar surface, it can

only find the differential tilts between the two consecutive image frames and compensates for the

tilts in the frames such that there is no relative tilt between the frames.

An important consideration in all the above algorithms is that even if the floor is concrete, there

must always be some good trackable feature points available for trajectory estimation.

84

Experimental Verification Chapter Four:

In this chapter, the various algorithms proposed in the previous chapter are verified

experimentally. The algorithms are verified in an indoor environment using different validation

schemes. A variety of floor surfaces are used to estimate the camera trajectory. The algorithms

are tested using three different types of camera. Two of the cameras used for egomotion

estimation are standard iPhone 5S camera and LG Nexus 4 camera. Both of these cameras

contain an 8 megapixels sensor with video frame rate of 30 frames per second. The third type of

camera that we have employed for the verification of egomotion algorithms is a Point Grey’s

Bumblebee stereo camera. This camera, shown in Figure 4.1, consists of two camera sensors, the

left sensor and the right sensor, each 1.3 megapixels. Hence using a Bumblebee camera, we can

obtain a video with two different sensors at the same time, which can be used for verifying the

accuracy of the algorithms, as will be seen in the following sections.

Figure 4.1 Bumblebee stereo camera

This chapter involves various set of experiments performed to verify the three proposed

algorithms. The set of experiments performed for the three algorithms are mentioned below.

4DOF algorithm is verified with

• Simulated videos

Right camera Left camera

85

• Experiments involving two cameras moving together in an affine fashion on any kind of

planar floor surface.

• Back projection error calculation for three different cameras moving in affine fashion

without tilts on any planar floor surface.

• Experiment involving RMS error calculation of trajectory of an iPhone moving in a circular

fashion without any tilt on a planar surface.

6DOF algorithm for rectangular patterns is verified with

• Experiments involving two cameras moving together in any random fashion on rectangular

patterned floor surface.

• Back projection error calculation for three different cameras moving randomly on a

rectangular patterned floor surface.

• Experiment involving RMS error calculation of trajectory of an iPhone moving in a circular

fashion on a rectangular tiled floor with tilts associated with it.

6DOF algorithm for any planar surface is verified with

• Experiments involving two cameras moving together in any random fashion on any planar

floor surface.

• Experiment involving RMS error calculation of an iPhone trajectory moving in a circular

fashion on any planar surface with tilts associated with it.

4.1 4DOF algorithm verification

To verify the 4DOF egomotion estimation algorithm, we first apply the algorithm to the

simulated videos of a rectangle moving in different known fashions. The 4DOF algorithm

estimates the translation of the camera, its azimuthal rotation and change in height with respect

to the planar surface. The estimated trajectory is hence compared to the known true trajectory.

86

Having verified the algorithm for simulated cases, various experiments are performed where a

camera is moved in a random fashion against planar surfaces, concrete or patterned, and the

trajectory is estimated. Care needs to be taken that the camera is not tilted and the optical axis

remains perpendicular to the planar surface, otherwise, as shown in chapter 3, the tilts in the

camera will induce translation errors causing drifts in the trajectory estimate. Different methods

are used to verify the accuracy of trajectory estimation.

4.1.1 Verification on simulated videos

The algorithm is verified on simulated videos by showing the plots of the estimates with the

actual parameters. The root mean square (RMS) error of the estimated trajectory is calculated

with reference to the known actual trajectory. We start with the simulated video of a rectangle

translating uniformly in x and y directions. The translation is taken to be 1 pixel per frame in

both x and y directions. The snapshots of the first and the last frame of the video are shown in

Figure 4.2. In this case, we expect the translation to be linear in x and y directions. The

azimuthal rotation is expected to be zero throughout and so is the change in height.

First frame

Last frame

Figure 4.2 First and last frame of uniformly translating rectangle

87

The affine transformation between each pair of consecutive frames is calculated using the Least

Squares estimation algorithm proposed in chapter 3. Based on this transformation, the rotation

and translation of the camera are estimated and plotted. Figure 4.3(a) plots the estimated

translation of the camera in the x direction against the true translation that is already known to

us. Similarly, Figure 4.3(b) shows the plot of the estimated translation in the y versus the true

translation. It can be seen that the two trajectories are approximately the same because of the

absence of noise in the video.

(a) Translation in x

(b) Translation in y

Figure 4.3 Camera translations for the simulated case of uniformly translating rectangle

88

The overall trajectory of the camera in terms of translation in x and y is shown in Figure 4.4.

The RMS error of the trajectory was found to be 0.0083 pixels. As expected, the motion of the

camera comes out to be linear in x and y .

Figure 4.4 Plot of estimated trajectory as compared to the true trajectory for the simulation of uniformly translating rectangle

The azimuthal rotation of the camera, which is expected to be zero throughout the video, is

shown in Figure 4.5. Considering the initial height of the camera from the planar surface to be 1

unit, the plot for the height of the camera for the whole video is shown in Figure 4.6.

The RMS errors for the azimuthal rotation and camera height from the planar surface were also

calculated. The RMS error of the azimuthal rotation was found to be 1.67e-05 radians (9.56e-

04°) while that for the height was found to be 8.37e-06 units.

89

Figure 4.5 Azimuthal rotation for the simulated case of uniformly translating rectangle

Figure 4.6 Height of camera from planar surface for the simulated case of uniformly translating rectangle

Next, we consider a simulation of a rectangle rotating uniformly about its center, as shown in

Figure 4.7. A complete 360o rotation of the rectangle about the optical axis is considered.

90

Figure 4.7 Frames of the simulated video of a uniformly rotating rectangle

For this simulation, we plot the trajectory in terms of translation of the camera in x and y . Since

the simulation considers a camera rotating about its optical axis, it is expected to not change its

position in terms of x and y . Figure 4.8 shows the trajectory for one complete rotation of the

camera about its optical axis. The plot for the expected and actual azimuthal rotations is shown

in Figure 4.9.

Figure 4.8 Trajectory of the camera for the simulation of uniformly rotating rectangle

91

Figure 4.9 Azimuthal rotation of the camera for the simulation of uniformly rotating rectangle

The RMS of the trajectory in the case of uniform rotation was found to be 0.9482 pixels. From

Figure 4.8 and Figure 4.9, it can be seen that the drifts are not random; rather they are uniform

and systematic. One possible reason for this systematic drift could be the uniform drift in the

Lucas Kanade based feature correspondence. Due to pixel quantization of the simulation, Lucas

Kanade finds a close match for the feature points in the consecutive images, resulting in the

estimation drift. This drift is carried forward and added to the further drift in the

correspondences, resulting in a systematic drift.

Now we consider the simulation of a uniformly shrinking and expanding rectangle, indicating the

change in height of the camera. When the camera is supposed to increase its height with respect

to the planar surface, the rectangle tends to shrink about the camera optical center. Likewise,

when the height is supposed to decrease, the rectangle tends to expand. We have considered

uniform shrinking of the rectangle for first few frames and for the next few frames, it is

92

uniformly expanded and remains still for last few frames. Figure 4.10 shows some of the frames

of the rectangle undergoing shrinking/expanding.

Figure 4.10 Frames of the simulated video of uniformly rotating rectangle

The overall trajectory for this case is shown in Figure 4.11. As expected, the camera remains

static in x and y for the entire video.

Figure 4.11 Trajectory for the simulation of camera when changing height

93

The RMS error of the estimated trajectory with respect to the true trajectory is found to be

0.1284 pixels. Azimuthal rotation, expected to be zero throughout, is shown in Figure 4.12.

Figure 4.13 shows the plot of the estimated height using the proposed 4DOF estimation

algorithm. The RMS error of the height is found to be 6.0252e-04 units.

Figure 4.12 Plot of azimuthal rotation for the simulation of camera when changing height

Figure 4.13 Plot of height estimate for the simulation of camera when changing height

94

All the above results are obtained considering the Least Squares estimation of the trajectory.

Since in case of all the simulations, we are aware of the motion of the camera, that is, the

statistical model of camera motion is known, we can also use Kalman filter for the trajectory

estimation. Hence, now we will estimate the trajectory using Kalman filtering and compare its

results with those obtained using Least Squares.

We consider the case of uniformly translating rectangle to estimate the trajectory using Least

Squares and Kalman filter. Figure 4.14(a) shows the results of trajectory estimation for first few

frames of the video while Figure 4.14(b) plots the trajectory estimation for the middle few

frames of the video. The reason for showing these different plots is to throw light on how

Kalman filter starts with a larger drift in the trajectory than Least Squares estimation, but as time

proceeds, it tracks the true trajectory and moves closer to it. This is because Kalman filtering is a

recursive process of prediction and correction. Even if the initial predicted values of the state

vector of Kalman filter do not correspond to the actual values, it eventually gets closer to the true

values as the number of measurements increase.

(a) Trajectory estimation during first few frames

95

(b) Trajectory estimation for later frames

Figure 4.14 Comparison of trajectory estimation using Least Squares and Kalman filtering

Here also we observe a constant residual error in the trajectory estimation, which might be due to

pixel quantization. In this case, the RMS error of the Least Squares trajectory was found to be

0.0083 pixels while that for the Kalman filter trajectory, it was found to be 0.0054 pixels. This is

because we are aware of the camera model in this case, which improves the performance of the

Kalman filter. We will see in the following sections that when the motion of the camera is

random, Least Squares estimation performs better than Kalman filter estimation.

4.1.2 Verification using stereoscopic view

For this experiment, we use two different cameras, the iPhone camera and the Nexus camera at

the same time. Both the cameras are pasted together and mounted on a cart in a way that they are

facing the floor surface. The cart moves the cameras on the floor such that their optical axis is

perpendicular to the planar surface. The picture of the setup is shown in Figure 4.15.

96

(a) Front view

(b) Top view

Figure 4.15 Setup of two cameras moving together on a floor surface

The cameras are calibrated using the calibration procedure described in Chapter 3. The cart is

moved for around 3.6 m and videos of the floor surface are recorded using the two cameras

simultaneously. The estimated trajectories of the two camera motions are plotted in terms of their

translations in x and y. Although both the cameras are independently observing different feature

points, but since they move in the same fashion, they should have the same trajectory. The RMS

variation of the two trajectories is calculated at various instants of the motion. Figure 4.16 shows

the resultant trajectories from the two cameras. The RMS variations at different instants of time

are given in Table 4.1.

97

Figure 4.16 Trajectories of two cameras moving together estimated using the 4DOF algorithm

Table 4.1 RMS variations in trajectories of two cameras moving together estimated using the 4DOF algorithm

Frame Number RMS variation (centimeters) 50 0.19

100 0.42

500 1.28

1000 2.51

1500 2.90

2000 3.53

The RMS variation of the entire trajectory was found to be 3.8 cm for 3.6 m of camera motion.

The azimuthal rotations of the two cameras are plotted in Figure 4.17 and their RMS variations

at various instants of motion are shown in Table 4.2. The overall RMS variation in the azimuthal

rotations was found to be 0.08 radians (4.58°) for 3.6 m.

98

Figure 4.17 Azimuthal rotations of two cameras moving together estimated using the 4DOF algorithm

Table 4.2 RMS variations in azimuthal rotations of two cameras moving together estimated using the 4DOF algorithm

Frame Number RMS variation (radians) 50 0.00

100 0.00

500 0.07

1000 0.08

1500 0.08

2000 0.08

In the second part of this experiment, instead of using two different cameras, we mount the Point

Grey’s Bumblebee stereo camera on the cart such that there is no tilt associated with the camera.

Videos of the floor surface are taken in the left and right camera simultaneously. The trajectories

and the azimuthal rotations of the two cameras estimated using the 4DOF algorithm are plotted

in Figure 4.18 and Figure 4.19 respectively. Table 4.3 and Table 4.4 provide the RMS variations

in the trajectories and azimuthal rotations of the two cameras.

99

Figure 4.18 Trajectories of the left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm

Figure 4.19 Azimuthal rotations of the left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm

100

Table 4.3 RMS variations in trajectories of left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm


100 0.10

300 0.90

500 2.29

800 2.89

1000 3.16

Table 4.4 RMS variations in azimuthal rotations of left and right sensors of the Bumblebee camera estimated using the 4DOF algorithm


100 0.00

300 0.01

500 0.01

800 0.01

1000 0.01

The total RMS variation in the trajectories for 3 m of camera motion was calculated to be 3.2 cm.

The RMS variation in the azimuthal rotation for the same length of camera motion was found to

be 0.01 radians (0.57°).

4.1.3 Back Projection Verification

This experiment aims to find the error in the trajectory of a camera moving against a planar

surface in a random fashion. Since the camera moves randomly in a 3D plane, it is not

convenient to determine its true trajectory for comparison with the estimated trajectory. Hence,

the back projection method [48] is used to determine the accuracy of the algorithm.

In the back projection method, for a pair of consecutive frames of the camera, the transformation

of the camera motion is estimated from the prior frame to the post frame based on Least Squares.

101

The translation and rotation of the camera are estimated from the obtained transformation matrix

and an inverse transformation matrix is obtained, which provides the transformation from the

post frame to the prior frame. The post image is transformed with this inverse transformation

matrix to obtain a new image. Back projection error is calculated between the feature points of

the new image and feature points of the prior image.

For our analysis, we have paired a frame with its tenth consecutive frame to compute the back

projection error, that is, the frame at time t is paired with the frame at time t+9. All the

differential translations and rotations until the tenth frame are accumulated to obtain the overall

transformation between the frame at time t and the one at time t+9. The frame at time t+9 is

back projected with the inverse transformation matrix and the back projection error is calculated

with the frame at time t. A scattered plot of the back projection errors is obtained and the

standard deviation of the errors is calculated.

We started with the iPhone camera moving randomly without any tilt against a floor surface.

Figure 4.20 shows the scattered plot for its motion obtained as a result of the back projection

method. The standard deviation of the errors was calculated to be 0.33 cm in the x direction and

0.54 cm in the y direction for 8.5 m of camera motion.

102

Figure 4.20 Scattered plot of back projection errors obtained for the iPhone camera using the 4DOF algorithm

Next we moved the Nexus camera in a random fashion against the floor surface to perform the

back projection experiment. Figure 4.21 shows the scattered plot obtained for the back projection

errors. The standard deviation in x direction was calculated to be 0.41 cm and that in y was found

to be 0.41 cm for 8.5 m of random camera motion.

Figure 4.21 Scattered plot of back projection errors obtained for the Nexus camera using the 4DOF algorithm

103

Finally, the stereoscopic camera is used to verify the algorithm using the back projection method.

Scattered plot of errors is shown in Figure 4.22. The standard deviation of the back projection

errors was calculated to be 0.12 cm in the x direction and 0.15 cm in the y direction for 7 m of

camera motion involving translation and azimuthal rotation.

Figure 4.22 Scattered plot of back projection errors obtained for the Bumblebee camera using the 4DOF algorithm

It is interesting to note the quantization effect that exists in the case of iPhone and Nexus phone

scattered plots shown in Figure 4.20 and Figure 4.21 respectively, but is absent in the case of the

Bumblebee camera scattered plot shown in Figure 4.22. The reason for the presence of this

artefact in the iPhone and Nexus camera plots might be that the frame rate in case of Bumblebee

camera is very high as compared to the iPhone and Nexus cameras. As a result, the differentials

between the two camera positions are much small in Bumblebee camera, which might be the

reason for the absence of quantization effect.

Having obtained the scattered plot for back projection based on Least Squares, we now use

Kalman filtering to obtain the scattered plot and calculate the errors. For the motion of

104

stereoscopic camera, the obtained scattered plot as a result of back projection is shown in Figure

4.23. Since we don’t have an idea about the motion model in this case, Kalman filtering performs

worse than Least Squares. The standard deviations of the back projection errors obtained are

slightly higher than those obtained with Least Squares for the same number of frames. The

standard deviation of the errors was calculated to be 0.13 cm in x direction and 0.17 cm in y

direction.

Figure 4.23 Scattered plot of back projection errors obtained for the Bumblebee camera based on the Kalman filter estimation of 4DOF algorithm

4.1.4 Verification based on known trajectory

This experiment considers a deterministic motion of the camera to verify the accuracy of the

4DOF algorithm. The camera is moved in a circular fashion by using a turntable, as shown in

Figure 4.24. The turntable consists of a Newmark RT-5 motorized rotary stage, shown in Figure

4.25 and a Newmark NSC-1 motion controller, shown in Figure 4.26, which controls the velocity

of the circular motion.

105

Figure 4.24 Turntable used to move the camera in a circular motion

Figure 4.25 Newmark RT-5 motorized rotatory stage

Figure 4.26 Newmark NSC-1 motion controller

106

For this experiment, we mounted the iPhone at a certain distance from the rotary stage, as shown

in Figure 4.27, and made the controller to rotate the table at a speed of 1! per sec. The turntable is

rotated for 120! and the video is recorded.

Figure 4.27 iPhone camera mounted on the turntable

The trajectory of the camera motion was plotted using Least Squares estimation and compared

against the actual trajectory. Figure 4.28 shows the plot of the camera trajectory obtained using

the 4DOF algorithm against the actual trajectory. The starting point of the actual trajectory is not

known exactly and is inferred from the estimated trajectory as the radius of motion and angle

moved are known. Hence, the actual trajectory might differ slightly from the one shown in the

figure. A uniform drift is observed in the trajectory, which might be due to slight tilts in the

camera resulting from mounting errors.

The RMS errors of the trajectory are obtained at various instants of time and are shown in Table

4.5. The RMS error for the complete motion of around 2 m was found to be 2.2 cm.

107

Figure 4.28 Camera trajectory obtained for a circular motion using the 4DOF algorithm

Table 4.5 RMS errors in camera trajectory obtained for a circular motion using the 4DOF algorithm

Frame Number Error (centimeters) 100 0.11

500 0.86

1000 1.50

2000 1.98

3000 2.12

The azimuthal rotation of the trajectory is plotted against the actual azimuthal rotation in Figure

4.29. Table 4.6 shows the azimuthal rotation RMS errors. The total azimuthal rotation for 2 m of

rotation on the turntable was found to be 0.02 radians (1.14°).

108

Figure 4.29 Azimuthal rotation obtained for circular camera motion using 4DOF algorithm

Table 4.6 RMS errors in azimuthal rotation obtained for circular motion using the 4DOF

algorithm

Frame Number Error (radians) 100 0.00

500 0.00

1000 0.00

2000 0.01

3000 0.01

The high accuracy of the algorithm is verified by a few centimeters of error for several meters of

camera motion.

4.2 Verification of 6DOF algorithm for rectangular patterned surfaces

Like the 4DOF algorithm for motion estimation, the verification of this algorithm is done by

performing various experiments with the three different cameras moved in a random fashion. In

this case, we move the camera against a rectangular patterned floor surface. There is no

restriction on the camera axis being perpendicular to the planar surface in this case. Hence the

109

camera might be tilted while capturing the frames on the rectangular tiled floor. The algorithm

first removes the tilt from the frames using the 2DOF algorithm and then performs the 4DOF

egomotion estimation.

4.2.1 Results of tilt removal on rectangular tiled floor

Before verifying the algorithm for trajectory estimation, the results for tilt removal are presented.

As stated in section 3.5, the 2DOF tilt removal algorithm works by choosing a square grid on the

patterned surface and mapping it onto a perfect square to estimate the tilt. The inverse of the tilt

calculated is then applied to the image to obtain a tilt compensated image. Figure 4.30 and Figure

4.31 show some images on which tilt removal is applied and the corresponding tilt compensated

images are obtained.

(a) Original image

(b) Tilt compensated image

Figure 4.30 Result of tilt removal algorithm

110

(a) Original image

(b) Tilt compensated image

Figure 4.31 Result of tilt removal algorithm

4.2.2 Verification based on stereoscopic view

Like section 4.1.2, for this algorithm also we performed the experiment where two different

cameras were mounted together on a cart and moved. To verify the tilt removal, the two cameras

were mounted with different tilts on the cart, as shown in Figure 4.32. For the verification of the

algorithm, the 2DOF algorithm should be able to compensate for the different tilts in the two

cameras and result in a similar trajectory. Figure 4.33 shows the result of trajectory estimation

for the two cameras and their calculated RMS variations are given in Table 4.7.

Figure 4.32 Setup of cameras mounted at different tilt angles on the cart

111

Figure 4.33 Trajectories of two cameras moving together obtained using the 6DOF algorithm for rectangular patterns

Table 4.7 RMS variations in trajectories of cameras moving together obtained using the 6DOF algorithm for rectangular patterns


100 0.22

500 2.62

1000 2.64

1500 3.00

2000 3.90

The RMS variation of the entire trajectory was found to be 4.9 cm for 8.5 m of camera motion.

The azimuthal rotations of the two cameras are plotted in Figure 4.34 and the RMS variations are

given in Table 4.8. The RMS variation in azimuthal rotations for the entire motion was found to

be 0.04 radians (2.29°).

112

Figure 4.34 Azimuthal rotations of two cameras moving together obtained using the 6DOF algorithm for rectangular patterns

Table 4.8 RMS variations in azimuthal rotations of cameras moving together obtained using the 6DOF algorithm for rectangular patterns


100 0.00

500 0.01

1000 0.02

1500 0.03

2000 0.04

Next, we moved the Bumblebee stereo camera in a tilted fashion randomly against the

rectangular tiled floor. The trajectories of the left and right camera sensors are shown in Figure

4.35 and their RMS variations at various instants of the video are shown in Table 4.9. The RMS

variation of the entire motion of 4.8 m is found to be 2.3 cm.

113

Figure 4.35 Trajectories of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns

Table 4.9 RMS variations in trajectories of the left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns


100 0.07

500 1.51

1000 2.44

1500 2.56

2000 2.28

The azimuthal rotations of the left and right sensors of the Bumblebee camera are plotted in

Figure 4.36. The RMS variations at various times are shown in Table 4.10. The total RMS

variation for 4.8 m of camera motion was 0.02 radians (1.14°). A sudden significant deviation

can be seen in the azimuthal rotation at frame 1000. This might be due to the presence of

114

possible features in one of the camera frames, which significantly affect the estimation of that

camera, while these significant features are absent in the second camera, resulting in a sudden

deviation.

Figure 4.36 Azimuthal rotations of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns

Table 4.10 RMS variations in azimuthal rotations of left and right sensors of the Bumblebee camera obtained using the 6DOF algorithm for rectangular patterns


100 0.00

500 0.01

1000 0.01

1500 0.02

2000 0.02

115

4.2.3 Verification based on back projection

We performed the verification of the 6DOF algorithm using the back projection method

mentioned in Section 4.1.3. Figure 4.37 shows the scattered plot of the back projection errors for

a random motion of the iPhone camera on a rectangular tiled surface. The standard deviation of

the errors for 3 m of the iPhone camera motion was calculated to be 0.69 cm in the x direction

and 0.88 cm in the y direction for 3 m of camera motion.

Figure 4.37 Scattered plot of back projection errors obtained for the iPhone camera using the 6DOF algorithm for rectangular patterns

The scattered plot of the back projection errors obtained for the Nexus 4 camera is shown in

Figure 4.38. For 3 m of camera motion, the standard deviation of the errors was calculated to be

0.42 cm in the x direction and 0.59 cm in the y direction.

116

Figure 4.38 Scattered plot of back projection errors obtained for the Nexus camera using the 6DOF algorithm for rectangular patterns

Figure 4.39 shows the scattered plot of back projection errors for 4.8 m of the Bumblebee camera

motion. The standard deviations of the errors obtained in x and y direction are 0.17 cm and 0.53

cm respectively.

Figure 4.39 Scattered plot of back projection errors obtained for the Bumblebee camera using the 6DOF algorithm for rectangular patterns

117

All the above plots were obtained by using the Least Squares estimation for the 4DOF algorithm.

For the Bumblebee camera motion, we used Kalman filtering to estimate the egomotion and plot

the back projection errors. Figure 4.40 shows the scattered plot of the errors; the standard

deviations of the errors were 0.23 cm in the x direction and 0.86 cm in the y direction.

Figure 4.40 Scattered plot of back projection errors for the Bumblebee camera considering 6DOF estimation using Kalman filtering

Hence, we verify that when the motion model of the camera is known, Kalman filtering performs

better, as shown in Section 4.1.1. However, when the model of camera motion is unknown, Least

Squares estimation gives better results than the Kalman Filter estimation.

4.2.4 Verification based on a known trajectory

We again used the turntable to obtain a known trajectory of the camera motion. In this case, the

iPhone was mounted with a certain tilt at the end of the turntable shaft, as shown in Figure 4.41.

118

Figure 4.41 Camera mounted on the turntable at a certain tilt

The turntable was rotated about 1! per sec for 180! . Figure 4.42 shows the plot of the trajectory.

It is compared against the true known trajectory and the RMS errors were calculated. The RMS

errors in the trajectory at various instants of its motion are shown in Table 4.11. The total RMS

error for around 2.3 m of motion is 1.6 cm.

Figure 4.42 Camera trajectory obtained for a circular motion using the 6DOF algorithm for rectangular patterns

119

Table 4.11 RMS errors in trajectory obtained for a circular motion using the 6DOF algorithm for rectangular patterns


500 1.57

1000 1.92

1500 1.68

2000 1.51

The azimuthal rotation of the camera for the semi-circle traversed is plotted in Figure 4.43. The

RMS errors obtained when comparing with the actual azimuthal rotation are shown in Table

4.12. The error for entire 2.3 m of motion is 0.05 radians (2.86°).

Figure 4.43 Azimuthal rotation obtained for a circular motion using the 6DOF algorithm for rectangular patterns

120

Table 4.12 RMS errors in the azimuthal rotation obtained for a circular motion using the 6DOF algorithm for rectangular patterns


500 0.03

1000 0.04

1500 0.04

2000 0.05

4.3 Verification of 6DOF algorithm for camera directed at any planar surface

For the verification of this algorithm, we have used the cameras on both tiled and patterned

surfaces and the cameras were moved in a random fashion with some tilts to obtain the trajectory

of motion. Different experimental trials were done and results obtained using various methods of

verification. The trajectory estimation on a concrete surface was compared with the trajectory

estimation on a patterned surface.

4.3.1 Verification using stereoscopic view

We will use a long range motion to verify the accuracy of this algorithm. The Bumblebee

stereoscopic camera was moved in a random fashion for a distance of 16 m on a concrete floor

and trajectories of the left and right sensors are plotted using the 6DOF algorithm as shown in

Figure 4.44. Their RMS variations are shown in Table 4.13. The RMS variation for the entire

motion of 16 m was found to be 9.9 cm.

121

Figure 4.44 Trajectory for long range motion of stereoscopic camera obtained using the 6DOF algorithm

Table 4.13 RMS variations in long range trajectories of the two sensors of the Bumblebee camera obtained using the 6DOF algorithm


100 0.08

500 1.36

1000 5.24

1500 7.55

2000 9.88

The azimuthal rotations for the left and right sensors of the camera are shown in Figure 4.45.

Variations in the rotation at various instants during the motion are shown in Table 4.14. The

RMS variation for the entire 16 m of motion was found to be 0.10 radians (5.72°).

122

Figure 4.45 Azimuthal rotation for long range motion of stereoscopic camera obtained using the 6DOF algorithm

Table 4.14 RMS variations in long range azimuthal rotations of two sensors of the Bumblebee camera obtained using the 6DOF algorithm


100 0.00

500 0.02

1000 0.06

1500 0.08

2000 0.10

Next we move the Bumblebee stereoscopic camera in a random fashion with some tilt. The

trajectories of the two sensors for this case are shown in Figure 4.46. The calculated RMS

variations are given in Table 4.15. The total RMS variation for around 3.2 m of motion was

calculated to be around 0.9 cm.

123

Figure 4.46 Trajectories of the two sensors of stereoscopic camera obtained using the 6DOF algorithm

Table 4.15 RMS variations in the trajectories of the sensors of stereoscopic camera obtained using the 6DOF algorithm


100 0.12

200 0.54

300 0.92

Whichever way the camera is tilted, an equal amount of tilt is reflected by the left and right

sensors of the camera. Hence, we plot the tilts in the two camera sensors in Figure 4.47 and

Figure 4.48, which show the tilts in x and y directions respectively. The RMS variations in the tilt

angles obtained for the two trajectories were estimated. The variation in the tilt angles in the x

direction was found to be 0.00 radians and that in y direction was found to be 0.00 radians for

around 3.2 m of camera motion, hence insignificant.

124

Figure 4.47 Tilts in x direction for the sensors of stereoscopic camera obtained using the 6DOF algorithm

Figure 4.48 Tilts in y direction for the sensors of stereoscopic camera obtained using the 6DOF algorithm

125

Finally, the azimuthal rotations calculated for the two sensors are plotted in Figure 4.49 and their

RMS variations are given in Table 4.16. The variation for the overall motion was estimated to be

0.01 radians (0.57°).

Figure 4.49 Azimuthal rotations of the sensors of stereoscopic camera obtained using the 6DOF algorithm

Table 4.16 RMS variations in azimuthal rotations of sensors of stereoscopic camera obtained using the 6DOF algorithm


100 0.00

200 0.01

300 0.01

4.3.2 Verification based on known trajectory

Like the known trajectory verification of the previous two algorithms, we used the turntable to

obtain a deterministic trajectory for a tilted camera and for comparison with the estimated

trajectory based on the 6DOF algorithm. The algorithm first compensates for the differential tilts

126

in the two consecutive frames and then estimates the trajectory and azimuthal rotation. The true

and the estimated trajectory for 160! of rotation are shown in Figure 4.50 and the azimuthal

rotations, actual and estimated, are plotted in Figure 4.51.

Figure 4.50 Camera trajectory obtained for a circular motion using the 6DOF algorithm

Figure 4.51 Azimuthal rotation obtained for a circular camera motion using the 6DOF algorithm

127

The RMS errors in the estimated trajectory and in the estimated azimuthal rotation calculated at

various instants of motion are shown in Table 4.17 and Table 4.18. The total RMS error in

trajectory was found to be 1.5 cm and that in azimuthal rotation was found to be 0.03 radians

(1.71°) for 2.5 m of rotation.

Table 4.17 RMS errors in trajectory obtained for a circular motion using the 6DOF algorithm


500 1.14

1000 1.03

1500 0.92

2000 1.15

Table 4.18 RMS errors in azimuthal rotation obtained for a circular motion using the 6DOF algorithm


500 0.02

1000 0.02

1500 0.02

2000 0.03

4.3.3 Comparison of trajectory estimation on patterned and concrete surfaces

In this subsection, we show how the results of trajectory estimation are affected when the floor is

patterned, instead of concrete. We performed the experiment based on a known trajectory using

the same camera on a concrete and a tiled floor under the same lightening conditions and moving

the camera for the same distance. The trajectories obtained on the two different floors were

128

plotted and their RMS errors calculated. Figure 4.52 and Figure 4.53 show the trajectories

estimated for the motion of camera on tiled and concrete floor respectively.

Figure 4.52: Trajectory obtained for circular motion on a patterned surface using the 6DOF algorithm

Figure 4.53 Trajectory obtained for circular motion on a concrete surface using the 6DOF algorithm

129

The RMS error calculated for 2.5 m of camera motion on tiled floor was found to be 2.5 cm and

that on concrete floor was found to be 5.2 cm. The RMS errors in the trajectories at several

instants during camera motion on tiled and concrete floor are shown in Table 4.19 and Table

4.20 respectively.

Table 4.19 RMS errors in trajectory obtained for circular motion on a patterned surface using the 6DOF algorithm


500 0.53

1000 1.20

1500 1.95

2000 2.41

Table 4.20 RMS errors of trajectory obtained for circular motion on a concrete surface using the 6DOF algorithm


2000 0.93

3000 1.03

4000 1.86

5000 3.65

The azimuthal rotations for the circular camera motion for tiled and concrete floor are shown in

Figure 4.54 and Figure 4.55 and their corresponding RMS errors are given in Table 4.21 and

Table 4.22. The RMS errors for 2.5 m of motion were found to be 0.02 radians (1.14°) for tiled

floor and 0.03 radians (1.71°) for concrete floor.

130

Figure 4.54 Azimuthal rotation obtained for circular motion on a patterned surface using the 6DOF algorithm

Figure 4.55 Azimuthal rotation obtained for circular motion on a concrete surface using the 6DOF algorithm

131

Table 4.21 RMS errors in azimuthal rotation obtained for circular motion on a patterned surface using the 6DOF algorithm


500 0.01

1000 0.01

1500 0.01

2000 0.02

Table 4.22 RMS errors of azimuthal rotation obtained for circular motion on a concrete surface using the 6DOF algorithm


2000 0.02

3000 0.03

4000 0.04

5000 0.03

The performance of the egomotion estimation algorithm improves significantly on a patterned

surface, since the lines of the pattern add extra information for the extraction of higher quality

feature points, while the ambiguity on the concrete surface results in lower quality feature points,

which causes the trajectory estimation to drift from its actual value.

Various trials of different experiments using a variety of cameras result in few centimeters of

errors in the trajectory estimation for several meters of camera motion, which proves the high

accuracy of the proposed egomotion algorithms.

132

Conclusions and Future Work Chapter Five:

This thesis provides robust 6DOF algorithms for the estimation of camera trajectory by making

use of the feature points on a planar surface. The estimated trajectory of the camera can be used

further to improve the performance of indoor navigation. This chapter provides a summary of the

thesis, which includes the contributions of this research. It also provides suggestions for future

work, which could improve the performance of the proposed algorithms.

5.1 Conclusions

This research aims to address the hypothesis stated in Chapter 1 that accurate trajectory

estimation can be achieved if the observed feature points are planar and the estimation can be

further improved if the features are on patterned surfaces.

The contributions of this research to support this hypothesis can be summarized as follows:

• The process of image formation and image transformation was explained. Methods of

feature point extraction and correspondence were introduced, which were used in the

proposed algorithms for the extraction and correspondence of feature points on the planar

surfaces.

• It was determined that in order to achieve centimeter level accuracy for trajectory

estimation, it is necessary to accurately compensate for the lens distortion of the camera. An

efficient method of doing this based on the chessboard camera calibration was introduced

and implemented in the overall routine.

• It was identified that for the extraction of high quality feature points from a planar surface,

noise needs to be removed and the structure of the surface needs to be highlighted. Thus,

various methods of pre-processing the image to remove noise and highlight the features

were established. These methods include Gaussian smoothing, edge detection and image

133

thresholding. In patterned surface, rich features can be obtained at the intersection of lines.

Hence, a method to extract the lines in the image based on the Hough transform was

introduced and implemented.

• A 4DOF algorithm for trajectory estimation based on Least Squares and Kalman filtering

was proposed for the cases where the camera is held in such a way that its optical axis is

perpendicular to the planar surface. The algorithm estimates the translations and azimuthal

rotation that the camera undergoes based on how the features on the planar surface move

from frame to frame.

• A 6DOF algorithm was proposed for the estimation of egomotion of the camera which

moves randomly against a rectangular patterned surface. The constraints on the structure of

the surface provide a means to estimate the absolute tilts in the camera. The proposed

algorithm compensates for the camera tilts and estimates the trajectory of the camera from

the motion of feature points in the tilt compensated images.

• A 6DOF algorithm that estimates the egomotion of the camera moving randomly against any

planar surface, concrete or tiled, was proposed. The algorithm first estimates the differential

tilts between the two camera positions and compensates for this differential tilt. The tilt

compensated images are then used to estimate the relative translations and azimuthal

rotation between the two camera positions.

• Simulated videos were used to verify the accuracy of the 4DOF algorithm for trajectory

estimation. The RMS errors calculated for the estimated trajectories were found to be of the

order of few millimeters. A comparison was provided between the estimations based on

Least Squares and Kalman filtering and it was shown that when the motion model of the

134

camera is known, Kalman filtering performs better than Least Squares while in all other

cases, Least Squares exhibits a better performance.

• Two different cameras moving together were used to verify that the proposed algorithms

provide similar estimation of trajectory for two independent cameras moving in the same

fashion. The RMS variations were calculated in the estimated trajectories of the two cameras

and were found to be of the order of few centimeters. Also, the RMS errors in the estimated

azimuthal rotations of the two cameras were calculated.

• Back projection method was used for the verification of proposed algorithms. Based on the

obtained rotation and translation, the feature points in the post frame were back projected

and the errors were calculated in the back projected feature points. Scattered plots were

obtained for the back projection error and millimeter level standard deviations obtained

support highly accurate trajectory estimation using the proposed algorithms.

• The performance of the algorithms was evaluated based on a known trajectory. A camera

was moved in a circular fashion using a rotary stage and a controller and the trajectory was

estimated based on the proposed algorithms. The RMS errors in the trajectories were

calculated based on the actual trajectory, which were found to be few centimeters for several

meters of camera motion.

• A comparison was provided for trajectory estimation on a concrete and patterned surface

based on known camera motion. It was shown that low RMS errors were obtained in the

trajectory and azimuthal rotation on a patterned surface as compared to the concrete surface,

which proves that the structure of the patterned surface adds information to improve the

performance of trajectory estimation.

135

Centimeter level accuracy in the experiments involving two cameras moving together, low

standard deviation in the back projection errors and few centimeters of RMS errors for several

meters of camera motion in the turntable experiment all indicate the high accuracy of the

proposed algorithms.

5.2 Future Work

The egomotion algorithms discussed in this thesis can be improved in performance to provide

better and robust trajectory estimation. Some potential future work includes:

• As discussed in Chapter 1, CV based algorithms provide good trajectory estimation for short

range trajectories but they are subject to long term drift. However, GNSS based observations

and other wireless signals provide high performance for long range with less drift issues. As

a result, integrating CV observables with GNSS might be an important step towards

attaining a very high accuracy in indoor navigation.

• Another important task that could be performed in the future is the integration of the CV

observables and wireless signals with data from inertial measurements for data fusion of

indoor location observables.

• This research focused on performing indoor navigation based on the features from a planar

surface. The algorithm proposed here could be extended to curved surfaces, steps of multiple

levels and intersecting walls.

• For the planar structures like the one shown in Figure 5.1, where the pattern of the surface

does not repeat itself, it is possible to estimate the absolute position of the camera.

136

Figure 5.1 Example of planar structure not repeating its pattern

Repeating tiles on a planar surface are ambiguous in terms of absolute camera location but

with this arrangement of tiles that never repeat their pattern, the ambiguity in the absolute

camera location can be mitigated. Hence, the use of local patterns to determine the camera

position is an interesting task that could be implemented in the future based on the

algorithms proposed herein.

• The trajectory, estimated using the proposed algorithm, could be used to perform beam

forming under line of sight conditions.

137

References

[1] G. Lachapelle, "GNSS Indoor Location Technologies," J. Global Position Systems, vol. 3,

nos. 1-2, pp. 2-11, 2004.

[2] F. Van Diggelen, “Indoor GPS Theory & Implementation”, Position Location and Navigation

Symposium, 2002 IEEE, pages 240-247, 2002.

[3] G. Dedes and A. G. Dempster, "Indoor GPS Positioning: Challenges and Opportunities," in

Proceedings of the IEEE Semiannual Vehicular Technology Conference. 2005.

[4] Van Diggelen and F. S. Tromp, “A-GPS: Assisted GPS, GNSS, and SBAS”, Artech House,

2009.

[5] R. Bajaj, S. L. Ranaweera, and D. P. Agrawal, "GPS: location-tracking

technology", Computer 35.4 (2002): 92-94.

[6] G. M. Djuknic and R. E. Richton, "Geolocation and assisted GPS", IEEE Computer, vol.

34, no. 2, pp.123 -125 2001.

[7] M. Weyn and F. Schrooyen, "A Wi-Fi Assisted GPS Positioning Concept", ECUMICT 08,

Gent, Belgium, March 2008.

[8] H. Liu, H. Darabi , P. Banerjee and J. Liu, "Survey of wireless indoor positioning techniques

and systems", Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE

Transactions on 37.6 (2007): 1067-1080.

[9] M. Kanaan and K. Pahlavan, "A comparison of wireless geolocation algorithms in the indoor

environment", Wireless Communications and Networking Conference, 2004. WCNC. 2004

IEEE. Vol. 1. IEEE, 2004.

[10] K. Kaemarungsi, and P. Krishnamurthy, "Modeling of Indoor Positioning Systems Based on

Location Fingerprinting", in Proc. IEEE INFOCOM, May 2004.

138

[11] Z. Zhang, "Estimating motion and structure from correspondences of line segments between

two perspective images", Pattern Analysis and Machine Intelligence, IEEE Transactions

on 17.12 (1995): 1129-1139.

[12] S. Scharer, J. Baltes and J. Anderson, "Practical ego-motion estimation for mobile

robots", Robotics, Automation and Mechatronics, 2004 IEEE Conference on. Vol. 2. IEEE,

2004.

[13] K. Yamaguchi, T. Kato, and Y. Ninomiya, "Vehicle ego-motion estimation and moving

object detection using a monocular camera", Pattern Recognition, 2006. ICPR 2006. 18th

International Conference on. Vol. 4. IEEE, 2006.

[14] S. W. Yang and C. C. Wang, "Multiple-model RANSAC for ego-motion estimation in

highly dynamic environments", Robotics and Automation, 2009. ICRA'09. IEEE International

Conference on. IEEE, 2009.

[15] S. Se, D. Lowe and J. Little, "Vision-based mobile robot localization and mapping using

scale-invariant features", Robotics and Automation, 2001. Proceedings 2001 ICRA. IEEE

International Conference on. Vol. 2. IEEE, 2001.

[16] A. Milella and R. Siegwart, "Stereo-based ego-motion estimation using pixel tracking and

iterative closest point", Computer Vision Systems, 2006 ICVS'06. IEEE International

Conference on. IEEE, 2006.

[17] A. Dev, B. Krose, and F. Groen, "Navigation of a mobile robot on the temporal

development of the optic flow", Intelligent Robots and Systems, 1997. IROS'97, Proceedings of

the 1997 IEEE/RSJ International Conference on. Vol. 2. IEEE, 1997.

[18] H. W. Sorenson, "Least-squares estimation: from Gauss to Kalman", Spectrum, IEEE 7.7

(1970): 63-68.

139

[19] G. Welch and G. Bishop, "An introduction to the Kalman filter" (1995).

[20] D. Simon, "Kalman filtering", Embedded Systems Programming 14.6 (2001): 72-79.

[21] E. Trucco and A. Verri, “Introductory techniques for 3-D computer vision”, Vol. 201.

Englewood Cliffs: Prentice Hall, 1998.

[22] Y. Ma. (Ed.), “An invitation to 3-d vision: from images to geometric models”, Vol. 26.

springer, 2004.

[23] J. S. Zelek, M. Holbein, K. Hajebi, D. C. Asmar and D. Cheng, "IR depth from stereo for

autonomous navigation", Defense and Security. International Society for Optics and Photonics,

2005.

[24] C. Harris and M. Stephens, “A combined corner and edge detector”, Alvey vision

conference, pp. 147-151, 1988.

[25] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International

Journal of Computer Vision, pp. 91–110, 2004.

[26] J. Shi and C. Tomasi, "Good features to track", Computer Society Conference on Computer

Vision and Pattern Recognition, 1994, pp. 593-600, 1994.

[27] G. Bradski and A. Kaehler, “Learning OpenCV: Computer vision with the OpenCV

library”, O'Reilly Media, Inc., 2008.

[28] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application

to stereo vision", Proceedings of the 7th international joint conference on Artificial intelligence,

pp. 674-679, 1981.

[29] J. Y. Bouguet, "Pyramidal implementation of the affine Lucas Kanade feature tracker:

description of the algorithm", Intel Corporation, 2001.

140

[30] F. Devernay and O. Faugeras. "Straight lines have to be straight." Machine vision and

applications 13.1 (2001): 14-24.

[31] J. Park, S. C. Byun and B. U. Lee, "Lens distortion correction using ideal image

coordinates", Consumer Electronics, IEEE Transactions on 55.3 (2009): 987-991.

[32] G. Medioni and S B Kang, “Emerging topics in computer vision”, Prentice Hall PTR, 2004.

[33] “Camera Calibration using OpenCV” to obtain the calibration code and algorithm, online

link http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html.

[34] S. Krig, "Image Pre-Processing", Computer Vision Metrics. Apress, 2014. 39-83.

[35] O. R. Vincent and O. Folorunso, "A descriptive algorithm for sobel image edge

detection", Proceedings of Informing Science & IT Education Conference (InSITE). 2009.

[36] J. Canny, "A computational approach to edge detection", Pattern Analysis and Machine

Intelligence, IEEE Transactions on 6 (1986): 679-698.

[37] L. G. Roberts, “MACHINE PERCEPTION OF THREE-DIMENSIONAL soups”, PhD diss.

Massachusetts Institute of Technology, 1963.

[38] J. M. S. Prewitt, "Object enhancement and extraction", Picture processing and

Psychopictorics 10.1 (1970): 15-19.

[39] J. S. Weszka, R. N. Nagel and A. Rosenfeld, "A threshold selection technique", Computers,

IEEE Transactions on 100.12 (1974): 1322-1326.

[40] N. Otsu, "A threshold selection method from gray-level histograms", Automatica 11.285-

296 (1975): 23-27.

[41] T. Pun, "Entropic thresholding, a new approach", Computer Graphics and Image

Processing 16.3 (1981): 210-239.

141

[42] P. V. C. Hough, "Method and means for recognizing complex patterns", U.S. Patent No.

3,069,654. 18 Dec. 1962.

[43] D. H. Ballard, "Generalizing the Hough transform to detect arbitrary shapes", Pattern

recognition 13.2 (1981): 111-122.

[44] R. O. Duda, and P. E. Hart, "Use of the Hough transformation to detect lines and curves in

pictures", Communications of the ACM 15.1 (1972): 11-15.

[45] C. Golban, C. Mitran and S. Nedevschi. "A practical method for ego vehicle motion

estimation from video", Intelligent Computer Communication and Processing, 2009. ICCP 2009.

IEEE 5th International Conference on. IEEE, 2009.

[46] S. M. Kay, “Fundamentals of Statistical signal processing: Estimation Theory”, Vol 1,

Prentice Hall PTR, 1993.

[47] P. J. Hargrave, "A tutorial introduction to Kalman filtering", Kalman Filters: Introduction,

Applications and Future Developments, IEE Colloquium on. IET, 1989.

[48] C. Nielsen, and J. Nielsen, "Robust 6DOF Ego-Motion Estimation for Handheld Indoor

Positioning", International Conference on Image Processing, Computer Vision and Pattern

Recognition, 2012.

ucalgary 2014 dawar neha · 2017. 6. 27. · neha dawar a thesis ... cv based 6dof trajectory...

Documents