Pose estimation from one conic correspondence
7/25/2019 Pose estimation from one conic correspondence
http://slidepdf.com/reader/full/pose-estimation-from-one-conic-correspondence 1/110
Pose estimation from one conic correspondence
by
Snehal I. Bhayani
201211008
A Thesis Submitted in Partial Fulfilment of the Requirements for the Degree of
MASTER OF TECHNOLOGY
in
INFORMATION AND COMMUNICATION TECHNOLOGY
to
DHIRUBHAI AMBANI INSTITUTE OF INFORMATION AND COMMUNICATION TECHNOLOGY
June, 2014
Declaration
I hereby declare that
i) the thesis comprises my original work towards the degree of Master of
Technology in Information and Communication Technology at Dhirubhai
Ambani Institute of Information and Communication Technology and has
not been submitted elsewhere for a degree,
ii) due acknowledgment has been made in the text to all the reference material
used.
Snehal Bhayani
Certificate
This is to certify that the thesis work entitled Pose estimation from one conic correspondence has been carried out by Snehal I. Bhayani for the degree of Master of Technology in Information and Communication Technology at Dhirubhai Ambani Institute of Information and Communication Technology under my supervision.
Prof. Aditya Tatu
Thesis Supervisor
Acknowledgments
Is this what you ask,
or is this your answer?
Pray illuminate, for I have questions more and more
after you answer each one of these,
and more... ;
there is no learning save for when I stumble,
pray illuminate, for I have questions more.
Of the swathe of acknowledgments, the first one goes out to my supervisor
Prof. Aditya Tatu. His insights helped me not only achieve crucial progress, but
also see different and interesting perspectives on problems at various points during
my thesis work. His guidance, from what, when and where to keep notes (in LaTeX),
to what we should infer from which experiment, has been paramount in molding
my work in a comprehensive manner. With his knowledge of mathematics and his
sheer interest in the same, he has helped me get over my points of confusion
numerous times.
I would like to acknowledge Prof. Manjunath Joshi for his guidance on camera
calibration tools and approaches. I would also like to acknowledge Prabhunath
sir, for his instantaneous help in creating a virtual setup for camera calibration. A
special thanks to my friend Haritha, whose swanky DSLR camera let me have
the best possible images of calibration patterns for days on end. I would like to
thank all of my friends, colleagues and classmates, who put up with my changed
self while I worked and put up with my other self while I was not working and
they were. And last but not least, a special thanks to my parents and my sister
for their constant support and care all along.
Contents
Abstract vi
List of Principal Symbols and Acronyms viii
List of Tables ix
List of Figures x
1 Introduction 1
1.1 Two camera setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 General assumptions . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Introduction to problem of pose estimation . . . . . . . . . . . . . . 8
1.3 Background work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Our contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Layout of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Epipolar geometry 13
2.1 Introduction to epipolar geometry . . . . . . . . . . . . . . . . . . . 13
2.1.1 Geometric definition of homography between two images . 14
2.1.2 Algebraic definition of epipolar mapping . . . . . . . . . . . 15
2.1.3 Some properties of the fundamental matrix, F . . . . . . . . 16
2.1.4 Question on homography generated in a one camera setup . 17
2.1.5 Question on homography generated in a two camera setup 20
2.2 Conics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Quadrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Geometric approach to pose estimation from one conic correspondence 28
3.1 Dependence of pose on conic correspondence and vector defining
the scene plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Conic correspondence . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Mathematical implication of the first assumption on scene plane π . . 32
3.4 Estimating R and t through geometric construction . . . . . . . . . 37
3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5.1 Experiments for geometric approach on synthetic data . . . 47
3.5.2 Experiments for geometric approach on real data . . . . . . 50
3.5.3 Experiment of geometric approach on part real and part synthetic dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4 Alternate approaches to pose estimation 54
4.1 Estimating R and t through optimization . . . . . . . . . . . . . . . 54
4.1.1 Results and discussion . . . . . . . . . . . . . . . . . . . . . . 56
4.2 Multi-stage approach to pose estimation: a comparison . . . . . . . 57
4.2.1 Optimizing the cost function . . . . . . . . . . . . . . . . . . 60
4.2.2 Results and discussion . . . . . . . . . . . . . . . . . . . . . . 61
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5 Conclusion and future work 63
References 66
Appendix A Basics of projective geometry 70
A.1 Affine Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
A.1.1 Affine spaces . . . . . . . . . . . . . . . . . . . . . . . . . 70
A.1.2 Basis of affine spaces . . . . . . . . . . . . . . . . . . . . . 72
A.1.3 Affine morphism . . . . . . . . . . . . . . . . . . . . . . . . . 73
A.1.4 Affine subspaces . . . . . . . . . . . . . . . . . . . . . . . . . 74
A.1.5 Invariants of Affine morphism . . . . . . . . . . . . . . . . . 75
A.2 Projective Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
A.2.1 Definition of a projective space . . . . . . . . . . . . . . . . . 77
A.2.2 Basis of a projective space . . . . . . . . . . . . . . . . . . . . 78
A.2.3 Projective transformation . . . . . . . . . . . . . . . . . . . . 79
A.2.4 Projective subspaces . . . . . . . . . . . . . . . . . . . . . . . 81
A.2.5 Affine completion . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.2.6 Action of Homographies on subspaces and study of invariants 85
A.2.7 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
A.2.8 Homography as a perspective projection between two pro-
jective lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
A.2.9 Homography between two planes . . . . . . . . . . . . . . . 89
Appendix B Camera models and camera calibration 92
B.1 Finite Camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
B.1.1 Elements of a finite projective camera . . . . . . . . . . . . . 93
B.2 Infinite Camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
B.3 Camera calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Appendix C Some miscellaneous proofs 96
Abstract
In this thesis we attempt to solve the problem of camera pose estimation from one
conic correspondence by exploiting epipolar geometry. For this we make two
important assumptions which simplify the geometry further: (a) the scene conic is a
circle, and (b) the translation vector is contained in a known plane. These two
assumptions are justified by noting that many artifacts in scenes (especially indoor
scenes) contain circles, which are wholly in front of the camera. Additionally, there
is a good possibility that the plane which contains the translation vector would be
known. Through the epipolar geometry framework, a matrix equation is defined which
relates the camera pose to one conic correspondence and the normal vector defining
the scene plane. Through the assumptions, we simplify the system of polynomials in
such a way that the task of solving a set of seven simultaneous polynomials in seven
variables is transformed into the task of solving only two polynomials in two
variables at a time. For this we design a geometric construction. This method gives
a set of finitely many camera pose solutions. We test our propositions through
synthetic datasets and suggest an observation which helps in selecting a unique
solution from the finite set of pose solutions. For the synthetic dataset, the solution
so obtained is quite accurate, with an error of the order of 10^-4, while for real
datasets the solution is erroneous due to errors in the camera calibration data we
have; we justify this fact through an experiment. Additionally, the formulation of the
above mentioned seven equations relating the pose to the conic correspondence and
the scene plane position helps us understand how the relative pose establishes point
and conic correspondences between the two images. We then compare the performance
of our geometric approach with the conventional way of optimizing a cost function,
and show that the geometric approach gives us more accurate pose solutions.
List of Principal Symbols and Acronyms
α, β, γ : Real valued scalars.
En,Rn : n dimensional vector spaces over R. The basics of projective geometry
which we shall refer to, are mostly written in a language that uses the no-
tation En to denote a real vector space. The rest of our work shall use the
usual Rn notation.
P(En+1) : n-dimensional projective space over R. As mentioned in appendix
(A.2), the underlying vector space is En+1.
p : En+1 \ {0n+1} → P(En+1) : Canonical projection from the vector space En+1 to P(En+1),
as explained in section (A.2.2).
f : P(En+1) → P(En+1) : A bijective mapping of points of P(En+1) onto itself such
that the underlying linear map f⃗ : En+1 → En+1 is an isomorphism. In the
literature this is termed a projective morphism or a projective transformation.
GL(n) : Set of n × n real invertible matrices.
E : The calibrated counterpart of F that relates the two images in the same way,
except for the fact that the calibration matrices are known and fixed.
K : A 3 × 3 camera calibration matrix that has the intrinsic parameters of the
camera, K ∈ GL(3).
H : A homography or projective morphism over a projective plane, f : P(En+1) → P(En+1)
where n = 2. This mapping is denoted by a matrix H ∈ GL(3).
cam(O, π , K ) : A camera unit with centre O, image plane π and calibration matrix
K .
F : A fundamental matrix defined in a framework of epipolar geometry, [1].
[x]× : The skew-symmetric matrix defining the vector cross-product of x. If
x = [x1 x2 x3]^T, then

[x]× = [  0   −x3   x2 ]
       [  x3   0   −x1 ]
       [ −x2   x1   0  ].
In×n or In : The n × n identity matrix.
xn×1 or xn : An n-dimensional column vector x.
‖A‖F : The Frobenius norm of the matrix A.
List of Tables
3.1 Results of single stage geometric approach on synthetic dataset.
Here Rtrue and ttrue denote true values, and R and t denote the pose
solution obtained through convergence of the gradient descent scheme. 49
3.2 Results of single stage geometric approach on real dataset . . . . . 51
3.3 Result of part real data for investigating the error due to erroneous
calibration matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.1 Result of gradient descent approach on synthetic data. Here Rinit
and tinit denote starting points, Rtrue and ttrue denote true values,
and R and t denote the pose solution obtained through convergence
of the gradient descent scheme. . . . . . . . . . . . . . . . . . . . . 56
List of Figures
1.1 A setup describing epipolar geometry . . . . . . . . . . . . . . . . . 3
2.1 Epipolar geometry or the geometry of two views . . . . . . . . . . . 13
2.2 Epipolar plane drawn for epipolar geometry . . . . . . . . . . . . . 15
2.3 A one camera setup and its question . . . . . . . . . . . . . . . . . . 18
2.4 Geometric description of Poncelet's theorem, figure from [2]. . . . . 20
2.5 Conic correspondence through a projective transformation. . . . . . 24
2.6 A cone with its apex at origin, image from [1]. . . . . . . . . . . . . 25
3.1 Two cones, Q1 and Q2, describing a conic correspondence. . . . . . 39
3.2 Rigid body motion of the cone Q2 onto Q2. . . . . . . . . . . . . . . 40
3.3 A diagram describing the geometric construction. . . . . . . . . . . 43
3.4 Pose solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 First test image containing conic C1 . . . . . . . . . . . . . . . . . . . 51
3.6 Second test image containing conic C2 . . . . . . . . . . . . . . . . . 51
3.7 Difference between the two conics of real and synthetic datasets . . 52
A.1 Commutativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
A.2 Associativity of perspective projections . . . . . . . . . . . . . . . . 90
A.3 An example of a homography between two projective planes l and
m due to perspectivity . . . . . . . . . . . . . . . . . . . . . . . . . . 90
C.1 Two series of circular cross-sections in circular cone, figure from [3]. 98
CHAPTER 1
Introduction
This thesis deals with one form of the pose estimation problem as defined by the
computer vision community. In rudimentary terminology, this form of pose
estimation can be stated as the estimation of the relative orientation between two
camera positions, in a Euclidean coordinate system, from where the given scene has been imaged.
Haralick in [4] introduces four classes of pose estimation problems as given next:
1. 2D-2D pose estimation problem: We are given two-dimensional coordinate
observations from N observed images: x1, ..., xN. These could correspond,
for example, to the observed center positions of all observed objects. We are
also given the corresponding (or matching) N two-dimensional coordinate
vectors from the model: y1, ..., yN. The rotation and translation in the 2D plane
that relate these two sets of observations are to be estimated. In other words,
we have to determine the rotation matrix R and the translation vector t such
that the least-squares error

ε² = Σ_{n=1}^{N} w_n ‖y_n − (R x_n + t)‖²,   (1.1)

is minimized, where w_n represents the weight of the contribution to the error by
the nth point correspondence.
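For a concrete illustration, the objective of equation (1.1) has a closed-form minimizer in 2D: center both point sets at their weighted centroids, then solve for the optimal rotation angle. The sketch below is our own NumPy code (the function name and test data are not from any cited work):

```python
import numpy as np

def pose_2d(X, Y, w):
    """Closed-form minimizer of sum_n w_n * ||y_n - (R x_n + t)||^2 in 2D.

    X, Y are N x 2 arrays of model points and observations, w the weights.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    cx, cy = w @ X, w @ Y              # weighted centroids
    Xc, Yc = X - cx, Y - cy
    # The optimal angle maximizes cos(th)*sum(w x.y) + sin(th)*sum(w cross(x, y))
    dot = np.sum(w * np.sum(Xc * Yc, axis=1))
    crs = np.sum(w * (Xc[:, 0] * Yc[:, 1] - Xc[:, 1] * Yc[:, 0]))
    th = np.arctan2(crs, dot)
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    t = cy - R @ cx                    # translation follows from the centroids
    return R, t
```

With noise-free correspondences this recovers R and t exactly; with noise it returns the weighted least-squares estimate.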
2. 3D-3D pose estimation problem: Let N 3D-coordinate observations be given
as y1, ..., yN, matching the corresponding 3D coordinates x1, ..., xN.
Each observation yn is said to be a rigid body motion of the corresponding
observation xn in R3 space. They are related as

y_n = R x_n + t + η_n,  ∀n, 1 ≤ n ≤ N.
The pose estimation in this case is defined to be the estimation of R and t
that minimizes the error

ε² = Σ_{n=1}^{N} w_n ‖y_n − (R x_n + t)‖².
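In three dimensions this weighted objective is minimized by the classical SVD-based (Kabsch) construction. The sketch below is a standard implementation of that well-known solution, not a method from this thesis:

```python
import numpy as np

def pose_3d(X, Y, w):
    """Weighted Kabsch: R, t minimizing sum_n w_n * ||y_n - (R x_n + t)||^2.

    X, Y are N x 3 arrays of corresponding 3D points, w the weights.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    cx, cy = w @ X, w @ Y                   # weighted centroids
    Xc, Yc = X - cx, Y - cy
    H = Xc.T @ (w[:, None] * Yc)            # H = sum_n w_n x_c,n y_c,n^T
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cy - R @ cx
    return R, t
```

The SVD step picks the rotation maximizing tr(R H), which is exactly the minimizer of the error above once both sets are centered.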
3. 2D perspective-3D pose estimation problem: Let N 3D coordinate observations
be given as y1, ..., yN, matching the corresponding 2D coordinates
x1, ..., xN, with x_j = [u_{j1} u_{j2}]^T. The exact relationship is given as

u_{j1} = f (r1 · y_j + t1) / (r3 · y_j + t3),
u_{j2} = f (r2 · y_j + t2) / (r3 · y_j + t3),
t = (t1, t2, t3)^T,
R = [r1; r2; r3],   (1.2)

where f is the focal length, that is, the distance of the image plane in front of
the origin (the center of perspectivity), and r1, r2 and r3 are the rows of
the rotation matrix R. The problem of pose estimation of this kind is
to estimate R and t when a set of correspondences between the 3D points
and the perspective 2D points is given. This problem is termed the exterior
orientation problem in the photogrammetry literature.
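Equation (1.2) translates directly into code. The short sketch below (function name and sample values are ours) projects a set of 3D model points given R, t and f:

```python
import numpy as np

def project(R, t, f, Y):
    """Project N x 3 model points Y through eq. (1.2)."""
    r1, r2, r3 = R                 # rows of the rotation matrix
    t1, t2, t3 = t
    depth = Y @ r3 + t3            # denominator: distance along the optical axis
    u1 = f * (Y @ r1 + t1) / depth
    u2 = f * (Y @ r2 + t2) / depth
    return np.stack([u1, u2], axis=1)
```

For instance, with R = I, t = (0, 0, 5)^T and f = 1, the point (1, 2, 0) projects to (0.2, 0.4).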
4. 2D perspective-2D perspective pose estimation problem: This is perhaps the
most difficult of the pose estimation problems. Here we do not have the 3D
world coordinates. Instead we have two images, or perspective projections,
of the same object. Alternatively, one can assume the object to be moving and the
perspective projection device to be fixed. A pin-hole camera model is one
such theoretical device and is of interest to us. As a setup, we have a scene
and its image from two distinct positions of the camera (or, as stated, we
have one camera and we take images of a scene which has undergone
a rigid body motion). Then, with point correspondences between these two
perspective projections, one has to estimate the rigid body motion that the
scene has undergone. This is the statement of the 2D perspective-2D perspective
pose estimation problem.
Of the four classes above, our work is about the fourth type, the 2D perspective-2D
perspective pose estimation problem. This approach requires an overview of a two
camera setup. Hence, before we go further, a general arrangement of the two camera
setup is introduced in the next section (1.1). The mathematical spaces considered
throughout this report will be Euclidean spaces¹, unless specified otherwise.
1.1 Two camera setup
The purpose of introducing such an arrangement is two-fold. Firstly, it introduces
the various feature artifacts which will be used for establishing correspondences
between two images (like points, lines, conics, etc.) and the varying mathematical
relationships amongst them; secondly, the same framework sets up the idea of
multiple-view geometry (termed epipolar geometry by Hartley and Zisserman in [1]).
Figure 1.1: A setup describing epipolar geometry
¹One may wonder why, though we are dealing with projective spaces, the spaces considered here are not projective. The reason is, as shown in section (A.2.5), that a projective space is obtained by "adding" points at infinity to an affine space (here we can consider a Euclidean space as an affine space with origin at the point [0, 0, 0]^T). For practical purposes we assume the points we deal with are "not at infinity". Hence the projective space reduces to an affine (or Euclidean) space.
1.1.1 General assumptions
A two camera setup is depicted in figure (1.1). Here a pin-hole camera is decomposed
into a projection center O (a point in R3), an image plane π and a calibration
matrix K. This model is mainly of theoretical interest, but for our application
this highly simplified model works well enough that we can ignore various practical
issues in a camera model. Such a camera model shall be denoted as cam(O, π, K).
The calibration matrix houses the quantities that determine the relation between the
position of a point x ∈ π in the 2D image coordinate system and its position in the
3D global coordinate system of the camera. Let O⊥ be the intersection with π of the
line through O perpendicular to π. Then the matrix K gives us the distance of the
plane π from the center O and the position of the point O⊥ in the local coordinate
system of π. More on the structure of K can be read in appendix (B.3).
As shown in the figure, we have a pair of cameras cam(O1, π1, K) and cam(O2, π2, K)
with their centers at points O1 and O2 in R3. The calibration matrices are the same
for both cameras. The image planes associated with cameras O1 and O2 are π1 and π2
respectively. Now, a quadratic curve is defined as the zero set of a second order
polynomial

Ax² + By² + Cxy + Dx + Ey + F = 0.
This polynomial can be written in matrix form as

[x y 1] [  A   C/2  D/2 ] [x]
        [ C/2   B   E/2 ] [y]  = 0.   (1.3)
        [ D/2  E/2   F  ] [1]
Using dual notation, we shall henceforth use the same symbol C for a quadratic
curve and for the matrix representation of its defining polynomial. For the above
defined curve, C means the matrix

[  A   C/2  D/2 ]
[ C/2   B   E/2 ]
[ D/2  E/2   F  ]

and also the set of points defined by the solutions of equation (1.3). In the
computer vision community such a quadratic curve is termed a conic.
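The dual notation is easy to exercise numerically: build the matrix C from the polynomial coefficients and test whether a point satisfies equation (1.3). A small sketch (function names are ours, chosen for illustration):

```python
import numpy as np

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix of the conic A x^2 + B y^2 + C xy + D x + E y + F = 0."""
    return np.array([[A,     C / 2, D / 2],
                     [C / 2, B,     E / 2],
                     [D / 2, E / 2, F    ]])

def on_conic(Cm, x, y, tol=1e-9):
    """A point lies on the conic iff [x y 1] Cm [x y 1]^T = 0 (eq. 1.3)."""
    p = np.array([x, y, 1.0])      # homogeneous coordinates of the point
    return abs(p @ Cm @ p) < tol
```

For the unit circle x² + y² − 1 = 0 (A = B = 1, F = −1, the rest zero), the point (1, 0) lies on the conic while (2, 0) does not.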
The conics in the two image planes are assumed to be C1 in π1 and C2 in π2.
Further, let a third plane π be oriented in R3 space, containing the scene
conic C such that its images under the two cameras are C1 and C2. For
a general orientation we assume that neither O1 nor O2 lies on the plane π. This
arrangement of the three planes π, π1 and π2 constructs a special bijective mapping
between each pair of these projective planes, known as a homography in computer
vision terminology. Without getting into details, we mention that a homography
is a bijective mapping between two projective planes such that projective lines
are mapped to projective lines. The precise definition and properties of homographies
between two planes are described in sections (A.2.3) and (A.2.9) of appendix (A).
From these definitions we note that such a mapping can be represented by a real
invertible matrix H, unique up to a non-zero scalar multiple. The point mapping
between the two projective planes π1 and π2 is then defined as

Hx = y,  x ∈ π1, y ∈ π2,

where x and y are homogeneous representations of points of the projective planes.
The matrix H shall henceforth represent such a homography between two projective
planes. As mentioned before, the arrangement of the three planes π, π1 and π2
constructs homographies between π and π1, between π and π2, and between π1 and π2,
as shown below:

H1 : π1 → π,  H2 : π2 → π  and  H : π1 → π2,

where

H = H2^{-1} H1.
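The composition H = H2^{-1} H1 can be sanity-checked numerically: mapping a point through H1 and then through H2^{-1} must agree with mapping it through H directly. The matrices below are hypothetical, chosen only for illustration:

```python
import numpy as np

def map_point(H, x):
    """Apply a homography to a 2D point via homogeneous coordinates."""
    p = H @ np.array([x[0], x[1], 1.0])
    return p[:2] / p[2]              # de-homogenize

# Hypothetical homographies H1 : pi_1 -> pi and H2 : pi_2 -> pi
H1 = np.array([[1., 0., 2.],
               [0., 1., 3.],
               [0., 0., 1.]])
H2 = np.array([[2., 0., 0.],
               [0., 2., 0.],
               [0., 0., 1.]])
H = np.linalg.inv(H2) @ H1           # composed homography pi_1 -> pi_2
```

Mapping (0.5, −1) through H1 gives (2.5, 2); applying H2^{-1} halves it to (1.25, 1), which matches applying H directly.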
Contrary to what we assumed when defining homographies above, we assume that
the three planes π, π1 and π2 are represented as Euclidean planes rather than
projective planes. A practical application would mostly have the cameras at finite
locations, and the projective point representing a finite camera center is
uniquely identified by its corresponding Euclidean (or affine) counterpart. Even
if there is a point at infinity in P(E4) which is imaged to a finite point in the
image plane, we treat such points as special cases of parallelism. Further, the
points in P(E4) which are imaged to points at infinity on the image planes are the
points lying on the principal plane², [1]. But for practical situations we do not
consider those parts of the scene that lie on the same side of the image plane as
the camera center. This means that points on the principal plane will not be
imaged, which implies that the points in the scene in front will never be mapped
²A principal plane of a camera is the plane parallel to the image plane and passing through the camera center.
conditions for point correspondence x ↔ y:

y^T E x = 0   (1.5)

if and only if x and y are the images of the same scene point⁴. Another matrix,
discussed and taken up subsequently by noted researchers, is termed the fundamental
matrix. This matrix is the uncalibrated counterpart of the essential matrix:

E = K^T F K.

This means the same point correspondence is defined, but the point measurements
do not need the calibration matrix to be known. A detailed explanation and
treatment of both these matrices can be found in the textbook by Hartley and
Zisserman, [1]. These equations form the backbone of our thesis.
The relative orientation of plane π1 with respect to plane π2 in E3 is assumed to
be a rotation R and translation t. These quantities are such that when a point y ∈ π2
is rotated and translated through R and t, we get the corresponding
point x ∈ π1 as x = Ry + t. Thus, in the figure given above, if O1 is at the origin,
O1 = [0 0 0]^T, then O2 = −R^T t. The points of intersection of the line O1O2
with the planes π2 and π1 are known as the epipoles e1 and e2 of cameras 2 and 1
respectively. The essential matrix, as introduced above, can be decomposed in terms
of R and t, [5]:

E = [t]× R.

The fundamental matrix in terms of R and t is decomposed as [1]:

F = [e2]× H.
In lemma (1) in chapter (3), we prove the following relationship between the pose
parameters R, t and the variables of epipolar geometry, H and e:

t = λ K^{-1} e,
R = λ^{-1} (K^{-1} H K + K^{-1} e v^T K),   (1.6)

where R, t, e and K have their usual meanings, λ is a real scaling factor, and v
represents the position of the scene plane.
⁴x and y are measured in the image planes, assuming that the cameras are calibrated.
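The decomposition E = [t]×R and the epipolar constraint of equation (1.5) can be verified on a synthetic configuration. The sketch below assumes the convention X1 = R X2 + t between the two calibrated camera frames (one common choice; sign conventions vary across the literature), and the values of R, t and the scene point are made up:

```python
import numpy as np

def skew(x):
    """[x]_x : skew-symmetric matrix so that skew(x) @ y == np.cross(x, y)."""
    return np.array([[0.,   -x[2],  x[1]],
                     [x[2],  0.,   -x[0]],
                     [-x[1], x[0],  0.]])

th = 0.1                                      # hypothetical relative rotation angle
R = np.array([[np.cos(th), -np.sin(th), 0.],
              [np.sin(th),  np.cos(th), 0.],
              [0.,          0.,         1.]])
t = np.array([1., 0., 0.2])                   # hypothetical translation
E = skew(t) @ R                               # essential matrix, E = [t]_x R

X1 = np.array([0.3, -0.2, 4.0])               # scene point in camera-1 frame
X2 = R.T @ (X1 - t)                           # same point in camera-2 frame
x1, x2 = X1 / X1[2], X2 / X2[2]               # normalized image points
# the epipolar constraint x1^T E x2 = 0 now holds exactly (eq. 1.5)
```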
1.2 Introduction to problem of pose estimation
With the above setup in mind, one can now define pose estimation through
mathematical quantities. Considering the camera centers O1 and O2, the translation
vector t is defined as

t = O⃗1 − R O⃗2,   (1.7)

where O⃗1 and O⃗2 are the vector representations of the points O1 and O2 in R3.
R is the rotation matrix which maps the image plane π2 to a position parallel to
that of π1 upon rotation. In other words, if π1 is defined as u1^T x + 1 = 0,
∀x ∈ R3, and π2 as u2^T x + 1 = 0, ∀x ∈ R3, then R can be estimated as the rotation
taking the unit vector u2/‖u2‖ to the unit vector u1/‖u1‖.
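One standard way to obtain a rotation taking the unit vector u2/‖u2‖ to u1/‖u1‖ is Rodrigues' formula. The sketch below implements that generic formula (it is not the thesis's construction); note that such an R is unique only up to a further rotation about the target vector, so aligning the plane normals does not fix the in-plane rotation:

```python
import numpy as np

def rotation_between(a, b):
    """A rotation matrix R with R @ (a/|a|) == b/|b| (Rodrigues' formula)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)                 # rotation axis (unnormalized), |v| = sin(angle)
    c = a @ b                          # cosine of the rotation angle
    if np.isclose(c, -1.0):
        raise ValueError("antiparallel vectors: the rotation axis is not unique")
    K = np.array([[0., -v[2], v[1]],
                  [v[2], 0., -v[0]],
                  [-v[1], v[0], 0.]])
    # R = I + K + K^2 * (1 - c)/sin^2 = I + K + K^2 / (1 + c)
    return np.eye(3) + K + (K @ K) / (1.0 + c)
```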
Thus pose estimation in this thesis is defined to be the estimation of R and t, given
the two images or the two image planes π1 and π2. In this thesis, the assumption is
that the cameras are calibrated and the camera calibration matrix K is the same for
both cameras. More assumptions will follow in later chapters, some for accuracy
and some for simplicity.
1.3 Background work
The history of computer vision is rich and full of brilliant insights, and its
association with projective geometry is even richer. We have listed the four main
classes of the pose estimation problem along the lines of a paper by Haralick, [4].
In a later paper, Haralick et al. in [6] work on a similar kind of problem, but
exclusively in a Euclidean space, where they look at a closed form solution to pose
estimation from a set of three point perspective projections. This problem is of
the type defined as the 2D perspective-3D pose estimation problem in point (3).
Photogrammetry deals with these problems in detail, and they likewise have
applications in computer vision. Longuet-Higgins, in [5], introduced the essential
matrix of equation (1.5) primarily to tackle the problem of relative orientation, in
other words the pose estimation problem, and in the same paper gives an algorithm
to estimate R and t from E. A comprehensive study of the fundamental matrix and its
related treatment has been carried out by other researchers: Zhang in [7], and Luong
and Faugeras in [8]. For the second and more crucial part of the problem,
[5, 8, 7, 4, 9, 10] use point correspondences to estimate either the fundamental
matrix or the essential matrix. As against this, Heyden and Kahl in [11] have used
conic correspondences to estimate the fundamental matrix. The authors give a brief
survey of various features (like points, lines, curves and many more) used in the
past to estimate the fundamental matrix. They also state the reasons why conic
correspondences are preferred by certain researchers over the conventional point
and line correspondences. The primary motivation is the fact that many man-made
objects contain a curve which is either a conic or can be approximated as being
formed of conics. Another reason is a property of projective transformations: any
projective transformation maps a conic into another conic (also termed the
projective invariance property). A projective transformation is a pointwise mapping
between two projective spaces.⁵ Ji et al. in [12] have used a mix of various
geometric features, like points, lines and conics, to estimate the pose of a camera
with respect to the object coordinate frame. Towards the same objective, they have
considered a linear approach that combines geometric features at different levels
of complexity, thus improving the stability and accuracy of the solution. The
approach estimates the pose parameters from point correspondences, line
correspondences and 2D ellipse-3D circle correspondences. For the circle-ellipse
correspondence, they obtain two polynomials which define two constraints on the
relative pose. But the authors assume that the radius of the circle is known, and
the property of the circle as a conic section is not used completely, as the focus
is more on using as many feature correspondences as are available. The same problem
of 2D perspective-3D pose estimation is worked on by Wang in [13]. The approach
proposed in that paper amounts to estimating the pose of the camera from a single
view, under the assumption that the intrinsic camera parameters are known. The
approach uses the image of an absolute conic⁶ to estimate the pose of the camera.
An added assumption, that the image of the center of the 3D circle is known, is
employed for the minimal case where the image of only one circle is known. But it
has not been explicitly justified when such an assumption would hold true, though
some methods of estimating this image point have been suggested.
⁵The definition of a projective transformation in strict mathematical terms is given in appendix (A.2.3).
⁶The absolute conic is an imaginary conic at infinity that consists of purely imaginary points. The image of this conic is shown to depend only on the calibration matrix.
1.4 Our contribution
Our contribution primarily lies in an attempt to solve the problem from a slightly
different perspective. It has motivated two different approaches for pose estima-
tion. The first approach is based on the equation,
(R − tuT )T K T C2K (R − tuT ) − µK T C1K = 03×3,
where R, t, C1, C2 have their usual meanings, u ∈ R3 is the vector7 defining
the scene plane that contains the conic C and µ is a scaling factor introduced to
account for the homogeneous quantities, C1 and C2 in the equation. The above
equation is derived by combining epipolar geometry with one conic correspon-
dence. Intuitively this equation describes the relationship between the pose R, t
and the pair of conics in correspondence through the normal vector of scene plane,
u. This constraint can be further simplified if we assume that the conic C in the scene, whose images C1 and C2 are known, is a circle8, and that the translation vector lies in a specified plane (defined by a normal vector w). These assumptions reduce the number of unknown variables in the previous equation to give

(R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3},
w^T t = 0, (1.8)
where R, t and µ are to be estimated, and C1, C2, u, K and w are known. A straightforward way to solve the above equations is to write a gradient descent algorithm via explicit calculation of gradient vectors, or to use MATLAB's inbuilt functions for optimization on a cost function modeled from equation (1.8). Unfortunately any optimization method, in general, can get stuck in a local minimum, and through experiments on synthetic datasets, we have found that the algorithm does get stuck at points which are nowhere close to the true value. Such an experiment and its results are given in section (4.1) of chapter (4). A second problem is that there is no sure way of figuring out how many global minima our system of polynomials has. These facts make the behavior of the algorithm strongly dependent on the starting points of the parameters. An estimate closer to the true value helps the algorithm converge accurately to the true solution, but with a starting point quite far off, the solution achieved upon convergence is not close to the true value at all. To get around this problem, we design
7The vector u defines the plane through the plane equation x^T u + 1 = 0, ∀x ∈ R3.
8By requiring C to be a circle we mean a circle in the global coordinate system in R3.
a geometric construction9 through which one can estimate all possible pose solutions to a given problem. For this we transform the problem of estimating pose solutions through optimization of the cost function of equation (1.8) into a problem of finding solutions to two pairs of polynomials, with each pair depending on only two variables. The first pair consists of a degree-three and a degree-four polynomial, whereas the second pair consists of quadratic polynomials. These polynomials can be accurately solved using the symbolic computation toolbox available with MATLAB. The advantage here is that at a time we have only two polynomials in two variables to solve, which is a considerable improvement over the conventional optimization task of solving seven polynomials in seven variables simultaneously. This is the reason for the high accuracy our approach achieves. Further, by solving these polynomials we get the pose as a finite set of all possible solutions in the form of R and t. The process follows a geometric construction and does not need optimization, which in turn helps improve the accuracy of the results. The construction further improves our understanding of the above equation. The equation (1.8) relates the image and camera coordinate systems through
a conic correspondence. We propose a set of observations on how to pick one solution out of the finite set of all possible solutions obtained from this approach. We perform experiments on both real and synthetic data for this geometric approach to pose estimation. For synthetic datasets, we find that the pose solutions thus estimated are accurate to an error of the order of 10^{-4}. Especially for datasets with the rotation matrix close to the identity matrix, the observations help us select a solution which is closest to the true values. But the observations do not hold true for datasets with rotation matrices considerably far from the identity matrix. For such cases, we propose using one additional point correspondence, which is beyond the scope of this thesis. For real datasets the estimated pose solution is not accurate enough, but through a related experiment, we demonstrate that the error in the pose solution is primarily due to error in the camera calibration process.
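The algebraic form of the constraint in equation (1.8) can be sanity-checked numerically. The sketch below is an illustration only, assuming numpy is available: the pose, plane vector u, calibration matrix K, scale µ and conic C1 are all chosen arbitrarily, and C2 is constructed so that the matrix equation holds by definition; the check then confirms that the residual vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pose: a rotation about the z-axis and a small translation.
th = 0.3
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([[0.2], [0.1], [0.05]])
u = np.array([[0.0], [0.0], [-0.5]])      # scene plane x^T u + 1 = 0
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

M = R - t @ u.T                            # the factor (R - t u^T) of equation (1.8)
C1 = rng.standard_normal((3, 3))
C1 = (C1 + C1.T) / 2.0                     # an arbitrary symmetric conic matrix
mu = 1.7

# Construct C2 so that M^T K^T C2 K M = mu K^T C1 K holds exactly.
Ki, Mi = np.linalg.inv(K), np.linalg.inv(M)
C2 = mu * Ki.T @ Mi.T @ K.T @ C1 @ K @ Mi @ Ki

residual = M.T @ K.T @ C2 @ K @ M - mu * K.T @ C1 @ K
print(np.linalg.norm(residual))            # ~0 up to floating point error
```

This checks only the internal consistency of the matrix equation, not the geometric derivation; in the actual problem C1 and C2 come from measured image conics.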
1.5 Layout of thesis
In chapter (2) we introduce the basics of epipolar geometry. It deals with the setup
of a two camera system, but from a projective geometry point of view. The prerequisites of epipolar geometry are the projective, affine and euclidean spaces, whose
9For the time being, we consider only the euclidean coordinate system.
properties and definitions are well covered in appendix (A). The camera models
are covered in appendix (B) and camera calibration in appendix (B.3). The discussion in these two appendices follows the textbooks by Hartley and Zisserman [1] and by Trucco and Verri [14]. In chapter (3) we introduce and describe in detail the geometric approach to pose estimation from one conic correspondence with the two assumptions. Along with the discussion of the algebra and geometry behind the approach, we list the experiments performed on synthetic and real data, and infer certain points of merit and demerit of the proposed approach. In chapter (4) we take up two alternate methods of pose estimation, which are solved through optimization algorithms. Their shortcomings and sample results for one method are provided, along with an interpretation for the other method. In chapter (5) we conclude the thesis, discussing practical and theoretical difficulties encountered and a possible future line of work.
CHAPTER 2
Epipolar geometry
2.1 Introduction to epipolar geometry
Epipolar geometry is the geometry of two views and the underlying framework on which this thesis is built. Before one looks into the details, it is worthwhile to see why one should study it. We start with the purely euclidean setup on a two camera system already introduced in section (1.1). The same setup is redrawn here with the necessary details kept and the rest removed.
Figure 2.1: Epipolar geometry or the geometry of two views
The two cameras are cam(o, π 2, K ) and cam(q, π 1, K ) and the scene plane is π . Through this scene plane we have the point mappings a ↔ a′, b ↔ b′, c ↔ c′ and d ↔ d′. The points e2 and e1 are known as epipoles for images π 1 and π 2
respectively. These two points give the correspondence a geometric structure, and give us the name epipolar geometry, the geometry of the two ('epi') poles. In general the point correspondence between points x1 in π 1 and x2 in π 2 can be defined as

x1 ↔ x2 ⇔ −→qx1 ∩ −→ox2 ≠ ∅ and ∃ p_int ∈ −→qx1 ∩ −→ox2 such that p_int ∈ π . (2.1)
This is the geometric way of defining a point correspondence. One point worth
noting is that the camera setup of the figure (2.1) is in R3. If lines −→qx1 and −→ox2 are
parallel they don’t intersect in a point in E3, but in a point x∞, well defined in the
projective space P(E4) which by equation (A.13) is decomposed as
P(E4) = E3 ⊔ P(E3),

where ⊔ denotes the union of two disjoint sets. Thus the point x∞ lies in P(E3). With this decomposition in mind, we can ensure that the point correspondence
between two images is well defined. This way of defining a point correspondence
motivates a special homography between two images. We call it special because,
such a homography is constructed through the scene plane. As shown later, this mapping is a part of a more general mapping between the two images through the scene. In the next section we intuitively describe this homography mapping through a scene plane, and after that algebraically define the more general mapping through scene points.
2.1.1 Geometric definition of homography between two images
Based on the way a point correspondence between two images through a scene plane π is described, one can infer that such a mapping is bijective. Distinct positions of π give different mappings unless the planes are parallel to each other. One point to note is that, given a pair of images and a scene, not every point in the first image forms a correspondence pair with a point in the second image through a homography realized through a scene plane. Only the points which are projections of points on the scene plane, in both of the image planes, form correspondence pairs through the homography mapping generated through π . This is termed point transfer through the scene plane π by Hartley and Zisserman in chapter (9) of [1]. But the scene points (irrespective of whether they lie on the scene plane or not) in general also set up point correspondences between the two images. We look at this mapping in an algebraic formulation next.
2.1.2 Algebraic definition of epipolar mapping
We can model the correspondence of equation (2.1) using a fundamental matrix
as well:
x1 ↔ x2 ⇔ x2^T F x1 = 0, where x1 ∈ π 1 and x2 ∈ π 2. (2.2)

Hartley and Zisserman in [1] term this representation the algebraic expression of the epipolar geometry. Given a pair of cameras, their image planes have point correspondences related through this algebraic equation. But the point mapping is not unique, which is evident from the two figures (2.1) and (2.2).
Figure 2.2: Epipolar plane drawn for epipolar geometry
From figure (2.2), we see that two distinct points c and c′ in plane π 1 map to the same point c in plane π 2. In short, all points that lie on the line −→ce1 are mapped to the same point in plane π 2. Thus we say that the line −→ce1 corresponds to the point c. For
geometric intuition one has the following definitions from [1]:
1. Epipolar plane of a point c ∈ π 2: The plane containing the line −→qo and the point c is known as the epipolar plane of c.
2. Epipolar line of a point c ∈ π 2: The line l in π 1 obtained by the intersection of
the epipolar plane of c as defined above, with the image plane π 1 is known
as the epipolar line of c. This line is the set of all points of π 1 which can be
mapped to c through the two-camera setup described above.
To conclude, each point x ∈ π 2 has a unique line l ∈ π 1 associated with it. The same epipolar plane is also the epipolar plane of all points x ∈ π 1 such that x ∈ l. With simple geometry, one can say that to every point x ∈ π 2 there is associated a unique line l in π 1. The fundamental matrix F encodes this correspondence:
l = F x, (2.3)
where l is a vector representation of line l in P(E3). Referring to section (A.2) of
appendix (A), we say that every line l in P(E3) corresponds to a plane through the origin in E3, and the normal vector of this plane is denoted by l here. Hence this
representation is unique up to a non-zero scalar multiple, which conforms well with the relationship given above. This is a point-line correspondence between the two images that solely depends on the relative orientation of the two cameras. It is just another perspective of the point-point correspondence of equation (2.2). The geometric description of the homography we saw in the previous section is a constrained version of the present mapping, as is evident from figure (2.2). In other words, the point correspondence pairs through the geometric description are a subset of the correspondence pairs through the algebraic definition discussed in the present section. In summary, this section builds the framework of epipolar geometry, through which two images have point mappings realized through scene points.
2.1.3 Some properties of the fundamental matrix, F
The fundamental matrix is of rank 2 and unique up to a non-zero real scalar. Certain decompositions and properties of the fundamental matrix are listed below for
a quick reference. Detailed discussions on properties and different interpretations
can be obtained from [1, 8, 7]:
1. If P1 and P2 are the projection matrices1 of two cameras, then F = [e2]× P2 P1†.
2. If the relative orientation and position between the two cameras are defined
1 A projection matrix of a camera is discussed in appendix (B.1).
by rotation R and translation t,
F = K^{-T} [t]× R K^{-1}. (2.4)
3. If the scene contains a plane π and the point mapping through the plane is
defined by the homography H ,
F = [e]× H ,
where e is the epipole of the image plane π 2 of the second camera and H is
defined such that

x′ = Hx, x′ ∈ π 2, ∀x ∈ π 1. (2.5)
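The second property can be illustrated with a small numerical sketch (assuming numpy; the rotation, translation, calibration and scene point below are arbitrary): build F from R, t and K as in equation (2.4), project a scene point into both views, and check that the epipolar constraint (2.2) holds and that F has rank two.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

th = 0.2
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([1.0, 0.3, 0.1])
K = np.array([[700.0, 0.0, 300.0],
              [0.0, 700.0, 250.0],
              [0.0, 0.0, 1.0]])
Ki = np.linalg.inv(K)

F = Ki.T @ skew(t) @ R @ Ki               # equation (2.4)
print(np.linalg.matrix_rank(F))           # 2: F is rank deficient

X = np.array([0.4, -0.2, 5.0])            # a scene point in the first camera frame
x1 = K @ X                                 # first image, P1 = K[I|0]
x2 = K @ (R @ X + t)                       # second image, P2 = K[R|t]
x1, x2 = x1 / x1[2], x2 / x2[2]

print(x2 @ F @ x1)                         # ~0: the epipolar constraint (2.2)
l = F @ x1                                 # epipolar line of x1 in the second image
print(x2 @ l)                              # ~0: x2 lies on its epipolar line
```

Note that F @ x1 here is the epipolar line of x1 in the second image, which contains x2; this is the point-line correspondence discussed above, written for the opposite direction of mapping.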
The second property is helpful for an intuitive grasp of the setup. The fundamental matrix maps points from one image to the other, albeit up to a certain ambiguity. The points are specified in local coordinate systems2. The decomposition, though, is specified in terms of R and t, which can be seen as external, or specified in an absolute coordinate system, as compared to the image and scene planes involved. This enables us to infer, from an algebraic point of view, how a change in R and/or t affects the point mapping. For more clarification, we can put equations (2.2) and (2.4) together:
x2^T K^{-T} [t]× R K^{-1} x1 = 0. (2.6)
Intuitively, we see that this equation describes a relationship between point mappings and the relative orientation between the two cameras. Such an interpretation will be useful for the approach we have devised for pose estimation, as the aim is to estimate R and t from various feature correspondences. For deeper insight, there are two questions related to equations (2.5) and (2.6) which need to be answered. The answers lead to a better understanding of the single stage geometric approach for pose estimation taken up in chapter (3). Next we take up the two questions one by one.
2.1.4 Question on homography generated in a one camera setup
Before taking up the problem with two cameras, we consider a situation with just
one camera and the scene plane π 1. For a given relative orientation of the camera
2To every plane(image or object) we fix an internal cartesian coordinate system. When we talkof calibration matrix being fixed, we mean the coordinate system as well.
Figure 2.3: A one camera setup and its question
3 with respect to the scene plane π 1, we can have a homography H representing the mapping π → π 1
4. Thus given a relative orientation of the camera and the
planes, we can construct a unique homography. This statement is well proved and discussed in depth in the textbook [1] by Hartley and Zisserman, and we accept it here without proof. The actual question is the inverse of the above statement:
“For a given homography can we orient the camera and scene plane in order to induce the
given matrix?". If we have fixed coordinate systems in both the planes, the given
homography actually translates to a euclidean problem. The homography thus gives us four point correspondences5 between the two planes π 1 and π :
ai → bi, ai ∈ π , bi ∈ π 1, 1 ≤ i ≤ 4. (2.7)
Thus the problem is about finding an orientation between the camera and scene
plane such that the point correspondences as mentioned above are obtained. One
can show that not any given homography (or a set of four point correspondences) can be
represented by an arrangement of the camera and the scene plane. It amounts to getting
the right representation and at the same time reducing the number of unknowns
and the number of equations in play. Once the basic arrangement is laid out, the3Following discussion on cameras in section (1.1), by camera, we mean a model comprising of
centre O1, its image plane π and the calibration matrix, K fixed as well as known.4One more point to note is, we can fix any coordinate system in π 1 and π planes. Thereafter a
change of coordinate system in any of the planes amounts to multiplying the obtained homogra-phy matrix by an invertible matrix of coordinate transformation. In fact calibration matrix is forthe same reason, to transform the coordinates from one coordinate system to another.
5The point correspondences are also assumed to have been measured in the pre-decided euclidean coordinate system.
reason for such a constraint is explained.
Such a euclidean arrangement is illustrated in figure (2.3). Here we have the camera cam(O, π , K ). A calibrated camera means that the relation between the local coordinate system of π and the global coordinate system is fixed. In figure (2.3), the origin of the coordinate system in π is O_plane and the origin of the global coordinate system is O; the line −→OO_plane is perpendicular to π , and the x–y axes of the global coordinate system are parallel to the x_plane–y_plane axes of the plane π . This information fixes the orientation of the plane π with respect to the origin O, and also the relationship of a point P = [u_plane, v_plane]^T with the global coordinate system. P as defined in the global coordinate system will be P ≡ [u_plane, v_plane, f]^T, where f is the distance of O from the plane π . In terms of polynomials, we can specify the same setup as a fixture of three quantities, viz. f and the distances of two arbitrary points6 P1, P2 ∈ π from O. These constraints fix the orientation and the position of the plane π with respect to the origin O. The calibration matrix encodes this information in the form of an upper triangular matrix K , but the equations help us understand the conditions that control image formation in a simple pin-hole camera.
With the basic setup in place, the point correspondences can now be defined as
mentioned before. Given four such point correspondences as labelled in equation
(2.7), we have to orient the plane π 1 relative to camera cam(O, π , K ).
Orienting π 1 in R3 to construct the desired homography
The way point correspondence between π and π 1 is defined, points ai, i = 1,..., 4
in plane π are mapped to bi = λiai in π 1, where λi is a scaling factor for point
ai. Then points λiai have to lie in the same plane, π . Further, the points bi are
measured in a local coordinate system and hence their positions are represented
by five distance constraints. In other words, five inter-point distances,
dist(b1, b2), dist(b1, b3), dist(b2, b3), dist(b3, b4) and dist(b2, b4)
are known, where dist(x, y) represents the euclidean distance between two points x and y in R2. Hence we have six polynomial constraints in four variables λi, i = 1, ..., 4. This proves the fact we stated before, that not all homography mappings can be realized by a relative orientation of the scene plane with respect to the given camera. We have an interesting result to further reinforce this fact, by Poncelet,
6The two arbitrary points ought to be specified in the local coordinate system. So we can select P1 = [1, 0]^T and P2 = [0, 1]^T.
[2] which is stated next.
Figure 2.4: Geometric description of poncelet’s theorem, figure from [2].
Poncelet’s theorem: A version of the famous Poncelet’s theorem mentions that
“When a planar scene is the central projection of another plane (image plane), the
planar scene and the image plane stay in perspective correspondence even if the
scene plane is rotated about the line of intersection of the image and the scene
planes. The center of perspectivity moves in a circle in the plane perpendicular to
this line of intersection".
For our requirements we can translate the same theorem as “Given an orientation
of the scene plane and the camera (consisting of the center, image plane and the calibration matrix, with a fixed coordinate system) inducing the given homography, any further
change in the relative orientation of the scene plane with respect to the camera will change
the homography."
This fact is an important point towards building up the original problem, for it shows that in order to maintain the same homography in spite of a change in orientation of the scene plane with respect to the image plane, the camera centre also needs to move with respect to the image plane (specifically, in a circle). This means that if we attempt to keep the distance of the camera centre from the image plane fixed, no two different orientations of the scene plane can give the same homography.
2.1.5 Question on homography generated in a two camera setup
Adding one more camera to the above arrangement, we have two cameras and
the scene plane, π . We assume π 1 and π 2 are image planes of two cameras. Any
orientation so formed would give us a homography H between the two images
π 1 and π 2. The question we are asking is the inverse (as we did for Question 1
above):
Given a homography H, can we orient the two cameras and the scene plane such that H is induced between the two image planes by point transfer through the scene plane?
This means that given a homography mapping H : π 1 → π 2, we can arrange the three entities so as to obtain H 1 : π → π 1, H 2 : π → π 2 and

H = H 2 H 1^{-1}. (2.8)
The orientation so obtained is the pose between the two cameras, consisting of rotation R and translation t. An exact dependence of R and t on H , as well as on the epipole e (of image plane π 2, if H is defined as in equation (2.5)), can be derived.
Algebraically the relation is specified as:
R = λ^{-1}(K^{-1} H K + K^{-1} e v^T K), (2.9)
t = λ K^{-1} e, (2.10)
where λ is a non-zero scalar, and v is a parameter vector uniquely specifying the
orientation of the scene plane π in space. Next we derive these two equations.
The arguments follow a lemma stated in [1], which we state here.
Lemma 1. We know that a fundamental matrix is of rank two. It can be decomposed as
F = [e]× H as we have seen earlier. Such a decomposition, given F, is not unique. Hence
this lemma says that if F has two decompositions,
F = [e]× H = [e1]× H 1,
then e1 = λe and H 1 = λ−1( H + evT ) for a non-zero scalar λ and a vector v in R3.
Now, if we assume that the relative orientation between the two cameras with
same calibration matrix is represented by R and t, the projection matrices of the
two cameras are P1 = K [I 3|0] and P2 = K [R|t]. A property stated in [10] says
that with projection matrices given in this form, fundamental matrix would be
decomposed as F = [Kt]×KRK −1. Let us assume that for the same camera setup,
point e is the epipole of second image and homography between image planes
of the two cameras through a scene plane is H . Hence fundamental matrix can
be alternately decomposed as F = [e]× H . Thus we can apply lemma (1) with F
having two decompositions, F = [Kt]×KRK −1 = [e]× H to get:
Kt = λe,
KRK −1 = ( H + evT )/λ. (2.11)
Rearranging the terms, we get equations (2.9) and (2.10).

Using these equations as a base, we hypothesize that a given H can help us identify the specific R and t, in the form of some conclusions:
1. If we keep the relative orientation of the two cameras the same and change
the plane position, the homography H changes as well, up to a non-zero scalar multiple.
2. If we keep the plane position fixed with respect to the coordinate system of
the first camera, it is not always possible to obtain a relative orientation of the two cameras so that a given homography is formed. This is evident from the above two equations, in which R, t, e and λ are unknowns, nine in all, while we have twelve algebraically independent equations. Hence a solution R and t is not guaranteed for every H .
The same inference can be seen through the breakup of the homography between the two planes π 1 and π 2 into the homographies H 1 and H 2. From the discussion of question 1, we see that once a relative orientation between a scene plane and the image plane is fixed, the corresponding homography is fixed.
Assuming that H is given to us as well, from equation (2.8) we have
H 2 = H H 1
and hence H 2 is also fixed. Applying the discussion of question 1 again, we see that not every homography can be obtained through a relative orientation between a scene plane and a camera. Thus for some values of H 2, no relative orientation R and t between the two cameras is possible.
3. If we have a solution R and t for a given homography, H and a given plane
position, then changing R and t would invariably change H . Two distinct relative orientations would generate different homographies through the same scene plane.
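The recovery of R and t via equations (2.9) and (2.10) can be illustrated with a small numerical sketch (assuming numpy). Here the plane-induced homography is taken in the assumed form H = K(R − t u^T)K^{-1}, matching the convention of equation (1.8) (the sign convention for u may differ between derivations); with this choice λ = 1, e = Kt and v = K^{-T}u, so the recovery is exact.

```python
import numpy as np

th = 0.25
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th),  np.cos(th), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([[0.3], [-0.1], [0.2]])
u = np.array([[0.1], [0.05], [-0.4]])     # scene plane x^T u + 1 = 0
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
Ki = np.linalg.inv(K)

H = K @ (R - t @ u.T) @ Ki                # assumed plane-induced homography
e = K @ t                                  # epipole of the second image: Kt = lambda*e with lambda = 1
v = Ki.T @ u                               # then e v^T = K t u^T K^{-1}

# Equations (2.9) and (2.10) with lambda = 1:
R_rec = Ki @ (H + e @ v.T) @ K
t_rec = Ki @ e

print(np.allclose(R_rec, R), np.allclose(t_rec, t))   # True True
```

In practice λ and v are unknown and must be estimated alongside the pose, which is exactly what makes the problem nontrivial.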
These two equations motivate a method of estimating the pose, R and t, from a conic correspondence, which would put constraints on H , e and v, and forms the basis
for one approach to pose estimation, taken up at the end of chapter (3). But it is purely an optimization task, though there is some possibility of future work on it. This thesis focuses on a different approach that involves one defining equation instead of two: we can combine these two equations by eliminating e, and the equation so formed is the basis of our geometric approach. This equation has been solved through an optimization tool as well, but with the results not good enough, we create a geometric design and estimate R and t. Discussion on this design is given in section (3.4) of chapter (3).
2.2 Conics
The epipolar geometry was laid out in the previous section. It defines the point correspondences between the two images. Such point correspondences lead to correspondences between more complex image features. The main focus of this thesis being the use of conic correspondences for pose estimation, it is worthwhile investigating the formulation of conics, their basic properties and the mathematical definition of a conic correspondence. A conic is a second degree curve in a plane described by a quadratic equation as its solution set:
ax2 + bxy + cy2 + dx + ey + f = 0. (2.12)
This is the euclidean plane equation. Its corresponding representation in P(E3) is obtained by homogenizing equation (2.12) using a third variable as:
ax2 + bxy + cy2 + dxz + eyz + f z2 = 0. (2.13)
The same equation can be encoded using a symmetric matrix:

[x y z] · [[a, b/2, d/2], [b/2, c, e/2], [d/2, e/2, f]] · [x, y, z]^T = 0. (2.14)
The matrix C = [[a, b/2, d/2], [b/2, c, e/2], [d/2, e/2, f]] defines the conic up to a non-zero scalar multiple. Using dual notation, we use C to mean both the set of points of the conic and its defining equation. We use this notation to classify conics by inspecting the matrix C. In a euclidean plane, we can have either degenerate
or non-degenerate conics:
1. A degenerate conic is one for which rank(C) is less than three. In this case the conic reduces to either a lone real point, a pair of lines (e.g. x = ±y), or a single line counted twice (e.g. x = 0) in P(E3).
2. A non-degenerate conic is one with a full rank matrix. All versions of a non-degenerate conic, viz. parabola, hyperbola, ellipse and circle, are projectively equivalent. This means that one non-degenerate conic can be transformed into another through a projective morphism. Each of the three forms parabola, ellipse and hyperbola is characterized by where the line at infinity meets it: a hyperbola meets the line at infinity in two distinct points, a parabola is tangent to the line at infinity, and an ellipse does not intersect the line at infinity at all. So every projective morphism changing one non-degenerate form to another is equivalent to moving the line at infinity. But this is not the case in the affine or euclidean classification.
In this thesis we focus on non-degenerate conics. As is evident from equation
(2.13) there are six coefficients to estimate, which are unique up to a non-zero scalar
multiple. Hence five distinct points are sufficient to determine a unique conic.
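As an illustration of this fact (a sketch assuming numpy; the ellipse and the sample parameters below are arbitrary), the conic through five points can be computed as the null space of a 5 × 6 design matrix built from equation (2.12):

```python
import numpy as np

def conic_from_points(pts):
    """Recover a, b, c, d, e, f of equation (2.12) from five (x, y) points.

    Each point gives one row [x^2, xy, y^2, x, y, 1]; the conic is the
    null space of the resulting 5x6 matrix, found via SVD.
    """
    A = np.array([[x*x, x*y, y*y, x, y, 1.0] for x, y in pts])
    _, _, Vt = np.linalg.svd(A)
    a, b, c, d, e, f = Vt[-1]              # right singular vector of the smallest singular value
    # Symmetric matrix form, as in equation (2.14).
    return np.array([[a, b/2, d/2],
                     [b/2, c, e/2],
                     [d/2, e/2, f]])

# Five points on the ellipse x^2/4 + y^2 = 1.
pts = [(2*np.cos(s), np.sin(s)) for s in (0.1, 0.9, 2.0, 3.5, 5.0)]
C = conic_from_points(pts)

# Every sample point satisfies p^T C p = 0 in homogeneous coordinates.
for x, y in pts:
    p = np.array([x, y, 1.0])
    print(abs(float(p @ C @ p)) < 1e-9)    # True
```

The SVD-based null space also gives a least-squares fit when more than five (noisy) points are available.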
Figure 2.5: Conic correspondence through a projective transformation.
Projective morphism of conics:
A projective morphism transforms one conic into another. Referring to the
figure (2.5), we encode this relation as
H T C2 H = µC1, (2.15)
where H = A2 A1^{-1}; A1 and A2 are the homographies of planes π 1 and π 2 with respect to the scene plane π , and µ is a real scaling factor. The matrix H denotes the homography between π 1 and π 2.
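Equation (2.15) can be checked numerically (a sketch assuming numpy; the homography and the scale µ below are arbitrary): solving it for C2 gives C2 = µ H^{-T} C1 H^{-1}, and points of C1 mapped through H then lie on C2.

```python
import numpy as np

C1 = np.diag([1.0, 1.0, -1.0])            # the unit circle x^2 + y^2 = 1 in matrix form

# An arbitrary invertible homography from the first image to the second.
H = np.array([[1.2, 0.1, 0.3],
              [-0.2, 0.9, 0.1],
              [0.05, 0.02, 1.0]])
Hi = np.linalg.inv(H)
mu = 2.5

C2 = mu * Hi.T @ C1 @ Hi                  # solves H^T C2 H = mu * C1 for C2

# Points on C1 map under H to points on C2.
for s in np.linspace(0.0, 2*np.pi, 7):
    x1 = np.array([np.cos(s), np.sin(s), 1.0])   # on the unit circle
    x2 = H @ x1
    print(abs(float(x2 @ C2 @ x2)) < 1e-9)       # True
```

This is the algebraic core of a conic correspondence: C1 and C2 are in correspondence exactly when such an H and µ exist.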
Figure 2.6: A cone with its apex at origin, image from [1].
2.3 Quadrics
A quadric is a surface defined by a quadratic polynomial in four homogeneous
variables in P(E4). It is the set of points which satisfy the relation
[x y z w] S [x y z w]^T = 0, (2.16)
where S is a real symmetric 4 × 4 matrix defining the solution set. The above
definition tells us that the surface is quadratic. Just like conics, various classes of quadrics can be studied through their defining matrix S. Henceforth we shall denote a quadric, as a set of points, by its defining matrix S.
A quadric has certain fundamental properties, which can be read from chapter 3 of the book [1]. For reference, below are certain properties which we will need in the coming chapters:
1. Every real symmetric matrix, S uniquely corresponds to a quadric upto a
non-zero scalar multiple. Hence nine points in general uniquely define a
quadric.
2. If S is singular, the quadric is said to be degenerate just like we mentioned
for conics in the previous section. A cone with apex at the origin, defined as the set of points [x, y, z, w]^T such that

x^2 + y^2 = z^2,

is an example of a degenerate quadric which we use in this thesis. Such a cone is shown in figure (2.6).
3. The intersection of a plane π with a quadric Q is a conic C. Computing the
conic can be tricky because it requires a coordinate system for the plane. In lemma (2) we select one such coordinate system to estimate the conics C1,
C2 and C, as plane sections of cones.
4. As is for conics, a projective morphism transforms a quadric into another
quadric. Let us consider a projective morphism f : P(E4) → P(E4) as de-
fined in appendix (A.2.3) represented by a 4 × 4 real invertible matrix H .
Then the transformed set of points, represented by Q′ = H^{-T} Q H^{-1}, is also a quadric.
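Properties 2–4 can be illustrated together with a small numerical sketch (assuming numpy; the plane height and the translation used as a morphism are chosen arbitrarily): points of the cone x^2 + y^2 = z^2 on the plane z = 2 trace the circle x^2 + y^2 = 4, and a 4 × 4 invertible H maps the cone to the quadric H^{-T} S H^{-1}.

```python
import numpy as np

S = np.diag([1.0, 1.0, -1.0, 0.0])        # the cone x^2 + y^2 = z^2 as a degenerate quadric

# Property 3 (plane section): cone points at height z = 2 lie on the circle x^2 + y^2 = 4.
z = 2.0
for s in (0.0, 1.0, 2.5):
    X = np.array([z*np.cos(s), z*np.sin(s), z, 1.0])
    print(abs(float(X @ S @ X)) < 1e-9)   # True: the point lies on the quadric

# Property 4 (projective morphism): H maps the quadric S to Hi.T @ S @ Hi.
H = np.eye(4)
H[:3, 3] = [0.5, -0.2, 1.0]               # a translation, written as a 4x4 invertible matrix
Hi = np.linalg.inv(H)
S_t = Hi.T @ S @ Hi

X = np.array([3.0, 4.0, 5.0, 1.0])        # on the cone: 3^2 + 4^2 = 5^2
Y = H @ X
print(abs(float(Y @ S_t @ Y)) < 1e-9)     # True: the mapped point lies on the mapped quadric
```

Note that S is singular (its last diagonal entry is zero), which is exactly the degeneracy condition of property 2.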
2.4 Summary
In this chapter we introduce the epipolar geometry framework. We start with a two camera system and derive two important equations that encode the dependence of the relative orientation of one camera with respect to the other, R and t, on a homography mapping. These two equations lead to three inferences about how a homography mapping changes with a change in R and t between the two cameras. We present arguments in two different ways to arrive at the same conclusion regarding this dependence, one using the geometry of Poncelet's theorem (2.1.4), and the other through the equations (2.9) and (2.10). With the focus of this thesis primarily being on conics, we next introduce conics, their representation as symmetric matrices, and how conics are transformed through projective morphisms. Discussion on cross-ratios of points on a conic is included for a different perspective on the group of homographies that preserve a conic. A similar argument holds for the subgroup of homographies that transform
a given conic C1 to the second conic C2. Lastly, we give a brief introduction to quadrics as homogeneous surfaces in R^4, or as projective surfaces in P(E4), defining a cone along the way.
The next chapter (3) introduces our proposed method for pose estimation. The suggested approach estimates R and t through a geometric construction of a two-camera setup. We build the setup step by step and propose an algorithm for it. With results of experiments and analysis on both real and synthetic data, we compare its performance with a conventional optimization approach to solving pose for such a setup.
CHAPTER 3
Geometric approach to pose estimation from
one conic correspondence
In this chapter we start with two equations that relate the pose parameters R and t to the homography (H), the epipole (e), and a vector (let us denote it by u) defining the scene plane, all of which are important constructs of epipolar geometry. Using a constraint developed from one conic correspondence (as discussed in section (2.2)), along with two assumptions which we introduce and justify in section (3.4), we obtain a matrix equation that directly encodes the dependence of R and t on the two image conics and u. The simplified equation is solved indirectly through a geometric construction, developed in section (3.4). Sample experiments are performed for this proposed geometric approach on both synthetic and real data, with the results listed in sections (3.5.1) and (3.5.2). The next two sections simplify the set of equations discussed above into the single defining equation mentioned here.
3.1 Dependence of pose on conic correspondence and
vector defining the scene plane
Let us restate the two equations, (2.9) and (2.10), that define R and t in terms of epipolar geometry:

    R = λ^{-1}(K^{-1} H K + K^{-1} e v^T K),
    t = λ K^{-1} e,    (3.1)

where

1. R and t are the rotation matrix and the translation vector respectively, describing the orientation of camera cam(O1, π1, K) with respect to cam(O2, π2, K).
The precise definition of pose, as it depends on R and t and how it maps camera 1 to camera 2, is given in section (1.2).
2. H is the homography matrix that represents the mapping of points from image plane π1 to image plane π2. This mapping is through the third scene plane π, via point transfer, described in section (2.1.1).
3. Point e is the epipole in the second image (or image plane π2). e, being measured in the local coordinate system, is unique up to a non-zero scalar multiple.
4. K is the calibration matrix of the two cameras. It too is unique up to a non-zero scalar multiple. One can see from the above equations that the non-unique representation of e and H is compensated by the scalar λ and the non-unique representation of the K matrix.
5. Vector v defines the position of the scene plane π, uniquely up to a non-zero scalar multiple.
6. Scaling v up (or down) can be compensated for by scaling t down (or up) respectively, keeping R and H the same.
Eliminating e from the two equations (2.9) and (2.10), we have

    R − (t v^T K)/λ^2 = (K^{-1} H K)/λ.    (3.2)
Let us denote the quantity v^T K/λ^2 by u^T. Further, K^{-1}HK is another invertible matrix, unique up to a scalar multiple, and hence in one-to-one correspondence with the homography matrix H. One important point to note is that H describes the point mapping between the two images with points measured in the local image coordinate systems. Hence K^{-1}HK represents the same point mapping between the two images, but in another local coordinate system that has its x–y axes parallel to the camera's global coordinate system. This implies that the matrix K^{-1}HK/λ represents the same homography¹. We rewrite the above equation to get

    K(R − t u^T)K^{-1} = H/λ,  or  R − t u^T = H̄,    (3.3)

where we denote K^{-1}HK/λ by H̄. To describe this equation in one line: given the relative orientation of camera cam(O1, π1, K) with respect to camera cam(O2, π2, K) and the position of the scene plane π in a global coordinate system, the

¹This is one of the many places in this thesis where the fact that the cameras are calibrated is used to simplify the situation.
point correspondence between points in π1 and π2, through the points of the scene plane, is mapped by the homography H̄ = K^{-1}HK/λ. Hartley [1] proves the above equation in a different way. It can be shown with some algebra that u represents the position of the scene plane π. If π is represented by the solution set

    π ≡ { [x y z]^T ∈ R^3 | m1 x + m2 y + m3 z + 1 = 0,  m1, m2, m3 ∈ R },    (3.4)

then u is the vector [m1 m2 m3]^T, uniquely defining the position of π. Henceforth we shall alternately denote the plane π defined by a vector u as above by the notation π_u.
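The relation R − t u^T = H̄ can be verified numerically. Below is a minimal sketch (Python with NumPy; the rotation, translation and plane are arbitrary illustrative choices, K = I so that image coordinates are already calibrated, and we adopt the convention that a scene point X in camera-1 coordinates is seen by camera 2 at RX + t): the homography H̄ transfers the image of any point of π_u from camera 1 to camera 2, up to scale.

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

R = rot_z(0.3) @ np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0.0]])  # a rotation
t = np.array([0.2, -0.1, 0.5])
u = np.array([0.0, 0.3, -0.8])          # scene plane: u.X + 1 = 0

H_bar = R - np.outer(t, u)              # calibrated homography R - t u^T

X = np.array([1.0, 2.0, 2.0])           # a scene point with u.X + 1 = 0
print(abs(u @ X + 1) < 1e-12)           # True: X lies on the plane

x1 = X                                  # homogeneous image in camera 1 (K = I)
x2 = R @ X + t                          # homogeneous image in camera 2
# H_bar x1 = R X - t (u.X) = R X + t, so the transfer is exact up to scale:
print(np.linalg.norm(np.cross(H_bar @ x1, x2)) < 1e-9)   # True: parallel
```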
3.2 Conic correspondence
The scene plane π contains the conic C, whose images are C1 and C2 in the planes π1 and π2 respectively. These two conics are measured in the local image coordinate systems. Then we have the transformed conics C′1 = K^T C1 K and C′2 = K^T C2 K as representations of the two conics in a transformed local coordinate system whose x–y axes are aligned with the x–y axes of the camera's coordinate system, and whose origin O_plane is the point where the normal vector to π meets the plane. We can use the equation of conic correspondence stated in equation (2.15),

    H^T C2 H = µ C1,

to form the new constraint

    (R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3}.    (3.5)
This equation transforms the problem of pose estimation from a conic correspondence into the problem of estimating R, t and u from a set of five equations. Though the matrix equation comprises six polynomial equations in all, its elements being unique up to a non-zero scalar multiple, we effectively have five equations; alternatively, by introducing one more variable µ, we have six equations but one additional unknown. As is evident from equation (3.5), t and u appear only as the outer product t u^T, so we need to estimate u only up to a scalar multiple. This reduces the variable set to R, t, u = [1 n2 n3]^T and µ: nine parameters in all from six equations. In order to reduce the number of unknowns further, we introduce two assumptions:
1. Scene conic C is a circle in the global coordinate system.
2. The translation vector lies in a particular plane, which we denote by π_w, with defining normal vector w.
The first assumption is easily realized by indoor scenes, and to an extent by outdoor scenes. For example, a scene comprising household artifacts is quite likely to contain circular cross-sections in the form of bottle mouths, cups, glasses, doorknobs, objects of art and craft containing circular arcs and curves, holes in walls, etc. The complete circular curve need not be visible: partially visible arcs can be fitted to curves with considerable accuracy. In most cases, the circular objects in scenes are solids, more like circular discs, which implies that they are wholly in front of the camera while being imaged. This means that these circles are always projected as ellipses. Many tools are available for detecting an ellipse in an image and then fitting a polynomial to it; one such tool, which we use for our experiments, was developed by Prasad [15]. The second assumption is not as commonly fulfilled as the first one. But it often happens that the camera is leveled and held on a tripod stand, even as it moves. This fact can be used to estimate the plane that contains the translation vector. Hence in such cases, the plane containing the translation vector is already known.
These two assumptions further reduce the number of variables in equation (3.5). In the next section (3.3) we prove lemma (2), which says that u can be estimated as a finite set of solutions, each unique up to a non-zero scalar multiple. For this we derive two equations, (3.16) and (3.17). This means that of the three parameters of the vector u, two are estimated through the lemma. Thus we are left with seven parameters and six equations. Next, the second assumption places one more linear constraint on t. In summary, by employing the two assumptions, we have a fully determined system of seven polynomial equations in seven variables. Rewriting the constraint equations, we get

    (R − t u^T)^T K^T C2 K (R − t u^T) − µ K^T C1 K = 0_{3×3},
    w^T t = 0,    (3.6)

where u, C1, C2 and w are known, and R, t and µ are to be estimated. If we consider the geometry described by the above equations, we can intuitively note that all seven polynomials are algebraically independent. This means that for non-trivial cases, finitely many solutions exist. These equations form the backbone of the approach for pose estimation we next propose.
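The constraint in (3.6) can be sanity-checked numerically before any solving is attempted. The sketch below (Python with NumPy, not part of the thesis; all values are arbitrary illustrative choices) generates C2 from C1 through the homography induced by (R, t, u, K) and confirms that the matrix residual of (3.5) vanishes for a suitable µ:

```python
import numpy as np

th = 0.4
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1.0]])
t = np.array([0.3, 0.1, -0.2])
u = np.array([0.5, -0.2, 0.7])                  # scene plane vector
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
C1 = np.array([[1, 0, -300.], [0, 1, -250.],
               [-300., -250., 300.0**2 + 250.0**2 - 40.0**2]])

# Homography induced by the plane, and the conic it maps C1 to.
H = K @ (R - np.outer(t, u)) @ np.linalg.inv(K)
Hinv = np.linalg.inv(H)
C2 = Hinv.T @ C1 @ Hinv                         # conic correspondence C1 <-> C2

# Residual of constraint (3.5); mu absorbs the projective scale.
A = R - np.outer(t, u)
M1 = A.T @ K.T @ C2 @ K @ A
M2 = K.T @ C1 @ K
mu = M1[0, 0] / M2[0, 0]
print(np.linalg.norm(M1 - mu * M2) / np.linalg.norm(M2) < 1e-8)   # True
```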
The conventional approach to pose estimation is a two-stage task: estimating F from feature² correspondences, and then R and t from F. As mentioned in the section on background work in chapter (1), there is a lot of literature on methods for estimating F from point or conic correspondences. But most of these methods treat F as a single mathematical entity to be estimated at once. For the second stage of estimating R and t, we have an algorithm proposed by Hartley in [10], based on the SVD of the fundamental matrix. A point worth noting here is that the estimation of R and t from F gives little insight into the two-camera setup in Euclidean space. Additionally, the first assumption is not encoded directly in the fundamental matrix formulation, nor in the way F is estimated from point correspondences. These are the reasons why we look for a different approach to pose estimation from a conic correspondence. The reason the assumptions affect the methods we adopt is that the first assumption, of the scene conic being a circle, places direct constraints on the position of the scene plane. These constraints on the plane position are not evident from the treatment of F as one quantity, nor when we estimate it directly from point or conic correspondences. The idea here is to break up the fundamental matrix in such a way that we have a direct relationship among the quantities describing pose, R and t, the conic correspondence, and the scene plane position. Equation (3.6) encodes such a relationship, and it can be solved through a geometric construction with which we can estimate all possible pose solutions with substantial accuracy. Next we give a derivation of the two equations that put polynomial constraints on the plane position by employing the first assumption. And though we are looking for a geometric construction in order to list all possible pose solutions, in section (4.1) we give an account of a way to optimize a cost function that encodes equation (3.6), so that one can register the pitfalls of such an approach, and we justify our motivation for our approach.
3.3 Mathematical implication of the first assumption
on scene plane π
Let us consider the arrangement given in section (1.1), where only one camera, cam(O1, π1, K), and the scene plane π are considered. Let the conic C1 ∈ π1 be known. Given this setup, we claim that there are finitely many positions of the

²Traditionally, features have meant points, but lines, conics and curves have subsequently been used for estimating the fundamental matrix.
plane π, unique up to a non-zero scalar multiple, such that the scene conic C ∈ π is a circle in the global coordinate system. This coordinate system is assumed to have its origin coincident with O1 and its x–y axes parallel to the x–y axes of the local coordinate system in plane π1. In other words, the orientation of the camera cam(O1, π1, K) with respect to the global coordinate system is represented by R = I3 and t = [0 0 0]^T.
Lemma 2. Let π be a scene plane defined by the normal vector u = [m1 m2 m3]^T. Without loss of generality, we can assume m1 ≠ 0. Then we have a finite set of solutions for the variables m2 and m3 such that the conic C is a circle.
Proof. Based on our assumptions, the projection matrix of the camera cam(O1, π1, K) is P1 = K [I3 | 03]. From a result by Quan in [16], we know that given a projection matrix P1 of a camera and a conic C1 in the image plane π1, the cone containing C1 is given by P1^T C1 P1; let us denote it by Q. Denoting K^T C1 K by Ccal, we get

    Q = P1^T C1 P1 = [ Ccal 03 ; 03^T 0 ].
Alternately, Q can also denote the set of points

    { [x y z]^T ∈ R^3 | [x y z 1] [ Ccal 03 ; 03^T 0 ] [x y z 1]^T = 0 }.
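This point-set description of the cone is easy to check numerically: every depth along a ray back-projected through a point of C1 must satisfy the quadric. A short sketch (Python with NumPy, illustrative values):

```python
import numpy as np

K = np.array([[500, 0, 320], [0, 500, 240], [0, 0, 1.0]])
# Image conic C1: a circle of radius 50 centered at pixel (320, 240).
C1 = np.array([[1, 0, -320.], [0, 1, -240.],
               [-320., -240., 320.0**2 + 240.0**2 - 50.0**2]])

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # P1 = K [I3 | 03]
Q = P1.T @ C1 @ P1                                  # 4x4 cone matrix

x = np.array([370.0, 240.0, 1.0])                   # (320 + 50, 240) lies on C1
print(abs(x @ C1 @ x) < 1e-9)                       # True

# Any scene point projecting to x, at any depth s, lies on the cone Q.
for s in (0.5, 2.0, 10.0):
    X = np.append(s * (np.linalg.inv(K) @ x), 1.0)  # homogeneous [x y z 1]
    print(abs(X @ Q @ X) < 1e-6)                    # True for every depth
```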
Then the conic C in π (π being the scene plane) is the intersection of π with Q, given as the solution set of the equation

    [ (−1 − m2 y − m3 z)/m1  y  z  1 ] [ Ccal 03 ; 03^T 0 ] [ (−1 − m2 y − m3 z)/m1  y  z  1 ]^T = 0    (3.7)
⟹ [ (−1 − m2 y − m3 z)/m1  y  z ] Ccal [ (−1 − m2 y − m3 z)/m1  y  z ]^T = 0.    (3.8)
Thus the conic C in the global coordinate system is obtained as the set of points

    C = { [x y z]^T ∈ R^3 | x = (−1 − m2 y − m3 z)/m1 and equation (3.8) holds }.
Let us adopt the following notation:

    n1 = 1/m1,  n2 = −m2/m1,  n3 = −m3/m1,    (3.9)
    o3 = [−n1 0 0]^T,  o1 = [n2 1 0]^T,  o2 = [n3 0 1]^T.    (3.10)
Then equation (3.8) becomes

    (o3 + y o1 + z o2)^T Ccal (o3 + y o1 + z o2) = 0.    (3.11)
This can be rewritten as

    [y z 1] [ o1^T Ccal o1  o1^T Ccal o2  o1^T Ccal o3 ;
              o2^T Ccal o1  o2^T Ccal o2  o2^T Ccal o3 ;
              o3^T Ccal o1  o3^T Ccal o2  o3^T Ccal o3 ] [y z 1]^T = 0.
Now we need to represent the points [x y z]^T ∈ π in a local coordinate system of π. Following the definition of the plane π as the zero set in equation (3.4), we select an orthonormal coordinate system that depends only on the parameters n_i, i = 1, 2, 3. We further simplify the notation:
    M = −(1 + n2 n3 + n3^2)/(1 + n2 n3 + n2^2),  a = [−n1 0 0]^T,
    k1 = sqrt((n2 + n3)^2 + 2),  k2 = sqrt((M n2 + n3)^2 + M^2 + 1),
    b = [ (n2 + n3)/k1 − n1,  1/k1,  1/k1 ]^T,
    c = [ (M n2 + n3)/k2 − n1,  M/k2,  1/k2 ]^T.    (3.12)
The points a, b, c so parameterized lie on the plane π, and the orthogonal axes are the directions

    ab = b − a = [n2 + n3, 1, 1]^T / k1,  ac = c − a = [M n2 + n3, M, 1]^T / k2.

One can easily verify that

    ab · ac = 0,  ‖ab‖ = 1,  ‖ac‖ = 1.
From this parametrization, we have the following relationship between the local coordinate representation [u v]^T and the global coordinate representation [x y z]^T of the same point:

    y = u/k1 + M v/k2,  z = u/k1 + v/k2  ⇔  u = k1 (M z − y)/(M − 1),  v = k2 (y − z)/(M − 1).    (3.13)
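The claims behind this frame, namely that the two axis directions are orthonormal, lie in the plane, and that (3.13) inverts correctly, can be checked numerically for arbitrary n2 and n3 (a sketch in Python with NumPy; the specific values are illustrative):

```python
import numpy as np

n2, n3 = 0.4, -1.3                      # arbitrary plane parameters
M = -(1 + n2 * n3 + n3**2) / (1 + n2 * n3 + n2**2)
k1 = np.sqrt((n2 + n3)**2 + 2)
k2 = np.sqrt((M * n2 + n3)**2 + M**2 + 1)

ab = np.array([n2 + n3, 1, 1]) / k1     # in-plane axis directions
ac = np.array([M * n2 + n3, M, 1]) / k2

# Orthonormality of the axes.
print(abs(ab @ ac) < 1e-12, abs(ab @ ab - 1) < 1e-12, abs(ac @ ac - 1) < 1e-12)

# Both directions are perpendicular to the plane normal [1, -n2, -n3].
normal = np.array([1.0, -n2, -n3])
print(abs(normal @ ab) < 1e-12, abs(normal @ ac) < 1e-12)

# Round trip of the coordinate change (3.13).
u, v = 0.7, -0.2
y, z = u / k1 + M * v / k2, u / k1 + v / k2
print(abs(k1 * (M * z - y) / (M - 1) - u) < 1e-12,
      abs(k2 * (y - z) / (M - 1) - v) < 1e-12)
```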
Substituting the values of y and z into equation (3.11), we get

    ( o3 + (u/k1 + M v/k2) o1 + (u/k1 + v/k2) o2 )^T Ccal ( o3 + (u/k1 + M v/k2) o1 + (u/k1 + v/k2) o2 ) = 0.
Rearranging the terms, we obtain a polynomial in the variables u and v:

    ( o3 + u (o1 + o2)/k1 + v (M o1 + o2)/k2 )^T Ccal ( o3 + u (o1 + o2)/k1 + v (M o1 + o2)/k2 ) = 0.
Rewriting, we get

    [u v 1] [ l1^T Ccal l1  l1^T Ccal l2  l1^T Ccal l3 ;
              l2^T Ccal l1  l2^T Ccal l2  l2^T Ccal l3 ;
              l3^T Ccal l1  l3^T Ccal l2  l3^T Ccal l3 ] [u v 1]^T = 0,
where l1 = (o1 + o2)/k1, l2 = (M o1 + o2)/k2 and l3 = o3.
Thus the conic C, in parametric form in the local coordinate system, is defined as

    C = { [u v]^T ∈ π | [u v 1] [ l1^T Ccal l1  l1^T Ccal l2  l1^T Ccal l3 ;
                                  l2^T Ccal l1  l2^T Ccal l2  l2^T Ccal l3 ;
                                  l3^T Ccal l1  l3^T Ccal l2  l3^T Ccal l3 ] [u v 1]^T = 0 },    (3.14)
and hence is represented by the matrix

    [ l1^T Ccal l1  l1^T Ccal l2  l1^T Ccal l3 ;
      l2^T Ccal l1  l2^T Ccal l2  l2^T Ccal l3 ;
      l3^T Ccal l1  l3^T Ccal l2  l3^T Ccal l3 ].    (3.15)
If we have a circle represented by the solution set of the equation

    (u − a)^2 + (v − b)^2 = r^2,

its matrix representation is obtained by rewriting the equation as

    [u v 1] [ 1 0 −a ; 0 1 −b ; −a −b a^2 + b^2 − r^2 ] [u v 1]^T = 0.
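This conversion between circle parameters and the 3 × 3 matrix form recurs throughout the construction; a minimal sketch (Python with NumPy; the helper name circle_matrix is ours):

```python
import numpy as np

def circle_matrix(a, b, r):
    """Conic matrix of the circle (u - a)^2 + (v - b)^2 = r^2."""
    return np.array([[1.0, 0.0, -a],
                     [0.0, 1.0, -b],
                     [-a, -b, a**2 + b**2 - r**2]])

C = circle_matrix(2.0, -1.0, 3.0)
for th in (0.0, 1.0, 2.5):              # sample points on the circle
    p = np.array([2.0 + 3 * np.cos(th), -1.0 + 3 * np.sin(th), 1.0])
    print(abs(p @ C @ p) < 1e-9)        # True: p satisfies the conic equation
```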
For the conic C to be a circle, its matrix, as defined in equation (3.15), has to be of the same form as given above. Hence we have the two conditions

    l1^T Ccal l1 − l2^T Ccal l2 = 0    (3.16)

and

    l1^T Ccal l2 = 0,    (3.17)

where l1 = [n2 + n3, 1, 1]^T / sqrt((n2 + n3)^2 + 2), l2 = [M n2 + n3, M, 1]^T / sqrt((M n2 + n3)^2 + M^2 + 1), Ccal = K^T C1 K and M = −(1 + n2 n3 + n3^2)/(1 + n2 n3 + n2^2), with n1, n2 and n3 as defined in equation (3.9).
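As a numerical sanity check of the two conditions (our own sketch in Python with NumPy, not part of the thesis; all values are illustrative), we can synthesize a circle on a known plane with m1 ≠ 0, project it through P1 = K[I3 | 03], fit the image conic C1 from the projected points, and evaluate both residuals at the true (n2, n3); both vanish:

```python
import numpy as np

K = np.array([[2.0, 0, 0.1], [0, 2.0, -0.2], [0, 0, 1.0]])
m = np.array([0.1, 0.05, -0.5])          # scene plane m.X + 1 = 0, m1 != 0

# A circle of radius 0.5 on the plane, around the plane point c.
c = np.array([0.0, 0.0, 2.0])            # satisfies m.c + 1 = 0
e1 = np.cross(m, [0.0, 1.0, 0.0]); e1 = e1 / np.linalg.norm(e1)
e2 = np.cross(m, e1); e2 = e2 / np.linalg.norm(e2)
pts = [c + 0.5 * (np.cos(t) * e1 + np.sin(t) * e2)
       for t in np.linspace(0, 2 * np.pi, 8, endpoint=False)]

# Project with P1 = K [I | 0] and fit the image conic C1 (SVD null space).
rows = []
for X in pts:
    x = K @ X
    px, py = x[0] / x[2], x[1] / x[2]
    rows.append([px**2, px * py, py**2, px, py, 1.0])
a, b, cdiag, d, e, f = np.linalg.svd(np.array(rows))[2][-1]
C1 = np.array([[a, b / 2, d / 2], [b / 2, cdiag, e / 2], [d / 2, e / 2, f]])

# Evaluate the circle conditions (3.16) and (3.17) at the true n2, n3.
Ccal = K.T @ C1 @ K
n2, n3 = -m[1] / m[0], -m[2] / m[0]
M = -(1 + n2 * n3 + n3**2) / (1 + n2 * n3 + n2**2)
l1 = np.array([n2 + n3, 1, 1]) / np.sqrt((n2 + n3)**2 + 2)
l2 = np.array([M * n2 + n3, M, 1]) / np.sqrt((M * n2 + n3)**2 + M**2 + 1)

scale = np.abs(Ccal).max()
print(abs(l1 @ Ccal @ l1 - l2 @ Ccal @ l2) / scale < 1e-7)   # (3.16) holds
print(abs(l1 @ Ccal @ l2) / scale < 1e-7)                    # (3.17) holds
```

Scanning these residuals over (n2, n3), or handing (3.16) and (3.17) to a polynomial solver, is then one way to recover the finitely many candidate plane positions.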
We can solve the two equations (3.16) and (3.17) for the two variables n2 and n3. Then the plane π is defined by the vector m1 [1 −n2 −n3]^T,
where i can be any numeral or a subscripted string; e.g., π_{αu_{a23}} denotes the plane defined by the vector αu_{a23}, α being a real scalar.
Geometric construction
The two assumptions stated at the start of this chapter enable the geometric construction that forms the bulk of this approach:
1. The scene conic C is a circle in the global coordinate system.
2. The translation vector t lies in a given plane (let us denote it π_w) specified in the global coordinate system.
The geometry is augmented by the availability of the conic correspondence C1 ↔ C2. The calibrated counterparts of C1 and C2 are C′1 and C′2 respectively; henceforth in this section, C′1 denotes the calibrated counterpart of the conic C1, defined as C′1 = K^T C1 K. Revisiting the arrangement of section (1.1), let us form cones Q1 and Q2 through the conics C′1 and C′2 respectively, as shown in figure (3.1). This diagram is basically an extension of the two-camera setup of section (1.1). The rigid body motion that defines the relative pose of cone Q1 with respect to cone Q2 is considered to be R and t. C′1 is the intersection of π1 with Q1, and C′2 is the intersection of π2 with Q2. C is the intersection of π with Q1 or with Q2. In fact, the cones Q1 and Q2 have to intersect in a planar conic: a result stated by Quan in [16] says that the two cones must intersect in a quartic curve which disintegrates into two second-order planar curves, of which one is the scene conic C.
Step-1
Let us apply a rigid body motion³ to the structure formed from camera-2 (shown in yellow in figure (3.1)) and the cone Q2. Camera-2 has its coordinate system given by the origin O2 and the triplet of axes {Xc, Yc, Zc}. In other words, we first rotate Q2 through the rotation matrix R and then translate it through the translation vector t. This motion results in a cone Q̃2, and the circle C is transformed into a circle C̃. As a result, the two cameras in this case coincide, and so do the image planes π2 and π1. We see further that the circles C and C̃ have the same radii. This situation is shown in figure (3.2). This rigid body motion is precisely the relative pose we have to estimate. The idea lies in estimating the

³The rigid body motion is with reference to the coordinate system of camera-1, which consists of the origin O1 and the triplet of axes {X_wf, Y_wf, Z_wf}.
Figure 3.1: Two cones, Q1 and Q2, describing a conic correspondence.
relative orientation between the circles C and C̃. The series of steps to follow demonstrates a geometric construction for estimating the pose once the two circles are known in R^3. The two conics C1 and C2 are known, which give us the two cones Q1 and Q2 respectively. We apply lemma (2) to these two cones to get two sets of plane positions of the form u = [m1 m2 m3]^T, denoted U1 and U2. The two sets are defined as follows: U1 is the set of planes π_u1 such that the intersection of the plane π_u1 with cone Q1 is a circle, and U2 is the set of planes π_u2 such that the intersection of the plane π_u2 with cone Q2 is a circle. The following property can be inferred from the proof of lemma (2):

Lemma 3. If π_u1 ∈ U1, then π_{αu1} ∈ U1, ∀α ∈ R − {0}. Similarly, if π_u2 ∈ U2, then π_{αu2} ∈ U2, ∀α ∈ R − {0}.
Proof. Let us apply lemma (2) to C1 and its cone Q1. Inspecting the form of the equations (3.16) and (3.17) so obtained, we see that they are homogeneous polynomials
Figure 3.2: Rigid body motion of the cone Q2 onto Q̃2.
in three variables, m1, m2 and m3. By a change of variables, we transform them into polynomials in the two variables n2 = −m2/m1 and n3 = −m3/m1. Hence scaling u1 by α has no effect on n2 and n3. We can argue similarly for the conic C2 and its cone Q2. Thus we have the result that if π_u1 ∈ U1, then π_{αu1} ∈ U1, ∀α ∈ R − {0}; similarly, if π_u2 ∈ U2, then π_{αu2} ∈ U2, ∀α ∈ R − {0}.
For every plane π_u1 ∈ U1, we can always find a plane π_u2 ∈ U2 such that the radius of the circle of intersection of π_u1 with Q1 is the same as the radius of the circle of intersection of π_u2 with cone Q2⁴. Let us define the radius of the intersection of the plane π_u1 with Q1 as r_u1, and the radius of the intersection of π_u2 with Q2 as r_u2. This means that for every π_u1 in U1 we have a π_u2 in U2 such that r_u1 = r_u2. This relationship defines a pair of planes. Such pairs are important, as every pair can give a possible pose estimate, and for every plane π_u1 we have two planes in U2 which form such a pair, viz. π_{u2i} and π_{−u2i} for some index i ∈ N. Thus the set of all possible pairs of planes which can give us a solution is

    U_sol = { (π_u1, π_u2) ∈ U1 × U2 | r_u1 = r_u2 }.    (3.18)

⁴This can be seen as follows: every cone extends to infinity, and the radius can take any positive real value by appropriately positioning the plane.
This brings us to another interesting property, which we prove as lemma (18) in appendix (C).
Step-2
Once we have the two sets U1 and U2 in order, the set of all possible solutions to the pose estimation problem forms a subset of U1 × U2, which was defined in equation (3.18). Let (π_u11, π_u21) ∈ U_sol be one such solution pair; in other words, r_u11 = r_u21. This gives us a pair of planes as defined before. Let the circles be defined by the matrices

    C = [ 1 0 −a1 ; 0 1 −b1 ; −a1 −b1 a1^2 + b1^2 − r1^2 ],    (3.19)

and

    C̃ = [ 1 0 −a2 ; 0 1 −b2 ; −a2 −b2 a2^2 + b2^2 − r1^2 ].    (3.20)
The circles are each expressed in a specific local orthonormal coordinate system which depends solely on the plane position in R^3. The matrix C represents the circle of intersection of cone Q1 with π_u11, and C̃ represents the circle of intersection of cone Q2 with π_u21. Their radii being the same, we denote them by r1. The two circles can be seen as one being a rigid body motion of the other. Let the relative orientation be defined by the map x ↦ Rx + t, taking each point x ∈ C to a point y ∈ C̃. Further, we know that the cones Q1 and Q̃2 intersect in C. Hence, applying the same rigid body motion R and t to the cone Q2, we should get the cone Q̃2, as shown in figure (3.2).
Step-3
The next step is to map the circle C to C̃ through a rigid body motion, comprising the rotation R and translation t, applied to C. From the two circles' representations (3.19) and (3.20), we have the centers of the two circles represented as [a1 b1]^T for C and [a2 b2]^T for C̃. But these center representations are in a local coordinate system, unique for each plane. Their representations in the global coordinate system are obtained through equation (3.13), as shown in the next equations. We shall denote
the global coordinate representations of the two center points by xc1 and xc2. Then equations (3.12), (3.9) and (3.13) give us

    ỹc1 = a1/k11 + M1 b1/k12,
    z̃c1 = a1/k11 + b1/k12,
    x̃c1 = (−1 − m12 ỹc1 − m13 z̃c1)/m11,    (3.21)

for the plane π_u11, and

    ỹc2 = a2/k21 + M2 b2/k22,
    z̃c2 = a2/k21 + b2/k22,
    x̃c2 = (−1 − m22 ỹc2 − m23 z̃c2)/m21,    (3.22)

for the plane π_u21.
for plane π u21. The plane π u11 is assumed to be defined by the vector
m11 m12 m13
T
and π u21 is assumed to be defined by the vector
m21 m22 m23
T . From equa-
tions (3.21) and (3.22), centers of the two circles are represented in global coordi-
nate system as
xc1 =
˜xc1 ˜ yc1 ˜ zc1T
, xc2 =
˜xc2 ˜ yc2 ˜ zc2T
.
The primary condition in mapping the circle C to C̃ is that the center xc1 should be mapped to xc2. The second condition is that the translation vector t should satisfy the assumption w^T t = 0, where w is pre-specified. These two conditions lead to the next step of the geometric construction. Figure (3.3) depicts the geometric construction for estimating R and t by mapping the circle C to C̃. Steps four and five describe and solve this construction.
Step-4
We have a plane π_w1 through the point xc2 such that π_w1 ∥ π_w. Then the point

    xc1rot = R xc1 (by assumption)

should lie on π_w1, and

    t = xc2 − xc1rot,
Figure 3.3: A diagram describing the geometric construction.
which by construction lies on π_w1. The distance of the point xc1rot from the origin is the same as that of xc1 from the origin. Hence we have the first two equations constraining xc1rot:

    w1^T xc1rot + 1 = 0,    (3.23)
    ‖xc1rot‖ = ‖xc1‖.    (3.24)
Let us denote the point on the perpendicular line from the origin to the plane π_u11, and also lying on π_u11, by p1; its coordinates depend on u11 as

    p1 = −u11/‖u11‖^2.    (3.25)

The plane through xc1rot and parallel to π_u21 is denoted π_uc1rot and is defined by the vector

    uc1rot = −u21/(xc1rot^T u21).

The point on the perpendicular line from the origin to the plane π_uc1rot, and also lying on π_uc1rot, is denoted p2. Its coordinates depend on uc1rot as

    p2 = −uc1rot/‖uc1rot‖^2.    (3.26)
Then the distance of p2 from xc1rot should be the same as that of p1 from xc1, giving us the following polynomial equation:

    ‖p2 − xc1rot‖ = ‖p1 − xc1‖.    (3.27)

The equations (3.23), (3.24) and (3.27) encode the solution for the parameters R and t. A point xc1rot obtained as a solution to the above three equations helps us determine R through the constraints

    R xc1 = xc1rot,
    R p1 = p2.    (3.28)
Let

    A = [ xc1  p1  (xc1 × p1) ]  and  B = [ xc1rot  p2  (xc1rot × p2) ],    (3.29)

where xc1 × p1 denotes the vector cross-product of xc1 and p1 (and similarly for xc1rot and p2), and the three vectors form the columns of each matrix. Then we estimate R as

    R = B A^{-1},    (3.30)

with A and B both being invertible matrices, justifying the existence of R as obtained above. Now, from the way the solution for R is designed, we can ascertain the following from equations (3.24), (3.27) and (3.28):
    ‖xc1‖ = ‖xc1rot‖,  ‖p1‖ = ‖p2‖,    (3.31)

and the angle between the vectors xc1 and p1 is the same as the angle between the vectors xc1rot and p2. With these facts in mind, one can easily prove, with A and B as defined in equation (3.29), that

    A^T A = B^T B.

From this it is straightforward to see that the matrix R = B A^{-1} obtained in equation (3.30) is a rotation matrix. Once R is known, t is estimated as

    t = xc2 − R xc1,    (3.32)
with xc1rot estimated from equation (3.28). Thus we have estimated R and t for one solution point xc1rot of the three equations (3.23), (3.24) and (3.27), designed for one pair of planes (π_u11, π_u21).
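Steps 3 and 4 condense into a small numerical sketch (Python with NumPy; the vectors below are arbitrary illustrative stand-ins, and xc1rot, p2 are produced here from a known rotation rather than by solving (3.23), (3.24) and (3.27)): given the two matched vector pairs, R = B A^{-1} recovers the rotation, and t follows from (3.32).

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation by `angle` about the vector `axis`."""
    a = np.asarray(axis, float)
    a = a / np.linalg.norm(a)
    Kx = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(angle) * Kx + (1 - np.cos(angle)) * Kx @ Kx

R_true = rodrigues([1.0, 2.0, 0.5], 0.7)
t_true = np.array([0.3, -0.4, 0.2])

xc1 = np.array([1.0, 0.5, 2.0])         # circle center (illustrative)
p1 = np.array([-0.2, 0.8, 0.4])         # foot of the perpendicular from origin
xc1rot, p2 = R_true @ xc1, R_true @ p1  # their images under the rotation
xc2 = xc1rot + t_true                   # center of the second circle

# A and B of equation (3.29), columns [v1 v2 v1 x v2]; R = B A^{-1} as in (3.30).
A = np.column_stack([xc1, p1, np.cross(xc1, p1)])
B = np.column_stack([xc1rot, p2, np.cross(xc1rot, p2)])
R = B @ np.linalg.inv(A)

print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))
print(np.allclose(R, R_true))           # True: the rotation is recovered
t = xc2 - R @ xc1                       # equation (3.32)
print(np.allclose(t, t_true))           # True
```

The check only confirms that (3.29) and (3.30) return a rotation consistent with the matched pairs; in the actual construction, xc1rot is a root of the polynomial system above.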
Step-5
The three polynomial equations have at most four real solutions, and each solution point gives one pose solution R and t, leading to a maximum of four pose solutions for each plane pair. The above steps are then repeated for all plane pairs (π_u11, π_u21) ∈ U_sol. Thus we get a set of solutions R and t obtained over all such plane pairs. In the general case there is more than one solution in this set, of which one is the true solution we desire; the rest are to be eliminated. The next section describes the non-uniqueness of the solutions in this set and some thoughts on how to pick the particular solution which actually realized the camera setup.
Non-uniqueness of solution
Case-1: The three equations (3.23), (3.24) and (3.27) are polynomials of degree one, two and two respectively. One can eliminate a variable and reduce the three equations to two. Hence the total number of possible solutions is four for each pair of planes, by an application of Bézout's theorem on counting the intersection points of two curves. Hence for every plane pair (π_u11, π_u21), we have at most four pose solutions.
Case-2: The second case arises from the fact that for every plane π_u1 in U1 we have two possible planes in U2, π_u2 and π_−u2, such that r_u1 = r_u2 = r_−u2. These planes have their normal vectors in opposite directions. The discussion following lemma (3) states the same fact, which in precise terms can be rewritten as: if (π_u11, π_u21) ∈ U_sol has translation t as part of its solution, then (π_u11, π_−u21) has translation −t as part of one of its solutions. So the translation vectors for all solutions to (π_u11, π_−u21) are the negative counterparts of the translations obtained as solutions to (π_u11, π_u21). The complete relationship between the pose solutions for the plane pairs (π_u11, π_u21) and (π_u11, π_−u21) can be derived from equations (3.29) and (3.30) as
    t1 = −t,
    R1 = B1 A^{-1},    (3.33)
where B1 = −B + 2 [ 0  0  (xc1rot × p2) ].

    ∴ R1 = −R + 2 [ 0  0  (xc1rot × p2) ] A^{-1}.    (3.34)
This relationship gives us a way to estimate R1 and t1 for the pair of planes (π_u1, π_−u2) when R and t for the pair (π_u1, π_u2) are known.
Case-3: The third case arises from the fact that if (π_u11, π_u21) ∈ U_sol gives us a solution R and t, then (π_{αu11}, π_{αu21}) gives us a solution R and t/α. We prove this fact in lemma (19) of appendix (C).
Because of the first two cases of non-uniqueness of a pose solution, we have thirty-two pose solutions in all. Accounting for the third case as well, we cannot estimate the translation vector beyond a non-zero scaling. Cases 1 and 2 can be resolved through some point correspondences or, as we show next, through some observations. We
show next a breakup of how one can eliminate all but one solution.Solution to case-1 and case-2: These two problems can be worked out by us-
ing some point correspondences. Ideally a single point correspondence should
be enough to select a true solution. But the one point correspondence we have
might be realized by more than one solution of R and t. Unfortunately, to the best
of our knowledge, there is no one-shot way of selecting the right discriminating
point correspondence. Additionally the main focus of this thesis being on min-
imal correspondences, we look for other ways for fixing one solution of the set
of solutions. We have tested our approach on synthetic data. The data has been
designed to model a real world scenario as closely as possible. For this we have
used the epipolar geometry toolbox, [17] in MATLAB. The point which is taken
care for is that the circle which is imagined by both of the cameras is wholly in
front of the cameras. In other words if c is the camera center, π is the image plane
and x is a point on circle, then c and x are points on different sides of the plane
π . This would eliminate sixteen of the thirty two solutions. The procedure is out-
lined next.
Condition for the scene conic to lie in front of the camera:
Consider a plane pair (π u11, π u21) and circles C and C′ in these two planes. Let the centers of the two circles be xc1 and xc2, as mentioned in step-3 above. Writing the defining vectors for the two planes as u11 = m1 [1 −n2 −n3]T and u21 = m′1 [1 −n′2 −n′3]T, lemma (2) fixes n2, n3 for π u11 and n′2, n′3 for π u21, which means that the factors m1 and m′1 scale the centers xc1 and xc2 respectively. Hence we need the scales to be such that m1 xc1 and m′1 xc2 lie in front of the first camera. This condition gives a possible range for the scaling factors. Either the range consists of positive real values or of negative values, based on which we eliminate one half of the pose solutions in U sol.
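As an illustrative sketch of this front-of-camera test (in Python, with hypothetical numbers; the thesis implementation uses MATLAB), one can check the sign of a candidate center's depth along the viewing direction and keep only the scale factors that place the scaled center in front of the camera:

```python
import numpy as np

def in_front(point, cam_center=np.zeros(3), view_dir=np.array([0.0, 0.0, 1.0])):
    # A point lies in front of the camera when it has positive depth
    # along the viewing direction.
    return np.dot(point - cam_center, view_dir) > 0

xc1 = np.array([0.2, -0.1, 2.5])   # hypothetical circle center (unit scale)
# Scales m for which m * xc1 stays in front of the camera: here m > 0,
# so candidate pose solutions requiring m < 0 can be discarded.
admissible = [m for m in (-2.0, -0.5, 0.5, 2.0) if in_front(m * xc1)]
print(admissible)  # -> [0.5, 2.0]
```

The admissible range has one sign, so half of the candidate solutions drop out, as described above.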
A second observation is that, for scenarios with small rotation angles, the geodesic distance of the rotation matrix from the identity matrix is smallest for the specific pose solution which is the best approximation to the true solution. This hypothesis has been tested extensively on synthetic datasets. The distance metric used is the geodesic metric on the unit sphere, [18]:

d(R, I3) = √(trace(LT L)), where L = (acos((trace(R) − 1)/2)) (R − RT) / (2 sin(acos((trace(R) − 1)/2))), (3.35)

where R is the rotation matrix whose geodesic distance from the identity matrix, denoted d(R, I3), is to be estimated.
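Equation (3.35) can be implemented directly. The following Python sketch (the rotation is illustrative) computes d(R, I3) for a rotation by 0.3 rad about the z axis; for a rotation by angle θ ∈ (0, π) this metric evaluates to √2 θ:

```python
import numpy as np

def geodesic_from_identity(R):
    # d(R, I3) = sqrt(trace(L^T L)) with
    # L = theta * (R - R^T) / (2 sin(theta)),
    # theta = acos((trace(R) - 1) / 2)  -- equation (3.35).
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return 0.0
    L = theta * (R - R.T) / (2.0 * np.sin(theta))
    return np.sqrt(np.trace(L.T @ L))

c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(geodesic_from_identity(R))  # approx. 0.4243 = sqrt(2) * 0.3
```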
Case-3 solution: Once R is known, all that is left is to find the correct value of t. Hartley in [1] discusses epipolar geometry under the effect of pure translation. From section (2.4) we can see that scaling t would scale F correspondingly, and thus the point mapping between the two images would still stay the same. Hence, we cannot estimate the scale of t simply by using point correspondences in the two images. The translation can be determined only up to a non-zero scalar multiple.
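This scale ambiguity is easy to verify numerically. In the sketch below (Python, with a hypothetical pose), an image pair generated by a pose (R, t) satisfies the epipolar constraint for every non-zero rescaling of t, so point correspondences alone cannot fix the scale:

```python
import numpy as np

def skew(v):
    # Cross-product matrix: skew(v) @ x equals np.cross(v, x).
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical relative pose: camera-2 coordinates are X2 = R @ X1 + t.
ang = 0.1
R = np.array([[np.cos(ang), -np.sin(ang), 0.0],
              [np.sin(ang),  np.cos(ang), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, -11.0, 1.0])

X1 = np.array([0.3, 0.2, 5.0])            # scene point in camera-1 coordinates
X2 = R @ X1 + t
x1, x2 = X1 / X1[2], X2 / X2[2]           # normalized image points

# The epipolar constraint x2^T E x1 = 0 with E = [t]_x R holds for every
# non-zero rescaling s of t, so the scale of t is unobservable.
checks = [abs(x2 @ (skew(s * t) @ R) @ x1) < 1e-9 for s in (1.0, 3.0, -0.5)]
print(checks)  # -> [True, True, True]
```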
3.5 Experiments
This section contains the results of some experiments we have performed on synthetic as well as real data for the geometric approach to pose estimation proposed in this chapter.
3.5.1 Experiments for geometric approach on synthetic data
The synthetic dataset has been designed using the Epipolar Geometry Toolbox, [17]. Without going into the details of the process, note that a scene circle is first chosen in R3. The calibration matrix K is assumed to be the identity matrix. The projection matrix P1 is the same for all examples, with the first camera assumed to be at the origin of the world coordinate system. The projection matrix of the second camera, P2, is chosen randomly, starting with smaller rotation angles and moving progressively to larger ones. One such dataset, and the solution obtained through our algorithm, is described next.
Discussion on an experiment on a synthetic dataset for the geometric approach
For this dataset, we estimate all thirty-two possible distinct pose solutions R and t. Following the points on non-uniqueness of solutions discussed previously, we select two pose solutions, which are shown in figure (3.4), where the true camera positions are shown in green and yellow colors.

Figure 3.4: Pose solution

The first camera is shown in green. For the second camera (shown in yellow), the rotation matrix Rtrue is defined through Euler angles about the three coordinate axes: −8° about the z axis, 10° about the y axis and 0° about the x axis. The translation vector is set to ttrue = [1 −11 1]T. Let R1, t1 and R2, t2 be the two best pose solutions selected through our algorithm. The camera for pose solution R1, t1 is shown in blue, and almost coincides with the true pose of the second camera; the camera for pose solution R2, t2 is shown in black.
The departure of the rotation matrices of these two solutions and of the true solution from the identity matrix is

d(Rtrue, I3) = 0.3515, d(R1, I3) = 0.3516 and d(R2, I3) = 1.8472.

These distances are based on the geodesic distance on the unit sphere between two points R1 and R2 in the group SO(3), [18]:

d(R1, R2) = ‖log(R1T R2)‖F,

where R1 and R2 are two rotation matrices.
Table 3.1: Results of the single-stage geometric approach on synthetic datasets. Here Rtrue and ttrue denote the true values, and R and t denote the pose solution obtained through convergence of the gradient descent scheme.

| Angles about x, y, z axes | Translation vector ttrue | Geodesic distance of R from Rtrue | Recovered translation vector t | Angle between t and ttrue | Geodesic distance of R from I3 | Selected solution has smallest geodesic distance? |
|---|---|---|---|---|---|---|
| 10°, 0°, 0° | (0.5, 0.1, 0.1) | 2.1 × 10−4 | (0.50097, 0.1020, 0.0989) | 0.2428° | 0.2469 | yes |
| 10°, 20°, 0° | (0.7, −0.1, 1) | 7.9 × 10−6 | (0.6993, −0.0999, 1.0000) | 0.0028° | 0.5513 | yes |
| 0°, 10°, −5° | (1, −3, 0.1) | 2.3 × 10−4 | (0.9993, −2.9993, 0.1000) | 0.0080° | 0.2760 | yes |
| 1°, 10°, −8° | (1, −11, 1) | 1.3 × 10−4 | (0.9960, −11.0096, 1.0049) | 0.0321° | 0.3154 | yes |
| 30°, 0°, 0° | (−0.1, −1, −3) | 0.1666 × 10−4 | (−0.0914, −1.0642, −3.0429) | 0.8615° | 0.7239 | yes |
| 1°, −30°, −80° | (0.0891, −0.0980, −0.0178) | 3.9 × 10−4 | (0.0900, −0.9853, −0.1790) | 0.0264° | 2.093 | no |
One more point to note is that we estimate the translation vector only up to a non-zero scalar multiple. Hence, for visualization purposes in figure (3.4), we scale it by the same scalar which scales the true translation of the second camera. For this case we select R1, t1 as the best possible pose solution, taking into consideration the observation that this solution has its rotation matrix closest to the identity matrix in the geodesic sense. This was one experiment on a synthetic dataset for our proposed approach.
Table 3.3: Results on part-real data for investigating the error due to an erroneous calibration matrix.

| Geodesic distance of Rtrue from I3 | Translation vector ttrue | Geodesic distance of R from Rtrue | Recovered translation vector t | Angle between t and ttrue | Geodesic distance of R from I3 | Selected solution has smallest geodesic distance from I3? |
|---|---|---|---|---|---|---|
| 0.4810 | (743.7650, −130.3833, 508.9385) | 0.0015 | (7.4437, −1.3033, 5.0907) | 0.0168° | 0.4813 | yes |
plotted in the local image coordinate system.

Figure 3.7: Difference between the two conics of the real and synthetic datasets
With this new dataset, we run our algorithm and select the best solution, tabulated in table (3.3). If we continue the assumption of the previous section, that C2 with the other parameters kept the same gives us the same pose solution, we have

KT C2 K / µsyn = KT C2syn K / µ, with µ, µsyn ≠ 0,

from equation (3.5). Hence C2 and C2syn represent the same conic, which has been found not to be true (as evident from figure (3.7)). Either the calibration matrix or the ground-truth values for the pose are inaccurate, or the conics C1 and C2 have erroneous representation matrices. But the conic detection algorithm has errors of the order of 10−3, which can be considered sufficiently negligible, and R and t as estimated through the toolbox, [19], give pixel errors of the order of 0.1. This leaves the calibration matrix, which has substantial errors, of the order of 10 in each of its elements. Added to these errors is the distortion, which is not included in the calibration matrix, giving us incomplete rectification while constructing the conic representations in the camera coordinate system. One can deduce from this discussion that the primary reason for the substantial error in the pose solution for a real dataset is errors in the calibration process we have employed through the calibration toolbox. But it is worth noting that our algorithm gives more accurate results than the conventional optimization process on synthetic datasets.
3.6 Summary
This chapter forms the core of the thesis. We start with an introduction to two equations, derived in section (1), which relate the relative pose to a conic correspondence. Based on these two equations, we devise a geometric construction in an epipolar geometry framework simplified by two important assumptions regarding the scene conic and the plane containing the true translation vector. The geometric approach thus proposed is tested on both synthetic and real datasets. The results so obtained are compared, analyzed and discussed in order to explain the performance of our proposed method. In the next chapter, (4), we consider two alternate approaches to pose estimation from one conic correspondence. These approaches differ from the geometric method of this chapter in the way in which we estimate the pose solution: instead of the geometric construction, they are based on optimization of cost functions appropriately modeled on the equations that relate the pose, R, t, to the elements of epipolar geometry, H, e, C1, C2.
CHAPTER 4
Alternate approaches to pose estimation
This chapter describes two techniques for pose estimation which we have considered at certain points, but whose results have not been as good as those obtained with the geometric approach discussed and reported in chapter (3). The first technique is based on the same set of equations as the geometric approach, which means the two assumptions defined in section (3.2) of chapter (3) also hold true here; but we estimate the pose through a conventional optimization scheme instead of the geometric construction. This is described in the next section, (4.1). The second approach reported here is based on a different idea, which can be seen as loosely based on the work of Higgins, [5], Zhang, [7], and Luong, [8]. We employ optimization of a cost function modeled for one conic correspondence and one point correspondence. The optimization schemes have been either the gradient descent method, implemented through calculation of the gradient vectors, or MATLAB's inbuilt methods like lsqnonlin(.). The results of both implementations are comparable; hence in section (4.1.1) we report results for experiments of the first approach through gradient descent.
4.1 Estimating R and t through optimization
The equations which define the dependence of R and t on the conics C1, C2 and the scene plane π are

(R − tuT)T KT C2 K (R − tuT) − µ KT C1 K = 03×3, wT t = 0, (4.1)

where u, C1, C2, w and K are known, and R, t and µ are to be estimated. For the sake of brevity we write C′1 = KT C1 K and C′2 = KT C2 K. From these we define the following cost function:

E(Y, t, µ) = ‖(Y − tuT)T C′2 (Y − tuT) − µC′1‖F2 + (wT t)2 + ‖YT Y − I3‖F2 + (det(Y) − 1)2. (4.2)
This allows us to use the above lemma (2) for C′1, giving us the vector u up to a scalar multiple. Hence, from equation (3.6), we can consider u a known constant and have to estimate all elements of t. The vector w being constant, the unknown variables are Y, t and µ. The norm for matrices considered here is the Frobenius norm:

‖A‖F = √(trace(AT A)).

We have replaced the rotation matrix R with a real matrix Y and the additional constraints YT Y = I3 and det(Y) = 1. The cost function has been optimized through the command lsqnonlin(.) in MATLAB, [21]. Results of sample experiments with this approach are listed in section (4.1.1). With a random starting point, the behavior of the algorithm is as expected for a conventional optimization technique: after a certain value of the cost function is reached, the algorithm tends to get stuck in a local minimum, and the final value achieved upon convergence depends on the starting point. For these reasons, it is practically infeasible to estimate a unique solution in the form of a global minimum of the cost function. This is evident from the results listed in table (4.1) of section (4.1.1). With a starting point close to the true value, the algorithm converges to a solution considerably close to the true value; but with a starting point considerably far from the true values, the point reached upon convergence is far from the true solution.
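A minimal sketch of this optimization, using a finite-difference Gauss–Newton loop in Python in place of MATLAB's lsqnonlin(.), is given below. The conics, u and w are hypothetical stand-ins constructed so that an exact solution exists; started near the truth, the iteration drives the residuals of (4.2) to zero, while a distant start generally would not:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the known quantities of equations (4.1)/(4.2).
C2p = np.diag([1.0, 2.0, -1.0])
u = np.array([1.0, -0.2, -0.3])
w = np.array([0.0, 1.0, 0.0])
R_true, t_true, mu_true = np.eye(3), np.array([0.5, 0.0, 0.1]), 2.0
M = R_true - np.outer(t_true, u)
C1p = M.T @ C2p @ M / mu_true          # C1' consistent with the true pose

def residuals(x):
    # Stack the four penalty terms of the cost (4.2) as a residual vector.
    Y, t, mu = x[:9].reshape(3, 3), x[9:12], x[12]
    A = Y - np.outer(t, u)
    return np.concatenate([
        (A.T @ C2p @ A - mu * C1p).ravel(),   # conic-transfer term
        [w @ t],                              # w^T t = 0
        (Y.T @ Y - np.eye(3)).ravel(),        # Y^T Y = I3
        [np.linalg.det(Y) - 1.0],             # det(Y) = 1
    ])

def gauss_newton(x, iters=30, eps=1e-7):
    for _ in range(iters):
        r = residuals(x)
        J = np.column_stack([(residuals(x + eps * e) - r) / eps
                             for e in np.eye(len(x))])
        x = x + np.linalg.lstsq(J, -r, rcond=None)[0]
    return x

x0 = np.concatenate([R_true.ravel(), t_true, [mu_true]])
x0 = x0 + 0.05 * rng.standard_normal(13)      # perturbed start, near the truth
x = gauss_newton(x0)
print(np.abs(residuals(x)).max() < 1e-6)      # converges from a nearby start
```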
One can also perform the optimization by explicit gradient descent. The gradient vectors ∂E(Y, t, µ)/∂Y, ∂E(Y, t, µ)/∂t and ∂E(Y, t, µ)/∂µ are:

∂E(Y, t, µ)/∂Y = 4C′2YYT C′2 + 2(tT C′2Yu)C′2tuT + 2‖u‖2 C′2ttT C′2Y + 2C′2YL + C′2tuT LT + L′ + 4YYT Y − 4Y + 2 det(Y)(det(Y) − 1)Y−T,

∂E(Y, t, µ)/∂t = 2(tT C′2Yu)C′2Yu + 2‖u‖2 C′22t + 4‖u‖4 C′2t − 4µ(uT C′1u)C′2t + C′2Yu + 2C′2tuT YT C′2Yu + ‖u‖2 (tT C′2t C′2Yu + 2C′2ttT C′2Yu) − µC′2YC′1u + 2(wT t)w,

∂E(Y, t, µ)/∂µ = −2tT C′2t uT C′1u + 2µ trace(C′12) − trace(YT C′2YC′1) − tT C′2YC′1u, (4.3)

where L = (tT C′2t)uuT − µC′1 and L′ = ∂(tT C′2YYT C′2Yu)/∂Y. L′ has not been simplified further here, for it does not have a concise representation in matrix form; it can be simplified using symbolic computation toolboxes such as MATLAB or Maple, or its analytic expression can be derived through some tedious matrix algebra.
The function lsqnonlin(.) of the optimization toolbox in MATLAB inherently offers two types of optimization algorithms. One is the Levenberg–Marquardt algorithm, [22], and the other is the trust-region method. These two vary in a manner which is not particularly important to the problem at hand. What is crucial is the fact that these algorithms do not always converge to the global minimum, and even when they do, one can never fully ascertain how many distinct points of global minimum our cost function can attain. A second problem here is that the cost function we are attempting to minimize is a set of thirteen polynomials in thirteen variables. Theoretically, such a system has multiple solutions, and through a purely optimization-based approach it is not feasible to estimate all possible pose solutions.
4.2 Multi-stage approach to pose estimation: a comparison
Another approach to which we have given some thought is based on a two-stage dependence of R and t on point and conic correspondences. This relationship depends on the property of the fundamental matrix that defines point correspondence between two image planes π1 and π2 as

a ↔ b ⇔ bT Fa = 0, a ∈ π1 and b ∈ π2. (4.4)

A fundamental matrix can be decomposed as F = [e]× H, as introduced in section (2.4). Thus, given n point correspondences {ai ↔ bi} as defined above, one can think of minimizing the error

f(F) = Σi=1..n (biT F ai)2.

This gives us the fundamental matrix F, from which we obtain the essential matrix E = K−T F K−1, assuming the calibration matrix K is known. Once E is known, R and t can be estimated through the relative orientation algorithm suggested by Hartley in [10]. This has led to a method of pose estimation from point correspondences which has been much studied and researched in the past and successfully implemented. Theoretically, seven point correspondences of this form are sufficient to estimate F. The points involved in the correspondences need to be in general position; by general position, we mean that no three points should lie on the same line in either of the two planes1. Similarly to these approaches, we suggest a method for estimating F from point and conic correspondences. To begin with, let us consider one point correspondence,

a ↔ b, a ∈ π1 and b ∈ π2, (4.5)

and one conic correspondence,

C1 ↔ C2, C1 ∈ π1, C2 ∈ π2.
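For comparison, the linear estimate of F obtained by minimizing f(F) = Σ (biT F ai)2 subject to ‖F‖ = 1 can be sketched as follows (a Python sketch with synthetic correspondences; this is the classical SVD-based linear approach, not the method proposed here):

```python
import numpy as np

rng = np.random.default_rng(2)

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical ground truth F = [e]_x H (rank 2 by construction).
F_true = skew(rng.standard_normal(3)) @ rng.standard_normal((3, 3))

# Synthesize correspondences: b_i is any point on the epipolar line F a_i.
A = rng.standard_normal((12, 3))
B = np.array([np.cross(F_true @ a, rng.standard_normal(3)) for a in A])

# b^T F a = kron(a, b)^T vec(F): minimize over ||vec(F)|| = 1 via SVD.
D = np.array([np.kron(a, b) for a, b in zip(A, B)])
f = np.linalg.svd(D)[2][-1]                  # right singular vector, smallest sv
F_est = f.reshape(3, 3, order="F")           # vec(.) stacks columns

F_true = F_true / np.linalg.norm(F_true)
F_est = F_est * np.sign(F_est.ravel() @ F_true.ravel())
print(np.allclose(F_est, F_true, atol=1e-6))  # recovered up to scale and sign
```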
Let us have a scene conic C in a plane π, π being a scene plane. Then C1 and C2 are the images of C under the two cameras on the image planes π1 and π2 respectively, thus defining the above-mentioned conic correspondence. The two cameras image the same scene plane, and hence there exists a homography between the two image planes, constructed by point transfer between the image planes through π. We have defined this point transfer in section (2.1.1). Projective invariance of conics implies that the same homography ought to transform C1 into C2. If this homography mapping is denoted by H, we have

HT C2 H = µC1. (4.6)
This equation imposes a constraint on H in the form of the zero set of six homogeneous polynomials in nine homogeneous variables2, the variables being the elements of the vector h. Here h = vec(H), where vec(.) is the usual operation in linear algebra that transforms an n × n matrix into the n2-dimensional vector formed by stacking up the columns of the matrix. After eliminating the unknown scalar µ, equation (4.6) is thus transformed into the set of five polynomials given next:

f : R9 → R5 : f(h) = [hT S1h, hT S2h, hT S3h, hT S4h, −hT S5h]T = 05×1, (4.7)

1In fact a set of three collinear points in one plane would invariably be mapped to three collinear points in the other plane.
2The conic and homography representations are in homogeneous coordinates, due to which we estimate H up to a non-zero scalar multiple.
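The structure of these constraints can be illustrated numerically (a Python sketch with hypothetical conics; the specific matrices S1, ..., S5 of (4.7) are not reproduced here). Each entry of HT C2 H is a quadratic form in h = vec(H), and eliminating µ by cross-multiplication leaves five homogeneous quadratics that vanish on any h satisfying (4.6):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical conic C2 and homography H; C1 is defined so that
# H^T C2 H = mu * C1 holds exactly.
C2 = rng.standard_normal((3, 3)); C2 = C2 + C2.T   # conics are symmetric
H = rng.standard_normal((3, 3))
mu = 1.7
C1 = H.T @ C2 @ H / mu

def entry(h, i, j):
    # (H^T C2 H)_{ij} is a quadratic form in h = vec(H):
    # the columns of H interact through C2.
    Hm = h.reshape(3, 3, order="F")
    return Hm[:, i] @ C2 @ Hm[:, j]

h = H.reshape(9, order="F")                        # column-stacked vec(H)
# Cross-multiplied constraints, one per independent entry (i, j) != (0, 0).
idx = [(0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
vals = [C1[0, 0] * entry(h, i, j) - C1[i, j] * entry(h, 0, 0) for i, j in idx]
print(np.allclose(vals, 0.0))  # five homogeneous quadratics vanish on vec(H)
```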
and H (or h = vec(H)) are obtained as:

δE(e, H)/δe = −2[Ha]×b bT [Ha]×e + 2(eT e − 1)e,

δE(e, H)/δh = −2(bT [e]× Ha) [a1 [e]×b; a2 [e]×b; a3 [e]×b] + 2(hT h − 1)h, (4.14)

where a = [a1 a2 a3]T and the 9-vector in the second gradient stacks the three scaled copies of [e]×b. The proposed algorithm is listed below.
Initialization: Let n = 0. We set timesteps te for the update in e and th for the update in h. It is possible to keep the timesteps changing dynamically with the magnitude of the gradient vectors, but for the time being we take them to be constants. The threshold for the cost function value, tolcost, is preset. The starting point for e is a random vector, but the starting point for H is chosen to lie on MX, h0 ∈ MX.

Algorithm:

1. Update e as en+1 = en − te (δE(e, H)/δe)|e=en, H=Hn.

2. Project (δE(e, H)/δh)|e=en+1, H=Hn onto the tangent space of MX at the point hn = vec(Hn). Let the projected vector so obtained be δh.

3. Compute the geodesic with starting point hn and starting vector δh. The updated value of h is the endpoint of the geodesic, say hn+1, and Hn+1 is obtained as Hn+1 = vec−1(hn+1). The geodesic computation has been implemented along the lines of [23] by Nowicki and Dedieu.

4. If E(en+1, Hn+1) < tolcost, stop; the solution is e = en+1 and H = Hn+1. Else increment n by one and repeat steps one through four.
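As a sanity check on the gradients in (4.14), the data term's derivative with respect to e can be compared against central finite differences (a Python sketch with random data; the norm penalties are omitted):

```python
import numpy as np

rng = np.random.default_rng(4)

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

a, b = rng.standard_normal(3), rng.standard_normal(3)
H, e = rng.standard_normal((3, 3)), rng.standard_normal(3)

def data_term(e):
    # Squared algebraic epipolar error for F = [e]_x H at the pair (a, b).
    return float(b @ skew(e) @ H @ a) ** 2

# Analytic gradient of the data term, as in the first term of (4.14):
# b^T [e]_x (Ha) = -b^T [Ha]_x e, so grad = -2 [Ha]_x b b^T [Ha]_x e.
Ha = H @ a
grad = -2.0 * skew(Ha) @ b * (b @ skew(Ha) @ e)

eps = 1e-6
fd = np.array([(data_term(e + eps * d) - data_term(e - eps * d)) / (2 * eps)
               for d in np.eye(3)])
print(np.allclose(grad, fd, atol=1e-5))  # analytic and numeric gradients agree
```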
4.2.2 Results and discussion
Both of these methods estimate e and H, and hence F, from one point and one conic correspondence. Unlike the approach of section (4.1), we do away with the two assumptions that the scene conic is a circle and that the translation plane is known. As stated previously, R and t can then be found via an SVD decomposition of E = K−T F K−1. But the problem here is that, the polynomial system being under-determined, the solution obtained through optimization varies with the starting point and may not be the true one. We have implemented these two optimization tasks on synthetic datasets, but the results are not promising enough to be listed here. The reason we have nevertheless described this approach is that there is some intuition in the idea. Previously we saw that MX is a polynomial manifold which can be seen as an intersection of five quadrics in R9. Their defining matrices S1, ..., S5 are 9 × 9 symmetric matrices with a special structure. This fact opens up a new way of looking at the optimization task: if the geometric structure of this intersection of quadrics is studied in detail, it may be possible to devise an improved optimization algorithm which gives more accurate pose solutions. Further, the intersection is a non-linear set of points in R9; hence, we surmise, identifying the subsets of this intersection which are linear sets could simplify the optimization.
4.3 Summary
This chapter introduces two alternate ways of estimating pose from one conic correspondence. The first approach, section (4.1), retains the two assumptions, and hence one conic correspondence is enough for pose estimation, whereas the second approach, section (4.2), does not rely on the two assumptions and hence needs five more point correspondences. For both approaches we design cost functions which are optimized either through MATLAB's optimization toolbox or through gradient descent with explicit computation of the gradient vectors. A common point for the two approaches is that the optimization tasks fail to converge to the true solution point. For the first approach we carry out experiments on synthetic data and justify this point as well.
CHAPTER 5
Conclusion and future work
In conclusion, we note that the geometric approach for pose estimation from one conic correspondence gives us accurate pose solutions, with errors of the order of 10−4. The idea for this approach rests on two important assumptions: that the scene conic is a circle, and that the translation vector lies in a known plane. With these two assumptions the geometry is highly simplified, due to which we are able to employ the computation toolbox in MATLAB to solve the simplified set of polynomials; this helps in estimating the finite set of all possible pose solutions. Next, the observation is made that the pose solution with the rotation matrix closest to the identity matrix is the best approximation to the true value. This observation helps in selecting one particular pose solution as the final solution to the pose estimation problem. With experiments on synthetic data, we show that this observation holds for rotation matrices close enough to the identity matrix; for larger distances, we find the observation failing. This raises an important question which can form part of future work: can a threshold be computed analytically such that, for all cases where the distance of the rotation matrix from the identity matrix is below this threshold, the observation holds true? If not, then we need to find another way to select one solution out of the finite set of pose solutions estimated through our geometric approach. Another sure way is to use one point correspondence. But as mentioned in chapter (3), we need to select a point correspondence which can be realized by only one pose out of all the solutions, and such a selection does not seem possible in all general cases, at least to our knowledge. So the search for a universal method to pick one pose solution is still an open problem.

Secondly, the results for the real dataset have been marred by inaccuracies in camera calibration. We have not figured out the exact source of error, but have shown that the error in the pose solution so estimated through the geometric approach is solely due to
[24] form a Grassmannian representation of a conic (in general, a curve) in P(E4) for epipolar geometry. Similarly, Burdis et al. in [25] consider the problem of establishing a correspondence between two curves which are images of the same curve in 3D, by considering the groups of projective transformations that leave a curve invariant in a specific sense. These two approaches lend a new meaning to the epipolar geometry of curves and might be extended to the specific problem of estimating pose from a conic (or, more generally, a curve) correspondence.
[31] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.

[32] Bézout's theorem. [Online]. Available: http://en.wikipedia.org/wiki/Bezout’s_theorem
APPENDIX A
Basics of projective geometry
A.1 Affine Geometry
In this section we introduce the geometry of affine spaces. These discussions will
lay the foundation for projective geometry.
A.1.1 Affine spaces
An affine space is a set A together with a vector space V and a faithful and transitive group action of V1 (with addition of vectors as the group action) on A. Explicitly, an affine space is a set of points A together with a map

l : V × A −→ A, (ν, a) → ν + a,

with the following properties:

1. Left identity: ∀a ∈ A, 0 + a = a (0 is a vector).

2. Associativity: ∀ν, w ∈ V, a ∈ A, ν + (w + a) = (ν + w) + a.

3. Uniqueness: ∀a ∈ A, ψa : V −→ A, ν → (ν + a) is a bijection. (This justifies transitivity of the map l, and faithfulness is seen in the fact that if two elements f, g of V are such that f + a = a and g + a = a for all a ∈ A, then f = g.)

The vector space V is said to underlie the affine space A and is also called a difference space. Thus we have the operator '+' (defined as the map l) between a point and a vector. Equivalently, we can define an affine space in another way, via some results that follow from the above definition, considering an affine space A with underlying vector space V:

1A group action of a vector space V on a set X is a map V × X → X, (v, x) → v.x, with associativity and existence of an identity element in V.
Lemma 4. We can define a subtraction map

φ : A × A −→ V, (a, b) → ν, ∀a, b ∈ A, ν ∈ V,

where (ν, a) → ν + a = b as per the definition of the '+' operator above. Thus we can define φ(a, b) = b − a ≡ −→ab = ν. We prove here that this map is onto V and many-one.

Proof. If for two vectors v, u in V we have φ(a, b) = v and φ(a, b) = u, then

φ(a, b) = v = u =⇒ v + a = b, u + a = b =⇒ v + a = u + a =⇒ (v − u) + a = a. (A.1)

Further, the uniqueness property says that 0 is the only vector such that 0 + a = a, ∀a ∈ A. Hence we have v − u = 0 =⇒ v = u. This proves that the map φ : A × A → V is well defined. Also, for every vector v in V and every point a in A, we can find a point b in A such that b = v + a. Hence φ(a, b) = v for some b ∈ A, for every v ∈ V. This proves that the map φ is onto. And we can find distinct points a1, a2, b1, b2 in A such that v = φ(a1, b1) = φ(a2, b2) for at least one v in V. This proves that φ is many-one.
Lemma 5. For three points a, b, c in A, φ(a, b) + φ(b, c) = φ(a, c), where φ is as defined in lemma (4) above.
We shall accept this lemma without proof here.
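For the canonical affine space A = R2, where the subtraction map is simply φ(a, b) = b − a, the statement of lemma (5) can be checked directly:

```python
import numpy as np

# In the canonical affine space A = R^2 with difference space V = R^2,
# the subtraction map of lemma (4) is phi(a, b) = b - a.
def phi(a, b):
    return b - a

a = np.array([1.0, 2.0])
b = np.array([4.0, -1.0])
c = np.array([0.5, 3.0])

# Lemma 5: phi(a, b) + phi(b, c) == phi(a, c).
print(np.allclose(phi(a, b) + phi(b, c), phi(a, c)))  # -> True
```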
Thus, with a definition of an affine space in place, we note that an affine space is actually a set of points together with such a vector space, and we represent it as (A, V, φ), where V is the underlying vector space, or also as (X, −→X, φ), where −→X is the underlying vector space. Henceforth in this document we will use
U ⊆ V or V ⊆ U or U ∩ V = ∅.
Hence the condition of parallelism invariance can be stated as:
Lemma 8. Given an affine morphism f : X → X
between two affine spaces (X ,−→X , φ)
and (X
,
−→
X
, φ) if U , V be the affine subspaces of X such that U //V then the correspond-ing affine subspaces in image of f, U
= f (U ) and V
= f (V ) follow the property
U //V
.
Proof. We have seen in the section on affine morphisms that →f defines the corresponding mapping between the underlying vector spaces →X and →X′. Hence, by the definition of parallel subspaces, →U ⊆ →V or →V ⊆ →U.

Let →U ⊆ →V. Then

a′ ∈ →f(→U) ⟹ a′ = →f(a) for some a ∈ →U
⟹ a′ = →f(a) for some a ∈ →V (∵ →U ⊆ →V)
⟹ a′ ∈ →f(→V). (A.5)

Thus a′ ∈ →f(→U) ⟹ a′ ∈ →f(→V), and hence →f(→U) ⊆ →f(→V). Similarly we can show the other direction if we assume →V ⊆ →U. Hence all that remains is to show that f(U) and f(V) are affine subspaces of X′. For that we need to show that →f(→U) and →f(→V) are the corresponding vector spaces of f(U) and f(V), which is immediate from the alternate definition of an affine morphism and that of affine subspaces. Hence, once we prove that (f(U), →f(→U), φ′) and (f(V), →f(→V), φ′) are affine subspaces of (X′, →X′, φ′), we can say that parallel affine subspaces are transformed into parallel subspaces in the image affine space.
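The proof can be checked concretely in coordinates. A minimal sketch, assuming an affine map of R² written as f(x) = Ax + b (so that →f is multiplication by A):

```python
import numpy as np

# An affine morphism f(x) = A x + b; its linear part ->f is A.
A = np.array([[2., 1.], [0., 3.]])
b = np.array([1., -1.])
f = lambda x: A @ x + b

# Two parallel affine lines in X: through p and through q, with the
# same direction space ->U = ->V = span{u}.
u = np.array([1., 2.])
p, q = np.array([0., 0.]), np.array([5., 1.])
U = [p + t * u for t in (-1.0, 0.0, 2.0)]
V = [q + t * u for t in (-1.0, 0.0, 2.0)]

# The direction spaces of the images are both spanned by ->f(u) = A u,
# hence f(U) // f(V): the two image directions are proportional.
dir_fU = f(U[1]) - f(U[0])
dir_fV = f(V[1]) - f(V[0])
assert abs(dir_fU[0] * dir_fV[1] - dir_fU[1] * dir_fV[0]) < 1e-12
```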
A.2 Projective Geometry

Now we move on to the definitions of projective geometry. Projective geometry is the most general of these geometries and hence has fewer invariants, but those invariants are nevertheless extremely crucial.
A.2.1 Definition of a projective space

The projective space of dimension n, denoted P(En+1), is obtained by taking the quotient of the (n + 1)-dimensional vector space En+1, minus the origin, i.e. En+1 \ {0}, with respect to the equivalence relation

x ∼ x′ ⇔ x = λx′, ∀x, x′ ∈ En+1 \ {0}, ∃λ ∈ R \ {0}. (A.6)

Here we assume En+1 is a vector space over R. In some cases we may generalize to the complex field C, as mentioned where required. One can also verify that ∼ is indeed an equivalence relation here.
Many other equivalent definitions of P(En+1) are found in the literature. One might view each equivalence class as a 1-dimensional subspace of En+1; thus P(En+1) can be seen as the set of all 1-dimensional subspaces of En+1, or equivalently the set of all lines through the origin in En+1. These are different ways of looking at the definition, but essentially the same structure is obtained. Alternate ways of describing a projective geometry are interesting enough not to miss, hence, just for the sake of a lateral view:

A projective space is a triplet (P, L, I) such that

1. Any pair of distinct points is joined by a unique line.

2. Given any four points A, B, C and D with no three collinear, if AB intersects CD, then AC intersects BD.3

3. Every line is incident with at least three distinct points.

4. There exist three non-collinear points.

Here P is a set of points, L is a set of lines, and I is an incidence structure which tells us which line is incident on which point and which point is incident on which line. Many other properties of a projective space, including its invariants, can be derived from these axioms, but being out of the scope of this text, we skip them. Beutelspacher [28] and Casse [29] give an extensive treatment of this topic.
3This axiom leads to the much talked-about property of a projective plane that any two lines must intersect at a point.
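The quotient definition above can be checked numerically: two coordinate vectors represent the same projective point iff one is a non-zero scalar multiple of the other. A minimal sketch (NumPy assumed; the function name is ours):

```python
import numpy as np

def same_projective_point(x, y, tol=1e-9):
    """True iff nonzero vectors x, y in E^{n+1} satisfy x = lam * y for
    some nonzero scalar lam, i.e. they represent the same point of
    P(E^{n+1})."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # x and y are proportional iff the 2-column matrix [x y] has rank 1
    return np.linalg.matrix_rank(np.column_stack([x, y]), tol=tol) == 1

assert same_projective_point([1, 2, 3], [-2, -4, -6])     # lam = -1/2
assert not same_projective_point([1, 2, 3], [1, 2, 4])
```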
Groups of projective transformations

If the mapping f : P(En+1) → P(E′n+1) is bijective, then the mapping →f : En+1 → E′n+1 is one-one in the sense that for all u, v ∈ En+1 with u ≠ λv for every λ ∈ R, we have →f(u) ≠ λ→f(v) for every λ ∈ R. Similarly, →f is onto, as every vector in E′n+1 projects to a point in P(E′n+1), which has a unique corresponding point in P(En+1), which in turn is the projection of some vector in En+1.

These results tell us that we can uniquely identify a projective mapping between two projective spaces with a unique linear mapping between their corresponding vector spaces. Further, we know that any linear mapping →f : En+1 → E′n+1 can alternately be represented as multiplication by a matrix A:

∀v ∈ En+1, f(p(v)) = p(A v). (A.10)

For a homography we need the matrix A to be invertible; in other words, a homography is defined as a projective transformation for which →f is an isomorphism. The set of all homographies, PLG(En+1), which can be represented by the set of all such invertible matrices A, forms a group, with the group operation being composition of homographies:

f, g ∈ C(P(En+1), P(En+1)) ⇒ f ∘ g ∈ C(P(En+1), P(En+1)).

The identity element is the identity homography, defined by

∀v ∈ En+1, →f(v) = A v = v,

implying f(a) = a for all a ∈ P(En+1). Further, since →f : En+1 → En+1, and hence f : P(En+1) → P(En+1), is bijective, the inverse homography exists such that f ∘ f⁻¹ is the identity homography. Thus PLG(En+1) is a group.
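As a numerical illustration, with arbitrary invertible matrices standing in for homographies of the projective line P(E2), the group operations above are simply matrix operations read up to scale (a sketch; the helper names are ours):

```python
import numpy as np

def act(A, v):
    """Apply the homography represented by matrix A to the point p(v)."""
    return A @ np.asarray(v, float)

def same_point(x, y):
    """Equality in projective space: proportional up to a nonzero scale."""
    return np.linalg.matrix_rank(np.column_stack([x, y])) == 1

A = np.array([[2., 1.], [0., 1.]])   # an invertible matrix: f in PLG(E2)
B = np.array([[1., 3.], [1., 4.]])   # another homography g
v = np.array([1., 2.])

# composition f o g corresponds to the matrix product A @ B
assert same_point(act(A @ B, v), act(A, act(B, v)))
# the inverse matrix gives the inverse homography: f o f^{-1} = identity
assert same_point(act(A @ np.linalg.inv(A), v), v)
```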
Lemma 11 (First fundamental theorem of projective geometry). Let P(En+1) and P(E′n+1) be two projective spaces of dimension n, and let their associated vector spaces be En+1 and E′n+1. Let {mi}1≤i≤n+2 and {m′i}1≤i≤n+2 be bases of P(En+1) and P(E′n+1) respectively. Then there is a unique homography g : P(En+1) → P(E′n+1) such that g(mi) = m′i, ∀i, 1 ≤ i ≤ n+2.

Proof. Given the basis {mi}1≤i≤n+2 of the projective space, let {→mi}1≤i≤n+1 be the corresponding basis of the associated vector space. Thus {→mi}1≤i≤n+1 and {→m′i}1≤i≤n+1
are the bases of the vector spaces of P(En+1) and P(E′n+1), as defined in subsection (A.2.2) on bases of projective spaces.

We use the canonical projections p : En+1 \ {0} → P(En+1) with p(→mi) = mi, and p′ : E′n+1 \ {0} → P(E′n+1) with p′(→m′i) = m′i. From the given condition g(mi) = m′i, ∀i, 1 ≤ i ≤ n+2, we have

g(p(→mi)) = g(mi) = m′i = p′(→m′i) = p′(→g(→mi)), ∀i, 1 ≤ i ≤ n+2.

∴ →m′i = λi →g(→mi), λi ∈ R \ {0}, ∀i, 1 ≤ i ≤ n+1. (A.11)

Let →m′n+2 = λ →g(→mn+2). But →mn+2 = Σ_{i=1}^{n+1} →mi and →m′n+2 = Σ_{i=1}^{n+1} →m′i, so from equation (A.11) we get

Σ_{i=1}^{n+1} λi →g(→mi) = λ →g(Σ_{i=1}^{n+1} →mi).

Using the fact that →g is a linear function, and since {→mi}1≤i≤n+1 forms a basis of En+1, the set {→g(→mi)}1≤i≤n+1 forms a basis of E′n+1. Thus we get

λi = λ, ∀i, 1 ≤ i ≤ n+1.

∴ →g(→mi) = λ →m′i, ∀i, 1 ≤ i ≤ n+2, ∃λ ∈ R \ {0}. (A.12)

Now consider two homographies g1 and g2 such that →g1(→mi) = λ→m′i, ∀i, 1 ≤ i ≤ n+2, and →g2(→mi) = µ→m′i, ∀i, 1 ≤ i ≤ n+2, for some λ, µ ∈ R \ {0}. Then →g1 = α→g2 (here α = λ/µ), and from lemma (10) we deduce that there is a single homography associated with them. Hence g1 = g2.
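The construction in the proof is directly computable. A sketch for the projective line (n = 1, so n + 2 = 3 basis points given as 2-vectors; the function name is ours):

```python
import numpy as np

def homography_from_basis(src, dst):
    """Unique homography (up to scale) sending a projective basis to a
    projective basis, here for n = 1: three points of P^1 as 2-vectors,
    no two proportional. Following the proof, rescale the first
    n + 1 = 2 vectors so that their sum gives the last basis point,
    on both sides."""
    m = np.column_stack(src[:2])
    lam = np.linalg.solve(m, src[2])    # src[2] = lam1*m1 + lam2*m2
    mp = np.column_stack(dst[:2])
    mu = np.linalg.solve(mp, dst[2])    # dst[2] = mu1*m1' + mu2*m2'
    # A maps lam_i * m_i -> mu_i * m_i', hence m_i -> m_i' up to scale
    return (mp * mu) @ np.linalg.inv(m * lam)

src = [np.array([1., 0.]), np.array([0., 1.]), np.array([1., 1.])]
dst = [np.array([2., 1.]), np.array([1., 3.]), np.array([1., 1.])]
A = homography_from_basis(src, dst)
for s, d in zip(src, dst):
    img = A @ s
    assert abs(img[0] * d[1] - img[1] * d[0]) < 1e-9   # img proportional to d
```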
A.2.4 Projective subspaces

Let V be a subset of a given projective space P(En+1) ≡ P(E) with associated vector space En+1. Then V is a projective subspace of P(En+1) iff we can find a vector subspace →V of En+1 such that P(→V) = V.4 Thus if →V is an m-dimensional subspace (m ≤ n + 1) of →E, then V is known as an (m − 1)-dimensional projective subspace of P(E).
Transformations of projective subspaces

A projective transformation f : Pn → Pm transforms a k-dimensional projective subspace (k < n) into an l-dimensional projective subspace, where l ≤ k. In other words, a plane in Pn would be transformed into either a plane, a line or a point in Pm, whereas a homography preserves dimensions: a line is transformed into a line, and a plane into a plane. Here a projective line is defined as a 1-dimensional subspace of the projective space, a plane as a 2-dimensional subspace, a point as a 0-dimensional subspace, and so forth. The reason a homography preserves dimensions is a point to note. A homography, or collineation (a term used for homography in certain literature), is a projective mapping associated with an isomorphism →f : En+1 → E′n+1. Hence each vector of a basis of any subspace of En+1 is uniquely mapped to a unique vector in E′n+1, and a set of linearly independent vectors {mi}1≤i≤k is mapped to a set of linearly independent vectors {m′i}1≤i≤k. Hence a subspace of k dimensions spanned by {mi}1≤i≤k is uniquely mapped to a subspace of k dimensions spanned by {m′i}1≤i≤k.
More on subspaces

Given two subspaces U and V of P(En+1), the span of U and V, ⟨U ∪ V⟩, is the smallest projective subspace containing U ∪ V (equivalently, the intersection of all subspaces of P(En+1) containing U ∪ V). One can then easily show that ⟨U ∪ V⟩ has the associated vector subspace F + G of En+1, where F and G are the subspaces of En+1 associated with U and V.
A.2.5 Affine completion

Generalizing affine geometry, we obtain projective geometry. Specifically, we show here the extension of an affine space that yields a projective space. Consider an n-dimensional affine space (X, →X, φ). Assuming {mi}1≤i≤n+1 to be a basis of the affine space X, we can denote every point m ∈ X by taking the vector →m1m = →m = (x1, ..., xn), i.e., by its coordinates in the given basis. Extending this coordinate representation by appending a 1, we have →mp = p((x1, ..., xn, 1)) = p([→m, 1]).5 Hence, as there is a one-one correspondence →m ↔ →mp, we can represent every point a in P(En+1) not at infinity by a unique point m in X; likewise, for every point m in X we have a unique point a in P(En+1).
5 p is the same canonical projection defined in equation (A.8).
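The embedding just described, and its partial inverse for points not at infinity, can be sketched in a few lines (function names are ours):

```python
import numpy as np

def to_projective(m):
    """Affine point (x1, ..., xn) -> projective point p([m, 1])."""
    return np.append(np.asarray(m, float), 1.0)

def to_affine(a):
    """Projective point not at infinity -> its unique affine point."""
    a = np.asarray(a, float)
    assert abs(a[-1]) > 1e-12, "a point at infinity has no affine image"
    return a[:-1] / a[-1]

m = [2.0, -3.0]
# any nonzero rescaling represents the same projective, hence affine, point
assert np.allclose(to_affine(5.0 * to_projective(m)), m)
```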
Any hyperplane in En+1 can be represented as

Σ_{i=1}^{n+1} hi xi = 0, ∀x ∈ En+1, x ≡ (x1, ..., xn+1), ∃h ∈ En+1, h ≡ (h1, ..., hn+1).

The vector of coefficients h described above defines the hyperplane uniquely, up to multiplication by a non-zero scalar. Thus a hyperplane can be uniquely represented by a projective point a = p(h), a ∈ P(En+1); vice versa, every point h ∈ P(En+1) uniquely gives us a hyperplane, one from H(En+1). Hence, with equation (A.14),

H(En+1) ←→ H(P(En+1)) ←→ P(En+1). (A.15)

E.g., the hyperplane defined by the equation xn+1 = 0 corresponds to the multiples of the vector h ≡ (0, ..., 0, 1) and hence to the projective point p((0, ..., 0, 1)). This section indirectly hands us the concept of duality in a projective space: given an n-dimensional projective space P(En+1), any point uniquely corresponds to a hyperplane in the corresponding En+1 and hence to a unique hyperplane in P(En+1). Thus points and lines (a line being a hyperplane of the projective plane P(E3)) are duals of each other in P(E3).
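In the projective plane this duality is directly computational: with points and lines both written as 3-vectors, the line joining two points and the point where two lines meet are both given by the cross product. A small sketch (not the thesis' notation):

```python
import numpy as np

def join(a, b):
    """Line through two projective points of the plane (3-vectors)."""
    return np.cross(a, b)

def meet(l, m):
    """Intersection point of two projective lines (3-vectors)."""
    return np.cross(l, m)

a, b = np.array([0., 0., 1.]), np.array([1., 1., 1.])  # affine (0,0), (1,1)
l = join(a, b)
# the incidence relation h^T x = 0 holds for both defining points
assert abs(l @ a) < 1e-12 and abs(l @ b) < 1e-12
# two parallel affine lines meet at a point at infinity (last coord 0)
p = meet(np.array([1., -1., 0.]), np.array([1., -1., 2.]))
assert abs(p[-1]) < 1e-12
```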
A.2.6 Action of Homographies on subspaces and study of invariants

We know that parallelism and incidence are invariant under any affine transformation, as seen in the previous sections on affine invariants. Similarly, in projective geometry we can talk about invariants, which are the cross-ratio and incidence. It is elementary to prove that incidence is preserved under a projective transformation: all we have to show is that for two projective subspaces, one contained in the other, the underlying vector spaces are transformed into vector subspaces for which the corresponding containment again holds between the images. This can be proved using the linearity of the underlying vector space mapping. Hence we only define the cross-ratio here:
Lemma 13. Given four points a, b, c, d on P(E2) such that the first three points are distinct, and denoting by ha,b,c the homography on P(E2) such that ha,b,c(a) = ∞, ha,b,c(b) = 0 and ha,b,c(c) = 1, the cross-ratio of the four points is defined as {a, b; c, d} = ha,b,c(d).
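Numerically, the cross-ratio of four points of the projective line (2-vectors) can be computed and its invariance under any homography checked directly. The 2 × 2 determinant formula below is the standard one for P^1 and is an assumption insofar as the thesis' own formula is not visible on this page:

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross-ratio {a, b; c, d} of four points of P^1 given as 2-vectors."""
    det = lambda x, y: x[0] * y[1] - x[1] * y[0]
    return (det(a, c) * det(b, d)) / (det(a, d) * det(b, c))

rng = np.random.default_rng(0)
pts = [rng.standard_normal(2) for _ in range(4)]
A = rng.standard_normal((2, 2))       # a.s. invertible: a homography of P^1
images = [A @ p for p in pts]
# each determinant picks up a factor det(A), which cancels in the ratio,
# so the cross-ratio is a projective invariant
assert np.isclose(cross_ratio(*pts), cross_ratio(*images))
```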
We state two lemmas next, which we accept without proof, though we give a short explanation along with each.
Lemma 14. A point in P(E*n+1) uniquely represents a hyperplane H in P(En+1), and hence a unique hyperplane H′ in En+1 (refer to equation (A.15)). In fact, if the point is f ∈ P(E*n+1), represented by the coordinate vector (α1, ..., αn+1) of E*n+1 up to a non-zero scalar multiplication, then the corresponding hyperplane is defined by the equation hᵀx = 0, where h is the vector uniquely defined as (α1, ..., αn+1) up to a non-zero scalar multiplication. This hyperplane is normally considered to lie in En+1. Thus a set of points in P(E*n+1) is a set of hyperplanes in En+1, and also, by equation (A.15), a set of hyperplanes in P(En+1).
Lemma 15. Given a line ∆ in P(E*n+1), there is a unique (n − 2)-dimensional subspace V of P(En+1) such that ∆ = {H ∈ H(P(En+1)) : V ⊂ H}, and for every x not in V there is a unique H ∈ ∆ containing x. Thus ∆ may be viewed either as a line in P(E*n+1) or as the corresponding pencil of hyperplanes in P(En+1) (we can note that there is a one-one correspondence between P(En+1) and P(E*n+1)).
An elaborate proof, given by Faugeras et al., can be read in [27]. This gives us an understanding of a line in the projective dual space P(E*n+1) and the corresponding pencil of hyperplanes in the projective space P(En+1). Ideally we have a vector in the dual space uniquely corresponding to a vector in the vector space, and vice versa; and by result (A.15), each of the points on the line in the projective dual space P(E*n+1) corresponds to a unique hyperplane in the vector space En+1 and hence to a unique hyperplane in the projective space P(En+1). The above result adds that, corresponding to all these points lying on a line, the hyperplanes in P(En+1) contain a common (n − 2)-dimensional projective subspace V.
Lemma 16. If we consider another line D in P(En+1) such that it does not intersect V, then we have a homography

∆ → D : H ↦ H ∩ D.

Proof. We are given the map ∆ → D : H ↦ H ∩ D. To show that it is a homography, we consider the corresponding map between the respective vector spaces. With

∆ = P(F*), D = P(D*), F* ⊂ E*n+1, D* ⊂ En+1,

→f : F* → D*

is the corresponding map, which we show is an isomorphism. We proceed in two parts.
maps to v in D*, which can be expressed as

vj = Σ_{i=1}^{n+1} hi αij, ∀j, 1 ≤ j ≤ n+1, (A.16)

where the αij are constant scalars dependent only on m1, ..., mn−1. Thus this transformation is a linear one, which can easily be verified using the definition of a linear map and the transformation →f : F* → D* : g ↦ v. Now this transformation defines the corresponding projective mapping f : ∆ → D : H ↦ H ∩ D, and hence the projective mapping is a homography, i.e. a projective morphism.
A.2.8 Homography as a perspective projection between two projective lines

Consider the setting of a 2D projective space (a plane). Let us have a point o and two lines l and m not passing through o. Further, let us have four lines n1, n2, n3 and n4 passing through o. Let a, b, c, d be the points of intersection of line l with lines n1, n2, n3, n4, and a′, b′, c′, d′ the points of intersection of line m with lines n1, n2, n3, n4 respectively. The resulting correspondence is a projective transformation which is a homography from P(E2) onto P(E2); the proof, which is quite elementary, is omitted here. This is not, however, the only kind of homography possible between two projective lines: given 3 point correspondences between two P1 lines, we can obtain a unique homography8 between them, and not all of these are such that the lines defined by the vectors →aa′, →bb′ and →cc′ are concurrent. Thus this kind of projective transform is a special case, known as a perspective projection, or perspectivity. In fact, in section (A.2.3) we saw that all homographies f : P1 → P1 form a group PLG(R2); these perspective projections also form a group, which is a subgroup of PLG(R2). How such projections form a group can be summed up from figure (A.2) below, where m1 → m2 and m2 → m3 are homographies due to perspective projection, and m1 → m3 is also a perspective projection.
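The construction above can be sketched with cross products: a point a on l is sent to the intersection of the line through o and a with the line m. A minimal sketch with hypothetical coordinates:

```python
import numpy as np

def perspectivity(o, m):
    """Perspective projection onto line m (3-vector) from centre o:
    a -> (line through o and a) ∩ m, both steps via cross products."""
    return lambda a: np.cross(np.cross(o, a), m)

o = np.array([0., 0., 1.])     # centre, lying on neither line
l = np.array([1., 0., -1.])    # the affine line x = 1
m = np.array([1., 0., -2.])    # the affine line x = 2
f = perspectivity(o, m)
a = np.array([1., 3., 1.])     # a point on l (l @ a == 0)
b = f(a)
assert abs(m @ b) < 1e-12                 # the image lies on m
assert abs(np.cross(o, a) @ b) < 1e-12    # o, a and f(a) are collinear
```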
A.2.9 Homography between two planes
Figure (A.3) shows a correspondence between two planes m and l due to perspective projection. In this case we can extend the result for lines to see that such a

8Of course, as defined in section (A.2.3), a homography is unique up to non-zero scalar multiplication.
Figure A.2: Associativity of perspective projections

Figure A.3: An example of a homography between two projective planes l and m due to perspectivity
perspective projection between two planes centered at a point o is also a homography f : P(E3) → P(E3), so that f ∈ PLG(R3). Thus this is a special case of a homography between two projective planes, which would otherwise have needed 4 point correspondences. In this scenario, as we know the central point o, knowing just three non-collinear point correspondences leaves the planes well defined, with the correspondences between all the other points obtained by collineating through the point o. The three points in each plane constructing the correspondences are measured in local coordinate systems in the respective planes.

A good example of why we need four points for specifying a homography between two projective planes in general can be seen by considering only three point correspondences: the position of the center point o is then uncertain. With reference to [6], we can see that given a point o, we can fix three directions and then find three points, one in each direction; this determines the position of two planes and a homography between them. With some extension we can show that on specifying a fourth point correspondence the freedom is restricted, and not every point o can act as the center of the perspective projection. A point to note is that the relative positions are to be calculated. We have proved a similar result in chapter (2), where we show that not all four-point mappings can be realized by a perspective projection. Thus the set of homographies of projective planes obtained through perspective projection forms a subgroup of the set of all homographies of projective planes. The group structure of this subgroup has been studied and discussed in the literature on projective geometry, e.g. [27, 30]. This kind of homography is extremely useful for camera calibration and pose estimation; it defines many properties governing image formation in the pin-hole camera model.

One point to note is that in this appendix we have looked at homographies as invertible, bijective projective transformations between two projective spaces of equal dimensions (also known as projective morphisms), as in section (A.2.3). Henceforth, however, we will use the term homography only for projective morphisms between projective planes, as considered in this section.
CHAPTER B

Camera models and camera calibration

A basic camera is a projective model that maps points in P(E4) to points in P(E3). Skipping elementary constructions, we give the general formula that maps a point X ∈ P(E4) to a point x ∈ P(E3).1 Assuming the camera coordinate system to be centered at the euclidean coordinate system,

x = K R [I | −C]3×4 X, (B.1)

where K is the 3 × 3 camera calibration matrix, which relates points in the 3D camera coordinate system to the 2D image coordinate system and houses the intrinsic parameters. Further, R and t = −RC are the extrinsic parameters of the camera: R is the rotation matrix and t is the translation vector relating the 3D world coordinate system to the 3D camera coordinate system. The point C denotes the camera center in the world coordinate system, and hence C = [C 1]ᵀ is one of the vectors in R4 representing C.

P = K R [I | −C] is the 3 × 4 projection matrix of the camera.

This raises an important question: can all 3 × 4 real matrices represent a camera projection matrix? The answer is yes. This question leads us to two kinds of cameras, classified based on the position of the camera center:
B.1 Finite Camera

If the leftmost 3 × 3 submatrix of the projection matrix P (let us denote it by M) is a non-singular matrix, we have

P = M [I | −M⁻¹p4],
1Owing to space restrictions we denote a point X ≡ (a, b, c) in P(E4) by one of its corresponding vectors (a, b, c, 1) in R4. Henceforth we use this notation for a projective point unless specified otherwise.
where p4 is the rightmost column of P. The camera center in the world coordinate frame is defined as the vector C such that PC = 0. For the finite camera with non-singular M, C is the point represented as

C = [−M⁻¹p4, 1]ᵀ.

In short, a finite camera is one whose center C is a finite point in the 3D world coordinate system.2
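A quick numerical sketch (hypothetical intrinsics and pose): build P = KR[I | −C] for a finite camera, then recover the center both as −M⁻¹p4 and as the right null vector of P:

```python
import numpy as np

# hypothetical intrinsics and pose
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
th = 0.3
R = np.array([[np.cos(th), -np.sin(th), 0.],
              [np.sin(th),  np.cos(th), 0.],
              [0., 0., 1.]])
C = np.array([1., 2., 3.])

P = K @ R @ np.hstack([np.eye(3), -C[:, None]])   # 3x4 projection matrix

M, p4 = P[:, :3], P[:, 3]
assert np.allclose(-np.linalg.solve(M, p4), C)    # C = -M^{-1} p4
assert np.allclose(P @ np.append(C, 1.0), 0.0)    # P C = 0
```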
B.1.1 Elements of a finite projective camera

Assuming that we have a finite camera at hand, the camera projection matrix P = [M | p4] is dissected into the following elements:

1. Column points: The leftmost 3 columns of P, namely p1, p2, p3, represent the images of the 3 principal directions X, Y, Z of the world coordinate system, and p4 represents the image of the origin of the world coordinate system. This is so because in P3 a direction is represented by a point at infinity in that direction; hence the X direction is represented by the point given by the vector (1, 0, 0, 0)ᵀ.

2. Row vectors: Denoting the rows of P by r1, r2, r3, the principal plane is the plane parallel to the image plane and passing through the camera center; hence all points that project to image points represented by (x, y, 0) lie on this plane. Thus

PX = [r1; r2; r3] X = [x, y, 0]ᵀ,

and hence r3X = 0, so r3 is the row representing the principal plane. Similarly, the other two rows are the planes which project onto the X and Y axes of the image plane; they are known as axis planes.
B.2 Infinite Camera

An infinite camera is one whose center is at infinity. Using the notation of the previous section, M is then a singular matrix, and applying the condition PC = 0 we get the camera center as

C = [d3×1, 0]ᵀ,

where d is a null vector of M.

2Can we say that a point at infinity in the 3D world coordinate system is also a point at infinity in the 3D camera coordinate system?
B.3 Camera calibration

From section (B), the projection matrix of a camera model is given by

P = K R [I | −C],
K = [ α γ u0
      0 β v0
      0 0 1 ], (B.2)

where K is the camera calibration matrix. In fact, α and β represent the focal length of the camera in terms of pixel dimensions in the x and y directions respectively, γ represents the skew due to distorted sensors in practical cameras, and u0 and v0 are the x and y coordinates of the principal point3 in the image coordinate system. Further, R is the rotation matrix and t = −RC is the translation vector.

The process of camera calibration is defined as estimating these quantities. Further, P is a 3 × 4 matrix with 11 degrees of freedom4 and rank 3. Thus knowledge of 6 point correspondences is needed to uniquely estimate P up to non-zero scalar multiplication; in fact only 5 and a half correspondences are needed. Representing our image plane as P(E3) and the world coordinate space as P(E4), we can show that the process of imaging scene points in P(E4) onto the image plane P(E3) is a form of projective transform. Hartley and Zisserman [1] and Trucco et al. [14] give a detailed treatment of various ways of estimating P. A paper by Zhang [31] also outlines two main kinds of methods for camera calibration:

1. Photogrammetric calibration: Here the calibration is done by specifying a set of 3D-2D point correspondences between the world coordinate system and the image plane. For this, an elaborate setup and knowledge of the 3D coordinates of the model object are required.

2. Self-calibration: Here more than one image of the scene is obtained using the same camera. The different images are created by a rigid motion of the camera in 3D space.5 These views impose certain constraints on the internal parameters of the camera and hence can help us estimate the projection matrix without the need for an explicit calibration model.

3A principal point is the point of intersection of the perpendicular line from the camera center to the image plane, with the image plane itself.

4K has 5 degrees of freedom, R has 3 and C has 3, thus a total of 11.

5It can be a euclidean or a projective space.
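As a sketch of the photogrammetric route (the standard Direct Linear Transform, not the thesis' own algorithm): with n ≥ 6 known 3D-2D correspondences, P can be estimated up to scale as the null vector of a 2n × 12 design matrix:

```python
import numpy as np

def dlt_projection_matrix(X, x):
    """Estimate the 3x4 projection matrix P, up to scale, from n >= 6
    correspondences. X: n x 4 homogeneous world points; x: n x 3
    homogeneous image points. Standard DLT: each pair contributes two
    rows to A p = 0; p is the smallest right singular vector of A."""
    rows = []
    for Xi, xi in zip(X, x):
        u, v, w = xi
        rows.append(np.concatenate([w * Xi, np.zeros(4), -u * Xi]))
        rows.append(np.concatenate([np.zeros(4), w * Xi, -v * Xi]))
    return np.linalg.svd(np.array(rows))[2][-1].reshape(3, 4)

# synthetic check against a hypothetical camera
P_true = np.array([[800., 0., 320., 100.],
                   [0., 800., 240., -50.],
                   [0., 0., 1., 2.]])
rng = np.random.default_rng(1)
pts = rng.uniform(-1., 1., (8, 3))
pts[:, 2] += 5.                            # keep all depths well positive
Xw = np.column_stack([pts, np.ones(8)])
xi = (P_true @ Xw.T).T
P_est = dlt_projection_matrix(Xw, xi)
P_est *= P_true[2, 2] / P_est[2, 2]        # fix the arbitrary scale and sign
assert np.allclose(P_est, P_true, atol=1e-5)
```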
CHAPTER C

Some miscellaneous proofs

This appendix contains mathematical proofs of certain statements claimed in various sections of the thesis. While some proofs might look trivial and some not so trivial, the premise of a rigorous mathematical backbone is airtight argument and reasoning, and hence we aspire to lay down whatever proofs felt relevant with utmost rigor.
Lemma 17. The zero set of the function f in equation (4.7) defines the set of valid values of h. We hypothesize that this set of points

X = {h ∈ R9 | f(h) = 05}

defines an implicit manifold of dimension four. In other words, the Jacobian of f,

J_X(h) = 2 [ hᵀS1
             hᵀS2
             hᵀS3
             hᵀS4
             hᵀS5 ]5×9, (C.1)

is a matrix of rank five for all non-zero values of the vector h ∈ R9, where the Si are nine-dimensional quadrics, i.e. real symmetric matrices, defined in section (4.2).
Proof. Let us assume, in general, that the five row vectors are linearly dependent. Then we have scalars αi, i = 1, 2, ..., 5, not all zero, such that Σ_{i=1}^{5} αi hᵀSi = 0. Using the definition of the Si, i = 1, 2, ..., 5, we can write

hᵀS1 = [ h1ᵀC2   01×3   −p1 h3ᵀC2 ],
hᵀS2 = [ 01×3   h2ᵀC2   −p2 h3ᵀC2 ],
hᵀS3 = [ h2ᵀC2/2   h1ᵀC2/2   −p3 h3ᵀC2 ],
hᵀS4 = [ h3ᵀC2/2   01×3   h1ᵀC2/2 − p4 h3ᵀC2 ],
hᵀS5 = [ 01×3   h3ᵀC2/2   h2ᵀC2/2 − p5 h3ᵀC2 ].
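The rank claim can be probed numerically. The sketch below uses randomly drawn symmetric matrices as stand-ins for the thesis' specific quadrics Si (which depend on C2 and the pi), so it only illustrates the generic behaviour of a Jacobian of the form (C.1):

```python
import numpy as np

rng = np.random.default_rng(7)

def jacobian(h, S_list):
    """J_X(h) = 2 * [h^T S_1; ...; h^T S_5], a 5 x 9 matrix."""
    return 2.0 * np.vstack([h @ S for S in S_list])

# stand-ins for the S_i: random 9x9 real symmetric matrices
S_list = []
for _ in range(5):
    A = rng.standard_normal((9, 9))
    S_list.append((A + A.T) / 2.0)

# for generic S_i and a generic nonzero h, the Jacobian has full rank 5
h = rng.standard_normal(9)
assert np.linalg.matrix_rank(jacobian(h, S_list)) == 5
```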
for two sets of conics and their plane solutions, as given next.

Scaling the two vectors defining πu11 and πu21 by α, we get planes παu11 and παu21; the vector defining the plane παu11 is α[m11, m12, m13]ᵀ and that for παu21 is α[m21, m22, m23]ᵀ. From the definition of the quantities k11, k12 and M1 for the vector u11, and k21, k22 and M2 for the vector u21, given by equations (3.9) and (3.12), we can notice that they are not affected by the scaling. Hence from equations (3.21) and (3.22) one can infer that scaling u11 and u21 by α results in scaling of the centers xc1 and xc2 by 1/α. The local coordinate system is chosen to be an orthonormal set of axes; hence the radius of the circle is the same when represented in either of the local and global coordinate systems, which means that the radius is also scaled by 1/α.

Then, by application of lemma (2) to each of the two planes' scaled versions παu11 and παu21, with the conics C1 and C2 being the same, we have

xαc1 = xc1/α and xαc2 = xc2/α,

and the radius (which stays the same as it was for C1 and C2) of these two scaled circles is

rα1 = r1/α.

Further, we see that xαc1rot = xc1rot/α. From equations (3.25) and (3.26) we have pα1 = p1/α and pα2 = p2/α. Applying equation (3.29),

Aα = [ xc1/α   p1/α   (xc1 × p1)/α² ] and Bα = [ xc1rot/α   p2/α   (xc1rot × p2)/α² ].