
BSc Thesis

Fundamental Matrix Fun

Johanna K. Siemelink

ELTE, Faculty of Science, Mathematics BSc

Supervisor: Dr. Dávid Szeghy

ELTE, Faculty of Science, Institute of Mathematics, Department of Geometry

Contents

Central Projection
    Basic Properties
        Starting Points
        Shapes
        Cross Ratio
        Lines
    Projective Space
        Projective Central Projection

Homogeneous Coordinates
    Coordinates of a Plane
        Line Coordinates
        Special Case
        Features of Homogeneous Coordinates
        Cross Ratio Revisited
    Fitting Space with Homogeneous Coordinates

Transformations
    Projective Transformation
        Cross Ratio's Usefulness
        Matrix Form of Projective Transformations
        Projective Transformation's Effects on Lines
        An Example
    Camera Matrices
        Camera Extrinsics
        Camera Intrinsics, the Calibration Matrix
        Tying the Two Together

Fundaments of the Fundamental Matrix
    Triangulation
    Epipolar Geometry
    Fundamental Matrix
        Geometric Construction
        F's Rank
        Algebraic Derivation
        Epipolar Constraint
        Additional Traits

Usefulness of F
    Why Do We Like F?
        Up to a Projective Ambiguity
        Canonical Cameras Formula
    How to Get F
        Prelude
        Singular Values
        Singular Value Decomposition
    8-point Algorithm
        Using the SVD in Our Case
        Adding the Singularity Constraint
        Algorithm
        7-point Algorithm

From Here
    What We Did

Introduction

In this thesis we will explore a bit of the rich world of two view geometry. If we take two pictures of the same scene from two different angles, we suddenly have a lot more information about the scene than a single picture gives us. We use this every day with our two eyes for depth perception. Another application is self-driving cars, which have to reconstruct the scene around them from pictures taken by their cameras. In this thesis we will show the relation between the two pictures in the form of the fundamental matrix. To do this we will first introduce the necessary tool set.

We need a mathematical model to show what cameras can do, and a new type of space and a new coordinate system that are compatible with it. These are the projective space and homogeneous coordinates; we will introduce them and show some of their very useful properties. We will formalize even further and present a way to express the transformations we need as matrices. We will also construct the camera matrix, which is the matrix form of the transformation a camera performs when capturing a picture. With these tools we will be able to start exploring two view geometry.

The fundamental matrix (F) is the algebraic representation of the relation between two images of the same scene. It is very useful because with knowledge of F we can construct the camera matrices up to a certain ambiguity. We will also show several ways to construct F, each from different input data. The most important of these is the 8-point algorithm, which constructs the fundamental matrix from just 8 points on one of the images and the matching ones on the other. Showing this algorithm is the goal of this thesis. For this we'll delve a bit into singular value decomposition. The minimum number of point pairs needed to construct F is actually 7, and there is also a 7-point algorithm. However, the 8-point algorithm is recommended, because it can be used with more point pairs, which makes it more accurate in the case of noisy data. The 8-point algorithm is easily implemented and is one of the most used algorithms to find the fundamental matrix.

Central Projection

Figure 1: The workings of eyes and cameras [7]

To explore the mathematics of making images, we must first examine the two most important image-making systems: the human eye and the camera. The light rays from an object enter the eye through the pupil and are focused onto the retina. The retina is a surface filled with receptors, and the image is projected onto it. A camera works in a very similar way: light comes in through the shutter and is focused by the lens onto a back panel. There the picture hits a surface from which it can later be developed. Our goal is to construct a mathematical model that works like the objects above.

Definition (Central Projection): The image plane $\Sigma \subset \mathbb{R}^3$ and a point $O \notin \Sigma$ (which we'll use as the center) define the projection. For any point $P \in \mathbb{R}^3$ with $P \neq O$ and $P \notin \Sigma_O$ (the plane parallel to $\Sigma$ going through $O$), its image is the point $P' := \Sigma \cap OP$, the intersection of the line $OP$ and the image plane.

Figure 2: Central projection, the preferred position of 3D points when projecting is on the right

Here the center plays the role of the pupil or shutter: the lines that represent the light rays pass through it. They are then captured on the panel, which is the image plane.
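The definition translates into a few lines of code. The following is a minimal sketch, not taken from the thesis: it assumes the image plane is described by a normal vector `n` and an offset `c` (the plane is the set of points $X$ with $n \cdot X = c$), and the function name `central_projection` is made up for illustration.

```python
def dot(u, v):
    """Scalar product of two coordinate tuples."""
    return sum(a * b for a, b in zip(u, v))


def central_projection(P, O, n, c):
    """Image of P under central projection from O onto the plane n.X = c.

    Returns None when P lies on Sigma_O, i.e. when the line OP is
    parallel to the image plane and has no Euclidean intersection with it.
    """
    if dot(n, O) == c:
        raise ValueError("the center O must not lie on the image plane")
    if tuple(P) == tuple(O):
        raise ValueError("the center O has no image")
    direction = tuple(p - o for p, o in zip(P, O))
    denom = dot(n, direction)
    if denom == 0:
        return None
    # Parameter of the intersection point on the line O + t * (P - O).
    t = (c - dot(n, O)) / denom
    return tuple(o + t * d for o, d in zip(O, direction))


O = (0, 0, 0)
n = (0, 0, 1)  # image plane z = 1
print(central_projection((2, 4, 2), O, n, 1))  # a generic point
print(central_projection((3, 5, 1), O, n, 1))  # a point of the image plane
print(central_projection((1, 1, 0), O, n, 1))  # a point of Sigma_O
```

The second call returns its input unchanged and the third returns `None`, which are exactly the two special cases discussed in the Starting Points section.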

Basic Properties

Starting Points

It's immediately apparent that not all points behave the same way when they are projected. The points on the plane $\Sigma$ are fixed points: each is its own image. The center $O$ can't have an image, because in the case of $P = O$ the line $OP$ is not defined, so it is excluded from our definition.

Another special case are the points which lie on $\Sigma_O$, the plane parallel to $\Sigma$ that runs through $O$. If $P$ is such a point, the line $OP$ will be parallel to $\Sigma$, so they will not intersect. We will construct a solution for this later on.

There is also a difference between points depending on which side of the image plane they come from. With a camera or the human eye we can't see through the back of our heads, so we only see the points in front of us, i.e. on the side that the center is on. In mathematics we have the freedom to disregard such constraints. Our definition doesn't differentiate between front and back: we can also see all the points on the back side of the plane.

We do, however, want a model that we can use in various applications. We can sidestep this broad definition by simply putting all the points we're working with on the side of the plane we want. Mathematically the two sides are easily transformed into each other, so we do not have to choose the side that gives us the arrangement eyes and cameras use. To have a very practical arrangement we choose the other construction, the side further from $O$. When we're reconstructing from an image we also assume the points all come from this side. This will be used later; in this section we will explore the projection in its broadest form.

Shapes

Figure 3: Florence Cathedral [6]

When photographing a three-dimensional model we know there will be some distortion in the 2D picture. To explore what changes and what stays the same, we'll be looking at a picture of the Florence Cathedral.

When we look at a corner of this building, we can see that in the picture it is in the right spot: at the intersection of two walls. In fact, looking at any other intersection we observe that these also remain intersections in the picture.

We know from the original model that the orange segments along the wall all have the same height. In the picture they clearly do not. From this we may conclude that length is not preserved when we take a photograph. This also shows when we look at the circular windows. A circle is the set of points that are equidistant from its center. Distances are not preserved in a photograph, thus the circle gets deformed.

A circle turns into an oval, but a line apparently keeps its shape. This is easily proven: take a line $l \subset \mathbb{R}^3$ not passing through $O$ and project all its points onto the image plane (see Figure 4). The individual lines $OP^*$ with $P^* \in l$ sweep out a plane, the plane defined by $l$ and $O$. The image of the line lies on $\Sigma$, in the intersection of these two planes, which is a line.

If we look at the long red lines along the wall we find perhaps the most important property that is distorted. These lines are parallel in the real world, but in the picture they are not. Non-parallel lines in a plane always have an intersection, and these two are no exception. Where did this point come from? This intriguing mystery is at the heart of projective geometry; we'll explore it later.


Figure 4: The image of a line is also a line.

Cross Ratio

There is one quantity that is preserved, although it is far from apparent at first glance.

Definition (Cross Ratio): Take four collinear points $A, B, C, D \in l$. Picking a center $O \notin l$ defines four vectors pointing from $O$ to the four points. Take four vectors parallel to these: $a, b, c, d$. They are all parallel to one plane, so $a$ and $b$ form a basis of it and we can express the last two with them:
\[ c = \gamma_a a + \gamma_b b, \qquad d = \delta_a a + \delta_b b. \]
The cross ratio of these points is denoted by
\[ (ABCD) := \frac{\gamma_b \delta_a}{\gamma_a \delta_b}. \]

Figure 5: The definition for a cross ratio works for all sorts of arrangements

We can show some motivation for this seemingly arbitrary number. The cross ratio is also referred to as the ratio of ratios, simply because
\[ \frac{\gamma_b \delta_a}{\gamma_a \delta_b} = \frac{\delta_a}{\delta_b} : \frac{\gamma_a}{\gamma_b}. \]
In a sense a central projection ruins ratios, but the ratio of these ratios remains intact. We have to briefly note that the cross ratio can be any real number except 0 or 1. If $\frac{\gamma_b \delta_a}{\gamma_a \delta_b}$ were 0, then either $c = \gamma_a a$ or $d = \delta_b b$, so $C$ would coincide with $A$ or $D$ with $B$. These were distinct points, so $(ABCD)$ cannot be 0. With a cross ratio of 1 we get a similar problem, because then $\frac{\gamma_b}{\gamma_a} = \frac{\delta_b}{\delta_a}$, resulting in $C = D$.

We're not going to prove that this definition is mathematically valid or that it is independent of our choice of $O$; for the proof see Szeghy 2013 [2]. We will prove that the cross ratio stays the same from the real world to a picture:

Theorem (Cross Ratio in Central Projection): Four collinear points $A, B, C, D \in l$ have the same cross ratio as their centrally projected counterparts: $(ABCD) = (A'B'C'D')$.

Figure 6: The cross ratio of the four projected points will be the same as the original

Proof: The line and the center of projection define a plane, and the whole projection process happens in this plane; it doesn't leave it in any of the steps. All the points, the points' images, the line, the line's image and the center are in this one special plane. This means that we need only prove our theorem for a planar central projection. Higher-dimensional cases follow from this.

We have complete freedom in choosing $O$ for the cross ratio, so we can pick the center of projection. This means $O$, $A$ and $A'$ lie on the same line. The vector $a$ has to be parallel to $OA$ and $a'$ has to be parallel to $OA'$. Because these segments are on the same line, $a$ and $a'$ are parallel. We are at liberty to choose their lengths, so we can actually make them the exact same vector $a = a'$. Choosing the other three vectors in the same way, it is clear that the cross ratio derived from them will be the same.

□
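The theorem can also be checked numerically in the planar case the proof reduces to. The sketch below is illustrative only: the helper names `coefficients`, `cross_ratio` and `project`, the sample points, and the image line $y = 1 + 0.5x$ are all assumptions chosen for this example. The cross ratio of the images is deliberately computed from a different center, which relies on the independence from $O$ that the text cites from [2].

```python
def coefficients(a, b, v):
    """Solve v = p*a + q*b for planar vectors a, b, v (Cramer's rule)."""
    det = a[0] * b[1] - a[1] * b[0]
    p = (v[0] * b[1] - v[1] * b[0]) / det
    q = (a[0] * v[1] - a[1] * v[0]) / det
    return p, q


def cross_ratio(A, B, C, D, O):
    """(ABCD) = (gamma_b * delta_a) / (gamma_a * delta_b), seen from O."""
    a, b, c, d = [(P[0] - O[0], P[1] - O[1]) for P in (A, B, C, D)]
    gamma_a, gamma_b = coefficients(a, b, c)
    delta_a, delta_b = coefficients(a, b, d)
    return (gamma_b * delta_a) / (gamma_a * delta_b)


def project(P, O, n, c):
    """Planar central projection of P from O onto the line n.X = c."""
    direction = (P[0] - O[0], P[1] - O[1])
    denom = n[0] * direction[0] + n[1] * direction[1]
    t = (c - (n[0] * O[0] + n[1] * O[1])) / denom
    return (O[0] + t * direction[0], O[1] + t * direction[1])


O = (0.0, 0.0)
points = [(1.0, 3.0), (2.0, 3.0), (4.0, 3.0), (7.0, 3.0)]   # on the line y = 3
images = [project(P, O, (-0.5, 1.0), 1.0) for P in points]  # on y = 1 + 0.5x

original = cross_ratio(points[0], points[1], points[2], points[3], O)
projected = cross_ratio(images[0], images[1], images[2], images[3], (5.0, -2.0))
print(original, projected)
```

Both values agree up to floating point rounding, even though the last sample point is projected "through" the center onto the other side of it.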

Using the thought process of this proof we can see that four vectors define the cross ratio for a lot of collinear point sets. First we take a point $O$ and draw four lines through $O$ parallel to the four vectors. After that, any line that is not parallel to any of these and does not go through $O$ will intersect these lines and define four collinear points. Any set of points derived like this will have the exact same cross ratio. This gives rise to a definition of the cross ratio for vectors:

Definition (Vector Cross Ratio): Four pairwise non-parallel vectors that are parallel to a common plane have the cross ratio $(ABCD)$, where we get these points by intersecting a line with the lines parallel to these vectors running through a central point.


Lines

Returning to our investigation of central projections, we will look at the behaviour of lines. As with points, lines on $\Sigma$ are fixed and lines on $\Sigma_O$ have no image. This is trivial, as they consist of points with these properties.

The lines through $O$ raise a different problem: they project into a single point. A lot of lines can't be fully projected because they cross the $\Sigma_O$ plane. (The ones that run parallel to it have no such issue.) If we look at these "lines" as lines with a single point missing, they have the image of a line. As we move nearer and nearer to the intersection with $\Sigma_O$, the image point shoots away to infinity along the image of the line. If we close in on the missing point from the other side of the problematic plane, the image point shoots off to infinity in the other direction along the line.

Figure 7: Closing in on the point without an image the image points shoot off into infinity

Another curious phenomenon is a point suddenly appearing in our image. Imagine a picture of train tracks disappearing in the distance. The two rails are parallel in reality, but in the image they are not. Two non-parallel lines intersect in a point, which we can find in our picture. So now we have a point that didn't exist in our 3D space, but appears in the picture. We need some serious renovations in our system for this to start making sense.

Projective Space

In the last segment we saw that the missing image of our point should be somewhere at infinity. This is exactly what we will introduce: an ideal point, which is at both ends of our line, infinitely far away. We know $\mathbb{R}^1$ is a line, so to properly define a new type of line we'll define a one-dimensional projective space.

Definition (Projective Line): $\mathbb{P}^1$, the one-dimensional projective space, consists of $\mathbb{R}^1$ and an added point at infinity.

Now let's try to construct a model for a plane. We want the lines to be projective lines like the one we defined above. The big question here is how all the points at infinity relate to each other. Does every line have its own point? Is there only one point at infinity, with all lines intersecting in the distance? We have to decide what will best suit our purposes. Our biggest hint comes from the train track problem. We saw two previously parallel lines intersect in the picture. If we lay down another pair of train tracks parallel to the previous one, all four rails will share an intersection in the image. This gives rise to the idea of taking parallelism classes and assigning a point to each.

Another motivating factor is the image of a pencil of lines through a point on $\Sigma_O$. We saw earlier that when nearing the missing point, the image points of the line shoot off to infinity. We can do this with all the lines going through this point. But how do the images of these lines relate to each other on the image plane? In space these lines only intersect at our critical point. It follows that the image lines have no other intersections either. Omitting the singular point, our lines have images that are lines on the image plane without intersections. That means these must be parallel! Thus our point at infinity should be at both ends of every line in the parallelism class containing our initial line.

Figure 8: Going through the same disappearing point the images of these lines become parallel

The last thing we have to decide before defining anything is how the points at infinity relate to each other. If we take a look at the images of all the ends of the train tracks, they actually form the horizon. The tracks all lie in one plane, which gives enough incentive to take all of the ideal points of a plane and define the ideal line constructed from them. Now we can give a definition:

Definition (Projective Plane): The two-dimensional projective space ($\mathbb{P}^2$) consists of $\mathbb{R}^2$ and points at infinity. We redefine a line as a Euclidean line together with a point at infinity, which is collinear with any pair of points on this line. Each ideal point belongs to exactly one parallelism class and lies on every line of that class. These points are threaded onto a line at infinity, which means all ideal points are collinear.

From this definition we can see that any two distinct points still uniquely define a line. We also gain useful properties: any two distinct projective lines have exactly one intersection, and all lines behave in exactly the same way. This also means that the line at infinity is not special; it behaves the same as all the others. This is a very useful tool in a lot of geometry problems. We can extend these definitions to higher dimensions.


Definition (Projective 3D Space): We take $\mathbb{R}^3$ and add a point at infinity at the end of all parallel lines of the same class. No two classes have the same ideal point. We redefine lines into projective lines by adding the point at infinity. We also redefine planes into projective planes: a Euclidean plane plus an ideal line of points, as in the previous definition. All the ideal points and lines form a projective plane at infinity, called the infinite plane.

Definition (Projective n-space): We take $\mathbb{R}^n$ and add a point at infinity at the end of parallel lines of the same class. No two classes get the same point. We redefine lines into projective lines, planes into projective planes, and inductively continue up to $(n-1)$-dimensional projective subspaces. All the ideal points form an $(n-1)$-dimensional projective space at infinity.

Projective Central Projection

In a projective space the problem of the points of the $\Sigma_O$ plane not having an image disappears. If a point $P$ is on this plane, the line $OP$ lies entirely in the $\Sigma_O$ plane. This plane is by definition parallel to the image plane, so the line $OP$ inherits this property. A plane and a line parallel to it intersect only at the line's point at infinity. This way the image of this point, previously missing, can be found at infinity.

Definition (Central Projection): The image plane $\Sigma \subset \mathbb{P}^3$ and the center $O \in \mathbb{P}^3$ ($O \notin \Sigma$) define the projection. For any point $P \in \mathbb{P}^3$ with $P \neq O$, its image is the point $P' := \Sigma \cap OP$, the intersection of the line $OP$ and the image plane.

We now also know where the added intersection of the Cathedral's walls comes from: it is the image of a point at infinity, where the originally parallel lines meet.

To summarize: central projection is a mathematical transformation which recreates how cameras and the human eye work. It is a map that captures 3D points on a 2D plane. It retains some properties of the model at hand, but it also loses a lot. For it to work properly we introduced points at infinity and with them the projective space.

Homogeneous Coordinates

Now that we have a new type of structure, we want to be able to calculate in it. To do this we need coordinates for our projective space. We could add some new coordinates to our existing Cartesian coordinates, marked with an asterisk to show they belong to the infinite points. This type of patchwork model would probably have rough transitions and give us trouble down the line. That's why we want a comprehensive system that is of extra mathematical use. For example, it could show us in an easy algebraic way whether three points are collinear. Thankfully there exists such a coordinate system, one that will help us immensely in our projective forays: the homogeneous coordinate system.

Coordinates of a Plane

We'll first define our coordinate system only for a two-dimensional projective plane; we'll later expand it to 3D. The big trick is to place the plane we want fitted with coordinates into a three-dimensional projective space: $\Sigma \subset \mathbb{P}^3$. We then choose the center $O \in \mathbb{R}^3 \subset \mathbb{P}^3$ of our coordinate system as a point not on this plane, $O \notin \Sigma$.


Let's take a point $P \in \Sigma$ and assign to it a vector $v$ which is parallel to the line $OP$. Before we proceed we have to check that we can recover $P$ from this vector. As $v$ and $O$ are known to us, we can construct the line through $O$ parallel to $v$, which passes through $\Sigma$ exactly at $P$.

Figure 9: The point on the plane gets its coordinates from the three dimensional vectors pointing at it

There are many vectors that are parallel to $OP$, so this assignment is not one-to-one. We want a coordinate system to be bijective: one coordinate representation for each object and one object for each coordinate set. To attain this we simply regard all the vectors that are parallel to each other as equivalent. They can be rescaled into each other, so we define the equivalence $v \sim \mu v$ for every $\mu \neq 0$.

Our vector has a representation in the Cartesian coordinate system of our 3D space: $v = x e_1 + y e_2 + z e_3$, where $e_1, e_2, e_3$ is the basis of our space. The equivalent vectors differ only by a scalar $\mu \neq 0$, which gives us $\mu v = \mu x e_1 + \mu y e_2 + \mu z e_3$. Let's take these coordinates and use them to give our point $P$ homogeneous coordinates.

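Definition (2D Homogeneous Point Coordinates): Let the homogeneous coordinates of the point $P \in \Sigma \subset \mathbb{P}^3$ be the equivalence class
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \sim \begin{pmatrix} \mu x \\ \mu y \\ \mu z \end{pmatrix} \qquad (\mu \neq 0). \]
These are the Cartesian coordinates of a vector $v$ which is parallel to the line $OP$, with a fixed $O \in \mathbb{R}^3$, $O \notin \Sigma$ as the center of our coordinate system.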

We have passed over the points at infinity until now, so we have to check how these fit into our model. In constructing the coordinates we assigned the vector with the help of the line $OP$. If $P$ is a point at infinity this line still exists; looking at the Euclidean part of our model, it is parallel to $\Sigma$. This still defines a direction, an equivalence class of vectors which have Cartesian coordinates. The same steps we used for regular points are also applicable to the points at infinity, so the definition we gave still stands.


Line Coordinates

A huge upgrade from Cartesian coordinates is that we have coordinates for all the lines $l \subset \Sigma$ as well. Let's take such a line together with $O$. A line and a point not on it define a plane, and we want to use this plane to define coordinates. Like point coordinates, we'd like these to come from a vector as well. A vector perpendicular to the plane will be perfect! There are a lot of these normal vectors to choose from, so we'll introduce an equivalence again.

Figure 10: The line $l$ on the plane gets its coordinates from the three dimensional vectors that are perpendicular to the $Ol$ plane

Definition (2D Homogeneous Line Coordinates): Let the homogeneous coordinates of the line $l \subset \Sigma \subset \mathbb{P}^3$ be the equivalence class
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \sim \begin{pmatrix} \mu x \\ \mu y \\ \mu z \end{pmatrix} \qquad (\mu \neq 0). \]
These are the Cartesian coordinates of a vector $v$ which is perpendicular to the plane fitted onto $l$ and $O$, the center of our coordinate system.

We will not mark whether we're using point or line coordinates; it will be clear from context.

Special Case

We have a lot of freedom in this method: where to place the plane and which point to choose as center. A common choice is placing the plane at $z = 1$ and the center at the origin of the Cartesian coordinate system (see Figure 11). This gives us the useful property that the vectors corresponding to points at infinity are all parallel to the $z = 0$ plane. Their third coordinate is always 0, which means the last homogeneous coordinate of such a point is also 0. Because of the freedom in choosing the vector, for the regular points we can choose the vector $OP$ itself, which gives them a $z$ coordinate of 1. Of note is the fact that the plane can be at any distance from the $xy$ plane; it just has to be parallel to it. If the plane is at $z = f$ we can still rescale so that the last coordinate is 1.

So all points at infinity have homogeneous coordinates of the form
\[ \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \]
and all finite ones of the form
\[ \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}. \]


Figure 11: Putting the plane at $z = 1$ and $O$ at the origin gives us easy to use homogeneous coordinates
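In this special case, moving between Cartesian and homogeneous coordinates is a one-line operation in each direction. The following sketch is only an illustration under stated assumptions: the names `to_homogeneous` and `normalize` are hypothetical, and coordinates are assumed to be integers or `Fraction` values so that the rescaling stays exact.

```python
from fractions import Fraction


def to_homogeneous(x, y):
    """Finite point (x, y) of the plane z = 1 as a homogeneous triple."""
    return (x, y, 1)


def normalize(p):
    """Representative of the class of p whose last coordinate is 1.

    Returns None for an ideal point (last coordinate 0), which has no
    such representative.
    """
    x, y, z = p
    if z == 0:
        return None
    return (Fraction(x, z), Fraction(y, z), Fraction(1))


print(normalize((4, 6, 2)))     # same point as (2, 3, 1)
print(normalize((-4, -6, -2)))  # a negative scale gives the same point too
print(normalize((2, 3, 0)))     # an ideal point
```

A point given on a plane $z = f$ is handled the same way: `normalize((x, y, f))` divides by `f` and returns a last coordinate of 1.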

Sometimes the plane we want fitted with coordinates is already given in 3D space, so we can't just place it wherever we want. However, if we have the freedom to choose the Cartesian coordinate system required for our construction, we can use that to get the same result. Let one of the basis vectors be perpendicular to the plane and the other two be at a right angle to each other, parallel to the plane. Then let's set the center anywhere not on the plane. This orthonormal coordinate system is arranged in a way that gives us the special case homogeneous coordinate system detailed above.

We'll use this trick to construct a favourable coordinate system for cameras. The plane we want fitted with coordinates is $\Sigma$, our image plane. We'll use $O$, the center of projection, as the center of our coordinate system. Now we choose the basis of the Cartesian coordinate system so that it results in the practical construction. From now on, the camera coordinate system will mean the homogeneous coordinate system fitted on the image plane, using the special case construction where the last coordinate indicates whether a point is finite.

Features of Homogeneous Coordinates

There are some relations between objects that give a very simple equation if we use homogeneous coordinates. Three of these are essential moving forward; they are the first three in the table below.

A point is on a line ($P \in l$): $P^T \cdot l = 0$

Two points defining a line ($P, Q \in l$): $P \times Q = l$

Two lines intersecting in a point ($P \in l, k$): $l \times k = P$

Three points are collinear if: $P^T \cdot (Q \times R) = 0$

Three lines intersect at one point if: $l^T \cdot (k \times j) = 0$

As a help in proving these we give the images in Figure 12. They show that the scalar and cross products of the coordinate vectors give these equations because of various right angles.


Figure 12: Right angles between the coordinate defining vectors explain the algebraic equations
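The relations in the table are easy to try out on concrete numbers. The sketch below is an illustration, not part of the thesis; it uses the special case coordinates (plane $z = 1$, center at the origin), and the sample points and lines are arbitrary choices.

```python
def dot(u, v):
    """Scalar product of two homogeneous triples."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two homogeneous triples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


P, Q, R = (1, 1, 1), (3, 2, 1), (5, 3, 1)  # three finite points
l = cross(P, Q)    # the line through P and Q
k = (1, 0, -3)     # the line x = 3
m = (-1, 2, -5)    # a line parallel to l

print(l)                      # (-1, 2, -1), i.e. -x + 2y - 1 = 0
print(dot(P, l), dot(Q, l))   # both 0: P and Q are on l
print(cross(l, k))            # (-6, -4, -2), the same point as Q = (3, 2, 1)
print(dot(P, cross(Q, R)))    # 0: P, Q and R are collinear
print(cross(l, m))            # (-8, -4, 0): parallel lines meet in an ideal point
```

The last line shows the projective plane at work: two parallel Euclidean lines still intersect, and the intersection has last coordinate 0.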

A useful trick is to use the matrix form of a cross product: a special matrix that mimics cross multiplication by a specific vector. For example, for the vector $v$:
\[ [v]_\times = \begin{pmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{pmatrix} \]
Using this notation, the equation for two points defining a line becomes $[P]_\times Q = l$. This will help us compute cross products, since working with matrices is much preferred.
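A quick way to convince ourselves that the matrix really mimics the cross product is to compare the two on an example. This is a minimal sketch with hypothetical helper names (`skew`, `matvec`, `cross`); note that the code indexes from 0, so $v_1, v_2, v_3$ correspond to `v[0]`, `v[1]`, `v[2]`.

```python
def skew(v):
    """The matrix [v]_x, so that skew(v) applied to w equals v x w."""
    return [[0, -v[2], v[1]],
            [v[2], 0, -v[0]],
            [-v[1], v[0], 0]]


def matvec(M, x):
    """Product of a 3x3 matrix (list of rows) with a vector."""
    return tuple(sum(m * xi for m, xi in zip(row, x)) for row in M)


def cross(u, v):
    """Ordinary cross product, used here only for comparison."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


P, Q = (1, 1, 1), (3, 2, 1)
print(matvec(skew(P), Q))  # the line through P and Q
print(cross(P, Q))         # the same vector
```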

Cross Ratio Revisited

First we have to look at cross ratios involving ideal points. The definition we gave in the Central Projection section still holds in a projective space. If we have four collinear points, 0, 1 or 4 of these can be ideal. If two or three of the points are at infinity, this forces our line to be at infinity, so all four points have to be ideal.

With our new knowledge of homogeneous coordinates we can see that the cross ratio definition for vectors can actually be applied to equivalence classes of vectors. Homogeneous coordinates give us a way to calculate the cross ratio, which is often used as a definition of the cross ratio as well.

Definition (Cross Ratio with Homogeneous Coordinates): We have four collinear points with homogeneous coordinates $a$, $b$, $c$ and $d$. From the equivalence classes of vectors we can choose the representatives so that $c = a + b$ and $d = ka + b$. (We can do this because of their linear dependence stemming from the collinearity.) Let the cross ratio of the four points be $(ABCD) := k$.

This also gives us an easy proof for a very important theorem.

Theorem (Three Points and a Cross Ratio): Given three collinear points $A$, $B$ and $C$ and a number $k \in \mathbb{R}$, these define exactly one point $D$ for which $(ABCD) = k$.

Proof: Setting the homogeneous coordinates of the three points so that $c = a + b$, we can find the homogeneous coordinates of $D$, because they are $d = ka + b$.

□
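Both the definition and the construction in the proof can be carried out mechanically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the inputs are assumed to be distinct collinear points with integer or `Fraction` coordinates, and the rescaling "$c = a + b$" is done implicitly through the coefficients $\gamma_a, \gamma_b$ of $c$ in the basis $a, b$.

```python
from fractions import Fraction


def solve_in_basis(a, b, v):
    """Return (p, q) with v = p*a + q*b, for v in the span of a and b."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        det = a[i] * b[j] - a[j] * b[i]
        if det != 0:
            p = Fraction(v[i] * b[j] - v[j] * b[i], det)
            q = Fraction(a[i] * v[j] - a[j] * v[i], det)
            return p, q
    raise ValueError("a and b represent the same point")


def cross_ratio(a, b, c, d):
    """The k for which c = a' + b' and d ~ k*a' + b' after rescaling a, b."""
    gamma_a, gamma_b = solve_in_basis(a, b, c)
    delta_a, delta_b = solve_in_basis(a, b, d)
    return (gamma_b * delta_a) / (gamma_a * delta_b)


def fourth_point(a, b, c, k):
    """Homogeneous coordinates of the point D with (ABCD) = k."""
    gamma_a, gamma_b = solve_in_basis(a, b, c)
    # With a' = gamma_a * a and b' = gamma_b * b we have c = a' + b',
    # so the proof's formula d = k*a' + b' can be applied directly.
    return tuple(k * gamma_a * ai + gamma_b * bi for ai, bi in zip(a, b))


# Four points of the line y = 3, at x = 1, 2, 4 and 7.
A, B, C, D = (1, 3, 1), (2, 3, 1), (4, 3, 1), (7, 3, 1)
k = cross_ratio(A, B, C, D)
print(k)
print(fourth_point(A, B, C, k))  # a multiple of D
```

The point returned by `fourth_point` is some representative of the class of $D$, not necessarily the one with last coordinate 1.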


Fitting Space with Homogeneous Coordinates

For three-dimensional projective space we can use some of the same tricks we used for a plane. We take our $V \cong \mathbb{P}^3$ and place it in a $\mathbb{P}^4$ space fitted with Cartesian coordinates in its Euclidean part. Then we choose $O \in \mathbb{R}^4 \subset \mathbb{P}^4$ as the origin of our homogeneous coordinate system. Defining points works the same as for planar points. Surprisingly, giving planes coordinates can be done analogously to lines on a plane.

Figure 13: Illustrations to help visualize a 3D space embedded in a 4D one

Definition (3D Homogeneous Point Coordinates): Let the homogeneous coordinates of the point $P \in V \subset \mathbb{P}^4$ be the equivalence class
\[ \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} \sim \begin{pmatrix} \mu x \\ \mu y \\ \mu z \\ \mu t \end{pmatrix} \qquad (\mu \neq 0). \]
These are the Cartesian coordinates of a vector $v$ which is parallel to the line $OP$, with a fixed $O \in \mathbb{R}^4$, $O \notin V$ as the center of our coordinate system.

Definition (3D Homogeneous Coordinates of a Plane): Let the homogeneous coordinates of $\Sigma \subset V \subset \mathbb{P}^4$ be the equivalence class
\[ \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} \sim \begin{pmatrix} \mu x \\ \mu y \\ \mu z \\ \mu t \end{pmatrix} \qquad (\mu \neq 0). \]
These are the Cartesian coordinates of a vector $v$ which is perpendicular to the space fitted onto $\Sigma$ and $O$, the center of our coordinate system.

Fitting 3D projective lines with homogeneous coordinates is a very different process and also has many solutions. It is not in the scope of this thesis and is therefore omitted.


We can also employ our trick of fixing one of the coordinates. Let $V$ be $t = 1$ and let $O$ be at the world origin. Now the finite points have coordinates of the form
\[ \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \]
and the points at infinity have the form
\[ \begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix}. \]

Transformations

One of the most useful perks of homogeneous coordinates is that transformations are very easy to express in matrix form. With Cartesian coordinates a translation was a sum, while a rotation was done by multiplication with a rotation matrix. Now we can standardize and do both as a multiplication; we can even have a transformation matrix that does both at the same time. The table below shows these plane transformations. In higher dimensions they can be constructed similarly.

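Cartesian coordinates (left) and homogeneous coordinates (right):

Translation:
\[ \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} = \begin{pmatrix} x + t_x \\ y + t_y \end{pmatrix} \qquad\qquad \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} x + t_x \\ y + t_y \\ 1 \end{pmatrix} \]

Rotation:
\[ \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\alpha - y\sin\alpha \\ x\sin\alpha + y\cos\alpha \end{pmatrix} \qquad\qquad \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} x\cos\alpha - y\sin\alpha \\ x\sin\alpha + y\cos\alpha \\ 1 \end{pmatrix} \]

Rotation followed by translation:
\[ \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} = \begin{pmatrix} x\cos\alpha - y\sin\alpha + t_x \\ x\sin\alpha + y\cos\alpha + t_y \end{pmatrix} \qquad\qquad \begin{pmatrix} \cos\alpha & -\sin\alpha & t_x \\ \sin\alpha & \cos\alpha & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} x\cos\alpha - y\sin\alpha + t_x \\ x\sin\alpha + y\cos\alpha + t_y \\ 1 \end{pmatrix} \]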

Translations, rotations and combinations of the two are isometries, meaning they preserve distances. They also preserve orientation: clockwise stays clockwise. These are very useful, as they reflect the transformations we can perform on real life objects. Moving from one camera position to the next is done by a translation and a rotation. We are also fortunate that a rotation around an axis other than one of the coordinate axes can be put together from these, so there really is no need for us to construct all their matrices.

It is important to note that for transformations of homogeneous vectors the scaling of the matrix does not matter. If we take H ∈ Rk×k and a scalar µ ≠ 0, then µH is also a k × k matrix, and applying these to a vector v we get Hv and µHv as results. These are homogeneous coordinates, thus equivalent. This means H and µH both define the same transformation. Matrices for which multiplication by a non zero scalar doesn’t matter are called homogeneous matrices; they form equivalency classes similar to the ones seen before. Transformations on homogeneous coordinates always happen with these matrices.
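Both observations, a rotation and a translation packed into one matrix and the irrelevance of the matrix’s scale, can be checked numerically. The following is only a sketch for the plane; the function names and the sample values (a quarter turn, a shift by (1, 2), a scale factor of 5) are assumptions made for the illustration.

```python
import math

def rigid_matrix(alpha, tx, ty):
    """Homogeneous 3x3 matrix: rotate by alpha, then translate by (tx, ty)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s, tx],
            [s,  c, ty],
            [0.0, 0.0, 1.0]]

def apply(H, v):
    """Multiply a 3x3 matrix with a homogeneous 3-vector."""
    return [sum(H[i][j] * v[j] for j in range(3)) for i in range(3)]

def normalize(v):
    """Rescale a finite homogeneous point so its last coordinate is 1."""
    return [c / v[2] for c in v]

H = rigid_matrix(math.pi / 2, 1.0, 2.0)   # quarter turn, then shift by (1, 2)
p = [1.0, 0.0, 1.0]                        # the point (1, 0)
image = apply(H, p)                        # approximately (1, 3, 1)

# Scaling the matrix scales the result, which is the same homogeneous point.
scaled_H = [[5.0 * h for h in row] for row in H]
same = normalize(apply(scaled_H, p))
```

After normalization the results of H and 5H agree, which is exactly the statement that both matrices define the same transformation.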


Projective Transformation

Figure 14: A projective transformation performed on the image of the Florence Cathedral can straighten the previously discussed non-parallel lines

During the transformations mentioned in the previous section the last homogeneous coordinate stays 1, so finite points do not disappear into infinity. Nor do any additional finite points appear: the points at infinity stay at infinity. Such a transformation is called an affine transformation. This raises the topic of transformations where ideal points do come into play. We’ll be working with a class of transformations where only one very important feature is retained:

All transformations of a projective space into itself, where lines transfer into lines, are projective transformations.

Definition (Projective Transformation): An invertible mapping h : Pk → Pk, where three points P, Q, R ∈ Pk are collinear if and only if h(P), h(Q) and h(R) are collinear.

Cross Ratio’s Usefulness

Projective transformations have some qualities similar to central projections. Distances, shapes and ratios are lost, while intersections remain and lines stay lines. The ratio of ratios, the cross ratio, is invariant under a projective transformation. We will use this in the next theorem without proof.

To define a projective transformation we fortunately don’t have to indicate where all of the points in our projective space go:

Theorem (Generic Points Define a Projective Transformation): Let us take k + 2 points A, B, ... ∈ Pk in general position (no k + 1 of them in the same k − 1 dimensional hyperplane) and k + 2 more points A′, B′, ..., also in general position. There exists exactly one projective transformation that transfers every point into its intended image: h(A) = A′, h(B) = B′, ...

Proof: We will prove the theorem for k = 2, for points A, B, C, D and A′, B′, C′, D′. These four points with their images place constraints on where the images of other points can be. If the images of all other points are determined in this way, then these four points are enough to define the transformation and we are done. There are a few image points that we can find instantly. Line incidence stays the same, so the image of M = AC ∩ BD must be M′ = A′C′ ∩ B′D′. The other intersections like this can be obtained similarly, but we’ll only use this one for now.

Taking a closer look at the line AC we see that there are now three points on it with known images. We know that a projective transformation retains the cross ratio. We also know that three collinear points and a cross ratio define the fourth point. Any point Q on AC has a cross ratio (ACMQ) with these points, so we can easily find its image: we know where A′, C′ and M′ are and we know that (A′C′M′Q′) = (ACMQ), and this defines Q′. With the homogeneous coordinate definition of the cross ratio we can construct the homogeneous coordinates of Q′. Thus the images of all the points on the projective line AC are settled. By the same logic every point on the lines defined by the four original points now has a single point that can be its image.

Figure 15: The images of the points M and Q are determined uniquely

Now we can construct the image of any point $P$ on our plane. First we take two lines running through $P$ that intersect $AC$ and $BD$ in four distinct points $M_1, M_2, M_3, M_4$. We know the images of all four of these, because they all lie on the special lines where we have established all image points. We also know $P'$ must be at the intersection of the lines $M'_1 M'_2$ and $M'_3 M'_4$, because intersections are invariant under projective transformations. With these few steps we can construct the image of any point on our plane.

Figure 16: The intersection points are known, thus P ′ is uniquely determined

It may seem that this proof relies on the specific arrangement of these points, but it does not. We don’t use the order of the points on the lines, and all the points we found can be defined by intersections only. Because we are in projective space it is also not a problem if we stumble upon parallel lines, because these have intersections too. We only use incidence and cross ratios, and we are well within our rights to do so.

□

Matrix Form of Projective Transformations

Of course we want to use these transformations in matrix form, and this shows us the ingenuity of homogeneous coordinates. All projective transformations have a matrix form and all non singular matrices generate a projective transformation.

Theorem (Matrix form of Projective Transformations): A mapping h : Pk → Pk is a projective transformation if and only if there exists a non singular matrix $H \in \mathbb{R}^{(k+1)\times(k+1)}$ for which h(x) = Hx holds for the homogeneous vectors x of all the points.

Before we prove anything, let’s take a step back for a second. We got the matrices from what a projective transformation does to our homogeneous coordinates. We can shift perspective and look at the effect of a matrix on the vectors that gave us the coordinates. A matrix’s effect on vectors is a linear transformation, only here everything is homogeneous, so its scaling doesn’t matter. We’re looking for homogeneous matrices anyway, so this model suits us just fine. First we construct our homogeneous coordinates for P, then subject our vector space to a linear transformation, after which we construct the coordinates anew for the other projective space. This induces a projective transformation from the first to the second space. In the case of two planes we call this a homography. We can summarise this in a theorem.

Figure 17: A linear transformation of the vectors induces a homography between the two planes

Theorem (Linear Transformation Matrices in Projective Geometry): An invertible linear transformation L : V → V of a k + 1 dimensional vector space induces a projective transformation $[L]_{1,2} : P_1 \to P_2$ between two projective spaces $P_1, P_2 \subset \mathbb{P}^{k+1}$ of dimension k.

Proof: The method described above does induce a transformation, we just have to prove it is a projective one. For that it has to be invertible, which it is, and lines have to transform into lines. It is enough to prove that lines from $P_1$ stay lines, because applying the same argument to $[L^{-1}]_{2,1} : P_2 \to P_1$ proves that lines only come from lines.

We’ll prove this theorem for k = 2, so for a linear transformation of the vector space R3 and a homography between two planes Σ1, Σ2 ⊂ P3. Let’s look at three homogeneous vectors that define three points P, Q, R on a line l. Because of the points’ collinearity we know all three are perpendicular to the same line-defining vector l. In three dimensions this results in the three vectors being parallel to a plane. This is a property that is retained by a linear transformation.

Lemma (Linear Transformation of Degenerate Case Vectors): Three vectors $P, Q, R \in \mathbb{R}^{k+1}$, $P, Q, R \neq 0$, parallel to a plane maintain this property when subjected to a linear transformation $L : \mathbb{R}^{k+1} \to \mathbb{R}^{k+1}$.

Figure 18: The linear transformation of three coplanar vectors gives three coplanar ones

Proof: We know that three vectors are parallel to a plane if and only if they are linearly dependent, meaning there exist three scalars α, β, γ, not all zero, such that αP + βQ + γR = 0. We want to see that the vectors L(P), L(Q), L(R) are also parallel to a (different) plane, so our goal is to find three scalars a, b, c, not all zero, that give aL(P) + bL(Q) + cL(R) = 0.

As L is a linear transformation, the left side can be rewritten as L(aP + bQ + cR). We have three constants at the ready, a := α, b := β, c := γ, and using these L(aP + bQ + cR) = L(αP + βQ + γR) = L(0) = 0.

□

This means there exists a vector to which all three transformed vectors are perpendicular. In our new coordinate frame this vector can be taken as a homogeneous line coordinate. This gives us a line on the plane Σ2, on which all three image points lie. Thus we have proven that collinearity is maintained by a transformation induced by a linear transformation, making it a projective transformation.

□

Proof (Matrix Form Theorem): We proved the theorem on linear transformation matrices in projective geometry for k = 2. In this thesis we won’t need the higher dimensional cases of the matrix form theorem either. That’s why the theorem on linear transformation matrices is adequate as one half of the proof: with it we proved that every invertible matrix results in a projective transformation.

We’ll show the other direction for a projective transformation of a plane. Four points A, B, C, D in general position and four intended image points A′, B′, C′, D′ define a projective transformation. Any three vectors a, b, c belonging to the homogeneous equivalence classes of A, B, C respectively form a basis of R3, because the points are in general position. We get a second basis from the image points’ coordinate vectors a′, b′, c′. A transfer between the two bases defines a linear transformation L. But what happens to the point D? We have a lot of freedom in our choice of the length of the defining vectors. Since a, b, c form a basis of R3 we can write d as d = αa + βb + γc, where none of the coefficients is zero, because no three of the four points are collinear. Replacing a, b, c by αa, βb, γc, which represent the same points, we get d = a + b + c. We can similarly attain d′ = a′ + b′ + c′. As L is a linear transformation, L(d) = L(a + b + c) = L(a) + L(b) + L(c) = a′ + b′ + c′ = d′. Thus D gets transferred to the right spot as well. From the theorem Generic Points Define a Projective Transformation we know this is enough to force all the other points to go to the right places as well. Now we have a linear transformation that induces the exact projective transformation we want. Because the linear transformation has a matrix, this gives us a method to derive the matrix of a specific projective transformation.

Figure 19: a, b, c form a basis of R3 and d is their sum

For higher dimensions this works in exactly the same way. We have k + 2 points in Pk, and the first k + 1 of them give us a basis of our vector space Rk+1. With our enormous freedom in vector lengths we can choose the basis vectors so that their sum is the last vector. Because the transformation between the bases is linear this property is preserved, so in the end all the points go to their allocated destinations. This proves every projective transformation can be written as a matrix that transforms homogeneous coordinates.
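The construction in this proof doubles as a recipe for computing the matrix of a plane homography from four point pairs. The sketch below follows it step by step with exact rational arithmetic; the function names and the two sample quadrilaterals are assumptions for illustration, and no three of the four points may be collinear.

```python
from fractions import Fraction

def inverse3(m):
    """Inverse of a 3x3 matrix via cofactors (cyclic index form)."""
    cof = [[m[(r + 1) % 3][(c + 1) % 3] * m[(r + 2) % 3][(c + 2) % 3]
            - m[(r + 1) % 3][(c + 2) % 3] * m[(r + 2) % 3][(c + 1) % 3]
            for c in range(3)] for r in range(3)]
    det = Fraction(sum(m[0][c] * cof[0][c] for c in range(3)))
    return [[cof[c][r] / det for c in range(3)] for r in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def scaled_basis(a, b, c, d):
    """Matrix with columns alpha*a, beta*b, gamma*c, scaled so that they sum to d."""
    cols = [[a[i], b[i], c[i]] for i in range(3)]        # the matrix [a | b | c]
    coeffs = matmul(inverse3(cols), [[x] for x in d])    # d = alpha*a + beta*b + gamma*c
    return [[cols[i][j] * coeffs[j][0] for j in range(3)] for i in range(3)]

def homography(src, dst):
    """H with H*src[i] ~ dst[i]: the change of basis between the two scaled bases."""
    return matmul(scaled_basis(*dst), inverse3(scaled_basis(*src)))

def apply(H, p):
    return [sum(H[i][j] * p[j] for j in range(3)) for i in range(3)]

def same_point(u, v):
    """Equality up to scale, tested with 2x2 minors."""
    return all(u[i] * v[j] == u[j] * v[i] for i in range(3) for j in range(i + 1, 3))

src = [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]   # unit square
dst = [(0, 0, 1), (2, 0, 1), (3, 2, 1), (0, 1, 1)]   # an arbitrary quadrilateral
H = homography(src, dst)
```

The images H·src[i] only agree with dst[i] up to a scalar, which is why the comparison is done with `same_point` rather than with plain equality.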

□

It can be easily proven with algebra (see Szeghy 2013 [2]) that not only does a matrix H exist, but that it is unique up to a non zero scalar factor. This theorem can be used to give an alternative definition of projective transformations; the two definitions will be used interchangeably from here on out.

Definition (Matrix Form of Projective Transformations): A projective transformation is defined by a non singular matrix $H \in \mathbb{R}^{(k+1)\times(k+1)}$, which transforms a point x ∈ Pk by multiplying its homogeneous coordinate vector: Hx.

Projective Transformation’s Effects on Lines

Part of the usefulness of homogeneous coordinates is that lines have coordinates too. What happens to these when subjected to a projective transformation? Let’s take three points $x_1, x_2, x_3$ and the line $l$ they all lie on. Let’s look at the first point on $l$: $l^T x_1 = 0$. We can add an identity matrix in the middle, $l^T I x_1 = 0$, and $H$ is invertible, so $l^T H^{-1} H x_1 = 0$. This gives us a vector that is perpendicular to $H x_1$. The way we found it is applicable to the other two points as well, so the row vector $l^T H^{-1}$, or as a column vector $H^{-T} l$, is perpendicular to all three transformed points. This means it is in the equivalency class of vectors that give the homogeneous coordinates of the line running through the three new points. This shows that homogeneous line coordinates are transformed by $H^{-T}$ under a projective transformation.
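A small exact computation can confirm the rule for lines. This is a sketch only: the matrix H and the three points on the line y = x are made-up sample values, and the helper names are hypothetical.

```python
from fractions import Fraction

def inverse3(m):
    """Inverse of a 3x3 matrix via cofactors (cyclic index form)."""
    cof = [[m[(r + 1) % 3][(c + 1) % 3] * m[(r + 2) % 3][(c + 2) % 3]
            - m[(r + 1) % 3][(c + 2) % 3] * m[(r + 2) % 3][(c + 1) % 3]
            for c in range(3)] for r in range(3)]
    det = Fraction(sum(m[0][c] * cof[0][c] for c in range(3)))
    return [[cof[c][r] / det for c in range(3)] for r in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

H = [[1, 0, 1],
     [0, 2, 2],
     [1, 0, 2]]                                   # a non singular sample matrix
points = [(0, 0, 1), (1, 1, 1), (2, 2, 1)]        # three points on the line y = x
l = cross(points[0], points[1])                   # line through the first two points
images = [apply(H, x) for x in points]            # points transform with H
l_new = apply(transpose(inverse3(H)), l)          # the line transforms with H^{-T}
```

The transformed line is perpendicular to every transformed point, and it agrees up to scale with the line computed directly from two of the image points.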

An Example

A projective transformation we’re going to use extensively is a central projection between two planes. The central projection we introduced in the first section is not a projective transformation. This is shown quickly with a line going through the center: its image is a point, not a line. We can however adapt it to give us a very useful transformation, a central projection between two planes:

Figure 20: Plane to plane central projection

Definition (Central Projection Plane to Plane): Let Σ ≠ Π ⊂ P3 be two projective planes and O ∈ P3 the center (O ∉ Σ, Π). For any point P ∈ Π its image is the point P′ := Σ ∩ OP, the intersection of the line OP and the image plane.

It is basically the same as our previous central projection with a restriction on the points we transform: they can only come from a given plane.

Theorem: A central projection between two planes is a projective transformation.

Figure 21: Three collinear points remain collinear after being projected from Π to Σ

Proof: To prove this we have to show that all lines turn into lines and all lines originate from lines. If we switch Σ and Π in the definition we have a central projection from Σ to Π. This shows us two things: h is indeed invertible, and we need only prove that lines turn into lines, as the other direction follows from the role reversal. We have to show that x1, x2, x3 ∈ l ⊂ Π result in collinear image points. The projection rays Ox1, Ox2, Ox3 all lie in the plane defined by O and l. Thus the image points also lie in this plane. They are also on the image plane, so they are found at the intersection of Σ and the plane Ol. Two planes intersect in a line, so h(x1), h(x2), h(x3) being there means they’re collinear. With the role reversal trick explained above this is enough to prove that this projection is indeed a projective transformation.

□

We have to briefly look at the homogeneous coordinates involved in this transformation. For the coordinate system the origin will be at the center of projection and we’ll use the special case construction. This results in a very simple transformation matrix, namely the identity matrix. The defining vectors v of the coordinates don’t necessarily coincide, as they might be scaled differently. But these result in the same coordinates because of the equivalency class we defined.

Camera Matrices

Cameras do not perform projective transformations, but we can hope to construct a similar matrix representation for the central projection in this case. There are a lot of components which affect this function. We can categorize these into two main groups: intrinsic and extrinsic factors. Intrinsic factors are the parts that only depend on the internal calibration of a camera. Extrinsic factors consist of the coordinate frame of our camera relative to the world frame.

Camera Extrinsics

We need to find the camera’s position relative to the origin and the direction it’s pointed in. This accounts for all the positions the camera can be in. We can obtain all of these with a rotation and a translation starting from the world frame. The matrix $R \in \mathbb{R}^{3\times 3}$ denotes a classic three dimensional rotation matrix and $t \in \mathbb{R}^3$ is the translation vector. These form a 4 × 4 transformation matrix for three dimensional homogeneous coordinates:

\[ \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \]

Figure 22: First the world frame is rotated to align with the camera coordinate system, then a translation brings it into position

We will need the world frame from the camera’s point of view, thus we need the inverse of the above matrix. Thankfully rotation matrices are orthogonal, so their inverse is equal to their transpose, which is easy to calculate:

\[ \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}^{-1} = \begin{bmatrix} R^T & -R^T t \\ 0^T & 1 \end{bmatrix} \]
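That this block matrix really is the inverse can be verified on a concrete case. The sketch below uses a hypothetical camera pose, a quarter turn around the z axis and the translation (1, 2, 3), chosen so that all entries stay integers.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rigid(R, t):
    """4x4 homogeneous matrix [R t; 0^T 1]."""
    return [R[i] + [t[i]] for i in range(3)] + [[0, 0, 0, 1]]

def rigid_inverse(R, t):
    """4x4 homogeneous matrix [R^T  -R^T t; 0^T 1]."""
    Rt = transpose(R)
    minus_Rt_t = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return rigid(Rt, minus_Rt_t)

R = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]          # rotation by 90 degrees around the z axis
t = [1, 2, 3]

M = rigid(R, t)
M_inv = rigid_inverse(R, t)
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
```

Multiplying M and M_inv in either order gives the identity, without ever computing a general 4 × 4 inverse.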

Camera Intrinsics, the Calibration Matrix

The definition we used for central projection is characterized by the center of projection and the projection plane. We’ll calculate the matrix of the projection in the case of the center being at the origin and the plane being at z = f, where f is the focal length. These are given in the world coordinate system, while our plane coordinates come from the special camera coordinate system we designed earlier. This way we’ll get the intrinsic factors in a matrix known as the calibration matrix.

We want our 3D point $(X, Y, Z, T)^T$ projected onto a plane, where it will have the homogeneous coordinates $(x, y, z)^T$. If we only look at the x or y coordinate, our central projection rescales it by $\frac{f}{Z}$ (see Figure 23). In other words our new plane coordinates should be $x = \frac{f}{Z} \cdot X$ and $y = \frac{f}{Z} \cdot Y$. In the matrix below X and Y are multiplied by f and the third coordinate stays itself.

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ T \end{pmatrix} \]
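As a quick sanity check of this matrix, here is a worked example with made-up numbers. The focal length f = 2 and the point (3, 6, 4, 1) are assumptions chosen only for illustration:

\[ \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} 3 \\ 6 \\ 4 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 12 \\ 4 \end{pmatrix} \sim \begin{pmatrix} 3/2 \\ 3 \\ 1 \end{pmatrix} \]

Dividing by the last coordinate gives $x = \frac{f}{Z} \cdot X = \frac{2}{4} \cdot 3 = \frac{3}{2}$ and $y = \frac{f}{Z} \cdot Y = \frac{2}{4} \cdot 6 = 3$, in agreement with the rescaling we asked for.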

Figure 23: Point P on the xz plane shows how the x coordinate gets rescaled by $\frac{f}{Z}$, this can be seen on the dotted segments, $P' = \begin{bmatrix} \frac{f}{Z} X & 0 & f & 1 \end{bmatrix}^T$

This matrix does exactly what we want: because we’re using homogeneous coordinates, the scaling by Z happens automatically. The origin of the Cartesian coordinate system of the image plane isn’t necessarily at $(0, 0, 1)^T$. This means the x and y coordinates of a point aren’t the same in the homogeneous coordinate form and in the Cartesian one. It would be handy if they were, so we’ll correct for it.

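\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} f & 0 & t_x & 0 \\ 0 & f & t_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ T \end{pmatrix} \]

Figure 24: The translation by $(t_x, t_y)^T$ corrects the misalignment of the origins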

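We will be working with the matrix above, but we’ll rearrange it for convenience’s sake in the steps below. (I is the identity matrix, 0 is a zero vector and these are glued together to form a matrix in $\mathbb{R}^{3\times 4}$.)

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{bmatrix} f & 0 & t_x \\ 0 & f & t_y \\ 0 & 0 & 1 \end{bmatrix} [I|0] \begin{pmatrix} X \\ Y \\ Z \\ T \end{pmatrix} = K[I|0] \begin{pmatrix} X \\ Y \\ Z \\ T \end{pmatrix} \]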

The matrix $K \in \mathbb{R}^{3\times 3}$ we just constructed is our calibration matrix, which represents the intrinsic factors. It is important to note that this is a purely mathematical model. At some point we have to draw a line at what factors we take into consideration. There are other influencing elements here: the width and height of the pixel coordinates don’t necessarily coincide with the plane’s homogeneous coordinates, and the lens may also cause distortion. We will not discuss these in depth in this thesis; for detailed solutions see Multiple View Geometry [1].

At the beginning of this thesis we decided to put all the useful points on the “wrong” side of the image plane. We don’t use the information of which side the points come from, but now that we have seen how cameras work we can show how to flip everything to match a real eye or camera. It would simply be a matter of putting the image plane at z = −f instead of z = f. This amounts to a multiplication by −1, which shows that it really is easy to correct, so we were justified in choosing the “back” side for convenience.

Tying the Two Together

We now know our camera inside and out, we just have to combine the two components. Our calibration matrix works on a normalized camera, where the z axis is the optical axis and the projection plane is at z = f. The formula below is read from right to left: first we account for the camera’s position with an isometry (the rotation matrix R and the translation t combined in the matrix below). Then we use our formula for the intrinsic factors.

\[ K[I|0] \begin{bmatrix} R^T & -R^T t \\ 0^T & 1 \end{bmatrix} \]

We can perform the second multiplication and then factor out the rotation matrix. Now our formula achieves its final form:

\[ K[R^T | -R^T t] = K R^T [I | -t] = P \]

We now have a matrix that simulates the image making process of a camera in any position. We have reached our goal: this is the camera matrix and it works as follows:

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = P \begin{pmatrix} X \\ Y \\ Z \\ T \end{pmatrix} \]

Here we have a point with 3D world coordinates and we can calculate its image coordinates on the projective image plane using the camera matrix P.
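To see the camera matrix at work we can assemble P = K Rᵀ [I | −t] for a concrete camera and project a point. Everything numeric below (f = 2, the offsets 3 and 1, the quarter turn R, the center t and the world point) is a made-up example, and the function names are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def camera_matrix(K, R, t):
    """P = K R^T [I | -t] for a camera rotated by R with its center at t."""
    I_minus_t = [[1, 0, 0, -t[0]],
                 [0, 1, 0, -t[1]],
                 [0, 0, 1, -t[2]]]
    return matmul(matmul(K, transpose(R)), I_minus_t)

def project(P, X):
    """Homogeneous image coordinates of the homogeneous world point X."""
    return [sum(P[i][j] * X[j] for j in range(4)) for i in range(3)]

K = [[2, 0, 3],
     [0, 2, 1],
     [0, 0, 1]]           # f = 2, origin offset (3, 1)
R = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]          # quarter turn around the optical (z) axis
t = [0, 0, -5]            # camera center in world coordinates

P = camera_matrix(K, R, t)
x = project(P, [1, 2, 5, 1])      # (34, 8, 10), i.e. the image point (3.4, 0.8)
center = project(P, t + [1])      # the camera center itself has no image
```

The center is mapped to the zero vector, which is not a valid homogeneous coordinate. This reflects the earlier remark that a camera does not perform a projective transformation of the whole space.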

Fundaments of the Fundamental Matrix

Let’s take a ball as an example. If we have one image of our ball we get a rough idea of where it is in space. Unfortunately a huge ball far away and a small ball close to our camera appear equivalent. If we have another image of our ball, but this time from a different, known angle, we can figure out its position in space. Using two cameras in different positions gives us more information and is a basic need for depth perception. From now on we will mostly use two cameras and will explore a bit of the vast world of two view geometry.

Triangulation

For now we assume we know the two different camera positions, and let us also assume our ball is a point. A camera is essentially a center and a plane, so now we have two centers O1, O2 and two image planes. We also have two dots on the two image planes (x1 ∈ Σ1, x2 ∈ Σ2), two images of the same 3D point, made with the two cameras respectively. Our objective right now is to find X, the original space point. The method to do this is called triangulation; we will briefly touch on some versions of it.

We know that x1 is the result of a projection, it is exactly at Σ1 ∩ O1X. This also means that the line O1x1 contains X somewhere. Reverse projecting x2 results in O2x2, also containing X. Thus X must be at the intersection of these two rays.

Figure 25: If the two projection lines don’t intersect we need triangulation to find an acceptable X

Real pictures are rarely mathematically precise, so the two rays usually won’t meet. One way to find an acceptable approximation of our point is to take the point in space which is closest to the two rays. Expanding on this further leads to a regular least squares problem, which is favorable for our computations.
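One simple reading of “the point closest to the two rays” is the midpoint of the shortest segment between them. The sketch below implements this midpoint variant, which is an assumption on our part and not necessarily the method of the cited literature. It treats the rays as full lines O + s·d, and the name `closest_midpoint` and the sample rays are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def closest_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between the lines o1 + s*d1 and o2 + u*d2."""
    w = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = Fraction(a * c - b * b)
    if denom == 0:
        raise ValueError("the two lines are parallel")
    # Setting both partial derivatives of |w + s*d1 - u*d2|^2 to zero gives:
    s = (b * e - c * d) / denom
    u = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]   # closest point on the first line
    p2 = [o + u * k for o, k in zip(o2, d2)]   # closest point on the second line
    return [(x + y) / 2 for x, y in zip(p1, p2)]

# Two skew lines: the x axis and a vertical line through (0, 1, 1).
skew_case = closest_midpoint((0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1))
# Two lines that really do meet, at (1, 1, 1).
exact_case = closest_midpoint((0, 0, 0), (1, 1, 1), (2, 0, 0), (-1, 1, 1))
```

When the rays do intersect, the midpoint coincides with the intersection, so the approximation agrees with the ideal construction in the noise free case.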

A statistically more precise method of triangulation uses our knowledge of the relation between X and its image points. We have a set of equations to obtain our image points’ homogeneous coordinates. With the 3D point’s coordinates as variables we can minimize the residual that comes from the inaccuracy of our cameras. For a more detailed, programmable description see Szeliski 2010 [4].

Epipolar Geometry

We now return to our mathematically perfect world and pretend for a while that our cameras are perfect and our measurements are precise. Epipolar geometry is the mathematics of the relation between the two images, regardless of the captured scene’s characteristics. To investigate this, we’ll look closer at the relation of x1 and x2, the two images of the same space point.

Figure 26: The main characters in epipolar geometry

We’re looking at interrelations of objects on the two image planes, so from now on we’ll avoid using X. An important step in the image making process is projecting along the ray defined by O1 and X. Thankfully x1 is collinear with these, so the ray O1X we used in the central projection is also the ray O1x1. Taking a picture of this ray with our second camera results in a line as its image, which is known as the epipolar line l2. We have perfect cameras now, so X, the ray of the first projection and l2 are all on the same plane π. Taking a picture of X only involves components on this plane, the epipolar plane. This is quite fortunate: if for example we were searching for x2, we’d only have to look on l2. π also contains the line running between O1 and O2, called the axis or baseline, and with it the intersections of this line with the image planes. These are called the epipoles e1 and e2, and each is the image of the other camera’s center. Epipolar geometry is basically the geometry dictated by the pencil of these planes around the axis.

Fundamental Matrix

Now that we’re acquainted with the basic idea of epipolar geometry we will look for its algebraic representation. We saw that every image point has a corresponding epipolar line on the other plane. This means there is a function that turns the points on one image into epipolar lines on the other. Our objective in this section is to find the matrix that represents this transfer, called the fundamental matrix.

We’ll be looking at homogeneous coordinates not in the world frame, but each in its own camera’s frame of reference. For example $x_1$ will denote the image point’s homogeneous coordinates formed with $\Sigma_1$ as the plane and $O_1$ as the center. The function we’re looking at transforms a point $x_1 = (x_1, y_1, z_1)^T$ into a line $l_2 = (x_2, y_2, z_2)^T$, so $F$, its matrix representation, must be a $3 \times 3$ matrix.

Definition (Fundamental Matrix): $F \in \mathbb{R}^{3\times 3}$, where $F x_1 = l_2$ for every point $x_1 \in \Sigma_1$ and its corresponding epipolar line $l_2$.

This does not guarantee that F exists, but thankfully we can construct it. If we can do this for any two cameras it proves its existence, so let’s see a way to find F.

Geometric Construction

Two pictures of the same scene can be connected by a homography. We will be using one of these, the transfer by a plane $H_\pi : \Sigma_1 \to \Sigma_2$. We take a plane $\pi \subset \mathbb{P}^3$ that does not coincide with either image plane and does not contain the camera centers. To get the image of $x_1$ we first project it from the center $O_1$ onto this plane, and then with a central projection from $O_2$ we transfer it onto $\Sigma_2$. This is a projective transformation, because it is two central projections between planes, one after the other. Using $H_\pi$, $x_2$ may be written in the following form:

\[ x_2 = H_\pi x_1 \]

Figure 27: Transfer by plane

We know from the previous section that the epipole $e_2$ and $x_2$ both lie on the epipolar line $l_2$. Expressed in homogeneous terms, with the matrix form of the cross product:

\[ l_2 = e_2 \times x_2 = [e_2]_\times x_2 \]

Combining these two findings gives us a fundamental matrix:

\[ l_2 = [e_2]_\times H_\pi x_1 = F x_1 \implies F = [e_2]_\times H_\pi \]

We can apply this construction to any two cameras, so this proves the existence of at least one fundamental matrix for each camera pair.

F ’s Rank

We will begin the exploration of the properties of fundamental matrices with the rank. This is the dimension of the space spanned by the columns, so it can’t be one (or zero). If it were, then all the points $x_1$ would be transferred into the same line. We know this is not the case, so $F$ can’t have a rank of 1 (or less).

It is easy to find a vector in $F$’s left nullspace: $e_2^T$. All the epipolar lines on the second image plane contain the epipole, expressed with homogeneous coordinates $e_2^T l_2 = 0$. So for all $x_1 \in \Sigma_1$ we have $e_2^T F x_1 = 0$, proving $e_2^T F = 0^T$. Because $F$’s left nullspace contains a non zero vector, $F$ cannot be of full rank 3.

This leaves us with rank 2 as the only option: all fundamental matrices have rank 2.

Symmetrically $e_1$ is in $F$’s right nullspace. We can prove this by looking at all the points that give $l_2$ as their epipolar line. When transferring $x_1$, its projection ray defined a plane together with the axis. The intersection of this plane and $\Sigma_2$ defined $l_2$. So to get the same line on the second image plane, all these points $x_1^*$ have to define the same plane. Thus all of them lie on this plane and on $\Sigma_1$, which means they lie on the line $l_1$ at the intersection of the two planes.

Projecting these points to $\Sigma_2$ again puts them all on the line $l_2$. We can express this in homogeneous coordinate terms as $F x_1^* = l_2$. Another point on $l_1$ is the epipole. If we close in on $e_1$ along $l_1$, then $F x_1^*$ closes in on $F e_1$ on the other image plane. All these coordinate vectors are $l_2$ multiplied by a scalar, and this is true however close we get to $F e_1$. Because the coordinates change continuously, this means $F e_1$ also has the form $\lambda l_2$.

Figure 28: Closing in on Fe1 shows us it must have a form of λl2

Let’s now take a different point $x'_1$. This has its own, distinct epipolar line $l'_2$, which in turn defines a new set of points $x'^*_1$ on a new line $l'_1$. Like in the last paragraph we can deduce that $F e_1 = \lambda' l'_2$.

And this holds true for all the epipolar lines $l_2^*$. This can only be true if these $\lambda'$ are all zero, because all the lines $l_2^*$ are distinct. This proves $F e_1 = 0$: the first epipole is in the right nullspace of the fundamental matrix.

Algebraic Derivation

We can construct a fundamental matrix directly from the camera matrices $P_1$ and $P_2$. We have to find the matrix that transforms points on $\Sigma_1$ into their corresponding lines on $\Sigma_2$. To do this we will use a 3D point that we know is on the projection line $O_1 X$ and that we can find with no knowledge of $X$, using only the 2D homogeneous coordinates of the image $x_1$. It is $P_1^+ x_1$, where $P_1^+ = P_1^T (P_1 P_1^T)^{-1}$. This works as an inverse for non square matrices, because $P_1 P_1^+ = I$; it is called a pseudo inverse. Thus $P_1^+ x_1$ is indeed on the right line, because its image is $P_1 P_1^+ x_1 = x_1$.

Figure 29: Finding l2 using the camera matrices

Now we project this point onto $\Sigma_2$ and get $P_2 P_1^+ x_1$. We are looking for its epipolar line, which is easy to find because it also runs through the epipole. We know $e_2 \in l_2$ and $P_2 P_1^+ x_1 \in l_2$; if we express this in homogeneous terms we are done:

\[ l_2 = e_2 \times (P_2 P_1^+ x_1) = [e_2]_\times P_2 P_1^+ x_1 = F x_1 \implies F = [e_2]_\times P_2 P_1^+ \]

This coincides with the specific case of the geometric derivation when $H_\pi = P_2 P_1^+$.
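The algebraic derivation can be followed step by step in code. This is a sketch with exact rational arithmetic; the two camera matrices and the space point are made-up examples, the first camera is assumed to have the form K[I|0] so that its center is the world origin, and all function names are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def inverse3(m):
    """Inverse of a 3x3 matrix via cofactors (cyclic index form)."""
    cof = [[m[(r + 1) % 3][(c + 1) % 3] * m[(r + 2) % 3][(c + 2) % 3]
            - m[(r + 1) % 3][(c + 2) % 3] * m[(r + 2) % 3][(c + 1) % 3]
            for c in range(3)] for r in range(3)]
    det = Fraction(sum(m[0][c] * cof[0][c] for c in range(3)))
    return [[cof[c][r] / det for c in range(3)] for r in range(3)]

def apply(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def skew(e):
    """Matrix form [e]x of the cross product with e."""
    return [[0, -e[2], e[1]],
            [e[2], 0, -e[0]],
            [-e[1], e[0], 0]]

def fundamental(P1, P2, center1):
    """F = [e2]x P2 P1^+, where e2 is the image of the first camera's center."""
    P1t = transpose(P1)
    P1_plus = matmul(P1t, inverse3(matmul(P1, P1t)))   # pseudo inverse, 4x3
    e2 = apply(P2, center1)
    return matmul(skew(e2), matmul(P2, P1_plus))

P1 = [[2, 0, 1, 0], [0, 2, 1, 0], [0, 0, 1, 0]]     # K[I|0], center at the origin
P2 = [[0, -1, 0, 1], [1, 0, 0, 2], [0, 0, 1, 3]]    # a second, rotated and shifted camera
O1 = (0, 0, 0, 1)
F = fundamental(P1, P2, O1)

X = (2, 3, 4, 1)                                    # a sample space point
x1, x2 = apply(P1, X), apply(P2, X)
l2 = apply(F, x1)                                   # epipolar line of x1
e2 = apply(P2, O1)
```

For this sample the second image point x2 lies on the computed line l2, and e2 annihilates F from the left, matching the properties derived in the text.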

Epipolar Constraint

We know $x_2 \in l_2$ and we can express this relation with homogeneous coordinates: $x_2^T l_2 = 0$. Substituting $l_2 = F x_1$ gives the epipolar constraint for the fundamental matrix, a very important discovery:

\[ x_2^T F x_1 = 0 \]

It gives us another way to define F :

Definition (Second Definition for the Fundamental Matrix): $F \in \mathbb{R}^{3\times 3}$ of rank 2 is the fundamental matrix if $x_2^T F x_1 = 0$ for every corresponding pair of image points $x_1, x_2$.

Theorem (Equivalency of the Fundamental Matrix Definitions): For a matrix $F \in \mathbb{R}^{3\times 3}$: $F x_1 = l_2$ for every point $x_1 \in \Sigma_1$ and its corresponding epipolar line $l_2$ if and only if $F$ has rank 2 and the epipolar constraint $x_2^T F x_1 = 0$ holds for all matched points.

Proof: The elements of the second definition are both properties stemming from the first one, so we have already proven this direction of the theorem.

Now for the first definition from the one with the epipolar constraint. If we take a fixed $x_1 \neq e_1$, all its corresponding points $x_2^*$ lie on the same epipolar line $l_2$. We know that $x_2^{*T} F x_1 = 0$ for all these points. This means the vector $F x_1$ is perpendicular to the coordinate vectors of all the points $x_2^*$. These vectors, stemming from $O_2$, sweep across $l_2$. Forming right angles with all of these is equivalent to being perpendicular to the plane defined by $O_2$ and $l_2$.

Figure 30: $l_2$ is perpendicular to all the $x_2^*$

This leaves us two options: $F x_1$ is a null vector or it is a normal vector of this plane. If it were a null vector, $x_1$ would be in $F$’s nullspace.

We already know $e_1$ is in the nullspace of a fundamental matrix. But we don’t know this for an $F$ defined by the epipolar constraint, we just strongly suspect this will be the case. We can prove our suspicion by taking a fixed $x_2 \in \Sigma_2$ and transferring it back onto $\Sigma_1$. This gives us the points $x_1^*$ that define the line $l_1$. We know that $x_2^T F x_1^* = 0$ for all these points. If we continuously move along $l_1$ to $e_1$ the transition can’t be abrupt, so $x_2^T F e_1 = 0$ as well. But this can be said of any $x_2 \in \Sigma_2$, thus $F e_1$ has to be 0, placing $e_1$ in the right nullspace.

Figure 31: As we near $e_1$ the value of $x_2^T F x_1^*$ has to stay 0 continuously

If $x_1 \neq e_1$ were also in the nullspace, the nullspace would be two dimensional. This would mean $F$ has a rank of 1, which is impossible because $F$ has to have a rank of 2. We’re left with the option that $F x_1$ is a normal vector of the plane $O_2 l_2$. This is the definition of the homogeneous coordinates of $l_2$. Thus we have proved for a fixed $x_1$ that $F x_1 = l_2$.

These steps can be applied to any $x_1 \in \Sigma_1$, which results in our first definition of the fundamental matrix: $F x_1 = l_2$ for all points on $\Sigma_1$ and their corresponding epipolar lines.

□

The usefulness of the definition using the epipolar equation lies in its independence from X. It gives constraints on F solely from two images, without in-depth knowledge of the scene. This will lead us to the computation of F from 8 corresponding image points, the ultimate goal in this thesis.
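The epipolar constraint is easy to check numerically. The following is only a sketch under assumptions, written in Python with NumPy: the two cameras and the scene points are made up, the helper names `skew` and `fundamental_from_cameras` are hypothetical, and F is built with the formula of the algebraic derivation, $F = [e_2]_\times P_2 P_1^+$, taking $e_2$ as the image of the first camera center.

```python
import numpy as np


def skew(v):
    """Cross product matrix [v]x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def fundamental_from_cameras(P1, P2):
    """F = [e2]x P2 P1^+, where e2 = P2 C1 is the image of the first center."""
    # The center C1 spans the nullspace of P1: the last row of V^T in the SVD.
    C1 = np.linalg.svd(P1)[2][-1]
    e2 = P2 @ C1
    return skew(e2) @ P2 @ np.linalg.pinv(P1)


rng = np.random.default_rng(0)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])  # made-up first camera [I|0]
P2 = rng.standard_normal((3, 4))               # made-up second camera
F = fundamental_from_cameras(P1, P2)

# Five made-up scene points X (as rows) and their images in both cameras.
X = np.hstack([rng.standard_normal((5, 3)), np.ones((5, 1))])
x1 = X @ P1.T
x2 = X @ P2.T

# x2^T F x1 for every pair; each entry should vanish up to rounding.
residuals = np.einsum("ij,jk,ik->i", x2, F, x1)
print(residuals)
```

Note that the check itself never touches X: once F is known, only the image points enter the constraint.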

Additional Traits

It is apparent from the formula of the algebraic derivation, $F = [e_2]_\times P_2 P_1^+$, that the order of the camera matrices matters. This is not all that surprising: the input for $F_{1,2}$ comes from $\Sigma_1$, so it follows that there exists an $F_{2,1}$ for transferring points from $\Sigma_2$ to their epipolar lines on $\Sigma_1$. What is the relation between these two fundamental matrices? This is the perfect time to use our alternative definition, using the epipolar constraint: $F_{1,2}$ is the fundamental matrix of $P_1, P_2$ if and only if $x_2^T F_{1,2} x_1 = 0$. We can similarly state $x_1^T F_{2,1} x_2 = 0$. Let us maneuver the first equation into a comparable form, let’s transpose it!

\[ 0^T = x_2^T F_{1,2} x_1 = (x_2^T F_{1,2} x_1)^T = x_1^T F_{1,2}^T x_2 \]

Now they look very alike, especially if we take a closer look at $0^T$. Right now this is not a zero vector, this is the number zero. Transposing it does nothing!

\[ x_1^T F_{1,2}^T x_2 = 0 = x_1^T F_{2,1} x_2 \]

This was the defining equation for $F_{2,1}$, which means $F_{1,2}^T$ is a good choice for $F_{2,1}$.

Using the transposing trick we can give a very simple proof for $F e_1 = 0$. We saw that the other epipole is in F’s left nullspace: $e_2^T F = 0$. Following the same logic, but with the camera order reversed, we get $e_1^T F^T = 0^T$. Transposing both sides of this equation gives $F e_1 = 0$, so $e_1$ is in F’s right nullspace.

It is now apparent that the line $l_1$ we constructed when we first proved $F e_1 = 0$ is in fact the epipolar line corresponding to $x_2$. We have matching lines on the two image planes, where every point on one line has the other line as epipolar line.

F transforms homogeneous coordinates into different but still homogeneous coordinates. This means scalars don’t matter, $\mu F$ will do the exact same thing as F. This is because the homogeneous coordinates can swallow the excess scalars: $\mu F x_1 = F (\mu x_1) = F x_1 = l_2$. This means we only ever have to look for F up to a scalar.
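Both remarks, the transposition and the freedom of scale, can be tried out numerically. This is a sketch under assumptions, not part of the derivation: the cameras are random 3 × 4 matrices, and `skew`, `fundamental` and `same_up_to_scale` are hypothetical helper names. Two matrices are compared up to scale by the angle between them when they are read as 9-vectors.

```python
import numpy as np


def skew(v):
    """Cross product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def fundamental(Pa, Pb):
    """Fundamental matrix taking points of camera Pa to epipolar lines of Pb."""
    Ca = np.linalg.svd(Pa)[2][-1]          # center of Pa (its nullspace)
    return skew(Pb @ Ca) @ Pb @ np.linalg.pinv(Pa)


def same_up_to_scale(A, B, tol=1e-8):
    """True if A = mu * B for some nonzero scalar mu (up to rounding)."""
    a = A.ravel() / np.linalg.norm(A)
    b = B.ravel() / np.linalg.norm(B)
    return bool(abs(a @ b) > 1.0 - tol)


rng = np.random.default_rng(1)
P1, P2 = rng.standard_normal((2, 3, 4))    # two made-up cameras
F12 = fundamental(P1, P2)
F21 = fundamental(P2, P1)

transpose_matches = same_up_to_scale(F21, F12.T)
print(transpose_matches)
```

The two computed matrices generally differ by a scalar factor, which is exactly the ambiguity that homogeneous coordinates swallow.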


Usefulness of F

Why Do We Like F?

If we have two cameras they define a fundamental matrix. The question arises: does the fundamental matrix define the cameras? This is sadly not the case. The theorem below easily disproves it, for it shows that there are different camera pairs that result in the same fundamental matrix.

Theorem: If one pair of cameras can be moved by a projective transformation into another pair, both pairs will define the same fundamental matrix.

Proof: We need to show that if we compute F from some property of $P_1$ and $P_2$ and, using the same computing method, get $F'$ from $P_1'$ and $P_2'$, these two matrices are actually the same.

We saw earlier that to get F, it is enough to know all the corresponding pairs of image points. We have not yet shown a concrete computational method for this, but in this proof we will use the fact that it can be done.

This method uses the homogeneous coordinates of the image points. If by some miracle the coordinates $x_1$ and $x_2$ stay matching after the transformation, the method will have to result in the same fundamental matrix in both cases. We have to look at the $x_1$ that we got from $P_1$ in the system $P_1' = P_1 H$.

\[ x_1 = P_1 X = P_1 H H^{-1} X = (P_1 H)(H^{-1} X) \]

$x_1$ is the image point of X made with camera $P_1$, but it is also the image point of $H^{-1}X$ made with $P_1 H = P_1'$. It’s the same with $x_2$:

\[ x_2 = P_2 X = P_2 H H^{-1} X = (P_2 H)(H^{-1} X) \]

So the two originally matching homogeneous coordinates $x_1 = (x_1, y_1, z_1)^T$ and $x_2 = (x_2, y_2, z_2)^T$ match after the transformation too. This is true for any originally matching pair, which is why the method, which uses only pairs such as these, must result in the same F. This proves that a projective transformation of the cameras doesn’t change the fundamental matrix.

□
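The central step of this proof, $P X = (P H)(H^{-1} X)$, can be replayed with numbers. The sketch below assumes random cameras, a random (and therefore almost surely invertible) H and random scene points; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
P1, P2 = rng.standard_normal((2, 3, 4))   # a made-up camera pair
H = rng.standard_normal((4, 4))           # a made-up projective transformation
X = rng.standard_normal((4, 6))           # six scene points as columns

# The transformed scene H^{-1} X, computed without forming the inverse.
X_moved = np.linalg.solve(H, X)

# The transformed cameras see the transformed scene exactly as before.
same_x1 = np.allclose(P1 @ X, (P1 @ H) @ X_moved)
same_x2 = np.allclose(P2 @ X, (P2 @ H) @ X_moved)
print(same_x1, same_x2)
```

Since every matching pair of image points is unchanged, any method that computes F from such pairs alone cannot tell the two camera pairs apart.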

Up to a projective ambiguity

We should be glad, this is the extent of F’s inconclusiveness. Any fundamental matrix uniquely defines a pair of camera matrices up to a projective ambiguity. To prove this we have to show that all the camera pairs attained from F belong to the same projective equivalence class. In other words:

Theorem: For any two pairs of cameras that give the same F there exists a projective transformation H where $P_1' = P_1 H$ and $P_2' = P_2 H$.

Proof: Our first step is to simplify our problem. The fundamental matrix is defined only through the relationship of $x_1$ and $l_2$, so it is determined only by intersections of lines and planes. If we move a camera pair as a whole, these clearly stay the same and result in the same F. This means we can move the bases of the first cameras to align with our world frame. This results in the simple camera matrices $P_1 = P_1' = [I|0]$ (canonical form, we’ll define it properly later).

The camera matrix used to be $K R^T [I|-t]$. We omit the calibration matrix K, because the technicalities of the pixel coordinates on the plane don’t interest us right now. To calm our conscience we can assume these are cameras with the identity matrix as their calibration matrix.

The two image planes of the first cameras in each pair are now aligned parallel to each other and to z = 0 according to the world frame. Because we are using homogeneous coordinates, a coordinate defines a line through the center, the same line for both cameras. This means the image on one plane is the same as the image on the other, only their scale is different. This makes these cameras sufficiently similar for our purposes.

What about the other two cameras? Where are those? We can get that information from their camera matrices: $P_2 = R^T[I|-t]$, $P_2' = R'^T[I|-t']$. Their camera coordinate systems don’t originate from the world origin, so there are translation vectors $-R^T t$ and $-R'^T t'$ from the world origin to the cameras’ origins. The first two cameras are at the world origin, so these translation vectors also run between $O_1$ and $O_2$, and between $O_1'$ and $O_2'$ (see Figure 32). If these were two random pairs of cameras, then these translation vectors would point in two arbitrary directions. In our case however we know these camera pairs have the same fundamental matrix. This results in these translation vectors having a special relationship.

This vector points from one camera center to the other, so it is parallel to the axis of this camera pair. If we look at it as a coordinate vector for the homogeneous coordinates of $\Sigma_2$, it corresponds to the image of the other center. We know this as the epipole $e_2$, or $e_2'$ for $\Sigma_2'$. We know the epipole is special: it spans the left nullspace of the fundamental matrix. This nullspace is one dimensional, because the rank of F is 2. Both $e_2$ and $e_2'$ being in it means they differ only by a scalar: $e_2' = k e_2$. This results in all the camera centers being collinear. Going back to our camera matrices we now have $P_2 = [R^T|e_2]$ and $P_2' = [R'^T|k e_2]$.

Figure 32: Two canonical cameras’ arrangement; in the case we’re discussing the two epipoles align

We have a formula that gives us the fundamental matrix from the camera matrices: $F = [e_2]_\times P_2 P_1^+$. Now $P_1 = [I|0]$, so $P_1^+ = [I|0]^T$ and $P_2 P_1^+ = R^T$. So using this formula now gives us $F = [e_2]_\times R^T$. Again, both these camera pairs have to result in the same F, and from there we will make a few algebraic adjustments.

\[ F = [e_2]_\times R^T = [k e_2]_\times R'^T \]

\[ 0 = [e_2]_\times R^T - k [e_2]_\times R'^T = [e_2]_\times (R^T - k R'^T) \]

The cross products of the columns of $(R^T - k R'^T)$ with $e_2$ are all zero. This means the columns are all parallel to $e_2$. So there exist three scalars $v_1, v_2, v_3$ which rescale $e_2$ into the three columns. These three form the vector v, giving us a way to express $R'^T$ with the other matrix:

\[ R^T - k R'^T = e_2 v^T \]

\[ R'^T = \frac{R^T - e_2 v^T}{k} \]

Now our camera matrices all have very specific forms: $P_1 = P_1' = [I|0]$, $P_2 = [R^T|e_2]$, and finally $P_2' = \left[ \frac{R^T - e_2 v^T}{k} \,\middle|\, k e_2 \right]$, where k is a nonzero scalar and v is a specific 3-vector. Our main goal was to find a projective transformation that transfers one pair into the other, and the transformation with the following matrix $H \in \mathbb{R}^{4 \times 4}$ does exactly this:

\[ P_1 H = [I|0] \begin{bmatrix} \frac{1}{k} I & 0 \\ -\frac{1}{k} v^T & k \end{bmatrix} = \frac{1}{k} [I|0] = P_1' \]

\[ P_2 H = [R^T|e_2] \begin{bmatrix} \frac{1}{k} I & 0 \\ -\frac{1}{k} v^T & k \end{bmatrix} = \left[ \tfrac{1}{k} (R^T - e_2 v^T) \,\middle|\, k e_2 \right] = P_2' \]

The factor $\frac{1}{k}$ in the first line is harmless, because camera matrices are homogeneous too.

This proves that any two pairs of cameras sharing a fundamental matrix have a projective transformation that moves them into each other. F defines the camera matrices up to a projective ambiguity.

□

Canonical cameras formula

We know that any F matrix of rank 2 defines a pair of cameras up to a projective ambiguity. Now we will construct one of these pairs, which is enough because all the other ones can be found by applying a projective transformation. For our convenience we will look for a pair of canonical cameras.

Definition (Canonical Form of Cameras): The canonical form of camera matrices $P_1$ and $P_2$ is the pair of camera matrices in their projective equivalence class defined by F where $P_1 = [I|0]$.

We have already worked with canonical cameras in the last proof (see Figure 32). Now we show a way to construct the projective transformation H that brings the camera matrix $P_1 \in \mathbb{R}^{3 \times 4}$ into this form. Let’s add a row to $P_1$ which is linearly independent from the rest. This way $P_1^*$ is square and of full rank, so it is invertible. Choosing H as $(P_1^*)^{-1}$ will give us exactly what we need, because $P_1 H = [I|0]$. H is a projective transformation that transforms our camera matrices into their canonical form, and we showed a construction that works every time.
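The construction of H takes only a few lines. In this sketch the camera is random, and the extra row is one possible choice, not the only one: the nullspace vector of $P_1$ is orthogonal to all three rows, hence certainly linearly independent from them.

```python
import numpy as np

rng = np.random.default_rng(3)
P1 = rng.standard_normal((3, 4))          # a made-up camera of rank 3

# One admissible fourth row: the vector spanning the nullspace of P1.
extra_row = np.linalg.svd(P1)[2][-1]
P1_star = np.vstack([P1, extra_row])      # square and invertible

H = np.linalg.inv(P1_star)
canonical = P1 @ H                        # first three rows of P1_star @ H = I
print(np.round(canonical, 12))
```

The same H then has to be applied to the second camera as well, so that the pair stays in its projective equivalence class.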

We’ll invoke three lemmas to construct the formula for the canonical cameras belonging to a fundamental matrix:

Lemma 1 (Third Definition for the Fundamental Matrix): Any nonzero matrix $F \in \mathbb{R}^{3 \times 3}$ is the fundamental matrix of $P_1$ and $P_2$ if and only if $P_2^T F P_1$ is skew symmetric ($A^T = -A$).


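Proof: We have already proven that F is the fundamental matrix if and only if the epipolar constraint is realized. From there it takes a few steps to arrive at the definition of skew symmetry. Each of the following statements is equivalent to the next.

\[ x_2^T F x_1 = 0 \quad \text{for every matched } x_1 \in \Sigma_1,\ x_2 \in \Sigma_2 \]

\[ X^T P_2^T F P_1 X = 0 \quad \text{for every } X \in \mathbb{R}^4 \]

\[ (P_2^T F P_1)^T = -P_2^T F P_1 \]

The last step holds because a quadratic form $X^T M X$ vanishes for every X exactly when the symmetric part $M + M^T$ of M is zero.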

□

Lemma 2: Let $P_1$ be $[I|0]$ and $P_2 = [SF|e_2]$, where $S \in \mathbb{R}^{3 \times 3}$ is a skew symmetric matrix, $F \in \mathbb{R}^{3 \times 3}$ and $e_2$ is the epipole, which simply means $e_2^T F = 0$. If $P_2$ is of rank 3, F will indeed be the fundamental matrix corresponding to $P_1$ and $P_2$.

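Proof: According to the first lemma, if $[SF|e_2]^T F [I|0]$ is skew symmetric we win!

\[ [SF|e_2]^T F [I|0] = \begin{bmatrix} (SF)^T F & 0 \\ e_2^T F & 0 \end{bmatrix} = \begin{bmatrix} F^T S^T F & 0 \\ 0^T & 0 \end{bmatrix} = -\begin{bmatrix} F^T S F & 0 \\ 0^T & 0 \end{bmatrix} \]

$F^T S F$ is skew symmetric, because $(F^T S F)^T = F^T S^T F = -F^T S F$. This makes the construction above skew symmetric also. This fulfills the condition of the first lemma, making F the fundamental matrix corresponding to $P_1$ and $P_2$.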

□

We’re almost done with our formula: for any matrix $F \in \mathbb{R}^{3 \times 3}$ of rank 2, if $P_2 = [SF|e_2]$ is of full rank we have constructed a canonical pair of cameras belonging to F as fundamental matrix. But what skew symmetric matrix $S \in \mathbb{R}^{3 \times 3}$ should we use? Every skew symmetric matrix S can be written as the cross product matrix of some s. So we’re actually looking for a vector $s \in \mathbb{R}^3$ where $P_2 = [[s]_\times F|e_2]$ has rank 3.

Lemma 3: If $s^T e_2 \neq 0$ then $[[s]_\times F|e_2]$ is of full rank.

Proof: Visual aid for this proof is in Figure 33. $s^T e_2 \neq 0$ means s is not perpendicular to $e_2$. As $e_2$ is the epipole, $e_2^T F = 0$, so we know all of F’s columns are perpendicular to $e_2$. This shows that s is not in F’s column space (span(F)). Let’s take a look at the column space of $[s]_\times F$: it is spanned by the cross products of s with F’s columns. Because s is not in F’s two dimensional column space, the cross products span the whole plane perpendicular to s. As $e_2$ is not at a right angle to s, it is not in this plane. Gluing $e_2$ to $[s]_\times F$ will therefore add another rank, bringing the total up to 3, which is what we desired.

□

For picking an $s \in \mathbb{R}^3$ where $s^T e_2 \neq 0$ we have a handy vector lying around. $e_2$ is not perpendicular to itself, so $e_2^T e_2 \neq 0$, which is what we needed.

Now we have constructed the canonical pair of cameras $P_1 = [I|0]$ and $P_2 = [[e_2]_\times F|e_2]$ belonging to the fundamental matrix F. We actually want to construct all of the possible canonical form camera pairs $P_1, P_2$. We’ve already created a formula for this in the proof of F determining the camera matrices up to a projective ambiguity. From $P_2 = [R^T|e_2]$ we get the general formula $\left[ \frac{R^T - e_2 v^T}{k} \,\middle|\, k e_2 \right]$. Combining this with the deliberations above gives us the ultimate formula:

\[ P_1 = [I|0], \qquad P_2 = \left[ \frac{[e_2]_\times F - e_2 v^T}{k} \,\middle|\, k e_2 \right] \quad \text{for all } v \in \mathbb{R}^3 \text{ and nonzero scalars } k. \]
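The formula translates directly into code. What follows is a sketch under assumptions: the rank 2 matrix standing in for F is made up, `skew` and `canonical_cameras` are hypothetical names, and $e_2$ is read off from the SVD of F as the left singular vector belonging to the zero singular value. Lemma 1 serves as the check.

```python
import numpy as np


def skew(v):
    """Cross product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def canonical_cameras(F, v=None, k=1.0):
    """P1 = [I|0] and P2 = [([e2]x F - e2 v^T) / k | k e2] for a rank 2 F."""
    v = np.zeros(3) if v is None else np.asarray(v, dtype=float)
    # e2 spans the left nullspace of F: the last column of U in F = U D V^T.
    e2 = np.linalg.svd(F)[0][:, -1]
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    left = (skew(e2) @ F - np.outer(e2, v)) / k
    P2 = np.hstack([left, k * e2[:, None]])
    return P1, P2


rng = np.random.default_rng(4)
# A made-up rank 2 matrix playing the role of F.
F = skew(rng.standard_normal(3)) @ rng.standard_normal((3, 3))

P1, P2 = canonical_cameras(F)
M = P2.T @ F @ P1                      # skew symmetric by Lemma 1

P1g, P2g = canonical_cameras(F, v=[0.3, -1.0, 2.0], k=2.5)
Mg = P2g.T @ F @ P1g                   # the general formula passes the same test
print(np.allclose(M, -M.T), np.allclose(Mg, -Mg.T))
```

The parameters v and k are the four degrees of freedom left over once the first camera is fixed as [I|0].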


Figure 33: These aren’t geometric objects of any relevance, they’re just a visual aid for the proof of Lemma 3

How to Get F

Until now we have built F with input derived in some way from the camera matrices. We need a method that does not use $P_1$ and $P_2$, otherwise constructing the camera matrices from F is useless. We did see that a construction from the homogeneous coordinates of matching point pairs should be possible.

Prelude

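We now have corresponding point pairs $P_1 \leftrightarrow Q_1, P_2 \leftrightarrow Q_2, \dots$, however many we need, where $P_i = (P_{i,x}, P_{i,y}, P_{i,z})^T \in \Sigma_1 \subset \mathbb{R}^3$ and $Q_i \in \Sigma_2 \subset \mathbb{R}^3$. The F we’re looking for should satisfy all of the epipolar constraints defined by these: $Q_i^T F P_i = 0$.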

Figure 34: We now have n corresponding pairs


For convenience’s sake we’ll introduce the vector $f \in \mathbb{R}^9$, in which we will store all the elements of F row by row. This turns the epipolar constraint placed by $P_1 \leftrightarrow Q_1$ into a long equation: $Q_{1,x} f_{1,1} P_{1,x} + Q_{1,x} f_{1,2} P_{1,y} + \dots + Q_{1,z} f_{3,3} P_{1,z} = 0$. Now we can factor f out as a vector, giving us a row vector times f: $(Q_{1,x} P_{1,x},\ Q_{1,x} P_{1,y},\ \dots,\ Q_{1,z} P_{1,z}) f = 0$. The input image points are most often regular points, so one might come across this equation with all the coordinates $Q_{i,z}$ and $P_{i,z}$ equal to 1:

\[ (Q_{1,x} P_{1,x},\ Q_{1,x} P_{1,y},\ Q_{1,x},\ Q_{1,y} P_{1,x},\ Q_{1,y} P_{1,y},\ Q_{1,y},\ P_{1,x},\ P_{1,y},\ 1)\, f = 0 \]

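Using all the point pairs we can write all these equations at the same time by putting them in a matrix:

\[ \begin{bmatrix} Q_{1,x} P_{1,x} & Q_{1,x} P_{1,y} & Q_{1,x} & Q_{1,y} P_{1,x} & Q_{1,y} P_{1,y} & Q_{1,y} & P_{1,x} & P_{1,y} & 1 \\ Q_{2,x} P_{2,x} & Q_{2,x} P_{2,y} & \dots & & & & & & 1 \\ \vdots & & & & & & & & \vdots \end{bmatrix} \begin{bmatrix} f_{1,1} \\ f_{1,2} \\ \vdots \\ f_{3,3} \end{bmatrix} = Af = 0 \]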

Our vector f stems from a homogeneous matrix, so we only have to determine it up to scale. This means we have a unique solution if A has a rank of 8, which means we need at least 8 points.

If we lived in a perfect world we could solve it from here with linear methods from 8 corresponding point pairs. But even if the coordinate data of the points is not precise, which happens in real life, we can still construct an acceptable solution. It is best to use a lot of points, because then the inaccuracy is smaller. But using n points with polluted data often results in an $A \in \mathbb{R}^{n \times 9}$ of full rank 9. In this case $Af = 0$ means f can only be 0, which is quite uninteresting and not what we’re looking for. To bypass this case we should place a constraint; generally $\|f\| = 1$ is used. Which norm we use for this doesn’t matter, because we only need f up to scale.

Solving an overdetermined system is best done with the least squares method. If we can’t have a solution that gives exactly 0, it makes sense to minimize $\|Af\|$ under the conventional distance definition: the Euclidean norm. We will be using the constraint $\|f\| = 1$ against the 0 case.
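Building A is a purely mechanical step. The sketch below assumes that f stores F row by row, as in the long equation above, and that the points are given as rows of homogeneous coordinates; `build_A` is a hypothetical name. It checks on random data that each entry of $Af$ really equals $Q_i^T F P_i$.

```python
import numpy as np


def build_A(P, Q):
    """One row per pair P_i <-> Q_i, so that (A @ f)[i] = Q_i^T F P_i.

    P, Q: arrays of shape (n, 3) holding homogeneous image points.
    f is F unfolded row by row: (f11, f12, f13, f21, ..., f33).
    """
    # Row i is (Qx Px, Qx Py, Qx Pz, Qy Px, ..., Qz Pz) for the i-th pair.
    return np.einsum("ia,ib->iab", Q, P).reshape(len(P), 9)


rng = np.random.default_rng(5)
P = rng.standard_normal((8, 3))        # made-up points on the first image
Q = rng.standard_normal((8, 3))        # made-up points on the second image
F = rng.standard_normal((3, 3))        # an arbitrary matrix, only for the check

A = build_A(P, Q)
lhs = A @ F.ravel()                              # A f
rhs = np.einsum("ij,jk,ik->i", Q, F, P)          # Q_i^T F P_i, pair by pair
print(np.allclose(lhs, rhs))
```

With regular points the third coordinates are 1 and the rows reduce to the form shown in the matrix above.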

Singular Values

We are looking for $\min \|Ax\|$ with the constraint $\|x\| = 1$. Let’s take a closer look at the quantity at hand, squared for convenience:

\[ \|Ax\|^2 = (Ax)^T A x = x^T A^T A x \]

$A^T A$ looks like an interesting matrix, let’s investigate. We know A is an n × 9 matrix, so $A^T A \in \mathbb{R}^{9 \times 9}$. It is also apparent that this is a symmetric matrix: $(A^T A)^T = A^T A$. Symmetric n × n matrices have n real eigenvalues (counted with multiplicity) and their eigenvectors can be chosen to form an orthogonal base. So $A^T A$ has 9 real eigenvalues, let’s index these in decreasing order $\lambda_1 \geq \dots \geq \lambda_9$. These have their corresponding normalized eigenvectors $v_1, \dots, v_9$, all perpendicular to each other. Let’s put one of them in our previous formula:

\[ \|A v_1\|^2 = v_1^T A^T A v_1 = v_1^T \lambda_1 v_1 = \lambda_1 v_1^T v_1 = \lambda_1 \|v_1\|^2 = \lambda_1 \]

Good thing we took normalized eigenvectors. The same computation works for every $v_i$, so this result shows us that the eigenvalues of $A^T A$ are all nonnegative. It also gives rise to the next definition.

Definition (Singular Value): Let $\|A v_i\| = \sqrt{\lambda_i}$ be A’s singular values, denoted by $\sigma_i$. Here $v_i$ is the normalized eigenvector of $A^T A$ corresponding to the eigenvalue $\lambda_i$.


These inherit the decreasing order of the eigenvalues, $\sigma_1 \geq \dots \geq \sigma_9 \geq 0$.

We won’t prove the next part, but it gives insight into singular values. The transformed normalized eigenvectors and the singular values show how A stretches space. For easier visualization imagine 3D, but the same train of thought works in 9 dimensions. The constraint $\|x\| = 1$ confines us to the unit sphere. The nine eigenvectors are nine perpendicular vectors that fit in this sphere. If we transform the unit sphere with A we get an ellipsoid. It can be shown that $\|Ax\|$ with the constraint $\|x\| = 1$ is maximized exactly at $v_1$. The vector $v_1$ is transformed into $A v_1$, which is at the longest part of the ellipsoid. We actually defined our singular values for this, so that gives us the value of the maximum: $\sigma_1$.

Figure 35: How A stretches 3D space at the eigenvectors of ATA [3]

We have found the ”stretch” in one direction, let’s now look at the other ones. We will only look at the vectors perpendicular to $v_1$. The transformed unit sphere is now an ellipsoid of one dimension less, eight in our case. We can do this by adding an extra constraint $x^T v_1 = 0$, assuring us of the perpendicularity limitation. Similarly to $v_1$ we can show that $\|Ax\|$ with $\|x\| = 1$ and $x^T v_1 = 0$ is maximized at $v_2$, with a value of $\sigma_2$. This shows us the second stretching point of our ellipsoid.

We can go on and on adding perpendicularity constraints, making our ellipsoid smaller and smaller, and all the while $\max \|Ax\|$ with $\|x\| = 1$ and $x^T v_1 = 0, x^T v_2 = 0, \dots, x^T v_{k-1} = 0$ is at $v_k$, with a value of $\sigma_k$. With the last eigenvector we have so many constraints in place that we actually get the minimum of $\|Ax\|$ with $\|x\| = 1$. This is very fortunate, because we wanted to minimize in the first place.

From this it is also apparent that the vectors $A v_i$ are all perpendicular to each other. This means they form a base, we’ll be using this shortly.

Singular Value Decomposition

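To find this minimum we will be using the singular value decomposition (SVD) $A = U D V^T$. Here $U \in \mathbb{R}^{n \times n}$ and $V \in \mathbb{R}^{9 \times 9}$ are orthogonal matrices, and $D \in \mathbb{R}^{n \times 9}$ is a diagonal matrix that contains A’s singular values in decreasing order. As A has a rank of 9, D looks like this (the parts left blank are all zeros).

\[ D := \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_9 \\ 0 & 0 & \dots & 0 \\ \vdots & & & \vdots \end{bmatrix} \]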

But what exactly are U and V? We have the orthonormal set of eigenvectors $v_1, \dots, v_9$ from the previous section; putting these in a matrix makes it orthogonal. This is V, we’ll be using it transposed.

\[ V := \begin{bmatrix} v_1 & v_2 & v_3 & v_4 & v_5 & v_6 & v_7 & v_8 & v_9 \end{bmatrix} \]

We actually also had a second set of perpendicular vectors: it can be shown that $A v_1, \dots, A v_9$ are an orthogonal set. Their lengths can differ from each other, these were the singular values. We want them normalized, so our second orthonormal set will be $u_1 := \frac{A v_1}{\sigma_1}, \dots, u_9 := \frac{A v_9}{\sigma_9}$. This is an orthonormal base for the vectors Ax and gives us our second matrix:

\[ U := \begin{bmatrix} u_1 & u_2 & u_3 & u_4 & u_5 & u_6 & u_7 & u_8 & u_9 \end{bmatrix} \]

Strictly speaking this is an n × 9 matrix. To obtain the square $U \in \mathbb{R}^{n \times n}$ we can extend $u_1, \dots, u_9$ to an orthonormal base of $\mathbb{R}^n$; the extra columns only meet the zero rows of D, so they do not change the product.

Sidenote: in a more general case we might not have enough $u_i$ vectors to form an entire base for Ax. If A has more columns than its rank, we fall short. In this case one can add vectors to form an orthogonal base using for example the Gram-Schmidt algorithm. In our case the whole problem arose with A being of full rank, so we don’t need this.

Now we know what they are, but we have to show this is actually what we need:

Theorem (This Is the SVD): With the matrices defined above $A = U D V^T$, so this is A’s singular value decomposition.

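Proof: It will be easier if we look at $U (D V^T)$.

\[ D V^T = \begin{bmatrix} \sigma_1 v_1^T \\ \vdots \\ \sigma_9 v_9^T \end{bmatrix} \]

The matrix U can be viewed as a sum $\begin{bmatrix} u_1 & 0 & \dots & 0 \end{bmatrix} + \dots + \begin{bmatrix} 0 & 0 & \dots & u_9 \end{bmatrix}$. With these two tricks the question becomes whether the next equation is true:

\[ A \overset{?}{=} U (D V^T) = \left( \begin{bmatrix} u_1 & 0 & \dots & 0 \end{bmatrix} + \dots + \begin{bmatrix} 0 & 0 & \dots & u_9 \end{bmatrix} \right) \begin{bmatrix} \sigma_1 v_1^T \\ \vdots \\ \sigma_9 v_9^T \end{bmatrix} = u_1 \sigma_1 v_1^T + \dots + u_9 \sigma_9 v_9^T \]

To show this is true we’ll multiply by x:

\[ A x \overset{?}{=} u_1 \sigma_1 v_1^T x + \dots + u_9 \sigma_9 v_9^T x \]

$v_i^T x$ is a dot product, so it gives a scalar. $\sigma_i$ is also a scalar, this means we can rearrange the right hand side into the following order:

\[ \sigma_1 (v_1^T x) u_1 + \dots + \sigma_9 (v_9^T x) u_9 \]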


We have a similar decomposition $Ax = a_1 u_1 + \dots + a_9 u_9$; remember $u_1, \dots, u_9$ is an orthonormal base. If only these coefficients were actually the ones in the equation above.

Lemma (Coefficients in Ax’s decomposition): If $Ax = a_1 u_1 + \dots + a_9 u_9$ and the $u_i$ are the orthonormal base for Ax made from the eigenvectors of $A^T A$, then $a_i = \sigma_i v_i^T x$.

Proof: We get $a_i$ as the coefficient belonging to $u_i$. So it is the length of the component of the vector Ax parallel to $u_i$. Algebraically this is their scalar product; we also fill in the definition of $u_i$:

\[ a_i = (Ax)^T u_i = x^T A^T u_i = x^T A^T \frac{A v_i}{\sigma_i} = \frac{1}{\sigma_i} x^T A^T A v_i \]

We defined these $v_i$ vectors as the eigenvectors of $A^T A$. Also the eigenvalues are the squares of the singular values.

\[ a_i = \frac{1}{\sigma_i} x^T \lambda_i v_i = \frac{1}{\sigma_i} x^T \sigma_i^2 v_i = \sigma_i x^T v_i \]

And because of the commutativity of the dot product we have proven our lemma.

□

\[ Ax = \sigma_1 (v_1^T x) u_1 + \dots + \sigma_9 (v_9^T x) u_9 = u_1 \sigma_1 v_1^T x + \dots + u_9 \sigma_9 v_9^T x \]

This is true for every $x \in \mathbb{R}^9$, thus:

\[ A = u_1 \sigma_1 v_1^T + \dots + u_9 \sigma_9 v_9^T = U (D V^T) \]

So the singular value decomposition made up of U, D, V does indeed give us A.

□

The train of thought explained above can be implemented if we know the eigenvalues of $A^T A$. Finding eigenvalues is a difficult question and leads us too far from our objective in this thesis.
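If we let a library find the eigenvalues for us, the whole construction fits in a few lines. The sketch below is only an illustration under assumptions: A is a random 12 × 9 matrix of full rank, and NumPy’s `eigh` routine for symmetric matrices supplies the eigenvectors of $A^T A$. It builds the n × 9 version of U, that is, only the columns $u_1, \dots, u_9$.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((12, 9))          # a made-up full rank matrix

# Eigen-decomposition of the symmetric matrix A^T A. eigh returns the
# eigenvalues in increasing order, so we reverse to get lambda_1 >= ... >= lambda_9.
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]

sigma = np.sqrt(lam)                      # singular values sigma_i
U9 = (A @ V) / sigma                      # columns u_i = A v_i / sigma_i
D9 = np.diag(sigma)

reconstruction = U9 @ D9 @ V.T            # u_1 s_1 v_1^T + ... + u_9 s_9 v_9^T
print(np.allclose(reconstruction, A))
```

In practice one would call a ready-made SVD routine instead, which is numerically more careful than going through $A^T A$.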

8-point Algorithm

Using the SVD in Our Case

Our objective was to minimize $\|Ax\|$ with the constraint $\|x\| = 1$ in place. We now know this minimum is the singular value with the biggest index, and it is attained at the corresponding column of V. In our case these are $\sigma_9$ and $v_9$, the last column of V.

We are in danger of making a circular argument, or at the very least we introduced a lot of unnecessary steps. For constructing the SVD we took the eigenvectors as a given, we could’ve been done there by using $v_9$ at once. But we got $v_9$ as a solution from the decomposition. This is because the singular value decomposition is a standard method and has a lot of different uses. It is often readily available, either from a previous project or built in to the program being used. That is why a peek into its workings is included here and we don’t jump straight to the eigenvector, which gives us the solution instantly. Constructing the SVD from scratch also shows why the last column of V is what we needed.

In our algorithm we will be using the singular value decomposition as a tool we have readily at hand. The step involving this is: let f be the last column of V from the singular value decomposition $A = U D V^T$, which minimizes $\|Ax\|$ with the constraint $\|x\| = 1$.
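With a library SVD at hand this step is a single call. A small sketch, assuming NumPy and a random matrix in place of real data: `np.linalg.svd` returns $V^T$ rather than V, so the last column of V is the last row of the returned array. The name `smallest_singular_vector` is hypothetical.

```python
import numpy as np


def smallest_singular_vector(A):
    """Unit vector minimizing ||A x||: the last column of V in A = U D V^T."""
    # NumPy hands back V^T, so we take its last row.
    Vt = np.linalg.svd(A)[2]
    return Vt[-1]


rng = np.random.default_rng(7)
A = rng.standard_normal((20, 9))          # stands in for a noisy n x 9 system
f = smallest_singular_vector(A)

minimum = np.linalg.norm(A @ f)           # equals the last singular value
print(np.linalg.norm(f), minimum)
```

The sign of f is arbitrary, which is harmless because F is only needed up to scale.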


Figure 36: Epipolar lines constructed from a matrix with full rank 3 on the left and from a proper singular F on the right [1]

Adding the Singularity Constraint

We are not done yet, because we’re missing an essential property of the fundamental matrix. The fundamental matrix F reconstructed from the f gained from the singular value decomposition will most often not have a rank of 2. A matrix of full rank used as the fundamental matrix gives rise to a myriad of problems. For example the epipoles, which spanned the right and left nullspaces of F, now no longer exist. The epipolar lines no longer intersect in a single point, throwing off any computation or geometric deliberation we made for a two view set-up.

Clearly being of rank 2 is a very important property if one were to use the fundamental matrix, so we will be forcing it onto the F we have found thus far. We will be looking for $F'$, which does have rank 2 and is also close enough to F to be an acceptable solution to the minimization we discussed in the last section. We choose close to mean close in the sense of the Frobenius matrix norm, where we take the square root of the sum of the squares of all the elements of the matrix ($\|M\| = \sqrt{\sum_{i,j} m_{i,j}^2}$). In the case of an $\mathbb{R}^{n \times 1}$ matrix, which is a vector, this is the conventional Euclidean norm.

We will be using the singular value decomposition again, let $F = U D V^T$. Now F has a full rank of three, so the SVD of this matrix is:

\[ F = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \end{bmatrix} \]

We get $F'$ simply by removing the last singular value, replacing $\sigma_3$ with 0.

\[ F' = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \\ v_3^T \end{bmatrix} \]

This is also the SVD of $F'$. This does indeed have a rank of 2, which means this transformation gives an ellipse. Flattening our ellipsoid with the least difference happens along the minimum of this ellipsoid. This rings true, but we still have to show hard proof that this is the closest singular matrix to F under the Frobenius norm.

Theorem ($F'$ Minimizes): $F'$ constructed as instructed minimizes $\|F - F'\|$ under the Frobenius norm with the constraint of the rank of $F'$ being 2.


Proof: First we’ll calculate a lower bound for $\|F - F'\|$. Next we’ll show a quick and easy way to calculate the Frobenius norm of a matrix with known singular value decomposition. Using these two we will show our theorem is true.

Lemma (Lower Bound of Frobenius Norm): For every $A \in \mathbb{R}^{n \times m}$ the Frobenius norm is at least $\|Ax\|$, where x is an m-vector with $\|x\| = 1$.

\[ \|Ax\| \leq \|A\| \]

Proof: We’ll square the norms for convenience’s sake and regard A as a matrix made up of its rows as vectors.

\[ \|A\|^2 = \left\| \begin{bmatrix} a_1^T \\ \vdots \\ a_n^T \end{bmatrix} \right\|^2 = a_1^T a_1 + \dots + a_n^T a_n = \|a_1\|^2 + \dots + \|a_n\|^2 \]

Looking at the other half of the inequality that we want to prove:

\[ \|Ax\|^2 = \left\| \begin{bmatrix} a_1^T x \\ \vdots \\ a_n^T x \end{bmatrix} \right\|^2 = (a_1^T x)^2 + \dots + (a_n^T x)^2 \]

Using the definition of the dot product, with $\alpha_i$ the angle between $a_i$ and x:

\[ \|Ax\|^2 = (\|a_1\| \|x\| \cos\alpha_1)^2 + \dots + (\|a_n\| \|x\| \cos\alpha_n)^2 = \|a_1\|^2 \|x\|^2 \cos^2\alpha_1 + \dots + \|a_n\|^2 \|x\|^2 \cos^2\alpha_n \]

What do we know of these quantities? We assumed $\|x\| = 1$, so $\|x\|^2 = 1$. We also know $\cos^2\alpha \leq 1$. Using these:

\[ \|a_1\|^2 \|x\|^2 \cos^2\alpha_1 + \dots + \|a_n\|^2 \|x\|^2 \cos^2\alpha_n \leq \|a_1\|^2 \cdot 1 \cdot 1 + \dots + \|a_n\|^2 \cdot 1 \cdot 1 \]

\[ \|Ax\|^2 \leq \|A\|^2 \]

These are nonnegative values, so taking the square root gives us exactly the inequality we wanted to prove.

□

Let us use this in our case, with any x:

\[ \|F - F'\| \geq \|(F - F')x\| = \|Fx - F'x\| \]

We’re in the fortunate position that we can choose x rather freely, as long as the constraint $\|x\| = 1$ stays true. We will now make use of the fact that $F'$ has to have a rank of 2. Because of this it has to have a nonzero vector v in its right nullspace. If we normalize this vector it still stays in the nullspace, but now the constraint is realized, so it’s a good choice for $x^* = \frac{v}{\|v\|}$. Using this special $x^*$:

\[ \|F - F'\| \geq \|Fx^* - F'x^*\| = \|Fx^*\| \geq \sigma_3 \]


We saw previously that $\sigma_3$ was the minimum for $\|Fx\|$ with the constraint $\|x\| = 1$. So the norm we want minimized has a lower bound of $\sigma_3$. Now we have to show that the matrix we want as $F'$ results in this amount.

Lemma (Frobenius Norm From SVD): A matrix $A \in \mathbb{R}^{n \times m}$ with an SVD of $U D V^T$ has the Frobenius norm $\|A\| = \|D\| = \sqrt{\sigma_1^2 + \dots + \sigma_r^2}$, where r is A’s rank.

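Proof: We’ll be using some algebraic tricks to show this. The trace of the matrix $A^T A$ is the sum of the squares of all the elements of A, which is $\|a_1\|^2 + \dots + \|a_n\|^2$ where the $a_i$ are the row vectors. We saw previously that this is exactly the norm squared.

\[ \|A\|^2 = \operatorname{tr}(A^T A) = \operatorname{tr}(V D^T U^T U D V^T) = \operatorname{tr}(V D^T D V^T) \]

U is an orthogonal matrix, thus $U^T U = I$, it disappears. We’ll now use the fact that if M is an n × m matrix and N is m × n then $\operatorname{tr}(MN) = \operatorname{tr}(NM)$.

\[ \operatorname{tr}(V D^T D V^T) = \operatorname{tr}(D V^T V D^T) \]

Now the two orthogonal matrices are again aligned, they disappear:

\[ \operatorname{tr}(D V^T V D^T) = \operatorname{tr}(D D^T) = \sigma_1^2 + \dots + \sigma_r^2 \]

Taking the square root of this equation proves our lemma.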

□

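Now we can quickly and easily calculate, using the second lemma in the last step:

\[ \|F - F'\| = \left\| U \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{bmatrix} V^T - U \begin{bmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \end{bmatrix} V^T \right\| = \left\| U \begin{bmatrix} 0 & & \\ & 0 & \\ & & \sigma_3 \end{bmatrix} V^T \right\| = \sqrt{0 + 0 + \sigma_3^2} = \sigma_3 \]

We know the minimum case of $\|F - F'\|$ has the value of $\sigma_3$ and our matrix gives just that.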

□
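The theorem can be watched at work on a small example. This is a sketch assuming NumPy, with a random 3 × 3 matrix standing in for the full rank F that comes out of the least squares step; `enforce_rank2` is a hypothetical name.

```python
import numpy as np


def enforce_rank2(F):
    """Replace the last singular value of F with 0; also return that value."""
    U, s, Vt = np.linalg.svd(F)
    F_prime = U @ np.diag([s[0], s[1], 0.0]) @ Vt
    return F_prime, s[2]


rng = np.random.default_rng(8)
F = rng.standard_normal((3, 3))           # a made-up matrix of full rank
F_prime, sigma3 = enforce_rank2(F)

# np.linalg.norm of a matrix is the Frobenius norm by default.
distance = np.linalg.norm(F - F_prime)
print(distance, sigma3)
```

The two printed numbers agree, as the proof predicts: the price of the rank constraint is exactly the smallest singular value.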

Algorithm

Now we know enough to finally write down the 8-point algorithm step by step.

0. Convert the input to the preferred form
   Unfold the matrix F into the vector f.
   Construct the matrix A from the equations gleaned from the epipolar constraints placed by the input points:

   \[ A = \begin{bmatrix} Q_{1,x} P_{1,x} & Q_{1,x} P_{1,y} & \dots & 1 \\ Q_{2,x} P_{2,x} & Q_{2,x} P_{2,y} & \dots & 1 \\ \vdots & & & \vdots \end{bmatrix} \]

1. Solve

   (a) If we have 8 points, or n points but the rank of A is still 8, solve $Ax = 0$, which gives us an f as solution.

   (b) If we have n points and rank 9, find $\min \|Ax\|$ with the constraint $\|x\| = 1$ in place. Do this by taking the last column of V from the SVD $A = U D V^T$. This is our f in this case.

2. Enforce singularity constraint
   If the rank of F reconstructed from f is not 2, calculate $F'$ which does have the correct rank. Find $\min \|F - F'\|$ with the constraint $\det F' = 0$. This is done by replacing the last singular value of F with 0 in the diagonal matrix of its SVD.

Unfortunately this algorithm is very sensitive to the nature of the input data. For example if all the points are very far from the origin the error is greatly amplified. The F computed from this data will be very imprecise. It is therefore recommended to normalize the input points before computing. The recommended normalization for the 8-point algorithm is a translation and a scaling so that the quadratic mean of the distance of the points from the origin is $\sqrt{2}$. The way to do this is explained in Multiple View Geometry [1].

This is called the normalized 8-point algorithm and is recommended over the original. This algorithm is fast, simple and readily implemented with the right tool set, and with the normalization it is also adequately precise.
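The steps above can be put together into a short program. The following is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, the translation is taken to move the centroid of the points to the origin, and the last line of `eight_point` undoes the normalization with $F = T_2^T \hat{F} T_1$, which follows from substituting the normalized points $T_1 P_i$ and $T_2 Q_i$ into the epipolar constraint. The test scene (two made-up cameras and twelve random points in front of them) is synthetic and noise free.

```python
import numpy as np


def normalize(points):
    """Move the centroid to the origin, scale the RMS distance to sqrt(2).

    points: (n, 2) regular image points. Returns (n, 3) homogeneous
    normalized points and the 3 x 3 matrix T that was applied.
    """
    centroid = points.mean(axis=0)
    rms = np.sqrt(np.mean(np.sum((points - centroid) ** 2, axis=1)))
    s = np.sqrt(2.0) / rms
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homogeneous = np.column_stack([points, np.ones(len(points))])
    return homogeneous @ T.T, T


def eight_point(p, q):
    """Estimate F with q_i^T F p_i = 0 from n >= 8 pairs of (n, 2) points."""
    P, T1 = normalize(p)
    Q, T2 = normalize(q)
    # Step 0: one row (Qx Px, Qx Py, ..., Qz Pz) per pair.
    A = np.einsum("ia,ib->iab", Q, P).reshape(len(P), 9)
    # Step 1: f is the last column of V, i.e. the last row of V^T.
    f = np.linalg.svd(A)[2][-1]
    # Step 2: enforce the singularity constraint.
    U, s, Vt = np.linalg.svd(f.reshape(3, 3))
    F_hat = U @ np.diag([s[0], s[1], 0.0]) @ Vt
    # Undo the normalization and fix the free scale.
    F = T2.T @ F_hat @ T1
    return F / np.linalg.norm(F)


# Synthetic test scene: first camera [I|0], second one rotated and shifted.
rng = np.random.default_rng(9)
X = np.column_stack([rng.uniform(-1, 1, (12, 2)),
                     rng.uniform(4, 8, 12), np.ones(12)])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, np.array([[1.0], [0.2], [0.1]])])
x1, x2 = X @ P1.T, X @ P2.T
p, q = x1[:, :2] / x1[:, 2:], x2[:, :2] / x2[:, 2:]

F = eight_point(p, q)
```

With noisy input the residuals $Q_i^T F P_i$ will no longer vanish; they are only made small in the least squares sense.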

7-point Algorithm

In the algorithm above we needed 8 points or more, but we can actually find a solution from just 7 points. Using seven points gives us a matrix $A \in \mathbb{R}^{7 \times 9}$ which will most often have a rank of 7. The solutions we got for $Af = 0$ from 8 points defined f up to scale, but from 7 points the solutions form a two dimensional subspace. Now we find the generators $f_1, f_2$ of this subspace and reconstruct them into matrices. All the solutions can be gotten, up to scale, in the form of $\alpha F_1 + (1 - \alpha) F_2$.

From these solutions we can select the right ones by using the other piece of information about F that we know. We know it has a rank of 2, so we have gained the equation $\det(\alpha F_1 + (1 - \alpha) F_2) = 0$. This is a cubic equation for $\alpha$, which we can easily solve. A cubic has either one or three real roots, so this gives us one or three candidates for F.
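A possible implementation of this idea is sketched below. It rests on several assumptions: the names are hypothetical, the generators $F_1, F_2$ are taken from the last two rows of $V^T$ in the SVD of A, and the coefficients of the cubic are recovered by interpolating the determinant at four sample values of $\alpha$ instead of expanding it by hand. The input is a made-up noise free scene.

```python
import numpy as np


def seven_point(P, Q):
    """Rank 2 candidates for F from 7 homogeneous point pairs, (7, 3) each."""
    A = np.einsum("ia,ib->iab", Q, P).reshape(7, 9)
    # The nullspace of the 7 x 9 matrix A is spanned by the last two rows of V^T.
    Vt = np.linalg.svd(A)[2]
    F1, F2 = Vt[-1].reshape(3, 3), Vt[-2].reshape(3, 3)
    # det(alpha F1 + (1 - alpha) F2) is a cubic in alpha; four samples fix it.
    samples = np.array([-1.0, 0.0, 1.0, 2.0])
    values = [np.linalg.det(a * F1 + (1 - a) * F2) for a in samples]
    roots = np.roots(np.polyfit(samples, values, 3))
    real_roots = roots[np.abs(roots.imag) < 1e-9].real
    candidates = [a * F1 + (1 - a) * F2 for a in real_roots]
    return [F / np.linalg.norm(F) for F in candidates]


rng = np.random.default_rng(11)
P2 = rng.standard_normal((3, 4))      # made-up second camera, the first is [I|0]
X = rng.standard_normal((7, 4))       # seven made-up scene points as rows
P = X[:, :3]                          # their images under [I|0]
Q = X @ P2.T                          # their images under P2

candidates = seven_point(P, Q)
print(len(candidates))
```

When three candidates come back, additional point pairs are needed to decide which one belongs to the scene.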

From Here

We have reached the end of this thesis, but there are a lot of things still left to explore. We used the book Multiple View Geometry [1], the following topics can be found in it.

We have only briefly touched on triangulation, but it has a lot of variations. There are other methods to reconstruct scene points from their images. If we know how to construct the scene from two images we can now also gain knowledge of the scene beside the fundamental matrix and the camera matrices.

Projective space is amazing, but we still like Euclidean geometry a lot. If we have the camera center at infinity we work with parallel projection rays. No extra points appear, none disappear. Between two pictures taken with cameras like this the transformation is an affine transformation: it is like a projective one, but ideal points stay ideal. The fundamental matrix for this construction is called the affine fundamental matrix. Another special case, for cameras whose calibration is known, is the essential matrix. The 8-point algorithm was originally developed to find this matrix.


Another big and very useful part of this subject is its programmability. The way we presented the 8-point algorithm can be used as a road map to actually program it. There are a lot of other things in here that also can be made into concrete computations.

One can also go further along the vein of this thesis. Two views were wonderful, so what about three, four, n? The geometry for all of these exists and is vastly useful. The fundamental matrix we used for two views has its equivalent in the trifocal tensor and multifocal tensors for more views.

What We Did

Homogeneous coordinates and projective transformations are essential tools for programming visuals in any setting. We introduced them together with projective space and discussed their properties. We used these here to capture algebraically what a camera does. First we constructed a mathematical model for this, which we then turned into the camera matrix.

Moving on to two views we introduced the basic concepts of epipolar geometry and presented the fundamental matrix. We showed F defines the camera matrices up to a projective ambiguity and also constructed a formula to get these instantly. For getting the fundamental matrix we showed a few approaches, the most important one of these is the 8-point algorithm. This computational method uses the singular value decomposition, which we briefly looked into.

All in all multiple view geometry is a vast subject with many applications. We have given a tool set to explore this and presented a peek into two views, with programmable parts and lots of geometric deliberations.

Acknowledgements

I want to thank my supervisor Dávid Szeghy for his mathematical support and indispensable help in writing my thesis.

References

[1] Richard Hartley and Andrew Zisserman, Multiple View Geometry in Computer Vision, 2003

[2] Dávid Szeghy, Alkalmazott Modul Jegyzet Geometriai Transzformációk (Hungarian), https://web.cs.elte.hu/ szeghy/files/MBSJ.pdf, 2013

[3] Reza Bagheri, Understanding Singular Value Decomposition and its Application in Data Science, https://towardsdatascience.com/understanding-singular-value-decomposition-and-its-application-in-data-science-388a54be95d, 2020

[4] Richard Szeliski, Computer Vision: Algorithms and Applications, 2010

[5] Rein van den Boomgaard, The Pinhole Camera Matrix, https://staff.fnwi.uva.nl/r.vandenboomgaard/IPCV20172018/LectureNotes/CV/PinholeCamera/PinholeCamera.html

[6] Florence dome picture from Viator, https://www.viator.com/tours/Florence/Skip-the-Line-Florence-Duomo-with-Brunelleschis-Dome-Climb/d519-3092BRUNESCHELLI#hostPhotos

[7] George Wald, Eye and Camera, 1950
