structure from motion16385/s18/lectures/lecture12.pdfstructure from motion ambiguity • if we scale...

104
Structure from motion 16-385 Computer Vision Spring 2018, Lecture 12 http://www.cs.cmu.edu/~16385/

Upload: others

Post on 11-Jul-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion

16-385 Computer VisionSpring 2018, Lecture 12http://www.cs.cmu.edu/~16385/

Page 2: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Course announcements

• Homework 3 has been posted and is due on March 9th.- Any questions about the homework?- How many of you have looked at/started/finished homework 3?

• Grades for homework 1 have been posted.

• Vote for Yannis’ office hours on Piazza.

• Talk: Pulkit Agrawal, "Computational Sensorimotor Learning," Tuesday 10:00am NSH 3305.

Page 3: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Leftover from lecture 11:

• Structured light.

New in lecture 12:

• A note on normalization.

• Two-view structure from motion.

• Ambiguities in structure from motion.

• Affine structure from motion.

• Multi-view structure from motion.

• Large-scale structure from motion.

Overview of today’s lecture

Page 4: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Slide credits

Many of these slides were adapted from:

• Kris Kitani (16-385, Spring 2017).

• Noah Snavely (Cornell University).

• Rob Fergus (New York University).

Page 5: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

A note on normalization

Page 6: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Estimating F – 8-point algorithm

• The fundamental matrix F is defined by

0Fxx'

for any pair of matches x and x’ in two images.

• Let x=(u,v,1)T and x’=(u’,v’,1)T,

333231

232221

131211

fff

fff

fff

F

each match gives a linear equation

0''''''333231232221131211 fvfuffvfvvfuvfufvufuu

Page 7: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

0

1´´´´´´

1´´´´´´

1´´´´´´

33

32

31

23

22

21

13

12

11

222222222222

111111111111

f

f

f

f

f

f

f

f

f

vuvvvvuuuvuu

vuvvvvuuuvuu

vuvvvvuuuvuu

nnnnnnnnnnnn

Problem with 8-point algorithm

~10000 ~10000 ~10000 ~10000~100 ~100 1~100 ~100

!

Orders of magnitude difference

between column of data matrix

least-squares yields poor results

Page 8: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Normalized 8-point algorithm

(0,0)

(700,500)

(700,0)

(0,500)

(1,-1)

(0,0)

(1,1)(-1,1)

(-1,-1)

1

1500

2

10700

2

normalized least squares yields good results

Transform image to ~[-1,1]x[-1,1]

Page 9: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Normalized 8-point algorithm

1. Transform input by ,

2. Call 8-point on to obtain

3.

ii Txx ˆ '

i

'

i Txx ˆ

'

ii xx ˆ,ˆ

TFTF ˆΤ'

0Fxx'

0ˆ'ˆ1

xFTTx'

Page 10: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Normalized 8-point algorithm

A = [x2(1,:)'.*x1(1,:)' x2(1,:)'.*x1(2,:)' x2(1,:)' ...

x2(2,:)'.*x1(1,:)' x2(2,:)'.*x1(2,:)' x2(2,:)' ...

x1(1,:)' x1(2,:)' ones(npts,1) ];

[U,D,V] = svd(A);

F = reshape(V(:,9),3,3)';

[U,D,V] = svd(F);

F = U*diag([D(1,1) D(2,2) 0])*V';

% Denormalise

F = T2'*F*T1;

[x1, T1] = normalise2dpts(x1);

[x2, T2] = normalise2dpts(x2);

Page 11: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Results (ground truth)

Page 12: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Results (8-point algorithm)

Page 13: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Results (normalized 8-point algorithm)

Page 14: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Two-view structure from motion

Page 15: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure(scene geometry)

Motion(camera geometry)

Measurements

Pose Estimation known estimate3D to 2D

correspondences

Triangulation estimate known2D to 2D

coorespondences

Reconstruction estimate estimate2D to 2D

coorespondences

Page 16: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion

Page 17: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Camera calibration &

triangulation• Suppose we know 3D points

• And have matches between these points and an image

• How can we compute the camera parameters?

• Suppose we have know camera parameters, each of which observes a point

• How can we compute the 3D location of that point?

Page 18: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion

• SfM solves both of these problems at once

• A kind of chicken-and-egg problem

• (but solvable)

Page 19: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Given a set of matched points

Estimate the camera matrices

Estimate the 3D point

Reconstruction(2 view structure from motion)

Page 20: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Given a set of matched points

Estimate the camera matrices

Estimate the 3D point

Reconstruction(2 view structure from motion)

‘structure’

‘motion’(of the cameras)

Page 21: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Two-view SfM

1. Compute the Fundamental Matrix F from points

correspondences

Page 22: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Two-view SfM

1. Compute the Fundamental Matrix F from points

correspondences

8-point algorithm

Page 23: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Two-view SfM

2. Compute the camera matrices P from the Fundamental

matrix

P = [ I | 0 ] and P’ = [ [ex]F | e’ ]

1. Compute the Fundamental Matrix F from points

correspondences

8-point algorithm

Page 24: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Camera matrices corresponding to the

fundamental matrix F may be chosen as

(See Hartley and Zisserman C.9 for proof)

Page 25: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Find the configuration where the points is in front of both cameras

Page 26: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Two-view SfM

2. Compute the camera matrices P from the Fundamental

matrix

P = [ I | 0 ] and P’ = [ [e’x]F | e’ ]

3. For each point correspondence, compute the point X in

3D space (triangularization)

DLT with x = P X and x’ = P’ X

1. Compute the Fundamental Matrix F from points

correspondences

8-point algorithm

Page 27: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Triangulation

image 2image 1

Find 3D object point

camera 1 with matrix camera 2 with matrix

Page 28: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Two-view SfM

2. Compute the camera matrices P from the Fundamental

matrix

P = [ I | 0 ] and P’ = [ [e’x]F | e’ ]

3. For each point correspondence, compute the point X in

3D space (triangularization)

DLT with x = P X and x’ = P’ X

1. Compute the Fundamental Matrix F from points

correspondences

8-point algorithm

Page 29: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Is SfM always uniquely

solvable?

Page 30: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Ambiguities in structure

from motion

Page 31: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Is SfM always uniquely

solvable?

• No…

Page 32: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

SfM – Failure cases

• Necker reversal

Page 33: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Projective Ambiguity

• Reconstruction is ambiguous by an arbitrary 3D

projective transformation without prior knowledge of

camera parameters

Page 34: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion

• Given: m images of n fixed 3D points

xij = Pi Xj , i = 1, … , m, j = 1, … , n

• Problem: estimate m projection matrices Pi and

n 3D points Xj from the mn correspondences xij

x1j

x2j

x3j

Xj

P1

P2

P3

Page 35: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion ambiguity

• If we scale the entire scene by some factor k and, at

the same time, scale the camera matrices by the

factor of 1/k, the projections of the scene points in the

image remain exactly the same:

It is impossible to recover the absolute scale of the scene!

)(1

XPPXx kk

Page 36: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion ambiguity

• If we scale the entire scene by some factor k and, at

the same time, scale the camera matrices by the

factor of 1/k, the projections of the scene points in the

image remain exactly the same

• More generally: if we transform the scene using a

transformation Q and apply the inverse transformation

to the camera matrices, then the images do not

change

QXPQPXx-1

Page 37: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Calibrated cameras

(similarity projection ambiguity)

Uncalibrated cameras

(projective projection ambiguity)

Page 38: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Types of ambiguity

vT

v

tAProjective

15dof

Affine

12dof

Similarity

7dof

Euclidean

6dof

Preserves intersection and

tangency

Preserves parallellism,

volume ratios

Preserves angles, ratios of

length

10

tA

T

10

tR

T

s

10

tR

T

Preserves angles, lengths

• With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction

• Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean

Slide: S. Lazebnik

Page 39: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Projective ambiguity

XQPQPXx P

-1

P

Page 40: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Projective ambiguity

Page 41: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine ambiguity

XQPQPXx A

-1

A

Affine

Page 42: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine ambiguity

Page 43: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Similarity ambiguity

XQPQPXxS

-1

S

Page 44: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Similarity ambiguity

Page 45: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

What can we do to remove ambiguities?

Page 46: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine structure from

motion

Page 47: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion

• Let’s start with affine cameras (the math is easier)

center at

infinity

Page 48: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Recall: Orthographic Projection

Special case of perspective projection

• Distance from center of projection to image plane is infinite

• Projection matrix:

Image World

Page 49: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Orthographic Projection

Parallel Projection

Affine cameras

Page 50: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine cameras

• A general affine camera combines the effects of an

affine transformation of the 3D space, orthographic

projection, and an affine transformation of the image:

• Affine projection is a linear mapping + translation in

inhomogeneous coordinates

10

bAP

1000

]affine44[

1000

0010

0001

]affine33[2232221

1131211

baaa

baaa

x

Xa1

a2

bAXx

2

1

232221

131211

b

b

Z

Y

X

aaa

aaa

y

x

Projection of

world origin

Page 51: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine structure from motion

• Given: m images of n fixed 3D points:

xij = Ai Xj + bi , i = 1,… , m, j = 1, … , n

• Problem: use the mn correspondences xij to estimate m projection matrices Ai and translation vectors bi, and n points Xj

• The reconstruction is defined up to an arbitrary affine transformation Q (12 degrees of freedom):

• We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity)

• Thus, we must have 2mn >= 8m + 3n – 12

• For two views, we need four point correspondences

1

XQ

1

X,Q

10

bA

10

bA1

Page 52: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine structure from motion

• Centering: subtract the centroid of the image points

• For simplicity, assume that the origin of the world

coordinate system is at the centroid of the 3D points

• After centering, each normalized point xij is related to

the 3D point Xi by

ji

n

k

kji

n

k

ikiiji

n

k

ikijij

n

nn

XAXXA

bXAbXAxxx

ˆ1

11ˆ

1

11

jiijXAx ˆ

Page 53: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine structure from motion

• Let’s create a 2m × n data (measurement) matrix:

mnmm

n

n

xxx

xxx

xxx

D

ˆˆˆ

ˆˆˆ

ˆˆˆ

21

22221

11211

cameras

(2m)

points (n)

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:

A factorization method. IJCV, 9(2):137-154, November 1992.

Page 54: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine structure from motion

• Let’s create a 2m × n data (measurement) matrix:

n

mmnmm

n

n

XXX

A

A

A

xxx

xxx

xxx

D

21

2

1

21

22221

11211

ˆˆˆ

ˆˆˆ

ˆˆˆ

cameras

(2m × 3)

points (3 × n)

The measurement matrix D = MS must have rank 3!

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:

A factorization method. IJCV, 9(2):137-154, November 1992.

Page 55: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Factorizing the measurement matrix

Page 56: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Factorizing the measurement matrix

• Singular value decomposition of D:

Page 57: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Factorizing the measurement matrix

• Singular value decomposition of D:

Page 58: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Factorizing the measurement matrix

• Obtaining a factorization from SVD:

This decomposition minimizes

|D-MS|2

Page 59: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Affine ambiguity

• The decomposition is not unique. We get the same D

by using any 3×3 matrix C and applying the

transformations M → MC, S →C-1S

• That is because we have only an affine transformation

and we have not enforced any Euclidean constraints

(like forcing the image axes to be perpendicular, for

example)

Page 60: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

• Orthographic: image axes are perpendicular and of unit length

Eliminating the affine ambiguity

x

Xa1

a2

a1 · a2 = 0

|a1|2 = |a2|

2 = 1

Page 61: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Solve for orthographic constraints

• Solve for L = CCT

• Recover C from L by Cholesky decomposition: L = CCT

• Update A and X: A = AC, X = C-1X

T

i

T

i

i

2

1

~

~~

a

aAwhere

1~~

11

T

i

TT

iaCCa

1~~

22

T

i

TT

iaCCa

0~~

21

T

i

TT

iaCCa

~ ~

Three equations for each image i

Slide: D. Hoiem

Page 62: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Algorithm summary

• Given: m images and n features xij

• For each image i, center the feature coordinates

• Construct a 2m × n measurement matrix D:

• Column j contains the projection of point j in all views

• Row i contains one coordinate of the projections of all the n

points in image i

• Factorize D:

• Compute SVD: D = U W VT

• Create U3 by taking the first 3 columns of U

• Create V3 by taking the first 3 columns of V

• Create W3 by taking the upper left 3 × 3 block of W

• Create the motion and shape matrices:

• M = U3W3½ and S = W3

½ V3T (or M = U3 and S = W3V3

T)

• Eliminate affine ambiguity

Page 63: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Reconstruction results

C. Tomasi and T. Kanade. Shape and motion from image streams under orthography:

A factorization method. IJCV, 9(2):137-154, November 1992.

Page 64: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Multi-view projective

structure from motion

Page 65: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Projective structure from motion

• Given: m images of n fixed 3D points

zij xij = Pi Xj , i = 1,… , m, j = 1, … , n

• Problem: estimate m projection matrices Pi and n 3D

points Xj from the mn correspondences xij

x1j

x2j

x3j

Xj

P1

P2

P3

Page 66: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Projective structure from motion

• Given: m images of n fixed 3D points

zij xij = Pi Xj , i = 1,… , m, j = 1, … , n

• Problem: estimate m projection matrices Pi and n 3D

points Xj from the mn correspondences xij

• With no calibration info, cameras and points can only

be recovered up to a 4x4 projective transformation Q:

X → QX, P → PQ-1

• We can solve for structure and motion when

2mn >= 11m +3n – 15

• For two cameras, at least 7 points are needed

Page 67: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Projective SFM: Two-camera case

• Compute fundamental matrix F between the two views

• First camera matrix: [I|0]

• Second camera matrix: [A|b]

• Then b is the epipole (FTb = 0), A = –[b×]F

Page 68: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Sequential structure from motion

•Initialize motion from two images

using fundamental matrix

•Initialize structure by triangulation

•For each additional view:

• Determine projection matrix of

new camera using all the known

3D points that are visible in its

image – calibration cam

era

s

points

Page 69: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Sequential structure from motion

•Initialize motion from two images

using fundamental matrix

•Initialize structure by triangulation

•For each additional view:

• Determine projection matrix of

new camera using all the known

3D points that are visible in its

image – calibration

• Refine and extend structure:

compute new 3D points,

re-optimize existing points that

are also seen by this camera –

triangulation

cam

era

s

points

Page 70: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Sequential structure from motion

•Initialize motion from two images

using fundamental matrix

•Initialize structure by triangulation

•For each additional view:

• Determine projection matrix of

new camera using all the known

3D points that are visible in its

image – calibration

• Refine and extend structure:

compute new 3D points,

re-optimize existing points that

are also seen by this camera –

triangulation

•Refine structure and motion: bundle

adjustment

cam

era

s

points

Page 71: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Bundle adjustment

• Non-linear method for refining structure and motion

• Minimizing reprojection error

2

1 1

,),(

m

i

n

j

jiijDE XPxXP

x1j

x2j

x3j

Xj

P1

P2

P3

P1Xj

P2Xj

P3Xj

Page 72: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Review: Structure from motion

• Ambiguity

• Affine structure from motion

• Factorization

• Dealing with missing data

• Incremental structure from motion

• Projective structure from motion

• Bundle adjustment

Page 73: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure(scene geometry)

Motion(camera geometry)

Measurements

Pose Estimation known estimate3D to 2D

correspondences

Triangulation estimate known2D to 2D

coorespondences

Reconstruction estimate estimate2D to 2D

coorespondences

Page 74: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Large-scale structure from

motion

Page 75: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion

• Input: images with points in correspondence pi,j = (ui,j,vi,j)

• Output• structure: 3D location xi for each point pi• motion: camera parameters Rj , tj possibly Kj

• Objective function: minimize reprojection error

Reconstruction (side)(top)

Page 76: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

15,464

76,389

37,383

Page 77: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Standard way to view photos

Page 78: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Photo Tourism

Page 79: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Input: Point correspondences

Page 80: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Feature detection

Detect features using SIFT [Lowe, IJCV 2004]

Page 81: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Feature description

Describe features using SIFT [Lowe, IJCV 2004]

Page 82: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Feature matching

Match features between each pair of images

Page 83: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Feature matching

Refine matching using RANSAC to estimate fundamental

matrix between each pair

Page 84: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Correspondence estimation

• Link up pairwise matches to form connected components of matches across several images

Image 1 Image 2 Image 3 Image 4

Page 85: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Image connectivity graph

(graph layout produced using the Graphviz toolkit: http://www.graphviz.org/)

Page 86: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Structure from motion

Camera 1

Camera 2

Camera 3

R1,t1

R2,t2

R3,t3

X1

X4

X3

X2

X5

X6

X7

minimize

g(R,T,X)

p1,1

p1,2

p1,3

non-linear least squares

Page 87: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Global structure from motion

• Minimize sum of squared reprojection errors:

• Minimizing this function is called bundle adjustment

– Optimized using non-linear least squares, e.g. Levenberg-Marquardt

predictedimage location

observedimage location

indicator variable:is point i visible in image j ?

Page 88: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Problem size

• What are the variables?

• How many variables per camera?

• How many variables per point?

• Trevi Fountain collection

466 input photos

+ > 100,000 3D points

= very large optimization problem

Page 89: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Doing bundle adjustment

• Minimizing g is difficult– g is non-linear due to rotations, perspective division– lots of parameters: 3 for each 3D point, 6 for each camera– difficult to initialize– gauge ambiguity: error is invariant to a similarity transform

(translation, rotation, uniform scale)

• Many techniques use non-linear least-squares (NLLS) optimization (bundle adjustment)– Levenberg-Marquardt is one common algorithm for NLLS– Lourakis, The Design and Implementation of a Generic

Sparse Bundle Adjustment Software Package Based on the Levenberg-Marquardt Algorithm, http://www.ics.forth.gr/~lourakis/sba/

– http://en.wikipedia.org/wiki/Levenberg-Marquardt_algorithm

Page 90: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Initialization: Incremental structure from motion

Page 91: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Incremental structure from motion

Page 92: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Final reconstruction

Page 93: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

More examples

Page 94: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

More examples

Page 95: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

More examples

Page 96: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices
Page 98: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices
Page 99: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

http://grail.cs.washington.edu/projects/rome/

Even larger scale SfM

City-scale structure from motion

• “Building Rome in a day”

Page 100: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

SfM applications

• 3D modeling

• Surveying

• Robot navigation and mapmaking

• Visual effects (“Match moving”)

– https://www.youtube.com/watch?v=RdYWp70P_kY

Page 101: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Applications – Photosynth

Page 102: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Applications – Hyperlapse

https://www.youtube.com/watch?v=SOpwHaQnRSY

Page 103: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

Summary: 3D geometric vision

• Single-view geometry• The pinhole camera model

– Variation: orthographic projection

• The perspective projection matrix

• Intrinsic parameters

• Extrinsic parameters

• Calibration

• Multiple-view geometry• Triangulation

• The epipolar constraint– Essential matrix and fundamental matrix

• Stereo – Binocular, multi-view

• Structure from motion– Reconstruction ambiguity

– Affine SFM

– Projective SFM

Page 104: Structure from motion16385/s18/lectures/lecture12.pdfStructure from motion ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices

References

Basic reading:• Szeliski textbook, Chapter 7.• Hartley and Zisserman, Chapter 18.