Multiview Geometry: Stereo & Structure from Motion CS194: Image Manipulation, Comp. Vision, and Comp. Photo Alexei Efros, UC Berkeley, Spring 2020

Post on 24-May-2022

TRANSCRIPT

Page 1: Multiview Geometry: Stereo & Structure from Motion

Multiview Geometry: Stereo & Structure from Motion

CS194: Image Manipulation, Comp. Vision, and Comp. Photo
Alexei Efros, UC Berkeley, Spring 2020

Page 2: Multiview Geometry: Stereo & Structure from Motion

Why multiple views?

• Structure and depth are inherently ambiguous from single views.

Images from Lana Lazebnik

Page 3: Multiview Geometry: Stereo & Structure from Motion

Why multiple views?

• Structure and depth are inherently ambiguous from single views.

[Figure: points P1 and P2 lie on the same ray through the optical center, so their projections coincide: P1’ = P2’.]

Page 4: Multiview Geometry: Stereo & Structure from Motion

Changing camera center

Does it still work?

[Figure labels: synthetic PP, PP1, PP2]

Page 5: Multiview Geometry: Stereo & Structure from Motion

Multi-view geometry problems

• Structure: Given projections of the same 3D point in two or more images, compute the 3D coordinates of that point

[Figure: Camera 1 (R1,t1), Camera 2 (R2,t2), and Camera 3 (R3,t3) observing an unknown 3D point (?).]

Slide credit: Noah Snavely

Page 6: Multiview Geometry: Stereo & Structure from Motion

Multi-view geometry problems

• Stereo correspondence: Given a point in one of the images, where could its corresponding points be in the other images?

[Figure: Camera 1 (R1,t1), Camera 2 (R2,t2), and Camera 3 (R3,t3).]

Slide credit: Noah Snavely

Page 7: Multiview Geometry: Stereo & Structure from Motion

Multi-view geometry problems

• Motion: Given a set of corresponding points in two or more images, compute the camera parameters

[Figure: Camera 1 (R1,t1), Camera 2 (R2,t2), and Camera 3 (R3,t3), each with unknown pose (?).]

Slide credit: Noah Snavely

Page 8: Multiview Geometry: Stereo & Structure from Motion

Estimating depth with stereo

• Stereo: shape from “motion” between two views

• We’ll need to consider:

– Info on camera pose (“calibration”)

– Image point correspondences

[Figure labels: scene point, optical center, image plane]

Page 9: Multiview Geometry: Stereo & Structure from Motion

Stereo vision

• Two cameras, simultaneous views

• Single moving camera and static scene

Page 10: Multiview Geometry: Stereo & Structure from Motion

Review: Camera calibration

[Figure: world frame and camera frame.]

Extrinsic parameters: Camera frame ↔ Reference frame

Intrinsic parameters: Image coordinates relative to camera ↔ Pixel coordinates

• Extrinsic params: rotation matrix and translation vector

• Intrinsic params: focal length, pixel sizes (mm), image center point, radial distortion parameters

We’ll assume for now that these parameters are given and fixed.

Grauman

Page 11: Multiview Geometry: Stereo & Structure from Motion

Oriented and Translated Camera

[Figure: camera rotated by R and translated by t relative to the world frame with origin Ow and axes iw, jw, kw.]

Page 12: Multiview Geometry: Stereo & Structure from Motion

Degrees of freedom

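$$\mathbf{x} = \mathbf{K}\,[\mathbf{R} \;\; \mathbf{t}]\,\mathbf{X}$$

$$w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha & s & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$

Degrees of freedom: 5 for the intrinsic matrix K, 6 for the extrinsics [R t].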
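The projection equation can be exercised numerically. The sketch below is illustrative only: it assumes NumPy, zero skew, and made-up values for K, R, and t; the helper name `project` is hypothetical and not from the slides.

```python
import numpy as np

def project(K, R, t, X_world):
    """Project Nx3 world points to Nx2 pixel coordinates via x = K [R | t] X."""
    X_world = np.asarray(X_world, dtype=float)
    X_cam = X_world @ R.T + t            # extrinsics: world frame -> camera frame
    x_hom = X_cam @ K.T                  # intrinsics: homogeneous pixels (w*u, w*v, w)
    return x_hom[:, :2] / x_hom[:, 2:3]  # divide by w

# Assumed example values: focal length 800 px, image center (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # camera aligned with the world frame
t = np.zeros(3)      # camera at the world origin
pixels = project(K, R, t, [[0.0, 0.0, 4.0], [1.0, -0.5, 4.0]])
print(pixels)
```

A point on the optical axis lands on the image center, and the second point is offset from it by f·X/Z and f·Y/Z pixels.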

Page 13: Multiview Geometry: Stereo & Structure from Motion

How to calibrate the camera?

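$$\begin{bmatrix} su \\ sv \\ s \end{bmatrix} = \begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

$$\mathbf{x} = \mathbf{K}\,[\mathbf{R} \;\; \mathbf{t}]\,\mathbf{X}$$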

Page 14: Multiview Geometry: Stereo & Structure from Motion

How do we calibrate a camera?

3D points (X, Y, Z):

312.747 309.140 30.086
305.796 311.649 30.356
307.694 312.358 30.418
310.149 307.186 29.298
311.937 310.105 29.216
311.202 307.572 30.682
307.106 306.876 28.660
309.317 312.490 30.230
307.435 310.151 29.318
308.253 306.300 28.881
306.650 309.301 28.905
308.069 306.831 29.189
309.671 308.834 29.029
308.255 309.955 29.267
307.546 308.613 28.963
311.036 309.206 28.913
307.518 308.175 29.069
309.950 311.262 29.990
312.160 310.772 29.080
311.988 312.709 30.514

2D image points (u, v):

880 214
43 203
270 197
886 347
745 302
943 128
476 590
419 214
317 335
783 521
235 427
665 429
655 362
427 333
412 415
746 351
434 415
525 234
716 308
602 187

$$\begin{bmatrix} su \\ sv \\ s \end{bmatrix} = \begin{bmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Page 15: Multiview Geometry: Stereo & Structure from Motion

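Method 1 – homogeneous linear system

$$\begin{bmatrix} su \\ sv \\ s \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Eliminating s gives two linear equations per 3D–2D match; stacking them for n matches:

$$\begin{bmatrix}
X_1 & Y_1 & Z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 X_1 & -u_1 Y_1 & -u_1 Z_1 & -u_1 \\
0 & 0 & 0 & 0 & X_1 & Y_1 & Z_1 & 1 & -v_1 X_1 & -v_1 Y_1 & -v_1 Z_1 & -v_1 \\
 & & & & & \vdots & & & & & & \\
X_n & Y_n & Z_n & 1 & 0 & 0 & 0 & 0 & -u_n X_n & -u_n Y_n & -u_n Z_n & -u_n \\
0 & 0 & 0 & 0 & X_n & Y_n & Z_n & 1 & -v_n X_n & -v_n Y_n & -v_n Z_n & -v_n
\end{bmatrix}
\begin{bmatrix} m_{11} \\ m_{12} \\ m_{13} \\ m_{14} \\ m_{21} \\ m_{22} \\ m_{23} \\ m_{24} \\ m_{31} \\ m_{32} \\ m_{33} \\ m_{34} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}$$

• Solve for m’s entries using linear least squares (Ax = 0 form)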
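The homogeneous system above can be solved with an SVD: the least-squares solution of Ax = 0 under ‖x‖ = 1 is the right singular vector with the smallest singular value. The following is a minimal sketch under assumed synthetic data (the matrix `M_true`, the random points, and the function name `calibrate_dlt` are illustrative, not from the slides).

```python
import numpy as np

def calibrate_dlt(points_3d, points_2d):
    """Estimate the 3x4 projection matrix M (up to scale) from >= 6 3D-2D matches."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)   # singular vector of the smallest singular value

# Synthetic check: project known points with an assumed M_true, then recover it.
rng = np.random.default_rng(0)
M_true = np.array([[800.0, 0.0, 320.0, 100.0],
                   [0.0, 800.0, 240.0, 50.0],
                   [0.0, 0.0, 1.0, 5.0]])
points_3d = rng.uniform(-1.0, 1.0, size=(10, 3))
hom = np.column_stack([points_3d, np.ones(10)]) @ M_true.T
points_2d = hom[:, :2] / hom[:, 2:3]

M_est = calibrate_dlt(points_3d, points_2d)
M_est = M_est / M_est[2, 3] * M_true[2, 3]   # fix the arbitrary scale for comparison
```

With noisy real measurements the recovered matrix is only a least-squares fit, and the 3D points must not all lie on one plane.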

Page 16: Multiview Geometry: Stereo & Structure from Motion

Can we factorize M back to K [R | T]?

• Yes

• Simplest is to use camera calibration packages (there is a good one in OpenCV)
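For intuition about what such a package does internally, one common approach is an RQ decomposition of the left 3×3 block of M. The sketch below builds RQ from NumPy's QR; the function name `decompose_projection` and the test values are assumptions for illustration, not the method of any particular library.

```python
import numpy as np

def decompose_projection(M):
    """Split M ~ K [R | t] into K (upper triangular, K[2,2] = 1), rotation R, and t."""
    A = M[:, :3]
    P = np.fliplr(np.eye(3))                  # row/column reversal matrix
    Q_tilde, R_tilde = np.linalg.qr((P @ A).T)
    K = P @ R_tilde.T @ P                     # upper triangular factor
    R = P @ Q_tilde.T                         # orthogonal factor, so A = K @ R
    D = np.diag(np.sign(np.diag(K)))          # make the diagonal of K positive
    K, R = K @ D, D @ R
    t = np.linalg.solve(K, M[:, 3])
    if np.linalg.det(R) < 0:                  # M is only defined up to sign
        R, t = -R, -t
    return K / K[2, 2], R, t

# Assumed ground truth, scaled by an arbitrary negative factor.
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([0.1, -0.2, 2.0])
M = -2.5 * K_true @ np.column_stack([R_true, t_true])

K_est, R_est, t_est = decompose_projection(M)
```

The sign fixes matter because QR factors are unique only up to signs, and a projection matrix is defined only up to scale.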

Page 17: Multiview Geometry: Stereo & Structure from Motion

Stereo reconstruction: main steps

– Calibrate cameras

– Compute disparity

– Estimate depth

Grauman

Page 18: Multiview Geometry: Stereo & Structure from Motion

Geometry for a simple stereo system

• Assume parallel optical axes, known camera parameters (i.e., calibrated cameras). We want the depth Z.

Use similar triangles (pl, P, pr) and (Ol, P, Or):

$$\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}$$

$$Z = f\,\frac{T}{x_r - x_l}$$

where x_r − x_l is the disparity.

Grauman
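As a worked instance of the depth formula Z = f·T / disparity, the sketch below uses assumed numbers (a 700-pixel focal length, a 0.12 m baseline, and a 35-pixel disparity); the function name is illustrative.

```python
def depth_from_disparity(focal_length_px, baseline, disparity_px):
    """Depth Z = f * T / d for a parallel-axis stereo rig (d is the disparity magnitude)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero corresponds to infinite depth")
    return focal_length_px * baseline / disparity_px

# Assumed values: f = 700 px, T = 0.12 m, disparity = 35 px.
depth = depth_from_disparity(700.0, 0.12, 35.0)
print(depth)  # depth in metres, same unit as the baseline
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for nearby ones.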

Page 19: Multiview Geometry: Stereo & Structure from Motion

Disparity example

image I(x,y), disparity map D(x,y), image I´(x´,y´)

(x´,y´) = (x + D(x,y), y)

Grauman

Page 20: Multiview Geometry: Stereo & Structure from Motion

Correspondence problem

Source: Andrew Zisserman

Page 21: Multiview Geometry: Stereo & Structure from Motion

Intensity profiles

Source: Andrew Zisserman

Page 22: Multiview Geometry: Stereo & Structure from Motion

Correspondence problem

Source: Andrew Zisserman

Neighborhoods of corresponding points have similar intensity patterns.

Page 23: Multiview Geometry: Stereo & Structure from Motion

Normalized cross correlation

Source: Andrew Zisserman
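Normalized cross correlation compares two windows after subtracting each window's mean and dividing by its norm, which makes the score invariant to gain and offset changes in intensity. A minimal NumPy sketch (the function name `ncc` and the toy patches are assumptions):

```python
import numpy as np

def ncc(window_a, window_b):
    """Normalized cross correlation of two equally sized windows, in [-1, 1]."""
    a = np.asarray(window_a, dtype=float).ravel()
    b = np.asarray(window_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:          # a flat window carries no pattern to correlate
        return 0.0
    return float(a @ b / denom)

patch = np.array([[1.0, 2.0], [3.0, 5.0]])
score_same = ncc(patch, 2.0 * patch + 10.0)   # brighter, higher-contrast copy
score_inverted = ncc(patch, -patch)           # inverted pattern
score_flat = ncc(patch, np.ones((2, 2)))      # textureless window
```

Returning 0 for a flat window is a design choice of this sketch; it reflects that textureless windows give no evidence either way.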

Page 24: Multiview Geometry: Stereo & Structure from Motion

Correlation-based window matching

Source: Andrew Zisserman

Page 25: Multiview Geometry: Stereo & Structure from Motion

Dense correspondence search

For each epipolar line:

For each pixel / window in the left image:

• compare with every pixel / window on same epipolar line in right image

• pick position with minimum match cost (e.g., SSD, correlation)

Adapted from Li Zhang Grauman
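The search loop above can be sketched for rectified images, where epipolar lines are image rows. This is a slow, unoptimized illustration using SSD as the match cost; it follows the convention x´ = x + D(x,y), and the function name, window size, and synthetic images are assumptions.

```python
import numpy as np

def match_scanlines(left, right, half_window, disparities):
    """Brute-force window matching along rows; returns D with x' = x + D(x, y)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    h, w = left.shape
    hw = half_window
    disparity = np.zeros((h, w), dtype=int)
    for y in range(hw, h - hw):
        for x in range(hw, w - hw):
            patch = left[y - hw:y + hw + 1, x - hw:x + hw + 1]
            best_cost = np.inf
            for d in disparities:
                xr = x + d                       # candidate column in the right image
                if xr - hw < 0 or xr + hw >= w:
                    continue                     # window would leave the image
                candidate = right[y - hw:y + hw + 1, xr - hw:xr + hw + 1]
                cost = np.sum((patch - candidate) ** 2)   # SSD match cost
                if cost < best_cost:
                    best_cost, disparity[y, x] = cost, d
    return disparity

# Synthetic pair: the right image is the left image shifted 3 pixels to the left.
rng = np.random.default_rng(0)
left_image = rng.random((9, 30))
right_image = np.roll(left_image, -3, axis=1)
disparity_map = match_scanlines(left_image, right_image,
                                half_window=2, disparities=range(-5, 1))
```

Border pixels where no full window fits keep the default value 0; away from the borders every pixel of this textured synthetic pair recovers the shift of −3.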

Page 26: Multiview Geometry: Stereo & Structure from Motion

Textureless regions

Source: Andrew Zisserman

Textureless regions are non-distinct; high ambiguity for matches.

Grauman

Page 27: Multiview Geometry: Stereo & Structure from Motion

Effect of window size

Source: Andrew Zisserman Grauman

Page 28: Multiview Geometry: Stereo & Structure from Motion

Effect of window size

W = 3 W = 20

Figures from Li Zhang

Want window large enough to have sufficient intensity variation, yet small enough to contain only pixels with about the same disparity.

Grauman

Page 29: Multiview Geometry: Stereo & Structure from Motion

Stereo Results

[Figure panels: Scene, Ground truth]

– Data from University of Tsukuba

Page 30: Multiview Geometry: Stereo & Structure from Motion

Results with Window Search

Window-based matching (best window size)

Ground truth

Page 31: Multiview Geometry: Stereo & Structure from Motion

Better methods exist...

Energy Minimization

Boykov et al., Fast Approximate Energy Minimization via Graph Cuts, International Conference on Computer Vision, September 1999.

Ground truth

Page 32: Multiview Geometry: Stereo & Structure from Motion

General case, with calibrated cameras

• The two cameras need not have parallel optical axes.

Page 33: Multiview Geometry: Stereo & Structure from Motion

Stereo correspondence constraints

• Given p in left image, where can corresponding point p’ be?

Page 34: Multiview Geometry: Stereo & Structure from Motion

Stereo correspondence constraints

• Given p in left image, where can corresponding point p’ be?

Page 35: Multiview Geometry: Stereo & Structure from Motion

Where do we need to search?

Slide by James Hays

Page 36: Multiview Geometry: Stereo & Structure from Motion

Stereo correspondence constraints

Page 37: Multiview Geometry: Stereo & Structure from Motion

Stereo correspondence constraints

• Geometry of two views allows us to constrain where the corresponding pixel for some image point in the first view must occur in the second view.

Epipolar constraint: Why is this useful?

• Reduces correspondence problem to 1D search along conjugate epipolar lines

[Figure labels: epipolar plane, epipolar line, epipolar line]

Adapted from Steve Seitz

http://www.ai.sri.com/~luong/research/Meta3DViewer/EpipolarGeo.html

Page 38: Multiview Geometry: Stereo & Structure from Motion

Epipolar geometry

[Figure: 3D point X with image points x and x’.]

• Baseline – line connecting the two camera centers

• Epipolar Plane – plane containing baseline (1D family)

• Epipoles = intersections of baseline with image planes = projections of the other camera center = vanishing points of the motion direction

Page 39: Multiview Geometry: Stereo & Structure from Motion

The Epipole

Photo by Frank Dellaert

Page 40: Multiview Geometry: Stereo & Structure from Motion

Epipolar geometry: terms

• Baseline: line joining the camera centers

• Epipole: point of intersection of baseline with the image plane

• Epipolar plane: plane containing baseline and world point

• Epipolar line: intersection of epipolar plane with the image plane

• All epipolar lines intersect at the epipole

• An epipolar plane intersects the left and right image planes in epipolar lines

Grauman

Page 41: Multiview Geometry: Stereo & Structure from Motion

Epipolar constraint

• Potential matches for p have to lie on the corresponding epipolar line l’.

• Potential matches for p’ have to lie on the corresponding epipolar line l.

Source: M. Pollefeys

Page 42: Multiview Geometry: Stereo & Structure from Motion

Example

Page 43: Multiview Geometry: Stereo & Structure from Motion

Example: converging cameras

Figure from Hartley & Zisserman

As the position of the 3D point varies, epipolar lines “rotate” about the baseline

Page 44: Multiview Geometry: Stereo & Structure from Motion

Example: motion parallel with image plane

Figure from Hartley & Zisserman

Page 45: Multiview Geometry: Stereo & Structure from Motion

Example: forward motion

Figure from Hartley & Zisserman

e

e’

Epipole has same coordinates in both images.

Points move along lines radiating from e: “Focus of expansion”

Page 46: Multiview Geometry: Stereo & Structure from Motion

• For a given stereo rig, how do we express the epipolar constraints algebraically?

• For calibrated cameras, with Essential Matrix

• For uncalibrated cameras, with Fundamental Matrix

Page 47: Multiview Geometry: Stereo & Structure from Motion

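Epipolar constraint: Calibrated case

[Figure: 3D point X with image points x and x’.]

• Intrinsic and extrinsic parameters of the cameras are known, world coordinate system is set to that of the first camera

• Then the projection matrices are given by K[I | 0] and K’[R | t]

• We can multiply the projection matrices (and the image points) by the inverse of the calibration matrices to get normalized image coordinates:

$$\mathbf{x}_{\text{norm}} = \mathbf{K}^{-1}\mathbf{x}_{\text{pixel}} = [\mathbf{I} \mid \mathbf{0}]\,\mathbf{X}, \qquad \mathbf{x}'_{\text{norm}} = \mathbf{K}'^{-1}\mathbf{x}'_{\text{pixel}} = [\mathbf{R} \mid \mathbf{t}]\,\mathbf{X}$$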

Page 48: Multiview Geometry: Stereo & Structure from Motion

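Epipolar constraint: Calibrated case

[Figure: 3D point X = (x, 1)^T with image points x and x’ = Rx + t; the second camera is related to the first by R and t.]

$$\mathbf{x} = [\mathbf{I} \;\; \mathbf{0}] \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix}, \qquad \mathbf{x}' = [\mathbf{R} \;\; \mathbf{t}] \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = \mathbf{R}\mathbf{x} + \mathbf{t}$$

The vectors Rx, t, and x’ are coplanar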

Page 49: Multiview Geometry: Stereo & Structure from Motion

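Epipolar constraint: Calibrated case

[Figure: 3D point X with image points x and x’ = Rx + t.]

The vectors Rx, t, and x’ are coplanar:

$$\mathbf{x}' \cdot [\mathbf{t} \times (\mathbf{R}\mathbf{x})] = 0$$

Recall:

$$\mathbf{a} \times \mathbf{b} = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix} \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix} = [\mathbf{a}_\times]\,\mathbf{b}$$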

Page 50: Multiview Geometry: Stereo & Structure from Motion

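Epipolar constraint: Calibrated case

[Figure: 3D point X with image points x and x’ = Rx + t.]

The vectors Rx, t, and x’ are coplanar:

$$\mathbf{x}' \cdot [\mathbf{t} \times (\mathbf{R}\mathbf{x})] = 0 \quad\Longrightarrow\quad \mathbf{x}'^{T}\,\mathbf{E}\,\mathbf{x} = 0 \quad\text{with}\quad \mathbf{E} = [\mathbf{t}_\times]\,\mathbf{R}$$

E is the Essential Matrix (Longuet-Higgins, 1981)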
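The coplanarity condition can be checked numerically: writing the cross product as a skew-symmetric matrix gives E = [t×]R, and x’ᵀ E x should vanish for any true correspondence in normalized coordinates. The pose, the 3D point, and the helper names below are assumed example values.

```python
import numpy as np

def skew(a):
    """Matrix [a_x] such that skew(a) @ b equals np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def essential_from_pose(R, t):
    """Essential matrix E = [t_x] R for second-camera pose (R, t)."""
    return skew(t) @ R

# Assumed relative pose: small rotation about the y axis plus a translation.
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.1, 0.2])
E = essential_from_pose(R, t)

X_cam1 = np.array([0.3, -0.2, 4.0])        # 3D point in the first camera frame
X_cam2 = R @ X_cam1 + t                    # same point in the second camera frame
x = X_cam1 / X_cam1[2]                     # normalized image coordinates (x, y, 1)
x_prime = X_cam2 / X_cam2[2]
residual = x_prime @ E @ x                 # should be zero up to rounding
singular_values = np.linalg.svd(E, compute_uv=False)
```

The singular values come out as (‖t‖, ‖t‖, 0), which is the rank-two structure of an essential matrix.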

Page 51: Multiview Geometry: Stereo & Structure from Motion

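Epipolar constraint: Calibrated case

[Figure: 3D point X with image points x and x’.]

$$\mathbf{x}'^{T}\,\mathbf{E}\,\mathbf{x} = 0$$

• E x is the epipolar line associated with x (l' = E x)

• Recall: a line is given by ax + by + c = 0 or

$$\mathbf{l}^{T}\mathbf{x} = 0, \quad\text{where}\quad \mathbf{l} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$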

Page 52: Multiview Geometry: Stereo & Structure from Motion

Epipolar constraint: Calibrated case

[Figure: 3D point X with image points x and x’.]

• E x is the epipolar line associated with x (l' = E x)

• E^T x' is the epipolar line associated with x' (l = E^T x')

• E e = 0 and E^T e' = 0

• E is singular (rank two)

• E has five degrees of freedom

Page 53: Multiview Geometry: Stereo & Structure from Motion

Epipolar constraint: Uncalibrated case

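• The calibration matrices K and K’ of the two cameras are unknown

• We can write the epipolar constraint in terms of unknown normalized coordinates:

$$\hat{\mathbf{x}}'^{T}\,\mathbf{E}\,\hat{\mathbf{x}} = 0, \qquad \hat{\mathbf{x}} = \mathbf{K}^{-1}\mathbf{x}, \quad \hat{\mathbf{x}}' = \mathbf{K}'^{-1}\mathbf{x}'$$

[Figure: 3D point X with image points x and x’.]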

Page 54: Multiview Geometry: Stereo & Structure from Motion

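Epipolar constraint: Uncalibrated case

[Figure: 3D point X with image points x and x’.]

$$\hat{\mathbf{x}}'^{T}\,\mathbf{E}\,\hat{\mathbf{x}} = 0, \qquad \hat{\mathbf{x}} = \mathbf{K}^{-1}\mathbf{x}, \quad \hat{\mathbf{x}}' = \mathbf{K}'^{-1}\mathbf{x}'$$

$$\mathbf{x}'^{T}\,\mathbf{F}\,\mathbf{x} = 0 \quad\text{with}\quad \mathbf{F} = \mathbf{K}'^{-T}\,\mathbf{E}\,\mathbf{K}^{-1}$$

F is the Fundamental Matrix (Faugeras and Luong, 1992)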

Page 55: Multiview Geometry: Stereo & Structure from Motion

Epipolar constraint: Uncalibrated case

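• F x is the epipolar line associated with x (l' = F x)

• F^T x' is the epipolar line associated with x' (l = F^T x')

• F e = 0 and F^T e' = 0

• F is singular (rank two)

• F has seven degrees of freedom

[Figure: 3D point X with image points x and x’.]

$$\hat{\mathbf{x}}'^{T}\,\mathbf{E}\,\hat{\mathbf{x}} = 0 \qquad \mathbf{x}'^{T}\,\mathbf{F}\,\mathbf{x} = 0 \quad\text{with}\quad \mathbf{F} = \mathbf{K}'^{-T}\,\mathbf{E}\,\mathbf{K}^{-1}$$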
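The properties listed above can be illustrated with a small NumPy sketch: build F from E and the intrinsics, map a pixel to its epipolar line, and read the epipoles off the null spaces of F and Fᵀ. The intrinsics, the sideways-motion essential matrix, and the function names are assumptions chosen so the result is easy to check by eye.

```python
import numpy as np

def fundamental_from_essential(E, K1, K2):
    """F = K2^{-T} E K1^{-1}, with K1 and K2 the intrinsics of cameras 1 and 2."""
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipoles(F):
    """Return (e, e') with F e = 0 and F^T e' = 0, as unit homogeneous vectors."""
    U, _, Vt = np.linalg.svd(F)
    return Vt[-1], U[:, -1]

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
# E = [t_x] R for R = I and t = (1, 0, 0): motion parallel to the image plane.
E = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
F = fundamental_from_essential(E, K, K)

x = np.array([400.0, 300.0, 1.0])     # pixel in the first image
line = F @ x                          # epipolar line l' = (a, b, c) in the second image
row_of_line = -line[2] / line[1]      # for a horizontal line: v' = -c / b
e, e_prime = epipoles(F)
```

For this sideways motion the epipolar line is the horizontal row v’ = 300 and both epipoles are points at infinity along the x axis, matching the "motion parallel with image plane" example.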

Page 56: Multiview Geometry: Stereo & Structure from Motion

Estimating the fundamental matrix

Page 57: Multiview Geometry: Stereo & Structure from Motion

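The eight-point algorithm

$$\mathbf{x} = (u, v, 1)^{T}, \qquad \mathbf{x}' = (u', v', 1)^{T}$$

$$\begin{bmatrix} u' & v' & 1 \end{bmatrix} \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0$$

$$\begin{bmatrix} u'u & u'v & u' & v'u & v'v & v' & u & v & 1 \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$

Solve homogeneous linear system using eight or more matches

Enforce rank-2 constraint (take SVD of F and throw out the smallest singular value)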
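The two steps above (linear solve, then rank-2 projection) fit in a few lines of NumPy. This is a sketch of the plain, unnormalized variant on noise-free synthetic matches; the scene, the pose, and the function name `eight_point` are assumed for illustration.

```python
import numpy as np

def eight_point(pts1, pts2):
    """Estimate F (up to scale) from >= 8 matches (u, v) <-> (u', v')."""
    pts1 = np.asarray(pts1, dtype=float)
    pts2 = np.asarray(pts2, dtype=float)
    u, v = pts1[:, 0], pts1[:, 1]
    up, vp = pts2[:, 0], pts2[:, 1]
    # One row [u'u, u'v, u', v'u, v'v, v', u, v, 1] per match.
    A = np.column_stack([up * u, up * v, up, vp * u, vp * v, vp,
                         u, v, np.ones(len(u))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[-1] = 0.0
    return U @ np.diag(S) @ Vt

# Synthetic matches from an assumed scene and relative pose.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(20, 3)) + np.array([0.0, 0.0, 5.0])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.0, 0.2])
X2 = X @ R.T + t
pts1 = X[:, :2] / X[:, 2:3]
pts2 = X2[:, :2] / X2[:, 2:3]

F = eight_point(pts1, pts2)
h1 = np.column_stack([pts1, np.ones(len(pts1))])
h2 = np.column_stack([pts2, np.ones(len(pts2))])
residuals = np.einsum("ni,ij,nj->n", h2, F, h1)   # x'^T F x for every match
```

With real, noisy pixel coordinates this plain version is poorly conditioned, so the points are usually rescaled first.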

Page 58: Multiview Geometry: Stereo & Structure from Motion

Problem with eight-point algorithm

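After fixing f_33 = 1:

$$\begin{bmatrix} u'u & u'v & u' & v'u & v'v & v' & u & v \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \end{bmatrix} = -1$$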

Page 59: Multiview Geometry: Stereo & Structure from Motion

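Problem with eight-point algorithm

After fixing f_33 = 1:

$$\begin{bmatrix} u'u & u'v & u' & v'u & v'v & v' & u & v \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \end{bmatrix} = -1$$

• Poor numerical conditioning

• Can be fixed by rescaling the data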
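One common way to rescale the data is to translate each image's points to zero centroid and scale them so the mean distance from the origin is √2 before running the eight-point solve. The slides do not specify the rescaling, so the sketch below is an assumption in that spirit; the function name and the sample pixels are illustrative.

```python
import numpy as np

def normalize_points(pts):
    """Return normalized Nx2 points and the 3x3 similarity T that produced them."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (pts_h @ T.T)[:, :2], T

pixels = np.array([[100.0, 200.0], [300.0, 250.0], [500.0, 400.0], [620.0, 90.0]])
normalized, T = normalize_points(pixels)
mean_distance = np.mean(np.linalg.norm(normalized, axis=1))

# If F_hat is estimated from points normalized by T1 (image 1) and T2 (image 2),
# the fundamental matrix for the original pixels is F = T2.T @ F_hat @ T1.
```

The denormalization follows from substituting x̂ = T1 x and x̂’ = T2 x’ into x̂’ᵀ F̂ x̂ = 0.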

Page 60: Multiview Geometry: Stereo & Structure from Motion

The Fundamental Matrix Song

http://danielwedge.com/fmatrix/

https://www.youtube.com/watch?time_continue=8&v=DgGV3l82NTk&feature=emb_title

Page 61: Multiview Geometry: Stereo & Structure from Motion

Review: Structure from Motion (SfM)

• Given many images, how can we

a) figure out where they were all taken from?

b) build a 3D model of the scene?

This is (roughly) the structure from motion problem

Page 62: Multiview Geometry: Stereo & Structure from Motion

Structure from motion

• Input: images with points in correspondence pi,j = (ui,j, vi,j)

• Output

– structure: 3D location xi for each point pi

– motion: camera parameters Rj, tj, possibly Kj

• Objective function: minimize reprojection error

[Figure panels: Reconstruction (side), (top)]

Page 63: Multiview Geometry: Stereo & Structure from Motion

Camera calibration & triangulation

• Suppose we know 3D points and have matches between these points and an image

– How can we compute the camera parameters?

• Suppose we know the parameters of several cameras, each of which observes a point

– How can we compute the 3D location of that point?
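The second question (triangulation) has a simple linear solution: each observation (u, v) in a camera with projection matrix P contributes the two equations u·P₃ − P₁ and v·P₃ − P₂ applied to the homogeneous point, and the stacked system is solved by SVD. The sketch below assumes noise-free observations and made-up cameras; the function names are illustrative.

```python
import numpy as np

def triangulate(projections, pixels):
    """Linear triangulation of one 3D point from >= 2 cameras (3x4 matrices P)."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]                    # homogeneous solution of the stacked system
    return X[:3] / X[3]

def project_point(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two assumed cameras: identical intrinsics, second camera shifted sideways.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, -0.2, 5.0])

X_est = triangulate([P1, P2],
                    [project_point(P1, X_true), project_point(P2, X_true)])
```

With noisy observations the rays no longer intersect exactly, and this linear estimate is typically used as a starting point for minimizing reprojection error.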

Page 64: Multiview Geometry: Stereo & Structure from Motion

Structure from motion

• SfM solves both of these problems at once

• A kind of chicken-and-egg problem

– (but solvable)

Page 65: Multiview Geometry: Stereo & Structure from Motion

Photo Tourism

Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," SIGGRAPH 2006

https://youtu.be/mTBPGuPLI5Y

Page 66: Multiview Geometry: Stereo & Structure from Motion

Large-scale structure from motion

Dubrovnik, Croatia. 4,619 images (out of an initial 57,845).

Total reconstruction time: 23 hours

Number of cores: 352

Page 67: Multiview Geometry: Stereo & Structure from Motion

First step: how to get correspondence?

• Feature detection and matching

Page 68: Multiview Geometry: Stereo & Structure from Motion

Feature detection

Detect features using SIFT [Lowe, IJCV 2004]

Page 69: Multiview Geometry: Stereo & Structure from Motion

Feature detection

Detect features using SIFT [Lowe, IJCV 2004]

Page 70: Multiview Geometry: Stereo & Structure from Motion

Feature matching

Match features between each pair of images

Page 71: Multiview Geometry: Stereo & Structure from Motion

Feature matching

Refine matching using RANSAC to estimate fundamental matrix between each pair

Page 72: Multiview Geometry: Stereo & Structure from Motion

Correspondence estimation

• Link up pairwise matches to form connected components of matches across several images
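Linking pairwise matches into tracks is a connected-components computation, for which a union-find structure is a natural fit. The sketch below assumes each feature is identified by a hypothetical (image index, feature index) pair; the function name and the sample matches are illustrative.

```python
def build_tracks(pairwise_matches):
    """Group matched features into tracks (connected components) with union-find.

    pairwise_matches: iterable of ((image_a, feature_a), (image_b, feature_b)).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b               # merge the two components

    tracks = {}
    for node in list(parent):
        tracks.setdefault(find(node), []).append(node)
    return sorted(sorted(track) for track in tracks.values())

matches = [((1, 10), (2, 4)), ((2, 4), (3, 7)), ((3, 7), (4, 1)),
           ((1, 11), (2, 9))]
tracks = build_tracks(matches)
print(tracks)
```

A practical pipeline would typically also discard tracks that contain two different features from the same image, since those indicate a wrong match somewhere in the chain.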

Image 1 Image 2 Image 3 Image 4

Page 73: Multiview Geometry: Stereo & Structure from Motion

The story so far…

Feature detection

Matching + track generation

Input images

Images with feature correspondence

Page 74: Multiview Geometry: Stereo & Structure from Motion

The story so far…

• Next step:

– Use structure from motion to solve for geometry (cameras and points)

• First: what are cameras and points?

Page 75: Multiview Geometry: Stereo & Structure from Motion

Points and cameras

• Point: 3D position in space

• Camera:

– A 3D position

– A 3D orientation

– Intrinsic parameters (focal length, aspect ratio, …)

– 7 parameters (3+3+1) in total

Page 76: Multiview Geometry: Stereo & Structure from Motion

Structure from motion

[Figure: Camera 1 (R1,t1), Camera 2 (R2,t2), and Camera 3 (R3,t3) observing 3D points X1 to X7; the image observations p1,1, p1,2, and p1,3 are marked in the three images.]

minimize g(R, T, X)

non-linear least squares

Page 77: Multiview Geometry: Stereo & Structure from Motion

Structure from motion

• Minimize sum of squared reprojection errors, where each term compares:

– the predicted image location

– the observed image location

– weighted by an indicator variable: is point i visible in image j?

• Minimizing this function is called bundle adjustment

– Optimized using non-linear least squares, e.g. Levenberg-Marquardt
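The objective can be written out as g(X, R, T) = Σᵢ Σⱼ wᵢⱼ ‖P(xᵢ, Rⱼ, tⱼ) − pᵢⱼ‖², where P is the predicted image location, pᵢⱼ the observed one, and wᵢⱼ the visibility indicator; the symbols w and P are notation assumed here, since the formula itself did not survive in the transcript. The sketch below only evaluates this cost for given parameters (it does not optimize it) and assumes shared intrinsics K and made-up data.

```python
import numpy as np

def project_points(points_3d, R, t, K):
    """Predicted image locations of Nx3 points in a camera with pose (R, t)."""
    x_hom = (points_3d @ R.T + t) @ K.T
    return x_hom[:, :2] / x_hom[:, 2:3]

def reprojection_error(points_3d, rotations, translations, K, observations, visible):
    """Sum over visible (point i, camera j) pairs of squared pixel error.

    observations: array (n_points, n_cameras, 2); visible: bool (n_points, n_cameras).
    """
    total = 0.0
    for j, (R, t) in enumerate(zip(rotations, translations)):
        diff = project_points(points_3d, R, t, K) - observations[:, j, :]
        total += np.sum(diff[visible[:, j]] ** 2)   # indicator: skip hidden points
    return float(total)

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
points_3d = np.array([[0.0, 0.0, 4.0], [1.0, 0.0, 5.0], [0.0, 1.0, 6.0]])
rotations = [np.eye(3), np.eye(3)]
translations = [np.zeros(3), np.array([-0.5, 0.0, 0.0])]
observations = np.stack([project_points(points_3d, R, t, K)
                         for R, t in zip(rotations, translations)], axis=1)
visible = np.ones((3, 2), dtype=bool)

error_exact = reprojection_error(points_3d, rotations, translations, K, observations, visible)
observations[0, 1] += [3.0, 4.0]          # corrupt one observation by a 5-pixel offset
error_perturbed = reprojection_error(points_3d, rotations, translations, K, observations, visible)
visible[0, 1] = False                     # mark that observation as not visible
error_hidden = reprojection_error(points_3d, rotations, translations, K, observations, visible)
```

Bundle adjustment would hand a residual function like this to a non-linear least-squares solver and update the points and camera poses jointly.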

Page 78: Multiview Geometry: Stereo & Structure from Motion

Solving structure from motion

• How do we solve the SfM problem?

• Challenges:

– Large number of parameters (1000’s of cameras, millions of points)

– Very non-linear objective function

Inputs: feature tracks

Outputs: 3D cameras and points

Page 79: Multiview Geometry: Stereo & Structure from Motion

Solving structure from motion

Inputs: feature tracks Outputs: 3D cameras and points

• Important tool: Bundle Adjustment [Triggs et al. ’00]

– Joint non-linear optimization of both cameras and points

– Very powerful, elegant tool

• The bad news:

– Starting from a random initialization is very likely to give the wrong answer

– Difficult to initialize all the cameras at once

Page 80: Multiview Geometry: Stereo & Structure from Motion

Solving structure from motion

Inputs: feature tracks Outputs: 3D cameras and points

• The good news:

– Structure from motion with two cameras is (relatively) easy

– Once we have an initial model, it’s easy to add new cameras

• Idea:

– Start with a small seed reconstruction, and grow

Page 81: Multiview Geometry: Stereo & Structure from Motion

Incremental SfM

• Automatically select an initial pair of images

Page 82: Multiview Geometry: Stereo & Structure from Motion

1. Picking the initial pair

• We want a pair with many matches, but which has as large a baseline as possible

[Figure panels: lots of matches, small baseline; very few matches, large baseline; lots of matches, large baseline]

Page 83: Multiview Geometry: Stereo & Structure from Motion

1. Picking the initial pair

• Many possible heuristics

• E.g.

– Choose the pair with at least 100 matches, such that the ratio is as small as possible

– A homography will be a bad fit if there is sufficient parallax (and the scene is not planar)

Page 84: Multiview Geometry: Stereo & Structure from Motion

2. Two-frame reconstruction

• Input: two images with correspondence

• Output: camera parameters, 3D points

• In general, there can be ambiguities if the cameras are uncalibrated (camera intrinsics are unknown)

• Usually assume that the only intrinsic parameter is an unknown focal length

Page 85: Multiview Geometry: Stereo & Structure from Motion

2. Two-view reconstruction

• Two-view SfM: Given two calibrated images with corresponding points, compute the camera and point positions

• Solved by finding the essential matrix between the images
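Once the essential matrix is known, the relative pose can be read off its SVD, up to a four-fold ambiguity and an unknown translation scale. The sketch below uses the standard SVD construction and synthetic values; the function name `pose_candidates` and the example pose are assumptions, and the final selection step (keeping the candidate that places triangulated points in front of both cameras) is not shown.

```python
import numpy as np

def pose_candidates(E):
    """Return the four (R, t) candidates consistent with E; t has unit length."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:      # E is defined up to sign, so flipping is harmless
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]

# Assumed ground-truth pose used to build E = [t_x] R.
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([1.0, 0.2, -0.1])
t_cross = np.array([[0.0, -t_true[2], t_true[1]],
                    [t_true[2], 0.0, -t_true[0]],
                    [-t_true[1], t_true[0], 0.0]])
E = t_cross @ R_true

candidates = pose_candidates(E)
t_unit = t_true / np.linalg.norm(t_true)
found = any(np.allclose(R, R_true, atol=1e-8) and np.allclose(t, t_unit, atol=1e-8)
            for R, t in candidates)
```

The translation is recovered only as a direction, which is why two-view reconstructions have an arbitrary overall scale.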

Page 86: Multiview Geometry: Stereo & Structure from Motion

Incremental SfM: Algorithm

1. Pick a strong initial pair of images

2. Initialize the model using two-frame SfM

3. While there are connected images remaining:

a. Pick the image which sees the most existing 3D points

b. Estimate the pose of that camera

c. Triangulate any new points

d. Run bundle adjustment