Stanford CS223B Computer Vision, Winter 2007

Lecture 8 Structure From Motion

Professors Sebastian Thrun and Jana Košecká

CAs: Vaibhav Vaish and David Stavens

Slide credit: Gary Bradski, Stanford SAIL

Summary SFM

Problem
– Determine feature locations (= structure)
– Determine camera extrinsics (= motion)

Two Principal Solutions
– Bundle adjustment (nonlinear least squares, local minima)
– SVD (through orthographic approximation, affine geometry)

Correspondence
– (RANSAC)
– Expectation Maximization

Structure From Motion

[Figure: camera, features]

Recover: structure (feature locations), motion (camera extrinsics)

SFM = Holy Grail of 3D Reconstruction

Take movie of object
Reconstruct 3D model

Would be commercially highly viable

live.com

Structure From Motion (1)

[Tomasi & Kanade 92]

Structure From Motion (2)

[Tomasi & Kanade 92]

Structure From Motion (3)

[Tomasi & Kanade 92]

Structure From Motion (4a): Images

Marc Pollefeys

Structure From Motion (4b)

Marc Pollefeys

Structure From Motion (5)

http://www.cs.unc.edu/Research/urbanscape

Structure From Motion

Problem 1:
– Given n points p_ij = (x_ij, y_ij) in m images
– Reconstruct structure: 3-D locations P_j = (x_j, y_j, z_j)
– Reconstruct camera positions (extrinsics) M_i = (A_i, b_i)

Problem 2:
– Establish correspondence: c(p_ij)

Structure From Motion

[Figure: camera, features]

Recover: structure (feature locations), motion (camera extrinsics)

Recovery Problems

                    1 image        2+ images
Location known      calibration    stereo
Location unknown                   SFM, stitching

SFM: General Formulation

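Perspective projection of feature j into camera i, with focal length f:

$$
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
= \frac{f}{\left[ R_i P_j + b_i \right]_z}
\begin{pmatrix} \left[ R_i P_j + b_i \right]_x \\ \left[ R_i P_j + b_i \right]_y \end{pmatrix},
\qquad
P_j = \begin{pmatrix} P_{j,x} \\ P_{j,y} \\ P_{j,z} \end{pmatrix},
\quad
b_i = \begin{pmatrix} b_{i,x} \\ b_{i,y} \\ b_{i,z} \end{pmatrix}
$$

$$
R_i =
\begin{pmatrix} \cos\phi_i & -\sin\phi_i & 0 \\ \sin\phi_i & \cos\phi_i & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos\psi_i & 0 & \sin\psi_i \\ 0 & 1 & 0 \\ -\sin\psi_i & 0 & \cos\psi_i \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_i & -\sin\theta_i \\ 0 & \sin\theta_i & \cos\theta_i \end{pmatrix}
$$

[Figure: pinhole camera geometry with labels f, Z, X, O, -x; x = fX/Z]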
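The perspective model on this slide (rotate and translate a point into the camera frame, then divide by depth and scale by f) can be sketched in NumPy as follows. This is a minimal illustration, not the course's code: the function names, the angle names `phi`, `psi`, `theta`, and the z-y-x rotation order are assumptions.

```python
import numpy as np

def rotation_from_angles(phi, psi, theta):
    """R = Rz(phi) @ Ry(psi) @ Rx(theta); angle names and order are assumptions."""
    cz, sz = np.cos(phi), np.sin(phi)
    cy, sy = np.cos(psi), np.sin(psi)
    cx, sx = np.cos(theta), np.sin(theta)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rz @ Ry @ Rx

def project_perspective(R, b, P, f=1.0):
    """Project 3-D points P (n x 3) into a camera with rotation R and translation b."""
    Pc = P @ R.T + b                    # points in camera coordinates
    return f * Pc[:, :2] / Pc[:, 2:3]   # divide x, y by depth and scale by f

R = rotation_from_angles(0.1, -0.2, 0.3)
b = np.array([0.0, 0.0, 5.0])
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, -0.5]])
p = project_perspective(R, b, P, f=2.0)   # one image point (row) per feature
```

The world origin lands at the image center here because the translation `b` points straight along the optical axis.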

SFM: Bundle Adjustment

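$$
\min_{\{R_i, b_i\},\, \{P_j\}} \; \sum_{i,j}
\left\|
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
- \frac{f}{\left[ R_i P_j + b_i \right]_z}
\begin{pmatrix} \left[ R_i P_j + b_i \right]_x \\ \left[ R_i P_j + b_i \right]_y \end{pmatrix}
\right\|^2
$$

Here R_i is the rotation of camera i, written as a product of three elementary rotations (one angle each about the z, y, and x axes), b_i is its translation, and P_j is the 3-D location of feature j.

[Figure: pinhole camera geometry with labels f, Z, X, O, -x; x = fX/Z]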

Bundle Adjustment

SFM = Nonlinear Least Squares problem

Minimize through
– Gradient Descent
– Conjugate Gradient
– Gauss-Newton
– Levenberg-Marquardt (common method)

Prone to local minima
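As a rough illustration of the Levenberg-Marquardt option, the sketch below minimizes a reprojection error with a damped Gauss-Newton iteration and a forward-difference Jacobian. To stay short it refines only the 3-D points while the two camera poses are held fixed; a full bundle adjustment would stack the camera parameters into the same vector `x`. All names, the two hypothetical cameras, and the damping schedule are assumptions, not the lecture's implementation.

```python
import numpy as np

def project(R, b, P, f=1.0):
    """Perspective projection of points P (n x 3) for a camera (R, b)."""
    Pc = P @ R.T + b
    return f * Pc[:, :2] / Pc[:, 2:3]

def numerical_jacobian(fun, x, eps=1e-6):
    """Forward-difference Jacobian of fun at x."""
    r0 = fun(x)
    J = np.zeros((r0.size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (fun(x + dx) - r0) / eps
    return J

def levenberg_marquardt(fun, x0, iters=50, lam=1e-3):
    """Minimize sum(fun(x)**2) with a damped Gauss-Newton iteration."""
    x = x0.copy()
    cost = np.sum(fun(x) ** 2)
    for _ in range(iters):
        r = fun(x)
        J = numerical_jacobian(fun, x)
        step = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        new_cost = np.sum(fun(x + step) ** 2)
        if new_cost < cost:      # accept the step and reduce the damping
            x, cost, lam = x + step, new_cost, lam * 0.1
        else:                    # reject the step and damp more strongly
            lam *= 10.0
    return x, cost

# Two hypothetical cameras with known poses (held fixed in this sketch).
c, s = np.cos(0.3), np.sin(0.3)
cameras = [
    (np.eye(3), np.array([0.0, 0.0, 5.0])),
    (np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]),
     np.array([-1.0, 0.0, 5.0])),
]
P_true = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.5],
                   [0.0, 1.0, -0.5], [-1.0, -1.0, 0.2]])
observations = [project(R, b, P_true) for R, b in cameras]

def residuals(x):
    """Stacked reprojection errors over all cameras and points."""
    P = x.reshape(-1, 3)
    return np.concatenate([(project(R, b, P) - obs).ravel()
                           for (R, b), obs in zip(cameras, observations)])

x0 = (P_true + 0.1).ravel()      # perturbed initial guess
initial_cost = np.sum(residuals(x0) ** 2)
x_hat, final_cost = levenberg_marquardt(residuals, x0)
```

The starting point here is close to the true structure; from a poor initial guess the same iteration can stop in a local minimum, which is the weakness noted on this slide.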

Count # Constraints vs #Unknowns

m camera poses
n points
2mn point constraints
6m + 3n unknowns

Suggests: need 2mn ≥ 6m + 3n
But: Can we really recover all parameters???

How Many Parameters Can’t We Recover?

0 3 6 7 8 10 12 n m nm

Place Your Bet!

We can recover all but…

m = # camera poses
n = # feature points

Count # Constraints vs #Unknowns

m camera poses
n points
2mn point constraints
6m + 3n unknowns

Suggests: need 2mn ≥ 6m + 3n
But: Can we really recover all parameters???
– Can’t recover origin, orientation (6 params)
– Can’t recover scale (1 param)

Thus, we need 2mn ≥ 6m + 3n − 7
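The counting condition 2mn ≥ 6m + 3n − 7 is easy to tabulate. The helper names below are hypothetical, and the condition is only a necessary count of equations against unknowns, not a guarantee that a given configuration is solvable.

```python
def enough_constraints(m, n):
    """True if 2mn point constraints cover the 6m + 3n - 7 recoverable unknowns."""
    return 2 * m * n >= 6 * m + 3 * n - 7

def min_points(m, max_n=1000):
    """Smallest n satisfying the count for m >= 2 cameras, or None if not found."""
    for n in range(1, max_n + 1):
        if enough_constraints(m, n):
            return n
    return None

table = {m: min_points(m) for m in (2, 3, 4, 6)}
```

With two cameras the count asks for at least five points; with three or more cameras it asks for at least four.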

Are we done?

No, bundle adjustment has many local minima.

The “Trick Of The Day”

Replace Perspective by Orthographic Geometry

Replace Euclidean Geometry by Affine Geometry

Solve SFM linearly via PCA (“closed” form, globally optimal)

Post-Process to make solution Euclidean

Post-Process to make solution perspective

By Tomasi and Kanade, 1992

Orthographic Camera Model

Orthographic = Limit of Pinhole Model:

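Extrinsic parameters (the 3×3 matrix is a rotation):

$$
\begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \\ b_z \end{pmatrix}
$$

Orthographic projection:

$$
\begin{pmatrix} p_x \\ p_y \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \end{pmatrix},
\qquad
p = A P + b
$$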

Orthographic Projection

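Limit of Pinhole Model:

$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\text{ is a rotation}
\quad\Rightarrow\quad
\|a_1\|^2 = 1, \quad \|a_2\|^2 = 1, \quad a_1 \cdot a_2 = 0
$$

where a_1 = (a_11, a_12, a_13) and a_2 = (a_21, a_22, a_23) are the first two rows.

Orthographic Projection:

$$
\begin{pmatrix} p_x \\ p_y \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
+
\begin{pmatrix} b_x \\ b_y \end{pmatrix},
\qquad
p = A P + b
$$

For camera i and feature j:

$$
p_{ij} = A_i P_j + b_i
$$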
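A small sketch of the orthographic model p = A P + b, where A is taken as the first two rows of a rotation matrix. The function names and the example pose are assumptions made for illustration.

```python
import numpy as np

def orthographic_camera(R, t):
    """Return (A, b): the first two rows of rotation R and the first two entries of t."""
    return R[:2, :], t[:2]

def project_orthographic(A, b, P):
    """Orthographic projection of points P (n x 3): p = A P + b, with no depth division."""
    return P @ A.T + b

# Hypothetical pose: 90-degree rotation about the z axis plus a translation.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])
A, b = orthographic_camera(R, t)

P = np.array([[1.0, 0.0, 7.0]])
p = project_orthographic(A, b, P)

# Moving the point along the viewing direction (third row of R) leaves p unchanged.
p_shifted = project_orthographic(A, b, P + 4.0 * R[2])
```

Because the rows of A come from a rotation, `A @ A.T` is the 2×2 identity, which is exactly the pair of unit-norm constraints plus the orthogonality constraint on a_1 and a_2.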

The Orthographic SFM Problem

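Recover {A_i, b_i} and {P_j} from

$$
p_{ij} = A_i P_j + b_i \qquad (\text{camera } i, \text{ feature } j)
$$

subject to, for the rows a_1, a_2 of each A_i,

$$
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
$$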

The Affine SFM Problem

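Recover {A_i, b_i} and {P_j} from

$$
p_{ij} = A_i P_j + b_i \qquad (\text{camera } i, \text{ feature } j)
$$

Drop the constraints: the conditions

$$
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
$$

on the rows of each A_i are no longer enforced.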

Count # Constraints vs #Unknowns

m camera poses
n points
2mn point constraints
8m + 3n unknowns

Suggests: need 2mn ≥ 8m + 3n
But: Can we really recover all parameters???

$$
p_{ij} = A_i P_j + b_i \qquad (\text{camera } i, \text{ feature } j)
$$

How Many Parameters Can’t We Recover?

0 3 6 7 8 10 12 n m nm

Place Your Bet!

We can recover all but…

The Answer is (at least): 12

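$$
p_{ij} = A_i P_j + b_i
$$

Proof: for any vector d and any non-singular matrix C,

$$
\begin{aligned}
(A_i C)\,(C^{-1} P_j - C^{-1} d) + A_i d + b_i
&= A_i P_j - A_i d + A_i d + b_i \\
&= A_i P_j + b_i
\end{aligned}
$$

so the same measurements are explained by

$$
p_{ij} = A_i' P_j' + b_i', \qquad
A_i' = A_i C, \quad
P_j' = C^{-1} P_j - C^{-1} d, \quad
b_i' = A_i d + b_i
$$

C contributes 9 free parameters and d contributes 3, for a total of 12.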
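The 12-parameter ambiguity can be checked numerically: transforming the camera by a non-singular C and a vector d, and the point by the inverse transformation, leaves the image measurement unchanged. The particular values of `C` and `d` below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # affine camera matrix A_i
b = rng.standard_normal(2)        # camera offset b_i
P = rng.standard_normal(3)        # 3-D point P_j

# Arbitrary non-singular C (9 parameters) and vector d (3 parameters).
C = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.5]])
d = np.array([0.7, -1.2, 0.4])

C_inv = np.linalg.inv(C)
A2 = A @ C                        # A_i' = A_i C
P2 = C_inv @ P - C_inv @ d        # P_j' = C^{-1} P_j - C^{-1} d
b2 = A @ d + b                    # b_i' = A_i d + b_i

p = A @ P + b
p2 = A2 @ P2 + b2                 # same image point from different parameters
```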

Points for Solving Affine SFM Problem

m camera poses
n points

Need to have: 2mn ≥ 8m + 3n − 12
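As a worked instance of this count (plain arithmetic on the inequality above, for two and three cameras):

```latex
\begin{align*}
m = 2:&\quad 4n \ge 16 + 3n - 12 \;\Rightarrow\; n \ge 4 \\
m = 3:&\quad 6n \ge 24 + 3n - 12 \;\Rightarrow\; 3n \ge 12 \;\Rightarrow\; n \ge 4
\end{align*}
```

So by this count alone, four points are already enough for two or three affine cameras.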

Affine SFM

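Start from

$$
p_{ij} = A_i P_j + b_i
$$

Fix coordinate system by making p_i0 = P_0 = origin, so that

$$
p_{ij} = A_i P_j
$$

m cameras:

$$
q_j = \begin{pmatrix} p_{1j} \\ \vdots \\ p_{mj} \end{pmatrix} = A P_j,
\qquad
A = \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix}
$$

n points:

$$
Q = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & & \vdots \\ p_{m1} & \cdots & p_{mn} \end{pmatrix} = A D,
\qquad
D = \begin{pmatrix} P_1 & \cdots & P_n \end{pmatrix}
$$

Rank Theorem: Q has rank 3

Proof: Q = AD, where A has size 2m × 3 and D has size 3 × n.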

The Rank Theorem

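$$
\begin{pmatrix}
p_{11,x} & \cdots & p_{1n,x} \\
p_{11,y} & \cdots & p_{1n,y} \\
\vdots & & \vdots \\
p_{m1,x} & \cdots & p_{mn,x} \\
p_{m1,y} & \cdots & p_{mn,y}
\end{pmatrix}
\text{ has rank } 3
$$

The matrix has 2m rows (two per camera) and n columns (one per feature).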
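The rank theorem can be checked on synthetic data. The sketch below builds a 2m × n measurement matrix from random affine cameras and points (an assumption for illustration, not course data), and shows that the rank drops from 4 to 3 once the image of a reference point is subtracted from every column.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 10

D = rng.standard_normal((3, n))         # structure: one 3-D point per column
A = rng.standard_normal((2 * m, 3))     # stacked affine cameras, 2m x 3
b = rng.standard_normal((2 * m, 1))     # stacked camera offsets

Q_raw = A @ D + b                       # measurements p_ij = A_i P_j + b_i
Q = Q_raw - Q_raw[:, [0]]               # use the first point as the origin

rank_raw = np.linalg.matrix_rank(Q_raw) # the offsets add one dimension
rank_centered = np.linalg.matrix_rank(Q)
```

Subtracting the first column removes b_i from every entry and leaves Q = A (D − P_0), a product of a 2m × 3 and a 3 × n matrix.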

Singular Value Decomposition

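$$
\begin{pmatrix}
p_{11,x} & \cdots & p_{1n,x} \\
p_{11,y} & \cdots & p_{1n,y} \\
\vdots & & \vdots \\
p_{m1,x} & \cdots & p_{mn,x} \\
p_{m1,y} & \cdots & p_{mn,y}
\end{pmatrix}
=
\underbrace{U}_{2m \times 3} \;
\underbrace{W}_{3 \times 3} \;
\underbrace{V^T}_{3 \times n}
$$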

Affine Solution to Orthographic SFM

$W V^T$: affine structure

$U$: affine camera positions

Gives also the optimal affine reconstruction under noise
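A minimal NumPy sketch of this factorization, assuming the measurement matrix `Q` (2m × n) has already been expressed relative to a reference point so that the camera offsets vanish. The function name and the synthetic data are illustrative assumptions.

```python
import numpy as np

def affine_factorization(Q):
    """Rank-3 factorization Q ~= A_hat @ D_hat from the truncated SVD of Q."""
    U, w, Vt = np.linalg.svd(Q, full_matrices=False)
    A_hat = U[:, :3]                    # affine cameras, 2m x 3
    D_hat = np.diag(w[:3]) @ Vt[:3]     # affine structure, 3 x n
    return A_hat, D_hat

# Synthetic, noise-free measurements from random affine cameras and points.
rng = np.random.default_rng(2)
m, n = 5, 12
A_true = rng.standard_normal((2 * m, 3))
D_true = rng.standard_normal((3, n))
Q = A_true @ D_true

A_hat, D_hat = affine_factorization(Q)

# The recovered cameras match the true ones only up to a 3x3 matrix C.
C = np.linalg.lstsq(A_hat, A_true, rcond=None)[0]
```

Keeping only the three largest singular values gives the best rank-3 approximation of Q in the least-squares sense, which is why the same code also yields the optimal affine reconstruction when Q is noisy.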

Back To Orthographic Projection

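Constraints on the rows a_1, a_2 of each camera matrix:

$$
\|a_1\|^2 = 1, \qquad \|a_2\|^2 = 1, \qquad a_1 \cdot a_2 = 0
$$

The affine solution is determined only up to

$$
p_{ij} = A_i' P_j' + b_i'
\quad\text{with}\quad
A_i' = A_i C, \qquad
b_i' = A_i d + b_i, \qquad
P_j' = C^{-1} P_j - C^{-1} d
$$

for any vector d and non-singular matrix C.

Find C for which constraints are met
Search in 9-dim space (instead of 8m + 3n − 12)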
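One common way to find such a C, sketched here as an alternative to a direct 9-dimensional search, is to note that the constraints are linear in the symmetric matrix L = C Cᵀ (6 unknowns), solve for L by least squares, and take a Cholesky factor. This is a swapped-in technique, not necessarily the procedure used in the lecture; the function names and synthetic cameras are assumptions, and the Cholesky step requires the estimated L to be positive definite.

```python
import numpy as np

def sym_row(u, v):
    """Coefficients of u^T L v in the 6 distinct entries of a symmetric 3x3 L."""
    return np.array([
        u[0] * v[0],
        u[0] * v[1] + u[1] * v[0],
        u[0] * v[2] + u[2] * v[0],
        u[1] * v[1],
        u[1] * v[2] + u[2] * v[1],
        u[2] * v[2],
    ])

def metric_upgrade(A_hat):
    """Find C such that every 2x3 camera block of A_hat @ C has orthonormal rows."""
    rows, rhs = [], []
    for i in range(0, A_hat.shape[0], 2):
        a1, a2 = A_hat[i], A_hat[i + 1]
        rows += [sym_row(a1, a1), sym_row(a2, a2), sym_row(a1, a2)]
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    return np.linalg.cholesky(L)    # lower-triangular C with C @ C.T == L

# Synthetic test: orthographic cameras distorted by a known affine matrix.
rng = np.random.default_rng(3)
blocks = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(5)]
A_true = np.vstack(blocks)          # each block has orthonormal rows
C0 = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.5]])
A_hat = A_true @ np.linalg.inv(C0)  # affine cameras, as a factorization might return

C = metric_upgrade(A_hat)
A_metric = A_hat @ C
```

Only L is determined by the constraints, so C is recovered up to a rotation; this matches the earlier observation that the global orientation cannot be recovered.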

Back To Projective Geometry

Orthographic (in the limit)

Projective

Back To Projective Geometry

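$$
\min_{\{R_i, b_i\},\, \{P_j\}} \; \sum_{i,j}
\left\|
\begin{pmatrix} p_{ij,x} \\ p_{ij,y} \end{pmatrix}
- \frac{f}{\left[ R_i P_j + b_i \right]_z}
\begin{pmatrix} \left[ R_i P_j + b_i \right]_x \\ \left[ R_i P_j + b_i \right]_y \end{pmatrix}
\right\|^2
$$

Here R_i is the rotation of camera i, written as a product of three elementary rotations (one angle each about the z, y, and x axes), b_i is its translation, and P_j is the 3-D location of feature j.

[Figure: pinhole camera geometry with labels f, Z, X, O, -x; x = fX/Z]

Optimize, using the orthographic solution as starting point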

The “Trick Of The Day”

Replace Perspective by Orthographic Geometry

Replace Euclidean Geometry by Affine Geometry

Solve SFM linearly via PCA (“closed” form, globally optimal)

Post-Process to make solution Euclidean

Post-Process to make solution perspective

By Tomasi and Kanade, 1992

Structure From Motion

Problem 1:
– Given n points p_ij = (x_ij, y_ij) in m images
– Reconstruct structure: 3-D locations P_j = (x_j, y_j, z_j)
– Reconstruct camera positions (extrinsics) M_i = (A_i, b_i)

Problem 2:
– Establish correspondence: c(p_ij)

The Correspondence Problem

[Figure: View 1, View 2, View 3]

Correspondence: Solution 1

Track features (e.g., optical flow)

…but fails when images taken from widely different poses

Correspondence: Solution 2

– Start with random solution A, b, P
– Compute soft correspondence: p(c|A,b,P)
– Plug soft correspondence into SFM
– Reiterate

See Dellaert/Seitz/Thorpe/Thrun, Machine Learning Journal, 2003
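The "compute soft correspondence" step can be illustrated with a deliberately simplified sketch. It assumes independent isotropic Gaussian image noise with a hypothetical standard deviation `sigma` and a uniform prior over features, and it does not attempt to reproduce the method of the cited paper. Given the current predicted image positions of all features in one image, it returns a probability for each observation-feature pairing.

```python
import numpy as np

def soft_correspondence(obs, pred, sigma=1.0):
    """Return W with W[k, j] = probability that observation k belongs to feature j."""
    # Squared image distance between every observation and every prediction.
    d2 = np.sum((obs[:, None, :] - pred[None, :, :]) ** 2, axis=2)
    logw = -0.5 * d2 / sigma ** 2
    logw -= logw.max(axis=1, keepdims=True)    # subtract row maximum for stability
    W = np.exp(logw)
    return W / W.sum(axis=1, keepdims=True)    # normalize each row to sum to 1

# Predicted positions A P_j + b for three features in one image (made-up values).
pred = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
# Observations arrive in unknown order, with small offsets.
obs = pred[[2, 0, 1]] + np.array([[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]])

W = soft_correspondence(obs, pred, sigma=1.0)
best = W.argmax(axis=1)
```

In an EM-style loop these weights would then be used to re-estimate A, b, and P, after which the weights are recomputed.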

Example

Results: Cube

Animation

Tomasi’s Benchmark Problem

Reconstruction with EM

3-D Structure

Correspondence: Alternative Approach

RANSAC [Fischler/Bolles]

= Random sample consensus

Will be discussed Wednesday

Summary SFM

Problem
– Determine feature locations (= structure)
– Determine camera extrinsics (= motion)

Two Principal Solutions
– Bundle adjustment (nonlinear least squares, local minima)
– SVD (through orthographic approximation, affine geometry)

Correspondence
– (RANSAC)
– Expectation Maximization