
Summary of UNIK4690

12.04.2018

IMAGE FORMATION, PROCESSING AND FEATURES

• Image formation
  – Light, cameras, optics and color
  – Pose in 2D and 3D
  – Basic projective geometry
  – The perspective camera model

• Image processing
  – Image filtering
  – Image pyramids
  – Laplace blending

• Feature detection
  – Line features
  – Local keypoint features
  – Robust estimation with RANSAC

• Feature matching
  – From keypoints to feature correspondences
  – Feature descriptors
  – Feature matching
  – Estimating homographies from feature correspondences

Lectures 2018

WORLD GEOMETRY AND 3D

• Single-view geometry
  – Camera calibration
  – Pose from known 3D points
  – Scene geometry from a single view

• Stereo imaging
  – Basic epipolar geometry
  – Stereo imaging
  – Stereo processing

• Two-view geometry
  – Epipolar geometry
  – Triangulation
  – Pose from epipolar geometry

• Multiple-view geometry
  – Multiple-view geometry
  – Structure from motion
  – Multiple-view stereo

SCENE ANALYSIS

• Image analysis
  – Image segmentation
  – Image feature extraction
  – Introduction to machine learning

• Object detection
  – Descriptor-based detection
  – Introduction to deep learning







Image formation – Light, cameras, optics and color

• Illumination
• Cameras
• Optics
• Image capture
• Color sensing

Image capture

[Figure: a light source illuminates the scene; the imaging system (camera) consists of a shutter and a detector, e.g. a CMOS image sensor (CMOSIS 48Mp)]

• Shutter: mechanical or electronic, global or rolling

Image formation – Pose in 2D and 3D

• Pose
  – General properties

• Representing pose in 2D
  – Homogeneous transformations

• Representing pose in 3D
  – Homogeneous transformations
  – Other alternatives

• Representing rotation in 3D
  – Rotation matrix
  – Euler angles
  – Angle-axis
  – Unit quaternion

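Pose – Representation in 2D

• The pose of $B$ relative to $A$ can be represented by a homogeneous transformation ${}^{A}T_{B} \in SE(2)$

\[
{}^{A}\xi_{B} \;\mapsto\; {}^{A}T_{B}
= \begin{bmatrix} {}^{A}R_{B} & {}^{A}\mathbf{t}_{B} \\ \mathbf{0}^{T} & 1 \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta & {}^{A}t_{Bx} \\ \sin\theta & \cos\theta & {}^{A}t_{By} \\ 0 & 0 & 1 \end{bmatrix}
\]

• Properties

\[
\begin{aligned}
{}^{A}\xi_{C} = {}^{A}\xi_{B} \oplus {}^{B}\xi_{C} \;&\mapsto\; {}^{A}T_{C} = {}^{A}T_{B}\,{}^{B}T_{C} \\
\ominus\,{}^{A}\xi_{B} \;&\mapsto\; {}^{A}T_{B}^{-1} \\
{}^{A}\mathbf{p} = {}^{A}\xi_{B} \cdot {}^{B}\mathbf{p} \;&\mapsto\; {}^{A}\tilde{\mathbf{p}} = {}^{A}T_{B}\,{}^{B}\tilde{\mathbf{p}}
\end{aligned}
\]

• Points are represented in homogeneous coordinates

\[
{}^{A}\tilde{\mathbf{p}} = {}^{A}T_{B}\,{}^{B}\tilde{\mathbf{p}}
\quad\Leftrightarrow\quad
\begin{bmatrix} {}^{A}x \\ {}^{A}y \\ 1 \end{bmatrix}
= \begin{bmatrix} {}^{A}R_{B} & {}^{A}\mathbf{t}_{B} \\ \mathbf{0}^{T} & 1 \end{bmatrix}
\begin{bmatrix} {}^{B}x \\ {}^{B}y \\ 1 \end{bmatrix}
\]

[Figure: frames $A$ (axes $x_A$, $y_A$) and $B$ (axes $x_B$, $y_B$) related by ${}^{A}\xi_{B}$, and a point $P$ with coordinates ${}^{A}\mathbf{p}$ and ${}^{B}\mathbf{p}$]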
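The 2D pose properties can be checked numerically. The following is a minimal sketch in Python with NumPy; the helper names `se2` and `se2_inverse` and the example poses are assumptions made for illustration, not part of the lecture material.

```python
import numpy as np

def se2(theta, tx, ty):
    """Homogeneous 3x3 matrix for a 2D pose (rotation theta in radians, translation tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def se2_inverse(T):
    """Closed-form inverse: [R t; 0 1]^-1 = [R^T  -R^T t; 0 1]."""
    R, t = T[:2, :2], T[:2, 2]
    T_inv = np.eye(3)
    T_inv[:2, :2] = R.T
    T_inv[:2, 2] = -R.T @ t
    return T_inv

# Hypothetical poses: B relative to A, and C relative to B.
T_ab = se2(np.pi / 2, 1.0, 0.0)
T_bc = se2(0.0, 2.0, 0.0)

T_ac = T_ab @ T_bc                    # composition: pose of C relative to A
p_b = np.array([1.0, 0.0, 1.0])       # point in frame B, homogeneous coordinates
p_a = T_ab @ p_b                      # the same point expressed in frame A
```

Here `p_a` evaluates to approximately (1, 1, 1), since the point (1, 0) in B is rotated by 90 degrees and shifted by (1, 0).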

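Pose – Representation in 3D

• The pose of $B$ relative to $A$ can be represented by a homogeneous transformation ${}^{A}T_{B} \in SE(3)$

\[
{}^{A}\xi_{B} \;\mapsto\; {}^{A}T_{B}
= \begin{bmatrix} {}^{A}R_{B} & {}^{A}\mathbf{t}_{B} \\ \mathbf{0}^{T} & 1 \end{bmatrix}
= \begin{bmatrix} r_{11} & r_{12} & r_{13} & {}^{A}t_{Bx} \\ r_{21} & r_{22} & r_{23} & {}^{A}t_{By} \\ r_{31} & r_{32} & r_{33} & {}^{A}t_{Bz} \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

• Properties

\[
\begin{aligned}
{}^{A}\xi_{C} = {}^{A}\xi_{B} \oplus {}^{B}\xi_{C} \;&\mapsto\; {}^{A}T_{C} = {}^{A}T_{B}\,{}^{B}T_{C} \\
\ominus\,{}^{A}\xi_{B} \;&\mapsto\; {}^{A}T_{B}^{-1} \\
{}^{A}\mathbf{p} = {}^{A}\xi_{B} \cdot {}^{B}\mathbf{p} \;&\mapsto\; {}^{A}\tilde{\mathbf{p}} = {}^{A}T_{B}\,{}^{B}\tilde{\mathbf{p}}
\end{aligned}
\]

• Points are represented in homogeneous coordinates

\[
{}^{A}\tilde{\mathbf{p}} = {}^{A}T_{B}\,{}^{B}\tilde{\mathbf{p}}
\quad\Leftrightarrow\quad
\begin{bmatrix} {}^{A}x \\ {}^{A}y \\ {}^{A}z \\ 1 \end{bmatrix}
= \begin{bmatrix} {}^{A}R_{B} & {}^{A}\mathbf{t}_{B} \\ \mathbf{0}^{T} & 1 \end{bmatrix}
\begin{bmatrix} {}^{B}x \\ {}^{B}y \\ {}^{B}z \\ 1 \end{bmatrix}
\]

[Figure: frames $A$ (axes $x_A$, $y_A$, $z_A$) and $B$ (axes $x_B$, $y_B$, $z_B$) related by ${}^{A}\xi_{B}$, and a point $P$ with coordinates ${}^{A}\mathbf{p}$ and ${}^{B}\mathbf{p}$]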

Image formation – Basic projective geometry

• The projective plane $\mathbb{P}^2$
  – Homogeneous coordinates
  – Line at infinity
  – Points & lines are dual

• The projective space $\mathbb{P}^3$
  – Homogeneous coordinates
  – Plane at infinity
  – Points & planes are dual

• Linear transformations of $\mathbb{P}^2$ and $\mathbb{P}^3$
  – Represented by homogeneous matrices
  – Homographies ⊃ Affine ⊃ Similarities ⊃ Euclidean ⊃ Translations

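Transformations of the projective plane

Transformation of $\mathbb{P}^2$ | Matrix | #DoF | Preserves
Translation | $\begin{bmatrix} I & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$ | 2 | Orientation + all below
Euclidean | $\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$ | 3 | Lengths + all below
Similarity | $\begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$ | 4 | Angles + all below
Affine | $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix}$ | 6 | Parallelism, line at infinity + all below
Homography / projective | $\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$ | 8 | Straight lines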

Image formation – The perspective camera model

• The perspective camera model
  – $P = K[R, \mathbf{t}]$: the camera matrix
  – Intrinsic: $K$, the camera calibration matrix
  – Extrinsic: $[R, \mathbf{t}]$

• Lens distortion
  – Radial distortion
  – Tangential distortion (often ignored)

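The perspective camera model

• The perspective camera model describes the correspondence between observed points in the world and points in the captured image

\[
\tilde{\mathbf{u}} = P\,{}^{W}\tilde{\mathbf{X}}, \qquad
\tilde{\mathbf{u}} = \underbrace{K}_{\text{Intrinsic}}\;\underbrace{[R \;\; \mathbf{t}]}_{\text{Extrinsic}}\;{}^{W}\tilde{\mathbf{X}}
\]

Written out in full:

\[
\tilde{\mathbf{u}} =
\underbrace{\begin{bmatrix} f_u & s & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Intrinsic}}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\underbrace{\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix}}_{\text{Extrinsic}}
{}^{W}\tilde{\mathbf{X}}
\qquad\text{where}\quad
\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix} = {}^{W}\xi_{C}^{-1} = {}^{C}\xi_{W}
\]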

Practical use

• The geometry of the perspective camera is simple since we assume the pinhole to be infinitely small

• In reality the light passes through a lens that complicates the camera intrinsics

• If we want to use images for geometrical computations based on the perspective camera model, we MUST compensate for distortion
  – Radial distortion
    • Barrel
    • Pincushion
  – Tangential distortion

[Figure: examples of barrel and pincushion distortion]

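Projecting a point in the world to a pixel in the image

Suppose we have mounted a perspective camera $C$ on a vehicle $V$, and we know the pose of the camera relative to the vehicle as well as the pose of the vehicle relative to the world. How can we compute the pixel corresponding to a given point ${}^{W}\mathbf{X}$ in the world $W$?

[Figure: the frames $\{W\}$, $\{V\}$ and $\{C\}$ and the point ${}^{W}\mathbf{X}$]

We know

• The pose of the camera relative to the vehicle

\[
{}^{V}\xi_{C} = \begin{bmatrix} 0 & -0.1736 & 0.9848 & 3 \\ -1 & 0 & 0 & -2 \\ 0 & -0.9848 & -0.1736 & 6 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

• The pose of the vehicle relative to the world

\[
{}^{W}\xi_{V} = \begin{bmatrix} 0.8660 & -0.5000 & 0 & 15 \\ 0.5000 & 0.8660 & 0 & 20 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

• The point in the world

\[
{}^{W}\mathbf{X} = \begin{bmatrix} 75 \\ 60 \\ 0 \end{bmatrix}
\]

We also need to know

• The camera intrinsics

\[
K = \begin{bmatrix} 1500 & 0 & 800 \\ 0 & 1500 & 600 \\ 0 & 0 & 1 \end{bmatrix}
\]

Then we can compute the pose of the world relative to the camera

\[
{}^{C}\xi_{W} = \left({}^{W}\xi_{V}\,{}^{V}\xi_{C}\right)^{-1}
= \begin{bmatrix} 0.5000 & -0.8660 & 0 & 7.8205 \\ -0.1504 & -0.0868 & -0.9848 & 10.4220 \\ 0.8529 & 0.4924 & -0.1736 & -24.5536 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

And we can use the perspective camera model $\tilde{\mathbf{u}} = K[R \;\; \mathbf{t}]\,\tilde{\mathbf{X}}$ directly to compute the homogeneous pixel

\[
\begin{bmatrix} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{bmatrix}
= \begin{bmatrix} 1500 & 0 & 800 \\ 0 & 1500 & 600 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0.5000 & -0.8660 & 0 & 7.8205 \\ -0.1504 & -0.0868 & -0.9848 & 10.4220 \\ 0.8529 & 0.4924 & -0.1736 & -24.5536 \end{bmatrix}
\begin{bmatrix} 75 \\ 60 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 45203.0620 \\ 32274.1449 \\ 68.9557 \end{bmatrix}
\]

Hence we get the pixel

\[
u = \frac{45203.0620}{68.9557} = 655.5374, \qquad v = \frac{32274.1449}{68.9557} = 468.0415
\]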
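The computation in this example can be reproduced with a few lines of NumPy. This is a sketch only: the variable names are chosen for illustration, and the pose matrices use the four-decimal values of the example, so the result agrees with the stated pixel only up to rounding.

```python
import numpy as np

# Pose of the camera relative to the vehicle (four-decimal values from the example).
T_vc = np.array([[ 0.0, -0.1736,  0.9848,  3.0],
                 [-1.0,  0.0,     0.0,    -2.0],
                 [ 0.0, -0.9848, -0.1736,  6.0],
                 [ 0.0,  0.0,     0.0,     1.0]])

# Pose of the vehicle relative to the world.
T_wv = np.array([[0.8660, -0.5000, 0.0, 15.0],
                 [0.5000,  0.8660, 0.0, 20.0],
                 [0.0,     0.0,    1.0,  0.0],
                 [0.0,     0.0,    0.0,  1.0]])

# Camera calibration matrix and the world point in homogeneous coordinates.
K = np.array([[1500.0, 0.0, 800.0],
              [0.0, 1500.0, 600.0],
              [0.0, 0.0, 1.0]])
X_w = np.array([75.0, 60.0, 0.0, 1.0])

T_cw = np.linalg.inv(T_wv @ T_vc)   # pose of the world relative to the camera
u_h = K @ T_cw[:3, :] @ X_w         # homogeneous pixel, K [R t] X
u, v = u_h[:2] / u_h[2]             # divide by the third component
```

With these inputs `(u, v)` comes out close to (655.5, 468.0), matching the pixel above.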

Reprojecting a pixel...

If we know the pose of a perspective camera relative to the world, how can we reproject a given pixel back into the world?

• We know that the ray of any pixel contains the projective center, given by the camera's position

\[
{}^{C}\mathbf{X}_0 = \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

• Using the inverse calibration matrix, we can determine another point on the ray

\[
{}^{C}\mathbf{X}_1 = \begin{bmatrix} X_1 \\ Y_1 \\ 1 \end{bmatrix} = K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\]

• These two points define the line of sight for the pixel, and we can use it to reproject the pixel onto any surface we like
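As an illustration of reprojection onto a surface, the sketch below intersects the line of sight of a pixel with a world plane. The function name `backproject_to_plane`, the plane parametrization n·X = d and the downward-looking example camera are assumptions made for this example.

```python
import numpy as np

def backproject_to_plane(u, v, K, T_wc, n, d):
    """Intersect the line of sight of pixel (u, v) with the world plane n . X = d.

    T_wc is the 4x4 pose of the camera relative to the world.
    """
    R, c = T_wc[:3, :3], T_wc[:3, 3]                     # c: projective center in world coordinates
    x1_cam = np.linalg.solve(K, np.array([u, v, 1.0]))   # second point on the ray (camera frame)
    direction = R @ x1_cam                               # ray direction in world coordinates
    s = (d - n @ c) / (n @ direction)                    # ray parameter where the ray meets the plane
    return c + s * direction

# Hypothetical camera 10 units above the ground plane Z = 0, looking straight down.
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 400.0],
              [0.0, 0.0, 1.0]])
T_wc = np.array([[1.0,  0.0,  0.0,  0.0],
                 [0.0, -1.0,  0.0,  0.0],
                 [0.0,  0.0, -1.0, 10.0],
                 [0.0,  0.0,  0.0,  1.0]])

X_ground = backproject_to_plane(600.0, 200.0, K, T_wc, n=np.array([0.0, 0.0, 1.0]), d=0.0)
```

For this camera the world point (1, 2, 0) projects to the pixel (600, 200), so the reprojection returns that point again. The sketch assumes the ray is not parallel to the plane.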



Image processing

• Point operators (pixel-to-pixel)
  – Adjustment of brightness, contrast and colour
  – Histogram equalization

• Image filtering in spatial domain
  – Mathematical operations on a local neighborhood
  – Linear filters (convolution, cross-correlation)
  – Non-linear filters
  – Image enhancement (smoothing, sharpening)
  – Feature extraction (edges, texture etc.)

• Image filtering in frequency domain
  – Modification of spatial image frequencies
  – Noise removal, (re)sampling, image compression
  – 2D Fourier transform

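Linear filtering (cross-correlation or convolution)

Input image (10 × 10):

0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 0 0 0 0 0
0 0 0 90 90 90 0 0 0 0
0 0 90 90 90 90 0 0 0 0
0 0 90 90 90 90 90 0 0 0
0 90 90 90 90 90 90 90 0 0
0 0 90 90 90 90 90 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

Kernel (3 × 3 box filter; the output values correspond to weights of 1/9 each, i.e. the local mean):

1 1 1
1 1 1
1 1 1

Output image (partially filled in as the kernel slides over the input):

0 0 0 0 0 0 0 0 0 0
0 0 10 20 20 10 0 0 0 0
0 0 20 ...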
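The filtering example above can be reproduced with a direct implementation of cross-correlation. This is a minimal sketch, assuming zero padding at the border and a box kernel with weights 1/9; the function name `cross_correlate` is chosen for illustration.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Cross-correlate a 2D image with a 2D kernel, zero-padding the border."""
    kh, kw = kernel.shape
    h, w = image.shape
    padded = np.pad(image.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((h, w))
    # Accumulate one shifted, weighted copy of the image per kernel element.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

image = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 90, 90, 0, 0, 0, 0, 0],
    [0, 0, 0, 90, 90, 90, 0, 0, 0, 0],
    [0, 0, 90, 90, 90, 90, 0, 0, 0, 0],
    [0, 0, 90, 90, 90, 90, 90, 0, 0, 0],
    [0, 90, 90, 90, 90, 90, 90, 90, 0, 0],
    [0, 0, 90, 90, 90, 90, 90, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
])
box = np.ones((3, 3)) / 9.0
g = cross_correlate(image, box)
```

The second output row `g[1]` is 0 0 10 20 20 10 0 0 0 0 (up to floating-point rounding), as in the example. For a symmetric kernel such as the box filter, cross-correlation and convolution give the same result.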

Filtering in frequency domain

Fourier (1807): Any univariate function can be rewritten as a weighted sum of sines and cosines of different frequencies (true with some subtle restrictions).

This leads to:
• Fourier Series
• Fourier Transform (continuous and discrete)
• Fast Fourier Transform (FFT)

[Portrait: Jean Baptiste Joseph Fourier (1768-1830)]

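Image processing – Image pyramids

• Downsampling (decimation)
• Upsampling (interpolation)
• Gaussian pyramids

[Figure: image pyramid illustration (credit: Cmglee)]

Gaussian Pyramid

[Figure: each level is produced from the previous one by a blur step followed by a downsample step, repeated three times]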

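Image processing – Laplacian pyramids

• Laplacian filter
• Laplacian pyramid
• Image blending

Laplacian pyramid

[Figure: each Laplacian level is formed as a difference (the minus signs in the diagram) between levels of the Gaussian pyramid]

Image blending with Laplacian pyramids

Weighted sum for each level of the pyramid

\[
L_{\text{blend}} = G \times L_1 + (1 - G) \times L_2
\]

• $L_{\text{blend}}$: Laplacian of the blend
• $L_1$: Laplacian of img 1
• $G$: Gaussian of mask
• $L_2$: Laplacian of img 2
• $1 - G$: Flipped mask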
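The level-wise weighted sum can be sketched in NumPy as follows. The 5-tap binomial blur, the pixel-replication upsampling and all function names are assumptions made for this example; library implementations of pyramids typically differ in these details.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable 5-tap binomial blur with edge replication."""
    h, w = img.shape
    p = np.pad(img, 2, mode="edge")
    rows = sum(KERNEL[k] * p[:, k:k + w] for k in range(5))     # horizontal pass
    return sum(KERNEL[k] * rows[k:k + h, :] for k in range(5))  # vertical pass

def upsample(img, shape):
    """Double the size by pixel replication, crop to shape, then blur."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)[:shape[0], :shape[1]]
    return blur(up)

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels):
        pyr.append(blur(pyr[-1])[::2, ::2])   # blur, then downsample
    return pyr

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    return [g[i] - upsample(g[i + 1], g[i].shape) for i in range(levels)] + [g[-1]]

def blend(img1, img2, mask, levels=3):
    """Blend per level with L = G * L1 + (1 - G) * L2, then collapse the pyramid."""
    l1, l2 = laplacian_pyramid(img1, levels), laplacian_pyramid(img2, levels)
    g = gaussian_pyramid(mask, levels)
    blended = [gm * a + (1 - gm) * b for gm, a, b in zip(g, l1, l2)]
    out = blended[-1]
    for lap in reversed(blended[:-1]):
        out = upsample(out, lap.shape) + lap
    return out

rng = np.random.default_rng(0)
img1, img2 = rng.random((32, 32)), rng.random((32, 32))
mask = np.zeros((32, 32))
mask[:, :16] = 1.0                    # left half from img1, right half from img2
result = blend(img1, img2, mask)
```

Because the same `upsample` is used when building and when collapsing the pyramid, a mask of all ones returns `img1` and a mask of all zeros returns `img2`.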


Feature detection – Line features

• Edge detectors
• Line detection with the Hough transform

Thinning and thresholding

• Detection of local maxima (i.e. suppression of non-maxima)
• Thresholding

The result is a binary image with isolated edges (single pixels at discrete locations along edge contours).

[Figure: edge enhanced image (Sobel) and edge image (Canny)]

Line detection - Hough transform

The set of all lines going through a given point corresponds to a sinusoidal curve in the parameter plane of the lines. Two or more points on a straight line will give rise to sinusoids that intersect at the point in the parameter plane that represents that line.

The Hough transform can be generalized to other shapes.
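The voting idea can be sketched as follows, assuming the common normal parametrization ρ = x cos θ + y sin θ with one-degree and one-pixel bins (the source does not state the parametrization, and the function name `hough_lines` is chosen for illustration).

```python
import numpy as np

def hough_lines(points, rho_max, n_theta=180):
    """Accumulate votes in (rho, theta) space, with rho = x cos(theta) + y sin(theta)."""
    thetas = np.deg2rad(np.arange(n_theta))              # 0, 1, ..., 179 degrees
    acc = np.zeros((2 * rho_max + 1, n_theta), dtype=int)
    for x, y in points:
        # Each point votes along one sinusoid in the parameter plane.
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + rho_max, np.arange(n_theta)] += 1
    return acc, thetas

# Hypothetical edge points on the horizontal line y = 5.
points = [(x, 5) for x in range(20)]
acc, thetas = hough_lines(points, rho_max=30)
votes_on_line = acc[5 + 30, 90]       # bin for rho = 5, theta = 90 degrees
```

All 20 sinusoids pass through the bin (ρ = 5, θ = 90°), so that bin collects 20 votes. Neighbouring bins can collect equally many votes because of the coarse quantization, which is one reason practical detectors apply non-maximum suppression to the accumulator.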

Example

[Figure: original image, its edge image (Canny), and the detected lines]

Feature detection – Local keypoint features

• Corner detectors
  – Stable in space
  – Min eigenvalue, Harris

• Blob detectors
  – Stable in scale and space
  – LoG, DoG

Characteristics of good features

• Repeatability
• Distinctiveness
• Efficiency
• Locality

Local measure of feature distinctiveness

• Consider a small window of pixels around a feature
• How does the window change when you shift it?
  – “Flat” region: No change in all directions
  – “Edge”: No change along edge
  – “Corner”: Change in all directions

Simplifying the measure even further

• Consider a horizontal “slice” of $E(u, v)$:

\[
E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}
\]

• This is the equation of an ellipse
  – Describe the surface using the eigenvalues of $M$

Corner detection summary

• Compute the gradient at each point in the image using derivatives of Gaussians
• Create the second moment matrix $M$ from the entries in the gradient
• Compute the eigenvalues
• Find points with large response ($\lambda_{\min}$ > threshold)
• Choose those points where $\lambda_{\min}$ is a local maximum as features
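The first steps of the summary can be sketched as below. This is a simplified illustration, not the lecture's implementation: it uses central differences (`np.gradient`) instead of derivatives of Gaussians and an unweighted box window for the second moment matrix, and the function names are assumptions.

```python
import numpy as np

def box_sum(a, r):
    """Sum of a over a (2r+1) x (2r+1) window, zero-padded at the border."""
    h, w = a.shape
    p = np.pad(a, r)
    return sum(p[i:i + h, j:j + w] for i in range(2 * r + 1) for j in range(2 * r + 1))

def min_eigenvalue_response(img, r=2):
    """Smallest eigenvalue of the second moment matrix M at every pixel."""
    Iy, Ix = np.gradient(img.astype(float))   # simple central differences
    a = box_sum(Ix * Ix, r)
    b = box_sum(Ix * Iy, r)
    c = box_sum(Iy * Iy, r)
    # Closed-form smallest eigenvalue of the symmetric matrix [[a, b], [b, c]].
    return (a + c) / 2 - np.sqrt(((a - c) / 2) ** 2 + b ** 2)

# Synthetic test image: a bright quadrant whose corner is at pixel (8, 8).
img = np.zeros((20, 20))
img[8:, 8:] = 1.0
lam = min_eigenvalue_response(img)
```

On this image the response is 1.25 at the corner pixel (8, 8), zero along the straight edge (for example at (14, 8)) and zero in the flat region, which mirrors the flat / edge / corner distinction. Thresholding and local-maximum selection would follow as the last two steps.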

Harris detector properties

• Scaling

LoG blob detector

• Convolve the image with scale-normalized LoG at several scales
• Find maxima of squared LoG response in scale-space
• Approximate with Difference of Gaussians (DoG)

Feature detection – Robust estimation with RANSAC

• RANSAC
  – A robust iterative method for estimating the parameters of a mathematical model from a set of observed data containing outliers
  – Separates the observed data into “inliers” and “outliers”
  – Very useful if we want to use better, but less robust, estimation methods

RANSAC

Objective
Robustly fit a model $\mathbf{y} = f(\mathbf{x}; \boldsymbol{\alpha})$ to a data set $S = \{\mathbf{x}_i\}$

Algorithm
1. Determine a test model $\mathbf{y} = f(\mathbf{x}; \boldsymbol{\alpha}_{tst})$ from $n$ random data points $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$

2. Check how well each individual data point in $S$ fits with the test model
   – Data points within a distance $t$ of the model constitute a set of inliers $S_{tst} \subseteq S$
   – Data points outside a distance $t$ of the model are outliers

3. If $S_{tst}$ is the largest set of inliers encountered so far, we keep this model
   – Set $\boldsymbol{\alpha} = \boldsymbol{\alpha}_{tst}$ and $S_{IN} = S_{tst}$

4. Repeat steps 1-3 until $N$ models have been tested

RANSAC

Comments
• Number of iterations required to achieve confidence $p$ when testing random models from $n$-tuples of data elements from a dataset with inlier probability $\omega$

\[
N = \frac{\log(1 - p)}{\log(1 - \omega^{n})}
\]

• Typical desired level of confidence

\[
p = 0.99
\]

• Inlier probability $\omega$ is typically unknown, but can be estimated per iteration

\[
\omega = \frac{\#\,\text{max estimated inliers}}{\#\,\text{data elements}}
\]

• Instead of operating with a fixed and larger than necessary $N$, we can update $N$ for each iteration

Adaptive RANSAC

Objective
Robustly fit a model $\mathbf{y} = f(\mathbf{x}; \boldsymbol{\alpha})$ to a data set $S = \{\mathbf{x}_i\}$

Algorithm
1. Let $N = \infty$, $S_{IN} = \emptyset$

2. As long as #iterations < $N$ repeat steps 3-5

3. Determine a test model $\mathbf{y} = f(\mathbf{x}; \boldsymbol{\alpha}_{tst})$ from $n$ random data points $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$

4. Determine how well the data points in $S$ fit with the test model
   – Data points within a distance $t$ of the model constitute a set of inliers $S_{tst} \subseteq S$

5. If $|S_{tst}| > |S_{IN}|$, update the model parameters $\boldsymbol{\alpha}$, the inlier set $S_{IN}$ and the max number of iterations $N$
   – Set $\boldsymbol{\alpha} = \boldsymbol{\alpha}_{tst}$ and $S_{IN} = S_{tst}$
   – Compute $N = \dfrac{\log(1 - p)}{\log(1 - \omega^{n})}$ using that $\omega = \dfrac{|S_{IN}|}{|S|}$ and $p = 0.99$
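The adaptive algorithm can be sketched for the simple case of fitting a 2D line, where the minimal sample size is n = 2. The line model, the function names, the iteration cap `max_iter` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through two distinct points, with (a, b) of unit length."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = np.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])

def adaptive_ransac_line(points, t, p=0.99, max_iter=10000, seed=0):
    n = 2                                              # minimal sample size for a line
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.zeros(len(points), dtype=bool)
    N, iterations = np.inf, 0
    while iterations < N and iterations < max_iter:
        i, j = rng.choice(len(points), size=n, replace=False)
        a, b, c = fit_line(points[i], points[j])       # test model
        inliers = np.abs(points @ np.array([a, b]) + c) < t
        if inliers.sum() > best_inliers.sum():         # largest inlier set so far
            best_model, best_inliers = (a, b, c), inliers
            w = inliers.mean()                         # estimated inlier probability
            N = 0 if w >= 1.0 else np.log(1 - p) / np.log(1 - w ** n)
        iterations += 1
    return best_model, best_inliers

# Synthetic data: 70 points near y = 2x + 1 and 30 uniformly distributed outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 70)
inlier_pts = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.01, 70)])
outlier_pts = rng.uniform([0, -10], [10, 30], size=(30, 2))
points = np.vstack([inlier_pts, outlier_pts])

model, inliers = adaptive_ransac_line(points, t=0.05)
# Refine with least squares on the inliers only (a better, but less robust, estimator).
slope, intercept = np.polyfit(points[inliers, 0], points[inliers, 1], 1)
```

The final least-squares step illustrates why RANSAC is useful in front of a less robust estimator: once the outliers are removed, an ordinary fit on the inlier set recovers a slope near 2 and an intercept near 1.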







Feature matching – Feature descriptors and matching

• Matching keypoints
  – Comparing local patches in canonical scale and orientation

• Feature descriptors
  – Robust, distinctive and efficient

• Descriptor types
  – HoG descriptors
  – Binary descriptors

• Putative matching
  – Closest match, distance ratio, cross check

Feature matching – From keypoints to feature correspondences

1. Detect a set of distinct feature points
2. Define a patch around each point
3. Extract and normalize the patch
4. Compute a local descriptor
5. Match local descriptors

[Figure: patch at detected position, scale and orientation]

SIFT descriptor

• Extract a 16x16 patch around the detected keypoint
• Compute the gradients and apply a Gaussian weighting function
• Divide the window into a 4x4 grid of cells
• Compute gradient direction histograms over 8 directions in each cell
• Concatenate the histograms to obtain a 128-dimensional feature vector (4 × 4 cells × 8 directions)
• Normalize to unit length

Binary descriptors

• Extremely efficient construction and comparison

• Based on pairwise intensity comparisons
  – Sampling pattern around keypoint
  – Set of sampling pairs
  – Feature descriptor vector is a binary string:

\[
F = \sum_{0 \le a \le N} 2^{a}\, T(P_a), \qquad
T(P_a) = \begin{cases} 1 & \text{if } I(P_a^{r_1}) > I(P_a^{r_2}) \\ 0 & \text{otherwise} \end{cases}
\]

• Matching using Hamming distance:

\[
L = \sum_{0 \le a \le N} \mathrm{XOR}\!\left(F_a^{1}, F_a^{2}\right)
\]

[Figure: BRISK sampling pattern and BRISK sampling pairs]
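The two formulas for binary descriptors can be illustrated with a toy descriptor. This is a sketch only: the sampling pairs are drawn at random here, whereas BRISK uses a fixed, designed sampling pattern, and the function names are assumptions.

```python
import numpy as np

def binary_descriptor(patch, pairs):
    """One bit per sampling pair: 1 if I(P_r1) > I(P_r2), else 0."""
    r1, c1 = pairs[:, 0].T    # first point of every pair
    r2, c2 = pairs[:, 1].T    # second point of every pair
    return patch[r1, c1] > patch[r2, c2]

def hamming(f1, f2):
    """Hamming distance: the number of bits in which two descriptors differ (sum of XOR)."""
    return int(np.count_nonzero(f1 ^ f2))

rng = np.random.default_rng(0)
patch = rng.integers(0, 200, size=(16, 16))
pairs = rng.integers(0, 16, size=(256, 2, 2))       # 256 hypothetical sampling pairs (row, col)

d_original = binary_descriptor(patch, pairs)
d_brighter = binary_descriptor(patch + 10, pairs)   # same patch, uniformly brighter
distance = hamming(d_original, d_brighter)
```

The distance between the original and the brightened patch is 0, because pairwise intensity comparisons are unaffected by a constant brightness offset.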

Feature matching – Estimating homographies from feature correspondences

• Homography $H\tilde{\mathbf{u}} = \tilde{\mathbf{u}}'$

\[
H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}
\]

• Automatic point-correspondences

• Wrong correspondences are common

• RANSAC estimation
  – Basic DLT (Direct Linear Transform) on 4 random correspondences
  – Inliers determined based on the reprojection error $\epsilon_i = d(H\mathbf{u}_i, \mathbf{u}'_i) + d(\mathbf{u}_i, H^{-1}\mathbf{u}'_i)$

• Improve estimate by normalized DLT on inliers or iterative methods for an even better estimate

Homographies induced by central projection

• Point-correspondences can be determined automatically
• Erroneous correspondences are common
• Robust estimation is required to find $H$

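Direct Linear Transform (DLT)

• Solve the equation $H\tilde{\mathbf{u}} = \tilde{\mathbf{u}}'$ for the entries of the homography matrix

\[
\begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix}
\]

\[
\Leftrightarrow\quad
\begin{aligned}
u h_1 + v h_2 + h_3 &= u' \\
u h_4 + v h_5 + h_6 &= v' \\
u h_7 + v h_8 + h_9 &= 1
\end{aligned}
\]

\[
\Leftrightarrow\quad
\begin{bmatrix}
0 & 0 & 0 & -u & -v & -1 & v'u & v'v & v' \\
u & v & 1 & 0 & 0 & 0 & -u'u & -u'v & -u' \\
-v'u & -v'v & -v' & u'u & u'v & u' & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

\[
\Leftrightarrow\quad A\mathbf{h} = \mathbf{0}
\]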

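Derivation of the rows of $A$, starting from

\[
\begin{aligned}
u h_1 + v h_2 + h_3 &= u' \\
u h_4 + v h_5 + h_6 &= v' \\
u h_7 + v h_8 + h_9 &= 1
\end{aligned}
\]

Combining the first and second equation:

\[
\begin{aligned}
&(u h_1 + v h_2 + h_3)\cdot v' = u' \cdot v', \qquad (u h_4 + v h_5 + h_6)\cdot u' = u' \cdot v' \\
&\Rightarrow (u h_1 + v h_2 + h_3)\, v' = (u h_4 + v h_5 + h_6)\, u' \\
&\Rightarrow u v' h_1 + v v' h_2 + v' h_3 = u u' h_4 + v u' h_5 + u' h_6 \\
&\Rightarrow u v' h_1 + v v' h_2 + v' h_3 - u u' h_4 - v u' h_5 - u' h_6 = 0
\end{aligned}
\]

Combining the first and third equation:

\[
\begin{aligned}
&u h_1 + v h_2 + h_3 = u', \qquad (u h_7 + v h_8 + h_9)\cdot u' = 1 \cdot u' \\
&\Rightarrow u h_1 + v h_2 + h_3 = u u' h_7 + v u' h_8 + u' h_9 \\
&\Rightarrow u h_1 + v h_2 + h_3 - u u' h_7 - v u' h_8 - u' h_9 = 0
\end{aligned}
\]

Combining the second and third equation:

\[
\begin{aligned}
&u h_4 + v h_5 + h_6 = v', \qquad (u h_7 + v h_8 + h_9)\cdot v' = 1 \cdot v' \\
&\Rightarrow u h_4 + v h_5 + h_6 = u v' h_7 + v v' h_8 + v' h_9 \\
&\Rightarrow u h_4 + v h_5 + h_6 - u v' h_7 - v v' h_8 - v' h_9 = 0
\end{aligned}
\]

Up to the sign of each row, these three equations are the rows of

\[
\begin{bmatrix}
0 & 0 & 0 & -u & -v & -1 & v'u & v'v & v' \\
u & v & 1 & 0 & 0 & 0 & -u'u & -u'v & -u' \\
-v'u & -v'v & -v' & u'u & u'v & u' & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]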
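A basic (unnormalized) DLT can be sketched directly from the rows of $A$. The sketch below stacks the first two rows per correspondence, since the third row is a linear combination of them, and takes the right singular vector of the smallest singular value as $\mathbf{h}$. The function name and the test homography are assumptions made for this example.

```python
import numpy as np

def dlt_homography(pts, pts_prime):
    """Basic DLT: stack two rows of A per correspondence and solve A h = 0 by SVD."""
    rows = []
    for (u, v), (up, vp) in zip(pts, pts_prime):
        rows.append([0, 0, 0, -u, -v, -1, vp * u, vp * v, vp])
        rows.append([u, v, 1, 0, 0, 0, -up * u, -up * v, -up])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)       # null vector of A, reshaped row by row

# Hypothetical ground-truth homography and four correspondences.
H_true = np.array([[1.0, 0.2, 3.0],
                   [0.1, 1.5, -2.0],
                   [0.001, 0.002, 1.0]])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
pts_h = np.column_stack([pts, np.ones(4)]) @ H_true.T
pts_prime = pts_h[:, :2] / pts_h[:, 2:]

H = dlt_homography(pts, pts_prime)
H = H / H[2, 2]                       # remove the arbitrary scale (and sign)
```

With exact correspondences the estimate matches `H_true` after the scale is fixed. With noisy data, the normalized DLT mentioned earlier is generally preferable.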







World geometry from correspondences

Problem | Structure (scene geometry) | Motion (camera geometry) | Measurements
Pose estimation | Known | Estimate | 3D to 2D correspondences
Triangulation, Stereo | Estimate | Known | 2D to 2D correspondences
Reconstruction, Structure from Motion | Estimate | Estimate | 2D to 2D correspondences

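Camera calibration

• Camera calibration is a process where we estimate the intrinsic parameters $f_u$, $f_v$, $s$, $c_u$, $c_v$ and distortion parameters for a camera

• Zhang’s method requires that the calibration object is planar. This ensures that the 3D-2D relationship is described by a homography. Each such homography comes with 2 constraints on the intrinsic parameters, since its first two columns stem from a rotation matrix

\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{r}_3 & \mathbf{t} \end{bmatrix}
\begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix}
= K \begin{bmatrix} \mathbf{r}_1 & \mathbf{r}_2 & \mathbf{t} \end{bmatrix}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
= H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
\]

• The 2 constraints allow us to estimate the intrinsic parameters using linear methods. Non-linear estimation methods can then be used to refine the intrinsic parameters and estimate the distortion parameters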

Distortion models

• A distortion model describes the relationship between undistorted coordinates $\mathbf{x}$ and distorted coordinates $\mathbf{x}'$ of the normalized image plane

• This example model describes both radial distortion and tangential distortion

\[
\begin{aligned}
x &= x'\left(1 + k_1 r^2 + k_2 r^4\right) + 2 p_1 x' y' + p_2\left(r^2 + 2x'^2\right) \\
y &= y'\left(1 + k_1 r^2 + k_2 r^4\right) + p_1\left(r^2 + 2y'^2\right) + 2 p_2 x' y'
\end{aligned}
\qquad\text{where } r^2 = x'^2 + y'^2
\]

Undistortion

[Figure: original image, where $\tilde{\mathbf{u}} \neq K[R \;\; \mathbf{t}]\tilde{\mathbf{X}}$, and undistorted image, where $\tilde{\mathbf{u}} = K[R \;\; \mathbf{t}]\tilde{\mathbf{X}}$]

• We can use the distortion model to warp the original image into the so-called undistorted image

• The undistorted image satisfies the perspective camera model and is thus well suited for geometrical computations

• Since the distortion model depends on $K$ for the undistorted camera, it is customary to estimate both in a common calibration process

Images: http://www.robots.ox.ac.uk/~vgg/hzbook/

Single-view geometry – Pose from known 3D points

• Pose estimation relative to a world plane
• PnP

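Pose estimation relative to a world plane

• We can map points on the world plane into image coordinates by using the perspective camera model

\[
\begin{aligned}
\tilde{\mathbf{u}} &= K[R \mid \mathbf{t}]\,\tilde{\mathbf{X}}_{\pi} \\
&= K[\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \mathbf{t}]\,[x, y, 0, 1]^{T} \\
&= K[\mathbf{r}_1, \mathbf{r}_2, \mathbf{t}]\,[x, y, 1]^{T} \\
&= {}^{C}H_{W}\,\tilde{\mathbf{x}}_{\pi}
\end{aligned}
\qquad\text{where}\quad
{}^{C}T_{W} = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix}
\]

[Figure: camera $C$ with calibration matrix $K$ observing a point $\mathbf{X}_{\pi}$ on the world plane $\pi$ (world frame $W$ with axes $x$, $y$, $z$); the homography ${}^{C}H_{W}$ maps plane coordinates $\mathbf{x}$ to pixels $\mathbf{u}$]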

n-Point Pose Problem (PnP)

• Several different methods available
  – Typically fast non-iterative methods
  – Minimal in number of points
  – Accuracy comparable to iterative methods

• Examples
  – P3P, EPnP: Estimate pose, known $K$
  – P4Pf: Estimate pose and focal length
  – P6P: Estimates $P$ with DLT
  – R6P: Estimate pose with rolling shutter

Single-view geometry – Scene geometry from a single image

• Vanishing points
  – Perspective projection of infinitely long lines (non-parallel to the image plane)
  – Parallel lines have the same vanishing point

• Vanishing lines
  – Perspective projection of planes (non-parallel to the image plane)
  – Parallel planes have the same vanishing line

Vertical structures on a horizontal plane

[Figure panels: straight and level camera; translated downwards; translated upwards; translated rightwards; tilted upwards; tilted downwards]







Stereo imaging – Basic epipolar geometry

• Epipolar geometry
  – Epipolar planes
  – Epipolar lines
  – Epipoles

Stereo imaging – Stereo imaging

• Stereo imaging
  – Horizontal epipolar lines
  – Disparity
  – 3D from disparity
  – Stereo rectification

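Stereo geometry

• Parallel identical cameras
  – Translated along the x-axis by the baseline $b_x$

• Horizontal epipolar lines
  – Corresponding points lie along the same row in the two images

• Depth from disparity
  – The depth $Z$ of a point ${}^{L}\mathbf{P} = (X, Y, Z)$ follows from the baseline $b_x$ and the disparity $d$

• 3D from disparity

\[
X = \frac{b_x}{d}\, x_L, \qquad Y = \frac{b_x}{d}\, y_L
\]

In normalized image coordinates these are consistent with $Z = b_x / d$, since $X = x_L Z$ and $Y = y_L Z$.

[Figure: left camera frame with axes $x_L$, $y_L$, $z_L$, the baseline $b_x$, the point ${}^{L}\mathbf{P} = (X, Y, Z)$, its depth $Z$ and the disparity $d$]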
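A small numeric check of 3D from disparity, as a sketch under stated assumptions: image coordinates are normalized (focal length 1), the right camera is the left camera translated by $b_x$ along the x-axis, and the disparity is defined as $d = x_L - x_R$. The function name is chosen for illustration.

```python
def point_from_disparity(x_l, y_l, x_r, b_x):
    """3D point in the left camera frame from a correspondence on the same row."""
    d = x_l - x_r            # disparity in normalized image coordinates
    Z = b_x / d              # depth from disparity
    return (b_x / d * x_l, b_x / d * y_l, Z)

# Hypothetical point (1, 2, 10) seen by two cameras with baseline 0.5.
X_true, Y_true, Z_true, b_x = 1.0, 2.0, 10.0, 0.5
x_l, y_l = X_true / Z_true, Y_true / Z_true
x_r = (X_true - b_x) / Z_true            # the right camera sees the point shifted by b_x

X, Y, Z = point_from_disparity(x_l, y_l, x_r, b_x)
```

The disparity here is 0.05, which gives back the depth 10 and the point (1, 2, 10). With pixel coordinates instead of normalized coordinates, the focal length enters as an extra factor in the depth.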

Stereo rectification

• Reproject image planes onto a common plane parallel to the line between the camera centers

• The epipolar lines are horizontal after this transformation

• Two homographies

• C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.

[Figure: the two camera frames, with axes $x_{C1}$, $y_{C1}$, $z_{C1}$ and $x_{C2}$, $y_{C2}$, $z_{C2}$]

Stereo imaging – Stereo processing

• Stereo processing
  – Sparse vs dense matching
  – DSI
  – Typical failures
  – Removing failures vs smoothness

Stereo processing

• Sparse stereo
  – Extract keypoints
  – Match keypoints along the same row
  – Compute 3D from disparity

• Dense stereo
  – Try to match all pixels along rows
  – Compute disparity image by finding the best disparity for each pixel
  – Refine and clean disparity image
  – Compute dense 3D point cloud or surface from disparity

[Figure: sparse stereo result with numeric labels at the matched keypoints]

[Figure: plot of similarity as a function of disparity (pixels)]

Dense stereo matching

• For a patch in the left image
  – Compare with patches along the same row in the right image
  – Select patch with highest score

• Repeat for all pixels in the left image
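The matching loop can be sketched as a brute-force block matcher. Note the swapped-in score: this example uses the sum of absolute differences (lowest cost wins) rather than a similarity score (highest wins), purely to keep the code short; the function name, window size and synthetic images are also assumptions.

```python
import numpy as np

def disparity_sad(left, right, max_disp, r=2):
    """Brute-force dense matching along rows using the sum of absolute differences."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1]
            # Compare with the patches at x - d on the same row of the right image.
            costs = [np.abs(patch - right[y - r:y + r + 1, x - d - r:x - d + r + 1]).sum()
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic rectified pair: the right image is the left image shifted by 3 pixels.
rng = np.random.default_rng(0)
left = rng.random((12, 40))
right = np.roll(left, -3, axis=1)
disp = disparity_sad(left, right, max_disp=8)
```

In the region where all candidates are evaluated, every pixel gets the disparity 3. Real images need the refinement and cleaning steps listed earlier, since textureless or occluded regions give ambiguous scores.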







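Two-view geometry – Epipolar geometry

• Algebraic representation of epipolar geometry
  – The essential matrix
  – The fundamental matrix

• Estimating the epipolar geometry
  – Estimate $F$: 7pt, 8pt, RANSAC
  – Estimate $E$: 5pt

[Figure: two cameras $C$ and $C'$ observing the point $\mathbf{X}$; the pixels $\mathbf{u}$, $\mathbf{u}'$ are related by $F$, the normalized image points $\mathbf{x}$, $\mathbf{x}'$ by $E$, and the two levels are connected through $K$ and $K'$]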

Representations of epipolar geometry

• Observing the same points in two views puts a strong geometrical constraint on the cameras

• Algebraically this epipolar constraint is usually represented by two related 3 × 3 matrices

• The fundamental matrix $F$

\[
\tilde{\mathbf{u}}'^{T} F\, \tilde{\mathbf{u}} = 0
\]

• The essential matrix $E$

\[
\tilde{\mathbf{x}}'^{T} E\, \tilde{\mathbf{x}} = 0
\]

• These are related through the two calibration matrices $K$ and $K'$

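The essential matrix E

• Let ${}^{C}\mathbf{x} \leftrightarrow {}^{C'}\mathbf{x}'$ be corresponding points in the normalized image planes and let the pose of $C$ relative to $C'$ be

\[
{}^{C'}\xi_{C} = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix}
\]

• In terms of vectors, the equation for the epipolar plane can be written like

\[
\left({}^{C'}\tilde{\mathbf{x}}' \times \mathbf{t}\right) \cdot R\,{}^{C}\tilde{\mathbf{x}} = 0
\]

• Rewritten in terms of matrices this takes the form

\[
{}^{C'}\tilde{\mathbf{x}}'^{T}\, [\mathbf{t}]_{\times} R\; {}^{C}\tilde{\mathbf{x}} = 0
\]

• This relationship defines the essential matrix $E = [\mathbf{t}]_{\times} R$

\[
\tilde{\mathbf{x}}'^{T} E\, \tilde{\mathbf{x}} = 0
\]

[Figure: cameras $C$ and $C'$ separated by $\mathbf{t}$, with the corresponding normalized image points ${}^{C}\tilde{\mathbf{x}}$ and ${}^{C'}\tilde{\mathbf{x}}'$]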

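The fundamental matrix F

• The epipolar constraint on image points is naturally connected to the essential matrix by the calibration matrices $K$ and $K'$

\[
\begin{aligned}
\tilde{\mathbf{u}} = K\,{}^{C}\tilde{\mathbf{x}} \;&\Rightarrow\; {}^{C}\tilde{\mathbf{x}} = K^{-1}\tilde{\mathbf{u}} \\
\tilde{\mathbf{u}}' = K'\,{}^{C'}\tilde{\mathbf{x}}' \;&\Rightarrow\; {}^{C'}\tilde{\mathbf{x}}' = K'^{-1}\tilde{\mathbf{u}}' \;\Rightarrow\; {}^{C'}\tilde{\mathbf{x}}'^{T} = \tilde{\mathbf{u}}'^{T} K'^{-T}
\end{aligned}
\]

• Combined with the epipolar constraint for normalized image points we get

\[
{}^{C'}\tilde{\mathbf{x}}'^{T} E\; {}^{C}\tilde{\mathbf{x}} = 0
\quad\Rightarrow\quad
\tilde{\mathbf{u}}'^{T} K'^{-T} E\, K^{-1} \tilde{\mathbf{u}} = 0
\]

• This defines the fundamental matrix $F = K'^{-T} E K^{-1}$

\[
\tilde{\mathbf{u}}'^{T} F\, \tilde{\mathbf{u}} = 0
\]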
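Both epipolar constraints can be verified numerically. In this sketch the relative pose, the scene point and the use of the same calibration matrix for both cameras are hypothetical values chosen for illustration; the convention is that a point in frame $C$ maps to frame $C'$ as $\mathbf{X}' = R\mathbf{X} + \mathbf{t}$.

```python
import numpy as np

def skew(t):
    """Matrix form of the cross product: skew(t) @ x equals np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical pose of C relative to C'.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.1])
E = skew(t) @ R                                  # essential matrix

X = np.array([0.5, -0.3, 4.0])                   # scene point in frame C
X_prime = R @ X + t                              # the same point in frame C'
x, x_prime = X / X[2], X_prime / X_prime[2]      # normalized image points
residual_E = x_prime @ E @ x

K = np.array([[1500.0, 0.0, 800.0],
              [0.0, 1500.0, 600.0],
              [0.0, 0.0, 1.0]])
K_prime = K.copy()                               # assumption: identical cameras
F = np.linalg.inv(K_prime).T @ E @ np.linalg.inv(K)   # fundamental matrix
u, u_prime = K @ x, K_prime @ x_prime            # homogeneous pixels
residual_F = u_prime @ F @ u
```

Both residuals are zero up to floating-point rounding, as the constraints require for a true correspondence.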

Two-view geometry – Triangulation

• Triangulation
  – Estimate a 3D point $\mathbf{X}_i$ for a noisy 2D correspondence under the assumption that camera matrices $P$ and $P'$ are known

• Minimal 3D error
  – Choose $\mathbf{X}_i$ to be the mid-point between back projected image points

• Minimal algebraic error
  – Combine the two perspective models to get a homogeneous system of linear equations, then determine $\mathbf{X}_i$ by SVD

• Minimal reprojection error
  – Determine the epipolar plane (and points $\tilde{\mathbf{u}}_i$ and $\tilde{\mathbf{u}}'_i$) that minimize the reprojection error by minimizing a 6th order polynomial
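The minimal algebraic error variant is the easiest of the three to sketch. Each view contributes two linear equations in the homogeneous point, and the solution is the right singular vector of the smallest singular value. The function name and the camera setup are assumptions made for this example.

```python
import numpy as np

def triangulate_linear(P, P_prime, u, u_prime):
    """Linear triangulation: stack u * p3 - p1 and v * p3 - p2 for both views, solve by SVD."""
    A = np.array([
        u[0] * P[2] - P[0],
        u[1] * P[2] - P[1],
        u_prime[0] * P_prime[2] - P_prime[0],
        u_prime[1] * P_prime[2] - P_prime[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                       # homogeneous 3D point
    return X_h[:3] / X_h[3]

# Hypothetical cameras: P = K [I | 0] and P' = K [R | t].
K = np.array([[1500.0, 0.0, 800.0],
              [0.0, 1500.0, 600.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([[1.0], [0.2], [0.1]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_prime = K @ np.hstack([R, t])

X_true = np.array([0.5, -0.3, 4.0])
u_h, u_prime_h = P @ np.append(X_true, 1.0), P_prime @ np.append(X_true, 1.0)
u, u_prime = u_h[:2] / u_h[2], u_prime_h[:2] / u_prime_h[2]

X_est = triangulate_linear(P, P_prime, u, u_prime)
```

With noise-free pixels the estimate equals the true point. With noisy pixels the result minimizes an algebraic error, which is why the reprojection-error variant is listed as a separate method.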


Two-view geometry – Pose from epipolar geometry

• Pose from epipolar geometry

• Non-planar case
  – Estimate epipolar geometry
  – Estimate relative pose from $E$

• Planar case
  – Estimate homography
  – Estimate relative pose from $H$

• Visual odometry

[Figure: camera poses cam$_0$, cam$_{k-1}$, cam$_k$, cam$_{k+1}$ with the poses ${}^{0}\xi_{k}$ and ${}^{k}\xi_{k+1}$]

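Pose from epipolar geometry

• Since we can only estimate $E$ up to scale, we can always rescale it so that the SVD of $E$ has the form

\[
E = U D V^{T}
= \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{v}_1^{T} \\ \mathbf{v}_2^{T} \\ \mathbf{v}_3^{T} \end{bmatrix}
\]

where $\det(U) = \det(V) = 1$

• Then one can show that

\[
R \in \left\{ U W V^{T},\; U W^{T} V^{T} \right\}, \qquad
\mathbf{t} = \pm\lambda \mathbf{u}_3; \quad \lambda \in \mathbb{R} \setminus \{0\}
\]

where

\[
W = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]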
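The decomposition can be sketched as follows. The sign flips enforce $\det(U) = \det(V) = 1$, which is allowed because $E$ is only defined up to scale. The function names and the test pose are assumptions made for this example; choosing among the four candidates (for instance by requiring triangulated points to lie in front of both cameras) is not shown.

```python
import numpy as np

W = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates, with t of unit length."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:      # enforce det(U) = 1
        U = -U
    if np.linalg.det(Vt) < 0:     # enforce det(V) = 1
        Vt = -Vt
    rotations = [U @ W @ Vt, U @ W.T @ Vt]
    u3 = U[:, 2]
    return [(R, s * u3) for R in rotations for s in (1.0, -1.0)]

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical ground-truth relative pose used to build E.
angle = np.deg2rad(10.0)
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([1.0, 0.2, 0.1])
candidates = decompose_essential(skew(t_true) @ R_true)

t_unit = t_true / np.linalg.norm(t_true)
found = any(np.allclose(R, R_true, atol=1e-8) and np.allclose(t, t_unit, atol=1e-8)
            for R, t in candidates)
```

One of the four candidates reproduces the true rotation and the true translation direction; the length of the translation cannot be recovered from $E$.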

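Visual odometry

• Based on what we now know it is possible to do visual odometry, i.e. estimating the motion of a single camera from captured images

• A visual odometry algorithm can look like this

Visual odometry from 2D-correspondences
1. Capture new frame img$_{k+1}$
2. Extract and match features between img$_{k+1}$ and img$_k$
3. Estimate the essential matrix $E_{k,k+1}$
4. Decompose $E_{k,k+1}$ into ${}^{k}R_{k+1}$ and ${}^{k}\mathbf{t}_{k+1}$ to get the relative pose ${}^{k}\xi_{k+1} = \begin{bmatrix} {}^{k}R_{k+1} & {}^{k}\mathbf{t}_{k+1} \end{bmatrix}$
5. Compute $\|{}^{k}\mathbf{t}_{k+1}\|$ from $\|{}^{k-1}\mathbf{t}_{k}\|$ and rescale ${}^{k}\mathbf{t}_{k+1}$ accordingly
6. Calculate the pose of camera $k + 1$ relative to the first camera

\[
{}^{0}\xi_{k+1} = {}^{0}\xi_{k}\; {}^{k}\xi_{k+1}
\]

How to compute $\|{}^{k}\mathbf{t}_{k+1}\|$ from $\|{}^{k-1}\mathbf{t}_{k}\|$?
  – Determine two scene points ${}^{k}\mathbf{X}_{k-1,k}$ and ${}^{k}\mathbf{X}'_{k-1,k}$ by triangulation of two 2D-correspondences $\mathbf{x}_{k-1} \leftrightarrow \mathbf{x}_{k}$ and $\mathbf{x}'_{k-1} \leftrightarrow \mathbf{x}'_{k}$
  – Determine the same two scene points ${}^{k}\mathbf{X}_{k,k+1}$ and ${}^{k}\mathbf{X}'_{k,k+1}$ by triangulation of two 2D-correspondences $\mathbf{x}_{k} \leftrightarrow \mathbf{x}_{k+1}$ and $\mathbf{x}'_{k} \leftrightarrow \mathbf{x}'_{k+1}$
  – Then

\[
\frac{\left\| {}^{k-1}\mathbf{t}_{k} \right\|}{\left\| {}^{k}\mathbf{t}_{k+1} \right\|}
= \frac{\left\| {}^{k}\mathbf{X}_{k-1,k} - {}^{k}\mathbf{X}'_{k-1,k} \right\|}{\left\| {}^{k}\mathbf{X}_{k,k+1} - {}^{k}\mathbf{X}'_{k,k+1} \right\|}
\]

[Figure: camera poses cam$_0$, cam$_{k-1}$, cam$_k$, cam$_{k+1}$ with the poses ${}^{0}\xi_{k}$ and ${}^{k}\xi_{k+1}$]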







Multiple-view geometry – Multiple-view geometry

• Multiple-view geometry
• Correspondences
  – Two-view vs Three-view
  – Fundamental matrix vs Trifocal tensor

Correspondences

Three views
• As we just saw, point transfer can be done directly from the epipolar constraints

\[
\tilde{\mathbf{u}}_3 = F_{31}\tilde{\mathbf{u}}_1 \times F_{32}\tilde{\mathbf{u}}_2
\]

• However, this fails for points in the plane defined by the three camera centers (the trifocal plane), since the epipolar lines then will coincide

• The trifocal tensor allows point transfer also for points in the trifocal plane

[Figure: img1, img2 and img3 with the points $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$, the fundamental matrices $F_{31}$, $F_{32}$, $F_{12}$ and the epipolar lines $F_{31}\tilde{\mathbf{u}}_1$ and $F_{32}\tilde{\mathbf{u}}_2$ intersecting in $\tilde{\mathbf{u}}_3$]

Example: Point transfer based on epipolar constraints

• Uncertainty in feature points transfers to uncertainty in the epipolar lines
• Hence the reliability of the predicted point depends on the angle between the epipolar lines
• A large angle is good!

Multiple-view geometry – Structure from motion

• Structure from motion
  – Sequential SfM
  – Bundle adjustment

\[
\epsilon = \sum_{i=1}^{m} \sum_{j=1}^{n} d\!\left(\tilde{\mathbf{u}}_{ij},\, P_i \tilde{\mathbf{X}}_j\right)^{2}
\]

Structure from Motion

Problem
Given $m$ images of $n$ fixed 3D points, estimate the $m$ projection matrices $P_i$ and the $n$ points $\mathbf{X}_j$ from the $m \cdot n$ correspondences $\mathbf{u}_{ij} \leftrightarrow \mathbf{u}_{kj}$, where

\[
\tilde{\mathbf{u}}_{ij} = P_i \tilde{\mathbf{X}}_j
\]

• We can solve for structure and motion when $2mn \geq 11m + 3n - 15$

• In the general/uncalibrated case, cameras and points can only be recovered up to a projective ambiguity ($\tilde{\mathbf{u}}_{ij} = P_i Q^{-1} Q \tilde{\mathbf{X}}_j$)

• In the calibrated case, they can be recovered up to a similarity (scale)
  – Known as Euclidean/metric reconstruction

Multiple-view geometry – Multiple-view stereo

• Multi-view stereo
  – Plane-sweep
  – Volumetric stereo
  – Surface expansion

• Surface reconstruction

Plane sweep

• Sweep planes at different depths

\[
\mathbf{u} = K_{ref}\,[I \mid \mathbf{0}]\,\mathbf{X}, \qquad
\mathbf{u}' = K_k\,[R_k \mid \mathbf{t}_k]\,\mathbf{X}
\]

[Figure sequence: a reference camera ($K_{ref}$, $P_{ref}$) and a camera $k$ ($K_k$, $P_k$) observing the point $\mathbf{X}$; for each sweep plane with normal $\mathbf{n}_m$ and distance $d_m$ (shown for $m - 1$, $m$ and $m + 1$) a homography $H$ relates the two images]

References:
• Robert Collins, A Space-Sweep Approach to True Multi-Image Matching, CVPR 1996. http://www.ri.cmu.edu/pub_files/pub1/collins_robert_1996_1/collins_robert_1996_1.pdf
• D. Gallup, J.-M. Frahm, P. Mordohai, Q. Yang and M. Pollefeys, Real-Time Plane-Sweeping Stereo with Multiple Sweeping Directions, CVPR 2007. https://www.inf.ethz.ch/personal/pomarc/pubs/GallupCVPR07.pdf

Plane sweep and ambiguities

• Multiple views can resolve ambiguities in difficult areas!

Plane sweep through oriented planes

• Fronto-parallel

\[
\mathbf{n}_m = \begin{bmatrix} 0 & 0 & -1 \end{bmatrix}^{T}, \qquad Z_m(u, v) = d_m
\]

• Other plane orientations

\[
Z_m(u, v) = \frac{-d_m}{\begin{bmatrix} u & v & 1 \end{bmatrix} K_{ref}^{-T}\, \mathbf{n}_m}
\]

Plane sweep with ground normal

[Figure sequence: plane sweep with the ground normal, shown as colour composites with a red / green / blue legend; annotations: $d_m$ = 200 meter below reference camera, $Z_m$ = 790 meter]






Image analysis – Image segmentation

• Thresholding techniques
• Clustering methods for segmentation
• Morphological operations

Segmentation methods

• Active contours (Snakes, Scissors, Level Sets)
• Split and merge (Watershed, Divisive & agglomerative clustering, Graph-based segmentation)
• Gray level thresholding
• K-means (parametric clustering)
• Mean shift (non-parametric clustering)
• Normalized cuts
• Graph cuts
• Supervised color based segmentation (region growing)

Image analysis – Image feature extraction

• Feature extraction
• Feature selection

Feature extraction

The goal is to generate features that exhibit high information-packing properties:
• Extract the information from the raw data that is most relevant for discrimination between the classes
• Extract features with low within-class variability and high between-class variability
• Discard redundant information
• The information in an image f[i,j] must be reduced to enable reliable classification (generalization)
• A 64x64 image corresponds to a 4096-dimensional feature space!

Feature types (regional features)

• Colour features
• Shape features
• Histogram (texture) features:
  – Mean gray level
  – Variance
  – Skewness
  – Kurtosis
  – Entropy
  – …

Image analysis – Introduction to machine learning

• Recognition of individuals (instance recognition)
• Discrimination between classes (pattern recognition, classification)

Supervised learning

Classifiers and training methods

• Bayes classifier
• Nearest-neighbors and K-nearest-neighbors
• Parzen windows
• Linear and higher order discriminant functions
• Neural nets
• Support Vector Machines (SVM)
• Decision trees
• Random forest
• …







Object detection – Descriptor-based detection

• Feature descriptors (HoG, SIFT, SURF, …)
• Object detection
• Instance recognition

Applications:
• Pedestrian detection
• Face detection
• Face recognition

Object detection – Introduction to deep learning

• Deep learning
• Artificial neural networks
• Convolutional neural networks (CNN)

Deep Learning for Object Recognition

[Figure: a network (AlexNet) classifying an image as «Ship»; millions of images, millions of parameters, thousands of classes]
