3D Photography: Features & Correspondences
Schedule (tentative)
Feb 17 Introduction
Feb 24 Geometry, Camera Model, Calibration
Mar 3 Features, Tracking / Matching
Mar 10 Project Proposals by Students
Mar 17 Structure from Motion (SfM) + 2 papers
Mar 24 Dense Correspondence (stereo / optical flow) + 2 papers
Mar 31 Bundle Adjustment & SLAM + 2 papers
Apr 7 Multi-View Stereo & Volumetric Modeling + 2 papers
Apr 14 Project Updates
Apr 21 Easter
Apr 28 3D Modeling with Depth Sensors + 2 papers
May 5 3D Scene Understanding + 2 papers
May 12 4D Video & Dynamic Scenes + 2 papers
May 19 Guest lecture: KinectFusion by Shahram Izadi
May 26 Final Demos
[Figure: a 3D point M observed in several views; 2D-2D correspondences between image points m_i and m_(i+1) induce 2D-3D correspondences with M.]
Correspondences are at the heart of 3D reconstruction from images
Today: Features & Correspondences
Feature matching vs. tracking
• Matching: extract features independently and then match by comparing descriptors
• Tracking: extract features in the first image and then try to find the same feature back in the next view
What is a good feature?
Image-to-image correspondences are key to passive triangulation-based 3D reconstruction
Comparing image regions
Compare intensities pixel-by-pixel between $I(x,y)$ and $I'(x,y)$.
Dissimilarity measure: Sum of Squared Differences (SSD)
$$\mathrm{SSD} = \sum_{(x,y)\in W} \left[ I'(x,y) - I(x,y) \right]^2$$
Feature points
• Required properties:
• Well-localized
• Stable across views, „repeatable“
(i.e. the same 3D point should be extracted as a feature for neighboring viewpoints)
Feature point extraction
[Figure: local patches on a homogeneous region, an edge, and a corner.]
Find points (local image patches) that differ as much as possible from all neighboring points.
Feature point extraction
• Approximate the SSD for a small displacement $\Delta$: the first-order Taylor expansion gives the image difference per pixel, $I(x+\Delta) - I(x) \approx \nabla I(x)^\top \Delta$
• Squared difference for a pixel: $\left( \nabla I(x)^\top \Delta \right)^2$
• SSD for a window $W$:
$$\mathrm{SSD}(\Delta) \approx \sum_{x \in W} \left( \nabla I(x)^\top \Delta \right)^2 = \Delta^\top M \Delta, \qquad M = \sum_{x \in W} \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$
Feature point extraction
[Figure: homogeneous region (both eigenvalues of M small), edge (one large, one small), corner (both large).]
Find points for which $\min_{\|\Delta\|=1} \Delta^\top M \Delta$ is maximal,
i.e. maximize the smallest eigenvalue of $M$.
Harris corner detector
• Use a small local window $W$
• Maximize „cornerness“: $R = \det(M) - \kappa \, \mathrm{trace}^2(M)$ (avoids an explicit eigenvalue computation; typically $\kappa \approx 0.04$)
• Only use local maxima; subpixel accuracy through second-order surface fitting
• Select the strongest features over the whole image and over each tile (e.g. 1000/image, 2/tile)
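As an illustration, a minimal NumPy/SciPy sketch of this detector (the function names, the Gaussian window scale, and the threshold are our own choices, not from the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(img, sigma=1.5, kappa=0.04):
    """Cornerness R = det(M) - kappa * trace(M)^2 at every pixel,
    where M is the structure tensor summed over a Gaussian window."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)                # image gradients
    Ixx = gaussian_filter(Ix * Ix, sigma)    # window-averaged entries of M
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    return (Ixx * Iyy - Ixy ** 2) - kappa * (Ixx + Iyy) ** 2

def harris_corners(img, threshold=1e-4, radius=3):
    """Keep only local maxima of the response above a threshold
    (the threshold depends on the image intensity scale)."""
    R = harris_response(img)
    is_max = (R == maximum_filter(R, size=2 * radius + 1))
    ys, xs = np.nonzero(is_max & (R > threshold))
    return np.stack([xs, ys], axis=1)        # (x, y) corner positions
```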
Simple matching
• For each corner in image 1, find the corner in image 2 that is most similar, and vice-versa
• Only compare geometrically compatible points
• Keep mutual best matches
What transformations does this work for?
Comparing image regions
Compare intensities pixel-by-pixel between $I(x,y)$ and $I'(x,y)$.
Similarity measure: Zero-mean Normalized Cross-Correlation (ZNCC)
$$\mathrm{ZNCC} = \frac{\sum_{(x,y)\in W} \left( I(x,y) - \bar I \right)\left( I'(x,y) - \bar I' \right)}{\sqrt{\sum_{(x,y)\in W} \left( I(x,y) - \bar I \right)^2 \, \sum_{(x,y)\in W} \left( I'(x,y) - \bar I' \right)^2}}$$
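A small sketch of this measure in Python (NumPy assumed; the function name is ours):

```python
import numpy as np

def zncc(patch1, patch2):
    """Zero-mean normalized cross-correlation of two equally sized
    patches; returns a value in [-1, 1]."""
    a = patch1.astype(np.float64).ravel()
    b = patch2.astype(np.float64).ravel()
    a -= a.mean()                     # remove the mean of each patch
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Unlike plain SSD, ZNCC is invariant to affine brightness changes $I' = aI + b$ (with $a > 0$).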
Feature matching: example
[Two images with five corners each, labeled 1-5.]
ZNCC scores between corners of image 1 (rows) and corners of image 2 (columns):

        1      2      3      4      5
1    0.96  -0.40  -0.16  -0.39   0.19
2   -0.05   0.75  -0.47   0.51   0.72
3   -0.18  -0.39   0.73   0.15  -0.75
4   -0.27   0.49   0.16   0.79   0.21
5    0.08   0.50  -0.45   0.28   0.99

The strongest score in each row and each column lies on the diagonal: the corners match their counterparts.
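Applying the mutual-best-match rule from the „Simple matching“ slide to this score table (a NumPy sketch; the helper name is ours):

```python
import numpy as np

def mutual_best_matches(S):
    """Given similarity scores S[i, j] between corners i of image 1
    and j of image 2, keep pairs that are each other's best match."""
    best_for_1 = S.argmax(axis=1)   # best j for each i
    best_for_2 = S.argmax(axis=0)   # best i for each j
    return [(i, j) for i, j in enumerate(best_for_1) if best_for_2[j] == i]

S = np.array([[0.96, -0.40, -0.16, -0.39, 0.19],
              [-0.05, 0.75, -0.47, 0.51, 0.72],
              [-0.18, -0.39, 0.73, 0.15, -0.75],
              [-0.27, 0.49, 0.16, 0.79, 0.21],
              [0.08, 0.50, -0.45, 0.28, 0.99]])
print(mutual_best_matches(S))   # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```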
What transformations does this work for?
What level of transformation do we need?
Wide baseline matching
• Requirement: cope with larger variations between images
• Geometric transformations: translation, rotation, scaling, foreshortening
• Photometric changes: non-diffuse reflections, illumination
Invariant detectors
• Rotation invariant
• Scale invariant
• Affine invariant (approximately invariant w.r.t. perspective/viewpoint)
2D Transformations of a Local Patch
[Figure: hierarchy of 2D patch transformations, from pure translation (block matching) up to affine (e.g. MSER); full perspective distortion is in practice hardly observable in small patches!]
Example: Find Correspondences between these images using the MSER Detector [Matas‘02]
MSER Features
Local regions, not points!
Extremal regions:
- much brighter than the surrounding, OR much darker than the surrounding
- obtained by sweeping an intensity threshold
Regions: connected pixels at some threshold
- region size = # pixels
- maximally stable: size (nearly) constant near some threshold
A Sample Feature
„T“ is maximally stable with respect to its surrounding
- Compute „center of gravity“
- Compute Scatter (PCA / Ellipsoid)
Different images: different positions/sizes/shapes
The ellipse abstracts from the pixels!
Geometric representation: position/size/shape
Still: how to compare?
Idea: Normalize to a „default“ position/size/shape!
⇒ e.g. a circle of radius 16 pixels!
Ok, but 2D orientation?
• Idea (Lowe‘99): run over all pixels and chart the local gradient orientations in a histogram
• Find the dominant orientation in the histogram
• Rotate the local patch into the dominant orientation
Each normalized patch is obtained from a single image!
Wrap-up MSER
• Detect sets of pixels brighter/darker than their surrounding
• Fit an elliptical shape to the pixel set
• Warp the image so that the ellipse becomes a circle
• Rotate to the dominant gradient direction [other constructions possible as well]
⇒ Affine normalization of a feature leads to similar patches in different views!
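A minimal sketch of the first two steps with OpenCV's MSER implementation (the filename is hypothetical; ellipse fitting stands in for the center-of-gravity/scatter computation):

```python
import cv2

img = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
mser = cv2.MSER_create()
regions, _ = mser.detectRegions(img)   # each region: an array of (x, y) pixels

# Abstract each pixel set by an ellipse (position / size / shape):
ellipses = [cv2.fitEllipse(pts) for pts in regions
            if len(pts) >= 5]          # fitEllipse needs at least 5 points
# each entry: ((cx, cy), (width, height), rotation angle in degrees)
```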
Two MSER regions: are they in correspondence?
Traditional matching approach:
compare the regions (Sum of Squared Differences). Problems:
- small misalignment
- brightness change
⇒ More tolerant comparison?
SIFT Descriptor [Lowe‘99]:
- Brightness offset: use only gradients!
- Compute gradient orientation/magnitude per pixel
- Partition the patch into sectors
- For each sector: store the orientations of the gradients!
- Quantize the gradient orientation, e.g. in 45° steps
- Orientation histogram per sector (gradient magnitude as weight)
- m sectors with n orientations: (m·n) values
Construct Vector
The summed gradient magnitudes for the different sectors and orientations form a vector, e.g. (35, 12, 10, 25, …, 29).
Normalize the vector (suppresses the effects of changing contrast), e.g. (0.12, 0.04, 0.03, 0.08, …, 0.10).
The result is the „SIFT descriptor“: 128 bytes.
Memory comparison: an 11×11 patch of raw grey values = 121 bytes; the SIFT descriptor = 128 bytes.
Wrap Up: Normalized Patch Comparison vs. Descriptor
Usage of gradients:
⇒ intensity offset compensation
Subdivision into sectors / per-sector histogram:
⇒ small alignment error compensation
Normalization of the histogram vector:
⇒ image contrast compensation
But most important:
Avoid sudden „descriptor jumps“!
Classical histogram (quantization 45°): 22° is quantized/rounded to 0°, but 23° to 45°.
Small differences can lead to different bins!
Feature position, size, shape, and orientation are uncertain, and the image content is noisy!
The descriptor MUST tolerate this (no sudden changes!)
Solution: „Soft-Binning“!
Histogram (quantization 45°), classical (closest bin):
[Figure: two gradient measurements at 20° and 22°, bins at 0°, 45°, 90°, …; both measurements land in the 0° bin, total weight 2.0.]
If the orientation is 3° different, all measurements go to the second bin instead
⇒ a sudden change in the histogram from (2 0 0 0) to (0 2 0 0)
Histogram (quantization 45°), soft-binning:
[Figure: the same two measurements with soft weights according to „bin correctness“: the 20° gradient contributes 0.56 to the 0° bin and 0.44 to the 45° bin; together the bins receive 1.07 and 0.93.]
If the orientation is 3° different, the descriptor changes only gradually!
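A sketch of soft-binning in Python (NumPy assumed; the function name is ours). Each gradient votes into its two nearest bins with linear weights:

```python
import numpy as np

def soft_bin(orientations, magnitudes, n_bins=8):
    """Soft-binning: each gradient votes into its two nearest
    orientation bins, weighted by 'bin correctness' (linear
    interpolation). Bin width is 360 / n_bins = 45 degrees."""
    hist = np.zeros(n_bins)
    width = 360.0 / n_bins
    for theta, mag in zip(orientations, magnitudes):
        pos = (theta % 360.0) / width        # fractional bin position
        b0 = int(np.floor(pos)) % n_bins
        b1 = (b0 + 1) % n_bins
        frac = pos - np.floor(pos)
        hist[b0] += mag * (1.0 - frac)       # e.g. 20 deg: 0.56 to the 0 deg bin
        hist[b1] += mag * frac               #           and 0.44 to the 45 deg bin
    return hist
```

With two unit-magnitude measurements at 20° and 22°, `soft_bin([20, 22], [1, 1])` reproduces the slide's histogram (1.07, 0.93, 0, …), and shifting all orientations by 3° changes it only gradually.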
Wrap-up
Detector:
- Find interesting regions (position/size/shape)
- Assign the dominant gradient orientation
- Normalize the regions
Descriptor:
- Compute a „signature“ over the normalized region
- Behaves smoothly in the presence of distortions: brightness changes / normalization inaccuracies
How to find correspondences?
For each region: a 128-dim. descriptor
Matching Scenario I
Two images in a dense photo sequence:
- Think about the maximum movement d (e.g. 50 pixels)
- Search in a window of ±d around the old position
- Compare descriptors, choose the most similar
Matching Scenario II
Two arbitrary images / wide baseline:
- Compare every descriptor with every other (e.g. on the GPU)
- OR: find a small set of matches, predict the others
- OR: find the nearest neighbor in descriptor space
Searching Descriptor Space
Key ideas:
- Each descriptor consists of 128 numbers: imagine a vector in $\mathbb{R}^{128}$
- Corresponding descriptors are not far apart!
- Arrange all descriptors of image 1 in a kd-tree (imagine an „octree“, but with more dimensions)
- For each descriptor of image 2: find the (approximate) nearest neighbor in the tree
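A sketch of that lookup with SciPy's kd-tree (random vectors stand in for real descriptors; eps > 0 makes the query approximate):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
desc1 = rng.random((1000, 128)).astype(np.float32)  # stand-in for image-1 descriptors
desc2 = rng.random((800, 128)).astype(np.float32)   # stand-in for image-2 descriptors

tree = cKDTree(desc1)             # arrange all descriptors of image 1 in a kd-tree
dist, idx = tree.query(desc2, k=1, eps=0.5)   # approximate nearest neighbors
# idx[i]: the (approximate) closest image-1 descriptor for image-2 descriptor i
```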
Searching Descriptor Space
„Learn“ the important dimensions of the 128-D space for a given scene, e.g. with PCA or LDA.
Project the descriptors onto the important dimensions, then use a kd-tree.
Matching Techniques
Spatial search window:
- Requires/exploits a good prediction
- Can avoid far-away similar-looking features
- Good for sequences
Descriptor space:
- Initial tree setup
- Fast lookup for huge amounts of features
- More sophisticated outlier detection required
- Good for asymmetric (offline/online) problems, registration, initialization, object recognition, wide baseline matching
Correspondence Verification
Features have only a very local view ⇒ mismatches.
How to detect them?
- Discard matches with low similarity
- Delete „non-distinctive“ features (those with a close match in the same image, or with a similar 2nd-best match)
- Check for bi-directional consistency
- Geometric verification, e.g. RANSAC
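A sketch combining two of these checks, Lowe-style distinctiveness (2nd-best match) and bi-directional consistency (SciPy assumed; the ratio threshold 0.8 is a common choice, not from the slides):

```python
import numpy as np
from scipy.spatial import cKDTree

def verified_matches(desc1, desc2, ratio=0.8):
    """Keep matches whose best distance clearly beats the 2nd-best
    (distinctive) and that are mutual nearest neighbors."""
    tree1, tree2 = cKDTree(desc1), cKDTree(desc2)
    d12, i12 = tree2.query(desc1, k=2)   # two nearest neighbors in image 2
    _, i21 = tree1.query(desc2, k=1)     # nearest neighbor back in image 1
    return [(i, i12[i, 0]) for i in range(len(desc1))
            if d12[i, 0] < ratio * d12[i, 1]    # distinctive
            and i21[i12[i, 0]] == i]            # bi-directionally consistent
```

Geometric verification (e.g. RANSAC on an epipolar or homography model) would then prune the remaining outliers.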
Object Detection / Pose Estimation using a single MSER
An affine feature carries position (2), shape (3), and orientation (1):
6 degrees of freedom, more than a simple 2D point!
2D Transformations of a Local Patch
[Figure: the transformation hierarchy again: block matching (translation), e.g. SIFT, e.g. MSER (affine).]
Lowe’s SIFT features
Detector + Descriptor
Recover features with position, orientation and scale
(Lowe, ICCV99)
Position
• Look for strong responses of DOG filter (Difference-Of-Gaussian)
• Only consider local maxima
Scale
• Look for strong responses of the DoG filter (Difference-of-Gaussians) over scale space
• Only consider local maxima in both position and scale
• Fit a quadratic around the maxima for sub-pixel accuracy
• Require a minimum contrast and a minimum „cornerness“ (to reject edge responses)
Orientation
• Create a histogram of local gradient directions (0 to 2π) computed at the selected scale
• Assign the canonical orientation at the peak of the smoothed histogram
• Each key specifies stable 2D coordinates (x, y, scale, orientation)
SIFT descriptor
• Thresholded image gradients are sampled over a 16×16 array of locations in scale space
• Create an array of orientation histograms
• 8 orientations × 4×4 histogram array = 128 dimensions
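In practice, detector and descriptor come packaged, e.g. in OpenCV (the filename is hypothetical):

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

kp = keypoints[0]
print(kp.pt, kp.size, kp.angle)   # position (x, y), scale, dominant orientation
print(descriptors.shape)          # (number of keypoints, 128)
```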
Some Feature Resources
• Affine feature evaluation + binaries: http://www.robots.ox.ac.uk/~vgg/research/affine/
• SIFT + MSER + some tools: http://vlfeat.org
• SURF: http://www.vision.ee.ethz.ch/~surf/
• GPU-SIFT: http://www.cs.unc.edu/~ccwu/siftgpu/
• DAISY (dense descriptors): http://cvlab.epfl.ch/~tola/daisy.html
• FAST[er] (simple but …): http://svr-www.eng.cam.ac.uk/~er258/work/fast.html
Also check OpenCV, and try searching the web.
Recent Variants and Accelerations (much faster, but this usually comes at a price)
• BRIEF [Calonder10]: binary descriptor (tests = is position a darker than b?); compare descriptors by XOR (Hamming distance) + POPCNT
• RIFF [Takacs10]: CenSurE + tangential/radial gradients
• ORB [Rublee11]: FAST + orientation
• BRISK [Leutenegger11]: FAST + scale + BRIEF
• FREAK [Alahi12]: FAST + „daisy“-style BRIEF
• LUCID [Ziegler12]: „sort intensities“
• D-BRIEF [Trzcinski12]: box filter + learned projection + BRIEF
• LDAHash [Strecha12]: binary tests on the descriptor
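The Hamming comparison these binary descriptors share is cheap: XOR the byte strings, then count the set bits. A NumPy sketch (random bytes stand in for real 256-bit descriptors):

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance of binary descriptors stored as uint8 arrays:
    XOR, then popcount (count the differing bits)."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

rng = np.random.default_rng(0)
a = rng.integers(0, 256, 32, dtype=np.uint8)   # stand-in 256-bit descriptor
b = rng.integers(0, 256, 32, dtype=np.uint8)
print(hamming(a, b))                           # 0 .. 256
```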
Features: local normalization + robust descriptors
Allow:
• changing perspective/illumination/scale
• scenarios with many occlusions
• different reasoning (descriptor vectors)
Require:
• (near-)planarity of the feature region in 3D
• „detectable“ regions with enough structure
But:
• fewer features / slower
• invariance vs. descriptiveness: descriptors can be similar even when the regions are not!
⇒ What level of invariance / speed / properties do YOU really need?
Feature tracking
• Identify features and track them over the video
• Small differences between frames, but potentially large differences overall
• Standard approach: KLT (Kanade-Lucas-Tomasi)
Tracking corners through video
Good features to track
• Use the same window for feature selection as for the tracking itself
• Compute the motion assuming it is small: minimize $\sum_{x \in W} \left[ J(x+d) - I(x) \right]^2$ over the displacement $d$; linearize and differentiate, which yields the 2×2 system
$$\Big( \sum_{x \in W} \nabla I \, \nabla I^\top \Big) d = \sum_{x \in W} \big( I(x) - J(x) \big) \, \nabla I$$
• Affine motion is also possible, but a bit harder (a 6×6 instead of a 2×2 system)
Example
A simple displacement is sufficient between consecutive frames, but not for comparison with the reference template.
Example
Synthetic example
Good features to keep tracking
Perform an affine alignment between the first and the last frame.
Stop tracking features with too large errors.
Intensity Linearization
• Brightness constancy assumption (small motion): $I(x(t), t) = \text{const}$
• 1D example: $I(x+u, t+1) \approx I(x,t) + I_x u + I_t \;\Rightarrow\; I_x u + I_t = 0 \;\Rightarrow\; u = -I_t / I_x$
• Possibility for iterative refinement
Intensity Linearization
• Brightness constancy assumption (small motion), 2D example:
$$I_x u + I_y v + I_t = 0$$
• 2 unknowns $(u, v)$, but only 1 constraint: only the flow component normal to the isophotes (curves of constant intensity, $I(t) = I$ and $I(t+1) = I$) can be recovered
⇒ the „aperture“ problem
Intensity Linearization
• How to deal with the aperture problem?
Assume neighbors have the same displacement (3 constraints if the color gradients are different)
Lucas-Kanade
Assume all pixels in a window have the same displacement; least-squares solution:
$$\begin{pmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = - \begin{pmatrix} \sum I_x I_t \\ \sum I_y I_t \end{pmatrix}$$
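A one-window sketch of this system in NumPy (the function name and window size are our own; real trackers iterate and use a pyramid, as shown below):

```python
import numpy as np

def lk_displacement(I, J, x, y, half=7):
    """One Lucas-Kanade step: solve the 2x2 least-squares system for
    the displacement (u, v) of the window around (x, y) from I to J."""
    I = I.astype(np.float64); J = J.astype(np.float64)
    Iy, Ix = np.gradient(I)                      # spatial gradients
    It = J - I                                   # temporal derivative
    w = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    ix, iy, it = Ix[w].ravel(), Iy[w].ravel(), It[w].ravel()
    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])           # the matrix M again
    b = -np.array([ix @ it, iy @ it])
    return np.linalg.solve(A, b)   # (u, v); fails if M is singular (aperture!)
```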
Revisiting the small motion assumption
• Is this motion small enough?
• Probably not: it’s much larger than one pixel (the 1st-order Taylor expansion is not sufficient)
• How might we solve this problem?
Reduce the resolution!
[From Khurram Hassan-Shafique, CAP5415 Computer Vision 2003]
Coarse-to-fine optical flow estimation
[Figure: Gaussian pyramids of image I_(t-1) and image I_t. A displacement of u = 10 pixels at full resolution shrinks to u = 5, 2.5, and 1.25 pixels at the coarser pyramid levels.]
Run iterative L-K at the coarsest level, then warp & upsample, run iterative L-K at the next level, and repeat down to full resolution.
[Slides from Bradski and Thrun]
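OpenCV bundles exactly this coarse-to-fine scheme; a minimal usage sketch (filenames hypothetical):

```python
import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corners ("good features to track") in the first frame
pts0 = cv2.goodFeaturesToTrack(prev, maxCorners=1000,
                               qualityLevel=0.01, minDistance=8)

# Pyramidal iterative Lucas-Kanade; maxLevel sets the pyramid depth
pts1, status, err = cv2.calcOpticalFlowPyrLK(
    prev, curr, pts0, None, winSize=(21, 21), maxLevel=3)
tracked = pts1[status.ravel() == 1]   # keep successfully tracked points
```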
Next week: Project Proposals