3D Photography: Features & Correspondences
Schedule (tentative)
Feb 17 Introduction
Feb 24 Geometry, Camera Model, Calibration
Mar 3 Features, Tracking / Matching
Mar 10 Project Proposals by Students
Mar 17 Structure from Motion (SfM) + 2 papers
Mar 24 Dense Correspondence (stereo / optical flow) + 2 papers
Mar 31 Bundle Adjustment & SLAM + 2 papers
Apr 7 Multi-View Stereo & Volumetric Modeling + 2 papers
Apr 14 Project Updates
Apr 21 Easter
Apr 28 3D Modeling with Depth Sensors + 2 papers
May 5 3D Scene Understanding + 2 papers
May 12 4D Video & Dynamic Scenes + 2 papers
May 19 Guest lecture: KinectFusion by Shahram Izadi
May 26 Final Demos
[Figure: a 3D point M observed in several views; 2D-2D correspondences between image points m_i and m_(i+1) induce 2D-3D correspondences with M.]
Correspondences are at the heart of 3D reconstruction from images
Today: Features & Correspondences
Feature matching vs. tracking
• Matching: extract features independently and then match by comparing descriptors
• Tracking: extract features in the first image and then try to find the same feature back in the next view
What is a good feature?
Image-to-image correspondences are key to passive triangulation-based 3D reconstruction
Comparing image regions
Compare intensities pixel-by-pixel between $I(x,y)$ and $I'(x,y)$.
Dissimilarity measure: Sum of Squared Differences (SSD)
$$\mathrm{SSD} = \sum_{(x,y)\in W} \left[ I'(x,y) - I(x,y) \right]^2$$
Feature points
• Required properties:
• Well-localized
• Stable across views, „repeatable“
(i.e. the same 3D point should be extracted as a feature for neighboring viewpoints)
Feature point extraction
[Figure: local patches on a homogeneous region, an edge, and a corner.]
Find points (local image patches) that differ as much as possible from all neighboring points.
Feature point extraction
• Approximate the SSD for a small displacement $\Delta$: the first-order Taylor expansion gives the image difference per pixel, $I(x+\Delta) - I(x) \approx \nabla I(x)^\top \Delta$
• Squared difference for a pixel: $\left( \nabla I(x)^\top \Delta \right)^2$
• SSD for a window $W$:
$$\mathrm{SSD}(\Delta) \approx \sum_{x \in W} \left( \nabla I(x)^\top \Delta \right)^2 = \Delta^\top M \Delta, \qquad M = \sum_{x \in W} \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}$$
Feature point extraction
[Figure: homogeneous region (both eigenvalues of M small), edge (one large, one small), corner (both large).]
Find points for which $\min_{\|\Delta\|=1} \Delta^\top M \Delta$ is maximal,
i.e. maximize the smallest eigenvalue of $M$.
Harris corner detector
• Use a small local window $W$
• Maximize „cornerness“: $R = \det(M) - \kappa \, \mathrm{trace}^2(M)$ (avoids an explicit eigenvalue computation; typically $\kappa \approx 0.04$)
• Only use local maxima; subpixel accuracy through second-order surface fitting
• Select the strongest features over the whole image and over each tile (e.g. 1000/image, 2/tile)
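As an illustration, a minimal NumPy/SciPy sketch of this detector (the function names, the Gaussian window scale, and the threshold are our own choices, not from the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_response(img, sigma=1.5, kappa=0.04):
    """Cornerness R = det(M) - kappa * trace(M)^2 at every pixel,
    where M is the structure tensor summed over a Gaussian window."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)                # image gradients
    Ixx = gaussian_filter(Ix * Ix, sigma)    # window-averaged entries of M
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    return (Ixx * Iyy - Ixy ** 2) - kappa * (Ixx + Iyy) ** 2

def harris_corners(img, threshold=1e-4, radius=3):
    """Keep only local maxima of the response above a threshold
    (the threshold depends on the image intensity scale)."""
    R = harris_response(img)
    is_max = (R == maximum_filter(R, size=2 * radius + 1))
    ys, xs = np.nonzero(is_max & (R > threshold))
    return np.stack([xs, ys], axis=1)        # (x, y) corner positions
```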
Simple matching
• For each corner in image 1, find the corner in image 2 that is most similar, and vice-versa
• Only compare geometrically compatible points
• Keep mutual best matches
What transformations does this work for?
Comparing image regions
Compare intensities pixel-by-pixel between $I(x,y)$ and $I'(x,y)$.
Similarity measure: Zero-mean Normalized Cross-Correlation (ZNCC)
$$\mathrm{ZNCC} = \frac{\sum_{(x,y)\in W} \left( I(x,y) - \bar I \right)\left( I'(x,y) - \bar I' \right)}{\sqrt{\sum_{(x,y)\in W} \left( I(x,y) - \bar I \right)^2 \, \sum_{(x,y)\in W} \left( I'(x,y) - \bar I' \right)^2}}$$
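A small sketch of this measure in Python (NumPy assumed; the function name is ours):

```python
import numpy as np

def zncc(patch1, patch2):
    """Zero-mean normalized cross-correlation of two equally sized
    patches; returns a value in [-1, 1]."""
    a = patch1.astype(np.float64).ravel()
    b = patch2.astype(np.float64).ravel()
    a -= a.mean()                     # remove the mean of each patch
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Unlike plain SSD, ZNCC is invariant to affine brightness changes $I' = aI + b$ (with $a > 0$).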
Feature matching: example
[Two images with five corners each, labeled 1-5.]
ZNCC scores between corners of image 1 (rows) and corners of image 2 (columns):

        1      2      3      4      5
1    0.96  -0.40  -0.16  -0.39   0.19
2   -0.05   0.75  -0.47   0.51   0.72
3   -0.18  -0.39   0.73   0.15  -0.75
4   -0.27   0.49   0.16   0.79   0.21
5    0.08   0.50  -0.45   0.28   0.99

The strongest score in each row and each column lies on the diagonal: the corners match their counterparts.
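Applying the mutual-best-match rule from the „Simple matching“ slide to this score table (a NumPy sketch; the helper name is ours):

```python
import numpy as np

def mutual_best_matches(S):
    """Given similarity scores S[i, j] between corners i of image 1
    and j of image 2, keep pairs that are each other's best match."""
    best_for_1 = S.argmax(axis=1)   # best j for each i
    best_for_2 = S.argmax(axis=0)   # best i for each j
    return [(i, j) for i, j in enumerate(best_for_1) if best_for_2[j] == i]

S = np.array([[0.96, -0.40, -0.16, -0.39, 0.19],
              [-0.05, 0.75, -0.47, 0.51, 0.72],
              [-0.18, -0.39, 0.73, 0.15, -0.75],
              [-0.27, 0.49, 0.16, 0.79, 0.21],
              [0.08, 0.50, -0.45, 0.28, 0.99]])
print(mutual_best_matches(S))   # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```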
What transformations does this work for?
What level of transformation do we need?
Wide baseline matching
• Requirement: cope with larger variations between images
• Geometric transformations: translation, rotation, scaling, foreshortening
• Photometric changes: non-diffuse reflections, illumination
Invariant detectors
• Rotation invariant
• Scale invariant
• Affine invariant (approximately invariant w.r.t. perspective/viewpoint)
2D Transformations of a Local Patch
[Figure: hierarchy of 2D patch transformations, from pure translation (block matching) up to affine (e.g. MSER); full perspective distortion is in practice hardly observable in small patches!]
Example: Find Correspondences between these images using the MSER Detector [Matas‘02]
MSER Features
Local regions, not points!
Extremal regions:
- much brighter than the surrounding, OR much darker than the surrounding
- obtained by sweeping an intensity threshold
Regions: connected pixels at some threshold
- region size = # pixels
- maximally stable: size (nearly) constant near some threshold
A Sample Feature
„T“ is maximally stable with respect to its surrounding
- Compute „center of gravity“
- Compute Scatter (PCA / Ellipsoid)
Different images: different positions/sizes/shapes
The ellipse abstracts from the pixels!
Geometric representation: position/size/shape
Still: how to compare?
Idea: Normalize to a „default“ position/size/shape!
⇒ e.g. a circle of radius 16 pixels!
Ok, but 2D orientation?
• Idea (Lowe‘99): run over all pixels and chart the local gradient orientations in a histogram
• Find the dominant orientation in the histogram
• Rotate the local patch into the dominant orientation
Each normalized patch is obtained from a single image!
Wrap-up MSER
• Detect sets of pixels brighter/darker than their surrounding
• Fit an elliptical shape to the pixel set
• Warp the image so that the ellipse becomes a circle
• Rotate to the dominant gradient direction [other constructions possible as well]
⇒ Affine normalization of a feature leads to similar patches in different views!
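A minimal sketch of the first two steps with OpenCV's MSER implementation (the filename is hypothetical; ellipse fitting stands in for the center-of-gravity/scatter computation):

```python
import cv2

img = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
mser = cv2.MSER_create()
regions, _ = mser.detectRegions(img)   # each region: an array of (x, y) pixels

# Abstract each pixel set by an ellipse (position / size / shape):
ellipses = [cv2.fitEllipse(pts) for pts in regions
            if len(pts) >= 5]          # fitEllipse needs at least 5 points
# each entry: ((cx, cy), (width, height), rotation angle in degrees)
```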
Two MSER regions: are they in correspondence?
Traditional matching approach:
compare the regions (Sum of Squared Differences). Problems:
- small misalignment
- brightness change
⇒ More tolerant comparison?
SIFT Descriptor [Lowe‘99]:
- Brightness offset: use only gradients!
- Compute gradient orientation/magnitude per pixel
- Partition the patch into sectors
- For each sector: store the orientations of the gradients!
- Quantize the gradient orientation, e.g. in 45° steps
- Orientation histogram per sector (gradient magnitude as weight)
- m sectors with n orientations: (m·n) values
Construct Vector
The summed gradient magnitudes for the different sectors and orientations form a vector, e.g. (35, 12, 10, 25, …, 29).
Normalize the vector (suppresses the effects of changing contrast), e.g. (0.12, 0.04, 0.03, 0.08, …, 0.10).
The result is the „SIFT descriptor“: 128 bytes.
Memory comparison: an 11×11 patch of raw grey values = 121 bytes; the SIFT descriptor = 128 bytes.
Wrap Up: Normalized Patch Comparison vs. Descriptor
Usage of gradients:
⇒ intensity offset compensation
Subdivision into sectors / per-sector histogram:
⇒ small alignment error compensation
Normalization of the histogram vector:
⇒ image contrast compensation
But most important:
Avoid sudden „descriptor jumps“!
Classical histogram (quantization 45°): 22° is quantized/rounded to 0°, but 23° to 45°.
Small differences can lead to different bins!
Feature position, size, shape, and orientation are uncertain, and the image content is noisy!
The descriptor MUST tolerate this (no sudden changes!)
Solution: „Soft-Binning“!
Histogram (quantization 45°), classical (closest bin):
[Figure: two gradient measurements at 20° and 22°, bins at 0°, 45°, 90°, …; both measurements land in the 0° bin, total weight 2.0.]
If the orientation is 3° different, all measurements go to the second bin instead
⇒ a sudden change in the histogram from (2 0 0 0) to (0 2 0 0)
Histogram (quantization 45°), soft-binning:
[Figure: the same two measurements with soft weights according to „bin correctness“: the 20° gradient contributes 0.56 to the 0° bin and 0.44 to the 45° bin; together the bins receive 1.07 and 0.93.]
If the orientation is 3° different, the descriptor changes only gradually!
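A sketch of soft-binning in Python (NumPy assumed; the function name is ours). Each gradient votes into its two nearest bins with linear weights:

```python
import numpy as np

def soft_bin(orientations, magnitudes, n_bins=8):
    """Soft-binning: each gradient votes into its two nearest
    orientation bins, weighted by 'bin correctness' (linear
    interpolation). Bin width is 360 / n_bins = 45 degrees."""
    hist = np.zeros(n_bins)
    width = 360.0 / n_bins
    for theta, mag in zip(orientations, magnitudes):
        pos = (theta % 360.0) / width        # fractional bin position
        b0 = int(np.floor(pos)) % n_bins
        b1 = (b0 + 1) % n_bins
        frac = pos - np.floor(pos)
        hist[b0] += mag * (1.0 - frac)       # e.g. 20 deg: 0.56 to the 0 deg bin
        hist[b1] += mag * frac               #           and 0.44 to the 45 deg bin
    return hist
```

With two unit-magnitude measurements at 20° and 22°, `soft_bin([20, 22], [1, 1])` reproduces the slide's histogram (1.07, 0.93, 0, …), and shifting all orientations by 3° changes it only gradually.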
Wrap-up
Detector:
- Find interesting regions (position/size/shape)
- Assign the dominant gradient orientation
- Normalize the regions
Descriptor:
- Compute a „signature“ over the normalized region
- Behaves smoothly in the presence of distortions: brightness changes / normalization inaccuracies
How to find correspondences?
For each region: a 128-dim. descriptor
Matching Scenario I
Two images in a dense photo sequence:
- Think about the maximum movement d (e.g. 50 pixels)
- Search in a window of ±d around the old position
- Compare descriptors, choose the most similar
Matching Scenario II
Two arbitrary images / wide baseline:
- Compare every descriptor with every other (e.g. on the GPU)
- OR: find a small set of matches, predict the others
- OR: find the nearest neighbor in descriptor space
Searching Descriptor Space
Key ideas:
- Each descriptor consists of 128 numbers: imagine a vector in $\mathbb{R}^{128}$
- Corresponding descriptors are not far apart!
- Arrange all descriptors of image 1 in a kd-tree (imagine an „octree“, but with more dimensions)
- For each descriptor of image 2: find the (approximate) nearest neighbor in the tree
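A sketch of that lookup with SciPy's kd-tree (random vectors stand in for real descriptors; eps > 0 makes the query approximate):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
desc1 = rng.random((1000, 128)).astype(np.float32)  # stand-in for image-1 descriptors
desc2 = rng.random((800, 128)).astype(np.float32)   # stand-in for image-2 descriptors

tree = cKDTree(desc1)             # arrange all descriptors of image 1 in a kd-tree
dist, idx = tree.query(desc2, k=1, eps=0.5)   # approximate nearest neighbors
# idx[i]: the (approximate) closest image-1 descriptor for image-2 descriptor i
```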
Searching Descriptor Space
„Learn“ the important dimensions of the 128-D space for a given scene, e.g. with PCA or LDA.
Project the descriptors onto the important dimensions, then use a kd-tree.
Matching Techniques
Spatial search window:
- Requires/exploits a good prediction
- Can avoid far-away similar-looking features
- Good for sequences
Descriptor space:
- Initial tree setup
- Fast lookup for huge amounts of features
- More sophisticated outlier detection required
- Good for asymmetric (offline/online) problems, registration, initialization, object recognition, wide baseline matching
Correspondence Verification
Features have only a very local view ⇒ mismatches.
How to detect them?
- Discard matches with low similarity
- Delete „non-distinctive“ features (those with a close match in the same image, or with a similar 2nd-best match)
- Check for bi-directional consistency
- Geometric verification, e.g. RANSAC
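A sketch combining two of these checks, Lowe-style distinctiveness (2nd-best match) and bi-directional consistency (SciPy assumed; the ratio threshold 0.8 is a common choice, not from the slides):

```python
import numpy as np
from scipy.spatial import cKDTree

def verified_matches(desc1, desc2, ratio=0.8):
    """Keep matches whose best distance clearly beats the 2nd-best
    (distinctive) and that are mutual nearest neighbors."""
    tree1, tree2 = cKDTree(desc1), cKDTree(desc2)
    d12, i12 = tree2.query(desc1, k=2)   # two nearest neighbors in image 2
    _, i21 = tree1.query(desc2, k=1)     # nearest neighbor back in image 1
    return [(i, i12[i, 0]) for i in range(len(desc1))
            if d12[i, 0] < ratio * d12[i, 1]    # distinctive
            and i21[i12[i, 0]] == i]            # bi-directionally consistent
```

Geometric verification (e.g. RANSAC on an epipolar or homography model) would then prune the remaining outliers.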
Object Detection / Pose Estimation using a single MSER
An affine feature carries position (2), shape (3), and orientation (1):
6 degrees of freedom, more than a simple 2D point!
2D Transformations of a Local Patch
[Figure: the transformation hierarchy again: block matching (translation), e.g. SIFT, e.g. MSER (affine).]
Lowe’s SIFT features
Detector + Descriptor
Recover features with position, orientation and scale
(Lowe, ICCV99)
Position
• Look for strong responses of DOG filter (Difference-Of-Gaussian)
• Only consider local maxima
Scale
• Look for strong responses of the DoG filter (Difference-of-Gaussians) over scale space
• Only consider local maxima in both position and scale
• Fit a quadratic around the maxima for sub-pixel accuracy
• Require a minimum contrast and a minimum „cornerness“ (to reject edge responses)
Orientation
• Create a histogram of local gradient directions (0 to 2π) computed at the selected scale
• Assign the canonical orientation at the peak of the smoothed histogram
• Each key specifies stable 2D coordinates (x, y, scale, orientation)
SIFT descriptor
• Thresholded image gradients are sampled over a 16×16 array of locations in scale space
• Create an array of orientation histograms
• 8 orientations × 4×4 histogram array = 128 dimensions
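In practice, detector and descriptor come packaged, e.g. in OpenCV (the filename is hypothetical):

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

kp = keypoints[0]
print(kp.pt, kp.size, kp.angle)   # position (x, y), scale, dominant orientation
print(descriptors.shape)          # (number of keypoints, 128)
```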
Some Feature Resources
• Affine feature evaluation + binaries: http://www.robots.ox.ac.uk/~vgg/research/affine/
• SIFT + MSER + some tools: http://vlfeat.org
• SURF: http://www.vision.ee.ethz.ch/~surf/
• GPU-SIFT: http://www.cs.unc.edu/~ccwu/siftgpu/
• DAISY (dense descriptors): http://cvlab.epfl.ch/~tola/daisy.html
• FAST[er] (simple but …): http://svr-www.eng.cam.ac.uk/~er258/work/fast.html
Also check OpenCV, and try searching the web.
Recent Variants and Accelerations (much faster, but this usually comes at a price)
• BRIEF [Calonder10]: binary descriptor (tests = is position a darker than b?); compare descriptors by XOR (Hamming distance) + POPCNT
• RIFF [Takacs10]: CenSurE + tangential/radial gradients
• ORB [Rublee11]: FAST + orientation
• BRISK [Leutenegger11]: FAST + scale + BRIEF
• FREAK [Alahi12]: FAST + „daisy“-style BRIEF
• LUCID [Ziegler12]: „sort intensities“
• D-BRIEF [Trzcinski12]: box filter + learned projection + BRIEF
• LDAHash [Strecha12]: binary tests on the descriptor
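The Hamming comparison these binary descriptors share is cheap: XOR the byte strings, then count the set bits. A NumPy sketch (random bytes stand in for real 256-bit descriptors):

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance of binary descriptors stored as uint8 arrays:
    XOR, then popcount (count the differing bits)."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

rng = np.random.default_rng(0)
a = rng.integers(0, 256, 32, dtype=np.uint8)   # stand-in 256-bit descriptor
b = rng.integers(0, 256, 32, dtype=np.uint8)
print(hamming(a, b))                           # 0 .. 256
```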
Features: local normalization + robust descriptors
Allow:
• changing perspective/illumination/scale
• scenarios with many occlusions
• different reasoning (descriptor vectors)
Require:
• (near-)planarity of the feature region in 3D
• „detectable“ regions with enough structure
But:
• fewer features / slower
• invariance vs. descriptiveness: descriptors can be similar even when the regions are not!
⇒ What level of invariance / speed / properties do YOU really need?
Feature tracking
• Identify features and track them over the video
• Small differences between frames, but potentially large differences overall
• Standard approach: KLT (Kanade-Lucas-Tomasi)
Tracking corners through video
Good features to track
• Use the same window for feature selection as for the tracking itself
• Compute the motion assuming it is small: minimize $\sum_{x \in W} \left[ J(x+d) - I(x) \right]^2$ over the displacement $d$; linearize and differentiate, which yields the 2×2 system
$$\Big( \sum_{x \in W} \nabla I \, \nabla I^\top \Big) d = \sum_{x \in W} \big( I(x) - J(x) \big) \, \nabla I$$
• Affine motion is also possible, but a bit harder (a 6×6 instead of a 2×2 system)
Example
A simple displacement is sufficient between consecutive frames, but not for comparison with the reference template.
Example
Synthetic example
Good features to keep tracking
Perform an affine alignment between the first and the last frame.
Stop tracking features with too large errors.
Intensity Linearization
• Brightness constancy assumption (small motion): $I(x(t), t) = \text{const}$
• 1D example: $I(x+u, t+1) \approx I(x,t) + I_x u + I_t \;\Rightarrow\; I_x u + I_t = 0 \;\Rightarrow\; u = -I_t / I_x$
• Possibility for iterative refinement
Intensity Linearization
• Brightness constancy assumption (small motion), 2D example:
$$I_x u + I_y v + I_t = 0$$
• 2 unknowns $(u, v)$, but only 1 constraint: only the flow component normal to the isophotes (curves of constant intensity, $I(t) = I$ and $I(t+1) = I$) can be recovered
⇒ the „aperture“ problem
Intensity Linearization
• How to deal with the aperture problem?
Assume neighbors have the same displacement (3 constraints if the color gradients are different)
Lucas-Kanade
Assume all pixels in a window have the same displacement; least-squares solution:
$$\begin{pmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = - \begin{pmatrix} \sum I_x I_t \\ \sum I_y I_t \end{pmatrix}$$
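A one-window sketch of this system in NumPy (the function name and window size are our own; real trackers iterate and use a pyramid, as shown below):

```python
import numpy as np

def lk_displacement(I, J, x, y, half=7):
    """One Lucas-Kanade step: solve the 2x2 least-squares system for
    the displacement (u, v) of the window around (x, y) from I to J."""
    I = I.astype(np.float64); J = J.astype(np.float64)
    Iy, Ix = np.gradient(I)                      # spatial gradients
    It = J - I                                   # temporal derivative
    w = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    ix, iy, it = Ix[w].ravel(), Iy[w].ravel(), It[w].ravel()
    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])           # the matrix M again
    b = -np.array([ix @ it, iy @ it])
    return np.linalg.solve(A, b)   # (u, v); fails if M is singular (aperture!)
```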
Revisiting the small motion assumption
• Is this motion small enough?
• Probably not: it’s much larger than one pixel (the 1st-order Taylor expansion is not sufficient)
• How might we solve this problem?
Reduce the resolution!
[From Khurram Hassan-Shafique, CAP5415 Computer Vision 2003]
Coarse-to-fine optical flow estimation
[Figure: Gaussian pyramids of image I_(t-1) and image I_t. A displacement of u = 10 pixels at full resolution shrinks to u = 5, 2.5, and 1.25 pixels at the coarser pyramid levels.]
Run iterative L-K at the coarsest level, then warp & upsample, run iterative L-K at the next level, and repeat down to full resolution.
[Slides from Bradski and Thrun]
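OpenCV bundles exactly this coarse-to-fine scheme; a minimal usage sketch (filenames hypothetical):

```python
import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corners ("good features to track") in the first frame
pts0 = cv2.goodFeaturesToTrack(prev, maxCorners=1000,
                               qualityLevel=0.01, minDistance=8)

# Pyramidal iterative Lucas-Kanade; maxLevel sets the pyramid depth
pts1, status, err = cv2.calcOpticalFlowPyrLK(
    prev, curr, pts0, None, winSize=(21, 21), maxLevel=3)
tracked = pts1[status.ravel() == 1]   # keep successfully tracked points
```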
Next week: Project Proposals