perceptual and sensory augmented computing visual object recognition tutorial visual object...
TRANSCRIPT
Perceptual and Sensory Augmented Computing: Visual Object Recognition Tutorial
Visual Object Recognition
Bastian Leibe, Computer Vision Laboratory, ETH Zurich
Kristen Grauman, Department of Computer Sciences, University of Texas at Austin
Chicago, 14.07.2008
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Features
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Recognition with Local Features
• Image content is transformed into local features that are invariant to translation, rotation, and scale
• Goal: Verify if they belong to a consistent configuration
Local Features, e.g. SIFT
Slide credit: David Lowe
Finding Consistent Configurations
• Global spatial models
  – Generalized Hough Transform [Lowe99]
  – RANSAC [Obdrzalek02, Chum05, Nister06]
  – Basic assumption: object is planar
• Assumption is often justified in practice
  – Valid for many structures on buildings
  – Sufficient for small viewpoint variations on 3D objects
Hough Transform
• Origin: Detection of straight lines in clutter
  – Basic idea: each candidate point votes for all lines that it is consistent with.
  – Votes are accumulated in a quantized array.
  – Local maxima correspond to candidate lines.
• Representation of a line
  – Usual form y = a x + b has a singularity around 90°.
  – Better parameterization: x cos(θ) + y sin(θ) = ρ
[Figure: a line in the (x, y) plane, described by its angle θ and distance ρ]
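A minimal sketch of the voting scheme on this slide, assuming the input is a list of (x, y) points. The function names, the 180 angle bins, and the ρ bin width are illustrative choices and do not come from the slides.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in a quantized (theta, rho) array for 2D points."""
    votes = Counter()
    for x, y in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta              # theta in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(i, round(rho / rho_step))] += 1     # one vote per consistent line
    return votes

def strongest_line(votes, n_theta=180, rho_step=1.0):
    """Return (theta, rho, vote_count) for the accumulator cell with the most votes."""
    (i, j), count = votes.most_common(1)[0]
    return math.pi * i / n_theta, j * rho_step, count

# Ten points on the vertical line x = 5, which y = a x + b cannot represent.
points = [(5.0, 10.0 * k) for k in range(10)]
theta, rho, count = strongest_line(hough_lines(points))
print(theta, rho, count)
```

The vertical line shows up as a single cell at θ = 0, ρ = 5, which is the case the slope-intercept form cannot express. A real detector would look for all local maxima rather than only the global one.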
Hough Transform: Noisy Line
• Problem: Finding the true maximum
[Figure: panels "Tokens" and "Votes"; vote space axes θ and ρ]
Slide credit: David Lowe
Hough Transform: Noisy Input
• Problem: Lots of spurious maxima
[Figure: panels "Tokens" and "Votes"; vote space axes θ and ρ]
Slide credit: David Lowe
Generalized Hough Transform [Ballard81]
• Generalization for an arbitrary contour or shape
  – Choose a reference point for the contour (e.g. center).
  – For each point on the contour, remember where it is located w.r.t. the reference point.
  – Remember radius r and angle relative to the contour tangent.
  – Recognition: whenever you find a contour point, calculate the tangent angle and ‘vote’ for all possible reference points.
  – Instead of the reference point, one can also vote for a transformation.
• The same idea can be used with local features!
Slide credit: Bernt Schiele
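A simplified sketch of the lookup-table idea, under strong assumptions: the shape appears at a fixed orientation and scale, contour points come with a known tangent angle in degrees, and the table stores plain displacement vectors instead of the radius and relative angle mentioned on the slide. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def angle_bin(angle_deg, n_bins=8):
    """Quantize a tangent angle (degrees) into one of n_bins bins."""
    return int(angle_deg % 360 // (360 / n_bins))

def build_r_table(contour, reference, n_bins=8):
    """Index displacement vectors to the reference point by tangent-angle bin."""
    table = defaultdict(list)
    for (x, y), angle_deg in contour:
        table[angle_bin(angle_deg, n_bins)].append((reference[0] - x, reference[1] - y))
    return table

def vote_for_reference(edge_points, table, n_bins=8):
    """Each edge point votes for every reference position stored under its angle bin."""
    votes = Counter()
    for (x, y), angle_deg in edge_points:
        for dx, dy in table.get(angle_bin(angle_deg, n_bins), []):
            votes[(x + dx, y + dy)] += 1
    return votes

# Toy model: four contour points around the reference point (1, 1).
model = [((0, 0), 0), ((2, 0), 90), ((2, 2), 180), ((0, 2), 270)]
table = build_r_table(model, reference=(1, 1))

# Test image: the same shape shifted by (10, 5), plus one clutter point.
image = [((10, 5), 0), ((12, 5), 90), ((12, 7), 180), ((10, 7), 270), ((7, 7), 0)]
best_position, support = vote_for_reference(image, table).most_common(1)[0]
print(best_position, support)
```

The four shape points agree on one reference position, while the clutter point casts an isolated vote elsewhere.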
Gen. Hough Transform with Local Features
• For every feature, store possible “occurrences”
  – Object identity
  – Pose
  – Relative position
• For new image, let the matched features vote for possible object positions
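The voting step can be sketched as follows. This is an illustration only: the feature ids, object names, and offsets are made up, pose is ignored (translation only), and the bin size of 4 pixels is an arbitrary choice.

```python
from collections import Counter

# Hypothetical training data: for each feature id, the stored "occurrences"
# as (object identity, offset from the feature to the object center).
occurrences = {
    "f1": [("mug", (10, 0))],
    "f2": [("mug", (-10, 0)), ("book", (0, 5))],
    "f3": [("mug", (0, -8))],
}

def vote_for_positions(matches, occurrences, bin_size=4):
    """Let each matched feature vote for (object, quantized center position)."""
    votes = Counter()
    for feature_id, (x, y) in matches:
        for obj, (dx, dy) in occurrences.get(feature_id, []):
            cx, cy = x + dx, y + dy
            votes[(obj, cx // bin_size, cy // bin_size)] += 1
    return votes

# Matched features in a new image, as (feature id, image position).
matches = [("f1", (40, 50)), ("f2", (60, 50)), ("f3", (50, 58)), ("f2", (5, 5))]
votes = vote_for_positions(matches, occurrences)
best, support = votes.most_common(1)[0]
print(best, support)
```

Three matches agree on a mug centered near (50, 50), so that bin collects three votes. The ambiguous feature "f2" also votes for a book, but those votes stay isolated.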
3D Object Recognition
• Gen. HT for Recognition
  – Typically only 3 feature matches needed for recognition
  – Extra matches provide robustness
  – Affine model can be used for planar objects
Slide credit: David Lowe
[Lowe99]
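Why three matches are enough for the affine case: a 2D affine transform has six unknowns, and each point match contributes two equations. The sketch below solves that system exactly for three matches with Cramer's rule. The function names and sample coordinates are illustrative; with extra matches, a least-squares solve would be the usual choice.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def affine_from_three_matches(src, dst):
    """Return the 2x3 matrix A with dst = A @ [x, y, 1] for three point matches."""
    m = [[x, y, 1.0] for x, y in src]
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three model points are collinear")
    rows = []
    for k in range(2):                       # k = 0: first row, k = 1: second row
        rhs = [p[k] for p in dst]
        coeffs = []
        for col in range(3):                 # Cramer's rule, one unknown at a time
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            coeffs.append(det3(mc) / d)
        rows.append(coeffs)
    return rows

def apply_affine(a, point):
    """Map a 2D point through the 2x3 affine matrix a."""
    x, y = point
    return (a[0][0] * x + a[0][1] * y + a[0][2],
            a[1][0] * x + a[1][1] * y + a[1][2])

# Model points and their matched image positions (scale 2 and 3, shift (2, 3)).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (4.0, 3.0), (2.0, 6.0)]
A = affine_from_three_matches(src, dst)
print(apply_affine(A, (1.0, 1.0)))
```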
View Interpolation
• Training
  – Training views from similar viewpoints are clustered based on feature matches.
  – Matching features between adjacent views are linked.
• Recognition
  – Feature matches may be spread over several training viewpoints.
  – Use the known links to “transfer votes” to other viewpoints.
Slide credit: David Lowe
[Lowe01]
Recognition Using View Interpolation
Slide credit: David Lowe
[Lowe01]
Location Recognition
Slide credit: David Lowe
Training
[Lowe04]
Applications
• Sony Aibo (Evolution Robotics)
• SIFT usage
  – Recognize docking station
  – Communicate with visual cards
• Other uses
  – Place recognition
  – Loop closure in SLAM
Slide credit: David Lowe
RANSAC (RANdom SAmple Consensus) [Fischler81]
• Randomly choose a minimal subset of data points necessary to fit a model (a sample)
• Points within some distance threshold t of model are a consensus set. Size of consensus set is model’s support.
• Repeat for N samples; model with biggest support is most robust fit
  – Points within distance t of best model are inliers
  – Fit final model to all inliers
Slide credit: David Lowe
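A minimal sketch of the loop described on this slide, applied to line fitting. It assumes distinct 2D input points. The threshold, sample count, seed, and toy data are arbitrary illustrative values, and the final refit over all inliers is left out.

```python
import math
import random

def line_through(p, q):
    """Line a*x + b*y + c = 0 through two distinct points, with (a, b) of unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def distance(line, point):
    """Perpendicular distance from a point to a normalized line."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c)

def ransac_line(points, t=0.5, n_samples=100, seed=0):
    """Return the line with the largest consensus set, and that set."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_samples):
        p, q = rng.sample(points, 2)             # minimal subset: 2 points define a line
        line = line_through(p, q)
        consensus = [pt for pt in points if distance(line, pt) <= t]
        if len(consensus) > len(best_inliers):   # support = size of the consensus set
            best_line, best_inliers = line, consensus
    return best_line, best_inliers

# 20 points on y = 2x + 1, plus 5 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (10.0, -5.0), (15.0, 2.0), (7.0, 60.0), (18.0, 0.0)]
line, inliers = ransac_line(points)
print(len(inliers))
```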
RANSAC: How many samples?
• How many samples are needed?
  – Suppose w is fraction of inliers (points from line).
  – n points needed to define hypothesis (2 for lines).
  – k samples chosen.
• Prob. that a single sample of n points is correct: w^n
• Prob. that all samples fail is: (1 − w^n)^k
  – Choose k high enough to keep this below desired failure rate.
Slide credit: David Lowe
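Solving (1 − w^n)^k ≤ p_fail for k gives k ≥ log(p_fail) / log(1 − w^n). A small sketch of that calculation follows; the function name and the example values (50% inliers, 1% failure rate) are illustrative assumptions.

```python
import math

def num_samples(w, n, p_fail):
    """Smallest k such that (1 - w**n)**k <= p_fail, for 0 < w <= 1 and 0 < p_fail < 1."""
    p_good = w ** n                  # probability that one sample is all inliers
    if p_good >= 1.0:
        return 1                     # every sample is good
    return math.ceil(math.log(p_fail) / math.log(1.0 - p_good))

# Line fitting (n = 2) with half of the points being inliers.
k = num_samples(w=0.5, n=2, p_fail=0.01)
print(k)  # 17
```

The required k grows quickly as w drops or n rises, since w^n shrinks in both cases.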
After RANSAC
• RANSAC divides data into inliers and outliers and yields estimate computed from minimal set of inliers
• Improve this initial estimate with estimation over all inliers (e.g. with standard least-squares minimization)
• But this may change inliers, so alternate fitting with re-classification as inlier/outlier
Slide credit: David Lowe
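The alternation between fitting and re-classification can be sketched as below. For brevity this uses ordinary least squares on y = m x + b with vertical residuals, which assumes a non-vertical line. The threshold, iteration cap, and toy data are illustrative.

```python
def fit_line_least_squares(points):
    """Ordinary least-squares fit of y = m*x + b (assumes the x values are not all equal)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def refine(points, model, t=1.0, max_iters=10):
    """Alternate inlier classification and least-squares fitting until the inlier set is stable."""
    inliers = None
    for _ in range(max_iters):
        m, b = model
        new_inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) <= t]
        if new_inliers == inliers or len(new_inliers) < 2:
            break                                # converged, or too few points to fit
        inliers = new_inliers
        model = fit_line_least_squares(inliers)
    return model, inliers

# Ten points on y = 2x + 1, two outliers, and a slightly wrong initial estimate
# such as one computed from a minimal sample.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(2.0, 30.0), (8.0, -20.0)]
(m, b), inliers = refine(points, model=(1.9, 1.3))
print(m, b, len(inliers))
```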
Example: Finding Feature Matches
• Find best stereo match within a square search window (here 300 pixels²)
• Global transformation model: epipolar geometry
from Hartley & Zisserman
Slide credit: David Lowe
[Figure: feature matches before RANSAC and after RANSAC]
Comparison
Gen. Hough Transform
• Advantages
  – Very effective for recognizing arbitrary shapes or objects
  – Can handle high percentage of outliers (>95%)
  – Extracts groupings from clutter in linear time
• Disadvantages
  – Quantization issues
  – Only practical for small number of dimensions (up to 4)
• Improvements available
  – Probabilistic Extensions
  – Continuous Voting Space

RANSAC
• Advantages
  – General method suited to large range of problems
  – Easy to implement
  – Independent of number of dimensions
• Disadvantages
  – Only handles moderate number of outliers (<50%)
• Many variants available, e.g.
  – PROSAC: Progressive RANSAC [Chum05]
  – Preemptive RANSAC [Nister05]
[Leibe08]
Example Applications
Mobile tourist guide
• Self-localization
• Object/building recognition
• Photo/video augmentation
Aachen Cathedral
[Quack, Leibe, Van Gool, CIVR’08]
Web Demo: Movie Poster Recognition
http://www.kooaba.com/en/products_engine.html#
50’000 movie posters indexed
Query-by-image from mobile phone available in Switzerland
Application: Large-Scale Retrieval
[Philbin CVPR’07]
Query Results from 5k Flickr images (demo available for 100k set)
Application: Image Auto-Annotation
Left: Wikipedia image. Right: closest match from Flickr.
[Quack CIVR’08]
Examples shown: Moulin Rouge, Tour Montparnasse, Colosseum, Viktualienmarkt Maypole, Old Town Square (Prague)
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Features
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions