TRANSCRIPT
Perceptual and Sensory Augmented Computing
Visual Object Recognition Tutorial
Visual Object Recognition
Bastian Leibe, Computer Vision Laboratory, ETH Zurich
Kristen Grauman, Department of Computer Sciences, University of Texas at Austin
Chicago, 14.07.2008
K. Grauman, B. Leibe
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Feature Sets
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Global representations: limitations
• Success may rely on alignment -> sensitive to viewpoint
• All parts of the image or window impact the description -> sensitive to occlusion, clutter
Local representations
• Describe component regions or patches separately.
• Many options for detection & description…
Superpixels [Ren et al.]
Shape context [Belongie et al.]
Maximally Stable Extremal Regions [Matas et al.]
Geometric Blur [Berg et al.]
SIFT [Lowe]
Salient regions [Kadir et al.]
Harris-Affine [Schmid et al.]
Spin images [Johnson and Hebert]
Recall: Invariant local features
Subset of local feature types designed to be invariant to:
• Scale
• Translation
• Rotation
• Affine transformations
• Illumination
1) Detect interest points 2) Extract descriptors
Descriptors: [x1, x2, …, xd], [y1, y2, …, yd]
[Mikolajczyk & Schmid, Matas et al., Tuytelaars & Van Gool, Lowe, Kadir et al., …]
Recognition with local feature sets
• Previously, we saw how to use local invariant features + a global spatial model to recognize specific objects, using a planar object assumption.
• Now, we’ll use local features for:
– Indexing-based recognition
– Bags of words representations
– Correspondence / matching kernels
Aachen Cathedral
Basic flow
Detect or sample features -> Describe features (list of positions, scales, orientations, with an associated list of d-dimensional descriptors) -> Index each one into pool of descriptors from previously seen images
Indexing local features
• Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT)
Indexing local features
• When we see close points in feature space, we have similar descriptors, which indicates similar local content.
Figure credit: A. Zisserman
Indexing local features
• We saw in the previous section how to use voting and pose clustering to identify objects using local features
Figure credit: David Lowe
Indexing local features
• With potentially thousands of features per image, and hundreds to millions of images to search, how to efficiently find those that are relevant to a new image?
– Low-dimensional descriptors: can use standard efficient data structures for nearest neighbor search
– High-dimensional descriptors: approximate nearest neighbor search methods more practical
– Inverted file indexing schemes
Indexing local features: approximate nearest neighbor search
Best-Bin First (BBF), a variant of k-d trees that uses a priority queue to examine the most promising branches first [Beis & Lowe, CVPR 1997]
Locality-Sensitive Hashing (LSH), a randomized hashing technique using hash functions that map similar points to the same bin, with high probability [Indyk & Motwani, 1998]
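The LSH idea can be sketched with the random-hyperplane hash family, a common variant for angular/cosine similarity (the original Indyk & Motwani construction uses different hash families); all names and sizes here are illustrative:

```python
import numpy as np

def make_lsh(dim, n_bits, seed=0):
    """Random-hyperplane LSH: the key is the sign pattern of the
    descriptor's projections onto n_bits random directions, so
    descriptors separated by a small angle receive the same key
    with high probability."""
    planes = np.random.default_rng(seed).standard_normal((n_bits, dim))
    return lambda x: tuple((planes @ x > 0).tolist())

h = make_lsh(dim=128, n_bits=16)

# Hash every database descriptor into its bucket.
rng = np.random.default_rng(1)
db = [rng.standard_normal(128) for _ in range(100)]
table = {}
for i, d in enumerate(db):
    table.setdefault(h(d), []).append(i)

# A query descriptor is compared only against its own bucket,
# not against all 100 database descriptors.
query = db[42] + 1e-3 * rng.standard_normal(128)
candidates = table.get(h(query), [])
```

With few bits, dissimilar points collide often; practical systems therefore use several tables with independently drawn hash functions to trade memory for recall.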
Indexing local features: inverted file index
• For text documents, an efficient way to find all pages on which a word occurs is to use an index…
• We want to find all images in which a feature occurs.
• To use this idea, we’ll need to map our features to “visual words”.
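The text-retrieval analogy translates almost directly: each visual word keeps a posting list of the images containing it, and a query touches only the lists for its own words. A minimal sketch (illustrative, not the tutorial's implementation):

```python
from collections import defaultdict

class InvertedIndex:
    """Map each visual word id to the set of images containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_image(self, image_id, word_ids):
        # Record this image under every distinct word it contains.
        for w in set(word_ids):
            self.postings[w].add(image_id)

    def query(self, word_ids):
        """Candidate images sharing at least one word with the query,
        ranked by the number of shared words."""
        votes = defaultdict(int)
        for w in set(word_ids):
            for img in self.postings.get(w, ()):
                votes[img] += 1
        return sorted(votes, key=votes.get, reverse=True)
```

This ranks by raw shared-word counts; real retrieval systems typically weight words by tf-idf before ranking.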
Visual words: main idea
• Extract some local features from a number of images …
e.g., SIFT descriptor space: each point is 128-dimensional
Slide credit: D. Nister
Visual words: main idea
Map high-dimensional descriptors to tokens/words by quantizing the feature space
• Quantize via clustering, let cluster centers be the prototype “words”
Descriptor space
Visual words: main idea
Map high-dimensional descriptors to tokens/words by quantizing the feature space
• Determine which word to assign to each new image region by finding the closest cluster center.
Descriptor space
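Both steps (cluster training descriptors to form the words, then snap each new descriptor to its nearest center) can be sketched with plain k-means; this is a minimal illustration, and real vocabularies use optimized implementations and far larger k:

```python
import numpy as np

def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Plain k-means; the returned cluster centers are the visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest center for every descriptor.
        dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def assign_words(descriptors, centers):
    """Quantize descriptors to the id of the closest visual word."""
    dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    return dists.argmin(axis=1)
```

The pairwise-distance broadcast is O(n·k·d) memory and time per iteration, which is fine for a sketch but motivates the hierarchical schemes discussed later for million-word vocabularies.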
Visual words
• Example: each group of patches belongs to the same visual word
Figure from Sivic & Zisserman, ICCV 2003
Visual words
• First explored for texture and material representations
• Texton = cluster center of filter responses over collection of images
• Describe textures and materials based on distribution of prototypical texture elements.
Leung & Malik 1999; Varma & Zisserman, 2002; Lazebnik, Schmid & Ponce, 2003;
Visual words
• More recently used for describing scenes and objects for the sake of indexing or classification.
Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.
Inverted file index for images comprised of visual words
Image credit: A. Zisserman
Word number
List of image numbers
Bags of visual words
• Summarize entire image based on its distribution (histogram) of word occurrences.
• Analogous to bag of words representation commonly used for documents.
Image credit: Fei-Fei Li
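The bag of words step reduces an image's word ids to a normalized histogram, after which any vector comparison applies. A sketch with made-up word ids, using histogram intersection as one common similarity choice:

```python
import numpy as np

def bag_of_words(word_ids, vocab_size):
    """Normalized histogram of visual word occurrences for one image."""
    hist = np.bincount(np.asarray(word_ids, dtype=int),
                       minlength=vocab_size).astype(float)
    total = hist.sum()
    return hist / total if total else hist

def intersection(h1, h2):
    """Histogram intersection similarity between two word histograms."""
    return float(np.minimum(h1, h2).sum())
```

Because the histogram has the same fixed length for every image, regardless of how many features were detected, it plugs directly into standard vector-based classifiers.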
Video Google System
1. Collect all words within query region
2. Inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification
Sivic & Zisserman, ICCV 2003
• Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
Query region
Retrieved frames
Basic flow
Detect or sample features -> Describe features -> Index each one into pool of descriptors from previously seen images, or quantize to form bag of words vector for the image
Visual vocabulary formation
Issues:
• Sampling strategy
• Clustering / quantization algorithm
• Unsupervised vs. supervised
• What corpus provides features (universal vocabulary?)
• Vocabulary size, number of words
Sampling strategies
Image credits: F-F. Li, E. Nowak, J. Sivic
(Figure: dense, uniform sampling; sparse sampling at interest points; random sampling; multiple interest operators.)
• To find specific, textured objects, sparse sampling from interest points often more reliable.
• Multiple complementary interest operators offer more image coverage.
• For object categorization, dense sampling offers better coverage.
[See Nowak, Jurie & Triggs, ECCV 2006]
Clustering / quantization methods
• k-means (typical choice), agglomerative clustering, mean-shift,…
• Hierarchical clustering: allows faster insertion / word assignment while still allowing large vocabularies
Vocabulary tree [Nister & Stewenius, CVPR 2006]
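The payoff of the tree is in lookup cost: with branch factor b and depth L, quantizing one descriptor takes about b·L distance computations instead of one per leaf word (b^L). A toy, hand-built tree for illustration; real trees come from hierarchical k-means as in [Nister & Stewenius, CVPR'06]:

```python
import numpy as np

def leaf(center):
    return {"center": np.asarray(center, float), "children": []}

def node(center, children):
    return {"center": np.asarray(center, float), "children": children}

def descend(tree, x):
    """Walk down the vocabulary tree: at each node, compare x only to
    that node's few children and recurse into the nearest one.  The
    path of child indices identifies the leaf, i.e. the visual word."""
    path = []
    cur = tree
    while cur["children"]:
        dists = [np.linalg.norm(x - c["center"]) for c in cur["children"]]
        best = int(np.argmin(dists))
        path.append(best)
        cur = cur["children"][best]
    return path

# Branch factor 2, depth 2: four leaf words.
tree = node([0, 0], [
    node([-5, 0], [leaf([-6, -1]), leaf([-4, 1])]),
    node([5, 0], [leaf([4, -1]), leaf([6, 1])]),
])
```

With b = 10 and L = 6 this is 60 comparisons to reach one of a million leaf words, which is what makes the vocabulary sizes in the next slides feasible.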
Example: Recognition with Vocabulary Tree
• Tree construction:
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
• Training: Filling the tree
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
Vocabulary Tree
• Recognition
Slide credit: David Nister
[Nister & Stewenius, CVPR’06]
RANSAC verification
Vocabulary Tree: Performance
• Evaluated on large databases: indexing with up to 1M images
• Online recognition for database of 50,000 CD covers: retrieval in ~1s
• Find experimentally that large vocabularies can be beneficial for recognition
[Nister & Stewenius, CVPR’06]
Vocabulary formation
• Ensembles of trees provide additional robustness
Figure credit: F. Jurie
Moosmann, Jurie, & Triggs 2006; Yeh, Lee, & Darrell 2007; Bosch, Zisserman, & Munoz 2007; …
Supervised vocabulary formation
• Recent work considers how to leverage labeled images when constructing the vocabulary
Perronnin, Dance, Csurka, & Bressan, Adapted Vocabularies for Generic Visual Categorization, ECCV 2006.
Supervised vocabulary formation
• Merge words that don’t aid in discriminability
Winn, Criminisi, & Minka, Object Categorization by Learned Universal Visual Dictionary, ICCV 2005
Supervised vocabulary formation
• Consider vocabulary and classifier construction jointly.
Yang, Jin, Sukthankar, & Jurie, Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition, CVPR 2008.
Learning and recognition with bag of words histograms
• Bag of words representation makes it possible to describe the unordered point set with a single vector (of fixed dimension across image examples)
• Provides easy way to use distribution of feature types with various learning algorithms requiring vector input.
• …including unsupervised topic models designed for documents.
• Hierarchical Bayesian text models (pLSA and LDA)
– Hofmann 2001; Blei, Ng & Jordan 2003
– For object and scene categorization: Sivic et al. 2005, Sudderth et al. 2005, Quelhas et al. 2005, Fei-Fei et al. 2005
Learning and recognition with bag of words histograms
Figure credit: Fei-Fei Li
• …including unsupervised topic models designed for documents.
Learning and recognition with bag of words histograms
Probabilistic Latent Semantic Analysis (pLSA): each observed word w in a document is generated from a latent topic z (plates in the graphical model: Nd words per document, D documents); a discovered topic can correspond to a category such as “face”.
Sivic et al. ICCV 2005 [pLSA code available at: http://www.robots.ox.ac.uk/~vgg/software/]
Figure credit: Fei-Fei Li
Bags of words: pros and cons
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides vector representation for sets
+ has yielded good recognition results in practice
- basic model ignores geometry; must verify afterwards, or encode via features
- background and foreground mixed when bag covers whole image
- interest points or sampling: no guarantee to capture object-level parts
- optimal vocabulary formation remains unclear
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Feature Sets
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Basic flow
Detect or sample features -> Describe features -> Index each one into pool of descriptors from previously seen images, or compute match with another image, or quantize to form bag of words vector for the image
Local feature correspondences
• The matching between sets of local features helps to establish overall similarity between objects or shapes.
• Assigned matches also useful for localization
Shape context [Belongie & Malik 2001]
Low-distortion matching [Berg & Malik 2005]
Match kernel [Wallraven, Caputo & Graf 2003]
Local feature correspondences
• Least cost match: minimize total cost between matched points
• Least cost partial match: match all of smaller set to some portion of larger set.
$\min_{\pi : X \to Y} \sum_{x_i \in X} \left\| x_i - \pi(x_i) \right\|$
Pyramid match kernel (PMK)
• Optimal matching expensive relative to number of features per image (m).
• PMK is approximate partial match for efficient discriminative learning from sets of local features.
Optimal match: O(m³)
Greedy match: O(m² log m)
Pyramid match: O(m)
[Grauman & Darrell, ICCV 2005]
Pyramid match kernel: pyramid extraction
Histogram pyramid: level i has bins of size 2^i (side length doubling at each level)
Pyramid match kernel: counting matches
Histogram intersection
Pyramid match kernel: counting new matches
Difference in histogram intersections across levels counts the number of newly matched pairs:
N_i = I(H_i(X), H_i(Y)) - I(H_{i-1}(X), H_{i-1}(Y))
(matches at this level minus matches at the previous level, where I denotes histogram intersection)
Pyramid match kernel
• For similarity, weight w_i is inversely proportional to bin size at level i, a measure of the difficulty of a match at that level (or the weights may be learned discriminatively)
• Kernel value between histogram pyramids: K(Ψ(X), Ψ(Y)) = Σ_i w_i N_i, where N_i is the number of newly matched pairs at level i
• Normalize kernel values to avoid favoring large sets
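The computation above can be sketched end to end for 1-D point sets; this is a toy illustration (the actual kernel bins the d-dimensional descriptor space), with bins doubling in size per level and weights w_i = 1/2^i:

```python
import numpy as np

def pyramid_match(X, Y, n_levels=4, lo=0.0, hi=16.0):
    """Approximate partial-match similarity between two 1-D point sets.
    At each level, histogram intersection counts matches; only matches
    that are new at that level are credited, discounted by 1/2**i."""
    score, prev, n_bins = 0.0, 0.0, 2 ** n_levels
    for i in range(n_levels + 1):
        edges = np.linspace(lo, hi, n_bins + 1)
        hx, _ = np.histogram(X, bins=edges)
        hy, _ = np.histogram(Y, bins=edges)
        inter = np.minimum(hx, hy).sum()   # matches at this level
        score += (inter - prev) / 2 ** i   # credit only the new matches
        prev = inter
        n_bins //= 2                       # bins double in size next level
    return score
```

An identical pair of sets scores |X| (every point matches at the finest level), while a pair that only co-occurs in the single coarsest bin is discounted by 1/2^n_levels.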
Example pyramid match
Example pyramid match
Example pyramid match
(Figure legend: pyramid match vs. optimal match)
Example pyramid match
Pyramid match kernel
• Forms a Mercer kernel -> allows classification with SVMs, use of other kernel methods
• Bounded error relative to optimal partial match
• Linear time -> efficient learning with large feature sets
Pyramid match kernel
• Forms a Mercer kernel -> allows classification with SVMs, use of other kernel methods
• Bounded error relative to optimal partial match
• Linear time -> efficient learning with large feature sets
(Plots on the ETH-80 data set: accuracy and time (s) vs. mean number of features per image, comparing the pyramid match, O(m), with the match kernel of [Wallraven et al.], O(m²).)
Pyramid match kernel
• Forms a Mercer kernel -> allows classification with SVMs, use of other kernel methods
• Bounded error relative to optimal partial match
• Linear time -> efficient learning with large feature sets
• Use data-dependent pyramid partitions for high-dimensional feature spaces
(Figure: uniform pyramid bins vs. vocabulary-guided pyramid bins)
Code for PMK: http://people.csail.mit.edu/jjl/libpmk/
Matching smoothness & local geometry
• Solving for linear assignment means (non-overlapping) features can be matched independently, ignoring relative geometry.
• One alternative: simply expand feature vectors to include spatial information before matching.
[ f1, …, f128, xa, ya ]   (descriptor for a patch, augmented with its image coordinates xa, ya)
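The augmented vector is a one-liner; the coordinate weight (an illustrative knob, not a value prescribed by the tutorial) controls how strongly spatial displacement counts against appearance difference in the combined Euclidean distance:

```python
import numpy as np

def augment(descriptor, x, y, w=1.0):
    """Append weighted image coordinates to an appearance descriptor so
    that Euclidean distance on the result also penalizes spatial offset."""
    return np.concatenate([np.asarray(descriptor, float), [w * x, w * y]])
```

With w = 0 this reduces to pure appearance matching; large w forces matches to be nearly co-located.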
Spatial pyramid match kernel
• First quantize descriptors into words, then do one pyramid match per word in image coordinate space.
Lazebnik, Schmid & Ponce, CVPR 2006
Matching smoothness & local geometry
• Use correspondence to estimate parameterized transformation, regularize to enforce smoothness
Shape context matching [Belongie, Malik, & Puzicha 2001]
Code: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sc_digits.html
Matching smoothness & local geometry
• Let matching cost include term to penalize distortion between pairs of matched features.
Approximate for efficient solutions: Berg & Malik, CVPR 2005; Leordeanu & Hebert, ICCV 2005
(Figure: template features i, j matched to query features i', j'; the distortion term compares pairwise geometric relations R_ij in the template with S_i'j' in the query. Figure credit: Alex Berg)
Matching smoothness & local geometry
• Compare “semi-local” features: consider configurations or neighborhoods and co-occurrence relationships
Hyperfeatures [Agarwal & Triggs, ECCV 2006]
Correlograms of visual words [Savarese, Winn, & Criminisi, CVPR 2006]
Proximity distribution kernel [Ling & Soatto, ICCV 2007]
Feature neighborhoods [Sivic & Zisserman, CVPR 2004]
Tiled neighborhood [Quack, Ferrari, Leibe, van Gool, ICCV 2007]
Matching smoothness & local geometry
• Learn or provide explicit object-specific shape model [Next in the tutorial : part-based models]
(Figure: object parts x1 … x6 linked by an explicit shape model.)
Summary
• Local features are a useful, flexible representation
– Invariance properties typically built into the descriptor
– Distinctive, especially helpful for identifying specific textured objects
– Breaking image into regions/parts gives tolerance to occlusions and clutter
– Mapping to visual words forms discrete tokens from image regions
• Efficient methods available for:
– Indexing patches or regions
– Comparing distributions of visual words
– Matching features