visual object recognitiongrauman/slides/aaai2008/aaai2008...perceptual and sensory augmented...

70
Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial Visual Object Recognition Bastian Leibe & Computer Vision Laboratory ETH Zurich Chicago, 14.07.2008 Kristen Grauman Department of Computer Sciences University of Texas in Austin

Upload: others

Post on 13-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual Object Recognition

Bastian Leibe &Computer Vision LaboratoryETH Zurich

Chicago, 14.07.2008

Kristen GraumanDepartment of Computer SciencesUniversity of Texas in Austin

Page 2: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

2K. Grauman, B. Leibe

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Feature Sets

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 3: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Global representations: limitations

• Success may rely on alignment -> sensitive to viewpoint

• All parts of the image or window impact the description -> sensitive to occlusion, clutter

3K. Grauman, B. Leibe

Page 4: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Local representations• Describe component regions or patches separately.• Many options for detection & description…

4K. Grauman, B. Leibe

Superpixels[Ren et al.]

Shape context [Belongie et al.]

Maximally Stable Extremal Regions [Matas

et al.]

Geometric Blur [Berg et al.]

SIFT [Lowe]

Salient regions [Kadir et al.]

Harris-Affine [Schmid et al.]

Spin images [Johnson and Hebert]

Page 5: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Recall: Invariant local features

Subset of local feature types designed to be invariant to

ScaleTranslationRotationAffine transformationsIllumination

1) Detect interest points 2) Extract descriptors

x1 x2…xd

y1 y2…yd

[Mikolajczyk & Schmid, Matas et al., Tuytelaars & Van Gool, Lowe, Kadiret al.,… ]

K. Grauman, B. Leibe

Page 6: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Recognition with local feature sets

• Previously, we saw how to use local invariant features + a global spatial model to recognize specific objects, using a planar object assumption.

• Now, we’ll use local features for Indexing-based recognition

Bags of words representations

Correspondence / matching kernels

6K. Grauman, B. Leibe

Aachen Cathedral

Page 7: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Basic flow

7K. Grauman, B. Leibe

Detect or sample features

Describe features

…List of positions,

scales, orientations

Associated list of d-dimensional

descriptors

…Index each one into pool of descriptors from previously seen images

Page 8: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Indexing local features

• Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT)

K. Grauman, B. Leibe

Page 9: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Indexing local features

• When we see close points in feature space, we have similar descriptors, which indicates similar local content.

Figure credit: A. Zisserman K. Grauman, B. Leibe

Page 10: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Indexing local features

• We saw in the previous section how to use voting and pose clustering to identify objects using local features

10K. Grauman, B. Leibe

Figure credit: David Lowe

Page 11: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Indexing local features

• With potentially thousands of features per image, and hundreds to millions of images to search, how to efficiently find those that are relevant to a new image?

Low-dimensional descriptors : can use standard efficient data structures for nearest neighbor search

High-dimensional descriptors: approximate nearest neighbor search methods more practical

Inverted file indexing schemes

11K. Grauman, B. Leibe

Page 12: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Indexing local features: approximate nearest neighbor search

12K. Grauman, B. Leibe

Best-Bin First (BBF), a variant of k-dtrees that uses priority queue to examine most promising branches first [Beis & Lowe, CVPR 1997]

Locality-Sensitive Hashing (LSH), a randomized hashing technique using hash functions that map similar points to the same bin, with high probability [Indyk & Motwani, 1998]

Page 13: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

• For text documents, an efficient way to find all pages on which a word occurs is to use an index…

• We want to find all images in which a feature occurs.

• To use this idea, we’ll need to map our features to “visual words”.

13K. Grauman, B. Leibe

Indexing local features: inverted file index

Page 14: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words: main idea

• Extract some local features from a number of images …

14K. Grauman, B. Leibe

e.g., SIFT descriptor space: each point is 128-dimensional

Slide credit: D. Nister

Page 15: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words: main idea

15K. Grauman, B. LeibeSlide credit: D. Nister

Page 16: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words: main idea

16K. Grauman, B. LeibeSlide credit: D. Nister

Page 17: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words: main idea

17K. Grauman, B. LeibeSlide credit: D. Nister

Page 18: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

18K. Grauman, B. LeibeSlide credit: D. Nister

Page 19: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

19K. Grauman, B. LeibeSlide credit: D. Nister

Page 20: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words: main idea

Map high-dimensional descriptors to tokens/words by quantizing the feature space

20K. Grauman, B. Leibe

• Quantize via clustering, let cluster centers be the prototype “words”

Descriptor space

Page 21: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words: main idea

Map high-dimensional descriptors to tokens/words by quantizing the feature space

21K. Grauman, B. Leibe

• Determine which word to assign to each new image region by finding the closest cluster center.

Descriptor space

Page 22: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words

• Example: each group of patches belongs to the same visual word

22K. Grauman, B. Leibe

Figure from Sivic & Zisserman, ICCV 2003

Page 23: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words

• First explored for texture and material representations

• Texton = cluster center of filter responses over collection of images

• Describe textures and materials based on distribution of prototypical texture elements.

Leung & Malik 1999; Varma & Zisserman, 2002; Lazebnik, Schmid & Ponce, 2003;

Page 24: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual words

24K. Grauman, B. Leibe

• More recently used for describing scenes and objects for the sake of indexing or classification.

Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.

Page 25: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Inverted file index for images comprised of visual words

Image credit: A. Zisserman K. Grauman, B. Leibe

Word number

List of image numbers

Page 26: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Bags of visual words

• Summarize entire image based on its distribution (histogram) of word occurrences.

• Analogous to bag of words representation commonly used for documents.

26K. Grauman, B. LeibeImage credit: Fei-Fei Li

Page 27: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Video Google System

1. Collect all words within query region

2. Inverted file index to find relevant frames

3. Compare word counts4. Spatial verification

Sivic & Zisserman, ICCV 2003

• Demo online at : http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html

27K. Grauman, B. Leibe

Query region

Retrieved frames

Page 28: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Basic flow

28K. Grauman, B. Leibe

Detect or sample features

Describe features

…List of positions,

scales, orientations

Associated list of d-dimensional

descriptors

…Index each one into pool of descriptors from previously seen images

Quantize to form bag of words vector for the image

or

Page 29: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Visual vocabulary formation

Issues:• Sampling strategy• Clustering / quantization algorithm• Unsupervised vs. supervised• What corpus provides features (universal vocabulary?)• Vocabulary size, number of words

29K. Grauman, B. Leibe

Page 30: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Sampling strategies

30K. Grauman, B. LeibeImage credits: F-F. Li, E. Nowak, J. Sivic

Dense, uniformly Sparse, at interest points

Randomly

Multiple interest operators

• To find specific, textured objects, sparse sampling from interest points often more reliable.

• Multiple complementary interest operators offer more image coverage.

• For object categorization, dense sampling offers better coverage.

[See Nowak, Jurie & Triggs, ECCV 2006]

Page 31: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Clustering / quantization methods

• k-means (typical choice), agglomerative clustering, mean-shift,…

• Hierarchical clustering: allows faster insertion / word assignment while still allowing large vocabularies

Vocabulary tree [Nister & Stewenius, CVPR 2006]

31K. Grauman, B. Leibe

Page 32: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

32K. Grauman, B. Leibe

Example: Recognition with Vocabulary Tree

• Tree construction:

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 33: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

33K. Grauman, B. Leibe

Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 34: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

34K. Grauman, B. Leibe

Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 35: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

35K. Grauman, B. Leibe

Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 36: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

36K. Grauman, B. Leibe

Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 37: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

37K. Grauman, B. Leibe

Vocabulary Tree

• Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 38: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

38K. Grauman, B. Leibe

Vocabulary Tree

• Recognition

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

RANSACverification

Page 39: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

39K. Grauman, B. Leibe

Vocabulary Tree: Performance

• Evaluated on large databasesIndexing with up to 1M images

• Online recognition for databaseof 50,000 CD covers

Retrieval in ~1s

• Find experimentally that large vocabularies can be beneficial for recognition

[Nister & Stewenius, CVPR’06]

Page 40: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Vocabulary formation

• Ensembles of trees provide additional robustness

Figure credit: F. Jurie K. Grauman, B. Leibe

Moosmann, Jurie, & Triggs 2006; Yeh, Lee, & Darrell 2007; Bosch, Zisserman, & Munoz 2007; …

Page 41: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Supervised vocabulary formation

• Recent work considers how to leverage labeled images when constructing the vocabulary

41K. Grauman, B. Leibe

Perronnin, Dance, Csurka, & Bressan, Adapted Vocabularies for Generic Visual Categorization, ECCV 2006.

Page 42: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Supervised vocabulary formation

• Merge words that don’t aid in discriminability

Winn, Criminisi, & Minka, Object Categorization by Learned Universal Visual Dictionary, ICCV 2005

Page 43: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Supervised vocabulary formation

• Consider vocabulary and classifier construction jointly.

43K. Grauman, B. Leibe

Yang, Jin, Sukthankar, & Jurie, Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition, CVPR 2008.

Page 44: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Learning and recognition with bag of words histograms• Bag of words representation makes it possible to

describe the unordered point set with a single vector (of fixed dimension across image examples)

• Provides easy way to use distribution of feature types with various learning algorithms requiring vector input.

44K. Grauman, B. Leibe

Page 45: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

• …including unsupervised topic models designed for documents.

• Hierarchical Bayesian text models (pLSA and LDA)– Hoffman 2001, Blei, Ng & Jordan, 2004– For object and scene categorization: Sivic et al. 2005,

Sudderth et al. 2005, Quelhas et al. 2005, Fei-Fei et al. 2005

45K. Grauman, B. Leibe

Learning and recognition with bag of words histograms

Figure credit: Fei-Fei Li

Page 46: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

• …including unsupervised topic models designed for documents.

46K. Grauman, B. Leibe

Learning and recognition with bag of words histograms

Probabilistic Latent Semantic Analysis (pLSA)w

Nd z

D

“face”

Sivic et al. ICCV 2005[pLSA code available at: http://www.robots.ox.ac.uk/~vgg/software/]

Figure credit: Fei-Fei Li

Page 47: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Bags of words: pros and cons

+ flexible to geometry / deformations / viewpoint+ compact summary of image content+ provides vector representation for sets+ has yielded good recognition results in practice

- basic model ignores geometry – must verify afterwards, or encode via features

- background and foreground mixed when bag covers whole image

- interest points or sampling: no guarantee to capture object-level parts

- optimal vocabulary formation remains unclear47

K. Grauman, B. Leibe

Page 48: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

48K. Grauman, B. Leibe

Outline

1. Detection with Global Appearance & Sliding Windows

2. Local Invariant Features: Detection & Description

3. Specific Object Recognition with Local Features

― Coffee Break ―

4. Visual Words: Indexing, Bags of Words Categorization

5. Matching Local Feature Sets

6. Part-Based Models for Categorization

7. Current Challenges and Research Directions

Page 49: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Basic flow

49K. Grauman, B. Leibe

Detect or sample features

Describe features

…List of positions,

scales, orientations

Associated list of d-dimensional

descriptors

…Index each one into pool of descriptors from previously seen images

Compute match with another image

orQuantize to form bag of words vector for the image

or

Page 50: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Local feature correspondences

• The matching between sets of local features helps to establish overall similarity between objects or shapes.

• Assigned matches also useful for localization

50K. Grauman, B. Leibe

Shape context [Belongie & Malik 2001]

Low-distortion matching [Berg & Malik 2005] Match kernel [Wallraven, Caputo & Graf 2003]

Page 51: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Local feature correspondences

• Least cost match: minimize total cost between matched points

• Least cost partial match: match all of smaller set to some portion of larger set.

∑∈

→−

Xxii

YXi

xx )(min:

ππ

Page 52: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Pyramid match kernel (PMK)

• Optimal matching expensive relative to number of features per image (m).

• PMK is approximate partial match for efficient discriminative learning from sets of local features.

52K. Grauman, B. Leibe

Optimal match: O(m3)Greedy match: O(m2 log m)Pyramid match: O(m)

[Grauman & Darrell, ICCV 2005]

Page 53: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Pyramid match kernel: pyramid extraction

53K. Grauman

,

Histogram pyramid: level i has bins of size

Page 54: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Pyramid match kernel: counting matches

54K. Grauman

Histogram intersection

Page 55: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Pyramid match kernel: counting new matches

55K. Grauman

Difference in histogram intersections across levels counts number of new pairs matched

matches at this level matches at previous level

Histogram intersection

Page 56: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Pyramid match kernel

56K. Grauman

• For similarity, weights inversely proportional to bin size (or may be learned discriminatively)

• Normalize kernel values to avoid favoring large sets

measure of difficulty of a match at level i

histogram pyramids

number of newly matched pairs at level i

Page 57: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Example pyramid match

57K. Grauman

Page 58: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Example pyramid match

58K. Grauman

Page 59: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Example pyramid match

59K. Grauman

Page 60: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

pyramid match

optimal match

Example pyramid match

K. Grauman

Page 61: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Pyramid match kernel

• Forms a Mercer kernel -> allows classification with SVMs, use of other kernel methods

• Bounded error relative to optimal partial match• Linear time -> efficient learning with large feature sets

K. Grauman, B. Leibe

Page 62: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Pyramid match kernel

• Forms a Mercer kernel -> allows classification with SVMs, use of other kernel methods

• Bounded error relative to optimal partial match• Linear time -> efficient learning with large feature sets

Acc

urac

y

Mean number of featuresTi

me

(s)

Mean number of features

ETH

-80 data set

Pyramid match

Match [Wallraven et al.] O(m2)

O(m)

Page 63: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Pyramid match kernel

• Forms a Mercer kernel -> allows classification with SVMs, use of other kernel methods

• Bounded error relative to optimal partial match• Linear time -> efficient learning with large feature sets• Use data-dependent pyramid partitions for high-d

feature spaces

Uniform pyramid bins Vocabulary-guided pyramid bins

Code for PMK: http://people.csail.mit.edu/jjl/libpmk/

Page 64: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Matching smoothness & local geometry

• Solving for linear assignment means (non-overlapping) features can be matched independently, ignoring relative geometry.

• One alternative: simply expand feature vectors to include spatial information before matching.

64K. Grauman, B. Leibe

[ f1,…,f128, ]

xa

ya

xa, ya

Page 65: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Spatial pyramid match kernel

• First quantize descriptors into words, then do one pyramid match per word in image coordinate space.

Lazebnik, Schmid & Ponce, CVPR 2006

K. Grauman, B. Leibe

Page 66: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Matching smoothness & local geometry

• Use correspondence to estimate parameterized transformation, regularize to enforce smoothness

Shape context matching [Belongie, Malik, & Puzicha 2001]

K. Grauman, B. Leibe

Code: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sc_digits.html

Page 67: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Matching smoothness & local geometry

• Let matching cost include term to penalize distortion between pairs of matched features.

Approximate for efficient solutions: Berg & Malik, CVPR 2005; Leordeanu & Hebert, ICCV 2005

i i 'i i'

j j'QueryTemplate

RijSi'j'

Figure credit: Alex Berg K. Grauman, B. Leibe

Page 68: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Matching smoothness & local geometry• Compare “semi-local” features: consider configurations

or neighborhoods and co-occurrence relationships

K. Grauman, B. Leibe

Hyperfeatures: Agarwal & Triggs, ECCV 2006]

Correlograms of visual words [Savarese, Winn, & Criminisi, CVPR 2006]

Proximity distribution kernel [Ling & Soatto, ICCV 2007]

Feature neighborhoods [Sivic& Zisserman, CVPR 2004]

Tiled neighborhood [Quack, Ferrari, Leibe, van Gool ICCV 2007]

Page 69: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Matching smoothness & local geometry

• Learn or provide explicit object-specific shape model [Next in the tutorial : part-based models]

x1

x3

x4

x6

x5

x2

Page 70: Visual Object Recognitiongrauman/slides/aaai2008/aaai2008...Perceptual and Sensory Augmented ComputingVisual Object Recognition Tutorial 2 K. Grauman, B. Leibe Outline 1. Detection

Perc

eptu

al a

nd S

enso

ry A

ugm

ente

d Co

mpu

ting

Visu

al O

bjec

t Rec

ogni

tion

Tuto

rial

Summary

• Local features are a useful, flexible representationInvariance properties - typically built into the descriptorDistinctive, especially helpful for identifying specific textured objectsBreaking image into regions/parts gives tolerance to occlusions and clutterMapping to visual words forms discrete tokens from image regions

• Efficient methods available forIndexing patches or regionsComparing distributions of visual wordsMatching features

70K. Grauman, B. Leibe