beyond bags of features: part-based models

Beyond bags of features:Part-based models

Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Implicit shape models• Visual codebook is used to index votes for

object position

B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

training image annotated with object localization info

visual codeword withdisplacement vectors

http://www.pascal-network.org/challenges/VOC/pubs/leibe04.pdf

Implicit shape models• Visual codebook is used to index votes for

object position

B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

test image

http://www.pascal-network.org/challenges/VOC/pubs/leibe04.pdf

Implicit shape models: Training

1. Build codebook of patches around extracted interest points using clustering



2. Map the patch around each interest point to closest codebook entry



2. Map the patch around each interest point to closest codebook entry

3. For each codebook entry, store all positions it was found, relative to object center

Implicit shape models: Testing1. Given test image, extract patches, match to

codebook entry

2. Cast votes for possible positions of object center

3. Search for maxima in voting space

4. Extract weighted segmentation mask based on stored masks for the codebook occurrences

Source: B. Leibe

Original image

Example: Results on Cows

Interest points


Source: B. Leibe


Matched patchesSource: B. Leibe


Probabilistic votesSource: B. Leibe


Hypothesis 1Source: B. Leibe

Additional examples

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, IJCV 77 (1-3), pp. 259-289, 2008.

http://www.mmp.rwth-aachen.de/publications/pdf/leibe-interleaved-ijcv07final.pdf

Generative part-based models

R. Fergus, P. Perona and A. Zisserman, Object Class Recognition by Unsupervised Scale-Invariant Learning, CVPR 2003

http://cs.nyu.edu/~fergus/papers/fergus03.pdf

• Model the probability of an image given a class

What is a generative model?

)|( zebranoimagep)|( zebraimagep vs.

• Maximum a posteriori (MAP) decision: assign the image to the class that gets the highest posterior probability P(class | image)

Making a decision

• Maximum a posteriori (MAP) decision: assign the image to the class that gets the highest posterior probability P(class | image)

• Bayes rule:

Making a decision

)(

)()|()|(

imagep

classpclassimagepimageclassp

Generative vs. discriminative models

• Discriminative methods: model posterior

• Generative methods: model likelihood and prior

)()|()|( classpclassimagepimageclassp

posterior likelihood prior

Generative vs. discriminative models

Generative Discriminative

Cla

ss d

ensi

ties

Pos

terio

r pr

obab

ilitie

s

The Naïve Bayes model

• Generative model for a bag of features

• Assume that each feature fi is conditionally independent given the class c:

N

iiN cfpcffpcimagep

11 )|()|,,()|(

Generative part-based model

Candidate parts

)|(),|(),|(max

)|,,(max

)|,()|(

objecthpobjecthshapepobjecthappearanceP

objecthshapeappearanceP

objectshapeappearancePobjectimageP

h

h

Partdescriptors

Partlocations


h: assignment of features to parts

Part 2

Part 3

Part 1

)|(),|(),|(max

)|,,(max

)|,()|(


objecthshapeappearanceP


h

h


)|(),|(),|(max

)|,()|(



h

High-dimensional appearance space

Distribution over patchdescriptors


)|(),|(),|(max

)|,()|(



h

2D image space

Distribution over jointpart positions

Results: Faces

Faceshapemodel

Patchappearancemodel

Recognitionresults

Results: Motorbikes and airplanes

Pictorial structures

P. Felzenszwalb and D. Huttenlocher, Pictorial Structures for Object Recognition, IJCV 61(1), 2005

• Set of parts (oriented rectangles) connected by edges

• Recognition problem: find the most probable part layout l1, …, ln in the image

http://people.cs.uchicago.edu/~pff/papers/blobrecJ.pdf

Pictorial structures

• MAP formulation: maximize posterior

• Energy-based formulation: minimize minus the log of probability:

Eji

jii

innn llPlPllPllPllP,

111 )|())(Im(),...,(),...,|(ImIm)|,...,(

i ji

jiijiin lldlmllE,

1 ),()(),...,(

Appearance Geometry

Matching cost

Deformation cost

Pictorial structures: Complexity

• Suppose there are n parts, and each has h possible positions in the image

• What is the complexity of finding the optimal layout?

• Brute force search: O(hn)• Tree-structured model: O(h2n)• Deformation cost is quadratic: O(hn)

i ji

jiijiill lldlmn

,,, ),()(minarg

1

beyond bags of features: part-based models

Documents

leibe example

model likelihood

codebook occurrences

object positionb

based model h

object class recognition

combined object categorization

closest codebook entryfor