![Page 1: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/1.jpg)
Lecture 17 -Fei-Fei Li
Lecture 17:
object detection
Professor Fei-Fei Li
Stanford Vision Lab
30-Nov-111
![Page 2: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/2.jpg)
Lecture 17 -Fei-Fei Li
Object detection
30-Nov-112
![Page 3: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/3.jpg)
Lecture 17 -Fei-Fei Li
What we will learn today?
• Implicit Shape Model
– Representation
– Recognition
– Experiments and results
• Deformable Models
– The PASCAL challenge
– Latent SVM Model
30-Nov-113
![Page 4: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/4.jpg)
Lecture 17 -Fei-Fei Li
What we will learn today?
• Implicit Shape Model
– Representation
– Recognition
– Experiments and results
• Deformable Models
– The PASCAL challenge
– Latent SVM Model
30-Nov-114
![Page 5: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/5.jpg)
Lecture 17 -Fei-Fei Li
Implicit Shape Model (ISM)
• Basic ideas
– Learn an appearance codebook
– Learn a star-topology structural model
• Features are considered independent given obj. center
• Algorithm: probabilistic Gen. Hough Transform
– Exact correspondences → Prob. match to object part
– NN matching → Soft matching
– Feature location on obj. → Part location distribution
– Uniform votes → Probabilistic vote weighting
– Quantized Hough array → Continuous Hough space
x1
x3
x4
x6
x5
x2
Source: Bastian Leibe
30-Nov-115
![Page 6: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/6.jpg)
Lecture 17 -Fei-Fei Li
Implicit Shape Model: Basic Idea
• Visual vocabulary is used to index votes for object
position [a visual word = “part”].
Training image
Visual codeword with
displacement vectors
Source: Bastian Leibe
B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and
Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.
30-Nov-116
![Page 7: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/7.jpg)
Lecture 17 -Fei-Fei Li
• Objects are detected as consistent configurations of
the observed parts (visual words).
Test image
Implicit Shape Model: Basic Idea
Source: Bastian Leibe
B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and
Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.
30-Nov-117
![Page 8: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/8.jpg)
Lecture 17 -Fei-Fei Li
Implicit Shape Model - Representation
• Learn appearance codebook
– Extract local features at interest points
– Agglomerative clustering ⇒ codebook
• Learn spatial distributions
– Match codebook to training images
– Record matching positions on object
Training images(+reference segmentation)
Appearance codebook…
………
…
Spatial occurrence distributionsx
y
sx
y
s
x
y
sx
y
s
+ local figure-ground labelsSource: Bastian Leibe
30-Nov-118
![Page 9: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/9.jpg)
Lecture 17 -Fei-Fei Li
Implicit Shape Model - Recognition
Interest Points Matched Codebook Entries
Probabilistic Voting
3D Voting Space(continuous)
x
y
s
Object Position
o,x
Image Feature
f
Interpretation(Codebook match)
Ci
)( fCp i ),,( lin Cxop
Probabilistic vote weighting(will be explained later in detail)
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
30-Nov-119
![Page 10: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/10.jpg)
Lecture 17 -Fei-Fei Li
Implicit Shape Model - Recognition[L
eib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
BackprojectedHypotheses
Interest Points Matched Codebook Entries
Probabilistic Voting
3D Voting Space(continuous)
x
y
s
Backprojectionof Maxima
30-Nov-1110
![Page 11: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/11.jpg)
Lecture 17 -Fei-Fei Li
Original image
Example: Results on Cows
Source: Bastian Leibe
30-Nov-1111
![Page 12: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/12.jpg)
Lecture 17 -Fei-Fei Li
Original imageInterest points
Example: Results on Cows
Source: Bastian Leibe
30-Nov-1112
![Page 13: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/13.jpg)
Lecture 17 -Fei-Fei Li
Original imageInterest pointsMatched patches
Example: Results on Cows
Source: Bastian Leibe
30-Nov-1113
![Page 14: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/14.jpg)
Lecture 17 -Fei-Fei Li
Example: Results on Cows
Prob. VotesSource: Bastian Leibe
30-Nov-1114
![Page 15: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/15.jpg)
Lecture 17 -Fei-Fei Li
1st hypothesis
Example: Results on Cows
So
urc
e:
K.
Gra
um
an
& B
. Le
ibe
30-Nov-1115
![Page 16: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/16.jpg)
Lecture 17 -Fei-Fei Li
2nd hypothesis
Example: Results on Cows
Source: Bastian Leibe
30-Nov-1116
![Page 17: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/17.jpg)
Lecture 17 -Fei-Fei Li
Example: Results on Cows
3rd hypothesisSource: Bastian Leibe
30-Nov-1117
![Page 18: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/18.jpg)
Lecture 17 -Fei-Fei Li
• Scale-invariant feature selection– Scale-invariant interest points
– Rescale extracted patches
– Match to constant-size codebook
• Generate scale votes– Scale as 3rd dimension in voting space
– Search for maxima in 3D voting space
Scale Invariant Voting
Search window
x
y
s
Source: Bastian Leibe
30-Nov-1118
![Page 19: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/19.jpg)
Lecture 17 -Fei-Fei Li
Scale Voting: Efficient Computation
• Continuous Generalized Hough Transform
� Binned accumulator array similar to standard Gen. Hough Transf.
� Quickly identify candidate maxima locations
� Refine locations by Mean-Shift search only around those points
⇒ Avoid quantization effects by keeping exact vote locations.
⇒ Mean-shift interpretation as kernel prob. density estimation.
y
s
xRefinement(Mean-Shift)
y
s
xCandidatemaxima
y
s
Scale votesx
y
s
Binned accum. array
x
Source: Bastian Leibe
30-Nov-1119
![Page 20: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/20.jpg)
Lecture 17 -Fei-Fei Li
• Scale-adaptive Mean-Shift search for refinement
– Increase search window size with hypothesis scale
– Scale-adaptive balloon density estimatorThis image cannot currently be displayed.
Scale Voting: Efficient Computation
y
s
xRefinement(Mean-Shift)
y
s
xCandidatemaxima
y
s
Scale votesx
y
s
Binned accum. array
x
Source: Bastian Leibe
30-Nov-1120
![Page 21: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/21.jpg)
Lecture 17 -
Detection Results
• Qualitative Performance
– Recognizes different kinds of objects
– Robust to clutter, occlusion, noise, low contrast
Source: Bastian Leibe
21
![Page 22: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/22.jpg)
Lecture 17 -Fei-Fei Li
Figure-Ground Segregation
• What happens first – segmentation or recognition?
• Problem extensively studied in Psychophysics
• Experiments with ambiguousfigure-ground stimuli
• Results:
– Evidence that object recognition canand does operate before figure-ground organization
– Interpreted as Gestalt cue familiarity.
M.A. Peterson, “Object Recognition Processes Can an d Do Operate Before Figure-Ground Organization”, Cur. Dir. in Psych. Sc., 3:105-111, 1994.
30-Nov-1122
![Page 23: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/23.jpg)
Lecture 17 -Fei-Fei Li
ISM – Top-Down Segmentation
BackprojectedHypotheses
Interest Points Matched Codebook Entries
Probabilistic Voting
Segmentation3D Voting Space
(continuous)
x
y
s
Backprojectionof Maxima
p(figure)Probabilities
[Leibe, Leonardis, Schiele, SLCV’04; IJCV’08]
30-Nov-1123
![Page 24: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/24.jpg)
Lecture 17 -Fei-Fei Li
Top-Down Segmentation: Motivation
• Secondary hypotheses (“mixtures of cars/cows/etc.”)– Desired property of algorithm! ⇒ robustness to occlusion
– Standard solution: reject based on bounding box overlap
⇒ Problematic - may lead to missing detections!
⇒ Use segmentations to resolve ambiguities instead.
– Basic idea: each observed pixel can only be explained by (at most) one detection.
Source: Bastian Leibe
30-Nov-1124
![Page 25: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/25.jpg)
Lecture 17 -Fei-Fei Li
• Secondary hypotheses (“mixtures of cars/cows/etc.”)– Desired property of algorithm! ⇒ robustness to occlusion
– Standard solution: reject based on bounding box overlap
⇒ Problematic - may lead to missing detections!
⇒ Use segmentations to resolve ambiguities instead.
– Basic idea: each observed pixel can only be explained by (at most) one detection.
Top-Down Segmentation: Motivation
Source: Bastian Leibe
30-Nov-1125
![Page 26: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/26.jpg)
Lecture 17 -Fei-Fei Li
Segmentation: Probabilistic Formulation
• Influence of patch on object hypothesis (vote weight)
( ) ( ) ( ) ( )( )xop
f,pfCpCxopxofp
n
i iin
n ,
||,,,
∑=l
l
( ) ( ) ( )∑∈
===),(
,|,,,,|,|l
ll
fnnn xofpxoffigurepxofigurep
p
pp
• Backprojection to features f and pixels p:
Segmentationinformation
Influence on object hypothesis
[Leibe, Leonardis, Schiele, SLCV’04; IJCV’08]
30-Nov-1126
![Page 27: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/27.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Recognition
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
Object location
on,x
Image Feature fat location l
f
Codebook matches
Ci
)( fCp i ),,( lin Cxop
Matching probability
Occurrence distribution
30-Nov-1127
![Page 28: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/28.jpg)
Lecture 17 -Fei-Fei Li
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
� Probability that object on occurs at location x given (f,l)
( , , )ni
p o x f =∑l
)( fCp i
Matching probability
),,( lin Cxop
Occurrence distribution
)( fCp i
Matching probability
),,( lin Cxop
Occurrence distribution
Derivation: ISM Recognition
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
30-Nov-1128
![Page 29: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/29.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Recognition
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
� Probability that object on occurs at location x given (f,l)
� How to measure those probabilities?
( , , )ni
p o x f =∑l
{ }1( ) , where | ( , )
| |i i ip C f C C d C fC
θ= = ≤
1( , , )
# ( )n ii
p o x Coccurrences C
=l
)( fCp i ),,( lin Cxop
θ f
Activatedcodebook entries
30-Nov-1129
![Page 30: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/30.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Recognition
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
� Probability that object on occurs at location x given (f,l)
� Likelihood of the observed features given the object hypothesis
( , , )ni
p o x f =∑l )( fCp i ),,( lin Cxop
( ) ( ) ( )( )
( ) ( ) ( )( )
, | , |, || ,
, ,n i in i
nn n
p o x C p C f p f,p o x f, p f,p f, o x
p o x p o x= = ∑
l ll ll
( ),np o x : Prior for the object location( )p f,l : Indicator variable for sampled features
30-Nov-1130
![Page 31: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/31.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Recognition
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
( ) ( ) ( )( )
( ) ( ) ( )( )
, | , |, || ,
, ,n i in i
nn n
p o x C p C f p f,p o x f, p f,p f, o x
p o x p o x= = ∑
l ll ll
( ) ( ) ( )( )
( ) ( ) ( )( )
, | , |, || ,
, ,n i in i
nn n
p o x C p C f p f,p o x f, p f,p f, o x
p o x p o x= = ∑
l ll ll
30-Nov-1131
![Page 32: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/32.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Recognition
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
( ) ( ) ( )( )
( ) ( ) ( )( )
, | , |, || ,
, ,n i in i
nn n
p o x C p C f p f,p o x f, p f,p f, o x
p o x p o x= = ∑
l ll ll
30-Nov-1132
![Page 33: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/33.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Recognition
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
( ) ( ) ( )( )
( ) ( ) ( )( )
, | , |, || ,
, ,n i in i
nn n
p o x C p C f p f,p o x f, p f,p f, o x
p o x p o x= = ∑
l ll ll
x
y
s
30-Nov-1133
![Page 34: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/34.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Top-Down Segmentation
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
• Figure-ground backprojection
Fig./Gnd. labelfor each occurrence
Influence on object hypothesis
( ) ( ) ( ) ( )( )∑ ∑
∈
==)p
pl
lll
f, i n
iinin xop
f,pfCpCxopCxofig.p
( ,
|,|,,,,|
( ) ( ) ( )( )
( ) ( ) ( )( )
, | , |, || ,
, ,n i in i
nn n
p o x C p C f p f,p o x f, p f,p f, o x
p o x p o x= = ∑
l ll ll
( ) == l,,,,| in Cfxofigurep p
30-Nov-1134
![Page 35: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/35.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Top-Down Segmentation
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
• Figure-ground backprojection
Fig./Gnd. labelfor each occurrence
Influence on object hypothesis
( ) ( ) ( ) ( )( )∑ ∑
∈
==)p
pl
lll
f, i n
iinin xop
f,pfCpCxopCxofig.p
( ,
|,|,,,,|
( ) ( ) ( )( )
( ) ( ) ( )( )
, | , |, || ,
, ,n i in i
nn n
p o x C p C f p f,p o x f, p f,p f, o x
p o x p o x= = ∑
l ll ll
( ) ∑==i
n fxofigurep l,,,|p
Marginalize overall codebook entries
matched to f
30-Nov-1135
![Page 36: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/36.jpg)
Lecture 17 -Fei-Fei Li
Derivation: ISM Top-Down Segmentation
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
• Algorithm stages
1. Voting
2. Mean-shift search
3. Backprojection
• Vote weights: contribution of a single feature f
• Figure-ground backprojection
Fig./Gnd. labelfor each occurrence
Influence on object hypothesis
( ) ( ) ( ) ( )( )∑ ∑
∈
==)p
pl
lll
f, i n
iinin xop
f,pfCpCxopCxofig.p
( ,
|,|,,,,|
( ) ( ) ( )( )
( ) ( ) ( )( )
, | , |, || ,
, ,n i in i
nn n
p o x C p C f p f,p o x f, p f,p f, o x
p o x p o x= = ∑
l ll ll
( ) ∑ ∑∈
==),(
,|lf i
n xofigurepp
p
Marginalize overall features contai-
ning pixel p
30-Nov-1136
![Page 37: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/37.jpg)
Lecture 17 -Fei-Fei Li
Top-Down Segmentation Algorithm
• This may sound quite complicated, but it boils down to a very simple algorithm…
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
30-Nov-1137
![Page 38: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/38.jpg)
Lecture 17 -Fei-Fei Li
Segmentation
• Interpretation of p(figure) map
� per-pixel confidence in object hypothesis
� Use for hypothesis verification
p(figure)
p(ground)
Segmentation
p(figure)
p(ground)
Original image
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
30-Nov-1138
![Page 39: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/39.jpg)
Lecture 17 -Fei-Fei Li
Example Results: Motorbikes
[Leibe, Leonardis, Schiele, SLCV’04; IJCV’08]
30-Nov-1139
![Page 40: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/40.jpg)
Lecture 17 -Fei-Fei Li
• Training
– 112 hand-segmented images
• Results on novel sequences:
Single-frame recognition - No temporal continuity used!
Example Results: Cows
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
30-Nov-1140
![Page 41: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/41.jpg)
Lecture 17 -Fei-Fei Li
Office chairs
Dining room chairs
Example Results: Chairs
Source: Bastian Leibe
30-Nov-1141
![Page 42: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/42.jpg)
Lecture 17 -Fei-Fei Li
Detections Using Ground Plane Constraints
left camera 1175 frames
Battery of 5ISM detectorsfor differentcar views
[Leib
e,
Leonard
is,
Schie
le,
SLCV’0
4;
IJCV’0
8]
30-Nov-1142
![Page 43: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/43.jpg)
Lecture 17 -Fei-Fei Li
TrainingTraining
TestTest OutputOutput
[Thom
as,
Ferr
ari
, Tu
yte
laars
, Leib
e,
Van G
ool,
3D
RR’0
7;
RSS’0
8]
Inferring Other Information: Part Labels (1)
30-Nov-1143
![Page 44: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/44.jpg)
Lecture 17 -Fei-Fei Li
Inferring Other Information: Part Labels (2)
[Thom
as,
Ferr
ari
, Tu
yte
laars
, Leib
e,
Van G
ool,
3D
RR’0
7;
RSS’0
8]
30-Nov-1144
![Page 45: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/45.jpg)
Lecture 17 -Fei-Fei Li
Inferring Other Information: Depth Maps
“Depth from a single image”
[Thom
as,
Ferr
ari
, Tu
yte
laars
, Leib
e,
Van G
ool,
3D
RR’0
7;
RSS’0
8]
30-Nov-1145
![Page 46: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/46.jpg)
Lecture 17 -Fei-Fei Li
• Try to fit silhouette to detected person
• Basic idea– Search for the silhouette that simultaneously optimizes the
• Chamfer match to the distance-transformed edge image
• Overlap with the top-down segmentation
– Enforces global consistency
– Caveat: introduces again reliance on global model
Extension: Estimating Articulation
[Leibe, Seemann, Schiele, CVPR’05]
30-Nov-1146
![Page 47: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/47.jpg)
Lecture 17 -Fei-Fei Li
• Polar instead of Cartesian voting scheme
• Benefits:– Recognize objects under image-plane rotations
– Possibility to share parts between articulations.
• Caveats:– Rotation invariance should only be used when it’s really needed.
(Also increases false positive detections)
Extension: Rotation-Invariant Detection
[Mikolajczyk, Leibe, Schiele, CVPR’06]
θq
φ
dq
φ
θ
d
30-Nov-1147
![Page 48: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/48.jpg)
Lecture 17 -Fei-Fei Li
Sometimes, Rotation Invariance Is Needed…
[Mikolajczyk et al., CVPR’06]
30-Nov-1148
![Page 49: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/49.jpg)
Lecture 17 -Fei-Fei Li
You Can Try It At Home…
• Linux binaries available
– Including datasets & several pre-trained detectors
– http://www.vision.ee.ethz.ch/bleibe/code
x
y
s
Source: Bastian Leibe
30-Nov-1149
![Page 50: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/50.jpg)
Lecture 17 -Fei-Fei Li
Discussion: Implicit Shape Model• Pros:
– Works well for many different object categories• Both rigid and articulated objects
– Flexible geometric model• Can recombine parts seen on different training examples
– Learning from relatively few (50-100) training examples– Optimized for detection, good localization properties
• Cons:– Needs supervised training data
• Object bounding boxes for detection• Reference segmentations for top-down segm.
– Only weak geometric constraints• Result segmentations may contain superfluous
body parts.
– Purely representative model• No discriminative learning
Source: Bastian Leibe
30-Nov-1150
![Page 51: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/51.jpg)
Lecture 17 -Fei-Fei Li
What we will learn today?
• Implicit Shape Model
– Representation
– Recognition
– Experiments and results
• Deformable Models
– The PASCAL challenge
– Latent SVM Model
30-Nov-1151
![Page 52: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/52.jpg)
Lecture 17 -Fei-Fei Li
Object Detection
– the PASCAL Challenge
• ~10,000 images, with ~25,000 target objects.
– Objects from 20 categories (person, car, bicycle, cow,
table...).
– Objects are annotated with labeled bounding boxes.
30-Nov-1152
So
urc
e:
Pe
dro
Fe
lze
nsw
alb
![Page 53: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/53.jpg)
Lecture 17 -Fei-Fei Li 30-Nov-1153
![Page 54: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/54.jpg)
Lecture 17 -Fei-Fei Li
Latent SVM Model: an Overview
30-Nov-1154
root filter part filters deformation
modelsdetection
So
urc
e:
Pe
dro
Fe
lze
nsw
alb
![Page 55: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/55.jpg)
Lecture 17 -Fei-Fei Li
Histogram of Oriented Gradient (HOG) Features
• Image is partitioned into 8x8 pixel blocks.
• In each block we compute a histogram of gradient
orientations.
– Invariant to changes in lighting, small deformations, etc.
• We compute features at different resolutions (pyramid).
30-Nov-1155
So
urc
e:
Pe
dro
Fe
lze
nsw
alb
![Page 56: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/56.jpg)
Lecture 17 -Fei-Fei Li
Filters
• Filters are rectangular templates defining weights for features.
• Score is dot product of filter and subwindow of HOG pyramid.
30-Nov-1156
HOG pyramid
W
Score of H at this location is H ⋅ W
H
Source: Pedro Felzenswalb
![Page 57: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/57.jpg)
Lecture 17 -Fei-Fei Li
Object Hypothesis
30-Nov-1157
Multiscale model captures features at two-resolutions
Score is sum of filter scores
plus deformation scores
![Page 58: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/58.jpg)
Lecture 17 -Fei-Fei Li
Training the Latent SVM Model
• Training data consists of images with labeled bounding boxes.
• Need to learn the model structure, filters and deformation costs.
30-Nov-1158
Training
Source: Pedro Felzenswalb
![Page 59: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/59.jpg)
Lecture 17 -Fei-Fei Li
Connection with Linear Classifiers
• Score of model is sum of filter scores plus
deformation scores
– Bounding box in training data specifies that the score
should be high for some placement in a range
30-Nov-1159
w is a model
x is a detection window
z are filter placements
Concatenation of filters and
deformation parameters
Concatenation of features
and part displacements
Latent
SVM
Standard
SVM
Weight vector Features
![Page 60: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/60.jpg)
Lecture 17 -Fei-Fei Li
Latent SVM Training
• Semi-convex optimization problem
– is convex in w
– convex if we fix z for positive examples
• Iterative optimization procedure:
– Initialize w
– Iterate:
• Pick best z for each positive example
• Optimize w via gradient descent with data mining
30-Nov-1160
Linear in w if z is fixed Observed variables Latent variables
![Page 61: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/61.jpg)
Lecture 17 -Fei-Fei Li
Latent SVM Training: Initializing w
30-Nov-1161
• For k component mixture model:– Split examples into k sets based on bounding box aspect
ratio
• Learn k root filters using standard SVM– Training data: Warped positive examples and random
windows from negative images (Dalal & Triggs)
• Initialize parts by selecting patches from root filters:– Sub-windows with strong coefficients
– Interpolate to get higher resolution filters
– Initialize spatial model using fixed spring constants
![Page 62: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/62.jpg)
Lecture 17 -Fei-Fei Li
Learned Models
30-Nov-1162
![Page 63: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/63.jpg)
Lecture 17 -Fei-Fei Li
Example Results
30-Nov-1163
So
urc
e:
Pe
dro
Fe
lze
nsw
alb
![Page 64: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/64.jpg)
Lecture 17 -Fei-Fei Li
More Results
30-Nov-1164
![Page 65: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/65.jpg)
Lecture 17 -Fei-Fei Li
Quantitative Results
• 9 systems competed in the 2007 challenge.
• Out of 20 classes:
– First place in 10 classes
– Second place in 6 classes
• Some statistics:
– It takes ~2 seconds to evaluate a model in one image.
– It takes ~3 hours to train a model.
– MUCH faster than most systems.
30-Nov-1165
Source: Pedro Felzenswalb
![Page 66: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/66.jpg)
Lecture 17 -Fei-Fei Li
Code for Latent SVM
Source code for the system and models trained on PASCAL 2006, 2007 and 2008
data are available at:
http://www.cs.uchicago.edu/~pff/latent
30-Nov-1166
Source: Pedro Felzenswalb
![Page 67: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/67.jpg)
Lecture 17 -Fei-Fei Li
Summary
• Deformable models provide an elegant framework
for object detection and recognition.
– Efficient algorithms for matching models to images.
– Applications: pose estimation, medical image analysis,
object recognition, etc.
• We can learn models from partially labeled data.
– Generalized standard ideas from machine learning.
– Leads to state-of-the-art results in PASCAL challenge.
• Future work: hierarchical models, grammars, 3D
objects.
30-Nov-1167
Source: Pedro Felzenswalb
![Page 68: Lecture 17: object detection - Stanford Computer Vision Labvision.stanford.edu/teaching/cs231a_autumn1213... · Fei-Fei Li Lecture 17 - Scale Voting: Efficient Computation • Continuous](https://reader030.vdocument.in/reader030/viewer/2022040412/5f085d977e708231d421a6c2/html5/thumbnails/68.jpg)
Lecture 17 -Fei-Fei Li
What we have learned today
• Implicit Shape Model
– Representation
– Recognition
– Experiments and results
• Deformable Models
– The PASCAL challenge
– Latent SVM Model
30-Nov-1168