Visual Element Discovery as Discriminative Mode Seeking
Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)
The need for mid-level representations
• 6 billion images
• 70 billion images
• 1 billion images served daily
• 10 billion images
• 60 hours uploaded per minute
• Almost 90% of web traffic is visual!
Discriminative patches
• Visual words are too simple
• Objects are too difficult
• Something in the middle?
(Felzenszwalb et al. 2008)
(Singh et al. 2012)
Mid-level “Visual Elements”
• Simple enough to be detected easily
• Complex enough to be meaningful
  – “Meaningful” as measured by weak labels
(Doersch et al. 2012)
(Singh et al. 2012)
• Doersch et al. 2012
• Singh et al. 2012
• Jain et al. 2013
• Endres et al. 2013
• Juneja et al. 2013
• Li et al. 2013
• Sun et al. 2013
• Wang et al. 2013
• Fouhey et al. 2013
• Lee et al. 2013
Our goal
• Provide a mathematical optimization for visual elements
• Improve performance of mid-level representations.
Elements as Patch Classifiers
What if the labels are weak?
• E.g. image has horse/no-horse
• (Or even weaker, like Paris/not-Paris)
• Idea: Label these all as “horse”
• Problem: 10,000 patches per image, most of which are unclassifiable.
The weaker the label, the bigger the problem.
Task: Learn to classify Paris from Not-Paris
Paris Also Paris
Other approaches
• Latent SVM:
  – Assumes we have one instance per positive image
• Multiple instance learning
  – Not clear how to define the bags
What if the labels are weak?
• Negatives are negatives, positives might not be positive
• Most of our data can be ignored
• First: how to cluster without clustering everything
(Doersch et al. 2012)
(Singh et al. 2012)
Mean shift
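As a point of reference, textbook mean shift can be sketched as below. This is a minimal illustration of the generic algorithm, not the variant used in this work: the Gaussian kernel, the fixed bandwidth, the function name `mean_shift_mode`, and the toy 2-D data are all assumptions made for the example.

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth, n_iter=100, tol=1e-8):
    """Follow mean-shift updates from `start` toward a nearby density mode."""
    w = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        # Gaussian kernel weights (an assumed kernel choice, fixed bandwidth).
        sq_dist = np.sum((points - w) ** 2, axis=1)
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # Move the centroid to the kernel-weighted mean of the data.
        new_w = weights @ points / weights.sum()
        if np.linalg.norm(new_w - w) < tol:
            break
        w = new_w
    return w

# Toy data: two well-separated 2-D clusters (hypothetical, for illustration).
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(0.0, 0.3, (200, 2)),
    rng.normal(5.0, 0.3, (200, 2)),
])
mode = mean_shift_mode(points, start=[4.0, 4.0], bandwidth=1.0)
```

Only points near the current centroid carry meaningful weight in each update, which is why a mode can be found without clustering the whole dataset.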
Patch distances
Min distance: 2.59e-4
Max distance: 1.22e-4
Input Nearest neighbor
Mean shift
Negative Set (legend: Paris, Not Paris)
Density Ratios (legend: Paris, Not Paris)
Adaptive Bandwidth (legend: Positive, Negative; annotation: Bandwidth)
Discriminative Mode Seeking
• Find local optima of an estimate of the density ratio
• Allow an adaptive bandwidth
• Be extremely fast
  – Minimize the number of passes through the data
Discriminative Mode Seeking
• Mean shift: maximize (w.r.t. w)
(Equation annotations: centroid w, patch feature, bandwidth b, distance)
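A sketch of the kind of objective these labels describe, assuming a triangular kernel. The kernel choice and the symbols x_i (patch feature) and d(·,·) (distance) are assumptions made here for illustration; w is the centroid and b the bandwidth.

```latex
w^{*} = \arg\max_{w} \sum_{i} \max\bigl(b - d(x_i, w),\, 0\bigr)
```

Under this form, only patches closer to w than the bandwidth b contribute to the sum.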
Discriminative Mode Seeking
B(w) is the value of b satisfying:
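One way such a B(w) could be computed numerically is sketched below. It assumes the condition is that the triangular-kernel mass on the negative patches equals a fixed constant, i.e. sum_i max(b - d_i, 0) = beta, with beta > 0. The constant `beta`, the function names, and the input format (precomputed distances from w to each patch) are hypothetical choices for the example.

```python
import numpy as np

def solve_bandwidth(neg_dists, beta):
    """Return b such that sum(max(b - d, 0) for d in neg_dists) == beta (beta > 0)."""
    d = np.sort(np.asarray(neg_dists, dtype=float))
    prefix = np.cumsum(d)
    n = len(d)
    for k in range(1, n + 1):
        # With the k nearest negatives active, the sum equals k*b - prefix[k-1].
        b = (beta + prefix[k - 1]) / k
        # Accept b if it does not reach past the next negative (or none is left).
        if k == n or b <= d[k]:
            return b
    raise ValueError("neg_dists must be non-empty")

def density_ratio_score(pos_dists, neg_dists, beta):
    """Kernel mass on the positives divided by the (fixed) mass on the negatives."""
    b = solve_bandwidth(neg_dists, beta)
    pos_mass = np.maximum(b - np.asarray(pos_dists, dtype=float), 0.0).sum()
    return pos_mass / beta

b = solve_bandwidth([4.0, 1.0, 2.0], beta=3.0)
score = density_ratio_score([0.5, 1.5, 5.0], [4.0, 1.0, 2.0], beta=3.0)
```

Because the sum is piecewise linear and increasing in b, sorting the distances once is enough to solve for b exactly. The bandwidth grows where negatives are sparse and shrinks where they are dense.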
Discriminative Mode Seeking
[Equation: objective to optimize, subject to (s.t.) a constraint]
• Distance metric: Normalized Correlation
(Figure legend: Positive, Negative; centroid w)
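For reference, normalized correlation between two patch feature vectors can be sketched as follows. The helper names `normalize_patch` and `normalized_correlation` and the small `eps` guard are assumptions for illustration. The sketch only shows the metric itself, not how it enters the optimization.

```python
import numpy as np

def normalize_patch(x, eps=1e-12):
    """Subtract the mean and scale to unit norm (eps guards against flat patches)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return x / (np.linalg.norm(x) + eps)

def normalized_correlation(x, w):
    """Dot product of the two normalized vectors; lies in [-1, 1]."""
    return float(normalize_patch(x) @ normalize_patch(w))

x = np.array([1.0, 2.0, 3.0, 4.0])
same = normalized_correlation(x, 3.0 * x + 7.0)   # invariant to gain and offset
opposite = normalized_correlation(x, -x)
```

Once features are normalized this way, similarity to a centroid reduces to a dot product, which matches the linear score w^T x - b used on the optimization slide.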
Discriminative Mode Seeking
Optimization
• Initialization is straightforward
• For each element, just keep ~500 patches where w^T x - b > 0
• Trivially parallelizable in MapReduce
• Optimization is piecewise quadratic
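The bookkeeping for the retained patches might look like the sketch below. The function name `active_patches`, the `max_keep` cap, and the rule of keeping the highest-scoring patches when more than the cap are active are assumptions; the slide only states that roughly 500 patches with w^T x - b > 0 are kept per element.

```python
import numpy as np

def active_patches(X, w, b, max_keep=500):
    """Indices of patches with w^T x - b > 0, highest score first, capped at max_keep."""
    scores = X @ w - b
    active = np.flatnonzero(scores > 0)
    # Sort the active patches by descending score before truncating.
    ranked = active[np.argsort(-scores[active])]
    return ranked[:max_keep]

X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]])
w = np.array([1.0, 0.0])
kept = active_patches(X, w, b=0.5)
```

Each element needs only its own small active set, so elements can be processed independently, which fits the MapReduce parallelization noted above.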
Evaluation via Purity-Coverage Plot
• Analogous to Precision-Recall Plot
Low Purity: Element 1, Element 2, Element 3, Element 4, Element 5
High purity, Low Coverage: Element 1, Element 2, Element 3, Element 4, Element 5
Purity-Coverage Curve
(Plot: purity, 0 to 1, on the vertical axis vs. coverage, 0 to 10 x1e4 pixels, on the horizontal axis; legend: Paris, Not Paris)
Purity-Coverage Curve
• Coverage for multiple elements is simply the union.
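A simplified sketch of how such a curve could be traced. It assumes that detections from all elements are pooled and ranked by score, that purity is the fraction of ranked detections that come from the positive set, and that coverage is the number of distinct positive-set pixels covered so far. The tuple layout `(score, is_positive, pixels)` and the function name are hypothetical.

```python
def purity_coverage_curve(detections):
    """Return a list of (coverage, purity) points, one per score cutoff.

    detections: iterable of (score, is_positive, pixels), where pixels is an
    iterable of pixel ids covered by that detection.
    """
    ranked = sorted(detections, key=lambda det: det[0], reverse=True)
    covered = set()
    n_pos = 0
    curve = []
    for rank, (_score, is_positive, pixels) in enumerate(ranked, start=1):
        if is_positive:
            n_pos += 1
            # Taking the union means overlapping detections are not double counted.
            covered |= set(pixels)
        curve.append((len(covered), n_pos / rank))
    return curve

detections = [
    (0.9, True, {1, 2}),
    (0.8, True, {2, 3}),
    (0.7, False, {9}),
    (0.6, True, {4}),
]
curve = purity_coverage_curve(detections)
```

Because coverage is a set union, adding an element that fires on already-covered pixels does not increase coverage, so redundant elements are not rewarded.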
Purity-Coverage
(Two plots of purity, 0.8 to 1, vs. coverage as a fraction of the positive dataset: Top 25 Elements, coverage 0 to 0.5, and Top 200 Elements, coverage 0 to 0.8)
Legend:
• This work
• This work, no inter-element
• SVM Retrained 5x (Doersch et al. 2012)
• LDA Retrained 5x
• LDA Retrained
• Exemplar LDA (Hariharan et al. 2012)
Results on Indoor 67 Scenes
Kitchen Grocery Bowling
Elevator Bakery Bathroom
Results on Indoor 67 Scenes
Method                                 Accuracy (%)
ROI+Gist (Quattoni et al.)             26.05
MM-Scene (Zhu et al.)                  28.00
Scene-DPM (Pandey et al.)              30.40
CENTRIST (Wu et al.)                   36.90
Object Bank (Li et al.)                37.60
RBoW (Parizi et al.)                   37.93
Discr. Patches (Singh et al.)          38.10
Latent Pyramid. (Sadeghi et al.)       44.84
Bag of Parts (Juneja et al.)           46.10
miSVM (Li et al.)                      46.40
D. Patches (full) (Singh et al.)       49.40
MMDL (Wang et al.)                     50.15
Discr. Parts (Sun et al.)              51.40
IFV (Juneja et al.)                    60.77
Bag of Parts+IFV (Juneja et al.)       63.10
Ours (no inter-element)                63.36
Ours                                   64.03
Ours+IFV                               66.87
Qualitative Indoor67 Results
Indoor67: Error Analysis
• Ground Truth (GT): deli; Guess: grocery store
• GT: corridor; Guess: staircase
• GT: laundromat; Guess: closet
• GT: museum; Guess: garage
Thank you!
More results at http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/
Paris Elements • Indoor 67 Elements • Indoor 67 Heatmaps • Source code (soon)
Some New Paris Elements