On Sparse Representations for
Scalability in Image Pattern Matching
SIAM, PPSC-2012
Parallel Processing and Scientific Computing
Karl Ni, [email protected], MIT Lincoln Laboratory
22 September 2011
This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Outline
• Motivation
  – Image pattern recognition
  – Image data collection capabilities
• Problems and Challenges
• Training an Image Database
• Results
• Conclusions
Applying Semantic Understanding of Images
• What can a computer understand? Who? What? When? Where?
• Pipeline: Training Data → Feature Extraction → Matching & Association → Classifier Decision
• Inputs: query by example, query by sketch, statistical models
• Computer vision algorithms:
  – Image retrieval
  – Robotic navigation
  – Semantic labeling
  – Image sketch
  – Structure from Motion
• Requires: some prior knowledge
Where can we get training sets?
• Open source and image collection capabilities
• "Kullen.Net"
• UW vs. Cornell's "PhotoCity" competition
• The growth of Flickr: 2 billionth image (Nov 13, 2007), 3 billionth image (Nov. 8, 2008), 4 billionth image (Oct 9, 2009)
  – "Still, it's a staggering number of photos for a site that launched in 2004" -- TechCrunch
• Flickr "trounced by" Facebook
  – 15 billion photos
  – In Nov 2008, "2 billion photos each month"
Outline
• Motivation
• Problems and Challenges
  – Current Techniques
  – Computational Issues
• Training an Image Database
• Results
• Conclusions
Specialized Content Detectors
• Face detection and recognition: mostly done
• Generic object detection: not so much
• Computation for these algorithms (e.g., deformable parts):
  – Rely on multiple-instance learning (can be a considerable # of instances)
  – Parse the entire image for relevant features
  – Serial in computation
  – Rely on false-alarm rejection to reduce computation
  – The feature space is exceedingly complex
Detectors for Every Object?
• Parallelizable, but still poor algorithmic performance
• Say you have 10 very good detectors (~5% FA rate each):
  – You still have a large image to classify at different scales/orientations
  – 10 detectors at a 5% FA rate each compound to roughly a 40% FA rate (1 − 0.95^10 ≈ 0.40)!
  – These classifiers don't know anything about their surroundings: people can't be flying or walking on billboards!
• Labeled scene objects: 1. Chair, 2. Table, 3. Road, 4. Road, 5. Table, 6. Car, 7. Keyboard
• We use context to make inferences about an image
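The ~40% figure comes from compounding independent false alarms rather than summing them; a quick sanity check (illustrative, not from the talk):

```python
# Sanity check of the slide's ~40% figure: with 10 independent
# detectors at a 5% per-detector false-alarm (FA) rate, the chance
# that at least one detector false-alarms compounds well beyond 5%.
def compound_fa_rate(per_detector_fa, n_detectors):
    """P(at least one detector false-alarms), assuming independence."""
    return 1.0 - (1.0 - per_detector_fa) ** n_detectors

rate = compound_fa_rate(0.05, 10)
print(f"{rate:.1%}")  # 40.1%
```

Note that a naive 10 × 5% = 50% overstates the rate slightly; the independence assumption gives 1 − 0.95^10 ≈ 40%.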
Outline
• Motivation
• Problems and Challenges
• Training an Image Database
  – What are the "best" features?
  – How to automatically train for the best features
  – Automated choice in hierarchical GMMs
  – A better option: optimizing for sparsity
  – Parallel processing in sparse feature finding
• Results
• Conclusions
Finding the Features of an Image
• Problems in image pattern matching:
  – Each image = 10 million pixels!
  – Most dimensions are irrelevant
  – Multiple concepts inside the image
• Pipeline: Feature Extraction → Training / Classifier
• Features are a quantitative way for machines to understand an image
• Image property → feature technique:
  – Local color → luma + chroma components
  – Object texture → Fourier domain / wavelets
  – Shape → curvelets, shapelets
  – Lower-level gradients → wavelets (Haar, Daubechies)
  – Higher-level descriptors → SIFT/SURF/etc.
  – Overall image descriptors → GIST
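As a concrete illustration of one entry in this list, here is a minimal sketch of extracting block-DCT coefficients as texture features (the function names and block size are my choices, not the talk's pipeline; DCT features reappear later in the computational results):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix (rows = frequencies)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] *= np.sqrt(1.0 / n)
    m[1:, :] *= np.sqrt(2.0 / n)
    return m

def block_dct_features(img, block=8):
    """Stack the 2D DCT coefficients of each block into feature vectors."""
    d = dct_matrix(block)
    h, w = (s - s % block for s in img.shape)  # drop ragged edges
    feats = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            patch = img[i:i + block, j:j + block]
            feats.append((d @ patch @ d.T).ravel())
    return np.array(feats)

img = np.random.rand(32, 32)
f = block_dct_features(img)
print(f.shape)  # (16, 64): 16 blocks, 64 DCT coefficients each
```

Because the transform is orthonormal, the features preserve the image's energy per block, which makes them a faithful low-level texture representation.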
Numerous features: only a subset is relevant
• Scene example 1, features are:
  – Red bricks on multiple buildings
  – Small hedges, etc.
  – Windows of a certain type
  – The types of buildings present
• Scene example 2, features are:
  – Arches and white buildings
  – Domes and ancient architecture
  – Older/speckled materials (higher-frequency image content)
• Scene example 3, features are:
  – More suburb-like
  – Larger roads
  – Drier vegetation
  – Shorter houses
• Choice of features requires looking at multiple semantic concepts defined by entities and attributes inside of images
Environment is relevant
• Feature invariance in images is necessary for most concepts:
  – to transformations (e.g., 3D rotation, translation, scale)
  – to dynamic content (e.g., deformable parts)
  – to various contexts (e.g., illumination at different times of day)
  – to different instances
• Some features (e.g., SIFT) acquire some of these attributes, but only to a certain extent:
  – robust to viewpoint changes of up to ~30 degrees (SIFT)
  – many times they don't match
  – partial illumination invariance
• A collective group of features is necessary (boosting/blending):
  – A large set of features makes training/classification more complex
  – Training is very difficult (feature extraction & training/classification)
Getting the Right Features
• Tools to hand-label concepts, 2006–2011:
  – Google Image Labeler
  – Kobus's Corel Dataset
  – MIT LabelMe
  – Yahoo! Games
• Problems:
  – Tedious
  – Time consuming
  – Incorrect
  – Very low throughput
• Face detection: consider the time it took to collect all the data
• Feature selection: currently an active area of research
Automatically Learn the Best Features
• Computational complexity is high:
  – Feature extraction is a difficult problem
  – Classifiers are difficult problems
  – Complexity is traditionally passed between the two: with more knowledge of the domain/model, rely on better feature extractors; with less knowledge, rely (unfortunately) on complexity in discriminant methods
• Would like to feed in the entire image:
  – No need to manually segment images
  – Feeding in noise will learn the context (info about surroundings)
  – Learn multiple instances of a concept (build invariance through example)
  – Massively parallel per image per class
• Take several features, and subselect the "best" ones
• Diagram: training images from Image Class 1 through Image Class N, each entire image feeding Distribution 1 through Distribution N
• Automatic feature subselection has been submitted to SSP 2012
How do we do it?
• Lots of work in the 1990s:
  – Conditional probabilities through large training data sets
  – Vasconcelos et al.'s work on semantic image retrieval
  – Primarily based on multiple-instance learning and noisy density estimation
• Learning multiple instances of an object (no-noise case)
• Robustness to noise through the law of large numbers:
  – Hope to integrate it out
  – Although the area of red boxes per instance is small, their aggregate over all instances is dominant
  – Noise, if uncorrelated, will become more and more sparse
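The law-of-large-numbers argument can be seen in a toy experiment (data and sizes are mine): a recurring object pattern survives averaging over instances, while uncorrelated per-instance noise shrinks toward zero.

```python
import numpy as np

# Toy version of the law-of-large-numbers argument: a recurring object
# pattern survives averaging over many instances, while uncorrelated
# per-instance noise averages out.
rng = np.random.default_rng(0)
signal = np.zeros(100)
signal[40:60] = 1.0                       # the recurring object pattern

instances = signal + rng.normal(0.0, 1.0, size=(500, 100))
avg = instances.mean(axis=0)

object_level = avg[40:60].mean()          # region containing the object
noise_level = np.abs(avg[:40]).mean()     # region with noise only
print(round(object_level, 2), round(noise_level, 2))
```

With 500 instances the residual noise amplitude is on the order of 1/√500 of its per-instance level, while the object stays at full strength.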
Parallel Calculations through Mixture Hierarchies
• Statistical distributions:
  – Generative methods represent millions of points by a few parameters
• Mixture hierarchies can be incrementally trained: lower-level GMMs (one per image: image 1, image 2, image 3, …) can be fit in parallel and then merged into a top-level GMM
• Problems with HGMMs:
  – Extensive computational process to bring the hierarchies together
  – Difficult to train, as each level requires an initialization point
  – Must specify the number of classes at each level for initialization
Finding a sparse basis set
• Gaussian mixtures as a density estimate:
  – Non-convex / sensitive to initialization
  – Iterative and very slow
  – Small-sample bias is large
• Think discriminantly:
  – Instead of: generating centroids that represent images
  – Think: prune features to eliminate redundancy
• Sparsity optimization:
  – Solve directly for the features that we want to use
  – Induces less complexity and, as we will see, is an LP problem
  – Reduction of redundancy is intuitive, not generative
• Under normalization, the GMM classifier can be implemented with a matched filter instead: after normalizing x and the y_i, argmin_{i∈{1,…,C}} ‖x − y_i‖²₂ is equivalent to argmax_{i∈{1,…,C}} ⟨x, y_i⟩
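The matched-filter equivalence is easy to verify numerically (dimensions and data are arbitrary): for unit-norm vectors, ‖x − y_i‖² = 2 − 2⟨x, y_i⟩, so minimizing distance and maximizing the inner product pick the same template.

```python
import numpy as np

# Numerical check of the matched-filter equivalence: for unit-norm
# vectors, ||x - y_i||^2 = 2 - 2<x, y_i>, so the nearest template under
# squared distance is the one with the largest inner product.
rng = np.random.default_rng(0)

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

x = unit(rng.normal(size=8))          # normalized query vector
ys = unit(rng.normal(size=(5, 8)))    # 5 normalized class templates

by_distance = np.argmin(((x - ys) ** 2).sum(axis=1))
by_matched_filter = np.argmax(ys @ x)
print(by_distance == by_matched_filter)  # True
```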
Finding sparsity with linear programming
• Gaussian Mixture Models, solved via EM (a non-convex optimization problem):
  – min over {µ_1, …, µ_M, π_1, …, π_M} of −Σ_{j=1}^{N} log Σ_{m=1}^{M} π_m p(x_j | µ_m)
  – Exponential according to N; each iteration is O(MNd²)
• Many optimization problems (compressed sensing, Group Lasso) induce sparsity
• Max-Constraint Optimization:
  – Matched-filter constraint: argmax_β [ tr(XᵀXβ) + λ‖1 − β‖²₂ ], such that β_ij ∈ {0,1} and βᵀ1 = 1
  – Relaxation of the constraints gives an LP optimization problem: argmin_β [ −tr(XᵀXβ) + λ Σ_i t_i ], such that 0 ≤ β_ij ≤ t_i ≤ 1 and βᵀ1 = 1
  – Faster than EM; faster than G-Lasso
  – Independent of dimensionality!
  – Convex (unlike the matched-filter optimization and GMM/EM)
  – On average, scales according to N²
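One way the relaxed problem could be posed for an off-the-shelf LP solver (a hedged sketch, not the talk's implementation: the variable layout, λ value, and toy data are my choices). Near-duplicate feature columns make the Gram matrix G = XᵀX strongly correlated, and the LP selects one representative per redundant pair:

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of the relaxed feature-selection LP:
#   min_{beta, t}  -tr(G beta) + lam * sum_i t_i
#   s.t. 0 <= beta_ij <= t_i <= 1, and each column of beta sums to 1,
# where G = X^T X for normalized feature columns. Large t_i marks
# feature i as a selected representative.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
X = np.column_stack([base[:, 0], base[:, 0] + 0.01 * rng.normal(size=50),
                     base[:, 1], base[:, 1] + 0.01 * rng.normal(size=50)])
X /= np.linalg.norm(X, axis=0)        # two near-duplicate feature pairs
G = X.T @ X
n, lam = G.shape[0], 0.5

# Variable vector: [beta.ravel() (n*n entries), t (n entries)]
c = np.concatenate([-G.T.ravel(), np.full(n, lam)])
# Inequalities: beta_ij - t_i <= 0
A_ub = np.zeros((n * n, n * n + n))
for i in range(n):
    for j in range(n):
        A_ub[i * n + j, i * n + j] = 1.0
        A_ub[i * n + j, n * n + i] = -1.0
b_ub = np.zeros(n * n)
# Equalities: each column of beta sums to 1
A_eq = np.zeros((n, n * n + n))
for j in range(n):
    A_eq[j, j:n * n:n] = 1.0
b_eq = np.ones(n)
bounds = [(0, 1)] * (n * n + n)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
t = res.x[n * n:]
print(np.round(t, 2))  # large entries mark the selected representatives
```

With two near-duplicate pairs, the solver keeps one representative from each pair; λ trades off reconstruction (the trace term) against the number of retained features.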
Intuition
• β* = argmin_β [ −tr(XᵀXβ) + λ Σ_i t_i ], s.t. 0 ≤ β_ij ≤ t_i ≤ 1 and βᵀ1 = 1
• Relies on the covariance (Gram) matrix concept: redundant features show up as near-1 off-diagonal entries of XᵀX (the slide's 4×4 example has strongly correlated feature pairs with entries near 0.98 and 0.95)
• Each entry in row i of β is bounded by its own threshold t_i (… < t1, < t2, < t3, < t4); driving t_i to zero removes feature i
• The actual implementation does not form the covariance matrix, but rather keeps track of the β indices
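The index-tracking point can be illustrated with a greedy sketch (the threshold and data are mine, not the talk's): near-duplicate columns show up as near-1 inner products, so redundant features can be pruned by index without ever holding the full Gram matrix.

```python
import numpy as np

# Toy version of the Gram-matrix intuition: near-duplicate feature
# columns have near-1 inner products, so they can be pruned by tracking
# indices instead of forming the full covariance matrix at once.
rng = np.random.default_rng(0)
f = rng.normal(size=(100, 3))
X = np.column_stack([f, f[:, 0] + 1e-3 * rng.normal(size=100)])  # col 3 ~ col 0
X /= np.linalg.norm(X, axis=0)

keep = []
for j in range(X.shape[1]):            # track kept indices, not a matrix
    if all(abs(X[:, j] @ X[:, k]) < 0.95 for k in keep):
        keep.append(j)
print(keep)  # [0, 1, 2]: the duplicate column 3 is pruned
```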
Outline
• Motivation
• Problems and Challenges
• Training an Image Database
• Results
• Conclusions
LP Feature Learning versus G-Lasso
• More intuitive grouping:
  – Threshold learning is unnecessary
  – Post-processing is unnecessary
• 5.452% more accurate on +1/−1 learning classes
• 80.054% faster than GMMs
Classifying Texture
[Figure: three panels — Original Image, Decisions, and Decision Confidence]
Complexity and Confusion Matrix
• 1400 images per dataset
• Filter reduction to 356 filters per class
• Less than a minute of classification time
• Coverage of cities: entire cities (Vienna, Dubrovnik, Lubbock), portion of Cambridge (MIT-Kendall)

Confusion matrix (rows = testing dataset, columns = training dataset):

| Testing \ Training | MIT-Kendall | Vienna | Dubrovnik | Lubbock |
| MIT-Kendall        | 0.975       | 0.056  | 0.024     | 0.102   |
| Vienna             | 0.050       | 0.896  | 0.035     | 0.060   |
| Dubrovnik          | 0.015       | 0.024  | 0.905     | 0.057   |
| Lubbock            | 0.097       | 0.002  | 0.053     | 0.901   |
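The diagonal of the matrix holds each dataset's self-classification rate, so the average per-city accuracy can be read off directly:

```python
import numpy as np

# Average per-city accuracy from the confusion matrix above:
# the diagonal holds each dataset's self-classification rate.
cm = np.array([[0.975, 0.056, 0.024, 0.102],   # MIT-Kendall
               [0.050, 0.896, 0.035, 0.060],   # Vienna
               [0.015, 0.024, 0.905, 0.057],   # Dubrovnik
               [0.097, 0.002, 0.053, 0.901]])  # Lubbock
print(round(np.trace(cm) / 4, 3))  # 0.919
```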
Computational Results & Accuracy
• Fixed iterations and fixed k for the GMMs
• Best initialization via k-means (not included in the timed optimization)
[Figure: computation time (log10 min) and MSE versus log10 of the number of DCT features, comparing GMMs against the β optimization ("Beta Opt")]
Interesting automatic semantic learning result
Conclusions
• Training in computer vision is troublesome:
  – Big data
  – Feature extraction
  – Non-automated processes
• Statistical characterization reduces complexity
• Redundancy arbitration achieves savings
• Feature selection through linear programming produces gains in computation time
Contributors and Acknowledgements
• MIT Lincoln Laboratory: Karl Ni, Nicholas Armstrong-Crews, Scott Sawyer, Nadya Bliss
• MIT: Katherine L. Bouman
• Boston University: Zachary Sun
• Northeastern University: Alexandru Vasile
• Cornell University: Noah Snavely
Questions?
Between-Class Training (Car Class vs. Buffalo Class)
• There's sky in both of these image classes
• Sky is a feature descriptive of most situations where you would find a car or a buffalo
• Simply discard the features that are common to both classes
• Keep the most discriminative features
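The idea above can be sketched with toy feature responses (the data, feature names, and threshold are mine): score each feature by how differently it responds across the two classes, and drop features like "sky" whose responses are common to both.

```python
import numpy as np

# Toy between-class feature pruning: keep features whose class means
# differ by more than their spread; shared features (e.g., "sky") are
# dropped, discriminative ones are kept.
rng = np.random.default_rng(0)
n = 200
# Per-image feature responses: feature 0 ("sky") fires for both classes;
# features 1 ("car-like") and 2 ("buffalo-like") fire for one class each.
car = np.column_stack([rng.normal(1.0, 0.1, n),
                       rng.normal(1.0, 0.1, n),
                       rng.normal(0.0, 0.1, n)])
buffalo = np.column_stack([rng.normal(1.0, 0.1, n),
                           rng.normal(0.0, 0.1, n),
                           rng.normal(1.0, 0.1, n)])

gap = np.abs(car.mean(axis=0) - buffalo.mean(axis=0))
spread = car.std(axis=0) + buffalo.std(axis=0)
discriminative = np.where(gap > 2 * spread)[0]
print(discriminative)  # [1 2]: the shared "sky" feature 0 is dropped
```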
Abstract
Training an image class is most efficiently done with a small number of descriptive yet discriminative features. Features are often manually handpicked from subsets of imagery or machine-generated feature extractors. It is beneficial to automatically discard irrelevant features and retain the most representative ones. Determining the best features to use is an inherently difficult and computationally taxing process. Such a methodology would allow training large-scale datasets quickly, in parallel, and without human aid. We overview an automated technique in image pattern matching that uses sparse optimization constraints to select the best subset of large amounts of feature data.
Can you tell what is in this picture?
Courtesy A. Torralba
Context in processing is important
Courtesy A. Torralba