On Sparse Representations for
Scalability in Image Pattern Matching
SIAM, PPSC-2012
Parallel Processing and Scientific Computing
Karl Ni, karl.ni@ll.mit.edu, MIT Lincoln Laboratory
22 September 2011
This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
• Motivation
  – Image pattern recognition
  – Image data collection capabilities
• Problems and Challenges
• Training an Image Database
• Results
• Conclusions
Outline
Applying Semantic Understanding of Images

• What can a computer understand?
• Who?
• What?
• When?
• Where?
[Diagram: training data and queries (by example or by sketch) → Feature Extraction → Matching & Association → Classifier Decision!, built on statistically modeled computer vision algorithms]
• Image retrieval
• Robotic navigation
• Semantic labeling
• Image sketch
• Structure from Motion
• Requires: Some prior knowledge
• Open source and image collection capabilities
Where can we get training sets?
“Kullen.Net”
UW vs Cornell's "PhotoCity" competition
The Growth of Flickr
• Flickr "trounced by" Facebook
  – 15 billion photos
  – In Nov 2008, "2 billion photos each month."

• Flickr milestones:
  – Nov 13, 2007: 2 billionth image
  – Nov 8, 2008: 3 billionth image
  – Oct 9, 2009: 4 billionth image

"Still, it's a staggering number of photos for a site that launched in 2004." – TechCrunch
• Motivation
• Problems and Challenges
  – Current Techniques
  – Computational Issues
• Training an Image Database
• Results
• Conclusions
Outline
• Face detection and recognition: mostly solved
• Generic object detection: not so much
Specialized Content Detectors
• Computation for algorithms (e.g., deformable parts)
  – Rely on multiple instance learning (can be a considerable # of instances)
  – Parse the entire image for relevant features
  – Serial in computation
  – Rely on false alarm rejection to reduce computation
  – The feature space is exceedingly complex
Detectors for Every Object?
• Parallelizable, but still poor algorithmic performance
• Say you have 10 very good detectors (~5% FA rate each)
  – You still have a large image to classify at different scales/orientations, and 10 detectors at a 5% FA rate each compound to roughly a 40% FA rate (1 − (0.95)^10 ≈ 0.40)!
  – These classifiers don't know anything about their surroundings: people can't be flying, or walking on billboards!
    1. Chair, 2. Table, 3. Road, 4. Road, 5. Table, 6. Car, 7. Keyboard
• We use context to make inferences about an image
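The ~40% figure follows from assuming the detectors false-alarm independently: the chance that at least one fires falsely is one minus the chance that none do. A quick arithmetic check:

```python
# Chance that at least one of 10 independent detectors, each with a
# 5% false-alarm rate, fires falsely on a clean image: 1 - P(none fire).
n_detectors = 10
fa_rate = 0.05

p_any = 1 - (1 - fa_rate) ** n_detectors
print(f"{p_any:.3f}")  # 0.401
```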
• Motivation
• Problems and Challenges
• Training an Image Database
  – What are the "best" features?
  – How to automatically train for the best features
  – Automated choice in hierarchical GMMs
  – A better option: optimizing for sparsity
  – Parallel processing in sparse feature finding
• Results
• Conclusions

Outline
• Problems in image pattern matching
Finding the Features of an Image
• Each image = 10 million pixels!
• Most dimensions are irrelevant
• Multiple concepts inside the image
Feature Extraction → Training / Classifier
• Features are a quantitative way for machines to understand an image

  Image Property → Feature Technique
  – Local color → Luma + Chroma components
  – Object texture → Fourier domain / Wavelets
  – Shape → Curvelets, Shapelets
  – Lower-level gradients → Wavelets (Haar, Daubechies)
  – Higher-level descriptors → SIFT / SURF / etc.
  – Overall image descriptors → GIST

• Numerous features: only a subset is relevant
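As a feel for how cheap some of these descriptors are, here is a minimal numpy sketch of two of the simpler entries: channel statistics as a crude local-color feature, and a radially binned magnitude spectrum as a crude Fourier-domain texture feature. The helper names, bin count, and the random stand-in image are all illustrative assumptions, not the talk's actual extractors.

```python
import numpy as np

def color_feature(img_rgb):
    """Mean and std of each channel: a crude local-color descriptor."""
    return np.concatenate([img_rgb.mean(axis=(0, 1)), img_rgb.std(axis=(0, 1))])

def texture_feature(img_gray, n_bins=8):
    """Radially binned magnitude spectrum: a crude Fourier-domain texture cue."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img_gray)))
    h, w = img_gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)             # distance from DC bin
    edges = np.linspace(0, r.max(), n_bins + 1)
    return np.array([spectrum[(r >= lo) & (r < hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])

img = np.random.default_rng(0).random((64, 64, 3))   # stand-in image
feat = np.concatenate([color_feature(img),
                       texture_feature(img.mean(axis=2))])
print(feat.shape)  # (14,) -- 6 color stats + 8 texture bins
```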
Feature Extraction → Matching & Association
• FEATURES ARE:
  – Red bricks on multiple buildings
  – Small hedges, etc.
  – Windows of a certain type
  – Certain types of buildings

• FEATURES ARE:
  – Arches and white buildings
  – Domes and ancient architecture
  – Older/speckled materials (higher-frequency image content)

• FEATURES ARE:
  – More suburb-like
  – Larger roads
  – Drier vegetation
  – Shorter houses

Choice of features requires looking at multiple semantic concepts defined by entities and attributes inside of images
Environment is relevant
Feature Extraction → Classification / Training
• Feature invariance in images is necessary for most concepts
  – to transformations (e.g., 3D rotation, translation, scale)
  – to dynamic content (e.g., deformable parts)
  – to various contexts (e.g., illumination at different times of day)
  – to different instances
• Some features (e.g., SIFT) acquire some of these attributes, but only to a certain extent
  – Robust only to ~30 degrees of viewpoint change (SIFT)
  – Many times don't match
  – Limited illumination invariance
• A collective group of features is necessary (boosting/blending)
  – A large set of features makes training/classification more complex
  – Training is very difficult (feature extraction & training/classification)
• Tools to hand-label concepts
• 2006–2011
  – Google Image Labeler
  – Kobus's Corel Dataset
  – MIT LabelMe
Getting the Right Features
Feature
Extraction
Feature
Extraction
  – Yahoo! Games

• Problems
  – Tedious
  – Time consuming
  – Incorrect
  – Very low throughput

• Face detection: consider the time needed to collect all the data
• Feature selection: currently an active area of research
• Computational complexity is high
  – Feature extraction is a difficult problem
  – Classifiers are difficult problems
  – Complexity is traditionally passed between the two problems

• More knowledge of the domain/model: rely on better feature extractors. Less knowledge: rely (unfortunately) on complexity in discriminant methods.
• Would like to feed the entire image into training
  – Won't need to manually segment images
  – Feeding in noise will learn the context (info about surroundings)
  – Learn multiple instances of a concept (build invariance through example)
Automatically Learn the Best Features
Feature Extraction → Training / Classifier
  – Massively parallel per image per class
• Take several features, and subselect the “best” ones
[Diagram: training images for Image Class 1 … Image Class N; each class's entire images are modeled by Distribution 1 … Distribution N]
Automatic feature subselection has been submitted to SSP 2012
• Lots of work in the 1990s
  – Conditional probabilities through large training data sets
  – Vasconcelos et al.'s work on semantic image retrieval
  – Primarily based on multiple instance learning and noisy density estimation
• Learning multiple instances of an object (no noise case)
How do we do it?
• Robustness to noise through the law of large numbers
  – Hope to integrate it out
  – Although the area of red boxes per instance is small, their aggregate over all instances is dominant
  – Noise, if uncorrelated, will become more and more sparse
• Statistical distributions:
  – Generative methods: represent millions of points by a few parameters

• Mixture hierarchies can be incrementally trained
Parallel Calculations through Mixture Hierarchies
Top Level GMM
• Problems with HGMMs
  – Extensive computational process to bring hierarchies together
  – Difficult to train, as each level requires an initialization point
  – Must specify the number of classes at each level for initialization
Lower Level GMMs
Can be done in parallel
image 1 image 2 image 3
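The bottom-level fits are independent per image, which is where the parallelism comes from. A minimal sketch with scikit-learn is below; note the top-level merge here is collapsed to a refit on the pooled component means, which is a simplification (Vasconcelos-style mixture hierarchies run a weighted EM over the components), and the data, component counts, and thread-based parallelism are all illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from sklearn.mixture import GaussianMixture

def fit_bottom_gmm(features, k=3):
    """Bottom level: a small GMM over one image's feature vectors."""
    return GaussianMixture(n_components=k, random_state=0).fit(features)

def fit_hierarchy(per_image_features, k_top=2):
    # Bottom-level fits are embarrassingly parallel (threads here;
    # separate machines in a real deployment).
    with ThreadPoolExecutor() as pool:
        bottom = list(pool.map(fit_bottom_gmm, per_image_features))
    # Top level: refit on the pooled bottom-level component means.
    pooled = np.vstack([g.means_ for g in bottom])
    return GaussianMixture(n_components=k_top, random_state=0).fit(pooled)

rng = np.random.default_rng(0)
# Four toy "images": two drawn near 0, two near 4, in a 5-D feature space.
images = [rng.normal(loc=m, size=(200, 5)) for m in (0.0, 0.0, 4.0, 4.0)]
top = fit_hierarchy(images)
print(top.means_.shape)  # (2, 5) -- top-level components recover the two groups
```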
• Gaussian mixtures as a density estimate
  – Non-convex / sensitive to initialization
  – Iterative and very slow
  – Small-sample bias is large

• Think discriminantly:
  – Instead of: generating centroids that represent images
  – Think: prune features to eliminate redundancy
Finding a sparse basis set
• Sparsity optimization
  – Solve directly for the features that we want to use
  – Induces less complexity and, as we will see, is an LP problem
  – Reduction of redundancy is intuitive, not generative
• Under normalization, the GMM classifier can be implemented with a matched filter instead:

    argmin_{i ∈ {1,…,C}} ||x − y_i||²₂  ⇔  argmax_{i ∈ {1,…,C}} ⟨x, y_i⟩   (after normalizing the y_i)
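The equivalence is easy to verify numerically: once the templates are unit-norm, the squared distance ||x − y_i||² = ||x||² − 2⟨x, y_i⟩ + 1 differs from the inner product only by terms that do not depend on i. A small numpy check (the 5-template setup is a made-up example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)                        # query feature vector
Y = rng.normal(size=(5, 16))                   # class templates y_1..y_5
Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # the normalization step

# Nearest template by squared distance...
nearest = int(np.argmin(((x - Y) ** 2).sum(axis=1)))
# ...equals the matched-filter response: a single matrix-vector product.
matched = int(np.argmax(Y @ x))
print(nearest == matched)  # True
```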
Finding sparsity with linear programming

• Gaussian Mixture Models, solved via EM (a non-convex optimization problem):

    min_{π_1,…,π_M, μ_1,…,μ_M}  − Σ_{j=1..N} log Σ_{m=1..M} π_m p(x_j | μ_m)

  – Exponential according to N
  – Each iteration is O(MNd²)

• Many optimization problems (e.g., compressed sensing) induce sparsity. Group Lasso:

    argmin_β ||X − Xβ||²₂ + λ Σ_i ||β_i||₂   such that β_ij ∈ {0,1} and βᵀ1 = 1
Max-Constraint Optimization

• Matched filter constraint:

    argmax_β [ tr(β XXᵀ) − λ ||β1||²₂ ]   such that β_ij ∈ {0,1} and βᵀ1 = 1

• Relaxation of constraints yields an LP optimization problem:

    argmin_β [ −tr(β XXᵀ) + λ Σ_i t_i ]   such that 0 ≤ β_ij ≤ t_i ≤ 1 and βᵀ1 = 1

  – Faster than EM
  – Faster than G-Lasso
  – Independent of dimensionality!
  – Convex (unlike MF optimization and GMM/EM)
  – On average, scales with N²
Intuition

• Relies on the covariance matrix concept:

    β* = argmin_β [ −tr(β XXᵀ) + λ Σ_i t_i ]   s.t. 0 ≤ β_ij ≤ t_i ≤ 1 and βᵀ1 = 1

• Actual implementation does not include the covariance matrix, but rather keeps track of the β indices
  – Each row i of β is bounded above by its threshold t_i (< t_1, …, < t_4)

    XᵀX = [ 1.00  0.95  0.10  0.10
            0.95  1.00  0.20  0.00
            0.10  0.20  1.00  0.98
            0.10  0.00  0.98  1.00 ]
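As a sketch of how the relaxed problem can be handed to an off-the-shelf LP solver, the following uses scipy.optimize.linprog on the 4×4 Gram matrix from the slides, where features {1,2} and {3,4} are near-redundant pairs. The λ value, the sign conventions, and the reading of βᵀ1 = 1 as column sums are assumptions made for illustration; this is not the talk's implementation.

```python
import numpy as np
from scipy.optimize import linprog

# Gram matrix from the slides: features {1,2} and {3,4} are
# near-redundant pairs (correlations 0.95 and 0.98).
S = np.array([[1.00, 0.95, 0.10, 0.10],
              [0.95, 1.00, 0.20, 0.00],
              [0.10, 0.20, 1.00, 0.98],
              [0.10, 0.00, 0.98, 1.00]])
n = len(S)
lam = 0.3  # sparsity weight (assumed value)

# Variables: beta (n*n entries, row-major) followed by t (n entries).
# Objective: minimize -tr(beta @ S) + lam * sum(t).
c = np.concatenate([-S.T.ravel(), lam * np.ones(n)])

# Each column of beta sums to 1: every feature j gets assigned somewhere.
A_eq = np.zeros((n, n * n + n))
for j in range(n):
    A_eq[j, [i * n + j for i in range(n)]] = 1.0
b_eq = np.ones(n)

# beta_ij <= t_i: t_i caps row i, so sum(t) counts the rows kept active.
A_ub = np.zeros((n * n, n * n + n))
for i in range(n):
    for j in range(n):
        A_ub[i * n + j, i * n + j] = 1.0
        A_ub[i * n + j, n * n + i] = -1.0
b_ub = np.zeros(n * n)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, 1.0)] * (n * n + n), method="highs")
t = res.x[n * n:]
print(round(t.sum(), 3))  # 2.0 -- one representative per redundant pair
```

The λ penalty makes it cheaper to serve both features of a correlated pair from a single row of β than to keep both rows active, which is exactly the redundancy-pruning intuition above.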
• Motivation
• Problems and Challenges
• Training an Image Database
• Results
• Conclusions
Outline
LP Feature Learning versus G-Lasso
• More intuitive grouping
  – Threshold learning is unnecessary
  – Post-processing is unnecessary
• 5.452% more accurate in +1/−1 learning classes
• 80.054% faster than GMMs
Classifying Texture
[Figure: texture classification results – Original Image, Decisions, and Decision Confidence panels]
Complexity and Confusion Matrix
• 1400 images per dataset
• Filter reduction to 356 filters per class
• Less than a minute classification time
• Coverage of cities: entire cities (Vienna, Dubrovnik, Lubbock), portion of Cambridge (MIT-Kendall)
Confusion matrix (rows: testing dataset, columns: training dataset):

              MIT-Kendall   Vienna   Dubrovnik   Lubbock
MIT-Kendall      0.975       0.056     0.024      0.102
Vienna           0.050       0.896     0.035      0.060
Dubrovnik        0.015       0.024     0.905      0.057
Lubbock          0.097       0.002     0.053      0.901
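Averaging the per-city accuracies on the diagonal of the table summarizes the result in one number, roughly 92% mean per-class accuracy (a derived figure, not one stated in the slides):

```python
import numpy as np

# Confusion matrix from the table above (rows: testing, cols: training).
conf = np.array([[0.975, 0.056, 0.024, 0.102],
                 [0.050, 0.896, 0.035, 0.060],
                 [0.015, 0.024, 0.905, 0.057],
                 [0.097, 0.002, 0.053, 0.901]])

mean_acc = conf.diagonal().mean()   # average of per-city accuracies
print(round(mean_acc, 3))  # 0.919
```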
Computational Results & Accuracy
[Figure: computation time, log10(min), and MSE compared for GMMs vs. Beta Opt]
• Fixed iterations and k for the GMMs
• Best initialization via k-means (not included in optimization)
[Figure: computation time (log10) vs. number of DCT features (log10)]
Interesting automatic semantic learning result
• Training in computer vision is troublesome
  – Big data
  – Feature extraction
  – Non-automated processes
• Statistical characterization reduces complexity
• Redundancy arbitration achieves savings
Conclusions
• Feature selection through linear programming produces gains in computation time
• MIT Lincoln Laboratory– Karl Ni– Nicholas Armstrong-Crews– Scott Sawyer– Nadya Bliss
• MIT– Katherine L. Bouman
Contributors and Acknowledgements
• Boston University– Zachary Sun
• Northeastern University– Alexandru Vasile
• Cornell University– Noah Snavely
Questions?
Between Class Training
• There’s sky in both of these
• It’s a feature that is descriptive of most situations where you would find a car or a buffalo
• Simply get rid of the features that are common
• Find the most discriminative features
Car Class Buffalo Class
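The "discard what is common, keep what discriminates" idea can be sketched in a few lines. The toy data below (feature names, firing indicators) is entirely hypothetical: "sky" fires in both the car and buffalo classes, so it is dropped, while class-specific features survive.

```python
import numpy as np

# Hypothetical data: rows are images, columns are binary indicators of
# whether a named feature fired in that image.
feature_names = ["sky", "wheel", "horn", "road", "grass"]
car = np.array([[1, 1, 0, 1, 0],
                [1, 1, 0, 1, 0]])
buffalo = np.array([[1, 0, 1, 0, 1],
                    [1, 0, 1, 0, 1]])

# A feature is discriminative if its firing rate differs between classes;
# features common to both classes (like "sky") have a gap near zero.
gap = np.abs(car.mean(axis=0) - buffalo.mean(axis=0))
keep = [name for name, g in zip(feature_names, gap) if g > 0.5]
print(keep)  # ['wheel', 'horn', 'road', 'grass']
```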
Abstract

• Training an image class is most efficiently done with a small number of descriptive yet discriminative features. Features are often manually handpicked from subsets of imagery or machine-generated feature extractors. It is beneficial to automatically discard irrelevant features and retain the most representative ones. Determining the best features to use is inherently a difficult and computationally taxing process. Such a methodology would allow training large-scale datasets quickly, in parallel, and without human aid. We overview an automated technique in image pattern matching that uses sparse optimization constraints to select the best subset of large amounts of feature data.
Can you tell what is in this picture?
Courtesy A. Torralba
Context in processing is important
Courtesy A. Torralba