
Page 1: On Sparse Representations for Scalability in Image Pattern Matching

On Sparse Representations for

Scalability in Image Pattern Matching


SIAM, PPSC-2012

Parallel Processing and Scientific Computing

Karl Ni, [email protected], MIT Lincoln Laboratory

22 September 2011

This work is sponsored by the Department of the Air Force under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Page 2: On Sparse Representations for Scalability in Image Pattern Matching

• Motivation
  – Image pattern recognition
  – Image data collection capabilities

• Problems and Challenges

• Training an Image Database

• Results

• Conclusions

Outline


Page 3: On Sparse Representations for Scalability in Image Pattern Matching

• What can a computer understand?

Applying Semantic Understanding of Images

• Who?

• What?

• When?

• Where?


[Diagram: training data (query by example, query by sketch, statistical models) → Feature Extraction → Matching & Association → Classifier Decision]

Computer vision algorithms

• Image retrieval

• Robotic navigation

• Semantic labeling

• Image sketch

• Structure from Motion

• Requires: Some prior knowledge

Page 4: On Sparse Representations for Scalability in Image Pattern Matching

• Open source and image collection capabilities

Where can we get training sets?

“Kullen.Net”

UW vs. Cornell's "PhotoCity" competition

The Growth of Flickr


• Flickr "trounced by" Facebook
  – 15 billion photos
  – In Nov 2008, "2 billion photos each month."

[Timeline: Flickr's 2 billionth image (Nov 13, 2007), 3 billionth (Nov. 8, 2008), and 4 billionth (Oct 9, 2009). "Still, it's a staggering number of photos for a site that launched in 2004." -- TechCrunch]

Page 5: On Sparse Representations for Scalability in Image Pattern Matching

• Motivation

• Problems and Challenges
  – Current Techniques
  – Computational Issues

• Training an Image Database

• Results

• Conclusions

Outline


Page 6: On Sparse Representations for Scalability in Image Pattern Matching

• Face detection and recognition: mostly done

• Generic object detector: not so much

Specialized Content Detectors


• Computation for algorithms (e.g., deformable parts)
  – Rely on multiple instance learning (can be a considerable # of instances)
  – Parse the entire image for relevant features
  – Serial in computation
  – Rely on false alarm rejections to reduce computation
  – The feature space is exceedingly complex

Page 7: On Sparse Representations for Scalability in Image Pattern Matching

Detectors for Every Object?


• Parallelizable, but still poor algorithmic performance

• Let's say you have 10 very good detectors (~5% FA rate each)
  – You still have a large image to classify at different scales/orientations, and ten detectors at a 5% FA rate each combine to 1 − 0.95¹⁰ ≈ 40% FA rate!
  – These classifiers don't know anything about their surroundings: people can't be flying or walking on billboards!
    1. Chair, 2. Table, 3. Road, 4. Road, 5. Table, 6. Car, 7. Keyboard

• We use context to make inferences about an image
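The combined false-alarm arithmetic above is easy to verify: assuming the ten detectors fire independently (my assumption, not stated on the slide), the chance that at least one fires spuriously is 1 − 0.95¹⁰ ≈ 40%. A quick sketch:

```python
# Combined false-alarm (FA) rate of k independent detectors,
# each with per-detector FA rate p: P(at least one FA) = 1 - (1 - p)^k.
def combined_fa_rate(p: float, k: int) -> float:
    return 1.0 - (1.0 - p) ** k

print(combined_fa_rate(0.05, 10))  # ~0.401, i.e. roughly a 40% FA rate
```

Note the naive sum 10 × 0.05 = 0.5 overstates it slightly; the independence formula gives the ~40% figure on the slide.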


Page 8: On Sparse Representations for Scalability in Image Pattern Matching

• Motivation

• Problems and Challenges

• Training an Image Database
  – What are the "best" features?
  – How to automatically train for the best features
  – Automated choice in hierarchical GMMs
  – A better option: optimizing for sparsity
  – Parallel processing in sparse feature finding

• Results

• Conclusions

Outline

Page 9: On Sparse Representations for Scalability in Image Pattern Matching

• Problems in image pattern matching

Finding the Features of an Image

• Each image = 10 million pixels!

• Most dimensions are irrelevant

• Multiple concepts inside the image

[Diagram: Feature Extraction → Training / Classifier]

• Features are a quantitative way for machines to understand an image

Image Property (Feature Technique):
– Local color (luma + chroma components)
– Object texture (Fourier domain / wavelets)
– Shape (curvelets, shapelets)
– Lower-level gradients (wavelets: Haar, Daubechies)
– Higher-level descriptors (SIFT / SURF / etc.)
– Overall image descriptors (GIST)
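As a toy illustration of the first two rows of this list, the sketch below computes a local-color feature (luma mean) and a texture feature (Fourier-domain energy) from a small array standing in for an image. These are deliberately minimal stand-ins, not the richer descriptors named above:

```python
import numpy as np

def luma_mean(rgb: np.ndarray) -> float:
    # Local color feature: ITU-R BT.601 luma weights,
    # for an HxWx3 RGB array with values in [0, 1].
    luma = rgb @ np.array([0.299, 0.587, 0.114])
    return float(luma.mean())

def texture_energy(gray: np.ndarray) -> float:
    # Texture feature: total Fourier spectral power minus the DC term,
    # i.e. how much non-constant (high-frequency) content the patch has.
    spectrum = np.abs(np.fft.fft2(gray)) ** 2
    return float(spectrum.sum() - spectrum[0, 0])

img = np.random.default_rng(0).random((8, 8, 3))  # toy 8x8 "image"
print(luma_mean(img), texture_energy(img.mean(axis=2)))
```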

Page 10: On Sparse Representations for Scalability in Image Pattern Matching

Numerous features: only a subset is relevant

[Diagram: Feature Extraction → Matching & Association]

• FEATURES ARE:

• Red bricks on multiple buildings

• Small hedges, etc

• Windows of a certain type

• Types of buildings present

• FEATURES ARE:

• Arches and white buildings

• Domes and ancient architecture

• Older/speckled materials (higher frequency image content)

• FEATURES ARE:

• More suburb-like

• Larger roads

• Drier vegetation

• Shorter houses

Choice of features requires looking at multiple semantic concepts defined by entities and attributes inside of images


Page 11: On Sparse Representations for Scalability in Image Pattern Matching

Environment is relevant

[Diagram: Feature Extraction → Classification / Training]

• Feature invariance in images is necessary for most concepts
  – to transformations (e.g., 3D rotation, translation, scale)
  – to dynamic content (e.g., deformable parts)
  – to various contexts (e.g., illumination at different times of day)
  – to different instances


• Some features (e.g., SIFT) acquire some of these attributes, but only to a certain extent
  – SIFT: ~30 degrees
  – Many times don't match
  – Limited illumination invariance

• A collective group of features is necessary (boosting/blending)
  – A large set of features makes training/classification more complex
  – Training is very difficult (feature extraction & training/classification)

Page 12: On Sparse Representations for Scalability in Image Pattern Matching

• Tools to hand-label concepts

• 2006-2011
  – Google Image Labeler
  – Kobus's Corel Dataset
  – MIT LabelMe

Getting the Right Features


  – Yahoo! Games

• Problems
  – Tedious
  – Time consuming
  – Incorrect
  – Very low throughput

• Face detection: consider the time it took to collect all the data

• Feature selection: currently an active area of research

Page 13: On Sparse Representations for Scalability in Image Pattern Matching

• Computational complexity is high
  – Feature extraction is a difficult problem
  – Classifiers are difficult problems
  – Complexity is traditionally passed between the two problems

More knowledge of the domain/model: rely on better feature extractors
Less knowledge: rely (unfortunately) on complexity in discriminant methods

• Would like to feed the entire image into training
  – Won't need to manually segment images
  – Feeding in noise will learn the context (info about surroundings)
  – Learn multiple instances of a concept (build invariance through example)

Automatically Learn the Best Features

Feature Extraction

Feature Extraction

Training / ClassifierTraining / Classifier

13

SIAM, PPSC-2012

  – Massively parallel per image per class

• Take several features, and subselect the “best” ones

[Diagram: training images → entire image per image class 1…N → distribution 1…N]

Automatic feature subselection has been submitted to SSP 2012

Page 14: On Sparse Representations for Scalability in Image Pattern Matching

• Lots of work in the 1990s
  – Conditional probabilities through large training data sets
  – Vasconcelos et al.'s work on semantic image retrieval
  – Primarily based on multiple instance learning and noisy density estimation

• Learning multiple instances of an object (no noise case)

How do we do it?


• Robustness to noise through the law of large numbers
  – Hope to integrate it out
  – Although the area of red boxes per instance is small, their aggregate over all instances is dominant

• Noise, if uncorrelated, will become more and more sparse
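The law-of-large-numbers argument can be simulated directly: averaging many noisy instances of the same underlying pattern drives the uncorrelated noise toward zero, roughly as 1/√n. A minimal sketch (the sinusoidal signal and Gaussian noise model are my own toy choices):

```python
import numpy as np

rng = np.random.default_rng(42)
signal = np.sin(np.linspace(0, 2 * np.pi, 100))  # the repeated "object"

def residual_noise(n_instances: int) -> float:
    # Each instance = signal + uncorrelated Gaussian noise; average them
    # and measure how far the average still is from the clean signal.
    instances = signal + rng.normal(0.0, 1.0, size=(n_instances, 100))
    return float(np.abs(instances.mean(axis=0) - signal).mean())

# Residual noise shrinks roughly as 1/sqrt(n_instances).
print(residual_noise(10), residual_noise(1000))
```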

Page 15: On Sparse Representations for Scalability in Image Pattern Matching

• Statistical distributions
  – Generative methods: represent millions of points by a few parameters

• Mixture hierarchies can be incrementally trained

Parallel Calculations through Mixture Hierarchies

Top Level GMM


• Problems with HGMMs
  – Extensive computational process to bring hierarchies together
  – Difficult to train, as each level requires an initialization point
  – Must specify the number of classes at each level for initialization

[Diagram: per-image lower-level GMMs (image 1, image 2, image 3) feed the top-level GMM; the lower level can be done in parallel]
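The "can be done in parallel" point can be sketched in a few lines: fit a cheap per-image model independently for each image, then combine at the top. Here a single Gaussian (mean + covariance) stands in for each image's lower-level GMM, and the top-level merge is a naive average rather than the EM-based mixture-hierarchy step the slide refers to:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fit_image_model(features: np.ndarray):
    # Lower-level model per image: sample mean + covariance of its
    # feature vectors (a stand-in for a per-image GMM).
    return features.mean(axis=0), np.cov(features, rowvar=False)

rng = np.random.default_rng(0)
images = [rng.normal(size=(50, 3)) for _ in range(8)]  # 8 images, 3-D features

# Each image's model depends only on that image -> embarrassingly parallel.
with ThreadPoolExecutor() as pool:
    models = list(pool.map(fit_image_model, images))

# Naive top level: aggregate the per-image means for the next tier.
top_mean = np.mean([mean for mean, _ in models], axis=0)
print(top_mean.shape)
```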

Page 16: On Sparse Representations for Scalability in Image Pattern Matching

• Gaussian mixtures as a density estimate
  – Non-convex / sensitive to initialization
  – Iterative and very slow
  – Small-sample bias is large

• Think discriminantly:
  – Instead of: generating centroids that represent images
  – Think: prune features to eliminate redundancy

Finding a sparse basis set


• Sparsity optimization
  – Solve directly for the features that we want to use
  – Induces less complexity and, as we will see, is an LP problem
  – Reduction of redundancy is intuitive and not generative

• Under normalization, the GMM classifier can be implemented with a matched filter instead

arg min_{i ∈ {1,…,C}} ||x − y_i||²₂   --(normalize)-->   arg max_{i ∈ {1,…,C}} ⟨x, y_i⟩
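The claimed equivalence is easy to check numerically: once the query and templates are ℓ2-normalized, ||x − y||²₂ = 2 − 2⟨x, y⟩, so nearest-neighbor by squared distance and the matched filter (largest inner product) pick the same index. A small verification with random unit vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
templates = rng.normal(size=(5, 16))
templates /= np.linalg.norm(templates, axis=1, keepdims=True)  # normalize
x = rng.normal(size=16)
x /= np.linalg.norm(x)

nearest = np.argmin(((templates - x) ** 2).sum(axis=1))  # distance-based pick
matched = np.argmax(templates @ x)                        # matched filter pick
print(nearest, matched)  # same index both ways
```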

Page 17: On Sparse Representations for Scalability in Image Pattern Matching

• Gaussian Mixture Models

• Many optimization problems (compressed sensing) induce sparsity:

Finding sparsity with linear programming

GMM, solved via EM (non-convex optimization problem; exponential according to N; each iteration O(MNd²)):

    min_{μ₁,…,μ_M; π₁,…,π_M}  − Σ_{j=1}^{N} log Σ_{m=1}^{M} π_m p(x_j | μ_m)

Group Lasso:

    arg min_β  ||X − Xβ||²₂ + λ ||β||₂,₁   such that β_ij ∈ {0,1} and βᵀ1 = 1


• Matched filter constraint:

    arg max_β  tr(XᵀXβ) − λ Σ_i max_j β_ij   such that β_ij ∈ {0,1} and βᵀ1 = 1

• Relaxation of constraints

Max-Constraint Optimization

LP Optimization Problem:

    arg min_β  −tr(XᵀXβ) + λ Σ_i t_i   such that 0 ≤ β_ij ≤ t_i ≤ 1 and βᵀ1 = 1

  – Faster than EM
  – Faster than G-Lasso
  – Independent of dimensionality!
  – Convex (unlike MF opt & GMM, EM)
  – On average, according to N²
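To see what the binary max-constraint objective does before relaxation, one can brute-force it on a tiny problem: each column of β assigns one data column to a representative row, tr(XᵀXβ) rewards assigning columns to rows they correlate with, and the sparsity term charges λ per distinct row used. This is my own toy enumeration with an invented Gram matrix; the slides solve the relaxed LP instead, which is what scales:

```python
import numpy as np
from itertools import product

# Toy Gram matrix X^T X: features 0 and 1 are near-duplicates
# (correlation 0.9); feature 2 stands alone.
C = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
lam = 0.5
n = C.shape[0]

best_score, best_assign = -np.inf, None
for assign in product(range(n), repeat=n):  # representative row per column
    beta = np.zeros((n, n))
    for col, row in enumerate(assign):
        beta[row, col] = 1.0               # binary beta with beta^T 1 = 1
    # Reward correlation with the representative; charge lam per row used.
    score = np.trace(C @ beta) - lam * len(set(assign))
    if score > best_score:
        best_score, best_assign = score, assign

print(best_assign)  # the two redundant columns share one representative row
```

With λ = 0.5 the optimum collapses columns 0 and 1 onto a single row while feature 2 keeps its own: exactly the redundancy pruning the LP relaxation recovers at scale.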

Page 18: On Sparse Representations for Scalability in Image Pattern Matching

• Relies on covariance matrix concept

Intuition

    β* = arg min_β  −tr(XᵀXβ) + λ Σ_i t_i   s.t. 0 ≤ β_ij ≤ t_i ≤ 1 and βᵀ1 = 1


• The actual implementation does not form the covariance matrix explicitly, but rather keeps track of beta indices

β = [each row i bounded above by t_i, i = 1,…,4]

    XᵀX ≈  ⎡ 1    .98   0    .1 ⎤
           ⎢ .98  1    .2    0  ⎥
           ⎢ 0    .2   1    .95 ⎥
           ⎣ .1   0    .95  1   ⎦
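The intuition above — large off-diagonal entries of XᵀX flag redundant features — can be reproduced directly. Using an approximate version of the slide's example matrix (values recovered from the slide, so treat them as illustrative), entries near 1 off the diagonal mean two features carry nearly the same information, and one of each such pair can be dropped:

```python
import numpy as np

# Approximate Gram matrix from the slide: feature pairs (0, 1) and (2, 3)
# are nearly collinear (0.98 and 0.95 correlations).
G = np.array([[1.00, 0.98, 0.00, 0.10],
              [0.98, 1.00, 0.20, 0.00],
              [0.00, 0.20, 1.00, 0.95],
              [0.10, 0.00, 0.95, 1.00]])

def redundant_pairs(gram: np.ndarray, thresh: float = 0.9):
    # Scan the upper triangle for feature pairs too correlated to both keep.
    i, j = np.triu_indices_from(gram, k=1)
    return [(int(a), int(b)) for a, b in zip(i, j) if gram[a, b] > thresh]

print(redundant_pairs(G))  # [(0, 1), (2, 3)]
```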

Page 19: On Sparse Representations for Scalability in Image Pattern Matching

• Motivation

• Problems and Challenges

• Training an Image Database

• Results

• Conclusions

Outline


Page 20: On Sparse Representations for Scalability in Image Pattern Matching

LP Feature Learning versus G-Lasso


• More intuitive grouping
  – Threshold learning is unnecessary
  – Post-processing is unnecessary

• 5.452% more accurate in +1/-1 learning classes

• 80.054% faster than GMMs

Page 21: On Sparse Representations for Scalability in Image Pattern Matching

Classifying Texture

[Figure: Original Image, Decisions, and Decision Confidence panels for texture classification]

Page 22: On Sparse Representations for Scalability in Image Pattern Matching

Complexity and Confusion Matrix

• 1400 images per dataset

• Filter reduction to 356 filters per class

• Less than a minute classification time

• Coverage of cities: entire cities (Vienna, Dubrovnik, Lubbock), portion of Cambridge (MIT-Kendall)


Confusion matrix (rows: testing dataset; columns: training dataset):

              MIT-Kendall   Vienna   Dubrovnik   Lubbock
MIT-Kendall      0.975       0.056      0.024      0.102
Vienna           0.050       0.896      0.035      0.060
Dubrovnik        0.015       0.024      0.905      0.057
Lubbock          0.097       0.002      0.053      0.901

Page 23: On Sparse Representations for Scalability in Image Pattern Matching

Computational Results & Accuracy

[Figure: computation time (log10 minutes) and MSE versus log10 #DCT features, comparing GMMs against Beta Opt]


• Fixed iteration and k GMM’s

• Best initialization via k-means (not included in optimization)


Page 24: On Sparse Representations for Scalability in Image Pattern Matching

Interesting automatic semantic learning result


Page 25: On Sparse Representations for Scalability in Image Pattern Matching

• Training in computer vision is troublesome
  – Big data
  – Feature extraction
  – Non-automated processes

• Statistical characterization reduces complexity

• Redundancy arbitration achieves savings

Conclusions


• Feature selection through linear programming produces gains in computation time

Page 26: On Sparse Representations for Scalability in Image Pattern Matching

• MIT Lincoln Laboratory– Karl Ni– Nicholas Armstrong-Crews– Scott Sawyer– Nadya Bliss

• MIT– Katherine L. Bouman

Contributors and Acknowledgements


• Boston University– Zachary Sun

• Northeastern University– Alexandru Vasile

• Cornell University– Noah Snavely

Page 27: On Sparse Representations for Scalability in Image Pattern Matching

Questions?


Page 28: On Sparse Representations for Scalability in Image Pattern Matching

Between Class Training


• There’s sky in both of these

• It’s a feature that is descriptive of most situations where you would find a car or a buffalo

• Simply get rid of the features that are common

• Find the most discriminative features

[Images: Car Class | Buffalo Class]
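The pruning rule on this slide — discard features common to both classes, keep the ones that discriminate — can be sketched with toy per-class feature-presence rates (the feature names and numbers below are invented for illustration):

```python
# Fraction of training images in which each feature fires, per class.
car = {"sky": 0.9, "wheels": 0.8, "road": 0.7, "grass": 0.1}
buffalo = {"sky": 0.9, "wheels": 0.0, "road": 0.1, "grass": 0.8}

def discriminative(a: dict, b: dict, min_gap: float = 0.5):
    # Keep features whose presence rates differ a lot between classes;
    # shared features like "sky" fall out automatically.
    return sorted(f for f in a if abs(a[f] - b[f]) >= min_gap)

print(discriminative(car, buffalo))  # ['grass', 'road', 'wheels']
```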

Page 29: On Sparse Representations for Scalability in Image Pattern Matching

Abstract

• Training an image class is most efficiently done with a small number of descriptive yet discriminative features. Features are often manually handpicked from subsets of imagery or machine-generated feature extractors. It is beneficial to automatically discard irrelevant features and retain the most representative ones. Determining the best features to use is an inherently difficult and computationally taxing process. Such a methodology would allow training large-scale datasets quickly, in parallel, and without human aid. We overview an automated technique in image pattern matching that uses sparse optimization constraints to select the best subset of large amounts of feature data.

Page 30: On Sparse Representations for Scalability in Image Pattern Matching

Can you tell what is in this picture?


Courtesy A. Torralba

Page 31: On Sparse Representations for Scalability in Image Pattern Matching

Context in processing is important


Courtesy A. Torralba