Large Scale Image Processing with Hadoop

Brandyn White

Advisor: Prof. Larry Davis

Outline

• 'Big Data' in Computer Vision
• Map/Reduce and Computer Vision
• Map/Reduce Image Search
• Application: Screenshot Retrieval

'Big Data' in Vision

• Traditional Vision: Focus on the model
o Pose Est.: 2D Image -> Virtual 3D model + Camera
    Under-constrained, slow, sensitive to noise
o Object Recognition: SVM + features
    Breaks with many classes (e.g., every flickr tag)

• New Trend: Focus on the data
o DB of images (w/ metadata) -> query image
o Problem becomes similar image search
o Transfer metadata from DB images to query image
o KNN methods simple and scalable
    Clustering, hashing, metric learning

• NLP: rule-based models -> statistical models

Example: Image Search -> Metadata

Query Image -> Retrieved Images (flickr)

Metadata of retrieved images: Tags, Location (GPS), Title, Date, Groups, Comments, Owner, Views

Output Metadata: Tags, Location (GPS)

Big Data in Vision: Pose Estimation

Goal: Given an image of a person, estimate 3D pose.

G. Shakhnarovich, P. Viola, and T. Darrell, "Fast pose estimation with parameter-sensitive hashing," October 2003.

Big Data in Vision: Scene Completion

Goal: Given an image and a selected region, fill the region with a plausible texture.

J. Hays and A. A. Efros, "Scene completion using millions of photographs," in SIGGRAPH '07: ACM SIGGRAPH 2007 papers. New York, NY, USA: ACM, 2007, pp. 4+.

Big Data in Vision: IM2GPS

Goal: Given an image, guess where in the world it was taken.

J. Hays and A. A. Efros, "Im2gps: estimating geographic information from a single image," Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 0, pp. 1-8, 2008.

Big Data in Vision: Object Recognition

Goal: Given an image, select a noun that describes it.

A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11, pp. 1958-1970, May 2008.

Big Data in Vision: Pixel Annotation

Goal: Given an image, annotate every pixel (e.g., building).

C. Liu, J. Yuen, and A. Torralba, "Nonparametric scene parsing: Label transfer via dense scene alignment," Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 0, pp. 1972-1979, 2009.

Big Data in Vision: One Frame Motion

Goal: Given an image, estimate the pixel motion.

C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, "Sift flow: Dense correspondence across different scenes," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 28-42.

Hadoop+CV: No Reducer

Example Maps
• Object Detection (e.g., cars, faces)
• Feature Computation (e.g., SIFT)
• Sliding Windows (given a region+image)

[Diagram: three parallel Map tasks with no Reduce stage]
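A map-only job of this kind can be sketched in plain Python. This is a minimal illustration, not the author's code: it assumes each record is an (image_id, pixels) pair, and `mean_intensity` is a hypothetical stand-in for a real detector or SIFT extractor.

```python
def mean_intensity(pixels):
    """Toy stand-in for a real feature extractor: average pixel value."""
    return sum(pixels) / len(pixels)

def feature_mapper(image_id, pixels):
    """Map step: emit one (key, value) pair per image; no Reduce follows."""
    yield image_id, mean_intensity(pixels)

def run_map_only(records):
    """Apply the mapper to each record independently, as Hadoop would per split."""
    output = []
    for image_id, pixels in records:
        output.extend(feature_mapper(image_id, pixels))
    return output

records = [("img0", [0, 10, 20]), ("img1", [5, 5, 5])]
results = run_map_only(records)
```

Because no record depends on any other, the work can be spread across as many Map tasks as there are input splits.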

Hadoop+CV: Model Creation

Map: Feature Computation
Reduce: Model Creation
Examples
• Classifiers (e.g., SVM, Bayes)
• Geometry Problems (e.g., RANSAC, SfM)

[Diagram: three Map tasks feeding one Reduce task]
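The Map/Reduce split for model creation might look like the sketch below. To keep it short, it swaps in a nearest-centroid classifier (one mean vector per class) rather than the SVM or Bayes models named on the slide. The 2-D feature and the in-memory `run_mapreduce` driver are assumptions for illustration only.

```python
from collections import defaultdict

def mapper(label, pixels):
    # Feature computation: a hypothetical 2-D feature (mean, max) per image.
    yield label, (sum(pixels) / len(pixels), max(pixels))

def reducer(label, features):
    # Model creation: one centroid per class (nearest-centroid classifier).
    n = len(features)
    dims = len(features[0])
    yield label, tuple(sum(f[d] for f in features) / n for d in range(dims))

def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for Hadoop's map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    out = []
    for k in sorted(groups):
        out.extend(reducer(k, groups[k]))
    return out

records = [("car", [1, 3]), ("car", [3, 5]), ("face", [10, 20])]
model = dict(run_mapreduce(records, mapper, reducer))
```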

Hadoop+CV: Expectation Maximization

Map: Fit data to model given parameters (E-Step)
Reduce: Compute new model parameters given data (M-Step)
Iterate until stopping conditions are met.
Examples
• Clustering (e.g., K-Means)
• Mixture Models (e.g., MoG)

[Diagram: vectors Vec0, Vec1, Vec2 each go to a Map task; the parameter estimate is supplied to the Maps (in JAR or cache); Map outputs feed one Reduce task]
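The iterate-until-converged pattern can be sketched with 1-D K-Means, which can be viewed as a hard-assignment variant of EM. The driver loop, tolerance, and starting means below are assumptions. On a real cluster each iteration would be a separate Hadoop job, with the current means shipped to the mappers.

```python
from collections import defaultdict

def e_step_map(point, means):
    """Map (E-step): assign the point to the nearest current mean."""
    nearest = min(range(len(means)), key=lambda i: (point - means[i]) ** 2)
    yield nearest, point

def m_step_reduce(cluster, points):
    """Reduce (M-step): re-estimate the mean from the assigned points."""
    yield cluster, sum(points) / len(points)

def kmeans(points, means, max_iters=20, tol=1e-9):
    for _ in range(max_iters):
        groups = defaultdict(list)
        for p in points:
            for k, v in e_step_map(p, means):
                groups[k].append(v)
        new_means = list(means)  # empty clusters keep their old mean
        for k, vals in groups.items():
            for cluster, mean in m_step_reduce(k, vals):
                new_means[cluster] = mean
        # Stopping condition: no mean moved by more than tol.
        if max(abs(a - b) for a, b in zip(means, new_means)) < tol:
            return new_means
        means = new_means
    return means

final_means = kmeans([1.0, 2.0, 9.0, 11.0], means=[0.0, 5.0])
```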

Image Retrieval with Hadoop

• Analogies between image and text retrieval
o Bag of Words -> Bag of Features
o Document -> Image
o Visual Word: Cluster of similar visual features

• Compute Local Image Features (e.g., SIFT)
• Cluster Features (i.e., create visual words)
• Find cluster medians
• Make Hamming Embeddings (compact feature) [1]
o Efficient binary code (256 -> 8 Bytes per feature)
o Hamming Distance
o Benefit: Small size means more in memory
• Inverted Index

[1] H. Jegou, M. Douze, and C. Schmid, "Hamming embedding and weak geometric consistency for large scale image search," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 304-317.
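The 256 -> 8 byte figure is consistent with a 64-D descriptor stored as 32-bit floats being reduced to one bit per dimension. That reading is an assumption, not something the slides state. The sketch below checks the arithmetic and shows a Hamming distance on integer-packed codes.

```python
import struct

DIMS = 64  # assumed descriptor length (64-D SURF)

float_bytes = struct.calcsize("<%df" % DIMS)  # 64 floats * 4 bytes each
code_bytes = DIMS // 8                        # one bit per dimension

def hamming_distance(a, b):
    """Number of differing bits between two integer-packed binary codes."""
    return bin(a ^ b).count("1")

d = hamming_distance(0b1011, 0b0010)
```

Comparing two codes costs one XOR and a bit count, which is far cheaper than a 64-D Euclidean distance.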

Hadoop Job Workflow

Image Features (SURF 64D)

Remove Dupes (Curr./Prev.)

(Database Images)

K-Means Clustering (Initial)

K-Means Clustering

Median Computation

Hamming Embedding

Hadoop Job Workflow: Image Features


(Database Images)

Map In: (image_url, image_hash, image_data, image_tags)

Map Out: (image_hash, image_url, image_features)

Hadoop Job Workflow: Remove Dupes

Map In: [image_hash, image_url, image_features]
or
Map In: [image_hash] (for images already in the DB)

Map Out Key: image_hash
Map Out Val: image_features

Reduce Out: [image_hash, image_feature]
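A sketch of how this job could drop duplicates follows. The `IN_DB` marker value, the choice to keep the first copy's features, and the in-memory grouping are all assumptions; the slides only give the input and output record shapes.

```python
from collections import defaultdict

IN_DB = None  # hypothetical marker for hashes already in the database

def dedupe_mapper(record):
    if len(record) == 1:                 # [image_hash] from the existing DB
        yield record[0], IN_DB
    else:                                # [image_hash, image_url, image_features]
        image_hash, _url, features = record
        yield image_hash, features

def dedupe_reducer(image_hash, values):
    if any(v is IN_DB for v in values):  # seen in a previous run: drop
        return
    for feature in values[0]:            # keep only the first copy's features
        yield image_hash, feature

records = [
    ["h1", "url_a", [[1, 2], [3, 4]]],
    ["h1", "url_b", [[1, 2], [3, 4]]],   # duplicate within the current batch
    ["h2", "url_c", [[5, 6]]],
    ["h2"],                              # h2 is already in the DB
]
groups = defaultdict(list)
for record in records:
    for key, value in dedupe_mapper(record):
        groups[key].append(value)
unique = [pair for key in sorted(groups) for pair in dedupe_reducer(key, groups[key])]
```

Keying on the image hash sends every copy of an image to the same reducer, so the duplicate check needs no lookups outside that one key.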



Hadoop Job Workflow: K-Means (init)

Map In: [image_hash, image_feature]

Map Out Key: random [0,1]
Map Out Val: image_feature (extended by 1 dim to get count)

1 Reducer (outputs once per cluster)
Reduce Out: [cluster_num, cluster_mean]



Hadoop Job Workflow: K-Means

File: cluster_means
Map In: [image_hash, image_feature]

Map Out Key: cluster_num (nearest cluster)
Map Out Val: image_feature (extended by 1 dim to get count)

Reduce Out: [cluster_num, cluster_mean]
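One K-Means iteration with the count carried in an extra dimension might look like the sketch below. The helper names and sample data are hypothetical. The likely reason for the extra dimension, stated here as an assumption, is that summed vectors then carry their own count, so partial sums can be merged before the final division.

```python
from collections import defaultdict

def nearest_cluster(feature, means):
    def sqdist(m):
        return sum((a - b) ** 2 for a, b in zip(feature, m))
    return min(range(len(means)), key=lambda i: sqdist(means[i]))

def kmeans_mapper(feature, cluster_means):
    # Append a 1 so that summed vectors carry their own count.
    yield nearest_cluster(feature, cluster_means), list(feature) + [1]

def vector_sum(vectors):
    return [sum(col) for col in zip(*vectors)]

def kmeans_reducer(cluster_num, extended):
    total = vector_sum(extended)
    count = total[-1]                    # last dimension holds the count
    yield cluster_num, [x / count for x in total[:-1]]

cluster_means = [[0.0, 0.0], [10.0, 10.0]]   # contents of the cluster_means file
features = [[1.0, 1.0], [3.0, 1.0], [9.0, 11.0]]

groups = defaultdict(list)
for f in features:
    for k, v in kmeans_mapper(f, cluster_means):
        groups[k].append(v)
new_means = dict(
    pair for k in sorted(groups) for pair in kmeans_reducer(k, groups[k])
)
```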



Hadoop Job Workflow: Medians

File: cluster_means
Map In: [image_hash, image_feature]

Map Out Key: cluster_num (nearest cluster)
Map Out Val: image_feature

Reduce Out: [cluster_num, cluster_median]
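The reducer for this job could be as small as the sketch below. It assumes the median is taken independently per dimension and that all features for one cluster fit in the reducer's memory. The sample values are made up.

```python
from statistics import median

def median_reducer(cluster_num, features):
    """Per-dimension median over all features assigned to one cluster."""
    yield cluster_num, [median(col) for col in zip(*features)]

assigned = [[1.0, 8.0], [2.0, 4.0], [9.0, 6.0]]
(cluster, cluster_median), = median_reducer(0, assigned)
```

Unlike the mean, a median cannot be built from partial sums, so this job does not use the count trick from the K-Means step.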


Hadoop Job Workflow: Ham. Emb.

File: cluster_means, cluster_medians
Map In: [image_hash, image_feature]

Map Out Key: cluster_num (nearest cluster)
Map Out Val: hamming_embedding

Reduce Out: [cluster_num, hamming_embedding]
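A simplified embedding step is sketched below. It assumes each bit is set by comparing one feature dimension to the matching dimension of the cluster median. The method in [1] applies a random projection first, which is left out here. Function names are hypothetical.

```python
def hamming_embedding(feature, cluster_median):
    """Pack one bit per dimension: 1 where the feature exceeds the median."""
    code = 0
    for value, med in zip(feature, cluster_median):
        code = (code << 1) | (1 if value > med else 0)
    return code

def to_bytes(code, dims):
    """Serialize the code into the smallest whole number of bytes."""
    return code.to_bytes((dims + 7) // 8, "big")

cluster_median = [2.0, 6.0, 0.5, 1.0]
code = hamming_embedding([3.0, 5.0, 0.9, 1.0], cluster_median)
```

With a 64-D feature, `to_bytes(code, 64)` yields 8 bytes per feature.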


Image Retrieval Overview: Query


(Query Image)

For each feature...

Find nearest cluster

Compute Hamming embedding (using cluster median)

Vote (tf-idf) for a DB image if one of its features in that cluster has Hamming dist < Thresh
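The voting step can be sketched against an in-memory inverted index. The index layout (cluster_num -> list of (image_hash, code)), the idf-only weighting, and the sample data are assumptions. The nearest-cluster lookup and embedding of each query feature are taken as already done.

```python
import math
from collections import defaultdict

def hamming(a, b):
    return bin(a ^ b).count("1")

def query(query_words, inverted_index, num_images, thresh):
    """query_words: (cluster_num, code) pairs for the query's features."""
    scores = defaultdict(float)
    for cluster_num, code in query_words:
        postings = inverted_index.get(cluster_num, [])
        if not postings:
            continue
        # Rarer visual words get a larger weight (idf).
        images_with_word = len({h for h, _ in postings})
        idf = math.log(num_images / images_with_word)
        for image_hash, db_code in postings:
            if hamming(code, db_code) < thresh:
                scores[image_hash] += idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

inverted_index = {
    0: [("imgA", 0b1010), ("imgB", 0b0101)],
    1: [("imgA", 0b1111)],
}
ranked = query([(0, 0b1011), (1, 0b1110)], inverted_index, num_images=4, thresh=2)
```

In this example only imgA receives votes: imgB shares a visual word with the query, but its code is 3 bits away, which is above the threshold.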

Current Work: PC Help Doc. Retrieval

• Goal: Take a screenshot and retrieve books and websites that provide relevant help documentation.

Tom Yeh, Brandyn White, Larry Davis, and Boris Katz

Conclusion

• Vision has 'Big Data' applications
• Many image search applications
• Common design patterns for M/R+Vision
• Hadoop is useful for image search

References

[1] P. Duygulu, K. Barnard, J. de Freitas, and D. Forsyth, "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary," in Computer Vision - ECCV 2002, ser. Lecture Notes in Computer Science, 2002, ch. 7, pp. 349-354.

[2] A. Makadia, V. Pavlovic, and S. Kumar, "A new baseline for image annotation," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 316-329.

[3] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, "Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation," ICCV 2009.

[4] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11, pp. 1958-1970, May 2008.
