color naming 65,274,705,768 pixels

Color naming 65,274,705,768 pixels Nathan Moroney and Giordano Beretta HP Labs Electronic Imaging 2013: Color Imaging XVIII

Upload: nmoroney

Post on 06-Sep-2014


DESCRIPTION

Presented at Color Imaging XVIII: Displaying, Processing, Hardcopy, and Applications in 2013. Application of machine color naming to 200,000+ Wikipedia images.

TRANSCRIPT

Page 1: Color naming 65,274,705,768 pixels

Color naming 65,274,705,768 pixels

Nathan Moroney and Giordano Beretta

HP Labs

Electronic Imaging 2013: Color Imaging XVIII

Page 2: Color naming 65,274,705,768 pixels

Outline

Motivation: more (pixel) data

Finding and processing 65 billion pixels. Hint: Wikipedia and a dual-core OpenMP color namer

What did you learn? The most frequent non-achromatic color term is…

What's next? Other than a trillion pixels

Page 3: Color naming 65,274,705,768 pixels

Motivation

Previous work in crowd-sourcing color training data and experimental efforts

Related work in the area of big (image) data:

A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: a large dataset for non-parametric object and scene recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30(11), pp. 1958-1970, 2008.

Ben Shneiderman, "Extreme Visualization: Squeezing a Billion Records into a Million Pixels", SIGMOD Conference, pp. 3-12, 2008.

Steven Seitz, "A Trillion Photos", EI'13 Keynote, 2013.

Page 4: Color naming 65,274,705,768 pixels

Motivation

[Figure: chart with axis "Log Number of Images", ticks 0 to 6]

Page 5: Color naming 65,274,705,768 pixels

Source Data

ImageCLEF 2010 snapshot: Adrian Popescu, Theodora Tsikrika, and Jana Kludas, "Overview of the Wikipedia retrieval task at ImageCLEF 2010", in the Working Notes for the CLEF 2010 Workshop, 20-23 September, Padova, Italy, 2010.

250,000 images plus associated Wikipedia data

20 gigabytes

65,000,000,000 pixels uncompressed
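The figures on this slide imply a few back-of-the-envelope numbers. The sketch below works them out under two assumptions that the slides do not state: 3 bytes per pixel (8-bit RGB, no alpha) and decimal gigabytes for the 20 GB figure.

```python
# Back-of-the-envelope sizes for the collection described on this slide.
TOTAL_PIXELS = 65_274_705_768   # from the talk title
NUM_IMAGES = 250_000            # from the slide
COMPRESSED_BYTES = 20 * 10**9   # "20 gigabytes", assumed to be decimal GB
BYTES_PER_PIXEL = 3             # assumption: 8-bit RGB, no alpha channel


def dataset_summary():
    """Return average image size and raw storage implied by the slide."""
    avg_pixels = TOTAL_PIXELS / NUM_IMAGES
    raw_bytes = TOTAL_PIXELS * BYTES_PER_PIXEL
    return {
        "avg_pixels_per_image": avg_pixels,
        "uncompressed_gb": raw_bytes / 10**9,
        "compression_ratio": raw_bytes / COMPRESSED_BYTES,
    }


if __name__ == "__main__":
    for key, value in dataset_summary().items():
        print(f"{key}: {value:,.1f}")
```

Under these assumptions the average image is roughly a quarter of a megapixel, and the decoded pixels take roughly ten times the space of the compressed files.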

Page 6: Color naming 65,274,705,768 pixels

Source Data: At 200 PPI
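The figure for this slide is not in the transcript. One way to get a feel for the scale is to compute the area all the pixels would cover if printed at 200 pixels per inch; the helper names below are illustrative, not from the talk.

```python
import math

TOTAL_PIXELS = 65_274_705_768  # from the talk title
PPI = 200                      # from the slide title
M_PER_INCH = 0.0254            # exact by definition


def printed_area_m2(pixels, ppi):
    """Area in square metres when every pixel is printed at `ppi`."""
    square_inches = pixels / (ppi * ppi)
    return square_inches * M_PER_INCH ** 2


def square_side_m(pixels, ppi):
    """Side length of a single square sheet with that area."""
    return math.sqrt(printed_area_m2(pixels, ppi))


if __name__ == "__main__":
    print(f"area: {printed_area_m2(TOTAL_PIXELS, PPI):,.0f} m^2")
    print(f"square side: {square_side_m(TOTAL_PIXELS, PPI):.1f} m")
```

This comes to a little over a thousand square metres, or a square a bit more than 32 m on a side.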

Page 7: Color naming 65,274,705,768 pixels

Processing

Basic script, run on a single dual-core machine (but threaded with OpenMP), to process all image files

Simple stuff like getting image dimensions can be done over lunch

Uncompressing all the JPEG files to memory can take hours

Goal was a color naming algorithm that could be run in less than a day
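To make the "less than a day" goal concrete, the sketch below computes the sustained throughput it implies and shows a small two-worker driver over a directory of files. The talk used an OpenMP-threaded script; Python's `ThreadPoolExecutor` is swapped in here purely for illustration, and `process_all` and its `per_file` callback are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor

TOTAL_PIXELS = 65_274_705_768  # from the talk title
SECONDS_PER_DAY = 24 * 60 * 60


def required_pixels_per_second(total_pixels=TOTAL_PIXELS, seconds=SECONDS_PER_DAY):
    """Minimum sustained throughput needed to finish within `seconds`."""
    return total_pixels / seconds


def list_files(root):
    """Yield every file path under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def process_all(root, per_file, workers=2):
    """Apply `per_file` to each file under `root` using `workers` threads.

    Returns a dict mapping path -> result. `per_file` is a placeholder
    for the real work (reading dimensions, naming pixels, and so on).
    """
    paths = sorted(list_files(root))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(per_file, paths))
    return dict(zip(paths, results))


if __name__ == "__main__":
    print(f"{required_pixels_per_second():,.0f} pixels/s needed")
    print(process_all(".", os.path.getsize))
```

The budget works out to roughly three quarters of a million pixels per second, including JPEG decoding. Note that CPython threads do not run pure-Python CPU-bound code in parallel, so a real pixel loop would need native code or a process pool.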

Page 8: Color naming 65,274,705,768 pixels

Processing

Some testing done using HP Cloud Services and compute clusters

But the majority of the focus was on a single computing device:

Antony Rowstron, Dushyanth Narayanan, Austin Donnelly, Greg O'Shea, and Andrew Douglas, "Nobody ever got fired for using Hadoop on a cluster", in HotCDP 2012 - 1st International Workshop on Hot Topics in Cloud Data Processing, 2012.

Page 9: Color naming 65,274,705,768 pixels

Processing

Won't describe the specifics of the color naming algorithm (throw produce if you have it), but generally:

Input is a single RGB pixel and output is a single color term

Size of the vocabulary, or number of color terms, is a parameter

Relative range of chroma values corresponding to achromatic values is also a parameter

Also currently testing a completely revised model

Finally, in the Future Directions section, note that the best option for formal publication is to make use of currently available open source machine learning toolboxes.
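The talk deliberately does not describe the algorithm, so the following is only a stand-in with the same interface: one RGB pixel in, one color term out, with vocabulary size and an achromatic chroma range as parameters. The term list, the RGB anchors, the crude chroma measure (max minus min channel) and the nearest-anchor rule are all assumptions made for illustration.

```python
# Hypothetical vocabulary: (term, representative RGB triple).
# Both the terms and the anchors are illustrative assumptions.
CHROMATIC = [
    ("red", (255, 0, 0)),
    ("green", (0, 128, 0)),
    ("blue", (0, 0, 255)),
    ("yellow", (255, 255, 0)),
    ("orange", (255, 165, 0)),
    ("purple", (128, 0, 128)),
    ("pink", (255, 192, 203)),
    ("brown", (139, 69, 19)),
]
ACHROMATIC = [("black", 0), ("gray", 128), ("white", 255)]


def name_pixel(rgb, vocabulary_size=8, achromatic_range=0.1):
    """Map one RGB pixel (0-255 per channel) to a single color term.

    vocabulary_size: number of chromatic terms used (first N of CHROMATIC).
    achromatic_range: pixels whose crude chroma, (max - min) / 255, is at
        or below this fraction receive an achromatic term.
    """
    r, g, b = rgb
    chroma = (max(rgb) - min(rgb)) / 255.0
    if chroma <= achromatic_range or vocabulary_size <= 0:
        # Pick black, gray or white by mean channel level.
        level = (r + g + b) / 3.0
        return min(ACHROMATIC, key=lambda t: abs(t[1] - level))[0]

    def dist2(anchor):
        return sum((p - a) ** 2 for p, a in zip(rgb, anchor))

    # Nearest anchor by squared Euclidean distance in RGB.
    terms = CHROMATIC[:vocabulary_size]
    return min(terms, key=lambda t: dist2(t[1]))[0]


if __name__ == "__main__":
    for pixel in [(250, 10, 10), (130, 128, 126), (255, 165, 0)]:
        print(pixel, name_pixel(pixel))
```

Shrinking `vocabulary_size` forces pixels onto coarser terms, and widening `achromatic_range` moves more low-chroma pixels into black, gray or white.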

Page 10: Color naming 65,274,705,768 pixels

Results: Aspect Ratios

Wide range of image types

Most basic test of processing scripts

Page 11: Color naming 65,274,705,768 pixels

Results: Median

Additional test and visualization of basic color properties of images

The data set was large enough that it was worthwhile to write a custom HTML5 2D canvas renderer
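A minimal sketch of a per-image median color follows. The slides do not say how the median was defined, so taking it independently per channel is an assumption, as is the function name.

```python
from statistics import median


def median_color(pixels):
    """Per-channel median of an iterable of (r, g, b) tuples."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    reds, greens, blues = zip(*pixels)
    return (median(reds), median(greens), median(blues))


if __name__ == "__main__":
    image = [(0, 0, 0), (10, 20, 30), (255, 255, 255)]
    print(median_color(image))
```

Each image then contributes a single point to the plot, which is what makes a collection of this size practical to visualize at all.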

Page 12: Color naming 65,274,705,768 pixels

Results: Median

So much data that, as noted by Shneiderman, a density plot, which "uses a spatial substrate organizing principle, but shows concentrations of markers", is maybe a better idea

Data, alpha=0.05
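Drawing markers with a low alpha is a simple way to approximate a density plot. Assuming the label means each marker was drawn at alpha 0.05 with standard source-over compositing (an assumption; the slide gives only the value), the opacity at a spot grows with the number of overlapping markers as sketched here.

```python
def stacked_opacity(n_markers, alpha=0.05):
    """Opacity after drawing n markers of the same alpha at one spot.

    Assumes source-over compositing, where each marker lets (1 - alpha)
    of whatever is beneath it show through.
    """
    return 1.0 - (1.0 - alpha) ** n_markers


def markers_for_opacity(target, alpha=0.05):
    """Smallest marker count whose stacked opacity reaches `target`."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while stacked_opacity(n, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(n, round(stacked_opacity(n), 3))
```

At alpha 0.05 a single image is barely visible, while regions where many images share a similar value saturate toward fully opaque.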

Page 13: Color naming 65,274,705,768 pixels

Results: Max

Max of R+G+B for the images

Final test of basic scripting code

Page 14: Color naming 65,274,705,768 pixels

Results

Color terms across all images

Majority of pixels are achromatic

Top chromatic colors are arguably natural tones

Higher chroma terms relatively infrequent
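A tally like the one summarized on this slide can be sketched with a counter over per-pixel terms. The achromatic term set and function names below are assumptions; the talk does not list its vocabulary.

```python
from collections import Counter

ACHROMATIC_TERMS = {"black", "gray", "white"}  # assumed achromatic vocabulary


def term_shares(terms):
    """Return (term, fraction) pairs sorted from most to least frequent."""
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(term, n / total) for term, n in counts.most_common()]


def achromatic_share(terms):
    """Fraction of named pixels whose term is achromatic."""
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    achromatic = sum(n for t, n in counts.items() if t in ACHROMATIC_TERMS)
    return achromatic / total


if __name__ == "__main__":
    sample = ["gray"] * 5 + ["white"] * 2 + ["brown"] * 2 + ["red"]
    print(term_shares(sample))
    print(achromatic_share(sample))
```

For a collection of this size the counters would be accumulated per image and merged, rather than holding every term in memory at once.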

Page 15: Color naming 65,274,705,768 pixels

Results

Color terms per image

The peak at 5 terms is all achromatic terms or images

Gradual then rapid usage of chromatic terms
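One reading of "color terms per image" is the number of distinct terms that appear in each image, histogrammed over the collection. The sketch below assumes that reading; the data layout and function names are hypothetical.

```python
from collections import Counter


def terms_per_image(named_images):
    """Map image id -> number of distinct color terms used in that image.

    `named_images` maps an image id to an iterable of per-pixel terms.
    """
    return {image_id: len(set(terms)) for image_id, terms in named_images.items()}


def histogram_of_term_counts(named_images):
    """Map distinct-term count -> number of images with that count."""
    return Counter(terms_per_image(named_images).values())


if __name__ == "__main__":
    demo = {
        "a.jpg": ["black", "gray", "gray"],
        "b.jpg": ["black", "white"],
        "c.jpg": ["red", "blue", "green", "red"],
    }
    print(histogram_of_term_counts(demo))
```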

Page 16: Color naming 65,274,705,768 pixels

Results

Sudden drop off at 30 is a model failure

Term added to vocabulary based on previous limited optimization

Page 17: Color naming 65,274,705,768 pixels

Current Work

Repeated entire process adjusting the model parameters

Processing to fill SQL databases

Query the database to validate all of the steps and explore specific

Page 18: Color naming 65,274,705,768 pixels

Current Work

SELECT * FROM cntable ORDER BY skyblue DESC LIMIT 40
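The query above can be tried against a toy table. In this sketch only the table name `cntable` and the column `skyblue` come from the slide; the use of SQLite, the `image` and `gray` columns, and the helper names are assumptions made for illustration.

```python
import sqlite3


def build_demo_table(rows):
    """Create an in-memory `cntable` with one row per image.

    Each row is (image, gray, skyblue): a file name followed by
    per-term pixel counts. Only `skyblue` is named in the talk.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cntable (image TEXT PRIMARY KEY, gray INTEGER, skyblue INTEGER)"
    )
    conn.executemany("INSERT INTO cntable VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def top_skyblue(conn, limit=40):
    """Run the slide's query: images with the most sky-blue pixels first."""
    cur = conn.execute(
        "SELECT * FROM cntable ORDER BY skyblue DESC LIMIT ?", (limit,)
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_table([("a.jpg", 10, 5), ("b.jpg", 3, 90), ("c.jpg", 7, 40)])
    print(top_skyblue(conn, limit=2))
```

With one column per color term, swapping `skyblue` for another term gives the images dominated by that term, which is a quick visual check on the namer.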

Page 19: Color naming 65,274,705,768 pixels

Future Directions

Image collections as "pixel corpora" for algorithm design, testing and optimization

Similar to the role that written and spoken corpora fill for NLP and corpus linguistics

Useful to formalize for citation and repeatability

Additional analysis features

Testing with more public domain machine learning algorithms for repeatability

Page 20: Color naming 65,274,705,768 pixels

Summary

Algorithm optimization, like machine color naming, with 200,000 images is different than with 200.

Based on Wikipedia, the majority of visual content, or pixels, is achromatic

Based on Wikipedia, higher chroma named pixels are less frequent

Based on Wikipedia, there is a gradual then sudden transition in color term usage
