Searching Images by Color Using Solr


Upload: chris-becker

Post on 02-Jul-2015


DESCRIPTION

Slides from "Searching 35 Million Images by Color Using Solr" presented by Chris Becker at Solr Lucene Revolution 2014 in Washington D.C.

TRANSCRIPT

Page 1: Searching Images by Color Using Solr
Page 2: Searching Images by Color Using Solr

Searching Images by Color

Chris Becker

Search Engineering @ Shutterstock

Page 3: Searching Images by Color Using Solr

What is Shutterstock?

• Shutterstock sells stock images, videos & music.

• Crowdsourced from artists around the world

• Shutterstock reviews and indexes them for search

• Customers buy a subscription and download them

Page 4: Searching Images by Color Using Solr

Why search by color?

Page 5: Searching Images by Color Using Solr

Stock photography on the internet…

images from www.shutterstock.com

Page 6: Searching Images by Color Using Solr

Stock photography on the internet…

images from www.shutterstock.com

Page 7: Searching Images by Color Using Solr

Color is one of many visual attributes that you can use to create an engaging image search experience

Page 9: Searching Images by Color Using Solr

Diving into Color Data

Page 10: Searching Images by Color Using Solr

Color Spaces

• RGB

• HSL

• Lab

• LCH

images from www.wikipedia.org

Page 11: Searching Images by Color Using Solr

Calculating Distances Between Colors

• Euclidean distance works reasonably well in any color space

distRGB = sqrt((r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2)

distHSL = sqrt((h1-h2)^2 + (s1-s2)^2 + (l1-l2)^2)

distLCH = sqrt((L1-L2)^2 + (C1-C2)^2 + (H1-H2)^2)

distLAB = sqrt((L1-L2)^2 + (a1-a2)^2 + (b1-b2)^2)

• More sophisticated equations that better account for human

perception can be found at

http://en.wikipedia.org/wiki/Color_difference
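The distance formulas above can be sketched in Python using only the standard library. The function names are illustrative. One assumption goes beyond the slide: the HSL version treats hue as circular (hues near 0 and near 1 are neighbors), whereas the slide's formula subtracts hues directly.

```python
import colorsys
import math


def dist_rgb(c1, c2):
    """Euclidean distance between two (r, g, b) tuples with 0-255 channels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def dist_hsl(c1, c2):
    """Euclidean distance in HSL, with every component scaled to 0-1.

    Assumption (not from the slides): hue is treated as circular,
    so hues of 0.95 and 0.05 are 0.1 apart rather than 0.9.
    """
    # colorsys returns (h, l, s), each in the range 0-1
    h1, l1, s1 = colorsys.rgb_to_hls(*(v / 255 for v in c1))
    h2, l2, s2 = colorsys.rgb_to_hls(*(v / 255 for v in c2))
    dh = abs(h1 - h2)
    dh = min(dh, 1 - dh)
    return math.sqrt(dh ** 2 + (s1 - s2) ** 2 + (l1 - l2) ** 2)


red, orange, blue = (255, 0, 0), (255, 153, 0), (0, 0, 255)
print(dist_rgb(red, orange), dist_rgb(red, blue))
print(dist_hsl(red, orange), dist_hsl(red, blue))
```

In both spaces red comes out closer to orange than to blue, which is the property a color search needs from its distance function.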

Page 12: Searching Images by Color Using Solr

Images are just numbers

[

[[054,087,058], [054,116,206], [017,226,194], [234,203,215], [188,205,000], [229,156,182]],

[[214,238,109], [064,190,104], [191,024,161], [104,071,036], [222,081,005], [204,012,113]],

[[197,100,189], [159,204,024], [228,214,054], [250,098,125], [050,144,093], [021,122,101]],

[[255,146,010], [115,156,002], [174,023,137], [161,141,077], [154,189,005], [242,170,074]],

[[113,146,064], [196,057,200], [123,203,160], [066,090,234], [200,186,103], [099,074,037]],

[[194,022,018], [226,045,008], [123,023,087], [171,029,021], [040,001,143], [255,083,194]],

[[115,186,246], [025,064,109], [029,071,001], [140,031,002], [248,170,244], [134,112,252]],

[[116,179,059], [217,205,159], [157,060,251], [151,205,058], [036,214,075], [107,103,130]],

[[052,003,227], [184,037,078], [161,155,181], [051,070,186], [082,235,108], [129,233,211]],

[[047,212,209], [250,236,085], [038,128,148], [115,171,113], [186,092,227], [198,130,024]],

[[225,210,064], [123,049,199], [173,207,164], [161,069,220], [002,228,184], [170,248,075]],

[[234,157,201], [168,027,113], [117,080,236], [168,131,247], [028,177,060], [187,147,084]],

[[184,166,096], [107,117,037], [154,208,093], [237,090,188], [007,076,086], [224,239,210]],

[[105,230,058], [002,122,240], [036,151,107], [101,023,149], [048,010,225], [109,102,195]],

[[050,019,169], [219,235,027], [061,064,133], [218,221,113], [009,032,125], [109,151,137]],

[[010,037,189], [216,010,101], [000,037,084], [166,225,127], [203,067,214], [110,020,245]],

[[180,147,130], [045,251,177], [127,175,215], [237,161,084], [208,027,218], [244,194,034]],

[[089,235,226], [106,219,220], [010,040,006], [094,138,058], [148,081,166], [249,216,177]],

[[121,110,034], [007,232,255], [214,052,035], [086,100,020], [191,064,105], [129,254,207]],

]

Page 13: Searching Images by Color Using Solr

Any operation you can do on a set of numbers, you can do on an image

• getting histograms

• computing median values

• standard deviations / variance

• other statistics
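As a small illustration of treating an image as numbers, the sketch below takes the first row of the sample pixel matrix shown earlier and computes a histogram, a median, and a standard deviation for the red channel. It uses only the standard library; a real image would supply far more pixels.

```python
import statistics
from collections import Counter

# First row of pixels from the sample matrix, as (r, g, b) tuples
pixels = [(54, 87, 58), (54, 116, 206), (17, 226, 194),
          (234, 203, 215), (188, 205, 0), (229, 156, 182)]

# Treat one channel as a plain list of numbers
red = [p[0] for p in pixels]

red_hist = Counter(red)               # histogram: value -> pixel count
red_median = statistics.median(red)   # middle value of the channel
red_stdev = statistics.pstdev(red)    # population standard deviation

print(red_hist.most_common(1), red_median, red_stdev)
```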

Page 14: Searching Images by Color Using Solr
Page 15: Searching Images by Color Using Solr

Extracting Color Data

Page 16: Searching Images by Color Using Solr

Tools & Libraries

• ImageMagick

• Python Imaging Library (PIL)

• ImageJ

Page 17: Searching Images by Color Using Solr

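# Python example: build a color histogram from an image
from pprint import pprint

from PIL import Image

# Convert to RGB so every color returned below is an (r, g, b) tuple
image = Image.open('./samplephoto.jpg').convert('RGB')
width, height = image.size

# getcolors() returns (count, color) pairs; using width*height as the
# maxcolors limit guarantees it never returns None
colors = image.getcolors(width * height)

# Map each hex color string to its pixel count
hist = {}
for count, (r, g, b) in colors:
    hist['%02x%02x%02x' % (r, g, b)] = count

pprint(hist)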

Page 18: Searching Images by Color Using Solr

Indexing & Searching

in Solr

Page 19: Searching Images by Color Using Solr

Indexing color histograms

color_txt = "cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 cfebc2
95bf40 95bf40 95bf40 95bf40 95bf40 95bf40
2e6b2e 2e6b2e 2e6b2e
ff0000 …"

• index colors just like you would index text

• amount of color = frequency of the term
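One way to produce a field value like the one above is to repeat each color in proportion to its share of the image. This is a minimal sketch, not the talk's actual indexing code: the function name and the total_terms normalization are assumptions made for illustration.

```python
def histogram_to_text(hist, total_terms=100):
    """Turn {hex_color: pixel_count} into a whitespace-separated term string.

    Each color is repeated in proportion to its share of the image, so the
    term frequency of a color tracks how much of the image it covers.
    total_terms is an illustrative normalization target (an assumption).
    """
    total_pixels = sum(hist.values())
    terms = []
    # Most common colors first, purely for readability of the output
    for color, count in sorted(hist.items(), key=lambda kv: -kv[1]):
        repeats = round(total_terms * count / total_pixels)
        terms.extend([color] * repeats)
    return ' '.join(terms)


# Hypothetical histogram: 50% / 30% / 15% / 5% of the pixels
hist = {'cfebc2': 5000, '95bf40': 3000, '2e6b2e': 1500, 'ff0000': 500}
color_txt = histogram_to_text(hist, total_terms=20)
print(color_txt)
```

Normalizing to a fixed number of terms keeps large and small images comparable, since raw pixel counts would otherwise favor high-resolution images.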

Page 20: Searching Images by Color Using Solr

Solr Schema & Queries

• Solr's default ranking can be used effectively:

/solr/select?q=ff0000 e2c2d2&qf=color&defType=edismax…

• Or use term frequencies directly in a sort function:

sort=product(tf(color,"ff0000"),tf(color,"e2c2d2")) desc

<field name="color" type="text_ws" …>
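The two query styles above can be assembled in Python with the standard library's urlencode, which takes care of escaping spaces and quotes. This sketch only builds the query strings; the host and core are omitted, and sending the request is left out.

```python
from urllib.parse import urlencode

colors = ['ff0000', 'e2c2d2']

# Option 1: let edismax rank by its default relevance score
ranked = urlencode({'q': ' '.join(colors), 'qf': 'color', 'defType': 'edismax'})

# Option 2: match the same colors, but sort by the product of term frequencies
tf_terms = ','.join('tf(color,"%s")' % c for c in colors)
sorted_by_tf = urlencode({
    'q': ' '.join(colors), 'qf': 'color', 'defType': 'edismax',
    'sort': 'product(%s) desc' % tf_terms,
})

print('/solr/select?' + ranked)
print('/solr/select?' + sorted_by_tf)
```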

Page 21: Searching Images by Color Using Solr

Indexing color statistics

Represent aggregate statistics of each image

lightness:
    median: 2
    standard dev: 1
    largest bin: 0
    largest bin size: 50

saturation:
    median: 0
    standard dev: 0
    largest bin: 0
    largest bin size: 100
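Statistics of this shape could be computed along the following lines. The slides do not say how the bins are defined, so the bin count, the integer rounding, and the percentage scale for the largest bin size are all assumptions of this sketch.

```python
import statistics
from collections import Counter


def bin_stats(values, bins=8):
    """Summarize 0-1 values (e.g. per-pixel lightness) as coarse bin statistics.

    The bin count and the percentage scale are illustrative assumptions.
    """
    # Map each value to an integer bin index in the range 0..bins-1
    binned = [min(int(v * bins), bins - 1) for v in values]
    largest_bin, size = Counter(binned).most_common(1)[0]
    return {
        'median': statistics.median_low(binned),
        'standard_dev': round(statistics.pstdev(binned)),
        'largest_bin': largest_bin,
        'largest_bin_size': round(100 * size / len(binned)),  # percent of pixels
    }


# A fully unsaturated (grayscale) image: every pixel has saturation 0
print(bin_stats([0.0] * 200))
```

For the grayscale input every statistic is 0 except the largest bin size, which is 100, the same pattern as the saturation values on the slide.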

Page 22: Searching Images by Color Using Solr

Solr Fields & Queries

• Sort by the distance between the input param and the median value for each image

/solr/select?q=*&sort=abs(sub($query,hue_median)) asc

<field name="hue_median" type="int" …>
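To see what that sort does, the ordering can be emulated in plain Python. The image ids and hue values below are made up for illustration, and hue is assumed to be stored in degrees.

```python
# Hypothetical indexed documents: image id -> hue_median (degrees, 0-359)
images = {'img_a': 10, 'img_b': 200, 'img_c': 35, 'img_d': 350}

query_hue = 30

# Same ordering as sort=abs(sub($query,hue_median)) asc
ranked = sorted(images, key=lambda doc: abs(query_hue - images[doc]))
print(ranked)
```

Note that img_d ranks last even though a hue of 350 is only 40 degrees from 30 on the color wheel. A plain absolute difference does not wrap around, which matters most for queries near red.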

Page 23: Searching Images by Color Using Solr

Ranking & Relevance

Page 24: Searching Images by Color Using Solr

How much of the image has the color [color swatch]?

image from www.shutterstock.com

Page 25: Searching Images by Color Using Solr

Is this relevant if I search for [color swatch]?

image from www.shutterstock.com

Page 26: Searching Images by Color Using Solr

Which image is more relevant if I search for [color swatch]?

image from www.shutterstock.com

Page 27: Searching Images by Color Using Solr

Is this relevant if I search for [color swatch]?

image from www.shutterstock.com

Page 28: Searching Images by Color Using Solr

How do we account for these factors?

Page 29: Searching Images by Color Using Solr

How much of the image contains the

selected color?

• Score each color by the number of pixels

sort=tf(color,"cfebc2") desc

Page 30: Searching Images by Color Using Solr

Balance Precision and Recall

• Reduce your color space enough to balance:

• color accuracy

• index size

• query complexity

• result counts

• You only need 100-200 colors for a good UX
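One simple way to reduce the color space is to round each RGB channel to a handful of evenly spaced levels. With 5 levels per channel the palette has 5^3 = 125 colors, which falls in the 100-200 range mentioned above. Uniform rounding is an illustrative choice here, not necessarily the method used in the talk.

```python
def quantize(rgb, levels=5):
    """Snap an (r, g, b) color to a palette of levels**3 colors (125 here).

    Uniform per-channel rounding is an illustrative assumption.
    """
    step = 255 / (levels - 1)
    return tuple(int(round(round(v / step) * step)) for v in rgb)


def to_hex(rgb):
    """Format an (r, g, b) tuple as a six-digit hex term for indexing."""
    return '%02x%02x%02x' % rgb


print(to_hex(quantize((207, 235, 194))))
```

Fewer levels means more images match each query color (higher recall) and a smaller index, at the cost of merging colors a user might consider distinct.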

Page 31: Searching Images by Color Using Solr

Weighing Multiple Colors Together

• If you search for 2 or more colors, the top result should have

the most even distribution of those colors

• simple option: sort=product(tf(color,"ff9900"),tf(color,"2280e2")) desc

• more complex: compute the standard deviation or variance

of the term frequencies of matching color values for each

image, and sort the results with the lowest variance first.
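Both options can be compared on a toy data set. The image names and term frequencies below are hypothetical, and the ranking is done in plain Python rather than inside Solr.

```python
import statistics

# Hypothetical term frequencies per image for two query colors
images = {
    'even':   {'ff9900': 50, '2280e2': 50},
    'skewed': {'ff9900': 95, '2280e2': 5},
    'sparse': {'ff9900': 10, '2280e2': 10},
}
query = ['ff9900', '2280e2']


def tf_product(tfs):
    """Simple option: product of term frequencies (higher is better)."""
    result = 1
    for color in query:
        result *= tfs.get(color, 0)
    return result


def tf_variance(tfs):
    """More complex option: variance of term frequencies (lower is better)."""
    return statistics.pvariance([tfs.get(color, 0) for color in query])


by_product = sorted(images, key=lambda name: tf_product(images[name]), reverse=True)
by_variance = sorted(images, key=lambda name: tf_variance(images[name]))
print(by_product, by_variance)
```

In this toy data the product ranks the evenly split image first and the sparse one last, while the variance cannot tell 'even' from 'sparse' (both have zero variance). A variance-based sort would therefore likely need to be combined with a measure of how much of the image the colors cover.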

Page 32: Searching Images by Color Using Solr

Weighing Similar & Different Colors

• The score for one color should reflect all the colors in the image.

• At indexing time, increase the score based on similar colors;

decrease it based on differing colors.
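The slides do not give a formula for this adjustment, so the following is only one possible sketch. It assumes that a neighboring color within a chosen RGB distance (radius, a made-up parameter) raises a color's score, and that a more distant color lowers it, each in proportion to the neighbor's share of the image. It needs Python 3.8 or later for math.dist.

```python
import math


def hex_to_rgb(color):
    """Convert a six-digit hex string such as 'ff0000' to an (r, g, b) tuple."""
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def adjusted_scores(hist, radius=120.0):
    """Adjust each color's pixel share using every other color in the image.

    Assumption: a neighbor closer than `radius` in RGB space adds to the
    score and a farther one subtracts, scaled by the neighbor's share.
    """
    total = sum(hist.values())
    scores = {}
    for color, count in hist.items():
        score = count / total
        for other, other_count in hist.items():
            if other == color:
                continue
            d = math.dist(hex_to_rgb(color), hex_to_rgb(other))
            # +1 for an identical color, 0 at the radius, floored at -1 beyond
            weight = max(1 - d / radius, -1.0)
            score += weight * other_count / total
        scores[color] = max(score, 0.0)
    return scores


# Hypothetical image: two similar reds plus some blue
hist = {'ff0000': 40, 'ee1111': 40, '0000ff': 20}
print(adjusted_scores(hist))
```

Here the two reds reinforce each other and end up scoring above their raw 40% share, while the blue, surrounded by dissimilar colors, is pushed down to zero.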

Page 33: Searching Images by Color Using Solr

Conclusion

Page 34: Searching Images by Color Using Solr

Conclusion

• Steps for building color search in Solr:

• Extract colors using a tool like the Python Imaging Library

• Score colors based on the number of pixels

• Adjust scores based on similar / different colors

• Index the colors into Solr as text

• In your query, sort by the term frequency values for each

color

Page 35: Searching Images by Color Using Solr

One more demo…