Labeling Images for FUN!!!
Yan Cao, Chris Hinrichs
How do you improve learning systems?
• Get more processing power (faster computers, more memory, more parallelism).
• Find a more sophisticated algorithm.
• Get lots and lots of quality data.
Why Manually Label Images?
• It is a job that is easy for humans but challenging for computer vision
• Why? To acquire ground truth:
– Segmentation, i.e. extracting objects from an image, is hard
– Objects appear in multiple poses and views
– Depth of objects: which one is in front when they overlap
– Relationships between objects and their parts, e.g. face and eyes, car and wheels
General Idea for Making Computers Do Labeling – Supervised Learning
• Enough training data: images with manually pre-assigned labels
• Classifiers trained on the training data and used to label query images
• If we want to segment the query images, the training images must also include the boundaries of the objects they contain
Who Is Willing to Volunteer?
• Manually labeling numerous images is a tedious job
• What motivates people to do something:
– Money! You know you will be paid
– Fun! You enjoy doing it
– Gaining respect from others
ESP – an image labeling game
• Rules
– The server randomly pairs you with a partner (who could be a “bot”)
– Both partners see the same image on their screens
– When the labels typed by the partners match, they score points and move on to the next image
– Some taboo words may not be used as labels for the image
ESP – an image labeling game
• Rules (continued)
– Partners try to agree on as many images as they can in 2.5 minutes
– Partners can pass on an image when both of them click the “Pass” button
– The more images the partners agree on, the higher their final score
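The matching rule can be modeled in a few lines of Python. This is a hypothetical sketch, not ESP's actual code. It assumes each guess arrives as a `(player, word)` event in typing order and that guesses are compared case-insensitively. A taboo guess is simply never counted.

```python
def first_agreement(guesses, taboo):
    """Return the first word typed by both partners, or None.

    guesses: (player, word) pairs in the order they were typed, where player
             is "A" or "B" (a hypothetical encoding of the two partners).
    taboo:   words that may not be used as labels for this image.
    """
    taboo = {w.lower() for w in taboo}
    seen = {"A": set(), "B": set()}
    for player, word in guesses:
        word = word.lower()
        if word in taboo:
            continue  # taboo guesses are rejected and never count
        seen[player].add(word)
        other = "B" if player == "A" else "A"
        if word in seen[other]:
            return word  # both partners have now typed this word
    return None


round_guesses = [("A", "car"), ("B", "road"), ("B", "Red"), ("A", "red")]
print(first_agreement(round_guesses, taboo={"car"}))  # prints "red"
```

Each partner's guesses are kept in a set, so it does not matter who types the shared word first. The round ends on whichever guess completes the pair.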
Taboo Words
• Taboo words come from the game itself
– The first time an image is shown in ESP, it has no taboo words
– If the image is used again, it gets taboo words taken from earlier agreements
• At most 6 taboo words per image
• Taboo words ensure that each image gets many different labels
Good Label Threshold
• The number of pairs that must agree on a label before it is added to the image's list of taboo words
• If threshold = 1, a label becomes a taboo word as soon as one pair of partners agrees on it
• If threshold = 10, a label becomes a taboo word once 10 pairs of partners have agreed on it for that image
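The taboo-word bookkeeping from the last two slides can be sketched as a small per-image record. Only the threshold and the cap of 6 come from the slides. The class name `ImageLabels` and its fields are hypothetical.

```python
from collections import Counter


class ImageLabels:
    """Hypothetical per-image record of agreed labels and taboo words."""

    MAX_TABOO = 6  # "at most 6 taboo words for one image"

    def __init__(self, threshold=1):
        self.threshold = threshold   # the "good label threshold"
        self.agreements = Counter()  # label -> number of pairs that agreed on it
        self.taboo = []              # taboo words, in the order they were added

    def record_agreement(self, label):
        """Count one more pair agreeing on label; make it taboo if warranted."""
        self.agreements[label] += 1
        if (self.agreements[label] >= self.threshold
                and label not in self.taboo
                and len(self.taboo) < self.MAX_TABOO):
            self.taboo.append(label)


img = ImageLabels(threshold=10)
for _ in range(9):
    img.record_agreement("dog")
print(img.taboo)  # [] : only 9 pairs have agreed so far
img.record_agreement("dog")
print(img.taboo)  # ['dog']
```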
Image source
• Randomly selected from the Web, using a small number of filters
• From “Random Bounce Me”, which returns random images from Google's database
• Qualifications of images:
– Large enough (>20 pixels on either dimension)
– Aspect ratio between 1/4.5 and 4.5
– Not blank or a single color
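The three filters translate directly into a predicate. This sketch makes three assumptions:

- The image is given as its width, height, and a flat list of hashable pixel values.
- ">20 pixels on either dimension" means both sides must exceed 20 pixels.
- `qualifies` is a hypothetical name.

```python
def qualifies(width, height, pixels, min_side=20, max_aspect=4.5):
    """Apply the three image filters from the slide (hypothetical helper).

    pixels: iterable of hashable pixel values, e.g. (r, g, b) tuples.
    Assumes both width and height must exceed min_side.
    """
    if width <= min_side or height <= min_side:
        return False  # too small
    aspect = width / height
    if not (1 / max_aspect < aspect < max_aspect):
        return False  # too long and thin
    return len(set(pixels)) > 1  # False for blank / single-color images


print(qualifies(640, 480, [(0, 0, 0), (255, 255, 255)]))  # True
```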
Evaluation
• Are the labels relevant to the images?
– Search within the labeled images in the ESP database
• Are the players motivated by the game?
– Compute statistics on the user logs
• How fast is the labeling rate?
– Count how many images are labeled within a given time period
Accuracy of Labels
• 20 images were randomly selected from ESP
• 15 participants were asked to label each of the 20 images with 6 labels, without any information about the taboo words
• When the participants' labels were compared with the labels obtained from the game, 83% of the labels matched
• For every image, the 3 most common words entered by the participants were among the ESP labels
Example: some images labeled with “car”
Is it fun?
• Over 80% of users played the game on multiple days
• In 4 months, 33 players played more than 50 hours on the game
Labeling Rate
• If there were 5,000 users online 24 hours a day (easy to reach for online games), all 425,000,000 images in Google's database would be labeled within a month!
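It is worth checking what rate that claim implies. The arithmetic below assumes a 30-day month, that the 5,000 users form 2,500 pairs, and that each image only needs to be labeled once:

```python
IMAGES = 425_000_000        # images in Google's database, per the slide
USERS = 5_000               # players online around the clock
PAIRS = USERS // 2          # ESP plays in pairs
MINUTES = 30 * 24 * 60      # assumption: a 30-day month

per_pair_per_minute = IMAGES / (PAIRS * MINUTES)
print(f"{per_pair_per_minute:.2f} images per pair per minute")  # 3.94 images per pair per minute
```

That works out to about 3.9 images per pair per minute, or roughly 10 per 2.5-minute game.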
More than Labeling
• What if the players also told us more about the images, such as where the objects are located?
• Peekaboom – a game that is fun and, at the same time, collects information beyond labels
Peekaboom
Rules of Peekaboom
• The server randomly pairs up partners
• One partner sees the whole image and its label (boom side)
• The other sees a blank screen with an input box at the bottom (peek side)
• The boom partner clicks on the image, and each click reveals an area with a 20-pixel radius to the peek partner
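A click-by-click reveal can be modeled as a union of discs. In this sketch, `revealed_pixels` is a hypothetical helper. A pixel counts as revealed if it lies within 20 pixels of any click, and the boundary is treated as inclusive (an assumption).

```python
def revealed_pixels(width, height, clicks, radius=20):
    """Pixels visible to the peek partner after the boom partner's clicks.

    clicks: (x, y) click positions in image coordinates. A pixel is revealed
    if it lies within `radius` pixels of any click (boundary inclusive).
    """
    r2 = radius * radius
    shown = set()
    for cx, cy in clicks:
        # Only scan the square around the click, clipped to the image bounds.
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
                    shown.add((x, y))
    return shown


shown = revealed_pixels(200, 150, clicks=[(50, 50)])
print(len(shown))
```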
Rules of Peekaboom
• Based on the revealed parts, the peek partner types guesses until one matches the label shown on the boom side
• The boom partner can give hints to help the peek partner guess the right label
– Ping the “key” parts of the image
– Indicate how the word is related to the image
Hints given by the boom partner
Rules of Peekaboom
• The partners alternate between the peek and boom roles
• For images with a hard-to-guess label, the partners can choose to pass
• The more images they correctly label in 2.5 minutes, the higher their score
• To make the game more fun, bonus rounds are added and users are ranked by their scores
Information collected by Peekaboom
• How the word relates to the image (from hints)
• The pixels necessary to guess the word
• The pixels inside the object, animal, or person (from pings)
• The most salient aspects of the objects in the image (from the sequence of clicks)
• Elimination of poor image-word pairs (from passing)
Applications based on the information
• Improving image-search results
– Images in which the word refers to a higher fraction of the total pixels should be ranked higher
• Bounding boxes of objects
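Both applications can be sketched from the boom partner's clicks alone. This is a simplification with two stated assumptions:

- The bounding box is taken around all revealed discs from a single game, whereas Peekaboom can pool many games.
- The box area stands in for the number of pixels the word refers to.

All function names are hypothetical.

```python
def click_bbox(width, height, clicks, radius=20):
    """Inclusive pixel box (x0, y0, x1, y1) around all revealed discs."""
    xs = [x for x, _ in clicks]
    ys = [y for _, y in clicks]
    return (max(0, min(xs) - radius), max(0, min(ys) - radius),
            min(width - 1, max(xs) + radius), min(height - 1, max(ys) + radius))


def object_fraction(width, height, clicks, radius=20):
    """Fraction of the image covered by the object's bounding box."""
    x0, y0, x1, y1 = click_bbox(width, height, clicks, radius)
    return (x1 - x0 + 1) * (y1 - y0 + 1) / (width * height)


def rerank(results):
    """results: (image_id, width, height, clicks) tuples for one search word.
    Images where the word covers more of the picture come first."""
    ranked = sorted(results,
                    key=lambda r: object_fraction(r[1], r[2], r[3]),
                    reverse=True)
    return [r[0] for r in ranked]


print(rerank([("small", 400, 400, [(200, 200)]), ("big", 100, 100, [(50, 50)])]))
```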
Applications based on the information
• Using Ping data for pointing
Evaluation
• Do people have fun?
– More than 90% of people played on multiple days
– Players on the “Top Scores” list all played over 53 hours
• Accuracy of collected data
– Bounding boxes (participants vs. Peekaboom): overlap of 0.754
– Pings (participants vs. Peekaboom): 100% accuracy!
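The slide does not say how the overlap between two bounding boxes is measured. A common definition, assumed here, is the area of the intersection divided by the area of the union. Under this definition, identical boxes score 1.0 and disjoint boxes score 0.0.

```python
def box_overlap(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1),
    in continuous coordinates with x1 > x0 and y1 > y0."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


print(box_overlap((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150, about 0.333
```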
LabelMe
Russell et al., MIT CSAIL
Improving on image captions
• Many image DBs are available with a caption for every image, saying what is in it.
• LabelMe allows users to add their own bounding boxes around objects and label them directly.
• LabelMe’s authors claim their pictures are taken from a wide variety of places. (They seem to be mostly street scenes and other travel photos, plus a few house interiors.)
How do you participate?
• Just go to the URL: http://labelme.csail.mit.edu/
• You are given an image, which may or may not already have boundaries drawn on it. If you see an object you can identify, draw a boundary around it; when you close the polygon, the tool asks for a label.
• There are no rules on how to choose the labels or on how closely to draw the boundaries. The authors trust your judgment, but more importantly, this reflects people's different ideas.
How good are the bounding boxes?
• It varies.
More general results:
25th, 50th, and 75th percentiles of polygon point count for some common object types.
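Percentiles like these are easy to recompute from polygon annotations. The sketch assumes annotations are held in memory as a dict from label to a list of polygons, each polygon being a list of control points. This format is hypothetical: LabelMe's own annotation files would have to be parsed into it first. The code uses Python's `statistics.quantiles` with its default "exclusive" method.

```python
from statistics import quantiles


def polygon_point_percentiles(polygons_by_label):
    """Return {label: [p25, p50, p75]} of polygon control-point counts."""
    stats = {}
    for label, polygons in polygons_by_label.items():
        counts = [len(poly) for poly in polygons]
        if len(counts) >= 2:  # older Pythons need at least two data points
            stats[label] = quantiles(counts, n=4)  # default method="exclusive"
    return stats


# Hypothetical annotations: seven "car" polygons with 1..7 control points.
demo = {"car": [[(0, 0)] * k for k in range(1, 8)]}
print(polygon_point_percentiles(demo))  # {'car': [2.0, 4.0, 6.0]}
```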
We can learn something about the way people take pictures from the distribution of where objects are located. Generally, people are standing when they take pictures.
What do the average objects look like?
Tying it in with WordNet
• Some words have synonyms or closely related terms: man/woman, person, pedestrian; car, automobile, cab, SUV
• Look up each label in WordNet. The authors report that 93% of labels found a matching WordNet entry, though some manual word sense disambiguation was needed.
• This allows queries to match at various levels of specificity in the WordNet tree, including more general queries.
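The effect of the WordNet mapping can be shown with a toy hierarchy. The hypernym and synonym tables below are hand-written stand-ins, not data pulled from WordNet. A query matches an image if the query term appears anywhere on the hypernym path of one of the image's labels.

```python
# Hypothetical hand-written fragment of a hypernym hierarchy (child -> parent).
PARENT = {
    "car": "motor vehicle", "cab": "car", "suv": "car",
    "motor vehicle": "vehicle", "vehicle": "object",
    "pedestrian": "person", "person": "object",
}
# Synonyms are mapped onto one canonical term before lookup.
SYNONYM = {"automobile": "car", "taxi": "cab"}


def hypernym_path(term):
    """The (canonical) term followed by every more general term above it."""
    node = SYNONYM.get(term, term)
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def search(images, query):
    """images: dict image_id -> list of labels. A query matches an image if it
    appears on the hypernym path of any of the image's labels."""
    query = SYNONYM.get(query, query)
    return sorted(img for img, labels in images.items()
                  if any(query in hypernym_path(lab) for lab in labels))


db = {"img1": ["automobile"], "img2": ["taxi"], "img3": ["pedestrian"]}
print(search(db, "vehicle"))  # ['img1', 'img2']
print(search(db, "object"))   # ['img1', 'img2', 'img3']
```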
Some general queries & results, using WordNet
Dealing with occlusion: simple rules
• If an object is completely contained in another, it is inside.
• If it has more control points in the overlapping region, it is probably on top.
• Features such as color histograms can be used to match the overlapping region with one region or the other, but this is expensive, complicated, and doesn’t work as well.
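The first two rules can be sketched for a pair of polygons. The sketch rests on two interpretive assumptions:

- "Inside" is read as the contained polygon lying in front.
- The control points in the overlapping region are counted as each polygon's vertices that fall inside the other polygon.

The point-in-polygon test is ordinary ray casting, and vertices lying exactly on an edge are not handled specially.

```python
def point_in_polygon(pt, poly):
    """Ray casting: count crossings of a horizontal ray going right from pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def on_top(a, b):
    """Return "a" or "b" for the polygon judged to be in front, or None on a tie."""
    a_in_b = sum(point_in_polygon(p, b) for p in a)
    b_in_a = sum(point_in_polygon(p, a) for p in b)
    # Rule 1: a polygon entirely contained in the other is taken to be in front.
    if a_in_b == len(a):
        return "a"
    if b_in_a == len(b):
        return "b"
    # Rule 2: more control points in the overlap region -> probably on top.
    if a_in_b != b_in_a:
        return "a" if a_in_b > b_in_a else "b"
    return None


building = [(0, 0), (10, 0), (10, 10), (0, 10)]
window = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(on_top(building, window))  # prints "b": the window lies inside the building
```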
Depth ordering results
Image search reranking
• Segment the query image, extract features, compare them with the features of regions labeled with the search terms, and reorder the results by strength of correlation.
80 Million Tiny Images
Torralba et al. http://people.csail.mit.edu/torralba/tinyimages/
Shrinking images
• How much information does an image need to contain in order to identify its contents?
• Why not ask humans before asking computers?
• Torralba et al. looked for the minimum resolution that humans need in order to identify the contents of an image.
Can you tell what these are?
Note that for color images, human accuracy levels off at 32x32; for grayscale, the same happens at 64x64.
Humans did much better at 32x32 resolution than the best recognition algorithms did at full resolution.
That is 32x32x3 = 3,072 dimensions for color images and 64x64 = 4,096 (32x32x4) dimensions for grayscale, with very nearly the same accuracy, so roughly 3,000 dimensions are needed for recognition.
Next: Acquire a huge number of images
• Where do you start? Even at reduced resolution, there are far too many images out there to get them all.
• Start with WordNet. For each of the 75,062 concrete nouns in WordNet, run an image search on several image search engines. They used Google, Cydral, AltaVista, Flickr, Picsearch, and Webshots.
• Then eliminate duplicates and solid-color images.
• About 10% of the words were rare and had no matching images.
Finding nearest neighbors
• A distance metric is needed to compare the tiny images. They examine three: SSD (Sum of Squared Differences), Warp, and Shift.
SSD
• Plain SSD sums the squared differences over all dimensions.
• Computing this distance between all pairs is too expensive, so they used the top 19 principal components instead. They ran experiments showing that this approximation works reliably.
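Here is a minimal sketch of a two-stage search, written in pure Python for clarity. Distances are first computed in a low-dimensional PCA space, and only a shortlist is re-ranked with the full SSD. The sketch assumes the principal components are precomputed and orthonormal. The shortlist step is an illustrative assumption, not a detail taken from the slide.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def project(v, mean, components):
    """Coordinates of v in a PCA basis.

    components: orthonormal principal directions (e.g. the top 19),
    assumed to have been computed beforehand from the image collection.
    """
    centered = [x - m for x, m in zip(v, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


def nearest(query, database, mean, components, shortlist=100, k=1):
    """Rank by SSD in PCA space, then re-rank the shortlist with exact SSD."""
    q = project(query, mean, components)
    approx = sorted(database, key=lambda v: ssd(q, project(v, mean, components)))
    return sorted(approx[:shortlist], key=lambda v: ssd(query, v))[:k]
```

In a real system, the database projections would be computed once and stored rather than recomputed for every query.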
Warp & Shift
• Warp: transform the image in some simple way, such as flipping, scaling, or translating, and see whether that improves the SSD.
• Shift: allow each pixel to shift within a 5x5 window, and take the best SSD from that. (This is a crude approximation of general warping.)
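Both ideas can be built on plain SSD for grayscale images stored as lists of rows. This sketch is deliberately limited. `warp_distance` only tries the identity and a horizontal flip, with no scaling or translation. `shift_distance` lets each pixel of the first image pick its best match in a 5x5 window of the second image.

```python
def ssd2d(a, b):
    """SSD between two equal-size 2-D grayscale images (lists of rows)."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def warp_distance(a, b):
    """Best SSD over a few simple warps of b (here: identity and a horizontal
    flip only; scaling and translation are left out for brevity)."""
    flipped = [row[::-1] for row in b]
    return min(ssd2d(a, b), ssd2d(a, flipped))


def shift_distance(a, b, window=5):
    """Each pixel of a is compared with its best match in b inside a
    window x window neighborhood (clipped at the image borders)."""
    h, w, r = len(a), len(a[0]), window // 2
    total = 0
    for y in range(h):
        for x in range(w):
            total += min((a[y][x] - b[yy][xx]) ** 2
                         for yy in range(max(0, y - r), min(h, y + r + 1))
                         for xx in range(max(0, x - r), min(w, x + r + 1)))
    return total
```

Consider a single bright pixel moved one column to the right. Plain SSD between the two images is 162, while the shift distance is 0.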
Effect of DB size
• As the DB grows, the quality of the nearest neighbors keeps improving noticeably, even up to ~100,000,000 images.
Applications
• Object recognition
• Image retrieval reranking
• Person detection & localization
• Image colorization
Recognition
• Recognition is done by finding the nearest neighbors and retrieving the WordNet entry for each.
• Each entry corresponds to a unique leaf node in the WordNet tree and contributes a single “vote”.
• The neighbors' branches are unified into one tree, with each internal node weighted by how many branches pass through it.
• Classify by repeatedly following the link to the highest-voted child node.
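The voting scheme can be sketched with label paths written as root-to-leaf lists. The paths in the example are hand-written, not real WordNet entries. Ties between children are broken alphabetically, which is an assumption.

```python
from collections import Counter


def classify(neighbor_paths):
    """neighbor_paths: one root-to-leaf label path per nearest neighbor,
    e.g. ["entity", "object", "vehicle", "car"]; at least one path, all
    sharing the same root.

    Every node on a path gets one vote from that neighbor; starting at the
    root, repeatedly move to the child with the most votes.
    """
    votes = Counter()
    children = {}
    for path in neighbor_paths:
        for i, node in enumerate(path):
            votes[node] += 1
            if i + 1 < len(path):
                children.setdefault(node, set()).add(path[i + 1])
    node = neighbor_paths[0][0]
    decision = [node]
    while node in children:
        # Highest vote count wins; ties go to the alphabetically larger name.
        node = max(children[node], key=lambda c: (votes[c], c))
        decision.append(node)
    return decision


paths = [["entity", "object", "vehicle", "car"],
         ["entity", "object", "vehicle", "truck"],
         ["entity", "object", "person", "man"],
         ["entity", "object", "vehicle", "car"]]
print(classify(paths))  # ['entity', 'object', 'vehicle', 'car']
```

With this representation, checking whether the decision passes through a given internal node is a membership test on the returned path.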
Image search reranking
Do an image search on, say, “person”, with any image retrieval engine. Then, for each returned image, measure how strongly its neighbor set correlates with the search term, and rank the images by the strength of that correlation.
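Here the "correlation" is approximated by the simplest possible score: the fraction of an image's neighbors whose label path contains the search term. The neighbor lookup is a hypothetical callable that stands in for the nearest-neighbor search described earlier.

```python
def rerank_by_neighbors(results, neighbors_of, term):
    """Reorder search results by how strongly each image's neighbor set
    agrees with the search term.

    results:      image ids in the order the search engine returned them.
    neighbors_of: hypothetical lookup, image id -> list of label paths
                  (one per nearest neighbor in the tiny-image set).
    term:         the search term, e.g. "person".
    """
    def score(img):
        paths = neighbors_of(img)
        if not paths:
            return 0.0
        return sum(term in p for p in paths) / len(paths)

    # sorted() is stable (also with reverse=True), so ties keep engine order.
    return sorted(results, key=score, reverse=True)
```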
Person detection
Images matched with the WordNet node “person”, and their nearest neighbors. Note that the neighbors match the part of the person shown in the query image, as well as the pose and clothing color.
Here, the system only reports whether the best match passes through the “person” internal node.
The Internet has a strong bias toward images with people in them, so this method may not work as well for objects other than people.
Person location
Given a portion of an image, we can find its neighbors and measure the correlation with “person” in that set.
Extending this, we can find the portion of a query image whose neighbor set has the highest correlation with “person”. This region is very likely to contain a person.
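A single-scale sliding-window version of this idea might look as follows. The `person_score` callable is hypothetical. It stands in for "find this crop's neighbors and measure how many pass through the person node". Searching over several window sizes would be a natural extension.

```python
def best_person_window(image, size, stride, person_score):
    """Slide a size x size window over a 2-D image (list of rows) and return
    (score, top, left) for the highest-scoring crop, or None if the window
    does not fit.

    person_score: hypothetical callable crop -> float, e.g. the fraction of
    the crop's nearest neighbors whose label path passes through "person".
    """
    h, w = len(image), len(image[0])
    best = None
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            crop = [row[left:left + size] for row in image[top:top + size]]
            s = person_score(crop)
            if best is None or s > best[0]:
                best = (s, top, left)
    return best
```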
Colorization
Given a (grayscale) query image, find its neighbor set and take the average color of the set. Then apply that coloring to the grayscale image. Surprisingly, this works, even though not all neighbor images are of the same type of object!
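The sketch below uses one simple way to apply the coloring, which is an assumption rather than the paper's method. It averages the neighbors' colors pixel by pixel, then rescales each averaged color so that its mean channel value equals the query's gray level. Images are lists of rows, and all neighbors are assumed to share the query's resolution.

```python
def colorize(gray, neighbor_colors):
    """gray: 2-D list of intensities (0-255).
    neighbor_colors: color neighbors at the same resolution, each a 2-D list
    of (r, g, b) tuples.

    Average the neighbors' colors per pixel, then rescale the averaged color
    so its mean channel value matches the query pixel's gray level.
    """
    n = len(neighbor_colors)
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, g in enumerate(row):
            avg = [sum(img[y][x][c] for img in neighbor_colors) / n for c in range(3)]
            total = sum(avg)
            if total == 0:
                out_row.append((g, g, g))  # no color information: stay gray
                continue
            out_row.append(tuple(min(255, round(c * 3 * g / total)) for c in avg))
        out.append(out_row)
    return out


reddish = [[(200, 100, 100)]]
print(colorize([[100]], [reddish, reddish]))  # [[(150, 75, 75)]]
```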