Serge Belongie UC San Diego
Visual Recognition with Humans in the Loop
Peter Welinder Pietro Perona
Steve Branson Catherine Wah Boris Babenko Florian Schroff
http://www.cse.ucsd.edu
Outline
• Visipedia project overview • Relevant Work • Birds-200 dataset • “Visual 20 Questions” game • Results • Discussion
What Is Visipedia?
http://en.wikipedia.org/wiki/Bird
• The visual counterpart to Wikipedia • A user-generated encyclopedia of visual knowledge • An effort to associate articles with large quantities of
well-organized, intuitive visual concepts
Motivation
• People will willingly label or organize certain images if:
– They are interested in a particular subject matter
– They have the appropriate expertise
Ring-tailed lemur Thruxton Jackaroo
Motivation
• Construct a more comprehensive and intuitive knowledge base of visual objects
• Provide services like better text-to-image search and image-to-article search
Populating Visipedia
• Populate Wikipedia articles with more visual data using large quantities of unlabeled data on the web
World wide web Visipedia
Related Work: Systems
• Botanist’s Field Guide [Belhumeur et al. ’08] • Oxford Flowers [Nilsback & Zisserman ’08] • STONEFLY9 [Martínez-Muñoz et al. ’09] • omoby [IQEngines.com ’10] • 20 Questions game [20q.net] • ReCAPTCHA [von Ahn et al. ’08] • Wikimedia Commons
Related Work: Methods
• Relevance Feedback • Active Learning • Expert Systems • Decision Trees • Feature Sharing & Taxonomies • Parts & Attributes • Crowdsourcing & Human Computation
Attribute-Based Classification
• Train classifiers on attributes instead of objects
• Attributes are shared by different object classes
• Attributes provide the ingredients necessary to recognize each object class
Lampert et al. 2009; Farhadi et al. 2009
Wikimedia Commons
• Multiple ways of organizing sub-categories and visual information
• Sub-categories or clusters are represented by some exemplar image
http://commons.wikimedia.org/wiki/Dog
Motivation (Computer Vision Perspective)
• Need for more training data – Beyond the capacity of any one research group – Better quality control
• Need for more realistic data – Let people define what tasks are important – Study tightly-related categories
Dealing With a Large Number of Related Classes
• Standard classification methods fail because:
– Only a small number of training examples per class is available
– Variation between classes is small
– Variation within a class is often still high
Brewer’s Sparrow Vesper Sparrow
Birds-200 Dataset
6033 images over 200 bird species
Image Harvesting
• Flickr: text search on species name • MTurk: presence/absence and bounding
boxes
Attribute Labeling
• Attributes from whatbird.com
• 25 visual attributes -> 288 binary attributes
– Similar to a “dichotomous key” in biology
• MTurk interface
– Certainty levels: {guessing, probably, definitely}
• 5x redundancy factor
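The labeling setup above (a binary answer plus a certainty level, collected five times per attribute) can be illustrated with a small aggregation routine. This is only a sketch under stated assumptions: the slides do not say how the redundant responses were combined, and the numeric weights in `CERTAINTY_WEIGHT` as well as the function name `aggregate_attribute` are hypothetical.

```python
# Hypothetical weights for the three certainty levels offered in the MTurk
# interface; the slides do not give numeric values.
CERTAINTY_WEIGHT = {"guessing": 0.25, "probably": 0.5, "definitely": 1.0}


def aggregate_attribute(responses):
    """Combine redundant worker responses for one binary attribute.

    responses: list of (answer, certainty) pairs, where answer is True/False
    and certainty is one of the keys of CERTAINTY_WEIGHT.
    Returns (label, confidence) with confidence in [0, 1].
    """
    score = 0.0   # signed, certainty-weighted vote
    total = 0.0   # sum of weights, used for normalization
    for answer, certainty in responses:
        weight = CERTAINTY_WEIGHT[certainty]
        score += weight if answer else -weight
        total += weight
    if total == 0:
        return False, 0.0
    return score > 0, abs(score) / total


# Five responses for one attribute, matching the 5x redundancy factor.
votes = [
    (True, "definitely"),
    (True, "probably"),
    (False, "guessing"),
    (True, "probably"),
    (False, "probably"),
]
label, confidence = aggregate_attribute(votes)
```

A weighted vote is only one option; a probabilistic model of worker reliability would be a natural alternative.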
Attribute-Based Classification
• Number of attributes is less than number of classes
• Attribute classification tasks might be easier
• Makes it easier to incorporate human knowledge
www.whatbird.com
MTurker Label Certainty
MTurker Feedback
• “These hits were fun. Will you be posting more of them anytime soon? Thanks!”
• “These are Beautiful birds and I am enjoying this hit collection”
• “I really enjoy doing your hits, they are fun and interesting. Thanks.”
• “Love doing these because I'm a bird watcher.”
• “the birds are so cute..hope u can send more kind of birds”
• “I haven't really studied birds, but doing these HITs has made me realize just how beautiful they are. It has also made me aware of the many different types of birds. Thank you”
• “I REALLY LOVE THE COLOR OF THE BIRDS.”
• “Thank you for providing this job. The fact that the images are beautiful to look at make it a lot more enjoyable to do!”
• “Enjoyable to do.”
• Hourly Wage ≈ $1.25
Visual 20 Questions
• “Computer Vision” module = Vedaldi’s VLFeat • VQ Geometric Blur, color/gray SIFT spatial pyramid • Multiple Kernel Learning • Per-Class 1-vs-All SVM • 15 training examples per bird species • Choose question to maximize expected Information Gain
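The last bullet, choosing the question that maximizes expected information gain, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a class prior (which could come from the computer vision module's per-class scores) and, for each candidate question, a hypothetical table of P(yes | class). The names `expected_information_gain` and `choose_question` and the example attribute names are invented for the sketch.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def posterior(prior, likelihood_yes, answer):
    """Bayes update of the class distribution after a yes/no answer."""
    unnorm = [
        pc * (ly if answer else 1.0 - ly)
        for pc, ly in zip(prior, likelihood_yes)
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm] if z > 0 else list(prior)


def expected_information_gain(prior, likelihood_yes):
    """Expected reduction in class entropy from asking one binary question."""
    p_yes = sum(pc * ly for pc, ly in zip(prior, likelihood_yes))
    gain = entropy(prior)
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * entropy(posterior(prior, likelihood_yes, answer))
    return gain


def choose_question(prior, questions):
    """questions maps a question name to a list of P(yes | class)."""
    return max(
        questions,
        key=lambda q: expected_information_gain(prior, questions[q]),
    )


# Four equally likely classes and two hypothetical attribute questions.
prior = [0.25, 0.25, 0.25, 0.25]
questions = {
    "has_yellow_belly": [1.0, 1.0, 0.0, 0.0],  # splits the classes in half
    "has_black_head": [1.0, 0.0, 0.0, 0.0],    # isolates a single class
}
best = choose_question(prior, questions)
```

In this toy setup the question that splits the classes evenly is preferred. Probabilities strictly between 0 and 1 in the table would model the noisy user responses discussed in the Results section.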
General Observations
• User Responses are Stochastic • Computer Vision Reduces Manual Labor • User Responses Drive Up Performance • Computer Vision Improves Overall
Performance • Different Questions are Asked w/ and w/o
Computer Vision • Recognition is not Always Successful
w/o Computer Vision
[Figure: Percent Classified Correctly (0 to 1) vs. Number of Binary Questions Asked (0 to 60); curves: Deterministic Users, MTurk Users, MTurk Users + Model]
• User Responses are Stochastic
w/ Computer Vision
[Figure: Percent Classified Correctly (0 to 0.7) vs. Number of Binary Questions Asked (0 to 50); curves: No CV, 1-vs-all, Attribute]
• Computer Vision Reduces Manual Labor
w/ Computer Vision (cont’d)
[Figure: Percent of Testset Images (0 to 0.2) vs. Number of Binary Questions Asked (0 to 16); legend: No CV (11.11), 1-vs-all (6.64), Attribute (6.43)]
• User Responses Drive Up Performance
• Computer Vision Improves Overall Performance • Different Questions are Asked w/ and w/o
Computer Vision
• Recognition is not Always Successful
Indigo Bunting Blue Grosbeak
Future Work
• More Birds! More Categories! • Attribute Induction • Incorporate Part Localization • Partner with Wikimedia Foundation
Project Website
• Database, harvesting software, etc – http://visipedia.org
• (extra slides follow)
Part Labels
• Part diagrams give some indication of the spatial configuration of parts, but people will do this only for a small number of images
Object Localization and Shared Parts
• Training a classifier with latent variables (Dollar et al. 2008, Felzenszwalb et al. 2008)
• Latent variables are things like the pose and location of parts
• Objects in related domains share the same types of parts and poses
Shared Parts and Attributes
[Figure: classes Pine Warbler, Cape May Warbler, Kentucky Warbler, Hornet; Attribute and Part Detectors: Yellow, Beak, Black, Striped, Belly]
Object Localization and Shared Parts
Pine Warbler Cape May Warbler Kentucky Warbler
• Train part and attribute classifiers from class descriptions:
– Part locations $z_i^{\text{belly}}$, $z_i^{\text{head}}$, $z_i^{\text{beak}}$ in image $x_i$ are latent variables
Belly: solid, yellow Head: yellow Beak: all-purpose
Belly: striped, yellow, black Head: black Beak: all-purpose
Belly: solid, yellow Head: black Beak: all-purpose
Object Localization and Shared Parts
Pine Warbler Cape May Warbler Kentucky Warbler
• Training examples for each part/attribute span across different bird classes – For each Cape May Warbler image xi
Belly: solid, yellow Head: yellow Beak: all-purpose
Belly: striped, yellow, black Head: black Beak: all-purpose
Belly: solid, yellow Head: black Beak: all-purpose
Objects are More than Class Labels
• Represent objects as parts and attributes • Model relationships between classes • Pool training examples from different object classes • Define building blocks useful to detect new object classes
[Figure: classes Pine Warbler, Cape May Warbler, Kentucky Warbler, Hornet; Attribute and Part Detectors: Yellow, Beak, Black, Striped, Belly]
Classification Using Multiple Pathways
• Arrange recognition tasks into multiple “pathways”
– Bird Pathway: Bird Detector; Species Detectors; Parts, Pose, Attributes
– Face Pathway: Face Detector; Face Recognition; Parts, Attributes
– Text Pathway: Text Detector; Text Reading
– Image: Indoor vs. Outdoor; Graphic vs. Real Image
Classification Using Multiple Pathways
• Place redundant calculations in earlier pathways • Transfer information from easier tasks to harder
ones • Cascade classification tasks to avoid
unnecessary computations
Classification Using Multiple Pathways
• Pathway components: – A domain: a set of object classes that often have
similar parts or attributes – Takes as input an image, information extracted from
earlier pathways – Algorithms useful for extracting attributes and
information relevant to the domain – Outputs estimated attributes and part locations – Invokes other pathways as necessary
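The pathway components listed above (a gate on the domain, extraction of attributes, and invocation of further pathways) can be sketched as a small cascade. Everything here is a hypothetical illustration: the `Pathway` class, its field names, and the toy lambda "detectors" are assumptions made for the example and stand in for real detectors.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Pathway:
    """One recognition pathway: a gate, an extractor, and child pathways."""
    name: str
    applies: Callable[[dict, dict], bool]   # (image, context) -> run this pathway?
    extract: Callable[[dict, dict], dict]   # (image, context) -> new information
    children: List["Pathway"] = field(default_factory=list)

    def run(self, image: dict, context: Optional[dict] = None) -> dict:
        context = dict(context or {})
        # Cascade: if the gate fails, skip this pathway and all of its
        # children, avoiding unnecessary computation.
        if not self.applies(image, context):
            return context
        # Information from earlier pathways is available via `context`.
        context.update(self.extract(image, context))
        for child in self.children:
            context = child.run(image, context)
        return context


# Toy stand-ins for real detectors; an "image" is just a dict here.
species = Pathway(
    name="species",
    applies=lambda img, ctx: ctx.get("is_bird", False),
    extract=lambda img, ctx: {"species": img["species"]},
)
bird = Pathway(
    name="bird",
    applies=lambda img, ctx: img.get("has_bird", False),
    extract=lambda img, ctx: {"is_bird": True},
    children=[species],
)

result = bird.run({"has_bird": True, "species": "Pine Warbler"})
```

The species pathway only runs when the bird pathway has already fired, which mirrors the idea of transferring information from easier tasks to harder ones.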
Clustering and Near Duplicate Detection
Raw Beef Cooked Beef Cow Diagrams
• Improve presentation of data by suppressing duplicate, redundant, or similar images
Clustering and Near Duplicate Detection
• Use similarity metrics in different feature spaces (e.g. bag of words, color histogram, GIST) together with standard methods for clustering and near-duplicate detection
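As one concrete instance of the feature spaces mentioned above, the sketch below uses a coarse color histogram with histogram intersection as the similarity metric and a greedy pass to suppress near duplicates. It is an illustrative assumption, not the method used in the project: the function names, the bin count, and the 0.9 threshold are all hypothetical, and images are represented as plain lists of RGB tuples to keep the example dependency-free.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    n = float(len(pixels)) or 1.0
    return [h / n for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def suppress_near_duplicates(images, threshold=0.9):
    """Greedily keep the indices of images that are not too similar to any kept one."""
    kept, kept_hists = [], []
    for index, pixels in enumerate(images):
        hist = color_histogram(pixels)
        if all(histogram_intersection(hist, k) < threshold for k in kept_hists):
            kept.append(index)
            kept_hists.append(hist)
    return kept


# Two reddish images that fall in the same bins, and one blue image.
red_a = [(255, 0, 0)] * 10
red_b = [(250, 5, 5)] * 10
blue = [(0, 0, 255)] * 10
kept_indices = suppress_near_duplicates([red_a, red_b, blue])
```

A color histogram ignores spatial layout, so in practice it would likely be combined with the other feature spaces named on the slide.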
Image Registration
• Bring unlabeled images into correspondence with a labeled one using some matching function, e.g. an affine or perspective transformation or shape matching
• Transfer labels from labeled images to unlabeled ones
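For the affine case named above, registration and label transfer can be sketched with a least-squares fit. This is a minimal example that assumes point correspondences between the labeled and unlabeled image are already available (how they are found is outside the sketch); `fit_affine` and `transfer_points` are hypothetical helper names, and the coordinates are made up.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine map from src points to dst points (N >= 3).

    Returns a 3x2 matrix M such that [x, y, 1] @ M approximates the
    corresponding destination point.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    homogeneous = np.hstack([src, np.ones((len(src), 1))])  # N x 3
    M, *_ = np.linalg.lstsq(homogeneous, dst, rcond=None)   # 3 x 2
    return M


def transfer_points(M, points):
    """Map labeled points (e.g. part locations) into the other image."""
    pts = np.asarray(points, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M


# Correspondences related by a scale of 2 and a shift of (3, -1).
labeled = [(0, 0), (1, 0), (0, 1), (1, 1)]
unlabeled = [(3, -1), (5, -1), (3, 1), (5, 1)]
M = fit_affine(labeled, unlabeled)

# A part label at (0.5, 0.5) in the labeled image lands at (4, 0).
transferred = transfer_points(M, [(0.5, 0.5)])
```

A perspective transformation or shape matching, the other options on the slide, would need a different model than this linear fit.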
Visual Knowledge
• Associate categories with predictions of which visual attributes are most representative
Sacred Ibis Glossy Ibis
Curved Beak
Interactive Labeling Systems
• Speed up the population of image examples for some category:
– Active learning: intelligently query labeling tasks while incrementally training a category classifier
– Relevance feedback: use labeled images to re-rank unlabeled images by their relevance to a category
• Semi-supervised segmentation methods, e.g. GrabCut
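The two interactive strategies above can be illustrated in a few lines. This is a sketch under assumptions, not a description of the Visipedia system: uncertainty sampling is used here as one common form of active learning, and relevance feedback is reduced to re-ranking by similarity to labeled positives. The function names and the toy similarity are hypothetical.

```python
def most_uncertain(probabilities):
    """Active learning by uncertainty sampling: return the index of the
    unlabeled image whose predicted P(category) is closest to 0.5."""
    return min(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )


def rerank_by_feedback(unlabeled, positives, similarity):
    """Relevance feedback: order unlabeled items by their best similarity
    to any item the user has labeled as belonging to the category."""
    return sorted(
        unlabeled,
        key=lambda item: max(similarity(item, p) for p in positives),
        reverse=True,
    )


# Classifier outputs for three unlabeled images; the second is least certain.
query_index = most_uncertain([0.9, 0.55, 0.1])

# Toy one-dimensional "features" with negative distance as similarity.
ranking = rerank_by_feedback(
    unlabeled=[1, 5, 9],
    positives=[8],
    similarity=lambda a, b: -abs(a - b),
)
```

In a full loop, the queried label would be added to the training set and the classifier retrained before the next query.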
Combining Knowledge From Text and Images
• Leverage article text and link structure in Wikipedia articles
Connecting Knowledge Between Article Text and Images
• Use article text and link structure to predict class attributes, taxonomical structure, and object parts
Connecting Knowledge Between Article Text and Images
• Add “links” between images and article text