TRANSCRIPT
1
Visual Recognition With Humans in the Loop
Steve Branson
Catherine Wah
Florian Schroff
Boris Babenko
Serge Belongie
Peter Welinder
Pietro Perona
ECCV 2010, Crete, Greece
2
What type of bird is this?
What type of bird is this?
3
…?
Field Guide
4
What type of bird is this?
Computer Vision
?
5
What type of bird is this?
Bird?
Computer Vision
6
What type of bird is this?
Chair? Bottle?
Computer Vision
7
Parakeet Auklet
• Field guides difficult for average users
• Computer vision doesn’t work perfectly (yet)
• Research mostly on basic-level categories
8
Visual Recognition With Humans in the Loop
Parakeet Auklet
What kind of bird is this?
9
Levels of Categorization
Airplane? Chair? Bottle? …
Basic-Level Categories
[Griffin et al. ‘07, Lazebnik et al. ‘06, Grauman et al. ‘06, Everingham et al. ‘06, Felzenszwalb et al. ‘08, Viola et al. ‘01, … ]
10
Levels of Categorization
American Goldfinch? Indigo Bunting? …
Subordinate Categories
[Belhumeur et al. ‘08 , Nilsback et al. ’08, …]
11
Levels of Categorization
Yellow Belly? Blue Belly?…
Parts and Attributes
[Farhadi et al. ‘09, Lampert et al. ’09, Kumar et al. ‘09]
12
Visual 20 Questions Game
Blue Belly? no
Cone-shaped Beak? yes
Striped Wing? yes
American Goldfinch? yes
Hard classification problems can be turned into a sequence of easy ones
13
Recognition With Humans in the Loop
Computer Vision
Cone-shaped Beak? yes
American Goldfinch? yes
Computer Vision
• Computers: reduce number of required questions
• Humans: drive up accuracy of vision algorithms
14
Research Agenda
Heavy Reliance on Human Assistance
More Automated
Computer Vision
Improves
Blue belly? no
Cone-shaped beak? yes
Striped Wing? yes
American Goldfinch? yes
Striped Wing? yesAmerican Goldfinch? yes
Fully Automatic
American Goldfinch? yes
2010 2015
2025
15
Field Guides
www.whatbird.com
17
Example Questions
23
Basic Algorithm
Input Image (x)
Question 1: Is the belly black?
Question 2: Is the bill hooked?
Computer Vision
A: NO
A: YES
p(c | x)
p(c | x, u_1)
p(c | x, u_1, u_2)
u_1, u_2: user responses
Max Expected Information Gain
…
24
Without Computer Vision
Input Image (x)
Question 1: Is the belly black?
Question 2: Is the bill hooked?
Class Prior
A: NO
A: YES
p(c)
p(c | u_1)
p(c | u_1, u_2)
u_1, u_2: user responses
Max Expected Information Gain
…
25
Basic Algorithm
Select the next question that maximizes expected information gain:
• Easy to compute if we can estimate probabilities of the form:
p(c | x, u_1, u_2, …, u_t)
c: object class
x: image
u_1, …, u_t: sequence of user responses
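The selection rule can be sketched in Python (my own illustration, not the authors' released code; the `answer_model` table of p(answer | class) probabilities and its layout are assumptions):

```python
# Sketch of the question-selection step: given the current class
# posterior, pick the question whose answer is expected to reduce
# class entropy the most (maximum expected information gain).
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def expected_info_gain(posterior, answer_model, question):
    """posterior: list p(c | x, u_1..t) over classes.
    answer_model[q][c][a]: assumed table of p(a | c) for question q."""
    h_now = entropy(posterior)
    gain = 0.0
    answers = {a for c in range(len(posterior)) for a in answer_model[question][c]}
    for a in answers:
        # Predictive probability of this answer: p(a) = sum_c p(a | c) p(c | ...)
        p_a = sum(answer_model[question][c].get(a, 0.0) * posterior[c]
                  for c in range(len(posterior)))
        if p_a == 0:
            continue
        # Bayes update of the posterior if the user answered `a`
        updated = [answer_model[question][c].get(a, 0.0) * posterior[c] / p_a
                   for c in range(len(posterior))]
        gain += p_a * (h_now - entropy(updated))
    return gain

def pick_question(posterior, answer_model, remaining):
    return max(remaining, key=lambda q: expected_info_gain(posterior, answer_model, q))
```

With two equally likely classes, a question that perfectly separates them has an expected gain of one bit and is chosen over an uninformative one.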
26
Basic Algorithm
p(c | x, u_1, u_2, …, u_t) = p(u_1, u_2, …, u_t | c) · p(c | x) / Z
p(u_1, …, u_t | c): model of user responses
p(c | x): computer vision estimate
Z: normalization factor
28
Modeling User Responses
• Assume: p(u_1, …, u_t | c) = ∏_{i=1…t} p(u_i | c)
• Estimate p(u_i | c) using Mechanical Turk
[Figure: Mechanical Turk answer histograms for “What is the color of the belly?” on a Pine Grosbeak — response distributions over grey, red, black, white, brown, and blue at three confidence levels (Definitely, Probably, Guessing)]
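A minimal sketch of the factored user model, assuming (class, question, answer) triples collected from Mechanical Turk; the function names and the Laplace smoothing are my additions:

```python
# Estimate p(u_i | c) from MTurk answers and score a whole answer
# sequence under the independence assumption
# p(u_1, ..., u_t | c) = prod_i p(u_i | c).
from collections import Counter

def estimate_answer_model(mturk_answers, smoothing=1.0):
    """mturk_answers: list of (class, question, answer) triples.
    Returns model[(class, question)][answer] = p(answer | class),
    with Laplace smoothing so unseen answers keep nonzero mass."""
    counts = {}
    answers_per_q = {}
    for c, q, a in mturk_answers:
        counts.setdefault((c, q), Counter())[a] += 1
        answers_per_q.setdefault(q, set()).add(a)
    model = {}
    for (c, q), ctr in counts.items():
        vocab = answers_per_q[q]
        total = sum(ctr.values()) + smoothing * len(vocab)
        model[(c, q)] = {a: (ctr[a] + smoothing) / total for a in vocab}
    return model

def sequence_likelihood(model, c, qa_pairs):
    """p(u_1..t | c) under the factored model."""
    p = 1.0
    for q, a in qa_pairs:
        p *= model.get((c, q), {}).get(a, 0.0)
    return p
```

Smoothing matters in practice: a single MTurk worker who disagrees with the field guide should lower, not zero out, the likelihood of the true class.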
29
Incorporating Computer Vision
• Use any recognition algorithm that can estimate: p(c|x)
• We experimented with two simple methods:
p(c | x) ∝ exp{γ · m_c(x)}   (1-vs-all SVM)
p(c | x) ∝ ∏_i p(a_i^c | x)   (attribute-based classification)
[Lampert et al. ’09, Farhadi et al. ‘09]
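The two estimators can be sketched as follows (a hedged illustration, not the paper's actual pipeline; `gamma`, the SVM score list, and the binary attribute signatures are assumed inputs):

```python
# Two simple ways to turn classifier outputs into p(c | x).
import math

def softmax_posterior(svm_scores, gamma=1.0):
    """1-vs-all SVM: p(c | x) proportional to exp(gamma * m_c(x))."""
    exps = [math.exp(gamma * m) for m in svm_scores]
    z = sum(exps)
    return [e / z for e in exps]

def attribute_posterior(attr_probs, class_attrs):
    """Attribute-based (Lampert-style): p(c | x) proportional to
    prod_i p(a_i = a_i^c | x).
    attr_probs[i]: predicted p(a_i = 1 | x);
    class_attrs[c][i]: binary attribute signature of class c."""
    scores = []
    for sig in class_attrs:
        p = 1.0
        for prob, bit in zip(attr_probs, sig):
            p *= prob if bit else (1.0 - prob)
        scores.append(p)
    z = sum(scores)
    return [s / z for s in scores]
```

Either estimator can serve as the p(c | x) term in the Bayesian update; the framework only needs a normalized class posterior.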
30
Incorporating Computer Vision
[Vedaldi et al. ’08, Vedaldi et al. ’09]
Self Similarity, Color Histograms, Color Layout
Bag of Words, Spatial Pyramid
Geometric Blur, Color SIFT, SIFT
Multiple Kernels
• Used VLFeat and MKL code + color features
31
Birds 200 Dataset
• 200 classes, 6,000+ images, 288 binary attributes
• Why birds?
Black-footed Albatross
Groove-Billed Ani
Parakeet Auklet Field Sparrow Vesper Sparrow
Arctic Tern Forster’s Tern Common Tern Baird’s Sparrow Henslow’s Sparrow
34
Results: Without Computer Vision
Comparing Different User Models
35
Results: Without Computer Vision
Perfect Users (answers always agree with field guides): 100% accuracy in ⌈log₂(200)⌉ = 8 questions
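The 8-question figure is just binary search over the 200 classes:

```python
# With perfect yes/no answers, each question can at best halve the
# candidate set, so 200 classes need ceil(log2(200)) questions.
import math

questions_needed = math.ceil(math.log2(200))  # log2(200) ≈ 7.64 → 8
```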
36
Results: Without Computer Vision
Real users answer questions
MTurkers don’t always agree with field guides…
38
Results: Without Computer Vision
Probabilistic User Model: tolerates imperfect user responses
39
Results: With Computer Vision
40
Results: With Computer Vision
Users drive performance: 19% → 68%
Just Computer Vision: 19%
41
Results: With Computer Vision
Computer Vision Reduces Manual Labor: 11.1 → 6.5 questions
42
Examples
Without computer vision: Q #1: Is the shape perching-like? no (Def.)
With computer vision: Q #1: Is the throat white? yes (Def.)
Western Grebe
Different Questions Asked w/ and w/out Computer Vision
perching-like
43
Examples
User Input Helps Correct Computer Vision
Is the breast pattern solid? no (definitely)
[Figure: class probabilities for Magnolia Warbler vs. Common Yellowthroat, shown with computer vision alone and after the user’s answer]
44
Recognition is Not Always Successful
Acadian Flycatcher
Least Flycatcher
Parakeet Auklet
Least Auklet
Is the belly multi-colored? yes (Def.)
Unlimited questions
Summary
11.1 → 6.5 questions
Computer vision reduces manual labor
Users drive up performance: 19% → 68%
45
Recognition of fine-grained categories
More reliable than field guides
Summary
11.1 → 6.5 questions
Computer vision reduces manual labor
Users drive up performance: 19% → 68%
49
Future Work
• Extend to domains other than birds
• Methodologies for generating questions
• Improve computer vision
50
Questions?
Project page and datasets available at: http://vision.caltech.edu/visipedia/
http://vision.ucsd.edu/project/visipedia/