human computation and computer vision
DESCRIPTION
Human Computation and Computer Vision. CS143 Computer Vision James Hays, Brown University. 24 hours of Photo Sharing. installation by Erik Kessels. And sometimes Internet photos have useful labels. Im2gps. Hays and Efros. CVPR 2008. But what if we want more?. Image Categorization. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/1.jpg)
Human Computation andComputer Vision
CS143 Computer VisionJames Hays, Brown University
![Page 2: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/2.jpg)
24 hours of Photo Sharing
installation by Erik Kessels
![Page 3: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/3.jpg)
And sometimes Internet photos have useful labels
Im2gps. Hays and Efros. CVPR 2008
But what if we want more?
![Page 4: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/4.jpg)
Image Categorization
Training Labels
Training Images
Classifier Training
Training
Image Features
Trained Classifier
![Page 5: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/5.jpg)
Image Features
Testing
Test Image
Trained Classifier Outdoor
Prediction
Image Categorization
Training Labels
Training Images
Classifier Training
Image Features
Trained Classifier
Training
![Page 6: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/6.jpg)
Human Computation for Annotation
Training LabelsTraining
ImagesClassifier Training
Training
Image Features
Trained Classifier
Show images,Collect and filter labels
Unlabeled Images
![Page 7: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/7.jpg)
Active Learning
Training LabelsTraining
ImagesClassifier Training
Training
Image Features
Trained Classifier
Show images,Collect and filter labels
Unlabeled ImagesFind images
near decision boundary
![Page 8: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/8.jpg)
Human-in-the-loop Recognition
Human Estimate
Human Observer
Image Features
Testing
Test Image
Trained Classifier Outdoor
Prediction
![Page 9: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/9.jpg)
Outline
• Human Computation for Annotation– ESP Game– Mechanical Turk
• Human-in-the-loop Recognition– Visipedia
![Page 10: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/10.jpg)
![Page 11: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/11.jpg)
Luis von Ahn and Laura Dabbish. Labeling Images with a Computer Game. ACM Conf. on Human Factors in Computing Systems, CHI 2004
![Page 12: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/12.jpg)
![Page 13: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/13.jpg)
![Page 14: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/14.jpg)
![Page 15: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/15.jpg)
![Page 16: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/16.jpg)
![Page 17: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/17.jpg)
![Page 18: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/18.jpg)
![Page 19: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/19.jpg)
![Page 20: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/20.jpg)
Utility data annotation via Amazon Mechanical Turk
Alexander SorokinDavid Forsyth
University of Illinois at Urbana-Champaign
Slides by Alexander Sorokin
X 100 000 = $5000
![Page 21: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/21.jpg)
Task
Amazon Mechanical Turk
Is this a dog?
o Yeso No
Workers
Answer: Yes
Task: Dog?
Pay: $0.01
Broker
www.mturk.com
$0.01
![Page 22: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/22.jpg)
Annotation protocols
• Type keywords• Select relevant images• Click on landmarks• Outline something• Detect features
……….. anything else ………
![Page 24: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/24.jpg)
Select examples
Joint work with Tamara and Alex Berg
http://visionpc.cs.uiuc.edu/~largescale/data/simpleevaluation/html/horse.html
![Page 25: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/25.jpg)
Select examples
requester mtlabel$0.02
![Page 26: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/26.jpg)
Click on landmarks
$0.01 http://vision-app1.cs.uiuc.edu/mt/results/people14-batch11/p7/
![Page 27: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/27.jpg)
Outline something
$0.01 http://visionpc.cs.uiuc.edu/~largescale/results/production-3-2/results_page_013.htmlData from Ramanan NIPS06
![Page 28: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/28.jpg)
Motivation
X 100 000 = $5000
Customannotations
Large scale Low price
![Page 29: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/29.jpg)
Issues
• Quality?– How good is it?– How to be sure?
• Price? – How to price it?
![Page 30: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/30.jpg)
Annotation quality
Agree within 5-10 pixels on 500x500 screen
There are bad ones.
A C E G
![Page 31: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/31.jpg)
How do we get quality annotations?
![Page 32: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/32.jpg)
Ensuring Annotation Quality
• Consensus / Multiple Annotation / “Wisdom of the Crowds”
• Gold Standard / Sentinel – Special case: qualification exam
• Grading Tasks– A second tier of workers who grade others
![Page 33: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/33.jpg)
Pricing
• Trade off between throughput and cost• Higher pay can actually attract scammers
![Page 34: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/34.jpg)
Visual Recognition with Humans in the Loop
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder, Pietro Perona,
Serge Belongie
Part of the Visipedia project
Slides from Brian O’Neil
![Page 35: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/35.jpg)
Introduction:
Computers starting to get good at this.
If it’s hard for humans, it’s probably
too hard for computers.
Semantic feature extraction difficult for
computers.
Combine strengths to solve this
problem.
![Page 36: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/36.jpg)
The Approach: What is progress?
• Supplement visual recognition with the human capacity for visual feature extraction to tackle difficult (fine-grained) recognition problems.
• Typical progress is viewed as increasing data difficulty while maintaining full autonomy
• Here, the authors view progress as reduction in human effort on difficult data.
![Page 37: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/37.jpg)
The Approach: 20 Questions
• Ask the user a series of discriminative visual questions to make the classification.
![Page 38: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/38.jpg)
Which 20 questions?
• At each step, exploit the image itself and the user response history to select the most informative question to ask next.
Image x Ask user a question
Stop?( | , )tp c U x max ( | , )t
c p c U x
1( | , )tp c U x
No
Yes
![Page 39: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/39.jpg)
Some definitions:
• Set of possible questions• Possible answers to question i• Possible confidence in answer i
(Guessing, Probably, Definitely)
• User response • History of user responses at time t
1{ ... }
( , )
n
i i
i
i i i
t
Q q q
a A
r V
u a r
U
![Page 40: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/40.jpg)
Question selection
• Seek the question that gives the maximum information gain (entropy reduction) given the image and the set of previous user responses.
Probability of obtainingResponse ui given the imageAnd response history
Entropy when response is Added to history
Entropy before response is added.
1 1 1
1
| , | , log | ,C
t t t
c
H c x U p c x U p c x U
where
1 1 1 1; | , | , | , | ,i i
t t t ti i i
u A V
I c u x U p u x U H c x u U H c x U
![Page 41: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/41.jpg)
Incorporating vision
• Bayes Rule• A visual recognition algorithm outputs a
probability distribution across all classes that is used as the prior.
• A posterior probability is then computed based on the probability of obtaining a particular response history given each class.
| , | , | | |p c x U p U c x p c x p U c p c x
![Page 42: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/42.jpg)
Modeling user responses
• Assume that the questions are answered independently.
11
1 1
1
| |
| , | | ,
tt
ii
Ct t
i ic
p U c p u c
p u x U p u c p c x U
Required for posterior computation
Required for information gaincomputation
![Page 43: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/43.jpg)
The Dataset: Birds-200
• 6033 images of 200 species
![Page 44: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/44.jpg)
Implementation
• Assembled 25 visual questions encompassing 288 visual attributes extracted from www.whatbird.com
• Mechanical Turk users asked to answer questions and provide confidence scores.
![Page 45: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/45.jpg)
User Responses.
![Page 46: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/46.jpg)
Visual recognition
• Any vision system that can output a probability distribution across classes will work.
• Authors used Andrea Vedaldis’s code.– Color/gray SIFT– VQ geometric blur– 1 v All SVM
• Authors added full image color histograms and VQ color histograms
![Page 47: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/47.jpg)
Experiments
• 2 Stop criteria:– Fixed number of questions – evaluate accuacy– User stops when bird identified – measure
number of questions required.
Image x Ask user a question
Stop?( | , )tp c U x max ( | , )t
c p c U x
1( | , )tp c U x
No
Yes
![Page 48: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/48.jpg)
Results
• Average number of questions to make ID reduced from 11.11 to 6.43
• Method allows CV to handle the easy cases, consulting with users only on the more difficult cases.
![Page 49: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/49.jpg)
Key Observations
• Visual recognition reduces labor over a pure “20 Q” approach.
• Visual recognition improves performance over a pure “20 Q” approach. (69% vs 66%)
• User input dramatically improves recognition results. (66% vs 19%)
![Page 50: Human Computation and Computer Vision](https://reader030.vdocument.in/reader030/viewer/2022032709/56813237550346895d989d64/html5/thumbnails/50.jpg)
Strengths and weaknesses
• Handles very difficult data and yields excellent results.
• Plug-and-play with many recognition algorithms.
• Requires significant user assistance• Reported results assume humans are perfect
verifiers• Is the reduction from 11 questions to 6 really
that significant?