
1

Visual Recognition With Humans in the Loop

Steve Branson

Catherine Wah

Florian Schroff

Boris Babenko

Serge Belongie

Peter Welinder

Pietro Perona

ECCV 2010, Crete, Greece

2

What type of bird is this?

What type of bird is this?

3

…?

Field Guide

4

What type of bird is this?

Computer Vision

?

5

What type of bird is this?

Bird?

Computer Vision

6

What type of bird is this?

Chair? Bottle?

Computer Vision

7

Parakeet Auklet

• Field guides difficult for average users

• Computer vision doesn’t work perfectly (yet)

• Research mostly on basic-level categories

8

Visual Recognition With Humans in the Loop

Parakeet Auklet

What kind of bird is this?

9

Levels of Categorization

Airplane? Chair? Bottle? …

Basic-Level Categories

[Griffin et al. '07, Lazebnik et al. '06, Grauman et al. '06, Everingham et al. '06, Felzenszwalb et al. '08, Viola et al. '01, … ]

10

Levels of Categorization

American Goldfinch? Indigo Bunting? …

Subordinate Categories

[Belhumeur et al. ‘08 , Nilsback et al. ’08, …]

11

Levels of Categorization

Yellow Belly? Blue Belly?…

Parts and Attributes

[Farhadi et al. ‘09, Lampert et al. ’09, Kumar et al. ‘09]

12

Visual 20 Questions Game

Blue Belly? no

Cone-shaped Beak? yes

Striped Wing? yes

American Goldfinch? yes

Hard classification problems can be turned into a sequence of easy ones

13

Recognition With Humans in the Loop

Computer Vision

Cone-shaped Beak? yes

American Goldfinch? yes

Computer Vision

• Computers: reduce number of required questions
• Humans: drive up accuracy of vision algorithms

14

Research Agenda

Heavy Reliance on Human Assistance → More Automated, as Computer Vision Improves

• 2010: Blue belly? no · Cone-shaped beak? yes · Striped Wing? yes · American Goldfinch? yes
• 2015: Striped Wing? yes · American Goldfinch? yes
• 2025: Fully Automatic (American Goldfinch? yes)

15

Field Guides

www.whatbird.com

16

Field Guides

www.whatbird.com

17–22

Example Questions
23

Basic Algorithm

Input Image (x)

Computer Vision → p(c | x)

Question 1: Is the belly black?  A: NO (u_1) → p(c | x, u_1)

Question 2: Is the bill hooked?  A: YES (u_2) → p(c | x, u_1, u_2)

Each question is chosen to maximize expected information gain.
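To make this loop concrete, here is a minimal Python sketch of the basic algorithm as described on this slide: start from the computer vision estimate p(c | x), repeatedly pick the question with maximum expected information gain, and fold each answer into the posterior. This is illustrative only, not the authors' code; the array layout (questions[j][a, c] = p(u_j = a | c)), the function names, and the fixed question budget are assumptions.

```python
import numpy as np

def expected_information_gain(posterior, answer_likelihoods):
    """Expected drop in entropy of the class posterior from asking one question.
    answer_likelihoods[a, c] = p(u = a | c), assumed independent of earlier
    answers given the class (see the user-response model slide)."""
    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    h_before = entropy(posterior)
    gain = 0.0
    for a in range(answer_likelihoods.shape[0]):
        joint = answer_likelihoods[a] * posterior      # p(u = a, c | history)
        p_answer = joint.sum()                         # p(u = a | history)
        if p_answer > 0:
            gain += p_answer * (h_before - entropy(joint / p_answer))
    return gain

def ask_questions(cv_posterior, questions, get_user_answer, num_questions=5):
    """Interactive loop: start from p(c | x), ask the most informative
    remaining question, and fold each answer in with Bayes' rule."""
    posterior = np.asarray(cv_posterior, dtype=float).copy()
    remaining = set(range(len(questions)))
    for _ in range(min(num_questions, len(questions))):
        q = max(remaining,
                key=lambda j: expected_information_gain(posterior, questions[j]))
        remaining.remove(q)
        a = get_user_answer(q)                         # index of the user's answer u_q
        posterior *= questions[q][a]                   # multiply by p(u_q = a | c)
        posterior /= posterior.sum()                   # normalize (the 1/Z step)
    return posterior
```

Replacing cv_posterior with a uniform class prior gives the "without computer vision" variant shown on the next slide.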

24

Without Computer Vision

Input Image (x)

Class Prior → p(c)

Question 1: Is the belly black?  A: NO (u_1) → p(c | u_1)

Question 2: Is the bill hooked?  A: YES (u_2) → p(c | u_1, u_2)

Each question is chosen to maximize expected information gain.

25

Basic Algorithm

Select the next question that maximizes expected information gain:
• Easy to compute if we can estimate probabilities of the form

p(c | x, u_1, u_2, …, u_t)

where c is the object class, x is the image, and u_1, …, u_t is the sequence of user responses.
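In symbols, one common way to write the criterion (a reconstruction consistent with the slide, not a quote from the paper): the next question j* maximizes the expected reduction in the entropy of this posterior,

```latex
\[
j^{*} = \arg\max_{j}\Big[
  H\big(c \mid x, u_{1},\dots,u_{t}\big)
  - \sum_{a} p\big(u_{j}{=}a \mid x, u_{1},\dots,u_{t}\big)\,
    H\big(c \mid x, u_{1},\dots,u_{t}, u_{j}{=}a\big)\Big],
\]
\[
\text{where } H(c \mid \cdot) = -\sum_{c} p(c \mid \cdot)\,\log p(c \mid \cdot).
\]
```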

26

Basic Algorithm

p(c | x, u_1, …, u_t) = p(u_1, …, u_t | c) · p(c | x) / Z

where p(u_1, …, u_t | c) is the model of user responses, p(c | x) is the computer vision estimate, and Z is a normalization factor.
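A toy numeric illustration of this update (made-up numbers, three hypothetical classes, a single "yes" answer to the first question):

```latex
\[
p(c \mid x) = (0.5,\ 0.3,\ 0.2), \qquad p(\text{yes} \mid c) = (0.9,\ 0.2,\ 0.5)
\]
\[
p(\text{yes} \mid c)\,p(c \mid x) = (0.45,\ 0.06,\ 0.10), \qquad Z = 0.61
\;\;\Longrightarrow\;\;
p(c \mid x, u_1{=}\text{yes}) \approx (0.74,\ 0.10,\ 0.16).
\]
```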


28

Modeling User Responses

• Assume: p(u_1, …, u_t | c) = ∏_{i=1…t} p(u_i | c)

• Estimate p(u_i | c) using Mechanical Turk

Example: "What is the color of the belly?" for a Pine Grosbeak, with histograms of MTurk responses over belly colors (grey, red, black, white, brown, blue) at each confidence level (Definitely, Probably, Guessing).
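A minimal sketch of how such per-class response probabilities could be tabulated from MTurk votes; the input format, function name, and Laplace smoothing below are illustrative assumptions, not details from the talk.

```python
from collections import Counter, defaultdict

def estimate_response_model(mturk_answers, answer_values, alpha=1.0):
    """Estimate p(u_i = v | c) for one question i from Mechanical Turk answers.

    mturk_answers : iterable of (class_label, answer_value) pairs gathered on
                    training images of known class (hypothetical format).
    answer_values : the possible answers, e.g. belly colors or yes/no.
    alpha         : Laplace smoothing so rare answers keep nonzero probability.
    """
    counts = defaultdict(Counter)
    for c, v in mturk_answers:
        counts[c][v] += 1
    model = {}
    for c, cnt in counts.items():
        total = sum(cnt.values()) + alpha * len(answer_values)
        model[c] = {v: (cnt[v] + alpha) / total for v in answer_values}
    return model

# Illustrative use with made-up votes for "What is the color of the belly?"
votes = [("Pine Grosbeak", "grey"), ("Pine Grosbeak", "grey"),
         ("Pine Grosbeak", "white"), ("American Goldfinch", "yellow")]
belly_color_model = estimate_response_model(
    votes, ["grey", "red", "black", "white", "brown", "blue", "yellow"])
```

The model in the talk also records the user's confidence (Definitely / Probably / Guessing); one simple extension is to key the counts on (answer, confidence) pairs instead of the answer alone.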

29

Incorporating Computer Vision

• Use any recognition algorithm that can estimate: p(c|x)

• We experimented with two simple methods:

p(c | x) ∝ exp{m_c(x)}   (1-vs-all SVM, where m_c(x) is the SVM score for class c)

p(c | x) ∝ ∏_i p(a_i^c | x)   (attribute-based classification [Lampert et al. '09, Farhadi et al. '09])
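Both estimators can be sketched in a few lines (illustrative code under assumed input shapes; the max-subtraction for numerical stability and the probability clipping are my additions, not from the slides):

```python
import numpy as np

def posterior_from_svm_scores(scores):
    """1-vs-all SVM: map per-class decision values m_c(x) to p(c | x) ∝ exp{m_c(x)}."""
    z = scores - scores.max()        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def posterior_from_attributes(attr_probs, class_attributes, eps=1e-6):
    """Attribute-based classification: p(c | x) ∝ Π_i p(a_i = a_i^c | x).

    attr_probs[i]          : p(a_i = 1 | x) from a per-attribute classifier
    class_attributes[c, i] : 1 if class c is listed as having attribute i, else 0
    """
    p = np.clip(attr_probs, eps, 1.0 - eps)
    log_lik = class_attributes @ np.log(p) + (1 - class_attributes) @ np.log(1 - p)
    post = np.exp(log_lik - log_lik.max())
    return post / post.sum()
```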

30

Incorporating Computer Vision

[Vedaldi et al. ’08, Vedaldi et al. ’09]

Multiple kernels over several features: color histograms, color layout, self similarity, bag of words, spatial pyramid, geometric blur, SIFT, color SIFT

• Used VLFeat and MKL code + color features
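For illustration, a uniform-weight combination of precomputed kernels is sketched below; actual multiple kernel learning would learn the per-kernel weights rather than fixing them.

```python
import numpy as np

def combine_kernels(kernels, weights=None):
    """Weighted sum of precomputed N x N kernel matrices (SIFT, color SIFT,
    geometric blur, self-similarity, ...) into a single kernel for an SVM."""
    kernels = np.stack(kernels)                       # shape (num_kernels, N, N)
    if weights is None:                               # uniform weights by default
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return np.tensordot(weights, kernels, axes=1)     # shape (N, N)

# The combined kernel can be fed to an SVM that accepts precomputed kernels,
# e.g. sklearn.svm.SVC(kernel="precomputed").
```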

31–33

Birds 200 Dataset

• 200 classes, 6000+ images, 288 binary attributes
• Why birds?

Example species: Black-footed Albatross, Groove-billed Ani, Parakeet Auklet, Field Sparrow, Vesper Sparrow, Arctic Tern, Forster's Tern, Common Tern, Baird's Sparrow, Henslow's Sparrow

34

Results: Without Computer Vision

Comparing Different User Models

35

Results: Without Computer Vision

Perfect users (answers always agree with the field guides): 100% accuracy in ≈ log2(200) ≈ 8 questions, since each binary answer can at best halve the set of 200 classes.

36–37

Results: Without Computer Vision

Real users answer the questions: MTurkers don't always agree with field guides…

38

Results: Without Computer Vision

Probabilistic user model: tolerates imperfect user responses

39

Results: With Computer Vision

40

Results: With Computer Vision

Users drive up performance: 19% → 68%

Computer vision alone: 19%

41

Results: With Computer Vision

Computer Vision Reduces Manual Labor: 11.1 → 6.5 questions

42

Examples

Different Questions Asked With and Without Computer Vision (example: Western Grebe)

• Without computer vision, Q #1: Is the shape perching-like? no (Def.)
• With computer vision, Q #1: Is the throat white? yes (Def.)

43

Examples

User Input Helps Correct Computer Vision (example: Magnolia Warbler)

Computer vision alone favors Common Yellowthroat; the user's answer "Is the breast pattern solid? no (definitely)" corrects the prediction to Magnolia Warbler.

44

Recognition is Not Always Successful

Failure examples:
• Acadian Flycatcher vs. Least Flycatcher
• Parakeet Auklet vs. Least Auklet (user answer: Is the belly multi-colored? yes (Def.))

Some confusions remain even with unlimited questions.

45–48

Summary

• Recognition of fine-grained categories
• More reliable than field guides
• Computer vision reduces manual labor: 11.1 → 6.5 questions
• Users drive up performance: 19% → 68%

49

Future Work

• Extend to domains other than birds
• Methodologies for generating questions
• Improve computer vision

50

Questions?

Project page and datasets available at:
http://vision.caltech.edu/visipedia/

http://vision.ucsd.edu/project/visipedia/
