visual recognition with humans in the loop steve branson catherine wah florian schroff boris babenko...

50
Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010, Crete, Greece 1

Upload: jakobe-wynter

Post on 31-Mar-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

1

Visual Recognition With Humans in the Loop

Steve Branson

Catherine Wah

Florian Schroff

Boris Babenko

Serge Belongie

Peter WelinderPietro Perona

ECCV 2010, Crete, Greece

Page 2: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

2

What type of bird is this?

Page 3: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

What type of bird is this?

3…?

Field Guide

Page 4: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

4

What type of bird is this?

Computer Vision

?

Page 5: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

5

What type of bird is this?

Bird?

Computer Vision

Page 6: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

6

What type of bird is this?

Chair?Bottle?

Computer Vision

Page 7: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

7

Parakeet Auklet

• Field guides difficult for average users

• Computer vision doesn’t work perfectly (yet)

• Research mostly on basic-level categories

Page 8: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

8

Visual Recognition With Humans in the Loop

Parakeet Auklet

What kind of bird is this?

Page 9: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

9

Levels of Categorization

Airplane? Chair? Bottle? …

Basic-Level Categories

[Griffin et al. ‘07, Lazebnik et al. ‘06, Grauman et al. ‘06, Everingham et al. ‘06, Felzenzwalb et al. ‘08, Viola et al. ‘01, … ]

Page 10: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

10

Levels of Categorization

American Goldfinch? Indigo Bunting? …

Subordinate Categories

[Belhumeur et al. ‘08 , Nilsback et al. ’08, …]

Page 11: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

11

Levels of Categorization

Yellow Belly? Blue Belly?…

Parts and Attributes

[Farhadi et al. ‘09, Lampert et al. ’09, Kumar et al. ‘09]

Page 12: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

12

Visual 20 Questions Game

Blue Belly? no

Cone-shaped Beak? yes

Striped Wing? yes

American Goldfinch? yes

Hard classification problems can be turned into a sequence of easy ones

Page 13: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

13

Recognition With Humans in the Loop

Computer Vision

Cone-shaped Beak? yes

American Goldfinch? yes

Computer Vision

• Computers: reduce number of required questions• Humans: drive up accuracy of vision algorithms

Page 14: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

14

Research Agenda

Heavy Reliance on Human Assistance

More Automated

Computer Vision

Improves

Blue belly? noCone-shaped beak? yesStriped Wing? yesAmerican Goldfinch? yes

Striped Wing? yesAmerican Goldfinch? yes

Fully AutomaticAmerican Goldfinch? yes

2010 2015

2025

Page 15: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

15

Field Guides

www.whatbird.com

Page 16: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

16

Field Guides

www.whatbird.com

Page 17: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

17

Example Questions

Page 18: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

18

Example Questions

Page 19: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

19

Example Questions

Page 20: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

20

Example Questions

Page 21: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

21

Example Questions

Page 22: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

22

Example Questions

Page 23: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

23

Basic Algorithm

Input Image ( )x

Question 1:Is the belly black?

Question 2:Is the bill hooked?

Computer Vision

A: NO

A: YES

)|( xcp

),|( 1uxcp

),,|( 21 uuxcp

1u

2u

Max Expected Information Gain

Max Expected Information Gain

Page 24: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

24

Without Computer Vision

Input Image ( )x

Question 1:Is the belly black?

Question 2:Is the bill hooked?

Class Prior

A: NO

A: YES

)(cp

)|( 1ucp

),|( 21 uucp

1u

2u

Max Expected Information Gain

Max Expected Information Gain

Page 25: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

25

Basic Algorithm

Select the next question that maximizes expected information gain:• Easy to compute if we can to estimate probabilities of

the form:

)...,,|( 21 tuuuxcp

Object Class

Image Sequence of user responses

Page 26: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

26

Basic Algorithm

Zxcpcuuup t )|()|...,( 21

Model of user responses

Computer vision

estimate

)...,,|( 21 tuuuxcp

Normalization factor

Page 27: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

27

Basic Algorithm

Zxcpcuuup t )|()|...,( 21

Model of user responses

Computer vision

estimate

)...,,|( 21 tuuuxcp

Normalization factor

Page 28: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

28

Modeling User Responses

ti it cupcuuup...121 )|()|...,(• Assume:

• Estimate using Mechanical Turk)|( cup i

grey red

blac k

whi

tebr

own

blue

grey red

blac k

whi

tebr

own

blue

grey red

blac k

whi

tebr

own

blue

Definitely

Probably Guessing

What is the color of the belly?

Pine Grosbeak

Page 29: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

29

Incorporating Computer Vision

• Use any recognition algorithm that can estimate: p(c|x)

• We experimented with two simple methods:

)}(exp{)|( xmxcp 1-vs-all SVM

i i capxcp )|()|(

Attribute-based classification

[Lampert et al. ’09, Farhadi et al. ‘09]

Page 30: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

30

Incorporating Computer Vision

[Vedaldi et al. ’08, Vedaldi et al. ’09]

Self SimilarityColor Histograms Color Layout

Bag of WordsSpatial Pyramid

Geometric Blur Color SIFT, SIFT

Multiple Kernels

•Used VLFeat and MKL code + color features

Page 31: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

31

Birds 200 Dataset•200 classes, 6000+ images, 288 binary attributes

•Why birds?

Black-footed Albatross

Groove-Billed Ani

Parakeet Auklet Field Sparrow Vesper Sparrow

Arctic Tern Forster’s Tern Common Tern Baird’s Sparrow Henslow’s Sparrow

Page 32: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

32

Birds 200 Dataset•200 classes, 6000+ images, 288 binary attributes

•Why birds?

Black-footed Albatross

Groove-Billed Ani

Parakeet Auklet Field Sparrow Vesper Sparrow

Arctic Tern Forster’s Tern Common Tern Baird’s Sparrow Henslow’s Sparrow

Page 33: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

33

Birds 200 Dataset•200 classes, 6000+ images, 288 binary attributes

•Why birds?

Black-footed Albatross

Groove-Billed Ani

Parakeet Auklet Field Sparrow Vesper Sparrow

Arctic Tern Forster’s Tern Common Tern Baird’s Sparrow Henslow’s Sparrow

Page 34: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

34

Results: Without Computer Vision

Comparing Different User Models

Page 35: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

35

Results: Without Computer Vision

if users answers agree with field guides…Perfect Users: 100% accuracy in 8≈log2(200) questions

Page 36: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

36

Results: Without Computer VisionReal users answer questions

MTurkers don’t always agree with field guides…

Page 37: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

37

Results: Without Computer VisionReal users answer questions

MTurkers don’t always agree with field guides…

Page 38: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

38

Results: Without Computer VisionProbabilistic User Model: tolerate imperfect user responses

Page 39: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

39

Results: With Computer Vision

Page 40: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

40

Results: With Computer Vision

Users drive performance: 19% 68%

Just Computer Vision19%

Page 41: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

41

Results: With Computer Vision

Computer Vision Reduces Manual Labor: 11.1 6.5 questions

Page 42: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

42

Examples

Without computer vision: Q #1: Is the shape perching-like? no (Def.)With computer vision: Q #1: Is the throat white? yes (Def.)

Western Grebe

Different Questions Asked w/ and w/out Computer Vision

perching-like

Page 43: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

43

Examples

computer vision

Magnolia Warbler

User Input Helps Correct Computer Vision

Is the breast pattern solid? no (definitely)

Common Yellowthroat

Magnolia Warbler

Common Yellowthroat

Page 44: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

44

Recognition is Not Always Successful

Acadian Flycatcher

Least Flycatcher

Parakeet Auklet

Least Auklet

Is the belly multi-colored? yes (Def.)

Unlimited questions

Page 45: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

Summary

11.1 6.5 questions

Computer vision reduces manual labor

Users drive up performance

19%

45

Recognition of fine-grained categories

More reliable than field guides

Page 46: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

Summary

11.1 6.5 questions

Computer vision reduces manual labor

Users drive up performance

19%

46

Recognition of fine-grained categories

More reliable than field guides

Page 47: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

Summary

11.1 6.5 questions

Computer vision reduces manual labor

Users drive up performance

19%

47

Recognition of fine-grained categories

More reliable than field guides

Page 48: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

Summary

11.1 6.5 questions

Computer vision reduces manual labor

Users drive up performance

19%

48

Recognition of fine-grained categories

More reliable than field guides

Page 49: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

49

Future Work• Extend to domains other than birds• Methodologies for generating questions• Improve computer vision

Page 50: Visual Recognition With Humans in the Loop Steve Branson Catherine Wah Florian Schroff Boris Babenko Serge Belongie Peter Welinder Pietro Perona ECCV 2010,

50

Questions?

Project page and datasets available at:http://vision.caltech.edu/visipedia/

http://vision.ucsd.edu/project/visipedia/