Learning Object Detectors from Weakly Supervised Image Data

Post on 23-Aug-2014

Category:

Science

DESCRIPTION

One of the fundamental challenges in automatically detecting and localizing objects in images is the need to collect a large number of example images with annotated object locations (bounding boxes). The introduction of detection challenge datasets has propelled progress by providing the research community with enough fully annotated images to train competitive detectors for 20-200 classes. However, as we look towards the goal of scaling our systems to human-level category detection, it becomes impractical to collect a large quantity of bounding box labels for tens, or even hundreds of thousands of categories. In this talk I will discuss recent work on enabling the training of detectors with weakly annotated images, i.e. images that are known to contain the object but with unknown object location (bounding box).

The first approach I will present proposes a new multiple instance learning (MIL) method for object detection that is capable of handling noisy, automatically obtained annotations. Our approach consists of first obtaining confidence estimates over the label space and second incorporating these estimates within a new Boosting procedure. We demonstrate the efficiency of our procedure on two detection tasks, namely horse detection and pedestrian detection, where the training data is primarily annotated by a coarse area-of-interest detector, and show substantial improvements over existing MIL methods.

I will also present a second, complementary approach: a domain adaptation algorithm which learns the difference between the classification task and the detection task, and transfers this knowledge to classifiers for categories without bounding box annotated data, adapting them into detectors. Our method has the potential to enable detection for the tens of thousands of categories that lack bounding box annotations, yet have plenty of classification data in ImageNet. The approach is evaluated on the ImageNet LSVRC-2013 detection challenge.

TRANSCRIPT

COMPUTER VISION: LEARNING TO DETECT OBJECTS

Kate Saenko, University of Massachusetts, Lowell

What is computer vision?

Computer Vision

Terminator 2

We’re not quite there yet, but…

Terminator 2, Enemy of the State (from UCSD “Fact or Fiction” DVD)

Machine Learning: What is it?

Program a computer to learn from experience

Learn from “big data”

Machine Learning in practice

Machine learning is not perfect

Personal photo albums

Lots of image data available!

Data for computer vision

What are applications of computer vision?

Surveillance and security

Computer Vision: Surveillance and Security

Smart cars

Mobileye: vision systems currently in high-end BMW, GM, Volvo models

By 2010: 70% of car manufacturers

Slide content courtesy of Amnon Shashua

Scientific Images

Medical Imaging

Image guided surgery (Grimson et al., MIT)

3D imaging (MRI, CT)

slide by S. Seitz

Vision for Robotics

http://www.robocup.org/

NASA’s Mars Spirit Rover: http://en.wikipedia.org/wiki/Spirit_rover

slide by S. Seitz

Object Detection: Face Detection

Viola and Jones, Rapid object detection using a boosted cascade of simple features, CVPR 2001

What is object detection?

Goal of object detection

Detect: PERSON

Why is object detection difficult?

Can you detect all objects in this image?

Easy to collect data on the web!

Difficult to label image annotations

Easy to label from search engine

Much more difficult and costly to label

dog apple

dog apple

Goal of this research:

Learn from weakly labeled data!

How well can we do without bounding box labels?

Computer detecting pedestrians

Computer detecting 7,000 object categories

Joint work with Karim Ali

Confidence-Rated Multiple Instance Boosting for Detection

Motivation

Object Detection: High accuracy requires large labeled data sets; Scalability

Reducing annotation requirements: Semi-supervised Learning; Active Learning; Multiple-Instance Learning

Overview

CR-MILBOOST

Multiple instance learning with noise

MI Learning cannot handle noisy bags

Outline

Reminder: What is MIL?

CR-MILBoost (CVPR’14)

Conclusion & Future Work

Discussion

Reminder: What is MIL?

Supervised Learning: each instance has an associated label

MIL (weaker supervision): examples come in bags; each bag has a label

Negative bag: all instances in the bag are negative

Positive bag: at least one instance in the bag is positive
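The bag labelling rule above can be sketched in a few lines. This is only an illustration: the function name and the toy bags are made up for the example and are not from the talk. In detection, a bag would be an image and its instances the candidate windows.

```python
from typing import List


def bag_label(instance_labels: List[int]) -> int:
    """Return 1 if at least one instance in the bag is positive, else 0."""
    return int(any(label == 1 for label in instance_labels))


# Toy bags: the learner only sees the bag label, never the per-instance labels.
negative_bag = [0, 0, 0]       # no window contains the object
positive_bag = [0, 1, 0, 0]    # one window contains it, but which one is hidden

print(bag_label(negative_bag))
print(bag_label(positive_bag))
```

Supervised learning would be given the inner lists directly; MIL is given only the value returned by `bag_label`.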

Supervised vs MIL (binary)

Supervised Learning MI Learning

Related Methods

How to estimate latent labels for positives

Gartner, ICML’02; Xu, ICML’04; Andrews, NIPS’03

Bunescu, ICML’07 SVM Constraints

Viola, NIPS’07

Supervised MIL

CR-MILBOOST

MILBoost
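For reference, MILBoost is commonly described with a noisy-OR bag model: a bag is positive unless every one of its instances is negative, and each instance receives a boosting weight equal to the gradient of the bag log-likelihood with respect to its score. The sketch below follows that standard formulation rather than anything shown on the slide; the function names are illustrative.

```python
import math
from typing import List


def sigmoid(score: float) -> float:
    """Map a real-valued classifier score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


def bag_probability(instance_scores: List[float]) -> float:
    """Noisy-OR: one minus the probability that every instance is negative."""
    prob_all_negative = 1.0
    for score in instance_scores:
        prob_all_negative *= 1.0 - sigmoid(score)
    return 1.0 - prob_all_negative


def instance_weights(instance_scores: List[float], bag_label: int) -> List[float]:
    """Derivative of the bag log-likelihood with respect to each instance score."""
    p_bag = bag_probability(instance_scores)
    weights = []
    for score in instance_scores:
        p_instance = sigmoid(score)
        if bag_label == 1:
            # Positive bag: large weight while the bag is still poorly explained.
            weights.append((1.0 - p_bag) / p_bag * p_instance)
        else:
            # Negative bag: every instance is pushed towards a negative score.
            weights.append(-p_instance)
    return weights
```

Because the weights in a positive bag depend only on the current classifier, a single confident false positive can absorb the bag's weight, which is one way to see why noisy bags are a problem for this kind of model.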

CR-MILBOOST

Two Step Procedure: estimate probabilities on the latent label; integrate the estimate in a new loss

Mitigates label estimation error by incorporating priors
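The slides do not spell out the estimator or the loss, so the following only illustrates the two-step structure using assumed stand-ins. Step 1 is replaced by a hypothetical `estimate_confidences` that rescales prior scores within a positive bag, and step 2 by a confidence-weighted exponential loss. Neither is claimed to be what CR-MILBoost actually uses.

```python
import math
from typing import List


def estimate_confidences(prior_scores: List[float]) -> List[float]:
    """Step 1 (stand-in): rescale non-negative prior scores of the instances
    in a positive bag to confidences in [0, 1], relative to the best score."""
    best = max(prior_scores)
    if best <= 0:
        return [0.0 for _ in prior_scores]
    return [max(score, 0.0) / best for score in prior_scores]


def confidence_rated_loss(confidences: List[float],
                          classifier_scores: List[float]) -> float:
    """Step 2 (stand-in): exponential loss with soft targets.

    A confidence c becomes a soft label t = 2c - 1 in [-1, 1]. Instances with
    c near 0.5 get t near 0, so their loss term stays close to 1 whatever the
    classifier outputs and they have little influence on training.
    """
    total = 0.0
    for confidence, score in zip(confidences, classifier_scores):
        soft_label = 2.0 * confidence - 1.0
        total += math.exp(-soft_label * score)
    return total
```

The point of the sketch is the separation of concerns: the latent-label estimate is computed once from priors, and the boosting loss consumes it as a soft target instead of re-deriving labels from the current classifier.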

CR-MILBOOST

Step 1

CR-MILBOOST

Step 2

Experiments: Features

Weak Learners: an edge orientation; a sub-window; a threshold

Simple, efficient

Q=4 (number of stumps)
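A weak learner defined by one feature and one threshold is a decision stump. The sketch below is a minimal version, assuming each window has already been reduced to a list of scalar feature responses (for example, edge energy at one orientation inside one sub-window). The feature extraction is not shown, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Stump:
    feature_index: int   # which (orientation, sub-window) response to read
    threshold: float
    polarity: int        # +1: positive above the threshold, -1: positive below

    def predict(self, features: Sequence[float]) -> int:
        above = features[self.feature_index] > self.threshold
        return self.polarity if above else -self.polarity


def fit_stump(samples: List[Sequence[float]],
              labels: List[int],
              weights: List[float]) -> Optional[Stump]:
    """Exhaustively pick the stump with the lowest weighted error.

    Labels are +1 / -1; weights are the current boosting weights.
    """
    best_stump, best_error = None, float("inf")
    for index in range(len(samples[0])):
        # Candidate thresholds are the observed feature values.
        for threshold in sorted({sample[index] for sample in samples}):
            for polarity in (1, -1):
                stump = Stump(index, threshold, polarity)
                error = sum(w for x, y, w in zip(samples, labels, weights)
                            if stump.predict(x) != y)
                if error < best_error:
                    best_stump, best_error = stump, error
    return best_stump
```

A boosting round would call `fit_stump` with the current instance weights and add the returned stump to the ensemble.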

Experiments: Pedestrian Detection

Training Data: 200 images automatically downloaded from the web; 200 “objectness” bounding boxes

Testing Data: INRIA Person; 300 images containing 600 pedestrians

Experiments: Horse Detection

Training Data: 200 images automatically downloaded from the web; 200 “objectness” bounding boxes

Testing Data: 200 images containing 200 side-view horses

Conclusion

New MIL method: CR-MILBOOST, a two-step procedure

Dramatic increase in performance: 200% on two datasets

Quality of selected examples still suffers from additional ambiguity when compared to the fully supervised examples

Joint work with Judy Hoffman, Eric Tzeng, Sergio Guadarrama and Trevor Darrell at UC Berkeley

Adapting Deep CNNs from Classification to Detection

Recall: classification is easier than detection

Classification label: Easy to label

Detection label: much more difficult and costly!

dog apple

dog apple

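[Figure: for dog and apple, both classification images (I_CLASSIFY) and detection images (I_DET) are available, giving classifiers (W_CLASSIFY) and detectors (W_DET) for each. For cat, only I_CLASSIFY and W_CLASSIFY are available; with no I_DET for cat, the detector W_DET for cat is the unknown (?).]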

Main idea behind the approach

[Figure: three-panel network diagram.
(a) Classification CNN: layers 1-5, fc6, fc7 and output layers fcA and fcB, trained on classification data from categories A and B (Train Classification CNN).
(b) Hidden Layer Adaptation: detection data from categories B (labeled warped regions, including a background class) is used to train the adapted detection CNN, with det layers 1-5, detfc6, detfc7 and detfcB.
(c) Output Layer Adaptation: fcA is adapted into detfcA, giving the final combined and fully adapted CNN.
Example output scores shown in the diagram include cat: 0.90, dog: 0.85, person: 0.10, airplane: 0.05 and background: 0.25.]
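One way to picture the transfer step is as follows. For categories that have both a classifier and a detector, the difference between the two weight vectors is known. For a category with only a classifier, that difference can be borrowed from its most similar categories. The sketch below makes several assumptions of its own: weight vectors are plain Python lists, similarity is Euclidean distance between classifier weights, and the offset is a simple average over the `k` nearest categories. Function names and toy numbers are illustrative and not taken from the talk.

```python
import math
from typing import Dict, List

Vector = List[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adapt_classifier(target_classifier: Vector,
                     classifiers: Dict[str, Vector],
                     detectors: Dict[str, Vector],
                     k: int = 1) -> Vector:
    """Turn a classifier into a detector by adding the average
    (detector - classifier) offset of its k nearest categories.

    `classifiers` and `detectors` share keys: the categories for which
    bounding box data was available.
    """
    neighbours = sorted(
        classifiers,
        key=lambda name: distance(target_classifier, classifiers[name]),
    )[:k]
    offset = [0.0] * len(target_classifier)
    for name in neighbours:
        for i, (w_det, w_cls) in enumerate(zip(detectors[name], classifiers[name])):
            offset[i] += (w_det - w_cls) / len(neighbours)
    return [w + d for w, d in zip(target_classifier, offset)]


# Toy example: dog and apple have detectors, cat has only a classifier.
classifiers = {"dog": [1.0, 0.0], "apple": [0.0, 1.0]}
detectors = {"dog": [1.5, -0.5], "apple": [0.0, 2.0]}
cat_classifier = [0.9, 0.1]
cat_detector = adapt_classifier(cat_classifier, classifiers, detectors, k=1)
print(cat_detector)
```

In the toy data the cat classifier is closest to the dog classifier, so the cat detector inherits the dog's classification-to-detection offset.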

Results on ILSVRC 2013 Detection

Preliminary results on 7K categories

Conclusion

Presented two new methods for object detector training with minimal bounding box annotation: a MIL-based method for learning from the results of image search; adaptation from the classification task to the detection task

Questions?
