Learning Object Detectors from Weakly Supervised Image Data

COMPUTER VISION: LEARNING TO DETECT OBJECTS

Kate Saenko, University of Massachusetts, Lowell

What is computer vision?

Computer Vision

Terminator 2

We’re not quite there yet, but…

Terminator 2, Enemy of the State (from the UCSD “Fact or Fiction” DVD)

Machine Learning: What is it?

Program a computer to learn from experience

Learn from “big data”

Machine Learning in practice

Machine learning is not perfect

Personal photo albums

Lots of image data available!

Data for computer vision

What are applications of computer vision?

Surveillance and security

Computer Vision: Surveillance and Security

Smart cars

Mobileye: vision systems currently in high-end BMW, GM, and Volvo models

By 2010: 70% of car manufacturers

Slide content courtesy of Amnon Shashua

Scientific Images

Medical Imaging

Image-guided surgery (Grimson et al., MIT)

3D imaging: MRI, CT

slide by S. Seitz

Vision for Robotics

http://www.robocup.org/

NASA’s Mars Spirit Rover: http://en.wikipedia.org/wiki/Spirit_rover

slide by S. Seitz

Object Detection: Face Detection

Viola and Jones, Rapid object detection using a boosted cascade of simple features, CVPR 2001

What is object detection?

Goal of object detection

Detect: PERSON

Why is object detection difficult?

Can you detect all objects in this image?

Easy to collect data on the web!

Difficult to label image annotations

Easy to label from search engine

Much more difficult and costly to label

dog apple

Goal of this research:

Learn from weakly labeled data!

How well can we do without bounding box labels?

Computer detecting pedestrians

Computer detecting 7,000 object categories

How well can we do without bounding box labels?

Joint work with Karim Ali

Confidence-Rated Multiple Instance Boosting for Detection

Motivation

Object detection: high accuracy requires large labeled data sets; scalability

Reducing annotation requirements: semi-supervised learning, active learning, multiple-instance learning

Overview

CR-MILBOOST

Multiple instance learning with noise

MI Learning cannot handle noisy bags

Outline

Reminder: What is MIL?

CR-MILBoost (CVPR’14)

Conclusion & Future Work

Discussion

Reminder: What is MIL?

Supervised learning: each instance has an associated label

MIL (weaker supervision): examples come in bags; each bag has a label

Negative bag: all instances in the bag are negative

Positive bag: at least one instance in the bag is positive
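The bag-labeling rule above can be written down directly. This is a minimal sketch, assuming binary instance labels (1 for positive, 0 for negative); the function name `bag_label` is hypothetical and chosen only for illustration.

```python
from typing import List


def bag_label(instance_labels: List[int]) -> int:
    """Return 1 if at least one instance in the bag is positive, else 0."""
    return int(any(label == 1 for label in instance_labels))


# A negative bag contains only negative instances.
negative_bag = [0, 0, 0]
# A positive bag contains at least one positive instance.
positive_bag = [0, 1, 0]

print(bag_label(negative_bag), bag_label(positive_bag))
```

In training, only the bag label is observed; which instance inside a positive bag is the positive one stays latent.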

Supervised vs MIL (binary)

[Figure panels: Supervised Learning; MI Learning]

Related Methods

How to estimate latent labels for positives

Gartner, ICML’02; Xu, ICML’04; Andrews, NIPS’03

Bunescu, ICML’07: SVM constraints

Viola, NIPS’07

[Figure axis: Supervised to MIL]

CR-MILBOOST

MILBoost
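The slides name MILBoost without reproducing its formulas. As a reminder, a minimal sketch of the noisy-OR bag model commonly associated with MILBoost follows, assuming instance probabilities p_ij come from a sigmoid of the classifier score and the bag probability is p_i = 1 - prod_j (1 - p_ij). The function names are hypothetical, and the weight formula is the gradient of the bag log-likelihood under that assumption.

```python
import math
from typing import List


def sigmoid(y: float) -> float:
    """Map a real-valued classifier score to an instance probability p_ij."""
    return 1.0 / (1.0 + math.exp(-y))


def bag_probability(scores: List[float]) -> float:
    """Noisy-OR: the bag is positive unless every instance is negative."""
    prob_all_negative = 1.0
    for y in scores:
        prob_all_negative *= 1.0 - sigmoid(y)
    return 1.0 - prob_all_negative


def instance_weights(scores: List[float], bag_label: int) -> List[float]:
    """Gradient of the bag log-likelihood with respect to each instance score.

    For bag label t_i this equals (t_i - p_i) / p_i * p_ij.
    """
    p_i = bag_probability(scores)
    return [(bag_label - p_i) / p_i * sigmoid(y) for y in scores]


print(bag_probability([0.0, 0.0]))
print(instance_weights([0.0, 0.0], 1))
```

Instances in negative bags always receive a negative weight, while instances in positive bags share a positive weight in proportion to their own probability.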

CR-MILBOOST

Two-step procedure: (1) estimate probabilities on the latent labels; (2) integrate the estimates in a new loss

Mitigates label estimation error by incorporating priors
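The slides do not reproduce the CR-MILBOOST loss, so the following is only an illustrative sketch of the two-step idea, not the authors' formulation. It assumes step 1 produces a confidence q in [0, 1] that each instance is positive (here, a sigmoid of an initial score) and step 2 plugs those confidences into a confidence-weighted exponential loss used as a stand-in. All names are hypothetical.

```python
import math
from typing import List


def estimate_confidences(initial_scores: List[float]) -> List[float]:
    """Step 1 (assumed form): turn initial scores into latent-label confidences."""
    return [1.0 / (1.0 + math.exp(-s)) for s in initial_scores]


def confidence_weighted_exp_loss(scores: List[float],
                                 confidences: List[float]) -> float:
    """Step 2 (stand-in loss): mix the positive-label and negative-label
    exponential losses according to each instance's confidence."""
    total = 0.0
    for f, q in zip(scores, confidences):
        total += q * math.exp(-f) + (1.0 - q) * math.exp(f)
    return total / len(scores)


confidences = estimate_confidences([2.0, -1.0, 0.0])
print(confidence_weighted_exp_loss([1.5, -0.5, 0.2], confidences))
```

With this stand-in, an instance with confidence near 0.5 contributes almost equally under either label, so an uncertain estimate pulls the classifier less strongly in one direction than a hard label would.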

CR-MILBOOST: Step 1

CR-MILBOOST: Step 2

Experiments: Features

Weak learners: an edge orientation, a sub-window, a threshold

Simple, efficient

Q=4, number of stumps
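A weak learner defined by an edge orientation, a sub-window, and a threshold can be sketched as a decision stump. This is a minimal illustration, assuming the image has already been converted into per-orientation edge maps (`edge_maps[orientation][row][col]`) and that the feature is the summed edge energy inside the sub-window; the slides do not specify the feature computation, and the `Stump` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

EdgeMaps = List[List[List[float]]]  # indexed as [orientation][row][col]


@dataclass
class Stump:
    orientation: int                    # which edge-orientation channel to read
    window: Tuple[int, int, int, int]   # (top, left, bottom, right), exclusive end
    threshold: float

    def response(self, edge_maps: EdgeMaps) -> float:
        """Sum the edge energy of one orientation inside the sub-window."""
        top, left, bottom, right = self.window
        rows = edge_maps[self.orientation][top:bottom]
        return sum(sum(row[left:right]) for row in rows)

    def predict(self, edge_maps: EdgeMaps) -> int:
        """Return +1 if the response reaches the threshold, else -1."""
        return 1 if self.response(edge_maps) >= self.threshold else -1


# Two orientation channels over a 2x2 image patch.
edge_maps = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]
stump = Stump(orientation=1, window=(0, 0, 2, 2), threshold=5.0)
print(stump.predict(edge_maps))
```

Boosting would then combine several such stumps into a strong classifier, each with its own orientation, sub-window, and threshold.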

Experiments: Pedestrian Detection

Training data: 200 images automatically downloaded from the web; 200 “objectness” bounding boxes

Testing data: INRIA Person, 300 images containing 600 pedestrians

Experiments: Horse Detection

Training data: 200 images automatically downloaded from the web; 200 “objectness” bounding boxes

Testing data: 200 images containing 200 side-view horses

Conclusion

New MIL method: CR-MILBOOST, a two-step procedure

Dramatic increase in performance: 200% on two datasets

Quality of selected examples still suffers from additional ambiguity when compared to fully supervised examples

Joint work with Judy Hoffman, Eric Tzeng, Sergio Guadarrama and Trevor Darrell at UC Berkeley

Adapting Deep CNNs from Classification to Detection

Recall: classification is easier than detection

Classification label: Easy to label

Detection label: much more difficult and costly!

dog apple

Main idea behind the approach

[Figure labels: I_CLASSIFY; classifiers W_CLASSIFY^dog, W_CLASSIFY^apple, W_CLASSIFY^cat; detectors W_DET, W_DET^apple, W_DET^cat; I_DET]

Deep Convolutional Neural Network

[Figure: adaptation pipeline in three panels.

Train classification CNN: classification data from categories A and B; layers 1-5, fc6, fc7, fcA.

(b) Hidden Layer Adaptation: train adapted detection CNN on detection data from categories B (labeled warped regions); det layers 1-5, det fc6, det fc7, det fcB, background.

(c) Output Layer Adaptation: final combined and fully adapted CNN; det layers 1-5, det fc6, det fc7, det fcA, det fcB, background.

Example output scores shown across the panels: cat: 0.90, dog: 0.85, airplane: 0.05, person: 0.10; dog: 0.87, person: 0.15, background: 0.25; cat: 0.90, airplane: 0.02, dog: 0.45, person: 0.15.]
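The slides name output layer adaptation without giving a formula. One plausible reading, stated here as an assumption of this sketch rather than something on the slides, is that a category in A (classification data only) borrows the classifier-to-detector weight change observed for its k nearest categories in B (which have detection data). The function and variable names are hypothetical.

```python
from typing import List

Vector = List[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def adapt_output_weights(w_cls_a: Vector,
                         w_cls_b: List[Vector],
                         w_det_b: List[Vector],
                         k: int = 1) -> Vector:
    """Shift a category-A classifier by the mean offset (detector minus
    classifier) of its k nearest category-B classifiers."""
    order = sorted(range(len(w_cls_b)),
                   key=lambda i: squared_distance(w_cls_a, w_cls_b[i]))
    nearest = order[:k]
    offset = [
        sum(w_det_b[i][d] - w_cls_b[i][d] for i in nearest) / len(nearest)
        for d in range(len(w_cls_a))
    ]
    return [w + o for w, o in zip(w_cls_a, offset)]


# Category-B weights before (classification) and after (detection) adaptation.
w_cls_b = [[1.0, 0.0], [0.0, 1.0]]
w_det_b = [[1.5, 0.0], [0.0, 3.0]]
# A category-A classifier that is closest to the first B category.
print(adapt_output_weights([0.9, 0.1], w_cls_b, w_det_b, k=1))
```

The design choice is that visually similar categories are expected to change in similar ways when moving from classification to detection, so no bounding boxes are needed for the categories in A.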

Results on ILSVRC 2013 Detection

Preliminary results on 7K categories

Conclusion

Presented two new methods for object detector training with minimal bounding box annotation:

MIL-based method for learning from results of image search

Adaptation from classification to detection task

Questions?
