agenda introduction bag-of-words models visual words with spatial location part-based models...


Posted on 21-Dec-2015


Agenda

• Introduction

• Bag-of-words models

• Visual words with spatial location

• Part-based models

• Discriminative methods

• Segmentation and recognition

• Recognition-based image retrieval

• Datasets & Conclusions


Databases

• Caltech 101

• Caltech 256

• Pascal Visual Object Classes (VOC)

• LabelMe

• Slides from Andrew Zisserman


Caltech 101

• Pictures of objects belonging to 101 categories.

• About 40 to 800 images per category. Most categories have about 50 images.

• The size of each image is roughly 300 x 200 pixels.

• Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato.

• Train on 5, 10, 15, 20, or 30 images per category

• Test on the remaining images; report results per class
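The evaluation protocol above (a fixed number of training images per category, testing on the rest, results per class) can be sketched in Python as follows. This is a minimal sketch, not the official benchmark code: the function names `split_per_class` and `mean_per_class_accuracy` are hypothetical, and summarizing the per-class results by their mean is an assumption (a common convention, not stated on the slide).

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, seed=0):
    """Pick n_train random images per class for training; the rest are test images.

    labels: one class label per image. Returns (train_indices, test_indices).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label, idxs in by_class.items():
        if len(idxs) <= n_train:
            # No test images would be left for this class.
            raise ValueError(f"class {label!r} has only {len(idxs)} images")
        shuffled = idxs[:]
        rng.shuffle(shuffled)
        train_idx.extend(shuffled[:n_train])
        test_idx.extend(shuffled[n_train:])
    return train_idx, test_idx


def mean_per_class_accuracy(true_labels, predicted_labels):
    """Average the recognition rate of each class so large classes do not dominate."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


labels = ["cat"] * 5 + ["dog"] * 8
train, test = split_per_class(labels, n_train=3)
print(len(train), len(test))  # 6 7
```

Averaging per-class rates rather than pooling all test images matters here because category sizes vary widely, so a pooled accuracy would be dominated by the largest categories.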


Caltech 101 images


Caltech-101: Drawbacks

• Smallest category size is 31 images, which caps training at N_train = 30 images per category

• Too easy?

– Left-right aligned

– Rotation artifacts

– Performance will soon saturate


Caltech-256

• Smallest category size is now 80 images

• About 30K images

• Harder

– Not left-right aligned

– No artifacts

– Performance is halved (relative to Caltech-101)

– More categories

• New and larger clutter category


Caltech 256 images (example categories shown include baseball-bat, basketball-hoop, dog, and traffic-light)


The PASCAL Visual Object Classes (VOC) Dataset and Challenge

Mark Everingham, Luc Van Gool, Chris Williams, John Winn, Andrew Zisserman


The PASCAL VOC Challenge

• Challenge in visual object recognition funded by the PASCAL Network of Excellence

• Publicly available dataset of annotated images; development kit available

• Main competitions in classification (is there an X in this image) and detection (where are the X’s)

• “Taster competitions” in segmentation and 2-D human “pose estimation” (2007-present)


Dataset Content

• 20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, TV/monitor

• Real images downloaded from Flickr, not filtered for “quality”

• Complex scenes, scale, pose, lighting, occlusion, ...


Annotation

• Complete annotation of all objects

• Annotated in one session with written guidelines

• Truncated: object extends beyond the bounding box (BB)

• Occluded: object is significantly occluded within the BB

• Pose: e.g. facing left

• Difficult: not scored in evaluation
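A minimal Python sketch of how these per-object flags might be read and used, assuming an XML layout modeled on the PASCAL VOC development kit. The sample annotation, its values, and the helper names `parse_objects` and `scored_objects` are hypothetical illustrations, and the exact tag set varies between challenge years. Objects flagged as difficult are dropped before scoring, as described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, modeled on the VOC devkit XML layout (an assumption).
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>dog</name>
    <pose>Left</pose>
    <truncated>1</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <occluded>1</occluded>
    <difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def parse_objects(xml_text):
    """Return one dict per <object>, with the 0/1 flags converted to booleans."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "pose": obj.findtext("pose", default="Unspecified"),
            # A flag that is absent (e.g. in years without it) is read as False.
            "truncated": obj.findtext("truncated", default="0").strip() == "1",
            "occluded": obj.findtext("occluded", default="0").strip() == "1",
            "difficult": obj.findtext("difficult", default="0").strip() == "1",
            "bbox": tuple(int(float(box.findtext(k)))
                          for k in ("xmin", "ymin", "xmax", "ymax")),
        })
    return objects


def scored_objects(objects):
    """Drop objects marked 'difficult', which are not scored in evaluation."""
    return [o for o in objects if not o["difficult"]]


objs = parse_objects(SAMPLE)
print([o["name"] for o in scored_objects(objs)])  # ['dog']
```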


Examples (one image per class): aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow


History

• New dataset annotated annually

– Annotation of test set is withheld until after the challenge

Year   Images   Objects   Classes   Entries   Notes
2005   2,232    2,871     4         12        Collection of existing and some new data
2006   5,304    9,507     10        25        Completely new dataset from Flickr (+MSRC)
2007   9,963    24,640    20        28        Increased classes to 20; introduced tasters
2008   8,776    20,739    20        –         Added “occlusion” flag; reuse of taster data; detailed results released to support “meta-analysis”


Main Challenge Tasks

• Classification

– Is there a dog in this image?

– Evaluation by precision/recall

• Detection

– Localize all the people (if any) in this image

– Evaluation by precision/recall based on bounding box overlap
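The detection criterion can be sketched as follows. It assumes the VOC convention that a detection counts as correct when the area of intersection over union (IoU) with a ground-truth box exceeds 0.5; detections are processed in order of decreasing confidence, and a second detection of an already-matched object counts as a false positive. This is a simplified sketch: it uses continuous box coordinates and ignores the "difficult" flag, unlike the official development kit, and all names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax), continuous coords."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def precision_recall(detections, ground_truth, threshold=0.5):
    """detections: list of (image_id, score, box); ground_truth: image_id -> list of boxes.

    Returns (precisions, recalls), one point per detection in decreasing score order.
    """
    n_pos = sum(len(boxes) for boxes in ground_truth.values())
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp = fp = 0
    precisions, recalls = [], []
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        # Assign the detection to the ground-truth box it overlaps most.
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best > threshold and not matched[img][best_j]:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1  # too little overlap, or a duplicate of an already-matched object
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_pos if n_pos else 0.0)
    return precisions, recalls


gt = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
dets = [("img1", 0.9, (0, 0, 10, 10)),    # exact hit
        ("img1", 0.8, (1, 1, 10, 10)),    # duplicate of the same object
        ("img2", 0.7, (50, 50, 60, 60))]  # far from any object
P, R = precision_recall(dets, gt)  # precision 1, 1/2, 1/3; recall 0.5 each
```

Sweeping down the ranked list in this way is what traces out a precision/recall curve: each additional detection either raises recall (a new true positive) or lowers precision (a false positive).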


Example Precision/Recall: 2007

• Person detection

[Figure: precision (y-axis) vs. recall (x-axis), both from 0 to 1. Legend: IRISA (0.221), UoCTTI (0.213), INRIA_Normal (0.121), MPI_ESSOL (0.117), INRIA_PlusClass (0.092), MPI_Center (0.091), TKK (0.061)]
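Curves like these are usually summarized by a single number, average precision (AP). For VOC 2007 this was the 11-point interpolated AP: the mean, over recall levels t = 0, 0.1, ..., 1, of the highest precision reached at any recall ≥ t. A minimal Python sketch, taking precision/recall points as input (the function name is hypothetical):

```python
def eleven_point_ap(precisions, recalls):
    """11-point interpolated average precision.

    For each recall level t in {0, 0.1, ..., 1}, take the maximum precision over
    all points with recall >= t (0 if there are none), then average the 11 values.
    """
    total = 0.0
    for i in range(11):
        t = i / 10
        candidates = [p for p, r in zip(precisions, recalls) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11


print(round(eleven_point_ap([1.0, 0.5], [0.5, 1.0]), 3))  # 0.773
```

For example, the two points (P = 1.0, R = 0.5) and (P = 0.5, R = 1.0) give (6 × 1.0 + 5 × 0.5) / 11 ≈ 0.773. Taking the maximum precision at recall ≥ t smooths out the saw-tooth shape of raw precision/recall curves.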


LabelMe

Russell, Torralba, Freeman, 2005


Links to datasets

The following lists summarize some of the available datasets for training and testing object detection and recognition algorithms. These lists are far from exhaustive.

Databases for object localization

• CMU/MIT frontal faces: vasc.ri.cmu.edu/idb/html/face/frontal_images, cbcl.mit.edu/software-datasets/FaceData2.html (annotation: patches; content: frontal faces)

• Graz-02 Database: www.emt.tugraz.at/~pinz/data/GRAZ_02/ (annotation: segmentation masks; content: bikes, cars, people)

• UIUC Image Database: l2r.cs.uiuc.edu/~cogcomp/Data/Car/ (annotation: bounding boxes; content: cars)

• TU Darmstadt Database: www.vision.ethz.ch/leibe/data/ (annotation: segmentation masks; content: motorbikes, cars, cows)

• LabelMe dataset: people.csail.mit.edu/brussell/research/LabelMe/intro.html (annotation: polygonal boundary; content: >500 categories)

Databases for object recognition

• Caltech 101: www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html (annotation: segmentation masks; content: 101 categories)

• Caltech 256: http://www.vision.caltech.edu/Image_Datasets/Caltech256/ (annotation: bounding box; content: 256 categories)

• COIL-100: www1.cs.columbia.edu/CAVE/research/softlib/coil-100.html (annotation: patches; content: 100 instances)

• NORB: www.cs.nyu.edu/~ylclab/data/norb-v1.0/ (annotation: bounding box; content: 50 toys)

On-line annotation tools

• ESP game: www.espgame.org (annotation: global image descriptions; content: web images)

• LabelMe: people.csail.mit.edu/brussell/research/LabelMe/intro.html (annotation: polygonal boundary; content: high-resolution images)

Collections

• PASCAL: http://www.pascal-network.org/challenges/VOC/ (annotation: segmentation, boxes; content: various)


Topics not covered

• Context

– Scene

– Inter-object relations

• Video

– Tracking & detection

• Multiple viewpoints


Summary

• Methods reviewed here

– Bag of words

– Bag of words with location

– Parts and structure

– Discriminative methods

– Combined segmentation and recognition

– Recognition for retrieval

• Resources online: http://cs.nyu.edu/~fergus/icml_tutorial

– Slides

– Code

– Links to datasets