TRANSCRIPT
The PASCAL Visual Object Classes (VOC) Dataset and Challenge
Andrew Zisserman
Visual Geometry Group, University of Oxford
• Challenge in visual object recognition funded by the PASCAL network of excellence
• Publicly available dataset of annotated images
• Main competitions in classification (is there an X in this image?), detection (where are the X's? see the overlap sketch after this list), and segmentation (which pixels belong to X?)
• Competition has run each year since 2006.
• Organized by Mark Everingham, Luc Van Gool, Chris Williams, John Winn and Andrew Zisserman
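Detection is scored by bounding-box overlap: a detection counts as correct when the intersection-over-union (IoU) of the predicted and ground-truth boxes exceeds 0.5. Below is a minimal Python sketch of that overlap criterion, an illustration rather than the official devkit code:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection is a true positive under the VOC criterion if iou(...) > 0.5
```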
The PASCAL VOC Challenge
Dataset Content
• 20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, TV/monitor
• Real images downloaded from flickr, not filtered for “quality”
• Complex scenes, scale, pose, lighting, occlusion, truncation ...
Annotation
• Complete annotation of all objects
• Annotated in one session with written guidelines
• Truncated: object extends beyond the bounding box (BB)
• Occluded: object is significantly occluded within the BB
• Pose: e.g. facing left
• Difficult: not scored in the evaluation
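These flags live in the per-image XML annotation files shipped with the dataset. A minimal sketch of reading them with Python's standard library (the file name is hypothetical, and the exact tag set varies by year, so check the devkit documentation):

```python
import xml.etree.ElementTree as ET

# "000001.xml" is a hypothetical VOC-style annotation file.
root = ET.parse("000001.xml").getroot()

for obj in root.iter("object"):
    name = obj.findtext("name")
    pose = obj.findtext("pose", default="Unspecified")
    truncated = obj.findtext("truncated", default="0") == "1"
    occluded = obj.findtext("occluded", default="0") == "1"
    difficult = obj.findtext("difficult", default="0") == "1"
    bb = obj.find("bndbox")
    box = tuple(float(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
    print(name, pose, truncated, occluded, difficult, box)
```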
Examples
[Example images for each of the 20 classes: Aeroplane, Bicycle, Bird, Boat, Bottle, Bus, Car, Cat, Chair, Cow, Dining Table, Dog, Horse, Motorbike, Person, Potted Plant, Sheep, Sofa, Train, TV/Monitor]
Dataset Statistics – VOC 2010

          Training   Testing
Images    10,103     9,637
Objects   23,374     22,992

• Minimum ~500 training objects per category
– ~1,700 cars, 1,500 dogs, 7,000 people
• Approximately equal distribution across training and test sets
• Data augmented each year
Design choices: Things we got right …
1. Standard method of assessment
• Train/validation/test splits given
• Standard evaluation protocol – AP per class (see the sketch below)
• Software supplied
– Includes baseline classifier/detector/segmenter
– Runs from training to generating a PR curve and AP on validation or test data, out of the box
• Has meant that results on VOC can be consistently compared in publications
• “Best Practice” webpage on how the training/validation/test data should be used for the challenge
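For reference, a minimal sketch of a per-class AP computation in the 11-point interpolated style used by the early VOC evaluations (the supplied devkit software is the authoritative implementation):

```python
import numpy as np

def voc_ap_11pt(recall, precision):
    """11-point interpolated average precision over a ranked PR curve.

    recall/precision: 1-D arrays obtained by sweeping down the ranked
    list of one class's outputs, as when scoring a single category.
    """
    ap = 0.0
    for t in np.arange(0.0, 1.01, 0.1):
        mask = recall >= t
        # Best precision achieved at recall >= t (0 if that recall is never reached)
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap
```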
Design choices: Things we got right …
2. Evaluation of test data
Three possibilities
1. Release test data and annotation (most liberal); participants assess their own performance
– Con: open to abuse
2. Release test data but withhold the test annotation; participants submit results and the organizers assess performance (via an evaluation server)
3. No release of test data; participants submit software, which the organizers run and assess
Design choices: Things we got right …
3. Augmentation of dataset each year
• VOC2010: around a 40% increase in size over VOC2009
• Has prevented overfitting on the data
• 2008/9 datasets retained as a subset of 2010
– Assignments to training/test sets maintained
– So progress from 2008 to 2010 can be measured
          Training          Testing
Images    10,103 (7,054)    9,637 (6,650)
Objects   23,374 (17,218)   22,992 (16,829)

VOC2009 counts shown in brackets
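As a quick check of the ~40% figure: 10,103 / 7,054 ≈ 1.43 and 9,637 / 6,650 ≈ 1.45 for images; 23,374 / 17,218 ≈ 1.36 and 22,992 / 16,829 ≈ 1.37 for objects.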
Detection Challenge: Progress 2008-2010
• Results on the 2008 data improve for the best 2009 and 2010 methods in all categories, by over 100% for some categories
– Caveat: Better methods or more training data?
[Chart: max AP (%) on the 2008 test data for the best 2008, 2009 and 2010 methods, per class across all 20 categories; y-axis 0–60%]
Design choices: Things we got right …
4. Equivalence bands
• Uses the Friedman/Nemenyi ANOVA significance test
• Makes explicit that many methods are just slight perturbations of each other
• Need more research in this area on obtaining equivalence classes for ranking (rather than classification)
• Plan to include banner/header results on the evaluation server (to aid comparisons for reviewing etc.)
Statistical Significance
• Friedman/Nemenyi analysis
– Compare differences in mean rank of methods over classes using a non-parametric version of ANOVA
– Mean ranks must differ by at least 5.4 to be considered significant (p = 0.05)
[Chart: mean rank per submitted classification method, from NUSPSL_KERNELREGFUSING (best) down to HIT_PROTOLEARN_2 and (chance)]
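A minimal sketch of this analysis (ap_scores is a made-up AP matrix, not VOC results; the critical-difference formula follows Demšar's 2006 treatment of the Nemenyi post-hoc test, and the actual threshold depends on the number of methods and classes):

```python
import numpy as np
from scipy import stats

# Hypothetical AP matrix: one row per class, one column per method.
rng = np.random.default_rng(0)
ap_scores = rng.random((20, 33))  # 20 VOC classes, 33 submitted methods

# Friedman test: do the methods' per-class ranks differ at all?
statistic, p_value = stats.friedmanchisquare(*ap_scores.T)

# Mean rank of each method over classes (rank 1 = best AP on that class).
ranks = stats.rankdata(-ap_scores, axis=1)
mean_ranks = ranks.mean(axis=0)

# Nemenyi critical difference: two methods fall in different equivalence
# bands only if their mean ranks differ by more than
# CD = q_alpha * sqrt(k * (k + 1) / (6 * n)),
# where k is the number of methods, n the number of classes, and q_alpha
# the studentized-range quantile for k methods at significance alpha.
def nemenyi_cd(q_alpha, k, n):
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))
```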
Action Classification Taster Challenge – 2010 on
• Given the bounding box of a person, predict whether they are performing a given action
• Developed with Ivan Laptev
• Strong participation from the start (11 methods, 8 groups)
[Example images: Playing Instrument? Reading?]
The future …
Ten Action Classes
Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking, Jumping
• 2011: added Jumping + `other' class
Person Layout Taster Challenge – 2009 on
• Given the bounding box of a person, predict the positions of head, hands and feet
• Aim: to encourage research on more detailed image interpretation
[Example images with head, hand and foot regions marked]
• 2009: no participation
• 2010: relaxed success criteria, 2 participants
• Assessment method still not satisfactory?
Futures
2012
• Last official year of PASCAL2
• May introduce a new taster competition, e.g.
– Materials
– Actions in video
• Open to suggestions