Learning Spatial Context: Using stuff to find things
Geremy Heitz, Daphne Koller
Stanford University
October 13, 2008, ECCV 2008


TRANSCRIPT

Page 1: Learning Spatial Context: Using  stuff  to find  things

Learning Spatial Context:

Using stuff to find things

Geremy Heitz, Daphne Koller

Stanford University

October 13, 2008

ECCV 2008

Page 2: Learning Spatial Context: Using  stuff  to find  things

Things vs. Stuff

Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial extent or shape.

Thing (n): An object with a specific size and shape.

From: Forsyth et al. Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996.

Page 3: Learning Spatial Context: Using  stuff  to find  things

Finding Things

Context is key!

Page 4: Learning Spatial Context: Using  stuff  to find  things

Outline

What is Context?

The Things and Stuff (TAS) model

Results

Page 5: Learning Spatial Context: Using  stuff  to find  things

Satellite Detection Example

Two candidate windows, each with detector score D(W) = 0.8

Page 6: Learning Spatial Context: Using  stuff  to find  things

Error Analysis

Typically…

False positives are OUT OF CONTEXT

True positives are IN CONTEXT

We need to look outside the bounding box!

Page 7: Learning Spatial Context: Using  stuff  to find  things

Types of Context

Scene-Thing: scene gist makes a car “likely”, a keyboard “unlikely” [ Torralba et al., LNCS 2005 ]

Stuff-Stuff: [ Gould et al., IJCV 2008 ]

Thing-Thing: [ Rabinovich et al., ICCV 2007 ]

Page 8: Learning Spatial Context: Using  stuff  to find  things

Types of Context

Stuff-Thing: based on spatial relationships

Intuition:
Trees = no cars
Houses = cars nearby
Road = cars here

“Cars drive on roads”
“Cows graze on grass”
“Boats sail on water”

Goal: Unsupervised

Page 9: Learning Spatial Context: Using  stuff  to find  things

Outline

What is Context?

The Things and Stuff (TAS) model

Results

Page 10: Learning Spatial Context: Using  stuff  to find  things

Things: Detection “candidates”

Low detector threshold -> “over-detect”

Each candidate has a detector score

Page 11: Learning Spatial Context: Using  stuff  to find  things

Things

Candidate detections: image window Wi + detector score

Boolean R.V. Ti; Ti = 1: candidate is a positive detection

Thing model (Wi → Ti):

$P(T_i = 1 \mid W_i) = \frac{1}{1 + \exp(-D(W_i))}$
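Written out in code, the thing model is just a logistic squashing of the window detector score. A minimal sketch (not the authors' code), using the example scores D = 1.5 and D = -0.3 from the sliding-window slide later in the deck:

```python
import math

def thing_probability(detector_score: float) -> float:
    """P(T_i = 1 | W_i): logistic transform of the window detector score D(W_i)."""
    return 1.0 / (1.0 + math.exp(-detector_score))

# Example: a strong candidate vs. a weak one
print(thing_probability(1.5))   # ~0.82
print(thing_probability(-0.3))  # ~0.43
```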

Page 12: Learning Spatial Context: Using  stuff  to find  things

Stuff

Coherent image regions: coarse “superpixels”

Feature vector Fj in R^n; cluster label Sj in {1…C}

Stuff model: Naïve Bayes (Sj → Fj)

$P(S_j, F_j) = P(S_j)\, P(F_j \mid S_j)$

$F_j \mid S_j = s \sim \mathcal{N}(\mu_s, \Sigma_s)$
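A minimal sketch of evaluating this stuff model with SciPy, assuming the cluster priors and per-cluster Gaussian parameters have already been estimated; the names priors, means, and covs are placeholders, not from the paper:

```python
import numpy as np
from scipy.stats import multivariate_normal

def region_posterior(f_j, priors, means, covs):
    """P(S_j = s | F_j = f_j) for each cluster s, via Bayes' rule with
    P(F_j | S_j = s) = N(f_j; mu_s, Sigma_s)."""
    likelihoods = np.array([
        multivariate_normal.pdf(f_j, mean=means[s], cov=covs[s])
        for s in range(len(priors))
    ])
    joint = priors * likelihoods          # P(S_j = s, F_j)
    return joint / joint.sum()            # normalize over clusters

# Toy usage with C = 2 clusters and 3-dimensional region features
priors = np.array([0.6, 0.4])
means = [np.zeros(3), np.ones(3)]
covs = [np.eye(3), 2 * np.eye(3)]
print(region_posterior(np.array([0.2, 0.1, -0.3]), priors, means, covs))
```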

Page 13: Learning Spatial Context: Using  stuff  to find  things

Relationships

Descriptive relations: “Near”, “Above”, “In front of”, etc.

Choose a set R = {r1 … rK}

Rijk = 1: detection i and region j have relation k

Relationship model

(Figure annotations: S72 = Trees, S4 = Houses, S10 = Road; R1,10,in = 1, i.e. candidate T1 has the “in” relation with region 10, the road.)
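To make the Rijk concrete, here is a small illustrative sketch (hypothetical relation definitions in the spirit of the candidate set listed on Page 18, not the paper's exact implementation); distances are measured in candidate bounding-box widths, and the region is given as a set of integer (x, y) pixel coordinates:

```python
def box_center(box):
    """box = (xmin, ymin, xmax, ymax); returns its center point."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def relations(candidate_box, region_pixels):
    """Return a dict of binary relations R_ijk between one candidate window
    and one region (a set of (x, y) pixel coordinates)."""
    cx, cy = box_center(candidate_box)
    width = candidate_box[2] - candidate_box[0]
    # distance from the window center to the closest region pixel, in box widths
    d = min(((px - cx) ** 2 + (py - cy) ** 2) ** 0.5
            for px, py in region_pixels) / width
    return {
        "in": (round(cx), round(cy)) in region_pixels,  # window center lies in the region
        "closer_than_2BB": d < 2.0,
        "closer_than_4BB": d < 4.0,
        "farther_than_8BB": d > 8.0,
    }
```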

Page 14: Learning Spatial Context: Using  stuff  to find  things

The TAS Model

(Plate model: Wi → Ti; Sj → Fj; Ti, Sj → Rijk; plate N over candidate windows, J over regions, K over relations.)

Wi: Window (always observed)
Ti: Object Presence (supervised in training set)
Sj: Region Label (always hidden)
Fj: Region Features (always observed)
Rijk: Relationship (always observed)
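Reading the plate model off as a joint distribution (a reconstruction consistent with the per-component models on the surrounding slides; see the ECCV 2008 paper for the exact form):

$P(T, S, F, R \mid W) = \prod_i P(T_i \mid W_i)\; \prod_j P(S_j)\, P(F_j \mid S_j)\; \prod_{i,j,k} P(R_{ijk} \mid T_i, S_j)$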

Page 15: Learning Spatial Context: Using  stuff  to find  things

Unrolled Model

Candidate windows: T1, T2, T3

Image regions: S1, S2, S3, S4, S5

Example relationship variables: R1,1,left = 1, R2,1,above = 0, R3,1,left = 1, R1,3,near = 0, R3,3,in = 1

Page 16: Learning Spatial Context: Using  stuff  to find  things

Learning the Parameters

Assume we know R

Sj is hidden; everything else observed

Expectation-Maximization: “contextual clustering”

Parameters are readily interpretable

(TAS plate model figure repeated from Page 14.)
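A schematic of the EM loop, shown here only for the stuff side of the model (a Gaussian mixture over region features); the full TAS E-step would additionally condition on T and R through the relationship factors. All names are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_region_clustering(F, C, iters=20, seed=0):
    """EM for a C-component Gaussian mixture over region features F (J x n).
    Returns cluster priors, means, covariances, and responsibilities q(S_j)."""
    rng = np.random.default_rng(seed)
    J, n = F.shape
    priors = np.full(C, 1.0 / C)
    means = F[rng.choice(J, size=C, replace=False)]
    covs = np.array([np.cov(F.T) + 1e-6 * np.eye(n) for _ in range(C)])

    for _ in range(iters):
        # E-step: responsibilities q(S_j = s) proportional to P(S_j = s) P(F_j | S_j = s)
        q = np.stack([priors[s] * multivariate_normal.pdf(F, means[s], covs[s])
                      for s in range(C)], axis=1)
        q /= q.sum(axis=1, keepdims=True)

        # M-step: re-estimate priors, means, covariances from the soft counts
        Ns = q.sum(axis=0)
        priors = Ns / J
        means = (q.T @ F) / Ns[:, None]
        for s in range(C):
            diff = F - means[s]
            covs[s] = (q[:, s, None] * diff).T @ diff / Ns[s] + 1e-6 * np.eye(n)
    return priors, means, covs, q
```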

Page 17: Learning Spatial Context: Using  stuff  to find  things

Learned Satellite Clusters

Page 18: Learning Spatial Context: Using  stuff  to find  things

Which Relationships to Use?

Rijk = spatial relationship between candidate i and region j

Rij1 = candidate in region
Rij2 = candidate closer than 2 bounding boxes (BBs) to region
Rij3 = candidate closer than 4 BBs to region
Rij4 = candidate farther than 8 BBs from region
Rij5 = candidate 2 BBs left of region
Rij6 = candidate 2 BBs right of region
Rij7 = candidate 2 BBs below region
Rij8 = candidate more than 2 and less than 4 BBs from region
…
RijK = candidate near region boundary

How do we avoid overfitting?

Page 19: Learning Spatial Context: Using  stuff  to find  things

Learning the Relationships

Intuition: a “detached” Rijk = inactive relationship

Structural EM iterates:
Learn parameters
Decide which edge to toggle

Evaluate with l(T | F, W, R)
Requires inference
Better results than using the standard E[l(T, S, F, W, R)]

(Figure: the model fragment with candidate relationship edges Rij1, Rij2, …, RijK between Ti and Sj.)
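A sketch of that greedy structure search (illustrative only; fit_parameters and conditional_log_lik are placeholder callbacks standing in for parameter EM and for the inference-based evaluation of l(T | F, W, R)):

```python
def structural_em(data, K, fit_parameters, conditional_log_lik, iters=10):
    """Greedy structural EM over which relations are 'active'.
    At each pass, toggle relations whose flip improves the
    conditional log-likelihood l(T | F, W, R)."""
    active = [False] * K                      # start with all relations detached
    params = fit_parameters(data, active)     # EM for parameters given this structure
    best = conditional_log_lik(data, params, active)

    for _ in range(iters):
        improved = False
        for k in range(K):
            trial = list(active)
            trial[k] = not trial[k]           # toggle one candidate edge
            trial_params = fit_parameters(data, trial)
            score = conditional_log_lik(data, trial_params, trial)  # needs inference
            if score > best:
                best, active, params, improved = score, trial, trial_params, True
        if not improved:
            break
    return active, params
```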

Page 20: Learning Spatial Context: Using  stuff  to find  things

Inference

Goal: compute the posterior over detections, P(T | F, W, R)

Block Gibbs Sampling: easy to sample the Ti's given the Sj's, and vice versa
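A stripped-down sketch of such a block Gibbs sampler; sample_T_given_S and sample_S_given_T are placeholders standing in for the model's conditional distributions:

```python
import numpy as np

def block_gibbs(init_T, init_S, sample_T_given_S, sample_S_given_T,
                n_samples=500, burn_in=100, seed=0):
    """Alternately resample all T_i given the current S, then all S_j given
    the current T; average the kept T samples to estimate P(T_i = 1 | F, W, R)."""
    rng = np.random.default_rng(seed)
    T, S = np.array(init_T), np.array(init_S)
    counts = np.zeros_like(T, dtype=float)
    kept = 0
    for it in range(n_samples):
        T = sample_T_given_S(S, rng)   # block update: all candidates at once
        S = sample_S_given_T(T, rng)   # block update: all regions at once
        if it >= burn_in:
            counts += T
            kept += 1
    return counts / kept               # Monte Carlo estimate of the posterior
```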

Page 21: Learning Spatial Context: Using  stuff  to find  things

Outline

What is Context?

The Things and Stuff (TAS) model

Results

Page 22: Learning Spatial Context: Using  stuff  to find  things

Base Detector - HOG

[ Dalal & Triggs, CVPR, 2006 ] HOG Detector:

Feature Vector X SVM Classifier
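A minimal sketch of a HOG + linear SVM base detector of this kind, using scikit-image and scikit-learn; the training-window variables are placeholders:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(window):
    """window: grayscale patch already cropped to the canonical detection size."""
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def train_base_detector(pos_windows, neg_windows):
    """pos_windows / neg_windows: lists of training patches (placeholders)."""
    X = np.array([hog_features(w) for w in pos_windows + neg_windows])
    y = np.array([1] * len(pos_windows) + [0] * len(neg_windows))
    clf = LinearSVC(C=0.01).fit(X, y)
    # D(W): the SVM margin serves as the window score fed into the TAS model
    return lambda window: float(clf.decision_function([hog_features(window)])[0])
```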

Page 23: Learning Spatial Context: Using  stuff  to find  things

Results - Satellite

Prior: Detector Only

Posterior: Detections

Posterior: Region Labels

Page 24: Learning Spatial Context: Using  stuff  to find  things

Results - Satellite

(Plot: recall rate vs. false positives per image, Base Detector vs. TAS Model.)

~10% improvement in recall at 40 fppi

Page 25: Learning Spatial Context: Using  stuff  to find  things

PASCAL VOC Challenge

2005 Challenge: 2232 images split into {train, val, test}; Cars, Bikes, People, and Motorbikes

2006 Challenge: 5304 images split into {train, test}; 12 classes, we use Cows and Sheep

Page 26: Learning Spatial Context: Using  stuff  to find  things

Base Detector Error Analysis

Cows

Page 27: Learning Spatial Context: Using  stuff  to find  things

Discovered Context - Bicycles

Bicycles

Cluster #3

Page 28: Learning Spatial Context: Using  stuff  to find  things

TAS Results – Bicycles

Examples

Discover “true positives”

Remove “false positives”


Page 29: Learning Spatial Context: Using  stuff  to find  things

Results – VOC 2005

Page 30: Learning Spatial Context: Using  stuff  to find  things

Results – VOC 2006

Page 31: Learning Spatial Context: Using  stuff  to find  things

Conclusions

Detectors can benefit from context

The TAS model captures an important type of context

We can improve any sliding window detector using TAS

The TAS model can be interpreted and matches our intuitions

We can learn which relationships to use

Page 32: Learning Spatial Context: Using  stuff  to find  things

Thank you!

Page 33: Learning Spatial Context: Using  stuff  to find  things

Object Detection

Task: Find the things

Example: Find all the cars in this image; return a “bounding box” for each

Evaluation:
Maximize true positives
Minimize false positives

Page 34: Learning Spatial Context: Using  stuff  to find  things

Sliding Window Detection

Consider every bounding box:
All shifts
All scales
Possibly all rotations

Each such window gets a score: D(W)

Detections: local peaks in D(W)

Pros:
Covers the entire image
Flexible to allow a variety of D(W)'s

Cons:
Brute force – can be slow
Only considers features in the box

(Figure: example window scores D = 1.5 and D = -0.3.)
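A bare-bones sliding-window loop to make the procedure concrete (illustrative; score is any window scorer D(W), e.g. the HOG + SVM detector sketched earlier):

```python
import numpy as np
from skimage.transform import resize

def sliding_window_detect(image, score, win=(64, 64), stride=8,
                          scales=(1.0, 0.75, 0.5), threshold=0.0):
    """Scan all shifts and scales, score each window with D(W), and keep windows
    above threshold; local-peak selection (non-max suppression) would follow."""
    H, W = image.shape[:2]
    detections = []
    for s in scales:
        h, w = int(win[0] / s), int(win[1] / s)           # larger windows at smaller s
        for y in range(0, H - h + 1, stride):
            for x in range(0, W - w + 1, stride):
                crop = resize(image[y:y + h, x:x + w], win)  # rescale to canonical size
                d = score(crop)
                if d > threshold:
                    detections.append(((x, y, x + w, y + h), d))
    return detections
```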

Page 35: Learning Spatial Context: Using  stuff  to find  things

Sliding Window Results

PASCAL Visual Object Classes Challenge, Cows 2006

A window W is a detection if D(W) > T

Overlap between a detected box A and a ground-truth box B: score(A,B) = |A∩B| / |A∪B|

score(A,B) > 0.5: TRUE POSITIVE
score(A,B) ≤ 0.5: FALSE POSITIVE

Recall(T) = TP / (TP + FN)
Precision(T) = TP / (TP + FP)
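A small sketch of this evaluation criterion (intersection-over-union with the 0.5 threshold); boxes are (xmin, ymin, xmax, ymax) tuples:

```python
def overlap_score(a, b):
    """score(A, B) = |A intersect B| / |A union B| for axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(detection, ground_truth):
    return overlap_score(detection, ground_truth) > 0.5

# Recall(T) and Precision(T) are then computed by sweeping the detector
# threshold T over the scored detections.
print(overlap_score((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```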

Page 36: Learning Spatial Context: Using  stuff  to find  things

Quantitative Evaluation

(Plot: recall rate vs. false positives per image.)

Page 37: Learning Spatial Context: Using  stuff  to find  things
Page 38: Learning Spatial Context: Using  stuff  to find  things

(Plot: recall rate vs. false positives per image, Base Detector vs. TAS Model.)

Prior: Detector Only
Posterior: TAS Model
Region Labels

Detections in Context

Task: Identify all cars in the satellite image

Idea: The surrounding context adds info to the local window detector

(Figure: detector output + region labels (Houses, Road) = detections in context.)

Page 39: Learning Spatial Context: Using  stuff  to find  things

Equations

$P(T_i = 1 \mid W_i) = \frac{1}{1 + \exp(-D(W_i))}$

$P(S_j, F_j) = P(S_j)\, P(F_j \mid S_j)$

$F_j \mid S_j = s \sim \mathcal{N}(\mu_s, \Sigma_s)$

Page 40: Learning Spatial Context: Using  stuff  to find  things

Features: Haar wavelets

Haar filters and integral image [ Viola and Jones, ICCV 2001 ]

The average intensity in a block is computed from four integral-image values, independently of the block size.

BOOSTING!
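A small sketch of the integral-image trick (a summed-area table, so any block sum takes four lookups regardless of block size):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended for easy indexing."""
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def block_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from exactly four integral-image values."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
assert block_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()  # 5 + 6 + 9 + 10 = 30
```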

Page 41: Learning Spatial Context: Using  stuff  to find  things

Features: Edge fragments

Weak detector = match of edge chain(s) from a training image to the edge map of a test image

Opelt, Pinz, Zisserman, ECCV 2006

BOOSTING!

Page 42: Learning Spatial Context: Using  stuff  to find  things

Histograms of oriented gradients

• Dalal & Trigs, 2006

• SIFT, D. Lowe, ICCV 1999

SVM!