Integrating Vision Models for Holistic Scene Understanding, Geremy Heitz, CS223B, March 4th, 2009


1

Integrating Vision Models for Holistic Scene Understanding

Geremy Heitz

CS223B, March 4th, 2009


2

Scene/Image Understanding

What’s happening in these pictures?


3

Human View of a “Scene”

“A car passes a bus on the road, while people walk past a building.”

ROAD

BUILDING

CAR

BUS

PEOPLE WALKING


4

Computer View of a “Scene”

BUILDING

ROAD

STREET

SCENE

Can we integrate all of these subtasks, so that whole > sum of parts?


5

Outline

Overview
Integrating Vision Models
    CCM: Cascaded Classification Models [Heitz et al. NIPS 2008a]
Learning Spatial Context
    TAS: Things and Stuff [Heitz & Koller ECCV 2008]
Future Directions


6

Image/Scene Understanding

“a man and a dog are walking on a sidewalk in front of a building”

Man

Dog

Backpack

Cigarette

Primitives: Objects, Parts, Surfaces, Regions

Interactions: Context, Actions, Scene Descriptions

Established techniques address these in isolation.

Reasoning over image statistics

Complex web of relations well represented by graphical models.

Reasoning over more abstract entities.

Building

Sidewalk


7

Why will integration help?

What is this object?


8

More Context

Context is key!


9

Outline

Overview
Integrating Vision Models
    CCM: Cascaded Classification Models [Heitz et al. NIPS 2008a]
Learning Spatial Context
    TAS: Things and Stuff
Future Directions


10

Human View of a “Scene”

ROAD

BUILDING

CAR

BUS

PEOPLE WALKING

Scene Categorization
Object Detection
Region Labelling
Depth Reconstruction
Surface Orientations
Boundary/Edge Detection
Outlining/Refined Localization
Occlusion Reasoning
...


11

Related Work

Intrinsic Images [Barrow and Tenenbaum, 1978], [Tappen et al., 2005]

Hoiem et al., “Closing the Loop in Scene Interpretation”, 2008

We want to focus more on “semantic” classes
We want to be flexible enough to use outside models
We want an extendable framework, not one engineered for a particular set of tasks


12

How Should We Integrate?

Single joint model over all variables
    Pros: Tighter interactions, more designer control
    Cons: Need expertise in each of the subtasks

Simple, flexible combination of existing models
    Pros: State-of-the-art models, easier to extend
    Limited “black-box” interface to components
    Cons: Missing some of the modeling power

DETECTION: Dalal & Triggs, 2006

REGION LABELING: Gould et al., 2007

DEPTH RECONSTRUCTION: Saxena et al., 2007


13

Cascaded Classification Models

[Diagram: Image → Features fDET, fREG, fREC → Independent Models DET0 (Object Detection), REG0 (Region Labeling), REC0 (3D Reconstruction) → Context-aware Models DET1, REG1, REC1]
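The cascade in this diagram can be sketched as a two-level forward pass. This is only an illustration of the data flow, not the authors' implementation: `run_cascade`, the scalar "features", and the toy lambda models are all hypothetical stand-ins for real detectors, region labelers, and depth estimators.

```python
from typing import Callable, Dict

# A "model" maps (its own image features, context from other tasks) -> output.
Model = Callable[[float, Dict[str, float]], float]


def run_cascade(features: Dict[str, float],
                level0: Dict[str, Model],
                level1: Dict[str, Model]) -> Dict[str, float]:
    """Two-level cascade: independent models first, then context-aware models."""
    # Level 0: each task sees only its own features (empty context).
    out0 = {task: model(features[task], {}) for task, model in level0.items()}
    # Level 1: each task sees its own features plus the level-0 outputs
    # of all *other* tasks, which are treated as black boxes.
    out1 = {}
    for task, model in level1.items():
        context = {t: y for t, y in out0.items() if t != task}
        out1[task] = model(features[task], context)
    return out1


# Hypothetical toy models; scalars stand in for real outputs.
level0 = {
    "DET": lambda f, ctx: f,
    "REG": lambda f, ctx: f,
}
level1 = {
    "DET": lambda f, ctx: f + 0.5 * ctx["REG"],
    "REG": lambda f, ctx: f + 0.5 * ctx["DET"],
}

result = run_cascade({"DET": 0.2, "REG": 0.8}, level0, level1)
```

The only interface between tasks is the dictionary of level-0 outputs, which is what lets each component stay a black box.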


14

Integrated Model for Scene Understanding

Object Detection
Multi-class Segmentation
Depth Reconstruction
Scene Categorization

I’ll show you these


15

Basic Object Detection

[Figure legend: Car, Person, Motorcycle, Boat, Sheep, Cow]

Detection Window W

Score(W) > 0.5


16

Base Detector - HOG

[Dalal & Triggs, CVPR, 2006]

HOG Detector: Feature Vector X → SVM Classifier
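As a rough illustration of the "feature vector, then linear classifier" pipeline, the sketch below builds a single magnitude-weighted orientation histogram and scores it with a linear function. It is a heavily simplified stand-in, not the HOG descriptor itself (no cells, blocks, or block normalization). The function names and the weights are assumptions made for the example.

```python
import math
from typing import List


def orientation_histogram(patch: List[List[float]], bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees)."""
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # centered differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # L2 normalization
    return [v / norm for v in hist]


def linear_score(x: List[float], weights: List[float], bias: float) -> float:
    """Score of a linear classifier (e.g. a trained linear SVM) on feature vector x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# A 4x4 patch whose intensity increases left to right (horizontal gradient only).
patch = [[0.0, 1.0, 2.0, 3.0] for _ in range(4)]
x = orientation_histogram(patch)
# Hypothetical weights that respond only to the first orientation bin.
score = linear_score(x, [1.0] + [0.0] * 8, bias=-0.5)
```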


17

Context-Aware Object Detection

From Base Detector: Log Score D(W)
From Scene Category: MAP category, marginals
From Region Labels: How much of each label is in a window adjacent to W
From Depths: Mean, variance of depths, estimate of “true” object size

Final Classifier: P(Y) = Logistic(Φ(W))

Scene Type: Urban scene

% of “road” below W

Variance of depths in W
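A minimal sketch of the final classifier P(Y) = Logistic(Φ(W)), assuming Φ(W) is a small vector built from the cues listed on this slide. The particular feature choice, the helper names, and the weights are hypothetical. In practice the weights would be learned from training data.

```python
import math
from typing import List


def logistic(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def context_features(det_log_score: float, is_urban: bool,
                     frac_road_below: float, depth_variance: float) -> List[float]:
    """Assemble an illustrative Phi(W): bias, detector score, and context cues."""
    return [1.0, det_log_score, float(is_urban), frac_road_below, depth_variance]


def rescored_probability(phi: List[float], weights: List[float]) -> float:
    """P(Y = 1) for a candidate window under a logistic model on Phi(W)."""
    return logistic(sum(w * x for w, x in zip(weights, phi)))


phi = context_features(det_log_score=-0.2, is_urban=True,
                       frac_road_below=0.7, depth_variance=1.5)
weights = [0.0, 1.0, 0.5, 2.0, -0.1]  # hypothetical, not learned values
p = rescored_probability(phi, weights)
```

With these made-up weights, a weak detector score is pushed up because the window sits above a large fraction of "road" pixels in an urban scene.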


18

Multi-class Segmentation CRF Model

Label each pixel as one of: {‘grass’, ‘road’, ‘sky’, etc.}

Conditional Markov random field (CRF) over superpixels:

Singleton potentials: log-linear function of boosted detector scores for each class

Pairwise potentials: affinity of classes appearing together, conditioned on (x,y) location within the image

[Gould et al., IJCV 2007]
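For reference, a generic pairwise CRF over superpixel labels has the form below. This is the textbook form, given here as an assumption about the overall structure rather than the exact model of the cited paper. Here $y_i$ is the label of superpixel $i$, $\mathbf{x}$ the image features, $\phi_i$ the singleton potentials, $\psi_{ij}$ the pairwise potentials, and $\mathcal{E}$ the set of neighboring superpixel pairs.

```latex
P(\mathbf{y} \mid \mathbf{x}) \;\propto\;
\exp\Big( \sum_{i} \phi_i(y_i, \mathbf{x})
\;+\; \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(y_i, y_j, \mathbf{x}) \Big)
```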


19

Context-Aware Multi-class Seg.

Additional Feature: Relative Location Map

Where is the grass?


20

Depth Reconstruction CRF

[Saxena et al., PAMI 2008]

Label each pixel with its distance from the camera

Conditional Markov random field (CRF) over superpixels

Continuous variables

Models depth as linear function of features with pairwise smoothness constraints

http://make3d.stanford.edu


21

Depth Reconstruction with Context

BLACK BOX

GRASS

SKY

Grass is horizontal

Sky is far away

Find d*

Reoptimize depths with new constraints:

$d_{\mathrm{CCM}} = \arg\min_{d} \; \alpha \lVert d - d^{*} \rVert + \beta \lVert d - d_{\mathrm{CONTEXT}} \rVert$
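The slide does not say which norm is used. If both terms are taken to be squared Euclidean norms (an assumption), the objective decouples per pixel and has a closed-form minimizer: a weighted average of the black-box depth d* and the context-implied depth. The function name below is hypothetical.

```python
from typing import List


def reoptimize_depths(d_star: List[float], d_context: List[float],
                      alpha: float, beta: float) -> List[float]:
    """Minimize alpha*||d - d_star||^2 + beta*||d - d_context||^2 over d.

    Setting the gradient to zero gives, for each element,
    d = (alpha * d_star + beta * d_context) / (alpha + beta).
    """
    if alpha + beta <= 0:
        raise ValueError("alpha + beta must be positive")
    return [(alpha * a + beta * b) / (alpha + beta)
            for a, b in zip(d_star, d_context)]


# Two pixels: the context disagrees with the black box on the first one only.
d_ccm = reoptimize_depths([10.0, 20.0], [20.0, 20.0], alpha=1.0, beta=1.0)
```

Larger α keeps the result close to the original black-box estimate, while larger β trusts the context constraints more.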


22

Training

I: Image
f: Image Features
Ŷ: Output labels

Training Regimes

Independent: $\arg\max P(Y^{0} \mid f)$

[Diagram, Independent: I → fD, fS, fZ → ŶD0, ŶS0, ŶZ0]

Ground (Groundtruth Input): $\arg\max P(Y^{1} \mid Y^{*}_{\mathrm{other}}, f)$

[Diagram, Ground: I → fD, fS, fZ; nodes ŶD1, ŶS*, ŶS1, ŶZ*, ŶZ1]


23

Training

CCM Training Regime
    Later models can ignore the mistakes of previous models
    Training realistically emulates testing setup
    Allows disjoint datasets

K-CCM: A CCM with K levels of classifiers

CCM: $\arg\max P(Y^{1} \mid \hat{Y}^{0}_{\mathrm{other}}, f)$

[Diagram: I → fD, fS, fZ → ŶD0, ŶS0, ŶZ0 → ŶD1, ŶS1, ŶZ1]


24

Experiments

DS1
    422 Images, fully labeled
    Categorization, Detection, Multi-class Segmentation
    5-fold cross validation

DS2
    1745 Images, disjoint labels
    Detection, Multi-class Segmentation, 3D Reconstruction
    997 Train, 748 Test


25

CCM Results – DS1

[Plots: CAR, PEDESTRIAN, MOTORBIKE, BOAT, CATEGORIES, REGION LABELS]


26

CCM Results – DS2

Detection (last column: Depth)

        Car    Person  Bike   Boat   Sheep  Cow    Depth
INDEP   0.357  0.267   0.410  0.096  0.319  0.395  16.7m
2-CCM   0.364  0.272   0.410  0.212  0.289  0.415  15.4m

Regions

        Tree   Road   Grass  Water  Sky    Building  FG
INDEP   0.541  0.702  0.859  0.444  0.924  0.436     0.828
2-CCM   0.581  0.692  0.860  0.565  0.930  0.489     0.819

Boats


27

Example Results

[Figure rows: INDEPENDENT, CCM]


28

Example Results

[Figure panels, row 1: Independent Objects, Independent Regions, CCM Objects]

[Figure panels, row 2: Independent Objects, Independent Regions, CCM Regions]


29

Understanding the man

“a man, a dog, a sidewalk, a building”


30

Outline

Overview
Integrating Vision Models
    CCM: Cascaded Classification Models
Learning Spatial Context
    TAS: Things and Stuff [Heitz & Koller ECCV 2008]
Future Directions


31

Things vs. Stuff

Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial extent or shape.

(REGIONS)

Thing (n): An object with a specific size and shape.

(DETECTIONS)

From: Forsyth et al. Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996.


32

Cascaded Classification Models

[Diagram: Image → Features fDET, fREG, fREC → Independent Models DET0 (Object Detection), REG0 (Region Labeling), REC0 (3D Reconstruction) → Context-aware Models DET1, REG1, REC1]


33

CCMs vs. TAS

CCM: Feedforward

[Diagram: Image → fDET, fREG → DET0, REG0 → DET1, REG1]

TAS: Modeled Jointly

[Diagram: Image → fDET, fREG → DET, REG, connected through Relationships]


34

Satellite Detection Example

FALSE POSITIVE

TRUE POSITIVE


35

Stuff-Thing Context

Stuff-Thing: Based on spatial relationships

Intuition:

Trees = no cars

Houses = cars nearby

Road = cars here

“Cars drive on roads”

“Cows graze on grass”

“Boats sail on water”

Goal: Unsupervised


36

Things

Detection Ti ∈ {0,1}

Ti = 1: Candidate window contains a positive detection

[Diagram: Image Window Wi → Ti]

P(Ti) = Logistic(score(Wi))


37

Stuff

Coherent image regions
    Coarse “superpixels”
    Feature vector Fj in R^n
    Cluster label Sj in {1…C}

Stuff model: Naïve Bayes

[Diagram: Sj → Fj]
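A naive Bayes stuff model gives a posterior over the cluster label Sj once class-conditional feature distributions are chosen. The sketch below assumes independent Gaussian features per cluster, which the slide does not specify. The function names and all parameter values are illustrative.

```python
import math
from typing import List


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def cluster_posterior(f: List[float], priors: List[float],
                      means: List[List[float]],
                      variances: List[List[float]]) -> List[float]:
    """P(S = c | F = f) under naive Bayes with per-dimension Gaussian features."""
    log_joint = []
    for prior, mu, var in zip(priors, means, variances):
        log_lik = sum(gaussian_logpdf(x, m, v) for x, m, v in zip(f, mu, var))
        log_joint.append(math.log(prior) + log_lik)
    top = max(log_joint)  # subtract the max for numerical stability
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two hypothetical clusters in a 2-D feature space.
priors = [0.5, 0.5]
means = [[0.0, 0.0], [2.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
post_mid = cluster_posterior([1.0, 1.0], priors, means, variances)
post_near0 = cluster_posterior([0.0, 0.0], priors, means, variances)
```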


38

Relationships

Descriptive Relations: “Near”, “Above”, “In front of”, etc.

Choose set R = {r1…rK}

Rijk = 1: Detection i and region j have relation k

Relationship model

[Diagram: regions S72 = Trees, S4 = Houses, S10 = Road; detection T1; example R1,10,in = 1; graph fragment with nodes Ti, Sj, Rijk]


39

Unrolled Model

[Diagram: Candidate Windows T1, T2, T3; Image Regions S1, S2, S3, S4, S5; example relationship variables: R2,1,above = 0; R3,1,left = 1; R1,3,near = 0; R3,3,in = 1; R1,1,left = 1]


40

Learning the Parameters

Assume we know R
Sj is hidden
Everything else observed
Expectation-Maximization
    “Contextual clustering”
    Parameters are readily interpretable

[Plate diagram: nodes Wi (Image Window), Ti, Rijk, Sj, Fj; plates N, J, K. Legend: Supervised in Training Set; Always Observed; Always Hidden]


41

Which Relationships to Use?

Rijk = spatial relationship between candidate i and region j

Rij1 = candidate in region
Rij2 = candidate closer than 2 bounding boxes (BBs) to region
Rij3 = candidate closer than 4 BBs to region
Rij4 = candidate farther than 8 BBs from region
Rij5 = candidate 2 BBs left of region
Rij6 = candidate 2 BBs right of region
Rij7 = candidate 2 BBs below region
Rij8 = candidate more than 2 and less than 4 BBs from region
…
RijK = candidate near region boundary

How do we avoid overfitting?
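A few of the candidate relationships listed on this slide can be computed as binary indicators. The sketch below makes several assumptions the slide does not state: the region is reduced to its centroid, "in" is approximated by that centroid falling inside the candidate box, and one "bounding box" of distance is taken to be the box diagonal. All names are hypothetical.

```python
import math
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def relations(box: Box, region_centroid: Tuple[float, float]) -> Dict[str, int]:
    """Binary relationship indicators between one candidate window and one region."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    rx, ry = region_centroid
    bb_unit = math.hypot(x1 - x0, y1 - y0)  # assumed unit: box diagonal
    dist = math.hypot(rx - cx, ry - cy) / bb_unit  # distance in BB units
    return {
        "in": int(x0 <= rx <= x1 and y0 <= ry <= y1),
        "closer_than_2bb": int(dist < 2),
        "closer_than_4bb": int(dist < 4),
        "farther_than_8bb": int(dist > 8),
    }


box = (0.0, 0.0, 3.0, 4.0)              # diagonal 5, center (1.5, 2.0)
r_inside = relations(box, (1.5, 2.0))   # distance 0 BBs
r_mid = relations(box, (16.5, 2.0))     # distance 3 BBs
r_far = relations(box, (51.5, 2.0))     # distance 10 BBs
```

Generating many such indicators is cheap, which is why the slide's question about overfitting arises: only some of them carry useful context.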


42

Learning the TAS Relations

Intuition: “Detached” Rijk = inactive relationship

Structural EM iterates:
    Learn parameters
    Decide which edge to toggle

Evaluate with l(T|F,W,R)
    Requires inference
    Better results than using standard E[l(T,S,F,W,R)]

[Diagram: nodes Ti, Sj, Fj with relationship variables Rij1, Rij2, RijK]


43

Inference

Goal:

Block Gibbs Sampling

Easy to sample Ti’s given Sj’s and vice versa
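The alternating structure of block Gibbs sampling can be sketched on a simplified bipartite model. This is not the TAS parameterization: here each active (detection, region) link contributes a log-potential `theta[c][t]`, which is an assumption chosen to keep the example short. Given all Sj, the Ti are conditionally independent, and vice versa, so each block can be sampled in one pass.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def block_gibbs(det_logodds: List[float], region_logprior: List[List[float]],
                links: List[Tuple[int, int]], theta: List[List[float]],
                n_sweeps: int, rng: random.Random) -> List[float]:
    """Return the fraction of sweeps in which each T_i was sampled as 1.

    det_logodds[i]: detector log-odds for T_i = 1.
    region_logprior[j][c]: log-score for S_j = c from region features.
    links: (i, j) pairs with an active relationship.
    theta[c][t]: log-potential for (S_j = c, T_i = t) on a linked pair.
    """
    n_t, n_s, n_c = len(det_logodds), len(region_logprior), len(theta)
    T, S, counts = [0] * n_t, [0] * n_s, [0] * n_t
    for _ in range(n_sweeps):
        # Block 1: sample every T_i given the current S.
        for i in range(n_t):
            logit = det_logodds[i]
            for a, j in links:
                if a == i:
                    logit += theta[S[j]][1] - theta[S[j]][0]
            T[i] = 1 if rng.random() < sigmoid(logit) else 0
        # Block 2: sample every S_j given the current T.
        for j in range(n_s):
            scores = list(region_logprior[j])
            for i, b in links:
                if b == j:
                    for c in range(n_c):
                        scores[c] += theta[c][T[i]]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            S[j] = rng.choices(range(n_c), weights=weights)[0]
        for i in range(n_t):
            counts[i] += T[i]
    return [c / n_sweeps for c in counts]


# One very confident detection linked to one region with two possible clusters.
marginals = block_gibbs([50.0], [[0.0, 0.0]], [(0, 0)],
                        [[0.0, 0.0], [0.0, 0.0]], n_sweeps=20,
                        rng=random.Random(0))
```

A practical sampler would also discard burn-in sweeps before averaging. That step is omitted here for brevity.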


44

Learned Satellite Clusters


45

Results - Satellite

Prior: Detector Only

Posterior: Detections

Posterior: Region Labels


46

Discovered Context - Bicycles

Bicycles

Cluster #3


47

TAS Results – Bicycles

Examples

Discover “true positives”

Remove “false positives”



48

Results – VOC 2005

[Figure panels: TAS, Base Detector]


49

Understanding the man

“a man and a dog on a sidewalk, in front of a building”


50

Outline

Overview
Integrating Vision Models
    CCM: Cascaded Classification Models
Learning Spatial Context
    TAS: Things and Stuff
Future Directions


51

Shape models for segmentation

We have a good deformable shape model (LOOPS) for outlining objects

We have good models for segmenting objects

Let’s combine them
    Add terms encouraging landmarks to lie on segmentation boundaries

Ben Packer is working on this…

[Figure panels: Outline, Segmentation, Joint Outline, Joint Segmentation; labels: Landmark, Seg Mask]


52

Refined Segmentation

Our segmentation only knows about pixel “classes”

What about objects?

Steve Gould is working on this…

Region Class

Region Appearance

Pixel/Region Assignment

Pixel Appearance


53

Full TAS-like Integration

[Diagram: TAS nodes Ti, Sj, Rijk extended with Depths, Occlusion Edges, Surface Edges, Shape Models]