
3D LayoutCRF

Derek Hoiem

Carsten Rother

John Winn

2

Goal 1: Object Description

Object Description:

• Bounding Box

• Viewpoint

• Color

• Pose

• Subclass

3

Goal 2: Object Segmentation

4

Key Idea

• Combine object-level and pixel-level reasoning

5

Recognition Requires Object-Level Reasoning

• Position

• Shape/Size

• Viewpoint/Pose

• Style/Color

6

Recognition Requires Object-Level Reasoning

7

Solution: Window Detector?

• 45 degree range of viewpoints

• Minor scale/position variation

8

What if we have a really good model?

9

Recognition Requires Part-Level Reasoning

• Propose good global model

10

Recognition Requires Part-Level Reasoning

• Propose good global model

• Occlusions

11

Context Requires Both Object and Part-Level Info

• Size relationships require object model

12

Context Requires Both Object and Part-Level Info

• Surface relationships require occlusion info

Visibly sitting on ground

Not visibly sitting on ground

13

Our Object/Part Model

Ti = { bounding box, viewpoint, color model, instance cost }

hj: object parts

[Figure: graphical model with instance nodes T1 … Tm, part nodes h1 … hn, and x; annotated with part consistency and occlusions]

Extension from [Winn Shotton 2006]

14

Modeling Viewpoint

Parameterized by Bounding Box and Corner

15

Assigning Parts from Model

Training Image

FL

Training Annotation

Assigned Parts

3D Parts Model

16

Part Assignment Consistency

17

Relabeling

• Allowing slight deformations, relabel training data

Training Image

Original Labels

New Labels

18

Eight Viewpoint/Scale Ranges

Height Range

• Appearance (but not location) constant within each range

20

Modeling Part Appearance

• Template patches (normalized cross-correlation)

• Intensity / Color

• Image edges (distance transform)
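The template-patch feature above can be illustrated with a minimal sketch of normalized cross-correlation between an image patch and a template of the same size. The function name `normalized_xcorr` and the list-of-lists patch representation are assumptions made for this illustration, not details from the slides.

```python
import math


def normalized_xcorr(patch, template):
    """Normalized cross-correlation of two equal-size 2D lists of numbers.

    Returns a value in [-1, 1]; 1 means the patch matches the template
    up to a brightness offset and a positive contrast scale.
    """
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    if not a or len(a) != len(b):
        raise ValueError("patch and template must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        # A flat patch or template has no contrast; report zero correlation.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


template = [[1, 2], [3, 4]]
brighter = [[10, 20], [30, 40]]   # same pattern, different gain
inverted = [[4, 3], [2, 1]]       # contrast-reversed pattern
print(normalized_xcorr(brighter, template))
print(normalized_xcorr(inverted, template))
```

Because the mean is subtracted and the result is divided by both norms, the score ignores brightness and contrast changes, which is the usual reason for preferring it over a raw dot product.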

21

Modeling Part Appearance

• Randomized decision trees

– 25 trees, 250 leaf nodes

• Once:

– Learn structure on 50,000 object / 50,000 background pixels

• For each appearance model:

– Learn parameters on all pixels (850 LabelMe images)
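The split between a shared tree structure and per-model parameters can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The tree tests are drawn at random rather than learned from the 50,000 / 50,000 pixel sample, features are assumed to be vectors with values in [0, 1), and "parameters" are taken to mean the label histograms stored at the leaves. All function names are hypothetical.

```python
import random


def build_random_tree(n_features, depth, rng):
    """Full binary tree stored heap-style: one (feature, threshold) test per internal node."""
    n_internal = 2 ** depth - 1
    return [(rng.randrange(n_features), rng.random()) for _ in range(n_internal)]


def leaf_index(tree, x):
    """Route feature vector x down the tree and return its leaf number (0-based)."""
    node = 0
    while node < len(tree):
        feature, threshold = tree[node]
        node = 2 * node + (1 if x[feature] < threshold else 2)
    return node - len(tree)


def fit_leaf_histograms(tree, samples, labels, n_labels):
    """Learn leaf parameters only: a smoothed label distribution for every leaf."""
    n_leaves = len(tree) + 1
    counts = [[1.0] * n_labels for _ in range(n_leaves)]  # add-one smoothing
    for x, y in zip(samples, labels):
        counts[leaf_index(tree, x)][y] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def forest_posterior(trees, leaf_tables, x):
    """Average the leaf distributions reached by x across all trees."""
    n_labels = len(leaf_tables[0][0])
    average = [0.0] * n_labels
    for tree, table in zip(trees, leaf_tables):
        for k, p in enumerate(table[leaf_index(tree, x)]):
            average[k] += p / len(trees)
    return average


rng = random.Random(0)
trees = [build_random_tree(n_features=2, depth=3, rng=rng) for _ in range(5)]  # structure, fixed once
samples = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1 if s[0] >= 0.5 else 0 for s in samples]
tables = [fit_leaf_histograms(t, samples, labels, n_labels=2) for t in trees]   # refit per model
print(forest_posterior(trees, tables, [0.9, 0.1]))
```

Keeping the structure fixed means each additional appearance model needs only a counting pass over its training pixels, not a new structure search.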

22

Inference

Input Image

23

Inference

Input Image

Proposals

• One per appearance model

• Objects proposed by connected components
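Proposing objects by connected components can be sketched as below. The sketch assumes the input is a binary mask marking pixels that inference labeled as belonging to some object part, and that 4-connectivity is used; neither detail is stated on the slide. The function name `propose_boxes` and the `min_pixels` filter are hypothetical.

```python
from collections import deque


def propose_boxes(mask, min_pixels=1):
    """Return one (left, top, right, bottom) box per 4-connected foreground region."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top = bottom = r
            left = right = c
            count = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if count >= min_pixels:
                boxes.append((left, top, right, bottom))
    return boxes


mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
print(propose_boxes(mask))  # [(0, 0, 1, 1), (3, 1, 3, 2)]
```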

24

Proposal Stage Model

hi: object parts

[Figure: graphical model with part nodes h1 … hn and x; annotated with part consistency and occlusions]

• CRF Inference (TRW-BP)

25

Inference

Refinement

• One per proposal

• Incorporate viewpoint, size information

Proposals

Input Image

26

Refinement Stage Model

Ti = { bounding box, viewpoint }

hi: object parts

[Figure: graphical model with one instance node T1, part nodes h1 … hn, and x; annotated with part consistency and occlusions]

27

Inference

Refinement

Proposals

Arbitration

• Includes color model, instance penalty (graph cuts)
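The effect of an instance penalty during arbitration can be sketched with a toy energy. This is an illustration only: the slide names graph cuts as the optimizer, whereas the sketch below swaps in exhaustive search over proposal subsets, which is practical only for a handful of proposals. The per-pixel costs are assumed to already include appearance and color terms, and the names `arbitrate`, `background_cost`, and `instance_cost` are hypothetical.

```python
from itertools import combinations


def arbitrate(proposals, background_cost, instance_cost):
    """Choose the subset of proposals that minimizes total energy.

    proposals: list of dicts mapping pixel -> cost of explaining it by that proposal.
    background_cost: dict mapping every pixel -> cost of labeling it background.
    instance_cost: fixed penalty paid once for each proposal that is kept.
    """
    best_subset = ()
    best_energy = sum(background_cost.values())  # keep nothing: all background
    indices = range(len(proposals))
    for k in range(1, len(proposals) + 1):
        for subset in combinations(indices, k):
            energy = instance_cost * len(subset)
            for pixel, bg in background_cost.items():
                # Each pixel takes its cheapest explanation among kept proposals and background.
                costs = [proposals[i][pixel] for i in subset if pixel in proposals[i]]
                energy += min(costs + [bg])
            if energy < best_energy:
                best_subset, best_energy = subset, energy
    return best_subset, best_energy


background = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
strong = {"a": 0.0, "b": 0.0, "c": 0.0}   # explains three pixels well
weak = {"c": 0.0, "d": 0.8}               # barely improves on background for pixel d
print(arbitrate([strong, weak], background, instance_cost=0.0))
print(arbitrate([strong, weak], background, instance_cost=0.5))
```

With no instance cost both proposals are kept, because the weak one still lowers the energy slightly. With a cost of 0.5 per instance only the strong proposal survives.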

Input Image

28

Preliminary Results on UIUC

• Trained on 20, tested on rest

• Quantitatively comparable to best

29

Preliminary Results on UIUC

Without Instance Cost

With Instance Cost

[Figure: graphical model with instance node T1, part nodes h1 … hn, and x]

30

Preliminary Results on PASCAL’06

• 25 images

– One proposal (viewpoint within 45 degrees, scale of 26-38 pixels)

33


Without Color Model

With Color Model

34

Conclusion

• Combined object-level and pixel-level reasoning

– Object-level: Position/Size, Viewpoint, Color

– Pixel-level: Part appearance, Occlusion reasoning

• Good preliminary results
