3D LayoutCRF
Derek Hoiem
Carsten Rother
John Winn
2
Goal 1: Object Description
Object Description:
• Bounding Box
• Viewpoint
• Color
• Pose
• Subclass
3
Goal 2: Object Segmentation
4
Key Idea
• Combine object-level and pixel-level reasoning
5
Recognition Requires Object-Level Reasoning
• Position
• Shape/Size
• Viewpoint/Pose
• Style/Color
6
Recognition Requires Object-Level Reasoning
7
Solution: Window Detector?
• 45 degree range of viewpoints
• Minor scale/position variation
8
What if we have a really good model?
9
Recognition Requires Part-Level Reasoning
• Propose good global model
10
Recognition Requires Part-Level Reasoning
• Propose good global model
• Occlusions
11
Context Requires Both Object and Part-Level Info
• Size relationships require object model
12
Context Requires Both Object and Part-Level Info
• Surface relationships require occlusion info
Visibly sitting on ground
Not visibly sitting on ground
13
Our Object/Part Model
Ti = { bounding box, viewpoint, color model, instance cost }
hj: object parts (part consistency, occlusions)
[Figure: graphical model with object nodes T1 … Tm, part nodes h1 … hn, and image x]
Extension from [Winn Shotton 2006]
14
Modeling Viewpoint
Parameterized by Bounding Box and Corner
15
Assigning Parts from Model
Training Image
FL
Training Annotation
Assigned Parts
3D Parts Model
16
Part Assignment Consistency
17
Relabeling
• Allowing slight deformations, relabel training data
Training Image
Original Labels
New Labels
18
Eight Viewpoint/Scale Ranges
Height Range
• Appearance (but not location) constant within each range
20
Modeling Part Appearance
• Template patches (normalized cross-correlation)
• Intensity / Color
• Image Edges (distance transform)
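As an illustration of the template-patch feature, here is a minimal sketch of normalized cross-correlation between a patch and a template. The function name and the toy patches are assumptions made for this example; the slides do not specify the patch size or implementation.

```python
import math

def normalized_xcorr(patch, template):
    """Normalized cross-correlation of two equal-size grayscale patches.

    Both arguments are lists of rows of numbers. The score lies in [-1, 1]
    and is defined as 0.0 when either patch has no intensity variation.
    """
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    if len(a) != len(b):
        raise ValueError("patch and template must have the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]  # zero-mean patch
    db = [v - mean_b for v in b]  # zero-mean template
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0

# Hypothetical 2x2 patches, for illustration only.
template = [[0, 10], [10, 0]]
brighter = [[50, 70], [70, 50]]  # same pattern, different gain and offset
inverted = [[10, 0], [0, 10]]    # opposite contrast
```

Because the mean is subtracted and the result is divided by the patch norms, the score is unchanged by a brightness offset or a positive contrast gain, which is the usual reason for preferring it over a raw dot product.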
21
Modeling Part Appearance
• Randomized decision trees
  – 25 trees, 250 leaf nodes
• Once:
  – Learn structure on 50,000 object / 50,000 background pixels
• For each appearance model:
  – Learn parameters on all pixels (850 LabelMe images)
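The split between learning structure once and learning parameters per appearance model can be sketched as follows. This is a toy illustration under stated assumptions: the split tests are drawn at random rather than learned from object/background pixels, the trees have 8 leaves rather than 250, and the class and function names are hypothetical.

```python
import random

class FixedStructureTree:
    """Complete binary tree whose split tests are fixed at construction.

    Only the per-leaf class counts are (re)fit, mirroring the idea of
    sharing tree structure across appearance models.
    """

    def __init__(self, n_features, depth, rng):
        self.depth = depth
        # One (feature index, threshold) test per internal node, heap order.
        self.tests = [(rng.randrange(n_features), rng.random())
                      for _ in range(2 ** depth - 1)]
        self.counts = None

    def leaf_index(self, x):
        node = 0
        for _ in range(self.depth):
            feature, threshold = self.tests[node]
            node = 2 * node + (1 if x[feature] < threshold else 2)
        return node - (2 ** self.depth - 1)

    def fit_leaves(self, samples, labels, n_classes):
        # Start every leaf at one count per class (Laplace smoothing).
        self.counts = [[1.0] * n_classes for _ in range(2 ** self.depth)]
        for x, y in zip(samples, labels):
            self.counts[self.leaf_index(x)][y] += 1.0

    def predict_proba(self, x):
        counts = self.counts[self.leaf_index(x)]
        total = sum(counts)
        return [c / total for c in counts]

def forest_proba(trees, x):
    """Average the leaf distributions of all trees."""
    probs = [t.predict_proba(x) for t in trees]
    return [sum(p[k] for p in probs) / len(probs)
            for k in range(len(probs[0]))]

rng = random.Random(0)
trees = [FixedStructureTree(n_features=2, depth=3, rng=rng) for _ in range(25)]

# Toy data: class 1 ("object") when the first feature is large.
samples = [(rng.random(), rng.random()) for _ in range(2000)]
labels = [1 if s[0] > 0.5 else 0 for s in samples]
for tree in trees:
    tree.fit_leaves(samples, labels, n_classes=2)

p_high = forest_proba(trees, (0.95, 0.5))
p_low = forest_proba(trees, (0.05, 0.5))
```

Calling `fit_leaves` again with a different training set would give a second appearance model that reuses the same split tests, so the structure only has to be chosen once.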
22
Inference
Input Image
23
Inference
Input Image
Proposals
• One per appearance model
• Objects proposed by connected components
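Proposing objects by connected components can be sketched as below, assuming the per-pixel labeling has already been reduced to a binary object/background mask. The function name, the 4-connectivity choice, and the `min_pixels` filter are assumptions for this example, not details given in the slides.

```python
from collections import deque

def propose_objects(mask, min_pixels=1):
    """Return one bounding box per 4-connected foreground component.

    mask is a list of rows of 0/1 values. Each box is
    (top, left, bottom, right) with inclusive pixel coordinates.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes

# Hypothetical mask with two separate blobs.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
```

Here `propose_objects(mask)` yields two boxes, one per blob, and raising `min_pixels` to 4 discards the smaller three-pixel blob.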
24
Proposal Stage Model
hi: object parts (part consistency, occlusions)
[Figure: graphical model with part nodes h1 … hn and image x]
• CRF Inference (TRW-BP)
25
Inference
Refinement
• One per proposal
• Incorporate viewpoint, size information
Proposals
Input Image
26
Refinement Stage Model
Ti = { bounding box, viewpoint }
hi: object parts (part consistency, occlusions)
[Figure: graphical model with object node T1, part nodes h1 … hn, and image x]
27
Inference
Refinement
Proposals
Arbitration
• Includes color model, instance penalty (graph cuts)
Input Image
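The effect of an instance penalty during arbitration can be illustrated with a simple greedy selection. This is not the graph-cut procedure named on the slide; it is a swapped-in, simplified stand-in that shows why a fixed per-object cost suppresses overlapping duplicate detections. All names and numbers are hypothetical.

```python
def unexplained_support(pixels, explained):
    """Sum of evidence on pixels not already claimed by an accepted object."""
    return sum(e for px, e in pixels.items() if px not in explained)

def arbitrate(proposals, instance_cost):
    """Greedily accept proposals whose new support exceeds a fixed cost.

    proposals: list of (name, {pixel: evidence}) pairs, where positive
    evidence favors the object over background at that pixel.
    Returns the names of accepted proposals in acceptance order.
    """
    explained = set()
    accepted = []
    remaining = list(proposals)
    while remaining:
        best = max(remaining,
                   key=lambda p: unexplained_support(p[1], explained))
        if unexplained_support(best[1], explained) <= instance_cost:
            break  # no remaining proposal pays for its own instance cost
        accepted.append(best[0])
        explained.update(best[1])
        remaining.remove(best)
    return accepted

# Hypothetical proposals: car_b mostly overlaps car_a, car_c is separate.
proposals = [
    ("car_a", {(0, 0): 1.0, (0, 1): 1.0, (0, 2): 1.0}),
    ("car_b", {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 0.5}),
    ("car_c", {(5, 5): 1.0, (5, 6): 1.0, (5, 7): 1.0}),
]
```

With `instance_cost=0.0` all three proposals are kept, including the near-duplicate `car_b`; with `instance_cost=1.0` the duplicate is dropped because the one extra pixel it explains is not worth the cost of a new instance.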
28
Preliminary Results on UIUC
• Trained on 20, tested on rest
• Quantitatively comparable to best
29
Preliminary Results on UIUC
Without Instance Cost
With Instance Cost
[Figure: graphical model with object node T1, part nodes h1 … hn, and image x]
30
Preliminary Results on PASCAL’06
• 25 images
  – One proposal (viewpoint within 45 degrees, scale of 26-38 pixels)
31
Preliminary Results on PASCAL’06
32
Preliminary Results on PASCAL’06
33
Preliminary Results on PASCAL’06
Without Color Model
With Color Model
34
Conclusion
• Combined object-level and pixel-level reasoning
  – Object-level: Position/Size, Viewpoint, Color
  – Pixel-level: Part appearance, Occlusion reasoning
• Good preliminary results