Geometric and Semantic 3D Reconstruction: Part 4B: 3D Prediction using ConvNets CVPR 2017 Tutorial Christian Häne UC Berkeley

Geometric and Semantic 3D Reconstruction:

Part 4B: 3D Prediction using ConvNets

CVPR 2017 Tutorial

Christian Häne

UC Berkeley

Problem Statement

• Predict full 3D geometry from little input

– Single depth map

– Single RGB image

• Multi-view geometry not applicable

Input Desired Output

Classical Solution

• Fit shape model into input data

Morphable Face Model [Blanz and Vetter, SIGGRAPH 99]

• High resolution geometry

• Manual generation of model

• Only possible for suitable object classes

Encoder-Decoder Architecture

• Auto-Encoder, Input = Output

– Feature Learning [Zeiler et al. CVPR 2010]

– Representation Learning

Convolutional Layers Up-Convolutional Layers

Low Dimensional Representation

Input Output

Auto-Encoder Architecture

• Learning Shape Representations

Input Voxels Output Voxels

3D Up-Convolutional Layers

Shape Code

3D Convolutional Layers
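
The encoder halves the grid resolution layer by layer until a small shape code remains, and the up-convolutional decoder doubles it back. The sketch below only traces the per-axis grid sizes through such a stack. The kernel sizes, strides, padding, and the number of layers are assumptions chosen for illustration; the slides do not state them.

```python
def conv_out(n, kernel=3, stride=2, pad=1):
    """Per-axis size after a strided convolution (assumed hyperparameters)."""
    return (n + 2 * pad - kernel) // stride + 1


def upconv_out(n, kernel=4, stride=2, pad=1):
    """Per-axis size after an up-convolution (transposed convolution)."""
    return (n - 1) * stride - 2 * pad + kernel


def trace_sizes(n, num_layers):
    """Return the grid size after each encoder layer and each decoder layer."""
    encoder = [n]
    for _ in range(num_layers):
        encoder.append(conv_out(encoder[-1]))
    decoder = [encoder[-1]]
    for _ in range(num_layers):
        decoder.append(upconv_out(decoder[-1]))
    return encoder, decoder


encoder_sizes, decoder_sizes = trace_sizes(32, num_layers=3)
print(encoder_sizes)  # [32, 16, 8, 4]
print(decoder_sizes)  # [4, 8, 16, 32]
```

With these assumed settings a 32³ input shrinks to a 4³ feature grid before being flattened into the shape code, and the decoder mirrors the same sizes in reverse.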

Predicting Geometry from RGB

• Input mapped to shape representation

• Decoded to occupancy volume

2D Convolutional Layers 3D Up-Convolutional Layers

Low Dimensional Shape Representation

Input Image Output Voxel Grid

TL-Networks [Girdhar et al. ECCV 2016]

• Learn shape representation with auto-encoder

• Predict learned embedding from RGB image

TL-Networks [Girdhar et al. ECCV 2016]

• Coarse resolution voxel grids

• Generalizes to real data

3D-R2N2 [Choy et al. ECCV 2016]

• Multiple input images

3D-R2N2 [Choy et al. ECCV 2016]

• Recurrent network with memory layers

– LSTM (long short-term memory) [Hochreiter & Schmidhuber 1997]

– GRU (gated recurrent unit) [Cho et al. 2014]

• Variable number of Input Images

Update Equations GRU
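
The equations shown on this slide did not survive in the transcript. For reference, the generic GRU update in its commonly used form is given below; the symbols (input x_t, hidden state h_t, update gate u_t, reset gate r_t, weights W, U and biases b) are notation chosen here, not taken from the slide. 3D-R2N2 arranges the hidden states on a 3D grid, so the exact form used there may differ from this generic version.

```latex
\begin{aligned}
u_t &= \sigma\left(W_u x_t + U_u h_{t-1} + b_u\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - u_t\right) \odot h_{t-1} + u_t \odot \tilde{h}_t
\end{aligned}
```

Some references swap the roles of u_t and 1 - u_t in the last line; the two conventions are equivalent up to relabeling the gate.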

3D-R2N2 [Choy et al. ECCV 2016]

Datasets

• Pascal 3D [Xiang et al. 2014]

– CAD model annotations in real images

• CAD annotation for NYUD [Guo & Hoiem, ICCV 2013]

– RGBD images annotated with CAD models

• ShapeNet [Chang et al. 2015]

– CAD model database

– Input images rendered synthetically

Pascal 3D

• Real images with manually aligned CAD models

NYUD + CAD

• CAD models aligned with RGBD data

ShapeNet

• CAD models from 57 object categories

– Rendering of images

– Voxelization

Notes on Evaluation

• CAD annotations unsuitable for training

– Very small number of models (ca. 10 per class)

– No train / test split of used CAD models

– Training a classifier outperforms voxel prediction

• Using CAD models directly (ShapeNet)

– Synthetic images

• Alleviated by random backgrounds

– Numbers not comparable between papers

• Differences in renderings, voxelizations, viewpoint sampling, subsets of categories, …

Supervision

• Ground truth voxels

• Weaker supervision

– Silhouettes

– Depth maps

– Color

Perspective Transformer Nets [Yan et al. NIPS 2016]

• Silhouettes as Supervision
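
Silhouette supervision compares a projection of the predicted volume with an observed object mask. The sketch below is a simplified stand-in, not the method of Yan et al.: it assumes an orthographic camera looking along the last grid axis instead of a perspective projection, and a mean squared error as the loss. The function names are hypothetical.

```python
def project_silhouette(voxels):
    """Orthographic silhouette: maximum occupancy along the viewing (last) axis.

    voxels is a nested list indexed [x][y][z] with occupancy probabilities.
    """
    return [[max(column) for column in plane] for plane in voxels]


def silhouette_loss(voxels, mask):
    """Mean squared difference between projected silhouette and binary mask."""
    silhouette = project_silhouette(voxels)
    total, count = 0.0, 0
    for sil_row, mask_row in zip(silhouette, mask):
        for s, m in zip(sil_row, mask_row):
            total += (s - m) ** 2
            count += 1
    return total / count


# Tiny 2x2x2 example volume and a 2x2 observed mask.
voxels = [[[0.0, 0.9], [0.0, 0.0]],
          [[0.2, 0.1], [1.0, 0.0]]]
mask = [[1, 0],
        [0, 1]]
print(project_silhouette(voxels))  # [[0.9, 0.0], [0.2, 1.0]]
print(silhouette_loss(voxels, mask))
```

Because the projection only uses the maximum along each ray, the loss constrains where the object may be but not how deep it is, which is why several views are needed.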

Perspective Transformer Nets [Yan et al. NIPS 2016]

Ray Consistency [Tulsiani et al. CVPR 2017]

• Ray formulation with multi-view supervision

Observation O from camera C

Input Image

Geometric Consistency Loss

CNN
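
A ray-based consistency loss can be sketched as follows: a ray passes through a sequence of voxels, each with a predicted occupancy probability, and either stops at the first occupied voxel or escapes the grid. The loss is the expected cost of these events given the observation for that ray. The code below is a minimal sketch under these assumptions; the cost definitions for mask and depth observations, the escape depth, and all function names are illustrative choices and may differ from the formulation in Tulsiani et al.

```python
def ray_event_probabilities(occupancy):
    """Probability that the ray stops at each voxel, plus a final escape event.

    occupancy lists per-voxel occupancy probabilities from the camera outward.
    """
    probs = []
    pass_through = 1.0  # probability that all earlier voxels were empty
    for x in occupancy:
        probs.append(pass_through * x)
        pass_through *= 1.0 - x
    probs.append(pass_through)  # the ray leaves the grid without stopping
    return probs


def expected_ray_cost(occupancy, event_costs):
    """Expected cost over all stopping events of one ray."""
    probs = ray_event_probabilities(occupancy)
    return sum(p * c for p, c in zip(probs, event_costs))


def mask_costs(num_voxels, ray_hits_object):
    """Cost 1 for every event that disagrees with the mask observation."""
    if ray_hits_object:
        return [0.0] * num_voxels + [1.0]
    return [1.0] * num_voxels + [0.0]


def depth_costs(voxel_depths, observed_depth, escape_depth):
    """Absolute depth error of each stopping event (escape_depth is assumed)."""
    depths = list(voxel_depths) + [escape_depth]
    return [abs(d - observed_depth) for d in depths]


occupancy = [0.0, 0.5, 1.0]
print(ray_event_probabilities(occupancy))                    # [0.0, 0.5, 0.5, 0.0]
print(expected_ray_cost(occupancy, mask_costs(3, True)))     # 0.0
print(expected_ray_cost(occupancy, depth_costs([1.0, 2.0, 3.0], 3.0, 10.0)))  # 0.5
```

The event probabilities are products of the occupancies, so the expected cost is differentiable with respect to the predicted volume.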

Ray Consistency [Tulsiani et al. CVPR 2017]

• Silhouettes

• Depth

• Color

Ray Consistency [Tulsiani et al. CVPR 2017]

GT Input 3D Supervision

DRC (Mask)

DRC (Depth)

Coarse Resolution

• Problem: Volumetric prediction at low resolution 32³

• The surface is only 2D, not 3D

– Predict coefficients of parametric shape model [Dibra et al. 3DV 2016]

– Predict volume hierarchically around surface [Häne et al. 2017]
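
The scaling argument behind this slide can be checked with a toy example: the number of voxels in a dense grid grows cubically with resolution, while the number of voxels touching the surface grows roughly quadratically. The sketch below voxelizes a sphere and counts both. The sphere, its radius of 0.4 times the grid size, and the 6-neighborhood boundary test are assumptions made for illustration.

```python
def sphere_voxel_counts(n):
    """Voxelize a sphere on an n^3 grid; return (total, occupied, boundary)."""
    center = (n - 1) / 2.0
    radius_sq = (0.4 * n) ** 2

    def inside(i, j, k):
        if not (0 <= i < n and 0 <= j < n and 0 <= k < n):
            return False
        return (i - center) ** 2 + (j - center) ** 2 + (k - center) ** 2 <= radius_sq

    occupied = boundary = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if not inside(i, j, k):
                    continue
                occupied += 1
                neighbors = [(i + 1, j, k), (i - 1, j, k), (i, j + 1, k),
                             (i, j - 1, k), (i, j, k + 1), (i, j, k - 1)]
                # A boundary voxel is occupied but has at least one empty neighbor.
                if not all(inside(*p) for p in neighbors):
                    boundary += 1
    return n ** 3, occupied, boundary


for n in (16, 32):
    total, occupied, boundary = sphere_voxel_counts(n)
    print(n, total, occupied, boundary, boundary / total)
```

The printed boundary fraction shrinks as the resolution grows, which is the motivation for spending high resolution only near the surface.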

Predict Morphable Shape Model [Dibra et al. 3DV 2016]

• Needs known shape model

Hierarchical Surface Prediction [Häne, Tulsiani, Malik 2017]

• Our approach: Hierarchical Surface Prediction (HSP)

– Three label prediction (free/occupied/boundary)

– Higher resolution only on boundary (octree)
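
One way to picture the three-label scheme is to derive coarse labels from a fine binary occupancy grid and then refine only the cells marked as boundary. The sketch below assumes a simple rule: a coarse cell is free if all of its fine voxels are empty, occupied if all are full, and boundary otherwise. This rule and the function names are illustrative assumptions; the HSP paper may define the labels differently.

```python
FREE, OCCUPIED, BOUNDARY = 0, 1, 2


def coarse_labels(fine, factor):
    """Label each coarse cell of a cubic binary grid indexed [x][y][z]."""
    m = len(fine) // factor
    labels = [[[FREE] * m for _ in range(m)] for _ in range(m)]
    for i in range(m):
        for j in range(m):
            for k in range(m):
                filled = sum(
                    fine[i * factor + a][j * factor + b][k * factor + c]
                    for a in range(factor)
                    for b in range(factor)
                    for c in range(factor)
                )
                if filled == factor ** 3:
                    labels[i][j][k] = OCCUPIED
                elif filled > 0:
                    labels[i][j][k] = BOUNDARY
    return labels


def cells_to_refine(labels):
    """Only boundary cells are subdivided at the next octree level."""
    m = len(labels)
    return [(i, j, k)
            for i in range(m) for j in range(m) for k in range(m)
            if labels[i][j][k] == BOUNDARY]


# 8^3 toy shape: everything with x <= 2 is occupied.
fine = [[[1 if x <= 2 else 0 for _ in range(8)] for _ in range(8)] for x in range(8)]
labels = coarse_labels(fine, factor=2)
print(len(cells_to_refine(labels)), "of", 4 ** 3, "coarse cells need refinement")
```

In this toy case only the slab of coarse cells that straddles the surface is refined; fully free and fully occupied cells are left at the coarse level.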

Input 16³ 32³ 64³ 128³ 256³

Hierarchical Surface Prediction [Häne, Tulsiani, Malik 2017]

• Network architecture

Hierarchical Surface Prediction [Häne, Tulsiani, Malik 2017]

• Two baselines

– LR Soft

– LR Hard

Hierarchical Surface Prediction [Häne, Tulsiani, Malik 2017]

Hierarchical Surface Prediction [Häne, Tulsiani, Malik 2017]

Hierarchical Surface Prediction [Häne, Tulsiani, Malik 2017]

Intersection over Union
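
Intersection over union (IoU) between a predicted and a ground-truth voxel grid is the number of voxels occupied in both divided by the number occupied in either. A minimal sketch follows; the 0.5 threshold on predicted probabilities and the convention of returning 1.0 for two empty grids are assumptions, and the tutorial does not specify its exact evaluation settings.

```python
def voxel_iou(pred, gt, threshold=0.5):
    """IoU of two equally sized voxel grids given as nested lists [x][y][z]."""
    intersection = union = 0
    for pred_plane, gt_plane in zip(pred, gt):
        for pred_row, gt_row in zip(pred_plane, gt_plane):
            for p, g in zip(pred_row, gt_row):
                p_occ = p > threshold
                g_occ = g > 0.5
                intersection += p_occ and g_occ
                union += p_occ or g_occ
    # Two empty grids agree perfectly (assumed convention).
    return intersection / union if union else 1.0


pred = [[[0.9, 0.2], [0.7, 0.1]],
        [[0.6, 0.0], [0.0, 0.0]]]
gt = [[[1, 0], [1, 0]],
      [[0, 0], [0, 1]]]
print(voxel_iou(pred, gt))  # 0.5
```

Since the result depends on the threshold, the voxelization, and the grid resolution, IoU numbers are only comparable when these settings match.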

Hierarchical Surface Prediction [Häne, Tulsiani, Malik 2017]

• Smooth high res surfaces

• Quantitatively more accurate than low res

• Various object categories

Conclusion

• Geometry prediction is a recent approach

• Difficult to compare methods

• Weak supervision

• Hierarchical prediction