learning 3d representations, disparity estimation, and

32
Thomas Brox Learning 3D representations, disparity estimation, and structure from motion Thomas Brox University of Freiburg, Germany Research funded by the ERC Starting Grant VideoLearn, the German Research Foundation, and the Deutsche Telekom Stiftung

Upload: others

Post on 18-Oct-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning 3D representations, disparity estimation, and

Thomas Brox

Learning 3D representations, disparity estimation, and structure from motion

Thomas Brox

University of Freiburg, Germany

Research funded by the ERC Starting Grant VideoLearn, the German Research Foundation, and the Deutsche Telekom Stiftung

Page 2: Learning 3D representations, disparity estimation, and

Thomas Brox

3D shape and texture from a single image

FlowNet: end-to-end optical flow

DispNet: end-to-end disparities

DeMoN: end-to-end structure from motion

2

Outline

Page 3: Learning 3D representations, disparity estimation, and

Thomas Brox 3

Single-view to multi-viewMaxim TatarchenkoAlexey DosovitskiyECCV 2016

Choose desired output view

New image from arbitrary

view

Up-convolutional part

Additional depth map

Analysis part Canonical 3D representation?

Page 4: Learning 3D representations, disparity estimation, and

Thomas Brox 4

Single-view to multi-view

Real images

Synthetic images

Page 5: Learning 3D representations, disparity estimation, and

Thomas Brox 5

Multi-view looks like 3D

Page 6: Learning 3D representations, disparity estimation, and

Thomas Brox 6

Reconstructing explicit 3D models

Page 7: Learning 3D representations, disparity estimation, and

Thomas Brox 7

Multiview morphing

Page 8: Learning 3D representations, disparity estimation, and

Thomas Brox 8

Other interesting work

Yang et al. NIPS 2015Recurrent network, incrementally rotates the object

Ours for comparison

Kar et al. CVPR 2015Choy et al. 2016

Input GT Choy Kar Input GT Choy Kar

Page 9: Learning 3D representations, disparity estimation, and

Thomas Brox

3D shape and texture from a single image

FlowNet: end-to-end optical flow

DispNet: end-to-end disparities

DeMoN: end-to-end structure from motion

9

Outline

Page 10: Learning 3D representations, disparity estimation, and

Thomas Brox

• Can networks learn to find correspondences?

• New learning task! (very different from classification, etc.)

10

FlowNet: estimating optical flow with a ConvNet

Dosovitskiy et al. ICCV 2015

Page 11: Learning 3D representations, disparity estimation, and

Thomas Brox

Help the network with an explicit correlation layer

11

Can networks learn to find correspondences?

Dosovitskiy et al. ICCV 2015

Page 12: Learning 3D representations, disparity estimation, and

Thomas Brox

• Getting ground truth optical flow for realistic videos is hard

• Existing datasets are small:

12

Enough data to train such a network?

Frames with ground truth

Middlebury 8

KITTI 194

Sintel 1041

Needed >10000

Page 13: Learning 3D representations, disparity estimation, and

Thomas Brox 13

Realism is overrated: the “flying chairs” dataset

Image pair Optical flow

Page 14: Learning 3D representations, disparity estimation, and

Thomas Brox 14

Synthetic 3D datasets

Driving, Monkaa, FlyingThings3D datasets publicly available

Mayer et al. CVPR 2016

Page 15: Learning 3D representations, disparity estimation, and

Thomas Brox 15

Generalization: it works!

Although the network has only seen flying chairs for training, it predicts good optical flow on other data

Input images Ground truth

FlowNetSimple FlowNetCorr

Page 16: Learning 3D representations, disparity estimation, and

Thomas Brox 16

Optical flow estimation in 18ms

Page 17: Learning 3D representations, disparity estimation, and

Thomas Brox 17

FlowNet 2.0

Major changes:

• Improved data and training schedules

• Stacking of networks with motion compensation

• Special small displacements and fusion network

Eddy Ilg et al. arXiv 2016

Page 18: Learning 3D representations, disparity estimation, and

Thomas Brox 18

FlowNet vs. FlowNet 2.0

Page 19: Learning 3D representations, disparity estimation, and

Thomas Brox 19

Numbers….

Sintel KITTI runtime

DeepFlow (Weinzaepfel et al. 2013) 7.21 5.8 51940 ms

FlowFields (Bailer et al. 2015) 5.81 3.5 22810 ms

PCA Flow (Wulff & Black 2015) 8.65 6.2 140 ms

FlowNet (Dosovitskiy et al. 2015) 7.52 - 18 ms

FlowNet 2.0 5.74 1.8 123 ms

Page 20: Learning 3D representations, disparity estimation, and

Thomas Brox 20

DispNet: disparity estimation

Mayer et al. CVPR 2016

Page 21: Learning 3D representations, disparity estimation, and

Thomas Brox

DispNet: disparity estimation

21

Page 22: Learning 3D representations, disparity estimation, and

Thomas Brox

3D shape and texture from a single image

FlowNet: end-to-end optical flow

DispNet: end-to-end disparities

DeMoN: end-to-end structure from motion

22

Outline

Page 23: Learning 3D representations, disparity estimation, and

Thomas Brox 23

DeMoN: Structure from motion with a network

Egomotion estimation and depth estimation are mutually dependent

Benjamin UmmenhoferHuizhong Zhou

arxiv 2016

Page 24: Learning 3D representations, disparity estimation, and

Thomas Brox 24

Straightforward idea

Imagepair

Network ignores the second imagemotion parallax not learned

Page 25: Learning 3D representations, disparity estimation, and

Thomas Brox 25

DeMoN architecture

Estimates optical flowEstimates depth and

egomotion

Page 26: Learning 3D representations, disparity estimation, and

Thomas Brox 26

Iterative refinement

Input images Ground truth Optical Flow

Ground truth Depth

Estimated optical flow

Estimated depth

Page 27: Learning 3D representations, disparity estimation, and

Thomas Brox 27

Outperforms two-frame SfM baselines

Page 28: Learning 3D representations, disparity estimation, and

Thomas Brox 28

Two images generalize better than one image

Page 29: Learning 3D representations, disparity estimation, and

Thomas Brox 29

Two images generalize better than one image

Page 30: Learning 3D representations, disparity estimation, and

Thomas Brox 30

Structure from motion at 7fps

Page 31: Learning 3D representations, disparity estimation, and

Thomas Brox 31

Estimated camera trajectory

Example from RGB-D SLAM dataset (Sturm et al.) Red: DeMoN. Black: Ground truth.

Page 32: Learning 3D representations, disparity estimation, and

Thomas Brox

3D shape and texture from a single image

FlowNet: end-to-end optical flow

DispNet: end-to-end disparities

DeMoN: end-to-end structure from motion

32

Deep learning for 3D Vision is promising