Ahmed Osman
Deep End2End Voxel2Voxel Prediction
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri
Presented by: Ahmed Osman

Outline
• Problems
– Video Semantic Segmentation
– Optical Flow Estimation
– Video Coloring
• Related Work
• Contribution
• Method
• Experiments and Results
• Conclusion

Video Semantic Segmentation
• Semantic Segmentation
• Video Semantic Segmentation
http://jamie.shotton.org/work/images/resear6.png

Optical Flow Estimation
http://www.cvlibs.net/projects/objectsceneflow/showcase.jpg
“A Filter Formulation for Computing Real Time Optical Flow”, Adarve et al.: https://www.youtube.com/watch?v=_oW1vMdBMuY

Video Coloring
http://images.mentalfloss.com/sites/default/files/styles/article_640x430/public/colorizing-movies_6.jpg

Traditional Computer Vision Pipeline

Convolutional Neural Networks
• Motivation
– “Convolutional Neural Networks (CNN) are biologically-inspired variants of MLPs.”
– “Revolutionized the traditional computer vision pipeline”
– Re-popularized by Krizhevsky et al. in 2012 by producing state-of-the-art results on the ImageNet dataset (image classification).
– Why was AlexNet successful?
  • Large labeled datasets
  • GPU computing

ConvNets
• Convolution
https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
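
The convolution operation can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the function name `convolve2d`, the sample image, and the difference kernel are all assumptions chosen for the example. Like most CNN layers, it slides the kernel without flipping it (strictly speaking, cross-correlation) and produces only the "valid" outputs where the kernel fits entirely inside the image.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (lists of lists) and sum the element-wise products.

    No padding and stride 1, so the output shrinks by (kernel size - 1) per axis.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x4 image and a 1x2 horizontal difference kernel.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
]
kernel = [[1, -1]]
result = convolve2d(image, kernel)
# result == [[-1, -1, 3], [-1, -1, 6], [-1, -1, 9]]
```

The large values in the last column mark the place where the image intensity drops to zero, which is the kind of local pattern a learned filter responds to.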

ConvNets
• Convolution Layer
http://cs231n.github.io/convolutional-networks/

ConvNets
• Activation function
– Rectified Linear Unit (ReLU)
  • Mitigates the vanishing gradient problem: the gradient is 1 for every positive input
  • Non-linear
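
A minimal sketch of ReLU and its derivative, assuming scalar inputs; the function names are chosen for this example only. It shows why gradients pass through active units unchanged, and also that they are blocked entirely for negative inputs.

```python
def relu(x):
    """Rectified Linear Unit: pass positive values through, clamp the rest to zero."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 is used at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


activations = [relu(v) for v in (-2.0, 0.0, 3.5)]      # [0.0, 0.0, 3.5]
gradients = [relu_grad(v) for v in (-2.0, 0.0, 3.5)]   # [0.0, 0.0, 1.0]
```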

ConvNets
• Pooling
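
The slide does not say which pooling variant is meant; the sketch below assumes max pooling with a square window and a stride equal to the window size, which is a common choice. The function name and the sample feature map are illustrative.

```python
def max_pool2d(feature_map, size=2):
    """Downsample by taking the maximum of each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 1, 8],
]
pooled = max_pool2d(feature_map)
# pooled == [[6, 4], [7, 9]]: a 4x4 map becomes 2x2
```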

ConvNets
• Fully Connected Layer

ConvNets
• How to determine the weights?
– Learn them using backpropagation

ConvNets
• Loss Function
– Softmax
– Huber
– L2
Green: Huber, Blue: L2
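
The three losses named on the slide can be written out as follows. This is a sketch under stated assumptions: "Softmax" is taken to mean softmax followed by cross-entropy, the L2 loss uses the common 1/2 factor, and the Huber threshold `delta=1.0` is an arbitrary default, none of which the slides specify.

```python
import math


def softmax_cross_entropy(scores, target):
    """Negative log-probability of class `target` under softmax(scores)."""
    m = max(scores)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]


def l2_loss(pred, target):
    """Squared error: grows quadratically with the residual."""
    return 0.5 * (pred - target) ** 2


def huber_loss(pred, target, delta=1.0):
    """Quadratic for small residuals, linear for large ones (less sensitive to outliers)."""
    r = abs(pred - target)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)


# For a residual of 3 the L2 loss is 4.5 but the Huber loss is only 2.5.
small = (l2_loss(0.5, 0.0), huber_loss(0.5, 0.0))  # identical inside the threshold
large = (l2_loss(3.0, 0.0), huber_loss(3.0, 0.0))
```

The gap between the two values in `large` is the difference the green and blue curves illustrate: the Huber loss penalizes large errors less steeply than L2.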

ConvNets
• How to determine the weights?
– Learn them using backpropagation
– Chain Rule
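
As a worked illustration of the chain rule in backpropagation, consider the small function f(x, y, z) = (x + y) * z. The function and the input values are chosen for this sketch. The forward pass computes the intermediate value q = x + y, and the backward pass multiplies local derivatives from the output back to each input.

```python
def forward_backward(x, y, z):
    """Return f = (x + y) * z and its gradients with respect to x, y, and z."""
    # Forward pass
    q = x + y
    f = q * z

    # Backward pass: start from df/df = 1 and apply the chain rule node by node
    df_dq = z            # f = q * z, so df/dq = z
    df_dz = q            # and df/dz = q
    df_dx = df_dq * 1.0  # q = x + y, so dq/dx = 1
    df_dy = df_dq * 1.0  # and dq/dy = 1
    return f, (df_dx, df_dy, df_dz)


value, grads = forward_backward(-2.0, 5.0, -4.0)
# value == -12.0, grads == (-4.0, -4.0, 3.0)
```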

Backpropagation
Slides from Stanford University Course CS231N: http://cs231n.stanford.edu/slides/winter1516_lecture4.pdf

Related Work
• Fully Convolutional Network (FCN)
• FlowNet
• Depth Map Prediction from a Single Image using a Multi-Scale Deep Network (Eigen et al. [2014])

Contribution
• 3D CNN end-to-end voxel-wise prediction
• Same network architecture for all three challenges.
• Introduces an approach for training with limited data.

Recap: Problem
• Input: Channels x # of Frames x Height x Width
• Output: K x # of Frames x Height x Width
Segmentation done by http://segmentit.sourceforge.net/
http://barkpost.com/wp-content/uploads/2013/03/oie_5181838bU3HJXJp.gif
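
The input/output relationship can be made concrete with a small shape helper. Everything specific here is an assumption for illustration: the 112 x 112 frame size, and the reading of K as the number of values predicted per voxel (8 class scores for the segmentation task described later, and presumably 2 flow components and 3 color channels for the other two tasks).

```python
def output_shape(input_shape, k):
    """Map an input of (channels, frames, height, width) to (K, frames, height, width).

    Only the channel dimension changes: every input voxel gets K predicted values.
    """
    _channels, frames, height, width = input_shape
    return (k, frames, height, width)


rgb_clip = (3, 16, 112, 112)   # hypothetical 16-frame RGB clip
gray_clip = (1, 16, 112, 112)  # hypothetical grayscale clip for coloring

segmentation_out = output_shape(rgb_clip, k=8)  # one score per class
flow_out = output_shape(rgb_clip, k=2)          # assumed: horizontal and vertical flow
coloring_out = output_shape(gray_clip, k=3)     # assumed: three color channels
```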

Method
• Adapted from C3D
• Main Difference: Added deconvolution layers
“Learning Spatiotemporal Features with 3D Convolutional Networks”, Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

Deconvolution
“Visualizing and Understanding Convolutional Networks”, Matthew D Zeiler, Rob Fergus
[Figure panels: Layer 1, Layer 2]
[Figure labels: Upsampling, Learnable Deconvolution, Visualization Deconvolution]
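
A learnable deconvolution layer is commonly implemented as a transposed convolution: each input value is multiplied by the kernel and added into a larger output at strided positions. The sketch below is a 1D simplification with hand-picked kernels, whereas the network discussed here works in 3D and learns its kernel weights. The function name is an assumption for this example.

```python
def transposed_conv1d(signal, kernel, stride=2):
    """Upsample `signal` by scattering a scaled copy of `kernel` for every input value."""
    out_len = (len(signal) - 1) * stride + len(kernel)
    output = [0.0] * out_len
    for i, value in enumerate(signal):
        for k, weight in enumerate(kernel):
            output[i * stride + k] += value * weight
    return output


signal = [1.0, 2.0, 3.0]

# A kernel of ones repeats each value: nearest-neighbor upsampling.
nearest = transposed_conv1d(signal, [1.0, 1.0])
# nearest == [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]

# A triangular kernel blends neighbors: linear-interpolation-like upsampling.
linear = transposed_conv1d(signal, [0.5, 1.0, 0.5])
# linear == [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.5]
```

Fixed kernels like these reproduce standard interpolation schemes; making the kernel a trainable parameter lets the network choose its own upsampling.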

Experiments and Results
• Video Semantic Segmentation
• Optical Flow Estimation
• Video Coloring

Experiments: Video Semantic Segmentation
• Dataset:
– GATECH dataset
– Training set: 63 videos
– Test set: 38 sequences
– 8 Classes
“Geometric Context from Videos”, Hussain Raza, Matthias Grundmann, Irfan Essa

Experiments: Video Semantic Segmentation
• Experiment:
– Training:
  • Split each video into all possible clips of length 16 frames (i.e. stride: 1).
– Testing:
  • Performed on all non-overlapping clips (i.e. stride: 16).
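
The difference between the training and testing clip splits comes down to the stride between clip start frames. A minimal sketch, assuming a hypothetical 40-frame video; the function name is chosen for this example, and how trailing frames that do not fill a whole clip are handled is not stated on the slide (they are simply dropped here).

```python
def clip_starts(num_frames, clip_len=16, stride=1):
    """Return the start index of every full clip of `clip_len` frames."""
    return list(range(0, num_frames - clip_len + 1, stride))


num_frames = 40  # hypothetical video length

train_clips = clip_starts(num_frames, stride=1)   # overlapping: starts 0, 1, ..., 24
test_clips = clip_starts(num_frames, stride=16)   # non-overlapping: starts 0 and 16

# len(train_clips) == 25, test_clips == [0, 16]
```

Stride 1 yields far more (heavily overlapping) training clips from the same video, which helps when the dataset has only 63 training videos.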

Experiments: Video Semantic Segmentation
• Experiment:
– Network details (V2V):
  • Loss layer: Softmax
  • Weights initialized from C3D. New layers are randomly initialized.
  • Initial learning rate: 10^-4, divided by 10 every 30K iterations
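
The learning-rate schedule on this slide is a step decay. A minimal sketch, with the function name and parameter names chosen for illustration and the defaults taken from the slide:

```python
def step_learning_rate(iteration, base_lr=1e-4, step=30_000, factor=10.0):
    """Divide the learning rate by `factor` once every `step` iterations."""
    return base_lr / (factor ** (iteration // step))


schedule = [step_learning_rate(i) for i in (0, 29_999, 30_000, 60_000)]
# approximately [1e-4, 1e-4, 1e-5, 1e-6]
```

The same helper covers the other two experiments by passing `base_lr=1e-8` and `step=200_000`.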

Results: Video Semantic Segmentation
[Figure labels: Bilinear; Smooth, Noisy, Net depth]

Experiments: Optical Flow Estimation
• Training:
– Problem:
  • No large dataset with optical flow ground truth.
– Solution?
  • Fabricate “semi-truth” from an existing optical flow method.
  • Brox’s method was used.
– Dataset:
  • (V2V) UCF101 (Partial: test split 1)
  • (Fine-tuned V2V) MPI-Sintel
• Network:
– Loss function: Huber loss
– Initial learning rate: 10^-8, divided by 10 every 200K iterations

Results: Optical Flow Estimation
• Testing:
– MPI-Sintel
[Figure columns: Input, V2V, Brox, Ground truth]

Results: Optical Flow Estimation
• Fine-tuning from C3D does not improve results much.
• Same Architecture, Different Purpose

Experiments: Video Coloring
• Dataset:
– UCF101
– Convert color videos to grayscale.
• Experiment:
– Training:
  • Loss function: L2
  • Initial learning rate: 10^-8, divided by 10 every 200K iterations

Results: Video Coloring
Network    Average Distance Error (ADE)
2D-V2V     0.1495
V2V        0.1375
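
The slide does not define how ADE is computed. The sketch below assumes it is the mean Euclidean distance between the predicted and ground-truth color vectors of each pixel; the function name and the sample colors are hypothetical and do not reproduce the reported numbers.

```python
import math


def average_distance_error(predicted, ground_truth):
    """Mean Euclidean distance between paired color vectors (assumed definition of ADE)."""
    distances = [
        math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_px, true_px)))
        for pred_px, true_px in zip(predicted, ground_truth)
    ]
    return sum(distances) / len(distances)


# Two hypothetical pixels with colors scaled to [0, 1].
predicted = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
ade = average_distance_error(predicted, ground_truth)
# ade == 0.5: the first pixel is exact, the second is off by 1.0
```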

Results: Video Coloring
• V2V learns “common sense” colors
[Figure labels: Input, Ground Truth, V2V]

Conclusion
• Contributions:
– 3D CNN end-to-end voxel-wise prediction
– “Same” network architecture for all three challenges.
– Utilizes a well-established method to generate training data.
• Criticisms
– Fine-tuning improved the result in optical flow (OF), noticeably in comparison with Brox’s method
– No mention of the activation function, even in C3D

Thank You for Listening
Questions?

References
• “Deep End2End Voxel2Voxel Prediction” – Tran et al. 2015
• “FlowNet: Learning optical flow with convolutional networks” – Fischer et al. 2015
• “ImageNet classification with deep convolutional neural networks” – Krizhevsky et al. 2012
• “Learning spatiotemporal features with 3D convolutional networks” – Tran et al. 2015
• “Visualizing and understanding convolutional networks” – Zeiler et al. 2014
• “Fully convolutional networks for semantic segmentation” – Long et al. 2015
• “Depth map prediction from a single image using a multi-scale deep network” – Eigen et al. 2014
• “Large displacement optical flow: Descriptor matching in variational motion estimation” – Brox et al. 2011

Backup Slides

Multi-layer Perceptron
• A perceptron is a linear classifier that uses a set of weights to predict an output for a feature vector.
https://blog.dbrgn.ch/images/2013/3/26/perceptron.png
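
A perceptron's prediction can be sketched as a weighted sum followed by a threshold. The bias term, the step threshold at zero, and the hand-picked weights implementing a logical AND are assumptions for this example; the slide mentions only the weights.

```python
def perceptron_predict(weights, bias, features):
    """Return 1 if the weighted sum of the features plus the bias is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


# Hand-picked (not learned) weights that make the perceptron compute logical AND.
weights, bias = [1.0, 1.0], -1.5
outputs = [perceptron_predict(weights, bias, x) for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
# outputs == [0, 0, 0, 1]
```

Stacking layers of such units, with a differentiable activation in place of the hard threshold, gives the multi-layer perceptron that CNNs are described as variants of.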