ece 6504: deep learning for perception dhruv batra virginia tech topics: –(finish) backprop...
TRANSCRIPT
![Page 1: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/1.jpg)
ECE 6504: Deep Learningfor Perception
Dhruv Batra
Virginia Tech
Topics: – (Finish) Backprop– Convolutional Neural Nets
![Page 2: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/2.jpg)
Administrativia• Presentation Assignments
– https://docs.google.com/spreadsheets/d/1m76E4mC0wfRjc4HRBWFdAlXKPIzlEwfw1-u7rBw9TJ8/edit#gid=2045905312
(C) Dhruv Batra 2
![Page 3: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/3.jpg)
Recap of last time
(C) Dhruv Batra 3
![Page 4: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/4.jpg)
Last Time• Notation + Setup• Neural Networks• Chain Rule + Backprop
(C) Dhruv Batra 4
![Page 5: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/5.jpg)
5
Recall: The Neuron Metaphor• Neurons
• accept information from multiple inputs, • transmit information to other neurons.
• Artificial neuron• Multiply inputs by weights along edges• Apply some function to the set of inputs at each node
Image Credit: Andrej Karpathy, CS231n
![Page 6: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/6.jpg)
Activation Functions• sigmoid vs tanh
(C) Dhruv Batra 6
![Page 7: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/7.jpg)
A quick note
(C) Dhruv Batra 7Image Credit: LeCun et al. ‘98
![Page 8: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/8.jpg)
Rectified Linear Units (ReLU)
(C) Dhruv Batra 8
![Page 9: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/9.jpg)
(C) Dhruv Batra 9
![Page 10: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/10.jpg)
(C) Dhruv Batra 10
![Page 11: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/11.jpg)
Visualizing Loss Functions• Sum of individual losses
(C) Dhruv Batra 11Image Credit: Andrej Karpathy, CS231n
![Page 12: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/12.jpg)
Detour
(C) Dhruv Batra 12
![Page 13: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/13.jpg)
Logistic Regression as a Cascade
(C) Dhruv Batra 13Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
![Page 14: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/14.jpg)
Key Computation: Forward-Prop
(C) Dhruv Batra 14Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
![Page 15: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/15.jpg)
Key Computation: Back-Prop
(C) Dhruv Batra 15Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
![Page 16: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/16.jpg)
Plan for Today• MLPs
– Notation– Backprop
• CNNs– Notation– Convolutions– Forward pass– Backward pass
(C) Dhruv Batra 16
![Page 17: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/17.jpg)
Multilayer Networks• Cascade Neurons together• The output from one layer is the input to the next• Each Layer has its own sets of weights
(C) Dhruv Batra 17Image Credit: Andrej Karpathy, CS231n
![Page 18: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/18.jpg)
Equivalent Representations
(C) Dhruv Batra 18Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
![Page 19: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/19.jpg)
19
Question: Does BPROP work with ReLU layers only?
Answer: Nope, any a.e. differentiable transformation works.
Question: What's the computational cost of BPROP?
Answer: About twice FPROP (need to compute gradients w.r.t. input and parameters at every layer).
Note: FPROP and BPROP are dual of each other. E.g.,:
+
+
FPROP BPROP
SU
MC
OPY
Slide Credit: Marc'Aurelio Ranzato, Yann LeCun(C) Dhruv Batra
Backward Propagation
![Page 20: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/20.jpg)
20
Example: 200x200 image 40K hidden units
~2B parameters!!!
- Spatial correlation is local- Waste of resources + we have not enough training samples anyway..
Fully Connected Layer
Slide Credit: Marc'Aurelio Ranzato
![Page 21: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/21.jpg)
21
Example: 200x200 image 40K hidden units Filter size: 10x10
4M parameters
Note: This parameterization is good when input image is registered (e.g., face recognition).
Locally Connected Layer
Slide Credit: Marc'Aurelio Ranzato
![Page 22: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/22.jpg)
22
STATIONARITY? Statistics is similar at different locations
Note: This parameterization is good when input image is registered (e.g., face recognition).
Example: 200x200 image 40K hidden units Filter size: 10x10
4M parameters
Locally Connected Layer
Slide Credit: Marc'Aurelio Ranzato
![Page 23: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/23.jpg)
23
Share the same parameters across different locations (assuming input is stationary):Convolutions with learned kernels
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato
![Page 24: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/24.jpg)
(C) Dhruv Batra 24
"Convolution of box signal with itself2" by Convolution_of_box_signal_with_itself.gif: Brian Ambergderivative work: Tinos (talk) - Convolution_of_box_signal_with_itself.gif. Licensed under CC BY-SA 3.0 via Commons -
https://commons.wikimedia.org/wiki/File:Convolution_of_box_signal_with_itself2.gif#/media/File:Convolution_of_box_signal_with_itself2.gif
![Page 25: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/25.jpg)
Convolution Explained• http://setosa.io/ev/image-kernels/
• https://github.com/bruckner/deepViz
(C) Dhruv Batra 25
![Page 26: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/26.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 26
![Page 27: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/27.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 27
![Page 28: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/28.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 28
![Page 29: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/29.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 29
![Page 30: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/30.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 30
![Page 31: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/31.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 31
![Page 32: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/32.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 32
![Page 33: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/33.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 33
![Page 34: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/34.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 34
![Page 35: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/35.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 35
![Page 36: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/36.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 36
![Page 37: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/37.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 37
![Page 38: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/38.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 38
![Page 39: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/39.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 39
![Page 40: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/40.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 40
![Page 41: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/41.jpg)
Mathieu et al. “Fast training of CNNs through FFTs” ICLR 2014
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 41
![Page 42: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/42.jpg)
*
-1 0 1-1 0 1-1 0 1
=
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 42
![Page 43: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/43.jpg)
Learn multiple filters.
E.g.: 200x200 image 100 Filters Filter size: 10x10
10K parameters
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 43
![Page 44: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/44.jpg)
Convolutional Netsa
(C) Dhruv Batra 44Image Credit: Yann LeCun, Kevin Murphy
![Page 45: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/45.jpg)
Conv.
layerh1n−1
h2n−1
h3n−1
h 1n
h 2n
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 45
output feature
map
input feature
map
kernel
![Page 46: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/46.jpg)
h1n−1
h2n−1
h3n−1
h 1n
h 2n
output feature
map
input feature
map
kernel
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 46
![Page 47: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/47.jpg)
h1n−1
h2n−1
h3n−1
h 1n
h 2n
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 47
output feature
map
input feature
map
kernel
![Page 48: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/48.jpg)
Question: What is the size of the output? What's the computational cost?
Answer: It is proportional to the number of filters and depends on the stride. If kernels have size KxK, input has size DxD, stride is 1, and there are M input feature maps and N output feature maps then:- the input has size M@DxD - the output has size N@(D-K+1)x(D-K+1)- the kernels have MxNxKxK coefficients (which have to be learned)- cost: M*K*K*N*(D-K+1)*(D-K+1)Question: How many feature maps? What's the size of the filters?
Answer: Usually, there are more output feature maps than input feature maps. Convolutional layers can increase the number of hidden units by big factors (and are expensive to compute).The size of the filters has to match the size/scale of the patterns we want to detect (task dependent).
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 48
![Page 49: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/49.jpg)
A standard neural net applied to images:
- scales quadratically with the size of the input
- does not leverage stationarity
Solution:
- connect each hidden unit to a small patch of the input
- share the weight across space
This is called: convolutional layer.A network with convolutional layers is called convolutional network.
LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998
Key Ideas
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 49
![Page 50: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/50.jpg)
Let us assume filter is an “eye” detector.
Q.: how can we make the detection robust to the exact location of the eye?
Pooling Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 50
![Page 51: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/51.jpg)
By “pooling” (e.g., taking max) filter
responses at different locations we gain robustness to the exact spatial location of features.
Pooling Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 51
![Page 52: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/52.jpg)
Max-pooling:
Average-pooling:
L2-pooling:
L2-pooling over features:
Pooling Layer: Examples
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 52
![Page 53: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/53.jpg)
Question: What is the size of the output? What's the computational cost?
Answer: The size of the output depends on the stride between the pools. For instance, if pools do not overlap and have size KxK, and the input has size DxD with M input feature maps, then:- output is M@(D/K)x(D/K)- the computational cost is proportional to the size of the input (negligible compared to a convolutional layer)
Question: How should I set the size of the pools?
Answer: It depends on how much “invariant” or robust to distortions we want the representation to be. It is best to pool slowly (via a few stacks of conv-pooling layers).
Pooling Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 53
![Page 54: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/54.jpg)
Task: detect orientation L/R
Conv layer: linearizes manifold
Pooling Layer: Interpretation
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 54
![Page 55: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/55.jpg)
Conv layer: linearizes manifold
Pooling layer: collapses manifold
Task: detect orientation L/R
Pooling Layer: Interpretation
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 55
![Page 56: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/56.jpg)
Conv.
layer
hn−1 hn
Pool.
layer
hn1
If convolutional filters have size KxK and stride 1, and pooling layer has pools of size PxP, then each unit in the pooling layer depends upon a patch (at the input of the preceding conv. layer) of size: (P+K-1)x(P+K-1)
Pooling Layer: Receptive Field Size
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 56
![Page 57: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/57.jpg)
Conv.
layer
hn−1 hn
Pool.
layer
hn1
If convolutional filters have size KxK and stride 1, and pooling layer has pools of size PxP, then each unit in the pooling layer depends upon a patch (at the input of the preceding conv. layer) of size: (P+K-1)x(P+K-1)
Pooling Layer: Receptive Field Size
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 57
![Page 58: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/58.jpg)
Convol. Pooling
One stage (zoom)
ConvNets: Typical Stage
courtesy of K. Kavukcuoglu
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 58
![Page 59: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/59.jpg)
Convol. Pooling
One stage (zoom)
ConvNets: Typical Stage
Conceptually similar to: SIFT, HoG, etc.
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 59
![Page 60: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/60.jpg)
courtesy of K. Kavukcuoglu
Note: after one stage the number of feature maps is usually increased (conv. layer) and the spatial resolution is usually decreased (stride in conv. and pooling layers). Receptive field gets bigger.
Reasons:- gain invariance to spatial translation (pooling layer)- increase specificity of features (approaching object specific units)
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 60
![Page 61: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/61.jpg)
One stage (zoom)
Fully Conn. Layers
Whole system
1st stage 2nd stage 3rd stage
Input Image
ClassLabels
Convol. Pooling
ConvNets: Typical Architecture
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 61
![Page 62: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/62.jpg)
Visualizing Learned Filters
(C) Dhruv Batra 62Figure Credit: [Zeiler & Fergus ECCV14]
![Page 63: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/63.jpg)
Visualizing Learned Filters
(C) Dhruv Batra 63Figure Credit: [Zeiler & Fergus ECCV14]
![Page 64: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/64.jpg)
Visualizing Learned Filters
(C) Dhruv Batra 64Figure Credit: [Zeiler & Fergus ECCV14]
![Page 65: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/65.jpg)
Frome et al. “Devise: a deep visual semantic embedding model” NIPS 2013
CNNText
Embedding
tiger
Matching
shared representation
Fancier Architectures: Multi-Modal
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 65
![Page 66: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/66.jpg)
Zhang et al. “PANDA..” CVPR 2014
Conv
Norm
Pool
Conv
Norm
Pool
Conv
Norm
Pool
Conv
Norm
Pool
Fully
Conn.
Fully
Conn.
Fully
Conn.
Fully
Conn.
.. .
Attr. 1
Attr. 2
Attr. N
image
Fancier Architectures: Multi-Task
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 66
![Page 67: ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets](https://reader035.vdocument.in/reader035/viewer/2022081603/56649f505503460f94c72aa4/html5/thumbnails/67.jpg)
Any DAG of differentialble modules is allowed!
Fancier Architectures: Generic DAG
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 67