ECE 6504: Deep Learning for Perception
Dhruv Batra, Virginia Tech
Topics: (Finish) Backprop; Convolutional Neural Nets


Page 1:

ECE 6504: Deep Learning for Perception

Dhruv Batra, Virginia Tech

Topics:
– (Finish) Backprop
– Convolutional Neural Nets

Page 2:

Administrativia •  Presentation Assignments

–  https://docs.google.com/spreadsheets/d/1m76E4mC0wfRjc4HRBWFdAlXKPIzlEwfw1-u7rBw9TJ8/edit#gid=2045905312

(C) Dhruv Batra 2

Page 3:

Recap of last time

(C) Dhruv Batra 3

Page 4:

Last Time
• Notation + Setup
• Neural Networks
• Chain Rule + Backprop

(C) Dhruv Batra 4

Page 5:

Recall: The Neuron Metaphor
• Neurons
  – accept information from multiple inputs,
  – transmit information to other neurons.
• Artificial neuron
  – Multiply inputs by weights along edges
  – Apply some function to the set of inputs at each node

5 Image Credit: Andrej Karpathy, CS231n

Page 6:

Activation Functions
• sigmoid vs tanh

(C) Dhruv Batra 6
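A minimal NumPy sketch of the two activations on this slide and their derivatives (the derivatives are what backprop consumes); the function names are illustrative, not from the slides:

import numpy as np

def sigmoid(x):
    # squashes to (0, 1); not zero-centered
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # peaks at 0.25, so gradients shrink

def tanh_grad(x):
    # tanh squashes to (-1, 1) and is zero-centered; peak slope is 1
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-4.0, 4.0, 5)
print(sigmoid(x), np.tanh(x))
print(sigmoid_grad(x), tanh_grad(x))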

Page 7:

A quick note

(C) Dhruv Batra 7 Image Credit: LeCun et al. ‘98

Page 8:

Rectified Linear Units (ReLU)

(C) Dhruv Batra 8
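The slide introduces ReLU by name only; as a hedged sketch, the usual max(0, x) definition and the (sub)gradient that backprop uses:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)       # zero for negative inputs, identity otherwise

def relu_backward(dout, x):
    # pass the upstream gradient only where the input was positive
    return dout * (x > 0)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))
print(relu_backward(np.ones_like(x), x))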

Page 9:

(C) Dhruv Batra 9

Page 10:

(C) Dhruv Batra 10

Page 11:

Visualizing Loss Functions
• Sum of individual losses

(C) Dhruv Batra 11 Image Credit: Andrej Karpathy, CS231n

Page 12:

Detour

(C) Dhruv Batra 12

Page 13:

Logistic Regression as a Cascade

(C) Dhruv Batra 13

[Figure: the logistic regression loss written as a cascade of simple modules applied to w and x.]

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Page 14:

Key Computation: Forward-Prop

(C) Dhruv Batra 14 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Page 15:

Key Computation: Back-Prop

(C) Dhruv Batra 15 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
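The cascade view on the last three slides is easiest to see in code: forward-prop runs the modules left to right to produce the loss, and back-prop walks them right to left applying the chain rule. A minimal NumPy sketch for logistic regression on one positive example (variable names are mine, not the slides'):

import numpy as np

def fprop(w, x):
    z = np.dot(w, x)                 # module 1: linear score w.x
    p = 1.0 / (1.0 + np.exp(-z))     # module 2: sigmoid
    L = -np.log(p)                   # module 3: negative log-likelihood
    return z, p, L

def bprop(w, x, z, p):
    dL_dp = -1.0 / p                 # gradient of module 3
    dp_dz = p * (1.0 - p)            # gradient of module 2
    dL_dz = dL_dp * dp_dz            # chain rule: equals (p - 1)
    dL_dw = dL_dz * x                # gradient of module 1 w.r.t. parameters
    return dL_dw

w, x = np.array([0.5, -1.0]), np.array([1.0, 2.0])
z, p, L = fprop(w, x)
print(L, bprop(w, x, z, p))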

Page 16:

Plan for Today
• MLPs
  – Notation
  – Backprop
• CNNs
  – Notation
  – Convolutions
  – Forward pass
  – Backward pass

(C) Dhruv Batra 16

Page 17:

Multilayer Networks
• Cascade neurons together
• The output from one layer is the input to the next
• Each layer has its own set of weights

(C) Dhruv Batra 17 Image Credit: Andrej Karpathy, CS231n
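A minimal sketch of that cascade in NumPy: two layers, each with its own set of weights, the output of one layer feeding the next (layer sizes and the tanh nonlinearity are illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                           # input

W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)    # layer 1 weights
W2, b2 = rng.standard_normal((3, 8)), np.zeros(3)    # layer 2 weights

h = np.tanh(W1 @ x + b1)                             # hidden activations
scores = W2 @ h + b2                                 # output layer
print(scores.shape)                                  # (3,)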

Page 18:

Equivalent Representations

(C) Dhruv Batra 18 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Page 19:

Backward Propagation

Question: Does BPROP work with ReLU layers only?

Answer: Nope, any a.e. differentiable transformation works.

Question: What's the computational cost of BPROP?

Answer: About twice FPROP (need to compute gradients w.r.t. input and parameters at every layer).

Note: FPROP and BPROP are dual of each other. E.g., a SUM node in FPROP becomes a COPY node in BPROP (and vice versa).

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun (C) Dhruv Batra 19
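A sketch of where "about twice FPROP" comes from: a linear module's backward pass produces both a gradient w.r.t. its input (to keep the chain going to the layer below) and a gradient w.r.t. its parameters (for the update), each roughly as expensive as the forward product. Illustrative code, not from the slides:

import numpy as np

class Linear:
    def __init__(self, n_in, n_out, rng):
        self.W = 0.01 * rng.standard_normal((n_out, n_in))

    def fprop(self, x):
        self.x = x                            # cache the input for the backward pass
        return self.W @ x                     # one matrix-vector product

    def bprop(self, dout):
        self.dW = np.outer(dout, self.x)      # gradient w.r.t. parameters
        return self.W.T @ dout                # gradient w.r.t. input

rng = np.random.default_rng(0)
layer = Linear(5, 3, rng)
y = layer.fprop(rng.standard_normal(5))
dx = layer.bprop(np.ones(3))
print(y.shape, dx.shape, layer.dW.shape)      # (3,) (5,) (3, 5)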

Page 20:

Fully Connected Layer

Example: 200x200 image, 40K hidden units
→ ~2B parameters!!! (200×200 inputs × 40,000 units ≈ 1.6 billion weights)

- Spatial correlation is local
- Waste of resources + we do not have enough training samples anyway.

Slide Credit: Marc'Aurelio Ranzato

Page 21:

Locally Connected Layer

Example: 200x200 image, 40K hidden units, filter size 10x10
→ 4M parameters (40,000 units × one 10×10 patch each)

Note: This parameterization is good when the input image is registered (e.g., face recognition).

Slide Credit: Marc'Aurelio Ranzato

Page 22:

Locally Connected Layer

STATIONARITY? Statistics are similar at different locations.

Note: This parameterization is good when the input image is registered (e.g., face recognition).

Example: 200x200 image, 40K hidden units, filter size 10x10
→ 4M parameters

Slide Credit: Marc'Aurelio Ranzato

Page 23:

23

Share the same parameters across different locations (assuming input is stationary): Convolutions with learned kernels

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato

Page 24:

(C) Dhruv Batra 24

"Convolution of box signal with itself2" by Convolution_of_box_signal_with_itself.gif: Brian Ambergderivative work: Tinos (talk) - Convolution_of_box_signal_with_itself.gif. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/

wiki/File:Convolution_of_box_signal_with_itself2.gif#/media/File:Convolution_of_box_signal_with_itself2.gif

Page 25:

Convolution Explained •  http://setosa.io/ev/image-kernels/

•  https://github.com/bruckner/deepViz

(C) Dhruv Batra 25

Page 26:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 26

Page 27:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 27

Page 28:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 28

Page 29:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 29

Page 30:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 30

Page 31:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 31

Page 32:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 32

Page 33:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 33

Page 34:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 34

Page 35:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 35

Page 36:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 36

Page 37:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 37

Page 38:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 38

Page 39:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 39

Page 40:

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 40

Page 41:

Mathieu et al. “Fast training of CNNs through FFTs” ICLR 2014

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 41

Page 42:

Convolutional Layer

[Figure: input image * 3x3 kernel [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]] = output map highlighting vertical edges]

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 42
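A hedged sketch of the operation in the figure: slide the 3x3 kernel over every patch of the image and sum the elementwise products (cross-correlation, which is what CNN layers typically compute; "valid" placement, stride 1):

import numpy as np

def correlate2d_valid(image, kernel):
    K = kernel.shape[0]
    H, W = image.shape
    out = np.zeros((H - K + 1, W - K + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # elementwise product of the kernel with one K x K patch
            out[r, c] = np.sum(image[r:r + K, c:c + K] * kernel)
    return out

kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)   # responds to vertical edges
image = np.zeros((6, 6)); image[:, 3:] = 1.0   # step edge down the middle
print(correlate2d_valid(image, kernel))        # large responses along the edge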

Page 43:

Learn multiple filters.

E.g.: 200x200 image, 100 filters, filter size 10x10
→ 10K parameters (100 kernels × 10×10 weights each)

Convolutional Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 43
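A small arithmetic check of the running example across pages 20, 21, and 43 (the "~2B" on page 20 is the order of magnitude of 1.6 billion):

# 200x200 input image, 40K hidden units, 10x10 filters, 100 filters
fully_connected   = 200 * 200 * 40_000   # every unit sees every pixel      -> 1,600,000,000
locally_connected = 40_000 * 10 * 10     # every unit sees one 10x10 patch  ->     4,000,000
convolutional     = 100 * 10 * 10        # 100 shared 10x10 kernels         ->        10,000
print(fully_connected, locally_connected, convolutional)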

Page 44:

Convolutional Nets

(C) Dhruv Batra 44

[Figure: LeNet-5. INPUT 32x32 → Convolutions → C1: feature maps 6@28x28 → Subsampling → S2: feature maps 6@14x14 → Convolutions → C3: feature maps 16@10x10 → Subsampling → S4: feature maps 16@5x5 → C5: layer 120 → Full connection → F6: layer 84 → Gaussian connections → OUTPUT 10]

Image Credit: Yann LeCun, Kevin Murphy
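The feature-map sizes in the figure follow from 5x5 "valid" convolutions and 2x2 subsampling, which is what those numbers imply; a quick check:

def conv(d, k=5): return d - k + 1   # valid convolution, stride 1
def sub(d, p=2):  return d // p      # 2x2 subsampling

d = 32
d = conv(d); print("C1:", d)   # 28
d = sub(d);  print("S2:", d)   # 14
d = conv(d); print("C3:", d)   # 10
d = sub(d);  print("S4:", d)   # 5
d = conv(d); print("C5:", d)   # 1 (120 maps of 1x1, i.e. a 120-d vector)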

Page 45:

Convolutional Layer

[Figure: input feature maps h^{n-1}_1, h^{n-1}_2, h^{n-1}_3 connected to output feature maps h^n_1, h^n_2 through learned kernels.]

$h^n_i = \max\Big\{0,\ \sum_{j=1}^{\#\text{input channels}} h^{n-1}_j * w^n_{ij}\Big\}$

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 45

Page 46:

Convolutional Layer

[Figure: input feature maps h^{n-1}_1, h^{n-1}_2, h^{n-1}_3 connected to output feature maps h^n_1, h^n_2 through learned kernels.]

$h^n_i = \max\Big\{0,\ \sum_{j=1}^{\#\text{input channels}} h^{n-1}_j * w^n_{ij}\Big\}$

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 46

Page 47:

Convolutional Layer

[Figure: input feature maps h^{n-1}_1, h^{n-1}_2, h^{n-1}_3 connected to output feature maps h^n_1, h^n_2 through learned kernels.]

$h^n_i = \max\Big\{0,\ \sum_{j=1}^{\#\text{input channels}} h^{n-1}_j * w^n_{ij}\Big\}$

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 47
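A minimal NumPy sketch of this formula: each output map h^n_i is the ReLU of a sum over input maps of h^{n-1}_j correlated with its kernel w^n_ij (loop-based, stride 1, "valid"; shapes are illustrative):

import numpy as np

def conv_layer_forward(h_prev, w):
    # h_prev: (M, D, D) input feature maps; w: (N, M, K, K) learned kernels
    N, M, K, _ = w.shape
    D = h_prev.shape[-1]
    out = np.zeros((N, D - K + 1, D - K + 1))
    for i in range(N):                              # output feature maps
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                # sum over input channels j of (patch_j . w_ij)
                out[i, r, c] = np.sum(h_prev[:, r:r + K, c:c + K] * w[i])
    return np.maximum(0.0, out)                     # the max{0, .} nonlinearity

rng = np.random.default_rng(0)
h_prev = rng.standard_normal((3, 8, 8))             # M=3 input maps, 8x8
w = rng.standard_normal((2, 3, 3, 3))               # N=2 output maps, 3x3 kernels
print(conv_layer_forward(h_prev, w).shape)          # (2, 6, 6)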

Page 48:

Convolutional Layer

Question: What is the size of the output? What's the computational cost?

Answer: It is proportional to the number of filters and depends on the stride. If kernels have size KxK, the input has size DxD, the stride is 1, and there are M input feature maps and N output feature maps, then:
- the input has size M@DxD
- the output has size N@(D-K+1)x(D-K+1)
- the kernels have MxNxKxK coefficients (which have to be learned)
- cost: M*K*K*N*(D-K+1)*(D-K+1)

Question: How many feature maps? What's the size of the filters?

Answer: Usually, there are more output feature maps than input feature maps. Convolutional layers can increase the number of hidden units by big factors (and are expensive to compute). The size of the filters has to match the size/scale of the patterns we want to detect (task dependent).

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 48
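The counts above, restated as a tiny helper (a hypothetical function, just plugging into the slide's formulas):

def conv_layer_stats(D, K, M, N):
    out = D - K + 1                       # output width/height (stride 1, no padding)
    params = M * N * K * K                # kernel coefficients to learn
    macs = M * K * K * N * out * out      # multiply-accumulates for one input
    return out, params, macs

# e.g. a 200x200 input with 3 input maps, 100 output maps, 10x10 kernels
print(conv_layer_stats(D=200, K=10, M=3, N=100))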

Page 49:

Key Ideas

A standard neural net applied to images:
- scales quadratically with the size of the input
- does not leverage stationarity

Solution:
- connect each hidden unit to a small patch of the input
- share the weights across space

This is called a convolutional layer. A network with convolutional layers is called a convolutional network.

LeCun et al. "Gradient-based learning applied to document recognition" IEEE 1998

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 49

Page 50:

Let us assume the filter is an "eye" detector.

Q.: how can we make the detection robust to the exact location of the eye?

Pooling Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 50

Page 51:

By "pooling" (e.g., taking max) filter responses at different locations, we gain robustness to the exact spatial location of features.

Pooling Layer

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 51

Page 52:

Pooling Layer: Examples

Max-pooling: $h^n_i(r, c) = \max_{\bar{r} \in N(r),\, \bar{c} \in N(c)} h^{n-1}_i(\bar{r}, \bar{c})$

Average-pooling: $h^n_i(r, c) = \operatorname{mean}_{\bar{r} \in N(r),\, \bar{c} \in N(c)} h^{n-1}_i(\bar{r}, \bar{c})$

L2-pooling: $h^n_i(r, c) = \sqrt{\sum_{\bar{r} \in N(r),\, \bar{c} \in N(c)} h^{n-1}_i(\bar{r}, \bar{c})^2}$

L2-pooling over features: $h^n_i(r, c) = \sqrt{\sum_{j \in N(i)} h^{n-1}_j(r, c)^2}$

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 52
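A hedged NumPy sketch of the first two variants for non-overlapping K x K pools on one feature map (the L2 versions replace the max/mean with a root-sum-of-squares):

import numpy as np

def pool2d(h, K, mode="max"):
    # h: (D, D) feature map with D divisible by K; non-overlapping K x K pools
    D = h.shape[0]
    patches = h.reshape(D // K, K, D // K, K).swapaxes(1, 2)   # (D/K, D/K, K, K)
    if mode == "max":
        return patches.max(axis=(2, 3))
    return patches.mean(axis=(2, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, 2, "max"))    # [[ 5.  7.] [13. 15.]]
print(pool2d(x, 2, "mean"))   # [[ 2.5  4.5] [10.5 12.5]]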

Page 53:

Pooling Layer

Question: What is the size of the output? What's the computational cost?

Answer: The size of the output depends on the stride between the pools. For instance, if pools do not overlap and have size KxK, and the input has size DxD with M input feature maps, then:
- the output is M@(D/K)x(D/K)
- the computational cost is proportional to the size of the input (negligible compared to a convolutional layer)

Question: How should I set the size of the pools?

Answer: It depends on how much "invariant" or robust to distortions we want the representation to be. It is best to pool slowly (via a few stacks of conv-pooling layers).

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 53

Page 54:

Task: detect orientation L/R

Conv layer: linearizes manifold

Pooling Layer: Interpretation

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 54

Page 55:

Conv layer: linearizes manifold

Pooling layer: collapses manifold

Task: detect orientation L/R Pooling Layer: Interpretation

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 55

Page 56:

Pooling Layer: Receptive Field Size

[Figure: the conv. layer maps h^{n-1} to h^n; the pooling layer maps h^n to h^{n+1}.]

If convolutional filters have size KxK and stride 1, and the pooling layer has pools of size PxP, then each unit in the pooling layer depends upon a patch (at the input of the preceding conv. layer) of size (P+K-1)x(P+K-1).

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 56

Page 57:

Pooling Layer: Receptive Field Size

[Figure: the conv. layer maps h^{n-1} to h^n; the pooling layer maps h^n to h^{n+1}.]

If convolutional filters have size KxK and stride 1, and the pooling layer has pools of size PxP, then each unit in the pooling layer depends upon a patch (at the input of the preceding conv. layer) of size (P+K-1)x(P+K-1).

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 57
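A quick numeric check of the (P+K-1) claim for a KxK convolution with stride 1 followed by a PxP pool (illustrative, one spatial dimension only):

K, P = 5, 2
# conv output unit r covers input positions r .. r + K - 1
# pooling unit 0 pools conv outputs 0 .. P - 1,
# so it sees input positions 0 .. (P - 1) + (K - 1)
receptive_field = (P - 1) + (K - 1) + 1
print(receptive_field, "==", P + K - 1)   # 6 == 6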

Page 58:

ConvNets: Typical Stage

[Figure: one stage (zoom): Convol. → Pooling]

courtesy of K. Kavukcuoglu

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 58

Page 59:

ConvNets: Typical Stage

[Figure: one stage (zoom): Convol. → Pooling]

Conceptually similar to: SIFT, HoG, etc.

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 59

Page 60:

courtesy of K. Kavukcuoglu

Note: after one stage the number of feature maps is usually increased (conv. layer) and the spatial resolution is usually decreased (stride in conv. and pooling layers). Receptive field gets bigger.

Reasons: - gain invariance to spatial translation (pooling layer) - increase specificity of features (approaching object specific units)

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 60

Page 61:

ConvNets: Typical Architecture

[Figure: One stage (zoom) = Convol. → Pooling. Whole system: Input Image → 1st stage → 2nd stage → 3rd stage → Fully Conn. Layers → Class Labels.]

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 61
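A hedged end-to-end sketch of that whole system with tiny shapes: two conv + pool stages followed by a fully connected layer producing class scores (names and sizes are illustrative, not a real architecture):

import numpy as np

def conv_relu(h, w):                       # h: (M, D, D), w: (N, M, K, K), stride 1, valid
    N, M, K, _ = w.shape
    D = h.shape[-1]
    out = np.zeros((N, D - K + 1, D - K + 1))
    for i in range(N):
        for r in range(out.shape[1]):
            for c in range(out.shape[2]):
                out[i, r, c] = np.sum(h[:, r:r + K, c:c + K] * w[i])
    return np.maximum(0.0, out)

def max_pool(h, P):                        # non-overlapping P x P pools per map
    N, D, _ = h.shape
    return h.reshape(N, D // P, P, D // P, P).max(axis=(2, 4))

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 14, 14))                 # input image, 1 channel
w1 = 0.1 * rng.standard_normal((4, 1, 3, 3))         # 1st stage kernels
w2 = 0.1 * rng.standard_normal((8, 4, 3, 3))         # 2nd stage kernels
W_fc = 0.1 * rng.standard_normal((10, 8 * 2 * 2))    # fully connected layer

h = max_pool(conv_relu(x, w1), 2)                    # 1st stage: 4@6x6
h = max_pool(conv_relu(h, w2), 2)                    # 2nd stage: 8@2x2
scores = W_fc @ h.reshape(-1)                        # class scores
print(scores.shape)                                  # (10,)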

Page 62:

Visualizing Learned Filters

(C) Dhruv Batra 62 Figure Credit: [Zeiler & Fergus ECCV14]

Page 63:

Visualizing Learned Filters

(C) Dhruv Batra 63 Figure Credit: [Zeiler & Fergus ECCV14]

Page 64:

Visualizing Learned Filters

(C) Dhruv Batra 64 Figure Credit: [Zeiler & Fergus ECCV14]

Page 65:

Fancier Architectures: Multi-Modal

Frome et al. "DeViSE: A Deep Visual-Semantic Embedding Model" NIPS 2013

[Figure: a CNN (image branch) and a Text embedding branch (e.g., the word "tiger") matched in a shared representation.]

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 65

Page 66:

Fancier Architectures: Multi-Task

Zhang et al. "PANDA.." CVPR 2014

[Figure: image → a stack of (Conv → Norm → Pool) stages → Fully Conn. layers → separate heads for Attr. 1, Attr. 2, ..., Attr. N]

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 66
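A hedged sketch of the multi-task pattern in the figure: one shared trunk representation feeding a separate fully connected head per attribute (sizes and names are illustrative, not PANDA's actual configuration):

import numpy as np

rng = np.random.default_rng(0)
trunk = rng.standard_normal(512)                  # shared conv-trunk features for one image

n_attributes, classes_per_attr = 4, 2
heads = [rng.standard_normal((classes_per_attr, 512)) for _ in range(n_attributes)]

# each attribute gets its own fully connected head on the shared representation
attr_scores = [W @ trunk for W in heads]
print([s.shape for s in attr_scores])             # four heads, 2 scores each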

Page 67:

Fancier Architectures: Generic DAG

Any DAG of differentiable modules is allowed!

Slide Credit: Marc'Aurelio Ranzato (C) Dhruv Batra 67