![Page 1: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/1.jpg)
Deep Learning
Eric Xing(and Pengtao Xie)
Lecture 8, October 6, 2015
Machine Learning
10-701, Fall 2015
© Eric Xing @ CMU, 2015 1
![Page 2: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/2.jpg)
Courtesy: Lee and Ng
A perennial challenge in computer vision: feature engineering
SIFT Spin image
HoG RIFT
Textons GLOH
Drawbacks of feature engineering1. Needs expert knowledge2. Time consuming hand-tuning
2© Eric Xing @ CMU, 2015
![Page 3: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/3.jpg)
Automatic feature learning? Successful learning of intermediate representations
[Lee et al ICML 2009, Lee et al NIPS 2009]
© Eric Xing @ CMU, 2015 3
![Page 4: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/4.jpg)
“Deep” models Neural Networks: Feed-forward*
You have seen it
Autoencoders (multilayer neural net with target output = input) Non-probabilistic -- Directed: PCA, Sparse Coding Probabilistic -- Undirected: MRFs and RBMs*
Convolutional Neural Nets Recursive Neural Networks*
© Eric Xing @ CMU, 2015 4
![Page 5: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/5.jpg)
Neural Network
Input
Hidden
Output
1x 2x 3x 4x 5x
1z 2z 3z 4z
1y 2y 3y Loss Function
Activation Function
Weights
© Eric Xing @ CMU, 2015 5
![Page 6: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/6.jpg)
Local Computation At Each Unit
5
2 21
i ii
z w x
1x 2x 3x 4x 5x
1z 2z 3z 4z
1y 2y 3y
11w 12w
11u 12u
Input
Hidden
Output Activation Function
Weights connecting to
units in the lower layer
units in the lower layer
Linear Combination + Nonlinear Activation
© Eric Xing @ CMU, 2015 6
![Page 7: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/7.jpg)
Deep Neural Network
Input
More Hidden Layers
Output
1x 2x 3x 4x 5x
1z 2z 3z 4z
2u 3u 4u
1y 2y 3y
1u
…………
© Eric Xing @ CMU, 2015 7
![Page 8: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/8.jpg)
Activation Functions
Sigmoid Tanh Rectified Linear
• Applied on the hidden units• Achieve nonlinearity • Popular activation functions
© Eric Xing @ CMU, 2015 8
![Page 9: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/9.jpg)
Loss Functions
• Squared loss for regression
• Cross entropy loss for classification
∑
Prediction True value
Class labelPrediction
© Eric Xing @ CMU, 2015 9
![Page 10: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/10.jpg)
Neural Network Prediction
© Eric Xing @ CMU, 2015
• Compute unit values layer by layer in a forward manner
• Prediction function Activation Function
Weights
Input
Output
10
![Page 11: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/11.jpg)
Neural Network Prediction
5
1 11
i ii
z w x
1( )1 xx
e
1x 2x 3x 4x 5x
1z 2z 3z 4z
1y 2y 3y
11w 12w
11u 12u
Input
Hidden
Output
© Eric Xing @ CMU, 2015 11
![Page 12: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/12.jpg)
Neural Network Prediction
5
2 21
i ii
z w x
1( )1 xx
e
1x 2x 3x 4x 5x
1z 2z 3z 4z
1y 2y 3y
11w 12w
11u 12u
Input
Hidden
Output
© Eric Xing @ CMU, 2015 12
![Page 13: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/13.jpg)
Neural Network Prediction
5
4 41
i ii
z w x
1( )1 xx
e
1x 2x 3x 4x 5x
1z 2z 3z 4z
1y 2y 3y
11w 12w
11u 12u
Input
Hidden
Output
© Eric Xing @ CMU, 2015 13
![Page 14: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/14.jpg)
Neural Network Prediction
4
1 11
i ii
y u z
1( )1 xx
e
1x 2x 3x 4x 5x
1z 2z 3z 4z
1y 2y 3y
11w 12w
11u 12u
Input
Hidden
Output
© Eric Xing @ CMU, 2015 14
![Page 15: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/15.jpg)
Neural Network Prediction
4
2 21
i ii
y u z
1( )1 xx
e
1x 2x 3x 4x 5x
1z 2z 3z 4z
1y 2y 3y
11w 12w
11u 12u
Input
Hidden
Output
© Eric Xing @ CMU, 2015 15
![Page 16: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/16.jpg)
Neural Network Prediction
4
3 31
i ii
y u z
1( )1 xx
e
1x 2x 3x 4x 5x
1z 2z 3z 4z
1y 2y 3y
11w 12w
11u 12u
Input
Hidden
Output
© Eric Xing @ CMU, 2015 16
![Page 17: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/17.jpg)
Neural Network Training
• Gradient descent• Back-Propagation (BP)
• A routine to compute gradient• Use chain rule of derivative
© Eric Xing @ CMU, 2015 17
![Page 18: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/18.jpg)
Linear combination value ∑
Neural Network Training• Goal: compute gradient
• Apply chain rule
jiw
Training lossWeight between unit and
kz
jz
iz
Called error, computed recursively in a
backward manner
© Eric Xing @ CMU, 2015 18
![Page 19: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/19.jpg)
Neural Network Training• Apply chain rule (cont’d)
gradient= =backward error x forward activation
• Pseudo code of BP
jiw
kz
jz
iz
Derivative of activation function
While not converge1. compute forward activations2. compute backward errors3. compute gradients of weights4. perform gradient descent
© Eric Xing @ CMU, 2015 19
![Page 20: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/20.jpg)
Pretraining• A better initialization strategy of weight parameters
• Based on Restricted Boltzmann Machine• An auto-encoder model • Unsupervised• Layer-wise, greedy
• Useful when training data is limited• Not necessary when training data is rich
© Eric Xing @ CMU, 2015 20
![Page 21: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/21.jpg)
Restricted Boltzmann Machine
© Eric Xing @ CMU, 2015 21
![Page 22: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/22.jpg)
Layer-wise Unsupervised Pre-training
Input ...
Features ...
© Eric Xing @ CMU, 2015 22
![Page 23: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/23.jpg)
Layer-wise Unsupervised Pre-training
Input ...
Features ...
Reconstructionof input
... ... Input?=
© Eric Xing @ CMU, 2015 23
Auto-encoder:
![Page 24: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/24.jpg)
Layer-wise Unsupervised Pre-training
Input ...
Features ...
© Eric Xing @ CMU, 2015 24
![Page 25: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/25.jpg)
Layer-wise Unsupervised Pre-training
Input ...
Features ...
More abstract features
...
© Eric Xing @ CMU, 2015 25
![Page 26: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/26.jpg)
Layer-wise Unsupervised Pre-training
Input ...
Features ...
More abstract features
...
Reconstructionof features
... ...=?
© Eric Xing @ CMU, 2015 26
Auto-encoder:
![Page 27: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/27.jpg)
Layer-wise Unsupervised Pre-training
Input ...
Features ...
More abstract features
...
© Eric Xing @ CMU, 2015 27
![Page 28: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/28.jpg)
Layer-wise Unsupervised Pre-training
Input ...
Features ...
More abstract features
...
Even more abstract features
...
© Eric Xing @ CMU, 2015 28
![Page 29: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/29.jpg)
Supervised Fine-Tuning• Use the weights learned in unsupervised pretraining to
initialize the network• Then run BP in supervised setting
© Eric Xing @ CMU, 2015 29
![Page 30: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/30.jpg)
Convolutional Neural Network
Some contents are borrowed from Rob Fergus, Yan Lecun and Stanford’s course
© Eric Xing @ CMU, 2015 30
![Page 31: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/31.jpg)
Ordinary Neural
Network
Now
Figure courtesy, Fei-Fei, Andrej Karpathy© Eric Xing @ CMU, 2015 31
![Page 32: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/32.jpg)
All Neural Net activations arranged in 3 dimensions
For example, a CIFAR-10 image is a 32*32*3 volume: 32 width, 32 height, 3 depth (RGB)
Figure courtesy, Fei-Fei, Andrej Karpathy© Eric Xing @ CMU, 2015 32
![Page 33: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/33.jpg)
Local connectivity
© Eric Xing @ CMU, 2015
32
32
3
image: 32 * 32 * 3 volume
before: full connectivity: 32 * 32 * 3 weights for each neuron
now: one unit will connect to, e.g. 5*5*3 chunk and only have 5*5*3 weights
Note the connectivity is:- local in space- full in depth
33
![Page 34: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/34.jpg)
Convolution
© Eric Xing @ CMU, 2015
• One local region only gives one output• Convolution: Replicate the column of hidden units
across space, with some stride
• 7 * 7 Input • Assume 3*3 connectivity,
stride = 1
• Produce a map• What’s the size of the map?
5 * 534
![Page 35: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/35.jpg)
Convolution
© Eric Xing @ CMU, 2015
• One local region only gives one output• Convolution: Replicate the column of hidden units
across space, with some stride
• 7 * 7 Input • Assume 3*3 connectivity,
stride = 1
• What if stride = 2?
35
![Page 36: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/36.jpg)
Convolution
© Eric Xing @ CMU, 2015
• One local region only gives one output• Convolution: Replicate the column of hidden units
across space, with some stride
• 7 * 7 Input • Assume 3*3 connectivity,
stride = 1
• What if stride = 3?
36
![Page 37: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/37.jpg)
Convolution: In Practice
© Eric Xing @ CMU, 2015
• Zero Padding• Input size: 7 * 7• Filter Size: 3*3, stride 1• Pad with 1 pixel border
• Output size?• 7 * 7 => preserved size!
Slide courtesy, Fei-Fei, Andrej Karpathy37
![Page 38: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/38.jpg)
Convolution: Summary
© Eric Xing @ CMU, 2015
• Zero Padding• Input volume of size [W1 * H1 * D1]• Using K units with receptive fields F x F and applying them at
strides of S givesOutput volume: [W2, H2, D2]
• W2 = (W1 – F)/S + 1• H2 = (H1 - F) / S + 1• D2 =k
Slide courtesy, Fei-Fei, Andrej Karpathy38
![Page 39: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/39.jpg)
Convolution: Problem
© Eric Xing @ CMU, 2015
• Assume input [32 * 32 * 3]• 30 units with receptive field 5 * 5, applied at stride
1/pad 1=> Output volume: [30 * 30 * 30]
At each position of the output volume, we need 5 * 5 * 3 weights
=> Number of weights in such layer: 27000 * 75 = 2 million
Idea: Weight sharing!
Learn one unit, let the unit convolve across all local receptive fields!
39
![Page 40: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/40.jpg)
Convolution: Problem
© Eric Xing @ CMU, 2015
• Assume input [32 * 32 * 3]• 30 units with receptive field 5 * 5, applied at stride 1/pad 1
=> Output volume: [30 * 30 * 30] = 27000 units
Weight sharing=> Before: Number of weights in such layer: 27000 * 75 = 2
million => After: weight sharing: 30 * 75 = 2250
But also note that sometimes it’s not a good idea to do weight sharing! When?
40
![Page 41: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/41.jpg)
Convolutional Layers
© Eric Xing @ CMU, 2015
• Connect units only to local receptive fields• Use the same unit weight parameters for units in each “depth
slice” (i.e. across spatial positions)
Can call the units “filters”
We call the layer convolutional because it is related to convolution of two signals
Short question: Will convolution layers introduce nonlinearity?
Sometimes we also add a bias term b, y = Wx + b, like what we have done for ordinary NN
41
![Page 42: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/42.jpg)
Stacking Convolutional Layers
© Eric Xing @ CMU, 2015 42
![Page 43: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/43.jpg)
Pooling Layers
© Eric Xing @ CMU, 2015
• In ConvNet architectures, Conv layers are often followed by Pool layers
• makes the representations smaller and more manageable without losing too much information. Computes MAX operation (most common)
Slide courtesy, Fei-Fei, Andrej Karpathy43
![Page 44: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/44.jpg)
Pooling Layers
© Eric Xing @ CMU, 2015
• In ConvNet architectures, Conv layers are often followed by Pool layers
• makes the representations smaller and more manageable without losing too much information. Computes MAX operation (most common)
• Input volume of size [W1 x H1 x D1] • Pooling unit receptive fields F x F and applying them at
strides of S gives • Output volume: [W2, H2, D1]: depth unchanged!
W2 = (W1-F)/S+1, H2 = (H1-F)/S+1
Short question: Will pooling layer introduce nonlinearity?
44
![Page 45: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/45.jpg)
Nonlinerity
© Eric Xing @ CMU, 2015
• Similar to NN, we need to introduce nonlinearity in CNN
• Sigmoid• Tanh• RELU: Rectified Linear Units -> preferred
• Simplifies backpropagation • Makes learning faster • Avoids saturation issues
Slide courtesy, Yan Lecun45
![Page 46: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/46.jpg)
Convolutional Networks: 1989
LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [ LeNet ]
Slide courtesy, Yangqing Jia© Eric Xing @ CMU, 2015 46
![Page 47: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/47.jpg)
Convolutional Nets: 2012
AlexNet: a layered model composed of convolution, subsampling, and further operations followed by a holistic representation and all-in-all a landmark classifier onILSVRC12. [ AlexNet ]
+ data+ gpu+ non-saturating nonlinearity+ regularization
Slide courtesy, Yangqing Jia© Eric Xing @ CMU, 2015 47
![Page 48: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/48.jpg)
Convolutional Nets: 2014
ILSVRC14 Winners: ~6.6% Top-5 error- GoogLeNet: composition of multi-
scale dimension-reduced modules (pictured)
- VGG: 16 layers of 3x3 convolution interleaved with max pooling + 3 fully-connected layers
+ depth+ data+ dimensionality reduction
Slide courtesy, Yangqing Jia© Eric Xing @ CMU, 2015 48
![Page 49: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/49.jpg)
Training CNN: Use GPU Convolutional layers
Reduce parameters BUT Increase computations
FC layers each neuron has more weights but less computations
Conv layers each neuron has less weights but more computations. Why?
because of weight sharing! it will convolve at every position!
GPU is good at
convolution!
© Eric Xing @ CMU, 2015 49
![Page 50: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/50.jpg)
Training CNN: depth cares!
21 Layers!
Gradient vanishes when the network is too deep: Lazy to learn!
Add intermediate loss layers to produce error signals! Do contrast normalization after each conv layer! Use ReLU to avoid saturation!
© Eric Xing @ CMU, 2015 50
![Page 51: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/51.jpg)
Training CNN: Huge model needs more data!
Only 7 layers, 60M parameters! Need more labeled data to train! Data augmentation: crop, translate, rotate, add noise!
© Eric Xing @ CMU, 2015 51
![Page 52: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/52.jpg)
Training CNN: highly nonconvex objective
Demand more advanced optimization techniques
Add momentum as we have done for NN
Learning rate policy decrease learning rate regularly! different layers use different learning rate! observe the trend of objective curve more often!
Initialization really cares! Supervised pretraining Unsupervised pretraining
© Eric Xing @ CMU, 2015 52
![Page 53: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/53.jpg)
Training CNN: avoid overfitting
More data are always the best way to avoid overfitting data augmentation
Add regualizations: recall what we have done for linear regression
Dropout
© Eric Xing @ CMU, 2015 53
![Page 54: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/54.jpg)
Visualize and Understand CNN
© Eric Xing @ CMU, 2015
A CNN transforms the image to 4096 numbers that are then linearly classified.
54
![Page 55: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/55.jpg)
Visualize and Understand CNN
© Eric Xing @ CMU, 2015
• Find images that maximize some class score:
Yes, Google Inceptionism! 55
![Page 56: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/56.jpg)
Visualize and Understand CNN
© Eric Xing @ CMU, 2015
• More visualizations
https://www.youtube.com/watch?v=AgkfIQ4IGaM&feature=youtu.be
56
![Page 57: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/57.jpg)
Limitations
© Eric Xing @ CMU, 2015
• Supervised Training• Need huge amount of labeled data, but label is
scarce!• Slow Training
• Train an AlexNet on a single machine need one week!
• Optimization• Highly nonconvex objective
• Parameter tuning is hard• The parameter space is so large…
57
![Page 58: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/58.jpg)
Summary
© Eric Xing @ CMU, 2015
• Neural network with specialized connectivity structure• Stack multiple stages of feature extractors• Higher stages compute more global, more invariant
features• Classification layer at the end
Slide courtesy, Rob Fergus58
![Page 59: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/59.jpg)
Summary
© Eric Xing @ CMU, 2015
• Feed-forward• Convolve input• Non-linearity (rectified linear)• Pooling (local max, mean)
• Supervised • Train convolutional filter s by back-propagation
classification error at the end
Input Image
Convolution (Learned)
Non-linearity
Spatial pooling
Normalization
Feature maps
59
![Page 60: Deep Learning - Carnegie Mellon School of Computer Scienceepxing/Class/10701/slides/lecture8-dl.pdf · 2015-10-06 · Deep Learning Eric Xing (and Pengtao Xie) Lecture 8, October](https://reader034.vdocument.in/reader034/viewer/2022042222/5ec900986b6392450e6125e5/html5/thumbnails/60.jpg)
Further reading Andrej Karpathy: The Unreasonable Effectiveness of
Recurrent Neural Networks (http://karpathy.github.io/2015/05/21/rnn-effectiveness)
Recurrent Neural Networks Tutorial (http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns)
© Eric Xing @ CMU, 2015 60