Machine learning - HT 2016
8. Neural Networks
Varun Kanade
University of Oxford
February 19, 2016
Outline
Today, we’ll study feedforward neural networks
I Multi-layer perceptrons
I Application to classification or regression settings
I Backpropagation to compute gradients
I Finally understand what model:forward and model:backward are doing
Artificial Neuron : Logistic Regression
[Diagram: inputs x1, x2 and a constant input 1 feed a single summing unit with weights w1, w2 and bias b, producing y = Pr(y = 1 | x, w, b).]
Multilayer Perceptron (MLP) : Classification
[Diagram: inputs x1, x2 (each layer also has a constant input 1) feed two hidden units through weights $w^2_{jk}$ and biases $b^2_j$; the hidden units feed a single output unit through weights $w^3_{1k}$ and bias $b^3_1$, producing y = Pr(y = 1 | x, W, b).]
Multilayer Perceptron (MLP) : Regression
[Diagram: the same two-hidden-unit network, now with output y = E[y | x, W, b].]
A Simple Example
$$y = \sigma\Big(w^4_{11}\,\sigma\big(w^3_{11}\,\sigma(w^2_{11}x_1 + w^2_{12}x_2 + b^2_1) + w^3_{12}\,\sigma(w^2_{21}x_1 + w^2_{22}x_2 + b^2_2) + b^3_1\big) + w^4_{12}\,\sigma\big(w^3_{21}\,\sigma(w^2_{11}x_1 + w^2_{12}x_2 + b^2_1) + w^3_{22}\,\sigma(w^2_{21}x_1 + w^2_{22}x_2 + b^2_2) + b^3_2\big) + b^4_1\Big)$$
Feedforward Neural Networks
[Network diagram with four layers: Layer 1 (Input), Layer 2 (Hidden), Layer 3 (Hidden), Layer 4 (Output).]
[Diagram: layers 2 through L; the input x enters as z^1, the loss L is computed from z^{L+1}, and δ^l = ∂L/∂z^l is passed backwards from the output.]

model = nn.Sequential()
model:add(nn.Linear(n_in, n_h1));
model:add(nn.Sigmoid());
...
model:add(nn.Linear(n_h_L-1, n_out));
model:add(nn.Sigmoid());
criterion = nn.MSECriterion();

model:forward(x)
criterion:forward(model.output, y)

dL_do = criterion:backward(model.output, y)
model:backward(x, dL_do)
Forward Equations
(1) $z^1 = x$ (input)
(2) $z^l = W^l a^{l-1} + b^l$
(3) $a^l = f(z^l)$
(4) $L(a^L, y)$
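The forward equations can be sketched in a few lines of numpy (an illustrative sketch, not the lecture's Torch code; the tiny network shape and its weights are made up here):

```python
import numpy as np

def forward(x, Ws, bs, f):
    """z^1 = x;  z^l = W^l a^{l-1} + b^l;  a^l = f(z^l)."""
    a = x
    zs, activations = [x], [x]        # cache z^l and a^l for back-propagation
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = f(z)
        zs.append(z)
        activations.append(a)
    return zs, activations

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# a 2-3-1 network with all weights 1 and all biases 0
Ws = [np.ones((3, 2)), np.ones((1, 3))]
bs = [np.zeros(3), np.zeros(1)]
zs, activations = forward(np.array([0.0, 0.0]), Ws, bs, sigmoid)
# with zero inputs and zero biases, each hidden activation is sigmoid(0) = 0.5
```

Caching every $z^l$ and $a^l$ is what makes the backward pass cheap later.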
Output Layer
[Diagram: output layer L receives a^{L-1}, computes z^L and a^L; δ^L flows back.]

$z^L = W^L a^{L-1} + b^L$
$a^L = f(z^L)$
Loss: $L(y, a^L)$ = criterion:forward(aL, y)
$\delta^L = \partial L/\partial z^L = \partial L/\partial a^L \odot f'(z^L)$
Back Propagation
[Diagram: layer l receives a^{l-1} and passes a^l forward; δ^{l+1} flows back from layer l+1.]

$a^l$ (the inputs into layer $l+1$)
$z^{l+1} = W^{l+1} a^l + b^{l+1}$ ($w^{l+1}_{j,k}$ is the weight on the connection from the $k$th unit in layer $l$ to the $j$th unit in layer $l+1$)
$a^l = f(z^l)$ ($f$ is a non-linearity)
$\delta^{l+1} = \partial L/\partial z^{l+1}$
$\delta^l = \partial L/\partial z^l = \left(\partial z^{l+1}/\partial z^l\right)^T \cdot \partial L/\partial z^{l+1}$
Gradients wrt Parameters
[Diagram: layer l receives a^{l-1} and computes z^l, a^l.]

$z^l = W^l a^{l-1} + b^l$ ($w^l_{j,k}$ is the weight on the connection from the $k$th unit in layer $l-1$ to the $j$th unit in layer $l$)
$\delta^l = \partial L/\partial z^l$
Forward Equations

(1) $z^1 = x$ (input)
(2) $z^l = W^l a^{l-1} + b^l$
(3) $a^l = f(z^l)$
(4) $L(a^L, y)$

Back-propagation Equations

(1) $\delta^L = \partial L/\partial a^L \odot f'(z^L)$
(2) $\delta^l = (W^{l+1})^T \delta^{l+1} \odot f'(z^l)$
(3) $\partial L/\partial b^l = \delta^l$
(4) $\partial L/\partial W^l = \delta^l (a^{l-1})^T$
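These four equations fit in a short numpy sketch (illustrative, not the lecture's Torch code; the network shapes, data, squared loss and sigmoid activation are assumptions here), together with a finite-difference check of one weight gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(x, y, Ws, bs):
    """Squared loss L = 0.5 * ||a^L - y||^2, sigmoid activations throughout.
    Returns gradients via the four back-propagation equations."""
    # forward pass, caching z^l and a^l
    a, zs, activs = x, [], [x]
    for W, b in zip(Ws, bs):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z); activs.append(a)
    fprime = lambda z: sigmoid(z) * (1 - sigmoid(z))
    # (1) delta^L = dL/da^L * f'(z^L); for squared loss dL/da^L = a^L - y
    delta = (activs[-1] - y) * fprime(zs[-1])
    dWs, dbs = [None] * len(Ws), [None] * len(bs)
    for l in reversed(range(len(Ws))):
        dbs[l] = delta                          # (3) dL/db^l = delta^l
        dWs[l] = np.outer(delta, activs[l])     # (4) dL/dW^l = delta^l (a^{l-1})^T
        if l > 0:
            # (2) delta^l = (W^{l+1})^T delta^{l+1} * f'(z^l)
            delta = (Ws[l].T @ delta) * fprime(zs[l - 1])
    return dWs, dbs

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
bs = [rng.standard_normal(3), rng.standard_normal(1)]
x, y = np.array([0.3, -0.7]), np.array([1.0])

def loss(Ws, bs):
    a = x
    for W, b in zip(Ws, bs):
        a = sigmoid(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

dWs, dbs = backprop(x, y, Ws, bs)
# finite-difference check of one entry of dL/dW^1
eps = 1e-6
Wp = [W.copy() for W in Ws]; Wp[0][1, 0] += eps
numeric = (loss(Wp, bs) - loss(Ws, bs)) / eps
```

The backward pass reuses the cached activations, which is why its cost is the same order as the forward pass.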
Output Layer : LogSoftmax
[Diagram: output layer L receives a^{L-1}; δ^L flows back.]

$a^L = \mathrm{LogSoftmax}(z^L)$
criterion = nn.ClassNLLCriterion();
Loss: $L(y, a^L) = -a^L_y$
$\delta^L = \partial L/\partial z^L = \left(\partial a^L/\partial z^L\right)^T \cdot \partial L/\partial a^L$
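For this output layer the two factors combine into the familiar form $\delta^L = \mathrm{softmax}(z^L) - e_y$, where $e_y$ is the one-hot vector of the true class. A numpy sketch (illustrative; the values of z and y are made up) with a numeric check:

```python
import numpy as np

def log_softmax(z):
    z = z - np.max(z)                  # subtract the max for numerical stability
    return z - np.log(np.sum(np.exp(z)))

z = np.array([2.0, -1.0, 0.5])
y = 0                                  # index of the true class

# delta^L = softmax(z^L) - e_y
delta = np.exp(log_softmax(z))
delta[y] -= 1.0

# finite-difference check of one component of dL/dz with L = -log_softmax(z)[y]
eps = 1e-6
zp = z.copy(); zp[1] += eps
numeric = (-log_softmax(zp)[y] - (-log_softmax(z)[y])) / eps
```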
Forward Equations

(1) $z^1 = x$ (input)
(2) $z^l = g^l(a^{l-1}; W^l, b^l)$
(3) $a^l = f^l(z^l)$
(4) $L(a^L, y)$

Back-propagation Equations

(1) $\delta^L = \left(\partial a^L/\partial z^L\right)^T \cdot \partial L/\partial a^L$
(2) $\delta^l = \left(\partial z^{l+1}/\partial a^l \cdot \partial a^l/\partial z^l\right)^T \cdot \delta^{l+1}$
(3) $\partial L/\partial b^l = \left(\partial z^l/\partial b^l\right)^T \cdot \delta^l$
(4) $\partial L/\partial W^l = \left(\partial z^l/\partial W^l\right)^T \cdot \delta^l$
Computational Questions
What is the running time to compute the gradient for a single data point?
What is the space requirement?
Can we process multiple examples together?
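On the last question: both passes cost on the order of the number of weights per example, the cached activations dominate the extra space, and stacking examples as columns turns the per-example matrix-vector products into one matrix-matrix product. A numpy sketch (the sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
W, b = rng.standard_normal((4, 3)), rng.standard_normal(4)
X = rng.standard_normal((3, 5))        # 5 examples, one per column

Z_batch = W @ X + b[:, None]           # all 5 pre-activations in one product
Z_loop = np.stack([W @ X[:, i] + b for i in range(5)], axis=1)
```

The batched form gives the same numbers but lets optimised matrix libraries do the work.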
Training Deep Neural Networks
I Back-propagation gives gradient
I Stochastic gradient descent is the method of choice
I Regularisation
  I How do we add ℓ1 or ℓ2 regularisation?
  I Don't regularise the bias terms
I How about convergence?
I What did we learn in the last 10 years, that we didn’t know in the 80s?
Training Feedforward Deep Networks
[Network diagram with four layers: Layer 1 (Input), Layer 2 (Hidden), Layer 3 (Hidden), Layer 4 (Output).]
Why do we get a non-convex optimisation problem?
A toy example
[Diagram: a single sigmoid unit with constant input 1 and input x ∈ {−1, 1}, weight $w^2_1$ and bias $b^2_1$, producing output $a^2_1$. The target is y = (1 − x)/2.]
Propagating Gradients Backwards
[Diagram: a chain of single units from $z^1_1$ to the output $a^4_1$, with weights $w^2_1, w^3_1, w^4_1$ and biases $b^2_1, b^3_1, b^4_1$.]
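This chain shows why gradients can vanish: each extra sigmoid layer multiplies the gradient by $w\,\sigma'(z)$, and $\sigma'(z) \le 1/4$, so with modest weights the gradient reaching the early layers shrinks geometrically. A small sketch (the weight and input values are arbitrary choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def chain_gradient(depth, w=1.0, x=0.5):
    """Gradient of the chain's output w.r.t. its input, for `depth` sigmoid units."""
    a, grad = x, 1.0
    for _ in range(depth):
        z = w * a
        a = sigmoid(z)
        grad *= w * a * (1 - a)        # each layer contributes w * sigmoid'(z)
    return grad

g2, g10 = chain_gradient(2), chain_gradient(10)
# g10 is orders of magnitude smaller than g2
```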
Digit Classification Problem on Sheet 4
[Plots: training with a one-hot encoding vs. a binary encoding of the digit labels.]
Digit Classification: Squared Loss vs Cross Entropy
[Plots: training with squared error vs. cross entropy.]
(See paper by Glorot and Bengio (2010))
Avoiding Saturation
Use rectified linear units
ReLU(z) = max(0, z)
Sometimes, you will see just f(z) = |z|
Other varieties: leaky ReLUs, parametric ReLUs
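As plain functions (the leaky slope α = 0.01 is a common but arbitrary choice, not one fixed by the slide):

```python
def relu(z):
    return max(0.0, z)          # ReLU(z) = max(0, z)

def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z   # small slope keeps gradient flowing for z < 0

def absval(z):
    return abs(z)               # the f(z) = |z| variant
```

Unlike the sigmoid, none of these saturate for large positive inputs, so their gradient does not shrink there.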
Initialising Weights and Biases
Why is initialising important?
Suppose we were using a sigmoid unit; how would you initialise the incoming weights?
What if it were a ReLU unit?
How about the biases?
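One common set of answers, as an illustration (these scalings follow the Glorot/Bengio and He et al. recipes; the slide itself leaves the questions open):

```python
import numpy as np

def init_sigmoid(n_in, n_out, rng):
    # Var(w) ~ 1/n_in keeps |z| = |sum_i w_i x_i| small, so sigmoid units
    # start away from their flat, saturated tails
    return rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)

def init_relu(n_in, n_out, rng):
    # Var(w) ~ 2/n_in compensates for ReLUs zeroing roughly half their inputs
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)

# biases are typically initialised to zero (or a small positive value for
# ReLU units, so they start in the active region)
rng = np.random.default_rng(0)
W = init_sigmoid(1000, 50, rng)
```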
Avoiding Overfitting
Deep Neural Networks have a lot of parameters
I Fully connected layers with n1, n2, ..., nk units have at least n1n2 + n2n3 + · · · + nk−1nk parameters
I MLP for digit recognition had 2 million parameters!
I For image detection, the neural net used by Krizhevsky, Sutskever,Hinton (2012) has 60 million parameters and 1.2 million training images
I How do we prevent deep neural networks from overfitting?
Early Stopping
Maintain a validation set and stop training when the error on the validation set stops decreasing.
What are the computational costs?
What are the advantages?
Early stopping leads to better generalisation
(See paper by Hardt, Recht and Singer (2015))
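The bookkeeping can be sketched as follows (the `step` and `val_error` callbacks, the patience threshold, and the toy validation curve are all hypothetical placeholders):

```python
def train_with_early_stopping(step, val_error, patience=3, max_epochs=100):
    """Keep the best validation error seen; stop after `patience` epochs
    without improvement."""
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        step(epoch)                         # one epoch of SGD (hypothetical callback)
        err = val_error(epoch)
        if err < best:
            best, best_epoch = err, epoch   # would also snapshot the weights here
        elif epoch - best_epoch >= patience:
            break                           # validation error stopped decreasing
    return best, best_epoch

# fake validation curve that bottoms out at epoch 4
errors = [1.0, 0.6, 0.4, 0.35, 0.30, 0.31, 0.33, 0.4, 0.5, 0.6]
best, best_epoch = train_with_early_stopping(lambda e: None, lambda e: errors[e])
```

The only extra cost is evaluating the validation error each epoch and keeping one snapshot of the weights.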
Add Data

Typically, getting additional data is either impossible or expensive
Fake the data!
Images can be translated slightly, rotated slightly, have their brightness changed, etc.
Google Offline Translate was trained on entirely fake data!
(Google Research Blog)
Add Data
Adversarial Training
Take a trained (or partially trained) model

Create examples by modifications ‘‘imperceptible to the human eye’’, but where the model fails
(Szegedy et al., Goodfellow et al.)
Other Ideas to Reduce Overfitting
Hard constraints on weights
Gradient Clipping
Inject noise into the system
Enforce sparsity in the neural network
Unsupervised Pre-training (Bengio et al.)
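Gradient clipping, for instance, can be done by global norm (a sketch; the threshold is an arbitrary choice): if the gradient's norm exceeds the threshold, rescale it, keeping the direction but bounding the step size.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)   # rescale, preserving direction
    return grad

g = np.array([3.0, 4.0])                  # norm 5
clipped = clip_by_norm(g, 1.0)            # rescaled to unit norm
```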
Bagging and Dropout
Bagging (Leo Breiman - 1994)
I Given a dataset $D = \langle (x_i, y_i) \rangle_{i=1}^{m}$, sample $D_1, D_2, \ldots, D_k$ of size $m$ from $D$ with replacement
I Train classifiers $f_1, \ldots, f_k$ on $D_1, \ldots, D_k$
I When predicting, use the majority vote (or the average, for regression)
I Clearly this approach is not practical for deep networks
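The recipe above, as a sketch (the `train` callback and the toy dataset are hypothetical; each toy "model" just memorises the mean target of its bootstrap sample):

```python
import random

def bag(D, k, train, rng):
    """Train k models, each on a bootstrap sample of size m = len(D)."""
    models = []
    for _ in range(k):
        Di = [rng.choice(D) for _ in range(len(D))]   # sample m with replacement
        models.append(train(Di))
    return models

def predict_regression(models, x):
    return sum(f(x) for f in models) / len(models)    # average the predictions

D = [(0, 1.0), (1, 2.0), (2, 3.0)]
train = lambda Di: (lambda x, m=sum(y for _, y in Di) / len(Di): m)
models = bag(D, 5, train, random.Random(0))
```

For a deep network, each of the k training runs would cost a full training run, which is why the approach is impractical there.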
Dropout
I For each input x, drop each hidden unit with probability 1/2, independently
I Every input will have a potentially different mask
I Exponentially many potentially different models, but they ‘‘share the same weights’’
I After training, the whole network is used, with all the weights halved
(Srivastava, Hinton, Krizhevsky, 2014)
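The masking and the test-time halving can be sketched directly (illustrative numpy, not the Torch module; activations of all ones are used so the averaging effect is easy to see):

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.ones(10_000)                    # hidden activations (all ones for clarity)

mask = rng.random(a.shape) < 0.5       # keep each unit with probability 1/2
a_train = a * mask                     # a fresh mask is drawn for every input
a_test = a * 0.5                       # at test time: no mask, weights halved

# the two have matching expectations, which is why halving works
```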
Errors Made by MLP for Digit Recognition
Avoiding Overfitting
I Use weight sharing, a.k.a. tied weights, in the model
I Exploit invariances, e.g. translation
I Exploit locality in images, audio, text, etc.
I Convolutional Neural Networks (convnets)
Convolutional Neural Networks (convnets)
(Fukushima, LeCun, Hinton 1980s)
Convolution
Image Convolution
Source: L. W. Kheng
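The operation being illustrated, a ‘‘valid’’ correlation-style convolution as used in convnets, can be written directly (the image and averaging filter below are made-up examples):

```python
import numpy as np

def conv2d(img, kernel):
    """Slide a k x k filter over the image; dot product at each offset."""
    H, W = img.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

img = np.array([[1., 2., 3.],
                [4., 5., 6.],
                [7., 8., 9.]])
box = np.ones((2, 2)) / 4.0            # 2x2 averaging filter
out = conv2d(img, box)
```

The same small filter is applied at every position, which is exactly the weight sharing mentioned earlier.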
Source: Krizhevsky, Sutskever, Hinton (2012)
Source: Krizhevsky, Sutskever, Hinton (2012); Wikipedia
Source: Krizhevsky, Sutskever, Hinton (2012)
Source: Zeiler and Fergus (2013)
Source: Zeiler and Fergus (2013)
Convnet in Torch
model = nn.Sequential()
model:add(nn.Reshape(1, 32, 32))
-- layer 1:
model:add(nn.SpatialConvolution(1, 16, 5, 5))
model:add(nn.ReLU())
model:add(nn.SpatialMaxPooling(2, 2, 2, 2))
-- layer 2:
model:add(nn.SpatialConvolution(16, 128, 5, 5))
model:add(nn.ReLU())
model:add(nn.SpatialMaxPooling(2, 2, 2, 2))
-- layer 3:
model:add(nn.Reshape(128*5*5))
model:add(nn.Linear(128*5*5, 200))
model:add(nn.ReLU())
-- output
model:add(nn.Linear(200, 10))
model:add(nn.LogSoftMax())
Convolutional Layer
$$z^{l+1}_{i',j',f'} = b_{f'} + \sum_{i=1}^{W_{f'}} \sum_{j=1}^{H_{f'}} \sum_{f=1}^{F_l} a^l_{i'+i-1,\,j'+j-1,\,f}\, w^{l+1,f'}_{i,j,f}$$

$$\frac{\partial z^{l+1}_{i',j',f'}}{\partial w^{l+1,f'}_{i,j,f}} = a^l_{i'+i-1,\,j'+j-1,\,f}$$

$$\frac{\partial L}{\partial w^{l+1,f'}_{i,j,f}} = \sum_{i',j'} \delta^{l+1}_{i',j',f'}\, a^l_{i'+i-1,\,j'+j-1,\,f}$$
Convolutional Layer
$$z^{l+1}_{i',j',f'} = b_{f'} + \sum_{i=1}^{W_{f'}} \sum_{j=1}^{H_{f'}} \sum_{f=1}^{F_l} a^l_{i'+i-1,\,j'+j-1,\,f}\, w^{l+1,f'}_{i,j,f}$$

$$\frac{\partial z^{l+1}_{i',j',f'}}{\partial a^l_{i,j,f}} = w^{l+1,f'}_{i-i'+1,\,j-j'+1,\,f}$$

$$\frac{\partial L}{\partial a^l_{i,j,f}} = \sum_{i',j',f'} \delta^{l+1}_{i',j',f'}\, w^{l+1,f'}_{i-i'+1,\,j-j'+1,\,f}$$
Max-Pooling Layer
$$b^{l+1}_{i',j'} = \max_{i,j \in \Omega(i',j')} a^l_{i,j}$$

$$\frac{\partial b^{l+1}_{i',j'}}{\partial a^l_{i,j}} = \mathbb{I}\Big((i,j) = \underset{i,j \in \Omega(i',j')}{\operatorname{argmax}}\, a^l_{i,j}\Big)$$
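A sketch of 2×2 max pooling together with the indicator of the arg max (illustrative numpy; stride 2 and non-overlapping windows are assumed, as in the Torch snippet earlier):

```python
import numpy as np

def maxpool2x2(a):
    """Return the pooled output and a mask marking each window's arg max,
    i.e. the only position in each window that receives gradient."""
    H, W = a.shape
    out = np.zeros((H // 2, W // 2))
    grad_mask = np.zeros_like(a)
    for i in range(H // 2):
        for j in range(W // 2):
            win = a[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            out[i, j] = win.max()
            p, q = np.unravel_index(win.argmax(), win.shape)
            grad_mask[2 * i + p, 2 * j + q] = 1.0   # the indicator in the formula
    return out, grad_mask

a = np.array([[1., 5., 2., 0.],
              [3., 4., 1., 6.],
              [7., 2., 3., 3.],
              [0., 1., 8., 2.]])
out, mask = maxpool2x2(a)
```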