Introduction to neural networks
16-385 Computer Vision, Spring 2019, Lecture 19
http://www.cs.cmu.edu/~16385/
Course announcements

• Homework 5 has been posted and is due on April 10th.
- Any questions about the homework?
- How many of you have looked at/started/finished homework 5?
• Homework 6 will be posted on Wednesday.
- There will be some adjustments to deadlines (and homework lengths), so that homework 7 is not squeezed into 5 days.
Overview of today's lecture

• Perceptron.
• Neural networks.
• Training perceptrons.
• Gradient descent.
• Backpropagation.
• Stochastic gradient descent.
Slide credits
Most of these slides were adapted from:
• Kris Kitani (16-385, Spring 2017).
• Noah Snavely (Cornell University).
• Fei-Fei Li (Stanford University).
• Andrej Karpathy (Stanford University).
Perceptron
1950s Age of the Perceptron
1957 The Perceptron (Rosenblatt)
1969 Perceptrons (Minsky, Papert)
1980s Age of the Neural Network
1986 Backpropagation (Hinton)
1990s Age of the Graphical Model
2000s Age of the Support Vector Machine
2010s Age of the Deep Network

deep learning = known algorithms + computing power + big data
The Perceptron

A perceptron takes its inputs, multiplies each by a weight, sums the results, and passes the sum through a sign function (the Heaviside step function) to produce the output.
Aside: Inspiration from Biology

Neural nets/perceptrons are loosely inspired by biology. But they certainly are not a model of how the brain works, or even how neurons work.
The input is an N-d binary vector, and the perceptron is just one line of code! (By convention, the sign of zero is +1.)
Worked example: the weight vector w is initialized to 0.

• Observation (1,-1), label -1: the prediction sign(w·x) = sign(0) = +1 does not match the label ("no match!"), so we update w: w = (0,0) + (-1)·(1,-1) = (-1,1).
• Observation (-1,1), label +1: the prediction sign(w·x) = sign((-1,1)·(-1,1)) = sign(2) = +1 matches the label ("match!"), so w stays (-1,1).
• Update w on every mismatch in this way, and repeat …
The Perceptron, more generally: the inputs are multiplied by weights, a bias term is added, and the sum is passed through an activation function (e.g., step, sigmoid, tanh, ReLU) to produce the output.
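The activation functions named above can be sketched as follows (a sketch, not the slides' code; `tanh_act` is named to avoid shadowing the standard library function):

```cpp
#include <cmath>

// Common perceptron activation functions.
float step(float a)     { return (a >= 0.0f) ? 1.0f : 0.0f; }   // Heaviside step
float sigmoid(float a)  { return 1.0f / (1.0f + std::exp(-a)); } // logistic
float tanh_act(float a) { return std::tanh(a); }                 // tanh
float relu(float a)     { return (a > 0.0f) ? a : 0.0f; }        // ReLU
```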
Another way to draw it…

Inputs, weights, and output, with a single node for the activation function (e.g., a sigmoid function of the weighted sum). Two simplifications: (1) combine the sum and activation function into one node, and (2) suppress the bias term (less clutter).
Programming the 'forward pass'

#include <cmath>
#include <vector>
using std::vector;

// Dot product of the input and weight vectors
float dot(const vector<float>& x, const vector<float>& w)
{
    float a = 0.0f;
    for (size_t i = 0; i < x.size(); ++i) a += x[i] * w[i];
    return a;
}

// Activation function (sigmoid, logistic function)
float f(float a)
{
    return 1.0f / (1.0f + std::exp(-a));
}

// Perceptron function (logistic regression): output = f(w·x)
float perceptron(vector<float> x, vector<float> w)
{
    float a = dot(x, w);
    return f(a);
}
Neural networks
Connect a bunch of perceptrons together …
Neural Network: a collection of connected perceptrons.

How many perceptrons in this neural network? Counting them one at a time ('one perceptron', 'two perceptrons', …) gives six perceptrons.
Some terminology… the network has an 'input' layer, a 'hidden' layer, and an 'output' layer. This architecture is also called a Multi-layer Perceptron (MLP).
Each of these layers is a 'fully connected layer': all pairwise neurons between layers are connected.
For a network with 3 inputs, a hidden layer of 4 neurons, and an output layer of 2 neurons:

How many neurons (perceptrons)? 4 + 2 = 6
How many weights (edges)? (3 x 4) + (4 x 2) = 20
How many learnable parameters total? 20 + 4 + 2 = 26 (the weights plus one bias term per neuron)
For these fully connected networks, performance usually tops out at 2-3 layers; deeper networks don't really improve performance… with the exception of convolutional networks for images.
Training perceptrons
Let’s start easy
The world's smallest perceptron! What does this look like? A single weight multiplying a single input: y = w·x (a.k.a. line equation, linear regression).
Learning a Perceptron

Given a set of samples and a perceptron, estimate the parameters of the perceptron.
Given training data:

x      y
1      1.1
2      1.9
3.5    3.4
10     10.1

What do you think the weight parameter is? Here it is clearly close to 1, but the answer is not so obvious as the network gets more complicated, so we use …
An Incremental Learning Strategy (gradient descent)

Given several examples and a perceptron, modify the weight (the perceptron parameter) such that the perceptron output gets 'closer' to the true label. But what does 'closer' mean?
Before diving into gradient descent, we need to understand the Loss Function, which defines what it means to be close to the true solution. YOU get to choose the loss function! (Some are better than others depending on what you want to do.)
Squared Error (L2) is a popular loss function ((why?)). Other common choices: L1 loss, L2 loss, zero-one loss, and hinge loss.
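These losses can be sketched for a scalar prediction f and target y (a sketch; the squared error is sometimes written with a ½ factor, and the zero-one and hinge losses assume labels y in {-1, +1}):

```cpp
#include <cmath>
#include <algorithm>

// Squared error (L2) and L1 losses for a prediction f and target y.
float l2_loss(float f, float y) { return (f - y) * (f - y); }
float l1_loss(float f, float y) { return std::fabs(f - y); }

// Zero-one and hinge losses, assuming labels y in {-1, +1}.
float zero_one_loss(float f, float y) { return (f * y > 0.0f) ? 0.0f : 1.0f; }
float hinge_loss(float f, float y)    { return std::max(0.0f, 1.0f - y * f); }
```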
Back to the… world's smallest perceptron! It is a function of ONE parameter (a.k.a. line equation, linear regression).
Learning a Perceptron

Given a set of samples and a perceptron, estimate the parameter of the perceptron. What is this activation function? A linear function!
Let's demystify this process first… here is code to train your perceptron: the weight update is just one line of code! Now where does that update come from?
Gradient descent

(Partial) derivatives tell us how much one variable affects another.
Two ways to think about them:

1. Slope of a function: the derivative describes the slope around a point.
![Page 74: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/74.jpg)
describes how each
‘knob’ affects the output
input output
2. Knobs on a machine:
output will change bysmall change in parameter
Gradient descent: given a fixed point on a function, move in the direction opposite of the gradient.

Update rule: w ← w − η · dL/dw, where η is the step size.
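The update rule can be sketched on a toy function (a minimal sketch, assuming the loss L(w) = (w − 3)², whose gradient is 2(w − 3); the function name is illustrative):

```cpp
// Gradient descent on L(w) = (w - 3)^2, whose gradient is 2*(w - 3).
// Repeatedly step opposite the gradient: w <- w - eta * dL/dw.
float minimize(float w, float eta, int steps)
{
    for (int i = 0; i < steps; ++i) {
        float grad = 2.0f * (w - 3.0f);  // dL/dw at the current point
        w -= eta * grad;                 // move against the gradient
    }
    return w;
}
```

Starting anywhere and stepping repeatedly drives w toward the minimizer w = 3.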
![Page 77: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/77.jpg)
Backpropagation
![Page 78: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/78.jpg)
World’s Smallest Perceptron!
back to the…
function of ONE parameter!
(a.k.a. line equation, linear regression)
![Page 79: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/79.jpg)
Training the world’s smallest perceptron
this should be the
gradient of the loss
function
Now where does this come from?
This is just gradient
descent, that means…
![Page 80: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/80.jpg)
…is the rate at which this will change…
… per unit change of this
the loss function
the weight parameter
Let’s compute the derivative…
![Page 81: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/81.jpg)
Compute the derivative
That means the weight update for gradient descent is:
just shorthand
move in direction of negative gradient
![Page 82: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/82.jpg)
Gradient Descent (world’s smallest perceptron)
For each sample
1. Predict
a. Forward pass
b. Compute Loss
2. Update
a. Back Propagation
b. Gradient update
![Page 83: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/83.jpg)
Training the world’s smallest perceptron
The world's (second) smallest perceptron: a function of two parameters!
Gradient Descent
For each sample
1. Predict
a. Forward pass
b. Compute Loss
2. Update
a. Back Propagation
b. Gradient update
We just need to compute partial derivatives for this network.
![Page 86: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/86.jpg)
Derivative computation
Why do we have partial derivatives now?
![Page 87: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/87.jpg)
Derivative computation
Gradient Update
![Page 88: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/88.jpg)
Gradient Descent
For each sample
1. Predict
a. Forward pass
b. Compute Loss
2. Update
a. Back Propagation
b. Gradient update (adjustable step size): two lines now
(side computation to track loss; not needed for backprop)
![Page 89: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/89.jpg)
We haven't seen a lot of 'propagation' yet because our perceptrons only had one layer…
![Page 90: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/90.jpg)
Multi-layer perceptron: a function of FOUR parameters and FOUR layers!
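A forward-pass sketch of such a chain network, assuming scalar layers and sigmoid activations (a hypothetical parameterization for illustration; the slide's exact wiring may differ):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass through a chain MLP: each layer multiplies by a weight,
# sums (trivially, since everything here is scalar), and applies an activation.
def forward(x, weights):
    a = x                  # layer 1: the input
    for w in weights:      # layers 2, 3, ...: weight -> sum -> activation
        a = sigmoid(w * a)
    return a               # final activation: the output layer

y = forward(0.5, [1.0, -2.0, 0.5, 3.0])  # four parameters, four layers
```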
![Page 91: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/91.jpg)
[Network diagram: input (layer 1), hidden layers 2 and 3, output (layer 4); each layer applies a weight, a sum, and an activation]
![Page 100: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/100.jpg)
Entire network can be written out as one long equation
What is known? What is unknown?
We need to train the network:
![Page 101: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/101.jpg)
Entire network can be written out as a long equation
What is known? What is unknown?
known
We need to train the network:
![Page 102: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/102.jpg)
Entire network can be written out as a long equation
What is known? What is unknown?
unknown
We need to train the network:
(the activation function sometimes has parameters)
![Page 103: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/103.jpg)
Learning an MLP: given a set of samples and an MLP, estimate the parameters of the MLP.
![Page 104: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/104.jpg)
Gradient Descent
For each random sample
1. Predict
a. Forward pass
b. Compute Loss
2. Update
a. Back Propagation (vector of parameter partial derivatives)
b. Gradient update (vector of parameter update equations)
![Page 105: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/105.jpg)
So we need to compute the partial derivatives
![Page 106: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/106.jpg)
The partial derivative describes how the loss changes. So, how do you compute it? Remember the loss layer's definition.
![Page 107: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/107.jpg)
The Chain Rule
![Page 108: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/108.jpg)
rest of the network
Intuitively, the effect of the weight on the loss function passes through a chain: the loss depends on the network output, which depends on the activation, which depends on the weight.
According to the chain rule…
![Page 109: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/109.jpg)
rest of the network
Chain Rule!
![Page 110: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/110.jpg)
rest of the network
Just the partial derivative of the L2 loss
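For concreteness, assuming the conventional ½ factor in the L2 loss, the first factor of the chain is simply:

```latex
L = \tfrac{1}{2}(\hat{y} - y)^2
\qquad\Longrightarrow\qquad
\frac{\partial L}{\partial \hat{y}} = \hat{y} - y
```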
![Page 111: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/111.jpg)
rest of the network
Let’s use a Sigmoid function
![Page 112: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/112.jpg)
rest of the network
Let’s use a Sigmoid function
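One reason the sigmoid is convenient here: its derivative has the closed form σ′(z) = σ(z)(1 − σ(z)), so backprop can re-use the forward-pass activation. A quick numerical check:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # closed form: sigma'(z) = sigma(z) * (1 - sigma(z))

# Compare against a central finite-difference approximation
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
assert abs(sigmoid_grad(z) - numeric) < 1e-8
```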
![Page 113: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/113.jpg)
rest of the network
![Page 114: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/114.jpg)
![Page 115: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/115.jpg)
already computed: re-use (propagate)!
![Page 116: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/116.jpg)
The Chain Rule
a.k.a. backpropagation
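Backpropagation is the chain rule applied layer by layer, re-using the running product of already-computed factors. A sketch for a scalar chain network with sigmoid activations and L2 loss (a hypothetical model for illustration, not necessarily the slide's exact network):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, y, weights):
    # Forward pass: remember each layer's activation
    acts = [x]
    for w in weights:
        acts.append(sigmoid(w * acts[-1]))
    y_hat = acts[-1]
    # Backward pass: start from dL/dy_hat for L = 0.5*(y_hat - y)**2,
    # then walk backwards, re-using the running product `delta`
    delta = y_hat - y
    grads = [0.0] * len(weights)
    for k in range(len(weights) - 1, -1, -1):
        a_out, a_in = acts[k + 1], acts[k]
        delta *= a_out * (1.0 - a_out)   # through the sigmoid: dL/dz_k
        grads[k] = delta * a_in          # dL/dw_k
        delta *= weights[k]              # through the weighted sum, to the layer below
    return grads

grads = forward_backward(0.5, 1.0, [1.0, -2.0, 0.5, 3.0])
```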
![Page 117: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/117.jpg)
The chain rule says…
[Dependency diagram: the loss depends on each layer's output, which depends on that layer's weight and on the layer below]
![Page 118: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/118.jpg)
The chain rule says… (the same dependency chain as on the previous slide)
already computed: re-use (propagate)!
![Page 122: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/122.jpg)
Gradient Descent
For each sample
1. Predict
a. Forward pass
b. Compute Loss
2. Update
a. Back Propagation
b. Gradient update
![Page 123: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/123.jpg)
vector of parameter update equations
vector of parameter partial derivatives
Gradient Descent
For each sample
1. Predict
a. Forward pass
b. Compute Loss
2. Update
a. Back Propagation
b. Gradient update
![Page 124: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/124.jpg)
Stochastic gradient descent
![Page 127: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/127.jpg)
What we are truly minimizing:

$$\min_\theta \sum_{i=1}^{N} L(y_i, f_{\mathrm{MLP}}(x_i))$$

The gradient is:

$$\sum_{i=1}^{N} \frac{\partial L(y_i, f_{\mathrm{MLP}}(x_i))}{\partial \theta}$$

What we use for the gradient update is:

$$\frac{\partial L(y_i, f_{\mathrm{MLP}}(x_i))}{\partial \theta} \quad \text{for some } i$$
![Page 128: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/128.jpg)
vector of parameter update equations
vector of parameter partial derivatives
Stochastic Gradient Descent
For each sample
1. Predict
a. Forward pass
b. Compute Loss
2. Update
a. Back Propagation
b. Gradient update
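The only change from batch gradient descent is which sample feeds each update. A sketch for a one-weight perceptron f(x) = w·x with L2 loss (a toy model, not the slide's MLP):

```python
import random

# Stochastic gradient descent: each update uses ONE randomly drawn
# sample's gradient, not the full sum over all N samples.
def sgd(samples, w=0.0, lr=0.1, steps=500, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(samples)   # pick a random sample
        grad = (w * x - y) * x       # backprop for this sample only
        w -= lr * grad               # gradient update
    return w

w = sgd([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```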
![Page 133: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/133.jpg)
How do we select which sample?
• Select randomly!
Do we need to use only one sample?
• You can use a minibatch of size B < N.
Why not do gradient descent with all samples?
• It’s very expensive when N is large (big data).
Do I lose anything by using stochastic GD?
• Same convergence guarantees and complexity!
• Better generalization.
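A minibatch of size B averages the B per-sample gradients before each update, trading per-step cost for a less noisy gradient estimate. A sketch on a toy one-weight model f(x) = w·x with L2 loss (hypothetical, for illustration):

```python
import random

# Minibatch SGD: each update averages gradients over B < N randomly
# drawn samples, interpolating between pure SGD (B=1) and batch GD (B=N).
def minibatch_sgd(samples, w=0.0, lr=0.1, batch_size=2, steps=300, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(samples, batch_size)   # B samples without replacement
        grad = sum((w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad
    return w

w = minibatch_sgd([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)])
```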
![Page 134: Introduction to neural networks - cs.cmu.edu16385/s19/lectures/lecture19.pdf · Introduction to neural networks 16-385 Computer Vision 16385/ Spring 2019, Lecture 19](https://reader030.vdocument.in/reader030/viewer/2022040618/5f26ec722d3d7e59b24406ca/html5/thumbnails/134.jpg)
References
Basic reading: No standard textbooks yet! Some good resources:
• https://sites.google.com/site/deeplearningsummerschool/
• http://www.deeplearningbook.org/
• http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf