Lecture 14 – Neural Networks
DESCRIPTION
Lecture 14 – Neural Networks. Machine Learning. Last Time: Perceptrons; Perceptron Loss vs. Logistic Regression Loss; Training Perceptrons and Logistic Regression Models using Gradient Descent. Today: Multilayer Neural Networks; Feed Forward; Error Back-Propagation. - PowerPoint PPT Presentation

TRANSCRIPT
Lecture 14 – Neural Networks
Machine Learning
Last Time
• Perceptrons
  – Perceptron Loss vs. Logistic Regression Loss
  – Training Perceptrons and Logistic Regression Models using Gradient Descent
Today
• Multilayer Neural Networks
  – Feed Forward
  – Error Back-Propagation
Recall: The Neuron Metaphor
• Neurons
  – accept information from multiple inputs,
  – transmit information to other neurons.
• Multiply inputs by weights along edges
• Apply some function to the set of inputs at each node
Types of Neurons
Linear Neuron
Logistic Neuron
Perceptron
Potentially more; each requires a convex loss function for gradient descent training.
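As a concrete illustration (not from the slides), here is a minimal NumPy sketch of the three neuron types, assuming each applies its activation to the weighted input z = w·x + b:

```python
import numpy as np

def linear_neuron(z):
    # Linear neuron: the output is the weighted sum itself.
    return z

def logistic_neuron(z):
    # Logistic neuron: squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(z):
    # Perceptron: hard threshold on the weighted sum.
    return np.where(z >= 0.0, 1.0, 0.0)

x = np.array([0.5, -1.0])   # inputs
w = np.array([2.0, 1.0])    # weights along the edges
b = 0.1                     # bias
z = w @ x + b
print(linear_neuron(z), logistic_neuron(z), perceptron(z))
```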
Multilayer Networks
• Cascade neurons together
• The output from one layer is the input to the next
• Each layer has its own set of weights
Linear Regression Neural Networks
• What happens when we arrange linear neurons in a multilayer network?
Linear Regression Neural Networks
• Nothing special happens.
  – The product of two linear transformations is itself a linear transformation.
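One line of algebra makes this concrete: stacking two linear layers with weight matrices $W_1$ and $W_2$ collapses into a single linear map.

```latex
y = W_2 (W_1 x) = (W_2 W_1)\, x = W x, \qquad W := W_2 W_1
```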
Neural Networks
• We want to introduce non-linearities to the network.
  – Non-linearities allow a network to identify complex regions in space
Linear Separability
• A 1-layer network cannot handle XOR.
• More layers can handle more complicated spaces, but require more parameters.
• Each node splits the feature space with a hyperplane.
• If the second layer computes an AND, a 2-layer network can represent any convex region (see the sketch after this list).
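As an illustrative construction of the AND-of-hyperplanes idea (not from the slides): XOR can be written as OR(x1, x2) AND NAND(x1, x2), which a 2-layer network of perceptron units computes directly.

```python
import numpy as np

def step(a):
    # Perceptron-style hard threshold.
    return (a >= 0).astype(float)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

# Hidden layer: two hyperplanes. Column 1 computes OR(x1, x2),
# column 2 computes NAND(x1, x2).
W1 = np.array([[1., -1.],
               [1., -1.]])
b1 = np.array([-0.5, 1.5])
h = step(X @ W1 + b1)

# Output layer: AND of the two half-spaces, i.e. the XOR region.
w2 = np.array([1., 1.])
b2 = -1.5
y = step(h @ w2 + b2)
print(y)  # [0. 1. 1. 0.]
```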
Feed-Forward Networks
• Predictions are fed forward through the network to classify
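A minimal sketch of what "fed forward" means, assuming logistic neurons and illustrative layer sizes: each layer's output becomes the next layer's input.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def feed_forward(x, layers):
    # Push the input through each layer in turn.
    z = x
    for W, b in layers:
        z = sigmoid(z @ W + b)
    return z

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(2, 3)), np.zeros(3)),   # input  -> hidden
          (rng.normal(size=(3, 1)), np.zeros(1))]   # hidden -> output
print(feed_forward(np.array([0.5, -1.0]), layers))
```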
Error Backpropagation
• We will do gradient descent on the whole network.
• Training will proceed from the last layer to the first.
Error Backpropagation
• Introduce variables over the neural network
  – Distinguish the input and output of each node
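The slide's figure is not recoverable from this transcript, but the standard notation it sets up distinguishes, for each node $j$, the net input $a_j$ from the output $z_j$:

```latex
a_j = \sum_i w_{ij}\, z_i \quad \text{(input to node } j\text{)}, \qquad
z_j = f(a_j) \quad \text{(output of node } j\text{)}
```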
Error Backpropagation
Training: Take the gradient of the last component and iterate backwards
Error Backpropagation
Empirical Risk Function
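The formula itself is lost in this transcript; a typical choice, assuming squared error over $N$ training examples with targets $t$ and network outputs $y$, is:

```latex
R = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{2} \sum_{l} \bigl( y_{l}(x_n) - t_{nl} \bigr)^2
```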
Error Backpropagation
Optimize last layer weights $w_{kl}$
Calculus chain rule
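The worked steps on the slides are not recoverable here, but the standard derivation, assuming the squared-error risk above with $a_l = \sum_k w_{kl} z_k$ and $y_l = f(a_l)$, factors the gradient with the chain rule:

```latex
\frac{\partial R}{\partial w_{kl}}
= \frac{\partial R}{\partial y_l}\,
  \frac{\partial y_l}{\partial a_l}\,
  \frac{\partial a_l}{\partial w_{kl}}
= \underbrace{(y_l - t_l)\, f'(a_l)}_{\delta_l}\; z_k
```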
Error Backpropagation
Optimize last hidden weights $w_{jk}$
Multivariate chain rule
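Again reconstructing the standard derivation (the slides' steps are not in the transcript): a hidden output $z_k$ feeds every output unit, so the multivariate chain rule sums over all of them, reusing the $\delta_l$ from the last layer:

```latex
\frac{\partial R}{\partial w_{jk}}
= \Bigl( \sum_{l} \delta_l\, w_{kl} \Bigr) f'(a_k)\, z_j
  \;=\; \delta_k\, z_j,
\qquad \delta_k := f'(a_k) \sum_l w_{kl}\, \delta_l
```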
Error Backpropagation
Repeat for all previous layers
Error Backpropagation
Now that we have well-defined gradients for each parameter, update using Gradient Descent.
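The standard update, with learning rate $\eta$ (the slide's exact symbol is an assumption):

```latex
w_{ij} \leftarrow w_{ij} - \eta\, \frac{\partial R}{\partial w_{ij}}
```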
Error Back-propagation
• Error backprop unravels the multivariate chain rule and solves the gradient for each partial component separately.
• The target values for each layer come from the next layer.
• This feeds the errors back along the network.
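Putting the whole lecture together, here is a minimal NumPy sketch of feed-forward plus error backpropagation, assuming logistic neurons, squared error, and the XOR data from earlier; the architecture and hyperparameters are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# XOR: the classic task a single layer cannot solve.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)   # input  -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)   # hidden -> output
eta = 0.5

for _ in range(20000):
    # Forward pass: z/y are node outputs (slide notation).
    z1 = sigmoid(X @ W1 + b1)
    y = sigmoid(z1 @ W2 + b2)

    # Backward pass: deltas from the chain rule, last layer first.
    delta2 = (y - T) * y * (1 - y)             # dR/da at the output layer
    delta1 = (delta2 @ W2.T) * z1 * (1 - z1)   # multivariate chain rule

    # Gradient descent update for every weight.
    W2 -= eta * z1.T @ delta2
    b2 -= eta * delta2.sum(axis=0)
    W1 -= eta * X.T @ delta1
    b1 -= eta * delta1.sum(axis=0)

print(np.round(y, 2))  # approaches [[0], [1], [1], [0]]
```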
Problems with Neural Networks
• Interpretation of Hidden Layers
• Overfitting
Interpretation of Hidden Layers
• What are the hidden layers doing?!
• Feature Extraction
• The non-linearities in the feature extraction can make interpretation of the hidden layers very difficult.
• This leads to Neural Networks being treated as black boxes.
Overfitting in Neural Networks
• Neural Networks are especially prone to overfitting.
• Recall Perceptron Error
  – Zero error is possible, but so is more extreme overfitting
[Figure: Logistic Regression vs. Perceptron]
Bayesian Neural Networks
• Bayesian Logistic Regression by inserting a prior on the weights
  – Equivalent to L2 Regularization
• We can do the same here.
• Error Backprop then becomes Maximum A Posteriori (MAP) rather than Maximum Likelihood (ML) training
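To make the MAP connection concrete (a standard result, assuming a zero-mean Gaussian prior on the weights): the log-prior adds exactly an L2 penalty to the training objective, with $\lambda$ set by the prior variance.

```latex
w_{\text{MAP}}
= \arg\max_w \; \log p(D \mid w) + \log p(w)
= \arg\min_w \; R(w) + \frac{\lambda}{2} \lVert w \rVert^2
```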
Handwriting Recognition
• Demo: http://yann.lecun.com/exdb/lenet/index.html
Convolutional Network
• The network is not fully connected.
• Different nodes are responsible for different regions of the image.
• This allows for robustness to transformations.
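A minimal sketch of the weight-sharing, local-connectivity idea (illustrative only, not LeNet itself): one small kernel slides over the image, so each output node sees only its own region while all nodes share the same weights.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide one shared kernel across the image; each output node
    # is connected only to a local patch of the input.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).random((8, 8))
kernel = np.array([[1., -1.],
                   [1., -1.]])   # responds to vertical intensity edges
print(conv2d(image, kernel).shape)  # (7, 7)
```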
Other Neural Networks
• Multiple Outputs
• Skip Layer Network
• Recurrent Neural Networks
Multiple Outputs
• Used for N-way classification.
• Each node in the output layer corresponds to a different class.
• No guarantee that the sum of the output vector will equal 1.
Skip Layer Network
• Input nodes are also sent directly to the output layer.
Recurrent Neural Networks
• Output or hidden layer information is stored in a context or memory layer.
[Figure: input layer → hidden layer → output layer, with a context layer feeding back into the hidden layer]
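A minimal sketch of the context-layer idea (an Elman-style network; the sizes and logistic units are assumptions): after each step the hidden state is copied into the context layer and fed back in at the next step.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2
W_in = rng.normal(0, 0.5, (n_in, n_hid))    # input   -> hidden
W_ctx = rng.normal(0, 0.5, (n_hid, n_hid))  # context -> hidden
W_out = rng.normal(0, 0.5, (n_hid, n_out))  # hidden  -> output

context = np.zeros(n_hid)  # memory of the previous hidden state
for x in rng.random((4, n_in)):              # a short input sequence
    hidden = sigmoid(x @ W_in + context @ W_ctx)
    y = sigmoid(hidden @ W_out)
    context = hidden                          # store for the next time step
    print(np.round(y, 3))
```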
Time Delayed Recurrent Neural Networks (TDRNN)
• Outputs from the output layer at time t are used as inputs to the hidden layer at time t+1.
  – With an optional decay
[Figure: input layer → hidden layer → output layer, with delayed outputs fed back into the hidden layer]
Maximum Margin
• Perceptron can lead to many equally valid choices for the decision boundary
Are these really “equally valid”?
Max Margin
• How can we pick which is best?
• Maximize the size of the margin.
[Figure: two separating boundaries, one with a small margin and one with a large margin]
Next
• Maximum Margin Classifiers
  – Support Vector Machines
  – Kernel Methods