Neural Networks I - mkang.faculty.unlv.edu/teaching/CS489_689/11.Neural Networks I.pdf
TRANSCRIPT
INTRODUCTION TO MACHINE LEARNING
NEURAL NETWORKS I
Mingon Kang, Ph.D.
Department of Computer Science @ UNLV
Begin with a single Perceptron
Ref: Dr. Manuela Veloso, CMU
Threshold Functions
Boolean Functions and Perceptron
Discussion on Perceptrons
A single perceptron works well for classifying a linearly separable set of inputs.
Multi-layer perceptrons were found as a "solution" for representing nonlinearly separable functions (1950s).
Many local minima: the optimization problem is non-convex.
Research in neural networks stalled until the 70s.
XOR?
X = [0 0; 0 1; 1 0; 1 1], g(x) = max(x, 0)
W₁ = [1, 1; 1, 1], W₂ = [1, −2], c = [0, −1]
Hidden layer: h = g(XW₁ + c); output: y = hW₂, which gives [0, 1, 1, 0] = XOR.
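The parameters above can be checked with a quick forward pass. A minimal NumPy sketch, following the slide's g(x) = max(x, 0) and the four XOR inputs:

```python
import numpy as np

# XOR inputs, one example per row
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Parameters from the slide
W1 = np.array([[1, 1], [1, 1]])
c = np.array([0, -1])
w2 = np.array([1, -2])

g = lambda z: np.maximum(z, 0)  # rectified linear activation

h = g(X @ W1 + c)  # hidden layer activations
y = h @ w2         # network output

print(y)  # [0 1 1 0] -- exactly XOR
```

Note that without the nonlinearity g, the two layers would collapse into a single linear map, which cannot represent XOR.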
Neural Network
Feedforward Networks or Multilayer Perceptrons (MLPs)
E.g., for a classifier, y = f*(x; θ) maps an input x to a category y.
A feedforward network defines a mapping y = f(x; θ) and learns the parameters θ that best approximate f*.
Neural Network
Called "networks" because they compose many different functions.
Associated with a directed acyclic graph that represents how the functions are composed together.
Mostly chain structures:
◼ f(x) = f⁽³⁾(f⁽²⁾(f⁽¹⁾(x)))
f⁽ⁱ⁾: the i-th layer of the network
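The chain structure above is just function composition. A minimal sketch (the layer functions f1, f2, f3 here are toy stand-ins, not real network layers):

```python
from functools import reduce

def chain(*layers):
    """Compose layer functions left to right: chain(f1, f2, f3)(x) == f3(f2(f1(x)))."""
    return lambda x: reduce(lambda acc, f: f(acc), layers, x)

# Toy "layers" -- a real network would use affine maps plus activations
f1 = lambda x: x + 1
f2 = lambda x: 2 * x
f3 = lambda x: x - 3

f = chain(f1, f2, f3)
print(f(5))  # f3(f2(f1(5))) = f3(f2(6)) = f3(12) = 9
```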
Neural Network

[Figure: a feedforward network with an input layer, hidden layers (first, second, and third layers), and an output layer. The number of layers gives the network's depth; the number of units per layer gives its width.]
Design of a NN model
Architecture of the network:
How many layers
How the layers are connected to each other
How many units are in each layer
Activation function: to compute the hidden layer values
Cost function: what to optimize
Optimizer: how to optimize the model
Weights and Bias

[Figure: labeling the weights of a neural network.]
wˡⱼₖ is the weight from the k-th neuron of the (l−1)-th layer to the j-th neuron of the l-th layer.
bˡⱼ is the bias of the j-th neuron of the l-th layer (e.g., b³₁).
Weighted Input and Activation of a Neuron
The weighted input of the j-th neuron of the l-th layer is zˡⱼ:
zˡⱼ = Σₖ wˡⱼₖ aˡ⁻¹ₖ + bˡⱼ
The activation of the j-th neuron of the l-th layer is aˡⱼ:
aˡⱼ = f(zˡⱼ), where f is the activation function
aˡⱼ = f(Σₖ wˡⱼₖ aˡ⁻¹ₖ + bˡⱼ)
In matrix notation, the activation becomes:
aˡ = f(wˡ aˡ⁻¹ + bˡ)
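The matrix form maps directly to a few lines of NumPy. A minimal sketch of one layer's forward computation (the weights, bias, and input values below are illustrative assumptions):

```python
import numpy as np

def layer_forward(a_prev, W, b, f):
    """Compute a^l = f(W^l a^(l-1) + b^l) for one layer."""
    return f(W @ a_prev + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Illustrative layer: 3 inputs feeding 2 neurons
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.5]])
b = np.array([0.1, -0.1])
a_prev = np.array([1.0, 2.0, 3.0])

a = layer_forward(a_prev, W, b, sigmoid)
print(a.shape)  # (2,)
```

Row j of W holds the weights wˡⱼₖ into neuron j, so the matrix-vector product computes all the sums Σₖ wˡⱼₖ aˡ⁻¹ₖ at once.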
Example
Ref: https://visualstudiomagazine.com/articles/2013/05/01/neural-network-feed-forward.aspx
Activation Functions
• Sigmoid activation: aˡⱼ = σ(zˡⱼ) = 1 / (1 + exp(−zˡⱼ))
• Softmax activation: aˡⱼ = exp(zˡⱼ) / Σₖ exp(zˡₖ)
• Tanh activation: tanh(zˡⱼ) = (exp(zˡⱼ) − exp(−zˡⱼ)) / (exp(zˡⱼ) + exp(−zˡⱼ))
• Rectified linear activation: max(0, zˡⱼ), the maximum of 0 and zˡⱼ
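All four activations are one-liners in NumPy. A minimal sketch (the max-shift in softmax is a standard numerical-stability trick, not part of the slide's formula):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0, z)

z = np.array([-1.0, 0.0, 2.0])
print(sigmoid(0.0))      # 0.5
print(softmax(z).sum())  # 1.0 -- softmax outputs form a probability distribution
print(relu(z))           # [0. 0. 2.]
```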
Activation Functions

[Two figure-only slides: plots of the activation functions above.]
Universal Approximation Theorem
One hidden layer may be enough to represent (not learn) an approximation of any function to an arbitrary degree of accuracy.
So why go deeper?
A shallow net may need (exponentially) more width.
A shallow net may overfit more.
Cost Functions
• Quadratic cost: C = (1/2n) Σₓ ‖y(x) − aᴸ(x)‖²
• Binary cross-entropy cost: C = −(1/n) Σₓ [ y(x) ln(aᴸ(x)) + (1 − y(x)) ln(1 − aᴸ(x)) ]
• Negative log-likelihood cost: C = −ln(aᴸ_y), the negative log of the output activation for the true class y
• A cost function must satisfy the following two conditions (for backpropagation):
  • The cost function C should be calculated as an average over the cost functions Cₓ for individual training examples.
  • The cost functions Cₓ for the individual training examples, and consequently the cost C, must be a function of the outputs of the neural network.
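The first two costs can be sketched in NumPy for a single example (n = 1; the target and output values below are illustrative, and the cross-entropy here averages over output units):

```python
import numpy as np

def quadratic_cost(y, a):
    """C = (1/2) * ||y - a||^2 for a single example."""
    return 0.5 * np.sum((y - a) ** 2)

def binary_cross_entropy(y, a, eps=1e-12):
    """C = -[y ln(a) + (1 - y) ln(1 - a)], averaged over output units."""
    a = np.clip(a, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

y = np.array([0.0, 1.0])   # target
a = np.array([0.2, 0.7])   # network output
print(quadratic_cost(y, a))  # 0.5 * (0.04 + 0.09) = 0.065
print(binary_cross_entropy(y, a))
```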
Examples of Negative Log-Likelihood Cost
Given the posterior probabilities and the ground truth:
A set of output probabilities, e.g., [0.1, 0.3, 0.5, 0.1]
Ground truth, e.g., [0, 0, 0, 1]
Likelihood of the true class:
0*0.1 + 0*0.3 + 0*0.5 + 1*0.1 = 0.1
NLL: −ln(0.1) ≈ 2.3
If the ground truth is [0, 0, 1, 0]:
0*0.1 + 0*0.3 + 1*0.5 + 0*0.1 = 0.5
NLL: −ln(0.5) ≈ 0.69
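The same arithmetic as a one-liner: with a one-hot ground truth, the dot product picks out the probability of the true class, exactly as on the slide.

```python
import numpy as np

def nll(probs, onehot):
    """Negative log-likelihood: -ln of the probability assigned to the true class."""
    return -np.log(np.dot(onehot, probs))

probs = np.array([0.1, 0.3, 0.5, 0.1])
print(nll(probs, np.array([0, 0, 0, 1])))  # -ln(0.1), about 2.30
print(nll(probs, np.array([0, 0, 1, 0])))  # -ln(0.5), about 0.69
```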
Output Types
Discussion
Construct a neural network for the MNIST data:
28*28-pixel images and 10 labels
Construct a neural network for a binary classification problem:
10 features and 2 labels
Construct a neural network for predicting tomorrow's temperature in degrees:
10 variables
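As a starting point for the first exercise, one possible shape-level sketch of an MNIST classifier: 784 inputs, one hidden layer, a 10-way softmax output. The hidden width of 128 and the random weights are assumptions for illustration, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# MNIST: 28*28 = 784 input pixels, 10 output classes
n_in, n_hidden, n_out = 784, 128, 10

W1 = rng.normal(0, 0.01, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, (n_out, n_hidden))
b2 = np.zeros(n_out)

def forward(x):
    h = np.maximum(0, W1 @ x + b1)  # hidden layer with ReLU
    z = W2 @ h + b2                 # output layer weighted inputs
    e = np.exp(z - z.max())
    return e / e.sum()              # softmax over the 10 classes

x = rng.random(784)  # a stand-in "image"
p = forward(x)
print(p.shape)  # (10,) -- one probability per digit class
```

For the binary-classification exercise, the output layer would shrink to a single sigmoid unit (or 2 softmax units) with cross-entropy cost; for the temperature regression, a single linear output unit with quadratic cost is the usual choice.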