From Logistic Regression to Neural Networks
TRANSCRIPT
From Logistic Regression to Neural Networks
CMSC 470
Marine Carpuat
Logistic Regression: What you should know
• How to make a prediction with a logistic regression classifier
• How to train a logistic regression classifier
• Machine learning concepts:
  • Loss function
  • Gradient Descent Algorithm
SGD hyperparameter: the learning rate
• The hyperparameter 𝜂 that controls the size of the step down the gradient is called the learning rate.
• If 𝜂 is too large, training might not converge; if 𝜂 is too small, training might be very slow.
• How to set the learning rate? Common strategies:
  • Decay over time: 𝜂 = 1/(C + t), where C is a constant hyperparameter set by the user and t is the number of samples seen so far.
  • Use a held-out test set; increase the learning rate when likelihood increases.
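The decay schedule 𝜂 = 1/(C + t) can be sketched in a few lines. The objective below (1-D least squares) and the data are hypothetical, chosen only to show the schedule in action:

```python
# SGD with a decaying learning rate eta = 1 / (C + t),
# where C is a user-set constant and t counts samples seen so far.
# The 1-D least-squares objective here is a hypothetical example.

def sgd_with_decay(data, C=10.0, epochs=5):
    w = 0.0
    t = 0
    for _ in range(epochs):
        for x, y in data:
            eta = 1.0 / (C + t)          # decay over time
            grad = 2 * (w * x - y) * x   # gradient of (w*x - y)^2 w.r.t. w
            w -= eta * grad              # step down the gradient
            t += 1
    return w

# Fit y = 2x from noiseless samples; w should approach 2.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
print(sgd_with_decay(data))
```

Early steps are large (𝜂 ≈ 1/C) and later steps shrink, which is exactly what lets training settle near a minimum instead of bouncing around it.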
Multiclass Logistic Regression
Formalizing classification

Task definition
• Given inputs:
  • an example x
    often x is a D-dimensional vector of binary or real values
  • a fixed set of classes Y = {y1, y2, …, yJ}
    e.g. word senses from WordNet
• Output: a predicted class y ∈ Y

Classifier definition
• A function g: x ↦ g(x) = y
• Many different types of functions/classifiers can be defined
• We'll talk about the perceptron, logistic regression, neural networks.

So far we've only worked with binary classification problems, i.e. J = 2.
A multiclass logistic regression classifier
aka multinomial logistic regression, softmax logistic regression, maximum entropy (or maxent) classifier
Goal: predict probability P(y=c|x), where c is one of k classes in set C
The softmax function
• A generalization of the sigmoid
• Input: a vector z of dimensionality k
• Output: a vector of dimensionality k
  softmax(z)_i = exp(z_i) / Σ_{j=1}^{k} exp(z_j), for i = 1, …, k
• Looks like a probability distribution!
The softmax function: Example
All values are in [0,1] and sum up to 1: they can be interpreted as probabilities!
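The example on this slide can be reproduced with a short sketch; the input vector below is hypothetical, not necessarily the one shown on the slide:

```python
import math

def softmax(z):
    # Subtract max(z) for numerical stability; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [1.0, 2.0, 3.0]     # hypothetical input vector
p = softmax(z)
print(p)                # each value is in [0, 1]
print(sum(p))           # the values sum to 1
```

Note that softmax preserves the ordering of the inputs: the largest score gets the largest probability.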
A multiclass logistic regression classifier
aka multinomial logistic regression, softmax logistic regression, maximum entropy (or maxent) classifier

Goal: predict probability P(y=c|x), where c is one of k classes in set C

Model definition: We now have one weight vector w_c and one bias b_c PER CLASS:
P(y = c | x) = exp(w_c · x + b_c) / Σ_{c' ∈ C} exp(w_{c'} · x + b_{c'})
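A minimal sketch of this model with one weight vector and one bias per class; the weight and bias values below are made up for illustration:

```python
import math

def predict_proba(x, W, b):
    # One score per class: z_c = w_c . x + b_c
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_c
              for w, b_c in zip(W, b)]
    # Softmax turns the scores into P(y = c | x)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-class model over 2-dimensional inputs
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # one weight vector per class
b = [0.0, 0.5, -0.5]                          # one bias per class
probs = predict_proba([2.0, 1.0], W, b)
print(probs)
print(probs.index(max(probs)))                # predicted class
```

The predicted class is simply the one with the highest probability, i.e. the highest score, since softmax is monotonic.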
Features in multiclass logistic regression
• Features are a function of the input example and of a candidate output class c
• f_i(x, c) represents feature i for a particular class c for a given example x
Example: sentiment analysis with 3 classes {positive (+), negative (-), neutral (0)}
• Starting from the features for binary classification
• We create one copy of each feature per class
Learning in Multiclass Logistic Regression
• Loss function for a single example:
  L = − Σ_{c ∈ C} 1{y = c} log P(y = c | x)
• 1{ } is an indicator function that evaluates to 1 if the condition in the brackets is true, and to 0 otherwise
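Since the indicator selects only the true class, this loss reduces to the negative log probability of the correct class. A sketch with hypothetical class scores:

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def loss(probs, y):
    # -sum_c 1{y == c} * log P(c | x)  ==  -log P(y | x)
    return -math.log(probs[y])

probs = softmax([2.0, 1.0, 0.1])   # hypothetical per-class scores
print(loss(probs, 0))   # small loss: class 0 is the most probable
print(loss(probs, 2))   # larger loss: class 2 is unlikely
```

Minimizing this loss pushes probability mass toward the correct class, which is exactly what gradient descent does during training.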
Logistic Regression: What you should know
• How to make a prediction with a logistic regression classifier
• How to train a logistic regression classifier
• For both binary and multiclass problems
• Machine learning concepts:
  • Loss function
  • Gradient Descent Algorithm
  • Learning rate
Neural Networks
From logistic regression to a neural network unit
Limitation of perceptron
● can only find linear separations between positive and negative examples
[Figure: an XOR-style configuration of points, with X and O examples on opposite diagonals, that no single line can separate]
Example: binary classification with a neural network
● Create two classifiers
[Figure: the four points φ0(x1) = {-1, 1}, φ0(x2) = {1, 1}, φ0(x3) = {-1, -1}, φ0(x4) = {1, -1}, and two sign units; each unit takes φ0[0], φ0[1], and a bias input 1, with weights and biases (w0,0, b0,0) and (w0,1, b0,1), and outputs φ1[0] and φ1[1] respectively]
Example: binary classification with a neural network
● These classifiers map to a new space
[Figure: the two sign units map the original points to φ1(x1) = {-1, -1}, φ1(x2) = {1, -1}, φ1(x3) = {-1, 1}, φ1(x4) = {-1, -1}; in the new (φ1[0], φ1[1]) space, the X and O examples are linearly separable]
Example: binary classification with a neural network
[Figure: a third unit takes φ1[0], φ1[1], and a bias input 1, each with weight 1, and computes the final output φ2[0] = y, separating the transformed points]
Example: the final network can correctly classify the examples that the perceptron could not.
● Replace "sign" with a smoother non-linear function (e.g. tanh, sigmoid)
[Figure: the same two-layer network, with tanh units computing φ1[0], φ1[1], and the output φ2[0]]
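The slides give the network's weights graphically; the values below are one concrete setting (an assumption, not read off the slides) under which a tanh network reproduces the example's behavior on the four XOR-style points:

```python
import math

def unit(inputs, weights, bias):
    # A single network unit: tanh of a weighted sum plus bias
    return math.tanh(sum(w * v for w, v in zip(weights, inputs)) + bias)

def network(x):
    # Hidden layer: two units that map the points into a new space
    phi1 = [unit(x, [1.0, 1.0], -1.0),    # positive only near (1, 1)
            unit(x, [-1.0, -1.0], -1.0)]  # positive only near (-1, -1)
    # Output layer: one unit separating the transformed points
    return unit(phi1, [1.0, 1.0], 1.0)

# XOR-style data: (1,1) and (-1,-1) positive; (-1,1) and (1,-1) negative
for x in [(1.0, 1.0), (-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0)]:
    print(x, network(x))
```

A single perceptron cannot produce this labeling, but the two-unit hidden layer makes the problem linearly separable for the output unit.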
Feedforward Neural Networks

Components:
• an input layer
• an output layer
• one or more hidden layers

In a fully connected network: each hidden unit takes as input all the units in the previous layer.
No loops!

[Figure: a 2-layer feedforward neural network]
Designing Neural Networks: Activation functions
• The hidden layer can be viewed as a set of hidden features
• The output of the hidden layer indicates the extent to which each hidden feature is "activated" by a given input
• The activation function is a non-linear function that determines the range of hidden feature values
Designing Neural Networks: Activation functions
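This slide plots activation functions graphically; which ones it shows is not recoverable from the transcript, but sigmoid, tanh, and ReLU are typical choices, and can be sketched as:

```python
import math

def sigmoid(z):
    # Squashes any real value into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Squashes any real value into (-1, 1)
    return math.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive inputs
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```

Each function defines a different range of hidden feature values: (0, 1) for sigmoid, (-1, 1) for tanh, and [0, ∞) for ReLU.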
Designing Neural Networks: Network structure
• 2 key decisions:
  • Width (number of nodes per layer)
  • Depth (number of hidden layers)
• More parameters means that the network can learn more complex functions of the input
Forward Propagation: For a given network and some input values, compute the output
Forward Propagation: For a given network and some input values, compute the output
Given input (1,0) (and sigmoid non-linearities), we can calculate the output by processing one layer at a time:
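The layer-at-a-time computation can be sketched as below; the slide's network is shown graphically, so the weights here are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each unit: sigmoid of a weighted sum of all previous-layer units
    return [sigmoid(sum(w * v for w, v in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x, network):
    # Process one layer at a time, feeding each output into the next layer
    for weights, biases in network:
        x = layer(x, weights, biases)
    return x

# Hypothetical 2-layer network: 2 inputs -> 2 hidden units -> 1 output
network = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [-1.0]),                     # output layer
]
print(forward([1.0, 0.0], network))
```

The same `forward` function handles any depth: tabulating its value over all possible inputs gives an output table like the one on the next slide.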
Forward Propagation: For a given network and some input values, compute the output
Output table for all possible inputs:
Neural Networks as Computation Graphs
Computation Graphs Make Prediction Easy: Forward Propagation consists of traversing the graph in topological order
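A minimal sketch of forward propagation as a topological-order traversal of a computation graph; the node names and operations are illustrative:

```python
import math

# Each node: (name, operation, input node names). Listing the nodes in
# topological order means every node appears after all of its inputs,
# so a single left-to-right pass computes the whole graph.
graph = [
    ("x", None, []),                                   # input
    ("w", None, []),                                   # parameter
    ("b", None, []),                                   # parameter
    ("z", lambda v: v["w"] * v["x"] + v["b"], ["w", "x", "b"]),
    ("y", lambda v: math.tanh(v["z"]), ["z"]),         # output
]

def forward(graph, inputs):
    values = dict(inputs)
    for name, op, _ in graph:
        if op is not None:
            values[name] = op(values)  # inputs are already computed
    return values

values = forward(graph, {"x": 1.0, "w": 2.0, "b": -1.0})
print(values["y"])   # tanh(2*1 - 1) = tanh(1)
```

This is why computation graphs make prediction easy: the traversal order guarantees every operation sees its inputs already evaluated, regardless of how complicated the graph is.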
Neural Networks so far
• Powerful non-linear models for classification
• Predictions are made as a sequence of simple operations:
  • matrix-vector operations
  • non-linear activation functions
• Choices in network structure:
  • Width and depth
  • Choice of activation function
• Feedforward networks: no loops
• Next: how to train