Machine Learning Basics
Anupam Datta
CMU
Spring 2018
Security and Fairness of Deep Learning
Image Classification
Image classification pipeline
• Input: A training set of N images, each labeled with one of K different classes.
• Learning: Use the training set to learn a classifier (model) that predicts which class an input image belongs to.
• Evaluation: Evaluate the quality of the classifier by asking it to predict labels for a new set of images that it has never seen before (see the sketch below).
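As a minimal sketch, the pipeline maps onto a two-method classifier interface plus an evaluation function (hypothetical names, following the convention in the CS231n notes this deck is based on):

```python
import numpy as np

class Classifier:
    """Hypothetical interface mirroring the pipeline above."""

    def train(self, X, y):
        """Learning: fit a model to N labeled training images.

        X: (N, D) array, one flattened image per row
        y: (N,) array of labels in {0, ..., K-1}
        """
        raise NotImplementedError

    def predict(self, X):
        """Return a predicted label for each image in X."""
        raise NotImplementedError

def accuracy(clf, X_test, y_test):
    """Evaluation: fraction of never-before-seen images labeled correctly."""
    return np.mean(clf.predict(X_test) == y_test)
```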
CIFAR-10 dataset
• 60,000 tiny images, each 32 pixels high and 32 pixels wide.
• Each image is labeled with one of 10 classes.
Nearest Neighbor Classification
The top 10 nearest neighbors in the training set according to pixel-wise difference.
Pixel-wise difference
• L1 norm: $d(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$
• L2 norm: $d(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$
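With images stored as numpy arrays of identical shape, both distances are one-liners (a minimal sketch with synthetic data):

```python
import numpy as np

# Two synthetic 32x32 RGB images with pixel values in [0, 255].
rng = np.random.default_rng(0)
I1 = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)
I2 = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)

d_l1 = np.sum(np.abs(I1 - I2))          # L1 (Manhattan) distance
d_l2 = np.sqrt(np.sum((I1 - I2) ** 2))  # L2 (Euclidean) distance
```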
K-Nearest Neighbor Classifier
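The slide's figures are not reproduced here; as a minimal sketch (illustrative names, not the course's code), a k-NN classifier simply memorizes the training set and labels a test image by majority vote over its k closest training images under, say, the L1 distance:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=5):
    """Label x_test by majority vote of its k nearest training images.

    X_train: (N, D) array of flattened training images
    y_train: (N,) array of training labels
    x_test:  (D,) flattened test image
    """
    dists = np.sum(np.abs(X_train - x_test), axis=1)  # L1 distance to all N images
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```

With k = 1 this reduces to the plain nearest-neighbor classifier from the previous slide; larger k smooths out noisy neighbors. Note that all N training images must be kept in memory and scanned for every prediction, which is exactly the disadvantage discussed on the next slide.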
Disadvantages of k-NN
• The classifier must remember all of the training data and store it for future comparisons with the test data. This is space-inefficient, because datasets may easily be gigabytes in size.
• Classifying a test image is expensive, since it requires a comparison to all training images.
Linear Classification
Toward neural networks
• Logistic regression model
– A one-layer neural network
• Training a logistic regression model
– Introduction to gradient descent
• These techniques generalize to deep networks
Linear model
• Score function
– Maps raw data to class scores
• Loss function
– Measures how well predicted classes agree with ground truth labels
• Learning
– Find parameters of score function that minimize loss function
Linear score function
• $x_i$: input image, flattened into a column vector
• $W$: weights
• $b$: bias
• Score function: $f(x_i, W, b) = W x_i + b$
Learning goal: learn the weights $W$ and bias $b$ that minimize the loss.
Using score function
Predict class with highest score
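As a sketch with illustrative shapes for CIFAR-10 (K = 10 classes, D = 32·32·3 = 3072 pixel values per image): the scores are a single matrix-vector product, and the prediction is the argmax.

```python
import numpy as np

K, D = 10, 3072
rng = np.random.default_rng(0)
W = rng.normal(scale=0.001, size=(K, D))  # weights: one row (template) per class
b = np.zeros(K)                           # biases
x = rng.normal(size=D)                    # a flattened input image

scores = W @ x + b                   # one score per class
predicted_class = np.argmax(scores)  # predict the class with the highest score
```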
Addresses disadvantages of k-NN
• The classifier does not need to remember all of the training data and store it for future comparisons with the test data. It only needs the weights and bias.
• Classifying a test image is inexpensive, since it just involves a tensor multiplication. It does not require a comparison to all training images.
Linear classifiers as hyperplanes
Linear classifiers as template matching
• Each row of the weight matrix is a template for a class.
• The score of each class for an image is obtained by comparing each template with the image, one by one, using an inner product (or dot product) to find the one that fits best.
Template matching example
Predict class with highest score
(i.e., best template match)
Bias trick
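The slide's figure is not reproduced here; the trick itself, as presented in the CS231n notes this deck follows, is to append a constant 1 to every input vector and fold the bias in as an extra column of the weight matrix, so that $f(x_i, W, b) = W x_i + b$ becomes a single product. A minimal numpy check:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 10, 3072
W = rng.normal(scale=0.001, size=(K, D))
b = rng.normal(size=K)
x = rng.normal(size=D)

x_ext = np.append(x, 1.0)            # input with a constant 1 appended: shape (D+1,)
W_ext = np.hstack([W, b[:, None]])   # bias folded in as the last column: shape (K, D+1)

assert np.allclose(W_ext @ x_ext, W @ x + b)  # identical scores
```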
Linear model
• Score function
– Maps raw data to class scores
• Loss function
– Measures how well predicted classes agree with ground truth labels
• Learning
– Find parameters of score function that minimize loss function
Two loss functions
• Multiclass Support Vector Machine loss (SVM loss)
• Softmax classifier (multiclass logistic regression)
– Cross-entropy loss function
SVM loss idea
The SVM loss is set up so that the SVM wants the correct class for each image to have a score higher than the incorrect classes by some fixed margin $\Delta$.
Scores
• Score vector: $s = f(x_i, W)$
• Score for the j-th class: $s_j$
SVM loss for i-th training example
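The formula itself did not survive extraction; as given in the CS231n notes this lecture follows, the multiclass SVM loss for the i-th example, with score vector $s$, correct class $y_i$, and margin $\Delta$, is

$$L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + \Delta\right)$$

A direct numpy transcription for a single example:

```python
import numpy as np

def svm_loss_i(scores, y_i, delta=1.0):
    """Multiclass SVM (hinge) loss for one example, given its class scores."""
    margins = np.maximum(0.0, scores - scores[y_i] + delta)
    margins[y_i] = 0.0   # the correct class is excluded from the sum
    return np.sum(margins)
```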
Example
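The slide's numbers are not reproduced here; the example in the CS231n notes illustrates the computation. Suppose the scores are $s = [13, -7, 11]$, the first class is correct ($y_i = 0$), and $\Delta = 10$. Then

$$L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10) = 0 + 8 = 8$$

The first term clamps to zero because class 2 already trails the correct class by more than the margin; the second contributes 8 because class 3's score is only 2 below the correct class, 8 short of the required margin.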
An Issue
• Suppose weights $W$ achieve zero loss on every training example.
• If the difference in scores between the correct class and the nearest incorrect class is at least 15 for all examples, then multiplying all elements of $W$ by 2 would make the new difference 30.
• So $2W$ also gives zero loss if $W$ gives zero loss: the loss alone does not determine $W$ uniquely.
Regularization
• Add a regularization penalty $R(W)$ to the loss function.
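A common choice, and the one used in the CS231n notes this deck follows, is the squared L2 penalty over all entries of the weight matrix:

$$R(W) = \sum_k \sum_l W_{k,l}^2$$

Note that it depends only on the weights, not on the data.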
Multiclass SVM loss
With the regularization penalty included, the final classifier is encouraged to take all input dimensions into account in small amounts, rather than a few input dimensions very strongly.
Multiclass SVM loss
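The full regularized loss, as in the CS231n notes, with $N$ training examples and regularization strength $\lambda$ (a hyperparameter):

$$L = \frac{1}{N} \sum_i L_i + \lambda R(W)$$

The first term is the mean data loss; the second is the regularization loss.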
Example
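The slide's worked example is not reproduced here; the classic illustration from the CS231n notes makes the point. Suppose $x = [1, 1, 1, 1]$ and consider two weight vectors $w_1 = [1, 0, 0, 0]$ and $w_2 = [0.25, 0.25, 0.25, 0.25]$. Both give the same score, $w_1^\top x = w_2^\top x = 1$, but the L2 penalty of $w_1$ is $1.0$ while that of $w_2$ is only $0.25$. The regularizer therefore prefers $w_2$, the weight vector that spreads its influence across all input dimensions.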
Two loss functions
• Multiclass Support Vector Machine loss
• Softmax classifier (multiclass logistic regression)
– Cross-entropy loss function
Softmax classifier (multiclass logistic regression)
Pick class with highest probability
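The mapping from class scores $s$ to probabilities is the softmax function (standard, as in the CS231n notes):

$$P(y = k \mid x) = \frac{e^{s_k}}{\sum_j e^{s_j}}$$

Each score is exponentiated and normalized, so the outputs are positive and sum to 1.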
Logistic function
Figure 1.19(a) from Murphy
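The curve in that figure is the logistic (sigmoid) function, the two-class special case of the softmax:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$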
Logistic regression example
Figure 1.19(b) from Murphy
Cross-entropy loss
The full loss for the dataset is the mean of $L_i$ over all training examples, plus a regularization term.
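Here $L_i$ is the per-example loss: the negative log probability that the softmax assigns to the correct class (as in the CS231n notes):

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$

In code, the softmax should be computed in a numerically stable way; a common sketch shifts the scores by their maximum, which leaves the probabilities unchanged:

```python
import numpy as np

def cross_entropy_loss_i(scores, y_i):
    """Cross-entropy loss for one example, given its class scores."""
    shifted = scores - np.max(scores)                     # avoids overflow in exp
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y_i]                                # -log P(correct class)
```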
Interpreting cross-entropy loss
The cross-entropy objective wants the predicted distribution to have all of its mass on the correct answer.
Information theory motivation for cross-entropy loss
Cross-entropy between a true distribution $p$ and an estimated distribution $q$:
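$$H(p, q) = -\sum_x p(x) \log q(x)$$

Over all choices of $q$, this is minimized when $q$ matches $p$.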
Information theory motivation for cross-entropy loss
The Softmax classifier is minimizing the cross-entropy between the estimated class probabilities ($q_k = e^{s_k} / \sum_j e^{s_j}$) and the true distribution, which in this interpretation is the distribution where all probability mass is on the correct class ($p = [0, \ldots, 1, \ldots, 0]$ contains a single 1 in the $y_i$-th position).
Quiz
Learning task
• Find parameters of the model that minimize loss
• Looking ahead: Stochastic gradient descent
Looking ahead: linear algebra
• Images represented as tensors (3D arrays); see the sketch below
• Operations on these tensors used to train models
• Review basics of linear algebra
– Chapter 2 of Deep Learning textbook
– Will review briefly in class
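For concreteness, a small numpy sketch of the representation described above (synthetic data):

```python
import numpy as np

# A CIFAR-10-sized image as a 3D tensor: height x width x color channels.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flattened into a vector of 32*32*3 = 3072 numbers before it is fed
# to the linear score function f(x, W, b) = Wx + b.
x = image.reshape(-1).astype(np.float64)
print(image.shape, x.shape)  # (32, 32, 3) (3072,)
```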
Looking ahead: multivariate calculus
• Optimization of functions over tensors used to train models
• Involves basics of multivariate calculus
– Gradients, Hessian
• Will review briefly in class
Acknowledgment
• Based on material from Stanford CS231n
http://cs231n.github.io/