![Page 1: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/1.jpg)
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14, 2020
![Page 2: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/2.jpg)
Administrative: Assignment 1
Released last week, due Wed 4/22 at 11:59pm
![Page 3: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/3.jpg)
Administrative: Project proposal
Due Wed 4/27
TA expertise listed on Piazza
There is a Piazza thread to find teammates
Slack has topic-specific channels for you to use
![Page 4: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/4.jpg)
Administrative: Midterm
24-hour open-notes exam
Combination of True/False, Multiple Choice, Short Answer, Coding
Will be released May 12th, specific time TBD
Due May 13th, 24 hours from release time
The exam should take 3-4 hours to finish
![Page 5: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/5.jpg)
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 4 - April 16, 2020
Administrative: Midterm Updates
University has updated guidance on administering exams in spring quarter. In order to comply with the current policies, we have changed the exam format as follows, to be consistent with exams in previous offerings of CS231n:
Date: released on Tuesday 5/12 (open for 24 hours; choose a 1 hr 40 min time frame)
Format: Timestamped with Gradescope
![Page 6: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/6.jpg)
Administrative: Piazza
Please make sure to check and read all pinned Piazza posts.
![Page 7: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/7.jpg)
Image Classification: A core task in Computer Vision
(assume given a set of labels) {dog, cat, truck, plane, ...}
cat, dog, bird, deer, truck
This image by Nikita is licensed under CC-BY 2.0
![Page 8: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/8.jpg)
Recall from last time: Challenges of recognition
- Viewpoint
- Illumination (This image is CC0 1.0 public domain)
- Deformation (This image by Umberto Salvagnin is licensed under CC-BY 2.0)
- Occlusion (This image by jonsson is licensed under CC-BY 2.0)
- Clutter (This image is CC0 1.0 public domain)
- Intraclass Variation (This image is CC0 1.0 public domain)
![Page 9: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/9.jpg)
Recall from last time: data-driven approach, kNN
[Figure: decision regions of a 1-NN classifier and a 5-NN classifier]
[Figure: dataset splits: train | test, and train | validation | test]
![Page 10: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/10.jpg)
Recall from last time: Linear Classifier
f(x,W) = Wx + b
![Page 11: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/11.jpg)
Interpreting a Linear Classifier: Visual Viewpoint
![Page 12: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/12.jpg)
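Example with an image with 4 pixels, and 3 classes (cat/dog/ship)

f(x,W) = Wx

Algebraic Viewpoint:

$$W = \begin{bmatrix} 0.2 & -0.5 & 0.1 & 2.0 \\ 1.5 & 1.3 & 2.1 & 0.0 \\ 0 & 0.25 & 0.2 & -0.3 \end{bmatrix}, \quad b = \begin{bmatrix} 1.1 \\ 3.2 \\ -1.2 \end{bmatrix}$$

Visual Viewpoint: each row of W is reshaped to the 2x2 shape of the input image:

| Class | Row of W as a 2x2 template | b | Score |
|---|---|---|---|
| cat | [[0.2, -0.5], [0.1, 2.0]] | 1.1 | -96.8 |
| dog | [[1.5, 1.3], [2.1, 0.0]] | 3.2 | 437.9 |
| ship | [[0, 0.25], [0.2, -0.3]] | -1.2 | 61.95 |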
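The algebraic viewpoint can be checked numerically. The sketch below uses the W and b values from this slide, but the four input pixel values (56, 231, 24, 2) do not appear in the extracted text and are an assumption made for illustration.

```python
import numpy as np

# Rows of W are the per-class templates, in the order cat, dog, ship.
W = np.array([
    [0.2, -0.5, 0.1, 2.0],
    [1.5, 1.3, 2.1, 0.0],
    [0.0, 0.25, 0.2, -0.3],
])
b = np.array([1.1, 3.2, -1.2])

# ASSUMPTION: flattened 2x2 input image; these pixel values are not in the text above.
x = np.array([56.0, 231.0, 24.0, 2.0])

scores_no_bias = W @ x       # f(x, W) = Wx
scores = W @ x + b           # f(x, W) = Wx + b

print(np.round(scores_no_bias, 2))
print(np.round(scores, 2))
```

Under that assumption, the cat and dog scores with the bias come out as -96.8 and 437.9, matching the slide. The ship value of 61.95 matches the product without the bias; with the bias added it would be 60.75.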
![Page 13: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/13.jpg)
Interpreting a Linear Classifier: Geometric Viewpoint
f(x,W) = Wx + b
Array of 32x32x3 numbers (3072 numbers total)
Cat image by Nikita is licensed under CC-BY 2.0. Plot created using Wolfram Cloud.
![Page 14: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/14.jpg)
Recall from last time: Linear Classifier
TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization)
Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain
![Page 15: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/15.jpg)
Suppose: 3 training examples, 3 classes. With some W the scores are:

| Class score | Example 1 (cat image) | Example 2 (car image) | Example 3 (frog image) |
|---|---|---|---|
| cat | 3.2 | 1.3 | 2.2 |
| car | 5.1 | 4.9 | 2.5 |
| frog | -1.7 | 2.0 | -3.1 |
![Page 16: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/16.jpg)
A loss function tells how good our current classifier is
![Page 17: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/17.jpg)
Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is image and $y_i$ is (integer) label
![Page 18: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/18.jpg)
Loss over the dataset is an average of loss over examples:

$$L = \frac{1}{N} \sum_{i} L_i(f(x_i, W), y_i)$$
![Page 19: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/19.jpg)
Multiclass SVM loss:
Given an example $(x_i, y_i)$ where $x_i$ is the image and where $y_i$ is the (integer) label,
and using the shorthand for the scores vector: $s = f(x_i, W)$
the SVM loss has the form:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$$
![Page 20: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/20.jpg)
![Page 21: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/21.jpg)
![Page 22: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/22.jpg)
![Page 23: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/23.jpg)
Interpreting Multiclass SVM loss:
[Plot: “Hinge loss”. Loss (vertical axis) against score for correct class (horizontal axis); the loss decreases linearly and reaches zero once the score for the correct class exceeds the score amongst other classes by the margin.]
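The shape of the hinge can be seen by evaluating one term of the loss while sweeping the correct-class score. This is a minimal sketch; `hinge_term` and the fixed competing score of 2.0 are hypothetical choices made for illustration.

```python
import numpy as np

def hinge_term(s_correct, s_other, margin=1.0):
    """One term of the multiclass SVM loss: max(0, s_other - s_correct + margin)."""
    return np.maximum(0.0, s_other - s_correct + margin)

# Hypothetical setup: a competing class score fixed at 2.0,
# and a sweep over the score of the correct class.
s_other = 2.0
s_correct = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

losses = hinge_term(s_correct, s_other)
print(losses)  # 3, 2, 1, 0, 0: linear decrease, then flat at zero
```

The loss reaches zero exactly when the correct-class score is at least the competing score plus the margin (here 2.0 + 1.0 = 3.0), and stays at zero beyond that point.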
![Page 24: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/24.jpg)
![Page 25: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/25.jpg)
Multiclass SVM loss for the first training example (cat image, correct-class score 3.2):

$$\begin{aligned} L_i &= \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1) \\ &= \max(0, 2.9) + \max(0, -3.9) \\ &= 2.9 + 0 \\ &= 2.9 \end{aligned}$$

Losses: 2.9
![Page 26: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/26.jpg)
Multiclass SVM loss for the second training example (car image, correct-class score 4.9):

$$\begin{aligned} L_i &= \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1) \\ &= \max(0, -2.6) + \max(0, -1.9) \\ &= 0 + 0 \\ &= 0 \end{aligned}$$

Losses: 2.9, 0
![Page 27: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/27.jpg)
Multiclass SVM loss for the third training example (frog image, correct-class score -3.1):

$$\begin{aligned} L_i &= \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) \\ &= \max(0, 6.3) + \max(0, 6.6) \\ &= 6.3 + 6.6 \\ &= 12.9 \end{aligned}$$

Losses: 2.9, 0, 12.9
![Page 28: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/28.jpg)
Loss over full dataset is average:

$$L = \frac{1}{N} \sum_{i=1}^{N} L_i$$

Losses: 2.9, 0, 12.9
L = (2.9 + 0 + 12.9)/3 = 5.27
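The three per-example losses and their average can be reproduced in a few lines of NumPy. This is a sketch under the assumption that the score table is laid out with classes (cat, car, frog) as rows and the three training examples as columns; `svm_loss_per_example` is a hypothetical helper name.

```python
import numpy as np

# Rows: class scores for cat, car, frog.
# Columns: the three training examples (cat image, car image, frog image).
scores = np.array([
    [3.2, 1.3, 2.2],
    [5.1, 4.9, 2.5],
    [-1.7, 2.0, -3.1],
])
labels = np.array([0, 1, 2])  # index of the correct class for each column

def svm_loss_per_example(scores, labels, margin=1.0):
    """Multiclass SVM loss L_i for every column of `scores`."""
    n = scores.shape[1]
    cols = np.arange(n)
    correct = scores[labels, cols]                 # s_{y_i} for each example
    margins = np.maximum(0.0, scores - correct + margin)
    margins[labels, cols] = 0.0                    # skip the j == y_i terms
    return margins.sum(axis=0)

losses = svm_loss_per_example(scores, labels)
print(np.round(losses, 2))        # approximately 2.9, 0, 12.9
print(round(losses.mean(), 2))    # approximately 5.27
```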
![Page 29: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/29.jpg)
Multiclass SVM loss, second training example (car image): scores cat 1.3, car 4.9, frog 2.0; loss 0
Q1: What happens to the loss if the car scores decrease by 0.5 for this training example?
Q2: What is the min/max possible SVM loss $L_i$?
Q3: At initialization W is small so all s ≈ 0. What is the loss $L_i$, assuming N examples and C classes?
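These three questions can be probed numerically. The sketch below is an illustration rather than the lecture's own answer; `svm_loss` is a hypothetical helper, and the class count C = 10 and the extreme score of -1e6 are arbitrary choices.

```python
import numpy as np

def svm_loss(s, y, margin=1.0):
    """Multiclass SVM loss for one example with score vector s and label y."""
    margins = np.maximum(0.0, s - s[y] + margin)
    margins[y] = 0.0
    return margins.sum()

# Q1: car example, class order (cat, car, frog), correct class car (index 1).
s_car = np.array([1.3, 4.9, 2.0])
loss_before = svm_loss(s_car, 1)
loss_after = svm_loss(s_car - np.array([0.0, 0.5, 0.0]), 1)  # car score lowered by 0.5

# Q2: the loss cannot go below 0, and it grows without bound
# as the correct-class score falls further behind the others.
loss_very_wrong = svm_loss(np.array([0.0, -1e6, 0.0]), 1)

# Q3: with all scores at zero, each of the C - 1 incorrect classes contributes 1.
C = 10
loss_at_init = svm_loss(np.zeros(C), 0)

print(loss_before, loss_after, loss_very_wrong, loss_at_init)
```

In this sketch the car example keeps a loss of 0 after the 0.5 decrease because the correct score still clears both margins, and the loss at initialization comes out as C - 1, which is a common sanity check for the first training iteration.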
![Page 30: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/30.jpg)
Q4: What if the sum was over all classes (including j = y_i)?
Losses: 2.9, 0, 12.9
![Page 31: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/31.jpg)
Q5: What if we used mean instead of sum?
![Page 32: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/32.jpg)
Q6: What if we used $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)^2$?
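The variants raised in Q4 to Q6 can be compared side by side on the three example score columns. This is a sketch, not the lecture's answer: it assumes classes (cat, car, frog) as rows and examples as columns, and it takes the squared hinge as the Q6 variant.

```python
import numpy as np

scores = np.array([
    [3.2, 1.3, 2.2],
    [5.1, 4.9, 2.5],
    [-1.7, 2.0, -3.1],
])                                # rows: cat, car, frog; columns: examples
labels = np.array([0, 1, 2])
cols = np.arange(scores.shape[1])

margins = np.maximum(0.0, scores - scores[labels, cols] + 1.0)

# Q4: keep the j == y_i term, which is always max(0, 0 + 1) = 1.
loss_all_classes = margins.sum(axis=0)

# Standard form: drop the j == y_i term.
masked = margins.copy()
masked[labels, cols] = 0.0
loss_sum = masked.sum(axis=0)

# Q5: mean over the C - 1 incorrect classes instead of sum (a constant rescaling).
loss_mean = loss_sum / (scores.shape[0] - 1)

# Q6 (assumed variant): square each hinge term before summing.
loss_squared = (masked ** 2).sum(axis=0)

for name, value in [("sum", loss_sum), ("all classes", loss_all_classes),
                    ("mean", loss_mean), ("squared", loss_squared)]:
    print(name, np.round(value, 2))
```

Including the correct class shifts every loss by exactly 1, and the mean rescales every loss by the same constant, so neither changes which W is preferred. Squaring the terms changes the relative penalty of large versus small violations, so it is a different loss function.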
![Page 33: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/33.jpg)
Multiclass SVM Loss: Example code
# First calculate scores
# Then calculate the margins s_j - s_{y_i} + 1
# Only sum over j that is not y_i, so when j = y_i, set the margin to zero
# Sum across all j
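Only the comments of the example code are present in the text above. A minimal NumPy sketch that follows those four steps is given below; the function name, argument order, and the absence of a bias term are assumptions.

```python
import numpy as np

def L_i_vectorized(x, y, W):
    """Multiclass SVM loss for one example x with integer label y (name assumed)."""
    # First calculate scores
    scores = W.dot(x)
    # Then calculate the margins s_j - s_{y_i} + 1
    margins = np.maximum(0, scores - scores[y] + 1)
    # Only sum over j != y_i, so when j == y_i, set the margin to zero
    margins[y] = 0
    # Sum across all j
    loss_i = np.sum(margins)
    return loss_i

# Usage: with W set to the identity, the "image" is its own score vector,
# so this reproduces the first example (scores 3.2, 5.1, -1.7; correct class 0).
example_loss = L_i_vectorized(np.array([3.2, 5.1, -1.7]), 0, np.eye(3))
print(round(example_loss, 2))
```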
![Page 34: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/34.jpg)
Q7. Suppose that we found a W such that L = 0. Is this W unique?
![Page 35: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/35.jpg)
E.g. Suppose that we found a W such that L = 0. Is this W unique?
No! 2W also has L = 0!
![Page 36: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/36.jpg)
Before (second training example, scores 1.3, 4.9, 2.0):

$$\begin{aligned} L_i &= \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1) \\ &= \max(0, -2.6) + \max(0, -1.9) \\ &= 0 + 0 = 0 \end{aligned}$$

With W twice as large:

$$\begin{aligned} L_i &= \max(0, 2.6 - 9.8 + 1) + \max(0, 4.0 - 9.8 + 1) \\ &= \max(0, -6.2) + \max(0, -4.8) \\ &= 0 + 0 = 0 \end{aligned}$$
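The same check can be run for other scale factors. Because the scores are linear in W (ignoring the bias), scaling W by a constant scales every score by that constant; the sketch below applies the scaling directly to the scores of the car example. `svm_loss` is a hypothetical helper.

```python
import numpy as np

def svm_loss(s, y, margin=1.0):
    """Multiclass SVM loss for one example with score vector s and label y."""
    margins = np.maximum(0.0, s - s[y] + margin)
    margins[y] = 0.0
    return margins.sum()

s = np.array([1.3, 4.9, 2.0])     # car example; correct class index 1
scaled_losses = [svm_loss(scale * s, 1) for scale in (1.0, 2.0, 10.0)]
print(scaled_losses)              # every entry is 0
```

Once all margins are satisfied, making W larger only widens the gaps, so the data loss alone cannot distinguish between W, 2W, or 10W.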
![Page 37: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/37.jpg)
E.g. Suppose that we found a W such that L = 0. Is this W unique?
No! 2W also has L = 0! How do we choose between W and 2W?
![Page 38: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/38.jpg)
Regularization
Data loss: Model predictions should match training data
![Page 39: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/39.jpg)
Regularization: Prevent the model from doing too well on training data
![Page 40: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/40.jpg)
Regularization intuition: toy example training data
[Plot: training points; axes x and y]
![Page 41: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/41.jpg)
Regularization intuition: Prefer Simpler Models
[Plot: two candidate fits, f1 and f2, to the training points; axes x and y]
![Page 42: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/42.jpg)
Regularization: Prefer Simpler Models
Regularization pushes against fitting the data too well so we don’t fit noise in the data
![Page 43: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/43.jpg)
Regularization
Occam’s Razor: Among multiple competing hypotheses, the simplest is the best (William of Ockham, 1285-1347)
![Page 44: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/44.jpg)
$$L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$$

$\lambda$ = regularization strength (hyperparameter)
![Page 45: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/45.jpg)
Simple examples
L2 regularization: $R(W) = \sum_k \sum_l W_{k,l}^2$
L1 regularization: $R(W) = \sum_k \sum_l |W_{k,l}|$
Elastic net (L1 + L2): $R(W) = \sum_k \sum_l \beta W_{k,l}^2 + |W_{k,l}|$
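The three simple regularizers are short to write down in NumPy. The sketch below assumes the usual definitions (L2 sums the squared weights, L1 sums the absolute values, elastic net adds the two with a weight `beta` on the squared term) and adds the penalty to a data loss with strength `lam`; the function names, the example W, and the values of `beta` and `lam` are hypothetical.

```python
import numpy as np

def l2_penalty(W):
    """Sum of squared weights."""
    return np.sum(W ** 2)

def l1_penalty(W):
    """Sum of absolute weights."""
    return np.sum(np.abs(W))

def elastic_net_penalty(W, beta=0.5):
    """Weighted L2 term plus L1 term; beta is a hypothetical setting."""
    return np.sum(beta * W ** 2 + np.abs(W))

def total_loss(data_loss, W, lam=0.1, penalty=l2_penalty):
    """Data loss plus lam times the chosen regularization penalty."""
    return data_loss + lam * penalty(W)

W = np.array([[1.0, -2.0], [0.5, 0.0]])   # arbitrary example weights
print(l2_penalty(W), l1_penalty(W), elastic_net_penalty(W))
print(total_loss(5.27, W))
```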
![Page 46: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/46.jpg)
More complex:
- Dropout
- Batch normalization
- Stochastic depth, fractional pooling, etc.
![Page 47: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/47.jpg)
Why regularize?
- Express preferences over weights
- Make the model simple so it works on test data
- Improve optimization by adding curvature
![Page 48: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/48.jpg)
Regularization: Expressing Preferences
L2 Regularization
Which of w1 or w2 will the L2 regularizer prefer?
![Page 49: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/49.jpg)
L2 regularization likes to “spread out” the weights
![Page 50: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/50.jpg)
Which one would L1 regularization prefer?
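The preference can be made concrete with a small numerical comparison. The vectors below are an assumption, since the values of x, w1 and w2 are not present in the text above: they are chosen so that both weight vectors produce the same score, one by concentrating all weight on a single input and the other by spreading it evenly.

```python
import numpy as np

# ASSUMPTION: illustrative vectors; both weight vectors give the same score on x.
x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

results = {}
for name, w in (("w1", w1), ("w2", w2)):
    results[name] = {
        "score": float(w @ x),
        "L2": float(np.sum(w ** 2)),
        "L1": float(np.sum(np.abs(w))),
    }
    print(name, results[name])
```

For these assumed vectors the L2 penalty is 1.0 for w1 and 0.25 for w2, so L2 favors the spread-out w2, while the L1 penalty is 1.0 for both, so L1 does not separate them.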
![Page 51: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/51.jpg)
Softmax classifier
![Page 52: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/52.jpg)
![Page 53: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/53.jpg)
![Page 54: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/54.jpg)
![Page 55: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/55.jpg)
![Page 56: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/56.jpg)
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14, 202056
Softmax Classifier (Multinomial Logistic Regression)
cat
frog
car
3.25.1-1.7
Want to interpret raw classifier scores as probabilitiesSoftmax Function
24.5164.00.18
0.130.870.00
exp normalize
unnormalized probabilities
Probabilities must be >= 0
Probabilities must sum to 1
probabilitiesUnnormalized log-probabilities / logits
![Page 57: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/57.jpg)
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14, 202057
Softmax Classifier (Multinomial Logistic Regression)
cat
frog
car
3.25.1-1.7
Want to interpret raw classifier scores as probabilitiesSoftmax Function
24.5164.00.18
0.130.870.00
exp normalize
unnormalized probabilities
Probabilities must be >= 0
Probabilities must sum to 1
probabilitiesUnnormalized log-probabilities / logits
Li = -log(0.13) = 2.04
![Page 58: Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14 ...cs231n.stanford.edu/slides/2020/lecture_3.pdf · Due May 13th, 24 hours from release time The exam should take 3-4 hours](https://reader033.vdocument.in/reader033/viewer/2022050106/5f445d315d47ef3da41fd690/html5/thumbnails/58.jpg)
Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 3 - April 14, 202058
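Softmax Classifier (Multinomial Logistic Regression)

Want to interpret raw classifier scores as probabilities, using the softmax function:

$$s = f(x_i; W), \qquad P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}$$

| class | unnormalized log-probabilities / logits | unnormalized probabilities (after exp) | probabilities (after normalize) |
|---|---|---|---|
| cat | 3.2 | 24.5 | 0.13 |
| car | 5.1 | 164.0 | 0.87 |
| frog | -1.7 | 0.18 | 0.00 |

- exp: probabilities must be >= 0
- normalize: probabilities must sum to 1

The loss is the negative log-probability of the correct class (cat): $L_i = -\log P(Y = y_i \mid X = x_i)$, so here $L_i = -\log(0.13) = 2.04$ (natural log).

Maximum Likelihood Estimation: choose weights to maximize the likelihood of the observed data (see CS 229 for details).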
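
The exp / normalize / negative-log pipeline on this slide can be reproduced in a few lines. This is a minimal sketch in plain Python, not the course's code: the function names `softmax` and `softmax_loss` are hypothetical, and subtracting the maximum score is an added numerical-stability step that does not appear on the slide (it leaves the probabilities unchanged).

```python
import math

def softmax(scores):
    """Turn raw class scores (logits) into probabilities."""
    # Shift by the max score for numerical stability (an addition, not on the slide).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # exp: every entry becomes >= 0
    total = sum(exps)
    return [e / total for e in exps]           # normalize: entries sum to 1

def softmax_loss(scores, correct_class):
    """Negative log-probability of the correct class (natural log)."""
    return -math.log(softmax(scores)[correct_class])

scores = [3.2, 5.1, -1.7]          # scores from the slide; index 0 is the correct class
probs = softmax(scores)
loss = softmax_loss(scores, 0)

print([round(p, 2) for p in probs])  # [0.13, 0.87, 0.0]
print(round(loss, 2))                # 2.04
```
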
Compare the predicted probabilities (0.13, 0.87, 0.00) with the correct probabilities (1.00, 0.00, 0.00). Two standard ways to measure the mismatch between a target distribution $P$ and a predicted distribution $Q$:

Kullback–Leibler divergence: $D_{KL}(P \,\|\, Q) = \sum_y P(y) \log \frac{P(y)}{Q(y)}$

Cross Entropy: $H(P, Q) = H(P) + D_{KL}(P \,\|\, Q)$

Because the correct distribution puts all of its mass on one class, $H(P) = 0$, so both quantities reduce to the negative log-probability of the correct class.
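
As a quick numerical check of the comparison above, the sketch below evaluates the entropy, KL divergence, and cross entropy for the slide's two distributions. It is an illustration under assumptions: the helper names are hypothetical, and terms with $P(y) = 0$ are skipped, following the usual convention that $0 \log 0 = 0$.

```python
import math

def entropy(p):
    # H(P) = -sum_y P(y) log P(y); terms with P(y) = 0 contribute nothing
    return -sum(py * math.log(py) for py in p if py > 0)

def kl_divergence(p, q):
    # D_KL(P || Q) = sum_y P(y) log(P(y) / Q(y)), skipping P(y) = 0 terms
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

def cross_entropy(p, q):
    # H(P, Q) = -sum_y P(y) log Q(y), skipping P(y) = 0 terms
    return -sum(py * math.log(qy) for py, qy in zip(p, q) if py > 0)

correct = [1.00, 0.00, 0.00]      # one-hot target from the slide
predicted = [0.13, 0.87, 0.00]    # rounded softmax output from the slide

h = entropy(correct)
kl = kl_divergence(correct, predicted)
ce = cross_entropy(correct, predicted)

print(h, round(kl, 2), round(ce, 2))
```

With a one-hot target the entropy term vanishes, so the KL divergence and the cross entropy coincide with $-\log(0.13)$.
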
Putting it all together: maximize the probability of the correct class, i.e. minimize

$$L_i = -\log P(Y = y_i \mid X = x_i) = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$

Q1: What is the min/max possible softmax loss $L_i$?
A: min 0, max infinity (both are limits: the loss approaches 0 as the correct-class probability approaches 1, and grows without bound as that probability approaches 0).

Q2: At initialization all $s_j$ will be approximately equal; what is the softmax loss $L_i$, assuming C classes?
A: -log(1/C) = log(C). If C = 10, then $L_i$ = log(10) ≈ 2.3.
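
Both answers are easy to confirm numerically. The sketch below is an illustration, not course code: `softmax_loss` is a hypothetical helper written in log-sum-exp form, and the score values 50 and -50 are arbitrary choices standing in for "very confident".

```python
import math

def softmax_loss(scores, correct_class):
    # -log softmax probability of the correct class, in log-sum-exp form
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[correct_class]

# Q2: with C equal scores every class gets probability 1/C, so the loss is log(C).
C = 10
init_loss = softmax_loss([0.0] * C, 0)
print(round(init_loss, 4), round(math.log(C), 4))

# Q1: a confidently correct prediction drives the loss toward 0,
# a confidently wrong one makes it arbitrarily large.
confident_right = softmax_loss([50.0, 0.0, 0.0], 0)
confident_wrong = softmax_loss([-50.0, 0.0, 0.0], 0)
print(confident_right, confident_wrong)
```

This is a handy sanity check in practice: if the first reported training loss is far from log(C), something is likely wrong with the setup.
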
Softmax vs. SVM

Assume scores [10, -2, 3], [10, 9, 9], [10, -100, -100] and $y_i = 0$ (the first score, 10, belongs to the correct class).

Q: What is the softmax loss and the SVM loss?
Q: What is the softmax loss and the SVM loss if I double the correct class score from 10 -> 20?
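
One way to explore these two questions is to evaluate both losses directly. The sketch below assumes the multiclass SVM (hinge) loss with margin 1 and the softmax loss with natural log; the function names are hypothetical.

```python
import math

def svm_loss(scores, y, margin=1.0):
    # Multiclass hinge loss: sum over incorrect classes of max(0, s_j - s_y + margin)
    return sum(max(0.0, s - scores[y] + margin)
               for j, s in enumerate(scores) if j != y)

def softmax_loss(scores, y):
    # -log softmax probability of the correct class, in log-sum-exp form
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores)) - scores[y]

examples = [[10, -2, 3], [10, 9, 9], [10, -100, -100]]
y = 0  # the first score is the correct class

results = []
for scores in examples:
    doubled = [20] + scores[1:]  # correct class score 10 -> 20
    results.append((svm_loss(scores, y), softmax_loss(scores, y),
                    svm_loss(doubled, y), softmax_loss(doubled, y)))
    print(scores, results[-1])
```

For these inputs the SVM loss is already 0 for all three score vectors, because every margin is satisfied, and it stays 0 after doubling. The softmax loss is positive wherever the competing scores are close (about 0.55 for [10, 9, 9]) and keeps shrinking as the correct score grows.
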
Recap
- We have some dataset of (x, y)
- We have a score function, e.g. $s = f(x; W) = Wx$
- We have a loss function:
  - Softmax: $L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$
  - SVM: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
  - Full loss: $L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda R(W)$

How do we find the best W?
Optimization
(Figures omitted. Image credits: the slide image and the walking man image are CC0 1.0 public domain.)
Strategy #1: A first very bad idea solution: Random search
Let's see how well this works on the test set...
15.5% accuracy! Not bad! (SOTA is ~99.3%)
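
The random search strategy can be sketched as follows: draw many random weight matrices, evaluate the loss for each, and keep the best one. This is a minimal illustration, not the code from the slide. The tiny hand-made dataset stands in for the image dataset, and `random_search`, `mean_softmax_loss`, the number of tries, and the sampling scale are all hypothetical choices.

```python
import math
import random

def class_scores(W, x):
    # Linear score function s = W x, with W stored as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_softmax_loss(W, X, Y):
    # Average softmax loss over the dataset (no regularization in this sketch)
    total = 0.0
    for x, y in zip(X, Y):
        s = class_scores(W, x)
        m = max(s)
        total += m + math.log(sum(math.exp(v - m) for v in s)) - s[y]
    return total / len(X)

def random_search(X, Y, num_classes, num_tries=200, scale=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(X[0])
    best_W, best_loss = None, float("inf")
    for _ in range(num_tries):
        # Guess a completely new W each time; nothing is learned between tries
        W = [[rng.gauss(0.0, scale) for _ in range(dim)]
             for _ in range(num_classes)]
        loss = mean_softmax_loss(W, X, Y)
        if loss < best_loss:
            best_W, best_loss = W, loss
    return best_W, best_loss

# Toy two-class data; the trailing 1.0 in each input acts as a bias feature
X = [[1.0, 0.0, 1.0], [0.9, 0.1, 1.0], [0.0, 1.0, 1.0], [0.1, 0.9, 1.0]]
Y = [0, 0, 1, 1]

best_W, best_loss = random_search(X, Y, num_classes=2)
print(best_loss)
```

Each try ignores everything learned from the previous ones, which is why this approach scales so poorly compared with following the gradient.
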
Strategy #2: Follow the slope
In 1-dimension, the derivative of a function:

$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

In multiple dimensions, the gradient is the vector of partial derivatives along each dimension.

The slope in any direction is the dot product of the direction with the gradient.
The direction of steepest descent is the negative gradient.
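
To make the dot-product statement concrete, the sketch below compares a finite-difference slope along several unit directions with the dot product of each direction and the gradient. The toy function `f` and its hand-derived gradient are hypothetical examples chosen for illustration.

```python
import math

def f(w):
    # Hypothetical toy function: f(w) = w0^2 + 3*w1
    return w[0] ** 2 + 3.0 * w[1]

def grad_f(w):
    # Gradient of the toy function, derived by hand: [2*w0, 3]
    return [2.0 * w[0], 3.0]

def unit(direction):
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]

def numeric_slope(w, direction, h=1e-6):
    # Finite-difference slope of f at w along the unit-length direction
    u = unit(direction)
    stepped = [wi + h * ui for wi, ui in zip(w, u)]
    return (f(stepped) - f(w)) / h

def dot_slope(w, direction):
    # Slope predicted by the dot product of the unit direction with the gradient
    return sum(g * ui for g, ui in zip(grad_f(w), unit(direction)))

w = [1.0, 0.0]
neg_grad = [-g for g in grad_f(w)]
directions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], neg_grad]

for d in directions:
    print(d, numeric_slope(w, d), dot_slope(w, d))
```

Among the directions tried, the negative gradient gives the most negative slope, equal to minus the gradient's length.
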
current W:
[0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25347

W + h (first dim):
[0.34 + 0.0001, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25322
(1.25322 - 1.25347)/0.0001 = -2.5

W + h (second dim):
[0.34, -1.11 + 0.0001, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25353
(1.25353 - 1.25347)/0.0001 = 0.6

W + h (third dim):
[0.34, -1.11, 0.78 + 0.0001, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25347
(1.25347 - 1.25347)/0.0001 = 0

gradient dW:
[-2.5, 0.6, 0, ?, ?, ?, ?, ?, ?, …]

Numeric Gradient
- Slow! Need to loop over all dimensions
- Approximate
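
The procedure walked through above (nudge one dimension by h, re-evaluate the loss, divide the change by h) can be written as a short loop. This is a sketch under assumptions: `numerical_gradient` is a hypothetical helper, and `toy_loss` is a stand-in function, since the real loss depends on data not shown here.

```python
def numerical_gradient(f, w, h=1e-4):
    """Forward-difference estimate of the gradient of f at w."""
    base = f(w)
    grad = []
    for i in range(len(w)):          # one extra loss evaluation per dimension
        w_step = list(w)
        w_step[i] += h               # perturb only dimension i
        grad.append((f(w_step) - base) / h)
    return grad

def toy_loss(w):
    # Stand-in loss: sum of squares, whose exact gradient is 2*w
    return sum(wi ** 2 for wi in w)

w = [0.34, -1.11, 0.78]
grad = numerical_gradient(toy_loss, w)
print(grad)   # close to [0.68, -2.22, 1.56], but not exact
```

The loop makes both drawbacks visible: the cost grows with the number of dimensions, and the result is off from the true gradient by an amount that depends on h.
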
This is silly. The loss is just a function of W, so what we want is its gradient with respect to W, $\nabla_W L$.

Use calculus to compute an analytic gradient.

(Figures omitted: two images, both in the public domain.)
current W:
[0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25347

gradient dW:
[-2.5, 0.6, 0, 0.2, 0.7, -0.5, 1.1, 1.3, -2.1, …]

dW = ... (some function of the data and W)
In summary:
- Numerical gradient: approximate, slow, easy to write
- Analytic gradient: exact, fast, error-prone

In practice: Always use the analytic gradient, but check the implementation with the numerical gradient. This is called a gradient check.
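
A gradient check might look like the sketch below, shown for the softmax loss of a linear classifier on a single example. Several things here are assumptions rather than course code: the helper names, the small example values, the use of a centered difference (the slides use a forward difference), and the relative-error measure. The analytic gradient uses the standard result that the derivative of the softmax loss with respect to score $s_j$ is $p_j - 1[j = y_i]$.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_scores(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def loss(W, x, y):
    return -math.log(softmax(class_scores(W, x))[y])

def analytic_grad(W, x, y):
    # dL/dW[j][k] = (p_j - 1[j == y]) * x_k
    p = softmax(class_scores(W, x))
    return [[(p[j] - (1.0 if j == y else 0.0)) * xk for xk in x]
            for j in range(len(W))]

def numerical_grad(W, x, y, h=1e-5):
    # Centered difference, one weight at a time; W is restored after each probe
    grad = [[0.0] * len(row) for row in W]
    for j in range(len(W)):
        for k in range(len(W[j])):
            old = W[j][k]
            W[j][k] = old + h
            plus = loss(W, x, y)
            W[j][k] = old - h
            minus = loss(W, x, y)
            W[j][k] = old
            grad[j][k] = (plus - minus) / (2 * h)
    return grad

def max_relative_error(a, b):
    worst = 0.0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            denom = max(1e-8, abs(va) + abs(vb))
            worst = max(worst, abs(va - vb) / denom)
    return worst

W = [[0.34, -1.11], [0.78, 0.12], [0.55, 2.81]]   # 3 classes, 2 features
x = [1.0, -2.0]
y = 0

err = max_relative_error(analytic_grad(W, x, y), numerical_grad(W, x, y))
print(err)   # a tiny value indicates the two gradients agree
```

If the analytic gradient had a bug (a wrong sign, a missing term), the relative error would be many orders of magnitude larger.
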
Gradient Descent
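
The core loop of gradient descent is short: evaluate the gradient, then step in the opposite direction. The sketch below illustrates this on a toy quadratic. The function names, the step size of 0.1, the fixed number of steps, and the toy loss itself are assumptions made for this example.

```python
def gradient_descent(grad_fn, w, step_size=0.1, num_steps=100):
    """Repeatedly step along the negative gradient."""
    w = list(w)
    for _ in range(num_steps):
        grad = grad_fn(w)
        # Parameter update: move against the gradient, scaled by the step size
        w = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return w

def toy_loss(w):
    # Hypothetical bowl-shaped loss with its minimum at (3, -1)
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def toy_grad(w):
    # Analytic gradient of toy_loss
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w_start = [0.0, 0.0]
w_final = gradient_descent(toy_grad, w_start)
print(w_final, toy_loss(w_final))
```

The step size (learning rate) is the main knob: too small and progress is slow, too large and the iterates can overshoot or diverge.
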
(Figure: loss landscape over axes W_1 and W_2, marking the original W and the negative gradient direction.)
Stochastic Gradient Descent (SGD)
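$$L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(x_i, y_i, W) + \lambda R(W)$$

$$\nabla_W L(W) = \frac{1}{N} \sum_{i=1}^{N} \nabla_W L_i(x_i, y_i, W) + \lambda \nabla_W R(W)$$

Full sum expensive when N is large!
Approximate the sum using a minibatch of examples; 32 / 64 / 128 are common minibatch sizes.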
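
A minibatch version of the update can be sketched as below: each step averages per-example gradients over a small random sample instead of the whole dataset. Everything specific here is an assumption for illustration: the `sgd` helper, the noise-free toy regression data, the squared-error loss (used in place of a classification loss to keep the example short), the step size, and the omission of the regularization term.

```python
import random

def sgd(per_example_grad, w, data, step_size=0.05, batch_size=32,
        num_steps=500, seed=0):
    """Minibatch SGD: estimate the full gradient from a random sample."""
    rng = random.Random(seed)
    w = list(w)
    for _ in range(num_steps):
        batch = rng.sample(data, batch_size)      # sample a minibatch
        grad = [0.0] * len(w)
        for example in batch:
            g = per_example_grad(w, example)
            grad = [a + b / batch_size for a, b in zip(grad, g)]
        w = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return w

def squared_error_grad(w, example):
    # Gradient of L_i = (w0 * x + w1 - y)^2 with respect to (w0, w1)
    x, y = example
    err = w[0] * x + w[1] - y
    return [2.0 * err * x, 2.0 * err]

# Toy data generated from y = 2x + 1
data_rng = random.Random(1)
data = []
for _ in range(1000):
    x = data_rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + 1.0))

w = sgd(squared_error_grad, [0.0, 0.0], data)
print(w)   # should end up close to [2.0, 1.0]
```

Each step touches only 32 of the 1000 examples, so it costs a small fraction of a full-batch gradient evaluation.
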
Interactive Web Demo
http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/
Next time:
Introduction to neural networks
Backpropagation