TRANSCRIPT
Lecture 6 recap
Prof. Leal-Taixé and Prof. Niessner
Neural Network
Depth
Width
Gradient Descent for Neural Networks

Inputs $x_0, x_1, x_2$; hidden units $h_0, \dots, h_3$; outputs $y_0, y_1$; targets $t_0, t_1$.

$y_i = A\big(b_{1,i} + \textstyle\sum_j h_j \, w_{1,i,j}\big)$, where $h_j = A\big(b_{0,j} + \textstyle\sum_k x_k \, w_{0,j,k}\big)$

$L_i = (y_i - t_i)^2$

$\nabla_{W} L_{x,t}(W) = \Big[\tfrac{\partial L}{\partial w_{0,0,0}}, \dots, \tfrac{\partial L}{\partial w_{l,m,n}}, \dots, \tfrac{\partial L}{\partial b_{l,m}}\Big]$

Keep it simple: $A(x) = \max(0, x)$ (ReLU)
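The slide's two-layer ReLU network can be sketched numerically. This is a minimal illustration, not the lecture's code; the shapes and random weights are made up.

```python
import numpy as np

def relu(x):
    # A(x) = max(0, x), the activation used on the slide
    return np.maximum(0.0, x)

def forward(x, W0, b0, W1, b1):
    # h_j = A(b0_j + sum_k x_k w0_jk);  y_i = A(b1_i + sum_j h_j w1_ij)
    h = relu(b0 + W0 @ x)
    y = relu(b1 + W1 @ h)
    return y

# Tiny example: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
W0, b0 = rng.normal(size=(4, 3)), np.zeros(4)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(2)

y = forward(x, W0, b0, W1, b1)
t = np.array([0.0, 1.0])          # targets
L = np.sum((y - t) ** 2)          # L = sum_i (y_i - t_i)^2
```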
Stochastic Gradient Descent (SGD)

$\theta^{k+1} = \theta^{k} - \alpha \, \nabla_{\theta} L(\theta^{k}, x_{\{1..m\}}, y_{\{1..m\}})$

$\nabla_{\theta} L = \frac{1}{m} \sum_{i=1}^{m} \nabla_{\theta} L_i$ (gradient for the $k$-th batch)

- $k$ now refers to the $k$-th iteration
- $m$ training samples in the current batch
- plus all variations of SGD: momentum, RMSProp, Adam, …
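As a sketch of the update rule above, here is plain SGD minimizing a simple quadratic loss on mini-batches. The toy problem, batch size, and learning rate are made up for illustration.

```python
import numpy as np

def sgd_step(theta, grad, lr):
    # theta^{k+1} = theta^k - alpha * grad_theta L
    return theta - lr * grad

# Toy problem: L(theta) = mean_i (theta - y_i)^2, minimized at mean(y)
rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=1.0, size=100)

theta = 0.0
for k in range(200):                        # k-th iteration
    batch = rng.choice(y, size=10)          # m = 10 samples in the current batch
    grad = 2.0 * np.mean(theta - batch)     # (1/m) * sum_i of per-sample gradients
    theta = sgd_step(theta, grad, lr=0.05)
```

After a couple hundred iterations, `theta` hovers around the data mean, with some noise from the mini-batch sampling.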
Importance of Learning Rate
Over- and Underfitting
Figure panels: Underfitted, Appropriate, Overfitted
(Figure extracted from Deep Learning by Adam Gibson and Josh Patterson, O'Reilly Media Inc., 2017)
Over- and Underfitting
Source: http://srdas.github.io/DLBook/ImprovingModelGeneralization.html
Basic recipe for machine learning
• Split your data: 60% train, 20% validation, 20% test
• Find your hyperparameters on the validation split
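The 60/20/20 split can be sketched as follows (the function name and seed are illustrative, not from the slides):

```python
import numpy as np

def split_data(X, y, train=0.6, val=0.2, seed=0):
    # Shuffle once, then cut into train / validation / test parts.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train * len(X))
    n_val = int(val * len(X))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_val]
    te = idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X = np.arange(100).reshape(100, 1)
y = np.arange(100)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = split_data(X, y)
# Hyperparameters are tuned on the validation split;
# the test split is touched only once, at the very end.
```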
Basic recipe for machine learning
Basically…
(Deep learning memes)
Fun things…
(Deep learning memes)
Going Deep into Neural Networks
Simple Starting Points for Debugging
• Start simple!
  – First, overfit to a single training sample
  – Second, overfit to several training samples
• Always try a simple architecture first
  – It will verify that you are learning something
• Estimate timings (how long does each epoch take?)
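The first debugging step, overfitting a single sample, can be sketched as a tiny training loop. The model and data below are made up for illustration; only the recipe matters.

```python
import numpy as np

# Sanity check: a pipeline that cannot overfit ONE sample is broken.
x = np.array([1.0, 2.0, 3.0])   # single training sample
t = 5.0                         # its target
w = np.zeros(3)                 # linear model y = w . x

for epoch in range(500):
    y = w @ x
    grad = 2.0 * (y - t) * x    # gradient of (y - t)^2 w.r.t. w
    w -= 0.01 * grad            # plain gradient descent step

final_loss = (w @ x - t) ** 2   # should be driven to ~0 if learning works
```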
![Page 16: Lecture 6 recapย ยท Cross-Entropy (Softmax) Softmax =โlog( ๐๐ ๐ฆ ฯ ๐ ๐ ) Suppose: 3 training examples and 3 classes 1.3 4.9 2.0 3.2 5.1-1.7 2.2 2.5-3.1 cat chair โcarโ](https://reader033.vdocument.in/reader033/viewer/2022060901/609e3f1d35dd6a763b13ebe8/html5/thumbnails/16.jpg)
Neural Network
• Problems of going deeper…
  – Vanishing gradients (multiplication in the chain rule)
• The impact of small decisions (architecture, activation functions, …)
• Is my network training correctly?
![Page 17: Lecture 6 recapย ยท Cross-Entropy (Softmax) Softmax =โlog( ๐๐ ๐ฆ ฯ ๐ ๐ ) Suppose: 3 training examples and 3 classes 1.3 4.9 2.0 3.2 5.1-1.7 2.2 2.5-3.1 cat chair โcarโ](https://reader033.vdocument.in/reader033/viewer/2022060901/609e3f1d35dd6a763b13ebe8/html5/thumbnails/17.jpg)
Neural Networks
1) Output functions
2) Functions in neurons
3) Input of data
![Page 18: Lecture 6 recapย ยท Cross-Entropy (Softmax) Softmax =โlog( ๐๐ ๐ฆ ฯ ๐ ๐ ) Suppose: 3 training examples and 3 classes 1.3 4.9 2.0 3.2 5.1-1.7 2.2 2.5-3.1 cat chair โcarโ](https://reader033.vdocument.in/reader033/viewer/2022060901/609e3f1d35dd6a763b13ebe8/html5/thumbnails/18.jpg)
Output Functions
Neural Networks
What is the shape of this function?
Prediction → Loss (Softmax, Hinge)
Sigmoid for Binary Predictions
The sigmoid squashes its input to the range $(0, 1)$, so the output can be interpreted as a probability.
Softmax formulation
• What if we have multiple classes?
Softmax formulation
• Softmax: exponentiate the scores, then normalize them so they sum to 1
• Softmax loss (Maximum Likelihood Estimate): $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$
Loss Functions
Naïve Losses

L2 loss: $L_2 = \sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2$
- Sum of squared differences (SSD)
- Prone to outliers
- Compute-efficient (optimization)
- Optimum is the mean

L1 loss: $L_1 = \sum_{i=1}^{n} |y_i - f(x_i)|$
- Sum of absolute differences
- Robust to outliers
- Costly to compute
- Optimum is the median

Example ($4 \times 4$ predictions $x_i$ vs. targets $y_i$; only the first row differs):

$x_i$:
12 24 42 23
34 32  5  2
12 31 12 31
31 64  5 13

$y_i$:
15 20 40 25
34 32  5  2
12 31 12 31
31 64  5 13

$L_1(x, y) = 3 + 4 + 2 + 2 + 0 + \dots + 0 = 11$
$L_2(x, y) = 9 + 16 + 4 + 4 + 0 + \dots + 0 = 33$
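The example can be checked directly in a few lines:

```python
import numpy as np

x = np.array([[12, 24, 42, 23],
              [34, 32,  5,  2],
              [12, 31, 12, 31],
              [31, 64,  5, 13]], dtype=float)
y = np.array([[15, 20, 40, 25],
              [34, 32,  5,  2],
              [12, 31, 12, 31],
              [31, 64,  5, 13]], dtype=float)

L1 = np.sum(np.abs(y - x))     # sum of absolute differences: 3+4+2+2 = 11
L2 = np.sum((y - x) ** 2)      # sum of squared differences:  9+16+4+4 = 33
```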
Cross-Entropy (Softmax)

Softmax loss: $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$

Given a function with weights $\theta$ and training pairs $[x_i; y_i]$ (input and labels).
Score function: $s = f(x_i, \theta)$, e.g., $f(x_i, \theta) = \theta \cdot [x_0, x_1, \dots, x_d]^T$

Suppose: 3 training examples and 3 classes. Scores (one column per training example; the label of each example is in quotes):

|       | "cat" | "chair" | "car" |
|-------|------:|--------:|------:|
| cat   |   3.2 |     1.3 |   2.2 |
| chair |   5.1 |     4.9 |   2.5 |
| car   |  -1.7 |     2.0 |  -3.1 |
Cross-Entropy (Softmax): worked example

Take the first example (correct class: cat), with scores $(3.2, 5.1, -1.7)$:

scores: 3.2, 5.1, -1.7
→ exp: 24.5, 164.0, 0.18
→ normalize: 0.13, 0.87, 0.00
→ $-\log(x)$: 2.04, 0.14, 6.94

The loss is the $-\log$ of the correct class's probability: $L_1 = 2.04$.
Cross-Entropy (Softmax): full loss

Per-example losses: $L_1 = 2.04$, $L_2 = 0.079$, $L_3 = 6.156$

$L = \frac{1}{n}\sum_{i=1}^{n} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.04 + 0.079 + 6.156}{3} = \mathbf{2.76}$
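The whole pipeline (exp, normalize, $-\log$) fits in a few lines of code and reproduces the slides' numbers:

```python
import numpy as np

def softmax_loss(scores, correct):
    # L_i = -log( e^{s_{y_i}} / sum_j e^{s_j} )
    exps = np.exp(scores)
    probs = exps / exps.sum()        # normalize
    return -np.log(probs[correct])

# The three examples; correct classes are cat (0), chair (1), car (2)
L1 = softmax_loss(np.array([3.2, 5.1, -1.7]), 0)   # ~2.04
L2 = softmax_loss(np.array([1.3, 4.9,  2.0]), 1)   # ~0.079
L3 = softmax_loss(np.array([2.2, 2.5, -3.1]), 2)   # ~6.156
L = (L1 + L2 + L3) / 3                             # ~2.76
```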
Hinge Loss (SVM Loss)

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
Using the same 3 training examples and 3 classes as before:

$L_1 = \max(0,\, 5.1 - 3.2 + 1) + \max(0,\, -1.7 - 3.2 + 1) = \max(0, 2.9) + \max(0, -3.9) = 2.9 + 0 = \mathbf{2.9}$
$L_2 = \max(0,\, 1.3 - 4.9 + 1) + \max(0,\, 2.0 - 4.9 + 1) = \max(0, -2.6) + \max(0, -1.9) = 0 + 0 = \mathbf{0}$
$L_3 = \max(0,\, 2.2 - (-3.1) + 1) + \max(0,\, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 6.3 + 6.6 = \mathbf{12.9}$
Full loss (over all pairs):

$L = \frac{1}{n}\sum_{i=1}^{n} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.9 + 0 + 12.9}{3} = \mathbf{5.27}$
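The same per-example computations in code, reproducing 2.9, 0, and 12.9:

```python
import numpy as np

def hinge_loss(scores, correct):
    # L_i = sum_{j != y_i} max(0, s_j - s_{y_i} + 1)
    margins = np.maximum(0.0, scores - scores[correct] + 1.0)
    margins[correct] = 0.0           # skip j == y_i
    return margins.sum()

L1 = hinge_loss(np.array([3.2, 5.1, -1.7]), 0)   # 2.9
L2 = hinge_loss(np.array([1.3, 4.9,  2.0]), 1)   # 0.0
L3 = hinge_loss(np.array([2.2, 2.5, -3.1]), 2)   # 12.9
L = (L1 + L2 + L3) / 3                           # ~5.27
```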
Hinge Loss vs Softmax

Softmax: $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$

Hinge loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
Hinge Loss vs Softmax

Given the following scores, with correct class $y_i = 0$:

$s = [5, -3, 2]$: hinge loss $= \max(0, -3 - 5 + 1) + \max(0, 2 - 5 + 1) = 0$; softmax loss $= -\ln\big(e^5 / (e^5 + e^{-3} + e^2)\big) \approx 0.05$

$s = [5, 10, 10]$: hinge loss $= \max(0, 10 - 5 + 1) + \max(0, 10 - 5 + 1) = 12$; softmax loss $= -\ln\big(e^5 / (e^5 + e^{10} + e^{10})\big) \approx 5.70$

$s = [5, -20, -20]$: hinge loss $= \max(0, -20 - 5 + 1) + \max(0, -20 - 5 + 1) = 0$; softmax loss $= -\ln\big(e^5 / (e^5 + e^{-20} + e^{-20})\big) \approx 3 \cdot 10^{-11}$

Softmax *always* wants to improve! Hinge loss saturates.
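The contrast is easy to verify numerically: hinge reports exactly zero for both the first and third score vectors, while softmax still distinguishes them.

```python
import numpy as np

def hinge(scores, correct):
    m = np.maximum(0.0, scores - scores[correct] + 1.0)
    m[correct] = 0.0                 # skip the correct class
    return m.sum()

def softmax(scores, correct):
    e = np.exp(scores)
    return -np.log(e[correct] / e.sum())

cases = [np.array([5.0, -3.0, 2.0]),
         np.array([5.0, 10.0, 10.0]),
         np.array([5.0, -20.0, -20.0])]

results = [(hinge(s, 0), softmax(s, 0)) for s in cases]
# hinge:   0, 12, 0        -- saturates for the 1st and 3rd case
# softmax: ~0.05, ~5.70, ~3e-11 -- nonzero everywhere, always wants to improve
```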
Loss in Compute Graph

Given a function with weights $\theta$ and training pairs $[x_i; y_i]$ (input and labels):

input data $x_i$ → score function $s = f(x_i, \theta)$, e.g., $f(x_i, \theta) = \theta \cdot [x_0, x_1, \dots, x_d]^T$ → data losses, using the labels $y_i$:

SVM: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
Softmax: $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$

plus a regularization loss:

Full loss: $L = \frac{1}{n}\sum_{i=1}^{n} L_i + R_2(\theta)$, e.g., $L_2$-reg: $R_2(\theta) = \sum_k w_k^2$
Compute Graphs

We want to find the optimal $\theta$, i.e., the weights are the unknowns of the optimization problem.

Compute the gradient w.r.t. $\theta$: the gradient $\nabla_{\theta} L$ is computed via backpropagation.
Weight Regularization & SVM Loss

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max\big(0, f(x_i;\theta)_j - f(x_i;\theta)_{y_i} + 1\big)$

$L_2$-reg: $R_2(\theta) = \sum_k w_k^2$
$L_1$-reg: $R_1(\theta) = \sum_k |w_k|$

Full loss: $L = \frac{1}{n}\sum_{i=1}^{n} \sum_{j \neq y_i} \max\big(0, f(x_i;\theta)_j - f(x_i;\theta)_{y_i} + 1\big) + \lambda R(\theta)$
Example:

$x = [1, 1, 1, 1]$, $w_1 = [1, 0, 0, 0]$, $w_2 = [0.25, 0.25, 0.25, 0.25]$

Both give the same score, $w_1^T x = w_2^T x = 1$, but:

$R_2(w_1) = 1$
$R_2(w_2) = 0.25^2 + 0.25^2 + 0.25^2 + 0.25^2 = 0.25$
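The example in code: both weight vectors score the sample identically, but $L_2$ regularization assigns the spread-out one a smaller penalty.

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

score1 = w1 @ x                 # 1.0
score2 = w2 @ x                 # 1.0  -- same prediction
R2_w1 = np.sum(w1 ** 2)         # 1.0
R2_w2 = np.sum(w2 ** 2)         # 0.25 -- smaller penalty, so w2 is preferred
```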
Activation Functions
Neurons
Activation Functions or Hidden Units
Sigmoid

$\sigma(x) = \frac{1}{1 + e^{-x}}$: the output lies in $(0, 1)$ and can be interpreted as a probability.
Sigmoid: Forward

Saturated neurons kill the gradient flow: only a small active region around zero provides useful gradients for gradient descent.
Sigmoid

The output is always positive.
Problem of Positive Output
We want to compute the gradient wrt the weights
Prof. Leal-Taixรฉ and Prof. Niessner 61
![Page 62: Lecture 6 recapย ยท Cross-Entropy (Softmax) Softmax =โlog( ๐๐ ๐ฆ ฯ ๐ ๐ ) Suppose: 3 training examples and 3 classes 1.3 4.9 2.0 3.2 5.1-1.7 2.2 2.5-3.1 cat chair โcarโ](https://reader033.vdocument.in/reader033/viewer/2022060901/609e3f1d35dd6a763b13ebe8/html5/thumbnails/62.jpg)
Problem of Positive Output
We want to compute the gradient wrt the weights
Prof. Leal-Taixรฉ and Prof. Niessner 62
Problem of Positive Output
The gradient is going to be either positive or negative for all weights at once.
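A toy NumPy illustration (values are hypothetical): for a neuron $f = w \cdot x + b$ whose inputs $x_i$ are all positive (e.g. sigmoid outputs from the previous layer), each weight gradient is $\partial L/\partial w_i = (\partial L/\partial f)\, x_i$, so every entry shares the sign of the upstream gradient.

```python
import numpy as np

# Inputs that are all positive (e.g. outputs of a sigmoid layer).
x = np.array([0.2, 0.7, 0.9])

# For f = w.x + b, the gradient wrt each weight is dL/dw_i = dL/df * x_i.
upstream = -1.5          # some upstream gradient dL/df (illustrative)
grad_w = upstream * x    # every entry shares the sign of `upstream`

# All weights can only move in the same direction in a single update,
# which forces inefficient zig-zag optimization paths.
print(grad_w)
```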
Problem of Positive Output
More on zero-mean data later.
tanh
Zero-centered, but still saturates at both tails.
LeCun 1991
Rectified Linear Units (ReLU)
Large and consistent gradients. Does not saturate. Fast convergence.
Krizhevsky 2012
Rectified Linear Units (ReLU)
What happens if a ReLU outputs zero? A dead ReLU: its gradient is zero, so it no longer updates.
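A minimal NumPy sketch of the dead-ReLU problem (the pre-activation values are hypothetical): when a neuron's pre-activation is negative for every input, both its output and its local gradient are zero, so gradient descent never changes it again.

```python
import numpy as np

def relu(x):
    """ReLU activation: max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Subgradient of ReLU: 1 for x > 0, 0 otherwise."""
    return (x > 0).astype(float)

# Hypothetical dead neuron: negative pre-activation for all inputs.
pre_activations = np.array([-3.0, -0.5, -1.2])
print(relu(pre_activations))        # [0. 0. 0.]
print(relu_grad(pre_activations))   # [0. 0. 0.]
```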
Rectified Linear Units (ReLU)
• Initializing ReLU neurons with slightly positive biases (0.1) makes it likely that they stay active for most inputs.
Leaky ReLU
Does not die.
Maas 2013
Parametric ReLU
Does not die. One more parameter to backprop into.
He 2015
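A small NumPy sketch (assuming the common default slope $\alpha = 0.01$) of the Leaky ReLU and its gradient; because the negative branch has slope $\alpha$ instead of 0, the local gradient is never exactly zero and the unit cannot die the way a plain ReLU can.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small slope `alpha` for negative inputs instead of 0."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    """Local gradient: 1 for x > 0, `alpha` otherwise -- never exactly zero."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, 0.5])
print(leaky_relu(x), leaky_relu_grad(x))
```

In the Parametric ReLU (He 2015), the slope $\alpha$ is not fixed but learned by backpropagation, which is the extra parameter the slide mentions.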
Maxout units

$\text{maxout}(x) = \max(w_1^T x + b_1,\; w_2^T x + b_2)$

Piecewise linear approximation of a convex function with $N$ pieces.
Goodfellow 2013
Maxout units
• Generalization of ReLUs
• Linear regimes
• Does not die
• Does not saturate
• Increases the number of parameters
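A toy NumPy sketch of a single maxout unit with $k = 2$ pieces (input dimension and values are illustrative): two independent affine maps of the input, then the max. ReLU is recovered as the special case $w_2 = 0,\, b_2 = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)

# One maxout unit with k = 2 pieces: two independent affine maps of the
# input, then take the max (input dimension 4 is illustrative).
x = rng.normal(size=4)
w1, b1 = rng.normal(size=4), 0.1
w2, b2 = rng.normal(size=4), -0.3

out = max(w1 @ x + b1, w2 @ x + b2)

# ReLU is the special case w2 = 0, b2 = 0: max(w1.x + b1, 0).
relu_case = max(w1 @ x + b1, 0.0)
print(out, relu_case)
```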
Quick Guide
• Sigmoid is not really used
• ReLU is the standard choice
• Second choice: the variants of ReLU, or Maxout
• Recurrent nets will require tanh or similar
Neural Networks
Data pre-processing? → Loss (Softmax, Hinge) → Prediction
Data pre-processing
For images: subtract the mean image (AlexNet) or the per-channel mean (VGG-Net).
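A minimal NumPy sketch of the two mean-subtraction schemes on a toy batch (shapes and values are illustrative, not from any real dataset):

```python
import numpy as np

# A toy batch of 8 RGB images, shape (N, H, W, C).
images = np.random.default_rng(1).uniform(0, 255, size=(8, 4, 4, 3))

# AlexNet-style: subtract the mean *image* (one H x W x C mean over the batch).
mean_image = images.mean(axis=0)
centered_a = images - mean_image

# VGG-style: subtract one scalar mean *per channel*.
per_channel_mean = images.mean(axis=(0, 1, 2))   # shape (3,)
centered_b = images - per_channel_mean

# Both yield (approximately) zero-mean data.
print(centered_a.mean(), centered_b.mean())
```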
Outlook
• Regularization in the optimization: dropout, weight decay, etc.
• Regularization in the architecture: convolutions! (CNNs)
• Initialization of the optimization: how to set the weights at the beginning
• Handling limited training data: data augmentation
Next lecture
• Next lecture: Dec 13th
– More about training neural networks: regularization, batch normalization, dropout, etc.
– Followed by CNNs