Lecture 6 recap
Prof. Leal-Taixé and Prof. Niessner

Page 1: Lecture 6 recap

Page 2: Neural Network

[Diagram: a neural network, annotated with its Depth (number of layers) and Width (neurons per layer).]

Page 3: Gradient Descent for Neural Networks

[Diagram: a two-layer network with inputs $x_0, x_1, x_2$, hidden units $h_0, \dots, h_3$, outputs $y_0, y_1$, and targets $t_0, t_1$.]

$h_j = A\!\left(b_{0,j} + \sum_k x_k w_{0,j,k}\right) \qquad y_i = A\!\left(b_{1,i} + \sum_j h_j w_{1,i,j}\right)$

$L_i = (y_i - t_i)^2$

$\nabla_{w,b} f_{x,t}(w) = \left[\frac{\partial f}{\partial w_{0,0,0}}, \;\dots,\; \frac{\partial f}{\partial w_{l,m,n}}, \;\dots,\; \frac{\partial f}{\partial b_{l,m}}\right]^T$

Just simple: $A(x) = \max(0, x)$
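A minimal NumPy sketch of this forward pass and loss (sizes and values are illustrative, not from the lecture):

```python
import numpy as np

def relu(x):
    # A(x) = max(0, x), applied elementwise
    return np.maximum(0, x)

def forward(x, W0, b0, W1, b1):
    h = relu(b0 + W0 @ x)   # h_j = A(b_{0,j} + sum_k x_k w_{0,j,k})
    y = relu(b1 + W1 @ h)   # y_i = A(b_{1,i} + sum_j h_j w_{1,i,j})
    return y

def loss(y, t):
    # L = sum_i (y_i - t_i)^2
    return np.sum((y - t) ** 2)

# illustrative sizes: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([1.0, 0.0])
W0, b0 = rng.normal(size=(4, 3)), np.zeros(4)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(2)
print(loss(forward(x, W0, b0, W1, b1), t))
```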

Page 4: Stochastic Gradient Descent (SGD)

$\theta^{k+1} = \theta^k - \alpha \nabla_\theta L(\theta^k, x_{\{1..m\}}, y_{\{1..m\}})$

$\nabla_\theta L = \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta L_i$ is the gradient for the $k$-th batch.

Here $k$ now refers to the $k$-th iteration, and $m$ is the number of training samples in the current batch.

+ all variations of SGD: momentum, RMSProp, Adam, …
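A sketch of one SGD iteration, assuming a hypothetical per-sample gradient function `grad_L` (not part of the lecture's code):

```python
import numpy as np

def sgd_step(theta, grad_L, batch_x, batch_y, lr):
    # hypothetical grad_L(theta, x, y) returns the gradient of L_i w.r.t. theta
    grad = np.mean([grad_L(theta, x, y) for x, y in zip(batch_x, batch_y)], axis=0)
    # theta^{k+1} = theta^k - alpha * grad_theta L
    return theta - lr * grad
```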

Page 5: Importance of Learning Rate

Page 6: Over- and Underfitting

[Figure: three fits of the same data – underfitted, appropriate, and overfitted. Extracted from Deep Learning by Adam Gibson and Josh Patterson, O'Reilly Media Inc., 2017.]

Page 7: Over- and Underfitting

Source: http://srdas.github.io/DLBook/ImprovingModelGeneralization.html

Page 8: Basic recipe for machine learning

• Split your data: 60% train, 20% validation, 20% test (a split sketch follows below)
• Find your hyperparameters
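A minimal sketch of such a 60/20/20 split (the helper name and dataset size are made up):

```python
import numpy as np

def split_indices(n, seed=0):
    # shuffle all indices, then cut 60% / 20% / 20%
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 600 200 200
```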

Page 9: Basic recipe for machine learning

Page 10: Basically… [Deep learning memes]

Page 11: Fun things… [Deep learning memes]

Page 12: Fun things… [Deep learning memes]

Page 13: Fun things… [Deep learning memes]

Page 14: Going Deep into Neural Networks

Page 15: Simple Starting Points for Debugging

• Start simple!
  – First, overfit to a single training sample (a minimal sketch follows after this list)
  – Second, overfit to several training samples
• Always try a simple architecture first – it will verify that you are learning something
• Estimate timings (how long for each epoch?)
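A minimal sketch of the overfit-one-sample check, using a linear model and made-up numbers:

```python
import numpy as np

x = np.array([0.5, -0.2, 0.1, 0.3])  # a single training sample
t = 1.0                              # its target
w = np.zeros(4)                      # linear model weights

for step in range(200):
    y = w @ x               # prediction
    grad = 2 * (y - t) * x  # gradient of the squared loss (y - t)^2
    w -= 0.1 * grad         # gradient descent step

print(abs(w @ x - t))  # ~1e-7: the model memorizes the sample, as it should
```

If even this does not drive the loss to (almost) zero, something in the pipeline is broken.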

Page 16: Neural Network

• Problems of going deeper…
  – Vanishing gradients (multiplication in the chain rule)
• The impact of small decisions (architecture, activation functions, …)
• Is my network training correctly?

Page 17: Neural Networks

[Diagram of a network, annotated:] 1) Output functions, 2) Functions in neurons, 3) Input of data

Page 18: Output Functions

Page 19: Neural Networks

What is the shape of this function? [Diagram: network → Prediction → Loss (Softmax, Hinge).]

Page 20: Sigmoid for Binary Predictions

The sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$ squashes its input into the range $(0, 1)$, so the output can be interpreted as a probability.

Page 21: Softmax formulation

• What if we have multiple classes?

Page 22: Softmax formulation

• What if we have multiple classes? → Softmax

Page 23: Softmax formulation

• What if we have multiple classes? → Softmax

Page 24: Softmax formulation

• Softmax: exponentiate the scores, then normalize, i.e. $\mathrm{softmax}(s)_i = \frac{e^{s_i}}{\sum_j e^{s_j}}$
• Softmax loss (Maximum Likelihood Estimate)
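In code, the exp-then-normalize pipeline is usually written with the maximum subtracted first for numerical stability (a standard trick, not shown on the slide):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))  # shifting by max(s) leaves the result unchanged
    return e / e.sum()

print(softmax(np.array([3.2, 5.1, -1.7])).round(2))  # [0.13 0.87 0.  ]
```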

Page 25: Loss Functions

Page 26: Naïve Losses

L2 loss: $L_2 = \sum_{i=1}^{n} (y_i - f(x_i))^2$
– Sum of squared differences (SSD)
– Prone to outliers
– Compute-efficient (optimization)
– Optimum is the mean

L1 loss: $L_1 = \sum_{i=1}^{n} |y_i - f(x_i)|$
– Sum of absolute differences
– Robust to outliers
– Costly to compute
– Optimum is the median

Example, with $x_i$ on the left and $y_i$ on the right (only the first row differs):

  12 24 42 23      15 20 40 25
  34 32  5  2      34 32  5  2
  12 31 12 31      12 31 12 31
  31 64  5 13      31 64  5 13

$L_1(x, y) = 3 + 4 + 2 + 2 + 0 + \dots + 0 = 11$
$L_2(x, y) = 9 + 16 + 4 + 4 + 0 + \dots + 0 = 33$
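The example can be checked in a few lines of NumPy:

```python
import numpy as np

x = np.array([[12, 24, 42, 23], [34, 32, 5, 2],
              [12, 31, 12, 31], [31, 64, 5, 13]])
y = np.array([[15, 20, 40, 25], [34, 32, 5, 2],
              [12, 31, 12, 31], [31, 64, 5, 13]])

print(np.abs(x - y).sum())   # L1 = 3 + 4 + 2 + 2 = 11
print(((x - y) ** 2).sum())  # L2 = 9 + 16 + 4 + 4 = 33
```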

Page 27: Cross-Entropy (Softmax)

Softmax loss: $L_i = -\log\!\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$

Given a function with weights $W$ and training pairs $[x_i; y_i]$ (input and labels), the score function is $s = f(x_i, W)$, e.g. $f(x_i, W) = W \cdot [x_0, x_1, \dots, x_N]^T$.

Suppose: 3 training examples and 3 classes. The columns are the three training examples (a cat, a chair, and a "car" image, each labeled with its own class); the rows are the class scores:

          cat   chair   "car"
  cat     3.2    1.3     2.2
  chair   5.1    4.9     2.5
  car    -1.7    2.0    -3.1

Page 28: Cross-Entropy (Softmax)

Same setup. Consider the first example (the cat image), with scores $(3.2, 5.1, -1.7)$.

Page 29: Cross-Entropy (Softmax)

Apply exp: $(3.2, 5.1, -1.7) \;\rightarrow\; (24.5, 164.0, 0.18)$

Page 30: Cross-Entropy (Softmax)

Normalize: $(24.5, 164.0, 0.18) \;\rightarrow\; (0.13, 0.87, 0.00)$

Page 31: Cross-Entropy (Softmax)

Apply $-\log(x)$: $(0.13, 0.87, 0.00) \;\rightarrow\; (2.04, 0.14, 6.94)$. The loss of this example is the $-\log$ of the probability assigned to the correct class (cat): $L_1 = 2.04$.

Page 32: Cross-Entropy (Softmax)

Repeating this for all three examples gives per-example losses of $2.04$, $0.079$, and $6.156$, so the full loss is

$L = \frac{1}{N} \sum_{i=1}^{N} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.04 + 0.079 + 6.156}{3} = \mathbf{2.76}$
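The whole worked example in a few lines of NumPy (scores and labels as on the slides):

```python
import numpy as np

scores = np.array([[3.2, 5.1, -1.7],   # example 1: cat image
                   [1.3, 4.9,  2.0],   # example 2: chair image
                   [2.2, 2.5, -3.1]])  # example 3: "car" image
labels = np.array([0, 1, 2])           # correct class index per example

exp_s = np.exp(scores)
probs = exp_s / exp_s.sum(axis=1, keepdims=True)         # exp, then normalize
losses = -np.log(probs[np.arange(len(labels)), labels])  # -log of correct class
print(losses.round(3))         # [2.041 0.079 6.156]
print(losses.mean().round(2))  # 2.76
```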

Page 33: Hinge Loss (SVM Loss)

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

Page 34: Hinge Loss (SVM Loss)

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

Given a function with weights $W$ and training pairs $[x_i; y_i]$ (input and labels), the score function is $s = f(x_i, W)$, e.g. $f(x_i, W) = W \cdot [x_0, x_1, \dots, x_N]^T$.

Page 35: Hinge Loss (SVM Loss)

Same setup. Suppose: 3 training examples and 3 classes.

Page 36: Hinge Loss (SVM Loss)

Scores, as before (columns are the training examples, rows the class scores):

          cat   chair   "car"
  cat     3.2    1.3     2.2
  chair   5.1    4.9     2.5
  car    -1.7    2.0    -3.1

Page 37: Hinge Loss (SVM Loss)

For the first example (correct class cat, $s_{y_1} = 3.2$):

$L_1 = \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1) = \max(0, 2.9) + \max(0, -3.9) = 2.9 + 0 = \mathbf{2.9}$

Page 38: Hinge Loss (SVM Loss)

For the second example (correct class chair, $s_{y_2} = 4.9$):

$L_2 = \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1) = \max(0, -2.6) + \max(0, -1.9) = 0 + 0 = \mathbf{0}$

Page 39: Hinge Loss (SVM Loss)

For the third example (correct class car, $s_{y_3} = -3.1$):

$L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 6.3 + 6.6 = \mathbf{12.9}$

Page 40: Hinge Loss (SVM Loss)

Per-example losses: 2.9, 0, 12.9. Full loss (over all pairs):

$L = \frac{1}{N} \sum_{i=1}^{N} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.9 + 0 + 12.9}{3} = \mathbf{5.27}$
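The same scores pushed through the multiclass SVM loss:

```python
import numpy as np

scores = np.array([[3.2, 5.1, -1.7],
                   [1.3, 4.9,  2.0],
                   [2.2, 2.5, -3.1]])
labels = np.array([0, 1, 2])

losses = []
for s, y in zip(scores, labels):
    margins = np.maximum(0, s - s[y] + 1)  # max(0, s_j - s_{y_i} + 1)
    margins[y] = 0                         # the sum skips j = y_i
    losses.append(margins.sum())

print(np.round(losses, 1))       # [ 2.9  0.  12.9]
print(np.mean(losses).round(2))  # 5.27
```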

Page 41: Hinge Loss vs Softmax

Hinge loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
Softmax: $L_i = -\log\!\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$

Page 42: Hinge Loss vs Softmax

Given the following scores, with $y_i = 0$ in each case:

$s = [5, -3, 2]$
$s = [5, 10, 10]$
$s = [5, -20, -20]$

Page 43: Hinge Loss vs Softmax

Hinge loss:
$\max(0, -3 - 5 + 1) + \max(0, 2 - 5 + 1) = 0$
$\max(0, 10 - 5 + 1) + \max(0, 10 - 5 + 1) = 12$
$\max(0, -20 - 5 + 1) + \max(0, -20 - 5 + 1) = 0$

Page 44: Hinge Loss vs Softmax

Softmax loss (type it into Google…):
$-\ln\!\left(\frac{e^5}{e^5 + e^{-3} + e^2}\right) = 0.05$
$-\ln\!\left(\frac{e^5}{e^5 + e^{10} + e^{10}}\right) = 5.70$
$-\ln\!\left(\frac{e^5}{e^5 + e^{-20} + e^{-20}}\right) \approx 2 \times 10^{-11}$

Page 45: Hinge Loss vs Softmax

Hinge losses: 0, 12, 0. Softmax losses: 0.05, 5.70, $\approx 2 \times 10^{-11}$.

Softmax *always* wants to improve! Hinge loss saturates.
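Both losses side by side on the three score vectors, reproducing the numbers above:

```python
import numpy as np

def hinge(s, y):
    m = np.maximum(0, s - s[y] + 1)
    m[y] = 0
    return m.sum()

def softmax_loss(s, y):
    e = np.exp(s - s.max())  # stabilized softmax
    return -np.log(e[y] / e.sum())

for s in ([5, -3, 2], [5, 10, 10], [5, -20, -20]):
    s = np.array(s, dtype=float)
    print(hinge(s, 0), softmax_loss(s, 0))
# 0.0  0.0489...  hinge already satisfied, softmax still pushes
# 12.0 5.6978...
# 0.0  2.8e-11    tiny, but never exactly zero
```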

Page 46: Loss in Compute Graph

[Compute graph: input data $x_i$ and weights $W$ feed the score function; scores and labeled data $y_i$ feed the data losses; $W$ also feeds the regularization loss; both combine into the full loss $L$.]

Score function: $s = f(x_i, W)$, e.g. $f(x_i, W) = W \cdot [x_0, x_1, \dots, x_N]^T$
SVM: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
Softmax: $L_i = -\log\!\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$
Full loss: $L = \frac{1}{N} \sum_{i=1}^{N} L_i + R_2(W)$, e.g. $L_2$-reg: $R_2(W) = \sum_i w_i^2$

Page 47: Compute Graphs

Same components: the score function $s = f(x_i, W)$, the SVM or Softmax data loss, and the full loss $L = \frac{1}{N} \sum_{i=1}^{N} L_i + R_2(W)$ with e.g. $L_2$-reg $R_2(W) = \sum_i w_i^2$.

Page 48: Compute Graphs

We want to find the optimal $W$, i.e., the weights are the unknowns of the optimization problem. Compute the gradient w.r.t. $W$: the gradient $\nabla_W L$ is computed via backpropagation.

Page 49: Weight Regularization & SVM Loss

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + 1)$

$L_2$-reg: $R_2(W) = \sum_i w_i^2$
$L_1$-reg: $R_1(W) = \sum_i |w_i|$

Full loss: $L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq y_i} \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + 1) + \lambda R(W)$

Page 50: Weight Regularization & SVM Loss

(Same formulas as on the previous page.)

Example: $x = [1, 1, 1, 1]$, $w_1 = [1, 0, 0, 0]$, $w_2 = [0.25, 0.25, 0.25, 0.25]$.

$w_1^T x = w_2^T x = 1$, so both weight vectors produce the same scores, but
$R_2(w_1) = 1$
$R_2(w_2) = 0.25^2 + 0.25^2 + 0.25^2 + 0.25^2 = 0.25$

so the $L_2$ regularizer prefers the spread-out weights $w_2$.
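Checking the example numerically:

```python
import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1 @ x, w2 @ x)                # 1.0 1.0  -> identical scores (same data loss)
print((w1**2).sum(), (w2**2).sum())  # 1.0 0.25 -> the L2 regularizer prefers w2
```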

Page 51: Activation Functions

Page 52: Neurons

Page 53: Neurons

Page 54: Neural Networks

What is the shape of this function? [Diagram: network → Prediction → Loss (Softmax, Hinge).]

Page 55: Activation Functions or Hidden Units

Page 56: Sigmoid

Can be interpreted as a probability.

Page 57: Sigmoid – Forward

Page 58: Sigmoid – Forward

Saturated neurons kill the gradient flow (a numerical check follows below).
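A quick numerical check of the saturation effect (input values chosen arbitrarily):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

for x in [0.0, 2.5, 10.0]:
    print(x, sigmoid_grad(x))
# 0.0   0.25      active region
# 2.5   ~0.07     already shrinking
# 10.0  ~4.5e-05  saturated: almost no gradient flows back
```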

Page 59: Sigmoid – Forward

[Plot annotation: only the central, steep part of the sigmoid is an active region for gradient descent.]

Page 60: Sigmoid

Output is always positive.

Page 61: Problem of Positive Output

We want to compute the gradient w.r.t. the weights.

Page 62: Problem of Positive Output

We want to compute the gradient w.r.t. the weights.

Page 63: Problem of Positive Output

The gradient is going to be either positive or negative for all weights.

Page 64: Problem of Positive Output

More on zero-mean data later.

Page 65: tanh

Zero-centered, but still saturates. [LeCun 1991]

Page 66: Rectified Linear Units (ReLU)

Large and consistent gradients; does not saturate; fast convergence. [Krizhevsky 2012]

Page 67: Rectified Linear Units (ReLU)

Large and consistent gradients; does not saturate; fast convergence.

What happens if a ReLU outputs zero? → Dead ReLU (no gradient flows back).

Page 68: Rectified Linear Units (ReLU)

• Initializing ReLU neurons with slightly positive biases (0.1) makes it likely that they stay active for most inputs.

Page 69: Leaky ReLU

Does not die. [Maas 2013]

Page 70: Parametric ReLU

Does not die; one more parameter to backprop into. [He 2015]

Page 71: Maxout units

$\mathrm{Maxout}(x) = \max(w_1^T x + b_1, \; w_2^T x + b_2)$ [Goodfellow 2013]

Page 72: Maxout units

Piecewise linear approximation of a convex function with N pieces.

Page 73: Maxout units

Generalization of ReLUs; linear regimes; does not die; does not saturate; but increases the number of parameters.
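A maxout unit as a sketch in NumPy (the weights here are random placeholders):

```python
import numpy as np

def maxout(x, W1, b1, W2, b2):
    # elementwise max over two affine maps: max(w1^T x + b1, w2^T x + b2)
    return np.maximum(W1 @ x + b1, W2 @ x + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(4)
print(maxout(x, W1, b1, W2, b2))
```

Setting $w_2 = 0$ and $b_2 = 0$ recovers the ReLU, which is why maxout generalizes it.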

Page 74: Quick Guide

• Sigmoid is not really used
• ReLU is the standard choice
• Second choice: the ReLU variants or Maxout
• Recurrent nets will require tanh or similar

Page 75: Neural Networks

Data pre-processing? [Diagram: data → network → Prediction → Loss (Softmax, Hinge).]

Page 76: Data pre-processing

For images, subtract the mean image (AlexNet) or the per-channel mean (VGG-Net).
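A sketch of both variants on a hypothetical image batch of shape (N, H, W, C):

```python
import numpy as np

images = np.random.rand(100, 32, 32, 3)    # hypothetical dataset (N, H, W, C)

mean_image = images.mean(axis=0)           # AlexNet-style: one mean per pixel
per_channel = images.mean(axis=(0, 1, 2))  # VGG-style: one mean per channel

centered_alexnet = images - mean_image
centered_vgg = images - per_channel        # broadcasts over N, H, W
```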

Page 77: Outlook

Page 78: Outlook

• Regularization in the optimization: dropout, weight decay, etc.
• Regularization in the architecture: convolutions! (CNNs)
• Initialization of the optimization: how to set the weights at the beginning
• Handling limited training data: data augmentation

Page 79: Next lecture

• Next lecture Dec 13th
  – More about training neural networks: regularization, BN, dropout, etc.
  – Followed by CNNs