Lecture 20: Forward pass and backpropagation
TRANSCRIPT
Happy Wednesday!
▪ Assignment 4 due tonight, Nov 11th, 11:59 pm (midnight)
▪ Exceptional late policy: no penalty until Mon, Nov 23rd, 11:59 pm
▪ Quiz 11, Friday, Oct 30th, 6 am until Nov 1st, 11:59 pm (midnight): neural networks
▪ Touch-point 3: deliverables due Nov 22nd, live event Mon, Nov 23rd
  ▪ Single-slide presentation outlining progress highlights and current challenges
  ▪ Three-minute pre-recorded presentation with your progress and current challenges
▪ Project final report due Dec 7th, 11:59 pm (midnight)
  ▪ GitHub page with all of the results you have achieved utilizing both unsupervised learning and supervised learning
  ▪ Final seven-minute pre-recorded presentation
CS4641B Machine Learning | Fall 2020
Coming up soon
CS4641B Machine Learning
Lecture 22: Neural networks: Backpropagation algorithm
Rodrigo Borela · [email protected]
Recap
[Figure: fully connected network with input units x_1, ..., x_D plus a bias unit, a hidden layer of summation units, and output units y_1, ..., y_K.]
▪ This is a two-layer feedforward neural network
Input layer, hidden layer, output layer
Recap: Forward pass
[Figure: two-layer network with bias units x_0 and z_0, inputs x_1, x_2, hidden units z_1, z_2, z_3, and output y_1; weights labeled w_{i j}^{(1)} (input to hidden) and w_{j 1}^{(2)} (hidden to output).]
Recap: Forward pass
▪ Activations
a_1 = \sum_{i=1}^{D} w_{i1}^{(1)} x_i + w_{01}^{(1)} = w_{11}^{(1)} x_1 + w_{21}^{(1)} x_2 + w_{01}^{(1)}
a_2 = \sum_{i=1}^{D} w_{i2}^{(1)} x_i + w_{02}^{(1)} = w_{12}^{(1)} x_1 + w_{22}^{(1)} x_2 + w_{02}^{(1)}
a_3 = \sum_{i=1}^{D} w_{i3}^{(1)} x_i + w_{03}^{(1)} = w_{13}^{(1)} x_1 + w_{23}^{(1)} x_2 + w_{03}^{(1)}
▪ Hidden units
z_1 = h(a_1) = \frac{1}{1 + \exp(-a_1)}
z_2 = h(a_2) = \frac{1}{1 + \exp(-a_2)}
z_3 = h(a_3) = \frac{1}{1 + \exp(-a_3)}
Recap: Forward pass
▪ Output
a_1^{(2)} = \sum_{j=1}^{M} w_{j1}^{(2)} z_j + w_{01}^{(2)} = w_{11}^{(2)} z_1 + w_{21}^{(2)} z_2 + w_{31}^{(2)} z_3 + w_{01}^{(2)}
y_1 = \sigma(a_1^{(2)}) = a_1^{(2)}
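The forward pass above can be sketched in NumPy. This is an illustrative sketch (function names and array shapes are my own, not from the course code): a sigmoid hidden layer followed by a linear output.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    """One forward pass through the two-layer network.
    W1: (D, M) first-layer weights, b1: (M,) biases w_{0j}^{(1)},
    W2: (M, 1) second-layer weights, b2: scalar bias w_{01}^{(2)}."""
    a1 = x @ W1 + b1   # activations a_j
    z = sigmoid(a1)    # hidden units z_j = h(a_j)
    a2 = z @ W2 + b2   # output activation a_1^{(2)}
    y = a2             # linear output: y_1 = sigma(a_1^{(2)}) = a_1^{(2)}
    return a1, z, y
```

With all weights set to 1 and x = [0.5, 0.5], this reproduces the worked example later in the lecture.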
Training the network
▪ Stochastic gradient descent
\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_n(\mathbf{w}^{(\tau)})
▪ Considering a linear model y_k(\mathbf{x}, \mathbf{w}_k) = \mathbf{w}_k^T \mathbf{x} = \sum_{i=0}^{D} w_{ki} x_i
▪ Error function for a datapoint \mathbf{x}_n with a target vector of size K:
E_n = \frac{1}{2} \sum_{k=1}^{K} (y_{nk} - t_{nk})^2
▪ Calculating the gradient of this error function with respect to w_{ki}:
\partial E_n / \partial w_{ki} = (y_{nk} - t_{nk}) x_{ni}
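The gradient and one SGD step for this linear model can be sketched directly. This is my own notation (the input x is assumed to be augmented with the bias component x_0 = 1):

```python
import numpy as np

def grad_En(W, x, t):
    """Gradient of E_n = 1/2 * sum_k (y_nk - t_nk)^2 for y_k = w_k^T x.
    W: (K, D+1), x: (D+1,) with x[0] = 1, t: (K,).
    Returns dE_n/dW with entries (y_nk - t_nk) * x_ni."""
    y = W @ x
    return np.outer(y - t, x)

def sgd_step(W, x, t, eta=0.1):
    """One stochastic gradient descent update: w <- w - eta * grad."""
    return W - eta * grad_En(W, x, t)
```

For example, with K = 1, w = [1, 1], x = [1, 0.5] and t = 0.5, the prediction is 1.5, the gradient is [1.0, 0.5], and one step with η = 0.1 gives [0.9, 0.95].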
Training the network
▪ Activation:
a_j = \sum_i w_{ji} z_i  (z_i is the input from another layer)
▪ Hidden unit:
z_j = h(a_j)
▪ The gradient of the error E_n depends on the weight w_{ji} only via the summed input a_j to unit j:
\partial E_n / \partial w_{ji} = (\partial E_n / \partial a_j)(\partial a_j / \partial w_{ji})
Training the network
▪ Output layer
\delta_k = y_k - t_k
▪ Hidden layers
\delta_j = \partial E_n / \partial a_j = \sum_k (\partial E_n / \partial a_k)(\partial a_k / \partial a_j)
\delta_j = \partial E_n / \partial a_j = h'(a_j) \sum_k w_{kj} \delta_k
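The hidden-layer recursion can be written compactly for the sigmoid case, where h'(a_j) = z_j(1 - z_j). The vectorized form below is a sketch under my own shape conventions:

```python
import numpy as np

def hidden_deltas(z, W_out, delta_out):
    """delta_j = h'(a_j) * sum_k w_kj * delta_k for a sigmoid layer.
    z: (M,) hidden activations, W_out: (M, K) weights w_kj,
    delta_out: (K,) output deltas. Returns (M,) hidden deltas."""
    return z * (1.0 - z) * (W_out @ delta_out)
```

Using z = 0.88 for all three hidden units, unit second-layer weights, and an output delta of 3.14 (the values of the worked example later in the lecture) gives hidden deltas of about 0.33.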
Error backpropagation algorithm
▪ Initialize the weights
▪ Apply input vector \mathbf{x}_n
▪ Evaluate the \delta_k for all the output units
▪ Backpropagate the \delta's to obtain \delta_j for each hidden unit
▪ Evaluate the required derivatives
▪ Update the weights
[Figure: units z_i and z_j connected by weights w_{ji} and w_{kj}; the deltas \delta_1, ..., \delta_k flow backward to \delta_j.]
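The six steps above can be combined into one function for the 2-3-1 network used in the following slides. This is a sketch (the names and the bias-folding convention are mine, not the course's reference implementation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W1, W2, eta):
    """One full pass of the algorithm for a 2-3-1 network with a sigmoid
    hidden layer and a linear output. Bias units are folded into the
    weights: W1 is (3, 3) acting on [1, x1, x2], W2 is (4,) acting on
    [1, z1, z2, z3]."""
    # forward pass
    x_b = np.concatenate(([1.0], x))    # prepend bias x0 = 1
    a1 = W1 @ x_b                       # hidden activations a_j
    z = sigmoid(a1)
    z_b = np.concatenate(([1.0], z))    # prepend bias z0 = 1
    y = W2 @ z_b                        # linear output y1
    # output delta: delta_1^(2) = y - t
    delta_out = y - t
    # hidden deltas: delta_j^(1) = h'(a_j) * w_j1^(2) * delta_1^(2)
    delta_hidden = z * (1.0 - z) * W2[1:] * delta_out
    # derivatives and weight update: w <- w - eta * dE_n/dw
    W2_new = W2 - eta * delta_out * z_b
    W1_new = W1 - eta * np.outer(delta_hidden, x_b)
    return W1_new, W2_new, y
```

With all weights initialized to 1, x = [0.5, 0.5], t = 0.5, and η = 1/3, this reproduces the numbers derived step by step on the next slides.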
Backprop: example
[Figure: the example network with bias units x_0 and z_0, inputs x_1, x_2, hidden units z_1, z_2, z_3, output y_1, and the deltas \delta_0^{(1)}, \delta_1^{(1)}, \delta_2^{(1)}, \delta_3^{(1)}, \delta_1^{(2)} marked next to the corresponding units.]
Our goal is to compute:
\partial E_n / \partial w_{ji} and \partial E_n / \partial w_{kj}
Backprop: example
[Figure: the same network with every weight initialized to 1: w_{ji} = 1.0, w_{kj} = 1.0.]
Backprop: example
▪ Forward pass with input vector \mathbf{x}_n = [0.5, 0.5]^T, t_n = 0.5
a_1^{(1)} = \sum_{i=1}^{D} w_{i1}^{(1)} x_i + w_{01}^{(1)} = w_{11}^{(1)} x_1 + w_{21}^{(1)} x_2 + w_{01}^{(1)} = 1 \times 0.5 + 1 \times 0.5 + 1 = 2.0
a_2^{(1)} = \sum_{i=1}^{D} w_{i2}^{(1)} x_i + w_{02}^{(1)} = w_{12}^{(1)} x_1 + w_{22}^{(1)} x_2 + w_{02}^{(1)} = 1 \times 0.5 + 1 \times 0.5 + 1 = 2.0
a_3^{(1)} = \sum_{i=1}^{D} w_{i3}^{(1)} x_i + w_{03}^{(1)} = w_{13}^{(1)} x_1 + w_{23}^{(1)} x_2 + w_{03}^{(1)} = 1 \times 0.5 + 1 \times 0.5 + 1 = 2.0
▪ Hidden units:
z_1 = h(a_1^{(1)}) = \frac{1}{1 + \exp(-a_1^{(1)})} = \frac{1}{1 + \exp(-2)} = 0.88
z_2 = h(a_2^{(1)}) = \frac{1}{1 + \exp(-a_2^{(1)})} = \frac{1}{1 + \exp(-2)} = 0.88
z_3 = h(a_3^{(1)}) = \frac{1}{1 + \exp(-a_3^{(1)})} = \frac{1}{1 + \exp(-2)} = 0.88
Backprop: example
▪ Output
a_1^{(2)} = \sum_{j=1}^{M} w_{j1}^{(2)} z_j + w_{01}^{(2)} = w_{11}^{(2)} z_1 + w_{21}^{(2)} z_2 + w_{31}^{(2)} z_3 + w_{01}^{(2)} = 3.64
y_1 = \sigma(a_1^{(2)}) = a_1^{(2)} = 3.64
Backprop: example
Input variable \mathbf{x}_n: x_1 = 0.5, x_2 = 0.5
Hidden units: z_1 = 0.88, z_2 = 0.88, z_3 = 0.88
Output: y_1 = 3.64
Target: t_n = 0.5
[Figure: the network annotated with these values: inputs 0.5 and 0.5, bias units 1, hidden units 0.88, output 3.64, and the deltas \delta_0^{(1)}, ..., \delta_3^{(1)}, \delta_1^{(2)} still to be computed.]
Backprop: example
▪ Only one output with a linear activation function, therefore:
\delta_1^{(2)} = y_1 - t_{n1} = 3.64 - 0.5 = 3.14
▪ Backpropagate (using the second-layer weight w_{j1}^{(2)}):
\delta_j^{(1)} = h'(a_j) w_{j1}^{(2)} \delta_1^{(2)}
▪ For the sigmoid function:
h(a) = \frac{1}{1 + \exp(-a)} \Rightarrow h'(a) = h(a)(1 - h(a))
▪ Since z_j = h(a_j), the expression then becomes:
\delta_j^{(1)} = (z_j - z_j^2) w_{j1}^{(2)} \delta_1^{(2)}
▪ \delta_0^{(1)} = 0, \delta_1^{(1)} = 0.33, \delta_2^{(1)} = 0.33, \delta_3^{(1)} = 0.33
Backprop: example
▪ Evaluate the derivatives
\partial E_n / \partial w_{kj}^{(2)} = \delta_k z_j \Rightarrow
\partial E_n / \partial w_{01}^{(2)} = \delta_1^{(2)} z_0 = 3.14 \times 1 = 3.14
\partial E_n / \partial w_{11}^{(2)} = \delta_1^{(2)} z_1 = 3.14 \times 0.88 = 2.76
\partial E_n / \partial w_{21}^{(2)} = \delta_1^{(2)} z_2 = 3.14 \times 0.88 = 2.76
\partial E_n / \partial w_{31}^{(2)} = \delta_1^{(2)} z_3 = 3.14 \times 0.88 = 2.76
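The four second-layer derivatives above are just the output delta times the bias-augmented hidden activations. A quick check, using the slide's rounded values:

```python
import numpy as np

# dE_n/dw_j1^(2) = delta_1^(2) * z_j, with z_0 = 1 for the bias unit
delta_out = 3.14                        # delta_1^(2) from the slide
z = np.array([1.0, 0.88, 0.88, 0.88])  # [z0, z1, z2, z3]
grads2 = delta_out * z                  # [3.14, 2.76, 2.76, 2.76] (rounded)
```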
Backprop: example
▪ Evaluate the derivatives
\partial E_n / \partial w_{ij}^{(1)} = \delta_j^{(1)} x_i \Rightarrow
\partial E_n / \partial w_{01}^{(1)} = \delta_1^{(1)} x_0,  \partial E_n / \partial w_{11}^{(1)} = \delta_1^{(1)} x_1,  \partial E_n / \partial w_{21}^{(1)} = \delta_1^{(1)} x_2
\partial E_n / \partial w_{02}^{(1)} = \delta_2^{(1)} x_0,  \partial E_n / \partial w_{12}^{(1)} = \delta_2^{(1)} x_1,  \partial E_n / \partial w_{22}^{(1)} = \delta_2^{(1)} x_2
\partial E_n / \partial w_{03}^{(1)} = \delta_3^{(1)} x_0,  \partial E_n / \partial w_{13}^{(1)} = \delta_3^{(1)} x_1,  \partial E_n / \partial w_{23}^{(1)} = \delta_3^{(1)} x_2
\partial E_n / \partial w_{01}^{(1)} = 0.33,  \partial E_n / \partial w_{11}^{(1)} = 0.165,  \partial E_n / \partial w_{21}^{(1)} = 0.165
\partial E_n / \partial w_{02}^{(1)} = 0.33,  \partial E_n / \partial w_{12}^{(1)} = 0.165,  \partial E_n / \partial w_{22}^{(1)} = 0.165
\partial E_n / \partial w_{03}^{(1)} = 0.33,  \partial E_n / \partial w_{13}^{(1)} = 0.165,  \partial E_n / \partial w_{23}^{(1)} = 0.165
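The nine first-layer derivatives form an outer product of the bias-augmented input with the hidden deltas. A sketch using the slide's rounded values:

```python
import numpy as np

# dE_n/dw_ij^(1) = delta_j^(1) * x_i, with x_0 = 1 for the bias unit
x = np.array([1.0, 0.5, 0.5])          # [x0, x1, x2]
delta1 = np.array([0.33, 0.33, 0.33])  # hidden deltas from the slide
grads1 = np.outer(x, delta1)           # grads1[i, j] = delta_j^(1) * x_i
```

The bias row comes out as 0.33 and the other rows as 0.165, matching the slide.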
Backprop: example
▪ Use the update rule and the calculated error derivatives
w^{new} = w^{old} - \eta \nabla E_n(w^{old})
▪ Assuming \eta = \frac{1}{3}, the new weights are calculated for the network, e.g. weight w_{01}^{(1)} (weight from x_0 to z_1 in the first layer):
w_{01}^{(1),new} = w_{01}^{(1),old} - \eta \, \partial E_n / \partial w_{01}^{(1),old} = 1 - \frac{1}{3} \times 0.33 = 0.89
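The same update applied numerically, with η and the gradient value taken from the slides:

```python
eta = 1.0 / 3.0
w_old = 1.0
grad = 0.33                 # dE_n/dw_01^(1) from the previous slide
w_new = w_old - eta * grad  # 1 - (1/3) * 0.33 = 0.89
```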
Backprop: example
[Figure: the network after one update step. First-layer weights: 0.89 on the bias connections and 0.945 on the connections from x_1 and x_2; second-layer weights: -0.046 on the bias connection and 0.08 on the connections from z_1, z_2, z_3.]
Test
[Figure: the updated network (first-layer weights 0.89 and 0.945, second-layer weights -0.046 and 0.08) evaluated on a new test point.]
\mathbf{x} = [0.7, 0.7]^T, t = -0.02
Test
▪ Activations
a_1 = \sum_{i=1}^{D} w_{i1}^{(1)} x_i + w_{01}^{(1)} = w_{11}^{(1)} x_1 + w_{21}^{(1)} x_2 + w_{01}^{(1)}
a_2 = \sum_{i=1}^{D} w_{i2}^{(1)} x_i + w_{02}^{(1)} = w_{12}^{(1)} x_1 + w_{22}^{(1)} x_2 + w_{02}^{(1)}
a_3 = \sum_{i=1}^{D} w_{i3}^{(1)} x_i + w_{03}^{(1)} = w_{13}^{(1)} x_1 + w_{23}^{(1)} x_2 + w_{03}^{(1)}
▪ Hidden units
z_1 = h(a_1) = \frac{1}{1 + \exp(-a_1)}
z_2 = h(a_2) = \frac{1}{1 + \exp(-a_2)}
z_3 = h(a_3) = \frac{1}{1 + \exp(-a_3)}
Test
▪ Output
a_1^{(2)} = \sum_{j=1}^{M} w_{j1}^{(2)} z_j + w_{01}^{(2)} = w_{11}^{(2)} z_1 + w_{21}^{(2)} z_2 + w_{31}^{(2)} z_3 + w_{01}^{(2)}
y_1 = \sigma(a_1^{(2)}) = a_1^{(2)}
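Running the test input through the updated network ties the example together. The weights below are the rounded values from the update slide, so the output is approximate; the prediction moves toward the target t = -0.02 but a single gradient step is clearly not enough:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# updated weights, rounded as on the slides
W1 = np.array([[0.89, 0.945, 0.945],   # rows: z1, z2, z3; cols: bias, x1, x2
               [0.89, 0.945, 0.945],
               [0.89, 0.945, 0.945]])
W2 = np.array([-0.046, 0.08, 0.08, 0.08])  # [bias, z1, z2, z3]

x = np.array([1.0, 0.7, 0.7])              # test input with bias x0 = 1
z = sigmoid(W1 @ x)                        # hidden units, about 0.90 each
y = W2 @ np.concatenate(([1.0], z))        # output, about 0.17
```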