Source: deeplearning.cs.cmu.edu/document/recitation/recitation-2.pdf
Recitation 2: Computing Derivatives
Notation and Conventions
Definition of Derivative
1. Math definition: df/dx = lim_{Δx→0} (f(x + Δx) − f(x)) / Δx
2. Intuition:
• Question: if I increase x by a tiny bit, how much will the overall f(x) increase?
• Answer: the increase will be approximately f′(x) times that tiny change
• Geometry: the derivative of f w.r.t. x at x₀ is the slope of the tangent line to the graph of f at x₀
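The limit definition can be checked numerically with finite differences. The helper name `numerical_derivative` and the test function f(x) = x² are illustrative choices, not part of the recitation:

```python
def numerical_derivative(f, x0, h=1e-6):
    """Central-difference estimate of f'(x0): (f(x0+h) - f(x0-h)) / (2h)."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

f = lambda x: x ** 2               # f'(x) = 2x
print(numerical_derivative(f, 3.0))  # close to 6.0
```

Shrinking h moves the estimate toward the true limit, until floating-point cancellation takes over.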
Computing Derivatives
Notice: the shape of the derivative of L w.r.t. any variable is the transpose of that variable's shape.
Derivative shapes:
• zᵢ is M × 1, so ∂L/∂zᵢ is 1 × M
• Wᵢ is M × N, so ∂L/∂Wᵢ is N × M
Rule 1(a): Scalar Multiplication
z = wx, L = f(z)
∂L/∂x = (∂L/∂z) · w, ∂L/∂w = (∂L/∂z) · x
Rule 2(a): Scalar Addition
z = x + y, L = f(z)
∂L/∂x = (∂L/∂z)(∂z/∂x) = ∂L/∂z
∂L/∂y = (∂L/∂z)(∂z/∂y) = ∂L/∂z
• All terms are scalars
• ∂L/∂z is known
Rule 3(a): Scalar Chain Rule
z = g(x), L = f(z)
∂L/∂x = (∂L/∂z)(∂z/∂x) = f′(z) g′(x)
Rule 4(a): The Generalized Chain Rule (Scalar)
L = f(g₁(x), g₂(x), …, gₙ(x))
• x is a scalar
• ∂L/∂gᵢ is known for all i
∂L/∂x = (∂L/∂g₁)(∂g₁/∂x) + (∂L/∂g₂)(∂g₂/∂x) + ⋯ + (∂L/∂gₙ)(∂gₙ/∂x)
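A small numerical sketch of this rule: the particular choices g₁(x) = x², g₂(x) = sin x, and f(a, b) = a·b are examples, not part of the recitation. The analytic sum over paths is compared against a finite-difference estimate:

```python
import math

# L = f(g1(x), g2(x)) with g1(x) = x**2, g2(x) = sin(x), f(a, b) = a * b
def L(x):
    return (x ** 2) * math.sin(x)

def dL_dx(x):
    g1, g2 = x ** 2, math.sin(x)
    dL_dg1, dL_dg2 = g2, g1            # partials of f(a, b) = a * b
    dg1_dx, dg2_dx = 2 * x, math.cos(x)
    # Generalized chain rule: sum the contribution of every path from x to L
    return dL_dg1 * dg1_dx + dL_dg2 * dg2_dx

h, x0 = 1e-6, 1.3
numeric = (L(x0 + h) - L(x0 - h)) / (2 * h)
print(abs(dL_dx(x0) - numeric))        # tiny
```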
Rule 1(b): Matrix Multiplication
z = Wx, L = f(z)
With W of shape M × N, x of shape N × 1, and ∇_z L a known 1 × M row vector:
∇_x L = (∇_z L) W (shape 1 × N), ∇_W L = x (∇_z L) (shape N × M)
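A quick shape-and-value check of this rule under the deck's transposed-gradient convention. The sizes, the random parameters, and the assumption that ∂L/∂z arrives as a 1 × M row vector are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 4
W = rng.standard_normal((M, N))
x = rng.standard_normal((N, 1))
dL_dz = rng.standard_normal((1, M))   # assumed given by upstream layers

dL_dx = dL_dz @ W                      # 1 x N: transposed shape of x
dL_dW = x @ dL_dz                      # N x M: transposed shape of W
print(dL_dx.shape, dL_dW.shape)        # (1, 4) (4, 3)
```

Because L here depends on x only through z = Wx, perturbing one entry of x changes L by exactly the matching entry of ∇_x L times the perturbation.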
Rule 2(b): Vector Addition
z = x + y, L = f(z)
∇_x L = ∇_z L, ∇_y L = ∇_z L
Rule 3(b): Chain Rule (vector)
z = g(x), L = f(z)
∇_x L = (∇_z L) J_x g, where J_x g is the Jacobian matrix of g w.r.t. x
Rule 4(b): The Generalized Chain Rule (vector)
L = f(g₁(x), g₂(x), …, gₙ(x))
• x is an N × 1 vector
• The functions gᵢ output M × 1 vectors for all i
• ∇_{gᵢ} L are known for all i (and are 1 × M vectors)
• J_x gᵢ are the Jacobian matrices of gᵢ(x) w.r.t. x, of size M × N
∇_x L = Σᵢ (∇_{gᵢ} L)(J_x gᵢ)
[Figure: computation graph with x feeding g₁, …, gₙ, which feed L]
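A minimal NumPy sketch of the vector rule. The choices g₁(x) = Ax and g₂(x) = Bx (so the Jacobians are just A and B) and the loss f = sum of all output entries are examples chosen to make the answer easy to verify:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 3, 2
A = rng.standard_normal((M, N))
B = rng.standard_normal((M, N))
x = rng.standard_normal((N, 1))

# For f = sum of all entries, each dL/dgi is a 1 x M row vector of ones.
dL_dg1 = np.ones((1, M))
dL_dg2 = np.ones((1, M))

# Sum the gradient contribution of every path from x to L.
dL_dx = dL_dg1 @ A + dL_dg2 @ B       # 1 x N
print(dL_dx.shape)                     # (1, 3)
```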
Rule 5: Element-wise Multiplication
z = x ∘ y, L = f(z)
∇_x L = (∇_z L) ∘ yᵀ, ∇_y L = (∇_z L) ∘ xᵀ
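In NumPy the element-wise rule is one line each way; the sizes and the assumed upstream row vector ∂L/∂z are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
x = rng.standard_normal((M, 1))
y = rng.standard_normal((M, 1))
dL_dz = rng.standard_normal((1, M))   # assumed given

dL_dx = dL_dz * y.T                    # 1 x M, element-wise product
dL_dy = dL_dz * x.T                    # 1 x M
print(dL_dx.shape, dL_dy.shape)
```

Entry i of z depends only on entries i of x and y, so each gradient entry is just the upstream gradient scaled by the other factor.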
Rule 6: Element-wise Function
z = g(x), L = f(z), where g is applied element-wise
∇_x L = (∇_z L) ∘ g′(x)ᵀ (a row vector)
Computing Derivatives of Complex Functions
• We are now prepared to compute very complex derivatives
• Given the forward computation, the key is to work backward through the simple relations
• Procedure:
• Express the computation as a series of computations of intermediate values
• Each computation must comprise either a unary or a binary relation
• Unary relation: the RHS has one argument, e.g. y = g(x)
• Binary relation: the RHS has two arguments, e.g. z = x + y or z = xy
Example 1: MLP Feedforward Network
• Suppose an MLP network with 2 hidden layers
Equations of the network (in the order in which they are computed sequentially):
1. z1 = W1 x + b1
2. z2 = relu(z1)
3. z3 = W2 z2 + b2
4. z4 = relu(z3)
5. output = W3 z4 + b3
(Notice that these operations are not in unary and binary form)
Example 1: MLP Feedforward Network
Rewrite these in terms of unary and binary operations:

Original:
1. z1 = W1 x + b1
2. z2 = relu(z1)
3. z3 = W2 z2 + b2
4. z4 = relu(z3)
5. output = W3 z4 + b3

Rewritten:
1. z1 = W1 x
2. z2 = z1 + b1
3. z3 = relu(z2)
4. z4 = W2 z3
5. z5 = z4 + b2
6. z6 = relu(z5)
7. z7 = W3 z6
8. output = z7 + b3
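The unary/binary forward steps translate directly into NumPy. The layer sizes (4 → 5 → 5 → 3) and random parameters below are example choices:

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0)

rng = np.random.default_rng(3)
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal((5, 1))
W2, b2 = rng.standard_normal((5, 5)), rng.standard_normal((5, 1))
W3, b3 = rng.standard_normal((3, 5)), rng.standard_normal((3, 1))
x = rng.standard_normal((4, 1))

z1 = W1 @ x          # matrix multiplication (Rule 1b)
z2 = z1 + b1         # vector addition (Rule 2b)
z3 = relu(z2)        # element-wise function (Rule 6)
z4 = W2 @ z3
z5 = z4 + b2
z6 = relu(z5)
z7 = W3 @ z6
output = z7 + b3
print(output.shape)  # (3, 1)
```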
Example 1: MLP Backward Network
• Now we will work our way backward
• We assume the derivative ∂L/∂output of the loss w.r.t. output is given
• We need to compute ∂L/∂x, ∂L/∂Wᵢ, ∂L/∂bᵢ, the derivatives w.r.t. the input and the parameters within the hidden layers
Example 1: MLP Backward Network
Forward steps (for reference):
1. z1 = W1 x
2. z2 = z1 + b1
3. z3 = relu(z2)
4. z4 = W2 z3
5. z5 = z4 + b2
6. z6 = relu(z5)
7. z7 = W3 z6
8. output = z7 + b3

Backward steps, in order. Recall the rules for vector addition, matrix multiplication, and element-wise functions, and the shape convention: ∂L/∂zᵢ has the transposed shape of zᵢ, and ∂L/∂Wᵢ the transposed shape of Wᵢ.
1. ∇_{z7} L = ∇_{output} L
2. ∇_{b3} L = ∇_{output} L
3. ∇_{W3} L = z6 (∇_{z7} L)
4. ∇_{z6} L = (∇_{z7} L) W3
5. ∇_{z5} L = (∇_{z6} L) ∘ 1(z5)ᵀ, where 1(·) is the element-wise derivative of relu: 1(x) = 1 if x > 0, and 0 if x ≤ 0
6. ∇_{z4} L = ∇_{z5} L
7. ∇_{b2} L = ∇_{z5} L
8. ∇_{W2} L = z3 (∇_{z4} L)
9. ∇_{z3} L = (∇_{z4} L) W2
10. ∇_{z2} L = (∇_{z3} L) ∘ 1(z2)ᵀ
11. ∇_{b1} L = ∇_{z2} L
12. ∇_{z1} L = ∇_{z2} L
13. ∇_{W1} L = x (∇_{z1} L)
14. ∇_x L = (∇_{z1} L) W1
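The full backward pass can be written out exactly as the numbered steps and checked against a numerical gradient. The sizes, random parameters, and the choice of loss L = sum(output) (which makes ∂L/∂output a row of ones) are illustrative:

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0)

rng = np.random.default_rng(4)
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal((5, 1))
W2, b2 = rng.standard_normal((5, 5)), rng.standard_normal((5, 1))
W3, b3 = rng.standard_normal((3, 5)), rng.standard_normal((3, 1))

def forward(x):
    z1 = W1 @ x; z2 = z1 + b1; z3 = relu(z2)
    z4 = W2 @ z3; z5 = z4 + b2; z6 = relu(z5)
    z7 = W3 @ z6; output = z7 + b3
    return z2, z3, z5, z6, output

x = rng.standard_normal((4, 1))
z2, z3, z5, z6, output = forward(x)

dL_dout = np.ones((1, 3))              # gradient of L = sum(output)
dL_dz7 = dL_dout                       # 1.  vector addition
dL_db3 = dL_dout                       # 2.
dL_dW3 = z6 @ dL_dz7                   # 3.  matrix multiplication
dL_dz6 = dL_dz7 @ W3                   # 4.
dL_dz5 = dL_dz6 * (z5 > 0).T           # 5.  element-wise relu derivative
dL_dz4 = dL_dz5                        # 6.
dL_db2 = dL_dz5                        # 7.
dL_dW2 = z3 @ dL_dz4                   # 8.
dL_dz3 = dL_dz4 @ W2                   # 9.
dL_dz2 = dL_dz3 * (z2 > 0).T           # 10.
dL_db1 = dL_dz2                        # 11.
dL_dz1 = dL_dz2                        # 12.
dL_dW1 = x @ dL_dz1                    # 13.
dL_dx = dL_dz1 @ W1                    # 14.

# Numerical check of dL/dx via central differences
eps = 1e-6
num = np.zeros((1, 4))
for i in range(4):
    xp, xm = x.copy(), x.copy()
    xp[i] += eps; xm[i] -= eps
    num[0, i] = (forward(xp)[-1].sum() - forward(xm)[-1].sum()) / (2 * eps)
print(np.max(np.abs(dL_dx - num)))     # tiny
```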
Example 2: Scanning with an MLP
• X is a T × 1 vector
• The MLP takes an input vector x(t) = X[t : t + N, :] of size N × 1 at each step t
• O(t) is the output of the MLP at step t
[Figures: the MLP scanning across successive windows of X]
Example 2: Scanning with an MLP (forward)
• X is a T × 1 vector
• The MLP takes an input vector x(t) = X[t : t + N, :] of size N × 1 at each step t
• O(t) is the output of the MLP at step t
• L = f(O(1), O(2), …, O(T − N + 1))
• Forward equations of the network at step t:
1. z1(t) = W1 x(t) + b1
2. z2(t) = relu(z1(t))
3. z3(t) = W2 z2(t) + b2
4. z4(t) = relu(z3(t))
5. O(t) = W3 z4(t) + b3
Example 2: Scanning with an MLP (forward)
Rewrite these in terms of unary and binary operations:

Original:
1. z1(t) = W1 x(t) + b1
2. z2(t) = relu(z1(t))
3. z3(t) = W2 z2(t) + b2
4. z4(t) = relu(z3(t))
5. O(t) = W3 z4(t) + b3

Rewritten:
1. z1(t) = W1 x(t)
2. z2(t) = z1(t) + b1
3. z3(t) = relu(z2(t))
4. z4(t) = W2 z3(t)
5. z5(t) = z4(t) + b2
6. z6(t) = relu(z5(t))
7. z7(t) = W3 z6(t)
8. O(t) = z7(t) + b3
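The scan itself is a short loop: the same parameters are applied to every window of X. The values T = 10, N = 4, the hidden sizes, and the random parameters are example choices (0-based indexing here, where the slides count from 1):

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0)

rng = np.random.default_rng(5)
T, N = 10, 4
X = rng.standard_normal((T, 1))
W1, b1 = rng.standard_normal((5, N)), rng.standard_normal((5, 1))
W2, b2 = rng.standard_normal((5, 5)), rng.standard_normal((5, 1))
W3, b3 = rng.standard_normal((3, 5)), rng.standard_normal((3, 1))

outputs = []
for t in range(T - N + 1):             # one step per window position
    x_t = X[t : t + N, :]              # N x 1 window of X
    z2 = relu(W1 @ x_t + b1)
    z4 = relu(W2 @ z2 + b2)
    outputs.append(W3 @ z4 + b3)       # O(t)

print(len(outputs), outputs[0].shape)  # 7 (3, 1)
```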
Example 2: Scanning with an MLP (backward)
• Let's now work our way backward
• We assume the derivative ∂L/∂O(t) of the loss w.r.t. O(t) is given for t = 1, …, T − N + 1
• We need to compute ∂L/∂X, ∂L/∂Wᵢ, ∂L/∂bᵢ, the derivatives of the loss w.r.t. the inputs and the network parameters
Example 2: Scanning with an MLP (backward)
Calculating the derivatives for t = 1:
1. ∇_{z7(t)} L = ∇_{O(t)} L
2. ∇_{b3} L = ∇_{O(t)} L
3. ∇_{W3} L = z6(t) (∇_{z7(t)} L)
4. ∇_{z6(t)} L = (∇_{z7(t)} L) W3
5. ∇_{z5(t)} L = (∇_{z6(t)} L) ∘ 1(z5(t))ᵀ, with 1(x) = 1 if x > 0, and 0 if x ≤ 0, applied element-wise
6. ∇_{z4(t)} L = ∇_{z5(t)} L
7. ∇_{b2} L = ∇_{z5(t)} L
8. ∇_{W2} L = z3(t) (∇_{z4(t)} L)
9. ∇_{z3(t)} L = (∇_{z4(t)} L) W2
10. ∇_{z2(t)} L = (∇_{z3(t)} L) ∘ 1(z2(t))ᵀ
11. ∇_{b1} L = ∇_{z2(t)} L
12. ∇_{z1(t)} L = ∇_{z2(t)} L
13. ∇_{W1} L = x(t) (∇_{z1(t)} L)
14. ∇_{x(t)} L = (∇_{z1(t)} L) W1
15. ∇_X L[:, 1 : N + 1] = ∇_{x(t)} L
Example 2: Scanning with an MLP (backward)
Calculating the derivatives for t > 1 (note the "+=" for the shared parameters and for X, which accumulate contributions across steps):
1. ∇_{z7(t)} L = ∇_{O(t)} L
2. ∇_{b3} L += ∇_{O(t)} L
3. ∇_{W3} L += z6(t) (∇_{z7(t)} L)
4. ∇_{z6(t)} L = (∇_{z7(t)} L) W3
5. ∇_{z5(t)} L = (∇_{z6(t)} L) ∘ 1(z5(t))ᵀ
6. ∇_{z4(t)} L = ∇_{z5(t)} L
7. ∇_{b2} L += ∇_{z5(t)} L
8. ∇_{W2} L += z3(t) (∇_{z4(t)} L)
9. ∇_{z3(t)} L = (∇_{z4(t)} L) W2
10. ∇_{z2(t)} L = (∇_{z3(t)} L) ∘ 1(z2(t))ᵀ
11. ∇_{b1} L += ∇_{z2(t)} L
12. ∇_{z1(t)} L = ∇_{z2(t)} L
13. ∇_{W1} L += x(t) (∇_{z1(t)} L)
14. ∇_{x(t)} L = (∇_{z1(t)} L) W1
15. ∇_X L[:, t : t + N − 1] += ∇_{x(t)} L[:, : −1]
16. ∇_X L[:, t + N − 1] = ∇_{x(t)} L[:, −1]
When to use "=" vs when to use "+="
• In the forward computation, a variable may be used multiple times to compute other intermediate variables or a sequence of output variables
• During the backward computation, the first time the derivative is computed for such a variable, we use "="
• In subsequent computations we use "+="
• It may be difficult to keep track of when we first compute the derivative for a variable
• Cheap trick:
• Initialize all derivatives to 0 before the backward computation
• Always use "+="
• You will get the correct answer (why?)
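A minimal scalar sketch of the cheap trick. Here z feeds two paths, u = 2z and v = z², and L = u + v, so dL/dz = 2 + 2z; the variable names and functions are example choices. Every derivative starts at 0 and every update uses "+=", so no bookkeeping of "first use" is needed (the first "+=" into a zero buffer behaves exactly like "="):

```python
z = 3.0
u = 2 * z
v = z ** 2
L = u + v

# Backward: initialize all derivatives to 0, then always accumulate with +=.
dL = {"z": 0.0, "u": 0.0, "v": 0.0}
dL["u"] += 1.0                 # dL/du
dL["v"] += 1.0                 # dL/dv
dL["z"] += dL["u"] * 2.0       # contribution through u = 2*z
dL["z"] += dL["v"] * 2 * z     # contribution through v = z**2

print(dL["z"])                 # 2 + 2*z = 8.0
```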