Deep Learning Recurrent Networks — bhiksha/courses/deeplearning/...
TRANSCRIPT
Deep Learning
Recurrent Networks
21/Feb/2018
Which open source project?
Related math. What is it talking about?
And a Wikipedia page explaining it all
The unreasonable effectiveness of recurrent neural networks
• All of the previous examples were generated blindly by a recurrent neural network
• http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Modelling Series
• In many situations one must consider a series of inputs to produce an output
– The outputs may also be a series
• Examples: …
Should I invest..
• Stock market
– Must consider the series of stock values over the past several days to decide whether it is wise to invest today
• Ideally, consider all of history
• Note: Inputs are vectors. Output may be scalar or vector
– Should I invest, vs. should I invest in X
[Figure: daily stock vectors from 7/03 through 15/03 feed the network; “To invest or not to invest?”]
Representational shortcut
• Input at each time is a vector
• Each layer has many neurons
– The output layer, too, may have many neurons
• But we will represent everything with simple boxes
– Each box actually represents an entire layer with many units
The stock predictor
• The sliding predictor
– Look at the last few days
– This is just a convolutional neural net applied to series data
• Also called a Time-Delay Neural Network
[Figure: a window over the stock vectors X(t)…X(t+7) slides along the time axis; at this position the network outputs Y(t+3)]
Finite-response model
• This is a finite response system
– Something that happens today only affects the output of the system for 𝑁 days into the future
• 𝑁 is the width of the system
$Y_t = f(X_t, X_{t-1}, \dots, X_{t-N})$
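The finite-response relation above can be sketched as a sliding-window (time-delay) predictor. This is an illustrative sketch only; the window width, the layer sizes, the random weights, and the toy data are assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 4            # window width: the output depends on the last N+1 inputs
D = 3            # dimensionality of each daily stock vector
H = 8            # hidden units

# Toy parameters of a one-hidden-layer sliding predictor (hypothetical values)
W1 = rng.standard_normal((H, (N + 1) * D)) * 0.1
b1 = np.zeros(H)
w2 = rng.standard_normal(H) * 0.1

def predict(window):
    """window: array of shape (N+1, D) holding X(t-N)..X(t)."""
    h = np.tanh(W1 @ window.reshape(-1) + b1)
    return w2 @ h          # scalar Y(t), e.g. an "invest?" score

# Slide the SAME network along a series of T daily stock vectors
T = 10
X = rng.standard_normal((T, D))
Y = [predict(X[t - N:t + 1]) for t in range(N, T)]
print(len(Y))  # one output per window position (T - N = 6)
```

Because the same weights are reused at every position, this is exactly a convolution over the time axis.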
Finite-response
• Problem: Increasing the “history” makes the network more complex
– No worries, we have the CPU and memory
• Or do we?
[Figure: widening the input window over X(t)…X(t+7) to produce Y(t+6) enlarges the network]
![Page 25: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/25.jpg)
Systems often have long-term dependencies
• Longer-term trends:
– Weekly trends in the market
– Monthly trends in the market
– Annual trends
– Though older history tends to affect us less than more recent events
We want infinite memory
• Required: infinite-response systems
– What happens today can continue to affect the output forever
• Possibly with weaker and weaker influence

$Y_t = f(X_t, X_{t-1}, \dots, X_{t-\infty})$
Examples of infinite response systems
$Y_t = f(X_t, Y_{t-1})$
– Required: define an initial state $Y_{-1}$ for $t = 0$
– An input $X_0$ at $t = 0$ produces $Y_0$
– $Y_0$ produces $Y_1$, which produces $Y_2$, and so on to $Y_\infty$, even if $X_1 \dots X_\infty$ are 0
• i.e. even if there are no further inputs!
• This is an instance of a NARX network
– “nonlinear autoregressive network with exogenous inputs”
– $Y_t = f(X_{0:t}, Y_{0:t-1})$
• The output contains information about the entire past
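A minimal sketch of this one-tap recursion. The particular function f below is an assumption chosen only to make the loop concrete; the slides leave f abstract:

```python
import numpy as np

def f(x, y_prev):
    # Hypothetical nonlinear map combining today's input with yesterday's output
    return np.tanh(0.5 * x + 0.9 * y_prev)

Y_init = 1.0               # the required initial state Y(-1)
X = [2.0] + [0.0] * 9      # a single impulse at t = 0, then no further input

Y, y = [], Y_init
for x in X:
    y = f(x, y)            # Y(t) = f(X(t), Y(t-1))
    Y.append(y)

# The impulse at t=0 keeps influencing every later output (with decaying strength)
print(all(abs(v) > 0 for v in Y))  # True
```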
A one-tap NARX network
• A NARX net with recursion from the output
[Figure: the one-tap NARX network unrolled over time; each output Y(t) feeds the next step]
A more complete representation
• A NARX net with recursion from the output
• Showing all computations
• All columns are identical
• An input at t=0 affects outputs forever
[Figure: fully unrolled computation; brown boxes are output nodes, yellow boxes are outputs, and Y(t−1) feeds the column at time t]
Same figure redrawn
• A NARX net with recursion from the output
• Showing all computations
• All columns are identical
• An input at t=0 affects outputs forever
[Figure: the same network redrawn; brown boxes are output nodes, and all outgoing arrows from a node carry the same output Y(t)]
A more generic NARX network
• The output $Y_t$ at time $t$ is computed from the past $K$ outputs $Y_{t-1}, \dots, Y_{t-K}$ and the current and past $L$ inputs $X_t, \dots, X_{t-L}$
[Figure: NARX network with K output taps and L input taps, unrolled over time]
A “complete” NARX network
• The output $Y_t$ at time $t$ is computed from all past outputs and all inputs up to time $t$
– Not really a practical model
[Figure: every past output and input feeds the computation at time t]
NARX Networks
• Very popular for time-series prediction
– Weather
– Stock markets
– As alternate system models in tracking systems
• Any phenomena with distinct “innovations” that “drive” an output
• Note: here the “memory” of the past is in the output itself, and not in the network
Let's make memory more explicit
• The task is to “remember” the past
• Introduce an explicit memory variable whose job it is to remember

$m_t = r(y_{t-1}, h_{t-1}, m_{t-1})$
$h_t = f(x_t, m_t)$
$y_t = g(h_t)$

• $m_t$ is a “memory” variable
– Generally stored in a “memory” unit
– Used to “remember” the past
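The explicit-memory recursion can be sketched as a plain loop. The particular forms of r, f, and g below (a leaky running average, tanh, and a linear readout) are illustrative assumptions; the lecture leaves all three abstract:

```python
import numpy as np

# Illustrative choices for r, f, g (assumptions; the model leaves them abstract)
def r(y_prev, h_prev, m_prev):        # memory update
    return 0.5 * m_prev + 0.25 * y_prev + 0.25 * h_prev

def f(x, m):                          # state from current input + memory
    return np.tanh(x + m)

def g(h):                             # output from state
    return 2.0 * h

m, h, y = 0.0, 0.0, 0.0               # initial memory / state / output
for x in [1.0, -0.5, 0.25]:
    m = r(y, h, m)                    # m(t) = r(y(t-1), h(t-1), m(t-1))
    h = f(x, m)                       # h(t) = f(x(t), m(t))
    y = g(h)                          # y(t) = g(h(t))
print(round(y, 3))
```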
Jordan Network
• Memory unit simply retains a running average of past outputs
– “Serial order: A parallel distributed processing approach”, M. I. Jordan, 1986
• Input is constant (called a “plan”)
• Objective is to train net to produce a specific output, given an input plan
– Memory has fixed structure; does not “learn” to remember
• The running average of outputs considers entire past, rather than immediate past
[Figure: Jordan network over two steps; fixed-weight (μ, 1) memory units accumulate a running average of the outputs Y(t), Y(t+1), with plan inputs X(t), X(t+1)]
Elman Networks
• Separate memory state from output
– “Context” units that carry historical state
– “Finding structure in time”, Jeffrey Elman, Cognitive Science, 1990
• For the purpose of training, this was approximated as a set of T independent 1-step history nets
• Only the weight from the memory unit to the hidden unit is learned
[Figure: Elman network over two steps; “cloned state” context units copy the hidden state (weight 1) and feed it back at the next step]
An alternate model for infinite response systems: the state-space model
$h_t = f(x_t, h_{t-1})$
$y_t = g(h_t)$

• $h_t$ is the state of the network
– The model directly embeds the memory in the state
• Need to define the initial state $h_{-1}$
• This is a fully recurrent neural network
– Or simply a recurrent neural network
• The state summarizes information about the entire past
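The state-space recurrence can be sketched with assumed forms for f (tanh of a linear map) and g (a linear readout). All sizes and weights below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H, M = 3, 4, 2          # input, state, output sizes (arbitrary)

W_xh = rng.standard_normal((H, D)) * 0.3   # input -> state
W_hh = rng.standard_normal((H, H)) * 0.3   # state -> state (the recurrence)
W_hy = rng.standard_normal((M, H)) * 0.3   # state -> output

h = np.zeros(H)            # initial state h(-1); a learnable parameter in practice
for x in rng.standard_normal((5, D)):      # a length-5 input sequence
    h = np.tanh(W_xh @ x + W_hh @ h)       # h(t) = f(x(t), h(t-1))
    y = W_hy @ h                           # y(t) = g(h(t))

print(y.shape)  # (2,) — one output vector per time step; the last one is shown
```

Because h(t) depends on h(t−1), which depends on h(t−2), and so on, the state carries (compressed) information about the entire input history.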
![Page 45: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/45.jpg)
The simple state-space model
• The state (green) at any time is determined by the input at that time, and the state at the previous time
• An input at t=0 affects outputs forever
• Also known as a recurrent neural net
[Figure: the state (green) unrolled over time from t=0, with initial state h(−1); inputs X(t) enter below, outputs Y(t) leave above]
![Page 46: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/46.jpg)
An alternate model for infinite response systems: the state-space model
$h_t = f(x_t, h_{t-1})$
$y_t = g(h_t)$

• $h_t$ is the state of the network
• Need to define the initial state $h_{-1}$
• The state can be arbitrarily complex
![Page 47: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/47.jpg)
Single hidden layer RNN
• Recurrent neural network
• All columns are identical
• An input at t=0 affects outputs forever
[Figure: single-hidden-layer RNN unrolled over time from t=0, with initial state h(−1)]
![Page 48: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/48.jpg)
Multiple recurrent layer RNN
• Recurrent neural network
• All columns are identical
• An input at t=0 affects outputs forever
[Figure: multiple recurrent layers unrolled over time from t=0]
![Page 49: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/49.jpg)
A more complex state
• All columns are identical
• An input at t=0 affects outputs forever
[Figure: a state spanning multiple layers, unrolled over time]
![Page 50: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/50.jpg)
Or the network may be even more complicated
• Shades of NARX
• All columns are identical
• An input at t=0 affects outputs forever
[Figure: a more complicated recurrent structure with output fed back, unrolled over time]
![Page 51: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/51.jpg)
Generalization with other recurrences
• All columns (including incoming edges) are identical
[Figure: a generalized recurrence pattern, unrolled from t=0]
![Page 52: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/52.jpg)
State dependencies may be simpler
• Recurrent neural network
• All columns are identical
• An input at t=0 affects outputs forever
[Figure: simpler state dependencies, unrolled from t=0]
![Page 53: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/53.jpg)
Multiple recurrent layer RNN
• We can also have skips..
[Figure: multi-layer RNN with skip connections, unrolled from t=0]
![Page 54: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/54.jpg)
A Recurrent Neural Network
• Simplified models are often drawn like this
• The loops imply recurrence
![Page 55: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/55.jpg)
The detailed version of the simplified representation
[Figure: the detailed unrolled network corresponding to the looped drawing, from t=0 with initial state h(−1)]
![Page 56: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/56.jpg)
Multiple recurrent layer RNN
[Figure: multiple recurrent layer RNN, unrolled from t=0]
![Page 57: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/57.jpg)
Multiple recurrent layer RNN
Time
Y(t)
X(t)
t=057
![Page 58: Deep Learning Recurrent Networksbhiksha/courses/deeplearning/... · Deep Learning Recurrent Networks 21/Feb/2018 1. Which open source project? 2. Related math. What is it talking](https://reader033.vdocument.in/reader033/viewer/2022042302/5ecdd45d3170013f4a478fd1/html5/thumbnails/58.jpg)
Equations
• Note the superscripts in the indexing, which indicate the layer of the network from which the inputs are obtained
• Assuming a vector function at the output, e.g. softmax
• The state node activation $f_1()$ is typically tanh()
• Every neuron also has a bias input
$Y_k(t) = f_2\left(\sum_j w^{(1)}_{jk} h^{(1)}_j(t) + b^{(1)}_k\right), \quad k = 1 \dots M$

$h^{(1)}_i(t) = f_1\left(\sum_j w^{(0)}_{ji} X_j(t) + \sum_j w^{(11)}_{ji} h^{(1)}_j(t-1) + b^{(1)}_i\right)$

$h^{(1)}_i(-1) = \text{part of network parameters}$

[Figure: X → h(1) → Y, with h(1) recurring to itself]
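The single-hidden-layer equations can be written out in vectorized form. This is an illustrative sketch: the sizes and random weights are assumptions, with $f_1$ = tanh and $f_2$ = softmax as the slide suggests:

```python
import numpy as np

rng = np.random.default_rng(2)
D, H, M = 3, 5, 4                       # sizes chosen arbitrarily for illustration

W0 = rng.standard_normal((H, D)) * 0.3  # w(0): input -> hidden
W11 = rng.standard_normal((H, H)) * 0.3 # w(11): hidden(t-1) -> hidden(t)
W1 = rng.standard_normal((M, H)) * 0.3  # w(1): hidden -> output
b1 = np.zeros(H)                        # hidden biases
bk = np.zeros(M)                        # output biases
h_init = rng.standard_normal(H) * 0.1   # h(-1): itself a network parameter

def softmax(z):                         # vector output nonlinearity f2
    e = np.exp(z - z.max())
    return e / e.sum()

h = h_init
for x in rng.standard_normal((6, D)):
    h = np.tanh(W0 @ x + W11 @ h + b1)  # f1 = tanh, as is typical
    Y = softmax(W1 @ h + bk)            # Y(t), one distribution per step

print(round(Y.sum(), 6))  # 1.0 — softmax outputs sum to 1
```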
Equations
$Y_k(t) = f_3\left(\sum_j w^{(2)}_{jk} h^{(2)}_j(t) + b^{(3)}_k\right), \quad k = 1 \dots M$

$h^{(2)}_i(t) = f_2\left(\sum_j w^{(1)}_{ji} h^{(1)}_j(t) + \sum_j w^{(22)}_{ji} h^{(2)}_j(t-1) + b^{(2)}_i\right)$

$h^{(1)}_i(t) = f_1\left(\sum_j w^{(0)}_{ji} X_j(t) + \sum_j w^{(11)}_{ji} h^{(1)}_j(t-1) + b^{(1)}_i\right)$

$h^{(1)}_i(-1), h^{(2)}_i(-1) = \text{part of network parameters}$

• Assuming a vector function at the output, e.g. softmax $f_3()$
• The state node activations $f_k()$ are typically tanh()
• Every neuron also has a bias input
[Figure: X → h(1) → h(2) → Y, with each hidden layer recurring to itself]
Equations
$Y_k(t) = f_3\left(\sum_j w^{(2,3)}_{jk} h^{(2)}_j(t) + \sum_j w^{(1,3)}_{jk} h^{(1)}_j(t) + b^{(3)}_k\right), \quad k = 1 \dots M$

$h^{(2)}_i(t) = f_2\left(\sum_j w^{(1,2)}_{ji} h^{(1)}_j(t) + \sum_j w^{(0,2)}_{ji} X_j(t) + \sum_j w^{(2,2)}_{ji} h^{(2)}_j(t-1) + b^{(2)}_i\right)$

$h^{(1)}_i(t) = f_1\left(\sum_j w^{(0,1)}_{ji} X_j(t) + \sum_j w^{(1,1)}_{ji} h^{(1)}_j(t-1) + b^{(1)}_i\right)$

$h^{(1)}_i(-1), h^{(2)}_i(-1) = \text{part of network parameters}$
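A two-layer version with the skip connections (input straight into layer 2, and layer 1 straight into the output) can be sketched the same way. Again, all sizes and weights are illustrative assumptions; the weight names mirror the (source, destination) superscripts in the equations:

```python
import numpy as np

rng = np.random.default_rng(3)
D, H1, H2, M = 3, 4, 4, 2
s = lambda *shape: rng.standard_normal(shape) * 0.3  # random toy weights

W01, W11 = s(H1, D), s(H1, H1)                  # input->h1, h1(t-1)->h1(t)
W12, W02, W22 = s(H2, H1), s(H2, D), s(H2, H2)  # h1->h2, skip input->h2, h2 recurrence
W23, W13 = s(M, H2), s(M, H1)                   # h2->output, skip h1->output
b1, b2, b3 = np.zeros(H1), np.zeros(H2), np.zeros(M)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h1, h2 = np.zeros(H1), np.zeros(H2)   # h(-1) for both layers (parameters in practice)
for x in rng.standard_normal((5, D)):
    h1 = np.tanh(W01 @ x + W11 @ h1 + b1)             # layer-1 state
    h2 = np.tanh(W12 @ h1 + W02 @ x + W22 @ h2 + b2)  # layer-2 state, with input skip
    Y = softmax(W23 @ h2 + W13 @ h1 + b3)             # output, with layer-1 skip

print(Y.shape)  # (2,)
```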
Variants on recurrent nets
• 1: Conventional MLP
• 2: Sequence generation, e.g. image to caption
• 3: Sequence-based prediction or classification, e.g. speech recognition, text classification
Images from Karpathy
Variants
• 1: Delayed sequence to sequence
• 2: Sequence to sequence, e.g. stock problem, label prediction
• Etc…
Images from Karpathy
How do we train the network?
• Back propagation through time (BPTT)
• Given a collection of sequence inputs $(\mathbf{X}_i, \mathbf{D}_i)$, where
– $\mathbf{X}_i = X_{i,0}, \dots, X_{i,T}$
– $\mathbf{D}_i = D_{i,0}, \dots, D_{i,T}$
• Train the network parameters to minimize the error between the outputs of the network $\mathbf{Y}_i = Y_{i,0}, \dots, Y_{i,T}$ and the desired outputs
– This is the most generic setting. In other settings we just “remove” some of the input or output entries
[Figure: the network unrolled over inputs X(0)…X(T), producing outputs Y(0)…Y(T) from initial state h(−1)]
Training: Forward pass
• For each training input:
• Forward pass: pass the entire data sequence through the network, generate outputs
64
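The forward pass above can be sketched in NumPy. This is a minimal sketch, assuming tanh hidden units and linear outputs; the names `rnn_forward`, `W_in` (for $W^{(0)}$), `W_rec` (for $W^{(11)}$), and `W_out` (for $W^{(1)}$) are illustrative, not from the slides.

```python
import numpy as np

def rnn_forward(X, W_in, W_rec, W_out, h_init):
    """Pass the entire data sequence through a one-hidden-layer RNN.

    X: (T+1, n_in) input sequence; h_init: the initial state h(-1).
    Returns the hidden states and outputs for every time step.
    """
    H, Y = [], []
    h = h_init
    for x in X:                          # t = 0 .. T
        z0 = W_in @ x + W_rec @ h        # hidden pre-activation Z(0)(t)
        h = np.tanh(z0)                  # hidden state h(t)
        z1 = W_out @ h                   # output pre-activation Z(1)(t)
        H.append(h)
        Y.append(z1)                     # linear output Y(t) in this sketch
    return np.array(H), np.array(Y)
```

Every hidden state is cached because the backward pass (BPTT) will need all of them.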
Training: Computing gradients
• For each training input:
• Backward pass: Compute gradients via backpropagation
– Back Propagation Through Time
65
Back Propagation Through Time
Will only focus on one training instance
All subscripts represent components and not training instance index
66
Back Propagation Through Time
• The divergence computed is between the sequence of outputs by the network and the desired sequence of outputs
• This is not just the sum of the divergences at individual times, unless we explicitly define it that way
67
Back Propagation Through Time
First step of backprop: compute $\frac{dDIV}{dY_i(T)}$ for all $i$.

In general we will be required to compute $\frac{dDIV}{dY_i(t)}$ for all $i$ and $t$, as we will see. This can be a source of significant difficulty in many scenarios.
68
[Figure: unrolled RNN with per-time divergences $Div(0), \ldots, Div(T)$ combining into the total divergence $DIV$]

Must compute $\frac{dDIV}{dY_i(t)}$ for all $i$ and all $t$.

Special case: when the overall divergence is a simple combination of local divergences at each time (e.g. $DIV = \sum_t Div(t)$), we will usually get

$\frac{dDIV}{dY_i(t)} = \frac{dDiv(t)}{dY_i(t)}$
69
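The special case above can be made concrete. A minimal sketch, assuming per-time cross-entropy as the local divergence (the slides leave the divergence generic, and `sequence_divergence` is an illustrative name):

```python
import numpy as np

def sequence_divergence(Y, D):
    """Total divergence explicitly defined as a sum of per-time divergences.

    Y: (T+1, n_out) rows of predicted probabilities; D: targets, same shape.
    Uses per-time cross-entropy Div(t) = -sum_i D_i(t) log Y_i(t), so
    dDIV/dY_i(t) depends only on Div(t), as in the special case above.
    """
    per_time = -np.sum(D * np.log(Y), axis=1)   # Div(t) for each t
    return per_time.sum(), per_time
```

Because the total is a plain sum, the gradient at time $t$ never mixes with other time steps, which is what makes this case convenient.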
Back Propagation Through Time
First step of backprop: compute $\frac{dDIV}{dY_i(T)}$ for all $i$. Then, for a component-wise output activation:

$\frac{dDIV}{dZ_i^{(1)}(T)} = \frac{dDIV}{dY_i(T)} \frac{dY_i(T)}{dZ_i^{(1)}(T)}$

OR, for a vector output activation:

$\frac{dDIV}{dZ_i^{(1)}(T)} = \sum_j \frac{dDIV}{dY_j(T)} \frac{dY_j(T)}{dZ_i^{(1)}(T)}$

In vector form: $\nabla_{Z^{(1)}(T)} DIV = \nabla_{Y(T)} DIV \, \nabla_{Z^{(1)}(T)} Y(T)$
70
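For a concrete instance of the vector-activation case: with a softmax output layer and cross-entropy divergence (an assumed pairing, not specified on the slide), the sum over $j$ collapses to the familiar difference $Y(T) - D(T)$:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def output_grad(z1, d):
    """dDIV/dZ(1)(T) for a softmax output with cross-entropy divergence.

    The full sum over j of dDIV/dY_j * dY_j/dZ_i collapses to y - d.
    """
    return softmax(z1) - d
```

This collapse is why frameworks typically fuse softmax with cross-entropy: the combined gradient is cheap and numerically stable.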
Back Propagation Through Time
Compute $\frac{dDIV}{dY_i(T)}$ for all $i$. Then:

$\frac{dDIV}{dZ_i^{(1)}(T)} = \frac{dDiv(T)}{dY_i(T)} \frac{dY_i(T)}{dZ_i^{(1)}(T)}$

$\frac{dDIV}{dh_i(T)} = \sum_j \frac{dDIV}{dZ_j^{(1)}(T)} \frac{dZ_j^{(1)}(T)}{dh_i(T)} = \sum_j w_{ij}^{(1)} \frac{dDIV}{dZ_j^{(1)}(T)}$

In vector form: $\nabla_{h(T)} DIV = \nabla_{Z^{(1)}(T)} DIV \, W^{(1)}$
71
Back Propagation Through Time
$\frac{dDIV}{dZ_i^{(1)}(T)} = \frac{dDiv(T)}{dY_i(T)} \frac{dY_i(T)}{dZ_i^{(1)}(T)}$

$\frac{dDIV}{dw_{ij}^{(1)}} = \frac{dDIV}{dZ_j^{(1)}(T)} \frac{dZ_j^{(1)}(T)}{dw_{ij}^{(1)}} = \frac{dDIV}{dZ_j^{(1)}(T)}\, h_i(T)$

$\frac{dDIV}{dh_i(T)} = \sum_j w_{ij}^{(1)} \frac{dDIV}{dZ_j^{(1)}(T)}$

In vector form: $\nabla_{W^{(1)}} DIV = h(T)\, \nabla_{Z^{(1)}(T)} DIV$
72
Back Propagation Through Time
$\frac{dDIV}{dZ_i^{(1)}(T)} = \frac{dDIV}{dY_i(T)} \frac{dY_i(T)}{dZ_i^{(1)}(T)}$

$\frac{dDIV}{dh_i(T)} = \sum_j w_{ij}^{(1)} \frac{dDIV}{dZ_j^{(1)}(T)}$

$\frac{dDIV}{dw_{ij}^{(1)}} = \frac{dDIV}{dZ_j^{(1)}(T)}\, h_i(T)$

$\frac{dDIV}{dZ_i^{(0)}(T)} = \frac{dDIV}{dh_i(T)} \frac{dh_i(T)}{dZ_i^{(0)}(T)}$

In vector form: $\nabla_{Z^{(0)}(T)} DIV = \nabla_{h(T)} DIV \, \nabla_{Z^{(0)}(T)} h(T)$
73
Back Propagation Through Time
$\frac{dDIV}{dw_{ij}^{(0)}} = \frac{dDIV}{dZ_j^{(0)}(T)}\, X_i(T)$

In vector form: $\nabla_{W^{(0)}} DIV = X(T)\, \nabla_{Z^{(0)}(T)} DIV$
74
Back Propagation Through Time
$\frac{dDIV}{dw_{ij}^{(11)}} = \frac{dDIV}{dZ_j^{(0)}(T)}\, h_i(T-1)$

$\frac{dDIV}{dw_{ij}^{(0)}} = \frac{dDIV}{dZ_j^{(0)}(T)}\, X_i(T)$

In vector form: $\nabla_{W^{(11)}} DIV = h(T-1)\, \nabla_{Z^{(0)}(T)} DIV$
75
Back Propagation Through Time
$\frac{dDIV}{dZ_i^{(1)}(T-1)} = \frac{dDIV}{dY_i(T-1)} \frac{dY_i(T-1)}{dZ_i^{(1)}(T-1)}$

OR, for a vector output activation:

$\frac{dDIV}{dZ_i^{(1)}(T-1)} = \sum_j \frac{dDIV}{dY_j(T-1)} \frac{dY_j(T-1)}{dZ_i^{(1)}(T-1)}$

In vector form: $\nabla_{Z^{(1)}(T-1)} DIV = \nabla_{Y(T-1)} DIV \, \nabla_{Z^{(1)}(T-1)} Y(T-1)$
76
Back Propagation Through Time
$\frac{dDIV}{dh_i(T-1)} = \sum_j w_{ij}^{(1)} \frac{dDIV}{dZ_j^{(1)}(T-1)} + \sum_j w_{ij}^{(11)} \frac{dDIV}{dZ_j^{(0)}(T)}$

In vector form: $\nabla_{h(T-1)} DIV = \nabla_{Z^{(1)}(T-1)} DIV \, W^{(1)} + \nabla_{Z^{(0)}(T)} DIV \, W^{(11)}$
77
Back Propagation Through Time
$\frac{dDIV}{dh_i(T-1)} = \sum_j w_{ij}^{(1)} \frac{dDIV}{dZ_j^{(1)}(T-1)} + \sum_j w_{ij}^{(11)} \frac{dDIV}{dZ_j^{(0)}(T)}$

$\frac{dDIV}{dw_{ij}^{(1)}} \mathrel{+}= \frac{dDIV}{dZ_j^{(1)}(T-1)}\, h_i(T-1)$ (note the addition)

In vector form: $\nabla_{W^{(1)}} DIV \mathrel{+}= h(T-1)\, \nabla_{Z^{(1)}(T-1)} DIV$
78
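The "+=" is the practical point: $W^{(1)}$ is shared by every time step, so its gradient accumulates as the backward pass walks from $t = T$ down to $t = 0$. A sketch (function and argument names are illustrative):

```python
import numpy as np

def accumulate_output_weight_grad(H, dZ1):
    """Accumulate dDIV/dW(1) over all time steps.

    H: (T+1, n_h) hidden states h(t); dZ1: (T+1, n_out) rows dDIV/dZ(1)(t).
    """
    dW1 = np.zeros((dZ1.shape[1], H.shape[1]))
    for t in range(H.shape[0]):           # any order; the sum is the same
        dW1 += np.outer(dZ1[t], H[t])     # dDIV/dW(1) += dZ(1)(t) h(t)^T
    return dW1
```

The loop is equivalent to the single matrix product `dZ1.T @ H`, which is how a vectorized implementation would compute it.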
Back Propagation Through Time
$\frac{dDIV}{dZ_i^{(0)}(T-1)} = \frac{dDIV}{dh_i(T-1)} \frac{dh_i(T-1)}{dZ_i^{(0)}(T-1)}$

In vector form: $\nabla_{Z^{(0)}(T-1)} DIV = \nabla_{h(T-1)} DIV \, \nabla_{Z^{(0)}(T-1)} h(T-1)$
79
Back Propagation Through Time
$\frac{dDIV}{dZ_i^{(0)}(T-1)} = \frac{dDIV}{dh_i(T-1)} \frac{dh_i(T-1)}{dZ_i^{(0)}(T-1)}$

$\frac{dDIV}{dw_{ij}^{(0)}} \mathrel{+}= \frac{dDIV}{dZ_j^{(0)}(T-1)}\, X_i(T-1)$ (note the addition)

In vector form: $\nabla_{W^{(0)}} DIV \mathrel{+}= X(T-1)\, \nabla_{Z^{(0)}(T-1)} DIV$
80
Back Propagation Through Time
$\frac{dDIV}{dw_{ij}^{(0)}} \mathrel{+}= \frac{dDIV}{dZ_j^{(0)}(T-1)}\, X_i(T-1)$

$\frac{dDIV}{dw_{ij}^{(11)}} \mathrel{+}= \frac{dDIV}{dZ_j^{(0)}(T-1)}\, h_i(T-2)$ (note the addition)

In vector form: $\nabla_{W^{(11)}} DIV \mathrel{+}= h(T-2)\, \nabla_{Z^{(0)}(T-1)} DIV$
81
Back Propagation Through Time
Finally, for the initial state:

$\frac{dDIV}{dh_{-1}} = \sum_j w_{ij}^{(11)} \frac{dDIV}{dZ_j^{(0)}(0)}$

In vector form: $\nabla_{h_{-1}} DIV = \nabla_{Z^{(0)}(0)} DIV \, W^{(11)}$
82
Back Propagation Through Time
In general, for any layer $k$ and any time $t$:

$\frac{dDIV}{dZ_i^{(k)}(t)} = \frac{dDIV}{dh_i^{(k)}(t)}\, f_k'\!\left(Z_i^{(k)}(t)\right)$

$\frac{dDIV}{dh_i^{(k)}(t)} = \sum_j w_{i,j}^{(k)} \frac{dDIV}{dZ_j^{(k+1)}(t)} + \sum_j w_{i,j}^{(k,k)} \frac{dDIV}{dZ_j^{(k)}(t+1)}$

(Not showing derivatives at output neurons)
83
Back Propagation Through Time
Accumulated over the full sequence:

$\frac{dDIV}{dw_{ij}^{(0)}} = \sum_t \frac{dDIV}{dZ_j^{(0)}(t)}\, X_i(t)$

$\frac{dDIV}{dw_{ij}^{(11)}} = \sum_t \frac{dDIV}{dZ_j^{(0)}(t)}\, h_i(t-1)$

$\frac{dDIV}{dh_{-1}} = \sum_j w_{ij}^{(11)} \frac{dDIV}{dZ_j^{(0)}(0)}$
84
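Putting the summed gradients together, a complete single-sequence BPTT pass might look like the following. This is a sketch, assuming tanh hidden units, linear outputs, and squared-error divergence $DIV = \sum_t \frac{1}{2}\|Y(t)-D(t)\|^2$ (all assumptions; the slides leave the activations and divergence generic):

```python
import numpy as np

def bptt_grads(X, D, W_in, W_rec, W_out, h_init):
    """One full BPTT pass for a tanh RNN with linear outputs.

    Returns gradients for W_in (W(0)), W_rec (W(11)), W_out (W(1)).
    """
    T = X.shape[0]
    # Forward pass, caching all hidden states and outputs.
    H, Y = [], []
    h = h_init
    for x in X:
        h = np.tanh(W_in @ x + W_rec @ h)
        H.append(h)
        Y.append(W_out @ h)
    H, Y = np.array(H), np.array(Y)

    dW_in = np.zeros_like(W_in)
    dW_rec = np.zeros_like(W_rec)
    dW_out = np.zeros_like(W_out)
    dz_next = np.zeros(h_init.size)        # dDIV/dZ(0)(t+1), zero past T
    for t in range(T - 1, -1, -1):         # walk backward through time
        dz1 = Y[t] - D[t]                  # dDIV/dZ(1)(t), linear output
        dW_out += np.outer(dz1, H[t])
        # dDIV/dh(t): one term from the output at t, one from the
        # recurrence into t+1 -- the two-term sum from the slides.
        dh = W_out.T @ dz1 + W_rec.T @ dz_next
        dz0 = dh * (1.0 - H[t] ** 2)       # through tanh: f'(z) = 1 - h^2
        dW_in += np.outer(dz0, X[t])
        h_prev = H[t - 1] if t > 0 else h_init
        dW_rec += np.outer(dz0, h_prev)    # uses h(t-1), i.e. h(-1) at t=0
        dz_next = dz0
    return dW_in, dW_rec, dW_out
```

Each `+=` on a weight gradient is the per-time accumulation written out on the preceding slides.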
BPTT
• Can be generalized to any architecture
85
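Since BPTT is ordinary backpropagation on the unrolled graph, one sanity check that generalizes to any architecture is comparing analytic gradients against finite differences of the total divergence. A toy scalar RNN (hypothetical, chosen only to keep the check short):

```python
import numpy as np

def numeric_grad(f, w, eps=1e-6):
    """Central-difference estimate of df/dw for a scalar parameter."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

# Toy scalar RNN: h(t) = tanh(w * h(t-1) + x(t)), DIV = sum_t h(t)^2.
def divergence(w, X=(0.5, -0.3, 0.8), h0=0.0):
    h, total = h0, 0.0
    for x in X:
        h = np.tanh(w * h + x)
        total += h ** 2
    return total

def bptt_grad(w, X=(0.5, -0.3, 0.8), h0=0.0):
    """BPTT by hand for the same scalar model."""
    hs = [h0]
    for x in X:
        hs.append(np.tanh(w * hs[-1] + x))
    g, dh_next = 0.0, 0.0
    for t in range(len(X), 0, -1):
        dh = 2 * hs[t] + dh_next           # from Div(t) and from time t+1
        dz = dh * (1 - hs[t] ** 2)         # through tanh
        g += dz * hs[t - 1]                # dDIV/dw accumulates over t
        dh_next = dz * w
    return g
```

The same comparison can be run on any unrolled architecture, which is what makes it a useful debugging habit.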
Extensions to the RNN: Bidirectional RNN
• RNN with both forward and backward recursion
– Explicitly models the fact that just as the future can be predicted from the past, the past can be deduced from the future
86
Bidirectional RNN
• A forward net processes the data from t=0 to t=T
• A backward net processes it backward, from t=T down to t=0
[Figure: bidirectional RNN unrolled in time — a forward net with initial state $h_f(-1)$ reads $X(0), \ldots, X(T)$ left to right, a backward net with initial state $h_b(\infty)$ reads it right to left, and both feed the outputs $Y(0), \ldots, Y(T)$]
87
Bidirectional RNN: Processing an input string
• The forward net processes the data from t=0 to t=T
– Only computing the hidden states, initially
• The backward net processes it backward from t=T down to t=0
88
Bidirectional RNN: Processing an input string
• The backward net processes the input data in reverse time, end to beginning
– Initially only the hidden state values are computed
– Clearly, this is not an online process and requires the entire input data
– Note: This is not the backward pass of backprop.
• The backward net processes it backward from t=T down to t=0
89
Bidirectional RNN: Processing an input string
• The computed states of both networks are used to compute the final output at each time
90
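The two-pass computation above can be sketched as follows. The weight names `Wf`/`Uf` (forward net), `Wb`/`Ub` (backward net), and `V` (output from both states) are illustrative, and concatenating the two states is one common choice for combining them:

```python
import numpy as np

def brnn_forward(X, Wf, Uf, Wb, Ub, V):
    """Bidirectional RNN sketch: a forward net runs t = 0..T, a backward
    net runs t = T..0, and the output at each t combines both states."""
    T, n_h = X.shape[0], Wf.shape[0]
    hf = np.zeros((T, n_h))
    hb = np.zeros((T, n_h))
    h = np.zeros(n_h)
    for t in range(T):                    # forward net, left to right
        h = np.tanh(Wf @ X[t] + Uf @ h)
        hf[t] = h
    h = np.zeros(n_h)
    for t in range(T - 1, -1, -1):        # backward net, right to left
        h = np.tanh(Wb @ X[t] + Ub @ h)
        hb[t] = h
    # The final output at each time uses both computed states.
    return np.array([V @ np.concatenate([hf[t], hb[t]]) for t in range(T)])
```

Note that the backward pass over the input requires the entire sequence up front, so this is not an online model.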
Backpropagation in BRNNs
• Forward pass: Compute both forward and backward networks and the final output
91
Backpropagation in BRNNs
• Backward pass: Define a divergence from the desired output
• Separately perform back propagation on both nets
– From t=T down to t=0 for the forward net
– From t=0 up to t=T for the backward net
92
RNNs..
• Excellent models for time-series analysis tasks
– Time-series prediction
– Time-series classification
– Sequence prediction..
95
So how did this happen
96
So how did this happen
More on this later..
97
RNNs..
• Excellent models for time-series analysis tasks
– Time-series prediction
– Time-series classification
– Sequence prediction..
– They can even simplify some problems that are difficult for MLPs
• Next class..
98