Introduction to Fundamentals of RNNs
TRANSCRIPT
Icebreaker
Can we predict the future based on our current decisions?
“Which research direction should I take?”
Elvis Saravia
Outline
●Part 1: Review Neural Network Essentials
●Part 2: Sequential Modeling
●Part 3: Introduction to Recurrent Neural Networks
●Part 4: RNNs with TensorFlow
Perceptron Forward Pass
[Diagram: inputs x₀, x₁, …, xₙ are multiplied by weights w₀, w₁, …, wₙ, summed together with a bias b, and passed through a non-linearity f to produce the output y]

Computing output:

y = f( Σᵢ₌₀ᴺ xᵢ · wᵢ + b )

Vector form:

y = f(XW + b), where X = [x₀, x₁, …, xₙ] and W = [w₀, w₁, …, wₙ]

Non-linearity f: tanh, ReLU, sigmoid
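The forward pass can be sketched in a few lines of NumPy (a minimal illustration; the input, weight, and bias values below are made up):

```python
import numpy as np

def perceptron_forward(x, w, b, f=np.tanh):
    """Compute y = f(sum_i x_i * w_i + b), i.e. the vector form y = f(X.W + b)."""
    return f(np.dot(x, w) + b)

x = np.array([1.0, 0.5, -0.5])   # inputs x0..xn (made-up values)
w = np.array([0.2, -0.4, 0.1])   # weights w0..wn
b = 0.05                         # bias

y = perceptron_forward(x, w, b)  # tanh non-linearity
```

Swapping `f` for `np.maximum(0, ...)` (ReLU) or a sigmoid only changes the non-linearity; the weighted sum stays the same.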
Multi-Layer Perceptron (MLP)
[Diagram: an input layer (x₀–x₃), one hidden layer of ∑f neurons, and an output layer]
Deep Neural Networks (DNN)
[Diagram: an input layer (x₀–x₃), multiple hidden layers of ∑f neurons, and an output layer]
Neural Network (Summary)
• Neural networks learn features as the input is fed through the hidden layers, updating the weights through backpropagation (SGD)*.
• Data and activations flow in one direction through the hidden layers, and neurons within a layer never interact with each other.
* Stochastic Gradient Descent (SGD)
Drawbacks of NNs
●Lack sequence-modeling capabilities: they keep no record of past information (i.e., no memory), which is essential for modeling data with a sequential nature.
● Input is of fixed size (more on this later)
Sequences
“I took the bus this morning because it was very cold.”
Examples of sequences: stock price, speech waveform, sentence.
●Current values depend on previous values (e.g., melody notes, language rules)
●Order needs to be maintained to preserve meaning
●Sequences usually vary in length
Modeling Sequences
How to represent a sequence?
I love the coldness [ 1 1 0 1 0 0 1 ]
Bag of Words (BOW)
What’s the problem with the BOW representation?
Problem with BOW
Bag of words does not preserve order, so the order-dependent semantics of a sentence cannot be captured
“The food was good, not bad at all”
“The food was bad, not good at all”
vs
How to differentiate meaning of both sentences?
[ 1 1 0 1 1 0 1 0 1 0 0 1 1 ]
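The problem shows up directly in code: a simple bag-of-words count (a minimal sketch using Python's `Counter`, with a tiny vocabulary built from the sentences themselves) maps both sentences to exactly the same vector:

```python
from collections import Counter

def bow(sentence, vocab):
    """Count word occurrences, ignoring order entirely."""
    counts = Counter(sentence.lower().replace(",", "").split())
    return [counts[word] for word in vocab]

s1 = "The food was good, not bad at all"
s2 = "The food was bad, not good at all"
vocab = sorted(set(s1.lower().replace(",", "").split()))

v1, v2 = bow(s1, vocab), bow(s2, vocab)
print(v1 == v2)  # True: order is lost, so both sentences look identical
```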
One-Hot Encoding
● Preserves order by keeping each word's position within the feature vector
On Monday it was raining
[ 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ]
We preserved order but what is the problem here?
Problem with One-Hot Encoding
● One-hot encoding cannot deal with variations of the same sequence.
On Monday it was raining
It was raining on Monday
[ 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ]
[ 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 ]
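A sketch of this encoding (hypothetical five-word vocabulary, one one-hot block per word position) makes the issue concrete: order is preserved, but reordering the same words yields an entirely different vector:

```python
def one_hot_sequence(sentence, vocab):
    """Concatenate one one-hot vector per word, preserving word order."""
    vectors = []
    for word in sentence.lower().split():
        vectors.extend(1 if word == v else 0 for v in vocab)
    return vectors

vocab = ["it", "monday", "on", "raining", "was"]
a = one_hot_sequence("On Monday it was raining", vocab)
b = one_hot_sequence("It was raining on Monday", vocab)
print(a != b)  # True: same words, completely different 25-dimensional vectors
```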
Solution
Solution: we need a model that does not have to relearn the rules of language at each point in the sentence in order to preserve meaning.
On Monday it was raining
It was raining on Monday
No idea of state and what comes next!
What was learned at the beginning of the sentence will have to be relearned at the end of the sentence.
Markov Models
State and transitions can be modeled, so no matter where in the sentence we are, we have an idea of what comes next based on the transition probabilities.
Problem: Each state depends only on the last state. We can’t model long-term dependencies!
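A bigram Markov model can be sketched in a few lines (toy corpus and probabilities for illustration only); note that the prediction uses nothing but the current word:

```python
from collections import Counter, defaultdict

def bigram_model(corpus):
    """Estimate P(next word | current word) from bigram counts."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for cur, nxt in zip(words, words[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

corpus = ["on monday it was raining", "it was raining on monday"]
model = bigram_model(corpus)
print(model["it"])  # {'was': 1.0} -- the next state depends only on the current one
```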
Long-term dependencies
We need information from the far past and future to accurately model sequences.
In Italy, I had a great time and I learnt some of the _____ language
Part 3
Recurrent Neural Networks
(RNNs)
Recurrent Neural Networks (RNNs)
●RNNs model sequential information by assuming long-term dependencies between elements of a sequence.
●RNNs maintain word order and share parameters across the sequence (i.e., no need to relearn rules).
●RNNs are recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations.
●RNNs memorize information that has been computed so far, which in principle lets them handle long-term dependencies.
Applications of RNN
●Analyze time series data to predict stock market
●Speech recognition (e.g., Emotion Recognition from Acoustic features)
●Autonomous driving
●Natural Language Processing (e.g., Machine Translation, Question Answering)
Some examples
Google Magenta Project (Melody composer) – (https://magenta.tensorflow.org)
Sentence Generator - (http://goo.gl/onkPNd)
Image Captioning – (http://goo.gl/Nwx7Kh)
RNNs Main Components
- Recurrent neurons
- Unrolling recurrent neurons
- Layer of recurrent neurons
- Memory cell containing hidden state
Recurrent Neurons
[Diagram: a single recurrent neuron ∑f with input x and output y, feeding its output back into itself through hidden state h]

A simple recurrent neuron:
• receives input
• produces output
• sends output back to itself

Legend:
- x: input
- y: output (usually a vector of probabilities)
- ∑: sum(W · x) + bias
- f: activation function (e.g., ReLU, tanh, sigmoid)
- Wx: weights for inputs
- Wy: weights for outputs of the previous time step
- h: function of the current inputs and the previous time step
Unrolling/Unfolding recurrent neuron
[Diagram: the recurrent neuron on the left is unrolled through time into a chain of copies, one per time step (frame): at each step t, input xₜ and the previous hidden state hₜ₋₁ produce output yₜ and the new hidden state hₜ; the same weights Wx and Wy are applied at every step.]
RNNs remember previous state
[Diagram: the recurrent neuron at t=0, with input x₀ and output y₀]

x₀: vector representing the first word, "it"
Wx, Wy: weight matrices
y₀: output at t=0

h₀ = tanh(Wx·x₀ + Wy·h₋₁)   (t=0)

Can remember things from the past.
RNNs remember previous state
28
[Diagram: the recurrent neuron at t=1, with input x₁ and output y₁]

x₁: vector representing the second word, "was"
Wx, Wy: weight matrices stay the same, so they are shared across the sequence
y₁: output at t=1

h₀ = tanh(Wx·x₀ + Wy·h₋₁)
h₁ = tanh(Wx·x₁ + Wy·h₀)   (t=1)

Can remember things from t=0.
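The two steps can be reproduced with NumPy (a minimal sketch: the word vectors and weight matrices are random made-up values, but note that Wx and Wy are the same objects at both time steps):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 4, 3

Wx = rng.normal(size=(n_inputs, n_units))  # input weights, shared across time
Wy = rng.normal(size=(n_units, n_units))   # recurrent weights, shared across time

x0 = rng.normal(size=n_inputs)  # made-up vector for the first word, "it"
x1 = rng.normal(size=n_inputs)  # made-up vector for the second word, "was"
h_prev = np.zeros(n_units)      # h_{-1}: no past information at t=0

h0 = np.tanh(x0 @ Wx + h_prev @ Wy)  # t=0
h1 = np.tanh(x1 @ Wx + h0 @ Wy)      # t=1: h1 carries information from t=0
```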
Overview
[Diagram: the unrolled recurrent neuron fed with a mini-batch at each time step, e.g. X₀ = [[0,1,2], [3,4,5], [6,7,8], [9,0,1]] and X₁ = [[9,8,7], [0,0,0], [6,5,4], [3,2,1]]; each output yₜ (a scalar per instance) feeds into the next step.]

yₜ = f(Wx·xₜ + Wy·yₜ₋₁ + b)
Code Example
Where all the magic happens!
yₜ = f(Wx·xₜ + Wy·yₜ₋₁ + b)
Layer of Recurrent Neurons
[Diagram: a layer of recurrent neurons: the input x feeds every ∑f neuron in the layer, each neuron feeds its output back into the layer through hidden state h, and the output y is now a vector rather than a scalar.]
Unrolling Layer
[Diagram: the layer unrolled through time over inputs x₀, x₁, x₂ (mini-batches such as [1,2,3], [4,5,6], [7,8,9], [0,0,0]) producing outputs y₀, y₁, y₂, with weights Wx, Wy and hidden states h₀, h₁, h₂ carried across time steps.]

Vector form:

Yₜ = f([Xₜ Yₜ₋₁]·W + b),  where W = [Wx; Wy] stacked vertically
Yₜ = f(Xₜ·Wx + Yₜ₋₁·Wy + b)
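The two vector forms are equivalent, which can be verified numerically (a sketch with made-up shapes and random values):

```python
import numpy as np

rng = np.random.default_rng(1)
batch, n_inputs, n_neurons = 4, 3, 5
Xt = rng.normal(size=(batch, n_inputs))
Yprev = rng.normal(size=(batch, n_neurons))
Wx = rng.normal(size=(n_inputs, n_neurons))
Wy = rng.normal(size=(n_neurons, n_neurons))
b = rng.normal(size=n_neurons)

# form 1: separate weight matrices
Y_a = np.tanh(Xt @ Wx + Yprev @ Wy + b)

# form 2: concatenate [X_t  Y_{t-1}] and stack W = [Wx; Wy]
W = np.vstack([Wx, Wy])
Y_b = np.tanh(np.hstack([Xt, Yprev]) @ W + b)

print(np.allclose(Y_a, Y_b))  # True
```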
Variations of RNNs: Input / Output
●Sequence to sequence: stock price prediction and other time series
●Sequence to vector: sentiment analysis ([−1,+1]) and other classification tasks; the output is a vector of probabilities over classes (a.k.a. softmax)
●Vector to sequence: image captioning
●Encoder-Decoder: translation (encoder: sequence to vector; decoder: vector to sequence)
Training
[Diagram: the network unrolled over five time steps X₀…X₄ with shared parameters W, b at every step; a forward pass produces Y₀…Y₄, and the cost C(Y₂, Y₃, Y₄) with per-step losses J₂, J₃, J₄ is computed from the last outputs.]

- Forward pass
- Compute loss via cost function C
- Minimize loss by backpropagation through time (BPTT)

Summing the loss over time steps:

∂J/∂W = Σₜ ∂Jₜ/∂W

For one single time step t (shown here for t = 4):

∂J₄/∂W = Σₖ₌₀⁴ (∂J₄/∂y₄)(∂y₄/∂h₄)(∂h₄/∂hₖ)(∂hₖ/∂W)

Counting the contributions of W in previous time steps to the error at time step t (using the chain rule).
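The BPTT sum can be sketched end to end in NumPy (a toy cost J = ½Σ‖hₜ‖² and random weights, purely for illustration), with the analytic gradient checked against a numerical one:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_h = 5, 2, 3
xs = rng.normal(size=(T, n_in))
Wx = rng.normal(size=(n_in, n_h)) * 0.5

def forward(Wh):
    """Run the RNN forward and return hidden states plus a toy loss."""
    hs = [np.zeros(n_h)]
    for t in range(T):
        hs.append(np.tanh(xs[t] @ Wx + hs[-1] @ Wh))
    loss = 0.5 * sum(float(h @ h) for h in hs[1:])
    return hs, loss

Wh = rng.normal(size=(n_h, n_h)) * 0.5
hs, loss = forward(Wh)

# BPTT: walk backwards, accumulating each time step's contribution to dJ/dWh
dWh = np.zeros_like(Wh)
dh_next = np.zeros(n_h)
for t in reversed(range(T)):
    dh = hs[t + 1] + dh_next          # direct loss gradient + gradient from t+1
    dz = dh * (1 - hs[t + 1] ** 2)    # back through tanh
    dWh += np.outer(hs[t], dz)        # contribution of W at time step t
    dh_next = dz @ Wh.T               # pass gradient on to h_{t-1}

# sanity-check one entry against a numerical gradient
eps = 1e-5
Wp, Wm = Wh.copy(), Wh.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
num = (forward(Wp)[1] - forward(Wm)[1]) / (2 * eps)
print(np.isclose(dWh[0, 0], num))  # True
```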
Drawbacks
Main problem: Vanishing gradients (gradients get too small)
Intuition: As sequences get longer, gradients tend to get too small during the backpropagation process.
∂Jₙ/∂W = Σₖ₌₀ⁿ (∂Jₙ/∂yₙ)(∂yₙ/∂hₙ)(∂hₙ/∂hₖ)(∂hₖ/∂W)

where the factor ∂hₙ/∂hₖ expands into a long product of Jacobians, e.g. for k = 0:

∂hₙ/∂hₖ = (∂hₙ/∂hₙ₋₁)(∂hₙ₋₁/∂hₙ₋₂) ··· (∂h₂/∂h₁)(∂h₁/∂h₀)
We are just multiplying a lot of small numbers together
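The intuition can be made concrete: the derivative of tanh, 1 − tanh²(z), is at most 1 and usually much smaller, so a product of many such factors, one per time step, collapses toward zero (a toy illustration with random pre-activations):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=50)         # 50 made-up pre-activations, one per time step
factors = 1 - np.tanh(z) ** 2   # each term plays the role of dh_t/dh_{t-1}

product = np.prod(factors)      # stands in for dh_n/dh_0 over a 50-step sequence
print(product)                  # a vanishingly small number
```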
Solutions
Long Short-Term Memory (LSTM) networks:
Deal with the vanishing gradient problem, and are therefore more reliable for modeling long-term dependencies, especially for very long sequences.
RNN Extensions
Extended Readings:
○ Bidirectional RNNs – passing states in both directions
○ Deep (Bidirectional) RNNs – stacking RNNs
○ LSTM networks – Adaptation of RNNs
In general
●RNNs are great for analyzing sequences of any arbitrary length.
●RNNs are considered “anticipatory” models.
●RNNs are also considered creative learning models: they can, for example, predict a set of musical notes to play next in a melody and select an appropriate one.
Demo
●Building RNNs in TensorFlow
●Training RNNs in TensorFlow
● Image Classification
●Text Classification
References
● Introduction to RNNs - http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
●NTHU Machine Learning Course - https://goo.gl/B4EqMi
●Hands-On Machine Learning with Scikit-Learn and TensorFlow (Book)