recurrent neural network (rnn) - ntu speech...
TRANSCRIPT
![Page 1: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/1.jpg)
Recurrent Neural Network (RNN)
![Page 2: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/2.jpg)
Example Application
• Slot Filling
I would like to arrive Taipei on November 2nd.
ticket booking system
Destination:
time of arrival:
Taipei
November 2ndSlot
![Page 3: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/3.jpg)
Example Application
1x2x
2y1y
Taipei
Input: a word
(Each word is represented as a vector)
Solving slot filling by Feedforward network?
![Page 4: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/4.jpg)
1-of-N encoding
Each dimension corresponds to a word in the lexicon
The dimension for the wordis 1, and others are 0
lexicon = {apple, bag, cat, dog, elephant}
apple = [ 1 0 0 0 0]
bag = [ 0 1 0 0 0]
cat = [ 0 0 1 0 0]
dog = [ 0 0 0 1 0]
elephant = [ 0 0 0 0 1]
The vector is lexicon size.
1-of-N Encoding
How to represent each word as a vector?
![Page 5: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/5.jpg)
Beyond 1-of-N encoding
w = “apple”
a-a-a
a-a-b
p-p-l
26 X 26 X 26…
…a-p-p…
p-l-e…
……
……
1
1
1
0
0
Word hashingDimension for “Other”
w = “Sauron” …
apple
bag
cat
dog
elephant
“other”
0
0
0
0
0
1
w = “Gandalf” 5
![Page 6: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/6.jpg)
Example Application
1x2x
2y1y
Taipei
desttime of departure
Input: a word
(Each word is represented as a vector)
Output:
Probability distribution that the input word belonging to the slots
Solving slot filling by Feedforward network?
![Page 7: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/7.jpg)
Example Application
1x2x
2y1y
Taipei
arrive Taipei on November 2nd
other otherdest time time
leave Taipei on November 2nd
place of departure
Neural network needs memory!
desttime of departure
Problem?
![Page 8: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/8.jpg)
Recurrent Neural Network (RNN)
1x2x
2y1y
1a 2a
Memory can be considered as another input.
The output of hidden layer are stored in the memory.
store
![Page 9: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/9.jpg)
Example
1x2x
2y1y
1a 2a
store
All the weights are “1”, no bias
All activation functions are linear
Input sequence: 11
11
22
……
1 1
0 0given Initial
values
2 2
4 4
output sequence:44
![Page 10: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/10.jpg)
Example
1x2x
2y1y
1a 2a
store
All the weights are “1”, no bias
All activation functions are linear
Input sequence: 11
11
22
……
1 1
2 26 6
12 12
output sequence:44
1212
![Page 11: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/11.jpg)
Example
1x2x
2y1y
1a 2a
store
All the weights are “1”, no bias
All activation functions are linear
Input sequence: 11
11
22
……
2 2
6 616 16
32 32
output sequence:44
1212
3232
Changing the sequence order will change the output.
![Page 12: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/12.jpg)
RNN
store store
x1 x2 x3
y1 y2 y3
a1
a1 a2
a2 a3
The same network is used again and again.
arrive Taipei on November 2nd
Probability of “arrive” in each slot
Probability of “Taipei” in each slot
Probability of “on” in each slot
![Page 13: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/13.jpg)
RNN
store
x1 x2
y1 y2
a1
a1a2
……
……
……
store
x1 x2
y1 y2
a1
a1a2
……
……
……
leave Taipei
Prob of “leave” in each slot
Prob of “Taipei” in each slot
Prob of “arrive” in each slot
Prob of “Taipei” in each slot
arrive Taipei
Different
The values stored in the memory is different.
![Page 14: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/14.jpg)
Of course it can be deep …
…… ……
xt xt+1 xt+2
……
……
yt
……
……
yt+1…
…yt+2
……
……
![Page 15: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/15.jpg)
Elman Network & Jordan Network
xt xt+1
yt yt+1
……Wh
Wi
Wo
Wi
Wo
……
xt xt+1
yt yt+1
……
Wh
Wi
Wo
Wi
Wo
……
Elman Network Jordan Network
![Page 16: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/16.jpg)
Bidirectional RNN
yt+1
…… ……
…………yt+2yt
xt xt+1 xt+2
xt xt+1 xt+2
![Page 17: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/17.jpg)
MemoryCell
Long Short-term Memory (LSTM)
Input Gate
Output Gate
Signal control the input gate
Signal control the output gate
Forget Gate
Signal control the forget gate
Other part of the network
Other part of the network
(Other part of the network)
(Other part of the network)
(Other part of the network)
LSTM
Special Neuron:4 inputs, 1 output
![Page 18: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/18.jpg)
𝑧
𝑧𝑖
𝑧𝑓
𝑧𝑜
𝑔 𝑧
𝑓 𝑧𝑖multiply
multiplyActivation function f is usually a sigmoid function
Between 0 and 1
Mimic open and close gate
c
𝑐′ = 𝑔 𝑧 𝑓 𝑧𝑖 + 𝑐𝑓 𝑧𝑓
ℎ 𝑐′𝑓 𝑧𝑜
𝑎 = ℎ 𝑐′ 𝑓 𝑧𝑜
𝑔 𝑧 𝑓 𝑧𝑖
𝑐′
𝑓 𝑧𝑓
𝑐𝑓 𝑧𝑓
𝑐
![Page 19: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/19.jpg)
LSTM - Example
100
0
𝑥1𝑥2
𝑥3
𝑦
310
0
200
0
410
0
200
0
101
7
3-10
0
610
0
101
6
When x2 = 1, add the numbers of x1 into the memory
When x3 = 1, output the number in the memory.
0 0 3 3 7 7 7 0 6
When x2 = -1, reset the memory
![Page 20: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/20.jpg)
𝑥1𝑥2
𝑥3
𝑦
310
0
410
0
200
0
101
7
3-10
0
+
+
+
+
x1 x2 x3
x1
x2
x3
x1
x2
x3
x1
x2
x3
y
1 0 0
100
0
0
1
1
1
1
0
-10
0
0
100
-10
0
010
1000
![Page 21: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/21.jpg)
𝑥1𝑥2
𝑥3
𝑦
310
0
410
0
200
0
101
7
3-10
0
+
+
+
+
3 1 0
3
1
0
3
1
0
3
1
0
y
1 0 0
100
0
0
1
1
1
1
0
-10
0
0
100
-10
0
010
100
3
≈1 3
≈103
3≈0
≈0
![Page 22: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/22.jpg)
𝑥1𝑥2
𝑥3
𝑦
310
0
410
0
200
0
101
7
3-10
0
+
+
+
+
4 1 0
4
1
0
4
1
0
4
1
0
y
1
1
1
1
4
≈1 4
≈137
7≈0
≈0
1 0 0
100
0
0
0
-10
0
0
100
-10
0
010
100
![Page 23: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/23.jpg)
𝑥1𝑥2
𝑥3
𝑦
310
0
410
0
200
0
101
7
3-10
0
+
+
+
+
2 0 0
2
0
0
2
0
0
2
0
0
y
1
1
1
1
2
≈0 0
≈17
7≈0
≈0
1 0 0
100
0
0
0
-10
0
0
100
-10
0
010
100
![Page 24: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/24.jpg)
𝑥1𝑥2
𝑥3
𝑦
310
0
410
0
200
0
101
7
3-10
0
+
+
+
+
1 0 1
1
0
1
1
0
1
1
0
1
y
1
1
1
1
1
≈0 0
≈17
7≈1
≈7
1 0 0
100
0
0
0
-10
0
0
100
-10
0
010
100
![Page 25: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/25.jpg)
𝑥1𝑥2
𝑥3
𝑦
310
0
410
0
200
0
101
7
3-10
0
+
+
+
+
3 -1 0
3
-1
0
3
-1
0
3
-1
0
y
1
1
1
1
3
≈0 0
≈07
0≈0
≈0
0
1 0 0
100
0
0
0
-10
0
0
100
-10
0
010
100
![Page 26: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/26.jpg)
x1 x2 Input
Original Network:
Simply replace the neurons with LSTM
𝑎1 𝑎2
……
……
𝑧1 𝑧2
![Page 27: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/27.jpg)
x1 x2
+
+
+
+
+
+
+
+
Input
𝑎1 𝑎2
4 times of parameters
![Page 28: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/28.jpg)
LSTM
ct-1
……vector
xt
zzizf zo 4 vectors
![Page 29: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/29.jpg)
LSTM
xt
zzi
×
zf zo
× + ×
yt
ct-1
z
zi
zf
zo
![Page 30: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/30.jpg)
LSTM
xt
zzi
×
zf zo
× + ×
yt
xt+1
zzi
×
zf zo
× + ×
yt+1
ht
Extension: “peephole”
ht-1 ctct-1
ct-1 ct ct+1
![Page 31: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/31.jpg)
Multiple-layerLSTM
This is quite standard now.
https://img.komicolle.org/2015-09-20/src/14426967627131.gif
Don’t worry if you cannot understand this.Keras can handle it.
Keras supports
“LSTM”, “GRU”, “SimpleRNN” layers
![Page 32: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/32.jpg)
copy copy
x1 x2 x3
y1 y2 y3
Wi
a1
a1 a2
a2 a3
arrive Taipei on November 2ndTrainingSentences:
Learning Target
other otherdest
10 0 10 010 0
other dest other
… … … … … …
time time
![Page 33: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/33.jpg)
Learning
Backpropagation through time (BPTT)
𝑤 ← 𝑤 − 𝜂𝜕𝐿 ∕ 𝜕𝑤 1x2x
2y1y
1a 2a
copy
𝑤
![Page 34: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/34.jpg)
Unfortunately ……
• RNN-based network is not always easy to learn
感謝曾柏翔同學提供實驗結果
Real experiments on Language modeling
Lucky
sometimes
Tota
l Lo
ss
Epoch
![Page 35: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/35.jpg)
The error surface is rough.
w1
w2
Co
st
The error surface is either very flat or very steep.
Clipping
[Razvan Pascanu, ICML’13]
Total Lo
ss
![Page 36: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/36.jpg)
Why?
1
1
y1
0
1
w
y2
0
1
w
y3
0
1
w
y1000
……
𝑤 = 1
𝑤 = 1.01
𝑦1000 = 1
𝑦1000 ≈ 20000
𝑤 = 0.99
𝑤 = 0.01
𝑦1000 ≈ 0
𝑦1000 ≈ 0
1 1 1 1
LargeΤ𝜕𝐿 𝜕𝑤
Small Learning rate?
small Τ𝜕𝐿 𝜕𝑤
Large Learning rate?
Toy Example
=w999
![Page 37: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/37.jpg)
add
• Long Short-term Memory (LSTM)
• Can deal with gradient vanishing (not gradient explode)
Helpful Techniques
Memory and input are added
The influence never disappears unless forget gate is closed
No Gradient vanishing(If forget gate is opened.)
[Cho, EMNLP’14]
Gated Recurrent Unit (GRU): simpler than LSTM
![Page 38: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/38.jpg)
Helpful Techniques
Vanilla RNN Initialized with Identity matrix + ReLU activation function [Quoc V. Le, arXiv’15]
Outperform or be comparable with LSTM in 4 different tasks
[Jan Koutnik, JMLR’14]
Clockwise RNN
[Tomas Mikolov, ICLR’15]
Structurally Constrained Recurrent Network (SCRN)
![Page 39: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/39.jpg)
More Applications ……
store store
x1 x2 x3
y1 y2 y3
a1
a1 a2
a2 a3
arrive Taipei on November 2nd
Probability of “arrive” in each slot
Probability of “Taipei” in each slot
Probability of “on” in each slot
Input and output are both sequences with the same length
RNN can do more than that!
![Page 40: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/40.jpg)
Many to one
• Input is a vector sequence, but output is only one vector
Sentiment Analysis
……
我 覺 太得 糟 了
超好雷
好雷
普雷
負雷
超負雷
看了這部電影覺得很高興 …….
這部電影太糟了…….
這部電影很棒 …….
Positive (正雷) Negative (負雷) Positive (正雷)
……
![Page 41: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/41.jpg)
Many to one
• Input is a vector sequence, but output is only one vector
Key Term Extraction
[Shen & Lee, Interspeech 16]
α1 α2 α3 α4 … αT
ΣαiVi
x4x3x2x1 xT…
…V3V2V1 V4 VT
Embedding Layer
…V3V2V1 V4 VT
OT
…
document
Output Layer
Hidden Layer
Embedding Layer
Key Terms: DNN, LSTN
![Page 42: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/42.jpg)
Many to Many (Output is shorter)
• Both input and output are both sequences, but the output is shorter.
• E.g. Speech Recognition
好 好 好
Trimming
棒 棒 棒 棒 棒
“好棒”
Why can’t it be “好棒棒”
Input:
Output: (character sequence)
(vector sequence)
Problem?
![Page 43: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/43.jpg)
Many to Many (Output is shorter)
• Both input and output are both sequences, but the output is shorter.
• Connectionist Temporal Classification (CTC) [Alex Graves, ICML’06][Alex Graves, ICML’14][Haşim Sak, Interspeech’15][Jie Li, Interspeech’15][Andrew Senior, ASRU’15]
好 φ φ 棒 φ φ φ φ 好 φ φ 棒 φ 棒 φ φ
“好棒” “好棒棒”Add an extra symbol “φ” representing “null”
![Page 44: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/44.jpg)
Many to Many (Output is shorter)
• CTC: Training
好 棒
AcousticFeatures:
Label:
All possible alignments are considered as correct.
好 棒φ φ φ φ
好 棒φ φ φ φ
好 棒φ φ φ φ
……
![Page 45: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/45.jpg)
Many to Many (Output is shorter)
• CTC: example
Graves, Alex, and Navdeep Jaitly. "Towards end-to-end speech recognition with
recurrent neural networks." Proceedings of the 31st International Conference on
Machine Learning (ICML-14). 2014.
HIS FRIEND’S
φ φ φ φ φ φ φ φ φ φ φ φφφ φ φ φ
![Page 46: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/46.jpg)
Many to Many (No Limitation)
• Both input and output are both sequences with different lengths. → Sequence to sequence learning
• E.g. Machine Translation (machine learning→機器學習)
Containing all information about
input sequence
learnin
g
mach
ine
![Page 47: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/47.jpg)
learnin
g
Many to Many (No Limitation)
• Both input and output are both sequences with different lengths. → Sequence to sequence learning
• E.g. Machine Translation (machine learning→機器學習)
mach
ine
機 習器 學
……
……
Don’t know when to stop
慣 性
![Page 48: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/48.jpg)
Many to Many (No Limitation)
推 tlkagk: =========斷==========接龍推文是ptt在推文中的一種趣味玩法,與推齊有些類似但又有所不同,是指在推文中接續上一樓的字句,而推出連續的意思。該類玩法確切起源已不可知(鄉民百科)
![Page 49: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/49.jpg)
learnin
g
Many to Many (No Limitation)
• Both input and output are both sequences with different lengths. → Sequence to sequence learning
• E.g. Machine Translation (machine learning→機器學習)
mach
ine
機 習器 學
Add a symbol “===“ (斷)
[Ilya Sutskever, NIPS’14][Dzmitry Bahdanau, arXiv’15]
===
![Page 50: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/50.jpg)
Many to Many (No Limitation)
• Both input and output are both sequences with different lengths. → Sequence to sequence learning
• E.g. Machine Translation (machine learning→機器學習)
https://arxiv.org/pdf/1612.01744v1.pdf
![Page 51: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/51.jpg)
Beyond Sequence
• Syntactic parsing
Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton,
Grammar as a Foreign Language, NIPS 2015
john hasa dog
![Page 52: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/52.jpg)
Sequence-to-sequence Auto-encoder - Text• To understand the meaning of a word sequence,
the order of the words can not be ignored.
white blood cells destroying an infection
an infection destroying white blood cells
exactly the same bag-of-word
positive
negative
differentmeaning
![Page 53: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/53.jpg)
Sequence-to-sequence Auto-encoder - Text
Li, Jiwei, Minh-Thang Luong, and Dan Jurafsky. "A hierarchical neural autoencoder
for paragraphs and documents." arXiv preprint arXiv:1506.01057(2015).
![Page 54: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/54.jpg)
Sequence-to-sequence Auto-encoder - Text
Li, Jiwei, Minh-Thang Luong, and Dan Jurafsky. "A hierarchical neural autoencoder
for paragraphs and documents." arXiv preprint arXiv:1506.01057(2015).
![Page 55: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/55.jpg)
Sequence-to-sequence Auto-encoder - Speech• Dimension reduction for a sequence with variable length
ever ever
never
never
never
dog
dog
dogs
Fixed-length vectoraudio segments (word-level)
Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-Shan Lee, Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder, Interspeech 2016
![Page 56: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/56.jpg)
Sequence-to-sequence Auto-encoder - Speech
Audio archive divided into variable-length audio segments
Audio Segment to
Vector
Audio Segment to
VectorSimilarity
Search Result
Spoken Query
Off-line
On-line
![Page 57: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/57.jpg)
Sequence-to-sequence Auto-encoder - Speech
audio segment
acoustic features
The values in the memory
represent the whole audio
segment
x1 x2 x3 x4
RNN Encoder
audio segment
vector
The vector we want
How to train RNN Encoder?
![Page 58: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/58.jpg)
Sequence-to-sequence Auto-encoder
RNN Decoderx1 x2 x3 x4
y1 y2 y3 y4
x1 x2 x3x4
RNN Encoder
audio segment
acoustic features
The RNN encoder and decoder are jointly trained.
Input acoustic features
![Page 59: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/59.jpg)
Sequence-to-sequence Auto-encoder - Speech• Visualizing embedding vectors of the words
fear
nearname
fame
![Page 60: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/60.jpg)
Demo: Chat-bot
電視影集 (~40,000 sentences)、美國總統大選辯論
![Page 61: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/61.jpg)
Demo: Chat-bot
• Develop Team• Interface design: Prof. Lin-Lin Chen & Arron Lu
• Web programming: Shi-Yun Huang
• Data collection: Chao-Chuang Shih
• System implementation: Kevin Wu, Derek Chuang, & Zhi-Wei Lee (李致緯), Roy Lu (盧柏儒)
• System design: Richard Tsai & Hung-Yi Lee
61
![Page 62: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/62.jpg)
Demo: Video Caption Generation
Video
A girl is running.
A group of people is walking in the forest.
A group of people is knocked by a tree.
![Page 63: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/63.jpg)
Demo: Video Caption Generation
• Can machine describe what it see from video?
• Demo: 台大語音處理實驗室曾柏翔、吳柏瑜、盧宏宗
• Video: 莊舜博、楊棋宇、黃邦齊、萬家宏
![Page 64: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/64.jpg)
Demo: Image Caption Generation
• Input an image, but output a sequence of words
Input image
a woman is
……
===
CNN
A vector for whole image
[Kelvin Xu, arXiv’15][Li Yao, ICCV’15]
Caption Generation
![Page 65: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/65.jpg)
Demo: Image Caption Generation
• Can machine describe what it see from image?
• Demo:台大電機系大四蘇子睿、林奕辰、徐翊祥、陳奕安
MTK產學大聯盟
![Page 66: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/66.jpg)
http://news.ltn.com.tw/photo/politics/breakingnews/975542_1
![Page 67: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/67.jpg)
Organize
Attention-based Model
http://henrylo1605.blogspot.tw/2015/05/blog-post_56.html
Breakfast today
What you learned in these lectures
summer vacation 10 years ago
What is deep learning?
Answer
![Page 68: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/68.jpg)
Attention-based Model
Reading Head Controller
Input
Reading Head
output
…… ……
Machine’s Memory
DNN/RNN
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Attain%20(v3).ecm.mp4/index.html
![Page 69: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/69.jpg)
Attention-based Model v2
Reading Head Controller
Input
Reading Head
output
…… ……
Machine’s Memory
DNN/RNN
Neural Turing Machine
Writing Head Controller
Writing Head
![Page 70: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/70.jpg)
Reading Comprehension
Query
Each sentence becomes a vector.
……
DNN/RNN
Reading Head Controller
……
answer
Semantic Analysis
![Page 71: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/71.jpg)
Reading Comprehension
• End-To-End Memory Networks. S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. NIPS, 2015.
The position of reading head:
Keras has example: https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py
![Page 72: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/72.jpg)
Visual Question Answering
source: http://visualqa.org/
![Page 73: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/73.jpg)
Visual Question Answering
Query DNN/RNN
Reading Head Controller
answer
CNN A vector for each region
![Page 74: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/74.jpg)
Speech Question Answering
• TOEFL Listening Comprehension Test by Machine
• Example:
Question: “ What is a possible origin of Venus’ clouds? ”
Audio Story:
Choices:
(A) gases released as a result of volcanic activity
(B) chemical reactions caused by high surface temperatures
(C) bursts of radio energy from the plane's surface
(D) strong winds that blow dust into the atmosphere
(The original story is 5 min long.)
![Page 75: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/75.jpg)
Model Architecture
“what is a possible origin of Venus‘ clouds?"
Question:
Question Semantics
…… It be quite possible that this be due to volcanic eruption because volcanic eruption often emit gas. If that be the case volcanism could very well be the root cause of Venus 's thick cloud cover. And also we have observe burst of radio energy from the planet 's surface. These burst be similar to what we see when volcano erupt on earth ……
Audio Story:
Speech Recognition
Semantic Analysis
Semantic Analysis
Attention
Answer
Select the choice most similar to the answer
Attention
Everything is learned from training examples
![Page 76: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/76.jpg)
Simple BaselinesA
ccu
racy
(%
)
(1) (2) (3) (4) (5) (6) (7)
Naive Approaches
random
(4) the choice with semantic most similar to others
(2) select the shortestchoice as answer
Experimental setup: 717 for training, 124 for validation, 122 for testing
![Page 77: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/77.jpg)
Memory NetworkA
ccu
racy
(%
)
(1) (2) (3) (4) (5) (6) (7)
Memory Network: 39.2%
Naive Approaches
(proposed by FB AI group)
![Page 78: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/78.jpg)
Proposed ApproachA
ccu
racy
(%
)
(1) (2) (3) (4) (5) (6) (7)
Memory Network: 39.2%
Naive Approaches
Proposed Approach: 48.8%
(proposed by FB AI group)
[Fang & Hsu & Lee, SLT 16]
[Tseng & Lee, Interspeech 16]
![Page 79: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/79.jpg)
To Learn More ……
• The Unreasonable Effectiveness of Recurrent Neural Networks
• http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Understanding LSTM Networks
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
![Page 80: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/80.jpg)
Deep & Structured
![Page 81: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/81.jpg)
RNN v.s. Structured Learning
• RNN, LSTM• Unidirectional RNN
does not consider the whole sequence
• Cost and error not always related
• Deep
• HMM, CRF, Structured Perceptron/SVM
• Using Viterbi, so consider the whole sequence
• How about Bidirectional RNN?
• Can explicitly consider the label dependency
• Cost is the upper bound of error
?
![Page 82: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/82.jpg)
Integrated Together
RNN, LSTM
HMM, CRF, Structured Perceptron/SVM
Deep
• Explicitly model the dependency
• Cost is the upper bound of error
http://photo30.bababian.com/upload1/20100415/42E9331A6580A46A5F89E98638B8FD76.jpg
![Page 83: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/83.jpg)
Integrated together
• Speech Recognition: CNN/LSTM/DNN + HMM
𝑃 𝑥, 𝑦 = 𝑃 𝑦1|𝑠𝑡𝑎𝑟𝑡 ෑ
𝑙=1
𝐿−1
𝑃 𝑦𝑙+1|𝑦𝑙 𝑃 𝑒𝑛𝑑|𝑦𝐿 ෑ
𝑙=1
𝐿
𝑃 𝑥𝑙|𝑦𝑙
𝑃 𝑥𝑙|𝑦𝑙 =𝑃 𝑥𝑙 , 𝑦𝑙𝑃 𝑦𝑙
=𝑃 𝑦𝑙|𝑥𝑙 𝑃 𝑥𝑙
𝑃 𝑦𝑙
…… ……
P(a|xl)
RNN
P(b|xl) P(c|xl) ……
……
…… ……
xl
RNN
Count
![Page 84: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/84.jpg)
…… ……
Integrated together
• Semantic Tagging: Bi-directional LSTM + CRF/Structured SVM
RNN…… ……
…… ……
𝑤 ∙ 𝜙 𝑥, 𝑦
RNN…… ……
𝑦 = argmax𝑦∈𝕐
𝑤∙𝜙(𝑥,𝑦)
Testing:
![Page 85: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/85.jpg)
Is structured learning practical?
• Considering GAN
Discri-minator
Generator xNoiseFrom
Gaussian
1/0
Generated
Real x
Evaluation Function
F(x)
Problem 1
xFxx
maxargInference:
Problem 2
Problem 3: you know how to learn F(x)
![Page 86: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/86.jpg)
Is structured learning practical?
• Conditional GAN
Discri-minator
Generator yx 1/0
Real (x,y) pair
x
![Page 87: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/87.jpg)
Sounds crazy? People do think in this way …• Connect Energy-based model with GAN:
• A Connection Between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
• Deep Directed Generative Models with Energy-Based Probability Estimation
• ENERGY-BASED GENERATIVE ADVERSARIAL NETWORKS
• Deep learning model for inference
• Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures
• Conditional Random Fields as Recurrent Neural Networks
Deep and Structured will be the future.
![Page 88: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/88.jpg)
Machine learning and having it deep and structured (MLDS)• 和ML的不同
• 在這學期ML 中有提過的內容 (DNN, CNN …),在MLDS 中不再重複,只做必要的復習
• 教科書: “Deep Learning” (http://www.deeplearningbook.org/)
• Part II 是講 deep learning 、Part III 就是講 structured learning
![Page 89: Recurrent Neural Network (RNN) - NTU Speech …speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN...Recurrent Neural Network (RNN) x 1 x 2 y 1 y 2 a 1 a 2 Memory can be considered](https://reader031.vdocument.in/reader031/viewer/2022020108/5b1e53727f8b9a8a3a8b63c4/html5/thumbnails/89.jpg)
Machine learning and having it deep and structured (MLDS)• 所有作業都 2 ~ 4人一組,可以先組好隊後一起來修
• MLDS 的作業和之前不同
• RNN (把之前MLDS 的三個作業合為一個)、Attention-based model 、Deep Reinforcement Learning 、Deep Generative Model 、Sequence-to-sequence learning
• MLDS 初選不開放加簽,以組為單位加簽,作業0的內容是做一個 DNN (可用現成套件)