TRANSCRIPT
LSTM: A Search Space Odyssey
Authors: Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
Outline
• Introduction
• Long Short-Term Memory (LSTM) with peephole connections
• Experiment and discussion
• Conclusion
Definition:
• Recurrent Neural Networks
• Importance and applications
• Gradient problems:
  • Vanishing gradient
  • Exploding gradient
• What is the LSTM?
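The gradient problem above can be seen in a one-line toy calculation: backpropagating through T time steps of a recurrence repeatedly multiplies the gradient by the recurrent weight, so it shrinks or blows up exponentially. A schematic sketch with a single scalar weight (not the paper's setup):

```python
def gradient_norm_after(w, T):
    """Magnitude of a unit gradient after backpropagating through T steps
    of a scalar linear recurrence: the gradient scales like w**T."""
    g = 1.0
    for _ in range(T):
        g *= w
    return abs(g)

print(gradient_norm_after(0.9, 100))  # vanishing: ~2.7e-5
print(gradient_norm_after(1.1, 100))  # exploding: ~1.4e+4
```

The LSTM's gated cell state is designed precisely to avoid this repeated shrinking or growing of the error signal.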
Introduction LSTM with peephole connections Results and discussion Conclusion
LSTM History:
• LSTM was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber.
• In 1999, Felix Gers, Jürgen Schmidhuber, and Fred Cummins introduced the forget gate into the LSTM architecture.
• In 2000, Gers, Schmidhuber, and Cummins added peephole connections.
• In 2014, Kyunghyun Cho et al. put forward a simplified variant called the Gated Recurrent Unit (GRU).
Simple RNN
Block Diagram
• Three gates:
  • Input gate
  • Forget gate
  • Output gate
• Two blocks:
  • Block input
  • Block output
• One cell state
Block Diagram
Block input:
• W_z: input weight (ℝ^(N×M))
• R_z: recurrent weight (ℝ^(N×N))
• b_z: bias weight
• x^t: input vector at time t
• y^(t-1): output at time t-1
[Figure: input and recurrent connections feeding the block input z]
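These symbols define the block input; in the paper's notation (with g = tanh as the input activation function):

```latex
z^t = g\bigl(W_z\, x^t + R_z\, y^{t-1} + b_z\bigr)
```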
Block Diagram
Input gate:๐๐๐๐: input weight ( ๐ ๐ ๐๐ ร๐๐)
๐ ๐ ๐๐: recurrent weight ( ๐ ๐ ๐๐ ร๐๐)
๐๐๐๐: bias weight (๐ ๐ ๐๐ )
๐๐๐๐: peephole weight (๐ ๐ ๐๐ )
๐๐๐ก๐กโ1: cell state at time t-1
๐ฅ๐ฅ๐ก๐ก: input vector at time t
๐ฆ๐ฆ๐ก๐กโ1: output at time t-1
Input
Recurrent
i
๐๐๐ก๐กโ1
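With these symbols, the input gate is computed as follows, where σ is the logistic sigmoid and ⊙ denotes pointwise multiplication (the peephole connection):

```latex
i^t = \sigma\bigl(W_i\, x^t + R_i\, y^{t-1} + p_i \odot c^{t-1} + b_i\bigr)
```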
Block Diagram
Forget gate:
• W_f: input weight (ℝ^(N×M))
• R_f: recurrent weight (ℝ^(N×N))
• b_f: bias weight (ℝ^N)
• p_f: peephole weight (ℝ^N)
• c^(t-1): cell state at time t-1
• x^t: input vector at time t
• y^(t-1): output at time t-1
[Figure: input, recurrent, and peephole (c^(t-1)) connections feeding the forget gate f]
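The forget gate has exactly the same form as the input gate, with its own weights:

```latex
f^t = \sigma\bigl(W_f\, x^t + R_f\, y^{t-1} + p_f \odot c^{t-1} + b_f\bigr)
```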
Block Diagram
Output gate:
• W_o: input weight (ℝ^(N×M))
• R_o: recurrent weight (ℝ^(N×N))
• b_o: bias weight (ℝ^N)
• p_o: peephole weight (ℝ^N)
• c^(t-1): cell state at time t-1
• x^t: input vector at time t
• y^(t-1): output at time t-1
[Figure: input, recurrent, and peephole connections feeding the output gate o; the peephole reads the current cell state c^t]
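Note that, unlike the input and forget gates, the output gate's peephole connection reads the current cell state c^t rather than c^(t-1):

```latex
o^t = \sigma\bigl(W_o\, x^t + R_o\, y^{t-1} + p_o \odot c^t + b_o\bigr)
```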
Block Diagram
Cell state:
• z^t: output of the block input at time t
• i^t: output of the input gate at time t
• c^(t-1): cell state at time t-1
• f^t: output of the forget gate at time t
[Figure: c^t computed from z^t, i^t, c^(t-1), and f^t]
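The cell state combines the gated new content with the gated old state; this additive update is what lets gradients flow over long time lags:

```latex
c^t = z^t \odot i^t + c^{t-1} \odot f^t
```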
Block Diagram
Block output:
• o^t: output of the output gate at time t
• c^t: cell state at time t
[Figure: block output y^t produced from c^t and the output gate]
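The block output is y^t = h(c^t) ⊙ o^t with h = tanh. Putting all of the above together, a minimal NumPy sketch of one forward step (the function name and dictionary layout are my own; the equations and shapes follow the definitions above):

```python
import numpy as np

def lstm_step(x, y_prev, c_prev, W, R, b, p):
    """One forward step of an LSTM block with peephole connections.

    W (input), R (recurrent), and b (bias) are dicts keyed by 'z', 'i',
    'f', 'o'; p holds the peephole weights keyed by 'i', 'f', 'o'.
    With N units and input size M: W[k] is (N, M), R[k] is (N, N),
    b[k] and p[k] are (N,).
    """
    sigma = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = np.tanh(W['z'] @ x + R['z'] @ y_prev + b['z'])                  # block input
    i = sigma(W['i'] @ x + R['i'] @ y_prev + p['i'] * c_prev + b['i'])  # input gate
    f = sigma(W['f'] @ x + R['f'] @ y_prev + p['f'] * c_prev + b['f'])  # forget gate
    c = z * i + c_prev * f                                              # cell state
    o = sigma(W['o'] @ x + R['o'] @ y_prev + p['o'] * c + b['o'])       # output gate
    y = np.tanh(c) * o                                                  # block output
    return y, c

# Tiny usage example with random weights (N = 4 units, M = 3 inputs).
rng = np.random.default_rng(0)
N, M = 4, 3
W = {k: rng.normal(size=(N, M)) for k in 'zifo'}
R = {k: rng.normal(size=(N, N)) for k in 'zifo'}
b = {k: np.zeros(N) for k in 'zifo'}
p = {k: np.zeros(N) for k in 'ifo'}
y, c = lstm_step(rng.normal(size=M), np.zeros(N), np.zeros(N), W, R, b, p)
```

Since y = tanh(c) ⊙ o and both factors are bounded, every component of the block output stays in (-1, 1).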
LSTM Variants
• NIG: No Input Gate: i^t = 1
• NFG: No Forget Gate: f^t = 1
• NOG: No Output Gate: o^t = 1
• NIAF: No Input Activation Function: g(x) = x
• NOAF: No Output Activation Function: h(x) = x
• CIFG: Coupled Input and Forget Gate: f^t = 1 - i^t
• NP: No Peepholes
• FGR: Full Gate Recurrence
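As an illustration of the CIFG variant: the forget gate is derived from the input gate instead of having its own weights, making the cell update a convex combination of old state and new content. A minimal sketch with made-up activation values:

```python
import numpy as np

sigma = lambda a: 1.0 / (1.0 + np.exp(-a))

i = sigma(np.array([0.5, -1.0, 2.0]))    # input gate (made-up pre-activations)
f = 1.0 - i                              # CIFG: coupled forget gate, no separate W_f, R_f, b_f
z = np.tanh(np.array([0.2, 0.4, -0.3]))  # block input
c_prev = np.ones(3)                      # previous cell state
c = z * i + c_prev * f                   # same cell-state update as the standard LSTM
```

This removes one full set of gate parameters, which is one reason the paper finds CIFG an attractive simplification.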
Experiment setup
Datasets:
• TIMIT speech corpus
• IAM Online Handwriting Database
• JSB Chorales
Experiment setup
Features:
• TIMIT speech corpus: extract 12 MFCCs + energy, as well as their first and second derivatives
• IAM Online Handwriting Database: x, y, t, and the time of pen lifting
• JSB Chorales: transpose each MIDI sequence to C major or C minor and sample frames every quarter note
Experiment setup
Network Architectures and Training:

Dataset      | Network Type       | Hidden Layers | Output Layer | Loss Function       | Training
TIMIT        | Bidirectional LSTM | Two           | Softmax      | Cross-Entropy Error | SGD
IAM Online   | Bidirectional LSTM | Two           | Softmax      | CTC Loss            | SGD
JSB Chorales | LSTM               | One           | Sigmoid      | Cross-Entropy Error | SGD
Comparison of the Variants
• Test set performance for all 200 trials:
Comparison of the Variants
• Test set performance for the best 10% of trials:
Impact of Hyperparameters
Interaction of Hyperparameters
Total marginal predicted performance
TIMIT:
Total marginal predicted performance
IAM Online:
Total marginal predicted performance
JSB Chorales:
Conclusion
• The most commonly used LSTM architecture performs reasonably well on various datasets.
• Coupling the input and forget gates (CIFG) or removing peephole connections (NP) simplified the LSTM in these experiments without significantly decreasing performance.
• The forget gate and the output activation function are the most critical components of the LSTM block.
• The learning rate is the most crucial hyperparameter, followed by the network size.
• Hyperparameters are virtually independent of one another.
References:
• K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct. 2017.
• https://www.youtube.com/watch?v=lycKqccytfU
• https://www.youtube.com/watch?v=lWkFhVq9-nc
• https://en.wikipedia.org/wiki/Long_short-term_memory