LSTM: A Search Space Odyssey

Authors: Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber

Page 1: LSTM: A Search Space Odyssey

Authors: Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber

Page 2: LSTM: A Search Space Odyssey

Outline

• Introduction

• Long Short-Term Memory (LSTM) with peephole connections

• Experiments and discussion

• Conclusion

Page 3: LSTM: A Search Space Odyssey

Definitions:

• Recurrent Neural Networks

• Importance and applications

• The gradient problem:

  • Vanishing gradients

  • Exploding gradients

• What is the LSTM?

Introduction LSTM with peephole connections Results and discussion Conclusion

Page 4: LSTM: A Search Space Odyssey

LSTM History:

• LSTM was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber.

• In 1999, Felix Gers, Jürgen Schmidhuber, and Fred Cummins introduced the forget gate into the LSTM architecture.

• In 2000, Gers, Schmidhuber, and Cummins added peephole connections.

• In 2014, Kyunghyun Cho et al. put forward a simplified variant called the Gated Recurrent Unit (GRU).

Page 5: LSTM: A Search Space Odyssey

Simple RNN

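The figure for this slide is not preserved in the transcript; the standard simple-RNN update it presumably depicts, written in the same notation as the LSTM slides that follow, is:

```latex
y^t = \tanh\big(W\, x^t + R\, y^{t-1} + b\big)
```

Because the same weights are applied at every time step, repeated multiplication by R during backpropagation is what causes the vanishing and exploding gradients mentioned on the previous slide.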

Page 6: LSTM: A Search Space Odyssey

Block diagram

• Three gates:

  • Input gate

  • Forget gate

  • Output gate

• Two blocks:

  • Block input

  • Block output

• One cell state

Page 7: LSTM: A Search Space Odyssey

Block Diagram

Block input:

W_z: input weights (R^(N×M))

R_z: recurrent weights (R^(N×N))

b_z: bias weights (R^N)

x_t: input vector at time t

y_(t-1): output at time t-1

(Figure: LSTM block diagram with the block-input path z highlighted.)
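Written out, as in the paper, the block input at time t is:

```latex
z^t = g\big(W_z\, x^t + R_z\, y^{t-1} + b_z\big), \qquad g(x) = \tanh(x)
```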

Page 8: LSTM: A Search Space Odyssey

Block Diagram

Input gate:

W_i: input weights (R^(N×M))

R_i: recurrent weights (R^(N×N))

b_i: bias weights (R^N)

p_i: peephole weights (R^N)

c_(t-1): cell state at time t-1

x_t: input vector at time t

y_(t-1): output at time t-1

(Figure: LSTM block diagram with the input-gate path i highlighted, fed by the peephole from c_(t-1).)
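Written out, as in the paper, the input gate at time t is:

```latex
i^t = \sigma\big(W_i\, x^t + R_i\, y^{t-1} + p_i \odot c^{t-1} + b_i\big)
```

where σ is the logistic sigmoid and ⊙ denotes pointwise multiplication.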

Page 9: LSTM: A Search Space Odyssey

Block Diagram

Forget gate:

W_f: input weights (R^(N×M))

R_f: recurrent weights (R^(N×N))

b_f: bias weights (R^N)

p_f: peephole weights (R^N)

c_(t-1): cell state at time t-1

x_t: input vector at time t

y_(t-1): output at time t-1

(Figure: LSTM block diagram with the forget-gate path f highlighted, fed by the peephole from c_(t-1).)
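Written out, as in the paper, the forget gate at time t is:

```latex
f^t = \sigma\big(W_f\, x^t + R_f\, y^{t-1} + p_f \odot c^{t-1} + b_f\big)
```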

Page 10: LSTM: A Search Space Odyssey

Block Diagram

Output gate:

W_o: input weights (R^(N×M))

R_o: recurrent weights (R^(N×N))

b_o: bias weights (R^N)

p_o: peephole weights (R^N)

c_t: cell state at time t (unlike the other gates, the output gate peeks at the current cell state)

x_t: input vector at time t

y_(t-1): output at time t-1

(Figure: LSTM block diagram with the output-gate path o highlighted, fed by the peephole from c_t.)
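Written out, as in the paper, the output gate at time t is:

```latex
o^t = \sigma\big(W_o\, x^t + R_o\, y^{t-1} + p_o \odot c^t + b_o\big)
```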

Page 11: LSTM: A Search Space Odyssey

Block Diagram

Cell state:

z_t: block input at time t

i_t: input-gate activation at time t

f_t: forget-gate activation at time t

c_(t-1): cell state at time t-1

(Figure: LSTM block diagram with the cell-state update highlighted, combining i_t ⊙ z_t with f_t ⊙ c_(t-1).)
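Written out, as in the paper, the cell-state update at time t is:

```latex
c^t = z^t \odot i^t + c^{t-1} \odot f^t
```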

Page 12: LSTM: A Search Space Odyssey

Block Diagram

Block output:

o_t: output-gate activation at time t

c_t: cell state at time t

(Figure: LSTM block diagram with the block-output path y highlighted.)
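Written out, as in the paper, the block output at time t is:

```latex
y^t = h(c^t) \odot o^t, \qquad h(x) = \tanh(x)
```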

Page 13: LSTM: A Search Space Odyssey

LSTM Variants

• NIG: No Input Gate (i_t = 1)

• NFG: No Forget Gate (f_t = 1)

• NOG: No Output Gate (o_t = 1)

• NIAF: No Input Activation Function (g(x) = x)

• NOAF: No Output Activation Function (h(x) = x)

• CIFG: Coupled Input and Forget Gate (f_t = 1 - i_t)

• NP: No Peepholes

• FGR: Full Gate Recurrence
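As an illustration, here is a minimal NumPy sketch of one vanilla peephole-LSTM step, with an optional flag for the CIFG variant. The function name `lstm_step`, the dict-based weight layout, and the `cifg` flag are this sketch's own conventions, not from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, y_prev, c_prev, W, R, p, b, cifg=False):
    """One step of a peephole LSTM block.

    W, R, p, b are dicts keyed by 'z', 'i', 'f', 'o' holding the input,
    recurrent, peephole, and bias weights. With cifg=True the forget gate
    is coupled to the input gate (f = 1 - i), the CIFG variant.
    """
    # Block input
    z = np.tanh(W['z'] @ x + R['z'] @ y_prev + b['z'])
    # Input gate (peeks at the previous cell state)
    i = sigmoid(W['i'] @ x + R['i'] @ y_prev + p['i'] * c_prev + b['i'])
    if cifg:
        f = 1.0 - i  # coupled input and forget gate
    else:
        f = sigmoid(W['f'] @ x + R['f'] @ y_prev + p['f'] * c_prev + b['f'])
    # Cell-state update
    c = i * z + f * c_prev
    # Output gate (peeks at the *current* cell state)
    o = sigmoid(W['o'] @ x + R['o'] @ y_prev + p['o'] * c + b['o'])
    # Block output
    y = o * np.tanh(c)
    return y, c
```

Note how CIFG removes the forget gate's W_f, R_f, p_f, and b_f entirely, which is why it shrinks the parameter count without changing the structure of the update.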

Page 14: LSTM: A Search Space Odyssey

Experiment setup

Datasets:

• TIMIT speech corpus

• IAM Online Handwriting Database

• JSB Chorales

Page 15: LSTM: A Search Space Odyssey

Experiment setup

Features:

• TIMIT speech corpus: 12 MFCCs and energy, plus their first and second derivatives.

• IAM Online Handwriting Database: pen coordinates x and y, the timestamp t, and the time of pen lifting.

• JSB Chorales: each MIDI sequence transposed to C major or C minor and frames sampled at every quarter note.

Page 16: LSTM: A Search Space Odyssey

Experiment setup

Network Architectures and training:

Dataset      | Network type       | Hidden layers | Output layer | Loss function       | Training
TIMIT        | Bidirectional LSTM | Two           | Softmax      | Cross-entropy error | SGD
IAM Online   | Bidirectional LSTM | Two           | Softmax      | CTC loss            | SGD
JSB Chorales | LSTM               | One           | Sigmoid      | Cross-entropy error | SGD

Page 17: LSTM: A Search Space Odyssey

Comparison of the Variants

โ€ข Test set performance for all 200 trials:


Page 18: LSTM: A Search Space Odyssey

Comparison of the Variants

โ€ข Test set performance for the best 10% trials:


Page 19: LSTM: A Search Space Odyssey

Impact of Hyperparameters


Page 20: LSTM: A Search Space Odyssey

Interaction of Hyperparameters


Page 21: LSTM: A Search Space Odyssey

Total marginal predicted performance

TIMIT:


Page 22: LSTM: A Search Space Odyssey

Total marginal predicted performance

IAM Online:


Page 23: LSTM: A Search Space Odyssey

Total marginal predicted performance

JSB Chorales:

Page 24: LSTM: A Search Space Odyssey

Conclusion

• The most commonly used LSTM architecture performs reasonably well on a variety of datasets.

• Coupling the input and forget gates (CIFG) or removing peephole connections (NP) simplified the LSTM in these experiments without significantly decreasing performance.

• The forget gate and the output activation function are the most critical components of the LSTM block.

• The learning rate is the most crucial hyperparameter, followed by the network size.

• Hyperparameters are virtually independent of one another.

Page 25: LSTM: A Search Space Odyssey

References:

• K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct. 2017.

• https://www.youtube.com/watch?v=lycKqccytfU

• https://www.youtube.com/watch?v=lWkFhVq9-nc

• https://en.wikipedia.org/wiki/Long_short-term_memory