Sequence to sequence (encoder-decoder) learning
TRANSCRIPT
Seq2seq...and beyond
Hello! I am Roberto Silveira
EE engineer, ML enthusiast
@rsilveira79
Sequence is a matter of time
RNN is what you need!
Basic Recurrent cells (RNN)
Source: http://colah.github.io/
Issues
× Difficulty dealing with long-term dependencies
× Difficult to train - vanishing gradient problem
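For reference, a minimal NumPy sketch of the vanilla RNN recurrence behind these issues (dimensions and weight names are illustrative assumptions, not from the slides): the same recurrent weight matrix is applied at every time step, which is why gradients flowing back through many steps tend to vanish or explode.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)."""
    h = np.zeros(W_hh.shape[0])            # initial hidden state
    states = []
    for x_t in x_seq:                      # one step per input vector
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return states

# Toy example: 5 steps of 3-d inputs, 4-d hidden state.
rng = np.random.default_rng(0)
x_seq = [rng.standard_normal(3) for _ in range(5)]
states = rnn_forward(x_seq,
                     W_xh=0.1 * rng.standard_normal((4, 3)),
                     W_hh=0.1 * rng.standard_normal((4, 4)),
                     b_h=np.zeros(4))
print(states[-1])
```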
Long term issues
Source: http://colah.github.io/, CS224d notes
Sentence 1"Jane walked into the room. John walked in too. Jane said hi to ___"
Sentence 2"Jane walked into the room. John walked in too. It was late in the day, and everyone was walking home after a long day at work. Jane said hi to ___"
LSTM in 2 min...
Review
× Addresses long-term dependencies
× More complex to train
× Very powerful with lots of data
Source: http://colah.github.io/
Key pieces of the LSTM cell: the cell state, plus the forget, input, and output gates.
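As a concrete reading of those four pieces, here is a single-step LSTM sketch in NumPy (weight layout, names, and dimensions are illustrative assumptions; the gate equations follow the standard formulation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the parameters of the forget (f),
    input (i), output (o) gates and the candidate cell values (g)."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate values
    c_t = f * c_prev + i * g          # new cell state (long-term memory)
    h_t = o * np.tanh(c_t)            # new hidden state (output)
    return h_t, c_t

# Toy dimensions: 3-d input, 4-d hidden/cell state.
rng = np.random.default_rng(1)
W = {k: 0.1 * rng.standard_normal((4, 3)) for k in "fiog"}
U = {k: 0.1 * rng.standard_normal((4, 4)) for k in "fiog"}
b = {k: np.zeros(4) for k in "fiog"}
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, U, b)
print(h, c)
```

The cell state is updated largely additively (f * c_prev + i * g), which is what helps gradients survive over long sequences.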
Gated recurrent unit (GRU) in 2 min ...
Review
× Fewer parameters
× Trains faster
× Better solution with less data
Source: http://www.wildml.com/, arXiv:1412.3555
Key pieces of the GRU cell: the reset gate and the update gate.
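And the matching single-step GRU sketch (same illustrative NumPy conventions as the LSTM sketch above; note there is no separate cell state, and the sign convention of the update gate varies between papers):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step with reset (r) and update (z) gates and candidate state (g)."""
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])        # reset gate
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])        # update gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ (r * h_prev) + b["g"])  # candidate state
    return (1.0 - z) * h_prev + z * g   # interpolate previous and candidate state
```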
Seq2seq learning
Or encoder-decoder architectures
Basic idea: "variable"-size input (encoder) → fixed-size vector representation → "variable"-size output (decoder)
Example: ["Machine", "Learning", "is", "fun"] → encoded sequence (a fixed-size vector, e.g. [0.636, 0.122, 0.981]) → ["Aprendizado", "de", "Máquina", "é", "divertido"]
The first RNN (encoder) is a stateful model that reads the input one word at a time; its memory of previous words influences the next result. The second RNN (decoder) is likewise a stateful model that emits the output one word at a time from the encoded sequence, again with memory of previous words influencing the next result.
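A minimal sketch of this encoder-decoder wiring using tf.keras (vocabulary sizes, dimensions, and layer names are illustrative assumptions; data preparation and the step-by-step inference loop are omitted):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab, emb_dim, hidden = 8000, 8000, 128, 256  # toy sizes

# Encoder: embed source tokens, keep only the final LSTM states.
enc_inputs = layers.Input(shape=(None,), dtype="int32")
enc_emb = layers.Embedding(src_vocab, emb_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(hidden, return_state=True)(enc_emb)

# Decoder: embed target tokens (shifted right), start from the encoder states.
dec_inputs = layers.Input(shape=(None,), dtype="int32")
dec_emb = layers.Embedding(tgt_vocab, emb_dim)(dec_inputs)
dec_out, _, _ = layers.LSTM(hidden, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
logits = layers.Dense(tgt_vocab)(dec_out)   # one prediction per target position

model = Model([enc_inputs, dec_inputs], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()
```

During training the decoder receives the ground-truth previous target word (teacher forcing); at inference time it is unrolled one step at a time, feeding its own prediction back as the next input.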
Sequence to Sequence Learning with Neural Networks (2014)
"Machine","Learning",
"is","fun"
"Aprendizado","de",
"Máquina","é",
"divertido"
0.6360.1220.981
1000d word embeddings
4 layers1000
cells/layer
Encoded Sequence
LSTM(Encoder)
LSTM(Decoder)
Source: arXiv 1409.3215v3
TRAINING → SGD w/o momentum, fixed learning rate of 0.7, 7.5 epochs, batches of 128 sentences, 10 days of training (WMT 14 dataset English to French)
4 layers1000
cells/layer
Recurrent encoder-decoders
Figure: the encoder reads the source sequence "Les chiens aiment les os <EOS>"; the decoder then generates the target sequence "Dogs love bones <EOS>", with each emitted word fed back in as the next decoder input.
Source: arXiv 1409.3215v3
Recurrent encoder-decoders - issues
● Difficult to cope with long sentences (longer than those seen in the training corpus)
● A decoder with an attention mechanism → relieves the encoder from squashing everything into a fixed-length vector
Source: arXiv 1409.3215v3
NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE (2015)
Source: arXiv 1409.0473v7
Decoder: a separate context vector is computed for each target word from the weights assigned to each annotation h_j.
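Reconstructing the formulas these figure labels refer to (standard notation from arXiv 1409.0473, not text taken from the slide): the context vector c_i for target word i is a weighted sum of the encoder annotations h_j, with weights alpha_ij given by a softmax over alignment scores e_ij computed from the previous decoder state s_{i-1}:

```latex
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j, \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
e_{ij} = a(s_{i-1}, h_j)
```

where a is a small feed-forward alignment network trained jointly with the rest of the model.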
The resulting alignment between source and target words can be non-monotonic.
Attention models for NLP
Source: arXiv 1409.0473v7
Figure (decoding step by step): at each output step the decoder forms a weighted sum (+) over the annotations of the source words "Les chiens aiment les os <EOS>" and emits the next target word, producing "Dogs", "love", "bones", and finally <EOS> one word at a time while re-attending to the source at every step.
Challenges in using the model
● Cannot handle truly variable-size input
● Hard to deal with both short and long sentences
● Need to capture contextual and semantic meaning
Techniques: PADDING, BUCKETING, WORD EMBEDDINGS
Source: http://suriyadeepan.github.io/
Padding
Source: http://suriyadeepan.github.io/
EOS : end of sentence
PAD : filler
GO : start decoding
UNK : unknown; word not in vocabulary
Q : "What time is it?"  A : "It is seven thirty."
Q : [ PAD, PAD, PAD, PAD, PAD, "?", "it", "is", "time", "What" ]
A : [ GO, "It", "is", "seven", "thirty", ".", EOS, PAD, PAD, PAD ]
Note that the question is reversed and left-padded, while the answer is right-padded to the same fixed length.
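A small plain-Python sketch of this preprocessing (the special tokens mirror the slide; reversing the encoder input follows the example above, and pad_pair is a made-up helper name):

```python
PAD, GO, EOS, UNK = "PAD", "GO", "EOS", "UNK"

def pad_pair(question_tokens, answer_tokens, enc_len=10, dec_len=10):
    """Reverse and left-pad the encoder input; add GO/EOS and right-pad the decoder side."""
    q = list(reversed(question_tokens))
    q = [PAD] * (enc_len - len(q)) + q          # left-pad to enc_len
    a = [GO] + answer_tokens + [EOS]
    a = a + [PAD] * (dec_len - len(a))          # right-pad to dec_len
    return q, a

q, a = pad_pair(["What", "time", "is", "it", "?"],
                ["It", "is", "seven", "thirty", "."])
print(q)   # ['PAD', 'PAD', 'PAD', 'PAD', 'PAD', '?', 'it', 'is', 'time', 'What']
print(a)   # ['GO', 'It', 'is', 'seven', 'thirty', '.', 'EOS', 'PAD', 'PAD', 'PAD']
```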
Source: https://www.tensorflow.org/
Bucketing
Efficiently handle sentences of different lengths
Example: the largest sentence in the corpus has 100 tokens
What about short sentences like "How are you?" → lots of PAD tokens
Bucket list: [(5, 10), (10, 15), (20, 25), (40, 50)] (default in TensorFlow's translate.py)
Q : [ PAD, PAD, ".", "go", "I" ]
A : [ GO, "Je", "vais", ".", EOS, PAD, PAD, PAD, PAD, PAD ]
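A sketch of the bucketing step, reusing the pad_pair helper from the padding sketch above (the bucket list is the one quoted from translate.py; bucket_and_pad is an illustrative name, not a TensorFlow API):

```python
BUCKETS = [(5, 10), (10, 15), (20, 25), (40, 50)]

def bucket_and_pad(question_tokens, answer_tokens, buckets=BUCKETS):
    """Pick the smallest bucket the pair fits into, then pad to that bucket's sizes."""
    for enc_len, dec_len in buckets:
        # +2 on the decoder side leaves room for the GO and EOS tokens.
        if len(question_tokens) <= enc_len and len(answer_tokens) + 2 <= dec_len:
            return (enc_len, dec_len), pad_pair(question_tokens, answer_tokens,
                                                enc_len=enc_len, dec_len=dec_len)
    raise ValueError("pair is longer than the largest bucket")

bucket, (q, a) = bucket_and_pad(["I", "go", "."], ["Je", "vais", "."])
print(bucket)   # (5, 10): only a handful of PADs instead of padding to 100 tokens
```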
Word embeddings (remember the previous presentation ;-)
Distributed representations → syntactic and semantic regularities are captured
"Take" = [0.286, 0.792, -0.177, -0.107, 0.109, -0.542, 0.349, 0.271]
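Under the hood an embedding is just a row lookup in a learned matrix; a toy NumPy sketch (the vocabulary and random values are made up for illustration; in a real model the matrix is learned, e.g. by an Embedding layer):

```python
import numpy as np

vocab = {"machine": 0, "learning": 1, "take": 2}                # toy vocabulary
E = np.random.default_rng(2).standard_normal((len(vocab), 8))   # learned in practice

def embed(word):
    return E[vocab[word]]    # one dense vector per word

print(embed("take"))         # an 8-dimensional distributed representation
```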
Word embeddings (remember the previous presentation ;-)
Linguistic regularities (recap)
Phrase representations (Paper - Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation)
Source: arXiv 1406.1078v3
Phrases are mapped to a 1000-dimensional vector representation.
Applications
Neural conversational model - chatbots
Source: arXiv 1506.05869v3
Google Smart reply
Source: arXiv 1606.04870v1
Interesting facts
● Currently responsible for 10% of Inbox replies
● Training set of 238 million messages
System components: the Seq2Seq response model, a feedforward triggering model, and semi-supervised semantic clustering.
Image captioning (Paper - Show and Tell: A Neural Image Caption Generator)
Source: arXiv 1411.4555v2
In the figure, the encoder maps the image to a fixed-size representation and the decoder generates the caption one word at a time.
What's next?
And so?
Multi-task sequence to sequence (Paper - MULTI-TASK SEQUENCE TO SEQUENCE LEARNING)
Source: arXiv 1511.06114v4
One-to-Many (common encoder)
Many-to-One (common decoder)
Many-to-Many
Neural programmer (Paper - NEURAL PROGRAMMER: INDUCING LATENT PROGRAMS WITH GRADIENT DESCENT)
Source: arXiv 1511.04834v3
Unsupervised pre-training for seq2seq - 2017 (Paper - UNSUPERVISED PRETRAINING FOR SEQUENCE TO SEQUENCE LEARNING)
Source: arXiv 1611.02683v1
In the figure, both the encoder and the decoder are initialized from pre-trained weights.
A quick example in TensorFlow