![Page 1: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/1.jpg)
Sequence to Sequence Modelsfor Machine TranslationCMSC 723 / LING 723 / INST 725
Marine Carpuat
Slides & figure credits: Graham Neubig
![Page 2: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/2.jpg)
Machine Translation
• Translation system• Input: source sentence F
• Output: target sentence E
• Can be viewed as a function
• Statistical machine translation systems
• 3 problems
• Modeling• how to define P(.)?
• Training/Learning• how to estimate parameters from
parallel corpora?
• Search• How to solve argmax efficiently?
![Page 3: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/3.jpg)
Introduction to Neural Machine Translation
• Neural language models review
• Sequence to sequence models for MT• Encoder-Decoder
• Sampling and search (greedy vs beam search)
• Practical tricks
• Sequence to sequence models for other NLP tasks
![Page 4: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/4.jpg)
A feedforward neural 3-gram model
![Page 5: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/5.jpg)
A recurrent language model
![Page 6: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/6.jpg)
A recurrent language model
![Page 7: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/7.jpg)
Examples of RNN variants
• LSTMs• Aim to address vanishing/exploding gradient issue
• Stacked RNNs
• …
![Page 8: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/8.jpg)
Training in practice: online
![Page 9: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/9.jpg)
Training in practice: batch
![Page 10: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/10.jpg)
Training in practice: minibatch
• Compromise between online and batch
• Computational advantages• Can leverage vector processing instructions in modern hardware
• By processing multiple examples simultaneously
![Page 11: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/11.jpg)
Problem with minibatches: in language modeling, examples don’t have the same length
• 3 tricks• Padding
• Add </s> symbol to make all sentences same length
• Masking• Multiply loss function calculated
over padded symbols by zero
• + sort sentences by length
![Page 12: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/12.jpg)
Introduction to Neural Machine Translation
• Neural language models review
• Sequence to sequence models for MT• Encoder-Decoder
• Sampling and search (greedy vs beam search)
• Training tricks
• Sequence to sequence models for other NLP tasks
![Page 13: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/13.jpg)
Encoder-decoder model
![Page 14: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/14.jpg)
Encoder-decoder model
![Page 15: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/15.jpg)
Generating Output
• We have a model P(E|F), how can we generate translations?
• 2 methods
• Sampling: generate a random sentence according to probability distribution
• Argmax: generate sentence with highest probability
![Page 16: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/16.jpg)
Ancestral Sampling
• Randomly generate words one by one
• Until end of sentence symbol
• Done!
![Page 17: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/17.jpg)
Greedy search
• One by one, pick single highest probability word
• Problems• Often generates easy words first
• Often prefers multiple common words to rare words
![Page 18: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/18.jpg)
Greedy SearchExample
![Page 19: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/19.jpg)
Beam SearchExample with beam size b = 2
We consider b top hypotheses at each time step
![Page 20: Sequence to Sequence Models for Machine Translation · •Training/Learning •how to estimate parameters from parallel corpora? •Search ... •Neural language models review •Sequence](https://reader034.vdocument.in/reader034/viewer/2022050410/5f86e2934e4f3038eb4c319d/html5/thumbnails/20.jpg)
Introduction to Neural Machine Translation
• Neural language models review
• Sequence to sequence models for MT• Encoder-Decoder
• Sampling and search (greedy vs beam search)
• Practical tricks
• Sequence to sequence models for other NLP tasks