![Page 1: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/1.jpg)
Addressing the Rare Word Problem in Neural
Machine Translation
Minh-Thang Luong (Stanford University), Ilya Sutskever (Google)
Quoc V. Le (Google), Oriol Vinyals (Google)
Wojciech Zaremba (New York University)
![Page 2: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/2.jpg)
Abstract
• Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results, comparable to those of traditional approaches
• A significant weakness in conventional NMT systems is their inability to correctly translate very rare words
![Page 3: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/3.jpg)
Neural Machine Translation
• A neural machine translation system is any neural network that maps a source sentence, $s_1, \ldots, s_n$, to a target sentence, $t_1, \ldots, t_m$.
• More concretely, an NMT system uses a neural network to parameterize the conditional distributions $p(t_j \mid t_{<j}, s_{\le n})$ for 1 ≤ j ≤ m
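Each conditional gives the probability of one target word given the preceding target words and the whole source sentence. By the chain rule, these conditionals multiply to the probability of the full target sentence, so the log probability that training maximizes can be written as a sum. The symbols $t_{<j}$ (target words before position $j$) and $s_{\le n}$ (the full source sentence) are notation assumed here for illustration:

$$
\log p(t \mid s) = \sum_{j=1}^{m} \log p\left(t_j \mid t_{<j}, s_{\le n}\right)
$$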
![Page 4: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/4.jpg)
Neural Machine Translation
• They use a deep LSTM to encode the input sequence and a separate deep LSTM to output the translation.
• The encoder reads the source sentence, one word at a time, and produces a large vector that represents the entire source sentence.
• The decoder is initialized with this vector and generates a translation, one word at a time, until it emits the end-of-sentence symbol <eos>.
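The encode-then-decode control flow described above can be sketched as a short loop. This is a minimal illustration, not the authors' system: `encode` and `step` are hypothetical stand-ins for the two deep LSTMs, the `<s>` start symbol and `max_len` cap are assumptions, and greedy selection is used only to keep the sketch short.

```python
def greedy_decode(encode, step, source, eos="<eos>", max_len=50):
    """Generate a translation one word at a time until <eos> is emitted."""
    state = encode(source)          # vector summarizing the whole source sentence
    output, prev = [], "<s>"        # "<s>" is an assumed start-of-sentence symbol
    for _ in range(max_len):        # safety cap in case <eos> is never produced
        word, state = step(prev, state)
        if word == eos:
            break
        output.append(word)
        prev = word
    return output


# Toy stand-ins so the loop can be run: this "model" just upper-cases the source.
def toy_encode(source):
    return list(source)


def toy_step(prev_word, state):
    if not state:
        return "<eos>", state
    return state[0].upper(), state[1:]


translation = greedy_decode(toy_encode, toy_step, ["hello", "world"])
```

With a real model, `step` would run the decoder LSTM for one time step and return its most probable next word together with the updated hidden state.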
![Page 5: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/5.jpg)
Rare Word Models
• They treat the NMT system as a black box and train it on a corpus annotated by one of the three models listed below.
• First, the alignments are produced with an unsupervised aligner.
• Next, they use the alignment links to construct a word dictionary that will be used for the word translations in the post-processing step.
• If a word does not appear in their dictionary, they apply the identity translation, copying the source word unchanged into the output.
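The dictionary construction and post-processing steps above can be sketched as follows. This is a simplified illustration under stated assumptions: alignment links are given as `(source_index, target_index)` pairs, the dictionary keeps the most frequently aligned target word for each source word, and `unk_sources` (a hypothetical mapping from each unknown target position to its source position) stands in for whatever the annotation model provides. The function names are illustrative and not from the paper.

```python
from collections import Counter, defaultdict


def build_dictionary(sentence_pairs, alignments):
    """Map each source word to the target word it is most often aligned with."""
    counts = defaultdict(Counter)
    for (src, tgt), links in zip(sentence_pairs, alignments):
        for i, j in links:              # source position i aligned to target position j
            counts[src[i]][tgt[j]] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def replace_unknowns(src, hyp, unk_sources, dictionary, unk="<unk>"):
    """Replace each unknown token in the hypothesis via its aligned source word."""
    out = []
    for j, word in enumerate(hyp):
        if word == unk and j in unk_sources:
            source_word = src[unk_sources[j]]
            # Dictionary lookup, falling back to the identity translation.
            out.append(dictionary.get(source_word, source_word))
        else:
            out.append(word)
    return out


pairs = [(["the", "cat"], ["le", "chat"]), (["the", "dog"], ["le", "chien"])]
links = [[(0, 0), (1, 1)], [(0, 0), (1, 1)]]
dictionary = build_dictionary(pairs, links)

fixed = replace_unknowns(
    ["the", "cat", "Obama"], ["le", "<unk>", "<unk>"], {1: 1, 2: 2}, dictionary
)
```

Here `cat` is found in the dictionary and becomes `chat`, while `Obama` is not in the dictionary and is copied through unchanged.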
![Page 6: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/6.jpg)
1. Copyable Model
2. Positional All Model (PosAll)
3. Positional Unknown Model (PosUnk)
![Page 7: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/7.jpg)
Training Data
• Training data consisted of 12M parallel sentences (348M French and 304M English words).
• Due to the computationally intensive nature of the naive softmax, they limited the French vocabulary to either the 40K or the 80K most frequent French words.
• On the source side, they could afford a much larger vocabulary, so they used the 200K most frequent English words.
• The model treats all other words as unknowns.
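The vocabulary restriction can be illustrated with a small sketch: keep the most frequent words and map everything else to a single unknown token. The helper names and the `<unk>` spelling are assumptions for illustration, and the toy corpus uses a vocabulary size of 2 rather than 40K, 80K, or 200K.

```python
from collections import Counter


def build_vocab(sentences, size):
    """Return the set of the `size` most frequent words in the corpus."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, _ in counts.most_common(size)}


def map_unknowns(sentence, vocab, unk="<unk>"):
    """Replace every out-of-vocabulary word with the unknown token."""
    return [word if word in vocab else unk for word in sentence]


corpus = [["a", "b", "a"], ["a", "b", "c"]]
vocab = build_vocab(corpus, size=2)          # "a" (3 times) and "b" (2 times) survive
mapped = map_unknowns(["a", "c", "b"], vocab)
```

The softmax cost grows with the target vocabulary size, which is why the cap is tighter on the French (target) side than on the English (source) side.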
![Page 8: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/8.jpg)
Main result
![Page 9: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/9.jpg)
Comparison of different alignment models
![Page 10: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/10.jpg)
Effect of depths
![Page 11: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/11.jpg)
Sample Translation
![Page 12: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/12.jpg)
Conclusions
• A simple alignment-based technique can mitigate and even overcome the main weakness of current NMT systems, which is their inability to translate words that are not in their vocabulary.
• A key advantage is that it is applicable to any NMT system, not only to the deep LSTM model.
• The technique yielded a consistent and substantial improvement of up to 2.8 BLEU points over various NMT systems.
• With 37.5 BLEU points, they established the first NMT system that outperformed the best MT system on a WMT’14 contest dataset.
![Page 13: Addressing the Rare Word Problem in Neural Machine Translation Minh Tang Luon (Stanford University) Iiya Sutskever (Google) Quoc V.Le (Google) Orial Vinyals](https://reader036.vdocument.in/reader036/viewer/2022082613/5697bfa21a28abf838c961a9/html5/thumbnails/13.jpg)
Thank You!