Addressing the Rare Word Problem in Neural Machine Translation

Minh-Thang Luong (Stanford University), Ilya Sutskever (Google), Quoc V. Le (Google), Oriol Vinyals (Google), Wojciech Zaremba (New York University)

Post on 17-Jan-2016

Abstract

• Neural Machine Translation (NMT) is a new approach to machine translation whose results are promising and comparable to those of traditional approaches.

• A significant weakness of conventional NMT systems is their inability to correctly translate very rare words.

Neural Machine Translation

• A neural machine translation system is any neural network that maps a source sentence, $s_1, \dots, s_n$, to a target sentence, $t_1, \dots, t_m$.

• More concretely, an NMT system uses a neural network to parameterize the conditional distributions $p(t_j \mid t_{<j}, s_{\le n})$ for $1 \le j \le m$.
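Parameterizing one conditional distribution per target position is what makes the whole sentence scorable: by the chain rule, the log probability of the target sentence given the source is the sum of the per-word terms. Written out, with $t_{<j}$ as shorthand for $t_1, \dots, t_{j-1}$ and $s_{\le n}$ for the full source sentence (notation assumed here):

```latex
\log p(t \mid s) \;=\; \sum_{j=1}^{m} \log p\!\left(t_j \mid t_{<j},\, s_{\le n}\right)
```

Training then amounts to maximizing this quantity over the parallel corpus.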

Neural Machine Translation

• The authors use a deep LSTM to encode the input sequence and a separate deep LSTM to output the translation.

• The encoder reads the source sentence, one word at a time, and produces a large vector that represents the entire source sentence.

• The decoder is initialized with this vector and generates a translation, one word at a time, until it emits the end-of-sentence symbol <eos>.
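The encode-then-decode loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `greedy_decode`, `toy_encode`, and `toy_step` are hypothetical names, the toy functions stand in for the two LSTMs, and picking one word per step (greedy decoding) and using `<eos>` as the start symbol are assumptions made for simplicity.

```python
EOS = "<eos>"

def greedy_decode(encode, step, source, max_len=50):
    """Encode the source once, then emit target words until <eos>.

    encode(source) -> state      : stands in for the encoder LSTM
    step(prev, state) -> (w, s') : stands in for one decoder LSTM step
    """
    state = encode(source)   # the "large vector" summarizing the source
    output = []
    prev = EOS               # assumption: <eos> doubles as the start symbol
    for _ in range(max_len): # guard against decoders that never stop
        word, state = step(prev, state)
        if word == EOS:
            break
        output.append(word)
        prev = word
    return output

# Toy stand-ins (hypothetical): the "state" is the list of words left to emit,
# and each step emits the next one in upper case.
def toy_encode(source):
    return list(source)

def toy_step(prev, state):
    if not state:
        return EOS, state
    return state[0].upper(), state[1:]

translation = greedy_decode(toy_encode, toy_step, ["rare", "word"])
```

The point of the sketch is the control flow: the source is read once, and everything the decoder knows about it comes through the state it is initialized with.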

Rare Word Models

• They treat the NMT system as a black box and train it on a corpus annotated by one of the models described next.

• First, the alignments are produced with an unsupervised aligner.

• Next, they use the alignment links to construct a word dictionary that will be used for the word translations in the post-processing step.

• If a word does not appear in their dictionary, they apply the identity translation, that is, the source word is copied unchanged.
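The dictionary step and the identity fallback can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `build_dictionary` and `translate_word` are hypothetical names, the alignment links are assumed to arrive as `(source_index, target_index)` pairs from some external aligner, and keeping the most frequently aligned target word for each source word is one plausible selection rule.

```python
from collections import Counter, defaultdict

def build_dictionary(corpus):
    """Build a source-word -> target-word dictionary from alignment links.

    corpus: iterable of (source_words, target_words, links), where links is
    a list of (i, j) pairs meaning source word i is aligned to target word j.
    Assumption: keep the most frequently aligned target word per source word.
    """
    counts = defaultdict(Counter)
    for source, target, links in corpus:
        for i, j in links:
            counts[source[i]][target[j]] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

def translate_word(word, dictionary):
    """Look the word up; fall back to the identity translation."""
    return dictionary.get(word, word)

# Tiny made-up corpus for illustration only.
corpus = [
    (["the", "ecotax"], ["l", "ecotaxe"], [(0, 0), (1, 1)]),
    (["ecotax", "portico"], ["ecotaxe", "portique"], [(0, 0), (1, 1)]),
]
dictionary = build_dictionary(corpus)
```

With this dictionary, `translate_word("portico", dictionary)` returns the aligned word, while an unseen name such as `"Pont-de-Buis"` is passed through unchanged, which is usually the right behavior for names and numbers.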

1. Copyable Model

2. Positional All Model (PosAll)

3. Positional Unknown Model (PosUnk)
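The slide only names the three annotation schemes. As one illustration, the sketch below follows the PosUnk idea as described in the paper: each unknown target word is replaced by a token that encodes its position relative to the aligned source word, so the source word can be located again after decoding. The token spelling (`<unkpos1>`, `<unkpos_null>`), the function names, and the window of 7 positions are assumptions made for this example.

```python
import re

MAX_DIST = 7  # assumption: largest relative distance that gets its own token

def annotate_target(target, target_vocab, alignment):
    """Replace out-of-vocabulary target words by positional unknown tokens.

    alignment: dict mapping target position j -> aligned source position i.
    """
    out = []
    for j, word in enumerate(target):
        if word in target_vocab:
            out.append(word)
            continue
        i = alignment.get(j)
        if i is not None and abs(j - i) <= MAX_DIST:
            out.append(f"<unkpos{j - i}>")   # d = j - i, relative position
        else:
            out.append("<unkpos_null>")      # unaligned or too far away
    return out

def resolve(output, source, dictionary):
    """Post-process: map each positional token back to a source word."""
    result = []
    for j, token in enumerate(output):
        match = re.fullmatch(r"<unkpos(-?\d+)>", token)
        if match:
            i = j - int(match.group(1))      # invert d = j - i
            if 0 <= i < len(source):
                word = source[i]
                result.append(dictionary.get(word, word))  # identity fallback
                continue
        result.append(token)                 # left as-is if unresolvable
    return result

source = ["the", "ecotax", "portico"]
annotated = annotate_target(["le", "portique", "ecotaxe"], {"le"}, {0: 0, 1: 2, 2: 1})
restored = resolve(annotated, source, {"portico": "portique"})
```

Here `annotated` carries two positional tokens in place of the rare words, and `restored` recovers one of them through the dictionary and the other through the identity translation.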

Training Data

• Training data consisted of 12M parallel sentences (348M French and 304M English words).

• Due to the computationally intensive nature of the naive softmax, they limited the French vocabulary to either the 40K or the 80K most frequent French words.

• On the source side, they could afford a much larger vocabulary, so they used the 200K most frequent English words.

• The model treats all other words as unknowns.
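Limiting a vocabulary to the most frequent words and mapping everything else to an unknown symbol can be sketched as below. The function names and the `<unk>` spelling are assumptions for illustration; the sizes in the slides (40K or 80K target words, 200K source words) would be passed as `size`.

```python
from collections import Counter

UNK = "<unk>"  # assumed spelling of the unknown-word symbol

def build_vocab(sentences, size):
    """Return the set of the `size` most frequent words in the corpus."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, _ in counts.most_common(size)}

def replace_rare(sentence, vocab):
    """Map every out-of-vocabulary word to the unknown symbol."""
    return [word if word in vocab else UNK for word in sentence]

# Tiny made-up corpus: "a" occurs 3 times, "b" twice, "c" once.
sentences = [["a", "b", "a"], ["a", "c", "b"]]
vocab = build_vocab(sentences, 2)
clipped = replace_rare(["a", "c", "d"], vocab)
```

The output softmax then only has to range over `size + 1` symbols, which is the cost saving the slide refers to, at the price of losing every word that falls outside the list.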

Main result

Comparison of different alignment models

Effect of depth

Sample Translation

Conclusions

• A simple alignment-based technique can mitigate and even overcome the main weakness of current NMT systems, which is their inability to translate words that are not in their vocabulary.

• A key advantage is that it is applicable to any NMT system, not only to the deep LSTM model.

• The technique yielded a consistent and substantial improvement of up to 2.8 BLEU points over various NMT systems.

• With 37.5 BLEU points, they have established the first NMT system that outperformed the best MT system on a WMT'14 contest dataset.

Thank You!