Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Language UPC 2017)
[course site]
Day 4 Lecture 2
Advanced Neural Machine Translation
Marta R. Costa-jussà
Acknowledgments
Kyunghyun Cho, NVIDIA blog: https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus/
From previous lecture...
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Attention-based mechanism
Read the whole sentence, then produce the translated words one at a time, each time focusing on a different part of the input sentence
Encoder with attention: context vector
GOAL: Encode a source sentence into a set of context vectors
http://www.deeplearningbook.org/contents/applications.html
Composing the context vector: bidirectional RNN
![Page 8: Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Language UPC 2017)](https://reader031.vdocument.in/reader031/viewer/2022021922/5899c4731a28ab45548b573d/html5/thumbnails/8.jpg)
8
Decoder with attention
● The context vector now concatenates forward and reverse encoding vectors
● The decoder generates one symbol at a time based on this new context set
To compute the new decoder memory state, we must get one vector out of all context vectors.
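A minimal NumPy sketch (illustrative only; function names, parameter names and shapes are assumptions, not the lecture's code) of how the bidirectional encoder produces this context set: a forward and a backward RNN pass over the source embeddings, with the two hidden states at each position concatenated into one context vector per source word.

```python
import numpy as np

def rnn_step(x, h_prev, W_x, W_h, b):
    # Simple tanh recurrence; real encoders typically use GRU/LSTM cells.
    return np.tanh(x @ W_x + h_prev @ W_h + b)

def encode_bidirectional(embeddings, params_fwd, params_bwd, hidden_size):
    # embeddings: list of source word embedding vectors x_1 .. x_Tx
    T = len(embeddings)
    h_fwd = np.zeros(hidden_size)
    h_bwd = np.zeros(hidden_size)
    forward, backward = [], [None] * T
    for j in range(T):                       # left-to-right pass
        h_fwd = rnn_step(embeddings[j], h_fwd, *params_fwd)
        forward.append(h_fwd)
    for j in reversed(range(T)):             # right-to-left pass
        h_bwd = rnn_step(embeddings[j], h_bwd, *params_bwd)
        backward[j] = h_bwd
    # One context vector per source word: the concatenation [h_fwd_j ; h_bwd_j]
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```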
Compute the context vector
At each time step i, ONE context vector (c_i) is computed based on (1) the previous hidden state of the decoder (z_(i-1)), (2) the previously decoded symbol (u_(i-1)), and (3) the whole context set (C).
Score each context vector based on how relevant it is for translating the next target word
Each context vector h_j (j = 1...T_x) is scored based on the previous memory state z_(i-1), the previously generated target word u_(i-1), and the j-th context vector itself.
f_score is usually a simple single-layer feedforward network.
This relevance score measures how relevant the j-th context vector of the source sentence is for deciding the next symbol in the translation.
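As a hedged sketch of such a scoring function (Bahdanau-style additive attention; the parameter names W_z, W_h, v and the exact form are assumptions, not taken from the slides):

```python
import numpy as np

def f_score(z_prev, h_j, W_z, W_h, v):
    # Relevance score e_j of the j-th context vector h_j given the previous
    # decoder state z_prev. The embedding of the previously generated target
    # word could be added as another term of the same form.
    return v @ np.tanh(W_z @ z_prev + W_h @ h_j)
```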
Normalize the relevance scores to obtain the attention weights
These attention weights correspond to how much the decoder attends to each of the context vectors.
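A small sketch of this normalization step: a softmax over the scores e_1..e_Tx yields weights that are positive and sum to one across the source positions.

```python
import numpy as np

def attention_weights(scores):
    scores = np.asarray(scores, dtype=float)
    exp = np.exp(scores - scores.max())   # subtract the max for numerical stability
    return exp / exp.sum()                # alpha_j >= 0 and sum(alpha) == 1
```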
Obtain the context vector c_i
as the weighted sum of the context vectors, with the attention weights as the weights
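Continuing the sketch above, c_i is simply the attention-weighted sum of the context set (shapes assumed for illustration):

```python
import numpy as np

def context_vector(alphas, context_set):
    # alphas: attention weights, shape (T_x,); context_set: array of shape (T_x, dim)
    alphas = np.asarray(alphas)
    context_set = np.asarray(context_set)
    return (alphas[:, None] * context_set).sum(axis=0)
```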
Update the decoder’s hidden state
(The initial hidden state is initialized based on the last hidden state of the reverse RNN)
Decoder
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
The RNN’s internal state z_i depends on: the summary vector h_t, the previous output word u_(i-1) and the previous internal state z_(i-1).
NEW INTERNAL STATE
From previous session
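An illustrative sketch of this state update (a simple tanh recurrence for clarity; real decoders use GRU/LSTM units, and with attention the time-dependent context vector c_i takes the place of the fixed summary vector). Parameter names are assumptions.

```python
import numpy as np

def decoder_step(z_prev, u_prev_emb, c_i, W_z, W_u, W_c, b):
    # z_i = f(z_{i-1}, u_{i-1}, c_i): new internal state from the previous state,
    # the embedding of the previously emitted word, and the current context vector.
    return np.tanh(W_z @ z_prev + W_u @ u_prev_emb + W_c @ c_i + b)
```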
Translation performance comparison: English-to-French WMT 2014 task

| Model | BLEU |
| --- | --- |
| Simple Encoder-Decoder | 17.82 |
| + Attention-based | 37.19 |
| Phrase-based | 37.03 |
What attention learns… WORD ALIGNMENT
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Neural MT is better than phrase-based
Neural Network for Machine Translation at Production Scale
Results in WMT 2016 international evaluation
What Next?
Character-based Neural Machine Translation: Motivation
■ Word embeddings have been shown to boost the performance in many NLP tasks, including machine translation.
■ However, the standard look-up based embeddings are limited to a finite-size vocabulary for both computational and sparsity reasons.
■ The orthographic representation of the words is completely ignored.
■ The standard learning process is blind to the presence of stems, prefixes, suffixes and any other kind of affixes in words.
Character-based Neural MT: Proposal (Step 1)
■ The computation of the representation of each word starts with a character-based embedding layer that associates each word (sequence of characters) with a sequence of vectors.
■ This sequence of vectors is then processed with a set of 1D convolution filters of different lengths, followed by a max pooling layer.
■ For each convolutional filter, we keep only the output with the maximum value. The concatenation of these max values already provides us with a representation of each word as a vector with a fixed length equal to the total number of convolutional kernels.
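A simplified NumPy sketch of Step 1 under assumed shapes (one kernel per filter width; real systems use many filters per width and pad words shorter than the widest filter):

```python
import numpy as np

def char_word_representation(char_embeddings, conv_filters):
    # char_embeddings: (n_chars, emb_dim) matrix for one word
    # conv_filters: list of kernels, each of shape (width, emb_dim)
    n_chars, _ = char_embeddings.shape
    pooled = []
    for W in conv_filters:
        width = W.shape[0]
        # 1-D convolution over character positions (assumes n_chars >= width)
        outs = [np.sum(W * char_embeddings[t:t + width])
                for t in range(n_chars - width + 1)]
        pooled.append(max(outs))          # max-over-time pooling: one value per filter
    return np.array(pooled)               # fixed-length word vector (one entry per kernel)
```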
Character-based Neural MT: Proposal (Step 2)
■ The addition of two highway layers was shown to improve the quality of the language model in (Kim et al., 2016).
■ The output of the second highway layer will give us the final vector representation of each source word, replacing the standard source word embedding in the neural machine translation system.
(Highway layers: an architecture designed to ease gradient-based training of deep networks.)
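A minimal sketch of one highway layer (Srivastava et al., 2015): a transform gate decides, per dimension, how much of a nonlinear transform versus the untouched input to pass through, which is what eases gradient-based training when layers are stacked. Parameter names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(y, W_H, b_H, W_T, b_T):
    t = sigmoid(W_T @ y + b_T)        # transform gate in [0, 1]
    h = np.tanh(W_H @ y + b_H)        # candidate transform of the input
    return t * h + (1.0 - t) * y      # gated mix of transform and identity (carry)
```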
Character-based Neural MT: Integration with NMT
Examples
Multilingual Translation
Kyunghyun Cho, “DL4MT slides” (2015)
Multilingual Translation Approaches
Sharing an attention mechanism across language pairs: Orhan Firat et al., “Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism” (2016)
Sharing the encoder, decoder and attention across language pairs: Johnson et al., “Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation” (2016)
https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
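The mechanism behind the shared Google model is simple enough to sketch: the desired target language is signalled by prepending an artificial token to the source sentence (the `<2xx>` convention from Johnson et al., 2016). A minimal illustration:

```python
def add_target_language_token(source_tokens, target_lang):
    # e.g. ["Hello", "world"] with target_lang="es" -> ["<2es>", "Hello", "world"]
    # The single shared encoder/decoder/attention model then translates into that
    # language, which enables zero-shot directions never seen during training.
    return ["<2{}>".format(target_lang)] + list(source_tokens)
```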
Is the system learning an Interlingua?
https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html
Available software on GitHub
DL4MT, NEMATUS
Most publications have open-source code...
Summary
● The attention-based mechanism allows achieving state-of-the-art results
● Progress in MT includes character-based models, multilinguality...
Learn more
Natural Language Understanding with Distributed Representation, Kyunghyun Cho, Chapter 6, 2015 (available on GitHub)
Thanks! Q&A?
https://www.costa-jussa.com