Low Resource Machine Translation
Marc’Aurelio Ranzato, Facebook AI Research - NYC
Stanford - CS224N, 10 March 2020
Machine Translation
English French
Training data
Train NMT: training data → NMT system
Test NMT: “life is beautiful” → NMT system → “la vie est belle”
Training ingredients: • seq2seq with attention • SGD
Test ingredient: • beam search
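The test-time ingredient, beam search, can be sketched as follows. This is a minimal illustration and not the lecture's implementation: `toy_model` is a hypothetical hand-written table of next-token probabilities standing in for an NMT decoder, and the function names are assumptions made for this example.

```python
import math

# Hypothetical toy "decoder": next-token probabilities given a prefix.
TOY_TABLE = {
    (): {"la": 0.4, "le": 0.6},
    ("la",): {"vie": 0.9, "</s>": 0.1},
    ("le",): {"vie": 0.5, "monde": 0.5},
    ("la", "vie"): {"</s>": 1.0},
    ("le", "vie"): {"</s>": 1.0},
    ("le", "monde"): {"</s>": 1.0},
}

def toy_model(prefix):
    """Return log-probabilities of each possible next token."""
    return {tok: math.log(p) for tok, p in TOY_TABLE[prefix].items()}

def beam_search(next_log_probs, beam_size=2, max_len=10, eos="</s>"):
    """Return (tokens, log-prob) of the best hypothesis found."""
    beams = [((), 0.0)]   # unfinished hypotheses: (prefix, cumulative log-prob)
    finished = []         # hypotheses that already emitted eos
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for prefix, score in beams:
            for tok, lp in next_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])

greedy_tokens, _ = beam_search(toy_model, beam_size=1)
beam_tokens, _ = beam_search(toy_model, beam_size=2)
```

In this toy table the greedy search (beam size 1) commits to the locally best first word and ends with a lower-probability sentence than the search with beam size 2, which is the usual motivation for keeping several hypotheses alive.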
Some Stats
• 6000+ languages in the world
• 80% of the world population does not speak English
• Less than 5% of the people in the world are native English speakers
Source: https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
The Long Tail of Languages
The top 10 languages are spoken by less than 50% of the people. The remaining ~6500 are spoken by the rest! More than 2000 languages are spoken by fewer than 1000 people.
Source: https://ai.googleblog.com/2019/10/exploring-massively-multilingual.html
(X to English)
Machine Translation in Practice
English Nepali
Training data
25M people
Parallel training data (a collection of sentences with their corresponding translations) is small!
Let’s represent data with rectangles. The color indicates the language.
• Some parallel data originates in the source language, some in the target language.
• Source and target domains may not match.
Let’s represent (human) translations with empty rectangles.
[Figure: English and Nepali parallel data by domain (Bible, Parliamentary). Filled rectangles: sentences originating in English or in Nepali. Empty rectangles: the corresponding Nepali or English translations.]
• Test data might be in another domain.
• There might exist source-side in-domain monolingual data.
[Figure: English and Nepali data by domain (News, Bible, Parliamentary), with the TEST set and monolingual (mono) data added.]
• There might be parallel and monolingual data in a high-resource language close to the low-resource language of interest. This data may belong to a different domain.
[Figure: Hindi added next to English and Nepali, with a Books domain, the TEST set and further monolingual (mono) data.]
[Figure: languages (English, Nepali, Hindi, Sinhala, Bengali, Spanish, Tamil, Gujarati, …) against domains (News, Bible, Parliamentary, …), with the TEST set marked.]
…the Mondrian-like learning setting!
Low Resource Machine Translation
Loose definition: a language pair can be considered low resource when the number of parallel sentences is on the order of 10,000 or fewer.
Note: modern NMT systems have several hundred million parameters!
Challenges:
- data
  - sourcing data to train on
  - evaluation datasets
- modeling
  - unclear learning paradigm
  - domain adaptation
  - generalization
Why Is Low Resource MT Interesting?
• It is about learning with less labeled data.
• It is about modeling structured outputs and compositional learning.
• It is a real problem to solve.
Outline
[Diagram: life of a researcher: DATA, MODEL, ANALYSIS.]
DATA
“The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al., EMNLP 2019
MODEL
“Phrase-based & Neural Unsup MT” Lample et al., EMNLP 2018
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
“Investigating Multilingual NMT Representations at Scale” Kudugunta et al., EMNLP 2019
“Multilingual Denoising Pre-training for NMT” Liu et al., arXiv:2001.08210, 2020
ANALYSIS
“Analyzing uncertainty in NMT” Ott et al., ICML 2018
“On the evaluation of MT systems trained with back-translation” Edunov et al., ACL 2020
“The source-target domain mismatch problem in MT” Shen et al., arXiv:1909.13151, 2019
Case Study: En-Ne
[Figure: English and Nepali data by domain: Wikipedia; Bible, JW300, etc.; GNOME, Ubuntu, etc.; Common Crawl; with the TEST set and monolingual (mono) data marked.]
In-domain data: no parallel, little monolingual. Out-of-domain: little parallel, quite a bit of monolingual. No translation originating from Nepali.
A Case Study: En-Ne
• Parallel training data: versions of the Bible and the Ubuntu handbook (<1M sentences).
• Nepali monolingual data: Wikipedia (90K), Common Crawl (a few million).
• English monolingual data: almost unlimited.
• Test data: ???
FLoRes Evaluation Benchmark
• Validation, test and hidden test sets, each with 3000 sentences, in English-Nepali and English-Sinhala.
• Sentences taken from Wikipedia documents.
Data Collection Process:
• Very expensive and slow.
• Very hard to produce high-quality translations:
  • automatic checks (language model filtering, transliteration filtering, length filtering, language id filtering, etc.),
  • human assessment.
Guzmán, Chen et al. “The FLoRes evaluation datasets for low resource MT…” EMNLP 2019
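One of the automatic checks listed above, length filtering, can be sketched as below. This is only an illustration of the idea: the function name and the thresholds (`min_len`, `max_len`, `max_ratio`) are assumptions for this example and are not the values used to build FLoRes.

```python
def length_filter(pairs, min_len=1, max_len=250, max_ratio=3.0):
    """Keep (source, translation) pairs whose token counts look plausible.

    All thresholds are illustrative assumptions, not the FLoRes settings.
    """
    kept = []
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop empty or extremely long sentences on either side.
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        # Drop pairs whose lengths differ too much to be translations.
        ratio = max(n_src, n_tgt) / max(1, min(n_src, n_tgt))
        if ratio > max_ratio:
            continue
        kept.append((src, tgt))
    return kept

candidates = [
    ("life is beautiful", "la vie est belle"),  # plausible pair
    ("hello", ""),                              # empty translation
    ("a", "b c d e f"),                         # length ratio too large
]
clean = length_filter(candidates)
```

Whitespace tokenization is used here only to keep the sketch short; a real pipeline would count tokens with the tokenizer appropriate to each language.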
Examples: Si-En and En-Si
[Figure: example sentences, original and translation.]
Wikipedia originating in Si has different topics than Wikipedia originating in En.
Examples: Ne-En and En-Ne
• Useful to evaluate truly low resource language pairs.
• WMT 2019 and WMT 2020 shared filtering task.
• Several publications.
• Sustained effort, more to come…
Data & baseline models: https://github.com/facebookresearch/flores
What Did We Learn?
• Data is often as important as, or more important than, model design.
• Collecting data is not trivial.
• Look at the data!!
Outline
[Diagram: life of a researcher: DATA, MODEL, ANALYSIS.]
DATA
“The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al., EMNLP 2019
MODEL
“Phrase-based & Neural Unsup MT” Lample et al., EMNLP 2018
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
“Massively Multilingual NMT” Aharoni et al., ACL 2019
“Multilingual Denoising Pre-training for NMT” Liu et al., arXiv:2001.08210, 2020
ANALYSIS
“Analyzing uncertainty in NMT” Ott et al., ICML 2018
“On the evaluation of MT systems trained with back-translation” Edunov et al., ACL 2020
“The source-target domain mismatch problem in MT” Shen et al., arXiv:1909.13151, 2019
[Figure: languages (English, Nepali, Hindi, Sinhala, Bengali, Spanish, Tamil, Gujarati, …) against domains, with the TEST set marked.]
ML Perspective: Supervised Learning
Learning Framework: Supervised Learning.
Training dataset: $\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$
Per-sample loss: $\mathcal{L}(\theta) = -\log p(y \mid x)$
[Figure: DATA: English-Nepali parallel TRAIN set and TEST set. NMT system (usual attention-based transformer): the input sentence $x$ (en) passes through the Encoder and Decoder to give a prediction (ne), which the cross-entropy loss compares with the human reference $y$ written by a human translator.]
If N is small, how can we further regularize the model?
- dropout [1]
- label smoothing [2]
[1] Srivastava et al. “Dropout: a simple way to prevent neural networks from overfitting” JMLR 2014
[2] Szegedy et al. “Rethinking the inception architecture for computer vision” CVPR 2016
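As an illustration of the label smoothing regularizer [2], the per-token cross-entropy can be computed against a target distribution that puts $1 - \epsilon$ on the reference token and spreads $\epsilon$ uniformly over the vocabulary. The sketch below is plain Python over a single token position; the function name and the value of `eps` are assumptions for this example.

```python
import math

def smoothed_nll(log_probs, target, eps=0.1):
    """Cross-entropy of one prediction against a label-smoothed target.

    log_probs: model log-probabilities over the vocabulary (a list).
    target:    index of the reference token.
    eps:       smoothing mass spread uniformly over all tokens.
    """
    vocab_size = len(log_probs)
    nll = -log_probs[target]                  # usual -log p(y|x) term
    uniform = -sum(log_probs) / vocab_size    # cross-entropy vs. uniform
    return (1.0 - eps) * nll + eps * uniform

log_probs = [math.log(p) for p in (0.7, 0.2, 0.1)]
plain = smoothed_nll(log_probs, target=0, eps=0.0)     # equals -log 0.7
smoothed = smoothed_nll(log_probs, target=0, eps=0.1)
```

With `eps=0` this reduces to the per-sample loss above; with `eps>0` the model is additionally penalized for assigning very low probability to any token, which discourages over-confident predictions when N is small.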
ML Perspective: Semi-Supervised Learning
Adding source-side monolingual data. Idea: model p(x).
Learning Framework: DAE (denoising auto-encoding).
Training datasets: $\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$ and $\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$
Either pre-train or add a DAE loss to the supervised cross-entropy term.
Noise: word drop, swap, etc.
$\mathcal{L}^{DAE}(\theta) = -\log p(x \mid x + n)$, with $x^s \sim \mathcal{M}^s$
E.g.: “The cat the on sat mat.” → “The cat sat on the mat.”
[Figure: DATA: English-Nepali parallel TRAIN set, TEST set and English monolingual (mono) data. NMT system: the input sentence (en) plus noise $n$ passes through the Encoder and Decoder to give a prediction (en), scored by the cross-entropy loss.]
[Figure: noise level axis from “no noise” to “noise only”, with a “good region” in between.]
Vincent et al. “Stacked denoising auto-encoders:…” JMLR 2010
Liu et al. “Multilingual denoising pretraining for NMT” arXiv:2001.08210, 2020
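The noise function $x + n$ (word drop, swap) can be sketched as follows. This is an illustrative stand-in, not the noise model of the cited papers: the function name, `drop_prob` and `max_shift` are assumptions for this example. The denoising auto-encoder would then be trained to reconstruct the clean sentence from the noisy one.

```python
import random

def add_noise(tokens, drop_prob=0.1, max_shift=3, rng=None):
    """Corrupt a token list with word dropping and local swaps."""
    rng = rng or random.Random()
    # Word drop: each token is removed independently with prob. drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [rng.choice(tokens)]   # never return an empty sentence
    # Local swap: sort by position plus random jitter, so tokens that are
    # max_shift or more positions apart never change their relative order.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

clean = "the cat sat on the mat".split()
noisy = add_noise(clean, rng=random.Random(0))
dae_example = (noisy, clean)   # (input x + n, reconstruction target x)
```

The amount of noise matters: with no noise the model can simply copy its input, and with too much noise the input carries no information about the target, so useful settings lie somewhere in between.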
ML Perspective: Semi-Supervised Learning
Adding source-side monolingual data. An alternative approach to DAE.
Learning Framework: Self-Training (ST).
Training datasets: $\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$ and $\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$
ALGORITHM
• train model $p(y \mid x)$ on $\mathcal{D}$
• repeat
  • decode $x^s \sim \mathcal{M}^s$ to $\bar{y}$ and create additional dataset $\mathcal{A}^s = \{(x^s_j, \bar{y}_j)\}_{j=1,\dots,M_s}$
  • retrain model $p(y \mid x)$ on $\mathcal{D} \cup \mathcal{A}^s$
Key elements: decoding and training noise.
$\mathcal{L}^{ST}(\theta) = -\log p(\bar{y} \mid x + n)$
$\mathcal{L}(\theta) = \mathcal{L}^{sup}(\theta) + \lambda \mathcal{L}^{ST}(\theta)$
[Figure: DATA: English-Nepali parallel TRAIN set, TEST set and English monolingual (mono) data. NMT system: an Encoder-Decoder first decodes $x^s$ (en) into the decoded output $\bar{y}$ (ne); the input sentence plus noise $n$ then passes through the Encoder and Decoder, and the prediction is scored against $\bar{y}$ by the cross-entropy loss.]
He et al. “Revisiting self-training for neural sequence generation” ICLR 2020
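The self-training loop can be sketched in a few lines. To keep the example runnable, `train` and `decode` below are hypothetical toy stand-ins (a memorized word-for-word dictionary) rather than an NMT system, and all names are assumptions made for this illustration; only the data flow mirrors the algorithm.

```python
def train(pairs):
    """Toy 'training': memorize a word-for-word dictionary (stand-in for NMT)."""
    table = {}
    for src, tgt in pairs:
        for s, t in zip(src, tgt):
            table.setdefault(s, t)   # first translation seen wins
    return table

def decode(model, src):
    """Toy 'decoding': translate word by word, copying unknown words."""
    return [model.get(w, w) for w in src]

def self_training(parallel, mono_src, noise=lambda x: x, rounds=2):
    """Train on D, then repeatedly pseudo-label M^s and retrain on D + A^s."""
    model = train(parallel)
    for _ in range(rounds):
        # Decode the clean source sentence, but train on its noisy version.
        # (The default noise is the identity because the toy word-aligned
        # 'model' cannot cope with dropped or swapped words.)
        pseudo = [(noise(x), decode(model, x)) for x in mono_src]
        model = train(parallel + pseudo)
    return model

parallel = [(["cat"], ["chat"]), (["dog"], ["chien"])]
mono_src = [["dog", "cat", "bird"]]
model = self_training(parallel, mono_src)
```

Concatenating the two datasets weights the supervised and ST terms implicitly by their sizes; an explicit $\lambda$ would require weighting the examples of the two sets differently in the loss.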
ML Perspective: Semi-Supervised Learning
Adding target-side monolingual data.
Learning Framework: Back-Translation (BT).
Training datasets: $\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$ and $\mathcal{M}^t = \{y^t_k\}_{k=1,\dots,M_t}$
Two benefits:
a) Decoder learns a good language model.
b) Better generalization via data augmentation.
Note: unlike ST, the target is correct but the input is not.
ALGORITHM
• train models $p(y \mid x)$ and $p(x \mid y)$ on $\mathcal{D}$
• decode $y^t \sim \mathcal{M}^t$ to $\bar{x}$ with $p(x \mid y)$, create additional dataset $\mathcal{A}^t = \{(\bar{x}_k, y^t_k)\}_{k=1,\dots,M_t}$
• retrain model $p(y \mid x)$ on $\mathcal{D} \cup \mathcal{A}^t$
[Figure: DATA: English-Nepali parallel TRAIN set, TEST set and Nepali monolingual (mono) data. Backward NMT system $p(x \mid y)$: an Encoder-Decoder decodes $y^t$ (ne) into $\bar{x}$ (en). Forward NMT system $p(y \mid x)$: the input sentence $\bar{x}$ (en) passes through the Encoder and Decoder, and the prediction (ne) is scored against $y^t$ by the cross-entropy loss.]
LBT (✓) = � log p(y|x)<latexit sha1_base64="QFoLzUqdj7aJb5peYo68MenzgH0=">AAAERHichZPLbtNAFIanNpdibiks2YyIIiUiRHFBgk2lpgSJBUVBNG1FnFjjySSZxDd5xpUtdx6ODQ/AjidgwwKE2CLGl5QkdcVIIx2d8/1z/nOksXybMt5uf91S1GvXb9zcvqXdvnP33v3KzoNj5oUBJn3s2V5waiFGbOqSPqfcJqd+QJBj2eTEWrxK6ydnJGDUc4947JOhg6YunVCMuEyZO8rHmuEgPsPITroC7kEjgfWoGTdMCg1hJnRPb7ZazXdC+8cdihHLyWjEzHnGzXPu0GQbJM/JeMTNRUYuliRfJbtilBiO5UUJCiOxNBI2zzaM9KTIr8fnUUOrye7QYNSBa85WHu1cGK1nTpvQsFCQxMKcN642Lddg4NCHa8/Iei5dAd+KusFnhKOGbPIUGrY3hdIbPIepuxVsOZrodl6XaqJUA59A9wrdh6NSWWGpTNy52HxORSIdP063sD46/9/oXCtzdFDuKB2+6NfQzEq13WpnB14O9CKoguL0zMoXY+zh0CEuxzZibKC3fT5MUMAptonQjJARH+EFmpKBDF3kEDZMsk8gYE1mxnDiBfK6HGbZVUWCHMZix5JkOg/brKXJstog5JOXw4S6fsiJi/NGk9CG3IPpj4JjGhDM7VgGCAdUeoV4hgKEufx36RL0zZEvB8e7Lf1Za/f98+r+QbGObfAIPAZ1oIMXYB+8AT3QB1j5pHxTfig/1c/qd/WX+jtHla1C8xCsHfXPXzOCXHE=</latexit>
DATA
Sennrich et al. “Improving NMT models with monolingual data” ACL 2016
$\mathcal{A}_t = \{(x_k, y^t_k)\}_{k=1,\dots,M_t}$
$\mathcal{L}(\theta) = \mathcal{L}^{sup}(\theta) + \lambda \mathcal{L}^{BT}(\theta)$
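The combined objective above can be illustrated with a minimal sketch. It assumes that each loss term is the average negative log-likelihood over its own dataset (the slide does not say whether terms are summed or averaged) and that per-sentence probabilities are already available; the names `nll` and `combined_loss` are hypothetical.

```python
import math

def nll(log_probs):
    """Average negative log-likelihood of a list of sentence log-probabilities."""
    return -sum(log_probs) / len(log_probs)

def combined_loss(sup_log_probs, bt_log_probs, lam):
    """L = L_sup + lam * L_BT (assumption: each term is averaged over its dataset)."""
    return nll(sup_log_probs) + lam * nll(bt_log_probs)

# Toy log-probabilities assigned by the forward model p(y|x).
sup = [math.log(0.5), math.log(0.25)]   # human-translated pairs from D
bt = [math.log(0.125)]                  # back-translated pairs from A_t
loss = combined_loss(sup, bt, lam=0.5)
```

With these toy numbers the supervised term is 1.5 ln 2 and the back-translation term is 3 ln 2, so the weight `lam` directly controls how much the synthetic pairs contribute.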
ALGORITHM
• train models $p(y \mid x)$ and $p(x \mid y)$ on $\mathcal{D}$
• repeat
  • decode $y^t \sim \mathcal{M}_t$ to $x$ with $p(x \mid y)$, create additional dataset $\mathcal{A}_t$
  • decode $x^s \sim \mathcal{M}_s$ to $y$ with $p(y \mid x)$, create additional dataset $\mathcal{A}_s$
  • retrain both $p(y \mid x)$ and $p(x \mid y)$ on: $\mathcal{D} \cup \mathcal{A}_t \cup \mathcal{A}_s$
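The loop in this algorithm can be sketched as follows. This is only an illustration of the control flow: the "model" is a hypothetical word-level lookup table (`train_word_table`) standing in for a real NMT system, and the function names, the `rounds` parameter and the toy sentences are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]               # (source sentence x, target sentence y)
Translator = Callable[[str], str]

def train_word_table(pairs: List[Pair], direction: str) -> Translator:
    """Toy stand-in for NMT training: position-aligned word lookup table."""
    table: Dict[str, str] = {}
    for x, y in pairs:
        src, tgt = (x, y) if direction == "s2t" else (y, x)
        for s, t in zip(src.split(), tgt.split()):
            table.setdefault(s, t)   # keep the first translation seen
    def translate(sentence: str) -> str:
        # Unknown words are copied through unchanged.
        return " ".join(table.get(w, w) for w in sentence.split())
    return translate

def iterative_st_bt(parallel, mono_src, mono_tgt, rounds=2):
    forward = train_word_table(parallel, "s2t")    # plays the role of p(y|x)
    backward = train_word_table(parallel, "t2s")   # plays the role of p(x|y)
    for _ in range(rounds):
        a_t = [(backward(y), y) for y in mono_tgt]  # back-translation -> A_t
        a_s = [(x, forward(x)) for x in mono_src]   # self-training -> A_s
        data = parallel + a_t + a_s                 # D ∪ A_t ∪ A_s
        forward = train_word_table(data, "s2t")
        backward = train_word_table(data, "t2s")
    return forward, backward

forward, backward = iterative_st_bt(
    parallel=[("the cat", "le chat")],
    mono_src=["the dog"],
    mono_tgt=["le chien"],
)
```

In a real system each `train_word_table` call would be a full (re)training run and each decode a beam search, so the number of rounds is usually kept small.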
ML Perspective: Semi-Supervised Learning
[Figure: English-Nepali data diagram with parallel training data (TRAIN //), TEST data and monolingual (mono) data on both sides. Labels: $p(x \mid y)$, $y^t \sim \mathcal{M}_t$, $x$, $p(y \mid x)$.]
Adding both source & target-side monolingual data.
Training Dataset
$\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$
$\mathcal{M}_t = \{y^t_k\}_{k=1,\dots,M_t}$
$\mathcal{M}_s = \{x^s_j\}_{j=1,\dots,M_s}$
Learning Framework: Iterative self-training (ST) & back-translation (BT).
[Figure: self-training. A source-side monolingual sentence $x^s \sim \mathcal{M}_s$ is decoded with $p(y \mid x)$ into $y$; the backward system is $p(x \mid y)$.]
$\mathcal{A}_s = \{(x^s_j, y_j)\}_{j=1,\dots,M_s}$
$\mathcal{D} \cup \mathcal{A}_t \cup \mathcal{A}_s$
$\mathcal{L}^{total}(\theta) = -\log p(y \mid x) - \lambda_1 \log p(y^t \mid x^t) - \lambda_2 \log p(y^s \mid x^s)$
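As a consistency check, the total objective can be related to the back-translation objective $\mathcal{L}(\theta) = \mathcal{L}^{sup}(\theta) + \lambda \mathcal{L}^{BT}(\theta)$. The short derivation below assumes that $\mathcal{L}^{sup}(\theta)$ denotes the supervised term $-\log p(y \mid x)$ on $\mathcal{D}$ and that $\lambda = \lambda_1$; neither identification is stated explicitly on the slides.

```latex
\begin{align*}
\mathcal{L}^{total}(\theta)
  &= -\log p(y \mid x) - \lambda_1 \log p(y^t \mid x^t) - \lambda_2 \log p(y^s \mid x^s) \\
\text{with } \lambda_2 = 0:\quad
\mathcal{L}^{total}(\theta)
  &= \underbrace{-\log p(y \mid x)}_{\mathcal{L}^{sup}(\theta)}
   + \lambda_1 \underbrace{\bigl(-\log p(y^t \mid x^t)\bigr)}_{\mathcal{L}^{BT}(\theta)}
\end{align*}
```

Under these assumptions, switching off the self-training term recovers plain back-translation, and $\lambda_2$ controls how much the forward-decoded pairs in $\mathcal{A}_s$ contribute.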
[Figure: English-Nepali data diagram for iterative ST & BT with parallel training data (TRAIN //) and monolingual (mono) data on both sides. The monolingual data is decoded with $p(x \mid y)$ and $p(y \mid x)$; the steps are labeled phase 1 and phase 2 (generate, train models).]
DATA
Shen et al. “The source-target domain mismatch problem in MT” arXiv:1909.13151 2019
Chen et al. “FBAI WAT’19 Myanmar-English translation task submission” WAT@EMNLP 2019
$\mathcal{A}_t = \{(x_k, y^t_k)\}_{k=1,\dots,M_t}$
ML Perspective: Multi-Task/Multi-Modal
[Figure: English-Nepali data diagram with parallel training data (TRAIN //) and TEST data.]
Adding parallel data in other languages.
Training Dataset
Learning Framework: Multilingual Training
DATA
[Figure: parallel training data (TRAIN //) for several language pairs, including Hindi. The NMT system is an encoder-decoder (Src → Tgt) whose input $(x, \mathrm{LID})$ is the input sentence with target language ID; a cross-entropy loss compares its prediction with the human reference $y$ written by a human translator.]
$\mathcal{L}(\theta) = -\sum_{s,t} \mathbb{E}_{(x,y) \sim \mathcal{D}_{s,t}} [\log p(y \mid x; t)]$
Share the same encoder and the same decoder with all the language pairs. Prepend a target language identifier to the source sentence to inform decoder of desired language. Concatenate all the datasets together. Train using standard cross-entropy loss.
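The data preparation described above can be sketched in a few lines. The tag format `<2xx>` and the function names are assumptions made for illustration; the slide only says that a target language identifier is prepended to the source sentence and that all datasets are concatenated.

```python
def tag_source(source, target_lang):
    """Prepend a target-language identifier token to the source sentence."""
    return f"<2{target_lang}> {source}"

def build_multilingual_corpus(datasets):
    """Concatenate all language pairs into one list of (tagged source, target)."""
    corpus = []
    for (src_lang, tgt_lang), pairs in datasets.items():
        for source, target in pairs:
            corpus.append((tag_source(source, tgt_lang), target))
    return corpus

# Toy datasets keyed by (source language, target language).
datasets = {
    ("en", "ne"): [("hello", "नमस्ते")],
    ("en", "hi"): [("hello", "नमस्ते")],
    ("hi", "en"): [("नमस्ते", "hello")],
}
corpus = build_multilingual_corpus(datasets)
```

A single shared encoder-decoder can then be trained on `corpus` with the standard cross-entropy loss; the tag is the only signal telling the decoder which language to produce.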
TRAIN //
$\mathcal{D}_{en,ne} \cup \mathcal{D}_{en,hi} \cup \mathcal{D}_{hi,en} \cup \mathcal{D}_{ne,hi}$
Johnson et al. “Google’s multilingual NMT system…” TACL 2017
Aharoni et al. “Massively multilingual NMT” NAACL 2019
Conclusion so far…
• Assuming no domain effect, there are lots of training paradigms depending on the available data.
• It is hard to predict what works best.
• In general, DAE pretraining, (iterative) BT and multilingual training perform strongly on low resource languages.
• All these methods can be combined together, but it requires some level of craftsmanship…
• Final touch: ensembling, fine-tuning, distillation, etc.
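Of the final touches listed, ensembling is the simplest to illustrate. The sketch below assumes the common variant in which several models' next-token probability distributions are averaged arithmetically at each decoding step; the slides do not specify which form of ensembling is meant, and the function name and toy numbers are hypothetical.

```python
def ensemble_next_token(distributions):
    """Average next-token distributions from several models and pick the best token."""
    vocab = set().union(*distributions)
    averaged = {
        tok: sum(d.get(tok, 0.0) for d in distributions) / len(distributions)
        for tok in vocab
    }
    # sorted() makes tie-breaking deterministic.
    best = max(sorted(averaged), key=averaged.get)
    return best, averaged

# Toy next-token distributions from two models for the same decoding step.
model_a = {"belle": 0.6, "beau": 0.4}
model_b = {"belle": 0.3, "beau": 0.7}
best, averaged = ensemble_next_token([model_a, model_b])
```

Here the two models disagree, and the averaged distribution favors `beau` (0.55 against 0.45). In practice the same averaging would be applied inside beam search at every step.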
[Figure: axes labeled amount of data, domain, language pair.]
Open Challenges
• Diversity of domains and varying translation quality.
• Wildly varying dataset sizes.
• Diversity of language pairs.
• Training large models to handle large datasets, as soon as we put together lots of language pairs and monolingual data.
Case Study #1: Unsupervised MT
[Figure: English-French data diagram with TEST data and monolingual (mono) data on both sides.]
DATA
$\mathcal{M}_t = \{y^t_k\}_{k=1,\dots,M_t}$
$\mathcal{M}_s = \{x^s_j\}_{j=1,\dots,M_s}$
[Figure: backward NMT system $p(x \mid y)$: an encoder-decoder reads $y^t \sim \mathcal{M}_t$ (fr) and outputs $x$ (en).]
Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018
Artetxe et al. “An effective approach to unsupervised MT” ACL 2019
[Figure: the forward NMT system $p(y \mid x)$, an encoder-decoder (en → fr), is trained with a cross-entropy loss between its prediction and the input sentence. The backward NMT system $p(x \mid y)$, an encoder-decoder, reads $y^t \sim \mathcal{M}_t$ (fr) and outputs $x$ (en).]
…and vice versa starting from English. This is an example of auto-encoding or cycle consistency.
Problem: lack of constraints on $x$.
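The back-translation loop described on this slide can be sketched in a few lines. This is only a toy illustration of the data flow, not the authors' implementation: `word_by_word` is a hypothetical stand-in for a trained NMT system, and `back_translation_round` is a name invented here.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]
Pairs = List[Tuple[str, str]]


def word_by_word(table: Dict[str, str]) -> Translator:
    """Toy stand-in for an NMT system: translate word by word with a lookup table."""
    return lambda sentence: " ".join(table.get(w, w) for w in sentence.split())


def back_translation_round(
    mono_src: List[str],
    mono_tgt: List[str],
    forward: Translator,
    backward: Translator,
) -> Tuple[Pairs, Pairs]:
    """Build synthetic (input, reference) pairs for one back-translation round."""
    # y ~ M_t -> x = backward(y); the forward model p(y|x) learns to recover y from x.
    forward_pairs = [(backward(y), y) for y in mono_tgt]
    # ...and vice versa, starting from the source language.
    backward_pairs = [(forward(x), x) for x in mono_src]
    return forward_pairs, backward_pairs


en2fr = {"life": "vie", "is": "est", "beautiful": "belle"}
fr2en = {v: k for k, v in en2fr.items()}
forward_pairs, backward_pairs = back_translation_round(
    mono_src=["life is beautiful"],
    mono_tgt=["vie est belle"],
    forward=word_by_word(en2fr),
    backward=word_by_word(fr2en),
)
```

In a real system each round would be followed by a gradient update of both models on these pairs, so the synthetic inputs improve as training proceeds. Nothing in the loop forces the intermediate translation to be a fluent sentence, which is the lack of constraints noted above.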
Problem: lack of modularity. The decoder may behave differently when fed representations from the French encoder vs. the English encoder.
[Diagram: denoising auto-encoding (DAE). An English (en) sentence $x^s \sim M_s$ is corrupted with noise $n$ and passed through the encoder and decoder of the NMT system (en → en); the prediction is compared with the input sentence through a cross-entropy loss.]
DAE makes sure the decoder outputs fluent text in the desired language.
Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018
Artetxe et al. “An effective approach to unsupervised MT” ACL 2019
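The slide does not say how the noise $n$ is produced. A minimal sketch, assuming the common choice of word dropout followed by a local shuffle, could look as follows; the function name `add_noise` and the default values are assumptions made for illustration.

```python
import random
from typing import List, Optional


def add_noise(
    tokens: List[str],
    drop_prob: float = 0.1,
    shuffle_window: float = 3.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Corrupt a sentence for denoising auto-encoding (assumed noise model)."""
    rng = rng or random.Random()
    # Word dropout: remove each token independently with probability drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [rng.choice(tokens)]  # never return an empty sentence
    # Local shuffle: jitter each position by less than shuffle_window, then re-sort,
    # so tokens only move a short distance from where they started.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [tok for _, tok in sorted(zip(keys, kept), key=lambda pair: pair[0])]


sentence = "life is beautiful today".split()
noisy = add_noise(sentence, rng=random.Random(0))
```

The DAE training pair is then `(noisy, sentence)`: the model reads the corrupted sentence and is trained to output the clean one in the same language.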
[Diagram: a single NMT system (encoder-decoder) handles both Src and Tgt. The input sentence is paired with the target language ID, $(x, \mathrm{LID})$, and the decoder produces the prediction.]
As in multilingual NMT, share the encoder and decoder parameters. The encoder is encouraged to produce shared representations (particularly if pre-trained).
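One simple way to pair an input sentence with its target language ID is to prepend a language tag. The sketch below assumes a `<2xx>` tag format and the helper names `tag` and `shared_model_examples`; none of these come from the slides.

```python
from typing import List, Tuple


def tag(sentence: str, target_lang: str) -> str:
    """Prepend a target-language token (assumed '<2xx>' format) to the input."""
    return f"<2{target_lang}> {sentence}"


def shared_model_examples(
    pairs: List[Tuple[str, str]], src_lang: str, tgt_lang: str
) -> List[Tuple[str, str]]:
    """Turn (src, tgt) pairs into examples for one model serving both directions."""
    examples = []
    for src, tgt in pairs:
        examples.append((tag(src, tgt_lang), tgt))  # src -> tgt direction
        examples.append((tag(tgt, src_lang), src))  # tgt -> src direction
    return examples


examples = shared_model_examples(
    [("life is beautiful", "la vie est belle")], src_lang="en", tgt_lang="fr"
)
```

Because the same parameters see both directions, the only signal telling the decoder which language to produce is the tag on the input.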
Iterative BT + DAE + Multi-Lingual
Same ideas can be applied to phrase-based statistical MT systems (PBSMT). NMT and PBSMT can be combined for even better results.
WMT’14 En-Fr
Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018
Since the unsupervised MT system was trained on about 10M sentences, each parallel sentence is worth about 100 monolingual sentences (for this dataset and language pair).
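The exchange rate stated here can be unpacked with a little arithmetic. As a rough sketch, with symbols $N_{\text{mono}}$, $N_{\text{par}}$ and $r$ introduced only for illustration:

```latex
r \approx \frac{N_{\text{mono}}}{N_{\text{par}}} \approx 100,
\qquad
N_{\text{par}} \approx \frac{N_{\text{mono}}}{r} = \frac{10^{7}}{100} = 10^{5}
```

Under that ratio, the roughly 10M monolingual sentences would carry about as much training signal as 100K parallel sentences on this dataset and language pair.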
Case Study #2: FLoRes Ne-En
            | In-domain (Wikipedia) | Out-of-domain
Parallel    | None                  | 500K sentences (Bible, GNOME/Ubuntu, OpenSubtitle, …); *Hindi: 1.5M
Monolingual | 100K sentences        | ~5M sentences (CommonCrawl); *Hindi: 45M
Results on FLoRes: Ne-En
Case-Study #3: English-Burmese
Workshop on Asian Translation 2019: English-Burmese
            | In-domain (News) | Out-of-domain
Parallel    | 20K sentences    | 200K sentences
Monolingual | ~79M sentences (En only); ~23M sentences (My only)
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
Results: Iterative ST+BT
[Figure: two bar charts, My → En and En → My. y-axis: BLEU; x-axis: Parallel, Iter. 1, Iter. 2, Iter. 3.]
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
Results: BT vs ST vs BT+ST
[Figure: bar chart for My → En, iter. 2. y-axis: BLEU; x-axis: iter. 1, iter. 2 BT, iter. 2 ST, iter. 2 BT+ST.]
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
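The difference between BT and ST lies in which side of each synthetic pair is machine-generated. The sketch below assumes ST stands for self-training, an acronym the slides do not expand; `build_training_set` and the placeholder translators are hypothetical names used only for illustration.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]
Pairs = List[Tuple[str, str]]


def build_training_set(
    parallel: Pairs,
    mono_src: List[str],
    mono_tgt: List[str],
    forward: Translator,
    backward: Translator,
) -> Pairs:
    """Combine real parallel data with BT and ST synthetic data for the forward model."""
    # Back-translation: synthetic source, real target.
    bt_pairs = [(backward(y), y) for y in mono_tgt]
    # Self-training: real source, synthetic target.
    st_pairs = [(x, forward(x)) for x in mono_src]
    return parallel + bt_pairs + st_pairs


# Placeholder translators that only mark which model produced the text.
data = build_training_set(
    parallel=[("src0", "tgt0")],
    mono_src=["src1"],
    mono_tgt=["tgt1"],
    forward=lambda x: f"fwd({x})",
    backward=lambda y: f"bwd({y})",
)
```

Running this repeatedly, retraining `forward` and `backward` on the combined set each time, gives the iterative scheme whose iterations are compared in the charts.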
Final Results of 2019 Competition
[Figure: two bar charts of BLEU. En → My: Ours, NICT, NICT-NMT, UCSMNLP. My → En: Ours, NICT-NMT, NICT, UCSYNLP.]
+8 BLEU compared to second best
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
Demo (Burmese —> English) 1
23.3 26.32 27.74
slide credit to Peng-Jen Chen
Conclusion so far…
• Iterative back-translation and multi-lingual training work remarkably well.
• By feeding more data (BT, ST, pre-training, multi-lingual training) we can afford to train bigger models. Bigger models trained on more data generalize better.
• Low-resource MT requires big compute! Remember that about 100 monolingual sentences give the same training signal as a single pair of parallel sentences.
Outline
MODEL
DATA
ANALYSIS
life of a researcher
“The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al. EMNLP 2019
“Phrase-based & Neural Unsup MT” Lample et al. EMNLP 2018
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
“Massively Multilingual NMT” Aharoni et al., ACL 2019
“Multilingual Denoising Pre-training for NMT” Liu et al., arXiv:2001.08210, 2020
“Analyzing uncertainty in NMT” Ott et al. ICML 2018
“On the evaluation of MT systems trained with back-translation” Edunov et al. ACL 2020
“The source-target domain mismatch problem in MT” Shen et al. arXiv:1909.13151, 2019
Simulating Low-Resource MT
Simulating low-resource MT with a high-resource language: using EuroParl data with 20K parallel sentences and 100K monolingual target sentences.
EuroParl Fr → En
only parallel data: 30.4 BLEU
parallel data + BT: 33.8 BLEU (+3.4 BLEU!)
https://www.statmt.org/europarl/
A Worrisome Finding
BT sometimes yields very mild improvements.
Example: FB public posts En → My
only parallel data: 15.2 BLEU
parallel data + BT: 15.3 BLEU (+0.1 BLEU!)
“The FLoRes evaluation datasets…” Guzmán, Chen et al. EMNLP 2019
Why is BT not working as well?
Sports
• football • baseball • basketball
• soccer • cricket • rowing
For the same topic: different distribution over words!
Food
• hamburger • clam chowder • apple pie
• kimchi • bulgogi • bibimbap
For the same topic: different distribution over words!
Domains differ in the topic distribution
• politics • sports
• religion • environment
En-Si
original
translation
Guzmàn, Chen et al. “The FLoRes evaluation datasets for low resource MT…” EMNLP 2019
Wikipedia in Sinhala has different topic distribution.
Source-Target Domain Mismatch (STDM)
• Def.: Content produced in blogs, social networks, news outlets, etc. varies with the geographic location.
• STDM is even more pronounced in low-resource MT, where source & target geographic locations are typically farther apart and cultures have more distinct traits.
• STDM manifests itself as a different distribution of topics and, for the same topic, a different distribution over words.
• STDM is hard to measure because of the unknown effect of translationese.
• STDM makes the MT problem even harder: source-originating data and target-originating data are not comparable in general. BT won’t work as well. UnsupMT won’t work as well.
“Language and place” Johnstone, Cambridge Univ. Press 2010
Leech et al. “Computer corpora: What do they tell us about culture?” Journal Computers in English Linguistics 1992
Questions
• Is it true that BT is less effective when there is STDM?
• What baselines shall we consider when there is STDM?
• What are general best practices when there is STDM?
• How to study STDM in a controlled setting?
Controlled Setting
Source Language: Fr, Target Language: En
Source Domain: EuroParl
Target Domain: OpenSubtitles
[Diagram: each domain has monolingual data (mono) and human translations; sizes shown: 10K sentences, <1M sentences, <1M sentences.]
Controlled Setting
Source Language: Fr, Target Language: En
[Diagram: the target-domain data becomes a mixture, $(1-\alpha)$ EuroParl + $\alpha$ OpenSubtitles, shown here with $\alpha = 0$. Labels: Source Domain EuroParl (mono), Target Domain OpenSubtitles (mono).]
Controlled Setting
Source Domain EuroParl
Target Domain
mono
mono
Source Language: Fr Target Language: En
63
Controlled Setting: intermediate value of α
Source Language: Fr. Target Language: En.
Source Domain: EuroParl
Target Domain: α EuroParl + (1 - α) OpenSubtitles
[Figure: EuroParl is shared by the source and target domains; the rest of the target domain is OpenSubtitles; each side has monolingual (mono) data]
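The mixture that defines the target domain can be illustrated with a small sampling routine. This is only a sketch under stated assumptions: the function name `sample_mixture`, the placeholder sentences, and the per-sentence coin flip are hypothetical and not taken from the talk. It shows that α = 0 yields only OpenSubtitles data and α = 1 yields only EuroParl data.

```python
import random

def sample_mixture(europarl, opensubtitles, alpha, n, seed=0):
    """Draw n sentences; each one comes from `europarl` with probability
    alpha and from `opensubtitles` otherwise (hypothetical helper)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # rng.random() lies in [0, 1): alpha=0 never picks EuroParl,
        # alpha=1 always picks it.
        pool = europarl if rng.random() < alpha else opensubtitles
        out.append(rng.choice(pool))
    return out

# Placeholder corpora standing in for the real datasets.
europarl = ["ep_%d" % i for i in range(100)]
opensubtitles = ["os_%d" % i for i in range(100)]

all_os = sample_mixture(europarl, opensubtitles, alpha=0.0, n=50)
all_ep = sample_mixture(europarl, opensubtitles, alpha=1.0, n=50)
mixed = sample_mixture(europarl, opensubtitles, alpha=0.5, n=1000)
```

Sampling per sentence keeps the expected proportion at α; a fixed split of exactly α·n sentences would be an equally reasonable choice.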
Target Domain: α EuroParl + (1 - α) OpenSubtitles; with α = 1, the target domain is EuroParl, the same as the source domain.
• Back-Translation: target mono data
• Self-Training: source mono data (in-domain mono data)
Q.: Is it better to have clean targets but out-of-domain data, or noisy targets but in-domain data?
Q.: What’s the effect of the amount of parallel/monolingual data?
Q.: What’s the effect of the quality of the forward model when training with ST?
Varying Domain of Target Originating Data
[Figure: results for different values of α in the mixture α EuroParl + (1 - α) OpenSubtitles. Labels: "Target originating data is out-of-domain", "Target originating data is in-domain"]
Baseline Approaches
• Bitext only: learn p(y|x) on the parallel data (Source → Target).
• Back-Translation:
  1) learn p(x|y) on the parallel data;
  2) apply p(x|y) to target mono data to obtain model translations;
  3) learn p(y|x) on the parallel data plus the (model translation, target mono) pairs.
• Self-Training:
  1) learn p(y|x) on the parallel data;
  2) apply p(y|x) to source mono data to obtain model translations;
  3) re-learn p(y|x) on the parallel data plus the (source mono, model translation) pairs.
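The baseline recipes (learn, apply, re-learn) can be sketched with a toy word-for-word "model" in place of a real NMT system. Everything here is an assumption made for illustration: `learn`, `translate`, `back_translation`, and `self_training` are hypothetical names, and the dictionary lookup only mimics the data flow, not the quality, of a trained seq2seq model.

```python
def learn(pairs):
    """Toy stand-in for NMT training: memorize a word-to-word table from
    sentence pairs with matching token counts (illustration only)."""
    table = {}
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        if len(s) == len(t):
            for a, b in zip(s, t):
                table.setdefault(a, b)
    return table

def translate(model, sentence):
    """Translate word by word; unknown words are copied unchanged."""
    return " ".join(model.get(w, w) for w in sentence.split())

def back_translation(bitext, target_mono):
    backward = learn([(y, x) for x, y in bitext])            # 1) learn p(x|y)
    synthetic = [(translate(backward, y), y) for y in target_mono]  # 2) apply
    return learn(bitext + synthetic)                         # 3) learn p(y|x)

def self_training(bitext, source_mono):
    forward = learn(bitext)                                  # 1) learn p(y|x)
    synthetic = [(x, translate(forward, x)) for x in source_mono]   # 2) apply
    return learn(bitext + synthetic)                         # 3) re-learn p(y|x)

# Tiny word-aligned bitext (source, target), chosen so the toy model works.
bitext = [("la vie", "the life"), ("est belle", "is beautiful")]
bt_model = back_translation(bitext, target_mono=["life is beautiful"])
st_model = self_training(bitext, source_mono=["la vie est belle"])
bt_out = translate(bt_model, "la vie est belle")
st_out = translate(st_model, "la vie est belle")
```

Note the asymmetry the sketch makes visible: back-translation pairs model-generated sources with human targets, while self-training pairs human sources with model-generated targets.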
Baseline Approaches: Only In-Domain Data
p(y|x)
In-Domain Only VS. Mixed Domain
Varying Amount of Monolingual Data (α = 0)
Conclusion
• STDM is particularly significant in low resource language pairs.
• The controlled setting helps study STDM in isolation.
• ST is more robust than BT to STDM. We have already seen that combining ST & BT worked best in En-My.
• In practice, the influence of STDM depends on several factors, such as the amount of parallel and monolingual data, the domains, etc. In particular, if the domains are not too distinct, STDM may even help with regularization!
What I did not talk about: Filtering
• Data: extract clean version of common crawl.
• Learn a joint embedding space.
• Use approximate nearest neighbor methods to find closest matching sentence in embedding space.
• On several languages, translation quality is even higher than when using the actual existing bilingual + mono data!
Schwenk et al. “CCMatrix: mining billions of high-quality parallel sentences on the WEB” arXiv:1911.04944 2019
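The mining step (embed sentences in a joint space, then match each sentence to its nearest neighbor) can be sketched as follows. This is a minimal sketch, not the cited paper's method: the hand-made 3-dimensional vectors stand in for a real multilingual sentence encoder, the search is exact brute force rather than approximate nearest neighbor, and the plain cosine threshold in the hypothetical `mine_pairs` is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mine_pairs(src_emb, tgt_emb, threshold=0.9):
    """For each source sentence, keep its nearest target sentence if the
    cosine similarity reaches `threshold` (brute-force search)."""
    pairs = []
    for s, u in src_emb.items():
        best, score = max(((t, cosine(u, v)) for t, v in tgt_emb.items()),
                          key=lambda item: item[1])
        if score >= threshold:
            pairs.append((s, best, round(score, 3)))
    return pairs

# Hypothetical embeddings; a real system would compute these with an encoder.
src_emb = {"la vie est belle": [0.9, 0.1, 0.0], "bonjour": [0.0, 1.0, 0.1]}
tgt_emb = {"life is beautiful": [1.0, 0.1, 0.0], "the cat sleeps": [0.0, 0.0, 1.0]}
mined = mine_pairs(src_emb, tgt_emb)
```

At web scale the brute-force loop is replaced by an approximate nearest neighbor index, which is what makes mining billions of sentences feasible.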
[Figure labels: BT, pre-training, multilingual, filtering]
Final Remarks
• Low resource MT is a good use case for several long-standing ML problems: aligning domains, learning with less supervision, leveraging compositionality, etc.
• The importance and difficulty of data collection should not be underestimated.
• A healthy cycle of research: data, modeling, analysis.
• Low resource MT key idea: use as many auxiliary tasks and as much auxiliary data as possible.
• Low resource MT requires lots of data and compute.
• Lots of open challenges.
  • Specific to low resource: dealing with all sorts of domain mismatch, learning from little data, quality of evaluation sets…
  • General to text generation: better use of context, common sense, striking a good trade-off between accuracy and speed, controllability, safety, biases…
Questions? Вопросы?
¿Preguntas? Domande?
Guillaume Lample
Ludovic Denoyer
Alexis Conneau
Hervé Jégou
Myle Ott
Michael Auli
Sergey Edunov
Peng-Jen Chen
Matt Le
Jiajun Shen
Juan-Miguel Pino
Jiatao Gu
Philipp Koehn
Paco Guzmán
Vishrav Chaudhary
Xian Li
Junxian He
Naman Goyal