
Unsupervised Machine Translation

Mikel Artetxe

Facebook AI Research

Introduction

WMT 2019 English-German

    System                Score
    Facebook FAIR         0.347
    Microsoft sent-doc    0.311
    Microsoft doc-level   0.296
    HUMAN                 0.240
    MSRA-MADL             0.214
    UCAM                  0.213
    ⋮                     ⋮

Super-human performance: human evaluators preferred MT output to the human translation!

The winning NMT system was trained on 27.7M parallel sentence pairs, i.e. sentences paired with their translations:

    In my opinion, this exposes a serious inconsistency in industrial…       En mi opinión, esto representa una grave inconsistencia entre…
    In my opinion the issues mentioned previously in the Council's…          En mi opinión, estos aspectos han sido tenidos debidamente…
    In my opinion, it is clearly correct and natural to discuss issues…      En mi opinión es perfectamente correcto y natural debatir los…
    In my opinion, however, we could also achieve adequate flood…            En mi opinión también podríamos conseguir una protección…
    In my opinion, it is a duty of the Presidency of the European…           A mi juicio, es responsabilidad de la Presidencia de la Unión…
    In my opinion, however, this report does not have acceptable…            En mi opinión, no obstante, el presente informe no ofrece…
    In my opinion, the new liquidity requirements will help to solve…        En mi opinión, los nuevos requisitos de liquidez contribuirán a…
    In my opinion, it is really a very good thing that Mr Potočnik…          En mi opinión, que el señor Potočnik haya sido tan inequívoco…
    In my opinion, this social dimension should, in fact, be brought into…   En mi opinión, …

We would need 48 years to read that! In other words, state-of-the-art NMT is:
- Not sample efficient
- Not available for most language pairs

Can we train MT systems with zero parallel data? The question is of both scientific and practical interest. It was previously explored in statistical decipherment (Knight et al., ACL'06; Ravi & Knight, ACL'11; Dou et al., ACL'15; inter alia), but with strong limitations. There has been impressive progress in the last 2-3 years!

Outline

From two non-parallel corpora to translation (the "magic" in between is what this talk unpacks):

    L1 corpus + L2 corpus (non-parallel)
        → L1 embeddings + L2 embeddings
        → cross-lingual embeddings
        → machine translation

Word embeddings

Distributed representations of words.

[Figure: a 2D embedding space in which related words cluster together: Dog/Bark, cat/Meow, Cow/Moo, Apple/Banana/Pear, House, Calendar.]

Geometric proximity tracks semantic similarity:

$$\mathrm{sim}(\textit{cow}, \textit{cat}) \approx \cos(w_{cow}, w_{cat}) = \frac{w_{cow} \cdot w_{cat}}{\lVert w_{cow} \rVert \, \lVert w_{cat} \rVert}$$

The vectors are learned from co-occurrence patterns in a monolingual corpus, e.g. skip-gram with negative sampling (Mikolov et al., NIPS'13). For a training sentence such as

    I will go to New York by plane .

let w be the current word and c one of its surrounding context words, and maximize

$$\log \sigma(w \cdot c) + \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n} \left[ \log \sigma(-w \cdot c_i) \right]$$

i.e. the word vector should score high with its true contexts and low with k negative contexts sampled from a noise distribution $P_n$.
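To make the objective concrete, here is a minimal numpy sketch of the negative-sampling score for a single (word, context) pair, together with the cosine similarity defined above. The function names and random toy vectors are ours, not from the talk; real training maximizes this objective over a corpus by gradient ascent.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_objective(w, c_pos, c_negs):
        # log sigma(w . c) + sum_i log sigma(-w . c_i)
        # for one (word, context) pair and k negative contexts c_i ~ P_n.
        score = np.log(sigmoid(w @ c_pos))
        for c_neg in c_negs:
            score += np.log(sigmoid(-(w @ c_neg)))
        return score  # training maximizes this in expectation over the corpus

    def cosine_sim(a, b):
        # sim(cow, cat) ~= cos(w_cow, w_cat)
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Toy usage with random 300-d vectors (illustrative only):
    rng = np.random.default_rng(0)
    w, c = rng.normal(size=300), rng.normal(size=300)
    negs = rng.normal(size=(5, 300))  # k = 5 negatives from the noise distribution
    print(sgns_objective(w, c, negs), cosine_sim(w, c))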


Cross-lingual word embedding alignment

[Figure: two independently trained embedding spaces. X contains Basque place names (Donostia, Bilbo, Gasteiz, Iruñea, Maule, D. Garazi, Baiona); Z contains their Spanish counterparts (S. Sebastián, Bilbao, Vitoria, Pamplona, Mauléon, S. J. Pie de Puerto, Bayona). Given a small seed dictionary (Bilbo–Bilbao, Baiona–Bayona, Iruñea–Pamplona), a linear mapping W is learned so that the mapped space XW overlays Z, aligning the remaining words as well.]

Mikolov et al. (arXiv'13)

[Figure: the same idea on word embeddings. X is a Basque space (Zaunka, Miau, Marru, Behi, Txakur, Katu, Etxe, Sagar, Udare, Banana, Egutegi); Z is an English space (Bark, Meow, Moo, Dog, cat, Cow, House, Apple, Pear, Banana, Calendar); applying W brings translation pairs close together in XW.]

Given a seed dictionary (Txakur–Dog, Sagar–Apple, …, Egutegi–Calendar), stack the embeddings of the dictionary pairs row by row and require

$$\begin{pmatrix} X_{1,*} \\ X_{2,*} \\ \vdots \\ X_{n,*} \end{pmatrix} W \approx \begin{pmatrix} Z_{1,*} \\ Z_{2,*} \\ \vdots \\ Z_{n,*} \end{pmatrix}$$

i.e. learn the mapping by least squares:

$$\arg\min_{W} \sum_{i} \lVert X_{i*} W - Z_{i*} \rVert^2$$
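A sketch of how that mapping can be computed, assuming `X_dict` and `Z_dict` are numpy matrices holding the embeddings of the seed dictionary pairs, one pair per row (the names are ours). The unconstrained solution is the one from Mikolov et al. (2013); the orthogonal Procrustes variant anticipates the orthogonality constraint that appears in the unsupervised formulation below.

    import numpy as np

    def learn_mapping_lstsq(X_dict, Z_dict):
        # Unconstrained least squares, as in Mikolov et al. (2013):
        # W = argmin_W sum_i ||X_dict[i] @ W - Z_dict[i]||^2, closed form.
        W, *_ = np.linalg.lstsq(X_dict, Z_dict, rcond=None)
        return W

    def learn_mapping_procrustes(X_dict, Z_dict):
        # Orthogonality-constrained variant used by much follow-up work:
        # restricting W to be orthogonal gives the closed-form Procrustes
        # solution W = U @ Vt from the SVD of X^T Z.
        U, _, Vt = np.linalg.svd(X_dict.T @ Z_dict)
        return U @ Vt

    # e.g. X_dict[0] = vec("txakur") and Z_dict[0] = vec("dog")  (hypothetical data)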

What if we have no seed dictionary at all? Then the mapping and the word correspondences must be found jointly:

$$W^* = \arg\min_{W \in \mathcal{O}(d)} \sum_{i} \min_{j} \lVert X_{i*} W - Z_{j*} \rVert^2$$

where W is now constrained to be orthogonal. Self-learning optimizes this by alternating two steps:

    Dictionary → Mapping → Dictionary → Mapping → ⋯

that is, solve for the best mapping under the current dictionary, then re-induce the dictionary by matching each mapped source word to its nearest target word. The loop can be seeded with very weak supervision:
- 25 word pairs
- a numeral list (numerals are written identically across many languages)
- a random dictionary — a much harder starting point, which motivates the robust variant below
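A compact sketch of that loop, assuming length-normalized embedding matrices and seed index arrays; all names are ours, and the real systems add the robustness tricks described next.

    import numpy as np

    def self_learning(X, Z, seed_src, seed_trg, iters=10):
        # Alternate the two steps of the self-learning loop:
        #   (1) Mapping: best orthogonal W for the current dictionary (Procrustes).
        #   (2) Dictionary: re-induce entries by nearest-neighbour matching under W.
        # X, Z are full length-normalized embedding matrices; seed_src/seed_trg
        # are index arrays for the seed pairs (25 word pairs, numerals, ...).
        src, trg = np.asarray(seed_src), np.asarray(seed_trg)
        for _ in range(iters):
            U, _, Vt = np.linalg.svd(X[src].T @ Z[trg])
            W = U @ Vt
            # Match every source word to its closest target word (plain dot-product
            # retrieval here; the robust variant swaps in CSLS and adds stochasticity).
            src = np.arange(X.shape[0])
            trg = np.argmax((X @ W) @ Z.T, axis=1)
        return W, src, trg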

Fully unsupervised initialization: intra-lingual similarity distributions. The distribution of a word's similarities to the rest of its own vocabulary is roughly preserved across languages. For each word, compute

    for x in vocab:
        sim("two", x)

[Figure: the sorted similarity distribution of English "two" closely matches that of Italian "due" (two), but not that of "cane" (dog).]

Concretely, each word is represented by the sorted row of its intra-lingual similarity matrix:

$$X' = \mathrm{sorted}(X X^T) \qquad Z' = \mathrm{sorted}(Z Z^T)$$

Matching these signatures across languages yields an initial (noisy) dictionary (see the sketch after the list below), which is then refined with robust self-learning:
- Stochastic dictionary induction
- Frequency-based vocabulary cutoff
- CSLS retrieval (Conneau et al., ICLR'18)
- Bidirectional dictionary induction
- Final symmetric re-weighting
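A simplified sketch of the signature computation referenced above. It assumes both vocabularies have been cut to the same size (the frequency-based cutoff) and leaves out the normalization details of the actual method; `initial_dictionary` is a hypothetical matching helper, not the paper's exact procedure.

    import numpy as np

    def similarity_signature(E):
        # Represent each word by the *sorted* row of its intra-lingual
        # similarity matrix, M' = sorted(M M^T). Rows of E are assumed
        # length-normalized embeddings.
        sims = E @ E.T
        return np.sort(sims, axis=1)

    def initial_dictionary(X, Z):
        # Pair each source signature with its most similar target signature;
        # the resulting noisy dictionary seeds robust self-learning.
        Xp, Zp = similarity_signature(X), similarity_signature(Z)
        return np.argmax(Xp @ Zp.T, axis=1)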

Cross-lingual word embedding alignment: results (word translation accuracy, %)

    Supervision     Method                           EN-IT    EN-DE    EN-FI    EN-ES
    5k dict.        Mikolov et al. (2013)            34.93†   35.00†   25.91†   27.73†
                    Faruqui and Dyer (2014)          38.40*   37.13*   27.60*   26.80*
                    Shigeto et al. (2015)            41.53†   43.07†   31.04†   33.73†
                    Dinu et al. (2015)               37.7     38.93*   29.14*   30.40*
                    Lazaridou et al. (2015)          40.2     -        -        -
                    Xing et al. (2015)               36.87†   41.27†   28.23†   31.20†
                    Zhang et al. (2016)              36.73†   40.80†   28.16†   31.07†
                    Artetxe et al. (2016)            39.27    41.87*   30.62*   31.40*
                    Smith et al. (2017)              43.1     43.33†   29.42†   35.13†
                    Artetxe et al. (2018a)           45.27    44.13    32.94    36.60
    25 dict.        Artetxe et al. (2017)            37.27    39.60    28.16    -
    Init. heurist.  Smith et al. (2017), cognates    39.9     -        -        -
                    Artetxe et al. (2017), num.      39.40    40.27    26.47    -
    None            Zhang et al. (2017), λ = 1       0.00*    0.00*    0.00*    0.00*
                    Zhang et al. (2017), λ = 10      0.00*    0.00*    0.01*    0.01*
                    Conneau et al. (2018), code‡     45.15*   46.83*   0.38*    35.38*
                    Conneau et al. (2018), paper‡    45.1     0.01*    0.01*    35.44*
                    Artetxe et al. (2018b)           48.13    48.19    32.63    37.33

Note how earlier unsupervised methods can fail catastrophically on some pairs (e.g. EN-FI), while the robust method (Artetxe et al., 2018b) matches or beats systems supervised with a 5k-entry dictionary.

Outline

With cross-lingual embeddings in place, the remaining step is going from embeddings to full machine translation. Two routes:
- Neural approach
- Statistical approach

Unsupervised neural machine translation

Training. A supervised system would learn from parallel pairs like

    Une fusillade a eu lieu à l'aéroport international de Los Angeles.
    → There was a shooting in Los Angeles International Airport.

but we have no such pairs. Instead, a shared encoder-decoder (initialized with the cross-lingual embeddings) is trained with two monolingual objectives:

- Denoising: reconstruct a sentence from a corrupted version of itself, e.g.

    Une lieu fusillade a eu à l'aéroport de Los international Angeles.
    → Une fusillade a eu lieu à l'aéroport international de Los Angeles.

- Backtranslation: translate a sentence into the other language with the current (initially poor) model, e.g.

    Une fusillade a eu lieu à l'aéroport international de Los Angeles.
    → A shooting has had place in airport international of Los Angeles.

  and then train the reverse direction to recover the original sentence from this synthetic translation.
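A schematic Python sketch of one training step combining the two objectives. The `model` object and its `.loss`/`.translate` interface are assumptions for illustration; real systems implement this with a shared encoder-decoder network and add word drops and other noise to the corruption step.

    import random

    def noise(sentence, p_swap=0.3):
        # Corrupt a sentence by locally swapping adjacent words, e.g.
        # "Une fusillade a eu lieu ..." -> "Une lieu fusillade a eu ..."
        words = sentence.split()
        for i in range(len(words) - 1):
            if random.random() < p_swap:
                words[i], words[i + 1] = words[i + 1], words[i]
        return " ".join(words)

    def train_step(model, batch_l1, batch_l2):
        # One unsupervised step over two monolingual batches; no parallel
        # data appears anywhere.
        total = 0.0
        for lang, other, batch in (("l1", "l2", batch_l1), ("l2", "l1", batch_l2)):
            for sent in batch:
                # Denoising: reconstruct the sentence from a corrupted copy.
                total += model.loss(noise(sent), sent, src_lang=lang, tgt_lang=lang)
                # Backtranslation: translate with the current model, then learn
                # to map the synthetic translation back to the original.
                synthetic = model.translate(sent, src_lang=lang, tgt_lang=other)
                total += model.loss(synthetic, sent, src_lang=other, tgt_lang=lang)
        return total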


L1 corpus L2 corpus

Phrase-based SMT

L1 corpus L2 corpus

Phrase-based SMT

Phrase-based SMT

L1 corpus L2 corpus

Phrase-based SMT

Phrase-based SMTLog-linear model combining

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Phrase-based SMTLog-linear model combining

- Phrase table

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Phrase-based SMTLog-linear model combining

- Phrase table

nire iritziz in my opinion

nire iritziz in my view

nire iritziz I think

opari bat a present

opari bat one present

opari bat a gift

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities

𝝓(I𝒇|I𝒆) 𝝓(I𝒆|I𝒇)nire iritziz in my opinion 0.54 0.63

nire iritziz in my view 0.32 0.68

nire iritziz I think 0.11 0.09

opari bat a present 0.32 0.56

opari bat one present 0.14 0.73

opari bat a gift 0.11 0.49

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

𝝓(I𝒇|I𝒆) 𝝓(I𝒆|I𝒇) 𝐥𝐞𝐱(I𝒇|I𝒆) 𝐥𝐞𝐱(I𝒆|I𝒇)nire iritziz in my opinion 0.54 0.63 0.12 0.15

nire iritziz in my view 0.32 0.68 0.09 0.16

nire iritziz I think 0.11 0.09 0.04 0.02

opari bat a present 0.32 0.56 0.21 0.22

opari bat one present 0.14 0.73 0.18 0.32

opari bat a gift 0.11 0.49 0.11 0.13

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

learn each component

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

learn each component

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

learn each component

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

learn each component

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

Unsupervised phrase-based SMT
Learn components from monolingual corpora

- Phrase table: TRICKY… (direct/inverse translation probabilities, direct/inverse lexical weightings)
- Language model: EASY!!! (n-gram frequency counts with back-off and smoothing, learned directly from each monolingual corpus)
- Reordering model: EASY!!! (keep the distance-based distortion model; the lexical reordering model is dropped for now)
- Word/phrase penalty: EASY!!! (a fixed score to control the length of the output, so nothing to learn)

[Diagram: the L1 and L2 monolingual corpora yield L1 and L2 n-gram embeddings, connected through a cross-lingual mapping from which the phrase table is induced; the distortion model, language model and word/phrase penalty are obtained directly.]
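The cross-lingual mapping step aligns the two independently trained embedding spaces. The actual methods are the unsupervised self-learning of Artetxe et al. (ICLR'18) and the adversarial approach of Lample et al. (ICLR'18); as a minimal illustration of the core operation both build on, here is one orthogonal (Procrustes) mapping step given some dictionary of index pairs (the embedding matrices below are random stand-ins):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))  # L1 n-gram embeddings (stand-in)
Z = rng.normal(size=(1000, 300))  # L2 n-gram embeddings (stand-in)

# Seed dictionary as index pairs; in self-learning this starts from a weak
# initialization and is re-induced after every mapping step.
dictionary = [(i, i) for i in range(500)]
src = np.array([s for s, _ in dictionary])
trg = np.array([t for _, t in dictionary])

# Orthogonal Procrustes: W = argmin ||X[src] @ W - Z[trg]||, W orthogonal.
U, _, Vt = np.linalg.svd(X[src].T @ Z[trg])
W = U @ Vt

mapped = X @ W  # L1 embeddings in the shared space
# Nearest neighbors of mapped[i] among Z now serve as translation candidates.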

[Slide example: the sentence "I will go to New York by plane ." annotated with a word 𝑤, a context 𝑐 and a phrase 𝑝 (animation residue illustrating how the n-gram, i.e. phrase, embeddings are trained on monolingual text before being mapped cross-lingually).]

For each source phrase ē, estimate the translation probability φ(f̄|ē) over its 100 nearest-neighbor target phrases in the mapped embedding space:

$$\phi(\bar f \mid \bar e) = \frac{\exp\big(\cos(\bar e, \bar f)/\tau\big)}{\sum_{\bar f'} \exp\big(\cos(\bar e, \bar f')/\tau\big)}$$

where the temperature τ is tuned by maximum likelihood over nearest-neighbor pairs induced in the reverse direction:

$$\min_\tau \; -\sum_{\bar f} \log \phi\big(\bar f \mid \mathrm{NN}_{\bar e}(\bar f)\big) \;-\; \sum_{\bar e} \log \phi\big(\bar e \mid \mathrm{NN}_{\bar f}(\bar e)\big)$$
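A minimal numpy sketch of this estimation (the embeddings are random stand-ins; a real system would use the mapped n-gram embeddings from the previous step, and the value of τ here is illustrative rather than tuned):

import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(5000, 300))   # source (L1) n-gram embeddings, mapped
F = rng.normal(size=(5000, 300))   # target (L2) n-gram embeddings
E /= np.linalg.norm(E, axis=1, keepdims=True)
F /= np.linalg.norm(F, axis=1, keepdims=True)

def translation_probs(e_idx, tau=0.1, k=100):
    """phi(f|e): softmax over cosine similarity, restricted to the 100-NN."""
    sims = F @ E[e_idx]                 # cosine similarities (unit vectors)
    nn = np.argpartition(-sims, k)[:k]  # the 100 nearest target phrases
    logits = sims[nn] / tau
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return dict(zip(nn.tolist(), probs.tolist()))

probs = translation_probs(e_idx=42)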

With this, the phrase table goes from TRICKY… to EASY!!! as well: every component of the phrase-based SMT system can now be learned from monolingual corpora alone.

Unsupervised phrase-based SMT: tuning

Goal: adjust the weights of the resulting log-linear model to optimize some evaluation metric (e.g. BLEU)

… but we don't have a parallel development set!

Solutions:
- Default weights without tuning (Lample et al., EMNLP'18)
- Alternating optimization with back-translation (Artetxe et al., EMNLP'18)
- Unsupervised optimization criterion (Artetxe et al., ACL'19):

$$L = L_{cycle}(E) + L_{cycle}(F) + L_{lm}(E) + L_{lm}(F)$$

$$L_{cycle}(E) = 1 - \mathrm{BLEU}\big(T_{F\to E}(T_{E\to F}(E)),\, E\big)$$

$$L_{lm}(E) = \max\big(0,\, H(F) - H(T_{E\to F}(E))\big)^2 \cdot \mathrm{LP}$$

$$\mathrm{LP} = \mathrm{LP}(E) \cdot \mathrm{LP}(F), \qquad \mathrm{LP}(E) = \max\Big(1,\, \frac{\mathrm{len}\big(T_{F\to E}(T_{E\to F}(E))\big)}{\mathrm{len}(E)}\Big)$$

(L_cycle rewards round-trip consistency; L_lm penalizes translations whose language-model entropy drops below that of real text in the target language, a symptom of degenerate output, scaled by the length penalty LP.)
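A sketch of evaluating this criterion for one weight setting, assuming translate_ef/translate_fe are the two SMT systems and entropy_e/entropy_f return the average language-model entropy of a text (all four are hypothetical stand-ins; the real method plugs this loss into a MERT-style weight search). The BLEU part uses the sacrebleu package:

import sacrebleu

def unsupervised_loss(E, F, translate_ef, translate_fe, entropy_e, entropy_f):
    """Unsupervised tuning criterion in the style of Artetxe et al. (ACL'19).

    E, F are monolingual dev sets (lists of sentences); the callables are
    hypothetical stand-ins for the SMT systems and language models."""
    e2f, f2e = translate_ef(E), translate_fe(F)              # one-way
    cycle_e, cycle_f = translate_fe(e2f), translate_ef(f2e)  # round trips

    # Cycle-consistency losses: 1 - BLEU of the round trip vs. the input.
    l_cycle_e = 1 - sacrebleu.corpus_bleu(cycle_e, [E]).score / 100
    l_cycle_f = 1 - sacrebleu.corpus_bleu(cycle_f, [F]).score / 100

    # Length penalty from the round-trip length ratio (word counts).
    def length(sents):
        return sum(len(s.split()) for s in sents)
    lp = max(1.0, length(cycle_e) / length(E)) * \
         max(1.0, length(cycle_f) / length(F))

    # LM losses: penalize translations whose entropy falls below that of
    # real text in the target language (degenerate, overly "easy" output).
    l_lm_e = max(0.0, entropy_f(F) - entropy_f(e2f)) ** 2 * lp
    l_lm_f = max(0.0, entropy_e(E) - entropy_e(f2e)) ** 2 * lp

    return l_cycle_e + l_cycle_f + l_lm_e + l_lm_f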

Unsupervised phrase-based SMT: refinement

The initial system is refined through iterative back-translation: the L1 → L2 SMT system translates the L1 training corpus into a synthetic L2' corpus, and the resulting pseudo-parallel data is used, together with unsupervised tuning, to train a better L2 → L1 SMT system (and symmetrically in the other direction).

Let's look inside!

Training on this synthetic parallel corpus re-estimates the phrase table, but with a caveat:
- p(l̄1|l̄2), estimated over real L1 targets, is reliable
- p(l̄2|l̄1) would be estimated over machine-translated L2' text: spurious targets!

The fix: train in both directions (L1 train → L2' train and L2 train → L1' train) and keep only the reliable half of each phrase table, so that both p(l̄1|l̄2) and p(l̄2|l̄1) come from real targets. The refined systems also learn a lexical reordering model (dropped in the initial system), and the weights are re-adjusted with unsupervised tuning at every iteration.
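A sketch of this refinement loop under the assumptions above (train_smt, translate and tune_unsupervised are hypothetical stand-ins for Moses-style training, decoding and the unsupervised tuning step):

def refine(l1_corpus, l2_corpus, smt_12, smt_21,
           train_smt, translate, tune_unsupervised, iterations=3):
    """Iterative back-translation refinement (sketch).

    smt_12/smt_21 are the initial L1->L2 and L2->L1 systems; the three
    callables are hypothetical stand-ins for the real SMT components."""
    for _ in range(iterations):
        l2_synth = translate(smt_12, l1_corpus)  # L1 train -> L2' train
        l1_synth = translate(smt_21, l2_corpus)  # L2 train -> L1' train
        # Train each direction on the corpus whose TARGET side is real text,
        # avoiding the spurious-targets problem described above.
        smt_21 = tune_unsupervised(train_smt(src=l2_synth, trg=l1_corpus))
        smt_12 = tune_unsupervised(train_smt(src=l1_synth, trg=l2_corpus))
    return smt_12, smt_21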

but…

NMT >> SMT: (unsupervised) SMT has a hard ceiling!

NMT hybridization

The unsupervised SMT systems are used to bootstrap NMT: the L1 → L2 SMT back-translates the L1 training corpus into a synthetic L2' corpus to train an L2 → L1 NMT system, and the L2 → L1 SMT produces an L1' corpus to train an L1 → L2 NMT system. Training then continues with the NMT systems back-translating for each other, progressively replacing the SMT-generated data. At training step t, out of N synthetic sentence pairs:

$$N_{\mathrm{smt}} = N \cdot \max(0,\, 1 - t/a), \qquad N_{\mathrm{nmt}} = N - N_{\mathrm{smt}}$$

so training starts purely on SMT back-translations and, after a steps, relies entirely on NMT back-translation.
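A minimal sketch of this annealing schedule (the values of N and a below are illustrative, not from the paper):

def hybrid_schedule(t, N=10000, a=30000):
    """Number of SMT- vs NMT-generated synthetic pairs at training step t."""
    n_smt = int(N * max(0.0, 1.0 - t / a))
    return n_smt, N - n_smt

for t in (0, 15000, 30000, 45000):
    print(t, hybrid_schedule(t))  # SMT share decays linearly; NMT takes over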

Outline
- Statistical approach
- Neural approach

[Diagram: non-parallel L1 and L2 corpora → L1/L2 embeddings → cross-lingual embeddings → machine translation.]

General recipe: initialization + training

Training:
- Denoising (see the sketch below)
- Back-translation
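Denoising here means training the model to reconstruct a sentence from a corrupted version of itself. A minimal sketch of a typical noise function, random word drops plus local shuffling (the specific noise parameters are illustrative, not the exact ones from any paper):

import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=random.Random(0)):
    """Corrupt a sentence for denoising training: drop words at random and
    locally shuffle the remainder within a small window."""
    kept = [t for t in tokens if rng.random() > drop_prob] or tokens[:1]
    # Local shuffle: sort by position plus a bounded random offset.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

print(add_noise("I will go to New York by plane .".split()))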

Initialization:
- Cross-lingual word embeddings (Artetxe et al., ICLR'18; Lample et al., ICLR'18)
- Deep pretraining (Conneau & Lample, NeurIPS'19; Song et al., ICML'19; Liu et al., arXiv'20)
  - Pretrain the full network through masked language modeling (e.g., multilingual BERT, XLM, MASS, mBART)
  - Works well, but very expensive to train
  - Masking can be seen as a form of noise!
- Synthetic parallel corpus (Marie & Fujita, arXiv'18; Artetxe et al., ACL'19)
  - Basic: word-for-word translation using cross-lingual word embeddings (see the sketch below)
  - Better: unsupervised statistical machine translation
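A minimal sketch of the "basic" option above: word-for-word translation with mapped cross-lingual word embeddings, used only to synthesize a rough parallel corpus for initialization (the vocabularies and embeddings below are toy stand-ins):

import numpy as np

# Toy stand-ins: in practice these are full vocabularies with embeddings
# already mapped into a shared cross-lingual space.
src_vocab = ["ich", "gehe", "nach", "hause"]
trg_vocab = ["i", "go", "to", "home", "house"]
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(len(src_vocab), 50))
trg_emb = rng.normal(size=(len(trg_vocab), 50))
src_emb /= np.linalg.norm(src_emb, axis=1, keepdims=True)
trg_emb /= np.linalg.norm(trg_emb, axis=1, keepdims=True)

def word_for_word(sentence):
    """Translate each word to its nearest cross-lingual neighbor."""
    out = []
    for w in sentence.split():
        if w not in src_vocab:
            out.append(w)  # copy unknown words through unchanged
            continue
        sims = trg_emb @ src_emb[src_vocab.index(w)]
        out.append(trg_vocab[int(np.argmax(sims))])
    return " ".join(out)

print(word_for_word("ich gehe nach hause"))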

Results

- Languages: French-English, German-English
- Training: WMT-14 News Crawl
- Test set: WMT-14 newstest (BLEU)

                                                               FR-EN  EN-FR  DE-EN  EN-DE
Unsup.      Cross-lingual embs. (Artetxe et al., ICLR'18)*      15.6   15.1   10.2    6.6
            + scaling up (Conneau & Lample, NeurIPS'19)*        29.4   29.4      -      -
            Deep pre-training (Conneau & Lample, NeurIPS'19)*   33.3   33.4      -      -
            Unsup SMT + NMT (Artetxe et al., ACL'19)*           33.5   36.2   27.0   22.5
              detok. SacreBLEU                                  33.2   33.6   26.4   21.2
Supervised  WMT winner                                          35.0   35.8   29.0   20.6
            Original transformer (Vaswani et al., NIPS'17)*        -   41.0      -   28.4
            Large-scale back-translation (Edunov et al., EMNLP'18) -   43.8      -   33.8

*Tokenized BLEU (about 1-2 points higher)

Conclusions

Unsupervised machine translation is a reality!
- General recipe: back-translation + smart initialization
  - Align monolingual representations
  - Generalize from word-level to sentence-level translation

Strong results
- Competitive with the supervised state of the art from 6 years ago
- …reached in just 2-3 years!

What's next?

Thank you!

Twitter: @artetxem
Email: artetxe@fb.com
