
Unsupervised Machine Translation

Mikel Artetxe

Facebook AI Research

Introduction

WMT 2019 English-German

    System                Score
    Facebook FAIR         0.347
    Microsoft sent-doc    0.311
    Microsoft doc-level   0.296
    HUMAN                 0.240
    MSRA-MADL             0.214
    UCAM                  0.213
    ⋮                     ⋮

Super-human performance: human evaluators preferred MT output to the human translation!

The winning NMT system was trained on 27.7M parallel sentence pairs, i.e. sentences paired with their translations:

    In my opinion, this exposes a serious inconsistency in industrial…       En mi opinión, esto representa una grave inconsistencia entre…
    In my opinion the issues mentioned previously in the Council's…          En mi opinión, estos aspectos han sido tenidos debidamente…
    In my opinion, it is clearly correct and natural to discuss issues…      En mi opinión es perfectamente correcto y natural debatir los…
    In my opinion, however, we could also achieve adequate flood…            En mi opinión también podríamos conseguir una protección…
    In my opinion, it is a duty of the Presidency of the European…           A mi juicio, es responsabilidad de la Presidencia de la Unión…
    In my opinion, however, this report does not have acceptable…            En mi opinión, no obstante, el presente informe no ofrece…
    In my opinion, the new liquidity requirements will help to solve…        En mi opinión, los nuevos requisitos de liquidez contribuirán a…
    In my opinion, it is really a very good thing that Mr Potočnik…          En mi opinión, que el señor Potočnik haya sido tan inequívoco…
    In my opinion, this social dimension should, in fact, be brought into…   En mi opinión, …

We would need 48 years to read that! In other words, state-of-the-art NMT is:
- Not sample efficient
- Not available for most language pairs

Can we train MT systems with zero parallel data? The question is of both scientific and practical interest. It was previously explored in statistical decipherment (Knight et al., ACL'06; Ravi & Knight, ACL'11; Dou et al., ACL'15; inter alia), but with strong limitations. There has been impressive progress in the last 2-3 years!

Outline

From two non-parallel corpora to translation (the "magic" in between is what this talk unpacks):

    L1 corpus + L2 corpus (non-parallel)
        → L1 embeddings + L2 embeddings
        → cross-lingual embeddings
        → machine translation

Word embeddings

Distributed representations of words.

[Figure: a 2D embedding space in which related words cluster together: Dog/Bark, cat/Meow, Cow/Moo, Apple/Banana/Pear, House, Calendar.]

Geometric proximity tracks semantic similarity:

$$\mathrm{sim}(\textit{cow}, \textit{cat}) \approx \cos(w_{cow}, w_{cat}) = \frac{w_{cow} \cdot w_{cat}}{\lVert w_{cow} \rVert \, \lVert w_{cat} \rVert}$$

The vectors are learned from co-occurrence patterns in a monolingual corpus, e.g. skip-gram with negative sampling (Mikolov et al., NIPS'13). For a training sentence such as

    I will go to New York by plane .

let w be the current word and c one of its surrounding context words, and maximize

$$\log \sigma(w \cdot c) + \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n} \left[ \log \sigma(-w \cdot c_i) \right]$$

i.e. the word vector should score high with its true contexts and low with k negative contexts sampled from a noise distribution $P_n$.
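To make the objective concrete, here is a minimal numpy sketch of the negative-sampling score for a single (word, context) pair, together with the cosine similarity defined above. The function names and random toy vectors are ours, not from the talk; real training maximizes this objective over a corpus by gradient ascent.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sgns_objective(w, c_pos, c_negs):
        # log sigma(w . c) + sum_i log sigma(-w . c_i)
        # for one (word, context) pair and k negative contexts c_i ~ P_n.
        score = np.log(sigmoid(w @ c_pos))
        for c_neg in c_negs:
            score += np.log(sigmoid(-(w @ c_neg)))
        return score  # training maximizes this in expectation over the corpus

    def cosine_sim(a, b):
        # sim(cow, cat) ~= cos(w_cow, w_cat)
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Toy usage with random 300-d vectors (illustrative only):
    rng = np.random.default_rng(0)
    w, c = rng.normal(size=300), rng.normal(size=300)
    negs = rng.normal(size=(5, 300))  # k = 5 negatives from the noise distribution
    print(sgns_objective(w, c, negs), cosine_sim(w, c))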


Cross-lingual word embedding alignment

[Figure: two independently trained embedding spaces. X contains Basque place names (Donostia, Bilbo, Gasteiz, Iruñea, Maule, D. Garazi, Baiona); Z contains their Spanish counterparts (S. Sebastián, Bilbao, Vitoria, Pamplona, Mauléon, S. J. Pie de Puerto, Bayona). Given a small seed dictionary (Bilbo–Bilbao, Baiona–Bayona, Iruñea–Pamplona), a linear mapping W is learned so that the mapped space XW overlays Z, aligning the remaining words as well.]

Mikolov et al. (arXiv'13)

[Figure: the same idea on word embeddings. X is a Basque space (Zaunka, Miau, Marru, Behi, Txakur, Katu, Etxe, Sagar, Udare, Banana, Egutegi); Z is an English space (Bark, Meow, Moo, Dog, cat, Cow, House, Apple, Pear, Banana, Calendar); applying W brings translation pairs close together in XW.]

Given a seed dictionary (Txakur–Dog, Sagar–Apple, …, Egutegi–Calendar), stack the embeddings of the dictionary pairs row by row and require

$$\begin{pmatrix} X_{1,*} \\ X_{2,*} \\ \vdots \\ X_{n,*} \end{pmatrix} W \approx \begin{pmatrix} Z_{1,*} \\ Z_{2,*} \\ \vdots \\ Z_{n,*} \end{pmatrix}$$

i.e. learn the mapping by least squares:

$$\arg\min_{W} \sum_{i} \lVert X_{i*} W - Z_{i*} \rVert^2$$
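A sketch of how that mapping can be computed, assuming `X_dict` and `Z_dict` are numpy matrices holding the embeddings of the seed dictionary pairs, one pair per row (the names are ours). The unconstrained solution is the one from Mikolov et al. (2013); the orthogonal Procrustes variant anticipates the orthogonality constraint that appears in the unsupervised formulation below.

    import numpy as np

    def learn_mapping_lstsq(X_dict, Z_dict):
        # Unconstrained least squares, as in Mikolov et al. (2013):
        # W = argmin_W sum_i ||X_dict[i] @ W - Z_dict[i]||^2, closed form.
        W, *_ = np.linalg.lstsq(X_dict, Z_dict, rcond=None)
        return W

    def learn_mapping_procrustes(X_dict, Z_dict):
        # Orthogonality-constrained variant used by much follow-up work:
        # restricting W to be orthogonal gives the closed-form Procrustes
        # solution W = U @ Vt from the SVD of X^T Z.
        U, _, Vt = np.linalg.svd(X_dict.T @ Z_dict)
        return U @ Vt

    # e.g. X_dict[0] = vec("txakur") and Z_dict[0] = vec("dog")  (hypothetical data)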

What if we have no seed dictionary at all? Then the mapping and the word correspondences must be found jointly:

$$W^* = \arg\min_{W \in \mathcal{O}(d)} \sum_{i} \min_{j} \lVert X_{i*} W - Z_{j*} \rVert^2$$

where W is now constrained to be orthogonal. Self-learning optimizes this by alternating two steps:

    Dictionary → Mapping → Dictionary → Mapping → ⋯

that is, solve for the best mapping under the current dictionary, then re-induce the dictionary by matching each mapped source word to its nearest target word. The loop can be seeded with very weak supervision:
- 25 word pairs
- a numeral list (numerals are written identically across many languages)
- a random dictionary — a much harder starting point, which motivates the robust variant below
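A compact sketch of that loop, assuming length-normalized embedding matrices and seed index arrays; all names are ours, and the real systems add the robustness tricks described next.

    import numpy as np

    def self_learning(X, Z, seed_src, seed_trg, iters=10):
        # Alternate the two steps of the self-learning loop:
        #   (1) Mapping: best orthogonal W for the current dictionary (Procrustes).
        #   (2) Dictionary: re-induce entries by nearest-neighbour matching under W.
        # X, Z are full length-normalized embedding matrices; seed_src/seed_trg
        # are index arrays for the seed pairs (25 word pairs, numerals, ...).
        src, trg = np.asarray(seed_src), np.asarray(seed_trg)
        for _ in range(iters):
            U, _, Vt = np.linalg.svd(X[src].T @ Z[trg])
            W = U @ Vt
            # Match every source word to its closest target word (plain dot-product
            # retrieval here; the robust variant swaps in CSLS and adds stochasticity).
            src = np.arange(X.shape[0])
            trg = np.argmax((X @ W) @ Z.T, axis=1)
        return W, src, trg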

Fully unsupervised initialization: intra-lingual similarity distributions. The distribution of a word's similarities to the rest of its own vocabulary is roughly preserved across languages. For each word, compute

    for x in vocab:
        sim("two", x)

[Figure: the sorted similarity distribution of English "two" closely matches that of Italian "due" (two), but not that of "cane" (dog).]

Concretely, each word is represented by the sorted row of its intra-lingual similarity matrix:

$$X' = \mathrm{sorted}(X X^T) \qquad Z' = \mathrm{sorted}(Z Z^T)$$

Matching these signatures across languages yields an initial (noisy) dictionary (see the sketch after the list below), which is then refined with robust self-learning:
- Stochastic dictionary induction
- Frequency-based vocabulary cutoff
- CSLS retrieval (Conneau et al., ICLR'18)
- Bidirectional dictionary induction
- Final symmetric re-weighting
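A simplified sketch of the signature computation referenced above. It assumes both vocabularies have been cut to the same size (the frequency-based cutoff) and leaves out the normalization details of the actual method; `initial_dictionary` is a hypothetical matching helper, not the paper's exact procedure.

    import numpy as np

    def similarity_signature(E):
        # Represent each word by the *sorted* row of its intra-lingual
        # similarity matrix, M' = sorted(M M^T). Rows of E are assumed
        # length-normalized embeddings.
        sims = E @ E.T
        return np.sort(sims, axis=1)

    def initial_dictionary(X, Z):
        # Pair each source signature with its most similar target signature;
        # the resulting noisy dictionary seeds robust self-learning.
        Xp, Zp = similarity_signature(X), similarity_signature(Z)
        return np.argmax(Xp @ Zp.T, axis=1)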

Cross-lingual word embedding alignment: results (word translation accuracy, %)

    Supervision     Method                           EN-IT    EN-DE    EN-FI    EN-ES
    5k dict.        Mikolov et al. (2013)            34.93†   35.00†   25.91†   27.73†
                    Faruqui and Dyer (2014)          38.40*   37.13*   27.60*   26.80*
                    Shigeto et al. (2015)            41.53†   43.07†   31.04†   33.73†
                    Dinu et al. (2015)               37.7     38.93*   29.14*   30.40*
                    Lazaridou et al. (2015)          40.2     -        -        -
                    Xing et al. (2015)               36.87†   41.27†   28.23†   31.20†
                    Zhang et al. (2016)              36.73†   40.80†   28.16†   31.07†
                    Artetxe et al. (2016)            39.27    41.87*   30.62*   31.40*
                    Smith et al. (2017)              43.1     43.33†   29.42†   35.13†
                    Artetxe et al. (2018a)           45.27    44.13    32.94    36.60
    25 dict.        Artetxe et al. (2017)            37.27    39.60    28.16    -
    Init. heurist.  Smith et al. (2017), cognates    39.9     -        -        -
                    Artetxe et al. (2017), num.      39.40    40.27    26.47    -
    None            Zhang et al. (2017), λ = 1       0.00*    0.00*    0.00*    0.00*
                    Zhang et al. (2017), λ = 10      0.00*    0.00*    0.01*    0.01*
                    Conneau et al. (2018), code‡     45.15*   46.83*   0.38*    35.38*
                    Conneau et al. (2018), paper‡    45.1     0.01*    0.01*    35.44*
                    Artetxe et al. (2018b)           48.13    48.19    32.63    37.33

Note how earlier unsupervised methods can fail catastrophically on some pairs (e.g. EN-FI), while the robust method (Artetxe et al., 2018b) matches or beats systems supervised with a 5k-entry dictionary.

Outline

With cross-lingual embeddings in place, the remaining step is going from embeddings to full machine translation. Two routes:
- Neural approach
- Statistical approach

Unsupervised neural machine translation

Training. A supervised system would learn from parallel pairs like

    Une fusillade a eu lieu à l'aéroport international de Los Angeles.
    → There was a shooting in Los Angeles International Airport.

but we have no such pairs. Instead, a shared encoder-decoder (initialized with the cross-lingual embeddings) is trained with two monolingual objectives:

- Denoising: reconstruct a sentence from a corrupted version of itself, e.g.

    Une lieu fusillade a eu à l'aéroport de Los international Angeles.
    → Une fusillade a eu lieu à l'aéroport international de Los Angeles.

- Backtranslation: translate a sentence into the other language with the current (initially poor) model, e.g.

    Une fusillade a eu lieu à l'aéroport international de Los Angeles.
    → A shooting has had place in airport international of Los Angeles.

  and then train the reverse direction to recover the original sentence from this synthetic translation.
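A schematic Python sketch of one training step combining the two objectives. The `model` object and its `.loss`/`.translate` interface are assumptions for illustration; real systems implement this with a shared encoder-decoder network and add word drops and other noise to the corruption step.

    import random

    def noise(sentence, p_swap=0.3):
        # Corrupt a sentence by locally swapping adjacent words, e.g.
        # "Une fusillade a eu lieu ..." -> "Une lieu fusillade a eu ..."
        words = sentence.split()
        for i in range(len(words) - 1):
            if random.random() < p_swap:
                words[i], words[i + 1] = words[i + 1], words[i]
        return " ".join(words)

    def train_step(model, batch_l1, batch_l2):
        # One unsupervised step over two monolingual batches; no parallel
        # data appears anywhere.
        total = 0.0
        for lang, other, batch in (("l1", "l2", batch_l1), ("l2", "l1", batch_l2)):
            for sent in batch:
                # Denoising: reconstruct the sentence from a corrupted copy.
                total += model.loss(noise(sent), sent, src_lang=lang, tgt_lang=lang)
                # Backtranslation: translate with the current model, then learn
                # to map the synthetic translation back to the original.
                synthetic = model.translate(sent, src_lang=lang, tgt_lang=other)
                total += model.loss(synthetic, sent, src_lang=other, tgt_lang=lang)
        return total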


L1 corpus L2 corpus

Phrase-based SMT

L1 corpus L2 corpus

Phrase-based SMT

Phrase-based SMT

L1 corpus L2 corpus

Phrase-based SMT

Phrase-based SMTLog-linear model combining

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Phrase-based SMTLog-linear model combining

- Phrase table

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Phrase-based SMTLog-linear model combining

- Phrase table

nire iritziz in my opinion

nire iritziz in my view

nire iritziz I think

opari bat a present

opari bat one present

opari bat a gift

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities

𝝓(I𝒇|I𝒆) 𝝓(I𝒆|I𝒇)nire iritziz in my opinion 0.54 0.63

nire iritziz in my view 0.32 0.68

nire iritziz I think 0.11 0.09

opari bat a present 0.32 0.56

opari bat one present 0.14 0.73

opari bat a gift 0.11 0.49

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

𝝓(I𝒇|I𝒆) 𝝓(I𝒆|I𝒇) 𝐥𝐞𝐱(I𝒇|I𝒆) 𝐥𝐞𝐱(I𝒆|I𝒇)nire iritziz in my opinion 0.54 0.63 0.12 0.15

nire iritziz in my view 0.32 0.68 0.09 0.16

nire iritziz I think 0.11 0.09 0.04 0.02

opari bat a present 0.32 0.56 0.21 0.22

opari bat one present 0.14 0.73 0.18 0.32

opari bat a gift 0.11 0.49 0.11 0.13

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model

L1 corpus L2 corpus

Phrase-based SMT

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

Phrase-based SMTLog-linear model combining

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

learn each component

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

learn each component

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

learn each component

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

- Phrase table- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model- N-gram frequency counts with back-off and smoothing

- Reordering model- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty- Fixed score to control the length of the output

L1 corpus L2 corpus

learn each component

Phrase-based SMT

Reordering model

Phrasetable

Language model

Word/phrase penalty

Unsupervised phrase-based SMT
Learn components from monolingual corpora

- Phrase table: TRICKY… (direct/inverse translation probabilities, direct/inverse lexical weightings)
- Language model: EASY!!! (n-gram frequency counts with back-off and smoothing, learned directly from each monolingual corpus)
- Reordering model: EASY!!! (keep the distance-based distortion model; the lexical reordering model is dropped for now)
- Word/phrase penalty: EASY!!! (a fixed score to control the length of the output, so nothing to learn)

[Diagram: the L1 and L2 monolingual corpora yield L1 and L2 n-gram embeddings, connected through a cross-lingual mapping from which the phrase table is induced; the distortion model, language model and word/phrase penalty are obtained directly.]
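The cross-lingual mapping step aligns the two independently trained embedding spaces. The actual methods are the unsupervised self-learning of Artetxe et al. (ICLR'18) and the adversarial approach of Lample et al. (ICLR'18); as a minimal illustration of the core operation both build on, here is one orthogonal (Procrustes) mapping step given some dictionary of index pairs (the embedding matrices below are random stand-ins):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))  # L1 n-gram embeddings (stand-in)
Z = rng.normal(size=(1000, 300))  # L2 n-gram embeddings (stand-in)

# Seed dictionary as index pairs; in self-learning this starts from a weak
# initialization and is re-induced after every mapping step.
dictionary = [(i, i) for i in range(500)]
src = np.array([s for s, _ in dictionary])
trg = np.array([t for _, t in dictionary])

# Orthogonal Procrustes: W = argmin ||X[src] @ W - Z[trg]||, W orthogonal.
U, _, Vt = np.linalg.svd(X[src].T @ Z[trg])
W = U @ Vt

mapped = X @ W  # L1 embeddings in the shared space
# Nearest neighbors of mapped[i] among Z now serve as translation candidates.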

[Slide example: the sentence "I will go to New York by plane ." annotated with a word 𝑤, a context 𝑐 and a phrase 𝑝 (animation residue illustrating how the n-gram, i.e. phrase, embeddings are trained on monolingual text before being mapped cross-lingually).]

For each source phrase ē, estimate the translation probability φ(f̄|ē) over its 100 nearest-neighbor target phrases in the mapped embedding space:

$$\phi(\bar f \mid \bar e) = \frac{\exp\big(\cos(\bar e, \bar f)/\tau\big)}{\sum_{\bar f'} \exp\big(\cos(\bar e, \bar f')/\tau\big)}$$

where the temperature τ is tuned by maximum likelihood over nearest-neighbor pairs induced in the reverse direction:

$$\min_\tau \; -\sum_{\bar f} \log \phi\big(\bar f \mid \mathrm{NN}_{\bar e}(\bar f)\big) \;-\; \sum_{\bar e} \log \phi\big(\bar e \mid \mathrm{NN}_{\bar f}(\bar e)\big)$$
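A minimal numpy sketch of this estimation (the embeddings are random stand-ins; a real system would use the mapped n-gram embeddings from the previous step, and the value of τ here is illustrative rather than tuned):

import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(5000, 300))   # source (L1) n-gram embeddings, mapped
F = rng.normal(size=(5000, 300))   # target (L2) n-gram embeddings
E /= np.linalg.norm(E, axis=1, keepdims=True)
F /= np.linalg.norm(F, axis=1, keepdims=True)

def translation_probs(e_idx, tau=0.1, k=100):
    """phi(f|e): softmax over cosine similarity, restricted to the 100-NN."""
    sims = F @ E[e_idx]                 # cosine similarities (unit vectors)
    nn = np.argpartition(-sims, k)[:k]  # the 100 nearest target phrases
    logits = sims[nn] / tau
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return dict(zip(nn.tolist(), probs.tolist()))

probs = translation_probs(e_idx=42)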

With this, the phrase table goes from TRICKY… to EASY!!! as well: every component of the phrase-based SMT system can now be learned from monolingual corpora alone.

Unsupervised phrase-based SMT: tuning

Goal: adjust the weights of the resulting log-linear model to optimize some evaluation metric (e.g. BLEU)

… but we don't have a parallel development set!

Solutions:
- Default weights without tuning (Lample et al., EMNLP'18)
- Alternating optimization with back-translation (Artetxe et al., EMNLP'18)
- Unsupervised optimization criterion (Artetxe et al., ACL'19):

$$L = L_{cycle}(E) + L_{cycle}(F) + L_{lm}(E) + L_{lm}(F)$$

$$L_{cycle}(E) = 1 - \mathrm{BLEU}\big(T_{F\to E}(T_{E\to F}(E)),\, E\big)$$

$$L_{lm}(E) = \max\big(0,\, H(F) - H(T_{E\to F}(E))\big)^2 \cdot \mathrm{LP}$$

$$\mathrm{LP} = \mathrm{LP}(E) \cdot \mathrm{LP}(F), \qquad \mathrm{LP}(E) = \max\Big(1,\, \frac{\mathrm{len}\big(T_{F\to E}(T_{E\to F}(E))\big)}{\mathrm{len}(E)}\Big)$$

(L_cycle rewards round-trip consistency; L_lm penalizes translations whose language-model entropy drops below that of real text in the target language, a symptom of degenerate output, scaled by the length penalty LP.)
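A sketch of evaluating this criterion for one weight setting, assuming translate_ef/translate_fe are the two SMT systems and entropy_e/entropy_f return the average language-model entropy of a text (all four are hypothetical stand-ins; the real method plugs this loss into a MERT-style weight search). The BLEU part uses the sacrebleu package:

import sacrebleu

def unsupervised_loss(E, F, translate_ef, translate_fe, entropy_e, entropy_f):
    """Unsupervised tuning criterion in the style of Artetxe et al. (ACL'19).

    E, F are monolingual dev sets (lists of sentences); the callables are
    hypothetical stand-ins for the SMT systems and language models."""
    e2f, f2e = translate_ef(E), translate_fe(F)              # one-way
    cycle_e, cycle_f = translate_fe(e2f), translate_ef(f2e)  # round trips

    # Cycle-consistency losses: 1 - BLEU of the round trip vs. the input.
    l_cycle_e = 1 - sacrebleu.corpus_bleu(cycle_e, [E]).score / 100
    l_cycle_f = 1 - sacrebleu.corpus_bleu(cycle_f, [F]).score / 100

    # Length penalty from the round-trip length ratio (word counts).
    def length(sents):
        return sum(len(s.split()) for s in sents)
    lp = max(1.0, length(cycle_e) / length(E)) * \
         max(1.0, length(cycle_f) / length(F))

    # LM losses: penalize translations whose entropy falls below that of
    # real text in the target language (degenerate, overly "easy" output).
    l_lm_e = max(0.0, entropy_f(F) - entropy_f(e2f)) ** 2 * lp
    l_lm_f = max(0.0, entropy_e(E) - entropy_e(f2e)) ** 2 * lp

    return l_cycle_e + l_cycle_f + l_lm_e + l_lm_f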

Unsupervised phrase-based SMT: refinement

The initial system is refined through iterative back-translation: the L1 → L2 SMT system translates the L1 training corpus into a synthetic L2' corpus, and the resulting pseudo-parallel data is used, together with unsupervised tuning, to train a better L2 → L1 SMT system (and symmetrically in the other direction).

Let's look inside!

Training on this synthetic parallel corpus re-estimates the phrase table, but with a caveat:
- p(l̄1|l̄2), estimated over real L1 targets, is reliable
- p(l̄2|l̄1) would be estimated over machine-translated L2' text: spurious targets!

The fix: train in both directions (L1 train → L2' train and L2 train → L1' train) and keep only the reliable half of each phrase table, so that both p(l̄1|l̄2) and p(l̄2|l̄1) come from real targets. The refined systems also learn a lexical reordering model (dropped in the initial system), and the weights are re-adjusted with unsupervised tuning at every iteration.
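A sketch of this refinement loop under the assumptions above (train_smt, translate and tune_unsupervised are hypothetical stand-ins for Moses-style training, decoding and the unsupervised tuning step):

def refine(l1_corpus, l2_corpus, smt_12, smt_21,
           train_smt, translate, tune_unsupervised, iterations=3):
    """Iterative back-translation refinement (sketch).

    smt_12/smt_21 are the initial L1->L2 and L2->L1 systems; the three
    callables are hypothetical stand-ins for the real SMT components."""
    for _ in range(iterations):
        l2_synth = translate(smt_12, l1_corpus)  # L1 train -> L2' train
        l1_synth = translate(smt_21, l2_corpus)  # L2 train -> L1' train
        # Train each direction on the corpus whose TARGET side is real text,
        # avoiding the spurious-targets problem described above.
        smt_21 = tune_unsupervised(train_smt(src=l2_synth, trg=l1_corpus))
        smt_12 = tune_unsupervised(train_smt(src=l1_synth, trg=l2_corpus))
    return smt_12, smt_21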

but…

NMT >> SMT: (unsupervised) SMT has a hard ceiling!

NMT hybridization

The unsupervised SMT systems are used to bootstrap NMT: the L1 → L2 SMT back-translates the L1 training corpus into a synthetic L2' corpus to train an L2 → L1 NMT system, and the L2 → L1 SMT produces an L1' corpus to train an L1 → L2 NMT system. Training then continues with the NMT systems back-translating for each other, progressively replacing the SMT-generated data. At training step t, out of N synthetic sentence pairs:

$$N_{\mathrm{smt}} = N \cdot \max(0,\, 1 - t/a), \qquad N_{\mathrm{nmt}} = N - N_{\mathrm{smt}}$$

so training starts purely on SMT back-translations and, after a steps, relies entirely on NMT back-translation.
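A minimal sketch of this annealing schedule (the values of N and a below are illustrative, not from the paper):

def hybrid_schedule(t, N=10000, a=30000):
    """Number of SMT- vs NMT-generated synthetic pairs at training step t."""
    n_smt = int(N * max(0.0, 1.0 - t / a))
    return n_smt, N - n_smt

for t in (0, 15000, 30000, 45000):
    print(t, hybrid_schedule(t))  # SMT share decays linearly; NMT takes over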

Outline
- Statistical approach
- Neural approach

[Diagram: non-parallel L1 and L2 corpora → L1/L2 embeddings → cross-lingual embeddings → machine translation.]

General recipe: initialization + training

Training:
- Denoising (see the sketch below)
- Back-translation
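Denoising here means training the model to reconstruct a sentence from a corrupted version of itself. A minimal sketch of a typical noise function, random word drops plus local shuffling (the specific noise parameters are illustrative, not the exact ones from any paper):

import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=random.Random(0)):
    """Corrupt a sentence for denoising training: drop words at random and
    locally shuffle the remainder within a small window."""
    kept = [t for t in tokens if rng.random() > drop_prob] or tokens[:1]
    # Local shuffle: sort by position plus a bounded random offset.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

print(add_noise("I will go to New York by plane .".split()))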

Initialization:
- Cross-lingual word embeddings (Artetxe et al., ICLR'18; Lample et al., ICLR'18)
- Deep pretraining (Conneau & Lample, NeurIPS'19; Song et al., ICML'19; Liu et al., arXiv'20)
  - Pretrain the full network through masked language modeling (e.g., multilingual BERT, XLM, MASS, mBART)
  - Works well, but very expensive to train
  - Masking can be seen as a form of noise!
- Synthetic parallel corpus (Marie & Fujita, arXiv'18; Artetxe et al., ACL'19)
  - Basic: word-for-word translation using cross-lingual word embeddings (see the sketch below)
  - Better: unsupervised statistical machine translation
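A minimal sketch of the "basic" option above: word-for-word translation with mapped cross-lingual word embeddings, used only to synthesize a rough parallel corpus for initialization (the vocabularies and embeddings below are toy stand-ins):

import numpy as np

# Toy stand-ins: in practice these are full vocabularies with embeddings
# already mapped into a shared cross-lingual space.
src_vocab = ["ich", "gehe", "nach", "hause"]
trg_vocab = ["i", "go", "to", "home", "house"]
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(len(src_vocab), 50))
trg_emb = rng.normal(size=(len(trg_vocab), 50))
src_emb /= np.linalg.norm(src_emb, axis=1, keepdims=True)
trg_emb /= np.linalg.norm(trg_emb, axis=1, keepdims=True)

def word_for_word(sentence):
    """Translate each word to its nearest cross-lingual neighbor."""
    out = []
    for w in sentence.split():
        if w not in src_vocab:
            out.append(w)  # copy unknown words through unchanged
            continue
        sims = trg_emb @ src_emb[src_vocab.index(w)]
        out.append(trg_vocab[int(np.argmax(sims))])
    return " ".join(out)

print(word_for_word("ich gehe nach hause"))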

Results

- Languages: French-English, German-English
- Training: WMT-14 News Crawl
- Test set: WMT-14 newstest (BLEU)

                                                               FR-EN  EN-FR  DE-EN  EN-DE
Unsup.      Cross-lingual embs. (Artetxe et al., ICLR'18)*      15.6   15.1   10.2    6.6
            + scaling up (Conneau & Lample, NeurIPS'19)*        29.4   29.4      -      -
            Deep pre-training (Conneau & Lample, NeurIPS'19)*   33.3   33.4      -      -
            Unsup SMT + NMT (Artetxe et al., ACL'19)*           33.5   36.2   27.0   22.5
              detok. SacreBLEU                                  33.2   33.6   26.4   21.2
Supervised  WMT winner                                          35.0   35.8   29.0   20.6
            Original transformer (Vaswani et al., NIPS'17)*        -   41.0      -   28.4
            Large-scale back-translation (Edunov et al., EMNLP'18) -   43.8      -   33.8

*Tokenized BLEU (about 1-2 points higher)

Conclusions

Unsupervised machine translation is a reality!
- General recipe: back-translation + smart initialization
  - Align monolingual representations
  - Generalize from word-level to sentence-level translation

Strong results
- Competitive with the supervised state of the art from 6 years ago
- …reached in just 2-3 years!

What's next?

Thank you!

Twitter: @artetxem
Email: artetxe@fb.com
