Unsupervised Machine Translation (Stanford University)
TRANSCRIPT
Unsupervised Machine Translation
Mikel Artetxe
Facebook AI Research
Introduction

WMT 2019 English-German:

  System               Score
  Facebook FAIR        0.347
  Microsoft sent-doc   0.311
  Microsoft doc-level  0.296
  HUMAN                0.240
  MSRA-MADL            0.214
  UCAM                 0.213
  ⋮                    ⋮

Super-human performance: human evaluators preferred MT to human translation!

The winning NMT system was trained on 27.7M parallel sentence pairs, e.g.:

  In my opinion, this exposes a serious inconsistency in industrial… / En mi opinión, esto representa una grave inconsistencia entre…
  In my opinion the issues mentioned previously in the Council's… / En mi opinión, estos aspectos han sido tenidos debidamente…
  In my opinion, it is clearly correct and natural to discuss issues… / En mi opinión es perfectamente correcto y natural debatir los…
  In my opinion, however, we could also achieve adequate flood… / En mi opinión también podríamos conseguir una protección…
  In my opinion, it is a duty of the Presidency of the European… / A mi juicio, es responsabilidad de la Presidencia de la Unión…
  In my opinion, however, this report does not have acceptable… / En mi opinión, no obstante, el presente informe no ofrece…
  In my opinion, the new liquidity requirements will help to solve… / En mi opinión, los nuevos requisitos de liquidez contribuirán a…
  In my opinion, it is really a very good thing that Mr Potočnik… / En mi opinión, que el señor Potočnik haya sido tan inequívoco…
  In my opinion, this social dimension should, in fact, be brought into… / En mi opinion, …

We would need 48 years to read that!
- Not sample efficient
- Not available for most language pairs

Can we train MT systems with zero parallel data?
Scientific & practical interest!

Previously explored in statistical decipherment (Knight et al., ACL'06; Ravi & Knight, ACL'11; Dou et al., ACL'15; inter alia)
…but strong limitations…

IMPRESSIVE PROGRESS IN THE LAST 2-3 YEARS!
Outline

  L1 corpus + L2 corpus (non-parallel) → "magic" → Machine Translation

…where the "magic" decomposes into:

  L1 embeddings, L2 embeddings → Cross-lingual embeddings → Machine Translation
Word embeddings

- Distributed representations of words
- Learned from co-occurrence patterns in a monolingual corpus

[Figure: 2D embedding space where semantically related words cluster together: Cow/Moo, Cat/Meow, Dog/Bark, Apple/Pear/Banana, House, Calendar]

  sim(cow, cat) ≈ cos(w_cow, w_cat) = (w_cow · w_cat) / (‖w_cow‖ ‖w_cat‖)

e.g. skip-gram with negative sampling (Mikolov et al., NIPS'13):

  log σ(w · c) + Σ_{i=1}^{k} E_{c_N ~ P_n} [ log σ(−w · c_N) ]

Training example: "I will go to New York by plane .", where w is the target word and c a word in its context window.
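The two formulas above can be sketched directly in numpy. This is a minimal illustration, not a trained model: the vectors are random toy data, and only the per-pair objective term is computed (a real skip-gram implementation sums this over a corpus and optimizes the embeddings by SGD).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_term(w, c, negatives):
    """Skip-gram negative-sampling objective for one (word, context) pair:
    log sigma(w.c) + sum over k sampled noise words of log sigma(-w.c_N)."""
    term = np.log(sigmoid(np.dot(w, c)))
    for c_neg in negatives:
        term += np.log(sigmoid(-np.dot(w, c_neg)))
    return term

def cosine(u, v):
    """Similarity between two word vectors, as in sim(cow, cat)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy vectors (illustrative only; real embeddings come from training)
rng = np.random.default_rng(0)
w, c = rng.normal(size=50), rng.normal(size=50)
negs = [rng.normal(size=50) for _ in range(5)]
term = sgns_term(w, c, negs)   # maximized during training; always <= 0
self_sim = cosine(w, w)        # a vector is maximally similar to itself
```

Each log-sigmoid is non-positive, so the objective term is maximized toward 0 as true pairs score high and noise pairs score low.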
Cross-lingual word embedding alignment

  X (Basque):  Donostia, Bilbo, Gasteiz, Iruñea, Maule, D. Garazi, Baiona
  Z (Spanish): S. Sebastián, Bilbao, Vitoria, Pamplona, Mauléon, S. J. Pie de Puerto, Bayona

  Seed dictionary: Bilbo→Bilbao, Baiona→Bayona, Iruñea→Pamplona

Learn a mapping W from the seed pairs; XW then places each source word next to its translation in the target space.
Cross-lingual word embedding alignment (Mikolov et al., arXiv'13)

  X (Basque):  Zaunka, Miau, Behi, Marru, Txakur, Katu, Banana, Egutegi, Etxe, Sagar, Udare
  Z (English): Moo, Meow, Bark, Dog, Apple, House, Cat, Cow, Banana, Pear, Calendar

  Seed dictionary: Txakur→Dog, Sagar→Apple, …, Egutegi→Calendar

Stack the embeddings of the dictionary pairs row by row:

  [X_1* ; X_2* ; … ; X_n*] W ≈ [Z_1* ; Z_2* ; … ; Z_n*]

and solve the least-squares problem

  arg min_W Σ_i ‖X_i* W − Z_i*‖²

XW then maps the remaining Basque words into the English space.
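The least-squares fit above is a standard linear regression over the seed pairs. A minimal numpy sketch with synthetic data standing in for the Basque and English embeddings (the true relation here is an orthogonal rotation by construction, so the fit is exact):

```python
import numpy as np

# Toy setup: rows of X_dict and Z_dict are the embeddings of the seed
# dictionary pairs (Txakur/Dog, Sagar/Apple, ...), related by a rotation.
rng = np.random.default_rng(0)
n, d = 20, 10
Z_dict = rng.normal(size=(n, d))                 # target side of the pairs
W_true = np.linalg.qr(rng.normal(size=(d, d)))[0]  # a random orthogonal map
X_dict = Z_dict @ W_true.T                       # source side

# Mikolov et al.'s formulation: W minimizing sum_i ||X_i* W - Z_i*||^2,
# i.e. ordinary least squares solved for all rows at once.
W, *_ = np.linalg.lstsq(X_dict, Z_dict, rcond=None)

mapped = X_dict @ W   # should coincide with Z_dict on the seed pairs
```

In practice X_dict W only approximates Z_dict, and the learned W is used to map the rest of the source vocabulary.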
Cross-lingual word embedding alignment

Same Basque and English point clouds as before, but now without any seed dictionary. The idea: search for the rotation W under which the mapped cloud XW best overlaps Z, letting each source point match its nearest target point:

  W* = arg min_{W ∈ O(d)} Σ_i min_j ‖X_i* W − Z_j*‖²
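The orthogonality constraint W ∈ O(d) makes each half of this objective solvable in closed form: for a fixed correspondence, the optimal rotation is the orthogonal Procrustes solution (SVD of XᵀZ), and for a fixed W each source word takes the nearest target row (the inner min_j). A toy numpy sketch of both pieces, with names and data of my own choosing:

```python
import numpy as np

def procrustes(X, Z):
    """Closed-form orthogonal W minimizing ||XW - Z||_F for row-paired
    X, Z (orthogonal Procrustes, via the SVD of X^T Z)."""
    U, _, Vt = np.linalg.svd(X.T @ Z)
    return U @ Vt

def nearest_targets(XW, Z):
    """For each mapped source row, the index of the closest target row
    (the inner min_j of the objective)."""
    d2 = ((XW[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

# Toy check: Z is a shuffled rotation of X. Fitting W on the true
# pairing and reading off nearest neighbors recovers the shuffle.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
R = np.linalg.qr(rng.normal(size=(4, 4)))[0]   # a random rotation
perm = rng.permutation(8)
Z = (X @ R)[perm]
inv = np.argsort(perm)                         # X[j] corresponds to Z[inv[j]]
W = procrustes(X, Z[inv])
matches = nearest_targets(X @ W, Z)
```

The hard part of the unsupervised problem is that the correspondence is unknown, which is what the self-learning loop below addresses.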
Cross-lingual word embedding alignment: self-learning

  W* = arg min_{W ∈ O(d)} Σ_i min_j ‖X_i* W − Z_j*‖²

Alternate two steps until convergence:

  Dictionary → Mapping → Dictionary → Mapping → …

The initial dictionary can be remarkably weak:
- 25 word pairs
- Numeral list
- Random dict.
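The Dictionary → Mapping → Dictionary loop can be sketched in a few lines of numpy. This is an illustrative skeleton, not the paper's implementation: rows are assumed length-normalized so dot products are cosine similarities, and the toy data is an exact rotation, so a handful of correct seed pairs suffices.

```python
import numpy as np

def procrustes(X, Z):
    """Optimal orthogonal mapping for row-paired X, Z."""
    U, _, Vt = np.linalg.svd(X.T @ Z)
    return U @ Vt

def self_learning(X, Z, seed_pairs, iters=5):
    """Alternate mapping and dictionary induction steps.
    seed_pairs: initial (src_idx, tgt_idx) dictionary entries."""
    src, tgt = (np.array(t) for t in zip(*seed_pairs))
    for _ in range(iters):
        W = procrustes(X[src], Z[tgt])  # mapping step on current dictionary
        sims = (X @ W) @ Z.T            # dictionary step: every source word
        src = np.arange(len(X))         # picks its nearest target under
        tgt = sims.argmax(axis=1)       # the current mapping
    return W, tgt

# Toy run: Z is a shuffled rotation of X; 5 correct seed pairs are
# enough for the loop to induce the full correspondence.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
X /= np.linalg.norm(X, axis=1, keepdims=True)
R = np.linalg.qr(rng.normal(size=(4, 4)))[0]
perm = rng.permutation(30)
Z = (X @ R)[perm]
inv = np.argsort(perm)
W, induced = self_learning(X, Z, [(j, inv[j]) for j in range(5)])
```

With real embeddings the spaces are only approximately isometric, which is why the weaker initializations (numerals, random dictionaries) need the robustness tricks discussed next.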
Cross-lingual word embedding alignment: Intra-lingual similarity distribution

For each word, compute its similarity to every other word of the same language:

for x in vocab:
    sim("two", x)

English: "two"; Italian: "due" (two), "cane" (dog). The similarity distribution of "two" over the English vocabulary closely matches that of its translation "due" over the Italian vocabulary, but not that of an unrelated word like "cane".

Sorting the rows of each monolingual similarity matrix makes these distributions directly comparable across languages:

X′ = sorted(XXᵀ)        Z′ = sorted(ZZᵀ)

+ Robust self-learning:
- Stochastic dictionary induction
- Frequency-based vocabulary cutoff
- CSLS retrieval (Conneau et al., ICLR’18)
- Bidirectional dictionary induction
- Final symmetric re-weighting
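A minimal sketch of this fully unsupervised initialization (simplified relative to the paper, which also applies a square root to the similarity values; it assumes both vocabularies were cut to the same size so the signatures have equal length):

```python
import numpy as np

def similarity_signature(E):
    """Represent each word by the sorted, length-normalized vector of its
    similarities to all words of the SAME language: X' = sorted(X X^T).
    E: row-normalized embedding matrix."""
    S = np.sort(E @ E.T, axis=1)
    return S / np.linalg.norm(S, axis=1, keepdims=True)

def seed_dictionary(X, Z):
    """Induce a (noisy) seed dictionary by nearest-neighbor matching of the
    signatures across languages. Assumes X and Z have the same number of
    rows (equal frequency-based vocabulary cutoff)."""
    Xs, Zs = similarity_signature(X), similarity_signature(Z)
    return (Xs @ Zs.T).argmax(axis=1)
```

The resulting seed dictionary is noisy, which is exactly why the robust self-learning refinements listed above are needed on top of it.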
Cross-lingual word embedding alignment: results (word translation accuracy, %)

Supervision      Method                           EN-IT   EN-DE   EN-FI   EN-ES
5k dict.         Mikolov et al. (2013)            34.93†  35.00†  25.91†  27.73†
                 Faruqui and Dyer (2014)          38.40*  37.13*  27.60*  26.80*
                 Shigeto et al. (2015)            41.53†  43.07†  31.04†  33.73†
                 Dinu et al. (2015)               37.7    38.93*  29.14*  30.40*
                 Lazaridou et al. (2015)          40.2    -       -       -
                 Xing et al. (2015)               36.87†  41.27†  28.23†  31.20†
                 Zhang et al. (2016)              36.73†  40.80†  28.16†  31.07†
                 Artetxe et al. (2016)            39.27   41.87*  30.62*  31.40*
                 Smith et al. (2017)              43.1    43.33†  29.42†  35.13†
                 Artetxe et al. (2018a)           45.27   44.13   32.94   36.60
25 dict.         Artetxe et al. (2017)            37.27   39.60   28.16   -
Init. heurist.   Smith et al. (2017), cognates    39.9    -       -       -
                 Artetxe et al. (2017), num.      39.40   40.27   26.47   -
None             Zhang et al. (2017), λ = 1       0.00*   0.00*   0.00*   0.00*
                 Zhang et al. (2017), λ = 10      0.00*   0.00*   0.01*   0.01*
                 Conneau et al. (2018), code‡     45.15*  46.83*  0.38*   35.38*
                 Conneau et al. (2018), paper‡    45.1    0.01*   0.01*   35.44*
                 Artetxe et al. (2018b)           48.13   48.19   32.63   37.33
Outline

L1 corpus → L1 embeddings
L2 corpus → L2 embeddings      (non-parallel corpora)
        ↓
Cross-lingual embeddings → Machine Translation
- Neural approach
- Statistical approach
Unsupervised neural machine translation

Training:

- Supervised (for reference): train on parallel sentence pairs.
    "Une fusillade a eu lieu à l’aéroport international de Los Angeles."
    → "There was a shooting in Los Angeles International Airport."

- Denoising: corrupt a monolingual sentence (e.g. shuffle its words) and train the model to reconstruct the original.
    "Une lieu fusillade a eu à l’aéroport de Los international Angeles."
    → "Une fusillade a eu lieu à l’aéroport international de Los Angeles."

- Backtranslation: translate a monolingual sentence with the current model, then train the reverse direction on the resulting synthetic pair.
    "A shooting has had place in airport international of Los Angeles."
    → "Une fusillade a eu lieu à l’aéroport international de Los Angeles."
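The denoising objective needs a corruption function. A simple sketch follows; the details (local shuffling within a small window plus random word dropout) are in the spirit of what unsupervised NMT systems use, but the exact noise model varies by paper.

```python
import random

def add_noise(tokens, window=3, p_drop=0.1, seed=None):
    """Corrupt a sentence for the denoising objective: locally shuffle
    words (each moves at most ~`window` positions) and randomly drop some
    of them. The model is trained to reconstruct the original sentence
    from this corrupted version."""
    rng = random.Random(seed)
    # Local shuffle: jitter each position by a random offset in
    # [0, window), then reorder tokens by their jittered positions.
    keys = [i + rng.uniform(0, window) for i in range(len(tokens))]
    shuffled = [t for _, t in sorted(zip(keys, tokens))]
    # Word dropout.
    kept = [t for t in shuffled if rng.random() > p_drop]
    return kept or shuffled[:1]

sentence = "Une fusillade a eu lieu à l’aéroport international de Los Angeles .".split()
noisy = add_noise(sentence, seed=0)
```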
Phrase-based SMT

Log-linear model combining:
- Phrase table
  - Direct/inverse translation probabilities
  - Direct/inverse lexical weightings
- Language model
  - N-gram frequency counts with back-off and smoothing
- Reordering model
  - Distortion model (distance-based)
  - Lexical reordering model
- Word/phrase penalty
  - Fixed score to control the length of the output

Example phrase table (Basque → English):

                              𝜙(f|e)  𝜙(e|f)  lex(f|e)  lex(e|f)
nire iritziz  in my opinion   0.54    0.63    0.12      0.15
nire iritziz  in my view      0.32    0.68    0.09      0.16
nire iritziz  I think         0.11    0.09    0.04      0.02
opari bat     a present       0.32    0.56    0.21      0.22
opari bat     one present     0.14    0.73    0.18      0.32
opari bat     a gift          0.11    0.49    0.11      0.13
⁞

In standard phrase-based SMT, each of these components is learned from a parallel corpus (L1 corpus ∥ L2 corpus).
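The log-linear combination can be written down directly. In the toy sketch below, the phrase-table scores are taken from the "nire iritziz → in my opinion" row above, while the language-model score, the penalty, and the uniform feature weights are made-up illustrative values (in practice the weights are tuned, e.g. with MERT).

```python
import math

def score_hypothesis(features, weights):
    """Log-linear model: score(e|f) = sum_k lambda_k * h_k(e, f),
    where each h_k is a log-domain feature value."""
    return sum(weights[k] * h for k, h in features.items())

# Scoring "nire iritziz" -> "in my opinion".
features = {
    "phi_fe": math.log(0.54),  # phi(f|e)
    "phi_ef": math.log(0.63),  # phi(e|f)
    "lex_fe": math.log(0.12),  # lex(f|e)
    "lex_ef": math.log(0.15),  # lex(e|f)
    "lm": -4.2,                # hypothetical language-model log-prob
    "word_penalty": -3.0,      # one unit per output word
}
weights = {k: 1.0 for k in features}  # uniform weights for the sketch
score = score_hypothesis(features, weights)
```

The decoder searches for the hypothesis maximizing this score; changing a weight changes which component dominates the search.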
Unsupervised phrase-based SMT

Learn components from monolingual corpora:
- Phrase table  TRICKY…
  - Direct/inverse translation probabilities
  - Direct/inverse lexical weightings
- Language model  EASY!!!
  - N-gram frequency counts with back-off and smoothing
- Reordering model  EASY!!!
  - Distortion model (distance-based)
  - Lexical reordering model
- Word/phrase penalty  EASY!!!
  - Fixed score to control the length of the output

For the phrase table: train n-gram embeddings on each monolingual corpus (L1 n-gram embeddings, L2 n-gram embeddings) and align them with the same cross-lingual mapping method as before. The n-gram embeddings are trained skip-gram style: a phrase 𝑤 (e.g. "New York" in "I will go to New York by plane .") is trained to predict its surrounding context words 𝑐.
𝑤I will go to New York by plane .
𝑐
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
I will go to New York by plane .𝑐𝑝
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
I will go to New York by plane .𝑝 𝑐
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
I will go to New York by plane .𝑐𝑝
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
For each �̅�, estimate 𝜙 ̅𝑓|�̅� for 100-NN:
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
𝜙 ̅𝑓|�̅� =𝑒 ⁄"#$ &̅,(̅ )
∑(* 𝑒⁄"#$ &̅,(* )
For each �̅�, estimate 𝜙 ̅𝑓|�̅� for 100-NN:
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
𝜙 ̅𝑓|�̅� =𝑒 ⁄"#$ &̅,(̅ )
∑(* 𝑒⁄"#$ &̅,(* )
For each �̅�, estimate 𝜙 ̅𝑓|�̅� for 100-NN:
min!$̅#
log𝜙 ̅𝑓|NN ̅$ ̅𝑓 +$̅$
log𝜙 �̅�|NN ̅# �̅�
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
Unsupervised phrase-based SMTLearn components from monolingual corpora
- Phrase table TRICKY… EASY!!!- Direct/inverse translation probabilities- Direct/inverse lexical weightings
- Language model EASY!!!- N-gram frequency counts with back-off and smoothing
- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model
- Word/phrase penalty EASY!!!- Fixed score to control the length of the output
L2 n-gram embeddings
Phrase-based SMT
Distortion model
Phrasetable
Language model
Word/phrase penalty
L1 n-gram embeddings
Cross-lingual mapping
L1 corpus L2 corpus
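The softmax over cosine similarities above can be sketched numerically. This is a minimal illustration with NumPy, not the talk's actual implementation: the function name, toy data and temperature value are assumptions for the example.

```python
import numpy as np

def phrase_translation_probs(src_emb, tgt_embs, temperature=0.1, k=100):
    """Induce phrase translation probabilities phi(f|e) from cross-lingual
    n-gram embeddings: a temperature-scaled softmax over cosine similarities,
    restricted to the k nearest target-language neighbors of the source phrase."""
    # Cosine similarity between the source phrase and every target phrase.
    src = src_emb / np.linalg.norm(src_emb)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    sims = tgt @ src
    # Keep only the k nearest neighbors; all other phrases get probability 0.
    nn = np.argsort(-sims)[:k]
    # Temperature-scaled softmax over the retained similarities.
    logits = sims[nn] / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return dict(zip(nn.tolist(), probs.tolist()))

# Toy example: 4 target phrases in a 3-dimensional embedding space.
rng = np.random.default_rng(0)
tgt_embs = rng.normal(size=(4, 3))
probs = phrase_translation_probs(tgt_embs[2], tgt_embs, temperature=0.1, k=3)
```

A phrase is always its own nearest neighbor (cosine 1), so when source and target share an embedding space the induced distribution peaks on the phrase itself; with real cross-lingual embeddings it peaks on the closest translation candidates instead.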
Unsupervised phrase-based SMT: tuning

[Diagram: the same pipeline, with a tuning step after the phrase-based SMT system]

Goal: Adjust the weights of the resulting log-linear model to optimize some evaluation metric (e.g. BLEU)
… but we don't have a parallel development set!

Solutions:
- Default weights without tuning (Lample et al., EMNLP'18)
- Alternating optimization with back-translation (Artetxe et al., EMNLP'18)
- Unsupervised optimization criterion (Artetxe et al., ACL'19):

    𝐿 = 𝐿_cycle(E) + 𝐿_cycle(F) + 𝐿_lm(E) + 𝐿_lm(F)

  • 𝐿_cycle(E) = 1 − BLEU(T_F→E(T_E→F(E)), E)
  • 𝐿_lm(E) = max(0, H(F) − H(T_E→F(E)))² · LP,
    with LP = LP(E) · LP(F) and LP(E) = max(1, len(T_F→E(T_E→F(E))) / len(E))
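The cycle-consistency part of the criterion can be made concrete with a toy sketch. This is an illustration only: `unigram_recall` is a crude stand-in for BLEU, and the translator lambdas are hypothetical placeholders for the two SMT systems.

```python
from collections import Counter

def unigram_recall(hyp, ref):
    """Crude stand-in for BLEU, just for illustration."""
    hyp_counts, ref_counts = Counter(hyp.split()), Counter(ref.split())
    overlap = sum(min(c, hyp_counts[w]) for w, c in ref_counts.items())
    return overlap / max(1, sum(ref_counts.values()))

def cycle_loss(corpus, fwd, bwd):
    """L_cycle(E): 1 - similarity of the round-trip translation to the original."""
    scores = [unigram_recall(bwd(fwd(s)), s) for s in corpus]
    return 1.0 - sum(scores) / len(scores)

# Hypothetical toy translators: fwd drops the last word, bwd is the identity,
# so the round trip is lossy and the cycle loss is positive.
fwd = lambda s: " ".join(s.split()[:-1])
bwd = lambda s: s
loss = cycle_loss(["i will go to new york", "this is a test"], fwd, bwd)
```

Tuning then searches the log-linear weights to minimize this loss (plus the language-model term) on held-out monolingual text, with no parallel data involved.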
Unsupervised phrase-based SMT: refinement

[Diagram: the tuned system is refined through iterative back-translation. The L1 → L2 SMT system translates the L1 training corpus into a synthetic L2′ corpus, which is used to train an L2 → L1 SMT system; symmetrically, the L2 training corpus is back-translated into L1′ to train the L1 → L2 system.]

Let's look inside! The phrase table extracted from the L1–L2′ data yields both 𝒑(l̄1|l̄2) and 𝒑(l̄2|l̄1), but the direction whose target side is synthetic suffers from spurious targets, so only 𝒑(l̄1|l̄2) is kept; the complementary probabilities come from the L2–L1′ data. The refined systems additionally learn a lexical reordering model, and each direction goes through unsupervised tuning.
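The refinement loop can be sketched as follows. `translate` and `train` are hypothetical stand-ins for the real SMT decoder and training pipeline; the sketch only shows the data flow of iterative back-translation.

```python
def iterative_refinement(l1_corpus, l2_corpus, translate, train, rounds=3):
    """Sketch of SMT refinement via iterative back-translation: each round,
    the current L1->L2 system synthesizes an (L2', L1) training set for the
    L2->L1 system, and vice versa."""
    l1_to_l2, l2_to_l1 = "init-1to2", "init-2to1"   # initial induced systems
    for _ in range(rounds):
        # Back-translate each monolingual corpus with the current systems.
        l2_synth = [translate(l1_to_l2, s) for s in l1_corpus]
        l1_synth = [translate(l2_to_l1, s) for s in l2_corpus]
        # Train each direction on (synthetic source, real target) pairs,
        # so the target side is always genuine text.
        l2_to_l1 = train(list(zip(l2_synth, l1_corpus)))
        l1_to_l2 = train(list(zip(l1_synth, l2_corpus)))
    return l1_to_l2, l2_to_l1

# Hypothetical stubs, just to exercise the loop:
translate = lambda model, s: s[::-1]          # "translation" = reversed string
train = lambda pairs: ("model", len(pairs))   # "model" records its data size
m12, m21 = iterative_refinement(["ab", "cd"], ["xy"], translate, train)
```

Note the asymmetry: each system is trained with synthetic input but real output, which is what keeps the language model side of the pipeline anchored to genuine text.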
but…
NMT >> SMT
(unsupervised) SMT has a hard ceiling!
NMT hybridization

[Diagram: the final unsupervised SMT systems bootstrap NMT training. L1 → L2 SMT back-translates L1 train into L2′ train, which is used to train an L2 → L1 NMT system; L2 → L1 SMT produces L1′ train to train an L1 → L2 NMT system. The NMT systems then generate their own back-translations, progressively replacing the SMT ones.]

At iteration 𝑡, out of 𝑁 synthetic sentence pairs:

    𝑁_smt = 𝑁 · max(0, 1 − 𝑡/𝑎)        𝑁_nmt = 𝑁 − 𝑁_smt

so back-translation shifts entirely from SMT to NMT after 𝑎 iterations.
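The SMT-to-NMT annealing schedule is simple enough to write down directly (a sketch; the function name is an assumption, N, t and a as in the slide):

```python
def synthetic_corpus_split(n, t, a):
    """Split n synthetic sentence pairs between SMT and NMT back-translation:
    n_smt = n * max(0, 1 - t/a) anneals linearly to zero, and NMT provides
    the remainder, taking over completely once t >= a."""
    n_smt = round(n * max(0.0, 1.0 - t / a))
    return n_smt, n - n_smt
```

Early iterations rely on the (already decent) SMT back-translations; as the NMT systems improve, they supply an increasing share of their own training data.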
Outline
- Machine Translation
  - Neural approach
  - Statistical approach

General recipe

[Diagram: non-parallel L1 and L2 corpora → L1 and L2 embeddings → cross-lingual embeddings]

Initialization
- Cross-lingual word embeddings (Artetxe et al., ICLR'18; Lample et al., ICLR'18)
- Deep pretraining (Conneau & Lample, NeurIPS'19; Song et al., ICML'19; Liu et al., arXiv'20)
  - Pretrain the full network through masked language modeling (e.g., multilingual BERT, XLM, MASS, mBART)
  - Works well, but very expensive to train
  - Masking can be seen as a form of noise!
- Synthetic parallel corpus (Marie & Fujita, arXiv'18; Artetxe et al., ACL'19)
  - Basic: word-for-word translation using cross-lingual word embeddings
  - Better: unsupervised statistical machine translation

Training
- Denoising
- Back-translation
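The denoising objective needs a noise model. A common choice (word dropout plus local shuffling, in the spirit of Lample et al.) can be sketched like this; the parameter values are illustrative, not the ones used in any particular paper:

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corrupt a sentence for the denoising objective: randomly drop words
    and locally shuffle the rest, so the model must reconstruct the original
    sentence from a noisy input."""
    rng = rng or random.Random(0)
    # Word dropout: each token survives with probability 1 - drop_prob.
    kept = [w for w in tokens if rng.random() >= drop_prob]
    # Local shuffle: add random jitter in [0, shuffle_window) to each position
    # and re-sort, so no word moves more than shuffle_window - 1 places.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [w for _, w in sorted(zip(keys, kept), key=lambda p: p[0])]
```

This also makes the earlier point concrete: masked-language-model pretraining applies the same kind of corruption (masking instead of dropping), which is why it doubles as denoising initialization.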
Results
- Languages: French-English, German-English
- Training: WMT-14 News Crawl
- Test set: WMT-14 newstest (BLEU)

             System                                                  FR-EN  EN-FR  DE-EN  EN-DE
Unsupervised Cross-lingual embs. (Artetxe et al., ICLR’18)*          15.6   15.1   10.2    6.6
             + scaling up (Conneau & Lample, NeurIPS’19)*            29.4   29.4     -      -
             Deep pre-training (Conneau & Lample, NeurIPS’19)*       33.3   33.4     -      -
             Unsup SMT + NMT (Artetxe et al., ACL’19)*               33.5   36.2   27.0   22.5
                 detok. SacreBLEU                                    33.2   33.6   26.4   21.2
Supervised   WMT winner                                              35.0   35.8   29.0   20.6
             Original transformer (Vaswani et al., NIPS’17)*           -    41.0     -    28.4
             Large scale back-translation (Edunov et al., EMNLP’18)    -    43.8     -    33.8

*Tokenized BLEU (about 1-2 points higher)
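For reference, the BLEU metric reported in the table can be sketched in a few lines: the geometric mean of modified n-gram precisions, scaled by a brevity penalty. This is a simplified single-reference version with crude smoothing, not sacreBLEU; whitespace splitting stands in for real tokenization, which is exactly the knob that makes the starred (tokenized) scores run higher than the detokenized SacreBLEU row.

```python
# Simplified single-reference BLEU (up to 4-grams) -- an illustrative
# sketch, not a replacement for sacreBLEU.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=4):
    hyp, ref = hyp.split(), ref.split()  # naive tokenization
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = max(1, sum(h.values()))
        log_prec += math.log(max(match, 1e-9) / total)   # crude smoothing
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))     # brevity penalty
    return 100 * bp * math.exp(log_prec / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # → 100.0
```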
Conclusions

Unsupervised machine translation is a reality!
- General recipe: back-translation + smart initialization
- Align monolingual representations
- Generalize from word-level to sentence-level translation

Strong results
- Competitive with supervised SOTA from 6 years ago
- …in just 2-3 years!
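The back-translation half of that recipe can be illustrated end to end with a toy loop: translate monolingual target-language text with the current reverse model, then train the forward model on the resulting (synthetic source, real target) pairs. The word-substitution dictionaries standing in for the two NMT models below are an assumption purely for illustration.

```python
# Toy back-translation loop: dictionaries play the role of NMT models.

def train(pairs):
    """'Train' a word-for-word model from (source, target) sentence pairs."""
    model = {}
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            model[s] = t
    return model

def translate(model, sentence):
    return " ".join(model.get(w, w) for w in sentence.split())

# Seed reverse (L2->L1) model, e.g. obtained from cross-lingual embeddings
# (illustrative entries).
rev = {"katze": "cat", "hund": "dog", "schläft": "sleeps"}

mono_l2 = ["katze schläft", "hund schläft"]
# Back-translate L2 monolingual data into synthetic L1 sources...
synthetic = [(translate(rev, s), s) for s in mono_l2]
# ...and train the forward (L1->L2) model on the synthetic pairs.
fwd = train(synthetic)
print(translate(fwd, "dog sleeps"))  # → 'hund schläft'
```

In the real systems this alternation is repeated in both directions, so each model keeps improving on the other's progressively cleaner synthetic data.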
What’s next?
Thank you!
Twitter: @artetxem
Email: [email protected]