

Machine Translation with Weakly Paired Documents

1Lijun Wu, 2Jinhua Zhu, 3Fei Gao, 4Di He, 5Tao Qin, 1Jianhuang Lai and 5Tie-Yan Liu

1Sun Yat-sen University; 2University of Science and Technology of China; 3Institute of Computing Technology, Chinese Academy of Sciences; 4Peking University; 5Microsoft Research Asia

Contact: [email protected]

1. Introduction

• NMT achieves strong performance on rich-resource language pairs with large amounts of parallel data.

• Low-resource language pairs suffer much lower translation accuracy due to the lack of bilingual sentence pairs.

• Unsupervised machine translation has been explored with monolingual data only.

• In reality, a large amount of weakly paired bilingual documents can be leveraged.

• We propose to boost unsupervised machine translation with weakly paired documents using two novel components.

• We achieve strong performance on various language pairs and reduce the gap between supervised and unsupervised translation by up to 50%.

3. Algorithm

• The overall loss function is the sum of L_m and the two alignment losses introduced in the approach below, where L_m is the original unsupervised NMT training loss.

5. Studies

• Analysis
• Data Statistics
• Overall Results

2. Approach

• We propose to leverage weakly paired bilingual documents from Wikipedia.

• Notations: D = {(d_i^X, d_i^Y)} is the set of weakly paired documents (e.g., two cross-lingually linked Wikipedia pages); n_i^X and n_i^Y are the numbers of sentences in the paired documents d_i^X and d_i^Y, and usually n_i^X ≠ n_i^Y; x and y denote sentences in languages X and Y.

♦ Mining implicitly aligned sentence pairs

• e_w: the cross-lingual word embedding of word w, obtained from MUSE.

• p_w: the word frequency estimated from the document.

• a: a predefined parameter; the weighted sentence embedding is ê_s = (1/|s|) · Σ_{w∈s} a/(a + p_w) · e_w.

• u_1: the first principal component of all sentence embeddings, whose projection is removed from ê_s to obtain the final sentence embedding e_s.
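The weighted sentence embedding and principal-component removal above can be sketched as follows. This is an illustrative implementation only: the function name, the smoothing value a = 1e-3, and the assumption that embeddings come as a plain word-to-vector dict are ours, not the poster's.

```python
import numpy as np

def sentence_embedding(sentences, word_vec, word_freq, a=1e-3):
    """Weighted sentence embeddings: each word vector is scaled by a / (a + p_w),
    averaged per sentence, then the first principal component is removed."""
    embs = []
    for sent in sentences:
        words = [w for w in sent if w in word_vec]
        vecs = np.array([word_vec[w] * (a / (a + word_freq[w])) for w in words])
        embs.append(vecs.mean(axis=0))
    embs = np.array(embs)
    # u_1: first principal component of all sentence embeddings.
    u, _, _ = np.linalg.svd(embs.T, full_matrices=False)
    u1 = u[:, 0]
    # Remove the projection onto u_1 from every sentence embedding.
    return embs - np.outer(embs @ u1, u1)
```

In practice word_vec would hold MUSE cross-lingual embeddings so that sentences from both languages land in a shared space.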

• Select a sentence pair (s^X, s^Y) when sim(s^X, s^Y) = <e_{s^X}, e_{s^Y}> / (||e_{s^X}|| · ||e_{s^Y}||) is larger than a threshold c_1, and also ensure that this pair's score is larger than that of other candidate pairs by a margin c_2.

• The mined sentence pairs are trained with the implicitly aligned sentence loss on both sides (X → Y and Y → X).
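A minimal sketch of this selection rule, assuming the sentence embeddings of the two documents are already computed; the threshold values c_1 and c_2 below are illustrative defaults, not the ones used in the paper.

```python
import numpy as np

def mine_pairs(emb_x, emb_y, c1=0.6, c2=0.05):
    """Return index pairs (i, j) whose cosine similarity exceeds c1 and beats
    every competing pair for either sentence by at least the margin c2."""
    ex = emb_x / np.linalg.norm(emb_x, axis=1, keepdims=True)
    ey = emb_y / np.linalg.norm(emb_y, axis=1, keepdims=True)
    sim = ex @ ey.T  # cosine similarity matrix, shape (n_x, n_y)
    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))
        best = sim[i, j]
        # Best competing score among other pairs sharing sentence i or j.
        runner_up = max(np.max(np.delete(sim[i], j), initial=-1.0),
                        np.max(np.delete(sim[:, j], i), initial=-1.0))
        if best > c1 and best - runner_up > c2:
            pairs.append((i, j))
    return pairs
```

The margin test keeps only pairs that are unambiguously better than any alternative alignment for either sentence, which filters out topically similar but non-parallel sentences.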

♦ Aligning topic distributions

• Translate d_i^X into d̂_i^Y with the current model.

• Evaluate the word distribution between d̂_i^Y and the true d_i^Y.

• Feed each pair (s_{i,k}^X, s_{i,k}^Y) into the NMT model and calculate the predicted word distribution P(w^Y; d_i^X).

• The ground-truth document word distribution is the empirical word frequency in d_i^Y.

• The document alignment loss for X → Y measures the discrepancy between P(w^Y; d_i^X) and the ground-truth distribution; the detailed loss is applied on both sides (X → Y and Y → X).
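The topic-alignment step can be sketched as comparing two document-level word distributions. The cross-entropy below is an assumed stand-in for the poster's loss, and the helper names are ours; only the empirical-frequency definition of the ground-truth distribution follows from the text above.

```python
import math
from collections import Counter

def word_distribution(document):
    """Empirical word distribution of a document given as tokenized sentences."""
    counts = Counter(w for sent in document for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def alignment_loss(pred_dist, true_dist, eps=1e-12):
    """Cross-entropy H(true, pred) between the ground-truth document word
    distribution and the model-predicted one (assumed loss form)."""
    return -sum(p * math.log(pred_dist.get(w, eps))
                for w, p in true_dist.items())
```

The loss is minimized when the translated document reproduces the word statistics of the weakly paired target document, which is exactly the topic-level supervision the weak pairing provides.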

4. Experiments

• Ablation Study

• Impact of Sentence Quality