Machine Translation with Weakly Paired Documents
1Lijun Wu, 2Jinhua Zhu, 3Fei Gao, 4Di He, 5Tao Qin, 1Jianhuang Lai and 5Tie-Yan Liu
1Sun Yat-sen University; 2University of Science and Technology of China; 3Institute of Computing Technology, Chinese Academy of Sciences;
4Peking University; 5Microsoft Research Asia
Contact
1. Introduction
• NMT achieves strong performance on rich-resource language pairs with large amounts of parallel data.
• Low-resource language pairs suffer from much lower translation accuracy due to the lack of bilingual sentence pairs.
• Unsupervised machine translation has been explored with monolingual data only.
• In reality, a large number of weakly paired bilingual documents can be leveraged.
• We propose to boost unsupervised machine translation with weakly paired documents using two novel components.
• We achieve strong performance on various language pairs and reduce the gap between supervised and unsupervised translation by up to 50%.
2. Approach
• We propose to leverage weakly paired bilingual documents from Wikipedia.
• Notations:
  D = {(d_i^X, d_i^Y)}: the set of weakly paired documents (e.g., two cross-lingually linked Wikipedia pages).
  n_i^X, n_i^Y: the numbers of sentences in the paired documents d_i^X and d_i^Y; usually n_i^X ≠ n_i^Y.
  x, y: sentences in languages X and Y.

3. Algorithm
• The overall loss function is the sum of L_m, the original unsupervised NMT training loss, the implicitly aligned sentence training loss, and the document alignment loss.
♦ Mining implicitly aligned sentence pairs
  e_w: the cross-lingual embedding of word w, from MUSE.
  p_w: the estimated frequency of word w in the documents.
  a: a predefined parameter; the weighted sentence embedding is ê_s = (1/|s|) Σ_{w∈s} a/(a+p_w) · e_w.
  u_1: the first principal component of all sentence embeddings; its projection is removed, e_s = ê_s − u_1 u_1^T ê_s.
  Select a sentence pair if sim(s^X, s^Y) = ⟨e_{s^X}, e_{s^Y}⟩ / (‖e_{s^X}‖ ‖e_{s^Y}‖) is larger than a threshold c_1, and additionally exceeds the score of every competing pair by a margin c_2.
  The implicitly aligned sentence training loss is applied on both sides (X → Y and Y → X).
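The mining step above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`sif_embedding`, `mine_pairs`), the SIF-style frequency weighting, and the greedy best-match selection are choices made for this sketch, and real cross-lingual word vectors would come from MUSE.

```python
import numpy as np

def sif_embedding(sentences, emb, freq, a=1e-3):
    """Weighted sentence embeddings: ê_s = (1/|s|) Σ a/(a+p_w) · e_w,
    then remove the projection on the first principal component u_1
    (SIF-style weighting; an assumption of this sketch)."""
    vecs = []
    for sent in sentences:
        words = [w for w in sent.split() if w in emb]
        weights = np.array([a / (a + freq.get(w, 0.0)) for w in words])
        vecs.append((weights[:, None] * np.array([emb[w] for w in words])).mean(axis=0))
    E = np.vstack(vecs)
    _, _, vt = np.linalg.svd(E, full_matrices=False)  # vt[0] = first principal direction
    u1 = vt[0]
    return E - np.outer(E @ u1, u1)  # e_s = ê_s - u_1 u_1^T ê_s

def mine_pairs(ex, ey, c1=0.5, c2=0.1):
    """Keep a pair (i, j) if its cosine similarity exceeds c1 and beats
    every competing pair for either sentence by at least the margin c2."""
    ex = ex / np.linalg.norm(ex, axis=1, keepdims=True)
    ey = ey / np.linalg.norm(ey, axis=1, keepdims=True)
    sim = ex @ ey.T  # cosine-similarity matrix between the two documents
    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))
        runner_row = np.partition(sim[i], -2)[-2] if sim.shape[1] > 1 else -1.0
        rest_col = np.delete(sim[:, j], i)
        runner_col = rest_col.max() if rest_col.size else -1.0
        if sim[i, j] > c1 and sim[i, j] - max(runner_row, runner_col) >= c2:
            pairs.append((i, j, float(sim[i, j])))
    return pairs
```

The mined (i, j, score) triples would then serve as pseudo-parallel sentence pairs for training the NMT model.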
♦ Aligning Topic Distributions
  Translate d_i^X into d̂_i^Y and compare the word distributions of d̂_i^Y and d_i^Y.
  Feed each pair (s_{i,k}^X, s_{i,k}^Y) into the NMT model and calculate the model word distribution P(w^Y; d_i^X).
  The ground-truth document word distribution is estimated from the word frequencies in d_i^Y.
  The document alignment loss of X → Y matches these two distributions; the detailed two-side loss adds the symmetric Y → X term.
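The alignment of topic distributions can be illustrated as a cross-entropy between two word distributions over the target vocabulary. A toy sketch, assuming simple frequency counts for the ground-truth distribution and hypothetical helper names (`word_distribution`, `topic_alignment_loss`); the paper's exact loss may differ.

```python
import math
from collections import Counter

def word_distribution(tokens, vocab):
    """Empirical word distribution of a document, restricted to vocab."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) or 1
    return {w: counts[w] / total for w in vocab}

def topic_alignment_loss(true_doc_tokens, model_probs, vocab, eps=1e-12):
    """Cross-entropy between the ground-truth word distribution of d_i^Y
    and the model's predicted distribution P(w^Y; d_i^X)."""
    p_true = word_distribution(true_doc_tokens, vocab)
    return -sum(p_true[w] * math.log(model_probs.get(w, 0.0) + eps) for w in vocab)
```

The loss is minimized when the model's predicted target-word distribution matches the empirical distribution of the paired target document; the symmetric Y → X term is computed the same way.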
4. Experiments
• Data Statistics
• Overall Results

5. Studies
• Analysis
• Ablation Study
• Impact of Sentence Quality