Machine Translation with Weakly Paired Documents
1Lijun Wu, 2Jinhua Zhu, 3Fei Gao, 4Di He, 5Tao Qin, 1Jianhuang Lai and 5Tie-Yan Liu
1Sun Yat-sen University; 2University of Science and Technology of China; 3Institute of Computing Technology, Chinese Academy of Sciences;
4Peking University; 5Microsoft Research Asia
Contact
1. Introduction
• NMT achieves strong performance on rich-resource language pairs with large amounts of parallel data.
• Low-resource language pairs suffer from much lower translation accuracy due to the lack of bilingual sentence pairs.
• Unsupervised machine translation has been explored with monolingual data only.
• In reality, a large number of weakly paired bilingual documents can be leveraged.
• We propose to boost unsupervised machine translation with weakly paired documents using two novel components.
• We achieve strong performance on various language pairs and reduce the gap between supervised and unsupervised translation by up to 50%.
2. Approach
• We propose to leverage weakly paired bilingual documents from Wikipedia.
• Notations:
  D = {(d_i^X, d_i^Y)}: the set of weakly paired documents (e.g., two cross-lingually linked Wikipedia pages).
  n_i^X, n_i^Y: the numbers of sentences in the paired documents d_i^X and d_i^Y; usually n_i^X ≠ n_i^Y.
  x, y: sentences in languages X and Y.

3. Algorithm
• The overall loss function is the sum of L_m, the original unsupervised NMT training loss, the implicitly aligned sentence training loss, and the document alignment loss.
♦ Mining implicitly aligned sentence pairs
  e_w: the cross-lingual embedding of word w, from MUSE.
  p_w: the estimated frequency of word w in the documents.
  a: a predefined parameter; the weighted sentence embedding is ê_s = (1/|s|) Σ_{w∈s} a/(a+p_w) · e_w.
  u_1: the first principal component of all sentence embeddings; its projection is removed, e_s = ê_s − u_1 u_1^T ê_s.
  Select a sentence pair if sim(s^X, s^Y) = ⟨e_{s^X}, e_{s^Y}⟩ / (‖e_{s^X}‖ ‖e_{s^Y}‖) is larger than a threshold c_1, and additionally exceeds the score of every competing pair by a margin c_2.
  The implicitly aligned sentence training loss is applied on both sides (X → Y and Y → X).
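The mining step above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`sif_embedding`, `mine_pairs`), the SIF-style frequency weighting, and the greedy best-match selection are choices made for this sketch, and real cross-lingual word vectors would come from MUSE.

```python
import numpy as np

def sif_embedding(sentences, emb, freq, a=1e-3):
    """Weighted sentence embeddings: ê_s = (1/|s|) Σ a/(a+p_w) · e_w,
    then remove the projection on the first principal component u_1
    (SIF-style weighting; an assumption of this sketch)."""
    vecs = []
    for sent in sentences:
        words = [w for w in sent.split() if w in emb]
        weights = np.array([a / (a + freq.get(w, 0.0)) for w in words])
        vecs.append((weights[:, None] * np.array([emb[w] for w in words])).mean(axis=0))
    E = np.vstack(vecs)
    _, _, vt = np.linalg.svd(E, full_matrices=False)  # vt[0] = first principal direction
    u1 = vt[0]
    return E - np.outer(E @ u1, u1)  # e_s = ê_s - u_1 u_1^T ê_s

def mine_pairs(ex, ey, c1=0.5, c2=0.1):
    """Keep a pair (i, j) if its cosine similarity exceeds c1 and beats
    every competing pair for either sentence by at least the margin c2."""
    ex = ex / np.linalg.norm(ex, axis=1, keepdims=True)
    ey = ey / np.linalg.norm(ey, axis=1, keepdims=True)
    sim = ex @ ey.T  # cosine-similarity matrix between the two documents
    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))
        runner_row = np.partition(sim[i], -2)[-2] if sim.shape[1] > 1 else -1.0
        rest_col = np.delete(sim[:, j], i)
        runner_col = rest_col.max() if rest_col.size else -1.0
        if sim[i, j] > c1 and sim[i, j] - max(runner_row, runner_col) >= c2:
            pairs.append((i, j, float(sim[i, j])))
    return pairs
```

The mined (i, j, score) triples would then serve as pseudo-parallel sentence pairs for training the NMT model.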
♦ Aligning Topic Distributions
  Translate d_i^X into d̂_i^Y and compare the word distributions of d̂_i^Y and d_i^Y.
  Feed each pair (s_{i,k}^X, s_{i,k}^Y) into the NMT model and calculate the model word distribution P(w^Y; d_i^X).
  The ground-truth document word distribution is estimated from the word frequencies in d_i^Y.
  The document alignment loss of X → Y matches these two distributions; the detailed two-side loss adds the symmetric Y → X term.
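The alignment of topic distributions can be illustrated as a cross-entropy between two word distributions over the target vocabulary. A toy sketch, assuming simple frequency counts for the ground-truth distribution and hypothetical helper names (`word_distribution`, `topic_alignment_loss`); the paper's exact loss may differ.

```python
import math
from collections import Counter

def word_distribution(tokens, vocab):
    """Empirical word distribution of a document, restricted to vocab."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) or 1
    return {w: counts[w] / total for w in vocab}

def topic_alignment_loss(true_doc_tokens, model_probs, vocab, eps=1e-12):
    """Cross-entropy between the ground-truth word distribution of d_i^Y
    and the model's predicted distribution P(w^Y; d_i^X)."""
    p_true = word_distribution(true_doc_tokens, vocab)
    return -sum(p_true[w] * math.log(model_probs.get(w, 0.0) + eps) for w in vocab)
```

The loss is minimized when the model's predicted target-word distribution matches the empirical distribution of the paired target document; the symmetric Y → X term is computed the same way.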
4. Experiments
• Data Statistics
• Overall Results

5. Studies
• Analysis
• Ablation Study
• Impact of Sentence Quality