Bayesian Word Alignment for Statistical Machine Translation
Authors: Coskun Mermer, Murat Saraclar
Presented by Jun Lang, 2011-10-13, I2R SMT-Reading Group

Paper info

• Bayesian Word Alignment for Statistical Machine Translation

• ACL 2011 Short Paper

• Comes with Perl source code (379 lines)

• Authors
  – Coskun Mermer
  – Murat Saraclar

Core Idea

• Proposes a Gibbs sampler for fully Bayesian inference in IBM Model 1

• Results
  – Outperforms classical EM by up to 2.99 BLEU points
  – Effectively addresses the rare-word problem
  – Produces a much smaller phrase table than EM

Mathematics

• (E, F): parallel corpus

• $e_i$ ($f_j$): the i-th (j-th) source (target) word in sentence e (f), which contains I (J) words; e and f come from corpus E (F)

• $e_0$: the “null” word contained in each E sentence

• $V_E$ ($V_F$): size of the source (target) vocabulary

• a (A): alignment for a sentence (the corpus)

• $a_j$: alignment of $f_j$, i.e. $f_j$ is aligned to source word $e_{a_j}$

• T: parameter table of size $V_E \times V_F$

• $t_{e,f} = P(f \mid e)$: word translation probability
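A toy illustration of the notation above, as a sketch in Python (the language used for all sketches here). The two-sentence corpus and the token string for the null word are hypothetical, and the indices are 0-based: position 0 of each E sentence holds $e_0$, and an alignment value a[j] = i means $f_j$ is aligned to $e_i$.

```python
NULL = "<null>"  # stands for e_0; the token string is an arbitrary choice

# Hypothetical toy parallel corpus (E, F): list of (e sentence, f sentence)
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]

# Prepend e_0 so source positions run 0..I and a[j] = 0 means "aligned to null"
E = [[NULL] + e for e, _ in corpus]
F = [f for _, f in corpus]

# Vocabulary sizes; e_0 is counted as a source type here since it needs a row in T
V_E = len({w for e in E for w in e})
V_F = len({w for f in F for w in f})

# A: one alignment vector a per sentence pair
A = [[1, 2], [1, 2]]

def aligned_pairs(e, f, a):
    """Return the (e_{a_j}, f_j) word pairs implied by alignment a."""
    return [(e[a_j], f_j) for f_j, a_j in zip(f, a)]

print(V_E, V_F)                         # 4 3  (T would be a 4 x 3 table)
print(aligned_pairs(E[0], F[0], A[0]))  # [('das', 'the'), ('haus', 'house')]
```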

IBM Model 1

T is treated as a random variable (instead of a fixed parameter estimated by EM)
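The slide's equations did not survive extraction. For reference, the standard IBM Model 1 likelihood in this deck's notation is restated below; this is our restatement rather than the slide's original, with ε a constant for the sentence-length probability and the sum over i including the null word $e_0$.

```latex
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; T) = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} t_{e_{a_j}, f_j}
\qquad
P(\mathbf{f} \mid \mathbf{e}; T) = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} \sum_{i=0}^{I} t_{e_i, f_j}
```

EM estimates a single value of T that maximizes the corpus likelihood. Treating T as a random variable with a prior $p(T)$ instead targets the posterior $p(T \mid \mathbf{E}, \mathbf{F}) \propto P(\mathbf{F} \mid \mathbf{E}, T)\, p(T)$.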

Dirichlet Distribution

• Each row $t_e$ of $T = \{t_{e,f}\}$ is a distribution from the exponential family

• Specifically, a multinomial distribution

• We choose its conjugate prior, the Dirichlet distribution, for computational convenience
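Conjugacy is what makes the Dirichlet convenient here: with multinomial (count) observations, the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed counts. A minimal sketch for a single row $t_e$; the prior value 0.5 and the counts are made up for illustration.

```python
def dirichlet_posterior(prior, counts):
    """Posterior Dirichlet parameters for one row t_e: prior + observed link counts."""
    return {f: prior[f] + counts.get(f, 0) for f in prior}

def dirichlet_mean(params):
    """Mean of a Dirichlet: each parameter divided by the sum of all parameters."""
    total = sum(params.values())
    return {f: p / total for f, p in params.items()}

# Hypothetical example: one source word e over a 3-word target vocabulary
prior = {"the": 0.5, "house": 0.5, "book": 0.5}
counts = {"the": 3, "house": 1}   # times e was aligned to each target word
post = dirichlet_posterior(prior, counts)
print(post)                        # {'the': 3.5, 'house': 1.5, 'book': 0.5}
mean = dirichlet_mean(post)
```

"book" keeps non-zero probability (0.5 / 5.5) although it was never observed with e; the prior acts as smoothing.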

Dirichlet Distribution

For each source word type e, $t_e$ is a distribution over the target vocabulary, and we place a Dirichlet prior on it

Purpose: prevent rare source words from acting as “garbage collectors”, i.e. absorbing alignments to many unrelated target words
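One intuition for how the prior helps, assuming a symmetric Dirichlet: with a parameter well below 1, draws put most of their mass on a few outcomes, so each $t_e$ is pushed toward being peaked rather than spread thinly over many target words. The values 0.1 and 10 below are illustrative, not the paper's settings; Dirichlet samples are formed by normalizing independent Gamma draws.

```python
import random

def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet(alpha, ..., alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]

def mean_largest_share(alpha, size=50, trials=500, seed=0):
    """Average probability of the single most likely target word over many draws of t_e."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, size, rng)) for _ in range(trials)) / trials

sparse = mean_largest_share(0.1)   # mass concentrates on a few target words
flat = mean_largest_share(10.0)    # mass spread almost evenly over all 50
print(f"alpha=0.1: {sparse:.2f}  alpha=10: {flat:.2f}")
```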

Dirichlet Distribution

Sample the unknowns A and T in turn

The superscript ¬j denotes exclusion of the current value of $a_j$.
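The slide's sampling equations were lost in extraction. As a reference point (our reconstruction, not copied from the slide), suppose T is integrated out under a symmetric Dirichlet prior with hyperparameter α (our symbol). Model 1's uniform alignment probability then cancels, and each $a_j$ is resampled from the following, where $n^{\neg j}_{e,f}$ counts links between source type e and target type f in A with $a_j$ removed:

```latex
P(a_j = i \mid \mathbf{E}, \mathbf{F}, \mathbf{A}^{\neg j})
  \propto \frac{n^{\neg j}_{e_i, f_j} + \alpha}{\sum_{f=1}^{V_F} n^{\neg j}_{e_i, f} + V_F\,\alpha},
  \qquad i = 0, \dots, I
```

In the uncollapsed variant that samples T explicitly, one instead draws $t_e \sim \mathrm{Dir}(\alpha + n_{e,1}, \dots, \alpha + n_{e,V_F})$ for each source type and then samples $P(a_j = i) \propto t_{e_i, f_j}$.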

Algorithm

A can be initialized arbitrarily, but starting from the alignments produced by standard EM works better
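The sampling loop can be sketched in Python as a minimal collapsed Gibbs sampler (T integrated out), using $P(a_j = i \mid \cdot) \propto (n^{\neg j}_{e_i,f_j}+\alpha)/(n^{\neg j}_{e_i,\cdot}+V_F\alpha)$. It is written for illustration and is not a port of bayesalign.pl. The symmetric hyperparameter `alpha` and its default, the sweep count, and the all-null default start are our assumptions. As on the slide, any starting alignment works, and an EM alignment can be passed as `init`.

```python
import random
from collections import defaultdict

NULL = "<null>"  # stands for e_0

def gibbs_align(corpus, alpha=0.01, sweeps=100, init=None, seed=0):
    """Collapsed Gibbs sampler for IBM Model 1 alignments (illustrative sketch).

    corpus: list of (e_words, f_words); e_0 = NULL is prepended to each e sentence.
    alpha:  symmetric Dirichlet hyperparameter (assumed value, not from the paper).
    init:   optional starting alignment A, e.g. from EM; defaults to all links on e_0.
    """
    rng = random.Random(seed)
    E = [[NULL] + e for e, _ in corpus]
    F = [f for _, f in corpus]
    V_F = len({w for f in F for w in f})
    # Copy so the caller's initial alignment is not modified in place
    A = [list(a) for a in init] if init is not None else [[0] * len(f) for f in F]

    pair = defaultdict(int)  # n_{e,f}: links between source type e and target type f
    row = defaultdict(int)   # n_{e,.}: all links leaving source type e
    for e, f, a in zip(E, F, A):
        for f_j, a_j in zip(f, a):
            pair[e[a_j], f_j] += 1
            row[e[a_j]] += 1

    for _ in range(sweeps):
        for e, f, a in zip(E, F, A):
            for j, f_j in enumerate(f):
                # Remove the current link, so the counts become n^{¬j}
                old = e[a[j]]
                pair[old, f_j] -= 1
                row[old] -= 1
                # Unnormalized P(a_j = i | rest) for every source position i = 0..I
                weights = [(pair[e_i, f_j] + alpha) / (row[e_i] + V_F * alpha) for e_i in e]
                a[j] = rng.choices(range(len(e)), weights=weights)[0]
                # Add the newly sampled link back into the counts
                new = e[a[j]]
                pair[new, f_j] += 1
                row[new] += 1
    return A, pair

# Hypothetical toy corpus; in practice init would come from an EM run
toy = [(["das", "haus"], ["the", "house"]), (["das", "buch"], ["the", "book"])]
A, counts = gibbs_align(toy, sweeps=200)
print(A)
```

To read off translation probabilities afterwards, one option is the posterior mean $t_{e,f} \approx (n_{e,f} + \alpha)/(n_{e,\cdot} + V_F\,\alpha)$, computed from the final counts or averaged over several late sweeps.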

Results

Code View

bayesalign.pl

Conclusions

• Outperforms classical EM by up to 2.99 BLEU points

• Effectively addresses the rare-word problem

• Produces a much smaller phrase table than EM

• Shortcomings
  – Too slow: 100 sentence pairs take 18 minutes
  – Could possibly be sped up with parallel computing
