Bayesian Word Alignment for Statistical Machine Translation. Authors: Coskun Mermer, Murat Saraclar. Presented by Jun Lang, 2011-10-13, I2R SMT-Reading Group

Bayesian Word Alignment for Statistical Machine Translation

Authors: Coskun Mermer, Murat Saraclar

Presented by Jun Lang, 2011-10-13, I2R SMT-Reading Group

Paper info

• Bayesian Word Alignment for Statistical Machine Translation

• ACL 2011 Short Paper

• Source code available in Perl (379 lines)

• Authors
– Coskun Mermer
– Murat Saraclar

Core Idea

• Propose a Gibbs Sampler for Fully Bayesian Inference in IBM Model 1

• Results
– Outperforms classical EM by up to 2.99 BLEU points
– Effectively addresses the rare word problem
– Produces a much smaller phrase table than EM

Mathematics

• (E, F): parallel corpus

• e_i, f_j: the i-th source word of sentence e and the j-th target word of sentence f; e (in E) contains I words and f (in F) contains J words

• e_0: the "null" word that every sentence in E contains

• V_E (V_F): size of the source (target) vocabulary

• a (A): alignment for a sentence (for the corpus)

• a_j: f_j is aligned to the source word e_{a_j}

• T: parameter table of size V_E x V_F

• t_{e,f} = P(f|e): word translation probability

IBM Model 1

T as a random variable
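The formulas on this slide did not survive in the transcript. As a hedged reconstruction from the notation defined earlier (not a copy of the slide), the standard IBM Model 1 likelihood of a sentence pair under a fixed table T, and the marginal obtained when T is instead treated as a random variable with prior P(T), can be written as follows. The proportionality sign drops the constant length term, and P(T) is left unspecified here.

```latex
% IBM Model 1 with T as a fixed parameter table (standard form)
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; T)
  \;\propto\; \frac{1}{(I+1)^{J}} \prod_{j=1}^{J} t_{e_{a_j},\, f_j}

% T as a random variable: integrate it out under a prior P(T)
P(F, A \mid E) \;=\; \int_{T} P(T)\, P(F, A \mid E, T)\, dT
```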

Dirichlet Distribution

• Each row of T = {t_{e,f}} is an exponential-family distribution

• Specifically, it is a multinomial distribution

• We choose its conjugate prior

• For a multinomial this is the Dirichlet distribution, chosen for computational convenience

Dirichlet Distribution

For each source word type e, t_e is a distribution over the target vocabulary and is given a Dirichlet prior

The prior helps keep rare words from acting as “garbage collectors”
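To illustrate how the Dirichlet hyperparameter shapes t_e, here is a minimal Python sketch that draws from a symmetric Dirichlet with a small and a large concentration value. The function name `sample_dirichlet` and the values 0.1, 10.0, and 20 are illustrative assumptions, not settings from the paper. A small concentration tends to put most of the mass on a few target words, which is the kind of sparsity that discourages garbage-collector behaviour.

```python
import random


def sample_dirichlet(theta, size, rng):
    """Draw one sample from a symmetric Dirichlet(theta, ..., theta).

    Uses the standard construction: normalise independent Gamma(theta, 1) draws.
    """
    draws = [rng.gammavariate(theta, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
sparse_t_e = sample_dirichlet(0.1, 20, rng)   # hypothetical sparse prior
flat_t_e = sample_dirichlet(10.0, 20, rng)    # hypothetical near-uniform prior

# Compare how much probability mass the single most likely target word gets.
print("largest entry, theta=0.1 :", round(max(sparse_t_e), 3))
print("largest entry, theta=10.0:", round(max(flat_t_e), 3))
```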

Dirichlet Distribution

Sample the unknowns A and T in turn

¬j denotes the exclusion of the current value of a_j.
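The sampling formula itself is missing from the transcript. As a sketch under stated assumptions, the ¬j notation fits the standard collapsed form for a Dirichlet-multinomial model, in which T is integrated out and only the alignments are resampled. The symbols θ_{e,f} (Dirichlet hyperparameters) and N^{¬j}_{e,f} (the number of times e is aligned to f anywhere in the corpus, excluding the current link a_j) are introduced here for illustration.

```latex
P(a_j = i \mid E, F, A^{\neg j}; \Theta)
  \;\propto\;
  \frac{N^{\neg j}_{e_i, f_j} + \theta_{e_i, f_j}}
       {\sum_{f=1}^{V_F} N^{\neg j}_{e_i, f} + \sum_{f=1}^{V_F} \theta_{e_i, f}}
```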

Algorithm

The initial alignment A can be arbitrary, but starting from the normal EM output works better
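The algorithm listing on this slide is not in the transcript. The following Python sketch shows one way a collapsed Gibbs sampler for IBM Model 1 could look; it is not the authors' bayesalign.pl. The function name `gibbs_align`, the symmetric hyperparameter `theta`, the random initialisation, and returning the last sample (rather than combining samples after burn-in) are all simplifying assumptions.

```python
import random
from collections import defaultdict

NULL = "<null>"  # stands in for the null word e_0 (hypothetical token name)


def gibbs_align(corpus, theta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling sketch for IBM Model 1 alignments.

    corpus: list of (source_words, target_words) pairs.
    Returns one alignment list per sentence pair; a[j] indexes into
    [NULL] + source_words, so 0 means "aligned to the null word".
    """
    rng = random.Random(seed)
    src = [[NULL] + list(e) for e, _ in corpus]
    tgt = [list(f) for _, f in corpus]
    v_f = len({w for f in tgt for w in f})  # target vocabulary size V_F

    count = defaultdict(float)  # N[e, f]: links between e and f
    total = defaultdict(float)  # sum over f of N[e, f]

    # Arbitrary (random) initial alignment; EM output could be used instead.
    align = []
    for e, f in zip(src, tgt):
        a = [rng.randrange(len(e)) for _ in f]
        for j, i in enumerate(a):
            count[e[i], f[j]] += 1
            total[e[i]] += 1
        align.append(a)

    for _ in range(iters):
        for e, f, a in zip(src, tgt, align):
            for j, fj in enumerate(f):
                # Remove the current link so the counts exclude a_j.
                old = e[a[j]]
                count[old, fj] -= 1
                total[old] -= 1
                # Unnormalised conditional for each candidate source position.
                weights = [
                    (count[ei, fj] + theta) / (total[ei] + theta * v_f)
                    for ei in e
                ]
                a[j] = rng.choices(range(len(e)), weights=weights)[0]
                # Add the newly sampled link back into the counts.
                new = e[a[j]]
                count[new, fj] += 1
                total[new] += 1
    return align


toy = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
for (e, f), a in zip(toy, gibbs_align(toy)):
    print([(fj, ([NULL] + e)[i]) for fj, i in zip(f, a)])
```

Each inner step touches every source position of the sentence, and the sampler sweeps the whole corpus many times, which is consistent with the speed concern raised in the conclusions.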

Results

Code View

bayesalign.pl

Conclusions

• Outperforms classical EM by up to 2.99 BLEU points

• Effectively addresses the rare word problem

• Produces a much smaller phrase table than EM

• Shortcomings
– Too slow: 100 sentence pairs take 18 minutes
– Could possibly be sped up with parallel computing
