Combining, Adapting and Reusing Bi-texts between Related Languages: Application to Statistical Machine Translation
Preslav Nakov, Qatar Computing Research Institute (collaborators: Jorg Tiedemann, Pidong Wang, Hwee Tou Ng)
Yandex seminar, August 13, 2014, Moscow, Russia

Uploaded by yandex, 10-May-2015

DESCRIPTION

Bilingual sentence-aligned parallel corpora, or bi-texts, are a useful resource for solving many computational linguistics problems, including part-of-speech tagging, syntactic parsing, named entity recognition, word sense disambiguation, and sentiment analysis; they are also a critical resource for real-world applications such as statistical machine translation (SMT) and cross-language information retrieval. Unfortunately, building large bi-texts is hard, and thus most of the 6,500+ world languages remain resource-poor in bi-texts. However, many resource-poor languages are related to some resource-rich language, with which they overlap in vocabulary and share cognates, which offers opportunities for reusing their bi-texts.

We explore several options for bi-text reuse: (i) direct combination of bi-texts, (ii) combination of models trained on such bi-texts, and (iii) a sophisticated combination of (i) and (ii). We further explore the idea of generating a bi-text for a resource-poor language by adapting a bi-text for a resource-rich language: we build a lattice of adaptation options for each word and phrase, and we then decode it using a language model for the resource-poor language. We compare word- and phrase-level adaptation, and we further make use of cross-language morphology. For the adaptation, we experiment with (a) a standard phrase-based SMT decoder, and (b) a specialized beam-search adaptation decoder. Finally, we observe that for closely related languages, many of the differences are at the subword level. Thus, we explore the idea of reducing translation to character-level transliteration. We further demonstrate the potential of combining word- and character-level models.

TRANSCRIPT

Page 1: Dr. Preslav Nakov — Combining, Adapting and Reusing Bi-texts between Related Languages — Application to Statistical Machine Translation — part 1

Combining, Adapting and Reusing Bi-texts between Related Languages:

Application to Statistical Machine Translation

Preslav Nakov, Qatar Computing Research Institute (collaborators: Jorg Tiedemann, Pidong Wang, Hwee Tou Ng)

Yandex seminar, August 13, 2014, Moscow, Russia

Page 2

Plan

• Part I: Introduction to Statistical Machine Translation
• Part II: Combining, Adapting and Reusing Bi-texts between Related Languages: Application to Statistical Machine Translation
• Part III: Further Discussion on SMT

Page 3

Statistical Machine Translation

Page 4

Statistical Machine Translation (SMT)

English: Reach Out to Asia (ROTA) has announced its fifth Wheels ‘n’ Heels, Qatar’s largest annual community event, which will promote ROTA’s partnership with the Qatar Japan 2012 Committee. Held at the Museum of Islamic Art Park on 10 February, the event will celebrate 40 years of cordial relations between the two countries. Essa Al Mannai, ROTA Director, said: “A group of 40 Japanese students are traveling to Doha especially to take part in our event.

SMT systems:
- learn from human-generated translations
- extract useful knowledge and build models
- use the models to translate new sentences

Page 5

SMT: The Noisy Channel Model

Page 6

Translation as Decoding

• 1947, Warren Weaver, Rockefeller Foundation:

One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’

Example:
- Это действительно написано по-английски .
- This is really written in English .

Page 7

The Basic Components of an SMT System

Look for the best English translation that both conveys the French meaning and is grammatical.

Page 8

Components of an SMT System

• Language Model
  - For English text e: P(e)
    o good English → high probability
    o bad English → low probability

• Translation Model
  - For a pair <f,e>: P(f|e)
    o <f,e> are translations → high probability
    o <f,e> are not translations → low probability

• Decoder
  - Given P(e), P(f|e), and f, we look for the e that maximizes P(e) · P(f|e)

Page 9

Combining P(e) and P(f|e)

How do we translate into English the Russian phrase “красный цветок” (“red flower”)?

Candidate        P(e)   P(f|e)   P(e)·P(f|e)
a flower red      ↓       ↑         ↓
red flower a      ↓       ↑         ↓
flower red a      ↓       ↑         ↓
a red dog         ↑       ↓         ↓
dog cat mouse     ↓       ↓         ↓
a red flower      ↑       ↑         ↑
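The table above can be mimicked with a toy scorer. The probability values below are made up solely to illustrate how the decoder's argmax combines the two models; a real system estimates P(e) from monolingual text and P(f|e) from a word-aligned bi-text.

```python
import math

# Made-up toy probabilities, purely illustrative.
lm = {  # P(e): is the English fluent?
    "a flower red": 1e-9, "red flower a": 1e-9, "flower red a": 1e-10,
    "a red dog": 1e-4, "dog cat mouse": 1e-12, "a red flower": 1e-4,
}
tm = {  # P(f|e): does it convey "красный цветок"?
    "a flower red": 1e-3, "red flower a": 1e-3, "flower red a": 1e-3,
    "a red dog": 1e-8, "dog cat mouse": 1e-12, "a red flower": 1e-3,
}

def best_translation(candidates):
    # Decoder objective: argmax_e P(e) * P(f|e), computed in log space.
    return max(candidates, key=lambda e: math.log(lm[e]) + math.log(tm[e]))

print(best_translation(lm))  # -> a red flower
```

Only "a red flower" scores high on both models, so it wins the product, exactly as in the last row of the table.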

Page 10

SMT: The Language Model P(e)

Page 11

Language Model

• Goal: prefer “good” to “bad” English
  - “good” ≠ grammatical
  - “bad” ≈ unlikely

• Examples (grammaticality):
  - I do not like strong tea. → good
  - I do not like powerful tea. → bad
  - I like strong tea not. → bad
  - Like not tea strong do I. → bad

Page 12

Example: Grammatical but Low-probability Text

Eye halve a spelling checker
It came with my pea sea
It plainly marks four my revue
Miss steaks eye kin knot sea.

Eye strike a key and type a word
And weight four it two say
Weather eye am wrong oar write
It shows me a strait a weigh.

As soon as a mist ache is maid
It nose bee fore two long
And eye can put the error rite
Its rare lea ever wrong.

Eye have run this poem threw it
I am shore your pleased two no
Its letter perfect awl the weigh
My checker tolled me sew.

Торопыжка был голодный - проглотил утюг холодный. (Russian: “Toropyzhka was hungry - he swallowed a cold flat-iron.”)

Page 13

Language Model: Learned from Monolingual Text

Page 14

Bigram Language Model

Chain rule:

P(w1 w2 ... wn) = P(w1) · P(w2 | w1) · P(w3 | w1 w2) · P(w4 | w1 w2 w3) · ...

First-order Markov model (approximation):

P(w1 w2 ... wn) ≈ P(w1) · P(w2 | w1) · P(w3 | w2) · P(w4 | w3) · ...

(Andrei Markov)

Page 15

Bigram Language Model

P(w1 w2 ... wn) ≈ P(w1) · Π_i P(wi | wi−1)

Maximum-likelihood estimates from counts:

P(wi | wi−1) = C(wi−1 wi) / C(wi−1)

P(“I eat an apple …”) = P(I | <S>) . P(eat | I) . P(an | eat) . P(apple | an) …
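The count-based estimation above fits in a few lines of Python. This is a toy illustration (tiny made-up corpus, no smoothing); a real language model is trained on much larger text and smooths the counts so unseen bigrams do not get probability zero.

```python
from collections import Counter

def train_bigram_lm(sentences):
    # Maximum-likelihood estimates: P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1})
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<S>"] + s.split()
        unigrams.update(tokens[:-1])            # counts of the history words
        bigrams.update(zip(tokens, tokens[1:]))  # counts of adjacent pairs
    return lambda prev, w: bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(p, sentence):
    # Chain rule with the first-order Markov approximation.
    tokens = ["<S>"] + sentence.split()
    prob = 1.0
    for prev, w in zip(tokens, tokens[1:]):
        prob *= p(prev, w)
    return prob

p = train_bigram_lm(["I eat an apple", "I eat an orange", "you eat rice"])
print(sentence_prob(p, "I eat an apple"))  # -> 0.2222... (= 2/3 * 1 * 2/3 * 1/2)
```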

Page 16

SMT: The Translation Model P(f|e)

Page 17

Modeling P(f|e) – Sentence Level

Batman did not fight any cat woman .

Бэтмен не вел бой с никакой женщиной кошкой .

• Cannot be estimated directly: whole sentence pairs are too sparse

Page 18

Modeling P(f|e)

Batman did not fight any cat woman .

Бэтмен не вел бой с никакой женщиной кошкой .

•Broken into smaller steps

Page 19

IBM Model 4: Generation(Brown et al., CL 1993)

Batman did not fight any cat woman .

Batman not fight fight any cat woman .

Batman not fight fight NULL any cat woman .

Бэтмен не вел бой с никакой кошкой женщиной .

Бэтмен не вел бой с никакой женщиной кошкой .

n(3|fight) – fertility
P-NULL – NULL insertion
t(не|not) – lexical translation
d(8|7) – distortion


Page 20

IBM Model 4: Generation(Brown et al., CL 1993)


• All these probabilities could be learned if word alignments were available.

• We can learn word alignments using EM.


Page 21

Translation Model: Learned from a Bi-Text

Reach Out to Asia (ROTA) has announced its fifth Wheels ‘n’ Heels, Qatar’s largest annual community event, which will promote ROTA’s partnership with the Qatar Japan 2012 Committee. Held at the Museum of Islamic Art Park on 10 February, the event will celebrate 40 years of cordial relations between the two countries. Essa Al Mannai, ROTA Director, said: “A group of 40 Japanese students are traveling to Doha especially to take part in our event.

Page 22

100 Sentence Pairs

Page 23

1000 Sentence Pairs

Page 24

10,000 Sentences = 1 Book

Page 25

100,000 Sentences = Stack of Books

Page 26

1,000,000 Sentences = Shelf of Books

Page 27

10 Million Sentences = Large Shelf of Books

Page 28

The Large Data Trend Continues

Page 29

Alignment Levels

- Document

- Paragraph

- Sentence
  o Gale & Church algorithm

- Words
  o IBM models

Page 30

Learning Word Alignments Using Expectation Maximization (EM)

… красивые цветы … красивые красные цветы … красивые девушки …

… beautiful flowers … beautiful red flowers … beautiful girls …

Page 31

Learning Word Alignments Using Expectation Maximization (EM) (same example, alignments refined)

Page 32

Learning Word Alignments Using Expectation Maximization (EM) (same example, alignments refined)

Page 33

Learning Word Alignments Using Expectation Maximization (EM) (same example, alignments refined)
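The EM procedure on these slides can be illustrated with IBM Model 1, the simplest of the IBM models (the variable names and iteration count below are mine; Model 4 adds fertility, NULL insertion, and distortion on top of this):

```python
from collections import defaultdict

# The slides' toy bi-text: Russian phrases with their English glosses.
bitext = [
    ("красивые цветы".split(), "beautiful flowers".split()),
    ("красивые красные цветы".split(), "beautiful red flowers".split()),
    ("красивые девушки".split(), "beautiful girls".split()),
]

# IBM Model 1 EM: start from uniform t(f|e), then alternate collecting
# expected alignment counts (E-step) and renormalizing them (M-step).
t = defaultdict(lambda: 1.0)  # uniform (unnormalized) initialization
for _ in range(50):
    counts = defaultdict(float)  # expected count of (f, e) links
    totals = defaultdict(float)  # expected count of e
    for f_sent, e_sent in bitext:
        for f in f_sent:
            z = sum(t[(f, e)] for e in e_sent)  # normalize over candidate links
            for e in e_sent:
                counts[(f, e)] += t[(f, e)] / z
                totals[e] += t[(f, e)] / z
    for (f, e), c in counts.items():
        t[(f, e)] = c / totals[e]  # M-step: t(f|e) = count(f, e) / count(e)

for e in ("beautiful", "red", "flowers", "girls"):
    f = max((f for f, e2 in t if e2 == e), key=lambda f: t[(f, e)])
    print(e, "->", f)
```

Although "beautiful" co-occurs with every Russian word, EM resolves the ambiguity: the words that always appear together ("beautiful"/"красивые", "flowers"/"цветы", "red"/"красные", "girls"/"девушки") end up with nearly all the probability mass.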

Page 34

Phrase-based SMT

Page 35

Phrase-Based SMT

• Sentence is broken into phrases
  - Contiguous token sequences
  - Not linguistic units

• Each phrase is translated in isolation
• Translated phrases are reordered

Batman has not fought a cat woman yet . ↔ Бэтмен пока не сражался с женщиной кошкой .

(Koehn et al., HLT-NAACL 2003)

Page 36

Phrase-Based Translation

• Multiple words → multiple words

• Models context

• Handles non-compositional phrases

• More data → longer phrases

Page 37

Phrase-Based SMT: Sample Bulgarian-English Phrases

Page 38

Sample Phrases: главен

главни прокурори chief prosecutors

главни счетоводители chief accountants

главни архитекти chief architects

главни щабове main staffs

главни улици main streets

главни методисти senior instructors

главно предизвикателство major challenge

Page 39

Sample Phrases: както

• както физическа , така и психическа ||| both physical and psychological
• както целият регион ||| like the whole region
• както те са определени ||| as defined
• както и размера ||| as well as the size
• както и предишните редовни доклади ||| in line with previous regular reports
• както и по други ||| and in other

Page 40

Phrase-Based SMT: Sample Russian-Bulgarian Phrases

Page 41

Sample Phrases: заявление

• заявление ||| молба ||| 0.25 0.166667 1 1 2.718
• заявление об ||| молба за ||| 1 0.00524692 1 0.53125 2.718
• заявление об образовании ||| молба за образуването ||| 1 0.005 ...
• заявления ||| заявление ||| 1 1 0.5 0.666667 2.718
• заявления ||| заявление от ||| 1 0.500677 0.5 0.222222 2.718
• заявляю ||| заявявам ||| 0.333333 0.6 1 1 2.718
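The five numbers per entry look like Moses-style phrase-table scores (commonly: inverse phrase probability, inverse lexical weighting, direct phrase probability, direct lexical weighting, and a constant phrase penalty of e ≈ 2.718), though the slides do not name them, so treat that reading as an assumption. A minimal parser for the "|||"-separated format:

```python
def parse_phrase_table_line(line):
    # Moses-style entry: "source ||| target ||| score1 score2 ..."
    # The meaning of the scores is assumed, not stated on the slide.
    src, tgt, scores = (field.strip() for field in line.split("|||"))
    return src, tgt, [float(s) for s in scores.split()]

src, tgt, scores = parse_phrase_table_line(
    "заявление ||| молба ||| 0.25 0.166667 1 1 2.718")
print(src, tgt, scores)
```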

Page 42

Sample Phrases: звонок, звук

• звонка ||| звънец ||| 1 1 0.4 0.5 2.718
• звонка ||| звънеца ||| 0.25 0.2 0.4 0.5 2.718
• звонка ||| на звънеца ||| 1 0.2 0.2 0.128199 2.718
• звонки ||| звънци ||| 0.4 0.4 1 1 2.718
• звонко ||| звънко ||| 0.333333 0.428571 1 1 2.718
• звонков ||| звънци ||| 0.4 0.4 1 1 2.718
• звонку ||| звънеца ||| 0.25 0.2 1 1 2.718
• звонок ||| звънеца ||| 0.375 0.3 0.375 0.3 2.718
• звонок ||| звънецът ||| 1 1 0.125 0.1 2.718
• звонок ||| иззвъня ||| 0.6 0.625 0.375 0.5 2.718

• звук ||| звук ||| 0.666667 0.666667 1 1 2.718
• звука ||| звук ||| 0.333333 0.333333 0.666667 0.4 2.718
• звука ||| звука ||| 1 0.666667 0.333333 0.4 2.718
• звуки ||| звуци ||| 1 1 1 1 2.718

Page 43

Sample Phrases: здание

• здание ||| здание ||| 1 1 0.4 0.4 2.718
• здание ||| зданието ||| 0.75 0.5 0.6 0.6 2.718
• здания ||| зданието ||| 0.25 0.5 0.2 0.375 2.718
• здания ||| зданието на ||| 1 0.250861 0.4 0.140625 2.718
• здания ||| сградите ||| 1 1 0.2 0.25 2.718
• здания ||| сградите на ||| 1 0.500861 0.2 0.09375 2.718

Page 44

Sample Phrases: здравствуй

• здравствуй ||| добро утро ||| 1 0.75 0.333 0.0625 2.718
• здравствуй ||| здравей ||| 1 1 0.666667 0.5 2.718

•здравствуйте ||| здравейте ||| 1 1 1 1 2.718

•здравствует ||| живее ||| 0.4 0.333333 1 1 2.718

Page 45

Sample Phrases: необычайное

• необычайное ||| необикновено ||| 0.176471 0.142857 0.75 0.75 2.718
• необычайное ||| необикновеното ||| 0.333333 0.333333 0.25 0.25 2.718
• необычайно ||| извънредно ||| 1 0.4 0.125 0.117647 2.718
• необычайно ||| необикновена ||| 0.222222 0.166667 0.125 0.117647 2.718
• необычайно ||| необикновено ||| 0.588235 0.476191 0.625 0.588235 2.718
• необычайно ||| необичайно ||| 1 1 0.0625 0.117647 2.718
• необычайной ||| необикновена ||| 0.333333 0.416667 0.5 0.625 2.718
• необычайной ||| необикновено ||| 0.0588235 0.047619 0.166667 0.125 2.718
• необычайной ||| с необикновена ||| 1 0.209808 0.333333 0.15625 2.718
• необычайные ||| необикновени ||| 0.5 0.5 1 1 2.718
• необычайный ||| необикновен ||| 0.222222 0.222222 0.5 0.5 2.718
• необычайный ||| необикновеният ||| 0.5 0.5 0.25 0.25 2.718
• необычайный ||| необичайни ||| 0.333333 0.25 0.25 0.25 2.718

• необычное ||| необикновеното ||| 0.666667 0.666667 1 1 2.718
• необычные ||| необичайни ||| 0.666667 0.5 1 1 2.718

• неожиданной ||| неочакваната ||| 0.333333 0.333333 0.25 0.25 2.718
• неожиданной ||| неочаквана ||| 0.666667 0.6 0.75 0.75 2.718

Page 46

SMT: Evaluation

Page 47

How MT Evaluation is NOT Done…

• Backtranslation

- A “mythical” example (Hutchins, 1995):
  o En: The spirit is willing, but the flesh is weak.
  o Ru: Дух бодр, но плоть слаба.
  o En: The vodka is good, but the meat is rotten.

- Not used; can be gamed easily:
  o En: The spirit is willing, but the flesh is weak.
  o Ru: The spirit is willing, but the flesh is weak.
  o En: The spirit is willing, but the flesh is weak.

Page 48

The BLEU Evaluation Metric(Papineni et al., ACL 2002)

Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

• BLEU4 formula (counts n-grams up to length 4):

  exp(1.0 · log p1 + 0.5 · log p2 + 0.25 · log p3 + 0.125 · log p4 − max(words-in-reference / words-in-machine − 1, 0))

  p1 = 1-gram precision
  p2 = 2-gram precision
  p3 = 3-gram precision
  p4 = 4-gram precision

• Correlates well with human judgments
• Very hard to “game” it

(Papineni et al., ACL 2002)
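The formula on this slide can be transcribed directly into Python. This is a sketch for a single sentence and a single reference, with no smoothing; standard BLEU is computed over a whole test corpus, and a zero n-gram precision would make the log undefined here.

```python
import math
from collections import Counter

def ngram_precision(candidate, reference, n):
    # Clipped n-gram precision: candidate n-grams also found in the reference.
    cand = Counter(zip(*(candidate[i:] for i in range(n))))
    ref = Counter(zip(*(reference[i:] for i in range(n))))
    matched = sum(min(c, ref[g]) for g, c in cand.items())
    return matched / max(sum(cand.values()), 1)

def bleu4(candidate, reference):
    # The slide's formula: weighted log precisions plus a brevity term.
    candidate, reference = candidate.split(), reference.split()
    weights = [1.0, 0.5, 0.25, 0.125]
    log_sum = sum(w * math.log(ngram_precision(candidate, reference, n))
                  for n, w in enumerate(weights, start=1))
    brevity = max(len(reference) / len(candidate) - 1, 0)
    return math.exp(log_sum - brevity)

print(round(bleu4("the cat sat on the mat", "the cat sat on the mat"), 3))  # -> 1.0
```

A candidate that differs from the reference, e.g. against "the cat sat on a mat", scores strictly between 0 and 1.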

Page 49

BLEU: Multiple Reference Translations

Reference translation 1: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Reference translation 3: The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden , which threatens to launch a biochemical attack on such public places as airport . Guam authority has been on alert .

Reference translation 4: US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia . They said there would be biochemistry air raid to Guam Airport and other public places . Guam needs to be in high precaution about this matter .

Reference translation 2: Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.


(Papineni et al., ACL 2002)

Page 50

Phrase-Based SMT: Parameter Tuning

Page 51

The Basic Model, Revisited

argmax_e P(e | f)
  = argmax_e P(e) × P(f | e) / P(f)
  = argmax_e P(e) × P(f | e)

Works better:
  argmax_e P(e)^2.4 × P(f | e)

Rewards longer hypotheses, since they are unfairly penalized by P(e):
  argmax_e P(e)^2.4 × P(f | e) × #words(e)^1.1

... × P(e | f)^1.1 × Plex(f | e)^1.3 × Plex(e | f)^0.9 × #phrases(e,f)^0.5 × ...

(Och, ACL 2003)

Page 52

Maximum BLEU Training(Och, ACL 2003)

[Figure: the tuning loop. French input → Translation System (automatic, trainable) → English MT output → Translation Quality Evaluator (automatic), which compares the output against English reference translations (sample “right answers”) and produces a BLEU score that is fed back to the system. The system combines Language Model #1, Language Model #2, Translation Model, Length Model, and Other Features.]

MERT: Minimum Error Rate Training (optimizes BLEU directly)

(Och, ACL 2003)

Page 53

Statistical Phrase-Based Translation

1. Training:
   a. P(e): n-gram language model
   b. P(f|e):
      i. Generate word alignments
      ii. Build a phrase table

2. Tuning:
   a. Use MERT to tune the parameters

3. Evaluation:
   a. Run the system on test data
   b. Calculate BLEU