-
6.881: Natural Language Processing
Machine Translation II
Philipp Koehn
CS and AI Lab
MIT
November 23, 2004
-
Outline
• Lecture I
– Introduction to Machine Translation
– Principles of Statistical MT
– Word-Based Models
– Phrase-Based Models
• Lecture II
– Beam Search Decoding
– Evaluation
– The Challenge of Syntax
-
Phrase-Based Translation
Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada
• Foreign input is segmented into phrases
– any sequence of words, not necessarily linguistically motivated
• Each phrase is translated into English
• Phrases are reordered
-
Decoding Algorithm
• Goal of the decoding algorithm: put the models to work, perform the actual translation
-
Greedy Decoder
[Figure: greedy decoding example, applying the actions GLOSS, SWAP, ERASE, INSERT, CHANGE, JOIN:
Maria no daba una bofetada a la bruja verde
Mary no give a slap to the witch green
Mary no give a slap to the green witch
Mary no give a slap the green witch
Mary not give a slap the green witch
Mary did not give a slap the green witch
Mary did not slap the green witch]
• Greedy hill-climbing [Germann, 2003]
– start with gloss
– improve probability with actions
– use 2-step look-ahead to avoid some local minima
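The hill-climbing loop can be sketched as follows. The scoring function and the single swap action below are toy stand-ins for the model score and Germann's full action set, and the 2-step look-ahead is omitted:

```python
def greedy_decode(gloss, actions, score):
    """Start from the word-by-word gloss; repeatedly move to the best
    neighbouring translation until no action improves the score."""
    current, current_score = gloss, score(gloss)
    while True:
        neighbours = [c for a in actions for c in a(current)]
        if not neighbours:
            return current
        best = max(neighbours, key=score)
        if score(best) <= current_score:
            return current          # local maximum reached
        current, current_score = best, score(best)

# Toy action: swap any pair of adjacent words.
def swap_action(sentence):
    for i in range(len(sentence) - 1):
        out = list(sentence)
        out[i], out[i + 1] = out[i + 1], out[i]
        yield tuple(out)

# Toy scorer: reward the bigram ("green", "witch").
def toy_score(sentence):
    return sum(1 for a, b in zip(sentence, sentence[1:])
               if (a, b) == ("green", "witch"))

result = greedy_decode(("the", "witch", "green"), [swap_action], toy_score)
# one swap fixes the gloss order: ("the", "green", "witch")
```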
-
Beam-Search Decoding Process
[Figure: input "Maria no dio una bofetada a la bruja verde", nothing translated yet]
• Build translation left to right
– select foreign words to be translated
-
Beam-Search Decoding Process
[Figure: "Maria" translated as "Mary"]
• Build translation left to right
– select foreign words to be translated
– find English phrase translation
– add English phrase to end of partial translation
-
Beam-Search Decoding Process
[Figure: "Maria" marked as translated; partial translation "Mary"]
• Build translation left to right
– select foreign words to be translated
– find English phrase translation
– add English phrase to end of partial translation
– mark foreign words as translated
-
Beam-Search Decoding Process
[Figure: "no" translated as "did not"; partial translation "Mary did not"]
• One-to-many translation
-
Beam-Search Decoding Process
[Figure: "dio una bofetada" translated as "slap"; partial translation "Mary did not slap"]
• Many-to-one translation
-
Beam-Search Decoding Process
[Figure: "a la" translated as "the"; partial translation "Mary did not slap the"]
• Many-to-one translation
-
Beam-Search Decoding Process
[Figure: "verde" translated as "green", skipping over "bruja"; partial translation "Mary did not slap the green"]
• Reordering
-
Beam-Search Decoding Process
[Figure: "bruja" translated as "witch"; all foreign words covered; translation "Mary did not slap the green witch"]
• Translation finished
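The decoding process above can be sketched as a stack decoder. The phrase table, bigram language model, and exponential distortion penalty below are toy assumptions for illustration, not the actual models of any real decoder:

```python
from collections import namedtuple

Hypothesis = namedtuple("Hypothesis", "english covered last_end prob")

# Toy bigram LM (assumed values); unseen bigrams get probability 0.5.
BIGRAM = {("green", "witch"): 0.9, ("witch", "green"): 0.1}

def lm_factor(prev, words):
    p = 1.0
    for w in words:
        p *= BIGRAM.get((prev, w), 0.5)
        prev = w
    return p

def decode(foreign, phrase_table, beam_size=100):
    n = len(foreign)
    stacks = [[] for _ in range(n + 1)]   # stack k: k foreign words covered
    stacks[0].append(Hypothesis((), frozenset(), 0, 1.0))
    for k in range(n):
        # histogram pruning: expand only the best hypotheses in stack k
        for hyp in sorted(stacks[k], key=lambda h: -h.prob)[:beam_size]:
            for i in range(n):
                for j in range(i + 1, n + 1):
                    span = frozenset(range(i, j))
                    if span & hyp.covered:
                        continue          # overlaps already-translated words
                    for words, p_tm in phrase_table.get(foreign[i:j], []):
                        prev = hyp.english[-1] if hyp.english else "<s>"
                        p = (hyp.prob * p_tm * lm_factor(prev, words)
                             * 0.9 ** abs(i - hyp.last_end))   # distortion
                        stacks[len(hyp.covered | span)].append(
                            Hypothesis(hyp.english + words,
                                       hyp.covered | span, j, p))
    best = max(stacks[n], key=lambda h: h.prob)
    return " ".join(best.english), best.prob

table = {
    ("la",): [(("the",), 0.7)],
    ("bruja",): [(("witch",), 0.8)],
    ("verde",): [(("green",), 0.8)],
    ("bruja", "verde"): [(("green", "witch"), 0.32)],
}
best, prob = decode(("la", "bruja", "verde"), table)
# the LM reward for "green witch" outweighs the distortion penalty,
# so the decoder reorders: best == "the green witch"
```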
-
Translation Options
[Figure: phrase translation options for "Maria no dio una bofetada a la bruja verde", e.g. Maria → Mary; no → not / did not; dio → give; dio una bofetada → slap; a la → to the / to / the; la bruja → the witch; bruja → witch; verde → green; bruja verde → green witch]
• Look up possible phrase translations
– many different ways to segment words into phrases
– many different ways to translate each phrase
-
Hypothesis Expansion
[Figure: translation options as before; initial hypothesis e: (empty), f: ---------, p: 1]
• Start with the null hypothesis
– e: no English words
– f: no foreign words covered
– p: probability 1
-
Hypothesis Expansion
[Figure: expanding the null hypothesis with the option Maria → Mary, yielding e: Mary, f: *--------, p: .534]
• Pick a translation option
• Create a hypothesis
– e: add English phrase Mary
– f: first foreign word covered
– p: probability 0.534
-
Hypothesis Expansion
[Figure: a second expansion of the null hypothesis, bruja → witch: e: witch, f: -------*-, p: .182]
• Add another hypothesis
-
Hypothesis Expansion
[Figure: further expansion, e.g. e: ... slap, f: *-***----, p: .043]
• Further hypothesis expansion
-
Hypothesis Expansion
[Figure: expansion continues; the best path reads
e:                  f: ---------  p: 1
e: Mary             f: *--------  p: .534
e: ... did not      f: **-------  p: .154
e: ... slap         f: *****----  p: .015
e: ... the          f: *******--  p: .004283
e: ... green witch  f: *********  p: .000271]
• ... until all foreign words are covered
– find the best hypothesis that covers all foreign words
– backtrack to read off the translation
-
Hypothesis Expansion
[Figure: the full set of competing hypotheses over the whole input]
• Adding more hypotheses
⇒ Explosion of the search space
-
Explosion of Search Space
• The number of hypotheses is exponential in sentence length
⇒ Decoding is NP-complete [Knight, 1999]
⇒ Need to reduce the search space
– risk-free: hypothesis recombination
– risky: histogram/threshold pruning
-
Hypothesis Recombination
[Figure: two search paths reach the same partial translation "Mary did not give" (path probabilities include 0.534, 0.164, 0.092, 0.044)]
• Different paths to the same partial translation
-
Hypothesis Recombination
[Figure: the weaker path to "Mary did not give" is dropped]
• Different paths to the same partial translation
⇒ Combine paths
– drop the weaker hypothesis
– keep a pointer from the worse path
-
Hypothesis Recombination
[Figure: "... did not give" hypotheses starting with "Mary" (p=0.092) and "Joe" (p=0.017) end in the same two English words and coverage]
• Recombined hypotheses do not have to match completely
• No matter what is added later, the weaker path can be dropped if:
– the last two English words match (matters for the language model)
– the foreign word coverage vectors match (affects future paths)
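A sketch of the recombination check (field names are illustrative): group hypotheses by their last two English words plus coverage vector, keep the best of each group, and retain pointers to the dropped paths:

```python
from collections import defaultdict

def recombine(hypotheses):
    """Merge hypotheses that agree on the language-model-relevant state:
    the last two English words and the foreign coverage vector."""
    groups = defaultdict(list)
    for hyp in hypotheses:
        groups[(hyp["english"][-2:], hyp["covered"])].append(hyp)
    merged = []
    for group in groups.values():
        group.sort(key=lambda h: -h["prob"])
        # keep the best path; pointers to worse paths enable n-best lists
        merged.append(dict(group[0], weaker=group[1:]))
    return merged

hyps = [
    {"english": ("Mary", "did", "not", "give"), "covered": "**-*", "prob": 0.092},
    {"english": ("Joe", "did", "not", "give"),  "covered": "**-*", "prob": 0.017},
    {"english": ("Mary", "did", "not", "give"), "covered": "***-", "prob": 0.050},
]
merged = recombine(hyps)
# the first two share state ("not", "give") + "**-*" and are merged;
# the third has a different coverage vector and stays separate
```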
-
Hypothesis Recombination
[Figure: the two paths are combined]
• Recombined hypotheses do not have to match completely
• No matter what is added later, the weaker path can be dropped if:
– the last two English words match (matters for the language model)
– the foreign word coverage vectors match (affects future paths)
⇒ Combine paths
-
Pruning
• Hypothesis recombination is not sufficient
⇒ Heuristically discard weak hypotheses
• Organize hypotheses in stacks, e.g. by
– same foreign words covered
– same number of foreign words covered (Pharaoh does this)
– same number of English words produced
• Compare hypotheses within a stack, discard bad ones
– histogram pruning: keep the top n hypotheses in each stack (e.g., n = 100)
– threshold pruning: keep only hypotheses that are at most a factor α worse than the best hypothesis in the stack (e.g., α = 0.001)
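Both pruning strategies can be sketched on one stack with toy probabilities (with α = 0.001, a hypothesis survives only if it scores within a factor of 1000 of the best one in its stack):

```python
def histogram_prune(stack, n=100):
    """Keep only the top-n hypotheses by probability."""
    return sorted(stack, key=lambda h: -h["prob"])[:n]

def threshold_prune(stack, alpha=0.001):
    """Keep hypotheses scoring at least alpha times the best probability."""
    best = max(h["prob"] for h in stack)
    return [h for h in stack if h["prob"] >= alpha * best]

stack = [{"prob": p} for p in (0.5, 0.04, 0.001, 0.0000004)]
kept_hist = histogram_prune(stack, n=2)    # only the two best survive
kept_thresh = threshold_prune(stack)       # 0.0000004 < 0.001 * 0.5 is cut
```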
-
Comparing Hypotheses
• Comparing hypotheses with the same number of foreign words covered
[Figure: e: Mary did not (f: **-------, p: 0.154) is the better partial translation, but e: the (f: -----**--, p: 0.354) covers the easier part and therefore has lower cost so far]
• The hypothesis that covers the easy part of the sentence is preferred
⇒ Need to consider future cost
-
Future Cost Estimation
• Estimate the cost to translate the remaining part of the input
• Step 1: find the cheapest translation options
– find the cheapest translation option for each input span
– compute the translation model cost
– estimate the language model cost (no prior context)
– ignore the reordering model cost
• Step 2: compute the cheapest cost for each contiguous span
– find the cheapest sequence of translation options
• Precompute and look up
– precompute the future cost for each contiguous span
– future cost for any coverage vector: sum of the costs of each contiguous span of uncovered words
⇒ no expensive computation at run time
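The precomputation can be sketched as dynamic programming over spans: each span's future cost is either its cheapest direct translation option or the best split into two sub-spans, and the future cost of a coverage vector sums the spans of uncovered words (span costs below are toy values):

```python
def future_costs(n, span_cost):
    """cost[i][j] = cheapest estimated cost to translate words i..j-1."""
    INF = float("inf")
    cost = [[INF] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = span_cost.get((i, j), INF)   # direct translation option
            for k in range(i + 1, j):           # or combine two sub-spans
                best = min(best, cost[i][k] + cost[k][j])
            cost[i][j] = best
    return cost

def future_cost(coverage, cost):
    """Sum the precomputed costs of each contiguous uncovered span."""
    total, i, n = 0.0, 0, len(coverage)
    while i < n:
        if coverage[i]:
            i += 1
        else:
            j = i
            while j < n and not coverage[j]:
                j += 1
            total += cost[i][j]
            i = j
    return total

# Toy costs (negative log probabilities) for a 3-word input: the
# two-word phrase over words 1-2 is cheaper than word-by-word.
span_cost = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 2.0, (1, 3): 1.5}
cost = future_costs(3, span_cost)
uncovered = future_cost([True, False, False], cost)  # uncovered span (1, 3)
```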
-
Word Lattice Generation
[Figure: search graph with paths through "Mary" / "Joe" and "did not give", as in the recombination example]
• The search graph can easily be converted into a word lattice
– can be further mined for n-best lists
⇒ enables reranking approaches
⇒ enables discriminative training
[Figure: the resulting word lattice]
-
Evaluation
• Manual evaluation
– humans judge the output
– expensive
• Automatic evaluation
– machines judge the output
– fast
– reliable?
• Task-oriented evaluation
– humans do a task with MT output
– tests the usefulness of MT
-
Manual Evaluation
• Correct yes/no
– simple
– longer sentences almost always have at least one error
• Correct on a scale
– 0 = bad, 5 = perfect
– disagreement between judges
• More detailed judgments
– adequacy: how well is the meaning preserved?
– fluency: is it good English?
– ...
-
Manual Evaluation
• Give a grade from 0 = bad to 5 = perfect:
– In the First Two Months Guangdong's Export of High-Tech Products 3.76 Billion US Dollars
– The Guangdong provincial foreign trade and economic growth has made important contributions.
– Suicide explosion in Jerusalem
• Agreement
-
Automatic Evaluation
• Why automatic evaluation metrics?
– manual evaluation is too slow
– evaluation on large test sets reveals minor improvements
– automatic tuning to improve machine translation performance
• History
– Word Error Rate
– BLEU since 2002
– BLEU in short: overlap with reference translations
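A minimal single-sentence, single-reference sketch of BLEU: the geometric mean of clipped n-gram precisions (n = 1..4) times a brevity penalty. Real BLEU aggregates counts over a whole test set, usually with multiple references:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    c, r = len(candidate), len(reference)
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(c - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(r - n + 1))
        # clipped precision: a reference n-gram can be matched at most
        # as often as it occurs in the reference
        overlap = sum(min(cnt, ref[g]) for g, cnt in cand.items())
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total)
    brevity = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return brevity * math.exp(log_prec / max_n)

reference = "Mary did not slap the green witch".split()
perfect = bleu(reference, reference)     # exact match scores 1.0
worse = bleu("Mary not slap witch green".split(), reference)
```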
-
Bi-Lingual Evaluation Understudy (BLEU)
[Figure: definition of the BLEU score]
-
Correlation with Manual Evaluation
[Figure: BLEU plotted against human judgments]
• Correlates with human evaluation (adequacy, fluency)
-
The Challenge of Syntax
[Figure: the MT pyramid — foreign words → foreign syntax → foreign semantics → interlingua → English semantics → English syntax → English words]
• Remember the pyramid
-
Advantages of Syntax-Based Translation
• Reordering for syntactic reasons
– e.g., move German object to end of sentence
• Better explanation for function words
– e.g., prepositions, determiners
• Conditioning on syntactically related words
– translation of a verb may depend on its subject or object
• Use of syntactic language models
-
Inversion Transduction Grammars
• Generation of both English and foreign trees [Wu, 1997]
• Rules (binary and unary):
– A → [A A]  (subtrees in the same order in both languages)
– A → ⟨A A⟩  (subtrees inverted in the foreign language)
– A → e/f   (English word e translates as foreign word f)
– A → e/ε   (English word with no foreign counterpart)
– A → ε/f   (foreign word with no English counterpart)
⇒ A common binary tree is required
– limits the complexity of reorderings
-
Syntax Trees
[Figure: binary tree over "Mary did not slap the green witch"]
• English binary tree
-
Syntax Trees (2)
[Figure: binary tree over "Maria no daba una bofetada a la bruja verde"]
• Spanish binary tree
-
Syntax Trees (3)
[Figure: combined binary tree with word alignments Mary/Maria, did/*, not/no, slap/daba, */una, */bofetada, */a, the/la, green/verde, witch/bruja, and the Spanish side reordered]
• Combined tree with reordering of the Spanish side
• Can such trees be learned from data?
• Do common trees exist with real syntax on both sides?
-
Dependency Structure
[Figure: common dependency tree with aligned nodes bought/gekauft, had/hatte, car/auto, the/das, I/ich]
• Common dependency tree
• Interest in dependency-based translation models
-
String to Tree Translation
[Figure: the MT pyramid, as before]
• Use of English syntax trees [Yamada and Knight, 2001]
– exploit rich resources on the English side
– obtained with a statistical parser [Collins, 1997]
– flattened tree to allow more reorderings
– works well with a syntactic language model
-
Yamada and Knight [2001]
[Figure: an English parse tree for "he adores listening to music" is transformed step by step — reorder children, insert Japanese function words (no, ha, ga, wo, desu), translate the leaves (kare, ongaku, kiku, daisuki), take the leaves — yielding "Kare ha ongaku wo kiku no ga daisuki desu"]
-
Syntactic Language Model
• Good syntax tree ⇒ good English
• Allows for long-distance constraints
[Figure: parse trees for two candidate translations — "the house of the man is good", parsed cleanly with NP → NP PP, and "the house is the man is good", which does not parse well]
• The left translation is preferred by the syntactic LM
-
String to Tree Transfer and Syntactic LM
• Work presented at this MT Summit by Charniak, Knight, and Yamada
– more grammatically correct output
– more perfectly translated sentences
– ... but no improvement in BLEU
• Syntactic transfer and LM on top of phrase translation
– parse a lattice generated by phrase-based MT
– no results yet
-
Augment Models with Syntactic Features
• Intuition: the other models work fine, syntax provides additional clues
• Define syntactic properties that should hold
– preservation of plural
– output should have a verb
– no dropping of content words
– ...
• 2003 summer workshop at Johns Hopkins: little improvement
-
Clause Structure
[Figure: German parse tree for "Ich werde Ihnen die entsprechenden Anmerkungen aushaendigen, damit Sie das eventuell bei der Abstimmung uebernehmen koennen."]
Gloss: I will you the corresponding comments pass on, so that you that perhaps in the vote include can.
(main clause and subordinate clause marked in the tree)
• Syntax tree from a German parser
– statistical parser by Amit Dubey, trained on the TIGER treebank
-
Reordering When Translating
[Figure: the flattened German parse tree from the previous slide]
Gloss: I will you the corresponding comments pass on, so that you that perhaps in the vote include can.
• Reordering when translating into English
– the tree is flattened
– clause-level constituents line up
-
Clause Level Reordering
[Figure: the flattened German parse tree, with clause-level constituents numbered by their English order]
Gloss: I will you the corresponding comments pass on, so that you that perhaps in the vote include can.
[Figure: constituents labeled with their English order, e.g. 1 2 4 5 3 for the main clause and 1 2 6 4 7 5 3 for the subordinate clause]
• Clause-level reordering is a well-defined task
– label German constituents with their English order
– done for 300 sentences, two annotators, high agreement
-
Systematic Reordering German → English
• Many types of reordering are systematic
– move the verb group together
– subject – verb – object
– move negation in front of the verb
⇒ Write rules by hand
– apply the rules to test and training data
– train a standard phrase-based SMT system
System              BLEU
baseline system     25.2%
with manual rules   26.8%
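A toy illustration (not the actual rule set from the experiment) of one such hand-written rule: in a German subordinate clause the verb cluster is clause-final, so moving it directly after the conjunction and subject makes the German training data resemble English word order. The example uses a shortened version of the slide's subordinate clause:

```python
def reorder_subclause(tokens, tags, verb_tags={"VVINF", "VMFIN"}):
    """Move the clause-final verb cluster right after conjunction + subject."""
    k = len(tokens)
    while k > 0 and tags[k - 1] in verb_tags:   # find trailing verb cluster
        k -= 1
    verbs, rest = tokens[k:], tokens[:k]
    return rest[:2] + verbs + rest[2:]

# "damit Sie das uebernehmen koennen" ("so that you can include that")
tokens = ["damit", "Sie", "das", "uebernehmen", "koennen"]
tags   = ["KOUS", "PPER", "PDS", "VVINF", "VMFIN"]
reordered = reorder_subclause(tokens, tags)
# → ["damit", "Sie", "uebernehmen", "koennen", "das"], English-like order
```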
-
Integration
f f’ etransformation
phrase-basedstatistical MT
• Transform f into f' with our methods
• Translate n-best restructurings with phrase-based MT
– uses both the transformation score and the translation/language model scores
– if no restructuring ⇒ baseline performance
• The transformation does not need to be perfect
– the phrase-based model may still reorder
-
Improved Translations
• we must also this criticism should be taken seriously .
→ we must also take this criticism seriously .
• i am with him that it is necessary , the institutional balance by means of a political revaluation of both the commission and the council to maintain .
→ i agree with him in this , that it is necessary to maintain the institutional balance by means of a political revaluation of both the commission and the council .
• thirdly , we believe that the principle of differentiation of negotiations note .
→ thirdly , we maintain the principle of differentiation of negotiations .
• perhaps it would be a constructive dialog between the government and opposition parties , social representative a positive impetus in the right direction .
→ perhaps a constructive dialog between government and opposition parties and social representative could give a positive impetus in the right direction .