multi-engine machine translationalavie/presentations/kenneth-memt-may11.pdf · multi-engine machine...
TRANSCRIPT
OverviewSystem Overview
Language ModelingResults
Multi-Engine Machine Translation
Kenneth Heafield
Language Technologies InstituteCarnegie Mellon University
May 26, 2011
Kenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Pipeline
Individual Systems METEOR This Work
Input Translate
Translate
Translate
Align Decode
Output
Kenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Arabic-English Example Combination
System 1: So even if that was meaningful , it is because you were late
System 2: Even if feasible , it is because you have been delayed
Combine
Combined: Even if feasible , it is because you were late
≈ Compare
Reference: And even if that was useful , it was because you were late
Kenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Search Space
Sentence Pair Alignment
Match surface, stems, WordNet synsets, and learned correspondencesMinimize crossing alignments
Twice that produced by nuclear plants
Double that that produce nuclear power stations
Kenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Search Space
Search Space
Algorithm
Start at the beginning of each sentence
Branch by appending the first unused word from a system
Use the appended word and those aligned with it
Loop until all hypotheses reach end of sentence
Example
System 1: Now can know why .
System 2: Now we can now know why .Partial Hypothesis{
Now
NowKenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Search Space
Search Space
Algorithm
Start at the beginning of each sentence
Branch by appending the first unused word from a system
Use the appended word and those aligned with it
Loop until all hypotheses reach end of sentence
Example
System 1: Now can know why .
System 2: Now we can now know why .Partial Hypothesis
Now
{can
weKenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Search Space
Search Space
Algorithm
Start at the beginning of each sentence
Branch by appending the first unused word from a system
Use the appended word and those aligned with it
Loop until all hypotheses reach end of sentence
Example
System 1: Now can know why .
System 2: Now we can now know why .Partial Hypothesis
Now we
{can
canKenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Search Space
Search Space
Algorithm
Start at the beginning of each sentence
Branch by appending the first unused word from a system
Use the appended word and those aligned with it
Loop until all hypotheses reach end of sentence
Example
System 1: Now can know why .
System 2: Now we can now know why .Partial Hypothesis
Now we can
{know
nowKenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Search Space
Features
Length
Length of hypothesis
Match
How well each system matches the combined output
Language Model
Model: log probability from a language modelOOV: count of words unknown to the model
Kenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Language Modeling
Our implementations compared to de facto SRILM
Probing 2.4x speed, 57% of the memory
Trie 1.3x speed, 30% of the memory
Impact on Applications
Substantial reduction in CPU and RAM costs
Replacing SRILM: Moses, cdec, and Joshua use my code
Also applies to speech recognition and information retrieval
Kenneth Heafield Multi-Engine Machine Translation
OverviewSystem Overview
Language ModelingResults
Recent Results
Track ∆BLEU ∆-TER ∆METGALE Arabic-English 1.02 0.99NIST Arabic-English 6.67 3.02 3.68
Urdu-English 1.84 0.87 0.74Czech-English 0.26 -1.05 0.73
German-English 1.91 1.76 0.93Spanish-English 3.64 2.68 1.89French-English 2.05 1.81 1.40
English-Czech 0.43 0.21 0.28English-German 0.53 -0.14 -0.04English-Spanish 2.62 3.01 1.00English-French 1.08 0.40 0.76
Table: Performance gains over best individual system from recent externalevaluations: DARPA GALE, NIST-2009, and WMT-2011.
Kenneth Heafield Multi-Engine Machine Translation