Direct Translation Approaches: Statistical Machine Translation
Stephan Vogel, Alicia Tribble
Interactive Systems Lab, Carnegie Mellon University & University Karlsruhe
Speech-to-Speech Translation Workshop, ESSLLI 2002, Trento, Italy
16 July 2002, Speech-to-Speech Translation Workshop, ESSLLI, Trento, Italy
Overview
Translation Approaches
Statistical Machine Translation
Translating with Cascaded Transducers
Experiments on Nespole Data
Translation Approaches
Interlingua based
Transfer based
Direct
  Example based
  Statistical
Statistical Machine Translation
Based on Bayes' decision rule:
\[
\hat{e} = \arg\max_e \{ p(e \mid f) \} = \arg\max_e \{ p(e)\, p(f \mid e) \}
\]
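The decision rule can be illustrated with a minimal sketch. The probability tables `LM` and `TM` below are made-up toy values, and the candidate list is given explicitly; a real system would have to search over a very large space of target sentences.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
LM = {"the house": 0.6, "house the": 0.1}            # language model p(e)
TM = {("das haus", "the house"): 0.5,                # translation model p(f | e)
      ("das haus", "house the"): 0.5}

def decode(f, candidates):
    """Return the candidate e that maximizes log p(e) + log p(f | e)."""
    def score(e):
        return math.log(LM[e]) + math.log(TM[(f, e)])
    return max(candidates, key=score)

best = decode("das haus", ["the house", "house the"])
```

Here the translation model cannot distinguish the two word orders, so the language model decides.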
Tasks in SMT
Modelling: build statistical models which capture characteristic features of translation equivalences and of the target language
Training: train the translation model on a bilingual corpus and the language model on a monolingual corpus
Decoding: find the best translation for new sentences according to the models
Alignment Example
Translation Models
IBM1: lexical probabilities only
IBM2: lexicon plus absolute position
HMM: lexicon plus relative position
IBM3: plus fertilities
IBM4: inverted relative position alignment
IBM5: non-deficient version of Model 4
[Brown et al. 93; Vogel et al. 96]
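As an illustration of the simplest model in this list, the following is a minimal sketch of EM training for IBM1 lexical probabilities t(f | e). It omits the empty (NULL) word for brevity, and the three-sentence corpus is a made-up toy example, not Nespole data.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """EM training of lexical probabilities t(f | e) on (source, target) token lists."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(f, e)
        total = defaultdict(float)                # expected counts c(e)
        for fs, es in pairs:
            for f in fs:
                # E-step: distribute one count for f over all target words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy corpus, invented for illustration.
pairs = [("das haus".split(), "the house".split()),
         ("das buch".split(), "the book".split()),
         ("ein buch".split(), "a book".split())]
t = train_ibm1(pairs)
```

Although no word alignment is given, co-occurrence across sentence pairs is enough for the probability mass to concentrate on pairs such as (das, the) and (buch, book).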
HMM Alignment Model
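\[
\begin{aligned}
p(f \mid e) &= \sum_{a} p(f_1^J, a_1^J \mid e_1^I) \\
&= \sum_{a} \prod_{j} p(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I) \\
&= \sum_{a} \prod_{j} p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j}) \\
&\approx \max_{a} \prod_{j} p(a_j \mid a_{j-1})\, p(f_j \mid e_{a_j})
\end{aligned}
\]
Alignment a_j of the current word f_j depends on the alignment a_{j-1} of the previous word f_{j-1}.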
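The maximum approximation in the last line can be computed by dynamic programming. The sketch below is one possible Viterbi search over alignments; the lexicon `lex`, the transition function `jump`, and the uniform start distribution are illustrative assumptions, and the toy `jump` is not normalized.

```python
import math

def viterbi_alignment(f_words, e_words, lex, jump):
    """Best alignment a_1..a_J maximizing prod_j p(a_j | a_{j-1}) p(f_j | e_{a_j}).

    lex[(f, e)] is a lexical probability and jump(i_prev, i) a transition
    probability; both are hypothetical inputs supplied by the caller.
    """
    J, I = len(f_words), len(e_words)
    floor = 1e-12  # avoids log(0) for unseen word pairs

    def emit(j, i):
        return math.log(lex.get((f_words[j], e_words[i]), floor))

    # score[j][i]: best log-probability of f_1..f_j with a_j = i
    score = [[emit(0, i) - math.log(I) for i in range(I)]]
    back = []
    for j in range(1, J):
        row, ptr = [], []
        for i in range(I):
            prev = max(range(I), key=lambda k: score[-1][k] + math.log(jump(k, i)))
            row.append(score[-1][prev] + math.log(jump(prev, i)) + emit(j, i))
            ptr.append(prev)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final position.
    a = [max(range(I), key=lambda i: score[-1][i])]
    for ptr in reversed(back):
        a.append(ptr[a[-1]])
    return a[::-1]

def jump(i_prev, i):
    # Toy transition model (unnormalized): prefer moving one position forward.
    return {1: 0.6, 0: 0.2}.get(i - i_prev, 0.1)

lex = {("das", "the"): 0.9, ("haus", "house"): 0.9,
       ("ist", "is"): 0.9, ("klein", "small"): 0.9}
alignment = viterbi_alignment("das haus ist klein".split(),
                              "the house is small".split(), lex, jump)
```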
Phrase Translation
Why?
  To capture context
  Local word reordering
How?
  Train alignment model
  Extract phrase-to-phrase translations from Viterbi path
Notes:
  Often better results when training target to source for extraction of phrase translations
  Phrases are not fully integrated into the alignment model; they are extracted only after training is completed
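The slides do not spell out the extraction procedure. As one possible reading, the sketch below extracts all phrase pairs that are consistent with a given word alignment (no word inside the pair is linked to a word outside it). This consistency criterion is a commonly used technique swapped in for illustration, not necessarily the authors' method, and the example alignment is made up.

```python
def extract_phrases(f_words, e_words, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (j, i) links between f_words[j] and e_words[i].
    """
    pairs = set()
    for j1 in range(len(f_words)):
        for j2 in range(j1, min(j1 + max_len, len(f_words))):
            linked = [i for j, i in alignment if j1 <= j <= j2]
            if not linked:
                continue
            i1, i2 = min(linked), max(linked)
            # Consistency: no target word in [i1, i2] may link outside [j1, j2].
            if all(j1 <= j <= j2 for j, i in alignment if i1 <= i <= i2):
                pairs.add((" ".join(f_words[j1:j2 + 1]),
                           " ".join(e_words[i1:i2 + 1])))
    return pairs

# Hypothetical Viterbi alignment for a two-word sentence pair.
phrases = extract_phrases("guten morgen".split(), "good morning".split(),
                          {(0, 0), (1, 1)})
```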
Translation with Transducers
Transducer:
  Finite state machine
  Read sequence of words, write sequence of words
  Output vocabulary can be different from input vocabulary
Transducer used in current implementation:
  Tree transducer, i.e. prefix tree over input strings
  Output from final states
  Used to encode lexicon, phrase translations, bilingual word classes and grammars
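A prefix tree with outputs on final states could be sketched as follows. The class name `TreeTransducer`, its methods, and the lexicon entries are hypothetical and only illustrate the data structure described above.

```python
class TreeTransducer:
    """Prefix tree over source word sequences; outputs are attached to final states."""

    def __init__(self):
        self.root = {"next": {}, "out": []}

    def add(self, source, target):
        """Insert a source phrase (space-separated) with one target output."""
        node = self.root
        for word in source.split():
            node = node["next"].setdefault(word, {"next": {}, "out": []})
        node["out"].append(target)

    def matches(self, words, start):
        """Yield (end, target) for every entry that matches words[start:end]."""
        node = self.root
        for end in range(start, len(words)):
            node = node["next"].get(words[end])
            if node is None:
                return
            for target in node["out"]:
                yield end + 1, target

# Toy entries, invented for illustration.
lexicon = TreeTransducer()
lexicon.add("guten", "good")
lexicon.add("guten tag", "hello")
lexicon.add("tag", "day")
found = list(lexicon.matches("guten tag".split(), 0))
```

Because single words and longer phrases share prefixes in the tree, one left-to-right pass from a given position finds all matching entries.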
Cascaded Transducers
Generalization through cascaded transducers:
  Replace words by category labels and have a transducer for each category
[Vogel, Ney 2000]
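A two-level cascade might look like the sketch below: a category transducer first replaces day names by the label `@DAY`, and a second rule then operates on the labeled sequence. The category, the label, and the rule are invented for illustration; the actual rule format of the system is not given in the slides.

```python
# Hypothetical category transducer and sentence-level rule, for illustration only.
DAY = {"montag": "Monday", "dienstag": "Tuesday"}
RULES = {("am", "@DAY"): ["on", "@DAY"]}

def translate(words):
    # First pass: replace category members by their label and remember
    # the translation produced by the category transducer.
    labeled, fillers = [], []
    for w in words:
        if w in DAY:
            labeled.append("@DAY")
            fillers.append(DAY[w])
        else:
            labeled.append(w)
    # Second pass: apply a rule over the labeled sequence, if one matches.
    out = RULES.get(tuple(labeled), labeled)
    # Finally, substitute the category translations back in, in order.
    remaining = iter(fillers)
    return [next(remaining) if w == "@DAY" else w for w in out]

result = translate("am dienstag".split())
```

A single rule over `@DAY` covers every day name in the category, which is the generalization effect the slide refers to.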
Language Model
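Standard n-gram model:
\[
\begin{aligned}
p(w_1 \dots w_n) &= \prod_i p(w_i \mid w_1 \dots w_{i-1}) \\
&\approx \prod_i p(w_i \mid w_{i-2}\, w_{i-1}) \quad \text{(trigram)} \\
&\approx \prod_i p(w_i \mid w_{i-1}) \quad \text{(bigram)}
\end{aligned}
\]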
Many events not seen -> smoothing required
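A minimal bigram model with smoothing can be sketched as follows. Add-one (Laplace) smoothing is used here only because it is the simplest choice; the slides do not say which smoothing method the system uses, and the training sentences are made up.

```python
from collections import Counter
import math

class BigramLM:
    """Bigram model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for s in sentences:
            words = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(words[:-1])            # history counts
            self.bigrams.update(zip(words, words[1:]))  # bigram counts
        self.vocab_size = len(set(self.unigrams) | {"</s>"})

    def prob(self, prev, word):
        # Unseen bigrams receive a small non-zero probability.
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def logprob(self, sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(words, words[1:]))

lm = BigramLM(["i would like a room", "i would like a ticket"])
```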
Decoding Strategies
Sequential construction of target sentence
  Extend partial translation by words which are translations of words in the source sentence
  Language model can be applied immediately
  Mechanism to ensure proper coverage of source sentence required
Left to right over source sentence
  Find translations for sequences of words
  Construct translation lattice
  Apply language model and select best path
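The second strategy (build a lattice over the source sentence, then pick the best path with the language model) can be sketched as a small dynamic program. The edge format, the function name `best_path`, and the toy scores are assumptions for illustration; every edge must end after it starts.

```python
import math

def best_path(n, edges, lm_logprob):
    """Best path through a translation lattice over source positions 0..n.

    edges: list of (start, end, target_words, log_tm) with start < end.
    lm_logprob(prev_word, word): hypothetical bigram language model score.
    """
    # best[(position, last target word)] = (score, translation so far)
    best = {(0, "<s>"): (0.0, [])}
    for pos in range(n):
        states = [(k, v) for k, v in best.items() if k[0] == pos]
        for (_, last), (score, words) in states:
            for start, end, target, log_tm in edges:
                if start != pos:
                    continue
                new_score, prev = score + log_tm, last
                for w in target:
                    new_score += lm_logprob(prev, w)
                    prev = w
                key = (end, prev)
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, words + list(target))
    finals = [v for k, v in best.items() if k[0] == n]
    return max(finals, key=lambda v: v[0])[1] if finals else None

# Toy lattice for a two-word source sentence, invented for illustration.
edges = [(0, 1, ["good"], math.log(0.5)),
         (1, 2, ["day"], math.log(0.5)),
         (0, 2, ["hello"], math.log(0.4))]

def toy_lm(prev, word):
    return math.log(0.5 if (prev, word) == ("<s>", "hello") else 0.1)

translation = best_path(2, edges, toy_lm)
```

Keeping the last target word in the search state is what allows a bigram language model to be applied while paths are recombined.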
Translation Graph
Speech Recognition and Translation
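Search for the best string in the target language given the acoustic signal x in the source language:
\[
\begin{aligned}
\hat{e} &= \arg\max_e \{ p(e)\, p(x \mid e) \} = \arg\max_e \{ p(e) \sum_f p(f, x \mid e) \} \\
&= \arg\max_e \{ p(e) \sum_f p(f \mid e)\, p(x \mid f, e) \} = \arg\max_e \{ p(e) \sum_f p(f \mid e)\, p(x \mid f) \}
\end{aligned}
\]
The last step assumes that x depends on e only through f. The source language model p(f) does not appear, i.e. recognizer language model not needed!? [Ney, 2001]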
Coupling Recognition and Translation
Sequential: first recognition, then translation
  First-best recognition hypothesis
  N-best list: translate n times
  Word lattice: translate all paths in the lattice, reuse results from partial paths
Integrated: recognition and translation in combined search
  Subsequential transducer approach uses this
Note: In the Eutrans project, the best results were obtained when translating the first-best hypothesis
Example-Based Machine Translation
Re-use translations to create new translations:
  Store bilingual corpus with (partial) alignment
  Find partial matches, i.e. sequences of words in the stored corpus that cover a new sentence
  Extract translation(s) and build translation lattice
  Apply language model to find best path, i.e. best translation
Nespole Experiments
Application of direct translation techniques to dialogue data collected in Nespole!
Testing the effect of phrase translation
Experiments with additional knowledge sources
  Preexisting: monolingual data for the LM and publicly available lexica
  Engineered: handwritten rules for fixed expressions and knowledge extracted from semantic grammars
Nespole Project Data
CMU database of dialogues in the travel domain
German, English (Italian, French)
Speech recognizer hypotheses and human transcriptions both available
Segmented into SDUs (Speech Dialogue Units)
Nespole Corpus: Training
Language      English   German
Tokens          15572    14992
Vocabulary       1032     1338
Singletons        404      620
3182 parallel SDUs
Nespole Corpus: Testing
              German         Reference A   Reference B
Tokens        437            610           607
Vocabulary    183 (45 OOV)   165           160
70 parallel SDUs
Corpus Challenges: Sentence Length
[Figure: two charts, "Training Data" and "Testing Data", each comparing English and German on a scale from 0 to 10; the plotted values are not recoverable.]
Evaluation
Human Scoring
  Good, Okay, Bad (cf. Nespole evaluation)
  Collapsed into a "human score" on [0,1]
Bleu Score
  Average of n-gram precisions for n = 1..N, typically N = 3 or 4
  Penalty for short translations to substitute for recall measure
[Papineni et al. 2001]
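The Bleu computation can be sketched as below, following the usual definition: a geometric mean of clipped n-gram precisions multiplied by a brevity penalty. This is a simplified corpus-level sketch, not the scoring script used for the experiments; the function names and the default `max_n=4` are choices made here.

```python
from collections import Counter
import math

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(hypotheses, references, max_n=4):
    """Corpus-level Bleu.

    hypotheses: list of token lists; references: list of lists of token lists.
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        matched = total = 0
        for hyp, refs in zip(hypotheses, references):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for ref in refs:
                max_ref |= ngrams(ref, n)   # element-wise maximum over references
            # Clip each hypothesis n-gram count by its maximum reference count.
            matched += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total += sum(hyp_counts.values())
        if matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    hyp_len = sum(len(h) for h in hypotheses)
    # For each segment, use the reference closest in length to the hypothesis.
    ref_len = sum(min((abs(len(r) - len(h)), len(r)) for r in refs)[1]
                  for h, refs in zip(hypotheses, references))
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(sum(log_precisions) / max_n)

perfect = bleu([["a", "b", "c", "d"]], [[["a", "b", "c", "d"]]])
short = bleu([["a", "b", "c", "d"]], [[["a", "b", "c", "d", "e", "f"]]])
```

In the second call every n-gram of the hypothesis is correct, so only the brevity penalty lowers the score; this is how the penalty stands in for a recall measure.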
Phrase Translation
Unequal sentence lengths mean that training can be improved directionally: S → T or T → S
German compounds are better suited to 1-to-many alignments with English multiword phrases, so direction is important
Statistical lexicon alone: 0.1903
Statistical lexicon, phrases from S → T training: 0.2350
Statistical lexicon, phrases from bidir. training: 0.2654
Language Model
Monolingual text available from Verbmobil
  500,000 words (32x the size of the original English corpus)
Helps to choose among translation hypotheses but will not generate new ones
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and small LM: 0.2613
Stat. lexicon, phrases, fixed expression rules, gen. lexicon, and large LM: 0.3172
General-Purpose Lexicon
Statistical lexicon, phrases, and fixed expressions with small LM: 0.2654
Adding general-purpose lexicon as a transducer: 0.2522
Using large instead of small LM: 0.3141
General-purpose lexicon as training data instead of separate transducer: 0.3275
Fixed Expression Rules
Transducer rules are human readable and can be added by hand
Fixed expressions for times and dates are re-usable, require less time to build than domain-specific rules, and improve coverage of some semi-idiomatic constructions
Statistical lexicon with small LM: 0.1893
Statistical lexicon and fixed-expression transducer with small LM: 0.1903
Knowledge from Existing Grammars
Could help with domain portability but not language portability
Benefit mostly in additional vocabulary
Statistical lexicon, fixed expressions, phrases, and general lexicon with large LM: 0.3141
Statistical lexicon, fixed expressions, phrases, general lexicon and I-transducer with large LM: 0.3172
Comparative Evaluation Results
              Good   Okay   Bad   Score   Bleu
Text    IF      77    104   227    0.32   0.068
Text    SMT    127     80   205    0.40   0.333
Speech  IF      64    101   243    0.28   0.059
Speech  SMT     95     83   227    0.34   0.262
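The slides do not state how the Good/Okay/Bad counts are collapsed into the Score column. One weighting that reproduces the reported scores to within rounding counts each Okay as half a Good; the sketch below checks this. The weight of 0.5 is an assumption inferred from the numbers, not something the slides confirm.

```python
def human_score(good, okay, bad, okay_weight=0.5):
    """Collapse Good/Okay/Bad counts into one score on [0, 1].

    okay_weight=0.5 is an assumption, not stated in the slides.
    """
    return (good + okay_weight * okay) / (good + okay + bad)

# Counts copied from the comparative evaluation table.
counts = {
    ("Text", "IF"): (77, 104, 227),
    ("Text", "SMT"): (127, 80, 205),
    ("Speech", "IF"): (64, 101, 243),
    ("Speech", "SMT"): (95, 83, 227),
}
scores = {system: human_score(*c) for system, c in counts.items()}
```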
Selected References
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, Robert L. Mercer. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), pp. 263-311, 1993.
Stephan Vogel, Hermann Ney, Christoph Tillmann. HMM-Based Word Alignment in Statistical Translation. Int. Conf. on Computational Linguistics, Copenhagen, Denmark, pp. 836-841, August 1996.
Stephan Vogel, Hermann Ney. Translation with Cascaded Finite State Transducers. 36th Annual Conference of the Association for Computational Linguistics, pp. 23-30, Hong Kong, China, October 2000.
Stephan Vogel, Alicia Tribble. Improving Statistical Machine Translation for a Speech-to-Speech Translation Task. To appear in ICSLP 2002.
H. Ney. The Statistical Approach to Spoken Language Translation. Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Madonna di Campiglio, Trento, Italy, 8 pages, CD-ROM, IEEE Catalog No. 01EX544, December 2001.
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. IBM Research Report RC22176 (W0109-022), September 17, 2001.