machine translator introduction
Machine Translator
What is Machine Translator?
Automatic translation from one language to another
Koehn: "Translating between languages is a task for which even humans require special training."
Why Even Humans Require Special Training
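Persian source (literal): "By being shown the 'bogeyman' of the Security Council, the people of Iran will not be laid out facing the qibla" (an idiom for being at death's door).
Newsweek translation: *He said that if the Security Council appears like the creatures that frighten children, the people of Iran will not lie down toward the qibla of the world's Muslims.
Translation by the Spanish newspaper El País: *He said that even if the Security Council shows the Iranians something frightening, the people of Iran will still not sleep facing Saudi Arabia.
Translation by the French newspaper L'Humanité: *He said that whether Iranians lie down toward the centre of Muslim belief depends on whether they are afraid of mythical creatures; this is an Iranian story.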
Other Similar Concepts
• computer-aided translation
• machine-aided human translation (MAHT)
History
Ideas related to automatic translation were proposed as early as the 17th century by the philosophers René Descartes and Gottfried Wilhelm Leibniz, who wrote about universal languages.
MT in Computer
Applications
Dissemination
Publication in other languages
Communication
Emails, chats
Assimilation
Understand the Content
Challenges
Input
Typology
Lexical
Other
Input: Ambiguity
I saw a man with a telescope
Input: Complexity
General relativity includes a dynamical spacetime, so it is difficult to see how to identify the conserved energy and momentum. Noether's theorem allows these quantities to be determined from a Lagrangian with translation invariance, but general covariance makes translation invariance into something of a gauge symmetry.
Input: Wrong Sentence
I try for getting best grades but I did not can achive it
Typology: Morphology
Typology: Syntax
order of verbs (V), subjects (S) and objects (O)
Typology: Argument structure and linking
Typology: Pronouns omission
Lexical: Ambiguity
Lexical: Grammar
Lexical: Lexical gap
Lexical: Idiom
Other Challenges
Human Translation Process
• Decoding the meaning of the source text
• Re-encoding this meaning in the target language.
Simplest Machine Translator
Apple سیب
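A word-for-word lookup like this can be sketched in a few lines. This is only an illustration: the two-entry lexicon and the names `EN_FA_LEXICON` and `translate_word_by_word` are assumptions, not part of the slides.

```python
# Hypothetical two-entry English-to-Persian lexicon, for illustration only.
EN_FA_LEXICON = {"apple": "سیب", "water": "آب"}


def translate_word_by_word(sentence: str) -> list[str]:
    """Look each token up in the lexicon; unknown words pass through unchanged."""
    return [EN_FA_LEXICON.get(token.lower(), token) for token in sentence.split()]


print(translate_word_by_word("Apple"))
```

Anything beyond single known words (word order, agreement, ambiguity) is outside what such a lookup can handle.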
MT
• Human translation with machine aid
• Machine translation with human aid
• Fully automated translation
Rule Based MT
• Direct MT
• Transfer MT
• Interlingua
Knowledge Based MT
Principle Based MT
Empirical Based MT
• Statistical MT: Word Based Translation, Phrase Based Translation, Hierarchical Phrase Based Translation
• Example Based MT
Online Interactive MT
Hybrid MT
Neural MT
Rule-Based MT
Direct Translation
• dictionary has to cover all cross-lingual phenomena
• need to include contextual information in dictionary (long phrases)
• inflectional agreement, shifts in word order & structure
+ direct translation systems include simplistic rules
Direct Translation Approach
• simplistic: only low-level pre/post-processing (tokenization, etc.)
• advanced: handle some specific phenomena
identification & handling of syntactic ambiguity
morphological processing/synthesis
word re-ordering rules
rules for prepositions
handling of compounds and idioms, ...
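As a rough illustration of the "advanced" variant, the sketch below combines a dictionary lookup with a single word re-ordering rule for English to Spanish. The lexicon, the tag set, and the function name `direct_translate` are hypothetical; a real direct system would need far more rules.

```python
# Toy English-to-Spanish lexicon with part-of-speech tags (hypothetical data).
EN_ES_LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "big": ("grande", "ADJ"),
    "car": ("coche", "NOUN"),
}


def direct_translate(sentence: str) -> str:
    tokens = sentence.lower().split()
    # Step 1: word-for-word dictionary lookup (unknown words are kept as-is).
    words = [EN_ES_LEXICON.get(tok, (tok, "UNK")) for tok in tokens]
    # Step 2: one re-ordering rule, ADJ NOUN -> NOUN ADJ.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return " ".join(word for word, _ in words)


print(direct_translate("the red car"))  # el coche rojo
```

The rule says nothing about gender or number, so a feminine noun would still receive "el" and "rojo"; this is the inflectional agreement problem listed above.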
Is Direct Translation Feasible?
Transfer Based Translation
Motivation:
• complete analysis of source language sentences
• handle lexical & structural ambiguity in one formalism
Transfer Based Needed Information/Tools
• source language parser (morpho-syntactic analysis)
• transfer engine (e.g. unification based grammar)
• target language generator
• Morphological analysis. Surface forms of the input text are classified as to part-of-speech (e.g. noun, verb, etc.) and sub-category (number, gender, tense, etc.). All of the possible "analyses" for each surface form are typically output at this stage, along with the lemma of the word.
• Lexical categorisation. In any given text some of the words may have more than one meaning, causing ambiguity in analysis. Lexical categorisation looks at the context of a word to try to determine the correct meaning in the context of the input. This can involve part-of-speech tagging and word sense disambiguation.
• Lexical transfer. This is basically dictionary translation; the source language lemma (perhaps with sense information) is looked up in a bilingual dictionary and the translation is chosen.
• Structural transfer. While the previous stages deal with words, this stage deals with larger constituents, for example phrases and chunks. Typical features of this stage include concordance of gender and number, and re-ordering of words or phrases.
• Morphological generation. From the output of the structural transfer stage, the target language surface forms are generated.
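The stages above can be sketched as a toy English-to-Spanish pipeline. Everything here is an assumption made for illustration: the three small tables, the `Token` class, and the function names. Lexical categorisation is skipped because none of the toy words is ambiguous, and structural transfer assumes exactly one noun per phrase.

```python
from dataclasses import dataclass


@dataclass
class Token:
    lemma: str
    pos: str
    number: str = "sg"  # "sg" or "pl"
    gender: str = "m"   # set during transfer


# Hypothetical analysis lexicon: surface form -> (lemma, POS, number).
ANALYSIS = {
    "the": ("the", "DET", "sg"), "red": ("red", "ADJ", "sg"),
    "car": ("car", "NOUN", "sg"), "cars": ("car", "NOUN", "pl"),
    "house": ("house", "NOUN", "sg"), "houses": ("house", "NOUN", "pl"),
}
# Hypothetical bilingual dictionary: English lemma -> (Spanish lemma, noun gender).
BILINGUAL = {"the": ("el", None), "red": ("rojo", None),
             "car": ("coche", "m"), "house": ("casa", "f")}
# Hypothetical generation table: (lemma, gender, number) -> surface form.
FORMS = {
    ("el", "m", "sg"): "el", ("el", "m", "pl"): "los",
    ("el", "f", "sg"): "la", ("el", "f", "pl"): "las",
    ("rojo", "m", "sg"): "rojo", ("rojo", "m", "pl"): "rojos",
    ("rojo", "f", "sg"): "roja", ("rojo", "f", "pl"): "rojas",
    ("coche", "m", "sg"): "coche", ("coche", "m", "pl"): "coches",
    ("casa", "f", "sg"): "casa", ("casa", "f", "pl"): "casas",
}


def analyse(sentence: str) -> list[Token]:
    """Morphological analysis: surface form -> lemma, POS, number."""
    return [Token(*ANALYSIS[word]) for word in sentence.lower().split()]


def lexical_transfer(tokens: list[Token]) -> list[Token]:
    """Dictionary translation of each lemma."""
    out = []
    for tok in tokens:
        lemma, gender = BILINGUAL[tok.lemma]
        out.append(Token(lemma, tok.pos, tok.number, gender or "m"))
    return out


def structural_transfer(tokens: list[Token]) -> list[Token]:
    """Agreement with the noun, then ADJ NOUN -> NOUN ADJ re-ordering."""
    noun = next(tok for tok in tokens if tok.pos == "NOUN")
    for tok in tokens:
        tok.gender, tok.number = noun.gender, noun.number
    for i in range(len(tokens) - 1):
        if tokens[i].pos == "ADJ" and tokens[i + 1].pos == "NOUN":
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            break
    return tokens


def generate(tokens: list[Token]) -> str:
    """Morphological generation: look up the inflected surface forms."""
    return " ".join(FORMS[(t.lemma, t.gender, t.number)] for t in tokens)


def translate(sentence: str) -> str:
    return generate(structural_transfer(lexical_transfer(analyse(sentence))))


print(translate("the red houses"))  # las casas rojas
```

Even this toy needs three hand-written tables for four words, which hints at the grammar engineering effort a real transfer system requires.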
Transfer Based: Syntactic Transfer
What are the problems?
• lots of grammar engineering (writing rules ...)
• language-pair specific rules
• exponential ambiguity
• variation & preference
Interlingua-based Translation
[Figure: pairwise translation among Persian, English, Spanish, and Japanese, compared with translation through a shared interlingua to which a new language can be added]
Advantages & Disadvantages
• no language-pair specific transfer
• simple to add new languages (add new analysis/generation component)
• need to design interlingua that covers all language phenomena
• need semantic representation (and that’s hard!)
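The first two points can be checked with a little arithmetic. The sketch below assumes that a pairwise design needs one system per ordered language pair and that an interlingua design needs one analysis and one generation component per language; the function names are illustrative.

```python
def pairwise_systems(n_languages: int) -> int:
    """One translation system per ordered (source, target) language pair."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages: int) -> int:
    """One analysis and one generation component per language."""
    return 2 * n_languages


for n in (4, 5, 10):
    print(n, pairwise_systems(n), interlingua_components(n))
# 4 languages: 12 pairwise systems vs 8 interlingua components
```

Under these assumptions, adding a fifth language to four existing ones means 8 new pairwise systems but only 2 new interlingua components.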
Statistical MT
(1) build a language model which allows us to estimate P(e)
(2) build a translation model which allows us to estimate P(f|e)
(3) search for e maximizing the product P(f|e) · P(e)
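The product arises from Bayes' rule: P(e|f) = P(f|e) · P(e) / P(f), and P(f) is the same for every candidate e, so it can be dropped from the search. The sketch below runs step (3) over a handful of candidates; the probabilities are invented for illustration, and real systems search a far larger space.

```python
# Hypothetical scores for three candidate translations e of one input f.
LANGUAGE_MODEL = {            # P(e): how fluent is the candidate?
    "the house is small": 0.04,
    "small the is house": 0.0001,
    "the home is little": 0.01,
}
TRANSLATION_MODEL = {         # P(f|e): how well does e explain f?
    "the house is small": 0.20,
    "small the is house": 0.30,
    "the home is little": 0.25,
}


def decode(candidates: list[str]) -> str:
    """Return the candidate e that maximizes P(f|e) * P(e)."""
    return max(candidates, key=lambda e: TRANSLATION_MODEL[e] * LANGUAGE_MODEL[e])


print(decode(list(LANGUAGE_MODEL)))  # the house is small
```

The scrambled candidate has the highest translation-model score here, but the language model penalizes it heavily, which is the point of combining the two.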
Language Modeling
Which N-Gram?
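• A 1-gram (unigram) model, which ignores context, is not very realistic
• A bigram model, which conditions each word on the previous word, is more realistic
• More realistic still is the trigram model, which conditions each word on the previous two words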
Problem
A vocabulary of 50,000 English words
gives 2.5 billion possible bigrams
Many bigrams have zero count in the corpus but may still be needed in translations
Solution: smoothing, e.g. linear interpolation
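A minimal sketch of linear interpolation, assuming a bigram estimate is mixed with a unigram estimate using a fixed weight `lam`. The tiny corpus, the weight of 0.7, and the function names are illustrative; in practice the weight is tuned on held-out data.

```python
from collections import Counter


def train(corpus: list[str]):
    """Count unigrams, bigrams, and bigram histories ("<s>" marks sentence start)."""
    unigrams, bigrams, histories = Counter(), Counter(), Counter()
    for sentence in corpus:
        words = sentence.split()
        tokens = ["<s>"] + words
        unigrams.update(words)
        bigrams.update(zip(tokens, tokens[1:]))
        histories.update(tokens[:-1])  # every token that has a successor
    return unigrams, bigrams, histories


def interpolated_prob(word, prev, unigrams, bigrams, histories, lam=0.7):
    """P(word | prev) = lam * P_bigram(word | prev) + (1 - lam) * P_unigram(word)."""
    p_uni = unigrams[word] / sum(unigrams.values())
    p_bi = bigrams[(prev, word)] / histories[prev] if histories[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni


corpus = ["the cat sat", "the dog sat", "a cat ran"]
uni, bi, hist = train(corpus)
# "dog ran" never occurs, so the raw bigram estimate is zero,
# but the interpolated estimate stays positive.
print(bi[("dog", "ran")], interpolated_prob("ran", "dog", uni, bi, hist))
```

A word that never occurs at all still gets probability zero under this sketch, so real systems combine interpolation with further smoothing.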
Translation Model
(i) a model of the sentence-aligned source–target training corpus
(ii) a method for computing, using that model, the probability that a source sentence S and a target sentence T are equivalent
Translation Model Example
Example
Word Alignment
Simple Word Alignment
Expectation-Maximisation (EM) algorithm
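One common way to apply EM to word alignment is IBM Model 1. The slides do not name a specific model, so treat the choice as an assumption. The sketch below estimates word translation probabilities t(f|e) from three toy sentence pairs, without a NULL word; the function name and the data are illustrative.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """pairs: list of (foreign_tokens, english_tokens). Returns t[(f, e)] ~ P(f | e)."""
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform distribution over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: share each foreign word among the English words of its sentence.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate t(f|e) from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(pairs)
print(round(t[("das", "the")], 3), round(t[("Haus", "house")], 3))
```

No alignment is given in the data; the model prefers "das" for "the" only because the two co-occur in more than one sentence pair, and each iteration sharpens that preference.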
MT Evaluation
• How can we measure MT quality?
• How can we compare MT engines?
• How can we measure progress in MT development?
• Adequacy: Does the output convey the same meaning as the input sentence?
Is part of the message lost, added, or distorted?
• Fluency: Is the output good fluent English?
This involves both grammatical correctness and idiomatic word choices.
What do We Expect from MT?
• adequacy & informativeness (preserve meaning)
• fluency & grammaticality (translation needs to be natural)
• acceptance (for its task)
Task-specific evaluation
• browsing quality: Is the translation understandable in its context?
• post-editing quality: How many edit operations are required to turn it into a good translation?
• publishing quality: How many human interventions are necessary to make the entire document ready for printing?
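For post-editing quality, one simple proxy for the number of edit operations is the word-level edit distance between the MT output and its corrected version. The sketch below assumes insertions, deletions, and substitutions of whole words each cost 1; the function name is illustrative.

```python
def word_edit_distance(hypothesis: list[str], reference: list[str]) -> int:
    """Minimum number of word insertions, deletions, and substitutions."""
    previous = list(range(len(reference) + 1))
    for i, hyp_word in enumerate(hypothesis, 1):
        current = [i]
        for j, ref_word in enumerate(reference, 1):
            current.append(min(
                previous[j] + 1,                           # delete hyp_word
                current[j - 1] + 1,                        # insert ref_word
                previous[j - 1] + (hyp_word != ref_word),  # substitute or match
            ))
        previous = current
    return previous[-1]


mt_output = "the cat sat on mat".split()
post_edited = "the cat sat on the mat".split()
print(word_edit_distance(mt_output, post_edited))  # 1
```

This counts only word-level edits; moving a whole phrase, which a human editor does in one step, is charged as several operations.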
Evaluation is Difficult!
• What is the best translation? (language variation!)
• Subjective aspects (What is “fluent”? Clarity? Style?)
• What is “grammatical”?
• What is “adequate”? (Is it possible to be adequate?)
MT evaluation
Manual Evaluation
• ask actual users to rate translations
• statistics over user responses
• separate evaluations of adequacy & fluency
• requires guidelines
• task-specific evaluation
Automatic Evaluation
• compare to reference translations
• approximations by measuring overlaps
• strong bias but useful for rapid development
Fluency and Adequacy: Scales
Manual MT evaluation: What are the problems?
• need volunteers (every time we want to evaluate)
• expensive evaluation!
• subjective measures & disagreement between annotators
Automatic Evaluation: BLEU-score
• introduced in 2002 by Papineni et al.
• badly needed for rapid MT development
• quickly adopted by the statistical MT community
• created a boom in MT research/experiments
• Many MT papers report only BLEU scores and don’t even look at the translations
BLEU-score
The closer a machine translation is to a professional human translation, the better it is.
Definition
• p_n: the n-gram precision, computed for each pair of candidate and reference sentences.
• This score represents the proportion of n-word sequences in the candidate translation which also occur in the reference translation.
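A minimal sketch of this score for one candidate and one reference. Two details go beyond the text above and follow the standard definition from Papineni et al.: matches are clipped so a candidate n-gram cannot be credited more often than it appears in the reference, and the full BLEU score combines the precisions for n = 1..4 by a geometric mean with a brevity penalty. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate: list[str], reference: list[str], n: int) -> float:
    """Clipped proportion of candidate n-grams that also occur in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def bleu(candidate: list[str], reference: list[str], max_n: int = 4) -> float:
    """Sentence-level BLEU against a single reference, without smoothing."""
    precisions = [ngram_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


candidate = "the cat is on the mat".split()
reference = "the cat sat on the mat".split()
print(ngram_precision(candidate, reference, 1))  # 5 of 6 unigrams match
print(ngram_precision(candidate, reference, 2))  # 3 of 5 bigrams match
```

Without smoothing, a single sentence with no matching 4-gram scores 0, which is one reason BLEU is usually reported over a whole test set rather than per sentence.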
• Koehn, Philipp. Statistical machine translation. Cambridge University Press, 2009.
• Arnold, D., et al. Machine translation: An introductory guide. NCC Blackwell, 1994.
• https://www.slideshare.net/rushdishams/types-of-machine-translation
• https://en.wikipedia.org/wiki/Machine_translation
• Brown, Peter F., et al. "A statistical approach to machine translation." Computational linguistics 16.2 (1990): 79-85.
• Hearne, Mary, and Andy Way. "Statistical machine translation: a guide for linguists and translators." Language and Linguistics Compass 5.5 (2011): 205-226.
Questions?