Outline
• Course overview
• Introduction to MT– Major challenges– Major approaches– Evaluation of MT systems
• Overview of word-based SMT
General info
• Course website: – Syllabus (incl. slides and papers): updated every week.– Message board– ESubmit
• Office hour: Fri: 10:30am-12:30pm.
• Prerequisites: – Ling570 and Ling571.– Programming: C or C++, Perl is a plus.– Introduction to probability and statistics
Expectations
• Reading: – Papers are online– Finish reading before class. Bring your questions to
class.
• Grade:– Leading discussion (1-2 papers): 50%– Project: 40%– Class participation: 10%– No quizzes, exams
Leading discussion
• Indicate your choice via EPost by Jan 8.
• You might want to read related papers.
• Make slides with PowerPoint.
• Email me your slides by 3:30am on the Monday before your presentation.
• Present the paper in class and lead the discussion: 40-50 minutes.
Project
• Details will be available soon.
• Project presentation: 3/7/06
• Final report: due on 3/12/06
• Pongo account will be ready soon.
A brief history of MT (Based on work by John Hutchins)
• Before the computer: In the mid 1930s, a French-Armenian Georges Artsrouni and a Russian Petr Troyanskii applied for patents for ‘translating machines’.
• The pioneers (1947-1954): the first public MT demo was given in 1954 (by IBM and Georgetown University).
• The decade of optimism (1954-1966): ALPAC (Automatic Language Processing Advisory Committee) report in 1966: "there is no immediate or predictable prospect of useful machine translation."
A brief history of MT (cont)
• The aftermath of the ALPAC report (1966-1980): a virtual end to MT research
• The 1980s: Interlingua, example-based MT
• The 1990s: Statistical MT
• The 2000s: Hybrid MT
Where are we now?
• Huge potential/need due to the internet, globalization and international politics.
• Quick development time due to SMT, the availability of parallel data and computers.
• Translation is reasonable for language pairs with a large amount of resource.
• Start to include more “minor” languages.
What is MT good for?
• Rough translation: web data• Computer-aided human translation• Translation for limited domain• Cross-lingual IR
• Machine is better than human in: – Speed: much faster than humans– Memory: can easily memorize millions of word/phrase
translations.– Manpower: machines are much cheaper than humans– Fast learner: it takes minutes or hours to build a new system.
Erasable memory – Never complain, never get tired, …
Translation is hard
• Novels
• Word play, jokes, puns, hidden messages
• Concept gaps: go Greek, bei fen
• Other constraints: lyrics, dubbing, poem, …
Major challenges
• Getting the right words:– Choosing the correct root form– Getting the correct inflected form– Inserting “spontaneous” words
• Putting the words in the correct order:– Word order: SVO vs. SOV, …– Unique constructions: – Divergence
Lexical choice
• Homonymy/Polysemy: bank, run
• Concept gap: no corresponding concepts in another language: go Greek, go Dutch, fen sui, lame duck, …
• Coding (Concept lexeme mapping) differences:– More distinction in one language: e.g., kinship
vocabulary.– Different division of conceptual space:
Choosing the appropriate inflection
• Inflection: gender, number, case, tense, …
• Ex:– Number: Ch-Eng: all the concrete nouns: ch_book book, books– Gender: Eng-Fr: all the adjectives– Case: Eng-Korean: all the arguments– Tense: Ch-Eng: all the verbs: ch_buy buy, bought, will buy
Inserting spontaneous words• Function words:
– Determiners: Ch-Eng: ch_book a book, the book, the books, books
– Prepositions: Ch-Eng: … ch_November … in November
– Relative pronouns: Ch-Eng: … ch_buy ch_book de ch_person the person who bought /book/
– Possessive pronouns: Ch-Eng: ch_he ch_raise ch_hand He raised his hand(s)
– Conjunction: Eng-Ch: Although S1, S2 ch_although S1, ch_but S2
– …
Inserting spontaneous words (cont)
• Content words:– Dropped argument: Ch-Eng: ch_buy le ma Has Subj bought Obj?
– Chinese First name: Eng-Ch: Jiang … ch_Jiang ch_Zemin …
– Abbreviation, Acronyms: Ch-Eng: ch_12 ch_big the 12th National Congress of the
CPC (Communist Party of China)
– …
Major challenges
• Getting the right words:– Choosing the correct root form– Getting the correct inflected form– Inserting “spontaneous” words
• Putting the words in the correct order:– Word order: SVO vs. SOV, …– Unique construction: – Structural divergence
Word order
• SVO, SOV, VSO, …• VP + PP PP VP• VP + AdvP AdvP + VP
• Adj + N N + Adj• NP + PP PP NP• NP + S S NP
• P + NP NP + P
“Unique” Constructions
• Overt wh-movement: Eng-Ch:– Eng: Why do you think that he came yesterday?– Ch: you why think he yesterday come ASP?– Ch: you think he yesterday why come?
• Ba-construction: Ch-Eng– She ba homework finish ASP She finished her
homework.– He ba wall dig ASP CL hole He digged a hole in
the wall.– She ba orange peel ASP skin She peeled the
orange’s skin.
Translation divergences
• Source and target parse trees (dependency trees) are not identical.
• Example: I like Mary S: Marta me gusta a mi (‘Mary pleases me’)
• More discussion next time.
How humans do translation?
• Learn a foreign language:– Memorize word translations– Learn some patterns: – Exercise:
• Passive activity: read, listen• Active activity: write, speak
• Translation:– Understand the sentence– Clarify or ask for help (optional)– Translate the sentence
Training stage
Decoding stage
Translation lexicon
Templates, transfer rules
Parsing, semantics analysis?
Interactive MT?
Word-level? Phrase-level? Generate from meaning?
Reinforced learning?Reranking?
What kinds of resources are available to MT?
• Translation lexicon: – Bilingual dictionary
• Templates, transfer rules:– Grammar books
• Parallel data, comparable data• Thesaurus, WordNet, FrameNet, …
• NLP tools: tokenizer, morph analyzer, parser, …
More resources for major languages, less for “minor” languages.
Major approaches
• Transfer-based
• Interlingua
• Example-based (EBMT)
• Statistical MT (SMT)
• Hybrid approach
The MT triangle
word Word
Meaning
Transfer-based
Phrase-based SMT, EBMT
Word-based SMT, EBMT
(interlingua)
Ana
lysi
sS
ynthesis
Transfer-based MT
• Analysis, transfer, generation:1. Parse the source sentence2. Transform the parse tree with transfer rules3. Translate source words4. Get the target sentence from the tree
• Resources required:– Source parser– A translation lexicon– A set of transfer rules
• An example: Mary bought a book yesterday.
Transfer-based MT (cont)
• Parsing: linguistically motivated grammar or formal grammar?
• Transfer: – context-free rules? A path on a dependency tree? – Apply at most one rule at each level?– How are rules created?
• Translating words: word-to-word translation?• Generation: using LM or other additional knowledge?
• How to create the needed resources automatically?
Interlingua
• For n languages, we need n(n-1) MT systems.• Interlingua uses a language-independent
representation. • Conceptually, Interlingua is elegant: we only
need n analyzers, and n generators.
• Resource needed:– A language-independent representation– Sophisticated analyzers– Sophisticated generators
Interlingua (cont)
• Questions:– Does language-independent meaning representation
really exist? If so, what does it look like?– It requires deep analysis: how to get such an
analyzer: e.g., semantic analysis– It requires non-trivial generation: How is that done?– It forces disambiguation at various levels: lexical,
syntactic, semantic, discourse levels.– It cannot take advantage of similarities between a
particular language pair.
Example-based MT
• Basic idea: translate a sentence by using the closest match in parallel data.
• First proposed by Nagao (1981).• Ex:
– Training data:• w1 w2 w3 w4 w1’ w2’ w3’ w4’• w5 w6 w7 w5’ w6’ w7’• w8 w9 w8’ w9’
– Test sent:• w1 w2 w6 w7 w9 w1’ w2’ w6’ w7’ w9’
EMBT (cont)
• Types of EBMT:– Lexical (shallow)– Morphological / POS analysis – Parse-tree based (deep)
• Types of data required by EBMT systems:– Parallel text– Bilingual dictionary – Thesaurus for computing semantic similarity– Syntactic parser, dependency parser, etc.
EBMT (cont)
• Word alignment: using dictionary and heuristics exact match
• Generalization: – Clusters: dates, numbers, colors, shapes, etc. – Clusters can be built by hand or learned automatically.
• Ex:– Exact match: 12 players met in Paris last Tuesday
12 Spieler trafen sich letzen Dienstag in Paris
– Templates: $num players met in $city $time $num Spieler trafen sich $time in $city
Statistical MT
• Basic idea: learn all the parameters from parallel data.
• Major types:– Word-based– Phrase-based
• Strengths:– Easy to build, and it requires no human knowledge– Good performance when a large amount of training data is
available.
• Weaknesses:– How to express linguistic generalization?
Comparison of resource requirement
Transfer-based
Interlingua EBMT SMT
dictionary + + +
Transfer rules
+
parser + + + (?)
semantic
analyzer
+
parallel data + +
others Universal representation
thesaurus
Hybrid MT• Basic idea: combine strengths of different approaches:
– Syntax-based: generalization at syntactic level– Interlingua: conceptually elegant– EBMT: memorizing translation of n-grams; generalization at various level.– SMT: fully automatic; using LM; optimizing some objective functions.
• Types of hybrid HT:– Borrowing concepts/methods:
• SMT from EBMT: phrase-based SMT; Alignment templates• EBMT from SMT: automatically learned translation lexicon• Transfer-based from SMT: automatically learned translation lexicon, transfer rules;
using LM• …
– Using two MTs in a pipeline:• Using transfer-based MT as a preprocessor of SMT
– Using multiple MTs in parallel, then adding a re-ranker.
Evaluation
• Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.
• Human evaluation: accuracy, fluency, …– Problem: expensive, slow, subjective, non-reusable.
• Automatic measures:– Edit distance– Word error rate (WER), Position-independent WER (PER)– Simple string accuracy (SSA), Generation string accuracy (GSA)– BLEU
Edit distance
• The Edit distance (a.k.a. Levenshtein distance) is defined as the minimal cost of transforming str1 into str2, using three operations (substitution, insertion, deletion).
• Use DP and the complexity is O(m*n).
WER, PER, and SSA
• WER (word error rate) is edit distance, divided by |Ref|.• PER (position-independent WER): same as WER but
disregards word ordering• SSA (Simple string accuracy) = 1 - WER
• Previous example:– Sys: w1 w2 w3 w4– Ref: w1 w3 w2– Edit distance = 2– WER=2/3– PER=1/3– SSA=1/3
Generation string accuracy (GSA)
Example:Ref: w1 w2 w3 w4
Sys: w2 w3 w4 w1
Del=1, Ins=1 SSA=1/2Move=1, Del=0, Ins=0 GSA=3/4
|Re|1
f
SubDelInsMoveGSA
BLEU
• Proposal by Papineni et. al. (2002)• Most widely used in MT community.• BLEU is a weighted average of n-gram precision
(pn) between system output and all references, multiplied by a brevity penalty (BP).
)1
(...***
*
21
1
NwwhenpppBP
pBPBLEU
nN
N
N
n
wn
n
N-gram precision
• N-gram precision: the percent of n-grams in the system output that are correct.
• Clipping: – Sys: the the the the the the– Ref: the cat sat on the mat– Unigram precision: – Max_Ref_count: the max number of times a
ngram occurs in any single reference translation.
)_Re_,min( CountfMaxcountCountclip
N-gram precision
i.e. the percent of n-grams in the system output that are correct (after clipping).
SysS Sngram
SysS Sngramclip
n ngramCount
ngramCount
p)(
)(
Brevity Penalty
• For each sent si in system output, find closest matching reference ri (in terms of length).
• Longer system output is already penalized by the n-gram precision measure.
otherwisee
rcifBP cr /1
1
|||,| i
ii
i rrscLet
An example
• Sys: The cat was on the mat• Ref1: The cat sat on a mat• Ref2: There was a cat on the mat
• Assuming N=3
• p1=5/6, p2=3/5, p3=1/4, BP=1 BLEU=0.50
• What if N=4?
Summary
• Course overview
• Major challenges in MT– Choose the right words (root form, inflection,
spontaneous words)– Put them in right positions (word order, unique
constructions, divergences)
Summary (cont)
• Major approaches– Transfer-based MT– Interlingua– Example-based MT– Statistical MT– Hybrid MT
• Evaluation of MT systems– Edit distance– WER, PER, SSA, GSA– BLEU
Translation divergences(based on Bonnie Dorr’s work)
• Thematic divergence: I like Mary S: Marta me gusta a mi (‘Mary pleases me’)
• Promotional divergence: John usually goes home S: Juan suele ira casa (‘John tends to go home’)
• Demotional divergence: I like eating G: Ich esse gern (“I eat likingly)
• Structural divergence: John entered the house S: Juan entro en la casa (‘John entered in the house’)
Translation divergences (cont)
• Conflational divergence: I stabbed John S: Yo le di punaladas a Juan (‘I gave knife-
wounds to John’)
• Categorial divergence: I am hungry G: Ich habe Hunger (‘I have hunger’)
• Lexical divergence: John broke into the room S: Juan forzo la entrada al cuarto (‘John forced
the entry to the room’)
Calculating edit distance
• D(0, 0) = 0• D(i, 0) = delCost * i• D(0, j) = insCost * j
• D(i+1, j+1) = min( D(i,j) + sub, D(i+1, j) + insCost, D(i, j+1) + delCost)
sub = 0 if str1[i+1]=str2[j+1] = subCost otherwise