Download - Introduction to MT Ling 580 Fei Xia Week 1: 1/03/06

Introduction to MT

Ling 580

Fei Xia

Week 1: 1/03/06

Outline

• Course overview

• Introduction to MT– Major challenges– Major approaches– Evaluation of MT systems

• Overview of word-based SMT

Course overview

General info

• Course website: – Syllabus (incl. slides and papers): updated every week.– Message board– ESubmit

• Office hour: Fri: 10:30am-12:30pm.

• Prerequisites: – Ling570 and Ling571.– Programming: C or C++, Perl is a plus.– Introduction to probability and statistics

Expectations

• Reading: – Papers are online– Finish reading before class. Bring your questions to

class.

• Grade:– Leading discussion (1-2 papers): 50%– Project: 40%– Class participation: 10%– No quizzes, exams

Leading discussion

• Indicate your choice via EPost by Jan 8.

• You might want to read related papers.

• Make slides with PowerPoint.

• Email me your slides by 3:30am on the Monday before your presentation.

• Present the paper in class and lead the discussion: 40-50 minutes.

Project

• Details will be available soon.

• Project presentation: 3/7/06

• Final report: due on 3/12/06

• Pongo account will be ready soon.

Introduction to MT

A brief history of MT (Based on work by John Hutchins)

• Before the computer: In the mid 1930s, a French-Armenian Georges Artsrouni and a Russian Petr Troyanskii applied for patents for ‘translating machines’.

• The pioneers (1947-1954): the first public MT demo was given in 1954 (by IBM and Georgetown University).

• The decade of optimism (1954-1966): ALPAC (Automatic Language Processing Advisory Committee) report in 1966: "there is no immediate or predictable prospect of useful machine translation."

A brief history of MT (cont)

• The aftermath of the ALPAC report (1966-1980): a virtual end to MT research

• The 1980s: Interlingua, example-based MT

• The 1990s: Statistical MT

• The 2000s: Hybrid MT

Where are we now?

• Huge potential/need due to the internet, globalization and international politics.

• Quick development time due to SMT, the availability of parallel data and computers.

• Translation is reasonable for language pairs with a large amount of resource.

• Start to include more “minor” languages.

What is MT good for?

• Rough translation: web data• Computer-aided human translation• Translation for limited domain• Cross-lingual IR

• Machine is better than human in: – Speed: much faster than humans– Memory: can easily memorize millions of word/phrase

translations.– Manpower: machines are much cheaper than humans– Fast learner: it takes minutes or hours to build a new system.

Erasable memory – Never complain, never get tired, …

Major challenges in MT

Translation is hard

• Novels

• Word play, jokes, puns, hidden messages

• Concept gaps: go Greek, bei fen

• Other constraints: lyrics, dubbing, poem, …

Major challenges

• Getting the right words:– Choosing the correct root form– Getting the correct inflected form– Inserting “spontaneous” words

• Putting the words in the correct order:– Word order: SVO vs. SOV, …– Unique constructions: – Divergence

Lexical choice

• Homonymy/Polysemy: bank, run

• Concept gap: no corresponding concepts in another language: go Greek, go Dutch, fen sui, lame duck, …

• Coding (Concept lexeme mapping) differences:– More distinction in one language: e.g., kinship

vocabulary.– Different division of conceptual space:

Choosing the appropriate inflection

• Inflection: gender, number, case, tense, …

• Ex:– Number: Ch-Eng: all the concrete nouns: ch_book book, books– Gender: Eng-Fr: all the adjectives– Case: Eng-Korean: all the arguments– Tense: Ch-Eng: all the verbs: ch_buy buy, bought, will buy

Inserting spontaneous words• Function words:

– Determiners: Ch-Eng: ch_book a book, the book, the books, books

– Prepositions: Ch-Eng: … ch_November … in November

– Relative pronouns: Ch-Eng: … ch_buy ch_book de ch_person the person who bought /book/

– Possessive pronouns: Ch-Eng: ch_he ch_raise ch_hand He raised his hand(s)

– Conjunction: Eng-Ch: Although S1, S2 ch_although S1, ch_but S2

– …

Inserting spontaneous words (cont)

• Content words:– Dropped argument: Ch-Eng: ch_buy le ma Has Subj bought Obj?

– Chinese First name: Eng-Ch: Jiang … ch_Jiang ch_Zemin …

– Abbreviation, Acronyms: Ch-Eng: ch_12 ch_big the 12th National Congress of the

CPC (Communist Party of China)

– …

Major challenges

• Getting the right words:– Choosing the correct root form– Getting the correct inflected form– Inserting “spontaneous” words

• Putting the words in the correct order:– Word order: SVO vs. SOV, …– Unique construction: – Structural divergence

Word order

• SVO, SOV, VSO, …• VP + PP PP VP• VP + AdvP AdvP + VP

• Adj + N N + Adj• NP + PP PP NP• NP + S S NP

• P + NP NP + P

“Unique” Constructions

• Overt wh-movement: Eng-Ch:– Eng: Why do you think that he came yesterday?– Ch: you why think he yesterday come ASP?– Ch: you think he yesterday why come?

• Ba-construction: Ch-Eng– She ba homework finish ASP She finished her

homework.– He ba wall dig ASP CL hole He digged a hole in

the wall.– She ba orange peel ASP skin She peeled the

orange’s skin.

Translation divergences

• Source and target parse trees (dependency trees) are not identical.

• Example: I like Mary S: Marta me gusta a mi (‘Mary pleases me’)

• More discussion next time.

Major approaches

How humans do translation?

• Learn a foreign language:– Memorize word translations– Learn some patterns: – Exercise:

• Passive activity: read, listen• Active activity: write, speak

• Translation:– Understand the sentence– Clarify or ask for help (optional)– Translate the sentence

Training stage

Decoding stage

Translation lexicon

Templates, transfer rules

Parsing, semantics analysis?

Interactive MT?

Word-level? Phrase-level? Generate from meaning?

Reinforced learning?Reranking?

What kinds of resources are available to MT?

• Translation lexicon: – Bilingual dictionary

• Templates, transfer rules:– Grammar books

• Parallel data, comparable data• Thesaurus, WordNet, FrameNet, …

• NLP tools: tokenizer, morph analyzer, parser, …

More resources for major languages, less for “minor” languages.

Major approaches

• Transfer-based

• Interlingua

• Example-based (EBMT)

• Statistical MT (SMT)

• Hybrid approach

The MT triangle

word Word

Meaning

Transfer-based

Phrase-based SMT, EBMT

Word-based SMT, EBMT

(interlingua)

Ana

lysi

sS

ynthesis

Transfer-based MT

• Analysis, transfer, generation:1. Parse the source sentence2. Transform the parse tree with transfer rules3. Translate source words4. Get the target sentence from the tree

• Resources required:– Source parser– A translation lexicon– A set of transfer rules

• An example: Mary bought a book yesterday.

Transfer-based MT (cont)

• Parsing: linguistically motivated grammar or formal grammar?

• Transfer: – context-free rules? A path on a dependency tree? – Apply at most one rule at each level?– How are rules created?

• Translating words: word-to-word translation?• Generation: using LM or other additional knowledge?

• How to create the needed resources automatically?

Interlingua

• For n languages, we need n(n-1) MT systems.• Interlingua uses a language-independent

representation. • Conceptually, Interlingua is elegant: we only

need n analyzers, and n generators.

• Resource needed:– A language-independent representation– Sophisticated analyzers– Sophisticated generators

Interlingua (cont)

• Questions:– Does language-independent meaning representation

really exist? If so, what does it look like?– It requires deep analysis: how to get such an

analyzer: e.g., semantic analysis– It requires non-trivial generation: How is that done?– It forces disambiguation at various levels: lexical,

syntactic, semantic, discourse levels.– It cannot take advantage of similarities between a

particular language pair.

Example-based MT

• Basic idea: translate a sentence by using the closest match in parallel data.

• First proposed by Nagao (1981).• Ex:

– Training data:• w1 w2 w3 w4 w1’ w2’ w3’ w4’• w5 w6 w7 w5’ w6’ w7’• w8 w9 w8’ w9’

– Test sent:• w1 w2 w6 w7 w9 w1’ w2’ w6’ w7’ w9’

EMBT (cont)

• Types of EBMT:– Lexical (shallow)– Morphological / POS analysis – Parse-tree based (deep)

• Types of data required by EBMT systems:– Parallel text– Bilingual dictionary – Thesaurus for computing semantic similarity– Syntactic parser, dependency parser, etc.

EBMT (cont)

• Word alignment: using dictionary and heuristics exact match

• Generalization: – Clusters: dates, numbers, colors, shapes, etc. – Clusters can be built by hand or learned automatically.

• Ex:– Exact match: 12 players met in Paris last Tuesday

12 Spieler trafen sich letzen Dienstag in Paris

– Templates: $num players met in $city $time $num Spieler trafen sich $time in $city

Statistical MT

• Basic idea: learn all the parameters from parallel data.

• Major types:– Word-based– Phrase-based

• Strengths:– Easy to build, and it requires no human knowledge– Good performance when a large amount of training data is

available.

• Weaknesses:– How to express linguistic generalization?

Comparison of resource requirement

Transfer-based

Interlingua EBMT SMT

dictionary + + +

Transfer rules

+

parser + + + (?)

semantic

analyzer

+

parallel data + +

others Universal representation

thesaurus

Hybrid MT• Basic idea: combine strengths of different approaches:

– Syntax-based: generalization at syntactic level– Interlingua: conceptually elegant– EBMT: memorizing translation of n-grams; generalization at various level.– SMT: fully automatic; using LM; optimizing some objective functions.

• Types of hybrid HT:– Borrowing concepts/methods:

• SMT from EBMT: phrase-based SMT; Alignment templates• EBMT from SMT: automatically learned translation lexicon• Transfer-based from SMT: automatically learned translation lexicon, transfer rules;

using LM• …

– Using two MTs in a pipeline:• Using transfer-based MT as a preprocessor of SMT

– Using multiple MTs in parallel, then adding a re-ranker.

Evaluation of MT

Evaluation

• Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.

• Human evaluation: accuracy, fluency, …– Problem: expensive, slow, subjective, non-reusable.

• Automatic measures:– Edit distance– Word error rate (WER), Position-independent WER (PER)– Simple string accuracy (SSA), Generation string accuracy (GSA)– BLEU

Edit distance

• The Edit distance (a.k.a. Levenshtein distance) is defined as the minimal cost of transforming str1 into str2, using three operations (substitution, insertion, deletion).

• Use DP and the complexity is O(m*n).

WER, PER, and SSA

• WER (word error rate) is edit distance, divided by |Ref|.• PER (position-independent WER): same as WER but

disregards word ordering• SSA (Simple string accuracy) = 1 - WER

• Previous example:– Sys: w1 w2 w3 w4– Ref: w1 w3 w2– Edit distance = 2– WER=2/3– PER=1/3– SSA=1/3

Generation string accuracy (GSA)

Example:Ref: w1 w2 w3 w4

Sys: w2 w3 w4 w1

Del=1, Ins=1 SSA=1/2Move=1, Del=0, Ins=0 GSA=3/4

|Re|1

f

SubDelInsMoveGSA

BLEU

• Proposal by Papineni et. al. (2002)• Most widely used in MT community.• BLEU is a weighted average of n-gram precision

(pn) between system output and all references, multiplied by a brevity penalty (BP).

)1

(...***

*

21

1

NwwhenpppBP

pBPBLEU

nN

N

N

n

wn

n

N-gram precision

• N-gram precision: the percent of n-grams in the system output that are correct.

• Clipping: – Sys: the the the the the the– Ref: the cat sat on the mat– Unigram precision: – Max_Ref_count: the max number of times a

ngram occurs in any single reference translation.

)_Re_,min( CountfMaxcountCountclip

N-gram precision

i.e. the percent of n-grams in the system output that are correct (after clipping).

SysS Sngram

SysS Sngramclip

n ngramCount

ngramCount

p)(

)(

Brevity Penalty

• For each sent si in system output, find closest matching reference ri (in terms of length).

• Longer system output is already penalized by the n-gram precision measure.

otherwisee

rcifBP cr /1

1

|||,| i

ii

i rrscLet

An example

• Sys: The cat was on the mat• Ref1: The cat sat on a mat• Ref2: There was a cat on the mat

• Assuming N=3

• p1=5/6, p2=3/5, p3=1/4, BP=1 BLEU=0.50

• What if N=4?

Summary

• Course overview

• Major challenges in MT– Choose the right words (root form, inflection,

spontaneous words)– Put them in right positions (word order, unique

constructions, divergences)

Summary (cont)

• Major approaches– Transfer-based MT– Interlingua– Example-based MT– Statistical MT– Hybrid MT

• Evaluation of MT systems– Edit distance– WER, PER, SSA, GSA– BLEU

Additional slides

Translation divergences(based on Bonnie Dorr’s work)

• Thematic divergence: I like Mary S: Marta me gusta a mi (‘Mary pleases me’)

• Promotional divergence: John usually goes home S: Juan suele ira casa (‘John tends to go home’)

• Demotional divergence: I like eating G: Ich esse gern (“I eat likingly)

• Structural divergence: John entered the house S: Juan entro en la casa (‘John entered in the house’)

Translation divergences (cont)

• Conflational divergence: I stabbed John S: Yo le di punaladas a Juan (‘I gave knife-

wounds to John’)

• Categorial divergence: I am hungry G: Ich habe Hunger (‘I have hunger’)

• Lexical divergence: John broke into the room S: Juan forzo la entrada al cuarto (‘John forced

the entry to the room’)

Calculating edit distance

• D(0, 0) = 0• D(i, 0) = delCost * i• D(0, j) = insCost * j

• D(i+1, j+1) = min( D(i,j) + sub, D(i+1, j) + insCost, D(i, j+1) + delCost)

sub = 0 if str1[i+1]=str2[j+1] = subCost otherwise

An example

• Sys: w1 w2 w3 w4• Ref: w1 w3 w2• All three costs are 1.

• Edit distance=2

0 1 2 3

1 0 1 2

2 1 1 1

3 2 1 2

4 3 2 2

w1 w3 w2

w1

w2

w3

w4

Download - Introduction to MT Ling 580 Fei Xia Week 1: 1/03/06

Top Related