cmsc 723 / ling 645: intro to computational linguistics september 8, 2004: dorr mt (continued), mt...

CMSC 723 / LING 645: Intro to Computational Linguistics

September 8, 2004: Dorr

MT (continued), MT Evaluation

Prof. Bonnie J. Dorr, Dr. Christof Monz

TA: Adam Lee

MT Challenges: Ambiguity

Syntactic Ambiguity: I saw the man on the hill with the telescope

Lexical Ambiguity: book (E) = libro, reservar (S)

Semantic Ambiguity
– Homography: ball (E) = pelota, baile (S)
– Polysemy: kill (E) = matar, acabar (S)
– Semantic granularity: esperar (S) = wait, expect, hope (E); be (E) = ser, estar (S); fish (E) = pez, pescado (S)

MT Challenges: Divergences

Meaning of two translationally equivalent phrases is distributed differently in the two languages

Example:
– English: [RUN INTO ROOM]
– Spanish: [ENTER IN ROOM RUNNING]

Divergence Frequency

32% of sentences in the UN Spanish/English Corpus (5K)

35% of sentences in the TREC El Norte Corpus (19K)

Divergence Types

– Categorial (X tener hambre → X have hunger) [98%]

– Conflational (X dar puñaladas a Z → X stab Z) [83%]

– Structural (X entrar en Y → X enter Y) [35%]

– Head Swapping (X cruzar Y nadando → X swim across Y) [8%]

– Thematic (X gustar a Y → Y like X) [6%]

Spanish/Arabic Divergences

Each entry gives E → E′, with the foreign phrase in brackets.

– Categorial. Spanish: be jealous → have jealousy [tener celos]. Arabic: when he returns → upon his return [عند رجوعه]

– Conflational. Spanish: float → go floating [ir flotando]. Arabic: come again → return [عاد]

– Structural. Spanish: enter the house → enter in the house [entrar en la casa]. Arabic: seek → search for [بحث عن]

– Head Swap. Spanish: run in → enter running [entrar corriendo]. Arabic: do something quickly → go-quickly in doing something [اسرع]

– Thematic. Spanish: I have a headache → my-head hurts me [me duele la cabeza]. Arabic: no example given

Automatic Divergence Detection

Example rule: [Arg1 [V]] → [Arg1 [MotionV] Modifier(v)]
“The boat floated” → “The boat went floating”

Results (using narrowly defined divergence detection rules):

Language | Detected | Human Confirmed | Sample Size | Corpus Size
Spanish (total) | 11.1% | 10.5% | 19K | 150K
Arabic (total) | 31.9% | 12.5% | 1K | 28K

Application of Divergence Detection: Bilingual Alignment for MT

Word-level alignments of bilingual texts are an integral part of MT models

Divergences present a great challenge to the alignment task

Common divergence types can be found in multiple language pairs, systematically identified, and resolved

The Problem: Alignment & Projection

I began to eat the fish

Yo empecé a comer el pescado

Why is this a hard problem?

I run into the room

Yo entro en el cuarto corriendo

Divergences!

English: [RUN INTO ROOM]
Spanish: [ENTER IN ROOM RUNNING]

Our Goal: Improved Alignment & Projection

Induce a higher inter-annotator agreement rate

Increase the number of aligned words

Decrease multiple alignments

DUSTer Approach: Divergence Unraveling

E: I run into the room

E′: I move-in running the room

S: Yo entro en el cuarto corriendo

Word-Level Alignment (1): Test Setup

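[Figure: tree diagrams for “John ran into the room” (nodes: run, John, into, room) and “John entered the room running” (nodes: John, enter, room, running)]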

Ex: John ran into the room → John entered the room running

Divergence Detection: Categorize English sentences into one of 5 divergence types

Divergence Correction: Apply the appropriate structural transformation [E → E′]
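The following toy Python sketch makes the detect-then-correct step concrete. Everything in it is hypothetical. The rule table `RULES`, its regular expressions, and the function `unravel` are illustrative assumptions, not DUSTer's actual rules. The detection rule shown earlier ([Arg1 [V]] → [Arg1 [MotionV] Modifier(v)]) is stated over argument structure, not over raw strings. Only the E → E′ pairs are taken from the slides' examples.

```python
import re
from typing import Optional, Tuple

# Hypothetical toy rules: (divergence type, pattern over the whole English sentence, E' template).
# The E -> E' pairs mirror the slide examples; a real system would match parse structure.
RULES = [
    ("categorial", re.compile(r"^(?P<subj>.+?) is jealous$", re.I), "{subj} has jealousy"),
    ("conflational", re.compile(r"^(?P<subj>.+?) floated$", re.I), "{subj} went floating"),
    ("structural", re.compile(r"^(?P<subj>.+?) (?P<verb>enter(?:s|ed)?) (?P<obj>.+)$", re.I),
     "{subj} {verb} in {obj}"),
    ("head swapping", re.compile(r"^(?P<subj>.+?) ran into (?P<obj>.+)$", re.I),
     "{subj} entered {obj} running"),
    ("thematic", re.compile(r"^I have a headache$", re.I), "my-head hurts me"),
]


def unravel(sentence: str) -> Tuple[Optional[str], str]:
    """Return (divergence type, E') for the first rule that matches.

    If no rule fires, return (None, sentence) with the sentence unchanged.
    """
    text = sentence.strip().rstrip(".")
    for divergence_type, pattern, template in RULES:
        match = pattern.match(text)
        if match:
            # Fill the E' template with the captured constituents.
            return divergence_type, template.format(**match.groupdict())
    return None, sentence


for s in ["John ran into the room.", "The boat floated.", "I began to eat the fish."]:
    print(s, "->", unravel(s))
```

Because the sketch matches surface strings only, it is brittle. For example, it misses "John runs into the room". That brittleness is the reason a practical detector would operate over parsed structure.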

Word-Level Alignment (2): Testing Impact of Divergence Correction

Humans align the original English sentence (E) with the foreign sentence

Humans align the transformed English sentence (E′) with the foreign sentence

Compare inter-annotator agreement, unaligned units, multiple alignments

Word-Level Alignment Results

Inter-Annotator Agreement:
– English-Spanish: agreement increased from 80.2% to 82.9%
– English-Arabic: agreement increased from 69.7% to 75.1%

Number of aligned words:
– English-Spanish: aligned words increased from 82.8% to 86%
– English-Arabic: aligned words increased from 61.5% to 88.1%

Multiple Alignments:
– English-Spanish: number of links decreased from 1.35 to 1.16
– English-Arabic: number of links increased from 1.48 to 1.72
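The slides report these three quantities without defining them. The Python sketch below is written under stated assumptions, which may differ from the measures used in the reported experiments:

- An alignment is a set of (English index, foreign index) links.
- Agreement is the Dice overlap 2|A∩B| / (|A| + |B|) of two annotators' link sets.
- "Aligned words" is the share of tokens on both sides that are touched by at least one link.
- "Number of links" is the average number of links per aligned token.

The function names and the annotations are made up for illustration.

```python
from typing import Set, Tuple

# A link pairs an English token index with a foreign token index.
Link = Tuple[int, int]


def agreement(a: Set[Link], b: Set[Link]) -> float:
    """Assumed agreement measure: Dice overlap 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty alignments trivially agree
    return 2 * len(a & b) / (len(a) + len(b))


def aligned_fraction(links: Set[Link], n_english: int, n_foreign: int) -> float:
    """Assumed definition: share of all tokens (both sides) with at least one link."""
    english = {e for e, _ in links}
    foreign = {f for _, f in links}
    return (len(english) + len(foreign)) / (n_english + n_foreign)


def links_per_aligned_token(links: Set[Link]) -> float:
    """Assumed definition: average links per aligned token (1.0 means strictly one-to-one)."""
    english = {e for e, _ in links}
    foreign = {f for _, f in links}
    aligned = len(english) + len(foreign)
    # Every link has exactly one English and one foreign endpoint.
    return 2 * len(links) / aligned if aligned else 0.0


# Made-up annotations for E = "I run into the room", S = "Yo entro en el cuarto corriendo".
annotator_1 = {(0, 0), (1, 1), (1, 5), (2, 2), (3, 3), (4, 4)}  # run -> entro, corriendo
annotator_2 = {(0, 0), (1, 1), (2, 1), (2, 2), (3, 3), (4, 4)}  # into -> entro, en
print(agreement(annotator_1, annotator_2))
print(aligned_fraction(annotator_1, 5, 6))
print(links_per_aligned_token(annotator_1))
```

Under these definitions, a divergence that forces one word to link to several words raises links per aligned token above 1.0. Leaving a word such as "corriendo" unlinked lowers the aligned fraction.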

Divergence Unraveling Conclusions

Divergence handling shows promise for improvement of automatic alignment

Conservative lower bound on divergence frequency

Effective solution: syntactic transformation of English

Validity of solution shown through alignment experiments

How do we evaluate MT?

Human-based Metrics
– Semantic Invariance
– Pragmatic Invariance
– Lexical Invariance
– Structural Invariance
– Spatial Invariance
– Fluency
– Accuracy
– “Do you get it?”

Automatic Metrics: Bleu

BiLingual Evaluation Understudy (BLEU; Papineni et al., 2001)

Automatic technique, but it requires pre-existing human (reference) translations

Approach:

– Produce a corpus of high-quality human translations
– Judge “closeness” to the references numerically (in the spirit of word-error rate)
– Compare n-gram matches between the candidate translation and one or more reference translations

http://www.research.ibm.com/people/k/kishore/RC22176.pdf

Bleu Comparison

Chinese-English Translation Example:

Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.

Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.

Reference 3: It is the practical guide for the army always to heed the directions of the party.

How Do We Compute Bleu Scores?

Key Idea: A reference word should be considered exhausted after a matching candidate word is identified.

• For each candidate word, compute (1) its count in the candidate and (2) its maximum count in any single reference

• Sum over the candidate words, crediting each with the lower of the two numbers (its clipped count)

• Divide by the total number of candidate words
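The three steps above amount to clipping. Each candidate n-gram g is credited with min(Count(g), MaxRefCount(g)). The modified precision p_n is the sum of these clipped counts divided by the total number of candidate n-grams.

The Python sketch below assumes a simple tokenizer that lowercases text and keeps only alphabetic words, since the slides do not specify tokenization. `tokenize`, `ngram_counts`, and `modified_precision` are illustrative names, not a library API. The function returns the numerator and denominator separately so that results read like the slides' fractions.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Assumed tokenizer: lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count each n-gram (stored as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: str, references: List[str], n: int = 1) -> Tuple[int, int]:
    """Return (clipped n-gram matches, total candidate n-grams)."""
    cand_counts = ngram_counts(tokenize(candidate), n)

    # Maximum count of each n-gram in any single reference.
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngram_counts(tokenize(ref), n).items():
            max_ref[gram] = max(max_ref[gram], count)

    # Credit each candidate n-gram with the lower of its two counts (clipping).
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())

    # Denominator: every n-gram in the candidate, matched or not.
    return clipped, sum(cand_counts.values())


print(modified_precision("the cat sat on the mat", ["the cat is on the mat"], n=1))  # (5, 6)
print(modified_precision("the cat sat on the mat", ["the cat is on the mat"], n=2))  # (3, 5)
```

Setting `n=2` gives the bigram version used on the following slides. The same clipping applies, only to word pairs instead of single words.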

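Modified Unigram Precision: Candidate #1

(The number in parentheses is the word's maximum count in any single reference.)

It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1)

What’s the answer?

17/18: every word except “obeys” is matched. “The” occurs 3 times in the candidate and up to 4 times in a single reference, so all 3 occurrences count.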

Modified Unigram Precision: Candidate #2

It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0)

8/14: It, is, to, forever, that, party, and both occurrences of “the” are matched.

Modified Bigram Precision: Candidate #1

It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1)

10/17 (the 18-word candidate has 17 bigrams)




Modified Bigram Precision: Candidate #2

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0)

1/13: only “It is” appears in a reference.
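As a check, the sketch below recomputes the four fractions on these slides (17/18, 8/14, 10/17, 1/13) from the three references. It assumes case-insensitive matching with punctuation dropped, so that the final "party." counts as "party". The helper names are hypothetical. Python's `Counter` union (`|`) keeps the per-n-gram maximum across references, and intersection (`&`) keeps the minimum of two counts. Together they implement clipping.

```python
import re
from collections import Counter

REFERENCES = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being "
    "under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
CANDIDATE_1 = ("It is a guide to action which ensures that the military always obeys "
               "the commands of the party.")
CANDIDATE_2 = "It is to insure the troops forever hearing the activity guidebook that party direct."


def grams(text: str, n: int) -> Counter:
    """n-gram counts over lowercased alphabetic tokens (assumed tokenization)."""
    toks = re.findall(r"[a-z]+", text.lower())
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def clipped_fraction(candidate: str, references: list, n: int) -> tuple:
    """Return (clipped matches, candidate n-grams), i.e. the slide's fraction."""
    max_ref: Counter = Counter()
    for ref in references:
        max_ref |= grams(ref, n)  # union keeps the per-n-gram maximum over references
    cand = grams(candidate, n)
    # Intersection keeps min(candidate count, max reference count): the clipped count.
    return sum((cand & max_ref).values()), sum(cand.values())


for label, cand in [("Candidate 1", CANDIDATE_1), ("Candidate 2", CANDIDATE_2)]:
    for n in (1, 2):
        matched, total = clipped_fraction(cand, REFERENCES, n)
        print(f"{label}, {n}-gram: {matched}/{total}")
```

The printed fractions agree with the hand counts above. Candidate 1 scores higher than Candidate 2 at both n-gram orders.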

Catching Cheaters

Reference 1: The cat is on the mat

Reference 2: There is a cat on the mat

Candidate: the the the the the the the (7 words; “the” occurs at most twice in a single reference, in Reference 1)

What’s the unigram answer?

2/7: only 2 of the 7 occurrences of “the” are credited, because Reference 1 contains “the” twice.

What’s the bigram answer?

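0/6: the 7-word candidate has only 6 bigrams (“the the” six times), and “the the” occurs in neither reference.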
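The sketch below contrasts plain and clipped precision on this degenerate candidate. Without clipping, every "the" would count as a match, giving 7/7. Clipping limits the credit to the 2 occurrences of "the" in Reference 1. The sketch also shows that a 7-word candidate yields only 6 bigrams. The tokenizer and the name `precision_counts` are assumptions of this sketch.

```python
import re
from collections import Counter


def grams(text: str, n: int) -> Counter:
    """n-gram counts over lowercased alphabetic tokens (assumed tokenization)."""
    toks = re.findall(r"[a-z]+", text.lower())
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def precision_counts(candidate: str, references: list, n: int) -> tuple:
    """Return (unclipped matches, clipped matches, total candidate n-grams)."""
    cand = grams(candidate, n)
    max_ref: Counter = Counter()
    for ref in references:
        max_ref |= grams(ref, n)  # per-n-gram maximum over references
    # Unclipped: any candidate n-gram found in some reference counts every time it occurs.
    unclipped = sum(count for gram, count in cand.items() if gram in max_ref)
    # Clipped: each n-gram is credited at most its maximum reference count.
    clipped = sum((cand & max_ref).values())
    return unclipped, clipped, sum(cand.values())


references = ["The cat is on the mat", "There is a cat on the mat"]
candidate = "the the the the the the the"
print(precision_counts(candidate, references, 1))  # (7, 2, 7)
print(precision_counts(candidate, references, 2))  # (0, 0, 6)
```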
