CMSC 723 / LING 645: Intro to Computational Linguistics
September 8, 2004: Dorr
MT (continued), MT Evaluation
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee
MT Challenges: Ambiguity
Syntactic Ambiguity
– I saw the man on the hill with the telescope
Lexical Ambiguity
– E: book; S: libro, reservar
Semantic Ambiguity
– Homography: ball (E) = pelota, baile (S)
– Polysemy: kill (E) = matar, acabar (S)
– Semantic granularity:
  esperar (S) = wait, expect, hope (E)
  be (E) = ser, estar (S)
  fish (E) = pez, pescado (S)
MT Challenges: Divergences
Meaning of two translationally equivalent phrases is distributed differently in the two languages
Example:
– English: [RUN INTO ROOM]
– Spanish: [ENTER IN ROOM RUNNING]
Divergence Frequency
– 32% of sentences in UN Spanish/English Corpus (5K)
– 35% of sentences in TREC El Norte Corpus (19K)
Divergence Types
– Categorial (X tener hambre ↔ X have hunger) [98%]
– Conflational (X dar puñaladas a Z ↔ X stab Z) [83%]
– Structural (X entrar en Y ↔ X enter Y) [35%]
– Head Swapping (X cruzar Y nadando ↔ X swim across Y) [8%]
– Thematic (X gustar a Y ↔ Y like X) [6%]
Spanish/Arabic Divergences
Divergence: E / E′ (Spanish); E / E′ (Arabic)
– Categorial: Spanish: be jealous / have jealousy [tener celos]; Arabic: when he returns / upon his return [عند رجوعه]
– Conflational: Spanish: float / go floating [ir flotando]; Arabic: come again / return [عاد]
– Structural: Spanish: enter the house / enter in the house [entrar en la casa]; Arabic: seek / search for [بحث عن]
– Head Swap: Spanish: run in / enter running [entrar corriendo]; Arabic: do something quickly / go-quickly in doing something [اسرع]
– Thematic: Spanish: I have a headache / my-head hurts me [me duele la cabeza]; Arabic: none given
Automatic Divergence Detection
(using narrowly defined divergence detection rules)
Example rule: [Arg1 [V]] → [Arg1 [MotionV] Modifier(v)]
“The boat floated” → “The boat went floating”
Language | Detected | Human Confirmed | Sample Size | Corpus Size
Spanish – Total | 11.1% | 10.5% | 19K | 150K
Arabic – Total | 31.9% | 12.5% | 1K | 28K
Application of Divergence Detection: Bilingual Alignment for MT
Word-level alignments of bilingual texts are an integral part of MT models
Divergences present a great challenge to the alignment task
Common divergence types can be found in multiple language pairs, systematically identified, and resolved
Our Goal: Improved Alignment & Projection
Induce higher interannotator agreement rate
Increase the number of aligned words
Decrease multiple alignments
DUSTer Approach: Divergence Unraveling
E: I run into the room
E′: I move-in running the room
S: Yo entro en el cuarto corriendo
Word-Level Alignment (1): Test Setup
Ex: John ran into the room → John entered the room running
[Figure: tree diagrams for the two sentences, with nodes run, John, into, room (before) and enter, John, room, running (after)]
– Divergence Detection: Categorize English sentences into one of 5 divergence types
– Divergence Correction: Apply appropriate structural transformation [E → E′]
Word-Level Alignment (2): Testing Impact of Divergence Correction
– Humans align the original English sentence (E) and the foreign sentence
– Humans align the transformed English sentence (E′) and the foreign sentence
– Compare inter-annotator agreement, unaligned units, multiple alignments
Word-Level Alignment Results
Inter-Annotator Agreement:
– English-Spanish: agreement increased from 80.2% to 82.9%
– English-Arabic: agreement increased from 69.7% to 75.1%
Number of aligned words:
– English-Spanish: aligned words increased from 82.8% to 86%
– English-Arabic: aligned words increased from 61.5% to 88.1%
Multiple Alignments:
– English-Spanish: number of links decreased from 1.35 to 1.16
– English-Arabic: number of links increased from 1.48 to 1.72
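The slides do not say exactly how "aligned words" and "number of links" were measured. As a rough sketch only, assume an alignment is a set of (English index, foreign index) links, "aligned words" is the fraction of English words with at least one link, and "number of links" is the average number of links per aligned English word. The function name and the example links below are hypothetical, made up for illustration.

```python
def alignment_stats(num_english_words, links):
    """Return (fraction of English words aligned, average links per aligned word).

    links: iterable of (english_index, foreign_index) pairs.
    Both definitions are assumptions, not taken from the slides.
    """
    links_per_word = {}
    for english_index, _foreign_index in set(links):
        links_per_word[english_index] = links_per_word.get(english_index, 0) + 1
    aligned = len(links_per_word)
    aligned_fraction = aligned / num_english_words
    average_links = sum(links_per_word.values()) / aligned if aligned else 0.0
    return aligned_fraction, average_links


# Hypothetical alignment for
#   E: I(0) run(1) into(2) the(3) room(4)
#   S: Yo(0) entro(1) en(2) el(3) cuarto(4) corriendo(5)
# "run" is linked to both "entro" and "corriendo", i.e. a multiple alignment.
EXAMPLE_LINKS = {(0, 0), (1, 1), (1, 5), (2, 2), (3, 3), (4, 4)}

print(alignment_stats(5, EXAMPLE_LINKS))  # (1.0, 1.2)
```

Under these assumed definitions, a transformation that removes one-to-many links lowers the second number toward 1.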
Divergence Unraveling Conclusions
Divergence handling shows promise for improvement of automatic alignment
Conservative lower bound on divergence frequency
Effective solution: syntactic transformation of English
Validity of solution shown through alignment experiments
How do we evaluate MT?
Human-based Metrics
– Semantic Invariance
– Pragmatic Invariance
– Lexical Invariance
– Structural Invariance
– Spatial Invariance
– Fluency
– Accuracy
– “Do you get it?”
Automatic Metrics: Bleu
BiLingual Evaluation Understudy (BLEU; Papineni et al., 2001)
Automatic technique, but requires the pre-existence of human (reference) translations
Approach:
– Produce corpus of high-quality human translations
– Judge “closeness” numerically (word-error rate)
– Compare n-gram matches between candidate translation and 1 or more reference translations
http://www.research.ibm.com/people/k/kishore/RC22176.pdf
Bleu Comparison
Chinese-English Translation Example:
Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
How Do We Compute Bleu Scores?
Key Idea: A reference word should be considered exhausted after a matching candidate word is identified.
• For each word compute: (1) candidate word count; (2) maximum ref count, i.e. the largest count of that word in any single reference
• Add counts for each candidate word using the lower of the two numbers.
• Divide by number of candidate words.
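The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a full BLEU implementation: it only computes the clipped n-gram counts. The tokenization (lowercasing, letters only) and all function and variable names are assumptions, since the slides do not specify them.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenization: lowercase, keep runs of letters, drop punctuation.
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Return (clipped matches, total candidate n-grams)."""
    candidate_counts = Counter(ngrams(tokenize(candidate), n))
    reference_counts = [Counter(ngrams(tokenize(r), n)) for r in references]
    clipped = 0
    for gram, count in candidate_counts.items():
        # Maximum count of this n-gram in any single reference.
        max_ref = max(rc[gram] for rc in reference_counts)
        # Use the lower of the two numbers.
        clipped += min(count, max_ref)
    return clipped, sum(candidate_counts.values())


REFERENCES = [
    "It is a guide to action that ensures that the military will forever heed Party commands.",
    "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
    "It is the practical guide for the army always to heed the directions of the party.",
]
CANDIDATE_1 = "It is a guide to action which ensures that the military always obeys the commands of the party."
CANDIDATE_2 = "It is to insure the troops forever hearing the activity guidebook that party direct."

print(modified_precision(CANDIDATE_1, REFERENCES, n=1))  # (17, 18)
print(modified_precision(CANDIDATE_2, REFERENCES, n=1))  # (8, 14)
print(modified_precision(CANDIDATE_1, REFERENCES, n=2))  # (10, 17)
print(modified_precision(CANDIDATE_2, REFERENCES, n=2))  # (1, 13)
```

Returning the numerator and denominator separately keeps the fractions in the same unreduced form the slides use (for example 8/14 rather than 4/7).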
Modified Unigram Precision: Candidate #1
It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1)
What’s the answer??????
17/18
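The arithmetic behind 17/18 can be written out as below. The numbers in parentheses on the slide are maximum reference counts, so "the" is marked (4) but contributes only 3, because it occurs 3 times in the candidate. The notation is an assumption chosen for this note: C is the candidate, r ranges over the references, and the sums run over distinct word types w in C.

```latex
\[
p_1
= \frac{\sum_{w \in C} \min\bigl(\mathrm{count}_C(w),\ \max_{r}\, \mathrm{count}_r(w)\bigr)}
       {\sum_{w \in C} \mathrm{count}_C(w)}
= \frac{\underbrace{14 \cdot 1}_{\text{14 word types matched once}}
        + \underbrace{\min(3, 4)}_{\text{the}}
        + \underbrace{0}_{\text{obeys}}}{18}
= \frac{17}{18}
\]
```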
Modified Unigram Precision: Candidate #2
It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0)
What’s the answer??????
8/14
Modified Bigram Precision: Candidate #1
It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1)
What’s the answer??????
10/17
Modified Bigram Precision: Candidate #2
It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0)
What’s the answer??????
1/13