machine translation a presentation by: julie conlonova, rob chase, and eric pomerleau

Machine Translation

A Presentation by:

Julie Conlonova,

Rob Chase,

and Eric Pomerleau

Overview

Language Alignment System

Datasets

Sentence-aligned sets for training (e.g., the Hansards Corpus, the European Parliament Proceedings Parallel Corpus)

A word-aligned set for testing and evaluation to measure accuracy and precision

Decoding

Language Alignment

Goal: Produce a word-aligned set from a sentence-aligned dataset

First step on the road toward Statistical Machine Translation

Example Problem:

The motion to adjourn the House is now deemed to have been adopted.

La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.

IBM Models 1 and 2

Kevin Knight, A Statistical MT Tutorial Workbook, 1999

Each can be used on its own to produce a word-aligned dataset.

EM Algorithm

Model 1 produces T-values based on normalized fractional counting of corresponding words.

Additionally, Model 2 uses A-values for “reverse distortion probabilities” – probabilities based on the positions of the words

Training Data

European Parliament Proceedings Parallel Corpus 1996-2003

Aligned Languages:

English - French

English - Dutch

English - Italian

English - Finnish

English - Portuguese

English - Spanish

English - Greek

Training Data cont.

Eliminated:

Misaligned sentences

Sentences with 50 or more words

XML tags

Symbols and numerical characters other than commas and periods
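The filters listed above could be sketched roughly as follows. This is a minimal illustration, not the presenters' code: the function names, the regular expressions, and the rule that an empty side counts as a misaligned pair are all assumptions.

```python
import re

MAX_WORDS = 50  # pairs with 50 or more words on either side are dropped

TAG_RE = re.compile(r"<[^>]+>")           # XML tags such as <P> or </P>
JUNK_RE = re.compile(r"[^\w\s,.]|[\d_]")  # symbols and digits; commas and periods are kept

def clean_sentence(text):
    """Strip XML tags, then symbols and digits other than commas and periods."""
    text = TAG_RE.sub(" ", text)
    text = JUNK_RE.sub(" ", text)
    return text.split()

def clean_corpus(pairs):
    """Yield tokenized (source, target) pairs that survive the filters."""
    for src, tgt in pairs:
        src_words, tgt_words = clean_sentence(src), clean_sentence(tgt)
        if not src_words or not tgt_words:
            continue  # assumption: an empty side signals a misaligned pair
        if len(src_words) >= MAX_WORDS or len(tgt_words) >= MAX_WORDS:
            continue
        yield src_words, tgt_words
```

Detecting misalignment properly would need more than an emptiness check; the slides do not say which test was used.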

Ideally…

http://www.cs.berkeley.edu/~klein/cs294-5

Bypassing Interlingua: Models I-III

Variables contributing to the probability of a sentence:

Correlation between words in the source/target languages

Fertility of a word

Correlation between order of words in source sentence and order of words in target

A Translation Matrix

         Rob   Cat   is    Dog
Rob      1     0     0     0
Gato     0     1     0     0
es       0     0     .5    0
esta     0     0     .5    0
Perro    0     0     0     1

Building the Translation Matrix: Starting from alignments

Find the sentence alignment

If a word in the source aligns with a word in the target, then increment the translation matrix.

Normalize the translation matrix
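When word alignments are already known, the increment-then-normalize procedure might look like the sketch below. The input format (a list of index links per sentence pair) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def build_translation_matrix(aligned_pairs):
    """aligned_pairs: iterable of (source_words, target_words, links), where
    links is a list of (source_index, target_index) tuples (assumed format)."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1.0  # increment the cell for this aligned word pair
    # Normalize each source word's row so that its entries sum to 1.
    matrix = {}
    for s, row in counts.items():
        total = sum(row.values())
        matrix[s] = {t: c / total for t, c in row.items()}
    return matrix

pairs = [
    (["Cat", "is"], ["Gato", "es"], [(0, 0), (1, 1)]),
    (["Dog", "is"], ["Perro", "esta"], [(0, 0), (1, 1)]),
]
matrix = build_translation_matrix(pairs)
```

With these two toy pairs, "is" is split evenly between "es" and "esta", as in the example matrix.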

Can’t find alignments

Most sentences in the Hansards corpus are 60 words long. Many are over 100 words long.

100^100 possible alignments
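A quick calculation shows why enumerating alignments is hopeless for a pair of 100-word sentences. The sketch assumes every target word links to exactly one source word and ignores any NULL word.

```python
def count_alignments(source_len, target_len):
    """Each target word may link to any one of the source words."""
    return source_len ** target_len

small = count_alignments(3, 3)        # 3 choices for each of 3 words
large = count_alignments(100, 100)    # the 100-word case from the slide
print(small)
print(len(str(large)), "digits")
```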

Counting

Rob is a boy.    Rob es nino.
Rob is tall.     Rob es alto.
Eric is tall.    Eric es alto.
…                …

Base counts on co-occurrence, weighting based on sentence length.
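One way to read "weighting based on sentence length" is that each foreign word spreads a single count evenly over the English words in its sentence. The sketch below follows that reading; the exact weighting used in the project is not stated, so the 1/length weight is an assumption.

```python
from collections import defaultdict

corpus = [
    ("Rob is a boy".split(), "Rob es nino".split()),
    ("Rob is tall".split(), "Rob es alto".split()),
    ("Eric is tall".split(), "Eric es alto".split()),
]

def cooccurrence_counts(corpus):
    """counts[f][e]: length-weighted number of times f and e share a sentence pair."""
    counts = defaultdict(lambda: defaultdict(float))
    for english, spanish in corpus:
        weight = 1.0 / len(english)  # assumed weighting: one count split over the sentence
        for f in spanish:
            for e in english:
                counts[f][e] += weight
    return counts

counts = cooccurrence_counts(corpus)
```

Here "es" collects more weight with "is" than with any other English word, because the two appear together in all three pairs.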

Iterative Convergence

Use the Expectation Maximization (EM) algorithm

Creates translation matrix

         Rob   is    tall  boy
Rob      .66   .33   .25   .25
es       .30   .66   .25   .25
alto     .2    .05   .5    0
nino     .2    .05   0     .5
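The iteration can be sketched as Model 1 style EM on the toy corpus from the Counting slide. This is an illustrative sketch under simplifying assumptions (no NULL word, uniform initialization, a fixed number of iterations); it is not expected to reproduce the exact figures in the matrix above.

```python
from collections import defaultdict

corpus = [
    ("Rob is a boy".split(), "Rob es nino".split()),
    ("Rob is tall".split(), "Rob es alto".split()),
    ("Eric is tall".split(), "Eric es alto".split()),
]

def train_model1(corpus, iterations=10):
    """Return t[f][e], an estimate of t(f | e), from (english, foreign) pairs."""
    foreign_vocab = {f for _, fs in corpus for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: defaultdict(lambda: uniform))
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: split each foreign word's count over the English words
        # in proportion to the current t-values (fractional counting).
        for es, fs in corpus:
            for f in fs:
                norm = sum(t[f][e] for e in es)
                for e in es:
                    frac = t[f][e] / norm
                    count[f][e] += frac
                    total[e] += frac
        # M-step: renormalize so that the t-values for each English word sum to 1.
        for f in count:
            for e in count[f]:
                t[f][e] = count[f][e] / total[e]
    return t

t = train_model1(corpus)
```

After a few iterations, "es" is drawn toward "is" and "alto" toward "tall", even though no alignments were ever supplied.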

Distorting the Sentence

Word order changes between languages

How is a sentence with 2 words distorted?

How is a sentence with 3 words distorted?

How is a sentence with …

To keep track of this information we use…

A tesseract!

(A quadruply nested default dictionary)

This could be a problem if there are more than 100 words in a sentence.

100x100x100x100 = too big for RAM and takes too much time
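A quadruply nested default dictionary can be built in Python as sketched below. The index order (source length, target length, target position, source position) is an assumption; the slides do not specify it.

```python
from collections import defaultdict

def make_a_table():
    """a[l][m][j][i]: probability that target position j draws from source
    position i, for source length l and target length m (assumed index order)."""
    return defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    )

a = make_a_table()
a[3][3][1][2] = 0.7  # entries are created only when they are touched
a[3][3][1][1] = 0.3

dense_cells = 100 ** 4  # size of a dense 100x100x100x100 table
```

Note that merely reading a missing key from a `defaultdict` creates it, so lookups during decoding are better done with `.get()` to keep the table from growing.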

Broad Look at MT

“The translation process can be described simply as:

1. Decoding the meaning of the source text, and

2. Re-encoding this meaning in the target language.”

- “Translation Process”, Wikipedia, May 2006

Decoding

How to go from the T-matrix and A-matrix to a word alignment?

There are several approaches…

Viterbi

If only doing alignment, much smaller memory and time requirements.

Returns optimal path.

T-Matrix probabilities function as the “emission” matrix

A-Matrix probabilities concerned with the positioning of words
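Under Models 1 and 2 each target word's link is chosen independently of the others, so the most probable alignment can be found one position at a time. The sketch below assumes flat dictionaries keyed by `(f, e)` and `(i, j, l, m)` and falls back to a uniform position probability when an A-value is missing; these data layouts are hypothetical, not the presenters'.

```python
def best_alignment(source, target, t, a):
    """For each target position j, pick the source position i that maximizes
    t(target[j] | source[i]) * a(i | j, l, m)."""
    l, m = len(source), len(target)
    alignment = []
    for j, f in enumerate(target):
        scores = [
            t.get((f, e), 0.0) * a.get((i, j, l, m), 1.0 / l)
            for i, e in enumerate(source)
        ]
        alignment.append(max(range(l), key=scores.__getitem__))
    return alignment

t_values = {
    ("Rob", "Rob"): 0.66, ("es", "Rob"): 0.30,
    ("Rob", "is"): 0.33, ("es", "is"): 0.66,
    ("alto", "tall"): 0.5,
}
links = best_alignment(["Rob", "is", "tall"], ["Rob", "es", "alto"], t_values, {})
```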

Decoding as a Translator

When no translated sentence is supplied, the program can act as a stand-alone translator instead of a word aligner.

However, while the Viterbi algorithm with pruning runs quickly when aligning, its run time skyrockets when translating.

Greedy Hill Climbing

Knight & Koehn, What’s New in Statistical Machine Translation, 2003

Best first search

2-step look ahead to avoid getting stuck in most probable local maxima

Beam Search

Knight & Koehn, What’s New in Statistical Machine Translation, 2003

Optimization of Best First Search with heuristics and “beam” of choices

Exponential tradeoff when increasing the “beam” width
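A minimal beam search over word-by-word translation might look like the sketch below. It assumes a monotone (no reordering) decoder, a hypothetical `candidates` table of translation options, and a hypothetical `bigram` scoring function; all probabilities must be positive because scores are summed in log space.

```python
import heapq
import math

def beam_search(source, candidates, bigram, beam_width=3):
    """Translate left to right, keeping the beam_width best partial outputs.

    candidates[word] -> list of (translation, probability)   (assumed format)
    bigram(prev, word) -> probability of word following prev  (assumed helper)
    """
    beam = [(0.0, [])]  # (log probability, words produced so far)
    for word in source:
        expanded = []
        for score, output in beam:
            prev = output[-1] if output else "<s>"
            for translation, p in candidates[word]:
                new_score = score + math.log(p) + math.log(bigram(prev, translation))
                expanded.append((new_score, output + [translation]))
        # Prune: only the highest-scoring hypotheses survive to the next word.
        beam = heapq.nlargest(beam_width, expanded, key=lambda h: h[0])
    return beam[0][1]

candidates = {
    "Rob": [("Rob", 1.0)],
    "es": [("is", 0.6), ("it", 0.4)],
    "alto": [("tall", 0.7), ("high", 0.3)],
}

def bigram(prev, word):
    return 0.5 if (prev, word) == ("is", "tall") else 0.1

result = beam_search(["Rob", "es", "alto"], candidates, bigram)
```

A width of 1 reduces this to greedy search; wider beams explore more hypotheses at a higher cost per word.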

Other Decoding Methods

Knight & Koehn, What’s New in Statistical Machine Translation, 2003

Finite State Transducer

Mapping between languages based on a finite automaton

Parsing

String to Tree Model

Problem: One to Many

Necessary to take all alignments over a certain probability in order to capture the “probability that e has fertility at least a given value”

Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999

Results

Study done in 2003 on word alignment error rates in the Hansards corpus:

Model 2: 29.3% on 8K training sentence pairs, 19.5% on 1.47M training sentence pairs

Optimized Model 6: 20.3% on 8K training sentence pairs, 8.7% on 1.47M training sentence pairs

Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003
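The alignment error rate (AER) reported in that study compares predicted links against hand-annotated "sure" and "possible" links. A small sketch of the metric follows; representing links as sets of `(source_index, target_index)` tuples is an assumption made for illustration.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), where S is a subset of P."""
    predicted, sure = set(predicted), set(sure)
    possible = set(possible) | sure  # sure links also count as possible
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))

gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 2)}
perfect = alignment_error_rate(gold_sure, gold_sure, gold_possible)
noisy = alignment_error_rate({(0, 0), (1, 1), (2, 1)}, gold_sure, gold_possible)
```

A prediction that matches the sure links exactly scores 0; each link outside the possible set pushes the rate up.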

Expected Accuracy

70% overall

Language performance:

Dutch

French

Italian, Spanish, Portuguese

Greek

Finnish

Possible Future Work

Given more time, we would’ve implemented IBM Model 3

Additionally uses n, p, and d parameters for weighted alignments:

N, fertility: the number of words produced by one word

D, distortion

P, a parameter involving words that aren’t involved directly

Invokes Model 2 for scoring

Another Possible Translation Scheme

Example-Based Machine Translation

Translation-by-Analogy

Can sometimes achieve better than the “gist” translations from other models

Why Is Improving Machine Translation Necessary?

A Chinese to English Translation

The End

Are there any questions/comments?
