jhu mt class: phrase-based translation

164
Phrase-Based Translation

Upload: alopezfoo

Post on 11-Jun-2015

329 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: JHU MT class: phrase-based translation

Phrase-BasedTranslation

Page 2: JHU MT class: phrase-based translation

The IBM Models

•Fertility probabilities.

•Word translation probabilities.

•Distortion probabilities.

•Some problems:

•Weak reordering model -- output is not fluent.

•Many decisions -- many things can go wrong.

Page 3: JHU MT class: phrase-based translation

The IBM Models

•Fertility probabilities.

•Word translation probabilities.

•Distortion probabilities.

•Some problems:

•Weak reordering model -- output is not fluent.

•Many decisions -- many things can go wrong.

Page 4: JHU MT class: phrase-based translation

Although north wind howls , but sky still very clear .虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

虽然

ε

ε ε北 风 呼啸 , 天空 天空 依然 清澈 。

north wind strong , the sky remained clear . under theHowever

IBM Model 4

Page 5: JHU MT class: phrase-based translation

Tradeoffs: Modeling v. Learning

IBM Model 1 ✔ ✘ ✘ ✔ ✔

HMM ✔ ✔ ✘ ✘ ✔

IBM Model 4 ✔ ✔ ✔ ✘ ✘

Lexical

Tran

slatio

n

Local orderi

ng depen

dency

Fertilit

y

Convex

Tracta

ble Exa

ct

Inferen

ce

Page 6: JHU MT class: phrase-based translation

Tradeoffs: Modeling v. Learning

IBM Model 1 ✔ ✘ ✘ ✔ ✔

HMM ✔ ✔ ✘ ✘ ✔

IBM Model 4 ✔ ✔ ✔ ✘ ✘

Lexical

Tran

slatio

n

Local orderi

ng depen

dency

Fertilit

y

Convex

Tracta

ble Exa

ct

Inferen

ce

Lesson:Trade exactnessfor expressivity

Page 7: JHU MT class: phrase-based translation

Although north wind howls , but sky still very clear .虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

虽然

ε

ε ε北 风 呼啸 , 天空 天空 依然 清澈 。

north wind strong , the sky remained clear . under theHowever

IBM Model 4

What are some things this model doesn’t account for?

Page 8: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

虽然 北 风 呼啸 , 天空 天空 依然 清澈 。

north wind strong , the sky remained clear . under theHowever

ε

ε

ε

What are some things this model doesn’t account for?

Page 9: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

Phrase-based Models

Page 10: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

Phrase-based Models

Page 11: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However

Phrase-based Models

Page 12: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

Phrase-based Models

Page 13: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However

Phrase-based Models

Page 14: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However ,

Phrase-based Models

Page 15: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However , the sky remained clear under the strong north wind .

Phrase-based Models

Page 16: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However , the sky remained clear under the strong north wind .

p(English, alignment|Chinese) =p(segmentation) · p(translations) · p(reorderings)

Phrase-based Models

Page 17: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However , the sky remained clear under the strong north wind .

p(English, alignment|Chinese) =p(segmentation) · p(translations) · p(reorderings)

Phrase-based Models

Page 18: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However , the sky remained clear under the strong north wind .

p(English, alignment|Chinese) =p(segmentation) · p(translations) · p(reorderings)

Phrase-based Models

distortion = 6

Page 19: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However , the sky remained clear under the strong north wind .

p(English, alignment|Chinese) =p(segmentation) · p(translations) · p(reorderings)

Phrase-based Models

distortion = 6

Page 20: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However , the sky remained clear under the strong north wind .

p(English, alignment|Chinese) =p(segmentation) · p(translations) · p(reorderings)

Phrase-based Models

distortion = 6

Page 21: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However the strong north wind , the sky remained clear under .

However , the sky remained clear under the strong north wind .

p(English, alignment|Chinese) =p(segmentation) · p(translations) · p(reorderings)

Phrase-based Models

distortion = 6

Page 22: JHU MT class: phrase-based translation

Phrase-based Models

•Segmentation probabilities.

•Phrase translation probabilities.

•Distortion probabilities.

•Some problems:

•Weak reordering model -- output is not fluent.

•Many decisions -- many things can go wrong.

Page 23: JHU MT class: phrase-based translation

Phrase-based Models

•Segmentation probabilities.

•Phrase translation probabilities.

•Distortion probabilities.

•Some problems:

•Weak reordering model -- output is not fluent.

•Many decisions -- many things can go wrong.

Page 24: JHU MT class: phrase-based translation

Phrase-based Models

•Segmentation probabilities: fixed (uniform)

•Phrase translation probabilities.

•Distortion probabilities: fixed (decaying)

Page 25: JHU MT class: phrase-based translation

Learning p(Chinese|English)

•Reminder: (nearly) every problem comes down to computing either:

•Sums: MLE or EM (learning)

•Maximum: most probable (decoding)

Page 26: JHU MT class: phrase-based translation

Recap: Expectation Maximization

•Arbitrarily select a set of parameters (say, uniform).

•Calculate expected counts of the unseen events.

•Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.

•Iterate.

•Guaranteed that likelihood is monotonically nondecreasing.

Page 27: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

p(

p(

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

p( )

) +

) +

Marginalize: sum all alignments containing the link

Page 28: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

p(

p(

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

p(

) +

) +

)

Divide by sum of all possible alignments

Page 29: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

p(

p(

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

p(

) +

) +

)

Divide by sum of all possible alignments

We have to sum over exponentially many alignments!

Page 30: JHU MT class: phrase-based translation

EM for Model 1

北p(north| ).�

c∈Chinese words p(north|c)

Page 31: JHU MT class: phrase-based translation

EM for Phrase-Based

•Model parameters: p(E phrase|F phrase)

•All we need to do is compute expectations:

p(ai,i� = �j, j��|F,E) =p(ai,i� = �j, j��, F |E)

p(F,E)

Page 32: JHU MT class: phrase-based translation

EM for Phrase-Based

•Model parameters: p(E phrase|F phrase)

•All we need to do is compute expectations:

p(F,E) sums over all possible phrase alignments...which are one-to-one by definition.

p(ai,i� = �j, j��|F,E) =p(ai,i� = �j, j��, F |E)

p(F,E)

Page 33: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。Although north wind howls , but sky still very clear .

However

EM for Phrase-Based

Can we compute this quantity?

, the sky remained clear under the strong north wind .

How many 1-to-1 alignments are there ofthe remaing 8 Chinese and 8 English words?

p(ai,i� = �j, j��|F,E) =p(ai,i� = �j, j��, F |E)

p(F,E)

Page 34: JHU MT class: phrase-based translation

Recap: Expectation Maximization

•Arbitrarily select a set of parameters (say, uniform).

•Calculate expected counts of the unseen events.

•Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.

•Iterate.

•Guaranteed that likelihood is monotonically nondecreasing.

Page 35: JHU MT class: phrase-based translation

Recap: Expectation Maximization

•Arbitrarily select a set of parameters (say, uniform).

•Calculate expected counts of the unseen events.

•Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.

•Iterate.

•Guaranteed that likelihood is monotonically nondecreasing.

Computing expectations from a phrase-based model, given a sentence pair, is #P-Complete(by reduction to counting perfect matchings;

DeNero & Klein, 2008)

Page 36: JHU MT class: phrase-based translation

Now What?

•Option #1: approximate expectations

•Restrict computation to some tractable subset of the alignment space (arbitrarily biased).

•Markov chain Monte Carlo (very slow).

Page 37: JHU MT class: phrase-based translation

Now What?•Change the problem definition

•We already know how to learn word-to-word translation models efficiently.

•Idea: learn word-to-word alignments, extract most probable alignment, then treat it as observed.

•Learn phrase translations consistent with word alignments.

•Decouples alignment from model learning -- is this a good thing?

Page 38: JHU MT class: phrase-based translation

Japanesewatashi wa

hako wohako woakemasuakemasu

hako wo akemasu

EnglishI

the boxbox

openopen the

open the box

p(E|J)1.00.50.50.50.51.0

A translation model is simply a probabilistic phrase dictionary.

Page 39: JHU MT class: phrase-based translation

I open the box

watashi wa hako wo akemasu

Page 40: JHU MT class: phrase-based translation

I open the box

watashi wa hako wo akemasu

word alignment:expectation maximization

Page 41: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

Page 42: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

akemasu / open

Page 43: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

akemasu / open

Page 44: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

watashi wa / I

Page 45: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

watashi wa / I

Page 46: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

hako wo / box

Page 47: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

hako wo / the box

Page 48: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

hako wo / open the box✘

Page 49: JHU MT class: phrase-based translation

I open the box

watashi

wa

hako

wo

akemasu

hako wo akemasu / open the box

Page 50: JHU MT class: phrase-based translation

Japanesewatashi wa

hako wohako woakemasuakemasu

hako wo akemasu

EnglishI

the boxbox

openopen the

open the box

Page 51: JHU MT class: phrase-based translation

Japanesewatashi wa

hako wohako woakemasuakemasu

hako wo akemasu

EnglishI

the boxbox

openopen the

open the box

#(E,J)111111

Page 52: JHU MT class: phrase-based translation

Japanesewatashi wa

hako wohako woakemasuakemasu

hako wo akemasu

EnglishI

the boxbox

openopen the

open the box

#(E,J)111111

#(J)122221

Page 53: JHU MT class: phrase-based translation

Japanesewatashi wa

hako wohako woakemasuakemasu

hako wo akemasu

EnglishI

the boxbox

openopen the

open the box

#(E,J)111111

#(J)122221

p(E|J)1.00.50.50.50.51.0

Page 54: JHU MT class: phrase-based translation

Phrasal Translation Estimation

Page 55: JHU MT class: phrase-based translation

Phrasal Translation Estimation

•Option #1 (EM over restricted space)

•Align with a word-based model.

•Compute expectations only over alignments consistent with the alignment grid.

Page 56: JHU MT class: phrase-based translation

Phrasal Translation Estimation

•Option #1 (EM over restricted space)

•Align with a word-based model.

•Compute expectations only over alignments consistent with the alignment grid.

•Option #2 (Non-global estimation)

•View phrase pairs as observed, irrespective of context or overlap.

Page 57: JHU MT class: phrase-based translation

Phrasal Translation Estimation

•Option #1 (EM over restricted space)

•Align with a word-based model.

•Compute expectations only over alignments consistent with the alignment grid.

•Option #2 (Non-global estimation)

•View phrase pairs as observed, irrespective of context or overlap.

•Heuristic solutions to intractable exact problem.

Page 58: JHU MT class: phrase-based translation

Decoding

We want to solve this problem:e∗ = arg max

ep(e|f)

Page 59: JHU MT class: phrase-based translation

Decoding

We want to solve this problem:

Q: how many English sentences are there?

e∗ = arg maxe

p(e|f)

Page 60: JHU MT class: phrase-based translation

北 风 呼啸 。

Page 61: JHU MT class: phrase-based translation

北 风 呼啸 。

segmentationssubstitutionspermutations

Page 62: JHU MT class: phrase-based translation

北 风 呼啸 。

O(2n)segmentationssubstitutionspermutations

Page 63: JHU MT class: phrase-based translation

北 风 呼啸 。

O(2n)O(5n)

segmentationssubstitutionspermutations

Page 64: JHU MT class: phrase-based translation

北 风 呼啸 。

O(2n)O(5n)O(n!)

segmentationssubstitutionspermutations

Page 65: JHU MT class: phrase-based translation

北 风 呼啸 。

Can we do this without enumerating pairs?O(10nn!)

Page 66: JHU MT class: phrase-based translation

北 风 呼啸 。

the strong north wind .

Can we do this without enumerating pairs?O(10nn!)

Page 67: JHU MT class: phrase-based translation

北 风 呼啸 。

the strong north wind .

Can we do this without enumerating pairs?O(10nn!)

Page 68: JHU MT class: phrase-based translation

北 风 呼啸 。

the strong north wind .

Given a sentence pair and an alignment, we can easily calculate

p(English, alignment|Chinese)

Can we do this without enumerating pairs?O(10nn!)

Page 69: JHU MT class: phrase-based translation

北 风 呼啸 。

the strong north wind .

There are target sentences.

Key Idea

But there are only ways to start them.

O(10nn!)

O(5n)

Page 70: JHU MT class: phrase-based translation

北 风 呼啸 。

Key Idea

Page 71: JHU MT class: phrase-based translation

北 风 呼啸 。

Key Idea

Page 72: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

Page 73: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

north

Page 74: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

northp(north|START ) · p( |north)北

Page 75: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

north

northern

p(north|START ) · p( |north)北

Page 76: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

north

northern

p(north|START ) · p( |north)北

p(northern|START ) · p( |northern)北

Page 77: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

north

northern

strong

p(north|START ) · p( |north)北

p(northern|START ) · p( |northern)北

Page 78: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

north

northern

strong

p(north|START ) · p( |north)北

p(northern|START ) · p( |northern)北

p(strong|START ) · p( |strong)呼啸

Page 79: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

northp(north|START ) · p( |north)北

Page 80: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

northp(north|START ) · p( |north)北

wind

Page 81: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

northp(north|START ) · p( |north)北

wind

p(wind|north) · p( |wind)风

Page 82: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

northp(north|START ) · p( |north)北

wind

p(wind|north) · p( |wind)风

strong

Page 83: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

northp(north|START ) · p( |north)北

wind

p(wind|north) · p( |wind)风

p(strong|north) · p( |strong)呼啸

strong

Page 84: JHU MT class: phrase-based translation

北 风 呼啸 。

coverage vector

Key Idea

northp(north|START ) · p( |north)北

wind

p(wind|north) · p( |wind)风

Work done at sentence beginnings is shared across many possible output sentences!

p(strong|north) · p( |strong)呼啸

strong

Page 85: JHU MT class: phrase-based translation

Key Idea

Page 86: JHU MT class: phrase-based translation

Key Idea

Page 87: JHU MT class: phrase-based translation

Key Idea

Page 88: JHU MT class: phrase-based translation

Key Idea

Page 89: JHU MT class: phrase-based translation

Key Idea

Page 90: JHU MT class: phrase-based translation

Key Idea

Dynamic Programming

Page 91: JHU MT class: phrase-based translation

Key Idea

Dynamic Programming

amount of work:O(5n2n)

Page 92: JHU MT class: phrase-based translation

Key Idea

Dynamic Programming

bad, but much better thanO(10n

n!)amount of work:

O(5n2n)

Page 93: JHU MT class: phrase-based translation

Key Idea

Dynamic Programming

bad, but much better thanO(10n

n!)amount of work:

O(5n2n)

Page 94: JHU MT class: phrase-based translation

Key Idea

Dynamic Programming

each edge labelled with a weight and a

word (or words)

bad, but much better thanO(10n

n!)amount of work:

O(5n2n)

Page 95: JHU MT class: phrase-based translation

Key Idea

Dynamic Programming

each edge labelled with a weight and a

word (or words)

north, 0.014

bad, but much better thanO(10n

n!)amount of work:

O(5n2n)

Page 96: JHU MT class: phrase-based translation

Key Idea

Dynamic Programming

each edge labelled with a weight and a

word (or words)

north, 0.014

weighted finite-state automata

bad, but much better thanO(10n

n!)amount of work:

O(5n2n)

Page 97: JHU MT class: phrase-based translation

Weighted languages

•The lattice describing the set of all possible translations is a weighted finite state automaton.

•So is the language model.

•Since regular languages are closed under intersection, we can intersect the devices and run shortest path graph algorithms.

•Taking their intersection is equivalent to computing the probability under Bayes’ rule.

Page 98: JHU MT class: phrase-based translation

Wait a second!

We want to solve this problem:e∗ = arg max

ep(e|f)

Page 99: JHU MT class: phrase-based translation

Wait a second!

We want to solve this problem:e∗ = arg max

ep(e|f)

= arg maxe

a

p(e,a|f)

Page 100: JHU MT class: phrase-based translation

Wait a second!

We want to solve this problem:e∗ = arg max

ep(e|f)

But now we’re solving this problem:e∗ = arg max

emax

ap(e,a|f)

= arg maxe

a

p(e,a|f)

Page 101: JHU MT class: phrase-based translation

Wait a second!

We want to solve this problem:e∗ = arg max

ep(e|f)

But now we’re solving this problem:e∗ = arg max

emax

ap(e,a|f)

= arg maxe

a

p(e,a|f)

Often called the Viterbi approximation

Page 102: JHU MT class: phrase-based translation

We can sum over alignments by weighted determinization

Wait a second!

Page 103: JHU MT class: phrase-based translation

How expensive is it?

We can sum over alignments by weighted determinization

Wait a second!

Page 104: JHU MT class: phrase-based translation

How expensive is it?

We can sum over alignments by weighted determinization

O(5n2n)nondeterministic

Wait a second!

Page 105: JHU MT class: phrase-based translation

How expensive is it?

We can sum over alignments by weighted determinization

O(5n2n)nondeterministic

O(25n2n

)deterministic

Wait a second!

Page 106: JHU MT class: phrase-based translation

I made the simplest machine translation model I could think of

and it blew up in my face

is still far too much work.O(5n2n)

Page 107: JHU MT class: phrase-based translation

I made the simplest machine translation model I could think of

and it blew up in my face

is still far too much work.

Can we do better?

O(5n2n)

Page 108: JHU MT class: phrase-based translation

北 风 呼啸 。

Can we do better?

Page 109: JHU MT class: phrase-based translation

北 风 呼啸 。

north wind the strong .

Can we do better?

Page 110: JHU MT class: phrase-based translation

北 风 呼啸 。

north wind the strong .

Can we do better?

Page 111: JHU MT class: phrase-based translation

北 风 呼啸 。

north wind the strong .

Can we do better?

Each arc weighted by translation probability +

bigram probability

Page 112: JHU MT class: phrase-based translation

北 风 呼啸 。

north wind the strong .

Can we do better?

Each arc weighted by translation probability +

bigram probability

Objective: find shortest path that visits each word once.

Page 113: JHU MT class: phrase-based translation

北 风 呼啸 。

London Paris NY Tokyo .

Can we do better?

Each arc weighted by translation probability +

bigram probability

Objective: find shortest path that visits each word once.

Page 114: JHU MT class: phrase-based translation

北 风 呼啸 。

London Paris NY Tokyo .

Can we do better?

Each arc weighted by translation probability +

bigram probability

Objective: find shortest path that visits each word once.

Probably not: this is the traveling salesman problem.

Page 115: JHU MT class: phrase-based translation

北 风 呼啸 。

London Paris NY Tokyo .

Can we do better?

Each arc weighted by translation probability +

bigram probability

Objective: find shortest path that visits each word once.

Probably not: this is the traveling salesman problem.Even the Viterbi approximation is too hard to solve!

Page 116: JHU MT class: phrase-based translation

Approximation: Pruning

Page 117: JHU MT class: phrase-based translation

Approximation: Pruning

Idea: prune states by accumulated path length

Page 118: JHU MT class: phrase-based translation

Approximation: Pruning

Idea: prune states by accumulated path length

Page 119: JHU MT class: phrase-based translation

Approximation: Pruning

Page 120: JHU MT class: phrase-based translation

Approximation: Pruning

Reality: longer paths have lower probability!

Page 121: JHU MT class: phrase-based translation

Approximation: Pruning

Page 122: JHU MT class: phrase-based translation

Approximation: Pruning

Page 123: JHU MT class: phrase-based translation

Approximation: Pruning

Solution: Group states by number of covered words.

Page 124: JHU MT class: phrase-based translation

Approximation: Pruning

Solution: Group states by number of covered words.

Page 125: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 126: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 127: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 128: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 129: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 130: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 131: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 132: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 133: JHU MT class: phrase-based translation

Approximation: Pruning

“Stack” decoding: a linear-time approximation

Page 134: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

the sky

Approximation: Distortion Limits

Page 135: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

the sky

O(2n)number of vertices:

Approximation: Distortion Limits

Page 136: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

the sky

O(2n)number of vertices:

d = 4window

Approximation: Distortion Limits

Page 137: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

the sky

O(2n)number of vertices:

d = 4window

outside windowto left: covered

outside windowto right: uncovered

Approximation: Distortion Limits

Page 138: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

the sky

number of vertices:

d = 4window

outside windowto left: covered

outside windowto right: uncovered

O(n2d)

Approximation: Distortion Limits

Page 139: JHU MT class: phrase-based translation

Summary

Page 140: JHU MT class: phrase-based translation

Summary

•We need every possible trick to make decoding fast.

Page 141: JHU MT class: phrase-based translation

Summary

•We need every possible trick to make decoding fast.

•Viterbi approximation: from worse to bad.

Page 142: JHU MT class: phrase-based translation

Summary

•We need every possible trick to make decoding fast.

•Viterbi approximation: from worse to bad.

•Dynamic programming: exact but too slow.

Page 143: JHU MT class: phrase-based translation

Summary

•We need every possible trick to make decoding fast.

•Viterbi approximation: from worse to bad.

•Dynamic programming: exact but too slow.

•NP-Completeness means exact solutions unlikely.

Page 144: JHU MT class: phrase-based translation

Summary

•We need every possible trick to make decoding fast.

•Viterbi approximation: from worse to bad.

•Dynamic programming: exact but too slow.

•NP-Completeness means exact solutions unlikely.

•Heuristic approximations: stack decoding, distortion limits.

Page 145: JHU MT class: phrase-based translation

Summary

•We need every possible trick to make decoding fast.

•Viterbi approximation: from worse to bad.

•Dynamic programming: exact but too slow.

•NP-Completeness means exact solutions unlikely.

•Heuristic approximations: stack decoding, distortion limits.

•Tradeoff: might not find true argmax.

Page 146: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

Page 147: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

Page 148: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

START Although

crystal clear

START However

Page 149: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

START Although

crystal clear

START However

Page 150: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

wind shrieked

wind screamed

north wind

Page 151: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

wind shrieked

wind screamed

north wind

Page 152: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

the sky

shrieked ,

, yet

Page 153: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

the sky

shrieked ,

, yet

Page 154: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

sky ,

Page 155: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

sky ,

Page 156: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。clear .

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。still quite

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。blue .

Page 157: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。clear .

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。still quite

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。blue .

Page 158: JHU MT class: phrase-based translation

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

Although the northern wind shrieked across the sky, but was still very clear.

Page 159: JHU MT class: phrase-based translation
Page 160: JHU MT class: phrase-based translation

•Some (not all) key ingredients in Google Translate:

Page 161: JHU MT class: phrase-based translation

•Some (not all) key ingredients in Google Translate:

•Phrase-based translation models

Page 162: JHU MT class: phrase-based translation

•Some (not all) key ingredients in Google Translate:

•Phrase-based translation models

•... Learned heuristically from word alignments

Page 163: JHU MT class: phrase-based translation

•Some (not all) key ingredients in Google Translate:

•Phrase-based translation models

•... Learned heuristically from word alignments

•... Coupled with a huge language model

Page 164: JHU MT class: phrase-based translation

•Some (not all) key ingredients in Google Translate:

•Phrase-based translation models

•... Learned heuristically from word alignments

•... Coupled with a huge language model

•... And very tight pruning heuristics