JHU MT Class: Phrase-Based Translation
TRANSCRIPT
Phrase-Based Translation
The IBM Models
•Fertility probabilities.
•Word translation probabilities.
•Distortion probabilities.
•Some problems:
•Weak reordering model -- output is not fluent.
•Many decisions -- many things can go wrong.
IBM Model 4

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .

[alignment diagram: some Chinese words generate ε (fertility 0), 天空 generates two English words ("the sky"), and the generated words are reordered]
Tradeoffs: Modeling v. Learning

Model        | Lexical Translation | Local ordering dependency | Fertility | Convex | Tractable Exact Inference
IBM Model 1  | ✔                   | ✘                         | ✘         | ✔      | ✔
HMM          | ✔                   | ✔                         | ✘         | ✘      | ✔
IBM Model 4  | ✔                   | ✔                         | ✔         | ✘      | ✘
Lesson: Trade exactness for expressivity.
What are some things this model doesn’t account for?

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .
Phrase-based Models

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

Phrase translations in source order: However the strong north wind , the sky remained clear under .
After reordering (distortion = 6): However , the sky remained clear under the strong north wind .

p(English, alignment | Chinese) = p(segmentation) · p(translations) · p(reorderings)
Phrase-based Models
•Segmentation probabilities.
•Phrase translation probabilities.
•Distortion probabilities.
•Some problems:
•Weak reordering model -- output is not fluent.
•Many decisions -- many things can go wrong.
Phrase-based Models
•Segmentation probabilities: fixed (uniform)
•Phrase translation probabilities.
•Distortion probabilities: fixed (decaying)
Learning p(Chinese|English)
•Reminder: (nearly) every problem comes down to computing either:
•Sums: MLE or EM (learning).
•Maxima: the most probable output (decoding).
Recap: Expectation Maximization
•Arbitrarily select a set of parameters (say, uniform).
•Calculate expected counts of the unseen events.
•Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.
•Iterate.
•Guaranteed that likelihood is monotonically nondecreasing.
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
However , the sky remained clear under the strong north wind .

Expected count of an alignment link:
•Marginalize: sum the probabilities of all alignments containing the link.
•Divide by the sum over all possible alignments.
We have to sum over exponentially many alignments!
EM for Model 1

Expected count that 北 generates north:
p(north | 北) / Σ_{c ∈ Chinese words} p(north | c)
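The expected-count computation above can be run end to end. Below is a minimal, illustrative sketch of Model 1 EM on toy data; the corpus and tokens are hypothetical, not from the lecture:

```python
from collections import defaultdict

def model1_em(corpus, iterations=10):
    """EM for IBM Model 1. corpus: list of (foreign_tokens, english_tokens).
    Returns t with t[e][f] approximating p(e | f)."""
    # Any constant initialization works: the first E-step normalizes it away.
    t = defaultdict(lambda: defaultdict(lambda: 1.0))
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))  # expected counts c(e, f)
        total = defaultdict(float)                       # expected counts c(f)
        for fs, es in corpus:
            for e in es:
                # Posterior that e aligns to each f: t(e|f) / sum over f' of t(e|f')
                z = sum(t[e][f] for f in fs)
                for f in fs:
                    p = t[e][f] / z
                    count[e][f] += p
                    total[f] += p
        # M-step: relative frequency over expected counts.
        for e in count:
            for f in count[e]:
                t[e][f] = count[e][f] / total[f]
    return t
```

On a toy corpus such as [(["北"], ["north"]), (["北", "风"], ["north", "wind"])], the co-occurrence statistics pull p(north | 北) above p(wind | 北) within a few iterations.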
EM for Phrase-Based

•Model parameters: p(E phrase | F phrase)
•All we need to do is compute expectations:
p(a_{i,i'} = ⟨j, j'⟩ | F, E) = p(a_{i,i'} = ⟨j, j'⟩, F | E) / p(F | E)
The denominator, p(F | E), sums over all possible phrase alignments... which are one-to-one by definition.
EM for Phrase-Based

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .
However , the sky remained clear under the strong north wind .

Can we compute this quantity?
p(a_{i,i'} = ⟨j, j'⟩ | F, E) = p(a_{i,i'} = ⟨j, j'⟩, F | E) / p(F | E)

Suppose 虽然 / However is one phrase pair. How many 1-to-1 alignments are there of the remaining 8 Chinese and 8 English words? (8! = 40,320.)
Computing expectations from a phrase-based model, given a sentence pair, is #P-complete (by reduction from counting perfect matchings; DeNero & Klein, 2008).
Now What?
•Option #1: approximate expectations
•Restrict computation to some tractable subset of the alignment space (arbitrarily biased).
•Markov chain Monte Carlo (very slow).
Now What?
•Option #2: change the problem definition.
•We already know how to learn word-to-word translation models efficiently.
•Idea: learn word-to-word alignments, extract most probable alignment, then treat it as observed.
•Learn phrase translations consistent with word alignments.
•Decouples alignment from model learning -- is this a good thing?
Japanese          English         p(E|J)
watashi wa        I               1.0
hako wo           the box         0.5
hako wo           box             0.5
akemasu           open            0.5
akemasu           open the        0.5
hako wo akemasu   open the box    1.0
A translation model is simply a probabilistic phrase dictionary.
I open the box
watashi wa hako wo akemasu

word alignment: expectation maximization
I open the box
watashi wa hako wo akemasu

Phrase pairs extracted from the alignment grid:
akemasu / open
watashi wa / I
hako wo / box
hako wo / the box
hako wo / open the box ✘ (inconsistent with the word alignment)
hako wo akemasu / open the box
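The extraction above can be sketched as the standard consistency check: a span pair is kept only if no alignment link crosses its boundary. The alignment links below are assumed for illustration, and the usual extension over unaligned boundary words (which yields pairs like hako wo / the box) is omitted for brevity:

```python
def extract_phrases(n_f, alignment, max_len=5):
    """Extract phrase pairs consistent with a word alignment.
    alignment: set of (i, j) links, foreign position i <-> English position j.
    Returns ((i1, i2), (j1, j2)) span pairs (inclusive indices)."""
    pairs = []
    for i1 in range(n_f):
        for i2 in range(i1, min(i1 + max_len, n_f)):
            # English positions linked to the foreign span [i1, i2].
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            # Consistency: every link touching [j1, j2] must come from [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs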
Japanese          English         #(E,J)   #(J)   p(E|J)
watashi wa        I               1        1      1.0
hako wo           the box         1        2      0.5
hako wo           box             1        2      0.5
akemasu           open            1        2      0.5
akemasu           open the        1        2      0.5
hako wo akemasu   open the box    1        1      1.0
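The table's estimate p(E|J) = #(E,J) / #(J) is plain relative frequency over extracted phrase pairs, and can be sketched in a few lines (the pair list mirrors the example above):

```python
from collections import Counter

def phrase_table(phrase_pairs):
    """Relative-frequency estimate p(E|J) = #(E,J) / #(J) from extracted pairs."""
    joint = Counter(phrase_pairs)                   # #(E, J)
    marginal = Counter(j for j, e in phrase_pairs)  # #(J)
    return {(j, e): joint[(j, e)] / marginal[j] for (j, e) in joint}

pairs = [
    ("watashi wa", "I"),
    ("hako wo", "the box"),
    ("hako wo", "box"),
    ("akemasu", "open"),
    ("akemasu", "open the"),
    ("hako wo akemasu", "open the box"),
]
table = phrase_table(pairs)
```

Each Japanese phrase's translations share its count mass: "hako wo" occurs twice, so each of its translations gets probability 0.5.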
Phrasal Translation Estimation

•Option #1 (EM over restricted space)
•Align with a word-based model.
•Compute expectations only over alignments consistent with the alignment grid.
•Option #2 (Non-global estimation)
•View phrase pairs as observed, irrespective of context or overlap.
•Heuristic solutions to an intractable exact problem.
Decoding

We want to solve this problem:
e* = argmax_e p(e | f)

Q: how many English sentences are there?
北 风 呼啸 。

segmentations: O(2^n)
substitutions: O(5^n)
permutations: O(n!)

That gives O(10^n n!) candidate (translation, alignment) pairs.
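These counts can be sanity-checked with a short script. Under the simplifying assumption of about t = 5 translations per phrase (the lecture's figure), the exact count sums over segmentations into k phrases, substitutions, and phrase orderings:

```python
import math

def num_translations(n, t=5):
    """Count candidate translations of an n-word sentence:
    choose a segmentation into k phrases (C(n-1, k-1) compositions),
    pick one of t translations per phrase (t**k),
    and order the phrases (k! permutations)."""
    return sum(math.comb(n - 1, k - 1) * t**k * math.factorial(k)
               for k in range(1, n + 1))
```

For the four-token sentence 北 风 呼啸 。 this already gives 17,405 candidates, and the O(2^n · 5^n · n!) = O(10^n n!) upper bound shows how quickly it explodes.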
北 风 呼啸 。
the strong north wind .

Given a sentence pair and an alignment, we can easily calculate p(English, alignment | Chinese). Can we do this without enumerating all O(10^n n!) pairs?

There are O(10^n n!) target sentences, but there are only O(5n) ways to start them.
Key Idea

Track a coverage vector over the source sentence: 北 风 呼啸 。

First-word hypotheses:
north: p(north | START) · p(北 | north)
northern: p(northern | START) · p(北 | northern)
strong: p(strong | START) · p(呼啸 | strong)

Extending "north":
wind: p(wind | north) · p(风 | wind)
strong: p(strong | north) · p(呼啸 | strong)

Work done at sentence beginnings is shared across many possible output sentences!
Dynamic Programming

Amount of work: O(5n · 2^n). Bad, but much better than O(10^n n!).

Each edge is labelled with a weight and a word (or words), e.g. "north, 0.014". The result is a weighted finite-state automaton.
Weighted languages
•The lattice describing the set of all possible translations is a weighted finite state automaton.
•So is the language model.
•Since regular languages are closed under intersection, we can intersect the devices and run shortest path graph algorithms.
•Taking their intersection is equivalent to computing the probability under Bayes’ rule.
Wait a second!

We want to solve this problem:
e* = argmax_e p(e | f) = argmax_e Σ_a p(e, a | f)

But now we’re solving this problem:
e* = argmax_e max_a p(e, a | f)

Often called the Viterbi approximation.

We can sum over alignments by weighted determinization.
Wait a second!

How expensive is weighted determinization?
nondeterministic: O(5n · 2^n)
deterministic: O(2^(5n · 2^n))
I made the simplest machine translation model I could think of, and it blew up in my face.

O(5n · 2^n) is still far too much work. Can we do better?
Can we do better?

北 风 呼啸 。
north wind the strong .

Each arc is weighted by a translation probability plus a bigram probability.
Objective: find the shortest path that visits each word once.

北 风 呼啸 。
London Paris NY Tokyo .

Probably not: this is the traveling salesman problem. Even the Viterbi approximation is too hard to solve!
Approximation: Pruning

Idea: prune states by accumulated path length.
Reality: longer paths have lower probability!
Solution: group states by number of covered words.

“Stack” decoding: a linear-time approximation.
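Stack decoding can be sketched as follows: hypotheses live in stacks indexed by the number of covered source words, so only comparable hypotheses compete, and each stack is pruned to a fixed beam before extension. The toy `tm` and `lm` tables are illustrative assumptions, not the lecture's model:

```python
import heapq

def stack_decode(src, tm, lm, beam=5, start="<s>"):
    """'Stack' decoding sketch. stacks[k] holds hypotheses covering exactly k
    source words, keyed by (coverage bitmask, last English word); each stack
    is pruned to the `beam` best before being extended."""
    n = len(src)
    stacks = [{} for _ in range(n + 1)]       # stacks[k]: state -> (score, words)
    stacks[0][(0, start)] = (1.0, [])
    for k in range(n):
        # Histogram pruning: keep only the `beam` highest-scoring hypotheses.
        survivors = heapq.nlargest(beam, stacks[k].items(),
                                   key=lambda kv: kv[1][0])
        for (cov, prev), (p, words) in survivors:
            for i in range(n):
                if cov & (1 << i):
                    continue                  # source word i already covered
                for e, pt in tm[src[i]].items():
                    q = p * pt * lm.get((prev, e), 1e-6)
                    key = (cov | (1 << i), e)
                    cur = stacks[k + 1].get(key)
                    if cur is None or q > cur[0]:
                        stacks[k + 1][key] = (q, words + [e])
    return max(stacks[n].values(), key=lambda v: v[0])[1]
```

With a fixed beam, work per stack is bounded, so total work is linear in sentence length; the price is that the pruned search may miss the true argmax.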
Approximation: Distortion Limits

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。 (current phrase: the sky)

Without a limit, the number of vertices is O(2^n). Impose a distortion window of d = 4: everything outside the window to the left must be covered, everything outside the window to the right is uncovered, so the number of vertices falls to O(n · 2^d).
Summary

•We need every possible trick to make decoding fast.
•Viterbi approximation: from worse to bad.
•Dynamic programming: exact but too slow.
•NP-completeness means exact solutions are unlikely.
•Heuristic approximations: stack decoding, distortion limits.
•Tradeoff: we might not find the true argmax.
虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

Competing partial hypotheses during decoding:
START Although | START However
crystal clear
wind shrieked | wind screamed | north wind
the sky | shrieked , | , yet
sky ,
clear . | still quite | blue .

Final output: Although the northern wind shrieked across the sky, but was still very clear.
•Some (not all) key ingredients in Google Translate:
•Phrase-based translation models
•... Learned heuristically from word alignments
•... Coupled with a huge language model
•... And very tight pruning heuristics