PhD Thesis: Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation
Arianna Bisazza
Advisor: Marcello Federico
Fondazione Bruno Kessler / Università di Trento


Page 1: Arianna  Bisazza Advisor: Marcello Federico

PhD Thesis:

Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation

Arianna Bisazza

Advisor: Marcello Federico

Fondazione Bruno Kessler / Università di Trento

Page 2: Arianna  Bisazza Advisor: Marcello Federico

Arianna Bisazza – PhD Thesis – 19 April 2013


PSMT decoding overview

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Page 3: Arianna  Bisazza Advisor: Marcello Federico

PSMT decoding overview

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Partial translation so far: "Freedom of movement" + "must be encouraged"

[Figure: decoding steps annotated with LM (language model), TM (translation model) and ReoM (reordering model) scores]

Page 4: Arianna  Bisazza Advisor: Marcello Federico

PSMT decoding overview

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Partial translation so far: "Freedom of movement" + "must be encouraged" + "while ensuring that" + "career paths"

[Figure: decoding steps annotated with LM, TM and ReoM scores]

Page 5: Arianna  Bisazza Advisor: Marcello Federico

PSMT decoding overview

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Freedom of movement must be encouraged while ensuring that career paths

[Figure: decoding steps annotated with LM, TM and ReoM scores]

Page 6: Arianna  Bisazza Advisor: Marcello Federico

Reordering Models

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

[Figure: reordering steps annotated with ReoM scores]

Many solutions have been proposed, with different reordering classes, features, training modes, etc.:

Tillmann 04; Zens & Ney 06; Al-Onaizan & Papineni 06; Galley & Manning 08; Green & al. 10; Feng & al. 10; …

Page 7: Arianna  Bisazza Advisor: Marcello Federico

Reordering Models

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

[Figure: reordering steps annotated with ReoM scores]

Many solutions have been proposed, with different reordering classes, features, training modes, etc.:

Tillmann 04; Zens & Ney 06; Al-Onaizan & Papineni 06; Galley & Manning 08; Green & al. 10; Feng & al. 10; …

No matter what reordering model is used, the permutation search space must be limited! The power of every reordering model is bounded by the reordering constraints in use.

Page 8: Arianna  Bisazza Advisor: Marcello Federico


E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

ReoM scores

Page 9: Arianna  Bisazza Advisor: Marcello Federico

Reordering Constraints

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

#perm = |w|! ≈ 40,000,000 (the sentence has |w| = 11 words, and 11! = 39,916,800)

Page 10: Arianna  Bisazza Advisor: Marcello Federico

Reordering Constraints

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

#perm = |w|! ≈ 40,000,000

Source-to-Source distortion

D(wx, wy) = |y - x - 1|

       w0  w1  w2  w3  w4  w5  w6  w7  w8  w9 w10
<s>     0   1   2   3   4   5   6   7   8   9  10
w0      -   0   1   2   3   4   5   6   7   8   9
w1      2   -   0   1   2   3   4   5   6   7   8
w2      3   2   -   0   1   2   3   4   5   6   7
w3      4   3   2   -   0   1   2   3   4   5   6
w4      5   4   3   2   -   0   1   2   3   4   5
w5      6   5   4   3   2   -   0   1   2   3   4
w6      7   6   5   4   3   2   -   0   1   2   3
w7      8   7   6   5   4   3   2   -   0   1   2
w8      9   8   7   6   5   4   3   2   -   0   1
w9     10   9   8   7   6   5   4   3   2   -   0
w10    11  10   9   8   7   6   5   4   3   2   -

(Rows: previously translated word wx; columns: next word wy; diagonal cells are empty.)
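The distortion matrix and the permutation count on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name `distortion_matrix` and the use of `None` for the empty diagonal cells are choices made here for illustration, not part of the slides.

```python
import math

def distortion_matrix(n):
    """Source-to-source distortion D(wx, wy) = |y - x - 1|.

    Row 0 is the sentence start <s> (position x = -1); row x + 1 is word wx.
    Columns are the words w0 .. w(n-1). A word cannot follow itself, so the
    diagonal cells are left empty (None).
    """
    return [[None if y == x else abs(y - x - 1) for y in range(n)]
            for x in range(-1, n)]

n = 11                                # words in the example sentence
D = distortion_matrix(n)
num_permutations = math.factorial(n)  # 11! = 39,916,800, i.e. about 40 million

for label, row in zip(["<s>"] + [f"w{i}" for i in range(n)], D):
    print(f"{label:<4}", " ".join("  -" if d is None else f"{d:3d}" for d in row))
```

A monotone step (y = x + 1) costs 0, while the shortest backward step (y = x - 1) already costs 2, which is why the matrix is not symmetric.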

Page 11: Arianna  Bisazza Advisor: Marcello Federico

Reordering Constraints

E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali

Source-to-Source distortion

D(wx, wy) = |y - x - 1|

#perm = |w|! ≈ 40,000,000
DL = 3: #perm ≈ 7,000

DL: distortion limit

[Figure: the same 11-word distortion matrix as on the previous slide]

Page 12: Arianna  Bisazza Advisor: Marcello Federico

The problem with DL…

Arabic-English

[Figure: two Arabic-English (AR/EN) sentence pairs, shown next to the 11-word distortion matrix]

Page 13: Arianna  Bisazza Advisor: Marcello Federico

The problem with DL…

German-English

[Figure: two German-English (DE/EN) sentence pairs, shown next to the 11-word distortion matrix]

Page 14: Arianna  Bisazza Advisor: Marcello Federico

Current solution: increasing the distortion limit!

Source-to-Source distortion

D(wx, wy) = |y - x - 1|

#perm = |w|! ≈ 40,000,000
DL = 3: #perm ≈ 7,000

[Figure: the same 11-word distortion matrix as on the earlier slides]

Page 15: Arianna  Bisazza Advisor: Marcello Federico

Current solution: increasing the distortion limit!

Source-to-Source distortion

D(wx, wy) = |y - x - 1|

#perm = |w|! ≈ 40,000,000
DL = 3: #perm ≈ 7,000
DL = 7: #perm ≈ 7,000,000

Coarse definition of the reordering space: slower decoding, worse translations

[Figure: the same 11-word distortion matrix as on the earlier slides]
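To get a feeling for how quickly the search space grows with the distortion limit, one can count the word orders that a simple jump constraint admits. The sketch below assumes that a permutation is allowed when every jump D(wx, wy) = |y - x - 1| stays within DL. This is a simplification introduced here: real decoders apply further restrictions, so the counts it produces need not match the approximate figures quoted on the slide.

```python
def count_permutations(n, dl):
    """Count complete orderings of n source words in which every jump,
    measured as |y - x - 1| from the previously covered word x
    (x = -1 at sentence start), is at most dl."""
    def extend(last, covered):
        if len(covered) == n:
            return 1                      # all words covered: one valid ordering
        total = 0
        for y in range(n):
            if y not in covered and abs(y - last - 1) <= dl:
                total += extend(y, covered | {y})
        return total
    return extend(-1, frozenset())

# Compare a few limits on a short sentence (exhaustive search, so keep n small).
for dl in (0, 2, 4, 8):
    print(dl, count_permutations(8, dl))
```

With `dl = 0` only the monotone order survives, and with `dl` at least as large as the longest possible jump the count reaches n!.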

Page 16: Arianna  Bisazza Advisor: Marcello Federico

Observations

• Word reordering is difficult!
• The existing word reordering models are not perfect, but they are expected to guide search over huge search spaces

One way to go:
• design a perfect model
• problem: many have already tried and failed

Our way:
• simplify the task for the existing reordering models

Page 17: Arianna  Bisazza Advisor: Marcello Federico

Working hypotheses

• A better definition of the reordering search space (i.e. constraints) can simplify the task of the reordering model

• (Shallow) linguistic knowledge can help us refine the reordering search space for a given language pair

Page 18: Arianna  Bisazza Advisor: Marcello Federico

Outline

o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions

Page 19: Arianna  Bisazza Advisor: Marcello Federico

Outline

o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions

Bisazza and Federico, Chunk-based Verb Reordering in VSO Sentences for Arabic-English, WMT 2010

Bisazza, Pighin, Federico, Chunk-Lattices for Verb Reordering in Arabic-English Statistical Machine Translation, MT Journal 2012

Page 20: Arianna  Bisazza Advisor: Marcello Federico

Source-to-Source distortion

D(wx, wy) = |y - x - 1|

#perm = |w|! ≈ 40,000,000
DL = 3: #perm ≈ 7,000
DL = 7: #perm ≈ 7,000,000

[Figure: the same 11-word distortion matrix as on the earlier slides]

Idea: keep a low distortion limit and modify the input to allow only specific long reorderings

Page 21: Arianna  Bisazza Advisor: Marcello Federico

Reordering patterns in Arabic-English

Example of VSO sentences: the Arabic verb comes earlier than in the English word order

Typical PSMT outputs:
*The Moroccan monarch King Mohamed VI __ his support to…
*He renewed the Moroccan monarch King Mohamed VI his support to…

Page 22: Arianna  Bisazza Advisor: Marcello Federico

Working hypothesis

Uneven distribution of long- and short-range word movements:
• few long: verb-subject-object sentences. We try to model them explicitly!
• many short: adjective-noun, head-initial genitive constructions (idafa). We assume they are well handled in standard PSMT

Page 23: Arianna  Bisazza Advisor: Marcello Federico

Chunk-based fuzzy reordering rules

Shallow syntax chunking:
• cheaper and easier than deep parsing
• constrains reorderings in a softer way

Fuzzy (non-deterministic) reordering rules:
• generate N permutations for each matching sequence
• final reordering decision is taken during translation, guided by all SMT models (ReoM, LM...)

Few rules per language pair, to capture only long reordering

Page 24: Arianna  Bisazza Advisor: Marcello Federico

Chunk-based fuzzy reordering rules

• Move verb chunk ahead by 1 to N chunks
• Move verb chunk and following chunk ahead by 1 to N chunks

… CH(*) CH(V) CH(*) CH(*) CH(*) CH(*) CH(*) …

CH(V) CH(*) CH(*) CH(*)… CH(*) CH(*) CH(*) …
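The two rules above can be sketched as a small permutation generator. The chunk labels, the `startswith("VC")` test for verb chunks and the function names are illustrative assumptions; the sketch also simply stops at the end of the chunk sequence, since the slides do not say whether a move may cross punctuation.

```python
def move_block(seq, start, length, shift):
    """Move seq[start:start+length] to the right by `shift` chunks."""
    block = seq[start:start + length]
    rest = seq[:start] + seq[start + length:]
    pos = start + shift
    return rest[:pos] + block + rest[pos:]

def fuzzy_verb_reorderings(chunks, n_max, is_verb=lambda c: c.startswith("VC")):
    """Generate the permutations licensed by the two rules:
    (1) move the verb chunk ahead by 1..n_max chunks,
    (2) move the verb chunk and the following chunk ahead by 1..n_max chunks."""
    out = []
    for i, chunk in enumerate(chunks):
        if not is_verb(chunk):
            continue
        for length in (1, 2):                   # verb alone, or verb + next chunk
            room = len(chunks) - (i + length)   # chunks available to the right
            for shift in range(1, min(n_max, room) + 1):
                out.append(move_block(chunks, i, length, shift))
    return out

candidates = fuzzy_verb_reorderings(["CC1", "VC2", "PC3", "NC4", "PC5", "Pct6"], n_max=2)
for c in candidates:
    print(" ".join(c))
```

The rules are "fuzzy" in that they only propose candidates: every generated order stays available, and the final choice is left to the decoder and its models.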

Page 25: Arianna  Bisazza Advisor: Marcello Federico

Chunk-based verb reordering in parallel data

The optimal reordering is the one that minimizes total distortion

Page 26: Arianna  Bisazza Advisor: Marcello Federico

Chunk-based verb reordering in test data

[Figure: reordering options for a test sentence. Legend: verb chunk, other chunks. Options shown: move verb chunk; move verb chunk and following chunk]

Page 27: Arianna  Bisazza Advisor: Marcello Federico

Experiments

• Task: NIST-MT09 (news translation)
• Systems based on Moses, include lexicalized phrase reordering models [Tillmann 04; Koehn & al 05]
• Non-monotonic lattice decoding [Dyer & al 08]
• Evaluation by:
  - BLEU [Papineni & al 01] for lexical match & local order
  - KRS [Birch & al 10] for global order

Page 28: Arianna  Bisazza Advisor: Marcello Federico

Arabic-English: Translation Quality

Test set: eval09-nw
Lattices always used with pre-ordered training
Oracle: test pre-ordered looking at reference
(more details on lattice pruning in the thesis)

[Chart: translation quality] +0.5 BLEU, +0.4 KRS

Page 29: Arianna  Bisazza Advisor: Marcello Federico

Arabic-English: Translation Quality and Translation Time

Test set: eval09-nw
Lattices always used with pre-ordered training
Oracle: test pre-ordered looking at reference
(more details on lattice pruning in the thesis)

[Charts: translation quality; translation time, split into pruning and decoding] -0.1 BLEU, -0.3 KRS

Page 30: Arianna  Bisazza Advisor: Marcello Federico

Lessons learned

• limiting long reordering to a few chunks only
• using a lattice to represent the extra reordering
• decoding slow-down

Can we do better?

Observation: lattice topology basically distorts word-to-word distances, i.e. during decoding some distant positions become closer

Can we achieve the same effect more directly?

Page 31: Arianna  Bisazza Advisor: Marcello Federico

Outline

o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions

Bisazza and Federico, Modified Distortion Matrices for Phrase-Based Statistical Machine Translation, ACL 2012

Page 32: Arianna  Bisazza Advisor: Marcello Federico

Source-to-Source distortion

D(wx, wy) = |y - x - 1|

#perm = |w|! ≈ 40,000,000
DL = 3: #perm ≈ 7,000
DL = 7: #perm ≈ 7,000,000

[Figure: the same 11-word distortion matrix as on the earlier slides]

Page 33: Arianna  Bisazza Advisor: Marcello Federico

Refined reordering search space

Idea: modify the distortion matrix for each test sentence!

Source-to-Source distortion

D(wx, wy) = |y - x - 1|

#perm = |w|! ≈ 40,000,000
DL = 3: #perm ≈ 7,000
DL = 7: #perm ≈ 7,000,000
DL = 3 & modif(D): #perm ≈ 20,000

       w0  w1  w2  w3  w4  w5  w6  w7  w8  w9 w10
<s>     0   1   2   3   4   5   6   7   8   9  10
w0      -   0   1   2   3   4   5   6   7   8   9
w1      2   -   0   1   2   3   4   0   0   7   8
w2      3   2   -   0   1   2   3   0   0   6   7
w3      4   3   2   -   0   1   2   3   4   5   6
w4      5   4   3   2   -   0   1   2   3   4   5
w5      6   5   4   3   2   -   0   1   2   3   0
w6      7   6   5   4   3   2   -   0   1   2   3
w7      8   7   6   5   4   3   2   -   0   1   2
w8      9   8   7   6   5   4   3   2   -   0   1
w9     10   9   8   2   2   5   4   3   2   -   0
w10    11  10   9   8   7   6   5   4   3   2   -

(Modified cells: w1→w7, w1→w8, w2→w7, w2→w8 and w5→w10 set to 0; w9→w3 and w9→w4 set to 2.)

Page 34: Arianna  Bisazza Advisor: Marcello Federico

Chunk-based fuzzy reordering rules

Arabic-English: “Move verb chunk (and following chunk) to the right by 1 to N chunks”

CC1 VC2 PC3 NC4 PC5 Pct6

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .
and took part in the march dozens of militants from the Brigades

Page 35: Arianna  Bisazza Advisor: Marcello Federico

Chunk-based fuzzy reordering rules

Arabic-English: “Move verb chunk (and following chunk) to the right by 1 to N chunks”

CC1 VC2 PC3 NC4 PC5 Pct6

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .
and took part in the march dozens of militants from the Brigades

[Figure: permutations of the chunk sequence obtained by moving the verb chunk VC2 to the right]

Page 36: Arianna  Bisazza Advisor: Marcello Federico

Chunk-based fuzzy reordering rules

Arabic-English: “Move verb chunk (and following chunk) to the right by 1 to N chunks”

CC1 VC2 PC3 NC4 PC5 Pct6

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .
and took part in the march dozens of militants from the Brigades

[Figure: permutations of the chunk sequence obtained by moving VC2 alone, or VC2 together with the following chunk PC3, to the right]

Page 37: Arianna  Bisazza Advisor: Marcello Federico

Reordering selection

CC1 VC2 PC3 NC4 PC5 Pct6

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .
and took part in the march dozens of militants from the Brigades

[Figure: the candidate permutations generated by the chunk-based fuzzy reordering rules, each scored by a reordered source LM: 0.9, 0.4, 0.1, 0.1, 0.7]

Page 38: Arianna  Bisazza Advisor: Marcello Federico

Reordering selection

CC1 VC2 PC3 NC4 PC5 Pct6

w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .
and took part in the march dozens of militants from the Brigades

[Figure: the candidate permutations ranked by reordered source LM score: 0.9, 0.7, 0.4, 0.1, 0.1]

Reorderings to include in the distortion matrix: the best-scoring candidates (here the two scored 0.9 and 0.7)

Page 39: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

            w0  w1  w2  w3  w4  w5  w6  w7  w8
      <s>    0   1   2   3   4   5   6   7   8
CC1   w0     -   0   1   2   3   4   5   6   7
VC2   w1     2   -   0   1   2   3   4   5   6
PC3   w2     3   2   -   0   1   2   3   4   5
      w3     4   3   2   -   0   1   2   3   4
NC4   w4     5   4   3   2   -   0   1   2   3
      w5     6   5   4   3   2   -   0   1   2
PC5   w6     7   6   5   4   3   2   -   0   1
      w7     8   7   6   5   4   3   2   -   0
Pct6  w8     9   8   7   6   5   4   3   2   -

Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6

Page 40: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

[Distortion matrix D(wx, wy) = |y - x - 1| for w0 … w8, with these cells modified so far: D(w0,w2) = D(w0,w3) = 0]

Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6

Page 41: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

[Distortion matrix D(wx, wy) = |y - x - 1| for w0 … w8, with these cells modified so far: D(w0,w2) = D(w0,w3) = 0; D(w3,w1) = 2]

Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6

Page 42: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

[Distortion matrix D(wx, wy) = |y - x - 1| for w0 … w8, with these cells modified so far: D(w0,w2) = D(w0,w3) = 0; D(w3,w1) = 2; D(w1,w4) = D(w1,w5) = 0]

Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6

Page 43: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

[Distortion matrix D(wx, wy) = |y - x - 1| for w0 … w8, with these cells modified so far: D(w0,w2) = D(w0,w3) = D(w0,w4) = D(w0,w5) = 0; D(w3,w1) = 2; D(w1,w4) = D(w1,w5) = 0]

Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6

Page 44: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

[Distortion matrix D(wx, wy) = |y - x - 1| for w0 … w8, with these cells modified so far: D(w0,w2) = D(w0,w3) = D(w0,w4) = D(w0,w5) = 0; D(w3,w1) = 2; D(w1,w4) = D(w1,w5) = 0; D(w6,w1) = D(w7,w1) = 2]

Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6

Page 45: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

[Distortion matrix D(wx, wy) = |y - x - 1| for w0 … w8, with all modifications applied: D(w0,w2) = D(w0,w3) = D(w0,w4) = D(w0,w5) = 0; D(w1,w4) = D(w1,w5) = 0; D(w2,w8) = D(w3,w8) = 0; D(w3,w1) = D(w6,w1) = D(w7,w1) = 2]

Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6

Page 46: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

[Distortion matrix unchanged from the previous slide, with all modifications applied: D(w0,w2) = D(w0,w3) = D(w0,w4) = D(w0,w5) = 0; D(w1,w4) = D(w1,w5) = 0; D(w2,w8) = D(w3,w8) = 0; D(w3,w1) = D(w6,w1) = D(w7,w1) = 2]

Reorderings to include in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6

Page 47: Arianna  Bisazza Advisor: Marcello Federico

Modifying the distortion matrix

CC1 VC2 PC3 NC4 PC5 Pct6

Decoder input: the source sentence together with its modified distortion matrix

“ w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b . ”

            w0  w1  w2  w3  w4  w5  w6  w7  w8
      <s>    0   1   2   3   4   5   6   7   8
CC1   w0     -   0   0   0   0   0   5   6   7
VC2   w1     2   -   0   1   0   0   4   5   6
PC3   w2     3   2   -   0   1   2   3   4   0
      w3     4   2   2   -   0   1   2   3   0
NC4   w4     5   4   3   2   -   0   1   2   3
      w5     6   5   4   3   2   -   0   1   2
PC5   w6     7   2   5   4   3   2   -   0   1
      w7     8   2   6   5   4   3   2   -   0
Pct6  w8     9   8   7   6   5   4   3   2   -
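The step-by-step modifications shown on the preceding slides can be reproduced programmatically. The update rule below is an assumption read off the example matrix, not necessarily the exact definition used in the thesis: for every pair of chunks that becomes adjacent in a selected reordering (and was not adjacent already), the cost from each word of the first chunk to each word of the second is lowered to that of a one-step jump in the same direction, i.e. 0 forward and 2 backward. All function and variable names are illustrative.

```python
def distortion_matrix(n):
    """Row 0 is <s> (x = -1); row x + 1 is word wx; diagonal cells are None."""
    return [[None if y == x else abs(y - x - 1) for y in range(n)]
            for x in range(-1, n)]

def add_reordering(D, chunks, order, new_order):
    """Lower D for chunk pairs that become adjacent in `new_order`.

    chunks: chunk label -> list of word positions; order: original chunk sequence.
    Assumed rule: shortcut cost 0 for a forward jump, 2 for a backward jump."""
    already_adjacent = set(zip(order, order[1:]))
    for a, b in zip(new_order, new_order[1:]):
        if (a, b) in already_adjacent:
            continue
        for x in chunks[a]:
            for y in chunks[b]:
                shortcut = 0 if y > x else 2
                D[x + 1][y] = min(D[x + 1][y], shortcut)
    return D

chunks = {"CC1": [0], "VC2": [1], "PC3": [2, 3],
          "NC4": [4, 5], "PC5": [6, 7], "Pct6": [8]}
order = ["CC1", "VC2", "PC3", "NC4", "PC5", "Pct6"]

D = distortion_matrix(9)
add_reordering(D, chunks, order, ["CC1", "PC3", "VC2", "NC4", "PC5", "Pct6"])
add_reordering(D, chunks, order, ["CC1", "NC4", "PC5", "VC2", "PC3", "Pct6"])
```

Under these assumptions the resulting `D` agrees cell by cell with the decoder-input matrix above, so a decoder running with a low distortion limit can still reach the two selected long reorderings.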

Page 48: Arianna  Bisazza Advisor: Marcello Federico

Experiments

• Tasks: NIST-MT09 for Ar-En, WMT10 for De-En

• Systems based on Moses, include state-of-the-art hierarchical lexicalized reordering models [Tillmann 04; Koehn & al 05; Galley & Manning 08]

• Baseline Distortion Limits: 5 in Ar-En, 10 in De-En

• Evaluation by:
  - BLEU for lexical match & local order
  - KRS for global order

Page 49: Arianna  Bisazza Advisor: Marcello Federico

Arabic-English: Translation Quality and Translation Time

Test set: eval09-nw
Distortion modified with 3-best reorderings per rule-matching sequence

[Charts: translation quality; translation time] +0.9 BLEU, +0.6 KRS

Page 50: Arianna  Bisazza Advisor: Marcello Federico

German-English: Translation Quality and Translation Time

Test set: newstest10
Distortion modified with 3-best reorderings per rule-matching sequence

[Charts: translation quality; translation time] +0.5 BLEU, +0.7 KRS

Page 51: Arianna  Bisazza Advisor: Marcello Federico

Lessons learned

• modified distortion matrices improve reordering without decoding overhead
• language-specific reordering rules are still needed

Can we learn everything from the data?

Page 52: Arianna  Bisazza Advisor: Marcello Federico

Outline

o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions

Bisazza and Federico, Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation, Transactions of ACL 2013 (accepted with minor revisions)

Page 53: Arianna  Bisazza Advisor: Marcello Federico

A fully data-driven approach

Word-after-Word (WaW) reordering model

• Train a binary classifier to learn whether an input word wy is to be translated right after another word wx

“... anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet”

[Figure: classifier decisions for the candidate next words: yes, no, no, no, no, no]

• No rules required, everything is learnt from parallel data
• Approach is easily portable to new language pairs with similar reordering characteristics

Page 54: Arianna  Bisazza Advisor: Marcello Federico

Decoder-integration

• Usual approach: additional feature function

• Novel approach: dynamically prune the reordering space
  ➞ use model score to decide (early) if a given reordering path is promising enough to be further explored

Page 55: Arianna  Bisazza Advisor: Marcello Federico

Early reordering pruning

Test time: run classifier for each input sentence

[Figure: empty word-after-word matrix over “<S> Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”]

Page 56: Arianna  Bisazza Advisor: Marcello Federico

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)

[Figure: word-after-word matrix over “<S> Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”, each cell filled with a WaW score between 0.1 and 0.9]

Page 57: Arianna  Bisazza Advisor: Marcello Federico

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)

[Figure: word-after-word matrix over “<S> Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”, each cell filled with a WaW score between 0.1 and 0.9]

Page 58: Arianna  Bisazza Advisor: Marcello Federico

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)
Dynamically prune reorderings before each hypothesis expansion

[Figure: word-after-word matrix over “<S> Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”, each cell filled with a WaW score between 0.1 and 0.9]

Page 59: Arianna  Bisazza Advisor: Marcello Federico

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)
Dynamically prune reorderings before each hypothesis expansion
For example after “Die”…

[Figure: word-after-word matrix over “<S> Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”, each cell filled with a WaW score between 0.1 and 0.9]

Page 60: Arianna  Bisazza Advisor: Marcello Federico

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)
Dynamically prune reorderings before each hypothesis expansion
For example after “Die”…

[Figure: word-after-word matrix over “<S> Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”, each cell filled with a WaW score between 0.1 and 0.9]

Page 61: Arianna  Bisazza Advisor: Marcello Federico

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)
Dynamically prune reorderings before each hypothesis expansion
For example after “Die”…
… after “Staat”…

[Figure: word-after-word matrix over “<S> Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”, each cell filled with a WaW score between 0.1 and 0.9]

Page 62: Arianna  Bisazza Advisor: Marcello Federico

Early reordering pruning

Test time: run classifier for each input sentence
Consider a larger space (DL)
Dynamically prune reorderings before each hypothesis expansion
For example after “Die”…
… after “Staat”…

[Figure: word-after-word matrix over “<S> Die Budapester Staat~ anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”, each cell filled with a WaW score between 0.1 and 0.9]

Page 63: Arianna  Bisazza Advisor: Marcello Federico

Decoder-integration

How to reduce early pruning errors? Always allow short jumps!

[Figure: matrix of classifier scores over the words of the example sentence “Die Budapester Staat~anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”]

Page 64: Arianna  Bisazza Advisor: Marcello Federico

Decoder-integration

How to reduce early pruning errors? Always allow short jumps!

[Figure: matrix of classifier scores over the words of the example sentence “Die Budapester Staat~anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet .”, with three regions marked: Off limits, Prunable zone, Non-prunable zone]
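The three zones can be illustrated with a short sketch. It assumes that a jump's length is measured by the usual distortion distance, that jumps beyond the distortion limit are off limits, and that jumps no longer than a hypothetical `non_prunable_width` are always allowed. The function names and the score threshold are illustrative assumptions, not the thesis implementation.

```python
def jump_zone(last_pos, next_pos, distortion_limit, non_prunable_width):
    """Classify a jump from source position last_pos to next_pos."""
    distance = abs(next_pos - last_pos - 1)
    if distance > distortion_limit:
        return "off limits"      # outside the reordering space altogether
    if distance <= non_prunable_width:
        return "non-prunable"    # short jumps are always allowed
    return "prunable"            # long jumps survive only with a good score


def keep_jump(score, zone, threshold):
    """Decide whether a jump is explored, given its zone and classifier score."""
    if zone == "off limits":
        return False
    if zone == "non-prunable":
        return True
    return score >= threshold
```

For example, with a distortion limit of 10 and a non-prunable width of 5, a jump of distance 7 falls in the prunable zone and is explored only if its score passes the threshold.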

Page 65: Arianna  Bisazza Advisor: Marcello Federico

Experiments

• Same tasks
• Similar baselines, but with early distortion cost [Moore & Quirk 07]
• Baseline Distortion Limit: 8
• Evaluation by:
  - BLEU, KRS
  - KRS-V: weighted KRS, only sensitive to verbs
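To make the two reordering metrics concrete, here is a rough sketch. It assumes that KRS is one minus the square root of the normalized Kendall tau distance between the hypothesis and reference word orders, and that KRS-V counts only word pairs in which at least one word is a verb. Both the pair-weighting rule and the omission of any length penalty are assumptions made for illustration; the exact definitions are in the thesis.

```python
from itertools import combinations
from math import sqrt


def kendall_reordering_score(hyp_perm, ref_perm, is_verb=None):
    """Kendall-tau-based reordering score between two word orders.

    hyp_perm[i] / ref_perm[i]: position of source word i in the hypothesis
    and reference order. If is_verb is given, only pairs that involve at
    least one verb are counted (assumed KRS-V weighting).
    """
    n = len(hyp_perm)
    discordant = 0
    total = 0
    for i, j in combinations(range(n), 2):
        if is_verb is not None and not (is_verb[i] or is_verb[j]):
            continue  # pair carries zero weight
        total += 1
        # A pair is discordant when the two orders disagree on it.
        if (hyp_perm[i] - hyp_perm[j]) * (ref_perm[i] - ref_perm[j]) < 0:
            discordant += 1
    if total == 0:
        return 1.0
    return 1.0 - sqrt(discordant / total)


krs = kendall_reordering_score([0, 2, 1], [0, 1, 2])
krs_v = kendall_reordering_score([0, 2, 1], [0, 1, 2], is_verb=[False, True, False])
```

In the toy call, one of three pairs is swapped, so the unweighted score is 1 - sqrt(1/3); treating word 1 as the only verb leaves two counted pairs, one of them swapped, giving 1 - sqrt(1/2).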

Page 66: Arianna  Bisazza Advisor: Marcello Federico

Arabic-English:

Translation Quality

+0.3 BLEU
+0.8 KRS-V

Test set: eval09-nw
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

Page 67: Arianna  Bisazza Advisor: Marcello Federico

Arabic-English:

Translation Quality
Translation Time

+0.6 BLEU
+1.2 KRS-V

Test set: eval09-nw
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

Page 68: Arianna  Bisazza Advisor: Marcello Federico

German-English:

Translation Quality

+0.2 BLEU
+0.7 KRS-V

Test set: newstest10
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

Page 69: Arianna  Bisazza Advisor: Marcello Federico

German-English:

Translation Quality
Translation Time

+1.3 BLEU
+4.0 KRS-V

Test set: newstest10
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

Page 70: Arianna  Bisazza Advisor: Marcello Federico

Outline

o The problem
o The solutions:
  • verb reordering lattices
  • modified distortion matrices
  • dynamically pruning the reordering space
o Comparative evaluation & conclusions

Page 71: Arianna  Bisazza Advisor: Marcello Federico

Experiments

• Same PSMT baselines
• Best enhanced PSMT systems:
  - Ar-En: WaW model & early reo. pruning
  - De-En: reo. lattices pruned with reo. source LM
• Hierarchical phrase-based system:
  - default configuration (max span for rule extract.: 10 words)
  - max span for decoding: 10 or 20
• Evaluation by:
  - BLEU, KRS
  - KRS-V: weighted KRS, only sensitive to verbs

Page 72: Arianna  Bisazza Advisor: Marcello Federico

Arabic-English:

Translation Quality
Translation Time

Test set: eval09-nw
Non-prunable zone width: 5
(more metrics and test sets in the thesis)

Page 73: Arianna  Bisazza Advisor: Marcello Federico

German-English:

Translation Quality
Translation Time

Test set: newstest10
Lattices pruned with reo. source LM
(more metrics and test sets in the thesis)

Page 74: Arianna  Bisazza Advisor: Marcello Federico

Arabic-English examples (1)

Page 76: Arianna  Bisazza Advisor: Marcello Federico

Arabic-English examples (2)

Page 78: Arianna  Bisazza Advisor: Marcello Federico

German-English examples (1)

Page 80: Arianna  Bisazza Advisor: Marcello Federico

German-English examples (2)

Page 82: Arianna  Bisazza Advisor: Marcello Federico

Conclusions

• Our techniques advance the state of the art in reordering modeling within the PSMT framework:
  - they capture long-range reordering patterns without sacrificing decoding efficiency
  - they show the importance of refining the reordering search space
• Positive results on large-scale news translation tasks in two difficult language pairs:
  - significant gains in reordering-specific metrics while generic scores are preserved or increased
  - our best PSMT systems compare favorably with a strong tree-based approach (HSMT), both in quality and efficiency

Page 83: Arianna  Bisazza Advisor: Marcello Federico

Future Directions

• Improve the proposed methods by:
  - refining chunk-based reordering rules with POS or lexical clues
  - increasing accuracy of WaW model with new features
  - combining different reordering scores for early pruning
• Evaluate on language pairs with similar reordering characteristics
• Analyze the effect of improved long reordering on post-editing effort by human translators
• Address the problem of reordering search space definition in HSMT, possibly with analogous strategies

Page 84: Arianna  Bisazza Advisor: Marcello Federico

      w0  w1  w2  w3  w4  w5  w6  w7  w8  w9 w10
<s>    0   1   2   3   4   5   6   7   8   9  10
w0     -   0   1   2   3   4   5   6   7   8   9
w1     2   -   0   1   2   3   4   5   6   7   8
w2     3   T   -   0   1   2   3   4   5   6   7
w3     4   H   2   -   0   1   2   3   Y   5   6
w4     5   A   T   T   E   N   T   I   O   N   !
w5     6   N   4   3   2   -   0   1   U   3   4
w6     7   K   5   4   3   2   F   O   R   2   3
w7     8   S   6   5   4   3   2   -   0   1   2
w8     9   8   7   6   5   4   3   2   -   0   1
w9    10   9   8   7   6   5   4   3   2   -   0
w10   11  10   9   8   7   6   5   4   3   2   -
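The numeric cells of this matrix match the standard distortion cost |next − previous − 1| between single source words, with the sentence start `<s>` treated as position −1. A minimal sketch that regenerates those values is shown below; the function names are hypothetical, and single-word phrases are assumed.

```python
def distortion_cost(prev_pos, next_pos):
    """Standard distortion cost for jumping from prev_pos to next_pos."""
    return abs(next_pos - prev_pos - 1)


def distortion_matrix(n):
    """Build the (n + 1) x n matrix of distortion costs.

    Row 0 corresponds to the sentence start <s> (position -1); row i + 1
    corresponds to word w_i. Diagonal cells (a word followed by itself)
    are left empty as None.
    """
    return [
        [None if nxt == prev else distortion_cost(prev, nxt) for nxt in range(n)]
        for prev in range(-1, n)
    ]


matrix = distortion_matrix(11)
```

Here `matrix[0]` is the `<s>` row and `matrix[11]` is the `w10` row of the slide.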
