

Imposing Constraints from the Source Tree on ITG Constraints for SMT

Hirofumi Yamamoto, Hideo Okuma, Eiichiro Sumita

National Institute of Information and Communications Technology

ATR Spoken Language Communication Research Labs.

Kindai University School of Science and Engineering Department of Information


Background

In current SMT, erroneous word reordering is one of the most serious problems, especially for dissimilar language pairs such as English-Chinese or English-Japanese.

1) Introduce linguistic syntax directly.

Tree-to-string, string-to-tree, tree-to-tree

Not robust to parsing errors


2) Assign probabilistic constraints to word reordering.

IBM distortion, lexical reordering, ITG

Weaker constraints than the first type

Introduce syntax information into the second type.


ITG Constraints

Translation source sentences are represented by a binary tree. Translation target sentences are generated by rotating the branches at nodes of the source tree.

[Figure: source words A B C D aligned to the target word orders "b d a c" and "c a d b"]

These target word orders cannot be generated from any source binary tree. The actual source binary tree instance is not considered.
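For short sentences the ITG constraint can be checked by brute force. The sketch below is an illustration only, not the authors' code: the name `itg_allows` and the encoding of a word order as a tuple of target positions are assumptions made here. It asks whether some binary tree over the source words can produce a given target order by rotating branches.

```python
from itertools import permutations

def itg_allows(order):
    """Return True if some binary tree over the source words yields `order`.

    order[i] is the target position of the i-th source word.
    """
    n = len(order)
    if n <= 1:
        return True
    for k in range(1, n):
        left, right = order[:k], order[k:]
        # Each half must fill one contiguous block of target positions,
        # otherwise no single node can cover it.
        left_ok = max(left) - min(left) == len(left) - 1
        right_ok = max(right) - min(right) == len(right) - 1
        if left_ok and right_ok and itg_allows(left) and itg_allows(right):
            return True
    return False

# Collect every 4-word target order that ITG rejects, written as a string.
words = "abcd"
rejected = []
for order in permutations(range(4)):
    if not itg_allows(order):
        target = [""] * 4
        for src, pos in enumerate(order):
            target[pos] = words[src]
        rejected.append("".join(target))
```

For four words, `rejected` holds exactly the two orders `bdac` and `cadb`, and the remaining 22 of the 24 permutations pass the check.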


Basic Idea of IST-ITG

Use ITG constraints under the given source tree.

Source tree ((A B) (C D)): abcd, abdc, bacd, badc, cdab, cdba, dcab, dcba

Source tree (((A B) C) D): abcd, bacd, cabd, cbad, dabc, dbac, dcab, dcba

Under the original ITG constraints, 22 word orders are allowed.
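The two lists of eight orders can be reproduced by enumerating branch rotations on a fixed tree. This is a minimal sketch, assuming the two source trees are ((A B) (C D)) and (((A B) C) D), which is the shape the two lists imply; the function name `allowed_orders` and the nested-tuple tree encoding are hypothetical choices for illustration.

```python
def allowed_orders(tree):
    """Return the target strings reachable by swapping the two branches
    at any node of a binary source tree.

    A tree is either a word (str) or a pair (left, right).
    """
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    result = set()
    for l in allowed_orders(left):
        for r in allowed_orders(right):
            result.add(l + r)  # keep the branch order
            result.add(r + l)  # rotate the node
    return result

balanced = (("a", "b"), ("c", "d"))
left_branching = ((("a", "b"), "c"), "d")

print(sorted(allowed_orders(balanced)))
print(sorted(allowed_orders(left_branching)))
```

Each tree yields 8 orders, against the 22 that ITG allows when the tree is left free.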


The Number of Word Order Combinations

Without constraints, $N!$ word orders are possible for a sentence of $N$ words. Under the IST-ITG constraints with a binary source tree, this number is reduced to $2^{N-1}$.

N     Without constraints    ITG constraints    IST-ITG
6     6! = 720               394                2^5 = 32
10    10! = 3,628,800        206,098            2^9 = 512
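The counts on this slide can be recomputed directly. The sketch below is not taken from the slides: it relies on the known fact that the number of ITG-compatible orders of N words is the large Schroeder number S(N-1), and it uses the standard recurrence for those numbers. The function name `itg_count` is an assumption for illustration.

```python
from math import factorial

def itg_count(n):
    """Number of n-word orders allowed by ITG constraints (n >= 1).

    Uses the large Schroeder recurrence
    (k + 1) * S(k) = 3 * (2k - 1) * S(k - 1) - (k - 2) * S(k - 2),
    with S(0) = 1 and S(1) = 2; an n-word sentence has S(n - 1) orders.
    """
    prev, cur = 1, 2  # S(0), S(1)
    if n == 1:
        return prev
    for k in range(2, n):
        prev, cur = cur, (3 * (2 * k - 1) * cur - (k - 2) * prev) // (k + 1)
    return cur

# Columns: N, without constraints, ITG constraints, IST-ITG (binary tree).
for n in (6, 10):
    print(n, factorial(n), itg_count(n), 2 ** (n - 1))
# prints:
# 6 720 394 32
# 10 3628800 206098 512
```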


Extension to Non-binary Tree

Parsing results are sometimes not binary trees.

For nodes that have more than two branches, any word reordering is allowed.

Source tree (A (B C D)): abcd, abdc, acbd, acdb, adbc, adcb, bcda, bdca, cbda, cdba, dbca, dcba


For a non-binary tree, the number of word orders allowed by IST-ITG is $\prod_{i=1}^{n} (B_i)!$, where $B_i$ is the number of branches at the $i$-th of the $n$ nodes.
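The product formula can be evaluated with one recursive pass over the tree. This is a minimal sketch under assumptions made here: trees are nested tuples, words are strings, and the name `ist_itg_count` is hypothetical.

```python
from math import factorial

def ist_itg_count(tree):
    """Product of B_i! over all branching nodes of the source tree.

    A tree is either a word (str) or a tuple of sub-trees.
    """
    if isinstance(tree, str):
        return 1  # a single word contributes no reordering choice
    total = factorial(len(tree))  # B_i! orderings of this node's branches
    for child in tree:
        total *= ist_itg_count(child)
    return total

print(ist_itg_count((("a", "b"), ("c", "d"))))  # binary tree, 2! * 2! * 2!
print(ist_itg_count(("a", ("b", "c", "d"))))    # ternary node, 2! * 3!
```

The second tree gives 2! × 3! = 12, which agrees with the twelve orders listed for the non-binary example, assuming that example is the tree (A (B C D)).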


IST-ITG in Phrase-based SMT (1)

×   The unit of the parse tree is the word, but the unit of phrase-based SMT is the phrase. The units are different.

Additional rules for phrase-based SMT

1) Word reordering that breaks a phrase is not allowed.

2) Phrase-internal word reordering is not checked.

○   Word-to-word alignments are sometimes not one-to-one, but phrase-to-phrase alignments are always one-to-one.
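The two additional rules can be read as a contiguity test on source words. The sketch below is one possible reading, not the authors' implementation: `breaks_phrase` and its arguments are hypothetical names, and words are identified by plain labels.

```python
def breaks_phrase(target_order, phrases):
    """Return True if the target order splits the words of any phrase.

    target_order: list of source word labels in target-side order.
    phrases: list of sets of source word labels.
    """
    position = {word: i for i, word in enumerate(target_order)}
    for phrase in phrases:
        slots = sorted(position[w] for w in phrase)
        # Rule 1: the phrase must occupy adjacent target slots.
        # Rule 2: the order inside those slots is deliberately ignored.
        if slots[-1] - slots[0] != len(slots) - 1:
            return True
    return False

phrases = [{"B", "C"}]
print(breaks_phrase(list("ACBD"), phrases))  # B and C stay adjacent
print(breaks_phrase(list("BACD"), phrases))  # A is placed between B and C
```

The first call returns `False` even though B and C are swapped, because only adjacency is tested; the second returns `True`.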


IST-ITG in Phrase-based SMT (2)

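[Figure: source tree over the words A B C D E F G, with a phrase Ph marked on the tree and five numbered cases, 1 to 5. Later builds of the slide show E F G and B C D grouped as EFG and BCD.]

1: NG   2: NG   3: OK   4: NG   5: OK   (NG: unacceptable)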


Decoding Algorithm with IST-ITG

Each source word and each node of the source tree carries a state:

0: Untranslated   1: Translated   2: Translating

[Figure: source tree over the words A B C D E F G H I, with a state on every leaf and node. D and E are already translated; the partial output is "d e".]

If phrases A and B are translated next (output: d e a b), a sub-tree includes more than two “2” nodes. NG

Consider the minimum Translating sub-tree (the smallest sub-tree that includes both “0” and “1”).

If F, G and H are translated next (output: d e f g h), all of the minimum Translating sub-tree is translated. OK

If G is translated next (output: d e g), a sub-part of the minimum Translating sub-tree is translated. OK
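The state bookkeeping can be sketched as a check that runs before each hypothesis expansion. This is an interpretation under stated assumptions, not the CleopATRa implementation: the NG condition is read here as "some node has two or more branches in state 2", the next phrase is required to lie inside the minimum Translating sub-tree, and all names (`leaves`, `state`, `has_conflict`, `min_translating`, `can_extend`) are hypothetical.

```python
def leaves(node):
    """Set of source words under a node (a word is a str, a node a tuple)."""
    if isinstance(node, str):
        return {node}
    words = set()
    for child in node:
        words |= leaves(child)
    return words

def state(node, covered):
    """0 = untranslated, 1 = translated, 2 = translating (mixed coverage)."""
    words = leaves(node)
    done = words & covered
    if not done:
        return 0
    return 1 if done == words else 2

def has_conflict(node, covered):
    """Assumed NG test: a node with two or more branches in state 2."""
    if isinstance(node, str):
        return False
    if sum(state(child, covered) == 2 for child in node) >= 2:
        return True
    return any(has_conflict(child, covered) for child in node)

def min_translating(node, covered):
    """Deepest node in state 2, or None if nothing is half translated."""
    if state(node, covered) != 2:
        return None
    for child in node:  # a node in state 2 is never a single word
        deeper = min_translating(child, covered)
        if deeper is not None:
            return deeper
    return node

def can_extend(tree, covered, phrase):
    """May the words of `phrase` be translated next, given `covered`?"""
    phrase = set(phrase)
    if phrase & covered:
        return False  # already translated
    target = min_translating(tree, covered)
    if target is not None and not phrase <= leaves(target):
        return False  # would leave a half-translated sub-tree
    return not has_conflict(tree, covered | phrase)

tree = (("A", "B"), ("C", "D"))
print(can_extend(tree, {"A"}, ["B"]))       # finishes the sub-tree (A B)
print(can_extend(tree, {"A"}, ["C"]))       # leaves (A B) half translated
print(can_extend(tree, set(), ["B", "C"]))  # phrase straddles both sub-trees
```

On this four-word toy tree the three calls return `True`, `False` and `False`. The tree on the slide is not recoverable from the transcript, so the example uses a smaller one.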


English and Japanese Patent Corpus Experiments

Experimental corpus size

             # of sent.   Total words   # of entries
E/J Train    1.8M         60M/64M       188K/118K
E/J Dev      916          30K/32K       4,072/3,646
E/J Eval     899          29K/32K       3,967/3,682

Single reference


Other Experimental Conditions

LM training: SRI Language Model toolkit (5-grams)

Word alignment for TM training: GIZA++

Decoder: Moses-compatible in-house decoder named CleopATRa

Evaluation measures

BLEU, NIST, WER, PER

$e_1, e_2, e_3, \ldots, e_j, X, e_{j+2}, \ldots, e_n$

$e_{j+2}, \ldots, e_n, X, e_1, e_2, e_3, \ldots, e_j$
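WER reacts to global reordering much more strongly than PER does. The sketch below illustrates this on a toy pair of sentences in which the two halves around a pivot word X are swapped. It assumes the usual definitions: WER as word-level edit distance over reference length, and one common formulation of PER based on bag-of-words overlap. The function names and the toy sentences are illustrative, not taken from the slides.

```python
from collections import Counter

def wer(ref, hyp):
    """Word error rate: word-level edit distance divided by reference length."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)

def per(ref, hyp):
    """Position-independent error rate: compares the sentences as bags of words."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)

ref = "a b c X d e f".split()
hyp = "d e f X a b c".split()  # same words, the two halves swapped

print(wer(ref, hyp), per(ref, hyp))
```

PER is zero for this pair because both sentences contain the same words, while WER is well above zero, so an improvement that shows up mainly in WER points to better global word order.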


English and Japanese Patent Translation

Experimental Results

English-to-Japanese

                 BLEU    NIST    WER     PER
IBM+Lex          31.17   7.50    76.30   38.61
IBM+Lex+IST      32.20   7.61    71.18   38.15
IST-ITG          30.26   7.41    74.90   38.93
Monotone         24.91   6.95    79.97   40.02
No Constraint    26.83   7.19    81.10   39.52
IBM              28.34   7.29    78.35   39.25



English and Japanese Patent Translation

Experimental Results

Japanese-to-English

             BLEU    NIST    WER     PER
IBM+Lex      29.93   7.54    77.27   39.12
+IST-ITG     29.77   7.50    72.80   39.73



English-to-Chinese Translation Experiments

NIST MT08 English-to-Chinese track

Training data for TM    6.2M
Training data for LM    20.1M
Development data        1,664    1 reference
Evaluation data         1,859    4 references

Experimental Results

             W-BLEU   C-BLEU   WER     CER
IBM+Lex      21.0     35.2     75.0    74.1
+IST-ITG     23.2     37.0     69.7    67.9


Conclusion

We proposed new word reordering constraints, IST-ITG, which use the source tree structure. They are an extension of the ITG constraints.

We conducted three experiments with the proposed method: E-J and J-E patent translation and the NIST MT08 E-C track. In all experiments, improvements in BLEU and WER were confirmed.

The improvement in WER is especially large, which confirms the method's effectiveness for global word reordering.


Thank you!