head finalization reordering for chinese-to-japanese...

25
Head Finalization Reordering for Chinese-to-Japanese Machine Translation Dan Han *, Katsuhito Sudoh †, Xianchao Wu $, Kevin Duh ʄ, Hajime Tsukada †, Masaaki Nagata † * National Institute of Informatics † NTT Communication Science Laboratories $ Baidu Japan, Inc. ʄ Nara Institute of Science and Technology(NAIST)

Upload: others

Post on 09-Nov-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

Head Finalization Reordering for

Chinese-to-Japanese

Machine Translation

Dan Han *, Katsuhito Sudoh †, Xianchao Wu $, Kevin Duh ʄ, Hajime Tsukada †, Masaaki Nagata † * National Institute of Informatics † NTT Communication Science Laboratories $ Baidu Japan, Inc. ʄ Nara Institute of Science and Technology(NAIST)

Page 2: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

2

Outline

Introduction

Motivation & Objective

Syntactic-Based Reordering Rules

Other Reordering Issues

Conclusion

Page 3: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

Introduction

English and Chinese are head-initial languages, while

Japanese is a typical head-final language.

Head Finalization Reordering Rule for English-

Japanese MT (Isozaki et al. 2010).

Using the parsed result of Enju, an HPSG parser (Miyao and

Tsujii, 2008) that outputs the syntactic heads.

Move syntactic heads to the end of the

corresponding syntactic constituents

Methodology

3

Page 4: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

Introduction (cont.)

c0

c1

c2

t0

c3

c4

t1

c5

c6

t2

c7

t3

*

*

* * *

* *

*

-- H.Isozaki, K.Sudoh 2010

John hit a ball

Head-Final English (HFE)

* Indicate the syntactic head

c0

c1

c2

t0

c3 *

*

*

c4

t1

*

*

c5

c6

t2

c7

t3

* *

*

John hit a ball

ジャン(は) 打った - ボ-ル(を)

4

Page 5: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

Motivation

HF works well for Japanese-English, but not for other

language pairs.

Attempt to remedy it at the level of syntactic-head

analysis.

Discrepancies in Head Definition among Chinese,

Japanese, and English cause reordering issues while

implementing HF.

In this work, example of analysis and solution for

Chinese-to-Japanese.

5

Page 6: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

私(は) と 京都 (に) 行く 。 東京

我 去 东京 和 京都 。

c0

c2

c1

t0

c10

t5

c3

c4

t1

c5

t2

c6

c7

c8

t3

c9

t4

I go to Tokyo and Kyoto .

c0

c2

c1

t0

c10

t5

c3

c4

t4

c5

c7

c8

t2

c9

t3

t1

c6

我 。 和 东京 去 京都

私(は) と 京都 (に) 行く 。 東京

Problem!

Head-Final Chinese (HFC)

6

Page 7: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

c0

c2

c1

t0

c10

t5

c3

c4

t4

c5

t1

c6

c7

c8

t2

c9

t3

我 东京 和 京都 去 。

私(は) と 京都 (に) 行く 。 東京

Perfect!

Refined-Head-Final Chinese (Refined-HFC)

私(は) と 京都 (に) 行く 。 東京

我 去 东京 和 京都 。

c0

c2

c1

t0

c10

t5

c3

c4

t1

c5

t2

c6

c7

c8

t3

c9

t4

I go to Tokyo and Kyoto .

7

Page 8: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

8

Objective

Present detailed syntactic analysis of reordering

issues.

Define novel reordering rules based on HF and

linguistically inspired refinements.

Page 9: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

9

Syntactic-based Reordering Rules

Aspect Particle

Adverbial Modifier 'bu4'

Sentence-final Particle

Et cetera

Punctuation

Coordination

Page 10: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

我去过东京。

我 (I) 东京(Tokyo) 过(have) 去(been to)

我 东京 去 过

私(は) 東京(に) い った

I have been to Tokyo.

HFC

R-HFC

Ch

En

Ja

Example for Aspect Particle

Aspect Particle

c0

c1

t0

c2

c8

t4

c3

c4

t3

c7

*

*

*

*

*

*

c5

t1

c6

t2

* *

*

东京

去 过

10

Page 11: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

Adverbial Modifier 'bu4'

我不看电视。

我(I) 电视(TV) 不(do not) 看(watch)

我 电视 看 不

私(は) テレビ(を) 見 ない

I do not watch TV.

HFC

R-HFC

Ch

En

Ja

Example for Adverbial Modifier bu4 c0

c1

t0

c2

c8

t4

c3

c4

t3

c7

*

*

*

*

*

*

c5

t1

c6

t2

* *

*

不 看

电视

11

Page 12: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

Sentence-final Particle

天气是真好啊。

啊 天气(weather) 真好(good) 是(is)

天气 真好 是 啊

いい 天気 です ね

It is good weather.

HFC

R-HFC

Ch

En

Ja

Example for Sentence-final Particle c0

c8

t4

c1

c7

t3

c2

c4

t2

c6

*

*

*

*

*

*

c5

t1

c3

t0

*

*

*

天气

是 真好

12

Page 13: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

Et cetera

水果包括苹果等。

水果(fruits) 等(etc.) 苹果(apples) 包括(include)

水果 苹果 等 包括

果物(は) リンゴ など(を) 含む

Fruits include apples, etc.

HFC

R-HFC

Ch

En

Ja

Example for Et cetera. c0

c1

t0

c2

c8

t4

c3

c5

t3

c4

*

*

*

*

*

*

c6

t1

c7

t2

* *

* 。

水果

包括 苹果

13

Page 14: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

14

Experiments

Training data: CWMT data set (News domain, 282K sentences)

Additional training data: XINHUA parallel data sets (News

domain, 593K sentences) is used to compare the results.

Dev & Test data: CWMT data set (1K sentences)

Word alignment: GIZA++

Decoder: Moses

Page 15: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

15

Experiments (cont.)

BLEU and RIBES scores while CWMT corpus was used for

training.

Page 16: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

16

Experiments (cont.)

BLEU and RIBES scores while CWMT ext. corpus was used

for training.

Page 17: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

17

Other Reordering Issues

Serial Verb Construction

Complementizer

Verbal Nominalization and Nounal

Verbalization

Adverbial Modifier

POS tagging and Parsing Errors

Page 18: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

严格处罚违法行为

严格 违法 行为 处罚

違法 行為(を) 厳しく 処罰

Severely penalize unlawful act

HFC

Ch En

Ja

Example for Adverbial Modifier

Example for Complementizer

忙完了

完 忙 了

忙しく なくなり ました

Have finished

HFC

Ch En

Ja

Example for Serial Verb Construction

维持深化中日关系 Ch En Maintain and deepen the Japan-China relations

中日关系 深化 维持

日中関係 (を) 維持 深化 し

HFC

Ja

Example for Verbal Nominalization

健全安定发展的促进

安定 发展 的 促进 健全

健全 な 安定 し た 発展 の 促進

sound, stably and increasingly promote

HFC

Ch En

Ja

18

Page 19: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

19

POS Tagging & Parsing Errors

POS Tagging Errors

「伊朗」(イラン, Iran)

POS = “VV” or “JJ” POS=“NR”

「胡主席」(フー・チンタオ, Hu Jintao)

POS = “VV” POS=“NR”

「实施」(実施する, Implement)

POS = “NN” POS=“VV”

Page 20: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

20

POS Tagging & Parsing Errors

Parsing Errors

VV NN C M

V N

V

投资 20 亿 美元

*

*

*

20 億 ドル (を) 投資 する

P

P

NN NN

PU N

N

根据

东南 快报

*

*

* PU

N

S

S

*

投资 20 亿 美元

亿 美元 20 投资

「 東南 快報 」 に よる と

根据 “ 东南 快报 ”

“ 东南 快报 根据 ”

Invest 2 billion US dollars

According to “TONAN NEWS”

Page 21: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

21

POS Tagging & Parsing Errors

Parsing Errors

C M NN VV

N V

V

亿 美元 20 投资

*

*

*

20 億 ドル (を) 投資 する

P

P

NN NN

PU N

N

根据

东南 快报

*

*

* PU

N

S

S

*

投资 20 亿 美元

亿 美元 20 投资

「 東南 快報 」 に よる と

根据 “ 东南 快报 ”

“ 东南 快报 根据 ”

Invest 2 billion US dollars

According to “TONAN NEWS”

Page 22: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

22

Conclusion

Basic Head Finalization reordering rules improved the Chinese-

to-Japanese machine translation quality.

However, the refined-HFC substantially achieved further

improvement.

– Due to more monotonic word alignment.

Page 23: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

Thank you for your attention! Suggestions & Questions

Page 24: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese

24

Experiments (cont.)

The effect frequency of each exception rule during reordering

on CWMT extended corpus.

Page 25: Head Finalization Reordering for Chinese-to-Japanese ...dekai/ssst6/slides/HanSudohWuDuhTsukadaNagat… · Introduction English and Chinese are head-initial languages, while Japanese