Human Science - TAUS Tokyo Forum 2015


Post on 15-Jul-2015


TRANSCRIPT

Page 1: Human Science - TAUS Tokyo Forum 2015

The Potential for Utilizing Japan-Developed MT Engines for Translation in Industry

Megumi Tokuda Takeyoshi Nakayama

Human Science Co., Ltd. WWW.SCIENCE.CO.JP

Page 2: Human Science - TAUS Tokyo Forum 2015


Subjects

• Problems with Machine Translation in Japan
• Potential for Utilizing Japan-Developed MT Engines in Industry
• In the Future
• Q&A

Page 3: Human Science - TAUS Tokyo Forum 2015


Page 4: Human Science - TAUS Tokyo Forum 2015


Problems with Machine Translation in Japan

I. Statistical MT engines
  - Developed with a focus on European languages
  - Output lacks quality in non-European language pairs

II. General rule of thumb: in Japanese/English translation, statistical MT is not as good as rule-based MT

How can we improve MT quality for Japanese/English pairs?

Page 5: Human Science - TAUS Tokyo Forum 2015


Page 6: Human Science - TAUS Tokyo Forum 2015


Engines developed in Japan

Open Source Engines

Page 7: Human Science - TAUS Tokyo Forum 2015


MT Engines

JP: SMT with syntax, developed in Japan
A: SMT with syntax
B: Moses-based SMT without syntax
C: RBMT

Page 8: Human Science - TAUS Tokyo Forum 2015

Sample Project - Overview

Target Document
User manual (computer system)
Volume: approx. 300 segments (~3,800 words)
English > Japanese

Page 9: Human Science - TAUS Tokyo Forum 2015


Sample Project - Overview

Corpus
User manual (computer system)
Volume: 1 million segments (~12 million words)

Page 10: Human Science - TAUS Tokyo Forum 2015


Evaluation Methods

• BLEU
• RIBES
• Human Assessment
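The slides report BLEU scores without showing how they were computed. As a rough illustration only, the Python sketch below computes a simplified sentence-level BLEU (clipped n-gram precision up to 4-grams, combined with a brevity penalty). Character-level tokenization and the function names are assumptions made for this example, not the evaluation setup used in the project, so the scores it prints are not comparable to the reported results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference (no smoothing)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


# Engine B output and human translation from Comparing Translations (1),
# tokenized per character (an assumption; real setups use a word segmenter).
reference = list("新しいアラート定義を作成します。")
candidate = list("新しいアラート定義ます.")
print(round(bleu(candidate, reference), 3))
```

Because the sketch has no smoothing, very short segments with no matching 4-gram score zero, which is one reason corpus-level BLEU is normally reported instead.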

Page 11: Human Science - TAUS Tokyo Forum 2015


Results <BLEU>

[Bar chart: BLEU scores for JP, A (SMT), B (SMT, Moses), and C (RBMT). Data labels: 0.43, 0.32, 0.28, 0.2.]

Page 12: Human Science - TAUS Tokyo Forum 2015


About RIBES

• A better metric than BLEU for language pairs with very different word orders
• Developed by NTT Communication Science Labs
• www.kecl.ntt.co.jp/icl/lirg/ribes/
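RIBES rewards output whose word order correlates with the reference. The Python sketch below illustrates only that rank-correlation idea, using a normalized Kendall's tau over words that occur exactly once in both sentences. The function name and the simplified alignment are assumptions for illustration; the actual RIBES tool uses a more careful alignment and combines the correlation with further terms, so this sketch will not reproduce its scores.

```python
from itertools import combinations


def word_order_score(candidate, reference):
    """Normalized Kendall's tau of candidate word order against the reference.

    Only tokens that occur exactly once on both sides are aligned
    (a simplification made for this sketch).
    """
    ref_pos = {tok: i for i, tok in enumerate(reference)
               if reference.count(tok) == 1}
    ranks = [ref_pos[tok] for tok in candidate
             if tok in ref_pos and candidate.count(tok) == 1]
    if len(ranks) < 2:
        return 0.0  # too few aligned words to measure order
    pairs = list(combinations(ranks, 2))
    # A pair is concordant when the two words keep their reference order.
    concordant = sum(1 for first, second in pairs if first < second)
    # With no ties, (tau + 1) / 2 equals the share of concordant pairs.
    return concordant / len(pairs)


reference = ["w1", "w2", "w3", "w4"]
print(word_order_score(["w1", "w2", "w3", "w4"], reference))  # same order
print(word_order_score(["w4", "w3", "w2", "w1"], reference))  # reversed order
```

A score built this way stays high when the right words appear in the right order even if few long n-grams match, which is why it suits pairs such as English and Japanese.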

Page 13: Human Science - TAUS Tokyo Forum 2015


Results <RIBES>

[Bar chart: RIBES scores for JP, A (SMT), B (SMT, Moses), and C (RBMT). Data labels: 0.79, 0.73, 0.7, 0.63.]

Page 14: Human Science - TAUS Tokyo Forum 2015


About Human Assessment

Target document

Meaning and Accuracy (1 is better, 4 is worse)

1. Perfectly Understandable
2. Fully Understandable
3. Barely Understandable
4. Not Understandable

Page 15: Human Science - TAUS Tokyo Forum 2015


Results <Human Assessment> - Total -

[Stacked bar chart, 0% to 100%: share of segments rated 1. Perfectly Understandable, 2. Fully Understandable, 3. Barely Understandable, or 4. Not Understandable, for JP, A (SMT), B (SMT, Moses), and C (RBMT).]

Page 16: Human Science - TAUS Tokyo Forum 2015


Results <Human Assessment> - less than 20 words -

[Stacked bar chart, 0% to 100%: share of segments rated 1. Perfectly Understandable, 2. Fully Understandable, 3. Barely Understandable, or 4. Not Understandable, for JP, A (SMT), B (SMT, Moses), and C (RBMT).]

Page 17: Human Science - TAUS Tokyo Forum 2015


Results <Human Assessment> - 20 words or more -

[Stacked bar chart, 0% to 100%: share of segments rated 1. Perfectly Understandable, 2. Fully Understandable, 3. Barely Understandable, or 4. Not Understandable, for JP, A (SMT), B (SMT, Moses), and C (RBMT).]

Page 18: Human Science - TAUS Tokyo Forum 2015


Comparison Result

• JP is the most effective MT engine we investigated
• Engine A, which uses syntax information, has the second-highest quality
• Engines B and C have the lowest quality

Page 19: Human Science - TAUS Tokyo Forum 2015


Good Examples

Page 20: Human Science - TAUS Tokyo Forum 2015


Comparing Translations (1)

• Terminology
• Grammar

Original: Creates a new alert definition.
JP: 新しいアラート定義を作成します。 [1: Perfectly Understandable]
A (SMT): 新規アラートの定義を作成します。 [1: Perfectly Understandable]
B (SMT, Moses): 新しいアラート定義ます. [2: Fully Understandable]
C (RBMT): 新しい鋭敏な定義を作成します。 [3: Barely Understandable]
Human: 新しいアラート定義を作成します。 [-]

Page 21: Human Science - TAUS Tokyo Forum 2015


Comparing Translations (2)

• Common expressions in corpus

Original: An integer value in seconds
JP: 整数値(秒) [1: Perfectly Understandable]
A (SMT): (秒) は、整数値 [3: Barely Understandable]
B (SMT, Moses): 秒単位の整数値 [2: Fully Understandable]
C (RBMT): 数秒で整数値 [4: Not Understandable]
Human: 整数値(秒) [-]

Page 22: Human Science - TAUS Tokyo Forum 2015


Comparing Translations (3)

• Dependency
• Word order

Original: You can also input XXX expressions directly.
JP: XXX式を直接入力することもできます。 [1: Perfectly Understandable]
A (SMT): 入力 XXX式を直接できます。 [3: Barely Understandable]
B (SMT, Moses): また, XXX式を直接ます. [4: Not Understandable]
C (RBMT): さらに、XXX式を直接入力できます。 [1: Perfectly Understandable]
Human: XXX式を直接入力することもできます。 [-]

Page 23: Human Science - TAUS Tokyo Forum 2015


Bad Examples

Page 24: Human Science - TAUS Tokyo Forum 2015


Comparing Translations (4)

• Terminology not in corpus

Original: Aggregation
JP: Aggregation [4: Not Understandable]
A (SMT): 集計 [1: Perfectly Understandable]
B (SMT, Moses): 集計 [1: Perfectly Understandable]
C (RBMT): Aggregation [4: Not Understandable]
Human: 集計 [-]

Page 25: Human Science - TAUS Tokyo Forum 2015


Comparing Translations (5)

• Common expressions not in corpus

Original: Firefox 4, 5, 7, 8, and later versions.
JP: Firefox \ 4 、5 、7 、8 、および後でバージョン| Emai ≫」を選択します。 [4: Not Understandable]
A (SMT): Firefox 4、5、7、8、およびそれ以降のバージョン。 [1: Perfectly Understandable]
B (SMT, Moses): Firefox 4, 5, 7, 8 それ以上以上のの. [3: Barely Understandable]
C (RBMT): Firefox 4、5、7および8、および後のバージョン。 [2: Fully Understandable]
Human: Firefox 4、5、7、8、およびそれ以降のバージョン。 [-]
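The stacked-bar results earlier in the deck report, for each engine, the share of segments at each rating. As a small illustration of that tally, the Python sketch below applies it to the ratings given for the five sample segments in Comparing Translations (1) to (5). Five hand-picked examples are not the evaluation set, so the output says nothing about the reported results; the variable and function names are assumptions made for this example.

```python
from collections import Counter

LABELS = {
    1: "Perfectly Understandable",
    2: "Fully Understandable",
    3: "Barely Understandable",
    4: "Not Understandable",
}

# Ratings copied from Comparing Translations (1) to (5), in slide order.
SAMPLE_RATINGS = {
    "JP": [1, 1, 1, 4, 4],
    "A (SMT)": [1, 3, 3, 1, 1],
    "B (SMT, Moses)": [2, 2, 4, 1, 3],
    "C (RBMT)": [3, 4, 1, 4, 2],
}


def rating_shares(ratings):
    """Return the fraction of segments at each point of the 4-point scale."""
    counts = Counter(ratings)
    return {score: counts[score] / len(ratings) for score in LABELS}


for engine, ratings in SAMPLE_RATINGS.items():
    shares = rating_shares(ratings)
    summary = ", ".join(f"{LABELS[s]}: {share:.0%}" for s, share in shares.items())
    print(f"{engine}: {summary}")
```

Splitting the same tally by segment length (under 20 words versus 20 words or more) only requires filtering the segments before calling `rating_shares`.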

Page 26: Human Science - TAUS Tokyo Forum 2015


Issues

• Client-specific style specifications
  - abcあいう / abc あいう (spacing between characters)
  - ホームページ / ホーム ページ / ホーム・ページ ("home page")
• TMX file input and output
• No user-friendly interface
  - Command-line
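Client-specific style rules like the two above are often applied as a post-processing step on MT output. The Python sketch below shows one possible way to do that with regular expressions; the function names, the character ranges used to detect Japanese text, and the idea of running this after the engine are all assumptions for illustration, not a description of how any of the engines in this study behave.

```python
import re

# Assumed character classes: ASCII letters/digits versus hiragana,
# katakana, and common CJK ideographs.
LATIN = r"A-Za-z0-9"
JA = r"\u3040-\u30ff\u4e00-\u9fff"

# A single space sitting on a Latin/Japanese boundary.
SPACED_BOUNDARY = re.compile(
    rf"(?<=[{LATIN}]) (?=[{JA}])|(?<=[{JA}]) (?=[{LATIN}])")
# A Latin/Japanese boundary with nothing in between.
BARE_BOUNDARY = re.compile(
    rf"(?<=[{LATIN}])(?=[{JA}])|(?<=[{JA}])(?=[{LATIN}])")
# The three "home page" spellings listed on the slide.
HOME_PAGE = re.compile(r"ホーム[ ・]?ページ")


def apply_spacing_style(text, spaced):
    """Normalize spacing between half-width and Japanese characters."""
    text = SPACED_BOUNDARY.sub("", text)  # start from the unspaced form
    if spaced:
        text = BARE_BOUNDARY.sub(" ", text)
    return text


def apply_home_page_style(text, preferred):
    """Rewrite every "home page" variant to the client's preferred spelling."""
    return HOME_PAGE.sub(preferred, text)


print(apply_spacing_style("abcあいう", spaced=True))
print(apply_spacing_style("abc あいう", spaced=False))
print(apply_home_page_style("ホーム ページを開く", "ホーム・ページ"))
```

Keeping such rules outside the engine means one trained model can serve clients with conflicting style guides.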

Page 27: Human Science - TAUS Tokyo Forum 2015


Issues

• Tags
  - <emphasis></emphasis>
• Entities
  - &ProductName;
• User interfaces
  - Press OK button
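A common workaround for inline tags and entities is to mask them with placeholders before translation and restore them afterwards. The Python sketch below illustrates that idea under stated assumptions: the placeholder format `__PHn__`, the function names, and the regular expression are hypothetical, and the sketch assumes the MT engine passes placeholders through unchanged, which would need to be verified for each engine.

```python
import re

# Matches simple inline tags such as <emphasis> or </emphasis>,
# and named entities such as &ProductName;.
PROTECTED = re.compile(r"</?[A-Za-z][^<>]*>|&[A-Za-z_][\w.-]*;")
PLACEHOLDER = re.compile(r"__PH(\d+)__")


def protect(segment):
    """Replace tags and entities with numbered placeholders."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"__PH{len(saved) - 1}__"

    return PROTECTED.sub(stash, segment), saved


def restore(segment, saved):
    """Put the original tags and entities back in place of placeholders."""
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], segment)


source = "Press <emphasis>OK</emphasis> in &ProductName;."
masked, saved = protect(source)
print(masked)                  # text that would be sent to the MT engine
print(restore(masked, saved))  # round trip without translation
```

Word order changes between English and Japanese can move or split the text around a placeholder, so restored tag pairs should still be checked after translation.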

Page 28: Human Science - TAUS Tokyo Forum 2015


Wrapping up results

• MT engines that use syntax information can produce sufficient quality for English-Japanese translation

• The new MT engine developed in Japan is usable in industry

Page 29: Human Science - TAUS Tokyo Forum 2015


In the Future

• To popularize MT in Japan, we need to enhance communication and cooperation between universities, enterprises, and localization companies.

Human Science will continue to spread knowledge and know-how on utilizing MT engines to popularize MT in Japan.

Page 30: Human Science - TAUS Tokyo Forum 2015


m-[email protected]

www.science.co.jp

+81-3-5321-3111

t-[email protected]