human science - taus tokyo forum 2015
TRANSCRIPT
The Potential for Utilizing Japan-Developed MT Engines for Translation in Industry
Megumi Tokuda Takeyoshi Nakayama
Human Science Co., Ltd. WWW.SCIENCE.CO.JP
Subjects
• Problems with Machine Translation in Japan
• Potential for Utilizing Japan-Developed MT Engines in Industry
• In the Future
• Q&A
Problems with Machine Translation in Japan
I. Statistical MT engines
  - Developed with a focus on European languages
  - Output lacks quality for non-European language pairs
II. General rule of thumb: in Japanese/English translation, statistical MT is not as good as rule-based MT
How can we improve MT quality for Japanese/English pairs?
Engines developed in Japan
Open Source Engines
MT Engines
• JP: SMT with syntax, developed in Japan
• A: SMT with syntax
• B: Moses-based SMT without syntax
• C: RBMT
Sample Project - Overview
Target Document
• User manual (computer system)
• Volume: approx. 300 segments (~3,800 words)
• English > Japanese
Sample Project - Overview
Corpus
• User manual (computer system)
• Volume: 1 million segments (~12 million words)
Evaluation Methods
• BLEU
• RIBES
• Human Assessment
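As a rough illustration of the first metric, the sketch below computes a simplified sentence-level BLEU: the geometric mean of clipped n-gram precisions, multiplied by a brevity penalty. It is not the scoring tool used in the sample project (the slides do not name one). The function name `simple_bleu`, the character-level tokenization, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU (illustrative only)."""
    if not candidate or not reference:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        # Add-one smoothing (an assumption) keeps short segments from scoring 0.
        log_precision_sum += math.log((matched + 1) / (total + 1))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_precision_sum / max_n)


# Character-level tokens (an assumption), since Japanese is written without spaces.
reference = list("新しいアラート定義を作成します。")
identical = simple_bleu(list("新しいアラート定義を作成します。"), reference)
different = simple_bleu(list("新しい鋭敏な定義を作成します。"), reference)
print(identical, different)
```

Corpus-level BLEU, as usually reported, pools n-gram counts over all segments rather than averaging per-segment scores, so this per-segment version is only useful for building intuition.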
Results <BLEU>
[Bar chart: BLEU score per engine for JP, A (SMT), B (SMT, Moses), and C (RBMT). Scores shown: 0.43, 0.32, 0.28, 0.20. The pairing of scores to engines is not preserved in this transcript.]
About RIBES
• A better metric than BLEU for language pairs with very different word orders
• Developed by NTT Communication Science Labs
• www.kecl.ntt.co.jp/icl/lirg/ribes/
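To show why RIBES suits pairs with very different word orders, here is a minimal sketch of a RIBES-style score. RIBES measures how well the candidate preserves the reference word order using a rank correlation over aligned words, then scales that by unigram precision and a brevity penalty. This is not the official NTT implementation. The function name `simple_ribes`, the alignment rule (only tokens that occur exactly once on both sides), the exponents `alpha=0.25` and `beta=0.10`, and the hand-segmented tokens are assumptions for illustration.

```python
import math
from itertools import combinations


def simple_ribes(candidate, reference, alpha=0.25, beta=0.10):
    """Simplified RIBES-style score for one candidate/reference token list pair."""
    if not candidate or not reference:
        return 0.0
    # Align only tokens that occur exactly once on both sides (a simplification).
    ranks = [reference.index(tok) for tok in candidate
             if candidate.count(tok) == 1 and reference.count(tok) == 1]
    if len(ranks) < 2:
        return 0.0  # word order cannot be judged from fewer than two aligned tokens
    # Normalized Kendall's tau: share of aligned token pairs kept in reference order.
    pairs = list(combinations(ranks, 2))
    concordant = sum(1 for a, b in pairs if a < b)
    nkt = concordant / len(pairs)
    # Unigram precision: aligned tokens over candidate length.
    precision = len(ranks) / len(candidate)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return nkt * (precision ** alpha) * (bp ** beta)


# Hand-segmented tokens (an assumption; a real setup would use a tokenizer).
reference = ["XXX式", "を", "直接", "入力", "する", "こと", "も", "でき", "ます", "。"]
reordered = ["入力", "XXX式", "を", "直接", "でき", "ます", "。"]
print(simple_ribes(reference, reference))
print(simple_ribes(reordered, reference))
```

Because the word-order term is computed over all aligned token pairs, a single misplaced word lowers the score in proportion to how many other words it jumps over, which BLEU's local n-gram matching does not capture.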
Results <RIBES>
[Bar chart: RIBES score per engine for JP, A (SMT), B (SMT, Moses), and C (RBMT). Scores shown: 0.79, 0.73, 0.70, 0.63. The pairing of scores to engines is not preserved in this transcript.]
About Human Assessment
Target document
Rated on meaning and accuracy, from better to worse:
1. Perfectly Understandable
2. Fully Understandable
3. Barely Understandable
4. Not Understandable
Results <Human Assessment> - Total -
[Stacked bar chart: percentage of segments (0% to 100%) at each rating, from 1. Perfectly Understandable to 4. Not Understandable, for JP, A (SMT), B (SMT, Moses), and C (RBMT).]
Results <Human Assessment> - less than 20 words -
[Stacked bar chart: percentage of segments (0% to 100%) at each rating, from 1. Perfectly Understandable to 4. Not Understandable, for JP, A (SMT), B (SMT, Moses), and C (RBMT).]
Results <Human Assessment> - 20 words or more -
[Stacked bar chart: percentage of segments (0% to 100%) at each rating, from 1. Perfectly Understandable to 4. Not Understandable, for JP, A (SMT), B (SMT, Moses), and C (RBMT).]
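The three human assessment charts show the share of segments at each rating, overall and split by segment length. A minimal sketch of that tally follows. The function names, the whitespace-based word count of the English source, and the choice of input data are assumptions. The five sample pairs are the JP engine ratings from the comparison examples in this deck, not the full 300-segment data set.

```python
from collections import Counter

RATING_LABELS = {
    1: "Perfectly Understandable",
    2: "Fully Understandable",
    3: "Barely Understandable",
    4: "Not Understandable",
}


def rating_distribution(ratings):
    """Return {rating: percentage of segments} for ratings 1 to 4."""
    counts = Counter(ratings)
    total = len(ratings)
    if total == 0:
        return {r: 0.0 for r in RATING_LABELS}
    return {r: 100.0 * counts[r] / total for r in RATING_LABELS}


def distributions_by_length(assessed, threshold=20):
    """Tally (source_text, rating) pairs overall and by source word count."""
    short = [r for text, r in assessed if len(text.split()) < threshold]
    long_ = [r for text, r in assessed if len(text.split()) >= threshold]
    return {
        "total": rating_distribution([r for _, r in assessed]),
        f"less than {threshold} words": rating_distribution(short),
        f"{threshold} words or more": rating_distribution(long_),
    }


# JP engine ratings from the five comparison examples in this deck.
jp_assessed = [
    ("Creates a new alert definition.", 1),
    ("An integer value in seconds", 1),
    ("You can also input XXX expressions directly.", 1),
    ("Aggregation", 4),
    ("Firefox 4, 5, 7, 8, and later versions.", 4),
]
for group, dist in distributions_by_length(jp_assessed).items():
    print(group, {RATING_LABELS[r]: pct for r, pct in dist.items()})
```

All five sample segments are shorter than 20 words, so the "20 words or more" group is empty here and reports 0% for every rating.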
Comparison Result
• JP is the most effective of the MT engines we investigated
• Engine A, which uses syntax information, ranks second in quality
• Engines B and C have the lowest quality
Good Examples
Comparing Translations (1)
Points: terminology, grammar
Original: Creates a new alert definition.
JP: 新しいアラート定義を作成します。 (1, Perfectly Understandable)
A (SMT): 新規アラートの定義を作成します。 (1, Perfectly Understandable)
B (SMT, Moses): 新しいアラート定義ます. (2, Fully Understandable)
C (RBMT): 新しい鋭敏な定義を作成します。 (3, Barely Understandable)
Human: 新しいアラート定義を作成します。 (not rated)
Comparing Translations (2)
Points: common expressions in corpus
Original: An integer value in seconds
JP: 整数値(秒) (1, Perfectly Understandable)
A (SMT): (秒) は、整数値 (3, Barely Understandable)
B (SMT, Moses): 秒単位の整数値 (2, Fully Understandable)
C (RBMT): 数秒で整数値 (4, Not Understandable)
Human: 整数値(秒) (not rated)
Comparing Translations (3)
Points: dependency, word order
Original: You can also input XXX expressions directly.
JP: XXX式を直接入力することもできます。 (1, Perfectly Understandable)
A (SMT): 入力 XXX式を直接できます。 (3, Barely Understandable)
B (SMT, Moses): また, XXX式を直接ます. (4, Not Understandable)
C (RBMT): さらに、XXX式を直接入力できます。 (1, Perfectly Understandable)
Human: XXX式を直接入力することもできます。 (not rated)
Bad Examples
Comparing Translations (4)
Points: terminology not in corpus
Original: Aggregation
JP: Aggregation (4, Not Understandable)
A (SMT): 集計 (1, Perfectly Understandable)
B (SMT, Moses): 集計 (1, Perfectly Understandable)
C (RBMT): Aggregation (4, Not Understandable)
Human: 集計 (not rated)
Comparing Translations (5)
Points: common expressions not in corpus
Original: Firefox 4, 5, 7, 8, and later versions.
JP: Firefox \ 4 、5 、7 、8 、および後でバージョン| Emai ≫」を選択します。 (4, Not Understandable)
A (SMT): Firefox 4、5、7、8、およびそれ以降のバージョン。 (1, Perfectly Understandable)
B (SMT, Moses): Firefox 4, 5, 7, 8 それ以上以上のの. (3, Barely Understandable)
C (RBMT): Firefox 4、5、7および8、および後のバージョン。 (2, Fully Understandable)
Human: Firefox 4、5、7、8、およびそれ以降のバージョン。 (not rated)
Issues
• Client-specific style specifications
  - abcあいう / abc あいう (spacing between characters)
  - ホームページ / ホーム ページ / ホーム・ページ (“home page”)
• TMX file input and output
• No user-friendly interface
  - Command-line
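One way to handle the client-specific style issue is a post-editing pass over the MT output. The sketch below is an assumption about how such a pass could look, not part of any engine discussed here. `normalize_spacing` enforces one spacing convention between half-width alphanumerics and Japanese characters, and `apply_term_style` rewrites spelling variants of a term to the client's preferred form. Both function names and the character ranges are hypothetical choices for illustration.

```python
import re

# Hiragana, katakana, and common kanji; an approximate range for illustration.
JA = r"\u3040-\u30ff\u4e00-\u9fff"
_LATIN_THEN_JA = re.compile(rf"([A-Za-z0-9]) ?([{JA}])")
_JA_THEN_LATIN = re.compile(rf"([{JA}]) ?([A-Za-z0-9])")


def normalize_spacing(text, spaced=True):
    """Insert (spaced=True) or remove (spaced=False) the space at
    boundaries between half-width alphanumerics and Japanese characters."""
    sep = " " if spaced else ""
    text = _LATIN_THEN_JA.sub(rf"\1{sep}\2", text)
    return _JA_THEN_LATIN.sub(rf"\1{sep}\2", text)


def apply_term_style(text, variants, preferred):
    """Replace each listed spelling variant with the client's preferred form."""
    for variant in variants:
        text = text.replace(variant, preferred)
    return text


print(normalize_spacing("abcあいう", spaced=True))    # client wants a space
print(normalize_spacing("abc あいう", spaced=False))  # client wants no space
print(apply_term_style("ホーム・ページ", ["ホーム ページ", "ホーム・ページ"], "ホームページ"))
```

Keeping the convention as a parameter lets the same pass serve clients with opposite style guides.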
Issues
• Tags
  - <emphasis></emphasis>
• Entities
  - &ProductName;
• User interfaces
  - Press OK button
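A common workaround for the tag and entity issue is to mask markup before sending a segment to the engine and restore it afterward. The sketch below illustrates that idea only. The `__PH0__` placeholder format and the function names are hypothetical, and whether a given engine passes such placeholders through unchanged is an assumption that would need testing.

```python
import re

# Matches XML-style tags such as <emphasis> and entities such as &ProductName;
_MARKUP = re.compile(r"</?[A-Za-z][^<>]*>|&[A-Za-z][A-Za-z0-9._-]*;")


def protect_markup(segment):
    """Replace tags and entities with numbered placeholders.

    Returns the masked text and the list of original markup strings.
    """
    mapping = []

    def _mask(match):
        mapping.append(match.group(0))
        return f"__PH{len(mapping) - 1}__"

    return _MARKUP.sub(_mask, segment), mapping


def restore_markup(translated, mapping):
    """Put the original tags and entities back in place of the placeholders."""
    for i, original in enumerate(mapping):
        translated = translated.replace(f"__PH{i}__", original)
    return translated


source = "Press <emphasis>OK</emphasis> to start &ProductName;."
masked, mapping = protect_markup(source)
print(masked)
print(restore_markup(masked, mapping))
```

Restoring by placeholder number rather than by position means the markup survives even when the engine reorders the sentence, which is the normal case for English to Japanese.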
Wrapping up results
• MT engines that use syntax information can produce sufficient quality for English-Japanese translation
• The new MT engine developed in Japan is usable in industry
In the Future
• To popularize MT in Japan, we need stronger communication and cooperation among universities, enterprises, and localization companies.
• Human Science will continue to share knowledge and know-how on using MT engines in order to popularize MT in Japan.
www.science.co.jp
+81-3-5321-3111