atr-trek - taus tokyo forum 2015



ATR-Trek Co.,Ltd.

TAUS Executive Forum 2015

Challenges for Speech Translation Business Applications

The “Factory” Case

April 10th, 2015

Overview

FueTrek Group
• Commercialization of ATR speech translation technology

Speech translation business applications
• Use case: “Factory” (Japanese ⇔ Chinese)
• Speech recognition (ASR) / Machine translation (MT)
  ・Resource acquisition
  ・Model training
  ・System evaluation
• Real environment challenges

Future efforts
• Towards language-barrier-free environments

FueTrek Group

Company Name: FueTrek Co., Ltd.
Establishment: April 17th, 2000
Location: Osaka (main office), Tokyo, Fukuoka
Capital: 716 million yen (as of January 1st, 2015)
Sales / Profit: 2.42 billion yen / 430 million yen (as of March 2014)
Major Shareholders: Hideyuki Fujiki (21.8%), NTT Docomo (6.1%), Trust & Custody Services Bank (2.9%)
Employees: 249 (FueTrek: 92, ATR-Trek: 28, SuperOne: 15, Media: 114) (as of April 1st, 2015)

• Development of smartphone/tablet applications
• Machine/human translation services
• Development of speech recognition, machine translation, and dialog applications
• Development of high-precision machine translation technology and software (investment ratio: 19%)

Speech Translation Technology

Speech Translation System: large-scale corpus-based speech dialog translation technology

Multilingual Speech Recognition (ASR)
• Acoustic environment (noise, reverberation)
• Speech style
• Multilingual (Japanese, English, Chinese, ...)

Spoken Language Translation (MT)
• Dialog speech translation
• Wide topic coverage
• Multilingual (23 langs) (ar, da, de, en, es, fr, hi, hu, id, it, ja, ko, mn, ms, nl, pt, ptb, ru, si, th, tl, ur, vi, zh)

Multilingual Speech Synthesis (TTS)
• High-quality synthesis
• Natural voice
• Multilingual (Japanese, English, Chinese, ...)

Corpus resources
• Multi-speaker speech
• Input language text corpus
• Bilingual corpus
• Target language text corpus
• Various prosodic patterns
• Long-utterance speech

“Shabete Honyaku” Speech-Translation Service

Pre-installed innovative services that can be used abroad
[Service start: November 2007]

Mobile phone manufacturers (3~5 companies)
FOMA905i (released November 2007)
■ ASR front-end adaptation (voice recognition models)
■ ALL IN world mobile phone (3G+GSM)

ATR-Trek
■ development of ASR/MT technology (resources, software)
■ mobile phone market + CP service
■ functionality: speech recognition in noisy environments

■ Specification:
Terminal: Android 2.2 ~ 4.4
Languages: Japanese-English, Japanese-Chinese
Configuration: ASR, SMT, TTS
Protocol: http, 3G/LTE/WiFi
Installation: Distributed ASR / on-premise (JA:16, EN:8, ZH:8 x 2 servers)
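
The "Configuration: ASR, SMT, TTS" line describes a cascade: recognized text is translated, and the translation is synthesized. The following is a minimal sketch of such a cascade, not the actual service code. The class name `SpeechTranslator` and its three callables are hypothetical stand-ins for the real engines, which the slides do not describe at the API level.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SpeechTranslator:
    """Hypothetical cascade of three components: ASR -> MT -> TTS."""
    recognize: Callable[[bytes], str]    # ASR: audio in, source-language text out
    translate: Callable[[str], str]      # MT: source text in, target text out
    synthesize: Callable[[str], bytes]   # TTS: target text in, audio out

    def run(self, audio: bytes) -> Tuple[str, str, bytes]:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        target_audio = self.synthesize(target_text)
        # Return the intermediate texts too, so both speakers can check them.
        return source_text, target_text, target_audio


# Stub components for illustration only; a real system would call the engines.
phrase_table = {"ずれてるのですか。": "Is it dislocated?"}
demo = SpeechTranslator(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: phrase_table.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)
result = demo.run("ずれてるのですか。".encode("utf-8"))
```

Keeping the three stages behind plain callables makes it possible to swap one engine (for example a domain-adapted MT model) without touching the others.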

Technology / Service Evolution

Timeline: 2007 2008 2009 2010 2011 2012

Technology / Service
• Distributed Speech Recognition (DSR)
  ・Shabete Honyaku
• Local Speech Recognition (LSR)
  ・voice command input
• Hybrid Speech Recognition (DSR+LSR)
  ・voice input mail
  ・voice quick search
• New UI = realization of voice dialog (ASR+TTS)
  ・quick audio launch

Smartphone Services
・Shabete Concierge
・Shabete Honyaku for A
・Shabete Easy Operation

• Large-vocabulary ASR system
• Scenario-based dialog system
• Voice agent service
• ATR-Trek Establishment

Nationwide Speech Translation Demonstration System (2009)

+ Funded by Ministry of Internal Affairs and Communications
+ 5 regions in Japan, local tourist facilities, over 300 locations

~ System in Action ~

Region    Locations                        Period      Utterances  Partners
Hokkaido  Sapporo / Furano / Shiretoko     12/28-2/22  98,830      Dokon / NEC Soft / NICT
Kanto     Yamanaka / Isawa / Kofu          1/25-2/21   40,301      JTB GMT / NEC / NICT
Central   Kanazawa / Noto / Ise / Toba     1/5-2/28    37,692      JTB Central / NICT
Kansai    Osaka / Nara / Hiroshima         12/14-2/28  40,703      ATR-Trek / NICT
Kyushu    Aso / Hakata / Nagasaki          12/20-2/28  58,263      Kumamoto software / Kyushu Industrial Transportation / JTB / NICT

Effects of ASR/MT model update (acceptance rate of speech translation results)

[Figure: bar chart of acceptance rate (axis 0%-70%) for JE (Japanese-English) and JC (Japanese-Chinese), comparing three systems: Nationwide Baseline System, Baseline + Proper-Noun Dic, Resource-Extended System]

The “Factory” Case

Sumitomo Rubber Industries (SRI)

Example dialog (Japanese ⇔ Chinese):
Japanese: 投入コンベアはどうするの。 [What are you going to do with the charging conveyer?]
Chinese: 最初设置时,中心没有好好对准。 [When we first installed it, the center did not fit properly.]
Japanese: ずれてるのですか。 [Is it dislocated?]

New Challenges

Factory environment
• Work-place safety management → hands/eye/ear-free
• Manufacturing noise → equipment, announcements, etc.

Use scene
• Diverse topics (daily conversation ~ technical discussions) → multi-domain
• Communication between supervisor and local workers → work-place, face-to-face, bilingual language support

Resources
• Text data containing technical terms in a new domain
• Real environment speech data (actual work-related utterances)
• Audio data of background noise at the work-place

↓ adaptation of speech translation technology

How to build a “Factory” Speech Translation System

Domain adaptation
• Achieve performance improvement on top of the existing speech translation system

Resource acquisition process:
• Collect domain-relevant language data
  → vocabulary (terminology), domain-specific wording/phrases
• Create bilingual corpus
  → on-site utterances, company documents, keyword-based crawling
• Collect factory “noise”
  → non-language audio data
• Train speech translation models
  → acoustic model (ASR)
  → translation model (MT)
  → language model (ASR@input language, MT@target language)

How to build a “Factory” Speech Translation System

Phase / Todo / Time

Step 1: Feasibility Validation (translation (MT) only), 1 month
・add (in-domain) “vocabulary” + SRI-internal terminology
・evaluate translation quality
・check effects of dictionary extension

Step 2: System Adaptation (speech translation (ASR+MT)), 6 months
・determine the (minimal) amount of in-domain speech and text resources needed for domain adaptation, and collect them
・retrain statistical models using collected in-domain resources
・evaluate speech translation quality

Step 3: Practical Use (server-based speech translation system), 4 months
・integrate domain-adapted speech translation engine into server application
・set up speech translation system on-site

Step 1: Terminology Effects

Engine:
• baseline engine (BASE): travel domain (bilingual corpus: 540k sentences)
• term-extended engine (TDIC): BASE & tire-domain dictionary (1,758 entries)

Evaluation:
• Japanese→Chinese, TIRE domain, 639 sentences

Grade: S) native, A) correct, B) fair, C) acceptable, D) nonsense

Complete test set (share of sentences per grade, %)
Grade   BASE   TDIC
S       23.5   23.8
A       13.8   15.5
B        7.5   10.2
C        9.9   13.1
D       45.3   37.4
Change in D (TDIC vs. BASE): -7.9 points

Sentences with tire-domain terminology (share of sentences per grade, %)
Grade   BASE   TDIC
S        3.4    5.1
A        0.8    9.3
B        1.7   15.3
C        7.6   24.6
D       86.5   45.7
Change in D (TDIC vs. BASE): -40.8 points

☺ Adding in-domain terminology helps
☹ Coverage of terminology is insufficient
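
The two reductions quoted on the slide (7.9 and 40.8) can be reproduced from the BASE and TDIC grade shares. The sketch below assumes the chart values are listed in S-to-D order and that the 86.5 → 45.7 chart is the terminology subset. Treating "C or better" as the acceptable share is also an assumption based on the grade names. The function names are illustrative.

```python
GRADES = ["S", "A", "B", "C", "D"]

# Grade shares (%) as read from the Step 1 charts, in S, A, B, C, D order (assumed).
complete_test_set = {
    "BASE": [23.5, 13.8, 7.5, 9.9, 45.3],
    "TDIC": [23.8, 15.5, 10.2, 13.1, 37.4],
}
terminology_sentences = {
    "BASE": [3.4, 0.8, 1.7, 7.6, 86.5],
    "TDIC": [5.1, 9.3, 15.3, 24.6, 45.7],
}


def nonsense_share(shares):
    """Share of sentences graded D (nonsense)."""
    return shares[GRADES.index("D")]


def acceptable_share(shares):
    """Share of sentences graded C or better (S + A + B + C)."""
    return round(sum(shares[:GRADES.index("D")]), 1)


def nonsense_reduction(chart, baseline="BASE", system="TDIC"):
    """Drop in the D share, in percentage points, from baseline to system."""
    return round(nonsense_share(chart[baseline]) - nonsense_share(chart[system]), 1)


for name, chart in [("complete test set", complete_test_set),
                    ("terminology sentences", terminology_sentences)]:
    print(name, "D reduction:", nonsense_reduction(chart),
          "acceptable BASE/TDIC:",
          acceptable_share(chart["BASE"]), acceptable_share(chart["TDIC"]))
```

Each column sums to 100, which is a quick sanity check that the values were assigned to the right engine.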

Step 2: Resource Acquisition

Data                Type              Amount
Dictionary          tire terminology     20,387
Bilingual corpus    work-place            8,635
                    in-house             26,333
                    crawling             32,326
                    total                67,294
Monolingual corpus  crawling          1,351,058

1) In-house documents
・create tire terminology dictionary (OCR, electronic files)
・extract phrasings from technical documents (electronic files)

2) Monolingual data through web crawling
・download webpages of tire manufacturers and related companies
・search and download pages including tire terminology keywords
・clean crawling data and select in-domain sentences

3) Data collection of actual work utterances
・record business trip conversations abroad
・transcribe recorded speech data
・check/rewrite inaudible voice and technical terminology

4) Human translation of resources 1-3
・text translation (Japanese→Chinese)
・bilingual check (sampling)

5) Monolingual in-domain corpus
・train an in-domain seed language model and select sentences close to the tire domain based on perplexity scores
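
The perplexity-based selection in item 5 can be illustrated with a toy example. This is a sketch under simplifying assumptions, not the authors' setup: it uses an add-one smoothed unigram model and whitespace tokenization, whereas Japanese or Chinese text would first need word segmentation and a real system would likely use a higher-order language model. The sentences, the threshold, and all function names are made up for illustration.

```python
import math
from collections import Counter


def train_unigram_lm(sentences):
    """Return (counts, total, vocab_size) for an add-one smoothed unigram model."""
    counts = Counter(tok for s in sentences for tok in s.split())
    # One extra vocabulary slot stands in for all unseen tokens.
    return counts, sum(counts.values()), len(counts) + 1


def perplexity(sentence, lm):
    """Per-token perplexity of a sentence under the seed model."""
    counts, total, vocab_size = lm
    tokens = sentence.split()
    if not tokens:
        return float("inf")
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return math.exp(-log_prob / len(tokens))


def select_in_domain(candidates, lm, threshold):
    """Keep candidate sentences whose perplexity is at or below the threshold."""
    return [s for s in candidates if perplexity(s, lm) <= threshold]


# Toy in-domain seed corpus and crawled candidates (illustrative only).
seed = [
    "the tire tread is worn",
    "check the tire pressure",
    "the conveyor feeds the tire mold",
]
crawled = [
    "check the tread of the tire",
    "the hotel breakfast starts at seven",
]

lm = train_unigram_lm(seed)
ranked = sorted(crawled, key=lambda s: perplexity(s, lm))
selected = select_in_domain(crawled, lm, threshold=12.0)
```

Lower perplexity means the seed model finds the sentence less surprising, so sorting by perplexity ranks crawled sentences by closeness to the seed domain. The threshold would be tuned on held-out in-domain data.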

Step 2: Domain Adaptation Effects

Engine:
• baseline engine (BASE): travel domain (bilingual corpus: 540k sentences)
• in-domain engine (TIRE): tire domain (bilingual corpus: 67k sentences, dict: 20k)
• combined engine (TIRE+BASE): travel + tire domain (bilingual corpus: 607k sentences, dict: 20k)

Evaluation:
• work-place utterances, 500 sentences

Grade: S) native, A) correct, B) fair, C) acceptable, D) nonsense

Japanese→Chinese (share of sentences per grade, %)
Grade   BASE   TIRE   TIRE+BASE
S       20.1   25.8   27.7
A       10.9   17.9   20.6
B       14.3   19.2   19.3
C       24.0   20.7   20.7
D       30.7   16.4   11.7
Change in D vs. BASE: TIRE -14.3 points, TIRE+BASE -19.0 points

Chinese→Japanese (share of sentences per grade, %)
Grade   BASE   TIRE   TIRE+BASE
S       14.2   16.7   23.9
A        9.1   12.5   13.4
B        7.1   10.8   15.6
C       12.2   14.0   19.5
D       57.4   46.0   27.6
Change in D vs. BASE: TIRE -11.4 points, TIRE+BASE -29.8 points

Step 3: Speech Translation Server

System configuration: multi-point connection network

Step 3: Factory Noise Adaptation

Noise adaptation effects
• In noisy environments, the recognition rate improved, especially for large-noise scenarios (0~5 dB).

[Figure: recognition rate (axis 0%-100%), AVE (March) and AVE (August), without vs. with noise adaptation]
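
The slides do not say how the collected factory noise was used for adaptation. One common approach is to mix recorded noise into clean speech at controlled signal-to-noise ratios and retrain the acoustic model on the result. The sketch below shows only that mixing step, assuming the 0~5 dB figures denote SNR. The function names and the synthetic signals are illustrative.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise RMS ratio equals snr_db."""
    if len(noise) < len(speech):
        # Loop the noise recording when it is shorter than the utterance.
        noise = noise * (len(speech) // len(noise) + 1)
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise recording is silent")
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a 440 Hz tone as "speech", uniform noise as "factory noise".
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(4000)]
noisy = mix_at_snr(speech, noise, snr_db=5)
```

Generating copies of each training utterance at several SNRs (for example 0, 5, 10 dB) would expose the model to the low-SNR conditions where the slide reports the largest gains.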

“Factory” Challenges

On-site maintenance
• Practical use at factory in China started January 2015
• Runtime evaluation
  ・server (overseas network) speed and stability
  ・user satisfaction
  ・analysis of server log files on a regular basis in order to improve system performance

Portability
• Multilingual → other overseas factories (Turkey, etc.)
• Multi-domain → related fields (chemical, electronic equipment)

Integration with wearable technology
• Realization of hands/eye/ear-free user interfaces

Global Communication Plan (GCP)

Multilingual translation project of the Ministry of Internal Affairs and Communications
・promotion of “language-barrier-free” 2020 Olympics
・an industry-academia-government collaboration
・national strategy zones (hospitals, commercial facilities, etc. in tourist areas)

[Figure: use scenes for automatic translation: telework, medical, disaster, hospital, sightseeing, communications (automatic translation calls), meetings (automatic translation meetings), shopping, restaurant, travel, in town, taxi, city hall]

FueTrek Group Characteristics

FueTrek Group is unique in the world because it provides one-stop multilingual business solutions for speech recognition, human translation, and machine translation.

Speech Recognition / Human Translation / Machine Translation

FueTrek Group Strengths

Strong outside (government) network
• Participation in world’s top-performing MT venture 【Mirai Translate】
• Participation in Ministry of Internal Affairs and Communications GCP
• Joint ASR/MT research with NICT/NAIST

Expertise in ASR/MT fields
• Subsidiary translation company (MEDIA Research Inc.) providing human/machine translation services
• Numerous engineers specialized in the field of ASR and MT
• Commercialization of speech translation services since 2007

New Markets for Speech Translation Technology

Example exchanges:

Indonesian: Tepat berada di pintu keluar stasiun Kyoto
Japanese: 京都駅を出てすぐです。 [It is right outside Kyoto Station.]

Japanese: あの映画見た? [Did you see that movie?]
English: Yes, it was very interesting.

Japanese: これどこに運ぶ? [Where should I carry this?]
Chinese: 这个搬到哪里?

Japanese: もっと小さいサイズはありますか。 [Do you have a smaller size?]
Thai: มีไซส์เล็กกว่านี้ไหมคะ
Japanese: はい、Sサイズをお持ちします。 [Yes, I will bring you a size S.]
Thai: มีค่ะ จะไปหยิบไซส์ S มาให้นะคะ