atr-trek - taus tokyo forum 2015
TRANSCRIPT
ATR-Trek Co.,Ltd.
TAUS Executive Forum 2015
Challenges for Speech Translation Business Applications
- The “Factory” Case -
April 10th, 2015
Overview
FueTrek Group
・Commercialization of ATR speech translation technology
Speech translation business applications
・Use case: “Factory” (Japanese ⇔ Chinese)
・Speech recognition (ASR) / Machine translation (MT)
  ・Resource acquisition
  ・Model training
  ・System evaluation
・Real environment challenges
Future efforts
・Towards language-barrier free environments
FueTrek Group
Company Name: FueTrek Co., Ltd.
Establishment: April 17th, 2000
Location: Osaka (main office), Tokyo, Fukuoka
Capital: 716 million yen (as of January 1st, 2015)
Sales / Profit: 2.42 billion yen / 430 million yen (as of March 2014)
Major Shareholders: Hideyuki Fujiki (21.8%), NTT Docomo (6.1%), Trust & Custody Services Bank (2.9%)
Employees: 249 (FueTrek: 92, ATR-Trek: 28, SuperOne: 15, Media: 114) (as of April 1st, 2015)
Development of smartphone/tablet applications
Machine/human translation services
Development of speech recognition, machine translation, and dialog applications
Investment ratio : 19%
Development of a high-precision machine translation technology and software
Speech Translation Technology
Speech Translation System: large-scale corpus-based speech dialog translation technology

Multilingual Speech Recognition (ASR)
• Acoustic environment (noise, reverberation)
• Speech style
• Multilingual (Japanese, English, Chinese, ...)

Spoken Language Translation (MT)
• Dialog speech translation
• Wide topic coverage
• Multilingual (23 langs) (ar, da, de, en, es, fr, hi, hu, id, it, ja, ko, mn, ms, nl, pt, ptb, ru, si, th, tl, ur, vi, zh)

Multilingual Speech Synthesis (TTS)
• High-quality synthesis
• Natural voice
• Multilingual (Japanese, English, Chinese, ...)

Corpora shown in the diagram:
• Multi-speaker speech
• Input language text corpus
• Bilingual corpus
• Target language text corpus
• Various prosodic patterns
• Long-utterance speech
“Shabete Honyaku” Speech-Translation Service
Mobile phone manufacturers (3~5 companies)
■ ASR front-end adaptation (voice recognition models)
FOMA905i (released November 2007)
■ ALL IN world mobile phone (3G+GSM)
ATR-Trek
■ development of ASR/MT technology (resources, software)
■ mobile phone market + CP service
■ functionality: speech recognition in noisy environments
Pre-installed innovative services that can be used abroad
[Service start: November 2007]
■ Specification:
Terminal: Android 2.2 〜 4.4
Languages: Japanese-English, Japanese-Chinese
Configuration: ASR, SMT, TTS
Protocol: http, 3G/LTE/WiFi
Installation: Distributed ASR / on-premise (JA:16, EN:8, ZH:8 x 2 servers)
Technology / Service Evolution
[Timeline chart: Technology / Service; year axis 2007, 2008, 2009, 2010, 2011, 2012]
Distributed Speech Recognition (DSR)
・Shabete Honyaku
Local Speech Recognition (LSR)
・voice command input
Hybrid Speech Recognition (DSR+LSR)
・voice input mail
・voice quick search
New UI = realization of voice dialog (ASR+TTS)
・quick audio launch
Smartphone Services
・Shabete Concierge
・Shabete Honyaku for A
・Shabete Easy Operation
Other chart labels: Large-vocabulary ASR system; Scenario-based dialog system; Voice agent service; ATR-Trek Establishment
Nationwide Speech Translation Demonstration System (2009)
+ Funded by Ministry of Internal Affairs and Communications
+ 5 regions in Japan, local tourist facilities, over 300 locations

~ System in Action ~
Hokkaido: Sapporo / Furano / Shiretoko; 12/28-2/22; 98,830 utterances; Dokon / NEC Soft / NICT
Kanto: Yamanaka / Isawa / Kofu; 1/25-2/21; 40,301 utterances; JTB GMT / NEC / NICT
Central: Kanazawa / Noto / Ise / Toba; 1/5-2/28; 37,692 utterances; JTB Central / NICT
Kansai: Osaka / Nara / Hiroshima; 12/14-2/28; 40,703 utterances; ATR-Trek / NICT
Kyushu: Aso / Hakata / Nagasaki; 12/20-2/28; 58,263 utterances; Kumamoto software / Kyushu Industrial Transportation / JTB / NICT

Effects of ASR/MT model update (acceptance rate of speech translation results)
[Bar chart: acceptance rate, axis 0% to 70%, for JE (Japanese-English) and JC (Japanese-Chinese)]
Systems compared:
・Nationwide Baseline System
・Baseline + Proper-Noun Dic (proper nouns and named entities added)
・Resource-Extended System (models updated with real usage data)
The “Factory” Case
投入コンベアはどうするの。
[What are you going to do with the charging conveyer?]
最初设置时,中心没有好好对准。
[When we first installed it, the center did not fit properly.]
ずれてるのですか。
[Is it dislocated?]
Sumitomo Rubber Industries (SRI)
New Challenges
Factory environment
・Work-place safety management → hands/eye/ear-free
・Manufacturing noise → equipment, announcements, etc.
Use scene
・Diverse topics (daily conversation ~ technical discussions) → multi-domain
・Communication between supervisor and local workers → work-place, face-to-face, bilingual language support
Resources
・Text data containing technical terms in a new domain
・Real environment speech data (actual work-related utterances)
・Audio data of background noise at the work-place
↓
adaptation of speech translation technology
How to build a “Factory” Speech Translation System
Domain adaptation
・Achieve performance improvement on top of existing speech translation system
Resource acquisition process:
・Collect domain-relevant language data → vocabulary (terminology), domain-specific wording/phrases
・Create bilingual corpus → on-site utterances, company documents, keyword-based crawling
・Collect factory “noise” → non-language audio data
・Train speech translation models
  → acoustic model (ASR)
  → translation model (MT)
  → language model (ASR@input language, MT@target language)
How to build a “Factory” Speech Translation System
Phase / Todo / Time

Step1: Feasibility Validation (translation (MT) only)
・add (in-domain) “vocabulary” + SRI-internal terminology
・evaluate translation quality
・check effects of dictionary extension
Time: 1 month

Step2: System Adaptation (speech translation (ASR+MT))
・determine (minimal) amount and collect in-domain speech and text resources for domain adaptation
・retrain statistical models using collected in-domain resources
・evaluate speech translation quality
Time: 6 months

Step3: Practical Use (server-based speech translation system)
・integrate domain-adapted speech translation engine into server application
・set up speech translation system on-site
Time: 4 months
Step1: Terminology Effects
Engine:
・baseline engine (BASE): travel domain (bilingual corpus: 540k sentences)
・term-extended engine (TDIC): BASE & tire-domain dictionary (1,758 entries)
Evaluation:
・Japanese→Chinese, TIRE domain, 639 sentences

Grade: S) native, A) correct, B) fair, C) acceptable, D) nonsense

Grade distribution in % (complete test set)
Grade   BASE   TDIC
S       23.5   23.8
A       13.8   15.5
B        7.5   10.2
C        9.9   13.1
D       45.3   37.4
Grade D: -7.9%
☹ Coverage of terminology is insufficient

Grade distribution in % (sentences with tire-domain terminology)
Grade   BASE   TDIC
S        3.4    5.1
A        0.8    9.3
B        1.7   15.3
C        7.6   24.6
D       86.5   45.7
Grade D: -40.8%
☺ Adding in-domain terminology helps
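The second result is restricted to test sentences that contain tire-domain terminology. The presentation does not say how that subset was formed; the sketch below shows one plausible way, assuming plain substring matching of dictionary terms against each sentence. The function names and the toy terms and sentences are hypothetical and are not taken from the SRI dictionary or test set (only the first two sentences reuse the example utterances shown earlier).

```python
def split_by_terminology(sentences, terms):
    """Split sentences into those containing at least one dictionary term and the rest."""
    with_terms, without_terms = [], []
    for sentence in sentences:
        # Substring matching avoids the need for a word segmenter on Japanese text.
        if any(term in sentence for term in terms):
            with_terms.append(sentence)
        else:
            without_terms.append(sentence)
    return with_terms, without_terms


def coverage(sentences, terms):
    """Fraction of sentences that contain at least one dictionary term."""
    if not sentences:
        return 0.0
    with_terms, _ = split_by_terminology(sentences, terms)
    return len(with_terms) / len(sentences)


# Toy data for illustration only (hypothetical dictionary and test set).
toy_terms = ["コンベア", "トレッド"]
toy_test_set = [
    "投入コンベアはどうするの。",
    "ずれてるのですか。",
    "トレッドの幅を確認してください。",
    "明日は何時に集合ですか。",
]

with_terms, without_terms = split_by_terminology(toy_test_set, toy_terms)
print(len(with_terms), len(without_terms), coverage(toy_test_set, toy_terms))
```

Evaluating the two subsets separately, as the slide does, makes it visible whether a dictionary extension helps where it applies even when its effect on the whole test set is small.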
Step2: Resource Acquisition

Data Type            Source                                                         Amount
Dictionary           tire terminology                                               20,387
Bilingual Corpus     work-place (8,635), in-house (26,333), crawling (32,326)       67,294
Monolingual Corpus   crawling                                                       1,351,058

1) In-house documents
・create tire terminology dictionary (OCR, electronic files)
・extract phrasings from technical documents (electronic files)
2) Monolingual data through web crawling
・download webpages of tire manufacturers and related companies
・search and download pages including tire terminology keywords
・clean crawling data and select in-domain sentences
3) Data collection of actual work utterances
・record business trip conversations abroad
・transcribe recorded speech data
・check/rewrite inaudible voice and technical terminology
4) Human translation of the resources from 1) to 3)
・text translation (Japanese→Chinese)
・bilingual check (sampling)
5) Monolingual in-domain corpus
・train in-domain seed language model and select sentences close to tire-domain based on perplexity scores
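Perplexity-based selection as described in 5) can be sketched as follows. This is a minimal illustration, assuming a unigram seed model with add-one smoothing and whitespace tokenization (Japanese or Chinese text would first need a word segmenter). The presentation does not specify the language model type or the cutoff, so `train_unigram`, `perplexity`, `select_in_domain`, the toy sentences and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Train an add-one smoothed unigram model on tokenized in-domain seed sentences."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab_size)

    return prob


def perplexity(prob, sentence):
    """Per-token perplexity of one tokenized sentence under the seed model."""
    if not sentence:
        return float("inf")
    log_sum = sum(math.log(prob(tok)) for tok in sentence)
    return math.exp(-log_sum / len(sentence))


def select_in_domain(prob, candidates, max_perplexity):
    """Keep crawled sentences whose perplexity is at or below the cutoff."""
    return [s for s in candidates if perplexity(prob, s) <= max_perplexity]


# Toy seed corpus and crawled candidates (illustrative only).
seed = [
    "the tire tread is worn".split(),
    "check the tire pressure".split(),
    "the conveyor feeds the tire mold".split(),
]
crawled = [
    "check the tread of the tire".split(),
    "book a hotel room near the beach".split(),
]

model = train_unigram(seed)
for sent in crawled:
    print(" ".join(sent), round(perplexity(model, sent), 1))

# The cutoff is an arbitrary value chosen for this toy data.
selected = select_in_domain(model, crawled, max_perplexity=12.0)
print(selected)
```

Sentences that share vocabulary with the seed corpus score a lower perplexity and are kept; in practice a higher-order n-gram model and a cutoff tuned on held-out in-domain text would likely be used.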
Step2: Domain Adaptation Effects
Engine:
・baseline engine (BASE): travel domain (bilingual corpus: 540k sentences)
・in-domain engine (TIRE): tire domain (bilingual corpus: 67k sentences, dict: 20k)
・combined engine (TIRE+BASE): travel+tire domain (bilingual corpus: 607k sentences, dict: 20k)
Evaluation:
・work-place utterances, 500 sentences

Grade: S) native, A) correct, B) fair, C) acceptable, D) nonsense

Grade distribution in %, Japanese→Chinese
Grade   BASE   TIRE   TIRE+BASE
S       20.1   25.8   27.7
A       10.9   17.9   20.6
B       14.3   19.2   19.3
C       24.0   20.7   20.7
D       30.7   16.4   11.7
Grade D vs. BASE: -14.3% (TIRE), -19.0% (TIRE+BASE)

Grade distribution in %, Chinese→Japanese
Grade   BASE   TIRE   TIRE+BASE
S       14.2   16.7   23.9
A        9.1   12.5   13.4
B        7.1   10.8   15.6
C       12.2   14.0   19.5
D       57.4   46.0   27.6
Grade D vs. BASE: -11.4% (TIRE), -29.8% (TIRE+BASE)
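The reductions quoted for grade D can be re-derived from the chart values. A minimal sketch, assuming the percentages printed on the stacked bars are read in grade order S to D for each engine (an assumption that is consistent with each bar summing to 100%); the dictionary layout and function names are illustrative only.

```python
GRADES = ["S", "A", "B", "C", "D"]

# Percent shares per grade (S, A, B, C, D); assumed reading order of the chart labels.
STEP2 = {
    "Japanese->Chinese": {
        "BASE": [20.1, 10.9, 14.3, 24.0, 30.7],
        "TIRE": [25.8, 17.9, 19.2, 20.7, 16.4],
        "TIRE+BASE": [27.7, 20.6, 19.3, 20.7, 11.7],
    },
    "Chinese->Japanese": {
        "BASE": [14.2, 9.1, 7.1, 12.2, 57.4],
        "TIRE": [16.7, 12.5, 10.8, 14.0, 46.0],
        "TIRE+BASE": [23.9, 13.4, 15.6, 19.5, 27.6],
    },
}


def share_at_or_above(shares, grade):
    """Cumulative share of outputs graded `grade` or better."""
    return round(sum(shares[: GRADES.index(grade) + 1]), 1)


def d_change_vs_base(engines):
    """Change of the grade-D share relative to BASE, in percentage points."""
    base_d = engines["BASE"][-1]
    return {
        name: round(shares[-1] - base_d, 1)
        for name, shares in engines.items()
        if name != "BASE"
    }


for direction, engines in STEP2.items():
    for name, shares in engines.items():
        # Each stacked bar should add up to 100%.
        assert abs(sum(shares) - 100.0) < 0.05, (direction, name)
    print(direction, d_change_vs_base(engines))
    print("  C or better:", {n: share_at_or_above(s, "C") for n, s in engines.items()})
```

Read this way, the figures are differences in percentage points of the grade-D share, and the combined engine gives the largest reduction in both directions.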
Step3: Speech Translation Server
System configuration: multi-point connection network
Step3: Factory Noise Adaptation
Noise adaptation effects
・In noisy environments, the recognition rate improved, especially in high-noise scenarios (0~5 dB).
[Bar chart: recognition rate, axis 0% to 100%; labels AVE (March) and AVE (August); legend: w/o noise adaptation, with noise adaptation]
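The presentation does not describe how the collected factory noise was used for adaptation. One common approach is to mix recorded noise into clean training speech at chosen signal-to-noise ratios and retrain the acoustic model on the result. The sketch below illustrates only the mixing step, assuming that the dB values on the slide are signal-to-noise ratios; `mix_at_snr` and the synthetic signals are hypothetical stand-ins.

```python
import math
import random


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaling the noise so the mix has the requested SNR in dB."""
    noise = list(noise)
    if len(noise) < len(speech):
        # Loop the noise recording if it is shorter than the utterance.
        noise = noise * math.ceil(len(speech) / len(noise))
    noise = noise[: len(speech)]
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise signal is silent")
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a 440 Hz tone for speech, Gaussian noise for factory noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

for snr_db in (0, 5, 10, 20):
    noisy = mix_at_snr(speech, noise, snr_db)
    residual = [a - b for a, b in zip(noisy, speech)]
    measured = 10 * math.log10(power(speech) / power(residual))
    print(snr_db, round(measured, 2))
```

Lower SNR values correspond to louder noise relative to speech, so the 0 to 5 dB range is the hardest condition among those listed.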
“Factory” Challenges
On-site maintenance
・Practical use at factory in China started January 2015
・Runtime evaluation
  ・server (overseas network) speed and stability
  ・user satisfaction
  ・analysis of server log files on a regular basis in order to improve system performance
Portability
・Multilingual → other overseas factories (Turkey, etc.)
・Multi-domain → related fields (chemical, electronic equipment)
Integration with wearable technology
・Realization of hands/eye/ear-free user interfaces
Global Communication Plan (GCP)
Multilingual translation project of the Ministry of Internal Affairs and Communications
・promotion of “language-barrier-free” 2020 Olympics
・an industry-academia-government collaboration
・national strategy zones (hospitals, commercial facilities, etc. in tourist areas)
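[Diagram: use scenes for automatic translation. Labels: hospital, medical, disaster, telework, station, sightseeing, communications (automatic translation calls), meetings (automatic translation meetings), shopping, restaurant, travel, in town, taxi, city hall]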
FueTrek Group Characteristics
FueTrek Group is unique in the world in providing multilingual business solutions for speech recognition, human translation, and machine translation as a one-stop service.
[Diagram: Speech Recognition / Human Translation / Machine Translation]

FueTrek Group Strengths
Strong outside (government) network
・Participation in world’s top-performing MT venture 【Mirai Translate】
・Participation in Ministry of Internal Affairs and Communications GCP
・Joint ASR/MT research with NICT/NAIST
Expertise in ASR/MT fields
・Subsidiary translation company (MEDIA Research Inc.) providing human/machine translation services
・Numerous engineers specialized in the field of ASR and MT
・Commercialization of speech translation services since 2007
New Markets for Speech Translation Technology
Tepat berada di pintu keluar stasiun Kyoto
[It is right at the exit of Kyoto Station.]
京都駅を出てすぐです。
[It is just outside Kyoto Station.]
あの映画見た?
[Did you see that movie?]
Yes, it was very interesting.
これどこに運ぶ?
[Where should I carry this?]
这个搬到哪里?
[Where should this be moved to?]
もっと小さいサイズはありますか。
[Do you have a smaller size?]
はい、Sサイズをお持ちします。
[Yes, I will bring you a size S.]
มีไซส์เล็กกว่านี้ไหมคะ
[Do you have a smaller size?]
มีค่ะ จะไปหยิบไซส์ S มาให้นะคะ
[Yes, I will go and get a size S for you.]