1 © 2005 Nokia V1-Filename.ppt / yyyy-mm-dd / Initials
Public
Development Challenges of Multilingual Text-to-Speech Systems
Kimmo Pärssinen ([email protected])
Nokia Technology Platforms / Audio Applications
Outline
• Introduction
• SSML extensions
• Conclusion
Background
• Audio Applications, Nokia Specific Symbian Software:
• Working on ASR and TTS
• Also implementing other audio applications & features for Nokia phones
• Group of ~20 people
Text-to-Speech in Mobile Phones
• Nokia Series 60 phones currently support over 40 languages and have some voice user interface features built in.
• An example is the speaker independent name dialing (SIND) system
• In SIND, the user says a command and hears feedback from a Text-to-Speech system
• Both the recognizer and the synthesizer in SIND have been internationalized for all Series 60 languages
• The benefit is that users can use their own mother tongue
• It is important to be able to provide all customers with the same features regardless of their native tongue
• The newly launched Nokia 5500 (and E50) phones also have a more advanced, unit-selection Text-to-Speech system built in
• It can be used to read text messages and to give the user more complex feedback than the formant synthesizer used in SIND
Text-to-Speech in Mobile Phones
• Internationalization (or localization) of a TTS system is a time-consuming process
• Development requires knowledge of the human speech production system and of the language being developed, good software skills, etc.
• There are between 3000 and 8000 languages in the world, and it is impossible to build a native TTS system for all of them
• If SSML could be used to specify how foreign, “unsupported” words should be pronounced and handled, applications using TTS would benefit
Use of SSML for Multilingual Text-to-Speech
• Extension to the “language element”:
• The element could also contain a list of languages instead of a single language
• These languages would be treated as fallback languages in case the first choice is not supported by the system
• For example: an Italian synthesizer can pronounce Finnish better than an English one, and a Finnish synthesizer can be used to synthesize Estonian quite well
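The fallback list described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_language`, the `SUPPORTED` set, and the idea of passing the list as an ordered sequence of language tags are hypothetical and are not part of SSML or of any Nokia API.

```python
from typing import AbstractSet, Optional, Sequence

# Hypothetical set of languages this engine can synthesize natively.
SUPPORTED = {"it", "en", "fi"}


def pick_language(preferred: Sequence[str],
                  supported: AbstractSet[str]) -> Optional[str]:
    """Return the first language in `preferred` that the engine supports."""
    for lang in preferred:
        # Compare on the primary subtag only, so "fi-FI" matches "fi".
        primary = lang.split("-")[0].lower()
        if primary in supported:
            return primary
    # None of the requested languages is available.
    return None


# Estonian text on a system without an Estonian synthesizer:
# the author lists Finnish as the first fallback, then English.
chosen = pick_language(["et", "fi", "en"], SUPPORTED)
print(chosen)
```

The order of the list carries the author's preference, so the engine never has to guess which supported language is the closest match.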
Use of SSML for Multilingual Text-to-Speech
• Element read:
• This element could be used to control the selection of the pre-processor in a TTS system
• The idea is that if there is, e.g., a Finnish word in the middle of English text, it would be processed using a Finnish pre-processor to get the correct pronunciation
• The actual TTS voice would remain the same
• The read element would separate the spoken language from the read (written) language
• The benefit would be that the voice of the system stays the same, and the user of this new tag needs no knowledge of how the word should be pronounced
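A minimal Python sketch of how the proposed read element might be handled is given here. The markup is hypothetical: `<read>` is the element proposed on this slide and is not part of SSML 1.0, and the function `plan` and its output format are assumptions made for illustration. The sketch only handles text directly inside `<speak>` and inside flat `<read>` children.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical markup: an English prompt containing a Finnish place name.
DOC = '<speak xml:lang="en">Call <read xml:lang="fi">Kuopio</read> now</speak>'


def plan(ssml):
    """Return (text, preprocessor_language, voice_language) per segment."""
    root = ET.fromstring(ssml)
    voice_lang = root.get(XML_LANG)  # the voice never changes
    out = []
    if root.text and root.text.strip():
        out.append((root.text.strip(), voice_lang, voice_lang))
    for child in root:
        if child.tag == "read" and child.text:
            # Only the pre-processor language follows the read element.
            read_lang = child.get(XML_LANG, voice_lang)
            out.append((child.text.strip(), read_lang, voice_lang))
        if child.tail and child.tail.strip():
            out.append((child.tail.strip(), voice_lang, voice_lang))
    return out


for text, pre_lang, voice in plan(DOC):
    print(text, pre_lang, voice)
```

Each segment keeps the same voice language, while the pre-processor language switches to Finnish only for the word wrapped in the read element.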
Conclusions
• Read element to separate spoken and written language
• Don’t change the voice (speaker)
• List of preferred languages (fallback languages)
• Lexicon element (PLS by Paolo Baggia), for example containing rules for abbreviations (SMS)
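As a rough illustration of what lexicon rules for SMS abbreviations could do, the Python sketch below expands abbreviations before the text reaches the synthesizer. The entries in `SMS_LEXICON` and the function `expand` are hypothetical; a real system would presumably load such rules from a PLS document rather than from a hard-coded dictionary.

```python
import re

# Hypothetical SMS abbreviation rules, standing in for a loaded lexicon.
SMS_LEXICON = {"gr8": "great", "thx": "thanks", "u": "you"}


def expand(text, lexicon):
    """Replace whole words found in the lexicon; lookup is case-insensitive."""
    def repl(match):
        word = match.group(0)
        # Words without a lexicon entry are left unchanged.
        return lexicon.get(word.lower(), word)
    return re.sub(r"[A-Za-z0-9]+", repl, text)


print(expand("thx, u r gr8", SMS_LEXICON))
```

Words that are not in the lexicon pass through unchanged, so the same step can run on every message.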