

Page 1: Development Challenges of Multilingual Text-to-Speech Systems

Kimmo Pärssinen (kimmo.parssinen@nokia.com)

Nokia Technology Platforms / Audio Applications

Public, © 2005 Nokia

Page 2: Outline

• Introduction

• SSML extensions

• Conclusion

Page 3: Background

• Audio Applications, Nokia Specific Symbian Software:

• Working on automatic speech recognition (ASR) and text-to-speech (TTS)

• Also implementing other audio applications & features for Nokia phones

• Group of ~20 people

Page 4: Text-to-Speech in Mobile Phones

• Nokia Series 60 phones currently support over 40 languages and have some voice user interface features built in

• One example is the speaker-independent name dialing (SIND) system

• In SIND, the user says a command and hears feedback produced by a text-to-speech system

• SIND (both the recognizer and the synthesizer) has been internationalized for all Series 60 languages

• The benefit is that users can use their own mother tongue

• It is important to offer all customers the same features regardless of their native language

• The newly launched Nokia 5500 (and the E50) also has a more advanced text-to-speech system built in, based on unit selection

• It can be used to read text messages and to give the user more complex feedback than the formant synthesizer used in SIND

Page 5: Text-to-Speech in Mobile Phones

• Internationalization (or localization) of a TTS system is a time-consuming process

• Development requires knowledge of the human speech production system and of the target language, good software skills, etc.

• There are between 3000 and 8000 languages in the world, and it is impossible to build a native TTS system for all of them

• If SSML could be used to specify how foreign, “unsupported” words should be pronounced and handled, applications using TTS would benefit

Page 6: Use of SSML for Multilingual Text-to-Speech

• Extension to the “language element”:

• The element could contain a list of languages instead of a single language

• The additional languages would act as fallbacks in case the first choice is not supported by the system

• For example, an Italian synthesizer can pronounce Finnish better than an English one, and a Finnish synthesizer can synthesize Estonian quite well
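
The fallback idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the slides do not define a syntax for the language list, so the function name `pick_language`, the ordered list of language codes and the set of supported languages are all hypothetical.

```python
from typing import Optional, Sequence, Set


def pick_language(preferred: Sequence[str], supported: Set[str]) -> Optional[str]:
    """Return the first language in `preferred` that the engine supports.

    `preferred` stands for the ordered list from the (hypothetical) extended
    language element: first choice first, fallbacks after it.
    """
    for lang in preferred:
        if lang in supported:
            return lang
    return None  # nothing in the list is supported; the caller must decide


# Hypothetical engine with Finnish, Italian and English synthesizers, no Estonian.
supported_languages = {"fi", "it", "en"}

# Estonian text: ask for Estonian first, then fall back to Finnish, then English.
chosen = pick_language(["et", "fi", "en"], supported_languages)
print(chosen)  # fi
```

Keeping the list ordered lets the content author, who knows which languages are phonetically close, state the preference instead of leaving the choice to the engine.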

Page 7: Use of SSML for Multilingual Text-to-Speech

• Element read:

• This element could be used to control the selection of the pre-processor in a TTS system

• The idea is that if there is, e.g., a Finnish word in the middle of English text, it would be processed with a Finnish pre-processor to get the correct pronunciation

• The actual TTS voice would remain the same

• The read element would separate the spoken language from the read (written) language

• The benefit would be that the voice of the system stays the same, and the user of the new tag needs no knowledge of how the word should be pronounced
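
A minimal Python sketch of how an engine might interpret such an element follows. The markup is an assumption: the slides name the element `read` but give no syntax, so the use of `xml:lang` on `<read>` and the helper `plan_segments` are hypothetical. Each text segment is paired with the language of the pre-processor that should handle it and with the voice language, which never changes.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def plan_segments(ssml: str) -> List[Tuple[str, str, str]]:
    """Split input into (text, preprocessor_language, voice_language) triples."""
    root = ET.fromstring(ssml)
    voice_lang = root.get(XML_LANG, "en")  # language of the speaking voice
    segments = []
    if root.text and root.text.strip():
        segments.append((root.text.strip(), voice_lang, voice_lang))
    for child in root:
        # Only the hypothetical <read> element switches the pre-processor.
        pre_lang = child.get(XML_LANG, voice_lang) if child.tag == "read" else voice_lang
        if child.text and child.text.strip():
            segments.append((child.text.strip(), pre_lang, voice_lang))
        if child.tail and child.tail.strip():
            segments.append((child.tail.strip(), voice_lang, voice_lang))
    return segments


example = (
    '<speak xml:lang="en">We will meet in '
    '<read xml:lang="fi">Jyväskylä</read> tomorrow.</speak>'
)
segments = plan_segments(example)
for text, pre_lang, voice in segments:
    print(f"{text!r}: pre-processor={pre_lang}, voice={voice}")
```

In this sketch the Finnish place name is routed to a Finnish pre-processor while all three segments keep the English voice, which is the separation the read element is meant to provide.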

Page 8: Conclusions

• Read element to separate spoken and written language

• Don’t change the voice (speaker)

• List of preferred languages (fallback languages)

• Lexicon element (Pronunciation Lexicon Specification, PLS, by Paolo Baggia), for example containing rules for abbreviations (SMS)
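
As a rough illustration of the last point, abbreviation rules can be thought of as a lookup applied before synthesis. The sketch below is an assumption throughout: the entries in `SMS_LEXICON` are invented examples, and a plain Python dictionary stands in for a real PLS document.

```python
from typing import Dict

# Hypothetical SMS abbreviation rules; a real system would load them from a lexicon.
SMS_LEXICON: Dict[str, str] = {
    "btw": "by the way",
    "asap": "as soon as possible",
    "thx": "thanks",
}


def expand_abbreviations(text: str, lexicon: Dict[str, str]) -> str:
    """Replace whitespace-separated tokens found in the lexicon (case-insensitive)."""
    return " ".join(lexicon.get(token.lower(), token) for token in text.split())


expanded = expand_abbreviations("thx see you asap", SMS_LEXICON)
print(expanded)  # thanks see you as soon as possible
```

Tokens with attached punctuation (for example "asap!") are not matched here; a real pre-processor would tokenize more carefully.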