mobappdev (fall 2014): basics of speech synthesis
MobAppDev
Basics of Speech Synthesis
Vladimir Kulyukin
www.vkedco.blogspot.com
Outline
● Background on Speech Synthesis
● Text-to-Speech Synthesis (TTS)
● TTS on Android
● Practical Approaches to TTS: Phonetic Spelling & Human Recording
Background on Speech Synthesis
What is Speech Synthesis
● Speech synthesis refers to the artificial production of human speech
● Speech synthesizers take a symbolic linguistic representation and generate audio streams
● The input to speech synthesizers can be a script in some language or a phonetic transcription
Phonetic Transcription
● Phonetic transcription is a formal system that describes how words are pronounced in a specific language (the language's phonology)
● There are many formalisms used for phonetic transcriptions
● International Phonetic Alphabet (IPA) is one of the better known systems of phonetic notation
International Phonetic Alphabet
Phones, Allophones, & Phonemes
● A phone is a unit of speech sound
● An allophone is one of several possible spoken phones used to pronounce a single phoneme
● In English, /p/ is a phoneme with two allophones: [pʰ] and [p]
● [pʰ] is an aspirated allophone of /p/ (e.g., paper)
● [p] is an unaspirated allophone of /p/ (e.g., speak)
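The /p/ example can be written as a small rule. The sketch below is a deliberate simplification, not a real phonological model: it assumes spelling can stand in for a phonemic transcription, treats a 'p' right after 's' as unaspirated, and treats a word-initial 'p' as aspirated. Any other position is reported as undetermined, because deciding it would need stress information. The class, enum, and method names are hypothetical.

```java
import java.util.Locale;

// Toy rule for the two allophones of /p/ described on this slide.
// Assumption: spelling stands in for a phonemic transcription.
class AllophoneSketch {
    enum PAllophone { ASPIRATED, UNASPIRATED, UNDETERMINED }

    // Classifies the letter 'p' found at position index of word.
    static PAllophone classifyP(String word, int index) {
        String w = word.toLowerCase(Locale.ROOT);
        if (index < 0 || index >= w.length() || w.charAt(index) != 'p') {
            throw new IllegalArgumentException("no 'p' at index " + index);
        }
        if (index > 0 && w.charAt(index - 1) == 's') {
            return PAllophone.UNASPIRATED;   // as in "speak"
        }
        if (index == 0) {
            return PAllophone.ASPIRATED;     // as in "paper"
        }
        return PAllophone.UNDETERMINED;      // would need stress information
    }

    public static void main(String[] args) {
        System.out.println("paper: " + classifyP("paper", 0));
        System.out.println("speak: " + classifyP("speak", 1));
    }
}
```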
Intonation, Tone, & Prosody
● Intonation is a function of pitch used to convey the speaker's emotion (joyful, calm, angry, etc.) and the utterance type (question vs. statement)
● Tone is another element of intonation used to distinguish individual words
● Prosody is the combination of rhythm, stress, and intonation
Text to Speech Synthesis (TTS)
TTS: Text To Speech
● The General Problem: Take a sequence of characters and generate a waveform
● Words are pronounced as a sequence of individual units called phones
● Phonetic alphabets describe how phones are pronounced
● Phonological rules specify how phones combine into speech
TTS Engine Anatomy
● A typical TTS engine consists of three components: a text analyzer, a linguistic analyzer, and a waveform generator
● Text Analysis – parse text (after transliterating it if necessary) and identify words and utterances
● Linguistic Analysis – identify phrases and assign prosodies (accents, emphasis, duration, pauses, etc.)
● Waveform Generation – generate a waveform from a fully specified linguistic description
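The three stages above can be sketched as a toy pipeline. This is only an illustration of how data flows between the stages, under strong assumptions: the "prosody" is a made-up rule (80 ms per character, a higher pitch on the last word of a question), and the "waveform" is a plain sine tone per word rather than speech. All class, method, and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy three-stage pipeline: text analysis -> linguistic analysis -> waveform.
class TtsPipelineSketch {
    // One word plus the (made-up) prosody assigned to it.
    static final class Unit {
        final String word;
        final int durationMs;
        final double pitchHz;
        Unit(String word, int durationMs, double pitchHz) {
            this.word = word;
            this.durationMs = durationMs;
            this.pitchHz = pitchHz;
        }
    }

    // Stage 1, text analysis: split raw text into lowercase words.
    static List<String> analyzeText(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String word = token.replaceAll("[^\\p{L}\\p{N}']", "");
            if (!word.isEmpty()) {
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Stage 2, linguistic analysis: assign a duration and a pitch to each word.
    // Assumed rule: 80 ms per character; the last word of a question is raised.
    static List<Unit> analyzeLinguistics(List<String> words, boolean question) {
        List<Unit> units = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            boolean last = (i == words.size() - 1);
            double pitch = (question && last) ? 160.0 : 120.0;
            units.add(new Unit(words.get(i), 80 * words.get(i).length(), pitch));
        }
        return units;
    }

    // Stage 3, waveform generation: one sine tone per unit (not real speech).
    static short[] generateWaveform(List<Unit> units, int sampleRate) {
        int total = 0;
        for (Unit u : units) {
            total += sampleRate * u.durationMs / 1000;
        }
        short[] samples = new short[total];
        int pos = 0;
        for (Unit u : units) {
            int n = sampleRate * u.durationMs / 1000;
            for (int i = 0; i < n; i++) {
                double t = (double) i / sampleRate;
                samples[pos++] = (short) (Math.sin(2 * Math.PI * u.pitchHz * t)
                        * 0.5 * Short.MAX_VALUE);
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<String> words = analyzeText("Is TTS ready?");
        List<Unit> units = analyzeLinguistics(words, true);
        short[] wave = generateWaveform(units, 8000);
        System.out.println(words + " -> " + wave.length + " samples");
    }
}
```

A real engine replaces each stage with far richer models, but the hand-off is the same: words, then a fully specified linguistic description, then audio samples.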
TTS Approaches
● Full Automation – machine does everything
● Mixed Initiative – human records a set of known texts; machine learning is used to extract the rules
● Human-Based Recording – human records words/sentences/texts; machine plays them as needed
Android TTS
● Android TTS is a multilingual speech synthesis engine
● Android TTS can be used as a black box: text in, speech out
● Android TTS can be parameterized
Starting TTS
● It is best practice to check if TTS is available on the device
● The check is done by launching an Intent that checks for TTS data
● If the check is successful, an instance of TTS can be created
● An Activity that uses TTS implements the OnInitListener interface
Checking TTS Availability
In Activity.onCreate():
// CHECK_TTS_REQ is an int request code defined by the Activity
Intent ttsi = new Intent();
ttsi.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
startActivityForResult(ttsi, CHECK_TTS_REQ);
Creating TTS Instance
In Activity.onActivityResult():
switch ( result_code ) {
case TextToSpeech.Engine.CHECK_VOICE_DATA_PASS:
    // voice data is present; the Activity is both the Context and the OnInitListener
    mTTS = new TextToSpeech(this, this);
    break;
}
Handling Missing TTS Data
In Activity.onActivityResult():
Intent insti = new Intent();
insti.setAction(TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA);
startActivity(insti);
Handling TTS Unavailability
In Activity.onActivityResult():
switch ( result_code ) {
case TextToSpeech.Engine.CHECK_VOICE_DATA_FAIL:
// Let the user know that TTS is not available – Toast, Log Message, Notification, etc.
}
What To Do When TTS Is Ready
Implement OnInitListener.onInit() in the Activity:
@Override
public void onInit(int status) {
    if ( status == TextToSpeech.SUCCESS ) {
        // do something when you know that TTS is available
    }
}
Overriding onPause() and onDestroy()
● When your Activity is paused (e.g., it loses focus), have TTS stop synthesizing
● When your Activity is destroyed, shut TTS down to notify Android that the resources can be released and given to other activities or applications
Overriding onPause() and onDestroy()
public void onPause() {
    super.onPause();
    if ( mTTS != null ) mTTS.stop();
}
Overriding onPause() and onDestroy()
public void onDestroy() {
    super.onDestroy();
    if ( mTTS != null ) mTTS.shutdown();
}
Overcoming TTS Limitations
● Every TTS engine mispronounces some words
● There are two ways of overcoming this limitation:
  Phonetic spelling: spell mispronounced words the way they sound, generate waveforms, and associate words with waveforms
  Human recording: have a human record mispronounced words and use the files
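Both workarounds amount to a lookup that runs before text reaches the engine. The sketch below is plain Java with no Android dependency and is only one possible arrangement: a table of phonetic respellings applied to outgoing text, and a second table that maps a word to a recorded audio file. The class name, method names, and the file name quay.wav are hypothetical; the example word relies on the fact that "quay" is pronounced like "key".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lookup tables for words that a TTS engine mispronounces.
class PronunciationFixSketch {
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");
    private final Map<String, String> respellings = new HashMap<>();
    private final Map<String, String> recordings = new HashMap<>();

    // Phonetic spelling: register how a word should be spelled for the engine.
    void addRespelling(String word, String soundsLike) {
        respellings.put(word.toLowerCase(Locale.ROOT), soundsLike);
    }

    // Human recording: register the audio file recorded for a word.
    void addRecording(String word, String audioFile) {
        recordings.put(word.toLowerCase(Locale.ROOT), audioFile);
    }

    // Returns the recorded file for a word, or null if there is none.
    String recordingFor(String word) {
        return recordings.get(word.toLowerCase(Locale.ROOT));
    }

    // Rewrites text so that registered words use their phonetic spelling.
    String respell(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group().toLowerCase(Locale.ROOT);
            String replacement = respellings.getOrDefault(key, m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        PronunciationFixSketch fixes = new PronunciationFixSketch();
        fixes.addRespelling("quay", "key");
        fixes.addRecording("quay", "quay.wav"); // hypothetical file name
        System.out.println(fixes.respell("Meet me at the quay."));
        System.out.println(fixes.recordingFor("Quay"));
    }
}
```

On Android, the respelled string would be what gets passed to the engine, and TextToSpeech.addSpeech(String, String) appears to offer a comparable way to associate a piece of text with a sound file; check the reference page listed below before relying on it.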
References & Reading Suggestions
● http://developer.android.com/reference/android/speech/tts/TextToSpeech.html