04/08/04
Why Speech Synthesis is Hard
Chris Brew
The Ohio State University
Issues for text-to-speech
It should sound like a person,
AND it should sound like a person who can read,
AND it should sound like a person who understands what they are reading.
Credits
FESTIVAL: Alan W. Black, Paul Taylor, Simon King, Kevin Lenzo
Huang, Acero and Hon: Spoken Language Processing
Many web-based demos
– http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html
– http://www.icsi.berkeley.edu/eecs225d/klatt.html
Text-to-speech
Text and Phonetic Analysis: What to say
Prosody: How to say it
Waveform synthesis: Making it sound right
Text and phonetic processing
Homographs
Letter-to-sound
Abbreviations
Prosody
Pauses
Pitch
Speech rate / relative duration
Waveform generation
Articulatory synthesis
– Simulation of mechanics of speech production
Formant synthesis
– Source/filter model
Concatenative synthesis
– Limited domain waveform concatenation
– No waveform modification
– With waveform modification
Waveform generation
Use linear predictive coding (LPC) to analyse the signal into a filter and a residual, then excite the filter with an appropriate residual. The main benefit is compression.
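The analyse-then-excite idea can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular synthesizer: it assumes the autocorrelation method with the Levinson-Durbin recursion, a single frame, and a synthetic test signal. The function names (`lpc_coefficients`, `residual`, `synthesize`) and the order of 8 are made up for this example.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a (with a[0] == 1) such that the prediction error is
    e[n] = sum_k a[k] * x[n - k].
    """
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def residual(x, a):
    """Inverse-filter x with A(z) to get the excitation (residual)."""
    return np.convolve(x, a)[:len(x)]

def synthesize(e, a):
    """Excite the all-pole filter 1/A(z) with the residual e."""
    p = len(a) - 1
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y

# Synthetic stand-in for a speech frame (assumption: two tones plus noise).
rng = np.random.default_rng(0)
t = np.arange(400)
x = (np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.12 * t)
     + 0.05 * rng.standard_normal(400))
a = lpc_coefficients(x, order=8)
e = residual(x, a)
y = synthesize(e, a)   # exciting the filter with its own residual recovers x
```

The residual carries less energy than the signal, which is where the compression comes from: the filter coefficients plus a cheaply coded residual can stand in for the waveform. For synthesis, the residual can be swapped for a different excitation.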
One slide of speech acoustics
Formants - bands of strong energy in the speech signal
Spectrogram - representation of relation between time (x), frequency (y) and intensity
The speech organs consist of a noise source and some resonant cavities. We speak by changing the shape of the cavities, making some parts of the source come out strong, others weaker.
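The source-plus-resonant-cavities picture can be imitated with a noise source fed through two-pole resonators. The sketch below is only an illustration under stated assumptions: the resonator frequencies (700 Hz and 1200 Hz), the 100 Hz bandwidths, the 8 kHz sample rate and the helper names are all chosen for the example, not taken from the slides.

```python
import numpy as np

def resonator(x, freq_hz, bandwidth_hz, fs):
    """Two-pole resonator: boosts energy around freq_hz, like one cavity."""
    r = np.exp(-np.pi * bandwidth_hz / fs)      # pole radius from bandwidth
    theta = 2.0 * np.pi * freq_hz / fs          # pole angle from centre frequency
    b1, b2 = 2.0 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += b1 * y[n - 1]
        if n >= 2:
            y[n] += b2 * y[n - 2]
    return y

def band_energy(signal, fs, lo_hz, hi_hz):
    """Total spectral energy of signal between lo_hz and hi_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return spectrum[(freqs >= lo_hz) & (freqs < hi_hz)].sum()

fs = 8000
source = np.random.default_rng(0).standard_normal(fs)   # one second of noise
# Cascade two resonators; the output has formant-like energy bands.
vowel_like = resonator(resonator(source, 700.0, 100.0, fs), 1200.0, 100.0, fs)
```

Changing the resonator frequencies plays the role of changing the shape of the cavities: the same source comes out strong in different bands.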
Sound like a person
Get a person to record the whole vocabulary, then splice the words together to make sentences.
But: speech is hard to cut up in such a way that it sews back together nicely.
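One common way to soften the seams is to overlap neighbouring recordings slightly and crossfade them. The sketch below assumes the recordings are NumPy arrays at the same sample rate; `crossfade_concat` and its `overlap` parameter are hypothetical names for this example. A crossfade only hides clicks at the joins. It does nothing about mismatched pitch or coarticulation, which is the harder part of the problem.

```python
import numpy as np

def crossfade_concat(units, overlap):
    """Join waveforms, blending `overlap` samples at each seam linearly."""
    if overlap < 1:
        raise ValueError("overlap must be at least 1 sample")
    out = np.asarray(units[0], dtype=float)
    fade_in = np.linspace(0.0, 1.0, overlap)
    for unit in units[1:]:
        unit = np.asarray(unit, dtype=float)
        if len(unit) < overlap or len(out) < overlap:
            raise ValueError("every unit must be at least `overlap` samples long")
        # The tail of what we have fades out while the head of the next unit fades in.
        blended = out[-overlap:] * (1.0 - fade_in) + unit[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], blended, unit[overlap:]])
    return out

# Stand-ins for two recorded words (assumption: short synthetic tones).
word_a = np.sin(2 * np.pi * 0.02 * np.arange(300))
word_b = np.sin(2 * np.pi * 0.03 * np.arange(200))
sentence = crossfade_concat([word_a, word_b], overlap=40)
```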
Sound like a person who can read
Grapheme-to-phoneme conversion.
Input: text
Output: phoneme string + annotations for stress and intonation.
Spelling rules get you some of the way, but even in languages with regular spelling (English is not among these), exceptions require the use of a dictionary.
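The dictionary-plus-rules arrangement can be sketched as a lookup with a rule-based fallback. Everything in this example is a toy assumption: the three-word exception dictionary, the handful of letter-to-sound rules, and the ARPAbet-style symbols. Stress and intonation annotations are left out.

```python
# Exception dictionary: words whose pronunciation the rules would get wrong.
EXCEPTIONS = {
    "one": "W AH N",
    "two": "T UW",
    "colonel": "K ER N AH L",
}

# Toy letter-to-sound rules; two-letter rules are tried before single letters.
RULES = {
    "sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW",
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "K S",
    "y": "Y", "z": "Z",
}

def g2p(word):
    """Return a list of phoneme symbols for one word."""
    word = word.lower()
    if word in EXCEPTIONS:              # the dictionary wins over the rules
        return EXCEPTIONS[word].split()
    phones = []
    i = 0
    while i < len(word):
        for length in (2, 1):           # longest match first
            chunk = word[i:i + length]
            if chunk in RULES:
                phones.extend(RULES[chunk].split())
                i += len(chunk)
                break
        else:
            i += 1                      # no rule for this character: skip it
    return phones
```

With these toy rules, `g2p("ship")` comes out as SH IH P, while `g2p("colonel")` is only right because the dictionary overrides the rules.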
Text Normalization
Henry V Part I, Act II scene 11, Mr. X is, I believe V.I. Lenin and not Charles I.
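The example shows why the same token needs different readings in different contexts. A rough sketch of context-dependent expansion follows. The word lists (`MONARCHS`, `SECTION_WORDS`), the tiny number tables and the one-token look-behind are assumptions made for this illustration, and a real system would need far more than this.

```python
import re

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}
CARDINAL = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 11: "eleven"}
ORDINAL = {1: "first", 2: "second", 3: "third", 4: "fourth", 5: "fifth"}
MONARCHS = {"henry", "charles"}            # hypothetical list of regnal names
SECTION_WORDS = {"part", "act", "scene"}   # words followed by a section number
ABBREVIATIONS = {"Mr.": "Mister"}

# Initials such as "V.I." first, then known abbreviations, words, punctuation.
TOKEN = re.compile(
    r"(?:[A-Za-z]\.){2,}|"
    + "|".join(re.escape(k) for k in ABBREVIATIONS)
    + r"|\w+|[^\w\s]"
)

def normalize(text):
    """Expand numerals, abbreviations and initials using the previous token."""
    out = []
    prev = ""
    for tok in TOKEN.findall(text):
        if tok in ABBREVIATIONS:
            out.append(ABBREVIATIONS[tok])
        elif re.fullmatch(r"(?:[A-Za-z]\.){2,}", tok):
            out.append(" ".join(tok.replace(".", " ").split()))  # spell initials
        elif tok in ROMAN and prev in MONARCHS:
            out.append("the " + ORDINAL[ROMAN[tok]])             # Henry V
        elif tok in ROMAN and prev in SECTION_WORDS:
            out.append(CARDINAL[ROMAN[tok]])                     # Part I
        elif tok.isdigit():
            out.append(CARDINAL.get(int(tok), tok))              # scene 11
        else:
            out.append(tok)                                      # pronoun "I", "X"
        prev = tok.lower()
    # Re-attach commas and full stops to the preceding word.
    return re.sub(r"\s+([,.])", r"\1", " ".join(out))
```

On the slide's sentence, this sketch reads "Henry V" as "Henry the fifth", "Part I" as "Part one", the bare "I" as the pronoun, and "Charles I" as "Charles the first". A change to either word list changes the result, which is the point: the rules encode guesses about context.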
Specialized text types
Smith,Bobbie Q,3337 St Laurence St, Fort Worth,TX 71611-5484 (817) 839-3689
Anderson, W, 445 Sycamore Way NE, Lincoln, NE 98125-5108,(212)404-9998
Raw
Address
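Knowing that a line is an address changes how its abbreviations should be read: "St" can be "Saint" or "Street", and "NE" can be a direction or a state. The sketch below guesses from the next token. The rules, the two small tables and the function name `expand_address` are assumptions for illustration, and the inputs in the usage lines are simplified versions of the records above.

```python
import re

STATES = {"TX": "Texas", "NE": "Nebraska"}
DIRECTIONS = {"NE": "Northeast"}

def expand_address(text):
    """Expand ambiguous address abbreviations by looking at the next token."""
    tokens = re.findall(r"\d+|[A-Za-z]+|[^\w\s]", text)
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == "St":
            # Before a capitalised name it is "Saint"; otherwise "Street".
            out.append("Saint" if nxt[:1].isupper() else "Street")
        elif tok in STATES and re.fullmatch(r"\d{5}", nxt):
            out.append(STATES[tok])        # state code followed by a ZIP code
        elif tok in DIRECTIONS:
            out.append(DIRECTIONS[tok])    # otherwise read it as a direction
        else:
            out.append(tok)
    return re.sub(r"\s+([,.])", r"\1", " ".join(out))

first = expand_address("3337 St Laurence St, Fort Worth, TX 71611")
second = expand_address("445 Sycamore Way NE, Lincoln, NE 98125")
```

ZIP codes and phone numbers would also need their own reading style (digit by digit rather than as one large number), which this sketch leaves untouched.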
SABLE
See rinss-slides
Sound like you understand
Lexical stress and intonation matter very much, and tie in with pragmatics.
The system doesn’t in fact understand enough to get this right.
Best you can do is fake it. There are lots of cues available in the text, but mistakes are inevitable.
Rumpke Advert
Rhetorical Systems
Definitely wrong
Possibly good enough
Multilingual and flexible
Festival has an open architecture and has been extended by many people.
It can even (easily) be made to speak in your voice.
Prosody
Boston
It will be rainy today in Boston
Challenges for speech synthesis
Improve overall speech quality
Refine ways of organizing and collecting speech databases
Improve the quality of the control signal
Sounds