Page 1: 04/08/04 Why Speech Synthesis is Hard Chris Brew The Ohio State University

04/08/04

Why Speech Synthesis is Hard

Chris Brew

The Ohio State University

Page 2:

Issues for text-to-speech

It should sound like a person

AND it should sound like a person who can read

AND it should sound like a person who understands what they are reading

Page 3:

Credits

FESTIVAL: Alan W. Black, Paul Taylor, Simon King, Kevin Lenzo

Huang, Acero and Hon: Spoken Language Processing

Many web-based demos

– http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html

– http://www.icsi.berkeley.edu/eecs225d/klatt.html

Page 4:

Text-to-speech

Text and phonetic analysis: What to say

Prosody: How to say it

Waveform synthesis: Making it sound right

Page 5:

Text and phonetic processing

Homographs

Letter-to-sound

Abbreviations

Page 6:

Prosody

Pauses

Pitch

Speech rate / relative duration

Page 7:

Waveform generation

Articulatory synthesis

– Simulation of the mechanics of speech production

Formant synthesis

– Source/filter model

Concatenative synthesis

– Limited-domain waveform concatenation

– No waveform modification

– With waveform modification

Page 8:

Waveform generation

Use linear predictive coding (LPC) to analyse the signal into a filter and a residual, then excite the filter with an appropriate residual. The main benefit is compression.
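
The filter/residual split can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular synthesizer does it: it assumes the autocorrelation method with the Levinson-Durbin recursion, processes the whole signal as one frame, and uses a made-up test signal (`make_ar2_signal` is a hypothetical helper). All function names are illustrative.

```python
import random


def make_ar2_signal(n, seed=0):
    """Hypothetical test signal: noise passed through a known 2-pole filter."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(1.3 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor coefficients a[0..order-1] such that
    x[n] is approximated by sum over k of a[k] * x[n - 1 - k]."""
    a = [0.0] * (order + 1)          # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_residual(x, a):
    """Analysis: subtract the prediction from each sample."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(p, n)))
            for n in range(len(x))]


def lpc_synthesize(residual, a):
    """Synthesis: excite the all-pole filter with the residual."""
    p = len(a)
    y = []
    for n, e in enumerate(residual):
        y.append(e + sum(a[k] * y[n - 1 - k] for k in range(min(p, n))))
    return y


if __name__ == "__main__":
    x = make_ar2_signal(2000)
    a = levinson_durbin(autocorrelation(x, 2), 2)
    e = lpc_residual(x, a)
    print("estimated filter:", a)
    print("residual/signal energy:", sum(v * v for v in e) / sum(v * v for v in x))
```

Exciting the filter with the exact residual reproduces the input. The compression comes from the residual carrying much less energy than the signal, so it can be coded coarsely or replaced by a simple pulse or noise source.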

Page 9:

One slide of speech acoustics

Formants - bands of strong energy in the speech signal

Spectrogram - a representation of the relation between time (x-axis), frequency (y-axis) and intensity

The speech organs consist of a noise source and some resonant cavities. We speak by changing the shape of the cavities, making some parts of the source come out strong, others weaker.

Page 10:

Sound like a person

Get a person to record the whole vocabulary, then splice the words together to make sentences.

But: speech is hard to cut up in such a way that it sews back together nicely.
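
A toy sketch of the splicing problem, treating waveforms as plain lists of samples. Butting two recordings together can leave an audible jump at the join; a short linear crossfade is one common way to soften it. The function names and the crossfade itself are illustrative assumptions, not a description of what Festival or any other system does, and a crossfade does nothing about mismatched pitch or coarticulation.

```python
def max_jump(samples):
    """Largest difference between adjacent samples (a crude click detector)."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


def crossfade_concat(first, second, overlap):
    """Join two waveforms, blending the last `overlap` samples of `first`
    with the first `overlap` samples of `second` using linear weights."""
    if overlap <= 0:
        return first + second
    if overlap > min(len(first), len(second)):
        raise ValueError("overlap is longer than one of the waveforms")
    start = len(first) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)      # weight of `second`, rising towards 1
        blended.append((1.0 - w) * first[start + i] + w * second[i])
    return first[:start] + blended + second[overlap:]


if __name__ == "__main__":
    word_a = [1.0] * 10                  # stand-ins for two recorded words
    word_b = [0.0] * 10
    print("naive join jump:    ", max_jump(word_a + word_b))
    print("crossfaded join jump:", max_jump(crossfade_concat(word_a, word_b, 4)))
```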

Page 11:

Sound like a person who can read

Grapheme-to-phoneme conversion.

Input: text

Output: phoneme string + annotations for stress and intonation.

Spelling rules get you some of the way, but even in languages with regular spelling (English is not among these), exceptions require the use of a dictionary.
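
The dictionary-plus-rules arrangement can be sketched as a lookup with a fallback. Everything here is a toy assumption: the three-word lexicon, the one-sound-per-letter rules, and the ARPAbet-style symbols (with a stress digit on lexicon vowels) are made up for illustration and are far smaller than anything a real system would use.

```python
# Hypothetical exceptions dictionary: words the spelling rules get wrong.
LEXICON = {
    "cough": ["K", "AO1", "F"],
    "though": ["DH", "OW1"],
    "one": ["W", "AH1", "N"],
}

# Hypothetical letter-to-sound rules: two-letter spellings first, then letters.
DIGRAPHS = {"sh": ["SH"], "th": ["TH"], "ch": ["CH"], "ng": ["NG"]}
LETTERS = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}


def letter_to_sound(word):
    """Rule-based fallback: scan left to right, preferring digraphs."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones.extend(DIGRAPHS[pair])
            i += 2
        else:
            phones.extend(LETTERS.get(word[i], []))  # unknown characters are skipped
            i += 1
    return phones


def pronounce(word):
    """Dictionary first; spelling rules only when the word is not listed."""
    key = word.lower()
    if key in LEXICON:
        return list(LEXICON[key])
    return letter_to_sound(key)


if __name__ == "__main__":
    for w in ["ship", "cough"]:
        print(w, "rules:", letter_to_sound(w), "final:", pronounce(w))
```

For "ship" the rules alone are adequate; for "cough" they produce nonsense, and only the dictionary entry rescues the word.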

Page 12:

Text Normalization

Henry V Part I, Act II scene 11, Mr. X is, I believe V.I. Lenin and not Charles I.
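
In this sentence the same letters need different readings: "V" is an ordinal after a king's name and an initial before "Lenin", and "I" is a cardinal after "Part", a pronoun before "believe", and an ordinal after "Charles". A minimal sketch of context-based expansion follows. The word lists are hypothetical and cover only this example, the previous-word heuristic is an assumption rather than an established algorithm, and digits such as "11" are left untouched.

```python
import re

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}
CARDINAL = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}
ORDINAL = {1: "the first", 2: "the second", 3: "the third",
           4: "the fourth", 5: "the fifth"}

# Hypothetical context word lists, just large enough for the example.
SECTION_WORDS = {"part", "act", "scene", "chapter"}
MONARCH_NAMES = {"henry", "charles"}
ABBREVIATIONS = {"mr": "Mister"}
INITIALS = re.compile(r"(?:[A-Z]\.)+")


def normalize(text):
    """Expand numerals, initials and abbreviations using the previous word."""
    out = []
    prev = ""
    for token in text.split():
        core = token.rstrip(".,;:")
        tail = token[len(core):]                     # trailing punctuation
        if core.lower() in ABBREVIATIONS and tail.startswith("."):
            out.append(ABBREVIATIONS[core.lower()] + tail[1:])
        elif core in ROMAN and prev in SECTION_WORDS:
            out.append(CARDINAL[ROMAN[core]] + tail)  # "Part I" -> "Part one"
        elif core in ROMAN and prev in MONARCH_NAMES:
            out.append(ORDINAL[ROMAN[core]] + tail)   # "Henry V" -> "Henry the fifth"
        elif INITIALS.fullmatch(token):
            out.append(" ".join(token.replace(".", " ").split()))  # spell letters
        else:
            out.append(token)                         # includes the pronoun "I"
        prev = core.lower()
    return " ".join(out)


if __name__ == "__main__":
    print(normalize("Henry V Part I, Act II scene 11, Mr. X is, "
                    "I believe V.I. Lenin and not Charles I."))
```

The heuristic works here only because the word lists were chosen to fit; on open text, mistakes of this kind are exactly what makes normalization hard.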

Page 13:

Specialized text types

Smith,Bobbie Q,3337 St Laurence St, Fort Worth,TX 71611-5484 (817) 839-3689

Anderson, W, 445 Sycamore Way NE, Lincoln, NE 98125-5108,(212)404-9998

Raw

Address
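
In address records like these, one abbreviation can need two readings: "St" is "Saint" before "Laurence" and "Street" after it, and "NE" is "Northeast" after "Sycamore Way" but "Nebraska" before a ZIP code. A small sketch of an address-specific expander, assuming a next-word heuristic and handling only these two abbreviations (the function name and rules are illustrative):

```python
import re

ZIP_CODE = re.compile(r"\d{5}(-\d{4})?")


def expand_address_words(field):
    """Expand 'St' and 'NE' in one address field by looking at the next word."""
    words = field.split()
    spoken = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        if word == "St":
            # Before a capitalized name it is read as "Saint", otherwise "Street".
            spoken.append("Saint" if nxt[:1].isupper() else "Street")
        elif word == "NE":
            # Directly before a ZIP code it is the state, otherwise the direction.
            spoken.append("Nebraska" if ZIP_CODE.fullmatch(nxt) else "Northeast")
        else:
            spoken.append(word)
    return " ".join(spoken)


if __name__ == "__main__":
    print(expand_address_words("3337 St Laurence St"))
    print(expand_address_words("445 Sycamore Way NE"))
    print(expand_address_words("NE 98125-5108"))
```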

Page 14:

SABLE

See rinss-slides

Page 15:

Sound like you understand

Lexical stress and intonation matter very much, and tie in with pragmatics.

The system doesn’t in fact understand enough to get this right.

The best you can do is fake it. There are lots of cues available in the text, but mistakes are inevitable.

Page 16:

Rumpke Advert

Rhetorical Systems

Definitely wrong

Possibly good enough

Page 17:

Multilingual and flexible

Festival is open-architecture, and has been extended by lots of people

It can even (easily) be made to speak in your voice.

Page 18:

Prosody

Page 19:

Boston

It will be rainy today in Boston

Page 20:

Challenges for speech synthesis

Improve overall speech quality

Refine ways of organizing and collecting speech databases

Improve the quality of the control signal

Page 21:

Sounds