
HMM-Based Speech Synthesis

Erica Cooper, CS4706

Spring 2011

Concatenative Synthesis

HMM Synthesis

- A parametric model
- Can train on mixed data from many speakers
- Model takes up a very small amount of space
- Speaker adaptation

HMMs

Some hidden process has generated some visible observation.


HMMs

Hidden states have transition probabilities and emission probabilities.
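The idea above can be sketched as a tiny HMM that walks through hidden states and emits visible observations. The state names, probabilities, and observation symbols below are invented for illustration:

```python
import random

# Hypothetical 2-state HMM: each hidden state has transition
# probabilities (where to go next) and emission probabilities
# (which visible observation it produces).
transitions = {
    "s1": {"s1": 0.6, "s2": 0.4},
    "s2": {"s1": 0.3, "s2": 0.7},
}
emissions = {
    "s1": {"a": 0.9, "b": 0.1},
    "s2": {"a": 0.2, "b": 0.8},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} dict."""
    r = random.random()
    total = 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def generate(n, state="s1"):
    """Run the hidden process for n steps; only observations are visible."""
    observations = []
    for _ in range(n):
        observations.append(sample(emissions[state]))
        state = sample(transitions[state])
    return observations

print(generate(5))
```

An observer sees only the emitted sequence, never the state path, which is exactly the "hidden" part of the model.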

HMM Synthesis

Every phoneme+context is represented by an HMM.

The cat is on the mat. The cat is near the door.

< phone=/th/, next_phone=/ax/, word='the', next_word='cat', num_syllables=6, .... >

Acoustic features extracted: f0, spectrum, duration

Train HMM with these examples.
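The context bundle on this slide can be sketched as a plain dictionary per phone; the feature names mirror the slide's example, but this is only an illustrative record, not the actual HTS label format:

```python
# Hypothetical context record for the /th/ in "The cat is on the mat.",
# mirroring the slide's example label (not the real HTS label format).
label = {
    "phone": "th",
    "next_phone": "ax",
    "word": "the",
    "next_word": "cat",
    "num_syllables": 6,
    # ...plus many more positional and prosodic features in a real system
}

# Each distinct context like this gets its own HMM, trained on the
# f0, spectrum, and duration extracted from matching audio segments.
model_key = "-".join(str(v) for v in label.values())
print(model_key)
```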

HMM Synthesis

Each state outputs acoustic features (a spectrum, an f0, and duration)


HMM Synthesis

Many contextual features = data sparsity

Cluster similar-sounding phones, e.g. 'bog' and 'dog':

The /aa/ in both has similar acoustic features, even though the context differs slightly.

Make one HMM that produces both, trained on examples of both.
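In practice this clustering is driven by yes/no context questions (full systems like HTS grow decision trees over many such questions). A minimal sketch, with an invented question and an extra hypothetical training word:

```python
# Sketch of context clustering: phones whose contexts answer a yes/no
# question the same way share one model, so /aa/ in 'bog' can borrow
# training data from /aa/ in 'dog'. The question and the third example
# word are invented for illustration.
examples = [
    {"phone": "aa", "prev": "b", "word": "bog"},
    {"phone": "aa", "prev": "d", "word": "dog"},
    {"phone": "aa", "prev": "m", "word": "mom"},  # hypothetical extra example
]

def is_voiced_stop(p):
    return p in {"b", "d", "g"}

# One shared cluster per (phone, answer) pair: 'bog' and 'dog' land
# in the same cluster, and one HMM is trained on examples of both.
clusters = {}
for ex in examples:
    key = (ex["phone"], is_voiced_stop(ex["prev"]))
    clusters.setdefault(key, []).append(ex["word"])

print(clusters)
```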

Experiments: Google, Summer 2010

Can we train on lots of mixed data? (~1 utterance per speaker)

More data vs. better data

15k utterances from Google Voice Search as training data

Example queries: 'ace hardware', 'rural supply'

More Data vs. Better Data

Voice Search utterances filtered by speech recognition confidence scores

- 50%: 6849 utterances
- 75%: 4887 utterances
- 90%: 3100 utterances
- 95%: 2010 utterances
- 99%: 200 utterances
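The filtering step can be sketched as a simple threshold over (utterance, confidence) pairs; the scores below are made up for illustration, and only the thresholds come from the slide:

```python
# Sketch of confidence filtering: keep only utterances whose ASR
# confidence clears a threshold, trading data quantity for quality.
# The confidence scores here are invented for illustration.
utterances = [
    ("ace hardware", 0.97),
    ("rural supply", 0.62),
    ("pizza near me", 0.41),
]

def keep_above(utts, threshold):
    """Return the texts whose recognition confidence meets the threshold."""
    return [text for text, conf in utts if conf >= threshold]

for threshold in (0.50, 0.75, 0.90, 0.95, 0.99):
    kept = keep_above(utterances, threshold)
    print(f"{threshold:.0%}: {len(kept)} utterances kept")
```

Raising the threshold shrinks the training set sharply, which is the quantity-versus-quality trade-off the slide's counts illustrate.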

Future Work

- Speaker adaptation
- Phonetically-balanced training data
- Listening experiments
- Parallelization
- Other sources of data
- Voices for more languages

Reference

HTS (HMM-based Speech Synthesis System): http://hts.sp.nitech.ac.jp
