Upload: adela-holt

Post on 17-Dec-2015

[Figure: system overview. Signal → Speech segmentation → Speech activity detection → Vowel detection → Pseudo-syllable generation → Duration parameters extraction and intonation parameters extraction → Language models (English, French, German, Italian, Japanese, Spanish) → Recognition → Language]

[Figure: spectrogram (frequency, 0 to 8 kHz) and waveform (amplitude) against time (0 to 1.0 s), segmented into Vowel, Non Vowel and Pause segments and grouped into the pseudo-syllables CCVV, CCV, CV, CCCV, CV. Rhythm features: duration C, duration V, complexity C. Intonation features: two F0 measures.]

Speech Segmentation
Statistical segmentation [André-Obrecht, 1988] splits the signal into short segments (bursts and transient parts of sounds) and longer segments (steady parts of sounds).

Speech Activity Detection and Vowel Detection [Pellegrino & Obrecht, 2000]
Both rely on a spectral analysis of the signal, and the algorithm is language- and speaker-independent.

Pseudo-Syllable Segmentation
The unit is derived from CV, the most frequent syllable structure in the world's languages. The speech signal is parsed into patterns matching the structure $C^nV$, where $n$ is an integer that can be 0.

Duration Parameters
Three parameters are computed for each pseudo-syllable:
- $D_C$, the global duration of its consonantal segments;
- $D_V$, the global duration of its vocalic segment;
- $N_C$, the syllable complexity, i.e. the number of consonantal segments in the pseudo-syllable.
Each pseudo-syllable thus yields the rhythm vector $\{D_C, D_V, N_C\}$; for a CCV pseudo-syllable, for instance, $N_C = 2$.

Intonation Parameters
The fundamental frequency is extracted with the MESSIGNAIX toolbox, which combines three methods: AMDF (average magnitude difference function), spectral comb and autocorrelation. Two fundamental frequency features are then computed:
- a measure of accent location: the position of the F0 maximum relative to the vocalic onset;
- the normalized F0 bandwidth on each syllable.
Together they form the intonation vector of each pseudo-syllable.

Each pseudo-syllable is therefore characterized by two vectors: one describing it as a rhythmic unit, the other describing the intonation on that unit. For each language, two Gaussian Mixture Models (GMMs) are learned to characterize the language-specific distributions of the rhythm vectors and of the F0 vectors, using the EM algorithm with vector quantization (VQ) initialization.

Recognition Experiments
Experiments were previously carried out on the five languages of the MULTEXT database: English, French, German, Italian and Spanish. Japanese was added thanks to Mr. Kitasawa.
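The pseudo-syllable parsing and the rhythm vector $\{D_C, D_V, N_C\}$ described in the method sections can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes the output of segmentation and vowel detection as a list of labelled segments (`'C'` non-vowel, `'V'` vowel, `'P'` pause) with durations in seconds. The `Segment` type and the function names are hypothetical, as are two parsing rules: consecutive vowel segments are merged into one nucleus (as the CCVV example suggests), and consonants that are not followed by a vowel before a pause or the end of the signal are discarded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    label: str       # 'C' (non-vowel), 'V' (vowel) or 'P' (pause); hypothetical labels
    duration: float  # seconds


def pseudo_syllables(segments: List[Segment]) -> List[List[Segment]]:
    """Group segments into C^n V patterns (n >= 0).

    Assumptions: consecutive 'V' segments form a single nucleus, and
    consonants left pending at a pause or at the end are dropped.
    """
    units: List[List[Segment]] = []
    current: List[Segment] = []
    for i, seg in enumerate(segments):
        if seg.label == 'P':
            current = []  # a pause discards consonants without a vowel
        elif seg.label == 'C':
            current.append(seg)
        else:  # vowel segment
            current.append(seg)
            nxt = segments[i + 1] if i + 1 < len(segments) else None
            if nxt is None or nxt.label != 'V':
                units.append(current)  # the vowel nucleus closes the unit
                current = []
    return units


def duration_vector(unit: List[Segment]) -> Tuple[float, float, int]:
    """Return (D_C, D_V, N_C) for one pseudo-syllable."""
    d_c = sum(s.duration for s in unit if s.label == 'C')
    d_v = sum(s.duration for s in unit if s.label == 'V')
    n_c = sum(1 for s in unit if s.label == 'C')
    return d_c, d_v, n_c


if __name__ == "__main__":
    segs = [Segment('C', 0.04), Segment('C', 0.03), Segment('V', 0.05),
            Segment('V', 0.06), Segment('C', 0.05), Segment('V', 0.08),
            Segment('P', 0.20), Segment('C', 0.06)]
    units = pseudo_syllables(segs)
    print([''.join(s.label for s in u) for u in units])  # ['CCVV', 'CV']
    print([tuple(round(x, 3) for x in duration_vector(u)) for u in units])
    # [(0.07, 0.11, 2), (0.05, 0.08, 1)]
```

In this example the trailing consonant after the pause produces no pseudo-syllable, which is one possible reading of "patterns matching $C^nV$"; other boundary rules are equally plausible.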
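The two F0 features can be sketched in the same spirit. The description above does not specify the frame rate, the handling of unvoiced frames, or how the bandwidth is normalized, so this sketch assumes one F0 value per fixed-length frame (0 for unvoiced frames), measures accent location as the time of the F0 maximum minus the vocalic onset, and normalizes the F0 range (maximum minus minimum) by the mean voiced F0. All of these choices, and the function name, are hypothetical.

```python
from typing import List, Optional, Tuple


def intonation_vector(f0: List[float], frame_step: float, unit_start: float,
                      vowel_onset: float) -> Optional[Tuple[float, float]]:
    """Return (accent_location, normalized_bandwidth) for one pseudo-syllable.

    f0          -- F0 values in Hz, one per frame starting at unit_start; 0 = unvoiced
    frame_step  -- frame period in seconds (assumed fixed)
    vowel_onset -- time in seconds of the vocalic onset of the pseudo-syllable
    Assumption: the F0 range is normalized by the mean voiced F0.
    """
    voiced = [(i, v) for i, v in enumerate(f0) if v > 0]
    if not voiced:
        return None  # no F0 measured on this unit
    i_max, f_max = max(voiced, key=lambda p: p[1])
    # Position of the F0 maximum relative to the vocalic onset (seconds).
    accent_location = unit_start + i_max * frame_step - vowel_onset
    values = [v for _, v in voiced]
    mean_f0 = sum(values) / len(values)
    bandwidth = (f_max - min(values)) / mean_f0
    return accent_location, bandwidth


if __name__ == "__main__":
    contour = [0, 0, 110, 120, 140, 130, 0]  # toy contour, 10 ms frames
    result = intonation_vector(contour, 0.01, 0.0, 0.02)
    print(tuple(round(x, 3) for x in result))  # (0.02, 0.24)
```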
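The per-language GMMs and the identification step can be sketched in pure Python. This is an illustration under assumptions, not the authors' configuration: covariances are diagonal, k-means stands in for the VQ initialization, the pseudo-syllables of an utterance are treated as independent, and the utterance is assigned to the language whose model gives the highest summed log-likelihood (a maximum-likelihood decision). The component count, variance floor, iteration counts, the `synthetic` helper and the toy languages "A" and "B" are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
GMM = List[Tuple[float, List[float], List[float]]]  # (weight, means, variances)
VAR_FLOOR = 1e-4  # hypothetical floor keeping variances away from zero


def log_gauss_diag(x: Vector, mean: List[float], var: List[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_sum_exp(terms: List[float]) -> float:
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def kmeans_init(data: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """VQ (k-means) codebook used as the initial GMM means."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(data, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            clusters[j].append(x)
        # An empty cluster keeps its previous centre.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers


def train_gmm(data: List[Vector], k: int, iters: int = 20) -> GMM:
    """Fit a diagonal GMM with EM, starting from the k-means codebook."""
    n, dim = len(data), len(data[0])
    mu = [sum(x[d] for x in data) / n for d in range(dim)]
    glob_var = [max(sum((x[d] - mu[d]) ** 2 for x in data) / n, VAR_FLOOR) for d in range(dim)]
    gmm: GMM = [(1.0 / k, c, list(glob_var)) for c in kmeans_init(data, k)]
    for _ in range(iters):
        # E-step: posterior probability of each component for each vector.
        resp = []
        for x in data:
            terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
            total = log_sum_exp(terms)
            resp.append([math.exp(t - total) for t in terms])
        # M-step: re-estimate weights, means and variances.
        new_gmm: GMM = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mean = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj for d in range(dim)]
            var = [max(sum(r[j] * (x[d] - mean[d]) ** 2 for r, x in zip(resp, data)) / nj, VAR_FLOOR)
                   for d in range(dim)]
            new_gmm.append((nj / n, mean, var))
        gmm = new_gmm
    return gmm


def utterance_score(gmm: GMM, vectors: List[Vector]) -> float:
    """Sum of per-pseudo-syllable log-likelihoods (independence assumed)."""
    return sum(log_sum_exp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])
               for x in vectors)


def identify(models: Dict[str, GMM], vectors: List[Vector]) -> str:
    """Return the language whose model scores the utterance highest."""
    return max(models, key=lambda lang: utterance_score(models[lang], vectors))


def synthetic(center: Vector, sd: Vector, count: int, seed: int) -> List[Tuple[float, ...]]:
    """Hypothetical toy (D_C, D_V, N_C)-like vectors drawn around a centre."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(c, s) for c, s in zip(center, sd)) for _ in range(count)]


if __name__ == "__main__":
    sd = (0.02, 0.02, 0.5)
    models = {"A": train_gmm(synthetic((0.10, 0.08, 1.0), sd, 200, 1), k=2),
              "B": train_gmm(synthetic((0.25, 0.12, 3.0), sd, 200, 2), k=2)}
    print(identify(models, synthetic((0.10, 0.08, 1.0), sd, 30, 3)))  # A
```

Summing log-likelihoods treats the pseudo-syllables of an utterance as independent, which is consistent with the remark in the conclusion that sequences of rhythmic units are not yet modelled. In the described system, one such model would be trained per language for the rhythm vectors and another for the F0 vectors.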
The tests use 20-second utterances of read speech and consist of a six-way identification task. On this read-speech corpus, our rhythm system achieves good performance (79% correct identification over six languages), while the intonation system reaches up to 62% correct identification.

[Figure: results of the intonation system on read speech (MULTEXT corpus)]
[Figure: results of the rhythmic system on read speech (MULTEXT corpus)]

Conclusion
Rhythmic model:
- Captures rhythmic information and achieves a clear separation between languages of different rhythmic classes.
- Does not capture the rhythm of a language as a whole, but characterizes language-specific rhythmic units; the next step is to model sequences of these units to obtain a complete rhythm model.

Fundamental frequency model:
- Better characterizes languages with well-defined intonational rules, such as Japanese and English.

We need to test our system on many more languages to confirm (or refute) the linguistic classes hypothesis.

Automatic Modelling of Rhythm and Intonation for Language Identification
Jean-Luc ROUAS (1), Jérôme FARINAS (1), François PELLEGRINO (2) and Régine ANDRÉ-OBRECHT (1)
(1) Institut de Recherche en Informatique de Toulouse, UMR 5505 CNRS - Université Paul Sabatier - INP, 31062 Toulouse Cedex 4, France
(2) Laboratoire Dynamique du Langage, UMR 5596 CNRS - Université Lumière Lyon 2, 69363 Lyon Cedex 7, France
{jean-luc.rouas, jerome.farinas, regine.andre-obrecht}@irit.fr