Some activities on Non-linear Speech Processing
at ENST/CNRS-LTCI
Gérard Chollet
ENST/CNRS-LTCI
46 rue Barrault
75634 PARIS cedex 13
http://www.tsi.enst.fr/~chollet
Outline
• What is ENST/CNRS-LTCI?
• Research and application topics related to COST-277:
  - Speech production and perception
  - Speech analysis and synthesis
  - Speech coding: the SYMPATEX project
  - Automatic speech recognition: the SIROCCO project
  - Speaker characterisation and verification
• Perspectives within COST-277
Our affiliations
ENST: Ecole Nationale Supérieure des Télécommunications, http://www.enst.fr
CNRS: Centre National de la Recherche Scientifique, http://www.cnrs.fr
LTCI: Laboratoire de Traitement et Communication de l’Information, http://www.enst.fr/externe/ura.html
What is ENST?
Ecole Nationale Supérieure des Télécommunications
• classed among the ‘Grandes Ecoles d'Ingénieurs’
• 250 state-certified engineers graduate each year
• part of the ‘Groupement des Ecoles de Télécommunications’ (GET)
GET: Groupement des Ecoles de Télécommunications
• ENST-Paris
• ENST-Bretagne in Brest
• Institut National des Télécommunications in Evry
• EURECOM in Sophia-Antipolis
• ENIC (Ecole Nouvelle d’Ingénieurs en Télécoms) in Lille
• Internet school in Marseille
Speech Production and Perception
Parametric Vocal Tract model (Shinji Maeda)
Non-linear Production model using Distinctive Regions and Modes (René Carré)
Quantal nature of speech (R. Carré and S. Maeda)
Perceptual filter (Nicolas Moreau)
Auditory prosthesis (Alain Goyé and Jacques Prado)
Speech analysis and synthesis
Time-Frequency representations, Wavelets
Time-dependent spectral models (Yves Grenier)
HNM (Harmonics + Noise Model) (Olivier Cappé, Eric Moulines, Maurice Charbit)
Glottal Excited LPC
Time-dependent Spectral Models
Temporal Decomposition (B. Atal, 1983)
Vectorial Autoregressive models with detection of model ruptures (A. DeLima, Y. Grenier)
Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)
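The idea of a time-dependent polynomial expansion can be sketched with a time-dependent autoregressive (AR) model whose coefficients vary over a segment as low-order polynomials in normalised time, estimated by ordinary least squares. This is a minimal sketch in the spirit of the approach listed above, not the LTCI implementation: the function names, the monomial basis (1, t, t², ...) and the plain least-squares criterion are our assumptions.

```python
import numpy as np

def fit_tdar(x, p, q):
    """Fit a time-dependent AR model by least squares (hypothetical helper).

    Assumed model:  x[n] = sum_{i=1..p} a_i(n) x[n-i] + e[n],
    with each coefficient expanded on a polynomial basis in normalised time:
        a_i(n) = sum_{k=0..q} c[i-1, k] * t_n**k,   t_n = n / (N - 1).
    Returns the (p, q+1) array of expansion coefficients c.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    t = np.arange(N) / (N - 1)
    rows, targets = [], []
    for n in range(p, N):
        basis = t[n] ** np.arange(q + 1)            # [1, t, t^2, ..., t^q]
        past = x[n - np.arange(1, p + 1)]           # [x[n-1], ..., x[n-p]]
        rows.append(np.outer(past, basis).ravel())  # one regressor per c[i, k]
        targets.append(x[n])
    c, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return c.reshape(p, q + 1)

def coefficients_at(c, t):
    """Evaluate the AR coefficients a_i at normalised time t in [0, 1]."""
    return c @ (t ** np.arange(c.shape[1]))
```

A whole segment is thus described by p·(q+1) numbers instead of one set of p AR coefficients per frame, which is what makes the parameterisation segmental.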
Temporal Decomposition
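In the usual formulation of temporal decomposition, as we understand Atal's model, a sequence of spectral parameter vectors is approximated by a small number of overlapping "events", each made of a target vector and a time-localised interpolation function. The notation below (y, a_k, φ_k, K, N, P) is ours:

```latex
\hat{\mathbf{y}}(n) = \sum_{k=1}^{K} \mathbf{a}_k \, \phi_k(n), \qquad 1 \le n \le N,
\qquad \text{i.e.} \qquad \hat{\mathbf{Y}} = \mathbf{A}\,\boldsymbol{\Phi}
```

Here y(n) is the P-dimensional parameter vector of frame n, a_k is the target vector of event k, and φ_k(n) is its event function, non-zero only over a short interval. Y is the P×N matrix of frames, A the P×K matrix of targets and Φ the K×N matrix of event functions; with K much smaller than N, the decomposition is a compact, segment-oriented description of the frame sequence.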
HNM: Harmonics + Noise Model
[Figure: HNM analysis block diagram. Blocks: input signal; pitch and energy detection; voicing decision (voiced / unvoiced); harmonic estimation; harmonic envelope estimation; AR estimation of the residual; AR estimation; output: H+Bf parameters. Plot axes: A (amplitude) versus f (frequency).]
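The harmonic-estimation step of the HNM analysis can be illustrated with a minimal least-squares sketch. It assumes that the fundamental frequency f0 of a voiced frame is already known (from the pitch-detection block) and that harmonic amplitudes and phases are constant over the frame; the residual left after subtracting the harmonic part is what the noise (AR) branch would then model. The function name and its parameters are hypothetical.

```python
import numpy as np

def estimate_harmonics(frame, f0, fs, max_freq=None):
    """Least-squares fit of a harmonic model to one voiced frame.

    Assumed model: s[n] ~ sum_k A_k cos(2*pi*k*f0*n/fs + phi_k)
    over all harmonics k*f0 strictly below max_freq (default fs/2).
    Returns (amplitudes, phases, harmonic_part, residual).
    """
    frame = np.asarray(frame, dtype=float)
    if max_freq is None:
        max_freq = fs / 2
    K = int(np.floor((max_freq - 1e-9) / f0))      # number of harmonics kept
    n = np.arange(len(frame))
    k = np.arange(1, K + 1)
    theta = 2 * np.pi * np.outer(n, k) * f0 / fs   # shape (len(frame), K)
    # A cos(theta + phi) = a cos(theta) - b sin(theta), with a = A cos(phi), b = A sin(phi)
    basis = np.hstack([np.cos(theta), -np.sin(theta)])
    coef, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    a, b = coef[:K], coef[K:]
    amplitudes = np.hypot(a, b)
    phases = np.arctan2(b, a)
    harmonic_part = basis @ coef
    residual = frame - harmonic_part               # input to the noise (AR) branch
    return amplitudes, phases, harmonic_part, residual
```

Writing each harmonic as a cosine/sine pair keeps the problem linear in the unknowns, so a single least-squares solve gives all amplitudes and phases at once.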
ALISP: Automatic Language Independent Speech Processing
Automatic discovery of segmental units for speech coding, synthesis, recognition, language identification and speaker verification.
Speech Coding by indexing
SYMPATEX
SYstème de Messagerie unifiée avec présentation vocale des messages (PArole et TEXte): a unified messaging system with spoken presentation of messages (speech and text)
Partners: Thomson-CSF, ELAN TTS, Irius, GET, ESIEE
Coding principle
[Figure: encoder block diagram. Input speech undergoes spectral analysis and prosodic analysis. HMM recognition uses a dictionary of HMM models of the ALISP units (for each unit, e.g. HMM A, a set of representatives A1 … A8) and outputs the ALISP unit index. The synthesis units are then determined: the synthesis unit is chosen among the representatives by DTW, giving the synthesis unit index. Prosody coding transmits pitch, energy and timing.]
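The choice of the synthesis unit by DTW can be sketched as follows. We assume that each recognised ALISP segment and each stored representative (A1 … A8) is a sequence of spectral feature vectors, that the local distance is Euclidean, and that the basic step pattern is used (diagonal, horizontal and vertical moves, each adding the local distance once); the encoder then only needs to transmit the index of the closest representative. The function names and the length normalisation are our assumptions.

```python
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors (lists of floats).

    Local distance: Euclidean. Moves: diagonal, horizontal, vertical.
    The accumulated cost is normalised by len(a) + len(b).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m] / (n + m)

def choose_representative(segment, representatives):
    """Index of the representative closest to the segment under DTW."""
    scores = [dtw_distance(segment, r) for r in representatives]
    return min(range(len(scores)), key=scores.__getitem__)
```

With eight representatives per unit, this index costs 3 bits per segment on top of the ALISP unit index and the prosody parameters.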
Decoding
[Figure: decoder block diagram. Inputs: ALISP index, synthesis representative number, prosody parameters. The synthesis unit is selected among the representatives A1 … A8, and synthetic speech is produced by concatenative synthesis.]
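On the decoder side, concatenative synthesis can be sketched as a lookup of the transmitted (ALISP index, representative number) pairs in the dictionary shared with the encoder, followed by concatenation. This sketch assumes each representative is stored as a waveform and joins consecutive units with a short linear crossfade; prosody modification (applying the transmitted pitch, energy and timing) is deliberately left out, and the dictionary layout and function names are hypothetical.

```python
def crossfade_concat(units, overlap):
    """Concatenate waveform units (lists of floats) with a linear crossfade.

    The last `overlap` samples of the output so far are blended with the
    first `overlap` samples of the next unit.
    """
    if not units:
        return []
    out = list(units[0])
    for u in units[1:]:
        k = min(overlap, len(out), len(u))
        for i in range(k):
            w = (i + 1) / (k + 1)                 # fade-in weight of the new unit
            j = len(out) - k + i
            out[j] = (1 - w) * out[j] + w * u[i]
        out.extend(u[k:])
    return out

def decode(indices, dictionary, overlap=32):
    """indices: list of (alisp_index, representative_number) pairs.

    dictionary[alisp_index][representative_number] -> waveform (list of floats).
    """
    units = [dictionary[a][r] for a, r in indices]
    return crossfade_concat(units, overlap)
```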
Automatic Speech Recognition
Recognition of proper names and spellings
Keyword spotting, noise robustness, adaptation
Large Vocabulary Speech Recognition (SIROCCO)
http://perso.enst.fr/~sirocco/index-en.html
Markov Random Fields, Bayesian Networks and Graphical Models
• Speech modelling with state constrained Markov Random Field over Frequency bands (Guillaume Gravier and Marc Sigelle) http://perso.enst.fr/~ggravier/recherche.html#these
• Comparative framework to study MRF, Bayesian Networks and Graphical Models. http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
Speaker Verification
Typology of approaches (EAGLES Handbook):
• Text dependent
  - Public password
  - Private password
  - Customized password
  - Text prompted
• Text independent
• Incremental enrolment
• Evaluation
Speaker Verification (text independent)
The ELISA consortium: ENST, LIA, IRISA, ... http://www.lia.univ-avignon.fr/equipes/RAL/elisa/index_en.html
NIST evaluations: http://www.nist.gov/speech/tests/spk/index.htm
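Speaker-verification systems are commonly compared through the trade-off between false rejections of true targets and false acceptances of impostors, often summarised by the equal error rate (EER), the operating point where both rates coincide. The sketch below assumes that higher scores mean "more likely the target speaker" and simply sweeps the observed scores as thresholds, without the interpolation a finer estimate would use; the function name is hypothetical.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    An access is accepted when score >= threshold. Returns (eer, threshold),
    where eer is the mean of the false-rejection and false-acceptance rates
    at the threshold where they are closest.
    """
    best = None
    for th in sorted(set(target_scores) | set(impostor_scores)):
        frr = sum(s < th for s in target_scores) / len(target_scores)
        far = sum(s >= th for s in impostor_scores) / len(impostor_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, th)
    return best[1], best[2]
```

For example, with target scores [0.9, 0.8, 0.7, 0.3] and impostor scores [0.1, 0.2, 0.4, 0.6], a threshold of 0.6 rejects one target in four and accepts one impostor in four, giving an EER of 25%.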
Support Vector Machines and Speaker Verification
A hybrid GMM-SVM system is proposed: an SVM scoring model is trained on development data to separate true-target-speaker accesses from impostor accesses, using a new feature representation based on GMMs.
[Figure: modeling stage (GMM) followed by scoring stage (SVM)]
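The GMM-based feature representation fed to the SVM is not detailed here. As a minimal sketch under our own assumption, each access can be summarised by its average per-frame log-likelihood under a target-speaker GMM and under a world (background) GMM, plus their difference, which is the classical log-likelihood-ratio score. The diagonal-covariance GMMs are assumed to be already trained, and all names are hypothetical.

```python
import numpy as np

def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log-likelihood of frames X (T, D) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    X = np.atleast_2d(X)
    diff = X[:, None, :] - means[None, :, :]                          # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff**2 / variances, axis=2)   # (T, M)
    log_comp += np.log(weights)
    top = log_comp.max(axis=1, keepdims=True)                         # log-sum-exp trick
    return (top + np.log(np.exp(log_comp - top).sum(axis=1, keepdims=True))).ravel()

def access_features(X, target_gmm, world_gmm):
    """Hypothetical GMM-based feature vector for one access (SVM input)."""
    lt = gmm_frame_loglik(X, *target_gmm).mean()
    lw = gmm_frame_loglik(X, *world_gmm).mean()
    return np.array([lt, lw, lt - lw])
```

Each GMM is passed as a (weights, means, variances) tuple; the resulting 3-dimensional vectors, computed on development accesses labelled "target" or "impostor", would form the SVM training set.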
SVM principles
[Figure: a point X in the input space is mapped to Φ(X) in the feature space, where a separating hyperplane H and the optimal hyperplane Ho are drawn; the output is Class(X).]
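The SVM principle (map the inputs X to a feature space Φ(X), then pick the separating hyperplane with the largest margin) can be illustrated in its simplest, linear case, where Φ is the identity. The sketch below is an assumption, not the system described here: it trains a linear soft-margin SVM by stochastic sub-gradient descent on the hinge loss (the Pegasos scheme) and classifies with Class(X) = sign(w·X + b). A kernel SVM would replace the dot product with a kernel K(X_i, X), and in the hybrid system X would be the GMM-based access features.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style sub-gradient training of a linear soft-margin SVM.

    X: (N, D) features, y: (N,) labels in {-1, +1}.
    The bias is handled by appending a constant 1 feature, so it is also
    regularised (a common simplification). Returns (w, b).
    """
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            margin = y[i] * (w @ Xa[i])
            w *= (1 - eta * lam)                  # step on the regulariser
            if margin < 1:                        # hinge loss active for this sample
                w += eta * y[i] * Xa[i]
    return w[:-1], w[-1]

def svm_class(X, w, b):
    """Class(X) = sign(w.X + b), with ties mapped to +1."""
    return np.where(np.atleast_2d(X) @ w + b >= 0, 1, -1)
```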
Results
Voice technology in Majordome
Server-side background tasks:
• continuous speech recognition applied to voice messages upon reception
• detection of the sender’s name and of the subject
User interaction:
• speaker identification and verification
• speech recognition (receiving user commands through voice interaction)
• text-to-speech synthesis (reading text summaries, e-mails or faxes)
Perspectives within COST-277
Text-book on Speech Processing
Evaluation of parametric representations of speech for diverse applications
Fundamental work on voice transformations with applications in coding, synthesis, recognition and speaker characterisation
Fundamental work on noise robustness with applications in coding, recognition and speaker verification