
Some activities on Non-linear Speech Processing

at ENST/CNRS-LTCI

Gérard Chollet

ENST/CNRS-LTCI

46 rue Barrault

75634 PARIS cedex 13

http://www.tsi.enst.fr/~chollet

Outline

What is ENST/CNRS-LTCI?

Research and application topics related to COST-277:

• Speech production and perception

• Speech analysis and synthesis

• Speech coding: the SYMPATEX project

• Automatic speech recognition: the SIROCCO project

• Speaker characterisation and verification

Perspectives within COST-277

Our affiliations

ENST: Ecole Nationale Supérieure des Télécommunications, http://www.enst.fr

CNRS: Centre National de la Recherche Scientifique, http://www.cnrs.fr

LTCI: Laboratoire de Traitement et Communication de l’Information, http://www.enst.fr/externe/ura.html

What is ENST?

Ecole Nationale Supérieure des Télécommunications

• Ranked among the ‘Grandes Ecoles d'Ingénieurs’.

• 250 state-certified engineers graduate each year.

• Part of the ‘Groupement des Ecoles de Télécommunications’.

GET: Groupement des Ecoles de Télécommunications

• ENST-Paris

• ENST-Bretagne in Brest

• Institut National des Télécommunications in Evry

• EURECOM in Sophia-Antipolis

• ENIC (Ecole Nouvelle d’Ingénieurs en Télécoms) in Lille

• Internet school in Marseille

Speech Production and Perception

Parametric Vocal Tract model (Shinji Maeda)

Non-linear Production model using Distinctive Regions and Modes (René Carré)

Quantal nature of speech (R. Carré and S. Maeda)

Perceptual filter (Nicolas Moreau)

Auditory prosthesis (Alain Goyé and Jacques Prado)

Speech analysis and synthesis

Time-Frequency representations, Wavelets

Time-dependent spectral models (Yves Grenier)

HNM (Harmonics + Noise Model) (Olivier Cappé, Eric Moulines, Maurice Charbit)

Glottal Excited LPC

Time-dependent Spectral Models

Temporal Decomposition (B. Atal, 1983)

Vectorial Autoregressive models with detection of model ruptures (A. DeLima, Y. Grenier)

Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)

Temporal Decomposition

HNM: Harmonics + Noise Model

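[Block diagram of HNM analysis. Labels: input signal; detection of pitch, voicing and energy; estimation of the harmonics; estimation of the harmonic envelope; AR estimation of the residual (voiced branch, after subtraction); AR estimation (unvoiced branch); output parameters H + Bf. Spectrum sketches plot amplitude A against frequency f.]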
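As a rough illustration of the harmonic-plus-residual split, the sketch below projects a voiced frame onto sinusoids at multiples of the pitch and keeps the remainder as the noise part. It is a simplified stand-in, not the HNM implementation referred to in the slides: the function names `harmonic_part` and `decompose` are hypothetical, the pitch `f0` is assumed to be already known, and the frame is assumed to span a whole number of pitch periods so that the projection coincides with a least-squares fit. The AR modelling of the residual is not shown.

```python
import math

def harmonic_part(frame, f0, fs, n_harmonics):
    """Sum of the first n_harmonics sinusoids at multiples of f0 fitted to frame.

    Assumes the frame covers a whole number of pitch periods, so that each
    cosine/sine pair can be fitted independently by simple projection.
    """
    n = len(frame)
    harmonic = [0.0] * n
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs          # angular frequency of harmonic k
        c = [math.cos(w * t) for t in range(n)]
        s = [math.sin(w * t) for t in range(n)]
        a = 2.0 / n * sum(x * ci for x, ci in zip(frame, c))   # cosine amplitude
        b = 2.0 / n * sum(x * si for x, si in zip(frame, s))   # sine amplitude
        for t in range(n):
            harmonic[t] += a * c[t] + b * s[t]
    return harmonic

def decompose(frame, f0, fs, n_harmonics):
    """Return (harmonic part, residual) for one voiced frame."""
    harmonic = harmonic_part(frame, f0, fs, n_harmonics)
    residual = [x - h for x, h in zip(frame, harmonic)]
    return harmonic, residual
```

With a frame made only of components at `f0` and `2 * f0`, calling `decompose(frame, f0, fs, 2)` leaves a residual that is zero up to rounding error; in an analysis chain of this kind the residual is what an AR model would then describe.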

ALISP

Automatic Language Independent Speech Processing

Automatic discovery of segmental units for speech coding, synthesis, recognition, language identification and speaker verification.

Speech Coding by indexing

SYMPATEX

SYstème de Messagerie unifiée avec présentation vocale des messages (PArole et TEXte): a unified messaging system with voice presentation of messages (speech and text)

Thomson-CSF, ELAN TTS, Irius

GET, ESIEE

Coding principle

[Block diagram of the encoder. Input: speech. Blocks: spectral analysis; prosodic analysis; HMM recognition, using the dictionary of HMM models of the ALISP units; determination of the synthesis units; choice of the synthesis unit by DTW among the representatives A1 … A8 of HMM A; prosody coding. Output: ALISP unit index, synthesis unit index, and prosody (pitch, energy, time).]
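The step in which a synthesis unit is chosen by DTW can be sketched as follows: each stored representative of the recognised ALISP unit is compared with the incoming segment by dynamic time warping, and the index of the closest one is what gets transmitted. This is a minimal sketch under assumptions that are not in the slides: the names `frame_distance`, `dtw_distance` and `choose_representative` are hypothetical, frames are plain lists of floats, the local distance is Euclidean, and the classic three-way step pattern is used without any path constraint.

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Cumulative DTW cost between two non-empty sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def choose_representative(segment, representatives):
    """Index of the representative closest to the segment under DTW."""
    distances = [dtw_distance(segment, rep) for rep in representatives]
    return min(range(len(representatives)), key=distances.__getitem__)
```

For example, with `segment = [[0.0], [1.0], [2.0]]` and a candidate `[[0.0], [1.0], [1.0], [2.0]]` among the representatives, that candidate is selected because the warping absorbs the repeated frame at zero cost.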

Decoding

[Block diagram of the decoder. Inputs: ALISP index, synthesis representative number, prosody parameters. Blocks: choice of the synthesis unit among the representatives A1 … A8; synthesis by concatenation. Output: synthetic speech.]

Automatic Speech Recognition

Recognition of proper names and spellings

Keyword spotting, noise robustness, adaptation

Large Vocabulary Speech Recognition (SIROCCO)

http://perso.enst.fr/~sirocco/index-en.html

Markov Random Fields, Bayesian Networks and Graphical Models

Markov Random Fields, Bayesian Networks and Graphical Models

• Speech modelling with state-constrained Markov Random Fields over frequency bands (Guillaume Gravier and Marc Sigelle) http://perso.enst.fr/~ggravier/recherche.html#these

• Comparative framework to study MRF, Bayesian Networks and Graphical Models. http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html

Speaker Verification

• Typology of approaches (EAGLES Handbook): text dependent (public password, private password, customized password, text prompted), text independent

• Incremental enrolment

• Evaluation

Speaker Verification (text independent)

The ELISA consortium (ENST, LIA, IRISA, ...): http://www.lia.univ-avignon.fr/equipes/RAL/elisa/index_en.html

NIST evaluations: http://www.nist.gov/speech/tests/spk/index.htm

Support Vector Machines and Speaker Verification

A hybrid GMM-SVM system is proposed.

An SVM scoring model is trained on development data to separate true-target speaker accesses from impostor accesses, using a new feature representation based on GMMs.

Modeling: GMM

Scoring: SVM

SVM principles

[Figure: a point X in the input space is mapped to Φ(X) in the feature space, where a separating hyperplane H and the optimal hyperplane Ho are drawn; the output is Class(X).]
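The idea in this figure can be made concrete with a small toy example: four points that no straight line separates in the input space become separable once mapped into a feature space, and among the separating hyperplanes the one with the largest margin is preferred. Everything in the sketch is an assumption chosen for illustration: the feature map `phi`, the four points, and the two hyperplanes `H` and `Ho`. No training is performed here; the hyperplanes are simply given and compared.

```python
import math

def phi(x):
    """Hypothetical feature map: adds the product x1*x2 as a third coordinate."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

def decision(w, b, x):
    """Signed score of x with respect to the hyperplane w.phi(x) + b = 0."""
    return sum(wi * zi for wi, zi in zip(w, phi(x))) + b

def classify(w, b, x):
    """Class(X): +1 on one side of the hyperplane, -1 on the other."""
    return 1 if decision(w, b, x) >= 0 else -1

def margin(w, b, points, labels):
    """Smallest signed distance from the labelled points to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision(w, b, x) / norm for x, y in zip(points, labels))

# XOR-like toy data: not linearly separable in the two-dimensional input space.
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]

H = ((0.5, 0.0, 1.0), 0.0)   # a separating hyperplane in feature space
Ho = ((0.0, 0.0, 1.0), 0.0)  # a separating hyperplane with a larger margin
```

Both `H` and `Ho` classify the four points correctly, but `margin(*Ho, points, labels)` is larger than `margin(*H, points, labels)`, which is the criterion an SVM uses to prefer one separating hyperplane over another.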

Results

Voice technology in Majordome

Server side background tasks:

• Continuous speech recognition applied to voice messages upon reception

• Detection of sender’s name and subject

User interaction:

• Speaker identification and verification

• Speech recognition (receiving user commands through voice interaction)

• Text-to-speech synthesis (reading text summaries, e-mails or faxes)

Perspectives within COST-277

Text-book on Speech Processing

Evaluation of parametric representations of speech for diverse applications

Fundamental work on voice transformations with applications in coding, synthesis, recognition and speaker characterisation

Fundamental work on noise robustness with applications in coding, recognition and speaker verification
