AMSP : Advanced Methods for Speech Processing
An expression of Interest to set up a Network of Excellence in FP6
Prepared by members of COST-277 and colleagues
Submitted by Marcos FAUNDEZ-ZANUY
Presented here by Gérard Chollet, GET-ENST/CNRS-LTCI
http://www.tsi.enst.fr/~chollet
Outline
• Rationale of the proposition
• Objectives
• Approaches
• Modeling
• Recognition by synthesis
• Robustness to environmental conditions
• Evaluation paradigm
• Excellence
• Integration and structuring effect
Rationale for the NoE-AMSP
• The areas of Automatic Speech Processing (recognition, synthesis, coding, language identification, speaker verification) should be better integrated
• Better models of Speech Production and Perception
• Investigate Nonlinear Speech Processing
• Understanding, Semantic interpretation
Integrated platform for Automatic Speech Processing
[Figure: diagram connecting HUMAN SPEECH, SYNTHETIC SPEECH, CODED SPEECH, WRITTEN SPEECH and DISCRETE MODELS through Analysis, Synthesis, Recognition and Coding, with paths labelled TtS, StT, StC and CtS.]
Levels of representations
From ONE-LAYER CODES to MULTI-LAYER CODES:
• PCM: No Models, One Quality Layer
• LPC, CELP: Source-Filter Model (SFM), Two Quality Layers
• PLP, WLP: SFM + Perceptual Aspects (PA), Two Quality Layers
• Discrete Models: SFM + PA + Articulatory Aspects & Dynamics (AA), Three or more Quality Layers
• Orthography, IPA: SFM + PA + AA + Language Specific Aspects, Many Quality Layers
Features of Speech Models
• Reflect auditory properties of human perception
• Explain articulatory movements
• Surpass the limitations of the source-filter model
• Capture the dynamics of speech
• Capable of natural speech restitution
• Be discriminant for segmental information
• Robust to noise and channel distortions
• Adaptable to new speakers and new environments
Time – Frequency distributions
• Short Time Fourier Transform
• Non-linear frequency scale (PLP, WLP), mel-cepstrum
• Wavelets, FAMlets
• Bilinear distributions (Wigner-Ville, Choi-Williams,...)
• Instantaneous frequency, Teager operator
• Time-dependent representations (parametric and non parametric)
• Vector quantisation
• Matrix quantisation, non linear prediction
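As an illustration of one item in the list above, the Teager operator can be sketched in a few lines. This is a minimal sketch assuming the usual discrete Teager-Kaiser form, psi[n] = x[n]^2 - x[n-1]*x[n+1]; the function name, the test tone and its parameters (`A`, `omega`) are hypothetical and not taken from the slides.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbourhood,
    so the output is two samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone: amplitude A, digital frequency omega (rad/sample).
A, omega = 0.5, 0.2
tone = [A * math.cos(omega * n) for n in range(200)]
psi = teager_energy(tone)

# For a pure sinusoid the operator output is constant: (A * sin(omega))^2,
# so it tracks amplitude and frequency jointly.
expected = (A * math.sin(omega)) ** 2
```

On a stationary tone the output is flat; on speech, variations in `psi` reflect amplitude and frequency modulation, which is why the operator is listed next to instantaneous frequency.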
Time-dependent Spectral Models
Temporal Decomposition (B. Atal, 1983)
Vectorial Autoregressive models with detection of model ruptures (A. DeLima, Y. Grenier)
Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)
Modeling of segmental units
• Hidden Markov Model
• Markov Fields
• Bayesian Networks, Graphical Models
OR
• Production models
• Synthesis (concatenative or rule based) with voice transformation
AND / OR
• Non linear predictor
Expected achievements in Speech Coding and Synthesis
Modeling the non-linearities in Speech Production and Perception will lead to more accurate and/or compact parametric representations.
Integrate segmental recognition and synthesis techniques in the coding loop to achieve bit rates as low as a few hundred bps with natural quality
Develop voice transformation techniques in order to:
• Adapt segmental coders to new speakers
• Modify the characteristics of synthetic voices
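To see why a segmental coder can reach a few hundred bps, a back-of-the-envelope calculation helps. The sketch below assumes each recognised unit is sent as a fixed-length codebook index; the codebook size (64) and unit rate (15 per second) are hypothetical illustration values, not figures from the slides, and side information such as prosody or speaker characteristics is ignored.

```python
def index_bit_rate(codebook_size, units_per_second):
    """Bit rate (bits/s) when each unit is sent as a fixed-length index."""
    # Number of bits needed to address codebook_size distinct units.
    bits_per_unit = (codebook_size - 1).bit_length()
    return bits_per_unit * units_per_second


# Hypothetical values: 64 segmental units, about 15 units per second.
rate = index_bit_rate(64, 15)  # 6 bits per unit * 15 units/s
```

Under these assumptions the unit indices alone cost well under 100 bps, leaving room for side information within a budget of a few hundred bps.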
Expected achievements in Speech Synthesis
Self-excited nonlinear feedback oscillators will allow a better match between synthetic and human voices.
Current concatenative techniques should be supplemented (or replaced) by (nonlinear) model based generative techniques to improve quality, naturalness, flexibility, training and adaptation.
Model-based voice mimicry controlled by textual, phonetic and/or parametric input should improve not only synthesis but also coding, recognition and speaker characterisation.
Automatic Speech Recognition
Limitations of the HMM and hybrid HMM-ANN approaches
Keyword spotting (detection with SVM), noise robustness, adaptation
Large Vocabulary Speech Recognition (SIROCCO) http://perso.enst.fr/~sirocco/index-en.html
Markov Random Fields, Bayesian Networks and Graphical Models
Markov Random Fields, Bayesian Networks and Graphical Models
• Speech modelling with state constrained Markov Random Field over Frequency bands (Guillaume Gravier and Marc Sigelle) http://perso.enst.fr/~ggravier/recherche.html#these
• Comparative framework to study MRF, Bayesian Networks and Graphical Models. http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html
Recognition by Synthesis
If we could drive a synthesizer with meaningful units (phone sequences, words,...) to produce a speech signal that mimics the one to be recognized, we might come close to transcription.
Analysis by Synthesis (which is in fact modeling) is a powerful tool in recognition and coding.
A trivial implementation is indexing a labelled speech memory
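The "trivial implementation" mentioned above can be sketched as a nearest-neighbour lookup in a labelled memory. This is only an illustrative sketch: the two-dimensional feature vectors, the labels and the function names are hypothetical, and a real speech memory would hold variable-length segments compared with a more suitable distance.

```python
def nearest_label(frame, memory):
    """Return the label of the stored vector closest to `frame`."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(memory, key=lambda item: sq_dist(frame, item[0]))
    return label


def transcribe(frames, memory):
    """Label every frame, then collapse runs of identical labels."""
    collapsed = []
    for frame in frames:
        label = nearest_label(frame, memory)
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return collapsed


# Hypothetical labelled memory: (feature vector, unit label) pairs.
memory = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((2.0, 0.0), "c")]
# Hypothetical incoming frames to be recognised.
frames = [(0.1, -0.1), (0.2, 0.1), (0.9, 1.2), (1.8, 0.1)]
units = transcribe(frames, memory)
```

The returned label sequence plays the role of a transcription, and the matched memory entries could equally be replayed to resynthesise the signal, which is the link with very low bit rate coding.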
ALISP
Automatic Language Independent Speech Processing
Automatic discovery of segmental units for speech coding, synthesis, recognition, language identification and speaker verification.
The robustness issue :
Mismatch between training and testing conditions
High Order Statistics are less sensitive to environment and transmission noise than autocorrelation
CMS, RASTA filtering
Independent Component Analysis
From Speaker Independent to Speaker Dependent recognition (Personalisation)
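Of the techniques listed on this slide, CMS (cepstral mean subtraction) is the simplest to sketch. The example below assumes the standard per-utterance form, where the mean cepstral vector is subtracted from every frame; the toy two-coefficient frames and the channel offset are hypothetical values chosen for illustration.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from each cepstral coefficient."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Hypothetical clean cepstral frames (two coefficients each).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
# A stationary channel is convolutive in time, hence additive in the
# cepstral domain: model it as a constant offset on every frame.
channel = [0.5, -1.5]
distorted = [[c + h for c, h in zip(f, channel)] for f in clean]

clean_cms = cepstral_mean_subtraction(clean)
distorted_cms = cepstral_mean_subtraction(distorted)
```

After normalisation the clean and distorted utterances yield the same features, which is how CMS reduces the mismatch between training and testing channels.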
Expected achievements in Automatic Speech Recognition
Dynamic nonlinear models should allow feature extraction and classification to be merged under a common paradigm
Such models should be more robust to noise, channel distortions and missing data (transmission errors and packet losses)
Indexing a speech memory may help in the verification of hypotheses (a technique shared with Very Low Bit Rate Coders)
Statistical language models should be supplemented with adapted semantic information (conceptual graphs)
Voice technology in Majordome
Server side background tasks:
• Continuous speech recognition applied to voice messages upon reception
• Detection of sender’s name and subject
User interaction:
• Speaker identification and verification
• Speech recognition (receiving user commands through voice interaction)
• Text-to-speech synthesis (reading text summaries, E-mails or faxes)
Collaboration with COST-278
COST-278: Vocal Dialogue is a continuation of COST-249
High interest in Robust Speech Recognition, Word spotting, Speech to actions, Speaker adaptation,...
Some members contribute to the Eureka-MAJORDOME project
Could be the seed for a Network of Excellence in FP6
Evaluation paradigm
DARPA NIST
http://www.nist.gov/speech/tests/spk/index.htm
Could we organize evaluation campaigns in Europe?
The EU's 6th Framework Programme (FP6) is trying to promote Networks of Excellence.
How should excellence be evaluated?
Should financial support be correlated with evaluation results?