Fusion
Gérard CHOLLET
chollet@tsi.enst.fr
GET-ENST/CNRS-LTCI, 46 rue Barrault
75634 PARIS cedex 13
http://www.tsi.enst.fr/~chollet
Outline
Motivations, applications
Pattern recognition
Multiple sensors
Signal enhancement
Features
Scores
Decisions
Conclusions, perspectives
Introduction
Pattern recognition
Why fuse?
What to fuse?
Signals from various sensors
Features measured on these signals
Scores computed by classifiers
Decisions made by classifiers
How to fuse?
Pattern recognition
Signal fusion
Number of sensors
Types of sensors (identical?)
Number of sources
Examples:
microphone arrays, stereovision, seismographs
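As an illustration of signal-level fusion with a microphone array, here is a minimal delay-and-sum sketch. It assumes the arrival delay of the target source at each microphone is already known as an integer number of samples (in practice it would come from the array geometry or a time-delay estimate); the function name and the zero-padding at the edges are illustrative choices, not taken from the slides.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer over integer sample delays.

    channels: one list of samples per microphone, all the same length.
    delays:   arrival delay (in samples) of the target source at each microphone,
              relative to the earliest one; assumed known in advance.
    """
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for t in range(n):
            src = t + d  # read the channel d samples later to undo its delay
            if 0 <= src < n:  # samples shifted past the end are treated as zeros
                out[t] += signal[src]
    return [v / len(channels) for v in out]


# Hypothetical example: the same pulse reaches microphone 2 two samples later.
mic1 = [0.0, 1.0, 2.0, 3.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
print(delay_and_sum([mic1, mic2], [0, 2]))  # [0.0, 1.0, 2.0, 3.0, 0.0, 0.0]
```

Averaging the aligned channels reinforces the target source, while noise that is uncorrelated across microphones partly cancels.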
Feature fusion
From a single sensor
From several sensors
Multi-stream models
Examples:
speech recognition, Bayesian networks
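Feature-level fusion from several sensors can be as simple as concatenating synchronized feature vectors. The sketch below assumes audio features at a frame rate that is an integer multiple of the video rate (for example 100 Hz cepstra and 25 Hz lip features) and repeats each video vector to line the two streams up; the function name and the rates are hypothetical.

```python
def fuse_features(audio_frames, video_frames):
    """Concatenate each audio feature vector with the co-occurring video vector.

    Assumes len(audio_frames) is an integer multiple of len(video_frames),
    e.g. 100 Hz audio features and 25 Hz video features (ratio 4).
    """
    if not video_frames or len(audio_frames) % len(video_frames) != 0:
        raise ValueError("audio frame count must be a multiple of video frame count")
    ratio = len(audio_frames) // len(video_frames)
    # Audio frame t falls inside video frame t // ratio.
    return [list(a) + list(video_frames[t // ratio]) for t, a in enumerate(audio_frames)]


print(fuse_features([[1.0], [2.0], [3.0], [4.0]], [[10.0], [20.0]]))
# [[1.0, 10.0], [2.0, 10.0], [3.0, 20.0], [4.0, 20.0]]
```

Multi-stream models take the alternative route: the streams stay separate and their per-stream likelihoods are combined with stream weights.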
Score fusion
Decision fusion
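Decision fusion only sees the accept/reject outputs of the individual classifiers. A minimal sketch of three common combination rules (AND, OR, majority vote) follows; the rule names and the True/False encoding of accept/reject are assumptions made for the example.

```python
def fuse_decisions(decisions, rule="majority"):
    """Combine binary accept (True) / reject (False) decisions from several classifiers."""
    if rule == "and":
        return all(decisions)  # accept only if every classifier accepts
    if rule == "or":
        return any(decisions)  # accept if at least one classifier accepts
    if rule == "majority":
        return sum(decisions) > len(decisions) / 2  # strict majority of accepts
    raise ValueError(f"unknown rule: {rule}")


votes = [True, True, False]
print(fuse_decisions(votes), fuse_decisions(votes, "and"), fuse_decisions(votes, "or"))
# True False True
```

The AND rule lowers false acceptances at the cost of more false rejections; the OR rule does the reverse.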
Vector Quantization (VQ)
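Best quantization:
$$\mu(X,Y) = \sum_{j} \min_{i} d\left(y_j, C_i^X\right)$$
where $y_j$ are the vectors of the test utterance of speaker Y and $C_i^X$ the codewords of the codebook of speaker X.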
[Figure: codebooks of speakers 1, 2, …, n; the test utterance “Bonjour” from speaker Y is compared with the codebook of speaker X]
SOONG, ROSENBERG 1987
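The VQ score can be sketched in a few lines. This assumes squared Euclidean distance for d and plain lists of feature vectors; the slide does not specify the distortion measure, and the function names are hypothetical. Lower scores mean a better match, so identification picks the speaker whose codebook gives the smallest total distortion (verification would instead compare the claimed speaker's score with a threshold).

```python
def squared_distance(a, b):
    """Squared Euclidean distance, used here as the distortion measure d (assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_score(test_vectors, codebook):
    """mu(X, Y): for each test vector y_j, the distortion to its best-quantizing codeword, summed."""
    return sum(min(squared_distance(y, c) for c in codebook) for y in test_vectors)


def identify(test_vectors, codebooks):
    """Return the speaker whose codebook yields the lowest total distortion."""
    return min(codebooks, key=lambda speaker: vq_score(test_vectors, codebooks[speaker]))


codebooks = {"speaker 1": [[0.0, 0.0], [1.0, 1.0]], "speaker 2": [[5.0, 5.0]]}
test = [[0.0, 0.1], [1.0, 0.9]]
print(identify(test, codebooks))  # speaker 1
```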
Hidden Markov Models (HMM)
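Best path:
$$\mu(X,Y) = -\sum_{j} \log P\left(y_j \mid S_i^X\right)$$
where $S_i^X$ is the state of the model of speaker X aligned with $y_j$ on the best path.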
[Figure: “Bonjour” models of speakers 1, 2, …, n; the test utterance “Bonjour” from speaker Y is scored against the “Bonjour” model of speaker X]
ROSENBERG 1990, TSENG 1992
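A minimal Viterbi sketch of the best-path score for a discrete-observation HMM. The best path is found with the usual Viterbi recursion (transitions included); μ is then computed, as in the formula on this slide, from the log-probabilities of the observations in the states along that path. Discrete symbols, the probability-table layout and the function names are assumptions for illustration; speaker-verification systems normally use continuous (e.g. Gaussian) output densities.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def best_path_score(obs, start, trans, emit):
    """Viterbi decoding, then mu = -sum_j log P(y_j | state on the best path).

    obs:   sequence of discrete observation symbols y_j (integers)
    start: start[i] = probability of starting in state i
    trans: trans[i][k] = probability of moving from state i to state k
    emit:  emit[i][o] = probability of emitting symbol o in state i
    """
    n = len(start)
    # delta[i]: best joint log-probability of a path that ends in state i
    delta = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for k in range(n):
            best = max(range(n), key=lambda i: prev[i] + _log(trans[i][k]))
            ptr.append(best)
            delta.append(prev[best] + _log(trans[best][k]) + _log(emit[k][o]))
        backpointers.append(ptr)
    # Backtrack from the most likely final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    mu = -sum(_log(emit[s][o]) for s, o in zip(path, obs))
    return mu, path


# Hypothetical 2-state left-to-right model over symbols {0, 1}.
mu, path = best_path_score([0, 0, 1, 1],
                           start=[1.0, 0.0],
                           trans=[[0.5, 0.5], [0.0, 1.0]],
                           emit=[[0.9, 0.1], [0.1, 0.9]])
print(path)  # [0, 0, 1, 1]
```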
Ergodic HMM
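Best path:
$$\mu(X,Y) = -\sum_{j} \log P\left(y_j \mid S_i^X\right)$$
where $S_i^X$ is the state of the ergodic HMM of speaker X aligned with $y_j$ on the best path.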
[Figure: HMMs of speakers 1, 2, …, n; the test utterance “Bonjour” from speaker Y is scored against the HMM of speaker X]
PORITZ 1982, SAVIC 1990
Gaussian Mixture Models (GMM)
REYNOLDS 1995
HMM structure depends on the application
Gaussian Mixture Model
Parametric representation of the probability distribution of the observations.
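For reference, the usual form of this density is $p(x \mid \lambda) = \sum_{i=1}^{M} w_i\, \mathcal{N}(x; \mu_i, \Sigma_i)$ with $\sum_{i} w_i = 1$. The sketch below evaluates the average per-frame log-likelihood of a sequence of feature vectors under such a model, assuming diagonal covariances (a common choice for speaker GMMs); the parameter and function names are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (var holds the diagonal)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, weights, means, variances):
    """Average over frames of log p(x | lambda) for a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick: factor out the largest term for stability
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total / len(frames)
```

Comparing this value across speaker models (or against a threshold) gives a simple text-independent score.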
Gaussian Mixture Models
8 Gaussians per mixture
Support Vector Machines and Speaker Verification
A hybrid GMM-SVM system is proposed.
An SVM scoring model is trained on development data to separate true-target accesses from impostor accesses, using a new feature representation based on GMMs.
[Figure: modeling with GMMs, scoring with an SVM]
SVM principles
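[Figure: a mapping X → Φ(X) from the input space to the feature space, where the classes are separated by hyperplanes H; Class(X) is given by the optimal hyperplane Ho]
Separating hyperplanes H, with the optimal hyperplane Ho.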
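The decision rule can be written as Class(X) = sign(Σ_i α_i y_i K(x_i, X) + b), where the kernel K computes inner products ⟨Φ(x_i), Φ(X)⟩ in the feature space without building Φ explicitly. The sketch below evaluates this rule with a Gaussian (RBF) kernel; the support vectors, weights α_i and bias b would come from training, and the values used here are made up for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2), an inner product in an implicit feature space."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def svm_class(x, support_vectors, labels, alphas, bias, gamma=1.0):
    """Return +1 or -1 depending on which side of the separating hyperplane x falls."""
    f = sum(a * y * rbf_kernel(sv, x, gamma)
            for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if f >= 0 else -1


# Hypothetical trained model: one support vector per class.
svs, ys, alphas = [[0.0, 0.0], [3.0, 3.0]], [1, -1], [1.0, 1.0]
print(svm_class([0.1, 0.0], svs, ys, alphas, 0.0))  # 1
```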
Results
Combining Speech Recognition and Speaker Verification.
Speaker-independent phone HMMs
Selection of segments, or classes of segments, that are speaker specific
Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)
Part of this work was done during a 6-week workshop (SuperSID) in summer 2002
SuperSID experiments
GMM with cepstral features
Selection of nasals in words ending in -ing:
being, everything, getting, anything, thing, something, things, going
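Selecting such segments starts from a recognizer's word alignment. The sketch below keeps words ending in -ing or -ings together with their time stamps; the (word, start, end) tuple format is a hypothetical stand-in for real recognizer output, and a spelling test is only an approximation (it would also keep words such as "king"). Cutting out the nasal itself would additionally require a phone-level alignment.

```python
def select_ing_words(alignment):
    """Keep aligned words spelled with an -ing or -ings ending.

    alignment: list of (word, start_seconds, end_seconds) tuples, a hypothetical
    stand-in for a recognizer's word-level output.
    """
    selected = []
    for word, start, end in alignment:
        w = word.lower()
        if w.endswith(("ing", "ings")):  # str.endswith accepts a tuple of suffixes
            selected.append((w, start, end))
    return selected


words = [("Something", 0.0, 0.5), ("is", 0.5, 0.6), ("going", 0.6, 0.9),
         ("on", 0.9, 1.0), ("things", 1.0, 1.4)]
print(select_ing_words(words))
# [('something', 0.0, 0.5), ('going', 0.6, 0.9), ('things', 1.0, 1.4)]
```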
Fusion
Fusion results
Audio-Visual Identity Verification
A person speaking in front of a camera offers two modalities for identity verification (speech and face).
The sequence of face images and the synchronisation of speech and lip movements could be exploited.
Impersonation is much more difficult than with a single modality.
Many PCs, PDAs and mobile phones are equipped with a camera. Audio-visual identity verification will offer non-intrusive security for e-commerce, e-banking, …
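One hypothetical way to exploit speech/lip synchronisation is to correlate a per-frame audio energy track with a per-frame mouth-opening measurement, allowing a small audio/video offset. The feature choice, the lag range and the function names below are assumptions, not the method of the slides; a low best correlation would hint that the voice and the face do not come from the same recording.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant track carries no synchrony information
    return cov / math.sqrt(va * vb)


def synchrony_score(audio_energy, mouth_opening, max_lag=2):
    """Best correlation over audio/video offsets of -max_lag..max_lag frames."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:  # audio lags video by `lag` frames
            a, v = audio_energy[lag:], mouth_opening[:len(mouth_opening) - lag]
        else:         # video lags audio by `-lag` frames
            a, v = audio_energy[:lag], mouth_opening[-lag:]
        n = min(len(a), len(v))
        if n >= 2:
            best = max(best, pearson(a[:n], v[:n]))
    return best
```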
Examples of Speaking Faces
Sequence of digits (PIN code)
Free text
Fusion of Speech and Face
(from the thesis of Conrad Sanderson, Aug. 2002)
An illustration
1. Acquisition of biometric signals for each modality
2. Scores are computed for each modality
3. Fusion of scores and decision
[Figure: insecure network; distant server providing 1. access to private data, 2. secured transactions]
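Steps 2 and 3 can be sketched as follows, assuming each modality's raw score is first normalized with the mean and standard deviation of impostor scores (z-normalization) and the normalized scores are then combined by a weighted sum and compared with a threshold. The modality names, statistics, weights and threshold are hypothetical; in practice they would be estimated on development data.

```python
def z_normalize(score, impostor_mean, impostor_std):
    """Put a raw score on a common scale using impostor-score statistics."""
    return (score - impostor_mean) / impostor_std


def fuse_and_decide(scores, norm_stats, weights, threshold=0.0):
    """Weighted sum of z-normalized per-modality scores, then accept/reject.

    scores:     {modality: raw score}
    norm_stats: {modality: (impostor_mean, impostor_std)}
    weights:    {modality: fusion weight}
    """
    fused = sum(weights[m] * z_normalize(scores[m], *norm_stats[m]) for m in weights)
    return fused, fused >= threshold


fused, accepted = fuse_and_decide(
    scores={"speech": 2.0, "face": 0.5},
    norm_stats={"speech": (0.0, 1.0), "face": (0.0, 0.5)},
    weights={"speech": 0.6, "face": 0.4},
)
print(accepted)  # True
```

Normalizing before summing matters because the raw scores of different modalities (log-likelihood ratios, distances, SVM outputs) live on unrelated scales.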
Conclusions and Perspectives
Speech is often the only usable biometric modality (over the telephone network).
Interactive voice servers may use both text-dependent and text-independent approaches for improved verification accuracy.
Evaluation campaigns and research workshops are effective means of stimulating progress.
Most PCs, PDAs and mobile phones will be equipped with cameras. Audio-visual identity verification should find applications in e-banking, e-commerce, …