Audio-Visual Speech Processing Gérard Chollet, Hervé Bredin, Thomas Hueber, Rémi Landais, Patrick Perrot, Leila Zouari NOLISP, Paris, March 23rd 2007


Audio-Visual Speech Processing

Gérard Chollet, Hervé Bredin, Thomas Hueber, Rémi Landais, Patrick Perrot, Leila Zouari

NOLISP, Paris, March 23rd 2007


NOLISP 2007, Paris, 23 May 2007

Audiovisual identity verification

Compulsory? for:
– Homeland/company security: restricted access, …
– Secured computer login
– Secured online signature of contracts (e-commerce)


Audiovisual identity verification

Available features

– Face / facial features (lips, eyes): face modality
– Speech: speech modality
– Speech synchrony (lips and speech): synchrony modality


Audiovisual identity verification

Face modality:
– Detection:
  • Generative models (MPT toolbox)
  • Temporal median filtering
  • Eye detection within faces
– Normalization: geometry + illumination
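Temporal median filtering stabilises face detections across video frames. The slides do not say exactly what is filtered. The sketch below assumes that each coordinate of a per-frame face bounding box (x, y, width, height) is filtered independently over a sliding window. The box layout, the window length and the function name are hypothetical.

```python
import numpy as np

def temporal_median_filter(boxes, window=5):
    """Smooth per-frame face boxes with a sliding temporal median.

    boxes: (T, 4) array of (x, y, width, height), one row per frame (assumed layout).
    window: odd window length in frames (hypothetical default).
    Each coordinate is filtered independently. At the sequence edges,
    the first and last frames are repeated.
    """
    boxes = np.asarray(boxes, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = np.pad(boxes, ((half, half), (0, 0)), mode="edge")
    return np.stack([np.median(padded[t:t + window], axis=0)
                     for t in range(len(boxes))])
```

A single spurious detection in an otherwise stable sequence is replaced by the median of its neighbours, so an isolated false alarm does not move the face crop.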


Audiovisual identity verification

Face modality:
– Selection:
  • Keep only the most reliable detection results
  • Based on the distance Rel between a detected zone and its projection onto the eigenfaces space
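The slides do not give a formula for Rel. One common choice, assumed here, is the reconstruction error, sometimes called the "distance from face space". A detected zone is flattened into a vector, projected onto the span of the eigenfaces, and Rel is the norm of the residual. Detections that look least like faces therefore get the largest Rel. The "keep the N smallest" rule and all function names are hypothetical.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """PCA on flattened training faces (N, P): returns mean (P,) and basis (K, P)."""
    faces = np.asarray(faces, dtype=float)
    mean = faces.mean(axis=0)
    # Rows of vt are orthonormal principal directions (the eigenfaces).
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:n_components]

def rel_distance(zone, mean, eigenfaces):
    """Norm of the residual between a zone and its projection on the eigenface space."""
    centered = np.asarray(zone, dtype=float).ravel() - mean
    projection = eigenfaces.T @ (eigenfaces @ centered)
    return float(np.linalg.norm(centered - projection))

def select_reliable(zones, mean, eigenfaces, keep=10):
    """Indices of the `keep` zones with the smallest Rel (hypothetical selection rule)."""
    dists = [rel_distance(z, mean, eigenfaces) for z in zones]
    return sorted(np.argsort(dists)[:keep].tolist())
```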


Audiovisual identity verification

Face modality:
– Two verification strategies and one single comparison framework
  • Global = eigenfaces:
    – Computation of a set of directions (eigenfaces) defining a projection space
    – Two faces are compared through their projections onto the eigenfaces space
    – Training data: BIOMET (130 persons) + BANCA (30 persons)
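In the global strategy, eigenfaces are obtained by PCA (computed here via SVD) on flattened training faces. Each face is then represented by its projection coefficients. The slides only say that faces are compared through their projections. Using cosine similarity between coefficient vectors is an assumption, and the function names are hypothetical.

```python
import numpy as np

def fit_eigenfaces(train_faces, n_components):
    """Mean face (P,) and top-K eigenfaces (K, P) from flattened faces (N, P)."""
    X = np.asarray(train_faces, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(face, mean, eigenfaces):
    """Coefficients of a flattened face in the eigenface space."""
    return eigenfaces @ (np.asarray(face, dtype=float).ravel() - mean)

def face_similarity(face_a, face_b, mean, eigenfaces):
    """Cosine similarity between the projections of two faces (assumed measure)."""
    a = project(face_a, mean, eigenfaces)
    b = project(face_b, mean, eigenfaces)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

Once the eigenfaces are learned, for example on BIOMET + BANCA, a face is summarised by K numbers instead of P pixels, so comparisons are cheap.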


Audiovisual identity verification

Face modality:
• SIFT descriptors:
  – Keypoint extraction
  – Keypoint representation: 128-dimensional SIFT descriptor (gradient orientation histograms, …) + 4-dimensional position vector: position (x, y) + scale + orientation
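The sketch below illustrates the 128 + 4 layout with a simplified SIFT-style descriptor. A 16×16 patch is split into a 4×4 grid of cells, and each cell contributes an 8-bin gradient orientation histogram, giving 4 × 4 × 8 = 128 values. The four keypoint parameters are then appended.

This version omits the Gaussian weighting, trilinear interpolation and rotation normalisation of Lowe's full SIFT, and a real system would use an existing SIFT implementation. The function names are hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Simplified SIFT-style 128-dim descriptor of a 16x16 grayscale patch."""
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)          # angles in [0, 2*pi)
    bins = np.minimum((ori / (2 * np.pi) * 8).astype(int), 7)
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            # Each pixel votes, weighted by gradient magnitude, into its cell's histogram.
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    norm = np.linalg.norm(desc)
    if norm > 0:
        desc = np.minimum(desc / norm, 0.2)              # clip large values, as in Lowe's SIFT
        desc = desc / np.linalg.norm(desc)
    return desc

def keypoint_vector(patch, x, y, scale, orientation):
    """132-dim record: 128-dim descriptor + (x, y, scale, orientation)."""
    return np.concatenate([sift_like_descriptor(patch), [x, y, scale, orientation]])
```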


Audiovisual identity verification

Face modality:
• SVD-based matching method:
  – Compare two videos V1 and V2
  – Exclusive principle: one-to-one correspondences between
    » faces (global)
    » descriptors (local)
  – Principle:
    » Computation of a proximity matrix between faces or descriptors
    » Extraction of good pairings (made easy by SVD computation)
  – Scores:
    » One matching score between global representations
    » One matching score between local representations
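The slides do not detail how pairings are extracted. One well-known SVD-based scheme, assumed here, is that of Scott and Longuet-Higgins:

1. Build a Gaussian proximity matrix G between the two sets.
2. Compute G = U D V^T.
3. Replace D by the identity to obtain P = U V^T.
4. Accept (i, j) as a pairing only when P[i, j] is the maximum of both its row and its column. This enforces the one-to-one (exclusive) principle.

The same function applies to whole faces (global) or to SIFT descriptors (local). The score definition and the sigma value are hypothetical.

```python
import numpy as np

def svd_matching(A, B, sigma=1.0):
    """One-to-one pairings between feature sets A (m, d) and B (n, d).

    Returns (pairs, score), where pairs is a list of (i, j) index pairs.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Gaussian proximity matrix G[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    G = np.exp(-d2 / (2.0 * sigma ** 2))
    # Replace the singular values by ones: P = U V^T.
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    P = U @ Vt
    # Keep (i, j) only if P[i, j] dominates both its row and its column.
    pairs = [(i, int(np.argmax(P[i]))) for i in range(P.shape[0])
             if int(np.argmax(P[:, np.argmax(P[i])])) == i]
    # Hypothetical score: summed proximity of the pairings over the larger set size.
    score = sum(G[i, j] for i, j in pairs) / max(len(A), len(B))
    return pairs, float(score)
```

Unmatched items, such as a face that appears only in V1, lower the score because the sum is normalised by the larger set.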


Audiovisual identity verification

Speech modality:
– GMM-based approach:
  • One world model
  • Each speaker model is derived from the world model by MAP adaptation
  • Speech verification score: derived from the likelihood ratio
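The sketch below shows a GMM world-model / MAP-adaptation scheme under several assumptions:

- diagonal covariances;
- adaptation of the means only, with relevance factor r = 16 (the classic GMM-UBM recipe);
- a verification score equal to the average per-frame log-likelihood ratio between the speaker model and the world model.

Training the world model by EM on many speakers is not shown, and the function names are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Per-frame, per-component log N(x; mu, diag(var)): X (T, D) -> (T, M)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None], axis=2))

def frame_loglik(X, weights, means, variances):
    """Per-frame GMM log-likelihood (T,) and weighted component log-densities (T, M)."""
    lg = log_gauss_diag(X, means, variances) + np.log(weights)[None, :]
    m = lg.max(axis=1, keepdims=True)                      # log-sum-exp for stability
    return (m + np.log(np.exp(lg - m).sum(axis=1, keepdims=True))).ravel(), lg

def map_adapt_means(X, weights, means, variances, r=16.0):
    """MAP adaptation of the world-model means to a speaker's frames X (T, D)."""
    ll, lg = frame_loglik(X, weights, means, variances)
    post = np.exp(lg - ll[:, None])                        # responsibilities (T, M)
    n = post.sum(axis=0)                                   # soft counts per component
    ex = post.T @ X / np.maximum(n, 1e-10)[:, None]        # per-component data means
    alpha = (n / (n + r))[:, None]
    return alpha * ex + (1 - alpha) * means

def llr_score(X, weights, world_means, speaker_means, variances):
    """Average per-frame log-likelihood ratio: speaker model vs. world model."""
    spk, _ = frame_loglik(X, weights, speaker_means, variances)
    wld, _ = frame_loglik(X, weights, world_means, variances)
    return float(np.mean(spk - wld))
```

Components that see little speaker data (small n) keep means close to the world model, which is why MAP adaptation works with short enrolment sessions.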


Audiovisual identity verification

Synchrony modality:
– Principle: the synchrony between lips and speech carries identity information
– Process:
  • Computation of a synchrony model (co-inertia analysis, CoIA) for each person, based on DCT features (visual signal) and MFCC features (speech signal)
  • Comparison of the test sample with the synchrony model
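Co-inertia analysis finds pairs of directions, one in the visual (DCT) space and one in the acoustic (MFCC) space, that maximise the covariance between the projected streams. These directions are the leading singular vectors of the cross-covariance matrix. The sketch below fits such a per-person model on time-aligned frames. As an assumed scoring rule, it uses the mean absolute correlation of the projected test streams. Feature dimensions, k and the function names are hypothetical.

```python
import numpy as np

def coia_fit(X, Y, k=1):
    """Per-person CoIA model from aligned visual X (T, dx) and audio Y (T, dy) frames."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    C = (X - mx).T @ (Y - my) / (len(X) - 1)   # cross-covariance (dx, dy)
    U, _, Vt = np.linalg.svd(C, full_matrices=False)
    return mx, my, U[:, :k], Vt[:k].T           # k visual and k audio directions

def sync_score(model, X, Y):
    """Mean |correlation| between projected visual and audio test streams (assumed score)."""
    mx, my, A, B = model
    px, py = (X - mx) @ A, (Y - my) @ B
    corrs = [abs(np.corrcoef(px[:, i], py[:, i])[0, 1]) for i in range(A.shape[1])]
    return float(np.mean(corrs))
```

A desynchronised test sample, such as dubbed audio or a replayed recording, should give a much lower score than genuinely synchronous lips and speech.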


Audiovisual identity verification

Experiments:
– BANCA database:
  • 52 persons divided into two groups (G1 and G2)
  • 3 recording conditions
  • 8 recordings per person (4 client accesses, 4 impostor accesses)
  • Evaluation based on the P protocol: 234 client accesses and 312 impostor accesses
– Scores:
  • 4 scores per access (PCA face, SIFT face, speech, synchrony)
  • Score fusion based on an RBF-SVM (hyperplane learned on G1 and tested on G2, and conversely)
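The fusion protocol can be sketched with scikit-learn's `SVC`. Each access is a 4-dimensional vector of scores (PCA face, SIFT face, speech, synchrony). An RBF-SVM is trained on one group, and its decision function serves as the fused score on the other group; the roles are then swapped. Per-modality score standardisation, C = 1 and gamma = "scale" are assumptions, not settings reported in the slides. The function name is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fuse_cross_groups(scores_g1, labels_g1, scores_g2, labels_g2, C=1.0):
    """Two-fold fusion: train on G1 and score G2, then train on G2 and score G1.

    scores_*: (n_accesses, 4) modality scores; labels_*: 1 = client, 0 = impostor.
    Returns fused scores per group; higher means more client-like.
    """
    folds = {"G2": (scores_g1, labels_g1, scores_g2),
             "G1": (scores_g2, labels_g2, scores_g1)}
    fused = {}
    for test_group, (X_train, y_train, X_test) in folds.items():
        scaler = StandardScaler().fit(X_train)   # normalisation learned on training group only
        clf = SVC(kernel="rbf", C=C, gamma="scale")
        clf.fit(scaler.transform(X_train), y_train)
        # For binary SVC, decision_function > 0 favours clf.classes_[1], i.e. clients (1).
        fused[test_group] = clf.decision_function(scaler.transform(X_test))
    return fused
```

Training and testing on disjoint groups of people keeps the fusion hyperplane from being tuned on the same identities it is evaluated on.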


Audiovisual identity verification

Experiments: