Automatic Speaker Verification: Technologies, Evaluations and Possible Future

Gérard CHOLLET, CNRS-LTCI, GET-ENST, chollet@tsi.enst.fr
June 28th, 2004, BioSecure, SecurePhone
Biometrics in Current Security Environments


Page 1

Automatic Speaker Verification: Technologies, Evaluations and Possible Future

Gérard CHOLLET
CNRS-LTCI, GET-ENST
chollet@tsi.enst.fr

Biometrics in Current Security Environments

June 28th, 2004, BioSecure, SecurePhone

Page 2

Outline

• State of affairs (tasks, security, forensic, …)
• Speaker characteristics in the speech signal
• Automatic Speaker Verification:
    • Decision theory
    • Text dependent / Text independent
• Imposture (occasional, dedicated)
• Voice transformations
• Audio-visual speaker verification
• Evaluations (algorithms, field tests, ergonomics, …)
• Conclusions, Perspectives

Page 3

Why should a computer recognize who is speaking?

• Protection of individual property (habitation, bank account, personal data, messages, mobile phone, PDA, ...)
• Limited access (secured areas, databases)
• Personalization (only respond to its master's voice)
• Locate a particular person in an audio-visual document (information retrieval)
• Who is speaking in a meeting?
• Is a suspect the criminal? (forensic applications)

Page 4

Tasks in Automatic Speaker Recognition

• Speaker verification (Voice Biometric)
    • Are you really who you claim to be?
• Identification (Speaker ID):
    • Is this speech segment coming from a known speaker?
    • How large is the set of speakers (population of the world)?
• Speaker detection, segmentation, indexing, retrieval, tracking:
    • Looking for recordings of a particular speaker
• Combining Speech and Speaker Recognition
    • Adaptation to a new speaker, speaker typology
    • Personalization in dialogue systems

Page 5

Applications

• Access Control
    • Physical facilities, Computer networks, Websites
• Transaction Authentication
    • Telephone banking, e-Commerce
• Speech data Management
    • Voice messaging, Search engines
• Law Enforcement
    • Forensics, Home incarceration

Page 6

Voice Biometric

• Advantages
    • Often the only modality over the telephone
    • Low cost (microphone, A/D), ubiquity
    • Possible integration on a smart (SIM) card
    • Natural bimodal fusion: speaking face
• Disadvantages
    • Lack of discretion
    • Possibility of imitation and electronic imposture
    • Lack of robustness to noise, distortion, …
    • Temporal drift

Page 7

Speaker Identity in Speech

• Differences in
    • Vocal tract shapes and muscular control
    • Fundamental frequency (typical values): 100 Hz (Male), 200 Hz (Female), 300 Hz (Child)
    • Glottal waveform
    • Phonotactics
    • Lexical usage
• The difference between the voices of twins is a limiting case
• Voices can also be imitated or disguised

Page 8

Speaker Identity

[Figure: spectral envelope of /i:/ for Speaker A and Speaker B; axes: f, A]

• Segmental factors (~30 ms)
    • Glottal excitation: fundamental frequency, amplitude, voice quality (e.g., breathiness)
    • Vocal tract: characterized by its transfer function and represented by MFCCs (Mel Frequency Cepstral Coefficients)
• Suprasegmental factors
    • Speaking speed (timing and rhythm of speech units)
    • Intonation patterns
    • Dialect, accent, pronunciation habits

Page 9

Acoustic features

Short term spectral analysis
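A minimal sketch of short-term spectral analysis: the signal is cut into short overlapping frames, each frame is windowed, and a magnitude spectrum is computed per frame. The 30 ms frame length follows the segmental time scale mentioned earlier in the talk; the 10 ms hop, the Hamming window, the plain DFT, and all function names are assumptions made for illustration.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(signal, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitude of the DFT of one Hamming-windowed frame (bins 0 .. N/2)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT: slow, but keeps the sketch free of dependencies.
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def short_term_spectra(signal, sample_rate, frame_ms=30, hop_ms=10):
    """One magnitude spectrum per frame; frame and hop sizes are assumed values."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [magnitude_spectrum(f) for f in frame_signal(signal, frame_len, hop)]
```

In practice an FFT would replace the direct DFT, and further steps (mel filter bank, logarithm, cosine transform) would turn each spectrum into cepstral coefficients.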

Page 10

Intra- and Inter-speaker variability

Page 11

Speaker Verification

• Typology of approaches (EAGLES Handbook)
• Text dependent
    • Public password
    • Private password
    • Customized password
    • Text prompted
• Text independent
• Incremental enrolment
• Evaluation

Page 12

History of Speaker Recognition

Page 13

Current approaches

Page 14

HMM structure depends on the application

Page 15

Gaussian Mixture Model

Parametric representation of the probability distribution of observations:
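For reference, a Gaussian mixture density is usually written as a weighted sum of Gaussian components. The notation below (M components, weights w_i, means μ_i, covariances Σ_i, model λ, feature dimension D) is an assumption chosen for illustration and is not taken from the slide.

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(x;\, \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{M} w_i = 1, \quad w_i \ge 0

\mathcal{N}(x;\, \mu_i, \Sigma_i) =
\frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \right)
```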

Page 16

Gaussian Mixture Models

8 Gaussians per mixture
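A minimal sketch of how a Gaussian mixture can score a sequence of feature vectors. It assumes diagonal covariance matrices and uses the log-sum-exp trick for numerical stability; both choices, and all function and parameter names, are illustrative assumptions rather than details from the talk. The number of components is simply the length of the `weights` list, so a mixture of 8 Gaussians would pass 8 weights, 8 mean vectors and 8 variance vectors.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x | model) = log sum_i w_i N(x; mean_i, var_i), via log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def average_log_likelihood(frames, weights, means, variances):
    """Mean per-frame log-likelihood of a sequence of feature vectors."""
    return sum(gmm_log_likelihood(f, weights, means, variances)
               for f in frames) / len(frames)
```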

Page 17

Decision theory for identity verification

Two types of errors:
• False rejection (a client is rejected)
• False acceptance (an impostor is accepted)

Decision theory: given an observation O and a claimed identity
• H0 hypothesis: it comes from an impostor
• H1 hypothesis: it comes from our client

H1 is chosen if and only if P(H1|O) > P(H0|O), which can be rewritten (using Bayes' rule) as

$$\frac{P(O \mid H_1)}{P(O \mid H_0)} > \frac{P(H_0)}{P(H_1)}$$
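The likelihood-ratio form of this decision rule can be sketched in a few lines. The sketch assumes the two likelihoods are available as log-likelihoods (for example from a client model and an impostor or "world" model) and that the prior probability of a genuine client is known; the function and parameter names are hypothetical.

```python
import math


def accept_claim(loglik_client, loglik_impostor, prior_client=0.5):
    """Return True if H1 (the client hypothesis) is chosen.

    H1 is chosen iff P(O|H1) / P(O|H0) > P(H0) / P(H1).
    The comparison is done in the log domain to avoid underflow.
    """
    prior_impostor = 1.0 - prior_client
    log_ratio = loglik_client - loglik_impostor
    log_threshold = math.log(prior_impostor) - math.log(prior_client)
    return log_ratio > log_threshold
```

With equal priors the threshold on the log ratio is zero; a small prior for genuine clients raises the threshold and makes acceptance harder.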

Page 18

Signal detection theory

Page 19

Decision

Page 20

Distribution of scores

Page 21

Detection Error Tradeoff (DET) Curve
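The link between score distributions and a DET curve can be sketched by sweeping a decision threshold over client and impostor scores and recording the two error rates at each threshold. This is a minimal sketch under the assumption that higher scores favour the client; the function names are hypothetical, and the equal error rate is only approximated by the closest observed operating point. A DET plot would additionally map both rates onto a normal-deviate scale, which is omitted here.

```python
def error_rates(client_scores, impostor_scores, threshold):
    """False rejection and false acceptance rates at one threshold.

    A trial is accepted when its score is strictly above the threshold.
    """
    fr = sum(s <= threshold for s in client_scores) / len(client_scores)
    fa = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    return fr, fa


def det_points(client_scores, impostor_scores):
    """(threshold, FR rate, FA rate) for every observed score used as threshold."""
    thresholds = sorted(set(client_scores) | set(impostor_scores))
    return [(t, *error_rates(client_scores, impostor_scores, t))
            for t in thresholds]


def equal_error_rate(client_scores, impostor_scores):
    """Approximate EER: the operating point where FR and FA are closest."""
    _, fr, fa = min(det_points(client_scores, impostor_scores),
                    key=lambda p: abs(p[1] - p[2]))
    return (fr + fa) / 2
```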

Page 22

Evaluation

• Decision cost (FA, FR, priors, costs, …)
• Receiver Operating Characteristic Curve
• Reference systems (open software)
• Evaluations (algorithms, field trials, ergonomics, …)
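A decision cost combines the false acceptance (FA) and false rejection (FR) rates with the prior probability of a genuine client and the cost of each error. The sketch below shows one common form, a prior-weighted sum of the two error rates; the exact formula, the parameter names and the example values are assumptions for illustration, not figures from the talk.

```python
def detection_cost(p_fr, p_fa, p_target, cost_fr=1.0, cost_fa=1.0):
    """Expected cost of decisions.

    p_fr      false rejection rate (clients rejected)
    p_fa      false acceptance rate (impostors accepted)
    p_target  prior probability that a trial comes from the genuine client
    cost_fr   cost of one false rejection (assumed value)
    cost_fa   cost of one false acceptance (assumed value)
    """
    return cost_fr * p_fr * p_target + cost_fa * p_fa * (1.0 - p_target)
```

For example, `detection_cost(0.1, 0.02, p_target=0.01, cost_fr=10.0, cost_fa=1.0)` weights the rare client trials heavily per error while still charging for every accepted impostor.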

Page 23

National Institute of Standards & Technology (NIST) Speaker Verification Evaluations

• Annual evaluation since 1995
• Common paradigm for comparing technologies

Page 24

NIST evaluations: Results

ENST 2003

Page 25

Combining Speech Recognition and Speaker Verification

• Speaker independent phone HMMs
• Selection of segments or segment classes which are speaker specific
• Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)

Page 26

ALISP data-driven speech segmentation

Page 27

Searching in client and world speech dictionaries for speaker verification purposes

Page 28

Fusion

Page 29

Fusion results

Page 30

Speaking Faces: Motivations

• A person speaking in front of a camera offers 2 modalities for identity verification (speech and face).
• The sequence of face images and the synchronisation of speech and lip movements could be exploited.
• Imposture is much more difficult than with single modalities.
• Many PCs, PDAs and mobile phones are equipped with a camera. Audio-Visual Identity Verification will offer non-intrusive security for e-commerce, e-banking, …
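One simple way to exploit two modalities is score-level fusion. The sketch below assumes each modality already yields a verification score; the z-normalization, the weighted sum, the equal default weight and all names are illustrative assumptions and not the method used in the talk.

```python
def z_normalize(score, mean, std):
    """Map a raw score to zero mean and unit variance using reference statistics."""
    return (score - mean) / std


def fuse_scores(speech_score, face_score, speech_weight=0.5):
    """Weighted sum of two already-normalized verification scores."""
    return speech_weight * speech_score + (1.0 - speech_weight) * face_score
```

The fused score would then be compared with a single decision threshold, as with a single-modality score.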

Page 31

Talking Face Recognition (hybrid verification)

Page 32

Lip features

Tracking lip movements

Page 33

A talking face model

Using Hidden Markov Models (HMMs)

Acoustic parameters

Visual parameters

Page 34

Morphing, avatars

Page 35

Conclusions, Perspectives

Deliberate imposture is a challenge for speech only systems

Verification of identity based on features extracted from talking faces should be developed

Common databases and evaluation protocols are necessary

Free access to reference systems will facilitate future developments