Speaker Verification

Alexandros Xafopoulos

Informatics Dpt.

Digital Days, 05-06/2001

Presentation Outline

• Framework
• Preprocessing
• Features (Extraction, Noise Compensation-Channel Equalization, Selection)
• (Pattern) Matching-Modeling
• Decision Making
• Performance Evaluation
• Experimental Results
• References

Framework: Introduction

• Motivation
  – Speech contains speaker-specific characteristics (physiological-behavioral)
    • vocal tract
    • pitch range - vocal cords
    • articulator movement
      – Mouth
      – Nasal cavity
      – Lips
  – Voiceprint as a biometric (distinguishing trait)
  – Natural & economical way

Framework: Introduction (2)

• Objective
  – Discriminate between a given speaker & all others
• Definitions
  – Verification < Latin verus (true)
    • Claim: speaker identity
    • Proof: speech utterance
    • Binary decision to establish the truth
  – Client: speaker registered on the system
  – Impostor: speaker who claims a false identity
  – Model: set of parameters that represents a speaker or a group of speakers

Framework: Related Research Areas

• Signal Processing
[Figure: taxonomy diagram with the nodes Signal Processing, Analog Signal Processing, Digital Signal Processing, Speech Processing, Other Signals, Analysis, Recognition, Coding / Synthesis, Enhancement, Storage / Transmission, Speech Recognition, Language Identification, Speaker Recognition, Speaker Identification, Speaker Detection / Tracking, Speaker Verification]

Framework: Related Research Areas (2)

• (Statistical) Pattern Recognition
[Figure: Data → Feature Extractor → Features → Trained Classifier → Class Label]

• Biometrics Technology
  – def: automatic identification of a person based on his/her physiological or behavioral characteristics (= biometrics)
  – desirable properties of biometrics [Jain_bk]
    • universality (found in every person)
    • uniqueness (different "value" for each person)
    • permanence (invariant with time)
    • collectability (quantitatively measurable)
    • performance (accuracy vs. resources)
    • high acceptability (person's willingness)
    • low circumvention (not easy to "fool")

• Speech Science
[Figure: Communication by speech [Somervuo]]

Framework: Related Research Areas (5)

• Speech Science (2)
[Figure: Speech Production Physiology; Block Diagram of Human Speech Production [Picone]]

Framework: Related Research Areas (6)

• Speech Science (3)
[Figure: General Discrete-Time Model for Speech Production [Morgan] (corrected); labels: gain for voice source, gain for noise source, H(z), R(z)]

Framework: Generic Speaker Verification Process

• Enrollment (Training) module
[Figure: Speaker "A", N utterances → speech pressure wave of "A" → channel to transfer signal → Digital Signal Acquisition → digital speech → Feature Creation → N sets of feature vectors → Model Registration (known identity: "Speaker is "A"") → speaker model of "A"]

Framework: Generic Speaker Verification Process (2)

• Enrollment module (2)
  – Digital signal acquisition
    • Sampling frequency: $F_s$
[Figure: speech pressure wave → Microphone → analog voltage signal → Antialiasing low-pass filter → conditioned analog signal → Sampling & Quantization (A/D converter) → digital speech]

Framework

• Enrollment module (3): Feature creation
[Figure: digital speech → Preprocessing → preprocessed digital speech → Feature Extraction → plain feature vectors → Noise Compensation & Channel Equalization → (clean) feature vectors → Feature Selection]

Framework

• Verification module
[Figure: speech pressure wave of "A" → Digital Signal Acquisition → digital speech → Feature Creation → feature vectors → Pattern Matching → matching results → Decision Making → acceptance (A = B) or rejection (A ≠ B). Claimed identity B → Model Selection (from |P| speaker models) → speaker model of "B" → Pattern Matching. Threshold of "B" → Decision Making]

Framework

• Threshold setting module
[Figure: speaker model of "A" and the other speaker models (cohort model or world model) → Threshold Setting → threshold of "A"]
  – Cohort model: competitive clients only
  – World model: all the clients

Framework: Corpus Parameters

• Text-dependency [Nedic]
  – Text dependent: verification done on a fixed phrase, predetermined by the recognizer (fixed phrase)
  – Text prompted: verification done on a system-generated sequence of predetermined words (fixed vocabulary)
  – User customized: verification done on a user-requested phrase
  – Text independent: verification done on any phrase
  – Language independent: verification done on any language
• Vocabulary
  – Fixed or not
  – Size (|V|)

Framework: Corpus Parameters (2)

• Population (Speakers)
  – Size (|P|)
  – Similarity
• Speech Flow
  – Discrete utterance (pauses between words)
  – Continuous
  – Spontaneous (natural)
• Quantity (#sessions, #phrases, phrase duration)
• Quality of speech (Problems)

Framework: Problems under real conditions

• Microphone / Communication channel / Digitizer quality
• Channel - Environmental mismatch (different channels - environments for enrollment & verification request)
• Mimicry by humans & tape recorders
• Bad pronunciation
• Extreme emotional states (e.g. anger)
• Sickness / Allergies / Tiredness / Thirst
• Aging
• Environmental noise / Poor room acoustics

Framework: Errors

• False Rejection
  – A client makes a request to be verified as himself/herself & the request is rejected
  – High rate client: goat [Koolwaaij]
  – Low rate client: sheep
• False Acceptance
  – An impostor makes a request to be verified as a client & the request is accepted
  – High rate client: lamb
  – Low rate client: ram
  – High rate impostor: wolf
  – Low rate impostor: badger

Framework: Applications

• Access control to databases / facilities
• Electronic commerce
• Remote access to computer networks
• Forensic
• Telephone banking [James]

Preprocessing: Preemphasis-Frame Blocking

• Preemphasis: low-order digital system to
  – spectrally flatten the signal (in favour of vocal tract parameters)
  – make it less susceptible to later finite precision effects
  – usually (order = 1): $s_{pe}(n) = s(n) - \alpha_{pe}\, s(n-1), \quad \alpha_{pe} \in [0.9, 1]$
• Frame blocking (short-term (st) processing)
  – L successive overlapping frames, each shifted by M samples from the previous one: $f(l; n) = s_{pe}(M(l-1) + n), \quad n = 0, \ldots, N-1, \quad l = 1, \ldots, L$
  – window size - length: N samples = $N/F_s$ sec
  – frame rate-shift-period: M samples = $M/F_s$ sec
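The two preprocessing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sample before the start of the signal is taken as 0, frames are indexed from 0 here (the slide counts from l = 1), and an incomplete trailing frame is simply dropped.

```python
def preemphasize(s, alpha=0.95):
    """First-order preemphasis: s_pe(n) = s(n) - alpha * s(n-1).

    Assumption: s(-1) is treated as 0.
    """
    return [s[n] - alpha * (s[n - 1] if n > 0 else 0.0) for n in range(len(s))]


def block_frames(s_pe, N, M):
    """Cut s_pe into frames of N samples, each shifted by M samples.

    Frame l (0-based here) holds s_pe[M*l : M*l + N].
    Assumption: a trailing frame shorter than N samples is dropped.
    """
    L = 0 if len(s_pe) < N else 1 + (len(s_pe) - N) // M
    return [s_pe[M * l: M * l + N] for l in range(L)]


signal = [float(n) for n in range(10)]
pe = preemphasize(signal, alpha=0.5)
frames = block_frames(pe, N=4, M=2)
```

With N = 4 and M = 2, consecutive frames share N - M = 2 samples.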

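Preprocessing: Frame Windowing

• Used to minimize the signal discontinuities at the beginning & end of each frame: $f_w(l; n) = f(l; n)\, w(n), \quad n = 0, \ldots, N-1$
  – Time vs. frequency resolution trade-off: a long window favours frequency resolution, a short window favours time resolution
  – Window type: [Picone] [equation not recoverable from the source]
  – Corrections: [equation not recoverable from the source; it involves $N$, $k$ and $n = 0, \ldots, N-1$]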
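As an illustration of frame windowing, the sketch below applies a Hamming window, the window type named in the experimental parameters later in the deck. The coefficients 0.54 and 0.46 are the common textbook Hamming values, assumed here because the slide's own window formula did not survive; the function names are hypothetical.

```python
import math


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), n = 0..N-1.

    Assumption: N >= 2 and the standard 0.54 / 0.46 coefficients.
    """
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame, w):
    """Windowed frame f_w(l; n) = f(l; n) * w(n)."""
    return [x * wn for x, wn in zip(frame, w)]


w = hamming(5)
windowed = apply_window([1.0, 1.0, 1.0, 1.0, 1.0], w)
```

The window tapers towards both frame ends, which is what reduces the edge discontinuities.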

Preprocessing: Speech Activity Detection

• Silence-speech detection
• Voiced-unvoiced discrimination
• Endpoint detection [Deller_bk]
• Can be applied afterward

Preprocessing: Signal Measures & Graphs

[Figure: speech waveform, energy plot, zero-crossing rate, time-frequency plot (spectrogram) [Weingessel]]
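Two of the measures named in the figure, short-term energy and zero-crossing rate, can be computed per frame as sketched below. The exact definitions (sum of squares; fraction of adjacent sample pairs with a sign change) are common choices assumed here, not taken from the slide, and the function names are hypothetical.

```python
def short_term_energy(frame):
    """Energy of one frame, taken here as the sum of squared samples."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumptions: the frame has at least 2 samples; zero counts as non-negative.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


noisy = [1.0, -1.0, 1.0, -1.0]
smooth = [1.0, 2.0, 3.0]
```

High energy with a low zero-crossing rate is typical of voiced speech, which is why these two measures are often used together for speech activity detection.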

Feature Extraction: Features - General

• Maps each speech interval-frame to a multidimensional feature space
• Order: number of coefficients in each feature vector (dimensionality)
• Several kinds of coefficients have been proposed

Feature Extraction: Linear Prediction (LP)

• Speech sample as a linear combination of previous samples (autoregressive model): $s(n) = \sum_{m=1}^{N_{LPC}} a_{LPC}(m)\, s(n-m) + G\, u(n)$
  – $a_{LPC}(m), \ m = 1, \ldots, N_{LPC}$: LP coefficients (LPC)
  – $u(n)$: normalized excitation source
  – $G$: scale factor
  – $a_{LPC}(l; m), \ m = 1, \ldots, N_{LPC}$: stLPC of frame l

Feature Extraction: Linear Prediction (LP) (2)

• Calculation of stLPC
  – Mean squared error minimization
  – Autocorrelation method
    • Levinson-Durbin (L-D) recursion
  – Covariance method
    • Cholesky (LU) decomposition
[Figure: L-D recursion (l is implied, R: autocorrelation matrix) [Picone2]]
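The autocorrelation method with the Levinson-Durbin recursion can be sketched as follows. This is a textbook form of the recursion, not a transcription of the figure from [Picone2]; it assumes the predictor convention s(n) ≈ Σ a(m) s(n-m) and a frame with non-zero energy, and the function names are hypothetical.

```python
def autocorrelation(frame, order):
    """Short-term autocorrelation r(k), k = 0..order, of one (windowed) frame."""
    N = len(frame)
    return [sum(frame[n] * frame[n - k] for n in range(k, N)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r(0..order).

    Returns (a, err): a[m-1] is the coefficient a(m) of the predictor
    s(n) ~ sum_m a(m) * s(n - m), and err is the final prediction error.
    Assumption: r[0] > 0.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


coeffs, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

The example autocorrelation sequence decays geometrically, as for a first-order autoregressive signal, so the second coefficient comes out as zero.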

Feature Extraction: Linear Prediction (LP) (3)

• LPC
  – highly correlated
  – not orthonormal
• Distance: Itakura-Saito
  – Computationally expensive
• LPC processor [Rabiner_bk]

Feature Extraction: Cepstrum (Complex-Real)

• Special case of homomorphic signal proc.
• Focuses on voiced segments
• Short-term complex cepstrum (stCC): $c_{CC}(l; m) = \mathrm{DFT}^{-1}\{\log_{10}\{\mathrm{DFT}\{f_w(l; n)\}\}\}, \quad m = 1, \ldots, N_{CC}$
• Short-term real cepstrum (stRC): $c_{RC}(l; m) = \mathrm{DFT}^{-1}\{\log_{10}|\mathrm{DFT}\{f_w(l; n)\}|\}, \quad m = 1, \ldots, N_{RC}$
• Distance of cepstrum-based coefficients
  – Euclidean: vectors defined in an orthonormal space
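The real cepstrum definition can be followed literally with a direct DFT, as in the sketch below. A naive O(N²) transform is used only to keep the example dependency-free; the small floor inside the logarithm is an added assumption to avoid log of zero, and the function names are hypothetical. The returned list includes index 0, whereas the slide keeps m = 1, ..., N_RC.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out


def real_cepstrum(frame, floor=1e-12):
    """c_RC = DFT^-1{ log10 |DFT{frame}| }; 'floor' guards against log10(0)."""
    spectrum = dft(frame)
    log_mag = [math.log10(max(abs(v), floor)) for v in spectrum]
    return [v.real for v in dft(log_mag, inverse=True)]


c = real_cepstrum([2.0, 0.0, 0.0, 0.0])
```

A scaled impulse has a flat magnitude spectrum, so only the zeroth cepstral value is non-zero.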

Feature Extraction: Mel Cepstrum

• Mel
  – unit of measure of perceived frequency of a tone
  – non-linear correspondence to the physical freq. (like the human ear): $f_{mel} = 2595 \log_{10}\left(1 + \frac{f_{Hz}}{700}\right)$
  – mel freq. cepstral coefficients (MFCC): $c_{MFCC}(l; m), \quad m = 1, \ldots, N_{MFCC}$
  – parameters: $N_{FFT}^{(mel)}$, $N_{filters}^{(mel)}$
  – generalized case [Vergin]
[Figure: Mel-cepstral feature generation (frame l) [Young]]
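The mel mapping 2595·log10(1 + f/700) and its inverse are easy to check numerically. The sketch below is a minimal illustration; the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Mel value for a frequency in Hz: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse mapping, obtained by solving the formula above for f."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


mel_1k = hz_to_mel(1000.0)
```

The constants are chosen so that 1000 Hz maps to roughly 1000 mel; above that the scale grows more and more slowly than the Hz scale.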

Feature Extraction: LP derived Cepstrum

• LP Cepstral Coefficients (LPCC):
  – $m = 1, \ldots, N_{LPC}$: $c_{LPCC}(l; m) = a_{LPC}(l; m) + \sum_{k=1}^{m-1} \frac{k}{m}\, c_{LPCC}(l; k)\, a_{LPC}(l; m-k)$
  – $m = N_{LPC}+1, \ldots, N_{LPCC}$: $c_{LPCC}(l; m) = \sum_{k=m-N_{LPC}}^{m-1} \frac{k}{m}\, c_{LPCC}(l; k)\, a_{LPC}(l; m-k)$
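The LPC-to-cepstrum recursion can be sketched as below. It uses the standard textbook recursion for the predictor convention s(n) ≈ Σ a(m) s(n-m), which is an assumption about the slide's (garbled) formula; the function name is hypothetical.

```python
def lpc_to_lpcc(a, n_lpcc):
    """Convert LP coefficients a(1..p) into n_lpcc cepstral coefficients.

    c(m) = a(m) + sum_{k=1}^{m-1} (k/m) c(k) a(m-k)        for m <= p
    c(m) =        sum_{k=m-p}^{m-1} (k/m) c(k) a(m-k)      for m >  p
    """
    p = len(a)
    c = [0.0] * (n_lpcc + 1)  # c[0] is unused
    for m in range(1, n_lpcc + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


lpcc = lpc_to_lpcc([0.5], 4)
```

For a single-pole model with a(1) = 0.5 the cepstrum is known in closed form, c(m) = 0.5^m / m, which gives a quick sanity check of the recursion.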

Feature Extraction: Other cepstral variants

• Linear Freq. Cepstral Coefficients (LFCC)
  – Like MFCC, but the filters are uniformly spaced on the Hz scale
• Mel-warped LPCC (MLPCC) [Kuitert]
  – CC not directly derived from LPC
  – 1st compute the log magn. spectrum of LPC
  – then warp the freq. axis to correspond to the mel axis

Feature Extraction: Variants

• Discrete Wavelet Transform (DWT) instead of FFT [Krishnan]
• Application of filter types other than triangular
• Application of the logarithm before the triangular filters

Feature Extraction: Delta Cepstrum

• [Milner]: delta coefficients computed from $c_{GCC}(l+k; m)$, $m = 1, \ldots, N_{GCC}$, over neighbouring frames $k$ up to $K$ [equation garbled in the source]
• Inclusion of temporal information

Feature Extraction: PLP - Auditory Features

• Perceptual Linear Prediction (PLP) [Hermansky]
  – Spectral scale: non-linear Bark scale: $f_{Bark} = 13\, \mathrm{atan}\left(\frac{0.76\, f_{Hz}}{1000}\right) + 3.5\, \mathrm{atan}\left(\frac{f_{Hz}^2}{7500^2}\right)$
  – Spectral features smoothed within freq. bands
• Auditory Features [Kumar]
  – Imitates signal proc. performed by the ear
  – cochlear modeling
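A quick numerical check of the Bark mapping, assuming the formula 13·atan(0.76·f/1000) + 3.5·atan((f/7500)²) with f in Hz; the function name is hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Bark value: 13 * atan(0.76 * f / 1000) + 3.5 * atan((f / 7500)^2)."""
    return 13.0 * math.atan(0.76 * f_hz / 1000.0) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


bark_1k = hz_to_bark(1000.0)
```

Like the mel scale, the mapping is close to linear at low frequencies and compresses high frequencies.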

Noise Compensation-Channel Equalization: Intra-frame cepstral proc.

• Liftering-weighting: $c_{GCCw}(l; m) = w(m)\, c_{GCC}(l; m), \quad m = 1, \ldots, N_{GCC}$
  – low order coeffs: sensitive to overall spectral slope
  – high order: sensitive to noise
  – => tapered window (bandpass liftering) $w(m), \ m = 1, \ldots, N_{GCC}$ [window equation garbled in the source; it involves $\pi$]
• Adaptive Component Weighting (ACW) [Mammone]
  – motivation: not all frames have the same distortion

Noise Compensation-Channel Equalization: Inter-frame cepstral proc.

• Cepstral Mean Subtraction (CMS): $c_{CMS}(l; m) = c_{GCC}(l; m) - \mathrm{avg}_k(c_{GCC}(k; m)), \quad m = 1, \ldots, N_{GCC}$
  – mean (over a number of frames) subtraction (tackles training-testing discrepancy)
  – lowpass filtering
  – eliminates communication channel spectral shaping
• Pole Filtered CMS (PFCMS): cepstrum poles modification
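Cepstral mean subtraction amounts to removing, for every coefficient index m, its average over the frames. The sketch below assumes the average is taken over all frames passed in (the slide only says "a number of frames"); the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames.

    frames: list of feature vectors, one per frame, all of equal length.
    Returns c_CMS(l; m) = c(l; m) - avg_k c(k; m).
    """
    n_frames = len(frames)
    n_coef = len(frames[0])
    mean = [sum(f[m] for f in frames) / n_frames for m in range(n_coef)]
    return [[f[m] - mean[m] for m in range(n_coef)] for f in frames]


normalized = cepstral_mean_subtraction([[1.0, 2.0], [3.0, 6.0]])
```

A fixed channel adds a constant offset to every cepstral vector, so it cancels in the subtraction.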

Noise Compensation-Channel Equalization: RASTA proc.

• Relative Spectral Filtering (RASTA) [Hermansky]
  – bandpass filtering in the log-spectral domain
  – suppresses spectral components that change more slowly or quickly than in typical speech
  – RASTA-PLP
    • Microphone (type, position) robustness

Feature Selection: Introduction

• Goal
  – find a transformation to a relatively low-dimensional feature space that preserves the information pertinent to the application while enabling meaningful comparisons to be performed using measures of similarity
• Processing of features
  – Principal Component Analysis (PCA) (or Karhunen-Loève Expansion, KLE)
    • seeks a lower dimensional representation that accounts for the variance of the features
    • not necessarily optimum for class discrimination
  – Linear Discriminant Analysis (LDA) [Jin]
  – Nonlinear Discriminant Analysis (NLDA) (using MLP) [Konig]

Matching-Modeling: Introduction

• Modeling: creation of (speaker) models
• Model: can be considered as the output of a proper proc. of a speaker's set of feature vectors
• Matching: computation of a match score between the input feature vectors & some speaker model
• Methods [Wassner]
  – Template Matching
    • deterministic
    • score: distance between a test speaker (feature vectors of an) utterance & a reference speaker model
    • better score: min distance

Matching-Modeling: Introduction (2)

• Methods (2)
  – Stochastic Approach
    • probabilistic matching
    • score: prob. of generation of a speech utterance by the claimed speaker, $P(U | S_c)$
    • better score: max probability
    • Parametric speaker model: a specific pdf is assumed & its appropriate parameters (e.g. mean vector, covariance matrix) can be estimated using Maximum Likelihood Estimation (MLE), e.g. multivariate normal model

Matching-Modeling: Template Matching Methods

• Dynamic Time Warping (DTW)
  – dynamic comparison between a test & a reference (model) matrix (set of feature vectors)
  – computes a distance between the test & ref. patterns
  – allows time alignment at different costs
  – uses Dynamic Programming (DP)

Matching-Modeling: Template Matching Methods (2)

• Dynamic Time Warping (DTW) (2)
[Figure: the DP grid with test (t) & reference (r) feature vectors at respective frame indices [Picone]]

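• Dynamic Time Warping (DTW) (3)
  – distances-costs on the DP grid (i, j frame indices, k step index)
    • Node: $d_N(i_k, j_k)$
      – e.g. $\sum_m \left(c_{LPCC}^{test}(i_k; m) - c_{LPCC}^{ref}(j_k; m)\right)^2$
    • Transition: $d_T[(i_k, j_k) | (i_{k-1}, j_{k-1})]$
      – e.g. (Type 4) a function of $[i_k - i_{k-1}]$ and $[j_k - j_{k-1}]$ [operator lost in the source]
    • Both: $d_B[(i_k, j_k) | (i_{k-1}, j_{k-1})]$
      – e.g. a combination of $d_N(i_k, j_k)$ and $d_T[(i_k, j_k) | (i_{k-1}, j_{k-1})]$ [operator lost in the source]
    • Global: $D = \sum_{k=1}^{K} d_B[(i_k, j_k) | (i_{k-1}, j_{k-1})]$
      – K: number of transitions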

• Dynamic Time Warping (DTW) (4)
  – DTW search constraints
    • Endpoint Constraints (bottom left (S) - top right (E) corners)
      – endpoint relaxation: max points allowed in each direction, $\Delta_{iS}, \Delta_{jS}, \Delta_{iE}, \Delta_{jE}$
    • Monotonicity (going up & right): $i_{k-1} \le i_k, \ j_{k-1} \le j_k$
    • Global Path Constraints (global movement area)
      – permissible slope or
      – permissible window: $|i_k - j_k| \le W$

• Dynamic Time Warping (DTW) (5)
  – DTW search constraints (2)
    • Local Path Constraints (local movement area)
[Figure: Sakoe & Chiba local constraints on DTW path search [Picone]]

• Dynamic Time Warping (DTW) (6)
  – The minimum cost final endpoint provides the distance between a test & a reference phrase
  – Training-Modeling [Deller_bk]
    • Casual: unaltered feature strings form models
    • Averaging feature strings of utterances
    • The stochastic techniques possess superior training methods
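The DTW search described over the preceding slides can be sketched with a small dynamic-programming table. This is a deliberately simplified illustration, not the configuration used in the experiments: it assumes a squared-Euclidean node cost, no transition cost, the three basic moves (up, right, diagonal) instead of the Sakoe & Chiba local constraints, fixed endpoints, and an optional global window |i - j| ≤ W. The function name is hypothetical.

```python
def dtw_distance(test, ref, window=None):
    """Minimum accumulated node cost of a monotonic path from (1,1) to (I,J).

    test, ref: lists of feature vectors (one per frame).
    window: optional global constraint |i - j| <= W; it is widened to
            |I - J| if needed so that the end point stays reachable.
    """
    I, J = len(test), len(ref)
    INF = float("inf")
    W = max(I, J) if window is None else max(window, abs(I - J))
    D = [[INF] * (J + 1) for _ in range(I + 1)]
    D[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(max(1, i - W), min(J, i + W) + 1):
            # Node cost: squared Euclidean distance between the two frames.
            d = sum((x - y) ** 2 for x, y in zip(test[i - 1], ref[j - 1]))
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[I][J]


slow = [[0.0], [1.0], [1.0], [2.0]]
fast = [[0.0], [1.0], [2.0]]
```

Here `slow` is `fast` with one frame repeated, so the warped distance is zero even though the sequences differ in length.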

• Vector Quantization (VQ)
  – Uses intra-vector dependencies to break up a vector space into cells (unsupervised)
  – follows the Linde-Buzo-Gray (LBG) algorithm
  – speaker model: codebook
  – codebook: set of prototype vectors used to represent vector spaces
  – goal: data structure "discovery" by finding how the data is clustered
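The codebook idea can be illustrated with one nearest-neighbour / centroid update step and the average quantization distortion used as a match score. This sketch shows only the iteration at the core of LBG-style training (the codebook-splitting stage of LBG is omitted); the squared-Euclidean distortion and all function names are assumptions.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(codebook, x):
    """Index of the codebook prototype closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], x))


def lloyd_step(codebook, vectors):
    """Assign vectors to their nearest prototype, then move each prototype
    to the centroid of its cell (empty cells keep their prototype)."""
    cells = [[] for _ in codebook]
    for x in vectors:
        cells[nearest(codebook, x)].append(x)
    updated = []
    for code, cell in zip(codebook, cells):
        if cell:
            updated.append([sum(col) / len(cell) for col in zip(*cell)])
        else:
            updated.append(list(code))
    return updated


def avg_distortion(codebook, vectors):
    """Average distance of the vectors to their nearest prototype."""
    return sum(sq_dist(codebook[nearest(codebook, x)], x) for x in vectors) / len(vectors)


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
book = lloyd_step([[1.0, 1.0], [9.0, 1.0]], data)
```

In a verification setting, a test utterance would be scored by `avg_distortion` against the claimed speaker's codebook, with a lower value meaning a better match.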

• Learning Vector Quantization (LVQ)
  – Predefined classes, labeled data
  – defines the class borders according to the nearest neighbor rule
  – supervised version of VQ
  – set of variants (e.g. LVQ1, 2, 3)
  – goal: to determine a set of prototypes that best represent each class

Matching-Modeling: Statistical Measures

• Second Order Statistical Measures (SOSM) [Bimbot]
  – E.g. Arithmetic-Harmonic-Sphericity (AHS)
    • speaker model: covariance matrix of feature vectors
    • Distance = min (= 0) iff all eigenvalues of test & ref covar matrices are equal

Matching-Modeling: Generative Models

• Hidden Markov Models (HMMs)
  – Statistical - stochastic
  – Flexible
  – Types
    • Continuous Density (CD)
    • Discrete
    • SemiContinuous (SC) [Falavigna]
  – Model: prob. distributions, e.g. mixtures of Gaussians, of the feature vectors of the speaker

Matching-Modeling: Generative Models (2)

• Hidden Markov Models (HMMs) (2)
  – Topologies
    • Left-Right (LR) (self & right connections): attempts to catch the temporal structure of the speech & to link consecutive short-time observations together
      – #states/unit (e.g. phoneme)
      – #Gaussian distributions (mixtures)/state
[Figure: example of a left-right HMM [Kumar]; $O_k$: feature vectors]

• Hidden Markov Models (HMMs) (3)
  – Topologies (2)
    • Ergodic (fully connected)
  – AR HMMs: the prob. distrib. associated with each state is estimated via an AR process [Bourlard]
[Figure: example of an ergodic HMM [Picone]]

• Gaussian Mixture Models (GMMs)
  – Single multi-Gaussian state "HMM"
  – Uses a mixture of Gaussian densities to model the distribution of the feature vectors of each speaker
  – Local covariance info
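Scoring one feature vector against a speaker's Gaussian mixture can be sketched as follows. Diagonal covariance matrices are assumed here purely to keep the example short (the slide does not specify the covariance structure), and the function name is hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) under a Gaussian mixture with diagonal covariances.

    weights:   mixture weights (summing to 1)
    means:     one mean vector per component
    variances: one vector of per-dimension variances per component
    """
    comps = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        log_exp = -0.5 * sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        comps.append(math.log(w) + log_norm + log_exp)
    # log-sum-exp over components for numerical stability
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))


single = gmm_log_likelihood([0.0], [1.0], [[0.0]], [[1.0]])
double = gmm_log_likelihood([0.0], [0.5, 0.5], [[0.0], [0.0]], [[1.0], [1.0]])
```

An utterance score is typically obtained by summing (or averaging) this value over all frames, assuming the frames are independent.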

Matching-Modeling: Neural Networks (NN)

• Feed-Forward Neural Networks
  – supervised learning
  – each speaker has his own NN (each checked in turn to find the best match)
  – classifier-matcher: NN output
  – positive/negative training (rivals)

Matching-Modeling: Neural Networks (NN) (2)

• Feed-Forward NNs (2)
  – Types [Haykin_bk]
    • Multilayer Perceptron (MLP): usually trained with the Back-Propagation (BP) algorithm
      – Error Correction Learning
      – Global optimization
    • Time Delay NNs (TDNN)
    • Radial Basis Function (RBF) Networks [Lo]
      – Memory-Based Learning
      – Local optimization
    • Support Vector Machines (SVM)
      – Learning by examples
      – Vapnik-Chervonenkis (VC) dimension: framework for the development of SVMs

Matching-Modeling: Neural Networks (NN) (3)

• Self Organizing Maps (SOM)
  – unsupervised learning
  – method to form a topologically ordered codebook
  – speaker model: codebook
  – density of prototype vectors approaches the pdf of the input vectors during the training
  – nonlinear projection
  – competitive (winner neuron) learning

Matching-Modeling: NNs & Combined Methods

• DTW-SOM
  – associate an entire feature vector sequence, instead of a single feature vector, as a model with each SOM node (also DTW-LVQ) [Somervuo]
• Recurrent NNs (RNN) [Shrimpton]
  – feedback (self-feedback or not)
• Combined methods [Genoud]

Matching-Modeling: Sub-band Proc. Introduction

• Speech signal split into band-limited channels (freq. ranges)
[Figure: block diagram of an LPCC-based sub-band processing system [Finan]]

Decision Making: Decision Approaches

• "Template" approach
  – threshold setting: based on inter- & intra-speaker scores/distances
  – comparison: test score <= threshold ⇒ acceptance [Fakotakis]
• Statistical approach [Bengio] [Bourlard]
  – $S_c$: speaker RV for identity c being claimed
  – $U$: utterance represented by feat. vectors
  – $S_{\bar{c}}$: other speakers RV
  – $P(S_c | U) = \frac{P(U | S_c)\, P(S_c)}{P(U)}$

Decision Making: Decision Approaches (2)

• Statistical approach (2)
  – Claim c is true if: $\frac{P(U | S_c)}{P(U | S_{\bar{c}})} > \Theta_c$ (normalised likelihood - likelihood ratio)
  – $\Theta_c$: decision threshold, usually found assuming Gaussian distributions for $P(U | S_c)$ and $P(U | S_{\bar{c}})$
  – using logs: $\log P(U | S_c) - \log P(U | S_{\bar{c}}) > \log \Theta_c$ (Log Likelihood Ratio, LLR)

• Statistical approach (3)
  – $P(U | S_c)$: speaker dependent model
  – $P(U | S_{\bar{c}})$: normalization factor
  – cohort model: group of selected speakers who are more competitive with the model of the claimed id
    • No well-established selection procedure
  – world model: all other speakers
    • less computation & storage needed

• Statistical approach extensions
  – If $y = \log P(U | S_c) - \log P(U | S_{\bar{c}}) - \log \Theta_c$
  – sign(y) gives the decision
  – Techniques:
    • Bayes Decision Rule (assumes probabilities perfectly estimated)
      – Minimizes the Half Total Error Rate (HTER): $\mathrm{HTER} = \frac{\mathrm{FA\%} + \mathrm{FR\%}}{2}$
    • Linear Regression
    • SVM Regression
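The sign-based decision and the HTER can be written out directly. This is a minimal sketch: it assumes the threshold enters as log Θ and that the two log-likelihoods have already been computed by some speaker model and some cohort or world model; the function names are hypothetical.

```python
import math


def llr_decision(log_p_client, log_p_other, theta=1.0):
    """Return (accept, y) with y = log P(U|S_c) - log P(U|S_not_c) - log(theta).

    The claim is accepted when y > 0.
    """
    y = log_p_client - log_p_other - math.log(theta)
    return y > 0, y


def hter(fa_percent, fr_percent):
    """Half Total Error Rate: mean of the false acceptance and false rejection rates."""
    return (fa_percent + fr_percent) / 2.0


accepted, y_accept = llr_decision(-10.0, -12.0)
rejected, y_reject = llr_decision(-12.0, -10.0)
```

Raising `theta` makes the system stricter: fewer false acceptances at the price of more false rejections.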

Decision Making: Threshold Setting

• speaker dependent
  – |P| thresholds: $\Theta_c, \ c = 1, \ldots, |P|$
• speaker independent
  – 1 threshold: $\Theta$
• leave one (client o) out
  – |P|*|P| thresholds: $\Theta_{co}, \ c = 1, \ldots, |P|, \ o = 1, \ldots, |P|$
• a priori: computed on training set (enrollment data) [Lindberg]
• a posteriori: computed on test set (obtained during actual use of the system)

Decision Making: Hypothesis Testing

[Figure: valid & impostor densities [Campbell]]

Decision Making: Hypothesis Testing (2)

[Figure: probability terms & definitions [Campbell]]

Performance Evaluation: Accuracy

• Error %s
  – FAR (False Acceptance Rate): $\mathrm{FAR} = \frac{\#\text{false acceptances}}{\#\text{false claims}}$
    • Prob. of false acceptance
  – FRR (False Rejection Rate): $\mathrm{FRR} = \frac{\#\text{false rejections}}{\#\text{true claims}}$
    • Prob. of false rejection
  – Values for FAR & FRR are adjusted by changing the threshold values: FAR vs. FRR

Performance Evaluation: Accuracy (2)

• Error %s (2)
  – EER (Equal Error Rate): operating point where FAR = FRR
  – Choice of 2 subsequent operating points (1, 2) to approximate the EER value:
    • $\mathrm{FAR}_1 \ge \mathrm{FRR}_1, \quad \mathrm{FAR}_2 \le \mathrm{FRR}_2$
    • $\mathrm{EER} \approx \frac{\mathrm{FAR}_1\, \mathrm{FRR}_2 - \mathrm{FAR}_2\, \mathrm{FRR}_1}{(\mathrm{FAR}_1 - \mathrm{FAR}_2) + (\mathrm{FRR}_2 - \mathrm{FRR}_1)}$
  – MDE (Minimum Decision Error): operating point where FRR = 10 FAR
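The error rates and the two-point EER approximation can be sketched as below. Two assumptions are made: scores are distances, so a claim is accepted when score ≤ threshold (the "template" convention), and the EER is estimated by linear interpolation between two adjacent operating points that bracket the FAR = FRR crossing. The function names are hypothetical.

```python
def far_frr(client_scores, impostor_scores, threshold):
    """FAR and FRR at one threshold, for distance scores (accept if score <= threshold)."""
    far = sum(1 for s in impostor_scores if s <= threshold) / len(impostor_scores)
    frr = sum(1 for s in client_scores if s > threshold) / len(client_scores)
    return far, frr


def eer_interpolated(far1, frr1, far2, frr2):
    """Linear interpolation of the FAR = FRR crossing between two operating points."""
    denom = (far1 - far2) + (frr2 - frr1)
    if denom == 0:
        # Degenerate case: both points coincide; fall back to their average.
        return (far1 + frr1) / 2.0
    return (far1 * frr2 - far2 * frr1) / denom


rates = far_frr([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 6.0, 7.0], threshold=3.0)
eer = eer_interpolated(0.2, 0.0, 0.0, 0.2)
```

In the second example FAR falls from 0.2 to 0 while FRR rises from 0 to 0.2, so the two straight lines cross at 0.1.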

Performance Evaluation: Accuracy (3)

• Graphs
  – ROC (Receiver Operating Characteristics) curve: plot of different operating points (FRR vs. FAR values). Also called DET (Detection Error Tradeoff) plot
[Figure: example curve [Gauvain]]
• Quantities
  – #speakers correctly/wrongly verified

Performance Evaluation: Computational Complexity

• CPU time
  – Training
    • Feature creation
    • Modeling
    • Threshold setting
  – Testing (verification throughput)
    • Feature creation
    • Matching
• Memory-disk storage
  – Speech database, Features, Models, Thresholds

Experimental Results: Parameters

• $F_s = 48$ KHz
• Text dependent – Fixed vocab.: digits 0-9 in French or Spanish
• |V| = 10, |P| = 37 (M2VTS database)
• Discrete utterance speech flow
• #sessions (shots)/speaker = 5, the 5th is for testing
• |S| = 4
• #phrases/session = 1 (0-9 utterance)
• Phrase duration ~6 sec
• Proc. freq. = 12 KHz
• $\alpha_{pe} = 0.95$, $N = 360$ (30 ms), $M = 240$ (20 ms)
• Window type: Hamming
• Coefficients: LPCC
• $N_{LPCC} = 12$, $N_{LPC} = 12$
• Liftering-weighting: $c_{LPCCw}(l; m)$

Experimental Results: Parameters (2) - EER

• Matching method: DTW
  – $d_N$: Euclidean
  – $d_T$: Type 4
  – $d_B$: combination of $d_T$ and $d_N$ [operator lost in the source]
  – $W = 30$
  – $\Delta_{iS} = 10, \ \Delta_{jS} = 10, \ \Delta_{iE} = 10, \ \Delta_{jE} = 10$
  – Local path constraint: Sakoe & Chiba (b)
• Decision approach: Template
• Threshold setting: leave one out
• |P| (client left out) · (|P| − 1) (remaining clients as claimants) · |S| (shot left out for claiming-testing) = 5328 client claims
• |P| (client left out as impostor) · (|P| − 1) (claims of the impostor as one of the remaining clients) · |S| (shot left out for claiming) = 5328 impostor claims
• EER(avg) ∈ [0.6569%, 1.5390%] (FAR1 = 1.5390% > FRR1 = 0.6569%)
• EER(avg) = [EER(1|234) + EER(2|134) + EER(3|124) + EER(4|123)] / 4
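The claim counts follow directly from the corpus sizes given earlier (|P| = 37 speakers, |S| = 4 shots). The short check below only reproduces that arithmetic; the variable names are hypothetical.

```python
P = 37  # population size |P|
S = 4   # number of shots |S| used in the leave-one-out protocol

# One client is left out, each of the remaining |P| - 1 clients is claimed,
# and each of the |S| shots is left out in turn.
claims_leave_one_out = P * (P - 1) * S

# With a single test shot the factor |S| disappears.
claims_single_shot = P * (P - 1)
```

This gives 5328 claims for the four-shot protocol and 1332 claims when a single shot is used for testing.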

Experimental Results: Parameters (3) - EER

Difference:
• Coefficients: MFCC
• $N_{MFCC} = 12$, $N_{FFT}^{(mel)} = 512$, $N_{filters}^{(mel)} = 40$
• EER(avg) = 4.1817%
• EER(5|123) = 5.4054%

Shot 4 left out, shot 5 used for testing:
• |P| · (|P| − 1) = 1332 client & 1332 impostor claims
• EER(5|123) = 2.7027%

References

• [Bengio] S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker Verification, IDIAP Research Report, 2001

• [Bimbot] F. Bimbot, I. Magrin-Chagnolleau and L. Mathan, Second-Order Statistical Measures for Text-Independent Speaker Identification, Speech Communication, vol. 17, pp. 177-192, 1995

• [Bourlard] H. Bourlard and N. Morgan, Speaker Verification: A Quick Overview, IDIAP Research Report, 1998

• [Campbell] J.P. Campbell Jr., Speaker Recognition: a Tutorial, Proc. of the IEEE, vol. 85, no. 9, pp. 1437-1462, 1997

• [Deller_bk] J.R. Deller, J.G. Proakis and J.H. Hansen, Discrete-time Processing of Speech Signals, Macmillan, New York, 1993

• [Fakotakis] N. Fakotakis, E. Dermatas, G. Kokkinakis, Optimal Decision Threshold for Speaker Verification, in Signal Processing III: Theories and Applications, editor: I.T. Young et al., pp. 585-587, Elsevier Science Publishers B.V. (North Holland), 1986


• [Falavigna] D. Falavigna, Comparison Of Different Hmm Based Methods For Speaker Verification (citeseer)

• [Finan] R.A. Finan, R.I. Damper and A.T. Sapeluk, Improved Data Modeling for Text-Dependent Speaker Recognition Using Sub-Band Processing (citeseer)

• [Gauvain] J. Gauvain, L. Lamel and B. Prouts, Experiments with Speaker Verification over the Telephone, Eurospeech’95, pp. 651-654, 1995

• [Genoud] D. Genoud, F. Bimbot, G. Gravier and G. Chollet, Combining Methods to Improve Speaker Verification Decision, Proc. of ICSLP'96, vol. 3, pp. 1756-1759, 1996

• [Haykin_bk] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1995

• [Hermansky] H. Hermansky and N. Morgan, Rasta Processing of Speech, IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, 1994


• [Jain_bk] A. Jain, R. Bolle and S. Pankanti, editors, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, Boston, MA, 1999

• [James] D. James, H. Hutter and F. Bimbot, CAVE -- Speaker Verification in Banking and Telecommunications (citeseer)

• [Jin] Q. Jin and A. Waibel, Application of LDA to Speaker Recognition (citeseer)

• [Konig] Y. Konig, L. Heck, M. Weintraub and K. Sonmez, Nonlinear Discriminant Feature Extraction for Robust Text-Independent Speaker Recognition, Proc. of RLA2C’98 (Speaker Recognition and Its Commercial and Forensic Applications), 1998

• [Koolwaaij] J.W. Koolwaaij and L. Boves, A New Procedure for Classifying Speakers in Speaker Verification Systems, Proc. of Eurospeech'97, pp. 2355-2358, 1997

• [Krishnan] M. Krishnan, C. Neophytou and G. Prescott, Wavelet Transform Speech Recognition using Vector Quantization, Dynamic Time Warping and Artificial Neural Networks, 1994


• [Kuitert] M. Kuitert and L. Boves, Speaker Verification with GSM Coded Telephone Speech, Proc. of Eurospeech'97, vol. 2, pp. 975-978, 1997

• [Kumar] N. Kumar, Investigation of Silicon Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition, PhD thesis, Johns Hopkins University, 1997

• [Lindberg] J. Lindberg, J.W. Koolwaaij, H.-P. Hutter, D. Genoud, M. Blomberg, F. Bimbot and J.-B. Pierrot, Techniques for a priori Decision Threshold Estimation in Speaker Verification, Proc. of RLA2C’98, 1998

• [Lo] T.F. Lo and M.W. Mak, A New Intra-Frame and Inter-Frame Cepstral Processing Method for Telephone-Based Speaker Verification, Int. Workshop on Multimedia Data Storage, Retrieval, Integration and Applications, pp. 116-122, 2000

• [Mammone] R.J. Mammone, X. Zhang and R.P. Ramachandran, Robust Speaker Recognition, IEEE Signal Proc. Magazine, vol. 13, no. 5, pp. 58-71, Sep. 1996

• [Milner] B. Milner, Inclusion of Temporal Information into Features for Speech Recognition, Proc. of ICSLP’96, pp. 256-259, 1996


• [Morgan] N. Morgan and B. Gold, Speech Analysis and Synthesis Overview, Lecture, Univ. of California Berkeley, 1999

• [Nedic] B. Nedic and H. Bourlard, Recent Developments in Speaker Verification at IDIAP, IDIAP Research Report, 2000

• [Picone] J. Picone, Fundamentals of Speech Recognition: A Short Course, Mississippi State Univ., 1996

• [Picone2] J. Picone, Signal Modeling Techniques in Speech Recognition, Proc. of the IEEE, vol. 81, no. 9, pp. 1215-1247, 1993

• [Rabiner_bk] L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993

• [Shrimpton] D. Shrimpton and B.D. Watson, Comparison of Recurrent Neural Network Architectures for Speaker Verification. Proc. of the Fourth Australian International Conference on Speech Science and Technology, pp. 460-464, 1992


• [Somervuo] P. Somervuo, Speech Recognition using Context Vectors and Multiple Feature Streams, Helsinki University of Technology, Faculty of Electrical Engineering, 1996

• [Vergin] R. Vergin, D. O'Shaughnessy and A. Farhat, Generalized Mel Frequency Cepstral Coefficients for Large-Vocabulary Speaker-Independent Continuous-Speech Recognition, IEEE Trans. on Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, 1999

• [Wassner] H. Wassner, G. Maitre and G. Chollet, Speaker Verification : a Review, Technical Report, IDIAP, 1996

• [Weingessel] A. Weingessel, Speech Recognition (citeseer)

• [Young] S. Young, Large vocabulary continuous speech recognition, IEEE Signal Proc. Magazine, vol. 13, no. 5, pp. 45-57, 1996
