informati cs dpt. 1 05- 06/2001 digital days speaker verification alexandros xafopoulos
Post on 20-Dec-2015
213 Views
Preview:
TRANSCRIPT
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
1
SpeakerVerification
Alexandros Xafopoulos
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
2
Presentation OutlinePresentation Outline
• Framework• Preprocessing -• Features (Extraction, Noise
Compensation-Channel Equalization, Selection)
• (Pattern) Matching-Modeling• Decision Making -• Performance Evaluation• Experimental Results• References
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
3
IntroductionIntroductionFramework
• Motivation– Speech contains speaker specific
characteristics (physiological-behavioral)• vocal tract• pitch range - vocal cords• articulator movement
– Mouth– Nasal cavity– Lips
– Voiceprint as a biometric (distinguishing trait)
– Natural & economical way
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
4
Framework
• Objective– Discriminate betw. a given speaker & all
others
• Definitions– Verification < Latin verus (true)
• Claim: Speaker identity• Proof: Speech utterance• Binary decision to establish the truth
– Client: speaker registered on the system– Impostor: speaker who claims a false identity– Model: set of parameters that represents a
speaker or a group of speakers
Introduction(2)Introduction(2)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
5
Digital Signal
Processing
• Signal Processing
Analog Signal
Processing
Speaker Recognition
Storage / Transmissio
n
Other Signals
Speech Processing
Recognition Coding / Synthesis
Enhancement
Analysis
Signal processing
Speaker Identificatio
n
Language Identificatio
n
Speech Recognition
Speaker Detection / Tracking
Speaker Verification
FrameworkRelated Research AreasRelated Research Areas
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
6
• (Statistical) Pattern Recognition
Data FeatureExtractor
Features TrainedClassifier
ClassLabel
FrameworkRelated Research Areas(2)Related Research Areas(2)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
7
• Biometrics Technology– def: automatic identification of a person
based on his/her physiological or behavioral characteristics (=biometrics)
– desirable properties of biometrics [Jain_bk]• universality (found in every person)• uniqueness (different ”value” for each person)• permanence (invariant with time)• collectability (quantitatively measurable)• performance ( accuracy vs. resources)• high acceptability (person’s willingness)• low circumvention (not easy to ”fool”)
FrameworkRelated Research Areas(3)Related Research Areas(3)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
8
• Speech Science Communication by speech [Somervuo]
FrameworkRelated Research Areas(4)Related Research Areas(4)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
9
Framework
Speech Production Physiology
[Picone]
[Picone]
• Speech Science(2)
Related Research Areas(5)Related Research Areas(5)
Block Diagram of Human Speech Production
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
10
Framework
General Discrete-Time Model for Speech Production[Morgan] (corrected)• Speech Science(3)
Related Research Areas(6)Related Research Areas(6)
Gain for voice source
Gain for noise source
G(z)
H(z) R(z)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
11
Framework
• Enrollment (Training) module
Digital SignalAcquisition
DigitalSpeech Feature
Creation
FeatureVectors
Speech PressureWave of ”A”
Speaker ”A”N utterances
ModelRegistration
N Sets ofFeatureVectors
Known Identity: ”Speaker is ”A””
SpeakerModel of ”A”
Generic Speaker Verification Generic Speaker Verification ProcessProcess
Channel to transfer signal
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
12
Framework
• Enrollment module(2)– Digital signal acquisition
• Sampling frequency:
Microphone
AnalogVoltageSignal
Antialiasinglow-pass
filter
ConditionedAnalogSignal
Speech PressureWave
Sampling &Quantization
(A/D converter)
Digital Speech
Generic Speaker Verification Generic Speaker Verification Process(2)Process(2)
sF
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
13
Framework
• Enrollment module(3): Feature creation
Preprocessing
Preprocessed Digital Speech
FeatureExtraction
PlainFeatureVectors
DigitalSpeech
Noise Compensation
& Channel
Equalization(Clean)FeatureVectors
Generic Speaker Verification Generic Speaker Verification Process(3)Process(3)
Feature Selection
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
14
Framework
• Verification module
DigitalSignal
Aquisition
DigitalSpeech Feature
Creation
FeatureVectors
Speech PressureWave of ”A”
PatternMatching
Thresholdof ”B”
Acceptance (A=B)or Rejection (AB)
MatchingResults Decision
Making
|P| SpeakerModels Model
Selection
SpeakerModel of ”B”
ClaimedIdentity B
Generic Speaker Verification Generic Speaker Verification Process(4)Process(4)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
15
Framework
• Threshold setting module
SpeakerModel of ”A”
ThresholdSetting
Thresholdof ”A”
SpeakerModels of
Generic Speaker Verification Generic Speaker Verification Process(5)Process(5)
"A"
model) (world or model)(cohort A:A h
• Cohort model: competitive clients only• World model: all the clients
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
16
Framework
• Text-dependency [Nedic]– Text dependent: verification done on a fixed
phrase, predetermined by the recognizer (fixed phrase)
– Text prompted: verification done on system generated sequence of predetermined words (fixed vocabulary)
– User customized: verification done on user requested phrase
– Text independent: verification done on any phrase– Language independent: verification done on any
language
• Vocabulary– Fixed or not– Size (|V|)
Corpus ParametersCorpus Parameters
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
17
Framework
• Population (Speakers)– Size (|P|)– Similarity
• Speech Flow– Discrete Utterance (pauses betw. words)– Continuous– Spontaneous (natural)
• Quantity (#sessions, #phrases, phrase duration)
• Quality of speech (Problems)
Corpus Parameters(2)Corpus Parameters(2)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
18
Framework
• Microphone / Communication channel / Digitizer quality
• Channel - Environmental mismatch (different channels - environments for enrollment & verification request)
• Mimicry by humans & tape recorders• Bad pronunciation• Extreme emotional states (e.g. anger)• Sickness / Allergies / Tiredness / Thirst• Aging• Environmental noise / Poor room acoustics
Problems under real conditionsProblems under real conditions
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
19
Framework
• False Rejection– A client makes a request to be verified as
himself/herself & the request is rejected– High rate client: goat [Koolwaaij]– Low rate client: sheep
• False Acceptance– An impostor makes a request to be verified as a
client & the request is accepted– High rate client: lamb– Low rate client: ram– High rate impostor: wolf– Low rate impostor: badger
ErrorsErrors
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
20
Framework
• Access control to databases / facilities
• Electronic commerce• Remote access to computer
networks• Forensic• Telephone banking [James]
ApplicationsApplications
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
21
Preprocessing
Preemphasis-Frame BlockingPreemphasis-Frame Blocking
• Preemphasis: Low order digital system to– spectrally flatten the signal (in favour of vocal tract
parameters)– make it less susceptible to later finite precision
effects– usually (order=1):
• Frame blocking (short-term(st) processing)– L successive overlapping (by M samples) frames
– window size - length: N samples = N/ sec– frame rate-shift-period: M samples = M/ sec
]1,90[ ),1()()( .αnsαnsns pepepe
LlN nlMnsnlf pe ,...,1 ,1,...,0)),1(();(
sFsF
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
22
Preprocessing
Frame WindowingFrame Windowing
• Used to minimize the singal discontinuities at the beg. & end of each frame– Time (long window)<->freq. (short) resolution
– Window type:
– Corrections:
[Picone]
1,...,0),();();( N nnwnlfnlfw
1,...,0 , ,1 NnnkNN
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
23
Preprocessing
Speech Activity DetectionSpeech Activity Detection
• Silence-speech detection• Voiced-unvoiced discrimination• Endpoint detection [Deller_bk]• Can be applied afterward
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
24
Preprocessing
Signal Measures & GraphsSignal Measures & Graphs
Zerocrossing rate
[Weingessel]
Speech waveform
Energy plotTime-frequency plot (Spectrogram)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
25
Feature ExtractionFeatures - GeneralFeatures - General
• Maps each speech interval-frame to a multidimensional feature space
• Order : number of coefficients in each feature vector (dimensionality)
• Several kinds of coefficients have been proposed
coefN
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
26
Feature ExtractionLinear Prediction (LP)Linear Prediction (LP)
• Speech sample as a linear combination of previous samples (autoregressive mdl):
– : LP coefficients (LPC)– : normalized excitation source– G : scale factor– : stLPC of
frame l
)()()()(1
nGumnsmansLPCN
mLPC
LPCLPC Nmmla ,...,1 ),;(
)(nu
LPCN
LPCLPC Nmma ,...,1 ),(
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
27
Feature ExtractionLinear Prediction (LP)(2)Linear Prediction (LP)(2)
• Calculation of stLPC– Mean squared error
minimization– Autocorrelation method
• Levinson-Durbin (L-D) recursion
– Covariance method• Cholesky (LU)
decomposition
[Picone2]
L-D recursion(l is implied,
R: autocorrelationmatrix)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
28
Feature ExtractionLinear Prediction (LP)(3)Linear Prediction (LP)(3)
• LPC– highly correlated– not orthonormal
• Distance: Itakura-Saito– Computationally expensive
• LPC processor [Rabiner_bk]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
29
Feature ExtractionCepstrum (Complex-Real)Cepstrum (Complex-Real)
• Special case of homomorphic signal proc.
• Focuses on voiced segments• Short-term complex cepstrum (stCC):
• Short-term real cepstrum (stRC):
• Distance of cepstrum based coefficients– Euclidean: vectors defined in an
orthonormal space
CCwCC Nmnlfmlc ,...,1 })},);((DFT{{logDFT);( 101
RCwRC Nmnlfmlc ,...,1 |},});(DFT{|{logDFT);( 101
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
30
Feature ExtractionMel CepstrumMel Cepstrum
• Mel– unit of measure of
perceived frequency of a tone
– non-linear correspondance to the physical freq. (like the human ear)
– mel freq. cepstral coefficients (MFCC):
– generalized case [Vergin]
Mel-cepstral feature generation (frame l)
[Young]
MFCCMFCC Nmmlc ,...,1 ),;(
7001log2595 10
Hzmel
ff
)()( , melfiltersmelFFT NN
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
31
Feature ExtractionLP derived CepstrumLP derived Cepstrum
• LP Cepstral Coefficients (LPCC):
);();();();(
:,...,11
1
kmlaklcm
kmlamlc
Nm
LPCLPCC
m
kLPCLPCC
LPC
);();();(
:,...,11
kmlaklcm
kmlc
NNm
LPCLPCC
m
NmkLPCC
LPCCLPC
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
32
Feature ExtractionOther cepstral variantsOther cepstral variants
• Linear Freq. Cepstral Coefficients (LFCC)– Like MFCC but:– filters are uniformally spaced on the Hz
scale
• Mel-warped LPCC (MLPCC) [Kuitert]– CC not directly derived from LPC– 1st compute the log magn. spectrum of LPC– then warp the freq. axis to correspond to
the mel axis
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
33
Feature ExtractionVariantsVariants
• Discrete Wavelet Transform (DWT) instead of FFT [Krishnan]
• Application of other type than triangular filters
• Application of the logarithm before the triangular filters
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
34
Feature ExtractionDelta CepstrumDelta Cepstrum
• [Milner]:
• Inclusion of temporal information
GCCK
Kk
GCC
K
KkGCC Nm
k
mklckmlc ,...,1 ,
);();(
2
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
35
Feature ExtractionPLP - Auditory FeaturesPLP - Auditory Features
• Perceptual Linear Prediction (PLP) [Hermansky]– Spectral scale: non-linear Bark scale
– Spectral features smoothed within freq. bands
• Auditory Features [Kumar]– Imitates signal proc. performed by the ear– cochlear modeling
2
2
75003.5atan
1000
76.0atan13 HzHz
Bark
fff
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
36
Noise Compensation-Channel EqualizationIntra-frame cepstral proc.Intra-frame cepstral proc.
• Liftering-weighting– low order coeffs: sensitive to overall spectral
slope– high order: sensitive to noise– =>tapered window (bandpass liftering)
• Adaptive Component Weighting (ACW)– motivation: all frames don't have same
distortion
GCCGCCGCCw
GCCGCC
GCC
Nmmlcmwmlc
NmN
mNmw
,...,1 ),;()();(
,...,1 ,π
sin2
1)(
[Mammone]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
37
Inter-frame cepstral proc.Inter-frame cepstral proc.
• Cepstral Mean Subtraction (CMS)– mean (over a num of frames) subtraction
(tackles training-testing discrepancy)
– lowpass filtering– eliminates communication channel
spectral shaping
• Pole Filtered CMS (PFCMS): cepstrum poles modification
GCCGCCkGCCGCCCMS Nmmkcmlcmlc ,...,1 )),;((avg);();(
Noise Compensation-Channel Equalization
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
38
RASTA proc.RASTA proc.
• Relative Spectral Filtering (RASTA) [Hermansky]– bandpass filtering in the log-spectral
domain– suppresses spectral components that
change more slowly or quickly than in typical speech
– RASTA-PLP• Microphone (type, position) robustness
Noise Compensation-Channel Equalization
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
39
Feature SelectionFeature Selection IntroductionFeature Selection Introduction
• Goal– find a transformation to a relatively low-
dimensional feature space that preserves the information pertinent to the application while enabling meaningful comparisons to be performed using measures of similarity
• Processing of features– Principal Component Analysis (PCA) (or
Karhunen Loève Expansion-KLE)• seeks a lower dimensional representation that
accounts for variance of the features• not necessarily optimum for class discrimination
– Linear Discriminant Analysis (LDA) [Jin]– Non LDA (NLDA) (using MLP) [Konig]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
40
Matching-ModelingMatching-Modeling IntroductionMatching-Modeling Introduction
• Modeling: creation of (speaker) models• Model: Can be considered as the output of a
proper proc. of a speaker’s set of feature vectors
• Matching: computation of a match score betw. the input feature vectors & some speaker model
• Methods [Wassner]– Template Matching
• deterministic• score: distance betw. a test speaker (feature vectors of
an) utterance & a reference speaker model• better score: min distance
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
41
Matching-ModelingMatching-Modeling Introduction(2)Matching-Modeling Introduction(2)
• Methods(2)– Stochastic Approach
• probabilistic matching• score: prob. of generation of a speech
utterance by the claimed speaker• better score: max probability• Parametric speaker model: specific pdf is
assumed & its appropriate parameters (e.g. mean vector, covariance matrix) can be estimated using the Maximum Likelihood Estimation (MLE) e.g. multivariate normal model
)|( cSUP
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
42
Matching-ModelingTemplate Matching MethodsTemplate Matching Methods
• Dynamic Time Warping (DTW)– dynamic comparison betw. a test & a
reference (model) matrix (set of feature vectors)
– computes a distance betw. the test & ref. patterns
– allows time alignment at different costs– uses Dynamic Programming (DP)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
43
Matching-ModelingTemplate Matching Methods(2)Template Matching Methods(2)
• Dynamic Time Warping (DTW)(2)
The DP gridwith test (t)
& reference (r)feature vectorsat respectiveframe indices
[Picone]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
44
Matching-ModelingTemplate Matching Methods(3)Template Matching Methods(3)
• Dynamic Time Warping (DTW)(3)– distances-costs on the DP grid (i,j frame
indices, k step index)• Node
– e.g.
• Transition e.g.• Both
– e.g.
• Global
– K: number of transitions
)],(|),[( 11 kkkkT jijid
),( kkN jid
)],(|),[( 11 kkkkB jijid
LPCCN
mk
refLPCCk
testLPCC mjcmic
1
2));();((
)],(|),[(),( 11 kkkkTkkN jijidjid
][][ 11 kkkk jjii
K
kkkkkB jijidD
111 )],(|),[(
(Type 4)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
45
Matching-ModelingTemplate Matching Methods(4)Template Matching Methods(4)
• Dynamic Time Warping (DTW)(4)– DTW search constraints
• Endpoint Constraints (bottom left(S) - top right(E) corners)
– endpoint relaxation: max points allowed in each direction
• Monotonicity (going up & right)• Global Path Constraints (global movement area)
– permissible slope or– permissible window Wij kk
jEiEjSiS Δ,Δ,Δ,Δ
kkkk jjii 11
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
46
Matching-ModelingTemplate Matching Methods(5)Template Matching Methods(5)
• Dynamic Time Warping (DTW)(5)– DTW search
constraints(2)• Local Path Constraints (local movement area)
Sakoe & Shibalocal constraints
on DTWpath search
[Picone]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
47
Matching-ModelingTemplate Matching Methods(6)Template Matching Methods(6)
• Dynamic Time Warping (DTW)(6)– The minimum cost final endpoint
provides the distance betw. a test & a reference phrase
– Training-Modeling [Deller_bk]• Casual: Unaltered feature strings form models• Averaging feature strings of utterances• The stochastic techniques possess superior
training methods
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
48
Matching-ModelingTemplate Matching Methods(7)Template Matching Methods(7)
• Vector Quantization (VQ)– Uses intra-vector dependencies to break-
up a vector space in cells (unsupervised)– follows Linde-Buzo-Gray (LBG) algorithm– speaker model: codebook– codebook: set of prototype vectors used
to represent vector spaces– goal: data structure "discovery" by
finding how the data is clustered
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
49
Matching-ModelingTemplate Matching Methods(8)Template Matching Methods(8)
• Learning Vector Quantization (LVQ)– Predefined classes, labeled data– defines the class borders according to the
nearest neighbor rule– supervised version of VQ– set of variants (e.g. LVQ1,2,3)– goal: to determine a set of prototypes
that best represent each class.
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
50
Matching-ModelingStatistical MeasuresStatistical Measures
• Second Order Statistical Measures (SOSM) [Bimbot]– E.g. Arithmetic-Harmonic-Sphericity
(AHS)• speaker model: covariance matrix of feature
vectors• Distance=min(=0) iff all eigenvalues of test &
ref covar matrices are equal
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
51
Matching-ModelingGenerative ModelsGenerative Models
• Hidden Markov Models (HMMs)– Statistical - stochastic– Flexible– Types
• Continuous Density (CD)• Discrete• SemiContinuous (SC) [Falavigna]
– Model: prob. distributions e.g. mixtures of Gaussians of the feature vectors of the speaker
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
52
Matching-ModelingGenerative Models(2)Generative Models(2)
• Hidden Markov Models (HMMs)(2)– Topologies
• Left-Right (LR) (self & right connections): attempts to catch the temporal structure of the speech & to link consecutive short-time observations together
#states/unit (e.g. phoneme) #gaussian distributions(mixtures)/state
[Kumar]Example of a left-right HMM
: feature vectorskO
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
53
Matching-ModelingGenerative Models(3)Generative Models(3)
• Hidden Markov Models (HMMs)(3)– Topologies(2)
• Ergodic (fullyconnected)
-AR HMMs: the prob.distrib. associatedwith each state isestimated via an ARprocess [Bourlard]
[Picone]Example of an ergodic HMM
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
54
Matching-ModelingGenerative Models(4)Generative Models(4)
• Gaussian Mixture Models (GMMs)– Single multi-Gaussian state ”HMM”– Uses a mixture of Gaussian densities to
model the distribution of the feature vectors of each speaker
– Local covariance info
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
55
Matching-ModelingNeural Networks (NN)Neural Networks (NN)
• Feed-Forward Neural Networks– supervised learning– each speaker has his own NN (each
checked in turn to find the best match)– classifier-matcher: NN output– positive/negative training (rivals)
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
56
Matching-ModelingNeural Networks (NN)(2)Neural Networks (NN)(2)
• Feed-Forward NNs(2)– Types [Haykin_bk]
• Multilayer Perceptron (MLP): trained usually with the Back-Propagation (BP) algorithm
– Error Correction Learning– Global optimization
• Time Delay NNs (TDNN)• Radial Basis Function (RBF) Networks [Lo]
– Memory-Based Learning– Local optimization
• Support Vector Machines (SVM)– Learning by examples– Vapnik-Chervonenkis (VC) dimension: framework
for the development of SVMs
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
57
Matching-ModelingNeural Networks (NN)(3)Neural Networks (NN)(3)
• Self Organizing Maps (SOM)– unsupervised learning– method to form a topologically ordered
codebook– speaker model: codebook– density of prototype vectors approaches
the pdf of the input vectors during the training
– nonlinear projection– competitive (winner neuron) learning
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
58
Matching-ModelingNNs & Combined MethodsNNs & Combined Methods
• DTW-SOM– associate an entire feature vector
sequence, instead of a single feature vector, as a model with each SOM node (also DTW-LVQ) [Somervuo]
• Recurrent NNs (RNN) [Shrimpton]– (self-or not) feedback
• Combined methods [Genoud]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
59
Matching-ModelingSub-band Proc. IntroductionSub-band Proc. Introduction
• Speech signal split into band-limited channels (freq. ranges)
Block diagram of an LPCC-based sub-band processing system
[Finan]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
60
Decision MakingDecision ApproachesDecision Approaches
• ”Template” approach– threshold setting: based on inter- & intra-
speaker scores/distances– comparison:
test score<=thresholdacceptance [Fakotakis]
• Statistical approach [Bengio] [Bourlard]– : speaker RV for identity c being claimed– : utterance represented by feat. vectors– : other speakers RV)(
)()|()|(
UP
SPSUPUSP cc
c
cS
U
cS
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
61
Decision MakingDecision Approaches(2)Decision Approaches(2)
• Statistical approach(2)– Claim c is true if:
– : decision threshold usu. found assuming Gaussian distributions for and
normalised likelihood - likelihood ratio– using logs:
Log Likelihood Ratio (LLR)
cccc
c
c ΘSUPSUPSUP
SUP )|(log)|(loglog
)|(
)|(log
cc
c
c
c
c
c
SP
SP
SUP
SUP
USP
USP )(
)(
)|(
)|(1
)|(
)|(
c)|( cSUP )|( cSUP
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
62
Decision MakingDecision Approaches(3)Decision Approaches(3)
• Statistical approach(3)– : speaker dependent model– : normalization factor– cohort model : group of selected
speakers who are more competitive with the model of the claimed id• No well-established selection procedure
– world model : all other speakers• less computation & storage needed
)|( cSUP
cS
chc SS )|( cSUP
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
63
Decision MakingDecision Approaches(4)Decision Approaches(4)
• Statistical approach extensions– If– sign(y) gives the decision– Techniques:
• Bayes Decision Rule (assumes prob.s perfectly estimated)
– Minimizes Half Total Error Rate(HTER)
• Linear Regression• SVM Regression
ccc ΘSUPSUPy )|(log)|(log
2
FR%FA%HTER
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
64
Decision MakingThreshold SettingThreshold Setting
• speaker dependent– |P| thresholds:
• speaker independent– 1 threshold:
• leave one (client o) out– |P|*|P| thresholds:
• a priori: computed on training set (enrollment data) [Lindberg]
• a posteriori: computed on test set (obtained during actual use of the system)
|P|1,...,c , c
|P|o|P|cco 1,..., ,1,..., ,
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
65
Decision MakingHypothesis TestingHypothesis Testing
Valid & impostor densities
[Campbell]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
66
Decision MakingHypothesis Testing(2)Hypothesis Testing(2)
Probability terms & definitions
[Campbell]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
67
Performance EvaluationAccuracyAccuracy
• Error %s– FAR (False Acceptance Rate):
• Prob. of false acceptance
– FRR (False Rejection Rate):• Prob. of false rejection
– Values for FAR & FRR are adjusted by changing the threshold values: FAR vs. FRR
claims false#
sacceptance false#
claims true#
rejections false#
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
68
Performance EvaluationAccuracy(2)Accuracy(2)
• Error %s(2)– EER (Equal Error Rate): operating point
where FAR FRR– Choice of 2 subsequent operating points
to approximate the EER value
– MDE (Minimum Decision Error): operating point where FRR 10FAR
11
11
11
11
FRRFARFRRFAR
,FRRFRRFARFAR
,)FRRFRR()FAR(FAR
FARFRRFARFRREER
kkkk
iiii
kkkk
kkkk
i
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
69
Performance EvaluationAccuracy(3)Accuracy(3)
• Graphs
• Quantities– #speakers correctly/wrongly verified
ROC (Receiver OperatingCharacteristics) curve:
Plot of differentoperating points
(FRR vs. FAR values).Called also DET(Detection ErrorTradeoff) plot
[Gauvain]
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
70
Performance EvaluationComputational ComplexityComputational Complexity
• CPU time– Training
• Feature creation• Modeling• Threshold setting
– Testing (verification throughput)• Feature creation• Matching
• Memory-disk storage– Speech database, Features, Models,
Thresholds
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
71
Experimental ResultsParametersParameters
KHz48sF
Text dependent – Fixed vocab.: Digits 0-9 in French or Spanish|V|=10 |P|=37 (M2VTS database)Discrete utterance speech flow#sessions(shots)/speaker=5, the 5th is for testing|S|=4#phrases/session=1 (0-9 utterance)Phrase duration~6sec
95.0peα (30ms) 360N (20ms) 240MWindow type: Hamming
12LPCCN 12LPCNLiftering-weighting: );( mlc LPCCw
Proc. Freq.=12KHz
Coefficients: LPCC
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
72
Experimental ResultsParameters(2)-EERParameters(2)-EER
Matching method: DTW
Nd : Euclidean Td : Type 4 TNB ddd 30W10Δ ,10Δ ,10Δ ,10Δ jEiEjSiS
Local path constraint: Sakoe & Shiba (b)
Decision approach: TemplateThreshold setting: leave one out|P|(client left out).|P-1|(rest clients as claimants).|S|(shot left out for claiming-testing)=5328 client claims|P|(client left out as impostor).|P-1|(claims of the impostor as one of the rest clients).|S|(shot left out for claiming)=5328 impostor claimsEER(avg)[0.6569%,1.5390%] (FAR1=1.5390%
>FRR1=0.6569%)EER(avg)=[EER(1|234)+EER(2|134)+EER(3|124)+EER(4|123)]/4
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
73
Experimental ResultsParameters(3)-EERParameters(3)-EER
Difference:
EER(avg)=4.1817%EER(5|123)=5.4054%
12MFCCN512)( melFFTN
Coefficients: MFCC
40)( melfiltersN
Shot 4 left out, shot 5 used for testing:|P|.|P-1|=1332 client & 1332 impostor claimsEER(5|123)=2.7027%
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
74
ReferencesReferences
• [Bengio] S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker Verification, IDIAP Research Report, 2001
• [Bimbot] F. Bimbot, I. Magrin-Chagnolleau and L. Mathan, Second-Order Statistical Measures for Text-Independent Speaker Identification, Speech Communication, vol. 17, pp. 177-192, 1995
• [Bourlard] H. Bourlard and N. Morgan, Speaker Verication: A Quick Overview, IDIAP Research Report, 1998
• [Campbell] J.P. Campbell Jr., Speaker Recognition: a Tutorial, Proc. of the IEEE, vol. 85, no. 9, pp. 1437-1462, 1997
• [Deller_bk] J.R. Deller, J.G. Proakis and J.H. Hansen, Discrete-time Processing of Speech Signals, Macmillan, New York, 1993
• [Fakotakis] N. Fakotakis, E. Dermatas, G. Kokkinakis, Optimal Decision Threshold for Speaker Verification, in Signal Processing III: Theories and Applications, editor: I.T. Young et al., pp. 585-587, Elsevier Science Publishers B.V. (North Holland), 1986
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
75
References(2)References(2)
• [Falavigna] D. Falavigna, Comparison Of Different Hmm Based Methods For Speaker Verification (citeseer)
• [Finan] R.A. Finan, R.I. Damper and A.T. Sapeluk, Improved Data Modeling for Text-Dependent Speaker Recognition Using Sub-Band Processing (citeseer)
• [Gauvain] J. Gauvain, L. Lamel and B. Prouts, Experiments with Speaker Verification over the Telephone, Eurospeech’95, pp. 651-654, 1995
• [Genoud] D. Genoud, F. Bimbot, G. Gravier and G. Chollet, Combining Methods to Improve Speaker Verification Decision, Proc. of ICSLP'96, vol. 3, pp. 1756-1759, 1996
• [Haykin_bk] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1995
• [Hermansky] H. Hermansky and N. Morgan, Rasta Processing of Speech, IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, 1994
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
76
References(3)References(3)
• [Jain_bk] A. Jain, R. Bolle and S. Pankanti, editors, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, Boston, MA, 1999
• [James] D. James, H. Hutter and F. Bimbot, CAVE -- Speaker Verification in Banking and Telecommunications (citeseer)
• [Jin] Q. Jin and A. Waibel, Application of LDA to Speaker Recognition (citeseer)
• [Konig] Y. Konig, L. Heck, M. Weintraub and K. Sonmez, Nonlinear Discriminant Feature Extraction for Robust Text-Independent Speaker Recognition, Proc. of RLA2C’98 (Speaker Recognition and Its Commercial and Forensic Applications), 1998
• [Koolwaaij] J.W. Koolwaaij and L. Boves, A New Procedure for Classifying Speakers in Speaker Verification Systems, Proc. of Eurospeech'97, pp. 2355-2358, 1997
• [Krishnan] M. Krishnan, C. Neophytou and G. Prescott, Wavelet Transform Speech Recognition using Vector Quantization, Dynamic Time Warping and Artificial Neural Networks, 1994
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
77
References(4)References(4)
• [Kuitert] M. Kuitert and L. Boves, Speaker Verification with GSM Coded Telephone Speech, Proc. of Eurospeech'97, vol. 2, pp. 975-978, 1997
• [Kumar] N. Kumar, Investigation of Silicon Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition, PhD thesis, Johns Hopkins University, 1997
• [Lindberg] J. Lindberg, J.W. Koolwaaij, H.-P. Hutter, D. Genoud, M. Blomberg, F. Bimbot and J.-B. Pierrot, Techniques for a priori Decision Threshold Estimation in Speaker Verification, Proc. of RLA2C’98, 1998
• [Lo] T.F. Lo and M.W. Mak, A New Intra-Frame and Inter-Frame Cepstral Processing Method for Telephone-Based Speaker Verification, Int. Workshop on Multimedia Data Storage, Retrieval, Integration and Applications, pp. 116-122, 2000
• [Mammone] R.J. Mammone, X. Zhang and R.P. Ramachandran, Robust Speaker Recognition, IEEE Signal Proc. Magazine, vol. 13, no. 5, pp. 58-71, Sep. 1996
• [Milner] B. Milner, Inclusion of Temporal Information into Features for Speech Recognition, Proc. of ICSLP’96, pp. 256-259, 1996
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
78
References(5)References(5)
• [Morgan] N. Morgan and B. Gold, Speech Analysis and Synthesis Overview, Lecture, Univ. of California Berkeley, 1999
• [Nedic] B. Nedic and H. Bourlard, Recent Developments in Speaker Verification at IDIAP, IDIAP Research Report, 2000
• [Picone] J. Picone, Fundamentals of Speech Recognition: A Short Course, Mississippi State Univ., 1996
• [Picone2] J. Picone, Signal Modeling Techniques in Speech Recognition, Proc. of the IEEE, vol. 81, no. 9, pp. 1215-1247, 1993
• [Rabiner_bk] L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993
• [Shrimpton] D. Shrimpton and B.D. Watson, Comparison of Recurrent Neural Network Architectures for Speaker Verification. Proc. of the Fourth Australian International Conference on Speech Science and Technology, pp. 460-464, 1992
05-06/2001
Digital DaysDigital Days
Informatics Dpt.
79
References(6)References(6)
• [Somervuo] P. Somervuo, Speech Recognition using Context Vectors and Multiple Feature Streams, Helsinki University of Technology, Faculty of Electrical Engineering, 1996
• [Vergin] R. Vergin, D. O'Shaughnessy and A. Farhat, Generalized Mel Frequency Cepstral Coefficients for Large-Vocabulary Speaker-Independent Continuous-Speech Recognition, IEEE Trans. on Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, 1999
• [Wassner] H. Wassner, G. Maitre and G. Chollet, Speaker Verification : a Review, Technical Report, IDIAP, 1996
• [Weingessel] A. Weingessel, Speech Recognition (citeseer)• [Young] S. Young, Large vocabulary continuous speech
recognition, IEEE Signal Proc. Magazine, vol. 13, no. 5, pp. 45-57, 1996
top related