entropyanalysisinspeakerrecognition · overview on speaker recognition voice as biometric...
TRANSCRIPT
Entropy Analysis in Speaker Recognition
Andreas Nautsch
Hochschule Darmstadt, CASED, da/sec Security Research Group
Berlin, 15.12.2015
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 1/15
Outline
1. Motivation
2. Overview on Speaker Recognition
3. Biometric Strength of State-of-the-Art Voice Features
4. Conclusion
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 2/15
Introduction
Motivation: Different Approach towards Entropy
◮ Established: Goodness of (LLR) scoresFocus: scores values ⇔ expected meaning?
◮ Proposed: metric for the strength of biometric features◮ Collision probability of subjects within feature spaces◮ Metric towards biometric uniqueness◮ Comparability to other modalities on early stages
Face: 56 bit
Fingerprint: 84 bit
Iris: 249 bit
[Buchmann+14] N. Buchmann, C. Rathgeb, H. Baier, C. Busch: Towards electronic identification and trusted
services for biometric authenticated transactions in the Single Euro Payments Area, APF’14, 2014
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 3/15
Introduction
Overview on Speaker Recognition
◮ Voice as biometric characteristic
◮ Application scenarios and challenges (brief excerpt)◮ Call-Center fraud prevention:
natural free speech, variable duration and content
◮ Mobile devices: random PINs, short duration
◮ Forensic: various contents and signal qualities
Voice reference Feature extraction
Voice probe Feature extraction
Comparison Score
Accept
Reject
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 4/15
Introduction
Overview on Speaker Recognition
1. Psycho-acoustic spectral analyses
⇒ 60 Melody-Frequency Cepstral Coefficients (MFCCs)
Time
Airpressure
Time
Frequency
Time
MFCCs
−50
0
50
100
2. MFCC Clustering by Gaussian Mixture Models (GMMs)
⇒ 2048× 60 = 122 880 free parameters per voice sample
3. Total Variability Analysis: intermediate-sized vectors
⇒ 400-dimensional i-vectors per sample
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 5/15
Introduction
Overview on Speaker Recognition
4. Linear Discriminant Analyse (LDA)
⇒ 200-dimensional i-vector
5. Noise reduction by projection into spherical space
Probabilistic Discriminative speaker sub-spaces
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 6/15
Introduction
Cross-Entropy in Score-Domain: Cllr
Representing the Goodness of Log-Likelihood Ratio (LLR) scores:◮ Proper scoring rule:
Costs by too low genuine ⇔ too high impostor scores
◮ Generalized cross-entropy:Information-loss to perfect classification
−10 −5 0 5 100
2
4
6
8
10
LLR score S
Logarithmic
loss
Genuine scores
Impostor scores
Costlog(S |Hgenuine)
Costlog(S |Himpostor)
[Brummer10] N. Brummer: Measuring, refining and calibrating speaker and language information extracted
from speech. Ph.D. thesis, University of Stellenbosch, 2010.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 7/15
Introduction
Cross-Entropy in Score-Domain: Cllr
Examining all operating-points π◮ E : empiric Bayes error = πFNMR(η) + (1− π) FMR(η), η = −logit π
◮ Cllr =∫ inf
−infE dπ as application-independent uncertainty
◮ Aiming at: minimal information-loss: Cminllr =
∫ inf
−infEmindπ
−10 −5 0 5 100
0.5
1
Bayesian thresholds η
NBER
E
Edefault
Emin
η ≈ 4.6 ⇒ odds = 1 : 100
[Brummer+11] N. Brummer, E. de Villiers: The BOSARIS Toolkit User Guide: Theory, Algorithms and
Code for Binary Classifier Score Processing. Tech. Report, AGNITIO Research, 2011.
[Nautsch14] A. Nautsch: Speaker Verification using i-Vectors., M.Sc. thesis, Hochschule Darmstadt, 2014.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 8/15
Entropy in Feature-space Domain
Relative Entropy in Feature Spaces
Can the biometric discrimination potential of i-vector featurespaces be measured?
◮ Measuring biometric uniqueness
◮ Goal: Comparability of feature extractors◮ of one biometric modality◮ among modalities
◮ Note: entropy of passwords H = L log2N [NIST06]Example: 4-digit PIN H ≈ 13.3 bits
[NIST06] NIST Tech. Rep.: Electronic authentication guideline, recommendations of the national institute of
standards and technology, information security, 2006.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 9/15
Entropy in Feature-space Domain
Estimating relative Entropy
◮ Two-class problem: p same subject, q other subjects
◮ Kullback-Leibler Divergence as lower bound [Adler+06]
39.5 bits
73.5 bits
p sub-space
q1 sub-space
q2 sub-space
−10 −5 0 5 10−10
−5
0
5
10
Feature space dimension 1
Fea
ture
spac
ed
imen
sio
n2
0
0.1
0.2
0.3
0.4
[Adler+06] A. Adler, R. Youmaran, S. Loyka: Towards a Measure of Biometric Information, IEEE CCECE, 2006.[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 10/15
Entropy in Feature-space Domain
Estimating relative Entropy
◮ Two-class problem: p same subject, q other subjects
◮ Kullback-Leibler Divergence as lower bound [Adler+06]
39.5 bits
73.5 bits
47.5 bits
p sub-space
q1 sub-space
q2 sub-space
Gaussian model
for q space
−10 −5 0 5 10−10
−5
0
5
10
Feature space dimension 1
Fea
ture
spac
ed
imen
sio
n2
0
0.1
0.2
0.3
0.4
[Adler+06] A. Adler, R. Youmaran, S. Loyka: Towards a Measure of Biometric Information, IEEE CCECE, 2006.[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 10/15
Entropy in Feature-space Domain
Necessary Regularizations
Challenge: Which information is significant?
◮ Regularizations according to Adler et al. [Adler+06]:
1. Degenerated features: USqVt = svd(Σq)
⇒ Sub-space with: [Sq]i,j ≥ 10−10[Sq]1,1⇒ Subject sub-space as: Sp = U tΣpV
2. Insufficient data: [Σp]i,j = 0 i, j ≥ Np, i 6= j
◮ Extension for variable Np per subject [Nautsch+15]:
3. ill-disposed regularized models:⇒ iterative [Σp]i,j = 0 until Σp is positive-definite
4. Minimal amount of genuine samples: Np ≥ 10
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 11/15
Experimental Evaluation
Experimental Set-up
◮ Database: I4U i-vectors of NIST Speaker RecognitionEvaluation 2012 (SRE’12), comprising SREs 2004 – 2010
◮ 551 female, 425 male subjects
◮ Focus: incomplete probe samples⇒ Duration variations: 5s, 10s, 20s, 40s, full
◮ How do speaker sub-spaces accumulate by increasing sampleduration?
◮ Expectation: correlation to system performance
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 12/15
Experimental Evaluation
Accumulation of Voice Templates by Duration
81
163
244325
407
488
569
651
732813
895
976
0.25 0.5 1 2
Subject IDs
Ratio to full-full
full-5
full-full
0.25 0.5 1 2
[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 13/15
Experimental Evaluation
Accumulation of Voice Templates by Duration
81
163
244325
407
488
569
651
732813
895
976
0.25 0.5 1 2
Subject IDs
Ratio to full-full
full-5
full-10
full-full
0.25 0.5 1 2
[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 13/15
Experimental Evaluation
Accumulation of Voice Templates by Duration
81
163
244325
407
488
569
651
732813
895
976
0.25 0.5 1 2
Subject IDs
Ratio to full-full
full-5
full-10
full-20
full-full0.25 0.5 1 2
[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 13/15
Experimental Evaluation
Accumulation of Voice Templates by Duration
81
163
244325
407
488
569
651
732813
895
976
0.25 0.5 1 2
Subject IDs
Ratio to full-full
full-5
full-10
full-20
full-40
full-full
0.25 0.5 1 2
[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 13/15
Experimental Evaluation
Comparison: relative Entropy ⇔ System Performance
full-5
full-10
full-20
full-40
full-full
0
100
200
300
400
500
Duration groups
RelativeEntropy(inbits)
minmeanmax
[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 14/15
Experimental Evaluation
Comparison: relative Entropy ⇔ System Performance
full-5
full-10
full-20
full-40
full-full
0
100
200
300
400
500
Duration groups
RelativeEntropy(inbits)
minmeanmax
2
5
10
20
4060
BiometricError
Rate(in%)
EER
FMR100
[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 14/15
Experimental Evaluation
Comparison: relative Entropy ⇔ System Performance
full-5
full-10
full-20
full-40
full-full
0
100
200
300
400
500
Duration groups
RelativeEntropy(inbits)
minmeanmax
2
5
10
20
4060
BiometricError
Rate(in%)
EER
FMR100
0.1
0.15
0.2
0.3
0.4
Score
cross-entropy
Cminllr
[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 14/15
Experimental Evaluation
Comparison: relative Entropy ⇔ System Performance
full-5
full-10
full-20
full-40
full-full
0
100
200
300
400
500
Duration groups
RelativeEntropy(inbits)
minmeanmax
2
5
10
20
4060
BiometricError
Rate(in%)
EER
FMR100
0.1
0.15
0.2
0.3
0.4
Score
cross-entropy
Cminllr
full-5
full-10
full-20
full-40
full-full
0
100
200
300
400
500
128-bit passwords
FaceFingerprint
Iris
Duration groups
RelativeEntropy(inbits)
[Nautsch+15] A. Nautsch, C. Rathgeb, R. Saeidi, C. Busch: Entropy analysis of i-vector feature spaces in duration-
sensitive speaker recognition., IEEE ICASSP, 2015.
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 14/15
Conclusion
◮ Relative entropy as metric for biometric uniqueness
◮ Accumulation of voice templates by duration depicted
◮ Relative Entropy: from 63.2 up to 421.9 bit, µ > 124.3 bit
◮ Comparability to other biometric modalities
◮ Comparability to password feature spaces (e.g., 128-bit)
◮ Subject collision probability: subject- and sample-depending5× 10−39 down to 2× 10−55 (empiric data)
Andreas Nautsch Entropy in Speaker Recognition / Berlin, 15.12.2015 15/15