Uchechukwu Ofoegbu, Dissertation Proposal
Speech Processing Lab
Sept. 29, 2006
Speaker Discrimination: The Challenge of Conversational Data
Dissertation Committee
Advisor: Robert Yantorno, Ph.D.
Members: Dennis Silage, Ph.D.
Brian Butz, Ph.D.
Iyad Obeid, Ph.D.
Eugene Kwatny, Ph.D.
Uchechukwu O. Ofoegbu
Presentation Outline
• Problem Statement and Research Goal
• Scope of Research
• Distance Analysis
• Feature Analysis
• Data Analysis
• Application Systems
• Fusion of Distances
• Proposal Summary
Problem Statement and Research Goal
Conventional Speaker Recognition
[Diagram: Reference Speech → Feature Extraction → Model Building; Test Speech → Feature Extraction → Comparison → Recognition Decision → System Output]
• Speaker Identification
– Who is this speaker?
• Speaker Verification
– Is he who he claims to be?
Conversation Segmentation
• Broadcast News/Conference Data
• Conversational Data
[Figures: speech waveforms, amplitude vs. time (seconds), one spanning 0 to 30 s and one spanning 12 to 20 s]
Problems with Conversational Data
• No a priori information available from participating speakers
– Training is impossible
• No a priori knowledge of change points
• Speakers alternate very rapidly
– Limited amounts of data for single-speaker representations
• Distortion
– Channel noise, co-channel data
Proposed Solutions
1. Selective creation of data models
2. Development of an “optimal” distance measure
3. Decision level fusion of distance measures
4. Development of application-specific systems
Scope of Research
Criminal Activity Detection
• Monitoring inmate conversations
– Prevention of 3-way calls
– Notification of suspicious contacts
– Enhancement of keyword detection
– Uncooperative data collection
• Forensics
– Voiceprints
Commercial Services
• Automated Customer Services
– Personalized contact with customers
• Search/Retrieval of Audio Data
Homeland Security
• Military Activities
– Pilot-control tower communications
– Detection of unidentified speakers on pilot radio channels
• Terrorist Identification
Distance Analysis
Distance Measures
• Univariate vs. Multivariate Analysis
[Figure: two distributions annotated with the difference between their means and with their standard deviations]
Distance Measures
• Notation
– Random variables being compared:
X = [X1, X2, …, Xp]: nx-by-p matrix
Y = [Y1, Y2, …, Yp]: ny-by-p matrix
• Properties
– Q(X, Y) ≥ 0
– Q(X, Y) = 0 iff X = Y
– Q(X, Y) = Q(Y, X)
– Q(X, Y) ≤ Q(X, Z) + Q(Z, Y)
Distance Measures
• Mahalanobis Distance
$$Q_{MAHALANOBIS}(X, Y) = (\mu_x - \mu_y)^T \Sigma^{-1} (\mu_x - \mu_y)$$
Σ = combined covariance matrix of X and Y
• Hotelling’s T-Square Statistic
$$Q_{TSQ}(X, Y) = \frac{n_x n_y}{n_x + n_y} \sum_{i=1}^{p} \sum_{k=1}^{p} (\mu_{xi} - \mu_{yi}) \, C^{ik} \, (\mu_{xk} - \mu_{yk})$$
$$C = \frac{(n_x - 1) \Sigma_x + (n_y - 1) \Sigma_y}{n_x + n_y - 2}$$
C^{ik} = element in the ith row and kth column of the inverse of C
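A minimal sketch of the Mahalanobis distance and the T-Square statistic in plain Python follows. The function names, the use of the pooled matrix C as the "combined covariance" in the Mahalanobis distance, and the small Gaussian-elimination solver are assumptions made for illustration; the proposal does not specify an implementation.

```python
def mean_vector(rows):
    """Column-wise mean of an n-by-p list of feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Unbiased p-by-p sample covariance of an n-by-p list of rows."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[k] - mu[k]) for r in rows) / (n - 1)
             for k in range(p)] for i in range(p)]


def pooled_covariance(x, y):
    """C = ((nx - 1) Sx + (ny - 1) Sy) / (nx + ny - 2)."""
    nx, ny = len(x), len(y)
    sx, sy = covariance_matrix(x), covariance_matrix(y)
    p = len(sx)
    return [[((nx - 1) * sx[i][k] + (ny - 1) * sy[i][k]) / (nx + ny - 2)
             for k in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def mahalanobis(x, y):
    """(mu_x - mu_y)^T C^-1 (mu_x - mu_y), with C assumed to be the pooled covariance."""
    d = [a - b for a, b in zip(mean_vector(x), mean_vector(y))]
    z = solve(pooled_covariance(x, y), d)  # z = C^-1 d without forming the inverse
    return sum(di * zi for di, zi in zip(d, z))


def t_square(x, y):
    """Hotelling's T-Square: the same quadratic form scaled by nx*ny / (nx + ny)."""
    nx, ny = len(x), len(y)
    return nx * ny / (nx + ny) * mahalanobis(x, y)
```

Solving the linear system instead of inverting C explicitly is a design choice of this sketch; it gives the same quadratic form as the double sum over C^{ik}.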
Distance Measures
• Kullback-Leibler (KL) Distance
$$Q_{KULLBACK}(X, Y) = \frac{1}{2} (\mu_x - \mu_y)^T (\Sigma_x^{-1} + \Sigma_y^{-1}) (\mu_x - \mu_y) + \frac{1}{2} \mathrm{tr}(\Sigma_x^{-1} \Sigma_y + \Sigma_y^{-1} \Sigma_x - 2I)$$
• Bhattacharyya Distance
$$Q_{BHATTACHARYYA}(X, Y) = \frac{1}{4} (\mu_x - \mu_y)^T (\Sigma_x + \Sigma_y)^{-1} (\mu_x - \mu_y) + \frac{1}{2} \log \frac{|\Sigma_x + \Sigma_y|}{2 \sqrt{|\Sigma_x| \, |\Sigma_y|}}$$
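To make the two distances concrete, here is a hedged sketch of their univariate (p = 1) special case, where each covariance matrix reduces to a single variance. The function names are hypothetical, and the restriction to one dimension is a simplification for illustration rather than the multivariate form used in the proposal.

```python
import math


def kl_distance_1d(mu_x, var_x, mu_y, var_y):
    """Symmetric KL distance between two 1-D Gaussians."""
    d2 = (mu_x - mu_y) ** 2
    mean_term = 0.5 * d2 * (1.0 / var_x + 1.0 / var_y)
    # Trace term tr(Sx^-1 Sy + Sy^-1 Sx - 2I) collapses to a scalar for p = 1.
    cov_term = 0.5 * (var_y / var_x + var_x / var_y - 2.0)
    return mean_term + cov_term


def bhattacharyya_distance_1d(mu_x, var_x, mu_y, var_y):
    """Bhattacharyya distance between two 1-D Gaussians."""
    d2 = (mu_x - mu_y) ** 2
    mean_term = 0.25 * d2 / (var_x + var_y)
    cov_term = 0.5 * math.log((var_x + var_y) / (2.0 * math.sqrt(var_x * var_y)))
    return mean_term + cov_term
```

In both functions the first term responds to a difference in means and the second to a difference in variances, so two speakers with the same mean but different spread still yield a nonzero distance.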
Distance Measures
• Levene’s Test
– Derived from the T-Square statistic as follows:
• Each set of points is transformed, along each feature dimension, into its absolute deviation from the mean vector
• The T-Square statistic is then applied to the transformed features
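The transformation step of Levene's test can be sketched as below. The function name is hypothetical, and the sketch covers only the absolute-deviation transform; the transformed sets would then be passed to whatever T-Square implementation is in use.

```python
def absolute_deviations(rows):
    """Replace each feature value by its absolute deviation from that
    feature's mean over the set (one set = one n-by-p list of rows)."""
    n = len(rows)
    mu = [sum(col) / n for col in zip(*rows)]
    return [[abs(v - m) for v, m in zip(r, mu)] for r in rows]
```

Because the transform discards the sign and location of each point, a T-Square statistic computed on its output compares the spread of the two sets rather than their means.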
Procedural Set-up
• HTIMIT database used
• Average utterance length = 5 seconds
Intra-speaker distance computations
[Diagram: 10 utterances from Speaker A → randomly select 2 utterances (Utterance 1, Utterance 2) → window data → compute 14th-order LPCC → compute distance]
Procedural Set-up
Inter-speaker, different-utterance distance computations
[Diagram: Speaker A and Speaker B → randomly select one utterance from each → window data → compute 14th-order LPCC → compute distance]
Analysis of Distance Measures
• Mahalanobis Distance – Gaussian Estimate
[Figure: Mahalanobis Distance Comparisons. Probability of occurrence vs. distance; legend: SSDU, DSDU]
Analysis of Distance Measures
• Levene’s Test – Gamma Estimate
[Figure: Levene's Test Comparisons, LPCC-based distances. Probability of occurrence vs. distance]
Feature Analysis
Cepstral Analysis
Frequency Analysis of Speech
• STFT of Speech = Excitation Component × Vocal Tract Component
– Excitation component: fast-varying harmonics
– Vocal tract component: slowly varying formants
• Log of STFT = Log of Excitation + Log of Vocal Tract Component
• IDFT of Log of STFT = Vocal tract + Excitation
Cepstral Features
• Linear Predictive Cepstral Coefficients
– Obtained Recursively from LPC Coefficients
• Mel-Scale Frequency Cepstral Coefficients
– Nonlinear warping of frequency axis to model the human auditory system
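The recursive computation of LPCCs from LPC coefficients can be sketched as follows. This is the standard textbook recursion under an assumed sign convention, with the all-pole model written as 1 / (1 - sum_k a_k z^-k); the function name and the omission of the gain term c_0 are assumptions, and a different LPC sign convention would flip the signs of the a_k.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps from LPC coefficients.

    a[k - 1] holds a_k for the assumed model H(z) = 1 / (1 - sum_k a_k z^-k).
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        # The direct term a_n exists only while n does not exceed the LPC order.
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For a first-order model with a_1 = 0.5 the recursion reproduces the closed form c_n = 0.5^n / n, which is a convenient sanity check before moving to the 14th-order case used in the experiments.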
Cepstral Features
• Delta Cepstral Coefficients
– First and second derivatives of cepstral coefficients
– Reflect dynamic information
– Used as a supplement to the original cepstral features
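One common way to approximate these derivatives is a regression over neighbouring frames, sketched below. The regression formula, the window half-width of 2 frames, and the edge handling (repeating the first and last frame) are assumptions for illustration; the proposal does not state which delta formulation is used.

```python
def delta(frames, width=2):
    """Regression-based delta of a T-by-p list of cepstral frames.

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n} n^2)
    """
    total = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(total):
        row = []
        for j in range(len(frames[0])):
            num = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, total - 1)][j]   # clamp at last frame
                behind = frames[max(t - n, 0)][j]          # clamp at first frame
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out
```

Applying the same function to its own output, `delta(delta(frames))`, gives the second-derivative (delta-delta) coefficients.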
Analysis of Cepstral Features
• Mahalanobis Distance
[Figure: Mahalanobis Distance Comparisons. Two panels, LPCC-based distances and MFCC-based distances; probability of occurrence vs. distance; legend: Intra, Inter]
Analysis of Cepstral Features
• Levene’s Test
[Figure: Levene's Test Comparisons. Two panels, LPCC-based distances and MFCC-based distances; probability of occurrence vs. distance; legend: Intra, Inter]
Feature Combination
• Proposed Investigation
– What is the best feature combination?
– Will the delta and delta-delta coefficients contribute to the speaker-differentiating ability of the features?
Feature Combination Analysis
• T-test Based Evaluation
– Why?
• Robust to departures from the Gaussian distribution, especially for large amounts of data and when the two samples to be compared are of approximately equal size
• Unaffected by differences in the variances of the compared variables
Data Analysis
Traditional Speaker Modeling
• Examples
– Gaussian Mixture Models
– Hidden Markov Models
– Neural Networks
– Prosody-Based Models
• Disadvantages
– Require large amounts of data
– Sometimes require a training procedure
– Relatively complex
Conversational Data Modeling
• Current Method
– Equal segmentation of data
– Indiscriminate use of data
– Poor performance
• Problems
– Change points unknown
– Not all speech is useful
Proposed Speaker Modeling
[Diagram: speech stream labeled S, V and U → voiced (V) portions only, grouped into SEGMENT 1 … SEGMENT M → feature computation → mean and covariance matrix computation → MODEL 1 … MODEL M]
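The modeling pipeline in the diagram can be sketched as below: voiced phonemes are grouped N at a time, and each group is summarized by the mean vector and covariance matrix of its feature frames. The function names, the input layout (a list of phonemes, each a list of feature frames), and the choice to drop a trailing incomplete group are assumptions made for illustration.

```python
def mean_and_covariance(frames):
    """Mean vector and unbiased covariance matrix of an n-by-p list of frames."""
    n = len(frames)
    mu = [sum(col) / n for col in zip(*frames)]
    p = len(mu)
    cov = [[sum((f[i] - mu[i]) * (f[k] - mu[k]) for f in frames) / (n - 1)
            for k in range(p)] for i in range(p)]
    return mu, cov


def segment_models(voiced_phonemes, n_per_model):
    """Build one (mean, covariance) model per group of n_per_model voiced phonemes.

    voiced_phonemes: list of phonemes, each a list of p-dimensional frames.
    A trailing group with fewer than n_per_model phonemes is dropped.
    """
    models = []
    last_start = len(voiced_phonemes) - n_per_model
    for start in range(0, last_start + 1, n_per_model):
        group = voiced_phonemes[start:start + n_per_model]
        frames = [frame for phoneme in group for frame in phoneme]
        models.append(mean_and_covariance(frames))
    return models
```

Each resulting model is exactly the pair (mean, covariance) that the distance measures consume, so no training step is needed before two segments can be compared.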
Proposed Speaker Modeling
• Why voiced only?
– Same speech class compared
– Contains the most information
• What is the appropriate number of phonemes?
– Large enough to sufficiently represent speakers
– Small enough to avoid speaker overlap
Modeling Analysis
[Figure: Distribution of Mahalanobis Distance - Utterance Based. Probability vs. distance value; legend: Same Speaker, Different Speaker]
N = 20 (4 seconds of voiced speech)
Modeling Analysis
[Figure: Speaker Differentiation with Respect to Data Size. Mahalanobis distance vs. number of segments; legend: Same Speaker, Different Speaker]
Modeling Analysis
[Figure: Distributions of Mahalanobis Distance - Segment Based. Probability vs. distance value; legend: Same Speaker, Different Speaker]
N = 5 (1 second of voiced speech)
Application Systems
Unsupervised Speaker Indexing
• The Restrained-Relative Minimum Distance (RRMD) Approach
0 D1,2 D1,3 …
D2,1 0 D2,3 …
D3,1 D3,2 0 …
…
REFERENCE MODELS
Unsupervised Speaker Indexing
• The Restrained-Relative Minimum Distance (RRMD) Approach
[Flowchart: observe distance to Reference 1 and Reference 2 → Min. Distance → Same Speaker? → Restraining Condition: Passed → Same Speaker; Failed → Relative Distance Condition: Passed → Same Speaker; Failed → Unusable Data]
RRMD Approach
• Restraining Condition
– Distance Likelihood Ratio
$$DLR = \frac{f(x \mid \mu_1, \sigma_1)}{f(x \mid \mu_2, \sigma_2)}$$
DLR > 1 → Same Speaker
DLR < 1 → Check Relative Distance Condition
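A hedged sketch of the Distance Likelihood Ratio follows. It assumes that both likelihoods are Gaussian densities, with the first parameter pair describing same-speaker distances and the second describing different-speaker distances; the function names and the example parameters are hypothetical, and other densities (such as the gamma estimate mentioned for Levene's test) would need a different pdf.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def distance_likelihood_ratio(x, same_params, diff_params):
    """Ratio of the same-speaker likelihood to the different-speaker
    likelihood of an observed distance x; each params is (mu, sigma)."""
    return gaussian_pdf(x, *same_params) / gaussian_pdf(x, *diff_params)


def restraining_condition(x, same_params, diff_params):
    """True when DLR > 1, i.e. the distance is more likely same-speaker."""
    return distance_likelihood_ratio(x, same_params, diff_params) > 1.0
```

With illustrative parameters (1.0, 0.5) for same-speaker and (3.0, 0.5) for different-speaker distances, an observed distance of 1.2 passes the condition and a distance of 2.8 fails it and would move on to the relative distance check.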
RRMD Approach
• Relative Distance Condition
– Relative Distance:
Drel = dmax – dmin
– Drel > threshold → Same Speaker
[Figure: a model's distances dmin and dmax to Reference 1 and Reference 2]
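The relative distance condition can be sketched as a small decision function. The function name, the return convention (index of the nearer reference, or None for unusable data), and the example threshold are assumptions for illustration.

```python
def relative_distance_decision(d_ref1, d_ref2, threshold):
    """Apply Drel = dmax - dmin > threshold for one test model.

    Returns 1 or 2 (the nearer reference) when the condition passes,
    or None when the model is treated as unusable data.
    """
    d_min, d_max = sorted((d_ref1, d_ref2))
    if d_max - d_min > threshold:
        return 1 if d_ref1 <= d_ref2 else 2
    return None
```

A model that is much closer to one reference than to the other is assigned to the nearer one, while a model roughly equidistant from both is left undecided, which is how a higher threshold trades indexing error for undecided error.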
Preliminary Results
• Experiments
– 245 telephone conversations from the SWITCHBOARD database, with an average length of 400 seconds.
– T-Square statistics used
– Ground truth obtained from Mississippi State Transcriptions
Preliminary Results
• Best N Estimation
[Figure: Average Indexing Accuracy with Respect to Number of Voiced Phonemes per Model. Accuracy vs. N (number of voiced phonemes); annotation: N = 5]
Preliminary Results
• RRMD Experiments
– Drel varied from 0 to 200
– Two errors defined
• Indexing Error
$$I_{err} = 100 - \text{Accuracy}$$
• Undecided Error
Nu = number of detected undecided/unusable samples
Nc = number labeled as co-channel data
$$U_{Err} = \frac{N_u - N_c}{N_T} \times 100\%$$
Preliminary Results
[Figure: Classification Error with Respect to Relative Distance Threshold. Percent error vs. threshold; curves: Indexing Error, Undecided Error; equal-error point marked]
Speaker Count System
• The Residual Ratio Algorithm (RRA)
[Diagram: reference model selected randomly → DLR-based model comparison → reference model selected randomly → DLR-based model comparison → …; if too little data is removed, select another model]
• Process is repeated K-1 times for counting up to K speakers
RRA Examples – 2 Speakers
[Figure: How the Residual Ratio Algorithm Works for Two-Speaker Conversations. Three panels vs. time (seconds): initial speech, round 2 residual, round 3 residual; segments labeled Speaker 1 and Speaker 2]
RRA Examples – 3 Speakers
[Figure: How the Residual Ratio Algorithm Works for Three-Speaker Conversations. Three panels vs. time (seconds): initial speech, round 2 residual, round 3 residual; segments labeled Speaker 1, Speaker 2 and Speaker 3]
Comparison
[Figure: residual ratio after the 2nd round of RRA, TWO-SPEAKER RESIDUAL vs. THREE-SPEAKER RESIDUAL]
Preliminary Results
• Experiments
– HTIMIT Database
– 1000 artificially generated K-speaker conversations for each K from 1 to 4
– Average conversation length = 1 min
– Mahalanobis distance used
Preliminary Results
• Counting Techniques
– Stopped Residual Ratio (SRR)
– Added Residual Ratio (ARR)
• Speaker count is determined from the sum of the Residual Ratios over all K-1 rounds: the higher the ARR, the higher the speaker count
Preliminary Results
[Figures: ARR Probability Distributions. Probability vs. ARR, shown for 1 and 2 speakers, for 1 to 3 speakers, and for 1 to 4 speakers]
Preliminary Results
[Figure: RRA Accuracy. Percent correct for SRR and ARR under three accuracy methods: 1 or More; 1, 2 or More; 1, 2, 3 or 4]
Fusion of Distances
Correlation Analysis
[Figure: Draftsman's Display of Distances (LPCC). Pairwise scatter plots of the Mahalanobis, TSquared, KL, Bhattacharyya and Levene distances; legend: Intra, Inter]
Correlation Analysis
[Figure: Draftsman's Display of Distances (MFCC). Pairwise scatter plots of the Mahalanobis, TSquared, KL, Bhattacharyya and Levene distances; legend: Intra, Inter]
“Best Distance”
• Optimized Fusion of Distances
– Minimize intra-speaker variation
– Maximize inter-speaker variation
– Maximize the t-test value between the intra-speaker and inter-speaker distance distributions
$$T_{max} = a^T X$$
Tmax = new distance
X = vector consisting of the distance measure values
a = vector of the weights assigned to each distance measure
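A minimal sketch of the fusion step is given below: each pair of models yields a vector of distance values, the fused distance is their weighted sum, and a candidate weight vector is scored by the t statistic between the fused intra-speaker and inter-speaker distances. The function names are hypothetical, the Welch (unequal-variance) form of the t statistic is an assumption, and the search for the weights that maximize the score is left out.

```python
import math


def fuse(distances, weights):
    """Fused distance T = a^T X for one vector of distance-measure values."""
    return sum(w * d for w, d in zip(weights, distances))


def t_value(a, b):
    """Two-sample t statistic in Welch's unequal-variance form (assumed variant)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)


def separation(intra_rows, inter_rows, weights):
    """Score a weight vector by the magnitude of the t statistic between
    fused intra-speaker and fused inter-speaker distances."""
    fused_intra = [fuse(row, weights) for row in intra_rows]
    fused_inter = [fuse(row, weights) for row in inter_rows]
    return abs(t_value(fused_intra, fused_inter))
```

Setting all but one weight to zero recovers the score of a single distance measure, which gives a baseline against which any fused weight vector can be compared.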
“Best Distance”
[Figure: Distance Measure 2 plotted against Distance Measure 1]
Preliminary Experiments
LPCCs
[Figure: intra and inter probability distributions for each distance, features: LPCC. Panel titles: Tmax - 44.3369; Mahalanobis - 44.0652; TSquared - 35.2111; KL - 22.7672; Bhattacharyya - 33.7449; Levene - 13.4432]
Preliminary Experiments
LPCCs
[Figure: Increase in Inter-Class Separation as Number of Distances is Increased, features: LPCC. T-test value vs. number of distances (1 to 5); points labeled Mahalanobis, Levene, Bhattacharyya, KL, TSquared]
Preliminary Experiments
MFCCs
[Figure: intra-speaker and inter-speaker probability distributions of each distance, Feature: MFCC. T-test value shown for each panel:]
• (panel title missing, presumably Tmax): 39.4524
• Mahalanobis: 38.2733
• T-Squared: 32.5542
• KL: 11.0738
• Bhattacharyya: 23.4276
• Levene: 16.8735
Preliminary Experiments
MFCCs
[Figure: Increase in Inter-Class Separation as Number of Distances is Increased, Features: MFCC. Horizontal axis: Number of Distances (1 to 5); vertical axis: T-Test Value (about 38.2 to 39.6). Point labels: Mahalanobis, Levene, KL, T-Squared, Bhattacharyya.]
Proposal Summary
Research Goal Revisited
• To overcome the following challenges faced in differentiating between speakers participating in conversations:
– No a priori information
– Limited data size
– No knowledge of change points
– Co-channel speech
Summary of Work Accomplished
• Practical demonstration of the existence of the problem
• Analysis of distance measures and features
• Development of a novel model formation technique
• Development, implementation and evaluation of two conversation-based speaker differentiation systems
• Introduction to and preliminary testing of an “optimal” distance formation
Proposed Work
• Feature Combinations
– Determination of the best combination of features using univariate tests of similarity
– Enhancement of feature combinations using Principal Component Analysis
• Fusion of Distance Measures
– Enhancement of the fusion technique using mutual information suppression techniques
– Decision-level distance measure fusion
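As a rough illustration of the Principal Component Analysis step proposed above, the sketch below projects combined feature vectors onto their leading principal directions. It is a generic PCA via singular value decomposition, not the author's procedure; the function name `pca_project`, the random stand-in data, and the number of retained components are assumptions made for the example.

```python
import numpy as np

def pca_project(F, n_components):
    """Project rows of F (n_frames x n_features) onto the top principal directions.

    Returns the projected data and the fraction of variance each kept
    component explains.
    """
    F = np.asarray(F, dtype=float)
    centered = F - F.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return centered @ Vt[:n_components].T, explained[:n_components]

# Hypothetical stand-in for a combined feature matrix (for example, two
# feature types concatenated per frame); real features would replace this.
rng = np.random.default_rng(0)
F = rng.normal(size=(50, 6))
Z, explained = pca_project(F, n_components=2)
```

The projected columns of `Z` are uncorrelated, which is the property that would make PCA useful for removing redundancy between combined features.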
Proposed Work
• Further development of introduced systems
– Use of all distance measures
– Use of the best feature combination
– Use of the “optimal distance”
– Implementation of the decision-level fusion technique
Final Goal
A speaker recognition system for conversations that yields results comparable to those of non-conversational systems.
Publications
• U. Ofoegbu, A. Iyer, R. Yantorno, “Detection of a Third Speaker in Telephone Conversations”, ICSLP, INTERSPEECH 2006.
• U. Ofoegbu, A. Iyer, R. Yantorno and S. Wenndt, “Unsupervised Indexing of Noisy Conversations with Short Speaker Utterances”, IEEE Aerospace Conference, March 2007.
• U. Ofoegbu, A. Iyer, R. Yantorno, “A Simple Approach to Unsupervised Speaker Indexing”, IEEE ISPACS, 2006.
• U. Ofoegbu, A. Iyer, R. Yantorno, “A Speaker Count System for Telephone Conversations”, IEEE ISPACS, 2006.
• A. Iyer, U. Ofoegbu, R. Yantorno, “Speaker Discriminative Distances: Comprehensive Study”, IEEE Transactions on Speech and Audio Processing (submitted).
Dissertation Committee
Advisor: Robert Yantorno, Ph.D.
Members: Dennis Silage, Ph.D.
Brian Butz, Ph.D.
Iyad Obeid, Ph.D.
Eugene Kwatny, Ph.D.
Cepstral Features
• Linear Predictive Cepstral Coefficients
– Obtained recursively from LPC coefficients
Let LPC vector = [a0 a1 a2 … ap] and
LPCC vector = [c0 c1 c2 … cp … cn-1]
$c_0 = \ln E^{2}$
$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad 1 \le m \le p$
$c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad p < m < n$
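The recursive computation of cepstral coefficients from LPC coefficients can be sketched as below. This is a minimal version of the textbook LPC-to-cepstrum recursion, assuming the all-pole model H(z) = G / (1 - sum_k a_k z^-k) and taking c0 as the log of the squared gain; the sign convention, the gain term, and the function name `lpc_to_lpcc` are assumptions and may differ from what was used in this work.

```python
import math

def lpc_to_lpcc(a, gain, n_ceps):
    """Convert predictor coefficients to cepstral coefficients.

    a      : [a_1, ..., a_p], predictor coefficients (a_0 is omitted)
    gain   : model gain G (assumed convention: c_0 = ln(G^2))
    n_ceps : number of cepstral coefficients to return (c_0 .. c_{n_ceps-1})
    """
    p = len(a)
    c = [0.0] * n_ceps
    c[0] = math.log(gain ** 2)
    for m in range(1, n_ceps):
        # The direct term a_m exists only while m <= p.
        acc = a[m - 1] if m <= p else 0.0
        # Sum over k keeps only terms whose a_{m-k} index lies in 1..p.
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c

# Single-pole check: for H(z) = 1 / (1 - 0.5 z^-1), c_n = 0.5**n / n.
coeffs = lpc_to_lpcc([0.5], gain=1.0, n_ceps=5)
```

Requesting more cepstral coefficients than the LPC order (n greater than p) is handled by the second branch of the recursion, where the direct term drops out.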
Conversational Data Modeling
• Current Method
– Equal segmentation of data
– Indiscriminate use of data
• Problems
– Change points unknown
– Not all speech is useful
“Best Distance”
• Intra-speaker and inter-speaker distance lengths are always equal, therefore:
$T(a) = \dfrac{\mu_1 - \mu_2}{\sqrt{\sigma_1^{2} + \sigma_2^{2}}}$
$a_1 = k\, P^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$
P = sum of the covariance matrices of the two classes.
Q = the square of the distance between the mean vectors of the two classes.
λ1 = maximum eigenvalue obtained by solving the generalized eigenvalue problem:
$P^{-1} Q\, a = \lambda_1 a$
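The weight computation outlined on this slide can be sketched numerically as follows. The sketch assumes the weights take the form a = k P^{-1}(mean difference), with P the sum of the two class covariance matrices and k chosen here to give unit norm. The function names, the toy data, and the equal-sample-size form of the t statistic are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def best_weights(X_intra, X_inter):
    """Weights a = k * P^{-1} (mu_inter - mu_intra), with k set so ||a|| = 1."""
    mu_diff = X_inter.mean(axis=0) - X_intra.mean(axis=0)
    # P: sum of the covariance matrices of the two classes.
    P = np.cov(X_intra, rowvar=False) + np.cov(X_inter, rowvar=False)
    a = np.linalg.solve(P, mu_diff)
    return a / np.linalg.norm(a)

def t_value(d_intra, d_inter):
    """t statistic for two 1-D samples of equal size (assumed form)."""
    n = d_intra.size
    pooled = (d_intra.var(ddof=1) + d_inter.var(ddof=1)) / n
    return (d_inter.mean() - d_intra.mean()) / np.sqrt(pooled)

# Toy data (hypothetical): three correlated distance measures per comparison.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6, 0.2],
                [0.6, 2.0, 0.3],
                [0.2, 0.3, 1.5]])
X_intra = rng.multivariate_normal([1.0, 2.0, 1.5], cov, size=300)
X_inter = rng.multivariate_normal([2.0, 2.5, 3.0], cov, size=300)

a = best_weights(X_intra, X_inter)
t_fused = t_value(X_intra @ a, X_inter @ a)
t_single = [t_value(X_intra[:, j], X_inter[:, j]) for j in range(X_intra.shape[1])]
```

On the data used to fit the weights, `t_fused` is at least as large as every entry of `t_single`, since each single measure corresponds to a particular (non-optimal) choice of `a`.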
RRMD Approach
• Relative Distance Condition
[Figure: Distribution of T-Square Statistics for N = 5. Horizontal axis: T-Square Statistics (0 to 600); vertical axis: Probability (0 to 0.2). Curves: Intra Speaker, Inter Speaker; annotation: D_rel.]