![Page 1: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/1.jpg)
Introduction Mapping Modeling Speaker Diarization Summary
H. Aronowitz (IBM) Intra-Class Variability Modeling for Speech Processing June 08 1
Dr. Hagai Aronowitz
IBM Haifa Research Lab
Presentation is available online at: http://aronowitzh.googlepages.com/
Intra-Class Variability Modeling for Speech Processing
![Page 2: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/2.jpg)
Speech Classification: Proposed framework
Given labeled training segments from class + and class –, classify unlabeled test segments
Classification framework
1. Represent speech segments in segment-space
2. Learn a classifier in segment-space
• SVMs
• NNs
• Bayesian classifiers
• …
![Page 3: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/3.jpg)
Outline: Intra-Class Variability Modeling for Speech Processing
1 Introduction to GMM based classification
2 Mapping speech segments into segment space
3 Intra-class variability modeling
4 Speaker diarization
5 Summary
![Page 4: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/4.jpg)
Text-Independent Speaker Recognition: GMM-Based Algorithm [Reynolds 1995]
GMM based speaker recognition
Estimate Pr(y_t|S):
1. Train a universal background model (UBM) GMM using EM
2. For every target speaker S: train a GMM G_S by applying MAP-adaptation
Assuming frame independence:

$$\Pr(y_1,\dots,y_T \mid S) = \prod_{t=1}^{T} \Pr(y_t \mid S)$$

[Figure: the UBM and two adapted GMMs, Q1 (speaker #1) and Q2 (speaker #2), with component means μ1, μ2, μ3 in the R^26 MFCC feature space. Question posed: Pr(Y|S) = ?]
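The frame-independence scoring rule above can be sketched in a few lines. This is a minimal illustration, not the system from the talk: it assumes diagonal covariances, and the function and argument names (`gmm_log_likelihood`, `weights`, `means`, `variances`) are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of frames (T, D) under a diagonal-covariance GMM.

    weights: (G,), means: (G, D), variances: (G, D).
    Under frame independence the total is a plain sum over frames.
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, G, D)
    log_gauss = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :]
        + diff ** 2 / variances[None, :, :],
        axis=2,
    )                                                           # (T, G)
    # log sum_g w_g N(y_t; mu_g, var_g), computed stably per frame
    per_frame = np.logaddexp.reduce(np.log(weights)[None, :] + log_gauss, axis=1)
    return float(per_frame.sum())
```

Because the score is a sum over frames, its cost grows linearly with the length of the audio.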
![Page 5: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/5.jpg)
GMM Based Algorithm - Analysis
1. Invalid frame independence assumption: factors such as channel, emotion, lexical variability, and speaker aging cause frame dependency
2. GMM scoring is inefficient: linear in the length of the audio
3. GMM scoring does not support indexing
![Page 7: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/7.jpg)
Mapping Speech Segments into Segment Space: GMM scoring approximation 1/4
Definitions
X: training session for target speaker
Y: test session
Q: GMM trained for X
P: GMM trained for Y
Goal
Compute Pr(Y |Q) using GMMs P and Q only
Motivation
1. Efficient speaker recognition and indexing
2. More accurate modeling
![Page 8: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/8.jpg)
Mapping Speech Segments into Segment Space: GMM scoring approximation 2/4

$$\frac{1}{T}\log\Pr(Y \mid Q) = \frac{1}{T}\sum_{t=1}^{T}\log\Pr(y_t \mid Q) \approx \int_x \Pr(x \mid P)\,\log\Pr(x \mid Q)\,dx = -H(P,Q) \qquad (1)$$

The right-hand side is the negative cross entropy between P and Q.
Approximating the cross entropy between two GMMs
1. Matching based lower bound [Aronowitz 2004]
2. Unscented-transform based approximation [Goldberger & Aronowitz 2005]
3. Other options in [Hershey 2007]
![Page 9: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/9.jpg)
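Mapping Speech Segments into Segment Space: GMM scoring approximation 3/4
Matching based approximation
With $-H(P,Q) = \int \Pr(x \mid P)\log\Pr(x \mid Q)\,dx$ denoting the negative cross entropy, and P, Q diagonal-covariance GMMs with G Gaussians in D dimensions:

$$-H(P,Q) \approx \sum_{g=1}^{G} w_g^{P}\,\max_{j}\left\{\log w_j^{Q} - \sum_{d=1}^{D}\log\sigma_{j,d}^{Q} - \frac{1}{2}\sum_{d=1}^{D}\frac{\left(\sigma_{g,d}^{P}\right)^{2}}{\left(\sigma_{j,d}^{Q}\right)^{2}} - \frac{1}{2}\sum_{d=1}^{D}\frac{\left(\mu_{g,d}^{P}-\mu_{j,d}^{Q}\right)^{2}}{\left(\sigma_{j,d}^{Q}\right)^{2}}\right\} + C \qquad (2)$$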
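The matching-based bound can be checked numerically on small GMMs. The sketch below is illustrative only: it assumes diagonal covariances, keeps the constant term explicitly instead of folding it into C, and uses hypothetical names (`gauss_cross_term`, `matching_lower_bound`).

```python
import numpy as np

def gauss_cross_term(mu_p, var_p, mu_q, var_q):
    """Closed form of the integral of N(x; mu_p, var_p) * log N(x; mu_q, var_q) dx
    for diagonal covariances (all arguments are length-D arrays)."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * var_q) + (var_p + (mu_p - mu_q) ** 2) / var_q
    )

def matching_lower_bound(w_p, mu_p, var_p, w_q, mu_q, var_q):
    """Lower bound on the negative cross entropy between GMMs P and Q.

    Each Gaussian g of P is matched to the single Gaussian j of Q that
    maximizes log w_j^Q plus the Gaussian-to-Gaussian cross term.
    """
    total = 0.0
    for g in range(len(w_p)):
        best = max(
            np.log(w_q[j]) + gauss_cross_term(mu_p[g], var_p[g], mu_q[j], var_q[j])
            for j in range(len(w_q))
        )
        total += w_p[g] * best
    return float(total)
```

When both models have a single Gaussian the bound is exact, which gives a convenient sanity check.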
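Assuming weights and covariance matrices are speaker independent (+ some approximations), and writing $-H(P,Q)$ for the negative cross entropy:

$$-H(P,Q) \approx -\sum_{g=1}^{G}\sum_{d=1}^{D} w_g\,\frac{\left(\mu_{g,d}^{P}-\mu_{g,d}^{Q}\right)^{2}}{2\,\sigma_{g,d}^{2}} + C \qquad (3)$$

Mapping T is induced:

$$T:\ \mathrm{GMM} \to \mathbb{R}^{G\cdot D},\qquad T(\mathrm{GMM}) = \hat{\mu};\qquad \hat{\mu}_{g\cdot D+d} = \frac{\sqrt{w_g}\;\mu_{g,d}^{\mathrm{GMM}}}{\sigma_{g,d}} \qquad (4)$$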
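A short sketch of the mapping T, checking that the approximated score becomes a squared Euclidean distance between supervectors. The names are hypothetical; the weights and variances are assumed to be those of the shared UBM, as the speaker-independence assumption requires.

```python
import numpy as np

def supervector(means, ubm_weights, ubm_variances):
    """Map a GMM (means of shape (G, D)) to a vector of length G*D.

    Entry g*D + d (0-based) is sqrt(w_g) * mu[g, d] / sigma[g, d].
    """
    scaled = np.sqrt(ubm_weights)[:, None] * means / np.sqrt(ubm_variances)
    return scaled.reshape(-1)          # row-major flattening

def approx_neg_cross_entropy(means_p, means_q, ubm_weights, ubm_variances):
    """Weighted squared-distance approximation, omitting the constant C."""
    return float(-np.sum(
        ubm_weights[:, None] * (means_p - means_q) ** 2 / (2.0 * ubm_variances)
    ))
```

With this scaling, the approximation equals minus half the squared distance between `supervector(means_p, ...)` and `supervector(means_q, ...)`, so scoring reduces to ordinary vector geometry.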
![Page 10: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/10.jpg)
Mapping Speech Segments into Segment Space: GMM scoring approximation 4/4
Results
Figure and table taken from: H. Aronowitz, D. Burshtein, “Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)”, in IEEE Trans. on Audio, Speech & Language Processing, September 2007.
![Page 11: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/11.jpg)
Other Mapping Techniques
1. Anchor modeling projection [Sturim 2001]
• efficient but inaccurate
2. MLLR transforms [Stolcke 2005]
• accurate but inefficient
3. Kernel-PCA-based mapping [Aronowitz 2007c]
Given a set of objects and a kernel function (a dot product between each pair of objects), finds a mapping of the objects into R^n which preserves the kernel function.
• accurate & efficient
![Page 13: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/13.jpg)
Intra-Class Variability Modeling [Aronowitz 2005b]: Introduction
The classic GMM algorithm does not explicitly model intra-speaker inter-session variability:
• channel, noise
• language
• stress, emotion, aging
The frame independence assumption does not hold in these cases!

$$\Pr(y_1,\dots,y_T \mid S) = \prod_{t=1}^{T}\Pr(y_t \mid S) \qquad (1)$$

Instead, we can use a more relaxed assumption:

$$\Pr(y_1,\dots,y_T \mid S, f) = \prod_{t=1}^{T}\Pr(y_t \mid S, f) \qquad (2)$$

which leads to:

$$\Pr(y_1,\dots,y_T \mid S) = \int \Pr(y_1,\dots,y_T \mid S, f)\,\Pr(f \mid S)\,df = \int \Pr(f \mid S)\prod_{t=1}^{T}\Pr(y_t \mid S, f)\,df \qquad (3)$$
![Page 14: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/14.jpg)
Old vs. New Generative Models
Old Model: Speaker (a GMM) → frame sequence (frames generated independently)
New Model: Speaker (a PDF over GMM space) → session GMM (a GMM) → frame sequence (frames generated independently)
![Page 15: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/15.jpg)
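Session-GMM Space
[Figure: session-GMM space, where each point is the GMM of one session. Regions are labeled speaker #1, speaker #2, and speaker #3; two points are marked as the GMM for session A of speaker #1 and the GMM for session B of speaker #1.]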
![Page 16: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/16.jpg)
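Modeling in Session-GMM space 1/2
Recall mapping T induced by the GMM approximation analysis:

$$T:\ \mathrm{GMM} \to \mathbb{R}^{G\cdot D},\qquad T(\mathrm{GMM}) = \hat{\mu};\qquad \hat{\mu}_{g\cdot D+d} = \frac{\sqrt{w_g}\;\mu_{g,d}^{\mathrm{GMM}}}{\sigma_{g,d}}$$

• $\hat{\mu}$ is called a supervector
• A speaker is modeled by a multivariate normal distribution in supervector space:

$$\Pr(\hat{\mu} \mid S) \sim \mathcal{N}\!\left(\mu_s,\ \tilde{\Sigma}_{GD}\right) \qquad (3)$$

• A typical dimension of $\tilde{\Sigma}_{GD}$ is 50,000*50,000
• $\tilde{\Sigma}_{GD}$ is estimated robustly using PCA + regularization: the covariance is assumed to be a low rank matrix with an additional non-zero (noise) diagonal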
![Page 17: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/17.jpg)
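Modeling in Session-GMM Space 2/2: Estimating covariance matrix
[Figure: two panels. Supervector space: sessions 1 and 2 of speaker #1, speaker #2, and speaker #3, with covariance $\tilde{\Sigma}_{GD}$. Delta supervector space: the differences between sessions of the same speaker, with covariance $2\tilde{\Sigma}$.]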
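One way to realize "low rank plus noise diagonal" estimation from delta supervectors is sketched below. It is an assumption-laden illustration, not the method from the talk: the noise level is taken as the mean of the discarded eigenvalues (a probabilistic-PCA style choice), the factor 2 reflects that the difference of two independent same-speaker sessions has twice the session covariance, and a dense matrix is built, which is only feasible for small dimensions. The function name is hypothetical.

```python
import numpy as np

def estimate_intra_speaker_cov(deltas, rank):
    """Estimate a (dim, dim) covariance as low rank + isotropic noise.

    deltas: (n, dim) differences between supervectors of two sessions
            of the same speaker. Requires 0 < rank < dim.
    """
    n, dim = deltas.shape
    # A difference of two sessions has covariance 2 * Sigma, hence the 2.
    sample_cov = deltas.T @ deltas / (2.0 * n)
    eigvals, eigvecs = np.linalg.eigh(sample_cov)    # ascending eigenvalues
    top_vals = eigvals[-rank:]
    top_vecs = eigvecs[:, -rank:]
    noise = float(eigvals[:-rank].mean())            # noise floor (assumption)
    low_rank = (top_vecs * (top_vals - noise)) @ top_vecs.T
    return low_rank + noise * np.eye(dim)
```

For realistic supervector dimensions one would keep only `top_vecs`, `top_vals`, and `noise` and never form the full matrix.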
![Page 18: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/18.jpg)
Experimental Setup
Datasets
• $\tilde{\Sigma}_{GD}$ is estimated from the NIST-2006-SRE corpus
• Evaluation is done on the NIST-2004-SRE corpus
System description
• ETSI MFCC (13-cep + 13-delta-cep)
• Energy based voice activity detector
• Feature warping
• 2048 Gaussians
• Target models are adapted from GI-UBM
• ZT-norm score normalization
![Page 19: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/19.jpg)
Results
38% reduction in EER
![Page 20: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/20.jpg)
Other Modeling Techniques
• NAP+SVMs [Campbell 2006]
• Factor Analysis [Kenny 2005]
• Kernel-PCA [Aronowitz 2007c]
Kernel-PCA based algorithm
• Model each supervector as s + u:
s ∈ S: Common speaker subspace
u ∈ U: Speaker unique subspace
• S is spanned by a set of development supervectors (700 speakers)
• U is the orthogonal complement of S in supervector space
• Intra-speaker variability is modeled separately in S and in U
• U was found to be more discriminative than S
• EER was reduced by 44% compared to baseline GMM
![Page 21: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/21.jpg)
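Kernel-PCA Based Modeling
[Figure: sessions x and y in session space are mapped by the kernel-induced function f to f(x) and f(y) in feature space. K-PCA on the anchor sessions defines the common speaker subspace (R^n), in which x and y are represented by T(x) and T(y); the remaining components u_x and u_y lie in the speaker unique subspace.]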
![Page 23: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/23.jpg)
Trainable Speaker Diarization [Aronowitz 2007d]
Goals
• Detect speaker changes – “speaker segmentation”
• Cluster speaker segments – “speaker clustering”
Motivation for new method
Current algorithms do not exploit available training data! (besides tuning thresholds, etc.)
Method
Explicitly model inter-segment intra-speaker variability from labeled training data, and use it in the metric used by change-detection / clustering algorithms.
![Page 24: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/24.jpg)
Speaker recognition on pairs of 3s segments
Dev data
• BNAD05 (5hr) - Arabic, broadcast news
Eval data
• BNAT05 - Arabic, broadcast news (207 target models, 6756 test segments)

| System | EER (%) |
|---|---|
| Anchor modeling (baseline) | 15.1 |
| Anchor modeling - Kernel based scoring | 10.8 |
| Kernel-PCA projection (CSS) | 8.8 |
| Kernel-PCA projection (CSS) + inter-segment variability modeling | 7.4 |
![Page 25: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/25.jpg)
Speaker Diarization System & Experiments
Speaker change detection
• 2 adjacent sliding windows (3s each)
• Speaker verification scoring + normalization
Speaker clustering
• Speaker verification scoring + normalization
• Bottom-up clustering
Speaker Error Rate (SER) on BNAT05
• Anchor modeling (baseline): 12.9%
• Kernel-PCA based method: 7.9%
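The bottom-up clustering step can be illustrated with a generic agglomerative loop. This is only a sketch: plain Euclidean distance between cluster centroids stands in for the normalized speaker verification score used in the talk, and the stopping `threshold` and the function name are hypothetical.

```python
import numpy as np

def bottom_up_cluster(vectors, threshold):
    """Greedy agglomerative clustering of segment vectors.

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than `threshold`.
    Returns a list of clusters, each a list of segment indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    centroids = [np.asarray(v, dtype=float) for v in vectors]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(centroids[a] - centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters[a] = merged
        centroids[a] = np.mean([vectors[i] for i in merged], axis=0)
        del clusters[b], centroids[b]     # b > a, so index a stays valid
    return clusters
```

Swapping the distance for a trained, variability-aware score is the point at which labeled training data would enter the pipeline.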
![Page 27: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/27.jpg)
Summary 1/2
• A method for mapping speech segments into a GMM supervector space was described
• Intra-speaker inter-session variability is modeled in GMM supervector space
Speaker recognition
• EER was reduced by 38% on the NIST-2004 SRE
• A corresponding kernel-PCA based approach reduces EER by 44%
Speaker diarization
• SER for speaker diarization was reduced by 39%.
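As a check on the diarization figure, the relative SER reduction follows from the two SER values reported on BNAT05 (12.9% for the anchor modeling baseline, 7.9% for the kernel-PCA based method):

$$\frac{12.9 - 7.9}{12.9} = \frac{5.0}{12.9} \approx 0.388,$$

that is, roughly 39%.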
![Page 28: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/28.jpg)
Summary 2/2: Algorithms based on the proposed framework
• Speaker recognition [Aronowitz 2005b; Aronowitz 2007c]
• Speaker diarization (“who spoke when”) [Aronowitz 2007d]
• VAD (voice activity detection) [Aronowitz 2007a]
• Language identification [Noor & Aronowitz 2006]
• Gender identification [Bocklet 2008]
• Age detection [Bocklet 2008]
• Channel/bandwidth classification [Aronowitz 2007d]
![Page 29: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/29.jpg)
[1] D. A. Reynolds et al., “Speaker identification and verification using Gaussian mixture speaker models,” Speech Communication, 17, 91-108, 1995.
[2] D.E. Sturim et al., “Speaker indexing in large audio databases using anchor models”, in Proc. ICASSP, 2001.
[3] H. Aronowitz, D. Burshtein, A. Amir, "Speaker indexing in audio archives using test utterance Gaussian mixture modeling", in Proc. ICSLP, 2004.
[4] H. Aronowitz, D. Burshtein, A. Amir, "A session-GMM generative model using test utterance Gaussian mixture modeling for speaker verification", in Proc. ICASSP, 2005.
[5] P. Kenny et al., “Factor Analysis Simplified”, in Proc. ICASSP, 2005.
[6] H. Aronowitz, D. Irony, D. Burshtein, “Modeling Intra-Speaker Variability for Speaker Recognition”, in Proc. Interspeech, 2005.
[7] J. Goldberger and H. Aronowitz, "A distance measure between GMMs based on the unscented transform and its application to speaker recognition", in Proc. Interspeech 2005.
[8] H. Aronowitz, D. Burshtein, "Efficient Speaker Identification and Retrieval", in Proc. Interspeech 2005.
Bibliography 1/2
![Page 30: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/30.jpg)
[9] A. Stolcke et al., “MLLR Transforms as Features in Speaker Recognition”, in Proc. Interspeech, 2005.
[10] E. Noor, H. Aronowitz, "Efficient language Identification using Anchor Models and Support Vector Machines", in Proc. ISCA Odyssey Workshop, 2006.
[11] W.M. Campbell et al., “SVM Based Speaker Verification Using a GMM Supervector Kernel and NAP Variability Compensation”, in Proc. ICASSP 2006.
[12] H. Aronowitz, “Segmental modeling for audio segmentation”, in Proc. ICASSP, 2007.
[13] J.R. Hershey and P. A. Olsen, “Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models”, in Proc. ICASSP 2007.
[14] H. Aronowitz, D. Burshtein, “Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)”, in IEEE Trans. on Audio, Speech & Language Processing, September 2007.
[15] H. Aronowitz, “Speaker Recognition using Kernel-PCA and Intersession Variability Modeling”, in Proc. Interspeech, 2007.
[16] H. Aronowitz, “Trainable Speaker Diarization”, in Proc. Interspeech, 2007.
[17] T. Bocklet et al., “Age and Gender Recognition for Telephone Applications Based on GMM Supervectors and Support Vector Machines”, in Proc. ICASSP, 2008.
Bibliography 2/2
![Page 31: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/31.jpg)
Presentation is available online at: http://aronowitzh.googlepages.com/
Thanks!
![Page 32: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/32.jpg)
Backup slides
![Page 33: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/33.jpg)
Kernel-PCA Based Mapping 2/5
Goals:
- Map sessions into feature space
- Model in feature space
[Figure: the function f(), realized through the kernel trick, maps sessions x and y and the anchor sessions from session space to f(x) and f(y) in the dot-product feature space.]
![Page 34: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/34.jpg)
Kernel-PCA Based Mapping 3/5
Given
- kernel K
- n anchor sessions $A_1,\dots,A_n$
Find an orthonormal basis for $\mathrm{span}\{f(A_1),\dots,f(A_n)\}$
Method
1) Compute eigenvectors of the centralized kernel-matrix $k_{i,j} = K(A_i, A_j)$.
2) Normalize eigenvectors by square-roots of corresponding eigenvalues → $\{v_i\}$
3) $\{f_i\}$ for $i = 1,\dots,n$ is the requested basis, where $f_i = \big(f(A_1),\dots,f(A_n)\big)\,v_i$
![Page 35: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/35.jpg)
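Kernel-PCA Based Mapping 4/5
Common speaker subspace: $C = \mathrm{span}\{f(A_1),\dots,f(A_n)\}$
Speaker unique subspace: $U = F \,/\, \mathrm{span}\{f(A_1),\dots,f(A_n)\}$
Given sessions x, y, $f(x)$ and $f(y)$ may be uniquely represented as:

$$f(x) = c_x + u_x \quad\text{and}\quad f(y) = c_y + u_y, \qquad\text{with } c_x, c_y \in C \text{ and } u_x, u_y \in U$$

$$T(x) = \begin{pmatrix} v_1^{\top} \\ \vdots \\ v_n^{\top} \end{pmatrix}\begin{pmatrix} K(x, A_1) \\ \vdots \\ K(x, A_n) \end{pmatrix}$$

is a mapping x→R^n with the property:

$$\lVert T(x) - T(y)\rVert^{2} = \lVert c_x - c_y\rVert^{2}$$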
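The mapping and its distance-preserving property can be sketched as follows. This is a hedged illustration with hypothetical names (`fit_kpca`, `project`): the kernel matrix is centered, so the subspace is spanned by the centered anchor images, and directions with numerically zero eigenvalue are dropped, which makes the output dimension at most n-1 rather than n.

```python
import numpy as np

def fit_kpca(anchors, kernel, tol=1e-10):
    """Kernel PCA on anchor sessions; returns what `project` needs."""
    n = len(anchors)
    K = np.array([[kernel(a, b) for b in anchors] for a in anchors])
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    eigvals, eigvecs = np.linalg.eigh(H @ K @ H)
    keep = eigvals > tol * eigvals.max()         # drop null directions
    # Scale eigenvectors by 1/sqrt(eigenvalue): orthonormal basis in feature space
    V = eigvecs[:, keep] / np.sqrt(eigvals[keep])
    return {"anchors": anchors, "kernel": kernel, "K": K, "H": H, "V": V}

def project(model, x):
    """Coordinates of session x in the subspace spanned by the anchors."""
    k_x = np.array([model["kernel"](x, a) for a in model["anchors"]])
    # Center the kernel vector consistently with the centered kernel matrix
    k_c = model["H"] @ (k_x - model["K"].mean(axis=1))
    return model["V"].T @ k_c
```

With a linear kernel the feature map is the identity, so the property can be verified directly: the squared distance between two projected sessions equals the squared length of their difference after orthogonal projection onto the span of the centered anchors.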
![Page 36: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/36.jpg)
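Kernel-PCA Based Mapping 5/5
[Figure: sessions x and y in session space are mapped to f(x) and f(y) in feature space. K-PCA on the anchor sessions defines the common speaker subspace (R^n), in which x and y are represented by T(x) and T(y); the remaining components u_x and u_y lie in the speaker unique subspace.]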
![Page 37: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/37.jpg)
Modeling in Segment-GMM Supervector Space
[Figure: the frame sequences of segment #1, segment #2, ..., segment #n are each mapped to a point in segment-GMM supervector space, which contains regions labeled music, speech, and silence.]
![Page 38: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/38.jpg)
Segmental Modeling for Audio Segmentation
Goal
• Segment audio accurately and robustly into speech / silence / music segments.
Novel idea
• Acoustic modeling is usually done on a frame-basis.
• Segmentation/classification is usually done on a segment-basis (using smoothing).
Why not explicitly model whole segments?
Note: speaker, noise, music-context, channel (etc.) are constant during a segment.
![Page 39: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/39.jpg)
Speech / Silence Segmentation – Results 1/2
[Figure: SPEECH / SILENCE SEGMENTATION. Speech miss probability vs. silence miss probability on logarithmic axes, with curves for IBM EVAL06, IBM EVAL06 no-pad, GMM baseline, and Segmental.]

| System | EER | FA @ FR=0.5% | FR @ FA=1% |
|---|---|---|---|
| GMM baseline | 2.9% | 7.9% | 29.6% |
| Segmental | 1.7% | 5.1% | 2.7% |
| Error reduction | 41% | 35% | 91% |

EVAL06: FA=24.2% @ FR=0.25%
![Page 40: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/40.jpg)
Speech / Music Segmentation – Results 2/2
[Figure: SPEECH / MUSIC SEGMENTATION. Speech miss probability vs. music miss probability on logarithmic axes, with curves for IBM EVAL06, IBM EVAL06 no-pad, GMM baseline, and Segmental.]

| System | EER | FA @ FR=0.5% | FR @ FA=1% |
|---|---|---|---|
| GMM baseline | 1.43% | 3.4% | 3.2% |
| Segmental | 1.27% | 2.0% | 1.9% |
| Error reduction | 11% | 41% | 41% |

EVAL06: FA=69% @ FR=0.25%
![Page 41: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/41.jpg)
LID in Session Space
[Figure: training sessions and test sessions plotted in session space, grouped by language: English, Arabic, French.]
![Page 42: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/42.jpg)
LID in Session Space - Algorithm
1. Front end: shifted delta cepstrum (SDC).
2. Represent every train/test session by a GMM super-vector.
3. Train a linear SVM to classify GMM super-vectors.
Results
• EER=4.1% on the NIST-03 Eval (30sec sessions).
![Page 43: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/43.jpg)
Anchor Modeling Projection
Given: anchor models λ1,…,λn and session X = x1,…,xF
Projection:

$$X \mapsto \big(\hat{s}_1(X),\dots,\hat{s}_n(X)\big), \qquad \hat{s}_i(X) = \frac{1}{F}\log\frac{\Pr(X \mid \lambda_i)}{\Pr(X \mid \mathrm{UBM})}$$

where $\hat{s}_i(X)$ is the average normalized log-likelihood.
• Speaker indexing [Sturim et al., 2001]
• Intersession variability modeling in projected space [Collet et al., 2005]
• Speaker clustering [Reynolds et al., 2004]
• Speaker segmentation [Collet et al., 2006]
• Language identification [Noor and Aronowitz, 2006]
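The anchor projection can be sketched directly from its definition. The code below is a minimal illustration under stated assumptions: GMMs are diagonal-covariance and passed as hypothetical `(weights, means, variances)` tuples, and frames are scored independently.

```python
import numpy as np

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of frames (F, D) under a diagonal GMM."""
    weights, means, variances = gmm
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :]
        + diff ** 2 / variances[None, :, :],
        axis=2,
    )
    per_frame = np.logaddexp.reduce(np.log(weights)[None, :] + log_gauss, axis=1)
    return float(per_frame.mean())

def anchor_projection(frames, anchor_gmms, ubm):
    """Map a session to one UBM-normalized average log-likelihood per anchor."""
    base = avg_log_likelihood(frames, ubm)
    return np.array([avg_log_likelihood(frames, g) - base for g in anchor_gmms])
```

Each session becomes a fixed-length vector of n scores, regardless of its duration, which is what makes the representation usable for indexing.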
![Page 44: Intra-Class Variability Modeling for Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062321/568136b3550346895d9e5865/html5/thumbnails/44.jpg)
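Intra-Class Variability Modeling: Introduction
The classic GMM algorithm does not explicitly model intra-speaker inter-session variability:
• Noise
• Channel
• Language
• Changing speaker characteristics – stress, emotion, aging
The frame independence assumption does not hold in these cases!

$$\Pr(y_1,\dots,y_T \mid S) = \prod_{t=1}^{T}\Pr(y_t \mid S) \qquad (1)$$

Instead, we get:

$$\Pr(y_1,\dots,y_T \mid S) = \int \Pr(y_1,\dots,y_T \mid S, f)\,\Pr(f \mid S)\,df = \int \Pr(f \mid S)\prod_{t=1}^{T}\Pr(y_t \mid S, f)\,df \qquad (2)$$

Term labels on the slide: $\Pr(y_t \mid G_{S,f})$ and $\Pr(G_{S,f} \mid S)$.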