robust speech feature
![Page 1: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/1.jpg)
Robust Speech Feature
Decorrelated and Liftered Filter-Bank Energies
(DLFBE)
Proposed by K.K. Paliwal, in EuroSpeech 99
![Page 2: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/2.jpg)
DLFBE --- Preliminary
* MFCC is very successful in speech recognition
* MFCCs are computed from the speech signal in the following three steps:
  1. Compute the FFT power spectrum of the speech signal
  2. Apply a Mel-spaced filter bank to the power spectrum to get N energies (N = 20~60)
  3. Compute the discrete cosine transform (DCT) of the log filter-bank energies to get uncorrelated MFCCs (M = 10)
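The three steps above can be sketched for a single frame as follows. This is a minimal illustration, not the exact front end used in the paper: the Mel formula `2595 * log10(1 + f / 700)`, the Hamming window, the triangular filter shape, and the defaults (`n_filters=24`, `n_fft=512`) are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale approximation (assumed; the slides do not give one)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced uniformly on the Mel scale."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(n_filters):
        lo, mid, hi = hz_edges[j], hz_edges[j + 1], hz_edges[j + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[j] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc_frame(frame, sample_rate, n_filters=24, n_ceps=10, n_fft=512):
    # Step 1: FFT power spectrum of the windowed frame
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    # Step 2: N Mel filter-bank energies
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_e = np.log(energies + 1e-10)  # small floor avoids log(0)
    # Step 3: DCT-II of the log energies, keeping coefficients 1..M
    n = np.arange(n_filters)
    k = np.arange(1, n_ceps + 1)[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return basis @ log_e

frame = np.sin(2 * np.pi * 440.0 * np.arange(400) / 16000.0)
ceps = mfcc_frame(frame, 16000)
```

Here `ceps` holds M = 10 cepstral coefficients computed from N = 24 filter-bank energies, matching the ranges quoted on the slide.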
![Page 3: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/3.jpg)
DLFBE --- Motivation
* MFCC has two drawbacks:
  1. It does not have any physical interpretation
  2. Liftering of cepstral coefficients has no effect in modern speech recognition (discussed later)
* The two problems (i.e., number and correlation) of the FBEs used in the 1950s, 60s, and 70s can be solved today
![Page 4: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/4.jpg)
Liftering --- What and How
* Liftering is the reweighting of cepstral coefficients, used in the DTW framework of speech recognition
* The distance measure is the Euclidean distance:

$$d(x; x') = (x - x')^t (x - x') = \sum_{i=1}^{D} (x_i - x'_i)^2$$

where $d(x, x')$ is the dissimilarity between the test vector $x$ and the mean vector $x'$.
![Page 5: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/5.jpg)
Liftering --- What and How (cont’d)
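$$y_i = w_i x_i$$

where $x_i$ is the $i$-th cepstral coefficient, $y_i$ is the corresponding liftered coefficient, and $w_i$ is the lifter. So

$$y = \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_D \end{bmatrix} x$$

More general form:

$$y = \begin{bmatrix} a & b & c & d & \cdots \\ e & f & g & h & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} x$$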
![Page 6: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/6.jpg)
Liftering --- What and How (cont’d)
$$d(y, y') = (y - y')^t (y - y') = \sum_{i=1}^{D} (y_i - y'_i)^2 = \sum_{i=1}^{D} [w_i (x_i - x'_i)]^2$$
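As a small worked example of the liftered distance, the sketch below weights each coefficient difference by $w_i$ before squaring. The vectors and the choice of a linear lifter are made up for illustration.

```python
import numpy as np

def liftered_distance(x, x_ref, w):
    """Squared Euclidean distance between the liftered vectors w*x and w*x_ref."""
    diff = w * (x - x_ref)
    return float(np.sum(diff ** 2))

x = np.array([1.0, 2.0, 3.0])       # test vector (illustrative values)
x_ref = np.array([0.5, 2.5, 2.0])   # mean vector (illustrative values)
w = np.array([1.0, 2.0, 3.0])       # linear lifter w_i = i

d = liftered_distance(x, x_ref, w)
# Same value computed by liftering first, then taking the plain distance
d_check = float(np.sum((w * x - w * x_ref) ** 2))
```

Both routes give the same number, which is the point of the last equality on the slide: liftering the vectors and weighting the differences are interchangeable under the Euclidean distance.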
![Page 7: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/7.jpg)
Liftering --- What and How (cont’d)
The types of lifters are listed below:
1. Linear lifter: $w_i = i$
2. Statistical lifter: $w_i = 1/\sigma_i$
3. Sinusoidal lifter: $w_i = 1 + \frac{D}{2} \sin(\pi i / D)$
4. Exponential lifter: $w_i = i^s \exp(-i^2 / (2\tau^2))$, with $s = 1.5$, $\tau = 5$
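The four lifters can be written as short functions. This sketch assumes the definitions $w_i = i$, $w_i = 1/\sigma_i$, $w_i = 1 + (D/2)\sin(\pi i/D)$, and $w_i = i^s \exp(-i^2/(2\tau^2))$ with $s = 1.5$, $\tau = 5$; the function names are hypothetical, and `sigma` (the per-coefficient standard deviations) is assumed to come from training data.

```python
import numpy as np

def linear_lifter(D):
    """w_i = i for i = 1..D."""
    return np.arange(1, D + 1, dtype=float)

def statistical_lifter(sigma):
    """w_i = 1 / sigma_i, with sigma_i estimated elsewhere (assumed given)."""
    return 1.0 / np.asarray(sigma, dtype=float)

def sinusoidal_lifter(D):
    """w_i = 1 + (D/2) * sin(pi * i / D)."""
    i = np.arange(1, D + 1)
    return 1.0 + (D / 2.0) * np.sin(np.pi * i / D)

def exponential_lifter(D, s=1.5, tau=5.0):
    """w_i = i**s * exp(-i**2 / (2 * tau**2))."""
    i = np.arange(1, D + 1, dtype=float)
    return i ** s * np.exp(-(i ** 2) / (2.0 * tau ** 2))

w_lin = linear_lifter(12)
w_sin = sinusoidal_lifter(12)
w_exp = exponential_lifter(12)
```

The linear lifter grows without bound, while the sinusoidal and exponential lifters rise and then fall, which is why the latter two de-emphasize both the lowest and the highest coefficients.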
![Page 8: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/8.jpg)
Liftering --- Discussion and Why
* Multiplicative weighting in the cepstral domain is equivalent to convolution in the spectral domain

$$w_n \cdot c_n \;\overset{\text{IFFT}}{\longleftrightarrow}\; W_k * C_k$$

| Lifter | Spectral domain | Cepstral domain |
|---|---|---|
| Type 1 and 2 | HP filter | Emphasize the higher cepstral coefficients |
| Type 3 and 4 | BP filter | Lessen the higher and lower cepstral coefficients |
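The multiplication/convolution duality can be checked numerically with the DFT. The sketch below uses random stand-ins for the cepstral sequence and the lifter (not real speech data) and the standard DFT property that the transform of a product equals the circular convolution of the transforms divided by the length.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
c = rng.standard_normal(N)   # stand-in for a cepstral sequence c_n
w = rng.standard_normal(N)   # stand-in for a lifter w_n

C = np.fft.fft(c)
W = np.fft.fft(w)

# Circular convolution of the two spectra, scaled by 1/N
m = np.arange(N)
conv = np.array([np.sum(C * W[(k - m) % N]) for k in range(N)]) / N

# Transform of the element-wise product
product_spectrum = np.fft.fft(c * w)
```

`product_spectrum` and `conv` agree to numerical precision, so reweighting the cepstrum acts as a filter on the (log) spectrum.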
![Page 9: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/9.jpg)
Liftering --- Experiment on DTW
![Page 10: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/10.jpg)
Liftering on CDHMM (??) --- Why
* CDHMM uses a Mahalanobis distance measure, due to the output observation probability:

$$d(x; x', \Sigma_{x'}) = (x - x')^t \Sigma_{x'}^{-1} (x - x')$$
![Page 11: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/11.jpg)
Liftering on CDHMM (??) --- Why
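$$y = Wx, \quad y' = Wx', \quad \Sigma_{y'} = W \Sigma_{x'} W^t$$

where $W$ is the liftering matrix for MFCC:

$$W = \begin{bmatrix} w_1 & 0 & 0 & \cdots & 0 \\ 0 & w_2 & 0 & \cdots & 0 \\ 0 & 0 & w_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & w_D \end{bmatrix}_{D \times D}$$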
![Page 12: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/12.jpg)
Liftering on CDHMM (??) --- Why
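$$\begin{aligned} d(y; y', \Sigma_{y'}) &= (y - y')^t \Sigma_{y'}^{-1} (y - y') \\ &= (Wx - Wx')^t (W \Sigma_{x'} W^t)^{-1} (Wx - Wx') \\ &= (x - x')^t W^t (W^t)^{-1} \Sigma_{x'}^{-1} W^{-1} W (x - x') \\ &= (x - x')^t \Sigma_{x'}^{-1} (x - x') \\ &= d(x; x', \Sigma_{x'}) \end{aligned}$$

Thus, cepstral liftering has no effect in the recognition process when used with continuous observation Gaussian density HMMs.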
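The invariance can be confirmed numerically. The sketch below builds a random symmetric positive-definite covariance and a linear lifter matrix (both chosen only for illustration) and compares the Mahalanobis distance before and after liftering.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """(x - mean)^t cov^{-1} (x - mean), using a linear solve instead of an inverse."""
    diff = x - mean
    return float(diff @ np.linalg.solve(cov, diff))

rng = np.random.default_rng(0)
D = 5
x = rng.standard_normal(D)
mean = rng.standard_normal(D)
A = rng.standard_normal((D, D))
cov_x = A @ A.T + D * np.eye(D)                  # symmetric positive definite

W = np.diag(np.arange(1, D + 1, dtype=float))    # linear lifter as a matrix

d_x = mahalanobis_sq(x, mean, cov_x)
d_y = mahalanobis_sq(W @ x, W @ mean, W @ cov_x @ W.T)
```

`d_x` and `d_y` match to rounding error for any invertible `W`, not only a diagonal one, because the covariance is transformed together with the vectors.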
![Page 13: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/13.jpg)
Decorrelation of FBE --- Why/How
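* FBEs are correlated => we can't use CDHMM
* We can use LP techniques to remove this defect: the FBE sequence $\{e_n\}$, $n = 0, 1, \ldots, N-1$, is filtered with

$$A(z) = 1 - P(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$$

* $\{a_i\}$ can be obtained by the covariance method of LP analysis
* The output has $M$ values, with $N = M + p$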
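A minimal sketch of this decorrelation step, assuming the predictor $P(z) = \sum_{i=1}^{p} a_i z^{-i}$ runs along the filter-bank index and the coefficients are fit by least squares over the valid samples (the covariance-method idea). The function names and the synthetic data are hypothetical; the paper's exact estimation procedure may differ.

```python
import numpy as np

def estimate_lp_coeffs(fbe_vectors, p):
    """Least-squares estimate of a_1..a_p from a list of FBE vectors."""
    rows, targets = [], []
    for e in fbe_vectors:
        for n in range(p, len(e)):
            rows.append(e[n - p:n][::-1])    # e[n-1], ..., e[n-p]
            targets.append(e[n])
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return a

def decorrelate(e, a):
    """Apply A(z) = 1 - sum_i a_i z^-i and keep the N - p valid outputs."""
    p = len(a)
    return np.array([e[n] - a @ e[n - p:n][::-1] for n in range(p, len(e))])

# Synthetic, perfectly predictable sequences: e_n = 0.9 * e_{n-1}
base = 0.9 ** np.arange(20)
a_hat = estimate_lp_coeffs([base, 2.0 * base], p=1)
residual = decorrelate(base, a_hat)
```

With N = 20 inputs and p = 1, `residual` has M = 19 values, and for this synthetic input it is numerically zero because the sequence is exactly first-order predictable.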
![Page 14: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/14.jpg)
Liftering of FBE --- How
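* The FBE sequence $\{e_n\}$, $n = 0, 1, \ldots, N-1$, is passed through an FIR filter

$$H(z) = \sum_{i=0}^{L} h_i z^{-i}$$

* The output has $M$ values, with $N = M + L$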
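FIR liftering of the FBE sequence amounts to a short convolution along the filter-bank index, keeping only the outputs that do not run off the ends. In this sketch the filter taps `[1, -1]` and `[1, 0, -1]` are illustrative choices, and `fir_lifter` is a hypothetical helper name.

```python
import numpy as np

def fir_lifter(e, h):
    """Filter e with H(z) = sum_i h_i z^-i; returns the N - L valid outputs."""
    return np.convolve(e, h, mode="valid")

e = np.arange(6.0)                           # toy FBE vector, N = 6
first_diff = fir_lifter(e, [1.0, -1.0])      # L = 1, so M = 5
second_diff = fir_lifter(e, [1.0, 0.0, -1.0])  # L = 2, so M = 4
```

The output lengths follow N = M + L: a filter of order L consumes L input samples before it produces its first valid output.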
![Page 15: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/15.jpg)
DLFBE --- Experiment
* Speaker-independent (SI), isolated-word recognition using the ISOLET spoken letter database
* 90 training utterances from 90 speakers (45 females, 45 males)
* 30 testing utterances from 30 speakers (15 females, 15 males)
![Page 16: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/16.jpg)
DLFBE --- Experiment (cont’d)
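| $P(z)$ | $H(z)$ |
|---|---|
| no | no |
| $a_1 z^{-1}$ | no |
| $a_1 z^{-1} + a_2 z^{-2}$ | no |
| no | $1 - 0.5 z^{-1}$ |
| no | $1 - 0.75 z^{-1}$ |
| no | $1 - z^{-1}$ |
| no | $1 - z^{-2}$ |
| $a_1 z^{-1} + a_2 z^{-2}$ | $1 - z^{-1}$ |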
![Page 17: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/17.jpg)
DLFBE --- Experiment (cont’d)
![Page 18: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/18.jpg)
Robust Speech Feature
Noise-Invariant Representation for Speech Signal
Group Delay Function (GDF) Method
Proposed by Bayya & Yegnanarayana
in EuroSpeech ‘99
![Page 19: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/19.jpg)
GDF --- Motivation
* Background noise is a prominent source of mismatch; it is roughly eliminated by the following methods
1. Compensation
   - Pre-processing: SS (spectral subtraction), HP, BP, FN (feature normalization)
   - Model adaptation: parameter transform
   - These cause over-estimation and under-estimation side effects
![Page 20: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/20.jpg)
GDF --- Motivation (cont’d)
2. New features
   - LPC, MEL, PLP (projection concept)
   - Not completely noise resistant
* All of the above use power/amplitude as the speech feature
* Why don't we use phase information as features? Phase information may be helpful in speech recognition.
![Page 21: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/21.jpg)
GDF --- What/How
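* The GDF is derived from the normalized autocorrelation of a short segment of a signal

$$\log\Big(1 + \sum_{n=1}^{\infty} r(n) e^{-j\omega n}\Big) = \log R(\cdot) = \log\big(|R(\cdot)|\, e^{j \arg(R(\cdot))}\big) = \log|R(\cdot)| + j \arg(R(\cdot)) \qquad (\#.1)$$

where $r(n)$ is the normalized autocorrelation of a short segment of a signal.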
![Page 22: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/22.jpg)
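GDF --- What/How (cont’d)

$$\log\Big(1 + \sum_{n=1}^{\infty} r(n) e^{-j\omega n}\Big) \approx \sum_{n=1}^{\infty} r(n) e^{-j\omega n} = \sum_{n=1}^{\infty} r(n) \cos(n\omega) - j \sum_{n=1}^{\infty} r(n) \sin(n\omega) \qquad (\#.2)$$

Compare (#.1) & (#.2):

$$\arg(R(\cdot)) = -\sum_{n=1}^{\infty} r(n) \sin(n\omega), \qquad r(n) = 0, \; n < 0$$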
![Page 23: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/23.jpg)
GDF --- What/How (cont’d)
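$$\mathrm{GDF}(\omega) = -\arg'(R(\cdot)) = \sum_{n=1}^{\infty} n\, r(n) \cos(n\omega)$$

Truncated version of GDF ($p = 10 \sim 30$), easy to implement:

$$\mathrm{GDF}(\omega) = \sum_{n=1}^{p} \{ n\, r(n) [\cos(n\omega)] \}\, w(n)$$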
![Page 24: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/24.jpg)
GDF --- What/How (cont’d)
where $w(n) = 0.5 + 0.5 \cos(2\pi n / P)$, $1 \le n \le p$, is a Hanning window.
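The truncated GDF can be sketched in a few lines. This example assumes the form $\sum_{n=1}^{p} n\, r(n)\, w(n) \cos(n\omega)$, a raised-cosine lag window with period $P = 2p$ (so it tapers from near 1 at small lags to 0 at lag $p$), and a biased autocorrelation estimate; these choices, the function names, and the test tone are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def normalized_autocorr(frame, p):
    """r(n) = R(n) / R(0) for lags n = 1..p (biased estimate)."""
    full = np.correlate(frame, frame, mode="full")
    mid = len(frame) - 1                      # index of lag 0
    return full[mid + 1: mid + 1 + p] / full[mid]

def truncated_gdf(r, omegas):
    """Sum over n = 1..p of n * r(n) * w(n) * cos(n * omega)."""
    p = len(r)
    n = np.arange(1, p + 1)
    w = 0.5 + 0.5 * np.cos(np.pi * n / p)     # assumed window, P = 2p
    return np.cos(np.outer(omegas, n)) @ (n * r * w)

sample_rate = 8000.0
omega0 = 2.0 * np.pi * 1000.0 / sample_rate   # 1 kHz test tone
frame = np.sin(omega0 * np.arange(400))

r = normalized_autocorr(frame, p=20)
omegas = np.linspace(0.0, np.pi, 256)
gdf = truncated_gdf(r, omegas)
peak_omega = omegas[np.argmax(gdf)]
```

For a pure tone, `peak_omega` lands close to `omega0`, illustrating how the GDF highlights spectral peaks using only a handful of autocorrelation lags.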
![Page 25: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/25.jpg)
GDF --- Why & Experiment
* Frame length = 5 ms, frame shift = 1 ms; the modified autocorrelation sequence is averaged over 20 frames, then the GDF is computed as defined above
![Page 26: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/26.jpg)
GDF --- Why & Experiment (cont’d)
![Page 27: Robust Speech Feature](https://reader036.vdocument.in/reader036/viewer/2022081505/56815841550346895dc59980/html5/thumbnails/27.jpg)
GDF --- Experiment
* Isolated-digit recognition

|    | Clean | Noisy |     |
|----|-------|-------|-----|
| SI | 97%   | 95%   | YES |
| SD | 96.5% | 94.5% | NO  |

Due to large dynamic range?