2007-02-08 1
Higher Order Cepstral Moment Normalization
(HOCMN) for Robust Speech Recognition
Speaker: Chang-wen Hsu
Advisor: Lin-shan Lee
2007/02/08
2007-02-08 2
Outline
• Introduction
• CMS/CMVN/HEQ
• Higher Order Cepstral Moment Normalization (HOCMN)
  • Even order HOCMN
  • Odd order HOCMN
  • Cascade system
  • Fundamental principles
• Experimental Results
• Conclusions
2007-02-08 3
Introduction
Feature normalization in the cepstral domain is widely used in robust speech recognition:
• CMS: normalizes the first moment
• CMVN: normalizes the first and second moments
• Cepstrum Third-order Normalization (CTN): normalizes the first three moments (Electronics Letters, 1999)
• HEQ: normalizes the full distribution (moments of all orders)
How about normalizing only a few higher order moments?
• Higher order moments are dominated more strongly by samples with larger values
• Normalizing only a few higher order moments may be good enough while avoiding over-normalization
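The remark that higher order moments are dominated by larger samples can be checked with a small sketch. The helper name `top_share` and the sample values are illustrative assumptions, not data from the talk.

```python
def top_share(samples, order):
    """Fraction of sum(|x|^order) contributed by the largest-magnitude sample."""
    powers = [abs(x) ** order for x in samples]
    return max(powers) / sum(powers)


# Illustrative values only (not real MFCC data): one sample is clearly larger.
values = [0.5, -1.0, 1.0, -0.5, 3.0]

for order in (2, 4, 8):
    # The share of the largest sample grows as the moment order increases.
    print(order, top_share(values, order))
```

As the order grows, the single largest sample accounts for almost the whole moment estimate, which is why a few high order moments mainly constrain the tails of the distribution.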
2007-02-08 4
Introduction: Cepstral Normalization
• CMS: $X_{CMS}(n) = X(n) - E[X(n)]$
• CMVN: $X_{CMVN}(n) = \dfrac{X(n) - E[X(n)]}{\sigma_X}$
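A minimal sketch of CMS and CMVN applied to one cepstral coefficient track, assuming the expectation is estimated as the sample average over the whole utterance. The function names `cms` and `cmvn` are chosen here for illustration, and the track is assumed not to be constant.

```python
import math


def cms(track):
    """Cepstral mean subtraction: remove the sample mean of the track."""
    mean = sum(track) / len(track)
    return [value - mean for value in track]


def cmvn(track):
    """Cepstral mean and variance normalization: zero mean, unit variance."""
    centered = cms(track)
    # Population standard deviation of the centered track (assumed non-zero).
    std = math.sqrt(sum(v * v for v in centered) / len(centered))
    return [v / std for v in centered]


normalized = cmvn([1.0, 2.0, 3.0, 4.0])
```

In practice the same operation would be applied independently to each of the coefficients being normalized.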
2007-02-08 5
Introduction: Histogram Equalization
2007-02-08 6
Higher Order Cepstral Moment Normalization
If the distribution of the cepstral coefficients can be assumed to be quasi-Gaussian:
• Odd order moments can be normalized to zero
• Even order moments can be normalized to some specific values
Define notation:
• X(n): a certain cepstral coefficient of the n-th frame
• X[k](n): X(n) with the k-th moment normalized
• X[k,l](n): X(n) with both the k-th and l-th moments normalized
• X[k,l,m](n): X(n) with the k-th, l-th and m-th moments normalized
• HOCMN[k,l,m]: an operator normalizing the k-th, l-th and m-th moments
For example
2007-02-08 7
Cepstral Moment Normalization
Moment estimation: time average of MFCC parameters
Purpose:
• For odd order L: $E[X_{[L]}^{L}(n)] = 0$
• For even order N
2007-02-08 8
Even order HOCMN
Only the moment for a single even order N can be normalized, and CMS can always be performed in advance
Therefore, the new feature coefficients can be expressed as
Let the N-th moment of the new feature coefficients be set to a desired value, that is
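The expressions on this slide are not preserved in the transcript. The sketch below shows one way the idea could work, under two assumptions that are not confirmed by the source: the new feature is the CMS output multiplied by a single scale factor, and the desired N-th moment defaults to that of a standard Gaussian, (N-1)!!. With N=2 this reduces to CMVN, which agrees with the statement CMVN=HOCMN[1,2]. The names `gaussian_even_moment` and `hocmn_even` are hypothetical.

```python
def gaussian_even_moment(order):
    """N-th moment of N(0,1) for even N: (N-1)!! = 1 * 3 * ... * (N-1)."""
    result = 1
    for k in range(1, order, 2):
        result *= k
    return result


def hocmn_even(track, order, target=None):
    """Sketch of HOCMN[1,N]: CMS first, then scale to fix the N-th moment.

    Assumption: a single scale factor is applied to the CMS output, and
    the default target is the standard Gaussian moment.
    """
    if order < 2 or order % 2 != 0:
        raise ValueError("order must be a positive even integer")
    mean = sum(track) / len(track)
    centered = [v - mean for v in track]
    # Sample estimate of the N-th moment of the centered track.
    moment = sum(v ** order for v in centered) / len(centered)
    if target is None:
        target = gaussian_even_moment(order)
    scale = (target / moment) ** (1.0 / order)
    return [scale * v for v in centered]


features = hocmn_even([1.0, 2.0, 4.0, 7.0], order=4)
```

Because scaling a zero-mean track leaves its mean at zero, the first moment stays normalized while the N-th moment is moved to the target value.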
2007-02-08 9
Even order HOCMN
Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A, B, C)
CMVN = HOCMN[1,2]
2007-02-08 10
Even order HOCMN
Evaluation of the expectation value for the moments: sample average over a reference interval
• Full utterance
• Moving window of l frames
[Diagram: frames …… X(n-3) X(n-2) X(n-1) X(n) X(n+1) X(n+2) X(n+3) ……, with a window of l frames around the frame to be normalized]
[Figure: word accuracy (Acc.) versus window length l, for l from 60 to 120, curve labelled [1,100]]
l=86 is best
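A minimal sketch of estimating the expectation over a moving window of l frames, shown here for the first moment only. It assumes the window is roughly centred on the frame being normalized and is truncated at the utterance boundaries; the exact window placement used in the talk is not stated in the transcript. The name `windowed_cms` is hypothetical.

```python
def windowed_cms(track, window):
    """Subtract from each frame the mean of a window of frames around it."""
    half = window // 2
    output = []
    for n in range(len(track)):
        # Window of `window` frames starting `half` frames before frame n,
        # clipped to the valid index range at the utterance edges.
        start = max(0, n - half)
        stop = min(len(track), n - half + window)
        segment = track[start:stop]
        output.append(track[n] - sum(segment) / len(segment))
    return output


result = windowed_cms([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

The same windowed average could be substituted for the full-utterance average when estimating higher order moments.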
2007-02-08 11
Experimental results
Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A, B, C)
[Figure labels: CMVN (l=86), CMVN (full-utterance)]
2007-02-08 12
Odd order HOCMN (1/3)
Besides the first moment (CMS), only one additional moment of odd order L can be normalized
The L-th order HOCMN can be obtained from the (L-1)-th order HOCMN (L-1 is an even number, as discussed previously)
Then, the new feature coefficients can be expressed as
“a” and “c” are to be solved
2007-02-08 13
Odd order HOCMN (2/3)
To solve “a” and “c”:
• The first moment is set to zero
• The L-th moment is set to zero
After some mathematics and approximation
2007-02-08 14
Odd order HOCMN (3/3)
Because the formula for “a” above is only an approximation, a recursive solution can be obtained in about two iterations
2007-02-08 15
Cascade system
Cascading an odd order operator HOCMN[1,L] (L is an odd number) and an even order operator HOCMN[1,N] (N is an even number) yields an operator HOCMN[1,L,N]
2007-02-08 16
Experimental results
Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A, B, C)
[Figure labels: CN, CN (l=86), CMVN, CMVN (l=86), CTN=HOCMN[1,2,3]]
2007-02-08 17
Skewness and Kurtosis
Skewness
• Third moment about the mean, normalized by the standard deviation
• Measures the departure of the pdf from symmetry
  • Positive/negative indicates skew to the right/left
  • Zero for a symmetric pdf
Kurtosis
• Fourth moment about the mean, normalized by the standard deviation
• Peaked or “flat with tails of large size” as compared to the standard Gaussian
  • “3” is the fourth moment of N(0,1)
  • Positive/negative indicates flatter/more peaked
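The two statistics can be computed from sample moments as in the sketch below, which uses the standard textbook definitions: the third central moment divided by the cube of the standard deviation, and the fourth central moment divided by the fourth power of the standard deviation, minus 3. The function names are chosen here for illustration.

```python
def central_moment(samples, order):
    """Sample central moment of the given order (population form)."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** order for x in samples) / len(samples)


def skewness(samples):
    """Third central moment normalized by the standard deviation cubed."""
    return central_moment(samples, 3) / central_moment(samples, 2) ** 1.5


def excess_kurtosis(samples):
    """Fourth central moment over variance squared, minus 3 (Gaussian value)."""
    return central_moment(samples, 4) / central_moment(samples, 2) ** 2 - 3.0


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [0.0, 0.0, 0.0, 10.0]
```

For the symmetric sample the skewness is zero, while the sample with one large positive value has positive skewness.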
2007-02-08 18
Skewness and Kurtosis
The 1st moment is always normalized
Define: generalized skewness of odd order L
$S_L = E(X^L)$, L: an odd integer
• L is not necessarily 3
• Similar meaning to skewness (skew to right or left), except in the sense of the L-th moment
Define: generalized kurtosis of even order N
• N is not necessarily 4
• Similar meaning to kurtosis (peaked or flat), except in the sense of the N-th moment
2007-02-08 19
Skewness and Kurtosis
• Normalizing an odd order moment constrains the pdf to be symmetric about the origin, in the sense of the L-th moment
• Normalizing an even order moment constrains the pdf to be “equally flat with tails of equal size”, in the sense of the N-th moment
2007-02-08 20
HOCMN with non-integer moment orders
Generalized Moments
The orders of the normalized moments are not necessarily integers
Generalized moment
Type 1:
• Reduces to an odd order moment when u is an odd integer L (e.g. L=1 or 3)
Type 2:
• Reduces to an even order moment when u is an even integer N (e.g. N=2 or 4)
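The definitions of the two generalized moment types are not preserved in the transcript. The sketch below uses one pair of definitions that satisfies the stated reduction properties, and these are assumptions rather than the talk's formulas: Type 1 as the average of sign(x)·|x|^u, and Type 2 as the average of |x|^u. Both function names are hypothetical.

```python
import math


def generalized_moment_type1(samples, u):
    """Assumed Type 1: mean of sign(x) * |x|**u (odd moment when u is odd)."""
    return sum(math.copysign(abs(x) ** u, x) for x in samples) / len(samples)


def generalized_moment_type2(samples, u):
    """Assumed Type 2: mean of |x|**u (even moment when u is even)."""
    return sum(abs(x) ** u for x in samples) / len(samples)


data = [-2.0, 1.0, 3.0]
odd_like = generalized_moment_type1(data, 2.5)   # non-integer order
even_like = generalized_moment_type2(data, 3.5)  # non-integer order
```

With u=3 the assumed Type 1 form equals the ordinary third moment, and with u=2 the assumed Type 2 form equals the ordinary second moment, so integer orders are recovered as special cases.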
2007-02-08 21
Experimental Setup
Aurora2 database
• Training: clean condition training
• Testing: Sets A, B and C
• Development: all from clean training data
39-dimensional feature coefficients
• C0~C12 MFCC, Δ, Δ²
• Normalization performed on C0~C12
2007-02-08 22
Experimental Results
• Normalizing higher order moments yields more robust features
• Normalizing only three orders of moments is better than normalizing the full distribution
2007-02-08 23
Experimental Results
2007-02-08 24
Experimental Results
2007-02-08 25
PDF Analysis
HEQ
• Overfits to Gaussian
• Loses the original statistics
HOCMN
• Fits the generalized skewness and kurtosis
• Retains more of the nature of speech
[Figure panels: Original C0 & C1, HEQ, HOCMN]
2007-02-08 26
Distance Analysis
Distance definition:
• HOCMN yields a smaller distance between clean and noisy speech
• The distance reduction follows a trend similar to the error rate reduction
2007-02-08 27
Experimental Results
• Slight improvement for HOCMN with non-integer order moments
• Especially for lower SNR values
• Other robust techniques can be combined with it
2007-02-08 28
Experimental Results
2007-02-08 29
Experimental Results
For multi-condition training:
• HOCMN performs better than CMVN for all SNR values
• Better than HEQ for higher SNR values
2007-02-08 30
Conclusions
• We proposed a unified framework for higher order cepstral moment normalization
• Normalizing higher order moments gives more robust features
• The parameter set can be selected appropriately using a development set
• Skewness/kurtosis/distance analysis further demonstrates the concepts behind the normalization techniques