Presented by: Fang-Hui, Chu

Automatic Speech Recognition Based on Weighted Minimum Classification Error Training Method

Qiang Fu, Biing-Hwang Juang
School of Electrical & Computer Engineering, Georgia Institute of Technology

ASRU 2007
Outline
• Introduction
• Weighted word error rate
• The minimum risk decision rule & weighted MCE method
• Training scenarios & weighting strategies in ASR
• Experiment results for weighted MCE
• Conclusion & future work
Review of Bayes decision theory
• A conditional loss for classifying an observation $X$ into a class event $C_i$:

$$R(C_i \mid X) = \sum_{j=1}^{M} e_{ij}\, P(C_j \mid X)$$

• Expected loss function:

$$L = \int_X R(C(X) \mid X)\, p(X)\, dX, \qquad R(C(x) \mid x) = \min_i R(C_i \mid x)$$

• If we impose the assumption that the error loss function is uniform,

$$e_{ij} = \begin{cases} 0, & i = j \\ 1, & i \neq j \end{cases}$$

• maximum a posteriori (MAP) decision rule:

$$C(X) = C_i \quad \text{if } P(C_i \mid X) = \max_j P(C_j \mid X)$$

  – It transforms the classifier design problem into a distribution estimation problem

Several limitations!!
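Before turning to those limitations, here is a minimal numeric sketch of the two rules above (toy posteriors, numpy only; the numbers are hypothetical, not from the slides): with the uniform 0/1 cost, minimizing the conditional risk reduces exactly to the MAP rule.

```python
import numpy as np

# Toy posteriors P(C_j | X) for M = 3 classes (hypothetical numbers).
posterior = np.array([0.5, 0.3, 0.2])

# Uniform error loss: e_ij = 0 if i == j, 1 otherwise.
M = len(posterior)
e_uniform = 1.0 - np.eye(M)

# Conditional risk R(C_i | X) = sum_j e_ij * P(C_j | X) = 1 - P(C_i | X).
risk = e_uniform @ posterior

# Under uniform loss, argmin-risk coincides with argmax-posterior (MAP).
assert np.argmin(risk) == np.argmax(posterior)
print(risk)   # [0.5 0.7 0.8] -> class 0 wins either way
```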
Introduction
• In a variety of ASR applications, some errors should be considered more critical than others in terms of the system objective
  – Keyword spotting systems, speech understanding systems, ...
  – Differentiating the significance of recognition errors is necessary, and a non-uniform error cost function becomes appropriate
  – This transforms the classifier design into an error cost minimization problem instead of a distribution estimation problem:

$$C(X) = C_i, \qquad i = \arg\min_i \sum_{j=1}^{M} e_{ij}\, P(C_j \mid X)$$
An example for non-uniform error rate
• Here is an example of using a non-uniform error rate:

0 (REF): AT N. E. C. THE NEED FOR INTERNATIONAL MANAGERS WILL KEEP RISING
1 (HYP): AT ANY <del> SEE THE NEED FOR INTERNATIONAL MANAGERS WILL KEEP RISING
2 (HYP): AT N. E. C. <del> NEEDS FOR INTERNATIONAL MANAGER’S WILL KEEP RISING

Two recognition results with the same equal-significance word error rate. But which is better?
$$\mathrm{WWER} = \frac{\sum_{s=1}^{S} \log P(w_s) + \sum_{d=1}^{D} \log P(w_d) + \sum_{i=1}^{I} \log P(w_i)}{\sum_{n=1}^{N} \log P(w_n)}$$

where the numerator sums the weights of the $S$ substituted, $D$ deleted, and $I$ inserted words, the denominator sums the weights of the $N$ reference words, and each word $w$ carries a weight derived from its log-probability, so rare words weigh more.
An example for non-uniform error rate cont.
• An example of weighted word error rate:
0 (REF): AT N. E. C. THE NEED FOR INTERNATIONAL MANAGERS WILL KEEP RISING
         2.317 3.138 3.135 2.784 1.275 3.675 2.027 3.259 3.797 2.481 3.689 3.925 (total: 35.502)
1 (HYP): AT ANY <del> SEE THE NEED FOR INTERNATIONAL MANAGERS WILL KEEP RISING
         2.317 3.038 <del> 3.503 1.275 3.675 2.027 3.259 3.797 2.481 3.689 3.925
2 (HYP): AT N. E. C. <del> NEEDS FOR INTERNATIONAL MANAGER’S WILL KEEP RISING
         2.317 3.138 3.135 2.784 <del> 3.966 2.027 3.259 3.719 2.481 3.689 3.925
$$\mathrm{WWER}_1 = \frac{9.676}{35.502} = 27.25\%, \qquad \mathrm{WWER}_2 = \frac{8.96}{35.502} = 25.24\%$$

Under the weighted measure, result 2 is the better one, even though the two hypotheses have the same unweighted word error rate.
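A short sketch reproducing the two WWER figures from the table above, assuming (as the totals imply) that substituted and inserted words contribute the hypothesis-side weight, deleted words the reference-side weight, and the denominator is the total reference weight 35.502:

```python
ref_total = 35.502                      # sum of the 12 reference word weights

# Hypothesis 1: N. -> ANY (sub), E. deleted, C. -> SEE (sub)
err_1 = 3.038 + 3.135 + 3.503           # = 9.676
# Hypothesis 2: THE deleted, NEED -> NEEDS (sub), MANAGERS -> MANAGER'S (sub)
err_2 = 1.275 + 3.966 + 3.719           # = 8.960

print(f"WWER_1 = {err_1 / ref_total:.2%}")   # 27.25%
print(f"WWER_2 = {err_2 / ref_total:.2%}")   # 25.24%
```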
The Minimum Risk decision rule
• The minimum risk (MR) decision rule:

$$C(X) = C_i, \qquad i = \arg\min_i \sum_{j=1}^{M} e_{ij}\, P(C_j \mid X) = \arg\min_i \sum_{j=1}^{M} e_{ij}\, \frac{P(X \mid C_j)\, P(C_j)}{P(X)}$$

  – It involves a weighted combination of the a posteriori probabilities of all the classes
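To see how the MR rule can differ from MAP, here is a toy sketch (hypothetical numbers) with a non-uniform cost matrix in which one confusion is ten times more expensive than the others:

```python
import numpy as np

posterior = np.array([0.5, 0.3, 0.2])      # P(C_j | X), toy numbers

# Hypothetical cost matrix: deciding class 0 when the truth is class 1
# costs 10x more than any other confusion.
e = np.array([[0.0, 10.0, 1.0],
              [1.0,  0.0, 1.0],
              [1.0,  1.0, 0.0]])

risk = e @ posterior                        # R(C_i | X) = sum_j e_ij P(C_j | X)
print(np.argmax(posterior))                 # 0: the MAP decision
print(np.argmin(risk))                      # 1: the MR decision avoids the costly error
```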
A practical MR rule
• Approximating the expected loss by the empirical loss over the training set:

$$L = \sum_{j=1}^{M} e_{ij} \int_X P(C_j \mid X)\, p(X)\, dX \approx \frac{1}{K} \sum_{k=1}^{K} e_{\,i(X_k)\, j_{X_k}}$$

where
  – $i(X)$ is the identity index of $X$ as decided by the recognizer
  – $j_X$ is the true identity index of $X$
  – $\mathbb{X} = \{X_1, \ldots, X_K\}$ is the set of training tokens
  – $e_{\,i(X)\, j_X}$ is the error cost incurred on token $X$
A practical MR rule cont.
• We can prescribe a discriminant function for each class, $g_j(X;\Lambda)$, and define the practical decision rule for the recognizer as

$$C(X) = C_i, \qquad i = \arg\max_{m \in I_M} g_m(X;\Lambda)$$

• The alternative system loss is then

$$L(\Lambda) = \frac{1}{|\mathbb{X}|} \sum_{X \in \mathbb{X}} \sum_{i \in I_M} e_{\,i\, j_X}\, \mathbf{1}\!\left(i = \arg\max_{m \in I_M} g_m(X;\Lambda)\right)$$

$$= \frac{1}{|\mathbb{X}|} \sum_{X \in \mathbb{X}} \sum_{j \in I_M} \sum_{i \in I_M} e_{ij}\, \mathbf{1}\!\left(i = \arg\max_{m \in I_M} g_m(X;\Lambda)\right) \mathbf{1}(C_X = C_j)$$
A practical MR rule cont.
• An approximation then needs to be made to the summands:

$$e_{ij}\, \mathbf{1}\!\left(i = \arg\max_{m \in I_M} g_m(X;\Lambda)\right) \approx e_{ij}\, \ell\!\left(g_i(X;\Lambda),\, G_i(X;\Lambda)\right)$$

where the hard indicator

$$\mathbf{1}\!\left(i = \arg\max_{m} g_m(X;\Lambda)\right) = \begin{cases} 1, & g_i(X;\Lambda) = \max_m g_m(X;\Lambda) \\ 0, & \text{otherwise} \end{cases}$$

is smoothed through the competitor score

$$G_i(X;\Lambda) = \left[\frac{1}{M-1} \sum_{m \in I_M,\, m \neq i} g_m(X;\Lambda)^{\eta}\right]^{1/\eta}$$

Note that, as $\eta \to \infty$, $G_i(X;\Lambda) \to \max_{m \in I_M,\, m \neq i} g_m(X;\Lambda)$.
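A quick sketch of the smoothing, with an explicit sharpness parameter eta (standard in MCE training; here the max-competitor limit is used directly, which is an assumption about the exact form): the sigmoid of the log-score difference approaches the hard indicator as eta grows.

```python
import numpy as np

def smoothed_indicator(g, i, eta):
    """Sigmoid surrogate for 1(i == argmax_m g_m); hardens as eta grows."""
    G_i = np.max(np.delete(g, i))           # strongest competitor of class i
    return 1.0 / (1.0 + np.exp(-eta * (np.log(g[i]) - np.log(G_i))))

g = np.array([0.6, 0.3, 0.1])               # toy discriminant scores
for eta in (1.0, 5.0, 50.0):
    # Class 0 is the argmax: its surrogate -> 1; class 1's surrogate -> 0.
    print(eta, smoothed_indicator(g, 0, eta), smoothed_indicator(g, 1, eta))
```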
The weighted MCE method
• The objective function of the weighted MCE is
$$L(\Lambda) = \frac{1}{|\mathbb{X}|} \sum_{X \in \mathbb{X}} \sum_{j \in I_M} \sum_{i \in I_M} e_{ij}\, \ell\!\left(g_i(X;\Lambda),\, G_i(X;\Lambda)\right) \mathbf{1}(C_X = C_j)$$

which can be rewritten as

$$L(\Lambda) = \frac{1}{|\mathbb{X}|} \sum_{X \in \mathbb{X}} \sum_{j \in I_M} \sum_{i \in I_M} \frac{e_{ij}}{1 + \exp\!\left(\ln G_i(X;\Lambda) - \ln g_i(X;\Lambda)\right)}\, \mathbf{1}(C_X = C_j)$$
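A compact sketch of this objective on toy tokens, assuming the sigmoid surrogate above (the explicit eta is again an assumption) and the convention e_jj = 0, so only wrong decisions contribute cost:

```python
import numpy as np

def wmce_loss(scores, labels, e, eta=5.0):
    """Weighted-MCE objective: cost-weighted smoothed error count.

    scores: (K, M) toy discriminant values g_i(X; Lambda) per token
    labels: (K,)   true class index j_X per token
    e:      (M, M) error cost matrix with e[j, j] == 0
    """
    total = 0.0
    for g, j in zip(scores, labels):
        for i in range(len(g)):             # sum over competing decisions i
            G_i = np.max(np.delete(g, i))   # strongest competitor of class i
            l_i = 1.0 / (1.0 + np.exp(-eta * (np.log(g[i]) - np.log(G_i))))
            total += e[i, j] * l_i          # cost of deciding i when truth is j
    return total / len(labels)

g = np.array([[0.7, 0.2, 0.1],              # token recognized correctly
              [0.2, 0.5, 0.3]])             # token of class 2 misrecognized as 1
y = np.array([0, 2])
print(wmce_loss(g, y, 1.0 - np.eye(3)))     # uniform cost recovers plain (smoothed) MCE
```

Replacing the uniform matrix with a non-uniform one makes costly confusions dominate the loss, which is the point of the weighting.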
Training Scenarios
• Intra-level training
  – The training and recognition decisions are on the same semantic level as the performance measure
• Inter-level training
  – The training and recognition decisions are on a different semantic level from the performance metric
  – Minimizing the cost of wrong recognition decisions does not directly optimize the recognizer's performance in terms of the evaluation metric
  – To alleviate this inconsistency, the error weighting strategy can be built in a cross-level fashion
Two types of error cost
• User-defined cost
  – Usually characterized by the system requirement, and relatively straightforward
• Data-defined cost
  – More complicated
  – Wrong decisions occur because the underlying data observation deviates from the distribution the models represent
  – "Bad" data? Or "bad" models?
  – It is possible to measure the "reliability" of the errors by introducing data-defined weighting
Error weighting for intra-level training
• In the intra-level training situation, the system performance is directly measured by the loss of wrong recognition decisions
• We can absorb both types of error weighting into the error cost function as one universal functional form
• The objective function for the weighted MCE could be written as:

$$F^{(1)}_{W\text{-}MCE}(\Lambda) = \sum_{k=1}^{K} \sum_{l=1}^{L_k} \ell_{i,k,l}(X_{k,l};\Lambda)\; e_{ij}\, \mathbf{1}(C_{X_{k,l}} = C_j)$$

where

$$\ell_{i,k,l}(X_{k,l};\Lambda) = \frac{1}{1 + \exp\!\left(\ln G_i(X_{k,l};\Lambda) - \ln g_i(X_{k,l};\Lambda)\right)}$$

$$G_i(X_{k,l};\Lambda) = \max_{m \in I_M,\, m \neq i} g_m(X_{k,l};\Lambda), \qquad g_i(X_{k,l};\Lambda) = P(X_{k,l} \mid C_i)\, P(C_i)$$

and each training utterance $X_k = \{X_{k,1}, \ldots, X_{k,L_k}\}$ is segmented into $L_k$ tokens with the label sequence $PH_k = \{ph_{k,1}, ph_{k,2}, \ldots, ph_{k,L_k}\}$.
Error weighting for inter-level training
• We need to use cross-level weighting in this case to break down the high-level cost and impose the appropriate weights upon the low-level models
• The user-defined weighting of the weighted MCE in the inter-level training can be written as:

$$F^{(2)}_{W\text{-}MCE}(\Lambda) = \sum_{k=1}^{K} \sum_{l=1}^{L_k} \ell_{i,k,l}(X_{k,l};\Lambda)\; e^{(u)}_{ij}(w_{k,l})\, \mathbf{1}(C_{X_{k,l}} = C_j)$$

where

$$e^{(u)}_{ij}(w_{k,l}) = \log P(w_{k,l})$$

the word sequence of utterance $k$ is $W_k = \{w_{k,1}, w_{k,2}, \ldots, w_{k,N_k}\}$, each word is composed of a subsequence of the low-level tokens, $w_{k,n} = \{ph_{k,l_1}, ph_{k,l_2}, \ldots\}$, and $w_{k,l}$ denotes the word that covers token $l$.
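A sketch of one way to realize this user-defined weight, assuming base-10 logs (the magnitudes then match the example table: THE ≈ 1.275, MANAGERS ≈ 3.797) and hypothetical unigram probabilities; splitting a word's weight evenly across its phone tokens is also an assumption, not something the slide specifies:

```python
import math

unigram = {"the": 0.053, "managers": 1.6e-4}     # hypothetical unigram LM
phones  = {"the": ["dh", "ah"],
           "managers": ["m", "ae", "n", "ih", "jh", "er", "z"]}

def word_weight(w):
    # Rarer words get larger weights; |log10 P(w)| matches the table's scale.
    return abs(math.log10(unigram[w]))

for w in unigram:
    per_phone = word_weight(w) / len(phones[w])  # spread weight over phone tokens
    print(w, round(word_weight(w), 3), round(per_phone, 3))
```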
Error weighting for inter-level training cont.
• The data-defined weighting of the weighted MCE in the inter-level training can be written as:

$$F^{(2)}_{W\text{-}MCE}(\Lambda) = \sum_{k=1}^{K} \sum_{l=1}^{L_k} \ell_{i,k,l}(X_{k,l};\Lambda)\; e^{(d)}_{ij}(m_{k,l}), \qquad m_{k,l} = n(l,k)$$

• A W-MCE objective function including both weighting functions under the inter-level training scenario can be written as:

$$F^{(2)}_{W\text{-}MCE}(\Lambda) = \sum_{k=1}^{K} \sum_{l=1}^{L_k} \ell_{i,k,l}(X_{k,l};\Lambda)\; e_{ij}(w_{k,l}, m_{k,l})$$

where $e_{ij}(w_{k,l}, m_{k,l})$ is the total weight and $m_{k,l} = n(l,k)$.
Weighted MCE & MPE/MWE method
• The MPE/MWE is a training method with a weighted objective function built to mimic the training errors:

$$F_{MPE/MWE}(\Lambda) = \frac{1}{K} \sum_{k=1}^{K} \frac{\sum_{u} p(X_k \mid w_u;\Lambda)\, P(w_u)\, A(W_u, W_{c_k})}{\sum_{u} p(X_k \mid w_u;\Lambda)\, P(w_u)}$$

where
  – $A(W_u, W_{c_k})$ is the raw accuracy of the hypothesized string $W_u$ against the correct transcription $W_{c_k}$
  – $P_{k,c} = p(X_k \mid w_{c_k};\Lambda)\, P(w_{c_k})$ is the joint score of the correct string and $P_{k,w} = \sum_{u \neq c_k} p(X_k \mid w_u;\Lambda)\, P(w_u)$ that of the competing strings, so the per-utterance term admits the sigmoid-style smoothing

$$\ell_k(X_k;\Lambda) = \frac{1}{1 + \exp\!\left(\ln P_{k,c} - \ln P_{k,w}\right)}$$
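A toy sketch of the per-utterance MPE/MWE term: a posterior-weighted average of raw accuracies over a hypothetical 3-entry N-best list (joint scores and accuracies are made-up numbers):

```python
import numpy as np

# Joint scores p(X_k | w_u; Lambda) * P(w_u) for three hypotheses (toy values).
joint = np.array([2.0e-5, 1.5e-5, 0.5e-5])
# Raw accuracies A(W_u, W_ref), e.g. correct words out of a 12-word reference.
acc = np.array([11.0, 9.0, 8.0])

# Per-utterance MPE/MWE term: expected accuracy under the hypothesis posterior.
f_k = np.sum(joint * acc) / np.sum(joint)
print(f_k)                                  # 9.875, between min(acc) and max(acc)
```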
Weighted MCE & MPE/MWE method cont.
• Maximizing the original MPE/MWE objective function is equivalent to minimizing the modified objective function:

$$F_{MPE/MWE}(\Lambda) = \sum_{k=1}^{K} \ell_k(X_k;\Lambda)\, A(W_{i_k}, W_{c_k})$$

• In summary, MPE/MWE builds an objective function that incorporates the non-uniform error cost of each training utterance
  – W-MCE and MPE/MWE are both rooted in Bayes decision theory and share the same aim: designing the optimal classifier that minimizes the non-uniform error cost
W-MCE implementation
• In our experiments, we assume that the weighting function only contains the data-defined weighting, for simplicity:

$$F_{W\text{-}MCE}(\Lambda) = \frac{1}{K} \sum_{k=1}^{K} \sum_{l=1}^{L_k} \ell_{i,k,l}(X_{k,l};\Lambda)\; w(X_{k,l})$$

where the weight of token $l$ is the posterior probability of the word $w_{k,n}$ that spans it:

$$w(X_{k,l}) = \Pr(w_{k,n} \mid X_k) = \frac{P(X_k \mid w_{k,n})\, \Pr(w_{k,n})}{P(X_k)}, \qquad n = n(l,k)$$
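One way to realize this data-defined weight, sketched over a toy N-best list (approximating the word posterior over an N-best list rather than a full lattice is my assumption; the scores are made up): the weight of a token is the posterior mass of the hypotheses whose aligned word matches it.

```python
import numpy as np

# Joint scores p(X_k | w_u) * P(w_u) of the N-best hypotheses (toy values).
joint = np.array([2.0e-5, 1.5e-5, 0.5e-5])
# Whether each hypothesis contains the word spanning the token of interest.
contains_word = np.array([True, True, False])

# Data-defined weight: posterior Pr(word | X_k), with P(X_k) approximated
# by the total joint score of the N-best list.
w = joint[contains_word].sum() / joint.sum()
print(w)                                    # 0.875
```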
Experiments
• Database: WSJ0