![Page 1: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/1.jpg)
Confidence Measures for Automatic Speech Recognition
Presented by Tzan-Hwei Chen
National Taiwan Normal University, Spoken Language Processing Lab
Advisors: Hsin-Min Wang, Berlin Chen
![Page 2: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/2.jpg)
2
Outline
• Introduction
• The categories of estimation methods for the confidence measure (CM)
– Feature based
– Posterior probability based
– Explicit model based
– Incorporation of high-level information for CM*
• The application of CM to improve speech recognition
• Summary
![Page 3: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/3.jpg)
3
Introduction (1/9)
• It is extremely important to be able to make an appropriate and reliable judgement based on the error-prone ASR result.
• Researchers have proposed to compute a score (preferably 0~1), called confidence measure (CM), to indicate reliability of any recognition decision made by an ASR system.
![Page 4: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/4.jpg)
4
Introduction (2/9)
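[Figure: ASR block diagram. Speech signal → feature extraction → feature vector → decoding (using the acoustic model, language model and lexicon) → recognized word sequence → confidence measure → verification. Example N-best output: 1. 臺北到魚籃 ("Taipei to fish basket"), 2. 臺北到宜蘭 ("Taipei to Yilan").]
Some applications of CM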
![Page 5: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/5.jpg)
5
Introduction (3/9)
• Some of the earliest research on CM can be traced back to rejection in word-spotting systems.
• Other early CM-related work lies in the automatic detection of new words in LVCSR.
• Over the past few years, CM has been applied to more and more research areas, e.g.,
– To improve speech recognition
– Look-ahead algorithms in LVCSR
– To guide the system to perform unsupervised learning
– …
![Page 6: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/6.jpg)
6
Introduction (4/9)
• The general procedure of CM for verification
[Figure: recognized units → confidence estimation → confidence of unit → judgment against a predefined threshold: confidence > threshold leads to acceptance, confidence < threshold leads to rejection.]
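The verification step above can be sketched in a few lines. This is only an illustration: the function name `verify`, the threshold of 0.5 and the confidence scores are made up for the example and do not come from the slides.

```python
def verify(units, threshold=0.5):
    """Accept a recognized unit when its confidence exceeds the threshold.

    units: list of (unit, confidence) pairs; the scores are hypothetical.
    """
    return [(u, "accept" if c > threshold else "reject") for u, c in units]


# Hypothetical confidences for the hypothesis 臺北 到 魚籃.
decisions = verify([("臺北", 0.93), ("到", 0.88), ("魚籃", 0.21)])
print(decisions)
```

In practice the threshold would be tuned on held-out data rather than fixed by hand.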
![Page 7: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/7.jpg)
7
Introduction (5/9)
• Four situations when judging a hypothesis
– ref 宜蘭, hyp 宜蘭, accept: correct acceptance
– ref 宜蘭, hyp 魚籃, reject: correct rejection
– ref 宜蘭, hyp 宜蘭, reject: false rejection
– ref 宜蘭, hyp 魚籃, accept: false acceptance
![Page 8: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/8.jpg)
8
Introduction (6/9)
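• The evaluation metric:
– Confidence error rate (CER):
$$\mathrm{CER} = \frac{\text{number of false acceptances and false rejections}}{\text{total number of recognized words}}$$
– Example (CA: correct acceptance, FA: false acceptance, FR: false rejection):
hyp: 三民 候選人 通過 審查 了
ref: 有 三名 候選人 通過 審查
tags: FA CA FR CA FA
$$\mathrm{CER} = \frac{1+2}{5}$$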
![Page 9: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/9.jpg)
9
Introduction (7/9)
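• The evaluation metric:
– Baseline of the confidence error rate (it is assumed that every recognized word is correct):
$$\mathrm{baseline} = \frac{\text{ins.} + \text{sub.}}{\text{total number of recognized words}}$$
– Example:
hyp: 三民 候選人 通過 審查 了
ref: 有 三名 候選人 通過 審查
tags: FA CA CA CA FA
$$\mathrm{baseline} = \frac{1+1}{5}$$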
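As a small check of the two examples, the CER can be computed directly from the per-word tags. The helper name `confidence_error_rate` is an assumption made for this sketch; the tag sequences are the ones shown on the slides.

```python
def confidence_error_rate(tags):
    """CER = (false acceptances + false rejections) / recognized words."""
    errors = sum(1 for t in tags if t in ("FA", "FR"))
    return errors / len(tags)


# Tags with a confidence measure applied (one FR, two FA).
cer = confidence_error_rate(["FA", "CA", "FR", "CA", "FA"])
# Baseline: every recognized word is accepted, so only FA errors remain.
baseline = confidence_error_rate(["FA", "CA", "CA", "CA", "FA"])
print(cer, baseline)
```

In this toy example the CER is above the baseline because the single rejection hit a correct word.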
![Page 10: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/10.jpg)
10
Introduction (8/9)
• The evaluation metric (cont):
– Receiver operating characteristic (ROC) curve: simply a plot of the false acceptance rate against the detection rate.
$$\text{detection rate} = 1 - \text{false rejection rate}$$
$$\text{false acceptance rate} = \frac{\text{number of false acceptances}}{\text{number of incorrectly recognized words}}$$
$$\text{false rejection rate} = \frac{\text{number of false rejections}}{\text{number of correctly recognized words}}$$
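A minimal sketch of how ROC points could be produced from these definitions, by sweeping the threshold over a set of scored words. The function name, the scores and the correctness labels are hypothetical; the sketch assumes the data contains at least one correct and one incorrect word.

```python
def roc_points(scores, is_correct, thresholds):
    """Return (threshold, false acceptance rate, detection rate) triples."""
    n_correct = sum(is_correct)
    n_incorrect = len(is_correct) - n_correct
    points = []
    for th in thresholds:
        # A word is accepted when its score exceeds the threshold.
        fa = sum(1 for s, ok in zip(scores, is_correct) if s > th and not ok)
        fr = sum(1 for s, ok in zip(scores, is_correct) if s <= th and ok)
        points.append((th, fa / n_incorrect, 1.0 - fr / n_correct))
    return points


scores = [0.9, 0.8, 0.6, 0.4, 0.2]
is_correct = [True, True, False, True, False]
points = roc_points(scores, is_correct, [0.0, 0.5, 1.0])
print(points)
```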
![Page 11: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/11.jpg)
11
Introduction (9/9)
• All methods proposed for computing CMs can be roughly classified into three major categories [7] (the fourth item, marked *, is listed in addition to these three):
– Feature based
– Posterior probability based
– Explicit model based (utterance verification, UV)
– Incorporation of high-level information for CM*
![Page 12: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/12.jpg)
12
Feature-based confidence measure
![Page 13: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/13.jpg)
13
Feature-based confidence measure (1/8)
• The features can be collected during the decoding procedure and may include acoustic, language and syntactic information
• Any feature can be called a predictor if its p.d.f. for correctly recognized words is clearly distinct from that for misrecognized words
[Figure: two distributions $p(f_w)$ over the feature value $f_w$, one for misrecognized words and one for correctly recognized words.]
![Page 14: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/14.jpg)
14
Feature-based confidence measure (2/8)
• Some common predictor features
– Pure normalized likelihood score related: acoustic score per frame.
– N-best related: count in the N-best list, N-best homogeneity score
$$\frac{\sum_{[w_m,s_m,e_m]_1^M \in N\text{-best},\; w \in [w_m,s_m,e_m]_1^M} p([w_m,s_m,e_m]_1^M \mid X)}{\sum_{[w_n,s_n,e_n]_1^N \in N\text{-best}} p([w_n,s_n,e_n]_1^N \mid X)}$$
– Duration related: word duration divided by its number of phones
where
– $[w_m,s_m,e_m]_1^M$: a word sequence whose number of words is $M$
– $s_m$: the start time of the $m$-th word
– $e_m$: the end time of the $m$-th word
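The N-best homogeneity score is the share of the total N-best score mass held by hypotheses that contain the word. A rough sketch, under the assumptions that each hypothesis carries one combined log score and that word times are ignored; `nbest_homogeneity` and the scores are illustrative only.

```python
import math


def nbest_homogeneity(word, nbest):
    """Score mass of N-best hypotheses containing `word`, normalized.

    nbest: list of (word_list, log_score); log scores are hypothetical.
    """
    top = max(score for _, score in nbest)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - top) for _, score in nbest]
    hit = sum(wt for (words, _), wt in zip(nbest, weights) if word in words)
    return hit / sum(weights)


nbest = [(["臺北", "到", "魚籃"], -10.0), (["臺北", "到", "宜蘭"], -10.0)]
print(nbest_homogeneity("臺北", nbest), nbest_homogeneity("宜蘭", nbest))
```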
![Page 15: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/15.jpg)
15
Feature-based confidence measure (3/8)
• Some common predictor features (cont)
– Hypothesis density:
$$HD(a:[w;s_a,e_a]) = \frac{1}{e_a - s_a + 1}\sum_{t=s_a}^{e_a} D(t)$$
– $D(t')$: the number of different word arcs at $t'$
– $a:[w;s_a,e_a]$: a word arc in the word graph
[Figure: example word graph with competing arcs such as 有 / 由 / 又, 三名, 候選人, 沒有, 通過, 審查, 結果, 建國 and 靜音 (silence); at the marked frame $D(t) = 2$.]
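The hypothesis density of an arc is the average number of arcs alive over its frames. A small sketch, assuming arcs are given as (word, start frame, end frame) with inclusive frame indices and that every arc covering a frame counts toward D(t); the arcs below are invented for illustration.

```python
def hypothesis_density(arc, arcs):
    """Average number of word arcs covering each frame of `arc`."""
    _, s_a, e_a = arc

    def d(t):
        # D(t): number of arcs in the graph that cover frame t.
        return sum(1 for _, s, e in arcs if s <= t <= e)

    return sum(d(t) for t in range(s_a, e_a + 1)) / (e_a - s_a + 1)


arcs = [("有", 0, 4), ("由", 0, 2), ("又", 3, 4), ("三名", 5, 9)]
print(hypothesis_density(arcs[0], arcs), hypothesis_density(arcs[3], arcs))
```

A high density means many competitors overlap the arc, which tends to indicate a less reliable word.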
![Page 16: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/16.jpg)
16
Feature-based confidence measure (4/8)
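• Some common predictor features (cont)
– Acoustic stability
[Figure: the utterance is decoded three times with differently weighted scores $p(X \mid W)\,P(W)$ (labelled 1, 2 and 3), and each alternative is compared with the hypothesized word sequence 今天 天氣 很好. The alternatives shown include 今天 天氣 很好 and 今天 天氣 不佳, so 今天 and 天氣 stay stable while the last word changes.]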
![Page 17: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/17.jpg)
17
Feature-based confidence measure (6/8)
• We can combine the above features with any one of the following classifiers
– Linear discriminant function
– Generalized linear model
– Neural networks
– Decision tree
– Support vector machine
– Boosting
– Naïve Bayes classifier
![Page 18: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/18.jpg)
18
Feature-based confidence measure (7/8)
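• Naïve Bayes Classifier [3]
$$P(C_1 \mid f_w, w) = \frac{P(C_1 \mid w)\,P(f_w \mid C_1, w)}{\sum_{C' = C_1 \text{ or } C_2} P(C' \mid w)\,P(f_w \mid C', w)}$$
$$P(f_w \mid C_i, w) = \prod_{d=1}^{K} P(f_{w_d} \mid C_i, w)$$
$$P(C_i \mid w) = \frac{N(C_i, w)}{N(w)}$$
$$P(f_{w_d} \mid C_i, w) = \frac{N(f_{w_d}, C_i, w)}{N(C_i, w)}$$
where
– $C_1$: the recognized word is correct
– $C_2$: the recognized word is wrong
– $f_w$: the $K$-dimensional predictor feature vector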
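The count-based estimates above can be sketched as follows. This assumes the predictor features have already been discretized into symbols, uses no smoothing, and falls back to 0.5 when a word has never been seen; the class name, labels and training samples are all hypothetical.

```python
from collections import Counter


class CountNaiveBayes:
    """Per-word naive Bayes over discretized predictor features."""

    def __init__(self):
        self.n_w = Counter()    # N(w)
        self.n_cw = Counter()   # N(C_i, w)
        self.n_fcw = Counter()  # N(f_d, C_i, w)

    def add(self, word, features, label):
        self.n_w[word] += 1
        self.n_cw[(label, word)] += 1
        for d, value in enumerate(features):
            self.n_fcw[(d, value, label, word)] += 1

    def _joint(self, word, features, label):
        n_c = self.n_cw[(label, word)]
        if n_c == 0:
            return 0.0
        p = n_c / self.n_w[word]                      # P(C_i | w)
        for d, value in enumerate(features):
            p *= self.n_fcw[(d, value, label, word)] / n_c
        return p

    def p_correct(self, word, features):
        num = self._joint(word, features, "C1")
        den = num + self._joint(word, features, "C2")
        return num / den if den > 0 else 0.5


nb = CountNaiveBayes()
for feats, label in [(("high",), "C1"), (("high",), "C1"),
                     (("low",), "C1"), (("low",), "C2")]:
    nb.add("宜蘭", feats, label)
print(nb.p_correct("宜蘭", ("high",)), nb.p_correct("宜蘭", ("low",)))
```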
![Page 19: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/19.jpg)
19
Feature-based confidence measure (8/8)
• Experiments [3]
• Corpus: an Italian speech corpus of phone calls to the front desk of a hotel

| feature | CER (%) | relative reduction (%) |
|---|---|---|
| acoustic stability | 16.3 | 22.4 |
| language model score | 18.8 | 10 |
| hypothesis density | 18.9 | 10.5 |
| duration | 19.3 | 8.1 |
| acoustic score | 19.6 | 6.7 |
| baseline | 21 | |
![Page 20: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/20.jpg)
20
Posterior probability based confidence measure
![Page 21: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/21.jpg)
21
Posterior probability based confidence measure (1/11)
• Posterior probability of a word sequence:
$$P([w_m,s_m,e_m]_1^M \mid X) = \frac{p(X \mid [w_m,s_m,e_m]_1^M)\,P([w_m,s_m,e_m]_1^M)}{p(X)} = \frac{p(X \mid [w_m,s_m,e_m]_1^M)\,P([w_m,s_m,e_m]_1^M)}{\sum_{[w_n,s_n,e_n]_1^N \in W} p(X \mid [w_n,s_n,e_n]_1^N)\,P([w_n,s_n,e_n]_1^N)}$$
– The denominator, a sum over all word sequences in $W$, is impossible to estimate in a precise manner.
• Some approximation methods have to be adopted.
![Page 22: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/22.jpg)
22
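Posterior probability based confidence measure (2/11)
• Word graph based approximation
$$p(X) = \sum_{[w_n,s_n,e_n]_1^N \in W} p(X \mid [w_n,s_n,e_n]_1^N)\,P([w_n,s_n,e_n]_1^N) \approx \sum_{[w_m,s_m,e_m]_1^M \in \mathcal{G}_X} p(X \mid [w_m,s_m,e_m]_1^M)\,P([w_m,s_m,e_m]_1^M)$$
where $\mathcal{G}_X$ denotes the word graph generated for $X$.
[Figure: example word graph with competing arcs such as 有 / 由 / 又, 三名, 候選人, 沒有, 通過, 結果, 建國 and 靜音 (silence).]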
![Page 23: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/23.jpg)
23
Posterior probability based confidence measure (3/11)
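• Posterior probability of a word arc:
$$p(a:[w;s_a,e_a] \mid X) = \frac{\sum_{[w_m,s_m,e_m]_1^M \in \mathcal{G}_X,\; [w;s_a,e_a] \in [w_m,s_m,e_m]_1^M}\;\prod_{m=1}^{M} p(X_{s_m}^{e_m} \mid w_m)\,P(w_m \mid h_m)}{\sum_{[w_n,s_n,e_n]_1^N \in \mathcal{G}_X}\;\prod_{n=1}^{N} p(X_{s_n}^{e_n} \mid w_n)\,P(w_n \mid h_n)}$$
where $\mathcal{G}_X$ denotes the word graph generated for $X$.
– Some issues are addressed and the word posterior probability is generalized
• Reduced search space
• Relaxed time registration
• Optimal acoustic and language model weights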
![Page 24: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/24.jpg)
24
Posterior probability based confidence measure (4/11)
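• Posterior probability of a word arc [6]:
$$C_{normal}(a:[w;s_a,e_a] \mid X) = \frac{\sum_{[w_m,s_m,e_m]_1^M \in \mathcal{G}_X,\; [w;s_a,e_a] \in [w_m,s_m,e_m]_1^M}\;\prod_{m=1}^{M} p(X_{s_m}^{e_m} \mid w_m)\,P(w_m \mid h_m)}{\sum_{[w_n,s_n,e_n]_1^N \in \mathcal{G}_X}\;\prod_{n=1}^{N} p(X_{s_n}^{e_n} \mid w_n)\,P(w_n \mid h_n)}$$
where $\mathcal{G}_X$ denotes the word graph generated for $X$.
[Figure: example word graph.]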
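On a toy word graph, the arc posterior can be illustrated by enumerating every path: sum the scores of the paths through the arc and divide by the score of all paths. This is only a sketch under stated assumptions: each arc carries a single combined (acoustic plus language model) log score, the graph is a small DAG with at least one complete path, and the names and scores are invented. Practical systems would use a forward-backward pass instead of enumeration.

```python
import math


def arc_posteriors(arcs, start, end):
    """arcs: list of (word, from_node, to_node, log_score) on a DAG."""
    outgoing = {}
    for i, (_, u, _, _) in enumerate(arcs):
        outgoing.setdefault(u, []).append(i)

    paths = []  # (arc indices on the path, total log score)

    def walk(node, used, score):
        if node == end:
            paths.append((used, score))
            return
        for i in outgoing.get(node, []):
            walk(arcs[i][2], used + [i], score + arcs[i][3])

    walk(start, [], 0.0)
    top = max(s for _, s in paths)
    total = sum(math.exp(s - top) for _, s in paths)
    post = [0.0] * len(arcs)
    for used, s in paths:
        for i in used:
            post[i] += math.exp(s - top) / total
    return post


arcs = [("有", 0, 1, math.log(0.6)), ("由", 0, 1, math.log(0.4)),
        ("三名", 1, 2, math.log(1.0))]
post = arc_posteriors(arcs, 0, 2)
print(post)
```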
![Page 25: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/25.jpg)
25
Posterior probability based confidence measure (5/11)
• Posterior probability of a word arc [6]:
$$C_{med}(a:[w_a;s_a,e_a]) = \sum_{r:[w_r;s_r,e_r],\; w_r = w_a,\; s_r \le (s_a+e_a)/2 \le e_r} C_{normal}(r:[w_r;s_r,e_r] \mid X)$$
[Figure: example word graph.]
![Page 26: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/26.jpg)
26
Posterior probability based confidence measure (6/11)
• Posterior probability of a word arc [6]:
$$C_{max}(a:[w_a;s_a,e_a]) = \max_{t \in \{s_a,\ldots,e_a\}}\;\sum_{r:[w_r;s_r,e_r],\; w_r = w_a,\; s_r \le t \le e_r} C_{normal}(r:[w_r;s_r,e_r] \mid X)$$
[Figure: example word graph, highlighting the arc 三名.]
![Page 27: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/27.jpg)
27
Posterior probability based confidence measure (7/11)
• Posterior probability of a word arc [6]:
$$C_{sec}(a:[w_a;s_a,e_a]) = \sum_{r:[w_r;s_r,e_r],\; w_r = w_a,\; (s_a,e_a) \cap (s_r,e_r) \neq \emptyset} C_{normal}(r:[w_r;s_r,e_r] \mid X)$$
[Figure: example word graph.]
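The three relaxed measures differ only in which same-word arcs they pool. A compact sketch, assuming each arc is given as (word, start frame, end frame, C_normal) with inclusive frames; the arcs and their posteriors are invented for illustration.

```python
def c_sec(a, arcs):
    """Sum C_normal over same-word arcs that overlap arc a in time."""
    w, s_a, e_a, _ = a
    return sum(p for (w_r, s_r, e_r, p) in arcs
               if w_r == w and s_r <= e_a and s_a <= e_r)


def c_med(a, arcs):
    """Sum C_normal over same-word arcs covering the midpoint of a."""
    w, s_a, e_a, _ = a
    mid = (s_a + e_a) / 2
    return sum(p for (w_r, s_r, e_r, p) in arcs
               if w_r == w and s_r <= mid <= e_r)


def c_max(a, arcs):
    """Maximum over frames of a of the pooled same-word posterior."""
    w, s_a, e_a, _ = a
    return max(sum(p for (w_r, s_r, e_r, p) in arcs
                   if w_r == w and s_r <= t <= e_r)
               for t in range(s_a, e_a + 1))


arcs = [("三名", 10, 20, 0.5), ("三名", 12, 22, 0.3),
        ("三名", 18, 30, 0.1), ("三民", 10, 20, 0.1)]
print(c_sec(arcs[0], arcs), c_med(arcs[0], arcs), c_max(arcs[0], arcs))
```

Here the third 三名 arc overlaps the first one but misses its midpoint, so it contributes to the overlap and maximum variants only.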
![Page 28: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/28.jpg)
28
Posterior probability based confidence measure (8/11)
• The drawback of the above methods: they all need an additional pass.
• In [8], the “local word confidence measure” is proposed
$$C([w,s,e]) = \frac{\max_{[w,s',e'] \in E}\big(p(x_{s'}^{e'} \mid w)\big)\,p(w)}{\sum_{w'}\max_{[w',s',e'] \in E}\big(p(x_{s'}^{e'} \mid w')\big)\,p(w')}$$
– $E$: the set of alternate words which realize the time and length constraints given a relaxation rate.
[Figure: several time-shifted alternatives for the word 今天, over which $\max_{[w,s',e']} p(x_{s'}^{e'} \mid w)$ is taken.]
![Page 29: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/29.jpg)
29
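Posterior probability based confidence measure (8/11)
• Local word confidence measure (cont)
– Unigram:
$$C([w,s,e]) = \frac{\max_{[w,s',e'] \in E}\big(p(x_{s'}^{e'} \mid w)\big)\,p(w)}{\sum_{w'}\max_{[w',s',e'] \in E}\big(p(x_{s'}^{e'} \mid w')\big)\,p(w')}$$
– Bigram applied ($w_h$: the preceding word):
$$C([w,s,e]) = \frac{\max_{[w,s',e'] \in E}\big(p(x_{s'}^{e'} \mid w)\big)\,p(w \mid w_h)}{\sum_{w'}\max_{[w',s',e'] \in E}\big(p(x_{s'}^{e'} \mid w')\big)\,p(w' \mid w_h)}$$
– Forward/backward bigram applied ($w_f$: the following word):
$$C([w,s,e]) = \frac{\max_{[w,s',e'] \in E}\big(p(x_{s'}^{e'} \mid w)\big)\,\{p(w \mid w_h)\,p(w \mid w_f)\}}{\sum_{w'}\max_{[w',s',e'] \in E}\big(p(x_{s'}^{e'} \mid w')\big)\,\{p(w' \mid w_h)\,p(w' \mid w_f)\}}$$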
![Page 30: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/30.jpg)
30
Posterior probability based confidence measure (9/11)
• Impact of word graph density on the quality of posterior probability [9]
$$\mathrm{WGD} = \frac{\text{total number of word arcs}}{\text{number of spoken words}}$$
Baseline 27.3 15.4
![Page 31: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/31.jpg)
31
Posterior probability based confidence measure (10/11)
• Experiments [6]
corpus baseline Cnormal Csec Cmed Cmax
ARISE 13.6 11.5 8.9 8.8 8.9
Verbmobil 27.3 23.3 19.0 20.0 18.9
NAB 20k 11.3 10.3 9.2 9.2 9.2
NAB 64k 9.2 8.4 7.2 7.2 7.2
Broadcast news 27.7 23.7 20.6 20.4 20.6
![Page 32: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/32.jpg)
32
Explicit model based confidence measure (1/10)
• The CM problem is formulated as a statistical hypothesis testing problem.
• Under the framework of binary hypothesis testing, there are two complementary hypotheses:
– $H_0$ (null hypothesis): $X$ is correctly recognized and truly comes from model $W$
– $H_1$ (alternative hypothesis): $X$ is wrongly recognized and is NOT from model $W$
• We test $H_0$ against $H_1$ with a likelihood ratio test (LRT), comparing the ratio with a decision threshold $\tau$:
$$\frac{P(X \mid H_0)}{P(X \mid H_1)} \underset{H_1}{\overset{H_0}{\gtrless}} \tau$$
![Page 33: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/33.jpg)
33
Explicit model based confidence measure (2/10)
• The above LRT score can be transformed to a CM based on a monotonic 1-1 mapping function.
• The major difficulty with LRT is how to model the alternative hypothesis.
• In practice, the same HMM structure is adopted to model the alternative hypothesis.
• A discriminative training procedure plays a crucial role in improving modeling performance.
![Page 34: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/34.jpg)
34
Explicit model based confidence measure (3/10)
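• Two-pass procedure:
– First pass (decoding): observation score $p(x \mid s^c)$, transition score $p(s_j^c \mid s_i^c)$
– Second pass (verification of each decoded word, e.g. 今天 in 今天 天氣 很好):
$$LRT: \frac{P(X_s^e \mid \lambda^c)}{P(X_s^e \mid \lambda^a)}$$
– $\lambda^c$: the correct model of 今天
– $\lambda^a$: the anti-model of 今天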
![Page 35: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/35.jpg)
35
Explicit model based confidence measure (4/10)
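• One-pass procedure
– Observation score:
$$\frac{p(x \mid s^c)}{p(x \mid s^a)}$$
– Transition score:
$$\frac{p(s_j^c \mid s_i^c)}{p(s_j^a \mid s_i^a)}$$
– The decoder searches over paired states $(s_t^c, s_t^a)$, so the ratio for each word (e.g. 今天 in 今天 天氣 很好) is available directly after decoding:
$$LRT: \frac{P(X_s^e \mid \lambda^c)}{P(X_s^e \mid \lambda^a)}$$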
![Page 36: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/36.jpg)
36
Explicit model based confidence measure (5/10)
• How to calculate the confidence of a recognized word?
– The unweighted unit level likelihood score for a subword unit $u$ decoded over a segment of observation vectors, $X_u$, can be obtained as
$$LR(u) = \frac{1}{N_u}\log LR(X_u,\lambda_u^c,\lambda_u^a) = \frac{1}{N_u}\log\frac{P(X_u \mid \lambda_u^c)}{P(X_u \mid \lambda_u^a)}$$
where $N_u$ is the number of frames in the decoded segment.
– To limit the dynamic range, the subword confidence measure is manipulated by a sigmoid function
$$U(u) = \frac{1}{1+\exp\big(-\alpha(\log LR(u) - \beta)\big)}$$
where $\alpha$ defines the slope of the sigmoid function and $\beta$ is a shift.
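A small sketch of the subword score and its sigmoid compression. It takes the frame-normalized log-likelihood ratio as the quantity fed to the sigmoid, which is an assumption of this example; the slope and shift are called `alpha` and `beta` here, and the log-likelihoods are made up.

```python
import math


def unit_score(log_p_correct, log_p_anti, n_frames):
    """Frame-normalized log-likelihood ratio of a subword unit."""
    return (log_p_correct - log_p_anti) / n_frames


def sigmoid_confidence(score, alpha=1.0, beta=0.0):
    """Map a unit score into (0, 1); alpha is the slope, beta the shift."""
    return 1.0 / (1.0 + math.exp(-alpha * (score - beta)))


# Hypothetical log-likelihoods under the correct model and the anti-model.
score = unit_score(-120.0, -150.0, 30)
print(score, sigmoid_confidence(score))
```

The sigmoid keeps a single badly matched unit from dominating an average taken over the units of a word.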
![Page 37: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/37.jpg)
37
Explicit model based confidence measure (6/10)
• How to calculate the confidence of a recognized word (cont)?
– Word level confidence measures corresponding to the unweighted unit level likelihood ratio scores, $LR()$, and the sigmoid weighted unit level likelihood ratio scores, $U()$, are compared.
– The following measures are defined for a word $w_n$ composed of subword units $u_{n,i}$, $i = 1,\ldots,N_n$:
$$W_1(w_n) = \frac{1}{N_n}\sum_{i=1}^{N_n} LR(u_{n,i}) \quad \text{(arithmetic mean of } LR())$$
$$W_2(w_n) = \exp\Big(\frac{1}{N_n}\sum_{i=1}^{N_n}\log LR(u_{n,i})\Big) \quad \text{(geometric mean of } LR())$$
$$W_3(w_n) = \frac{1}{N_n}\sum_{i=1}^{N_n} U(u_{n,i}) \quad \text{(arithmetic mean of } U())$$
$$W_4(w_n) = \exp\Big(\frac{1}{N_n}\sum_{i=1}^{N_n}\log U(u_{n,i})\Big) \quad \text{(geometric mean of } U())$$
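The difference between the arithmetic and geometric combinations is easy to see numerically. The sketch below applies both to hypothetical sigmoid-weighted unit scores, which are strictly positive, so the logarithm is defined.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """exp of the mean log; requires strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical U() scores of the three subword units of one word.
unit_scores = [0.9, 0.8, 0.1]
w3 = arithmetic_mean(unit_scores)
w4 = geometric_mean(unit_scores)
print(w3, w4)
```

The geometric mean is never larger than the arithmetic mean, so one poorly scoring subword pulls the word confidence down more strongly.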
![Page 38: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/38.jpg)
38
Explicit model based confidence measure (7/10)
• Discriminative training [10]
– The goal of the training procedure is to increase the average value of $LR(X,\lambda^c,\lambda^a)$ for correct hypotheses and decrease the average value of $LR(X,\lambda^c,\lambda^a)$ for false acceptances.
– A frame based distance is defined for each state transition ($q_{t-1} = i$, $q_t = j$) in the state sequence obtained by the decoder:
$$r(x_t) = \log\big(a^c_{ij}\,b^c_j(x_t)\big) - \log\big(a^a_{ij}\,b^a_j(x_t)\big)$$
– The segment based distance is obtained by averaging the frame based distances as
$$R_u(X_u) = \frac{1}{t_{u_f} - t_{u_i} + 1}\sum_{t=t_{u_i}}^{t_{u_f}} r(x_t)$$
where $t_{u_i}$ and $t_{u_f}$ are the initial and final frame of the speech segment decoded as unit $u$, over the segment $X_u = \{x_{t_{u_i}},\ldots,x_{t_{u_f}}\}$.
![Page 39: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/39.jpg)
39
Explicit model based confidence measure (8/10)
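• Discriminative training (cont)
– Define the cost function $F(X_u,\lambda_u^c,\lambda_u^a)$ for unit $u$ using a sigmoid function
$$F(X_u,\lambda_u^c,\lambda_u^a) = \frac{1}{1+\exp\big(-(\delta(u)\,R_u(X_u))\big)}$$
where the indicator function $\delta(u)$ is defined as
$$\delta(u) = \begin{cases} -1, & \text{correct} \\ \phantom{-}1, & \text{imposter} \end{cases}$$
– A gradient update is performed on the expected cost $E\{F(X_u,\Lambda_u)\}$:
$$\Lambda_u^{(n+1)} = \Lambda_u^{(n)} - \epsilon\,\nabla E\{F(X_u,\Lambda_u)\}$$
where $\Lambda_u = \{\lambda_u^c,\lambda_u^a\}$ and
$$E\{F(X_u,\Lambda_u)\} \approx \frac{1}{N_u}\sum_{i=1}^{N_u} F(X_{u,i},\Lambda_u)$$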
![Page 40: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/40.jpg)
40
Explicit model based confidence measure (9/10)
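Why does discriminative training work?
$$F(X_u,\lambda_u^c,\lambda_u^a) = \frac{1}{1+\exp\big(-(\delta(u)\,R_u(X_u))\big)}$$
– If $u$ is correct and $R_u(X_u)$ increases, $F(\cdot)$ decreases.
– If $u$ is an imposter and $R_u(X_u)$ decreases, $F(\cdot)$ decreases.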
![Page 41: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/41.jpg)
41
Explicit model based confidence measure (10/10)
• Experiments [10]
• This task, referred to as the “movie locator”,
![Page 42: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/42.jpg)
42
Incorporation of high-level information for CM
![Page 43: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/43.jpg)
43
Incorporation of high-level information for CM (1/4)
• LSA
– The key property of LSA is that words whose vectors are close to each other are semantically similar words.
– These similarities can be used to provide an estimate of the likelihood of the words co-occurring within the same utterance.
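[Figure: decomposition of the word-document matrix $A$ (rows $w_1, w_2, \ldots, w_m$; columns $d_1, d_2, \ldots, d_n$) into $U$ (rows $w_1, w_2, \ldots, w_m$) and $V^T$ (columns $d_1, d_2, \ldots, d_n$).]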
![Page 44: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/44.jpg)
44
Incorporation of high-level information for CM (2/4)
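• LSA (cont)
– The entry of matrix $A$:
$$a_{ij} = (1 - E_i)\,\log\Big(1 + \frac{c_{ij}}{n_j}\Big)$$
$$E_i = -\frac{1}{\log_2 N}\sum_{j=1}^{N} f_{ij}\log_2 f_{ij}, \qquad f_{ij} = \frac{c_{ij}}{t_i}$$
where
– $c_{ij}$: the count of term $i$ in document $j$
– $n_j$: the size of document $j$
– $t_i$: the count of term $i$ in all documents
– The confidence of a recognized word $w_i$:
$$\frac{1}{N}\sum_{j=1}^{N}\mathrm{Cosine}\big(U(w_i),U(w_j)\big)$$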
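The word confidence is an average cosine similarity between the LSA vector of a word and the vectors of the words in the utterance. The sketch below assumes the word vectors have already been obtained from the decomposition; the two-dimensional vectors are invented, and the average runs over all N words of the utterance, including the word itself, as in the formula.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def lsa_confidence(i, vectors):
    """Average cosine between word i and every word of the utterance."""
    return sum(cosine(vectors[i], v) for v in vectors) / len(vectors)


# Hypothetical LSA vectors: two related words and one outlier.
vectors = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print([lsa_confidence(i, vectors) for i in range(len(vectors))])
```

The outlier word receives the lowest confidence because it is semantically far from the rest of the utterance.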
![Page 45: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/45.jpg)
45
Incorporation of high-level information for CM (3/4)
• Inter-word mutual information:
– Assume $N(w_1,w_2)$ denotes the number of co-occurrences of word $w_1$ and word $w_2$ in the training documents; the joint probability $P(w_1,w_2)$ is:
$$P(w_1,w_2) = \frac{N(w_1,w_2)}{\sum_{w_1,w_2} N(w_1,w_2)}$$
– Mutual information between any two words $w_1$ and $w_2$ can be calculated as
$$MI(w_1,w_2) = \log\Big(\frac{p(w_1,w_2)}{p(w_1)\,p(w_2)}\Big)$$
– The CM of each recognized word is calculated as the average mutual information of this word with the remaining recognized words.
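A minimal sketch of the mutual-information confidence. Several choices here are assumptions, not taken from the slides: co-occurrence is counted once per document, word marginals are derived from the pair counts, and unseen pairs receive a fixed floor value. The training documents are toy data.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Symmetric counts N(w1, w2) of words sharing a document."""
    counts = Counter()
    for doc in documents:
        for w1, w2 in combinations(sorted(set(doc)), 2):
            counts[(w1, w2)] += 1
            counts[(w2, w1)] += 1
    return counts


def mutual_information(w1, w2, counts, floor=-10.0):
    total = sum(counts.values())
    joint = counts[(w1, w2)] / total
    if joint == 0.0:
        return floor  # assumed floor for pairs never seen together
    p1 = sum(c for (a, _), c in counts.items() if a == w1) / total
    p2 = sum(c for (a, _), c in counts.items() if a == w2) / total
    return math.log(joint / (p1 * p2))


def word_confidence(i, words, counts):
    """Average MI of word i with the remaining recognized words."""
    others = [w for j, w in enumerate(words) if j != i]
    return sum(mutual_information(words[i], w, counts) for w in others) / len(others)


counts = cooccurrence_counts([["a", "b"], ["a", "b"], ["c", "d"]])
print(mutual_information("a", "b", counts), word_confidence(0, ["a", "b"], counts))
```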
![Page 46: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/46.jpg)
46
Incorporation of high-level information for CM (4/4)
• Experiments [14]
CM Switchboard Mandarin dictation
LSA 44.7 38.5
MI 41.0 33.7
Cmed 24.4 17.5
N-best count 28.3 21.1
MI+Cmed 23.9 16.2
![Page 47: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/47.jpg)
47
The application of CM to improve speech recognition
![Page 48: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/48.jpg)
48
The application of CM to improve speech recognition (1/10)
• Statistical decision theory aims at minimizing the expected cost of making an error
$$[w_n,s_n,e_n]_1^{N*} = \arg\max_{[w_n,s_n,e_n]_1^N} P([w_n,s_n,e_n]_1^N \mid X)$$
[Figure: example word graph.]
![Page 49: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/49.jpg)
49
The application of CM to improve speech recognition (2/10)
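• Method 1 [16]:
$$p([w_n,s_n,e_n]_1^N \mid X) = p([w_n,s_n,e_n]_1^N \mid x_1^T) = \prod_{n=1}^{N} p([w_n,s_n,t_n] \mid [w_n,s_n,t_n]_1^{n-1}, x_1^T) \approx \prod_{n=1}^{N} p([w_n,s_n,t_n] \mid x_1^T)$$
$$[w_n,s_n,e_n]_1^{N*} = \arg\max_{[w_n,s_n,e_n]_1^N} P([w_n,s_n,e_n]_1^N \mid x_1^T)$$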
![Page 50: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/50.jpg)
50
The application of CM to improve speech recognition (3/10)
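• Method 2 [18]:
$$[w_n,s_n,e_n]_1^{N*} = \arg\min_{[w_n,s_n,e_n]_1^N} WER([w_n,s_n,e_n]_1^N \mid X)$$
$$WER([w_n,s_n,e_n]_1^N \mid X) = 1.0 - \frac{1}{N}\sum_{n=1}^{N} P(w_n\ \text{correct}), \qquad P(w_n\ \text{correct}) = P(w_n \mid X)$$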
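Method 2 can be illustrated by rescoring a list of candidate hypotheses with their word posteriors. This is a sketch under the assumption that the candidates come from an N-best list and that a posterior is available for every word; the posteriors below are made up.

```python
def expected_wer(word_posteriors):
    """Estimated WER of a hypothesis: 1 - mean word posterior."""
    return 1.0 - sum(word_posteriors) / len(word_posteriors)


def pick_hypothesis(hypotheses):
    """hypotheses: list of (words, word_posteriors); return the best pair."""
    return min(hypotheses, key=lambda h: expected_wer(h[1]))


hypotheses = [(["臺北", "到", "魚籃"], [0.9, 0.9, 0.3]),
              (["臺北", "到", "宜蘭"], [0.9, 0.9, 0.6])]
best_words, best_post = pick_hypothesis(hypotheses)
print(best_words, expected_wer(best_post))
```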
![Page 51: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/51.jpg)
51
The application of CM to improve speech recognition (4/10)
• Method 3 (Time Frame Error decoding) [17]:
– In minimum Bayes risk decoding
$$[w_n,s_n,e_n]_1^{N*} = \arg\min_{[w_n,s_n,e_n]_1^N}\sum_{[v_m,s_m,e_m]_1^M} C\big([w_n,s_n,e_n]_1^N,[v_m,s_m,e_m]_1^M\big)\,P([v_m,s_m,e_m]_1^M \mid X)$$
– if
$$C\big([w_n,s_n,e_n]_1^N,[v_m,s_m,e_m]_1^M\big) = \begin{cases} 1, & [w_n,s_n,e_n]_1^N \neq [v_m,s_m,e_m]_1^M \\ 0, & [w_n,s_n,e_n]_1^N = [v_m,s_m,e_m]_1^M \end{cases}$$
![Page 52: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/52.jpg)
52
The application of CM to improve speech recognition (5/10)
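• Method 3 (cont)
$$\begin{aligned}
[w_n,s_n,e_n]_1^{N*} &= \arg\min_{[w_n,s_n,e_n]_1^N}\sum_{[v_m,s_m,e_m]_1^M} C\big([w_n,s_n,e_n]_1^N,[v_m,s_m,e_m]_1^M\big)\,P([v_m,s_m,e_m]_1^M \mid X) \\
&= \arg\min_{[w_n,s_n,e_n]_1^N}\sum_{[v_m,s_m,e_m]_1^M \neq [w_n,s_n,e_n]_1^N} 1 \cdot P([v_m,s_m,e_m]_1^M \mid X) \\
&= \arg\min_{[w_n,s_n,e_n]_1^N}\big(1 - P([w_n,s_n,e_n]_1^N \mid X)\big) \\
&= \arg\max_{[w_n,s_n,e_n]_1^N} P([w_n,s_n,e_n]_1^N \mid X)
\end{aligned}$$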
![Page 53: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/53.jpg)
53
The application of CM to improve speech recognition (6/10)
• Method 3 (cont)
– We are now faced with a conceptual mismatch between the decision rule and the evaluation criterion for speech recognizers, the word error rate.
– The easiest way to overcome this mismatch is to use the same cost function as in evaluation: the Levenshtein distance.
– In (Stolcke et al. 1997), the pairwise alignment is restricted to the N-best list.
– Let us assume that substitutions were the only type of error.
• A dynamic programming alignment would thus not be necessary.
![Page 54: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/54.jpg)
54
The application of CM to improve speech recognition (7/10)
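• Method 3 (cont)
– Time frame error cost:
$$C\big([w_n,s_n,e_n]_1^N,[v_m,s_m,e_m]_1^M\big) = \sum_{n=1}^{N}\frac{1}{1+\alpha(e_n - s_n)}\sum_{t=s_n}^{e_n}\big[1 - \delta(w_n,v_t)\big]$$
– $v_t$: the word identity of the word hypothesis $[v_m,s_m,e_m]$ in word sequence $[v_m,s_m,e_m]_1^M$ which intersects time frame $t$.
– If $\alpha = 1$, the maximum cost of every word is 1.
[Figure: two aligned word sequences, 今天 天氣 很好 and 今天 天氣, with maximum costs 3 and 2 respectively.]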
![Page 55: Confidence Measures for Automatic Speech Recognition Presented by Tzan-Hwei Chen National Taiwan Normal University Spoken Language Processing Lab Advisor](https://reader036.vdocument.in/reader036/viewer/2022062305/56649e6a5503460f94b67595/html5/thumbnails/55.jpg)
55
The application of CM to improve speech recognition (8/10)
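• Method 3 (cont)
$$\begin{aligned}
[w_n,s_n,e_n]_1^{N*} &= \arg\min_{[w_n,s_n,e_n]_1^N}\sum_{[v_m,s_m,e_m]_1^M} C\big([w_n,s_n,e_n]_1^N,[v_m,s_m,e_m]_1^M\big)\,P([v_m,s_m,e_m]_1^M \mid X) \\
&= \arg\min_{[w_n,s_n,e_n]_1^N}\sum_{[v_m,s_m,e_m]_1^M}\sum_{n=1}^{N}\sum_{t=s_n}^{e_n}\frac{1 - \delta(w_n,v_t)}{1+\alpha(e_n - s_n)}\,P([v_m,s_m,e_m]_1^M \mid X) \\
&= \arg\min_{[w_n,s_n,e_n]_1^N}\sum_{n=1}^{N}\sum_{t=s_n}^{e_n}\frac{\sum_{[v_m,s_m,e_m]_1^M} P([v_m,s_m,e_m]_1^M \mid x_1^T) - \sum_{[v_m,s_m,e_m]_1^M}\delta(w_n,v_t)\,P([v_m,s_m,e_m]_1^M \mid X)}{1+\alpha(e_n - s_n)} \\
&= \arg\min_{[w_n,s_n,e_n]_1^N}\sum_{n=1}^{N}\sum_{t=s_n}^{e_n}\frac{1 - \sum_{[v_m,s_m,e_m]_1^M}\delta(w_n,v_t)\,P([v_m,s_m,e_m]_1^M \mid X)}{1+\alpha(e_n - s_n)}
\end{aligned}$$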
The application of CM to improve speech recognition (9/10)
• Method 3 (cont)
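$$
\sum_{[v_m,s_m,e_m]_1^M} \delta(w_n, v_{m_t})\, P\big([v_m,s_m,e_m]_1^M \mid X\big)
= \sum_{[v_m,s_m,e_m]_1^M:\ \exists m,\ s_m \le t \le e_m,\ v_m = w_n} P\big([v_m,s_m,e_m]_1^M \mid X\big)
= p(w_n \mid t, X)
$$
$$
\frac{1}{1+\alpha\,(e_n-s_n)} \sum_{t=s_n}^{e_n} \big[1 - p(w_n \mid t, X)\big]
$$
– This per-word term can be interpreted as the normalized probability of word $w_n$ being incorrect.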
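The resulting decision rule can be sketched as below. Several things here are assumptions rather than details taken from the slides: the competing hypotheses come from a small N-best list whose sentence posteriors already sum to 1 (the original work may use word graphs instead), each word hypothesis is a `(word, start_frame, end_frame)` tuple with inclusive frames, and all function names are hypothetical.

```python
from collections import defaultdict


def frame_posteriors(nbest, num_frames):
    """p(w | t, X): summed posterior of all sentences carrying word w on frame t."""
    post = [defaultdict(float) for _ in range(num_frames)]
    for words, sentence_posterior in nbest:
        for word, start, end in words:
            for t in range(start, end + 1):
                post[t][word] += sentence_posterior
    return post


def expected_time_frame_cost(hyp, post, alpha=1.0):
    """Sum over words of the normalised probability of the word being incorrect."""
    total = 0.0
    for word, start, end in hyp:
        wrong = sum(1.0 - post[t].get(word, 0.0) for t in range(start, end + 1))
        total += wrong / (1.0 + alpha * (end - start))
    return total


def min_time_frame_error(nbest, num_frames, alpha=1.0):
    """Pick the candidate with the smallest expected time frame error."""
    post = frame_posteriors(nbest, num_frames)
    candidates = [words for words, _ in nbest]
    return min(candidates, key=lambda h: expected_time_frame_cost(h, post, alpha))


# Toy 3-best list over 10 frames; the posteriors are invented for illustration.
NBEST = [
    ([("a", 0, 4), ("b", 5, 9)], 0.40),
    ([("a", 0, 4), ("c", 5, 9)], 0.35),
    ([("d", 0, 4), ("c", 5, 9)], 0.25),
]
```

In this toy list the single most probable sentence is "a b", yet `min_time_frame_error(NBEST, 10)` selects "a c": word "a" collects 0.75 of the posterior mass on the first five frames and word "c" collects 0.60 on the last five, so "a c" has the lowest expected cost. This is the kind of case in which the rule departs from the maximum posterior decision.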
The application of CM to improve speech recognition (10/10)
• Experiments (Method 3)
| Corpus | Baseline WER (%) | Time frame error WER (%) |
|---|---|---|
| ARISE | 15.8 | 15.0 |
| Verbmobil | 33.6 | 32.5 |
| NAB 20k | 13.2 | 12.9 |
| NAB 64k | 11.1 | 10.8 |
| Broadcast news | 33.3 | 32.3 |
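To put the gains in perspective, the relative improvement per corpus can be computed directly from the figures in the table. The sketch below assumes the numbers are word error rates in percent; the helper name `relative_reduction` is hypothetical.

```python
# (baseline, time frame error) pairs copied from the table above.
RESULTS = {
    "ARISE": (15.8, 15.0),
    "Verbmobil": (33.6, 32.5),
    "NAB 20k": (13.2, 12.9),
    "NAB 64k": (11.1, 10.8),
    "broadcast news": (33.3, 32.3),
}


def relative_reduction(baseline, improved):
    """Relative error reduction in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline


RELATIVE = {
    corpus: round(relative_reduction(base, new), 1)
    for corpus, (base, new) in RESULTS.items()
}
```

Under that assumption the relative reductions lie between roughly 2% and 5%, the largest being on ARISE.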
Summary
• Almost all CMs rely on a single information source: how strongly the underlying decision outscores its possible competitors.
• We believe it is critical to improve the performance of CMs by
– taking this segmentation issue into account.
– deciding a dynamic threshold for each word.
Reference (1/4)
• Main reference
– [1] H. Jiang, “Confidence Measures for Speech Recognition: A Survey”, Speech Communication, 2005.
• Feature based confidence measure
– [2] S. Cox and R. Rose, “Confidence Measures for The Switchboard Database”, ICASSP 1996.
– [3] T. Schaaf and T. Kemp, “Confidence Measures for Spontaneous Speech Recognition”, ICASSP 1997.
– [4] A. Sanchis, A. Juan and E. Vidal, “New Features based on Multiple Word Graphs for Utterance Verification”, ICSLP 2003.
– [5] R. Zhang and A. I. Rudnicky, “Word Level Confidence Annotation Using Combinations of Features”, EuroSpeech 2001.
• Posterior based confidence measure
– [6] F. Wessel, R. Schluter, K. Macherey and H. Ney, “Confidence Measures for Large Vocabulary Continuous Speech Recognition”, IEEE SAP 2001.
– [7] F. K. Soong and W. K. Lo, “Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences”, ICASSP 2005.
Reference (2/4)
• Posterior based confidence measure
– [8] J. Razik, O. Mella, D. Fohr and J.-P. Haton, “Local Word Confidence Measure Using Word Graph and N-Best List”.
– [9] T. Fabian, R. Lieb, G. Ruske and M. Thomae, “Impact of Word Graph Density on the Quality of Posterior Probability Based Confidence Measures”.
• Explicit model based confidence measure
– [10] E. Lleida and R. C. Rose, “Utterance Verification in Continuous Speech Recognition: Decoding and Training Procedures”, IEEE SAP 2000.
– [11] M. G. Rahim and C.-H. Lee, “String-based Minimum Verification Error (SB-MVE) Training for Speech Recognition”, Computer Speech and Language, 1997.
– [12] H. Jiang, F. K. Soong and C.-H. Lee, “A Dynamic In-Search Data Selection Method With Its Applications to Acoustic Modeling and Utterance Verification”, IEEE SAP 2005.
Reference (3/4)
• Incorporation of high-level information for CM
– [13] R. Sarikaya, Y. Gao, M. Picheny and H. Erdogan, “Semantic Confidence Measurement for Spoken Dialog Systems”, IEEE SAP 2005.
– [14] G. Guo, C. Huang, H. Jiang and R.-H. Wang, “A Comparative Study on Various Confidence Measures in Large Vocabulary Speech Recognition”, ISCSLP 2004.
• Some application for CM
– [15] M. Afify, F. Liu, H. Jiang and O. Siohan, “A New Verification-based Fast-Match for Large Vocabulary Continuous Speech Recognition”, IEEE SAP 2005.
– [16] F. Wessel, R. Schluter and H. Ney, “Using Posterior Word Probabilities for Improved Speech Recognition”, ICASSP 2000.
– [17] F. Wessel, R. Schluter and H. Ney, “Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities”, ICASSP 2001.
Reference (4/4)
• Some application for CM
– [18] A. Kobayashi, K. Onoe, S. Sato and T. Imai, “Word Error Minimization Using an Integrate Confidence Measure”, INTERSPEECH 2005.
– [19] Y. Qian, T. Lee and F. K. Soong, “Tone Information as a Confidence Measure for Improving Cantonese LVCSR”, ICSLP 2004.