Sausage
Lidia Mangu, Eric Brill, Andreas Stolcke
Presenter: Jen-Wei Kuo
2004/9/24
References
• CSL '00: Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
• Eurospeech '99: Finding Consensus among Words: Lattice-Based Word Error Minimization
• Eurospeech '97: Explicit Word Error Minimization in N-Best List Rescoring
Motivation
• The mismatch between the standard scoring paradigm (MAP) and the evaluation metric (WER).
W_MAP = argmax_W P(W|A) = argmax_W P(W) P(A|W) / P(A)

Maximizing the sentence posterior probability minimizes the sentence-level error, not the word-level error that WER measures.
An Example
Correct answer: I'M DOING FINE

E[correct words(w1 w2 w3) | A] = E[correct(w1)|A] + E[correct(w2)|A] + E[correct(w3)|A]
                               = P(w1|A) + P(w2|A) + P(w3|A)
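With hypothetical slot posteriors (the values below are invented for illustration), the per-word decomposition of expected correctness can be computed directly:

```python
# Illustrative sketch: the expected number of correct words in a
# hypothesis decomposes into a sum of per-word posteriors.
posteriors = {               # P(word at slot i | A), made-up values
    0: {"I'M": 0.6, "I": 0.4},
    1: {"DOING": 0.5, "DUE": 0.3, "THE": 0.2},
    2: {"FINE": 0.7, "FIND": 0.3},
}

def expected_correct(words):
    # E[#correct | A] = sum_i P(w_i | A)
    return sum(posteriors[i].get(w, 0.0) for i, w in enumerate(words))

# Picking the argmax word per slot maximizes expected word correctness.
best = [max(slot, key=slot.get) for slot in posteriors.values()]
print(best)
print(expected_correct(["I'M", "DOING", "FINE"]))
```

Note how the per-slot argmax hypothesis need not coincide with the MAP sentence, which is exactly the mismatch motivating consensus decoding.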
Word Error Minimization
• Minimizing the expected word error under the posterior distribution
min_W E_{R ~ P(R|A)}[WE(W, R)] = min_W Σ_R P(R|A) WE(W, R)

where R ranges over the potential hypotheses.
N-best Approximation

W_c = argmin_{1 ≤ i ≤ N} Σ_{k=1..N} P(W^(k)|A) WE(W^(i), W^(k))

W_c is called the center hypothesis.
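The center-hypothesis search can be sketched as follows; the N-best list and posterior values are invented for illustration:

```python
# Minimal sketch of the N-best center hypothesis: among the N hypotheses,
# pick the one minimizing expected word error against the
# posterior-weighted list.
def word_error(ref, hyp):
    # Standard word-level Levenshtein distance.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

nbest = [                        # (hypothesis, posterior P(W|A)), made up
    ("i'm doing fine".split(), 0.5),
    ("i'm due in fine".split(), 0.3),
    ("i am doing fine".split(), 0.2),
]

def center(nbest):
    # W_c = argmin_i sum_k P(W^(k)|A) * WE(W^(i), W^(k))
    return min(nbest, key=lambda wi: sum(
        p * word_error(wi[0], wk) for wk, p in nbest))[0]

print(center(nbest))
```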
Lattice-Based Word Error Minimization
• Computational problem
  – The search space is several orders of magnitude larger than in N-best lists of practical size.
  – No efficient algorithm of this kind is known.
• Fundamental difficulty
  – The objective function is based on pairwise string distance, a nonlocal measure.
• Solution
  – Replace pairwise string alignment with a modified multiple string alignment.
  – WE (word error) → MWE (modified word error)
Lattice to Confusion Network
Multiple Alignment
• Finding the optimal alignment is a problem for which no efficient solution is known (Gusfield, 1992)
• We resort to a heuristic approach based on lattice topology.
Algorithms
• Step 1. Arc Pruning
• Step 2. Same-Arc Clustering
• Step 3. Intra-Word Clustering
• Step 4*. Same-Phones Clustering
• Step 5. Inter-Word Clustering
• Step 6. Adding the Null Hypothesis
• Step 7. Consensus-Based Lattice Pruning
Arc Pruning
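A minimal sketch of this step, with made-up arcs; a simple absolute posterior cutoff stands in here for whatever pruning criterion the system actually uses, so both the arc format and the threshold are assumptions:

```python
# Hedged sketch of Step 1 (arc pruning): drop lattice arcs whose
# posterior probability falls below a threshold.
arcs = [
    {"word": "i'm", "start": 0, "end": 30, "post": 0.60},
    {"word": "i",   "start": 0, "end": 20, "post": 0.35},
    {"word": "um",  "start": 0, "end": 25, "post": 0.02},
]

def prune(arcs, threshold=0.05):
    # Keep only arcs with enough posterior mass; illustrative threshold.
    return [a for a in arcs if a["post"] >= threshold]

print([a["word"] for a in prune(arcs)])
```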
Intra-Word Clustering
• Same-Arc Clustering
  – Arcs with the same word_id, start frame, and end frame are merged first.
• Intra-Word Clustering
  – Arcs with the same word_id are then merged.

Intra_SIM(E1, E2) = max_{e1 ∈ E1, e2 ∈ E2} overlap(e1, e2) p(e1) p(e2),  with WID(E1) = WID(E2)

where overlap(e1, e2) is the time overlap between e1 and e2, normalized by the sum of their lengths, and p(e) is the posterior probability of arc e.
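One way to compute this similarity, with made-up arc times and posteriors:

```python
# Sketch of the intra-word similarity: time overlap of two arcs,
# normalized by the sum of their lengths, weighted by their posteriors.
def overlap(e1, e2):
    # Overlap length normalized by the sum of the two arc lengths.
    ov = max(0, min(e1["end"], e2["end"]) - max(e1["start"], e2["start"]))
    return ov / ((e1["end"] - e1["start"]) + (e2["end"] - e2["start"]))

def intra_sim(E1, E2):
    # Intra_SIM(E1, E2) = max over arc pairs of overlap(e1,e2) p(e1) p(e2)
    return max(overlap(e1, e2) * e1["post"] * e2["post"]
               for e1 in E1 for e2 in E2)

E1 = [{"start": 0, "end": 30, "post": 0.6}]   # illustrative values
E2 = [{"start": 10, "end": 40, "post": 0.3}]
print(intra_sim(E1, E2))
```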
Same-Phones Clustering
• Same-Phones Clustering
  – Arcs with the same phone sequence are clustered at this stage.

Phone_SIM(E1, E2) = max_{e1 ∈ E1, e2 ∈ E2} overlap(e1, e2) p(e1) p(e2),  with WID(e1) ≠ WID(e2) but phone_sequence(e1) = phone_sequence(e2)
Inter-Word Clustering
• Inter-Word Clustering
  – The remaining arcs are clustered at this final stage.

Inter_SIM(F1, F2) = avg_{w1 ∈ Words(F1), w2 ∈ Words(F2)} sim(w1, w2) p(w1) p(w2)

where p(w) = Σ_{e ∈ F : Words(e) = w} p(e), and sim(w1, w2) is 1 minus the edit distance of the phone sequences, normalized by the sum of their lengths.
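A sketch of the phonetic similarity term, using hypothetical phone sequences for "fine" and "find":

```python
# sim(w1, w2) = 1 - phone edit distance / (len(p1) + len(p2))
def edit_distance(a, b):
    # Standard Levenshtein distance over phone symbols.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[len(a)][len(b)]

def phone_sim(p1, p2):
    # 1 minus the edit distance, normalized by the sum of the lengths.
    return 1 - edit_distance(p1, p2) / (len(p1) + len(p2))

print(phone_sim(["f", "ay", "n"], ["f", "ay", "n", "d"]))
```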
Adding null hypothesis
• For each equivalence class, if the sum of the posterior probabilities is less than a threshold (0.6), then the null hypothesis (a deletion arc) is added to the class.
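As a sketch, assuming each class is stored as a word-to-posterior map (the 0.6 threshold is the one quoted on the slide; everything else is illustrative):

```python
# Step 6 sketch: if a class's posteriors sum to less than the threshold,
# a null (deletion) hypothesis absorbs the remaining probability mass.
def add_null(word_posteriors, threshold=0.6):
    total = sum(word_posteriors.values())
    if total < threshold:
        word_posteriors = dict(word_posteriors)
        word_posteriors["-"] = 1.0 - total   # "-" marks the null arc
    return word_posteriors

print(add_null({"doing": 0.3, "due": 0.2}))   # null hypothesis added
print(add_null({"fine": 0.7}))                # left unchanged
```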
Consensus-based Lattice Pruning
• Standard method: likelihood-based
  – Paths whose overall score differs by more than a threshold from the best-scoring path are removed from the word graph.
• Proposed method: consensus-based
  – First, construct a pruned confusion network.
  – Then intersect the original lattice with the pruned confusion network.
Algorithm
An Example
• How do we merge?

[Lattice figure: competing arcs 我 (I), 是 (am), 誰 (who), and the multi-word hypotheses 我是 (I am) and 我是我 (I am me).]
Computational Issues
• Partial order, naive method:
  – History-based look-ahead
    • Apply a first-pass search to find the history arcs of each arc, generating the initial partial ordering.
    • As clusters are merged, many (recursive) updates are needed.
    • Thousands of arcs require a lot of memory.
Computational Issues – An example
[Lattice figure: nodes A through N with the history set of each arc.]

If we merge B and C, what happens?
Experimental Set-up
• Lattices were built using HTK.
• Training corpus
  – Acoustic models trained on about 60 hours of Switchboard speech.
  – The LM is a backoff trigram model trained on 2.2 million words of Switchboard transcripts.
• Testing corpus
  – Test set in the 1997 JHU
Experimental Results
Hypothesis                                  | Set I WER | Set I SER | Set II WER
MAP                                         | 38.5      | 65.3      | 42.9
N-best (center)                             | 37.9      | 65.6      | 42.3
N-best (consensus)                          | 37.6      |           |
Lattice (consensus)                         | 37.3      | 65.8      | 41.6
Lattice (consensus w/o phonetic similarity) | 37.5      |           |
Lattice (consensus w/o posteriors)          | 37.6      |           |
Experimental Results
Hypothesis          | F0   | F1   | F2   | F3   | F4   | F5   | FX   | Overall | Short utt. | Long utt.
MAP                 | 13.0 | 30.8 | 42.1 | 31.0 | 22.8 | 52.3 | 53.9 | 33.1    | 33.3       | 31.5
N-best (center)     | 13.0 | 30.6 | 42.1 | 31.1 | 22.6 | 52.4 | 53.9 | 33.0    |            |
Lattice (consensus) | 11.9 | 30.5 | 42.1 | 30.7 | 22.3 | 51.8 | 52.7 | 32.5    | 33.0       | 32.5
Confusion Network Analyses
Other Approaches
• ROVER (Recognizer Output Voting Error Reduction)