Sausage
Lidia Mangu, Eric Brill, Andreas Stolcke
Presenter: Jen-Wei Kuo
2004/9/24
References
• CSL '00: Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks
• Eurospeech '99: Finding Consensus among Words: Lattice-Based Word Error Minimization
• Eurospeech '97: Explicit Word Error Minimization in N-Best List Rescoring
Motivation
• The mismatch between the standard scoring paradigm (MAP) and the evaluation metric (WER).
W_MAP = argmax_W P(W|A) = argmax_W P(W) P(A|W) / P(A)

Maximizing the sentence posterior probability minimizes the sentence-level error, not the word-level error that WER measures.
An Example
Correct answer: I'M DOING FINE

E[correct words(w1 w2 w3) | A] = E[correct(w1)|A] + E[correct(w2)|A] + E[correct(w3)|A]
                               = P(w1|A) + P(w2|A) + P(w3|A)
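With hypothetical slot posteriors (the values below are invented for illustration), the per-word decomposition of expected correctness can be computed directly:

```python
# Illustrative sketch: the expected number of correct words in a
# hypothesis decomposes into a sum of per-word posteriors.
posteriors = {               # P(word at slot i | A), made-up values
    0: {"I'M": 0.6, "I": 0.4},
    1: {"DOING": 0.5, "DUE": 0.3, "THE": 0.2},
    2: {"FINE": 0.7, "FIND": 0.3},
}

def expected_correct(words):
    # E[#correct | A] = sum_i P(w_i | A)
    return sum(posteriors[i].get(w, 0.0) for i, w in enumerate(words))

# Picking the argmax word per slot maximizes expected word correctness.
best = [max(slot, key=slot.get) for slot in posteriors.values()]
print(best)
print(expected_correct(["I'M", "DOING", "FINE"]))
```

Note how the per-slot argmax hypothesis need not coincide with the MAP sentence, which is exactly the mismatch motivating consensus decoding.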
Word Error Minimization
• Minimizing the expected word error under the posterior distribution
min_W E_{R ~ P(R|A)}[WE(W, R)] = min_W Σ_R P(R|A) WE(W, R)

where R ranges over the potential hypotheses.
N-best Approximation

W_c = argmin_{1 ≤ i ≤ N} Σ_{k=1..N} P(W^(k)|A) WE(W^(i), W^(k))

W_c is called the center hypothesis.
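The center-hypothesis search can be sketched as follows; the N-best list and posterior values are invented for illustration:

```python
# Minimal sketch of the N-best center hypothesis: among the N hypotheses,
# pick the one minimizing expected word error against the
# posterior-weighted list.
def word_error(ref, hyp):
    # Standard word-level Levenshtein distance.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

nbest = [                        # (hypothesis, posterior P(W|A)), made up
    ("i'm doing fine".split(), 0.5),
    ("i'm due in fine".split(), 0.3),
    ("i am doing fine".split(), 0.2),
]

def center(nbest):
    # W_c = argmin_i sum_k P(W^(k)|A) * WE(W^(i), W^(k))
    return min(nbest, key=lambda wi: sum(
        p * word_error(wi[0], wk) for wk, p in nbest))[0]

print(center(nbest))
```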
Lattice-Based Word Error Minimization
• Computational problem
  – The search space is several orders of magnitude larger than in N-best lists of practical size.
  – No efficient algorithm of this kind is known.
• Fundamental difficulty
  – The objective function is based on pairwise string distance, a nonlocal measure.
• Solution
  – Replace pairwise string alignment with a modified multiple string alignment.
  – WE (word error) → MWE (modified word error)
Lattice to Confusion Network
Multiple Alignment
• Finding the optimal alignment is a problem for which no efficient solution is known (Gusfield, 1992)
• We resort to a heuristic approach based on lattice topology.
Algorithms
• Step 1. Arc Pruning
• Step 2. Same-Arc Clustering
• Step 3. Intra-Word Clustering
• Step 4*. Same-Phones Clustering
• Step 5. Inter-Word Clustering
• Step 6. Adding the Null Hypothesis
• Step 7. Consensus-Based Lattice Pruning
Arc Pruning
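A minimal sketch of this step, with made-up arcs; a simple absolute posterior cutoff stands in here for whatever pruning criterion the system actually uses, so both the arc format and the threshold are assumptions:

```python
# Hedged sketch of Step 1 (arc pruning): drop lattice arcs whose
# posterior probability falls below a threshold.
arcs = [
    {"word": "i'm", "start": 0, "end": 30, "post": 0.60},
    {"word": "i",   "start": 0, "end": 20, "post": 0.35},
    {"word": "um",  "start": 0, "end": 25, "post": 0.02},
]

def prune(arcs, threshold=0.05):
    # Keep only arcs with enough posterior mass; illustrative threshold.
    return [a for a in arcs if a["post"] >= threshold]

print([a["word"] for a in prune(arcs)])
```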
Intra-Word Clustering
• Same-Arc Clustering
  – Arcs with the same word_id, start frame, and end frame are merged first.
• Intra-Word Clustering
  – Arcs with the same word_id are then merged.

Intra_SIM(E1, E2) = max_{e1 ∈ E1, e2 ∈ E2} overlap(e1, e2) p(e1) p(e2),  with WID(E1) = WID(E2)

where overlap(e1, e2) is the time overlap between e1 and e2, normalized by the sum of their lengths, and p(e) is the posterior probability of arc e.
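One way to compute this similarity, with made-up arc times and posteriors:

```python
# Sketch of the intra-word similarity: time overlap of two arcs,
# normalized by the sum of their lengths, weighted by their posteriors.
def overlap(e1, e2):
    # Overlap length normalized by the sum of the two arc lengths.
    ov = max(0, min(e1["end"], e2["end"]) - max(e1["start"], e2["start"]))
    return ov / ((e1["end"] - e1["start"]) + (e2["end"] - e2["start"]))

def intra_sim(E1, E2):
    # Intra_SIM(E1, E2) = max over arc pairs of overlap(e1,e2) p(e1) p(e2)
    return max(overlap(e1, e2) * e1["post"] * e2["post"]
               for e1 in E1 for e2 in E2)

E1 = [{"start": 0, "end": 30, "post": 0.6}]   # illustrative values
E2 = [{"start": 10, "end": 40, "post": 0.3}]
print(intra_sim(E1, E2))
```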
Same-Phones Clustering
• Same-Phones Clustering
  – Arcs with the same phone sequence are clustered at this stage.

Phone_SIM(E1, E2) = max_{e1 ∈ E1, e2 ∈ E2} overlap(e1, e2) p(e1) p(e2),  with WID(e1) ≠ WID(e2) but phone_sequence(e1) = phone_sequence(e2)
Inter-Word Clustering
• Inter-Word Clustering
  – The remaining arcs are clustered at this final stage.

Inter_SIM(F1, F2) = avg_{w1 ∈ Words(F1), w2 ∈ Words(F2)} sim(w1, w2) p(w1) p(w2)

where p(w) = Σ_{e ∈ F : Words(e) = w} p(e), and sim(w1, w2) is 1 minus the edit distance of the phone sequences, normalized by the sum of their lengths.
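A sketch of the phonetic similarity term, using hypothetical phone sequences for "fine" and "find":

```python
# sim(w1, w2) = 1 - phone edit distance / (len(p1) + len(p2))
def edit_distance(a, b):
    # Standard Levenshtein distance over phone symbols.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[len(a)][len(b)]

def phone_sim(p1, p2):
    # 1 minus the edit distance, normalized by the sum of the lengths.
    return 1 - edit_distance(p1, p2) / (len(p1) + len(p2))

print(phone_sim(["f", "ay", "n"], ["f", "ay", "n", "d"]))
```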
Adding null hypothesis
• For each equivalence class, if the sum of the posterior probabilities is less than a threshold (0.6), then the null hypothesis (a deletion arc) is added to the class.
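As a sketch, assuming each class is stored as a word-to-posterior map (the 0.6 threshold is the one quoted on the slide; everything else is illustrative):

```python
# Step 6 sketch: if a class's posteriors sum to less than the threshold,
# a null (deletion) hypothesis absorbs the remaining probability mass.
def add_null(word_posteriors, threshold=0.6):
    total = sum(word_posteriors.values())
    if total < threshold:
        word_posteriors = dict(word_posteriors)
        word_posteriors["-"] = 1.0 - total   # "-" marks the null arc
    return word_posteriors

print(add_null({"doing": 0.3, "due": 0.2}))   # null hypothesis added
print(add_null({"fine": 0.7}))                # left unchanged
```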
Consensus-based Lattice Pruning
• Standard method: likelihood-based
  – Paths whose overall score differs by more than a threshold from the best-scoring path are removed from the word graph.
• Proposed method: consensus-based
  – First, construct a pruned confusion network.
  – Then intersect the original lattice with the pruned confusion network.
Algorithm
An Example
• How do we merge?

[Lattice figure: competing arcs 我 (I), 是 (am), 誰 (who), and the multi-word hypotheses 我是 (I am) and 我是我 (I am me).]
Computational Issues
• Partial order, naive method:
  – History-based look-ahead
    • Apply a first-pass search to find the history arcs of each arc, generating the initial partial ordering.
    • As clusters are merged, many (recursive) updates are needed.
    • Thousands of arcs require a lot of memory.
Computational Issues – An example
[Lattice figure: nodes A through N with the history set of each arc.]

If we merge B and C, what happens?
Experimental Set-up
• Lattices were built using HTK.
• Training corpus
  – Acoustic models trained on about 60 hours of Switchboard speech.
  – The LM is a backoff trigram model trained on 2.2 million words of Switchboard transcripts.
• Testing corpus
  – Test set in the 1997 JHU
Experimental Results
Hypothesis                                  | Set I WER | Set I SER | Set II WER
MAP                                         | 38.5      | 65.3      | 42.9
N-best (center)                             | 37.9      | 65.6      | 42.3
N-best (consensus)                          | 37.6      |           |
Lattice (consensus)                         | 37.3      | 65.8      | 41.6
Lattice (consensus w/o phonetic similarity) | 37.5      |           |
Lattice (consensus w/o posteriors)          | 37.6      |           |
Experimental Results
Hypothesis          | F0   | F1   | F2   | F3   | F4   | F5   | FX   | Overall | Short utt. | Long utt.
MAP                 | 13.0 | 30.8 | 42.1 | 31.0 | 22.8 | 52.3 | 53.9 | 33.1    | 33.3       | 31.5
N-best (center)     | 13.0 | 30.6 | 42.1 | 31.1 | 22.6 | 52.4 | 53.9 | 33.0    |            |
Lattice (consensus) | 11.9 | 30.5 | 42.1 | 30.7 | 22.3 | 51.8 | 52.7 | 32.5    | 33.0       | 32.5
Confusion Network Analyses
Other Approaches
• ROVER (Recognizer Output Voting Error Reduction)