
Page 1

Utterance classification with discriminative language modeling
Murat Saraclar, Brian Roark
Speech Communication, 2006

Bang-Xuan Huang
Department of Computer Science & Information Engineering
National Taiwan Normal University

Page 2


Outline

• Introduction

• Method

• Experimental results

• Discussion

Page 3

Introduction


• Reductions in word-error rate (WER) are critical for applications making use of automatic speech recognition (ASR), but the key objective of a particular application may be different.

• In this paper, we perform a range of experiments investigating the benefit of simultaneous optimization of parameters for both WER and class-error rate (CER) reductions.

• We demonstrate that simultaneously optimizing parameter weights for both n-gram and class features provides significant reductions in both WER and CER.

Page 4

Method


Page 5

Method

Feature definitions
• The feature set investigated in the current paper includes:

(1) the scaled cost given by the baseline ASR system, i.e. log P(A, W);

(2) unigram, bigram and trigram counts in the utterance, e.g. C(w1, w2, w3);

(3) the utterance class cl; and

(4) class-specific unigram and bigram counts, e.g. C(cl, w1, w2).
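To make the four feature types concrete, here is a minimal sketch of how one candidate's sparse feature vector might be assembled. The function name, the dict-of-counts representation, and the tuple keys are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

def extract_features(log_p_aw, words, cl):
    """Illustrative sparse feature map for one (acoustics, word string, class)
    candidate: baseline ASR cost, n-gram counts, class, class-specific n-grams."""
    feats = defaultdict(float)
    feats[("asr_cost",)] = log_p_aw                      # feature (1): log P(A, W)
    for n in (1, 2, 3):                                  # feature (2): n-gram counts
        for i in range(len(words) - n + 1):
            feats[("ngram",) + tuple(words[i:i + n])] += 1.0
    feats[("class", cl)] = 1.0                           # feature (3): utterance class
    for n in (1, 2):                                     # feature (4): class n-grams
        for i in range(len(words) - n + 1):
            feats[("class_ngram", cl) + tuple(words[i:i + n])] += 1.0
    return feats
```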

Page 6

Method

• A separate n-gram model would be estimated for each class, and the initial class-labeled arc would have a prior probability of the class, P(C).

• The class ĉ(W) is then selected for word string W that maximizes the joint probability of class and word string, i.e.

$$\hat{c}(W) = \arg\max_{C} P(C, W) = \arg\max_{C} P(C)\, P(W \mid C)$$
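A hedged sketch of this decision rule, assuming class priors in a dict `class_priors` and a callable `class_lm[c]` returning the class-conditional log-probability log P(W | C=c); both names are hypothetical.

```python
import math

def classify(words, class_priors, class_lm):
    """Pick the class maximizing P(C) * P(W | C), working in log space.
    class_priors and class_lm are hypothetical inputs; priors assumed > 0."""
    best_class, best_score = None, -math.inf
    for c, prior in class_priors.items():
        score = math.log(prior) + class_lm[c](words)  # log P(C) + log P(W | C)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```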

Page 7

• We can control parameter updates to suit a particular objective by choosing the relevant features, or by choosing the gold-standard annotation so that only features related to that objective are updated.

• Here we have two main objectives: producing an accurate transcription and choosing the correct class for each utterance.

• The parameters can be independently or simultaneously optimized to achieve either one or both of the objectives.

• Gd: the correct class and minimum-error-rate word string for the utterance x

• 1B: the one-best annotation from L_c ∘ N_c ∘ l for the current models

• c(Gd): class label of the annotation Gd

• w(Gd): word transcript of the annotation Gd

Page 8

Objective: classification error reduction
• ignore: the transcription
• y = c(Gd) w(1B)
• feature sets: (3) and (4)
• all bigram counts come only from the best-scoring transcription

Objective: word-error rate reduction
• ignore: the class label
• y = c(1B) w(Gd)
• feature sets: (1), (2), (4)
• class labels come only from the best-scoring hypothesis
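As a sketch of how the gold annotation y might be composed from the Gd and 1B pieces defined above, one per training objective; the function and argument names are hypothetical:

```python
def make_gold(objective, gd_class, gd_words, ob_class, ob_words):
    """Compose the gold annotation y per training objective: keep only the
    part of the reference (Gd) that the objective cares about, and take the
    rest from the current one-best (1B) so its features receive no update."""
    if objective == "classification":     # ignore transcription errors
        return (gd_class, ob_words)       # y = c(Gd) w(1B)
    elif objective == "transcription":    # ignore class-label errors
        return (ob_class, gd_words)       # y = c(1B) w(Gd)
    else:                                 # simultaneous optimization
        return (gd_class, gd_words)       # y = Gd: correct class and transcript
```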

Page 9

Perceptron

$$F_{Perc}(\bar{\alpha}) = \frac{1}{2} \sum_{i=1}^{L} \left\| \Phi(x_i, y_i) - \Phi(x_i, z_i) \right\|^2$$

Taking the partial derivative with respect to $\bar{\alpha}$ gives the per-utterance update term:

$$\Phi(x_i, y_i) - \Phi(x_i, z_i)$$
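A minimal sketch of the corresponding structured-perceptron pass, assuming sparse dict feature vectors as above; `decode` is a hypothetical helper returning the current one-best annotation z_i under the weights.

```python
def perceptron_epoch(alpha, data, decode, phi):
    """One pass of the structured perceptron. For each utterance x_i with gold
    annotation y_i, decode the current one-best z_i and move the weights
    toward Phi(x_i, y_i) and away from Phi(x_i, z_i).
    decode() and phi() are hypothetical helpers."""
    for x_i, y_i in data:
        z_i = decode(x_i, alpha)          # current best under weights alpha
        if z_i != y_i:
            for f, v in phi(x_i, y_i).items():
                alpha[f] = alpha.get(f, 0.0) + v
            for f, v in phi(x_i, z_i).items():
                alpha[f] = alpha.get(f, 0.0) - v
    return alpha
```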

Page 10

GCLM

$$F_{GCLM}(\bar{\alpha}) = \sum_{i=1}^{L} \log P_{\bar{\alpha}}(y_i \mid x_i) = \sum_{i=1}^{L} \log \frac{\exp(\bar{\alpha} \cdot \Phi(x_i, y_i))}{\sum_{j=1}^{M_i} \exp(\bar{\alpha} \cdot \Phi(x_i, y_{i,j}))} \quad (1)$$

With a Gaussian regularizer:

$$F_{GCLM}(\bar{\alpha}) = \sum_{i=1}^{L} \log \frac{\exp(\bar{\alpha} \cdot \Phi(x_i, y_i))}{\sum_{j=1}^{M_i} \exp(\bar{\alpha} \cdot \Phi(x_i, y_{i,j}))} - \frac{\|\bar{\alpha}\|^2}{2\sigma^2} \quad (2)$$

Taking the partial derivative with respect to $\bar{\alpha}$:

$$\frac{\partial F_{GCLM}}{\partial \bar{\alpha}} = \sum_{i=1}^{L} \left( \Phi(x_i, y_i) - \sum_{j=1}^{M_i} P_{\bar{\alpha}}(y_{i,j} \mid x_i)\, \Phi(x_i, y_{i,j}) \right) - \frac{\bar{\alpha}}{\sigma^2}$$
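A hedged sketch of this gradient, computed as observed gold features minus model-expected features over each utterance's candidate list, plus the regularizer term; the data layout and the `phi` helper are assumptions, not the paper's code.

```python
import math
from collections import defaultdict

def gclm_gradient(alpha, data, phi, sigma2):
    """Gradient of the regularized conditional log-likelihood (2).
    data yields (x_i, y_i, candidates), where candidates is the non-empty
    list of hypotheses y_{i,j}; phi() is a hypothetical feature helper."""
    grad = defaultdict(float)
    for x_i, y_i, candidates in data:
        for f, v in phi(x_i, y_i).items():       # observed: Phi(x_i, y_i)
            grad[f] += v
        scores = [sum(alpha.get(f, 0.0) * v for f, v in phi(x_i, y).items())
                  for y in candidates]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(exps)
        for y, e in zip(candidates, exps):
            p = e / z                             # P_alpha(y | x_i)
            for f, v in phi(x_i, y).items():
                grad[f] -= p * v                  # expected features
    for f, w in alpha.items():
        grad[f] -= w / sigma2                     # regularizer: -alpha / sigma^2
    return grad
```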

Page 11

Experimental results

• Baseline results


Page 12

Experimental results

• Perceptron


Page 13

Experimental results

• GCLM


Page 14

Experimental results

• This paper has investigated the effect of joint discriminative modeling on two objectives, classification and transcription error.

• On the classification side, there are three potential factors leading to the best performance:

(1) improvements in word accuracy provided by the model;

(2) delaying the extraction of the 1-best hypothesis, i.e. using word-lattice input rather than strings; and

(3) simultaneously optimizing parameters for classification and transcription error reduction.

• In the multi-label scenario, the first two factors provide a 1% reduction independently and a 1.6% total reduction when combined.

• Simultaneous optimization does not provide further improvement over the two factors.

Page 15

Experimental results

• Simultaneous optimization of the parameters gives us as much improvement in word accuracy as we could get knowing the true class of the utterance, without penalizing TCErr.

• Using global conditional log-linear models further improved the performance.

• Independently optimizing the parameters yielded a 0.5% improvement in word accuracy and a 0.8% improvement in classification error over the perceptron algorithm.
