•url: .../publications/courses/ece_8423/lectures/current/lecture_28


Post on 23-Feb-2016



ECE 8443 – Pattern Recognition
ECE 8423 – Adaptive Signal Processing

• Objectives: Language Modeling in ASR; Discriminative Feature Mapping; Example System; Course Evaluations

• Resources: MB: Unsupervised LM Adaptation; RS: Statistical Language Modeling; DP: Discriminatively Trained Features; AS: Discriminative Adaptation; IBM: GALE Mandarin

• URL: .../publications/courses/ece_8423/lectures/current/lecture_28.ppt
• MP3: .../publications/courses/ece_8423/lectures/current/lecture_28.mp3

LECTURE 28: STATE OF THE ART


ECE 8423: Lecture 28, Slide 2

Statistical Approach To Speech Recognition


ECE 8423: Lecture 28, Slide 3

Speech Recognition Architectures

Core components:
• transduction
• feature extraction
• acoustic modeling (hidden Markov models)
• language modeling (statistical N-grams)
• search (Viterbi beam)
• knowledge sources

Our focus will be on the acoustic modeling components of the system.


ECE 8423: Lecture 28, Slide 4

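Statistical Language Modeling: N-Gram Models

• The probability of a word sequence, $W = w_1 w_2 w_3 \ldots w_n$, can be decomposed as:

$$P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_1 w_2 \ldots w_{i-1})$$

• Clearly, estimating this for every unique word history is prohibitive. A practical approach is to assume that this probability depends only on an equivalence class $\Phi(\cdot)$ of the history:

$$P(W) \approx \prod_{i=1}^{n} P\big(w_i \mid \Phi(w_1 w_2 \ldots w_{i-1})\big)$$

• There are three common simplifications, known as N-grams:

$$\text{Unigram: } P(w_i \mid w_1 w_2 \ldots w_{i-1}) \approx P(w_i)$$
$$\text{Bigram: } P(w_i \mid w_1 w_2 \ldots w_{i-1}) \approx P(w_i \mid w_{i-1})$$
$$\text{Trigram: } P(w_i \mid w_1 w_2 \ldots w_{i-1}) \approx P(w_i \mid w_{i-2} w_{i-1})$$

• There are many other ways to merge histories, for example by linguistic context (e.g., parts of speech such as article or noun), and higher-order N-grams can also be used.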
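As a minimal sketch of the bigram case, the Python below estimates maximum-likelihood bigram probabilities from a toy corpus and scores a sentence with the chain rule. The `<s>`/`</s>` boundary tokens, the function names, and the toy corpus are illustrative assumptions. No smoothing is applied, so any unseen bigram drives the sentence probability to zero.

```python
from collections import Counter
import math


def train_bigram(sentences):
    """Maximum-likelihood estimates P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1}).

    Each sentence is padded with hypothetical <s> and </s> boundary tokens.
    """
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])  # count each token only when it acts as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return {(h, w): c / histories[h] for (h, w), c in bigrams.items()}


def sentence_log_prob(model, words):
    """Chain rule with the bigram equivalence class Phi(w_1 ... w_{i-1}) = w_{i-1}."""
    tokens = ["<s>"] + words + ["</s>"]
    total = 0.0
    for h, w in zip(tokens[:-1], tokens[1:]):
        p = model.get((h, w), 0.0)
        if p == 0.0:
            return float("-inf")  # unseen bigram: the unsmoothed MLE assigns zero probability
        total += math.log(p)
    return total


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(corpus)
print(math.exp(sentence_log_prob(model, ["the", "cat", "sat"])))
print(sentence_log_prob(model, ["the", "cat", "ran"]))
```

For this toy corpus, "the cat sat" gets probability 0.5, because the only uncertain step is P(cat | the) = 0.5. "the cat ran" gets a log-probability of -inf because the bigram (cat, ran) was never observed. This zero is the sparseness problem that smoothing and adaptation must address.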


ECE 8423: Lecture 28, Slide 5

N-Gram Models Require Adaptation


ECE 8423: Lecture 28, Slide 6

MAP Language Model (LM) Adaptation

• The LM adaptation problem is often described as an interpolation problem between an existing LM and an LM estimated from new data.

• Any of the approaches we have previously discussed can be employed. MAP adaptation can be shown to simplify to:

• If additional assumptions about the priors for the histories are made, this simplifies further to a linear interpolation of the old and new models:

$$\hat{P}(w_i \mid w_1 w_2 \ldots w_{i-1}) = \lambda\, P_{old}(w_i \mid w_1 w_2 \ldots w_{i-1}) + (1-\lambda)\, P_{new}(w_i \mid w_1 w_2 \ldots w_{i-1})$$

where $0 \le \lambda \le 1$ weights the existing model against the model estimated from new data.

• Most of the adaptation methods we have discussed previously can be applied to this problem because a language model is, at its core, just a likelihood model.

• However, language models must also deal with unseen events, so they must be smoothed to account for sparse training data.
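The interpolated estimate $\hat{P} = \lambda P_{old} + (1-\lambda) P_{new}$ can be sketched in Python as below. Choosing λ by expectation-maximization (EM) on held-out text is an assumption here. EM is a standard way to fit mixture weights, but the slide does not say how λ is set. The input format, a list of (P_old, P_new) pairs for each held-out token, and the function names are hypothetical.

```python
def interpolate(p_old, p_new, lam):
    """Adapted estimate: lam * P_old(w | h) + (1 - lam) * P_new(w | h)."""
    return lam * p_old + (1.0 - lam) * p_new


def estimate_lambda(pairs, iters=50, lam=0.5):
    """EM estimate of the interpolation weight from held-out tokens.

    pairs: hypothetical list of (P_old(w|h), P_new(w|h)) values, one per held-out token.
    """
    for _ in range(iters):
        # E-step: posterior probability that each token was generated by the old model.
        posteriors = []
        for p_old, p_new in pairs:
            mix = interpolate(p_old, p_new, lam)
            posteriors.append(lam * p_old / mix if mix > 0.0 else 0.0)
        # M-step: the new weight is the average posterior over held-out tokens.
        lam = sum(posteriors) / len(pairs)
    return lam


held_out = [(0.20, 0.05), (0.01, 0.30), (0.10, 0.10)]
lam = estimate_lambda(held_out)
print(lam, interpolate(0.01, 0.30, lam))
```

Each EM iteration does not decrease the held-out likelihood of the mixture. Interpolation also helps with unseen events: as long as λ > 0 and the old model assigns an event nonzero probability, the adapted model does too, even if the event never occurs in the new data.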


ECE 8423: Lecture 28, Slide 7

Discriminatively-Trained Features

• Features can also be adapted using the same kind of transformational approach we used for Gaussian means:

$$y_t = x_t + M h_t$$

where $x_t$ is the original feature vector at time $t$, $h_t$ represents a transformation of $x_t$ into a high-dimensional feature vector, and $M$ represents a dimensionality-reduction transformation that maps $h_t$ back to the dimension of $x_t$. This approach combines large-margin classification approaches (e.g., support vector machines) with traditional GMM approaches.

• The transformation M is typically estimated using a minimum phone error (MPE) criterion, which is why this method is often called fMPE (feature-space MPE).
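A minimal NumPy sketch of the feature transform $y_t = x_t + M h_t$ follows. It assumes $h_t$ is the vector of posteriors of a small, hypothetical equal-weight GMM with a shared diagonal variance. The function names and GMM parameters are illustrative. Practical fMPE systems typically build $h_t$ from a much larger GMM and estimate M with MPE gradients, which is not shown here.

```python
import numpy as np


def gaussian_posteriors(x, means, var):
    """h_t: posteriors of a hypothetical equal-weight GMM with shared diagonal variance.

    x: (D,) feature vector, means: (K, D), var: (D,). Returns a (K,) vector summing to 1.
    """
    # Log-likelihood of x under each Gaussian; normalizers cancel because var is shared.
    ll = -0.5 * np.sum((x - means) ** 2 / var, axis=1)
    ll -= ll.max()  # subtract the max for numerical stability before exponentiating
    post = np.exp(ll)
    return post / post.sum()


def fmpe_transform(x, M, means, var):
    """y_t = x_t + M h_t, where M has shape (D, K) and projects K posteriors back to D dims."""
    h = gaussian_posteriors(x, means, var)
    return x + M @ h


rng = np.random.default_rng(0)
D, K = 3, 8
means = rng.normal(size=(K, D))
var = np.ones(D)
x = rng.normal(size=D)
print(fmpe_transform(x, np.zeros((D, K)), means, var))  # equals x when M = 0
```

With M = 0 the features pass through unchanged, so training can start from the baseline features. The (D, K) shape of M is what reduces the K-dimensional $h_t$ back to the original D dimensions.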


ECE 8423: Lecture 28, Slide 8

State of the Art Systems (IBM GALE)


ECE 8423: Lecture 28, Slide 9

State of the Art Systems (IBM GALE)


ECE 8423: Lecture 28, Slide 10

Summary

• Discussed adaptation of language models and showed that the process is similar to that for feature vectors.

• Discussed feature-space adaptation.

• Reviewed a state-of-the-art system that uses many forms of adaptation.

• Course evaluations…