
Page 1:

Bayesian Learning for Latent Semantic Analysis

Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu

Presenter: Hsuan-Sheng Chiu

Page 2:

References

Chia-Sheng Wu, "Bayesian Latent Semantic Analysis for Text Categorization and Information Retrieval", 2005.

Q. Huo and C.-H. Lee, "On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate", 1997.

Page 3:

Outline

Introduction

PLSA
- ML (Maximum Likelihood)
- MAP (Maximum A Posteriori)
- QB (Quasi-Bayes)

Experiments

Conclusions

Page 4:

Introduction

LSA vs. PLSA: linear algebra vs. probability

Semantic space and latent topics

Batch learning vs. incremental learning

Page 5:

PLSA

PLSA is a general machine learning technique which adopts the aspect model to represent co-occurrence data.

Topics (hidden variables): z_k \in Z = \{z_1, ..., z_K\}

Corpus (document-word pairs): Y = \{(d_i, w_j)\}, d_i \in \{d_1, ..., d_N\}, w_j \in \{w_1, ..., w_M\}

Page 6:

PLSA

Assume that d_i and w_j are conditionally independent given the associated topic z_k:

P(d_i, w_j | z_k) = P(d_i | z_k) P(w_j | z_k)

Joint probability:

P(d_i, w_j) = P(d_i) P(w_j | d_i)
            = P(d_i) \sum_{k=1}^{K} P(z_k, w_j | d_i)
            = P(d_i) \sum_{k=1}^{K} P(z_k) P(d_i, w_j | z_k) / P(d_i)
            = P(d_i) \sum_{k=1}^{K} P(z_k) P(w_j | z_k) P(d_i | z_k) / P(d_i)
            = P(d_i) \sum_{k=1}^{K} P(w_j | z_k) P(z_k | d_i)
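To make the factorization concrete, here is a minimal NumPy sketch (array names and toy sizes are illustrative, not from the paper) that builds the two parameter matrices and evaluates the joint probability:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 4, 6, 2                        # toy sizes: documents, words, topics
n = rng.integers(0, 5, size=(N, M)).astype(float)   # counts n(d_i, w_j)

P_w_z = rng.random((M, K)); P_w_z /= P_w_z.sum(axis=0)   # P(w_j | z_k)
P_z_d = rng.random((K, N)); P_z_d /= P_z_d.sum(axis=0)   # P(z_k | d_i)
P_d = n.sum(axis=1) / n.sum()            # document prior P(d_i) from counts

# P(d_i, w_j) = P(d_i) * sum_k P(w_j | z_k) P(z_k | d_i)
P_w_given_d = P_w_z @ P_z_d              # (M, N): P(w_j | d_i)
P_dw = P_w_given_d * P_d[None, :]        # (M, N): joint P(d_i, w_j)
```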

Page 7:

ML PLSA

Log likelihood of Y:

\log P(Y | \theta) = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(d_i, w_j)

where \theta = \{P(w_j | z_k), P(z_k | d_i)\} and n(d_i, w_j) is the count of word w_j in document d_i.

ML estimation:

\theta_{ML} = \arg\max_{\theta} \log P(Y | \theta)
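Under the same sketch, the log likelihood is a one-liner (n is the N x M count matrix and P_dw the M x N joint from the previous snippet; the small constant guards against log 0):

```python
log_likelihood = float(np.sum(n * np.log(P_dw.T + 1e-12)))
```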

Page 8:

ML PLSA

Maximization:

\max_{\theta} \log P(Y | \theta)
 = \max_{\theta} \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(d_i, w_j)
 = \max_{\theta} \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) [ \log P(d_i) + \log P(w_j | d_i) ]
 = \max_{\theta} [ \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(d_i) + \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(w_j | d_i) ]
 = \max_{\theta} \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(w_j | d_i)

since the P(d_i) term does not depend on the parameters \theta = \{P(w_j | z_k), P(z_k | d_i)\}.

Page 9:

ML PLSA

Complete data: P(w_j, z_k | d_i)

Incomplete data: P(w_j | d_i)

EM (Expectation-Maximization) algorithm: alternate an E-step and an M-step, using

P(w_j, z_k | d_i) = P(w_j | d_i) P(z_k | d_i, w_j)

Page 10:

ML PLSA

E-step: take the expectation of the log likelihood over the topic posterior computed with the current parameters \theta:

\sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) E_{z|d_i,w_j}[ \log P(w_j | d_i) ]
 = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) E_{z|d_i,w_j}[ \log P(w_j, z_k | d_i) - \log P(z_k | d_i, w_j) ]
 = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log P(w_j, z_k | d_i)
   - \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log P(z_k | d_i, w_j)

Only the first term depends on the new parameters \hat{\theta}, so it is the quantity maximized in the M-step.

Page 11:

ML PLSA

Auxiliary function:

Q(\hat{\theta} | \theta) = E[ \log P(Z, Y | \hat{\theta}) | Y, \theta ]
 = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log [ \hat{P}(w_j | z_k) \hat{P}(z_k | d_i) ]

and

P(z_k | d_i, w_j) = P(w_j | z_k) P(z_k | d_i) / \sum_{l=1}^{K} P(w_j | z_l) P(z_l | d_i)
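A sketch of this E-step with the arrays introduced earlier (post[k, i, j] stands for P(z_k | d_i, w_j); names are illustrative):

```python
post = np.einsum('jk,ki->kij', P_w_z, P_z_d)      # numerator P(w_j|z_k) P(z_k|d_i)
post /= post.sum(axis=0, keepdims=True) + 1e-12   # normalize over topics l = 1..K
```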

Page 12:

ML PLSA

M-step: introduce Lagrange multipliers for the constraints \sum_j \hat{P}(w_j | z_k) = 1 and \sum_k \hat{P}(z_k | d_i) = 1:

Q^{ML}_{P(w_j|z_k)} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j) P(z_k | d_i, w_j) \log \hat{P}(w_j | z_k) + \sum_{k=1}^{K} \lambda_k ( 1 - \sum_{j=1}^{M} \hat{P}(w_j | z_k) )

Q^{ML}_{P(z_k|d_i)} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j) P(z_k | d_i, w_j) \log \hat{P}(z_k | d_i) + \sum_{i=1}^{N} \mu_i ( 1 - \sum_{k=1}^{K} \hat{P}(z_k | d_i) )

Page 13:

ML PLSA

Differentiation: for F(w) = \sum_{j=1}^{N} y_j \log w_j + \lambda ( 1 - \sum_{j=1}^{N} w_j ), setting \partial F / \partial w_j = 0 gives w_j = y_j / \sum_{j'=1}^{N} y_{j'}.

New parameter estimation:

\hat{P}_{ML}(w_j | z_k) = \sum_{i=1}^{N} n(d_i, w_j) P(z_k | d_i, w_j) / \sum_{m=1}^{M} \sum_{i=1}^{N} n(d_i, w_m) P(z_k | d_i, w_m)

\hat{P}_{ML}(z_k | d_i) = \sum_{j=1}^{M} n(d_i, w_j) P(z_k | d_i, w_j) / \sum_{l=1}^{K} \sum_{j=1}^{M} n(d_i, w_j) P(z_l | d_i, w_j)
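The corresponding M-step in the same sketch; alternating it with the E-step above until the log likelihood stabilizes gives the full EM loop:

```python
nw = np.einsum('ij,kij->jk', n, post)   # sum_i n(d_i,w_j) P(z_k|d_i,w_j), (M, K)
P_w_z = nw / (nw.sum(axis=0, keepdims=True) + 1e-12)   # new P(w_j | z_k)

nd = np.einsum('ij,kij->ki', n, post)   # sum_j n(d_i,w_j) P(z_k|d_i,w_j), (K, N)
P_z_d = nd / (nd.sum(axis=0, keepdims=True) + 1e-12)   # new P(z_k | d_i)
```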

Page 14:

MAP PLSA

Estimation by maximizing the posterior probability:

\theta_{MAP} = \arg\max_{\theta} P(\theta | X) = \arg\max_{\theta} [ \log P(X | \theta) + \log g(\theta) ]

Definition of the prior distribution. Dirichlet density:

f(x_1, ..., x_K) \propto \prod_{i=1}^{K} x_i^{\alpha_i - 1},  x_i \ge 0,  \sum_{i=1}^{K} x_i = 1

Prior density, assuming the priors over P(w_j | z_k) and P(z_k | d_i) are independent:

g(\theta) \propto \prod_{k=1}^{K} [ \prod_{j=1}^{M} P(w_j | z_k)^{\alpha_{j,k} - 1} \prod_{i=1}^{N} P(z_k | d_i)^{\beta_{i,k} - 1} ]

(\delta_{ij} denotes the Kronecker delta: \delta_{ij} = 1 if i = j, 0 otherwise)

Page 15:

MAP PLSA

Consider the prior density:

\log g(\theta) = \sum_{k=1}^{K} \sum_{j=1}^{M} (\alpha_{j,k} - 1) \log P(w_j | z_k) + \sum_{i=1}^{N} \sum_{k=1}^{K} (\beta_{i,k} - 1) \log P(z_k | d_i) + const

Maximum a posteriori:

\max_{\theta} [ \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(w_j | d_i) + \log g(\theta) ]

Page 16:

MAP PLSA

E-step: take the expectation

\sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) E_{z|d_i,w_j}[ \log P(w_j | d_i) ] + \log g(\theta)

Auxiliary function:

R(\hat{\theta} | \theta) = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log [ \hat{P}(w_j | z_k) \hat{P}(z_k | d_i) ]
 + \sum_{j=1}^{M} \sum_{k=1}^{K} (\alpha_{j,k} - 1) \log \hat{P}(w_j | z_k) + \sum_{i=1}^{N} \sum_{k=1}^{K} (\beta_{i,k} - 1) \log \hat{P}(z_k | d_i)

Page 17:

MAP PLSA

M-step: add Lagrange multipliers for the normalization constraints:

R(\hat{\theta} | \theta) = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k | d_i, w_j) \log [ \hat{P}(w_j | z_k) \hat{P}(z_k | d_i) ]
 + \sum_{j=1}^{M} \sum_{k=1}^{K} (\alpha_{j,k} - 1) \log \hat{P}(w_j | z_k) + \sum_{i=1}^{N} \sum_{k=1}^{K} (\beta_{i,k} - 1) \log \hat{P}(z_k | d_i)
 + \sum_{k=1}^{K} \lambda_{w,k} ( 1 - \sum_{j=1}^{M} \hat{P}(w_j | z_k) ) + \sum_{i=1}^{N} \lambda_{d,i} ( 1 - \sum_{k=1}^{K} \hat{P}(z_k | d_i) )

Page 18:

MAP PLSA

Auxiliary function for each parameter set:

Q^{MAP}_{P(w_j|z_k)} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j) P(z_k | d_i, w_j) \log \hat{P}(w_j | z_k)
 + \sum_{j=1}^{M} \sum_{k=1}^{K} (\alpha_{j,k} - 1) \log \hat{P}(w_j | z_k) + \lambda_{w,k} ( 1 - \sum_{j=1}^{M} \hat{P}(w_j | z_k) )

Q^{MAP}_{P(z_k|d_i)} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j) P(z_k | d_i, w_j) \log \hat{P}(z_k | d_i)
 + \sum_{k=1}^{K} \sum_{i=1}^{N} (\beta_{i,k} - 1) \log \hat{P}(z_k | d_i) + \lambda_{d,i} ( 1 - \sum_{k=1}^{K} \hat{P}(z_k | d_i) )

Page 19:

MAP PLSA

Differentiation gives the new parameter estimation:

\hat{P}_{MAP}(w_j | z_k) = [ \sum_{i=1}^{N} n(d_i, w_j) P(z_k | d_i, w_j) + \alpha_{j,k} - 1 ] / \sum_{m=1}^{M} [ \sum_{i=1}^{N} n(d_i, w_m) P(z_k | d_i, w_m) + \alpha_{m,k} - 1 ]

\hat{P}_{MAP}(z_k | d_i) = [ \sum_{j=1}^{M} n(d_i, w_j) P(z_k | d_i, w_j) + \beta_{i,k} - 1 ] / [ n(d_i) + \sum_{l=1}^{K} (\beta_{i,l} - 1) ]

where n(d_i) = \sum_{k=1}^{K} \sum_{j=1}^{M} n(d_i, w_j) P(z_k | d_i, w_j)
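In code, the MAP M-step only shifts the ML counts by (hyperparameter - 1); a sketch continuing the arrays above, with a flat Dirichlet prior (all ones) assumed purely for illustration:

```python
alpha = np.ones((M, K))   # assumed prior for P(w_j | z_k); flat for illustration
beta = np.ones((K, N))    # assumed prior for P(z_k | d_i)

nw_map = nw + (alpha - 1.0)
P_w_z_map = nw_map / nw_map.sum(axis=0, keepdims=True)   # MAP P(w_j | z_k)

nd_map = nd + (beta - 1.0)
P_z_d_map = nd_map / nd_map.sum(axis=0, keepdims=True)   # MAP P(z_k | d_i)
```

With the flat prior the MAP estimates reduce to the ML estimates, matching the formulas above.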

Page 20:

QB PLSA

The model needs to be updated continuously for an online information system. Estimation by maximizing the posterior probability:

\theta^{(n)}_{QB} = \arg\max_{\theta} P(\theta | X^{(n)}) = \arg\max_{\theta} P(X^{(n)} | \theta) P(\theta | \varphi^{(n-1)})
 \approx \arg\max_{\theta} P(X^{(n)} | \theta) g(\theta | \varphi^{(n-1)})

The posterior density is approximated by the closest tractable prior density with hyperparameters \varphi^{(n-1)} = \{\alpha^{(n-1)}_{j,k}, \beta^{(n-1)}_{i,k}\}, where \theta^{(n)}_{QB} = \{P^{(n)}_{QB}(w_j | z_k), P^{(n)}_{QB}(z_k | d_i)\}.

As compared to MAP PLSA, the key difference of QB PLSA is the updating of the hyperparameters.

Page 21:

QB PLSA

Conjugate prior: in Bayesian probability theory, a conjugate prior is a prior distribution with the property that the posterior distribution belongs to the same family of distributions. It yields:

A closed-form solution

A reproducible prior/posterior pair for incremental learning

Page 22:

QB PLSA

Hyperparameter \alpha: the terms of the posterior expectation involving \hat{P}(w_j | z_k) are

\sum_{j=1}^{M} \sum_{k=1}^{K} (\alpha_{j,k} - 1) \log \hat{P}(w_j | z_k),  with \sum_{j=1}^{M} \hat{P}(w_j | z_k) = 1 and \hat{P}(w_j | z_k) \ge 0.

Combining them with the data terms gives

\hat{P}(w_j | z_k) = [ \sum_{i=1}^{N} n(d_i, w_j) P(z_k | d_i, w_j) + \alpha_{j,k} - 1 ] / \sum_{m=1}^{M} [ \sum_{i=1}^{N} n(d_i, w_m) P(z_k | d_i, w_m) + \alpha_{m,k} - 1 ]

so that the hyperparameter is updated as

\alpha^{(n)}_{j,k} = \sum_{i=1}^{N} n(d^{(n)}_i, w^{(n)}_j) P(z_k | d^{(n)}_i, w^{(n)}_j) + \alpha^{(n-1)}_{j,k}

Page 23:

QB PLSA

After careful arrangement, the exponential of the posterior expectation function can be expressed as:

\exp[ R(\hat{\theta}^{(n)} | \theta^{(n)}) ] \propto \prod_{k=1}^{K} [ \prod_{j=1}^{M} \hat{P}(w_j | z_k)^{\alpha^{(n)}_{j,k} - 1} \prod_{i=1}^{N} \hat{P}(z_k | d_i)^{\beta^{(n)}_{i,k} - 1} ]

A reproducible prior/posterior pair is generated to build the updating mechanism of the hyperparameters:

\alpha^{(n)}_{j,k} = \sum_{i=1}^{N} n(d^{(n)}_i, w^{(n)}_j) P(z_k | d^{(n)}_i, w^{(n)}_j) + \alpha^{(n-1)}_{j,k}

\beta^{(n)}_{i,k} = \sum_{j=1}^{M} n(d^{(n)}_i, w^{(n)}_j) P(z_k | d^{(n)}_i, w^{(n)}_j) + \beta^{(n-1)}_{i,k}
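A sketch of this update (illustrative names; n_batch and post_batch are the counts and topic posteriors computed on batch n):

```python
def qb_update(alpha, beta, n_batch, post_batch):
    """alpha: (M, K); beta: (K, N); n_batch: (N, M); post_batch: (K, N, M)."""
    alpha_new = alpha + np.einsum('ij,kij->jk', n_batch, post_batch)
    beta_new = beta + np.einsum('ij,kij->ki', n_batch, post_batch)
    return alpha_new, beta_new
```

The returned hyperparameters serve as the prior for batch n+1, which is exactly the reproducible prior/posterior pair.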

Page 24:

Initial Hyperparameters

An open issue in Bayesian learning.

If the initial prior knowledge is too strong, or after a lot of adaptation data have been incrementally processed, new adaptation data usually have only a small impact on parameter updating in incremental training.

\alpha^{(0)}_{j,k} = 1 + \sum_{i=1}^{N} P(z_k | d_i, w_j)

\beta^{(0)}_{i,k} = 1 + \sum_{j=1}^{M} P(z_k | d_i, w_j)
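One reading of these formulas as a sketch (post is the (K, N, M) posterior array from the earlier snippets): start each hyperparameter at 1 plus the accumulated posterior mass, so the initial prior stays weak and early batches can still move the parameters.

```python
alpha0 = 1.0 + post.sum(axis=1).T   # (M, K): 1 + sum_i P(z_k | d_i, w_j)
beta0 = 1.0 + post.sum(axis=2)      # (K, N): 1 + sum_j P(z_k | d_i, w_j)
```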

Page 25:

Experiments

MED corpus:
- 1033 medical abstracts with 30 queries
- 7014 unique terms
- 433 abstracts for ML training
- 600 abstracts for MAP or QB training
- Query subset for testing
- K = 8

Reuters-21578:
- 4270 documents for training
- 2925 for QB learning
- 2790 documents for testing
- 13353 unique words
- 10 categories

Page 26:

Experiments

(results figure not preserved in this transcript)

Page 27:

Experiments

(results figure not preserved in this transcript)

Page 28:

Experiments

(results figure not preserved in this transcript)

Page 29:

Conclusions

This paper presented an adaptive text modeling and classification approach for PLSA-based information systems.

Future work:
- Extension of PLSA to bigram or trigram modeling will be explored.
- Application to spoken document classification and retrieval.

Page 30:

Discriminative Maximum Entropy Language Model for Speech Recognition

Chuang-Hua Chueh, To-Chang Chien and Jen-Tzung Chien

Presenter: Hsuan-Sheng Chiu

Page 31:

References

R. Rosenfeld, S. F. Chen and X. Zhu, "Whole-sentence exponential language models: a vehicle for linguistic-statistical integration", 2001.

W.-H. Tsai, "An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition", 2005.

Page 32:

Outline

Introduction

Whole-sentence exponential model

Discriminative ME language model

Experiment

Conclusions

Page 33:

Introduction

Language models:
- Statistical n-gram model
- Latent semantic language model
- Structured language model

Based on the maximum entropy principle, we can integrate different features to establish the optimal probability distribution.

Page 34:

Whole-Sentence Exponential Model

Traditional method:

p(s) = p(w_1, ..., w_n) = \prod_{i=1}^{n} p(w_i | w_1 ... w_{i-1})

Exponential form:

p(s) = \frac{p_0(s)}{Z} \exp[ \sum_{i} \lambda_i f_i(s) ]

Usage: when used for speech recognition, the model is not suitable for the first pass of the recognizer and should be used to re-score N-best lists.
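A sketch of scoring with the exponential form over a finite candidate set such as an N-best list (p0, feats and lam are illustrative stand-ins, not the authors' code):

```python
import numpy as np

def sentence_probs(candidates, p0, feats, lam):
    """candidates: sentences; p0: dict of baseline probabilities p0(s);
    feats: feature functions f_i(s); lam: weights lambda_i."""
    scores = np.array([p0[s] * np.exp(sum(l * f(s) for l, f in zip(lam, feats)))
                       for s in candidates])
    return scores / scores.sum()   # the normalizer plays the role of Z
```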

Page 35:

Whole-Sentence ME Language Model

Expectation of a feature function.

Empirical (over R training sentences):

\tilde{p}(f^L_i) = \frac{1}{R} \sum_{r=1}^{R} f^L_i(s_r)

Actual (model) expectation:

p(f^L_i) = \sum_{s} p(s) f^L_i(s)

Constraint:

p(f^L_i) = \tilde{p}(f^L_i),  for i = 1, ..., F

Page 36:

Whole-Sentence ME Language Model

To solve the constrained optimization problem, maximize the entropy

H(p) = -\sum_{s} p(s) \log p(s)

subject to the F expectation constraints and \sum_{s} p(s) = 1, via the Lagrangian

\Lambda(p, \lambda) = H(p) + \sum_{i=1}^{F} \lambda_i [ p(f^L_i) - \tilde{p}(f^L_i) ] + \lambda_0 [ \sum_{s} p(s) - 1 ]

Setting \partial \Lambda / \partial p(s) = 0 yields the ME solution

p_{ME}(s) = \frac{ \exp( \sum_{i=1}^{F} \lambda_i f^L_i(s) ) }{ \sum_{s'} \exp( \sum_{i=1}^{F} \lambda_i f^L_i(s') ) }

Page 37:

GIS Algorithm

Input: feature functions f^L_1, ..., f^L_F and empirical distribution \tilde{p}

Output: optimal Lagrange multipliers \hat{\lambda}_i

1. Initialization: \lambda_i = 0 for all i = 1, ..., F.
2. For each i = 1, ..., F, update based on

   \lambda^{(t+1)}_i = \lambda^{(t)}_i + \frac{1}{F} \log \frac{ \tilde{p}(f^L_i) }{ p^{(t)}(f^L_i) },   where p^{(t)}(f^L_i) = \sum_{s} p^{(t)}(s) f^L_i(s)

3. Go to step 2 if \lambda has not converged.
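A compact sketch of this loop over an enumerable sentence set (an N-best list, say); it assumes, as GIS requires, that \sum_i f_i(s) equals the same constant F for every sentence:

```python
import numpy as np

def gis(sentences, feats, p_tilde, iters=100):
    F = len(feats)
    lam = np.zeros(F)                                           # step 1
    f = np.array([[fi(s) for fi in feats] for s in sentences])  # |S| x F table
    for _ in range(iters):                                      # steps 2-3
        p = np.exp(f @ lam)
        p /= p.sum()                                            # current p(s)
        p_model = p @ f                                         # model p(f_i)
        lam += np.log(p_tilde / p_model) / F                    # GIS update
    return lam
```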

Page 38:

Discriminative ME Language Model

In general, ME can be considered as a maximum likelihood model using a log-linear distribution.

We propose a discriminative language model based on the whole-sentence ME model (DME).

Page 39:

Discriminative ME Language Model

Acoustic features for ME estimation: the sentence-level log-likelihood ratio of competing and target sentences,

f^X_A(s) = \log \frac{ p(X | s_X) }{ p(X | \bar{s}_X) } if s = s_X, and 0 if s \neq s_X

where s_X is the target sentence and \bar{s}_X the competing sentence for speech signal X.

Feature weight parameter:

\lambda^X_A = 1 if X \in A, and 0 if X \notin A

Namely, we activate the feature parameter to one for those speech signals observed in the training database A.
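A sketch of the acoustic feature (log_p_X_given_s is an assumed lookup of acoustic log-likelihoods log p(X|s); the ratio is attached only to the target sentence):

```python
def acoustic_feature(s, s_target, s_competing, log_p_X_given_s):
    # f_A^X(s): log-likelihood ratio at the target sentence, zero elsewhere
    if s == s_target:
        return log_p_X_given_s[s_target] - log_p_X_given_s[s_competing]
    return 0.0
```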

Page 40:

Discriminative ME Language Model

New estimation with both linguistic and acoustic features:

p_{LA}(s) = \frac{ \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s) + \lambda^X_A f^X_A(s) ) }{ \sum_{s'} \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s') + \lambda^X_A f^X_A(s') ) }

Upgrade to discriminative linguistic parameters:

p_{DME}(s) = \frac{ \exp( \sum_{i=1}^{F} \lambda^D_i f^L_i(s) ) }{ \sum_{s'} \exp( \sum_{i=1}^{F} \lambda^D_i f^L_i(s') ) }

Page 41:

Discriminative ME Language Model

(figure not preserved in this transcript)

Page 42:

Experiment

Corpus: TCC300
- 32 mixtures
- 12 Mel-frequency cepstral coefficients
- 1 log-energy, plus first derivatives
- 4200 sentences for training, 450 for testing

Corpus: Academia Sinica CKIP balanced corpus
- Five million words
- Vocabulary of 32909 words

Page 43:

Experiment

(results figure not preserved in this transcript)

Page 44:

Conclusions

A new ME language model integrating linguistic and acoustic features for speech recognition.

The derived ME language model is inherently discriminative.

The DME model involves a constrained optimization procedure and is powerful for knowledge integration.

Page 45:

Relation between DME and MMI

MMI criterion:

\lambda_{MMI} = \arg\max_{\lambda} \log p(S | X) = \arg\max_{\lambda} \log \frac{ p(X | S) p(S) }{ \sum_{S'} p(X | S') p(S') }

Modified MMI criterion over R training utterances:

\tilde{\lambda}_{MMI} = \arg\max_{\lambda} \sum_{r=1}^{R} \log \frac{ p(X_r | s_r) p(s_r) }{ \sum_{s'} p(X_r | s') p(s') }

Express the ME model as an ML model.

Page 46:

Relation between DME and MMI

The optimal parameter:

\hat{\lambda}_{DME} = \arg\max_{\lambda} \sum_{r=1}^{R} \log p_{LA}(s_r)
 = \arg\max_{\lambda} \sum_{r=1}^{R} \log \frac{ \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s_r) + \lambda^{X_r}_A f^{X_r}_A(s_r) ) }{ \sum_{s'} \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s') + \lambda^{X_r}_A f^{X_r}_A(s') ) }
 = \arg\max_{\lambda} \sum_{r=1}^{R} \log \frac{ \exp( f^{X_r}_A(s_r) ) \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s_r) ) }{ \sum_{s'} \exp( f^{X_r}_A(s') ) \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s') ) }

Page 47:

Relation between DME and MMI

\hat{\lambda}_{DME} = \arg\max_{\lambda} \sum_{r=1}^{R} \log \frac{ \exp( f^{X_r}_A(s_r) ) \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s_r) ) }{ \sum_{s'} \exp( f^{X_r}_A(s') ) \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s') ) }
 = \arg\max_{\lambda} \sum_{r=1}^{R} \log \frac{ p(X_r | s_r) \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s_r) ) }{ \sum_{s'} p(X_r | s') \exp( \sum_{i=1}^{F} \lambda^L_i f^L_i(s') ) }
 = \arg\max_{\lambda} \sum_{r=1}^{R} \log \frac{ p(X_r | s_r) p(s_r) }{ \sum_{s'} p(X_r | s') p(s') } = \tilde{\lambda}_{MMI}

with the whole-sentence ME model playing the role of the language model p(s); that is, DME estimation coincides with the modified MMI criterion.

Page 48:

Relation between DME and MMI