
CS 4705: Hidden Markov Models (9/20/17)


Slides adapted from Dan Jurafsky and James Martin

Announcements and Questions
• HW1: Determine whether unigrams, bigrams, trigrams, or some combination of the three works best, and experiment with ML parameters (e.g., kernel and C for SVM). Then do feature selection and add additional features on top of that result.

• Keep in mind that you can have lower accuracy without a large penalty in points.

• Final exam: tentatively scheduled for 12/21, but the date will be finalized in November by the registrar. We will have the exam on the exam date.

• Class electronics policy: no open laptops in class.


POS tagging as a sequence classification task
• We are given a sentence (an "observation" or "sequence of observations")
  • Secretariat is expected to race tomorrow

• What is the best sequence of tags which corresponds to this sequence of observations?

• Probabilistic view:
  • Consider all possible sequences of tags
  • Choose the tag sequence which is most probable given the observation sequence of n words w1…wn


Getting to HMM
• Out of all sequences of n tags t1…tn, we want the single tag sequence such that P(t1…tn | w1…wn) is highest.

• The hat ^ means "our estimate of the best one"

• argmaxx f(x) means "the x such that f(x) is maximized"
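Spelled out in the standard Jurafsky & Martin notation, the goal is:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$$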


Getting to HMM
• This equation is guaranteed to give us the best tag sequence

• Intuition of Bayesian classification:
  • Use Bayes rule to transform the equation into a set of other probabilities that are easier to compute


Using Bayes Rule
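Applying Bayes rule, and then dropping the denominator because it does not depend on the tag sequence:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)} = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)$$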


Likelihood and prior
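With the bigram (Markov) assumption for tags and the output-independence assumption for words, this becomes the standard approximation:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \underbrace{P(w_1^n \mid t_1^n)}_{\text{likelihood}}\; \underbrace{P(t_1^n)}_{\text{prior}} \;\approx\; \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$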

Two kinds of probabilities (1)
• Tag transition probabilities p(ti | ti-1)

• Determiners likely to precede adjectives and nouns
  • That/DT flight/NN
  • The/DT yellow/JJ hat/NN
  • So we expect P(NN|DT) and P(JJ|DT) to be high
  • But P(DT|JJ) to be low

• Compute P(NN|DT) by counting in a labeled corpus:
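The usual maximum-likelihood estimate is the relative frequency:

$$P(NN \mid DT) = \frac{C(DT,\, NN)}{C(DT)}$$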

Two kinds of probabilities (2)
• Word likelihood probabilities p(wi | ti)

• VBZ (3sg present verb) likely to be "is"

• Compute P(is|VBZ) by counting in a labeled corpus:
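Likewise, $P(is \mid VBZ) = \frac{C(VBZ,\, is)}{C(VBZ)}$. A minimal counting sketch (my own illustration with a made-up toy corpus, not course code):

```python
from collections import defaultdict

def estimate_hmm_probs(tagged_sentences):
    """Estimate tag-transition and word-likelihood probabilities
    by relative-frequency counting over (word, tag) sequences."""
    transition_counts = defaultdict(lambda: defaultdict(int))  # C(t_{i-1}, t_i)
    emission_counts = defaultdict(lambda: defaultdict(int))    # C(t_i, w_i)

    for sentence in tagged_sentences:
        prev_tag = "<s>"                       # sentence-start pseudo-tag
        for word, tag in sentence:
            transition_counts[prev_tag][tag] += 1
            emission_counts[tag][word] += 1
            prev_tag = tag

    # P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
    A = {prev: {t: c / sum(nxt.values()) for t, c in nxt.items()}
         for prev, nxt in transition_counts.items()}
    # P(w_i | t_i) = C(t_i, w_i) / C(t_i)
    B = {tag: {w: c / sum(ws.values()) for w, c in ws.items()}
         for tag, ws in emission_counts.items()}
    return A, B

# Toy corpus, made up purely for illustration.
corpus = [
    [("that", "DT"), ("flight", "NN"), ("is", "VBZ"), ("late", "JJ")],
    [("the", "DT"), ("yellow", "JJ"), ("hat", "NN")],
]
A, B = estimate_hmm_probs(corpus)
print(A["DT"].get("NN", 0.0))   # estimate of P(NN|DT)
print(B["VBZ"].get("is", 0.0))  # estimate of P(is|VBZ)
```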


An Example: the verb "race"
• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR

• People/NNS continue/VB to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN

• How do we pick the right tag?


Disambiguating "race"

(figure, repeated over several slides: the two candidate tag sequences for "race", one with race/VB and one with race/NN, with the relevant transition and word-likelihood probabilities marked)

• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading
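A quick check of the arithmetic (throwaway snippet, not course code):

```python
# Probabilities from the slide above
p_vb_reading = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn_reading = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)
print(p_vb_reading)   # ~2.7e-07
print(p_nn_reading)   # ~3.2e-10, so the verb reading wins by about three orders of magnitude
```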


Definitions
• A weighted finite-state automaton adds probabilities to the arcs
  • The probabilities on the arcs leaving any state must sum to one

• A Markov chain is a special case of a WFSA
  • the input sequence uniquely determines which states the automaton will go through

• Markov chains can't represent inherently ambiguous problems
  • They assign probabilities to unambiguous sequences


Markov chain for weather


Markov chain for words


Markov chain = "First-order observable Markov Model"
• A set of states
  • Q = q1, q2 … qN; the state at time t is qt

• Transition probabilities:
  • a set of probabilities A = a01 a02 … an1 … ann
  • Each aij represents the probability of transitioning from state i to state j
  • The set of these is the transition probability matrix A

• Distinguished start and end states


$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i,\, j \le N$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$$

Markov chain = "First-order observable Markov Model"
• The current state only depends on the previous state


$$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$$

Another representation for the start state
• Instead of a start state

• Special initial probability vector π
  • An initial probability distribution over the start states

• Constraints:


$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$

$$\sum_{j=1}^{N} \pi_j = 1$$

The weather figure using π


The weather figure: specific example


Markov chain for weather
• What is the probability of 4 consecutive rainy days?
• The sequence is rainy-rainy-rainy-rainy
• I.e., the state sequence is 3-3-3-3
• $P(3,3,3,3) = \pi_3\, a_{33}\, a_{33}\, a_{33} = 0.2 \times (0.6)^3 = 0.0432$
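A minimal sketch of the same computation (only π3 = 0.2 and a33 = 0.6 come from the slide; the helper function is my own illustration):

```python
# Probability of a state sequence under a first-order Markov chain:
# P(q1..qT) = pi[q1] * product of a[q_{t-1}][q_t]
def sequence_prob(pi, a, states):
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= a[prev][cur]
    return prob

pi = {3: 0.2}              # initial probability of the rainy state (state 3)
a = {3: {3: 0.6}}          # self-transition probability rainy -> rainy
print(sequence_prob(pi, a, [3, 3, 3, 3]))  # 0.2 * 0.6**3 = 0.0432
```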


Response


Hidden Markov Models
• We don't observe POS tags
  • We infer them from the words we see

• Words are the observed events

• POS tags are the hidden events


HMM for Ice Cream
• You are a climatologist in the year 2799
  • Studying global warming
• You can't find any records of the weather in New York, NY for the summer of 2007

• But you find Kathy McKeown's diary
  • Which lists how many ice creams Kathy ate every day that summer

• Our job: figure out how hot it was


Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
  • See hot weather: we're in state hot

• But in part-of-speech tagging (and other things)
  • The output symbols are words
  • The hidden states are part-of-speech tags

• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
  • This means we don't know which state we are in.


Hidden Markov Models
• States Q = q1, q2 … qN
• Observations O = o1, o2 … oN
  • Each observation is a symbol from a vocabulary V = {v1, v2, … vV}

• Transition probabilities
  • Transition probability matrix A = {aij}

• Observation likelihoods
  • Output probability matrix B = {bi(k)}

• Special initial probability vector π

$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i,\, j \le N$$

$$b_i(k) = P(X_t = o_k \mid q_t = i)$$

Hidden Markov Models
• Some constraints:


$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$$

$$\sum_{k=1}^{M} b_i(k) = 1$$

$$\sum_{j=1}^{N} \pi_j = 1$$
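To make the pieces concrete, here is a small sketch of the ice-cream HMM's parameters as plain arrays, with the normalization constraints above checked. The numeric values are illustrative placeholders I chose, not values from the slides:

```python
import numpy as np

states = ["HOT", "COLD"]           # hidden states
vocab = [1, 2, 3]                  # observations: ice creams eaten per day

pi = np.array([0.8, 0.2])          # initial state distribution (illustrative values)
A = np.array([[0.7, 0.3],          # transition matrix: A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],     # emission matrix: B[i, k] = P(o_t = vocab[k] | q_t = i)
              [0.5, 0.4, 0.1]])

# Normalization constraints from the slide
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```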

Assumptions
• Markov assumption:

• Output-independence assumption:


$$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$$

$$P(o_t \mid O_1^{t-1},\, q_1^{t}) = P(o_t \mid q_t)$$

McKeown task
• Given
  • Ice cream observation sequence: 1, 2, 3, 2, 2, 2, 3 …

• Produce:
  • Weather sequence: H, C, H, H, H, C …


HMM for ice cream


Different types of HMM structure
• Bakis = left-to-right
• Ergodic = fully connected

Transitions between the hidden states of the HMM, showing the A probabilities


B observation likelihoods for the POS HMM

Three fundamental problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).

• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.

• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?


Response


Decoding
• The best hidden sequence
  • Weather sequence in the ice cream task
  • POS sequence given an input sentence

• We could take the argmax over the probability of every possible hidden state sequence
  • Why not?

• Viterbi algorithm
  • Dynamic programming algorithm
  • Uses a dynamic programming trellis
  • Each trellis cell vt(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence
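In the standard Jurafsky & Martin notation:

$$v_t(j) = \max_{q_1, \ldots, q_{t-1}} P(q_1 \ldots q_{t-1},\; o_1 \ldots o_t,\; q_t = j \mid \lambda)$$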


Viterbi intuition: we are looking for the best "path"

(figure: a trellis over the sentence "promised to back the bill", with a column of candidate tags (VBD, VBN, TO, VB, JJ, NN, RB, DT, NNP) for each word, and the best path through states S1 … S5 highlighted)

Slide from Dekang Lin

Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.

• An extension of a path from state i at time t-1 is computed by multiplying:
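The three factors are the previous path probability, the transition probability, and the observation likelihood, giving the standard Viterbi recurrence:

$$v_t(j) = \max_{i=1}^{N} \; v_{t-1}(i)\; a_{ij}\; b_j(o_t)$$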


The Viterbi Algorithm
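A minimal Python sketch of the Viterbi recursion, using the same illustrative (assumed) ice-cream parameters as in the earlier sketch rather than anything from the slides:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Return the most likely hidden state sequence for an observation sequence.
    pi[i]: initial prob of state i; A[i, j]: transition i->j; B[i, k]: prob of symbol k in state i."""
    N = len(pi)                      # number of states
    T = len(obs)
    v = np.zeros((T, N))             # v[t, j]: best path probability ending in state j at time t
    back = np.zeros((T, N), dtype=int)

    v[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs[t]]
            back[t, j] = np.argmax(scores)         # remember the best predecessor
            v[t, j] = scores[back[t, j]]

    # Follow backpointers from the best final state
    path = [int(np.argmax(v[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))

# Ice-cream example with illustrative (assumed) parameters: states 0 = HOT, 1 = COLD,
# observation symbols 0/1/2 = one/two/three ice creams eaten.
pi = np.array([0.8, 0.2])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]])
print(viterbi([2, 0, 2], pi, A, B))   # [0, 0, 0], i.e. HOT, HOT, HOT
```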


The A matrix for the POS HMM


What is P(VB|TO)? What is P(NN|TO)? Why does this make sense? What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?

The B matrix for the POS HMM


Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.

Viterbi example

(sequence of figures stepping through the Viterbi trellis at t = 1: starting from the start state i = S, the cell for state j = NN is filled in; intermediate values such as .041 and .025 appear in the trellis, and the A and B matrices for the POS HMM are shown again for reference)

Show the 4 formulas you would use to compute the value at this node and the max.

Computing the likelihood of an observation
• Forward algorithm

• Exactly like the Viterbi algorithm, except:
  • To compute the probability of a state, sum the probabilities from each path
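That is, replace the max in the Viterbi recurrence with a sum (the standard forward recurrence):

$$\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\; a_{ij}\; b_j(o_t), \qquad P(O \mid \lambda) = \sum_{j=1}^{N} \alpha_T(j)$$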


Error analysis: ESSENTIAL!!!
• Look at a confusion matrix

• See what errors are causing problems
  • Noun (NN) vs. Proper Noun (NNP) vs. Adj (JJ)
  • Adverb (RB) vs. Prep (IN) vs. Noun (NN)
  • Preterite (VBD) vs. Participle (VBN) vs. Adjective (JJ)

