TRANSCRIPT
CS 4705 Hidden Markov Models
9/20/17
Slides adapted from Dan Jurafsky and James Martin
Announcements and Questions
• HW1: Determine whether unigrams, bigrams, trigrams, or some combination of the three works best, and experiment with ML parameters (e.g., kernel and C for SVM). Then do feature selection and add additional features on top of that result.
• Keep in mind that you can have lower accuracy without a large penalty in points.
• Final exam: tentatively scheduled for 12/21 but will be finalized in November by the registrar. We will have the exam on the exam date.
• Class electronics policy: no open laptops in class.
POS tagging as a sequence classification task
• We are given a sentence (an "observation" or "sequence of observations")
• Secretariat is expected to race tomorrow
• What is the best sequence of tags which corresponds to this sequence of observations?
• Probabilistic view:
• Consider all possible sequences of tags
• Choose the tag sequence which is most probable given the observation sequence of n words w1…wn.
Getting to HMM
• Out of all sequences of n tags t1…tn, we want the single tag sequence such that P(t1…tn|w1…wn) is highest:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$$

• The hat ^ means "our estimate of the best one"
• argmax_x f(x) means "the x such that f(x) is maximized"
Getting to HMM
• This equation is guaranteed to give us the best tag sequence
• Intuition of Bayesian classification:
• Use Bayes rule to transform the equation into a set of other probabilities that are easier to compute
Using Bayes Rule

$$P(t_1^n \mid w_1^n) = \frac{P(w_1^n \mid t_1^n)\,P(t_1^n)}{P(w_1^n)}$$

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\,P(t_1^n)}{P(w_1^n)} = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\,P(t_1^n)$$

(The denominator P(w1…wn) is the same for every candidate tag sequence, so it can be dropped.)
Likelihood and prior

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \underbrace{P(w_1^n \mid t_1^n)}_{\text{likelihood}}\;\underbrace{P(t_1^n)}_{\text{prior}} \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\,P(t_i \mid t_{i-1})$$
Two kinds of probabilities (1)
• Tag transition probabilities p(ti|ti−1)
• Determiners likely to precede adjectives and nouns
• That/DT flight/NN
• The/DT yellow/JJ hat/NN
• So we expect P(NN|DT) and P(JJ|DT) to be high
• But P(DT|JJ) to be low
• Compute P(NN|DT) by counting in a labeled corpus:

$$P(NN \mid DT) = \frac{C(DT, NN)}{C(DT)}$$
Two kinds of probabilities (2)
• Word likelihood probabilities p(wi|ti)
• VBZ (3sg Pres verb) likely to be "is"
• Compute P(is|VBZ) by counting in a labeled corpus:

$$P(is \mid VBZ) = \frac{C(VBZ, is)}{C(VBZ)}$$
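Both kinds of counts can be read directly off a tagged corpus. Here is a minimal sketch; the corpus format (a list of sentences, each a list of (word, tag) pairs), the `estimate` helper, and the toy corpus are illustrative assumptions, not from the slides.

```python
from collections import Counter

def estimate(tagged_sentences):
    """MLE estimates by counting in a labeled corpus, e.g.
    P(NN|DT) = C(DT,NN)/C(DT) and P(is|VBZ) = C(VBZ,is)/C(VBZ)."""
    trans, emit, tag_ct, ctx_ct = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "<s>"  # distinguished start symbol
        for word, tag in sent:
            trans[(prev, tag)] += 1
            ctx_ct[prev] += 1          # C(t_{i-1}) for transitions
            emit[(tag, word)] += 1
            tag_ct[tag] += 1           # C(t_i) for emissions
            prev = tag

    def p_trans(t, prev):   # P(t | prev) = C(prev, t) / C(prev)
        return trans[(prev, t)] / ctx_ct[prev]

    def p_emit(w, t):       # P(w | t) = C(t, w) / C(t)
        return emit[(t, w)] / tag_ct[t]

    return p_trans, p_emit

# Toy corpus using the slide's examples (hypothetical data):
corpus = [[("the", "DT"), ("flight", "NN")],
          [("the", "DT"), ("yellow", "JJ"), ("hat", "NN")]]
p_trans, p_emit = estimate(corpus)
print(p_trans("NN", "DT"))     # C(DT,NN)/C(DT) = 1/2
print(p_emit("flight", "NN"))  # C(NN,flight)/C(NN) = 1/2
```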
An Example: the verb "race"
• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
• People/NNS continue/VB to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
• How do we pick the right tag?
Disambiguating "race"

[Figures: the two candidate tag sequences for "Secretariat is expected to race tomorrow," comparing the race/VB reading with the race/NN reading]
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading.
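A quick sanity check of the two products, using exactly the numbers on the slide:

```python
# Verb reading: P(VB|TO) * P(NR|VB) * P(race|VB)
verb = 0.83 * 0.0027 * 0.00012      # ≈ 2.7e-7  (.00000027)
# Noun reading: P(NN|TO) * P(NR|NN) * P(race|NN)
noun = 0.00047 * 0.0012 * 0.00057   # ≈ 3.2e-10 (.00000000032)
print(verb > noun)  # True: the verb reading wins by roughly three orders of magnitude
```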
Definitions
• A weighted finite-state automaton adds probabilities to the arcs
• The probabilities on the arcs leaving any state must sum to one
• A Markov chain is a special case of a weighted automaton in which the input sequence uniquely determines which states the automaton will go through
• Markov chains can't represent inherently ambiguous problems
• They assign probabilities to unambiguous sequences
Markov chain for weather
Markov chain for words
Markov chain = "First-order observable Markov Model"
• A set of states
• Q = q1, q2…qN; the state at time t is qt
• Transition probabilities:
• A set of probabilities A = a01 a02 … an1 … ann
• Each aij represents the probability of transitioning from state i to state j
• The set of these is the transition probability matrix A
• Distinguished start and end states

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i, j \le N$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$$
Markov chain = "First-order observable Markov Model"
• Current state only depends on previous state

$$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$$
Another representation for start state
• Instead of a start state
• Special initial probability vector π
• An initial distribution over probability of start states
• Constraints:

$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$

$$\sum_{j=1}^{N} \pi_j = 1$$
The weather figure using pi

The weather figure: specific example
Markov chain for weather
• What is the probability of 4 consecutive rainy days?
• Sequence is rainy-rainy-rainy-rainy
• I.e., state sequence is 3-3-3-3
• P(3,3,3,3) = π3 · a33 · a33 · a33 = 0.2 × (0.6)³ = 0.0432
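A small sketch of this computation as code. Only the two values the slide gives (π3 = 0.2 and a33 = 0.6) are used; the rest of the weather figure is not reproduced in this transcript.

```python
def sequence_prob(seq, pi, A):
    """P(q1..qn) = pi[q1] * product of A[q_{t-1}][q_t] over the transitions."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

pi = {3: 0.2}        # pi_3 = 0.2, from the weather figure
A = {3: {3: 0.6}}    # a_33 = 0.6; other entries of the figure omitted
print(sequence_prob([3, 3, 3, 3], pi, A))  # 0.2 * 0.6**3 = 0.0432
```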
Response
Hidden Markov Models
• We don't observe POS tags
• We infer them from the words we see
• Words are the observed events
• POS tags are the hidden events
HMM for Ice Cream
• You are a climatologist in the year 2799
• Studying global warming
• You can't find any records of the weather in New York, NY for the summer of 2007
• But you find Kathy McKeown's diary
• Which lists how many ice creams Kathy ate every day that summer
• Our job: figure out how hot it was
Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
• See hot weather: we're in state hot
• But in part-of-speech tagging (and other things)
• The output symbols are words
• The hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
• This means we don't know which state we are in.
Hidden Markov Models
• States Q = q1, q2…qN
• Observations O = o1, o2…oN
• Each observation is a symbol from a vocabulary V = {v1, v2, …, vV}
• Transition probabilities
• Transition probability matrix A = {aij}

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i, j \le N$$

• Observation likelihoods
• Output probability matrix B = {bi(k)}

$$b_i(k) = P(X_t = o_k \mid q_t = i)$$

• Special initial probability vector π

$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$
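As a concrete sketch, the ice-cream HMM can be written down directly in this (Q, V, A, B, π) form. The numeric values below are illustrative assumptions, since the actual figure is not reproduced in this transcript; the asserts anticipate the constraints on the next slide.

```python
states = ["H", "C"]                  # Q: hidden weather states (hot, cold)
vocab = [1, 2, 3]                    # V: ice creams eaten in a day

pi = {"H": 0.8, "C": 0.2}            # initial probability vector (assumed)
A = {"H": {"H": 0.7, "C": 0.3},      # transition probability matrix (assumed)
     "C": {"H": 0.4, "C": 0.6}}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4},  # observation likelihoods b_i(k) (assumed)
     "C": {1: 0.5, 2: 0.4, 3: 0.1}}

# The constraints are easy to check mechanically:
assert abs(sum(pi.values()) - 1) < 1e-9
for s in states:
    assert abs(sum(A[s].values()) - 1) < 1e-9  # each row of A sums to 1
    assert abs(sum(B[s].values()) - 1) < 1e-9  # each b_i sums to 1
```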
Hidden Markov Models
• Some constraints

$$\pi_i = P(q_1 = i), \quad 1 \le i \le N \qquad \sum_{j=1}^{N} \pi_j = 1$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N \qquad \sum_{k=1}^{M} b_i(k) = 1$$
Assumptions
• Markov assumption:

$$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$$

• Output-independence assumption:

$$P(o_t \mid o_1^{t-1}, q_1^{t}) = P(o_t \mid q_t)$$
McKeown task
• Given
• Ice cream observation sequence: 1, 2, 3, 2, 2, 2, 3…
• Produce:
• Weather sequence: H, C, H, H, H, C…
HMM for ice cream
Different types of HMM structure
• Bakis = left-to-right
• Ergodic = fully-connected

Transitions between the hidden states of the HMM, showing the A probabilities

B observation likelihoods for the POS HMM
Three fundamental problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O|λ).
• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?
Response
Decoding
• The best hidden sequence
• Weather sequence in the ice cream task
• POS sequence given an input sentence
• We could use argmax over the probability of each possible hidden state sequence
• Why not? (There are exponentially many hidden state sequences.)
• Viterbi algorithm
• Dynamic programming algorithm
• Uses a dynamic programming trellis
• Each trellis cell vt(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence
Viterbi intuition: we are looking for the best 'path'

[Trellis figures: the sentence "promised to back the bill," with the candidate tags VBD, VBN, TO, VB, JJ, NN, RB, DT, and NNP listed for each word; states S1–S5 trace one path through the lattice]

Slide from Dekang Lin
Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.
• An extension of a path from state i at time t−1 is computed by multiplying:
• the previous Viterbi path probability vt−1(i)
• the transition probability aij
• the observation likelihood bj(ot)

$$v_t(j) = \max_{1 \le i \le N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)$$
The Viterbi Algorithm
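The algorithm figure is not reproduced in this transcript, so here is a minimal Python sketch of Viterbi decoding: a direct rendering of the recurrence above, run on the same illustrative ice-cream HMM (the probability values are assumptions, not from the slides).

```python
def viterbi(obs, states, pi, A, B):
    """Return the most likely hidden state sequence for obs."""
    # v[t][s] = probability of the best path ending in state s after t+1 observations
    v = [{s: pi[s] * B[s][obs[0]] for s in states}]
    back = [{}]  # backpointers for recovering the best path

    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            # Extend each path by multiplying: previous Viterbi value,
            # transition probability, and observation likelihood.
            best_prev = max(states, key=lambda p: v[t - 1][p] * A[p][s])
            v[t][s] = v[t - 1][best_prev] * A[best_prev][s] * B[s][obs[t]]
            back[t][s] = best_prev

    # Follow backpointers from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Illustrative ice-cream HMM (values assumed, as in the earlier sketch):
states = ["H", "C"]
pi = {"H": 0.8, "C": 0.2}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

print(viterbi([3, 1, 3], states, pi, A, B))  # -> ['H', 'H', 'H'] with these numbers
```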
The A matrix for the POS HMM
What is P(VB|TO)? What is P(NN|TO)? Why does this make sense? What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?
The B matrix for the POS HMM
Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.
Viterbi example

[Trellis figures, t = 1: the first column of Viterbi values is filled in cell by cell. The transition from the start state (I = S) to state J = NN contributes .041, which is multiplied by the observation likelihood to give 0; the remaining cells come out to 0, 0, and .025]

The A matrix for the POS HMM

The B matrix for the POS HMM

Viterbi example

Show the 4 formulas you would use to compute the value at this node and the max.
Computing the likelihood of an observation
• Forward algorithm
• Exactly like the Viterbi algorithm, except:
• To compute the probability of a state, sum the probabilities from each path
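A sketch of that one change: the recurrence is identical except that the max over previous states becomes a sum. It reuses the same illustrative ice-cream HMM (assumed values) as the Viterbi sketch.

```python
def forward(obs, states, pi, A, B):
    """Total likelihood P(O | lambda): like Viterbi, but sum over paths instead of max."""
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * A[p][s] for p in states) * B[s][o]
                 for s in states}
    return sum(alpha.values())

# Same illustrative ice-cream HMM as in the Viterbi sketch:
states = ["H", "C"]
pi = {"H": 0.8, "C": 0.2}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}
print(forward([3, 1, 3], states, pi, A, B))  # ≈ 0.0263 with these numbers
```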
Error Analysis: ESSENTIAL!!!
• Look at a confusion matrix
• See what errors are causing problems
• Noun (NN) vs Proper Noun (NNP) vs Adj (JJ)
• Adverb (RB) vs Prep (IN) vs Noun (NN)
• Preterite (VBD) vs Participle (VBN) vs Adjective (JJ)