
CS 4705: Hidden Markov Models (9/20/17)


Slides adapted from Dan Jurafsky and James Martin

Announcements and Questions
• HW1: Determine whether unigrams, bigrams, trigrams, or some combination of the three works best, and experiment with ML parameters (e.g., kernel and C for SVM). Then do feature selection and add additional features on top of that result.

• Keep in mind that you can have lower accuracy without a large penalty in points.

• Final exam: tentatively scheduled for 12/21, but the date will be finalized in November by the registrar. We will have the exam on the exam date.

• Class electronics policy: no open laptops in class.


POS tagging as a sequence classification task
• We are given a sentence (an "observation" or "sequence of observations")
  • Secretariat is expected to race tomorrow

• What is the best sequence of tags which corresponds to this sequence of observations?

• Probabilistic view:
  • Consider all possible sequences of tags
  • Choose the tag sequence which is most probable given the observation sequence of n words w1…wn


Getting to HMM
• Out of all sequences of n tags t1…tn, we want the single tag sequence such that P(t1…tn | w1…wn) is highest.

• The hat ^ means "our estimate of the best one"

• argmaxx f(x) means "the x such that f(x) is maximized"
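Spelled out in the standard Jurafsky & Martin notation, the goal is:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$$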


Getting to HMM
• This equation is guaranteed to give us the best tag sequence

• Intuition of Bayesian classification:
  • Use Bayes rule to transform the equation into a set of other probabilities that are easier to compute


Using Bayes Rule
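Applying Bayes rule, and then dropping the denominator because it does not depend on the tag sequence:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)} = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)$$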


Likelihood and prior
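With the bigram (Markov) assumption for tags and the output-independence assumption for words, this becomes the standard approximation:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \underbrace{P(w_1^n \mid t_1^n)}_{\text{likelihood}}\; \underbrace{P(t_1^n)}_{\text{prior}} \;\approx\; \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$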

Two kinds of probabilities (1)
• Tag transition probabilities p(ti | ti-1)

• Determiners likely to precede adjectives and nouns
  • That/DT flight/NN
  • The/DT yellow/JJ hat/NN
  • So we expect P(NN|DT) and P(JJ|DT) to be high
  • But P(DT|JJ) to be low

• Compute P(NN|DT) by counting in a labeled corpus:
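The usual maximum-likelihood estimate is the relative frequency:

$$P(NN \mid DT) = \frac{C(DT,\, NN)}{C(DT)}$$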

Two kinds of probabilities (2)
• Word likelihood probabilities p(wi | ti)

• VBZ (3sg present verb) likely to be "is"

• Compute P(is|VBZ) by counting in a labeled corpus:
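Likewise, $P(is \mid VBZ) = \frac{C(VBZ,\, is)}{C(VBZ)}$. A minimal counting sketch (my own illustration with a made-up toy corpus, not course code):

```python
from collections import defaultdict

def estimate_hmm_probs(tagged_sentences):
    """Estimate tag-transition and word-likelihood probabilities
    by relative-frequency counting over (word, tag) sequences."""
    transition_counts = defaultdict(lambda: defaultdict(int))  # C(t_{i-1}, t_i)
    emission_counts = defaultdict(lambda: defaultdict(int))    # C(t_i, w_i)

    for sentence in tagged_sentences:
        prev_tag = "<s>"                       # sentence-start pseudo-tag
        for word, tag in sentence:
            transition_counts[prev_tag][tag] += 1
            emission_counts[tag][word] += 1
            prev_tag = tag

    # P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
    A = {prev: {t: c / sum(nxt.values()) for t, c in nxt.items()}
         for prev, nxt in transition_counts.items()}
    # P(w_i | t_i) = C(t_i, w_i) / C(t_i)
    B = {tag: {w: c / sum(ws.values()) for w, c in ws.items()}
         for tag, ws in emission_counts.items()}
    return A, B

# Toy corpus, made up purely for illustration.
corpus = [
    [("that", "DT"), ("flight", "NN"), ("is", "VBZ"), ("late", "JJ")],
    [("the", "DT"), ("yellow", "JJ"), ("hat", "NN")],
]
A, B = estimate_hmm_probs(corpus)
print(A["DT"].get("NN", 0.0))   # estimate of P(NN|DT)
print(B["VBZ"].get("is", 0.0))  # estimate of P(is|VBZ)
```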


An Example: the verb "race"
• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR

• People/NNS continue/VB to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN

• How do we pick the right tag?


Disambiguating "race"

(figure, repeated over several slides: the two candidate tag sequences for "race", one with race/VB and one with race/NN, with the relevant transition and word-likelihood probabilities marked)

• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading
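A quick check of the arithmetic (throwaway snippet, not course code):

```python
# Probabilities from the slide above
p_vb_reading = 0.83 * 0.0027 * 0.00012      # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn_reading = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)
print(p_vb_reading)   # ~2.7e-07
print(p_nn_reading)   # ~3.2e-10, so the verb reading wins by about three orders of magnitude
```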


Definitions
• A weighted finite-state automaton adds probabilities to the arcs
  • The probabilities on the arcs leaving any state must sum to one

• A Markov chain is a special case of a WFSA
  • the input sequence uniquely determines which states the automaton will go through

• Markov chains can't represent inherently ambiguous problems
  • They assign probabilities to unambiguous sequences


Markov chain for weather


Markov chain for words


Markov chain = "First-order observable Markov Model"
• A set of states
  • Q = q1, q2 … qN; the state at time t is qt

• Transition probabilities:
  • a set of probabilities A = a01 a02 … an1 … ann
  • Each aij represents the probability of transitioning from state i to state j
  • The set of these is the transition probability matrix A

• Distinguished start and end states


$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i,\, j \le N$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$$

Markov chain = "First-order observable Markov Model"
• The current state only depends on the previous state


$$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$$

Another representation for the start state
• Instead of a start state

• Special initial probability vector π
  • An initial probability distribution over the start states

• Constraints:


$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$

$$\sum_{j=1}^{N} \pi_j = 1$$

The weather figure using π


The weather figure: specific example


Markov chain for weather
• What is the probability of 4 consecutive rainy days?
• The sequence is rainy-rainy-rainy-rainy
• I.e., the state sequence is 3-3-3-3
• $P(3,3,3,3) = \pi_3\, a_{33}\, a_{33}\, a_{33} = 0.2 \times (0.6)^3 = 0.0432$
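A minimal sketch of the same computation (only π3 = 0.2 and a33 = 0.6 come from the slide; the helper function is my own illustration):

```python
# Probability of a state sequence under a first-order Markov chain:
# P(q1..qT) = pi[q1] * product of a[q_{t-1}][q_t]
def sequence_prob(pi, a, states):
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= a[prev][cur]
    return prob

pi = {3: 0.2}              # initial probability of the rainy state (state 3)
a = {3: {3: 0.6}}          # self-transition probability rainy -> rainy
print(sequence_prob(pi, a, [3, 3, 3, 3]))  # 0.2 * 0.6**3 = 0.0432
```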


Response


Hidden Markov Models
• We don't observe POS tags
  • We infer them from the words we see

• Words are the observed events

• POS tags are the hidden events


HMM for Ice Cream
• You are a climatologist in the year 2799
  • Studying global warming
• You can't find any records of the weather in New York, NY for the summer of 2007

• But you find Kathy McKeown's diary
  • Which lists how many ice creams Kathy ate every day that summer

• Our job: figure out how hot it was


Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
  • See hot weather: we're in state hot

• But in part-of-speech tagging (and other things)
  • The output symbols are words
  • The hidden states are part-of-speech tags

• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
  • This means we don't know which state we are in.


Hidden Markov Models
• States Q = q1, q2 … qN
• Observations O = o1, o2 … oN
  • Each observation is a symbol from a vocabulary V = {v1, v2, … vV}

• Transition probabilities
  • Transition probability matrix A = {aij}

• Observation likelihoods
  • Output probability matrix B = {bi(k)}

• Special initial probability vector π

$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i,\, j \le N$$

$$b_i(k) = P(X_t = o_k \mid q_t = i)$$

Hidden Markov Models
• Some constraints:


$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$$

$$\sum_{k=1}^{M} b_i(k) = 1$$

$$\sum_{j=1}^{N} \pi_j = 1$$
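To make the pieces concrete, here is a small sketch of the ice-cream HMM's parameters as plain arrays, with the normalization constraints above checked. The numeric values are illustrative placeholders I chose, not values from the slides:

```python
import numpy as np

states = ["HOT", "COLD"]           # hidden states
vocab = [1, 2, 3]                  # observations: ice creams eaten per day

pi = np.array([0.8, 0.2])          # initial state distribution (illustrative values)
A = np.array([[0.7, 0.3],          # transition matrix: A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],     # emission matrix: B[i, k] = P(o_t = vocab[k] | q_t = i)
              [0.5, 0.4, 0.1]])

# Normalization constraints from the slide
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```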

Assumptions
• Markov assumption:

• Output-independence assumption:


$$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$$

$$P(o_t \mid O_1^{t-1},\, q_1^{t}) = P(o_t \mid q_t)$$

McKeown task
• Given
  • Ice cream observation sequence: 1, 2, 3, 2, 2, 2, 3 …

• Produce:
  • Weather sequence: H, C, H, H, H, C …


HMM for ice cream


Different types of HMM structure
• Bakis = left-to-right
• Ergodic = fully connected

Transitions between the hidden states of the HMM, showing the A probabilities


B observation likelihoods for the POS HMM

Three fundamental problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).

• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.

• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?


Response


Decoding
• The best hidden sequence
  • Weather sequence in the ice cream task
  • POS sequence given an input sentence

• We could take the argmax over the probability of every possible hidden state sequence
  • Why not?

• Viterbi algorithm
  • Dynamic programming algorithm
  • Uses a dynamic programming trellis
  • Each trellis cell vt(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence
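In the standard Jurafsky & Martin notation:

$$v_t(j) = \max_{q_1, \ldots, q_{t-1}} P(q_1 \ldots q_{t-1},\; o_1 \ldots o_t,\; q_t = j \mid \lambda)$$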


Viterbi intuition: we are looking for the best "path"

(figure: a trellis over the sentence "promised to back the bill", with a column of candidate tags (VBD, VBN, TO, VB, JJ, NN, RB, DT, NNP) for each word, and the best path through states S1 … S5 highlighted)

Slide from Dekang Lin

Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.

• An extension of a path from state i at time t-1 is computed by multiplying:
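The three factors are the previous path probability, the transition probability, and the observation likelihood, giving the standard Viterbi recurrence:

$$v_t(j) = \max_{i=1}^{N} \; v_{t-1}(i)\; a_{ij}\; b_j(o_t)$$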


The Viterbi Algorithm
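A minimal Python sketch of the Viterbi recursion, using the same illustrative (assumed) ice-cream parameters as in the earlier sketch rather than anything from the slides:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Return the most likely hidden state sequence for an observation sequence.
    pi[i]: initial prob of state i; A[i, j]: transition i->j; B[i, k]: prob of symbol k in state i."""
    N = len(pi)                      # number of states
    T = len(obs)
    v = np.zeros((T, N))             # v[t, j]: best path probability ending in state j at time t
    back = np.zeros((T, N), dtype=int)

    v[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs[t]]
            back[t, j] = np.argmax(scores)         # remember the best predecessor
            v[t, j] = scores[back[t, j]]

    # Follow backpointers from the best final state
    path = [int(np.argmax(v[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))

# Ice-cream example with illustrative (assumed) parameters: states 0 = HOT, 1 = COLD,
# observation symbols 0/1/2 = one/two/three ice creams eaten.
pi = np.array([0.8, 0.2])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]])
print(viterbi([2, 0, 2], pi, A, B))   # [0, 0, 0], i.e. HOT, HOT, HOT
```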


The A matrix for the POS HMM


What is P(VB|TO)? What is P(NN|TO)? Why does this make sense? What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?

The B matrix for the POS HMM


Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.

Viterbi example

(sequence of figures stepping through the Viterbi trellis at t = 1: starting from the start state i = S, the cell for state j = NN is filled in; intermediate values such as .041 and .025 appear in the trellis, and the A and B matrices for the POS HMM are shown again for reference)

Show the 4 formulas you would use to compute the value at this node and the max.

Computing the likelihood of an observation
• Forward algorithm

• Exactly like the Viterbi algorithm, except:
  • To compute the probability of a state, sum the probabilities from each path
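That is, replace the max in the Viterbi recurrence with a sum (the standard forward recurrence):

$$\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\; a_{ij}\; b_j(o_t), \qquad P(O \mid \lambda) = \sum_{j=1}^{N} \alpha_T(j)$$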


Error analysis: ESSENTIAL!!!
• Look at a confusion matrix

• See what errors are causing problems
  • Noun (NN) vs. Proper Noun (NNP) vs. Adj (JJ)
  • Adverb (RB) vs. Prep (IN) vs. Noun (NN)
  • Preterite (VBD) vs. Participle (VBN) vs. Adjective (JJ)

