TRANSCRIPT
CS 4705 Hidden Markov Models
9/20/17
Slides adapted from Dan Jurafsky and James Martin
Announcements and Questions
• HW1: Determine whether unigrams, bigrams, trigrams, or some combination of the three works best, and experiment with ML parameters (e.g., kernel and C for SVM). Then do feature selection and add additional features on top of that result.
• Keep in mind that you can have lower accuracy without a large penalty in points.
• Final exam: tentatively scheduled for 12/21 but will be finalized in November by the registrar. We will have the exam on the exam date.
• Class electronics policy: no open laptops in class.
POS tagging as a sequence classification task
• We are given a sentence (an "observation" or "sequence of observations")
• Secretariat is expected to race tomorrow
• What is the best sequence of tags which corresponds to this sequence of observations?
• Probabilistic view:
• Consider all possible sequences of tags
• Choose the tag sequence which is most probable given the observation sequence of n words w1…wn.
Getting to HMM
• Out of all sequences of n tags t1…tn, we want the single tag sequence such that P(t1…tn|w1…wn) is highest:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)$$

• The hat ^ means "our estimate of the best one"
• argmax_x f(x) means "the x such that f(x) is maximized"
Getting to HMM
• This equation is guaranteed to give us the best tag sequence
• Intuition of Bayesian classification:
• Use Bayes rule to transform the equation into a set of other probabilities that are easier to compute
Using Bayes Rule

$$P(t_1^n \mid w_1^n) = \frac{P(w_1^n \mid t_1^n)\,P(t_1^n)}{P(w_1^n)}$$

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\,P(t_1^n)}{P(w_1^n)} = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\,P(t_1^n)$$

(The denominator P(w1…wn) is the same for every candidate tag sequence, so it can be dropped.)
Likelihood and prior

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} \underbrace{P(w_1^n \mid t_1^n)}_{\text{likelihood}}\;\underbrace{P(t_1^n)}_{\text{prior}} \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\,P(t_i \mid t_{i-1})$$
Two kinds of probabilities (1)
• Tag transition probabilities p(ti|ti−1)
• Determiners likely to precede adjectives and nouns
• That/DT flight/NN
• The/DT yellow/JJ hat/NN
• So we expect P(NN|DT) and P(JJ|DT) to be high
• But P(DT|JJ) to be low
• Compute P(NN|DT) by counting in a labeled corpus:

$$P(NN \mid DT) = \frac{C(DT, NN)}{C(DT)}$$
Two kinds of probabilities (2)
• Word likelihood probabilities p(wi|ti)
• VBZ (3sg Pres verb) likely to be "is"
• Compute P(is|VBZ) by counting in a labeled corpus:

$$P(is \mid VBZ) = \frac{C(VBZ, is)}{C(VBZ)}$$
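Both kinds of counts can be read directly off a tagged corpus. Here is a minimal sketch; the corpus format (a list of sentences, each a list of (word, tag) pairs), the `estimate` helper, and the toy corpus are illustrative assumptions, not from the slides.

```python
from collections import Counter

def estimate(tagged_sentences):
    """MLE estimates by counting in a labeled corpus, e.g.
    P(NN|DT) = C(DT,NN)/C(DT) and P(is|VBZ) = C(VBZ,is)/C(VBZ)."""
    trans, emit, tag_ct, ctx_ct = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "<s>"  # distinguished start symbol
        for word, tag in sent:
            trans[(prev, tag)] += 1
            ctx_ct[prev] += 1          # C(t_{i-1}) for transitions
            emit[(tag, word)] += 1
            tag_ct[tag] += 1           # C(t_i) for emissions
            prev = tag

    def p_trans(t, prev):   # P(t | prev) = C(prev, t) / C(prev)
        return trans[(prev, t)] / ctx_ct[prev]

    def p_emit(w, t):       # P(w | t) = C(t, w) / C(t)
        return emit[(t, w)] / tag_ct[t]

    return p_trans, p_emit

# Toy corpus using the slide's examples (hypothetical data):
corpus = [[("the", "DT"), ("flight", "NN")],
          [("the", "DT"), ("yellow", "JJ"), ("hat", "NN")]]
p_trans, p_emit = estimate(corpus)
print(p_trans("NN", "DT"))     # C(DT,NN)/C(DT) = 1/2
print(p_emit("flight", "NN"))  # C(NN,flight)/C(NN) = 1/2
```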
An Example: the verb "race"
• Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
• People/NNS continue/VB to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
• How do we pick the right tag?
Disambiguating "race"

[Figures: the two candidate tag sequences for "Secretariat is expected to race tomorrow," comparing the race/VB reading with the race/NN reading]
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO) P(NR|VB) P(race|VB) = .00000027
• P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
• So we (correctly) choose the verb reading.
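A quick sanity check of the two products, using exactly the numbers on the slide:

```python
# Verb reading: P(VB|TO) * P(NR|VB) * P(race|VB)
verb = 0.83 * 0.0027 * 0.00012      # ≈ 2.7e-7  (.00000027)
# Noun reading: P(NN|TO) * P(NR|NN) * P(race|NN)
noun = 0.00047 * 0.0012 * 0.00057   # ≈ 3.2e-10 (.00000000032)
print(verb > noun)  # True: the verb reading wins by roughly three orders of magnitude
```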
Definitions
• A weighted finite-state automaton adds probabilities to the arcs
• The probabilities on the arcs leaving any state must sum to one
• A Markov chain is a special case of a weighted automaton in which the input sequence uniquely determines which states the automaton will go through
• Markov chains can't represent inherently ambiguous problems
• They assign probabilities to unambiguous sequences
Markov chain for weather
Markov chain for words
Markov chain = "First-order observable Markov Model"
• A set of states
• Q = q1, q2…qN; the state at time t is qt
• Transition probabilities:
• A set of probabilities A = a01 a02 … an1 … ann
• Each aij represents the probability of transitioning from state i to state j
• The set of these is the transition probability matrix A
• Distinguished start and end states

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i, j \le N$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N$$
Markov chain = "First-order observable Markov Model"
• Current state only depends on previous state

$$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$$
Another representation for start state
• Instead of a start state
• Special initial probability vector π
• An initial distribution over probability of start states
• Constraints:

$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$

$$\sum_{j=1}^{N} \pi_j = 1$$
The weather figure using pi

The weather figure: specific example
Markov chain for weather
• What is the probability of 4 consecutive rainy days?
• Sequence is rainy-rainy-rainy-rainy
• I.e., state sequence is 3-3-3-3
• P(3,3,3,3) = π3 · a33 · a33 · a33 = 0.2 × (0.6)³ = 0.0432
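A small sketch of this computation as code. Only the two values the slide gives (π3 = 0.2 and a33 = 0.6) are used; the rest of the weather figure is not reproduced in this transcript.

```python
def sequence_prob(seq, pi, A):
    """P(q1..qn) = pi[q1] * product of A[q_{t-1}][q_t] over the transitions."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

pi = {3: 0.2}        # pi_3 = 0.2, from the weather figure
A = {3: {3: 0.6}}    # a_33 = 0.6; other entries of the figure omitted
print(sequence_prob([3, 3, 3, 3], pi, A))  # 0.2 * 0.6**3 = 0.0432
```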
Response
Hidden Markov Models
• We don't observe POS tags
• We infer them from the words we see
• Words are the observed events
• POS tags are the hidden events
HMM for Ice Cream
• You are a climatologist in the year 2799
• Studying global warming
• You can't find any records of the weather in New York, NY for the summer of 2007
• But you find Kathy McKeown's diary
• Which lists how many ice creams Kathy ate every day that summer
• Our job: figure out how hot it was
Hidden Markov Model
• For Markov chains, the output symbols are the same as the states.
• See hot weather: we're in state hot
• But in part-of-speech tagging (and other things)
• The output symbols are words
• The hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in which the output symbols are not the same as the states.
• This means we don't know which state we are in.
Hidden Markov Models
• States Q = q1, q2…qN
• Observations O = o1, o2…oN
• Each observation is a symbol from a vocabulary V = {v1, v2, …, vV}
• Transition probabilities
• Transition probability matrix A = {aij}

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad 1 \le i, j \le N$$

• Observation likelihoods
• Output probability matrix B = {bi(k)}

$$b_i(k) = P(X_t = o_k \mid q_t = i)$$

• Special initial probability vector π

$$\pi_i = P(q_1 = i), \quad 1 \le i \le N$$
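As a concrete sketch, the ice-cream HMM can be written down directly in this (Q, V, A, B, π) form. The numeric values below are illustrative assumptions, since the actual figure is not reproduced in this transcript; the asserts anticipate the constraints on the next slide.

```python
states = ["H", "C"]                  # Q: hidden weather states (hot, cold)
vocab = [1, 2, 3]                    # V: ice creams eaten in a day

pi = {"H": 0.8, "C": 0.2}            # initial probability vector (assumed)
A = {"H": {"H": 0.7, "C": 0.3},      # transition probability matrix (assumed)
     "C": {"H": 0.4, "C": 0.6}}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4},  # observation likelihoods b_i(k) (assumed)
     "C": {1: 0.5, 2: 0.4, 3: 0.1}}

# The constraints are easy to check mechanically:
assert abs(sum(pi.values()) - 1) < 1e-9
for s in states:
    assert abs(sum(A[s].values()) - 1) < 1e-9  # each row of A sums to 1
    assert abs(sum(B[s].values()) - 1) < 1e-9  # each b_i sums to 1
```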
Hidden Markov Models
• Some constraints

$$\pi_i = P(q_1 = i), \quad 1 \le i \le N \qquad \sum_{j=1}^{N} \pi_j = 1$$

$$\sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i \le N \qquad \sum_{k=1}^{M} b_i(k) = 1$$
Assumptions
• Markov assumption:

$$P(q_i \mid q_1 \ldots q_{i-1}) = P(q_i \mid q_{i-1})$$

• Output-independence assumption:

$$P(o_t \mid o_1^{t-1}, q_1^{t}) = P(o_t \mid q_t)$$
McKeown task
• Given
• Ice cream observation sequence: 1, 2, 3, 2, 2, 2, 3…
• Produce:
• Weather sequence: H, C, H, H, H, C…
HMM for ice cream
Different types of HMM structure
• Bakis = left-to-right
• Ergodic = fully-connected

Transitions between the hidden states of the HMM, showing the A probabilities

B observation likelihoods for the POS HMM
Three fundamental problems for HMMs
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O|λ).
• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
• Learning: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B. What kind of data would we need to learn the HMM parameters?
Response
Decoding
• The best hidden sequence
• Weather sequence in the ice cream task
• POS sequence given an input sentence
• We could use argmax over the probability of each possible hidden state sequence
• Why not? (There are exponentially many hidden state sequences.)
• Viterbi algorithm
• Dynamic programming algorithm
• Uses a dynamic programming trellis
• Each trellis cell vt(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most likely state sequence
Viterbi intuition: we are looking for the best 'path'

[Trellis figures: the sentence "promised to back the bill," with the candidate tags VBD, VBN, TO, VB, JJ, NN, RB, DT, and NNP listed for each word; states S1–S5 trace one path through the lattice]

Slide from Dekang Lin
Intuition
• The value in each cell is computed by taking the MAX over all paths that lead to this cell.
• An extension of a path from state i at time t−1 is computed by multiplying:
• the previous Viterbi path probability vt−1(i)
• the transition probability aij
• the observation likelihood bj(ot)

$$v_t(j) = \max_{1 \le i \le N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)$$
The Viterbi Algorithm
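The algorithm figure is not reproduced in this transcript, so here is a minimal Python sketch of Viterbi decoding: a direct rendering of the recurrence above, run on the same illustrative ice-cream HMM (the probability values are assumptions, not from the slides).

```python
def viterbi(obs, states, pi, A, B):
    """Return the most likely hidden state sequence for obs."""
    # v[t][s] = probability of the best path ending in state s after t+1 observations
    v = [{s: pi[s] * B[s][obs[0]] for s in states}]
    back = [{}]  # backpointers for recovering the best path

    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            # Extend each path by multiplying: previous Viterbi value,
            # transition probability, and observation likelihood.
            best_prev = max(states, key=lambda p: v[t - 1][p] * A[p][s])
            v[t][s] = v[t - 1][best_prev] * A[best_prev][s] * B[s][obs[t]]
            back[t][s] = best_prev

    # Follow backpointers from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Illustrative ice-cream HMM (values assumed, as in the earlier sketch):
states = ["H", "C"]
pi = {"H": 0.8, "C": 0.2}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

print(viterbi([3, 1, 3], states, pi, A, B))  # -> ['H', 'H', 'H'] with these numbers
```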
The A matrix for the POS HMM
What is P(VB|TO)? What is P(NN|TO)? Why does this make sense? What is P(TO|VB)? What is P(TO|NN)? Why does this make sense?
The B matrix for the POS HMM
Look at P(want|VB) and P(want|NN). Give an explanation for the difference in the probabilities.
Viterbi example

[Trellis figures, t = 1: the first column of Viterbi values is filled in cell by cell. The transition from the start state (I = S) to state J = NN contributes .041, which is multiplied by the observation likelihood to give 0; the remaining cells come out to 0, 0, and .025]

The A matrix for the POS HMM

The B matrix for the POS HMM

Viterbi example

Show the 4 formulas you would use to compute the value at this node and the max.
Computing the likelihood of an observation
• Forward algorithm
• Exactly like the Viterbi algorithm, except:
• To compute the probability of a state, sum the probabilities from each path
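A sketch of that one change: the recurrence is identical except that the max over previous states becomes a sum. It reuses the same illustrative ice-cream HMM (assumed values) as the Viterbi sketch.

```python
def forward(obs, states, pi, A, B):
    """Total likelihood P(O | lambda): like Viterbi, but sum over paths instead of max."""
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * A[p][s] for p in states) * B[s][o]
                 for s in states}
    return sum(alpha.values())

# Same illustrative ice-cream HMM as in the Viterbi sketch:
states = ["H", "C"]
pi = {"H": 0.8, "C": 0.2}
A = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
B = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}
print(forward([3, 1, 3], states, pi, A, B))  # ≈ 0.0263 with these numbers
```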
Error Analysis: ESSENTIAL!!!
• Look at a confusion matrix
• See what errors are causing problems
• Noun (NN) vs Proper Noun (NNP) vs Adj (JJ)
• Adverb (RB) vs Prep (IN) vs Noun (NN)
• Preterite (VBD) vs Participle (VBN) vs Adjective (JJ)