
Page 1: CS388: Natural Language Processing, Lecture 10: Interpreting NNs, Neural CRFs (…gdurrett/courses/fa2019/... · 2019. 10. 1.)

CS388: Natural Language Processing

Greg Durrett

Lecture 10: Interpreting NNs, Neural CRFs

credit: Daniel Geng and Rishi Veerapaneni, ML @ Berkeley

Page 2

Administrivia

‣ Mini 2 due in one week

Page 3

Recall: RNN LMs

[Figure: an RNN reads "I saw the dog"; each hidden state h_i is mapped to word probs]

$P(w \mid \text{context}) = \mathrm{softmax}(W h_i)$

‣ W is a (vocab size) x (hidden size) matrix

‣ Backpropagate through the network to simultaneously learn to predict the next word given previous words at all positions

‣ Batch by grabbing many contiguous sequences of text from different parts of a large corpus
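The output layer above can be sketched in a few lines of NumPy. This is only an illustration: the sizes and the random `W` and `h` are made up, and the function name `next_word_probs` is not from the lecture.

```python
import numpy as np

def next_word_probs(W, h):
    """P(w | context) = softmax(W h) for a single hidden state h."""
    logits = W @ h
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Made-up sizes: 6-word vocabulary, 4-dim hidden state.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))  # (vocab size) x (hidden size)
h = rng.normal(size=4)
probs = next_word_probs(W, h)
print(probs, probs.sum())
```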

Page 4

Recall: ELMo

‣ CNN over each word => RNN

[Figure: a Char CNN over each word of "John visited Madagascar yesterday" feeds an LSTM that predicts the next word]

‣ 4096-dim LSTMs w/ 512-dim projections

‣ 2048 CNN filters projected down to 512-dim

‣ Representation of visited (plus vectors from backwards LM)

Peters et al. (2018)

Page 5

Recall: ELMo

Peters, Ruder, Smith (2019)

[Figure: embeddings of "they dance at balls" feed some neural network, which produces task predictions (sentiment, etc.)]

‣ Take those embeddings and feed them into whatever architecture you want to use for your task

‣ For ELMo, best to use frozen embeddings: update the weights of your network but keep ELMo’s parameters frozen

Page 6

This Lecture

‣ Explaining neural networks’ predictions

‣ Neural CRFs

Page 7

Explaining NNs

Page 8

What is an Explanation?

‣ Given a data instance, identify properties of the input/model that led to a particular decision being made

Example: the movie was great, with features = (I[great], I[the])

‣ Suppose weight = (+5, +3), what’s the explanation?

‣ Suppose weight = (+0.1, +5), what’s the explanation?

‣ Explanation != “what a human would do”. So any analysis of explanations has to intrinsically be about our model

‣ Suppose weight = (+5, +0), decision = +. What’s the explanation?

Page 9

Idea 1: Looking at Weights

that movie was not great, in fact it was terrible!

‣ Is the maximum weight always right?

‣ Feats = unigrams and bigrams: w(not great) = -5, w(great) = +5, w(terrible) = -3

‣ Classified as negative; what’s the explanation?

‣ not great and great cancel, don’t really contribute to the classification decision. Correlated features make explanations confusing

‣ How can we define this? Deleting great would probably have little effect on the classification score
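A small sketch of the deletion idea, using the three weights from the slide and treating every other unigram and bigram as weight 0 (an assumption). Punctuation is dropped for simplicity, and the helper names are illustrative only.

```python
def ngram_features(tokens):
    """Unigrams plus adjacent-word bigrams, as feature strings."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams

def score(tokens, weights):
    """Linear score: sum of the weights of the features that fire."""
    return sum(weights.get(f, 0.0) for f in ngram_features(tokens))

def deletion_effects(tokens, weights):
    """For each position, the change in score when that one word is deleted."""
    base = score(tokens, weights)
    return [(tok, score(tokens[:i] + tokens[i + 1:], weights) - base)
            for i, tok in enumerate(tokens)]

weights = {"not great": -5.0, "great": 5.0, "terrible": -3.0}
tokens = "that movie was not great in fact it was terrible".split()
effects = deletion_effects(tokens, weights)
for tok, change in effects:
    print(f"{tok:>10s} {change:+.1f}")
```

Deleting great removes both the great unigram and the not great bigram, so its net effect on the score is zero even though both features carry the largest weights in magnitude.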

Page 10

Idea 2: Counterfactuals

that movie was not ____, in fact it was terrible!

that movie was not great, in fact it was _____!

that movie was not great, in fact it was terrible!

[Figure: each input is passed to the Model; the output shown is +]

‣ LIME: Locally-Interpretable Model-Agnostic Explanations

‣ Perturb input many times and assess the impact on the model’s prediction

Ribeiro et al. (2016)

‣ Local because we’ll do work to learn how to interpret this one example

‣ Model-agnostic: treat model as black box

Page 11

LIME

https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime

‣ Break input into components (for text classification: unigrams)

‣ Check predictions on subsets of those

‣ Train a model to predict predictions, look at that model’s weights

Page 12

LIME

‣ Break down input into many small pieces so the explanation is interpretable: $x \in \mathbb{R}^{d} \rightarrow x' \in \{0, 1\}^{d'}$

Ribeiro et al. (2016)

‣ Draw samples z’ by perturbing x’, then reconstruct z from z’ and compute f(z) on that

‣ Now learn a model to predict f(z) based on z’. This model’s weights will serve as the explanation for the decision (why it’s +)

‣ If z’ is very coarse, can interpret but can’t learn a good model of the boundary. If z’ is too fine-grained, can predict but not interpret (e.g., z’ = z)
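A minimal sketch of the sample-then-fit loop, under stated assumptions: the black box is a hypothetical toy classifier (`toy_black_box`), z is reconstructed from z’ by dropping masked-out words, and the surrogate is a locality-weighted ridge regression rather than the sparse linear model used by Ribeiro et al. (2016). The kernel and its width are illustrative choices, not the paper's settings.

```python
import numpy as np

def toy_black_box(tokens):
    """Hypothetical classifier: P(+) from a few unigram/bigram weights."""
    weights = {"not great": -5.0, "great": 5.0, "terrible": -3.0}
    feats = list(tokens) + [" ".join(p) for p in zip(tokens, tokens[1:])]
    s = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + np.exp(-s))

def lime_explain(tokens, predict_proba, num_samples=500, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    d = len(tokens)
    # Samples z': random binary masks over unigrams; row 0 is x' itself.
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0] = 1
    # Reconstruct z from z' by keeping only unmasked words, then query f(z).
    preds = np.array([
        predict_proba([t for t, keep in zip(tokens, m) if keep]) for m in masks
    ])
    # Locality weights: samples with fewer deleted words count more.
    distance = 1.0 - masks.mean(axis=1)
    sample_weight = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted ridge regression of f(z) on z' (last column is the intercept).
    X = np.hstack([masks, np.ones((num_samples, 1))])
    sw = np.sqrt(sample_weight)
    A, b = X * sw[:, None], preds * sw
    coef = np.linalg.solve(A.T @ A + 1e-3 * np.eye(d + 1), A.T @ b)
    return list(zip(tokens, coef[:d].tolist()))

tokens = "that movie was not great in fact it was terrible".split()
explanation = lime_explain(tokens, toy_black_box)
for tok, weight in explanation:
    print(f"{tok:>10s} {weight:+.3f}")
```

The surrogate's per-word weights are the explanation; words the toy model ignores should end up with weights near zero.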

Page 13

LIME

Ribeiro et al. (2016)

‣ Use a sparse linear model to achieve a sparse explanation

Page 14

LIME

‣ Train a sparse model (only looks at 10 features of each example), then try to use LIME to recover the features. Greedy: remove features to make predicted class prob drop by as much as possible

Page 15

Idea 3: Weights Revisited

[Figure: a network reads "the movie was great" and its final layer outputs P(+|x)]

Can treat this layer like a linear model, but how to connect it to input? Often hundreds of features

‣ Suppose forget gate is very low and the first three words are forgotten

‣ How can we generally assess impact of a word on the prediction?

‣ We don’t have “weights”, but what can tell us about the impact of the input on the output?

‣ LIME is very complex, but looking at weights is too simple

Page 16

Gradient-Based Methods

Simonyan et al. (2013)

$S_c$ = score of class c; $I_0$ = current image

‣ Approximate score with a first-order Taylor series approximation around the current image

‣ Higher gradient magnitude = small change in pixels leads to large change in prediction

‣ To get single magnitude for a pixel, max over color channels. Can do the same for a word (max over vector positions)

‣ Sanity check: does this make sense for linear models?
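The first-order approximation is $S_c(I) \approx w^\top I + b$ with $w = \partial S_c / \partial I$ evaluated at $I_0$. A rough sketch of the word-level version follows, using finite differences in place of backprop so it stays self-contained; the sizes and the linear scorer are made up for the sanity check.

```python
import numpy as np

def numerical_gradient(score_fn, x, eps=1e-5):
    """Central finite-difference estimate of d score / d x (same shape as x)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (score_fn(x + step) - score_fn(x - step)) / (2 * eps)
    return grad

def word_saliency(score_fn, embeddings):
    """One magnitude per word: max absolute gradient over vector positions."""
    grad = numerical_gradient(score_fn, embeddings)
    return np.abs(grad).max(axis=1)

# Sanity check on a linear model: the gradient should equal the weights.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 3))  # 4 words, 3-dim vectors (made-up sizes)
w = rng.normal(size=(4, 3))

def linear_score(e):
    return float((w * e).sum())

saliency = word_saliency(linear_score, emb)
print(saliency)
```

For the linear scorer the saliency of each word reduces to the largest weight magnitude attached to it, which matches "looking at weights".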

Page 17

Gradient-Based Methods

Simonyan et al. (2013)

Page 18

Gradient-based Method

[Figure: two panels, titled good and the]

‣ Changing the word locally has little effect: this word doesn’t matter much

‣ Changing the word makes a difference: seems like the word is having some impact

‣ axes = word vector values. Lighter color = higher positive class probability

Page 19

Gradients vs. LIME

‣ Explanation methods should predict features which, when deleted, cause the prediction to flip

Nguyen (2018)

‣ 1) Rank all features with the method. 2) Delete features and see how long it takes to flip the decision

‣ Omission: like the greedy algorithm from LIME comparison

‣ Saliency (gradient method) is better at finding the flip points than LIME (but only slightly)
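The two-step evaluation can be sketched as follows. The classifier here is a hypothetical unigram model and the two rankings are hand-written stand-ins for what an explanation method would produce; fewer deletions before the flip means a better ranking.

```python
def deletions_to_flip(tokens, ranking, predict_label):
    """Delete tokens in ranked order; return how many deletions it takes
    to change the predicted label, or None if it never changes."""
    original = predict_label(tokens)
    removed = set()
    for count, idx in enumerate(ranking, start=1):
        removed.add(idx)
        remaining = [t for i, t in enumerate(tokens) if i not in removed]
        if predict_label(remaining) != original:
            return count
    return None

def toy_label(tokens):
    """Hypothetical classifier: '+' if the summed unigram weights are positive."""
    weights = {"great": 5.0, "terrible": -3.0}
    return "+" if sum(weights.get(t, 0.0) for t in tokens) > 0 else "-"

tokens = "the movie was great".split()
good_ranking = [3, 0, 1, 2]  # ranks "great" first
poor_ranking = [0, 1, 2, 3]  # ranks "great" last
print(deletions_to_flip(tokens, good_ranking, toy_label))
print(deletions_to_flip(tokens, poor_ranking, toy_label))
```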

Page 20

Explaining Sequence Models

‣ These methods might work well for bag-of-words models, but what about other tasks?

Alvarez-Melis and Jaakkola (2019)

I went to the store => Je suis allé au magasin

I ____ to the store => ???

‣ Translation system might totally break down, need to stay on the data manifold

‣ Sample similar data points from a variational autoencoder (VAE), more complex approach that requires another model

Page 21

Idea 3: Probing

‣ Train a model for task X and learn to predict task Y

‣ E.g.: take ELMo representations, freeze them, then try to predict POS tags with just a softmax layer

‣ Doesn’t “explain” a prediction but can illuminate what models are and aren’t able to capture
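A minimal probing sketch: a single softmax layer trained on frozen representations. Random clustered vectors stand in for ELMo outputs and random integers stand in for POS tags, so everything about the data here is synthetic and only the training loop is the point.

```python
import numpy as np

def train_softmax_probe(reps, labels, num_classes, lr=0.05, epochs=300):
    """Fit W, b for softmax(reps @ W + b) by gradient descent.
    The representations are inputs only and are never updated (frozen)."""
    n, dim = reps.shape
    W = np.zeros((dim, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = reps @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy wrt logits
        W -= lr * reps.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(reps, labels, W, b):
    return float(((reps @ W + b).argmax(axis=1) == labels).mean())

# Synthetic stand-in for frozen representations with 3 "tags".
rng = np.random.default_rng(0)
num_tags, dim = 3, 8
centers = 2.0 * rng.normal(size=(num_tags, dim))
labels = rng.integers(0, num_tags, size=300)
reps = centers[labels] + rng.normal(size=(300, dim))

W, b = train_softmax_probe(reps[:200], labels[:200], num_tags)
acc = probe_accuracy(reps[200:], labels[200:], W, b)
print("held-out probe accuracy:", acc)
```

High held-out accuracy for such a small probe suggests the information is easily recoverable from the representations; it does not say the original model uses it.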

Page 22

Takeaways

‣ Looking at weights is generally hard for neural networks

‣ LIME is a good method for generating interpretable explanations, but not always easy to get right

‣ Gradient-based techniques can provide explanations, but these aren’t perfect. Very “local” and don’t consider what happens if a word changes to a different word

‣ Probing tasks can tell you generally what your network might be doing but are hard to interpret

Page 23

Neural CRF Basics

Page 24

NER Revisited

Barack Obama will travel to Hangzhou today for the G20 meeting .

PERSON: Barack Obama; LOC: Hangzhou; ORG: G20

B-PER I-PER O O O B-LOC O O O B-ORG O O

‣ Features in CRFs: I[tag=B-LOC & curr_word=Hangzhou], I[tag=B-LOC & prev_word=to], I[tag=B-LOC & curr_prefix=Han]

‣ Linear model over features

‣ Downsides:

‣ Lexical features mean that words need to be seen in the training data

‣ Linear model can’t capture feature conjunctions as effectively (doesn’t work well to look at more than 2 words with a single feature)
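The three indicator features on this slide could be generated like this. The function name, the `<s>` start-of-sentence marker, and the 3-character prefix length are assumptions made for the sketch.

```python
def emission_features(tokens, i, tag):
    """Indicator features conjoining a candidate tag with properties of position i."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<s>"  # assumed boundary marker
    return [
        f"tag={tag}&curr_word={word}",
        f"tag={tag}&prev_word={prev_word}",
        f"tag={tag}&curr_prefix={word[:3]}",
    ]

tokens = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
feats = emission_features(tokens, 5, "B-LOC")
print(feats)
```

Each feature string would index one weight in the linear model, which is why a word never seen in training contributes nothing through its lexical features.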

Page 25

LSTMs for NER

Barack Obama will travel to Hangzhou today for the G20 meeting .

PERSON: Barack Obama; LOC: Hangzhou; ORG: G20

B-PER I-PER O O O B-LOC O O O B-ORG O O

[Figure: an LSTM reads "Barack Obama will travel to Hangzhou" and emits B-PER I-PER O O O B-LOC]

‣ Transducer (LM-like model)

‣ What are the strengths and weaknesses of this model compared to CRFs?

Page 26

LSTMs for NER

Barack Obama will travel to Hangzhou today for the G20 meeting .

PERSON: Barack Obama; LOC: Hangzhou; ORG: G20

B-PER I-PER O O O B-LOC O O O B-ORG O O

[Figure: a bidirectional LSTM reads "Barack Obama will travel to Hangzhou" and emits B-PER I-PER O O O B-LOC]

‣ Bidirectional transducer model

‣ What are the strengths and weaknesses of this model compared to CRFs?

Page 27

Neural CRFs

Barack Obama will travel to Hangzhou today for the G20 meeting .

PERSON: Barack Obama; LOC: Hangzhou; ORG: G20

B-PER I-PER O O O B-LOC O O O B-ORG O O

‣ Neural CRFs: bidirectional LSTMs (or some NN) compute emission potentials, capture structural constraints in transition potentials

Page 28

Neural CRFs

[Figure: chain $y_1, y_2, \ldots, y_n$ with emission potentials $\phi_e$ and transition potentials $\phi_t$]

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

‣ Conventional: $\phi_e(y_i, i, \mathbf{x}) = w^\top f_e(y_i, i, \mathbf{x})$

‣ Neural: $\phi_e(y_i, i, \mathbf{x}) = W_{y_i}^\top f(i, \mathbf{x})$, where W is a num_tags x len(f) matrix

‣ f(i, x) could be the output of a feedforward neural network looking at the words around position i, or the ith output of an LSTM, …

‣ Neural network computes unnormalized potentials that are consumed and “normalized” by a structured model

‣ Inference: compute f, use Viterbi
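A compact Viterbi sketch over emission and transition potentials in log space. The tag set and the numbers are made up: a large negative transition score forbids O followed by I-LOC, which changes the best sequence relative to picking the top emission at each position.

```python
import numpy as np

def viterbi(emissions, transitions):
    """emissions: (n, num_tags) scores phi_e(y_i, i, x);
    transitions: (num_tags, num_tags) scores phi_t[prev, cur].
    Returns the highest-scoring tag sequence and its score."""
    n, num_tags = emissions.shape
    best = np.zeros((n, num_tags))
    backptr = np.zeros((n, num_tags), dtype=int)
    best[0] = emissions[0]
    for i in range(1, n):
        cand = best[i - 1][:, None] + transitions  # cand[prev, cur]
        backptr[i] = cand.argmax(axis=0)
        best[i] = cand.max(axis=0) + emissions[i]
    path = [int(best[-1].argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(backptr[i][path[-1]]))
    return path[::-1], float(best[-1].max())

tags = ["O", "B-LOC", "I-LOC"]           # illustrative tag set
emissions = np.array([[2.0, 0.0, 0.0],   # position 1 prefers O
                      [0.0, 0.5, 1.0]])  # position 2 prefers I-LOC
transitions = np.zeros((3, 3))
transitions[0, 2] = -1e4                 # forbid O -> I-LOC
path, best_score = viterbi(emissions, transitions)
print([tags[t] for t in path], best_score)
```

In a neural CRF the `emissions` array would come from the network, e.g. one row $W f(i, \mathbf{x})$ per position; Viterbi itself is unchanged.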

Page 29

Computing Gradients

[Figure: chain $y_1, y_2, \ldots, y_n$ with emission potentials $\phi_e$ and transition potentials $\phi_t$]

$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$

‣ Conventional: $\phi_e(y_i, i, \mathbf{x}) = w^\top f_e(y_i, i, \mathbf{x})$

‣ Neural: $\phi_e(y_i, i, \mathbf{x}) = W_{y_i}^\top f(i, \mathbf{x})$

$\frac{\partial \mathcal{L}}{\partial \phi_{e,i}} = -P(y_i = s \mid \mathbf{x}) + I[s \text{ is gold}]$ (“error signal”, compute with F-B)

‣ For linear model: $\frac{\partial \phi_{e,i}}{\partial w_i} = f_{e,i}(y_i, i, \mathbf{x})$; chain rule says to multiply together, gives our update

‣ For neural model: compute gradient of phi w.r.t. parameters of neural net

Page 30

Neural CRFs

Barack Obama will travel to Hangzhou today for the G20 meeting .

PERSON: Barack Obama; LOC: Hangzhou; ORG: G20

B-PER I-PER O O O B-LOC O O O B-ORG O O

1) Compute f(x)

2) Run forward-backward

3) Compute error signal

4) Backprop (no knowledge of CRF structure required)
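Steps 2 and 3 might look like the following in NumPy: forward-backward in log space gives the tag marginals $P(y_i = s \mid \mathbf{x})$, and the error signal for the emission potentials is $I[s \text{ is gold}] - P(y_i = s \mid \mathbf{x})$. The potentials and gold tags below are random placeholders; in step 4 the resulting array would be handed to backprop as the gradient at the network's output.

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def tag_marginals(emissions, transitions):
    """Forward-backward in log space; returns P(y_i = s | x), shape (n, num_tags)."""
    n, num_tags = emissions.shape
    alpha = np.zeros((n, num_tags))
    beta = np.zeros((n, num_tags))
    alpha[0] = emissions[0]
    for i in range(1, n):
        alpha[i] = emissions[i] + logsumexp(alpha[i - 1][:, None] + transitions, axis=0)
    for i in range(n - 2, -1, -1):
        beta[i] = logsumexp(transitions + (emissions[i + 1] + beta[i + 1])[None, :], axis=1)
    log_z = logsumexp(alpha[-1], axis=0)
    return np.exp(alpha + beta - log_z)

def emission_error_signal(emissions, transitions, gold_tags):
    """Gradient of the log likelihood wrt each emission potential."""
    marginals = tag_marginals(emissions, transitions)
    gold = np.zeros_like(marginals)
    gold[np.arange(len(gold_tags)), gold_tags] = 1.0
    return gold - marginals

rng = np.random.default_rng(0)
emissions = rng.normal(size=(5, 3))    # placeholder for the network output
transitions = rng.normal(size=(3, 3))
gold_tags = [0, 1, 1, 2, 0]
signal = emission_error_signal(emissions, transitions, gold_tags)
print(signal)
```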

Page 31

FFNN Neural CRF for NER

Barack Obama will travel to Hangzhou today for the G20 meeting .

PERSON: Barack Obama; LOC: Hangzhou; ORG: G20

B-PER I-PER O O O B-LOC O O O B-ORG O O

[Figure: for the window "to Hangzhou today", e(to) (previous word), e(Hangzhou) (curr word), and e(today) (next word) are concatenated and fed to an FFNN]

$\phi_e = W g(V f(\mathbf{x}, i))$

$f(\mathbf{x}, i) = [\mathrm{emb}(x_{i-1}), \mathrm{emb}(x_i), \mathrm{emb}(x_{i+1})]$
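The emission computation above, sketched for all positions at once. Several things are assumptions: g is taken to be tanh, sentence boundaries are zero-padded, and all sizes and parameters are random placeholders.

```python
import numpy as np

def ffnn_emissions(token_ids, emb, V, W):
    """phi_e = W g(V f(x, i)) for every position i, with g = tanh (assumed).
    emb: (vocab, d), V: (hidden, 3d), W: (num_tags, hidden)."""
    pad = np.zeros((1, emb.shape[1]))  # zero vector at sentence boundaries
    vecs = np.vstack([pad, emb[token_ids], pad])
    # f(x, i) = [emb(x_{i-1}), emb(x_i), emb(x_{i+1})]
    f = np.hstack([vecs[:-2], vecs[1:-1], vecs[2:]])
    hidden = np.tanh(f @ V.T)
    return hidden @ W.T  # (n, num_tags)

rng = np.random.default_rng(0)
vocab, d, hidden_size, num_tags = 10, 4, 5, 3
emb = rng.normal(size=(vocab, d))
V = rng.normal(size=(hidden_size, 3 * d))
W = rng.normal(size=(num_tags, hidden_size))
token_ids = [1, 2, 3, 4, 5, 6]
phi = ffnn_emissions(token_ids, emb, V, W)
print(phi.shape)
```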

Page 32

LSTM Neural CRFs

Barack Obama will travel to Hangzhou today for the G20 meeting .

PERSON: Barack Obama; LOC: Hangzhou; ORG: G20

B-PER I-PER O O O B-LOC O O O B-ORG O O

‣ Bidirectional LSTMs compute emission (or transition) potentials

Page 33

LSTMs for NER

Barack Obama will travel to Hangzhou today for the G20 meeting .

PERSON: Barack Obama; LOC: Hangzhou; ORG: G20

B-PER I-PER O O O B-LOC O O O B-ORG O O

[Figure: an LSTM reads "Barack Obama will travel to Hangzhou" and emits B-PER I-PER O O O B-LOC]

‣ How does this compare to neural CRF?

Page 34

“NLP (Almost) From Scratch”

Collobert, Weston, et al. 2008, 2011

‣ LM2: word vectors learned from a precursor to word2vec/GloVe, trained for 2 weeks (!) on Wikipedia

‣ WLL: independent classification; SLL: neural CRF

Page 35

Neural CRFs with LSTMs

‣ Neural CRF using character LSTMs to compute word representations

Chiu and Nichols (2015), Lample et al. (2016)

Page 36

Neural CRFs with LSTMs

Chiu and Nichols (2015), Lample et al. (2016)

‣ Chiu + Nichols: character CNNs instead of LSTMs

‣ Lin/Passos/Luo: use external resources like Wikipedia

‣ LSTM-CRF captures the important aspects of NER: word context (LSTM), sub-word features (character LSTMs), outside knowledge (word embeddings)

Page 37

Takeaways

‣ Explanation methods: looking at weights, LIME, gradient-based

‣ All kinds of NNs can be integrated into CRFs for structured inference. Can be applied to NER, other tagging, parsing, …

‣ This concludes the ML/DL-heavy portion of the course. Starting Tuesday: syntax, then semantics