TTIC 31190: Natural Language Processing (kgimpel/teaching/31190/... · 2016. 2. 2.)
Kevin Gimpel, Winter 2016
Lecture 9: Sequence Models



Announcements
• on Thursday, class will be in Room 530 (the room directly behind you)

Announcements
• we will go over part of Assignment 1 today (grades coming soon)
• Assignment 2 was due Wed., Feb. 3, now due Fri., Feb. 5
• project proposal due Tuesday, Feb. 16
• midterm on Thursday, Feb. 18

Other Naturally-Occurring Data
• quality of scientific journalism
• memorability of quotations
• sarcasm (remove #sarcasm hashtag from tweets)
• opening weekend movie revenue prediction from critic reviews
• predicting novel success from text of novels

Project Proposal
• due Feb. 16 (in two weeks)
• 1-2 pages
• one per group
• include the following:
– members of your group
– describe the task you are going to work on (could be a new task you create or an existing task)
– describe the methods you will use/develop for the task
– give a brief review of related work; i.e., situate your project with respect to the literature (www.aclweb.org and Google Scholar are useful for this!)
– a proposed timeline

Project Proposal (cont’d)
• your results do not have to beat the state-of-the-art!
• but your project does have to be carefully done, so that you can draw conclusions
• you are welcome to start by replicating an NLP paper (I can give suggestions if you need some)
• during the week of Feb. 22, please schedule a meeting with me to discuss your project – details to follow

Class Presentations
• final two class meetings (March 3rd and March 8th) will be mostly used for in-class presentations
• one presentation per group
• 10-15 minutes per presentation (will be determined once I know how many groups there are)
• you will each take notes and email me questions/feedback for the presenter, which I will anonymize and send

Project
• final report due Thursday, March 17 (original date of the final exam)
• so the presentation will be more like an “interim progress report”

Roadmap
• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• neural network methods in NLP
• syntax and syntactic parsing
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications

Part-of-Speech Tagging

Some [determiner] questioned [verb (past)] if [prep.] Tim [proper noun] Cook [proper noun] ’s [poss.] first [adj.] product [noun]
would [modal] be [verb] a [det.] breakaway [adjective] hit [noun] for [prep.] Apple [proper noun] . [punc.]

Simplest kind of structured prediction: Sequence Labeling

Named Entity Recognition

Formulating segmentation tasks as sequence labeling via B-I-O labeling:

Some [O] questioned [O] if [O] Tim [B-PERSON] Cook [I-PERSON] ’s [O] first [O] product [O]
would [O] be [O] a [O] breakaway [O] hit [O] for [O] Apple [B-ORGANIZATION] . [O]

B = “begin”, I = “inside”, O = “outside”
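As a rough illustration of how B-I-O labels encode a segmentation, the sketch below recovers entity spans from a tag sequence. The function name `bio_to_spans` and the choice to ignore a stray I- tag that has no open span are assumptions made for this example, not something the slides specify.

```python
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert B-I-O tags to (start, end_exclusive, type) spans."""
    spans = []
    start, kind = None, None
    for i, tag in enumerate(tags):
        # An open span continues only if this tag is I- with the same type.
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
        # Assumed convention: an I- tag with no open span is ignored.
    if start is not None:
        spans.append((start, len(tags), kind))
    return spans

tokens = "Some questioned if Tim Cook 's first product would be a breakaway hit for Apple .".split()
tags = ["O", "O", "O", "B-PERSON", "I-PERSON", "O", "O", "O",
        "O", "O", "O", "O", "O", "O", "B-ORGANIZATION", "O"]
entities = [(" ".join(tokens[s:e]), k) for s, e, k in bio_to_spans(tags)]
# entities == [("Tim Cook", "PERSON"), ("Apple", "ORGANIZATION")]
```

Because each token gets exactly one label, any sequence labeler can be used for segmentation once the spans are encoded this way.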

• there are many downloadable part-of-speech taggers and named entity recognizers:
– Stanford POS tagger, NER labeler
– TurboTagger, TurboEntityRecognizer
– Illinois Entity Tagger
– CMU Twitter POS tagger
– Alan Ritter’s Twitter POS/NER labeler

Hidden Markov Models

[Figure: HMM with labels y1 y2 y3 y4 over observations x1 x2 x3 x4]

transition parameters: $p(y_i \mid y_{i-1})$
emission parameters: $p(x_i \mid y_i)$

HMMs for Word Clustering (Brown et al., 1992)

each $y_i$ is a cluster ID, so the label space is the set of cluster IDs

justin [y1] bieber [y2] for [y3] president [y4]

HMMs for Part-of-Speech Tagging

each $y_i$ is a part-of-speech tag, so the label space is the set of part-of-speech tags

justin [proper noun] bieber [proper noun] for [preposition] president [noun]

what parameters need to be learned?
transition parameters: $p(y_i \mid y_{i-1})$
emission parameters: $p(x_i \mid y_i)$

How should we learn the HMM parameters?

transition parameters: $p(y_i \mid y_{i-1})$
emission parameters: $p(x_i \mid y_i)$

Supervised HMMs
• given a dataset of input sequences and annotated outputs:

justin [proper noun] bieber [proper noun] for [preposition] president [noun]

• to estimate transition/emission distributions, use maximum likelihood estimation (count and normalize)
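Count-and-normalize estimation can be sketched as follows. This is a minimal example under stated assumptions: the boundary symbols `<s>` and `</s>`, the function name `estimate_hmm`, and the one-sentence dataset are all introduced here for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict

START, STOP = "<s>", "</s>"  # assumed boundary symbols, not from the slides

def estimate_hmm(tagged_sentences):
    """Maximum likelihood estimates: count events, then normalize per context."""
    trans_counts = defaultdict(Counter)  # trans_counts[prev_tag][tag]
    emit_counts = defaultdict(Counter)   # emit_counts[tag][word]
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
        trans_counts[prev][STOP] += 1

    def normalize(table):
        return {ctx: {k: c / sum(cnt.values()) for k, c in cnt.items()}
                for ctx, cnt in table.items()}

    return normalize(trans_counts), normalize(emit_counts)

data = [[("justin", "proper noun"), ("bieber", "proper noun"),
         ("for", "preposition"), ("president", "noun")]]
transition, emission = estimate_hmm(data)
# transition["proper noun"] == {"proper noun": 0.5, "preposition": 0.5}
```

With unsmoothed estimates, any transition or emission unseen in training gets probability zero, so a real tagger would typically add smoothing or special handling for unknown words.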

Estimates of Tag Transition Probabilities

[Table: transition probability estimates over the tags proper noun, modal verb, infinitive verb, adjective, noun, adverb, determiner]

Estimates of Emission Probabilities

Inference in HMMs
• since the output is a sequence, the argmax over label sequences requires iterating over an exponentially-large set
• last week we talked about using dynamic programming (DP) to solve these problems
• for HMMs (and other sequence models), the DP algorithm for solving this is called the Viterbi algorithm

Viterbi Algorithm
• recursive equations + memoization:
– base case: returns probability of sequence starting with label y for first word
– recursive case: computes probability of max-probability label sequence that ends with label y at position m
– final value is in:
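One common way to write these recursive equations is given below. The table name $V$, the start symbol $\langle s \rangle$, and the omission of a stop transition in the final step are assumptions of this sketch; the slide's own notation may differ.

```latex
\begin{align*}
V(1, y) &= p(y \mid \langle s \rangle)\, p(x_1 \mid y) \\
V(m, y) &= \max_{y' \in L} \; V(m-1, y')\, p(y \mid y')\, p(x_m \mid y) \\
\text{final value} &= \max_{y \in L} \; V(|x|, y)
\end{align*}
```

Each $V(m, y)$ is computed once and stored, which is the memoization step; keeping the maximizing $y'$ for each entry allows the best label sequence to be read back afterwards.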

Example:

Janet [proper noun] will [modal verb] back [infinitive verb] the [determiner] bill [noun]

Viterbi Algorithm (on board)
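Since the board work is not part of the transcript, here is a minimal sketch of the Viterbi algorithm for an HMM. The probability tables are made-up toy numbers, the start symbol `<s>` is an assumption, no stop transition is modeled, and the input is assumed to be non-empty.

```python
def viterbi(words, labels, trans, emit, start="<s>"):
    """Return (best_prob, best_label_sequence) under an HMM.

    trans[prev][label] and emit[label][word] hold probabilities;
    missing entries are treated as probability 0.
    """
    def t(prev, y):
        return trans.get(prev, {}).get(y, 0.0)

    def e(y, w):
        return emit.get(y, {}).get(w, 0.0)

    # table[m][y]: probability of the best label sequence ending in y at position m
    table = [{y: t(start, y) * e(y, words[0]) for y in labels}]
    back = [{}]
    for m in range(1, len(words)):
        table.append({})
        back.append({})
        for y in labels:
            best_prev = max(labels, key=lambda yp: table[m - 1][yp] * t(yp, y))
            table[m][y] = table[m - 1][best_prev] * t(best_prev, y) * e(y, words[m])
            back[m][y] = best_prev
    # Follow back-pointers from the best final label.
    last = max(labels, key=lambda y: table[-1][y])
    path = [last]
    for m in range(len(words) - 1, 0, -1):
        path.append(back[m][path[-1]])
    return table[-1][last], path[::-1]

# Hypothetical toy parameters, for illustration only.
labels = ["noun", "verb"]
trans = {"<s>": {"noun": 0.8, "verb": 0.2},
         "noun": {"noun": 0.3, "verb": 0.7},
         "verb": {"noun": 0.6, "verb": 0.4}}
emit = {"noun": {"fish": 0.5, "swim": 0.1, "fast": 0.4},
        "verb": {"fish": 0.3, "swim": 0.6, "fast": 0.1}}
prob, path = viterbi(["fish", "swim", "fast"], labels, trans, emit)
# path == ["noun", "verb", "noun"]
```

In practice the products are usually replaced by sums of log probabilities to avoid underflow on long sentences.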

Viterbi Algorithm
• space and time complexity?
• can be read off from the recursive equations:
– space complexity: size of memoization table, which is # of unique indices of recursive equations
– so, space complexity is $O(|x|\,|L|)$ (length of sentence * number of labels)
– time complexity: size of memoization table * complexity of computing each entry
– so, time complexity is $O(|x|\,|L|\,|L|) = O(|x|\,|L|^2)$ (length of sentence * number of labels * $|L|$, since each entry requires iterating through the labels)

Linear Sequence Models
• let’s generalize HMMs and talk about linear models for scoring label sequences in our classifier framework:
• but first, how do we know that this scoring function generalizes HMMs?

HMM as a Linear Model
• what are the feature templates and weights?
HMM:
linear model:
feature templates and weights:
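One way to see the correspondence: the log of the HMM joint probability is a sum of log transition and log emission probabilities, so it equals a dot product between feature counts and weights if there is one count feature per (previous label, label) pair and per (label, word) pair, with the matching log probability as its weight. The sketch below checks this numerically; the feature naming scheme, the start symbol `<s>`, and the toy probabilities are assumptions for illustration.

```python
import math
from collections import Counter

def hmm_log_prob(words, tags, trans, emit, start="<s>"):
    """log p(x, y) under the HMM (no stop transition in this sketch)."""
    total, prev = 0.0, start
    for w, y in zip(words, tags):
        total += math.log(trans[prev][y]) + math.log(emit[y][w])
        prev = y
    return total

def features(words, tags, start="<s>"):
    """Count features from two templates: label transitions and label-word emissions."""
    f, prev = Counter(), start
    for w, y in zip(words, tags):
        f[("trans", prev, y)] += 1
        f[("emit", y, w)] += 1
        prev = y
    return f

def weights_from_hmm(trans, emit):
    """Each weight is the log of the corresponding HMM parameter."""
    w = {}
    for prev, row in trans.items():
        for y, p in row.items():
            w[("trans", prev, y)] = math.log(p)
    for y, row in emit.items():
        for word, p in row.items():
            w[("emit", y, word)] = math.log(p)
    return w

def linear_score(words, tags, w):
    return sum(w[name] * count for name, count in features(words, tags).items())

# Hypothetical toy parameters, for illustration only.
trans = {"<s>": {"noun": 0.8, "verb": 0.2},
         "noun": {"noun": 0.3, "verb": 0.7},
         "verb": {"noun": 0.6, "verb": 0.4}}
emit = {"noun": {"fish": 0.5, "swim": 0.1, "fast": 0.4},
        "verb": {"fish": 0.3, "swim": 0.6, "fast": 0.1}}
words, tags = ["fish", "swim", "fast"], ["noun", "verb", "noun"]
w = weights_from_hmm(trans, emit)
gap = abs(hmm_log_prob(words, tags, trans, emit) - linear_score(words, tags, w))
```

Here `gap` is zero up to floating-point error, which is the sense in which the linear scoring function generalizes the HMM.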

Linear Sequence Models
• so, an HMM is:
– a linear sequence model
– with particular features on label transitions and label-observation emissions
– and uses maximum likelihood estimation (count & normalize) for learning
• but we could use any feature functions we like, and use any of our loss functions for learning!

(Chain) Conditional Random Fields
• linear sequence model
• arbitrary features of input are permitted
• test-time inference uses Viterbi Algorithm
• learning done by minimizing log loss (DP algorithms used to compute gradients)
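To make the role of DP in log loss concrete, the sketch below computes the log of the normalizer (the sum of exponentiated scores over all label sequences) with a forward-style recursion, then forms the log loss for one example. The `part_score(words, m, prev, y)` signature, the weight table, and the start symbol are hypothetical choices for this example; a real CRF would also need the gradient, which uses a related forward-backward computation not shown here.

```python
import math

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(words, labels, part_score, start="<s>"):
    """log of the sum over all label sequences of exp(sequence score)."""
    # alpha[y]: log-sum of exp-scores of all label prefixes ending in y
    alpha = {y: part_score(words, 0, start, y) for y in labels}
    for m in range(1, len(words)):
        alpha = {y: logsumexp([alpha[yp] + part_score(words, m, yp, y) for yp in labels])
                 for y in labels}
    return logsumexp(list(alpha.values()))

def sequence_score(words, tags, part_score, start="<s>"):
    """Score of one label sequence: sum of its part scores."""
    total, prev = 0.0, start
    for m, y in enumerate(tags):
        total += part_score(words, m, prev, y)
        prev = y
    return total

def log_loss(words, gold, labels, part_score):
    return log_partition(words, labels, part_score) - sequence_score(words, gold, part_score)

# Hypothetical feature weights, for illustration only.
WEIGHTS = {("trans", "noun", "verb"): 1.0,
           ("emit", "noun", "fish"): 0.5,
           ("emit", "verb", "swim"): 2.0}

def toy_part_score(words, m, prev, y):
    return WEIGHTS.get(("trans", prev, y), 0.0) + WEIGHTS.get(("emit", y, words[m]), 0.0)

words, labels = ["fish", "swim"], ["noun", "verb"]
loss = log_loss(words, ["noun", "verb"], labels, toy_part_score)
```

The recursion has the same shape as Viterbi with max replaced by log-sum-exp, so it shares the same time complexity.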

Maximum-Margin Markov Networks
• linear sequence model
• arbitrary features of input are permitted
• test-time inference uses Viterbi Algorithm
• learning done by minimizing hinge loss (DP algorithm used to compute subgradients)

Feature Locality
• feature locality: roughly, how “big” are your features?
• when designing efficient inference algorithms (whether w/ DP or other methods), we need to be mindful of this
• features can be arbitrarily big in terms of the input sequence
• but features cannot be arbitrarily big in terms of the output sequence!
• the features in HMMs are small in both the input and output sequences (only two pieces at a time)

Are these features big or small?

feature: big or small?
• feature that counts instances of “the” in the input sentence: small
• feature that returns square root of sum of counts of am/is/was/were: small
• feature that counts “verb verb” sequences: small
• feature that counts “determiner noun verb verb” sequences: pretty big!
• feature that counts the number of nouns in a sentence, and feature that returns the ratio of nouns to verbs: big, but we can design specialized algorithms to handle them if they’re the only big features

Learning with linear sequence models
• given a linear sequence model with “small” features, how should we do learning?

Loss functions for learning linear sequence models

[Table: for each loss (perceptron, hinge, log), entry j of the (sub)gradient of the loss for a linear model]

same gradients/subgradients as before, though computing these terms (inference) requires DP algorithms
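The table entries did not survive in this transcript. For reference, the standard forms of these (sub)gradients for a linear model with features $f_j$ are sketched below; the symbols $\hat{y}$, $\tilde{y}$, and $\mathrm{cost}$ are notation assumed here and may not match the original slide.

```latex
\begin{align*}
\text{perceptron:} \quad & -f_j(x, y) + f_j(x, \hat{y}),
  & \hat{y} &= \operatorname*{argmax}_{y'} \; \mathrm{score}(x, y', w) \\
\text{hinge:} \quad & -f_j(x, y) + f_j(x, \tilde{y}),
  & \tilde{y} &= \operatorname*{argmax}_{y'} \; \mathrm{score}(x, y', w) + \mathrm{cost}(y, y') \\
\text{log:} \quad & -f_j(x, y) + \mathbb{E}_{p_w(y' \mid x)}\left[ f_j(x, y') \right]
\end{align*}
```

For sequence models, the two argmax terms are computed with Viterbi-style DP (cost-augmented for hinge, assuming the cost decomposes over positions), and the expectation is computed with a sum-based DP.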

Implementing DP algorithms
• start with counting mode, but keep in mind how the model’s score function decomposes across parts of the outputs
– i.e., how “large” are the features? how many items in the output sequence are needed to compute each feature?
– define a function called partScore that computes all the features (for counting mode, this function will return 1)
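A minimal sketch of what counting mode might look like, assuming parts are adjacent label pairs: the DP sums, over all label sequences, the product of part scores, so a part score that always returns 1 makes the result equal to the number of label sequences. The names `dp_total` and `counting_part_score` and the signature are hypothetical.

```python
def dp_total(words, labels, part_score, start="<s>"):
    """Sum over all label sequences of the product of their part scores."""
    # table[y]: total over all label prefixes that end in y at the current position
    table = {y: part_score(words, 0, start, y) for y in labels}
    for m in range(1, len(words)):
        table = {y: sum(table[yp] * part_score(words, m, yp, y) for yp in labels)
                 for y in labels}
    return sum(table.values())

def counting_part_score(words, m, prev, y):
    # Counting mode: every part contributes a factor of 1.
    return 1

n_sequences = dp_total(["Janet", "will", "back", "the", "bill"],
                       ["noun", "verb", "determiner"],
                       counting_part_score)
# 3 labels over 5 words gives 3**5 = 243 label sequences
```

Checking the count against the closed form is a cheap way to catch indexing bugs before swapping in a real scoring function.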

Neural Networks in NLP
• neural networks
• deep neural networks
• neural language models
• recurrent neural networks and LSTMs
• convolutional neural networks

What is a neural network?
• just think of a neural network as a function
• it has inputs and outputs
• the term “neural” typically means a particular type of functional building block (“neural layers”), but the term has expanded to mean many things