TTIC 31190: Natural Language Processing
Kevin Gimpel, Winter 2016
Lecture 1: Introduction


Page 1: TTIC 31190: Natural Language Processing
ttic.uchicago.edu/~kgimpel/teaching/31190/lectures/1.pdf

Kevin Gimpel, Winter 2016

Lecture 1: Introduction

Page 2:

What is natural language processing?

Page 3:

What is natural language processing?

an experimental computer science research area that includes problems and solutions pertaining to the understanding of human language

Page 4:

Text Classification

Page 5:

Text Classification

• spam / not spam
• priority level
• category (primary / social / promotions / updates)

Page 6:

Sentiment Analysis

Page 7:

Machine Translation

Page 8:

Machine Translation

New Poll: Will you buy an Apple Watch?

Page 9:

Question Answering

Page 10:

Summarization

Page 11:

Summarization

The Apple Watch has drawbacks. There are other smartwatches that offer more capabilities.

Page 12:

Dialog Systems

user: Schedule a meeting with Matt and David on Thursday.
computer: Thursday won't work for David. How about Friday?
user: I'd prefer Monday then, but Friday would be ok if necessary.

Page 13:

Part-of-Speech Tagging

Some/determiner questioned/verb(past) if/prep. Tim/proper Cook/proper 's/poss. first/adj. product/noun
would/modal be/verb a/det. breakaway/adjective hit/noun for/prep. Apple/proper ./punc.

Page 14:

Part-of-Speech Tagging

Some/determiner questioned/verb(past) if/prep. Tim/proper Cook/proper 's/poss. first/adj. product/noun
would/modal be/verb a/det. breakaway/adjective hit/noun for/prep. Apple/proper ./punc.

Some/determiner questioned/verb(past) if/prep. Tim/noun Cook/noun 's/poss. first/adj. product/noun
would/modal be/verb a/det. breakaway/adjective hit/noun for/prep. Apple/noun ./punc.

Page 15:

Part-of-Speech Tagging

Some/determiner questioned/verb(past) if/prep. Tim/noun Cook/noun 's/poss. first/adj. product/noun
would/modal be/verb a/det. breakaway/adjective hit/noun for/prep. Apple/noun ./punc.

Named Entity Recognition

Some questioned if [Tim Cook]PERSON 's first product would be a breakaway hit for [Apple]ORGANIZATION .

Page 16:

Syntactic Parsing

Page 17:

Coreference Resolution

Entity Linking

Revenues of $14.5 billion were posted by Dell. The company ...

(figure: excerpt from Durrett and Klein (2014) showing that coreference can help resolve ambiguous entity types and links: since the company is coreferent with Dell, Dell must be an organization and should link to the Wikipedia article on Dell rather than the one on Michael Dell)

figure credit: Durrett & Klein (2014)

Page 18:

“Winograd Schema” Coreference Resolution

The man couldn't lift his son because he was so weak.

The man couldn't lift his son because he was so heavy.

Page 19:

“Winograd Schema” Coreference Resolution

The man couldn't lift his son because he was so weak. (he = the man)

The man couldn't lift his son because he was so heavy. (he = the son)

Page 20:

Reading Comprehension

Once there was a boy named Fritz who loved to draw. He drew everything. In the morning, he drew a picture of his cereal with milk. His papa said, “Don't draw your cereal. Eat it!” After school, Fritz drew a picture of his bicycle. His uncle said, “Don't draw your bicycle. Ride it!” …

What did Fritz draw first?
A) the toothpaste
B) his mama
C) cereal and milk
D) his bicycle

Page 21:

Conspicuous by their absence…
• speech recognition (see TTIC 31110)
• information retrieval and web search
• knowledge representation
• recommender systems

Page 22:

Computational Linguistics vs. Natural Language Processing

• how do they differ?

Page 23:

Computational Biology vs. Bioinformatics

“Computational biology = the study of biology using computational techniques. The goal is to learn new biology, knowledge about living systems. It is about science.

Bioinformatics = the creation of tools (algorithms, databases) that solve problems. The goal is to build useful tools that work on biological data. It is about engineering.”

-- Russ Altman

Page 24:

Computational Linguistics vs. Natural Language Processing

• many people think of the two terms as synonyms

• computational linguistics is more inclusive; more likely to include sociolinguistics, cognitive linguistics, and computational social science

• NLP is more likely to use machine learning and involve engineering / system-building

Page 25:

Is NLP Science or Engineering?

• the goal of NLP is to develop technology, which takes the form of engineering

• though we try to solve today's problems, we seek principles that will be useful for the future

• if science, it's not linguistics or cognitive science; it's the science of computational processing of language

• so I like to think that we're doing the science of engineering

Page 26:

Course Overview

• New course, first time being offered

• Aimed at first-year PhD students

• Instructor office hours: Mondays 3-4pm, TTIC 531

• Teaching assistant: Lifu Tu, TTIC PhD student

Page 27:

Prerequisites

• No course prerequisites, but I will assume:
– some programming experience (no specific language required)
– familiarity with basics of probability, calculus, and linear algebra

• Undergraduates with relevant background are welcome to take the course. Please bring an enrollment approval form to me if you can't enroll online.

Page 28:

Grading

• 3 assignments (15% each)
• midterm exam (15%)
• course project (35%):
– preliminary report and meeting with instructor (10%)
– class presentation (5%)
– final report (20%)
• class participation (5%)
• no final

Page 29:

Assignments

• Mixture of formal exercises, implementation, experimentation, analysis

• “Choose your own adventure” component based on your interests, e.g.:
– exploratory data analysis
– machine learning
– implementation / scalability
– model and error analysis
– visualization

Page 30:

Project

• Replicate [part of] a published NLP paper, or define your own project.

• The project may be done individually or in a group of two. Each group member will receive the same grade.

• More details to come.

Page 31:

Collaboration Policy

• You are welcome to discuss assignments with others in the course, but solutions and code must be written individually

Page 32:

Textbooks

• All are optional
• Speech and Language Processing, 2nd Ed.
– some chapters of the 3rd edition are online
• The Analysis of Data, Volume 1: Probability
– freely available online
• Introduction to Information Retrieval
– freely available online

Page 33:

Roadmap

• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• syntax and syntactic parsing
• neural network methods in NLP
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications

Page 34:

Why is NLP hard?

• ambiguity and variability of linguistic expression:
– variability: many forms can mean the same thing
– ambiguity: one form can mean many things

• there are many different kinds of ambiguity
• each NLP task has to address a distinct set of kinds

Page 35:

Word Sense Ambiguity

• many words have multiple meanings

Page 36:

Word Sense Ambiguity

credit: A. Zwicky

Page 37:

Word Sense Ambiguity

credit: A. Zwicky

Page 38:

Attachment Ambiguity

Page 39:

Meaning Ambiguity

Page 40:

Text Classification

• simplest user-facing NLP application
• email (spam, priority, categories)
• sentiment
• topic classification
• others?

Page 41:

What is a classifier?

Page 42:

What is a classifier?

• a function from inputs x to classification labels y

Page 43:

What is a classifier?

• a function from inputs x to classification labels y
• one simple type of classifier:
– for any input x, assign a score to each label y, parameterized by vector θ: score(x, y, θ)

Page 44:

What is a classifier?

• a function from inputs x to classification labels y
• one simple type of classifier:
– for any input x, assign a score to each label y, parameterized by vector θ: score(x, y, θ)
– classify by choosing the highest-scoring label: ŷ = argmax_y score(x, y, θ)
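The score-and-argmax classifier on this slide can be sketched in a few lines of Python. The feature function and weight values below are illustrative assumptions, not definitions from the course.

```python
# A minimal sketch of the classifier on this slide: score each label,
# then choose the highest-scoring one. The single "contains great"
# feature and its weight are hypothetical, chosen only for illustration.

def score(x, y, theta):
    """Score an (input, label) pair as a linear function of theta."""
    features = {("contains_great", y): 1.0 if "great" in x else 0.0}
    return sum(theta.get(name, 0.0) * value for name, value in features.items())

def classify(x, labels, theta):
    """Classify by choosing the highest-scoring label."""
    return max(labels, key=lambda y: score(x, y, theta))

theta = {("contains_great", "positive"): 1.0}
print(classify("a great film", ["positive", "negative"], theta))  # positive
```

Note that `max` breaks ties by keeping the first label it sees, so the order of `labels` acts as a default when no feature fires.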

Page 45:

Course Philosophy

• From reading papers, one gets the idea that machine learning concepts are monolithic, opaque objects
– e.g., naïve Bayes, logistic regression, SVMs, CRFs, neural networks, LSTMs, etc.

• Nothing is opaque
• Everything can be dissected, which reveals connections
• The names above are useful shorthand, but not useful for gaining understanding

Page 46:

Course Philosophy

• We will draw from machine learning, linguistics, and algorithms, but technical material will be (mostly) self-contained; we won't use many black boxes

• We will focus on declarative (rather than procedural) specifications, because they highlight connections and differences

Page 47:

Modeling, Inference, Learning

Page 48:

Modeling, Inference, Learning

• Modeling: How do we assign a score to an (x, y) pair using parameters θ?

modeling: define score function score(x, y, θ)

Page 49:

Modeling, Inference, Learning

• Inference: How do we efficiently search over the space of all labels?

inference: solve ŷ = argmax_y score(x, y, θ)
modeling: define score function score(x, y, θ)

Page 50:

Modeling, Inference, Learning

• Learning: How do we choose θ?

learning: choose θ
modeling: define score function score(x, y, θ)
inference: solve ŷ = argmax_y score(x, y, θ)

Page 51:

Modeling, Inference, Learning

• We will use this same paradigm throughout the course, even when the output space size is exponential in the size of the input or is unbounded (e.g., machine translation)

learning: choose θ
modeling: define score function score(x, y, θ)
inference: solve ŷ = argmax_y score(x, y, θ)
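The modeling / inference / learning decomposition can be made concrete as an interface, with one method per piece. The class below is a hypothetical sketch, not the course's own code; the bag-of-words score is an assumed stand-in for the modeling step.

```python
# A sketch of the three-part paradigm: modeling defines the score
# function, inference searches the label space, and learning (here,
# just the constructor argument) chooses theta.

class Classifier:
    def __init__(self, theta):
        self.theta = theta  # learning: choose theta (by some training procedure)

    def score(self, x, y):
        # modeling: a linear score over (word, label) pairs
        return sum(self.theta.get((w, y), 0.0) for w in x.split())

    def infer(self, x, labels):
        # inference: solve the argmax over the label space
        return max(labels, key=lambda y: self.score(x, y))

clf = Classifier({("great", "positive"): 1.0})
print(clf.infer("a great film", ["negative", "positive"]))  # positive
```

For classification the label space is tiny, but the same interface makes sense when `infer` must search an exponential or unbounded output space; only its implementation changes.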

Page 52:

Notation

• We'll use boldface for vectors: θ
• Individual entries will use subscripts and no boldface, e.g., for entry i: θ_i

Page 53:

Modeling: Linear Models

• Score function is linear in θ: score(x, y, θ) = θ · f(x, y)

• f: feature function vector
• θ: weight vector
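The linear score is just a dot product between the weight vector and the feature function vector, and with sparse features it is natural to represent both as dictionaries. The bag-of-words feature function below is an assumption for illustration, not the course's definition of f.

```python
# A sketch of the linear model score(x, y, theta) = theta . f(x, y)
# using sparse dict-based vectors.

def f(x, y):
    """Feature function vector: here, word-label co-occurrence counts."""
    feats = {}
    for word in x.split():
        feats[(word, y)] = feats.get((word, y), 0.0) + 1.0
    return feats

def score(x, y, theta):
    """Linear in theta: a sparse dot product over the active features."""
    return sum(theta.get(name, 0.0) * value for name, value in f(x, y).items())

theta = {("great", "positive"): 2.0, ("great", "negative"): -1.0}
print(score("a great film", "positive", theta))  # 2.0
```

Because the dot product only iterates over features that fire on (x, y), the cost of scoring is proportional to the number of active features, not the full dimensionality.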

Page 54:

Modeling: Linear Models

• Score function is linear in θ: score(x, y, θ) = θ · f(x, y)
• f: feature function vector
• θ: weight vector
• How do we define f?

Page 55:

Defining Features

• This is a large part of NLP
• Last 20 years: feature engineering
• Last 2 years: representation learning

• In this course, we will do both
• Learning representations doesn't mean that we don't have to look at the data or the output!

• There's still plenty of engineering required in representation learning

Page 56:

Page 57:

Feature Engineering

• Often decried as “costly, hand-crafted, expensive, domain-specific”, etc.

• But in practice, simple features typically give the bulk of the performance

• Let's get concrete: how should we define features for text classification?

Page 58:

Feature Engineering for Text Classification

Page 59:

Feature Engineering for Text Classification

x is now a vector because it is a sequence of words

Page 60:

Feature Engineering for Text Classification

x is now a vector because it is a sequence of words

let's consider sentiment analysis:

Page 61:

Feature Engineering for Text Classification

x is now a vector because it is a sequence of words

let's consider sentiment analysis:

so, here is our sentiment classifier that uses a linear model: ŷ = argmax_y θ · f(x, y)

Page 62:

Feature Engineering for Text Classification

• Two features: (definitions shown on slide)

• What should the weights be?

Page 63:

Page 64:

Feature Engineering for Text Classification

• Two features: (definitions shown on slide)
• Let's say we set the weights (values shown on slide)
• On sentences containing “great” in the Stanford Sentiment Treebank training data, this would get us an accuracy of 69%
• But “great” only appears in 83/6911 examples
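To make the 69% figure concrete, here is a sketch that evaluates the single-cue rule ("great" implies positive) on a handful of sentences taken from the examples on Page 66. The tiny labeled set is assembled here for illustration; it is not the Stanford Sentiment Treebank experiment itself.

```python
# Evaluate a one-feature sentiment rule: predict positive whenever the
# sentence contains "great". Labels below are assumed for illustration.

def predict(sentence):
    return "positive" if "great" in sentence.lower() else "negative"

examples = [
    ("The most wondrous love story in years, it is a great film.", "positive"),
    ("A great companion piece to other Napoleon films.", "positive"),
    ("It's not a great monster movie.", "negative"),
    ("There's a great deal of corny dialogue.", "negative"),
]

correct = sum(predict(s) == gold for s, gold in examples)
print(f"accuracy: {correct}/{len(examples)}")  # accuracy: 2/4
```

Even on four sentences the ambiguity of "great" is visible: the cue is right when "great" expresses sentiment and wrong under negation or a different word sense.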

Page 65:

Feature Engineering for Text Classification

• Two features: (definitions shown on slide)
• Let's say we set the weights (values shown on slide)
• On sentences containing “great” in the Stanford Sentiment Treebank training data, this would get us an accuracy of 69%
• But “great” only appears in 83/6911 examples

variability: many other words can indicate positive sentiment

ambiguity: “great” can mean different things in different contexts

Page 66:

• Usually, great indicates positive sentiment:
The most wondrous love story in years, it is a great film.
A great companion piece to other Napoleon films.

• Sometimes not. Why?
Negation: It's not a great monster movie.
Different sense: There's a great deal of corny dialogue and preposterous moments.
Multiple sentiments: A great ensemble cast can't lift this heartfelt enterprise out of the familiar.

• And there are many other words that indicate positive sentiment

Page 67:

Page 68:

Feature Engineering for Text Classification

• What about a feature like the following? (definition shown on slide)

• What should its weight be?

Page 69:

Feature Engineering for Text Classification

• What about a feature like the following? (definition shown on slide)

• What should its weight be?
• Doesn't matter.
• Why?

Page 70:

Text Classification

our linear sentiment classifier: ŷ = argmax_y θ · f(x, y)

Page 71:

Inference for Text Classification

inference: solve ŷ = argmax_y score(x, y, θ)

Page 72:

Inference for Text Classification

inference: solve ŷ = argmax_y score(x, y, θ)

• trivial (loop over labels)
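The trivial inference step can be written out as an explicit loop over the label set. The score function passed in below is a hypothetical stand-in, not the course's definition.

```python
# Exhaustive inference for classification: loop over the (small) label
# set and keep the best-scoring label.

def argmax_label(x, labels, score):
    best_label, best_score = None, float("-inf")
    for y in labels:
        s = score(x, y)
        if s > best_score:
            best_label, best_score = y, s
    return best_label

# hypothetical score: +1 for the "positive" label when "great" appears
score = lambda x, y: 1.0 if ("great" in x and y == "positive") else 0.0
print(argmax_label("a great film", ["negative", "positive"], score))  # positive
```

The loop costs one score evaluation per label, which is why inference is trivial here; for structured outputs the same argmax requires real search algorithms.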

Page 73:

Text Classification

Page 74:

Learning for Text Classification

learning: choose θ

Page 75:

Learning for Text Classification

learning: choose θ

• There are many ways to choose θ

Page 76:

Experimental Practice

• in the beginning, we just had data
• first innovation: split into train and test
– motivation: simulate conditions of applying the system in practice

• but, there's a problem with this…
– we need to explore and evaluate methodological choices
– after multiple evaluations on test, it is no longer a simulation of real-world conditions

Pages 77-79:

Page 80:

Experimental Practice

• we need to explore / evaluate methodological choices
• what should we do?
– some use cross-validation on train, but this is slow and doesn't quite simulate real-world settings (why?)

• second innovation: divide data into train, test, and a third set called development or validation
– use development/validation to evaluate choices
– then, when ready to write the paper, evaluate the best model on test

• are we done yet? no! there's still a problem

Page 81:

Experimental Practice

• we need to explore / evaluate methodological choices
• what should we do?
– some use cross-validation on train, but this is slow and doesn't quite simulate real-world settings (why?)

• second innovation: divide data into train, test, and a third set called development (dev) or validation (val)
– use dev/val to evaluate choices
– then, when ready to write the paper, evaluate the best model on test

• are we done yet? no! there's still a problem:
– overfitting to dev/val

Pages 82-83:

Page 84:

Experimental Practice

• best practice: split data into train, development (dev), development test (devtest), and test
– train the model on train, tune hyperparameter values on dev, do preliminary testing on devtest, and do final testing on test a single time when writing the paper
– even better to have even more test sets! test1, test2, etc.

• experimental credibility is a huge component of doing useful research

• when you publish a result, it had better be replicable without tuning anything on test
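The best-practice split above can be sketched as follows. The 70/10/10/10 proportions and the fixed seed are assumptions for illustration, not a prescription from the slides.

```python
# Shuffle once with a fixed seed, then carve the data into
# train / dev / devtest / test. Touch the test split only once,
# when writing the paper.

import random

def split_data(examples, seed=0):
    rng = random.Random(seed)
    examples = examples[:]          # copy so the caller's list is untouched
    rng.shuffle(examples)
    n = len(examples)
    train = examples[: int(0.7 * n)]
    dev = examples[int(0.7 * n): int(0.8 * n)]      # tune hyperparameters here
    devtest = examples[int(0.8 * n): int(0.9 * n)]  # preliminary testing
    test = examples[int(0.9 * n):]                  # final, one-time evaluation
    return train, dev, devtest, test

train, dev, devtest, test = split_data(list(range(100)))
print(len(train), len(dev), len(devtest), len(test))  # 70 10 10 10
```

Fixing the seed makes the split itself replicable, which matters for the credibility point above: anyone rerunning the experiment gets the same held-out sets.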

Page 85:

Don't Cheat!