TTIC 31190: Natural Language Processing
Kevin Gimpel, Winter 2016
Lecture 13: Dependency Syntax/Parsing & Review for Midterm
ttic.uchicago.edu/~kgimpel/teaching/31190/lectures/13.pdf



Announcement
• project proposal due today
• email me to set up a 15-minute meeting next week to discuss your project proposal
• times posted on course webpage
• let me know if none of those work for you

Announcement
• midterm is Thursday, room #530
• closed-book, but you can bring an 8.5x11 sheet (though I don’t think you’ll need to)
• we will start at 10:35 am, finish at 11:50 am

Roadmap
• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• neural network methods in NLP
• syntax and syntactic parsing
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications

What is Syntax?
• rules, principles, processes that govern sentence structure of a language
• can differ widely among languages
• but every language has systematic structural principles

Constituent Parse (Bracketing/Tree)
(S (NP the man) (VP walked (PP to (NP the park))))

[Figure: the same parse drawn as a tree over “the man walked to the park”. Nonterminals: S, NP, VP, PP. Preterminals: DT NN VBD IN DT NN. Terminals: the words.]

Key:
S = sentence
NP = noun phrase
VP = verb phrase
PP = prepositional phrase
DT = determiner
NN = noun
VBD = verb (past tense)
IN = preposition

Penn Treebank Nonterminals

Probabilistic Context-Free Grammar (PCFG)
• assign probabilities to rewrite rules:
NP → DT NN    0.5
NP → NNS      0.3
NP → NP PP    0.2

NN → man      0.01
NN → park     0.0004
NN → walk     0.002
NN → …

given a treebank, estimate these probabilities using MLE (“count and normalize”)
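As a rough illustration of “count and normalize”, the sketch below estimates rule probabilities from a two-tree toy treebank. The treebank, the nested-tuple tree encoding, and the names `rules` and `estimate_pcfg` are all assumptions made up for this example; no smoothing is applied.

```python
from collections import Counter

# Toy treebank (hypothetical). A tree is (label, child, child, ...);
# a child that is a plain string is a word.
TREEBANK = [
    ("S", ("NP", ("DT", "the"), ("NN", "man")),
          ("VP", ("VBD", "walked"),
                 ("PP", ("IN", "to"), ("NP", ("DT", "the"), ("NN", "park"))))),
    ("S", ("NP", ("NNS", "dogs")), ("VP", ("VBD", "walked"))),
]

def rules(tree):
    """Yield (lhs, rhs) for every rewrite rule used in the tree."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def estimate_pcfg(treebank):
    """MLE: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for t in treebank for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {(lhs, rhs): n / lhs_counts[lhs]
            for (lhs, rhs), n in rule_counts.items()}

probs = estimate_pcfg(TREEBANK)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

In this toy treebank NP is rewritten three times, twice as DT NN and once as NNS, so those two rules get probabilities 2/3 and 1/3.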

How well does a PCFG work?
• PCFG learned from the Penn Treebank with MLE gets about 73% F1 score
• state-of-the-art parsers are around 92%
• simple modifications can improve PCFGs:
– smoothing
– tree transformations (selective flattening)
– parent annotation

Parent Annotation
VP → V NP PP
VP^S → V NP^VP PP^VP
adds more information, but also fragments counts, making parameter estimates noisier (since we’re just using MLE)

How well does a PCFG work?
• PCFG learned from the Penn Treebank with MLE gets about 73% F1 score
• state-of-the-art parsers are around 92%
• simple modifications can improve PCFGs:
– smoothing
– tree transformations (selective flattening)
– parent annotation
– lexicalization

Collins (1997)

Lexicalized PCFGs
nonterminals are decorated with the head word of the subtree

Lexicalization
• this adds a lot more rules!
• many more parameters to estimate → smoothing becomes much more important
– e.g., right-hand side of rule might be factored into several steps
• but it’s worth it because head words are really useful for constituent parsing

Results (Collins, 1997)

Head Rules
• how are heads decided?
• most researchers use deterministic head rules (Magerman/Collins)
• for a PCFG rule A → B_1 … B_N, these head rules say which of B_1 … B_N is the head of the rule
• examples:
S → NP VP
VP → VBD NP PP
NP → DT JJ NN

Head Annotation
from Noah Smith

Lexical Head Annotation
from Noah Smith

Lexical Head Annotation → Dependencies
remove nonlexical parts:
from Noah Smith

Dependencies
merge redundant nodes:
from Noah Smith
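The conversion from a head-annotated constituent tree to dependencies can be sketched as follows. This is only an illustration: the `HEAD_RULES` table is a hypothetical stand-in that follows common convention (it is not the Magerman/Collins table), the tree is the earlier “the man walked to the park” example, and the nested-tuple encoding and function name are assumptions.

```python
# Hypothetical head rules: for each (parent, children) rewrite, the index of
# the head child. Conventional choices, assumed for illustration only.
HEAD_RULES = {
    ("S", ("NP", "VP")): 1,
    ("VP", ("VBD", "PP")): 0,
    ("PP", ("IN", "NP")): 0,
    ("NP", ("DT", "NN")): 1,
}

TREE = ("S", ("NP", ("DT", "the"), ("NN", "man")),
             ("VP", ("VBD", "walked"),
                    ("PP", ("IN", "to"),
                           ("NP", ("DT", "the"), ("NN", "park")))))

def to_dependencies(tree, deps, counter):
    """Return the 1-based position of the subtree's head word and append
    (dependent position, head position) pairs to `deps`."""
    label, children = tree[0], tree[1:]
    if isinstance(children[0], str):      # preterminal: head is its own word
        counter[0] += 1
        return counter[0]
    child_heads = [to_dependencies(c, deps, counter) for c in children]
    rhs = tuple(c[0] for c in children)
    head = child_heads[HEAD_RULES[(label, rhs)]]
    for h in child_heads:
        if h != head:                     # each non-head child attaches to the head
            deps.append((h, head))
    return head

deps = []
root = to_dependencies(TREE, deps, [0])
deps.append((root, 0))                    # position 0 stands for an artificial root
print(sorted(deps))
```

With these assumed head rules, “walked” becomes the root, “man” and “to” attach to “walked”, and “park” attaches to “to”.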

constituent parse vs. labeled dependency parse
[Figure: a constituent parse next to the corresponding dependency parse, first unlabeled, then with arc labels nsubj, det, dobj, pobj, det, prep]

nsubj = “nominal subject”
dobj = “direct object”
prep = “preposition modifier”
pobj = “object of preposition”
det = “determiner”

captures some semantic relationships

• how (unlabeled) dependency trees are typically drawn:
– root of tree is represented by $ (“wall symbol”)
– arrows drawn entirely above (or below) sentence
– arrows are directed from child to parent (or from parent to child); you will see both in practice (don’t get confused!)

source: $ konnten sie es übersetzen ?
reference: $ could you translate it ?
($ is the “wall” symbol)

Crossing Dependencies
if dependencies cross (“nonprojective”), the tree no longer corresponds to a PCFG
from Noah Smith
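A minimal sketch of a crossing-arc check, assuming a tree is given as a list of head positions (words numbered from 1, with 0 for the root symbol). The encoding and the function name `is_projective` are assumptions for illustration, and the quadratic pairwise test is the simplest possible version rather than an efficient one.

```python
def is_projective(heads):
    """heads[i] is the head position of word i+1 (0 = root symbol).
    The tree is projective iff no two arcs cross when drawn above the sentence."""
    arcs = [(min(d, h), max(d, h)) for d, h in enumerate(heads, start=1)]
    for a_left, a_right in arcs:
        for b_left, b_right in arcs:
            # arcs cross when b starts strictly inside a and ends strictly outside it
            if a_left < b_left < a_right < b_right:
                return False
    return True

# "the man walked to the park" with an assumed parse: no crossings
print(is_projective([2, 3, 0, 3, 6, 4]))
# a made-up four-word tree in which arcs 1-3 and 2-4 cross
print(is_projective([3, 4, 0, 3]))
```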

Projective vs. Nonprojective Dependency Parsing
• English dependency treebanks are mostly projective
– but when focusing more on semantic relationships, often becomes more nonprojective
• some (relatively) free word order languages, like Czech, are fairly nonprojective
• nonprojective parsing can be formulated as a minimum spanning tree problem
• projective parsing cannot

Dependency Parsing
• several widely-used algorithms
• different guarantees but similar performance in practice
• graph-based:
– dynamic programming (Eisner, 1997)
– minimum spanning tree (McDonald et al., 2005)
• transition-based:
– shift-reduce (Nivre, inter alia)

Dependency Parsers
• Stanford parser
• TurboParser
• Joakim Nivre’s MALT parser
• Ryan McDonald’s MST parser
• and many others for many non-English languages

Complexity Comparison
• constituent parsing: O(Gn^3)
– parsing complexity depends on grammar structure (“grammar constant” G)
– since it has lots of nonterminal-only rules at the top of the tree, there are many rule probabilities to estimate
• dependency parsing: O(n^3)
– operates directly on words, so parsing complexity has no grammar constant
– features designed on possible dependencies (pairs of words) and larger structures
– transition-based parsing algorithms are O(n), though not optimal; also, non-projective parsing is faster

Applications of Dependency Parsing
• widely used for NLP tasks because:
– faster than constituent parsing
– captures more semantic information
• text classification (features on dependencies)
• syntax-based machine translation
• relation extraction
– e.g., extract relation between Sam Smith and AITech: Sam Smith was named new CEO of AITech.
– use dependency path between Sam Smith and AITech:
• Smith → named, named ← CEO, CEO ← of, of ← AITech
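One way to read off such a dependency path is to walk from each word up to the root and join the two walks at their first common ancestor. The sketch below assumes a hand-written parse of the example sentence (the `HEADS` list is an assumption, not parser output), and the function names are made up for illustration.

```python
WORDS = ["Sam", "Smith", "was", "named", "new", "CEO", "of", "AITech", "."]
# Assumed head position for each word (1-based; 0 = root), written by hand.
HEADS = [2, 4, 4, 0, 6, 4, 6, 7, 4]

def path_to_root(i, heads):
    """Positions visited when walking from word i up to the root word."""
    path = [i]
    while heads[path[-1] - 1] != 0:
        path.append(heads[path[-1] - 1])
    return path

def dependency_path(a, b, heads):
    """Positions on the tree path from word a to word b."""
    up_a, up_b = path_to_root(a, heads), path_to_root(b, heads)
    ancestors_a = set(up_a)
    # first node on b's upward walk that also lies on a's upward walk
    meet = next(n for n in up_b if n in ancestors_a)
    up = up_a[:up_a.index(meet) + 1]
    down = up_b[:up_b.index(meet)][::-1]
    return up + down

path = dependency_path(2, 8, HEADS)          # Smith ... AITech
print(" ".join(WORDS[i - 1] for i in path))
```

Under the assumed parse this visits Smith, named, CEO, of, AITech, matching the path listed above.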

Summary: two types of grammars
• phrase structure / constituent grammars
– inspired mostly by Chomsky and others
– only appropriate for certain languages (e.g., English)
• dependency grammars
– closer to a semantic representation; some have made this more explicit
– problematic for certain syntactic structures (e.g., conjunctions, nesting of noun phrases, etc.)
• both are widely used in NLP
• you can find constituent parsers and dependency parsers for several languages online

Review

Modeling, Inference, Learning

\( \mathrm{classify}(x, \theta) = \operatorname{argmax}_{y} \mathrm{score}(x, y, \theta) \)

• Modeling: How do we assign a score to an (x, y) pair using parameters θ? (modeling: define the score function)
• Inference: How do we efficiently search over the space of all labels? (inference: solve the argmax)
• Learning: How do we choose θ? (learning: choose θ)

Applications

Applications of our Classification Framework
text classification:
x: the hulk is an anger fueled monster with incredible strength and resistance to damage.
y: objective
x: in trying to be daring and original, it comes off as only occasionally satirical and never fresh.
y: subjective
output space = {objective, subjective}

Applications of our Classification Framework
word sense classifier for bass:
x: he’s a bass in the choir.
y: bass_3
x: our bass is line-caught from the Atlantic.
y: bass_4
output space = {bass_1, bass_2, …, bass_8}

Applications of our Classification Framework
skip-gram model as a classifier:
x: agriculture   y: <s>
x: agriculture   y: is
x: agriculture   y: the
output space = V (the entire vocabulary)
corpus (English Wikipedia):
agriculture is the traditional mainstay of the cambodian economy .
but benares has been destroyed by an earthquake .
…
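To make the (x, y) pairs concrete, here is a small sketch that generates skip-gram training pairs from a tokenized sentence. The window size of 2 and the single `<s>` padding symbol are assumptions chosen so that the first three pairs match the ones listed above; the function name is made up.

```python
def skipgram_pairs(tokens, window=2):
    """Return (x, y) pairs where y is a word within `window` positions of x."""
    padded = ["<s>"] + tokens                  # assumed sentence-start symbol
    pairs = []
    for i in range(1, len(padded)):            # the padding symbol is never the center word
        lo = max(0, i - window)
        hi = min(len(padded), i + window + 1)
        pairs.extend((padded[i], padded[j]) for j in range(lo, hi) if j != i)
    return pairs

tokens = "agriculture is the traditional mainstay of the cambodian economy .".split()
pairs = skipgram_pairs(tokens)
print(pairs[:3])
```

Each pair is then one classification example whose label y ranges over the whole vocabulary.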

Part-of-Speech Tagging
Some/determiner questioned/verb(past) if/prep. Tim/proper Cook/proper ’s/poss. first/adj. product/noun
would/modal be/verb a/det. breakaway/adjective hit/noun for/prep. Apple/proper ./punc.
(in the coarser tagging also shown, Tim, Cook, and Apple are tagged noun rather than proper)
Simplest kind of structured prediction: Sequence Labeling

Named Entity Recognition
Formulating segmentation tasks as sequence labeling via B-I-O labeling:
Some/O questioned/O if/O Tim/B-PERSON Cook/I-PERSON ’s/O first/O product/O
would/O be/O a/O breakaway/O hit/O for/O Apple/B-ORGANIZATION ./O
B = “begin”
I = “inside”
O = “outside”
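Going back from B-I-O tags to labeled segments is a simple scan. The sketch below is illustrative only: the function name is made up, and it takes the simplifying assumption that an I- tag always continues the currently open span without checking that the labels agree.

```python
def bio_to_spans(tokens, tags):
    """Collect (label, text) spans from a B-I-O tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):     # sentinel closes a span at sentence end
        if start is not None and not tag.startswith("I-"):
            spans.append((label, " ".join(tokens[start:i])))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

tokens = ("Some questioned if Tim Cook 's first product "
          "would be a breakaway hit for Apple .").split()
tags = ["O", "O", "O", "B-PERSON", "I-PERSON", "O", "O", "O",
        "O", "O", "O", "O", "O", "O", "B-ORGANIZATION", "O"]
print(bio_to_spans(tokens, tags))
```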

Applications of our Classifier Framework so far

text classification
– input (x): a sentence
– output (y): gold standard label for x
– output space: pre-defined, small label set (e.g., {positive, negative})
– size of output space: 2-10

word sense disambiguation
– input (x): instance of a particular word (e.g., bass) with its context
– output (y): gold standard word sense of x
– output space: pre-defined sense inventory from WordNet for bass
– size of output space: 2-30

learning skip-gram word embeddings
– input (x): instance of a word in a corpus
– output (y): a word in the context of x in a corpus
– output space: vocabulary
– size of output space: |V|

part-of-speech tagging
– input (x): a sentence
– output (y): gold standard part-of-speech tags for x
– output space: all possible part-of-speech tag sequences with same length as x
– size of output space: |P|^|x|

Applications of Classifier Framework (continued)

named entity recognition
– input (x): a sentence
– output (y): gold standard named entity labels for x (BIO tags)
– output space: all possible BIO label sequences with same length as x
– size of output space: |P|^|x|

constituent parsing
– input (x): a sentence
– output (y): gold standard constituent parse (labeled bracketing) of x
– output space: all possible labeled bracketings of x
– size of output space: exponential in length of x (Catalan number)

dependency parsing
– input (x): a sentence
– output (y): gold standard dependency parse (labeled directed spanning tree) of x
– output space: all possible labeled directed spanning trees of x
– size of output space: exponential in length of x

• each application draws from particular linguistic concepts and must address different kinds of linguistic ambiguity/variability:
– word sense: sense granularity, relationships among senses, word sense ambiguity
– word vectors: distributional properties, sense ambiguity, different kinds of similarity
– part-of-speech: tag granularity, tag ambiguity
– parsing: constituent/dependency relationships, attachment & coordination ambiguities

Modeling

model families
• linear models
– lots of freedom in defining features, though feature engineering required for best performance
– learning uses optimization of a loss function
– one can (try to) interpret learned feature weights
• stochastic/generative models
– linear models with simple “features” (counts of events)
– learning is easy: count & normalize (but smoothing needed)
– easy to generate samples
• neural networks
– can usually get away with less feature engineering
– learning uses optimization of a loss function
– hard to interpret (though we try!), but often works best

special case of linear models: stochastic/generative models

n-gram language models
– tasks: language modeling (for MT, ASR, etc.)
– context expansion: increase n

hidden Markov models
– tasks: part-of-speech tagging, named entity recognition, word clustering
– context expansion: increase order of HMM (e.g., bigram HMM → trigram HMM)

probabilistic context-free grammars
– tasks: constituent parsing
– context expansion: increase size of rules, e.g., flattening, parent annotation, etc.

• all use MLE + smoothing (though probably different kinds of smoothing)
• all assign probability to sentences (some assign probability jointly to pairs of <sentence, something else>)
• all have the same trade-off of increasing “context” (feature size) and needing more data / better smoothing

Feature Engineering for Text Classification
• Two features:
where
• What should the weights be?

Higher-Order Binary Feature Templates
unigram binary template:
bigram binary template:
trigram binary features …

Unigram Count Features
• a “count” feature returns the count of a particular word in the text
• unigram count feature template:

Feature Count Cutoffs
• problem: some features are extremely rare
• solution: only keep features that appear at least k times in the training data
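A short sketch of unigram count (or binary) features with a count cutoff. The function names, the whitespace tokenization, the lowercasing, and the toy training texts are all assumptions for illustration; the cutoff here counts total occurrences of each word across the training data.

```python
from collections import Counter

def unigram_counts(text):
    """Count of each (lowercased, whitespace-separated) word in the text."""
    return Counter(text.lower().split())

def build_feature_set(training_texts, k=2):
    """Keep only unigram features that appear at least k times in the training data."""
    totals = Counter()
    for text in training_texts:
        totals.update(unigram_counts(text))
    return {w for w, n in totals.items() if n >= k}

def featurize(text, feature_set, binary=False):
    """Map a text to {word: count} (or {word: 1} if binary), restricted to kept features."""
    counts = unigram_counts(text)
    return {w: (1 if binary else n) for w, n in counts.items() if w in feature_set}

train = ["a great great movie", "a dull movie", "great fun"]
features = build_feature_set(train, k=2)
print(sorted(features))
print(featurize("a great great film", features))
print(featurize("a great great film", features, binary=True))
```

Words such as “dull” and “fun” occur only once in the toy data, so they are dropped when k = 2.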

2-transformation (1-layer) network
• we’ll call this a “2-transformation” neural network, or a “1-layer” neural network
• input vector is x
• score vector is the vector of label scores
• one hidden vector (“hidden layer”)
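The forward pass of such a network can be sketched in plain Python. Everything specific here is an assumption for illustration: the tanh nonlinearity, the weight names `W0`, `b0`, `W1`, `b1`, and the made-up dimensions (3 inputs, 2 hidden units, 3 labels).

```python
import math

def affine(W, b, v):
    """Compute W v + b with W as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def forward(x, W0, b0, W1, b1):
    hidden = [math.tanh(a) for a in affine(W0, b0, x)]   # transformation 1, then nonlinearity
    scores = affine(W1, b1, hidden)                       # transformation 2: one score per label
    return hidden, scores

x = [1.0, 0.0, 2.0]                        # made-up input vector
W0 = [[0.5, -1.0, 0.0], [0.0, 1.0, 0.5]]   # hypothetical parameters
b0 = [0.0, 0.0]
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]

hidden, scores = forward(x, W0, b0, W1, b1)
predicted = max(range(len(scores)), key=scores.__getitem__)   # argmax over labels
print(scores, predicted)
```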

1-layer neural network for sentiment classification

Neural Networks for Twitter Part-of-Speech Tagging
ikr/intj smh/other he/pronoun asked/verb fir/prep yo/det last/adj name/noun so/prep he/pronoun can/verb
• let’s use the center word + two words to the right:
x = [vector for yo; vector for last; vector for name]
• if name is to the right of yo, then yo is probably a form of your
• but our x above uses separate dimensions for each position!
– i.e., name is two words to the right
– what if name is one word to the right?

Convolution
[vector for yo] [vector for last] [vector for name]
the “feature map” has an entry for each word position in context window/sentence

Pooling
how do we convert the feature map into a fixed-length vector? use pooling:
max-pooling: returns maximum value in the feature map
average pooling: returns average of values in the feature map
then, this single filter produces a single feature value (the output of some kind of pooling). in practice, we use many filters of many different lengths (e.g., n-grams rather than words).
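A minimal sketch of one filter convolved over word vectors, followed by pooling. The word vectors and filter values are made up, the filter spans a single word, and no nonlinearity is applied; all of these are simplifying assumptions.

```python
def convolve(word_vectors, filt):
    """Feature map: dot product of the filter with the vector at each word position."""
    return [sum(w * f for w, f in zip(vec, filt)) for vec in word_vectors]

def max_pool(feature_map):
    return max(feature_map)

def average_pool(feature_map):
    return sum(feature_map) / len(feature_map)

# made-up 2-dimensional vectors for "yo", "last", "name"
word_vectors = [[0.0, 1.0], [2.0, 0.0], [1.0, 3.0]]
filt = [1.0, -1.0]                              # hypothetical filter weights

feature_map = convolve(word_vectors, filt)      # one entry per word position
print(feature_map, max_pool(feature_map), average_pool(feature_map))
```

Because pooling discards position, the pooled value is the same wherever in the window a strongly matching word appears, which is the point made about name being one or two words to the right.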

Convolutional Neural Networks
• convolutional neural networks (convnets or CNNs) use filters that are “convolved with” (matched against all positions of) the input
• think of convolution as “perform the same operation everywhere on the input in some systematic order”
• “convolutional layer” = set of filters that are convolved with the input vector (whether x or hidden vector)
• could be followed by more convolutional layers, or by a type of pooling
• often used in NLP to convert a sentence into a feature vector

Recurrent Neural Networks
[Figure: recurrent neural network; the “hidden vector” is labeled]

Long Short-Term Memory (LSTM) Recurrent Neural Networks

Backward & Bidirectional LSTMs
bidirectional: if shallow, just use forward and backward LSTMs in parallel, concatenate final two hidden vectors, feed to softmax

Deep LSTM (2-layer)
[Figure: LSTM with two layers, labeled layer 1 and layer 2]

Recursive Neural Networks for NLP
• first, run a constituent parser on the sentence
• convert the constituent tree to a binary tree (each rewrite has exactly two children)
• construct vector for sentence recursively at each rewrite (“split point”):

Learning

Cost Functions
• cost function: scores output against a gold standard
• should reflect the evaluation metric for your task
• usual conventions:
• for classification, what cost should we use?

Empirical Risk Minimization (Vapnik et al.)
• replace expectation with sum over examples:
problem: NP-hard even for binary classification with linear models

Empirical Risk Minimization with Surrogate Loss Functions
• given training data: (x, y) pairs, where each y is a label
• we want to solve the following:
many possible loss functions to consider optimizing

Loss Functions
name: where used
• cost (“0-1”): intractable, but underlies “direct error minimization”
• perceptron: perceptron algorithm (Rosenblatt, 1958)
• hinge: support vector machines, other large-margin algorithms
• log: logistic regression, conditional random fields, maximum entropy models
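For a concrete feel, the sketch below computes the three surrogate losses for a single example with five possible outputs. It uses the standard textbook definitions, which are assumed here to match the ones intended in the lecture: perceptron loss is the best score minus the gold score, hinge loss does the same with cost added to each score, and log loss is log-sum-exp of the scores minus the gold score. The scores and the 0/1 cost are made up.

```python
import math

def perceptron_loss(scores, gold):
    """max over outputs of score, minus the score of the gold output."""
    return max(scores) - scores[gold]

def hinge_loss(scores, costs, gold):
    """max over outputs of (score + cost), minus the score of the gold output."""
    return max(s + c for s, c in zip(scores, costs)) - scores[gold]

def log_loss(scores, gold):
    """log of the sum of exp(score) over outputs, minus the gold score."""
    m = max(scores)                                   # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold]

scores = [1.0, 3.0, 2.5, 0.0, -1.0]                   # made-up scores for five outputs
gold = 1
costs = [0.0 if y == gold else 1.0 for y in range(len(scores))]   # assumed 0/1 cost

print(perceptron_loss(scores, gold))      # gold already scores highest
print(hinge_loss(scores, costs, gold))    # output 2 is within the margin
print(log_loss(scores, gold))
```

In this example the perceptron loss is zero because the gold output already has the highest score, while the hinge loss is still positive because another output comes within the cost margin.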

(Sub)gradients of Losses for Linear Models
name: entry j of (sub)gradient of loss for linear model
• cost (“0-1”): not subdifferentiable in general
• perceptron:
• hinge:
• log: includes the expectation of feature value with respect to distribution over y (where distribution is defined by θ)
alternative notation:

Visualization
[Figure sequence: bar charts over five possible outputs showing, in turn, the score of each output, the cost of each output, the gold standard output, and score + cost]

perceptron loss:
[Figure: score of each possible output, with the gold standard marked]
effect of learning: gold standard will have highest score

hinge loss:
[Figure: score + cost of each possible output, with the gold standard marked]
effect of learning: score of gold standard will be higher than score+cost of all others

Regularized Empirical Risk Minimization
• given training data: (x, y) pairs, where each y is a label
• we want to solve the following:
(the objective adds a regularization term, weighted by a regularization strength, to the loss)

Regularization Terms

• most common: penalize large parameter values
• intuition: large parameters might be instances of overfitting

• examples:
– L2 regularization: (also called Tikhonov regularization or ridge regression)

– L1 regularization: (also called basis pursuit or LASSO)
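The two penalties can be sketched as follows, assuming their usual definitions: the L2 term is the sum of squared parameter values and the L1 term is the sum of absolute parameter values. Some presentations scale the L2 term by 1/2, and the slide's exact form was not recoverable. All function names and numbers are hypothetical.

```python
def l2_penalty(theta):
    """Squared L2 norm: sum of squared parameter values."""
    return sum(t * t for t in theta)


def l1_penalty(theta):
    """L1 norm: sum of absolute parameter values."""
    return sum(abs(t) for t in theta)


def regularized_objective(total_loss, theta, strength, penalty):
    """Training loss plus strength times the regularization term."""
    return total_loss + strength * penalty(theta)


theta = [3.0, -4.0, 0.0]          # hypothetical parameter vector
l2 = l2_penalty(theta)            # 9 + 16 + 0
l1 = l1_penalty(theta)            # 3 + 4 + 0
objective = regularized_objective(1.0, theta, 0.5, l2_penalty)
```

Both penalties grow with the magnitude of the parameters, so minimizing the regularized objective trades training loss against parameter size.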

Dropout
• popular regularization method for neural networks

• randomly “drop out” (set to zero) some of the vector entries in the layers
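A minimal sketch of dropout applied to one layer's activation vector. Rescaling the surviving entries by 1 / (1 - drop_prob) is the common "inverted dropout" variant and is an assumption here; the slide only says entries are set to zero. The names `dropout`, `hidden`, and `drop_prob` are hypothetical.

```python
import random


def dropout(vector, drop_prob, rng):
    """Zero each entry independently with probability drop_prob.

    Surviving entries are divided by (1 - drop_prob) so the expected value of
    each entry is unchanged ("inverted dropout", an assumption beyond the slide).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in vector]


rng = random.Random(0)             # fixed seed so the sketch is repeatable
hidden = [0.5, -1.0, 2.0, 0.25]    # hypothetical hidden-layer activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
unchanged = dropout(hidden, drop_prob=0.0, rng=rng)
```

Dropout is applied during training only; at test time it is typically turned off.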

Inference

Exponentially-Large Search Problems

inference: solve $\hat{y} = \operatorname*{argmax}_{y} \mathrm{score}(x, y, \theta)$

• when output is a sequence or tree, this argmax requires iterating over an exponentially-large set
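To see why the set is exponentially large, here is a sketch of the naive approach for sequence labeling: enumerate every label sequence and keep the best one. The scorer `toy_score` and the two-label tag set are hypothetical stand-ins for a real model.

```python
from itertools import product


def brute_force_argmax(words, labels, score):
    """Return the highest-scoring label sequence by trying every sequence."""
    return max(product(labels, repeat=len(words)),
               key=lambda seq: score(words, seq))


def toy_score(words, seq):
    """Hypothetical scorer: reward label 'N' on capitalized words, 'V' otherwise."""
    return sum(1.0 for w, y in zip(words, seq)
               if (y == "N") == w[0].isupper())


labels = ["N", "V"]
words = ["Time", "flies"]
best = brute_force_argmax(words, labels, toy_score)
num_candidates = len(labels) ** len(words)   # grows exponentially with length
```

With |L| labels and a sentence of length n there are |L|^n candidates, so this enumeration is only feasible for toy inputs.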

Learning requires solving exponentially-hard problems too!

for each loss (perceptron, hinge, log): entry j of (sub)gradient of loss for linear model

computing each of these terms requires iterating through every possible output
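The table's equations did not survive extraction. For reference, the standard forms for a linear model with $\mathrm{score}(x, y, \theta) = \sum_j \theta_j f_j(x, y)$ are sketched below. The notation ($f_j$, $\hat{y}$, $\tilde{y}$, $p_\theta$) is an assumption and may differ from the slide.

```latex
\begin{align*}
\text{perceptron:}\quad & -f_j(x, y) + f_j(x, \hat{y}), & \hat{y} &= \operatorname*{argmax}_{y'} \mathrm{score}(x, y', \theta) \\
\text{hinge:}\quad & -f_j(x, y) + f_j(x, \tilde{y}), & \tilde{y} &= \operatorname*{argmax}_{y'} \left[ \mathrm{score}(x, y', \theta) + \mathrm{cost}(y, y') \right] \\
\text{log:}\quad & -f_j(x, y) + \sum_{y'} p_\theta(y' \mid x)\, f_j(x, y'), & p_\theta(y' \mid x) &\propto \exp \mathrm{score}(x, y', \theta)
\end{align*}
```

The argmax in the first two rows and the sum in the third each range over every possible output, which is where the exponential cost comes from.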

Dynamic Programming (DP)
• what is dynamic programming?
– a family of algorithms that break problems into smaller pieces and reuse solutions for those pieces

– only applicable when the problem has certain properties (optimal substructure and overlapping sub-problems)

• in this class, we use DP to iterate over exponentially-large output spaces in polynomial time

• we focus on a particular type of DP algorithm: memoization

Implementing DP algorithms
• even if your goal is to compute a sum or a max, focus first on counting mode (count the number of unique outputs for an input)

• memoization = recursion + saving/reusing solutions
– start by defining recursive equations
– “memoize” by creating a table to store all intermediate results from recursive equations, use them when requested
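The recipe above can be sketched in counting mode for sequence labeling: count the label sequences of a given length, optionally under a constraint on which label may follow which. The recursive equations, the function name `count_sequences`, and the no-repeats constraint are hypothetical choices for illustration.

```python
def count_sequences(length, labels, allowed):
    """Count label sequences of the given length, using a memoization table.

    allowed(prev, cur) says whether label cur may follow label prev.
    Recursive equations:
        C(1, y) = 1
        C(m, y) = sum of C(m - 1, prev) over prev with allowed(prev, y)
    """
    table = {}  # (position, label) -> number of sequences ending there

    def C(m, y):
        if (m, y) in table:          # reuse a stored solution
            return table[(m, y)]
        if m == 1:
            result = 1
        else:
            result = sum(C(m - 1, prev) for prev in labels if allowed(prev, y))
        table[(m, y)] = result       # save for later requests
        return result

    total = sum(C(length, y) for y in labels)
    return total, len(table)


labels = ["D", "N", "V"]
# Unconstrained: every sequence counts, so the answer is 3 ** 4.
unconstrained, table_size = count_sequences(4, labels, lambda prev, cur: True)
# Hypothetical constraint: the same label may not appear twice in a row.
no_repeats, _ = count_sequences(4, labels, lambda prev, cur: prev != cur)
```

The table holds only length × |labels| entries even though it accounts for every sequence. Replacing the sum with a max and the count 1 with probabilities gives the same skeleton for max-style problems.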

Inference in HMMs

• since the output is a sequence, this argmax requires iterating over an exponentially-large set

• last week we talked about using dynamic programming (DP) to solve these problems

• for HMMs (and other sequence models), the algorithm for solving this is called the Viterbi algorithm

Viterbi Algorithm
• recursive equations + memoization:

base case: returns probability of sequence starting with label y for first word

recursive case: computes probability of max-probability label sequence that ends with label y at position m

final value is in:
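The slide's equations did not survive extraction, so the following is a sketch of the usual Viterbi recursion for a bigram HMM rather than the lecture's exact formulation. It assumes V(1, y) = p(y | start) · p(x_1 | y) and V(m, y) = max over y' of V(m - 1, y') · p(y | y') · p(x_m | y), reads the final value as the max over labels at the last position (no end-of-sentence symbol), uses 0-based positions, and fills the table left to right instead of through recursive calls. The probability tables are hypothetical.

```python
def viterbi(words, labels, start, trans, emit):
    """Return (probability, labels) of the max-probability label sequence.

    start[y]      : probability that the first label is y
    trans[(p, y)] : probability of label y following label p
    emit[(y, w)]  : probability of label y emitting word w
    Missing table entries are treated as probability 0.
    """
    V = {}     # memoization table: (position, label) -> best probability
    back = {}  # backpointers: (position, label) -> best previous label

    for y in labels:  # base case: first word
        V[(0, y)] = start.get(y, 0.0) * emit.get((y, words[0]), 0.0)

    for m in range(1, len(words)):  # recursive case, filled left to right
        for y in labels:
            best_prev = max(labels,
                            key=lambda p: V[(m - 1, p)] * trans.get((p, y), 0.0))
            V[(m, y)] = (V[(m - 1, best_prev)] * trans.get((best_prev, y), 0.0)
                         * emit.get((y, words[m]), 0.0))
            back[(m, y)] = best_prev

    last = max(labels, key=lambda y: V[(len(words) - 1, y)])  # final value
    path = [last]
    for m in range(len(words) - 1, 0, -1):  # follow backpointers
        path.append(back[(m, path[-1])])
    return V[(len(words) - 1, last)], path[::-1]


# Hypothetical two-label HMM.
labels = ["N", "V"]
start = {"N": 0.8, "V": 0.2}
trans = {("N", "N"): 0.3, ("N", "V"): 0.7, ("V", "N"): 0.6, ("V", "V"): 0.4}
emit = {("N", "fish"): 0.6, ("V", "fish"): 0.4,
        ("N", "swim"): 0.1, ("V", "swim"): 0.9}
prob, path = viterbi(["fish", "swim"], labels, start, trans, emit)
```

The backpointer table is what lets the sketch return the label sequence itself and not only its probability.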

Viterbi Algorithm
• space and time complexity?
• can be read off from the recursive equations:

space complexity: size of memoization table, which is # of unique indices of recursive equations

so, space complexity is O(|x| |L|), where |x| is the length of the sentence and |L| is the number of labels*

Viterbi Algorithm
• space and time complexity?
• can be read off from the recursive equations:

time complexity: size of memoization table * complexity of computing each entry

so, time complexity is O(|x| |L| |L|) = O(|x| |L|^2), where |x| is the length of the sentence, |L| is the number of labels*, and each entry requires iterating through the labels*
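As a worked example with hypothetical sizes, a 20-word sentence and 45 labels give a table of 900 entries and 40,500 basic steps, while enumerating every label sequence would mean over 10^33 candidates:

```python
sentence_length = 20   # |x|, a hypothetical sentence
num_labels = 45        # |L|, a hypothetical tag set size

table_entries = sentence_length * num_labels          # space: O(|x| |L|)
viterbi_steps = table_entries * num_labels            # time:  O(|x| |L|^2)
brute_force_outputs = num_labels ** sentence_length   # every label sequence
```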

Feature Locality

• feature locality: how “big” are your features?
• when designing efficient inference algorithms (whether w/ DP or other methods), we need to be mindful of this

• features can be arbitrarily big in terms of the input, but not in terms of the output!

• the features in HMMs are small in both the input and output sequences (only two pieces at a time)
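A sketch of what locality means in code: a scorer for one position may read any part of the input, but it touches only two pieces of the output (the current label and the previous one), which is what lets DP decompose the total score. The features, weights, and the start symbol `"<s>"` are hypothetical.

```python
def local_score(words, m, prev_label, label):
    """Hypothetical local scorer for position m.

    It may inspect the whole input (words), but only two pieces of the output:
    the label at position m and the label just before it.
    """
    s = 0.0
    if label == "N" and words[m][0].isupper():
        s += 1.0                       # looks at the input word at position m
    if label == "V" and words[-1] == "?":
        s += 0.5                       # looks far away in the input
    if prev_label == "D" and label == "N":
        s += 2.0                       # looks at an adjacent label pair
    return s


def sequence_score(words, labels_seq):
    """Total score decomposes into a sum of local scores over positions."""
    return sum(local_score(words, m, labels_seq[m - 1] if m > 0 else "<s>",
                           labels_seq[m])
               for m in range(len(words)))


words = ["the", "Dog", "barks"]
total = sequence_score(words, ["D", "N", "V"])
```

A feature that needed three or more labels at once would not fit this decomposition and would force a larger memoization table.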
