
TTIC 31190: Natural Language Processing

Kevin Gimpel, Winter 2016

Lecture 9: Sequence Models


Announcements
• on Thursday, class will be in Room 530 (the room directly behind you)


Announcements
• we will go over part of Assignment 1 today (grades coming soon)

• Assignment 2 was due Wed., Feb. 3; now due Fri., Feb. 5

• project proposal due Tuesday, Feb. 16
• midterm on Thursday, Feb. 18


Other Naturally-Occurring Data
• quality of scientific journalism:

Other Naturally-Occurring Data
• memorability of quotations:


Other Naturally-Occurring Data
• sarcasm (remove #sarcasm hashtag from tweets):


Other Naturally-Occurring Data
• opening weekend movie revenue prediction from critic reviews:


Other Naturally-Occurring Data
• predicting novel success from text of novels:


Project Proposal
• due Feb. 16 (in two weeks)
• 1-2 pages
• one per group
• include the following:
– members of your group
– describe the task you are going to work on (could be a new task you create or an existing task)
– describe the methods you will use/develop for the task
– give a brief review of related work; i.e., situate your project with respect to the literature (www.aclweb.org and Google Scholar are useful for this!)
– a proposed timeline


Project Proposal (cont’d)
• your results do not have to beat the state-of-the-art!
• but your project does have to be carefully done, so that you can draw conclusions
• you are welcome to start by replicating an NLP paper (I can give suggestions if you need some)
• during the week of Feb. 22, please schedule a meeting with me to discuss your project
– details to follow


Class Presentations
• final two class meetings (March 3rd and March 8th) will be mostly used for in-class presentations
• one presentation per group
• 10-15 minutes per presentation (will be determined once I know how many groups there are)
• you will each take notes and email me questions/feedback for the presenter, which I will anonymize and send


Project
• final report due Thursday, March 17 (original date of the final exam)
• so the presentation will be more like an “interim progress report”


Roadmap
• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• neural network methods in NLP
• syntax and syntactic parsing
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications


determiner  verb (past)  prep.  proper  proper  poss.  adj.  noun
modal  verb  det.  adjective  noun  prep.  proper  punc.


Part-of-Speech Tagging

determiner  verb (past)  prep.  noun  noun  poss.  adj.  noun
Some questioned if Tim Cook's first product

modal  verb  det.  adjective  noun  prep.  noun  punc.
would be a breakaway hit for Apple.

Simplest kind of structured prediction: Sequence Labeling


Named Entity Recognition

Formulating segmentation tasks as sequence labeling via B-I-O labeling (B = “begin”, I = “inside”, O = “outside”):

O  O  O  B-PERSON  I-PERSON  O  O  O
Some questioned if Tim Cook's first product

O  O  O  O  O  O  B-ORGANIZATION  O
would be a breakaway hit for Apple.
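To make the B-I-O encoding concrete, here is a minimal sketch that converts labeled entity spans into per-token tags; the function name and the span representation are illustrative assumptions, not code from the course:

```python
# Minimal sketch of B-I-O encoding: turn labeled entity spans (over token
# indices) into one tag per token, as in the slide example above.
def spans_to_bio(tokens, spans):
    """spans: list of (start, end_exclusive, label) over token indices."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label              # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label              # tokens inside the entity
    return tags

tokens = ["Some", "questioned", "if", "Tim", "Cook", "'s", "first", "product"]
print(spans_to_bio(tokens, [(3, 5, "PERSON")]))
# ['O', 'O', 'O', 'B-PERSON', 'I-PERSON', 'O', 'O', 'O']
```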

• there are many downloadable part-of-speech taggers and named entity recognizers:
– Stanford POS tagger, NER labeler
– TurboTagger, TurboEntityRecognizer
– Illinois Entity Tagger
– CMU Twitter POS tagger
– Alan Ritter's Twitter POS/NER labeler



Hidden Markov Models


y1  y2  y3  y4   (label sequence)
x1  x2  x3  x4   (observation sequence)

transition parameters: p(y_i | y_{i-1})
emission parameters: p(x_i | y_i)

HMMs for Word Clustering (Brown et al., 1992)


each y_i is a cluster ID, so the label space is the set of cluster IDs

justin bieber for president

y1 y2 y3 y4

HMMs for Part-of-Speech Tagging


each y_i is a part-of-speech tag, so the label space is the set of part-of-speech tags

what parameters need to be learned?

transition parameters: p(y_i | y_{i-1})
emission parameters: p(x_i | y_i)

justin bieber for president

proper noun    proper noun    preposition    noun

How should we learn the HMM parameters?


transition parameters: p(y_i | y_{i-1})
emission parameters: p(x_i | y_i)

Supervised HMMs
• given a dataset of input sequences and annotated outputs:
• to estimate transition/emission distributions, use maximum likelihood estimation (count and normalize); a small sketch follows the example below:


justin bieber for president

proper noun    proper noun    preposition    noun
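A minimal count-and-normalize sketch for estimating these distributions from tagged sentences; the function and variable names, and the use of a "<s>" start symbol, are illustrative assumptions rather than the course's code:

```python
from collections import Counter, defaultdict

# Maximum likelihood (count-and-normalize) estimation of HMM parameters
# from tagged sentences; each sentence is a list of (word, tag) pairs.
def estimate_hmm(tagged_sentences):
    trans_counts = defaultdict(Counter)   # counts of tag -> next tag
    emit_counts = defaultdict(Counter)    # counts of tag -> word
    for sentence in tagged_sentences:
        prev = "<s>"                      # assumed start symbol
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag
    transition = {y: {y2: c / sum(cs.values()) for y2, c in cs.items()}
                  for y, cs in trans_counts.items()}
    emission = {y: {w: c / sum(cs.values()) for w, c in cs.items()}
                for y, cs in emit_counts.items()}
    return transition, emission

trans, emit = estimate_hmm([[("justin", "proper noun"), ("bieber", "proper noun"),
                             ("for", "preposition"), ("president", "noun")]])
print(trans["proper noun"])   # {'proper noun': 0.5, 'preposition': 0.5}
```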

Estimates of Tag Transition Probabilities


[table of estimated tag transition probabilities over the tags: proper noun, modal verb, infinitive verb, adjective, noun, adverb, determiner]

Estimates of Emission Probabilities


Inference in HMMs


• since the output is a sequence, this argmax requires iterating over an exponentially large set

• last week we talked about using dynamic programming (DP) to solve these problems

• for HMMs (and other sequence models), the DP algorithm for solving this is called the Viterbi algorithm

Viterbi Algorithm
• recursive equations + memoization:

base case: V(1, y) = p(y | <s>) · p(x_1 | y)
returns probability of sequence starting with label y for first word

recursive case: V(m, y) = max_{y'} [ V(m-1, y') · p(y | y') · p(x_m | y) ]
computes probability of max-probability label sequence that ends with label y at position m

final value is in: max_y V(|x|, y)
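A minimal Python sketch of this recursion with backpointers; it assumes the transition/emission probabilities are stored as nested dictionaries like those in the count-and-normalize sketch above, with "<s>" as a start symbol (these representation choices are assumptions, not the course's implementation):

```python
# Viterbi: find the max-probability label sequence under an HMM.
def viterbi(words, labels, transition, emission):
    def t(yp, y):   # transition probability, 0 if unseen
        return transition.get(yp, {}).get(y, 0.0)
    def e(y, w):    # emission probability, 0 if unseen
        return emission.get(y, {}).get(w, 0.0)

    V = [{} for _ in words]       # V[m][y]: prob of best label sequence ending in y at position m
    back = [{} for _ in words]    # backpointers for recovering the argmax
    for y in labels:              # base case (first word)
        V[0][y] = t("<s>", y) * e(y, words[0])
    for m in range(1, len(words)):            # recursive case
        for y in labels:                      # O(|x| |L|^2) time overall
            best = max(labels, key=lambda yp: V[m - 1][yp] * t(yp, y))
            V[m][y] = V[m - 1][best] * t(best, y) * e(y, words[m])
            back[m][y] = best
    y = max(labels, key=lambda yy: V[-1][yy])  # final value: best label at last position
    path = [y]
    for m in range(len(words) - 1, 0, -1):     # follow backpointers
        y = back[m][y]
        path.append(y)
    return list(reversed(path))
```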

Example:

Janet will back the bill

proper noun   modal verb   infinitive verb   determiner   noun

Viterbi Algorithm (on board)


Viterbi Algorithm
• space and time complexity?
• can be read off from the recursive equations:

space complexity: size of memoization table, which is # of unique indices of recursive equations

so, space complexity is O(|x| |L|), where |x| is the length of the sentence and |L| is the number of labels

Viterbi Algorithm
• space and time complexity?
• can be read off from the recursive equations:

time complexity: size of memoization table × complexity of computing each entry

so, time complexity is O(|x| |L| |L|) = O(|x| |L|^2), where |x| is the length of the sentence, |L| is the number of labels, and each entry requires iterating through the labels

Linear Sequence Models

• let's generalize HMMs and talk about linear models for scoring label sequences in our classifier framework: score(x, y) = w · f(x, y)

• but first, how do we know that this scoring function generalizes HMMs?


HMM as a Linear Model

• what are the feature templates and weights?


HMM: p(x, y) = ∏_i p(y_i | y_{i-1}) · p(x_i | y_i)

linear model: score(x, y) = w · f(x, y)

HMM as a Linear Model

feature templates and weights:

HMM: log p(x, y) = Σ_i [ log p(y_i | y_{i-1}) + log p(x_i | y_i) ]

linear model: score(x, y) = w · f(x, y), with
– one feature per label pair (y', y) that counts transitions from y' to y, with weight log p(y | y')
– one feature per label-word pair (y, x_j) that counts emissions of word x_j under label y, with weight log p(x_j | y)
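A small sketch of this correspondence in code: indicator-count features with log-probability weights give a dot product equal to the HMM log joint probability (assuming the same "<s>" start-symbol convention as the earlier sketches; the names are mine, not the course's):

```python
import math

# Score a label sequence with a linear model whose features count
# (prev tag, tag) transitions and (tag, word) emissions, and whose weights
# are the HMM log-probabilities; the dot product equals log p(x, y).
def hmm_linear_score(words, tags, transition, emission):
    features = {}
    prev = "<s>"
    for word, tag in zip(words, tags):
        features[("trans", prev, tag)] = features.get(("trans", prev, tag), 0) + 1
        features[("emit", tag, word)] = features.get(("emit", tag, word), 0) + 1
        prev = tag
    weights = {}
    for kind, a, b in features:
        prob = transition[a][b] if kind == "trans" else emission[a][b]
        weights[(kind, a, b)] = math.log(prob)
    return sum(weights[k] * count for k, count in features.items())   # w . f(x, y)
```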

Linear Sequence Models

• so, an HMM is:
– a linear sequence model
– with particular features on label transitions and label-observation emissions
– and uses maximum likelihood estimation (count & normalize) for learning

• but we could use any feature functions we like, and use any of our loss functions for learning!


(Chain) Conditional Random Fields


(Chain) Conditional Random Fields

• linear sequence model
• arbitrary features of input are permitted
• test-time inference uses Viterbi algorithm
• learning done by minimizing log loss (DP algorithms used to compute gradients)
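For reference, the standard chain CRF objective these bullets describe, written in generic notation (the lecture's exact notation may differ):

```latex
% conditional probability of a label sequence y given input x under a chain CRF
p(y \mid x) = \frac{\exp\{w^\top f(x, y)\}}{\sum_{y'} \exp\{w^\top f(x, y')\}}

% learning minimizes the log loss on each training pair (x, y):
\mathrm{loss}_{\log}(x, y, w) = -\log p(y \mid x)
```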


Max-Margin Markov Networks


Maximum-Margin Markov Networks

• linear sequence model
• arbitrary features of input are permitted
• test-time inference uses Viterbi algorithm
• learning done by minimizing hinge loss (DP algorithm used to compute subgradients)
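Likewise, the (cost-augmented) structured hinge loss these bullets refer to has the standard form below (generic notation, not necessarily the lecture's):

```latex
% structured hinge loss for input x with gold label sequence y
\mathrm{loss}_{\mathrm{hinge}}(x, y, w) =
  \max_{y'} \left[ w^\top f(x, y') + \mathrm{cost}(y, y') \right] - w^\top f(x, y)
```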


Feature Locality

• feature locality: roughly, how “big” are your features?

• when designing efficient inference algorithms (whether w/ DP or other methods), we need to be mindful of this

• features can be arbitrarily big in terms of the input sequence

• but features cannot be arbitrarily big in terms of the output sequence!

• the features in HMMs are small in both the input and output sequences (only two pieces at a time)


Are these features big or small?


feature | big or small?

feature that counts instances of “the” in the input sentence | small
feature that returns square root of sum of counts of am/is/was/were | small
feature that counts “verb verb” sequences | small
feature that counts “determiner noun verb verb” sequences | pretty big!
feature that counts the number of nouns in a sentence | big, but we can design specialized algorithms to handle them if they're the only big features
feature that returns the ratio of nouns to verbs

Learning with linear sequence models
• given a linear sequence model with “small” features, how should we do learning?


Loss functions for learning linear sequence models


loss functions: perceptron, hinge, log

for each of these, entry j of the (sub)gradient of the loss for a linear model is the same as before, though computing these terms (inference) now requires DP algorithms; the forms are written out below
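Written out in generic notation (these are the standard forms; for the perceptron, ŷ is the Viterbi argmax, for the hinge loss ŷ is the cost-augmented argmax, and the log-loss expectation is computed with DP):

```latex
% perceptron: \hat{y} = \arg\max_{y'} w^\top f(x, y')
\frac{\partial\,\mathrm{loss}_{\mathrm{perc}}}{\partial w_j} = f_j(x, \hat{y}) - f_j(x, y)

% hinge: \hat{y} = \arg\max_{y'} [\, w^\top f(x, y') + \mathrm{cost}(y, y') \,]
\frac{\partial\,\mathrm{loss}_{\mathrm{hinge}}}{\partial w_j} = f_j(x, \hat{y}) - f_j(x, y)

% log loss: expectation of the feature under the model distribution p(y' | x)
\frac{\partial\,\mathrm{loss}_{\log}}{\partial w_j} = \mathbb{E}_{p(y' \mid x)}\left[ f_j(x, y') \right] - f_j(x, y)
```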

Implementing DP algorithms

• start with counting mode, but keep in mind how the model's score function decomposes across parts of the outputs
– i.e., how “large” are the features? how many items in the output sequence are needed to compute each feature?
– define a function called partScore that computes all the features (for counting mode, this function will return 1)
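A minimal sketch of this idea for a first-order sequence model; the part_score signature (position, previous label, current label) and the forward-style recursion are assumptions about how one might organize the code, not the course's implementation:

```python
# A "part" here is (position m, previous label, current label); part_score
# computes that part's features and dots them with the weights. In counting
# mode (weights=None) every part scores 1, so the DP simply counts sequences.
def part_score(x, m, prev_label, label, weights=None):
    if weights is None:
        return 1.0
    feats = {("trans", prev_label, label): 1,
             ("emit", label, x[m]): 1}
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def forward_count(x, labels):
    """Forward-style DP that sums products of part scores over all label
    sequences; in counting mode this returns |labels| ** len(x)."""
    alpha = {y: part_score(x, 0, "<s>", y) for y in labels}
    for m in range(1, len(x)):
        alpha = {y: sum(alpha[yp] * part_score(x, m, yp, y) for yp in labels)
                 for y in labels}
    return sum(alpha.values())

print(forward_count(["justin", "bieber", "for", "president"],
                    ["proper noun", "preposition", "noun"]))   # 81.0 = 3 ** 4
```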


Neural Networks in NLP
• neural networks
• deep neural networks
• neural language models
• recurrent neural networks and LSTMs
• convolutional neural networks


What is a neural network?
• just think of a neural network as a function
• it has inputs and outputs
• the term “neural” typically means a particular type of functional building block (“neural layers”), but the term has expanded to mean many things
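As a tiny illustration of one such building block, here is a minimal sketch of a single “neural layer” as a plain function (an affine map followed by an elementwise nonlinearity; the names and sizes are arbitrary):

```python
import math

# One "neural layer": multiply the input by a weight matrix, add a bias,
# and apply an elementwise nonlinearity (here, tanh).
def neural_layer(x, W, b):
    return [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

# two inputs mapped to three hidden units
print(neural_layer([1.0, -2.0],
                   W=[[0.5, 0.1], [0.0, -0.3], [1.0, 1.0]],
                   b=[0.0, 0.1, -0.5]))
```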

