Natural Language Inference, Reading Comprehension and Deep Learning
Christopher Manning (@chrmanning • @stanfordnlp)
Stanford University, SIGIR 2016


Page 1

Natural Language Inference, Reading Comprehension and Deep Learning

Christopher Manning @chrmanning • @stanfordnlp

Stanford University, SIGIR 2016

Page 2

Machine Comprehension: Tested by question answering (Burges)

"A machine comprehends a passage of text if, for any question regarding that text that can be answered correctly by a majority of native speakers, that machine can provide a string which those speakers would agree both answers that question, and does not contain information irrelevant to that question."

Page 3

IR needs language understanding

• There were some things that kept IR and NLP apart
  • IR was heavily focused on efficiency and scale
  • NLP was way too focused on form rather than meaning

• Now there are compelling reasons for them to come together
  • Taking IR precision and recall to the next level
    • [car parts for sale]
    • Should match: Selling automobile and pickup engines, transmissions
    • Example from Jeff Dean's WSDM 2016 talk
  • Information retrieval / question answering in mobile contexts
    • Web snippets no longer cut it on a watch!

Page 4

Menu

1. Natural logic: A weak logic over human languages for inference

2. Distributed word representations

3. Deep, recursive neural network language understanding

Page 5

How can information retrieval be viewed more as theorem proving (than matching)?

Page 6

AI2 4th Grade Science Question Answering [Angeli, Nayak, & Manning, ACL 2016]

Our "knowledge":

Ovaries are the female part of the flower, which produces eggs that are needed for making seeds.

The question:

Which part of a plant produces the seeds?

The answer choices:

• the flower
• the leaves
• the stem
• the roots

Page 7

How can we represent and reason with broad-coverage knowledge?

1. Rigid-schema knowledge bases with well-defined logical inference

2. Open-domain knowledge bases (OpenIE) – no clear ontology or inference [Etzioni et al. 2007ff]

3. Human language text KB – no rigid schema, but with "natural logic" we can do formal inference over human language text [MacCartney and Manning 2008]

Page 8

Natural Language Inference [Dagan 2005, MacCartney & Manning 2009]

Does a piece of text follow from or contradict another?

Two senators received contributions engineered by lobbyist Jack Abramoff in return for political favors.
Jack Abramoff attempted to bribe two legislators.
→ Follows

Here, try to prove or refute according to a large text collection:
1. The flower of a plant produces the seeds
2. The leaves of a plant produces the seeds
3. The stem of a plant produces the seeds
4. The roots of a plant produces the seeds

Page 9

Text as Knowledge Base

Storing knowledge as text is easy!

Doing inferences over text might be hard

Don't want to run inference over every fact!

Don't want to store all the inferences!

Page 10

Inferences … on demand from a query … [Angeli and Manning 2014]

Page 11

… using text as the meaning representation

Page 12

Natural Logic: logical inference over text

We are doing logical inference:
The cat ate a mouse ⊨ ¬ No carnivores eat animals

We do it with natural logic:
If I mutate a sentence in this way, do I preserve its truth?
Post-Deal Iran Asks if U.S. Is Still 'Great Satan,' or Something Less ⊨ A Country Asks if U.S. Is Still 'Great Satan,' or Something Less

• A sound and complete weak logic [Icard and Moss 2014]
• Expressive for common human inferences*
• "Semantic" parsing is just syntactic parsing
• Tractable: polynomial-time entailment checking
• Plays nicely with lexical matching back-off methods

Page 13

#1. Common sense reasoning

Polarity in Natural Logic

We order phrases in partial orders
Simplest one: is-a-kind-of
Also: geographical containment, etc.

Polarity: in a certain context, is it valid to move up or down in this order?

Page 14

Example inferences

Quantifiers determine the polarity of phrases

Valid mutations consider polarity

Successful toy inference:
• All cats eat mice ⊨ All house cats consume rodents

Page 15

"Soft" Natural Logic

• We also want to make likely (but not certain) inferences
• Same motivation as Markov logic, probabilistic soft logic, etc.
• Each mutation edge template feature has a cost θ_i ≥ 0
• Cost of an edge is θ_i · f_i
• Cost of a path is θ · f
• Can learn parameters θ
• Inference is then graph search
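A minimal sketch of that last point: with non-negative edge costs θ · f, inference over mutations is just uniform-cost (Dijkstra-style) search from the query toward any sentence already in the knowledge base. The mutation generator and cost function here are placeholders, not the real NaturalLI edge templates.

```python
import heapq

def natural_logic_search(query, knowledge_base, mutations, edge_cost, max_cost=10.0):
    """Uniform-cost search over sentence mutations.

    query          -- the hypothesis sentence (string)
    knowledge_base -- set of sentences treated as known facts
    mutations      -- function: sentence -> iterable of (next_sentence, feature_vector)
    edge_cost      -- function: feature_vector -> non-negative float (theta . f)
    Returns (total_cost, matched_fact), or None if nothing is reachable under max_cost.
    """
    frontier = [(0.0, query)]
    best = {query: 0.0}
    while frontier:
        cost, sent = heapq.heappop(frontier)
        if sent in knowledge_base:
            return cost, sent                       # found a supporting premise
        if cost > best.get(sent, float("inf")) or cost > max_cost:
            continue                                # stale or too expensive
        for nxt, features in mutations(sent):
            new_cost = cost + edge_cost(features)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt))
    return None
```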

Page 16

#2. Dealing with real sentences

Natural logic works with facts like these in the knowledge base:
Obama was born in Hawaii

But real-world sentences are complex and long:
Born in Honolulu, Hawaii, Obama is a graduate of Columbia University and Harvard Law School, where he served as president of the Harvard Law Review.

Approach:
1. A classifier divides long sentences into entailed clauses
2. Natural logic inference can shorten these clauses

Page 17

Universal Dependencies (UD)
http://universaldependencies.github.io/docs/

A single level of typed dependency syntax that
(i) works for all human languages
(ii) gives a simple, human-friendly representation of sentences

Dependency syntax is better than a phrase-structure tree for machine interpretation – it's almost a semantic network

UD aims to be linguistically better across languages than earlier representations, such as CoNLL dependencies

Page 18

Generation of minimal clauses

1. Classification problem: given a dependency edge, does it introduce a clause?

2. Is it missing a controlled subject from the subject/object?

3. Shorten clauses while preserving validity, using natural logic!
   • All young rabbits drink milk ⊭ All rabbits drink milk

• OK: SJC, the Bay Area's third largest airport, often experiences delays due to weather.
• Often better: SJC often experiences delays.

Page 19

#3. Add a lexical alignment classifier

• Sometimes we can't quite make the inferences that we would like to make:

• We use a simple lexical match back-off classifier with features:
  • Matching words, mismatched words, unmatched words
• These always work pretty well
• This was the lesson of RTE evaluations, and perhaps of IR in general

Page 20

The full system

• We run our usual search over split-up, shortened clauses
• If we find a premise, great!
• If not, we use the lexical classifier as an evaluation function

• We work to do this quickly at scale
  • Visit 1M nodes/second; don't re-featurize, just delta
  • 32-byte search states (thanks Gabor!)

Page 21

Solving NY State 4th grade science (Allen AI Institute datasets)

Multiple choice questions from real 4th grade science exams:
Which activity is an example of a good health habit?
(A) Watching television (B) Smoking cigarettes (C) Eating candy (D) Exercising every day

In our corpus knowledge base:
• Plasma TV's can display up to 16 million colors ... great for watching TV ... also make a good screen.
• Not smoking or drinking alcohol is good for health, regardless of whether clothing is worn or not.
• Eating candy for diner is an example of a poor health habit.
• Healthy is exercising

Page 22

Solving 4th grade science (Allen AI NDMC)

System                                                  Dev    Test
KnowBot [Hixon et al. NAACL 2015]                        45      –
KnowBot (augmented with human in loop)                   57      –
IR baseline (Lucene)                                     49     42
NaturalLI                                                52     51
More data + IR baseline                                  62     58
More data + NaturalLI                                    65     61
NaturalLI + lexical classifier                           74     67
Aristo [Clark et al. 2016], 6 systems, even more data     –     71

Test set: New York Regents 4th Grade Science exam multiple-choice questions from AI2.
Training: Basic is Barron's study guide; "more data" is the SciText corpus from AI2.
Score: % correct

Page 23

Natural Logic

• Can we just use text as a knowledge base?
• Natural logic provides a useful, formal (weak) logic for textual inference
• Natural logic is easily combinable with lexical matching methods, including neural net methods
• The resulting system is useful for:
  • Common-sense reasoning
  • Question answering
  • Open information extraction
    • i.e., getting out relation triples from text

Page 24

Can information retrieval benefit from distributed representations of words?

Page 25

From symbolic to distributed representations

The vast majority of rule-based or statistical NLP and IR work regarded words as atomic symbols: hotel, conference, walk

In vector space terms, this is a vector with one 1 and a lot of zeroes

[0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]

We now call this a "one-hot" representation.

Sec. 9.2.2

Page 26

From symbolic to distributed representations

Its problem:
• If a user searches for [Dell notebook battery size], we would like to match documents with "Dell laptop battery capacity"
• If a user searches for [Seattle motel], we would like to match documents containing "Seattle hotel"

But
motel [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]ᵀ
hotel [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0] = 0

Our query and document vectors are orthogonal
There is no natural notion of similarity in a set of one-hot vectors
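A quick numeric check of that orthogonality claim (the vector positions here are arbitrary):

```python
import numpy as np

# One-hot vectors for "motel" and "hotel" over a 15-word vocabulary (positions arbitrary).
motel = np.zeros(15); motel[10] = 1.0
hotel = np.zeros(15); hotel[7] = 1.0

print(motel @ hotel)   # 0.0 -> orthogonal: no similarity signal at all
print(motel @ motel)   # 1.0 -> a word is only "similar" to itself
```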

Sec. 9.2.2

Page 27

Capturing similarity

There are many things you can do about similarity, many well known in IR

Query expansion with synonym dictionaries

Learning word similarities from large corpora

But a word representation that encodes similarity wins:
Fewer parameters to learn (per word, not per pair)

More sharing of statistics

More opportunities for multi-task learning

Page 28

Distributional similarity-based representations

You can get a lot of value by representing a word by means of its neighbors

"You shall know a word by the company it keeps" (J. R. Firth 1957: 11)

One of the most successful ideas of modern NLP:

  government debt problems turning into banking crises as has happened in
  saying that Europe needs unified banking regulation to replace the hodgepodge

These surrounding words will represent banking

Page 29

Basic idea of learning neural network word embeddings

We define some model that aims to predict a word based on other words in its context

Choose argmax_w  w · ((w_{j−1} + w_{j+1}) / 2)

which has a loss function, e.g.,

J = 1 − w_j · ((w_{j−1} + w_{j+1}) / 2)

We look at many samples from a big language corpus

We keep adjusting the vector representations of words to minimize this loss

(Unit-norm vectors)
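A toy numpy rendering of this objective, assuming unit-norm vectors and the simple loss above (real systems use softmax or negative-sampling objectives instead; all numbers here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab = 8, 5
W = rng.normal(size=(vocab, dim))
W /= np.linalg.norm(W, axis=1, keepdims=True)       # keep word vectors unit norm

def loss_and_grad(j, context, W):
    """J = 1 - w_j . mean(context vectors); gradient with respect to w_j."""
    ctx = W[context].mean(axis=0)
    return 1.0 - W[j] @ ctx, -ctx

# One stochastic update for center word 2 with context words 1 and 3.
J_before, grad = loss_and_grad(2, [1, 3], W)
W[2] -= 0.1 * grad                                   # gradient step on the center vector
W[2] /= np.linalg.norm(W[2])                         # re-project to unit norm
J_after, _ = loss_and_grad(2, [1, 3], W)
print(round(J_before, 3), round(J_after, 3))         # loss typically decreases
```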

Page 30

With distributed, distributional representations, syntactic and semantic similarity is captured

currency = [0.286, 0.792, −0.177, −0.107, 0.109, −0.542, 0.349, 0.271]

Page 31

Distributional representations can solve the fragility of NLP tools

Standard NLP systems – here, the Stanford Parser – are incredibly fragile because of symbolic representations

Crazy sentential complement, such as for "likes [(being) crazy]"

Page 32

Distributional representations can capture the long tail of IR similarity

Google's RankBrain

Not necessarily as good for the head of the query distribution, but great for seeing similarity in the tail

3rd most important ranking signal (we're told ...)

Page 33

LSA (Latent Semantic Analysis) vs. word2vec

LSA: Count! models
• Factorize a (maybe weighted, often log-scaled) term-document (Deerwester et al. 1990) or word-context matrix (Schütze 1992) into UΣVᵀ
• Retain only k singular values, in order to generalize

[Cf. Baroni: Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. ACL 2014]

Sec. 18.3
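A minimal sketch of the count-and-factorize recipe with numpy (toy co-occurrence matrix, log-scaled counts, rank-k truncation; all numbers are invented):

```python
import numpy as np

# Toy word-context co-occurrence counts (rows: words, columns: contexts).
counts = np.array([[10, 2, 0, 1],
                   [ 8, 1, 0, 2],
                   [ 0, 0, 9, 7],
                   [ 1, 0, 8, 6]], dtype=float)

X = np.log1p(counts)                        # log-scale the counts
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                       # retain only k singular values
word_vecs = U[:, :k] * S[:k]                # k-dimensional word representations

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(round(cos(word_vecs[0], word_vecs[1]), 2))   # high: rows 0 and 1 co-occur alike
print(round(cos(word_vecs[0], word_vecs[2]), 2))   # low: very different context profile
```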

Page 34

LSA vs. word2vec

word2vec CBOW / Skip-gram: Predict!
[Mikolov et al. 2013]: Simple predict models for learning word vectors
• Train word vectors to try to either:
  • Predict a word given its bag-of-words context (CBOW); or
  • Predict a context word (position-independent) from the center word (Skip-gram)
• Update word vectors until they can do this prediction well

Sec. 18.3
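For reference, training such vectors is a few lines with gensim (parameter names below follow recent gensim 4.x releases; the corpus and settings are placeholders, not those used in the talk):

```python
from gensim.models import Word2Vec

# Tiny placeholder corpus; in practice this is billions of tokens.
sentences = [["government", "debt", "problems", "turning", "into", "banking", "crises"],
             ["europe", "needs", "unified", "banking", "regulation"]]

# sg=1 selects Skip-gram; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

print(model.wv.most_similar("banking", topn=3))   # nearest words by cosine similarity
```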

Page 35

word2vec encodes semantic components as linear relations

Page 36

COALS model (count-modified LSA) [Rohde, Gonnerman & Plaut, ms., 2005]

Page 37

Count based vs. direct prediction

Count based: LSA, HAL (Lund & Burgess), COALS (Rohde et al.), Hellinger-PCA (Lebret & Collobert)
• Fast training
• Efficient usage of statistics
• Primarily used to capture word similarity
• May not use the best methods for scaling counts

Direct prediction: NNLM, HLBL, RNN, word2vec Skip-gram/CBOW (Bengio et al.; Collobert & Weston; Huang et al.; Mnih & Hinton; Mikolov et al.; Mnih & Kavukcuoglu)
• Scales with corpus size
• Inefficient usage of statistics
• Can capture complex patterns beyond word similarity
• Generate improved performance on other tasks

Page 38

Encoding meaning in vector differences [Pennington, Socher, and Manning, EMNLP 2014]

Ratios of co-occurrence probabilities can encode meaning components

Crucial insight:

                            x = solid   x = gas   x = water   x = random
P(x | ice)                  large       small     large       small
P(x | steam)                small       large     large       small
P(x | ice) / P(x | steam)   large       small     ~1          ~1

Page 39

Encoding meaning in vector differences [Pennington, Socher, and Manning, EMNLP 2014]

Ratios of co-occurrence probabilities can encode meaning components

Crucial insight:

                            x = solid    x = gas      x = water    x = fashion
P(x | ice)                  1.9 × 10⁻⁴   6.6 × 10⁻⁵   3.0 × 10⁻³   1.7 × 10⁻⁵
P(x | steam)                2.2 × 10⁻⁵   7.8 × 10⁻⁴   2.2 × 10⁻³   1.8 × 10⁻⁵
P(x | ice) / P(x | steam)   8.9          8.5 × 10⁻²   1.36         0.96

Page 40

Encoding meaning in vector differences

Q: How can we capture ratios of co-occurrence probabilities as meaning components in a word vector space?

A: Log-bilinear model, with vector differences
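The equations behind that answer rendered as images on the slide; they are reconstructed here from the GloVe paper (Pennington et al., EMNLP 2014):

```latex
% Log-bilinear model: dot products approximate log co-occurrence probabilities
w_i \cdot w_j = \log P(i \mid j)

% ... so vector differences capture ratios of co-occurrence probabilities
w_x \cdot (w_a - w_b) = \log \frac{P(x \mid a)}{P(x \mid b)}

% GloVe's weighted least-squares objective over the co-occurrence matrix X
J = \sum_{i,j=1}^{V} f(X_{ij})\,\bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^2
```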

Page 41

GloVe word similarities [Pennington et al., EMNLP 2014]

Nearest words to frog:
1. frogs
2. toad
3. litoria
4. leptodactylidae
5. rana
6. lizard
7. eleutherodactylus

[Images: litoria, leptodactylidae, rana, eleutherodactylus]
http://nlp.stanford.edu/projects/glove/
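A sketch of how such neighbor lists can be computed from the pre-trained vectors distributed on that page (it assumes a downloaded file such as glove.6B.50d.txt; the path is a placeholder):

```python
import numpy as np

def load_glove(path):
    """Parse GloVe's plain-text format: one word per line, followed by its vector."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *nums = line.rstrip().split(" ")
            vecs[word] = np.array(nums, dtype=float)
    return vecs

def nearest(vecs, query, topn=7):
    """Rank all other words by cosine similarity to the query word."""
    q = vecs[query]
    q = q / np.linalg.norm(q)
    scored = ((w, (v / np.linalg.norm(v)) @ q) for w, v in vecs.items() if w != query)
    return sorted(scored, key=lambda x: -x[1])[:topn]

vecs = load_glove("glove.6B.50d.txt")   # placeholder path to pre-trained vectors
print(nearest(vecs, "frog"))
```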

Page 42

GloVe Visualizations

http://nlp.stanford.edu/projects/glove/

Page 43

GloVe Visualizations: Company - CEO

Page 44

Named Entity Recognition Performance

Model (trained on CoNLL)   CoNLL '03 dev   CoNLL '03 test   ACE2   MUC7
Categorical CRF                 91.0            85.4         77.4   73.4
SVD (log tf)                    90.5            84.8         73.6   71.5
HPCA                            92.6            88.7         81.7   80.7
C&W                             92.2            87.4         81.7   80.2
CBOW                            93.1            88.2         82.2   81.1
GloVe                           93.2            88.3         82.9   82.2

F1 score of a CRF trained on CoNLL 2003 English with 50-dimensional word vectors

Page 45

Word embeddings: Conclusion

GloVe translates meaningful relationships between word-word co-occurrence counts into linear relations in the word vector space

GloVe shows the connection between Count! work and Predict! work – appropriate scaling of counts gives the properties and performance of Predict! models

A lot of other important work in this line of research:
[Levy & Goldberg 2014] [Arora, Li, Liang, Ma & Risteski 2015] [Hashimoto, Alvarez-Melis & Jaakkola 2016]

Page 46

Can we use neural networks to understand, not just word similarities, but language meaning in general?

Page 47

Compositionality

Artificial intelligence requires being able to understand bigger things from knowing about smaller parts

Page 48

We need more than word embeddings!

How can we know when larger linguistic units are similar in meaning?

The snowboarder is leaping over the mogul
A person on a snowboard jumps into the air

People interpret the meaning of larger text units – entities, descriptive terms, facts, arguments, stories – by semantic composition of smaller elements

Page 49

Beyond the bag of words: Sentiment detection

Is the tone of a piece of text positive, negative, or neutral?

• The sentiment is that sentiment is "easy"
• Detection accuracy for longer documents is ~90%, BUT

... loved ... great ... impressed ... marvelous ...

Page 50

Stanford Sentiment Treebank

• 215,154 phrases labeled in 11,855 sentences
• Can train and test compositions

http://nlp.stanford.edu:8080/sentiment/

Page 51

Tree-Structured Long Short-Term Memory Networks [Tai et al., ACL 2015]

Page 52

Tree-structured LSTM

Generalizes the sequential LSTM to trees with any branching factor
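For reference, the Child-Sum Tree-LSTM equations from Tai et al. (ACL 2015), reproduced here from the paper since the slide showed them as a figure (C(j) denotes the children of node j):

```latex
\tilde{h}_j = \sum_{k \in C(j)} h_k
i_j    = \sigma\bigl(W^{(i)} x_j + U^{(i)} \tilde{h}_j + b^{(i)}\bigr)
f_{jk} = \sigma\bigl(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\bigr)
o_j    = \sigma\bigl(W^{(o)} x_j + U^{(o)} \tilde{h}_j + b^{(o)}\bigr)
u_j    = \tanh\bigl(W^{(u)} x_j + U^{(u)} \tilde{h}_j + b^{(u)}\bigr)
c_j    = i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k
h_j    = o_j \odot \tanh(c_j)
```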

Page 53

Positive/Negative Results on Treebank

[Bar chart: accuracy (75–95% scale) for BiNB, RNN, MV-RNN, RNTN, and TreeLSTM, comparing training with sentence labels only vs. training with the full treebank]

Page 54

Experimental Results on Treebank

• TreeRNNs can capture constructions like X but Y
• Biword Naïve Bayes is only 58% on these

Page 55

Stanford Natural Language Inference Corpus
http://nlp.stanford.edu/projects/snli/
570K Turker-judged pairs, based on an assumed picture

A man rides a bike on a snow covered road. / A man is outside. → ENTAILMENT

2 female babies eating chips. / Two female babies are enjoying chips. → NEUTRAL

A man in an apron shopping at a market. / A man in an apron is preparing dinner. → CONTRADICTION

Page 56

NLI with Tree-RNNs [Bowman, Angeli, Potts & Manning, EMNLP 2015]

Approach: We would like to work out the meaning of each sentence separately – a pure compositional model

Then we compare them with a NN & classify for inference

[Architecture diagram: learned word vectors → composition NN layers build up phrase vectors for "man outside" and "man in snow" → comparison NN layer(s) → softmax classifier → P(Entail) = 0.8]
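A schematic of the comparison-and-classify head on top of the two composed sentence vectors (a hedged sketch, not the exact architecture of Bowman et al. 2015; the layer sizes and the feature combination are illustrative choices):

```python
import torch
import torch.nn as nn

class NLIClassifier(nn.Module):
    """Compare two composed sentence vectors and classify entail/neutral/contradict."""
    def __init__(self, dim=300, hidden=200, classes=3):
        super().__init__()
        self.compare = nn.Sequential(nn.Linear(4 * dim, hidden), nn.Tanh())
        self.out = nn.Linear(hidden, classes)

    def forward(self, premise_vec, hypothesis_vec):
        # A common comparison feature set: both vectors, their difference, their product.
        feats = torch.cat([premise_vec, hypothesis_vec,
                           premise_vec - hypothesis_vec,
                           premise_vec * hypothesis_vec], dim=-1)
        return torch.softmax(self.out(self.compare(feats)), dim=-1)

# Placeholder sentence vectors standing in for the composed tree representations.
p, h = torch.randn(1, 300), torch.randn(1, 300)
print(NLIClassifier()(p, h))   # probabilities over {entail, neutral, contradict}
```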

Page 57

Tree recursive NNs (TreeRNNs)

Theoretically appealing

Very empirically competitive

But

Prohibitively slow

Usually require an external parser

Don't exploit the complementary linear structure of language

Page 58

A recurrent NN allows efficient batched computation on GPUs

Page 59

TreeRNN: Input-specific structure undermines batched computation

Page 60

The Shift-reduce Parser-Interpreter NN (SPINN) [Bowman, Gauthier et al. 2016]

Base model equivalent to a TreeRNN, but ... supports batched computation: 25× speedups

Plus: an effective new hybrid that combines linear and tree-structured context

Can stand alone without a parser

Page 61

Beginning observation: binary trees = transition sequences

SHIFT SHIFT REDUCE SHIFT SHIFT REDUCE REDUCE

SHIFT SHIFT SHIFT SHIFT REDUCE REDUCE REDUCE

SHIFT SHIFT SHIFT REDUCE SHIFT REDUCE REDUCE
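To see the equivalence, here is a tiny sketch that replays a transition sequence with a stack and a token buffer, recovering the binary tree it encodes (the token strings are made up):

```python
def build_tree(tokens, transitions):
    """Replay SHIFT/REDUCE transitions: SHIFT pushes the next token,
    REDUCE pops two subtrees and pushes their combination."""
    buffer = list(tokens)
    stack = []
    for op in transitions:
        if op == "SHIFT":
            stack.append(buffer.pop(0))
        else:                                  # REDUCE
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))
    assert len(stack) == 1 and not buffer      # a valid sequence yields one tree
    return stack[0]

seq = ["SHIFT", "SHIFT", "REDUCE", "SHIFT", "SHIFT", "REDUCE", "REDUCE"]
print(build_tree(["the", "cat", "sat", "down"], seq))
# (('the', 'cat'), ('sat', 'down'))
```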

Page 62

The Shift-reduce Parser-Interpreter NN (SPINN)

Page 63

The Shift-reduce Parser-Interpreter NN (SPINN)

The model includes a sequence LSTM RNN
• This acts as a simple parser by predicting SHIFT or REDUCE
• It also gives left sequence context as input to composition

Page 64

Implementing the stack

• Naïve implementation: simulate the stacks in a batch with a fixed-size multidimensional array at each time step
  • Backpropagation requires that each intermediate stack be maintained in memory
  • ⇒ Large amount of data copying and movement required

• Efficient implementation
  • Have only one stack array for each example
  • At each time step, augment it with the current head of the stack
  • Keep a list of backpointers for REDUCE operations
  • Similar to zipper data structures employed elsewhere

Page 65

A thinner stack

Step   Array                 Backpointers
1      Spot                  1
2      sat                   1 2
3      down                  1 2 3
4      (sat down)            1 4
5      (Spot (sat down))     5
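A toy sketch of that thin-stack idea (illustrative only; the real SPINN implementation batches this on GPU and composes vectors rather than tuples): each step appends exactly one entry to a single append-only array, and the logical stack is recovered through backpointers.

```python
def run_thin_stack(tokens, transitions, compose=lambda l, r: (l, r)):
    """Append-only 'thin stack': the array grows by one entry per step;
    backpointers record which array cells form the current logical stack."""
    buffer = list(tokens)
    array = []                    # one entry appended per transition
    stack_ptrs = []               # indices into `array` for the current stack
    history = []                  # (newest entry, stack pointers) per step
    for op in transitions:
        if op == "SHIFT":
            array.append(buffer.pop(0))
        else:                     # REDUCE: combine the top two stack entries
            right = array[stack_ptrs.pop()]
            left = array[stack_ptrs.pop()]
            array.append(compose(left, right))
        stack_ptrs.append(len(array) - 1)
        history.append((array[-1], list(stack_ptrs)))
    return history

seq = ["SHIFT", "SHIFT", "SHIFT", "REDUCE", "REDUCE"]
for step, (entry, ptrs) in enumerate(run_thin_stack(["Spot", "sat", "down"], seq), 1):
    print(step, entry, ptrs)
# Mirrors the table above (indices here are 0-based); the last step holds
# ('Spot', ('sat', 'down')) with the stack pointing only at that final cell.
```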

Page 66

Page 67

Using SPINN for natural language inference

[Diagram: SPINN composes "(the cat) (sat down)" and "(the cat) (is angry)" from their shift-reduce derivations into sentence vectors o₁ and o₂, which are then compared for the inference decision]

Page 68

SNLI Results

Model                                                        % Accuracy (test set)
Feature-based classifier                                           78.2
Previous SOTA sentence encoder [Mou et al. 2016]                   82.1
LSTM RNN sequence model                                            80.6
Tree LSTM                                                          80.9
SPINN                                                              83.2
SOTA (sentence pair alignment model) [Parikh et al. 2016]          86.8

Page 69

Successes for SPINN over LSTM

Examples with negation
• P: The rhythmic gymnast completes her floor exercise at the competition.
• H: The gymnast cannot finish her exercise.

Long examples (>20 words)
• P: A man wearing glasses and a ragged costume is playing a Jaguar electric guitar and singing with the accompaniment of a drummer.
• H: A man with glasses and a disheveled outfit is playing a guitar and singing along with a drummer.

Page 70

Envoi

• There are very good reasons for wanting to represent meaning with distributed representations
  • So far, distributional learning has been most effective for this
  • But cf. [Young, Lai, Hodosh & Hockenmaier 2014] on denotational representations, using visual scenes

• However, we want not just word meanings, but also:
  • Meanings of larger units, calculated compositionally
  • The ability to do natural language inference

• The SPINN model is fast – close to recurrent networks!
  • Its hybrid sequence/tree structure is psychologically plausible and outperforms other sentence composition methods

Page 71

Final Thoughts

[Timeline figure: deep learning arriving in speech (2011), vision (2013), NLP (2015), and next IR – "You are here", between NLP and IR]

Page 72

Final Thoughts

I'm certain that deep learning will come to dominate SIGIR over the next couple of years ... just like speech, vision, and NLP before it. This is a good thing. Deep learning provides some powerful new techniques that are just being amazingly successful on many hard applied problems. However, we should realize that there is also currently a huge amount of hype about deep learning and artificial intelligence. We should not let a genuine enthusiasm for important and successful new techniques lead to irrational exuberance or a diminished appreciation of other approaches. Finally, despite the efforts of a number of people, in practice there has been a considerable division between the human language technology fields of IR, NLP, and speech. Partly this is due to organizational factors, and partly it is that at one time the subfields each had a very different focus. However, recent changes in emphasis – with IR people wanting to understand the user better and NLP people much more interested in meaning and context – mean that there are a lot of common interests, and I would encourage much more collaboration between NLP and IR in the next decade.