From Simple to Complex QA
Eduard Hovy, CMU Language Technologies Institute
www.cs.cmu.edu/~hovy

Webclopedia QA, 2003
• Where are zebras most likely found? — in the dictionary
• Where do lobsters like to live? — on the table
• How many people live in Chile? — nine
Webclopedia (Hovy et al. 2001)
• What is an invertebrate? — Dukakis
Basic simple factoid QA
• Identify keywords from Q
• Build (Boolean) query for IR
• Retrieve texts using IR
• Rank texts/passages
• Find specified Q type
• Move A patterns over text and score each position
• Rank windows; return top N

[Pipeline diagram: Input Q → 1M documents → 3000 sentences → 50 candidates → 5 answers → A list; Corpus: 30%, +Web: add 10%]

Example answer patterns:
…X was born in <YEAR>… …X was born on <DATE>… …X (<YEAR>–<YEAR>)…
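The extraction step above ("move A patterns over text and score each position; rank windows; return top N") can be sketched in a few lines. This is a hedged illustration, not Webclopedia's actual code: the patterns, weights, and the `extract_answers` helper are invented for the example.

```python
import re

# Illustrative surface patterns for a "When was X born?" question type.
# "{x}" is filled with the question's focus entity; weights are made up.
PATTERNS = [
    (r"{x} was born in (\d{{4}})", 1.0),
    (r"{x} was born on (\d{{1,2}} \w+ \d{{4}})", 1.0),
    (r"{x} \((\d{{4}})[–-]\d{{4}}\)", 0.8),   # e.g. "X (1881–1973)"
]

def extract_answers(x, sentences, top_n=5):
    """Move each pattern over every candidate sentence, score matches,
    rank them, and return the top-N answer strings."""
    candidates = []
    for sent in sentences:
        for tmpl, weight in PATTERNS:
            pat = re.compile(tmpl.format(x=re.escape(x)))
            for m in pat.finditer(sent):
                candidates.append((weight, m.group(1)))
    candidates.sort(reverse=True)
    return [ans for _, ans in candidates[:top_n]]

print(extract_answers("Picasso", ["Picasso was born in 1881.",
                                  "Pablo Picasso (1881–1973) was a painter."]))
```

The ranking here is a bare weight sort; a real system would also fold in the IR passage score from the retrieval step.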
Where is the Answer? — Progress since 2003?
Typical QA format:
Question: Q
Context: “wwwww…w”
A:
Either the Q context provides the A
1. nearby (= n-word window) context
2. distant (= doc-level) context
Or not at all… so you have to use background info
3. from the training data
4. from logical derivation / reasoning rules / procedure
Either…
When all info needed to get the A is present in the Q context,
…then some form of surface and simple type matching + sub-A composition is enough
—> Ultimately, just do [nested] simple QA

Or…
But when getting the A requires information not in the Q context (like background info, calculation, etc.),
…then you are in trouble: this is not standardized, hence impossible to evaluate
—> No complex QA!?
Outline
1. A in the nearby context
2. A in the distant context
3. A hidden in the training data
4. A only by reasoning
Option 1: A in nearby context
• Build and use short patterns or a rich LM
• Tons of work since 2000 on pattern learning and generalization, QA typologies, etc.
• Numerous QA datasets (TREC, SQuAD, CNN…)
• Many QA competitions (SEMEVAL…)
So where’s the limit?
You can do a LOT with patterns. Did you know you are an expert on the Panama Canal?

Blah Panama Canal blah blah Panama blah Pres. Roosevelt blah USA blah blah blah blah blah 10 years blah until 1914 blah blah blah blah 51 miles blah blah blah blah blah blah blah blah blah blah blah blah blah 8 to 10 hours blah blah blah blah Gatun Lake blah

When was the Panama Canal completed? How long is the Panama Canal? How long did it take to build the Panama Canal? How long does it take to cross the Panama Canal? What is the lake in the Panama Canal called? Which US President enabled the Panama Canal? Which oceans does the Panama Canal connect? In your training data, you have surely seen “Panama Canal” with only two ocean names…
So where’s the limit?
A corpus to test the power of ngram/pattern QA models
• CLOTH (Xie, Lai, Dai, Hovy, EMNLP 2018)
 – Large-scale Cloze test dataset
 – Created by English teachers in China for English exams (Middle and High school levels)
 – After cleanup: 7k passages; 99k questions (2/3 removed)
 – Dropped words and word options carefully created by teachers: highly nuanced alternatives
 – Tests knowledge of grammar, vocabulary, reasoning
• How well do state-of-the-art computational models do compared to humans?
 – We test using a 1-billion-word language model
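The evaluation idea can be illustrated with a toy stand-in for the LM: score each cloze option by the probability the model assigns to the completed sentence, and pick the argmax. The tiny smoothed bigram model, its three-sentence "corpus", and the `answer_cloze` helper below are invented; the actual experiments used a 1-billion-word LM.

```python
from collections import Counter

# Toy "training corpus" and bigram/unigram counts standing in for a real LM.
corpus = "he walked to school . she walked to school . he ran home .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(tokens):
    """Add-one-smoothed product of bigram probabilities (toy LM score)."""
    s = 1.0
    for a, b in zip(tokens, tokens[1:]):
        s *= (bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams))
    return s

def answer_cloze(template, options):
    """Pick the option whose filled-in sentence the LM scores highest."""
    return max(options, key=lambda o: score(template.replace("_", o).split()))

print(answer_cloze("he _ to school .", ["walked", "slept", "sang"]))
```

Short-range options like this fall to an n-gram model; the teacher-designed long-distance CLOTH questions are exactly the ones where such a model fails.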
[Table (Xie et al. 2018): percentages of test examples, Middle/High school levels, by question type — tense, voice, prepositions; local content words; copy/paraphrase words; content words with long-distance dependencies]
QA system results
• Even a 1B-LM still lags behind human performance
• Increasing the context length for the 1B-LM does not help
• However: human-created questions are different:
(Xie et al. 2018)
(AR: Attention Reader)
(This was pre-BERT!)
Conclusion for option 1
For factoid QA types that obey patterns, if the A is close enough, and you have enough training data…
…you will always learn good-enough word-combination patterns to connect Q parameters <–> Q context material <–> A
(If you haven’t seen the necessary word combinations, you won’t ever be able to answer the Q)
Option 2: A in distant context
• Still use some form of matching Q and A
• Need a more-sophisticated and longer-distance type of ‘pattern’
Making matching more complex: RACE: A better testbed
• RACE: ReAding Comprehension dataset from Examinations (Lai, Xie, Liu, Yang, Hovy, EMNLP 2017)
• Collected from Chinese middle and high school exams that evaluate human students’ English reading comprehension ability
 – Designed by human experts: ensures quality and broad topic coverage
 – Substantially more difficult than existing QA datasets (but RACE-M easier than RACE-H)
 – About 4/5 of source material filtered out to remove duplicates, incorrect format, etc.
• After cleaning: 27,933 passages; 97,687 questions
(Lai et al. 2017)
Toward ‘reasoning’: types of more-complex matching
• Paraphrasing Qs: test language ability
• Detail Qs: identify and match details of a thing
• Attitude Qs: find opinions/attitudes of the author towards something (‘sentiment’)
• Whole-picture Qs: understand the entire story (multi-sentence)
• Summarization Qs: understand the point (multi-sentence)
(Lai et al. 2017)
Increasing reasoning
Comparison with other QA datasets
• Reasoning questions: 59.2% of RACE; 20.5% of SQuAD
• Processing types:
 – Word matching: exact match
 – Paraphrasing: paraphrase or entailment
 – Single-sent reasoning: incomplete info or conceptual overlap
 – Multi-sent reasoning: synthesizing information from multiple sentences
 – Insufficient/Ambiguous: no A, or A is not unique
(Lai et al. 2017)
Comparing QA algorithms
• Baselines:
 – Sliding Window: TF-IDF based matching algorithm
 – Stanford Attention Reader (AR) and Gated Attention Reader (early-2018 state-of-the-art neural models)
• RACE has more ‘semantics’ (= requires more ‘reasoning’) than other corpora:
 – higher human ceiling
 – harder for neural models
(Lai et al. 2017)
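A minimal sketch of a Sliding Window baseline of the kind listed above: pick the option whose (question + option) words best overlap some window of the passage, with IDF-style weights favoring rare words. The window size, weighting formula, and helper names are assumptions for illustration, not the exact RACE baseline.

```python
import math
from collections import Counter

def sliding_window_score(passage_tokens, target_words, window=10):
    """Best overlap score between target_words and any passage window,
    weighting rare passage words higher (TF-IDF-like)."""
    counts = Counter(passage_tokens)
    n = len(passage_tokens)
    def weight(w):
        return math.log(1 + n / counts[w]) if counts[w] else 0.0
    best = 0.0
    for i in range(max(1, n - window + 1)):
        span = set(passage_tokens[i:i + window])
        best = max(best, sum(weight(w) for w in target_words & span))
    return best

def choose(passage, question, options):
    """Return the multiple-choice option with the best window score."""
    toks = passage.lower().split()
    qwords = set(question.lower().split())
    return max(options,
               key=lambda o: sliding_window_score(toks, qwords | set(o.lower().split())))

p = "the canal opened in 1914 after ten years of construction"
print(choose(p, "when did the canal open ?", ["1914", "1920"]))
```

A baseline like this handles the "word matching" question type and little else, which is why it separates RACE's reasoning questions from its matching questions.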
Matching type performance
• Turkers and Sliding Window are good at simple matching questions
• Surprisingly, Stanford AR does not have better performance on matching questions
(Lai et al. 2017)
Conclusion for option 2
When the A is distant, or requires more-sophisticated matching/‘reasoning’ (not just a simple word-string/language model),
then attention-based neural models can do some of it, but still fail with the harder parts
Option 3: A ‘hidden’ in training data
Sometimes the Q context does not contain the A at all… but you can STILL get the right A! (And even get it without the Q itself!)
• Corrupted ngrams and other SQuAD perturbations (Jia and Liang, EMNLP 2017)
• Necessity of Q context or even of Q itself (Kaushik and Lipton, EMNLP 2018, Best Short Paper award)

Example: Q only
Question: shin kanemaru, the gravel-voiced back-room boss who died on thursday aged 81, goes down in history as japan’s most corrupt post-war politician after ___________
Passage: ... glynis bc-nj-zimmer-profile-2 takes-ny trahane fumio yasuhiro dragneal hadon bjorkman/max ... seventh-largest embarrased jeopardy hilariously masahisa haibara bajram 8-to-24 duke/meredith acceding ... koidu iraqs 2:32:21 //www.ironmanlive.com/ sagawa kyubin dean internatinoal 90-meter kakuei tanaka seven-paragraph 577,610 wendover golf-lpga-jpn partner, un-appointed ue mazzei canada-u.s.
Answer: kakuei tanaka
(Kaushik and Lipton, EMNLP 2018)
Do you actually need the context?
• Research goal:
 – How strong are models that see the Q only?
 – What about models that see the Q context passage only?
 – How do we know models are really “reading” the whole passage?
• Question-only setting:
 – If the QA system needs the passage, randomize its words first
 – If just candidate As needed, place them in random spots, fill intervening text with gibberish
• Passage-only setting:
 – ‘Ignore’ the Qs: assign each Q to some random passage
(Kaushik and Lipton EMNLP 2018)
Experiments
• Datasets/tests:
 – Span selection: SQuAD, TriviaQA
 – Cloze queries: Children’s Book Test (CBT), CNN, CLOTH, Who-did-What, Daily Mail
 – Multi-class classification (implicit): bAbI (20 tasks)
 – Multiple-choice question answering: RACE, MCTest
 – Answer generation: MS MARCO
• Algorithms:
 – Key-Value Memory Networks (Miller et al. 2016: Key-Value Memory Networks for Directly Reading Documents. Proceedings of EMNLP)
 – Gated Attention Readers (Dhingra et al. 2017: Gated-Attention Readers for Text Comprehension. Proceedings of ACL)
 – QANet (Yu et al. 2018: QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. Proceedings of ICLR)
(Kaushik and Lipton EMNLP 2018)
Some results
[Result tables not reproduced:]
• SQuAD, using QANet
• bAbI, using Key-Value MemNets
• Who-did-What, using Gated-Attention Readers
• CBT, using Gated-Attention Readers
(Kaushik and Lipton EMNLP 2018)
Why? What’s going on??
(Same Q-only example as before:)
Question: shin kanemaru, the gravel-voiced back-room boss who died on thursday aged 81, goes down in history as japan’s most corrupt post-war politician after ___________
Passage: ... glynis bc-nj-zimmer-profile-2 takes-ny trahane fumio yasuhiro dragneal hadon bjorkman/max ... seventh-largest embarrased jeopardy hilariously masahisa haibara bajram 8-to-24 duke/meredith acceding ... koidu iraqs 2:32:21 //www.ironmanlive.com/ sagawa kyubin dean internatinoal 90-meter kakuei tanaka seven-paragraph 577,610 wendover golf-lpga-jpn partner, un-appointed ue mazzei canada-u.s.
Answer: kakuei tanaka
Callouts on passage tokens:
– Transportation company
– Kanemaru’s secretary
– Long-term politician
– Name not in Google
Conclusion for option 3
• Don’t trust QA datasets!
• Don’t trust QA system claims!
• First, check:
 – are there any pre-existing (= training data) dependencies among the Q and candidate As?
 – does the full context predict the A without even the Q?
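These checks can be automated as cheap ablations in the spirit of Kaushik and Lipton: re-evaluate with the passage's words shuffled, and with each question paired to a random passage. The `evaluate` harness and the toy question-only "model" below are invented for illustration; `model` is any callable from (passage, question) to an answer.

```python
import random

def evaluate(model, data):
    """Accuracy of model over (passage, question, answer) triples."""
    return sum(model(p, q) == a for p, q, a in data) / len(data)

def shuffled_passage(data, rng=random.Random(0)):
    """Ablation (a): randomize each passage's word order."""
    out = []
    for p, q, a in data:
        words = p.split()
        rng.shuffle(words)
        out.append((" ".join(words), q, a))
    return out

def random_passage(data, rng=random.Random(0)):
    """Ablation (b): pair each question with a random passage."""
    passages = [p for p, _, _ in data]
    return [(rng.choice(passages), q, a) for _, q, a in data]

# A toy model that only looks at the question. Its accuracy is unchanged
# by both ablations, which is exactly the red flag these checks expose.
toy = lambda p, q: "1914" if "when" in q else "unknown"
data = [("the canal opened in 1914", "when did it open", "1914")]
print(evaluate(toy, data), evaluate(toy, shuffled_passage(data)),
      evaluate(toy, random_passage(data)))
```

If accuracy barely drops under either ablation, the system is exploiting dataset artifacts rather than "reading" the passage.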
Option 4: A only through reasoning
For truly complex QA:
1. Identify the individual steps/pieces needed to derive the A
2. Figure out how to compute/find them
 – From the Q context and/or from elsewhere
3. Compose (and check?) them
 – Build an A-finding ‘script’
Possible sources of this knowledge
• External search:
 – Query something like the web and hope to be lucky
• Entailments: “sentence” –> “sentence”
 – Operate at surface form (in RTE formulation)
 – Allow one surface form to be stated when another is given
 – New surface form may provide Answer
 – Need: entailment rules + entailment applier
• Axioms: A ∨ B –> C
 – Operate at deeper level
 – Connect representation subgraphs, even providing new nodes
 – Expanded graph may provide Answer
 – Need: axioms/composition rules + theorem prover
Type 1: A popular task today: QA over structured data
• Data: database, table, etc.
• Task: ask Qs that require (1) finding various bits of data and (2) composing them to make the A
• The missing information is the script governing the sequence of access and composition
• Research: how to [learn to] build this script?
• Evaluation: did the system produce the right A?
• Examples:
 – U.S. geography database of 800 facts (Zelle & Mooney, 1996)
 – Wikitable questions (Pasupat and Liang, 2015; Dasigi 2018)
 – Other domains’ tables (several AI2 projects)
Wikitable dataset

Athlete           | Nation            | Olympics  | Medals
Gillis Grafström  | Sweden (SWE)      | 1920–1932 | 4
Kim Soo-Nyung     | South Korea (KOR) | 1988–2000 | 6
Evgeni Plushenko  | Russia (RUS)      | 2002–2014 | 4
Kim Yu-na         | South Korea (KOR) | 2010–2014 | 2
Patrick Chan      | Canada (CAN)      | 2014      | 2

Question: Which athlete was from South Korea after the year 2010?
Answer: Kim Yu-na
Reasoning:
1) Get rows where Nation column contains South Korea
2) Filter rows where Olympics has a value greater than 2010.
3) Get value from Athlete column from filtered rows.
Program:
((reverse athlete) (and (nation south_korea) (year ((reverse date) (>= 2010-mm-dd)))))
WikiTableQuestions, Pasupat and Liang, 2015
(Dasigi LTI PhD thesis, 2018)
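The three reasoning steps can be executed directly over the table. A sketch with plain Python filters standing in for the λ-DCS-style program (the hand-encoded rows and the `last_year` field are illustrative simplifications, not Pasupat and Liang's execution engine):

```python
# Table rows, with the Olympics range reduced to its final year.
rows = [
    {"athlete": "Gillis Grafström",  "nation": "Sweden",      "last_year": 1932, "medals": 4},
    {"athlete": "Kim Soo-Nyung",     "nation": "South Korea", "last_year": 2000, "medals": 6},
    {"athlete": "Evgeni Plushenko",  "nation": "Russia",      "last_year": 2014, "medals": 4},
    {"athlete": "Kim Yu-na",         "nation": "South Korea", "last_year": 2014, "medals": 2},
    {"athlete": "Patrick Chan",      "nation": "Canada",      "last_year": 2014, "medals": 2},
]

# 1) Get rows where the Nation column contains South Korea
korean = [r for r in rows if r["nation"] == "South Korea"]
# 2) Filter rows whose Olympics value extends past 2010
recent = [r for r in korean if r["last_year"] > 2010]
# 3) Get the value from the Athlete column of the filtered rows
print([r["athlete"] for r in recent])
```

Learning to *produce* this access-and-composition script from the question alone is exactly the research problem the slide names.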
Example: Dasigi
• Approach for learning to build access routines:
 1. Parse Q, build dependency tree
 2. Convert into Logical Form
 3. Translate into candidate table access routine
 4. (Try all kinds of mappings from words to query operators/structure)
 5. Test composition by repeated trial and error
• Essentially, learning is a search in ‘operator combination space’ to build the logical form
• Weak supervision is not enough. Speed up the learning/search by:
 – Learning to associate table access parameters with parts of the tree (Q variables)
 – Learning to associate nesting and access operators with parts of the tree (‘operator’ words: “the most”, “last”, etc.)
 – Predefining some lexicon-to-operation mappings
 – Paying attention to grammatical construction of the tree
 – Implementing heuristics to guide exploration (‘short Qs first’)
(Dasigi LTI PhD thesis, 2018)
Dasigi approach
• Strategies:
 – Incorporate knowledge of grammatical constraints
 – ‘Lucky’ examples: remove right A with wrong query logic
 – Question coverage: how many Q words mapped?
 – Complex queries (denotation): how large is the query?
 – Do iterative search, from simpler to more complex Qs
• Combine into single Objective: minimize expected value of cost (Goodman, 1996; Goel and Byrne, 2000; Smith and Eisner, 2005), with a linear combination of coverage and denotation costs
 (x: NL term; y: script term; d: denotation)
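A plausible way to write this objective, reconstructed from the slide's legend (x: NL term, y: script term, d: denotation) and the stated linear combination; the thesis's exact formulation may differ:

```latex
% Assumption: p(y | x; \theta) is the model's distribution over scripts y
% for question x, and the cost is a \lambda-weighted mix of a coverage
% cost (how many Q words y maps) and a denotation cost (whether/what
% executing y yields against d).
\min_\theta \; \mathbb{E}_{y \sim p(y \mid x;\, \theta)}
  \big[\, \lambda \, C_{\mathrm{cov}}(x, y)
        + (1 - \lambda) \, C_{\mathrm{den}}(y, d) \,\big]
```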
Empirical comparison on WikiTableQuestions
● Requires approximate set of logical forms during training
● Used output from Dynamic Programming on Denotations (Pasupat and Liang, 2016)
● Various models: strings, trees, etc.
● Efficient search followed by pruning using human annotations
(Krishnamurthy, Dasigi and Gardner, 2017)
Dasigi results using iterative search
● Similar trend in 2 domains
● Used functional query language (Liang et al., 2018)
(Dasigi, Gardner, Murty, Zettlemoyer, Hovy 2018)
[Result tables: NLVR | WikiTableQuestions]
Conclusion for option 4.1
Interesting idea to ‘operationalize’ the Q and test its ‘truth’ by running the script for the A
But works only with structured A sources where such operationalization is possible
Can we ‘operationalize’ other, typical kinds of Qs?

Type 2: A new QA task: Multi-domain knowledge
Q: What is the largest capital city south of Santiago de Chile?
 – Geographic knowledge (lat-long, population)
 – Numerical ability (sorting, etc.)
Q: Which of the leaders of the XYZ enterprise are well-liked, and why?
 – Discovery of social role by actions
 – Sentiment judgments attached to actions
Multi-domain knowledge
• Define N self-contained standardized ‘domain specialists’ (KBs + reasoners) that any QA engine can run
• At run-time, analyze the Q, build the A script, activate the specialists as needed, compute the A
[Diagram: specialists — Arithmetic, Geography, Psych: goals, Social customs, Physics]
Research needed
• For each domain specialist:
 – Define its ‘knowledge service’
 – Create the underlying knowledge
 – Define the I/O APIs for the QA engine to use
 – Build the specialist
• For each QA engine:
 – Analyze the Q —> determine parameters and need
 – Decompose need, build a script of specialist queries plus their result composition
 – Execute
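The specialist/engine split can be sketched as a small API. Everything here (the `Specialist` interface, the toy Geography and Arithmetic specialists, their data, and the "southernmost capital south of Santiago" stand-in question) is invented to illustrate the architecture, not an existing system.

```python
class Specialist:
    """The I/O API every domain specialist exposes to the QA engine."""
    def query(self, request: dict):
        raise NotImplementedError

class Geography(Specialist):
    # Tiny invented KB: capital city -> latitude.
    CAPITALS = {"Santiago": -33.4, "Wellington": -41.3, "Canberra": -35.3}
    def query(self, request):
        if request["op"] == "capitals_south_of":
            lat = self.CAPITALS[request["city"]]
            return {c: l for c, l in self.CAPITALS.items() if l < lat}

class Arithmetic(Specialist):
    def query(self, request):
        if request["op"] == "argmin":
            return min(request["values"], key=request["values"].get)

# QA engine side: a hand-written answer script that activates the two
# specialists in sequence and composes their results.
geo, arith = Geography(), Arithmetic()
south = geo.query({"op": "capitals_south_of", "city": "Santiago"})
print(arith.query({"op": "argmin", "values": south}))
```

The design point is that the engine knows only the request/response dicts, so specialists can be developed, shared, and swapped independently.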
Some specialist areas we are currently working on in my group
1. Arithmetic/numerical reasoning for entailment (Ravichander, Naik, Rosé, Hovy, CoNLL 2019, ACL 2019)
2. Psych goals for sentiment justification (Otani and Hovy, ACL 2019)
3. Social roles for group activity support (Yang, Kraut, Hovy, EMNLP, HCI, and others 2017–18)
Topic 1. Numerical calculation
• Task: Entailment problem
• Input: clauses containing numbers
• Output: entailed/not-entailed
• Results:
 – EQUATE dataset extracted from ~8 existing QA and Entailment resources, with As added
 – Baseline numerical reasoner scores on the dataset
(Ravichander, Naik, Rose, Hovy, 2019)

P: A bomb in a Hebrew University cafeteria killed five Americans and four Israelis
H: A bombing at Hebrew University in Jerusalem killed nine people, including five Americans
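For a P/H pair like this one, the core numeric check is: extract the quantities and test whether each hypothesis quantity is justified by a premise quantity or a simple combination (here, just sums). A toy sketch with invented helper names; EQUATE's baseline reasoner is far richer than this.

```python
import re

# Minimal number-word lexicon for the example; a real system needs more.
WORDS = {"four": 4, "five": 5, "nine": 9}

def nums(text):
    """Extract the numeric quantities mentioned in a clause."""
    toks = re.findall(r"[a-z]+|\d+", text.lower())
    vals = [WORDS.get(t, int(t) if t.isdigit() else None) for t in toks]
    return [v for v in vals if v is not None]

def sum_justified(premise, hypothesis):
    """Is every hypothesis quantity a premise quantity or a sum of two?"""
    p, h = nums(premise), nums(hypothesis)
    return all(x in p or x == sum(p) or any(x == a + b for a in p for b in p)
               for x in h)

P = "A bomb killed five Americans and four Israelis"
H = "A bombing killed nine people, including five Americans"
print(sum_justified(P, H))
```

Here "nine" is justified as 5 + 4 and "five" appears directly, so the pair passes; a hypothesis claiming "ten people" would not.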
EQUATE corpus

Dataset     | Size | Classes | Synthetic | Data Source              | Annotation Source | Quantitative Phenomena
Stress Test | 7500 | 3       | ✓         | AQuA-RAT                 | Automatic         | Quantifiers
RTE-Quant   | 166  | 2       | ✗         | RTE2-RTE4                | Expert            | Arithmetic, World knowledge, Ranges, Quantifiers
AwpNLI      | 722  | 2       | ✓         | Arithmetic Word Problems | Automatic         | Arithmetic
NewsNLI     | 1000 | 2       | ✗         | CNN                      | Crowd-sourced     | Ordinals, Quantifiers, Arithmetic, World Knowledge, Magnitude, Ratios
RedditNLI   | 250  | 3       | ✗         | Reddit                   | Expert            | Range, Arithmetic, Approximation, Verbal

(Ravichander et al., 2019)
Baselines (SOTA methods)
• Majority Class (MAJ): Simple baseline; always predicts the majority class in the test set.
• Hypothesis-Only (HYP): FastText classifier trained on only hypotheses to predict the entailment relation (Gururangan et al. 2018)
• ALIGN: A bag-of-words alignment model inspired by MacCartney (2009)
• NB (Nie and Bansal 2017): Sentence encoder consisting of stacked BiLSTM-RNNs with shortcut connections and fine-tuning of embeddings. Achieves top non-ensemble result in the RepEval-2017 shared task
• CH (Chen et al. 2017): Sentence encoder consisting of stacked BiLSTM-RNNs with shortcut connections, character-composition word embeddings learned via CNNs, intra-sentence gated attention, and ensembling. Achieves best overall result in the RepEval-2017 shared task
• RC (Balazs et al. 2017): Single-layer BiLSTM with mean pooling and intra-sentence attention
• IS (Conneau et al. 2017): Single-layer BiLSTM-RNN with max-pooling, shown to learn robust universal sentence representations that transfer well across inference tasks
• BiLSTM: We reimplement the simple BiLSTM baseline model of Nangia et al. (2017). Our reimplementation achieves slightly better results on the MultiNLI dev set
• CBOW: Bag-of-words sentence representation from word embeddings, passed through a tanh non-linearity and a softmax layer for classification.
(Ravichander et al., 2019)
Constructing entailment inferences
• Generate a report for each premise-hypothesis pair, consisting of:
 – Extracted NUMSETS for premise and hypothesis
 – Which NUMSETS were combined, and by what operation
 – Which NUMSETS were justified and which weren’t
• Combines neural and symbolic programs
 – Some submodules are neural; overall framework is symbolic
 – Lightweight supervision
(Ravichander et al., 2019)
Topic 2. Human goals
• Complex QA domain: human goal for sentiment
 – I loved the hotel’s price but the room was noisy —> [price +] [room −]
• Task: sentiment justification: WHY does the Holder have the sentiment value for the facet?
• Approach: Classify each clause into a list of human (psychological and social) goals
 – Initial set: Maslow hierarchy
 – Currently: ~110 human goals from USC (Talevich et al.)
• Data: Crowdsourced; κ ≈ 0.55
(Otani and Hovy, 2019)
(Talevich et al. 2017)
Topic 3. Social roles
• Complex QA domain: Human interactions in groups
• Task: Automated social role discovery
 – Input: Discussions in a social media platform
 – Output: Role list, and assignment for each user
• Data:
 – Wikipedia editors: our role taxonomy conforms to Wikipedia’s internal set
 – Cancer Survivor Network discussion groups
(Yang, Kraut, Hovy, 2018)
[Diagram: user edit history → role assignments. Each role is a distribution over edit actions, e.g. Information_insertion 0.4, Reference_insertion 0.2; Grammar 0.2, Markup_deletion 0.1, Rephrase 0.1; Wikilink_insertion 0.2, Wikilink_deletion 0.1]
(Yang, Kraut, Hovy, 2018)

Latent role model in Wikipedia
[Plate diagram: role = distribution of edit actions; role proportions; role assignment for user u and word n; edit actions]
(Yang, Kraut, Hovy, 2018)
Discovered editor roles (naming by expert)

Expert’s role name | Discovered representative behavior
Substantive Expert | Information insertion, wikilink insertion, reference insertion
Social Networker   | Main talk namespace, user namespace
Vandal Fighter     | Reverting, user talk namespace
Quality Assurance  | Wikilink insertion, wikipedia namespace, template namespace
Fact Checker       | Information deletion, wikilink deletion, reference deletion
Cleanup Worker     | Wikilink modification, template insertion, markup modification
Fact Updater       | Template modification, reference modification
Copy Editor        | Grammar, paraphrase, relocation

(Yang, Kraut, Hovy, 2018)
Topics 4–. Other inference specialists
• Geography and Time… (see (Allen, CACM 1983) and (Davis, JAIR 2017))
 – E.g.: north-of, area-included-in-region…
• Physics, Biology… (see the HALO project)
 – Recent work on aspects of Physics at AI2 (Clark et al.)
• Emotions
Physics: noun-noun compounds
Where is…
• …the kitchen table → LOC
• …the coffee table → FUNCTION → LOC
• …the wood table → MATERIAL
• …the teacher’s table → ?FUNCTION → LOC?
• …the data table → TYPES → CONTENT → LOC?
• Need to know the relation and the noun types to infer additional info
Conclusion for option 4.2: Where next with Complex QA?
• Identify and build the most useful domain specialists
 – Find basic knowledge primitives
 – Develop reasoning logics, models, and implementations
 – Develop/find QA datasets that exercise this sort of specialist knowledge and reasoning
• Great overview in (Davis, JAIR 2018)
• Create a common library for all to share
• Evaluate correctness AND Answer-production scripts (traces, as ‘explanation’)
• An open-source and general-purpose (not just scientific/political) version of Wolfram Alpha
THANK YOU