from simple to complex qa - nus · – multiple-choice question answering: race, mctest – answer...

From Simple to Complex QA

Eduard Hovy, CMU Language Technologies Institute

www.cs.cmu.edu/~hovy

Webclopedia QA, 2003

•  Where are zebras most likely found? A: in the dictionary

•  Where do lobsters like to live? A: on the table

•  How many people live in Chile? A: nine

•  What is an invertebrate? A: Dukakis

Webclopedia (Hovy et al. 2001)

Basic simple factoid QA

•  Identify keywords from Q
•  Build (Boolean) query for IR
•  Retrieve texts using IR
•  Rank texts/passages
•  Find specified Qtype
•  Move A patterns over text and score each position
•  Rank windows; return top N

Pipeline: Input Q -> 1M documents -> 3000 sentences -> 50 candidates -> 5 answers

Corpus: 30%
+ Web: add 10%

Example A patterns:
… X was born in <YEAR> …
… X was born on <DATE> …
… X (<YEAR>–<YEAR>) …
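The "move A patterns over text and score each position" step can be sketched as follows. This is a minimal illustration, not the Webclopedia implementation: the regular expressions, the weights, and the function name `answer_birth_year` are assumptions made for the example, and the three sentences are toy input.

```python
import re
from collections import Counter

# Hypothetical surface patterns for the Qtype "birth year".
# {X} is replaced by the question target; the weights are illustrative.
BIRTH_YEAR_PATTERNS = [
    (r"{X} was born in (\d{{4}})", 1.0),
    (r"{X} was born on [A-Z][a-z]+ \d{{1,2}}, (\d{{4}})", 1.0),
    (r"{X} \((\d{{4}})\s*[–-]\s*\d{{4}}\)", 0.8),
]

def answer_birth_year(target, sentences, top_n=5):
    """Slide every pattern over every sentence and rank the captured years."""
    scores = Counter()
    for sentence in sentences:
        for template, weight in BIRTH_YEAR_PATTERNS:
            pattern = template.format(X=re.escape(target))
            for match in re.finditer(pattern, sentence):
                scores[match.group(1)] += weight
    return scores.most_common(top_n)

# Toy retrieved sentences (the last one is deliberately wrong).
sentences = [
    "Mozart was born in 1756 in Salzburg.",
    "Wolfgang Amadeus Mozart (1756–1791) was a composer.",
    "Some say Mozart was born in 1755.",
]
print(answer_birth_year("Mozart", sentences))
```

Candidates supported by several patterns accumulate score, which is one simple way to let redundancy in the retrieved text outvote a single noisy match.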

Where is the Answer? Progress since 2003?

Typical QA format:

Question: Q
Context: “w w w w w w … w”
A:

Either the Q context provides the A
1.  nearby (= n-word window) context
2.  distant (= doc-level) context

Or not at all… so you have to use background info
3.  from the training data
4.  from logical derivation / reasoning rules / procedure

Either…

When all info needed to get the A is present in the Q context

… then some form of surface and simple type matching + sub-A composition is enough

-> Ultimately, just do [nested] simple QA

But when getting the A requires information not in the Q context (like background info, calculation, etc.)

… then you are in trouble: this is not standardized, hence impossible to evaluate

-> No complex QA!?

Outline

1.  A in the nearby context
2.  A in the distant context
3.  A hidden in the training data
4.  A only by reasoning

Option 1: A in nearby context

•  Build and use short patterns or a rich LM
•  Tons of work since 2000 on pattern learning and generalization, QA typologies, etc.
•  Numerous QA datasets (TREC, SQuAD, CNN…)
•  Many QA competitions (SEMEVAL…)

So where’s the limit?

You can do a LOT with patterns

Did you know you are an expert on the Panama Canal?

Blah Panama Canal blah blah Panama blah Pres. Roosevelt blah USA blah blah blah blah blah 10 years blah until 1914 blah blah blah blah 51 miles blah blah blah blah blah blah blah blah blah blah blah blah blah 8 to 10 hours blah blah blah blah Gatun Lake blah

•  When was the Panama Canal completed?
•  How long is the Panama Canal?
•  How long did it take to build the Panama Canal?
•  How long does it take to cross the Panama Canal?
•  What is the lake in the Panama Canal called?
•  Which US President enabled the Panama Canal?
•  Which oceans does the Panama Canal connect?

In your training data, you have surely seen “Panama Canal” with only two ocean names…

So where’s the limit?

A corpus to test the power of ngram/pattern QA models

•  CLOTH (Xie, Lai, Dai, Hovy, EMNLP 2018)
–  Large-scale Cloze test dataset
–  Created by English teachers in China for English exams (Middle and High school levels)
–  After cleanup: 7k passages; 99k questions (2/3 removed)
–  Dropped words and word options carefully created by teachers: highly nuanced alternatives
–  Tests knowledge of grammar, vocabulary, reasoning
•  How well do state-of-the-art computational models do compared to humans?
–  We test using a 1-billion-word language model
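To make concrete how a language model answers a cloze item, here is a minimal sketch: fill the blank with each option and keep the option whose completed sentence the model finds most probable. The bigram model with add-one smoothing, the three-sentence training corpus, and the `___` blank marker are all assumptions for illustration; the experiments described above used a far larger 1-billion-word LM.

```python
import math
from collections import Counter

def train_bigram_lm(corpus_sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one-smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

def answer_cloze(template, options, unigrams, bigrams):
    """Return the option that maximizes the probability of the filled sentence."""
    return max(
        options,
        key=lambda option: log_prob(template.replace("___", option), unigrams, bigrams),
    )

# Toy training corpus and cloze item (illustrative only).
corpus = ["she is reading a book", "he is reading a letter", "they are eating lunch"]
unigrams, bigrams = train_bigram_lm(corpus)
print(answer_cloze("she is ___ a book", ["reading", "eating", "lunch", "book"], unigrams, bigrams))
```

A model of this kind only sees local word combinations, which is why it tends to handle short-range items better than blanks that depend on content far away in the passage.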

Question types [table: percentages of test examples, Middle/High school levels]:

•  Tense, voice, preps
•  Local content words
•  Copy/paraphrase words
•  Content words, long-distance dependencies

(Xie et al. 2018)

QA system results

•  Even a 1B-LM still lags behind human performance
•  Increasing the context length for 1B-LM does not help
•  However: human-created questions are different:

(Xie et al. 2018)

(AR: Attention Reader)

(This was pre-BERT!)

Conclusion for option 1

For factoid QA types that obey patterns, if the A is close enough, and you have enough training data…
… you will always learn good enough word combination patterns to connect Q parameters <-> Q context material <-> A

(If you haven’t seen the necessary word combinations, you won’t ever be able to answer the Q)

Option 2: A in distant context

•  Still use some form of matching Q and A
•  Need a more-sophisticated and longer-distance type of ‘pattern’

Making matching more complex: RACE: A better testbed

•  RACE: ReAding Comprehension dataset from Examinations (Lai, Xie, Liu, Yang, Hovy, EMNLP 2018)
•  Collected from Chinese middle and high school exams that evaluate human students’ English reading comprehension ability
–  Designed by human experts: ensures quality and broad topic coverage
–  Substantially more difficult than existing QA datasets (but RACE-M easier than RACE-H)
–  About 4/5 of source material filtered out to remove duplicates, incorrect format, etc.
•  After cleaning: 27,933 passages; 97,687 questions

(Lai et al. 2018)

Toward ‘reasoning’: types of more-complex matching

•  Paraphrasing Qs: test language ability
•  Detail Qs: identify and match details of a thing
•  Attitude Qs: find opinions/attitudes of the author towards something (‘sentiment’)
•  Whole-picture Qs: understand the entire story (multi-sentence)
•  Summarization Qs: understand the point (multi-sentence)

(Lai et al. 2018)

Increasing reasoning

Comparison with other QA datasets
•  Reasoning questions: 59.2% of RACE; 20.5% of SQuAD
•  Processing types:
–  Word matching: exact match
–  Paraphrasing: paraphrase or entailment
–  Single-sent reasoning: incomplete info or conceptual overlap
–  Multi-sent reasoning: synthesizing information from multiple sentences
–  Insufficient/Ambiguous: no A, or A is not unique

(Lai et al. 2018)

Comparing QA algorithms

•  Baselines:
–  Sliding Window: TF-IDF based matching algorithm
–  Stanford Attention Reader (AR) and Gated Attention Reader (early-2018 state-of-the-art neural models)
•  RACE has more ‘semantics’ (= requires more ‘reasoning’) than other corpora:
–  higher human ceiling
–  harder for neural models

(Lai et al. 2018)
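A sliding-window matcher of the kind named as a baseline can be sketched in a few lines. This is a simplified version, assuming inverse-count word weights computed from the passage alone and a window as wide as the number of distinct question-plus-option words; the exact weighting used in the RACE experiments is not given here, and the passage and question below are toy input.

```python
import math
from collections import Counter

PUNCT = ".,?!;:\"'"

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def sliding_window_score(passage_tokens, target_words, weights, window_size):
    """Best summed weight of target words inside any window of the passage."""
    best = 0.0
    for start in range(len(passage_tokens)):
        window = passage_tokens[start:start + window_size]
        best = max(best, sum(weights[w] for w in window if w in target_words))
    return best

def answer_multiple_choice(passage, question, options):
    """Return the index of the option whose words best co-occur with the question words."""
    passage_tokens = tokenize(passage)
    counts = Counter(passage_tokens)
    # Rare passage words count more (inverse-count weighting, an assumption).
    weights = {w: math.log(1 + 1 / c) for w, c in counts.items()}
    question_words = set(tokenize(question))
    scores = []
    for option in options:
        target = question_words | set(tokenize(option))
        scores.append(sliding_window_score(passage_tokens, target, weights, len(target)))
    return max(range(len(options)), key=scores.__getitem__)

passage = ("Tom went to the market on Monday. He bought three apples and "
           "a loaf of bread. On Tuesday he visited his aunt.")
question = "What did Tom buy at the market?"
options = ["three apples", "a red bicycle", "two fish", "his aunt"]
print(options[answer_multiple_choice(passage, question, options)])
```

The matcher never links "buy" to "bought"; it succeeds here only because the option words sit close to the question words, which illustrates why such a baseline is strong on word-matching questions and weak on paraphrase or multi-sentence ones.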

Matching type performance

•  Turkers and Sliding Window are good at simple matching questions
•  Surprisingly, Stanford AR does not have better performance on matching questions

(Lai et al. 2018)

When the A is distant, or requires more-sophisticated matching/‘reasoning’ (not just simple word-string/language model),

then attention-based neural models can do some of it, but still fail with the harder parts

Option 3: A ‘hidden’ in training data

Sometimes the Q context does not contain the A at all… but you can STILL get the right A! (And even get it without the Q itself!)

Corrupted ngrams and other SQuAD perturbations (Jia and Liang, EMNLP 2017)

Necessity of Q context or even of Q itself (Kaushik and Lipton, EMNLP 2018, Best Short Paper award)

Example: Q only

Question: shin kanemaru, the gravel-voiced back-room boss who died on thursday aged 81, goes down in history as japan’s most corrupt post-war politician after ___________

Passage: ... glynis bc-nj-zimmer-profile-2takes-nyt rahane fumio yasuhiro dragnea lhadon bjorkman/max ... seventh-largest embarrased jeopardy hilariously masahisa haibara bajram 8-to-24 duke/meredith acceding ... koidu iraqs 2:32:21 //www.ironmanlive.com/ sagawa kyubin dean internatinoal 90-meter kakuei tanaka seven-paragraph 577,610 wendover golf-lpga-jpn partner, un-appointed uemazzei canada-u.s.

Answer: kakuei tanaka

(Kaushik and Lipton, EMNLP 2018)

Do you actually need the context?

•  Research goal:
–  How strong are models that see the Q only?
–  What about models that see the Q context passage only?
–  How do we know models are really “reading” the whole passage?
•  Question-only setting:
–  If the QA system needs the passage, randomize its words first
–  If just candidate As needed, place them in random spots, fill intervening text with gibberish
•  Passage-only setting:
–  ‘Ignore’ the Qs: assign each Q to some random passage

(Kaushik and Lipton EMNLP 2018)
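The three perturbations described above are simple enough to sketch directly. The function names, the dictionary keys `passage`/`question`/`answer`, and the filler vocabulary are assumptions for illustration and do not come from the authors' released code.

```python
import random

def shuffle_passage(passage, rng):
    """Question-only setting: keep the passage words but destroy their order."""
    tokens = passage.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

def gibberish_with_candidates(candidates, filler_vocab, length, rng):
    """Question-only setting for cloze data: random filler with the
    candidate answers dropped into random positions."""
    tokens = [rng.choice(filler_vocab) for _ in range(length)]
    positions = rng.sample(range(length), len(candidates))
    for position, candidate in zip(positions, candidates):
        tokens[position] = candidate
    return " ".join(tokens)

def reassign_questions(examples, rng):
    """Passage-only setting: each passage (and its answer) gets a random question."""
    questions = [example["question"] for example in examples]
    rng.shuffle(questions)
    return [dict(example, question=q) for example, q in zip(examples, questions)]

rng = random.Random(0)  # fixed seed so the perturbed data is reproducible
print(shuffle_passage("the canal was completed in 1914", rng))
print(gibberish_with_candidates(["tanaka", "kanemaru"], ["foo", "bar", "baz"], 10, rng))
```

If a model trained and tested on data perturbed this way still scores well above chance, the dataset is leaking the answer through something other than reading comprehension.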

Experiments

•  Datasets/tests:
–  Span selection: SQuAD, TriviaQA
–  Cloze queries: Children’s Book Test (CBT), CNN, CLOTH, Who-did-What, DailyMail
–  Multi-class classification (implicit): bAbI (20 tasks)
–  Multiple-choice question answering: RACE, MCTest
–  Answer generation: MS MARCO
•  Algorithms:
–  Key-Value Memory Networks: Miller et al. 2016: Key-Value Memory Networks for Directly Reading Documents. Proceedings of EMNLP
–  Gated Attention Readers: Dhingra et al. 2017: Gated-Attention Readers for Text Comprehension. Proceedings of ACL
–  QANet: Yu et al. 2018: QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. Proceedings of ICLR

Some results

SQuAD, using QANet

bAbI, using Key-Value MemNets

Who-did-What, using Gated-Attention Readers

CBT, using Gated-Attention Readers

Why? What’s going on??

Question: shin kanemaru, the gravel-voiced back-room boss who died on thursday aged 81, goes down in history as japan’s most corrupt post-war politician after ___________

Passage: ... glynis bc-nj-zimmer-profile-2takes-nyt rahane fumio yasuhiro dragnea lhadon bjorkman/max ... seventh-largest embarrased jeopardy hilariously masahisa haibara bajram 8-to-24 duke/meredith acceding ... koidu iraqs 2:32:21 //www.ironmanlive.com/ sagawa kyubin dean internatinoal 90-meter kakuei tanaka seven-paragraph 577,610 wendover golf-lpga-jpn partner, un-appointed uemazzei canada-u.s.

Answer: kakuei tanaka

Callouts on passage tokens:

Transportation company

Kanemaru’s secretary

Long-term politician

Name not in Google

•  Don’t trust QA datasets!
•  Don’t trust QA system claims!
•  First, check whether
–  there are any pre-existing (= training data) dependencies among the Q and candidate As
–  the full context predicts the A without even the Q

Option 4: A only through reasoning

For truly complex QA:
1.  Identify the individual steps/pieces needed to derive the A
2.  Figure out how to compute/find them
–  From the Q context and/or from elsewhere
3.  Compose (and check?) them
–  Build an A finding ‘script’

Possible sources of this knowledge

•  External search:
–  Query something like the web and hope to be lucky
•  Entailments: “sentence” -> “sentence”
–  Operate at surface form (in RTE formulation)
–  Allow one surface form to be stated when another is given
–  New surface form may provide Answer
–  Need: entailment rules + entailment applier
•  Axioms: A ∨ B -> C
–  Operate at deeper level
–  Connect representation subgraphs, even providing new nodes
–  Expanded graph may provide Answer
–  Need: axioms/composition rules + theorem prover

Type 1: A popular task today: QA over structured data

•  Data: database, table, etc.
•  Task: ask Qs that require (1) finding various bits of data and (2) composing them to make the A
•  The missing information is the script governing the sequence of access and composition
•  Research: how to [learn to] build this script?
•  Evaluation: did the system produce the right A?
•  Examples:
–  U.S. geography database of 800 facts (Zelle & Mooney, 1996)
–  Wikitable questions (Pasupat and Liang, 2015; Dasigi 2018)
–  Other domains’ tables (several AI2 projects)

Wikitable dataset

Athlete            Nation              Olympics    Medals
Gillis Grafström   Sweden (SWE)        1920–1932   4
Kim Soo-Nyung      South Korea (KOR)   1988–2000   6
Evgeni Plushenko   Russia (RUS)        2002–2014   4
Kim Yu-na          South Korea (KOR)   2010–2014   2
Patrick Chan       Canada (CAN)        2014        2

Question: Which athlete was from South Korea after the year 2010?

Answer: Kim Yu-na

Reasoning:
1)  Get rows where Nation column contains South Korea
2)  Filter rows where Olympics has a value greater than 2010.
3)  Get value from Athlete column from filtered rows.

Program: ((reverse athlete) (and (nation south_korea) (year ((reverse date) (>= 2010-mm-dd)))))

WikiTableQuestions, Pasupat and Liang, 2015

(Dasigi LTI PhD thesis, 2018)
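The three reasoning steps can be run as an ordinary program over the table. The sketch below is plain Python, not the lambda-DCS executor used in the cited work; the function names are hypothetical, and the Olympics value for Kim Soo-Nyung, which is garbled in the source, is assumed to be 1988–2000.

```python
import re

TABLE = [
    {"Athlete": "Gillis Grafström", "Nation": "Sweden (SWE)", "Olympics": "1920–1932", "Medals": 4},
    # Olympics range below is an assumption; the source text is garbled here.
    {"Athlete": "Kim Soo-Nyung", "Nation": "South Korea (KOR)", "Olympics": "1988–2000", "Medals": 6},
    {"Athlete": "Evgeni Plushenko", "Nation": "Russia (RUS)", "Olympics": "2002–2014", "Medals": 4},
    {"Athlete": "Kim Yu-na", "Nation": "South Korea (KOR)", "Olympics": "2010–2014", "Medals": 2},
    {"Athlete": "Patrick Chan", "Nation": "Canada (CAN)", "Olympics": "2014", "Medals": 2},
]

def years_in(cell):
    """Extract every four-digit year mentioned in a table cell."""
    return [int(year) for year in re.findall(r"\d{4}", cell)]

def athletes_from_nation_after(table, nation, year):
    # 1) Get rows where the Nation column contains the nation name.
    rows = [row for row in table if nation in row["Nation"]]
    # 2) Keep rows whose Olympics cell has a value greater than the year.
    rows = [row for row in rows if any(y > year for y in years_in(row["Olympics"]))]
    # 3) Read the Athlete column of the remaining rows.
    return [row["Athlete"] for row in rows]

print(athletes_from_nation_after(TABLE, "South Korea", 2010))
```

Writing the script by hand is easy; the research problem is getting a system to produce it from the question alone, given only the final answer as supervision.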

Example: Dasigi

•  Approach for learning to build access routines:
1.  Parse Q, build dependency tree
2.  Convert into Logical Form
3.  Translate into candidate table access routine
4.  (try all kinds of mappings from words to query operators/structure)
5.  Test composition by repeated trial and error
•  Essentially, learning is a search in ‘operator combination space’ to build the logical form
•  Weak supervision is not enough. Speed up the learning/search by:
–  Learning to associate table access parameters with parts of the tree (Q variables)
–  Learning to associate nesting and access operators with parts of the tree (‘operator’ words: “the most”, “last”, etc.)
–  Predefining some lexicon-to-operation mappings
–  Paying attention to grammatical construction of the tree
–  Implementing heuristics to guide exploration (‘short Qs first’)

(Dasigi LTI PhD thesis, 2018)

Dasigi approach

•  Strategies:
–  Incorporate knowledge of grammatical constraints
–  ‘Lucky’ examples: remove right A with wrong query logic
–  Question coverage: how many Q words mapped?
–  Complex queries (denotation): how large is the query?
–  Do iterative search, from simpler to more complex Qs
•  Combine into single Objective: Minimize expected value of cost (Goodman, 1996; Goel and Byrne, 2000; Smith and Eisner, 2005) with a linear combination of coverage and denotation costs

x: NL term    y: script term    d: denotation
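The objective itself did not survive in this text. One plausible way to write a minimum-expected-cost objective with a linear combination of coverage and denotation costs, using the legend's symbols x, y, and d, is sketched below. The model parameters \theta, the mixing weight \lambda, and the cost names S (coverage) and T (denotation) are assumptions introduced here for illustration, not a transcription of the slide.

```latex
\min_{\theta} \; \sum_{i} \mathbb{E}_{y \sim p(y \mid x_i;\, \theta)}
  \big[\, C(x_i, y, d_i) \,\big],
\qquad
C(x, y, d) \;=\; \lambda \, S(y, x) \;+\; (1 - \lambda)\, T(y, d),
\quad 0 \le \lambda \le 1
```

Under this reading, S penalizes scripts that leave question words unmapped, T penalizes scripts whose execution does not yield the expected denotation, and \lambda trades the two off.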

Empirical comparison on WikiTableQuestions

•  Requires approximate set of logical forms during training
•  Used output from Dynamic Programming on Denotations (Pasupat and Liang, 2016)
•  Various models: strings, trees, etc.
•  Efficient search followed by pruning using human annotations

(Krishnamurthy, Dasigi and Gardner, 2017)

Dasigi results using iterative search

•  Similar trend in 2 domains
•  Used functional query language (Liang et al., 2018)

(Dasigi, Gardner, Murty, Zettlemoyer, Hovy 2018)

[Result panels: NLVR; WikiTableQuestions]

Conclusion for option 4.1

Interesting idea to ‘operationalize’ the Q and test its ‘truth’ by running the script for the A

But works only with structured A sources where such operationalization is possible

Can we ‘operationalize’ other, typical kinds of Qs?

Type 2: A new QA task: Multi-domain knowledge

Q: What is the largest capital city south of Santiago de Chile?

–  Geographic knowledge (lat-long, population)
–  Numerical ability (sorting, etc.)

Q: Which of the leaders of the XYZ enterprise are well-liked, and why?

–  Discovery of social role by actions
–  Sentiment judgments attached to actions

Multi-domain knowledge

•  Define N self-contained standardized ‘domain specialists’ (KBs + reasoners) that any QA engine can run
•  At run-time, analyze the Q, build the A script, activate the specialists as needed, compute the A

Example specialists: Arithmetic; Geography; Psych: goals; Social customs; Physics

Research needed

•  For each domain specialist:
–  Define its ‘knowledge service’
–  Create the underlying knowledge
–  Define the I/O APIs for the QA engine to use
–  Build the specialist
•  For each QA engine:
–  Analyze the Q -> determine parameters and need
–  Decompose need, build a script of specialist queries plus their result composition
–  Execute
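A toy version of this architecture, for the question "What is the largest capital city south of Santiago de Chile?", might look like the sketch below. Everything in it is hypothetical: the two specialist classes, their method names, and the hand-written answer script stand in for the APIs that would have to be designed, and the latitude and population figures are rough approximations included only so the example runs.

```python
class GeographySpecialist:
    """Hypothetical knowledge service: capitals with latitude and population."""

    def __init__(self, facts):
        self._facts = facts

    def capitals(self):
        return list(self._facts)

    def latitude(self, city):
        return self._facts[city]["lat"]

    def population(self, city):
        return self._facts[city]["pop_millions"]


class ArithmeticSpecialist:
    """Hypothetical numerical service: comparison and selection."""

    @staticmethod
    def less_than(a, b):
        return a < b

    @staticmethod
    def argmax(items, key):
        return max(items, key=key)


# Rough, illustrative values only (degrees of latitude, negative = south;
# population in millions). Not authoritative data.
ROUGH_FACTS = {
    "Santiago": {"lat": -33.4, "pop_millions": 6.0},
    "Buenos Aires": {"lat": -34.6, "pop_millions": 3.0},
    "Montevideo": {"lat": -34.9, "pop_millions": 1.3},
    "Canberra": {"lat": -35.3, "pop_millions": 0.4},
    "Wellington": {"lat": -41.3, "pop_millions": 0.2},
    "Lima": {"lat": -12.0, "pop_millions": 10.0},
}


def largest_capital_south_of(reference, geo, arith):
    """Answer script: filter by latitude (geography), then pick the maximum (arithmetic)."""
    reference_lat = geo.latitude(reference)
    candidates = [
        city for city in geo.capitals()
        if arith.less_than(geo.latitude(city), reference_lat)
    ]
    return arith.argmax(candidates, key=geo.population)


geo, arith = GeographySpecialist(ROUGH_FACTS), ArithmeticSpecialist()
print(largest_capital_south_of("Santiago", geo, arith))
```

The script is the interesting object here: it records which specialist was asked what and how the results were composed, so it can double as an explanation trace.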

Some specialist areas we are currently working on in my group

1.  Arithmetic/numerical reasoning for entailment (Ravichander, Naik, Rosé, Hovy, CoNLL 2019, ACL 2019)

2.  Psych goals for sentiment justification (Otani and Hovy, ACL 2019)

3.  Social roles for group activity support (Yang, Kraut, Hovy; EMNLP, HCI, and others, 2017–18)

Topic 1. Numerical calculation

•  Task: Entailment problem
•  Input: clauses containing numbers
•  Output: entailed/not-entailed
•  Results:
–  EQUATE dataset extracted from ~8 existing QA and Entailment resources, with As added
–  Baseline numerical reasoner scores on the dataset

(Ravichander, Naik, Rosé, Hovy, 2019)

P: A bomb in a Hebrew University cafeteria killed five Americans and four Israelis
H: A bombing at Hebrew University in Jerusalem killed nine people, including five Americans

EQUATE corpus

Dataset       Size   Classes   Synthetic   Data Source                Annotation Source   Quantitative Phenomena
Stress Test   7500   3         ✓           AQuA-RAT                   Automatic           Quantifiers
RTE-Quant     166    2         ✗           RTE2-RTE4                  Expert              Arithmetic, World knowledge, Ranges, Quantifiers
AwpNLI        722    2         ✓           Arithmetic Word Problems   Automatic           Arithmetic
NewsNLI       1000   2         ✗           CNN                        Crowd-sourced       Ordinals, Quantifiers, Arithmetic, World Knowledge, Magnitude, Ratios
RedditNLI     250    3         ✗           Reddit                     Expert              Range, Arithmetic, Approximation, Verbal

(Ravichander et al., 2019)

Baselines (SOTA methods)

•  Majority Class (MAJ): Simple baseline always predicts the majority class in test set.
•  Hypothesis-Only (HYP): FastText classifier trained on only hypotheses to predict the entailment relation (Gururangan et al. 2018)
•  ALIGN: A bag-of-words alignment model inspired by MacCartney (2009)
•  NB (Nie and Bansal 2017): Sentence encoder consisting of stacked BiLSTM-RNNs with shortcut connections and fine-tuning of embeddings. Achieves top non-ensemble result in the RepEval-2017 shared task
•  CH (Chen et al. 2017): Sentence encoder consisting of stacked BiLSTM-RNNs with shortcut connections, character-composition word embeddings learned via CNNs, intra-sentence gated attention and ensembling. Achieves best overall result in the RepEval-2017 shared task
•  RC (Balazs et al. 2017): Single-layer BiLSTM with mean pooling and intra-sentence attention
•  IS (Conneau et al. 2017): Single-layer BiLSTM-RNN with max-pooling, shown to learn robust universal sentence representations that transfer well across inference tasks
•  BiLSTM: We reimplement the simple BiLSTM baseline model of Nangia et al. (2017). Our reimplementation achieves slightly better results on the MultiNLI dev set
•  CBOW: Bag-of-words sentence representation from word embeddings passed through a tanh non-linearity and a softmax layer for classification.

Constructing entailment inferences

•  Generate a report for each premise-hypothesis pair, consisting of:
–  Extracted NUMSETS for premise and hypothesis
–  Which NUMSETS were combined and by what operation
–  Which NUMSETS were justified and which weren’t
•  Combines neural and symbolic programs
–  Some submodules are neural; overall framework is symbolic
–  Lightweight supervision
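A heavily simplified, purely symbolic sketch of such a report is shown below, using the Hebrew University premise and hypothesis from Topic 1. It is an assumption-laden toy, not the system from the cited work: numbers are plain integers rather than NUMSETS with units and entities, the only combination operation tried is addition, and the function names are invented for the example.

```python
import re
from itertools import combinations

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

def extract_numbers(text):
    """Return the integers mentioned in the text, as digits or as number words."""
    numbers = []
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        if token.isdigit():
            numbers.append(int(token))
        elif token in NUMBER_WORDS:
            numbers.append(NUMBER_WORDS[token])
    return numbers

def justify(target, premise_numbers):
    """Explain a hypothesis number as a direct match or a sum of premise numbers."""
    if target in premise_numbers:
        return ("match", (target,))
    for size in range(2, len(premise_numbers) + 1):
        for combo in combinations(premise_numbers, size):
            if sum(combo) == target:
                return ("sum", combo)
    return None  # not justified

def numeric_report(premise, hypothesis):
    premise_numbers = extract_numbers(premise)
    hypothesis_numbers = extract_numbers(hypothesis)
    justifications = {n: justify(n, premise_numbers) for n in hypothesis_numbers}
    return {
        "premise_numbers": premise_numbers,
        "hypothesis_numbers": hypothesis_numbers,
        "justifications": justifications,
        "all_justified": all(j is not None for j in justifications.values()),
    }

premise = "A bomb in a Hebrew University cafeteria killed five Americans and four Israelis"
hypothesis = "A bombing at Hebrew University in Jerusalem killed nine people, including five Americans"
print(numeric_report(premise, hypothesis))
```

The `all_justified` flag only says the hypothesis numbers are arithmetically consistent with the premise; deciding entailment would still require checking that the quantities refer to the same things.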

Topic 2. Human goals

•  Complex QA domain: human goal for sentiment
–  I loved the hotel’s price but the room was noisy -> [price +] [room -]
•  Task: sentiment justification: WHY does the Holder have the sentiment value for the facet?
•  Approach: Classify each clause into a list of human (psychological and social) goals
–  Initial set: Maslow hierarchy
–  Currently: ~110 human goals from USC (Talevich et al.)
•  Data: Crowdsourced; κ ≈ 0.55

(Otani and Hovy, 2019)

(Talevich et al. 2017)

Topic 3. Social roles

•  Complex QA domain: Human interactions in groups
•  Task: Automated social role discovery
–  Input: Discussions in a social media platform
–  Output: Role list, and assignment for each user
•  Data:
–  Wikipedia editors: our role taxonomy conforms to Wikipedia’s internal set
–  Cancer Survivor Network discussion groups

(Yang, Kraut, Hovy, 2018)

User edit history Role assignments

Information_insertion 0.4 Reference_insertion 0.2 ….

Grammar 0.2 Markup_deletion 0.1 Rephrase 0.1 ….

Wikilink_insertion 0.2 Wikilink_deletion 0.1 ….

Latent role model in Wikipedia

Role: distribution of edit actions

Role proportions

Role assignment for user u and word n

Edit actions

Discovered editor roles (naming by expert)

Expert’s role name    Discovered representative behavior
Substantive Expert    Information insertion, wikilink insertion, reference insertion
Social Networker      Main talk namespace, user namespace
Vandal Fighter        Reverting, user talk namespace
Quality Assurance     Wikilink insertion, wikipedia namespace, template namespace
Fact Checker          Information deletion, wikilink deletion, reference deletion
Cleanup Worker        Wikilink modification, template insertion, markup modification
Fact Updater          Template modification, reference modification
Copy Editor           Grammar, paraphrase, relocation

Topics 4 onward. Other inference specialists

•  Geography and Time… (see (Allen, CACM 1983) and (Davis, JAIR 2017))
–  E.g.: north-of, area-included-in-region…
•  Physics, Biology… (see the HALO project)
–  Recent work on aspects of Physics at AI2 (Clark et al.)
•  Emotions
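As a small illustration of what a time specialist could expose, the sketch below classifies two intervals into one of the thirteen interval relations of Allen (CACM 1983). The function name, the tuple representation, and the requirement that each interval satisfy start < end are assumptions made for the example.

```python
def allen_relation(a, b):
    """Return the Allen interval relation that holds from interval a to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a_start < b_start else "overlapped-by"

# Example with two year ranges.
print(allen_relation((1920, 1932), (1988, 2000)))
```

A QA engine could call such a service to answer questions like "did X happen during Y?" without encoding temporal logic itself.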

Physics: noun-noun compounds

Where is…

•  … the kitchen table: LOC
•  … the coffee table: FUNCTION -> LOC
•  … the wood table: MATERIAL
•  … the teacher’s table: ? FUNCTION -> LOC ?
•  … the data table: TYPES -> CONTENT -> LOC?

•  Need to know the relation and the noun types to infer additional info

Conclusion for option 4.2

Where next with Complex QA?

•  Identify and build the most useful domain specialists
–  Find basic knowledge primitives
–  Develop reasoning logics, models, and implementations
–  Develop/find QA datasets that exercise this sort of specialist knowledge and reasoning
•  Great overview in (Davis, JAIR 2018)
•  Create a common library for all to share
•  Evaluate correctness AND Answer production scripts (traces, as ‘explanation’)

Open-source and general-purpose (not just scientific/political) version of WolframAlpha

THANK YOU
