from simple to complex qa - ibm€¦ · evgeni plushenko russia (rus) 2002–2014 4 kim yu-na south...
TRANSCRIPT
FromSimpletoComplexQA
EduardHovyCMU
www.cs.cmu.edu/~hovy
WebclopediaQA,2003
• Wherearezebrasmostlikelyfound? —inthedictionary
• Wheredolobstersliketolive? —onthetable
• HowmanypeopleliveinChile? —nine
Webclopedia(Hovyetal.2001)
• Whatisaninvertebrate? —Dukakis 1
2
Twowaystomakethingsmorecomplex
1. Complexanswers:Long,non-factoidresponses.Connectstotextsummarization,textplanning,NLgeneration
2. Complexanswering:Aproceduremorethansimpleatomicpatternmatching
Weleavethisforanothertime
So,whattodo?3
Whatis‘complexQA’?
LongAgeneration
FactoidAmatching
FactoidAcalculation
Today
✗More-
complex FactoidAcalculation
4
Outline
• SimpleQA:Matching• ComplexQA:Calculating• TheConundrum• TwoPathsForward
5
SIMPLEQA:MATCHING
6
BasicsimpleQAarchitecture
• IdentifykeywordsfromQ• Build(Boolean)queryforIR• RetrievetextsusingIR• Ranktexts/passages
• FindspecifiedQtype• MoveApatternsovertextand
scoreeachposition• Rankwindows;returntopN
Alist
InputQ
Corpus:30%
+Web:add10%
1Mdocuments3000sentences
50candidates5answers
…Xwasbornin<YEAR>……Xwasbornon<DATE>……X(<YEAR>–<YEAR>)…
7
Patterns
Themainresearchliesinbuilding/learning,specializing,andmatchingAgraphs/patterns:
– Manualcreationandstringmatching
– LearnedasperInfoExtraction:recursivepatternlearning,simplematching
– Learnedinneuralarchitectures,matchingisautomatic
Increasinglyautomated
Patternstrings,trees,frames,graphs…
8
Infact,allQAprocessingbeseenasmatchingaQuestiongraphagainstasetof(normalized/expanded)candidateAnswergraphs
withthebestmatch(es)providingtheAnswer
9
Withenoughpatternengineering…
Withenoughpatternengineeringandsomespecializedreasoning(rhymes,geospatialandtemporalreasoning,etc.)……yougetIBM’sWatson
10
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% % Answered
Baseline
12/2007
8/2008
5/2009
10/2009
11/2010
12/2008
5/2008
4/2010
Precision
Watson’sincrementalprogressinPrecisionandConfidence
slidebyIBM
11
Theultimatepatterntechnology
• Apatternmustcapture– Thesequenceofrelevantwords/types– Thetargetword/type[familyofsynonyms]
• NNsdoexactlythis– Ngrammodelsofwordembeddings– Embeddingsthatgeneralizewords
12
Embeddingsmakepatternlearningandengineeringmucheasier
• Awordembeddingisa‘localgeneralizedngrammodel’fortheword– capturestheword’sgeneralcontextexpectations– captures(semantic)substitutionsoftheword
• Asentenceembeddingisa‘generalizedngramsequencemodel’fortheQuestion/sentence– NgramsareweightedindifferentwaysdependingonNNarchitecture(directionofLSTM,attention,etc.)
– Peopleinventarchitecturestohighlight(orsmoothout,usingadversarialapproaches)specificparts
– Theimportantparts(wordsforQparameters,inter-partrelations,theAintroducerwords,etc.)formthe‘pattern’
13
Pattern-levelevolutionMethod
• Stringmatch• Synonymsubstitution• Typedpatternmatch
• Parsetreematch
• Shallowsemanticmatch
• Embeddingmatch
Resource• Surfacepatternlibrary• Synonymdictionary,WN• QAtypology/Typehierarchy
• Parser+Treetransformrules
• Semanticanalyzer+Typehierarchy+partialunifier
• SimpleNN
• Attention-focusedBiLSTM,etc. 14
BUT:Patternsthataretoogoodgiveafalsesenseofaccomplishment
Trainedwithenoughembeddings…
…theNN’swordandword-combinationmodelscaptureworldknowledgefactoidsandrelations(incl.theirsurface-levelcuesforsemanticstructure)
SoyougetwhatlookslikesemanticQA
15
DidyouknowyouareanexpertonthePanamaCanal?
BlahPanamaCanalblahblahPanamablahPres.RooseveltblahUSAblahblahblahblahblah10yearsblahuntil1914blahblahblahblah51milesblahblahblahblahblahblahblahblahblahblahblahblahblah8to10hoursblahblahblahblahGatunLakeblah
WhenwasthePanamaCanalcompleted?HowlongisthePanamaCanal?HowlongdidittaketobuildthePanamaCanal?HowlongdoesittaketocrossthePanamaCanal?WhatisthelakeinthePanamaCanalcalled?WhichUSPresidentenabledthePanamaCanal?WhichoceansdoesthePanamaCanalconnect?Inyourtrainingdata,youhavesurelyseen“PanamaCanal”withonlytwooceannames…
16
Howpowerfularengrampatternmodels?
• CLOTH:Large-scaleClozeTestDatasetCreatedbyTeachers(Xie,Lai,Dai,Hovy,EMNLP2018)
• Large-scaleclozetestdatasetcollectedfromEnglishexamsinChina(MiddleandHighschoollevel)– Aftercleanup:7kpassages;99kquestions(2/3removed)
• Thedroppedwordsandwordoptionswerecarefullycreatedbyteachers:– Highlynuancedalternatives– Testknowledgeofgrammar,vocabulary,reasoning– Howwelldostate-of-the-artcomputationalmodelsdoincomparisontohumans?(1-billion-wordlanguagemodel)
17
• Tense,voice,preps• Localcontentwords
• Copy/paraphrasewords• Contentwords,long-distancedependencies
Percentagesoftestexamples
(Xieetal.2018)
18
QAsystemresults
• Evena1B-LMstilllagsbehindhumanperformance• Increasingthecontextlengthfor1B-LMdoesnothelp• However:human-createdquestionsaredifferent:
(Xieetal.2018)
19
So,Iconclude:Givenenoughtrainingdata…
(andassumingtheQprovidescontexttext)…youwillalwayslearngoodpatterns/wordcombinationgraphsthatconnectQparameters<–>Qcontextmaterial<–>A…(Ifyouhavenotseenthenecessarycombinations,youwon’tbeabletoanswertheQ)
20
…tothedegreethat……youmaynotevenneedtheQcontext(orifyouhavetheQcontext,youmaynotneedtheQ!!):CorruptedngramsandotherSQuADperturbations(JiaandLiang,EMNLP2017)
NecessityofQcontextorevenofQitself(KaushikandLipton,EMNLP2018,BestShortPaperaward)
AHEM!!21
KaushikandLiptonEMNLP2018• Researchgoal:
– Howstrongaremodelsthatseethequestiononly?– WhataboutmodelsthatseetheQcontextpassageonly?– Howdoweknowmodelsarereally“reading”thewholepassage?
• Question-onlysetting:– Ifpassageneededforengine,randomizeitswordsfirst– IfcandidateAsneeded,theyareplacedinrandomspots,interveningtextfilledwithgibberish
• Passage-onlysetting:– Createcorruptedversionsofeachdataset:assignQtosomepassagerandomly
22
Example:QonlyPassage:...glynisbc-nj-zimmer-profile-2takes-nytrahanefumioyasuhirodragnealhadonbjorkman/max...seventh-largestembarrasedjeopardyhilariouslymasahisahaibarabajram8-to-24duke/meredithacceding...koiduiraqs2:32:21//www.ironmanlive.com/sagawakyubindeaninternatinoal90-meterkakueitanakaseven-paragraph577,610wendovergolf-lpga-jpnpartner,un-appointeduemazzeicanada-u.s.Question:shinkanemaru,thegravel-voicedback-roombosswhodiedonthursdayaged81,goesdowninhistoryasjapan’smostcorruptpost-warpoliticianafter___________Answer:kakueitanaka
(KaushikandLiptonEMNLP2018)
23
Experiments• Datasets/tests:
– Spanselection:SQuAD,TriviaQA– Clozequeries:ChildrensBookTest(CBT),CNN,CLOTH,Who-did-What,DailyMail
– Multi-classclassification(implicit):bAbI(20tasks)– Multiple-choicequestionanswering:RACE,MCTest– Answergeneration:MSMARCO
• Algorithms:– Key-ValueMemoryNetworks:
Miller,Alexander,etal.2016.Key-ValueMemoryNetworksforDirectlyReadingDocuments.ProceedingsoftheEMNLPconference.
– GatedAttentionReaders:Dhingra,Bhuwan,etal.2017.Gated-AttentionReadersforTextComprehension.ProceedingsoftheACLconference(LongPapers).
– QANet:Yu,AdamsWei,etal.2018.QANet:CombiningLocalConvolutionwithGlobalSelf-AttentionforReadingComprehension.ProceedingsofICLR.
(KaushikandLiptonEMNLP2018)
24
Someresults
SQuAD,usingQANet
bAbI,usingKey-ValueMemNets
(KaushikandLiptonEMNLP2018)
25
Who-did-What,usingGated-AttentionReaders
CBT,usingGated-AttentionReaders
(KaushikandLiptonEMNLP2018)
26
What’sgoingon??Passage:...glynisbc-nj-zimmer-profile-2takes-nytrahanefumioyasuhirodragnealhadonbjorkman/max...seventh-largestembarrasedjeopardyhilariouslymasahisahaibarabajram8-to-24duke/meredithacceding...koiduiraqs2:32:21//www.ironmanlive.com/sagawakyubindeaninternatinoal90-meterkakueitanakaseven-paragraph577,610wendovergolf-lpga-jpnpartner,un-appointeduemazzeicanada-u.s.Question:shinkanemaru,thegravel-voicedback-roombosswhodiedonthursdayaged81,goesdowninhistoryasjapan’smostcorruptpost-warpoliticianafter___________Answer:kakueitanaka
Transportationcompany
Kanemaru’ssecretary
Long-termpolitician
NamenotinGoogle
27
Lessonslearned
• Thisstudywasgreatbutnotperfect—itshouldhavecheckedforpre-existingdependenciesamongtheQandthecandidateAs
• Still,itshows:youmustproviderigorousbaselines,forbothdatasetsandmodels
• Andyoumusttestthatfullcontextisessentialforthetask—nohiddendependenciesanywhere!
28
Wherenextwiththisapproach?
• DesignlargerandfancierNNarchitectures(Feedforward—>BiLSTM—>BiLSTMwithAttentiontoQtypeword—>…)thatbuildincreasinglycomplexgeneralized‘recognizergraphs’
• Inthelimit(withenoughtrainingdata),theywillidentifyallrelevantQparametersinthecorrectconfigurationandpinpointtheA
BUT:Thisworksonlywhenalltherelevantinfoisexplicitlypresent
29
LongAgeneration
FactoidAmatching
FactoidAcalculation
Today
✗More-
complex
FancierNNsthatbuildelaborateA
graphsovertheinput
30
COMPLEXQA:CALCULATING
31
QAsystemdevelopment
• Seriesofspecializedsubtasks:– ProduceaQuestiongraph– GetmultiplecandidateAnswer
texts– Producetheirgraphs– Canonicalizethemintotypes– Matchandgetagoodnessscore– ReranktheAsanddeliverlist
• Whensomeinfoismissing…– ThecandidateAgraphis
disconnected– TheQgraphcannotmatch
anywhere– Needbackgroundknowledge
(andreasoning?)toconnectuptheAgraphfragments
• ModerncommercialQAwork:– InduceAnswertypesfrom
questionlogs,andcreate[thousandsof]latentAdimensions
– Automaticallylearnpatternsassociatedwithanswerdimensions
– Harvestandpreparetonsofanswers—cover98%ofeffectiveQspaceofquestionlogs
32
QAresearchhasdonealltheeasycases…butwhatifsomegraphinfoismissing?
ThecandidateAgraphisdisconnectedTheQgraphcannotmatchanywhere
HowtoconnectuptheAgraphfragments?
SEMTYPE
SEMTYPE
SEMTYPE
SEMTYPE
NameAgent
Loc
SEMTYPE
?Rel1 ?Rel2
?Rel3
?Rel4 33
Needbackgroundknowledge!
Ineverycase,youneedbackgroundknowledge• Easyknowledge:TransformAstringintomatchableform:– Synonymsubstitution– Semantictypenormalization– Parsetreenormalization(tenses,passive–>active,etc.)
• Hard:Addnewnodesandlinksbetweensubgraphs:– Providenewrelations– Provideadditionalnodes
34
Possiblesourcesofthisknowledge• Externalsearch:
– Querysomethinglikethewebandhopetobelucky
• Entailments:“sentence”–>“sentence”– Operateatsurfaceform(inRTEformulation)– Allowonesurfaceformtobestatedwhenanotherisgiven– NewsurfaceformmayprovideAnswer– Need:entailmentrules+entailmentapplier
• Axioms:A∨B–>C– Operateatdeeperlevel– Connectrepresentationsubgraphs,evenprovidingnewnodes– ExpandedgraphmayprovideAnswer– Need:axioms/compositionrules+theoremprover
35
Populartasktoday:QAoverstructureddata
• Data:database,table,etc.• Task:AskQsthatrequire(1)findingvariousbitsofdataand(2)composingthemtomaketheA
• Themissinginformationisthescriptgoverningthesequenceofaccessandcomposition
• Research:howto[learnto]buildthisscript?• Evaluation:didthesystemproducetherightA?• Examples:
– U.S.geographydatabaseof800facts(Zelle&Mooney,1996)– Wikitablequestions(PasupatandLiang,2015;Dasigi2018)– Otherdomains’tables(severalAI2projects)
36
Wikitabledataset
37
Athlete Nation Olympics Medals
Gillis Grafström
Sweden (SWE) 1920–1932 4
Kim Soo-Nyung
South Korea (KOR) 1988-200 6
Evgeni Plushenko Russia (RUS) 2002–2014 4
Kim Yu-na South Korea (KOR) 2010–2014 2
Patrick Chan Canada (CAN) 2014 2
Question:WhichathletewasfromSouthKoreaaftertheyear2010?
Answer:KimYu-Na
Reasoning:1) GetrowswhereNationcolumn
containsSouthKorea2) FilterrowswhereOlympicshas
avaluegreaterthan2010.3) GetvaluefromAthletecolumn
fromfilteredrows.
Program:((reverseathlete)(and (nationsouth_korea) (year((reversedate)
(>=2010-mm-dd)))WikiTableQuestions,PasupatandLiang,2015
DasigiPhDthesis,2018
Example:Dasigithesis2018• Approachforlearningtobuildaccessroutines:
– ParseQ,builddependencytree– ConvertintoLogicalForm– Translateintocandidatetableaccessroutine– Testcompositionbyrepeatedtrialanderror
• Essentially,learningisasearchin‘operatorcombinationspace’tobuildlogicalform.Speedupsearchby– Learningtoassociatetableaccessparameterswithpartsofthetree(Qvariables)
– Learningtoassociatenestingandaccessoperatorswithpartsofthetree(‘operator’words:“themost”,“last”,etc.)
– Predefiningsomelexicon-to-operationmappings– Payingattentiontogrammaticalconstructionofthetree– Implementingheuristicstoguideexploration(‘shortQsfirst’)
DasigiPhDthesis,2018
38
EmpiricalcomparisononWikiTableQuestions
● Requiresapproximatesetoflogicalformsduringtraining
● UsedoutputfromDynamicProgrammingonDenotations(PasupatandLiang,2016)
● Variousmodels:strongs,trees,etc.
● Efficientsearchfollowedbypruningusinghumanannotations
39
(Krishnamurthy, Dasigi and Gardner, 2017)
Theproblemislearningtoconstructtheanswerscript
• Weaksupervisionisnotenough:– Incorporatingknowledgeofgrammarconstraintshelps– Handlingspuriousexamples(rightanswerforwrongreasons)– Consideringcoverageofcases:Useoverlapasameasuretoguidesearch
• CombineintosingleObjective:Minimizeexpectedvalueofcost(Goodman,1996;GoelandByrne,2000;SmithandEisner,2005)
withalinearcombinationofcoverageanddenotationcosts
• Implementiterativesearch(fromsimplertomorecomplex)40
ResultsoftrainingwithiterativesearchonWikiTableQuestions
41
● Similar trend in 2 domains (NLVR and WikiTableQuestions) ● Used functional query language (Liang et al., 2018)
(Dasigi,Gardner,Murty,Zettlemoyer,Hovy2018)
Wherenextwiththisapproach?
• BetterwaystolearntobuildtheAretrieval+compositionscript(=mappingfromEnglishQtonestedoperatorsandvariables):– LearningfromcorporaofQsandscripts(likeMooneyetal.)
– Learningbyguidedsearchthrough‘operatorspace’(likeDasigiandothers)
• Waystoprovecorrectnessofthescript42
LongAgeneration
FactoidAmatching
FactoidAcalculation
Today
✗More-
complex More-
complex
FancierNNsthatbuildelaborateA
graphsovertheinput
MethodstobuildAscriptsfromaccessintostructuredKBs
43
THECONUNDRUM
44
Either…
AllinfoneededtogettheAispresentintheQcontext
…sosomeformofsurfacematching+sub-Acompositionsuffices
—>Ultimately,nestedsimpleQA(…stillOK?)
Or…
GettingtheArequiresinformationnotintheQcontext:backgroundknowledge,calculation,etc.
…butthisisnotstandardized,henceimpossibletoevaluate
—>NocomplexQA!?45
TWOPATHSFORWARD
46
Twotypesof‘complex’QA
Matching (Fact[oid]-oriented)
• CreateQsneedingmultiplematches+dynamiccomposition
• Buildthisdataset• Evaluatethecomposition
Computing(Procedure-oriented)
• IdentifygeneralbackgroundknowledgeallQAsystemsshouldhave
• Buildthedataset(s)ofQsandAsrequiringit
Howtodeepenthis? Howtodeepenthis?
47
1.Makingmatchingmorecomplex:RACE:Abettertestbed
• RACE:ReAdingComprehensiondatasetfromExaminations(Lai,Xie,Liu,Yang,Hovy,EMNLP2018)
• CollectedfromChinesemiddleandhighschoolexamsthatevaluatehumanstudents’Englishreadingcomprehensionability– Designedbyhumanexperts:Ensuresqualityandbroadtopiccoverage
– SubstantiallymoredifficultthanexistingQAdatasets(butRACE-MeasierthanRACE-H)
– About4/5ofsourcematerialfilteredouttoremoveduplicates,incorrectformat,etc.
• Aftercleaning:27,933passages;97,687questions
(Laietal.2018)
48
49
Kindsofmore-complexmatching• Detailmatching:understanddetailsofthepassage
• ParaphrasingQs:testlanguageability• Whole-picturematching:comprehendtheentirestory
• PassagesummarizationQs:understandthepoint• Attitudematching:findopinions/attitudesoftheauthortowardssomething
• WorldknowledgeQs:useexternalknowledgesuchassimplearithmetic
(Laietal.2018)
50
ComparisonwithotherQAdatasets• Reasoningquestions:59.2%ofRACE;20.5%ofSQuAD• Processingtypes:
– Wordmatching:exactmatch– Paraphrasing:paraphraseorentailment– Single-sentreasoning:incompleteinfoorconceptualoverlap
– Multi-sentreasoning:synthesizinginformationfrommultiplesentences
– Insufficient/Ambiguous:noA,orAisnotunique
(Laietal.2018)
51
ComparingQAalgorithms
• Baselines:– SlidingWindow:TF-IDFbasedmatchingalgorithm– StanfordAttentionReader(AR)andGatedAttentionReader:state-of-the-artneuralmodels
• RACEhashigherhumanceilingperformance,whichshowsthedataisquiteclean
• RACEisharderforARmodels,provingasignificantgapremains
(Laietal.2018)
52
Matchingtypeperformance
• TurkersandSlidingWindowaregoodatsimplematchingquestions
• Surprisingly,StanfordARdoesnothaveabetterperformanceonmatchingquestions
(Laietal.2018)
53
MovingaheadonMatchingQA
• Identifythetypesofmore-complexmatching,matchcomposition/nesting,etc.
• BuilddatasetslikeRACEthat– Comefromtherealworld– Donotsupportmatchingonsimplemulti-factpresence
– Showarealgapbetweenhumanandsystem
• EvaluatecorrectnessANDcomposition(requiretheanswertraceplusitscomponentfactoids)
54
2.Makingcomputationmorecomplex:Inferenceofvariouskinds
• DefineNself-containedstandardized‘domainspecialists’(KBs+reasoners)thatanyQAenginecanactivate
• Atrun-time,analyzetheQ,buildtheAscript,activatethespecialistsasneeded,computetheA
Arithmetic
GeographyPsych:goals
SocialcustomsPhysics
55
Examples
• WhatisthelargestcapitalcitysouthofSantiagodeChile?– Geographicknowledge(lat-long,population)– Numericalability(sorting,etc.)
• WhichoftheleadersoftheXYZenterprisearewell-liked,andwhy?– Discoveryofsocialrolebyactions– Sentimentjudgmentsattachedtoactions
56
Researchneeded
• Foreachdomainspecialist:– Defineits‘knowledgeservice’– Createtheunderlyingknowledge– DefinetheI/OAPIsfortheQAenginetouse– Buildthespecialist
• ForeachQAengine:– AnalyzetheQtodetermineparametersandneed– Decomposetheneedintoascriptofspecialistqueriesplustheirresultcomposition
– Execute57
Somespecialistareaswearecurrentlyworkingoninmygroup
1. Arithmetic/numericalreasoningforentailment
2. Psychgoalsforsentimentjustification3. Socialrolesforgroupactivitysupport
58
Topic1.Numericalcalculation• Task:Entailmentproblem• Input:clausescontainingnumbers• Output:entailed/not-entailed
• Impact/need:– ContradictionpairsfromWikipediaandGoogleNews:29%fromnumericdiscrepancies(deMarneffeetal.ACL2008)
– SeveralRecognizingTextualEntailmentdatasets:numericcontradictionsare8.8%ofcontradictorypairs(Daganetal.RTE2006)
(Ravichander,Naik,Rose,Hovy,2019)
P:AbombinaHebrewUniversitycafeteriakilledfiveAmericansandfourIsraelisH:AbombingatHebrewUniversityinJerusalemkilledninepeople,includingfiveAmericans
59
EQUATE
• Modelsofquantitativereasoningshould:– Interpretquantitiesexpressedinlanguage– Performbasicarithmeticcalculations– Justifyquantitativeclaimsbycombiningverbalandnumericreasoning
• Currentinferencedatasetsdonottestthis• OurworkEQUATE:
– Acorpusofnumericalpairswithentailmentlabels– Abenchmarkevaluationframeworktotestmodelabilitytoperformquantitativereasoningfornaturallanguageinference,combining9previousapproaches
(Ravichanderetal.,2019)
60
EQUATEcorpusDataset Size Clas
sesSynthetic
DataSource
AnnotationSource
QuantitativePhenomena
StressTest 7500 3 ✓ AQuA-RAT Automatic Quantifiers
RTE-Quant 166 2 ✗ RTE2-RTE4 Expert Arithmetic,Worldknowledge,Ranges,Quantifiers
AwpNLI 722 2 ✓ ArithmeticWordProblems
Automatic Arithmetic
NewsNLI 1000 2 ✗ CNN Crowd-sourced
Ordinals,Quantifiers,Arithmetic,WorldKnowledge,Magnitude,Ratios
RedditNLI 250 3 ✗ Reddit Expert Range,Arithmetic,Approximation,Verbal
(Ravichanderetal.,2019)
61
Baselines(SOTAmethods)• MajorityClass(MAJ):Simplebaselinealwayspredictsthemajorityclassintestset.• Hypothesis-Only(HYP):FastTextclassifiertrainedononlyhypothesestopredictthe
entailmentrelation(Gururanganetal.2018)• ALIGN:Abag-of-wordsalignmentmodelinspiredbyMacCartney(2009)• NB(NieandBansal2017):SentenceencoderconsistingofstackedBiLSTM-RNNs
withshortcutconnectionsandfine-tuningofembeddings.Achievestopnon-ensembleresultintheRepEval-2017sharedtask
• CH(Chenetal.2017):SentenceencoderconsistingofstackedBiLSTM-RNNswithshortcutconnections,character-compositionwordembeddingslearnedviaCNNs,intra-sentencegatedattentionandensembling.AchievesbestoverallresultintheRepEval-2017sharedtask
• RC(Balazsetal.2017):Single-layerBiLSTMwithmeanpoolingandintra-sentenceattention
• IS(Conneauetal.2017):Single-layerBiLSTM-RNNwithmax-pooling,showntolearnrobustuniversalsentencerepresentationsthattransferwellacrossinferencetasks
• BiLSTM:WereimplementthesimpleBiLSTMbaselinemodelofNangiaetal.(2017).OurreimplementationachievesslightlybetterresultsontheMultiNLIdevset
• CBOW:Bag-of-wordssentencerepresentationfromwordembeddingspassedthroughatanhnon-linearityandasoftmaxlayerforclassification.
(Ravichanderetal.,2019)
62
Constructingentailmentinferences• Generateareport
foreachpremise-hypothesispair,consistingof:– ExtractedNUMSETSforpremiseandhypothesis
– WhichNUMSETSwerecombinedandbywhatoperation
– WhichNUMSETSwerejustifiedandwhichweren’t
• Combinesneuralandsymbolicprograms– Somesubmodulesareneural;overallframeworkissymbolic– Lightweightsupervision
(Ravichanderetal.,2019)
63
Topic2.Humangoals• ComplexQAdomain:humangoalsandsentiment
– FindingsentimentHolder,Topic+Facet,andValencearerelativelyeasy
– Example:“Ilovedthehotel’spricebuttheroomwasnoisy”—>[price+][room-]
• Task:sentimentjustification:WHYdoestheHolderhavethatsentimentvalenceforthatfacet?
• Approach:Classifyeachclauseintoalistofhuman(psychologicalandsocial)goals– Input:Sentiment-bearingclause– Output:Sentimentvalencelabel,facet,humangoal(s)thatjustifysentimentvalence
(OtaniandHovy,2019)
64
Psychgoals
• Technicalapproach:Automatedclassificationintotaxonomiesofpsych(notsocial)goals– Initialset:Maslowhierarchy(seeWikipedia)– Currently:about110humangoals,identifiedandtaxonomized(Talevichetal.)
• Domains:ReviewsofHotels,Cameras,Movies• Data:Crowdsourcingtrainingmaterial;kappaagreement≈0.55
• Results:traditionalandneuralclassifiersdobetteronobjectsandpooreronevents/complexthingslikemovies
(OtaniandHovy,2019)
65
(Talevichetal.2017)
66
Topic3.Socialroles
• ComplexQAdomain:Humaninteractionsingroups
• Task:Automatedsocialrolediscovery– Input:Discussionsinasocialmediaplatform– Output:Rolelist,andassignmentforeachuser
• Data:– Wikipediaeditors:ourroletaxonomyconformstoWikipedia’sinternalset
– CancerSurvivorNetworkdiscussiongroups
(Yang,Kraut,Hovy,2019)
67
LatentrolemodelinWikipedia
Role: distribution of edit actions
Role proportions
Role assignment for user u and word n
Edit actions
(Yang,Kraut,Hovy,2019)
68
User edit history Role assignments
Information_insertion 0.4 Reference_insertion 0.2 ….
Roles
Grammar 0.2 Markup_deletion 0.1 Rephrase 0.1 ….
Wikilink_insertion 0.2 Wikilink_deletion 0.1 ….
69
(Yang,Kraut,Hovy,2019)
Discoverededitorroles(namingbyexpert)Expert’s role name Discovered representative behavior
Substantive Expert Information insertion, wikilink insertion, reference insertion
Social Networker Main talk namespace, user namespace
Vandal Fighter Reverting, user talk namespace
Quality Assurance Wikilink insertion, wikipedia namespace, template namespace
Fact Checker Information deletion, wikilink deletion, reference deletion
Cleanup Worker Wikilink modification, template insertion, markup modification
Fact Updater Template modification, reference modification
Copy Editor Grammar, paraphrase, relocation 70
(Yang,Kraut,Hovy,2019)
Topics4–.Otherinferencespecialists
• GeographyandTime… (see(Allen,CACM1983)and(Davis,JAIR2017))– E.g.:north-of,area-included-in-region…
• Physics,Biology… (seetheHALOproject)– RecentworkonaspectsofPhysicsatAI2(Clarketal.)
• Emotions
71
Physics:noun-nouncompounds
Whereis…
• …thekitchentable• …thecoffeetable• …thewoodtable• …theteacher’stable• …thedatatable
• Needtoknowtherelationandthenountypestoinferadditionalinfo:
• LOC• FUNCTIONèLOC• MATERIAL• ?FUNCTIONèLOC?• TYPESèCONTENTèLOC?
72
Physics:SomeBigProblems!• “Salt(Na+Cl-)isawhitepowderwithasaltytaste.Asyou
cansee,itisanioniccompound.Youwillseethepowderdissolvewhenyouputitintowater.”– DoestheformulaNa+Cl-haveasaltytaste?– Isthepowdertheformula?Canyouwriteapowder?– Doesthetastedissolve?Orthewhiteness?
• Alotofinformationishidden,andalotassumed:– Knowledgegaps:explicitlinksbetweenonetermandanother– Omissions:missing(assumedknown?)information
LanguageisfullofwhatPeterClarkcalls‘loosespeak’
73
WherenextwithCalculationQA?
• Identifyandbuildthemostusefuldomainspecialists– Findbasicknowledgeprimitives– Developreasoninglogics,models,andimplementations
– Develop/findQAdatasetsthatexercisethissortofspecialistknowledgeandreasoning
• Createacommonlibraryforalltoshare• EvaluatecorrectnessANDAnswerproductionscripts(traces,as‘explanation’)
Greatoverviewin(Davis,JAIR2018)
74
LongAgeneration
FactoidAmatching
FactoidAcalculation
Today
✗More-
complex More-
complex
FancierNNsthatbuildelaborateA
graphsovertheinput
MethodstobuildAscriptsfromaccessintostructuredKBs
Domainspecialists
75
THANKYOU
76