
From Simple to Complex QA

Eduard Hovy, CMU

www.cs.cmu.edu/~hovy

Webclopedia QA, 2003

•  Where are zebras most likely found? Answer: in the dictionary

•  Where do lobsters like to live? Answer: on the table

•  How many people live in Chile? Answer: nine

Webclopedia (Hovy et al. 2001)

•  What is an invertebrate? Answer: Dukakis

Two ways to make things more complex

1.   Complex answers: Long, non-factoid responses. Connects to text summarization, text planning, NL generation

2.   Complex answering: A procedure more than simple atomic pattern matching

We leave complex answers for another time

So, what to do?

What is ‘complex QA’?

[Diagram. Labels: Long A generation (✗); Factoid A matching; Factoid A calculation; Today; More-complex]

Outline

•  Simple QA: Matching
•  Complex QA: Calculating
•  The Conundrum
•  Two Paths Forward

SIMPLE QA: MATCHING

Basic simple QA architecture

Input Q

•  Identify keywords from Q
•  Build (Boolean) query for IR
•  Retrieve texts using IR
•  Rank texts/passages
•  Find specified Q type
•  Move A patterns over text and score each position
•  Rank windows; return top N

A list

Funnel: 1M documents, 3000 sentences, 50 candidates, 5 answers

Corpus: 30%

+ Web: add 10%

Example A patterns:
… X was born in <YEAR> …
… X was born on <DATE> …
… X (<YEAR>–<YEAR>) …
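The pattern-matching step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Webclopedia implementation: the pattern list `BIRTH_PATTERNS`, the function `find_birth_candidates`, and the scoring rule (one point per pattern match) are all hypothetical.

```python
import re

# Hypothetical surface patterns for a "when was X born" question.
# {name} is filled with the escaped question entity; the named group
# "answer" marks the slot that holds the candidate answer.
BIRTH_PATTERNS = [
    r"{name} was born in (?P<answer>\d{{4}})",
    r"{name} was born on (?P<answer>[A-Z][a-z]+ \d{{1,2}}, \d{{4}})",
    r"{name} \((?P<answer>\d{{4}})\s*[–-]\s*\d{{4}}\)",
]


def find_birth_candidates(name, passages):
    """Slide each pattern over every passage and count matches per answer."""
    scores = {}
    for template in BIRTH_PATTERNS:
        regex = re.compile(template.format(name=re.escape(name)))
        for passage in passages:
            for match in regex.finditer(passage):
                answer = match.group("answer")
                scores[answer] = scores.get(answer, 0) + 1
    # Rank candidates by how many pattern matches support them.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    passages = [
        "Mozart was born in 1756 in Salzburg.",
        "Wolfgang Amadeus Mozart (1756–1791) was a composer.",
        "Mozart was born on January 27, 1756.",
    ]
    print(find_birth_candidates("Mozart", passages))
```

A real system would score window positions with learned weights and type checks; counting matches is the simplest stand-in for that ranking step.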

Patterns

The main research lies in building/learning, specializing, and matching A graphs/patterns:

– Manual creation and string matching

– Learned as per Info Extraction: recursive pattern learning, simple matching

– Learned in neural architectures, matching is automatic

Increasingly automated

Pattern strings, trees, frames, graphs…

In fact, all QA processing can be seen as matching a Question graph against a set of (normalized/expanded) candidate Answer graphs, with the best match(es) providing the Answer

With enough pattern engineering…

With enough pattern engineering and some specialized reasoning (rhymes, geospatial and temporal reasoning, etc.)…

…you get IBM’s Watson

[Figure: Watson’s incremental progress in Precision and Confidence. Axes: Precision (0% to 100%) against % Answered (0% to 100%). Curves: Baseline, 12/2007, 5/2008, 8/2008, 12/2008, 5/2009, 10/2009, 4/2010, 11/2010. Slide by IBM.]

The ultimate pattern technology

•  A pattern must capture
– The sequence of relevant words/types
– The target word/type [family of synonyms]

•  NNs do exactly this
– Ngram models of word embeddings
– Embeddings that generalize words

Embeddings make pattern learning and engineering much easier

•  A word embedding is a ‘local generalized ngram model’ for the word
–  captures the word’s general context expectations
–  captures (semantic) substitutions of the word

•  A sentence embedding is a ‘generalized ngram sequence model’ for the Question/sentence
–  Ngrams are weighted in different ways depending on NN architecture (direction of LSTM, attention, etc.)
–  People invent architectures to highlight (or smooth out, using adversarial approaches) specific parts
–  The important parts (words for Q parameters, inter-part relations, the A introducer words, etc.) form the ‘pattern’

Pattern-level evolution

Method: Resource
•  String match: Surface pattern library
•  Synonym substitution: Synonym dictionary, WN
•  Typed pattern match: QA typology / Type hierarchy
•  Parse tree match: Parser + Tree transform rules
•  Shallow semantic match: Semantic analyzer + Type hierarchy + partial unifier
•  Embedding match: Simple NN; Attention-focused BiLSTM, etc.

BUT: Patterns that are too good give a false sense of accomplishment

Trained with enough embeddings…

…the NN’s word and word-combination models capture world knowledge factoids and relations (incl. their surface-level cues for semantic structure)

So you get what looks like semantic QA

Did you know you are an expert on the Panama Canal?

Blah Panama Canal blah blah Panama blah Pres. Roosevelt blah USA blah blah blah blah blah 10 years blah until 1914 blah blah blah blah 51 miles blah blah blah blah blah blah blah blah blah blah blah blah blah 8 to 10 hours blah blah blah blah Gatun Lake blah

•  When was the Panama Canal completed?
•  How long is the Panama Canal?
•  How long did it take to build the Panama Canal?
•  How long does it take to cross the Panama Canal?
•  What is the lake in the Panama Canal called?
•  Which US President enabled the Panama Canal?
•  Which oceans does the Panama Canal connect?

In your training data, you have surely seen “Panama Canal” with only two ocean names…

How powerful are ngram pattern models?

•  CLOTH: Large-scale Cloze Test Dataset Created by Teachers (Xie, Lai, Dai, Hovy, EMNLP 2018)

•  Large-scale cloze test dataset collected from English exams in China (Middle and High school level)
–  After cleanup: 7k passages; 99k questions (2/3 removed)

•  The dropped words and word options were carefully created by teachers:
–  Highly nuanced alternatives
–  Test knowledge of grammar, vocabulary, reasoning
–  How well do state-of-the-art computational models do in comparison to humans? (1-billion-word language model)

[Figure: percentages of test examples by question type: tense, voice, preps; local content words; copy/paraphrase words; content words, long-distance dependencies (Xie et al. 2018)]

QA system results

•  Even a 1B-LM still lags behind human performance
•  Increasing the context length for 1B-LM does not help
•  However: human-created questions are different

(Xie et al. 2018)

So, I conclude: Given enough training data…

(and assuming the Q provides context text)

…you will always learn good patterns / word combination graphs that connect Q parameters <-> Q context material <-> A…

(If you have not seen the necessary combinations, you won’t be able to answer the Q)

…to the degree that…

…you may not even need the Q context (or if you have the Q context, you may not need the Q!!):

•  Corrupted ngrams and other SQuAD perturbations (Jia and Liang, EMNLP 2017)

•  Necessity of Q context or even of Q itself (Kaushik and Lipton, EMNLP 2018, Best Short Paper award)

AHEM!!

Kaushik and Lipton EMNLP 2018

•  Research goal:
–  How strong are models that see the question only?
–  What about models that see the Q context passage only?
–  How do we know models are really “reading” the whole passage?

•  Question-only setting:
–  If passage needed for engine, randomize its words first
–  If candidate As needed, they are placed in random spots, intervening text filled with gibberish

•  Passage-only setting:
–  Create corrupted versions of each dataset: assign Q to some passage randomly
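The two corruption settings can be sketched as follows. This is one illustrative reading of the slide, not the authors' released code: the function names, the `vocab` list, and the dictionary keys `"passage"` and `"question"` are assumptions.

```python
import random


def question_only_passage(vocab, candidates, length, seed=0):
    """Build a gibberish passage and drop candidate answers at random spots.

    The question is left intact; the passage carries no real information.
    """
    rng = random.Random(seed)
    tokens = [rng.choice(vocab) for _ in range(length)]
    positions = rng.sample(range(length), len(candidates))
    for position, candidate in zip(positions, candidates):
        tokens[position] = candidate
    return tokens


def passage_only_variant(examples, seed=0):
    """Keep each passage but pair it with a question from another example."""
    if len(examples) < 2:
        raise ValueError("need at least two examples to reassign questions")
    rng = random.Random(seed)
    corrupted = []
    for i, example in enumerate(examples):
        other = rng.choice([k for k in range(len(examples)) if k != i])
        corrupted.append({**example, "question": examples[other]["question"]})
    return corrupted
```

If a model trained on either corrupted version scores close to the full-input model, the dataset does not force the model to combine question and passage.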

Example: Q only

Passage: ... glynis bc-nj-zimmer-profile-2takes-nyt rahane fumio yasuhiro dragnea lhadon bjorkman/max ... seventh-largest embarrased jeopardy hilariously masahisa haibara bajram 8-to-24 duke/meredith acceding ... koidu iraqs 2:32:21 //www.ironmanlive.com/ sagawa kyubin dean internatinoal 90-meter kakuei tanaka seven-paragraph 577,610 wendover golf-lpga-jpn partner, un-appointed uemazzei canada-u.s.

Question: shin kanemaru, the gravel-voiced back-room boss who died on thursday aged 81, goes down in history as japan’s most corrupt post-war politician after ___________

Answer: kakuei tanaka

(Kaushik and Lipton EMNLP 2018)

Experiments

•  Datasets/tests:
–  Span selection: SQuAD, TriviaQA
–  Cloze queries: Children’s Book Test (CBT), CNN, CLOTH, Who-did-What, DailyMail
–  Multi-class classification (implicit): bAbI (20 tasks)
–  Multiple-choice question answering: RACE, MCTest
–  Answer generation: MS MARCO

•  Algorithms:
–  Key-Value Memory Networks: Miller, Alexander, et al. 2016. Key-Value Memory Networks for Directly Reading Documents. Proceedings of the EMNLP conference.
–  Gated Attention Readers: Dhingra, Bhuwan, et al. 2017. Gated-Attention Readers for Text Comprehension. Proceedings of the ACL conference (Long Papers).
–  QANet: Yu, Adams Wei, et al. 2018. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. Proceedings of ICLR.

Some results

[Result tables: SQuAD, using QANet; bAbI, using Key-Value MemNets; Who-did-What, using Gated-Attention Readers; CBT, using Gated-Attention Readers]

What’s going on??

(Same passage, question, and answer as in the Q-only example: the answer “kakuei tanaka” sits inside the gibberish passage.)

Annotations on the passage:
•  Transportation company
•  Kanemaru’s secretary
•  Long-term politician
•  Name not in Google

Lessons learned

•  This study was great but not perfect: it should have checked for pre-existing dependencies among the Q and the candidate As

•  Still, it shows: you must provide rigorous baselines, for both datasets and models

•  And you must test that full context is essential for the task, with no hidden dependencies anywhere!

Where next with this approach?

•  Design larger and fancier NN architectures (Feedforward -> BiLSTM -> BiLSTM with Attention to Q type word -> …) that build increasingly complex generalized ‘recognizer graphs’

•  In the limit (with enough training data), they will identify all relevant Q parameters in the correct configuration and pinpoint the A

BUT: This works only when all the relevant info is explicitly present

[Diagram. Labels: Long A generation (✗); Factoid A matching; Factoid A calculation; Today; More-complex: fancier NNs that build elaborate A graphs over the input]

COMPLEX QA: CALCULATING

QA system development

•  Series of specialized subtasks:
–  Produce a Question graph
–  Get multiple candidate Answer texts
–  Produce their graphs
–  Canonicalize them into types
–  Match and get a goodness score
–  Rerank the As and deliver list

•  When some info is missing…
–  The candidate A graph is disconnected
–  The Q graph cannot match anywhere
–  Need background knowledge (and reasoning?) to connect up the A graph fragments

•  Modern commercial QA work:
–  Induce Answer types from question logs, and create [thousands of] latent A dimensions
–  Automatically learn patterns associated with answer dimensions
–  Harvest and prepare tons of answers, covering 98% of the effective Q space of question logs

QA research has done all the easy cases… but what if some graph info is missing?

•  The candidate A graph is disconnected
•  The Q graph cannot match anywhere

How to connect up the A graph fragments?

[Diagram: disconnected graph fragments with nodes labeled SEMTYPE, Name, Agent, and Loc, joined by unknown relations ?Rel1, ?Rel2, ?Rel3, ?Rel4]

Need background knowledge!

In every case, you need background knowledge

•  Easy knowledge: Transform A string into matchable form:
–  Synonym substitution
–  Semantic type normalization
–  Parse tree normalization (tenses, passive -> active, etc.)

•  Hard: Add new nodes and links between subgraphs:
–  Provide new relations
–  Provide additional nodes

Possible sources of this knowledge

•  External search:
–  Query something like the web and hope to be lucky

•  Entailments: “sentence” -> “sentence”
–  Operate at surface form (in RTE formulation)
–  Allow one surface form to be stated when another is given
–  New surface form may provide Answer
–  Need: entailment rules + entailment applier

•  Axioms: A ∨ B -> C
–  Operate at deeper level
–  Connect representation subgraphs, even providing new nodes
–  Expanded graph may provide Answer
–  Need: axioms / composition rules + theorem prover

Popular task today: QA over structured data

•  Data: database, table, etc.
•  Task: Ask Qs that require (1) finding various bits of data and (2) composing them to make the A
•  The missing information is the script governing the sequence of access and composition
•  Research: how to [learn to] build this script?
•  Evaluation: did the system produce the right A?
•  Examples:
–  U.S. geography database of 800 facts (Zelle & Mooney, 1996)
–  Wikitable questions (Pasupat and Liang, 2015; Dasigi 2018)
–  Other domains’ tables (several AI2 projects)

Wikitable dataset

Athlete            Nation              Olympics     Medals
Gillis Grafström   Sweden (SWE)        1920–1932    4
Kim Soo-Nyung      South Korea (KOR)   1988-200     6
Evgeni Plushenko   Russia (RUS)        2002–2014    4
Kim Yu-na          South Korea (KOR)   2010–2014    2
Patrick Chan       Canada (CAN)        2014         2

Question: Which athlete was from South Korea after the year 2010?

Answer: Kim Yu-Na

Reasoning:
1)  Get rows where Nation column contains South Korea
2)  Filter rows where Olympics has a value greater than 2010.
3)  Get value from Athlete column from filtered rows.

Program: ((reverse athlete) (and (nation south_korea) (year ((reverse date) (>= 2010-mm-dd)))))

WikiTableQuestions, Pasupat and Liang, 2015

Dasigi PhD thesis, 2018
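The three reasoning steps can be executed directly over the table. The sketch below is only an illustration in plain Python, not the logical-form executor used in the cited work; `TABLE`, `years_in`, and `athletes_from` are hypothetical names, and the comparison follows the program's `>=` rather than the "greater than" wording of step 2.

```python
import re

# Rows copied from the slide; the "Olympics" cell is kept as a raw string,
# including the truncated "1988-200" entry exactly as it appears there.
TABLE = [
    {"Athlete": "Gillis Grafström", "Nation": "Sweden (SWE)", "Olympics": "1920–1932", "Medals": 4},
    {"Athlete": "Kim Soo-Nyung", "Nation": "South Korea (KOR)", "Olympics": "1988-200", "Medals": 6},
    {"Athlete": "Evgeni Plushenko", "Nation": "Russia (RUS)", "Olympics": "2002–2014", "Medals": 4},
    {"Athlete": "Kim Yu-na", "Nation": "South Korea (KOR)", "Olympics": "2010–2014", "Medals": 2},
    {"Athlete": "Patrick Chan", "Nation": "Canada (CAN)", "Olympics": "2014", "Medals": 2},
]


def years_in(cell):
    """Return every four-digit year mentioned in a table cell."""
    return [int(year) for year in re.findall(r"\d{4}", cell)]


def athletes_from(table, nation, min_year):
    # Step 1: rows whose Nation column contains the nation name.
    rows = [row for row in table if nation in row["Nation"]]
    # Step 2: keep rows where some Olympics year is >= min_year.
    rows = [row for row in rows if any(y >= min_year for y in years_in(row["Olympics"]))]
    # Step 3: read the Athlete column of the surviving rows.
    return [row["Athlete"] for row in rows]


if __name__ == "__main__":
    print(athletes_from(TABLE, "South Korea", 2010))
```

The hard part discussed in the talk is not running such a script but learning to produce it from the English question.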

Example: Dasigi thesis 2018

•  Approach for learning to build access routines:
–  Parse Q, build dependency tree
–  Convert into Logical Form
–  Translate into candidate table access routine
–  Test composition by repeated trial and error

•  Essentially, learning is a search in ‘operator combination space’ to build logical form. Speed up search by
–  Learning to associate table access parameters with parts of the tree (Q variables)
–  Learning to associate nesting and access operators with parts of the tree (‘operator’ words: “the most”, “last”, etc.)
–  Predefining some lexicon-to-operation mappings
–  Paying attention to grammatical construction of the tree
–  Implementing heuristics to guide exploration (‘short Qs first’)

Dasigi PhD thesis, 2018

Empirical comparison on WikiTableQuestions

●  Requires approximate set of logical forms during training
●  Used output from Dynamic Programming on Denotations (Pasupat and Liang, 2016)
●  Various models: strings, trees, etc.
●  Efficient search followed by pruning using human annotations

(Krishnamurthy, Dasigi and Gardner, 2017)

The problem is learning to construct the answer script

•  Weak supervision is not enough:
–  Incorporating knowledge of grammar constraints helps
–  Handling spurious examples (right answer for wrong reasons)
–  Considering coverage of cases: Use overlap as a measure to guide search

•  Combine into single Objective: Minimize expected value of cost (Goodman, 1996; Goel and Byrne, 2000; Smith and Eisner, 2005) with a linear combination of coverage and denotation costs

•  Implement iterative search (from simpler to more complex)
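The combined objective can be written out as a worked formula. The notation is an assumption introduced here for illustration, since the slide gives only the verbal description: $x_i$ is a question, $y$ a candidate logical form, $w_i$ the table, $d_i$ the gold denotation, $\mathcal{S}$ a coverage cost, $\mathcal{T}$ a denotation cost, and $\lambda \in [0,1]$ a mixing weight.

```latex
\min_{\theta} \; \sum_{i=1}^{N} \mathbb{E}_{p(y \mid x_i;\, \theta)}
  \bigl[ \mathcal{C}(x_i, y, w_i, d_i) \bigr],
\qquad
\mathcal{C}(x_i, y, w_i, d_i)
  = \lambda \, \mathcal{S}(y, x_i) + (1 - \lambda) \, \mathcal{T}(y, w_i, d_i)
```

Under this reading, a logical form that yields the right denotation but ignores parts of the question (a spurious program) still pays a coverage cost, which is how the objective discourages right answers for wrong reasons.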

Results of training with iterative search on WikiTableQuestions

●  Similar trend in 2 domains (NLVR and WikiTableQuestions)
●  Used functional query language (Liang et al., 2018)

(Dasigi, Gardner, Murty, Zettlemoyer, Hovy 2018)

Where next with this approach?

•  Better ways to learn to build the A retrieval + composition script (= mapping from English Q to nested operators and variables):
–  Learning from corpora of Qs and scripts (like Mooney et al.)
–  Learning by guided search through ‘operator space’ (like Dasigi and others)

•  Ways to prove correctness of the script

[Diagram. Labels: Long A generation (✗); Factoid A matching; Factoid A calculation; Today; More-complex: fancier NNs that build elaborate A graphs over the input; More-complex: methods to build A scripts from access into structured KBs]

THE CONUNDRUM

Either…

All info needed to get the A is present in the Q context

…so some form of surface matching + sub-A composition suffices

-> Ultimately, nested simple QA (…still OK?)

Or…

Getting the A requires information not in the Q context: background knowledge, calculation, etc.

…but this is not standardized, hence impossible to evaluate

-> No complex QA!?

TWO PATHS FORWARD

Two types of ‘complex’ QA

Matching (Fact[oid]-oriented)
•  Create Qs needing multiple matches + dynamic composition
•  Build this dataset
•  Evaluate the composition
How to deepen this?

Computing (Procedure-oriented)
•  Identify general background knowledge all QA systems should have
•  Build the dataset(s) of Qs and As requiring it
How to deepen this?

1. Making matching more complex: RACE: A better testbed

•  RACE: ReAding Comprehension dataset from Examinations (Lai, Xie, Liu, Yang, Hovy, EMNLP 2018)

•  Collected from Chinese middle and high school exams that evaluate human students’ English reading comprehension ability
–  Designed by human experts: Ensures quality and broad topic coverage
–  Substantially more difficult than existing QA datasets (but RACE-M easier than RACE-H)
–  About 4/5 of source material filtered out to remove duplicates, incorrect format, etc.

•  After cleaning: 27,933 passages; 97,687 questions

(Lai et al. 2018)

Kinds of more-complex matching

•  Detail matching: understand details of the passage
•  Paraphrasing Qs: test language ability
•  Whole-picture matching: comprehend the entire story
•  Passage summarization Qs: understand the point
•  Attitude matching: find opinions/attitudes of the author towards something
•  World knowledge Qs: use external knowledge such as simple arithmetic

(Lai et al. 2018)

Comparison with other QA datasets

•  Reasoning questions: 59.2% of RACE; 20.5% of SQuAD

•  Processing types:
–  Word matching: exact match
–  Paraphrasing: paraphrase or entailment
–  Single-sent reasoning: incomplete info or conceptual overlap
–  Multi-sent reasoning: synthesizing information from multiple sentences
–  Insufficient/Ambiguous: no A, or A is not unique

(Lai et al. 2018)

Comparing QA algorithms

•  Baselines:
–  Sliding Window: TF-IDF based matching algorithm
–  Stanford Attention Reader (AR) and Gated Attention Reader: state-of-the-art neural models

•  RACE has higher human ceiling performance, which suggests the data is quite clean

•  RACE is harder for AR models, showing that a significant gap remains

(Lai et al. 2018)

Matching type performance

•  Turkers and Sliding Window are good at simple matching questions

•  Surprisingly, Stanford AR does not perform better on matching questions

(Lai et al. 2018)

Moving ahead on Matching QA

•  Identify the types of more-complex matching, match composition/nesting, etc.

•  Build datasets like RACE that
–  Come from the real world
–  Do not support matching on simple multi-fact presence
–  Show a real gap between human and system

•  Evaluate correctness AND composition (require the answer trace plus its component factoids)

2. Making computation more complex: Inference of various kinds

•  Define N self-contained standardized ‘domain specialists’ (KBs + reasoners) that any QA engine can activate

•  At run-time, analyze the Q, build the A script, activate the specialists as needed, compute the A

[Diagram: example specialists: Arithmetic; Geography; Psych: goals; Social customs; Physics]

Examples

•  What is the largest capital city south of Santiago de Chile?
–  Geographic knowledge (lat-long, population)
–  Numerical ability (sorting, etc.)

•  Which of the leaders of the XYZ enterprise are well-liked, and why?
–  Discovery of social role by actions
–  Sentiment judgments attached to actions

Research needed

•  For each domain specialist:
–  Define its ‘knowledge service’
–  Create the underlying knowledge
–  Define the I/O APIs for the QA engine to use
–  Build the specialist

•  For each QA engine:
–  Analyze the Q to determine parameters and need
–  Decompose the need into a script of specialist queries plus their result composition
–  Execute
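One way to picture the specialist I/O API and the script of specialist queries is a small registry plus an interpreter. Everything in this sketch is hypothetical (the registry, the `$name` reference convention, and the two arithmetic specialists); the talk proposes the architecture but does not specify an interface.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry mapping a service name to a specialist function.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def specialist(name: str):
    """Decorator that registers a function as a named domain specialist."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = fn
        return fn
    return decorator


@specialist("arithmetic.sum")
def arithmetic_sum(values: List[float]) -> float:
    return sum(values)


@specialist("arithmetic.equals")
def arithmetic_equals(left: float, right: float) -> bool:
    return left == right


# A script step is (result name, specialist name, arguments).
Step = Tuple[str, str, List[Any]]


def execute(script: List[Step]) -> Dict[str, Any]:
    """Run the steps in order; "$name" arguments refer to earlier results."""
    results: Dict[str, Any] = {}
    for output_name, specialist_name, args in script:
        resolved = [
            results[arg[1:]] if isinstance(arg, str) and arg.startswith("$") else arg
            for arg in args
        ]
        results[output_name] = REGISTRY[specialist_name](*resolved)
    return results


# Toy script: add two counts, then compare the total with a claimed value.
SCRIPT: List[Step] = [
    ("total", "arithmetic.sum", [[5, 4]]),
    ("matches", "arithmetic.equals", ["$total", 9]),
]

if __name__ == "__main__":
    print(execute(SCRIPT))
```

The returned dictionary doubles as a trace of intermediate results, which fits the idea of evaluating the answer production script as well as the answer.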

Some specialist areas we are currently working on in my group

1.  Arithmetic / numerical reasoning for entailment
2.  Psych goals for sentiment justification
3.  Social roles for group activity support

Topic 1. Numerical calculation

•  Task: Entailment problem
•  Input: clauses containing numbers
•  Output: entailed / not-entailed

•  Impact/need:
–  Contradiction pairs from Wikipedia and Google News: 29% from numeric discrepancies (de Marneffe et al. ACL 2008)
–  Several Recognizing Textual Entailment datasets: numeric contradictions are 8.8% of contradictory pairs (Dagan et al. RTE 2006)

(Ravichander, Naik, Rose, Hovy, 2019)

P: A bomb in a Hebrew University cafeteria killed five Americans and four Israelis
H: A bombing at Hebrew University in Jerusalem killed nine people, including five Americans
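The arithmetic hidden in this premise/hypothesis pair can be made explicit with a toy check: extract the quantities from each sentence and test whether every hypothesis quantity is either stated in the premise or is a sum of premise quantities. This is a minimal sketch under strong assumptions (a ten-word number lexicon, addition only, no units or entity alignment); it is not the EQUATE system, and all names are hypothetical.

```python
import re
from itertools import combinations

# Tiny, assumed lexicon of number words; a real system needs far more.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def extract_numbers(text):
    """Return the quantities mentioned in the text, as integers."""
    numbers = []
    for token in re.findall(r"[a-z]+|\d+", text.lower()):
        if token.isdigit():
            numbers.append(int(token))
        elif token in NUMBER_WORDS:
            numbers.append(NUMBER_WORDS[token])
    return numbers


def justify(premise, hypothesis):
    """Map each hypothesis quantity to True if the premise supports it."""
    premise_numbers = extract_numbers(premise)
    derivable = set(premise_numbers)
    # Add every sum of two or more premise quantities.
    for size in range(2, len(premise_numbers) + 1):
        for combo in combinations(premise_numbers, size):
            derivable.add(sum(combo))
    return {n: n in derivable for n in extract_numbers(hypothesis)}


if __name__ == "__main__":
    P = "A bomb in a Hebrew University cafeteria killed five Americans and four Israelis"
    H = "A bombing at Hebrew University in Jerusalem killed nine people, including five Americans"
    print(justify(P, H))
```

Here "nine" is supported only through the sum 5 + 4, which is exactly the kind of step a purely lexical entailment model has no way to represent.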

EQUATE

•  Models of quantitative reasoning should:
–  Interpret quantities expressed in language
–  Perform basic arithmetic calculations
–  Justify quantitative claims by combining verbal and numeric reasoning

•  Current inference datasets do not test this

•  Our work EQUATE:
–  A corpus of numerical pairs with entailment labels
–  A benchmark evaluation framework to test model ability to perform quantitative reasoning for natural language inference, combining 9 previous approaches

(Ravichander et al., 2019)

EQUATE corpus

Fields per dataset: Size; Classes; Synthetic; Data Source; Annotation Source; Quantitative Phenomena

•  Stress Test: 7500; 3 classes; synthetic ✓; AQuA-RAT; Automatic; Quantifiers
•  RTE-Quant: 166; 2 classes; synthetic ✗; RTE2-RTE4; Expert; Arithmetic, World knowledge, Ranges, Quantifiers
•  AwpNLI: 722; 2 classes; synthetic ✓; Arithmetic Word Problems; Automatic; Arithmetic
•  NewsNLI: 1000; 2 classes; synthetic ✗; CNN; Crowd-sourced; Ordinals, Quantifiers, Arithmetic, World Knowledge, Magnitude, Ratios
•  RedditNLI: 250; 3 classes; synthetic ✗; Reddit; Expert; Range, Arithmetic, Approximation, Verbal

Baselines (SOTA methods)

•  Majority Class (MAJ): Simple baseline that always predicts the majority class in the test set.
•  Hypothesis-Only (HYP): FastText classifier trained on only hypotheses to predict the entailment relation (Gururangan et al. 2018)
•  ALIGN: A bag-of-words alignment model inspired by MacCartney (2009)
•  NB (Nie and Bansal 2017): Sentence encoder consisting of stacked BiLSTM-RNNs with shortcut connections and fine-tuning of embeddings. Achieves top non-ensemble result in the RepEval-2017 shared task
•  CH (Chen et al. 2017): Sentence encoder consisting of stacked BiLSTM-RNNs with shortcut connections, character-composition word embeddings learned via CNNs, intra-sentence gated attention and ensembling. Achieves best overall result in the RepEval-2017 shared task
•  RC (Balazs et al. 2017): Single-layer BiLSTM with mean pooling and intra-sentence attention
•  IS (Conneau et al. 2017): Single-layer BiLSTM-RNN with max-pooling, shown to learn robust universal sentence representations that transfer well across inference tasks
•  BiLSTM: We reimplement the simple BiLSTM baseline model of Nangia et al. (2017). Our reimplementation achieves slightly better results on the MultiNLI dev set
•  CBOW: Bag-of-words sentence representation from word embeddings passed through a tanh non-linearity and a softmax layer for classification.

Constructing entailment inferences

•  Generate a report for each premise-hypothesis pair, consisting of:
–  Extracted NUMSETS for premise and hypothesis
–  Which NUMSETS were combined and by what operation
–  Which NUMSETS were justified and which weren’t

•  Combines neural and symbolic programs
–  Some submodules are neural; overall framework is symbolic
–  Lightweight supervision

Topic 2. Human goals

•  Complex QA domain: human goals and sentiment
–  Finding sentiment Holder, Topic + Facet, and Valence are relatively easy
–  Example: “I loved the hotel’s price but the room was noisy” -> [price +] [room -]

•  Task: sentiment justification: WHY does the Holder have that sentiment valence for that facet?

•  Approach: Classify each clause into a list of human (psychological and social) goals
–  Input: Sentiment-bearing clause
–  Output: Sentiment valence label, facet, human goal(s) that justify sentiment valence

(Otani and Hovy, 2019)

Psych goals

•  Technical approach: Automated classification into taxonomies of psych (not social) goals
–  Initial set: Maslow hierarchy (see Wikipedia)
–  Currently: about 110 human goals, identified and taxonomized (Talevich et al.)

•  Domains: Reviews of Hotels, Cameras, Movies

•  Data: Crowdsourcing training material; kappa agreement ≈ 0.55

•  Results: traditional and neural classifiers do better on objects and poorer on events / complex things like movies

(Otani and Hovy, 2019)

[Figure: taxonomy of human goals (Talevich et al. 2017)]

Topic 3. Social roles

•  Complex QA domain: Human interactions in groups

•  Task: Automated social role discovery
–  Input: Discussions in a social media platform
–  Output: Role list, and assignment for each user

•  Data:
–  Wikipedia editors: our role taxonomy conforms to Wikipedia’s internal set
–  Cancer Survivor Network discussion groups

(Yang, Kraut, Hovy, 2019)

Latent role model in Wikipedia

[Figure: graphical model. Labels: Role: distribution of edit actions; Role proportions; Role assignment for user u and word n; Edit actions]

[Figure: User edit history -> Role assignments -> Roles. Example roles as distributions over edit actions: {Information_insertion 0.4, Reference_insertion 0.2, …}; {Grammar 0.2, Markup_deletion 0.1, Rephrase 0.1, …}; {Wikilink_insertion 0.2, Wikilink_deletion 0.1, …}]


Discovered editor roles (naming by expert)

Expert’s role name: Discovered representative behavior
•  Substantive Expert: Information insertion, wikilink insertion, reference insertion
•  Social Networker: Main talk namespace, user namespace
•  Vandal Fighter: Reverting, user talk namespace
•  Quality Assurance: Wikilink insertion, wikipedia namespace, template namespace
•  Fact Checker: Information deletion, wikilink deletion, reference deletion
•  Cleanup Worker: Wikilink modification, template insertion, markup modification
•  Fact Updater: Template modification, reference modification
•  Copy Editor: Grammar, paraphrase, relocation


Topics 4 onward. Other inference specialists

•  Geography and Time… (see (Allen, CACM 1983) and (Davis, JAIR 2017))
–  E.g.: north-of, area-included-in-region…

•  Physics, Biology… (see the HALO project)
–  Recent work on aspects of Physics at AI2 (Clark et al.)

•  Emotions

Physics: noun-noun compounds

Where is…

•  …the kitchen table: LOC
•  …the coffee table: FUNCTION -> LOC
•  …the wood table: MATERIAL
•  …the teacher’s table: ? FUNCTION -> LOC ?
•  …the data table: TYPES -> CONTENT -> LOC?

•  Need to know the relation and the noun types to infer additional info

Physics: Some Big Problems!

•  “Salt (Na+Cl-) is a white powder with a salty taste. As you can see, it is an ionic compound. You will see the powder dissolve when you put it into water.”
–  Does the formula Na+Cl- have a salty taste?
–  Is the powder the formula? Can you write a powder?
–  Does the taste dissolve? Or the whiteness?

•  A lot of information is hidden, and a lot assumed:
–  Knowledge gaps: explicit links between one term and another
–  Omissions: missing (assumed known?) information

Language is full of what Peter Clark calls ‘loosespeak’

Where next with Calculation QA?

•  Identify and build the most useful domain specialists
–  Find basic knowledge primitives
–  Develop reasoning logics, models, and implementations
–  Develop/find QA datasets that exercise this sort of specialist knowledge and reasoning

•  Create a common library for all to share

•  Evaluate correctness AND Answer production scripts (traces, as ‘explanation’)

Great overview in (Davis, JAIR 2018)

[Diagram. Labels: Long A generation (✗); Factoid A matching; Factoid A calculation; Today; More-complex: fancier NNs that build elaborate A graphs over the input; More-complex: methods to build A scripts from access into structured KBs; Domain specialists]

THANK YOU