
CS388: Natural Language Processing

Greg Durrett

Lecture 20: Information Extraction, SRL, etc.

Administrivia

‣ Final project presentation slots announced

‣ Project 2 due today

This Lecture

‣ How do we represent information for information extraction?

‣ Open Information Extraction

‣ Relation extraction

‣ Slot filling

‣ Semantic role labeling / abstract meaning representation

Representing Information

Semantic Representations

Example credit: Asad Sayeed

‣ “World” is a set of entities and predicates

person: Brutus, Caesar, Obama, Bush, …

president: Obama, Bush, …

stab: (Brutus, Caesar)

‣ Statements are logical expressions that evaluate to true or false

Brutus stabs Caesar: stab(Brutus, Caesar) => true

Caesar was stabbed: ∃x stab(x, Caesar) => true
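The "world as entities and predicates" view can be sketched in a few lines of Python. This is only a toy illustration: the `world` dictionary and the helpers `entities`, `holds`, and `exists` are names invented for this sketch, not something from the lecture.

```python
# Toy "world": each predicate maps to the set of argument tuples for which it is true.
world = {
    "person": {("Brutus",), ("Caesar",), ("Obama",), ("Bush",)},
    "president": {("Obama",), ("Bush",)},
    "stab": {("Brutus", "Caesar")},
}

def entities(world):
    """All entities mentioned anywhere in the world."""
    return {arg for tuples in world.values() for t in tuples for arg in t}

def holds(world, predicate, *args):
    """A ground statement is true iff its argument tuple is in the predicate's extension."""
    return tuple(args) in world.get(predicate, set())

def exists(world, condition):
    """Existential quantification: true iff some entity satisfies the condition."""
    return any(condition(x) for x in entities(world))

# "Brutus stabs Caesar" and "Caesar was stabbed" as logical statements.
brutus_stabs_caesar = holds(world, "stab", "Brutus", "Caesar")
caesar_was_stabbed = exists(world, lambda x: holds(world, "stab", x, "Caesar"))
print(brutus_stabs_caesar, caesar_was_stabbed)
```

The passive sentence never names the stabber, which is why it needs the existential quantifier rather than a ground fact.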

Neo-Davidsonian Events

Example credit: Asad Sayeed

Brutus stabbed Caesar with a knife at the theater on the Ides of March

∃e stabs(e, Brutus, Caesar) ∧ with(e, knife) ∧ location(e, theater) ∧ time(e, Ides of March)

‣ Lets us describe events as having properties

‣ Unified representation of events and entities:

some clever driver in America

∃x driver(x) ∧ clever(x) ∧ location(x, America)
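A minimal sketch of why the event variable is convenient, assuming facts are stored as plain tuples. The identifier `e1`, the helpers `properties` and `add_property`, and the extra `manner` modifier are all hypothetical and exist only for this illustration.

```python
# Each event is an entity with its own identifier; every fact is a
# (predicate, arguments...) tuple that may mention the event variable.
facts = {
    ("stabs", "e1", "Brutus", "Caesar"),
    ("with", "e1", "knife"),
    ("location", "e1", "theater"),
    ("time", "e1", "Ides of March"),
}

def properties(facts, event):
    """Collect the binary properties attached to one event identifier."""
    return {f[0]: f[2] for f in facts if len(f) == 3 and f[1] == event}

def add_property(facts, predicate, event, value):
    """New modifiers are just extra conjuncts; the arity of `stabs` never changes."""
    return facts | {(predicate, event, value)}

props = properties(facts, "e1")
print(props)
```

Adding another modifier, for example `add_property(facts, "manner", "e1", "quickly")`, leaves the `stabs` fact untouched, which is the point of giving the event its own variable.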

Real Text

Barack Obama signed the Affordable Care act on Tuesday. He gave a speech later that afternoon on how the act would help the American people. Several prominent Republicans were quick to denounce the new law.

∃e sign(e, Barack Obama) ∧ patient(e, ACA) ∧ time(e, Tuesday)

Questions marked on the text: which Tuesday? which afternoon? who? ???

‣ Need to impute missing information, resolve coreference, etc.

‣ Still unclear how to represent some things precisely or how that information could be leveraged (several prominent Republicans)

Other Challenges

‣ How to represent temporal information?

Bob and Alice were friends until he moved away to attend college

∃e1 ∃e2 friends(e1, Bob, Alice) ∧ moved(e2, Bob) ∧ end_of(e1, e2)

Bob and Alice were friends until around the time he moved away to attend college

‣ Representing truly open-domain information is very complicated! We don’t have a formal representation that can capture everything

(At least) Three Solutions

‣ Entity-relation-entity triples: focus on entities and their relations (note that entities is pretty broad: can include events like World War II, etc.)

(Lady Gaga, singerOf, Bad Romance)

‣ Crafted annotations to capture some subset of phenomena: predicate-argument structures (semantic role labeling), time (temporal relations), …

‣ Slot filling: specific ontology, populate information in a predefined way

(Earthquake: magnitude=8.0, epicenter=central Italy, …)

Open IE

‣ Entity-relation-entity triples aren’t necessarily grounded in an ontology

Barack Obama signed the Affordable Care act on Tuesday. He gave a speech later that afternoon on how the act would help the American people. Several prominent Republicans were quick to denounce the new law.

(Barack Obama, signed, the Affordable Care act)

(Several prominent Republicans, denounce, the new law)

‣ Extract strings and let a downstream system figure it out

IE: The Big Picture

‣ How do we represent information? What do we extract?

‣ Slot fillers

‣ Entity-relation-entity triples (fixed ontology or open)

‣ Semantic roles

‣ Abstract meaning representation

Semantic Role Labeling / Abstract Meaning Representation

Semantic Role Labeling

Lady Gaga performed a concert for students

[ARG0 Lady Gaga] [V performed] [ARG1 a concert] [ARG2 for students]

‣ Performing event

‣ Subject: Lady Gaga

‣ Object: a concert

‣ Audience: students

A concert was performed by Lady Gaga for students

[ARG1 A concert] [V was performed] [ARG0 by Lady Gaga] [ARG2 for students]

‣ Same event described but the representation looks different

‣ Verb (predicate) associated with several arguments (roles): “Agent”, “Theme”, and “Beneficiary”
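A small sketch of the canonicalization idea: once both sentences carry role labels, the active and passive versions reduce to the same frame. The labeled spans below are hand-written for this example (not system output), and `FUNCTION_WORDS` and `frame` are hypothetical helpers.

```python
# Hand-written role annotations (not system output) for the two sentences.
active = [
    ("ARG0", "Lady Gaga"),
    ("V", "performed"),
    ("ARG1", "a concert"),
    ("ARG2", "for students"),
]
passive = [
    ("ARG1", "A concert"),
    ("V", "was performed"),
    ("ARG0", "by Lady Gaga"),
    ("ARG2", "for students"),
]

# Function words that differ between the two constructions.
FUNCTION_WORDS = {"a", "by", "was", "for"}

def frame(labeled_spans):
    """Map each role to its content words, ignoring surface order and function words."""
    result = {}
    for role, text in labeled_spans:
        words = [w.lower() for w in text.split() if w.lower() not in FUNCTION_WORDS]
        result[role] = " ".join(words)
    return result

same_event = frame(active) == frame(passive)
print(same_event)
```

The word order and the syntactic subject differ between the two sentences, but the role-to-filler mapping is identical.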

VerbNet

verbs.colorado.edu

‣ Defines the semantics of verbs, arguments for every verb in English

Semantic Roles

‣ Related to theta roles in linguistics

‣ Agent (~subject), patient/theme (~object), goal (~indirect object)

ARG0, ARG1, ARG2+ (semantics vary)

‣ “Postprocessing” layer on top of dependency parsing that exposes useful information, canonicalizes across grammatical constructions

Semantic Role Labeling

‣ Identify predicate, disambiguate it, identify that predicate’s arguments

‣ Verb roles from Propbank (Palmer et al., 2005)

Figure from He et al. (2017)

[Figure: example for the verb “quicken”]

Semantic Role Labeling

Figure from He et al. (2017)

‣ Identify predicates (love) using a classifier (not shown)

‣ Identify ARG0, ARG1, etc. as a tagging task with a BiLSTM conditioned on love

‣ Other systems incorporate syntax, joint predicate-argument finding
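Framing argument identification as tagging usually means predicting one BIO tag per token for a given predicate and then decoding the tags into spans. The sketch below covers only that decoding step; the BiLSTM is not shown, the tags are hand-written for the Lady Gaga sentence, and `bio_to_spans` is a name made up for this example.

```python
def bio_to_spans(tokens, tags):
    """Convert per-token BIO tags into (label, text) argument spans."""
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        inside = tag.startswith("I-") and tag[2:] == label
        if label is not None and not inside:
            # The current span ended just before token i.
            spans.append((label, " ".join(tokens[start:i])))
            label, start = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            # Start a new span (a stray I- tag is treated like B-).
            label, start = tag[2:], i
    return spans

tokens = ["Lady", "Gaga", "performed", "a", "concert", "for", "students"]
tags = ["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "I-ARG1", "B-ARG2", "I-ARG2"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

Because the tags are predicted relative to one predicate, a sentence with several predicates would be tagged once per predicate.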

SRL for QA

Shen and Lapata (2007)

‣ Question and several answer candidates

Q: Who discovered prions?

AC1: In 1997, Stanley B. Prusiner, a scientist in the United States, discovered prions…

AC2: Prions were researched by…

Score by matching expected answer phrase (EAP) against answer candidate (AC)

Abstract Meaning Representation

‣ Graph-structured annotation

The boy wants to go

Banarescu et al. (2014)

‣ Superset of SRL: full sentence analyses, contains coreference and multi-word expressions as well

‣ F1 scores in the 60s: hard!

‣ So comprehensive that it’s hard to predict, but still doesn’t handle tense or some other things…
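To make "graph-structured" concrete, here is a sketch of the AMR for "The boy wants to go" written as triples. The variable names and the sense numbers on `want-01` and `go-01` are illustrative assumptions, and `incoming` is a helper invented for this example.

```python
# AMR for "The boy wants to go" as (source, relation, target) triples.
# Variables: w = want-01, b = boy, g = go-01 (sense numbers are illustrative).
amr = [
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-01"),
    ("w", "ARG0", "b"),
    ("w", "ARG1", "g"),
    ("g", "ARG0", "b"),  # reentrancy: the boy is also the one who goes
]

def incoming(amr, node):
    """Role edges pointing at a node; more than one means the graph is not a tree."""
    return [(s, r) for s, r, t in amr if t == node and r != "instance"]

boy_edges = incoming(amr, "b")
print(boy_edges)
```

The node `b` has two incoming role edges, so the structure is a graph rather than a tree; this is how the annotation records that the wanter and the goer are the same entity.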

Abstract Meaning Representation

Flanigan et al. (2016), Lyu et al. (2018)

‣ First predict mapping from concepts to graph nodes (many-to-many)

‣ Then use an edge scoring module similar to dependency parsers to predict edges

‣ Predicting a coherent graph is hard, lots of constraints on it and no dynamic program

Summarization with AMR

Liu et al. (2015)

‣ Merge AMRs across multiple sentences

‣ Summarization = subgraph extraction

‣ No real systems actually work this way (more when we talk about summarization)

Slot Filling

Slot Filling

‣ Most conservative, narrow form of IE

Indian Express - A massive earthquake of magnitude 7.3 struck Iraq on Sunday, 103 kms (64 miles) southeast of the city of As-Sulaymaniyah, the US Geological Survey said, reports Reuters. US Geological Survey initially said the quake was of a magnitude 7.2, before revising it to 7.3.

Slots: magnitude, time, epicenter

Speaker: Alan Clark

“Gender Roles in the Holy Roman Empire”

Allagher Center Main Auditorium

This talk will discuss…

Slots: speaker, title, location

Freitag and McCallum (2000)

‣ Old work: HMMs, later CRFs trained per role

Slot Filling: MUC

Haghighi and Klein (2010)

‣ Key aspect: need to combine information across multiple mentions of an entity using coreference

Slot Filling: Forums

‣ Extract product occurrences in cybercrime forums, but not everything that looks like a product is a product

Portnoff et al. (2017), Durrett et al. (2017)

[Figure annotation: “Not a product in this context”]

Relation Extraction

Relation Extraction

‣ Extract entity-relation-entity triples from a fixed inventory

ACE (2003-2005)

During the war in Iraq, American journalists were sometimes caught in the line of fire

Relations marked in the example: Located_In, Nationality

‣ Use NER-like system to identify entity spans, classify relations between entity pairs with a classifier

‣ Systems can be feature-based or neural, look at surface words, syntactic features (dependency paths), semantic roles

‣ Problem: limited data for scaling to big ontologies

Hearst Patterns

Hearst (1992)

‣ Syntactic patterns especially for finding hypernym-hyponym pairs (“is a” relations)

Y is a X: Berlin is a city

X such as [list]: cities such as Berlin, Paris, and London.

other X including Y: other cities including Berlin

‣ Totally unsupervised way of harvesting world knowledge for tasks like parsing and coreference (Bansal and Klein, 2011-2012)
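A rough sketch of one Hearst pattern ("X such as [list]") as a regular expression. It assumes single capitalized words as list items, which is a simplification; the pattern `SUCH_AS`, the function `hearst_such_as`, and the example sentence are made up for this illustration, and a real system would match noun phrases from a chunker or parser.

```python
import re

# One pattern only: "X such as A, B, and C". Single capitalized words stand in
# for noun phrases, and the hypernym is returned as written (not lemmatized).
SUCH_AS = re.compile(r"(\w+) such as ((?:[A-Z]\w+(?:, and |, | and )?)+)")

def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in re.findall(r"[A-Z]\w+", match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs

pairs = hearst_such_as("He visited cities such as Berlin, Paris, and London.")
print(pairs)
```

Each pair reads as "hyponym is a hypernym", for example Berlin is a city once the plural is normalized.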

Distant Supervision

Mintz et al. (2009)

‣ Lots of relations in our knowledge base already (e.g., 23,000 film-director relations); use these to bootstrap more training data

‣ If two entities in a relation appear in the same sentence, assume the sentence expresses the relation

[Steven Spielberg]’s film [Saving Private Ryan] is loosely based on the brothers’ story (Director)

Allison co-produced the Academy Award-winning [Saving Private Ryan], directed by [Steven Spielberg] (Director)
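The labeling heuristic can be sketched as follows, assuming a tiny hypothetical knowledge base and plain substring matching in place of real entity linking. The third sentence is made up here to show a case that receives no label; `kb` and `distant_labels` are names invented for this example.

```python
# Hypothetical knowledge base of (entity1, entity2) -> relation.
kb = {("Steven Spielberg", "Saving Private Ryan"): "Director"}

sentences = [
    "Steven Spielberg's film Saving Private Ryan is loosely based on the brothers' story",
    "Allison co-produced the Academy Award-winning Saving Private Ryan, directed by Steven Spielberg",
    "Steven Spielberg was born in Cincinnati",  # only one entity of the pair
]

def distant_labels(kb, sentences):
    """Label a sentence with a KB relation whenever both entities appear in it."""
    examples = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples

training_data = distant_labels(kb, sentences)
print(len(training_data))
```

The resulting labels are noisy by construction: a sentence can mention both entities without actually stating the relation, so the classifier trained on them has to tolerate some wrong examples.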

Distant Supervision

Mintz et al. (2009)

‣ Learn decently accurate classifiers for ~100 Freebase relations

‣ Could be used to crawl the web and expand our knowledge base

FewRel

‣ Treats relation classification as a few-shot classification problem

‣ 100 classes x 700 instances, goal is to generalize to each class with just a few instances

‣ BERT can handle this fairly well (Soares et al., 2019)

‣ “FewRel 2.0”: new dataset with “none of the above” type, which makes things much harder

Han et al. (2018), Gao et al. (2019)
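One common way to set up few-shot classification is nearest-prototype matching: average the few labeled examples of each class and assign a query to the closest average. The sketch below illustrates that general idea only and is not claimed to be the method of the papers cited above; the 2-d vectors are made-up stand-ins for sentence embeddings, and the relation names are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(support, query):
    """Pick the relation whose prototype (mean support vector) is closest to the query."""
    prototypes = {relation: mean(vecs) for relation, vecs in support.items()}
    return min(prototypes, key=lambda r: squared_distance(prototypes[r], query))

# 2-way 2-shot episode with made-up 2-d "sentence embeddings".
support = {
    "director": [[1.0, 0.0], [0.8, 0.2]],
    "birthplace": [[0.0, 1.0], [0.2, 0.8]],
}
prediction = classify(support, [0.9, 0.3])
print(prediction)
```

This classifier always picks one of the given classes, so a "none of the above" option would need something extra, such as a distance threshold, which is not shown here.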

Entity Tracking / Procedural Text

Entity Tracking

Slide credit: Aditya Gupta

Recipes Dataset

Kiddon et al. (2016), Bosselut et al. (2018)

‣ Information extraction for “procedural text”: text describing some kind of process

‣ For a recipe: what ingredients are involved at each timestep?

‣ Involves global constraints and being able to model complex entity interactions

Entity Tracking

Slide credit: Aditya Gupta

ProPara Dataset

Dalvi et al. (2018), Gupta and Durrett (2019)

‣ Process paragraphs: predict when objects are created, moved, or destroyed in a scientific process

‣ Structured prediction problem, tied to the particular information conveyed in these paragraphs

‣ Use a neural CRF to make a coherent prediction for each entity
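A sketch of what "coherent prediction" can mean at decoding time: Viterbi search over per-step scores with hard transition constraints, so that an entity cannot be destroyed and then exist again. The three states, the `ALLOWED` table, and the scores are simplifying assumptions for this example and do not reproduce the state space or model of the cited papers.

```python
STATES = ["not_yet", "exists", "destroyed"]

# Hard constraints (assumed for this sketch): which state may follow which.
ALLOWED = {
    "not_yet": {"not_yet", "exists"},
    "exists": {"exists", "destroyed"},
    "destroyed": {"destroyed"},
}

def viterbi(scores):
    """scores[t][state] is a local score; return the best allowed state sequence."""
    best = {s: (scores[0][s], [s]) for s in STATES}
    for step in scores[1:]:
        new_best = {}
        for s in STATES:
            candidates = [
                (score + step[s], path + [s])
                for prev, (score, path) in best.items()
                if s in ALLOWED[prev]
            ]
            new_best[s] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]

# Made-up local scores for one entity over three sentences.
scores = [
    {"not_yet": 0.1, "exists": 0.8, "destroyed": 0.1},
    {"not_yet": 0.0, "exists": 0.2, "destroyed": 0.8},
    {"not_yet": 0.0, "exists": 0.55, "destroyed": 0.45},
]
independent = [max(step, key=step.get) for step in scores]
decoded = viterbi(scores)
print(independent, decoded)
```

Picking the best state at each step independently yields exists, destroyed, exists, which is incoherent; the constrained search returns a sequence that respects the transitions.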

Open IE

Open Information Extraction

‣ Typically no fixed relation inventory

‣ “Open”ness: want to be able to extract all kinds of information from open-domain text

‣ Acquire commonsense knowledge just from “reading” about it, but need to process lots of text (“machine reading”)

TextRunner

‣ Extract positive examples of (e, r, e) triples via parsing and heuristics

Barack Obama, 44th president of the United States, was born on August 4, 1961 in Honolulu

=> Barack_Obama, was born in, Honolulu

‣ Train a Naive Bayes classifier to filter triples from raw text: uses features on POS tags, lexical features, stopwords, etc.

‣ 80x faster than running a parser (which was slow in 2007…)

Banko et al. (2007)

‣ Use multiple instances of extractions to assign probability to a relation
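To illustrate how repeated extractions can raise confidence in a triple, the sketch below uses a simple noisy-or. This is a stand-in chosen for brevity, not the probability model from Banko et al. (2007); the per-extraction probability `p_single=0.5` and the list of extractions (including the erroneous one) are made up.

```python
from collections import Counter

def noisy_or(count, p_single=0.5):
    """Probability that at least one of `count` independent extractions is correct."""
    return 1.0 - (1.0 - p_single) ** count

# Made-up extractions; the same triple found in several sentences counts several times.
extractions = [
    ("Barack_Obama", "was born in", "Honolulu"),
    ("Barack_Obama", "was born in", "Honolulu"),
    ("Barack_Obama", "was born in", "Honolulu"),
    ("Barack_Obama", "was born in", "1961"),  # an extraction error seen once
]
counts = Counter(extractions)
scores = {triple: noisy_or(n) for triple, n in counts.items()}
for triple, score in scores.items():
    print(triple, round(score, 3))
```

A triple seen in many independent sentences ends up with a higher score than a one-off extraction, which is what makes filtering by probability workable on web-scale text.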

Exploiting Redundancy

Banko et al. (2007)

‣ Concrete: definitely true. Abstract: possibly true but underspecified

‣ Hard to evaluate: can assess precision of extracted facts, but how do we know recall?

‣ 9M web pages / 133M sentences

‣ 2.2 tuples extracted per sentence, filter based on probabilities

ReVerb

Fader et al. (2011)

‣ More constraints: open relations have to begin with verb, end with preposition, be contiguous (e.g., was born on)

‣ Extract more meaningful relations, particularly with light verbs

ReVerb

Fader et al. (2011)

‣ For each verb, identify the longest sequence of words following the verb that satisfy a POS regex (V .* P) and which satisfy heuristic lexical constraints on specificity

‣ Find the nearest arguments on either side of the relation

‣ Annotators labeled relations in 500 documents to assess recall
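A simplified sketch of the two steps above: match a verb-to-preposition pattern over POS tags, then take the nearest noun on each side as the arguments. Several things here are assumptions made for the example: the one-letter tag set, the hand-assigned tags, restricting the middle of the pattern to nouns, determiners, adjectives and adverbs, and using single nouns instead of noun phrases. The lexical specificity constraints are omitted.

```python
import re

# One coarse tag character per token (hand-assigned here; a real system would
# run a POS tagger): V verb, P preposition, N noun, D determiner, J adjective,
# R adverb, O other.
RELATION = re.compile(r"V+[NDJR]*P")

def extract(tokens, tags):
    """Return (arg1, relation phrase, arg2) triples from one tagged sentence."""
    tag_string = "".join(tags)  # string offsets equal token offsets
    triples = []
    for match in RELATION.finditer(tag_string):
        # Nearest noun to the left and to the right of the relation phrase.
        left = [i for i in range(match.start()) if tags[i] == "N"]
        right = [i for i in range(match.end(), len(tags)) if tags[i] == "N"]
        if left and right:
            relation = " ".join(tokens[match.start():match.end()])
            triples.append((tokens[left[-1]], relation, tokens[right[0]]))
    return triples

tokens = ["Faust", "made", "a", "deal", "with", "the", "devil"]
tags = ["N", "V", "D", "N", "P", "D", "N"]
triples = extract(tokens, tags)
print(triples)
```

With a light verb like "made", the greedy match keeps "a deal" inside the relation phrase, giving "made a deal with" rather than the uninformative "made".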

QA from Open IE

Choi et al. (2015)

Takeaways

‣ Relation extraction: can collect data with distant supervision, use this to expand knowledge bases

‣ Slot filling: tied to a specific ontology, but gives fine-grained information

‣ Open IE: extracts lots of things, but hard to know how good or useful they are

‣ Can combine with standard question answering

‣ Add new facts to knowledge bases

‣ SRL/AMR: handle a bunch of phenomena, but more or less like syntax++ in terms of what they represent

‣ Many, many applications and techniques