cs388: natural language processing lecture 16: informa;on

9
CS388: Natural Language Processing Lecture 16: Informa;on Extrac;on Greg Durrett Recall: ABen;on the movie was great h 1 h 2 h 3 h 4 <s> ¯ h 1 For each decoder state, compute weighted sum of input states e ij = f ( ¯ h i ,h j ) c i = X j ij h j c 1 Unnormalized scalar weight Weighted sum of input hidden states (vector) le ij = exp(e ij ) P j 0 exp(e ij 0 ) P (y i |x,y 1 ,...,y i-1 ) = softmax(W [c i ; ¯ h i ]) the movie was great Very helpful for seq2seq models Recall: Alterna;ves When do we compute aBen;on? Can compute before or aWer RNN cell Bahdanau et al. (2015) <s> ¯ h 1 c 1 <s> c 1 Luong et al. (2015) AWer RNN cell Before RNN cell; this one is a liBle more convoluted and less standard le le This Lecture How do we represent informa;on for informa;on extrac;on? Open Informa;on Extrac;on Rela;on extrac;on Slot filling Seman;c role labeling / abstract meaning representa;on

Upload: others

Post on 05-May-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS388: Natural Language Processing Lecture 16: Informa;on

CS388:NaturalLanguageProcessingLecture16:Informa;onExtrac;on

GregDurrett

Recall:ABen;on

themoviewasgreat

h1 h2 h3 h4

<s>

h̄1

‣ Foreachdecoderstate,computeweightedsumofinputstates

eij = f(h̄i, hj)

ci =X

j

↵ijhj

c1

‣ Unnormalized scalarweight

‣Weightedsumofinputhidden states(vector)

le

↵ij =exp(eij)Pj0 exp(eij0)

P (yi|x, y1, . . . , yi�1) = softmax(W [ci; ¯hi])

themovie wa

sgreat

‣ Veryhelpfulforseq2seqmodels

Recall:Alterna;ves‣ WhendowecomputeaBen;on?CancomputebeforeoraWerRNNcell

Bahdanauetal.(2015)

<s>

h̄1

c1

<s>

c1

Luongetal.(2015)

‣ AWerRNNcell

‣ BeforeRNNcell;thisoneisaliBlemoreconvolutedandlessstandard

lele

ThisLecture

‣Howdowerepresentinforma;onforinforma;onextrac;on?

‣ OpenInforma;onExtrac;on

‣ Rela;onextrac;on

‣ Slotfilling

‣ Seman;crolelabeling/abstractmeaningrepresenta;on

Page 2: CS388: Natural Language Processing Lecture 16: Informa;on

Represen;ngInforma;on

Seman;cRepresenta;ons

Examplecredit:AsadSayeed

personBrutusCaesarObamaBush…

stabBrutus Caesar

‣ “World”isasetofen;;esandpredicates

presidentObamaBush…

BrutusstabsCaesar stab(Brutus,Caesar)=>true

‣ Statementsarelogicalexpressionsthatevaluatetotrueorfalse

Caesarwasstabbed ∃xstab(x,Caesar)=>true

Neo-DavidsonianEvents

Examplecredit:AsadSayeed

BrutusstabbedCaesarwithaknifeatthetheaterontheIdesofMarch

∃estabs(e,Brutus,Caesar)

‣ Letsusdescribeeventsashavingproper;es‣Unifiedrepresenta;onofeventsanden;;es:

∃xdriver(x)∧clever(x)∧loca;on(x,America)

somecleverdriverinAmerica

∧with(e,knife) ∧loca;on(e,theater)∧;me(e,IdesofMarch)

RealText

BarackObamasignedtheAffordableCareactonTuesday.HegaveaspeechlaterthataCernoononhowtheactwouldhelptheAmericanpeople.SeveralprominentRepublicanswerequicktodenouncethenewlaw. whichTuesday?

‣ Needtoimputemissinginforma;on,resolvecoreference,etc.

‣ S;llunclearhowtorepresentsomethingspreciselyorhowthatinforma;oncouldbeleveraged(severalprominentRepublicans)

who?

???

whichaWernoon?

∃esign(e,BarackObama)∧pa;ent(e,ACA)∧;me(e,Tuesday)

Page 3: CS388: Natural Language Processing Lecture 16: Informa;on

OtherChallenges

BobandAlicewerefriendsunGlhemovedawaytoaHendcollege

‣Howtorepresenttemporalinforma;on?

‣ Represen;ngtrulyopen-domaininforma;onisverycomplicated!Wedon’thaveaformalrepresenta;onthatcancaptureeverything

∃e1∃e2friends(e1,Bob,Alice)∧moved(e2,Bob)∧end_of(e1,e2)

BobandAlicewerefriendsunGlaroundthe+mehemovedawaytoaHendcollege

(Atleast)ThreeSolu;ons

‣ En;ty-rela;on-en;tytriples:focusonen;;esandtheirrela;ons(notethatprominenteventscans;llbeen;;es)

(LadyGaga,singerOf,BadRomance)

‣ CraWedannota;onstocapturesomesubsetofphenomena:predicate-argumentstructures(seman;crolelabeling),;me(temporalrela;ons),…

‣ Slotfilling:specificontology,populateinforma;oninapredefinedway

(Earthquake:magnitude=8.0,epicenter=centralItaly,…)

OpenIE‣ En;ty-rela;on-en;tytriplesaren’tnecessarilygroundedinanontology

BarackObamasignedtheAffordableCareactonTuesday.HegaveaspeechlaterthataCernoononhowtheactwouldhelptheAmericanpeople.SeveralprominentRepublicanswerequicktodenouncethenewlaw.

(BarackObama,signed,theAffordableCareact)(SeveralprominentRepublicans,denounce,thenewlaw)

‣ Extractstringsandletadownstreamsystemfigureitout

IE:TheBigPicture‣ Howdowerepresentinforma;on?Whatdoweextract?

‣ Slotfillers‣ En;ty-rela;on-en;tytriples(fixedontologyoropen)

‣ Seman;croles‣ Abstractmeaningrepresenta;on

Page 4: CS388: Natural Language Processing Lecture 16: Informa;on

Seman;cRoleLabeling/ AbstractMeaningRepresenta;on

Seman;cRoleLabeling‣ Iden;fypredicate,disambiguateit,iden;fythatpredicate’sarguments

‣ VerbrolesfromPropbank(Palmeretal.,2005)

FigurefromHeetal.(2017)

quicken:

Seman;cRoleLabeling

FigurefromHeetal.(2017)

‣ Iden;fypredicates(love)usingaclassifier(notshown)‣ Iden;fyARG0,ARG1,etc.asataggingtaskwithaBiLSTMcondi;onedonlove

‣Othersystemsincorporatesyntax,jointpredicate-argumentfinding

SRLforQA

ShenandLapata(2007)

‣Ques;onandseveralanswercandidates

Q:Whodiscoveredprions?

AC1:In1997,StanleyB.Prusiner,ascienGstintheUnitedStates,discoveredprions…

AC2:Prionswereresearchedby…

Scorebymatchingexpectedanswerphrase(EAP)againstanswercandidate(AC)

Page 5: CS388: Natural Language Processing Lecture 16: Informa;on

AbstractMeaningRepresenta;on‣Graph-structuredannota;on

Theboywantstogo

Banarescuetal.(2014)

‣ SupersetofSRL:fullsentenceanalyses,containscoreferenceandmul;-wordexpressionsaswell

‣ F1scoresinthe60s:hard!

‣ Socomprehensivethatit’shardtopredict,buts;lldoesn’thandletenseorsomeotherthings…

Summariza;onwithAMR

Liuetal.(2015)

‣ MergeAMRsacrossmul;plesentences

‣ Summariza;on=subgraphextrac;on

‣ Norealsystemsactuallyworkthisway(morewhenwetalkaboutsummariza;on)

SlotFilling

SlotFilling‣Mostconserva;ve,narrowformofIE

IndianExpress—Amassiveearthquakeofmagnitude7.3struckIraqonSunday,103kms(64miles)southeastofthecityofAs-Sulaymaniyah,theUSGeologicalSurveysaid,reportsReuters.USGeologicalSurveyiniGallysaidthequakewasofamagnitude7.2,beforerevisingitto7.3.

magnitude time

epicenter

Speaker:AlanClark“GenderRolesintheHolyRomanEmpire”AllagherCenterMainAuditoriumThistalkwilldiscuss…

speakertitle

location

FreitagandMcCallum(2000)

‣ Oldwork:HMMs,laterCRFstrainedperrole

Page 6: CS388: Natural Language Processing Lecture 16: Informa;on

SlotFilling:MUC

HaghighiandKlein(2010)

‣ Keyaspect:needtocombineinforma;onacrossmul;plemen;onsofanen;tyusingcoreference

SlotFilling:Forums

‣ Extractproductoccurrencesincybercrimeforums,butnoteverythingthatlookslikeaproductisaproduct

Portnoffetal.(2017),DurreBetal.(2017)Notaproductinthiscontext

Rela;onExtrac;on

Rela;onExtrac;on‣ Extracten;ty-rela;on-en;tytriplesfromafixedinventory

ACE(2003-2005)

DuringthewarinIraq,AmericanjournalistsweresomeGmescaughtinthelineoffire

Located_In

‣ Systemscanbefeature-basedorneural,lookatsurfacewords,syntac;cfeatures(dependencypaths),seman;croles

Na;onality

‣ Problem:limiteddataforscalingtobigontologies

‣ UseNER-likesystemtoiden;fyen;tyspans,classifyrela;onsbetweenen;typairswithaclassifier

Page 7: CS388: Natural Language Processing Lecture 16: Informa;on

HearstPaBerns

Hearst(1992)

Xsuchas[list] ciGessuchasBerlin,Paris,andLondon.

‣ Syntac;cpaBernsespeciallyforfindinghypernym-hyponympairs(“isa”rela;ons)

Berlinisacity

otherciGesincludingBerlin

YisaX

otherXincludingY

‣ Totallyunsupervisedwayofharves;ngworldknowledgefortaskslikeparsingandcoreference(BansalandKlein,2011-2012)

DistantSupervision

Mintzetal.(2009)

[StevenSpielberg]’sfilm[SavingPrivateRyan]islooselybasedonthebrothers’story

‣ Iftwoen;;esinarela;onappearinthesamesentence,assumethesentenceexpressestherela;on

‣ Lotsofrela;onsinourknowledgebasealready(e.g.,23,000film-directorrela;ons);usethesetobootstrapmoretrainingdata

Allisonco-producedtheAcademyAward-winning[SavingPrivateRyan],directedby[StevenSpielberg]

Director

Director

DistantSupervision

Mintzetal.(2009)

‣ Learndecentlyaccurateclassifiersfor~100Freebaserela;ons‣ Couldbeusedtocrawlthewebandexpandourknowledgebase

OpenIE

Page 8: CS388: Natural Language Processing Lecture 16: Informa;on

OpenInforma;onExtrac;on

‣ Typicallynofixedrela;oninventory

‣ “Open”ness—wanttobeabletoextractallkindsofinforma;onfromopen-domaintext

‣ Acquirecommonsenseknowledgejustfrom“reading”aboutit,butneedtoprocesslotsoftext(“machinereading”)

TextRunner‣ Extractposi;veexamplesof(e,r,e)triplesviaparsingandheuris;cs

BarackObama,44thpresidentoftheUnitedStates,wasbornonAugust4,1961inHonolulu

=>Barack_Obama,wasbornin,Honolulu

‣ TrainaNaiveBayesclassifiertofiltertriplesfromrawtext:usesfeaturesonPOStags,lexicalfeatures,stopwords,etc.

‣ 80xfasterthanrunningaparser(whichwasslowin2007…)

Bankoetal.(2007)

‣ Usemul;pleinstancesofextrac;onstoassignprobabilitytoarela;on

Exploi;ngRedundancy

Bankoetal.(2007)

‣ Concrete:definitelytrueAbstract:possiblytruebutunderspecified

‣Hardtoevaluate:canassessprecision ofextractedfacts,buthowdoweknowrecall?

‣ 9Mwebpages/133Msentences

‣ 2.2tuplesextractedpersentence,filterbasedonprobabili;es

ReVerb

Faderetal.(2011)

‣Moreconstraints:openrela;onshavetobeginwithverb,endwithpreposi;on,becon;guous(e.g.,wasbornon)

‣ Extractmoremeaningfulrela;ons,par;cularlywithlightverbs

Page 9: CS388: Natural Language Processing Lecture 16: Informa;on

ReVerb

Faderetal.(2011)

‣ Foreachverb,iden;fythelongestsequenceofwordsfollowingtheverbthatsa;sfyaPOSregex(V.*P)andwhichsa;sfyheuris;clexicalconstraintsonspecificity

‣ Findthenearestargumentsoneithersideoftherela;on

‣ Annotatorslabeledrela;onsin500documentstoassessrecall

QAfromOpenIE

Choietal.(2015)

Takeaways

‣ Rela;onextrac;on:cancollectdatawithdistantsupervision,usethistoexpandknowledgebases

‣ Slotfilling:;edtoaspecificontology,butgivesfine-grainedinforma;on

‣ OpenIE:extractslotsofthings,buthardtoknowhowgoodorusefultheyare

‣ Cancombinewithstandardques;onanswering

‣ Addnewfactstoknowledgebases

‣ SRL/AMR:handleabunchofphenomena,butmoreorlesslikesyntax++intermsofwhattheyrepresent

‣ Many,manyapplica;onsandtechniques