relation extraction - github pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf ·...

50
Relation Extraction Many slides from Dan Jurafsky

Upload: dinhxuyen

Post on 14-Aug-2019

228 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

ManyslidesfromDanJurafsky

Page 2: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Extractingrelationsfromtext

• Companyreport: “InternationalBusinessMachinesCorporation(IBMorthecompany)wasincorporatedintheStateofNewYorkonJune16,1911,astheComputing-Tabulating-RecordingCo.(C-T-R)…”• ExtractedComplexRelation:

Company-FoundingCompany IBMLocation NewYorkDate June16,1911Original-Name Computing-Tabulating-RecordingCo.

• ButwewillfocusonthesimplertaskofextractingrelationtriplesFounding-year(IBM,1911)Founding-location(IBM,New York)

Page 3: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

ExtractingRelationTriplesfromTextTheLelandStanfordJuniorUniversity,commonlyreferredtoasStanfordUniversityorStanford,isanAmericanprivateresearchuniversitylocatedinStanford,California …nearPaloAlto,California…LelandStanford…foundedtheuniversityin1891

Stanford EQLelandStanfordJuniorUniversityStanford LOC-INCaliforniaStanford IS-Aresearch universityStanford LOC-NEARPaloAltoStanford FOUNDED-IN1891StanfordFOUNDERLelandStanford

Page 4: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

WhyRelationExtraction?

• Createnewstructuredknowledgebases,usefulforanyapp• Augmentcurrentknowledgebases• AddingwordstoWordNet thesaurus,factstoFreeBase orDBPedia

• Supportquestionanswering• Thegranddaughterofwhichactorstarredinthemovie“E.T.”?(acted-in ?x “E.T.”)(is-a ?y actor)(granddaughter-of ?x ?y)

• Butwhichrelationsshouldweextract?

4

Page 5: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

AutomatedContentExtraction(ACE)

ARTIFACT

GENERALAFFILIATION

ORGAFFILIATION

PART-WHOLE

PERSON-SOCIAL PHYSICAL

Located

Near

Business

Family Lasting Personal

Citizen-Resident-Ethnicity-Religion

Org-Location-Origin

Founder

EmploymentMembership

OwnershipStudent-Alum

Investor

User-Owner-Inventor-Manufacturer

GeographicalSubsidiary

Sports-Affiliation

17relationsfrom2008“RelationExtractionTask”

Page 6: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

AutomatedContentExtraction(ACE)

• Physical-LocatedPER-GPEHe was in Tennessee

• Part-Whole-SubsidiaryORG-ORGXYZ, the parent company of ABC

• Person-Social-FamilyPER-PERJohn’s wife Yoko

• Org-AFF-FounderPER-ORGSteve Jobs, co-founder of Apple…

6

Page 7: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

UMLS:UnifiedMedicalLanguageSystem

• 134entitytypes,54relations

Injury disrupts PhysiologicalFunctionBodilyLocation location-of BiologicFunctionAnatomicalStructure part-of OrganismPharmacologicSubstancecauses PathologicalFunctionPharmacologicSubstancetreats PathologicFunction

Page 8: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

ExtractingUMLSrelationsfromasentence

Doppler echocardiography can be used to diagnose left anterior descending artery stenosis in patients with type 2 diabetes

ê

Echocardiography,DopplerDIAGNOSES Acquiredstenosis

8

Page 9: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

DatabasesofWikipedia Relations

9

RelationsextractedfromInfoboxStanfordstateCaliforniaStanfordmotto “DieLuft derFreiheit weht”…

WikipediaInfobox

Page 10: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationdatabasesthatdrawfromWikipedia

• ResourceDescriptionFramework(RDF)triplessubjectpredicate objectGolden Gate Park location San Franciscodbpedia:Golden_Gate_Park dbpedia-owl:location dbpedia:San_Francisco

• DBPedia:1billionRDFtriples,385fromEnglishWikipedia• FrequentFreebaserelations:

people/person/nationality,location/location/containspeople/person/profession,people/person/place-of-birthbiology/organism_higher_classification film/film/genre

10

Page 11: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Ontologicalrelations

• IS-A(hypernym):subsumption betweenclasses•Giraffe IS-Aruminant IS-A ungulate IS-Amammal IS-Avertebrate IS-Aanimal…

• Instance-of:relationbetween individual andclass•San Francisco instance-of city

ExamplesfromtheWordNet Thesaurus

Page 12: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Howtobuildrelationextractors

1. Hand-writtenpatterns2. Supervisedmachinelearning3. Semi-supervisedandunsupervised• Bootstrapping(usingseeds)• Distantsupervision• Unsupervisedlearningfromtheweb

Page 13: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

Whatisrelationextraction?

Page 14: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

Usingpatternstoextractrelations

Page 15: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RulesforextractingIS-Arelation

EarlyintuitionfromHearst(1992)• “Agarisasubstancepreparedfromamixtureofredalgae,suchasGelidium, forlaboratoryorindustrial use”

• WhatdoesGelidium mean?• Howdoyouknow?`

Page 16: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RulesforextractingIS-Arelation

EarlyintuitionfromHearst(1992)• “Agarisasubstancepreparedfromamixtureofredalgae,suchasGelidium,forlaboratoryorindustrial use”

• WhatdoesGelidium mean?• Howdoyouknow?`

Page 17: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Hearst’sPatternsforextractingIS-Arelations

(Hearst,1992):AutomaticAcquisitionofHyponyms

“Y such as X ((, X)* (, and|or) X)”“such Y as X”“X or other Y”“X and other Y”“Y including X”“Y, especially X”

Page 18: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Hearst’sPatternsforextractingIS-Arelations

Hearstpattern ExampleoccurrencesXandother Y ...temples,treasuries,andotherimportantcivicbuildings.

XorotherY Bruises,wounds,brokenbonesorotherinjuries...

YsuchasX Thebowlute,suchastheBambarandang...

Such YasX ...such authorsas Herrick,Goldsmith,andShakespeare.

YincludingX ...common-lawcountries,including CanadaandEngland...

Y,especiallyX Europeancountries,especially France,England,andSpain...

Page 19: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

ExtractingRicherRelationsUsingRules

• Intuition: relationsoftenholdbetweenspecificentities• located-in(ORGANIZATION, LOCATION)• founded (PERSON,ORGANIZATION)• cures(DRUG,DISEASE)

• StartwithNamedEntitytagstohelpextractrelation!

Page 20: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

NamedEntitiesaren’tquiteenough.Whichrelationsholdbetween2entities?

Drug Disease

Cure?Prevent?

Cause?

Page 21: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Whatrelationsholdbetween2entities?

PERSON ORGANIZATION

Founder?

Investor?

Member?

Employee?

President?

Page 22: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

ExtractingRicherRelationsUsingRulesandNamedEntities

Whoholdswhatofficeinwhatorganization?PERSON, POSITION of ORG

• GeorgeMarshall,SecretaryofStateoftheUnitedStates

PERSON(named|appointed|chose|etc.) PERSON Prep?POSITION• TrumanappointedMarshallSecretaryofState

PERSON [be]?(named|appointed|etc.) Prep?ORG POSITION• GeorgeMarshallwasnamedUSSecretaryofState

Page 23: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Hand-builtpatternsforrelations• Plus:•Humanpatternstendtobehigh-precision• Canbetailoredtospecificdomains

•Minus•Humanpatternsareoftenlow-recall•Alotofworktothinkofallpossiblepatterns!•Don’twanttohavetodothisforeveryrelation!•We’dlikebetteraccuracy

Page 24: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

Usingpatternstoextractrelations

Page 25: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

Supervisedrelationextraction

Page 26: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Supervisedmachinelearningforrelations

• Chooseasetofrelationswe’dliketoextract• Chooseasetofrelevantnamedentities• Findandlabeldata• Choosearepresentativecorpus• Labelthenamedentitiesinthecorpus• Hand-labeltherelationsbetweentheseentities• Breakintotraining,development,andtest

• Trainaclassifieronthetrainingset

26

Page 27: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Howtodoclassificationinsupervisedrelationextraction

1. Findallpairsofnamedentities(usuallyinsamesentence)

2. Decideif2entitiesarerelated3. Ifyes,classifytherelation•Whytheextrastep?• Fasterclassificationtrainingbyeliminatingmostpairs• Canusedistinctfeature-setsappropriateforeachtask.

27

Page 28: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

AutomatedContentExtraction(ACE)

ARTIFACT

GENERALAFFILIATION

ORGAFFILIATION

PART-WHOLE

PERSON-SOCIAL PHYSICAL

Located

Near

Business

Family Lasting Personal

Citizen-Resident-Ethnicity-Religion

Org-Location-Origin

Founder

EmploymentMembership

OwnershipStudent-Alum

Investor

User-Owner-Inventor-Manufacturer

GeographicalSubsidiary

Sports-Affiliation

17sub-relationsof6relationsfrom2008“RelationExtractionTask”

Page 29: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

Classifytherelationbetweentwoentities inasentence

AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesman

TimWagnersaid.

SUBSIDIARY

FAMILYEMPLOYMENT

NIL

FOUNDER

CITIZEN

INVENTOR…

Page 30: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

WordFeaturesforRelationExtraction

• HeadwordsofM1andM2,andcombinationAirlinesWagnerAirlines-Wagner

• BagofwordsandbigramsinM1andM2{American,Airlines,Tim,Wagner,AmericanAirlines,TimWagner}

• WordsorbigramsinparticularpositionsleftandrightofM1/M2M2:-1spokesmanM2:+1said

• Bagofwordsorbigramsbetweenthetwoentities{a,AMR,of,immediately,matched,move,spokesman,the,unit}

AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesman TimWagnersaidMention1 Mention2

Page 31: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

NamedEntityTypeandMentionLevelFeaturesforRelationExtraction

• Named-entitytypes• M1:ORG• M2:PERSON

• Concatenationofthetwonamed-entitytypes• ORG-PERSON

• EntityLevelofM1andM2 (NAME,NOMINAL,PRONOUN)• M1:NAME [itor hewouldbePRONOUN]• M2:NAME [thecompanywouldbeNOMINAL]

AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesman TimWagnersaidMention1 Mention2

Page 32: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

ParseFeaturesforRelationExtraction

• BasesyntacticchunksequencefromonetotheotherNPNPPPVPNPNP

• ConstituentpaththroughthetreefromonetotheotherNPé NPé Sé Sê NP

• DependencypathAirlines<- matched->Wagner->said

AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesman TimWagnersaidMention1 Mention2

Page 33: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Gazeteer andtriggerwordfeaturesforrelationextraction• Triggerlistforfamily:kinshipterms• parent,wife,husband,grandparent,etc.[fromWordNet]

• Gazeteer:• Listsofusefulgeoorgeopoliticalwords

• Countrynamelist• Othersub-entities

Page 34: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

AmericanAirlines,aunitofAMR,immediatelymatchedthemove,spokesmanTimWagnersaid.

Page 35: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Classifiersforsupervisedmethods

• Nowyoucanuseanyclassifieryoulike•MaxEnt• NaïveBayes• SVM• ...

• Trainitonthetrainingset,tuneonthedev set,testonthetestset

Page 36: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

EvaluationofSupervisedRelationExtraction

•ComputeP/R/F1 foreachrelation

36

P = # of correctly extracted relationsTotal # of extracted relations

R = # of correctly extracted relationsTotal # of gold relations

F1 =2PRP + R

Page 37: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Summary:SupervisedRelationExtraction

+ Cangethighaccuracieswithenoughhand-labeledtrainingdata,iftestsimilarenoughtotraining- Labelingalargetraining setisexpensive

- Supervisedmodelsarebrittle, don’tgeneralizewelltodifferentgenres

Page 38: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

Supervisedrelationextraction

Page 39: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

Semi-supervisedandunsupervisedrelationextraction

Page 40: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Seed-basedorbootstrappingapproachestorelationextraction

•Notraining set?Maybeyouhave:• Afewseedtuplesor• Afewhigh-precisionpatterns

•Canyouusethoseseedstodosomethinguseful?• Bootstrapping:usetheseedstodirectlylearntopopulatearelation

Page 41: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationBootstrapping(Hearst1992)

•GatherasetofseedpairsthathaverelationR• Iterate:1. Findsentenceswiththesepairs2. Lookatthecontextbetweenoraroundthepair

andgeneralizethecontexttocreatepatterns3. Usethepatterns forgrep formorepairs

Page 42: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Bootstrapping

• <MarkTwain,Elmira>Seedtuple• Grep (google)fortheenvironmentsoftheseedtuple“MarkTwainisburiedinElmira,NY.”

XisburiedinY“ThegraveofMarkTwainisinElmira”

ThegraveofXisinY“ElmiraisMarkTwain’sfinalrestingplace”

YisX’sfinalrestingplace.

• Usethosepatternstogrep fornewtuples• Iterate

Page 43: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Dipre:Extract<author,book>pairs

• Startwith5seeds:

• FindInstances:TheComedyofErrors,byWilliamShakespeare,wasTheComedyofErrors,byWilliamShakespeare,isTheComedyofErrors,oneofWilliamShakespeare'searliestattemptsTheComedyofErrors,oneofWilliamShakespeare'smost

• Extractpatterns(groupbymiddle,takelongestcommonprefix/suffix)?x , by ?y , ?x , one of ?y ‘s

• Nowiterate,findingnewseedsthatmatchthepattern

Brin,Sergei.1998.ExtractingPatternsandRelationsfromtheWorldWideWeb.Author BookIsaacAsimov TheRobots ofDawnDavidBrin Startide RisingJamesGleick Chaos:MakingaNewScienceCharlesDickens GreatExpectationsWilliamShakespeare TheComedyofErrors

Page 44: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Snowball

• Similariterativealgorithm

• Groupinstancesw/similarprefix,middle,suffix,extractpatterns• ButrequirethatXandYbenamedentities• Andcomputeaconfidenceforeachpattern

{’s, in, headquarters}

{in, based} ORGANIZATIONLOCATION

Organization LocationofHeadquartersMicrosoft RedmondExxon IrvingIBM Armonk

E.Agichtein andL.Gravano 2000.Snowball:ExtractingRelationsfromLargePlain-TextCollections.ICDL

ORGANIZATION LOCATION .69

.75

Page 45: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

DistantSupervision

•Combinebootstrappingwithsupervised learning• Insteadof5seeds,• Usealargedatabasetogethuge#ofseedexamples

•Createlotsoffeaturesfromalltheseexamples•Combineinasupervised classifier

Snow,Jurafsky,Ng.2005.Learningsyntacticpatternsforautomatichypernymdiscovery.NIPS17Fei WuandDanielS.Weld.2007.AutonomouslySemantifyingWikipeida.CIKM2007Mintz,Bills,Snow,Jurafsky.2009.Distantsupervisionforrelationextractionwithoutlabeleddata.ACL09

Page 46: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Distantsupervisionparadigm

• Likesupervised classification:• Usesaclassifierwithlotsoffeatures• Supervisedbydetailedhand-createdknowledge• Doesn’trequireiterativelyexpandingpatterns

• Likeunsupervised classification:• Usesverylargeamountsofunlabeleddata• Notsensitivetogenreissuesintrainingcorpus

Page 47: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Distantlysupervisedlearningofrelationextractionpatterns

Foreachrelation

Foreachtupleinbigdatabase

Findsentencesinlargecorpuswithbothentities

Extractfrequentfeatures(parse,words,etc)

Trainsupervisedclassifierusingthousandsofpatterns

4

1

2

3

5

PERwasborninLOCPER,born(XXXX), LOCPER’sbirthplaceinLOC

<EdwinHubble,Marshfield><AlbertEinstein,Ulm>

Born-In

HubblewasborninMarshfieldEinstein,born(1879),UlmHubble’sbirthplaceinMarshfield

P(born-in | f1,f2,f3,…,f70000)

Page 48: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

Unsupervisedrelationextraction

• OpenInformationExtraction:• extractrelationsfromthewebwithnotrainingdata,nolistofrelations

1. Useparseddatatotraina“trustworthytuple”classifier2. Single-passextractallrelationsbetweenNPs,keepiftrustworthy3. Assessorranksrelationsbasedontextredundancy

(FCI,specializesin,softwaredevelopment)(Tesla,invented,coiltransformer)

48

M.Banko,M.Cararella,S.Soderland,M.Broadhead, andO.Etzioni.2007.Openinformationextractionfromtheweb. IJCAI

Page 49: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

EvaluationofSemi-supervisedandUnsupervisedRelationExtraction

• Sinceitextractstotallynewrelationsfromtheweb• Thereisnogoldsetofcorrectinstancesofrelations!• Can’tcomputeprecision(don’tknowwhichonesarecorrect)• Can’tcomputerecall(don’tknowwhichonesweremissed)

• Instead,wecanapproximateprecision(only)• Drawarandomsampleofrelationsfromoutput,checkprecisionmanually

• Canalsocomputeprecisionatdifferentlevelsofrecall.• Precisionfortop1000newrelations,top10,000newrelations,top100,000• Ineachcasetakingarandomsampleofthatset

• Butnowaytoevaluaterecall49

P̂ = # of correctly extracted relations in the sampleTotal # of extracted relations in the sample

Page 50: relation extraction - GitHub Pagesaritter.github.io/courses/5525_slides/relation_extraction.pdf · Why Relation Extraction? • Create new structured knowledge bases, useful for any

RelationExtraction

Semi-supervisedandunsupervisedrelationextraction