Post on 31-Aug-2019
Enterprise Knowledge Graphs for Large Scale Analytics
Nidhi Rajshree, IBM Watson, USA; Nitish Aggarwal, IBM Watson, USA; Sumit Bhatia, IBM Research, India; Anshu Jain, IBM Research, Almaden, USA
The material presented in this tutorial represents the personal opinions of the presenters and not those of IBM or its affiliated organizations.
Outline of the tutorial
Part 1: Knowledge Graph Construction
• Introduction
• DBpedia: Knowledge extraction
• Approaches to extend knowledge graph
• Knowledge extraction from scratch
Part 2: Knowledge Graph Analytics
• Finding entities of interest
• Entity exploration
• Upcoming challenges
What is a Knowledge Graph?
“The Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources.”
“A Knowledge graph (i) mainly describes real world entities and interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows potentially interrelating arbitrary entities with each other…” [Paulheim H.]
“We define a Knowledge Graph as an RDF graph consisting of a set of RDF triples, where each RDF triple (s, p, o) is an ordered set of the following RDF terms….” [Pujara J. et al.]
What is a Knowledge Graph?
No single formal definition…
• Defines real world entities
• Provides relationships between them
• Contains rules defined through ontologies
• Enables reasoning to infer new knowledge
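As a rough illustration of the four points above, a knowledge graph can be sketched as a set of (subject, predicate, object) triples plus a rule that derives new triples. This is a minimal sketch, not a real RDF store: the identifiers, the `sub_property_of` rule, and the helper names `match` and `infer` are all assumptions made for the example.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts; the identifiers are illustrative only.
triples: Set[Triple] = {
    ("NewYorkCity", "cityIn", "UnitedStates"),
    ("NewYorkCity", "type", "City"),
}

# Hypothetical ontology rule: every cityIn fact is also a locatedIn fact.
sub_property_of = {"cityIn": "locatedIn"}


def match(kg: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in kg
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def infer(kg: Set[Triple]) -> Set[Triple]:
    """Apply the sub-property rule once and return the enlarged graph."""
    derived = {(s, sub_property_of[p], o)
               for s, p, o in kg if p in sub_property_of}
    return kg | derived


inferred = infer(triples)
print(match(inferred, p="locatedIn"))
```

The derived `locatedIn` triple was never stated explicitly; it follows from the rule, which is the kind of reasoning the last bullet refers to.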
Why Knowledge Graph?
Building an intelligent system that can interact with humans requires knowledge about real world entities.
• Enhances search results.
• Enhances ad sense.
• Helps in language understanding.
• Enables knowledge discovery.
Is there an existing knowledge graph ready to use for my application?
• Google Knowledge Graph
• Facebook Entity Graph
• Microsoft Satori
• LinkedIn Knowledge Graph
• Amazon Product Graph
DBpedia: Knowledge extraction
The City of New York, often called New York City or simply New York, is the most populous city in the United States.
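One hedged way to read a triple off a sentence like this is a hand-written surface pattern of the form <CityName>, <locatedIn>, <CountryName>. The sketch below uses a single regular expression written for this one sentence shape; the pattern, the `extract` helper, and the choice to build identifiers by dropping spaces are assumptions for illustration, not how DBpedia's extractors work.

```python
import re

# Assumed surface pattern: "... often called <City> or simply ... city in the <Country>."
PATTERN = re.compile(
    r"often called (?P<city>[A-Z][A-Za-z ]*?) or simply .* "
    r"city in the (?P<country>[A-Z][A-Za-z ]*?)\."
)


def extract(sentence):
    """Return a (head, relation, tail) triple, or None if the pattern fails."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    # Hypothetical identifier convention: remove the spaces from the name.
    head = m.group("city").replace(" ", "")
    tail = m.group("country").replace(" ", "")
    return (head, "locatedIn", tail)


sentence = ("The City of New York, often called New York City or simply "
            "New York, is the most populous city in the United States.")
print(extract(sentence))
```

A pattern this specific breaks on almost any rephrasing, which is one reason structured sources are attractive for extraction.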
<NewYorkCity>, <CityIn> <UnitedStates>.
<CityName>, <locatedIn> <CountryName>.
<head entity>, <rel> <tail entity>
Wikipedia Infobox
Parsers
Ontology (classes, properties)
dbr:IBM dbp:foundedBy dbr:Charles_Ranlett_Flint
……………
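The parser-plus-ontology step can be sketched as follows: read key/value pairs from an infobox, map each key to an ontology property, and emit a triple. The wikitext snippet is a simplified, hypothetical infobox, and `PROPERTY_MAP`, the use of the `name` field as the subject, and the function name are assumptions; real infoboxes and the real DBpedia mappings are considerably more involved.

```python
import re

# Hypothetical, simplified infobox wikitext.
WIKITEXT = """{{Infobox company
| name    = IBM
| founder = [[Charles Ranlett Flint]]
}}"""

# Assumed mapping from infobox keys to ontology properties.
PROPERTY_MAP = {"founder": "dbp:foundedBy"}

FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def to_resource(label):
    """Turn a page label into a dbr: identifier (spaces become underscores)."""
    return "dbr:" + label.strip().replace(" ", "_")


def infobox_to_triples(wikitext):
    """Emit one triple per mapped infobox field whose value is a wiki link."""
    fields = dict(FIELD.findall(wikitext))
    subject = to_resource(fields["name"])  # assumption: subject comes from 'name'
    triples = []
    for key, value in fields.items():
        prop = PROPERTY_MAP.get(key)
        link = LINK.search(value)
        if prop and link:
            triples.append((subject, prop, to_resource(link.group(1))))
    return triples


print(infobox_to_triples(WIKITEXT))
```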
(Research) problems in knowledge graphs
• Incomplete knowledge
  – Missing entities
  – Missing relations
  – Limited entity and relation types
• Incorrect knowledge
  – Wrong entity label recognition
  – Wrong entity and relation type
  – Wrong facts
• Inconsistency in knowledge
  – Different labels for same entity
  – Merging entities with same labels
Approaches to extend knowledge graphs
• Extracting knowledge from Wikipedia tables
  – Large amount of raw data in the form of tables
  – Tables have some implicit structure/patterns
Wiki:AFC_Ajax containing relations between players, their shirt number, and country
Approaches to extend knowledge graphs
• <Wiki:AFC_Ajax, dbp:rel, Wiki:Andre_Onana>
• 80% of the entities in the table have the relation dbp:rel with the Wikipedia title entity Wiki_AFC_Ajax
• The other 20% of the entities are likely to have the same relationship dbp:rel with Wiki_AFC_Ajax
[Munoz E. et al.] Using Linked Data to Mine RDF from Wikipedia's Tables, WSDM 2014
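The 80%/20% heuristic can be sketched as a simple vote: if most entities in a table column already share one relation with the page's title entity, propose that relation for the rest. This is only a sketch of the heuristic as stated on the slide, not the method of the cited paper; the function name, the `threshold` parameter, and the `Wiki:Player_N` entities are hypothetical.

```python
from collections import Counter


def propose_triples(title, column_entities, kg, threshold=0.8):
    """Propose (title, relation, entity) triples for unlinked column entities
    when at least `threshold` of the column already has that relation."""
    counts = Counter()
    for s, p, o in kg:
        if s == title and o in column_entities:
            counts[p] += 1
    if not counts:
        return []
    relation, support = counts.most_common(1)[0]
    if support / len(column_entities) < threshold:
        return []
    linked = {o for s, p, o in kg if s == title and p == relation}
    return [(title, relation, e) for e in column_entities if e not in linked]


title = "Wiki:AFC_Ajax"
# Only Andre_Onana comes from the slide; the other players are placeholders.
players = ["Wiki:Andre_Onana", "Wiki:Player_2", "Wiki:Player_3",
           "Wiki:Player_4", "Wiki:Player_5"]
kg = {(title, "dbp:rel", p) for p in players[:4]}  # 4 of 5 already linked

print(propose_triples(title, players, kg))
```

Every proposed triple is a guess, so a rule like this trades coverage for noise.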
Approaches to extend knowledge graphs
[Munoz E. et al.] Using Linked Data to Mine RDF from Wikipedia's Tables, WSDM 2014
• Features
  – Article features: no. of tables, length
  – Table features: no. of rows, no. of columns
  – Column features: no. of entities in column, potential relations
  – Cell features: no. of entities in a cell, length of cell
  – Many others
• Combined using a classification method

            Prec.   Rec.    F1
Rule-based  64.23   70.46   67.20
SVM         72.43   75.77   74.06
Logistic    79.62   79.01   79.31
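The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the reported precision and recall; the dictionary layout and function name are just for the example.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in the table above.
reported = {
    "Rule-based": (64.23, 70.46, 67.20),
    "SVM": (72.43, 75.77, 74.06),
    "Logistic": (79.62, 79.01, 79.31),
}

for name, (p, r, f) in reported.items():
    print(f"{name}: computed F1 = {f1(p, r):.2f}, reported F1 = {f:.2f}")
```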
• Rule/heuristic-based methods make mistakes, and it is hard to create one rule that fits every case.
• Even though combining different features achieves 80% accuracy, it introduces 20% noise.
Table data is limited; we need to go beyond it.
Approaches to extend knowledge graphs
• Missing entity/literal for a relation
  – “Christopher A. Welty is an American computer scientist, who works at Google Research in NY”
    • <dbr:Chris_Welty> <employedBy> <?>
  – “Tom Cruise and Brad Pitt appear in Interview with the Vampire”
    • <dbr:Brad_Pitt> <?> <dbr:Tom_Cruise>
• Knowledge Base Completion
  – Similar to link prediction in social networks, but a bit more challenging
  – Need to identify the relation type in addition to the binary output.
Approaches to extend knowledge graphs
• Knowledge Base Completion
  – TransE: learns entity and relation embeddings by assuming that translations between entity embeddings correspond to their relation embeddings. [Bordes et al. 2013]
  – S + R ≈ T, where <S, R, T> is a triple
  – TransH: learns different entity embeddings for different relationships [Wang et al. 2014]
  – TransR: learns entity and relation embeddings in different spaces, followed by translation performed in the relation space. [Lin Y. et al. 2015]
  – Many more methods [Nickel M. et al., 2015]
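The S + R ≈ T idea can be made concrete with a distance score: a triple is plausible when the head embedding translated by the relation embedding lands near the tail embedding. The sketch below only scores triples with hand-picked 2-dimensional vectors; it does not train anything, and the embedding values and function names are assumptions chosen so the arithmetic is easy to follow. TransH and TransR are not covered.

```python
import math


def transe_score(s, r, t):
    """Euclidean distance ||s + r - t||; smaller means more plausible."""
    return math.sqrt(sum((si + ri - ti) ** 2 for si, ri, ti in zip(s, r, t)))


# Hypothetical, hand-picked embeddings (a trained model would learn these).
entity = {
    "NewYorkCity": [0.0, 1.0],
    "UnitedStates": [1.0, 1.0],
    "France": [3.0, -1.0],
}
relation = {"locatedIn": [1.0, 0.0]}


def rank_tails(head, rel, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda t: transe_score(entity[head], relation[rel], entity[t]),
    )


print(rank_tails("NewYorkCity", "locatedIn", ["France", "UnitedStates"]))
```

Ranking candidate tails (or relations) by this score is how a missing slot such as <?> in a triple would be filled.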
Knowledge base completion approaches focus on finding missing entities/relations
Need to add new entities from external sources
• Entity recognition in external text resources
• Many Named Entity Recognition systems
• Link each extracted entity to the KG, or create a new node if it does not have a corresponding entity
• TAC-KBP (Entity Discovery and Linking task) [Ji H. et al. 2016]
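The link-or-create step can be sketched with a label index: look the mention up, and add a new node only when nothing matches. This sketch assumes exact matching on a normalized label, which is far weaker than a real entity linking system; the `new:` prefix, the index layout, and the function names are hypothetical.

```python
def normalize(label):
    """Lowercase and collapse whitespace so trivially different labels match."""
    return " ".join(label.lower().split())


def link_or_create(mention, label_index):
    """Return (node_id, created). `label_index` maps normalized label -> node id
    and is updated in place when a new node is created."""
    key = normalize(mention)
    if key in label_index:
        return label_index[key], False
    node_id = "new:" + "_".join(mention.split())
    label_index[key] = node_id
    return node_id, True


# Hypothetical index with one existing entity.
label_index = {"ibm": "dbr:IBM"}

print(link_or_create("IBM", label_index))
print(link_or_create("Jonathon Watson", label_index))
```

Exact matching cannot tell that two different labels name the same entity, which is the inconsistency problem listed earlier.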
Building a knowledge graph such as DBpedia requires a lot of manual effort
• Many applications require domain/data-specific custom knowledge graphs.
• Creating a schema with class structure and constraints for each KG is difficult.
How to create a knowledge graph from unstructured text?
Jonathon Watson works at IBM. He has more than 50 patents, and won best inventor award for his invention “Neural Chip” by Jon Watson et al.
Entity extraction → Relation extraction → Noise reduction → KG
Entity extraction: Jonathon Watson, IBM, Jon Watson
Relation extraction: employedBy(Jonathon Watson, IBM), Jon Watson
Noise reduction: Jonathon Watson, Jon Watson
Relation extraction
• Supervised methods
Predefined schema (employedBy, bornOn, BirthPlace …)
Training data:
Jonathon Watson works at IBM. → employedBy
Jonathon Watson joined IBM. → employedBy
Test data:
Jonathon Watson is manager at IBM. → ? → employedBy
Pros: High accuracy and less noise
Cons: Hard and expensive to build labeled data
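To make the supervised setting concrete, the sketch below trains a toy token-count classifier on labeled sentences and predicts the relation for a test sentence. It is a deliberately naive stand-in for a real classifier, not a method from the tutorial. The two employedBy examples follow the slides; the bornOn example and its date are hypothetical, added only so there is a second class.

```python
from collections import Counter, defaultdict


def features(sentence, entities):
    """Lowercased tokens of the sentence, excluding the entity mentions."""
    tokens = sentence.rstrip(".").split()
    entity_tokens = {tok for e in entities for tok in e.split()}
    return [t.lower() for t in tokens if t not in entity_tokens]


train = [
    ("Jonathon Watson works at IBM.", ("Jonathon Watson", "IBM"), "employedBy"),
    ("Jonathon Watson joined IBM.", ("Jonathon Watson", "IBM"), "employedBy"),
    # Hypothetical example for a second relation type.
    ("Jonathon Watson was born on 1 May 1970.",
     ("Jonathon Watson", "1 May 1970"), "bornOn"),
]

# "Training": count how often each context token occurs with each label.
counts = defaultdict(Counter)
for sentence, entities, label in train:
    counts[label].update(features(sentence, entities))


def predict(sentence, entities):
    """Pick the label whose training tokens overlap most with the sentence."""
    feats = features(sentence, entities)
    return max(counts, key=lambda label: sum(counts[label][f] for f in feats))


print(predict("Jonathon Watson is manager at IBM.", ("Jonathon Watson", "IBM")))
```

The prediction rests on a single shared token ("at"), which hints at why good accuracy needs much more labeled data than this.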
Relation extraction
• Distantly supervised methods
employedBy(Jon Watson, IBM)
affiliated(Michael Decker, SMU)
Training sentences:
Jon Watson works at IBM.
Jon Watson becomes VP at IBM.
……….
Michael Decker joins Data Science group at SMU.
Michael Decker won a national funding award at SMU.
……….
Pros: Overcomes the effort of labeling data
Cons: Depends on an existing knowledge graph and corresponding text
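Distant supervision can be sketched as an alignment step: any sentence that mentions both entities of a known fact is labeled with that fact's relation. The sketch assumes plain substring matching for mentions. The facts and the first four sentences follow the slides; the last sentence is a hypothetical one that mentions only one entity.

```python
# Known facts from an existing knowledge graph: (relation, head, tail).
facts = [
    ("employedBy", "Jon Watson", "IBM"),
    ("affiliated", "Michael Decker", "SMU"),
]

corpus = [
    "Jon Watson works at IBM.",
    "Jon Watson becomes VP at IBM.",
    "Michael Decker joins Data Science group at SMU.",
    "Michael Decker won a national funding award at SMU.",
    "IBM opened a new lab.",  # hypothetical: only one entity is mentioned
]


def label_sentences(facts, corpus):
    """Label each sentence with every relation whose two entities it mentions."""
    labeled = []
    for sentence in corpus:
        for relation, head, tail in facts:
            if head in sentence and tail in sentence:
                labeled.append((sentence, relation))
    return labeled


training_sentences = label_sentences(facts, corpus)
for sentence, relation in training_sentences:
    print(relation, "<-", sentence)
```

The funding-award sentence gets the label affiliated even though it does not really express that relation, so automatically labeled data of this kind carries some noise.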
Relation extraction
• Unsupervised methods (OpenIE, Universal Schema)
Jonathon Watson works at IBM.
(ROOT (S (NP (Jon Watson)) (VP (VBZ works) (PP (IN at) (NP IBM)))))
→ work
Jonathon Watson joined IBM.
(ROOT (S (NP (Jon Watson)) (VP (VBD joined) (NP IBM))))
→ join
Pros: Eliminates the effort of labeling data
Cons: Noisy, large number of relations
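A very rough stand-in for open extraction is to take whatever text sits between two entity mentions as the relation phrase. The sketch below does exactly that and assumes the mentions are already known; it uses no parser and no lemmatization, so it is much cruder than an OpenIE system, and the function name is hypothetical.

```python
def open_extract(sentence, head, tail):
    """Return (head, phrase, tail) using the text between the two mentions,
    or None if either mention is missing or nothing lies between them."""
    text = sentence.rstrip(".")
    start = text.find(head)
    if start == -1:
        return None
    end = text.find(tail, start + len(head))
    if end == -1:
        return None
    phrase = text[start + len(head):end].strip()
    return (head, phrase, tail) if phrase else None


print(open_extract("Jonathon Watson works at IBM.", "Jonathon Watson", "IBM"))
print(open_extract("Jonathon Watson joined IBM.", "Jonathon Watson", "IBM"))
```

The two sentences yield two different relation phrases for what a schema would call one relation, which illustrates the "large number of relations" drawback.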
Relation 1: Works, employer, Company, employedBy, ….
Relation 2: livesIn, currentCity, Country, ….
Relation 3: Vice President, executive, Board member, ….
Relation extraction (Universal Schema)
• Clustering using vector similarity
• Matrix completion to fill the empty values [Yao L. et al., 2012]
employedBy affiliated leaderOf
Jon x x
Michael x
Steve x x
Joyce x x x
Entity types identification (Universal Schema)
• Clustering using vector similarity
• Matrix completion to fill the empty values [Yao L. et al., 2012]
director musician actor
Jon x x
Michael x x
Steve x
Joyce x x
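The vector-similarity idea behind both matrices can be sketched by treating each column as a binary vector over the rows and comparing columns with cosine similarity; an empty cell is then scored by how similar its column is to the columns already observed for that row. This is a simple neighbor-style heuristic, not the matrix factorization of the cited work. The cell positions below are assumed (only the number of marks per row is taken from the first matrix), and the function names are hypothetical.

```python
import math

rows = ["Jon", "Michael", "Steve", "Joyce"]

# Assumed placement of the observed cells; illustrative only.
observed = {
    "Jon": {"employedBy", "affiliated"},
    "Michael": {"affiliated"},
    "Steve": {"employedBy", "leaderOf"},
    "Joyce": {"employedBy", "affiliated", "leaderOf"},
}


def column(rel):
    """Binary vector over rows: 1.0 where the relation is observed."""
    return [1.0 if rel in observed[row] else 0.0 for row in rows]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_missing(row, rel):
    """Score an empty cell by the most similar relation observed for the row."""
    return max((cosine(column(rel), column(r)) for r in observed[row]),
               default=0.0)


for rel in ("employedBy", "leaderOf"):
    print("Michael", rel, round(score_missing("Michael", rel), 3))
```

The same computation would apply to the entity-type matrix, with types such as director, musician, and actor as columns.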
Relation extraction in domain
• Supervised methods – Need domain experts to label the data
• Distantly supervised methods – Hard to find corresponding text
• Unsupervised methods (OpenIE, Universal Schema) – Noisy
A 59-year-old African American man with a past medical history of hypertension, benign prostatic hypertrophy, type II diabetes mellitus for the past 15 years, and chronic back pain presents to the hospital with gross hematuria. The patient states that he noticed blood in his urine last night. The patient also reports mild, intermittent flank pain. The patient states that his diabetes and blood pressure are well controlled with medications, and that he has managed his chronic back pain with 2 aspirin per day for the past 4 years. Vital signs are Temp- 98.6°F, BP- 124/82 mm/Hg, pulse- 88/min, and RR- 14/min. Blood work is notable for HbA1C of 6.5%. A pyelogram reveals a ring sign. His current fasting glucose is 140mmol/L.
What is the most likely etiology of hematuria in this patient?
Symptom
Knowledge graphs in domain
• Domain specific entity extraction is more challenging
• Limited relation types
• Less explicit mention of entity and relation types in text
• Creating simple schema requires domain experts
Knowledge graph - Simple
Jonathon Watson works at IBM.
Michael Decker joined IBM.
Michael Decker attends SMU.
Jonathon Watson
IBM
Michael Decker
SMU
Knowledge graph - Simple + Schema
Jonathon Watson works at IBM.
Michael Decker joins IBM.
Michael Decker attended SMU.
Jonathon Watson
IBM
Michael Decker
SMU
affiliated
affiliated
affiliated
Knowledge graph - Simple + Schema + Ontology
Jonathon Watson works at IBM.
Michael Decker joined IBM.
Michael Decker attends SMU.
Jonathon Watson
IBM
Michael Decker
SMU
affiliated
affiliated
affiliated
Domain, range, constraint
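What the ontology layer adds can be sketched as a domain/range check: each relation declares which entity type it may start from and which it may point to, and triples that break the declaration are flagged. The entity types, the Person/Organization schema for affiliated, and the function name are assumptions for illustration.

```python
# Assumed entity types for the four nodes in the example graph.
types = {
    "Jonathon Watson": "Person",
    "Michael Decker": "Person",
    "IBM": "Organization",
    "SMU": "Organization",
}

# Assumed ontology: relation -> (domain, range).
schema = {"affiliated": ("Person", "Organization")}


def violations(triples):
    """Return the triples whose relation is unknown or whose subject/object
    type does not match the relation's declared domain/range."""
    bad = []
    for s, p, o in triples:
        if p not in schema:
            bad.append((s, p, o))
            continue
        domain, range_ = schema[p]
        if types.get(s) != domain or types.get(o) != range_:
            bad.append((s, p, o))
    return bad


graph = [
    ("Jonathon Watson", "affiliated", "IBM"),
    ("Michael Decker", "affiliated", "IBM"),
    ("Michael Decker", "affiliated", "SMU"),
]

print(violations(graph))
print(violations([("IBM", "affiliated", "SMU")]))
```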
Summary
• A simple knowledge graph works for many applications
• Identify the requirement before finding the solution.
• Many knowledge graphs are publicly available
https://www.youtube.com/watch?v=kao05ArIiok&feature=youtu.be