Leveraging Linked Data to Infer Semantic Relations within
Structured Sources
Mohsen Taheriyan Craig A. Knoblock
Pedro Szekely Jose Luis Ambite
Yinyi Chen
Problem: How to map structured data to
a domain ontology?
Semantic Model
Map the source to the classes & properties in an ontology

Source:
   title                date  name
1  The Island           2009  Walton Ford
2  Excavation at Night  1908  George Wesley Bellows
3  Rose Garden          1901  Maria Oakey Dewing

Domain ontology: CIDOC-CRM (85 classes, 297 properties)

Semantic Types
• title: rdfs:label of E35_Title
• date: P82_at_some_time_within of E52_Time-Span
• name: rdfs:label of E82_Actor_Appellation

Relationships
[Figure: semantic model connecting the semantic types. Additional classes: E22_Man-Made_Object, E12_Production, E21_Person. Links: P102_has_title, P108_was_produced_by, P4_has_time-span, P14_carried_out_by, P131_is_identified_by.]

Idea
• A huge amount of linked open data (LOD) is available in many domains (in RDF format)
• Use LOD as background knowledge
• Exploit the relationships between instances

Approach
Input
• Target source (S)
• Domain Ontologies (O)
• Semantic labels of S
• Linked Data (in the same domain)
Steps
1. Extract graph patterns from the linked data
2. Construct a graph from LOD patterns and the ontology
3. Generate and rank semantic models
Output
• A ranked set of semantic models for S

LOD Patterns
LOD fragment from the British Museum:
../person-institution/57551             rdf:type          E21_Person
../person-institution/57551             skos:prefLabel    "Thomas Burgon"
../person-institution/57551             P98i_was_born     ../person-institution/57551/birth
../person-institution/57551/birth       rdf:type          E67_Birth
../person-institution/57551/birth       P4_has_time-span  ../person-institution/57551/birth/date
../person-institution/57551/birth/date  rdf:type          E52_Time-Span
../person-institution/57551/birth/date  rdfs:label        "1787"

Pattern:
E21_Person --P98i_was_born--> E67_Birth --P4_has_time-span--> E52_Time-Span

Pattern Templates
• Many possible templates for patterns
  – Example: patterns for classes C1, C2, C3
• Consider only tree patterns
• Limit the length of the patterns

Extracting Patterns
• Use SPARQL to query RDF data
• Example: patterns with length 1

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?c1 ?p ?c2 (COUNT(*) AS ?count)
WHERE {
  ?x ?p ?y .
  ?x rdf:type ?c1 .
  ?y rdf:type ?c2 .
  FILTER (?x != ?y)
}
GROUP BY ?c1 ?p ?c2
ORDER BY DESC(?count)

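The counting performed by the length-1 query can also be sketched in plain Python over an in-memory list of triples. This is only an illustration of what the query computes, not the tooling used in the work; the helper name `count_length1_patterns` and the sample triples (loosely modelled on the British Museum fragment, with made-up short identifiers) are assumptions.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def count_length1_patterns(triples):
    """Count (class1, property, class2) patterns between typed instances."""
    # Collect the classes of every instance from its rdf:type triples.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    counts = Counter()
    for s, p, o in triples:
        if s == o:  # mirrors FILTER (?x != ?y)
            continue
        # Only links whose subject and object both have a class contribute.
        for c1 in types.get(s, ()):
            for c2 in types.get(o, ()):
                counts[(c1, p, c2)] += 1
    # Most frequent patterns first, like ORDER BY DESC(?count).
    return counts.most_common()


# Hypothetical sample data: two persons with a birth event, one with a date.
triples = [
    ("person/1", RDF_TYPE, "E21_Person"),
    ("person/1", "P98i_was_born", "person/1/birth"),
    ("person/1/birth", RDF_TYPE, "E67_Birth"),
    ("person/1/birth", "P4_has_time-span", "person/1/birth/date"),
    ("person/1/birth/date", RDF_TYPE, "E52_Time-Span"),
    ("person/2", RDF_TYPE, "E21_Person"),
    ("person/2", "P98i_was_born", "person/2/birth"),
    ("person/2/birth", RDF_TYPE, "E67_Birth"),
]
patterns = dict(count_length1_patterns(triples))
```

On this sample, the pattern (E21_Person, P98i_was_born, E67_Birth) is counted twice and (E67_Birth, P4_has_time-span, E52_Time-Span) once.
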
Merge the Patterns into a Graph
[Figure: graph merged from the extracted patterns. Nodes: E22_Man-Made_Object, E12_Production, E35_Title, E52_Time-Span, E21_Person, E82_Actor_Appellation, E67_Birth. Links: P102_has_title, P108i_was_produced_by, P14_carried_out_by, P131_is_identified_by, P98i_was_born, P4_has_time-span (twice).]
• Links are weighted: less weight for more frequent links
• Links have tags: the identifiers of the patterns containing the link

Add the paths from the Ontology
[Figure: the graph from "Merge the Patterns into a Graph", extended with paths from the ontology. Added node: E39_Actor. Added links include P1_is_identified_by, P98i_was_born and P14_carried_out_by.]
• Links added from the patterns have much lower weight than links added from the ontology

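The two graph-construction steps (merging pattern links, then adding ontology links) might be sketched as follows. The slides do not give the weighting formula, so the choices here are assumptions: a pattern link gets weight 1 divided by the number of patterns containing it, which reproduces the weights shown later in the "What Is Coherence?" example, and ontology links get an arbitrary large constant. The function name, the link directions, and the `livesIn` ontology link are hypothetical.

```python
ONTOLOGY_WEIGHT = 100.0  # assumed constant; the slides only say "much less weight"


def build_graph(patterns, ontology_links):
    """Merge pattern links and ontology links into one weighted, tagged graph.

    patterns: dict mapping a pattern id to a list of (source, property, target).
    ontology_links: iterable of (source, property, target) taken from the ontology.
    Returns a dict: link -> {"weight": float, "tags": set of pattern ids}.
    """
    graph = {}
    # Step 1: add every pattern link and tag it with the patterns containing it.
    for pattern_id, links in patterns.items():
        for link in links:
            entry = graph.setdefault(link, {"weight": 0.0, "tags": set()})
            entry["tags"].add(pattern_id)
    # Assumed weighting: a link found in more patterns gets a lower weight.
    for entry in graph.values():
        entry["weight"] = 1.0 / len(entry["tags"])
    # Step 2: add ontology links that no pattern already covers, at high weight.
    for link in ontology_links:
        graph.setdefault(link, {"weight": ONTOLOGY_WEIGHT, "tags": set()})
    return graph


# Patterns p1, p2, p3 from the coherence example (link directions assumed).
patterns = {
    "p1": [("Event", "organizer", "Person"), ("Event", "location", "Place")],
    "p2": [("Person", "bornIn", "Place"), ("Place", "isPartOf", "Place")],
    "p3": [("Person", "bornIn", "Place"), ("Person", "diedIn", "Place")],
}
ontology_links = [("Person", "livesIn", "Place")]  # hypothetical ontology link
graph = build_graph(patterns, ontology_links)
```

Here `bornIn` ends up with weight 0.5 and tags p2 and p3, while the untagged ontology link keeps the much higher weight, so a minimum-cost search prefers links supported by the linked data.
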
Map Semantic Labels to the Graph
[Figure: the graph from "Add the paths from the Ontology", in which the semantic labels of the source are mapped to nodes.]

Generate and Rank Semantic Models
• Compute Steiner tree for the mapping
  – A minimal tree connecting nodes of mapping
  – A customization of BANKS algorithm [Bhalotia et al., 2002]
• Our algorithm considers both coherence and popularity
• Each tree is a candidate model
• Rank the models based on coherence and cost

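The idea of a low-cost tree connecting the mapped nodes can be illustrated with a simple shortest-path heuristic. This is not the customized BANKS algorithm from the slides: it is a minimal sketch that treats the graph as undirected, picks the root closest to all terminals, and joins the shortest paths from that root. All function names and the example weights are assumptions.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances and predecessors from source in an undirected graph."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def approximate_steiner_tree(edges, terminals):
    """Return (cost, set of edges) for a tree connecting all terminals, or None."""
    adj, weight = {}, {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
        key = frozenset((u, v))
        weight[key] = min(w, weight.get(key, w))  # keep the cheapest parallel edge
    # Search outward from every terminal and keep nodes reachable from all of them.
    dists = {t: dijkstra(adj, t)[0] for t in terminals}
    candidates = [n for n in adj if all(n in dists[t] for t in terminals)]
    if not candidates:
        return None
    # The root is the node with the smallest total distance to the terminals.
    root = min(candidates, key=lambda n: sum(dists[t][n] for t in terminals))
    # Paths taken from one shortest-path tree cannot form a cycle.
    _, prev = dijkstra(adj, root)
    tree = set()
    for t in terminals:
        node = t
        while node != root:
            tree.add(frozenset((node, prev[node])))
            node = prev[node]
    return sum(weight[e] for e in tree), tree


# Hypothetical weights: the Person-Place link is cheaper than the other two.
edges = [("Event", "Person", 1.0), ("Event", "Place", 1.0), ("Person", "Place", 0.5)]
cost, tree = approximate_steiner_tree(edges, ["Person", "Place", "Event"])
```

The heuristic returns a two-link tree of cost 1.5 for this input. Cost alone does not capture coherence, which the ranking step handles separately.
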
What Is Coherence?
Patterns
• p1: organizer (Event, Person), location (Event, Place)
• p2: bornIn (Person, Place), isPartOf (Place, Place)
• p3: bornIn (Person, Place), diedIn (Person, Place)
Graph (link: weight, tags)
• organizer: 1, p1
• location: 1, p1
• bornIn: 0.5, p2 and p3
• isPartOf: 1, p2
• diedIn: 1, p3
Labels
• Person, Place, Event
Steiner Trees
• organizer (p1) + bornIn (p2, p3)
• location (p1) + bornIn (p2, p3)
• organizer (p1) + location (p1): not the minimal model, but more coherent

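The trade-off in this example can be made concrete with a small ranking sketch. The slides do not define the coherence measure, so the one used here is an assumption: the largest number of links in a tree that come from a single pattern. Ties are broken by total link weight. The function and variable names are hypothetical.

```python
from collections import Counter


def coherence(tree):
    """Largest number of links in the tree that share one pattern tag (assumed measure)."""
    tag_counts = Counter(tag for link in tree for tag in link["tags"])
    return max(tag_counts.values(), default=0)


def cost(tree):
    """Total weight of the links in the tree."""
    return sum(link["weight"] for link in tree)


def rank_models(trees):
    """Most coherent trees first; ties broken by lower cost."""
    return sorted(trees, key=lambda t: (-coherence(t), cost(t)))


# Links, weights and tags as given in the example graph.
organizer = {"name": "organizer", "weight": 1.0, "tags": {"p1"}}
location = {"name": "location", "weight": 1.0, "tags": {"p1"}}
born_in = {"name": "bornIn", "weight": 0.5, "tags": {"p2", "p3"}}

candidates = [
    [organizer, born_in],   # cost 1.5, links come from different patterns
    [location, born_in],    # cost 1.5, links come from different patterns
    [organizer, location],  # cost 2.0, both links come from p1
]
ranked = rank_models(candidates)
```

Under this measure, the tree with organizer and location ranks first even though its cost (2.0) is higher than that of the other two trees (1.5), which matches the "not minimal but more coherent" remark.
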
Steiner Tree
[Figure: the graph from "Add the paths from the Ontology", over which the Steiner tree connecting the mapped nodes is computed.]

Evaluation
• Correct semantic types given
• Linked data: 3,398,350 triples published by Smithsonian American Art Museum
• Extracted patterns of length 1 and 2
• Compute precision and recall between learned links and correct links

Evaluation Dataset
# sources                               29
# classes in the ontologies             147
# properties in the ontologies          409
# nodes in the gold standard models     812
# links in the gold standard models     785

Example
Correct model:
<Artwork,location,Museum>
<Artwork,creator,Person>
Learned model:
<Museum,founder,Person>
<Artwork,location,Museum>
Precision: 0.5
Recall: 0.5

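The precision and recall in this example follow from comparing the two sets of links. A minimal sketch, assuming each link is represented as a (source, property, target) tuple; the function name is hypothetical.

```python
def precision_recall(learned, correct):
    """Precision and recall of learned links against the correct links."""
    learned, correct = set(learned), set(correct)
    hits = len(learned & correct)  # links that appear in both models
    precision = hits / len(learned) if learned else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


correct_model = {
    ("Artwork", "location", "Museum"),
    ("Artwork", "creator", "Person"),
}
learned_model = {
    ("Museum", "founder", "Person"),
    ("Artwork", "location", "Museum"),
}
precision, recall = precision_recall(learned_model, correct_model)
```

Only <Artwork,location,Museum> appears in both models, so one of the two learned links is correct and one of the two correct links is recovered.
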
Gold Standard Models - Example 1
[Figure]
Gold Standard Models - Example 2
[Figure]

Results
background knowledge                           precision  recall  time (s)
domain ontology                                0.07       0.05    0.17
domain ontology + patterns of length 1         0.65       0.55    0.75
domain ontology + patterns of length 1 and 2   0.78       0.70    0.46

Related Work
• Mapping databases and spreadsheets to ontologies
  – Mapping languages and tools (D2R, R2RML)
  – String similarity between column names and ontology terms
• Understand semantics of Web tables
  – Use column headers and cell values to find the labels and relations from a database of labels and relations populated from the Web
• Exploit Linked Open Data (LOD)
  – Link the values to the entities in LOD to find the types of the values and their relationships
• Learn semantic models of structured data sources from previously modeled sources
  – Learn from the popular and coherent patterns in known semantic models

Discussion & Future Work
• Automatically infer semantic relations from LOD
• Help to publish consistent RDF data
• Extract longer patterns from LOD