Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, Jose Luis Ambite, Yinyi Chen

Post on 11-Jan-2017

Page 1: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, Jose Luis Ambite, Yinyi Chen

Page 2: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Problem: How to map structured data to a domain ontology?

Page 3: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Semantic Model

Map the source to the classes & properties in an ontology

Source:

     title                 date   name
  1  The Island            2009   Walton Ford
  2  Excavation at Night   1908   George Wesley Bellows
  3  Rose Garden           1901   Maria Oakey Dewing

Domain Ontology: CIDOC-CRM (85 classes, 297 properties)

Page 4: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Semantic Types

     title                 date   name
  1  The Island            2009   Walton Ford
  2  Excavation at Night   1908   George Wesley Bellows
  3  Rose Garden           1901   Maria Oakey Dewing

  title: E35_Title (rdfs:label)
  date:  E52_Time-Span (P82_at_some_time_within)
  name:  E82_Actor_Appellation (rdfs:label)

Page 5: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

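Relationships

     title                 date   name
  1  The Island            2009   Walton Ford
  2  Excavation at Night   1908   George Wesley Bellows
  3  Rose Garden           1901   Maria Oakey Dewing

  title: E35_Title (rdfs:label)
  date:  E52_Time-Span (P82_at_some_time_within)
  name:  E82_Actor_Appellation (rdfs:label)

Relationships between the classes:

  E22_Man-Made_Object --P102_has_title--> E35_Title
  E22_Man-Made_Object --P108_was_produced_by--> E12_Production
  E12_Production --P4_has_time-span--> E52_Time-Span
  E12_Production --P14_carried_out_by--> E21_Person
  E21_Person --P131_is_identified_by--> E82_Actor_Appellation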

Page 6: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Idea

• There is a huge amount of linked data available in many domains (RDF format)

• Use LOD as the background knowledge

• Exploit the relationships between instances

Page 7: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Approach

Input:
• Target source (S)
• Domain Ontologies (O)
• Semantic labels of S
• Linked Data (in the same domain)

Steps:
1. Extract graph patterns from the linked data
2. Construct a graph from LOD patterns and the ontology
3. Generate and rank semantic models

Output: A ranked set of semantic models for S

Page 9: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

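LOD Patterns

LOD fragment from the British Museum:

  ../person-institution/57551   rdf:type   E21_Person
  ../person-institution/57551   skos:prefLabel   "Thomas Burgon"
  ../person-institution/57551   P98i_was_born   ../person-institution/57551/birth
  ../person-institution/57551/birth   rdf:type   E67_Birth
  ../person-institution/57551/birth   P4_has_time-span   ../person-institution/57551/birth/date
  ../person-institution/57551/birth/date   rdf:type   E52_Time-Span
  ../person-institution/57551/birth/date   rdfs:label   "1787"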

Page 10: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

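LOD Patterns

LOD fragment from the British Museum:

  ../person-institution/57551   rdf:type   E21_Person
  ../person-institution/57551   skos:prefLabel   "Thomas Burgon"
  ../person-institution/57551   P98i_was_born   ../person-institution/57551/birth
  ../person-institution/57551/birth   rdf:type   E67_Birth
  ../person-institution/57551/birth   P4_has_time-span   ../person-institution/57551/birth/date
  ../person-institution/57551/birth/date   rdf:type   E52_Time-Span
  ../person-institution/57551/birth/date   rdfs:label   "1787"

Pattern:

  E21_Person --P98i_was_born--> E67_Birth --P4_has_time-span--> E52_Time-Span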

Page 11: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Pattern Templates

• Many possible templates for patterns
  – Example: patterns for classes C1, C2, C3
• Consider only tree patterns
• Limit the length of the patterns

Page 12: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Extracting Patterns

• Use SPARQL to query RDF data
• Example: patterns with length 1

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT DISTINCT ?c1 ?p ?c2 (COUNT(*) AS ?count)
    WHERE {
      ?x ?p ?y .
      ?x rdf:type ?c1 .
      ?y rdf:type ?c2 .
      FILTER (?x != ?y)
    }
    GROUP BY ?c1 ?p ?c2
    ORDER BY DESC(?count)
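As a rough illustration of what the length-1 query computes, the same counting can be sketched in plain Python over an in-memory list of triples. The triples and the short identifiers (`person1`, `birth1`, and so on) are invented placeholders, not data from the talk, and the sketch skips `rdf:type` triples as a simplification that the SPARQL query itself does not make.

```python
from collections import Counter

# Toy instance triples (subject, predicate, object). The identifiers are
# hypothetical placeholders loosely modeled on the British Museum fragment.
TRIPLES = [
    ("person1", "rdf:type", "E21_Person"),
    ("birth1", "rdf:type", "E67_Birth"),
    ("date1", "rdf:type", "E52_Time-Span"),
    ("person1", "P98i_was_born", "birth1"),
    ("birth1", "P4_has_time-span", "date1"),
    ("person2", "rdf:type", "E21_Person"),
    ("birth2", "rdf:type", "E67_Birth"),
    ("person2", "P98i_was_born", "birth2"),
]


def count_length1_patterns(triples):
    """Count (class1, property, class2) patterns, similar to the GROUP BY query."""
    # First pass: collect the classes of every typed instance.
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    # Second pass: lift each instance-level link to the class level.
    counts = Counter()
    for s, p, o in triples:
        if p == "rdf:type" or s == o:  # simplification: ignore type links and self-loops
            continue
        for c1 in types.get(s, ()):
            for c2 in types.get(o, ()):
                counts[(c1, p, c2)] += 1
    return counts


patterns = count_length1_patterns(TRIPLES)
for pattern, count in patterns.most_common():
    print(pattern, count)
```

Sorting with `most_common()` plays the role of `ORDER BY DESC(?count)`; on a real dataset the SPARQL endpoint would do this work instead.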

Page 14: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Merge the Patterns into a Graph

  E22_Man-Made_Object --P102_has_title--> E35_Title
  E22_Man-Made_Object --P108i_was_produced_by--> E12_Production
  E12_Production --P4_has_time-span--> E52_Time-Span
  E12_Production --P14_carried_out_by--> E21_Person
  E21_Person --P131_is_identified_by--> E82_Actor_Appellation
  E21_Person --P98i_was_born--> E67_Birth
  E67_Birth --P4_has_time-span--> E52_Time-Span

• Links are weighted: less weight for more frequent links
• Links have tags: the identifier of the patterns containing the link
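The merge step can be sketched as follows. The slides only state that more frequent links get less weight and that each link is tagged with the patterns containing it, so the weight formula `1 - frequency / (max_frequency + 1)`, the `Link` class, and the pattern frequencies below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    weight: float
    tags: set = field(default_factory=set)


def merge_patterns(patterns):
    """Merge patterns into one graph keyed by (source, property, target).

    `patterns` maps a pattern id to (frequency, list of links).
    """
    frequency = {}
    tags = {}
    for pattern_id, (count, links) in patterns.items():
        for link in links:
            frequency[link] = frequency.get(link, 0) + count
            tags.setdefault(link, set()).add(pattern_id)
    if not frequency:
        return {}
    top = max(frequency.values())
    # Hypothetical weighting: the more frequent a link, the lower its weight.
    return {
        link: Link(weight=1.0 - frequency[link] / (top + 1), tags=tags[link])
        for link in frequency
    }


# Invented frequencies; only the class and property names come from the slides.
PRODUCED = ("E22_Man-Made_Object", "P108i_was_produced_by", "E12_Production")
PROD_TIME = ("E12_Production", "P4_has_time-span", "E52_Time-Span")
CARRIED = ("E12_Production", "P14_carried_out_by", "E21_Person")
BORN = ("E21_Person", "P98i_was_born", "E67_Birth")
BIRTH_TIME = ("E67_Birth", "P4_has_time-span", "E52_Time-Span")

PATTERNS = {
    "p1": (30, [PRODUCED, PROD_TIME]),
    "p2": (20, [PROD_TIME, CARRIED]),
    "p3": (10, [BORN, BIRTH_TIME]),
}

graph = merge_patterns(PATTERNS)
for link, info in graph.items():
    print(link, round(info.weight, 3), sorted(info.tags))
```

A link that appears in several patterns, such as `PROD_TIME` here, accumulates both a higher frequency (hence a lower weight) and more than one tag, which is what the later coherence computation relies on.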

Page 15: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Add the paths from the Ontology

[Figure: the graph merged from the LOD patterns (E22_Man-Made_Object, E12_Production, E35_Title, E52_Time-Span, E21_Person, E82_Actor_Appellation, E67_Birth), extended with the class E39_Actor and additional links taken from the ontology: P1_is_identified_by (twice), P98i_was_born, P14_carried_out_by, and P4_has_time-span.]

Links added from the patterns have much lower weight than links added from the ontology.

Page 17: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Map Semantic Labels to the Graph

[Figure: the graph built from the LOD patterns and the ontology paths, with classes E22_Man-Made_Object, E12_Production, E35_Title, E52_Time-Span, E21_Person, E82_Actor_Appellation, E67_Birth, and E39_Actor, onto which the semantic labels of the source are mapped.]

Page 19: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Generate and Rank Semantic Models

• Compute Steiner tree for the mapping
  – A minimal tree connecting nodes of mapping
  – A customization of BANKS algorithm [Bhalotia et al., 2002]
• Our algorithm considers both coherence and popularity
• Each tree is a candidate model
• Rank the models based on coherence and cost

Page 20: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

What Is Coherence?

Patterns:

  p1: Event --organizer-- Person, Event --location-- Place
  p2: Person --bornIn-- Place, Place --isPartOf-- Place
  p3: Person --bornIn-- Place, Person --diedIn-- Place

Page 21: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

What Is Coherence?

Patterns:

  p1: Event --organizer-- Person, Event --location-- Place
  p2: Person --bornIn-- Place, Place --isPartOf-- Place
  p3: Person --bornIn-- Place, Person --diedIn-- Place

Graph:

  link        weight   tags
  organizer   1        p1
  location    1        p1
  bornIn      0.5      p2, p3
  isPartOf    1        p2
  diedIn      1        p3

Page 22: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

What Is Coherence?

Patterns:

  p1: Event --organizer-- Person, Event --location-- Place
  p2: Person --bornIn-- Place, Place --isPartOf-- Place
  p3: Person --bornIn-- Place, Person --diedIn-- Place

Graph:

  link        weight   tags
  organizer   1        p1
  location    1        p1
  bornIn      0.5      p2, p3
  isPartOf    1        p2
  diedIn      1        p3

Labels: Person, Place, Event

Page 23: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

What Is Coherence?

Patterns:

  p1: Event --organizer-- Person, Event --location-- Place
  p2: Person --bornIn-- Place, Place --isPartOf-- Place
  p3: Person --bornIn-- Place, Person --diedIn-- Place

Graph:

  link        weight   tags
  organizer   1        p1
  location    1        p1
  bornIn      0.5      p2, p3
  isPartOf    1        p2
  diedIn      1        p3

Labels: Person, Place, Event

Steiner Trees:

  1. organizer (p1) + bornIn (p2, p3)
  2. location (p1) + bornIn (p2, p3)
  3. organizer (p1) + location (p1)

Tree 3 is not the minimal model, but it is more coherent.
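The ranking in this example can be reproduced with a small sketch. It assumes a simplified reading of coherence, namely the largest number of links in a tree that share one pattern tag, with cost as the sum of link weights; the talk does not spell out the exact formula, so treat both definitions and the helper names as illustrative.

```python
# Links of the example graph: name -> (weight, pattern tags), as on the slide.
LINKS = {
    "organizer": (1.0, {"p1"}),
    "location": (1.0, {"p1"}),
    "bornIn": (0.5, {"p2", "p3"}),
}

# The three candidate trees connecting Person, Place and Event.
TREES = {
    "organizer+bornIn": ["organizer", "bornIn"],
    "location+bornIn": ["location", "bornIn"],
    "organizer+location": ["organizer", "location"],
}


def cost(tree):
    """Total weight of the links in the tree."""
    return sum(LINKS[name][0] for name in tree)


def coherence(tree):
    """Assumed measure: largest number of links sharing one pattern tag."""
    per_pattern = {}
    for name in tree:
        for tag in LINKS[name][1]:
            per_pattern[tag] = per_pattern.get(tag, 0) + 1
    return max(per_pattern.values(), default=0)


# Rank by coherence first (higher is better), then by cost (lower is better).
ranked = sorted(TREES, key=lambda t: (-coherence(TREES[t]), cost(TREES[t])))
for name in ranked:
    print(name, coherence(TREES[name]), cost(TREES[name]))
```

Under these assumptions the `organizer+location` tree ranks first even though its cost (2.0) is higher than the 1.5 of the other two, because both of its links come from the same pattern p1.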

Page 24: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Steiner Tree

[Figure: the Steiner tree computed over the graph of LOD patterns and ontology paths, with classes E22_Man-Made_Object, E12_Production, E35_Title, E52_Time-Span, E21_Person, E82_Actor_Appellation, E67_Birth, and E39_Actor.]

Page 25: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Evaluation

• Correct semantic types given
• Linked data: 3,398,350 triples published by Smithsonian American Art Museum
• Extracted patterns of length 1 and 2
• Compute precision and recall between learned links and correct links

Evaluation Dataset

  # sources                               29
  # classes in the ontologies             147
  # properties in the ontologies          409
  # nodes in the gold standard models     812
  # links in the gold standard models     785

Page 26: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Example

Correct model:

  <Artwork,location,Museum>
  <Artwork,creator,Person>

Learned model:

  <Museum,founder,Person>
  <Artwork,location,Museum>

Precision: 0.5
Recall: 0.5
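The figures in this example follow from treating each model as a set of links: one of the two learned links is correct, and one of the two correct links was learned. A minimal sketch of that computation, with a hypothetical helper name `precision_recall`:

```python
# Links of the correct and the learned model, copied from the example slide.
correct = {
    ("Artwork", "location", "Museum"),
    ("Artwork", "creator", "Person"),
}
learned = {
    ("Museum", "founder", "Person"),
    ("Artwork", "location", "Museum"),
}


def precision_recall(learned_links, correct_links):
    """Precision and recall of learned links against the correct links."""
    hits = len(learned_links & correct_links)
    precision = hits / len(learned_links) if learned_links else 0.0
    recall = hits / len(correct_links) if correct_links else 0.0
    return precision, recall


precision, recall = precision_recall(learned, correct)
print(precision, recall)
```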

Page 27: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Gold Standard Models - Example 1

Page 28: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Gold Standard Models - Example 2

Page 29: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Results

  background knowledge                            precision   recall   time (s)
  domain ontology                                 0.07        0.05     0.17
  domain ontology + patterns of length 1          0.65        0.55     0.75
  domain ontology + patterns of length 1 and 2    0.78        0.70     0.46

Page 30: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Related Work

• Mapping databases and spreadsheets to ontologies
  – Mapping languages and tools (D2R, R2RML)
  – String similarity between column names and ontology terms
• Understand semantics of Web tables
  – Use column headers and cell values to find the labels and relations from a database of labels and relations populated from the Web
• Exploit Linked Open Data (LOD)
  – Link the values to the entities in LOD to find the types of the values and their relationships
• Learn semantic models of structured data sources from previously modeled sources
  – Learn from the popular and coherent patterns in known semantic models

Page 31: Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Discussion & Future Work

• Automatically infer semantic relations from LOD

• Help to publish consistent RDF data

• Extract longer patterns from LOD