Posted on 11-Jan-2017


Leveraging Linked Data to Infer Semantic Relations within Structured Sources

Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, Jose Luis Ambite, Yinyi Chen

Problem: How to map structured data to a domain ontology?

Semantic Model

Map the source to the classes and properties in an ontology.

Source:

   title                 date  name
1  The Island            2009  Walton Ford
2  Excavation at Night   1908  George Wesley Bellows
3  Rose Garden           1901  Maria Oakey Dewing

Domain ontology: CIDOC-CRM (85 classes, 297 properties)

Semantic Types

Each column is annotated with a semantic type (an ontology class and a property):

• title: E35_Title, rdfs:label
• date: E52_Time-Span, P82_at_some_time_within
• name: E82_Actor_Appellation, rdfs:label

Relationships

In addition to the semantic types, the semantic model connects the classes through ontology properties:

• <E22_Man-Made_Object, P102_has_title, E35_Title>
• <E22_Man-Made_Object, P108i_was_produced_by, E12_Production>
• <E12_Production, P4_has_time-span, E52_Time-Span>
• <E12_Production, P14_carried_out_by, E21_Person>
• <E21_Person, P131_is_identified_by, E82_Actor_Appellation>

Idea

• A large amount of linked data, in RDF format, is available in many domains
• Use Linked Open Data (LOD) as background knowledge
• Exploit the relationships between instances

Approach

Input:
• Target source (S)
• Domain ontologies (O)
• Semantic labels of S
• Linked data (in the same domain)

1. Extract graph patterns from the linked data
2. Construct a graph from the LOD patterns and the ontology
3. Generate and rank semantic models

Output: A ranked set of semantic models for S

Approach, step 1: Extract graph patterns from the linked data

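LOD Patterns

LOD fragment from the British Museum:

<../person-institution/57551, rdf:type, E21_Person>
<../person-institution/57551, skos:prefLabel, "Thomas Burgon">
<../person-institution/57551, P98i_was_born, ../person-institution/57551/birth>
<../person-institution/57551/birth, rdf:type, E67_Birth>
<../person-institution/57551/birth, P4_has_time-span, ../person-institution/57551/birth/date>
<../person-institution/57551/birth/date, rdf:type, E52_Time-Span>
<../person-institution/57551/birth/date, rdfs:label, "1787">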

Pattern extracted from this fragment (each instance replaced by its class):

<E21_Person, P98i_was_born, E67_Birth>
<E67_Birth, P4_has_time-span, E52_Time-Span>
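
As a rough illustration of how the fragment becomes a pattern, the Python sketch below abstracts instance-level triples into class-level links by replacing each typed resource with its rdf:type. The triples are plain strings copied from the slide (the elided "../" prefixes are kept as-is). The function name class_level_links is hypothetical, and the sketch assumes each resource has exactly one type.

```python
RDF_TYPE = "rdf:type"

# The British Museum fragment, as plain string triples
fragment = [
    ("../person-institution/57551", "rdf:type", "E21_Person"),
    ("../person-institution/57551", "skos:prefLabel", "Thomas Burgon"),
    ("../person-institution/57551", "P98i_was_born", "../person-institution/57551/birth"),
    ("../person-institution/57551/birth", "rdf:type", "E67_Birth"),
    ("../person-institution/57551/birth", "P4_has_time-span",
     "../person-institution/57551/birth/date"),
    ("../person-institution/57551/birth/date", "rdf:type", "E52_Time-Span"),
    ("../person-institution/57551/birth/date", "rdfs:label", "1787"),
]

def class_level_links(triples):
    """Return (class, property, class) links between typed resources (hypothetical helper)."""
    # Assumes one rdf:type per resource; a later type would overwrite an earlier one
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    links = []
    for s, p, o in triples:
        # Keep only links whose subject and object are both typed instances;
        # literals such as "1787" and "Thomas Burgon" are skipped
        if p != RDF_TYPE and s in types and o in types:
            links.append((types[s], p, types[o]))
    return links

pattern = class_level_links(fragment)
print(pattern)
# -> [('E21_Person', 'P98i_was_born', 'E67_Birth'), ('E67_Birth', 'P4_has_time-span', 'E52_Time-Span')]
```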

Pattern Templates

• Many possible templates for patterns
  – Example: patterns for classes C1, C2, C3
• Consider only tree patterns
• Limit the length of the patterns
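
To make "many possible templates" concrete, the sketch below enumerates, for three classes C1, C2, C3, every directed tree with two links that connects all three, and groups the results by shape. It is only an illustration under assumptions not stated on the slides: property names are left abstract, and templates that repeat a class or contain a cycle are not generated (in line with the restriction to tree patterns). The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

def two_link_tree_templates(classes):
    """Enumerate directed trees with two links that connect three distinct classes.

    Each template is a frozenset of (source, target) pairs; property names are left abstract.
    """
    templates = set()
    for a, b, c in permutations(classes):
        # b is the middle class: one link joins a and b, the other joins b and c
        for flip_ab, flip_bc in product([False, True], repeat=2):
            link1 = (b, a) if flip_ab else (a, b)
            link2 = (c, b) if flip_bc else (b, c)
            templates.add(frozenset([link1, link2]))  # the set removes mirrored duplicates
    return templates

def shape(template):
    """Classify a two-link template as a chain, an out-star, or an in-star."""
    sources = {s for s, _ in template}
    targets = {t for _, t in template}
    if len(sources) == 1:
        return "out-star"  # X -> Y and X -> Z
    if len(targets) == 1:
        return "in-star"   # Y -> X and Z -> X
    return "chain"         # X -> Y -> Z

templates = two_link_tree_templates(["C1", "C2", "C3"])
print(len(templates), Counter(shape(t) for t in templates))
```

For three classes this yields 12 templates (6 chains, 3 out-stars, 3 in-stars), which already suggests why the number of templates has to be bounded by restricting the shape and the length.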

Extracting Patterns

• Use SPARQL to query the RDF data
• Example: patterns of length 1

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT DISTINCT ?c1 ?p ?c2 (COUNT(*) AS ?count)
    WHERE {
      ?x ?p ?y .
      ?x rdf:type ?c1 .
      ?y rdf:type ?c2 .
      FILTER (?x != ?y)
    }
    GROUP BY ?c1 ?p ?c2
    ORDER BY DESC(?count)
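
For small in-memory data, the query's logic can be mirrored in a few lines of Python. The sketch below counts (c1, p, c2) links the way the SPARQL query does, including the join over all type pairs of the two resources and the FILTER on distinct resources. The ex: identifiers are hypothetical sample data, not taken from any dataset.

```python
from collections import Counter

def length1_patterns(triples):
    """Count (c1, p, c2) links like the SPARQL query:
    ?x ?p ?y . ?x rdf:type ?c1 . ?y rdf:type ?c2 . FILTER (?x != ?y)
    """
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    counts = Counter()
    for x, p, y in triples:
        if x == y:
            continue  # FILTER (?x != ?y)
        # As in the SPARQL join, every (type of x, type of y) pair yields one row
        for c1 in types.get(x, ()):
            for c2 in types.get(y, ()):
                counts[(c1, p, c2)] += 1  # GROUP BY ?c1 ?p ?c2 with COUNT(*)
    return counts.most_common()  # ORDER BY DESC(?count)

# Hypothetical sample data
triples = [
    ("ex:obj1", "rdf:type", "E22_Man-Made_Object"),
    ("ex:obj2", "rdf:type", "E22_Man-Made_Object"),
    ("ex:prod1", "rdf:type", "E12_Production"),
    ("ex:prod2", "rdf:type", "E12_Production"),
    ("ex:person1", "rdf:type", "E21_Person"),
    ("ex:obj1", "P108i_was_produced_by", "ex:prod1"),
    ("ex:obj2", "P108i_was_produced_by", "ex:prod2"),
    ("ex:prod1", "P14_carried_out_by", "ex:person1"),
]
for (c1, p, c2), n in length1_patterns(triples):
    print(c1, p, c2, n)
```

The rdf:type triples themselves produce no rows here because the classes are not typed in the sample; the SPARQL query behaves the same way on such data.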

Approach, step 2: Construct a graph from the LOD patterns and the ontology

Merge the Patterns into a Graph

[Figure: the patterns merged into one graph over the classes E22_Man-Made_Object, E35_Title, E12_Production, E52_Time-Span, E21_Person, E82_Actor_Appellation, and E67_Birth, with the links P102_has_title, P108i_was_produced_by, P14_carried_out_by, P131_is_identified_by, P98i_was_born, and P4_has_time-span (twice)]

• Links are weighted: more frequent links get less weight
• Links have tags: the identifiers of the patterns that contain the link
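
A minimal sketch of the merge step, assuming patterns are lists of class-level links keyed by a pattern identifier, and using class names directly as node ids (so two uses of the same class share one node). The slides say only that more frequent links get less weight; the weight 1 / (number of patterns containing the link) used here is an assumption, chosen because it reproduces the 1 and 0.5 weights of the coherence example later in the deck. The pattern ids m1 to m3 and the function name merge_patterns are hypothetical.

```python
from collections import defaultdict

def merge_patterns(patterns):
    """Merge class-level patterns into one graph.

    patterns: {pattern_id: [(source_class, property, target_class), ...]}
    Returns {link: (weight, tags)}, where tags are the ids of the patterns containing the link.
    """
    tags = defaultdict(set)
    for pattern_id, links in patterns.items():
        for link in links:
            tags[link].add(pattern_id)
    # Assumed weighting: links shared by more patterns are cheaper
    return {link: (1.0 / len(ids), ids) for link, ids in tags.items()}

# Hypothetical patterns over CIDOC-CRM classes
patterns = {
    "m1": [("E22_Man-Made_Object", "P108i_was_produced_by", "E12_Production"),
           ("E12_Production", "P14_carried_out_by", "E21_Person")],
    "m2": [("E21_Person", "P98i_was_born", "E67_Birth"),
           ("E67_Birth", "P4_has_time-span", "E52_Time-Span")],
    "m3": [("E22_Man-Made_Object", "P108i_was_produced_by", "E12_Production"),
           ("E12_Production", "P4_has_time-span", "E52_Time-Span")],
}
graph = merge_patterns(patterns)
for link, (weight, ids) in graph.items():
    print(link, weight, sorted(ids))
```

Here the production link appears in m1 and m3, so it gets weight 0.5 and both tags; every other link gets weight 1.0 and a single tag.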

Add the Paths from the Ontology

[Figure: the same graph extended with the class E39_Actor and additional links from the ontology: P1_is_identified_by (twice), P98i_was_born, P14_carried_out_by, and P4_has_time-span]

• The links added from the patterns have much less weight than the links added from the ontology
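
Under similar assumptions, ontology links can be added with a weight far above any pattern weight, so that a minimum-cost tree search prefers links attested in the linked data. ONTOLOGY_WEIGHT and add_ontology_links are hypothetical names and the value 100 is arbitrary; the slides say only that pattern links weigh much less than ontology links. As a design choice, a link already supplied by a pattern is not duplicated, since a parallel, more expensive copy would never be chosen over it.

```python
ONTOLOGY_WEIGHT = 100.0  # hypothetical; anything much larger than the pattern weights

def add_ontology_links(graph, ontology_links):
    """Return a copy of graph extended with links derived from the ontology.

    graph: {(source_class, property, target_class): (weight, tags)}
    A link already supplied by the patterns keeps its low weight and its tags.
    """
    extended = dict(graph)
    for link in ontology_links:
        extended.setdefault(link, (ONTOLOGY_WEIGHT, frozenset()))
    return extended

pattern_graph = {
    ("E21_Person", "P98i_was_born", "E67_Birth"): (1.0, frozenset({"m2"})),
}
ontology_links = [
    ("E21_Person", "P98i_was_born", "E67_Birth"),  # also in a pattern: unchanged
    ("E12_Production", "P14_carried_out_by", "E39_Actor"),
    ("E39_Actor", "P1_is_identified_by", "E82_Actor_Appellation"),
]
graph = add_ontology_links(pattern_graph, ontology_links)
for link, (weight, tags) in graph.items():
    print(link, weight, sorted(tags))
```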

Approach, step 3: Generate and rank semantic models

Map Semantic Labels to the Graph

[Figure: the semantic labels of the source mapped onto the nodes of the graph built from the LOD patterns and the ontology]


Generate and Rank Semantic Models

• Compute a Steiner tree for the mapping
  – A minimal tree connecting the nodes of the mapping
  – A customization of the BANKS algorithm [Bhalotia et al., 2002]
• Our algorithm considers both coherence and popularity
• Each tree is a candidate model
• Rank the models based on coherence and cost
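
The customized BANKS algorithm, which also accounts for coherence and popularity, is not reproduced here. As a simpler stand-in, the sketch below uses the classic shortest-path heuristic for approximate Steiner trees: start from one labeled node and repeatedly attach the terminal that is cheapest to reach from the current tree. Links are treated as undirected, and parallel links between the same two nodes are collapsed to the cheapest one. The function names are hypothetical; the example graph is a three-node toy (Person, Place, Event) whose weights match the coherence example.

```python
import heapq

def _adjacency(edges):
    """Undirected adjacency; parallel links keep only their cheapest weight."""
    weight = {}
    for u, v, w in edges:
        if u == v:
            continue  # self-loops never help connect terminals
        key = frozenset((u, v))
        weight[key] = min(w, weight.get(key, float("inf")))
    adj = {}
    for key, w in weight.items():
        u, v = tuple(key)
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj, weight

def _path_to_tree(adj, tree_nodes, start):
    """Dijkstra from start until a node already in the tree is popped; returns (cost, path)."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u in tree_nodes:
            path = [u]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return d, path
        for v, w in adj.get(u, ()):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    raise ValueError(f"{start!r} is not connected to the tree")

def steiner_tree(edges, terminals):
    """Approximate Steiner tree over (u, v, weight) edges; returns {frozenset({u, v}): weight}."""
    adj, weight = _adjacency(edges)
    tree_nodes = {terminals[0]}
    tree_edges = {}
    remaining = set(terminals[1:])
    while remaining:
        # Attach the terminal that is currently cheapest to connect (ties: alphabetical)
        best = None
        for t in sorted(remaining):
            cost, path = _path_to_tree(adj, tree_nodes, t)
            if best is None or cost < best[0]:
                best = (cost, path, t)
        _, path, t = best
        for a, b in zip(path, path[1:]):
            key = frozenset((a, b))
            tree_edges[key] = weight[key]
        tree_nodes.update(path)
        remaining.discard(t)
    return tree_edges

# Toy graph: organizer (Event-Person) 1, location (Event-Place) 1, bornIn (Person-Place) 0.5
edges = [("Event", "Person", 1.0), ("Event", "Place", 1.0), ("Person", "Place", 0.5)]
tree = steiner_tree(edges, ["Person", "Place", "Event"])
print(sorted(tuple(sorted(e)) for e in tree), sum(tree.values()))
# -> [('Event', 'Person'), ('Person', 'Place')] 1.5
```

A purely cost-driven search returns one of the two cheapest trees (cost 1.5), not the tree built from a single pattern; that gap is what the coherence criterion addresses.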

What Is Coherence?

Patterns:
• p1: <Event, organizer, Person>, <Event, location, Place>
• p2: <Person, bornIn, Place>, <Place, isPartOf, Place>
• p3: <Person, bornIn, Place>, <Person, diedIn, Place>

Graph built from the patterns (link: weight; tags):
• <Event, organizer, Person>: 1; p1
• <Event, location, Place>: 1; p1
• <Person, bornIn, Place>: 0.5; p2, p3
• <Place, isPartOf, Place>: 1; p2
• <Person, diedIn, Place>: 1; p3

Labels: Person, Place, Event

Steiner trees (cost = total link weight):
• <Event, organizer, Person> (p1) and <Person, bornIn, Place> (p2, p3): cost 1.5
• <Event, location, Place> (p1) and <Person, bornIn, Place> (p2, p3): cost 1.5
• <Event, organizer, Person> (p1) and <Event, location, Place> (p1): cost 2. Not the minimal model, but more coherent: both links come from the same pattern, p1.
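
The slides do not give a formula for coherence. Purely as an illustration, the sketch below assumes coherence is the largest fraction of a tree's links that share one pattern tag, and ranks the three candidate trees by coherence (higher first) and then by cost (lower first). GRAPH, cost, and coherence are hypothetical names; the weights and tags are those of the graph above, restricted to the links that appear in the candidate trees.

```python
from collections import Counter

# Links used by the candidate trees: (source, property, target) -> (weight, pattern tags)
GRAPH = {
    ("Event", "organizer", "Person"): (1.0, {"p1"}),
    ("Event", "location", "Place"): (1.0, {"p1"}),
    ("Person", "bornIn", "Place"): (0.5, {"p2", "p3"}),
}

def cost(tree):
    """Total weight of the tree's links."""
    return sum(GRAPH[link][0] for link in tree)

def coherence(tree):
    """Assumed measure: largest fraction of the tree's links that share one pattern tag."""
    tag_counts = Counter(tag for link in tree for tag in GRAPH[link][1])
    return max(tag_counts.values()) / len(tree)

candidates = [
    [("Event", "organizer", "Person"), ("Person", "bornIn", "Place")],
    [("Event", "location", "Place"), ("Person", "bornIn", "Place")],
    [("Event", "organizer", "Person"), ("Event", "location", "Place")],
]

# Rank by coherence (higher first), then by cost (lower first)
ranked = sorted(candidates, key=lambda t: (-coherence(t), cost(t)))
for tree in ranked:
    print(coherence(tree), cost(tree), tree)
```

Under this measure the p1-only tree scores 1.0 while the two cheaper trees score 0.5, so it ranks first despite its higher cost, matching the slide's conclusion.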

Steiner Tree

[Figure: the Steiner tree computed for the example source over the graph of CIDOC-CRM classes built from the LOD patterns and the ontology]

Evaluation

• Correct semantic types given
• Linked data: 3,398,350 triples published by the Smithsonian American Art Museum
• Extracted patterns of length 1 and 2
• Compute precision and recall between the learned links and the correct links

Evaluation Dataset

Number of sources                              29
Number of classes in the ontologies           147
Number of properties in the ontologies        409
Number of nodes in the gold standard models   812
Number of links in the gold standard models   785

Example

Correct model: <Artwork, location, Museum>, <Artwork, creator, Person>
Learned model: <Museum, founder, Person>, <Artwork, location, Museum>

Precision: 0.5, Recall: 0.5 (one of the two learned links is correct, and one of the two correct links was learned)
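
The precision and recall in this example can be reproduced by comparing the learned links with the correct links as sets of triples. This is a simplification: it matches links by class and property names only, whereas real gold-standard models may contain several nodes of the same class that would need to be aligned first. The function name precision_recall is hypothetical.

```python
def precision_recall(learned, correct):
    """Precision and recall of learned links against gold-standard links (exact triple match)."""
    learned, correct = set(learned), set(correct)
    hits = len(learned & correct)
    precision = hits / len(learned) if learned else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall

correct = [("Artwork", "location", "Museum"), ("Artwork", "creator", "Person")]
learned = [("Museum", "founder", "Person"), ("Artwork", "location", "Museum")]
p, r = precision_recall(learned, correct)
print(p, r)  # 0.5 0.5
```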

Gold Standard Models - Example 1

Gold Standard Models - Example 2

Results

Background knowledge                          Precision  Recall  Time (s)
Domain ontology                                    0.07    0.05      0.17
Domain ontology + patterns of length 1             0.65    0.55      0.75
Domain ontology + patterns of length 1 and 2       0.78    0.70      0.46

Related Work

• Mapping databases and spreadsheets to ontologies
  – Mapping languages and tools (D2R, R2RML)
  – String similarity between column names and ontology terms
• Understanding the semantics of Web tables
  – Use column headers and cell values to find labels and relations in a database of labels and relations populated from the Web
• Exploiting Linked Open Data (LOD)
  – Link the values to entities in LOD to find the types of the values and their relationships
• Learning semantic models of structured data sources from previously modeled sources
  – Learn from the popular and coherent patterns in known semantic models

Discussion & Future Work

• Automatically infer semantic relations from LOD
• Help publish consistent RDF data
• Extract longer patterns from LOD