melinda: methods and tools for web data interlinking

Post on 22-May-2015


DESCRIPTION

Presentation given at STI Innsbruck on 17 December 2009.

TRANSCRIPT

Introduction Framework Tools Application Conclusions

Melinda

Methods and tools for Web data Interlinking

François Scharffe

December @ STI Innsbruck


1 Introduction

2 Framework

3 Tools

4 Application

5 Conclusions


Publishing datasets on the Web

Four publication principles

1 Resources are identified by URIs.

2 URIs are dereferenceable.

3 When a URI is dereferenced, a description of the identified resource should be returned, ideally adapted through content negotiation.

4 Published Web datasets must contain links to other Web datasets.
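
Principle 3 can be illustrated with a minimal Python sketch that prepares a request asking for an RDF description of a resource. The `Accept` preference order and the helper name `build_rdf_request` are assumptions made for illustration; they do not come from the slides. The request is only built, not sent.

```python
from urllib.request import Request

# Assumed preference order of RDF serializations (not from the slides).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, */*;q=0.1"


def build_rdf_request(uri: str) -> Request:
    """Prepare (but do not send) a GET request asking for an RDF description."""
    return Request(uri, headers={"Accept": RDF_ACCEPT})


request = build_rdf_request("http://dbpedia.org/resource/Johann_Sebastian_Bach")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing `request` to `urllib.request.urlopen` would perform the actual dereferencing; the server is then free to pick the representation it returns.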


Interlinking datasets

Links are contained in specific datasets

<http://www.example.org/linkset/DBPedia-MB>
    a void:Linkset ;
    void:target <http://www.dbpedia.org> ;
    void:target <http://www.musicbrainz.org> .

# Link contained in <http://www.example.org/linkset/DBPedia-MB>
<http://www.dbpedia.org/resource/Johann_Sebastian_Bach>
    owl:sameAs
    <http://www.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c> .
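
As a rough illustration of what a linkset stores, the following Python sketch serializes one owl:sameAs link as an N-Triples line. The helper name `same_as_triple` is hypothetical; the two URIs are the ones used on the slide.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(source_uri: str, target_uri: str) -> str:
    """Serialize one owl:sameAs link as a single N-Triples line."""
    return f"<{source_uri}> <{OWL_SAME_AS}> <{target_uri}> ."


link = same_as_triple(
    "http://www.dbpedia.org/resource/Johann_Sebastian_Bach",
    "http://www.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c",
)
print(link)
```

A linkset is then simply a collection of such lines, published together with its voiD description.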


Web Data Cloud


Goodie: Open Data is coming up

data.gov, US Data Act

data.gov.uk, with Sir Tim Berners-Lee involved

Other initiatives: from the EU, and further Open Data initiatives


What do we do?

We propose a framework capturing the various data interlinking methods

We study existing tools and position them in the framework

We propose an architecture that combines ontology alignment and interlinking tools


General approach

Fig.: The data interlinking problem. (Diagram: a data interlinking step takes URI1 and URI2 and produces an owl:sameAs link.)


Manual resource alignment

Fig.: URI transformation. (Diagram: a URI transformation step relates URI1 and URI2 and produces an owl:sameAs link.)


Matching identifiers - Example

http://dbpedia.org/resource/Johann_Sebastian_Bach

owl:sameAs (obtained by URI alignment)

http://www.lastfm.fr/music/Johann+Sebastian+Bach

Fig.: URI transformation example
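
The URI transformation of this example can be sketched in a few lines of Python. The function name `dbpedia_to_lastfm` is hypothetical, and the rule (keep the local name, replace underscores with plus signs) is inferred from the single pair of URIs on the slide; real datasets would likely need more care, for instance with percent-encoding or disambiguation suffixes.

```python
from urllib.parse import urlsplit

LASTFM_PREFIX = "http://www.lastfm.fr/music/"  # prefix taken from the slide example


def dbpedia_to_lastfm(dbpedia_uri: str) -> str:
    """Rewrite a DBpedia resource URI into a candidate Last.fm URI."""
    local_name = urlsplit(dbpedia_uri).path.rsplit("/", 1)[-1]
    return LASTFM_PREFIX + local_name.replace("_", "+")


candidate = dbpedia_to_lastfm("http://dbpedia.org/resource/Johann_Sebastian_Bach")
print(candidate)
```

The result is only a candidate: an owl:sameAs link should be asserted once the target URI is known to exist and to denote the same resource.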


Datasets sharing a common ontology

Fig.: Matching two datasets described according to the same ontology. (Diagram: URI1 and URI2 are both described by ontology O1; resource matching of datasets described by the same ontology produces an owl:sameAs link.)


Datasets sharing a common ontology - Example

Fig.: Matching data sharing a common ontology. (Diagram: URI1 from DBPedia and URI2 from Musicbrainz both have type mo:MusicArtist and carry the properties first and last; the values shown are "Johann-Sebastian Bach" and "Jean-Sébastien Bach". A resource matching algorithm for datasets described according to a common ontology compares them.)
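
When both datasets use the same ontology, the same properties can be compared directly. The Python sketch below is one possible reading of this example, not the matching algorithm of any particular tool: it assumes the `first` and `last` properties from the figure, assigns the two name variants to the datasets arbitrarily, and uses the standard library `difflib` ratio as a stand-in string similarity.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase the text and strip accents so that 'é' compares equal to 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] (difflib ratio, used here as a stand-in)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(r1: dict, r2: dict, properties=("first", "last")) -> float:
    """Average the similarity of the properties shared through the common ontology."""
    return sum(similarity(r1[p], r2[p]) for p in properties) / len(properties)


# Which name variant belongs to which dataset is an assumption.
dbpedia_artist = {"type": "mo:MusicArtist", "first": "Johann-Sebastian", "last": "Bach"}
musicbrainz_artist = {"type": "mo:MusicArtist", "first": "Jean-Sébastien", "last": "Bach"}

score = match_score(dbpedia_artist, musicbrainz_artist)
print(f"match score: {score:.2f}")
```

A threshold on the score would then decide whether an owl:sameAs link is proposed for the pair.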


Matching datasets having heterogeneous ontologies

Fig.: Two datasets matched using an implicit alignment. (Diagram: URI1 is described by ontology O1 and URI2 by ontology O2, with an implicit alignment between O1 and O2; resource matching of datasets described by different ontologies produces an owl:sameAs link.)


Example

(Diagram: URI1 from OpenCyc and URI2 from Musicbrainz. Types shown: Classical Music Performer, mo:MusicArtist. Properties shown: English ID, givenname, name. Values shown include "Johann", "Jean-Sébastien" and "Bach".)


General interlinking framework

Fig.: General framework for data interlinking involving ontology matching. (Diagram: ontology matching between O1 and O2 produces an alignment; data interlinking uses this alignment to relate URI1 and URI2 through owl:sameAs.)


Processes and specifications

            process               result
instance    link specification    linkset
class       matcher               alignment

Tab.: Matching process, interlinks, and their results.


Analysis criteria

Degree of automation

  Is the tool completely automatic?

  Does the tool need to be parameterized by the user? With what kind of parameters (data matching techniques, ontology alignment)?

Matching techniques used

  String matching?

  External functions (value conversion, data transformations)?

  Similarity propagation?

  Other techniques?

Domain: Is the tool specific to a given domain?


Analysis criteria

Ontologies

  Does the tool take into account the ontologies associated with the datasets?

  Can the tool interlink datasets described according to different ontologies?

  If the ontologies differ, does the tool perform ontology alignment?

Output

  What does the tool produce as output?

  Does the tool offer to merge the two input datasets?

Postprocessing: Does the tool perform any post-processing operations?


Six interlinking tools

RKB-CRS: Coreference resolution service of the RKB RDF Knowledge Base.

LD-mapper: Interlinking tool for the music ontology MO.

ODD Linker: Interlinking tool based on SQL record matching.

RDF-AI: Interlinking and data fusion tool.

Silk and Silk LSL: Interlinking tool and link specification language.

Knofuss architecture: Interlinking and data fusion tool with ontology alignment.


Six interlinking tools

Fig.: Tools positioned in the defined framework. (Diagram labels: O1, O2, URI 1, URI 2, resource comparison method, implicit alignment, explicit alignment, ontology matching system, owl:sameAs. Tools shown: Silk, ODD-Linker, LD-Mapper, RDF-AI, Knofuss, RKB-CRS.)


Application

Let us consider a link specification between DBPedia and Geonames:

<Silk>
  <!-- Namespace prefixes used in the restrictions and property paths -->
  <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
  <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
  <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />

  <!-- The two datasets to interlink, each reached through a SPARQL endpoint -->
  <DataSource id="dbpedia">
    <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
    <Graph>http://dbpedia.org</Graph>
  </DataSource>
  <DataSource id="geonames">
    <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
    <Graph>http://sws.geonames.org/</Graph>
  </DataSource>

  <!-- Score thresholds and output files for accepted and to-be-verified links -->
  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />

  <Interlink id="cities">
    <LinkType>owl:sameAs</LinkType>
    <!-- Candidate resources: DBPedia cities and Geonames features of class P -->
    <SourceDataset dataSource="dbpedia" var="a">
      <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
    </SourceDataset>
    <TargetDataset dataSource="geonames" var="b">
      <RestrictTo>?b rdf:type gn:P</RestrictTo>
    </TargetDataset>
    <!-- Score: average of label similarity and population similarity -->
    <LinkCondition>
      <AVG>
        <Compare metric="jaroSimilarity">
          <Param name="str1" path="?a/rdfs:label" />
          <Param name="str2" path="?b/gn:name" />
        </Compare>
        <Compare metric="numSimilarity">
          <Param name="num1" path="?a/dbpedia:populationTotal" />
          <Param name="num2" path="?b/gn:population" />
        </Compare>
      </AVG>
    </LinkCondition>
  </Interlink>
</Silk>
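
The link condition of this specification can be approximated by the following Python sketch. It is not Silk's implementation: the Jaro similarity follows the textbook definition, the numeric similarity is an assumed relative-difference measure (the slides do not define `numSimilarity`), the handling of the two thresholds is one plausible reading of `accept` and `verify`, and the city records are made-up illustrative values.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched identical character within the window.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def num_similarity(n1: float, n2: float) -> float:
    """Assumed numeric similarity: 1 minus the relative difference."""
    if n1 == n2:
        return 1.0
    return max(0.0, 1.0 - abs(n1 - n2) / max(abs(n1), abs(n2)))


def link_score(a: dict, b: dict) -> float:
    """AVG aggregation of the two comparisons in the link condition."""
    return (jaro_similarity(a["label"], b["name"])
            + num_similarity(a["populationTotal"], b["population"])) / 2


def classify(score: float, accept: float = 0.9, verify: float = 0.7) -> str:
    """Apply the accept and verify thresholds of the specification."""
    if score >= accept:
        return "accept"
    return "verify" if score >= verify else "reject"


# Made-up records, for illustration only.
dbpedia_city = {"label": "Innsbruck", "populationTotal": 100000}
geonames_place = {"name": "Innsbruck", "population": 90000}

score = link_score(dbpedia_city, geonames_place)
print(round(score, 3), classify(score))
```

Pairs classified as "verify" correspond to the links written to the `verifyLinks` file for manual checking.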


Application

The alignment is implicitly contained in the link specification:

:dbp-geo a align:Alignment ;
    align:onto1 <http://dbpedia.org/ontology/> ;
    align:onto2 <http://www.geonames.org/ontology#> ;
    align:map :map1, :map2, :map3 .

:map1 a align:Cell ;
    align:entity1 dbpedia:City ;
    align:entity2 gn:P ;
    align:relation align:subsumedBy .

:map2 a align:Cell ;
    align:entity1 dbpedia:populationTotal ;
    align:entity2 gn:population ;
    align:relation align:equivalent .

:map3 a align:Cell ;
    align:entity1 rdfs:label ;
    align:entity2 gn:name ;
    align:relation align:equivalent .

The two property cells can be refined with domain restrictions:

:map2 a align:Cell ;
    align:entity1 [ a align:Property ;
        edoal:and dbpedia:populationTotal ,
            [ a edoal:PropertyDomainRestriction ;
              edoal:domain dbpedia:City ] ] ;
    align:entity2 [ a align:Property ;
        edoal:and gn:population ,
            [ a edoal:PropertyDomainRestriction ;
              edoal:domain gn:P ] ] ;
    align:relation align:equivalent .

:map3 a align:Cell ;
    align:entity1 [ a align:Property ;
        edoal:and rdfs:label ,
            [ a edoal:PropertyDomainRestriction ;
              edoal:domain dbpedia:City ] ] ;
    align:entity2 [ a align:Property ;
        edoal:and gn:name ,
            [ a edoal:PropertyDomainRestriction ;
              edoal:domain gn:P ] ] ;
    align:relation align:equivalent .


Application

Using the alignment, the link specification can be simplified.

<UseAlignment rdf:resource="#dbp-geo" />

<Interlink id="cities">
  <LinkType>owl:sameAs</LinkType>
  <!-- map1: dbpedia:City / gn:P, selects the resources to compare -->
  <LinkCell rdf:resource="#map1" />
  <LinkCondition>
    <AVG>
      <!-- map3: rdfs:label / gn:name -->
      <Compare metric="jaroSimilarity">
        <CellParam rdf:resource="#map3" />
      </Compare>
      <!-- map2: dbpedia:populationTotal / gn:population -->
      <Compare metric="numSimilarity">
        <CellParam rdf:resource="#map2" />
      </Compare>
    </AVG>
  </LinkCondition>
  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
</Interlink>
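
How cell references might be resolved back into explicit restrictions and property paths can be sketched in Python as follows. The dictionary mirrors the three cells of the dbp-geo alignment; the function names and the variable names `a` and `b` are assumptions made for illustration, and this is not the actual expansion mechanism of any tool.

```python
# Cells of the dbp-geo alignment: id -> (entity1, entity2, relation).
ALIGNMENT = {
    "map1": ("dbpedia:City", "gn:P", "subsumedBy"),
    "map2": ("dbpedia:populationTotal", "gn:population", "equivalent"),
    "map3": ("rdfs:label", "gn:name", "equivalent"),
}


def expand_link_cell(cell_id: str, source_var: str = "a", target_var: str = "b"):
    """Turn a class cell into the two RestrictTo patterns."""
    entity1, entity2, _relation = ALIGNMENT[cell_id]
    return (f"?{source_var} rdf:type {entity1}",
            f"?{target_var} rdf:type {entity2}")


def expand_cell_param(cell_id: str, source_var: str = "a", target_var: str = "b"):
    """Turn a property cell into the two property paths of a comparison."""
    entity1, entity2, _relation = ALIGNMENT[cell_id]
    return f"?{source_var}/{entity1}", f"?{target_var}/{entity2}"


print(expand_link_cell("map1"))
print(expand_cell_param("map3"))
print(expand_cell_param("map2"))
```

With such a lookup, the link specification only names cells, and the ontology-specific details stay in the alignment, which can be reused by other specifications.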


Conclusions

We propose a framework for data interlinking on the Web of data.

We have presented existing tools and positioned them with respect to the framework.

We propose a simplification of the interlinking task and demonstrate it on an example.

Our current work goes towards more interoperability for link specifications:

  Is it possible to construct more generic link specifications, i.e. attached to datasets or ontologies?

  Is it possible to automatically find the key properties that identify matching pairs?


For more

http://melinda.inrialpes.fr

François Scharffe and Jérôme Euzenat. Linked data meets ontology matching: enhancing data interlinking through ontology alignments. (submitted to WWW'2010).
