LOD2 Webinar Series: D2R and Sparqlify


Post on 11-May-2015


DESCRIPTION

This webinar, part of the LOD2 webinar series, presents use cases and live demos of D2R (Free University Berlin) and Sparqlify (University of Leipzig). D2R Server is a tool for publishing relational databases on the Semantic Web. It enables RDF and HTML browsers to navigate the content of the database, and allows applications to query the database using the SPARQL query language. Sparqlify is a tool for defining expressive RDF views on relational databases and querying them with a subset of the SPARQL query language. Its novel RDF view definition syntax aims to simplify the RDB-RDF mapping process. More to be found at:

TRANSCRIPT

Page 1: LOD2 Webinar Series: D2R and Sparqlify

LOD2 Webinar . 29.11.2011 . Page 1 http://lod2.eu

Creating Knowledge out of Interlinked Data

Page 2: LOD2 Webinar Series: D2R and Sparqlify


LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers from 12 countries. The partners are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany.

LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project shows the benefits in the scenarios of Media and Publishing, Corporate Data intranets and eGovernment.

Page 3: LOD2 Webinar Series: D2R and Sparqlify


Once per month the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle.

Stay with us and learn more about acquisition, editing, composing, connected applications – and finally publishing Linked Open Data.

Page 4: LOD2 Webinar Series: D2R and Sparqlify


Web-based Systems Group

• School of Business & Economics, Freie Universität Berlin
• Research focus: Linked Data technologies for extending the World Wide Web with a global data commons
• Funded Projects:
  • LOD2 - Creating Knowledge out of Interlinked Data
  • LATC - LOD Around The Clock
  • PlanetData
• Visit us at: http://wbsg.de

Page 5: LOD2 Webinar Series: D2R and Sparqlify


Main Projects

• DBpedia is a community effort led by WBSG, AKSW and OpenLink Software to:
  • Extract structured information from Wikipedia
  • Make this information available on the Web under an open license
  • Interlink the DBpedia dataset with other open datasets on the Web
  • DBpedia Spotlight: Automatic annotation of free text with DBpedia URIs
• Data Integration
  • R2R: Translates Web data that is represented using terms from different vocabularies into a single target vocabulary.
  • Silk: Tool for generating RDF links between data items.
  • LDIF: Translates heterogeneous Linked Data from the Web into a clean, local target representation while keeping track of data provenance.

Page 6: LOD2 Webinar Series: D2R and Sparqlify


Outline

• D2R/Sparqlify in the LOD2 Stack
• The D2RQ Platform
• The D2RQ Mapping Language
• Example and Demo
• Availability
• Sparqlify (Claus Stadler)
• Q & A

Page 7: LOD2 Webinar Series: D2R and Sparqlify


D2R/Sparqlify in the LOD2 Stack

Page 8: LOD2 Webinar Series: D2R and Sparqlify


The D2RQ Platform

• System for accessing relational databases as virtual RDF graphs
• Offers RDF-based access to the content of relational databases without having to replicate it into an RDF store
• Features:
  • query a non-RDF database using SPARQL
  • access the content of the database as Linked Data over the Web
  • create custom dumps of the database in RDF
  • access information using the Apache Jena API

Page 9: LOD2 Webinar Series: D2R and Sparqlify


Components

• The D2RQ Platform consists of:
  • D2RQ Mapping Language, a declarative mapping language for describing the relation between an ontology and a relational data model.
  • D2RQ Engine, which uses the mappings to rewrite Jena API calls into SQL queries against the database and passes query results up to the higher layers of the framework.
  • D2R Server, an HTTP server that provides a Linked Data view, an HTML view for debugging and a SPARQL Protocol endpoint over the database.

Page 10: LOD2 Webinar Series: D2R and Sparqlify


Architecture

Page 11: LOD2 Webinar Series: D2R and Sparqlify


D2RQ Mapping Language

• Declarative language for mapping relational database schemas to RDF vocabularies and OWL ontologies.
• N3-based syntax
• Very flexible
• Usual workflow: auto-generate mapping from DB schema, then customize

Page 12: LOD2 Webinar Series: D2R and Sparqlify


Mapping process

Page 13: LOD2 Webinar Series: D2R and Sparqlify


Example

• Existing database which stores information about:
  • Conferences
  • Papers
  • Authors
  • Topics
• We want to publish this database as RDF
• We will use the International Semantic Web Conference (ISWC) ontology.

Page 14: LOD2 Webinar Series: D2R and Sparqlify


Define DB connection

• d2rq:Database defines a JDBC connection to a local or remote relational database
• d2rq:jdbcDSN specifies the JDBC database URL
  • Typically of the form: jdbc:subprotocol:subname
• d2rq:jdbcDriver specifies the JDBC driver for the database

map:MyDatabase a d2rq:Database;
    d2rq:jdbcDSN "jdbc:mysql://localhost/mydb";
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:username "user";
    d2rq:password "password".
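The jdbc:subprotocol:subname form mentioned above can be illustrated with a small Python sketch. This is not part of D2RQ; the helper name parse_jdbc_url is hypothetical and only shows how the example DSN from the slide splits into its parts.

```python
from typing import NamedTuple


class JdbcUrl(NamedTuple):
    subprotocol: str  # e.g. "mysql"
    subname: str      # driver-specific remainder, e.g. "//localhost/mydb"


def parse_jdbc_url(url: str) -> JdbcUrl:
    """Split a JDBC URL of the form jdbc:subprotocol:subname (hypothetical helper)."""
    scheme, sep, rest = url.partition(":")
    if scheme != "jdbc" or not sep:
        raise ValueError(f"not a JDBC URL: {url!r}")
    subprotocol, sep, subname = rest.partition(":")
    if not sep or not subprotocol or not subname:
        raise ValueError(f"expected jdbc:subprotocol:subname, got {url!r}")
    return JdbcUrl(subprotocol, subname)


parsed = parse_jdbc_url("jdbc:mysql://localhost/mydb")
print(parsed.subprotocol)  # mysql
print(parsed.subname)      # //localhost/mydb
```

The subname part is interpreted by the JDBC driver, so its internal layout differs between database vendors.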

Page 15: LOD2 Webinar Series: D2R and Sparqlify


Define your entities

• d2rq:ClassMap represents a class or a group of similar classes
• A class map defines how instances of the class are identified
• d2rq:uriPattern specifies a URI pattern that will be used to identify instances of this class map.

(SQL fragments in red)

map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@".

Page 16: LOD2 Webinar Series: D2R and Sparqlify


Define your entities

• d2rq:condition specifies an SQL WHERE condition
• An instance of this class will only be generated for database rows that satisfy the condition
• Conditions can be used to hide parts of the database from D2RQ

(SQL fragments in red)

map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@";
    d2rq:condition "User.deleted=0".
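To make the interplay of d2rq:uriPattern and d2rq:condition concrete, here is a minimal Python sketch. It is an in-memory illustration only: D2RQ itself pushes the condition into the SQL WHERE clause, and the function names, the example.org base URI and the sample rows are assumptions made for this example.

```python
import re

# Matches placeholders such as @@User.ID@@ in a URI pattern.
PLACEHOLDER = re.compile(r"@@([A-Za-z_][\w.]*)@@")


def expand_uri_pattern(pattern, row):
    """Replace each @@Table.Column@@ placeholder with the value from the row."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), pattern)


def generate_uris(pattern, rows, condition):
    """Yield one URI per row that satisfies the condition (hypothetical helper)."""
    for row in rows:
        if condition(row):
            yield expand_uri_pattern(pattern, row)


# Hypothetical content of the User table.
rows = [
    {"User.ID": 1, "User.deleted": 0},
    {"User.ID": 2, "User.deleted": 1},  # hidden by the condition
]

uris = list(generate_uris(
    "http://example.org/people/@@User.ID@@",
    rows,
    lambda r: r["User.deleted"] == 0,  # corresponds to "User.deleted=0"
))
print(uris)  # ['http://example.org/people/1']
```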

Page 17: LOD2 Webinar Series: D2R and Sparqlify


Add properties to entities

• d2rq:class relates the generated entity to an OWL/RDFS class
• We use the Person class from the FOAF vocabulary

(SQL fragments in red, RDFS/OWL vocabulary in blue)

map:People a d2rq:ClassMap;
    d2rq:uriPattern "http://.../people/@@User.ID@@";
    d2rq:condition "User.deleted=0";
    d2rq:class foaf:Person .

Page 18: LOD2 Webinar Series: D2R and Sparqlify


Add properties to entities

• A d2rq:PropertyBridge relates a database column to an RDF property.
• Here we use properties from the FOAF vocabulary as well

(SQL fragments in red, RDFS/OWL vocabulary in blue)

map:name a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:nick;
    d2rq:column "User.name".

map:mbox a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:mbox;
    d2rq:uriPattern "mailto:@@User.email@@".

Page 19: LOD2 Webinar Series: D2R and Sparqlify


Add properties to entities

• d2rq:sqlExpression generates literal values by evaluating a SQL expression.
• Note that querying for such a computed value might put a heavy load on the database.
• We compute the SHA1 sum from the user email address

(SQL fragments in red, RDFS/OWL vocabulary in blue)

map:mbox_sha1 a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:mbox_sha1sum;
    d2rq:sqlExpression "SHA1(CONCAT('mailto:', User.email))".
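For comparison, the value computed by the SQL expression above can be reproduced outside the database. The following Python sketch assumes UTF-8 encoded addresses and uses a made-up e-mail address; the function name mbox_sha1sum is chosen for illustration.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """Hex SHA1 digest of 'mailto:' + email, mirroring SHA1(CONCAT('mailto:', User.email))."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


digest = mbox_sha1sum("alice@example.org")  # hypothetical address
print(len(digest))  # 40
```

Because the hash is derived from the column value at query time, a SPARQL query that filters on foaf:mbox_sha1sum typically cannot use an index on User.email, which is one reason such computed values can load the database.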

Page 20: LOD2 Webinar Series: D2R and Sparqlify


Link your entities

• We define a second class mapping for photos
• In the next step, we will interlink persons with their photos

(SQL fragments in red, RDFS/OWL vocabulary in blue)

map:Photos a d2rq:ClassMap;
    d2rq:uriPattern "http://.../photo/@@Photo.ID@@";
    d2rq:class foaf:Image .

Page 21: LOD2 Webinar Series: D2R and Sparqlify


Link your entities

• We can use the already presented syntax to interlink persons with their photos
• Photo.UserID is a foreign key to User.ID

(SQL fragments in red, RDFS/OWL vocabulary in blue)

map:photo a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:made;
    d2rq:uriPattern "http://.../photo/@@Photo.UserID@@".

Page 22: LOD2 Webinar Series: D2R and Sparqlify


Link your entities

• Better, with less repetition:

(SQL fragments in red, RDFS/OWL vocabulary in blue)

map:photo a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:People;
    d2rq:property foaf:made;
    d2rq:join "User.ID = Photo.UserID";
    d2rq:refersToClassMap map:Photos .
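The effect of d2rq:join together with d2rq:refersToClassMap can be pictured as an ordinary SQL join whose result rows are turned into foaf:made triples. The sketch below uses Python's sqlite3 module with invented sample data and an assumed example.org base URI; it illustrates the idea and is not how D2RQ is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User  (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Photo (ID INTEGER PRIMARY KEY, UserID INTEGER REFERENCES User(ID));
    INSERT INTO User  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Photo VALUES (10, 1), (11, 1), (12, 2);
""")


def made_triples(conn, base="http://example.org"):
    """Emit (person, foaf:made, photo) triples for the join User.ID = Photo.UserID."""
    sql = """
        SELECT User.ID, Photo.ID
        FROM User JOIN Photo ON User.ID = Photo.UserID
        ORDER BY User.ID, Photo.ID
    """
    return [
        (f"{base}/people/{user_id}", "foaf:made", f"{base}/photo/{photo_id}")
        for user_id, photo_id in conn.execute(sql)
    ]


triples = made_triples(conn)
for triple in triples:
    print(triple)
```

The subject URIs follow the pattern of map:People and the object URIs follow the pattern of map:Photos, so neither pattern has to be repeated in the property bridge.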

Page 23: LOD2 Webinar Series: D2R and Sparqlify


Mapping Overview

Page 24: LOD2 Webinar Series: D2R and Sparqlify


• Demo

Page 25: LOD2 Webinar Series: D2R and Sparqlify


Availability

• D2RQ can be downloaded from the official homepage at:
  • http://d2rq.org/
• Support is provided through the official mailing list:
  • [email protected]
• The latest source code is available from the project's Git repository:
  • https://github.com/d2rq/d2rq
• D2RQ is licensed under the terms of the Apache Software Licence

Page 26: LOD2 Webinar Series: D2R and Sparqlify


Developers

Page 27: LOD2 Webinar Series: D2R and Sparqlify


Database Compatibility

• Supported databases
  • Oracle
  • MySQL
  • PostgreSQL
  • SQL Server
  • HSQLDB
  • Interbase/Firebird
• ODBC data sources
  • Works with some limitations.
• Other databases
  • May or may not work. By default, D2RQ interacts with the database using the SQL-92 standard. Any compatible database should work out of the box. We are interested in reports about D2RQ on other databases.

Page 28: LOD2 Webinar Series: D2R and Sparqlify


Current Work

• D2RQ is actively developed
• Work on supporting RDB2RDF (Direct Mapping and R2RML) in the next 6 weeks

Page 29: LOD2 Webinar Series: D2R and Sparqlify


Sparqlify

Project Page: http://aksw.org/projects/Sparqlify
Source Code: https://github.com/AKSW/Sparqlify

Page 30: LOD2 Webinar Series: D2R and Sparqlify


About me

• Claus Stadler
• Austria
• PhD Student at the University of Leipzig since 2011
  – In the Agile Knowledge Engineering and Semantic Web (AKSW) research group, headed by Soeren Auer.
• Research Interests: Spatial Data Management, SPARQL-SQL query rewriting and optimization, Data integration.

Page 31: LOD2 Webinar Series: D2R and Sparqlify


Agile Knowledge Engineering and Semantic Web Research Group

• Founded in 2006
• 25+ Researchers
• 3 Sub groups
• Goals
  – Contributing to the advancement of science in Semantic Web, Knowledge Engineering, Software Engineering
  – Cost efficient, high-impact R&D, which proves usefulness at an early stage
  – Bridge the gap between research results and applications
• Committed to Open Source, Open Access, and Open Knowledge movements

Page 32: LOD2 Webinar Series: D2R and Sparqlify


Agile Knowledge Engineering and Semantic Web Research Group

• EU Funded Projects:
  – Linked Open Data 2 (LOD2)
  – LOD Around the Clock (LATC)
  – Open Data Portal (ODP)
  – Semantic Content Management Systems for Enterprise Knowledge Management and News Mining (SCMS)
  – OntoWiki - Semantic Collaboration for Knowledge Management, E-Learning and E-Tourism

Page 33: LOD2 Webinar Series: D2R and Sparqlify


Agile Knowledge Engineering and Semantic Web Research Group

• Further Projects
  – SlideWiki
    • SlideWiki is a collaboration platform which enables communities to build, share and play online presentations.
  – LinkedGeoData
    • Making OpenStreetMap data available in the Semantic Web
    • Motivation for Sparqlify
  – LIMES
    • Very fast tool for interlinking RDF knowledge bases.
  – DBpedia Live
    • Synchronization of DBpedia with Wikipedia
  – …
• Find more at
  – http://aksw.org/Projects

Page 34: LOD2 Webinar Series: D2R and Sparqlify


Structure

• Introduction
• View Definition Example
  – based on challenges encountered with LinkedGeoData
• Launching Sparqlify Server
• Demonstration
• Initial Results of the Performance Evaluation
• Conclusion & Future Work
• Outro

Page 35: LOD2 Webinar Series: D2R and Sparqlify


Introduction

• Sparqlify is a SPARQL-SQL rewriter that enables one to define RDF views on relational databases and query them with SPARQL. Currently only PostgreSQL is supported.
• Inputs
  – PostgreSQL database, set of view definitions, SPARQL query
• Features
  – Intuitive view definition syntax
  – SPARQL queries are rewritten into a single SQL query
    • Give as much control as possible to the query optimizer of the underlying RDBMS
  – High expressivity
    • Language and datatype tags can originate from columns
    • Constraints can be stated for tuning the rewriting process
  – Initial support for geospatial predicates
    • Can be extended to enable the use of arbitrary SQL predicates on the SPARQL level

Page 36: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “points_of_interest”

id  type         geom
1   lgdo:Bakery  (1, 1)
2   lgdo:School  (2, 2)
3   lgdo:Pub     (3, 3)

On the following slides, Prefix Declarations are omitted for brevity

Page 37: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “points_of_interest”

id  type         geom
1   lgdo:Bakery  (1, 1)
2   lgdo:School  (2, 2)
3   lgdo:Pub     (3, 3)

Create View pois As
  Construct {
    …

Page 38: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “points_of_interest”

id  type         geom
1   lgdo:Bakery  (1, 1)
2   lgdo:School  (2, 2)
3   lgdo:Pub     (3, 3)

Create View pois As
  Construct {
    ?s a ?t .
    ?s geom:geometry ?geo .
  }
  With
    …

Page 39: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “points_of_interest”

id  type         geom
1   lgdo:Bakery  (1, 1)
2   lgdo:School  (2, 2)
3   lgdo:Pub     (3, 3)

Create View pois As
  Construct {
    ?s a ?t .
    ?s geom:geometry ?geo .
  }
  With
    ?s = spy:uri(concat("http://ex.org/", ?id))
    …

Page 40: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “points_of_interest”

id  type         geom
1   lgdo:Bakery  (1, 1)
2   lgdo:School  (2, 2)
3   lgdo:Pub     (3, 3)

Create View pois As
  Construct {
    ?s a ?t .
    ?s geom:geometry ?geo .
  }
  With
    ?s = spy:uri(concat("http://...", ?id))
    ?t = spy:uri(?type)
    ?geo = spy:typedLiteral(?geom, ogc:WKTLiteral)
  From
    …

Page 41: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “points_of_interest”

id  type         geom
1   lgdo:Bakery  (1, 1)
2   lgdo:School  (2, 2)
3   lgdo:Pub     (3, 3)

Create View pois As
  Construct {
    ?s a ?t .
    ?s geom:geometry ?geo .
  }
  With
    ?s = spy:uri(concat("http://ex.org/", ?id))
    ?t = spy:uri(?type)
    ?geo = spy:typedLiteral(?geom, ogc:WKTLiteral)
  From
    points_of_interest;

Page 42: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “points_of_interest”

id  type         geom
1   lgdo:Bakery  (1, 1)
2   lgdo:School  (2, 2)
3   lgdo:Pub     (3, 3)

Create View pois As
  Construct {
    ?s a ?t .
    ?s geom:geometry ?geo .
  }
  With
    ?s = spy:uri(concat("http://ex.org/", ?id))
    ?t = spy:uri(?type)
    ?geo = spy:typedLiteral(?geom, ogc:WKTLiteral)
  Constrain
    ?t prefix "http://linkedgeodata.org/ontology/"
  From
    points_of_interest;
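What the pois view expresses can be sketched in Python by materializing its triples for the three sample rows. Sparqlify does not materialize anything; it rewrites SPARQL into SQL. The expansion of lgdo: to http://linkedgeodata.org/ontology/, the WKT strings and the function name pois_view are assumptions made for this illustration.

```python
LGDO = "http://linkedgeodata.org/ontology/"  # assumed expansion of the lgdo: prefix

# Hypothetical content of points_of_interest, with geometries written as WKT.
rows = [
    {"id": 1, "type": LGDO + "Bakery", "geom": "POINT (1 1)"},
    {"id": 2, "type": LGDO + "School", "geom": "POINT (2 2)"},
    {"id": 3, "type": LGDO + "Pub", "geom": "POINT (3 3)"},
]


def pois_view(rows):
    """Materialize the triples described by the 'pois' view definition."""
    triples = []
    for row in rows:
        s = "http://ex.org/" + str(row["id"])    # ?s = spy:uri(concat(...))
        t = row["type"]                          # ?t = spy:uri(?type)
        if not t.startswith(LGDO):               # Constrain ?t prefix "..."
            raise ValueError(f"type outside declared prefix: {t}")
        geo = (row["geom"], "ogc:WKTLiteral")    # ?geo = spy:typedLiteral(...)
        triples.append((s, "rdf:type", t))
        triples.append((s, "geom:geometry", geo))
    return triples


triples = pois_view(rows)
print(len(triples))  # 6
```

In this sketch the constraint is checked as an assertion on the data. In a rewriter, the same declaration is useful the other way round: a query asking for a type outside the prefix can skip the view without touching the table.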

Page 43: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “resource_label”

resource     label      language
lgdo:Bakery  Baeckerei  de
lgdo:Bakery  Bakery     en
lgdo:School  Schule     de

Page 44: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Mapping the table “resource_label”

resource     label      language
lgdo:Bakery  Baeckerei  de
lgdo:Bakery  Bakery     en
lgdo:School  Schule     de

Create View labels As
  Construct {
    ?s rdfs:label ?l .
  }
  With
    ?s = spy:uri(?resource)
    ?l = spy:plainLiteral(?label, ?language)
  Constrain
    ?s prefix "http://linkedgeodata.org/ontology/"
  From
    resource_labels;
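The point of spy:plainLiteral(?label, ?language) is that the language tag comes from a column rather than from the mapping. A short Python sketch of the resulting literals follows; the helper plain_literal and the N-Triples-style rendering are illustrative assumptions, not Sparqlify code.

```python
def plain_literal(lexical: str, lang: str = "") -> str:
    """Render a plain literal, with a language tag if one is given."""
    escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


# Rows of the label table shown above: (resource, label, language).
label_rows = [
    ("lgdo:Bakery", "Baeckerei", "de"),
    ("lgdo:Bakery", "Bakery", "en"),
    ("lgdo:School", "Schule", "de"),
]

label_triples = [
    (resource, "rdfs:label", plain_literal(label, language))
    for resource, label, language in label_rows
]
for triple in label_triples:
    print(triple)
```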

Page 45: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Adding a set of static triples

Create View static_triples As
  Construct {
    lgdo:Bakery a owl:Class .
    lgdo:School a owl:Class .
    lgdo:Pub a owl:Class
  };

Page 46: LOD2 Webinar Series: D2R and Sparqlify


View Definition File Syntax

Prefix Declarations

Create View {name} As
  Construct { {triple patterns} }
  With {variable bindings}
  Constrain {constraint expressions}
  From {logical table (table, view or SQL query)};

… More View Definitions …

Page 47: LOD2 Webinar Series: D2R and Sparqlify


View Definition Example: Wortschatz

Create View view_co_n As
  Construct {
    ?a wso:coOccursDirectlyWith ?b .
    ?x owl:annotatedSource ?a .
    ?x owl:annotatedProperty wso:coOccursDirectlyWith .
    ?x owl:annotatedTarget ?b .
    ?x wso:frequency ?f .
    ?x wso:sigma ?s .
  }
  With
    ?a = spy:uri(concat('http://aksw.org/wortschatz/word/', ?w1_id))
    ?b = spy:uri(concat('http://aksw.org/wortschatz/word/', ?w2_id))
    ?x = spy:uri(concat('http://aksw.org/wortschatz/co-occurence/direct/', ?w1_id, '/', ?w2_id))
    ?f = spy:typedLiteral(?freq, xsd:long)
    ?s = spy:typedLiteral(?sig, xsd:long)
  From
    [[SELECT w1_id, w2_id, freq::bigint, sig::bigint FROM co_n]];

Note: SQL queries are escaped in double brackets.

Page 48: LOD2 Webinar Series: D2R and Sparqlify


Launching Sparqlify

• Download from git, build with
  – mvn assembly:assembly
• Run
  – java -cp target/sparqlify-0.0.1-SNAPSHOT-jar-with-dependencies.jar RunEndpoint [options]
• Options are
  – Server Configuration
    • -c Config file containing the mapping definitions
    • -P Server port [default 7531]
  – Database settings
    • -h Hostname of the database (e.g. localhost or localhost:5432)
    • -d Database name
    • -u User name
    • -p Password
  – Quality of Service
    • -n Maximum result set size
    • -t Maximum query execution time (excluding rewriting time)
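As a convenience, the options listed above could be assembled programmatically. The Python sketch below only builds the argument list and does not start the server; the function name, the file name views.sparqlify and the credentials are hypothetical, and the unit expected by -t is not stated on the slide.

```python
JAR = "target/sparqlify-0.0.1-SNAPSHOT-jar-with-dependencies.jar"


def sparqlify_command(config, database, user, password,
                      host="localhost", port=None, max_rows=None, timeout=None):
    """Build the command line for RunEndpoint from the options on the slide."""
    cmd = ["java", "-cp", JAR, "RunEndpoint",
           "-c", config,      # mapping definitions
           "-h", host,        # database host
           "-d", database,
           "-u", user,
           "-p", password]
    if port is not None:
        cmd += ["-P", str(port)]      # server port (default 7531)
    if max_rows is not None:
        cmd += ["-n", str(max_rows)]  # maximum result set size
    if timeout is not None:
        cmd += ["-t", str(timeout)]   # maximum query execution time
    return cmd


cmd = sparqlify_command("views.sparqlify", "lgd", "postgres", "secret", max_rows=1000)
print(" ".join(cmd))
# To actually launch the server one could pass cmd to subprocess.run(cmd).
```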

Page 49: LOD2 Webinar Series: D2R and Sparqlify


Demonstration

Page 50: LOD2 Webinar Series: D2R and Sparqlify


Initial Results of the Performance Evaluation

• Initial performance comparison on the BSBM 1 million dataset on PostgreSQL (times per query mix):
  – D2R, fast mode disabled: ~8 sec
  – D2R, fast mode enabled: ~3 sec
  – Sparqlify: 4 sec
  – Performance is comparable to D2R.
• Mixed results for the LinkedGeoData schema:
  – Simple queries work well on the LGD schema
  – Complex queries are troublesome (timeouts) on a complete OSM dump, as the PostgreSQL optimizer makes suboptimal choices.

Page 51: LOD2 Webinar Series: D2R and Sparqlify


Conclusion and Future Work

• Sparqlify provides an intuitive mapping syntax
• Originally developed for the LinkedGeoData use case
  – Spatial predicate support; arbitrary predicate support planned.
  – URIs, language and datatype tags can be mapped from columns of the DB.
  – Queries are rewritten into a single SQL statement, in order to give as much control to the query optimizer of the underlying DBMS as possible.
• Initial performance results seem to be comparable to D2R
  – More extensive testing has yet to be done
• Bugfixing
• Additional features
  – Especially support for the COUNT keyword

Page 52: LOD2 Webinar Series: D2R and Sparqlify


Contact

• Project Page
  – http://aksw.org/projects/Sparqlify
• Source Code
  – https://github.com/AKSW/Sparqlify
• AKSW Research Group
  – http://aksw.org
• My Work Page
  – http://bis.informatik.uni-leipzig.de/ClausStadler
• My Email
  – [email protected]

Page 53: LOD2 Webinar Series: D2R and Sparqlify


Thank you for your attention!

Q & A

Page 54: LOD2 Webinar Series: D2R and Sparqlify


Credits

Jingle: R.E.M., Martin Kaltenböck, Florian Kondert

Coordination: Thomas Thurner, Martin Kaltenböck

Moderation: Martin Kaltenböck

Presented by: Robert Isele & Claus Stadler

Page 55: LOD2 Webinar Series: D2R and Sparqlify


http://lod2.eu

Hope you enjoyed staying with us – if you need more detailed information, visit us at www.lod2.eu and let us know how we can improve to meet your expectations!

Don’t forget to register for our next webinar

22.05.2012 – Cloud View (Exalead Dassault Systems, France)
19.06.2012 – PoolParty Thesaurus Manager (SWC, Austria)

Have a great day and don’t forget ...

Page 56: LOD2 Webinar Series: D2R and Sparqlify


http://lod2.eu

Page 57: LOD2 Webinar Series: D2R and Sparqlify


Why another SPARQL-SQL Rewriter?

• There are already
  – Virtuoso RDF Views
  – D2R
  – Revelytix Spyder
  – Asio Semantic Web Bridge for Relational Databases
  – ODE Mapster, RDBToOnto
  – Soon further implementations of R2RML
  – Ultrawrap
  – …

Page 58: LOD2 Webinar Series: D2R and Sparqlify


Motivation

• Map OpenStreetMap data to RDF
  – Approach taken
    • Download an OSM planet file (>10GB compressed), pipe each OSM entity (node, way, relation) through a custom Java RDF mapper, and load the data into Virtuoso
    • Implemented a LiveSync on top of that
    • Repeat the dump process after each change in the mappings
    • Takes more than 2 days.
  – Goal
    • Immediate effect of a change in the mappings
    • Reuse of Osmosis' LiveSync
  – Possible solution
    • Keep the mapping information in the relational database, and use an RDB-RDF mapper for querying it.
  – However: Back in April 2011, none of the existing RDB-RDF solutions seemed suitable
    • Lack of support for spatial predicates
    • Evaluation of SPARQL filters in memory
    • No support for creating literals where the language tag or datatype is stored in the database.

Page 59: LOD2 Webinar Series: D2R and Sparqlify


Motivation

• LinkedGeoData project: Convert OpenStreetMap (OSM) data to RDF
  – (http://linkedgeodata.org)
• Main tables of the OSM schema (excerpt):
  – Nodes(id, geom, tstamp)
  – NodeTags(node_id, k, v)
  – Ways(id, geom, tstamp)
  – WayTags(way_id, k, v)
  – WayNodes(way_id, sequence_id, node_id)

Example tags (k, v): (place, city), (name, Leipzig)

Page 60: LOD2 Webinar Series: D2R and Sparqlify


Challenges with OpenStreetMap data

• Geometry datatype
• URIs and language tags stored in database tables

Nodes (OSM)

node_id  k        v
1        amenity  school

Additional mapping tables for LGD

lgd_map_resource_kv

k        v       property  object
amenity  school  rdf:type  lgdo:school

lgd_map_resource_labels (labels imported from TranslateWiki)

k        v       label   language
amenity  school  Schule  de
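How the mapping tables relate OSM tags to RDF terms can be shown with a join over the sample rows above. The sketch uses Python's sqlite3 module; the table name node_tags and the second, unmapped tag row are assumptions added for illustration, while the two lgd_map_* table names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node_tags (node_id INTEGER, k TEXT, v TEXT);
    CREATE TABLE lgd_map_resource_kv (k TEXT, v TEXT, property TEXT, object TEXT);
    CREATE TABLE lgd_map_resource_labels (k TEXT, v TEXT, label TEXT, language TEXT);

    INSERT INTO node_tags VALUES (1, 'amenity', 'school'), (2, 'highway', 'bus_stop');
    INSERT INTO lgd_map_resource_kv VALUES ('amenity', 'school', 'rdf:type', 'lgdo:school');
    INSERT INTO lgd_map_resource_labels VALUES ('amenity', 'school', 'Schule', 'de');
""")

# Each tag that has a mapping yields a (property, object) pair; labels are optional.
mapped = conn.execute("""
    SELECT t.node_id, m.property, m.object, l.label, l.language
    FROM node_tags AS t
    JOIN lgd_map_resource_kv AS m ON m.k = t.k AND m.v = t.v
    LEFT JOIN lgd_map_resource_labels AS l ON l.k = t.k AND l.v = t.v
    ORDER BY t.node_id
""").fetchall()

print(mapped)  # [(1, 'rdf:type', 'lgdo:school', 'Schule', 'de')]
```

Because the URI of the object and the language tag of the label both come from table columns, a mapper that can only emit fixed URIs or fixed language tags cannot express this schema.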

Page 61: LOD2 Webinar Series: D2R and Sparqlify


Rewriting process

Page 62: LOD2 Webinar Series: D2R and Sparqlify


Rewriting process

• View Candidate Finding
  – Given a SPARQL query, find an appropriate subset of the views for answering the query
• Rewriting
  – After the candidates have been identified, translate the SPARQL algebra to SQL algebra.
  – Thereby do book-keeping of how the SPARQL variables are reconstructed from the SQL columns.
• Result Set Rendering
  – Execute the SQL query, construct the RDF according to the SPARQL variable bindings, serialize the result.
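The first step, view candidate finding, can be sketched in a strongly simplified form. The Python code below assumes that each view contributes a single triple pattern with a fixed predicate and an optional prefix constraint on the object; the View class, the view names and the function candidate_views are invented for this illustration and do not reflect Sparqlify's actual algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class View:
    name: str
    predicate: str                       # predicate of the view's triple pattern
    object_prefix: Optional[str] = None  # "Constrain ?o prefix ..." if declared


VIEWS = [
    View("pois_type", "rdf:type", "http://linkedgeodata.org/ontology/"),
    View("pois_geom", "geom:geometry"),
    View("labels", "rdfs:label"),
]


def candidate_views(predicate: Optional[str], obj: Optional[str],
                    views: List[View]) -> List[str]:
    """Return names of views that may match the pattern ?s predicate obj.

    None stands for a variable in the query pattern.
    """
    result = []
    for view in views:
        if predicate is not None and predicate != view.predicate:
            continue  # a constant predicate must equal the view's predicate
        if (obj is not None and view.object_prefix is not None
                and not obj.startswith(view.object_prefix)):
            continue  # the prefix constraint rules this view out
        result.append(view.name)
    return result


print(candidate_views("rdf:type", "http://linkedgeodata.org/ontology/Pub", VIEWS))
print(candidate_views("rdf:type", "http://xmlns.com/foaf/0.1/Person", VIEWS))
```

Only the surviving candidates need to appear in the generated SQL, which keeps the single rewritten query small.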

Page 63: LOD2 Webinar Series: D2R and Sparqlify


View Candidate Finding

• Based on:
  Le, Wangchao; Duan, Songyun; Kementsietsidis, Anastasios; Li, Feifei; Wang, Min. Rewriting Queries on SPARQL Views. In WWW 2011.