ICWE 2012 Tutorial An Introduction to SPARQL and Queries over Linked Data ● ● ● Chapter 3: Querying Linked Data Olaf Hartig http://olafhartig.de/foaf.rdf#olaf @olafhartig Database and Information Systems Research Group Humboldt-Universität zu Berlin

Upload: olaf-hartig

Post on 11-May-2015


DESCRIPTION

These are the slides from my ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data"

TRANSCRIPT

Page 1: Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (ICWE 2012 Ed.)

ICWE 2012 Tutorial

An Introduction to SPARQL and Queries over Linked Data

● ● ●

Chapter 3: Querying Linked Data

Olaf Hartig
http://olafhartig.de/foaf.rdf#olaf

@olafhartig

Database and Information Systems Research Group
Humboldt-Universität zu Berlin


Olaf Hartig - ICWE 2012 Tutorial "An Introduction to SPARQL and Queries over Linked Data" - Chapter 3: Querying Linked Data 2

Chapter 3

● Accessing a SPARQL Endpoint
● Queries over Multiple Datasets
● Linked Data Queries


SPARQL Endpoints

● SPARQL query processing service

● Supports the SPARQL protocol

● Issuing a SPARQL query is an HTTP GET request with parameter query

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1

The value of the query parameter is a URL-encoded string with the SPARQL query.
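As a minimal sketch of the request line above, the following Python snippet builds the GET URL with the URL-encoded `query` parameter. The helper name `build_query_url` and the sample query are assumptions made for illustration; nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(endpoint: str, sparql: str) -> str:
    """Return the GET URL that carries the SPARQL text in the `query` parameter."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'
    return endpoint + "?" + urlencode({"query": sparql})

# Illustrative query (an assumption, not taken from the slides)
sparql = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"
url = build_query_url("http://dbpedia.org/sparql", sparql)
print(url)
```

Decoding the query string again with `parse_qs(urlsplit(url).query)` gives back the original SPARQL text, which is a quick way to check the encoding.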


Query Result Formats

● For SELECT and ASK queries: XML, JSON, plain text

● For CONSTRUCT and DESCRIBE: RDF/XML, Turtle, ...

● How to request?
  ● Accept header

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json

  ● Non-standard alternative: parameter out

GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
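The content negotiation above could look as follows in Python. This is a sketch under assumptions: the helper names `make_request` and `bindings` and the sample result document are made up for illustration, and the request object is only constructed, not sent (passing it to `urllib.request.urlopen` would do that).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def make_request(endpoint, sparql):
    """Build a GET request that asks for the JSON result format via Accept."""
    url = endpoint + "?" + urlencode({"query": sparql})
    return Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "my-sparql-client/0.1",
    })

def bindings(results_json):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [{var: b["value"] for var, b in row.items()}
            for row in doc["results"]["bindings"]]

# Hand-written sample response (an assumption, not real endpoint output)
sample = ('{"head": {"vars": ["v"]}, "results": {"bindings": '
          '[{"v": {"type": "uri", "value": "http://example.org/a"}}]}}')

req = make_request("http://dbpedia.org/sparql", "SELECT ?v WHERE { ?v ?p ?o } LIMIT 1")
rows = bindings(sample)
print(req.get_header("Accept"), rows)
```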


SPARQL Client Libraries

● More convenient than working on the protocol level:
  ● SPARQL JavaScript Library http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
  ● ARC for PHP http://arc.semsol.org/
  ● RAP – RDF API for PHP http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
  ● Jena / ARQ (Java) http://jena.sourceforge.net/
  ● Sesame (Java) http://www.openrdf.org/
  ● SPARQL Wrapper (Python) http://sparql-wrapper.sourceforge.net/
  ● PySPARQL (Python) http://code.google.com/p/pysparql/


SPARQL Client Libraries

● Example with Jena ARQ:

import com.hp.hpl.jena.query.*;

String service = "...";      // address of the SPARQL endpoint
String query = "SELECT ..."; // your SPARQL query
QueryExecution e = QueryExecutionFactory.sparqlService( service, query );
ResultSet results = e.execSelect();
while ( results.hasNext() ) {
    QuerySolution s = results.nextSolution();
    // …
}
e.close();


SPARQL Endpoints

● Several Linked Data sets exposed via SPARQL endpoint
  ● DBpedia http://dbpedia.org/sparql
  ● Musicbrainz http://dbtune.org/musicbrainz/sparql
  ● Semantic Web dog food http://data.semanticweb.org/sparql
  ● etc. http://esw.w3.org/topic/SparqlEndpoints

● Send your query, receive the result


Querying a single dataset is quite boring

compared to:

Issuing SPARQL queries over multiple datasets



Chapter 3

● Accessing a SPARQL Endpoint
● Queries over Multiple Datasets
  ➢ Query a given collection
  ➢ Manage your own collection
  ➢ Use a query federation system
  ➢ Link traversal based query execution
● Linked Data Queries


Querying a Given Collection

● Some public SPARQL endpoints provide access to a collection of data from multiple sources
  ● http://lod.openlinksw.com/sparql
  ● http://sparql.sindice.com/

● Pros:
  ● Nothing to set up
  ● Good query execution times

● Cons:
  ● Queried data might be out of date
  ● Not all relevant data in the collection


Setting up Your Own Collection

● RDF-specific DBMSs:
  ● Virtuoso http://virtuoso.openlinksw.com/
  ● Allegro Graph http://www.franz.com/agraph/allegrograph/
  ● Bigdata http://www.systap.com/bigdata.htm
  ● OWLIM http://www.ontotext.com/owlim
  ● 4store http://4store.org/
  ● Jena TDB http://jena.apache.org/
  ● Sesame http://www.openrdf.org/
  ● etc.


Populating Your Own Collection

● Datasets provided as RDF dumps

● (Focused) crawling
  ● ldspider http://code.google.com/p/ldspider/


Setting up Your Own Collection

● Pros:
  ● All relevant data
  ● Independent of existence, availability, efficiency of SPARQL endpoints
  ● Good query execution times (once set up properly)

● Cons:
  ● Effort to set up
  ● Effort to operate
  ● Queried data might be out of date


SPARQL Endpoint Federation

● Idea of federated query processing:
  ● Querying a query federation service (mediator)
  ● Mediator distributes sub-queries to relevant sources
  ● Finally, mediator combines sub-results

● Prototypes:
  ● FedX
  ● SPLENDID
  ● ANAPSID
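The last step of the mediator, combining sub-results, can be illustrated with a join over solution mappings. This is a simplified sketch, not how FedX, SPLENDID or ANAPSID are implemented: the function names and the two sub-result lists are hypothetical, and real systems use far more elaborate join strategies.

```python
def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on all shared variables."""
    return all(m2[var] == term for var, term in m1.items() if var in m2)

def join(left, right):
    """Nested-loop join: merge every compatible pair of solution mappings."""
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]

# Hypothetical sub-results, as if returned by two different SPARQL endpoints
from_endpoint_a = [{"v": "ex:volcano1"}, {"v": "ex:volcano2"}]
from_endpoint_b = [{"v": "ex:volcano1", "ve": "date-A"},
                   {"v": "ex:volcano3", "ve": "date-B"}]

combined = join(from_endpoint_a, from_endpoint_b)
print(combined)
```

Only `ex:volcano1` appears in both sub-results, so the combined result contains a single mapping.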


SPARQL Endpoint Federation

● Pros:
  ● Queried data is up to date

● Cons:
  ● All relevant datasets must be exposed via a SPARQL endpoint
  ● Effort to set up mediator


SPARQL 1.1 Federation Extension

● SERVICE pattern in SPARQL 1.1
  ● Explicitly specify query patterns whose execution must be distributed to a remote SPARQL endpoint

SELECT ?v ?ve WHERE
{
  ?v rdf:type umbel-sc:Volcano ;
     p:location dbpedia:Italy .
  SERVICE <http://volcanos.example.org/query> {
    ?v p:lastEruption ?ve }
}


For all these approaches ...

● … you have to know the relevant data sources beforehand
  ● When selecting a SPARQL endpoint over an existing collection of datasets
  ● When setting up your own collection
  ● When configuring your federation system
  ● When using the SERVICE pattern

● … you restrict yourself to the selected sources

● … you do not tap the full potential of the Web


Main Idea

● Intertwine query evaluation with traversal of data links

● We alternate between:
  ● Evaluate parts of the query (triple patterns) on a continuously augmented set of data
  ● Look up URIs in intermediate solutions and add retrieved data to the query-local dataset

Main Idea: example walk-through

(Figure: a query graph that connects http://.../movie2449, ?actor and ?loc via the edges filmingLocation, actor_in and lives_in, shown next to the growing set of queried data.)

1. Look up http://.../movie2449 and add the retrieved data to the queried data.
2. The queried data now contains the triple http://mdb.../Paul actor_in http://.../movie2449, which yields the intermediate solution ?actor = http://mdb.../Paul.
3. Look up http://mdb.../Paul and add the retrieved data to the queried data.
4. The queried data now contains the triple http://mdb.../Paul lives_in http://geo.../Berlin, which yields the solution ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin.
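The alternation between evaluating triple patterns and looking up URIs can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the "Web" is a dictionary from URI to the triples of the retrieved document, the `ex:` names stand in for the elided URIs of the slides, the direction of the filmingLocation edge is a guess, and the dataset is snapshotted once per triple pattern, which a real engine would not do.

```python
# Toy Web of Linked Data: URI -> triples of the document retrieved for that URI
WEB = {
    "ex:movie2449": {("ex:Paul", "ex:actor_in", "ex:movie2449"),
                     ("ex:movie2449", "ex:filmingLocation", "ex:Berlin")},
    "ex:Paul": {("ex:Paul", "ex:lives_in", "ex:Berlin")},
}

def match(pattern, triple, mu):
    """Extend solution mu so that pattern matches triple; return None on mismatch."""
    mu = dict(mu)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:   # variable already bound to another term
                return None
        elif p != t:                       # constant does not match
            return None
    return mu

def link_traversal(patterns, seeds, web):
    dataset, looked_up = set(), set()      # query-local dataset, visited URIs

    def look_up(uri):
        if uri not in looked_up:
            looked_up.add(uri)
            dataset.update(web.get(uri, set()))

    for seed in seeds:
        look_up(seed)
    solutions = [{}]
    for tp in patterns:                    # evaluate one triple pattern at a time
        extended = []
        for mu in solutions:
            for triple in list(dataset):   # snapshot: dataset grows during the loop
                ext = match(tp, triple, mu)
                if ext is not None:
                    extended.append(ext)
                    for term in ext.values():
                        look_up(term)      # traverse data links
        solutions = extended
    return solutions

query = [("?actor", "ex:actor_in", "ex:movie2449"),
         ("?actor", "ex:lives_in", "?loc"),
         ("ex:movie2449", "ex:filmingLocation", "?loc")]
result = link_traversal(query, ["ex:movie2449"], WEB)
print(result)
```

The triple about where Paul lives is not in the seed document; it only becomes available because `ex:Paul` is looked up after the first pattern has been matched.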


“Real World” Example

SELECT DISTINCT ?author ?phone WHERE {
  ?pub swc:isPartOf <http://data.semanticweb.org/conference/eswc/2009/proceedings> .
  ?pub swc:hasTopic ?topic .
  ?topic rdfs:label ?topicLabel .
  FILTER regex( str(?topicLabel), "ontology engineering", "i" ) .
  ?pub swrc:author ?author .
  { ?author owl:sameAs ?authorAlt }
  UNION
  { ?authorAlt owl:sameAs ?author }
  ?authorAlt foaf:phone ?phone
}

Return phone numbers of authors of ontology engineering papers at ESWC'09.

Result size: 2
# of retrieved docs: 297
# of accessed servers: 16
avg. execution time: 1min 30sec


Summary

O. Hartig and A. Langegger. A Database Perspective on Consuming Linked Data on the Web. Datenbankspektrum 10(2), 2010


Chapter 3

● Accessing a SPARQL Endpoint
● Queries over Multiple Datasets
  ➢ Query a given collection
  ➢ Manage your own collection
  ➢ Use a query federation system
  ➢ Link traversal based query execution
● Linked Data Queries
  ➢ Foundations
  ➢ Iterator Based Implementation
  ➢ Query Planning


SPARQL Pattern Evaluation

$\mathrm{eval}(P, G) = \{ \mu_1, \mu_2, \ldots \}$

(Figure: the example query pattern with ?actor and ?loc, evaluated over an RDF graph; example solution: ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin.)


SPARQL Linked Data Query

$Q^{P}(W) = \{ \mu_1, \mu_2, \ldots \}$

(Figure: the same example query pattern, now evaluated over a Web of Linked Data W; example solution: ?actor = http://mdb.../Paul, ?loc = http://geo.../Berlin.)


Full-Web Semantics

$Q^{P}(W) = \mathrm{eval}(P, \mathrm{AllData}(W)) = \{ \mu_1, \mu_2, \ldots \}$


Reachability-based Semantics

● Seed URIs S

● Reachability criterion c

$Q^{P,S}_{c}(W) = \mathrm{eval}(P, \mathrm{AllData}(W^{*}))$

where $W^{*}$ is the part of $W$ that is reachable from the seed URIs $S$ under the reachability criterion $c$.

● Reachability criteria shown: cAll, cNone, cMatch
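To make the three criteria concrete, here is a small Python sketch that computes which documents are reachable from the seeds. It rests on assumptions: the reading that cAll follows every link, cNone follows none, and cMatch follows a link only if the triple containing it matches some triple pattern of P is taken from the naming, not from the slides; the toy Web and all `ex:` names are hypothetical.

```python
def matches(triple, pattern):
    """True if the triple matches the triple pattern (variables start with '?')."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return False
        elif p != t:
            return False
    return True

# Reachability criteria: decide whether URI `uri` found in `triple` is followed
def c_all(triple, uri, patterns):
    return True

def c_none(triple, uri, patterns):
    return False

def c_match(triple, uri, patterns):
    return any(matches(triple, tp) for tp in patterns)

def reachable(web, seeds, patterns, criterion):
    """Return the URIs whose documents are reachable from the seeds under the criterion."""
    reached, frontier = set(), list(seeds)
    while frontier:
        uri = frontier.pop()
        if uri in reached or uri not in web:   # skip visited URIs and dead links
            continue
        reached.add(uri)
        for triple in web[uri]:
            for term in triple:
                if term not in reached and criterion(triple, term, patterns):
                    frontier.append(term)
    return reached

# Hypothetical toy Web: URI -> triples of the retrieved document
WEB = {
    "ex:movie2449": {("ex:Paul", "ex:actor_in", "ex:movie2449")},
    "ex:Paul": {("ex:Paul", "ex:lives_in", "ex:Berlin"),
                ("ex:Paul", "ex:knows", "ex:Ann")},
    "ex:Berlin": set(),
    "ex:Ann": {("ex:Ann", "ex:lives_in", "ex:Rome")},
    "ex:Rome": set(),
}
P = [("?actor", "ex:actor_in", "ex:movie2449"), ("?actor", "ex:lives_in", "?loc")]
S = ["ex:movie2449"]

for name, c in [("cAll", c_all), ("cNone", c_none), ("cMatch", c_match)]:
    print(name, sorted(reachable(WEB, S, P, c)))
```

Under cMatch the document for `ex:Ann` is not reached, because the only triple that mentions it (`ex:knows`) matches no triple pattern of P.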


Computability

(Figure: a Turing machine (TM) and the query $Q^{P,S}_{c_{Match}}(W)$.)

● (Ordinary) Turing machines unsuitable:
  ● Limited data access capabilities not properly captured

● Web machines
  ● Abiteboul and Vianu, 1997
  ● Mendelzon and Milo, 1997


LD Machine

● Multi-tape Turing machine
  ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
  ➔ Input
  ➔ Work
  ➔ Output

● Access to Web input is restricted
  ● Only by performing a particular procedure in a particular state


Finitely Computable LD Queries

  ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
  ➔ Input
  ➔ Work
  ➔ Output (after step k): # enc(μ1) # enc(μ2) # ∙ ∙ ∙ # enc(μn) #

● For Q there exists an LD machine MQ such that for any W:
  ● MQ halts after a finite number of computation steps, and
  ● MQ outputs the complete result Q(W)


Eventually Computable LD Queries

  ➔ Web Input: # enc(u1) enc(adoc(u1)) # enc(u2) enc(adoc(u2)) # ∙ ∙ ∙
  ➔ Input
  ➔ Work
  ➔ Output (still growing over the steps ∙ ∙ ∙, k - 3, k - 2, k - 1, k, k + 1, k + 2, ∙ ∙ ∙): # enc(μ1) # enc(μ2)

● For Q there exists an LD machine MQ such that for any W:
  1. Output always encodes a subset of query result Q(W), and
  2. Each μ ∈ Q(W) eventually appears on the output

✗ No guarantee for termination


Main Results for cMatch-Semantics

Theorem: Any satisfiable SPARQL based Linked Data query $Q^{P,S}_{c_{Match}}$ under cMatch-semantics that is monotonic is at least eventually computable; any non-monotonic $Q^{P,S}_{c_{Match}}$ is either finitely computable or not even eventually computable.

Problem: TERMINATION(cMatch)
  Web Input: W – a (potentially infinite) Web of Linked Data
  Ord. Input: S – a finite but nonempty set of seed URIs
              P – a SPARQL expression
  Question: Does an LD machine exist that computes $Q^{P,S}_{c_{Match}}(W)$ and halts?

Theorem: TERMINATION(cMatch) is not LD machine decidable.


Iterator Based Execution

Query:
  ?p ex:affiliated_with <http://.../orgaX>
  ?p ex:interested_in ?b
  ?b rdf:type <http://.../Book>

Seed: <http://.../orgaX>

One iterator per triple pattern, all working on the query-local dataset:
  I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
  I2: tp2 = ( ?p , ex:interested_in , ?b )
  I3: tp3 = ( ?b , rdf:type , <http://.../Book> )

Walk-through (each "Next?" request is passed from I3 via I2 to I1):
1. I1 finds <http://.../alice> ex:affiliated_with <http://.../orgaX> in the query-local dataset and returns { ?p = <http://.../alice> }.
2. I2 applies this solution to tp2, which gives tp2' = ( <http://.../alice> , ex:interested_in , ?b ). It finds <http://.../alice> ex:interested_in <http://.../b1> and returns { ?p = <http://.../alice> , ?b = <http://.../b1> }.
3. I3 applies this solution to tp3, which gives tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> ). It finds <http://.../b1> rdf:type <http://.../Book> and returns { ?p = <http://.../alice> , ?b = <http://.../b1> } as a solution of the query.

Alternative Execution Order

Same query and seed as before, but with the iterators in a different order:
  I1: tp1 = ( ?b , rdf:type , <http://.../Book> )
  I2: tp2 = ( ?p , ex:interested_in , ?b )
  I3: tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> )

● The query-local dataset contains <http://.../alice> ex:affiliated_with <http://.../orgaX>, but no triple that matches tp1
● I1 answers the first "Next?" request with END!, so I2 and I3 report END! as well

Computed query result may depend on the order of triple patterns (= logical query execution plan)
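The effect of the execution order can be reproduced with a small pipeline of Python generators, one per triple pattern. This is a sketch under assumptions, not the author's implementation: the `Engine` class, the `ex:` names standing in for the elided URIs, and the toy Web are all made up, and each iterator snapshots the query-local dataset when it starts to scan.

```python
# Hypothetical toy Web: URI -> triples of the document retrieved for that URI
WEB = {
    "ex:orgaX": {("ex:alice", "ex:affiliated_with", "ex:orgaX")},
    "ex:alice": {("ex:alice", "ex:interested_in", "ex:b1")},
    "ex:b1": {("ex:b1", "rdf:type", "ex:Book")},
}

def match(pattern, triple):
    """Return bindings for the variables of pattern, or None if triple does not match."""
    mu = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu

class Engine:
    def __init__(self, web, seeds):
        self.web, self.dataset, self.seen = web, set(), set()
        for seed in seeds:
            self.look_up(seed)

    def look_up(self, uri):
        """Retrieve the document for uri once and add it to the query-local dataset."""
        if uri not in self.seen:
            self.seen.add(uri)
            self.dataset |= self.web.get(uri, set())

    def iterator(self, tp, predecessor):
        for mu in predecessor:                           # "Next?" to the predecessor
            bound = tuple(mu.get(term, term) for term in tp)   # tp' in the slides
            for triple in list(self.dataset):
                ext = match(bound, triple)
                if ext is not None:
                    solution = {**mu, **ext}
                    for term in solution.values():
                        self.look_up(term)               # augment the dataset
                    yield solution

    def run(self, plan):
        it = iter([{}])                                  # start with the empty solution
        for tp in plan:
            it = self.iterator(tp, it)
        return list(it)

affiliated = ("?p", "ex:affiliated_with", "ex:orgaX")
interested = ("?p", "ex:interested_in", "?b")
is_book = ("?b", "rdf:type", "ex:Book")

first_order = Engine(WEB, ["ex:orgaX"]).run([affiliated, interested, is_book])
alternative = Engine(WEB, ["ex:orgaX"]).run([is_book, interested, affiliated])
print(first_order, alternative)
```

With the first order the pipeline discovers the documents for `ex:alice` and `ex:b1` along the way; with the alternative order the first iterator finds no matching triple in the seed document, so nothing is ever looked up and the result stays empty.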


Query Plan Selection

Assumptions about $Q^{P,S}_{c_{Match}}$:
  ● P refers to instance data
  ● S = uris(P)

● Assessment criteria:
  ● Cost (query execution time)
  ● Benefit (size of computed result)

● Cost and benefit must be estimated without plan execution

● Estimation impossible due to “zero knowledge”

● Heuristic Based Plan Selection
  ● DEPENDENCY RESPECT RULE
  ● SEED TP RULE
  ● NO VOCAB SEED RULE
  ● FILTERING TP RULE


DEPENDENCY RESPECT RULE

Use a dependency respecting query plan

Query:

?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>

● Dependency respect: each triple pattern after the first contains at least one variable that already occurs in one of the preceding triple patterns

● Rationale: Avoid cartesian products

A dependency respecting plan:

I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
I2: tp2 = ( ?p , ex:interested_in , ?b )
I3: tp3 = ( ?b , rdf:type , <http://.../Book> )

A plan that is not dependency respecting (tp2 shares no variable with tp1):

I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
I2: tp2 = ( ?b , rdf:type , <http://.../Book> )
I3: tp3 = ( ?p , ex:interested_in , ?b )
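The rule can be checked mechanically. The sketch below is an illustration under assumptions (the `ex:` names stand in for the elided URIs, and the function names are hypothetical): it tests whether every triple pattern after the first shares a variable with the patterns before it, and enumerates all orders of the example query.

```python
from itertools import permutations
from typing import Sequence, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-ins for the example query's triple patterns.
TP_AFF: Triple = ("?p", "ex:affiliated_with", "ex:orgaX")
TP_INT: Triple = ("?p", "ex:interested_in", "?b")
TP_TYPE: Triple = ("?b", "rdf:type", "ex:Book")
QUERY = [TP_AFF, TP_INT, TP_TYPE]


def variables(tp: Triple) -> Set[str]:
    return {term for term in tp if term.startswith("?")}


def is_dependency_respecting(plan: Sequence[Triple]) -> bool:
    """True if each TP after the first shares a variable with an earlier TP."""
    seen: Set[str] = set()
    for position, tp in enumerate(plan):
        tp_vars = variables(tp)
        if position > 0 and not (tp_vars & seen):
            return False  # this TP would force a cartesian product
        seen |= tp_vars
    return True


respecting = [plan for plan in permutations(QUERY) if is_dependency_respecting(plan)]
print(len(respecting), "of 6 orders are dependency respecting")
```

For the example query, 4 of the 6 possible orders pass the check; the two that fail are those in which the `ex:affiliated_with` pattern and the `rdf:type` pattern are adjacent at the start of the plan.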


SEED TP RULE

Use a plan with a seed triple pattern

● Potential seed triple pattern
  … is a triple pattern that contains at least one HTTP URI

● Seed triple pattern of a plan
  … is the first triple pattern in the plan and
  … is a potential seed triple pattern

● Rationale: good starting point

Recall: S = uris(P)

Query:

?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>


NO VOCAB SEED RULE

Avoid a seed triple pattern with vocabulary terms

● The URIs in the seed triple pattern should not all be vocabulary term URIs

● Patterns to avoid:
  ?s ex:any_property ?o
  ?s rdf:type ex:any_class

● Rationale: URIs for vocabulary terms usually resolve to vocabulary definitions with little instance data

Query:

?p ex:affiliated_with <http://.../orgaX>
?p ex:interested_in ?b
?b rdf:type <http://.../Book>
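The SEED TP RULE and the NO VOCAB SEED RULE can be combined into a simple filter over the triple patterns of a query. The following sketch rests on two assumptions that are not part of the slides: the namespace `http://example.org/` replaces the elided URIs, and a URI counts as a vocabulary term if it is a predicate or the object of an `rdf:type` pattern (modeled on the two "patterns to avoid").

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for illustration

QUERY: List[Triple] = [
    ("?p", EX + "affiliated_with", EX + "orgaX"),
    ("?p", EX + "interested_in", "?b"),
    ("?b", RDF_TYPE, EX + "Book"),
]


def is_http_uri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def is_potential_seed(tp: Triple) -> bool:
    """SEED TP RULE: the pattern contains at least one HTTP URI."""
    return any(is_http_uri(term) for term in tp)


def has_only_vocabulary_uris(tp: Triple) -> bool:
    """Assumed heuristic: predicates and rdf:type objects are vocabulary terms."""
    subject, predicate, obj = tp
    if is_http_uri(subject):
        return False
    if is_http_uri(obj) and predicate != RDF_TYPE:
        return False
    return True


def acceptable_seeds(query: List[Triple]) -> List[Triple]:
    """Potential seed TPs that also satisfy the NO VOCAB SEED RULE."""
    return [tp for tp in query
            if is_potential_seed(tp) and not has_only_vocabulary_uris(tp)]


for tp in acceptable_seeds(QUERY):
    print(tp)
```

Under this heuristic all three patterns of the example query are potential seeds, but only the `affiliated_with` pattern remains once vocabulary-only patterns are excluded.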


FILTERING TP RULE

Use a plan where all filtering triple patterns are as close to the seed triple pattern as possible

● Filtering triple pattern: each variable already occurs in one of the preceding triple patterns

● For each result consumed as input, a filtering TP can only report 1 or 0 results as output

● Rationale: Reduce cost

Example:

I1: tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> )
    reports { ?p = <http://.../alice> }

I2: tp2 = ( ?p , ex:interested_in , ?b )
    tp2' = ( <http://.../alice> , ex:interested_in , ?b )
    reports { ?p = <http://.../alice> , ?b = <http://.../b1> }

I3: tp3 = ( ?b , rdf:type , <http://.../Book> )
    tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )
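Filtering triple patterns can be located in a plan by tracking which variables have already been bound. This is a small sketch with hypothetical names (the `ex:` terms again stand in for the elided URIs); it reports the 1-based positions of all filtering TPs in a given order.

```python
from typing import List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]

TP_AFF: Triple = ("?p", "ex:affiliated_with", "ex:orgaX")
TP_INT: Triple = ("?p", "ex:interested_in", "?b")
TP_TYPE: Triple = ("?b", "rdf:type", "ex:Book")


def variables(tp: Triple) -> Set[str]:
    return {term for term in tp if term.startswith("?")}


def filtering_positions(plan: Sequence[Triple]) -> List[int]:
    """1-based positions of TPs whose variables all occur in preceding TPs."""
    seen: Set[str] = set()
    positions: List[int] = []
    for position, tp in enumerate(plan, start=1):
        tp_vars = variables(tp)
        if position > 1 and tp_vars <= seen:
            positions.append(position)
        seen |= tp_vars
    return positions


print(filtering_positions([TP_AFF, TP_INT, TP_TYPE]))  # [3]
print(filtering_positions([TP_INT, TP_AFF, TP_TYPE]))  # [2, 3]
```

In the plan from the slide only tp3 is a filtering TP: once `?b` is bound, tp3' contains no variables, so the iterator can merely confirm or reject each input solution.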


Evaluation Procedure

● Generate all possible plans

● Execute each plan:
  ● 5 runs (+ 1 initial warm-up run)
  ● Use an initially empty query-local dataset for each run

● Measure for each plan:
  ● Avg. execution time
  ● Avg. number of RDF documents retrieved during execution
  ● Avg. number of query results
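The procedure can be expressed as a small measurement harness. This is only a sketch of the steps listed above, not the setup that was actually used: `fake_execute` is a hypothetical stand-in for a real link traversal engine and is assumed to start from an empty query-local dataset on every call and to return the number of retrieved documents and the number of results.

```python
import time
from itertools import permutations
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Plan = Sequence[str]
# An executor returns (retrieved documents, query results) for one run.
Executor = Callable[[Plan], Tuple[int, int]]


def evaluate(plan: Plan, execute: Executor, runs: int = 5) -> Dict[str, float]:
    """Run a plan `runs` times after one unmeasured warm-up run."""
    execute(plan)  # warm-up run, not measured
    times, docs, results = [], [], []
    for _ in range(runs):
        start = time.perf_counter()
        n_docs, n_results = execute(plan)
        times.append(time.perf_counter() - start)
        docs.append(n_docs)
        results.append(n_results)
    return {
        "avg_time": mean(times),
        "avg_docs": mean(docs),
        "avg_results": mean(results),
    }


def fake_execute(plan: Plan) -> Tuple[int, int]:
    # Hypothetical stand-in: pretends that only plans starting with "tp1"
    # find a result and that one document is retrieved per triple pattern.
    return (len(plan), 1 if plan[0] == "tp1" else 0)


# Generate all possible plans and measure each of them.
measurements = {
    plan: evaluate(plan, fake_execute)
    for plan in permutations(["tp1", "tp2", "tp3"])
}
for plan, stats in measurements.items():
    print(plan, stats["avg_docs"], stats["avg_results"])
```

A real executor would replace `fake_execute`; the harness itself only fixes the number of runs and the three averages named in the procedure.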


Evaluation Query (Example)

Of what genus are the species that are
● classified in the same family as the American Badger,
● and expected in the same states as the American Badger?

SELECT ?spec ?genus WHERE {
  geospecies:4qyn7 gs:inFamily ?fam .
  ?fam skos:narrowerTransitive ?spec .
  ?spec skos:closeMatch ?sp2 .
  ?sp2 rdfs:subClassOf ?genus .
  ?spec gs:isExpectedIn ?loc .
  geospecies:4qyn7 gs:isExpectedIn ?loc .
  ?loc rdf:type gs:State . }

● 2 potential seed triple patterns that satisfy our NO VOCAB SEED RULE

● 56 different dependency respecting plans that start with one of these seed triple patterns; each plan contains 2 filtering TPs

Picture source: Wikipedia
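The numbers stated for this query can be recomputed by brute force over all 7! orders. The sketch below is an illustration, not the original evaluation code; it assumes that a vocabulary term is a predicate or the object of an `rdf:type` pattern and that plans must start with a seed triple pattern that satisfies the NO VOCAB SEED RULE.

```python
from itertools import permutations
from typing import List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]

TPS: List[Triple] = [
    ("geospecies:4qyn7", "gs:inFamily", "?fam"),
    ("?fam", "skos:narrowerTransitive", "?spec"),
    ("?spec", "skos:closeMatch", "?sp2"),
    ("?sp2", "rdfs:subClassOf", "?genus"),
    ("?spec", "gs:isExpectedIn", "?loc"),
    ("geospecies:4qyn7", "gs:isExpectedIn", "?loc"),
    ("?loc", "rdf:type", "gs:State"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def variables(tp: Triple) -> Set[str]:
    return {term for term in tp if is_var(term)}


def is_acceptable_seed(tp: Triple) -> bool:
    """Assumed heuristic: the TP mentions a URI that is no vocabulary term."""
    subject, predicate, obj = tp
    return not is_var(subject) or (not is_var(obj) and predicate != "rdf:type")


def is_dependency_respecting(plan: Sequence[Triple]) -> bool:
    seen: Set[str] = set()
    for position, tp in enumerate(plan):
        if position > 0 and not (variables(tp) & seen):
            return False
        seen |= variables(tp)
    return True


def count_filtering_tps(plan: Sequence[Triple]) -> int:
    seen: Set[str] = set()
    count = 0
    for position, tp in enumerate(plan):
        if position > 0 and variables(tp) <= seen:
            count += 1
        seen |= variables(tp)
    return count


seeds = [tp for tp in TPS if is_acceptable_seed(tp)]
plans = [plan for plan in permutations(TPS)
         if is_acceptable_seed(plan[0]) and is_dependency_respecting(plan)]
filtering_counts = {count_filtering_tps(plan) for plan in plans}

print(len(seeds), len(plans), filtering_counts)
```

Under these assumptions the script finds 2 seed triple patterns and 56 plans, each with exactly 2 filtering TPs, which agrees with the figures on the slide.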


Measurements

[Figure: two plots; axes: query exec. times (in seconds), query results, retrieved documents]

[Figure: Percentage of plans in each group with a filtering TP in specific positions; panels: 1st Filtering TP, 2nd Filtering TP; x-axis: TP position in the ordered BGP]


Summary (Linked Data Queries)

● Theoretical foundations of Linked Data queries
  ● Full-Web semantics, (family of) reachability based semantics
  ● Theoretical properties of queries (e.g. computability)

● Link traversal based query execution
  ● Novel paradigm for executing Linked Data queries
  ● Sound and complete for conjunctive Linked Data queries under cMatch-semantics

● Iterator implementation of the LTBQE paradigm
  ● Trades off completeness for a termination guarantee
  ● Degree of completeness depends on execution order of TPs

● Heuristic based plan selection


These slides have been created by Olaf Hartig

http://olafhartig.de

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License

(http://creativecommons.org/licenses/by-sa/3.0/)