Friday talk 11.02.2011

Uploaded by juergen-umbrich on 11 May 2015

DESCRIPTION

MiniViva presentation as part of the DERI Friday talk events

TRANSCRIPT

Page 1: Friday talk 11.02.2011

Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Querying Live Linked Data

Mini Viva presentation (11.02.2011)

by Jürgen Umbrich

Page 2: Friday talk 11.02.2011

Querying in the Linked Data space

millions of diverse but often interrelated data sources

“data everywhere” on the Web

no complete control over the data

[Figure: query approaches on a spectrum from static to dynamic: crawling into an index (Yars2, Virtuoso) at the static end, live distributed querying (QP) at the dynamic end]

Page 3: Friday talk 11.02.2011

Linked Data is Dynamic

Dataset: Web data (2008–2009), 24 weekly snapshots of the 4-hop neighborhood of Tim Berners-Lee's FOAF file; 550K RDF/XML documents, 3.3M unique entities

[Umbrich et al. 2010]

Findings (entity level): 68% static, 32% dynamic

[Figure: pie chart of the change frequency, with slices of 52%, 24%, 10% and 14% over the classes <1 week, >1 week to ≤1 month, >1 month to ≤3 months, >3 months to ≤6 months]
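
The static/dynamic split and the change-frequency classes can be illustrated with a small sketch. It is not the method of [Umbrich et al. 2010]: it assumes each entity is represented by one value per weekly snapshot (e.g. a hash of its triples), estimates the change interval naively from the number of observed changes, and uses hypothetical class boundaries in days (1 month taken as 30 days).

```python
from collections import Counter

# Hypothetical class boundaries in days (assumption: 1 month = 30 days).
# With weekly snapshots, nothing finer than 7 days can be observed.
CLASSES = [(7, "<=1 week"), (30, "<=1 month"), (90, "<=3 months"), (180, "<=6 months")]


def count_changes(snapshots):
    """Count how often an entity's value differs between consecutive snapshots."""
    return sum(1 for before, after in zip(snapshots, snapshots[1:]) if before != after)


def classify(snapshots, interval_days=7):
    """Label an entity 'static' or with a change-frequency class.

    snapshots: one value per snapshot, e.g. a hash of the entity's triples.
    """
    changes = count_changes(snapshots)
    if changes == 0:
        return "static"
    # Naive estimate: observed time span divided by the number of changes.
    mean_interval = (len(snapshots) - 1) * interval_days / changes
    for upper_bound, label in CLASSES:
        if mean_interval <= upper_bound:
            return label
    return ">6 months"


def summarize(entities):
    """Share of entities per class, given a dict entity -> list of snapshot values."""
    counts = Counter(classify(values) for values in entities.values())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

An entity that differs in every one of the 24 weekly snapshots lands in the finest class; one that never differs counts as static.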

Page 4: Friday talk 11.02.2011

Accessing Linked Data

① Use URIs for things
② Use HTTP URIs so that people can look them up
③ Provide useful information, using standards (RDF, SPARQL)
④ Include links to other URIs

Direct correspondence between thing-URI and source-URI

[Figure: an HTTP GET on the thing-URI http://umbrich.net/foaf.rdf#me retrieves the source-URI http://umbrich.net/foaf.rdf as RDF/XML; in it, #me is linked via foaf:based_near to http://dbpedia.org/resource/Galway]
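
With direct correspondence, a client can compute the source-URI from the thing-URI itself: the fragment identifier (#me) is never sent to the server, so the HTTP GET goes to the document URI. A minimal sketch using Python's standard library; the function names are illustrative only.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def source_uri(thing_uri: str) -> str:
    """Source-URI for a hash thing-URI: drop the fragment before the HTTP GET."""
    return urldefrag(thing_uri).url


def rdf_request(thing_uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF/XML."""
    return Request(source_uri(thing_uri), headers={"Accept": "application/rdf+xml"})
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; whether RDF/XML comes back depends on the server.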

Page 5: Friday talk 11.02.2011

Accessing Linked Data

Redirect correspondence between thing-URI and source-URI

[Figure: an HTTP GET on the thing-URI http://dbpedia.org/resource/Galway is redirected to http://dbpedia.org/data/Galway (RDF/XML) or to http://dbpedia.org/page/Galway (HTML)]

Direct correspondence between thing-URI and source-URI

[Figure: an HTTP GET on http://umbrich.net/foaf.rdf#me retrieves http://umbrich.net/foaf.rdf (RDF/XML), in which #me links to http://dbpedia.org/resource/Galway]
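
In the redirect case, the client cannot derive the source-URI on its own; the server decides where to send it, typically with an HTTP 303 See Other redirect chosen by content negotiation on the Accept header. The function below is a hypothetical server-side rule that hard-codes the /resource/, /data/ and /page/ pattern from the slide; it is not DBpedia's implementation.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def redirect_target(thing_uri: str, accept: str) -> str:
    """Hypothetical redirect rule mirroring the URI pattern on the slide.

    A /resource/ URI names a thing, not a document: RDF clients are sent
    to /data/..., browsers to /page/... .
    """
    if not thing_uri.startswith(RESOURCE_PREFIX):
        raise ValueError(f"not a /resource/ URI: {thing_uri}")
    name = thing_uri[len(RESOURCE_PREFIX):]
    wants_rdf = "application/rdf+xml" in accept
    base = "http://dbpedia.org/data/" if wants_rdf else "http://dbpedia.org/page/"
    return base + name
```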

Page 6: Friday talk 11.02.2011

The Problem

What are the query-relevant sources?

Example Query

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# assumption: juum: abbreviates <http://umbrich.net/foaf.rdf#>; the slide does not define it
PREFIX juum: <http://umbrich.net/foaf.rdf#>
SELECT ?friendLabel WHERE {
  juum:me foaf:knows ?f .
  ?f foaf:name ?friendLabel .
}

[Figure: the triple patterns juum:me foaf:knows ?f and ?f foaf:name ?friendLabel, with the candidate sources umbrich.net/foaf.rdf, polleres.net/foaf.rdf and sw.deri.org/~aidanh/]
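
The problem can be made concrete with a toy model. Assuming we knew which triples every source serves (on the Web we do not), the relevant sources for a triple pattern are those holding at least one matching triple. The data and the ah:/ap: identifiers are made up for illustration.

```python
# Hypothetical toy data: the triples each source document serves.
# ah:me and ap:me are made-up identifiers for the two friends.
SOURCES = {
    "umbrich.net/foaf.rdf": {("juum:me", "foaf:knows", "ah:me"), ("juum:me", "foaf:knows", "ap:me")},
    "sw.deri.org/~aidanh/": {("ah:me", "foaf:name", "Aidan Hogan")},
    "polleres.net/foaf.rdf": {("ap:me", "foaf:name", "Axel Polleres")},
}


def matches(triple, pattern):
    """A pattern position of None is a variable and matches any term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def relevant_sources(pattern):
    """Sources containing at least one triple that matches the pattern."""
    return {src for src, triples in SOURCES.items()
            if any(matches(t, pattern) for t in triples)}
```

The second pattern, ?f foaf:name ?friendLabel, is only selective once ?f is bound by the first; on its own it matches every source with a foaf:name triple.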

Page 7: Friday talk 11.02.2011

Source Selection Approaches

Index: quad store (e.g. Yars2)

[Figure: the sources are retrieved via HTTP GET into a quad store (e.g. Yars2); the query patterns juum:me foaf:knows ?f . ?f foaf:name ?friendLabel . are evaluated against the index, returning “Aidan Hogan” and “Axel Polleres”]
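
A minimal in-memory quad store shows the index-based approach: triples are crawled beforehand and stored together with their source (the context), and the query is answered from the index alone. This is an illustration, not Yars2's design; lookups are a linear scan and the data is made up.

```python
class QuadStore:
    """Minimal in-memory quad store: (subject, predicate, object, context)."""

    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, context):
        self.quads.add((s, p, o, context))

    def match(self, s=None, p=None, o=None):
        """Quads whose bound positions equal the given terms (None = variable)."""
        return [q for q in self.quads
                if (s is None or q[0] == s)
                and (p is None or q[1] == p)
                and (o is None or q[2] == o)]


def friend_labels(store, me):
    """Evaluate: me foaf:knows ?f . ?f foaf:name ?friendLabel . on the index only."""
    labels = set()
    for _, _, friend, _ in store.match(me, "foaf:knows"):
        for _, _, name, _ in store.match(friend, "foaf:name"):
            labels.add(name)
    return labels


# Simulated crawl: triples fetched earlier via HTTP GET, stored with their source.
store = QuadStore()
store.add("juum:me", "foaf:knows", "ah:me", "umbrich.net/foaf.rdf")
store.add("juum:me", "foaf:knows", "ap:me", "umbrich.net/foaf.rdf")
store.add("ah:me", "foaf:name", "Aidan Hogan", "sw.deri.org/~aidanh/")
store.add("ap:me", "foaf:name", "Axel Polleres", "polleres.net/foaf.rdf")
```

No HTTP lookup happens at query time, which makes queries fast, but answers are only as fresh as the last crawl.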

Page 8: Friday talk 11.02.2011

Source Selection Approaches

Direct execution / graph traversal [Hartig et al. 2009]

[Figure: the query patterns juum:me foaf:knows ?f . ?f foaf:name ?friendLabel . are evaluated directly on the Web: URIs are dereferenced via HTTP GET during execution, following links to the sources that provide “Aidan Hogan” and “Axel Polleres”]
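
A much simplified sketch of link-traversal execution: start from the URI in the query, dereference it, and follow the links found in the retrieved data. Hartig et al. describe an iterator-based pipeline; this sketch instead hard-codes the two-pattern example query and simulates the Web with a dictionary, so the friends' URIs and data are assumptions.

```python
from urllib.parse import urldefrag

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical toy Web: document URI -> triples served by that document.
WEB = {
    "http://umbrich.net/foaf.rdf": [
        ("http://umbrich.net/foaf.rdf#me", FOAF + "knows", "http://sw.deri.org/~aidanh/#me"),
        ("http://umbrich.net/foaf.rdf#me", FOAF + "knows", "http://polleres.net/foaf.rdf#me"),
    ],
    "http://sw.deri.org/~aidanh/": [("http://sw.deri.org/~aidanh/#me", FOAF + "name", "Aidan Hogan")],
    "http://polleres.net/foaf.rdf": [("http://polleres.net/foaf.rdf#me", FOAF + "name", "Axel Polleres")],
}


class Traversal:
    """Collects triples by dereferencing URIs while the query runs."""

    def __init__(self, web):
        self.web = web
        self.fetched = set()
        self.triples = []
        self.http_lookups = 0

    def dereference(self, uri):
        doc = urldefrag(uri).url
        if doc in self.fetched:
            return
        self.fetched.add(doc)
        self.http_lookups += 1  # stands in for one HTTP GET
        self.triples.extend(self.web.get(doc, []))

    def objects(self, s, p):
        return [o for (ts, tp, o) in self.triples if ts == s and tp == p]


def friend_labels(web, me):
    t = Traversal(web)
    t.dereference(me)  # start from the URI given in the query
    labels = set()
    for friend in t.objects(me, FOAF + "knows"):
        t.dereference(friend)  # follow a link discovered at run time
        labels.update(t.objects(friend, FOAF + "name"))
    return labels, t.http_lookups
```

Results are always current, but every query pays for its HTTP lookups, and sources not reachable by links are never found.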

Page 9: Friday talk 11.02.2011

Source Selection Approaches

Schema-Level Indices [Stuckenschmidt et al. 2004]

Data Summaries [Umbrich et al. 2010]

Inverted Indices [Heflin et al. 2010] (e.g. Sindice V1.0)

Quad Store (e.g. Yars2)

Direct execution / graph traversal [Hartig et al. 2009]

[Figure: the approaches compared by index size, query time, recall and freshness of results]
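
The difference between a schema-level index and an inverted index can be sketched on toy data. This is not the index structure of Stuckenschmidt et al. or of Sindice: here the schema-level index keeps only which predicates each source uses, while the inverted index keeps every term. The schema-level index is typically smaller but less selective.

```python
from collections import defaultdict

# Hypothetical toy data: source -> triples (ah:, ap: are made-up prefixes).
SOURCES = {
    "umbrich.net/foaf.rdf": [("juum:me", "foaf:knows", "ah:me"), ("juum:me", "foaf:knows", "ap:me")],
    "polleres.net/foaf.rdf": [("ap:me", "foaf:name", "Axel Polleres"), ("ap:me", "foaf:knows", "ah:me")],
    "sw.deri.org/~aidanh/": [("ah:me", "foaf:name", "Aidan Hogan")],
}


def build_indices(sources):
    schema = defaultdict(set)    # schema level: predicate -> sources using it
    inverted = defaultdict(set)  # instance level: any term -> sources mentioning it
    for src, triples in sources.items():
        for s, p, o in triples:
            schema[p].add(src)
            for term in (s, p, o):
                inverted[term].add(src)
    return schema, inverted


def select_schema(pattern, schema, all_sources):
    """Only the predicate is used; an unbound predicate selects every source."""
    p = pattern[1]
    return set(schema.get(p, set())) if p is not None else set(all_sources)


def select_inverted(pattern, inverted, all_sources):
    """Intersect the source sets of all bound terms (None = variable)."""
    result = set(all_sources)
    for term in pattern:
        if term is not None:
            result &= inverted.get(term, set())
    return result
```

For juum:me foaf:knows ?f, the schema-level index also returns polleres.net/foaf.rdf (it uses foaf:knows), whereas the inverted index returns only the source mentioning juum:me.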

Page 10: Friday talk 11.02.2011

Approximate Data Summaries

Combined description of schema level and instance level

Use approximation to reduce index size (incurs false positives)

The index grows only with the number of sources

[Figure: hash-based data summaries, i.e. triples mapped into a multidimensional numerical data space]

Page 11: Friday talk 11.02.2011

Hash-based Data Summaries

① Input: triple + source information, e.g. juum:me foaf:knows ah:ah <http…foaf.rdf>
② Hash the triple, e.g. [24, 5, 2] <http…foaf.rdf>
③ Insert the hash-triple into the data space and store the source information with the buckets, e.g. INS([24, 5, 2], http…foaf.rdf)
④ Query for relevant sources, e.g. QUERY(juum:me ?p ?o) -> (24, ?, ?)

[Figure: equi-width histogram over the hashed data space (dimensions s and o shown, ticks at 10, 20, 30)]
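
Steps ② to ④ can be sketched with an equi-width histogram. The hash function (MD5 reduced modulo 30), the dimension size of 30 and the bucket width of 10 are assumptions chosen to match the axis ticks; they will not reproduce the slide's example coordinates [24, 5, 2]. The source URI in the usage example is a placeholder.

```python
import hashlib
from collections import defaultdict

DIM_SIZE = 30      # assumed size of every hash dimension (values 0..29)
BUCKET_WIDTH = 10  # assumed equi-width bucket size, giving 3 x 3 x 3 buckets


def hash_term(term: str) -> int:
    """Hypothetical hash function mapping an RDF term to 0..DIM_SIZE-1."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % DIM_SIZE


class EquiWidthHistogram:
    def __init__(self):
        self.buckets = defaultdict(set)  # bucket index (i, j, k) -> source URIs

    def insert(self, triple, source):
        point = tuple(hash_term(t) for t in triple)        # step 2: hash the triple
        bucket = tuple(c // BUCKET_WIDTH for c in point)
        self.buckets[bucket].add(source)                   # step 3: source stored with bucket

    def query(self, pattern):
        """Step 4: None is a variable and matches every bucket in that dimension."""
        wanted = [None if t is None else hash_term(t) // BUCKET_WIDTH for t in pattern]
        sources = set()
        for bucket, bucket_sources in self.buckets.items():
            if all(w is None or w == b for w, b in zip(wanted, bucket)):
                sources |= bucket_sources
        return sources
```

Because different terms can hash into the same bucket, `query` may return sources without any matching triple (false positives), but it never misses a source that has one.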

Page 12: Friday talk 11.02.2011

QTree: Efficient source selection

[Figure: equi-width histogram vs. QTree over the same data space]

Combination of histograms and R-trees, inheriting the benefits of both data structures; optimal for sparse data

Buckets store a cardinality and a set of sources, enabling top-k source ranking, e.g. R1,1 ( 1: { http://…/foaf.rdf } )
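
A flat sketch of the ranking step, assuming buckets with bounding boxes (as in an R-tree) that store a per-source triple count. It omits the QTree's hierarchy and its bucket splitting and merging, so it only shows how bucket cardinalities turn into a top-k source ranking. The bucket contents and the source names A, B and C are made up.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """A bounding box, as in an R-tree node, plus per-source cardinalities."""
    low: tuple    # minimum corner (inclusive)
    high: tuple   # maximum corner (inclusive)
    counts: Counter = field(default_factory=Counter)  # source -> number of triples

    def overlaps(self, q_low, q_high):
        return all(lo <= qhi and qlo <= hi
                   for lo, hi, qlo, qhi in zip(self.low, self.high, q_low, q_high))


def top_k_sources(buckets, q_low, q_high, k):
    """Rank sources by their estimated number of triples in the query region."""
    scores = Counter()
    for bucket in buckets:
        if bucket.overlaps(q_low, q_high):
            scores.update(bucket.counts)
    return [source for source, _ in scores.most_common(k)]
```

A pattern such as (24, ?, ?) becomes the query box from (24, 0, 0) to (24, 29, 29); buckets outside that slice do not contribute.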

Page 13: Friday talk 11.02.2011

Evaluation: Source Selection

J. Umbrich, K. Hose, M. Karnstedt, A. Harth, A. Polleres. "Comparing Data Summaries for Processing Live Queries over Linked Data." In WWW Journal, Special Issue "Querying the Data Web", 2011.

Page 14: Friday talk 11.02.2011

Source Selection Approaches

Schema-Level Indices [Stuckenschmidt et al. 2004]

Data Summaries [Umbrich et al. 2010]

Inverted Indices [Heflin et al. 2010] (e.g. Sindice V1.0)

Quad Store (e.g. Yars2)

Direct execution / graph traversal [Hartig et al. 2009]

[Figure: the approaches compared by index size, query time, recall and freshness of results]

Page 15: Friday talk 11.02.2011

Querying in the Linked Data Space

millions of diverse but often interrelated data sources

“data everywhere” on the Web

no complete control over the data

[Figure: the spectrum from static (crawl, index) to dynamic (live distributed querying, QP), with "Combined Query of RDF stores and the Linked Data Web" highlighted]

Page 16: Friday talk 11.02.2011

Improved Query Time & Fresh Results

[Figure: query time over the number of query executions for live querying, index querying and combined querying, where combined querying learns about source dynamics]

Combined querying: decrease query time by avoiding unnecessary HTTP lookups while still returning fresh results

Page 17: Friday talk 11.02.2011

Current Research Question

How to combine querying over RDF stores and the Linked Data Web?

Page 18: Friday talk 11.02.2011

Combined Query Processing

Live results on top of SPARQL stores (e.g. Yars2, Virtuoso)

Integrate knowledge about the dynamics of sources into the query processor, so that it can decide at query time whether to access the static store or the Web resources

[Figure: a query enters the query processor, which uses source selection and knowledge about source dynamics to combine the SPARQL index (Yars2, Virtuoso) with the Linked Data Web and return live results]
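
One way to make the query-time decision concrete is sketched below. It is not the approach investigated in this work: it swaps in a standard Poisson model of change, under which a source with mean change interval T, indexed Δ days ago, has changed with probability 1 − exp(−Δ/T). `SourceStats`, `prob_changed`, `plan` and the 0.5 threshold are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SourceStats:
    last_indexed: float          # day on which the store last crawled this source
    mean_change_interval: float  # learned estimate in days; math.inf for static sources


def prob_changed(stats: SourceStats, now: float) -> float:
    """P(source changed since it was indexed), assuming Poisson-distributed changes."""
    if math.isinf(stats.mean_change_interval):
        return 0.0
    elapsed = now - stats.last_indexed
    return 1.0 - math.exp(-elapsed / stats.mean_change_interval)


def plan(sources: dict, now: float, threshold: float = 0.5):
    """Split relevant sources into (dereference live, answer from the store)."""
    live = {s for s, st in sources.items() if prob_changed(st, now) > threshold}
    return live, set(sources) - live
```

A GPS-sensor document that changes daily is fetched live a week after indexing, while a static source and a recently indexed weekly source are answered from the store.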

Page 19: Friday talk 11.02.2011

Mining Dynamic/Static Patterns

Goal: acquire knowledge about dynamic patterns (e.g. geo:lat, geo:long), considering the context of a node (e.g. the location value of a city vs. the location value of a GPS sensor)

Dynamics based on two datasets (started in March 2010):

Daily 3-hop neighborhood crawls from 20 seed URIs

Weekly snapshots over ~10 months of a 10% sample from a billion-triple crawl (fixed URI list, containing ~2K Web vocabularies)

Learn to predict change events
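
A minimal sketch of context-dependent change mining: count, per (context, predicate), how often a value changed between consecutive snapshots, then predict a change event when the learned rate exceeds a threshold. Using the subject's class as its "context", the 0.5 threshold and the ex: identifiers are assumptions for illustration.

```python
from collections import defaultdict


def change_rates(snapshots, context):
    """Fraction of consecutive snapshot pairs with a changed value, per (context, predicate).

    snapshots: list of dicts mapping (subject, predicate) -> object, one per crawl.
    context:   subject -> context label, e.g. its class (an assumed notion of context).
    """
    changed = defaultdict(int)
    observed = defaultdict(int)
    for before, after in zip(snapshots, snapshots[1:]):
        for key in before.keys() & after.keys():
            subject, predicate = key
            ctx = (context.get(subject, "unknown"), predicate)
            observed[ctx] += 1
            if before[key] != after[key]:
                changed[ctx] += 1
    return {ctx: changed[ctx] / observed[ctx] for ctx in observed}


def predict_change(rates, subject_context, predicate, threshold=0.5):
    """Predict a change event if the learned rate for this context exceeds the threshold."""
    return rates.get((subject_context, predicate), 0.0) > threshold
```

The same predicate, geo:lat, then gets different predictions for a city and for a GPS sensor.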

Page 20: Friday talk 11.02.2011

Query Processor

Collaboration with Yuan (APEXLAB)

Elaboration on how dynamic query planning can support data-access decisions by taking dynamic patterns into account

Investigation of one of the possible approaches

Page 21: Friday talk 11.02.2011

Evaluation

Based on simulation using our dynamics-mining dataset

Based on real-world data (Linked Stream Data effort), using the knowledge gathered from our dynamics mining

Evaluation criteria: query time (number of HTTP lookups), result freshness, recall (number of results)
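
The three criteria can be scored per query run as sketched below. Measuring freshness and recall against the answers of a fully live query is an assumption about how the evaluation could be set up, not a definition from this work.

```python
def evaluate(returned: set, live_answers: set, http_lookups: int) -> dict:
    """Score one query run against the answers from querying all sources live."""
    correct = returned & live_answers
    return {
        "query_time_http_lookups": http_lookups,
        # share of returned results that are still current
        "freshness": len(correct) / len(returned) if returned else 1.0,
        "result_count": len(returned),
        # share of current answers that were returned
        "recall": len(correct) / len(live_answers) if live_answers else 1.0,
    }
```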

Page 22: Friday talk 11.02.2011

How to combine querying over RDF stores and the Linked Data Web to return fresh results?

[Figure: a query enters the query processor, which uses source selection, knowledge about source dynamics and the SPARQL index to return live results]

Questions?

Page 23: Friday talk 11.02.2011

Literature

[Hartig 2009] O. Hartig, Ch. Bizer, and J.-Ch. Freytag. Executing SPARQL Queries over the Web of Linked Data. In ISWC’09, 2009.

[Stuckenschmidt] H. Stuckenschmidt, R. Vdovjak, J. Broekstra, and G.-J. Houben. Towards distributed processing of RDF path queries. JWET, 2(2/3):207–230, 2005.

[Umbrich 2010] J. Umbrich, M. Hausenblas, A. Hogan, A. Polleres, and S. Decker. Towards Understanding Dataset Dynamics: Change Frequency of Linked Data Sources. LDOW 2010 at WWW 2010, 2010.

[Heflin 2010] Y. Li and J. Heflin. Using Reformulation Trees to Optimize Queries over Distributed Heterogeneous Sources. In Proceedings of the 9th International Semantic Web Conference (ISWC2010), 2010.