waled almukawi

28
1 Evaluating Conjunctive Triple Pattern Queries over Large Structured Overlay Networks Erietta Liarou, Stratos Idreos, and Manolis Koubarakis 2006. Waled Almukawi. Albert-Ludwigs-Universität Fr eiburg Rechnernetze und Telematik

Upload: jin

Post on 07-Jan-2016

33 views

Category:

Documents


0 download

DESCRIPTION

Albert-Ludwigs-Universität Freiburg Rechnernetze und Telematik. Evaluating Conjunctive Triple Pattern Queries over Large Structured Overlay Networks Erietta Liarou, Stratos Idreos, and Manolis Koubarakis 2006. Waled Almukawi. Outline. Motivation Terminologies Conjunctive Queries RDF - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Waled Almukawi

1

Evaluating Conjunctive Triple Pattern Queries over Large Structured

Overlay Networks

Erietta Liarou, Stratos Idreos, and Manolis Koubarakis 2006.

Waled Almukawi.

Albert-Ludwigs-Universität Freiburg

Rechnernetze und Telematik

Page 2: Waled Almukawi

2

Outline

Motivation Terminologies

Conjunctive Queries

RDF

DHT

System Model & Data Model Query Chain Algorithm Spread By Value Algorithm Experiments Future Works References

Page 3: Waled Almukawi

3

Motivation

One of the most challenging and interesting open

problems in the frontiers of P2P and Semantic Web is how

to evaluate queries expressed in RDF query languages on

top of P2P networks .

Previous Rdfpeers used only atomic formula or disjoint

RDF triple pattern.

The conjunctive queries are interesting, since they provide

a flexible methods that can deal effectively with a full

functionality of existing RDF query .

Page 4: Waled Almukawi

4

Conjunctive Queries

The conjunctive queries are simply the fragment of first

order logic given by the set of atomic formula using

conjunction Λ .

A conjunctive query q formula :

?x1, . . . , ?xn : (s1, p1, o1) ^ (s2, p2, o2) ^ · · · ^ (sn, pn, on)

where ?x1, . . . , ?xn are answer variables, each (si, pi, oi)

is a triple pattern, and each variable ?xi appears in at least

one triple pattern (si, pi, oi).

Page 5: Waled Almukawi

Conjunctive queries (2)

Example:

Q:ans(x,y):- country(Id,x),capital(y,x ),continent(x,“Europa“)

Q‘:ans(x,y):-(?x,“type“,“country“),(?x,“has-capital“,?y),

(?x, continent,“Europa“).

5

Page 6: Waled Almukawi

6

Resource Description Framework (RDF) RDF is a framework for describing Web resources, such as

the title, author, content …e.t.c.

RDF Statements is The combination of a Resource, a

Property, and a Property value forms a Statement (known

as the subject, predicate and object of a Statement).

SubjectSubject ObjectObject

Uris, blank node

Predicate

Uris, Literals, Blank nodes

Page 7: Waled Almukawi

7

Distriuted Hash Tables (DHT) DHTs are an important class of P2P networks that offer

distributed hash table functionality, and allow one to

develop scalable, robust and fault-tolerant distributed

applications.

Distribute data over a large P2P network

Can Quickly find any given item ni Log N nodes.

Look up for an Item in the DHT is performed in O(Log N)

nodes.

Page 8: Waled Almukawi

System Model ( ref:[4] )

10N11N42+32

42N42N42+0keyssucc.distan

5850464443

N63N50N48N48N48

N42+4N42+2N42+1

N42+16N42+8

finger table

N27

N11

N2

N56

N50

N17

N63

N48

N6

N33

N42

27N27N11+16

11N11N11+0keyssucc.distan

43

19151312

N17N11+4N17N11+2N17N11+1

N48N11+32

N27N11+8

finger table

28N33N27+127N27N27+0

keyssucc.distan

5943353129

N63N48N42N33N33

N27+4N27+2

N27+16N27+32

N27+8

finger table

Key 28Key 28

N42N42

1010N11N11N42+32N42+32

N11N112727N27N27N11+16N11+16

N27N27 2828N33N33N27+1N27+1

N33N33

Page 9: Waled Almukawi

9

Data Model

Each network node is able to describe in RDF the resources that it wants to make available to the rest of the network .

It creates and inserts the RDF triples

( Subject, Predicate, Object ). The subject of a triple identifies the resource that the statement

is about. The predicate identifies a property or a characteristic of the

subject. The object identifies the value of the property.

Page 10: Waled Almukawi

10

Query Chain Algorithm The main idea of QC is that the query is evaluated by a chain

of nodes.

Intermediate results flow through the nodes of this chain

Finally the last node in the chain delivers the result back to

the node that submitted the query.

Distribute the Load Balance by

Split the query into the triple patterns that it consists of.

Evaluate each triple pattern separately to a deferent node.

Intermediate result flow through these nodes as triple and

partially satisfy the query.

Page 11: Waled Almukawi

Query Chain Algorithm

11

N42

N27

N11

N2

N56

N50

XX

N63

N48

N6

N33

T1=Hash(S)

T2=Hash(P)

T3=Hash(O)

r1r1

r2r2

r3r3

Indexing a new triple

Page 12: Waled Almukawi

Query Chain Algorithm

12

N42

N27

N11

N2

N56

N50

XX

N63

N48

N6

N33

T1=Hash(p1)

T2=Hash(p2)

T3=Hash(o3)

r1r1

r2r2

r3r3

Indexing a new triple

Page 13: Waled Almukawi

13

Query Chain Algorithm Evaluating a query

q is ?x:(?x,p1,?y),

(?y,p2,?z),

(?z,p3,o3).

q is ?x:(?x,p1,?y),

(?y,p2,?z),

(?z,p3,o3).

xx

r1r1

R1 searches in its TT

and evaluates t1

R1 searches in its TT

and evaluates t1 r1r1

r2r2 QEval(q,2,R’,ip(x)) QEval(q,2,R’,ip(x)) QEval (q,i,R’,ip(x)) QEval (q,i,R’,ip(x))

L={s1,s2}

R‘={s1,s2}

L={s1,s2}

R‘={s1,s2}

S1,p1,s2S1,p1,s2

Page 14: Waled Almukawi

14

Query Chain Algorithm Evaluating a query

R2 searches in its TT and evaluates

t2 =(?y,p2,?z)

R2 searches in its TT and evaluates

t2 =(?y,p2,?z)

r2r2

r3r3

R3 searches in its TT

and evaluates t3

(?z,p3,o3)

R3 searches in its TT

and evaluates t3

(?z,p3,o3)

r3r3

XX Delivers answer

answer (q,R’)

Delivers answer

answer (q,R’) QEval (q,3,R’,ip(x)) QEval (q,3,R’,ip(x))

S2,p2,s3S2,p2,s3

L={s2,s3},R‘={s1,s3}L={s2,s3},R‘={s1,s3}

L={s3},R‘={s1}L={s3},R‘={s1}

S3,p3,s3S3,p3,s3

Page 15: Waled Almukawi

Query chain algorithms Local processing at each chain node

15

nn

QEval (q,i,R,ip(x))

L = π X ( σF (TT) )

e.g : q1=(?s1,p1,o1)

L=π 1(σ2=p1Λ3=o1(TT) )

R’=π Y(R∞ L )

n’n’

QEval (q, i+1, R’, ip(x))

nknk

QEval (q,k, R’, ip(x))

R’=π Y(R ∞ π X ( σF (TT) )

Answer (q,R’)

Page 16: Waled Almukawi

16

Spread By Value Algorithm

In SBV the triple pattern t will be stored at the successor nodes of the identifiers Hash(s), Hash(p), Hash(o), Hash(s + p), Hash(s + o), Hash(p + o) and Hash(s + p + o).

SBV exploits the values of matching triples found while processing the query incrementally.

Page 17: Waled Almukawi

SBV algorithm

17

N42

N27

N11

N2

N56

N50

XX

N63

N48

N6

N33

T1=Hash(p1)

T2=Hash(s2+p2)

T3=Hash(s3+o3)

r1r1

r2r2

r3r3

Indexing a new triple

T4=Hash(p1)

Page 18: Waled Almukawi

18

SBV Algorithm Evaluating a query

q is ?x:(?x,p1,?y),

(?y,p2,?z),

(?z,p3,o3).

q is ?x:(?x,p1,?y),

(?y,p2,?z),

(?z,p3,o3).

xx

r1r1 QEval(q, V, u, IP(x)),

r1 evaluates (?x,p1,?y)

QEval(q, V, u, IP(x)),

r1 evaluates (?x,p1,?y)

Page 19: Waled Almukawi

SBV Algorithm Evaluating a query

R1 searches in its TT and finds

v ’a={?x=s1,y=s2},v’b={?x=s1,y=o1}

q’a=(s2p2,?z)(?z,p3,o3)

q’b=(o1p2,?z)(?z,p3,o3)

R1 searches in its TT and finds

v ’a={?x=s1,y=s2},v’b={?x=s1,y=o1}

q’a=(s2p2,?z)(?z,p3,o3)

q’b=(o1p2,?z)(?z,p3,o3)

r1r1

r2r2

S (H

(p2+01))

QEval(q’a,V,v’a,ip(x)) QEval(q’a,V,v’a,ip(x))

r4r4

QEval(q’b,V,v’b,ip(x)) QEval(q’b,V,v’b,ip(x))

(?x,p1,?y) ,(?y,p2,z3)

t1=(s1,p1,s2)

t4=(s1,p1,o1)

(?x,p1,?y) ,(?y,p2,z3)

t1=(s1,p1,s2)

t4=(s1,p1,o1)

Page 20: Waled Almukawi

SBV Algorithm Evaluating a query

Each node searches in its TT for matching

triples (?z,p3,o3 )

v’’a={?x=s1,?y=s2,?z=s3} q’’a=(s3,p3,o3)

Each node searches in its TT for matching

triples (?z,p3,o3 )

v’’a={?x=s1,?y=s2,?z=s3} q’’a=(s3,p3,o3)

r2r2

r3r3

QEval (q’’a,V,v’’a,ip(x)) QEval (q’’a,V,v’’a,ip(x))

xxIP(x)

v’’’a={?x=s1,?y=s2,?z=s3}

Delivers answer Answer (q,V)

v’’’a={?x=s1,?y=s2,?z=s3}

Delivers answer Answer (q,V)

s2,p2,s3s2,p2,s3

s3,p3,s3s3,p3,s3

Page 21: Waled Almukawi

Comparing the query chains in QC and SBV

21

q is ?x : (s1,p1,?x),(?x,p2,?y),(?y,p3,o3).

?x has four matching values for (s1,p1,?x) .

?y has three matching values for each value of ?x .

QC SBV

Page 22: Waled Almukawi

Optimizing network traffic

IP cache (IPC) used by our algorithms to significantly

reduce network traffic.

Another effect of IPC, is that we reduce the routing load

incurred by nodes in the network.

22

Page 23: Waled Almukawi

Experiments

A simulator for chord network with java code was done.

Our metrics are:

1) The amount of network traffic that is created.

2) how well the query processing load and storage load

are distribution among the network nodes.

23

Page 24: Waled Almukawi

24

Experiments

Some Experimental evaluating of the two algorithms ,and the result are showed in the following figures

(b) IPC effect(a) Traffic cost

Page 25: Waled Almukawi

Experiments Query processing and storage load distribution

25

Page 26: Waled Almukawi

Future work

Order of triple choosed to evaluate the

query.

RDFpeers reasoning.

26

Page 27: Waled Almukawi

27

Reference

1) E. Liarou, S. Idreos, and M. Koubarakis. Evaluating Conjunctive Triple Pattern Queries over Large Structured Overlay Networks. In I. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, and L. Aroyo.

2) E. Liarou, S. Idreos, and M. Koubarakis. Publish-Subscribe with RDF Data over Large Structured ,Overlay Networks. In DBISP2P ’05.

3) Book : Foundations of Databases, Addison-Wesley,

1995. ISBN 0-201-53771-0.

4) I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord:

A Scalable peer-to-peer lookup service for internet applications.

5) E. Prud’hommeaux and A. Seaborn. SPARQL Query Language for RDF.

http://www.w3.org/TR/rdf-sparql-query/, 2005.

Page 28: Waled Almukawi

Thank you for your Attention.

28