Efficient Query Answering against Dynamic RDF Databases

François Goasdoué, Ioana Manolescu, Alexandra Roatis

Université Paris-Sud & Inria Saclay (OAK project)

20 March 2013

Overview

EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 2 / 35

The Resource Description Framework

Basic Graph Pattern Queries

Contributions

Experiments

Related Work

Conclusion

The Resource Description Framework

The Resource Description Framework (RDF)

⊲ graph-based data model
⊲ W3C standard

RDF Graph:

⊲ set of triples: s p o, with s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L

U – URIs, L – literals (constants), B – blank nodes

the subject s has the property p with the value: the object o

⊲ built-in property: rdf:type

specify to which classes a resource belongs

Constructor          Triple          Relational notation
Class assertion      s rdf:type o    o(s)
Property assertion   s p o           p(s, o)
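The mapping from triples to relational notation in the table above can be sketched in a few lines of Python. The `Triple` type and the `to_relational` helper are hypothetical names introduced only for this illustration; they do not come from the slides.

```python
from typing import NamedTuple

RDF_TYPE = "rdf:type"


class Triple(NamedTuple):
    """A triple s p o (illustrative type, not from the slides)."""
    s: str
    p: str
    o: str


def to_relational(t: Triple) -> str:
    """Render a triple in the relational notation of the table."""
    if t.p == RDF_TYPE:
        # Class assertion: s rdf:type o is written o(s).
        return f"{t.o}({t.s})"
    # Property assertion: s p o is written p(s, o).
    return f"{t.p}({t.s}, {t.o})"


print(to_relational(Triple("book1", RDF_TYPE, "Book")))
print(to_relational(Triple("book1", "writtenIn", "English")))
```

The first call illustrates a class assertion and the second a property assertion.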

Blank nodes

⊲ feature of RDF
⊲ support unknown URI/literal tokens

Example:

the country of _:b1 is Italy
the city of the same _:b1 is Genoa

the population of Genoa is an unspecified value _:b2

Running example

[Figure: RDF graph of the running example. Node labels: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, _:b1, Language, writtenIn, hasLanguage, Publication. Edge labels: hasTitle, hasAuthor (×2), rdf:type (×2), translatedTo, writtenIn, rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]

RDF Schema (RDFS)

⊲ feature of RDF
⊲ enhance the descriptions in graphs
⊲ declare semantic constraints between classes and properties

Built-in properties:

⊲ subclass relationships: rdfs:subClassOf
⊲ subproperty relationships: rdfs:subPropertyOf
⊲ typing the first attribute (domain) of a property: rdfs:domain
⊲ typing the second attribute (range) of a property: rdfs:range

Constructor                Triple                   Relational notation
Subclass constraint        s rdfs:subClassOf o      s ⊆ o
Subproperty constraint     s rdfs:subPropertyOf o   s ⊆ o
Domain typing constraint   s rdfs:domain o          Π_domain(s) ⊆ o
Range typing constraint    s rdfs:range o           Π_range(s) ⊆ o

Open-world assumption and RDF entailment

The RDF data model is based on the open-world assumption.
→ deductive constraints – implicitly propagate tuples

Implicit triples → considered part of the graph – not explicitly present

Entailment – reasoning mechanism
set of explicit triples & some entailment rules
derive implicit information

Exhaustive application of entailment rules → saturation (a.k.a. closure)

The saturation of a graph is unique (up to blank node renaming).

Entailment is part of the RDF specification itself.

The semantics of an RDF graph is its saturation.

Entailment rules by example

1) book1 rdf:type Book, Book rdfs:subClassOf Publication
   → book1 rdf:type Publication

2) book1 writtenIn English, writtenIn rdfs:subPropertyOf hasLanguage
   → book1 hasLanguage English

3) book1 writtenIn English, writtenIn rdfs:domain Book
   → book1 rdf:type Book

4) book1 writtenIn English, writtenIn rdfs:range Language
   → English rdf:type Language
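The four rules illustrated above can be applied exhaustively with a naive fixpoint loop. The sketch below is only an illustration under simplifying assumptions: triples are plain Python tuples, the function name `saturate` is ours, and only the four instance-level rules shown on this slide are implemented (no schema-level closure). It makes no claim to match the authors' implementation.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def saturate(triples):
    """Apply the four rules of the slide until no new triple appears."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            for (s2, p2, o2) in graph:
                # Rule 1: s rdf:type c, c rdfs:subClassOf c2 -> s rdf:type c2
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))
                # Rule 2: s p o, p rdfs:subPropertyOf p2 -> s p2 o
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
                # Rule 3: s p o, p rdfs:domain c -> s rdf:type c
                if p2 == DOMAIN and s2 == p:
                    new.add((s, RDF_TYPE, o2))
                # Rule 4: s p o, p rdfs:range c -> o rdf:type c
                if p2 == RANGE and s2 == p:
                    new.add((o, RDF_TYPE, o2))
        if new <= graph:
            return graph  # fixpoint reached
        graph |= new


db = {
    ("book1", RDF_TYPE, "Book"),
    ("book1", "writtenIn", "English"),
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
implicit = saturate(db) - db
for t in sorted(implicit):
    print(t)
```

The nested loop is quadratic per iteration, which is fine for a toy graph; a real system would index triples by property.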

Basic Graph Pattern Queries

Basic Graph Pattern (BGP) Queries

⊲ subset of SPARQL

⊲ BGP – conjunction of triple patterns (or triples)

q(x) :- t_1, ..., t_α

t_i = s_i p_i o_i, with s_i, p_i ∈ U ∪ B ∪ V, o_i ∈ U ∪ B ∪ V ∪ L

x ∈ V (distinguished variables)

query evaluation treats blank nodes in a query as non-distinguished variables

Example:

q(x, y) :- x hasAuthor z, x rdf:type y
≡
q(x, y) :- x hasAuthor _:b0, x rdf:type y
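A small Python sketch can make the equivalence above concrete. It evaluates a BGP over the explicit triples of a graph and treats blank nodes in the query like variables. The `?x` spelling for variables, the `evaluate` function and the sample graph are assumptions made for this illustration only.

```python
def is_var(term):
    # Assumed convention: variables start with "?". Blank nodes in the
    # query ("_:...") are treated like (non-distinguished) variables.
    return term.startswith("?") or term.startswith("_:")


def evaluate(head, body, graph):
    """Evaluate q(head) :- body against the explicit triples of graph."""
    bindings = [{}]
    for pattern in body:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = dict(binding)
                matches = True
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        # Bind the variable, or check its earlier binding.
                        if candidate.setdefault(term, value) != value:
                            matches = False
                            break
                    elif term != value:
                        matches = False
                        break
                if matches:
                    extended.append(candidate)
        bindings = extended
    # Project on the distinguished variables.
    return {tuple(b[v] for v in head) for b in bindings}


graph = {
    ("book1", "hasAuthor", "Neil Gaiman"),
    ("book1", "hasAuthor", "Terry Pratchett"),
    ("book1", "rdf:type", "Book"),
}
with_variable = evaluate(("?x", "?y"),
                         [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")],
                         graph)
with_blank = evaluate(("?x", "?y"),
                      [("?x", "hasAuthor", "_:b0"), ("?x", "rdf:type", "?y")],
                      graph)
print(with_variable == with_blank)
```

Both formulations return the same answers here because `_:b0` only needs to match some author, exactly like `z`.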

Query answering

Problem:

query evaluation ≠ query answering

the evaluation of a query only uses the graph’s explicit triples, so it may lead to an incomplete answer set

the (complete) answer set is obtained by evaluating the query against the graph’s saturation

Solution:

decouple RDF entailment from query evaluation

Perform a pre-processing step to deal with entailed triples:

⊲ on the database – data saturation
⊲ on the queries – query reformulation

Data saturation vs. Query reformulation

Data saturation

Advantages:

⊲ straightforward
⊲ easy to implement

Drawbacks:

⊲ computation time
⊲ additional storage space
⊲ must be recomputed upon database updates

Example:

the YAGO2 dataset doubles in size when computing the RDFS-closure → 33M to 64M triples

Query reformulation

Advantages:

⊲ database saturation does not need to be (re)computed

Drawbacks:

⊲ every incoming query must be reformulated
⊲ reformulations can be prohibitively large
⊲ difficult to optimize

Example:

a single-atom query over YAGO2 can yield a union of > 300 000 queries

Contributions

Contributions

1. The database (DB) fragment of RDF, extending previously studied fragments by the support of blank nodes

2. Novel BGP query answering techniques for this DB fragment, designed to work on top of any standard conjunctive query processor

(i) an efficient incremental RDF saturation maintenance algorithm

(ii) a novel reformulation-based query answering algorithm

3. Thorough performance comparison and analysis

The database (DB) fragment of RDF

⊲ restricts entailment to RDFS entailment
⊲ does not restrict graphs in any way

An RDF database: db = ⟨D, S⟩

D & S – disjoint sets of triples
D (RDF) – instance level → assertions
S (RDFS) – schema level → semantics

db = ⟨D, S⟩ for the running example:

[Figure: D, the instance-level graph. Node labels: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, Language. Edge labels: hasTitle, hasAuthor (×2), rdf:type (×2), translatedTo, writtenIn.]

[Figure: S, the schema-level graph. Node labels: Book, _:b1, Language, writtenIn, hasLanguage, Publication. Edge labels: rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]

Query reformulation algorithm

Reformulate(q, db)

⊲ fixpoint algorithm (13 reformulation rules)
⊲ reformulates q into a set of queries s.t. the union of the evaluations of these queries on db produces the correct answer

[Figure: schema of the running example. Node labels: Book, _:b1, Language, writtenIn, hasLanguage, Publication. Edge labels: rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]

Reformulate(q, db) =

q(x, y) :- x rdf:type y
∪ q(x, Publication) :- x rdf:type Publication
∪ q(x, Publication) :- x rdf:type Book
∪ q(x, Publication) :- x writtenIn z
∪ ...
∪ q(x, _:b1) :- x rdf:type _:b1
∪ ...

Query reformulation algorithm

Reformulate(q, db)

⊲ fixpoint algorithm (13 reformulation rules)
⊲ reformulates q into a set of queries s.t. the union of the non-standard evaluations of these queries on db produces the correct answer
⊲ size of the output: $O((6 \cdot \#db^2)^{\#q})$

[Figure: schema of the running example (node labels Book, _:b1, Language, writtenIn, hasLanguage, Publication; edge labels rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf), together with the nodes English and book1 and two rdf:type edges.]

With standard evaluation, the blank node _:b1 in the query body is treated as a non-distinguished variable:

q(x, _:b1) :- x rdf:type _:b1
≡
q(x, _:b1) :- x rdf:type z

Answer set: {⟨book1, _:b1⟩, ⟨English, _:b1⟩} – wrong answer

With non-standard evaluation, the two queries are no longer equivalent:

q(x, _:b1) :- x rdf:type _:b1
≢
q(x, _:b1) :- x rdf:type z

Answer set: {⟨book1, _:b1⟩} – correct answer
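To give a feel for what a reformulation looks like, here is a deliberately restricted Python sketch. It handles only one query shape, q(x) :- x rdf:type C with a constant class C, and it is not the 13-rule algorithm of the talk; the function name `reformulate_type_atom` and the tuple encoding of atoms are assumptions of this illustration. Blank nodes and their non-standard evaluation are left out.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def below(start, relation, schema):
    """All terms reaching some element of `start` through `relation` edges."""
    found, todo = set(start), list(start)
    while todo:
        current = todo.pop()
        for (s, p, o) in schema:
            if p == relation and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def reformulate_type_atom(cls, schema):
    """Union of single-atom queries equivalent to q(x) :- x rdf:type cls."""
    classes = below({cls}, SUBCLASS, schema)
    union = {("?x", RDF_TYPE, c) for c in classes}
    # Properties whose domain (resp. range) is cls or one of its subclasses,
    # extended with their subproperties.
    dom = {s for (s, p, o) in schema if p == DOMAIN and o in classes}
    rng = {s for (s, p, o) in schema if p == RANGE and o in classes}
    union |= {("?x", prop, "?z") for prop in below(dom, SUBPROP, schema)}
    union |= {("?z", prop, "?x") for prop in below(rng, SUBPROP, schema)}
    return union


schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
for atom in sorted(reformulate_type_atom("Publication", schema)):
    print("q(x) :-", *atom)
```

For the class Publication this produces the three atoms that also appear in the union shown on the slide (rdf:type Publication, rdf:type Book, and writtenIn).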

Database saturation algorithm

Saturate(db)

⊲ fixpoint algorithm (4 saturation rules)
⊲ explicitly adds to db all its implicit triples
⊲ size of the output: $O(\#db^2)$
⊲ computation time: $O(\#db^3)$

Saturate(db) = db ∪ [Figure: the implicit triples of the running example, over the nodes book1, Language, Publication, _:b1 and English, with three rdf:type edges and one hasLanguage edge.]

What about updates?

Saturation maintenance algorithm

Saturate+(db)

⊲ multiset variant of Saturate(db)
⊲ allows saturation maintenance upon updates

Saturate+(db) = db ∪ [Figure: the derived triples of the running example, over the nodes Book, book1, Language, Publication, _:b1 and English, with four rdf:type edges and one hasLanguage edge.]
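The idea of a multiset saturation can be illustrated with derivation counts: every triple keeps a counter of how many ways it is obtained, so a deletion only removes a triple once its counter drops to zero. The sketch below is a simplified illustration, not the authors' Saturate+ algorithm. It assumes the schema is already closed (for example, writtenIn rdfs:domain Publication is listed explicitly), supports only instance-level updates, and uses hypothetical names (`SatM` class, `consequences` helper).

```python
from collections import Counter

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def consequences(triple, schema):
    """One-step consequences of an explicit triple (schema assumed closed)."""
    s, p, o = triple
    out = []
    for (s2, p2, o2) in schema:
        if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
            out.append((s, RDF_TYPE, o2))
        elif s2 == p:
            if p2 == SUBPROP:
                out.append((s, o2, o))
            elif p2 == DOMAIN:
                out.append((s, RDF_TYPE, o2))
            elif p2 == RANGE:
                out.append((o, RDF_TYPE, o2))
    return out


class SatM:
    """Saturated instance kept as a multiset: triple -> derivation count."""

    def __init__(self, schema):
        self.schema = set(schema)
        self.explicit = set()
        self.count = Counter()

    def insert(self, triple):
        if triple in self.explicit:
            return
        self.explicit.add(triple)
        for t in [triple] + consequences(triple, self.schema):
            self.count[t] += 1

    def delete(self, triple):
        if triple not in self.explicit:
            return
        self.explicit.remove(triple)
        for t in [triple] + consequences(triple, self.schema):
            self.count[t] -= 1
            if self.count[t] == 0:
                del self.count[t]  # no derivation left

    def triples(self):
        return set(self.count)


closed_schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", DOMAIN, "Publication"),  # added by schema closure
    ("writtenIn", RANGE, "Language"),
}
sat = SatM(closed_schema)
sat.insert(("book1", RDF_TYPE, "Book"))
sat.insert(("book1", "writtenIn", "English"))
sat.delete(("book1", RDF_TYPE, "Book"))
print(("book1", RDF_TYPE, "Book") in sat.triples())
```

After the deletion, book1 rdf:type Book is still present because it remains derivable from the writtenIn triple through the domain constraint; this is the situation the counters are meant to handle.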

Example of instance insertion

[Figure: schema and saturated instance graphs of the running example after the insertion, now including the node French.]

To insert the triple:

book1 writtenIn French

First saturate the triple using db:

[Figure: the triples inferred from the inserted triple, over the nodes book1, Language, Book, Publication, _:b1 and French, with four rdf:type edges and one hasLanguage edge.]

Then insert the explicit triple and the inferred ones in db.

Example of schema deletion

[Figure: schema and saturated instance graphs of the running example, with the rdfs:domain edge removed from the schema.]

To delete the triple:

writtenIn rdfs:domain Book

First infer affected data triples using db:

[Figure: the affected triples, over the nodes book1, Book, Publication and _:b1, with three rdf:type edges.]

Then delete the explicit triple and the inferred ones from db.

Experiments

Experimental setup

• implementation in Java 1.6
• deployed on top of a PostgreSQL v8.5 server
• 6 indexes – all permutations of the (s, p, o) columns
• the spo index is clustering
• dictionary encoding

Graph characteristics and saturation times:

Graph                     Storage                       Barton        DBpedia      DBLP
#Schema                   in memory                     101           5,666        41
#Instance                 Triple(s, p, o)               34 × 10^6     27 × 10^6    8.4 × 10^6
#Saturation               Sat(s, p, o)                  39 × 10^6     30 × 10^6    12 × 10^6
Saturation increase (%)                                 14.91         10.65        41.05
#Multiset                 SatM(s, p, o, isExp, count)   73.5 × 10^6   66 × 10^6    18.7 × 10^6
Multiset increase (%)                                   116.89        227.37       121.97
t_sat (s)                                               4,294         2,742        748
t_sat+ (s)                                              4,586         2,977        799

Query answering

• 26 hand-picked queries (between 1 and 10 triple patterns – 6 on average)
• similar query answering times on Sat and SatM

[Figure: chart of query answering times; axis and series labels are not recoverable from the extraction.]

Graph updates

• no impact on reformulation
• saturation needs to maintain SatM
• insertions & deletions
• updates of one triple on the data and the schema

[Figure: chart of update times; axis and series labels are not recoverable from the extraction.]

Saturation thresholds

The saturation threshold of a query q (st(q)): the smallest integer n s.t.

$n \times t_{ref}(q) > n \times t_{sat}(q) + t_{sat+}$

$t_{ref}(q)$ – time to answer q through reformulation (using Triple)
$t_{sat}(q)$ – time to answer q based on saturation (using SatM)
$t_{sat+}$ – time to saturate db (create SatM)

[Figure: chart of saturation thresholds; axis and series labels are not recoverable from the extraction.]
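The threshold definition above translates directly into a small computation. The following Python sketch is an illustration only: the function name and the sample timings are made up, and returning `None` when reformulation is never slower than saturation-based answering is a convention chosen here, not something stated on the slide.

```python
import math


def saturation_threshold(t_ref, t_sat, t_sat_plus):
    """Smallest integer n >= 1 with n * t_ref > n * t_sat + t_sat_plus.

    Returns None when no such n exists, i.e. when t_ref <= t_sat.
    """
    if t_ref <= t_sat:
        return None
    # n * (t_ref - t_sat) > t_sat_plus  <=>  n > t_sat_plus / (t_ref - t_sat)
    n = max(1, math.floor(t_sat_plus / (t_ref - t_sat)) + 1)
    # Guard against floating-point rounding in the division.
    while not n * t_ref > n * t_sat + t_sat_plus:
        n += 1
    return n


# Hypothetical timings in seconds, not taken from the experiments.
print(saturation_threshold(t_ref=2.0, t_sat=0.5, t_sat_plus=3.0))
```

Read this way, st(q) is the number of runs of q after which paying the one-time saturation cost becomes worthwhile.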

Related Work

Outline of the positioning of our work

[Figure: positioning chart. One axis: query language expressive power (relational conjunctive queries, BGP queries, SPARQL). Other axis: RDF fragment expressive power (DL, DB). Plotted: [1, 3, 5], [4, 6, 7], [2], this work.]

[1] ADJIMAN, P., GOASDOUÉ, F., AND ROUSSET, M.-C. SomeRDFS in the semantic web. JODS 8 (2007).

[2] ARENAS, M., GUTIERREZ, C., AND PÉREZ, J. Foundations of RDF databases. In Reasoning Web (2009).

[3] CALVANESE, D., GIACOMO, G. D., LEMBO, D., LENZERINI, M., AND ROSATI, R. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning (JAR) 39, 3 (2007).

[4] GOASDOUÉ, F., KARANASOS, K., LEBLAY, J., AND MANOLESCU, I. View selection in semantic web databases. PVLDB (2011).

[5] GOTTLOB, G., ORSI, G., AND PIERIS, A. Ontological queries: Rewriting and optimization. In ICDE (2011). Keynote.

[6] KAOUDI, Z., MILIARAKI, I., AND KOUBARAKIS, M. RDFS reasoning and query answering on DHTs. In ISWC (2008).

[7] URBANI, J., VAN HARMELEN, F., SCHLOBACH, S., AND BAL, H. QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. In ISWC (2011).

Conclusion

Conclusion

Summary:

⊲ RDF fragment (extending those studied in the literature)
⊲ novel saturation- and reformulation-based query answering techniques robust to instance and schema updates
⊲ algorithms directly deployable on top of any RDBMS
⊲ thorough performance comparison and analysis

Future work:

An automated strategy to choose between the two techniques:

Saturate+(db) / Reformulate(q, db)

Thank you!

[Figure: closing RDF graph with the nodes I, you, attention, Question, _:b1, _:b2 and _:b3, and the edge labels thank, pay, ask (×3) and rdf:type (×3).]

Open-world interpretation of RDFS constraints

Constraint interpretation:

⊲ closed-world assumption (CWA)
any fact not present in the database is assumed not to hold
database facts do not respect a constraint → inconsistency

R1 ⊆ R2 – any tuple in the relation R1 must also be in the relation R2

⊲ open-world assumption (OWA)
facts may hold even though they are not in the database

R1 ⊆ R2 – any tuple in the relation R1 is also in the relation R2

The RDF data model is based on OWA.

RDF meets Relational Database Management Systems (RDBMS)

RDF graphs: incomplete relational databases based on V-tables

V-tables: allow using variables in their tuples

using a variable multiple times allows expressing joins on unknown values

BGP query answering boils down to conjunctive query evaluation on a saturated database.
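As an illustration of how a BGP becomes a conjunctive query over a triple table, the sketch below translates a BGP into a SQL self-join and runs it with Python's built-in `sqlite3` module. The single table `Triple(s, p, o)` mirrors the storage named in the experiments; the `bgp_to_sql` function, the `?x` variable syntax and the sample rows are assumptions of this example, and dictionary encoding is ignored.

```python
import sqlite3


def bgp_to_sql(head, body):
    """Translate q(head) :- body into a self-join over Triple(s, p, o)."""
    first_seen = {}          # variable -> first column where it occurs
    where, params = [], []
    for i, pattern in enumerate(body):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # Repeated variable: join condition.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # Constant: selection condition, passed as a parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(first_seen[v] for v in head)
    tables = ", ".join(f"Triple t{i}" for i in range(len(body)))
    sql = f"SELECT DISTINCT {select} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Triple (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO Triple VALUES (?, ?, ?)", [
    ("book1", "hasAuthor", "Neil Gaiman"),
    ("book1", "hasAuthor", "Terry Pratchett"),
    ("book1", "rdf:type", "Book"),
    ("book1", "rdf:type", "Publication"),   # implicit triple, stored after saturation
])
sql, params = bgp_to_sql(("?x", "?y"),
                         [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")])
print(sql)
answers = set(conn.execute(sql, params).fetchall())
print(sorted(answers))
```

Because the table already holds the saturated data, plain SQL evaluation returns the complete answer set, including the row for Publication.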

Saturation (related work)

• J. Broekstra and A. Kampman, “Inferencing and truth maintenance in RDF Schema: Exploring a naive practical approach”, in PSSS Workshop, 2003.

• B. Bishop, A. Kiryakov, D. Ognyanoff, I. Peikov, Z. Tashev, and R. Velkov, “OWLIM: A family of scalable semantic repositories”, Semantic Web, vol. 2, no. 1, 2011.

• C. Gutierrez, C. A. Hurtado, and A. A. Vaisman, “RDFS update: From theory to practice”, in ESWC, 2011.
