Efficient Query Answering against Dynamic RDF Databases
François Goasdoué, Ioana Manolescu, Alexandra Roatis
Université Paris-Sud & Inria Saclay (OAK project)
20 March 2013
Overview
EDBT 2013 Efficient Query Answering against Dynamic RDF Databases 2 / 35
The Resource Description Framework
Basic Graph Pattern Queries
Contributions
Experiments
Related Work
Conclusion
The Resource Description Framework
The Resource Description Framework (RDF)
⊲ graph-based data model
⊲ W3C standard
RDF Graph:
⊲ set of triples: s p o, where s ∈ U ∪ B, p ∈ U, o ∈ U ∪ B ∪ L
U – URIs, L – literals (constants), B – blank nodes
the subject s has the property p with the value: the object o
⊲ built-in property: rdf:type
specifies to which classes a resource belongs
Constructor | Triple | Relational notation
Class assertion | s rdf:type o | o(s)
Property assertion | s p o | p(s, o)
Blank nodes
⊲ feature of RDF
⊲ support unknown URI/literal tokens
Example:
the country of _:b1 is Italy
the city of the same _:b1 is Genoa
the population of Genoa is an unspecified value _:b2
Running example
[Figure: RDF graph of the running example. Node labels: book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, Language, Publication, writtenIn, hasLanguage, _:b0, _:b1. Edge labels: hasTitle, hasAuthor (×2), rdf:type (×2), translatedTo, writtenIn, rdfs:subClassOf (×2), rdfs:subPropertyOf, rdfs:domain, rdfs:range.]
RDF Schema (RDFS)
⊲ feature of RDF
⊲ enhance the descriptions in graphs
⊲ declare semantic constraints between classes and properties
Built-in properties:
⊲ subclass relationships: rdfs:subClassOf
⊲ subproperty relationships: rdfs:subPropertyOf
⊲ typing the first attribute (domain) of a property: rdfs:domain
⊲ typing the second attribute (range) of a property: rdfs:range
Constructor | Triple | Relational notation
Subclass constraint | s rdfs:subClassOf o | s ⊆ o
Subproperty constraint | s rdfs:subPropertyOf o | s ⊆ o
Domain typing constraint | s rdfs:domain o | Π_domain(s) ⊆ o
Range typing constraint | s rdfs:range o | Π_range(s) ⊆ o
Open-world assumption and RDF entailment
The RDF data model is based on the open-world assumption.
→ deductive constraints – implicitly propagate tuples
Implicit triples → considered part of the graph – not explicitly present
Entailment – reasoning mechanism
set of explicit triples & some entailment rules → derive implicit information
Exhaustive application of entailment rules → saturation (a.k.a. closure)
The saturation of a graph is unique (up to blank node renaming).
Entailment is part of the RDF specification itself.
The semantics of an RDF graph is its saturation.
Entailment rules by example
1) book1 rdf:type Book and Book rdfs:subClassOf Publication ⊢ book1 rdf:type Publication
2) book1 writtenIn English and writtenIn rdfs:subPropertyOf hasLanguage ⊢ book1 hasLanguage English
3) book1 writtenIn English and writtenIn rdfs:domain Book ⊢ book1 rdf:type Book
4) book1 writtenIn English and writtenIn rdfs:range Language ⊢ English rdf:type Language
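The four rules illustrated above can be applied to a fixpoint with a few lines of Python. This is only a minimal sketch under stated assumptions: triples are plain tuples of strings, the schema is a small hand-written set covering the four examples, and the function name saturate is hypothetical (it is not the deck's Saturate(db) implementation).

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def saturate(data, schema):
    """Return the data triples plus every triple derivable with the four rules."""
    saturated = set(data)
    while True:
        derived = set()
        for (s, p, o) in saturated:
            for (s1, p1, o1) in schema:
                if p1 == SUBCLASS and p == RDF_TYPE and o == s1:
                    derived.add((s, RDF_TYPE, o1))   # rule 1: subclass
                elif p1 == SUBPROP and p == s1:
                    derived.add((s, o1, o))          # rule 2: subproperty
                elif p1 == DOMAIN and p == s1:
                    derived.add((s, RDF_TYPE, o1))   # rule 3: domain typing
                elif p1 == RANGE and p == s1:
                    derived.add((o, RDF_TYPE, o1))   # rule 4: range typing
        new = derived - saturated
        if not new:          # fixpoint reached: nothing new was derived
            return saturated
        saturated |= new


# Illustrative schema and data, taken from the four examples above.
schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
data = {("book1", "writtenIn", "English")}
closure = saturate(data, schema)
```

Starting from the single triple book1 writtenIn English, the loop needs two rounds: the first derives the hasLanguage, Book and Language triples, the second derives book1 rdf:type Publication from the newly added Book typing.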
Basic Graph Pattern Queries
Basic Graph Pattern (BGP) Queries
⊲ subset of SPARQL
⊲ BGP – conjunction of triple patterns (or triples)
q(x):- t1, . . . , tα
ti = si pi oi, where si, pi ∈ U ∪ B ∪ V, oi ∈ U ∪ B ∪ V ∪ L
x ∈ V (distinguished variables)
query evaluation treats blank nodes in a query as non-distinguished variables
Example:
q(x, y):- x hasAuthor z, x rdf:type y
≡
q(x, y):- x hasAuthor _:b0, x rdf:type y
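The equivalence in the example can be checked with a small evaluator. The sketch below is an assumption-laden illustration, not the deck's query processor: variables are written with a leading "?", query blank nodes with a leading "_:", and the three data triples are one plausible reading of the running example.

```python
def is_variable(term):
    """In a query, both ?-variables and blank nodes bind to any graph value."""
    return term.startswith("?") or term.startswith("_:")


def evaluate(head, body, graph):
    """Evaluate a BGP query: join the triple patterns, then project on head."""
    bindings = [{}]
    for pattern in body:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = dict(binding)
                matches = True
                for term, value in zip(pattern, triple):
                    if is_variable(term):
                        # Bind on first use; later uses must agree (join).
                        if candidate.setdefault(term, value) != value:
                            matches = False
                            break
                    elif term != value:
                        matches = False
                        break
                if matches:
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


# Assumed excerpt of the running example.
graph = {
    ("book1", "hasAuthor", "Neil Gaiman"),
    ("book1", "hasAuthor", "Terry Pratchett"),
    ("book1", "rdf:type", "Book"),
}
with_variable = evaluate(("?x", "?y"),
                         [("?x", "hasAuthor", "?z"), ("?x", "rdf:type", "?y")],
                         graph)
with_blank = evaluate(("?x", "?y"),
                      [("?x", "hasAuthor", "_:b0"), ("?x", "rdf:type", "?y")],
                      graph)
```

Both calls return the same set because the blank node _:b0 is never projected: it only constrains the join, exactly like the non-distinguished variable z.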
Query answering
Problem:
query evaluation ≠ query answering
the evaluation of a query only uses the graph’s explicit triples, so it may lead to an incomplete answer set
the (complete) answer set is obtained by evaluating the query against the graph’s saturation
Solution:
decouple RDF entailment from query evaluation
Perform a pre-processing step to deal with entailed triples:
⊲ on the database – data saturation
⊲ on the queries – query reformulation
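The gap between evaluation and answering shows up already with a single subclass constraint. The following sketch assumes one explicit triple and one constraint (Book is a subclass of Publication); the helper names are hypothetical and only the subclass rule is implemented.

```python
def saturate_subclass(data, subclass_of):
    """Add the rdf:type triples implied by subclass constraints (only that rule)."""
    result = set(data)
    todo = list(data)
    while todo:
        s, p, o = todo.pop()
        if p == "rdf:type":
            for superclass in subclass_of.get(o, ()):
                implied = (s, "rdf:type", superclass)
                if implied not in result:
                    result.add(implied)
                    todo.append(implied)
    return result


def instances_of(cls, graph):
    """Evaluate q(x):- x rdf:type cls on the given set of triples."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


explicit = {("book1", "rdf:type", "Book")}
subclass_of = {"Book": ["Publication"]}

evaluation = instances_of("Publication", explicit)   # explicit triples only
answer = instances_of("Publication", saturate_subclass(explicit, subclass_of))
```

Here evaluation is empty while answer contains book1, which is the incompleteness the slide describes.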
Data saturation vs. Query reformulation
Data saturation
Advantages:
⊲ straightforward
⊲ easy to implement
Drawbacks:
⊲ computation time
⊲ additional storage space
⊲ must be recomputed upon database updates
Example:
the YAGO2 dataset doubles in size when computing the RDFS-closure → 33M to 64M triples
Query reformulation
Advantages:
⊲ database saturation does not need to be (re)computed
Drawbacks:
⊲ every incoming query must be reformulated
⊲ reformulations can be prohibitively large
⊲ difficult to optimize
Example:
a single-atom query over YAGO2 can yield a union of > 300 000 queries
Contributions
Contributions
1. The database (DB) fragment of RDF
extending previously studied fragments by the support of blank nodes
2. Novel BGP query answering techniques for this DB fragment
designed to work on top of any standard conjunctive query processor
(i) an efficient incremental RDF saturation maintenance algorithm
(ii) a novel reformulation-based query answering algorithm
3. Thorough performance comparison and analysis
The database (DB) fragment of RDF
⊲ restricts entailment to RDFS entailment
⊲ does not restrict graphs in any way
An RDF database: db = 〈D, S〉
D & S – disjoint sets of triples
D (RDF) – instance level → assertions
S (RDFS) – schema level → semantics
[Figure: db = 〈D, S〉 for the running example. D (instance graph): node labels book1, “Good Omens”, “Neil Gaiman”, “Terry Pratchett”, Book, English, _:b0, Language; edge labels hasTitle, hasAuthor (×2), rdf:type (×2), translatedTo, writtenIn. S (schema graph): node labels Book, _:b1, Language, writtenIn, hasLanguage, Publication; edge labels rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
Query reformulation algorithm
Reformulate(q, db)
⊲ fixpoint algorithm (13 reformulation rules)
⊲ reformulates q into a set of queries s.t. the union of the evaluations of these queries on db produces the correct answer
[Figure: schema graph of the running example. Node labels: Book, _:b1, Language, writtenIn, hasLanguage, Publication; edge labels: rdfs:subClassOf (×2), rdfs:domain, rdfs:range, rdfs:subPropertyOf.]
Reformulate(q, db) =
q(x, y):- x rdf:type y
∪ q(x, Publication):- x rdf:type Publication
∪ q(x, Publication):- x rdf:type Book
∪ q(x, Publication):- x writtenIn z
∪ . . .
∪ q(x, _:b1):- x rdf:type _:b1
∪ . . .
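To see where the Publication members of this union come from, here is a deliberately restricted sketch. It is not the 13-rule Reformulate(q, db) algorithm: it handles only one class-membership atom with a constant class, ignores blank nodes, and uses a hypothetical function name and a schema excerpt assumed from the running example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def close_downwards(seeds, relation, schema):
    """All terms reaching a seed through `relation` edges (seeds included)."""
    closed = set(seeds)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in schema:
            if p == relation and o in closed and s not in closed:
                closed.add(s)
                changed = True
    return closed


def reformulate_type_atom(cls, schema):
    """Atoms whose union, evaluated on explicit triples, answers ?x rdf:type cls."""
    classes = close_downwards({cls}, SUBCLASS, schema)
    atoms = {("?x", RDF_TYPE, c) for c in classes}
    with_domain = {s for (s, p, o) in schema if p == DOMAIN and o in classes}
    with_range = {s for (s, p, o) in schema if p == RANGE and o in classes}
    for prop in close_downwards(with_domain, SUBPROP, schema):
        atoms.add(("?x", prop, "?z"))   # subjects of prop are typed by cls
    for prop in close_downwards(with_range, SUBPROP, schema):
        atoms.add(("?z", prop, "?x"))   # objects of prop are typed by cls
    return atoms


schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
publication_atoms = reformulate_type_atom("Publication", schema)
```

On this schema the three atoms produced for Publication are the bodies of the three Publication queries listed in the union above.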
With the standard evaluation, a blank node in a query is treated as a non-distinguished variable:
[Figure: schema graph of the running example, extended with the nodes book1 and English and two rdf:type edges.]
q(x, _:b1):- x rdf:type _:b1
≡ q(x, _:b1):- x rdf:type z
Answer set: {〈book1, _:b1〉, 〈English, _:b1〉} – wrong answer
Query reformulation algorithm
Reformulate(q, db)
⊲ fixpoint algorithm (13 reformulation rules)
⊲ reformulates q into a set of queries s.t. the union of the non-standard evaluations of these queries on db produces the correct answer
⊲ size of the output: O((6 ∗ #db²)^#q)
q(x, _:b1):- x rdf:type _:b1
≢ q(x, _:b1):- x rdf:type z
Answer set: {〈book1, _:b1〉} – correct answer
Database saturation algorithm
Saturate(db)
⊲ fixpoint algorithm (4 saturation rules)
⊲ explicitly adds to db all its implicit triples
⊲ size of the output: O(#db²)
⊲ computation time: O(#db³)
Saturate(db) = db ∪ [Figure: graph of the implicit triples. Node labels: book1, Language, Publication, _:b1, English; edge labels: rdf:type (×3), hasLanguage.]
What about updates?
Saturation maintenance algorithm
Saturate+(db)
⊲ multiset variant of Saturate(db)
⊲ allows saturation maintenance upon updates
Saturate+(db) = db ∪ [Figure: graph of the implicit triples. Node labels: Book, book1, Language, Publication, _:b1, English; edge labels: rdf:type (×4), hasLanguage.]
Example of instance insertion
[Figure: saturated graph of the running example, now including the node French.]
To insert the triple:
book1 writtenIn French
First saturate the triple using db:
[Figure: graph of the triples inferred from the inserted one. Node labels: book1, Language, Book, Publication, _:b1, French; edge labels: rdf:type (×4), hasLanguage.]
Then insert the explicit triple and the inferred ones in db.
Example of schema deletion
[Figure: saturated graph of the running example, with the rdfs:domain edge removed from the schema.]
To delete the triple:
writtenIn rdfs:domain Book
First infer affected data triples using db:
[Figure: graph of the affected triples. Node labels: book1, Book, Publication, _:b1; edge labels: rdf:type (×3).]
Then delete the explicit triple and the inferred ones from db.
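The two update examples can be mimicked with a counting scheme. The sketch below is a simplification and not the deck's Saturate+ algorithm: it assumes that a triple's count is the number of explicit data triples that entail it (itself included), relies on the fact that each of the four rules has a single data premise, and uses hypothetical function names and a reduced schema without blank nodes.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def closure(triple, schema):
    """All triples entailed by one data triple and the schema (triple included)."""
    found = {triple}
    todo = [triple]
    while todo:
        s, p, o = todo.pop()
        for (s1, p1, o1) in schema:
            if p1 == SUBCLASS and p == RDF_TYPE and o == s1:
                new = (s, RDF_TYPE, o1)
            elif p1 == SUBPROP and p == s1:
                new = (s, o1, o)
            elif p1 == DOMAIN and p == s1:
                new = (s, RDF_TYPE, o1)
            elif p1 == RANGE and p == s1:
                new = (o, RDF_TYPE, o1)
            else:
                continue
            if new not in found:
                found.add(new)
                todo.append(new)
    return found


def insert_instance(triple, explicit, schema, counts):
    """Saturate the new triple alone, then add one support to each result."""
    if triple not in explicit:
        explicit.add(triple)
        counts.update(closure(triple, schema))


def delete_schema(constraint, explicit, schema, counts):
    """Withdraw the supports that depended on the deleted constraint."""
    reduced = schema - {constraint}
    for triple in explicit:
        for lost in closure(triple, schema) - closure(triple, reduced):
            counts[lost] -= 1
            if counts[lost] == 0:
                del counts[lost]    # no support left: the triple disappears
    return reduced


schema = {
    ("Book", SUBCLASS, "Publication"),
    ("writtenIn", SUBPROP, "hasLanguage"),
    ("writtenIn", DOMAIN, "Book"),
    ("writtenIn", RANGE, "Language"),
}
explicit, counts = set(), Counter()
insert_instance(("book1", RDF_TYPE, "Book"), explicit, schema, counts)
insert_instance(("book1", "writtenIn", "English"), explicit, schema, counts)
insert_instance(("book1", "writtenIn", "French"), explicit, schema, counts)
support_before = counts[("book1", RDF_TYPE, "Publication")]
schema = delete_schema(("writtenIn", DOMAIN, "Book"), explicit, schema, counts)
support_after = counts[("book1", RDF_TYPE, "Publication")]
```

After the schema deletion, book1 rdf:type Publication keeps one support (the explicit Book typing), so it stays in the saturation. This is the reason for keeping counts rather than a plain set.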
Experiments
Experimental setup
• implementation in Java 1.6
• deployed on top of a PostgreSQL v8.5 server
• 6 indexes – all permutations of the (s, p, o) columns
• the spo index is clustered
• dictionary encoding
Graph characteristics and saturation times:
Graph | Storage | Barton | DBpedia | DBLP
#Schema | in memory | 101 | 5,666 | 41
#Instance | Triple(s, p, o) | 34 × 10^6 | 27 × 10^6 | 8.4 × 10^6
#Saturation | Sat(s, p, o) | 39 × 10^6 | 30 × 10^6 | 12 × 10^6
Saturation increase (%) | | 14.91 | 10.65 | 41.05
#Multiset | SatM(s, p, o, isExp, count) | 73.5 × 10^6 | 66 × 10^6 | 18.7 × 10^6
Multiset increase (%) | | 116.89 | 227.37 | 121.97
t_sat (s) | | 4,294 | 2,742 | 748
t_sat+ (s) | | 4,586 | 2,977 | 799
Query answering
• 26 hand-picked queries (between 1 and 10 triple patterns – 6 on average)
• similar query answering times on Sat and SatM
[Figure: chart for the query answering experiments.]
Graph updates
• no impact on reformulation
• saturation needs to maintain SatM
• insertions & deletions
• updates of one triple on the data and the schema
[Figure: chart for the graph update experiments.]
Saturation thresholds
The saturation threshold of a query q (st(q)): the smallest integer n s.t.
n × t_ref(q) > n × t_sat(q) + t_sat+
t_ref(q) – time to answer q through reformulation (using Triple)
t_sat(q) – time to answer q based on saturation (using SatM)
t_sat+ – time to saturate db (create SatM)
[Figure: chart of the saturation thresholds.]
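Rearranging the inequality gives n × (t_ref(q) − t_sat(q)) > t_sat+, so the threshold has a closed form whenever reformulation is slower per query than saturation-based answering. The sketch below assumes all times are in the same unit; the function name is hypothetical and the per-query times in the example are made-up values (only 799 s is the DBLP t_sat+ from the table).

```python
import math


def saturation_threshold(t_ref, t_sat, t_sat_plus):
    """Smallest integer n with n * t_ref > n * t_sat + t_sat_plus, or None."""
    gain_per_run = t_ref - t_sat
    if gain_per_run <= 0:
        return None   # reformulation is never slower: saturation never pays off
    # Strict inequality: n must be strictly greater than the ratio.
    return math.floor(t_sat_plus / gain_per_run) + 1


# Assumed per-query times of 2.0 s (reformulation) and 0.5 s (saturation).
example = saturation_threshold(2.0, 0.5, 799.0)
```

With these assumed numbers the query has to be run 533 times before paying the saturation cost upfront becomes the cheaper option.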
Related Work
Outline of the positioning of our work
[Figure: positioning chart. One axis: query language expressive power (SPARQL, BGP queries, relational conjunctive queries). Other axis: RDF fragment expressive power (DL, DB). Plotted: [1, 3, 5], [4, 6, 7], [2], this work.]
[1] ADJIMAN, P., GOASDOUÉ, F., AND ROUSSET, M.-C. SomeRDFS in the semantic web. JODS 8 (2007).
[2] ARENAS, M., GUTIERREZ, C., AND PÉREZ, J. Foundations of RDF databases. In Reasoning Web (2009).
[3] CALVANESE, D., GIACOMO, G. D., LEMBO, D., LENZERINI, M., AND ROSATI, R. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning (JAR) 39, 3 (2007).
[4] GOASDOUÉ, F., KARANASOS, K., LEBLAY, J., AND MANOLESCU, I. View selection in semantic web databases. PVLDB (2011).
[5] GOTTLOB, G., ORSI, G., AND PIERIS, A. Ontological queries: Rewriting and optimization. In ICDE (2011). Keynote.
[6] KAOUDI, Z., MILIARAKI, I., AND KOUBARAKIS, M. RDFS reasoning and query answering on DHTs. In ISWC (2008).
[7] URBANI, J., VAN HARMELEN, F., SCHLOBACH, S., AND BAL, H. QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. In ISWC (2011).
Conclusion
Conclusion
Summary:
⊲ RDF fragment (extending those studied in the literature)
⊲ novel saturation- and reformulation-based query answering techniques robust to instance and schema updates
⊲ algorithms directly deployable on top of any RDBMS
⊲ thorough performance comparison and analysis
Future work:
An automated strategy to choose between the two techniques:
Saturate+(db) / Reformulate(q, db)
Thank you!
[Figure: closing slide drawn as a small RDF graph. Node labels: I, you, attention, Question, _:b1, _:b2, _:b3; edge labels: thank, pay, ask (×3), rdf:type (×3).]
Open-world interpretation of RDFS constraints
Constraint interpretation:
⊲ closed-world assumption (CWA)
any fact not present in the database is assumed not to hold
database facts do not respect a constraint → inconsistency
R1 ⊆ R2 – any tuple in the relation R1 must also be in the relation R2
⊲ open-world assumption (OWA)
facts may hold even though they are not in the database
R1 ⊆ R2 – any tuple in the relation R1 is also in the relation R2
The RDF data model is based on OWA.
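The two readings of R1 ⊆ R2 differ in what they do with a missing tuple, which a few lines of Python make concrete. This is only an illustration with relations modelled as sets; the function names and the sample relations are assumptions.

```python
def violates_cwa(r1, r2):
    """CWA: the constraint is a check; a missing tuple is an inconsistency."""
    return not (r1 <= r2)


def propagate_owa(r1, r2):
    """OWA: the constraint is a deduction; missing tuples are implied in R2."""
    return r2 | r1


book = {"book1"}        # tuples known to be in R1 (here: Book)
publication = set()     # tuples explicitly stored in R2 (here: Publication)

inconsistent = violates_cwa(book, publication)
implied_publication = propagate_owa(book, publication)
```

Under CWA the database above is rejected, whereas under OWA book1 is simply understood to be a Publication as well, which is the behaviour RDF entailment follows.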
RDF meets Relational Database Management Systems (RDBMS)
RDF graphs: incomplete relational databases based on V-tables
V-tables: allow using variables in their tuples
using a variable multiple times allows expressing joins on unknown values
BGP query answering boils down to conjunctive query evaluation on a saturated database.
Saturation (related work)
• J. Broekstra and A. Kampman, “Inferencing and truth maintenance in RDF Schema: Exploring a naive practical approach”, in PSSS Workshop, 2003.
• B. Bishop, A. Kiryakov, D. Ognyanoff, I. Peikov, Z. Tashev, and R. Velkov, “OWLIM: A family of scalable semantic repositories”, Semantic Web, vol. 2, no. 1, 2011.
• C. Gutierrez, C. A. Hurtado, and A. A. Vaisman, “RDFS update: From theory to practice”, in ESWC, 2011.