database optimization techniques for semantic queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf ·...
TRANSCRIPT
Database Optimization Techniques for SemanticQueries
Ioana Manolescu
INRIA, [email protected],
http://pages.saclay.inria.fr/Ioana.Manolescu
https://team.inria.fr/oak
SEBD 2015, Gaeta
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 1 / 73
Part I
Motivation and outline
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 2 / 73
Motivation and context
Querying Semantic data
Most popular data model: RDF (W3C’s Resource DescriptionFramework)
Famous application: the Linked Open Data cloud
(2007)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 3 / 73
Motivation and context
Querying Semantic data
Most popular data model: RDF (W3C’s Resource DescriptionFramework)
Famous application: the Linked Open Data cloud (2007)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 3 / 73
Motivation and context
Querying Semantic data
Most popular data model: RDF (W3C’s Resource DescriptionFramework)
Famous application: the Linked Open Data cloud (2008)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 4 / 73
Motivation and context
Querying Semantic data
Most popular data model: RDF (W3C’s Resource DescriptionFramework)
Famous application: the Linked Open Data cloud (2009)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 5 / 73
Motivation and context
Querying Semantic data
Most popular data model: RDF (W3C’s Resource DescriptionFramework)
Famous application: the Linked Open Data cloud (2010)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 6 / 73
Motivation and context
Linked Open Data cloud, 2014
900,129 documents describing 8,038,396 resources(Schmachtenberg, Bizer, Paulheim, ISWC 2014)
Topic Datasets %Government 183 18.05
Publications 96 9.47
Life sciences 83 8.19
User-generated 48 4.73
Cross-domain 41 4.04
Media 22 2.17
Geographic 21 2.07
Social web 520 51.28
Total 1014 100
There is more (Billion Triple Challenge etc.)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 7 / 73
Motivation and context
Querying Semantic Data
1 RDF: three-attribute relation (subject, property, object)
The subject is the resource being describedThe resource has the property property whose value is objectResource type is a property, specified just like any other
2 RDFS semantics providing information about the propertiesand classes (types) of resources:
Any undergraduateStudent is a StudentAnyone having a graduationDate is a Student (but may also beof other types) ...
Semantics leads to implicit data
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 8 / 73
Motivation and context
Querying Semantic Data
1 RDF: three-attribute relation (subject, property, object)
The subject is the resource being describedThe resource has the property property whose value is objectResource type is a property, specified just like any other
2 RDFS semantics providing information about the propertiesand classes (types) of resources:
Any undergraduateStudent is a StudentAnyone having a graduationDate is a Student (but may also beof other types) ...
Semantics leads to implicit data
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 8 / 73
Motivation and context
Querying Semantic Data
1 RDF: three-attribute relation (subject, property, object)
The subject is the resource being describedThe resource has the property property whose value is objectResource type is a property, specified just like any other
2 RDFS semantics providing information about the propertiesand classes (types) of resources:
Any undergraduateStudent is a StudentAnyone having a graduationDate is a Student (but may also beof other types) ...
Semantics leads to implicit data
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 8 / 73
Motivation and context
Querying Semantic Data
Main RDFS constraints: rdfs:subClassOf, rdfs:subPropertyOf,rdfs:domain, rdfs:range
Beyond RDFS
Many (richer) constraint (or ontology) languages
W3C standards: OWL 2 profile family
DL Lite family� Calvanese, De Giacomo, Lembo, Lenzerini, Rosati, JAR2007
Datalog±
� Cali, Gottlob Lukasiewick, Marnette, Pierris, LICS 2010
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 9 / 73
Motivation and context
Do we really need the semantics?
Yes. All the time.
Application knowledge / constraints:
Every Senator is an ElectedOfficial which is a Person
(On Wikipedia) being BornInAPlace means being a Person
The source and destination of a tripFromTo are either astreetAddress, or a cityAddress or a countryAddress
1 Without the semantics, we may miss query answers
2 Semantic contraints are a compact way of encodinginformation (“every ElectedOfficial is a Person” stated onlyonce)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 10 / 73
Motivation and context
Do we really need the semantics?
Yes. All the time.
Application knowledge / constraints:
Every Senator is an ElectedOfficial which is a Person
(On Wikipedia) being BornInAPlace means being a Person
The source and destination of a tripFromTo are either astreetAddress, or a cityAddress or a countryAddress
1 Without the semantics, we may miss query answers
2 Semantic contraints are a compact way of encodinginformation (“every ElectedOfficial is a Person” stated onlyonce)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 10 / 73
Motivation and context
Outline
1 Motivation
2 RDF data model and query language
3 Query answering techniques
4 Cover-based query reformulation for the database fragment ofRDF
5 Cover-based query reformulation framework for FOL-reduciblesettings
6 Performance and concluding remarks
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 11 / 73
Part II
RDF data and queries
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 12 / 73
RDF data model and query language RDF
The Resource Description Framework (RDF)
RDF graph – set of triples
Assertion Triple Relational notation
Class s rdf:type o o(s)Property s p o p(s, o)
doi1
Book
“El Aleph”
:b1
“J. L. Borges”
“1949”
publishedIn
hasTitle
writtenBy
hasName
rdf:typeresource (URI)
blank node
literal (string)
property
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 13 / 73
RDF data model and query language RDFS
RDF Schema (RDFS)
Declare deductive constraints between classes and properties
Constraint Triple OWA interpretation
Subclass s rdfs:subClassOf o s ⊆ o
Subproperty s rdfs:subPropertyOf o s ⊆ o
Domain typing s rdfs:domain o Πdomain(s) ⊆ o
Range typing s rdfs:range o Πrange(s) ⊆ o
Book
Publication
Person
writtenBy
hasAuthor
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 14 / 73
RDF data model and query language RDF entailment
Open-world assumption and RDF entailment
RDF data model – based on the open-world assumption.→ deductive constraints – implicitly propagate triples
Implicit triples: part of the graph – not explicitly present
Entailment – reasoning mechanism
set of explicit triples+ → derive implicit triples
some entailment rules
Exhaustive application of entailment → saturation (closure)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 15 / 73
RDF data model and query language RDF entailment
Open-world assumption and RDF entailment
RDF data model – based on the open-world assumption.→ deductive constraints – implicitly propagate triples
Implicit triples: part of the graph – not explicitly present
Entailment – reasoning mechanism
set of explicit triples+ → derive implicit triples
some entailment rules
Exhaustive application of entailment → saturation (closure)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 15 / 73
RDF data model and query language RDF entailment
Open-world assumption and RDF entailment
RDF data model – based on the open-world assumption.→ deductive constraints – implicitly propagate triples
Implicit triples: part of the graph – not explicitly present
Entailment – reasoning mechanism
set of explicit triples+ → derive implicit triples
some entailment rules
Exhaustive application of entailment → saturation (closure)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 15 / 73
RDF data model and query language RDF entailment
RDF entailment
The semantics of an RDF graph G is its saturation G∞.
Sample RDFSentailment rules
Instance entailment from combining schema and instancetriples
rdfs9 c1 rdfs:subClassOf c2 ∧ s rdf:type c1 `rdfs9RDF s rdf:type c2
rdfs7 p1 rdfs:subPropertyOf p2 ∧ s p1 o `rdfs7RDF s p2 o
rdfs2 p rdfs:domain c ∧ s p o `rdfs2RDF s rdf:type c
rdfs3 p rdfs:range c ∧ s p o `rdfs3RDF o rdf:type c
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73
RDF data model and query language Query answering
SPARQL query language and SPARQL conjunctive queries
SPARQL is the W3C query language for RDF.
SPARQL conjunctive queries = Basic Graph Pattern (BGP) queries
Sample BGP query:
q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,hasAuthor, a),(b, publishedIn, ”1949”)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 17 / 73
RDF data model and query language Query answering
Query semantics / Answer set
query evaluation 6= query answering
The evaluation of a query only uses the graph’s explicittriples
For the (complete) answer set, evaluate q against thegraph’s saturation
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 18 / 73
RDF data model and query language Query answering
Query answering example
Given the query q(x, y):- x rdf:type y:
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(G) = {(doi1,Book)}q(G∞) = {(doi1,Book), (doi1,Publication), ( :b1,Person)}
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 19 / 73
RDF data model and query language Query answering
Query answering example
Given the query q(x, y):- x rdf:type y:
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(G) = {(doi1,Book)}
q(G∞) = {(doi1,Book), (doi1,Publication), ( :b1,Person)}
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 19 / 73
RDF data model and query language Query answering
Query answering example
Given the query q(x, y):- x rdf:type y:
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(G) = {(doi1,Book)}q(G∞) = {(doi1,Book), (doi1,Publication), ( :b1,Person)}
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 19 / 73
Part III
Query answering techniques
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 20 / 73
Query answering techniques Reasoning
The need for reasoning
Query answering needs explicit and implicit data!
Saturation-based query answering (when the saturation resultis finite)
Reformulation-based query answering
Hybrids of the above� J. Urbani, F. van Harmelen, S. Schlobach, and H. Bal,“QueryPIE: Backward reasoning for OWL Horst over verylarge knowledge bases”, ISWC 2011
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 21 / 73
Query answering techniques Graph saturation
Saturation-based query answering
schema
G
G∞
query q
answer
q(G∞) can be computed using an RDBMS
G∞ needs time to be computed and space to be stored
Not suitable for high update rate (data and/or schema triples)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 22 / 73
Query answering techniques Graph saturation
Saturation-based query answering
schema
G
G∞
query q
answer
q(G∞) can be computed using an RDBMS
G∞ needs time to be computed and space to be stored
Not suitable for high update rate (data and/or schema triples)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 22 / 73
Query answering techniques Graph saturation
Saturation maintenance
schema
G
G∞
query q
answer
Compute ∆ for an update of an RDF graph G s.t.• (µ(G))∞ = G∞ ∪∆ when µ an insertion• (µ(G))∞ = G∞ \∆ when µ a deletion
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 23 / 73
Query answering techniques Query reformulation
Reformulation-based query answering
schema
query q query qref
answer
G
qref (G) can be evaluated using an RDBMS
Robust to updates
Reformulated queries are complex, thus costly to evaluate
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 24 / 73
Query answering techniques Query reformulation
Reformulation-based query answering
schema
query q query qref
answer
G
qref (G) can be evaluated using an RDBMS
Robust to updates
Reformulated queries are complex, thus costly to evaluate
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 24 / 73
Query answering techniques Query reformulation
Reformulation-based query answering
schema
query q query qref
answer
G
qref (G) can be evaluated using an RDBMS
Robust to updates
Reformulated queries are complex, thus costly to evaluate
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 25 / 73
Query answering techniques Query reformulation
Reformulation-based query answering
schema
query q query qref
answer
G
qref (G) can be evaluated using an RDBMS
Robust to updates
Reformulated queries are complex, thus costly to evaluate
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 25 / 73
Query answering techniques Query reformulation
Semantic query answering in the presence of mappings
� Di Pinto, Lembo, Lenzerini, Mancini, Poggi, Rosatti, Ruzzi,Savo, EDBT 2013
The logical storage of the data is related to the ontology bymeans of mappings.
A mapping states that a query on the database is includedinto a query on the ontology.
Reformulation-based query answering in the presence of(GAV) mappings:� Lanti, Rezk, Xiao and Calvanese, EDBT 2015
1 Start-up phase (may perform some materialization)2 Query rewriting phase (based on the schema)3 Query translation phase (e.g., into SQL, based on the
mappings)4 Query execution phase (e.g., through an RDBMS)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 26 / 73
Query answering techniques Query reformulation
Semantic query answering in the presence of mappings
Reformulation-based query answering in the presence of(GAV) mappings:� Lanti, Rezk, Xiao and Calvanese, EDBT 2015
1 Start-up phase (may perform some materialization)2 Query rewriting phase (based on the schema)3 Query translation phase (e.g., into SQL, based on the
mappings)4 Query execution phase (e.g., through an RDBMS)
This talk: Optimized reformulation technique based on con-sidering translation and rewriting together
Can be composed with mappings (and further mapping-based opti-mizations)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 27 / 73
Query answering techniques Query reformulation
Semantic query answering in the presence of mappings
Reformulation-based query answering in the presence of(GAV) mappings:� Lanti, Rezk, Xiao and Calvanese, EDBT 2015
1 Start-up phase (may perform some materialization)2 Query rewriting phase (based on the schema)3 Query translation phase (e.g., into SQL, based on the
mappings)4 Query execution phase (e.g., through an RDBMS)
This talk: Optimized reformulation technique based on con-sidering translation and rewriting together
Can be composed with mappings (and further mapping-based opti-mizations)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 27 / 73
Query answering techniques Query reformulation
Semantic query answering vs. Ontology-Based DataAccess (OBDA)
Fundamentally the same
Input data (stored somehow), ontology, query
Output query answer
In general, mappings may be GAV, LAV, GLAV etc.LAV can be very practical (see: RDBMS indexes!)
We optimize query rewriting and translation independently ofthe global ↔ local (source) data mappings
Integration/interactions are intriguing and current/future work
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 28 / 73
Query answering techniques Query reformulation
Reformulation-based query answering landscape
schema
query q query qref
answer
G
Target reformulation languages for conjunctive queries (CQs):
mainly unions of CQs (UCQs)
� F. Goasdoue, I. Manolescu, A. Roatis: “Efficient query answeringagainst dynamic RDF databases”, EDBT 2013
joins of single-atom CQs (SCQs)
� M. Thomazo: “Compact Rewritings for Existential Rules”, IJCAI 2013
joins of UCQs (JUCQs)
� D. Bursztyn, F. Goasdoue, I. Manolescu: “OptimizingReformulation-based Query Answering in RDF”, EDBT 2015
Wait: is this about SQL syntax?!...
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 29 / 73
Query answering techniques Query reformulation
Reformulation-based query answering landscape
schema
query q query qref
answer
G
Target reformulation languages for conjunctive queries (CQs):
mainly unions of CQs (UCQs)
� F. Goasdoue, I. Manolescu, A. Roatis: “Efficient query answeringagainst dynamic RDF databases”, EDBT 2013
joins of single-atom CQs (SCQs)
� M. Thomazo: “Compact Rewritings for Existential Rules”, IJCAI 2013
joins of UCQs (JUCQs)
� D. Bursztyn, F. Goasdoue, I. Manolescu: “OptimizingReformulation-based Query Answering in RDF”, EDBT 2015
Wait: is this about SQL syntax?!...
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 29 / 73
Query answering techniques Query reformulation
Reformulation-based query answering landscape
schema
query q query qref
answer
G
Target reformulation languages for conjunctive queries (CQs):
mainly unions of CQs (UCQs)
� F. Goasdoue, I. Manolescu, A. Roatis: “Efficient query answeringagainst dynamic RDF databases”, EDBT 2013
joins of single-atom CQs (SCQs)
� M. Thomazo: “Compact Rewritings for Existential Rules”, IJCAI 2013
joins of UCQs (JUCQs)
� D. Bursztyn, F. Goasdoue, I. Manolescu: “OptimizingReformulation-based Query Answering in RDF”, EDBT 2015
Wait: is this about SQL syntax?!...
Yes. And it makes a big difference.From failing to feasible, or 4 orders of magnitudespeed-up on the 8 M triples DBLP dataset.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 30 / 73
Part IV
Cover-based reformulation for the
database fragment of RDF
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 31 / 73
Cover-based reformulation for the DB fragment of RDF
The database fragment of RDF
Largest RDF fragment for which an FOL reformulation-based queryanswering technique is known
Restrict RDF entailment to the rules dedicated to RDFSchema only (a.k.a. RDFS entailment).
No restriction to graphs: any triple allowed by the RDFspecification is also allowed in the DB fragment (e.g., blanknodes in data and schema)
Algorithm Reformulate(q, db) for BGP queries over schema anddata. Output size bound O((6 ∗#db2)#q)� Goasdoue, Manolescu, Roatis, EDBT 2013
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 32 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-UCQ query reformulation example
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :
(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b,hasAuthor, a),(b,publishedIn, ”1949”)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 33 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-UCQ query reformulation example
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :
(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 34 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-UCQ query reformulation example
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :
(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(2) q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 35 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-UCQ query reformulation example
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :
(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(2) q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”) ∪
(3) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 36 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-UCQ query reformulation example
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :
(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(2) q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”) ∪
(3) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”)
q(G∞) = qref (G) = {( :b1, ”El Aleph”)}.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 37 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-UCQ query reformulation example
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :
(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪
(2) q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”) ∪
(3) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”)
q(G∞) = qref (G) = {( :b1, ”El Aleph”)}.I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 37 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-SCQ query reformulation example
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) produces the qref :
(0) q(b) :- (b, rdf:type,Book) ∪ (b,writtenBy, x) ./(1) q(b, a) :- (b,hasAuthor, a) ∪ (b,writtenBy, a) ./(2) q(b) :- (b,publishedIn, ”1949”)
q(G∞) = qref (G) = {( :b1, ”El Aleph”)}.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 38 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-SCQ query reformulation example
doi1
Book
Publication
“El Aleph”
:b1
“J. L. Borges”
“1949”
Person
writtenBy
hasAuthor
publishedIn
rdfs:subClassOf
rdfs:domain
rdfs:range
rdfs:subPropertyOf
hasTitle
writtenBy
hasName
rdf:type
rdf:type
hasAuthor rdf:type
rdfs:domain
q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) produces the qref :
(0) q(b) :- (b, rdf:type,Book) ∪ (b,writtenBy, x) ./(1) q(b, a) :- (b,hasAuthor, a) ∪ (b,writtenBy, a) ./(2) q(b) :- (b,publishedIn, ”1949”)
q(G∞) = qref (G) = {( :b1, ”El Aleph”)}.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 38 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation
RDF Entailment Rules
query q Optimiser
Reformulation algorithm
query q′
answer
G
1. Enlarges the query reformulation language w.r.t. UCQ/SCQto have more than one reformulation alternative
2. Uses a cost model for estimating the cost of evaluating qref
through an RDBMS3. Chooses the cheapest alternative from the search space.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 39 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation
RDF Entailment Rules
query q Optimiser
Reformulation algorithm
query q′
answer
G
1. Enlarges the query reformulation language w.r.t. UCQ/SCQto have more than one reformulation alternative2. Uses a cost model for estimating the cost of evaluating qref
through an RDBMS
3. Chooses the cheapest alternative from the search space.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 39 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation
RDF Entailment Rules
query q Optimiser
Reformulation algorithm
query q′
answer
G
1. Enlarges the query reformulation language w.r.t. UCQ/SCQto have more than one reformulation alternative2. Uses a cost model for estimating the cost of evaluating qref
through an RDBMS3. Chooses the cheapest alternative from the search space.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 39 / 73
Cover-based reformulation for the DB fragment of RDF
Optimized reformulation into JUCQs
qrefc(qref )
Query q
Graph G
CQ-to-UCQ ref. algo. qrefUCQ ref
JUCQ ref
q1
c(q1)
≡
qbest
c(qbest)
≡qn
c(qn)
≡
. . .
≡. . .
≡
qbest
RDBMS
Results
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 40 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation
RDF Entailment Rules
query q Optimiser
Reformulation algorithm
query q′
answer
G
Given the query
q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)
and a state-of-the-art CQ-to-UCQ reformulation algorithm .ref :
the UCQ reformulation is: (t1, t2, t3)ref
the SCQ reformulation is: (t1)ref 1 (t2)ref 1 (t3)ref
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 41 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation
RDF Entailment Rules
query q Optimiser
Reformulation algorithm
query q′
answer
G
Given the query
q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)
and a state-of-the-art CQ-to-UCQ reformulation algorithm .ref :
the UCQ reformulation is: (t1, t2, t3)ref
the SCQ reformulation is: (t1)ref 1 (t2)ref 1 (t3)ref
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 41 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation
Given the query
q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)
and a state-of-the-art CQ-to-UCQ reformulation algorithm .ref , thespace of JUCQs is:
1. (t1, t2, t3)ref
2. (t1)ref 1 (t2)ref 1 (t3)ref
3. (t1, t2)ref 1 (t3)ref
4. (t1)ref 1 (t2, t3)ref
5. (t1, t3)ref 1 (t2)ref
6. (t1, t2)ref 1 (t1, t3)ref
7. (t1, t2)ref 1 (t2, t3)ref
8. (t1, t3)ref 1 (t2, t3)ref
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 42 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation algorithms
Exhaustive algorithm
Impractical since search space size is the number of minimal coversof a set of n elements
Greedy algorithm
Driven by cost; starting from 1-atom fragments and “growing” them
Given the query
q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)
a cost-based greedy exploration is:
(t1)ref 1 (t2)ref 1 (t3)ref
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 43 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation: greedy algorithmexample
Given the query
q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)
a cost-based greedy exploration is:
(t1)ref 1 (t2)ref 1 (t3)ref
(t1, t2)ref 1 (t3)ref (t1, t3)ref 1 (t2)ref (t1)ref 1 (t2, t3)ref
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 44 / 73
Cover-based reformulation for the DB fragment of RDF
CQ-to-JUCQ query reformulation algorithm
Given the query
q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)
a cost-based greedy exploration is:
(t1)ref 1 (t2)ref 1 (t3)ref
(t1, t2)ref 1 (t3)ref (t1, t3)ref 1 (t2)ref (t1)ref 1 (t2, t3)ref
(t1, t2, t3)ref (t1, t3)ref 1 (t1, t2)ref (t1, t3)ref 1 (t2, t3)ref
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 45 / 73
Part V
Cover-based reformulation in
FOL-reducible settings
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 46 / 73
Optimised reformulation-based query answering
For any setting where query answering is reducible to FOL queryevaluationThe same overall approach:
qrefc(qref )
Query q
Deductive constraints
FOL ref. algo. qrefFixed FOL reform.
Best FOL reformulation
q1
c(q1)
≡
qbest
c(qbest)
≡qn
c(qn)
≡
. . .
≡. . .
≡
qbest
RDBMS
Answers
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 47 / 73
DL-LiteR facts (a.k.a. ABox)
Given:
NC : set of concept names (unary predicates),
NR : set of role names (binary predicates),
NI : set of individuals (constants).
Finite number of:
Concept assertions: A(a) with A ∈ NC and a ∈ NI
Role assertions: R(a, b) with R ∈ NR and a, b ∈ NI
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 48 / 73
DL-LiteR facts (a.k.a. ABox)
Given:
NC : set of concept names (unary predicates),
NR : set of role names (binary predicates),
NI : set of individuals (constants).
Finite number of:
Concept assertions: A(a) with A ∈ NC and a ∈ NI
Role assertions: R(a, b) with R ∈ NR and a, b ∈ NI
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 48 / 73
DL-LiteR constraints (a.k.a. TBox)
Given:
NC : set of concept names (unary predicates),
NR : set of role names (binary predicates),
Set of:
Concept inclusions: B v C
Role inclusions: Q v S
using the following grammar (where A ∈ NC and R ∈ NR):
B := A | ∃Q, C := B | ¬B, Q := R | R−, S := Q | ¬Q
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 49 / 73
DL-LiteR constraints (a.k.a. TBox)
Given:
NC : set of concept names (unary predicates),
NR : set of role names (binary predicates),
Set of:
Concept inclusions: B v C
Role inclusions: Q v S
using the following grammar (where A ∈ NC and R ∈ NR):
B := A | ∃Q, C := B | ¬B, Q := R | R−, S := Q | ¬Q
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 49 / 73
FOL Reformulation
Consider the query q(x):- PhDStudent(x) ∧ worksWith(y , x) andKB K = 〈T ,A〉 with the TBox T and ABox A below:
(T1) PhDStudent v Researcher
(T2) ∃worksWith v Researcher
(T3) ∃worksWith− v Researcher
(T4) worksWith v worksWith−
(T5) supervisedBy v worksWith
(T6) ∃supervisedBy v PhDStudent
(T7) PhDStudent v ¬∃supervisedBy−
(A1) worksWith(Ioana,Francois)(A2) supervisedBy(Damian, Ioana)(A3) supervisedBy(Damian,Francois)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 50 / 73
FOL Reformulation
The UCQ reformulation of q is:
qUCQ(x):- q1(x):- PhDStudent(x) ∧ worksWith(y , x)∨ q2(x):- PhDStudent(x) ∧ worksWith(x , y)∨ q3(x):- PhDStudent(x) ∧ supervisedBy(y , x)∨ q4(x):- PhDStudent(x) ∧ supervisedBy(x , y)∨ q5(x):- supervisedBy(x , z) ∧ worksWith(y , x)∨ q6(x):- supervisedBy(x , z) ∧ worksWith(x , y)∨ q7(x):- supervisedBy(x , z) ∧ supervisedBy(y , x)∨ q8(x):- supervisedBy(x , z) ∧ supervisedBy(x , y)∨ q9(x):- supervisedBy(x , x)∨ q10(x):- supervisedBy(x , y)
Naive exhaustive application of specialization steps leads, ingeneral, to highly redundant reformulations.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 51 / 73
FOL Reformulation
The UCQ reformulation of q is:
qUCQ(x):- q1(x):- PhDStudent(x) ∧ worksWith(y , x)∨ q2(x):- PhDStudent(x) ∧ worksWith(x , y)∨ q3(x):- PhDStudent(x) ∧ supervisedBy(y , x)∨ q4(x):- PhDStudent(x) ∧ supervisedBy(x , y)∨ q5(x):- supervisedBy(x , z) ∧ worksWith(y , x)∨ q6(x):- supervisedBy(x , z) ∧ worksWith(x , y)∨ q7(x):- supervisedBy(x , z) ∧ supervisedBy(y , x)∨ q8(x):- supervisedBy(x , z) ∧ supervisedBy(x , y)∨ q9(x):- supervisedBy(x , x)∨ q10(x):- supervisedBy(x , y)
Naive exhaustive application of specialization steps leads, ingeneral, to highly redundant reformulations.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 51 / 73
FOL Reformulation minimization
Minimization of qUCQ by eliminating disjuncts contained in anotherleads to:
qUCQmin(x):- q1(x):- PhDStudent(x) ∧ worksWith(y , x)∨ q2(x):- PhDStudent(x) ∧ worksWith(x , y)∨ q3(x):- PhDStudent(x) ∧ supervisedBy(y , x)∨ q9(x):- supervisedBy(x , y)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 52 / 73
Cover-based FOL reformulation
� Bursztyn, Goasdoue, Manolescu (DL 2015)Consider the query q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y) and KBK = 〈T ,A〉 with the TBox T and ABox A below:
(T1) B v ∃R ′(T2) R ′ v R
(A1) A(a)(A2) B(a)
The only answer to q(x) is a.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 53 / 73
Cover-based FOL reformulation
� Bursztyn, Goasdoue, Manolescu (DL 2015)Consider the query q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y) and KBK = 〈T ,A〉 with the TBox T and ABox A below:
(T1) B v ∃R ′(T2) R ′ v R
(A1) A(a)(A2) B(a)
The only answer to q(x) is a.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 53 / 73
Cover-based FOL reformulation
q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C1 = {{A(x),R(x , y)}, {R ′(z , y)}}.
The cover-based FOL reformulation of q w.r.t. C1 is:
qJUCQ(x):- qUCQ1 (x, y):- (A(x) ∧ R(x, y))∨(A(x) ∧ R ′(x, y)) ∧
qUCQ2 (y):- R ′(z , y)
Cover C1 prevents unification ⇒ qJUCQ(x) is not a FOL
reformulation of q w.r.t. T .
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 54 / 73
Cover-based FOL reformulation
q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C1 = {{A(x),R(x , y)}, {R ′(z , y)}}.The cover-based FOL reformulation of q w.r.t. C1 is:
qJUCQ(x):- qUCQ1 (x, y):- (A(x) ∧ R(x, y))∨(A(x) ∧ R ′(x, y)) ∧
qUCQ2 (y):- R ′(z , y)
Cover C1 prevents unification ⇒ qJUCQ(x) is not a FOL
reformulation of q w.r.t. T .
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 54 / 73
Cover-based FOL reformulation
q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C1 = {{A(x),R(x , y)}, {R ′(z , y)}}.The cover-based FOL reformulation of q w.r.t. C1 is:
qJUCQ(x):- qUCQ1 (x, y):- (A(x) ∧ R(x, y))∨(A(x) ∧ R ′(x, y)) ∧
qUCQ2 (y):- R ′(z , y)
Cover C1 prevents unification ⇒ qJUCQ(x) is not a FOL
reformulation of q w.r.t. T .
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 54 / 73
Safe cover
q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C2 = {{A(x)}, {R(x , y),R ′(z , y)}}
The cover-based FOL reformulation of q w.r.t. C1 is:
qJUCQ(x):- qUCQ1 (x):- (A(x)
qUCQ2 (x):- (R(x, y) ∧ R ′(z , y))∨ (R ′(x, y) ∧ R ′(z , y))∨ (R ′(x, y)) ∨ (B(x))
Cover C2 preserves unification ⇒ qJUCQ(x) is a FOL
reformulation of q w.r.t. T and C2 a safe cover.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 55 / 73
Safe cover
q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C2 = {{A(x)}, {R(x , y),R ′(z , y)}}The cover-based FOL reformulation of q w.r.t. C1 is:
qJUCQ(x):- qUCQ1 (x):- (A(x)
qUCQ2 (x):- (R(x, y) ∧ R ′(z , y))∨ (R ′(x, y) ∧ R ′(z , y))∨ (R ′(x, y)) ∨ (B(x))
Cover C2 preserves unification ⇒ qJUCQ(x) is a FOL
reformulation of q w.r.t. T and C2 a safe cover.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 55 / 73
Safe cover
q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C2 = {{A(x)}, {R(x , y),R ′(z , y)}}The cover-based FOL reformulation of q w.r.t. C1 is:
qJUCQ(x):- qUCQ1 (x):- (A(x)
qUCQ2 (x):- (R(x, y) ∧ R ′(z , y))∨ (R ′(x, y) ∧ R ′(z , y))∨ (R ′(x, y)) ∨ (B(x))
Cover C2 preserves unification ⇒ qJUCQ(x) is a FOL
reformulation of q w.r.t. T and C2 a safe cover.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 55 / 73
Root cover
We call root cover the minimal safe cover.
C2 = {{A(x)}, {R(x , y),R ′(z , y)}} is a root cover for qw.r.t. T .
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 56 / 73
Root cover
We call root cover the minimal safe cover.
C2 = {{A(x)}, {R(x , y),R ′(z , y)}} is a root cover for qw.r.t. T .
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 56 / 73
Extended cover and extended fragment queries
q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C3 = {f1‖f1, f2‖f0}, with:
f0 = {A(x)}f1 = {R(x , y),R ′(z , y)}f2 = {A(x),R(x , y)}
The extended cover-based FOL reformulation of q w.r.t. C3 is:
qe(x):- qUCQ|f1‖f1(x):- ((R(x, y) ∧ R ′(z , y))
∨R ′(x, y) ∨ B(x))
qUCQf2‖f0(x):- ((A(x) ∧ R(x, y))
∨(A(x) ∧ R ′(x, y))∨(A(x) ∧ B(x)))
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 57 / 73
Extended cover and extended fragment queries
q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C3 = {f1‖f1, f2‖f0}, with:
f0 = {A(x)}f1 = {R(x , y),R ′(z , y)}f2 = {A(x),R(x , y)}
The extended cover-based FOL reformulation of q w.r.t. C3 is:
qe(x):- qUCQ|f1‖f1(x):- ((R(x, y) ∧ R ′(z , y))
∨R ′(x, y) ∨ B(x))
qUCQf2‖f0(x):- ((A(x) ∧ R(x, y))
∨(A(x) ∧ R ′(x, y))∨(A(x) ∧ B(x)))
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 57 / 73
Search space
Covers
Eq(extended covers)
Lq(safe covers)
Croot
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 58 / 73
Part VI
Performance
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 59 / 73
Performance Performance potential of optimized reformulation
CQ-to-JUCQ query reformulation performance
Given the query
q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)
and the LUBM 100M benchmark:
JUCQ #reformulations exec. time (ms)
(t1, t2, t3)ref 2, 256 6,387(t1)ref 1 (t2)ref 1 (t3)ref 195 1,074,026
(t1, t2)ref 1 (t3)ref 755 1, 968
(t1)ref 1 (t2, t3)ref 200 846, 710
(t1, t3)ref 1 (t2)ref 568 554(t1, t2)ref 1 (t1, t3)ref 1, 316 2, 734
(t1, t2)ref 1 (t2, t3)ref 764 2, 289
(t1, t3)ref 1 (t2, t3)ref 576 588
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 60 / 73
Performance Optimized reformulation experiments on three RDBMSs
Datasets and RDBMS engines
DBLP (8 M) and LUBM (1 M and 100 M) millions triples
PostgreSQL 9.3.2
System A
System B
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 61 / 73
Performance Optimized reformulation experiments on three RDBMSs
Reformulation algorithms
Basic CQ-to-UCQ algorithm
Picked that of [Goasdoue et al., EDBT’13] since it handles thelargest known fragment of RDF.
Comparison:
1 UCQ reformulation
2 SCQ reformulation
3 Greedy JUCQ reformulation
4 Exhaustive JUCQ reformulation
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 62 / 73
Performance Optimized reformulation experiments on three RDBMSs
Query answering on LUBM 100 M using PostgreSQL
28 queries; 2 to 6 atoms; 1 to 318, 089 reformulations
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 63 / 73
Performance Optimized reformulation experiments on three RDBMSs
Query answering on LUBM 100 M using System A
28 queries; 2 to 6 atoms; 1 to 318, 089 reformulations
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 64 / 73
Performance Optimized reformulation experiments on three RDBMSs
Query answering on LUBM 100 M using System B
28 queries; 2 to 6 atoms; 1 to 318, 089 reformulations
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 65 / 73
Performance Optimized reformulation experiments on three RDBMSs
Optimized reformulation: take-home message
1 Equivalent SQL syntaxes are not equal from the RDBMSoptimizer perspective (inside or outside well-supporteddialect)
2 Chosing the reformulation with the help of textbook costmodel formulas makes queries feasible or efficient when theywere not
3 This amounts to enlarging the optimizer’s “can-do” dialect, ata very modest performance overhead.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 66 / 73
Performance Optimized reformulation experiments on three RDBMSs
Experiments on DL-LiteR reformulation
1 CQ-to-UCQ reformulation
2 Croot reformulation
3 Our cover-based greedy reformulation
4 Our cover-based exhaustive reformulation (as a yardstick forthe quality of the solution our greedy finds)
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 67 / 73
Performance Optimized reformulation experiments on three RDBMSs
Query answering on LUBM∃20 15 millions facts datasetusing PostgreSQL
13 queries; 2 to 10 atoms, 5.77 on average; 35 to 667reformulations, 290.2 on average.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 68 / 73
Performance Optimized reformulation experiments on three RDBMSs
Algorithms running time on LUBM∃20 15 millions factsdataset
13 queries; 2 to 10 atoms, 5.77 on average; 35 to 667reformulations, 290.2 on average.
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 69 / 73
Part VII
Conclusion and perspectives
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 70 / 73
Open issues and perspectives
Where we stand
Constraints (a.k.a. semantics) are crucial for applications, so thepush is continuous for chosing “the right constraint language”
We considered semantic query answering through FOL reduction(i.e., SQL).Not any SQL query resulting from reformulation is handled well bycurrent RDBMSs!Vast performance differences between different reformulations ofthe same query; cost-based approach
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 71 / 73
Open issues and perspectives
Where we stand
Constraints (a.k.a. semantics) are crucial for applications, so thepush is continuous for chosing “the right constraint language”We considered semantic query answering through FOL reduction(i.e., SQL).
Not any SQL query resulting from reformulation is handled well bycurrent RDBMSs!Vast performance differences between different reformulations ofthe same query; cost-based approach
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 71 / 73
Open issues and perspectives
Where we stand
Constraints (a.k.a. semantics) are crucial for applications, so thepush is continuous for chosing “the right constraint language”We considered semantic query answering through FOL reduction(i.e., SQL).Not any SQL query resulting from reformulation is handled well bycurrent RDBMSs!
Vast performance differences between different reformulations ofthe same query; cost-based approach
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 71 / 73
Open issues and perspectives
Where we stand
Constraints (a.k.a. semantics) are crucial for applications, so thepush is continuous for chosing “the right constraint language”We considered semantic query answering through FOL reduction(i.e., SQL).Not any SQL query resulting from reformulation is handled well bycurrent RDBMSs!Vast performance differences between different reformulations ofthe same query; cost-based approach
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 71 / 73
Open issues and perspectives
What is ahead
Continuous push on the expressivity - efficiency frontier;updates...
RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints
RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.
“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.
Most attractive languages currently: DL-Lite, Datalog±
New relevance of query optimization literature and research!
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73
Open issues and perspectives
What is ahead
Continuous push on the expressivity - efficiency frontier;updates...
RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints
RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.
“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.
Most attractive languages currently: DL-Lite, Datalog±
New relevance of query optimization literature and research!
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73
Open issues and perspectives
What is ahead
Continuous push on the expressivity - efficiency frontier;updates...
RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints
RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.
“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.
Most attractive languages currently: DL-Lite, Datalog±
New relevance of query optimization literature and research!
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73
Open issues and perspectives
What is ahead
Continuous push on the expressivity - efficiency frontier;updates...
RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints
RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.
“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.
Most attractive languages currently: DL-Lite, Datalog±
New relevance of query optimization literature and research!
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73
Open issues and perspectives
What is ahead
Continuous push on the expressivity - efficiency frontier;updates...
RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints
RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.
“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.
Most attractive languages currently: DL-Lite, Datalog±
New relevance of query optimization literature and research!
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73
Open issues and perspectives
What is ahead
Continuous push on the expressivity - efficiency frontier;updates...
RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints
RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.
“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.
Most attractive languages currently: DL-Lite, Datalog±
New relevance of query optimization literature and research!
I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73