scalable multi-query optimization for sparql - cs.utah.edulifeifei/papers/mqoslides.pdf · scalable...
Post on 12-Oct-2020
4 Views
Preview:
TRANSCRIPT
Scalable Multi-Query Optimization for SPARQL
Wangchao Le1 Anastasios Kementsietsidis2 Songyun Duan2 Feifei Li1
1University of Utah 2IBM Research
April 2, 2012
Outline
1 Introduction
2 Preliminary
3 Our approach
4 Experiments
5 Conclusions
2 / 113
Outline
1 Introduction
2 Preliminary
3 Our approach
4 Experiments
5 Conclusions
3 / 113
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
4 / 113
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etc
<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>
<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>
</rdf:Description></rdf:RDF>
Internally ...
A large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
5 / 113
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etc
<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>
<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>
</rdf:Description></rdf:RDF>
<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .
Triple format:
Internally ...
subject predicate object
A large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
6 / 113
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etc
<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>
<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>
</rdf:Description></rdf:RDF>
<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .
Triple format:
Internally ...
subject predicate object
A large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
7 / 113
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etc
<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>
<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>
</rdf:Description></rdf:RDF>
<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .
Triple format:
Internally ...
Query language: SPARQLsubject predicate object
A large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
8 / 113
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
9 / 113
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
10 / 113
Introduction
Q3
RDF store
Q1Q2
Qn−1QnSPARQL queries
11 / 113
Introduction
Q3
RDF store
Q1Q2
Qn−1Qn
• Observation: queries share common parts• Multi-query optimization
SPARQL queries
12 / 113
Introduction
A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]
SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.
[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.
[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.
[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.
[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.
[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.
For SPARQL and RDF, new issues arise in practice.
Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution
13 / 113
Introduction
A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]
SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.
[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.
[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.
[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.
[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.
[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.
For SPARQL and RDF, new issues arise in practice.
Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution
14 / 113
Introduction
A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]
SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.
[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.
[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.
[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.
[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.
[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.
For SPARQL and RDF, new issues arise in practice.
Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joins
Store dependent solution
15 / 113
Introduction
A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]
SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.
[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.
[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.
[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.
[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.
[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.
For SPARQL and RDF, new issues arise in practice.
Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution
16 / 113
Outline
1 Introduction
2 Preliminary
3 Our approach
4 Experiments
5 Conclusions
17 / 113
Preliminary
We foucs on two types of queries
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
18 / 113
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
19 / 113
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”
(a) triple table D
SELECT ?nameWHERE { ?x name ?name, ?x zip 10001,
}
(b) Example query QOPT
name”Alice””Bob””Ella”
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
20 / 113
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”
(a) triple table D
SELECT ?name , ?mail , ?hpageWHERE { ?x name ?name, ?x zip 10001,
OPTIONAL {?x mbox ?mail }OPTIONAL {?x www ?hpage }}
(b) Example query QOPT
name”Alice””Bob””Ella”
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
21 / 113
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”
(a) triple table D
SELECT ?name , ?mail , ?hpageWHERE { ?x name ?name, ?x zip 10001,
OPTIONAL {?x mbox ?mail }OPTIONAL {?x www ?hpage }}
(b) Example query QOPT
name mail hpage”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella
(c) Output QOPT(D)
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
22 / 113
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
23 / 113
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph G
Output: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
24 / 113
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queries
Requirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
25 / 113
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
26 / 113
Our approach
1 Introduction
2 Preliminary
3 Our approach
4 Experiments
5 Conclusions
27 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
28 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
: constant : variable
Q1: SELECT * WHERE{?x1 P1 ?z1 . ?y1 P2 ?z1 . ?y1 P3 ?w1 . ?w1 P4 v1 . }
29 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
30 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,
}
?z?x
?y
P1
P2
(I) Structure only QOPT
31 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,
OPTIONAL {?y P3 ?w , ?w P4 v1 }
}
?z?x
?y
P1
P2
P3
v1 ?wP4
(I) Structure only QOPT
32 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,
OPTIONAL {?y P3 ?w , ?w P4 v1 }OPTIONAL {?t P3 ?x , ?t P5 v1, ?w P4 v1 }
}
?z?x
?y
P1
P2
?t
P3
P5 P3
v1 ?wP4
(I) Structure only QOPT
33 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,
OPTIONAL {?y P3 ?w , ?w P4 v1 }OPTIONAL {?t P3 ?x , ?t P5 v1, ?w P4 v1 }
}
?z?x
?y
P1
P2
?t
P3
P5 P3
v1 ?wP4
Evaluated once→potential saving
(I) Structure only QOPT
OPTIONALs are evaluated on top of the common substructures
(intermediate results cached by engine).
34 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
pattern p α(p)
?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%
*Max common subquery is not selective
(II) Using cost in optimization
35 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
pattern p α(p)
?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%
*Max common subquery is not selective
(II) Using cost in optimization
36 / 113
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?w P4 v1,
OPTIONAL {?x1 P1 ?z1, ?y1 P2 ?z1, ?y1 P3 ?w }OPTIONAL {?x2 P1 ?z2, ?y2 P2 ?z2, ?t2 P3 ?x2, ?t2 P5 v1 }
}
(II) Using cost in optimization
37 / 113
Our approach
Q={q1, q2, . . . , qn}
38 / 113
Our approach
Q={q1, q2, . . . , qn} They often do not share one common subquery
39 / 113
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together
40 / 113
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
41 / 113
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting
42 / 113
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries(hierarchically)→ a set of type 2 queries
43 / 113
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries
• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery
(hierarchically)→ a set of type 2 queries
44 / 113
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries
• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery
?z?x
?y
P1
P2
?t
P3
P5 P3
v1 ?wP4
pattern p α(p)
?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%
(hierarchically)→ a set of type 2 queries
45 / 113
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries
• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery
Execution
(hierarchically)→ a set of type 2 queries
46 / 113
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries
• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery
Execution
Result distribution
r(q1) r(q2) r(qn)
(hierarchically)→ a set of type 2 queries
47 / 113
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queriesRD of a Type 1 query: e.g., X and Z
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesOPTIONAL queries
48 / 113
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
X Y Z
ab cd ef g
RD of a Type 1 query: e.g., X and Z↑↓
columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesOPTIONAL queries
49 / 113
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
X Y Z
ab cd ef g
RD of a Type 1 query: e.g., X and Z↑↓
columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesOPTIONAL queries
50 / 113
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
X Y Z
ab cd ef g
RD of a Type 1 query: e.g., X and Z↑↓
columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesOPTIONAL queries
51 / 113
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
Queries
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
52 / 113
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
1 5 10 15 20 25 30 35 40 45 50
10−1
100
101
Se
lectivity (
%)
Predicate ID
Selective Non−selective
RDF stores: Jena TDB 0.85 etc
Queries
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
53 / 113
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
Queries
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
54 / 113
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
QueriesEnsure randomness in structure, e.g., star, chain and circle
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
55 / 113
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
QueriesParameter Symbol Default RangeDataset size D 4M 3M to 9MNumber of queries |Q| 100 60 to 160Size of query (num of triple patterns) |Q| 6 5 to 9Number of seed queries κ 6 5 to 10Size of seed queries |qcmn| ∼ |Q|/2 1 to 5Max selectivity of patterns in Q αmax (Q) random 0.1% to 4%Min selectivity of patterns in Q αmin(Q) 1% 0.1% to 4%
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
56 / 113
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
QueriesParameter Symbol Default RangeDataset size D 4M 3M to 9MNumber of queries |Q| 100 60 to 160Size of query (num of triple patterns) |Q| 6 5 to 9Number of seed queries κ 6 5 to 10Size of seed queries |qcmn| ∼ |Q|/2 1 to 5Max selectivity of patterns in Q αmax (Q) random 0.1% to 4%Min selectivity of patterns in Q αmin(Q) 1% 0.1% to 4%
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
57 / 113
Experiments
Time on rewritingMQO-S-C: structure based rewriting
MQO-C: rewriting integrating with cost
58 / 113
Experiments
Time on rewritingMQO-S-C: structure based rewriting
MQO-C: rewriting integrating with cost
60 80 100 120 140 1600
0.25
0.5
0.75
1
1.25
1.5
Tim
e (s
econ
ds)
MQO−S−C MQO−C
|Q|
*Costly/bad rewritings are rejected→more rounds of comparisons.
59 / 113
Experiments
Time on distributing resultsMQO-S-P: parsing results from MQO-S
MQO-P: parsing results with MQO
60 / 113
Experiments
Time on distributing resultsMQO-S-P: parsing results from MQO-S
MQO-P: parsing results with MQO
60 80 100 120 140 1600
0.25
0.5
0.75
1
1.25
1.5
Tim
e (s
econ
ds)
MQO−S−P MQO−P
|Q|
*Non-selective common subqueriesincrease the set of results.
61 / 113
Experiments
Time on distributing resultsMQO-S-P: parsing results from MQO-S
MQO-P: parsing results with MQO
60 80 100 120 140 1600
0.25
0.5
0.75
1
1.25
1.5
Tim
e (s
econ
ds)
MQO−S−P MQO−P
|Q|
*Non-selective common subqueriesincrease the set of results.
*Both rewriting and parsingare efficiently doable
62 / 113
Experiments
Varying num of queries in a batchNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
63 / 113
Experiments
Varying num of queries in a batchNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
60 80 100 120 140 1600
50
100
150
Num
ber
of q
uerie
s
No−MQO MQO−S MQO
|Q|
*Both reduce the num ofqueries to be exectued
64 / 113
Experiments
Varying num of queries in a batchNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
60 80 100 120 140 1600
100
200
300
400
Tim
e (s
econ
ds)
No−MQO MQO−S MQO
|Q|65 / 113
Experiments
Varying min. selectivity in seed queriesNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
66 / 113
Experiments
Varying min. selectivity in seed queriesNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
0.1 0.5 1 2 40
25
50
75
100
Num
ber
of q
uerie
s
No−MQO MQO−S MQO
αmin(qcmn) (%)
*MQO: reject more bad rewritings;MQO-S: not sensitive
67 / 113
Experiments
Varying min. selectivity in seed queriesNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
0.1 0.5 1 2 40
100
200
300
Tim
e (s
econ
ds)
No−MQO MQO−S MQO
αmin(qcmn) (%)68 / 113
Experiments
Varying seed sizeMQO-S: optimization based on structural rewriting
MQO: integrating cost
percentage=Te (common subquery)Te (Qopt )
× 100%
69 / 113
Experiments
Varying seed sizeMQO-S: optimization based on structural rewriting
MQO: integrating cost
percentage=Te (common subquery)Te (Qopt )
× 100%
1 2 3 4 5
60
80
100
Per
cent
age
(%)
MQO−S MQO
|qcmn|
No−OPTIONAL− − −
*MQO-S: up to 25% time on optional
70 / 113
Conclusions
In dealing RDF data on the Web, store independency is important
Combining SPARQL language and graph algorithms can achieveMQO, i.e., by rewriting queries
Cost must be taken in consideration during rewriting
71 / 113
The End
Thank You
Q and A
72 / 113
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
73 / 113
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
74 / 113
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
75 / 113
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p7
p7
p8
p9 p9
p8
p7p10
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p3
p7
p9p8
p9
p8p6
p9
p7p7
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
76 / 113
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p7
p7
p8
p9 p9
p8
p7p10
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p3
p7
p9p8
p9
p8p6
p9
p7p7
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
Distance: Jaccard similarity on predicates
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
77 / 113
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p4
p5p6
p9
p7p7
{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}
{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
p7
p7
p8
p9 p9
p8
p7p10
p7
p9p8
p9
Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
78 / 113
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p4
p5p6
p9
p7p7
{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}
{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
p7
p7
p8
p9 p9
p8
p7p10
p7
p9p8
p9
Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
79 / 113
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p4
p5p6
p9
p7p7
{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}
{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
p7
p7
p8
p9 p9
p8
p7p10
p7
p9p8
p9
Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
80 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
81 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
qab qcd
qabcd
qa qb qc qd . . . . . . . . . . . . qr qs
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
82 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcd
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
83 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcd
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
84 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
85 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
86 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
87 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
88 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs→ maximal common connected induced sugraphs in linegraphs
89 / 113
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs→ maximal common connected induced sugraphs in linegraphs→ maximal cliques in the product graph
90 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
91 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
92 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
93 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
94 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
95 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
graph query pattern 1 graph query pattern 2
The triangle (clique) highlights the common subgraph composed by
96 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
graph query pattern 1 graph query pattern 2
The triangle (clique) highlights the common subgraph composed by
97 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
• optimize the product graph
graph query pattern 1 graph query pattern 2
The triangle (clique) highlights the common subgraph composed by
98 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
• optimize the product graph
• prune non-common predicates• check the constants
99 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
• optimize the product graph
• prune non-common predicates• check the constants• prune vertices with non-common
neighborhoods
100 / 113
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
• optimize the product graph
• prune non-common predicates• check the constants• prune vertices with non-common
neighborhoods
`3
`3
P2P1
L(GPp):
S: P3 P4
101 / 113
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
102 / 113
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
103 / 113
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
104 / 113
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritings
Measure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
105 / 113
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximation
Cost: discard bad rewritings, keep good ones in hierarchical rewriting
106 / 113
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
107 / 113
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
108 / 113
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcd
Stop when the selectivitydrops
109 / 113
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queriesRD of a Type 1 query
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesnested OPTIONALs
110 / 113
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
name mail hpage
”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella
RD of a Type 1 query
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesnested OPTIONALs
111 / 113
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
name mail hpage
”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella
RD of a Type 1 query
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesnested OPTIONALs
112 / 113
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
name mail hpage
”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella
RD of a Type 1 query
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesnested OPTIONALs
113 / 113
top related