scalable multi-query optimization for sparql - cs.utah.edulifeifei/papers/mqoslides.pdf · scalable...
TRANSCRIPT
![Page 1: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/1.jpg)
Scalable Multi-Query Optimization for SPARQL
Wangchao Le1 Anastasios Kementsietsidis2 Songyun Duan2 Feifei Li1
1University of Utah 2IBM Research
April 2, 2012
![Page 2: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/2.jpg)
Outline
1 Introduction
2 Preliminary
3 Our approach
4 Experiments
5 Conclusions
2 / 113
![Page 3: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/3.jpg)
Outline
1 Introduction
2 Preliminary
3 Our approach
4 Experiments
5 Conclusions
3 / 113
![Page 4: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/4.jpg)
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
4 / 113
![Page 5: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/5.jpg)
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etc
<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>
<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>
</rdf:Description></rdf:RDF>
Internally ...
A large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
5 / 113
![Page 6: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/6.jpg)
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etc
<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>
<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>
</rdf:Description></rdf:RDF>
<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .
Triple format:
Internally ...
subject predicate object
A large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
6 / 113
![Page 7: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/7.jpg)
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etc
<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>
<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>
</rdf:Description></rdf:RDF>
<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .
Triple format:
Internally ...
subject predicate object
A large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
7 / 113
![Page 8: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/8.jpg)
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etc
<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>
<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>
</rdf:Description></rdf:RDF>
<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .
Triple format:
Internally ...
Query language: SPARQLsubject predicate object
A large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
8 / 113
![Page 9: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/9.jpg)
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
9 / 113
![Page 10: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/10.jpg)
Introduction
We are inundated with a large collection of RDF (ResourceDescription Framework) data.
DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics
Available engines to manage RDF data?
RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.
[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.
[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.
[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.
[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.
[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.
[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.
10 / 113
![Page 11: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/11.jpg)
Introduction
Q3
RDF store
Q1Q2
Qn−1QnSPARQL queries
11 / 113
![Page 12: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/12.jpg)
Introduction
Q3
RDF store
Q1Q2
Qn−1Qn
• Observation: queries share common parts• Multi-query optimization
SPARQL queries
12 / 113
![Page 13: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/13.jpg)
Introduction
A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]
SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.
[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.
[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.
[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.
[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.
[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.
For SPARQL and RDF, new issues arise in practice.
Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution
13 / 113
![Page 14: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/14.jpg)
Introduction
A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]
SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.
[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.
[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.
[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.
[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.
[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.
For SPARQL and RDF, new issues arise in practice.
Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution
14 / 113
![Page 15: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/15.jpg)
Introduction
A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]
SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.
[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.
[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.
[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.
[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.
[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.
For SPARQL and RDF, new issues arise in practice.
Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joins
Store dependent solution
15 / 113
![Page 16: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/16.jpg)
Introduction
A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]
SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.
[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.
[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.
[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.
[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.
[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.
For SPARQL and RDF, new issues arise in practice.
Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution
16 / 113
![Page 17: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/17.jpg)
Outline
1 Introduction
2 Preliminary
3 Our approach
4 Experiments
5 Conclusions
17 / 113
![Page 18: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/18.jpg)
Preliminary
We foucs on two types of queries
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
18 / 113
![Page 19: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/19.jpg)
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
19 / 113
![Page 20: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/20.jpg)
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”
(a) triple table D
SELECT ?nameWHERE { ?x name ?name, ?x zip 10001,
}
(b) Example query QOPT
name”Alice””Bob””Ella”
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
20 / 113
![Page 21: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/21.jpg)
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”
(a) triple table D
SELECT ?name , ?mail , ?hpageWHERE { ?x name ?name, ?x zip 10001,
OPTIONAL {?x mbox ?mail }OPTIONAL {?x www ?hpage }}
(b) Example query QOPT
name”Alice””Bob””Ella”
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
21 / 113
![Page 22: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/22.jpg)
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”
(a) triple table D
SELECT ?name , ?mail , ?hpageWHERE { ?x name ?name, ?x zip 10001,
OPTIONAL {?x mbox ?mail }OPTIONAL {?x www ?hpage }}
(b) Example query QOPT
name mail hpage”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella
(c) Output QOPT(D)
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
22 / 113
![Page 23: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/23.jpg)
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
23 / 113
![Page 24: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/24.jpg)
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph G
Output: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
24 / 113
![Page 25: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/25.jpg)
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queries
Requirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
25 / 113
![Page 26: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/26.jpg)
Preliminary
We foucs on two types of queries
Type 1: Q := SELECT RD WHERE GP
Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+
Problem statement.
Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:
soundness and completeness: QOPT(G)≡Q (G).
cost:Tr (Q)+Te (Qopt)
Te (Q)≤ 1
26 / 113
![Page 27: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/27.jpg)
Our approach
1 Introduction
2 Preliminary
3 Our approach
4 Experiments
5 Conclusions
27 / 113
![Page 28: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/28.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
28 / 113
![Page 29: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/29.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
: constant : variable
Q1: SELECT * WHERE{?x1 P1 ?z1 . ?y1 P2 ?z1 . ?y1 P3 ?w1 . ?w1 P4 v1 . }
29 / 113
![Page 30: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/30.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
30 / 113
![Page 31: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/31.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,
}
?z?x
?y
P1
P2
(I) Structure only QOPT
31 / 113
![Page 32: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/32.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,
OPTIONAL {?y P3 ?w , ?w P4 v1 }
}
?z?x
?y
P1
P2
P3
v1 ?wP4
(I) Structure only QOPT
32 / 113
![Page 33: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/33.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,
OPTIONAL {?y P3 ?w , ?w P4 v1 }OPTIONAL {?t P3 ?x , ?t P5 v1, ?w P4 v1 }
}
?z?x
?y
P1
P2
?t
P3
P5 P3
v1 ?wP4
(I) Structure only QOPT
33 / 113
![Page 34: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/34.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,
OPTIONAL {?y P3 ?w , ?w P4 v1 }OPTIONAL {?t P3 ?x , ?t P5 v1, ?w P4 v1 }
}
?z?x
?y
P1
P2
?t
P3
P5 P3
v1 ?wP4
Evaluated once→potential saving
(I) Structure only QOPT
OPTIONALs are evaluated on top of the common substructures
(intermediate results cached by engine).
34 / 113
![Page 35: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/35.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
pattern p α(p)
?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%
*Max common subquery is not selective
(II) Using cost in optimization
35 / 113
![Page 36: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/36.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
pattern p α(p)
?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%
*Max common subquery is not selective
(II) Using cost in optimization
36 / 113
![Page 37: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/37.jpg)
Motivating example
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2
P4?w2
?t2
SELECT *WHERE { ?w P4 v1,
OPTIONAL {?x1 P1 ?z1, ?y1 P2 ?z1, ?y1 P3 ?w }OPTIONAL {?x2 P1 ?z2, ?y2 P2 ?z2, ?t2 P3 ?x2, ?t2 P5 v1 }
}
(II) Using cost in optimization
37 / 113
![Page 38: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/38.jpg)
Our approach
Q={q1, q2, . . . , qn}
38 / 113
![Page 39: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/39.jpg)
Our approach
Q={q1, q2, . . . , qn} They often do not share one common subquery
39 / 113
![Page 40: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/40.jpg)
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together
40 / 113
![Page 41: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/41.jpg)
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
41 / 113
![Page 42: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/42.jpg)
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting
42 / 113
![Page 43: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/43.jpg)
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries(hierarchically)→ a set of type 2 queries
43 / 113
![Page 44: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/44.jpg)
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries
• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery
(hierarchically)→ a set of type 2 queries
44 / 113
![Page 45: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/45.jpg)
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries
• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery
?z?x
?y
P1
P2
?t
P3
P5 P3
v1 ?wP4
pattern p α(p)
?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%
(hierarchically)→ a set of type 2 queries
45 / 113
![Page 46: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/46.jpg)
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries
• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery
Execution
(hierarchically)→ a set of type 2 queries
46 / 113
![Page 47: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/47.jpg)
Our approach
Q={q1, q2, . . . , qn}
Paritition input queries
Group 1 Group 2 Group k
• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets
Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries
• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery
Execution
Result distribution
r(q1) r(q2) r(qn)
(hierarchically)→ a set of type 2 queries
47 / 113
![Page 48: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/48.jpg)
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queriesRD of a Type 1 query: e.g., X and Z
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesOPTIONAL queries
48 / 113
![Page 49: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/49.jpg)
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
X Y Z
ab cd ef g
RD of a Type 1 query: e.g., X and Z↑↓
columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesOPTIONAL queries
49 / 113
![Page 50: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/50.jpg)
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
X Y Z
ab cd ef g
RD of a Type 1 query: e.g., X and Z↑↓
columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesOPTIONAL queries
50 / 113
![Page 51: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/51.jpg)
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
X Y Z
ab cd ef g
RD of a Type 1 query: e.g., X and Z↑↓
columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesOPTIONAL queries
51 / 113
![Page 52: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/52.jpg)
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
Queries
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
52 / 113
![Page 53: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/53.jpg)
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
1 5 10 15 20 25 30 35 40 45 50
10−1
100
101
Se
lectivity (
%)
Predicate ID
Selective Non−selective
RDF stores: Jena TDB 0.85 etc
Queries
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
53 / 113
![Page 54: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/54.jpg)
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
Queries
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
54 / 113
![Page 55: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/55.jpg)
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
QueriesEnsure randomness in structure, e.g., star, chain and circle
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
55 / 113
![Page 56: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/56.jpg)
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
QueriesParameter Symbol Default RangeDataset size D 4M 3M to 9MNumber of queries |Q| 100 60 to 160Size of query (num of triple patterns) |Q| 6 5 to 9Number of seed queries κ 6 5 to 10Size of seed queries |qcmn| ∼ |Q|/2 1 to 5Max selectivity of patterns in Q αmax (Q) random 0.1% to 4%Min selectivity of patterns in Q αmin(Q) 1% 0.1% to 4%
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
56 / 113
![Page 57: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/57.jpg)
Experiments
Implementation highlights
C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory
Dataset
Extend LUBM benchmark generator:randomness in structure, variances of sel.
RDF stores: Jena TDB 0.85 etc
QueriesParameter Symbol Default RangeDataset size D 4M 3M to 9MNumber of queries |Q| 100 60 to 160Size of query (num of triple patterns) |Q| 6 5 to 9Number of seed queries κ 6 5 to 10Size of seed queries |qcmn| ∼ |Q|/2 1 to 5Max selectivity of patterns in Q αmax (Q) random 0.1% to 4%Min selectivity of patterns in Q αmin(Q) 1% 0.1% to 4%
Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO
57 / 113
![Page 58: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/58.jpg)
Experiments
Time on rewritingMQO-S-C: structure based rewriting
MQO-C: rewriting integrating with cost
58 / 113
![Page 59: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/59.jpg)
Experiments
Time on rewritingMQO-S-C: structure based rewriting
MQO-C: rewriting integrating with cost
60 80 100 120 140 1600
0.25
0.5
0.75
1
1.25
1.5
Tim
e (s
econ
ds)
MQO−S−C MQO−C
|Q|
*Costly/bad rewritings are rejected→more rounds of comparisons.
59 / 113
![Page 60: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/60.jpg)
Experiments
Time on distributing resultsMQO-S-P: parsing results from MQO-S
MQO-P: parsing results with MQO
60 / 113
![Page 61: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/61.jpg)
Experiments
Time on distributing resultsMQO-S-P: parsing results from MQO-S
MQO-P: parsing results with MQO
60 80 100 120 140 1600
0.25
0.5
0.75
1
1.25
1.5
Tim
e (s
econ
ds)
MQO−S−P MQO−P
|Q|
*Non-selective common subqueriesincrease the set of results.
61 / 113
![Page 62: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/62.jpg)
Experiments
Time on distributing resultsMQO-S-P: parsing results from MQO-S
MQO-P: parsing results with MQO
60 80 100 120 140 1600
0.25
0.5
0.75
1
1.25
1.5
Tim
e (s
econ
ds)
MQO−S−P MQO−P
|Q|
*Non-selective common subqueriesincrease the set of results.
*Both rewriting and parsingare efficiently doable
62 / 113
![Page 63: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/63.jpg)
Experiments
Varying num of queries in a batchNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
63 / 113
![Page 64: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/64.jpg)
Experiments
Varying num of queries in a batchNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
60 80 100 120 140 1600
50
100
150
Num
ber
of q
uerie
s
No−MQO MQO−S MQO
|Q|
*Both reduce the num ofqueries to be exectued
64 / 113
![Page 65: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/65.jpg)
Experiments
Varying num of queries in a batchNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
60 80 100 120 140 1600
100
200
300
400
Tim
e (s
econ
ds)
No−MQO MQO−S MQO
|Q|65 / 113
![Page 66: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/66.jpg)
Experiments
Varying min. selectivity in seed queriesNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
66 / 113
![Page 67: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/67.jpg)
Experiments
Varying min. selectivity in seed queriesNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
0.1 0.5 1 2 40
25
50
75
100
Num
ber
of q
uerie
s
No−MQO MQO−S MQO
αmin(qcmn) (%)
*MQO: reject more bad rewritings;MQO-S: not sensitive
67 / 113
![Page 68: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/68.jpg)
Experiments
Varying min. selectivity in seed queriesNo-MQO: no optimization
MQO-S: optimization based on structural rewriting
MQO: integrating cost
0.1 0.5 1 2 40
100
200
300
Tim
e (s
econ
ds)
No−MQO MQO−S MQO
αmin(qcmn) (%)68 / 113
![Page 69: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/69.jpg)
Experiments
Varying seed sizeMQO-S: optimization based on structural rewriting
MQO: integrating cost
percentage=Te (common subquery)Te (Qopt )
× 100%
69 / 113
![Page 70: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/70.jpg)
Experiments
Varying seed sizeMQO-S: optimization based on structural rewriting
MQO: integrating cost
percentage=Te (common subquery)Te (Qopt )
× 100%
1 2 3 4 5
60
80
100
Per
cent
age
(%)
MQO−S MQO
|qcmn|
No−OPTIONAL− − −
*MQO-S: up to 25% time on optional
70 / 113
![Page 71: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/71.jpg)
Conclusions
In dealing RDF data on the Web, store independency is important
Combining SPARQL language and graph algorithms can achieveMQO, i.e., by rewriting queries
Cost must be taken in consideration during rewriting
71 / 113
![Page 72: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/72.jpg)
The End
Thank You
Q and A
72 / 113
![Page 73: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/73.jpg)
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
73 / 113
![Page 74: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/74.jpg)
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
74 / 113
![Page 75: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/75.jpg)
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
75 / 113
![Page 76: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/76.jpg)
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p7
p7
p8
p9 p9
p8
p7p10
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p3
p7
p9p8
p9
p8p6
p9
p7p7
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
76 / 113
![Page 77: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/77.jpg)
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p7
p7
p8
p9 p9
p8
p7p10
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p3
p7
p9p8
p9
p8p6
p9
p7p7
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
Distance: Jaccard similarity on predicates
Represent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
77 / 113
![Page 78: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/78.jpg)
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p4
p5p6
p9
p7p7
{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}
{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
p7
p7
p8
p9 p9
p8
p7p10
p7
p9p8
p9
Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
78 / 113
![Page 79: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/79.jpg)
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p4
p5p6
p9
p7p7
{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}
{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
p7
p7
p8
p9 p9
p8
p7p10
p7
p9p8
p9
Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
79 / 113
![Page 80: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/80.jpg)
Our approach
Partition input queries
Object: similar queries can be optimized together in rewriting
p1
p2 p3
p1
p3p2 p4
p1
p2 p3
p7
p8p9
p2
p1
p1p3
p4
p4
p2p3
p3
p2p2 p1
p1p3
p4 p1
p2
p4
p2
p1p4
p5p6
p9
p7p7
{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}
{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}
q1 q2 q3 q4 q5 q6
q12q11q10q9q8q7
p4
p7
p7
p8
p9 p9
p8
p7p10
p7
p9p8
p9
Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.
Measure the similarity of a pair of queries by set similarity
Grouping: k-means
80 / 113
![Page 81: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/81.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
81 / 113
![Page 82: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/82.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
qab qcd
qabcd
qa qb qc qd . . . . . . . . . . . . qr qs
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
82 / 113
![Page 83: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/83.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcd
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
83 / 113
![Page 84: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/84.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcd
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
84 / 113
![Page 85: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/85.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
85 / 113
![Page 86: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/86.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .
[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
86 / 113
![Page 87: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/87.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
87 / 113
![Page 88: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/88.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs
88 / 113
![Page 89: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/89.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs→ maximal common connected induced sugraphs in linegraphs
89 / 113
![Page 90: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/90.jpg)
Our approach
Hierarchical rewriting and clustering (inside a group)
Rewrite pairs of queries bottom up
Pair up queries with max Jaccard similarity
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcdrewrite
Rewriting → finding maximal common triple patterns
In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.
maximal common connected edge subgraphs→ maximal common connected induced sugraphs in linegraphs→ maximal cliques in the product graph
90 / 113
![Page 91: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/91.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
91 / 113
![Page 92: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/92.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
92 / 113
![Page 93: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/93.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
93 / 113
![Page 94: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/94.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
94 / 113
![Page 95: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/95.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
95 / 113
![Page 96: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/96.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
graph query pattern 1 graph query pattern 2
The triangle (clique) highlights the common subgraph composed by
96 / 113
![Page 97: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/97.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
graph query pattern 1 graph query pattern 2
The triangle (clique) highlights the common subgraph composed by
97 / 113
![Page 98: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/98.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
• optimize the product graph
graph query pattern 1 graph query pattern 2
The triangle (clique) highlights the common subgraph composed by
98 / 113
![Page 99: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/99.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
• optimize the product graph
• prune non-common predicates• check the constants
99 / 113
![Page 100: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/100.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
• optimize the product graph
• prune non-common predicates• check the constants• prune vertices with non-common
neighborhoods
100 / 113
![Page 101: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/101.jpg)
Our approach
?z4?x4
?y4
P1
P2
v1
?z3?x3
?y3
P1
P2P3
P5
v1
?z2?x2
?y2
P1
P2P3
P5
?w1v1
?z1?x1
?y1
P1
P2
P4
P3
(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4
P4 P4?w2
?t2
?w3
?u4
P4?w4
P3
P6
v1
• Linegraph: invert vertices and edges
• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3
P2
P1
P3
P4
`3
`3
`0
`0
`1
`2
P5
P1 P3
`3
`3`3
`0
P2
`3
`1`2
`1
`2
(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)
`0
P2P1
P3 P5
P4
`3
`3
`2 `1`0
`0
`3`3
P4
`2 `1
P1P2
P3 P6
P4
`3
`3
`0 `0`3
`3
`0`0
• product graph: simultaneous walk
• blowup in size, esp. > 2 queries
affect clique detection
• optimize the product graph
• prune non-common predicates• check the constants• prune vertices with non-common
neighborhoods
`3
`3
P2P1
L(GPp):
S: P3 P4
101 / 113
![Page 102: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/102.jpg)
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
102 / 113
![Page 103: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/103.jpg)
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
103 / 113
![Page 104: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/104.jpg)
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
104 / 113
![Page 105: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/105.jpg)
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritings
Measure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
105 / 113
![Page 106: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/106.jpg)
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximation
Cost: discard bad rewritings, keep good ones in hierarchical rewriting
106 / 113
![Page 107: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/107.jpg)
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
107 / 113
![Page 108: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/108.jpg)
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
108 / 113
![Page 109: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/109.jpg)
Our approach
Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.
[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.
Integrate cost into rewriting
Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting
qa qb qc qd . . . . . . . . . . . . qr qs
qab qcd
qabcd
Stop when the selectivitydrops
109 / 113
![Page 110: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/110.jpg)
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queriesRD of a Type 1 query
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesnested OPTIONALs
110 / 113
![Page 111: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/111.jpg)
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
name mail hpage
”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella
RD of a Type 1 query
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesnested OPTIONALs
111 / 113
![Page 112: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/112.jpg)
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
name mail hpage
”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella
RD of a Type 1 query
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesnested OPTIONALs
112 / 113
![Page 113: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun](https://reader035.vdocument.in/reader035/viewer/2022071215/60444fccd575c145c923daca/html5/thumbnails/113.jpg)
Our approach
Related issues
Distributing results, i.e., Type 2 query→Type 1 queries
name mail hpage
”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella
RD of a Type 1 query
↑↓columns from results of the Type 2 rewriting
Soundness and completeness
Extensibility of the solution: more general queries
handle variable predicatesnested OPTIONALs
113 / 113