PhD Defense - Pages personelles de Louis Jachiet · louis.jachiet.com/these/slide.pdf

>>> PhD Defense

>>> On the foundations for the compilation of web data queries: optimization and distributed evaluation of SPARQL.

Date: 13 September, 2018
Defendant: Louis JACHIET

Directors: Nabil LAYAÏDA, Pierre GENEVÈS

Reviewers: Dario COLAZZO, Ioana MANOLESCU

Jury: Jérôme EUZENAT, Patrick VALDURIEZ

[1/40]

>>> Introduction / Motivation / Questions

What are the museums in Grenoble?

Who do they exhibit?

[2/40]

>>> Introduction / Motivation / Answering with Wikipedia?

[3/40]

>>> Introduction / RDF & SPARQL / Goals

The goals of a data standard are to:

* Encode data in a machine-readable and processable way

* Allow exchange of data on the web

* Facilitate querying

[4/40]

>>> Introduction / RDF & SPARQL / The RDF standard

RDF: Resource Description Framework [RCM14]

In RDF, data is represented as entities and as statements expressing relationships between these entities.

In N-Triples format (IRIs abbreviated with a ":" prefix):

subject predicate object
:Musée_de_Grenoble :isA :Museum .
:Musée_de_Grenoble :locatedIn :Grenoble .
:Musée_de_Grenoble :exhibits :Chagall .
:Musée_de_Grenoble :exhibits :Fantin-Latour .
:Perret_Tower :locatedIn :Grenoble .
:Louvre :isA :Museum .

[5/40]
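
As a rough illustration of the two views of RDF data (a list of statements and a labeled graph), the six example triples can be held in a plain Python set. The names `TRIPLES`, `nodes` and `edges_from` are made up for this sketch and do not come from the slides.

```python
# The example triples from the slide, stored as (subject, predicate, object) tuples.
TRIPLES = {
    (":Musée_de_Grenoble", ":isA", ":Museum"),
    (":Musée_de_Grenoble", ":locatedIn", ":Grenoble"),
    (":Musée_de_Grenoble", ":exhibits", ":Chagall"),
    (":Musée_de_Grenoble", ":exhibits", ":Fantin-Latour"),
    (":Perret_Tower", ":locatedIn", ":Grenoble"),
    (":Louvre", ":isA", ":Museum"),
}


def nodes(triples):
    """Nodes of the graph view: every subject and every object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def edges_from(triples, subject):
    """Outgoing labeled edges (predicate, object) of one node, sorted."""
    return sorted((p, o) for s, p, o in triples if s == subject)
```

Each triple is one labeled edge of the graph, so `edges_from(TRIPLES, ":Musée_de_Grenoble")` lists the four edges leaving that node.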

>>> Introduction / RDF & SPARQL / The RDF standard

Viewed as graphs:

[Figure: the triples above drawn as a graph. Nodes: :Musée_de_Grenoble, :Museum, :Grenoble, :Perret_Tower, :Louvre, :Chagall, :Fantin-Latour. Edges labeled :isA, :exhibits, :locatedIn.]

[5/40]

>>> Introduction / SPARQL / Triple Patterns

Triple Pattern (TP)

A Triple Pattern is a triple (s, p, o) where s, p, o are constants or variables.

“What are the museums?”

[Figure: ?what --:isA--> :Museum]

In SPARQL:

?what :isA :Museum .

Solution:

(?what → :Musée_de_Grenoble)
(?what → :Louvre)

[6/40]
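
A minimal sketch of how a single triple pattern can be matched against the example data, assuming triples are stored as Python tuples and variables are strings starting with "?". The helper names `is_var` and `match_tp` are hypothetical.

```python
TRIPLES = {
    (":Musée_de_Grenoble", ":isA", ":Museum"),
    (":Musée_de_Grenoble", ":locatedIn", ":Grenoble"),
    (":Musée_de_Grenoble", ":exhibits", ":Chagall"),
    (":Musée_de_Grenoble", ":exhibits", ":Fantin-Latour"),
    (":Perret_Tower", ":locatedIn", ":Grenoble"),
    (":Louvre", ":isA", ":Museum"),
}


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_tp(pattern, triples):
    """Return the solutions of one triple pattern as a list of mappings (dicts)."""
    solutions = []
    for triple in triples:
        mapping = {}
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable used twice in the pattern must bind to the same value.
                if mapping.setdefault(term, value) != value:
                    break
            elif term != value:
                # A constant must match the triple exactly.
                break
        else:
            solutions.append(mapping)
    return solutions


museums = match_tp(("?what", ":isA", ":Museum"), TRIPLES)
```

Here `museums` holds one mapping per matching triple, which corresponds to the two solutions listed on the slide.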

>>> Introduction / SPARQL / Basic Graph Patterns

Basic Graph Patterns (BGP)

Basic Graph Patterns are conjunctions of Triple Patterns.

“What are the museums in Grenoble exhibiting Chagall?”

[Figure: ?what --:isA--> :Museum; ?what --:locatedIn--> :Grenoble; ?what --:exhibits--> :Chagall]

In SPARQL:

?what :isA :Museum .
?what :locatedIn :Grenoble .
?what :exhibits :Chagall .

Solution:

(?what → :Musée_de_Grenoble)

[7/40]

>>> Introduction / SPARQL / Basic Graph Patterns

“What are the museums in Grenoble and who do they exhibit?”

[Figure: ?what --:isA--> :Museum; ?what --:locatedIn--> :Grenoble; ?what --:exhibits--> ?artist]

In SPARQL:

?what :isA :Museum .
?what :locatedIn :Grenoble .
?what :exhibits ?artist .

Solution:

(?what → :Musée_de_Grenoble; ?artist → :Chagall)
(?what → :Musée_de_Grenoble; ?artist → :Fantin-Latour)

[8/40]

>>> Introduction / SPARQL / The SPARQL standard 1.0

Other graph patterns in SPARQL 1.0 [PS+08]

* Triple Patterns

* Conjunction: Basic Graph Patterns

* Disjunction: Museums or Trekking paths

* Filters: Museums with artists born before 1900

* Conditional optionals: Missing information

* Changing graphs, etc.

[9/40]

>>> Introduction / SPARQL / The SPARQL standard 1.1

New features of SPARQL 1.1 [HSP13]

* Expressions: Compute the age of monuments

* Minus, Exists: Cities with museums but without operas

* Group by & Aggregation: Count the number of museums per city

* Property Paths

[10/40]

>>> Introduction / SPARQL / Property Paths

“What are the museums in France?”

[Figure: data graph. :Musée_de_Grenoble --:isA--> :Museum; :Musée_de_Grenoble --:locatedIn--> :Grenoble --:locatedIn--> :Isère --:locatedIn--> :ARA --:locatedIn--> :France]

[Figure: query pattern. ?what --:isA--> :Museum; ?what --:locatedIn+--> :France]

Property Path (PP)

A Property Path is a triple (s, r, o) where r is a path expression.

[11/40]
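
To illustrate what a path expression such as :locatedIn+ asks for, the sketch below computes all pairs linked by one or more :locatedIn edges and then filters the museums. The chain Grenoble, Isère, ARA, France is read off the figure; the function name `one_or_more` and the pair-based representation are assumptions of this example, not the method of the thesis.

```python
TRIPLES = {
    (":Musée_de_Grenoble", ":isA", ":Museum"),
    (":Musée_de_Grenoble", ":locatedIn", ":Grenoble"),
    (":Grenoble", ":locatedIn", ":Isère"),
    (":Isère", ":locatedIn", ":ARA"),
    (":ARA", ":locatedIn", ":France"),
}


def one_or_more(triples, predicate):
    """Pairs (s, o) linked by a path of one or more `predicate` edges."""
    step = {(s, o) for s, p, o in triples if p == predicate}
    closure = set(step)
    while True:
        # Extend every known path by one more edge.
        longer = {(s, o2) for s, o in closure for s2, o2 in step if o == s2}
        if longer <= closure:
            return closure  # nothing new: the closure is complete
        closure |= longer


located_in_plus = one_or_more(TRIPLES, ":locatedIn")
museums_in_france = sorted(
    s
    for s, p, o in TRIPLES
    if p == ":isA" and o == ":Museum" and (s, ":France") in located_in_plus
)
```

A plain triple pattern `?what :locatedIn :France` would find nothing here, since the museum is four :locatedIn edges away from :France.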

>>> Introduction / Query evaluation / BGP naive

[Figure: data graph. Nodes: :Musée_de_Grenoble, :Museum, :Grenoble, :Chagall, :Fantin-Latour, :Perret_Tower, :Louvre. Edges labeled :isA, :exhibits, :locatedIn.]

“What are the museums in Grenoble and who do they exhibit?”

[Figure: query pattern. ?what --:isA--> :Museum; ?what --:locatedIn--> :Grenoble; ?what --:exhibits--> ?artist]

(?what → :Musée_de_Grenoble; ?artist → :Chagall)
(?what → :Musée_de_Grenoble; ?artist → :Fantin-Latour)

O(#nodes^#variables) checks!

[12/40]
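
The naive strategy can be sketched as a brute-force loop over every assignment of graph nodes to query variables, which makes the #nodes^#variables bound visible. This sketch assumes variables occur only in subject or object position; `naive_bgp` is a hypothetical name.

```python
from itertools import product

TRIPLES = {
    (":Musée_de_Grenoble", ":isA", ":Museum"),
    (":Musée_de_Grenoble", ":locatedIn", ":Grenoble"),
    (":Musée_de_Grenoble", ":exhibits", ":Chagall"),
    (":Musée_de_Grenoble", ":exhibits", ":Fantin-Latour"),
    (":Perret_Tower", ":locatedIn", ":Grenoble"),
    (":Louvre", ":isA", ":Museum"),
}

BGP = [
    ("?what", ":isA", ":Museum"),
    ("?what", ":locatedIn", ":Grenoble"),
    ("?what", ":exhibits", "?artist"),
]


def naive_bgp(bgp, triples):
    """Test every assignment of nodes to variables; return (solutions, #checks)."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    variables = sorted({t for tp in bgp for t in tp if t.startswith("?")})
    solutions, checks = [], 0
    # One candidate per element of nodes^variables.
    for values in product(nodes, repeat=len(variables)):
        checks += 1
        mapping = dict(zip(variables, values))
        # Substitute variables, then require every triple pattern to be in the data.
        if all(tuple(mapping.get(t, t) for t in tp) in triples for tp in bgp):
            solutions.append(mapping)
    return solutions, checks


solutions, checks = naive_bgp(BGP, TRIPLES)
```

With 7 nodes and 2 variables the loop tests 49 candidates to find 2 solutions, and the count grows exponentially with each extra variable.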

>>> Introduction / Query evaluation / BGP one-by-one

Compute solution to each individual TP

[Figure: the three triple patterns of the query, labeled TP1, TP2 and TP3: ?what --:isA--> :Museum; ?what --:locatedIn--> :Grenoble; ?what --:exhibits--> ?artist]

O(#edges) per TP

Combine individual solutions

(TP1 ⋈ TP2) ⋈ TP3

O(|A| + |B| + |A ⋈ B|) per A ⋈ B

[13/40]
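
A sketch of the one-by-one strategy: each triple pattern is answered with a single scan of the edges, and the partial solutions are combined with a hash join, one common way to reach the O(|A| + |B| + |A ⋈ B|) cost in expectation. The slides do not say which join algorithm is meant, so the hash join and the names `match_tp` and `join` are assumptions.

```python
from collections import defaultdict

TRIPLES = {
    (":Musée_de_Grenoble", ":isA", ":Museum"),
    (":Musée_de_Grenoble", ":locatedIn", ":Grenoble"),
    (":Musée_de_Grenoble", ":exhibits", ":Chagall"),
    (":Musée_de_Grenoble", ":exhibits", ":Fantin-Latour"),
    (":Perret_Tower", ":locatedIn", ":Grenoble"),
    (":Louvre", ":isA", ":Museum"),
}


def match_tp(pattern, triples):
    """One scan over the edges: O(#edges) per triple pattern."""
    out = []
    for triple in triples:
        mapping = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if mapping.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            out.append(mapping)
    return out


def join(left, right):
    """Hash join of two lists of mappings on their shared variables.

    Assumes all mappings inside one list bind the same variables,
    which holds for the solutions of a triple pattern.
    """
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = defaultdict(list)
    for m in left:  # build phase: one pass over A
        index[tuple(m[v] for v in shared)].append(m)
    out = []
    for m in right:  # probe phase: one pass over B plus the output size
        for other in index.get(tuple(m[v] for v in shared), []):
            out.append({**other, **m})
    return out


tp1 = match_tp(("?what", ":isA", ":Museum"), TRIPLES)
tp2 = match_tp(("?what", ":locatedIn", ":Grenoble"), TRIPLES)
tp3 = match_tp(("?what", ":exhibits", "?artist"), TRIPLES)
result = join(join(tp1, tp2), tp3)
```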

>>> Introduction / Query evaluation / BGP : Algebraic vision

(BGP, ⋈) as an algebraic structure:

* ⋈ is associative

TP1 ⋈ (TP2 ⋈ TP3) = (TP1 ⋈ TP2) ⋈ TP3

* ⋈ is commutative

TP1 ⋈ TP2 = TP2 ⋈ TP1

An optimization method:

Generate all the equivalent terms, run the most efficient.

[14/40]
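
The optimization method can be sketched on the running example: because ⋈ is associative and commutative, every ordering of the three triple patterns gives the same result, and only the cost differs. The cost model below (total size of intermediate results, measured by actually running each plan) is an assumption made for this sketch; a real optimizer would estimate costs instead of executing every plan.

```python
from itertools import permutations

# Solutions of the three triple patterns on the example data.
TP1 = [{"?what": ":Musée_de_Grenoble"}, {"?what": ":Louvre"}]
TP2 = [{"?what": ":Musée_de_Grenoble"}, {"?what": ":Perret_Tower"}]
TP3 = [
    {"?what": ":Musée_de_Grenoble", "?artist": ":Chagall"},
    {"?what": ":Musée_de_Grenoble", "?artist": ":Fantin-Latour"},
]


def join(left, right):
    """Nested-loop join: merge every pair of mappings that agree on shared variables."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[v] == b[v] for v in a.keys() & b.keys())
    ]


def canonical(rel):
    """Order-independent view of a relation, used to compare results."""
    return sorted(sorted(m.items()) for m in rel)


def run(order):
    """Evaluate a left-deep plan; cost = total size of the intermediate results."""
    result, cost = order[0], 0
    for rel in order[1:]:
        result = join(result, rel)
        cost += len(result)
    return result, cost


# Generate the equivalent terms (here: all join orders), then keep the cheapest.
plans = [run(order) for order in permutations([TP1, TP2, TP3])]
best_result, best_cost = min(plans, key=lambda plan: plan[1])
```

Joining TP1 with TP2 first keeps the intermediate result at a single mapping, which is why those orders come out cheapest under this cost model.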

>>> Introduction / Query evaluation / Relational algebra

The relational algebra [Cod70]

* Base relations

* Combined through a set of operators (⋈, ∪, σ, etc.)

Optimization of relational languages

1. Generate equivalent terms

2. Select the one estimated to be the most efficient

3. Execute it

Problems:

* Not designed specifically for graphs

* Mismatches in the semantics

* Poor optimization of Property Paths

[15/40]

>>> Contributions / The μ-algebra / Overview

1. Generate equivalent terms

2. Select the one estimated to be the most efficient

3. Execute it

[16/40]

>>> Contributions / The μ-algebra / A new algebra

The μ-algebra:

* a variation of the relational algebra

* equipped with fixpoints

* matches the SPARQL semantics

[17/40]

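>>> Contributions / The μ-algebra / Syntax

\[
\begin{array}{rll}
\varphi ::= & & \text{formula} \\
\mid & \varphi_1 \cup \varphi_2 & \text{union} \\
\mid & \varphi_1 \mathbin{\setminus\setminus} \varphi_2 & \text{normal minus} \\
\mid & \varphi_1 - \varphi_2 & \text{set minus} \\
\mid & \varphi_1 \setminus \varphi_2 & \text{strict minus} \\
\mid & \varphi_1 \mathbin{⟕} \varphi_2 & \text{left-join} \\
\mid & \varphi_1 \bowtie \varphi_2 & \text{join} \\
\mid & \rho_a^b(\varphi) & \text{column exchange (or rename)} \\
\mid & \pi_a(\varphi) & \text{projection} \\
\mid & \beta_a^b(\varphi) & \text{column multiplying} \\
\mid & \theta(\varphi, g : \mathcal{C} \to \mathcal{D}) & \text{apply a function to mappings} \\
\mid & \Theta(\varphi, g, \mathcal{C}, \mathcal{D}) & \text{reduce} \\
\mid & \sigma_{\mathit{filter}}(\varphi) & \text{row filtering} \\
\mid & \mu(X = \varphi) & \text{fixpoint} \\
\mid & \text{let } (X = \varphi) \text{ in } \psi & \text{let-binder} \\
\mid & X & \text{variable} \\
\mid & \emptyset & \text{no mapping} \\
\mid & | c_1 \to v_1, \dots, c_n \to v_n | & \text{a mapping}
\end{array}
\]

[18/40]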

>>> Contributions / The μ-algebra / Translation

[Figure: a path A/B from ?s to ?o, decomposed as ?s --A--> ?m --B--> ?o]

\[
Tr(A/B) = \pi_m\bigl( \rho_o^m(Tr(A)) \bowtie \rho_s^m(Tr(B)) \bigr)
\]

[19/40]
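
The translation of a sequence path can be mimicked on small relations: rename the object column of A and the subject column of B to a shared middle column m, join on m, then drop m. The sketch below assumes relations are lists of dicts with columns "s" and "o" and uses set semantics (duplicates removed), which is a simplification chosen for this example. The helper names are hypothetical.

```python
def rename(rel, old, new):
    """Column rename: the column `old` becomes `new`."""
    return [{(new if k == old else k): v for k, v in m.items()} for m in rel]


def drop(rel, col):
    """Projection that removes column `col`, keeping each remaining row once."""
    out = []
    for m in rel:
        reduced = {k: v for k, v in m.items() if k != col}
        if reduced not in out:
            out.append(reduced)
    return out


def join(left, right):
    """Merge every pair of mappings that agree on their shared columns."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[c] == b[c] for c in a.keys() & b.keys())
    ]


def compose(tr_a, tr_b):
    """Tr(A/B): join A's object with B's subject on column m, then drop m."""
    return drop(join(rename(tr_a, "o", "m"), rename(tr_b, "s", "m")), "m")


# Tr(:locatedIn) on a two-edge chain.
LOCATED_IN = [
    {"s": ":Musée_de_Grenoble", "o": ":Grenoble"},
    {"s": ":Grenoble", "o": ":Isère"},
]
two_steps = compose(LOCATED_IN, LOCATED_IN)
```

Here `two_steps` is the answer to `?s :locatedIn/:locatedIn ?o` on the two-edge chain.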

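>>> Contributions / The μ-algebra / Translation

A∗ = Empty Path or Path of A∗/A

\[
\begin{aligned}
Tr(A^*) &= \mathit{EmptyPath} \cup Tr(A^*)/A \\
        &= \mu\bigl( X = \mathit{EmptyPath} \cup X/A \bigr) \\
        &= \mu\Bigl( X = \beta_s^o(\mathit{AllNodes}) \cup \pi_m\bigl( \rho_o^m(X) \bowtie \rho_s^m(Tr(A)) \bigr) \Bigr)
\end{aligned}
\]

[20/40]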
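
The fixpoint can be read operationally: start from the empty paths (every node paired with itself) and keep appending one A step until no new pair appears. The sketch below represents relations as sets of (s, o) pairs; the node names a, b, c, d and the function name `star` are hypothetical.

```python
def star(tr_a, all_nodes):
    """Compute Tr(A*) as the least solution of X = EmptyPath ∪ X/A."""
    x = {(n, n) for n in all_nodes}  # EmptyPath: every node reaches itself
    while True:
        # X/A: extend each known path by one more A edge.
        step = {(s, o) for s, m in x for m2, o in tr_a if m == m2}
        new = x | step
        if new == x:
            return x  # fixpoint reached
        x = new


# Hypothetical data: a --A--> b --A--> c, and an isolated node d.
A_EDGES = {("a", "b"), ("b", "c")}
closure = star(A_EDGES, {"a", "b", "c", "d"})
```

The loop stops because each iteration can only add pairs and the set of possible pairs is finite.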

>>> Contributions / The μ-algebra / An example

:N :Red/:Yellow∗ ?o

[Figure: example graph with node N]

[21/40]

>>> Contributions / The μ-algebra / Rewrite rules

Rewrite rules for fixpoints

* pushing filters?

σ_{filter}(μ(X = φ)) ?= μ(X = σ_{filter}(φ))

* return fixpoints?

* pushing joins?

ψ ⋈ μ(X = φ) ?= μ(X = ψ ⋈ φ)

* pushing projections?

π_p(μ(X = φ)) ?= μ(X = π_p(φ))

* combine fixpoints?

�(X = ∪ �) �(X = ' ∪ �)?= �(X = ' ∪ � ∪ �)

[22/40]
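The question marks on the rules above matter: a rewrite is only valid under conditions. The following is a minimal Python sketch of that point for filter pushing, not the method of the thesis. It assumes relations are sets of (source, target) pairs, picks the hypothetical body φ(X) = R ∪ X/R, and uses a hypothetical three-node chain as data. A filter on the source column commutes with the fixpoint here, while a filter on the target column does not.

```python
def compose(left, right):
    """Path composition left/right: join the target of left with the source of right."""
    return {(s, o2) for (s, o) in left for (m, o2) in right if o == m}


def fixpoint(phi):
    """Least fixpoint mu(X = phi(X)), iterating from the empty relation."""
    x = set()
    while True:
        nxt = phi(x)
        if nxt == x:
            return x
        x = nxt


# Hypothetical edge relation: the chain a -> b -> c.
R = {("a", "b"), ("b", "c")}


def phi(x):
    """Hypothetical fixpoint body: R union X/R (transitive closure of R)."""
    return R | compose(x, R)


def source_is_a(rel):
    return {(s, o) for (s, o) in rel if s == "a"}


def target_is_c(rel):
    return {(s, o) for (s, o) in rel if o == "c"}


closure = fixpoint(phi)

# Filter on the source column: pushing it inside the fixpoint keeps the result.
outside_src = source_is_a(closure)
inside_src = fixpoint(lambda x: source_is_a(phi(x)))

# Filter on the target column: pushing it inside the fixpoint loses the path a -> c,
# because the intermediate pair (a, b) is filtered out before it can be extended.
outside_tgt = target_is_c(closure)
inside_tgt = fixpoint(lambda x: target_is_c(phi(x)))

print(outside_src == inside_src)
print(outside_tgt == inside_tgt)
```

In this toy setting the difference comes from which column the recursion extends: X/R only ever changes the target, so a condition on the source survives every iteration.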

>>> Contributions / The μ-algebra / An example

:N :Red/:Yellow* ?o

π_{?s}(σ_{?s=:N}(:Red/μ(X = �_s^o(AllNodes) ∪ X/:Yellow)))

π_{?s}(σ_{?s=:N}(μ(X = :Red/�_s^o(AllNodes) ∪ X/:Yellow)))

π_{?s}(σ_{?s=:N}(μ(X = :Red ∪ X/:Yellow)))

π_{?s}(μ(X = σ_{?s=:N}(:Red) ∪ X/:Yellow))

μ(X = π_{?s}(σ_{?s=:N}(:Red)) ∪ X/:Yellow)

[23/40]
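The derivation above starts from a term that computes :Yellow* over all nodes and ends with a fixpoint seeded only by the :Red edges leaving :N. A small Python sketch can check that both ends agree on one graph. Everything concrete here is an assumption made for illustration: the graph, the representation of relations as sets, and the reading of the projection on ?s as dropping the ?s column so that only ?o remains.

```python
def compose(left, right):
    """Path composition left/right on sets of (source, target) pairs."""
    return {(s, o2) for (s, o) in left for (m, o2) in right if o == m}


def fixpoint(phi):
    """Least fixpoint mu(X = phi(X)), iterating from the empty set."""
    x = set()
    while True:
        nxt = phi(x)
        if nxt == x:
            return x
        x = nxt


# Hypothetical graph with two edge colours.
RED = {("N", "a"), ("b", "c")}
YELLOW = {("a", "d"), ("d", "e"), ("c", "f")}
ALL_NODES = {node for edge in RED | YELLOW for node in edge}
IDENTITY = {(node, node) for node in ALL_NODES}  # zero-length paths


def first_term():
    """Compute :Yellow* for every node, compose with :Red, then filter on ?s = :N."""
    yellow_star = fixpoint(lambda x: IDENTITY | compose(x, YELLOW))
    paths = compose(RED, yellow_star)
    return {o for (s, o) in paths if s == "N"}


def last_term():
    """Filter and projection pushed into the constant part of the fixpoint."""
    seed = {o for (s, o) in RED if s == "N"}

    def step(x):
        # X/:Yellow on a single column: follow one more :Yellow edge.
        return seed | {o2 for o in x for (m, o2) in YELLOW if m == o}

    return fixpoint(step)


print(sorted(first_term()))
print(sorted(last_term()))
```

The second function never touches the nodes b, c and f, which is the point of the rewriting: the fixpoint only explores what is reachable from :N.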

>>> Contributions / The μ-algebra / Property Paths

Methods of evaluating Property Paths:

* Ad-hoc: automata, Waveguide [YGG15]

* Fixpoints: Datalog, recursive SQL

[24/40]
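As an illustration of the fixpoint family, the property path :N :Red/:Yellow* ?o can be written as a recursive SQL query. The sketch below runs it with Python's built-in sqlite3 module. The table name `triples`, its columns, and the sample data are assumptions made for this example, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical triple table: subject, predicate, object.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("N", "Red", "a"),
        ("a", "Yellow", "d"),
        ("d", "Yellow", "e"),
        ("b", "Red", "c"),
        ("c", "Yellow", "f"),
    ],
)

# Base case: targets of :Red edges leaving N.
# Recursive case: follow one more :Yellow edge from anything already reached.
QUERY = """
WITH RECURSIVE reach(o) AS (
    SELECT o FROM triples WHERE s = 'N' AND p = 'Red'
    UNION
    SELECT t.o FROM reach r JOIN triples t ON t.s = r.o AND t.p = 'Yellow'
)
SELECT o FROM reach ORDER BY o
"""

answers = [row[0] for row in conn.execute(QUERY)]
print(answers)
```

Using UNION rather than UNION ALL removes duplicates, so the recursion also terminates on graphs that contain :Yellow cycles.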

>>> Contributions / The μ-algebra / Benchmarking

[Plot: time (s), log scale from 10^-3 to 10^1, against the number of nodes n, log scale from 10^2 to 10^6. Systems compared: Postgres, SQLite, Virtuoso, ARQ1, ARQ2, DLV1, DLV2, Ramsdell1, Ramsdell2, Vlog1, Vlog2, Prototype.]

Figure: Time for :N :Red/:Yellow* ?o on n nodes

[25/40]

>>> Contributions / Plan selection / Overview

1. Generate equivalent terms

2. Select the term estimated to be the most efficient

3. Execute it

[26/40]

>>> Contributions / Plan selection / Cost model

Cost model

Estimate the running time to evaluate a term.

Cost(A ⋈ B) = Cost(A) + Cost(B) + O(size(A ⋈ B))

Cardinality estimation

Estimate the number of solutions to a term.

[27/40]
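The recursive shape of the cost formula can be sketched in a few lines of Python. This is an illustrative reading, not the cost model of the thesis: the term classes, the choice to charge a base relation its own size, and the estimated join sizes (which a cardinality estimator would supply) are all assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rel:
    """A stored relation with a known number of tuples."""
    name: str
    size: int


@dataclass(frozen=True)
class Join:
    """A join whose output size comes from a (hypothetical) cardinality estimator."""
    left: "Term"
    right: "Term"
    est_size: int


Term = Union[Rel, Join]


def cost(term: Term) -> int:
    """Cost of a join = cost of both inputs + estimated size of the output."""
    if isinstance(term, Rel):
        return term.size  # assumption: reading a stored relation is linear in its size
    return cost(term.left) + cost(term.right) + term.est_size


a, b, c = Rel("A", 1000), Rel("B", 10), Rel("C", 1000)

# Two equivalent join orders with hypothetical size estimates.
plan_ab_first = Join(Join(a, b, est_size=10), c, est_size=10)
plan_ac_first = Join(Join(a, c, est_size=1000), b, est_size=10)

best = min([plan_ab_first, plan_ac_first], key=cost)
print(cost(plan_ab_first), cost(plan_ac_first))
```

Both plans produce the same final estimate, yet the one with the smaller intermediate result is cheaper, which is why the quality of the cardinality estimation drives plan selection.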

>>> Contributions / Plan selection / SPARQLGX

SPARQLGX

SPARQLGX is a distributed SPARQL query evaluator based on Apache Spark.

CardEst

A worst-case cardinality estimation with a new tool: summaries.

[28/40]

>>> Contributions / Plan selection / Join algorithms

The hash join algorithm

Map(Cogroup(A, B), iter)

O((|A| + |B|) × shuffle + |A ⋈ B|)

The broadcast join algorithm

MapValues(A, f_B)

O(|A| + |B| × #workers + |A ⋈ B|)

[29/40]
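The two complexity expressions above can be turned into a simple selection rule. The Python sketch below drops the big-O constants and treats `shuffle` as a fixed per-tuple factor, which is a simplifying assumption. The function names, default values, and input sizes are hypothetical.

```python
def hash_join_cost(size_a, size_b, size_out, shuffle):
    """Both inputs are shuffled across the cluster, then the output is produced."""
    return (size_a + size_b) * shuffle + size_out


def broadcast_join_cost(size_a, size_b, size_out, workers):
    """B is copied to every worker, A stays in place, then the output is produced."""
    return size_a + size_b * workers + size_out


def pick_join(size_a, size_b, size_out, shuffle=5, workers=10):
    """Return the cheaper algorithm and its estimated cost."""
    hashed = hash_join_cost(size_a, size_b, size_out, shuffle)
    broadcast = broadcast_join_cost(size_a, size_b, size_out, workers)
    if broadcast < hashed:
        return ("broadcast", broadcast)
    return ("hash", hashed)


# A small B is cheap to replicate, so broadcasting wins.
print(pick_join(1_000_000, 100, 1_000))
# Two large inputs make replication too expensive, so the hash join wins.
print(pick_join(1_000_000, 1_000_000, 1_000))
```

The size of the join result appears in both expressions, so the choice depends only on the input sizes, the shuffle factor, and the number of workers.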

>>> Contributions / Plan selection / SPARQLGX

[Bar charts: time (s), axis from 0 to 60, per query, for three configurations: NoOptim, Stats, CardEst.]

Figure: LUBM [GPH05] 10k (1.4 billion triples), queries Q1 to Q14

Figure: WatDiv [AHÖD14] 1k (140 million triples), queries C1 to C3, S1 to S7, L1 to L5 and F1 to F5

[30/40]

>>> Contributions / Execution / Overview

1. Generate equivalent terms

2. Select the term estimated to be the most efficient

3. Execute it

[31/40]

>>> Contributions / Execution / Streams

Streams are one-way communication channels

[Diagram: a stream between R and S, closed by an end marker]

[32/40]
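A one-way channel closed by an end marker can be sketched with Python's standard `queue` and `threading` modules. This is only an illustration of the idea. The slides do not say how the prototype implements streams, and the names `END`, `send_all` and `receive_all` are hypothetical.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def send_all(stream, items):
    """Writing side: push every item, then signal that no more will come."""
    for item in items:
        stream.put(item)
    stream.put(END)


def receive_all(stream):
    """Reading side: consume items until the end marker arrives."""
    received = []
    while True:
        item = stream.get()
        if item is END:
            return received
        received.append(item)


stream = queue.Queue()
sender = threading.Thread(
    target=send_all, args=(stream, [("N", "a"), ("N", "d")])
)
sender.start()
result = receive_all(stream)
sender.join()
print(result)
```

Data flows in one direction only: the sender never reads from the queue and the receiver never writes to it, so the end marker is the only coordination the two sides need.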

>>> Contributions / Execution / Streams

Execution of μ-algebra terms with streams

[Diagram: streams connecting start, X1 ... Xn, φ and out]

[33/40]

>>> Contributions / Execution / muSPARQL Q2

[Plot: time (s), log scale from 10^-2 to 10^2, against the number of nodes, log scale from 10^2 to 10^7. Systems compared: muSPARQL, Ramsdell, DLV, Vlog, Postgres.]

Figure: ?a (P1+)/(P5+) ?b

[34/40]

>>> Contributions / Execution / muSPARQL Q3

[Plot: time (s), log scale from 10^-2 to 10^2, against the number of nodes, log scale from 10^2 to 10^7. Systems compared: muSPARQL, Ramsdell, DLV, Vlog, Postgres.]

Figure: ?a (P1+)/P2 ?b . ?b P3+ ?c

[35/40]

>>> Contributions / Execution / muSPARQL Q7

[Plot: time (s), log scale from 10^-2 to 10^2, against the number of nodes, log scale from 10^2 to 10^7. Systems compared: muSPARQL, Ramsdell, DLV, Vlog, Postgres.]

Figure: N0 P1/(P2+) ?a

[36/40]

>>> Contributions / Execution / muSPARQL Q10

[Plot: time (s), log scale from 10^-2 to 10^2, against the number of nodes, log scale from 10^2 to 10^7. Systems compared: muSPARQL, Ramsdell, DLV, Vlog, Postgres.]

Figure: ?a (P4+)/(P5+)/(P3+) ?b

[37/40]

>>> Conclusion / Contributions

Main contributions:

μ-Algebra: a new algebra with new efficient rewriting rules for fixpoints (BDA’17, BDA’18)

muSPARQL: a prototype evaluator based on streams

CardEst: a new cardinality estimation technique

SPARQLGX: a distributed SPARQL query evaluator (ISWC’16)

Collaborations:

* Leaves enumeration technique (ICALP’17)

* Benchmarks of distributed SPARQL query evaluators (BDA’17)

* ProvSQL: provenance for SQL (VLDB’18)

[38/40]

>>> Conclusion / Perspectives

Short-term perspectives

* Complete the distributed implementation of the μ-algebra

* Use the μ-algebra for SPARQL with ontology-based data access

Long-term perspectives

* Improve the cardinality estimation scheme

* Port the optimization of the μ-algebra to SQL and Datalog

[39/40]

>>> Conclusion / Questions?

Questions?

[40/40]

Güneş Aluç, Olaf Hartig, M. Tamer Özsu, and Khuzaima Daudjee. Diversified stress testing of RDF data management systems. In International Semantic Web Conference, pages 197–212. Springer, 2014.

Edgar F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970.

Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. LUBM: A benchmark for OWL knowledge base systems. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2):158–182, October 2005.

Steve Harris, Andy Seaborne, and Eric Prud’hommeaux. SPARQL 1.1 query language. W3C recommendation, 21(10), 2013.

Eric Prud’hommeaux, Andy Seaborne, et al. SPARQL query language for RDF. W3C recommendation, 15, 2008. www.w3.org/TR/rdf-sparql-query/.

Richard Cyganiak, David Wood, and Markus Lanthaler. RDF 1.1 concepts and abstract syntax, February 2014.

Nikolay Yakovets, Parke Godfrey, and Jarek Gryz. Towards Query Optimization for SPARQL Property Paths. arXiv:1504.08262 [cs], April 2015.

[40/40]