complex queries in a distributed multi-model database

Complex queries in a distributed multi-model database Max Neunhöffer Tech Talk Geekdom, 16 March 2015

Upload: max-neunhoeffer

Post on 18-Jul-2015


TRANSCRIPT

Page 1: Complex queries in a distributed multi-model database

Complex queries in a distributed multi-model database

Max Neunhöffer

Tech Talk Geekdom, 16 March 2015

www.arangodb.com

Page 2: Complex queries in a distributed multi-model database

Documents and collections

    {
      "_key": "123456",
      "_id": "chars/123456",
      "name": "Duck",
      "firstname": "Donald",
      "dob": "1934-11-13",
      "hobbies": ["Golf", "Singing", "Running"],
      "home": {"town": "Duck town",
               "street": "Lake Road",
               "number": 17},
      "species": "duck"
    }

When I say “document”, I mean “JSON”.
A “collection” is a set of documents in a DB.
The DB can inspect the values, allowing for secondary indexes.
Or one can just treat the DB as a key/value store.
Sharding: the data of a collection is distributed between multiple servers.
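The ideas on this slide can be sketched in a few lines of Python. This is only an illustration: the class `Collection`, the function `shard_for`, the shard count and the CRC32-based placement are all assumptions made for the example, not a description of how ArangoDB places documents.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical number of shards


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from the document key (assumed hash-based placement)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class Collection:
    """A set of JSON-like documents, spread over several shards."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        # One key/value store (a dict) per shard.
        self.shards = [dict() for _ in range(num_shards)]
        # Secondary index on "species": value -> set of document keys.
        self.by_species = defaultdict(set)

    def insert(self, doc: dict) -> None:
        key = doc["_key"]
        self.shards[shard_for(key, self.num_shards)][key] = doc
        if "species" in doc:
            self.by_species[doc["species"]].add(key)

    def get(self, key: str):
        """Key/value access: only the responsible shard is consulted."""
        return self.shards[shard_for(key, self.num_shards)].get(key)

    def find_by_species(self, species: str) -> list:
        """Lookup through the secondary index instead of a full scan."""
        return [self.get(k) for k in sorted(self.by_species.get(species, ()))]


chars = Collection()
chars.insert({"_key": "123456", "name": "Duck", "firstname": "Donald", "species": "duck"})
chars.insert({"_key": "123457", "name": "Mouse", "firstname": "Mickey", "species": "mouse"})
```

`chars.get("123456")` uses the collection as a key/value store, while `chars.find_by_species("duck")` relies on the DB having inspected the values.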


Page 8: Complex queries in a distributed multi-model database

Graphs

[Figure: example graph with vertices A, B, C, D, E, F, G and edges labelled "likes" and "hates"]

A graph consists of vertices and edges.
Graphs model relations, can be directed or undirected.
Vertices and edges are documents.
Every edge has a _from and a _to attribute.
The database offers queries and transactions dealing with graphs.
For example, paths in the graph are interesting.
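To make "edges are documents with _from and _to" concrete, here is a small Python sketch that stores edges as dictionaries and finds a path with a breadth-first search. The edge list is invented for the example (the actual edges of the figure are not recoverable from the transcript), and `shortest_path` is a hypothetical helper, not a database function.

```python
from collections import deque

# Hypothetical edge documents; only _from and _to are required.
edges = [
    {"_from": "v/A", "_to": "v/B", "label": "likes"},
    {"_from": "v/B", "_to": "v/C", "label": "hates"},
    {"_from": "v/A", "_to": "v/D", "label": "likes"},
    {"_from": "v/D", "_to": "v/C", "label": "likes"},
    {"_from": "v/C", "_to": "v/E", "label": "likes"},
]


def shortest_path(edges, start, goal):
    """Breadth-first search along directed edges.

    Returns a list of vertex ids with the fewest edges, or None.
    """
    outgoing = {}
    for e in edges:
        outgoing.setdefault(e["_from"], []).append(e["_to"])
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in outgoing.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path(edges, "v/A", "v/E")
```

Because the edges are treated as directed here, there is a path from `v/A` to `v/E` but none in the opposite direction.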


Page 15: Complex queries in a distributed multi-model database

Query 1
Fetch all documents in a collection

    FOR p IN people
      RETURN p

    [ { "name": "Schmidt", "firstname": "Helmut",
        "hobbies": ["Smoking"]},
      { "name": "Neunhöffer", "firstname": "Max",
        "hobbies": ["Piano", "Golf"]},
      ...
    ]

(Actually, a cursor is returned.)


Page 18: Complex queries in a distributed multi-model database

Query 2
Use filtering, sorting and limit

    FOR p IN people
      FILTER p.age >= @minage
      SORT p.name, p.firstname
      LIMIT @nrlimit
      RETURN { name: CONCAT(p.name, ", ", p.firstname),
               age : p.age }

    [ { "name": "Neunhöffer, Max", "age": 44 },
      { "name": "Schmidt, Helmut", "age": 95 },
      ...
    ]
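For readers more used to a general-purpose language, the query above corresponds roughly to the following Python. This is only a sketch of the semantics over an in-memory list; `query2` and the third sample person are made up for the example, and the sort uses plain code point order rather than any database collation.

```python
# Two people taken from the slide's result, plus one hypothetical minor.
people = [
    {"name": "Schmidt", "firstname": "Helmut", "age": 95},
    {"name": "Neunhöffer", "firstname": "Max", "age": 44},
    {"name": "Doe", "firstname": "Jane", "age": 12},
]


def query2(people, minage, nrlimit):
    # FILTER p.age >= @minage
    filtered = (p for p in people if p["age"] >= minage)
    # SORT p.name, p.firstname
    ordered = sorted(filtered, key=lambda p: (p["name"], p["firstname"]))
    # LIMIT @nrlimit, then RETURN a new document per person
    return [
        {"name": p["name"] + ", " + p["firstname"], "age": p["age"]}
        for p in ordered[:nrlimit]
    ]


result = query2(people, minage=18, nrlimit=10)
```

The bind parameters `@minage` and `@nrlimit` become ordinary function arguments here.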


Page 20: Complex queries in a distributed multi-model database

Query 3
Aggregation and functions

    FOR p IN people
      COLLECT a = p.age INTO L
      FILTER a >= @minage
      RETURN { "age": a, "number": LENGTH(L) }

    [ { "age": 18, "number": 10 },
      { "age": 19, "number": 17 },
      { "age": 20, "number": 12 },
      ...
    ]
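The grouping step can be pictured with a short Python sketch. It assumes a sort-based grouping (so groups come out in ascending age order); `query3` and the sample ages are hypothetical and do not reproduce the numbers on the slide.

```python
from itertools import groupby

# Hypothetical sample data: seven people with the given ages.
people = [{"name": f"p{i}", "age": age}
          for i, age in enumerate([18, 19, 17, 18, 20, 19, 18])]


def query3(people, minage):
    # COLLECT a = p.age INTO L: sort by age, then group equal ages.
    ordered = sorted(people, key=lambda p: p["age"])
    result = []
    for age, group in groupby(ordered, key=lambda p: p["age"]):
        members = list(group)          # this plays the role of L
        if age >= minage:              # FILTER a >= @minage
            result.append({"age": age, "number": len(members)})
    return result


result = query3(people, minage=18)
```

Note that the filter runs after the grouping, just as in the query text, so it sees one item per age rather than one per person.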


Page 22: Complex queries in a distributed multi-model database

Query 4
Joins

    FOR p IN @@peoplecollection
      FOR h IN houses
        FILTER p._key == h.owner
        SORT h.streetname, h.housename
        RETURN { housename: h.housename,
                 streetname: h.streetname,
                 owner: p.name,
                 value: h.value }

    [ { "housename": "Firlefanz",
        "streetname": "Meyer street",
        "owner": "Hans Schmidt", "value": 423000
      },
      ...
    ]
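A join written as two nested FOR loops reads almost the same in Python. The sketch below is a naive nested-loop join over in-memory lists; `query4`, the keys and the second house are invented for illustration, and a real optimiser may pick a very different strategy (for example an index lookup).

```python
people = [
    {"_key": "1", "name": "Hans Schmidt"},
    {"_key": "2", "name": "Erika Muster"},     # hypothetical
]
houses = [
    {"housename": "Firlefanz", "streetname": "Meyer street", "owner": "1", "value": 423000},
    {"housename": "Alpha", "streetname": "Abbey road", "owner": "2", "value": 100000},  # hypothetical
]


def query4(people, houses):
    pairs = []
    for p in people:                      # FOR p IN @@peoplecollection
        for h in houses:                  # FOR h IN houses
            if p["_key"] == h["owner"]:   # FILTER p._key == h.owner
                pairs.append((h, p))
    # SORT h.streetname, h.housename
    pairs.sort(key=lambda hp: (hp[0]["streetname"], hp[0]["housename"]))
    return [
        {"housename": h["housename"], "streetname": h["streetname"],
         "owner": p["name"], "value": h["value"]}
        for h, p in pairs
    ]


result = query4(people, houses)
```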


Page 24: Complex queries in a distributed multi-model database

Query 5
Modifying data

    FOR e IN events
      FILTER e.timestamp < "2014-09-01T09:53+0200"
      INSERT e IN oldevents

    FOR e IN events
      FILTER e.timestamp < "2014-09-01T09:53+0200"
      REMOVE e._key IN events
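The two queries together archive old events: first copy, then delete. A rough Python equivalent over two dictionaries keyed by `_key` could look as follows. The function `archive_old_events` and the sample events are hypothetical; like the query, the sketch compares timestamps as strings, which only works when all timestamps share the same format and offset.

```python
CUTOFF = "2014-09-01T09:53+0200"


def archive_old_events(events: dict, oldevents: dict, cutoff: str = CUTOFF) -> int:
    """Copy events older than the cutoff into oldevents, then remove them."""
    old_keys = [k for k, e in events.items() if e["timestamp"] < cutoff]
    for k in old_keys:            # first query: INSERT e IN oldevents
        oldevents[k] = events[k]
    for k in old_keys:            # second query: REMOVE e._key IN events
        del events[k]
    return len(old_keys)


events = {
    "e1": {"timestamp": "2014-08-31T23:00+0200"},
    "e2": {"timestamp": "2014-09-02T10:00+0200"},
}
oldevents = {}
moved = archive_old_events(events, oldevents)
```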

Page 25: Complex queries in a distributed multi-model database

Query 6
Graph queries

    FOR x IN GRAPH_SHORTEST_PATH(
               "routeplanner", "germanCity/Cologne",
               "frenchCity/Paris", {weight: "distance"} )
      RETURN { begin    : x.startVertex,
               end      : x.vertex,
               distance : x.distance,
               nrPaths  : LENGTH(x.paths) }

    [ { "begin": "germanCity/Cologne",
        "end" : {"_id": "frenchCity/Paris", ... },
        "distance": 550,
        "nrPaths": 10 },
      ...
    ]
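What a weighted shortest-path search does can be sketched with Dijkstra's algorithm in Python. This is not the implementation behind `GRAPH_SHORTEST_PATH`; the function `dijkstra`, the choice to treat roads as undirected, the Lyon vertex and its two distances are assumptions for the example. Only the Cologne to Paris distance of 550 comes from the slide.

```python
import heapq

# Edge documents with a "distance" attribute used as weight.
edges = [
    {"_from": "germanCity/Cologne", "_to": "frenchCity/Paris", "distance": 550},
    {"_from": "germanCity/Cologne", "_to": "frenchCity/Lyon", "distance": 700},   # made up
    {"_from": "frenchCity/Lyon", "_to": "frenchCity/Paris", "distance": 460},     # made up
]


def dijkstra(edges, start, goal, weight="distance"):
    """Return {"distance": ..., "path": [...]} for the cheapest path, or None."""
    graph = {}
    for e in edges:
        # Assumption: roads can be travelled in both directions.
        graph.setdefault(e["_from"], []).append((e["_to"], e[weight]))
        graph.setdefault(e["_to"], []).append((e["_from"], e[weight]))
    best = {start: 0}
    heap = [(0, start, [start])]
    while heap:
        dist, vertex, path = heapq.heappop(heap)
        if vertex == goal:
            return {"distance": dist, "path": path}
        if dist > best.get(vertex, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(vertex, []):
            candidate = dist + w
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt, path + [nxt]))
    return None


route = dijkstra(edges, "germanCity/Cologne", "frenchCity/Paris")
```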


Page 27: Complex queries in a distributed multi-model database

Life of a query

1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
10. Instantiate “cheapest” plan, i.e. set up execution engine
11. Distribute and link up engines on different servers
12. Execute plan, provide cursor API
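Steps 3 and 4 of the list above can be illustrated on a toy AST. In this Python sketch an AST node is a tuple `(operator, left, right)`, `("var", name)` is a variable reference and `("param", name)` is a bind parameter; this representation and the helpers `substitute` and `fold` are invented for the example and say nothing about ArangoDB's internal data structures.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, ">=": operator.ge}


def substitute(node, params):
    """Step 3: replace ("param", name) nodes by the supplied values."""
    if not isinstance(node, tuple) or node[0] == "var":
        return node
    if node[0] == "param":
        return params[node[1]]
    op, left, right = node
    return (op, substitute(left, params), substitute(right, params))


def fold(node):
    """Step 4: evaluate operator nodes whose operands are all constants."""
    if not isinstance(node, tuple) or node[0] == "var":
        return node  # a literal, or a variable that stays symbolic
    op, left, right = node
    left, right = fold(left), fold(right)
    if not isinstance(left, tuple) and not isinstance(right, tuple):
        return OPS[op](left, right)
    return (op, left, right)


# Roughly: p.age >= @minage + 2 * 4
ast = (">=", ("var", "p.age"), ("+", ("param", "minage"), ("*", 2, 4)))
optimised = fold(substitute(ast, {"minage": 10}))
```

After both steps the right-hand side is a single constant, so it is computed once instead of once per document.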


Page 39: Complex queries in a distributed multi-model database

Execution plans

Query:

    FOR a IN collA
      LET xx = a.x
      FOR b IN collB
        FILTER xx == b.y
        RETURN {x: a.x, z: b.z}

Execution plan (EXP), in pipeline order:

    Singleton
    EnumerateCollection a
    Calculation xx
    EnumerateCollection b
    Calculation xx == b.y
    Filter xx == b.y
    Calc {x: a.x, z: b.z}
    Return {x: a.x, z: b.z}

Query → EXP
Black arrows are dependencies
Think of a pipeline
Each node provides a cursor API
Blocks of “Items” travel through the pipeline
What is an “item”???
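One way to picture "each node provides a cursor API" is a chain of Python generators, where every node pulls items from the node it depends on. This is a toy model only: the function names, the sample collections and the one-item-at-a-time processing are assumptions for the sketch (the talk states that real execution works on blocks of items).

```python
def singleton():
    yield {}                                   # one item without variables


def enumerate_collection(upstream, var, docs):
    for item in upstream:
        for doc in docs:
            yield {**item, var: doc}           # add one variable per document


def calculation(upstream, var, fn):
    for item in upstream:
        yield {**item, var: fn(item)}


def filter_node(upstream, var):
    for item in upstream:
        if item[var]:
            yield item


def return_node(upstream, var):
    for item in upstream:
        yield item[var]


collA = [{"x": 1}, {"x": 2}]
collB = [{"y": 1, "z": "one"}, {"y": 2, "z": "two"}, {"y": 2, "z": "zwei"}]

plan = singleton()
plan = enumerate_collection(plan, "a", collA)                      # FOR a IN collA
plan = calculation(plan, "xx", lambda it: it["a"]["x"])            # LET xx = a.x
plan = enumerate_collection(plan, "b", collB)                      # FOR b IN collB
plan = calculation(plan, "cond", lambda it: it["xx"] == it["b"]["y"])
plan = filter_node(plan, "cond")                                   # FILTER xx == b.y
plan = calculation(plan, "out", lambda it: {"x": it["a"]["x"], "z": it["b"]["z"]})
plan = return_node(plan, "out")                                    # RETURN {...}

result = list(plan)
```

Nothing is computed until the last node is asked for data, which mirrors the pull-based pipeline idea.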


Page 45: Complex queries in a distributed multi-model database

Pipeline and items

    FOR a IN collA
      LET xx = a.x
      FOR b IN collB

    Singleton                  (items have no vars)
    EnumerateCollection a
    Calculation xx             (items have vars a, xx)
    EnumerateCollection b

Items are the thingies traveling through the pipeline.
An item holds values of those variables in the current frame.
Thus: items look different in different parts of the plan.
We always deal with blocks of items for performance reasons.
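The following Python sketch shows what "blocks of items" could mean for a node interface: a node hands out up to `at_most` items per call instead of one at a time. The class names and the method name `get_some` are assumptions chosen for the example, and an item is modelled as a plain dictionary from variable name to value.

```python
class Singleton:
    """Produces exactly one item that holds no variables."""

    def __init__(self):
        self.done = False

    def get_some(self, at_most):
        if self.done:
            return []
        self.done = True
        return [{}]


class EnumerateList:
    """For every incoming item, emits one item per document, in blocks."""

    def __init__(self, upstream, var, docs):
        self.upstream, self.var, self.docs = upstream, var, docs
        self.pending = []

    def get_some(self, at_most):
        # Refill the buffer from upstream until a full block is available.
        while len(self.pending) < at_most:
            block = self.upstream.get_some(at_most)
            if not block:
                break
            for item in block:
                self.pending.extend({**item, self.var: d} for d in self.docs)
        block, self.pending = self.pending[:at_most], self.pending[at_most:]
        return block


node = EnumerateList(Singleton(), "a", [1, 2, 3, 4, 5])
blocks = [node.get_some(2) for _ in range(4)]
```

Items leaving `Singleton` are empty dictionaries, while items leaving `EnumerateList` carry the variable `a`, so the shape of an item depends on where in the plan it is.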


Page 50: Complex queries in a distributed multi-model database

Move filters up

    FOR a IN collA
      FOR b IN collB
        FILTER a.x == 10
        FILTER a.u == b.v
        RETURN {u:a.u,w:b.w}

    Singleton
    EnumColl a
    EnumColl b
    Calc a.x == 10
    Filter a.x == 10
    Calc a.u == b.v
    Filter a.u == b.v
    Return {u:a.u,w:b.w}

The result and behaviour do not change if the first FILTER is pulled out of the inner FOR:

    FOR a IN collA
      FILTER a.x == 10
      FOR b IN collB
        FILTER a.u == b.v
        RETURN {u:a.u,w:b.w}

    Singleton
    EnumColl a
    Calc a.x == 10
    Filter a.x == 10
    EnumColl b
    Calc a.u == b.v
    Filter a.u == b.v
    Return {u:a.u,w:b.w}

However, the number of items travelling in the pipeline is decreased.
Note that the two FOR statements could be interchanged!
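The effect of this rule can be measured on a toy example. The Python sketch below runs the same nested loops with the filter on `a.x` either inside or outside the inner loop and counts how many items the inner enumeration produces. The function `run` and the sample collections are hypothetical.

```python
def run(collA, collB, filter_early):
    """Return (result, number of items produced by the inner enumeration)."""
    result, inner_items = [], 0
    for a in collA:
        if filter_early and a["x"] != 10:      # filter moved above FOR b
            continue
        for b in collB:
            inner_items += 1
            if not filter_early and a["x"] != 10:
                continue                        # original position of the filter
            if a["u"] == b["v"]:
                result.append({"u": a["u"], "w": b["w"]})
    return result, inner_items


collA = [{"x": 10, "u": 1}, {"x": 5, "u": 1}, {"x": 10, "u": 2}, {"x": 7, "u": 2}]
collB = [{"v": 1, "w": "p"}, {"v": 2, "w": "q"}, {"v": 3, "w": "r"}]

late_result, late_items = run(collA, collB, filter_early=False)
early_result, early_items = run(collA, collB, filter_early=True)
```

Both variants return the same documents, but with the early filter only half as many items pass through the inner loop for this sample data.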

Page 54: Complex queries in a distributed multi-model database

Remove unnecessary calculations

    FOR a IN collA
      LET L = LENGTH(a.hobbies)
      FOR b IN collB
        FILTER a.u == b.v
        RETURN {h:a.hobbies,w:b.w}

    Singleton
    EnumColl a
    Calc L = ...
    EnumColl b
    Calc a.u == b.v
    Filter a.u == b.v
    Return {...}

The calculation of L is unnecessary: L is never used afterwards, and computing it cannot throw an exception.
Therefore we can just leave it out:

    FOR a IN collA
      FOR b IN collB
        FILTER a.u == b.v
        RETURN {h:a.hobbies,w:b.w}

    Singleton
    EnumColl a
    EnumColl b
    Calc a.u == b.v
    Filter a.u == b.v
    Return {...}
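A rule like this can be written as a single backwards pass over the plan. In the Python sketch below a plan is a list of dictionaries with hypothetical fields `sets` (the variable a node defines), `uses` (the variables it reads) and `can_throw`; the representation and the function name are assumptions for illustration.

```python
plan = [
    {"type": "Singleton", "sets": None, "uses": set()},
    {"type": "EnumColl", "sets": "a", "uses": set()},
    {"type": "Calc", "sets": "L", "uses": {"a"}, "can_throw": False},
    {"type": "EnumColl", "sets": "b", "uses": set()},
    {"type": "Calc", "sets": "cond", "uses": {"a", "b"}, "can_throw": False},
    {"type": "Filter", "sets": None, "uses": {"cond"}},
    {"type": "Return", "sets": None, "uses": {"a", "b"}},
]


def remove_unused_calculations(plan):
    """Drop Calc nodes whose result is never read and that cannot throw."""
    kept, needed = [], set()
    for node in reversed(plan):                 # walk from Return to Singleton
        removable = (
            node["type"] == "Calc"
            and node["sets"] not in needed
            and not node.get("can_throw", True)  # be conservative by default
        )
        if removable:
            continue
        needed |= node["uses"]
        kept.append(node)
    kept.reverse()
    return kept


optimised = remove_unused_calculations(plan)
```

The `can_throw` check matters because removing a calculation that would have raised an error changes the observable behaviour of the query.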

Page 58: Complex queries in a distributed multi-model database

Use index for FILTER and SORT

    FOR a IN collA
      FILTER a.x > 17 &&
             a.x <= 23 &&
             a.y == 10
      SORT a.y, a.x
      RETURN a

    Singleton
    EnumColl a
    Calc ...
    Filter ...
    Sort a.y, a.x
    Return a

Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index:

    Singleton
    IndexRange a
    Sort a.y, a.x
    Return a

The result will automatically be sorted by y and then by x, so the Sort node can be removed as well:

    Singleton
    IndexRange a
    Return a
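Reading a half-open interval from a sorted index can be sketched in Python with the standard `bisect` module. A sorted list of `(y, x, position)` tuples stands in for the skiplist here, which is a simplification made for the example; `index_range` and the sample documents are hypothetical.

```python
import bisect

# Sample documents; y takes 3 values and x takes 5 values.
docs = [{"x": x, "y": y} for y in (9, 10, 11) for x in (15, 17, 20, 23, 25)]

# Combined index on (y, x); the third component points back to the document.
index = sorted((d["y"], d["x"], i) for i, d in enumerate(docs))


def index_range(index, docs, y, x_low_exclusive, x_high_inclusive):
    """Return documents with the given y and x_low < x <= x_high, in index order."""
    inf = float("inf")
    # Skip every entry up to and including (y, x_low).
    lo = bisect.bisect_right(index, (y, x_low_exclusive, inf))
    # Stop after the last entry with (y, x_high).
    hi = bisect.bisect_right(index, (y, x_high_inclusive, inf))
    return [docs[i] for _, _, i in index[lo:hi]]


result = index_range(index, docs, y=10, x_low_exclusive=17, x_high_inclusive=23)
```

The slice comes out already ordered by y and then x, which is why a separate sort step becomes redundant.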

Page 62: Complex queries in a distributed multi-model database

Data distribution in a cluster

[Figure: Requests, two Coordinators and three DBservers holding numbered shards]

The shards of a collection are distributed across the DBservers.
The coordinators receive queries and organise their execution.


Page 64: Complex queries in a distributed multi-model database

Scatter/gather

On a single server the plan contains one node:

    EnumerateCollection

In the cluster it is replaced by:

    Scatter
      Remote -> EnumShard -> Remote
      Remote -> EnumShard -> Remote
      Remote -> EnumShard -> Remote
    Concat/Merge
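The scatter/gather pattern can be mimicked in Python with one generator per shard. The sketch runs everything in a single process, so the Remote nodes are left out; the shard contents and the function names are invented for the example. It shows the two gather flavours named on the slide: plain concatenation, and a merge that keeps a sort order when every shard delivers sorted data.

```python
import heapq
from itertools import chain

# Three hypothetical shards, each already sorted by "n".
shards = [
    [{"_key": "a", "n": 1}, {"_key": "d", "n": 4}],
    [{"_key": "b", "n": 2}],
    [{"_key": "c", "n": 3}, {"_key": "e", "n": 5}],
]


def enum_shard(shard):
    """Runs on a DBserver: enumerate the documents of one shard."""
    yield from shard


def scatter(shards):
    """Start one enumeration per shard and return the item streams."""
    return [enum_shard(s) for s in shards]


def gather_concat(streams):
    """Gather without ordering guarantees: shard after shard."""
    return list(chain.from_iterable(streams))


def gather_merge(streams, key):
    """Gather sorted streams into one sorted result."""
    return list(heapq.merge(*streams, key=key))


concatenated = gather_concat(scatter(shards))
merged = gather_merge(scatter(shards), key=lambda d: d["n"])
```

Merging only needs to look at the head of each stream, so the coordinator does not have to re-sort the complete result.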

Page 67: Complex queries in a distributed multi-model database

Modifying queries

Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.

Modifying nodes
The modifying node in a query is executed on the DBservers. To this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modification.
Sometimes, we can even optimise away a gather/scatter combination and parallelise completely.
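The difference between scattering and distributing can be shown with a small Python sketch. It assumes that the responsible shard is derived from a hash of `_key`; the CRC32 choice, the shard count and the function names are assumptions for the example.

```python
import zlib

NUM_SHARDS = 3  # hypothetical


def responsible_shard(key, num_shards=NUM_SHARDS):
    """Assumed placement rule: hash of the document key."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def scatter_all(items, num_shards=NUM_SHARDS):
    """Every shard receives every item and has to pick out its own."""
    return [list(items) for _ in range(num_shards)]


def distribute(items, num_shards=NUM_SHARDS):
    """Every item is sent only to the shard responsible for it."""
    per_shard = [[] for _ in range(num_shards)]
    for item in items:
        per_shard[responsible_shard(item["_key"], num_shards)].append(item)
    return per_shard


items = [{"_key": f"k{i}"} for i in range(10)]
scattered = scatter_all(items)
distributed = distribute(items)
```

With scattering the total number of transferred items grows with the number of shards, whereas distributing sends each item exactly once.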
