distributed database systemsmaterials.dagstuhl.de/files/17/17262/17262.katjahose1.slides.pdf ·...

62
Distributed Database Systems Distributed Database Systems A classic perspective Katja Hose Department of Computer Science Aalborg University [email protected] Dagstuhl June 27, 2017 Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 1 / 24

Upload: others

Post on 25-Jul-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed Database SystemsA classic perspective

Katja Hose

Department of Computer ScienceAalborg [email protected]

DagstuhlJune 27, 2017

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 1 / 24

Page 2: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Introduction

What is distributed data management?

Example: distributed database system

Scenario

Company with offices in Aalborg, Berlin, New York, and Paris

Database of products and customers

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 2 / 24

Page 3: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Introduction

What is distributed data management?

Example: distributed database system

Definition: distributed database

A collection of multiple, logically interrelated databases distributed over acomputer network

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 2 / 24

Page 4: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Introduction

Challenges

Challenges

Distributed database designFragmentation, replication, and distribution

Distributed query processingExecuting a query over the network in the most cost-effective way

Distributed concurrency controlSynchronizing access such that integrity is maintained

Reliability of distributed DBMSEnsure consistency, detect failures, and recover from failures

Heterogeneous databasesTranslation between database systems – data model, data language

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 3 / 24

Page 5: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Introduction

Challenges

Challenges

Distributed database designFragmentation, replication, and distribution

Distributed query processingExecuting a query over the network in the most cost-effective way

Distributed concurrency controlSynchronizing access such that integrity is maintained

Reliability of distributed DBMSEnsure consistency, detect failures, and recover from failures

Heterogeneous databasesTranslation between database systems – data model, data language

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 3 / 24

Page 6: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Introduction

Challenges

Challenges

Distributed database designFragmentation, replication, and distribution

Distributed query processingExecuting a query over the network in the most cost-effective way

Distributed concurrency controlSynchronizing access such that integrity is maintained

Reliability of distributed DBMSEnsure consistency, detect failures, and recover from failures

Heterogeneous databasesTranslation between database systems – data model, data language

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 3 / 24

Page 7: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Introduction

Challenges

Challenges

Distributed database designFragmentation, replication, and distribution

Distributed query processingExecuting a query over the network in the most cost-effective way

Distributed concurrency controlSynchronizing access such that integrity is maintained

Reliability of distributed DBMSEnsure consistency, detect failures, and recover from failures

Heterogeneous databasesTranslation between database systems – data model, data language

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 3 / 24

Page 8: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Introduction

Challenges

Challenges

Distributed database designFragmentation, replication, and distribution

Distributed query processingExecuting a query over the network in the most cost-effective way

Distributed concurrency controlSynchronizing access such that integrity is maintained

Reliability of distributed DBMSEnsure consistency, detect failures, and recover from failures

Heterogeneous databasesTranslation between database systems – data model, data language

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 3 / 24

Page 9: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Introduction

Challenges

Challenges

Distributed database designFragmentation, replication, and distribution

Distributed query processingExecuting a query over the network in the most cost-effective way

Distributed concurrency controlSynchronizing access such that integrity is maintained

Reliability of distributed DBMSEnsure consistency, detect failures, and recover from failures

Heterogeneous databasesTranslation between database systems – data model, data language

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 3 / 24

Page 10: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Fragmentation and Allocation

1 IntroductionWhat is distributed data management?Challenges

2 Fragmentation and Allocation

3 Distributed query processingData localizationGlobal Query OptimizationJoin Order OptimizationQuery Execution

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 3 / 24

Page 11: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Fragmentation and Allocation

Fragmentation and Allocation

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 4 / 24

Page 12: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Fragmentation and Allocation

Fragmentation

Alternatives

Tuples vs. columns, i.e., horizontal vs. vertical

Horizontal fragmentationVertical fragmentation

Many algorithms proposed in the literature

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 5 / 24

Page 13: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Fragmentation and Allocation

Allocation

Golden rules of allocation

Place data as close as possible to where it will be used

Use load balancing to find a global optimization of systemperformance

Optimal allocation

Objective:Minimize storage (

∑S) and transfer costs (

∑T ) for P fragments and

K nodes

Optimization problem

Find minimum∑

S +∑

T

Often heuristics are applied

Many algorithms proposed in the literature

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 6 / 24

Page 14: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Fragmentation and Allocation

Allocation

Golden rules of allocation

Place data as close as possible to where it will be used

Use load balancing to find a global optimization of systemperformance

Optimal allocation

Objective:Minimize storage (

∑S) and transfer costs (

∑T ) for P fragments and

K nodes

Optimization problem

Find minimum∑

S +∑

T

Often heuristics are applied

Many algorithms proposed in the literature

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 6 / 24

Page 15: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Fragmentation and Allocation

Allocation

Golden rules of allocation

Place data as close as possible to where it will be used

Use load balancing to find a global optimization of systemperformance

Optimal allocation

Objective:Minimize storage (

∑S) and transfer costs (

∑T ) for P fragments and

K nodes

Optimization problem

Find minimum∑

S +∑

T

Often heuristics are applied

Many algorithms proposed in the literature

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 6 / 24

Page 16: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

1 IntroductionWhat is distributed data management?Challenges

2 Fragmentation and Allocation

3 Distributed query processingData localizationGlobal Query OptimizationJoin Order OptimizationQuery Execution

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 6 / 24

Page 17: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Phases of distributed query processing

Phases of distributed query processing

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 7 / 24

Page 18: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Phases of distributed query processing

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 7 / 24

Page 19: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – horizontal reduction

Schema

Projects1 = σBudget≤150.000(Projects)

Projects2 = σ150.000<Budget≤200.000(Projects)

Projects3 = σBudget>200.000(Projects)

Reconstruction expression (horizontal fragmentation)

Projects = Projects1 ∪Projects2 ∪Projects3

Example query

σLocation=′Aalborg′∧Budget≤100.000(Projects)

After replacing references to global relations

σLocation=′Aalborg′∧Budget≤100.000(Projects1 ∪Projects2 ∪Projects3)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 8 / 24

Page 20: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – horizontal reduction

Schema

Projects1 = σBudget≤150.000(Projects)

Projects2 = σ150.000<Budget≤200.000(Projects)

Projects3 = σBudget>200.000(Projects)

Reconstruction expression (horizontal fragmentation)

Projects = Projects1 ∪Projects2 ∪Projects3

Example query

σLocation=′Aalborg′∧Budget≤100.000(Projects)

After replacing references to global relations

σLocation=′Aalborg′∧Budget≤100.000(Projects1 ∪Projects2 ∪Projects3)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 8 / 24

Page 21: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – horizontal reduction

Schema

Projects1 = σBudget≤150.000(Projects)

Projects2 = σ150.000<Budget≤200.000(Projects)

Projects3 = σBudget>200.000(Projects)

Reconstruction expression (horizontal fragmentation)

Projects = Projects1 ∪Projects2 ∪Projects3

Example query

σLocation=′Aalborg′∧Budget≤100.000(Projects)

After replacing references to global relations

σLocation=′Aalborg′∧Budget≤100.000(Projects1 ∪Projects2 ∪Projects3)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 8 / 24

Page 22: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – horizontal reduction

Schema

Projects1 = σBudget≤150.000(Projects)

Projects2 = σ150.000<Budget≤200.000(Projects)

Projects3 = σBudget>200.000(Projects)

Reconstruction expression (horizontal fragmentation)

Projects = Projects1 ∪Projects2 ∪Projects3

Example query

σLocation=′Aalborg′∧Budget≤100.000(Projects)

After replacing references to global relations

σLocation=′Aalborg′∧Budget≤100.000(Projects1 ∪Projects2 ∪Projects3)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 8 / 24

Page 23: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – horizontal reduction

Schema

Projects1 = σBudget≤150.000(Projects)

Projects2 = σ150.000<Budget≤200.000(Projects)

Projects3 = σBudget>200.000(Projects)

Reconstruction expression (horizontal fragmentation)

Projects = Projects1 ∪Projects2 ∪Projects3

Example query

σLocation=′Aalborg′∧Budget≤100.000(Projects)

After replacing references to global relations

σLocation=′Aalborg′∧Budget≤100.000(Projects1 ∪Projects2 ∪Projects3)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 8 / 24

Page 24: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – horizontal reduction

Objective

Eliminate non-necessary subqueries (contradiction between query andfragmentation predicate)

Query with fragmentation expressionσLocation=′Aalborg′∧Budget≤100.000(Projects1∪Projects2∪Projects3)

Fragment definitionsProjects1 = σBudget≤150.000(Projects)Projects2 = σ150.000<Budget≤200.000(Projects)Projects3 = σBudget>200.000(Projects)

Because ofσBudget≤100.000(Projects2) = ∅, σBudget≤100.000(Projects3) = ∅

We obtain the reduced queryσLocation=′Aalborg′(σBudget≤100.000(Projects1))

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 9 / 24

Page 25: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – horizontal reduction

Objective

Eliminate non-necessary subqueries (contradiction between query andfragmentation predicate)

Query with fragmentation expressionσLocation=′Aalborg′∧Budget≤100.000(Projects1∪Projects2∪Projects3)

Fragment definitionsProjects1 = σBudget≤150.000(Projects)Projects2 = σ150.000<Budget≤200.000(Projects)Projects3 = σBudget>200.000(Projects)

Because ofσBudget≤100.000(Projects2) = ∅, σBudget≤100.000(Projects3) = ∅

We obtain the reduced queryσLocation=′Aalborg′(σBudget≤100.000(Projects1))

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 9 / 24

Page 26: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – join reduction

Query Projects on Assignment

After replacing global relations with reconstruction expressions

(Projects1 ∪Projects2 ∪Projects3) on (Assignment1 ∪Assignment2)

After applying the distributive law

(Projects1 on Assignment1) ∪ (Projects1 on Assignment2) ∪(Projects2 on Assignment1) ∪ (Projects2 on Assignment2) ∪

(Projects3 on Assignment1) ∪ (Projects3 on Assignment2)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 10 / 24

Page 27: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – join reduction

Query Projects on Assignment

After replacing global relations with reconstruction expressions

(Projects1 ∪Projects2 ∪Projects3) on (Assignment1 ∪Assignment2)

After applying the distributive law

(Projects1 on Assignment1) ∪ (Projects1 on Assignment2) ∪(Projects2 on Assignment1) ∪ (Projects2 on Assignment2) ∪

(Projects3 on Assignment1) ∪ (Projects3 on Assignment2)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 10 / 24

Page 28: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – join reduction

Query Projects on Assignment

After replacing global relations with reconstruction expressions

(Projects1 ∪Projects2 ∪Projects3) on (Assignment1 ∪Assignment2)

After applying the distributive law

(Projects1 on Assignment1) ∪ (Projects1 on Assignment2) ∪(Projects2 on Assignment1) ∪ (Projects2 on Assignment2) ∪

(Projects3 on Assignment1) ∪ (Projects3 on Assignment2)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 10 / 24

Page 29: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – join reduction

Query Projects on Assignment

After replacing global relations with reconstruction expressions

(Projects1 ∪Projects2 ∪Projects3) on (Assignment1 ∪Assignment2)

After applying the distributive law

(Projects1 on Assignment1) ∪ (Projects1 on Assignment2) ∪(Projects2 on Assignment1) ∪ (Projects2 on Assignment2) ∪

(Projects3 on Assignment1) ∪ (Projects3 on Assignment2)

Further optimization is possible!

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 10 / 24

Page 30: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – join reduction

Query with fragmentation expression

(Projects1 on Assignment1) ∪ (Projects1 on Assignment2) ∪(Projects2 on Assignment1) ∪ (Projects2 on Assignment2) ∪

(Projects3 on Assignment1) ∪ (Projects3 on Assignment2)

Some of these partial joins are empty, e.g.:

Projects1 on Assignment2 = ∅

Because their fragmentation expressions contradict:

Projects1 = σPNo=′P1′∨PNo=′P2′(Projects) andAssignment2 = σPNo=′P3′∨PNo=′P4′(Assignment)

Reduced query

(Projects1 on Assignment1) ∪ (Projects2 on Assignment2) ∪(Projects3 on Assignment2)

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 11 / 24

Page 31: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Data localization

Example – join reduction

Query with fragmentation expression

(Projects1 on Assignment1) ∪ (Projects1 on Assignment2) ∪(Projects2 on Assignment1) ∪ (Projects2 on Assignment2) ∪

(Projects3 on Assignment1) ∪ (Projects3 on Assignment2)

Some of these partial joins are empty, e.g.:

Projects1 on Assignment2 = ∅

Because their fragmentation expressions contradict:

Projects1 = σPNo=′P1′∨PNo=′P2′(Projects) andAssignment2 = σPNo=′P3′∨PNo=′P4′(Assignment)

Reduced query

(Projects1 on Assignment1) ∪ (Projects2 on Assignment2) ∪(Projects3 on Assignment2)

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 11 / 24

Page 32: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Global Query Optimization

Phases of distributed query processing

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 12 / 24

Page 33: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Global Query Optimization

Global query optimization

Phases

1 Spanning the search space usingtransformation rules→ equivalent search plans

2 Applying a search strategy and acost model→ choose an efficient plan

Main focus: join trees and joinordering

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 13 / 24

Page 34: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Global Query Optimization

Search space

Tree variants for join order optimization

Linear join trees

All inner nodes have at least one leaf node (base relation) as childReduces search space

Bushy trees

May have inner nodes with no base relation as childHigh potential for parallelization

⊲⊳

⊲⊳ R1

⊲⊳ R2

R3 R4

linear join tree

⊲⊳

⊲⊳ ⊲⊳

R1 R2 R3 R4

bushy join tree

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 14 / 24

Page 35: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Global Query Optimization

Search strategy

Deterministic search strategy

Systematic generation of query plans

Starting with plans accessing the base relations

Constructing complex plans by combining easier plans, e.g., joiningone more relation at each step

Implementations

Exhaustive search guarantees finding the best plan

Dynamic programming

Greedy algorithm

Cost model

Detailed statistics

Calibrated parameters

Fomulas (cardinality estimation, processing costs, communicationcosts, etc.)

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 15 / 24

Page 36: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Site selection and data transfer

Query shipping

Query initiator (node at whichthe query is issued/optimized)sends the query to othernodes

Receiver nodes compute thequery result and ship the resultback to the initiator

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 16 / 24

Page 37: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Site selection and data transfer

Data shipping

Query remains at the initiator

Initiator sends data requestmessages to other nodes

Receiver nodes ship allrequired data to the initiator

Initiator computes result

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 17 / 24

Page 38: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Site selection and data transfer

Hybrid shipping

Initiator sends partial queriesto other nodes

Other nodes execute someparts of the query and sendintermediate results to theinitiator

Initiator executes remainingquery operations(post-processing)

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 18 / 24

Page 39: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Ship whole

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (relation S) to nodeS

nodeS : send requested data (relation S) to nodeR

Total costs: 2 messages, 18 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 19 / 24

Page 40: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Ship whole

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (relation S) to nodeS

nodeS : send requested data (relation S) to nodeR

Total costs: 2 messages, 18 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 19 / 24

Page 41: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Ship whole

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (relation S) to nodeS

nodeS : send requested data (relation S) to nodeR

Total costs: 2 messages, 18 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 19 / 24

Page 42: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Ship whole

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (relation S) to nodeS

nodeS : send requested data (relation S) to nodeR

Total costs: 2 messages, 18 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 19 / 24

Page 43: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Ship whole

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR: 2 messages, 18 attribute values

Execution at nodeS : 2 messages, 14 attribute values

Execution at nodeX : 4 messages, 18 + 14 = 32 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 19 / 24

Page 44: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 45: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 46: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 47: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 48: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 49: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 50: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 51: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 52: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 53: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 54: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 55: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Fetch as needed

R A B3 71 14 67 74 56 25 7

S B C D9 8 81 5 19 4 24 3 34 2 65 7 8

R on S A B C D1 1 5 14 5 7 8

Execution at nodeR

nodeR: send data request message (tuples of relation S with B = ‘7′) to nodeS

nodeS : send requested data (0 tuples of relation S with B = ‘7′) to nodeR

nodeR: send data request message (tuples of relation S with B = ‘1′) to nodeS

nodeS : send requested data (1 tuple of relation S with B = ‘1′) to nodeR

. . .

Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values

Execution at nodeS : 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute valuesKatja Hose Distributed Database Systems Dagstuhl, June 27, 2017 20 / 24

Page 56: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Ship whole vs. fetch as needed

Conclusion

Fetch as needed results in a high number of messages

Ship whole results in high amounts of transferred data

More advanced join execution algorithms

Semijoin

Bitvector join

. . .

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 21 / 24

Page 57: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Ship whole vs. fetch as needed

Conclusion

Fetch as needed results in a high number of messages

Ship whole results in high amounts of transferred data

More advanced join execution algorithms

Semijoin

Bitvector join

. . .

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 21 / 24

Page 58: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Semijoin

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 22 / 24

Page 59: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Query Execution

Bitvector join

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 23 / 24

Page 60: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Summary

Query optimization in distributed database systems benefits from globalknowledge and control.

We can decide on fragmentation and allocation

Create detailed statistics

Finetune a cost model

Optimize join processing (Semijoin, Bitvector join,. . . )

. . .

Further interesting topics

Distributed concurrency control

Reliability and failure

Heterogeneity and data integration

. . .

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 24 / 24

Page 61: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

Distributed query processing

Summary

Query optimization in distributed database systems benefits from globalknowledge and control.

We can decide on fragmentation and allocation

Create detailed statistics

Finetune a cost model

Optimize join processing (Semijoin, Bitvector join,. . . )

. . .

Further interesting topics

Distributed concurrency control

Reliability and failure

Heterogeneity and data integration

. . .Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 24 / 24

Page 62: Distributed Database Systemsmaterials.dagstuhl.de/files/17/17262/17262.KatjaHose1.Slides.pdf · Phases of distributed query processing Katja Hose Distributed Database Systems Dagstuhl,

Distributed Database Systems

References

References

This talk involves material based on the following sources:

M. Tamer Ozsu, P. Valduriez. Principles of Distributed Database Systems.Third Edition, Springer, 2011.

P. Dadam. Verteilte Datenbanken und Client/Server-Systeme.Springer-Verlag, Berlin, Heidelberg 1996.

Kai-Uwe Sattler Vorlesung: Verteilte Datenbanksysteme.

Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 24 / 24