distributed database systemsmaterials.dagstuhl.de/files/17/17262/17262.katjahose1.slides.pdf ·...
TRANSCRIPT
Distributed Database Systems
A classic perspective
Katja Hose
Department of Computer Science, Aalborg University
Dagstuhl, June 27, 2017
Katja Hose Distributed Database Systems Dagstuhl, June 27, 2017 1 / 24
Introduction
What is distributed data management?
Example: distributed database system
Scenario
Company with offices in Aalborg, Berlin, New York, and Paris
Database of products and customers
Definition: distributed database
A collection of multiple, logically interrelated databases distributed over a computer network
Challenges
Distributed database design: fragmentation, replication, and distribution
Distributed query processing: executing a query over the network in the most cost-effective way
Distributed concurrency control: synchronizing access such that integrity is maintained
Reliability of distributed DBMS: ensuring consistency, detecting failures, and recovering from failures
Heterogeneous databases: translation between database systems (data model, data language)
1 Introduction
  What is distributed data management?
  Challenges
2 Fragmentation and Allocation
3 Distributed query processing
  Data localization
  Global Query Optimization
  Join Order Optimization
  Query Execution
Fragmentation and Allocation
Fragmentation
Alternatives
Tuples vs. columns, i.e., horizontal vs. vertical
Horizontal fragmentation
Vertical fragmentation
Many algorithms proposed in the literature
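As a rough illustration of the two alternatives, the sketch below splits a small relation by tuples and by columns and then reconstructs it. The relation `products`, its attributes, and the fragmentation predicate are hypothetical and not taken from the talk.

```python
# Hypothetical relation: each tuple is a dict keyed by attribute name.
products = [
    {"pno": 1, "name": "bolt", "price": 5, "site": "Aalborg"},
    {"pno": 2, "name": "nut", "price": 3, "site": "Berlin"},
    {"pno": 3, "name": "gear", "price": 40, "site": "Aalborg"},
]


def horizontal_fragments(relation, predicate):
    """Split the tuples into those satisfying the predicate and the rest."""
    matching = [t for t in relation if predicate(t)]
    rest = [t for t in relation if not predicate(t)]
    return matching, rest


def vertical_fragment(relation, key, attributes):
    """Project onto the key plus the given attributes (the key allows reconstruction)."""
    return [{a: t[a] for a in (key, *attributes)} for t in relation]


def rejoin(left, right, key):
    """Reconstruct a vertically fragmented relation by joining on the key."""
    right_by_key = {t[key]: t for t in right}
    return [{**t, **right_by_key[t[key]]} for t in left]


# Horizontal: the relation is the union of its fragments.
aalborg, elsewhere = horizontal_fragments(products, lambda t: t["site"] == "Aalborg")

# Vertical: the relation is the join of its fragments on the key.
catalog = vertical_fragment(products, "pno", ["name", "price"])
placement = vertical_fragment(products, "pno", ["site"])
rebuilt = rejoin(catalog, placement, "pno")
```

Keeping the key in every vertical fragment is what makes the join-based reconstruction possible.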
Allocation
Golden rules of allocation
Place data as close as possible to where it will be used
Use load balancing to find a global optimization of system performance
Optimal allocation
Objective: minimize storage costs ($\sum S$) and transfer costs ($\sum T$) for $P$ fragments and $K$ nodes
Optimization problem
Find the minimum of $\sum S + \sum T$
Often heuristics are applied
Many algorithms proposed in the literature
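The optimization problem can be sketched as an exhaustive search over all ways of placing each fragment on exactly one node. This is only an illustration: the function name, the cost figures, and the way the transfer term is modelled (access frequency times distance) are assumptions, not the cost model of the talk.

```python
from itertools import product


def best_allocation(storage_cost, access_freq, distance):
    """Exhaustively search all non-replicated allocations.

    storage_cost[p][k]: cost of storing fragment p at node k (term S)
    access_freq[p][j]:  how often node j accesses fragment p
    distance[j][k]:     cost of serving one access of node j from node k
    The transfer term T is modelled as frequency * distance (an assumption).
    """
    fragments = range(len(storage_cost))
    nodes = range(len(distance))
    best = None
    for assignment in product(nodes, repeat=len(storage_cost)):
        total = 0
        for p in fragments:
            k = assignment[p]
            total += storage_cost[p][k]
            total += sum(access_freq[p][j] * distance[j][k] for j in nodes)
        if best is None or total < best[0]:
            best = (total, assignment)
    return best


# Hypothetical numbers for P = 2 fragments and K = 2 nodes.
storage_cost = [[1, 1], [1, 1]]
access_freq = [[10, 1], [2, 8]]
distance = [[0, 5], [5, 0]]
cost, assignment = best_allocation(storage_cost, access_freq, distance)
```

The search visits K^P assignments, which grows quickly and is one reason heuristics are often applied in practice.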
Distributed query processing
Phases of distributed query processing
Data localization
Example – horizontal reduction
Schema
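$\mathit{Projects}_1 = \sigma_{\mathit{Budget} \le 150.000}(\mathit{Projects})$
$\mathit{Projects}_2 = \sigma_{150.000 < \mathit{Budget} \le 200.000}(\mathit{Projects})$
$\mathit{Projects}_3 = \sigma_{\mathit{Budget} > 200.000}(\mathit{Projects})$
Reconstruction expression (horizontal fragmentation)
$\mathit{Projects} = \mathit{Projects}_1 \cup \mathit{Projects}_2 \cup \mathit{Projects}_3$
Example query
$\sigma_{\mathit{Location} = \text{'Aalborg'} \land \mathit{Budget} \le 100.000}(\mathit{Projects})$
After replacing references to global relations
$\sigma_{\mathit{Location} = \text{'Aalborg'} \land \mathit{Budget} \le 100.000}(\mathit{Projects}_1 \cup \mathit{Projects}_2 \cup \mathit{Projects}_3)$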
Further optimization is possible!
Objective
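Eliminate unnecessary subqueries (those whose fragmentation predicate contradicts the query predicate)
Query with fragmentation expression
$\sigma_{\mathit{Location} = \text{'Aalborg'} \land \mathit{Budget} \le 100.000}(\mathit{Projects}_1 \cup \mathit{Projects}_2 \cup \mathit{Projects}_3)$
Fragment definitions
$\mathit{Projects}_1 = \sigma_{\mathit{Budget} \le 150.000}(\mathit{Projects})$
$\mathit{Projects}_2 = \sigma_{150.000 < \mathit{Budget} \le 200.000}(\mathit{Projects})$
$\mathit{Projects}_3 = \sigma_{\mathit{Budget} > 200.000}(\mathit{Projects})$
Because of
$\sigma_{\mathit{Budget} \le 100.000}(\mathit{Projects}_2) = \emptyset$, $\sigma_{\mathit{Budget} \le 100.000}(\mathit{Projects}_3) = \emptyset$
We obtain the reduced query
$\sigma_{\mathit{Location} = \text{'Aalborg'}}(\sigma_{\mathit{Budget} \le 100.000}(\mathit{Projects}_1))$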
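The contradiction test behind this reduction can be sketched by treating each fragmentation predicate on Budget as an interval and keeping only the fragments whose interval overlaps the query's. The function name and the interval encoding are assumptions made for illustration, and 150.000 is read here as one hundred fifty thousand.

```python
import math

# Fragmentation predicates on Budget as half-open intervals (low, high].
fragments = {
    "Projects1": (-math.inf, 150_000),
    "Projects2": (150_000, 200_000),
    "Projects3": (200_000, math.inf),
}


def relevant_fragments(fragments, query_low, query_high):
    """Keep fragments whose Budget interval overlaps (query_low, query_high]."""
    kept = []
    for name, (low, high) in fragments.items():
        # Two half-open intervals (a, b] and (c, d] intersect exactly
        # when the larger lower bound lies below the smaller upper bound.
        if max(low, query_low) < min(high, query_high):
            kept.append(name)
    return kept


# Query predicate Budget <= 100.000, i.e. the interval (-inf, 100000].
kept = relevant_fragments(fragments, -math.inf, 100_000)
```

Only the Budget part of the query predicate takes part in the test, because the fragments are defined on Budget alone; the Location condition still has to be evaluated on the remaining fragment.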
Example – join reduction
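Query: $\mathit{Projects} \bowtie \mathit{Assignment}$
After replacing global relations with reconstruction expressions
$(\mathit{Projects}_1 \cup \mathit{Projects}_2 \cup \mathit{Projects}_3) \bowtie (\mathit{Assignment}_1 \cup \mathit{Assignment}_2)$
After applying the distributive law
$(\mathit{Projects}_1 \bowtie \mathit{Assignment}_1) \cup (\mathit{Projects}_1 \bowtie \mathit{Assignment}_2) \cup$
$(\mathit{Projects}_2 \bowtie \mathit{Assignment}_1) \cup (\mathit{Projects}_2 \bowtie \mathit{Assignment}_2) \cup$
$(\mathit{Projects}_3 \bowtie \mathit{Assignment}_1) \cup (\mathit{Projects}_3 \bowtie \mathit{Assignment}_2)$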
Further optimization is possible!
Query with fragmentation expression
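$(\mathit{Projects}_1 \bowtie \mathit{Assignment}_1) \cup (\mathit{Projects}_1 \bowtie \mathit{Assignment}_2) \cup$
$(\mathit{Projects}_2 \bowtie \mathit{Assignment}_1) \cup (\mathit{Projects}_2 \bowtie \mathit{Assignment}_2) \cup$
$(\mathit{Projects}_3 \bowtie \mathit{Assignment}_1) \cup (\mathit{Projects}_3 \bowtie \mathit{Assignment}_2)$
Some of these partial joins are empty, e.g.:
$\mathit{Projects}_1 \bowtie \mathit{Assignment}_2 = \emptyset$
Because their fragmentation expressions contradict:
$\mathit{Projects}_1 = \sigma_{\mathit{PNo} = \text{'P1'} \lor \mathit{PNo} = \text{'P2'}}(\mathit{Projects})$ and
$\mathit{Assignment}_2 = \sigma_{\mathit{PNo} = \text{'P3'} \lor \mathit{PNo} = \text{'P4'}}(\mathit{Assignment})$
Reduced query
$(\mathit{Projects}_1 \bowtie \mathit{Assignment}_1) \cup (\mathit{Projects}_2 \bowtie \mathit{Assignment}_2) \cup (\mathit{Projects}_3 \bowtie \mathit{Assignment}_2)$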
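The pruning of empty partial joins can be sketched by representing each fragmentation predicate as the set of PNo values it admits and dropping every pair of fragments whose sets are disjoint. Only the predicates of Projects1 and Assignment2 are given in the slides; the other three sets below are assumptions chosen so that the sketch yields the reduced query shown above.

```python
from itertools import product

# Fragmentation predicates as sets of admissible PNo values.
projects = {
    "Projects1": {"P1", "P2"},    # given in the slides
    "Projects2": {"P3"},          # assumption
    "Projects3": {"P4"},          # assumption
}
assignments = {
    "Assignment1": {"P1", "P2"},  # assumption
    "Assignment2": {"P3", "P4"},  # given in the slides
}


def non_empty_joins(left, right):
    """Keep only the fragment pairs whose PNo predicates can hold together."""
    return [
        (p_name, a_name)
        for (p_name, p_values), (a_name, a_values) in product(left.items(), right.items())
        if p_values & a_values  # an empty intersection means an empty join
    ]


reduced = non_empty_joins(projects, assignments)
```

Of the six partial joins produced by the distributive law, only three survive, so only those need to be shipped and executed.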
Global Query Optimization
Phases
1 Spanning the search space using transformation rules → equivalent search plans
2 Applying a search strategy and a cost model → choose an efficient plan
Main focus: join trees and join ordering
Search space
Tree variants for join order optimization
Linear join trees
All inner nodes have at least one leaf node (base relation) as child
Reduces search space
Example (linear join tree): $((R_3 \bowtie R_4) \bowtie R_2) \bowtie R_1$
Bushy trees
May have inner nodes with no base relation as child
High potential for parallelization
Example (bushy join tree): $(R_1 \bowtie R_2) \bowtie (R_3 \bowtie R_4)$
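To get a feeling for how much the restriction to linear trees shrinks the search space, the following sketch enumerates all join trees over four relations and counts how many of them are linear. It treats the left and right input of a join as distinct and ignores join predicates, which are simplifying assumptions; the function names are made up for this example.

```python
from itertools import combinations


def join_trees(relations):
    """Yield every binary join tree over the given relation names.

    A leaf is a relation name, an inner node is a (left, right) pair.
    """
    rels = tuple(relations)
    if len(rels) == 1:
        yield rels[0]
        return
    for size in range(1, len(rels)):
        for left in combinations(rels, size):
            right = tuple(r for r in rels if r not in left)
            for left_tree in join_trees(left):
                for right_tree in join_trees(right):
                    yield (left_tree, right_tree)


def is_linear(tree):
    """True if every inner node has at least one base relation as child."""
    if isinstance(tree, str):
        return True
    left, right = tree
    if not (isinstance(left, str) or isinstance(right, str)):
        return False
    return is_linear(left) and is_linear(right)


all_trees = list(join_trees(("R1", "R2", "R3", "R4")))
linear_trees = [t for t in all_trees if is_linear(t)]
```

With four relations the difference is still small, but the gap widens rapidly as more relations are joined.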
Search strategy
Deterministic search strategy
Systematic generation of query plans
Starting with plans accessing the base relations
Constructing complex plans by combining simpler plans, e.g., joining one more relation at each step
Implementations
Exhaustive search guarantees finding the best plan
Dynamic programming
Greedy algorithm
Cost model
Detailed statistics
Calibrated parameters
Formulas (cardinality estimation, processing costs, communication costs, etc.)
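A deterministic, dynamic-programming search that adds one relation per step could look like the sketch below. The cost model is deliberately crude and is an assumption of this example: the cost of a plan is the sum of its estimated intermediate result sizes, and result sizes come from hypothetical cardinalities and join selectivities. Communication costs are left out.

```python
from itertools import combinations


def best_left_deep_plan(card, sel):
    """Return (cost, result cardinality, join order) of the cheapest left-deep plan.

    card: relation name -> cardinality
    sel:  frozenset({r, s}) -> join selectivity (missing pairs are cross products)
    """
    rels = sorted(card)
    # Plans accessing a single base relation cost nothing in this model.
    best = {frozenset([r]): (0.0, float(card[r]), (r,)) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in combinations(rels, size):
            s = frozenset(subset)
            for r in subset:
                rest = s - {r}
                cost, rows, order = best[rest]
                # Estimated size of joining r to the plan for the remaining relations.
                out = rows * card[r]
                for other in rest:
                    out *= sel.get(frozenset((other, r)), 1.0)
                total = cost + out
                if s not in best or total < best[s][0]:
                    best[s] = (total, out, order + (r,))
    return best[frozenset(rels)]


# Hypothetical statistics.
card = {"R1": 1000, "R2": 100, "R3": 10}
sel = {frozenset(("R1", "R2")): 0.01, frozenset(("R2", "R3")): 0.1}
cost, rows, order = best_left_deep_plan(card, sel)
```

Because the best plan for every subset is stored and reused, the search avoids re-evaluating all permutations while still finding the cheapest left-deep order under this model.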
Query Execution
Site selection and data transfer
Query shipping
Query initiator (node at which the query is issued/optimized) sends the query to other nodes
Receiver nodes compute the query result and ship the result back to the initiator
Data shipping
Query remains at the initiator
Initiator sends data request messages to other nodes
Receiver nodes ship all required data to the initiator
Initiator computes result
Hybrid shipping
Initiator sends partial queries to other nodes
Other nodes execute some parts of the query and send intermediate results to the initiator
Initiator executes remaining query operations (post-processing)
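The difference between query shipping and data shipping can be made concrete by counting how many tuples cross the network for a simple selection. The relation, the predicate, and the function names below are hypothetical; hybrid shipping would sit between the two by evaluating only part of the query remotely.

```python
# Hypothetical relation stored at a remote node, as (PNo, Budget) tuples.
remote_projects = [("P1", 90_000), ("P2", 160_000), ("P3", 250_000), ("P4", 70_000)]


def wanted(t):
    """Query predicate, evaluated either remotely or at the initiator."""
    return t[1] <= 100_000


def query_shipping(relation, predicate):
    """The remote node evaluates the query; only the result is transferred."""
    result = [t for t in relation if predicate(t)]
    return result, len(result)


def data_shipping(relation, predicate):
    """All required data is transferred; the initiator evaluates the query."""
    transferred = list(relation)
    result = [t for t in transferred if predicate(t)]
    return result, len(transferred)


qs_result, qs_tuples = query_shipping(remote_projects, wanted)
ds_result, ds_tuples = data_shipping(remote_projects, wanted)
```

Both strategies return the same answer; they differ in where the work is done and how much data is moved.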
Ship whole
R
A B
3 7
1 1
4 6
7 7
4 5
6 2
5 7

S
B C D
9 8 8
1 5 1
9 4 2
4 3 3
4 2 6
5 7 8

$R \bowtie S$
A B C D
1 1 5 1
4 5 7 8

Execution at node_R
node_R: send data request message (relation S) to node_S
node_S: send requested data (relation S) to node_R
Total costs: 2 messages, 18 attribute values
Execution at node_R: 2 messages, 18 attribute values
Execution at node_S: 2 messages, 14 attribute values
Execution at node_X: 4 messages, 18 + 14 = 32 attribute values
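The three cost figures can be reproduced with a few lines of code. The sketch below uses the example relations R and S and counts one request and one response per shipped relation; the function names are made up for this example.

```python
R = [(3, 7), (1, 1), (4, 6), (7, 7), (4, 5), (6, 2), (5, 7)]            # R(A, B)
S = [(9, 8, 8), (1, 5, 1), (9, 4, 2), (4, 3, 3), (4, 2, 6), (5, 7, 8)]  # S(B, C, D)


def attribute_values(relation):
    """Number of attribute values needed to ship the whole relation."""
    return sum(len(t) for t in relation)


def ship_whole_cost(remote_relations):
    """One request and one response per remote relation, shipped in full."""
    messages = 2 * len(remote_relations)
    values = sum(attribute_values(rel) for rel in remote_relations)
    return messages, values


at_node_R = ship_whole_cost([S])     # R is local, S is shipped
at_node_S = ship_whole_cost([R])     # S is local, R is shipped
at_node_X = ship_whole_cost([R, S])  # both relations go to a third node
```

Executing the join at the node that holds the larger relation keeps the transferred volume lowest in this example.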
Fetch as needed
Execution at node_R
node_R: send data request message (tuples of relation S with B = '7') to node_S
node_S: send requested data (0 tuples of relation S with B = '7') to node_R
node_R: send data request message (tuples of relation S with B = '1') to node_S
node_S: send requested data (1 tuple of relation S with B = '1') to node_R
. . .
Total costs: 7 · 2 = 14 messages, 7 + 2 · 3 = 13 attribute values
Execution at node_S: 6 · 2 = 12 messages, 6 + 2 · 2 = 10 attribute values
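The message and value counts follow directly from issuing one request per local tuple. A minimal sketch, assuming each request carries a single B value and each response carries the complete matching tuples; the function and parameter names are made up for this example.

```python
R = [(3, 7), (1, 1), (4, 6), (7, 7), (4, 5), (6, 2), (5, 7)]            # R(A, B)
S = [(9, 8, 8), (1, 5, 1), (9, 4, 2), (4, 3, 3), (4, 2, 6), (5, 7, 8)]  # S(B, C, D)


def fetch_as_needed(local, remote, local_b, remote_b):
    """Request the matching remote tuples once per local tuple.

    local_b and remote_b are the positions of join attribute B in each relation.
    Returns (messages, attribute values transferred, matching tuple pairs).
    """
    messages = 0
    values = 0
    matches = []
    for t in local:
        hits = [u for u in remote if u[remote_b] == t[local_b]]
        messages += 2                        # one request, one response
        values += 1                          # the B value sent in the request
        values += sum(len(u) for u in hits)  # the matching tuples sent back
        matches.extend((t, u) for u in hits)
    return messages, values, matches


at_node_R = fetch_as_needed(R, S, local_b=1, remote_b=0)
at_node_S = fetch_as_needed(S, R, local_b=0, remote_b=1)
```

Note that repeated B values (for example 7 in R) trigger repeated requests in this naive form, which is part of why the message count is so high.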
Ship whole vs. fetch as needed
Conclusion
Fetch as needed results in a high number of messages
Ship whole results in high amounts of transferred data
More advanced join execution algorithms
Semijoin
Bitvector join
. . .
Semijoin
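The slide body is not preserved in this transcript, so the following is only a sketch of the general textbook idea of a semijoin-based join, applied to the relations R(A, B) and S(B, C, D) from the ship-whole example. It assumes that node_R sends a duplicate-free projection of R on B and that node_S answers with the S tuples that have a join partner; the function name is made up.

```python
R = [(3, 7), (1, 1), (4, 6), (7, 7), (4, 5), (6, 2), (5, 7)]            # R(A, B)
S = [(9, 8, 8), (1, 5, 1), (9, 4, 2), (4, 3, 3), (4, 2, 6), (5, 7, 8)]  # S(B, C, D)


def semijoin_at_node_R(R, S):
    """Join R and S at node_R, shipping only the S tuples that can match."""
    # Step 1: node_R sends the distinct join values of R.B to node_S.
    b_values = {b for _, b in R}
    # Step 2: node_S returns only the S tuples with a join partner in R.
    reduced_S = [s for s in S if s[0] in b_values]
    # Step 3: node_R joins R with the reduced S locally.
    result = [(a, b, c, d) for a, b in R for sb, c, d in reduced_S if b == sb]
    messages = 2
    values = len(b_values) + sum(len(s) for s in reduced_S)
    return result, messages, values


result, messages, values = semijoin_at_node_R(R, S)
```

Under these assumptions the join needs as few messages as ship whole while transferring fewer attribute values.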
Bitvector join
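As with the semijoin, the slide body is missing here, so this is a hedged sketch of the general bit vector idea rather than the talk's own presentation. Instead of the join values themselves, node_R ships a small bit vector built by hashing R.B; the hash function (value modulo 8), the vector size, and the function name are assumptions of this example.

```python
R = [(3, 7), (1, 1), (4, 6), (7, 7), (4, 5), (6, 2), (5, 7)]            # R(A, B)
S = [(9, 8, 8), (1, 5, 1), (9, 4, 2), (4, 3, 3), (4, 2, 6), (5, 7, 8)]  # S(B, C, D)


def bitvector_join_at_node_R(R, S, bits=8):
    """Join R and S at node_R, using a hashed bit vector as a filter on S."""
    # Step 1: node_R hashes every R.B value into a bit vector and ships it.
    vector = [False] * bits
    for _, b in R:
        vector[b % bits] = True
    # Step 2: node_S ships every S tuple whose hashed B value hits a set bit.
    candidates = [s for s in S if vector[s[0] % bits]]
    # Step 3: node_R performs the real join, which discards false positives.
    result = [(a, b, c, d) for a, b in R for sb, c, d in candidates if b == sb]
    return candidates, result


candidates, result = bitvector_join_at_node_R(R, S)
```

The two S tuples with B = 9 collide with B = 1 under this hash and are shipped needlessly, which shows the trade-off: the filter is compact but may let false positives through.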
Summary
Query optimization in distributed database systems benefits from global knowledge and control.
We can decide on fragmentation and allocation
Create detailed statistics
Fine-tune a cost model
Optimize join processing (semijoin, bitvector join, . . . )
. . .
Further interesting topics
Distributed concurrency control
Reliability and failure
Heterogeneity and data integration
. . .
References
This talk involves material based on the following sources:
M. Tamer Özsu, P. Valduriez. Principles of Distributed Database Systems. Third Edition, Springer, 2011.
P. Dadam. Verteilte Datenbanken und Client/Server-Systeme. Springer-Verlag, Berlin, Heidelberg, 1996.
Kai-Uwe Sattler. Lecture (Vorlesung): Verteilte Datenbanksysteme.