07. overview of query processing

35
Distributed Database Systems Autumn, 2007 Chapter 7 Overview of Query Processing 1 Distributed Database Systems

Upload: kartik-rupani

Post on 24-Apr-2015

225 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: 07. Overview of Query Processing

Distributed Database SystemsAutumn, 2007

Chapter 7

Overview of Query Processing

1Distributed Database Systems

Page 2: 07. Overview of Query Processing

SQL: Non-Procedural Language of RDB

Tuple calculus◦ { t | F(t) } where:

t : tuple variableF(t) : well formed formula

Example◦ Get the No. and name of all managers

2Distributed Database Systems

( ) ( ){ }""|, MANAGERTITLEtEMPtENAMEENOt =∧∈

Page 3: 07. Overview of Query Processing

SQL: Non-Procedural Language of RDB

Domain calculus

where:xi : domain variables

: well formed formula

Example

{ x, y | E(x, y, "manager") }

3Distributed Database Systems

( ){ } ,,,|,,, 2121 nn xxxFxxx ⋅⋅⋅⋅⋅⋅

( )nxxxF ,,, 21 ⋅⋅⋅

Variables are position sensitive!

Page 4: 07. Overview of Query Processing

SQL: Non-Procedural Language of RDB

SQL is a tuple calculus language

SELECT ENO,ENAMEFROM EMPWHERE TITLE=“manager”

4Distributed Database Systems

End user uses non-procedural languages to express queries.

Page 5: 07. Overview of Query Processing

Query Processor

Query processor transforms queries into procedural operations to access data

5Distributed Database Systems

Page 6: 07. Overview of Query Processing

Query Processor

Distributed query processor has to deal with

◦query decomposition, and

◦data localization

6Distributed Database Systems

Page 7: 07. Overview of Query Processing

7.1 Query Processing Problems

Distributed Database Systems 7

Page 8: 07. Overview of Query Processing

7.1 Query Processing ProblemsCentralized query processor must◦ transform calculus query into algebra operation, and

◦choose the best execution planExample:

SELECT ENAMEFROM E,GWHERE E.ENO = G.ENOAND RESP=“manager”

8Distributed Database Systems

Page 9: 07. Overview of Query Processing

7.1 Query Processing Problems

Relational Algebra 1

Relational Algebra 2

9Distributed Database Systems

( )( )GE ManagerRESPENOENAME ""=σπ <>

( )( )GEENOGENOEManagerRESPENAME ×=∧= ..""σπ

Execution plan 2 is better for consuming less resources!

Page 10: 07. Overview of Query Processing

7.1 Query Processing Problems

In DDB, the query processor must consider the communication cost and select the best site!Same query as last example, but G and E are distributed.Simple plan:◦ To transport all segments to query site and

execute there. This causes too much network traffic, very costly.

10Distributed Database Systems

Page 11: 07. Overview of Query Processing

7.1 Query Processing Problems

Distributed Query Example◦ Distribution of E and G

11Distributed Database Systems

Page 12: 07. Overview of Query Processing

7.1 Query Processing Problems

Distributed Query Example◦ Query

12Distributed Database Systems

( )( )GE ManagerREPSPENOENAME ""=σπ <>

Page 13: 07. Overview of Query Processing

7.1 Query Processing Problems

Distributed Query Example◦ Optimized Processing

13Distributed Database Systems

Page 14: 07. Overview of Query Processing

7.2 Objectives of Query Processing

Distributed Database Systems 14

Page 15: 07. Overview of Query Processing

7.2 Objectives of Query Processing

Two-fold objectives:

◦Transformation, and

◦Optimization

15Distributed Database Systems

Page 16: 07. Overview of Query Processing

7.2 Objectives of Query Processing

Cost to be considered for optimization:

◦CPU time◦I/O time, and

◦Communication time

16Distributed Database Systems

WAN: the last cost is dominant

LAN: all three are equal

Page 17: 07. Overview of Query Processing

7.3 Complexity of Relational Algebra Operations

Distributed Database Systems 17

Page 18: 07. Overview of Query Processing

7.3 Complexity of Relational Algebra Operations

Measured by n (cardinality) and tuples are sorted on comparison attributes

Distributed Database Systems 18

O(n)

O(nÿlogn)

O(nÿlogn)

O(n2)

)duplicates(with ,πσ

GROUP ),duplicates(with π

−÷ ,,,, UI<>

×

Page 19: 07. Overview of Query Processing

7.4 Characterization of Query Processor

Distributed Database Systems 19

Page 20: 07. Overview of Query Processing

7.4.1 Languages

For users:

◦ calculus or algebra based languages.For query processor:

◦map the input into internal form of algebra augmented with communication primitives.

Distributed Database Systems 20

Page 21: 07. Overview of Query Processing

7.4.2 Types of Optimization

Exhaustive search◦ Workable for small solution space

Heuristics

◦ Perform first, semi-join, etc. for largesolution space

Distributed Database Systems 21

σ π,

Page 22: 07. Overview of Query Processing

7.4.3 Optimization Timing

Static◦ Do it at compiling time by using statistics,

appropriate for exhaustive search, optimized once, but executed many times.

Dynamic◦ Do it at execution time, accurate, repeated

for every execution, expensive.

Distributed Database Systems 22

Page 23: 07. Overview of Query Processing

7.4.4 Statistics

Facts of◦ Cardinalities◦ Attribute value distribution◦ Size of relation, etc.

Provided to query optimizer and periodically updated.

Distributed Database Systems 23

Page 24: 07. Overview of Query Processing

7.4.5 Decision Site

For query optimization, it may be done by◦ Single site – centralized approach, or◦ All the sites involved – distributed, or◦ Hybrid – one site makes major decision in

cooperation with other sites making local decisions

Distributed Database Systems 24

Page 25: 07. Overview of Query Processing

7.4.6 Exploration of the Network Topology

WAN◦ communication cost is dominant

LAN◦ communication cost is comparable to I/O

cost. Broadcasting capability, star network, satellite network should be considered.

Distributed Database Systems 25

Page 26: 07. Overview of Query Processing

7.4.7 Exploration of Replicated Fragments

Use replications to minimize communication costs.

Distributed Database Systems 26

Page 27: 07. Overview of Query Processing

7.4.8 Use of Semi-joins

Reduce the size of operand relations to cut down communication costs when overhead is not significant.

Distributed Database Systems 27

Page 28: 07. Overview of Query Processing

7.5 Layers of Query Processing

Distributed Database Systems 28

Page 29: 07. Overview of Query Processing

Distributed Database Systems 29

Generic Laying Scheme for Distributed Query Processing

Page 30: 07. Overview of Query Processing

7.5.1 Query Decomposition

Decompose calculus query into algebra query using global conceptual schema information.

Distributed Database Systems 30

Step 1 – calculus normalization

Step 2 – semantic analysis to reject incorrect queries

Step 3 – simplification to eliminate redundant components

Step 4 – translation of calculus query into optimized algebra query.

Page 31: 07. Overview of Query Processing

7.5.2 Data Localization

Distributed query is mapped into a fragment query and simplified to produce a good one.

Distributed Database Systems 31

Page 32: 07. Overview of Query Processing

7.5.3 Global Query Optimization

Find an execution strategy close to optimal.Find the best ordering of operations in the fragment query, including communication operations.Cost function defined in time is required.

Distributed Database Systems 32

Page 33: 07. Overview of Query Processing

7.5.4 Local Query Optimization

Centralized system algorithms

(to be discussed in chapter 9)

Distributed Database Systems 33

Page 34: 07. Overview of Query Processing

7.6 Conclusions

Distributed Database Systems 34

Page 35: 07. Overview of Query Processing

7.6 Conclusions

Query processor – must be able to find good execution plan for a calculus query, s. t. CPU time, I/O time and communication time are minimized.Method: laying of◦ decomposition◦ localization◦ global query optimization◦ local query optimization

Distributed Database Systems 35