Query Processing
-
8/9/2019 Query Processing
Chapter 21
Query Processing
Transparencies
© Pearson Education Limited 1995, 2005
-
Chapter 21 - Objectives
Objectives of query processing and optimization.
Static versus dynamic query optimization.
How a query is decomposed and semantically analyzed.
How to create a relational algebra tree (R.A.T.) to represent a query.
Rules of equivalence for RA operations.
How to apply heuristic transformation rules to improve the efficiency of a query.
-
Chapter 21 - Objectives
Types of database statistics required to estimate cost of operations.
Different strategies for implementing selection.
How to evaluate cost and size of selection.
Different strategies for implementing join.
How to evaluate cost and size of join.
Different strategies for implementing projection.
How to evaluate cost and size of projection.
-
Chapter 21 - Objectives
How to evaluate the cost and size of other RA operations.
How pipelining can be used to improve efficiency of queries.
Difference between materialization and pipelining.
Advantages of left-deep trees.
Approaches to finding optimal execution strategy.
How Oracle handles query optimization (QO).
-
Introduction
In network and hierarchical DBMSs, a low-level procedural query language is generally embedded in a high-level programming language.
It is the programmer's responsibility to select the most appropriate execution strategy.
With declarative languages such as SQL, the user specifies what data is required rather than how it is to be retrieved.
This relieves the user of knowing what constitutes a good execution strategy.
-
Introduction
Also gives DBMS more control over system performance.
Two main techniques for query optimization:
heuristic rules that order operations in a query;
comparing different strategies based on relative costs, and selecting one that minimizes resource usage.
Disk access tends to be the dominant cost in query processing for a centralized DBMS.
-
Query Processing
Activities involved in retrieving data from the database.
Aims of QP:
transform query written in high-level language (e.g. SQL) into correct and efficient execution strategy expressed in low-level language (implementing RA);
execute strategy to retrieve required data.
-
Query Optimization
Activity of choosing an efficient execution strategy for processing a query.
As there are many equivalent transformations of the same high-level query, the aim of QO is to choose the one that minimizes resource usage.
Generally, reduce total execution time of query.
May also reduce response time of query.
Problem is computationally intractable with a large number of relations, so strategy adopted is reduced to finding a near-optimum solution.
-
Example 21.1 - Different Strategies
Three equivalent RA queries are:
(1) $\sigma_{(position=\text{'Manager'}) \wedge (city=\text{'London'}) \wedge (Staff.branchNo=Branch.branchNo)}(Staff \times Branch)$
(2) $\sigma_{(position=\text{'Manager'}) \wedge (city=\text{'London'})}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch)$
(3) $(\sigma_{position=\text{'Manager'}}(Staff)) \bowtie_{Staff.branchNo=Branch.branchNo} (\sigma_{city=\text{'London'}}(Branch))$
-
Example 21.1 - Different Strategies
Assume:
1000 tuples in Staff; 50 tuples in Branch;
50 Managers; 5 London branches;
no indexes or sort keys;
results of any intermediate operations stored on disk;
cost of the final write is ignored;
tuples are accessed one at a time.
-
Example 21.1 - Cost Comparison
Costs (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
Cartesian product and join operations are much more expensive than selection, and the third option significantly reduces the size of the relations being joined together.
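The three totals can be reproduced with a little arithmetic. This is a sketch under the slide's assumptions (one disk access per tuple read or written, intermediate results written to disk and read back, final write ignored); the function and parameter names are illustrative. Note that the "2*50" term in (3) combines two different quantities that happen to be equal here: the 50 manager tuples written and the 50 Branch tuples read.

```python
def strategy_costs(n_staff=1000, n_branch=50, n_managers=50, n_london=5):
    """Disk accesses for the three strategies of Example 21.1.

    Assumes one access per tuple read or written, intermediate results
    written to disk and read back, and the final write ignored.
    """
    # (1) Read Staff and Branch, then write and re-read the full Cartesian product.
    cost1 = (n_staff + n_branch) + 2 * (n_staff * n_branch)
    # (2) Read Staff and Branch for the join; each Staff tuple matches one
    #     Branch tuple, so the join result (n_staff tuples) is written and re-read.
    cost2 = 2 * n_staff + (n_staff + n_branch)
    # (3) Read Staff and write the managers; read Branch and write the London
    #     branches; then read both reduced relations for the join.
    cost3 = (n_staff + n_managers) + (n_branch + n_london) + (n_managers + n_london)
    return cost1, cost2, cost3


print(strategy_costs())  # (101050, 3050, 1160)
```

Changing the parameters shows how quickly the Cartesian-product strategy degrades as either relation grows.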
-
Phases of Query Processing
QP has four main phases:
decomposition (consisting of parsing and validation);
optimization;
code generation;
execution.
-
Phases of Query Processing
-
Dynamic versus Static Optimization
Two times when first three phases of QP can be carried out:
dynamically every time query is run;
statically when query is first submitted.
Advantages of dynamic QO arise from fact that information is up to date.
Disadvantages are that performance of query is affected, and time may limit finding optimum strategy.
-
Dynamic versus Static Optimization
Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.
Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.
Could use a hybrid approach to overcome this, e.g. re-optimizing the query if the database statistics have changed significantly since it was last compiled.
-
Query Decomposition
Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct.
Typical stages are:
analysis,
normalization,
semantic analysis,
simplification,
query restructuring.
-
Analysis
Analyze query lexically and syntactically using compiler techniques.
Verify relations and attributes exist.
Verify operations are appropriate for object type.
-
Analysis - Example
SELECT staff_no
FROM Staff
WHERE position > 10;
This query would be rejected on two grounds:
staff_no is not defined for Staff relation (should be staffNo).
Comparison > 10 is incompatible with the type of position, which is a variable character string.
-
Analysis
Finally, query is transformed into some internal representation more suitable for processing.
Some kind of query tree is typically chosen, constructed as follows:
Leaf node created for each base relation.
Non-leaf node created for each intermediate relation produced by RA operation.
Root of tree represents query result.
Sequence is directed from leaves to root.
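The construction rules above can be sketched as a small tree type. The `Node` class and the `relation`, `select`, `join` and `evaluation_order` helpers are hypothetical names for illustration; the example builds the tree for strategy (3) of Example 21.1 and lists its operations from the leaves to the root.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a relational algebra tree (R.A.T.)."""
    op: str      # "relation" for leaves; otherwise the RA operation
    label: str   # relation name, predicate, or join condition
    children: list[Node] = field(default_factory=list)


def relation(name: str) -> Node:
    return Node("relation", name)          # leaf: one per base relation


def select(pred: str, child: Node) -> Node:
    return Node("select", pred, [child])   # unary operation: one child


def join(cond: str, left: Node, right: Node) -> Node:
    return Node("join", cond, [left, right])


def evaluation_order(node: Node) -> list[str]:
    """Post-order walk: children before parents, i.e. leaves towards the root."""
    steps: list[str] = []
    for child in node.children:
        steps.extend(evaluation_order(child))
    steps.append(f"{node.op}[{node.label}]")
    return steps


# Strategy (3) of Example 21.1.
tree = join(
    "Staff.branchNo=Branch.branchNo",
    select("position='Manager'", relation("Staff")),
    select("city='London'", relation("Branch")),
)
for step in evaluation_order(tree):
    print(step)
```

The last step printed is always the root, i.e. the operation that produces the query result.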
-
Example 21.1 - R.A.T.
-
Normalization
Converts query into a normalized form for easier manipulation.
Predicate can be converted into one of two forms:
Conjunctive normal form:
$(position = \text{'Manager'} \vee salary > 20000) \wedge (branchNo = \text{'B003'})$
Disjunctive normal form:
$(position = \text{'Manager'} \wedge branchNo = \text{'B003'}) \vee (salary > 20000 \wedge branchNo = \text{'B003'})$
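The two normal forms are equivalent by distributivity of AND over OR. A quick truth-table check makes this concrete; it is a sketch that treats each comparison as an independent boolean (reasonable here, since they test different attributes), and `cnf`/`dnf` are illustrative names.

```python
from itertools import product


def cnf(is_manager: bool, high_salary: bool, in_b003: bool) -> bool:
    # (position = 'Manager' OR salary > 20000) AND (branchNo = 'B003')
    return (is_manager or high_salary) and in_b003


def dnf(is_manager: bool, high_salary: bool, in_b003: bool) -> bool:
    # (position = 'Manager' AND branchNo = 'B003')
    #   OR (salary > 20000 AND branchNo = 'B003')
    return (is_manager and in_b003) or (high_salary and in_b003)


# Compare the two forms on all 2**3 truth assignments of the three comparisons.
assignments = list(product([False, True], repeat=3))
print(all(cnf(*a) == dnf(*a) for a in assignments))  # True
```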
-
Semantic Analysis
Rejects normalized queries that are incorrectly formulated or contradictory.
Query is incorrectly formulated if components do not contribute to generation of result.
Query is contradictory if its predicate cannot be satisfied by any tuple.
Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.
-
Semantic Analysis
For these queries, could construct:
a relation connection graph;
a normalized attribute connection graph.
Relation connection graph:
Create node for each relation and node for result. Create edges between two nodes that represent a join, and edges between nodes that represent the source of Projection operations.
If graph is not connected, query is incorrectly formulated.
-
Semantic Analysis - Normalized Attribute Connection Graph
Create node for each reference to an attribute, or constant 0.
Create directed edge between nodes that represent a join, and directed edge between attribute node and 0 node that represents selection.
Weight edges $a \rightarrow b$ with value $c$, if it represents inequality condition $(a \leq b + c)$; weight edges $0 \rightarrow a$ with $-c$, if it represents inequality condition $(a \geq c)$.
If graph has cycle for which valuation sum is negative, query is contradictory.
-
Example 21.2 - Checking Semantic Correctness
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.clientNo = v.clientNo AND
c.maxRent >= 500 AND
c.prefType = 'Flat' AND p.ownerNo = 'CO93';
Relation connection graph is not fully connected, so query is not correctly formulated.
Have omitted the join condition (v.propertyNo = p.propertyNo).
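The connectivity test for the relation connection graph can be sketched with a breadth-first search. Node and edge lists here are written out by hand for the first query of Example 21.2 (one node per relation alias plus `result`); `is_connected` is an illustrative helper, not part of any DBMS API. Single-relation selections such as `c.maxRent >= 500` contribute no edges.

```python
from collections import defaultdict, deque


def is_connected(nodes, edges):
    """Breadth-first search: True if every node is reachable from the first one."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    nodes = list(nodes)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == set(nodes)


# One node per relation plus one for the result (first query of Example 21.2).
nodes = ["c", "v", "p", "result"]
edges = [
    ("c", "v"),        # join predicate c.clientNo = v.clientNo
    ("p", "result"),   # p.propertyNo and p.street are projected into the result
]
print(is_connected(nodes, edges))                 # False: incorrectly formulated
print(is_connected(nodes, edges + [("v", "p")]))  # True once the missing join is added
```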
-
Example 21.2 - Checking Semantic Correctness
[Figures: relation connection graph; normalized attribute connection graph]
-
Example 21.2 - Checking Semantic Correctness
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.maxRent > 500 AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.prefType = 'Flat' AND c.maxRent < 200;
Normalized attribute connection graph has cycle between nodes c.maxRent and 0 with negative valuation sum, so query is contradictory.
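Finding a negative-sum cycle is a standard shortest-path problem, so one way to automate the check is Bellman-Ford. This sketch uses the edge weighting from the normalized attribute connection graph rules; the helper names are hypothetical. As an assumption, strict inequalities are treated as non-strict ($>$ as $\geq$, $<$ as $\leq$): that weakens each condition, so a contradiction found this way is real, though some contradictions that depend only on strictness would be missed.

```python
def has_negative_cycle(nodes, edges):
    """Bellman-Ford check. edges: (u, v, weight) for a directed edge u -> v.

    Every distance starts at 0, which acts like a virtual source joined to
    all nodes, so a negative cycle anywhere in the graph is detected.
    """
    dist = {n: 0 for n in nodes}
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If any edge can still be relaxed, some cycle has a negative weight sum.
    return any(dist[u] + w < dist[v] for u, v, w in edges)


def upper_bound(a, c):
    # a <= c, i.e. a <= 0 + c  ->  edge a -> 0 with weight c
    return (a, "0", c)


def lower_bound(a, c):
    # a >= c  ->  edge 0 -> a with weight -c
    return ("0", a, -c)


nodes = ["0", "c.maxRent"]
# Second query of Example 21.2: c.maxRent > 500 AND c.maxRent < 200.
contradictory = [lower_bound("c.maxRent", 500), upper_bound("c.maxRent", 200)]
print(has_negative_cycle(nodes, contradictory))  # True: cycle sum 200 - 500 < 0
```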
-
Simplification
Detects redundant qualifications, eliminates common sub-expressions, and transforms query to semantically equivalent but more easily and efficiently computed form.
Typically, access restrictions, view definitions, and integrity constraints are considered.
Assuming user has appropriate access privileges, first apply well-known idempotency rules of boolean algebra, such as $p \wedge p \equiv p$ and $p \vee p \equiv p$.
-
Transformation Rules for RA Operations
Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).
$\sigma_{p \wedge q \wedge r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$
Sometimes referred to as cascade of Selection.
$\sigma_{branchNo=\text{'B003'} \wedge salary>15000}(Staff) = \sigma_{branchNo=\text{'B003'}}(\sigma_{salary>15000}(Staff))$
-
Transformation Rules for RA Operations
Commutativity of Selection.
$\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$
For example:
$\sigma_{branchNo=\text{'B003'}}(\sigma_{salary>15000}(Staff)) = \sigma_{salary>15000}(\sigma_{branchNo=\text{'B003'}}(Staff))$
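Both the cascade and the commutativity of Selection can be checked on a toy relation. In this sketch `select` is a hypothetical helper over lists of dicts, and the sample tuples are invented; all three expressions yield the same tuples.

```python
def select(pred, relation):
    """Relational Selection over a list of tuples (dicts), preserving order."""
    return [t for t in relation if pred(t)]


# Hypothetical Staff tuples (only the attributes the predicates use).
STAFF = [
    {"staffNo": "S1", "branchNo": "B003", "salary": 12000},
    {"staffNo": "S2", "branchNo": "B003", "salary": 18000},
    {"staffNo": "S3", "branchNo": "B005", "salary": 30000},
    {"staffNo": "S4", "branchNo": "B003", "salary": 24000},
]


def in_b003(t):
    return t["branchNo"] == "B003"


def earns_over_15000(t):
    return t["salary"] > 15000


conjunctive = select(lambda t: in_b003(t) and earns_over_15000(t), STAFF)
cascaded = select(in_b003, select(earns_over_15000, STAFF))   # cascade of Selection
commuted = select(earns_over_15000, select(in_b003, STAFF))   # commutativity
print([t["staffNo"] for t in conjunctive])  # ['S2', 'S4']
```

In practice the optimizer uses these rules to apply the most selective predicate first, since it shrinks the input to the remaining ones.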
-
Transformation Rules for RA Operations
In a sequence of Projection operations, only the last in the sequence is required.
$\Pi_L \Pi_M \cdots \Pi_N(R) = \Pi_L(R)$
For example:
$\Pi_{lName}\Pi_{branchNo, lName}(Staff) = \Pi_{lName}(Staff)$
-
Transformation Rules for RA Operations
Commutativity of Selection and Projection.
If predicate p involves only attributes in projection list, Selection and Projection operations commute:
$\Pi_{A_1, \ldots, A_m}(\sigma_p(R)) = \sigma_p(\Pi_{A_1, \ldots, A_m}(R))$ where $p \in \{A_1, A_2, \ldots, A_m\}$
For example:
$\Pi_{fName, lName}(\sigma_{lName=\text{'Beech'}}(Staff)) = \sigma_{lName=\text{'Beech'}}(\Pi_{fName, lName}(Staff))$
-
Transformation Rules for RA Operations
Commutativity of Theta join (and Cartesian product).
$R \bowtie_p S = S \bowtie_p R$
$R \times S = S \times R$
Rule also applies to Equijoin and Natural join.
For example:
$Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch = Branch \bowtie_{Staff.branchNo=Branch.branchNo} Staff$
-
Transformation Rules for RA Operations
Commutativity of Selection and Theta join (or Cartesian product).
If selection predicate involves only attributes of one of the relations being joined, Selection and Join (or Cartesian product) operations commute:
$\sigma_p(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r S$
$\sigma_p(R \times S) = (\sigma_p(R)) \times S$
where $p \in \{A_1, A_2, \ldots, A_n\}$, the attributes of R.
-
Transformation Rules for RA Operations
If selection predicate is conjunctive predicate having form $(p \wedge q)$, where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as:
$\sigma_{p \wedge q}(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r (\sigma_q(S))$
$\sigma_{p \wedge q}(R \times S) = (\sigma_p(R)) \times (\sigma_q(S))$
-
Transformation Rules for RA Operations
For example:
$\sigma_{position=\text{'Manager'} \wedge city=\text{'London'}}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch) = (\sigma_{position=\text{'Manager'}}(Staff)) \bowtie_{Staff.branchNo=Branch.branchNo} (\sigma_{city=\text{'London'}}(Branch))$
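Pushing the two selections below the join can be checked on toy data. This is a sketch: `select` and `equijoin` are hypothetical helpers, the tuples are invented, and because both relations name the join attribute `branchNo`, matching tuples are simply merged into one dict. Both sides of the rule give the same result, but the right-hand side joins far fewer tuples.

```python
def select(pred, relation):
    """Relational Selection over a list of dicts."""
    return [t for t in relation if pred(t)]


def equijoin(left, right, attr):
    """Naive equijoin on an attribute both relations share; merges matching dicts."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


# Hypothetical sample data.
STAFF = [
    {"staffNo": "S1", "position": "Manager", "branchNo": "B1"},
    {"staffNo": "S2", "position": "Assistant", "branchNo": "B1"},
    {"staffNo": "S3", "position": "Manager", "branchNo": "B2"},
]
BRANCH = [
    {"branchNo": "B1", "city": "London"},
    {"branchNo": "B2", "city": "Glasgow"},
]


def is_manager(t):
    return t["position"] == "Manager"


def in_london(t):
    return t["city"] == "London"


# Left-hand side: join first, then apply the conjunctive predicate.
late = select(lambda t: is_manager(t) and in_london(t), equijoin(STAFF, BRANCH, "branchNo"))
# Right-hand side: select on each operand first, then join the reduced relations.
early = equijoin(select(is_manager, STAFF), select(in_london, BRANCH), "branchNo")
print(late == early, early)
```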
-
Transformation Rules for RA Operations
Commutativity of Projection and Theta join (or Cartesian product).
If projection list is of form $L = L_1 \cup L_2$, where $L_1$ only has attributes of R, and $L_2$ only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:
$\Pi_{L_1 \cup L_2}(R \bowtie_r S) = (\Pi_{L_1}(R)) \bowtie_r (\Pi_{L_2}(S))$
-
Transformation Rules for RA Operations
If join condition contains additional attributes not in L (say $M = M_1 \cup M_2$, where $M_1$ only has attributes of R, and $M_2$ only has attributes of S), a final Projection operation is required:
$\Pi_{L_1 \cup L_2}(R \bowtie_r S) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup M_1}(R)) \bowtie_r (\Pi_{L_2 \cup M_2}(S)))$
-
Transformation Rules for RA Operations
For example:
$\Pi_{position, city, branchNo}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch) = (\Pi_{position, branchNo}(Staff)) \bowtie_{Staff.branchNo=Branch.branchNo} (\Pi_{city, branchNo}(Branch))$
and using the latter rule:
$\Pi_{position, city}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch) = \Pi_{position, city}((\Pi_{position, branchNo}(Staff)) \bowtie_{Staff.branchNo=Branch.branchNo} (\Pi_{city, branchNo}(Branch)))$
-
Transformation Rules for RA Operations
Commutativity of Union and Intersection (but not Set difference).
$R \cup S = S \cup R$
$R \cap S = S \cap R$
-
Transformation Rules for RA Operations
Commutativity of Selection and set operations (Union, Intersection, and Set difference).
$\sigma_p(R \cup S) = \sigma_p(S) \cup \sigma_p(R)$
$\sigma_p(R \cap S) = \sigma_p(S) \cap \sigma_p(R)$
$\sigma_p(R - S) = \sigma_p(R) - \sigma_p(S)$
-
Transformation Rules for RA Operations
Commutativity of Projection and Union.
$\Pi_L(R \cup S) = \Pi_L(S) \cup \Pi_L(R)$
Associativity of Union and Intersection (but not Set difference).
$(R \cup S) \cup T = S \cup (R \cup T)$
$(R \cap S) \cap T = S \cap (R \cap T)$
-
Transformation Rules for RA Operations
Associativity of Theta join (and Cartesian product).
Cartesian product and Natural join are always associative:
$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
$(R \times S) \times T = R \times (S \times T)$
If join condition q involves attributes only from S and T, then Theta join is associative:
$(R \bowtie_p S) \bowtie_{q \wedge r} T = R \bowtie_{p \wedge r} (S \bowtie_q T)$
-
Transformation Rules for RA Operations
For example:
$(Staff \bowtie_{Staff.staffNo=PropertyForRent.staffNo} PropertyForRent) \bowtie_{ownerNo=Owner.ownerNo \wedge Staff.lName=Owner.lName} Owner = Staff \bowtie_{Staff.staffNo=PropertyForRent.staffNo \wedge Staff.lName=lName} (PropertyForRent \bowtie_{ownerNo} Owner)$
-
Example 21.3 Use of Transformation Rules
For prospective renters of flats, find properties that match their requirements and are owned by CO93.
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = 'Flat' AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND
p.ownerNo = 'CO93';
-
Example 21.3 Use of Transformation Rules
-
Example 21.3 Use of Transformation Rules
-
Example 21.3 Use of Transformation Rules
-
Heuristical Processing Strategies
Perform Selection operations as early as possible.
Keep predicates on same relation together.
Combine Cartesian product with subsequent Selection whose predicate represents join condition into a Join operation.
Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations are executed first.
-
Heuristical Processing Strategies
Perform Projection as early as possible.
Keep projection attributes on same relation together.
Compute common expressions once.
If common expression appears more than once, and result not too large, store result and reuse it when required.
Useful when querying views, as same expression is used to construct view each time.
-
Cost Estimation for RA Operations
Many different ways of implementing RA operations.
Aim of QO is to choose most efficient one.
Use formulae that estimate costs for a number of options, and select one with lowest cost.
Consider only cost of disk access, which is usually dominant cost in QP.
Many estimates are based on cardinality of the relation, so need to be able to estimate this.
-
Database Statistics
Success of estimation depends on amount and currency of statistical information DBMS holds.
Keeping statistics current can be problematic.
If statistics updated every time tuple is changed, this would impact performance.
DBMS could update statistics on a periodic basis, for example nightly, or whenever the system is idle.
-
Typical Statistics for Relation R
nTuples(R) - number of tuples in R.
bFactor(R) - blocking factor of R.
nBlocks(R) - number of blocks required to store R:
$nBlocks(R) = \lceil nTuples(R)/bFactor(R) \rceil$
-
Typical Statistics for Attribute A of Relation R
$nDistinct_A(R)$ - number of distinct values that appear for attribute A in R.
$min_A(R)$, $max_A(R)$ - minimum and maximum possible values for attribute A in R.
$SC_A(R)$ - selection cardinality of attribute A in R: average number of tuples that satisfy an equality condition on attribute A.
-
Statistics for Multilevel Index I on Attribute A
$nLevels_A(I)$ - number of levels in I.
$nLfBlocks_A(I)$ - number of leaf blocks in I.
-
Selection Operation
Predicate may be simple or composite.
Number of different implementations, depending on file structure, and whether attribute(s) involved are indexed/hashed.
Main strategies are:
Linear Search (Unordered file, no index).
Binary Search (Ordered file, no index).
Equality on hash key.
Equality condition on primary key.
-
Selection Operation
Inequality condition on primary key.
Equality condition on clustering (secondary) index.
Equality condition on a non-clustering (secondary) index.
Inequality condition on a secondary B+-tree index.
-
Estimating Cardinality of Selection
Assume attribute values are uniformly distributed within their domain and attributes are independent.
For an equality selection $S = \sigma_{A=x}(R)$: $nTuples(S) = SC_A(R)$
For any attribute $B \neq A$ of S:
$nDistinct_B(S) = \begin{cases} nTuples(S) & \text{if } nTuples(S) < nDistinct_B(R)/2 \\ nDistinct_B(R) & \text{if } nTuples(S) > 2 \cdot nDistinct_B(R) \\ \lceil (nTuples(S) + nDistinct_B(R))/3 \rceil & \text{otherwise} \end{cases}$
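The three-case estimate translates directly into a function. This is a sketch: the function names and the sample statistics are hypothetical, and $SC_A(R)$ is computed here as $nTuples(R)/nDistinct_A(R)$, i.e. the average number of tuples per distinct value under the uniformity assumption.

```python
import math


def selection_cardinality(n_tuples_r, n_distinct_a_r):
    """SC_A(R) under the uniformity assumption: average tuples per distinct value."""
    return n_tuples_r / n_distinct_a_r


def n_distinct_after_selection(n_tuples_s, n_distinct_b_r):
    """Estimate nDistinct_B(S) for an attribute B other than the selection attribute A."""
    if n_tuples_s < n_distinct_b_r / 2:
        return n_tuples_s                  # few tuples: assume their B values all differ
    if n_tuples_s > 2 * n_distinct_b_r:
        return n_distinct_b_r              # many tuples: assume every B value survives
    return math.ceil((n_tuples_s + n_distinct_b_r) / 3)


# Hypothetical figures: 1000 tuples in R, 50 distinct values of A, 20 of B.
n_s = selection_cardinality(1000, 50)      # 20.0 tuples expected in S
print(n_s, n_distinct_after_selection(n_s, 20))  # 20.0 14
```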
-
Linear Search (Unordered File, No Index)
May need to scan each tuple in each block to check whether it satisfies predicate.
For equality condition on key attribute, cost estimate is: $\lceil nBlocks(R)/2 \rceil$
For any other condition, entire file may need to be searched, so more general cost estimate is: $nBlocks(R)$
-
Binary Search (Ordered File, No Index)
If predicate is of form A = x, and file is ordered on key attribute A, cost estimate:
$\lceil \log_2(nBlocks(R)) \rceil$
Generally, cost estimate is:
$\lceil \log_2(nBlocks(R)) \rceil + \lceil SC_A(R)/bFactor(R) \rceil - 1$
First term represents cost of finding first tuple using binary search.
Expect there to be $SC_A(R)$ tuples satisfying predicate.
-
Equality on Hash Key
If attribute A is hash key, apply hashing algorithm to calculate target address for tuple.
If there is no overflow, expected cost is 1.
If there is overflow, additional accesses may be necessary.
-
Equality Condition on Primary Key
Can use primary index to retrieve single record satisfying condition.
Need to read one more block than number of index accesses, equivalent to number of levels in index, so estimated cost is:
$nLevels_A(I) + 1$
-
Inequality Condition on Primary Key
Can first use index to locate record satisfying predicate (A = x).
Provided index is sorted, records can be found by accessing all records before/after this one.
Assuming uniform distribution, would expect half the records to satisfy inequality, so estimated cost is:
$nLevels_A(I) + \lceil nBlocks(R)/2 \rceil$
-
Equality Condition on Clustering Index
Can use index to retrieve required records.
Estimated cost is:
$nLevels_A(I) + \lceil SC_A(R)/bFactor(R) \rceil$
Second term is estimate of number of blocks that will be required to store number of tuples that satisfy equality condition, represented as $SC_A(R)$.
-
Equality Condition on Non-Clustering Index
Can use index to retrieve required records.
Have to assume that tuples are on different blocks (index is not clustered this time), so estimated cost becomes:
$nLevels_A(I) + \lceil SC_A(R) \rceil$
-
Inequality Condition on a Secondary B+-Tree Index
From leaf nodes of tree, can scan keys from smallest value up to x (for $<$ or $\leq$ conditions) or from x up to maximum value (for $>$ or $\geq$ conditions).
Assuming uniform distribution, would expect half the leaf node blocks to be accessed and, via index, half the file records to be accessed.
Estimated cost is:
$nLevels_A(I) + \lceil nLfBlocks_A(I)/2 + nTuples(R)/2 \rceil$
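Collected together, the selection cost formulas make it easy to compare access paths for one query. This is a sketch: function names and the sample statistics are hypothetical, the bracketed terms are treated as ceilings, and the comparison assumes every listed access path actually exists for the attribute.

```python
import math

# Block-access estimates for the selection strategies above. Parameter names
# mirror nBlocks(R), bFactor(R), SC_A(R), nLevels_A(I), nLfBlocks_A(I), nTuples(R).


def linear_search(n_blocks, equality_on_key=False):
    return math.ceil(n_blocks / 2) if equality_on_key else n_blocks


def binary_search(n_blocks, sc_a, b_factor):
    return math.ceil(math.log2(n_blocks)) + math.ceil(sc_a / b_factor) - 1


def primary_key_equality(n_levels):
    return n_levels + 1


def primary_key_inequality(n_levels, n_blocks):
    return n_levels + math.ceil(n_blocks / 2)


def clustering_index_equality(n_levels, sc_a, b_factor):
    return n_levels + math.ceil(sc_a / b_factor)


def non_clustering_index_equality(n_levels, sc_a):
    return n_levels + math.ceil(sc_a)


def secondary_btree_inequality(n_levels, n_leaf_blocks, n_tuples):
    return n_levels + math.ceil(n_leaf_blocks / 2 + n_tuples / 2)


# Hypothetical statistics for an equality selection on a non-key attribute A.
stats = dict(n_blocks=100, b_factor=30, sc_a=60, n_levels=2)
options = {
    "linear search": linear_search(stats["n_blocks"]),
    "clustering index": clustering_index_equality(
        stats["n_levels"], stats["sc_a"], stats["b_factor"]),
    "non-clustering index": non_clustering_index_equality(stats["n_levels"], stats["sc_a"]),
}
print(min(options, key=options.get), options)
```

With these numbers the non-clustering index (62 accesses) is barely better than a full scan (100), because each matching tuple may sit on a different block.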
-
Composite Predicates - Conjunction without Disjunction
May consider following approaches:
- If one attribute has index or is ordered, can use one of above selection strategies. Can then check each retrieved record.
- For equality on two or more attributes, with composite index (or hash key) on combined attributes, can search index directly.
- With secondary indexes on one or more attributes (involved only in equality conditions in predicate), could use record pointers if they exist.
-
Composite Predicates - Selections with Disjunction
If one term contains an $\vee$ (OR), and that term requires linear search, entire selection requires linear search.
Only if index or sort order exists on every term can selection be optimized by retrieving records that satisfy each condition and applying Union operator.
Again, record pointers can be used if they exist.
-
Join Operation
Main strategies for implementing join:
Block Nested Loop Join.
Indexed Nested Loop Join.
Sort-Merge Join.
Hash Join.
-
Estimating Cardinality of Join
If assume uniform distribution, can estimate for Equijoins with a predicate (R.A = S.B) as follows:
If A is key of R: $nTuples(T) \leq nTuples(S)$
If B is key of S: $nTuples(T) \leq nTuples(R)$
Otherwise, could estimate cardinality of join as:
$nTuples(T) = SC_A(R) \cdot nTuples(S)$ or
$nTuples(T) = SC_B(S) \cdot nTuples(R)$
-
Block Nested Loop Join
Simplest join algorithm is nested loop that joins two relations together a tuple at a time.
Outer loop iterates over each tuple in R, and inner loop iterates over each tuple in S.
As basic unit of reading/writing is a disk block, better to have two extra loops that process blocks.
Estimated cost of this approach is:
$nBlocks(R) + (nBlocks(R) \cdot nBlocks(S))$
-
Block Nested Loop Join
Could read as many blocks as possible of smaller relation, R say, into database buffer, saving one block for inner relation and one for result.
New cost estimate becomes:
$nBlocks(R) + \lceil nBlocks(S) \cdot (nBlocks(R)/(nBuffer - 2)) \rceil$
If can read all blocks of R into the buffer, this reduces to:
$nBlocks(R) + nBlocks(S)$
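A block nested loop join can be sketched over relations modelled as lists of blocks, counting block reads so the total can be compared with the cost formula. This is an in-memory illustration: the function name, data layout and sample tuples are hypothetical, and only input reads are counted (output writes are ignored, as in the estimate). When $nBlocks(R)$ is not a multiple of $nBuffer - 2$ the count rounds the number of passes over S up, so the formula is then an approximation.

```python
def block_nested_loop_join(r_blocks, s_blocks, n_buffer, match):
    """Join relations stored as lists of blocks (each block a list of tuples).

    Up to n_buffer - 2 blocks of the outer relation R are held in memory at a
    time (one buffer is kept for the inner relation S and one for output), and
    S is scanned once per chunk of R. Returns (result, block_reads).
    """
    chunk_size = n_buffer - 2
    if chunk_size < 1:
        raise ValueError("need at least 3 buffer blocks")
    result, reads = [], 0
    for start in range(0, len(r_blocks), chunk_size):
        chunk = r_blocks[start:start + chunk_size]
        reads += len(chunk)                          # read this chunk of R
        outer = [r for block in chunk for r in block]
        for s_block in s_blocks:                     # one full pass over S
            reads += 1
            for s in s_block:
                for r in outer:
                    if match(r, s):
                        result.append((r, s))
    return result, reads


# Hypothetical data: R is (id, key) in 4 blocks, S is (key, value) in 3 blocks.
R = [[(1, "a"), (2, "b")], [(3, "a")], [(4, "c")], [(5, "b")]]
S = [[("a", "x")], [("b", "y")], [("d", "z")]]


def same_key(r, s):
    return r[1] == s[0]


result, reads = block_nested_loop_join(R, S, n_buffer=4, match=same_key)
print(len(result), reads)  # 4 10, matching 4 + 3 * (4 / (4 - 2))
```

With `n_buffer=6`, all of R fits in memory and the read count drops to $nBlocks(R) + nBlocks(S) = 7$.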
-
Indexed Nested Loop Join
If have index (or hash function) on join attributes of inner relation, can use index lookup.
For each tuple in R, use index to retrieve matching tuples of S.
Cost of scanning R is nBlocks(R), as before.
Cost of retrieving matching tuples in S depends on type of index and number of matching tuples.
If join attribute A in S is PK, cost estimate is:
$nBlocks(R) + nTuples(R) \cdot (nLevels_A(I) + 1)$
-
Sort-Merge Join
For Equijoins, most efficient join is when both relations are sorted on join attributes.
Can look for qualifying tuples by merging relations.
May need to sort relations first.
Now tuples with same join value are in order.
If assume join is *:* and each set of tuples with same join value can be held in database buffer at same time, then each block of each relation need only be read once.
-
Sort-Merge Join
Cost estimate for the sort-merge join is:
$nBlocks(R) + nBlocks(S)$
If a relation has to be sorted, R say, add:
$nBlocks(R) \cdot \lceil \log_2(nBlocks(R)) \rceil$
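The merge step can be sketched in memory for lists of tuples. Runs of equal join values are paired with each other, which is how the *:* case is handled; the block-level buffering that the cost formula counts is not modelled. `sort_merge_join` and the Staff/Branch sample tuples are hypothetical.

```python
from operator import itemgetter


def sort_merge_join(r, s, r_key, s_key):
    """Equijoin of lists r and s by sorting both and merging them.

    Handles many-to-many matches by pairing each run of equal keys in R
    with the corresponding run in S.
    """
    r = sorted(r, key=r_key)
    s = sorted(s, key=s_key)
    i = j = 0
    out = []
    while i < len(r) and j < len(s):
        kr, ks = r_key(r[i]), s_key(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the end of the run of equal join values on each side.
            i_end = i
            while i_end < len(r) and r_key(r[i_end]) == kr:
                i_end += 1
            j_end = j
            while j_end < len(s) and s_key(s[j_end]) == ks:
                j_end += 1
            out.extend((a, b) for a in r[i:i_end] for b in s[j:j_end])
            i, j = i_end, j_end
    return out


# Hypothetical (staffNo, branchNo) and (branchNo, city) tuples.
staff = [("S1", "B2"), ("S2", "B1"), ("S3", "B2"), ("S4", "B9")]
branch = [("B1", "London"), ("B2", "Glasgow"), ("B3", "Bristol")]
pairs = sort_merge_join(staff, branch, r_key=itemgetter(1), s_key=itemgetter(0))
for p in pairs:
    print(p)
```

Each input is traversed once after sorting, which is why the merge costs $nBlocks(R) + nBlocks(S)$ when the relations are already sorted.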
-
Hash Join
For Natural or Equijoin, hash join may be used.
Idea is to partition relations according to some hash function that provides uniformity and randomness.
Each equivalent partition should hold same value for join attributes, although it may hold more than one value.
Cost estimate of hash join is:
$3(nBlocks(R) + nBlocks(S))$
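The two phases can be sketched in memory with Python's built-in `hash`. On disk, the factor 3 in the estimate reflects reading each relation, writing its partitions, and reading the partitions back for probing; here partitions are just lists. Because a partition may hold several join values, each partition builds a table keyed on the actual join value before probing. `hash_join` and the sample tuples are hypothetical, and since string hashing is randomized per process, the output order varies, so compare results after sorting.

```python
from collections import defaultdict


def hash_join(r, s, r_key, s_key, n_partitions=4):
    """Equijoin of lists r and s: a partitioning phase, then build and probe."""
    r_parts, s_parts = defaultdict(list), defaultdict(list)
    # Partitioning phase: equal join values always hash to the same partition.
    for t in r:
        r_parts[hash(r_key(t)) % n_partitions].append(t)
    for t in s:
        s_parts[hash(s_key(t)) % n_partitions].append(t)
    out = []
    for p in range(n_partitions):
        # Build a table on R's partition, keyed on the actual join value,
        # then probe it with the matching partition of S.
        table = defaultdict(list)
        for t in r_parts[p]:
            table[r_key(t)].append(t)
        for t in s_parts[p]:
            for match in table.get(s_key(t), []):
                out.append((match, t))
    return out


# Hypothetical (staffNo, branchNo) and (branchNo, city) tuples.
staff = [("S1", "B2"), ("S2", "B1"), ("S3", "B2"), ("S4", "B9")]
branch = [("B1", "London"), ("B2", "Glasgow"), ("B3", "Bristol")]
print(sorted(hash_join(staff, branch, lambda t: t[1], lambda t: t[0])))
```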
-
Estimating Cardinality of Projection
When projection contains key, cardinality is:
$nTuples(S) = nTuples(R)$
If projection consists of a single non-key attribute A, estimate is:
$nTuples(S) = nDistinct_A(R)$
Otherwise, could estimate cardinality as:
$nTuples(S) \leq \min(nTuples(R), \prod_{i=1}^{m} nDistinct_{a_i}(R))$
-
Duplicate Elimination using Sorting
Sort tuples of reduced relation using all remaining attributes as sort key.
Duplicates will now be adjacent and can be removed easily.
Estimated cost of sorting is: $nBlocks(R) \cdot \lceil \log_2(nBlocks(R)) \rceil$.
Combined cost is:
$nBlocks(R) + nBlocks(R) \cdot \lceil \log_2(nBlocks(R)) \rceil$
-
Duplicate Elimination using Hashing
Two phases: partitioning and duplicate elimination.
In partitioning phase, for each tuple in R, remove unwanted attributes, apply hash function to combination of remaining attributes, and write reduced tuple to the partition given by the hashed value.
Two tuples that belong to different partitions are guaranteed not to be duplicates.
Estimated cost is: nBlocks(R) + nB
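Both phases can be sketched in memory. Because duplicates always hash to the same partition, each partition can be de-duplicated independently, which on disk lets each partition be processed in the buffer on its own. `project_distinct` and the sample tuples are hypothetical; output order depends on Python's randomized string hashing, so the usage compares sorted results.

```python
from collections import defaultdict


def project_distinct(relation, attrs, n_partitions=4):
    """Projection with duplicate elimination by hashing (in memory)."""
    partitions = defaultdict(list)
    # Partitioning phase: drop unwanted attributes, hash the remaining ones.
    for t in relation:
        reduced = tuple(t[a] for a in attrs)
        partitions[hash(reduced) % n_partitions].append(reduced)
    # Duplicate-elimination phase: duplicates can only share a partition,
    # so each partition is de-duplicated on its own.
    result = []
    for p in range(n_partitions):
        seen = set()
        for reduced in partitions[p]:
            if reduced not in seen:
                seen.add(reduced)
                result.append(reduced)
    return result


# Hypothetical Staff tuples.
STAFF = [
    {"staffNo": "S1", "branchNo": "B1", "position": "Manager"},
    {"staffNo": "S2", "branchNo": "B1", "position": "Assistant"},
    {"staffNo": "S3", "branchNo": "B1", "position": "Assistant"},
    {"staffNo": "S4", "branchNo": "B2", "position": "Manager"},
]
print(sorted(project_distinct(STAFF, ["branchNo", "position"])))
```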
-
Set Operations
Can be implemented by sorting both relations on same attributes, and scanning through each of sorted relations once to obtain desired result.
Could use sort-merge join as basis.
Estimated cost in all cases is:
$nBlocks(R) + nBlocks(S) + nBlocks(R) \cdot \lceil \log_2(nBlocks(R)) \rceil + nBlocks(S) \cdot \lceil \log_2(nBlocks(S)) \rceil$
Could also use hashing algorithm.
-
Estimating Cardinality of Set Operations
As duplicates are eliminated when performing Union, difficult to estimate cardinality, but can give an upper and lower bound as:
$\max(nTuples(R), nTuples(S)) \leq nTuples(T) \leq nTuples(R) + nTuples(S)$
For Set Difference, can also give upper and lower bound:
$0 \leq nTuples(T) \leq nTuples(R)$
-
Aggregate Operations
SELECT AVG(salary)
FROM Staff;
To implement query, could scan entire Staff relation and maintain running count of number of tuples read and sum of all salaries.
Easy to compute average from these two running counts.
-
Aggregate Operations
SELECT AVG(salary)
FROM Staff
GROUP BY branchNo;
For grouping queries, can use sorting or hashing algorithms similar to duplicate elimination.
Can estimate cardinality of result using estimates derived earlier for selection.
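Both aggregate queries can be sketched as a single scan: a running count and sum for the plain AVG, and the same pair kept per group (grouping by hashing on branchNo) for the GROUP BY version. Function names and sample tuples are hypothetical, and SQL NULL handling is deliberately ignored.

```python
from collections import defaultdict


def avg_salary(staff):
    """SELECT AVG(salary) FROM Staff: one scan with a running count and sum."""
    count, total = 0, 0
    for t in staff:
        count += 1
        total += t["salary"]
    return total / count if count else None


def avg_salary_by_branch(staff):
    """... GROUP BY branchNo, grouping by hashing on branchNo."""
    groups = defaultdict(lambda: [0, 0])        # branchNo -> [count, total]
    for t in staff:
        group = groups[t["branchNo"]]
        group[0] += 1
        group[1] += t["salary"]
    return {branch: total / count for branch, (count, total) in groups.items()}


# Hypothetical Staff tuples.
STAFF = [
    {"branchNo": "B1", "salary": 10000},
    {"branchNo": "B1", "salary": 20000},
    {"branchNo": "B2", "salary": 30000},
]
print(avg_salary(STAFF), avg_salary_by_branch(STAFF))
```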
-
Enumeration of Alternative Strategies
Fundamental to efficiency of QO is the search space of possible execution strategies and the enumeration algorithm used to search this space.
Query with 2 joins gives 12 join orderings:
$R \bowtie (S \bowtie T)$, $R \bowtie (T \bowtie S)$, $(S \bowtie T) \bowtie R$, $(T \bowtie S) \bowtie R$
$S \bowtie (R \bowtie T)$, $S \bowtie (T \bowtie R)$, $(R \bowtie T) \bowtie S$, $(T \bowtie R) \bowtie S$
$T \bowtie (R \bowtie S)$, $T \bowtie (S \bowtie R)$, $(R \bowtie S) \bowtie T$, $(S \bowtie R) \bowtie T$
With n relations, there are $(2(n-1))!/(n-1)!$ orderings.
If n = 4 this is 120; if n = 10 this is over 17 billion (17,643,225,600).
Compounded by different selection/join methods.
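The formula can be cross-checked by brute force for small n. This sketch counts every binary join tree in which operand order matters (bushy trees and Cartesian products included, as in the count above); `join_orderings` and `enumerate_trees` are illustrative names.

```python
from math import factorial


def join_orderings(n):
    """Number of join trees over n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


def enumerate_trees(relations):
    """All binary join trees over the relations (operand order matters)."""
    if len(relations) == 1:
        return [relations[0]]
    trees = []
    n = len(relations)
    # Each bitmask picks a non-empty, proper subset as the left operand.
    for mask in range(1, 2 ** n - 1):
        left = [r for i, r in enumerate(relations) if mask >> i & 1]
        right = [r for i, r in enumerate(relations) if not mask >> i & 1]
        for lt in enumerate_trees(left):
            for rt in enumerate_trees(right):
                trees.append(f"({lt} JOIN {rt})")
    return trees


print(len(enumerate_trees(["R", "S", "T"])), join_orderings(3))  # 12 12
print(join_orderings(4), join_orderings(10))                     # 120 17643225600
```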
-
Pipelining
Materialization - output of one operation is stored in temporary relation for processing by next.
Could also pipeline results of one operation to another without creating temporary relation.
Known as pipelining or on-the-fly processing.
Pipelining can save on cost of creating temporary relations and reading results back in again.
Generally, pipeline is implemented as separate process or thread.
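The difference can be illustrated with Python generators. Note that this sketch uses demand-driven iterators rather than the separate processes or threads mentioned above: each operator pulls tuples from its child only when asked. The operator names and sample tuples are hypothetical, and the `log` list records which base tuples have actually been read.

```python
def scan(relation, log):
    """Leaf of the pipeline: yields base tuples, logging each one read."""
    for t in relation:
        log.append(t["staffNo"])
        yield t


def select(pred, tuples):
    for t in tuples:
        if pred(t):
            yield t


def project(attrs, tuples):
    for t in tuples:
        yield {a: t[a] for a in attrs}


# Hypothetical Staff tuples.
STAFF = [
    {"staffNo": "S1", "position": "Assistant", "salary": 9000},
    {"staffNo": "S2", "position": "Manager", "salary": 30000},
    {"staffNo": "S3", "position": "Manager", "salary": 25000},
]


def is_manager(t):
    return t["position"] == "Manager"


# Materialization: each step builds a complete temporary list.
temp1 = list(select(is_manager, STAFF))
materialized = list(project(["staffNo"], temp1))

# Pipelining: nothing is computed until the consumer asks for a tuple.
log = []
pipeline = project(["staffNo"], select(is_manager, scan(STAFF, log)))
first = next(pipeline)
print(first, log)  # {'staffNo': 'S2'} ['S1', 'S2']: only two base tuples read so far
```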
-
Types of Trees
-
Pipelining
With linear trees, relation on one side of each operator is always a base relation.
However, as need to examine entire inner relation for each tuple of outer relation, inner relations must always be materialized.
This makes left-deep trees appealing as inner relations are always base relations.
Reduces search space for optimum strategy, and allows QO to use dynamic processing.
Not all execution strategies are considered.
-
Physical Operators & Strategies
Term physical operator refers to specific algorithm that implements a logical operation, such as selection or join.
For example, can use sort-merge join to implement the join operation.
Replacing logical operations in an R.A.T. with physical operators produces an execution strategy (or query evaluation plan or access plan).
-
Physical Operators & Strategies
-
Reducing the Search Space
Restriction 1: Unary operations processed on-the-fly: selections processed as relations are accessed for the first time; projections processed as results of other operations are generated.
Restriction 2: Cartesian products are never formed unless the query itself specifies one.
Restriction 3: Inner operand of each join is a base relation, never an intermediate result. This uses the fact that with left-deep trees the inner operand is a base relation and so already materialized.
Restriction 3 excludes many alternative strategies but significantly reduces the number to be considered.
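Restrictions 2 and 3 can be expressed as simple checks on a candidate plan. In the Python sketch below, a plan is either a base-relation name or a tuple ("join", outer, inner). This encoding and the helper names are hypothetical. The join predicates are those of the Client/Viewing/PropertyForRent query used in the dynamic programming example.

```python
def is_left_deep(plan) -> bool:
    """Restriction 3: the inner (right) operand of every join is a base relation.

    A plan is a base-relation name (str) or a tuple ("join", outer, inner)."""
    if isinstance(plan, str):
        return True
    _, outer, inner = plan
    return isinstance(inner, str) and is_left_deep(outer)


def avoids_cartesian(order, join_edges) -> bool:
    """Restriction 2 for a left-deep join order: each newly joined relation must
    share a join predicate with at least one relation already joined."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset((rel, prev)) in join_edges for prev in joined):
            return False
        joined.add(rel)
    return True


edges = {frozenset(("Client", "Viewing")), frozenset(("Viewing", "PropertyForRent"))}
print(is_left_deep(("join", ("join", "Client", "Viewing"), "PropertyForRent")))  # True
print(is_left_deep(("join", "Client", ("join", "Viewing", "PropertyForRent"))))  # False
print(avoids_cartesian(("Client", "Viewing", "PropertyForRent"), edges))         # True
print(avoids_cartesian(("Client", "PropertyForRent", "Viewing"), edges))         # False
```

The last order fails because Client and PropertyForRent share no join predicate, so joining them first would form a Cartesian product.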
Dynamic Programming
Enumeration of left-deep trees using dynamic programming was first proposed for the System R QO.
Algorithm based on the assumption that the cost model satisfies the principle of optimality. Thus, to obtain the optimal strategy for a query with n joins, only need to consider the optimal strategies for subexpressions with (n - 1) joins and extend those strategies with an additional join. The remaining suboptimal strategies can be discarded.
Dynamic Programming
To ensure some potentially useful strategies are not discarded, the algorithm retains strategies with interesting orders: an intermediate result has an interesting order if it is sorted by a final ORDER BY attribute, GROUP BY attribute, or any attributes that participate in subsequent joins.
Dynamic Programming
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.maxRent < 500 AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo;
Attributes c.clientNo, v.clientNo, v.propertyNo, and p.propertyNo are interesting.
If any intermediate result is sorted on any of these attributes, then the corresponding partial strategy must be included in the search.
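The interesting attributes can be derived mechanically from the query. The Python sketch below assumes the WHERE clause has already been parsed into (lhs, op, rhs) triples, with attribute references represented by a hypothetical `Attr` type. It collects ORDER BY and GROUP BY attributes plus both sides of every equi-join predicate between different relations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """A qualified attribute reference such as c.clientNo (hypothetical representation)."""
    alias: str
    name: str

    def __str__(self) -> str:
        return f"{self.alias}.{self.name}"


def interesting_attributes(predicates, order_by=(), group_by=()):
    """Attributes on which a sorted intermediate result has an interesting order:
    ORDER BY and GROUP BY attributes plus attributes used in join predicates."""
    attrs = set(order_by) | set(group_by)
    for lhs, op, rhs in predicates:
        # An equi-join predicate compares attributes of two different relations.
        if (op == "=" and isinstance(lhs, Attr) and isinstance(rhs, Attr)
                and lhs.alias != rhs.alias):
            attrs.update((lhs, rhs))
    return attrs


where = [
    (Attr("c", "maxRent"), "<", 500),
    (Attr("c", "clientNo"), "=", Attr("v", "clientNo")),
    (Attr("v", "propertyNo"), "=", Attr("p", "propertyNo")),
]
print(sorted(str(a) for a in interesting_attributes(where)))
# ['c.clientNo', 'p.propertyNo', 'v.clientNo', 'v.propertyNo']
```

The selection predicate on c.maxRent is ignored because it compares an attribute with a constant and cannot benefit from a sorted input to a later join.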
Dynamic Programming
Algorithm proceeds from the bottom up and constructs all alternative join trees that satisfy the restrictions above, as follows:
Pass 1: Enumerate the strategies for each base relation using a linear search and all available indexes on the relation. These partial strategies are partitioned into equivalence classes based on any interesting orders. An additional equivalence class is created for the partial strategies with no interesting order.
Dynamic Programming
For each equivalence class, the strategy with the lowest cost is retained for consideration in the next pass.
Do not retain the equivalence class with no interesting order if its lowest-cost strategy is not lower than all other strategies.
For a given relation R, any selections involving only attributes of R are processed on-the-fly. Similarly, any attributes of R that are not part of the SELECT clause and do not contribute to any subsequent join can be projected out at this stage (restriction 1 above).
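The pruning rule for one base relation in Pass 1 can be sketched in Python. Each partial strategy is a hypothetical (description, cost, order) triple, where `order` is the interesting attribute the output is sorted on, or None. The costs below are made up for illustration.

```python
def prune(strategies):
    """Keep the cheapest strategy in each equivalence class.

    `strategies` are (description, cost, order) triples; `order` is the interesting
    attribute the output is sorted on, or None for no interesting order."""
    best = {}
    for desc, cost, order in strategies:
        if order not in best or cost < best[order][1]:
            best[order] = (desc, cost, order)
    # Drop the no-interesting-order class unless it is cheaper than every other class.
    ordered_costs = [s[1] for o, s in best.items() if o is not None]
    if None in best and ordered_costs and best[None][1] >= min(ordered_costs):
        del best[None]
    return best


# Hypothetical costs. An index on maxRent sorts by maxRent, which is not an
# interesting order here, so that strategy falls into the None class.
client = [("full scan", 100, None), ("index on maxRent", 40, None),
          ("index on clientNo", 120, "c.clientNo")]
viewing = [("full scan", 150, None), ("index on clientNo", 90, "v.clientNo"),
           ("index on propertyNo", 110, "v.propertyNo")]

print(sorted(prune(client), key=str))   # ['None', 'c.clientNo']
print(sorted(prune(viewing), key=str))  # ['v.clientNo', 'v.propertyNo']
```

For Client, the unordered class survives because its best strategy (cost 40) is cheaper than any ordered strategy. For Viewing it is dropped, because a cheaper strategy that also delivers an interesting order exists.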
Dynamic Programming
Pass 2: Generate all 2-relation strategies by considering each strategy retained after Pass 1 as the outer relation, discarding any Cartesian products generated (restriction 2 above). Again, any on-the-fly processing is performed and the lowest-cost strategy in each equivalence class is retained.
Pass n: Generate all n-relation strategies by considering each strategy retained after Pass (n - 1) as the outer relation, discarding any Cartesian products generated. After pruning, the optimizer has the lowest-cost overall strategy for processing the query.
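The passes can be sketched as a bottom-up dynamic program over sets of relations, keeping the cheapest left-deep strategy for each set. The sketch is simplified: interesting orders and the choice of physical join method are left out, and the cost model (the sum of intermediate result sizes, with size equal to the product of the cardinalities and the join selectivities) is a toy assumption. The statistics in the example are hypothetical.

```python
from fractions import Fraction


def dp_left_deep(card, sel):
    """System R-style bottom-up enumeration of left-deep join orders (sketch).

    card: relation -> cardinality (after on-the-fly selections).
    sel:  frozenset({a, b}) -> selectivity of the join predicate between a and b.
    Toy cost model (an assumption): cost = sum of intermediate result sizes.
    Returns (cost, size, order) for the cheapest plan, or None if every
    complete order would need a Cartesian product."""
    rels = list(card)
    # Pass 1: one partial strategy per base relation.
    best = {frozenset([r]): (0, card[r], (r,)) for r in rels}
    # Passes 2..n: extend each retained strategy with one more base relation
    # as the inner operand (left-deep), keeping the cheapest per relation set.
    for _ in range(len(rels) - 1):
        nxt = {}
        for subset, (cost, size, order) in best.items():
            for r in rels:
                if r in subset:
                    continue
                preds = [sel[frozenset((r, s))] for s in subset if frozenset((r, s)) in sel]
                if not preds:  # restriction 2: no Cartesian products
                    continue
                out_size = size * card[r]
                for f in preds:
                    out_size *= f
                key = subset | {r}
                new_cost = cost + out_size
                if key not in nxt or new_cost < nxt[key][0]:
                    nxt[key] = (new_cost, out_size, order + (r,))
        best = nxt  # suboptimal strategies are discarded (principle of optimality)
    return best.get(frozenset(rels))


# Hypothetical statistics for the Client/Viewing/PropertyForRent query.
card = {"Client": 10, "Viewing": 1000, "PropertyForRent": 500}
sel = {
    frozenset(("Client", "Viewing")): Fraction(1, 100),
    frozenset(("Viewing", "PropertyForRent")): Fraction(1, 500),
}
cost, size, order = dp_left_deep(card, sel)
print(order, cost)  # ('Client', 'Viewing', 'PropertyForRent') 200
```

Exact `Fraction` selectivities keep the cost comparisons free of floating-point rounding. Joining the small, already-filtered Client relation first keeps intermediate results small. Starting from Viewing ⋈ PropertyForRent would cost 1100 under this model.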
Dynamic Programming
Although the algorithm is still exponential, there are query forms for which it only generates O(n³) strategies, so for n = 10 the number is 1,000, which is significantly better than the more than 17 billion different join orders noted earlier.
Semantic Query Optimization
Based on constraints specified on the database schema to reduce the search space.
For example, if a constraint states that staff cannot supervise more than 100 properties, any query searching for staff who supervise more than 100 properties will produce zero rows. Now consider:
CREATE ASSERTION ManagerSalary
CHECK (NOT EXISTS (SELECT * FROM Staff
WHERE position = 'Manager' AND salary <= 20000));
SELECT s.staffNo, fName, lName, propertyNo
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo AND
position = 'Manager';
Semantic Query Optimization
Can rewrite this query as:
SELECT s.staffNo, fName, lName, propertyNo
FROM Staff s, PropertyForRent p
WHERE s.staffNo = p.staffNo AND
salary > 20000 AND position = 'Manager';
The additional predicate may be very useful if the only index for Staff is a B+-tree on the salary attribute.
However, the additional predicate would complicate the query if no such index existed.
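The rewrite can be sketched as a rule that adds a constraint-implied predicate only when an index could exploit it. The Python encoding below is hypothetical: predicates are (attribute, op, constant) triples, and the ManagerSalary assertion is represented as a mapping from position = 'Manager' to salary > 20000. The join predicate is omitted for brevity.

```python
def add_implied_predicates(predicates, implications, indexed_attributes):
    """Semantic rewrite (sketch): add predicates implied by integrity constraints,
    but only when the new predicate's attribute has an index that could use it.

    Predicates are (attribute, op, constant) triples; `implications` maps a
    predicate to the predicates a constraint says it implies (hypothetical encoding)."""
    result = list(predicates)
    for p in predicates:
        for implied in implications.get(p, []):
            if implied[0] in indexed_attributes and implied not in result:
                result.append(implied)
    return result


# Hypothetical encoding of ManagerSalary: every manager earns more than 20000.
manager_salary = {("position", "=", "Manager"): [("salary", ">", 20000)]}
query = [("position", "=", "Manager")]

print(add_implied_predicates(query, manager_salary, {"salary"}))
# [('position', '=', 'Manager'), ('salary', '>', 20000)]
print(add_implied_predicates(query, manager_salary, set()))
# [('position', '=', 'Manager')]
```

Making the rewrite conditional on an index follows the trade-off above. The extra predicate helps when a B+-tree on salary exists and only adds work otherwise.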
Query Optimization in Oracle
Oracle supports two approaches to query optimization: rule-based and cost-based.
Rule-based
15 rules, ranked in order of efficiency. A particular access path for a table is only chosen if the statement contains a predicate or other construct that makes that access path available.
A score is assigned to each execution strategy using these rankings, and the strategy with the best (lowest) score is selected.
QO in Oracle - Rule-Based
When 2 strategies have the same score, the tie is resolved by making a decision based on the order in which tables occur in the SQL statement.
QO in Oracle - Rule-based: Example
SELECT propertyNo
FROM PropertyForRent
WHERE rooms > 7 AND city = 'London'
Single-column access path using index on city from WHERE condition (city = 'London'). Rank 9.
Unbounded range scan using index on rooms from WHERE condition (rooms > 7). Rank 11.
Full table scan - rank 15.
Although there is an index on propertyNo, the column does not appear in the WHERE clause and so is not considered by the optimizer.
Based on these paths, the rule-based optimizer will choose to use the index on the city column.
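The example can be sketched in Python. This is not Oracle code. Only the three ranks from the example are modelled (9 for an equality predicate on an indexed column, 11 for an unbounded range scan on an indexed column, 15 for a full table scan), and the function names are hypothetical.

```python
# Ranks taken from the example above; the rule-based optimizer has 15 ranked rules.
RANK_EQUALITY_INDEX = 9   # single-column access path using an index (equality predicate)
RANK_RANGE_INDEX = 11     # unbounded range scan using an index
RANK_FULL_SCAN = 15       # full table scan


def candidate_paths(indexed_columns, where_predicates):
    """Access paths made available by WHERE predicates; indexes on columns that
    do not appear in the WHERE clause are never considered.

    `where_predicates` holds (column, kind) pairs, where kind is "equality" or
    "unbounded range" (only the two kinds in the example are modelled)."""
    paths = [("full table scan", RANK_FULL_SCAN)]
    for column, kind in where_predicates:
        if column in indexed_columns:
            rank = RANK_EQUALITY_INDEX if kind == "equality" else RANK_RANGE_INDEX
            paths.append((f"index on {column}", rank))
    return paths


def choose_path(paths):
    # Lowest rank wins. min() keeps the first of equally ranked paths, a
    # simplification standing in for Oracle's statement-order tie-break.
    return min(paths, key=lambda p: p[1])


paths = candidate_paths({"city", "rooms", "propertyNo"},
                        [("rooms", "unbounded range"), ("city", "equality")])
print(choose_path(paths))  # ('index on city', 9)
```

The index on propertyNo never becomes a candidate, which matches the observation that columns absent from the WHERE clause are not considered.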
QO in Oracle - Cost-Based
To improve QO, Oracle introduced the cost-based optimizer in Oracle 7, which selects the strategy that requires the minimal resource use necessary to process all rows accessed by the query (avoiding the above tie-break anomaly).
User can select whether minimal resource usage is based on throughput or on response time by setting the OPTIMIZER_MODE initialization parameter.
Cost-based optimizer also takes into consideration hints that the user may provide.
QO in Oracle - Statistics
Cost-based optimizer depends on statistics for all tables, clusters, and indexes accessed by the query.
User's responsibility to generate these statistics and keep them current.
Package DBMS_STATS can be used to generate and manage statistics.
Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially.
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('Manager');
QO in Oracle - Histograms
Previously made the assumption that data values within columns of a table are uniformly distributed.
A histogram of values and their relative frequencies gives the optimizer improved selectivity estimates in the presence of non-uniform distribution.
QO in Oracle - Histograms
(a) uniform distribution of rooms; (b) actual non-uniform distribution.
(a) can be stored compactly as a low value (1) and a high value (10), and as the total count of all frequencies (in this case, 100).
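With only the low value, the high value, and the total count stored, the optimizer can estimate selectivity only by assuming a uniform distribution. The Python sketch below assumes that rooms takes integer values. It estimates how many tuples satisfy rooms > 7. The function name is hypothetical.

```python
def uniform_estimate_greater_than(low, high, total, value):
    """Estimated number of tuples with attribute > value, assuming integer
    values spread uniformly over [low, high] (only low, high and total stored)."""
    per_value = total / (high - low + 1)                   # tuples per distinct value
    matching_values = max(0, high - max(value, low - 1))   # integers in (value, high]
    return matching_values * per_value


print(uniform_estimate_greater_than(1, 10, 100, 7))  # 30.0
```

Under this assumption, each of the 10 possible values accounts for 10 tuples, so rooms > 7 (values 8, 9, 10) is estimated at 30 tuples. A skewed actual distribution can make this estimate badly wrong.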
QO in Oracle - Histograms
A histogram is a data structure that can improve estimates of the number of tuples in the result.
Two types of histogram:
width-balanced histogram, which divides the data into a fixed number of equal-width ranges (called buckets), each containing a count of the number of values falling within that bucket;
height-balanced histogram, which places approximately the same number of values in each bucket, so that the end points of each bucket are determined by how many values are in that bucket.
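Both kinds of histogram can be built in a few lines of Python. The sketch assumes integer values and a value range that divides evenly into the buckets. The `frequencies` data is a made-up skewed distribution of rooms over 100 properties, not the figure's actual data.

```python
def width_balanced(values, low, high, buckets):
    """Equal-width buckets over [low, high]; each holds a count of values."""
    counts = [0] * buckets
    span = high - low + 1
    for v in values:
        counts[(v - low) * buckets // span] += 1
    return counts


def height_balanced(values, buckets):
    """Equal-height buckets: each holds len(values) / buckets values.
    Returns the end point (largest value) of each bucket."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i + 1) * n // buckets - 1] for i in range(buckets)]


# Hypothetical, skewed distribution of rooms over 100 properties.
frequencies = {1: 5, 2: 10, 3: 25, 4: 25, 5: 15, 6: 8, 7: 5, 8: 4, 9: 2, 10: 1}
rooms = [value for value, count in frequencies.items() for _ in range(count)]

print(width_balanced(rooms, 1, 10, 5))  # [15, 50, 23, 9, 3]
print(height_balanced(rooms, 5))        # [3, 3, 4, 5, 10]
```

With 5 buckets, the width-balanced version covers 1-2, 3-4, and so on, and the height-balanced version puts 20 values in each bucket. The repeated end point 3 shows that value 3 alone fills more than one bucket. In this made-up data, rooms > 7 matches only 7 tuples, whereas a uniform assumption over 1 to 10 with 100 tuples would estimate 30.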
QO in Oracle - Histograms
(a) width-balanced for rooms with 5 buckets. Each bucket is of equal width with 2 values (1-2, 3-4, etc.).
(b) height-balanced: height of each column is 20 (100/5).
QO in Oracle - Viewing Execution Plan