Chapter 21

Query Processing

Transparencies

Pearson Education Limited 1995, 2005



Chapter 21 - Objectives

Objectives of query processing and optimization.

Static versus dynamic query optimization.

How a query is decomposed and semantically analyzed.

How to create a relational algebra tree (R.A.T.) to represent a query.

Rules of equivalence for relational algebra (RA) operations.

How to apply heuristic transformation rules to improve efficiency of a query.

Types of database statistics required to estimate cost of operations.

Different strategies for implementing selection.

How to evaluate cost and size of selection.

Different strategies for implementing join.

How to evaluate cost and size of join.

Different strategies for implementing projection.

How to evaluate cost and size of projection.

How to evaluate the cost and size of other RA operations.

How pipelining can be used to improve efficiency of queries.

Difference between materialization and pipelining.

Advantages of left-deep trees.

Approaches to finding optimal execution strategy.

How Oracle handles query optimization (QO).

Introduction

In network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language.

Programmer's responsibility to select most appropriate execution strategy.

With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved.

Relieves user of knowing what constitutes good execution strategy.

Introduction

Also gives DBMS more control over system performance.

Two main techniques for query optimization:

heuristic rules that order operations in a query;

comparing different strategies based on relative costs, and selecting one that minimizes resource usage.

Disk access tends to be dominant cost in query processing for centralized DBMS.

Query Processing

Activities involved in retrieving data from the database.

Aims of query processing (QP):

transform query written in high-level language (e.g. SQL) into correct and efficient execution strategy expressed in low-level language (implementing RA);

execute strategy to retrieve required data.

Query Optimization

Activity of choosing an efficient execution strategy for processing query.

As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage.

Generally, reduce total execution time of query.

May also reduce response time of query.

Problem computationally intractable with large number of relations, so strategy adopted is reduced to finding near optimum solution.

Example 21.1 - Different Strategies

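Three equivalent RA queries are:

(1) $\sigma_{(\text{position='Manager'}) \land (\text{city='London'}) \land (\text{Staff.branchNo=Branch.branchNo})}(\text{Staff} \times \text{Branch})$

(2) $\sigma_{(\text{position='Manager'}) \land (\text{city='London'})}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch})$

(3) $(\sigma_{\text{position='Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\sigma_{\text{city='London'}}(\text{Branch}))$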

Example 21.1 - Different Strategies

Assume:

1000 tuples in Staff; 50 tuples in Branch;

50 Managers; 5 London branches;

no indexes or sort keys;

results of any intermediate operations stored on disk;

cost of the final write is ignored;

tuples are accessed one at a time.

Example 21.1 - Cost Comparison

Costs (in disk accesses) are:

(1) (1000 + 50) + 2*(1000 * 50) = 101 050

(2) 2*1000 + (1000 + 50) = 3 050

(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160

Cartesian product and join operations much more expensive than selection, and third option significantly reduces size of relations being joined together.
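
The three totals can be re-derived from the assumptions of Example 21.1. The sketch below is only an illustration: the function and constant names are made up here, and the breakdown into reads and writes is one reading of the slide's arithmetic.

```python
# Figures assumed in Example 21.1: tuples are accessed one at a time and
# every intermediate result is written to disk and read back.
N_STAFF = 1000     # tuples in Staff
N_BRANCH = 50      # tuples in Branch
N_MANAGERS = 50    # Staff tuples with position = 'Manager'
N_LONDON = 5       # Branch tuples with city = 'London'


def cost_strategy_1() -> int:
    """Cartesian product first, then one selection over the product."""
    read_inputs = N_STAFF + N_BRANCH
    product = N_STAFF * N_BRANCH           # written once, then read once
    return read_inputs + 2 * product


def cost_strategy_2() -> int:
    """Join first, then selection over the join result."""
    join_result = N_STAFF                  # each Staff tuple joins one Branch tuple
    read_inputs = N_STAFF + N_BRANCH
    return 2 * join_result + read_inputs


def cost_strategy_3() -> int:
    """Selections on each relation first, then join of the reduced relations."""
    select_staff = N_STAFF + N_MANAGERS    # read Staff, write the Managers
    select_branch = N_BRANCH + N_LONDON    # read Branch, write the London branches
    join_inputs = N_MANAGERS + N_LONDON    # read both reduced relations
    return select_staff + select_branch + join_inputs


for number, cost in enumerate(
    (cost_strategy_1(), cost_strategy_2(), cost_strategy_3()), start=1
):
    print(f"strategy {number}: {cost} disk accesses")
```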




Phases of Query Processing

QP has four main phases:

decomposition (consisting of parsing and validation);

optimization;

code generation;

execution.

Phases of Query Processing

[Figure: phases of query processing]

Dynamic versus Static Optimization

Two times when first three phases of QP can be carried out:

dynamically every time query is run;

statically when query is first submitted.

Advantages of dynamic QO arise from fact that information is up to date.

Disadvantages are that performance of query is affected, and time may limit finding optimum strategy.

Dynamic versus Static Optimization

Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.

Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.

Could use a hybrid approach to overcome this.

Query Decomposition

Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct.

Typical stages are:

analysis,

normalization,

semantic analysis,

simplification,

query restructuring.

Analysis

Analyze query lexically and syntactically using compiler techniques.

Verify relations and attributes exist.

Verify operations are appropriate for object type.

Analysis - Example

SELECT staff_no

FROM Staff

WHERE position > 10;

This query would be rejected on two grounds:

staff_no is not defined for Staff relation (should be staffNo).

Comparison >10 is incompatible with type of position, which is variable character string.

Analysis

Finally, query transformed into some internal representation more suitable for processing.

Some kind of query tree is typically chosen, constructed as follows:

Leaf node created for each base relation.

Non-leaf node created for each intermediate relation produced by RA operation.

Root of tree represents query result.

Sequence is directed from leaves to root.

Example 21.1 - R.A.T.

[Figure: relational algebra tree for the query in Example 21.1]

Normalization

Converts query into a normalized form for easier manipulation.

Predicate can be converted into one of two forms:

Conjunctive normal form:

$(\text{position = 'Manager'} \lor \text{salary} > 20000) \land (\text{branchNo = 'B003'})$

Disjunctive normal form:

$(\text{position = 'Manager'} \land \text{branchNo = 'B003'}) \lor (\text{salary} > 20000 \land \text{branchNo = 'B003'})$

Semantic Analysis

Rejects normalized queries that are incorrectly formulated or contradictory.

Query is incorrectly formulated if components do not contribute to generation of result.

Query is contradictory if its predicate cannot be satisfied by any tuple.

Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.

Semantic Analysis

For these queries, could construct:

A relation connection graph.

Normalized attribute connection graph.

Relation connection graph

Create node for each relation and node for result. Create edges between two nodes that represent a join, and edges between nodes that represent projection.

If not connected, query is incorrectly formulated.

Semantic Analysis - Normalized Attribute Connection Graph

Create node for each reference to an attribute, or constant 0.

Create directed edge between nodes that represent a join, and directed edge between attribute node and 0 node that represents selection.

Weight edges $a \to b$ with value $c$, if it represents inequality condition $(a \le b + c)$; weight edges $0 \to a$ with $-c$, if it represents inequality condition $(a \ge c)$.

If graph has cycle for which valuation sum is negative, query is contradictory.

Example 21.2 - Checking Semantic Correctness

SELECT p.propertyNo, p.street

FROM Client c, Viewing v, PropertyForRent p

WHERE c.clientNo = v.clientNo AND

c.maxRent >= 500 AND

c.prefType = 'Flat' AND p.ownerNo = 'CO93';

Relation connection graph not fully connected, so query is not correctly formulated.

Have omitted the join condition (v.propertyNo = p.propertyNo).
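
The connectivity test for Example 21.2 can be sketched as a graph traversal. This is a minimal illustration, not a DBMS implementation: the helper `is_connected` and the edge lists are assumptions made here, built by hand from the query text.

```python
from collections import defaultdict


def is_connected(nodes, edges):
    """Return True if every node can be reached from the first node."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(nodes)


# One node per relation plus one node for the result.
nodes = ["Client", "Viewing", "PropertyForRent", "result"]

# Join edge from c.clientNo = v.clientNo; projection edge from the SELECT list.
edges_as_written = [("Client", "Viewing"), ("PropertyForRent", "result")]

# Same query with the omitted join v.propertyNo = p.propertyNo restored.
edges_with_join = edges_as_written + [("Viewing", "PropertyForRent")]

print(is_connected(nodes, edges_as_written))  # query as written
print(is_connected(nodes, edges_with_join))   # query with the join condition
```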



[Figure: relation connection graph and normalized attribute connection graph]




WHERE c.maxRent > 500 AND

c.clientNo = v.clientNo AND

v.propertyNo = p.propertyNo AND

c.prefType = 'Flat' AND c.maxRent < 200;

Normalized attribute connection graph has cycle between nodes c.maxRent and 0 with negative valuation sum, so query is contradictory.
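
A negative cycle in the normalized attribute connection graph can be found with a Bellman-Ford style relaxation. The sketch below is an illustration under two assumptions made here: the strict comparisons > and < are treated as >= and <=, and only the two selection edges on c.maxRent are included (the join edges are left out for brevity). The function name `has_negative_cycle` is hypothetical.

```python
def has_negative_cycle(nodes, edges):
    """Bellman-Ford relaxation starting from a virtual source at distance 0."""
    distance = {node: 0 for node in nodes}
    for _ in range(len(nodes) - 1):
        for source, target, weight in edges:
            if distance[source] + weight < distance[target]:
                distance[target] = distance[source] + weight
    # A further improvement is possible only if a negative cycle exists.
    return any(distance[s] + w < distance[t] for s, t, w in edges)


nodes = ["0", "c.maxRent"]

contradictory = [
    ("0", "c.maxRent", -500),  # c.maxRent >= 500: edge 0 -> a with weight -c
    ("c.maxRent", "0", 200),   # c.maxRent <= 0 + 200: edge a -> b with weight c
]

satisfiable = [
    ("0", "c.maxRent", -100),  # c.maxRent >= 100
    ("c.maxRent", "0", 200),   # c.maxRent <= 200
]

print(has_negative_cycle(nodes, contradictory))
print(has_negative_cycle(nodes, satisfiable))
```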




Simplification

Detects redundant qualifications, eliminates common sub-expressions, and transforms query to semantically equivalent but more easily and efficiently computed form.

Typically, access restrictions, view definitions, and integrity constraints are considered.

Assuming user has appropriate access privileges, first apply well-known idempotency rules of boolean algebra.

Transformation Rules for RA Operations

Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).

$\sigma_{p \land q \land r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$

Sometimes referred to as cascade of Selection.

$\sigma_{\text{branchNo='B003'} \land \text{salary>15000}}(\text{Staff}) = \sigma_{\text{branchNo='B003'}}(\sigma_{\text{salary>15000}}(\text{Staff}))$


Commutativity of Selection.

$\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$

For example:

$\sigma_{\text{branchNo='B003'}}(\sigma_{\text{salary>15000}}(\text{Staff})) = \sigma_{\text{salary>15000}}(\sigma_{\text{branchNo='B003'}}(\text{Staff}))$


In a sequence of Projection operations, only the last in the sequence is required.

$\Pi_L \Pi_M \ldots \Pi_N(R) = \Pi_L(R)$

For example:

$\Pi_{\text{lName}} \Pi_{\text{branchNo, lName}}(\text{Staff}) = \Pi_{\text{lName}}(\text{Staff})$


Commutativity of Selection and Projection.

If predicate p involves only attributes in projection list, Selection and Projection operations commute:

$\Pi_{A_1, \ldots, A_m}(\sigma_p(R)) = \sigma_p(\Pi_{A_1, \ldots, A_m}(R))$ where $p \in \{A_1, A_2, \ldots, A_m\}$

For example:

$\Pi_{\text{fName, lName}}(\sigma_{\text{lName='Beech'}}(\text{Staff})) = \sigma_{\text{lName='Beech'}}(\Pi_{\text{fName, lName}}(\text{Staff}))$


Commutativity of Theta join (and Cartesian product).

$R \bowtie_p S = S \bowtie_p R$

$R \times S = S \times R$

Rule also applies to Equijoin and Natural join.

For example:

$\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch} = \text{Branch} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Staff}$


Commutativity of Selection and Theta join (or Cartesian product).

If selection predicate involves only attributes of one of join relations, Selection and Join (or Cartesian product) operations commute:

$\sigma_p(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r S$

$\sigma_p(R \times S) = (\sigma_p(R)) \times S$

where $p \in \{A_1, A_2, \ldots, A_n\}$


If selection predicate is conjunctive predicate having form $(p \land q)$, where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as:

$\sigma_{p \land q}(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r (\sigma_q(S))$

$\sigma_{p \land q}(R \times S) = (\sigma_p(R)) \times (\sigma_q(S))$


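For example:

$\sigma_{\text{position='Manager'} \land \text{city='London'}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) = (\sigma_{\text{position='Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\sigma_{\text{city='London'}}(\text{Branch}))$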
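
The equivalence in this example can be checked on a toy instance. The rows and the `select` and `join` helpers below are invented for illustration; they only show that selecting after the join and selecting before the join return the same tuples.

```python
staff = [
    {"staffNo": "S1", "position": "Manager", "branchNo": "B1"},
    {"staffNo": "S2", "position": "Assistant", "branchNo": "B1"},
    {"staffNo": "S3", "position": "Manager", "branchNo": "B2"},
]
branch = [
    {"branchNo": "B1", "city": "London"},
    {"branchNo": "B2", "city": "Glasgow"},
]


def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def join(left, right, left_key, right_key):
    """Equijoin; right-hand attributes get a 'b_' prefix to avoid name clashes."""
    return [
        {**left_row, **{f"b_{name}": value for name, value in right_row.items()}}
        for left_row in left
        for right_row in right
        if left_row[left_key] == right_row[right_key]
    ]


# Selection applied after the join.
late = select(
    join(staff, branch, "branchNo", "branchNo"),
    lambda row: row["position"] == "Manager" and row["b_city"] == "London",
)

# Selections pushed down to each relation before the join.
early = join(
    select(staff, lambda row: row["position"] == "Manager"),
    select(branch, lambda row: row["city"] == "London"),
    "branchNo",
    "branchNo",
)

print(late == early)
```

The second form joins 2 Staff rows with 1 Branch row instead of 3 with 2, which is the reduction the cost comparison in Example 21.1 relies on.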





Commutativity of Projection and Theta join (or Cartesian product).

If projection list is of form $L = L_1 \cup L_2$, where $L_1$ only has attributes of R, and $L_2$ only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:

$\Pi_{L_1 \cup L_2}(R \bowtie_r S) = (\Pi_{L_1}(R)) \bowtie_r (\Pi_{L_2}(S))$


If join condition contains additional attributes not in L ($M = M_1 \cup M_2$ where $M_1$ only has attributes of R, and $M_2$ only has attributes of S), a final projection operation is required:

$\Pi_{L_1 \cup L_2}(R \bowtie_r S) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup M_1}(R)) \bowtie_r (\Pi_{L_2 \cup M_2}(S)))$


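For example:

$\Pi_{\text{position, city, branchNo}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) = (\Pi_{\text{position, branchNo}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\Pi_{\text{city, branchNo}}(\text{Branch}))$

and using the latter rule:

$\Pi_{\text{position, city}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) = \Pi_{\text{position, city}}((\Pi_{\text{position, branchNo}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\Pi_{\text{city, branchNo}}(\text{Branch})))$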


Commutativity of Union and Intersection (but not set difference).

$R \cup S = S \cup R$

$R \cap S = S \cap R$


Commutativity of Selection and set operations (Union, Intersection, and Set difference).

$\sigma_p(R \cup S) = \sigma_p(S) \cup \sigma_p(R)$

$\sigma_p(R \cap S) = \sigma_p(S) \cap \sigma_p(R)$

$\sigma_p(R - S) = \sigma_p(R) - \sigma_p(S)$


Commutativity of Projection and Union.

$\Pi_L(R \cup S) = \Pi_L(S) \cup \Pi_L(R)$

Associativity of Union and Intersection (but not Set difference).

$(R \cup S) \cup T = S \cup (R \cup T)$

$(R \cap S) \cap T = S \cap (R \cap T)$


Associativity of Theta join (and Cartesian product).

Cartesian product and Natural join are always associative:

$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$

$(R \times S) \times T = R \times (S \times T)$

If join condition q involves attributes only from S and T, then Theta join is associative:

$(R \bowtie_p S) \bowtie_{q \land r} T = R \bowtie_{p \land r} (S \bowtie_q T)$


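For example:

$(\text{Staff} \bowtie_{\text{Staff.staffNo=PropertyForRent.staffNo}} \text{PropertyForRent}) \bowtie_{\text{ownerNo=Owner.ownerNo} \land \text{Staff.lName=Owner.lName}} \text{Owner} = \text{Staff} \bowtie_{\text{Staff.staffNo=PropertyForRent.staffNo} \land \text{Staff.lName=lName}} (\text{PropertyForRent} \bowtie_{\text{ownerNo}} \text{Owner})$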

Example 21.3 Use of Transformation Rules

For prospective renters of flats, find properties that match requirements and owned by CO93.


FROM Client c, Viewing v, PropertyForRent p

WHERE c.prefType = 'Flat' AND

c.clientNo = v.clientNo AND

v.propertyNo = p.propertyNo AND

c.maxRent >= p.rent AND

c.prefType = p.type AND

p.ownerNo = 'CO93';




Heuristical Processing Strategies

Perform Selection operations as early as possible.

Keep predicates on same relation together.

Combine Cartesian product with subsequent Selection whose predicate represents join condition into a Join operation.

Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations executed first.

Heuristical Processing Strategies

Perform Projection as early as possible.

Keep projection attributes on same relation together.

Compute common expressions once.

If common expression appears more than once, and result not too large, store result and reuse it when required.

Useful when querying views, as same expression is used to construct view each time.

Cost Estimation for RA Operations

Many different ways of implementing RA operations.

Aim of QO is to choose most efficient one.

Use formulae that estimate costs for a number of options, and select one with lowest cost.

Consider only cost of disk access, which is usually dominant cost in QP.

Many estimates are based on cardinality of the relation, so need to be able to estimate this.

Database Statistics

Success of estimation depends on amount and currency of statistical information DBMS holds.

Keeping statistics current can be problematic.

If statistics updated every time tuple is changed, this would impact performance.

DBMS could update statistics on a periodic basis, for example nightly, or whenever the system is idle.

Typical Statistics for Relation R

nTuples(R) - number of tuples in R.

bFactor(R) - blocking factor of R.

nBlocks(R) - number of blocks required to store R:

nBlocks(R) = [nTuples(R)/bFactor(R)]

where [x] denotes x rounded up to the next integer.
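
As a small worked instance, the block count can be computed with integer arithmetic. The function name and the sample figures (3000 tuples, 30 tuples per block) are illustrative assumptions.

```python
def n_blocks(n_tuples: int, b_factor: int) -> int:
    """Blocks needed to store n_tuples when b_factor tuples fit in one block."""
    # Integer form of rounding n_tuples / b_factor up to the next integer.
    return (n_tuples + b_factor - 1) // b_factor


print(n_blocks(3000, 30))  # tuples fill the blocks exactly
print(n_blocks(3001, 30))  # one extra tuple needs one more block
```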




Typical Statistics for Attribute A of Relation R

nDistinctA(R) - number of distinct values that appear for attribute A in R.

minA(R), maxA(R) - minimum and maximum possible values for attribute A in R.

SCA(R) - selection cardinality of attribute A in R. Average number of tuples that satisfy an equality condition on attribute A.

Statistics for Multilevel Index I on Attribute A

nLevelsA(I) - number of levels in I.

nLfBlocksA(I) - number of leaf blocks in I.




Selection Operation

Predicate may be simple or composite.

Number of different implementations, depending on file structure, and whether attribute(s) involved are indexed/hashed.

Main strategies are:

Linear Search (Unordered file, no index).

Binary Search (Ordered file, no index).

Equality on hash key.

Equality condition on primary key.

Selection Operation

Inequality condition on primary key.

Equality condition on clustering (secondary) index.

Equality condition on a non-clustering (secondary) index.

Inequality condition on a secondary B+-tree index.

Estimating Cardinality of Selection

Assume attribute values are uniformly distributed within their domain and attributes are independent.

For S, the relation produced by a Selection on attribute A of R:

nTuples(S) = SCA(R)

For any attribute B ≠ A of S, nDistinctB(S) =

nTuples(S) if nTuples(S) < nDistinctB(R)/2

nDistinctB(R) if nTuples(S) > 2*nDistinctB(R)

[(nTuples(S) + nDistinctB(R))/3] otherwise
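
The three-way estimate for nDistinctB(S) translates directly into a function. This is a sketch of the formula as stated on the slide; the function name and the sample inputs are assumptions.

```python
import math


def n_distinct_after_selection(n_tuples_s: int, n_distinct_b_r: int) -> int:
    """Estimate nDistinctB(S) for an attribute B other than the selection attribute."""
    if n_tuples_s < n_distinct_b_r / 2:
        return n_tuples_s                 # few tuples: assume all B values differ
    if n_tuples_s > 2 * n_distinct_b_r:
        return n_distinct_b_r             # many tuples: assume every B value appears
    return math.ceil((n_tuples_s + n_distinct_b_r) / 3)


for n_tuples_s in (10, 100, 500):
    print(n_tuples_s, n_distinct_after_selection(n_tuples_s, 100))
```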



Linear Search (Unordered File, No Index)

May need to scan each tuple in each block to check whether it satisfies predicate.

For equality condition on key attribute, cost estimate is:

[nBlocks(R)/2]

For any other condition, entire file may need to be searched, so more general cost estimate is:

nBlocks(R)

Binary Search (Ordered File, No Index)

If predicate is of form A = x, and file is ordered on key attribute A, cost estimate:

[log2(nBlocks(R))]

Generally, cost estimate is:

[log2(nBlocks(R))] + [SCA(R)/bFactor(R)] - 1

First term represents cost of finding first tuple using binary search.

Expect there to be SCA(R) tuples satisfying predicate.

Equality on Hash Key

If attribute A is hash key, apply hashing algorithm to calculate target address for tuple.

If there is no overflow, expected cost is 1.

If there is overflow, additional accesses may be necessary.

Equality Condition on Primary Key

Can use primary index to retrieve single record satisfying condition.

Need to read one more block than number of index accesses, equivalent to number of levels in index, so estimated cost is:

nLevelsA(I) + 1

Inequality Condition on Primary Key

Can first use index to locate record satisfying predicate (A = x).

Provided index is sorted, records can be found by accessing all records before/after this one.

Assuming uniform distribution, would expect half the records to satisfy inequality, so estimated cost is:

nLevelsA(I) + [nBlocks(R)/2]

Equality Condition on Clustering Index

Can use index to retrieve required records.

Estimated cost is:

nLevelsA(I) + [SCA(R)/bFactor(R)]

Second term is estimate of number of blocks that will be required to store number of tuples that satisfy equality condition, represented as SCA(R).

Equality Condition on Non-Clustering Index

Can use index to retrieve required records.

Have to assume that tuples are on different blocks (index is not clustered this time), so estimated cost becomes:

nLevelsA(I) + [SCA(R)]

Inequality Condition on a Secondary B+-Tree Index

From leaf nodes of tree, can scan keys from smallest value up to x (< or <=), or from x up to largest value (> or >=).

Assuming uniform distribution, would expect half the leaf node blocks to be accessed and, via index, half the file records to be accessed.

Estimated cost is:

nLevelsA(I) + [nLfBlocksA(I)/2 + nTuples(R)/2]
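
The selection cost estimates from the preceding slides can be gathered into small functions for side-by-side comparison. This is only a sketch: the function names and the sample statistics in the final loop (100 blocks, 30 tuples per block, a 2-level index, 50 leaf blocks, selection cardinality 6) are assumptions chosen for illustration.

```python
import math


def linear_search_cost(n_blocks, equality_on_key=False):
    """Unordered file, no index."""
    return math.ceil(n_blocks / 2) if equality_on_key else n_blocks


def binary_search_cost(n_blocks, sc=1, b_factor=1):
    """Ordered file, no index; with sc=1 this is just the log term."""
    return math.ceil(math.log2(n_blocks)) + math.ceil(sc / b_factor) - 1


def primary_key_equality_cost(n_levels):
    return n_levels + 1


def primary_key_inequality_cost(n_levels, n_blocks):
    return n_levels + math.ceil(n_blocks / 2)


def clustering_index_equality_cost(n_levels, sc, b_factor):
    return n_levels + math.ceil(sc / b_factor)


def non_clustering_index_equality_cost(n_levels, sc):
    return n_levels + math.ceil(sc)


def secondary_btree_inequality_cost(n_levels, n_leaf_blocks, n_tuples):
    return n_levels + math.ceil(n_leaf_blocks / 2 + n_tuples / 2)


estimates = {
    "linear search, key equality": linear_search_cost(100, equality_on_key=True),
    "linear search, general": linear_search_cost(100),
    "binary search, key equality": binary_search_cost(100),
    "primary key equality": primary_key_equality_cost(2),
    "primary key inequality": primary_key_inequality_cost(2, 100),
    "clustering index equality": clustering_index_equality_cost(2, 6, 30),
    "non-clustering index equality": non_clustering_index_equality_cost(2, 6),
    "secondary B+-tree inequality": secondary_btree_inequality_cost(2, 50, 3000),
}
for strategy, cost in estimates.items():
    print(f"{strategy}: {cost}")
```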



Composite Predicates - Conjunction without Disjunction

May consider following approaches:

- If one attribute has index or is ordered, can use one of above selection strategies. Can then check each retrieved record.

- For equality on two or more attributes, with composite index (or hash key) on combined attributes, can search index directly.

- With secondary indexes on one or more attributes (involved only in equality conditions in predicate), could use record pointers if exist.

Composite Predicates - Selections with Disjunction

If one term contains an OR, and term requires linear search, entire selection requires linear search.

Only if index or sort order exists on every term can selection be optimized by retrieving records that satisfy each condition and applying union operator.

Again, record pointers can be used if they exist.

Join Operation

Main strategies for implementing join:

Block Nested Loop Join.

Indexed Nested Loop Join.

Sort-Merge Join.

Hash Join.




Estimating Cardinality of Join

If assume uniform distribution, can estimate for Equijoins with a predicate (R.A = S.B) as follows:

If A is key of R: nTuples(T) ≤ nTuples(S)

If B is key of S: nTuples(T) ≤ nTuples(R)

Otherwise, could estimate cardinality of join as:

nTuples(T) = SCA(R)*nTuples(S) or

nTuples(T) = SCB(S)*nTuples(R)
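
These join cardinality rules can be sketched as two helpers. The function names are assumptions; the sample numbers reuse the Staff (1000 tuples) and Branch (50 tuples) figures from Example 21.1, with the further assumption that staff are spread evenly over branches, so the selection cardinality of branchNo in Staff is 20.

```python
def join_cardinality_bound(n_tuples_r, n_tuples_s, a_is_key_of_r, b_is_key_of_s):
    """Upper bound on nTuples(T) when a join attribute is a key, else None."""
    bounds = []
    if a_is_key_of_r:
        bounds.append(n_tuples_s)   # each S tuple matches at most one R tuple
    if b_is_key_of_s:
        bounds.append(n_tuples_r)   # each R tuple matches at most one S tuple
    return min(bounds) if bounds else None


def join_cardinality_estimates(n_tuples_r, n_tuples_s, sc_a_r, sc_b_s):
    """The two alternative estimates: SCA(R)*nTuples(S) and SCB(S)*nTuples(R)."""
    return sc_a_r * n_tuples_s, sc_b_s * n_tuples_r


# R = Staff, S = Branch, joined on branchNo, which is the key of Branch.
print(join_cardinality_bound(1000, 50, a_is_key_of_r=False, b_is_key_of_s=True))
print(join_cardinality_estimates(1000, 50, sc_a_r=20, sc_b_s=1))
```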



Block Nested Loop Join

Simplest join algorithm is nested loop that joins two relations together a tuple at a time.

Outer loop iterates over each tuple in R, and inner loop iterates over each tuple in S.

As basic unit of reading/writing is a disk block, better to have two extra loops that process blocks.

Estimated cost of this approach is:

nBlocks(R) + (nBlocks(R) * nBlocks(S))

Block Nested Loop Join

Could read as many blocks as possible of smaller relation, R say, into database buffer, saving one block for inner relation and one for result.

New cost estimate becomes:

nBlocks(R) + [nBlocks(S)*(nBlocks(R)/(nBuffer-2))]

If can read all blocks of R into the buffer, this reduces to:

nBlocks(R) + nBlocks(S)
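
The effect of the buffer on the block nested loop estimate can be seen by evaluating the formulas. A minimal sketch: the function names and block counts are illustrative, and treating "R fits in the buffer" as a separate case is a choice made here to match the slide's reduced formula.

```python
import math


def bnl_cost_basic(n_blocks_r, n_blocks_s):
    """One buffer block per relation: S is rescanned for every block of R."""
    return n_blocks_r + n_blocks_r * n_blocks_s


def bnl_cost_buffered(n_blocks_r, n_blocks_s, n_buffer):
    """n_buffer - 2 blocks hold R; one block is kept for S and one for the result."""
    if n_buffer < 3:
        raise ValueError("need at least 3 buffer blocks")
    if n_blocks_r <= n_buffer - 2:
        return n_blocks_r + n_blocks_s      # all of R fits: S is read once
    return n_blocks_r + math.ceil(n_blocks_s * (n_blocks_r / (n_buffer - 2)))


print(bnl_cost_basic(100, 200))
print(bnl_cost_buffered(100, 200, n_buffer=52))
print(bnl_cost_buffered(100, 200, n_buffer=102))
```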



Indexed Nested Loop Join

If have index (or hash function) on join attributes of inner relation, can use index lookup.

For each tuple in R, use index to retrieve matching tuples of S.

Cost of scanning R is nBlocks(R), as before.

Cost of retrieving matching tuples in S depends on type of index and number of matching tuples.

If join attribute A in S is PK, cost estimate is:

nBlocks(R) + nTuples(R)*(nLevelsA(I) + 1)

Sort-Merge Join

For Equijoins, most efficient join is when both relations are sorted on join attributes.

Can look for qualifying tuples merging relations.

May need to sort relations first.

Now tuples with same join value are in order.

If assume join is *:* and each set of tuples with same join value can be held in database buffer at same time, then each block of each relation need only be read once.

Sort-Merge Join

Cost estimate for the sort-merge join is:

nBlocks(R) + nBlocks(S)

If a relation has to be sorted, R say, add:

nBlocks(R)*[log2(nBlocks(R))]

Hash Join

For Natural or Equijoin, hash join may be used.

Idea is to partition relations according to some hash function that provides uniformity and randomness.

Each equivalent partition should hold same value for join attributes, although it may hold more than one value.

Cost estimate of hash join as:

3(nBlocks(R) + nBlocks(S))
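
The indexed nested loop, sort-merge, and hash join estimates can be compared in the same way. The sketch below only restates the slide formulas; the function names and sample statistics (100 and 200 blocks, 3000 tuples in R, a 2-level index) are assumptions.

```python
import math


def indexed_nested_loop_cost_pk(n_blocks_r, n_tuples_r, n_levels):
    """Inner join attribute is the primary key: one index probe per tuple of R."""
    return n_blocks_r + n_tuples_r * (n_levels + 1)


def sort_cost(n_blocks):
    """Sorting estimate used on the slides: nBlocks * [log2(nBlocks)]."""
    return n_blocks * math.ceil(math.log2(n_blocks))


def sort_merge_cost(n_blocks_r, n_blocks_s, sort_r=False, sort_s=False):
    cost = n_blocks_r + n_blocks_s          # merge phase reads each block once
    if sort_r:
        cost += sort_cost(n_blocks_r)
    if sort_s:
        cost += sort_cost(n_blocks_s)
    return cost


def hash_join_cost(n_blocks_r, n_blocks_s):
    return 3 * (n_blocks_r + n_blocks_s)


print(indexed_nested_loop_cost_pk(100, 3000, 2))
print(sort_merge_cost(100, 200))                 # both inputs already sorted
print(sort_merge_cost(100, 200, sort_r=True))    # R must be sorted first
print(hash_join_cost(100, 200))
```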



Estimating Cardinality of Projection

When projection contains key, cardinality is:

nTuples(S) = nTuples(R)

If projection consists of a single non-key attribute, estimate is:

nTuples(S) = SCA(R)

Otherwise, could estimate cardinality as:

$\text{nTuples}(S) \le \min\left(\text{nTuples}(R), \prod_{i=1}^{m} \text{nDistinct}_{a_i}(R)\right)$

Duplicate Elimination using Sorting

Sort tuples of reduced relation using all remaining attributes as sort key.

Duplicates will now be adjacent and can be removed easily.

Estimated cost of sorting is:

nBlocks(R)*[log2(nBlocks(R))].

Combined cost is:

nBlocks(R) + nBlocks(R)*[log2(nBlocks(R))]
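
Sort-based duplicate elimination and its cost estimate can be sketched in memory as follows. The helper names and sample rows are invented for illustration; a DBMS would use an external sort over disk blocks rather than Python's `sorted`.

```python
import math


def project_distinct(rows, attributes):
    """Project onto attributes, sort, then drop adjacent duplicates."""
    reduced = sorted(tuple(row[name] for name in attributes) for row in rows)
    result = []
    for candidate in reduced:
        if not result or result[-1] != candidate:   # duplicates are adjacent
            result.append(candidate)
    return result


def projection_sort_cost(n_blocks_r):
    """Read R once, plus the sorting estimate nBlocks * [log2(nBlocks)]."""
    return n_blocks_r + n_blocks_r * math.ceil(math.log2(n_blocks_r))


staff = [
    {"staffNo": "S1", "branchNo": "B2"},
    {"staffNo": "S2", "branchNo": "B1"},
    {"staffNo": "S3", "branchNo": "B2"},
]
print(project_distinct(staff, ["branchNo"]))
print(projection_sort_cost(100))
```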



Duplicate Elimination using Hashing

Two phases: partitioning and duplicate elimination.

In partitioning phase, for each tuple in R, remove unwanted attributes and apply hash function to combination of remaining attributes, and write reduced tuple to hashed value.

Two tuples that belong to different partitions are guaranteed not to be duplicates.

Estimated cost is: nBlocks(R) + nB

Set Operations

Can be implemented by sorting both relations on same attributes, and scanning through each of sorted relations once to obtain desired result.

Could use sort-merge join as basis.

Estimated cost in all cases is:

nBlocks(R) + nBlocks(S) + nBlocks(R)*[log2(nBlocks(R))] + nBlocks(S)*[log2(nBlocks(S))]

Could also use hashing algorithm.

Estimating Cardinality of Set Operations

As duplicates are eliminated when performing Union, difficult to estimate cardinality, but can give an upper and lower bound as:

max(nTuples(R), nTuples(S)) ≤ nTuples(T) ≤ nTuples(R) + nTuples(S)

For Set Difference, can also give upper and lower bound:

0 ≤ nTuples(T) ≤ nTuples(R)

Aggregate Operations

SELECT AVG(salary)

FROM Staff;

To implement query, could scan entire Staff relation and maintain running count of number of tuples read and sum of all salaries.

Easy to compute average from these two running counts.
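
The single-scan evaluation of AVG(salary) can be sketched as below. The function name and rows are illustrative, and returning None for an empty relation is a choice made here to mirror SQL's NULL result.

```python
def average_salary(staff_rows):
    """One scan over Staff, keeping a running count and a running sum."""
    count = 0
    total = 0
    for row in staff_rows:
        count += 1
        total += row["salary"]
    return total / count if count else None


staff = [{"salary": 30000}, {"salary": 12000}, {"salary": 18000}]
print(average_salary(staff))
```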



Aggregate Operations

SELECT AVG(salary)

FROM Staff

GROUP BY branchNo;

For grouping queries, can use sorting or hashing algorithms similar to duplicate elimination.

Can estimate cardinality of result using estimates derived earlier for selection.

Enumeration of Alternative Strategies

Fundamental to efficiency of QO is the search space of possible execution strategies and the enumeration algorithm used to search this space.

Query with 2 joins gives 12 join orderings:

R ⋈ (S ⋈ T)   R ⋈ (T ⋈ S)   (S ⋈ T) ⋈ R   (T ⋈ S) ⋈ R

S ⋈ (R ⋈ T)   S ⋈ (T ⋈ R)   (R ⋈ T) ⋈ S   (T ⋈ R) ⋈ S

T ⋈ (R ⋈ S)   T ⋈ (S ⋈ R)   (R ⋈ S) ⋈ T   (S ⋈ R) ⋈ T

With n relations, (2(n - 1))!/(n - 1)! orderings.

If n = 4 this is 120; if n = 10 this is > 17.6 billion.

Compounded by different selection/join methods.
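
The growth of the formula (2(n - 1))!/(n - 1)! is easy to tabulate. A minimal sketch; the function name is an assumption.

```python
from math import factorial


def join_orderings(n_relations: int) -> int:
    """Number of join orderings for n relations: (2(n - 1))! / (n - 1)!"""
    return factorial(2 * (n_relations - 1)) // factorial(n_relations - 1)


for n in (3, 4, 7, 10):
    print(f"n = {n}: {join_orderings(n):,} orderings")
```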



Pipelining

Materialization - output of one operation is stored in temporary relation for processing by next.

Could also pipeline results of one operation to another without creating temporary relation.

Known as pipelining or on-the-fly processing.

Pipelining can save on cost of creating temporary relations and reading results back in again.

Generally, pipeline is implemented as separate process or thread.
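
The contrast between the two approaches can be illustrated with Python generators, which pass one tuple at a time between operators. This is an analogy only: generators stand in for the separate process or thread mentioned above, and the operator names and rows are invented for the example.

```python
def scan(rows):
    """Leaf operator: produce tuples one at a time."""
    for row in rows:
        yield row


def select(rows, predicate):
    """Selection applied on the fly to each incoming tuple."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attributes):
    """Projection applied on the fly to each incoming tuple."""
    for row in rows:
        yield {name: row[name] for name in attributes}


staff = [
    {"staffNo": "S1", "salary": 30000},
    {"staffNo": "S2", "salary": 12000},
    {"staffNo": "S3", "salary": 18000},
]

# Materialization: the selection result is stored in full before projecting.
temporary = [row for row in staff if row["salary"] > 15000]
materialized = [{"staffNo": row["staffNo"]} for row in temporary]

# Pipelining: each tuple flows through select and project without a temporary relation.
pipelined = list(
    project(select(scan(staff), lambda row: row["salary"] > 15000), ["staffNo"])
)

print(materialized == pipelined)
```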



Types of Trees

[Figure: types of trees]

Pipelining

With linear trees, relation on one side of each operator is always a base relation.

However, as need to examine entire inner relation for each tuple of outer relation, inner relations must always be materialized.

This makes left-deep trees appealing as inner relations are always base relations.

Reduces search space for optimum strategy, and allows QO to use dynamic processing.

Not all execution strategies are considered.

Physical Operators & Strategies

Term physical operator refers to specific algorithm that implements a logical operation, such as selection or join.

For example, can use sort-merge join to implement the join operation.

Replacing logical operations in a R.A.T. with physical operators produces an execution strategy (or query evaluation plan or access plan).

[Figure: physical operators and strategies]

Reducing the Search Space

Restriction 1: Unary operations processed on-the-fly: selections processed as relations are accessed for first time; projections processed as results of other operations are generated.

Restriction 2: Cartesian products are never formed unless query itself specifies one.

Restriction 3: Inner operand of each join is a base relation, never an intermediate result. This uses fact that with left-deep trees inner operand is a base relation and so already materialized.

Restriction 3 excludes many alternative strategies but significantly reduces number to be considered.

Dynamic Programming

Enumeration of left-deep trees using dynamic programming first proposed for System R QO.

Algorithm based on assumption that the cost model satisfies principle of optimality.

Thus, to obtain optimal strategy for query with n joins, only need to consider optimal strategies for subexpressions with (n - 1) joins and extend those strategies with an additional join. Remaining suboptimal strategies can be discarded.

Dynamic Programming

To ensure some potentially useful strategies are not discarded, algorithm retains strategies with interesting orders: an intermediate result has an interesting order if it is sorted by a final ORDER BY attribute, GROUP BY attribute, or any attributes that participate in subsequent joins.

Dynamic Programming



WHERE c.maxRent < 500 AND

c.clientNo = v.clientNo AND

v.propertyNo = p.propertyNo;

Attributes c.clientNo, v.clientNo, v.propertyNo, and p.propertyNo are interesting.

If any intermediate result is sorted on any of these attributes, then corresponding partial strategy must be included in search.

Dynamic Programming

Algorithm proceeds from the bottom up and constructs all alternative join trees that satisfy the restrictions above, as follows:

Pass 1: Enumerate the strategies for each base relation using a linear search and all available indexes on the relation. These partial strategies are partitioned into equivalence classes based on any interesting orders. An additional equivalence class is created for the partial strategies with no interesting order.

Dynamic Programming

For each equivalence class, strategy with lowest cost is retained for consideration in next pass.

Do not retain equivalence class with no interesting order if its lowest cost strategy is not lower than all other strategies.

For a given relation R, any selections involving only attributes of R are processed on-the-fly. Similarly, any attributes of R that are not part of the SELECT clause and do not contribute to any subsequent join can be projected out at this stage (restriction 1 above).

Dynamic Programming

Pass 2: Generate all 2-relation strategies by considering each strategy retained after Pass 1 as outer relation, discarding any Cartesian products generated (restriction 2 above). Again, any on-the-fly processing is performed and lowest cost strategy in each equivalence class is retained.

Pass n: Generate all n-relation strategies by considering each strategy retained after Pass (n - 1) as outer relation, discarding any Cartesian products generated. After pruning, now have lowest cost overall strategy for processing the query.
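
The pass structure can be sketched as a dynamic program over sets of relations. This is a heavily simplified illustration, not the System R algorithm itself: it ignores interesting orders and physical operators, keeps one strategy per set, and uses a toy cost (the sum of estimated intermediate result sizes). The function name, the cardinalities, and the `distinct_join_values` statistics are all assumptions.

```python
from itertools import combinations


def best_left_deep_plan(cardinality, distinct_join_values):
    """Return (cost, result size, join order) of the cheapest left-deep strategy.

    cardinality: {relation: tuples after on-the-fly selections}
    distinct_join_values: {frozenset({a, b}): distinct join-key values} per joinable pair
    """
    relations = sorted(cardinality)
    # Pass 1: one strategy per base relation.
    best = {frozenset([r]): (0, cardinality[r], (r,)) for r in relations}
    # Pass k: extend each retained (k - 1)-relation strategy with one base relation.
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            for inner in sorted(subset):
                outer = subset - {inner}
                if outer not in best:
                    continue
                divisors = [
                    distinct_join_values[frozenset({o, inner})]
                    for o in outer
                    if frozenset({o, inner}) in distinct_join_values
                ]
                if not divisors:
                    continue  # restriction 2: no Cartesian products
                outer_cost, outer_size, order = best[outer]
                result_size = outer_size * cardinality[inner]
                for divisor in divisors:
                    result_size //= divisor
                cost = outer_cost + result_size
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, result_size, order + (inner,))
    return best.get(frozenset(relations))


# Hypothetical statistics; Client is already reduced by c.maxRent < 500.
cardinality = {"Client": 20, "Viewing": 1000, "PropertyForRent": 500}
distinct_join_values = {
    frozenset({"Client", "Viewing"}): 100,            # c.clientNo = v.clientNo
    frozenset({"Viewing", "PropertyForRent"}): 500,   # v.propertyNo = p.propertyNo
}
plan = best_left_deep_plan(cardinality, distinct_join_values)
print(plan)
```

With these figures the retained strategy joins Client and Viewing first and adds PropertyForRent last, because that keeps the intermediate result small.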



Dynamic Programming

Although algorithm is still exponential, there are query forms for which it only generates O(n^3) strategies, so for n = 10 the number is 1,000, which is significantly better than the 17.6 billion different join orders noted earlier.

Semantic Query Optimization

Based on constraints specified on the database schema to reduce the search space.

For example, a constraint states that staff cannot supervise more than 100 properties, so any query searching for staff who supervise more than 100 properties will produce zero rows. Now consider:

CREATE ASSERTION ManagerSalary

CHECK(salary > 20000 AND position = 'Manager')

SELECT s.staffNo, fName, lName, propertyNo

FROM Staff s, PropertyForRent p

WHERE s.staffNo = p.staffNo AND

position = 'Manager';

Semantic Query Optimization

Can rewrite this query as:

SELECT s.staffNo, fName, lName, propertyNo

FROM Staff s, PropertyForRent p

WHERE s.staffNo = p.staffNo AND

salary > 20000 AND position = 'Manager';

Additional predicate may be very useful if only index for Staff is a B+-tree on the salary attribute.

However, additional predicate would complicate query if no such index existed.

Query Optimization in Oracle

Oracle supports two approaches to query optimization: rule-based and cost-based.

Rule-based

15 rules, ranked in order of efficiency. Particular access path for a table only chosen if statement contains a predicate or other construct that makes that access path available.

Score assigned to each execution strategy using these rankings and strategy with best (lowest) score selected.

QO in Oracle - Rule-Based

When 2 strategies have same score, tie-break resolved by making decision based on order in which tables occur in the SQL statement.

QO in Oracle - Rule-Based: Example

SELECT propertyNo

FROM PropertyForRent

WHERE rooms > 7 AND city = 'London'

Single-column access path using index on city from WHERE condition (city = 'London'). Rank 9.

Unbounded range scan using index on rooms from WHERE condition (rooms > 7). Rank 11.

Full table scan - rank 15.

Although there is index on propertyNo, column does not appear in WHERE clause and so is not considered by optimizer.

Based on these paths, rule-based optimizer will choose to use index based on city column.
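
The choice in this example amounts to taking the available access path with the lowest rank. The snippet below only models that comparison; the list of tuples is an illustrative structure and does not represent anything inside Oracle.

```python
# (access path, rank) pairs available for the example query; lower rank is better.
access_paths = [
    ("single-column index on city", 9),
    ("unbounded range scan using index on rooms", 11),
    ("full table scan", 15),
]

chosen_path, chosen_rank = min(access_paths, key=lambda path: path[1])
print(f"chosen: {chosen_path} (rank {chosen_rank})")
```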



QO in Oracle - Cost-Based

To improve QO, Oracle introduced cost-based optimizer in Oracle 7, which selects strategy that requires minimal resource use necessary to process all rows accessed by query (avoiding above tie-break anomaly).

User can select whether minimal resource usage is based on throughput or based on response time, by setting the OPTIMIZER_MODE initialization parameter.

Cost-based optimizer also takes into consideration hints that the user may provide.

QO in Oracle - Statistics

Cost-based optimizer depends on statistics for all tables, clusters, and indexes accessed by query.

User's responsibility to generate these statistics and keep them current.

Package DBMS_STATS can be used to generate and manage statistics.

Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially.

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('Manager');

QO in Oracle - Histograms

Previously made assumption that data values within columns of a table are uniformly distributed.

Histogram of values and their relative frequencies gives optimizer improved selectivity estimates in presence of non-uniform distribution.

(a) uniform distribution of rooms; (b) actual non-uniform distribution.

(a) can be stored compactly as low value (1) and high value (10), and as total count of all frequencies (in this case, 100).


Histogram is data structure that can improve estimates of number of tuples in result.

Two types of histogram:

width-balanced histogram, which divides data into a fixed number of equal-width ranges (called buckets) each containing count of number of values falling within that bucket;

height-balanced histogram, which places approximately same number of values in each bucket so that end points of each bucket are determined by how many values are in that bucket.
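
The two histogram types can be built from a list of column values as sketched below. The frequencies for the rooms column are invented (they only match the slides in having values 1 to 10 and a total count of 100), and the helper names are assumptions.

```python
def width_balanced(values, low, high, n_buckets):
    """Count the values falling in each of n_buckets equal-width integer ranges."""
    width = (high - low + 1) // n_buckets
    counts = [0] * n_buckets
    for value in values:
        index = min((value - low) // width, n_buckets - 1)
        counts[index] += 1
    return counts


def height_balanced(values, n_buckets):
    """End point of each bucket when each bucket holds len(values) / n_buckets values."""
    ordered = sorted(values)
    per_bucket = len(ordered) // n_buckets
    return [
        ordered[min((i + 1) * per_bucket, len(ordered)) - 1]
        for i in range(n_buckets)
    ]


# Hypothetical non-uniform distribution of rooms: {number of rooms: frequency}.
frequencies = {1: 5, 2: 5, 3: 10, 4: 20, 5: 30, 6: 10, 7: 8, 8: 6, 9: 4, 10: 2}
rooms = [room for room, count in frequencies.items() for _ in range(count)]

print(width_balanced(rooms, low=1, high=10, n_buckets=5))   # counts for 1-2, 3-4, ...
print(height_balanced(rooms, n_buckets=5))                  # end points, 20 values each
```

In the width-balanced form the bucket boundaries are fixed and the counts vary; in the height-balanced form the counts are fixed and the boundaries move toward the dense part of the distribution.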



(a) width-balanced for rooms with 5 buckets. Each bucket of equal width with 2 values (1-2, 3-4, etc.)

(b) height-balanced: height of each column is 20 (100/5).

QO in Oracle - Viewing Execution Plan