cs346: advanced databases graham cormode [email protected] query planning and optimization

CS346: Advanced DatabasesGraham Cormode [email protected]

Query Planning and Optimization

Outline

Chapter: “Algorithms for Query Processing and Optimization” in Elmasri and Navathe

Background: “Basic SQL” and “More SQL” in Elmasri and Navatheor “SQL in ten minutes” etc.

¨ Query patterns: block (nested) queries ¨ External sorting¨ Techniques for different query operators:

– SELECT, JOIN, PROJECT¨ Query optimization: choosing between different query plans

– Heuristic (rule-based) versus cost-based query optimization

CS346 Advanced Databases2

http://www.codeproject.com/Articles/2059/SQL-in-ten-minutes

Why?

¨ Understand the separation between query specification and query execution

¨ See the process of deciding how to execute a query¨ Understand the different choices in query computation¨ Connection to many data processing tasks:

– Sorting, indexing, joins ubiquitous in data processing, analytics¨ Connects theory and practice

– Use properties of relational algebra to optimize queries


Query Processing and Optimization

¨ SQL is a high level declarative language– Say what result you want, not how to get it– Can be more than one way to process a query to get results

¨ Basic example: find maximum value of an attribute– Simple approach: just scan through all records– Use an index (if it exists): jump to end of index to find max value

¨ Gets more involved when more factors are in play– Multiple indexes, complex (nested) queries, joins, ...

¨ A query plan is the sequence of operations to answer a query– Query Processing & Optimization: choosing a good query plan– Not necessarily “the best” query plan: may take too long to find


From Query to Results


We will focus on this part

Query Representation: SQL and Algebra¨ Arbitrary queries can be quite complex

– How to automatically make query plans?¨ Observation: queries are highly structured

– Basic structures repeated at multiple levels¨ Break SQL query into query blocks

– A query block contains a single SELECT-FROM-WHERE expression– Optionally may include GROUPBY and HAVING clauses– Nested queries are parsed into separate query blocks

¨ Convert into relational algebra – Relational Operators: s (selection), p (projection), ⋈ (join)– Since SQL has aggregation (SUM, MAX), algebra also includes these


Example parsing into blocks


SELECT LNAME, FNAMEFROM EMPLOYEEWHERE SALARY > (SELECT MAX (SALARY)

FROM EMPLOYEEWHERE DNO = 5);

SELECT MAX (SALARY)FROM EMPLOYEEWHERE DNO = 5

SELECT LNAME, FNAMEFROM EMPLOYEEWHERE SALARY > C

πLNAME, FNAME (σSALARY>C(EMPLOYEE)) ℱMAX SALARY (σDNO=5 (EMPLOYEE))

Approach to Query Processing

¨ Divide-and-conquer: break the problem into standard pieces– Define different ways to implement each relational operator

SELECT, PROJECT, JOIN– Choose how to combine these ways based on estimated costs– The set of choices forms the query plan

¨ We will look at each basic operator in turn¨ Start with a more basic primitive: sorting

– Many operators can be done more efficiently if input is sorted


External Sorting

¨ In algorithms, mostly study in-memory sorting– How to sort big data files larger than memory?

¨ External sorting algorithms– External – data resides outside the memory– Typically follow a “sort-merge” strategy

Sort pieces of the file, merge these together to get final result Sort blocks as large as possible in memory

– “External memory model”: measure cost as number of disk accesses In-memory operations are (treated as) effectively free In contrast to RAM model: measure cost as number of operations


Sort-merge algorithm

¨ Parameters of the sorting procedure:– b: number of blocks in the file to be sorted– nB: available buffer space (measured in blocks)– nR: number of runs (pieces) of the file produced

nR = b/nB : break file into pieces that fit into memory

¨ Small example: b=1024, nB = 5, need 205 runs of 5 blocks¨ After first pass, have nR sorted pieces of nB blocks: merge

them together– In-memory mergesort: merge two at a time (two-way merge)– External sorting: merge as many as possible

Limiting factor: smaller of (nB - 1) and nR


Merge phases

¨ After one merge phase, now have runs (sorted pieces) of length nB

2 blocks– Can merge these together, until one sorted file emerges– Each merge phases is a “pass”: requires reading the whole data– Number of passes is small: log nR/log nB

– Can be much better than binary merging (log2 nR) if nB is large


Digression: Sorting numbers

¨ Considerable effort invested in optimizing sorting data!¨ Several competitive sort benchmarks ( sortbenchmark.org/ )

– GraySort: Fastest time to sort 100TB (2014: < half an hour)– MinuteSort: Most data sorted in 1 Minute (3.7TB)– JouleSort: Minimum energy to sort data (100K records/joule)– PennySort: Amount of data sorted using 1c of system time

(300GB), based on depreciating cost of system over 3 years¨ Original test: sort 100 million records (100MB)

– 1985 time: 1 hour– By 2000, <1 second

¨ Current winners highly distributed, parallel systems


http://sortbenchmark.org/

SELECT Operator

¨ SELECT operator: find all records that meet a condition¨ Examples:

– OP1: s SSN='123456789' (EMPLOYEE)– OP2: s DNUMBER>5(DEPARTMENT)– OP3: s DNO=5(EMPLOYEE)– OP4: s DNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE)– OP5: s ESSN=123456789 AND PNO=10(WORKS_ON)

¨ Simple, right?


SELECT operation (basic conditions)

Surprisingly many ways to SELECT records!¨ S1 Linear search (brute force):

– Retrieve every record in the file– Test whether the attribute values satisfy the selection condition

¨ S2 Binary search:– If selection condition involves an equality comparison on a key

attribute on which the file is ordered, can binary search. (e.g. OP1, s SSN='123456789' (EMPLOYEE))

¨ S3 Use a primary index or hash key to retrieve one record:– If selection condition has an equality comparison on a key attribute

with a primary index (or a hash key)– Use the primary index (or the hash key) to retrieve the record


SELECT operation (basic conditions) ¨ S4 Use a primary index to retrieve multiple records:

– If comparison condition is >, ≥, <, or ≤ on field with primary index– Use index to find record satisfying corresponding equality condition– Then retrieve all subsequent records in the (ordered) file

¨ S5 Using a clustering index to retrieve multiple records:– If selection condition involves equality comparison on a non-key

attribute with a clustering index– Use the index to retrieve records satisfying selection condition

¨ S6 Using a secondary (B+-tree) index:– If indexing field is a (candidate) key, use index to retrieve the match– Retrieve multiple records if the indexing field is not a key– Can also use index to retrieve records on conditions with >,≥, <, ≤– Range queries: e.g. 30000 < Salary < 35000


SELECT operation (complex conditions)

Things get more complex when conditions are conjunctions (AND) of several simple conditions

E.g. OP4: s DNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE)

¨ S7 Conjunctive selection :– If an attribute in any part of the condition has an access path

allowing S2-S6, use that condition to retrieve the records, then check if each retrieved record satisfies the whole condition

¨ S8 Conjunctive selection using a composite index:– If two or more attributes are involved in equality conditions and

a composite index (or hash structure) exists on the combined field, we can use it directly


SELECT operation

¨ S9 Conjunctive selection by intersecting record pointers:– If there are secondary indexes on all (or some) fields involved in

equality comparison conditions in the conjunctive condition– Each index can be used to retrieve the record pointers that satisfy

each individual condition– Intersecting these pointers gives the pointers that satisfy the

conjunctive condition, which can then be retrieved– If only some conditions have secondary indexes, test each

retrieved record against the full condition– Not necessary when one condition is on a key value. Why?


Choosing a method

¨ 9 possible selection methods (and counting)– How to choose between them? – How should DBMS (automatically) choose one to use?

¨ Easy case: single SELECT condition (OP1, OP2, OP3): – Either there exists a suitable access path: use it!– Or not: linear scan (S1)

¨ Harder case: conjunctive SELECT condition (OP4, OP5): – If only one method has an access path: use it! (S7)– If more than one access path: pick method yielding fewest records

But how can we know this without actually doing it?


OP1: σSSN='123456789' (EMPLOYEE)OP2: σDNUMBER>5(DEPARTMENT)OP3: σDNO=5(EMPLOYEE)OP4: σDNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE)OP5: σESSN=123456789 AND PNO=10(WORKS_ON)

Selectivity estimation

¨ DBMS keeps statistics on relations, uses them to estimate costs¨ The selectivity of a condition is the fraction that satisfy it

– Zero selectivity: none satisfy the condition– Selectivity of 1: every record satisfies the condition

¨ DBMS estimates selectivity of each part of a condition– Equality on key attribute: at most one record can match– Equality on non-key attribute with d distinct values: assume 1/d

records match (uniformity assumption)– Range query for a fraction f of possible values: assume selectivity f

¨ Pick method retrieving least records based on estimated selectivity– Much work on estimating selectivity based on samples, histograms…


Disjunctive selection

¨ Conditions with OR (disjunctive) are harder to optimize– E.g. s Dno=5 OR Salary > 30000 OR Sex = ‘F’ (EMPLOYEE)– Results are all records satisfying any of the simple conditions– If any of the conditions has no access path, must linear scan– If every condition has access path, can use each in turn

Need to eliminate duplicates (later)


The JOIN operation

¨ JOIN is a costly operation to perform ¨ Most examples of join are EQUIJOIN (or NATURAL JOIN)

– Can be two–way join: a join on two files i.e. R ⋈A=B S – Or multi-way joins: joins involving more than two files

i.e. R ⋈A=B S ⋈C=D T ¨ Examples

– OP6: EMPLOYEE ⋈DNO=DNUMBER DEPARTMENT

– OP7: DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE


Implementing JOIN

As with SELECT, many ways to perform JOIN¨ J1 Nested-loop join (brute force):

– For each record t in R (outer loop): Retrieve every record s from S (inner loop) Test if the two records satisfy the join condition t[A] = s[B]

¨ J2 Single-loop join (Use an access structure to retrieve matching records):– If index (or hash key) exists for one join attribute — say, B of S:

retrieve each record t in R, one at a time use access structure to retrieve matching records s in S with

s[B] = t[A]


Implementing JOIN

¨ J3 Sort-merge join:– If R and S are physically sorted (ordered) by value of the join

attributes A and B, respectively, sort-merge join is very efficient– Both files are scanned in order of the join attributes, matching the

records that have the same values for A and B– Each file is scanned only once for matching with the other file

If A and B are both non-key, need to generate all matching pairs


X A

1 5

2 5

3 6

Y B

7 5

8 5

9 6

X A B Y

1 5 5 7

1 5 5 8

2 5 5 7

2 5 5 8

3 6 6 9

⋈ =

Implementing JOIN¨ J4 Hash-join:

– The records of R and S are both hashed with the same hash function on the join attributes A of R and B of S as hash keys.

– A single pass through the file with least records (say, R) hashes its records to the hash file buckets

– A single pass through the other file (S) then hashes each of its records to the appropriate bucket, where the record is combined with all matching records from R


JOIN performance

¨ Nested-loop join can be very expensive¨ Example: (OP6) EMPLOYEE ⋈DNO=DNUMBER DEPARTMENT

– Suppose nB = 7 disk blocks are available as file buffers– Suppose DEPARTMENT has rD = 50 records in bD = 10 disk blocks– And EMPLOYEE has rE = 6000 records in bE = 2000 disk blocks

¨ Read as many blocks from the outer loop file as possible (nB-2)– Read 1 block at a time from inner loop file, look for matches– Fill one further block with records in the join, flush to disk when full

¨ Which file is ‘inner’ and which is ‘outer’ makes a difference


Nested Loop Example

¨ Using EMPLOYEE for the outer loop: EMPLOYEE read once, DEPARTMENT read bE/(nB-2) times– Read EMPLOYEE once: bE blocks– Read DEPARTMENT bE / (nB – 2) times: bD * bE/(nB-2)– In our example: 2000 + 10 * 2000/5 = 6000 block reads

¨ Using DEPARTMENT for the outer loop: DEPARTMENT read once, EMPLOYEE read bD/(nB -2 ) times– Read DEPARTMENT once: bD blocks– Read EMPLOYEE bD / (nB – 2) times: bE * bD/(nB -2) times– In our example: 10 + 2000 * 10/5 = 4010 block reads

¨ Not counting the cost of writing results to disk


DEPARTMENT: rD=50 records, bD=10 blocksEMPLOYEE: rE=6000 records, bE=2000 blocks

JOIN selection factor

¨ JOIN selection factor: what fraction of records in one file join?– Particularly relevant for J2 single loop join (with access path)

¨ Consider OP7: DEPARTMENT ⋈MGRSSN=SSN EMPLOYEE – Assume 50 DEPARTMENT records, 6000 EMPLOYEE records

¨ Option 1: for each EMPLOYEE record, look up in DEPARTMENT– Assume a 2 level index on SSN for DEPARTMENT– Approximate cost: bE + (rE * 3) = 2000 + 6000*3 = 20000 blocks– Join selection factor is (50/6000) = 0.008

¨ Option 2: for each DEPARTMENT record, look up in EMPLOYEE– Assume a 4 level index on SSN for EMPLOYEE– Approximate cost: bD + (rD * 5) = 10 + 50*5 = 260 blocks– Join selection factor is 1


Sort-merge JOIN efficiency

¨ Sort-merge JOIN is very efficient if files are already sorted– A single pass through each– For OP6 and OP7, need bE + bD = 2000 + 10 = 2010 block reads

¨ If the files are not sorted on join attribute, can sort them– Cost is estimated as (bE log bE + bD log bD + bE + bD)– The base of the logs depends on space available to sort– DBMS can easily find the estimated cost


X A

1 5

2 5

3 6

Y B

7 5

8 5

9 6

X A B Y

1 5 5 7

1 5 5 8

2 5 5 7

2 5 5 8

3 6 6 9

⋈ =

Partition hash JOIN

¨ Hash join is most efficient if one file can be kept in memory– Store hash table in memory, then read through larger file

¨ Partition hash JOIN:– A partitioning hash function on join attribute splits into M pieces

Use M blocks as buffers. Read R and S, write out R1...RM, S1...SM

– Then join each of the M pieces of both files: join Ri with Si

Can use whatever method is preferred Nested loop join is now efficient if one piece fits in memory

– Records with different partitioning hash values cannot join¨ Cost is low if M is chosen suitably and hash function works well

– Read each file, write out partitions, then read to join: 3 * (bR + bS)– For OP6, this is 3 * (2000 + 10) = 6030 block accesses


PROJECT operation

¨ PROJECT operation picks out certain attributes from each record– p<ATTRIBUTE LIST> R– Why is this not completely trivial?

¨ Relational semantics do not allow duplicates– PROJECT can create duplicates, so need to remove these– If attribute list includes a key of the relation, there will be no dupes

¨ Two standard approaches to duplicate elimination:– Sort the results, then scan through to find duplicates– Hash the results: duplicates hash to same bucket, & can be dropped

¨ SQL does not require elimination of duplicates by default– Must be dropped only if the query includes the DISTINCT keyword


Set Operations

¨ Other operations: UNION, INTERSECTION, SET DIFFERENCE– Require the input relations to be compatible (same attributes)– All can be implemented with variations of Sort-Merge

¨ UNION, R S: scan through sorted inputs, copy results to output– Suppress duplicate tuples

¨ INTERSECTION, R S: scan through sorted inputs– Only copy tuples that appear in both inputs

¨ SET DIFFERENCE (EXCEPT in SQL), R / S:– Scan through sorted inputs, keep those in R but not in S

¨ Can also be implemented via hashing– Exercise: how would you use hashing to implement UNION?


Aggregate Operators

¨ Aggregate operators in SQL: MIN, MAX, COUNT, AVERAGE, SUM¨ Linear scan works for all

– Keep track of running sum, count or max/min¨ MAX, MIN: Indexes (on target attribute) are helpful

– Follow the index to find the max/min value– E.g. SELECT MAX (SALARY) FROM EMPLOYEE;

¨ SUM, COUNT, AVERAGE:– If there is a dense index on target attribute, compute from index– If the index is sparse (e.g. cluster index), need extra information

E.g. keep the number of corresponding records with index entry Needs extra effort to keep this information up to date


Aggregate operators and GROUP BY

¨ Many aggregate queries include a GROUP BY– SELECT Dno, AVG(Salary)

FROM EMPLOYEEGROUP BY Dno;

– Apply the aggregate separately to each group of tuples¨ Use sorting/hashing on grouping attibute (Dno) to partition

– Then apply aggregate on group (linear scan)¨ If there is a clustering index on the grouping attribute, then

records are already partitioned correctly


Avoiding hitting the disk: pipelining

¨ It is convenient to think of each operation in isolation– Read input from disk, write results to disk

¨ But this is slow: very disk intensive– Can we avoid creating intermediate (temporary) result files?

¨ Pipelining: pass results of one operator directly to the next– Like unix pipes |

¨ Create pipelined code: implement compound operations– For a 2-way join, combine 2 selections on the input and one

projection on the output with the Join: eliminates 4 temp files¨ DBMS can generate code dynamically to pipeline operations

– Using ideas from compilers


Using Heuristics in Query Optimization¨ Steps for heuristic optimization:

– Query parser generates initial internal representation– Apply heuristics to optimize the internal representation– Query execution plan generated to use available access paths

¨ General idea: apply operations that reduce results size first – E.g., Apply SELECT and PROJECT before JOIN or similar operations– “Push down” selections (projections) in tree representation

¨ Will introduce query tree and query graph representations– Develop the idea of “equivalent query trees”


Query Trees

¨ Represent a relational algebra expression as a tree structure– Input relations are leaf nodes of the tree– Internal nodes are (relational) operations– Execution proceeds bottom-up: result is the root

¨ E.g.: PNUMBER, DNUM, LNAME, ADDRESS, BDATE(((PLOCATION=‘Stafford’(PROJECT)) ⋈DNUM=DNUMBER (DEPARTMENT)) ⋈ MGRSSN=SSN (EMPLOYEE))


SELECT P.Pnumber, P.Dnum,E.Lname, E.Address, E.Bdate

FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E

WHERE P.Dnum=D.Dnumber AND D.Mgrssn=E.Ssn AND P.Plocation=‘Stafford’;

Query trees

¨ Query trees impose a (partial) ordering on operations– (1) must be done before (2) before (3)

¨ One query can correspond to many relational algebra expressions– Hence there can be many different query trees for the same query– Query optimization goal: pick a tree that is efficient to execute

¨ Analogy: multiple mathematical expressions are equivalent– E.g. a(x + y) = ax + ay

¨ Start with an initial query tree derived from the SQL query– Often very inefficient: uses cartesian product instead of join– Initial query tree never executed, but is easier to optimize

¨ Query optimizer will use equivalence rules to transform trees



SELECT LNAMEFROM EMPLOYEE, WORKS_ON,

PROJECTWHERE PNAME = ‘AQUARIUS’ AND

PNUMBER=PNO AND ESSN=SSN

AND BDATE > ‘1957-12-31’;





AND BDATE > ‘1957-12-31’;

Final query tree





AND BDATE > ‘1957-12-31’;

Transformation rules for relational algebra¨ Rules preserve the semantics of queries: same result

– Not necessarily with the same order of attributes1. Cascade of . Conjunction of selections can be broken up:

c1 AND c2 AND … cn (R) c1 (c2 ( …. cn (R))…))

2. The operation is commutative (follows from previous) c1 (c2(R)) c2 (c1(R))

3. Cascade of . Can ignore all but the final projection: list1(list2(…(listn(R)…)) list1(R)

4. Commuting with . If selection only involves attributes in the projection list, can swap

A1, A2, … , An(c(R)) c(A1, A2, … , An (R))


Transformation rules for relational algebra

5. Commutativity of ⋈ (and ): Join (cartesian product) commutesR ⋈c S = S ⋈c R and R S = S R

6. Commuting with ⋈ (or ): If all attributes in selection condition c are in only one relation(say, R) then c( R S ) ⋈ (c(R)) S⋈

If c can be written as (c1 AND c2) where c1 is only on R, and c2 is only on S, then c( R S ) ⋈ (c1(R)) (⋈ c2(S))

7. Commuting with ⋈ (): Write projection list L={A1...An,B1... Bn} where A’s are attributes of R, B’s are attributes of S.If join condition c involves only attributes in L, then:

L(R ⋈c S) ( A1,...An(R)) ⋈c (B1,...Bn(S))

Can also allow join on attributes not in L with an extra projectionCS346 Advanced Databases42

Transformation rules for relational algebra

8. Commutativity of set operatorsOperations and are commutative ( / is not)

9. Associativity of , ⋈ , , These operations are individually associative: (R S) T = R (S T) where is the same op throughout

10. Commuting with set operations commutes with , and / : c (r S) (c(R)) (c(S))

11. The operation commutes with : L(R S) (L(R)) (L(S))

12. Converting a (, ) sequence into a ⋈If the condition c of selection following a corresponds to a join condition, then: (c(R S)) (R ⋈c S)


Yet more transformation rules

¨ Other rules from arithmetic and logic can be applied¨ E.g. DeMorgan’s laws:

NOT (c1 AND c2) (NOT c1) OR (NOT c2)NOT (c1 OR c2) (NOT c1) AND (NOT c2)


Algebraic Optimization

¨ How to go from rules into an algorithm for optimizing queries?– What order to apply rules? Avoid going round in circles…

¨ Using rule 1, break up SELECTs with ANDs into a cascade– Makes it easier to move them around the tree

¨ Use rules 2, 3, 6, 10 to move SELECTs as far down as possible– SELECT reduces data size so do it as soon as possible

¨ Use rules 5 and 9 (commutativity and associativity of binary ops):– Arrange that leaf operations with most restrictive SELECT are first– Use selectivity estimates to determine this – But avoid creating cartesian products


Algebraic Optimization Continued

¨ Use Rule 12 to turn cartesian product + select into join¨ Use rules 3, 4, 7, 11 to push PROJECT as far down the tree as

possible, creating new PROJECT operations if needed– Only attributes needed higher up the tree should remain

¨ Find subtrees that can be pipelined into a single operation

¨ The example from earlier follows this sequence of steps¨ Main points to remember:

– First apply operations that reduce the size of intermediate results– Perform select and project early, to minimize result size– The most restrictive select and join should be done earliest


Query Trees to Query Plans


¨ Now we have a query tree… but we’re not quite done– Need to specify a few more points to make the final plan– Tree specifies order of operations, not the implementation to use

¨ E.g. for tree below, DBMS could specify:– Use secondary index to search for SELECT on DEPARTMENT– Single-loop join algorithm to do the join using an index on Dno– Linear scan of the join result to do the project– Lastly, specify whether intermediate

results are materialized on disk orpipelined direct to next operation

Cost-based query optimization

¨ Cost-based query optimization assigns costs to different strategies– Aims to pick the strategy with the lowest (estimated) cost– Sits in place of or in addition to heuristic query optimization

¨ Can sometimes be time-consuming to compare costs– Better suited to queries that are compiled (to run multiple times)

¨ Does not guarantee finding the best (optimal) strategy– Poor estimates of costs could pick a suboptimal strategy– Not time-effective to consider every possible strategy

¨ Need cost functions to assign cost to each operation


Measuring the cost

Many different dimensions over which cost could be measured: ¨ Access cost to secondary storage (disk), or I/O cost

– The (time/number) of disk accesses¨ Disk storage cost

– The size of intermediate files stored on disk¨ Computation cost (also known as CPU cost)

– The time to perform the operations on data in memory¨ Memory usage cost: how much memory is needed?¨ Communication cost: how much data goes over the network? For large databases, disk cost dominatesFor small databases in memory, CPU cost dominates


The DBMS catalog

¨ The catalog(ue) for a database stores metadata– Information about the tables, types of field within the tables– Stored as a database itself, and can be queries– Information for query optimization can also be stored in a catalog

¨ Statistics (typically) stored about each database file:– Number of records (tuples), r– (Average) record size, R– Number of file blocks, b– Blocking factor, bfr – File organization: unordered, ordered, hashed


The DBMS catalog

¨ Information stored about fields in each file: – Number of distinct values observed, d– (Average) attribute selectivity for an equality condition, sl

Estimated from number of distinct values May keep a histogram if this is very non-uniform

¨ Information stored about each index – Number of levels in each multilevel index, x– Number of attributes in first-level index blocks, bI1


Example Cost Estimates for SELECT

Some simple functions to estimate number of block transfers¨ S1: Linear search (brute force)

– Simply check every block, so cost C = b, number of blocks¨ S2: Binary search on sorted file

– Access approximately C = log2 b + s/bfr – 1 blocks– Search cost + number of blocks that match the criteria– Simplifies to log2 b if equality condition is on a key attribute

¨ S3: Use hash key or primary index to retrieve a record– C = x + 1 to search an x-level index– C = 1 if using hashing: just jump directly to disk block

C is a larger constant (2 or 3) if e.g. extendible hashing


Example cost functions for SELECT

¨ S4: use an ordering index to retrieve multiple records– If condition is >, , , < on a key field, assume half the records pass– So cost function is C = x + (b/2)– Better estimates from using histograms/samples of data

¨ S5: use clustering index to retrieve multiple records– First traverse the index to find the first matching record (x)– Then one disk block access for each matching record (s/bfr)– Total estimated cost is C = x + s / bfr

¨ S6: use secondary index (B+-tree)– For equality on a key attribute, cost is C = x + 1– For equality on non-key attributes, cost is C = x +1 + s– For >, , , < conditions, C = x + f(bI1 + r) – if fraction f records match


Example cost functions for SELECT

¨ S7 Conjunctive selection (is AND of multiple conditions)– Use any of the above methods for one condition– Check that the retrieved records match all the required conditions– No extra disk accesses for the checks so cost is of the initial method– If multiple indexes exist, can first take intersection of pointers


Example using cost functions

¨ Query optimizer finds possible strategies, estimates cost of each¨ Simple example for SELECT

– EMPLOYEE file has rE = 10,000 records in bE = 2000 disk blocks Blocking factor bfr = 5

– Clustering index on Salary has x = 3 levels and average selection cardinality s = 20

– Secondary index on key attribute SSN with x = 4 levels– Secondary index on nonkey attribute Dno with x = 2 levels

and bI1 = 4 first level bocks. 125 distinct values of Dno– Secondary index on sex with x = 1 levels, and d = 2 distinct values


SELECT example

¨ OP1: sSSN='123456789' (EMPLOYEE)– S1 Brute force: read on average 1000 blocks (half the file)– S6 use index: 4 blocks

¨ OP2: s DNUMBER>5(DEPARTMENT)– S6: use index on dno: 2 + (4/2) + 10000/2 = 5000

Assumes half records match, incur cost of reading each block Better to use brute force linear scan!

¨ OP3: s DNO=5(EMPLOYEE)– S6: use index on Dno, estimate 10000/125 = 80 records

Cost is 2 + 80 = 82¨ OP4: s DNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE)

– Select on Dno: 82; Select on salary: ~1000; select on sex: ~5000CS346 Advanced Databases56

EMPLOYEE: rE = 10,000 records in bE = 2000 disk blocks

Cluster index on Salary: x=3 levels, selection cardinality s=20Secondary index on key attribute SSN with x = 4 levelsSecondary index on nonkey attribute Dno with x = 2 levels

and bI1 = 4 first level bocks. 125 distinct values of Dno

Secondary index on sex with x=1 levels, d=2 distinct values

Join Selectivity

¨ To estimate cost of JOIN, need to know how many tuples result– Store as a selectivity ratio: size of join to size of cartesian product– Join selectivity js = |(R ⋈c S)| / |(R S)| = |(R ⋈c S)|/(|R|*|S|)

¨ Join selectivity js takes on values from 0 to 1– If no condition c, js = 1, result is just cartesian product– If no tuples meet the join condition, js = 0, result is empty

¨ Special case when c is an equality condition, R.A = S.B– If A is a key of R, then |(R ⋈c S )| can be at most |S| [Why?]

js 1/|R| If B is a foreign key of S then size of join is exactly |S|

– By symmetry, if B is a key of S, then js 1/|S|


JOIN estimation for equijoins

Estimate cost for |(R ⋈A=B S )| assuming an estimate for js¨ J1: Nested loop join

– Suppose R is used for outerloop, blocking factor for result is bfrRS

– Joint cost C = bR + (bR * bS) + (js * |R| * |S|/bfrRS)– Refines the earlier cost for nested-loop by including output size– With nB buffers, C = bR + (bR/(nB-s)*bS) + (js*|R|*|S|/bfrRS)

¨ J3: Sort-merge join– If files are already sorted, C = bR + bS + (js * |R| * |S|/bfrRS)– If not, add the cost of sorting (estimated as b log b)


JOIN estimation for equijoins

¨ J2: Single loop join with access path for matching records– Use index on B with x levels– sB denotes the average number of tuples matching in S– Secondary index: C = bR + (|R| * (x + 1 + sB) + (js*|R|*|S|/bfrRS))– Cluster index: C = bR + (|R| * (x + 1 + sB/bfrB) + (js*|R|*|S|/bfrRS))

Cluster index means all matching records are sequential – Primary index: C = bR + (|R| * (x + 1) + (js*|R|*|S|/bfrRS))

Primary index implies can only be one matching record– Hash key on B of S: C = bR + (|R| * h) + (js*|R|*|S|/bfrRS))

h is average number of reads to find a record, should be 1 or 2


JOIN example

Size of result is (125 * 10,000 )/(125 * 4) = 2500 with bfrED = 4

1. J1 Nested loop join with EMPLOYEE as outer loop– C = bE + (bE * bD) + [output] = 2000 + 2000*13 + 2500 = 30,500

2. J1 Nested loop join with DEPARTMENT as outer loop– C = bD + (bE * bD) + [output] = 13 + 2000*13 + 2500 = 28,513

3. J2 Single loop join with EMPLOYEE as outer loop– C = bE + (rE * (x + 1)) + [output] = 2000 + (10,000*2) + 2500 = 24,500

4. J2 Single loop join with DEPARTMENT as outer loop– C = bD + (rD * (x + s)) + [output] = 13 + (125 * (2+80)) = 2500 = 12763

¨ Can do better: DEPARTMENT is small, so keep in memory (case 2)CS346 Advanced Databases60

EMPLOYEE file: rE = 10,000 records in bE = 2000 blocksSecondary index on nonkey attribute Dno:

x = 2 levels and bI1 = 4 first level bocks. 125 distinct values of Dno

DEPARTMENT file: rD = 125 records in bD = 13 blocksPrimary index on Dnumber with x = 1 levels

OP6: EMPLOYEE ⋈DNO=DNUMBER DEPARTMENT

Assume js for OP6 is 1/125 (number of depts)

JOIN ordering

¨ Commutativity & associativity of join give many equivalent trees– Exponential growth with number of joins– Not feasible to estimate all costs

¨ Restrict to a subset of possible query trees– Left-deep tree is one where every right child is a base relation


Joins and trees

¨ With left-deep trees, right child is always the “inner”relation– When doing a nested loop join, or single loop join

¨ Left-deep trees are amenable to pipelining– Standard structure is easy to generate code for

¨ Base relation on the right means access path can be used (if present)


Query optimization example

Starting point: result of heuristic query optimization


SELECT P.Number, P.Dnum,E.Lname, E.Address, E.Bdate

FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E

WHERE P.Dnum=D.Dnumber AND D.Mgrssn=E.Ssn AND P.Plocation=‘Stafford’;

Query optimization example

Several possible join orders:– (PROJECT DEPARTMENT) EMPLOYEE⋈ ⋈– (DEPARTMENT PROJECT) EMPLOYEE⋈ ⋈– (DEPARTMENT EMPLOYEE) PROJECT⋈ ⋈– (EMPLOYEE DEPARTMENT) PROJECT⋈ ⋈

¨ Consider first ordering, (PROJECT DEPARTMENT) EMPLOYEE⋈ ⋈– Compare using index to PROJECT table to linear scan– Estimate number of matching records with location Stafford– Use of index requires ~100 block accesses, so better than scan– Estimate join sizes to get size of temporary files– Combine with cost of second join to get total estimate (+32 blocks)

¨ Repeat for other join orders. See textbook for more details


Query Optimization in Oracle

¨ Oracle: major example of a commercial DBMS– Largest market share, beating IBM, Microsoft, SAP and teradata

¨ Used to use a rule-based approach– Used a list of 15 possible access path types, from best to worst

¨ Current approach: cost-based optimization– Cost aims to be proportional to total query run time– Combination of costs from I/O, CPU and memory

¨ Allows developers to include “hints” in the SQL queries– Tell the optimizer to use a certain join method or access path

¨ Example of hint: consider an attribute with few distinct values– Simple assumption would be that all are equally likely– But if a value in the query is rare, tell (hint) optimizer to use index


Summary


¨ Queries follow simple query patterns: block (nested) queries ¨ Make use of external sorting to help query evaluation¨ There are many techniques for different query operators:

– Many different choices for each of SELECT, JOIN, PROJECT¨ Query optimization: choosing between different query plans

– Heuristic (rule-based) versus cost-based query optimization– Used in real-world databases (Oracle, MySQL, MS SQLServer…)– SQL ‘EXPLAIN’ keyword gives details of the query plan chosen

Chapter: “Algorithms for Query Processing and Optimization” in Elmasri and Navathe

cs346: advanced databases graham cormode [email protected] query planning and optimization

Documents