Transcript
Page 1: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST

L05: Query Processing & Tuning Queries

Query Processing -- Principles Tuning Queries

Page 2: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 2

Query Processing & Optimization

Any high-level query (SQL) on a database must be processed, optimized and executed by the DBMS

The high-level query is scanned, and parsed to check for syntactic correctness

An internal representation of a query is created, which is either a query tree or a query graph

The DBMS then devises an execution strategy for retrieving the result of the query. (An execution strategy is a plan for executing the query by accessing the data, and storing the intermediate results)

The process of choosing one out of the many execution strategies is known as query optimization

Page 3: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 3

Query Processor & Query Optimizer

A query processor is a module in the DBMS that performs the tasks to process, to optimize, and to generate execution strategy for a high-level query

Queries expressed in SQL can have multiple equivalent relational algebra query expressions. The query optimizer must select the optimal one

Page 4: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 4

Database Query Processing

Quer y

Quer yResul t s

Par ser

QuerOpt i mi zor

CodeGener at i on

Runt i meSuppor t

Dat abase

St at i st i cs

Dat abase

St or age

I nternalRepresentat i on

QueryExecut i on Pl an

Execut abl eCode

Page 5: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 5

Relational Operations

We will consider how to implement: Selection : Selects a subset of rows from relation. Projection : Deletes unwanted columns from

relation. Join : Allows us to combine two relations.

Since each operation returns a relation, opreations can be composed!

Page 6: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 6

Simple Selections

Of the form Size of result approximated as size of R * reduction

factor; we will consider how to estimate reduction factors later.

With no index, unsorted: Must essentially scan the whole relation; cost is |R| (# pages in R).

With an index on selection attribute: Use index to find qualifying data entries, then retrieve corresponding data records. (Hash index useful only for equality selections.)

SELECT *FROM Reserves RWHERE R.rname < ‘C%’

R attr valueop R. ( )

Page 7: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 7

Using an Index for Selections

Cost depends on #qualifying tuples, and clustering. Cost of finding qualifying data entries (typically small)

plus cost of retrieving records (could be large w/o clustering).

In example, assuming uniform distribution of names, about 10% of tuples qualify (100 pages, 10000 tuples). With a clustered index, cost is little more than 100 I/Os; if unclustered, upto 10000 I/Os!

Important refinement for unclustered indexes: 1. Find qualifying data entries. 2. Sort the rid’s of the data records to be retrieved. 3. Fetch rids in order. This ensures that each data page is

looked at just once (though # of such pages likely to be higher than with clustering).

Page 8: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 8

Selection - Summary

R.attr op value (R) Size of result = R * selectivity Processing methods

Table scan Index scan

• B+-tree, Hash index• Clustered vs. non-clustered

– Clustered index: Good– Non-clustered index:

• Good for low selectivity• Worse than scan for high selectivity

SELECT *FROM Reserves RWHERE R.rname < ‘C%’

Page 9: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 9

General Selection Conditions

Such selection conditions are first converted to conjunctive normal form (CNF):

(day<8/9/94 OR bid=5 OR sid=3 ) AND (rname=‘Paul’ OR bid=5 OR sid=3)

We only discuss the case with no ORs (a conjunction of terms of the form attr op value).

An index matches (a conjunction of) terms that involve only attributes in a prefix of the search key. Index on <a, b, c> matches a=5 AND b= 3, but not b=3.

(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3

Page 10: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 10

Two Approaches to General Selections

First approach: Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don’t match the index: Most selective access path: An index or file scan that we

estimate will require the fewest page I/Os. Terms that match this index reduce the number of tuples

retrieved; other terms are used to discard some retrieved tuples, but do not affect number of tuples/pages fetched.

Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used; then, bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day < 8/9/94 must then be checked.

Page 11: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 11

Intersection of Rids

Second approach (if we have 2 or more matching indexes that use Alternatives (2) or (3) for data entries): Get sets of rids of data records using each matching index. Then intersect these sets of rids (we’ll discuss intersection

soon!) Retrieve the records and apply any remaining terms. Consider day<8/9/94 AND bid=5 AND sid=3. If we have

a B+ tree index on day and an index on sid, both using Alternative (2), we can retrieve rids of records satisfying day<8/9/94 using the first, rids of recs satisfying sid=3 using the second, intersect, retrieve records and check bid=5.

Page 12: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 12

The Projection Operation

An approach based on sorting: Modify Pass 0 of external sort to eliminate unwanted fields.

Thus, runs of about 2B pages are produced, but tuples in runs are smaller than input tuples. (Size ratio depends on # and size of fields that are dropped.)

Modify merging passes to eliminate duplicates. Thus, number of result tuples smaller than input. (Difference depends on # of duplicates.)

Cost: In Pass 0, read original relation (size M), write out same number of smaller tuples. In merging passes, fewer tuples written out in each pass. Using Reserves example, 1000 input pages reduced to 250 in Pass 0 if size ratio is 0.25

SELECT DISTINCT R.sid, R.bidFROM Reserves R

Page 13: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 13

Projection Based on Hashing

Partitioning phase: Read R using one input buffer. For each tuple, discard unwanted fields, apply hash function h1 to choose one of B-1 output buffers. Result is B-1 partitions (of tuples with no unwanted fields).

2 tuples from different partitions guaranteed to be distinct. Duplicate elimination phase: For each partition, read it and

build an in-memory hash table, using hash fn h2 (<> h1) on all fields, while discarding duplicates. If partition does not fit in memory, can apply hash-based

projection algorithm recursively to this partition. Cost: For partitioning, read R, write out each tuple, but with

fewer fields. This is read in next phase.

Page 14: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 14

Discussion of Projection

Sort-based approach is the standard; better handling of skew and result is sorted.

If an index on the relation contains all wanted attributes in its search key, can do index-only scan. Apply projection techniques to data entries (much

smaller!) If an ordered (i.e., tree) index contains all wanted attributes as

prefix of search key, can do even better: Retrieve data entries in order (index-only scan), discard

unwanted fields, compare adjacent tuples to check for duplicates.

Page 15: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 15

Equality Joins With One Join Column

In algebra: R S. Common! Must be carefully optimized.

R S is large; so, R S followed by a selection is inefficient.

Cost metric: # of I/Os. We will ignore output costs.

SELECT *FROM Reserves R1, Sailors S1WHERE R1.sid=S1.sid

Page 16: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 16

Notations

|R| = number of pages in outer table R ||R|| = number of tuples in outer table R |S| = number of pages in inner table S ||S|| = number of tuples in inner table S M = number of main memory pages allocated

Page 17: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 17

Example of Join

sid sname rating age22 dustin 7 45.028 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day rname

31 101 10/11/96 lubber58 103 11/12/96 dustin

sid sname rating age bid day rname

31 lubber 8 55.5 101 10/11/96 lubber58 rusty 10 35.0 103 11/12/96 dustin

SELECT *FROM Sailors R, Reserve SWHERE R.sid=S.sid

Page 18: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 18

Schema for Examples

Similar to old schema; rname added for variations. Reserves:

Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.

Sailors: Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

Sailors (sid: integer, sname: string, rating: integer, age: real)Reserves (sid: integer, bid: integer, day: dates, rname: string)

Page 19: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 19

Simple Nested Loops Join

For each tuple in the outer relation R, we scan the entire inner relation S. Cost: |R| + ||R|| * |S| = 1000 + 100*1000*500 I/Os.

Page-oriented Nested Loops join: For each page of R, get each page of S, and write out matching pairs of tuples <r, s>, where r is in R-page and S is in S-page. Cost: |R| + |R|*|S| = 1000 + 1000*500 If smaller relation (S) is outer, cost = 500 + 500*1000

foreach tuple r in R doforeach tuple s in S do

if ri == sj then add <r, s> to result

Page 20: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 20

Block Nested Loops Join

InputBuffer

Hash Table for R

(M-2) pages

S

R

OutputBuffer

h( )Read in block by

block

Read in page by

page

R S

Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold ``block’’ of outer R. For each matching

tuple r in R-block, s in S-page, add <r, s> to result.

Then read next R-block, scan S, etc.

Page 21: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 21

Block Nested Loop Join

Scan inner table S per block of (M – 2) pages of R tuples Each scan costs |S| pages |R| / (M – 2) blocks of R tuples

|R| pages for outer table R Total cost = |R| + |R| / (M – 2) * |S| pages R should be the smaller table

Page 22: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 22

Examples of Block Nested Loops

Cost: Scan of outer + #outer blocks * scan of inner #outer blocks =no of pages of outer / blocksize

With Reserves (R) as outer, and 100 pages of R: Cost of scanning R is 1000 I/Os; a total of 10 blocks. Per block of R, we scan Sailors (S); 10*500 I/Os. If space for just 90 pages of R, we would scan S 12 times.

With 100-page block of Sailors as outer: Cost of scanning S is 500 I/Os; a total of 5 blocks. Per block of S, we scan Reserves; 5*1000 I/Os.

With sequential reads considered, analysis changes: may be best to divide buffers evenly between R and S.

Page 23: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 23

Index Nested Loops Join

InputBuffer

S

R

OutputBuffer

Read in page by

page

R S

Use one page as an input buffer for scanning the outer R, one page as the output buffer.

Use all remaining pages to hold index of inner S. For each tuple r in

R- page, probe index of S to find matches, s, add <r, s> to result.

Then read next R-page, repeat.

Index for S

Page 24: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 24

Index Nested Loop Join

Probe S index for matching S tuples per R tuple Probe hash index: 1.2 I/Os Probe B+ tree: 2-4 I/Os, plus retrieve matching S

tuples: 1 I/O For ||R|| tuples

|R| pages for outer table R Total cost = |R| + ||R|| * index retrieval Better than Block NL join only for small number of

R tuples

Page 25: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 25

Examples of Index Nested Loops

Hash-index (Alt. 2) on sid of Sailors (as inner): Scan Reserves: 1000 page I/Os, 100*1000 tuples. For each Reserves tuple: 1.2 I/Os to get data entry in

index, plus 1 I/O to get (the exactly one) matching Sailors tuple. Total: 220,000 I/Os.

Hash-index (Alt. 2) on sid of Reserves (as inner): Scan Sailors: 500 page I/Os, 80*500 tuples. For each Sailors tuple: 1.2 I/Os to find index page with

data entries, plus cost of retrieving matching Reserves tuples. Assuming uniform distribution, 2.5 reservations per sailor (100,000 / 40,000). Cost of retrieving them is 1 or 2.5 I/Os depending on whether the index is clustered.

Page 26: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 26

Sort-Merge Join

Sort R and S on the join column Scan sorted R and S to do a ``merge’’ (on join col.), and

output result tuples. Advance scan of R until current R-tuple >= current S

tuple, then advance scan of S until current S-tuple >= current R tuple; do this until current R tuple = current S tuple.

At this point, all R tuples with same value in Ri (current R group) and all S tuples with same value in Sj (current S group) match; output <r, s> for all pairs of such tuples.

Then resume scanning R and S. R is scanned once; each S group is scanned once per

matching R tuple. (Multiple scans of an S group are likely to find needed pages in buffer.)

Page 27: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 27

External Sort ( K-Way Merge Sort)

43 26 49 78 65 13 2

3 264 74 9 8 31 65 2

2 1 33 44 65 7 698 2

21 32 43 54 66 87 9

I n p u t D ata

S o r tin g

F ir s t m er g e

S ec o n d m er g e

S o r ted r u n s ( L = 3 )

S o r ted r u n s ( L = 9 )

S o r ted d a ta

Page 28: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 28

Sort Merge Join

External-sort R: 2 |R| * (logM-1 |R|/M + 1) Split R into |R|/M sorted runs each of size M: 2 |R| Merge up to (M – 1) runs repeatedly logM-1 |R|/M passes, each costing 2 |R|

External-sort S: 2 |S| * (logM-1 |S|/M + 1) Merge matching tuples from sorted R and S: |R| + |S| Total cost = 2 |R| * (logM-1 |R|/M + 1) + 2 |S| * (logM-

1 |S|/M + 1) + |R| + |S| If |R| < M*(M-1), cost = 5 * (|R| + |S|)

Page 29: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 29

Example of Sort-Merge Join

Cost of merge is M+N, could be M*N (very unlikely!) With 35, 100 or 300 buffer pages, both Reserves and

Sailors can be sorted in 2 passes; total join cost: 7500.

sid sname rating age22 dustin 7 45.028 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day rname

28 103 12/4/96 guppy28 103 11/3/96 yuppy31 101 10/10/96 dustin31 102 10/12/96 lubber31 101 10/11/96 lubber58 103 11/12/96 dustin

(BNL cost: 2500 to 15000 I/Os)

Page 30: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 30

Hash Join – The Principle

S

R

Si

Ri

1

n

i

i iR S R S

Ri Sj If

for i j

Page 31: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 31

Hash-Join

Partition Phase: Partition both relations using hash fn h1: R tuples in partition i will only match S tuples in partition i.

Joining Phase: Read in a partition of R, hash it using h2 (<> h1!). Scan matching partition of S, search for matches.

Partitionsof R & S

Input bufferfor Si

Hash table for partitionRi (k < M -1 pages)

M main memory buffersDisk

Output buffer

Disk

Join Result

hashfnh2

h2

M main memory buffers DiskDisk

Original Relation OUTPUT

2INPUT

1

hashfunction

h1 M-1

Partitions

1

2

M-1

. . .

Page 32: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 32

Hash Join Algorithm

Partition phase: 2 (|R| + |S|) Partition table R using hash function h1: 2 |R| Partition table S using hash function h1: 2 |S| R tuples in partition i will match only S tuples in partition i R (M – 1) partitions, each of size |R| / (M – 1)

Join phase: |R| + |S| Read in a partition of R (|R| / (M – 1) < M -1) Hash it using function h2 (<> h1!) Scan corresponding S partition, search for matches

Total cost = 3 (|R| + |S|) pages Condition: M > √f|R|, f ≈ 1.2 to account for hash table

Page 33: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 33

General Join Conditions

Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname): For Index NL, build index on <sid, sname> (if S is inner);

or use existing indexes on sid or sname. For Sort-Merge and Hash Join, sort/partition on

combination of the two join columns. Inequality conditions (e.g., R.rname < S.sname):

For Index NL, need (clustered!) B+ tree index.• Range probes on inner; # matches likely to be much higher than

for equality joins. Hash Join, Sort Merge Join not applicable. Block NL quite likely to be the best join method here.

Page 34: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 34

Impact of Buffering

If several operations are executing concurrently, estimating the number of available buffer pages is guesswork.

Repeated access patterns interact with buffer replacement policy. e.g., Inner relation is scanned repeatedly in Simple Nested

Loop Join. With enough buffer pages to hold inner, replacement policy does not matter. Otherwise, MRU is best, LRU is worst (sequential flooding).

Does replacement policy matter for Block Nested Loops? What about Index Nested Loops? Sort-Merge Join?

Page 35: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 35

Summary of Join Operator

Simple nested loop: |R| + ||R|| * |S| Block nested loop: |R| + |R| / (M – 2) * |S| Index nested loop: |R| + ||R|| * index retrieval Sort-merge: 2 |R| * (logM-1 |R|/M + 1) + 2 |S| *

(logM-1 |S|/M + 1) + |R| + |S|

Hash Join: 3 * (|R| + |S|) Condition: M > √f|R|

Page 36: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 36

Overview of Query Processing

Parser QueryOptimizer

Statistics Cost Model

QEPParsed Query

Database

High Level Query Query Result

QueryEvaluator

Page 37: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 37

Query Optimization

Given: An SQL query joining n tables Dream: Map to most efficient plan Reality: Avoid rotten plans State of the art:

Most optimizers follow System R’s technique Works fine up to about 10 joins

SELECT S.snameFROM Reserves R, Sailors SWHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

Page 38: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 38

Complexity of Query Optimization

Many degrees of freedom Selection: scan versus

(clustered, non-clustered) index

Join: block nested loop, sort-merge, hash

Relative order of the operators

Exponential search space!

Heuristics Push the selections

down Push the projections

down Delay Cartesian

products System R: Only left-

deep trees

BA

C

D

Page 39: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 39

Selection: - cascade

- commutative

Projection: - cascade

Join: - associative

- commutative

Equivalences in Relational Algebra

c cn c cnR R1 1 ... . . .

c c c cR R1 2 2 1

a a anR R1 1 . . .

R (S T) (R S) T

(R S) (S R)

Page 40: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 40

Equivalences in Relational Algebra

A projection commutes with a selection that only uses attributes retained by the projection

Selection between attributes of the two arguments of a cross-product converts cross-product to a join

A selection on just attributes of R commutes with join R S (i.e., (R S) (R) S )

Similarly, if a projection follows a join R S, we can `push’ it by retaining only attributes of R (and S) that are needed for the join or are kept by the projection

Page 41: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 41

System R Optimizer

1. Find all plans for accessing each base table2. For each table

• Save cheapest unordered plan• Save cheapest plan for each interesting order• Discard all others

3. Try all ways of joining pairs of 1-table plans; save cheapest unordered + interesting ordered plans

4. Try all ways of joining 2-table with 1-table5. Combine k-table with 1-table till you have full plan tree6. At the top, to satisfy GROUP BY and ORDER BY

• Use interesting ordered plan• Add a sort node to unordered plan

Page 42: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 42

Source: Selinger et al, “Access Path Selection in a Relational Database Management System”

Page 43: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 43

Note: Only branches for NL join are shown here. Additional branches for other join methods (e.g. sort-merge) are not shown.

Source: Selinger et al, “Access Path Selection in a Relational Database Management System”

Page 44: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 44

What is “Cheapest”?

Need information about the relations and indexes involved Catalogs typically contain at least:

# tuples (NTuples) and # pages (NPages) for each relation. # distinct key values (NKeys) and NPages for each index. Index height, low/high key values (Low/High) for each tree

index. Catalogs updated periodically.

Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.

More detailed information (e.g., histograms of the values in some field) are sometimes stored.

Page 45: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 45

Estimating Result Size

Consider a query block:

Maximum # tuples in result is the product of the cardinalities of relations in the FROM clause.

Reduction factor (RF) associated with each termi reflects the impact of the term in reducing result size Term col=value has RF 1/NKeys(I) Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2)) Term col>value has RF (High(I)-value)/(High(I)-Low(I))

Result cardinality = Max # tuples * product of all RF’s. Implicit assumption that terms are independent!

SELECT attribute listFROM relation listWHERE term1 AND ... AND termk

Page 46: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 46

Cost Estimates for Single-Table Plans

Index I on primary key matches selection: Cost is Height(I)+1 for a B+ tree, about 1.2 for hash

index. Clustered index I matching one or more selects:

(NPages(I)+NPages(R)) * product of RF’s of matching selects.

Non-clustered index I matching one or more selects: (NPages(I)+NTuples(R)) * product of RF’s of matching

selects. Sequential scan of file:

NPages(R).

Note: Typically, no duplicate elimination on projections! (Exception: Done on answers if user says DISTINCT.)

Page 47: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 47

Counting the Costs

With 5 buffers, cost of plan: Scan Reserves (1000) + write temp

T1 (10 pages, if we have 100 boats, uniform distribution)

Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).

Sort T1 (2*10*2), sort T2 (2*250*4), merge (10+250), total=2300

Total: 4060 page I/Os If we used BNL join, join cost =

10+4*250, total cost = 2770 If we ‘push’ projections, T1 has only

sid, T2 only sid and sname: T1 fits in 3 pages, cost of BNL

drops to under 250 pages, total < 2000

Reserves Sailors

sid=sid

bid=100

sname(On-the-fly)

rating > 5(Scan;write to temp T1)

(Scan;write totemp T2)

(Sort-Merge Join)

SELECT S.snameFROM Reserves R, Sailors SWHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

Page 48: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 48

Exercise Reserves: 100,000 tuples, 100

tuples per page With clustered index on bid of

Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages

Join column sid is a key for Sailors - at most one matching tuple

Decision not to push rating>5 before the join is based on availability of sid index on Sailors

Cost: Selection of Reserves tuples (10 I/Os); for each tuple, must get matching Sailors tuple (1000*1.2); total 1210 I/Os

Reserves

Sailors

sid=sid

bid=100

sname(On-the-fly)

rating > 5

(Use clustered index on sid)

(Index Nested Loops,with pipelining )

(On-the-fly)

(Use hashIndex on sid)

Page 49: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST

L05: Query Processing & Tuning Queries

Query Processing – Principles

Tuning Queries

Page 50: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 50

Avoid Redundant DISTINCT

DISTINCT usually entails a sort operation Slow down query optimization because one more

“interesting” order to consider Remove if you know the result has no duplicates

SELECT DISTINCT ssnumFROM EmployeeWHERE dept = ‘information systems’

Page 51: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 51

Change Nested Queries to Join

Might not use index on Employee.dept

Need DISTINCT if an employee might belong to multiple departments

SELECT ssnumFROM EmployeeWHERE dept IN (SELECT dept FROM Techdept)

SELECT ssnumFROM Employee, TechdeptWHERE Employee.dept = Techdept.dept

Page 52: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 52

Avoid Unnecessary Temp Tables

Creating temp table causes update to catalog Cannot use any index on original table

SELECT * INTO TempFROM EmployeeWHERE salary > 40000

SELECT ssnumFROM TempWHERE Temp.dept = ‘information systems’

SELECT ssnumFROM EmployeeWHERE Employee.dept = ‘information systems’AND salary > 40000

Page 53: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 53

Avoid Complicated Correlation Subqueries

Search all of e2 for each e1 record!

SELECT ssnumFROM Employee e1WHERE salary = (SELECT MAX(salary) FROM Employee e2 WHERE e2.dept = e1.dept

SELECT MAX(salary) as bigsalary, dept INTO TempFROM EmployeeGROUP BY dept

SELECT ssnumFROM Employee, TempWHERE salary = bigsalaryAND Employee.dept = Temp.dept

Page 54: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 54

Avoid Complicated Correlation Subqueries

SQL Server 2000 does a good job at handling the correlated subqueries (a hash join is used as opposed to a nested loop between query blocks) The techniques

implemented in SQL Server 2000 are described in “Orthogonal Optimization of Subqueries and Aggregates” by C.Galindo-Legaria and M.Joshi, SIGMOD 2001.-10

0

10

20

30

40

50

60

70

80

correlated subquery

Th

rou

gh

pu

t im

pro

vem

ent p

erce

nt

SQLServer 2000

Oracle 8i

DB2 V7.1

> 10000> 1000

Page 55: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 55

Join on Clustering and Integer Attributes

Employee is clustered on ssnum ssnum is an integer

SELECT Employee.ssnumFROM Employee, StudentWHERE Employee.name = Student.name

SELECT Employee.ssnumFROM Employee, StudentWHERE Employee.ssnum = Student.ssnum

Page 56: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 56

Avoid HAVING when WHERE is enough

May first perform grouping for all departments!

SELECT AVG(salary) as avgsalary, deptFROM EmployeeGROUP BY deptHAVING dept = ‘information systems’

SELECT AVG(salary) as avgsalaryFROM EmployeeWHERE dept = ‘information systems’GROUP BY dept

Page 57: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 57

Avoid Views with unnecessary Joins

Join with Techdept unnecessarily

CREATE VIEW TechlocationAS SELECT ssnum, Techdept.dept, locationFROM Employee, TechdeptWHERE Employee.dept = Techdept.dept

SELECT deptFROM TechlocationWHERE ssnum = 4444

SELECT deptFROM EmployeeWHERE ssnum = 4444

Page 58: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 58

Aggregate Maintenance

Materialize an aggregate if needed “frequently” Use trigger to update

create trigger updateVendorOutstanding on orders for insert asupdate vendorOutstandingset amount =

(select vendorOutstanding.amount+sum(inserted.quantity*item.price)from inserted,itemwhere inserted.itemnum = item.itemnum)

where vendor = (select vendor from inserted) ;

Page 59: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 59

Avoid External Loops

No loop:sqlStmt = “select * from lineitem where l_partkey <=

200;”odbc->prepareStmt(sqlStmt);odbc->execPrepared(sqlStmt);

Loop:sqlStmt = “select * from lineitem where l_partkey = ?;”odbc->prepareStmt(sqlStmt);for (int i=1; i<200; i++){

odbc->bindParameter(1, SQL_INTEGER, i);odbc->execPrepared(sqlStmt);

}

Page 60: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 60

Avoid External Loops

SQL Server 2000 on Windows 2000

Crossing the application interface has a significant impact on performance

0

100

200

300

400

500

600

loop no loop

thro

ug

hp

ut

(re

co

rds

/se

c)

Let the DBMS optimizeset operations

Page 61: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 61

Avoid Cursors

No cursorselect * from employees;

CursorDECLARE d_cursor CURSOR FOR select * from employees;OPEN d_cursorwhile (@@FETCH_STATUS = 0)BEGIN

FETCH NEXT from d_cursorENDCLOSE d_cursorgo

Page 62: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 62

Avoid Cursors

SQL Server 2000 on Windows 2000

Response time is a few seconds with a SQL query and more than an hour iterating over a cursor

0

1000

2000

3000

4000

5000

cursor SQL

Th

rou

gh

pu

t (r

eco

rds/

sec)

Page 63: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 63

Retrieve Needed Columns Only

AllSelect * from lineitem;

Covered subsetSelect l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate from lineitem;

Avoid transferring unnecessary data

May enable use of a covering index.

0

0.25

0.5

0.75

1

1.25

1.5

1.75

no index index

Th

rou

gh

pu

t (q

uer

ies/

mse

c)

all

covered subset

Page 64: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 64

Use Direct Path for Bulk Loading

sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl

load data infile "lineitem.tbl"into table LINEITEM appendfields terminated by '|' (

L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD", L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT

)

Page 65: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 65

Use Direct Path for Bulk Loading

Direct path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than for conventional bulk load (commit every 100 records) and inserts (commit for each record).

650

10000

20000

30000

40000

50000

conventional direct path insert

Th

rou

gh

pu

t (r

ec

/se

c)

Page 66: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 66

Some Idiosyncrasies

OR may stop the index being used break the query and use UNION

Order of tables may affect join implementation

Page 67: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 67

Query Tuning Principles

Avoid redundant DISTINCT Change nested queries to join Avoid unnecessary temp tables Avoid complicated correlation subqueries Join on clustering and integer attributes Avoid HAVING when WHERE is enough Avoid views with unnecessary joins Maintain frequently used aggregates Avoid external loops

Page 68: H.Lu/HKUST L05: Query Processing & Tuning Queries  Query Processing -- Principles  Tuning Queries

H.Lu/HKUST L04: Physical Database Design (2) -- 68

Query Tuning Principles

Avoid cursors Retrieve needed columns only Use direct path for bulk loading


Top Related