![Page 1: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/1.jpg)
Principles of Query Processing
![Page 2: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/2.jpg)
Roles:
• Application Programmer (e.g., business analyst, data architect)
• Sophisticated Application Programmer (e.g., SAP admin)
• DBA, Tuner
System layers, top to bottom:
• Application
• Query Processor
• Indexes, Storage Subsystem
• Concurrency Control, Recovery
• Operating System
• Hardware [Processor(s), Disk(s), Memory]
![Page 3: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/3.jpg)
Overview of Query Processing
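High Level Query → Parser → Parsed Query → Query Optimizer → QEP → Query Evaluator → Query Result
The Query Optimizer consults the Statistics and the Cost Model; the Query Evaluator runs the QEP (query execution plan) against the Database.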
![Page 4: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/4.jpg)
Outline
• Processing relational operators
• Query optimization
• Performance tuning
![Page 5: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/5.jpg)
Projection Operator
π R.attrib, ... (R)
• Implementation is straightforward
SELECT bid
FROM Reserves R
WHERE R.rname < 'C%'
![Page 6: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/6.jpg)
Selection Operator
σ R.attr op value (R)
• Size of result = R * selectivity
• Scan
• Clustered index: Good
• Non-clustered index:
– Good for low selectivity
– Worse than scan for high selectivity
SELECT *
FROM Reserves R
WHERE R.rname < 'C%'
![Page 7: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/7.jpg)
Example of Join
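Sailors:

| sid | sname | rating | age |
|---|---|---|---|
| 22 | dustin | 7 | 45.0 |
| 28 | yuppy | 9 | 35.0 |
| 31 | lubber | 8 | 55.5 |
| 44 | guppy | 5 | 35.0 |
| 58 | rusty | 10 | 35.0 |

Reserves:

| sid | bid | day | rname |
|---|---|---|---|
| 31 | 101 | 10/11/96 | lubber |
| 58 | 103 | 11/12/96 | dustin |

Join result:

| sid | sname | rating | age | bid | day | rname |
|---|---|---|---|---|---|---|
| 31 | lubber | 8 | 55.5 | 101 | 10/11/96 | lubber |
| 58 | rusty | 10 | 35.0 | 103 | 11/12/96 | dustin |

SELECT *
FROM Sailors R, Reserves S
WHERE R.sid=S.sid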
![Page 8: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/8.jpg)
Notations
• |R| = number of pages in outer table R
• ||R|| = number of tuples in outer table R
• |S| = number of pages in inner table S
• ||S|| = number of tuples in inner table S
• M = number of main memory pages allocated
![Page 9: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/9.jpg)
Simple Nested Loop Join
[Figure: for each of the ||R|| tuples of R, 1 scan of S; |S| pages per scan]
![Page 10: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/10.jpg)
Simple Nested Loop Join
• Scan inner table S per R tuple: ||R|| * |S|
– Each scan costs |S| pages
– For ||R|| tuples
• |R| pages for outer table R
• Total cost = |R| + ||R|| * |S| pages
• Not optimal!
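As a rough check of the formula above, the following sketch evaluates the simple nested loop cost. The table sizes are illustrative assumptions, not figures from the slides.

```python
def simple_nested_loop_cost(pages_r: int, tuples_r: int, pages_s: int) -> int:
    """Page I/Os: read R once, plus one full scan of S for every R tuple."""
    return pages_r + tuples_r * pages_s


# Hypothetical sizes: R has 1,000 pages holding 100,000 tuples; S has 500 pages.
cost = simple_nested_loop_cost(pages_r=1000, tuples_r=100_000, pages_s=500)
print(cost)  # 1000 + 100000 * 500 = 50001000
```

Even for these modest sizes the per-tuple rescans dominate, which is the sense in which the slide calls the method "not optimal".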
![Page 11: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/11.jpg)
Block Nested Loop Join
[Figure: R is read in blocks of M – 2 pages, giving |R| / (M – 2) blocks; 1 scan of S per R block; |S| pages per scan]
![Page 12: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/12.jpg)
Block Nested Loop Join
• Scan inner table S per block of (M – 2) pages of R tuples
– Each scan costs |S| pages
– |R| / (M – 2) blocks of R tuples
• |R| pages for outer table R
• Total cost = |R| + |R| / (M – 2) * |S| pages
• R should be the smaller table
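The block nested loop formula can be sketched as below. The sizes are made up for illustration, and rounding the number of blocks up to a whole number is an assumption added here (the slide writes the plain quotient).

```python
from math import ceil


def block_nested_loop_cost(pages_r: int, pages_s: int, m: int) -> int:
    """Page I/Os: read R once, plus one scan of S per block of (m - 2) R pages."""
    if m < 3:
        raise ValueError("need at least 3 buffer pages")
    blocks = ceil(pages_r / (m - 2))  # assumption: a partial last block still costs a scan
    return pages_r + blocks * pages_s


# Hypothetical tables of 500 and 1,000 pages with M = 52 buffer pages.
print(block_nested_loop_cost(pages_r=500, pages_s=1000, m=52))  # 500 + 10 * 1000 = 10500
print(block_nested_loop_cost(pages_r=1000, pages_s=500, m=52))  # 1000 + 20 * 500 = 11000
```

Swapping the two tables shows why the smaller one is preferred as the outer table R.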
![Page 13: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/13.jpg)
Index Nested Loop Join
[Figure: for each of the ||R|| tuples of R, 1 probe of the index on S]
![Page 14: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/14.jpg)
Index Nested Loop Join
• Probe S index for matching S tuples per R tuple
– Probe hash index: 1.2 I/Os
– Probe B+ tree: 2-4 I/Os, plus retrieve matching S tuples: 1 I/O
– For ||R|| tuples
• |R| pages for outer table R
• Total cost = |R| + ||R|| * index retrieval
• Better than Block NL join only for small number of R tuples
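A minimal sketch of the index nested loop formula, treating the per-tuple index retrieval cost as a parameter. The hash figure of 1.2 I/Os is from the slide; the outer table of 10 pages holding 1,000 tuples and the B+ tree figure of 3 probe I/Os plus 1 fetch are assumptions chosen for illustration.

```python
def index_nested_loop_cost(pages_r: int, tuples_r: int, index_retrieval_ios: float) -> float:
    """Page I/Os: read R once, plus one index retrieval per R tuple."""
    return pages_r + tuples_r * index_retrieval_ios


# Hypothetical outer table: 10 pages holding 1,000 tuples.
hash_cost = index_nested_loop_cost(10, 1000, 1.2)     # about 1210 I/Os
btree_cost = index_nested_loop_cost(10, 1000, 3 + 1)  # 4010 I/Os
print(hash_cost, btree_cost)
```

The cost grows linearly with the number of outer tuples, which is why the method pays off only when few R tuples reach the join.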
![Page 15: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/15.jpg)
Sort Merge Join
• External sort R
• External sort S
• Merge sorted R and sorted S
![Page 16: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/16.jpg)
External Sort R
[Figure: the split pass cuts R into sorted runs R0,1, …, R0,M-1, R0,M, …; merge pass 1 combines them with (M-1)-way merges into R1,1, R1,2, …, R1,M-1, …; merge pass 2 produces R2,1]
• Size of R0,i = M, # R0,i's = |R|/M
• # merge passes = $\log_{M-1}(|R|/M)$
• Cost per pass = |R| input + |R| output = 2 |R|
• Total cost = $2|R| \, (\log_{M-1}(|R|/M) + 1)$, including the split pass
![Page 17: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/17.jpg)
External Sorting
• A classic problem in computer science!
• Data requested in sorted order
– e.g., find students in increasing cap order
• Sorting is used in many applications
– First step in bulk loading operations.
– Sorting useful for eliminating duplicate copies in a collection of records (How?)
– Sort-merge join algorithm involves sorting.
• Problem: sort 1 GB of data with 1 MB of RAM.
![Page 18: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/18.jpg)
2-Way Sort: Requires 3 Buffers
• Pass 1: Read a page, sort it, write it.
– only one buffer page is used
• Pass 2, 3, …, etc.:
– three buffer pages used.
[Figure: Disk → main memory buffers (INPUT 1, INPUT 2, OUTPUT) → Disk]
![Page 19: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/19.jpg)
Two-Way External Merge Sort
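• Each pass we read + write each page in file.
• N pages in the file => the number of passes = $\lceil \log_2 N \rceil + 1$
• So total cost is: $2N \, (\lceil \log_2 N \rceil + 1)$
• Idea: Divide and conquer: sort subfiles and merge

[Figure: two-way external merge sort of a 7-page file, two values per page]

| Stage | Pages |
|---|---|
| Input file | 3,4 · 6,2 · 9,4 · 8,7 · 5,6 · 3,1 · 2 |
| PASS 0 (1-page runs) | 3,4 · 2,6 · 4,9 · 7,8 · 5,6 · 1,3 · 2 |
| PASS 1 (2-page runs) | [2,3 4,6] · [4,7 8,9] · [1,3 5,6] · [2] |
| PASS 2 (4-page runs) | [2,3 4,4 6,7 8,9] · [1,2 3,5 6] |
| PASS 3 (8-page runs) | [1,2 2,3 3,4 4,5 6,6 7,8 9] |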
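The pass structure of the example can be replayed in memory with a short sketch. It only models the runs and counts passes; a real external sort would stream pages through three buffers rather than hold whole runs in lists, and the function name is an assumption made for this illustration.

```python
from heapq import merge


def two_way_merge_sort(pages):
    """Sort a non-empty paged file; return (sorted values, number of passes)."""
    runs = [sorted(page) for page in pages]  # pass 0: sort each page on its own
    passes = 1
    while len(runs) > 1:
        # Each later pass merges adjacent pairs of runs; an odd run out is carried over.
        runs = [list(merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return runs[0], passes


# The 7-page input file from the slide, two values per page.
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
values, passes = two_way_merge_sort(pages)
print(values)  # [1, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9]
print(passes)  # 4
```

With N = 7 pages the run count goes 7, 4, 2, 1, so there are three merge passes after pass 0.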
![Page 20: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/20.jpg)
General External Merge Sort
More than 3 buffer pages. How can we utilize them?
• To sort a file with N pages using B buffer pages:
– Pass 0: use B buffer pages. Produce $\lceil N/B \rceil$ sorted runs of B pages each.
– Pass 1, 2, …, etc.: merge B-1 runs.
[Figure: Disk → B main memory buffers (INPUT 1, INPUT 2, …, INPUT B-1, OUTPUT) → Disk]
![Page 21: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/21.jpg)
Cost of External Merge Sort
• Number of passes: $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$
• Cost = 2N * (# of passes)
• E.g., with 5 buffer pages, to sort 108 page file:
– Pass 0: $\lceil 108/5 \rceil$ = 22 sorted runs of 5 pages each (last run is only 3 pages)
– Pass 1: $\lceil 22/4 \rceil$ = 6 sorted runs of 20 pages each (last run is only 8 pages)
– Pass 2: 2 sorted runs, 80 pages and 28 pages
– Pass 3: Sorted file of 108 pages
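The pass count and total cost for the 108-page example can be reproduced with a small sketch. It counts passes by repeated ceiling division instead of evaluating the logarithm, which avoids floating-point rounding; the function names are assumptions made for this illustration.

```python
def external_sort_passes(n_pages: int, b_buffers: int) -> int:
    """Passes of a general external merge sort: pass 0 plus (B-1)-way merge passes."""
    if b_buffers < 3:
        raise ValueError("need at least 3 buffer pages")
    runs = -(-n_pages // b_buffers)  # ceiling division: runs produced by pass 0
    passes = 1
    while runs > 1:
        runs = -(-runs // (b_buffers - 1))  # each merge pass combines B-1 runs
        passes += 1
    return passes


def external_sort_cost(n_pages: int, b_buffers: int) -> int:
    """Every pass reads and writes each page once: 2N I/Os per pass."""
    return 2 * n_pages * external_sort_passes(n_pages, b_buffers)


print(external_sort_passes(108, 5))  # 4
print(external_sort_cost(108, 5))    # 2 * 108 * 4 = 864
```

The run counts for this input are 22, 6, 2, 1, matching passes 0 through 3 on the slide.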
![Page 22: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/22.jpg)
Number of Passes of External Sort
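| N | B=3 | B=5 | B=9 | B=17 | B=129 | B=257 |
|---|---|---|---|---|---|---|
| 100 | 7 | 4 | 3 | 2 | 1 | 1 |
| 1,000 | 10 | 5 | 4 | 3 | 2 | 2 |
| 10,000 | 13 | 7 | 5 | 4 | 2 | 2 |
| 100,000 | 17 | 9 | 6 | 5 | 3 | 3 |
| 1,000,000 | 20 | 10 | 7 | 5 | 3 | 3 |
| 10,000,000 | 23 | 12 | 8 | 6 | 4 | 3 |
| 100,000,000 | 26 | 14 | 9 | 7 | 4 | 4 |
| 1,000,000,000 | 30 | 15 | 10 | 8 | 5 | 4 |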
![Page 23: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/23.jpg)
Sequential vs Random I/Os
• Transfer rate increases 40% per year; seek time and latency time decrease by only 8% per year
• Is minimizing passes optimal? Would merging as many runs as possible be the best solution?
• Suppose we have 80 runs, each 80 pages long and we have 81 pages of buffer space.
• We can merge all 80 runs in a single pass
– each page requires a seek to access (Why?)
– there are 80 pages per run, so 80 seeks per run
– total cost = 80 runs × 80 seeks = 6,400 seeks
![Page 24: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/24.jpg)
Sequential vs Random I/Os (Cont)
• We can merge all 80 runs in two steps
– 5 sets of 16 runs each
  • read 80/16 = 5 pages of one run at a time
  • 16 runs result in sorted run of 1280 pages
  • each merge requires (80/5) × 16 = 256 seeks
  • for 5 sets, we have 5 × 256 = 1280 seeks
– merge 5 runs of 1280 pages
  • read 80/5 = 16 pages of one run => 1280/16 = 80 seeks in total
  • 5 runs => 5 × 80 = 400 seeks
– total: 1280 + 400 = 1680 seeks!!!
• Number of passes increases, but number of seeks decreases!
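The seek arithmetic can be checked with a small sketch. It assumes, as the slides do, that 80 of the 81 buffer pages are split evenly among the input runs, that each refill of a run's buffer costs one seek, and that seeks for writing output are ignored. The function name is hypothetical.

```python
def merge_seeks(num_runs: int, run_pages: int, input_buffers: int) -> int:
    """Seeks needed to read num_runs runs of run_pages pages each in one merge."""
    pages_per_read = input_buffers // num_runs        # pages fetched per seek
    reads_per_run = -(-run_pages // pages_per_read)   # ceiling division
    return num_runs * reads_per_run


single_pass = merge_seeks(80, 80, 80)     # 80 runs, 1 page per read -> 6400 seeks
step_one = 5 * merge_seeks(16, 80, 80)    # 5 sets of 16 runs -> 5 * 256 = 1280 seeks
step_two = merge_seeks(5, 16 * 80, 80)    # 5 runs of 1280 pages -> 400 seeks
print(single_pass, step_one + step_two)   # 6400 1680
```

Merging fewer runs at a time leaves more buffer pages per run, so each seek is amortized over a longer sequential read.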
![Page 25: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/25.jpg)
Sort Merge Join
• External-sort R: $2|R| \, (\log_{M-1}(|R|/M) + 1)$
– Split R into |R|/M sorted runs each of size M: 2 |R|
– Merge up to (M – 1) runs repeatedly
– $\log_{M-1}(|R|/M)$ passes, each costing 2 |R|
• External-sort S: $2|S| \, (\log_{M-1}(|S|/M) + 1)$
• Merge matching tuples from sorted R and S: |R| + |S|
• Total cost = $2|R| \, (\log_{M-1}(|R|/M) + 1) + 2|S| \, (\log_{M-1}(|S|/M) + 1) + |R| + |S|$
– If |R| < M*(M-1), cost = 5 * (|R| + |S|)
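A sketch of the sort-merge join cost, using whole merge passes (counted by ceiling division) in place of the logarithm. The table sizes and the buffer count M = 35 are assumptions chosen so that both tables sort with a single merge pass.

```python
def sort_cost(pages: int, m: int) -> int:
    """I/Os to externally sort a table: split pass plus (m-1)-way merge passes."""
    runs = -(-pages // m)  # sorted runs of m pages after the split pass
    merge_passes = 0
    while runs > 1:
        runs = -(-runs // (m - 1))
        merge_passes += 1
    return 2 * pages * (merge_passes + 1)


def sort_merge_join_cost(pages_r: int, pages_s: int, m: int) -> int:
    """Sort both inputs, then read each sorted table once for the final merge."""
    return sort_cost(pages_r, m) + sort_cost(pages_s, m) + pages_r + pages_s


# Hypothetical sizes: |R| = 1000, |S| = 500, M = 35, so |R| < M * (M - 1) = 1190.
print(sort_merge_join_cost(1000, 500, 35))  # 7500 = 5 * (1000 + 500)
```

With one merge pass per table, each page is read and written twice during sorting and read once more in the join, which gives the factor of 5.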
![Page 26: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/26.jpg)
GRACE Hash Join
[Figure: R and S are each partitioned into buckets 0 to 3 with bucketID = X mod 4; join on R.X = S.X; only matching buckets are joined]
R ⋈ S = R0 ⋈ S0 + R1 ⋈ S1 + R2 ⋈ S2 + R3 ⋈ S3
![Page 27: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/27.jpg)
GRACE Hash Join – Partition Phase
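[Figure: the original relation is read from disk through 1 INPUT buffer; hash function h1 routes each tuple to one of M-1 OUTPUT buffers (M main memory buffers in total), which are written back to disk as partitions 1, 2, …, M-1]
R → (M – 1) partitions, each of size |R| / (M – 1)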
![Page 28: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/28.jpg)
GRACE Hash Join – Join Phase
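[Figure: partitions of R and S are read from disk; hash function h2 builds an in-memory hash table for partition Ri (< M-1 pages); an input buffer holds Si, whose tuples are probed with h2; an output buffer collects the join result and writes it to disk; B main memory buffers in total]
Partition must fit in memory: |R| / (M – 1) < M – 1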
![Page 29: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/29.jpg)
GRACE Hash Join Algorithm
• Partition phase: 2 (|R| + |S|)
– Partition table R using hash function h1: 2 |R|
– Partition table S using hash function h1: 2 |S|
– R tuples in partition i will match only S tuples in partition i
– R → (M – 1) partitions, each of size |R| / (M – 1)
• Join phase: |R| + |S|
– Read in a partition of R (|R| / (M – 1) < M – 1)
– Hash it using function h2 (<> h1!)
– Scan corresponding S partition, search for matches
• Total cost = 3 (|R| + |S|) pages
• Condition: M > √(f |R|), f ≈ 1.2 to account for hash table
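The two phases can be sketched in memory as follows. This is an illustration only: the partitions live in Python lists rather than on disk, `hash(key) % num_partitions` stands in for h1, and a Python dictionary plays the role of the hash table built with h2. The function name and the row layout (tuples joined on a column position) are assumptions.

```python
from collections import defaultdict


def grace_hash_join(r_rows, s_rows, r_key, s_key, num_partitions):
    """Join two lists of tuples on r_rows[i][r_key] == s_rows[j][s_key]."""

    def partition(rows, key):
        # Partition phase: h1 sends each tuple to exactly one partition.
        parts = defaultdict(list)
        for row in rows:
            parts[hash(row[key]) % num_partitions].append(row)
        return parts

    r_parts, s_parts = partition(r_rows, r_key), partition(s_rows, s_key)
    result = []
    for i in range(num_partitions):
        # Join phase: build a hash table on partition Ri, then scan Si and probe.
        table = defaultdict(list)
        for row in r_parts.get(i, []):
            table[row[r_key]].append(row)
        for row in s_parts.get(i, []):
            for match in table.get(row[s_key], []):
                result.append(match + row)
    return result


sailors = [(22, "dustin", 7, 45.0), (28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
           (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)]
reserves = [(31, 101, "10/11/96", "lubber"), (58, 103, "11/12/96", "dustin")]
joined = grace_hash_join(sailors, reserves, r_key=0, s_key=0, num_partitions=4)
print(sorted(joined))
```

Because both inputs use the same partitioning function, a tuple in partition i of R can only match tuples in partition i of S, so each pair of partitions is joined independently.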
![Page 30: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/30.jpg)
Summary of Join Operator
• Simple nested loop: |R| + ||R|| * |S|
• Block nested loop: |R| + |R| / (M – 2) * |S|
• Index nested loop: |R| + ||R|| * index retrieval
• Sort-merge: $2|R| \, (\log_{M-1}(|R|/M) + 1) + 2|S| \, (\log_{M-1}(|S|/M) + 1) + |R| + |S|$
• GRACE hash: 3 * (|R| + |S|)
– Condition: M > √(f |R|)
![Page 31: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/31.jpg)
Overview of Query Processing
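High Level Query → Parser → Parsed Query → Query Optimizer → QEP → Query Evaluator → Query Result
The Query Optimizer consults the Statistics and the Cost Model; the Query Evaluator runs the QEP (query execution plan) against the Database.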
![Page 32: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/32.jpg)
Query Rewriting
• A query can be expressed in many forms, with some being more efficient than others.
• Example: S, P, SP relations

Select Distinct S.sname
From S
Where S.s# IN (Select SP.s#
From SP
Where SP.p# = 'P2')

Select Distinct S.sname
From S, SP
Where S.s# = SP.s#
AND SP.p# = 'P2'

Select Distinct S.sname
From S
Where 'P2' IN
(Select SP.p#
From SP
Where SP.s# = S.s#)
![Page 33: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/33.jpg)
Select Distinct S.sname
From S
Where S.s# = ANY
(Select SP.s#
From SP
Where SP.p# = 'P2')

Select Distinct S.sname
From S
Where EXISTS
(Select *
From SP
Where SP.s# = S.s#
And SP.p# = 'P2')

Select Distinct S.sname
From S
Where 0 <
(Select Count(*)
From SP
Where SP.s# = S.s#
And SP.p# = 'P2')

Select S.sname
From S, SP
Where SP.s# = S.s#
And SP.p# = 'P2'
Group by S.sname
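That these formulations are equivalent can be checked on a toy database. The sketch below uses Python's built-in `sqlite3` module; the sample rows are invented, the columns are renamed `sno` and `pno` (an assumption, since `#` is awkward in unquoted identifiers), and the `= ANY` form is left out because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (sno TEXT PRIMARY KEY, sname TEXT);
    CREATE TABLE SP (sno TEXT, pno TEXT);
    INSERT INTO S  VALUES ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
    INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P2'), ('S3', 'P1');
""")

queries = {
    "IN": "SELECT DISTINCT S.sname FROM S WHERE S.sno IN "
          "(SELECT SP.sno FROM SP WHERE SP.pno = 'P2')",
    "join": "SELECT DISTINCT S.sname FROM S, SP "
            "WHERE S.sno = SP.sno AND SP.pno = 'P2'",
    "EXISTS": "SELECT DISTINCT S.sname FROM S WHERE EXISTS "
              "(SELECT * FROM SP WHERE SP.sno = S.sno AND SP.pno = 'P2')",
    "COUNT": "SELECT DISTINCT S.sname FROM S WHERE 0 < "
             "(SELECT COUNT(*) FROM SP WHERE SP.sno = S.sno AND SP.pno = 'P2')",
    "GROUP BY": "SELECT S.sname FROM S, SP "
                "WHERE SP.sno = S.sno AND SP.pno = 'P2' GROUP BY S.sname",
}

# Run every formulation and collect the supplier names it returns.
results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in queries.items()}
for name, names in results.items():
    print(name, names)
```

All five forms return the same suppliers here; how differently they perform depends on how much rewriting the optimizer of a given DBMS does.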
![Page 34: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/34.jpg)
Query Optimization
• Given: An SQL query joining n tables
• Dream: Map to most efficient plan
• Reality: Avoid rotten plans
• State of the art:
– Most optimizers follow System R's technique
– Works fine up to about 10 joins
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
[Plan tree, bottom to top: Reserves ⋈ (sid=sid) Sailors; σ bid=100 and σ rating > 5; π sname]
![Page 35: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/35.jpg)
Complexity of Query Optimization
• Many degrees of freedom
– Selection: scan versus (clustered, non-clustered) index
– Join: block nested loop, sort-merge, hash
– Relative order of the operators
– Exponential search space!
• Heuristics
– Push the selections down
– Push the projections down
– Delay Cartesian products
– System R: Only left-deep trees
[Figure: left-deep join tree ((A ⋈ B) ⋈ C) ⋈ D]
![Page 36: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/36.jpg)
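Equivalences in Relational Algebra
• Selection:
– cascade: $\sigma_{c_1 \wedge \ldots \wedge c_n}(R) \equiv \sigma_{c_1}(\ldots \sigma_{c_n}(R) \ldots)$
– commutative: $\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$
• Projection:
– cascade: $\pi_{a_1}(R) \equiv \pi_{a_1}(\ldots(\pi_{a_n}(R))\ldots)$
• Join:
– associative: $R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$
– commutative: $(R \bowtie S) \equiv (S \bowtie R)$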
![Page 37: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/37.jpg)
Equivalences in Relational Algebra
• A projection commutes with a selection that only uses attributes retained by the projection
• Selection between attributes of the two arguments of a cross-product converts cross-product to a join
• A selection on just attributes of R commutes with join R ⋈ S (i.e., σ(R ⋈ S) ≡ σ(R) ⋈ S)
• Similarly, if a projection follows a join R ⋈ S, we can 'push' it by retaining only attributes of R (and S) that are needed for the join or are kept by the projection
![Page 38: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/38.jpg)
System R Optimizer
1. Find all plans for accessing each base table
2. For each table
• Save cheapest unordered plan
• Save cheapest plan for each interesting order
• Discard all others
3. Try all ways of joining pairs of 1-table plans; save cheapest unordered + interesting ordered plans
4. Try all ways of joining 2-table with 1-table
5. Combine k-table with 1-table till you have full plan tree
6. At the top, to satisfy GROUP BY and ORDER BY
• Use interesting ordered plan
• Add a sort node to unordered plan
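The bottom-up enumeration in steps 3 to 5 can be sketched as a small dynamic program over sets of tables. This is a toy illustration, not System R's actual optimizer: the cost of a plan is assumed to be the sum of its intermediate result sizes, interesting orders and access paths are ignored, Cartesian products are not delayed, and the cardinalities and selectivities are invented.

```python
from itertools import combinations


def best_left_deep_plan(card, selectivity):
    """card: {table: tuples}; selectivity: {frozenset({a, b}): join selectivity}.

    Returns (cost, result size, join order) of the cheapest left-deep plan.
    """
    tables = sorted(card)
    # best[set of tables] = (cost so far, result size, join order)
    best = {frozenset([t]): (0.0, float(card[t]), (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for combo in combinations(tables, k):
            subset = frozenset(combo)
            for t in combo:  # t is the single table joined last
                rest = subset - {t}
                cost, size, order = best[rest]
                sel = 1.0
                for u in rest:
                    sel *= selectivity.get(frozenset((u, t)), 1.0)
                out = size * card[t] * sel
                candidate = (cost + out, out, order + (t,))
                # Keep only the cheapest plan per set of tables.
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate
    return best[frozenset(tables)]


card = {"A": 1000, "B": 100, "C": 10}
selectivity = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
cost, size, order = best_left_deep_plan(card, selectivity)
print(cost, size, order)
```

For these numbers the cheapest plan joins B and C first and adds A last; keeping one best plan per subset is what holds the search well below the number of all join orders.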
![Page 39: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/39.jpg)
Source: Selinger et al, “Access Path Selection in a Relational Database Management System”
![Page 40: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/40.jpg)
Search Strategies for Single Relations
![Page 41: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/41.jpg)
Note: Only branches for NL join are shown here. Additional branches for other join methods (e.g. sort-merge) are not shown.
Source: Selinger et al, “Access Path Selection in a Relational Database Management System”
![Page 42: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/42.jpg)
![Page 43: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/43.jpg)
What is “Cheapest”?
• Need information about the relations and indexes involved
• Catalogs typically contain at least:
– # tuples (NTuples) and # pages (NPages) for each relation.
– # distinct key values (NKeys) and NPages for each index.
– Index height, low/high key values (Low/High) for each tree index.
• Catalogs updated periodically.
– Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.
• More detailed information (e.g., histograms of the values in some field) is sometimes stored.
![Page 44: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/44.jpg)
Estimating Result Size
• Consider a query block:
SELECT attribute list
FROM relation list
WHERE term1 AND ... AND termk
• Maximum # tuples in result is the product of the cardinalities of relations in the FROM clause.
• Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size
– Term col=value has RF 1/NKeys(I)
– Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2))
– Term col>value has RF (High(I)-value)/(High(I)-Low(I))
• Result cardinality = Max # tuples * product of all RF's.
– Implicit assumption that terms are independent!
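The reduction-factor rules can be applied mechanically, as in the sketch below. The catalog numbers are assumptions for illustration: 100,000 Reserves tuples and 100 boats as used elsewhere in the slides, plus a hypothetical 40,000 Sailors tuples with unique sid and ratings ranging from 1 to 10. Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def rf_col_equals_value(nkeys):
    """Term col = value: RF 1/NKeys(I)."""
    return Fraction(1, nkeys)


def rf_col_equals_col(nkeys1, nkeys2):
    """Term col1 = col2: RF 1/MAX(NKeys(I1), NKeys(I2))."""
    return Fraction(1, max(nkeys1, nkeys2))


def rf_col_greater_than(value, low, high):
    """Term col > value: RF (High(I) - value) / (High(I) - Low(I))."""
    return Fraction(high - value, high - low)


def result_cardinality(cardinalities, reduction_factors):
    """Product of the FROM-clause cardinalities times all reduction factors."""
    size = Fraction(1)
    for c in cardinalities:
        size *= c
    for rf in reduction_factors:
        size *= rf
    return size


# R.sid = S.sid AND R.bid = 100 AND S.rating > 5, under the assumed statistics.
estimate = result_cardinality(
    [100_000, 40_000],
    [rf_col_equals_col(40_000, 40_000),   # join term
     rf_col_equals_value(100),            # bid = 100
     rf_col_greater_than(5, 1, 10)],      # rating > 5
)
print(float(estimate))  # roughly 555.6 tuples
```

Multiplying the factors is only valid if the terms are independent, which is the implicit assumption the slide points out.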
![Page 45: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/45.jpg)
Cost Estimates for Single-Table Plans
• Index I on primary key matches selection:
– Cost is Height(I)+1 for a B+ tree, about 1.2 for hash index.
• Clustered index I matching one or more selects:
– (NPages(I)+NPages(R)) * product of RF's of matching selects.
• Non-clustered index I matching one or more selects:
– (NPages(I)+NTuples(R)) * product of RF's of matching selects.
• Sequential scan of file:
– NPages(R).
Note: Typically, no duplicate elimination on projections! (Exception: Done on answers if user says DISTINCT.)
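A sketch of these estimates for one selection. The relation size (1,000 pages, 100,000 tuples) and the reduction factor of 1/100 follow the Reserves numbers used elsewhere in the slides; the index size of 50 pages is a made-up assumption.

```python
from math import prod


def clustered_index_cost(npages_index, npages_rel, rfs):
    """(NPages(I) + NPages(R)) * product of the matching reduction factors."""
    return (npages_index + npages_rel) * prod(rfs)


def unclustered_index_cost(npages_index, ntuples_rel, rfs):
    """(NPages(I) + NTuples(R)) * product of the matching reduction factors."""
    return (npages_index + ntuples_rel) * prod(rfs)


def sequential_scan_cost(npages_rel):
    """A scan reads every page of the relation once."""
    return npages_rel


rfs = [1 / 100]  # e.g. bid = 100 with 100 distinct boats
clustered = clustered_index_cost(50, 1000, rfs)          # about 10.5 I/Os
unclustered = unclustered_index_cost(50, 100_000, rfs)   # about 1000.5 I/Os
scan = sequential_scan_cost(1000)                        # 1000 I/Os
print(clustered, unclustered, scan)
```

With these numbers the non-clustered index is no better than a full scan, because in the worst case every qualifying tuple costs its own page read.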
![Page 46: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/46.jpg)
Counting the Costs
• With 5 buffers, cost of plan:
– Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution)
– Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
– Sort T1 (2*10*2), sort T2 (2*250*4), merge (10+250), total=2300
– Total: 4060 page I/Os
• If we used BNL join, join cost = 10+4*250, total cost = 2770
• If we 'push' projections, T1 has only sid, T2 only sid and sname:
– T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000
[Plan tree, bottom to top: σ bid=100 on Reserves (Scan; write to temp T1) and σ rating > 5 on Sailors (Scan; write to temp T2); ⋈ sid=sid (Sort-Merge Join); π sname (On-the-fly)]
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
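The page counts for this plan can be re-derived from the external-sort and block nested loop formulas. The sketch assumes the sort passes are counted as one run-forming pass plus (B-1)-way merge passes, and that BNL reads the outer table in blocks of B-2 pages; the helper name is hypothetical.

```python
from math import ceil

BUFFERS = 5


def sort_passes(pages: int, buffers: int) -> int:
    """Pass 0 forms runs of `buffers` pages; later passes merge buffers-1 runs."""
    runs = ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = ceil(runs / (buffers - 1))
        passes += 1
    return passes


scan_and_write = (1000 + 10) + (500 + 250)            # build temps T1 and T2
sort_t1 = 2 * 10 * sort_passes(10, BUFFERS)           # 2 * 10 * 2 = 40
sort_t2 = 2 * 250 * sort_passes(250, BUFFERS)         # 2 * 250 * 4 = 2000
merge_join = 10 + 250
sort_merge_total = scan_and_write + sort_t1 + sort_t2 + merge_join

bnl_join = 10 + ceil(10 / (BUFFERS - 2)) * 250        # 10 + 4 * 250 = 1010
bnl_total = scan_and_write + bnl_join

print(sort_merge_total, bnl_total)  # 4060 2770
```

Most of the sort-merge cost comes from sorting the 250-page temp T2, which is why block nested loop wins for this plan.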
![Page 47: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/47.jpg)
Exercise
• Reserves: 100,000 tuples, 100 tuples per page
• With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages
• Join column sid is a key for Sailors - at most one matching tuple
• Decision not to push rating>5 before the join is based on availability of sid index on Sailors
• Cost: Selection of Reserves tuples (10 I/Os); for each tuple, must get matching Sailors tuple (1000*1.2); total 1210 I/Os
[Plan tree, bottom to top: σ bid=100 on Reserves; ⋈ sid=sid with Sailors (Index Nested Loops, with pipelining); σ rating > 5 (On-the-fly); π sname (On-the-fly). Further annotations in the figure: (Use clustered index on sid), (Use hash index on sid)]
![Page 48: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/48.jpg)
Query Tuning
![Page 49: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/49.jpg)
Avoid Redundant DISTINCT
• DISTINCT usually entails a sort operation
• Slows down query optimization because there is one more "interesting" order to consider
• Remove if you know the result has no duplicates
SELECT DISTINCT ssnum
FROM Employee
WHERE dept = 'information systems'
![Page 50: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/50.jpg)
Change Nested Queries to Join
• Might not use index on Employee.dept
• Need DISTINCT if an employee might belong to multiple departments
SELECT ssnum
FROM Employee
WHERE dept IN (SELECT dept FROM Techdept)

SELECT ssnum
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept
![Page 51: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/51.jpg)
Avoid Unnecessary Temp Tables
• Creating temp table causes update to catalog
• Cannot use any index on original table
SELECT * INTO Temp
FROM Employee
WHERE salary > 40000

SELECT ssnum
FROM Temp
WHERE Temp.dept = 'information systems'

SELECT ssnum
FROM Employee
WHERE Employee.dept = 'information systems'
AND salary > 40000
![Page 52: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/52.jpg)
Avoid Complicated Correlation Subqueries
• Search all of e2 for each e1 record!
SELECT ssnum
FROM Employee e1
WHERE salary = (SELECT MAX(salary)
FROM Employee e2
WHERE e2.dept = e1.dept)

SELECT MAX(salary) as bigsalary, dept INTO Temp
FROM Employee
GROUP BY dept

SELECT ssnum
FROM Employee, Temp
WHERE salary = bigsalary
AND Employee.dept = Temp.dept
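The rewrite can be tried on a toy table with Python's built-in `sqlite3` module. The sample rows are invented; SQLite has no `SELECT ... INTO`, so the sketch uses `CREATE TEMP TABLE ... AS` instead, and the temporary table is named `DeptMax` (a hypothetical name chosen to stay clear of SQLite's `TEMP` keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
    INSERT INTO Employee VALUES
        (1, 'information systems', 50000),
        (2, 'information systems', 70000),
        (3, 'sales', 40000),
        (4, 'sales', 40000);
""")

# Correlated form: the inner query is logically re-evaluated for every e1 row.
correlated = sorted(row[0] for row in conn.execute("""
    SELECT ssnum FROM Employee e1
    WHERE salary = (SELECT MAX(salary) FROM Employee e2 WHERE e2.dept = e1.dept)
"""))

# Rewritten form: compute each department's maximum once, then join.
conn.execute("""
    CREATE TEMP TABLE DeptMax AS
    SELECT MAX(salary) AS bigsalary, dept FROM Employee GROUP BY dept
""")
rewritten = sorted(row[0] for row in conn.execute("""
    SELECT ssnum FROM Employee, DeptMax
    WHERE salary = bigsalary AND Employee.dept = DeptMax.dept
"""))

print(correlated, rewritten)  # [2, 3, 4] [2, 3, 4]
```

Both forms return the top earners of each department, including ties; whether the rewrite is faster depends on how well the DBMS decorrelates subqueries on its own.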
![Page 53: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/53.jpg)
Avoid Complicated Correlation Subqueries
• SQL Server 2000 does a good job at handling the correlated subqueries (a hash join is used as opposed to a nested loop between query blocks)
– The techniques implemented in SQL Server 2000 are described in "Orthogonal Optimization of Subqueries and Aggregates" by C. Galindo-Legaria and M. Joshi, SIGMOD 2001.
[Figure: throughput improvement (percent) for the correlated subquery on SQL Server 2000, Oracle 8i, and DB2 V7.1; y-axis from -10 to 80, with two off-scale bars labelled > 1000 and > 10000]
![Page 54: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/54.jpg)
Join on Clustering and Integer Attributes
• Employee is clustered on ssnum
• ssnum is an integer
SELECT Employee.ssnum
FROM Employee, Student
WHERE Employee.name = Student.name

SELECT Employee.ssnum
FROM Employee, Student
WHERE Employee.ssnum = Student.ssnum
![Page 55: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/55.jpg)
Avoid HAVING when WHERE is enough
• May first perform grouping for all departments!
SELECT AVG(salary) as avgsalary, dept
FROM Employee
GROUP BY dept
HAVING dept = 'information systems'

SELECT AVG(salary) as avgsalary
FROM Employee
WHERE dept = 'information systems'
GROUP BY dept
![Page 56: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/56.jpg)
Avoid Views with unnecessary Joins
• Join with Techdept unnecessarily
CREATE VIEW Techlocation
AS SELECT ssnum, Techdept.dept, location
FROM Employee, Techdept
WHERE Employee.dept = Techdept.dept

SELECT dept
FROM Techlocation
WHERE ssnum = 4444

SELECT dept
FROM Employee
WHERE ssnum = 4444
![Page 57: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/57.jpg)
Aggregate Maintenance
• Materialize an aggregate if needed “frequently”
• Use trigger to update

create trigger updateVendorOutstanding on orders for insert as
update vendorOutstanding
set amount =
    (select vendorOutstanding.amount + sum(inserted.quantity * item.price)
     from inserted, item
     where inserted.itemnum = item.itemnum)
where vendor = (select vendor from inserted);
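The same idea can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions, not a port of the trigger above: SQLite triggers fire once per row and expose the new row as `NEW` instead of an `inserted` table, and the table layouts and sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (itemnum INTEGER PRIMARY KEY, price REAL);
CREATE TABLE orders (ordernum INTEGER PRIMARY KEY, itemnum INTEGER,
                     quantity INTEGER, vendor TEXT);
CREATE TABLE vendorOutstanding (vendor TEXT PRIMARY KEY, amount REAL);
INSERT INTO item VALUES (1, 2.5), (2, 10.0);
INSERT INTO vendorOutstanding VALUES ('acme', 0.0);

-- Row-level trigger: add the value of each new order to the stored total.
CREATE TRIGGER updateVendorOutstanding AFTER INSERT ON orders
BEGIN
    UPDATE vendorOutstanding
    SET amount = amount + NEW.quantity *
        (SELECT price FROM item WHERE itemnum = NEW.itemnum)
    WHERE vendor = NEW.vendor;
END;
""")

conn.execute("INSERT INTO orders VALUES (1, 1, 4, 'acme')")  # 4 * 2.5
conn.execute("INSERT INTO orders VALUES (2, 2, 3, 'acme')")  # 3 * 10.0

# Cheap lookup of the materialized aggregate.
maintained = conn.execute(
    "SELECT amount FROM vendorOutstanding WHERE vendor = 'acme'"
).fetchone()[0]

# The same value recomputed from scratch with a join and an aggregate.
recomputed = conn.execute(
    "SELECT SUM(quantity * price) FROM orders JOIN item USING (itemnum) "
    "WHERE vendor = 'acme'"
).fetchone()[0]
```

Readers of the aggregate pay for a single-row lookup, while each insert pays for one extra update. That trade is worthwhile only when the aggregate is read often relative to how often orders are inserted.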
![Page 58: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/58.jpg)
Avoid External Loops
• No loop:
sqlStmt = "select * from lineitem where l_partkey <= 200;"
odbc->prepareStmt(sqlStmt);
odbc->execPrepared(sqlStmt);
• Loop:
sqlStmt = "select * from lineitem where l_partkey = ?;"
odbc->prepareStmt(sqlStmt);
for (int i=1; i<=200; i++){
    odbc->bindParameter(1, SQL_INTEGER, i);
    odbc->execPrepared(sqlStmt);
}
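A runnable sketch of the two styles, using Python's built-in sqlite3 module in place of the ODBC interface shown above. The table contents are generated for illustration, and part keys are assumed to start at 1. With an in-process database the interface crossing is cheap, so this shows the shape of the two programs and that they return the same rows, not the performance gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (l_orderkey INTEGER, l_partkey INTEGER)")
# 1000 generated rows with part keys cycling through 1..300.
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?)",
    [(i, i % 300 + 1) for i in range(1000)],
)

# No loop: one set-oriented statement, one trip across the interface.
no_loop = conn.execute(
    "SELECT * FROM lineitem WHERE l_partkey <= 200"
).fetchall()

# Loop: one parameterized statement executed once per key value,
# so the application crosses the interface 200 times.
loop = []
for partkey in range(1, 201):
    loop.extend(
        conn.execute(
            "SELECT * FROM lineitem WHERE l_partkey = ?", (partkey,)
        ).fetchall()
    )
```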
![Page 59: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/59.jpg)
Avoid External Loops
• SQL Server 2000 on Windows 2000
• Crossing the application interface has a significant impact on performance
[Figure: throughput (records/sec), loop vs. no loop; y-axis from 0 to 600]
Let the DBMS optimize set operations
![Page 60: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/60.jpg)
Avoid Cursors
• No cursor
select * from employees;
• Cursor
DECLARE d_cursor CURSOR FOR select * from employees;
OPEN d_cursor
while (@@FETCH_STATUS = 0)
BEGIN
    FETCH NEXT from d_cursor
END
CLOSE d_cursor
go
![Page 61: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/61.jpg)
Avoid Cursors
• SQL Server 2000 on Windows 2000
• Response time is a few seconds with a SQL query and more than an hour iterating over a cursor
[Figure: throughput (records/sec), cursor vs. SQL; y-axis from 0 to 5000]
![Page 62: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/62.jpg)
Retrieve Needed Columns Only
– All
Select * from lineitem;
– Covered subset
Select l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate from lineitem;
• Avoid transferring unnecessary data
• May enable use of a covering index.
[Figure: throughput (queries/msec) for the "all" and "covered subset" queries, with no index and with an index; y-axis from 0 to 1.75]
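Whether a query can be answered from an index alone is visible in the query plan. The sketch below uses Python's built-in sqlite3 module and its `EXPLAIN QUERY PLAN` statement; the index name `idx_cover`, the reduced column list, and the added filter on `l_orderkey` are assumptions made so that the index is used in both cases and only the covering behaviour differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_orderkey INTEGER, l_partkey INTEGER, "
    "l_suppkey INTEGER, l_shipdate TEXT, l_commitdate TEXT, l_comment TEXT)"
)
# Hypothetical index holding exactly the columns of the "covered subset" query.
conn.execute(
    "CREATE INDEX idx_cover ON lineitem "
    "(l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate)"
)

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Needs l_comment as well, so each index entry leads to a table lookup.
plan_all = plan("SELECT * FROM lineitem WHERE l_orderkey = 1")

# Every requested column is in the index, so the table is never touched.
plan_subset = plan(
    "SELECT l_orderkey, l_partkey, l_suppkey, l_shipdate, l_commitdate "
    "FROM lineitem WHERE l_orderkey = 1"
)
```

In SQLite the plan for the subset query mentions a covering index, while the plan for `SELECT *` uses the same index without covering. Other systems report this differently, so the check is specific to this sketch.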
![Page 63: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/63.jpg)
Use Direct Path for Bulk Loading
sqlldr direct=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl
load data
infile "lineitem.tbl"
into table LINEITEM append
fields terminated by '|'
(
    L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY,
    L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS,
    L_SHIPDATE DATE "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD",
    L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT
)
![Page 64: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/64.jpg)
Use Direct Path for Bulk Loading
• Direct path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than conventional bulk loading (commit every 100 records) or individual inserts (commit after each record).
[Figure: throughput (rec/sec) for conventional, direct path, and insert; y-axis from 0 to 50000]
![Page 65: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/65.jpg)
Some Idiosyncrasies
• OR may prevent an index from being used
– split the query and use UNION
• Order of tables may affect join implementation
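The OR-to-UNION rewrite can be sketched on a toy table with Python's built-in sqlite3 module. The `name` and `dept` predicates and the sample rows are invented for illustration. The example checks only that the two forms return the same rows; whether OR really blocks index use depends on the DBMS and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ssnum INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("CREATE INDEX idx_name ON Employee (name)")
conn.execute("CREATE INDEX idx_dept ON Employee (dept)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Ann", "sales"), (2, "Bob", "finance"),
     (3, "Ann", "finance"), (4, "Dan", "sales")],
)

# Single query with OR across two differently indexed columns.
or_rows = conn.execute(
    "SELECT ssnum FROM Employee WHERE name = 'Ann' OR dept = 'finance'"
).fetchall()

# Split form: each branch has one simple predicate that an index can serve.
# UNION removes duplicates, so employee 3 (matching both) appears once.
union_rows = conn.execute(
    "SELECT ssnum FROM Employee WHERE name = 'Ann' "
    "UNION "
    "SELECT ssnum FROM Employee WHERE dept = 'finance'"
).fetchall()
```

The equivalence holds here because the selected column is a key. If the select list could contain duplicate rows, UNION would drop them and the two forms would differ.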
![Page 66: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/66.jpg)
Query Tuning – Thou Shalt …
• Avoid redundant DISTINCT
• Change nested queries to join
• Avoid unnecessary temp tables
• Avoid complicated correlation subqueries
• Join on clustering and integer attributes
• Avoid HAVING when WHERE is enough
• Avoid views with unnecessary joins
• Maintain frequently used aggregates
• Avoid external loops
![Page 67: Principles of Query Processing. Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin)](https://reader031.vdocument.in/reader031/viewer/2022032308/56649f4d5503460f94c6dd61/html5/thumbnails/67.jpg)
Query Tuning – Thou Shalt …
• Avoid cursors
• Retrieve needed columns only
• Use direct path for bulk loading