query& processing& · select b,d from r,s where r.a = “c” and s.e = 2 and r.c=s.c query...
TRANSCRIPT
![Page 1: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/1.jpg)
query processing
![Page 2: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/2.jpg)
Query Processing • The query processor turns user queries and data
modification commands into a query plan - a sequence of operations (or algorithm) on the database from high level queries to low level commands
• Decisions taken by the query processor Which of the algebraically equivalent forms of a query will lead to the most efficient algorithm? For each algebraic operator what algorithm should we use to run the operator? How should the operators pass data from one to the other? (eg, main memory buffers, disk buffers)
2
![Page 3: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/3.jpg)
SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C
Query Processing Example SQL Query
Example
A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45
R C D E 10 x 2 20 y 2 30 z 2 40 x 1 50 y 3
S
B D 2 x
3
![Page 4: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/4.jpg)
Query Processing Example How do we execute the query eventually?
- Scan relations - Compute Cartesian product - Select tuples - Compute projection
One idea!
R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2
… … … … … …
c 2 10 10 x 2
… … … … … …
R×S
We found one!
4
![Page 5: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/5.jpg)
Describing execution plans Using an enhanced version of the relational algebra
ΠB,D
σR.A=“c”∧ S.E=2 ∧ R.C=S.C
X R S
SCAN SCAN
FLY
FLY
1. Scan R 2. For each tuple r of R scan S 3. For each (r,s), where s in S
select and project on the fly
PLAN I
or: ΠB,D [ σR.A=“c”∧ S.E=2 ∧ R.C = S.C(R × S )] FLY FLY SCAN SCAN
5
![Page 6: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/6.jpg)
Describing execution plans Using an enhanced version of the relational algebra
ΠB,D
σR.A=“c”∧ S.E=2 ∧ R.C=S.C
X R S
PLAN I
FLY & SCAN are the defaults!
1. Scan R 2. For each tuple r of R scan S 3. For each (r,s), where s in S
select and project on the fly
6
![Page 7: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/7.jpg)
Describing execution plans Using an enhanced version of the relational algebra
PLAN II
ΠB,D
σR.A = “c” σS.E = 2
R S
: natural join
HASH 1. Scan R & S 2. Perform on the fly selections 3. Do hash join 4. Project
7
![Page 8: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/8.jpg)
Describing execution plans Using an enhanced version of the relational algebra
PLAN II
A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45
R C D E 10 x 2 20 y 2 30 z 2 40 x 1 50 y 3
S
A B C c 2 10
C D E 10 x 2 20 y 2 30 z 2
σR σS
8
![Page 9: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/9.jpg)
Describing execution plans Using an enhanced version of the relational algebra
PLAN III ΠR.B, S.D
R S
σS.E=2
σR.a=“c” INDEX
RI
Use R.A and S.C Indexes
1. Use R.A index to select R tuples with R.A = “c”
2. For each R.C value found, use S.C index to find matching join tuples
3. Eliminate join tuples S.E ≠ 2 4. Project B,D attributes
: right index natural join RI
9
![Page 10: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/10.jpg)
Describing execution plans Using an enhanced version of the relational algebra
PLAN III
A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45
R C D E 10 x 2 20 y 2 30 z 2 40 x 1 50 y 3
S
A C
I1 I2 =“c”
<c,2,10> <10,x,2>
check=2?
output: <2,x>
next tuple: <c,7,15>
10
![Page 11: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/11.jpg)
From Query To Optimal Plan • Complex process • Algebra-based logical and physical plans • Transformations • Evaluation of multiple alternatives
11
![Page 12: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/12.jpg)
Issues in Query Processing and Optimization
• Generate Plans Employ efficient execution primitives for computing relational algebra operations Systematically transform expressions to achieve more efficient combinations of operators
• Estimate Cost of Generated Plans Statistics
• “Smart” Search of the Space of Possible Plans Always do the “good” transformations (relational algebra optimization) Prune the space (e.g., System R)
• Often the above steps are mixed
12
![Page 13: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/13.jpg)
The Journey of a Query parse
convert
Generate/transform lqp’s
estimate result sizes
generate physical plans
estimate costs
pick best
execute
answer parse tree
logical query plan
“improved” logical query plans
logical query plans + sizes
Scope of responsibility of each module is fuzzy
generate/transform pqp’s
SQL Query
13
![Page 14: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/14.jpg)
The Journey of a Query
SELECT Theater FROM Movie M, Schedule S WHERE M.Title = S.Title AND M.Actor=“Winger”
π
σ
Movie Schedule
M.Title=S.Title AND M.Actor=“Winger”
Theater
x
Parse/Convert
Logical query plan generator
Initial logical query plan SQL query
14
![Page 15: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/15.jpg)
The Journey of a Query
Logical Plan Generator applies Algebraic Rewriting
Logical query plan generator
π
σ
x M.Title=S.Title
Theater
σ M.Actor=“Winger”
Movie Schedule
π
Movie Schedule
Theater
σ M.Actor=“Winger”
M.Title=S.Title
Another logical query plan Another logical query plan
15
![Page 16: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/16.jpg)
The Journey of a Query
Logical Plan Generator applies Algebraic Rewriting
Logical query plan generator
Another logical query plan
Theater
S.Title=M.Title
σ M.Actor=“Winger”
Movie Schedule
π • 4 logical query plans created • Algebraic rewritings were used
for producing the candidate logical query plans plans
• The last one is the winner (at least, cannot be a big loser)
• In general, multiple logical plans may “win” eventually
Summary of Logical Plan Generator:
16
![Page 17: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/17.jpg)
The Journey of a Query
Physical Plan Generator chooses execution primitives and data passing
Physical query plan generator
1st physical query plan
π Theater
S.Title=M.Title
σ Actor=“Winger”
LEFT INDEX
INDEX
Movie Schedule
Index on Actor and Title, unsorted tables, Table size >> Memory size
17
![Page 18: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/18.jpg)
The Journey of a Query
Physical Plan Generator chooses execution primitives and data passing
Physical query plan generator
2nd physical query plan
π Theater
S.Title=M.Title
σ
SORT-MERGE
INDEX
Movie Schedule
Actor=“Winger”
Index on Actor, Table Schedule sorted on Schedule
18
![Page 19: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/19.jpg)
The Journey of a Query Physical query plan generator
• 2 logical query plans created in this case) • In general, more than one different logical
plans may be generated by choosing different primitives
Summary of Physical Plan Generator:
19
![Page 20: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/20.jpg)
SELECT title FROM StarsIn WHERE starName IN (
SELECT name FROM MovieStar WHERE birthdate LIKE ‘%1960’)
(Find the movies with stars born in 1960)
Example: Nested SQL Query
20
![Page 21: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/21.jpg)
<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Tuple> IN <Query>
title StarsIn <Attribute> ( <Query> )
starName <SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Attribute> LIKE <Pattern>
name MovieStar birthDate ‘%1960’
Example: Parse Tree
21
![Page 22: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/22.jpg)
Πtitle
σ StarsIn <condition>
<tuple> IN ΠName
<attribute> σbirthdate LIKE ‘%1960’
starName MovieStar
An expression using a two-argument σ, midway between a parse tree and relational algebra
Example: Generating Rel. Algebra
22
![Page 23: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/23.jpg)
Πtitle
σstarName=name
StarsIn Πname
σbirthdate LIKE ‘%1960’
MovieStar
May consider “IN” elimination as a rewriting in the logical plan generator or may consider it a task of the converter
×
Example: Logical Query Plan
23
![Page 24: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/24.jpg)
Πtitle
StarsIn Πname
σbirthdate LIKE ‘%1960’
MovieStar
Example: Improved Logical Plan
starName=name
24
![Page 25: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/25.jpg)
StarsIn
MovieStar
Π
σ
Example: Size estimation Sizes are important for selecting physical query plans
Need expected size
25
![Page 26: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/26.jpg)
Example: One Physical Plan
StarsIn SCAN
MovieStar
σ birthdate LIKE ‘%1960’ INDEX
starName=name
Πname
Πtitle AddiVonal parameters: memory size, result sizes...
26
![Page 27: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/27.jpg)
• Transformation rules (preserve equivalence) • What are good transformations?
Relational Algebra Optimization
27
![Page 28: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/28.jpg)
Commutativity & Associativity Algebraic Rewritings
Cartesian Product
Natural Join
R
S T
×
× T
R S
×
× R S
×
Commutativity Associativity
R
S T
T
R S R S S R
S R
×
28
![Page 29: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/29.jpg)
Commutativity & Associativity Algebraic Rewritings
Union
Intersection
R
S T
T
R S R S
∪
Commutativity Associativity
R
S T
T
R S R S S R
S R
∪ ∪ ∪
∪ ∪
∩ ∩ ∩ ∩
∩
∩
29
![Page 30: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/30.jpg)
Decomposition of Logical Connectives Algebraic Rewritings
σ cond2
σ cond1
R
σ cond1 AND cond2 R
σ cond1 OR cond2 R
∪
σ cond2
R
σ cond1 σ cond1
σ cond2
R
30
![Page 31: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/31.jpg)
Decomposition of Negation Algebraic Rewritings
σ cond1 AND NOT cond2 R
σ NOT cond2 R
31
![Page 32: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/32.jpg)
Pushing Selection through Union & Difference
Algebraic Rewritings
σ
R S
cond ∪ σ cond
S
σ cond R
∪
σ
R S
cond -
σ cond S
σ cond R
-
S
σ cond R
-
Union
Difference
What about intersection?
32
![Page 33: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/33.jpg)
Pushing Selection through Cartesian Product & Join
Algebraic Rewritings
σ
R S
cond × σ cond
S R
×
σ
R S
cond σ cond S
R
σ cond S
R
σ cond
The right direction requires that cond refers to S attributes only
The right direction requires that cond refers to S attributes only
The right direction requires that all the attributes in cond appear in both R and S
33
![Page 34: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/34.jpg)
Let x = subset of R attributes z = attributes in predicate P (subset of R attributes)
πx[σp (R) ] =
{σp [ πx (R) ]} πx
πxz
Rules: π,σ combined
34
![Page 35: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/35.jpg)
Projection Decomposition Algebraic Rewritings
π XY
π X
R X
R
π
37
![Page 36: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/36.jpg)
More Rules can be Derived:
σp∧q (R S) = ? where p only at R, q only at S
Derived Rules: σ+ combined
38
![Page 37: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/37.jpg)
More Rules can be Derived:
σp∧q (R S) =
σp [σq (R S) ] =
σp [ R σq (S) ] =
[σp (R)] [σq (S)] where p only at R, q only at S
Derived Rules: σ+ combined
39
![Page 38: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/38.jpg)
σp1∧p2 (R) → σp1 [σp2 (R)]
σp (R S) → [σp (R)] S R S → S R
πx [σp (R)] → πx {σp [πxz (R)]}
Which are always “good” transformations?
40
![Page 39: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/39.jpg)
• No transformation is always good at the l.q.p level • Usually good:
early selections elimination of cartesian products elimination of redundant subexpressions
• Many transformations lead to “promising” plans Commuting/rearranging joins In practice too “combinatorially explosive” to be handled as rewriting of l.q.p.
Bottom Line
41
![Page 40: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/40.jpg)
Arranging the Join Order: The Wong-Youssefi algorithm (INGRES) Sample TPC-‐H Schema Nation(NationKey, NName) Customer(CustKey, CName, NationKey) Order(OrderKey, CustKey, Status) Lineitem(OrderKey, PartKey, Quantity) Product(SuppKey, PartKey, PName) Supplier(SuppKey, Sname)
SELECT SName FROM NaVon, Customer, Order, LineItem, Product, Supplier WHERE NaVon.NaVonKey = Cuctomer.NaVonKey
AND Customer.CustKey = Order.CustKey AND Order.OrderKey=LineItem.OrderKey AND LineItem.PartKey= Product.Partkey AND Product.Suppkey = Supplier.SuppKey AND NName = “Canada”
Find the names of suppliers that sell a product
that appears in a line item of an order made by a customer who is
in Canada
42
![Page 41: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/41.jpg)
Challenges with Large Natural Join Expressions For simplicity, assume that in the query 1. All joins are natural 2. whenever two tables of the FROM clause have common
attributes we join on them 3. Consider Right-Index only
NaVon Customer Order LineItem Product Supplier
σNName=“Canada”
πSName
One possible order
RI
RI
RI
RI
RI
Index
43
![Page 42: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/42.jpg)
Multiple Possible Orders
NaVon Customer Order LineItem Product Supplier
σNName=“Canada”
πSName
RI
RI
RI
RI
RI
44
![Page 43: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/43.jpg)
Wong-Yussefi algorithm assumptions and objectives • Assumption 1 (weak): Indexes on all join
attributes (keys and foreign keys) • Assumption 2 (strong): At least one selection
creates a small relation A join with a small relation results in a small relation
• Objective: Create sequence of index-based joins such that all intermediate results are small
45
![Page 44: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/44.jpg)
Hypergraphs
CName
CustKey
NaVonKey NName
Status OrderKey
QuanVty
PartKey SuppKey PName SName
• relation hyperedges • two hyperedges for same relation are possible
• each node is an attribute • can extend for non-natural equality joins by merging nodes
Na2on Customer
Order LineItem
Product
Supplier
46
![Page 45: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/45.jpg)
Small Relations/Hypergraph Reduction
CName
CustKey
NaVonKey NName
Status OrderKey
QuanVty
PartKey SuppKey PName SName
Na2on
Customer
Order LineItem
Product
Supplier
NaVonKey NName
“NaVon” is small because it has the equality selecVon NName = “Canada”
NaVon
σNName=“Canada” Index Pick a small relaVon
(and its condiVons) to start the plan
47
![Page 46: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/46.jpg)
CName
CustKey
NaVonKey NName
Status OrderKey
QuanVty
PartKey SuppKey PName SName
Na2on
Customer
Order LineItem
Product
Supplier
NaVonKey NName
NaVon
σNName=“Canada” Index
RI
Remove small relaVon (hypergraph reducVon) and color as “small” any relaVon that joins with the removed “small” relaVon
Customer
Pick a small relaVon (and its condiVons if any) and join it with the small relaVon that has been
reduced
48
![Page 47: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/47.jpg)
After a bunch of steps…
NaVon Customer Order LineItem Product Supplier
σNName=“Canada”
πSName
RI
RI
RI
RI
RI
Index
49
![Page 48: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/48.jpg)
Multiple Instances of Each Relation
SELECT S.SName FROM NaVon, Customer, Order, LineItem L, Product P, Supplier S,
LineItem LE, Product PE, Supplier Enron WHERE NaVon.NaVonKey = Cuctomer.NaVonKey
AND Customer.CustKey = Order.CustKey AND Order.OrderKey=L.OrderKey AND L.PartKey= P.Partkey AND P.Suppkey = S.SuppKey AND Order.OrderKey=LE.OrderKey AND LE.PartKey= PE.Partkey AND PE.Suppkey = Enron.SuppKey AND Enron.Sname = “Enron” AND NName = “Cayman”
Find the names of suppliers
whose products appear in an
order made by a customer who is
in Cayman Islands and an Enron product appears in the
same order
50
![Page 49: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/49.jpg)
CName
CustKey
NaVonKey NName
Status OrderKey
QuanVty
PartKey SuppKey PName SName
Na2on Customer
Order LineItem L
Product P
Supplier S
SuppKey PName PartKey SName
Product PE
Supplier Enron
LineItem LE
QuanVty
Multiple Instances of Each Relation
51
![Page 50: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/50.jpg)
Multiple choices are possible
CName
CustKey
NaVonKey NName
Status OrderKey
QuanVty
PartKey SuppKey PName SName
Na2on Customer
Order LineItem L
Product P
Supplier S
SuppKey PName PartKey SName
Product PE
Supplier Enron
LineItem LE
QuanVty
52
![Page 51: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/51.jpg)
CName
CustKey
NaVonKey NName
Status OrderKey
QuanVty
PartKey SuppKey PName SName
Na2on Customer
Order LineItem L
Product P
Supplier S
SuppKey PName PartKey SName
Product PE
Supplier Enron
LineItem LE
QuanVty
53
![Page 52: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/52.jpg)
CName
CustKey
NaVonKey NName
Status OrderKey
QuanVty
PartKey SuppKey PName SName
Na2on Customer
Order LineItem L
Product P
Supplier S
SuppKey PName PartKey SName
Product PE
Supplier Enron
LineItem LE
QuanVty
54
![Page 53: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/53.jpg)
NaVon Customer Order
σNName=“Cayman”
RI
RI
Index
Enron PE LE
σSName=“Enron”
RI
RI
Index
LineItem Product Supplier
RI
RI
RI
55
![Page 54: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/54.jpg)
Algorithms for Relational Algebra Operators
• Three primary techniques Sorting Hashing Indexing
• Three degrees of difficulty Data small enough to fit in memory Too large to fit in main memory but small enough to be handled by a “two-pass” algorithm So large that “two-pass” methods have to be generalized to “multi-pass” methods (quite unlikely nowadays)
56
![Page 55: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/55.jpg)
Join Implementations • Nested-loop Join • Merge Join • Index Join • Hash Join
57
![Page 56: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/56.jpg)
R1 R2 on common attribute C
Join Example
58
![Page 57: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/57.jpg)
Nested-Loop Join • Nested loop join
(conceptually, without taking into account disk block issues)
for each r ∈ R1 do for each s ∈ R2 do if r.C = s.C then output r,s pair
59
![Page 58: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/58.jpg)
Merge Join • Merge join (conceptually)
(1) if R1 and R2 not sorted, sort them (2) i ← 1; j ← 1; While (i ≤ T(R1)) ∧ (j ≤ T(R2)) do if R1{ i }.C = R2{ j }.C then outputTuples else if R1{ i }.C > R2{ j }.C then j ← j+1 else if R1{ i }.C < R2{ j }.C then i ← i+1
60
![Page 59: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/59.jpg)
Merge Join (continued) Procedure Output-Tuples
While (R1{ i }.C = R2{ j }.C) ∧ (i ≤ T(R1)) do [jj ← j;
while (R1{ i }.C = R2{ jj }.C) ∧ (jj ≤ T(R2)) do [output pair R1{ i }, R2{ jj }; jj ← jj+1 ] i ← i+1 ]
61
![Page 60: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/60.jpg)
i R1{i}.C R2{j}.C j 1 10 5 1 2 20 20 2 3 20 20 3 4 30 30 4 5 40 30 5
50 6 52 7
Merge Join Example
62
![Page 61: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/61.jpg)
Index Join • Index Join (conceptually)
For each r ∈ R1 do [ X ← index (R2, C, r.C) for each s ∈ X do output r,s pair]
Assume R2.C index
Note: X ← index(rel, attr, value) means that X = set of rel tuples with attr = value
63
![Page 62: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/62.jpg)
Hash Join
Hash function h, range 0 → k Buckets for R1: G0, G1, ... Gk Buckets for R2: H0, H1, ... Hk
• Hash Join (conceptually)
(1) Hash R1 tuples into G buckets (2) Hash R2 tuples into H buckets (3) For i = 0 to k do match tuples in Gi, Hi buckets
64
![Page 63: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/63.jpg)
R1 R2 2 5 Even 4 4 3 12 Odd 5 3 8 13 9 8
11 14
2 4 8 4 12 8 14
3 5 9 5 3 13 11
R1 R2 Buckets
Hashing Example: Even/Odd
65
![Page 64: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/64.jpg)
(1) Tuples of relation stored physically together? (2) Relations sorted by join attribute? (3) Indexes exist?
Factors affecting performance
66
![Page 65: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/65.jpg)
• The above algorithms are conceptual and do not account for physical implementation details.
• In reality we have to specify exactly how the algorithm will operate in terms of disk/memory accesses.
• Then based on this information we can calculate the cost of each operator in terms of disk accesses.
More details
67
![Page 66: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/66.jpg)
Disk-oriented Computation Model • There are M main memory buffers.
Each buffer has the size of a disk block • The input relation is read one block at a time. • The cost is the number of blocks read. • If B consecutive blocks are read the cost is B/d. • The output buffers are not part of the M buffers
mentioned above. Pipelining allows the output buffers of an operator to be the input of the next one. We do not count the cost of writing the output.
68
![Page 67: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/67.jpg)
Disk-oriented Computation Model
69
Metric: # of I/Os (ignoring writing the result)
Caution: This may not be the best way to compare • ignoring CPU costs • ignoring timing • ignoring double buffering requirements
![Page 68: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/68.jpg)
Notation • M = # memory blocks available • B(R) = number of blocks that R occupies • T(R) = number of tuples of R • V(R,[a1, a2 ,…, an]) = number of distinct tuples in the projection
of R on a1, a2 ,…, an
70
![Page 69: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/69.jpg)
One-Pass Main Memory Algorithms for Unary Operators • Assumption: Enough memory to keep the relation • Projection and selection:
Scan the input relation R and apply operator one tuple at a time Incremental cost of “on the fly” operators is 0 Cost depends on
• clustering of R • whether the blocks are consecutive
• Duplicate elimination and aggregation Create one entry for each group and compute the aggregated value of the group It becomes hard to assume that CPU cost is negligible
• main memory data structures are needed
71
![Page 70: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/70.jpg)
for each block Br of R do store tuples of Br in main memory for each each block Bs of S do for each tuple s of Bs join tuples of s with matching tuples of R
One-Pass Nested Loop Join • Assume B(R) is less than M • Tuples of R should be stored in an efficient lookup structure • Algorithm:
What is the cost of this algorithm?
![Page 71: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/71.jpg)
Generalization of Nested-Loops
for each chunk of M-1 blocks Br of R do store tuples of Br in main memory for each each block Bs of S do for each tuple s of Bs join tuples of s with matching tuples of R
What is the cost of this algorithm?
• Works even if B(R) is greater than M
![Page 72: query& processing& · SELECT B,D FROM R,S WHERE R.A = “c” AND S.E = 2 AND R.C=S.C Query Processing Example SQL Query Example A B C a 1 10 b 1 20 c 2 10 d 2 35 e 3 45 R C D E 10](https://reader036.vdocument.in/reader036/viewer/2022070217/6121437202471905e5407b69/html5/thumbnails/72.jpg)
Two-Pass Hash-Based Algorithms
• General Idea: Hash the tuples of the input arguments in such a way that all tuples that must be considered together will have hashed to the same hash value. If there are M buffers pick M-1 as the number of hash buckets
• Example: Duplicate Elimination Phase 1: Hash each tuple of each input block into one of the M-1 bucket/buffers. When a buffer fills save to disk. Phase 2: For each bucket:
• load the bucket in main memory, • treat the bucket as a small relation and eliminate duplicates • save the bucket back to disk.
Catch: Each bucket has to be less than M. Cost = ?