temple university – cis dept. cis661 – principles of data management v. megalooikonomou query...
![Page 1: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/1.jpg)
Temple University – CIS Dept. CIS661 – Principles of Data Management
V. Megalooikonomou
Query Optimization
(based on slides by C. Faloutsos at CMU)
![Page 2: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/2.jpg)
General Overview - rel. model
- Relational model - SQL
- Functional Dependencies & Normalization
- Physical Design
- Indexing
- Query optimization
- Transaction processing
![Page 3: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/3.jpg)
Overview of a DBMS
[Figure: DBMS architecture. Users: DBA, casual user, naïve user. Components: DDL parser, DML parser, DML precomp., trans. mgr, buffer mgr, catalog, data-files.]
![Page 4: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/4.jpg)
Overview - detailed
- Why q-opt?
- Equivalence of expressions
- Cost estimation
- Cost of indices
- Join strategies
![Page 5: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/5.jpg)
Why Q-opt?
- SQL: ~declarative
- good q-opt -> big difference
- e.g., seq. scan vs. B-tree index, on P=1,000 pages
![Page 6: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/6.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alternative plans
- estimate cost; pick best
![Page 7: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/7.jpg)
Q-opt - example
select name
from STUDENT, TAKES
where c-id=‘CIS661’ and
STUDENT.ssn=TAKES.ssn
[Figure: two query trees over STUDENT and TAKES; one is labeled ‘Canonical form’]
![Page 8: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/8.jpg)
Q-opt - example
[Figure: query tree over STUDENT and TAKES, annotated with plan choices]
- access: index; seq. scan
- join: hash join; merge join; nested loops
![Page 9: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/9.jpg)
Overview - detailed
- Why q-opt?
- Equivalence of expressions
- Cost estimation
- Cost of indices
- Join strategies
![Page 10: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/10.jpg)
Equivalence of expressions
- A.k.a.: syntactic q-opt
- In short: perform selections and projections early
- More details: see transformation rules in text
![Page 11: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/11.jpg)
Equivalence of expressions
Q: How to prove a transformation rule?
A: use TRC, to show that LHS = RHS, e.g.:
$$\sigma_P(R_1 \cup R_2) \stackrel{?}{=} \sigma_P(R_1) \cup \sigma_P(R_2)$$
![Page 12: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/12.jpg)
Equivalence of expressions
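$$\sigma_P(R_1 \cup R_2) \stackrel{?}{=} \sigma_P(R_1) \cup \sigma_P(R_2)$$
$$\begin{aligned}
t \in \text{LHS} &\iff t \in (R_1 \cup R_2) \wedge P(t) \\
&\iff (t \in R_1 \vee t \in R_2) \wedge P(t) \\
&\iff (t \in R_1 \wedge P(t)) \vee (t \in R_2 \wedge P(t))
\end{aligned}$$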
![Page 13: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/13.jpg)
Equivalence of expressions
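$$\sigma_P(R_1 \cup R_2) \stackrel{?}{=} \sigma_P(R_1) \cup \sigma_P(R_2)$$
$$\begin{aligned}
\ldots &\iff (t \in R_1 \wedge P(t)) \vee (t \in R_2 \wedge P(t)) \\
&\iff (t \in \sigma_P(R_1)) \vee (t \in \sigma_P(R_2)) \\
&\iff t \in \sigma_P(R_1) \cup \sigma_P(R_2) \\
&\iff t \in \text{RHS}
\end{aligned}$$
QED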
![Page 14: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/14.jpg)
Equivalence of expressions
Q: how to disprove a rule??
$$\pi_A(R_1 - R_2) \stackrel{?}{=} \pi_A(R_1) - \pi_A(R_2)$$
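One common way to disprove a candidate rule is to exhibit a single counterexample. The sketch below assumes the rule under test is that projection distributes over set difference, i.e. π_A(R1 − R2) = π_A(R1) − π_A(R2). The two relations, the attribute names A and B, and the helper `project` are made up for illustration.

```python
def project(relation, positions):
    """Projection with duplicate elimination: keep only the given column positions."""
    return {tuple(row[i] for i in positions) for row in relation}

# Hypothetical relations over attributes (A, B).
R1 = {(1, "x"), (1, "y")}
R2 = {(1, "x")}

lhs = project(R1 - R2, [0])                 # difference first, then project on A
rhs = project(R1, [0]) - project(R2, [0])   # project on A first, then difference

print("LHS:", lhs)
print("RHS:", rhs)
print("rule holds on this instance:", lhs == rhs)
```

Here the tuple (1, "y") survives the difference on the left-hand side, while on the right-hand side the value A=1 is removed entirely, so the two sides differ.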
![Page 15: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/15.jpg)
Equivalence of expressions
Selections
- perform them early
- break a complex predicate, and push:
$$\sigma_{p_1 \wedge p_2 \wedge \ldots \wedge p_n}(R) = \sigma_{p_1}(\sigma_{p_2}(\ldots(\sigma_{p_n}(R))\ldots))$$
- simplify a complex predicate (‘X=Y and Y=3’) -> ‘X=3 and Y=3’
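A small check of the "break a complex predicate" rule, as a sketch: a conjunctive selection gives the same result as a cascade of single-predicate selections. The rows and the helper `select` are hypothetical; the predicates mirror the slide's ‘X=3 and Y=3’ example.

```python
# Hypothetical relation with attributes X and Y.
rows = [
    {"X": 3, "Y": 3},
    {"X": 3, "Y": 4},
    {"X": 5, "Y": 3},
]

def select(pred, relation):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if pred(row)]

p1 = lambda row: row["X"] == 3
p2 = lambda row: row["Y"] == 3

combined = select(lambda row: p1(row) and p2(row), rows)  # one selection on p1 AND p2
cascaded = select(p1, select(p2, rows))                   # selection on p2, then on p1

print(combined == cascaded)
```

The cascaded form is what lets an optimizer push each simple predicate down independently.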
![Page 16: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/16.jpg)
Equivalence of expressions
Projections
- perform them early (but carefully…)
  - Smaller tuples
  - Fewer tuples (if duplicates are eliminated)
- project out all attributes except the ones requested or required (e.g., joining attr.)
![Page 17: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/17.jpg)
Equivalence of expressions
Joins
- Commutative, associative:
$$R \bowtie S = S \bowtie R$$
$$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$$
- Q: n-way join - how many diff. orderings?
- … Exhaustive enumeration too slow…
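To get a feel for why exhaustive enumeration is too slow, the sketch below counts join orders using two standard textbook counts (they are not stated on the slide): n! left-deep orders, and (2(n-1))!/(n-1)! orders when every tree shape is allowed. The function names are made up for illustration.

```python
from math import factorial

def left_deep_orders(n):
    """Number of left-deep join orders for n relations: one per permutation."""
    return factorial(n)

def all_join_orders(n):
    """Number of join orders over all binary tree shapes: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 5, 10):
    print(n, left_deep_orders(n), all_join_orders(n))
```

Already at n = 10 the unrestricted count exceeds 17 billion, which is why practical optimizers prune the search space instead of enumerating it.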
![Page 18: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/18.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alt. plans
- estimate cost; pick best
![Page 19: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/19.jpg)
Cost estimation
- E.g., find ssn’s of students with an ‘A’ in CIS661 (using seq. scanning)
- How long will a query take?
  - CPU (but: small cost; decreasing; tough to estimate)
  - Disk (mainly, # block transfers)
- How many tuples will qualify?
- (what statistics do we need to keep?)
![Page 20: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/20.jpg)
Cost estimation
Statistics: for each relation ‘r’ we keep
- nr : # tuples;
- Sr : size of tuple in bytes
[Figure: relation ‘r’ drawn as tuples #1, #2, #3, …, #nr, each Sr bytes wide]
![Page 21: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/21.jpg)
Cost estimation
Statistics: for each relation ‘r’ we keep
- …
- V(A,r): number of distinct values of attr. ‘A’
- (recently, histograms, too)
![Page 22: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/22.jpg)
Derivable statistics
- fr: blocking factor = max# records/block (=?? )
- br: # blocks (=?? )
- SC(A,r) = selection cardinality = avg# of records with A=given (=?? )
[Figure: relation ‘r’ stored in blocks #1, #2, …, #br, each holding fr records of Sr bytes]
![Page 23: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/23.jpg)
Derivable statistics
- fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes)
- br: # blocks (= nr / fr )
![Page 24: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/24.jpg)
Derivable statistics
- SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...)
  - e.g.: 30,000 students, 10 colleges – how many students in CST?
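The derived statistics can be computed directly from the stored ones. This is a minimal sketch: the function names, the block size of 4,096 bytes, the tuple size of 200 bytes and the 10,000-tuple relation are assumptions for illustration, and rounding fr down and br up is my reading of "max# records/block" rather than something the slide states.

```python
from math import ceil

def blocking_factor(block_size, tuple_size):
    """fr = B / Sr, rounded down (assumes records do not span blocks)."""
    return block_size // tuple_size

def num_blocks(n_r, f_r):
    """br = nr / fr, rounded up to whole blocks."""
    return ceil(n_r / f_r)

def selection_cardinality(n_r, distinct_values):
    """SC(A,r) = nr / V(A,r), under the uniformity assumption."""
    return n_r / distinct_values

f_r = blocking_factor(block_size=4096, tuple_size=200)  # assumed B and Sr
b_r = num_blocks(n_r=10_000, f_r=f_r)                   # assumed nr
students_in_cst = selection_cardinality(30_000, 10)     # the slide's question

print(f_r, b_r, students_in_cst)
```

Under uniformity, each of the 10 colleges gets the same share of the 30,000 students, which answers the slide's question with 3,000.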
![Page 25: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/25.jpg)
Additional quantities we need:
For index ‘i’:
- fi: average fanout - degree (~50-100)
- HTi: # levels of index ‘i’ (~2-3) ~ log(#entries)/log(fi)
- LBi: # blocks at leaf level
![Page 26: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/26.jpg)
Statistics
- Where do we store them?
- How often do we update them?
![Page 27: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/27.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alt. plans
  - selections; sorting; projections
  - joins
- estimate cost; pick best
![Page 28: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/28.jpg)
Cost estimation + plan generation
Selections – e.g.,
select *
from TAKES
where grade = ‘A’
Plans?
![Page 29: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/29.jpg)
Cost estimation + plan generation
Plans?
- seq. scan
- binary search (if sorted & consecutive)
- index search if an index exists
![Page 30: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/30.jpg)
Cost estimation + plan generation
seq. scan – cost?
- br (worst case)
- br/2 (average, if we search for primary key)
![Page 31: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/31.jpg)
Cost estimation + plan generation
binary search – cost?
- if sorted and consecutive: ~log2(br) + SC(A,r)/fr - 1
- SC(A,r)/fr = # blocks spanned by qualified tuples
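The first two selection plans can be compared with a couple of one-line cost functions. This is a sketch of the formulas above, counting block accesses only; the function names are hypothetical, rounding up to whole blocks and the base-2 logarithm are assumptions, and the sample numbers (500 blocks, 20 records per block, 200 qualifying records) are chosen for illustration.

```python
from math import ceil, log2

def seq_scan_cost(b_r, on_primary_key=False):
    """br in the worst case; br/2 on average when searching for a primary key."""
    return b_r / 2 if on_primary_key else b_r

def binary_search_cost(b_r, sc, f_r):
    """~log2(br) to locate the first match, plus the blocks spanned by matches, minus 1."""
    return ceil(log2(b_r)) + ceil(sc / f_r) - 1

print(seq_scan_cost(500))                 # full scan
print(seq_scan_cost(500, True))           # average for a primary-key lookup
print(binary_search_cost(500, 200, 20))   # sorted file, 200 qualifying records
```

The "- 1" avoids counting twice the block where the binary search ends and the run of qualifying tuples begins.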
![Page 32: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/32.jpg)
Cost estimation + plan generation
- estimation of selection cardinalities SC(A,r): non-trivial – details later
![Page 33: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/33.jpg)
Cost estimation + plan generation
method#3: index – cost?
- levels of index + blocks w/ qual. tuples
  - case#1: primary key
  - case#2: sec. key – clustering index
  - case#3: sec. key – non-clust. index
![Page 34: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/34.jpg)
Cost estimation + plan generation
method#3: index – cost?
- levels of index + blocks w/ qual. tuples
  - case#1: primary key – cost: HTi + 1
![Page 35: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/35.jpg)
Cost estimation + plan generation
method#3: index – cost?
- levels of index + blocks w/ qual. tuples
  - case#2: sec. key – clustering index OR prim. index on non-key
  - …retrieve multiple records
  - cost: HTi + SC(A,r)/fr
![Page 36: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/36.jpg)
Cost estimation + plan generation
method#3: index – cost?
- levels of index + blocks w/ qual. tuples
  - case#3: sec. key – non-clust. index
  - cost: HTi + SC(A,r) (actually, pessimistic...)
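The three index cases differ only in how many data blocks are fetched after descending the index. A minimal sketch of the three formulas follows; the function names are made up, and the sample values (HTi = 2, SC = 200, fr = 20) are assumptions for illustration.

```python
from math import ceil

def cost_primary_key(ht_i):
    """case#1: descend the index, then fetch the single qualifying block."""
    return ht_i + 1

def cost_clustering(ht_i, sc, f_r):
    """case#2: qualifying records are stored consecutively, SC/fr blocks."""
    return ht_i + ceil(sc / f_r)

def cost_non_clustering(ht_i, sc):
    """case#3: pessimistically, one block access per qualifying record."""
    return ht_i + sc

ht_i, sc, f_r = 2, 200, 20   # assumed sample values
print(cost_primary_key(ht_i), cost_clustering(ht_i, sc, f_r), cost_non_clustering(ht_i, sc))
```

Case 3 is pessimistic because several qualifying records may happen to share a block, in which case fewer than SC accesses are needed.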
![Page 37: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/37.jpg)
Cost estimation – arithmetic examples
- find accounts with branch-name = ‘Perryridge’
- account(branch-name, balance, ...)
![Page 38: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/38.jpg)
Arithm. examples – cont’d
- n-account = 10,000 tuples
- f-account = 20 tuples/block
- V(balance, account) = 500 distinct values
- V(branch-name, account) = 50 distinct values
- for branch-index: fanout fi = 20
![Page 39: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/39.jpg)
Arithm. examples
- Q1: cost of seq. scan?
- A1: 500 disk accesses (b-account = n-account / f-account = 10,000/20)
- Q2: assume a clustering index on branch-name – cost?
![Page 40: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/40.jpg)
Cost estimation + plan generation
method#3: index – cost?
- levels of index + blocks w/ qual. tuples
  - case#2: sec. key – clustering index
  - cost: HTi + SC(A,r)/fr
![Page 41: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/41.jpg)
Arithm. examples
A2: HTi + SC(branch-name, account)/f-account
- HTi: 50 values, with index fanout 20 -> HT=2 levels (log(50)/log(20) = 1+)
- SC(..) = # qualified records = nr/V(A,r) = 10,000/50 = 200 tuples
- SC/f: spanning 200/20 blocks = 10 blocks
![Page 42: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/42.jpg)
Arithm. examples
- A2 final answer: 2+10 = 12 block accesses (vs. 500 block accesses of seq. scan)
- footnote: in all fairness
  - seq. disk accesses: ~2msec or less
  - random disk accesses: ~10msec
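The arithmetic of this example, including the footnote's timing, can be replayed in a few lines. This is a sketch using the slide's numbers; treating every access of the index plan as a random access is an assumption (and a pessimistic one, since the 10 clustered data blocks may be read sequentially).

```python
from math import ceil, log

n_account = 10_000   # tuples
f_account = 20       # tuples per block
v_branch = 50        # distinct branch-name values
fanout = 20          # fanout of the branch-name index

b_account = ceil(n_account / f_account)       # blocks read by a seq. scan
ht = ceil(log(v_branch) / log(fanout))        # index levels
sc = n_account // v_branch                    # qualifying tuples
index_cost = ht + ceil(sc / f_account)        # clustering-index plan

SEQ_MS, RANDOM_MS = 2, 10                     # per-access times from the footnote
seq_time_ms = b_account * SEQ_MS
index_time_ms = index_cost * RANDOM_MS        # assumes all 12 accesses are random

print(b_account, ht, sc, index_cost)
print(seq_time_ms, index_time_ms)
```

Even after charging the index plan the slower random-access time, it comes out roughly 8x faster than the scan here, rather than the roughly 40x suggested by the raw block counts.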
![Page 43: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/43.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alternative plans
  - selections; sorting; projections
  - joins
- estimate cost; pick best
![Page 44: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/44.jpg)
General Overview - rel. model
- Relational model - SQL
- Functional Dependencies & Normalization
- Physical Design
- Indexing
- Query optimization
- Transaction processing
![Page 45: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/45.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alt. plans
  - selections (simple; complex predicates)
  - sorting; projections
  - joins
- estimate cost; pick best
![Page 46: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/46.jpg)
Reminder – statistics: for each relation ‘r’ we keep
- nr : # tuples;
- Sr : size of tuple in bytes
- V(A,r): number of distinct values of attr. ‘A’
- fr: blocking factor
- br: number of blocks
- SC(A,r): selection cardinality (avg.# of records with A=given)
![Page 47: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/47.jpg)
Selections
- we saw simple predicates (A=constant; e.g., ‘name=Smith’)
- how about more complex predicates, like
  - ‘salary > 10K’
  - ‘age = 30 and job-code=“analyst” ’
- what is their selectivity?
![Page 48: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/48.jpg)
Selections – complex predicates
- selectivity sel(P) of predicate P :== fraction of tuples that qualify
- sel(P) = SC(P) / nr
![Page 49: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/49.jpg)
Selections – complex predicates
- e.g., assume that V(grade, TAKES)=5 distinct values
- simple predicate P: A=constant
  - sel(A=constant) = 1/V(A,r)
  - e.g., sel(grade=‘B’) = 1/5
- (what if V(A,r) is unknown??)
[Figure: histogram of count vs. grade, grades from A to F]
![Page 50: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/50.jpg)
Selections – complex predicates
- range query: sel( grade >= ‘C’)
  - sel(A>a) = (Amax – a) / (Amax – Amin)
![Page 51: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/51.jpg)
Selections – complex predicates
- negation: sel( grade != ‘C’)
  - sel( not P) = 1 – sel(P)
- (Observation: selectivity =~ probability)
![Page 52: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/52.jpg)
Selections – complex predicates
- conjunction: sel( grade = ‘C’ and course = ‘CIS661’)
  - sel(P1 and P2) = sel(P1) * sel(P2)
  - INDEPENDENCE ASSUMPTION
![Page 53: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/53.jpg)
Selections – complex predicates
- disjunction: sel( grade = ‘C’ or course = ‘CIS661’)
  - sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1 and P2)
  - = sel(P1) + sel(P2) – sel(P1)*sel(P2)
  - INDEPENDENCE ASSUMPTION, again
![Page 54: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/54.jpg)
Selections – complex predicates
- disjunction: in general
  - sel(P1 or P2 or … Pn) = 1 - (1 - sel(P1)) * (1 - sel(P2)) * … (1 - sel(Pn))
![Page 55: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/55.jpg)
Selections – summary
- sel(A=constant) = 1/V(A,r)
- sel( A>a) = (Amax – a) / (Amax – Amin)
- sel(not P) = 1 – sel(P)
- sel(P1 and P2) = sel(P1) * sel(P2)
- sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1)*sel(P2)
- UNIFORMITY and INDEPENDENCE ASSUMPTIONS
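The summary rules translate almost verbatim into code. The sketch below uses exact fractions so the results can be checked by hand; the function names are hypothetical, V(grade, TAKES) = 5 comes from the slides, and V(course, TAKES) = 10 and the numeric grade scale 0..4 are assumptions for illustration.

```python
from fractions import Fraction
from functools import reduce

def sel_eq(distinct_values):
    """sel(A = constant) = 1 / V(A,r), assuming uniformity."""
    return Fraction(1, distinct_values)

def sel_gt(a, a_min, a_max):
    """sel(A > a) = (Amax - a) / (Amax - Amin), assuming uniformity."""
    return Fraction(a_max - a, a_max - a_min)

def sel_not(s):
    return 1 - s

def sel_and(s1, s2):
    """Conjunction under the independence assumption."""
    return s1 * s2

def sel_or(*sels):
    """Disjunction: 1 minus the probability that no predicate holds."""
    none_hold = reduce(lambda acc, s: acc * (1 - s), sels, Fraction(1))
    return 1 - none_hold

grade_c = sel_eq(5)     # V(grade, TAKES) = 5, from the slides
course = sel_eq(10)     # V(course, TAKES) = 10 is an assumed value

print(sel_not(grade_c), sel_and(grade_c, course), sel_or(grade_c, course))
print(sel_gt(2, 0, 4))  # assumed numeric grade scale 0..4
```

For two predicates, `sel_or` agrees with the two-term formula sel(P1) + sel(P2) – sel(P1)*sel(P2); the product form simply generalizes it to n predicates.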
![Page 56: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/56.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alt. plans
  - selections (simple; complex predicates)
  - sorting; projections
  - joins
- estimate cost; pick best
![Page 57: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/57.jpg)
Sorting
- Assume br blocks of rel. ‘r’, and only M (<br) buffers in main memory
- Q1: how to sort (‘external sorting’)?
- Q2: cost?
[Figure: M buffers in main memory (1, 2, …, M) and relation ‘r’ on disk (blocks 1, …, br)]
![Page 58: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/58.jpg)
Sorting
- Q1: how to sort (‘external sorting’)?
- A1:
  - create sorted runs of size M
  - merge
![Page 59: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/59.jpg)
Sorting
- create sorted runs of size M (how many?)
- merge them (how?)
![Page 60: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/60.jpg)
Sorting
- create sorted runs of size M
- merge first M-1 runs into a sorted run of size (M-1)*M, ...
![Page 61: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/61.jpg)
Sorting
- How many steps do we need? ‘i’, where M*(M-1)^i > br
- How many reads/writes per step? br+br (each step reads every block once and writes it once)
![Page 62: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/62.jpg)
Sorting
- In short, excluding the final ‘write’, we need ceil(log(br/M) / log(M-1)) * 2 * br + br
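The external-sort cost can be sketched by counting merge passes directly. The function names and the sample values (br = 1,000 blocks, M = 11 buffers) are assumptions for illustration; the pass count is computed with an integer loop instead of floating-point logarithms to avoid rounding surprises.

```python
from math import ceil

def merge_passes(b_r, m):
    """Number of (M-1)-way merge passes needed after creating runs of M blocks."""
    if m < 3:
        raise ValueError("need at least 3 buffers for a merge that makes progress")
    runs = ceil(b_r / m)          # initial sorted runs
    passes = 0
    while runs > 1:
        runs = ceil(runs / (m - 1))
        passes += 1
    return passes

def external_sort_cost(b_r, m):
    """Block accesses, excluding the final write: passes * 2 * br + br."""
    return merge_passes(b_r, m) * 2 * b_r + b_r

print(merge_passes(1000, 11), external_sort_cost(1000, 11))
```

The trailing "+ br" comes from run creation (read br, write br) combined with dropping the last pass's br writes.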
![Page 63: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/63.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alt. plans
  - selections (simple; complex predicates)
  - sorting; projections, aggregations
  - joins
- estimate cost; pick best
![Page 64: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/64.jpg)
Projection - dupl. elimination
e.g.,
select distinct c-id
from TAKES
How? Pros and cons?
![Page 65: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/65.jpg)
Set operations
e.g.,
select * from REGULAR-STUDENT
union
select * from SPECIAL-STUDENT
How? Pros and cons?
![Page 66: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/66.jpg)
Aggregations
e.g.,
select ssn, avg(grade)
from TAKES
group by ssn
How?
![Page 67: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/67.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alt. plans
  - selections; sorting; projections, aggregations
  - joins
    - 2-way joins
    - n-way joins
- estimate cost; pick best
![Page 68: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/68.jpg)
2-way joins
- output size estimation: r JOIN s
- nr, ns tuples each
- case#1: cartesian product (R, S have no common attribute)
  - #of output tuples=??
![Page 69: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/69.jpg)
2-way joins
- output size estimation: r JOIN s
- case#2: r(A,B), s(A,C,D), A is cand. key for ‘r’
  - #of output tuples=??
  - <= ns
[Figure: r(A, ...) with nr tuples; s(A, ......) with ns tuples]
![Page 70: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/70.jpg)
2-way joins
- output size estimation: r JOIN s
- case#3: r(A,B), s(A,C,D), A is cand. key for neither (is it possible??)
  - #of output tuples=??
![Page 71: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/71.jpg)
2-way joins
- #of output tuples = nr * ns/V(A,s) or ns * nr/V(A,r) (whichever is less)
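The three output-size cases can be written as small estimators. This is a sketch: the function names are hypothetical, the case#1 answer (nr * ns for a cartesian product) is the standard one rather than something the slide spells out, and the values of V(A,r) and V(A,s) are assumptions for illustration.

```python
def cartesian_size(n_r, n_s):
    """case#1: no common attribute, so every pair of tuples qualifies."""
    return n_r * n_s

def key_join_upper_bound(n_s):
    """case#2: A is a candidate key of r, so each s tuple matches at most one r tuple."""
    return n_s

def join_size_estimate(n_r, n_s, v_a_r, v_a_s):
    """case#3: the smaller of nr*ns/V(A,s) and ns*nr/V(A,r)."""
    return min(n_r * n_s / v_a_s, n_s * n_r / v_a_r)

n_r, n_s = 10_000, 1_000
print(cartesian_size(n_r, n_s))
print(key_join_upper_bound(n_s))
print(join_size_estimate(n_r, n_s, v_a_r=500, v_a_s=100))  # assumed V(A,r), V(A,s)
```

Taking the smaller estimate amounts to dividing by the larger of the two distinct-value counts.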
![Page 72: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/72.jpg)
Q-opt steps
- bring query in internal form (e.g., parse tree)
- … into ‘canonical form’ (syntactic q-opt)
- generate alt. plans
  - selections; sorting; projections, aggregations
  - joins
    - 2-way joins - output size estimation; algorithms
    - n-way joins
- estimate cost; pick best
![Page 73: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/73.jpg)
2-way joins
- algorithm(s) for r JOIN s?
- nr, ns tuples each
[Figure: r(A, ...) with nr tuples; s(A, ......) with ns tuples]
![Page 74: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/74.jpg)
2-way joins
Algorithm #0: (naive) nested loop (SLOW!)
for each tuple tr of r
  for each tuple ts of s
    print, if they match
![Page 75: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/75.jpg)
2-way joins
Algorithm #0: why is it bad?
- how many disk accesses (‘br’ and ‘bs’ are the number of blocks for ‘r’ and ‘s’)?
- nr*bs + br
![Page 76: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/76.jpg)
2-way joins
Algorithm #1: Blocked nested-loop join
read in a block of r
  read in a block of s
    print matching tuples
r(A, ...): nr records, br blocks; s(A, ...): ns records, bs blocks
cost: br + br * bs
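A rough sketch of the blocked variant, assuming a relation is modelled as a list of blocks and each block as a list of tuples (field 0 is the join attribute). The helper `to_blocks` and the read counter are illustrative additions, used only to show that the number of block reads comes out as br + br * bs.

```python
def to_blocks(rows, rows_per_block):
    """Split a list of tuples into fixed-size blocks (hypothetical helper)."""
    return [rows[i:i + rows_per_block] for i in range(0, len(rows), rows_per_block)]


def block_nested_loop_join(r_blocks, s_blocks):
    """Algorithm #1: join block by block and count simulated block reads."""
    reads = 0
    result = []
    for r_block in r_blocks:
        reads += 1                    # read one block of r
        for s_block in s_blocks:
            reads += 1                # read one block of s
            for tr in r_block:
                for ts in s_block:
                    if tr[0] == ts[0]:
                        result.append(tr + ts[1:])
    return result, reads


r_blocks = to_blocks([(i, f"r{i}") for i in range(6)], 2)      # br = 3
s_blocks = to_blocks([(i, f"s{i}") for i in range(4, 8)], 2)   # bs = 2
joined, reads = block_nested_loop_join(r_blocks, s_blocks)
print(joined, reads)
```

With br = 3 and bs = 2 the counter ends at 3 + 3 * 2 = 9 block reads.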
![Page 77: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/77.jpg)
2-way joins
Arithmetic example:
r(A, ...): nr = 10,000 tuples, br = 1,000 blocks
s(A, ...): ns = 1,000 tuples, bs = 200 blocks
alg#0: 2,001,000 disk accesses (d.a.)
alg#1: 201,000 d.a.
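The two figures follow directly from the cost formulas on the earlier slides. A small check, with function names chosen here for illustration:

```python
def cost_naive(nr, br, bs):
    """Algorithm #0: one scan of s per tuple of r, plus one scan of r."""
    return nr * bs + br


def cost_blocked(br, bs):
    """Algorithm #1: one scan of s per block of r, plus one scan of r."""
    return br + br * bs


nr, br = 10_000, 1_000   # relation r
ns, bs = 1_000, 200      # relation s (ns is not needed by either formula)

alg0 = cost_naive(nr, br, bs)
alg1 = cost_blocked(br, bs)
print(f"alg#0: {alg0:,} d.a.")
print(f"alg#1: {alg1:,} d.a.")
```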
![Page 78: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/78.jpg)
2-way joins
Observation1: Algo#1 is asymmetric:
cost: br + br * bs
reverse roles: cost = bs + bs * br
Best choice?
r(A, ...): nr records, br blocks; s(A, ...): ns records, bs blocks
smaller relation in outer loop
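To see the asymmetry with numbers, the sketch below reuses the block counts from the arithmetic example (br = 1,000, bs = 200). The function name is an assumption made for this illustration.

```python
def blocked_cost(b_outer, b_inner):
    """Blocked nested-loop cost: scan the outer once, the inner once per outer block."""
    return b_outer + b_outer * b_inner


br, bs = 1_000, 200

cost_r_outer = blocked_cost(br, bs)   # r in the outer loop
cost_s_outer = blocked_cost(bs, br)   # roles reversed: s in the outer loop
print(cost_r_outer, cost_s_outer)
```

The product term is the same either way; only the additive term changes, so the saving from putting the smaller relation outside is br - bs = 800 disk accesses here.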
![Page 79: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/79.jpg)
2-way joins
Observation2 [NOT IN BOOK]:
what if we have ‘k’ buffers available?
read in ‘k-1’ blocks of ‘r’
  read in a block of ‘s’
    print matching tuples
r(A, ...): nr records, br blocks; s(A, ...): ns records, bs blocks
![Page 80: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/80.jpg)
2-way joins
Cost?
read in ‘k-1’ blocks of ‘r’
  read in a block of ‘s’
    print matching tuples
r(A, ...): nr records, br blocks; s(A, ...): ns records, bs blocks
br + br/(k-1) * bs
(what if br=k-1? what if we assign k-1 blocks to inner?)
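A sketch of the cost with ‘k’ buffers, again using br = 1,000 and bs = 200 from the arithmetic example. One assumption beyond the slide: the number of outer chunks is rounded up to a whole number, since a partially filled last chunk still needs its own pass over ‘s’.

```python
def cost_with_k_buffers(br, bs, k):
    """Blocked nested loop with k-1 buffers for the outer relation r, 1 for s."""
    if k < 2:
        raise ValueError("need at least one buffer for r and one for s")
    chunks = (br + (k - 2)) // (k - 1)   # ceil(br / (k-1)) in integer arithmetic
    return br + chunks * bs


br, bs = 1_000, 200
cost_k2 = cost_with_k_buffers(br, bs, 2)          # one block of r at a time
cost_k101 = cost_with_k_buffers(br, bs, 101)      # 100 blocks of r at a time
cost_all_r = cost_with_k_buffers(br, bs, br + 1)  # br = k-1: all of r fits
print(cost_k2, cost_k101, cost_all_r)
```

With k = 2 this reduces to Algorithm #1. When br = k-1 the whole outer relation fits in memory and the cost drops to br + bs, a single pass over each relation.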
![Page 81: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/81.jpg)
2-way joins
Observation3: can we get rid of the ‘br’ term?
cost: br + br * bs
r(A, ...): nr records, br blocks; s(A, ...): ns records, bs blocks
A: read the inner relation backwards half of the times!
Q: cons?
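One way to read this hint: if scans of ‘s’ alternate direction, the block read last in one pass is the first block needed in the next pass, so it can be reused from the buffer. The simulation below is a sketch under that assumption (a single buffer for ‘s’ that keeps its last block); it is not taken from the slides.

```python
def inner_reads_alternating(br, bs):
    """Count reads of s when successive scans alternate direction."""
    reads = 0
    buffered = None      # index of the s block currently held in memory
    forward = True
    for _ in range(br):  # one pass over s per block of r
        order = range(bs) if forward else range(bs - 1, -1, -1)
        for block in order:
            if block != buffered:   # only go to disk if the block is not buffered
                reads += 1
                buffered = block
        forward = not forward
    return reads


br, bs = 1_000, 200
plain_total = br + br * bs                               # always scanning forward
alternating_total = br + inner_reads_alternating(br, bs)
print(plain_total, alternating_total)
```

Under these assumptions each pass after the first saves one read, so the total comes to br * bs + 1 and the separate ‘br’ term disappears.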
![Page 82: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/82.jpg)
2-way joins
Other algorithm(s) for r JOIN s?
r(A, ...): nr tuples; s(A, ...): ns tuples
![Page 83: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/83.jpg)
2-way joins - other algo’s
sort-merge
sort ‘r’; sort ‘s’; merge sorted versions
(good, if one or both are already sorted)
r(A, ...): nr tuples; s(A, ...): ns tuples
![Page 84: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/84.jpg)
2-way joins - other algo’s
sort-merge - cost: ~ 2 * br * log(br) + 2 * bs * log(bs) + br + bs
needs temporary space (for sorted versions)
gives output in sorted order
r(A, ...): nr tuples; s(A, ...): ns tuples
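A minimal in-memory sketch of sort-merge, assuming tuples whose first field is the join attribute. Python's built-in `sorted` stands in for the external sort a DBMS would use, so the disk cost above is not modelled; the sample relations are made up.

```python
def sort_merge_join(r, s):
    """Sort both inputs on the join attribute, then merge them in one pass."""
    r_sorted = sorted(r, key=lambda t: t[0])
    s_sorted = sorted(s, key=lambda t: t[0])
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        a, b = r_sorted[i][0], s_sorted[j][0]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the run of s tuples that share this key.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][0] == a:
                j_end += 1
            # Pair every r tuple carrying this key with the whole s run.
            while i < len(r_sorted) and r_sorted[i][0] == a:
                for ts in s_sorted[j:j_end]:
                    result.append(r_sorted[i] + ts[1:])
                i += 1
            j = j_end
    return result


r = [(3, "r3"), (1, "r1"), (2, "r2a"), (2, "r2b")]
s = [(2, "s2a"), (4, "s4"), (2, "s2b"), (3, "s3")]
joined = sort_merge_join(r, s)
print(joined)
```

The result comes out ordered on the join attribute, matching the slide's remark that sort-merge gives output in sorted order.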
![Page 85: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/85.jpg)
2-way joins - other algo’s
use an existing index, or even build one on the fly
cost: br + nr * c (c: look-up cost)
r(A, ...): nr tuples; s(A, ...): ns tuples
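A sketch of the index-based join, assuming the index on s.A is built on the fly as an in-memory dictionary. A real system would use a B-tree or hash index on disk, which is where the look-up cost ‘c’ comes from. The look-up counter is an illustrative addition.

```python
from collections import defaultdict


def index_join(r, s):
    """Probe an index on s.A once for every tuple of r."""
    index = defaultdict(list)        # join attribute value -> matching s tuples
    for ts in s:
        index[ts[0]].append(ts)

    result = []
    lookups = 0
    for tr in r:                     # one scan of r ...
        lookups += 1                 # ... and one index look-up per tuple of r
        for ts in index.get(tr[0], []):
            result.append(tr + ts[1:])
    return result, lookups


r = [(1, "r1"), (2, "r2"), (3, "r3")]
s = [(3, "s3"), (2, "s2a"), (2, "s2b"), (5, "s5")]
joined, lookups = index_join(r, s)
print(joined, lookups)
```

The number of look-ups equals nr, which corresponds to the nr * c term in the cost.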
![Page 86: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/86.jpg)
2-way joins - other algo’s
hash join:
hash ‘r’ into (0, 1, ..., ‘max’) buckets
hash ‘s’ into buckets (same hash function)
join each pair of matching buckets
[Figure: r(A, ...) and s(A, ...), each partitioned into buckets 0, 1, ..., max]
![Page 87: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/87.jpg)
2-way joins - hash join details
how to join each pair of partitions Hr-i, Hs-i?
A: build another hash table for Hs-i, and probe (look up) it with each tuple of Hr-i
[Figure: r(A, ...) and s(A, ...) partitioned into buckets 0, 1, ..., max; partitions Hr-0 and Hs-0 are paired]
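The two phases can be sketched in memory as follows. The bucket count, the use of Python's built-in `hash`, and the sample data are assumptions for illustration; a DBMS would write the partitions to disk between the phases.

```python
def hash_join(r, s, num_buckets=4):
    """Partition r and s with the same hash function, then join bucket pairs."""
    # Phase 1: partition both relations on the join attribute (field 0).
    r_parts = [[] for _ in range(num_buckets)]
    s_parts = [[] for _ in range(num_buckets)]
    for tr in r:
        r_parts[hash(tr[0]) % num_buckets].append(tr)
    for ts in s:
        s_parts[hash(ts[0]) % num_buckets].append(ts)

    # Phase 2: for each pair (Hr-i, Hs-i), build a hash table on Hs-i
    # and probe it with every tuple of Hr-i.
    result = []
    for hr_i, hs_i in zip(r_parts, s_parts):
        table = {}
        for ts in hs_i:
            table.setdefault(ts[0], []).append(ts)
        for tr in hr_i:
            for ts in table.get(tr[0], []):
                result.append(tr + ts[1:])
    return result


r = [(i, f"r{i}") for i in range(10)]
s = [(i, f"s{i}") for i in range(5, 15)]
joined = hash_join(r, s)
print(sorted(joined))
```

Tuples with equal join values always land in the same bucket number, so only matching bucket pairs ever need to be compared.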
![Page 88: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/88.jpg)
2-way joins - hash join details
what if Hs-i is too large to fit in main-memory?
A: recursive partitioning
more details (overflows, hybrid hash joins): in book
cost of hash join? (under certain assumptions)
2(br + bs) + (br + bs) + 4 * max
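A sketch of the cost formula, with the three terms labelled according to one plausible reading (partitioning, probing, and an allowance for partially filled blocks). That reading and the bucket count of 10 are assumptions; br and bs are taken from the earlier arithmetic example.

```python
def hash_join_cost(br, bs, max_buckets):
    """Evaluate 2(br + bs) + (br + bs) + 4 * max term by term."""
    partition = 2 * (br + bs)    # read both relations, write out the partitions
    probe = br + bs              # read every partition back to build and probe
    partial = 4 * max_buckets    # assumed allowance for partially filled blocks
    return partition + probe + partial


br, bs = 1_000, 200
cost = hash_join_cost(br, bs, max_buckets=10)   # 10 buckets is a made-up value
print(cost)
```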
![Page 89: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/89.jpg)
Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections; sorting; projections, aggregations
  joins
    2-way joins - output size estimation; algorithms
    n-way joins
estimate cost; pick best
![Page 90: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/90.jpg)
n-way joins
r1 JOIN r2 JOIN ... JOIN rn
typically, break problem into 2-way joins
![Page 91: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/91.jpg)
Structure of query optimizers:
System R:
break query into query blocks
simple queries (i.e., no joins): look at stats
n-way joins: left-deep join trees; i.e., only one intermediate result at a time
  pros: smaller search space; pipelining
  cons: may miss optimal
2-way joins: nested-loop and sort-merge
[Figure: join tree over r1, r2, r3, r4]
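To put a number on "smaller search space": n relations can be arranged in n! left-deep join orders, while the count over all join-tree shapes is (2(n-1))!/(n-1)!. The sketch below only evaluates these two standard counts; the function names are chosen for illustration.

```python
from math import factorial


def left_deep_orders(n):
    """Number of left-deep join orders over n relations."""
    return factorial(n)


def all_join_orders(n):
    """Number of join orders over n relations when every tree shape is allowed."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (3, 4, 7, 10):
    print(n, left_deep_orders(n), all_join_orders(n))
```

For the four relations r1 to r4 this gives 24 left-deep orders against 120 orders overall, and the gap widens quickly as n grows.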
![Page 92: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/92.jpg)
Structure of query optimizers:
More heuristics by Oracle, Sybase and Starburst (-> DB2): in book
In general: q-opt is very important for large databases.
(‘explain select <sql-statement>’ gives plan)
![Page 93: Temple University – CIS Dept. CIS661 – Principles of Data Management V. Megalooikonomou Query Optimization (based on slides by C. Faloutsos at CMU)](https://reader036.vdocument.in/reader036/viewer/2022062422/56649eef5503460f94bff62f/html5/thumbnails/93.jpg)
Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections
  joins
estimate cost; pick best