Reducing Order Enforcement Cost in Complex Query Plans
Ravindra Guravannavar and S. Sudarshan (To appear in ICDE 2007)
Background
- Sort-based query processing algorithms
  - Sort-merge Join (also Union/Intersection)
  - Sort-based grouping and duplicate elimination
  - Explicit “order by”
- Notion of “Interesting Sort Orders” (System-R)
  - Find and remember the best plan for each sort order that may be useful
  - Optimization goal in Volcano: (expr, sort-order)
The Problem
- Interesting orders can be too many!
  - Factorial in number of attributes involved
- Plan cost can vary substantially with the choice of interesting order
  - Clustering and covering indices
  - Other operators in the input sub-expressions
  - Possibility of partial sorting
[Figure: Group By {a2,a4,a5,…} over the join of R and S on R.a1=S.a1 and R.a2=S.a2 … R.an=S.an]
Motivation
- Joins in data integration and decision support involve a large number of attributes
- Increasing use of covering indices
  - Several alternative sort orders
  - Partial sorting
- Query patterns
  - Attributes common to multiple operators
- Known techniques
  - Work only for unary operators like group-by
Outline of the Talk
- Partial sorting
  - Changes to external sort
  - Optimizer changes to handle partial sort orders
- Interesting orders for a join tree: a special case
  - Problem is NP-Hard
  - A 2-approximation for the special case
- The general problem
  - Notion of favorable orders
  - Plan generation using favorable orders
  - Post-optimization phase
- Experimental results
Exploiting Partial Sort Orders
- Sort on (a1, a2) given (a1)
- Standard external-sort
  - Cost is independent of input sort order
- Replacement-selection
  - Produces single run but incurs I/O
- Both methods break the pipeline: first o/p tuple only after reading all i/p
[Figure: join of R and S on R.a1=S.a1 and R.a2=S.a2, with C. Index on (R.a1); sort requirements (a1) → (a1,a2) and () → (a1,a2)]
A Minor Change to External Sorting
- Multiple “partial sort segments”
- Hold only one segment at any given time
- When a new segment starts, sort the current segment and output it
- No run generation I/O if each segment fits in memory
- Early output (good for Top-K)
- Reduced comparisons: O(n log(n/k)) vs. O(n log n), k = # segments
Sample input, sorted on a1 only:
a1 a2
1 2
1 1
1 5
1 3
2 4
2 1
2 6
6 3
… …
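The segment-at-a-time idea on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: every partial sort segment fits in memory, rows are tuples whose leading columns are the sort attributes, and the name `partial_sort` is hypothetical rather than anything from the talk.

```python
from itertools import groupby


def partial_sort(rows, prefix_len, key_len):
    """Yield rows sorted on their first `key_len` columns.

    Assumes `rows` already arrive sorted on their first `prefix_len`
    columns and that each run of equal prefix values (a "partial sort
    segment") fits in memory.
    """
    for _, segment in groupby(rows, key=lambda r: r[:prefix_len]):
        # Only the current segment is held and sorted; its rows are
        # emitted before the next segment is read (early output).
        yield from sorted(segment, key=lambda r: r[:key_len])


# The (a1, a2) rows from the slide: sorted on a1, required order (a1, a2).
rows = [(1, 2), (1, 1), (1, 5), (1, 3), (2, 4), (2, 1), (2, 6), (6, 3)]
print(list(partial_sort(rows, prefix_len=1, key_len=2)))
# [(1, 1), (1, 2), (1, 3), (1, 5), (2, 1), (2, 4), (2, 6), (6, 3)]
```

With k equally sized segments, each per-segment sort handles n/k rows, which is where the O(n log(n/k)) comparison count comes from.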
Optimizer Changes to Handle Partial Sort Orders
Cost Model for Partial Sort:
- Let the input order be o1
- Let the required (output) order be o2
- Let os = longest common prefix of o1 and o2
- Let or = o2 – os (i.e., os + or = o2)
- A(o) = attribute set of order o
- ε = empty (no) sort order
coe(e, o1, o2) = D(e, A(os)) × coe(e′, ε, or), where e′ = σ_p(e) and p equates A(os) to a constant.
Optimizer Changes to Handle Partial Sort Orders
Cost Model for Partial Sort:
coe(e, o1, o2) = D(e, A(os)) × coe(e′, ε, or), where e′ = σ_p(e) and p equates A(os) to a constant.
Example, for an input expression e:
- o1 = (a,b)
- o2 = (a,c)
- os = (a), or = (c), e′ = σ_{a=k}(e)
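A rough numeric sketch of this cost model, in Python. Several things here are assumptions rather than content of the slides: D(e, A) is read as the number of distinct values of attribute set A in e, values are assumed uniformly distributed so every segment has the same size, and `full_sort_cost` is a hypothetical comparison-count stand-in for coe(e, ε, o) that ignores I/O.

```python
import math


def full_sort_cost(num_rows):
    """Hypothetical stand-in for coe(e, empty, o): n * log2(n) comparisons."""
    return num_rows * math.log2(num_rows) if num_rows > 1 else 0.0


def longest_common_prefix(o1, o2):
    """Return os, the longest common prefix of two attribute orders."""
    prefix = []
    for x, y in zip(o1, o2):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def partial_sort_cost(num_rows, distinct_prefix_values, o1, o2):
    """Estimate coe(e, o1, o2) as D(e, A(os)) * coe(e', empty, or)."""
    os_ = longest_common_prefix(o1, o2)
    if len(os_) == len(o2):
        return 0.0  # the input order already satisfies o2
    if not os_:
        return full_sort_cost(num_rows)  # nothing to exploit
    # e' selects one value of A(os); under uniformity it has n / D rows.
    segment_rows = num_rows / distinct_prefix_values
    return distinct_prefix_values * full_sort_cost(segment_rows)


# Slide example: o1 = (a, b), o2 = (a, c), so os = (a) and or = (c).
print(partial_sort_cost(1024, 4, ("a", "b"), ("a", "c")))  # 4 segments of 256 rows
print(full_sort_cost(1024))                                # full sort, for comparison
```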
Flexible Order Requirements
- Most operators have interest in any order on the attributes involved
  - Merge-Join, Merge-Union, Group By, Duplicate Elimination
- Binary operators demand the same order from inputs
[Figure: operator tree annotated with attribute sets: G {a1, a2}; {a1,a2,a3,a4}; {a4,a7}; {a3,a5,a6}]
Finding Optimal is NP-Hard
- A special case:
  - All relations/intermediate results of the same size
  - All attribute cardinalities same
- We try to maximize the length of common prefixes
  - Maximize LCP(pi, pj)
- Reduction from graph layout problem SUM-CUT
- Optimal algorithm for paths and 2-approximation for binary trees
A 2-Approximation Algorithm
- Optimal algorithm for paths
  [Figure: path of nodes s1, s2, s3, …, sn-1, sn]
  OPT(i,j) = max {OPT(i,k) + OPT(k+1,j) + c(i,j)}, i ≤ k < j
- 2-Approximation for binary trees
  - OPT ≤ OPT-EVEN + OPT-ODD
  - Take the one with higher benefit
  [Figure: tree edges split into even levels and odd levels]
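The path recurrence can be sketched as a small memoized dynamic program in Python. The slide does not define c(i,j) or the base case, so the following are assumptions: c(i,j) is taken to be the number of attributes common to all of s_i … s_j, OPT(i,i) = 0, and the benefit being maximized is the total common-prefix length over adjacent nodes of the path. The function name `opt_path_benefit` is hypothetical, and the sketch returns only the benefit, not the permutations.

```python
from functools import lru_cache


def opt_path_benefit(attr_sets):
    """Evaluate OPT(1, n) for a path whose i-th node has attribute set attr_sets[i]."""
    n = len(attr_sets)
    if n == 0:
        return 0

    def c(i, j):
        # Assumed meaning of c(i, j): attributes shared by every node in s_i..s_j.
        common = set(attr_sets[i])
        for s in attr_sets[i + 1:j + 1]:
            common &= set(s)
        return len(common)

    @lru_cache(maxsize=None)
    def opt(i, j):
        if i == j:
            return 0  # a single node has no edge to benefit from
        # c(i, j) does not depend on k, so it is added outside the max.
        return c(i, j) + max(opt(i, k) + opt(k + 1, j) for k in range(i, j))

    return opt(0, n - 1)


# Path with s1 = {a, b}, s2 = {a, b, c}, s3 = {a, c}
print(opt_path_benefit([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]))  # 3
```

In this example the benefit 3 is reached by the orders (a,b), (a,b,c), (a,c): the first edge shares a prefix of length 2 and the second a prefix of length 1.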
General Case
- Logical plan space for inputs not expanded (i.e., join order not fixed)
- Varying sizes of relations and intermediate results
- Not all orders on base relations have the same cost (due to clustering and covering indices)
Overview of the Approach
- Identify a small set of favorable orders
  - Orders that are relatively inexpensive
  - Should not require expanding the input plan space
- Plan generation (Phase-1)
  - Deduce the interesting orders from the favorable orders
  - Try each of the interesting orders, retain the best
- Plan refinement (Phase-2)
  - Use the 2-approximation algorithm and refine the sort orders further
Favorable Orders
- Benefit of an order:
  benefit(o, e) = cbp(e, ε) + coe(e, ε, o) – cbp(e, o)
- Positive benefit: the order can be obtained at a cost less than the full sort of the unordered result (e.g., the clustering order)
- Favorable orders: ford(e) = { o : benefit(o, e) > 0 }
  - Can be a huge set
  - E.g., every order having the clustering order as its prefix is a favorable order.
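A toy illustration of the benefit test, in Python. All cost figures below are made up for the example, and the function names are hypothetical. Here cbp is read as "cost of the best plan" and coe as "cost of order enforcement", matching how the two terms are used in the formula.

```python
def benefit(cbp_unordered, coe_full_sort, cbp_ordered):
    """benefit(o, e) = cbp(e, empty) + coe(e, empty, o) - cbp(e, o)."""
    return cbp_unordered + coe_full_sort - cbp_ordered


def favorable_orders(cbp_unordered, costs_by_order):
    """Return the orders with positive benefit.

    `costs_by_order` maps an order o to a pair
    (coe(e, empty, o), cbp(e, o)); both numbers are hypothetical inputs.
    """
    return [
        order
        for order, (coe_full_sort, cbp_ordered) in costs_by_order.items()
        if benefit(cbp_unordered, coe_full_sort, cbp_ordered) > 0
    ]


# Hypothetical relation with a clustering index on (a1, a2); a plain scan costs 100.
costs = {
    ("a1", "a2"): (400.0, 100.0),        # the clustered scan already delivers this order
    ("a1", "a2", "a3"): (450.0, 180.0),  # clustered scan plus a cheap partial sort
    ("a3",): (400.0, 500.0),             # scan followed by a full sort
}
print(favorable_orders(100.0, costs))
# [('a1', 'a2'), ('a1', 'a2', 'a3')]
```

The order (a3) has benefit exactly zero in this example, since the best plan for it is the unordered plan followed by a full sort, so it is not favorable.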
Minimal Favorable Orders
A favorable order o that satisfies:
1. ∄ o′ < o s.t. cbp(e, o′) + coe(e, o′, o) = cbp(e, o)
2. ∄ o″ s.t. o < o″ and cbp(e, o″) = cbp(e, o)
(Here o′ < o means o′ is a proper prefix of o.)
E.g., relation R with clustering index on (a1,a2):
- (a1,a2) is a minimal favorable order
- (a1), (a1,a2,a3) are not
ford-min(e): set of all minimal favorable orders for expression e
For base relations, the size of ford-min is limited to the number of covering indices
Computing Favorable Orders: Issues
- Defined in terms of cost of best plan
  - Need them before optimizing input sub-expressions
- Even ford-min can get prohibitively large for join, group-by expressions
  - ford-min contains every permutation of the join attributes
[Figure: join tree over R and S with joins J1 and J2]
Heuristics for Computing ford-min
- e = R : {o : o is clustering or covering index order}
- e = σ_p(e1) : {o : o ∈ ford-min(e1)}
- e = Π_L(e1) : {o : o′ ∈ ford-min(e1) and o = o′ ∧ L}
  - E.g., Π_{a,b}(e1), ford-min(e1) = {(a,c,b)} gives ford-min(e) = {(a)}
- e = e1 ⋈ e2 : let T = ford-min(e1) ∪ ford-min(e2); the result is T ∪ {o : o′ ∈ T and o = (o′ ∧ S) + permute(S – A(o′ ∧ S))}
Heuristics for Computing ford-min
S = {a,b,c,d}
ford-min of the two inputs: {(a,b,e), (b)} and {(a)}
T = {(a,b,e), (b), (a)}

Input F.Order (o)   o ∧ {a,b,c,d}   Extended Order
(a,b,e)             (a,b)           (a,b,c,d)
(b)                 (b)             (b,a,c,d)
(a)                 (a)
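The join heuristic in this example can be sketched in Python as below. Two readings are assumptions here: o ∧ S is taken to be the longest prefix of o whose attributes all lie in S (which matches the worked rows above), and "permute" is taken to mean one arbitrary permutation of the remaining attributes, for which this sketch simply uses alphabetical order. All function names are hypothetical.

```python
def restrict_prefix(order, attrs):
    """o ^ S: longest prefix of `order` whose attributes all belong to `attrs`."""
    prefix = []
    for attribute in order:
        if attribute not in attrs:
            break
        prefix.append(attribute)
    return tuple(prefix)


def extend_order(order, join_attrs):
    """(o ^ S) followed by one arbitrary permutation of the attributes of S it lacks."""
    prefix = restrict_prefix(order, join_attrs)
    # Assumption: alphabetical order stands in for "arbitrary permutation".
    remaining = sorted(set(join_attrs) - set(prefix))
    return prefix + tuple(remaining)


def join_candidate_orders(ford_min_left, ford_min_right, join_attrs):
    """T together with the extended form of every order in T (duplicates dropped)."""
    t = list(dict.fromkeys(list(ford_min_left) + list(ford_min_right)))
    extended = [extend_order(o, join_attrs) for o in t]
    return list(dict.fromkeys(t + extended))


S = {"a", "b", "c", "d"}
for o in [("a", "b", "e"), ("b",), ("a",)]:
    print(o, restrict_prefix(o, S), extend_order(o, S))
```

Running this prints the three rows of the table, including an extended order for (a), which under the alphabetical choice coincides with the one derived from (a,b,e).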
Plan Generation (Phase-1)
- Form the set I of interesting orders to try
  - Collect input favorable orders and the required output order
  - Take LCP with the set of join attributes
  - Extend the orders (arbitrarily) to include remaining attributes
- For each order o in I, generate optimization sub-goals for input sub-expressions
Plan Refinement (Phase-2)
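- Identify the suffix that can be freely reordered
- Use the 2-approximation algorithm to reorder the suffix
[Figure: join tree over R1(a), R2(a), R3(a), R4(a); attribute sets {a,d,h}, {a,e,h}, {a,e,h}; sort orders (a,b,c,h), (a,d,h), (a,e,h) shown alongside the reordered variants (a,h,b,c), (a,h,d), (a,h,e)]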
Experiments
1. Benefits of exploiting partial sort orders
2. Evaluate the plans produced by our optimizer extensions

Systems Compared: PostgreSQL 8.1.3, SQLServer 2005, DB2 8.2, PYRO
Test Machine: Intel P4 (HT) PC, 512 MB
Dataset: TPC-H 1GB and synthetic
Queries: Synthetic and from a real application
Experiment 1
SELECT suppkey, partkey FROM lineitem
ORDER BY suppkey, partkey;
(suppkey) → (suppkey, partkey)
Experiment 2
R(c1,c2,c3), 10 M records, (c1) → (c1,c2), card(c1)=10,000
Experiment 3
Experiment 4
Parts running out of stock:
SELECT ps_suppkey, ps_partkey, ps_availqty,
       sum(l_quantity) AS total_required
FROM partsupp, lineitem
WHERE ps_suppkey=l_suppkey AND ps_partkey=l_partkey
      AND l_linestatus='O'
GROUP BY ps_partkey, ps_suppkey, ps_availqty
HAVING sum(l_quantity) > ps_availqty
ORDER BY ps_partkey;
Experiment 4 - Plans
[Figure panels: Merge-Join Plan on SYS1 and SYS2; Plan Generated by PYRO-O]
Experiment 4 & 5 - Timings
Experiments with Variants of PYRO
- PYRO: Baseline PYRO
- PYRO-O-: No partial sort
- PYRO-P: Postgres Heuristic
- PYRO-O: Our Approach
- PYRO-E: Exhaustive
Optimization Overheads
Questions?