![Page 1: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/1.jpg)
Harikrishnan Karunakaran Sulabha Balan
ICICLES: Self-tuning Samples for Approximate Query Answering
By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan
CSE 6339
![Page 2: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/2.jpg)
Introduction
Icicles
Icicle Maintenance
Icicle-Based Estimators
Quality & Performance
Conclusion
Outline
![Page 3: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/3.jpg)
Analysis of data in data warehouses useful in decision support
Users of decision support systems want interactive systems
OLAP: Online Analytical Processing
Approximate Query Answering (AQUA) systems developed to reduce response time to desirable levels
Tolerant of approximate results
Introduction
![Page 4: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/4.jpg)
Various Approaches
Sampling-based
Histogram-based
Clustering
Probabilistic
Wavelet-based
Approximate Querying
![Page 5: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/5.jpg)
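Uniform Random Sampling

Sales

| Branch | State | Sales |
|---|---|---|
| 1 | CA | 80K |
| 2 | TX | 42K |
| 3 | CA | 40K |
| 4 | CA | 42K |
| 5 | TX | 75K |
| 6 | CA | 48K |
| 7 | TX | 55K |
| 8 | TX | 38K |
| 9 | CA | 40K |
| 10 | CA | 41K |

S_sales (50% sample)

| Branch | State | Sales |
|---|---|---|
| 2 | TX | 42K |
| 4 | CA | 42K |
| 6 | CA | 48K |
| 8 | TX | 38K |
| 10 | CA | 41K |

SELECT SUM(sales) * 2 AS cnt
FROM s_sales
WHERE state = 'TX'

Scale factor: 2 (the inverse of the 50% sampling fraction)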
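The scale-factor arithmetic on this slide can be checked with a small sketch. It uses the ten Sales rows and the 50% sample exactly as shown; the helper names `sum_sales` and `estimate_sum` are illustrative and not from the slides.

```python
# Sales relation from the slide: (branch, state, sales in thousands).
SALES = [
    (1, "CA", 80), (2, "TX", 42), (3, "CA", 40), (4, "CA", 42), (5, "TX", 75),
    (6, "CA", 48), (7, "TX", 55), (8, "TX", 38), (9, "CA", 40), (10, "CA", 41),
]

# The 50% sample shown on the slide happens to be the even-numbered branches.
S_SALES = [row for row in SALES if row[0] % 2 == 0]


def sum_sales(rows, state):
    """Exact SUM(sales) over the given rows for one state."""
    return sum(sales for _, st, sales in rows if st == state)


def estimate_sum(sample, state, sampling_fraction):
    """Scale the sample's sum by the inverse of the sampling fraction."""
    scale_factor = 1 / sampling_fraction
    return sum_sales(sample, state) * scale_factor


exact = sum_sales(SALES, "TX")              # 42 + 75 + 55 + 38 = 210
approx = estimate_sum(S_SALES, "TX", 0.5)   # (42 + 38) * 2 = 160.0
print(exact, approx)
```

With only two of the four TX rows in the sample, the estimate (160K) misses the exact answer (210K) by about 24%, which is the kind of error a workload-tuned sample tries to reduce.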
![Page 6: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/6.jpg)
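Biased Sampling
Sample relation for aggregation query workload regarding Texas branches

Sales

| Branch | State | Sales |
|---|---|---|
| 1 | CA | 80K |
| 2 | TX | 42K |
| 3 | CA | 40K |
| 4 | CA | 42K |
| 5 | TX | 75K |
| 6 | CA | 48K |
| 7 | TX | 55K |
| 8 | TX | 38K |
| 9 | CA | 40K |
| 10 | CA | 41K |

S_sales

| Branch | State | Sales |
|---|---|---|
| 2 | TX | 42K |
| 4 | CA | 42K |
| 5 | TX | 75K |
| 7 | TX | 55K |
| 8 | TX | 38K |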
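A biased sample can no longer be scaled by a single factor. The sketch below uses the rows from this slide and, as an assumption not stated on the slide, treats each state as a stratum whose weight is the number of rows in the relation divided by the number of rows in the sample. The names `weight` and `estimate_sum` are illustrative.

```python
from collections import Counter

# Sales relation from the slide: (branch, state, sales in thousands).
SALES = [
    (1, "CA", 80), (2, "TX", 42), (3, "CA", 40), (4, "CA", 42), (5, "TX", 75),
    (6, "CA", 48), (7, "TX", 55), (8, "TX", 38), (9, "CA", 40), (10, "CA", 41),
]

# Biased sample from the slide: every TX branch plus one CA branch.
S_SALES = [row for row in SALES if row[0] in {2, 4, 5, 7, 8}]

# Assumption: per-state weight = rows in the relation / rows in the sample.
population = Counter(state for _, state, _ in SALES)
sampled = Counter(state for _, state, _ in S_SALES)
weight = {state: population[state] / sampled[state] for state in sampled}


def estimate_sum(state):
    """Weighted SUM(sales) for one state, using that state's weight."""
    return sum(sales * weight[st] for _, st, sales in S_SALES if st == state)


# Reusing the uniform scale factor of 2 would double-count the TX rows.
naive_tx = 2 * sum(sales for _, st, sales in S_SALES if st == "TX")

print(estimate_sum("TX"), estimate_sum("CA"), naive_tx)
```

The TX estimate is exact (210K) because every TX row is in the sample, while the CA estimate (252K against an exact 291K) rests on a single row. The bias buys accuracy for the Texas workload at the expense of everything else.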
![Page 7: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/7.jpg)
All tuples in a Uniform Random Sample are treated as equally important for answering queries
Sample needs to be tuned to contain tuples which are more relevant to answer queries in a workload
Need for a dynamic algorithm that adapts the sample to suit the queries being executed in the workload
Methodology
![Page 8: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/8.jpg)
Join of a Uniform Random Sample of a Fact Table with a set of accompanying Dimension Tables
Join Synopsis
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey AND C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND R_Name = 'North America' AND O_Orderdate >= '01-01-1998' AND O_Orderdate <= '12-31-1998';
![Page 9: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/9.jpg)
Any aggregate query on the fact table can be answered approximately using exactly one of a small number of synopses
A uniform random sample of the relation wastes memory on tuples that the workload rarely accesses
OLAP queries exhibit locality in their data access
Need for Icicles
![Page 10: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/10.jpg)
Class of samples to capture data locality of aggregate queries of foreign key joins
Identify focus of a query workload and sample accordingly
Is a uniform random sample of a multiset of tuples L, which is the union of R and all sets of tuples that were required to answer queries in the workload (an extension of R)
Is a non-uniform sample of the original relation R
Icicles
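The definition of L can be made concrete with a toy example. This is a minimal sketch: the tuple ids and the two-query workload are hypothetical, and the multiset L is modelled with `collections.Counter`.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical relation R, identified by tuple ids.
R = ["t1", "t2", "t3", "t4"]

# Hypothetical workload: the set of tuples each query needed.
workload = [{"t1", "t2"}, {"t1"}]

# L starts as R (every tuple once) and gains one copy of a tuple
# each time a query in the workload uses it.
L = Counter(R)
for tuples_used in workload:
    L.update(tuples_used)

# A uniform draw from L picks tuple t with probability freq(t) / |L|,
# so the draw is non-uniform with respect to R.
total = sum(L.values())
draw_prob = {t: Fraction(freq, total) for t, freq in L.items()}

for t in R:
    print(t, L[t], draw_prob[t])
```

Here `t1` is three times as likely to be drawn as `t3` or `t4`, even though every copy in L is equally likely to be picked.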
![Page 11: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/11.jpg)
Icicle L
![Page 12: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/12.jpg)
Icicle Maintenance Algorithm
![Page 13: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/13.jpg)
Algorithm is efficient due to
Uniform Random Sample of L ensures that a tuple's chance of selection into the icicle is proportional to its frequency in L
Incremental maintenance of icicle requires only the segment of R that satisfies the new query from the workload
Reservoir Sampling Algorithm
Icicle Maintenance Algorithm
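One way to picture the incremental maintenance is as reservoir sampling over L, where the tuples satisfying each new query are appended to the stream. The sketch below uses the standard reservoir algorithm (Algorithm R); the `Icicle` class, its method names, and the toy workload are assumptions for illustration and may differ in detail from the algorithm in the paper.

```python
import random


class Icicle:
    """Fixed-size uniform sample of the growing multiset L (a sketch)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0  # number of tuples offered so far, i.e. |L|
        self.rng = random.Random(seed)

    def offer(self, tup):
        """Reservoir step: keep the new tuple with probability capacity / |L|."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(tup)
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.sample[slot] = tup

    def extend(self, tuples):
        """Offer every tuple in the segment that a query touched."""
        for tup in tuples:
            self.offer(tup)


# Hypothetical usage: R holds tuple ids 1..100, and the workload keeps
# asking about ids 1..10.
R = list(range(1, 101))
icicle = Icicle(capacity=10, seed=42)
icicle.extend(R)                  # initial icicle: uniform sample of R
focus = [t for t in R if t <= 10]
for _ in range(50):               # 50 focused queries
    icicle.extend(focus)          # only the matching segment is scanned

print(sorted(icicle.sample))
```

Each query costs one pass over the tuples it selected, not over all of R, which is the efficiency argument made on the slide.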
![Page 14: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/14.jpg)
Icicle Maintenance Example
SELECT average(*)
FROM widget-tuners
WHERE date.month = 'April'
![Page 15: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/15.jpg)
• Although uniform sampling is used (over L), the result is a biased sample of R
• Frequency Relation maintained over all tuples in relation
• Different Estimation mechanisms for Average, Count and Sum
Icicle-Based Estimators
![Page 16: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/16.jpg)
Average: the average taken over the set of distinct sample tuples that satisfy the query predicate is a good estimate of the average
Count: the sum of the expected contributions of all tuples in the sample that satisfy the given query
Sum: the estimate is given by the product of the average and count estimates
Estimators
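The three estimators can be sketched as follows. The per-tuple contribution used for Count, |L| / (n · f(t)) for each sample occurrence of a tuple t with frequency f(t), is an assumption: it is the usual inverse-probability weight for n draws from L, and the paper's exact expression may differ. The function name, the frequencies, and the sample are hypothetical; the rows reuse the Sales table from the earlier slides.

```python
def icicle_estimates(sample, freq, predicate, value):
    """Return (average, count, sum) estimates from an icicle.

    sample    -- list of tuples in the icicle (duplicates allowed)
    freq      -- frequency of every tuple of R in the multiset L
    predicate -- selects the tuples that satisfy the query
    value     -- extracts the aggregated attribute from a tuple
    """
    size_of_L = sum(freq.values())
    n = len(sample)
    hits = [t for t in sample if predicate(t)]
    distinct = set(hits)
    if not distinct:
        return 0.0, 0.0, 0.0
    # Average: taken over the distinct qualifying sample tuples.
    average = sum(value(t) for t in distinct) / len(distinct)
    # Count: sum of assumed contributions, one per sample occurrence.
    count = sum(size_of_L / (n * freq[t]) for t in hits)
    # Sum: product of the average and count estimates.
    return average, count, average * count


# Sales rows from the earlier slides: (branch, state, sales in thousands).
SALES = [
    (1, "CA", 80), (2, "TX", 42), (3, "CA", 40), (4, "CA", 42), (5, "TX", 75),
    (6, "CA", 48), (7, "TX", 55), (8, "TX", 38), (9, "CA", 40), (10, "CA", 41),
]
# Hypothetical frequencies: two earlier queries touched every TX row.
freq = {row: (3 if row[1] == "TX" else 1) for row in SALES}
# Hypothetical icicle of size 6: branches 2, 5, 7, 8 (TX) and 4, 9 (CA).
sample = [SALES[1], SALES[4], SALES[6], SALES[7], SALES[3], SALES[8]]

tx = icicle_estimates(sample, freq, lambda t: t[1] == "TX", lambda t: t[2])
ca = icicle_estimates(sample, freq, lambda t: t[1] == "CA", lambda t: t[2])
print(tx, ca)
```

In this hand-picked sample the TX estimates match the exact answers (average 52.5, count 4, sum 210), while the CA sum (246) is off from the exact 291 because only two CA rows are available.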
![Page 17: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/17.jpg)
Frequency Attribute added to the Relation
Starting Frequency set to 1 for all tuples
Incremented each time tuple is used to answer a query
Frequencies of relevant tuples updated only when icicle updated with new query
Maintaining Frequency Relation
![Page 18: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/18.jpg)
When queries exhibit data locality, the icicle contains more tuples from the frequently accessed subsets of the relation
Accuracy of an estimate improves as the number of sample tuples used to compute it increases
Queries that are 'focused' with respect to the workload will obtain more accurate approximate answers from the icicle
Quality Guarantees
![Page 19: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/19.jpg)
Quality Guarantees contd...
![Page 20: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/20.jpg)
Performance Evaluation
Qworkload: Template for generating workloads
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey AND C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998
Template for obtaining approximate answers
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998
![Page 21: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/21.jpg)
Performance Evaluation contd...
![Page 22: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/22.jpg)
The Error Plots for Comparison
Static uniform random sample on Join Synopsis
Icicle as it evolves with the workload
Icicle-Complete which is formed after entire workload has been executed once
Performance Evaluation contd...
![Page 23: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/23.jpg)
Focused Queries
Performance Evaluation contd...
![Page 24: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/24.jpg)
Performance Evaluation contd...
Mixed Workload
![Page 25: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/25.jpg)
Rapid decrease in relative error of query answers from icicles with queries focused on a set of core tuples
Icicle plot shows a convergence to the Icicle-Complete plot
Quick Convergence of Icicle plot towards Icicle-Complete means Icicle adapts fast
Observations (focused)
![Page 26: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/26.jpg)
Improvement due to usage of icicles is not significant
It can be concluded that icicles are, at worst, as good as the static samples
Observations (mixed)
![Page 27: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/27.jpg)
Icicles provide class of samples that adapt according to the characteristics of the workload
It can never be worse than the case of static sampling
It focuses on relatively small subsets in the relation
Conclusion
![Page 28: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/28.jpg)
There are no significant gains in the case of a uniform workload
There is a trade-off between accuracy and cost
Benefits are restricted to scenarios where the queries in the workload are focused on particular subsets of the relation
Inferences
![Page 29: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/29.jpg)
V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.
S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join Synopses for Approximate Query Answering. ACM SIGMOD Record, 1999.
References
![Page 30: ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti , Mong Li Lee, and Raghu Ramakrishnan](https://reader035.vdocument.in/reader035/viewer/2022062521/56816906550346895de019ef/html5/thumbnails/30.jpg)
Thank You
Questions?