05/01/04, Spring 2004, CSE8330 Presentation
Statistics Profile For Query Optimization
WENYI NI
Introduction
What is a statistics profile?
• Every database object has its own state.
• To know that state, we need statistics.
• The relation between a statistics profile and statistics: the profile groups the statistics kept about one object.
When does a DBMS use a statistics profile?
[Figure from M. Tamer Özsu; labeled component: Cost Model]
What does a statistics profile collect?
• The central tendency of the data
• The range of the data
• The size of the data
• The distribution of the data
Common types of statistics profile
• Table profile
• Attribute profile
• Index profile
Typical profiles

Table profile
• Cardinality: 500
• Row size: 30
• Pages: 100
• Number of attributes: 6

Attribute profile
• Values: 100
• Max value: 100
• Min value: 0
• Size: 5
• Data distribution: skew

Index profile
• Pages: 50
• Size: 5
• Distinct values: 50
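
As a rough illustration only, the three profiles above could be held in simple record types. The class and field names below are hypothetical (they are not from the slides), and `rows_per_page` is just one example of a figure an optimizer might derive from a table profile.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    cardinality: int       # number of rows
    row_size: int          # size of one row
    pages: int             # pages occupied by the table
    attribute_count: int   # number of attributes

    def rows_per_page(self) -> float:
        # Derived figure (illustrative): average rows stored on one page.
        return self.cardinality / self.pages


@dataclass
class AttributeProfile:
    values: int            # "Values" entry on the slide
    max_value: int
    min_value: int
    size: int
    distribution: str      # e.g. "skew"


@dataclass
class IndexProfile:
    pages: int
    size: int
    distinct_values: int


# The sample numbers from the slide.
student_table = TableProfile(cardinality=500, row_size=30, pages=100, attribute_count=6)
score_attribute = AttributeProfile(values=100, max_value=100, min_value=0, size=5, distribution="skew")
score_index = IndexProfile(pages=50, size=5, distinct_values=50)
```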
Three ways to collect statistics
• Exhaustive accumulation
• Sampling
• Piggyback
Exhaustive accumulation
Compute every statistical descriptor by scanning the related object exhaustively.
Advantage: most accurate
Disadvantage: heavy system load
Sampling
Scan part of the related object and estimate the statistics from the sample data.
Advantage: low system overhead
Disadvantage: there is still some overhead, and the statistics are not 100% accurate.
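
A minimal sketch of the difference between the two approaches, assuming the object is just an in-memory list of numeric values. The function names and the choice of min, max, and mean as the collected statistics are illustrative assumptions, not something the slides prescribe.

```python
import random


def exhaustive_stats(values):
    """Scan every value: the result is exact, but the whole object is read."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def sampled_stats(values, sample_size, seed=0):
    """Scan only a random sample: cheaper, but the result is an estimate."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(values, sample_size)
    return exhaustive_stats(sample)


scores = list(range(101))                 # hypothetical column: scores 0..100
exact = exhaustive_stats(scores)          # reads all 101 values
estimate = sampled_stats(scores, 20)      # reads only 20 values
```

The sampled minimum can never fall below the true minimum, and the sampled maximum can never exceed the true maximum, so a sample tends to understate the range of the data.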
Piggyback
Collect statistics from data that is already in memory during query processing. Slightly modify the SQL statement to make full use of that data.
Types of piggyback
1. Vertical piggyback
2. Horizontal piggyback
3. Mixed piggyback
Vertical piggyback
Include extra columns during query processing.
Example:
Select student.name from student;
Rewritten to:
Select student.name, student.age from student;
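
The rewrite above can be sketched with Python's built-in `sqlite3` module. The table contents and the function name `run_with_vertical_piggyback` are assumptions made for illustration; the sketch only shows the idea that the caller still receives the names while statistics on the extra `age` column are gathered from rows that were being read anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("Ann", 20), ("Bob", 23), ("Cy", 21)],  # hypothetical rows
)


def run_with_vertical_piggyback(connection):
    """Run the rewritten query; return the user's result plus age statistics."""
    names, ages = [], []
    query = "SELECT student.name, student.age FROM student"
    for name, age in connection.execute(query):
        names.append(name)  # what the original query asked for
        ages.append(age)    # piggybacked column, used only for statistics
    age_stats = {"min": min(ages), "max": max(ages), "distinct": len(set(ages))}
    return names, age_stats


names, age_stats = run_with_vertical_piggyback(conn)
```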
No extra I/O, but extra CPU load.
Solution: set a piggyback level
1. AC1 = { x | x is a column in table Ri referenced by query Q }
2. AC2 = { x | x is an index column in table Ri } - AC1
3. AC3 = { x | x is a column in table Ri and x is part of the primary key or a foreign key, or is referenced by a foreign key } - AC2
4. AC4 = { x | x is a column in table Ri } - AC3
Advantage: choose the piggyback level according to the CPU load.
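
A small sketch of the four column sets, taking the definitions literally (each level subtracts only the set named in its definition). The function names and the example columns are hypothetical. `columns_for_level` assumes that choosing level k means collecting the union of AC1 through ACk, which is one plausible reading of "piggyback level".

```python
def piggyback_levels(all_columns, query_columns, index_columns, key_columns):
    """Build AC1..AC4 for one table, following the set definitions as written."""
    ac1 = set(query_columns) & set(all_columns)  # columns referenced by the query
    ac2 = set(index_columns) - ac1               # index columns, minus AC1
    ac3 = set(key_columns) - ac2                 # key-related columns, minus AC2
    ac4 = set(all_columns) - ac3                 # all columns, minus AC3
    return ac1, ac2, ac3, ac4


def columns_for_level(levels, k):
    """Assumed reading: level k collects statistics on AC1 through ACk."""
    chosen = set()
    for level in levels[:k]:
        chosen |= level
    return chosen


# Hypothetical student table.
all_cols = ["name", "age", "score", "sid", "dept_id"]
levels = piggyback_levels(
    all_cols,
    query_columns=["name"],
    index_columns=["sid", "score"],
    key_columns=["sid", "dept_id"],
)
light_load = columns_for_level(levels, 2)  # busy CPU: few extra columns
heavy_load = columns_for_level(levels, 4)  # idle CPU: every column
```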
Horizontal piggyback
Include extra rows during query processing.
Example:
Select student.name, student.score
From student where score > 60;
Rewritten to:
Select student.name, student.score
From student where score > 60 or
student.pid In (Select student.pid from student Where score > 60);
Advantage
Mixed piggyback
Use both the vertical and the horizontal piggyback methods.
Advantage
Value distribution
Why do we need it?
Example:
Select * from Student
Where score > 60;
What is the size of the result?

Attribute profile: score
• Max: 100
• Min: 0
• Size: 10
• Values: 101

Distribution table
0~9: 1%
10~19: 1%
20~29: 1%
30~39: 3%
40~49: 6%
50~59: 10%
60~69: 10%
70~79: 31%
80~89: 30%
90~100: 10%
Answer:
Size = 500 * 0.81 * 30 = 12,150
Here 500 is the cardinality of the student table, 30 is the size of each record, and 0.81 is the share of rows in the buckets from 60 upward (10% + 31% + 30% + 10%).
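
The same estimate can be worked through in a few lines. This is a sketch under the slide's own simplification that every bucket starting at 60 or above counts in full for `score > 60`; the dictionary layout (bucket lower bound mapped to percent of rows) and the function name are illustrative choices.

```python
# Distribution table from the slide: bucket lower bound -> percent of rows.
SCORE_DISTRIBUTION = {
    0: 1, 10: 1, 20: 1, 30: 3, 40: 6,
    50: 10, 60: 10, 70: 31, 80: 30, 90: 10,
}


def estimate_result_size(distribution, lower_bound, cardinality, row_size):
    """Estimate selectivity, row count, and result size for 'value > lower_bound'."""
    # Whole buckets are counted, so the bucket containing the bound is included.
    percent = sum(p for low, p in distribution.items() if low >= lower_bound)
    rows = cardinality * percent // 100
    return percent, rows, rows * row_size


percent, rows, size = estimate_result_size(
    SCORE_DISTRIBUTION, lower_bound=60, cardinality=500, row_size=30
)
print(percent, rows, size)  # prints: 81 405 12150
```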
How to get the distribution table?
Histogram
1. Equal width
2. Equal height
[Figure: equal-width histogram, Percentage vs. Score, bucket boundaries 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
[Figure: equal-height histogram, Percentage vs. Score, bucket boundaries 45, 56, 63, 68, 73, 76, 78, 85, 90, 100]
Bucket number
1 + log2(n) [Sturges' rule, 1927]
Example: student table (500 records)
1 + log2(500) ≈ 10
• For equal width, put each value into the bucket whose range contains it.
• For equal height, sort the values; if the sample size is m, set the height k = m / (bucket number), then fill the buckets in order with k values each.
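
The bucket count and both histogram types can be sketched as follows. This assumes the logarithm is base 2 (which is what makes the 500-record example come out at 10) and rounds the logarithm up; the function names are illustrative, and the equal-height version simply drops any remainder into the last bucket.

```python
import math


def sturges_bucket_count(n):
    """Bucket count from Sturges' rule, rounding log2(n) up."""
    return math.ceil(math.log2(n)) + 1


def equal_width_counts(values, low, high, buckets):
    """Count values per bucket when every bucket spans the same value range."""
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        index = min(int((v - low) / width), buckets - 1)  # clamp v == high
        counts[index] += 1
    return counts


def equal_height_boundaries(values, buckets):
    """Upper boundary of each bucket when every bucket holds k = m / buckets values."""
    ordered = sorted(values)
    k = len(ordered) // buckets  # bucket height
    boundaries = [ordered[(i + 1) * k - 1] for i in range(buckets - 1)]
    boundaries.append(ordered[-1])  # last bucket absorbs any remainder
    return boundaries


bucket_count = sturges_bucket_count(500)             # student table, 500 records
sample = list(range(100))                            # hypothetical sample of scores
width_counts = equal_width_counts(sample, 0, 100, 10)
height_bounds = equal_height_boundaries(sample, 10)
```

With skewed data the equal-width counts vary widely between buckets, while the equal-height boundaries crowd together where the values are dense, which matches the two charts on the histogram slide.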
Sampling
How many samples do we need?
A sample size of 1064 gives an error rate below 10% with 99% probability (Mannino 1988).
To keep the same error rate for tables of various sizes, the sample rate can drop as the table grows.
Drop rate: log(n)/n
Example: 20 samples give a 2% error rate on a table with 100 records.
We need 1000 * 0.2 * (1 - log(1000)/1000) samples to reach a 2% error rate on a table with 1000 records.
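
The example formula can be evaluated directly. The slide does not state the base of the logarithm, so the sketch below exposes it as a parameter and defaults to base 10 purely as an assumption; the function name is hypothetical.

```python
import math


def scaled_sample_size(base_samples, base_rows, target_rows, log_base=10):
    """Apply the slide's formula: n * rate * (1 - log(n) / n).

    log_base is an assumption; the slide does not say which base it uses.
    """
    base_rate = base_samples / base_rows               # 20 / 100 = 0.2
    drop = math.log(target_rows, log_base) / target_rows
    return target_rows * base_rate * (1 - drop)


# 20 samples on 100 records, scaled to a table with 1000 records.
needed = scaled_sample_size(20, 100, 1000)
```

With base 10 the result is roughly 199.4 samples, slightly fewer than the 200 that a fixed 20% sample rate would require.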
Summary & Future work
• Low overhead
• Low error rate, with room to improve
• Estimating the size of project and join results from statistics still needs to be improved.
The end