tuning the optimizer statistics >eric miner senior engineer data server technology...
TRANSCRIPT
![Page 2: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/2.jpg)
There are Two Kinds of Optimizer Statistics
> Table/Index level- describes a table and its index(es)
> Page/row counts, cluster ratios, deleted and forwarded rows
> Some are updated dynamically as DML occurs
• page/ row counts, deleted rows, forwarded rows, cluster ratios
> Stored in systabstats
> Column level - describes the data to the optimizer
> Histogram (distribution), density values, default selectivity values
> Static, need to be updated or written directly
> Stored in sysstatistics
> This presentation deals with the column level statistics
![Page 3: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/3.jpg)
What Are These Column Level Statistics Used For?
> The Histogram Values
> Describes the distribution of values in the column
> Belongs to a column, not an index
> Used in costing SARGs
> A step is the point in the column where a value is read to obtain a ‘boundary value’
> A cell represents the rows that fall between two steps
> Each cell has a weight which is the fraction of rows in the column it represents - read as a percentage of all rows
> There are approximately the same number of rows in each cell - except Frequency count cells
![Page 4: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/4.jpg)
Some Quick Definitions
Range cell density: 0.0037264745412389 Total density: 0.3208892191740000 Range selectivity: default used (0.33) In between selectivity: default used (0.25) Histogram for column: “A"Column datatype: integerRequested step count: 20Actual step count: 10 Step Weight Value 1 0.00000000 <= 154143023 2 0.05263700 <= 154220089 3 0.05245500 <= 171522361 4 0.00000000 < 800000000 5 0.34489399 = 800000000 6 0.04968300 <= 859388217 7 0.00000000 < 860000000
![Page 5: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/5.jpg)
What Are These Column Level Statistics Used For? cont.
> A cell can represent either a single value or multiple values of the column
> Range cell - more than one value 2 0.05263982 <= 31423 3 0.05263316 <= 63045
• All values between 31424 and 63045 are in cell 3
> Frequency count cell - only one value (very accurate) 5 0.00000000 < 170016 6 0.15908001 = 170016 7 0.05263815 <= 201861 8 0.10263316 = 201862 9 0.05264576 <= 317462
• Cells 6 and 8 represent only one value
> Cell (step) 1 represents the NULL values in the column
![Page 6: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/6.jpg)
Statistics On Inner Columns of Composite Indexes
Stats on inner columns of composite indexes
Think of a composite index as a 3D object, columns with statistics are transparent, those without statistics are opaque
> Columns with statistics give the optimizer a clearer picture of an index – sometimes good, sometimes not
> This is a fairly common practice
> Does add maintenance
> update index statistics most commonly used to do this
update index statistics tab_name [ind_name]
![Page 7: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/7.jpg)
Statistics On Inner Columns of Composite Indexes cont.
Index on columns E and B – No statistics on column bselect * from TW4 where E = "yes" and b >= 959789065 and id >= 600000 and F > "May 14, 2002“ and A_A = 959000000
Beginning selection of qualifying indexes for table TW4',varno = 0, objectid 464004684. The table (Allpages) has 1000000 rows, 24098 pages,
Estimated selectivity for E, selectivity = 0.527436, upper limit = 0.527436.
No statistics available for B,using the default range selectivity to estimate selectivity.
Estimated selectivity for B, selectivity = 0.330000.
![Page 8: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/8.jpg)
Statistics On Inner Columns of Composite Indexes cont.
The best qualifying index is ‘E_B' (indid 7)costing 49264 pages, with an estimate of 191rows to be returned per scan of the table
FINAL PLAN (total cost = 481960):
varno=0 (TW4) indexid=0 ()path=0xfbccc120 pathtype=sclausemethod=NESTED ITERATION
Table: TW4 scan count 1, logical reads:(regular=24098apf=0 total=24098)physical reads: (regular=16468 apf=0 total=16468), apf IOs used=0
![Page 9: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/9.jpg)
Statistics On Inner Columns of Composite Indexes cont.
Statistics are now on column BEstimated selectivity for E, selectivity = 0.527436, upper limit = 0.527436.
Estimated selectivity for B, selectivity = 0.022199, upper limit = 0.074835.
The best qualifying index is ‘E_B' (indid 7)costing 3317 pages,with an estimate of 13 rows tobe returned per scan of the table
FINAL PLAN (total cost = 55108):
varno=0 (TW4) indexid=7 (E_B)path=0xfbd1da08 pathtype=sclausemethod=NESTED ITERATION
Table: TW4 scan count 1, logicalreads:(regular=4070 apf=0 total=4070),physical reads: (regular=820 apf=0 total=820),
![Page 10: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/10.jpg)
Statistics On Non-Indexed Columns and Joins
Stats on non-indexed columns
Can’t help with index selection but can affect join ordering
> Columns with statistics give the optimizer a clearer picture of the column – no hard coded assumptions have to be used
> When costing joins of non-indexed columns having statistics may result in better plans than using the default values
> Without statistics there will be no Total density or histogram that the optimizer can use to cost the column in the join
> Yes, in some circumstances histograms can be used in costing joins – if there is a SARG on the joining column and that column is also in the join table then the SARG from the joining table can be used to filter the join table
> If there is no SARG on the join column or on the joining column the Total density value (with stats) or the default value (w/o stats) will be used
![Page 11: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/11.jpg)
Statistics On Non-Indexed Columns and Joins cont.
SARG exampleselect ....from TW1, TW4where TW1.A = TW4.A and TW1.A = 10
Selecting best index for the JOIN CLAUSE: TW4.A = TW1.A TW4.A = 10Estimated selectivity for a, selectivity = 0.003726,upper limit = 0.049683. Histogram values used
select ....from TW1, TW4where TW1.A = TW4.A and TW1.B = 10
Selecting best index for the JOIN CLAUSE: TW4.A = TW1.A
Estimated selectivity for a, selectivity = 0.320889. Total density value used
![Page 12: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/12.jpg)
Statistics On Non-Indexed Columns and Joins - Example
select * from TW1,TW2 where TW1.A=TW2.A and TW1.A =805975090 A simple join with a SARG on the join column of one table
Table TW2 column A has no statistics, TW1 column A doesSelecting best index for the JOIN CLAUSE: (for TW2.A)
TW2.A = TW1.A TW2.A = 805975090 Inherited from SARG on TW1
But, can’t help…no statsEstimated selectivity for A, selectivity = 0.100000.
The best qualifying access is a table scan, costing 13384 pages, with an estimate of 50000 rows to be returned per scan of the table, using no data prefetch (size 2K I/O), in data cache 'default data cache' (cacheid 0) with MRU replacementJoin selectivity is 0.100000.
Inherited SARG from other table doesn’t help in this case
![Page 13: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/13.jpg)
Statistics On Non-Indexed Columns and Joins – Example cont.
Without statistics on TW2.A the plan includes a reformat with TW1 as the outer table
FINAL PLAN (total cost = 2855774):
varno=0 (TW1) indexid=2 (A_E_F)path=0xfbd46800 pathtype=sclausemethod=NESTED ITERATION
varno=1 (TW2) indexid=0 ()path=0xfbd0bb10 pathtype=joinmethod=REFORMATTING
> Not the best plan – but the optimizer had little to go on
![Page 14: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/14.jpg)
Statistics On Non-Indexed Columns and Joins – Example cont.
Table TW2 column A now has statistics. The inherited SARG on TW1.A can now be used to help
filter the join on TW2.ASelecting best index for the JOIN CLAUSE:
TW2.A = TW1.A TW2.A = 805975090
Estimated selectivity for A, selectivity = 0.001447, upper limit = 0.052948.
The best qualifying access is a table scan, costing 13384 pages, with an estimate of 724 rows to be returned per scan of the table, using no data prefetch (size 2K I/O), in data cache 'default data cache' (cacheid 0) with MRU replacement
Join selectivity is 0.001447.
![Page 15: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/15.jpg)
Statistics On Non-Indexed Columns and Joins – Example cont.
With statistics on TW2.A reformatting is not used and the join order has changed
FINAL PLAN (total cost = 1252148):
varno=1 (TW2) indexid=0 ()path=0xfbd0b800 pathtype=sclausemethod=NESTED ITERATION
varno=0 (TW1) indexid=2 (A_E_F)path=0xfbd46800 pathtype=sclausemethod=NESTED ITERATION
![Page 16: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/16.jpg)
The Effects of Changing the Number of Steps (Cells)
The Number of Cells (steps) Affects SARG Costing –
As the Number Of Steps Changes Costing Does Too
Cell weights and Range cell density are used in costing SARGs
> Cell weight is used as column’s ‘upper limit’ Range cell density is used as ‘selectivity’ for Equi-SARGs – as seen in 302 output
> Result(s) of interpolation is used as column ‘selectivity’ for Range SARGs
> Increasing the number of steps narrows the average cell width, thus the weight of Range cells decreases
> Can also result in more Frequency count cells and thus change the Range cell density value
> More cells means more granular cells
![Page 17: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/17.jpg)
The Effects of Changing the Number of Steps (Cells) cont.
Average cell width = # of rows/(# of requested steps –1)
> Table has 1 million rows, requested 20 steps -
> 1,000,000/19 = 52,632 rows per cell
> 1,000,000/199 = 5,025 rows per cell
> What does this mean?
> As you increase the number of steps (cells) they become narrower – representing fewer values
> We’ll see that this has an effect on how the optimizer estimates the cost of a SARG
> “update statistics ……. using X values“create index ….. using X values
![Page 18: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/18.jpg)
The Effects of Changing the Number of Steps (Cells) cont.
Changing the number of steps – effects on Equi-SARGs
select A from TW2 where B = 842000000
With 20 cells (steps) in the histogramRange cell density: 0.0012829768785739
9 0.05263200 <= 82556933710 0.05264200 <= 842084405
SARG value falls into cell 10Estimated selectivity for B, selectivity = 0.001283, upper limit = 0.052642. Range cell weight of density qualifying cell
![Page 19: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/19.jpg)
The Effects of Changing the Number of Steps (Cells) cont.
With 200 cells (steps) in the histogramRange cell density: 0.0002303825911991 77 0.00507200 <= 839463989 78 0.00506000 <= 842019895
> SARG value falls into cell 78
Estimated selectivity for B,
selectivity = 0.000230, upper limit = 0.005060.
In this case more cells result in a lower estimated selectivity
> Increasing the number of steps has decreased the average width and lowered the Range cell density and the average cell weight.
> Range cell density decreased because Frequency count cells appeared in the histogram
![Page 20: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/20.jpg)
The Effects of Changing the Number of Steps (Cells) cont.
Changing the number of steps – effects on Range SARGs -select * from TW2 where B between 825570000 and 830000000
With 20 cells (steps) in the histogram
Range cell density: 0.0012829768785739
9 0.05263200 <= 82556933710 0.05264200 <= 842084405
> SARG values fall into cell 10
Estimated selectivity for B,
selectivity = 0.014121, upper limit = 0.052642.
> Here ‘selectivity’ is the product of interpolation, ‘upper limit’ is the weight of the qualifying cell.
> Interpolation estimates how much of cell will qualify for the range SARG
![Page 21: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/21.jpg)
The Effects of Changing the Number of Steps (Cells) cont.
select * from TW2 where B between 825570000 and 830000000
With 200 cells (steps) in the histogram
Range cell density: 0.0002303825911991 67 0.00505200 <= 825505843 68 0.00503000 <= 825570611 69 0.00508000 <= 825635378 70 0.00504000 <= 825690418 71 0.00506400 <= 825702450 72 0.00503200 <= 825767218 73 0.00510200 <= 825831945 74 0.00425800 <= 825833785 75 0.00598400 <= 839462921
Estimated selectivity for B, selectivity = 0.029624, upper limit = 0.034606.
> The SARG values now span multiple cells> Interpolation estimates amount of cells 68 and 75 to use since not
all of those two cells qualify
![Page 22: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/22.jpg)
Some Statistics Related FAQs cont.
How many steps should I request?
> It will depend on your data and your queries
> Increase requested steps to get Frequency count cells when there are highly duplicated values
> FC only represents one value - very accurate weight
> Range SARGs will estimate what portion of a cell qualifies for the SARG
> More cells means narrower cells (represent fewer values)
> Narrower cells mean more accurate estimates
> Can have an affect on equi-SARGs - lower selectivity
![Page 23: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/23.jpg)
Removing Statistics Can Effect Query Plans
Sometimes no statistics are better then having them
This will usually be an issue when very dense columns are involved
Histogram for column: “E" Step Weight Value 1 0.00000000 < "no" 2 0.47256401 = "no" 3 0.00000000 < "yes" 4 0.52743602 = "yes“
This can also show up when you have ‘spikes’ (Frequency count cells) in the distribution
![Page 24: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/24.jpg)
Removing Statistics Can Effect Query Plans cont.
select count(*) from TW4 where E = “yes” and C = 825765940
The table…has 1000000 rows, 24098 pages,
Estimated selectivity for E, selectivity = 0.527436, upper limit = 0.527436.
Estimating selectivity of index ‘E_AA_B', indid 6scan selectivity 0.52743602,filter selectivity 0.527436 527436 rows, 174107 pages
The best qualifying index is ‘E_AA_B' (indid 6) costing 174107 pages, with an estimate of 526 rows
FROM TABLE TW4 Nested iteration. Table Scan.
![Page 25: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/25.jpg)
Removing Statistics Can Effect Query Plans cont.
delete statistics TW4(E)
Estimated selectivity for E, selectivity = 0.100000.
Estimating selectivity of index ‘E_AA_B', indid 6scan selectivity 0.100000,filter selectivity 0.100000 100000 rows, 20584 pages
The best qualifying index is ‘E_AA_B (indid 6) costing 20584 pages, with an estimate of 92 rows
FROM TABLE TW4 Nested iteration. Index : E_AA_B Forward scan. Positioning by key.
![Page 26: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/26.jpg)
Maintaining Tuned Statistics
Tuned statistics will add to your maintenance
Any statistical value you write to sysstatistics either via optdiag or sp_modifystats will be overwritten by update statistics
> Keep optdiag input files for reuse
> If needed get an optdiag output file, edit it and read it in
> Keep scripts that run sp_modifystats
> Rewrite tuned statistics after running update statistics that affects the column with the modified statistics
![Page 27: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/27.jpg)
Monitoring Table/Index Level Fragmentation Using The Statistics
Can Be Both An Optimizer and Space Management Concern
The more fragmentation the less efficient page reads are
> Deleted rows – fewer rows per page, affects costing
> Forwarded rows – 2 I/O each, optimizer adds to costing
> Empty data/leaf pages – more reads may be necessary
> Clustering can get worse
> Watch the DPCR of the table or APL clustered index
> In general the Cluster Ratios are not a good indicator of fragmentation since they are often normally low
> Use optdiag outputs to monitor these values
![Page 28: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/28.jpg)
Monitoring Table/Index Level Fragmentation Using The Statistics cont.
> ASE 12.0 and above check the ‘Space utilization’ value
> ‘Large I/O efficiency’ is another value to watchEmpty data page count: 0Forwarded row count: 0.0000000000000000Deleted row count: 0.0000000000000000
Derived statistics: Data page cluster ratio: 0.9994653835872761 Space utilization: 0.9403543288085808 Large I/O efficiency: 1.0000000000000000
> ‘Space utilization’ and ‘Large I/O efficiency’ are not used by the optimizer
> The further from 1 the more fragmentation there is
![Page 29: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/29.jpg)
Maintaining the Statistics
When data changes the statistics become out of date
In general up to date statistics are needed to get the best query plans
> Statistics are usually updated using update statistics commands
> The more statistics you have the more maintenance
> It’s a trade off between the gain in query performance and the increased statistics maintenance
> There’s no point in updating statistics if the table is static
![Page 30: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/30.jpg)
Update Statistics
> Update statistics has been extended to allow for placement of statistics on columns
> update statistics table_name (col_name)
> update index statistics table_name [ind_name]
> update all statistics table_name
> Specify the requested number of steps (cells) to use when building the column’s histogram
> update statistics table_name (col_name) using 200 values
![Page 31: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/31.jpg)
How Update Statistics Works
Column and table/index values have to be read in order to gather the statistics
> What does it do?
> Reads the column to gather information for density and histogram, writes the column level statistics
> While reading the column it gathers index/table level statistics – row & page count, forwarded rows, deleted rows, the cluster ratios, etc.
> Takes a sample value every X rows for a histogram boundary value - (based on the number of rows and requested steps)
• If same value for multiple steps save it to make an FC
![Page 32: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/32.jpg)
How Update Statistics Works cont.
> Values have to be in sorted order for statistics gathering
> If it’s the leading column of an index no sort is necessary
> Just scan index leaf for statistics
> If not the leading column of an index - create a worktable, read values in, sort and scan for statistics
update statistics tab_name (col_name)- a table scan will be done to read the column
update index statistics (ind_name)- then only an index scan (with a sort of the inner columns)
> The sort is done in a worktable in tempdb.
update index and update all statistics will use a lot of tempdb space unless sampling is used
![Page 33: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/33.jpg)
Some Statistics Related Myths & Legends
“Update statistics will result in improved performance”
> Only guarantees up to date statistics
> Due to distribution statistics may not give a ‘pretty’ picture of the column
“Always use update all statistics”
> Rarely need statistics on all columns of a table
> Can take a VERY long time to run, makes maintenance difficult at best
> Should consider adding stats to composite index columns
![Page 34: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/34.jpg)
Statistics Tools
> Some useful tools for working with the statistics
> Some are by Sybase some are by users
> Optdiag - read, write and simulate the statistics
> Well known and documented
> sp_modifystats - make modifications to density values (more functionality coming soon - 11.9.2.4, 12.0.0.4, 12.5)
> sp__optdiag (that’s a double underscore) -
> by Kevin Sherlock
> Displays the statistics ala optdiag output - very handy
> http://www.sypron.nl/download.html
![Page 35: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/35.jpg)
Sampling for Update Statistics
A new feature in 12.5.0.3
Designed to dramatically reduce the time it takes to update statistics – can dramatically speed up the running of update statistics
> ‘Opens’ your maintenance window
> Decreases the cost of using ASE
Randomly selected pages are read instead of reading all pages to gather the column level statistics – less I/O
> The percentage of pages to be sampled can be specified
update statistics tab_name with sampling = X percent
> X is the percentage of pages you want to sample
• Can be between 1 and 100
![Page 36: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/36.jpg)
Definitions
> Column Level Statistics – those statistics that describe the values in the column to the optimizer – an attribute of a colum (i.e.; the histogram and density values)
> Sampling – randomly reading rows from a specified percentage (subset) of pages rather than all pages of the table in order to gather column level statistics
> Sampling Rate – the specified percentage of pages to read
> Full Scan – to gather statistics by reading all pages of the object (table or index)
> Major Attribute of an Index – the ‘leading’ column of an index as listed in the create index command
![Page 37: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/37.jpg)
Sampling for Update Statistics cont.
> Unofficial tests show that a sampling rate of 10% on a 1 million row numeric column reduces the time for update statistics to run from 9 minutes to 30 seconds
![Page 38: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/38.jpg)
Sampling for Update Statistics cont.
> The Resulting histogram will be based on the values that are sampled
• It will differ from a histogram obtained from a ‘full scan’ update statistics
• The lower the specified percentage of sampling the more the histogram will differ from a full scan histogram
• Test your queries against sampled statistics. In most cases you won’t see any major changes
• Density values not updated by sampling
> In most cases this won’t be an issue.
![Page 39: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/39.jpg)
Why Sampling for Update Statistics?
> As datasets have grown the time it takes to run update statistics has also grown – Dramatically!!
> This became more of an issue with ‘update index statistics’ introduced in 11.9.x due to extra sort in worktable
> TCO and auto-tuning/admin require a faster way to run update statistics
> Without a faster update statistics neither efforts would succeed
> Speeding up update statistics is a long standing Customer feature request
> Random page sampling is the most I/O efficient method
> Dramatically decreased the time to run update statistics
![Page 40: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/40.jpg)
Why Sampling for Update Statistics? cont.
> Some time test results – Not official, not for general release
> ‘your mileage may vary’
> Timings are from tests run by Sybase QA
> 1 million row int colum – timings based on ‘elapsed time’
20% sampling rate – Full scan time :2465850Sampling time : 398783Percentage of savings time(elapsed time):83%
10% sampling rate –Full scan time :2139013Sampling time : 153130Percentage of savings time(elapsed time):92% > Variations in full scan time are taken into account
![Page 41: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/41.jpg)
How Does It Work?
> Specify the percentage of pages to read via update statistics
> ‘with sampling = X percent’• Percent value can be between 1 and 100
> ‘with’ extensions must follow ‘using’ – • with sampling = x percent and/or with consumers = x must follow
using X values
update statistics authors(auth_id) using 40 values with percent = 10
> Sampling reads all rows from each page read
> Row values are moved to the worktable to be sorted and the statistics gathered
> This saves tempdb space since the sampled sets of values are smaller than if the whole column was read into the worktable
![Page 42: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/42.jpg)
How Does It Work? cont.
Specific update statistics syntax and their affects
update statistics table_name [index_name] with sampling = X percent
> Will full scan index pages to update/create statistics on the major attribute(s) of the specified index or all indexes
on the table ignoring the specified sampling rate – sampling will not be done
![Page 43: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/43.jpg)
How Does It Work? cont.
update index statistics tab_name [ind_name] with sampling = X percent
> Will full scan index pages to update/create statistics for the major attribute(s) of the indexes or specified index on the table, ignoring sampling.
> For minor index attributes will use sampling to scan the requested percentage of pages, read those values into a worktable, sort and gather statistics from there.
> The space used in tempdb will decrease as the sampling rate decreases
update statistics tab_name (col_name) with sampling = X percent
> Will use sampling to update/create statistics for the specified column using the specified sampling rate. This applies to all columns whether major attributes of an index or not
> Will not affect multi-column density values
![Page 44: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/44.jpg)
How Does It Work? cont.
update all statistics table_name with sampling = X percent
> Will full scan index pages to gather statistics for the major attribute of all indexes – will not use sampling on these columns
> Will use sampling to gather statistics for all columns that are not the major attribute of an index
> The space used in tempdb will decrease as the sampling rate decreases
![Page 45: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/45.jpg)
How Does It Work? cont.
> Sampling is not used for create index
> Since a full scan is required to build an index there is no additional cost for building the statistics
![Page 46: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/46.jpg)
Trade Offs
> A sampled set of anything is not as accurate as examining the most effective sampling rate for a given dataset
> A histogram created with sampling is not likely to match a histogram created via a full scan
> Histogram boundary values will vary> Cell weights will vary> Minimum and maximum histogram boundary values will vary
> Since cell weight(s) and Range cell density are used to cost all SARGs a histogram from a sampled set will have an affect on SARG costing
> Variations in the upper and lower histogram values may result in ‘out of bounds costing’ by the optimizer
> The smaller the sampling rate the greater the variance is likely to be
![Page 47: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/47.jpg)
Trade Offs cont.
> If there are existing density values they will not be overwritten. If there are no density values a default value of 0.100000 will be used for both Range cell and Total density values
> There is currently no information saved about the use of sampling (whether or not it was used and the sampling rate)
> Different cell types may appear
> As the sampling rate decreases it is possible that Frequency count and/or Range cells may appear where they didn’t exist prior to sampling
> The same pages will be resampled if the dataset is static and the same sampling rate used
![Page 48: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/48.jpg)
Examples of Variations in the Histogram
Full scan histogram -
Step Weight Value
1 0.00000000 <= 154218543
2 0.05315000 <= 805909305
3 0.05305000 <= 808793353
4 0.05311000 <= 822687028
5 0.05304000 <= 825700873
6 0.05314000 <= 839464505
7 0.05292000 <= 842544649
8 0.05305000 <= 858863369
<edited>
20 0.04621000 <= 960051465> Note boundary values, cell weights and the upper and lower boundary values> Variations within the histogram are the main issue that needs to be tested
![Page 49: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/49.jpg)
Examples of Variations in the Histogram cont.
10% sampled histogram -
Step Weight Value
1 0.00000000 <= 154218799
2 0.05253968 <= 805909300
3 0.05253968 <= 808728585
4 0.05253968 <= 822686772
5 0.05269841 <= 825636617
6 0.05349206 <= 839464498
7 0.05253968 <= 842543113
8 0.05253968 <= 858797321
<edited>
20 0.04888889 <= 960050979
> Note variations in the boundary values, cell weights and the upper and lower boundary values
![Page 50: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/50.jpg)
Tuning and Troubleshooting
> “Trial-and-error” testing/tuning will need to be done to determine the most optimal sampling rate for a given dataset
> In most cases variations in the statistics will have no affectIn other cases small variations may change query plans
> There is no ‘rule of thumb’ on what sampling rate to use
> In some cases the same sampling rate may be fine across all or most tables/columns.
> In some cases sampling may not result in efficient plans
> Use showplan and traceon 302/310 outputs to track changes to the query plan as the sampling rate changes
> Using sample queries get above outputs from statistics gathered by a full scan. Update statistics with the sampling rate, rerun query and compare outputs
![Page 51: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/51.jpg)
Tuning and Troubleshooting cont.
>Use optdiag to monitor changes to the histogram
>Check optdiag of full scan histogram for upper and lower boundary values these can be inserted into the histogram if needed
>Keep a copy of optdiag output file as a backup of statistics in case old values need to be reloaded
![Page 52: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/52.jpg)
Future Enhancements
> This first implementation of sampling will require some enhancements
> ‘Scale’ density values gathered by sampling so that they are more accurate
> Track the min/max values in the column in order to maintain the upper and lower boundary values of the histogram
> Sampling index pages
> Will help decrease the time of running update statistics even further
> Add a mechanism to record if sampling was used and what sampling rate was last used
> Add this information to optdiag and traceon 302 (and future optimizer diagnostics)
![Page 53: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/53.jpg)
Where To Get More Information
> The Sybase Customer newsgroups
> http://support.sybase.com/newsgroups
> The Sybase list server
> The external Sybase FAQ
> http://www.isug.com/Sybase_FAQ/
> Join the ISUG, ISUG Technical Journal, feature requests
> http://www.isug.com
![Page 54: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/54.jpg)
Where To Get More Information
> The latest Performance and Tuning Guide
> Don’t be put off by the ASE 12.0 in the title, it covers the 11.9.2 features/functionality too
> http://sybooks.sybase.com/onlinebooks/group-as/asg1200e
> Any “What’s New” docs for a new ASE release
> Tech Docs at Sybase Support
> http://techinfo.sybase.com/css/techinfo.nsf/Home
> Upgrade/Migration help page
> http://www.sybase.com/support/techdocs/migration
![Page 55: Tuning the Optimizer Statistics >Eric Miner Senior Engineer Data Server Technology eric.miner@sybase.com](https://reader038.vdocument.in/reader038/viewer/2022110116/5518a148550346b31f8b4898/html5/thumbnails/55.jpg)
Sybase Developer Network (SDN)
Additional Resources for Developers/DBAs
> Single point of access to developer software, services, and up-to-date technical information:
> White papers and documentation
> Collaboration with other developers and Sybase engineers
> Code samples and beta programs
> Technical recordings
> Free software