statistics that need special attention

45
Statistics That Need Special Attention Joe Chang www.qdpma.com Jchang6 @ yahoo http://www.qdpma.com/CBO/Statistics.html

Upload: dangtu

Post on 11-Feb-2017

218 views

Category:

Documents


2 download

TRANSCRIPT

Statistics That Need Special Attention

Joe Changwww.qdpma.com Jchang6 @ yahoo

http://www.qdpma.com/CBO/Statistics.html

About Joe• SQL Server consultant since 1999• Query Optimizer execution plan cost formulas (2002)• True cost structure of SQL plan operations (2003?)• Database with distribution statistics only, no data

2004• Decoding statblob/stats_stream

– writing your own statistics• Disk IO cost structure• Tools for system monitoring, execution plan analysis See http://www.qdpma.com/Download: http://www.qdpma.com/ExecStatsZip.htmlBlog: http://sqlblog.com/blogs/joe_chang/default.aspx

Statistics – Special Attention

• What works automatically?– No need for special attention

• What does not work by default– or simple adjustments, trace flags etc.– What could cause spectacular failures

• Ok, now what do we do?– scheduled jobs, triggered jobs, in-procedure,

optimize for “fake data” Other?

Topics

• Quick Statistics Overview– Details available elsewhere

• Weak point of SQL Server Statistics – Recompute set points– Sampling– Compile parameter etc.

• Options – Trace 2371, 4136• Workarounds

SQL Performance

Natural keys with unique indexes, not SQL

The Execution Plan links all the elements of performanceIndex tuning alone has limited valueOver indexing can cause problems as well

Index and Statistics maintenance policy

1 Logic may need more than one execution plan?

Compile cost versus execution cost?

Tables and SQL combined implement business logic

Plan cache bloat?

SQL Tablesnatural keys

Indexes

Execution Plan

Statistics& Compile parameters

Compile Row estimate propagation

errors

Storage Engine

Hardware

DOP MemoryParallel plans

Recompiletemp table /

table variable

Query Optimizer

Index & Stats Maintenance

API Server Cursors: open, prepare, execute, close?

SET NO COUNT Information messages

Factors to Consider

SQL Tables Indexes

Query Optimizer

Statistics

Compile Parameters

Storage Engine

Hardware

DOPmemory

STATISTICS

SQL Server Statistics

• Principles: it should just work• Statistics are automatically – created and updated

• Indexes and columns

Statistics Used by the Query Optimizer in Microsoft SQL Server 2008http://technet.microsoft.com/en-us/library/dd535534(v=sql.100).aspxEric N. Hanson and Yavor AngelovContributor Lubor Kollar

DBCC SHOW_STATISTICSDBCC SHOW_STATISTICS('LINEITEM',L_SHIPDATE_CLUIDX)

header

Density vector

Histogram

Options:WITH STAT_HEADER, DENSITY VECTOR, HISTOGRAM, STATS_STREAM

binary

Density Vector

• Statistics binary storage structure allows for 30 rows in density vector

• Full nonclustered index key consists of explicitly declared columns plus cluster key columns (not in NC key)

• Limit of 15 columns in clustered index key• And 15 columns in nonclustered index key

Histogram

• Up to 200 steps• Captures low and high bounds• Attempts to capture skewed distribution– frequent values equal rows, others in range rows

UPDATE STATISTICS – STATS_STREAM

UPDATE STATISTICS LINEITEM(L_SHIPDATE_CLUIDX)WITH STATS_STREAM = 0x010000000300000000000000000000000792A163000000009A128038F3..., ROWCOUNT = 59986052, PAGECOUNT = 574971

DBCC SHOW_STATISTICS('LINEITEM',L_SHIPDATE_CLUIDX) WITH STATS_STREAM

Statistics Structure

• Stored (mostly) in binary field Scalar values

Density Vector – limit 30, half in NC, half Cluster key

HistogramUp to 200 steps

Consider not blindly using IDENTITY on critical tablesExample: Large customers get low ID valuesSmall customers get high ID values

http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx

Statistics

• Auto - re-compute at 20% (first 6, 500 rows)• Sampling strategy– How much to sample - theory?– Random pages versus random rows– Histogram Equal and Range Rows– Out of bounds, value does not exist– etc.

Statistics Used by the Query Optimizer in SQL Server 2008Eric N. Hanson and Yavor Angelov, Contributor: Lubor KollarOptimizing Your Query Plans with the SQL Server 2014 Cardinality EstimatorJoseph Sack

http://msdn.microsoft.com/en-us/library/dd535534.aspx

System tables, views, functions

;WITH k AS ( SELECT k.object_id, k.stats_id, k.stats_column_id, c.column_id, c.name FROM sys.stats_columns k INNER JOIN sys.columns c ON c.object_id = k.object_id AND c.column_id = k.column_id )SELECT s.name, o.name, d.name, d.stats_id, d.auto_created, ISNULL(STUFF(( SELECT ', ' + name FROM k WHERE k.object_id = d.object_id AND k.stats_id = d.stats_id ORDER BY k.stats_column_id FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'),1,1,'') ,'') as Keys , p.rows, p.rows_sampled, p.steps, p.modification_counter, p.last_updatedFROM sys.objects o JOIN sys.schemas s ON s.schema_id = o.schema_idJOIN sys.stats d ON d.object_id = o.object_idOUTER APPLY sys.dm_db_stats_properties(d.object_id , d.stats_id) p WHERE o.is_ms_shipped = 0ORDER BY s.name, o.name, d.stats_id

sys.statssys.stats_columnssys.dm_db_stats_properties (2008R2 sp2, 2012 sp1, 2014 RTM)

;WITH b AS ( SELECT d.object_id, d.index_id, part = COUNT(*) , reserved = 8*SUM(d.reserved_page_count) , used = 8*SUM(d.used_page_count ) , in_row_data = 8*SUM(d.in_row_data_page_count) , lob_used = 8*SUM(d.lob_used_page_count) , overflow = 8*SUM( d.row_overflow_used_page_count) , row_count = SUM(row_count) , notcompressed = SUM(CASE data_compression WHEN 0 THEN 1 ELSE 0 END) , compressed = SUM(CASE data_compression WHEN 0 THEN 0 ELSE 1 END) -- change to 0 for SQL Server 2005 FROM sys.dm_db_partition_stats d WITH(NOLOCK) INNER JOIN sys.partitions r WITH(NOLOCK) ON r.partition_id = d.partition_id GROUP BY d.object_id, d.index_id ), j AS ( SELECT j.object_id, j.index_id, j.key_ordinal, c.column_id, c.name,is_descending_key FROM sys.index_columns j INNER JOIN sys.columns c ON c.object_id = j.object_id AND c.column_id = j.column_id )SELECT t.name, o.name, ISNULL(i.name, '') [index] , ISNULL(STUFF(( SELECT ', ' + name + CASE is_descending_key WHEN 1 THEN '-' ELSE '' END FROM j WHERE j.object_id = i.object_id AND j.index_id = i.index_id AND j.key_ordinal >0 ORDER BY j.key_ordinal FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'),1,1,'') ,'') as Keys , ISNULL(STUFF(( SELECT ', ' + name FROM j WHERE j.object_id = i.object_id AND j.index_id = i.index_id AND j.key_ordinal = 0 ORDER BY j.column_id FOR XML PATH(''), TYPE, ROOT).value('root[1]','nvarchar(max)'),1,1,'') ,'') as Incl , i.index_id , CASE WHEN i.is_primary_key = 1 THEN 'PK' WHEN i.is_unique_constraint = 1 THEN 'UC' WHEN i.is_unique = 1 THEN 'U' WHEN i.type = 0 THEN 'heap' WHEN i.type = 3 THEN 'X' WHEN i.type = 4 THEN 'S' ELSE CONVERT(char, i.type) END typ , i.data_space_id dsi , b.row_count , b.in_row_data in_row , b.overflow ovf , b.lob_used lob , b.reserved - b.in_row_data - b.overflow -b.lob_used unu , 'ABR' = CASE row_count WHEN 0 THEN 0 ELSE 1024*used/row_count END , y.user_seeks, y.user_scans u_scan, y.user_lookups u_look, y.user_updates u_upd , b.notcompressed ncm , b.compressed cmp , rw_delta = b.row_count - s.rows, s.rows_sampled --, s.unfiltered_rows, s.modification_counter mod_ctr, s.steps , CONVERT(varchar, s.last_updated,120) updated, i.is_disabled dis, i.is_hypothetical hyp, ISNULL(i.filter_definition, '') filt FROM sys.objects o JOIN sys.schemas t ON t.schema_id = o.schema_idJOIN sys.indexes i ON i.object_id = o.object_id LEFT JOIN b ON b.object_id = i.object_id AND b.index_id = i.index_id LEFT JOIN sys.dm_db_index_usage_stats y ON y.object_id = i.object_id AND y.index_id = i.index_id AND y.database_id = DB_ID() OUTER APPLY sys.dm_db_stats_properties(i.object_id , i.index_id) s WHERE o.is_ms_shipped = 0

Statistics Auto/Re-Compute

• Automatically generated on query compile• Recompute at 6 rows, 500, every 20%?

Has this changed? 2008 R2Trace 2371 – lower threshold auto recomputed for large tables

http://support.microsoft.com/kb/2754171

http://www.sqlskills.com/blogs/erin/understanding-when-statistics-will-automatically-update/Understanding When Statistics Will Automatically Update

Statistics Sampling

• Sampling theory– True random sample– Sample error - square root N • Relative error 1/ N

• SQL Server sampling– Random pages • But always first and last page???

– All rows in selected pages

Query Optimizer Assumptions

• Statistics has captured lower and upper bounds

• If there are no range rows between range hi keys, then it knows there are no rows in between

• The page sampling method compensates for lack of true random row sample

Hypothesis

• If we insert/update rows, • then dm_db_stats_properties

modification_counter > 0• Should not we assume that there is new data?

Row Estimate Problems (at source)

• Skewed data distribution– Does query compile with low, medium or high

skew parameter value?– What about subsequent executes?

• Errors – due to random page sample• Out of bounds• Value does not exist

Row estimate errors at source – is classified under statistics topic

Problems in Statistics

• Scenario – update all statistics with default• Insert rows, new (int / identity) values that are

higher than upper bound at time of stats• if plan compiles with previously existing value– Then fine

• If plan compiles with new value, – Then optimizer knows value “does not exist”– But plan will show 1 row

SELECT * FROM PART WHERE P_PARTKEY = 1999999

SELECT * FROM PART WHERE P_PARTKEY = 2000001

SELECT * FROM LINEITEM WHERE L_PARTKEY = 2000001

SELECT * FROM PART JOIN LINEITEM ON L_PARTKEY = P_PARTKEYWHERE P_PARTKEY = 2000001

SELECT * FROM PART JOIN LINEITEM ON L_PARTKEY = P_PARTKEYWHERE P_PARTKEY = 1999999

Loop Join - Table Scan on Inner Source

Estimated out from first 2 tabes (at right) is zero or 1 rows. Most efficient join to third table (without index on join column) is a loop join with scan. If row count is 2 or more, then a fullscan is performed for each row from outer source

Default statistics rules may lead to serious ETL issuesConsider custom strategy

Limited distinct values

• Column has less than 200 distinct values– Statistics via histogram knows the exact values– i.e., it also knows what values do not exist

• Now insert rows with different values– If >= 20%, then statistics update is triggered– Next query w/explicit SARG on new value good• Ex WHERE bIsProcessed = 0

– If too few row to trigger stats, then bad news

Compile Parameter Not ExistsMain procedure has cursor around view_ServersFirst server in view_Servers is ’CAESIUM’Cursor executes sub-procedure for each Serversql:

SELECT MAX(ID) FROM TReplWS WHERE Hostname = @ServerName

But CAESIUM does not exist in TReplWS!

Good and Bad Plan?

SqlPlan Compile Parameters

SqlPlan Compile Parameters<?xml version="1.0" encoding="utf-8"?><ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.50.2500.0"> <BatchSequence> <Batch> <Statements> <StmtSimple StatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0)

FROM TReplWS WHERE Hostname = @ServerName" StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost="0.0032843" StatementEstRows="1"StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound">

<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />

<QueryPlan CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="168"> <RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar"

EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007“ AvgRowSize="15" EstimatedTotalSubtreeCost="0.0032843" Parallel="0" EstimateRebinds="0" EstimateRewinds="0">

</RelOp> <ParameterList> <ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" /> </ParameterList> </QueryPlan> </StmtSimple> </Statements> </Batch> </BatchSequence></ShowPlanXML>

Compile parameter values at bottom of sqlplan file

Microsoft Responds (Empire Strikes Back)

http://blogs.msdn.com/b/saponsqlserver/archive/2011/09/07/changes-to-automatic-update-statistics-in-sql-server-traceflag-2371.aspxChanges to automatic update statistics in SQL Server – traceflag 2371

Trace flag 4136http://blogs.msdn.com/b/axinthefield/archive/2010/11/04/sql-server-trace-flags-for-dynamics-ax.aspxQL Server Trace Flags for Dynamics AX

Disables use of the histogram

2371

http://blogs.msdn.com/b/saponsqlserver/archive/2011/09/07/changes-to-automatic-update-statistics-in-sql-server-traceflag-2371.aspx

Upper Bound Problem

• Insert dummy row with maximum value– SET IDENTITY_INSERT table ON– int 2,147,483,647, datetime Dec 31, 9999

• OPTION (OPTIMIZE FOR (@P1 UNKNOWN))• SQL Server 2014 – not a problem

Temp Table and Table Variable

• Forget what other people have said– Most is cr@p

• Temp Tables – subject to statistics auto/re-compile• Table variable – no statistics, assumes 1 row• Question: In each specific case: does the statistics

and recompile help or not?– Yes: temp table– No: table variable

Is this still true?

Multiple joins to one tableSELECT xxxFROM Cession.RetroTransaction rtINNER JOIN Common.Code c1 ON c1.CodeId = rt.TransactionTypeId INNER JOIN Cession.NARSplit ns ON ns.NARSplitId = rt.NARSplitId AND ns.CessionId = rt.CessionIdINNER JOIN CRM.Company comp ON comp.CompanyId = ns.CompanyId INNER JOIN Cession.RetroTransactionPeriod rtp ON rt.RetroTransactionPeriodId = rtp.RetroTransactionPeriodId LEFT OUTER JOIN Cession.RetroTransactionAllocation BasePremium WITH ( NOLOCK ) ON BasePremium.RetroTransactionId = rt.RetroTransactionId AND BasePremium.AllocationTypeId = '02C48EDA-57ED-4704-99CD-6122AE372B3D' AND BasePremium.AllocationGrossNetTypeId = '09E95F88-4546-49B9-B4FB-A068CDBDBFDF' LEFT OUTER JOIN Cession.RetroTransactionAllocation BaseAllowance WITH ( NOLOCK ) ON BaseAllowance.RetroTransactionId = rt.RetroTransactionId

AND BaseAllowance.AllocationTypeId = 'CA5047A3-43CD-496B-8F71-3E95E209806E' LEFT OUTER JOIN Cession.RetroTransactionAllocation PolicyFee WITH ( NOLOCK ) ON PolicyFee.RetroTransactionId = rt.RetroTransactionId

AND PolicyFee.AllocationTypeId = 'FCD31932-840F-4BCA-99DC-F55BD46E46AF' AND PolicyFee.AllocationGrossNetTypeId = '09E95F88-4546-49B9-B4FB-A068CDBDBFDF'

LEFT OUTER JOIN Cession.RetroTransactionAllocation PolicyFeeAllowance ON PolicyFeeAllowance.RetroTransactionId = rt.RetroTransactionId AND PolicyFeeAllowance.AllocationTypeId = 'E570DA71-6B6C-4742-AB8B-DDFDB5286938'

LEFT OUTER JOIN Cession.RetroTransactionAllocation TempFlatExtra WITH ( NOLOCK ) ON TempFlatExtra.RetroTransactionId = rt.RetroTransactionId AND TempFlatExtra.AllocationTypeId = '82CB2D45-904F-4B50-8C36-A39CD2F33D9D'

AND TempFlatExtra.AllocationGrossNetTypeId = '09E95F88-4546-49B9-B4FB-A068CDBDBFDF' LEFT OUTER JOIN Cession.RetroTransactionAllocation TempFlatExtraAllowance ON TempFlatExtraAllowance.RetroTransactionId = rt.RetroTransactionId

AND TempFlatExtraAllowance.AllocationTypeId = '43E49878-9AC3-4A1F-A13F-19A1BA7D3AFA' LEFT OUTER JOIN Cession.RetroTransactionAllocation PermFlatExtra WITH ( NOLOCK ) ON PermFlatExtra.RetroTransactionId = rt.RetroTransactionId

AND PermFlatExtra.AllocationTypeId = '889639AB-620C-4234-9B9E-2970A4500B57' AND PermFlatExtra.AllocationGrossNetTypeId = '09E95F88-4546-49B9-B4FB-A068CDBDBFDF'

LEFT OUTER JOIN Cession.RetroTransactionAllocation PermFlatExtraAllowance ON PermFlatExtraAllowance.RetroTransactionId = rt.RetroTransactionId AND PermFlatExtraAllowance.AllocationTypeId = '43DFA681-44FB-4025-B75A-1E12498A679B'

LEFT OUTER JOIN Cession.RetroTransactionAllocation CashSurrender WITH ( NOLOCK ) ON CashSurrender.RetroTransactionId = rt.RetroTransactionId AND CashSurrender.AllocationTypeId = 'BCBDF86C-DCC5-49DA-BD2E-CD42CE402E99'

LEFT OUTER JOIN Common.Code splittype ON splittype.CodeId = ns.SplitTypeIdWHERE rt.CessionId = N'86037aac-465a-4e47-8bf1-6f8868852af5'

<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.2500.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"> <BatchSequence> <Batch> <Statements>

<StmtSimple StatementCompId="1“ StatementEstRows="5004190000“ StatementId="1" StatementOptmLevel="FULL“ StatementOptmEarlyAbortReason="TimeOut" StatementSubTreeCost="44029" StatementText="SELECT rt.RetroTransactionId, FROM Cession.vwRetroTransaction AS rt WHERE rt.CessionId = N'86037aac-465a-4e47-8bf1-6f8868852af5'" StatementType="SELECT" ParameterizedText="(@1 nvarchar(4000)) " QueryHash="0x19274C9794934C69" QueryPlanHash="0x80E6A4E14DF1DC0A"><StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

<QueryPlan CachedPlanSize="896" CompileTime="296392" CompileCPU="296383" CompileMemory="379856"><RelOp AvgRowSize="365" EstimateCPU="133.341" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0“ EstimateRows="5004190000" LogicalOp="Gather Streams" NodeId="0" Parallel="true" PhysicalOp="Parallelism“

EstimatedTotalSubtreeCost="44029">

<?xml version="1.0" encoding="utf-16"?><ShowPlanXML xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.50.2500.0“ xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"> <BatchSequence> <Batch> <Statements>

<StmtSimple StatementCompId="1“ StatementEstRows="24.2411“StatementId="1" StatementOptmLevel="FULL“ StatementOptmEarlyAbortReason="TimeOut" StatementSubTreeCost="0.683036“ StatementText="SELECT rt.RetroTransactionId FROM Cession.vwRetroTransaction rt WHERE rt.CessionId=@1“ StatementType="SELECT" QueryHash="0x19274C9794934C69" QueryPlanHash="0xD5FF47FBFB2AD6AE"><StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />

<QueryPlan DegreeOfParallelism="1" MemoryGrant="2720“ CachedPlanSize="232“ CompileTime="51" CompileCPU="51" CompileMemory="2712">

<RelOp AvgRowSize="365" EstimateCPU="0.000101328" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="24.2411" LogicalOp="Left Outer Join" NodeId="0" Parallel="false" PhysicalOp="Nested Loops"

EstimatedTotalSubtreeCost="0.683036">

plan