
Managing Statistics for Optimal Query Performance

Karen Morton

OOW 2009

2009 October 13

1:00pm-2:00pm

Moscone South Room 305


Your speaker…

•Karen Morton

– Sr. Principal Database Engineer

– Educator, DBA, developer, consultant, researcher, author, speaker, …

•Come see me…

– karenmorton.blogspot.com

– An Oracle user group near you


Math or Magic?

“I accept no responsibility for statistics, which are a form of magic beyond my comprehension.”

— Robertson Davies

SQL> desc deck
 Name        Null?    Type
 ----------- -------- ------------
 SUIT        NOT NULL VARCHAR2(10)
 CARD                 VARCHAR2(10)
 COLOR                VARCHAR2(5)
 FACEVAL     NOT NULL NUMBER(2)

Table: DECK

Statistic       Current value
--------------- -------------
# rows          52
Blocks          5
Avg Row Len     20
Degree          1
Sample Size     52

Column Name NDV Nulls # Nulls Density Length Low Value High Value
----------- --- ----- ------- ------- ------ --------- ----------
SUIT          4 N           0 .250000      8 Clubs     Spades
CARD         13 Y           0 .076923      5 Ace       Two
COLOR         2 Y           0 .500000      5 Black     Red
FACEVAL      13 N           0 .076923      3 1         13

Index Name     Col# Column Name Unique? Height Leaf Blks Distinct Keys
-------------- ---- ----------- ------- ------ --------- -------------
DECK_PK           1 SUIT        Y            1         1            52
                  2 FACEVAL
DECK_CARD_IDX     1 CARD        N            1         1            13
DECK_COLOR_IDX    1 COLOR       N            1         1             2

Cardinality

The estimated number of rows a query is expected to return.

Cardinality = number of rows in table x predicate selectivity
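The formula can be checked against the DECK statistics with a minimal Python sketch. It assumes what the DECK examples rely on: without a histogram, an equality predicate has selectivity 1/NDV, and predicates combined with AND are treated as independent, so their selectivities multiply. The function name `equality_cardinality` is illustrative only and is not an Oracle API.

```python
from fractions import Fraction


def equality_cardinality(num_rows, ndvs):
    """Cardinality = number of rows x predicate selectivity.

    Each equality predicate contributes 1/NDV; ANDed predicates are
    assumed independent, so their selectivities are multiplied.
    """
    selectivity = Fraction(1)
    for ndv in ndvs:
        selectivity *= Fraction(1, ndv)
    return num_rows * selectivity


# DECK: 52 rows; NDV(color) = 2, NDV(card) = 13, NDV(suit) = 4
print(equality_cardinality(52, []))       # no predicate                 -> 52
print(equality_cardinality(52, [2]))      # color = 'Black'              -> 26
print(equality_cardinality(52, [13]))     # card = 'Ace'                 -> 4
print(equality_cardinality(52, [13, 4]))  # card = 'Ace' and suit = ...  -> 1
```

`Fraction` keeps the arithmetic exact, so 52 x 1/13 x 1/4 comes out as exactly 1 rather than a rounded float.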

select *
from deck
order by suit, faceval ;

Cardinality = 52 x 1 = 52

SQL> select * from deck order by suit, faceval ;

52 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3142028678

----------------------------------------------------------------------
| Id | Operation                   | Name    | Rows | Bytes | Cost |
----------------------------------------------------------------------
|  0 | SELECT STATEMENT            |         |   52 |  1040 |    2 |
|  1 |  TABLE ACCESS BY INDEX ROWID| DECK    |   52 |  1040 |    2 |
|  2 |   INDEX FULL SCAN           | DECK_PK |   52 |       |    1 |
----------------------------------------------------------------------

select *
from deck
where color = 'Black' ;

Cardinality = 52 x 1/2 = 26

SQL> select * from deck where color = 'Black' ;

26 rows selected.

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------
| Id | Operation                   | Name           | Rows | Bytes | Cost |
----------------------------------------------------------------------------
|* 2 |   INDEX RANGE SCAN          | DECK_COLOR_IDX |   26 |       |    1 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COLOR"='Black')

select *
from deck
where card = 'Ace'
and suit = 'Spades' ;

Cardinality = 52 x 1/13 x 1/4 = 1

SQL> select *
  2  from deck
  3  where card = 'Ace'
  4  and suit = 'Spades' ;

1 row selected.

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------
| Id | Operation                   | Name          | Rows | Bytes | Cost |
----------------------------------------------------------------------------
|* 1 |  TABLE ACCESS BY INDEX ROWID| DECK          |    1 |    20 |    2 |
|* 2 |   INDEX RANGE SCAN          | DECK_CARD_IDX |    4 |       |    1 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("SUIT"='Spades')
   2 - access("CARD"='Ace')

select *
from deck
where faceval > 10 ;

Cardinality = 52 x (High Value - Predicate Value) / (High Value - Low Value)
            = 52 x (13 - 10) / (13 - 1)
            = 13
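The range formula can be written as a short Python function. This is a sketch that assumes the uniform-distribution model the slide uses. Predicate values outside the low/high range are deliberately not handled here, since the hi_lo_t example later shows they need special treatment. The function name is illustrative.

```python
def greater_than_cardinality(num_rows, low, high, value):
    """Estimate rows for 'col > value', assuming values are spread evenly
    between the column's low and high value statistics."""
    if not low <= value <= high:
        raise ValueError("only low <= value <= high is modelled in this sketch")
    selectivity = (high - value) / (high - low)
    return num_rows * selectivity


# DECK.FACEVAL: low 1, high 13, 52 rows
print(greater_than_cardinality(52, low=1, high=13, value=10))  # 13.0
```

The estimate of 13 is close to, but not the same as, the 12 rows actually returned. The formula treats FACEVAL as a continuous range from 1 to 13, while the table holds only the integers 11, 12 and 13 above 10, with four rows each.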

SQL> select *
  2  from deck
  3  where faceval > 10 ;

12 rows selected.

Execution Plan
----------------------------------------------------------

---------------------------------------------------------
| Id | Operation         | Name | Rows | Bytes | Cost |
---------------------------------------------------------
|* 1 |  TABLE ACCESS FULL| DECK |   13 |   260 |    3 |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("FACEVAL">10)

select *
from deck
where card = 'Ace' ;

Cardinality = 52 x 1/13 = 4

SQL> select * from deck where card = :b1 ;

4 rows selected.

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------
| Id | Operation                   | Name          | Rows | Bytes | Cost |
----------------------------------------------------------------------------
|* 2 |   INDEX RANGE SCAN          | DECK_CARD_IDX |    4 |       |    1 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CARD"=:B1)

Math or Magic?

Maybe it's a little bit of both!

What's the best method for collecting statistics?

It depends.

Statistics that don't reasonably describe your data

…lead to poor cardinality estimates

…which leads to poor access path selection

…which leads to poor join method selection

…which leads to poor join order selection

…which leads to poor SQL execution times.

Statistics matter!

Automatic vs Manual

Automatic Collections

• Objects must change by at least 10% to be considered stale
• Collection is scheduled during the nightly maintenance window
• Runs dbms_stats.gather_database_stats_job_proc
• Prioritizes collection so that the objects most in need of updating are processed first
• Most functional when data changes at a slow to moderate rate
• Volatile tables and large bulk loads are good candidates for manual collection
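The 10% rule in the first bullet can be pictured as a simple staleness test. The sketch below is a hypothetical Python model, not Oracle's code. Oracle tracks DML per table (visible in DBA_TAB_MODIFICATIONS), and in 11g the threshold can be changed through the STALE_PERCENT statistics preference. Whether the boundary is inclusive, and how an empty table is treated, are assumptions made here.

```python
def is_stale(num_rows, inserts, updates, deletes, stale_percent=10):
    """Return True when tracked DML reaches stale_percent of num_rows.

    num_rows is the row count recorded by the last statistics collection;
    the DML counts play the role of the INSERTS/UPDATES/DELETES columns
    of DBA_TAB_MODIFICATIONS.
    """
    if num_rows == 0:
        # Assumption for this sketch: a table whose last collection saw no
        # rows is always treated as a candidate for re-collection.
        return True
    changed = inserts + updates + deletes
    return changed * 100 / num_rows >= stale_percent


print(is_stale(1_000_000, 50_000, 20_000, 10_000))  # 8% changed  -> False
print(is_stale(1_000_000, 90_000, 40_000, 20_000))  # 15% changed -> True
```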

Automatic doesn't mean accurate (for your data).

Automatic Collection Defaults

SQL> exec dbms_stats.gather_table_stats (ownname=>?, tabname=>?) ;

partname          NULL
cascade           DBMS_STATS.AUTO_CASCADE
estimate_percent  DBMS_STATS.AUTO_SAMPLE_SIZE
stattab           NULL
block_sample      FALSE
statid            NULL
method_opt        FOR ALL COLUMNS SIZE AUTO
statown           NULL
degree            1 or value based on number of CPUs and initialization parameters
force             FALSE
granularity       AUTO (value is based on partitioning type)
no_invalidate     DBMS_STATS.AUTO_INVALIDATE

cascade => AUTO_CASCADE
Allow Oracle to determine whether or not to gather index statistics.

estimate_percent => AUTO_SAMPLE_SIZE
Allow Oracle to determine the sample size.

method_opt => FOR ALL COLUMNS SIZE AUTO
Allow Oracle to determine when to gather histogram statistics, based in part on the column usage recorded in SYS.COL_USAGE$.

no_invalidate => AUTO_INVALIDATE
Allow Oracle to determine when to invalidate dependent cursors.

Goal

Collect statistics that are "good enough" to meet most needs most of the time.

Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you would be perfectly comfortable.

– Bobby Bragan

Manual Collections

dbms_stats.gather_*_stats

* = database, schema, table, index, etc.

Is it common for your users to get slammed with performance problems shortly after statistics are updated?

Does performance decline before a 10% data change occurs?

Do low and high values for a column change significantly between automatic collections?

Does your application performance seem "sensitive" to changing user counts as well as data volume changes?

If you answered "Yes" to one or more of these questions, your application's unique needs may be best served with manual collection.

Test. Test. Test.

Dynamic Sampling

• optimizer_dynamic_sampling parameter
• dynamic_sampling hint

SQL> create table depend_test as
  2  select mod(num, 100) c1,
  3         mod(num, 100) c2,
  4         mod(num, 75)  c3,
  5         mod(num, 30)  c4
  6  from (select level num from dual
  7        connect by level <= 10001);

Table created.

SQL> exec dbms_stats.gather_table_stats( user, 'depend_test', estimate_percent => null, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.


Statistic       Current value
--------------- --------------
# rows          10001
Blocks          28
Avg Row Len     11
Sample Size     10001
Monitoring      YES

Column  NDV Density AvgLen Histogram LowVal HighVal
------- --- ------- ------ --------- ------ -------
C1      100 .010000      3 NONE (1)  0      99
C2      100 .010000      3 NONE (1)  0      99
C3       75 .013333      3 NONE (1)  0      74
C4       30 .033333      3 NONE (1)  0      29

SQL> set autotrace traceonly explain
SQL> select count(*) from depend_test where c1 = 10;

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------
| Id | Operation          | Name        | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT   |             |    1 |     3 |     8   (0)| 00:00:01 |
|  1 |  SORT AGGREGATE    |             |    1 |     3 |            |          |
|* 2 |   TABLE ACCESS FULL| DEPEND_TEST |  100 |   300 |     8   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=10)


SQL> select count(*) from depend_test where c1 = 10 and c2 = 10;

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------
| Id | Operation          | Name        | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|  1 |  SORT AGGREGATE    |             |    1 |     6 |            |          |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=10 AND "C2"=10)


SQL> select /*+ dynamic_sampling (4) */ count(*)
  2  from depend_test where c1 = 10 and c2 = 10;

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------
| Id | Operation          | Name        | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|  1 |  SORT AGGREGATE    |             |    1 |     6 |            |          |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=10 AND "C2"=10)

Note
-----
   - dynamic sampling used for this statement
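The idea behind dynamic sampling can be illustrated with a simplified Python model of the DEPEND_TEST data. Instead of multiplying the 1/100 selectivities of C1 and C2 as if they were independent, it evaluates the whole predicate on a random sample and scales the hit count up. This sketch rests on several assumptions. Oracle samples blocks through a recursive query, whereas here individual rows are sampled in memory. The sample size of 5000 and the helper names are arbitrary choices for illustration.

```python
import random


def build_depend_test(n=10001):
    """Rows (c1, c2, c3, c4) matching the CREATE TABLE ... CONNECT BY example."""
    return [(num % 100, num % 100, num % 75, num % 30) for num in range(1, n + 1)]


def c1_c2_equal_10(row):
    return row[0] == 10 and row[1] == 10


def sampled_cardinality(rows, predicate, sample_size, seed=0):
    """Evaluate the full predicate on a random row sample and scale up."""
    rng = random.Random(seed)
    sample = rng.sample(rows, sample_size)
    hits = sum(1 for row in sample if predicate(row))
    return hits * len(rows) / sample_size


rows = build_depend_test()
actual = sum(1 for row in rows if c1_c2_equal_10(row))
independent = len(rows) * (1 / 100) * (1 / 100)  # C1 and C2 treated as unrelated
sampled = sampled_cardinality(rows, c1_c2_equal_10, sample_size=5000)
print(actual, round(independent), round(sampled))
```

Because the sample evaluates C1 and C2 together, the sampled estimate should land in the neighborhood of the true 100 rows. The independence estimate, by contrast, is about 1.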

11g Extended Statistics

SQL> select dbms_stats.create_extended_stats(ownname=>user,
  2         tabname => 'DEPEND_TEST',
  3         extension => '(c1, c2)' ) AS c1_c2_correlation
  4  from dual ;

C1_C2_CORRELATION
-------------------------------------------------------------
SYS_STUF3GLKIOP5F4B0BTTCFTMX0W

SQL> exec dbms_stats.gather_table_stats( user, 'depend_test');

SQL> set autotrace traceonly explain
SQL> select count(*) from depend_test where c1 = 10 and c2 = 10;

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------
| Id | Operation          | Name        | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|  1 |  SORT AGGREGATE    |             |    1 |     6 |            |          |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C1"=10 AND "C2"=10)
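A column group such as the (c1, c2) extension records the number of distinct combinations of its columns. The Python sketch below rebuilds the DEPEND_TEST rows in memory to show why that matters. It compares the independence estimate, num_rows / NDV(c1) / NDV(c2), with num_rows / NDV(c1, c2). It models the arithmetic only: it does not query Oracle and does not reproduce the plans above.

```python
# Rebuild DEPEND_TEST: c1 = c2 = mod(num, 100) for num = 1 .. 10001
rows = [(num % 100, num % 100, num % 75, num % 30) for num in range(1, 10002)]
num_rows = len(rows)

ndv_c1 = len({row[0] for row in rows})
ndv_c2 = len({row[1] for row in rows})
ndv_c1_c2 = len({(row[0], row[1]) for row in rows})  # distinct (c1, c2) pairs

independent_estimate = num_rows / ndv_c1 / ndv_c2    # about 1 row
column_group_estimate = num_rows / ndv_c1_c2         # about 100 rows
actual = sum(1 for row in rows if row[0] == 10 and row[1] == 10)

print(ndv_c1, ndv_c2, ndv_c1_c2)                                          # 100 100 100
print(round(independent_estimate), round(column_group_estimate), actual)  # 1 100 100
```

Because c1 always equals c2, there are only 100 distinct pairs rather than 100 x 100. The column-group estimate therefore matches the actual row count.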

Setting Statistics

• dbms_stats.set_column_stats
• dbms_stats.set_index_stats
• dbms_stats.set_table_stats

It's OK.

Really.

Why guess (i.e. gather stats) when you know!

Common Performance Problems

3 areas where non-representative statistics cause problems

1. Data Skew

Without a histogram, the optimizer assumes a uniform distribution of column values.

[Figures: Color column - uniform distribution; Color column - skewed distribution]

Data skew must be identified with a histogram.

Table: obj_tab

Statistic       Current value
--------------- --------------
# rows          1601874
Blocks          22321
Avg Row Len     94
Sample Size     1601874
Monitoring      YES

Column: object_type (has 36 distinct values)

OBJECT_TYPE                     PCT_TOTAL
------------------------------- ---------
WINDOW GROUP - PROGRAM          .00-.02
EVALUATION CONTEXT - XML SCHEMA .03-.05
OPERATOR - PROCEDURE            .11-.17
LIBRARY - TYPE BODY             .30-.35
FUNCTION - INDEX PARTITION      .54-.64
JAVA RESOURCE - PACKAGE         1.54-1.69
TABLE - VIEW                    3.44-7.35
JAVA CLASS                      32.80
SYNONYM                         40.01

100% Statistics
FOR ALL COLUMNS SIZE 1

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  16yy3p8sstr28, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_name, object_type, object_id,
status from obj_tab where object_type = 'PROCEDURE'

--------------------------------------------------------------------------------
| Id | Operation                   | Name         | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------
|  1 |  TABLE ACCESS BY INDEX ROWID| OBJ_TAB      |  44497 |   2720 |    1237 |
|* 2 |   INDEX RANGE SCAN          | OBJ_TYPE_IDX |  44497 |   2720 |     193 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='PROCEDURE')

R (response time) = .06 seconds
E-Rows = 1/36 x 1,601,874

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  9u6ppkh5mhr8v, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_name, object_type, object_id,
status from obj_tab where object_type = 'SYNONYM'

--------------------------------------------------------------------------------
| Id | Operation                   | Name         | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------
|  1 |  TABLE ACCESS BY INDEX ROWID| OBJ_TAB      |  44497 |   640K |    104K |
|* 2 |   INDEX RANGE SCAN          | OBJ_TYPE_IDX |  44497 |   640K |   44082 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='SYNONYM')

R = 14.25 seconds
E-Rows = 1/36 x 1,601,874

Re-collect statistics

100%

FOR ALL COLUMNS SIZE AUTO

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  16yy3p8sstr28, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_name, object_type, object_id,
status from obj_tab where object_type = 'PROCEDURE'

--------------------------------------------------------------------------------
| Id | Operation                   | Name         | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------
|  1 |  TABLE ACCESS BY INDEX ROWID| OBJ_TAB      |   2720 |   2720 |    1237 |
|* 2 |   INDEX RANGE SCAN          | OBJ_TYPE_IDX |   2720 |   2720 |     193 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='PROCEDURE')

R = .07 seconds
E-Rows = histogram selectivity x 1,601,874

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  9u6ppkh5mhr8v, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_name, object_type, object_id,
status from obj_tab where object_type = 'SYNONYM'

-----------------------------------------------------------------
| Id | Operation          | Name    | E-Rows | A-Rows | Buffers |
-----------------------------------------------------------------
|* 1 |  TABLE ACCESS FULL | OBJ_TAB |   640K |   640K |   64263 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='SYNONYM')

R = 3.36 seconds (vs 14.25 seconds without the histogram)
E-Rows = histogram selectivity x 1,601,874

Histograms are important for more reasons than just helping determine the access method: the cardinality estimates they improve also drive join method and join order selection.
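The two E-Rows formulas above can be compared in a short Python sketch. The uniform estimate reproduces the 44497 shown for every OBJECT_TYPE value. The frequency-histogram part uses a small hypothetical column, because the slides do not list OBJ_TAB's full per-value counts. The function names are illustrative. Rounding up is an assumption chosen to match the displayed E-Rows.

```python
import math
from collections import Counter


def estimate_without_histogram(num_rows, ndv):
    # Uniform assumption: every distinct value gets num_rows / NDV rows.
    # Rounded up in this sketch so it matches the E-Rows shown above.
    return math.ceil(num_rows / ndv)


def estimate_with_frequency_histogram(histogram, value):
    # A frequency histogram keeps a row count per distinct value, so an
    # equality estimate is just that value's count (0 if not recorded).
    return histogram.get(value, 0)


# OBJ_TAB: 1,601,874 rows, 36 distinct OBJECT_TYPE values
print(estimate_without_histogram(1_601_874, 36))  # 44497 for every value

# Hypothetical 100-row column with skew loosely like OBJ_TAB's
values = ["SYNONYM"] * 40 + ["JAVA CLASS"] * 33 + ["VIEW"] * 25 + ["PROCEDURE"] * 2
histogram = Counter(values)
print(estimate_without_histogram(len(values), len(histogram)))    # 25 for every value
print(estimate_with_frequency_histogram(histogram, "SYNONYM"))    # 40
print(estimate_with_frequency_histogram(histogram, "PROCEDURE"))  # 2
```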

2. Bind Peeking

During hard parse, the optimizer "peeks" at the bind value and uses it to determine the execution plan.

But what if your data is skewed?

11g

SQL> variable objtype varchar2(19)

SQL> exec :objtype := 'PROCEDURE';


SQL> select /*+ gather_plan_statistics */ count(*) ct

2 from big_tab

3 where object_type = :objtype ;

CT

---------------

4416

1 row selected.

SQL>

SQL> select * from table

(dbms_xplan.display_cursor('211078a9adzak',0,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
---------------------------------------
SQL_ID  211078a9adzak, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) ct from big_tab where
object_type = :objtype

-------------------------------------------------------------------------
| Id | Operation         | Name            | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
|  1 |  SORT AGGREGATE   |                 |      1 |      1 |      16 |
|* 2 |   INDEX RANGE SCAN| BIG_OBJTYPE_IDX |   4416 |   4416 |      16 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"=:OBJTYPE)

SQL> select child_number, executions, buffer_gets,

2 is_bind_sensitive, is_bind_aware, is_shareable

3 from v$sql where sql_id = '211078a9adzak' ;

CHILD_NUMBER = 0

EXECUTIONS = 1

BUFFER_GETS = 16

IS_BIND_SENSITIVE = N

IS_BIND_AWARE = N

IS_SHAREABLE = Y


SQL> exec :objtype := 'SYNONYM';


SQL> select /*+ gather_plan_statistics */ count(*) ct

2 from big_tab

3 where object_type = :objtype ;

CT

----------------

854176

1 row selected.

SQL>

SQL> select * from table

(dbms_xplan.display_cursor('211078a9adzak',0,'ALLSTATS LAST'));

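PLAN_TABLE_OUTPUT
---------------------------------------
SQL_ID  211078a9adzak, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) ct from big_tab where
object_type = :objtype

-------------------------------------------------------------------------
| Id | Operation         | Name            | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
|  1 |  SORT AGGREGATE   |                 |      1 |      1 |    2263 |
|* 2 |   INDEX RANGE SCAN| BIG_OBJTYPE_IDX |   4416 |   854K |    2263 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"=:OBJTYPE)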

CHILD_NUMBER = 0

EXECUTIONS = 2

BUFFER_GETS = 2279 (2263 + 16)

IS_BIND_SENSITIVE = Y

IS_BIND_AWARE = N

IS_SHAREABLE = Y


SQL> exec :objtype := 'SYNONYM';


PLAN_TABLE_OUTPUT

---------------------------------------


-------------------------------------




-----------------------------------------------------------------------------
| Id | Operation             | Name            | E-Rows | A-Rows | Buffers |
-----------------------------------------------------------------------------
|  1 |  SORT AGGREGATE       |                 |      1 |      1 |    6016 |
|* 2 |   INDEX FAST FULL SCAN| BIG_OBJTYPE_IDX |   854K |   854K |    6016 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------





CHILD_NUMBER = 0

EXECUTIONS = 2

BUFFER_GETS = 2279


IS_BIND_AWARE = N

IS_SHAREABLE = N

CHILD_NUMBER = 1

EXECUTIONS = 1

BUFFER_GETS = 6016


IS_BIND_AWARE = Y

IS_SHAREABLE = Y


SQL> exec :objtype := 'PROCEDURE';


PLAN_TABLE_OUTPUT

---------------------------------------


-------------------------------------




-------------------------------------------------------------------------
| Id | Operation         | Name            | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
|  1 |  SORT AGGREGATE   |                 |      1 |      1 |      16 |
|* 2 |   INDEX RANGE SCAN| BIG_OBJTYPE_IDX |   4416 |   4416 |      16 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------





CHILD_NUMBER = 0

EXECUTIONS = 2

BUFFER_GETS = 2279


IS_BIND_AWARE = N

IS_SHAREABLE = N

CHILD_NUMBER = 1

EXECUTIONS = 1

BUFFER_GETS = 6016


IS_BIND_AWARE = Y

IS_SHAREABLE = Y

CHILD_NUMBER = 2

EXECUTIONS = 1

BUFFER_GETS = 16


IS_BIND_AWARE = Y

IS_SHAREABLE = Y

10g will create only one plan, based on the bind value peeked at the first hard parse.

11g (adaptive cursor sharing) will create additional plans as needed to cover data skew.

Handling bind peeking is more of a coding issue than a statistics issue.
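The difference between the 10g and 11g behavior can be mimicked with a toy Python model. It uses the row counts from the demo: 4416 PROCEDURE rows and 854176 SYNONYM rows. Everything else is an assumption made for illustration:

• the 100,000-row cut-off between the two plans;
• the rule that a cursor becomes bind aware once an execution returns more than 10 times (or less than a tenth of) the rows it was optimized for.

Oracle's adaptive cursor sharing uses its own internal bookkeeping. This sketch only reproduces the child-cursor sequence seen above: child 0, then 1, then 2.

```python
# Toy model only: row counts come from the demo; thresholds are invented.
ROW_COUNTS = {"PROCEDURE": 4416, "SYNONYM": 854176}
INDEX_SCAN_LIMIT = 100_000  # hypothetical cut-off between the two plans


def choose_plan(rows):
    """Pick a plan from the peeked row count (hypothetical rule)."""
    return "INDEX RANGE SCAN" if rows < INDEX_SCAN_LIMIT else "INDEX FAST FULL SCAN"


class PeekOnceCursor:
    """10g-style sharing: the plan chosen at hard parse is reused for every value."""

    def __init__(self):
        self.plan = None

    def execute(self, value):
        if self.plan is None:  # hard parse: peek at this bind value
            self.plan = choose_plan(ROW_COUNTS[value])
        return 0, self.plan  # (child number, plan): always child 0


class BindAwareCursor:
    """Rough imitation of the 11g sequence in the demo, not Oracle's algorithm."""

    def __init__(self):
        self.peeked_rows = None
        self.first_plan = None
        self.bind_aware = False
        self.children = {}  # plan -> child number, used once bind aware
        self.next_child = 1

    def execute(self, value):
        rows = ROW_COUNTS[value]
        if self.peeked_rows is None:  # hard parse creates child 0
            self.peeked_rows = rows
            self.first_plan = choose_plan(rows)
        if not self.bind_aware:
            # Child 0 is still shared; a row count far from the peeked one
            # (hypothetical factor of 10) makes later executions bind aware.
            if rows > 10 * self.peeked_rows or rows * 10 < self.peeked_rows:
                self.bind_aware = True
            return 0, self.first_plan
        plan = choose_plan(rows)
        if plan not in self.children:  # new child cursor for a new plan
            self.children[plan] = self.next_child
            self.next_child += 1
        return self.children[plan], plan


sequence = ["PROCEDURE", "SYNONYM", "SYNONYM", "PROCEDURE"]
old_style = PeekOnceCursor()
new_style = BindAwareCursor()
for value in sequence:
    print(value, old_style.execute(value), new_style.execute(value))
```

In the 10g-style model every execution reuses the range scan, even for 854176 rows. In the toy 11g model, the second SYNONYM execution gets child 1 with a fast full scan, and the next PROCEDURE execution gets its own child 2.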

3. Incorrect High and Low Values

To derive the cardinality estimate for range predicates, the optimizer uses the low and high value statistics.

Table: hi_lo_t

Statistic       Current value
--------------- ---------------------
# rows          100000
Blocks          180
Avg Row Len     7
Sample Size     100000
Monitoring      YES

Column NDV    Nulls Density AvgLen Histogram LowVal HighVal
------ ------ ----- ------- ------ --------- ------ -------
A      100000 N     .000010      5 NONE (1)  10     100009
B          10 Y     .100000      3 NONE (1)  9      18

100% Statistics
FOR ALL COLUMNS SIZE 1

select count(a)
from hi_lo_t
where b < 11 ;

Cardinality = 100000 rows x (Predicate value - Low value) / (High value - Low value)
            = 100000 rows x (11 - 9) / (18 - 9)
            = 100000 rows x .22222 = 22222

select count(a) from hi_lo_t where b < 11

------------------------------------------------------------------
| Id | Operation          | Name    | E-Rows | A-Rows | Buffers |
------------------------------------------------------------------
|  1 |  SORT AGGREGATE    |         |      1 |      1 |     184 |
|* 2 |   TABLE ACCESS FULL| HI_LO_T |  22222 |  20000 |     184 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("B"<11)

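select count(a)
from hi_lo_t
where b < 4 ;

Cardinality = 100000 rows x .10 x (1 + (Predicate value - Low value) / (High value - Low value))
            = 100000 rows x .10 x (1 + (4 - 9) / (18 - 9))
            = 100000 rows x .04444 = 4444

Because 4 lies below the low value (9), the estimate starts from the column density (.10 = 1/NDV for B) and shrinks in proportion to how far the predicate value falls outside the low-high range.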

select count(a) from hi_lo_t where b < 4

------------------------------------------------------------------
| Id | Operation          | Name    | E-Rows | A-Rows | Buffers |
------------------------------------------------------------------
|  1 |  SORT AGGREGATE    |         |      1 |      1 |     184 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("B"<4)
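The two hi_lo_t estimates can be reproduced with a small Python function that follows the formulas on the slides:

• in range: (predicate - low) / (high - low);
• below the low value: density x (1 + (predicate - low) / (high - low)), floored at zero.

Two choices are simplifications of this sketch, not documented optimizer behavior: treating a value above the high value as selecting every row, and ignoring nulls. The function name is illustrative.

```python
def less_than_cardinality(num_rows, low, high, density, value):
    """Estimate rows for 'col < value' from low/high value statistics."""
    span = high - low
    if low <= value <= high:
        # Fraction of the low..high range that lies below the predicate value.
        selectivity = (value - low) / span
    elif value < low:
        # Out of range: start from the density and shrink it linearly with
        # the distance below the low value, never going below zero.
        selectivity = max(0.0, density * (1 + (value - low) / span))
    else:
        # Above the high value: treat every row as qualifying (simplification).
        selectivity = 1.0
    return round(num_rows * selectivity)


# HI_LO_T column B: low 9, high 18, density .10, 100000 rows
print(less_than_cardinality(100000, 9, 18, 0.10, 11))  # 22222
print(less_than_cardinality(100000, 9, 18, 0.10, 4))   # 4444
```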

METHOD_OPT => 'FOR ALL INDEXED COLUMNS'

Be cautious of using this!

If a column is not indexed, no statistics are collected for it.

select count(a) from hi_lo_t where b = 12

------------------------------------------------------------------
| Id | Operation          | Name    | E-Rows | A-Rows | Buffers |
------------------------------------------------------------------
|  1 |  SORT AGGREGATE    |         |      1 |      1 |     184 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("B"=12)

Without statistics, a 10% default is used.

Result:

Cardinality estimates that are orders of magnitude "off"

Conclusion

Why guess when you can know.

Thoroughly test and document your statistics collection strategy.

Check default options, particularly when upgrading.

Things change.

Regularly check statistics and compare to previous collections for any anomalies.

10.2.0.4 and above: dbms_stats.diff_table_stats_*
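Oracle's tools for this comparison are the DBMS_STATS.DIFF_TABLE_STATS_* functions. To illustrate the same idea outside the database, the hypothetical Python sketch below compares two saved snapshots of column statistics and flags any value that moved by more than a chosen percentage. The snapshot format, the 10% threshold and the sample numbers are assumptions, not output from those functions.

```python
def diff_stats(old, new, pct_threshold=10.0):
    """Compare two {column: {statistic: number}} snapshots.

    Returns (column, statistic, old, new, pct_change) for every numeric
    statistic that moved by more than pct_threshold percent.
    """
    changes = []
    for column, old_stats in old.items():
        new_stats = new.get(column, {})
        for stat, old_value in old_stats.items():
            new_value = new_stats.get(stat)
            if new_value is None:
                continue  # statistic missing from the new snapshot: not compared here
            if old_value == 0:
                pct = 0.0 if new_value == 0 else float("inf")
            else:
                pct = abs(new_value - old_value) * 100 / abs(old_value)
            if pct > pct_threshold:
                changes.append((column, stat, old_value, new_value, round(pct, 1)))
    return changes


# Hypothetical snapshots of HI_LO_T column B taken at two collections
before = {"B": {"num_distinct": 10, "low_value": 9, "high_value": 18}}
after = {"B": {"num_distinct": 11, "low_value": 9, "high_value": 25}}
print(diff_stats(before, after))  # [('B', 'high_value', 18, 25, 38.9)]
```

A jump in the high value like the one flagged here is exactly the kind of change that shifts range-predicate estimates, as the hi_lo_t example shows.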

Don't ignore your data.

There is no single strategy that works best for everyone.

Statistics must reasonably represent your actual data.

Understanding basic optimizer statistics computations is key.

The more you know, the more likely you are to succeed.

Thank You!

Q U E S T I O N S

A N S W E R S
