managing statistics for optimal query performance
DESCRIPTION
Half the battle of writing good SQL is in understanding how the Oracle query optimizer analyzes your code and applies statistics in order to derive the “best” execution plan. The other half of the battle is successfully applying that knowledge to the databases that you manage. The optimizer uses statistics as input to develop query execution plans, and so these statistics are the foundation of good plans. If the statistics supplied aren’t representative of your actual data, you can expect bad plans. However, if the statistics are representative of your data, then the optimizer will probably choose an optimal plan.
TRANSCRIPT
Managing Statistics for Optimal Query Performance
Karen Morton
OOW 2009
2009 October 13
1:00pm-2:00pm
Moscone South Room 305
Your speaker…
•Karen Morton
– Sr. Principal Database Engineer
– Educator, DBA, developer, consultant, researcher, author, speaker, …
•Come see me…
– karenmorton.blogspot.com
– An Oracle user group near you
Math or Magic?
“I accept no responsibility for statistics, which are a form of magic beyond my comprehension.”
— Robertson Davies
SQL>desc deck
Name Null? Type
------------- -------- -------------
SUIT NOT NULL VARCHAR2(10)
CARD VARCHAR2(10)
COLOR VARCHAR2(5)
FACEVAL NOT NULL NUMBER(2)
Table: DECK
Statistic Current value
--------------- -------------------
# rows 52
Blocks 5
Avg Row Len 20
Degree 1
Sample Size 52
Column Name NDV Nulls # Nulls Density Length Low Value High Value
----------- --- ----- ------- ------- ------ ---------- -----------
SUIT 4 N 0 .250000 8 Clubs Spades
CARD 13 Y 0 .076923 5 Ace Two
COLOR 2 Y 0 .500000 5 Black Red
FACEVAL 13 N 0 .076923 3 1 13
Index Name Col# Column Name Unique? Height Leaf Blks Distinct Keys
-------------- ----- ------------ ------- ------ ---------- -------------
DECK_PK 1 SUIT Y 1 1 52
2 FACEVAL
DECK_CARD_IDX 1 CARD N 1 1 13
DECK_COLOR_IDX 1 COLOR N 1 1 2
Cardinality
The estimated number of rows
a query is expected to return.
number of rows in table x predicate selectivity
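The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the optimizer's actual code: the function name `equality_cardinality` and the round-half-up rule are assumptions chosen to match the row counts shown in the plans, and the NDV figures are copied from the DECK column statistics listed earlier.

```python
from fractions import Fraction
import math

# Statistics copied from the DECK listing (NDV = number of distinct values).
NUM_ROWS = 52
NDV = {"SUIT": 4, "CARD": 13, "COLOR": 2, "FACEVAL": 13}


def round_half_up(x):
    """Round to a whole number of rows (assumed rounding rule)."""
    return math.floor(x + Fraction(1, 2))


def equality_cardinality(num_rows, *columns):
    """Estimate rows for equality predicates ANDed together.

    Each column contributes a selectivity of 1/NDV, and the selectivities
    are multiplied, which treats the columns as independent.
    """
    selectivity = Fraction(1)
    for col in columns:
        selectivity *= Fraction(1, NDV[col])
    return round_half_up(num_rows * selectivity)


print(equality_cardinality(NUM_ROWS))                  # no predicate
print(equality_cardinality(NUM_ROWS, "COLOR"))         # color = 'Black'
print(equality_cardinality(NUM_ROWS, "CARD", "SUIT"))  # card = 'Ace' and suit = 'Spades'
```

Using `Fraction` keeps the arithmetic exact, so the rounding step is not affected by floating-point error.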
select *
from deck
order by suit, faceval ;
Cardinality
52 x 1 = 52
SQL>select * from deck order by suit, faceval ;
52 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3142028678
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 52 | 1040 | 2|
| 1 | TABLE ACCESS BY INDEX ROWID| DECK | 52 | 1040 | 2|
| 2 | INDEX FULL SCAN | DECK_PK | 52 | | 1|
----------------------------------------------------------------------
select *
from deck
where color = 'Black' ;
Cardinality
52 x 1/2 = 26
SQL>select * from deck where color = 'Black' ;
26 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1366616955
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 520 | 2|
| 1 | TABLE ACCESS BY INDEX ROWID| DECK | 26 | 520 | 2|
|* 2 | INDEX RANGE SCAN | DECK_COLOR_ID | 26 | | 1|
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COLOR"='Black')
select *
from deck
where card = 'Ace'
and suit = 'Spades' ;
Cardinality
52 x 1/13 x 1/4 = 1
SQL>select *
2 from deck
3 where card = 'Ace'
4 and suit = 'Spades' ;
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2030372774
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 2|
|* 1 | TABLE ACCESS BY INDEX ROWID| DECK | 1 | 20 | 2|
|* 2 | INDEX RANGE SCAN | DECK_CARD_IDX | 4 | | 1|
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SUIT"='Spades')
2 - access("CARD"='Ace')
select *
from deck
where faceval > 10 ;
Cardinality
52 x (High Value - Predicate Value) / (High Value - Low Value)
= 52 x (13 - 10) / (13 - 1) = 13
SQL>select *
2 from deck
3 where faceval > 10 ;
12 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1303963799
---------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
---------------------------------------------------------
| 0 | SELECT STATEMENT | | 13 | 260 | 3|
|* 1 | TABLE ACCESS FULL| DECK | 13 | 260 | 3|
---------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FACEVAL">10)
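A small Python sketch of the range arithmetic, assuming no histogram and a predicate value inside the low/high range. The helper name `gt_selectivity` is hypothetical. It also counts the matching cards directly, which suggests why the estimate (13) is one row higher than the 12 rows returned: the formula treats the 1..13 range as continuous, while the column holds 13 discrete values.

```python
from fractions import Fraction
import math


def gt_selectivity(value, low, high):
    """Selectivity of `col > value` from low/high column statistics only."""
    return Fraction(high - value, high - low)


num_rows = 52
low, high = 1, 13  # FACEVAL low and high values from the column statistics

sel = gt_selectivity(10, low, high)
estimate = math.floor(num_rows * sel + Fraction(1, 2))  # assumed round-half-up

# Count the rows that really match: 4 suits x face values 11, 12, 13.
actual = sum(1 for _suit in range(4) for faceval in range(1, 14) if faceval > 10)

print(sel, estimate, actual)
```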
select *
from deck
where card = 'Ace' ;
Cardinality
52 x 1/13 = 4
SQL>select * from deck where card = :b1 ;
4 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2030372774
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 80 | 2|
| 1 | TABLE ACCESS BY INDEX ROWID| DECK | 4 | 80 | 2|
|* 2 | INDEX RANGE SCAN | DECK_CARD_IDX | 4 | | 1|
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("CARD"=:B1)
Math or Magic?
Maybe it's a little bit of both!
What's the best method for collecting statistics?
It depends.
Statistics that don't reasonably describe your data
…lead to poor cardinality estimates
…which leads to poor access path selection
…which leads to poor join method selection
…which leads to poor join order selection
…which leads to poor SQL execution times.
Statistics matter!
Automatic vs Manual
Automatic Collections
Objects must change by at least 10%
Collection scheduled during nightly
maintenance window
dbms_stats.gather_database_stats_job_proc
Prioritizes collection in order by objects
which most need updating
Most functional when data changes at a slow
to moderate rate
Volatile tables and large bulk loads
are good candidates for manual collection
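The 10% rule and the "most need updating" ordering can be illustrated with a short Python sketch. Everything here is hypothetical: the table names, the change counts, and the idea of ranking by changed fraction are assumptions made for illustration, not a description of how the Oracle job is implemented.

```python
def changed_pct_at_least(num_rows, changes, threshold_pct=10):
    """True when the number of changed rows is at least threshold_pct of num_rows."""
    if num_rows == 0:
        return changes > 0
    # Integer arithmetic avoids floating-point surprises at exactly 10%.
    return changes * 100 >= num_rows * threshold_pct


# Hypothetical tables: name -> (rows at last collection, rows changed since then).
tables = {
    "ORDERS": (1_000_000, 250_000),
    "CUSTOMERS": (200_000, 5_000),
    "STAGING_LOAD": (10_000, 10_000),
}

stale = [name for name, (rows, chg) in tables.items() if changed_pct_at_least(rows, chg)]

# Assumed priority: the largest changed fraction comes first.
stale.sort(key=lambda name: tables[name][1] / tables[name][0], reverse=True)
print(stale)
```

In this sketch a volatile table such as the hypothetical `STAGING_LOAD` can change completely between nightly windows, which is the kind of case the slide flags for manual collection.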
Automatic doesn't mean
accurate (for your data)
Automatic Collection Defaults
SQL>exec dbms_stats.gather_table_stats (ownname=>?, tabname=>?) ;
partname NULL
cascade DBMS_STATS.AUTO_CASCADE
estimate_percent DBMS_STATS.AUTO_SAMPLE_SIZE
stattab NULL
block_sample FALSE
statid NULL
method_opt FOR ALL COLUMNS SIZE AUTO
statown NULL
degree 1 or value based on number of CPUs
and initialization parameters
force FALSE
granularity AUTO (value is based on partitioning type)
no_invalidate DBMS_STATS.AUTO_INVALIDATE
cascade=>
AUTO_CASCADE
Allow Oracle to determine whether or not to
gather index statistics
estimate_percent=>
AUTO_SAMPLE_SIZE
Allow Oracle to determine sample size
method_opt=>
FOR ALL COLUMNS
SIZE AUTO
Allow Oracle to determine when to gather histogram statistics
SYS.COL_USAGE$
no_invalidate=>
AUTO_INVALIDATE
Allow Oracle to determine when to invalidate dependent cursors
Goal
Collect statistics that are "good enough" to meet
most needs most of the time.
Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you would be perfectly comfortable.
– Bobby Bragan
Manual Collections
dbms_stats.gather_*_stats
* = database, schema, table, index, etc.
Is it common for your users to get slammed with performance problems shortly after statistics are updated?
Does performance decline before a 10% data change occurs?
Do low and high values for a column change significantly between automatic collections?
Does your application performance seem "sensitive" to changing user counts as well as data volume changes?
If you answered "Yes" to one or more of these questions...
your application's unique needs may
be best served with manual collection.
Test. Test. Test.
Dynamic Sampling
optimizer_dynamic_sampling parameter
dynamic_sampling hint
SQL>create table depend_test as
2 select mod(num, 100) c1,
3 mod(num, 100) c2,
4 mod(num, 75) c3,
5 mod(num, 30) c4
6 from (select level num from dual
7 connect by level <= 10001);
Table created.
SQL>exec dbms_stats.gather_table_stats( user, 'depend_test', -
> estimate_percent => null, method_opt => 'for all columns size 1');
PL/SQL procedure successfully completed.
Statistic Current value
--------------- --------------
# rows 10001
Blocks 28
Avg Row Len 11
Sample Size 10001
Monitoring YES
Column NDV Density AvgLen Histogram LowVal HighVal
------- --- ------- ------ --------- ------ -------
C1 100 .010000 3 NONE (1) 0 99
C2 100 .010000 3 NONE (1) 0 99
C3 75 .013333 3 NONE (1) 0 74
C4 30 .033333 3 NONE (1) 0 29
SQL>set autotrace traceonly explain
SQL>select count(*) from depend_test where c1 = 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 3984367388
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| DEPEND_TEST | 100 | 300 | 8 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10)
SQL>set autotrace traceonly explain
SQL>select count(*) from depend_test where c1 = 10 and c2 = 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 3984367388
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| DEPEND_TEST | 1 | 6 | 8 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10 AND "C2"=10)
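The estimate of 1 row comes from multiplying the two column selectivities as if C1 and C2 were independent. The following Python sketch rebuilds the same data as the `create table` statement and compares that estimate with the real row count. It is an illustration of the arithmetic only, and the round-half-up step is an assumption.

```python
from fractions import Fraction
import math

# Same data as depend_test: num runs from 1 to 10001.
rows = [(n % 100, n % 100, n % 75, n % 30) for n in range(1, 10002)]
num_rows = len(rows)

ndv_c1 = len({r[0] for r in rows})
ndv_c2 = len({r[1] for r in rows})

# Independence assumption: selectivity(c1 AND c2) = 1/NDV(c1) x 1/NDV(c2).
independent_sel = Fraction(1, ndv_c1) * Fraction(1, ndv_c2)
estimate = math.floor(num_rows * independent_sel + Fraction(1, 2))

# C1 and C2 are always equal, so the real answer is much larger.
actual = sum(1 for c1, c2, _c3, _c4 in rows if c1 == 10 and c2 == 10)

print(num_rows, estimate, actual)
```

Because C2 is a copy of C1, adding the second predicate filters out nothing, yet the independence assumption shrinks the estimate by a factor of 100.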
SQL>set autotrace traceonly explain
SQL>select /*+ dynamic_sampling (4) */ count(*)
2 from depend_test where c1 = 10 and c2 = 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 3984367388
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 8 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| DEPEND_TEST | 100 | 600 | 8 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10 AND "C2"=10)
Note
-----
- dynamic sampling used for this statement
11g Extended Statistics
SQL> select dbms_stats.create_extended_stats(ownname=>user,
2 tabname => 'DEPEND_TEST',
3 extension => '(c1, c2)' ) AS c1_c2_correlation
4 from dual ;
C1_C2_CORRELATION
-------------------------------------------------------------
SYS_STUF3GLKIOP5F4B0BTTCFTMX0W
SQL> exec dbms_stats.gather_table_stats( user, 'depend_test');
PL/SQL procedure successfully completed.
SQL> set autotrace traceonly explain
SQL> select count(*) from depend_test where c1 = 10 and c2 = 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 3984367388
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 9 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| DEPEND_TEST | 100 | 600 | 9 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("C1"=10 AND "C2"=10)
Setting Statistics
dbms_stats.set_column_stats
dbms_stats.set_index_stats
dbms_stats.set_table_stats
It's OK.
Really.
Why guess (i.e. gather stats) when you know!
Common Performance Problems
3 areas where non-representative statistics cause problems
1. Data Skew
The optimizer assumes uniform distribution
of column values.
of column values.
Color column - uniform distribution
Color column – skewed distribution
Data skew must be identified with a histogram.
Table: obj_tab
Statistic Current value
--------------- --------------
# rows 1601874
Blocks 22321
Avg Row Len 94
Sample Size 1601874
Monitoring YES
Column: object_type (has 36 distinct values)
OBJECT_TYPE PCT_TOTAL
------------------------------- ---------
WINDOW GROUP - PROGRAM .00-.02
EVALUATION CONTEXT - XML SCHEMA .03-.05
OPERATOR - PROCEDURE .11-.17
LIBRARY - TYPE BODY .30-.35
FUNCTION - INDEX PARTITION .54-.64
JAVA RESOURCE - PACKAGE 1.54-1.69
TABLE - VIEW 3.44-7.35
JAVA CLASS 32.80
SYNONYM 40.01
100% Statistics
FOR ALL COLUMNS SIZE 1
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 16yy3p8sstr28, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_name, object_type, object_id,
status from obj_tab where object_type = 'PROCEDURE'
Plan hash value: 2862749165
--------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| OBJ_TAB | 44497 | 2720 | 1237 |
|* 2 | INDEX RANGE SCAN | OBJ_TYPE_IDX | 44497 | 2720 | 193 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='PROCEDURE')
R = .06 seconds
E-Rows = 1/36 x 1,601,874
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 9u6ppkh5mhr8v, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_name, object_type, object_id,
status from obj_tab where object_type = 'SYNONYM'
Plan hash value: 2862749165
--------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| OBJ_TAB | 44497 | 640K| 104K|
|* 2 | INDEX RANGE SCAN | OBJ_TYPE_IDX | 44497 | 640K| 44082 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='SYNONYM')
R = 14.25 seconds
E-Rows = 1/36 x 1,601,874
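Both plans show the same E-Rows because, without a histogram, every value gets the same 1/NDV share of the table. A short Python sketch of that arithmetic, using figures from the plans above (the SYNONYM count is approximate, since the plan only shows 640K, and the round-half-up step is an assumption):

```python
from fractions import Fraction
import math

NUM_ROWS = 1_601_874
NDV = 36  # distinct object_type values

# Uniform-distribution estimate: the same for every object_type value.
uniform_estimate = math.floor(Fraction(NUM_ROWS, NDV) + Fraction(1, 2))

# A-Rows from the two plans; SYNONYM is approximate (shown as 640K).
actual_rows = {"PROCEDURE": 2_720, "SYNONYM": 640_000}

for object_type, actual in actual_rows.items():
    ratio = actual / uniform_estimate
    print(f"{object_type}: estimated {uniform_estimate}, actual {actual}, "
          f"actual/estimate = {ratio:.2f}")
```

The same estimate is far too high for PROCEDURE and far too low for SYNONYM, which is the skew a histogram is meant to expose.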
Re-collect statistics
100%
FOR ALL COLUMNS SIZE AUTO
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 16yy3p8sstr28, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_name, object_type, object_id,
status from obj_tab where object_type = 'PROCEDURE'
Plan hash value: 2862749165
--------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| OBJ_TAB | 2720 | 2720 | 1237 |
|* 2 | INDEX RANGE SCAN | OBJ_TYPE_IDX | 2720 | 2720 | 193 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"='PROCEDURE')
R = .07 seconds
E-Rows = histogram x 1,601,874
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 9u6ppkh5mhr8v, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ owner, object_name, object_type, object_id,
status from obj_tab where object_type = 'SYNONYM'
Plan hash value: 2748991475
-----------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
-----------------------------------------------------------------
|* 1 | TABLE ACCESS FULL| OBJ_TAB | 640K| 640K| 64263 |
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_TYPE"='SYNONYM')
R = 3.36 seconds
E-Rows = histogram x 1,601,874
vs 14.25 seconds
Histograms are important for more
reasons than just helping determine
the access method.
2. Bind Peeking
During hard parse, the optimizer "peeks" at the bind value and
uses it to determine the execution plan.
But what if your data is skewed?
11g
SQL> variable objtype varchar2(19)
SQL> exec :objtype := 'PROCEDURE';
PL/SQL procedure successfully completed.
SQL> select /*+ gather_plan_statistics */ count(*) ct
2 from big_tab
3 where object_type = :objtype ;
CT
---------------
4416
1 row selected.
SQL>
SQL> select * from table
(dbms_xplan.display_cursor('211078a9adzak',0,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
---------------------------------------
SQL_ID 211078a9adzak, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) ct from big_tab where
object_type = :objtype
Plan hash value: 154074842
-------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 16 |
|* 2 | INDEX RANGE SCAN| BIG_OBJTYPE_IDX | 4416 | 4416 | 16 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"=:OBJTYPE)
SQL> select child_number, executions, buffer_gets,
2 is_bind_sensitive, is_bind_aware, is_shareable
3 from v$sql where sql_id = '211078a9adzak' ;
CHILD_NUMBER = 0
EXECUTIONS = 1
BUFFER_GETS = 16
IS_BIND_SENSITIVE = N
IS_BIND_AWARE = N
IS_SHAREABLE = Y
SQL> variable objtype varchar2(19)
SQL> exec :objtype := 'SYNONYM';
PL/SQL procedure successfully completed.
SQL> select /*+ gather_plan_statistics */ count(*) ct
2 from big_tab
3 where object_type = :objtype ;
CT
----------------
854176
1 row selected.
SQL>
SQL> select * from table
(dbms_xplan.display_cursor('211078a9adzak',0,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
---------------------------------------
SQL_ID 211078a9adzak, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) ct from big_tab where
object_type = :objtype
Plan hash value: 154074842
-------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 2263 |
|* 2 | INDEX RANGE SCAN| BIG_OBJTYPE_IDX | 4416 | 854K | 2263 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"=:OBJTYPE)
SQL> select child_number, executions, buffer_gets,
2 is_bind_sensitive, is_bind_aware, is_shareable
3 from v$sql where sql_id = '211078a9adzak' ;
CHILD_NUMBER = 0
EXECUTIONS = 2
BUFFER_GETS = 2279 (2263 + 16)
IS_BIND_SENSITIVE = Y
IS_BIND_AWARE = N
IS_SHAREABLE = Y
SQL> variable objtype varchar2(19)
SQL> exec :objtype := 'SYNONYM';
PL/SQL procedure successfully completed.
PLAN_TABLE_OUTPUT
---------------------------------------
SQL_ID 211078a9adzak, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) ct from big_tab where
object_type = :objtype
Plan hash value: 1315022418
-----------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
-----------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 6016 |
|* 2 | INDEX FAST FULL SCAN| BIG_OBJTYPE_IDX | 854K | 854K | 6016 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"=:OBJTYPE)
SQL> select child_number, executions, buffer_gets,
2 is_bind_sensitive, is_bind_aware, is_shareable
3 from v$sql where sql_id = '211078a9adzak' ;
CHILD_NUMBER = 0
EXECUTIONS = 2
BUFFER_GETS = 2279
IS_BIND_SENSITIVE = Y
IS_BIND_AWARE = N
IS_SHAREABLE = N
CHILD_NUMBER = 1
EXECUTIONS = 1
BUFFER_GETS = 6016
IS_BIND_SENSITIVE = Y
IS_BIND_AWARE = Y
IS_SHAREABLE = Y
SQL> variable objtype varchar2(19)
SQL> exec :objtype := 'PROCEDURE';
PL/SQL procedure successfully completed.
PLAN_TABLE_OUTPUT
---------------------------------------
SQL_ID 211078a9adzak, child number 2
-------------------------------------
select /*+ gather_plan_statistics */ count(*) ct from big_tab where
object_type = :objtype
Plan hash value: 154074842
-------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 16 |
|* 2 | INDEX RANGE SCAN| BIG_OBJTYPE_IDX | 4416 | 4416 | 16 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_TYPE"=:OBJTYPE)
SQL> select child_number, executions, buffer_gets,
2 is_bind_sensitive, is_bind_aware, is_shareable
3 from v$sql where sql_id = '211078a9adzak' ;
CHILD_NUMBER = 0
EXECUTIONS = 2
BUFFER_GETS = 2279
IS_BIND_SENSITIVE = Y
IS_BIND_AWARE = N
IS_SHAREABLE = N
CHILD_NUMBER = 1
EXECUTIONS = 1
BUFFER_GETS = 6016
IS_BIND_SENSITIVE = Y
IS_BIND_AWARE = Y
IS_SHAREABLE = Y
CHILD_NUMBER = 2
EXECUTIONS = 1
BUFFER_GETS = 16
IS_BIND_SENSITIVE = Y
IS_BIND_AWARE = Y
IS_SHAREABLE = Y
10g will create only 1 plan.
11g will create plans as needed to cover data skew.
Handling bind peeking is more of a coding issue
than a statistics issue.
3. Incorrect High and Low Values
To derive the cardinality estimate for range predicates,
the optimizer uses the low and high value statistics.
Table: hi_lo_t
Statistic Current value
--------------- ---------------------
# rows 100000
Blocks 180
Avg Row Len 7
Sample Size 100000
Monitoring YES
Column NDV Nulls Density AvgLen Histogram LowVal HighVal
------- ------ ----- ------- ------ --------- ------ -------
A 100000 N .000010 5 NONE (1) 10 100009
B 10 Y .100000 3 NONE (1) 9 18
100% Statistics
FOR ALL COLUMNS SIZE 1
select count(a)
from hi_lo_t
where b < 11 ;
Selectivity = (Predicate value - Low value) / (High value - Low value)
= (11 - 9) / (18 - 9) = .22222
100000 rows x .22222 = 22222
select count(a) from hi_lo_t where b < 11
Plan hash value: 3307858660
------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 184 |
|* 2 | TABLE ACCESS FULL| HI_LO_T | 22222 | 20000 | 184 |
------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("B"<11)
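The in-range calculation for `b < 11` can be reproduced with a few lines of Python. This is a sketch of the slide's arithmetic under the assumption of no histogram. The helper name `lt_selectivity` is hypothetical, and the actual row count is taken from the A-Rows column of the plan.

```python
from fractions import Fraction
import math


def lt_selectivity(value, low, high):
    """Selectivity of `col < value` for a value between low and high."""
    return Fraction(value - low, high - low)


num_rows = 100_000
low, high = 9, 18  # column B low and high values

sel = lt_selectivity(11, low, high)
estimate = math.floor(num_rows * sel + Fraction(1, 2))  # assumed round-half-up
actual = 20_000  # A-Rows reported in the plan

print(float(sel), estimate, actual)
```

The estimate is close to the actual count here because the predicate value lies inside the recorded low/high range.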
select count(a)
from hi_lo_t
where b < 4 ;
Selectivity = .10 x (1 + (Predicate value - Low value) / (High value - Low value))
= .10 x (1 + (4 - 9) / (18 - 9)) = .04444
100000 rows x .04444 = 4444
select count(a) from hi_lo_t where b < 4
Plan hash value: 3307858660
------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 184 |
|* 2 | TABLE ACCESS FULL| HI_LO_T | 4444 | 0 | 184 |
------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("B"<4)
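For a predicate value below the recorded low value, the slide's arithmetic scales the column density (1/NDV) down by how far the value falls outside the range. The Python sketch below reproduces that calculation. The function name and the clamp at zero are assumptions added for illustration, and it should not be read as Oracle's documented algorithm.

```python
from fractions import Fraction
import math


def lt_selectivity_below_low(value, low, high, ndv):
    """Selectivity of `col < value` when value is below the low value.

    Follows the slide: density x (1 + (value - low) / (high - low)).
    """
    density = Fraction(1, ndv)
    scale = 1 + Fraction(value - low, high - low)
    return density * max(scale, Fraction(0))  # assumed clamp: never negative


num_rows = 100_000
sel = lt_selectivity_below_low(4, low=9, high=18, ndv=10)
estimate = math.floor(num_rows * sel + Fraction(1, 2))  # assumed round-half-up
actual = 0  # A-Rows reported in the plan

print(float(sel), estimate, actual)
```

The estimate of 4444 rows against 0 actual rows shows how quickly range estimates degrade when the stored low and high values no longer bracket the data being queried.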
METHOD_OPT=>
'FOR ALL INDEXED COLUMNS'
Be cautious of using this!
If a column is not indexed, no statistics are collected.
select count(a) from hi_lo_t where b = 12
Plan hash value: 3307858660
------------------------------------------------------------------
| Id | Operation | Name | E-Rows | A-Rows | Buffers |
------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 184 |
|* 2 | TABLE ACCESS FULL| HI_LO_T | 1000 | 10000 | 184 |
------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("B"=12)
Without column statistics, a default selectivity is used (the E-Rows of 1000 above is 1% of the 100,000 rows).
Result:
Cardinality estimates that are orders of magnitude "off"
Conclusion
Why guess when you can know.
Thoroughly test and document your
statistics collection strategy.
Check default options, particularly when
upgrading.
Things change.
Regularly check statistics and compare to previous collections
for any anomalies.
10.2.0.4 and above: dbms_stats.diff_table_stats_*
Don't ignore your data.
There is no single strategy
that works best for everyone.
Statistics must reasonably represent your actual data.
Understanding basic optimizer statistics
computations is key.
The more you know, the more likely you
are to succeed.
Thank You!
Q U E S T I O N S
A N S W E R S