understanding histogram ppt

28
Understanding Histograms in the Oracle Database Robert Gaydos Robert Gaydos R Wave Solutions [email protected] Copyright © R Wave Solutions 2008 All rights reserved

Upload: sandeep-sharma

Post on 27-Apr-2015

115 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Understanding Histogram Ppt

Understanding Histograms in the Oracle Database

Robert GaydosRobert GaydosR Wave [email protected]

Copyright © R Wave Solutions 2008 All rights reserved

Page 2: Understanding Histogram Ppt

Addenda

� What is a Histogram?� Types of Histograms� How to create Histograms� How to identify existing Histograms� Rules and tips when creating Histograms� Q & A

Copyright © R Wave Solutions 2008 All rights reserved

Page 3: Understanding Histogram Ppt

What is a Histogram?

� A histogram holds the data distribution of values within a column of a table.– Number of occurrences for a specific value/range – Number of occurrences for a specific value/range – Used by CBO to optimize a query.

� It is collected by using DBMS_STATS .– By default DBMS_STATS does not collect

histogram stats.� Min/Max Value and # of distinct values� NOTE: DBMS_STAT is used to delete histogram data.

Copyright © R Wave Solutions 2008 All rights reserved

Page 4: Understanding Histogram Ppt

Types of Histograms

� Two types of Histograms– frequency histograms

height-balanced histograms – height-balanced histograms – Type of histogram is stored in the HISTOGRAM

column of the *tab_col_statistics views (USER/DBA)

– Value = (‘HEIGHT BALANCED’, ‘FREQUENCY’, or ‘NONE’)

Copyright © R Wave Solutions 2008 All rights reserved

Page 5: Understanding Histogram Ppt

Histogram and Buckets

� When Histograms are created the number of buckets can be specified.

� It is this number that controls the type of � It is this number that controls the type of histogram created.

� # Buckets = # of Rows of information.� When Distinct Values <= Buckets

– Then Frequency Histogram is created– Else Height-Balanced Histogram is created

Copyright © R Wave Solutions 2008 All rights reserved

Page 6: Understanding Histogram Ppt

Frequency Histograms

� Each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of � Each bucket contains the number of occurrences of that single value.

Copyright © R Wave Solutions 2008 All rights reserved

Page 7: Understanding Histogram Ppt

Frequency Histograms Example

DataABC

SortedABB

ResultsBucket 1 A = 1Bucket 2 B = 3

CBCCCCCBDE

BBCCCCCCDE

Bucket 2 B = 3Bucket 3 C = 6Bucket 4 D = 1Bucket 5 E = 1

Copyright © R Wave Solutions 2008 All rights reserved

Page 8: Understanding Histogram Ppt

Height-Balanced Histograms

� In a height-balanced histogram, the column values are divided into bands so that each band contains approximately the same band contains approximately the same number of rows.

� The useful information that the histogram provides is where in the range of values the endpoints fall .

Copyright © R Wave Solutions 2008 All rights reserved

Page 9: Understanding Histogram Ppt

Height-Balanced Example 1

DataABC

SortedABB

ResultsEPN0

EPVB

CBCCCCCBDE

BBCCCCCCDE

Buckets = 4

0123

BCCE

Copyright © R Wave Solutions 2008 All rights reserved

Page 10: Understanding Histogram Ppt

Height-Balanced Example 1

DataABC

SortedABB

ResultsEPN0

EPVB

Notice that C Crosses Buckets.

The DB only stores0 - B2 - C3 - E

CBCCCCCBDE

BBCCCCCCDE

0123

Buckets = 4

BCCE

Crosses Buckets.

This is how database knows which values are popular

Copyright © R Wave Solutions 2008 All rights reserved

Page 11: Understanding Histogram Ppt

How to Create a Histogram

� Created by using DBMS_STATS.GATHER_TABLE_STATS

� METHOD_OPT => 'FOR COLUMNS SIZE � METHOD_OPT => 'FOR COLUMNS SIZE <# of buckets> <Column Name>‘

execute dbms_stats.gather_table_stats (ownname => 'oe', tabname => 'inventories', METHOD_OPT => 'FOR COLUMNS SIZE 10 quantity_on_hand');

Copyright © R Wave Solutions 2008 All rights reserved

Page 12: Understanding Histogram Ppt

Identify Existing Histograms

� In 10g– HISTOGRAM column of the *tab_col_statistics views

(USER/DBA)

In 9� In 9select owner, table_name, column_name, count(*)

bucketsfrom dba_histogramswhere endpoint_number not in (0,1)group by owner, table_name, column_name

Copyright © R Wave Solutions 2008 All rights reserved

Page 13: Understanding Histogram Ppt

Rules On When To Create Histograms

� First there are no rules of thumb that are always correct.When creating using dbms_stats, the use of � When creating using dbms_stats, the use of the DEGREE (for parallel) can skew the value of density

� Running dbms_stats can drop histograms– Use method_opt==>'REPEAT‘

Copyright © R Wave Solutions 2008 All rights reserved

Page 14: Understanding Histogram Ppt

Rules On When To Create Histograms

� Index maintenance overhead is performed during update, insert and delete operations.

� Histogram maintenance overhead is � Histogram maintenance overhead is performed during the analyze/dbms_stats process.

� Histograms are NOT just for indexed columns.

– Adding a histogram to an un-indexed column that is used in a where clause can improve performance.

Copyright © R Wave Solutions 2008 All rights reserved

Page 15: Understanding Histogram Ppt

Histogram Example

select sum(t1.d2*t2.d3*t3.d3)from t1, t2, t3where t1.fk1 = t2.pk1and t3.pk1 = t2.fk1and t3.d2 = 35and t1.d3 = 0; << This column has 2 values in it

In this example column T1.D3 has 2 values and is our most selective criterion, but without a histogram CBO assumes an even distribution of 2 distinct values (density = 0.5).

Copyright © R Wave Solutions 2008 All rights reserved

Page 16: Understanding Histogram Ppt

SQL*Plan Without Histogram

Rows Row Source Operation------- ---------------------------------

1 SORT AGGREGATE 1 SORT AGGREGATE 2088 HASH JOIN 601 TABLE ACCESS BY INDEX ROWID 601 INDEX RANGE SCAN OBJ#(57612)

1000 HASH JOIN 4 TABLE ACCESS FULL OBJ#(57604)

filter("T1"."D3"=0)62500 TABLE ACCESS FULL OBJ#(57605)

Copyright © R Wave Solutions 2008 All rights reserved

Page 17: Understanding Histogram Ppt

Timing Without Histogram

call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 1 0.00 0.00 0 0 0 0Execute 1 0.00 0.00 0 0 0 0Fetch 2 1.04 4.91 19013 20231 0 1------- ------ -------- ---------- ---------- ---------- ---------- ----------------- ------ -------- ---------- ---------- ---------- ---------- ----------total 4 1.04 4.91 19013 20231 0 1

Query = 20231

Copyright © R Wave Solutions 2008 All rights reserved

Page 18: Understanding Histogram Ppt

Use DBMS_STATS to Add Histogram

execute dbms_stats.gather_table_stats(NULL, 'T1', estimate_percent=>null, method_opt=>'FOR COLUMNS SIZE AUTO method_opt=>'FOR COLUMNS SIZE AUTO d3');

Copyright © R Wave Solutions 2008 All rights reserved

Page 19: Understanding Histogram Ppt

Timing With Histogram

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.01 0.00 0 0 0 0

Fetch 2 0.56 2.84 9841 10666 0 1Fetch 2 0.56 2.84 9841 10666 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.57 2.84 9841 10666 0 1

Now Query = 10666It was Query = 20231

½ the IOCopyright © R Wave Solutions 2008 All rights reserved

Page 20: Understanding Histogram Ppt

Plan With Histogram

Rows Row Source Operation------- ---------------------------------------

1 SORT AGGREGATE 2088 HASH JOIN 2088 HASH JOIN 601 TABLE ACCESS BY INDEX ROWID OBJ#(57606) 601 INDEX RANGE SCAN OBJ#(57612) 1000 TABLE ACCESS BY INDEX ROWID OBJ#(57605) 1005 NESTED LOOPS

4 TABLE ACCESS FULL OBJ#(57604)filter("T1"."D3"=0)

1000 INDEX RANGE SCAN OBJ#(57609)access("T1"."FK1"="T2"."PK1")

Copyright © R Wave Solutions 2008 All rights reserved

Page 21: Understanding Histogram Ppt

Histograms Opportunities

� Any column used in a where clause with skewed dataColumns that are not queried all the time� Columns that are not queried all the time

� Reduced overhead for insert, update, delete

Copyright © R Wave Solutions 2008 All rights reserved

Page 22: Understanding Histogram Ppt

When Not To Create A Histogram

� No rule of thumb is ever perfect.� Do not create Histograms

– Evenly distributed data in the column– Columns that are not queried

– Do not create them on every column of every table.

– PK of a table

Copyright © R Wave Solutions 2008 All rights reserved

Page 23: Understanding Histogram Ppt

Creating / Maintaining Histograms

� Table maintenance procedures are important.– Accidentally deleting histogram data is possible.– Accidentally deleting histogram data is possible.– Histogram data is ONLY Collected/Maintained

when stats are collected (DBMS_STATS).– Changing data in the table does not change data

stored in the histogram.

Copyright © R Wave Solutions 2008 All rights reserved

Page 24: Understanding Histogram Ppt

Maintaining Histograms

� Always check if histogram exists on table before DBMS_STATS is run.– Do not forget to note the # of buckets.– Do not forget to note the # of buckets.– Create tables to hold information.

� Use METHOD_OPT FOR ALL COLUMNS REPEAT to prevent deletion of histograms data.

� Distribution ratio change -> Recollect

Copyright © R Wave Solutions 2008 All rights reserved

Page 25: Understanding Histogram Ppt

Creating Histograms

� What is the best way to start out creating Histograms? Three popular ways to create.– Salt and Pepper - Create them as you need them.– Salt and Pepper - Create them as you need them.– Little Hammer DBMS_STATS using

METHOD_OPT AUTO� Should NOT be used after fresh restarted database

– Big Hammer DBMS_STATS using SKEWONLY– Over Kill Hammer - FOR ALL COLUMNS

Copyright © R Wave Solutions 2008 All rights reserved

Page 26: Understanding Histogram Ppt

Other Comments

� It is the Density of a column that the optimizer considers when accessing a table.Changing the number of buckets changes � Changing the number of buckets changes the density.

� More Buckets DOES NOT mean better density.

� Using degree != 1 can affect density.

Copyright © R Wave Solutions 2008 All rights reserved

Page 27: Understanding Histogram Ppt

Review

� What is a Histogram?� Two types of Histograms in the database� How to create Histograms� Data is collected by running DBMS_STATS.� Different methods to create histograms� Bucket size affects density of column.� Density of the column affects CBO.

Copyright © R Wave Solutions 2008 All rights reserved

Page 28: Understanding Histogram Ppt

Thank You for Your Time

Copyright © R Wave Solutions 2008 All rights reserved