cost based optimizer - part 1 of 2

51
www.hotsos.com Slide 1 Copyright © 1999–2007 by Hotsos Enterprises, Ltd. Cost Based Optimizer – 1 of 2 Hotsos Enterprises, Ltd. Grapevine, Texas Oracle. Performance. Now. [email protected]

Upload: mahesh-vallampati

Post on 12-Jun-2015

2.590 views

Category:

Technology


0 download

DESCRIPTION

This paper describes how the optimizer uses statistics and determines plans for executing SQL statement. It explains how the 10053 trace file can be used to understand Oracle's decisions on execution plans.

TRANSCRIPT

Page 1: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 1Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Cost Based Optimizer – 1 of 2

Hotsos Enterprises, Ltd.

Grapevine, Texas

Oracle. Performance. Now.

[email protected]

Page 2: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 2Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Agenda

• Cost Based Optimizer and its impact in performance• Data Points Collected by the Statistics Gathering Process• The 10053 Trace File• Case Studies

Page 3: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 3Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Cost Based Optimizer

Page 4: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 4Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Cost Based Optimizer (CBO)

• The CBO in reality is a complex decision making software– Use several Database Initialization Parameters

• These are listed in the 10053 trace file

– Uses several session level initialization parameter• These are parameters at the session level that override the

database initialization parameters

– Uses statistics about the objects• These statistics will be discussed later

– Hints to the optimizer– Uses Statistics about the system (CPU, Disk etc)– Use this information and makes decisions on the “best way” to

generate an execution plan

Page 5: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 5Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

CBO will be part of your life if you keep working with Oracle.

• The cost-based query optimizer (CBO)…– Uses data from a variety of sources– Estimates the costs of several execution plans– Chooses the plan it estimates to be the least expensive

• Characteristics– Adapts to changing circumstances– Frustrating if you don’t know what it considers as input

• Works great if you know how to use it

• But produces very poor results if you lie to it

– The only query optimizer supported by Oracle Corporation from release 10 onward

Page 6: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 6Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

The cost-based query optimizer chooses the plan that it computes as having the lowest estimated cost.

• Don’t assume the following are identical– CBO’s estimated cost of an execution plan– The actual cost of an execution plan

• CBO’s cost estimate can be imperfect– Are your CBO inputs perfect?– CBO isn’t perfect, but by 9.2 it’s almost always good enough

• Without properly collected statistics, the CBO will– use RBO if no statistics exist on any object in the statement– use default statistics if statistics exist for a single object in the

statement but not others– use dynamic sampling to generate statistics (based on

parameter setting and Oracle version)

Page 7: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 7Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Cost Based Optimizer

dbms_stats.gather_database_stats

execution

databasetable and

indexstatistics

dbms_stats.gather_system_stats

execution

systemCPU and

I/Ostatistics

databaseschemaconfig

index, partitionmanipulation, etc.

Oracle cost-basedoptimizer

Oraclequery cost

model

execution plan

application SQLmanipulation

SQL text

Oracle DBMSauthorship

Oracleinstance

parameters

parameter editsstored outlinemanipulation

storedoutlines

Page 8: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 8Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Execution plan changes can result in profoundly different application performance.

A. Table size change

B. Device latency change

C. Execution plan change

• Type C performance changes are the most profound

200 400 600 800 1000rows

20

40

60

80

100LIOs C

1000 2000 3000 4000 5000rows

1000

2000

3000

4000

5000LIOs B

5000 10000 15000 20000 25000 30000 35000rows

2000

4000

6000

8000

10000

12000

14000

16000LIOs A

size change

performance change

performance change

performance change

Page 9: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 9Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Recap

• The CBO is a complex piece of software• It uses several data points to calculate the cost of the execution

plan and will choose the plan with the lowest cost• It is dynamic and will adapt to changing data better than the Cost

Based Optimizer• A good understanding of the Cost Based Optimizer is imperative

in understanding the rationale behind some of the choices

Page 10: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 10Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Data Points Collected by the Cost Based Optimizer

Page 11: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 11Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Data Points Collected by the Cost Based Optimizer

• Table Statistics• Column Statistics• Index Statistics• System Statistics

Page 12: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 12Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Table Statistics

• Statistics collected for tables appear in the data dictionary views *_TABLES– Number of rows (NUM_ROWS) – Number of data blocks below the high water mark

(BLOCKS) – Number of data blocks allocated to the table that have never

been used (EMPTY_BLOCKS) – Average available free space in each data block in bytes

(AVG_SPACE) – Number of chained rows (CHAIN_CNT) – Average row length, including the row's overhead, in bytes

(AVG_ROW_LEN)

Page 13: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 13Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Column Statistics

• Statistics collected for columns appear in the data dictionary views *_TAB_COLUMNS and *_TAB_COL_STATISTICS.– Number of distinct column values (NUM_DISTINCT) – Lowest value in a column (LOW_VALUE) – Highest value in a column (HIGH_VALUE)– Selectivity estimate for column (DENSITY)– Number of null values in a column (NUM_NULLS)– Number of histogram buckets (NUM_BUCKETS)– Average column length (AVG_COL_LEN)– Existence/type of histogram (HISTOGRAM)

Page 14: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 14Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Index Statistics

• Statistics collected for indexes appear in the data dictionary views *_INDEXES.– Depth of the index from its root block to its leaf blocks

(BLEVEL)– Number of leaf blocks (LEAF_BLOCKS) – Number of distinct indexed values (DISTINCT_KEYS) – Average number of leaf blocks in which each distinct value in

the index appears (AVG_LEAF_BLOCKS_PER_KEY) – Average number of data blocks in the table that are pointed to

by a distinct value in the index (AVG_DATA_BLOCKS_PER_KEY)

– Clustering factor (how well ordered the rows are with respect to the indexed values) (CLUSTERING_FACTOR)

Page 15: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 15Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

System Statistics

Statistic Description

sreadtim average time (ms) to complete a single block read

mreadtim average time (ms) to complete a multi-block read

cpuspeed average CPU cycles per second (in millions)

cpuspeednw estimate of CPU cycles (v10)

ioseektim estimate of average read seek time (v10)

iotfrspeed estimate of transfer speed of I/O system (v10)

mbrc average multiblock read count (in blocks)

maxthr maximum I/O system throughput (in bytes/second)

slavethr average slave I/O throughput (in bytes/second)

Page 16: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 16Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

DBMS_STATS

• DBMS_STATS are used to collect the above described data points– There are several debates on what the % of data collected

should be– What the options to the DBMS_STATS should be on

• Our perspective– It should be dependent on the best plans that can be

generated for all the SQL being executed in the system– The frequency and the % of stats being gathered itself should

not become a malignant load on the system

Page 17: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 17Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Recap

• There are several data points that are used by the optimizer• They are collected using DBMS_STATS• All these data points help the optimizer in deciding to choose a

execution plan• The collection of statistics should be a function of the quality of

the plans that can be generated and the special cases that will drive higher frequency and % estimate of statistics collection

Page 18: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 18Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Example of Data Points Gathered by the Optimizer

Page 19: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 19Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

10053 Trace File

Page 20: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 20Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

10053 Trace File

• The 10053 Trace File is the dump of the analysis done by the Cost Based Optimizer in figuring out the execution plan

• It is unlike the 10046 trace file which has a specific syntax and is usually the same across version barring a few exceptions (Millsap, Optimizing Oracle Performance)

• It should be used as the last resort for trouble shooting• Usually the difference in the rows estimated by the Cost Based

Optimizer and the Actual Rows from the execution plan is good enough to see where the problem is

Page 21: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 21Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

See Webinar 1 – SQL Tuning with Trace Data and DBMS_XPLAN

Page 22: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 22Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

10053 Trace File Structure

• Peeked Values of the Bind variables in the SQL Statement• Parameters used by the Optimizer• Base Table Statistics

– Table Stats– Column Stats– Index Stats

• Single Table Access Path• Join Order Determination (Tries Several Combinations and

determines the type of the join and the order of the join)• Query Itself

Page 23: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 23Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Script to Generate 10053 Trace File

• alter session set max_dump_file_size = unlimited;• alter session set tracefile_identifier = ‘Query1_10053’;• alter session set events '10053 trace name context forever, level

1';• Explain plan for• select * from dba_objects where owner=‘SCOTT’;• alter session set events ’10053 trace name context off’;• exit;

Page 24: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 24Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Recap

• The 10053 Trace File is a good tool to understand the reasoning and the data behind the optimizer

• Should usually be used a last resort• Produce large amount of data due to the number of combinations

and data points involved in making a decision• Usually the row difference in the execution plan and the explain

plan should be more than enough to understand why the optimizer is making a certain choice of row source operation

Page 25: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 25Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Case Studies

Page 26: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 26Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Index Scan versus Full Table Scan’s

Page 27: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 27Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Setup

• Create a table which is a copy of dba_objects• create table myobj as select * from dba_objects;• We will create indexes on owner and object_type• We will now gather default statistics and run some queries• We will create indexes on owner and object_id• We will then study some simple queries on this table

Page 28: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 28Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Should I do a Full Table Scan or Use an Index?

• Here is a simple query• Full Table Scan

– select object_type, count(1) from myobj– where object_id between 10 and 11558– group by object_type;

• Index Range Scan– select object_type, count(1) from myobj– where object_id between 10 and 11557– group by object_type;

Page 29: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 29Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Review 10053 Trace Files

Page 30: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 30Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Data Points

• Full Table Scan– Cost

• 180

– Logical I/O’s• 772

• Index Scan– Cost

• 179

– Logical I/O’s• 166

Page 31: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 31Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Shouldn’t Oracle minimize the number of Logical I/O’s?

• Yes– Yes, it is probably the right thing to do

• No– Because, it is probably not the right thing

• The answer is it depends on several things the most important being db_file_multiblock_read_count

• In our experiment it was set to 16• At 16, the cost or the time taken to execute the query by Oracle

will be faster if it did a full table scan using the multi block read count than if it did a sequential read of the index and the table

• You can set different values for db_file_multiblock_read_count and see you can see the cost calculations and the plans change

Page 32: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 32Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Now let us set db_file_multiblock_read_count to 8

• Full Table Scan– select object_type, count(1) from myobj– where object_id between 10 and 14301– group by object_type;

• Index Range Scan– select object_type, count(1) from myobj– where object_id between 10 and 14300– group by object_type;

Page 33: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 33Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Let us set it to 8 and see what happens

• Full Table Scan– Cost

• 221

– Logical I/O’s• 772

• Index Scan– Cost

• 220

– Logical I/O’s• 211

Page 34: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 34Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

So there are a lot of moving parts!!!

• Yes, there are.• How do you manage that

– Keep your optimizer environment as stable as possible– The only thing that should be changing are the data points

collected by the DBMS_STATS gathering process– Keep a copy of your old statistics– Just in case you may have to revert back to old stats to get

back to the good ol’ plan

Page 35: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 35Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

A couple of more concepts

Page 36: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 36Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Cardinality/Selectivity

• Cardinality– The predicted number of rows generated by a row source

operation– That is what you see in the explain plan

• Selectivity– Cardinality is calculated by estimating the selectivity which is

the expected fraction of rows that will pass the predicate test– Is very sensitive to number of distinct values– Determines join order choice based on cardinality and hence

selectivity

Page 37: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 37Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Simple Selectivity

• Calculate Range for the Query (Rng) – Obtained from Query

• Calculate the difference between column high value and column low value (hival-loval)– Obtained from dba_tab_columns

• Divide Rng and (hival-loval)• Add N/num_distinct where N is the range bound count and

num_distinct is the number of distinct values for this column which is obtained from dba_tab_columns

Page 38: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 38Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Sample Selectivity Calculation

select object_type, count(1) from myobj where object_id between10 and 11558 group by object_type;• Range is 11558-10 = 11548• Column High Value – 69892• Column Low Value – 14• N = 2• Number of Distinct Values – 55618• Note the optimizer see that 10 as a lower bound value does not

make sense and upgrades it to 14• Select (11548-14)/(69892-14)+(2/55618) * 55618 from dual gives

cardinality which is 9190 and let us look at the explain plan cardinality

Page 39: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 39Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Cost

• What do we mean by the cost of the query?– Oracle Performance Guide

• Cost = (Number of Single Block Reads * single read time

+ number of multi block reads * multi block read time

+ CPU Cycles / CPU Cycles / second )

/Time to read a single block

• Dimensional Analysis Show Cost is just a number

• Although it is the optimizer guess at how long the query will take to execute

Page 40: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 40Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Selectivity and Cardinality Impact cost and join strategies

• Selectivity and Cardinality impact cost and join strategies as we saw in the example

• You can predict the cost and behavior of your queries as the data volume increases

• The number of distinct values is very important and determines the selectivity and hence cardinality and hence access strategies and hence costs

• The number of distinct values from non-unique columns on which predicates are written and joined can severely be impacted the quality of the statistics being collected

Page 41: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 41Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

So how can you know if a plan is good?

• Look at the cardinality of the explain plan• Look at the rows actually obtained from the execution plan (from

the trace file or the row source execution statistics)• See how closely they match• If there is a wide variance, study if it is impacting your query

performance and determine what additional statistics need to be gathered

Page 42: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 42Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

How can you predict the nature of plans due to data growth?

• Use the cardinality hint to study the performance of the query and its behaviour

Page 43: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 43Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Scripts

Page 44: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 44Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

10053_SCOTT.sql

• alter session set max_dump_file_size = unlimited;• alter session set tracefile_identifier = 'MYOBJ_SCOTT';• alter session set events '10053 trace name context forever, level

1';• explain plan for• select object_type,count(*) from myobj where owner='SCOTT'• group by object_type;• alter session set events '10053 trace name context off';• exit;

Page 45: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 45Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

10053_sys.sql

• alter session set max_dump_file_size = unlimited;• alter session set tracefile_identifier = 'SYS_OBJ';• alter session set events '10053 trace name context forever, level

1';• explain plan for• select object_type,count(*) from myobj where owner='SYS'• group by object_type;• alter session set events '10053 trace name context off';• exit;

Page 46: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 46Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Col_stats.sql

• SELECT table_name "Table Name", • column_name "Column Name",• num_distinct "Number of Distinct Values", • low_value "Low Value", • high_value "High Value", • density "Density", • num_nulls "Number of Nulls", • num_buckets "Number of Buckets", • avg_col_len "Average Column Length", • histogram "Histogram"• FROM user_tab_columns • WHERE table_name='MYOBJ';

Page 47: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 47Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Gather_table_stats.sql

• begin• dbms_stats.gather_table_stats(USER,'MYOBJ');• end;

Page 48: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 48Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Index_stats.sql

• SELECT table_name "Table Name", • index_name "Index Name", • blevel "Blevel", • leaf_blocks "Leaf Blocks", • distinct_keys "Distinct Keys", • AVG_LEAF_BLOCKS_PER_KEY "Avg Leaf Blocks per Key", • AVG_DATA_BLOCKS_PER_KEY "Avg Data Blocks Per Key", • CLUSTERING_FACTOR "Clustering Factor" • FROM user_indexes • WHERE table_name='MYOBJ';

Page 49: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 49Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Join_strategy_choice.sql

• select /*+ cardinality (e 6059) cardinality (d 200) */ e.empno,e.ename,d.dname,d.loc from emp e, dept d

• where e.deptno=d.deptno;

• select /*+ cardinality (e 6060) cardinality (d 200) */ e.empno,e.ename,d.dname,d.loc from emp e, dept d

• where e.deptno=d.deptno;

Page 50: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 50Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Raw_to_num.sql

• select• column_name,• raw_to_num(low_value) low_val,• raw_to_num(high_value) high_val• from• user_tab_columns• where• table_name = 'MYOBJ'• and column_name = 'OBJECT_ID'

Page 51: Cost Based Optimizer - Part 1 of 2

www.hotsos.com Slide 51Copyright © 1999–2007 by Hotsos Enterprises, Ltd.

Tab_stats.sql

• select table_name "Table Name", • num_rows "No. of Rows",• blocks "Blocks", • empty_blocks "Empty Blocks", • avg_space "Average Space", • chain_cnt "Chain Count", • avg_row_len "Average Row Length", • sample_size "Sampe Size" from user_tables• where table_name='MYOBJ';