Excessive Temp Space Usages From Parallel Operations

Upload: saeed-meethal

Post on 22-Oct-2015

The Sources of Temp Space Usages

• SORT ORDER BY (PGA)
• SORT GROUP BY (PGA)
• HASH GROUP BY (PGA)
• WINDOW SORT (analytic function) (PGA)
• HASH JOIN (PGA and join order)
• HASH JOIN BUFFERED (PX related, needs more research)
• BUFFER SORT (PX related, excessive)
• PX SEND BROADCAST (PX distribute BROADCAST, excessive)

How to Identify SQLs with Temp Space Issue

• Use view V$SQL (or the AWR view DBA_HIST_SQLSTAT).

• Check column direct_writes and compare its value with disk_reads.

• If direct_writes is significant relative to disk_reads and the query is not doing a direct load, the query is very likely using a lot of temp space.

V$SQL Example (UAD)

SQL_ID                 6jbvpvurr02rh
ELAPSED TIME (SEC)     4129
IO WAIT TIME (SEC)     2494
DISK_READS             2,545,487
DIRECT_WRITES          5,481,060

Note: DIRECT_WRITES is much greater than DISK_READS because the query was still writing data to temp space, and had not yet read it back, when V$SQL was checked.
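The comparison of direct_writes with disk_reads can be sketched as a small calculation. This is only an illustration using the figures from the V$SQL example above; the name SUSPECT_RATIO and its value of 1.0 are assumptions made for the sketch, not an Oracle rule or a threshold stated in the slides.

```python
# Figures copied from the V$SQL example for SQL_ID 6jbvpvurr02rh.
disk_reads = 2_545_487
direct_writes = 5_481_060

# Hypothetical cutoff (an assumption for this sketch): flag the statement
# when direct writes are at least as large as disk reads.
SUSPECT_RATIO = 1.0


def write_read_ratio(direct_writes: int, disk_reads: int) -> float:
    """Return direct_writes / disk_reads, or infinity when nothing was read."""
    if disk_reads == 0:
        return float("inf")
    return direct_writes / disk_reads


ratio = write_read_ratio(direct_writes, disk_reads)
is_suspect = ratio >= SUSPECT_RATIO
print(f"direct_writes / disk_reads = {ratio:.2f}, suspect = {is_suspect}")
```

For this statement the ratio is a little above 2, so it would be flagged. A query that performs a direct load would also show high direct_writes, so the flag is only a starting point for investigation.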

Locate the Source of Temp Space Usages

• For 11g, try v$sql_plan_monitor, column workarea_max_tempseg

• For 11g and 10g, try v$sql_workarea_active, column tempseg_size

• Any significant value in these columns identifies the execution plan steps with large temp space usage.

Example to Use V$SQL_PLAN_MONITOR

SQL_ID              6jbvpvurr02rh
SQL_EXEC_ID         16777217
PLAN ID             17
PLAN PARENT ID      12
OPERATION           HASH JOIN
READ REQUESTS       0
WRITE REQUESTS      365,794
TEMP SPACE (MB)     45,732

Note: Read requests (PHYSICAL_READ_REQUESTS) is 0 because the query was still building the hash table from the first row source.

Example to Use V$SQL_WORKAREA_ACTIVE

SQL_ID: 6jbvpvurr02rh

Operation    Plan Id    SID     Temp Space (MB)
HASH JOIN    17         1042    11,435
HASH JOIN    17         1107    11,433
HASH JOIN    17         1156    11,432
HASH JOIN    17         1223    11,432
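As a quick cross-check of the two views, the per-session figures in the V$SQL_WORKAREA_ACTIVE example can be added up and compared with the 45,732 MB reported for plan Id 17 in the V$SQL_PLAN_MONITOR example. The sketch below only does that arithmetic on the numbers shown in the slides; the variable names are illustrative.

```python
# Temp space (MB) per session for plan Id 17, keyed by SID,
# copied from the V$SQL_WORKAREA_ACTIVE example.
tempseg_mb_by_sid = {
    1042: 11_435,
    1107: 11_433,
    1156: 11_432,
    1223: 11_432,
}

# Temp space (MB) for plan Id 17 from the V$SQL_PLAN_MONITOR example.
plan_monitor_temp_mb = 45_732

total_mb = sum(tempseg_mb_by_sid.values())
print(f"sum over {len(tempseg_mb_by_sid)} sessions: {total_mb:,} MB")
print(f"matches plan monitor figure: {total_mb == plan_monitor_temp_mb}")
```

The four sessions hold nearly identical amounts, which is consistent with each session working on its own full copy of the same row source.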

Analyze The Plan

1. The temp space usage comes from plan Id 17: HASH JOIN.
2. Since temp space is used, the first row source (Id 19 – 35) must be very large.
3. There is a “PX SEND BROADCAST” for the first row source. It amplifies the temp space usage by a factor of the DOP; in this case, DOP = 4.
4. When the row source of a HASH JOIN is already very large, a BROADCAST PX distribution makes the join much harder.

Using Realtime Monitor (V$SQL_PLAN_MONITOR)

1. Up to plan step 20, the first row source had generated 112,679,920 rows. Plan step 19, “PX SEND BROADCAST”, amplified this to 450,719,680 rows (four times as many, matching DOP = 4), which made the join much harder.

2. BROADCAST is meant for distributing small row sources, and a small row source is what Oracle estimated for this query: 10,421 rows for the first row source. Since Oracle estimated the second row source at 2.9M rows, it judged this join order to be better.
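The amplification described above is simple arithmetic: with BROADCAST distribution, every row of the first row source is sent to each of the DOP consumers. The sketch below reproduces the figures from the slides and shows how far the optimizer's estimate was from the actual row count; the function name broadcast_rows is illustrative only.

```python
dop = 4                      # degree of parallelism in this example
actual_rows = 112_679_920    # rows actually produced by the first row source
estimated_rows = 10_421      # rows the optimizer estimated for it


def broadcast_rows(rows: int, dop: int) -> int:
    """Rows sent in total when every consumer receives the full row source."""
    return rows * dop


actual_broadcast = broadcast_rows(actual_rows, dop)
estimated_broadcast = broadcast_rows(estimated_rows, dop)
underestimate_factor = actual_rows / estimated_rows

print(f"actual rows after broadcast:    {actual_broadcast:,}")
print(f"estimated rows after broadcast: {estimated_broadcast:,}")
print(f"actual / estimated rows:        {underestimate_factor:,.0f}x")
```

Broadcasting about ten thousand rows four times is cheap; broadcasting more than a hundred million rows four times is what fills temp space.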

The Root Cause

1. Bad temp space usage with BROADCAST PX distribution is usually the result of a bad cardinality estimate for the first row source.
2. The root cause is either inaccurate table stats or Oracle’s inability to estimate JOIN cardinality.
3. In this case, both are to blame:
   • The fact table involved does not have global stats.
   • There is no explicit partition range that would let Oracle use partition-level stats.
   • The multi-column range partition scheme complicates cardinality estimates, join estimates, and partition pruning.
   • BLOOM filter is disabled on the UAD DB, which makes partition pruning by join almost impossible.
4. The workaround is to add two hints:
   • Dynamic sampling hint: dynamic_sampling(2). Note that no table alias is used, so it applies to all tables involved. The purpose is to get a better cardinality estimate.
   • OPT_PARAM('_bloom_filter_enabled' 'true') to enable bloom filter for join-related partition pruning.

PX BUFFER SORT Example

1. BUFFER SORT in PX appears when the operations on one row source/table are not parallelized while the rest of the query runs in parallel. The BUFFER SORT operation happens where the query switches from serial operation to parallel operation. Its temp space usage can be identified using v$sql_workarea_active or v$sql_plan_monitor, or by examining the plan itself if the query completed long ago.

2. In the case above (DIRECT MARKETING, SEM), the query ran with DOP 32, but the operation on the major row source, the fact table AGG_BY_SPACEID_KWOID_7D, was a serial operation.

The Impact of BUFFER SORT

• If the BUFFER SORT is on the major row source and results in significant temp space usage, it basically triples the IO requests (one additional round of write and read).

• More interesting still, the whole query runs in parallel, even with a very high DOP, but the slowest operation, reading a very large table, runs in serial. This is basically a waste of PX resources.

• The workaround is to identify the operations running in serial (in the plan, those operations have empty TQ and IN-OUT columns) and see whether parallel hints can be added to the appropriate tables. This will not only make the PX operation more efficient but also reduce temp space usage.
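The "triples the IO requests" point can be illustrated with a rough model. It assumes, as a simplification, that spilling the row source to temp costs about as many write requests as the original read, and as many read requests again to fetch it back. The request count of 1,000,000 is a hypothetical figure, not taken from the slides.

```python
def io_requests(table_reads: int, spills_to_temp: bool) -> int:
    """Rough IO request count for scanning one row source.

    Simplifying assumption: writing the row source to temp and reading it
    back each cost about as many requests as the original table read.
    """
    if not spills_to_temp:
        return table_reads
    temp_writes = table_reads   # one extra round of writes to temp
    temp_reads = table_reads    # one extra round of reads from temp
    return table_reads + temp_writes + temp_reads


table_reads = 1_000_000  # hypothetical number of read requests for the fact table
without_spill = io_requests(table_reads, spills_to_temp=False)
with_spill = io_requests(table_reads, spills_to_temp=True)
print(f"without BUFFER SORT spill: {without_spill:,} requests")
print(f"with BUFFER SORT spill:    {with_spill:,} requests")
```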