oracle_analysis_101 (12/9/08) [email protected] Page 1

Oracle Analysis 101
Simple techniques to help analyze performance

[email protected]
> http://blogs.sun.com/~glennf
> Sr. Staff Engineer
> Performance Technologies Group


Goal Statements

Introduce the basic techniques required to collect and analyze Oracle performance data more effectively.


Overview: Collecting Data
• Developing a well-defined problem statement.
• Define types of performance data and what is important.
• Minimal set of data required for performance engagements.
• Data quality – properly scoped and collected.
• Show techniques to gather various types of performance data from Oracle.
  > Basic STATSPACK and Automatic Workload Repository (AWR) capabilities
  > Gathering Oracle Trace Data


Developing a problem statement
• Be as specific as possible using business metrics:
  > Warehouse Inventory user response time increases from 1 to 10 seconds during peak hours (10AM to 1PM PST).
  > The Fulfillment batch job has increased from 1 hour to 2 hours over the past month.
• Avoid defining performance in terms of system metrics.
  > System CPU% has increased from 10% to 25% during peak hours.
  > This may be an indication of a potential or future problem. By itself it is NOT a problem, just a symptom.


CPU is not a workload metric!
• Consider the following:
  > Upgrade from an older v880 server running Solaris 8.
  > New server: m4000 on Solaris 10.
• CPU% on the new server is at 60% during peak vs. 50% during peak on the old system.
  > Panic!!! The new server can't possibly handle any growth!!
  > Escalations ensue, people flap their arms, executives get involved... you get the picture :)
• Observations
  > Need real metrics like “orders/hr”, etc...
  > CPU% is not a workload metric or a measure of throughput.
  > Solaris 8 often under-reports CPU% vs Solaris 10... use Tim Cook's utility and blog: http://blogs.sun.com/timc/entry/how_event_driven_utilization_measurement


Types of Performance Data
• Environmental
  > Configuration (HW, OS, Network, IO, and DB)
  > Event/Error logs (“messages” and “alert_xx.log”)
  > System Run logs or ECOs.
• High Level statistics (Be sure to scope the data!)
  > Business metrics: Orders/min, Shipments/sec, ...
  > iostat, netstat, vmstat, mount, prstat, ps -ecf, ... (guds?)
  > Oracle STATSPACK or AWR
• Low Level statistics
  > mpstat, trapstat, cpustat, lockstat, DTrace
  > Event 10046 tracing in Oracle.


Scoped and Correlated Data
• Focus on data around the event
  > I once received a STATSPACK where the report spanned 36 hours ☺
  > Avoid data overload... I recently received 2GB of trace files
  > Averages have a funny way of distorting problems and pointing you in the wrong direction.
  > User response time and business metrics
• OS and Database statistics should be from the SAME interval.
  > Often I see an Explorer from midnight with some utilization data paired up with a STATSPACK from the afternoon.


DTrace - Cool, but not the best place to start!
• Treats Oracle as a BLACK box.
• Can identify resource consumers, but can NOT tell if this behavior is correct or not.
• STATSPACK or AWR can provide a DB stats overview.
• Oracle Event Tracing is best for deep drill-down... the “DTrace” of Oracle.


Oracle Performance data
• STATSPACK introduced in 8.1.6
  > Replaced tired bstat/estat
  > Workload profiling with persistent storage of perf data
  > More detailed latch and shared pool data
  > Finds HOT SQL statements to aid in SQL tuning.
• Automatic Workload Repository (AWR) in 10g
  > HTML output!, Remote capabilities, sort by CPU and Elapsed time.
• Trace Wait interface
  > Enhanced in 10g
  > Trace individual processes/sessions via “oradebug”


Overview: Analysis
• Basic techniques
• Environmental (logfiles and configuration)
• STATSPACK / AWR overview
• Oracle Event Tracing (The “DTrace” of Oracle)


Basic techniques
● Start with a well-defined problem
● Look for high-level signs of problems
  – alert.log
  – STATSPACK/AWR: (1st page stats)
    • Load profile
    • Top wait events
    • Hit rates
  – Top SQL CPU consumers in AWR reports

Oracle Performance analysis takes years to master... so be patient.


Alert.log analysis
• Startup time and messages.
  > Restart frequency.
  > init.ora hacking shows up as “_underbar_params”
• Errors are reported to the alert.log file.
• Log file switch frequency.

Tue Aug 30 14:01:22 2005
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
....
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 10.1.0.2.0.
.....
Mon Nov 28 14:39:26 2005
Private_strands 3 at log switch
Beginning log switch checkpoint up to RBA [0x19d.2.10], SCN: 0x0000.00478e91
Thread 1 advanced to log sequence 413
  Current log# 1 seq# 413 mem# 0: /export/home/oracle/oradata/GLENNF/redo01.log
Mon Nov 28 14:40:37 2005
Private_strands 3 at log switch
Beginning log switch checkpoint up to RBA [0x19e.2.10], SCN: 0x0000.00478ead
Thread 1 advanced to log sequence 414
  Current log# 2 seq# 414 mem# 0: /export/home/oracle/oradata/GLENNF/redo02.log

Callouts: the first entry is the startup message. Log switches every 71 seconds!!
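The 71-second figure comes from subtracting the timestamps of consecutive log switches. A minimal Python sketch of that check follows, assuming the alert.log uses the 10g-style timestamp lines shown in the excerpt (later releases may format timestamps differently) and an English locale for day and month names. The function name `log_switch_intervals` is made up for this illustration.

```python
from datetime import datetime

# Excerpt shaped like the alert.log sample on the slide (timestamps copied from it).
ALERT_LOG = """\
Mon Nov 28 14:39:26 2005
Private_strands 3 at log switch
Thread 1 advanced to log sequence 413
Mon Nov 28 14:40:37 2005
Private_strands 3 at log switch
Thread 1 advanced to log sequence 414
"""

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Mon Nov 28 14:39:26 2005"


def log_switch_intervals(text):
    """Return the seconds between consecutive 'advanced to log sequence' entries."""
    last_ts = None      # most recent timestamp line seen so far
    switch_times = []   # timestamp attached to each log switch
    for line in text.splitlines():
        try:
            last_ts = datetime.strptime(line.strip(), TS_FORMAT)
            continue
        except ValueError:
            pass        # not a timestamp line
        if "advanced to log sequence" in line and last_ts is not None:
            switch_times.append(last_ts)
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(switch_times, switch_times[1:])
    ]


print(log_switch_intervals(ALERT_LOG))
```

Run over a full alert.log, the same list makes it easy to spot periods where switches bunch up.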


AWR / Statspack Analysis 101
• GOAL:
  > Give basic guidance when looking at an AWR or STATSPACK report.
• Answer basic questions like:
  > What is the scope of the data collected?
  > Is this RAC or single instance?
  > How many connections?
  > What is the transaction rate?
  > IO rate? Cache hit rate?
  > How much CPU is being used?
  > What SQL is using the most CPU, IO?


HEADER for Statspack/AWR
• A fair amount of information can be squeezed out of the header alone.

[Report header screenshot; callouts: RAC cluster / 650+ connections... Shadow Processes / Sample interval]


Scoping issues
• Example #1. Can you find the issue?

WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
PROD          4060419904 PROD2               2 10.2.0.3.0  YES thdtoltpr02

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     25738 15-May-08 09:00:12       828      15.6
  End Snap:     25744 15-May-08 13:00:73       832      15.6
   Elapsed:              240.86 (mins)
   DB Time:             2405.07 (mins)

• 4 hour window??
• Notice the gap in Snap IDs?
• Oracle by default schedules AWR by the hour.


Scoping issues cont...
• Example #2: What's wrong with this sample?

STATSPACK report for

DB Name         DB Id    Instance     Inst Num Release     Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
SWINGBCH       861079668 SWINGBCH            1 9.2.0.6.0   NO      dc1-beta

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:       221 27-Apr-07 02:00:06       14      48.6
  End Snap:       223 27-Apr-07 02:30:07    3,017      34.5
   Elapsed:               30.02 (mins)

• 30min collection interval is good.
• 3000 sessions were added in the 30min interval!!
• This is an application startup phase.
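Both scoping examples can be caught mechanically from the header fields alone. The sketch below is only an illustration: the function name `scope_warnings` and the thresholds (60 minutes, a snap-ID gap of 2, a 2x change in sessions) are assumptions chosen for this example, not rules from Oracle or from the slides.

```python
def scope_warnings(begin_snap, end_snap, elapsed_min,
                   begin_sessions, end_sessions,
                   max_elapsed_min=60.0, max_snap_gap=2, max_session_ratio=2.0):
    """Return human-readable warnings about the scope of a report header.

    All thresholds are illustrative defaults (assumptions), not Oracle rules.
    """
    warnings = []
    snap_gap = end_snap - begin_snap
    if snap_gap > max_snap_gap:
        warnings.append(f"report spans {snap_gap} snapshot intervals")
    if elapsed_min > max_elapsed_min:
        warnings.append(f"elapsed time of {elapsed_min:.2f} min is a wide window")
    low, high = sorted((begin_sessions, end_sessions))
    if low > 0 and high / low > max_session_ratio:
        warnings.append(
            f"session count changed from {begin_sessions} to {end_sessions}"
        )
    return warnings


# Header values copied from Example #1 (AWR) and Example #2 (STATSPACK).
example_1 = scope_warnings(25738, 25744, 240.86, 828, 832)
example_2 = scope_warnings(221, 223, 30.02, 14, 3017)
print(example_1)
print(example_2)
```

Example #1 trips the window checks, while Example #2 trips only the session check, which matches the application-startup explanation.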


Oracle Cache Sizes
• Shows Default Buffer cache, shared pool, recycle, ..
• Caches use IPC shared memory.
  > “ipcs -mb” shows segments from the OS point of view
  > “pmap -xs <orapid>” shows pages and sizes from the OS point of view

Cache Sizes
~~~~~~~~~~~                       Begin        End
                             ---------- ----------
               Buffer Cache:     5,712M     5,712M  Std Block Size:         8K
           Shared Pool Size:     2,864M     2,864M      Log Buffer:    14,376K

• Oracle block size: 8K is the safest by far. All development and optimizer work is with 8K.
• With DISM, caches can grow and shrink.


Load Profile
• How many transactions/sec?
• IO profile? Query profile?

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:         14,529,454.45            506,509.90
              Logical reads:            154,624.04              5,390.33
              Block changes:             45,862.25              1,598.80
             Physical reads:                196.92                  6.86
            Physical writes:                794.24                 27.69
                 User calls:                148.29                  5.17
                     Parses:                 34.47                  1.20
                Hard parses:                  0.00                  0.00
                      Sorts:                 15.67                  0.55
                     Logons:                  0.29                  0.01
                   Executes:                 98.55                  3.44
               Transactions:                 28.69

  % Blocks changed per Read:   29.66    Recursive Call %:    48.51
 Rollback per transaction %:    0.02       Rows per Sort:  1137.18


Load Profile: Apples and Oranges!
• As “Joe the DBA” might say:
  – “Nothing's changed”
  – “It's the same application”
• Verify it is the same... The truth is in the DATA!
• Key metrics: Logical IO, Physical IO, Transaction profile.

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:         14,529,454.45            506,509.90
              Logical reads:            154,624.04              5,390.33
              Block changes:             45,862.25              1,598.80
             Physical reads:                196.92                  6.86
            Physical writes:                794.24                 27.69
                 User calls:                148.29                  5.17
                     Parses:                 34.47                  1.20
                Hard parses:                  0.00                  0.00
                      Sorts:                 15.67                  0.55
                     Logons:                  0.29                  0.01
                   Executes:                 98.55                  3.44
               Transactions:                 28.69

  % Blocks changed per Read:   29.66    Recursive Call %:    48.51
 Rollback per transaction %:    0.02       Rows per Sort:  1137.18
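One way to test a “nothing's changed” claim is to normalize two load profiles to per-transaction values and compare them metric by metric. The sketch below is an illustration only. `BASELINE` holds per-second values from the profile above, and `OTHER` holds per-second values from the “warning signs” profile elsewhere in this deck. Those two reports come from different systems, so the pairing demonstrates the arithmetic and is not a real before/after comparison. The function names are invented for this example.

```python
# Per-second values copied from two Load Profile sections in this deck.
BASELINE = {
    "logical_reads": 154624.04,
    "physical_reads": 196.92,
    "physical_writes": 794.24,
    "executes": 98.55,
    "transactions": 28.69,
}
OTHER = {
    "logical_reads": 1104645.30,
    "physical_reads": 48975.96,
    "physical_writes": 11983.33,
    "executes": 4375.60,
    "transactions": 584.86,
}


def per_transaction(profile):
    """Divide every per-second metric by the transaction rate."""
    tps = profile["transactions"]
    return {name: value / tps for name, value in profile.items()
            if name != "transactions"}


def compare_profiles(before, after):
    """Return after/before ratios of the per-transaction values."""
    pt_before, pt_after = per_transaction(before), per_transaction(after)
    return {name: pt_after[name] / pt_before[name]
            for name in pt_before if name in pt_after}


for metric, ratio in compare_profiles(BASELINE, OTHER).items():
    print(f"{metric:16s} per-transaction ratio: {ratio:8.2f}")
```

A ratio far from 1.0 on logical or physical IO per transaction suggests the workload is not “the same application” doing the same work.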


Load Profile... warning signs
• High physical IO rate.
• Hard parses... should primarily be soft parses.
• High “Logons/sec”... use persistent connections!

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:          1,282,493.19              2,192.82
              Logical reads:          1,104,645.30              1,888.74
              Block changes:              9,286.08                 15.88
             Physical reads:             48,975.96                  0.01
            Physical writes:             11,983.33                  0.37
                 User calls:                484.33                  0.83
                     Parses:                 79.70                  0.14
                Hard parses:                  0.14                  0.00
                      Sorts:                  6.74                  0.01
                     Logons:                  1.56                  0.00
                   Executes:              4,375.60                  7.48
               Transactions:                584.86

  % Blocks changed per Read:    0.84    Recursive Call %:    97.13
 Rollback per transaction %:    1.30       Rows per Sort:    527.7


Instance Efficiency Percentages
• Buffer Hit rate
  > Values below 99% are suspect for OLTP.
• Shared Pool “% SQL with exec > 1”
  > Low values mean poor reuse of shared statements
  > SQL without bind variables..

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.99       Redo NoWait %:  100.00
               Buffer Hit %:   98.92    In-memory Sort %:  100.00
              Library Hit %:  100.15        Soft Parse %:   99.87
         Execute to Parse %:   98.15         Latch Hit %:   99.81
Parse CPU to Parse Elapsd %:   93.41     % Non-Parse CPU:   99.89

 Shared Pool Statistics        Begin    End
                              ------  ------
             Memory Usage %:   68.00   68.06
    % SQL with executions>1:   98.40   95.83
  % Memory for SQL w/exec>1:   96.84   94.28


Top 5 Timed Events
• Wait events
  > Shows where “Oracle” connections wait.
  > Bad problems usually show up here first.
  > This is an average over all sessions, so treat it as such.
  > This is a good sample of the TOP 5 events
  > CPU and IO are the top events.

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                          3,641          60.6
db file sequential read             268,976       1,375      5   22.9 User I/O
gc cr grant 2-way                   218,866         384      2    6.4 Cluster
log file sync                        12,625         131     10    2.2 Commit
gc current block 2-way               61,056         130      2    2.2 Cluster
          -------------------------------------------------------------


CPU time in Oracle
• Total amount of CPU seconds during the sample interval.
  > CPU is typically one of the top stats... along with IO.
  > Can calculate CPU utilization!
  > Useful for consolidation since only the CPU time for this instance is considered.

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      7380 07-Nov-08 20:00:43     1,375      72.0
  End Snap:      7382 07-Nov-08 20:30:59     1,361      71.9
   Elapsed:               30.27 (mins)
   DB Time:              395.62 (mins)

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                         14,845          62.5
db file sequential read           1,146,234       8,873      8   37.4 User I/O
db file scattered read               21,784         545     25    2.3 User I/O
read by other session                53,589         244      5    1.0 User I/O

14845/(30.27*60) = 8.17 CPUs busy for “usr” time.
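The same arithmetic as a small Python sketch. The helper names are made up for this example, and the 16-CPU count passed to `cpu_utilization_pct` is a hypothetical value, since the slide does not state how many CPUs the server has.

```python
def busy_cpus(cpu_time_s, elapsed_min):
    """Average number of CPUs kept busy by this instance over the interval."""
    return cpu_time_s / (elapsed_min * 60.0)


def cpu_utilization_pct(cpu_time_s, elapsed_min, n_cpus):
    """Instance CPU utilization as a percentage of n_cpus (caller supplies n_cpus)."""
    return 100.0 * busy_cpus(cpu_time_s, elapsed_min) / n_cpus


# "CPU time" from the Top 5 Timed Events and "Elapsed" from the snapshot header.
print(round(busy_cpus(14845, 30.27), 2))

# Hypothetical 16-CPU server, for illustration only.
print(round(cpu_utilization_pct(14845, 30.27, 16), 1))
```

Because the figure counts only this instance's CPU seconds, it can be added across instances when sizing a consolidated server.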


Drill down on Expensive SQL
• Which SQL is using the most CPU?
  > Allows you to quickly locate expensive SQL statements... but beware, this might not be the problem :)

                                                     CPU      Elapsd
    Buffer Gets   Executions  Gets per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     18,641,061        2,645        7,047.7   39.9   369.37    372.72 3894562395
Module: JDBC Thin Client
insert into PLANARRIV (item, source, dest, transmode, needarrivdate,
schedarrivdate, needshipdate, schedshipdate, expdate, qty,
firmplansw, seqnum, substqty, departuredate, deliverydate,
orderplacedate, sourcing ) values ( :1, :2, :3, :4, :5, :6, :7, :8,
:9, :10, :11, :12, :13, :14, :15, :16, :17 )

      6,867,117          377       18,215.2   14.7    66.63     94.12 1924417985
Module: JDBC Thin Client
SELECT sku.item,sku.loc,item.perishablesw,loc.ohpost,sku.oh,sku.ohpost,
sku.replentype,skudemandparam.alloccal,skudemandparam.ccpsw,
skudemandparam.custorderdur,skudemandparam.dmdredid,skudemandparam.dmdtodate,
skudemandparam.fcstadjrule,skudemandparam.fcstconsumptionrule,
skudemandparam.fcstprimconsdur,skudemandparam.fcs


Problem wait events
• “enq”, “buffer busy”, “latch free”.. Often a sign of too many connections or application problems.

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
enq: TX - row lock contention       427,949     157,270    367   26.2 Applicatio
CPU time                                        113,999          19.0
gc buffer busy                    3,642,184      95,627     26   16.0 Cluster
gc current block busy             2,264,273      76,874     34   12.8 Cluster
db file scattered read              351,146      30,238     86    5.0 User I/O

Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
gc buffer busy                      556,725     487,263    875   89.7 Cluster
db file sequential read              10,814       9,982    923    1.8 User I/O
enq: HW - contention                  7,313         899    123    0.2 Configurat
CPU time                                            852           0.2
gc current multi block request          901         796    883    0.1 Cluster

                                                                       % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free                                      4,542,675   1,137,914    79.04
log file sync                                     242,359     164,671    11.44
buffer busy waits                                 102,540      61,887     4.30
enqueue                                            35,142      42,498     2.95
CPU time                                                       25,310     1.76


Problem wait events... “log file sync”
• Too many connections lead to scheduling issues.
• Rarely an IO issue... but check log file IO just in case.
• 2ms or less is desirable
• Many bugs... Use 10.2.0.4.. (Checksum bug #6814520 in 10.2.0.3)

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                       107,090      82,401    769   29.1 Commit
enq: HW - contention                 78,617      29,060    370   10.3 Configurat
db file sequential read              25,928      24,612    949    8.7 User I/O
gc buffer busy                        7,803       5,906    757    2.1 Cluster

Further down the AWR you see all wait events...

                                                                   Avg
                                            %Time  Total Wait    wait     Waits
Event                                 Waits -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ------ ----------- ------- ---------
log file sync                       107,090   77.4      82,401     769       4.5
enq: HW - contention                 78,617   73.7      29,060     370       3.3
......
log file sequential read              3,975     .0          86      22       0.2
log file parallel write              27,333     .0          86       3       1.1


IO wait events
• You can get the average wait for IO from the Top 5 events.
  > Oracle's statistic: “db file sequential read”
    – Storage-centric view: “Random single block IO”
  > Oracle's statistic: “db file scattered read”
    – Storage-centric view: “Sequential IO”... HUH?

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                         17,186          75.3
db file sequential read             744,522       5,874      8   25.7 User I/O
db file scattered read               23,809         459     19    2.0 User I/O
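The “Avg wait (ms)” column is just total wait time divided by the number of waits, so it can be recomputed (or computed for 9i-style reports that lack the column). A minimal sketch, with a helper name invented for this example; the result is approximate because the report rounds “Time (s)” to whole seconds.

```python
def avg_wait_ms(waits, time_s):
    """Average wait in milliseconds from the Waits and Time (s) columns."""
    return time_s * 1000.0 / waits


# Values copied from the Top 5 Timed Events rows above.
print(round(avg_wait_ms(744522, 5874)))  # db file sequential read
print(round(avg_wait_ms(23809, 459)))    # db file scattered read
```

These per-event averages are the numbers to line up against service times reported by iostat for the same interval.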


More IO information...
• Reads by Tablespace, Datafile, SQL statement, ...

Tablespace IO Stats                  DB/Inst: ITMSCMP/itscr11p  Snaps: 6785-6788
-> ordered by IOs (Reads + Writes) desc

Tablespace
------------------------------
                    Av     Av      Av                    Av     Buffer Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
DATA_TS
       477,801     180    8.1     2.6       99,555       37      7,016    5.5
INDEX_TS
       186,082      70    8.3     1.0       64,924       24     30,214    0.9

Tablespace               Filename
------------------------ ----------------------------------------------------
                    Av     Av      Av                    Av     Buffer Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
AMG_ALBUM_IDX_TS         /oradata/itmscmp/data2/amg_album_idx_ts01.dbf
           392       0    7.0     1.0            5        0          0    0.0
AMG_ALBUM_TS             /oradata/itmscmp/data3/amg_album_ts01.dbf
         7,604       3    7.4     1.0            5        0          2   10.0


Even More IO information...
• Reads by SQL statement, Database object

SQL ordered by Reads               DB/Inst: TTOPERF1/ttoperf15  Snaps: 5141-5142
-> Total Disk Reads: 318,079
-> Captured SQL account for 133.5% of Total

                               Reads              CPU     Elapsed
Physical Reads  Executions      per Exec %Total Time (s)  Time (s)    SQL Id
-------------- ----------- ------------- ------ -------- --------- -------------
       811,212           1     811,212.0   51.4   538.11   1013.10 2j2g639a9s4kx
Module: sqlplus@itscontentrepdb05 (TNS V1-V3)
select /*+ parallel(ppc, 2) */ count(distinct p.adam_id) from mz_playlist p,
mz_playlist_price_cache ppc where p.first_production_release is not null and
p.last_production_release is null and p.playlist_id=ppc.playlist_id and
(ppc.start_date is NULL or ppc.start_date <= sysdate) and
(ppc.end_date is NULL or ppc.end_da

Segments by Physical Reads           DB/Inst: ITMSCMP/itscr11p  Snaps: 6785-6788
-> Total Physical Reads: 1,577,615
-> Captured Segments account for 81.5% of Total

           Tablespace                      Subobject  Obj.      Physical
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
CONTENT_OW DATA_TS    MZ_PLAYLIST_PRICE_CA            TABLE      723,411   45.85
CONTENT_OW DATA_TS    MZ_PLAYLIST__LS                 TABLE       87,947    5.57
CONTENT_OW DATA_TS    MZ_USER_REVIEW                  TABLE       79,534    5.04
CONTENT_OW DATA_TS    MZ_PRODUCT__LS                  TABLE       52,580    3.33
CONTENT_OW DATA_TS    MZ_PODCAST_EPISODE_2            TABLE       43,243    2.74


“Who needs iostat?”
• IO rate information from the Load Profile
  > physical reads/writes per second
• IO service time(s) from wait events
• IO broken down by Tablespace and Datafile, etc..
• Seriously, who needs it?
• Sorry, you still need “iostat”.
  > Like the CPU wait events, IO events are only from this instance.
  > Times aren't accurate on an over-processed system.
  > iostat shows IO from the system point of view
  > “storage level” analytics are useful as well!
  > They often don't match due to
    – IO configuration and layout
    – Scheduling


Case Study: Oracle Applications BM
• Benchmark for DIT (India IRS :)
• Configuration
  > E20K with 36 USIV @1200MHz
  > Solaris 10 with Oracle 9iR2
• Oracle Statistics
  > STATSPACK
  > Event trace
• Problem Statement:
  > Unable to support more than 2000 users within 2 second average response time. The goal is 4000 users. At 2000 users the system is fully utilized (100% CPU).


Case Study: STATSPACK Data
• STATSPACK data showed severe latch contention
• Drill down by CPU, IO, etc... didn't show the problem.

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free                                     10,597,141   1,425,538    97.52
CPU time                                                       25,842     1.77
row cache lock                                    105,066       4,235      .29
enqueue                                             7,065       2,438      .17
buffer busy waits                                  23,785       2,195      .15

                                                     CPU      Elapsd
    Buffer Gets   Executions  Gets per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    264,391,557       50,560        5,229.3   44.7  1799.33   3566.29 3184176672
Module: f90runm@sleepy (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE_CD,
BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_YR,PAN,DT_FILED,
NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SELECT a.SEQ_NO FROM ss_return a
WHERE A.RANGE_CD = :1 AND A.AO_NO= :2 AND A.AO_TYP = :3 AND A.area_cd = :4))
and (AST_YR=:5) and

     75,312,641      113,269          664.9   12.7  1063.39   1451.77 3785480933
select max(nvl(option$,0)) from sysauth$ where privilege#=:1 connect by
grantee#=prior privilege# and privilege#>0 start with (grantee#=:2 or
grantee#=1) and privilege#>0 group by privilege#


Case Study: Oracle Trace Top-level
• Using Oracle event trace allowed us to narrow our focus and concentrate on the true bottleneck.
• Gathered several *.trc files and used “orasrp” to analyze them.
• Drilled down on “latch free” events as shown in profile below...


Case Study: Oracle Trace “latch free”
• Drilling down again on statements which contribute the most to “latch free” shows an interesting pattern with the “dual” table... a well known problem in Oracle 9i.


Case Study: Summary
• OS showed 100% CPU utilization, but no anomalies. DTrace was not helpful here either.
• STATSPACK provided the starting point of the problem.
• Oracle Trace interface and “response-time” profiling pinned down the source of the problem.
• Researched “dual” table problem on-line (metalink)
  > Problem is fixed in 10g
  > Trick / workaround for 9i.
  > Re-coding to avoid is Best!!


Oracle Resources
• http://metalink.oracle.com - Oracle's Metalink
  > Need an account. Check [email protected] archives for latest.
  > Research bugs, tech tips, download patches, ...
• http://technet.oracle.com - Oracle's Technet
  > Documentation, white papers, ...
• http://asktom.oracle.com - Misc questions, mostly DBA but some perf
• http://www.oraperf.com - Analyzer for STATSPACK files!!
• http://oracledba.ru/orasrp/ - Oracle Session profiler.
• http://method-r.com/ - Great papers and insight – Cary Millsap
• http://www.orapub.com - Papers, advice, ...
• Nasty bug for 10.2.0.3: Checksum bug #6814520


More references and resources
• metalink.oracle.com documents on Trace
  > 245981.1 – Trace wait functionality in 10g
  > 21154.1 – Enabling Tracing (session level)
  > 1058210.6 – Enabling Tracing ORADEBUG
  > 39817.1 – Interpreting Raw trace data
• Oracle papers
  > Avoiding Common Oracle Performance Problems
    – http://www.sun.com/blueprints/0303/817-1781.pdf
• Sun Blogs
  > Oracle performance on Sun
    – http://blogs.sun.com/glennf
  > Tim Cook's Solaris 8,9,10 CPU% blog and “old-new” utility.
    – http://blogs.sun.com/timc/entry/how_event_driven_utilization_measurement


Summary
• Identify and define the problem
• Collect and identify Oracle performance data
  > Alert.log
  > STATSPACK
  > Oracle Tracing and analysis
• Know when to say when
  > Use experts to help guide analysis.
  > Avoid Google performance hackers.


Oracle Analysis 101
[email protected]
http://blogs.sun.com/~glennf
Sr. Staff Engineer, Performance Technologies Group

Questions?????


Oracle Analysis 101
[email protected]
http://blogs.sun.com/~glennf
Sr. Staff Engineer, Performance Technologies Group

Extra slides...


Where is the Oracle Data?
• Alert logs & Trace Files
    $ORACLE_HOME/rdbms/log    ## Default
• Optimal Flexible Architecture (OFA) is commonly used to manage multiple instances
  > Places files in a set location to ease administration.
  > User Trace and Alert.log found in:
    – “USER_DUMP_DEST” in init.ora overrides the default.
    – “BACKGROUND_DUMP_DEST” for server files... including the “alert.log” file.
• Full OFA documentation: http://www.hotsos.com/e-library/abstract.php?id=19


Using STATSPACK
• Install package from $ORACLE_HOME/rdbms/admin

    SQL> connect / as sysdba
    SQL> @?/rdbms/admin/spcreate    ## Usually not necessary

• Take snapshots throughout the day. Often an hourly job.

    SQL> connect perfstat/perfstat
    SQL> exec statspack.snap(i_snap_level=>7);
    ...... run workload ......
    SQL> exec statspack.snap(i_snap_level=>7);

• Run “spreport.sql” and select the begin and end snapshots

    SQL> @?/rdbms/admin/spreport    ## Run report

• init.ora “statistics_level=ALL”
  > Necessary to get details about Query plans and Segment statistics.


Using Automatic Workload Repository
• AWR installation is automatic as part of 10g.
• Snapshot

    SQL> connect / as sysdba
    SQL> exec dbms_workload_repository.create_snapshot();
    ...run test....
    SQL> exec dbms_workload_repository.create_snapshot();

• Run “@?/rdbms/admin/awrrpt” and select two snapshots.


Show Query plans
• Further drill down with ?/rdbms/admin/sprepsql.sql (STATSPACK) or ?/rdbms/admin/awrsqrpt.sql (AWR)
  > Get full stats and QEP given the hash value of the statement

SQL Statistics
~~~~~~~~~~~~~~
-> CPU and Elapsed Time are in seconds (s) for Statement Total and in
   milliseconds (ms) for Per Execute
                                                       % Snap
                     Statement Total      Per Execute   Total
                     ---------------  ---------------  ------
        Buffer Gets:       6,867,117         18,215.2   14.71
         Disk Reads:           3,887             10.3    6.54
     Rows processed:         378,635          1,004.3
     CPU Time(s/ms):              67            176.7
 Elapsed Time(s/ms):              94            249.6
              Sorts:             377              1.0
        Parse Calls:               0               .0
      Invalidations:               0
      Version count:               1
    Sharable Mem(K):             346
         Executions:             377


Show Query plans (cont...)

--------------------------------------------------------------------------------
| Operation                      | PHV/Object Name     |  Rows | Bytes|   Cost |
--------------------------------------------------------------------------------
|SELECT STATEMENT                |----- 1966240984 ----|       |      |  11671 |
|SORT ORDER BY                   |                     |   974 |  253K|  11671 |
| NESTED LOOPS OUTER             |                     |   974 |  253K|  11649 |
|  NESTED LOOPS                  |                     |   960 |  231K|   9729 |
|   NESTED LOOPS                 |                     |   951 |  192K|   7827 |
|    NESTED LOOPS                |                     |   952 |  168K|   5923 |
|     NESTED LOOPS               |                     |   956 |  124K|   4011 |
|      NESTED LOOPS              |                     |   979 |   84K|   2053 |
|       HASH JOIN                |                     |    1K |   54K|     43 |
|        HASH JOIN               |                     |    1K |   46K|     22 |
|         TABLE ACCESS BY INDEX R|PROCESSSKU           |    1K |   27K|     10 |
|          INDEX RANGE SCAN      |PROCESSSKU_BATCH     |    1K |      |      4 |
|         TABLE ACCESS FULL      |LOC                  |    1K |   23K|     11 |
|        TABLE ACCESS FULL       |ITEM                 |   10K |   87K|     20 |
|       TABLE ACCESS BY INDEX ROW|SKU                  |     1 |    32|      2 |
|        INDEX UNIQUE SCAN       |SKU_PK               |     1 |      |      1 |
|      TABLE ACCESS BY INDEX ROWI|SKUPLANNINGPARAM     |     1 |    45|      2 |
|       INDEX UNIQUE SCAN        |XPKSKUPLANNINGPARAM  |     1 |      |      1 |
|     TABLE ACCESS BY INDEX ROWID|SKUDEMANDPARAM       |     1 |    48|      2 |
|      INDEX UNIQUE SCAN         |XPKSKUDEMANDPARAM    |     1 |      |      1 |
|    TABLE ACCESS BY INDEX ROWID |SKUDEPLOYMENTPARAM   |     1 |    26|      2 |
|     INDEX UNIQUE SCAN          |XPKSKUDEPLOYMENTPARA |     1 |      |      1 |
|   TABLE ACCESS BY INDEX ROWID  |SKUSAFETYSTOCKPARAM  |     1 |    40|      2 |
|    INDEX UNIQUE SCAN           |XPKSKUSAFETYSTOCKPAR |     1 |      |      1 |
|  TABLE ACCESS BY INDEX ROWID   |SKUPERISHABLEPARAM   |     1 |    20|      2 |
|   INDEX UNIQUE SCAN            |XPKSKUPERISHABLEPARA |     1 |      |      1 |
--------------------------------------------------------------------------------


Drill down on Object Statistics
• Object statistics...
  > Which objects are doing the most IO?
  > Which objects get the most Buffer Busy Waits?

                                           Subobject  Obj.      Physical
Owner      Tablespace Object Name          Name       Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
STSC       INDX       DFUTOSKUFCST_PK                 INDEX       16,784   28.24
STSC       DATA       DFUTOSKUFCST                    TABLE       14,792   24.89
STSC       DATA       SKUPLANNINGPARAM                TABLE        5,412    9.11
STSC       DATA       SOURCING                        TABLE        3,644    6.13
STSC       DATA       SKUSAFETYSTOCKPARAM             TABLE        2,923    4.92
          -------------------------------------------------------------

                                                                  Buffer
                                           Subobject  Obj.          Busy
Owner      Tablespace Object Name          Name       Type         Waits  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
STSC       DATA       PLANARRIV                       TABLE       95,897   94.22
STSC       INDX       PLANARRIV_PK                    INDEX        3,999    3.93
STSC       DATA       SKU                             TABLE          466     .46
STSC       INDX       XIF4RECSHIP                     INDEX          391     .38
STSC       DATA       RECSHIP                         TABLE          347     .34
          -------------------------------------------------------------


Using the Trace Wait Interface
• Oracle tracing is a lot like truss or DTrace for the database.
  > What is a particular “shadow” process doing? (SQL statements, wait events, ...)
  > Trace produces a *.trc file in the udump directory. (Post-process with HOTSOS profiler or ORASRP)

    SQL> connect / as sysdba
    SQL> oradebug setospid 5544
    SQL> oradebug event 10046 trace name context forever, level 12
    ...wait for a while...
    SQL> oradebug event 10046 trace name context off

== ora_5544.trc file ==
EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=677730703911
WAIT #2: nam='db file sequential read' ela= 5954 p1=1 p2=15356 p3=1
WAIT #2: nam='db file sequential read' ela= 7235 p1=1 p2=14168 p3=1
FETCH #2:c=10000,e=13869,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,tim=677730717849
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=6251 op='TABLE ACCESS FULL SQLPLUS_PRODUCT_PROFILE (cr=3 r=0 w=0 time=121 us)'
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ#(18) (cr=3 r=2 w=0 time=13829 us)'
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX UNIQUE SCAN OBJ#(36) (cr=2 r=1 w=0 time=6350 us)'
WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 256 p1=1650815232 p2=1 p3=0
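Profilers such as orasrp build a response-time profile from these raw lines. A much-simplified Python sketch of the wait-event half of that idea follows. It assumes the `WAIT #n: nam='...' ela= N` line shape shown in the sample and that `ela` is in microseconds, as in 9i and later traces. The function name `wait_profile` is invented for this example, and real profilers also account for the CPU and elapsed fields on EXEC/FETCH lines, which this sketch ignores.

```python
import re
from collections import defaultdict

# Lines copied from the ora_5544.trc sample on the slide (STAT lines omitted).
TRACE = """\
EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=677730703911
WAIT #2: nam='db file sequential read' ela= 5954 p1=1 p2=15356 p3=1
WAIT #2: nam='db file sequential read' ela= 7235 p1=1 p2=14168 p3=1
FETCH #2:c=10000,e=13869,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,tim=677730717849
WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 256 p1=1650815232 p2=1 p3=0
"""

# Captures the event name and the elapsed time of one WAIT line.
WAIT_RE = re.compile(r"^WAIT #\d+: nam='([^']+)' ela=\s*(\d+)")


def wait_profile(trace_text):
    """Return (event, wait_count, total_ela) tuples, largest total first."""
    totals = defaultdict(lambda: [0, 0])  # event -> [count, total ela]
    for line in trace_text.splitlines():
        match = WAIT_RE.match(line)
        if match:
            entry = totals[match.group(1)]
            entry[0] += 1
            entry[1] += int(match.group(2))
    rows = [(name, count, ela) for name, (count, ela) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for event, count, ela_us in wait_profile(TRACE):
    print(f"{event:30s} waits={count:3d} total_ela_us={ela_us}")
```

On a real trace, “SQL*Net message from client” usually needs separate treatment because it mostly measures idle time between calls.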


Response Time Profiling Trace Data
• Collect *.trc file as previously shown via oradebug or ???
• Analyze files with
  > HOTSOS / Method-R profiler
  > “orasrp” freeware which gives a similar profile to HOTSOS.