oracle analysis 101_v1.0_ext
oracle_analysis_101 (12/9/08) [email protected] Page 1
Oracle Analysis 101 Simple techniques to help analyze performance
• [email protected]
  > http://blogs.sun.com/~glennf
  > Sr. Staff Engineer
  > Performance Technologies Group
Goal Statements
Introduce basic techniques that are required to better collect and analyze
Oracle performance data.

Overview: Collecting Data
• Developing a well-defined problem statement.
• Define types of performance data and what is important.
• Minimal set of data required for performance engagements.
• Data quality – properly scoped and collected.
• Show techniques to gather various types of performance data from Oracle.
  > Basic STATSPACK and Automatic Workload Repository (AWR) capabilities
  > Gathering Oracle Trace Data

Developing a problem statement
• Be as specific as possible using business metrics:
  > Warehouse Inventory user response time increases from 1 to 10 seconds during peak hours (10AM to 1PM PST).
  > The Fulfillment batch job has increased from 1 hour to 2 hours over the past month.
• Avoid defining performance in terms of system metrics.
  > System CPU% has increased from 10% to 25% during peak hours.
  > This may be an indication of a potential or future problem. By itself it is NOT a problem, just a symptom.

CPU is not a workload metric!
• Consider the following:
  > Upgrade from an older v880 server running Solaris 8.
  > New server: m4000 on Solaris 10.
• CPU% on the new server is at 60% during peak vs. 50% during peak on the old system.
  > Panic!!! The new server can't possibly handle any growth!!
  > Escalations ensue, people flap their arms, executives get involved... you get the picture :)
• Observations
  > Need real metrics like “orders/hr”, etc...
  > CPU% is not a workload metric or a measure of throughput.
  > Solaris 8 often under-reports CPU% vs Solaris 10... use Tim Cook's utility and blog: http://blogs.sun.com/timc/entry/how_event_driven_utilization_measurement

Types of Performance Data
• Environmental
  > Configuration (HW, OS, Network, IO, and DB)
  > Event/Error logs (“messages” and “alert_xx.log”)
  > System Run logs or ECOs.
• High Level statistics (Be sure to scope the data!)
  > Business metrics: Orders/min, Shipments/sec, ...
  > iostat, netstat, vmstat, mount, prstat, ps -ecf, ... (guds?)
  > Oracle STATSPACK or AWR
• Low Level statistics
  > mpstat, trapstat, cpustat, lockstat, DTrace
  > Event 10046 tracing in Oracle.

Scoped and Correlated Data
• Focus on data around the event
  > I once received a STATSPACK where the report spanned 36 hours ☺
  > Avoid data overload... I recently received 2GB of trace files
  > Averages have a funny way of distorting problems and pointing you in the wrong direction.
  > User response time and business metrics
• OS and Database statistics should be from the SAME interval.
  > Often I see an Explorer from midnight with some utilization data paired up with a STATSPACK from the afternoon.
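One way to catch mismatched collections early is to compare the time windows before reading any numbers. The sketch below is only an illustration: the function name and all timestamps are hypothetical, and it simply reports how much of the database report window is covered by an OS collection window.

```python
from datetime import datetime

def overlap_fraction(a_start, a_end, b_start, b_end):
    """Return the fraction of interval A that interval B also covers (0.0 to 1.0)."""
    length = (a_end - a_start).total_seconds()
    if length <= 0:
        raise ValueError("interval A must have a positive length")
    overlap = (min(a_end, b_end) - max(a_start, b_start)).total_seconds()
    return max(0.0, overlap) / length

def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string (format chosen for this example)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

# Hypothetical collection windows, invented for illustration.
statspack = (ts("2008-11-07 14:00"), ts("2008-11-07 14:30"))
explorer = (ts("2008-11-07 00:00"), ts("2008-11-07 00:30"))   # midnight snapshot
iostat = (ts("2008-11-07 13:55"), ts("2008-11-07 14:35"))     # brackets the report

print(overlap_fraction(*statspack, *explorer))  # 0.0
print(overlap_fraction(*statspack, *iostat))    # 1.0
```

A result near 0.0 means the OS data says nothing about the interval the database report describes.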

DTrace - Cool, but not the best place to start!
• Treats Oracle as a BLACK box.
• Can identify resource consumers, but can NOT tell if this behavior is correct or not.
• STATSPACK or AWR can provide a DB stats overview.
• Oracle Event Tracing is best for deep drill-down... the “DTrace” of Oracle.

Oracle Performance data
• STATSPACK introduced in 8.1.6
  > Replaced tired bstat/estat
  > Workload profiling with persistent storage of perf data
  > More detailed latch and shared pool data
  > Finds HOT SQL statements to aid in SQL tuning.
• Automatic Workload Repository (AWR) in 10g
  > HTML output!, Remote capabilities, sort by CPU and Elapsed time.
• Trace Wait interface
  > Enhanced in 10g
  > Trace individual processes/sessions via “oradebug”

Overview: Analysis
• Basic techniques
• Environmental (logfiles and configuration)
• STATSPACK / AWR overview
• Oracle Event Tracing (The “DTrace” of Oracle)

Basic techniques
● Start with a well-defined problem
● Look for high-level signs of problems
  – alert.log
  – STATSPACK/AWR: (1st page stats)
    • Load profile
    • Top wait events
    • Hit rates
  – Top SQL CPU consumers in AWR reports

Oracle performance analysis takes years to master... so be patient.

Alert.log analysis
• Startup time and messages.
  > Restart frequency.
  > init.ora hacking shows up as “_underbar_params”
• Errors are reported to the alert.log file.
• Log file switch frequency.

Startup message:
    Tue Aug 30 14:01:22 2005
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Picked latch-free SCN scheme 3
    ....
    SYS auditing is disabled
    Starting up ORACLE RDBMS Version: 10.1.0.2.0
    ......

Log switches every 71 seconds!!
    Mon Nov 28 14:39:26 2005
    Private_strands 3 at log switch
    Beginning log switch checkpoint up to RBA [0x19d.2.10], SCN: 0x0000.00478e91
    Thread 1 advanced to log sequence 413
      Current log# 1 seq# 413 mem# 0: /export/home/oracle/oradata/GLENNF/redo01.log
    Mon Nov 28 14:40:37 2005
    Private_strands 3 at log switch
    Beginning log switch checkpoint up to RBA [0x19e.2.10], SCN: 0x0000.00478ead
    Thread 1 advanced to log sequence 414
      Current log# 2 seq# 414 mem# 0: /export/home/oracle/oradata/GLENNF/redo02.log
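The 71-second figure can be derived mechanically from the alert.log timestamps. The following is a minimal sketch, assuming each "advanced to log sequence" message is preceded by a timestamp line in the format shown on the slide, with English day and month names; the function name is made up for this example.

```python
import re
from datetime import datetime

# Timestamp lines look like "Mon Nov 28 14:39:26 2005" (English names assumed).
STAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")

def log_switch_intervals(lines):
    """Return the seconds between consecutive 'advanced to log sequence' messages."""
    current = None      # most recent timestamp seen
    switches = []       # timestamp attached to each log switch
    for raw in lines:
        line = raw.strip()
        if STAMP.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif "advanced to log sequence" in line and current is not None:
            switches.append(current)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(switches, switches[1:])]

# Excerpt taken from the alert.log shown on the slide.
sample = """\
Mon Nov 28 14:39:26 2005
Private_strands 3 at log switch
Thread 1 advanced to log sequence 413
Mon Nov 28 14:40:37 2005
Private_strands 3 at log switch
Thread 1 advanced to log sequence 414
""".splitlines()

intervals = log_switch_intervals(sample)
print(intervals)  # [71.0]
```

Run over a full alert.log, the same list shows whether frequent switches are constant or clustered around particular times of day.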

AWR / Statspack Analysis 101
• GOAL:
  > Give basic guidance when looking at an AWR or STATSPACK report.
• Answer basic questions like:
  > What is the scope of the data collected?
  > Is this RAC or single instance?
  > How many connections?
  > What is the transaction rate?
  > IO rate? Cache hit rate?
  > How much CPU is being used?
  > What SQL is using the most CPU, IO?

HEADER for Statspack/AWR
• A fair amount of information can be squeezed out of just the header.

Callouts on the example header: RAC cluster; 650+ connections (shadow processes); sample interval.

Scoping issues
• Example #1. Can you find the issue?

WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Release     RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
PROD          4060419904 PROD2               2 10.2.0.3.0  YES thdtoltpr02

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     25738 15-May-08 09:00:12       828      15.6
  End Snap:     25744 15-May-08 13:00:73       832      15.6
   Elapsed:              240.86 (mins)
   DB Time:             2405.07 (mins)

4 hour window?? Notice the gap in Snap IDs? Oracle by default schedules AWR by the hour.

Scoping issues cont...
• Example #2: What's wrong with this sample?

STATSPACK report for

DB Name         DB Id    Instance     Inst Num Release     Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
SWINGBCH       861079668 SWINGBCH            1 9.2.0.6.0   NO      dc1-beta

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:       221 27-Apr-07 02:00:06       14      48.6
  End Snap:       223 27-Apr-07 02:30:07    3,017      34.5
   Elapsed:               30.02 (mins)

30min collection interval is good.
3000 sessions were added in the 30min interval!!
This is an application startup phase.
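Both scoping examples can be screened with a couple of simple checks on the report header. This is a rough sketch only: the function name and both thresholds (60 minutes, 2x session growth) are assumptions chosen for illustration, not rules from the presentation.

```python
def scope_warnings(elapsed_min, begin_sessions, end_sessions,
                   max_elapsed_min=60.0, max_session_growth=2.0):
    """Flag report headers whose scope looks suspicious.

    The default thresholds are illustrative assumptions, not Oracle guidance.
    """
    warnings = []
    if elapsed_min > max_elapsed_min:
        warnings.append(
            f"elapsed {elapsed_min:.2f} min exceeds {max_elapsed_min:.0f} min")
    if begin_sessions > 0 and end_sessions / begin_sessions > max_session_growth:
        warnings.append(f"sessions grew from {begin_sessions} to {end_sessions}")
    return warnings

# Header values from Example #1 (AWR) and Example #2 (STATSPACK).
example_1 = scope_warnings(240.86, 828, 832)
example_2 = scope_warnings(30.02, 14, 3017)
print(example_1)
print(example_2)
```

Example #1 trips the elapsed-time check and Example #2 trips the session-growth check, which matches the issues called out on the two slides.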

Oracle Cache Sizes
• Shows Default Buffer cache, shared pool, recycle, ..
• Caches use IPC shared memory.
  > “ipcs -mb” shows segments from the OS point of view
  > “pmap -xs <orapid>” shows pages and sizes from the OS point of view

Cache Sizes
~~~~~~~~~~~                       Begin        End
                             ---------- ----------
               Buffer Cache:     5,712M     5,712M  Std Block Size:         8K
           Shared Pool Size:     2,864M     2,864M      Log Buffer:    14,376K

Oracle block size: 8K is the safest by far. All development and optimizer work is with 8K.
With DISM, caches can grow and shrink.

Load Profile
• How many transactions/sec?
• IO profile? Query profile?

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:         14,529,454.45            506,509.90
              Logical reads:            154,624.04              5,390.33
              Block changes:             45,862.25              1,598.80
             Physical reads:                196.92                  6.86
            Physical writes:                794.24                 27.69
                 User calls:                148.29                  5.17
                     Parses:                 34.47                  1.20
                Hard parses:                  0.00                  0.00
                      Sorts:                 15.67                  0.55
                     Logons:                  0.29                  0.01
                   Executes:                 98.55                  3.44
               Transactions:                 28.69

  % Blocks changed per Read:   29.66    Recursive Call %:    48.51
 Rollback per transaction %:    0.02       Rows per Sort:  1137.18

Load Profile: Apples and Oranges!
• As “Joe the DBA” might say:
  – “Nothing's changed”
  – “It's the same application”
• Verify it is the same... The truth is in the DATA!
• Key metrics: Logical IO, Physical IO, Transaction profile.

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:         14,529,454.45            506,509.90
              Logical reads:            154,624.04              5,390.33
              Block changes:             45,862.25              1,598.80
             Physical reads:                196.92                  6.86
            Physical writes:                794.24                 27.69
                 User calls:                148.29                  5.17
                     Parses:                 34.47                  1.20
                Hard parses:                  0.00                  0.00
                      Sorts:                 15.67                  0.55
                     Logons:                  0.29                  0.01
                   Executes:                 98.55                  3.44
               Transactions:                 28.69

  % Blocks changed per Read:   29.66    Recursive Call %:    48.51
 Rollback per transaction %:    0.02       Rows per Sort:  1137.18
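A quick way to test the "nothing's changed" claim is to compare per-transaction figures from two reports side by side. The sketch below is only illustrative: the baseline values come from the Load Profile above, while the "current" values and the function name are invented to exercise the comparison.

```python
def profile_changes(baseline, current):
    """Percent change of each metric in `current` relative to `baseline`."""
    changes = {}
    for metric, base in baseline.items():
        if metric in current and base:
            changes[metric] = 100.0 * (current[metric] - base) / base
    return changes

# Per-transaction figures from the Load Profile shown on the slide.
baseline = {"logical_reads": 5390.33, "physical_reads": 6.86, "redo_size": 506509.90}

# Hypothetical "after" profile, invented purely to demonstrate the function.
current = {"logical_reads": 10780.66, "physical_reads": 6.86, "redo_size": 506509.90}

changes = profile_changes(baseline, current)
for metric, pct in changes.items():
    print(f"{metric}: {pct:+.1f}%")
```

Per-transaction values are used here because they stay comparable when the transaction rate differs between the two intervals; a doubling of logical reads per transaction suggests the work per transaction has changed even if the application has not.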

Load Profile... warning signs
• High physical IO rate.
• Hard parses... parses should primarily be soft parses.
• High “Logons/sec”... use persistent connections!

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:          1,282,493.19              2,192.82
              Logical reads:          1,104,645.30              1,888.74
              Block changes:              9,286.08                 15.88
             Physical reads:             48,975.96                  0.01
            Physical writes:             11,983.33                  0.37
                 User calls:                484.33                  0.83
                     Parses:                 79.70                  0.14
                Hard parses:                  0.14                  0.00
                      Sorts:                  6.74                  0.01
                     Logons:                  1.56                  0.00
                   Executes:              4,375.60                  7.48
               Transactions:                584.86

  % Blocks changed per Read:    0.84    Recursive Call %:    97.13
 Rollback per transaction %:    1.30       Rows per Sort:    527.7
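The three warning signs can be turned into a small screening function. This is a sketch under stated assumptions: the thresholds are arbitrary values picked for illustration, the dictionary keys are made up, and the soft parse percentage is computed as 100 * (1 - hard parses / parses).

```python
def load_profile_warnings(per_second, max_physical_reads=10000.0,
                          min_soft_parse_pct=95.0, max_logons=1.0):
    """Return (soft_parse_pct, warnings) for per-second Load Profile figures.

    All thresholds are illustrative assumptions; tune them for the workload.
    """
    warnings = []
    if per_second["physical_reads"] > max_physical_reads:
        warnings.append("high physical read rate")

    parses = per_second["parses"]
    soft_parse_pct = (100.0 * (1.0 - per_second["hard_parses"] / parses)
                      if parses else 100.0)
    if soft_parse_pct < min_soft_parse_pct:
        warnings.append("too many hard parses")

    if per_second["logons"] > max_logons:
        warnings.append("high logon rate")
    return soft_parse_pct, warnings

# Per-second figures from the Load Profile on this slide.
slide = {"physical_reads": 48975.96, "parses": 79.70,
         "hard_parses": 0.14, "logons": 1.56}

soft_pct, found = load_profile_warnings(slide)
print(f"soft parse %: {soft_pct:.2f}")
print(found)
```

With these assumed thresholds the slide's profile is flagged for physical reads and logons, while parsing looks healthy.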

Instance Efficiency Percentages
• Buffer Hit rate
  > Values below 99% are suspect for OLTP.
• Shared Pool “% SQL with exec > 1”
  > Low values mean poor reuse of shared statements
  > SQL without bind variables..

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.99       Redo NoWait %:  100.00
               Buffer Hit %:   98.92    In-memory Sort %:  100.00
              Library Hit %:  100.15        Soft Parse %:   99.87
         Execute to Parse %:   98.15         Latch Hit %:   99.81
Parse CPU to Parse Elapsd %:   93.41     % Non-Parse CPU:   99.89

 Shared Pool Statistics        Begin     End
                              ------  ------
             Memory Usage %:   68.00   68.06
    % SQL with executions>1:   98.40   95.83
  % Memory for SQL w/exec>1:   96.84   94.28

Top 5 Timed Events
• Wait events
  > Shows where “Oracle” connections wait.
  > Bad problems usually show up here first.
  > This is an average of all sessions, so treat it as such.
  > This is a good sample of the TOP 5 events
  > CPU and IO are the top events.

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                          3,641          60.6
db file sequential read             268,976       1,375      5   22.9 User I/O
gc cr grant 2-way                   218,866         384      2    6.4 Cluster
log file sync                        12,625         131     10    2.2 Commit
gc current block 2-way               61,056         130      2    2.2 Cluster
          -------------------------------------------------------------

CPU time in Oracle
• Total amount of CPU seconds during the sample interval.
  > CPU is typically one of the top stats... along with IO.
  > Can calculate CPU utilization!
  > Useful for consolidation since only the CPU time for this instance is considered.

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      7380 07-Nov-08 20:00:43     1,375      72.0
  End Snap:      7382 07-Nov-08 20:30:59     1,361      71.9
   Elapsed:               30.27 (mins)
   DB Time:              395.62 (mins)

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                         14,845          62.5
db file sequential read           1,146,234       8,873      8   37.4 User I/O
db file scattered read               21,784         545     25    2.3 User I/O
read by other session                53,589         244      5    1.0 User I/O

14845/(30.27*60) = 8.17 CPUs busy for “usr” time.
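The arithmetic on this slide generalizes to any report: divide the "CPU time" seconds by the elapsed seconds of the snapshot interval. The sketch below reproduces the slide's figure; the function names are made up, and the CPU count of 16 is a hypothetical value since the slide does not state the size of the server.

```python
def cpus_busy(cpu_time_s, elapsed_min):
    """Average number of CPUs kept busy by this instance over the interval."""
    return cpu_time_s / (elapsed_min * 60.0)

def utilization_pct(cpu_time_s, elapsed_min, cpu_count):
    """Instance CPU utilization as a percentage of the CPUs available."""
    return 100.0 * cpus_busy(cpu_time_s, elapsed_min) / cpu_count

# "CPU time" and "Elapsed" values from the report on the slide.
busy = cpus_busy(14845, 30.27)
print(f"{busy:.2f} CPUs busy")  # 8.17 CPUs busy

# 16 CPUs is a hypothetical server size used only for illustration.
print(f"{utilization_pct(14845, 30.27, 16):.1f}% of 16 CPUs")  # 51.1% of 16 CPUs
```

Because the figure covers only this instance, summing it across instances gives a starting estimate when sizing a consolidated server.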

Drill down on Expensive SQL
• Which SQL is using the most CPU?
  > Allows you to quickly locate expensive SQL statements... but beware, this might not be the problem :)

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     18,641,061        2,645        7,047.7   39.9   369.37    372.72 3894562395
Module: JDBC Thin Client
insert into PLANARRIV (item, source, dest, transmode, needarrivdate, schedarrivdate, needshipdate, schedshipdate, expdate, qty,
firmplansw, seqnum, substqty, departuredate, deliverydate, orderplacedate, sourcing ) values ( :1, :2, :3, :4, :5, :6, :7, :8,
:9, :10, :11, :12, :13, :14, :15, :16, :17 )

      6,867,117          377       18,215.2   14.7    66.63     94.12 1924417985
Module: JDBC Thin Client
SELECT sku.item,sku.loc,item.perishablesw,loc.ohpost,sku.oh,sku.ohpost,sku.replentype,skudemandparam.alloccal,skudemandparam.ccpsw,skudemandparam.custorderdur,skudemandparam.dmdredid,skudemandparam.dmdtodate,skudemandparam.fcstadjrule,skudemandparam.fcstconsumptionrule,skudemandparam.fcstprimconsdur,skudemandparam.fcs

Problem wait events
• “enq”, “buffer busy”, “latch free”... Often a sign of too many connections or application problems.

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
enq: TX - row lock contention       427,949     157,270    367   26.2 Applicatio
CPU time                                        113,999          19.0
gc buffer busy                    3,642,184      95,627     26   16.0 Cluster
gc current block busy             2,264,273      76,874     34   12.8 Cluster
db file scattered read              351,146      30,238     86    5.0 User I/O

Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
gc buffer busy                      556,725     487,263    875   89.7 Cluster
db file sequential read              10,814       9,982    923    1.8 User I/O
enq: HW - contention                  7,313         899    123    0.2 Configurat
CPU time                                            852           0.2
gc current multi block request          901         796    883    0.1 Cluster

                                                                       % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free                                      4,542,675   1,137,914    79.04
log file sync                                     242,359     164,671    11.44
buffer busy waits                                 102,540      61,887     4.30
enqueue                                            35,142      42,498     2.95
CPU time                                                       25,310     1.76

Problem wait events... “log file sync”
• Too many connections lead to scheduling issues.
• Rarely an IO issue... but check log file IO just in case.
• 2ms or less is desirable
• Many bugs... Use 10.2.0.4 (Checksum bug #6814520 in 10.2.0.3)

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                       107,090      82,401    769   29.1 Commit
enq: HW - contention                 78,617      29,060    370   10.3 Configurat
db file sequential read              25,928      24,612    949    8.7 User I/O
gc buffer busy                        7,803       5,906    757    2.1 Cluster

Further down the AWR you see all wait events...
                                                                   Avg
                                             %Time  Total Wait    wait     Waits
Event                                 Waits  -outs    Time (s)    (ms)      /txn
---------------------------- -------------- ------ ----------- ------- ---------
log file sync                       107,090   77.4      82,401     769       4.5
enq: HW - contention                 78,617   73.7      29,060     370       3.3
......
log file sequential read              3,975     .0          86      22       0.2
log file parallel write              27,333     .0          86       3       1.1
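One rough way to check the "rarely an IO issue" point is to compare the average "log file sync" wait seen by sessions with the average "log file parallel write" time of the redo writes themselves. The sketch below uses the figures from the two tables on this slide; the function name is made up and the interpretation is a heuristic, not a rule.

```python
def avg_wait_ms(total_wait_s, waits):
    """Average wait in milliseconds from total wait seconds and wait count."""
    return 1000.0 * total_wait_s / waits

# Figures from the wait event tables on this slide.
commit_ms = avg_wait_ms(82401, 107090)   # log file sync (what sessions wait on)
write_ms = avg_wait_ms(86, 27333)        # log file parallel write (the redo IO)

io_share = write_ms / commit_ms
print(f"log file sync:           {commit_ms:.1f} ms")
print(f"log file parallel write: {write_ms:.1f} ms")
print(f"redo write share of commit wait: {io_share:.1%}")
```

Here the redo write accounts for well under 1% of the commit wait, which points away from storage and toward scheduling or contention, in line with the bullets above.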

IO wait events
• You can get avg wait for IO from the Top 5 events.
  > Oracle's statistic: “db file sequential read”
    – Storage centric view: “Random single block IO”
  > Oracle's statistic: “db file scattered read”
    – Storage centric view: “Sequential IO”... HUH?

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                         17,186          75.3
db file sequential read             744,522       5,874      8   25.7 User I/O
db file scattered read               23,809         459     19    2.0 User I/O
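The rounded "Avg wait (ms)" column can be recomputed with more precision directly from the report text. The sketch below is a minimal parser for Top 5 rows in the 10g layout shown on the slide; the regular expression and function name are assumptions made for this example and would need adjusting for other report versions.

```python
import re

# Matches Top 5 rows that carry a wait count, e.g.
# "db file sequential read   744,522   5,874   8   25.7 User I/O".
# The "CPU time" row has no wait count and is deliberately not matched.
ROW = re.compile(
    r"^(?P<event>.+?)\s+(?P<waits>[\d,]+)\s+(?P<time_s>[\d,]+)"
    r"\s+(?P<avg_ms>\d+)\s+(?P<pct>[\d.]+)\s+(?P<wait_class>.+)$"
)

def io_latency_ms(report_lines):
    """Map each wait event to its average wait in ms, computed as time / waits."""
    result = {}
    for line in report_lines:
        match = ROW.match(line.strip())
        if match:
            waits = int(match.group("waits").replace(",", ""))
            time_s = int(match.group("time_s").replace(",", ""))
            result[match.group("event")] = 1000.0 * time_s / waits
    return result

# Rows copied from the Top 5 Timed Events section on the slide.
top5 = [
    "CPU time                                         17,186          75.3",
    "db file sequential read             744,522       5,874      8   25.7 User I/O",
    "db file scattered read               23,809         459     19    2.0 User I/O",
]

latency = io_latency_ms(top5)
for event, ms in latency.items():
    print(f"{event}: {ms:.1f} ms")
```

The single-block ("sequential read") figure is the one to compare against storage service times for random IO, and the multiblock ("scattered read") figure against sequential IO.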

More IO information...
• Reads by Tablespace, Datafile, SQL statement, ...

Tablespace IO Stats            DB/Inst: ITMSCMP/itscr11p  Snaps: 6785-6788
-> ordered by IOs (Reads + Writes) desc

Tablespace
------------------------------
                    Av     Av      Av                    Av     Buffer Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
DATA_TS
       477,801     180    8.1     2.6       99,555       37      7,016    5.5
INDEX_TS
       186,082      70    8.3     1.0       64,924       24     30,214    0.9

Tablespace               Filename
------------------------ ----------------------------------------------------
                    Av     Av      Av                    Av     Buffer Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
AMG_ALBUM_IDX_TS         /oradata/itmscmp/data2/amg_album_idx_ts01.dbf
           392       0    7.0     1.0            5        0          0    0.0
AMG_ALBUM_TS             /oradata/itmscmp/data3/amg_album_ts01.dbf
         7,604       3    7.4     1.0            5        0          2   10.0

Even More IO information...
• Reads by SQL statement, Database object

SQL ordered by Reads           DB/Inst: TTOPERF1/ttoperf15  Snaps: 5141-5142
-> Total Disk Reads: 318,079
-> Captured SQL account for 133.5% of Total

                               Reads              CPU     Elapsed
Physical Reads  Executions    per Exec   %Total Time (s)  Time (s)    SQL Id
-------------- ----------- ------------- ------ -------- --------- -------------
       811,212           1     811,212.0   51.4   538.11   1013.10 2j2g639a9s4kx
Module: sqlplus@itscontentrepdb05 (TNS V1-V3)
select /*+ parallel(ppc, 2) */ count(distinct p.adam_id) from mz_playlist p, mz_playlist_price_cache ppc where p.first_production_release is not null and p.last_production_release is null and p.playlist_id=ppc.playlist_id and (ppc.start_date is NULL or ppc.start_date <= sysdate) and (ppc.end_date is NULL or ppc.end_da

Segments by Physical Reads     DB/Inst: ITMSCMP/itscr11p  Snaps: 6785-6788
-> Total Physical Reads: 1,577,615
-> Captured Segments account for 81.5% of Total

           Tablespace                      Subobject  Obj.      Physical
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
CONTENT_OW DATA_TS    MZ_PLAYLIST_PRICE_CA            TABLE      723,411   45.85
CONTENT_OW DATA_TS    MZ_PLAYLIST__LS                 TABLE       87,947    5.57
CONTENT_OW DATA_TS    MZ_USER_REVIEW                  TABLE       79,534    5.04
CONTENT_OW DATA_TS    MZ_PRODUCT__LS                  TABLE       52,580    3.33
CONTENT_OW DATA_TS    MZ_PODCAST_EPISODE_2            TABLE       43,243    2.74

“Who needs iostat?”
• IO rate information from the Load Profile
  > physical reads/writes per second
• IO service time(s) from wait events
• IO broken down by Tablespace and Datafile, etc..
• Seriously, who needs it?
• Sorry, you still need “iostat”.
  > Like the CPU wait events, IO events are only from this instance.
  > Times aren't accurate on an over-processed system.
  > iostat gives the system point of view
  > “storage level” analytics are useful as well!
  > They often don't match due to
    > IO configuration and layout
    > Scheduling

Case Study: Oracle Applications BM
• Benchmark for DIT (India IRS :)
• Configuration
  > E20K with 36 USIV @1200MHz
  > Solaris 10 with Oracle 9iR2
• Oracle Statistics
  > STATSPACK
  > Event trace
• Problem Statement:
  > Unable to support more than 2000 users within 2 second average response time. The goal is 4000 users. At 2000 users the system is fully utilized at 100% CPU.

Case Study: STATSPACK Data
• STATSPACK data showed severe latch contention
• Drill down by CPU, IO, etc... didn't show the problem.

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    264,391,557       50,560        5,229.3   44.7  1799.33   3566.29 3184176672
Module: f90runm@sleepy (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE_CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_YR,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SELECT a.SEQ_NO FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO= :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and

     75,312,641      113,269          664.9   12.7  1063.39   1451.77 3785480933
select max(nvl(option$,0)) from sysauth$ where privilege#=:1 connect by grantee#=prior privilege# and privilege#>0 start with (grantee#=:2 or grantee#=1) and privilege#>0 group by privilege#

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free                                     10,597,141   1,425,538    97.52
CPU time                                                       25,842     1.77
row cache lock                                    105,066       4,235      .29
enqueue                                             7,065       2,438      .17
buffer busy waits                                  23,785       2,195      .15

Case Study: Oracle Trace Top-level
• Using Oracle event trace allowed us to narrow our focus and concentrate on the true bottleneck.
• Gathered several *.trc files and used “orasrp” to analyze.
• Drilled down on “latch free” events as shown in profile below...

Case Study: Oracle Trace “latch free”
• Drilling down again on statements which contribute the most to “latch free” shows an interesting pattern with the “dual” table... a well known problem in Oracle 9i.

Case Study: Summary
• OS showed 100% CPU utilization, but no anomalies. DTrace was not helpful here either.
• STATSPACK provided the starting point of the problem.
• Oracle Trace interface and “response-time” profiling pinned down the source of the problem.
• Researched “dual” table problem on-line (metalink)
  > Problem is fixed in 10g
  > Trick / workaround for 9i.
  > Re-coding to avoid is Best!!

Oracle Resources
• http://metalink.oracle.com - Oracle's Metalink
  > Need an account. Check [email protected] archives for latest.
  > Research bugs, tech tips, download patches, ...
• http://technet.oracle.com – Oracle's Technet
  > Documentation, white papers, ...
• http://asktom.oracle.com - Misc questions, mostly DBA but some perf
• http://www.oraperf.com - Analyzer for STATSPACK files!!
• http://oracledba.ru/orasrp/ - Oracle Session profiler.
• http://method-r.com/ - Great papers and insight – Cary Millsap
• http://www.orapub.com - Papers, advice, ...
• Nasty bug for 10.2.0.3: Checksum bug #6814520

More references and resources
• metalink.oracle.com documents on Trace
  > 245981.1 – Trace wait functionality in 10g
  > 21154.1 – Enabling Tracing (session level)
  > 1058210.6 – Enabling Tracing ORADEBUG
  > 39817.1 – Interpreting Raw trace data
• Oracle papers
  > Avoiding Common Oracle Performance Problems
    http://www.sun.com/blueprints/0303/817-1781.pdf
• Sun Blogs
  > Oracle performance on Sun
    http://blogs.sun.com/glennf
  > Tim Cook's Solaris 8,9,10 CPU% blog and “old-new” utility.
    http://blogs.sun.com/timc/entry/how_event_driven_utilization_measurement

Summary
• Identify and define the problem
• Collect and identify Oracle performance data
  > Alert.log
  > STATSPACK
  > Oracle Tracing and analysis
• Know when to say when
  > Use experts to help guide analysis.
  > Avoid Google performance hackers.

Oracle Analysis 101
● [email protected] http://blogs.sun.com/~glennf
  Sr. Staff Engineer, Performance Technologies Group

Questions?????
Extra slides...

Where is the Oracle Data?
• Alert logs & Trace Files
    $ORACLE_HOME/rdbms/log    ## Default
• Optimal Flexible Architecture (OFA) is commonly used to manage multiple instances
  > Places files in a set location to ease administration.
  > User Trace and Alert.log found in:
      “USER_DUMP_DEST” in init.ora overrides the default.
      “BACKGROUND_DUMP_DEST” for server files... including the “alert.log” file.
• Full OFA documentation: http://www.hotsos.com/e-library/abstract.php?id=19

Using STATSPACK
• Install package from $ORACLE_HOME/rdbms/admin
    SQL> connect / as sysdba
    SQL> @?/rdbms/admin/spcreate    ## Usually not necessary
• Take snapshots throughout the day. Often an hourly job.
    SQL> connect perfstat/perfstat
    SQL> exec statspack.snap(i_snap_level=>7);
    ...... run workload ......
    SQL> exec statspack.snap(i_snap_level=>7);
• Run “spreport.sql” and select two snapshots
    SQL> @?/rdbms/admin/spreport    ## Run report
• init.ora “statistics_level=ALL”
  > Necessary to get details about Query plans and Segment statistics.

Using Automatic Workload Repository
• AWR installation is automatic as part of 10g.
• Snapshot
    SQL> connect / as sysdba
    SQL> exec dbms_workload_repository.create_snapshot();
    ...run test....
    SQL> exec dbms_workload_repository.create_snapshot();
• Run “@?/rdbms/admin/awrrpt” and select two snapshots.

Show Query plans
• Further drill down with ?/rdbms/admin/awrrepsql.sql
  > Get full stats and QEP given the hash value of statement

SQL Statistics
~~~~~~~~~~~~~~
-> CPU and Elapsed Time are in seconds (s) for Statement Total and in
   milliseconds (ms) for Per Execute
                                                       % Snap
                     Statement Total      Per Execute   Total
                     ---------------  ---------------  ------
        Buffer Gets:       6,867,117         18,215.2   14.71
         Disk Reads:           3,887             10.3    6.54
     Rows processed:         378,635          1,004.3
     CPU Time(s/ms):              67            176.7
 Elapsed Time(s/ms):              94            249.6
              Sorts:             377              1.0
        Parse Calls:               0               .0
      Invalidations:               0
      Version count:               1
    Sharable Mem(K):             346
         Executions:             377

Show Query plans (cont...)
--------------------------------------------------------------------------------
| Operation                      | PHV/Object Name     |  Rows | Bytes|   Cost |
--------------------------------------------------------------------------------
|SELECT STATEMENT                |----- 1966240984 ----|       |      |  11671 |
|SORT ORDER BY                   |                     |   974 |  253K|  11671 |
| NESTED LOOPS OUTER             |                     |   974 |  253K|  11649 |
|  NESTED LOOPS                  |                     |   960 |  231K|   9729 |
|   NESTED LOOPS                 |                     |   951 |  192K|   7827 |
|    NESTED LOOPS                |                     |   952 |  168K|   5923 |
|     NESTED LOOPS               |                     |   956 |  124K|   4011 |
|      NESTED LOOPS              |                     |   979 |   84K|   2053 |
|       HASH JOIN                |                     |    1K |   54K|     43 |
|        HASH JOIN               |                     |    1K |   46K|     22 |
|         TABLE ACCESS BY INDEX R|PROCESSSKU           |    1K |   27K|     10 |
|          INDEX RANGE SCAN      |PROCESSSKU_BATCH     |    1K |      |      4 |
|         TABLE ACCESS FULL      |LOC                  |    1K |   23K|     11 |
|        TABLE ACCESS FULL       |ITEM                 |   10K |   87K|     20 |
|       TABLE ACCESS BY INDEX ROW|SKU                  |     1 |   32 |      2 |
|        INDEX UNIQUE SCAN       |SKU_PK               |     1 |      |      1 |
|      TABLE ACCESS BY INDEX ROWI|SKUPLANNINGPARAM     |     1 |   45 |      2 |
|       INDEX UNIQUE SCAN        |XPKSKUPLANNINGPARAM  |     1 |      |      1 |
|     TABLE ACCESS BY INDEX ROWID|SKUDEMANDPARAM       |     1 |   48 |      2 |
|      INDEX UNIQUE SCAN         |XPKSKUDEMANDPARAM    |     1 |      |      1 |
|    TABLE ACCESS BY INDEX ROWID |SKUDEPLOYMENTPARAM   |     1 |   26 |      2 |
|     INDEX UNIQUE SCAN          |XPKSKUDEPLOYMENTPARA |     1 |      |      1 |
|   TABLE ACCESS BY INDEX ROWID  |SKUSAFETYSTOCKPARAM  |     1 |   40 |      2 |
|    INDEX UNIQUE SCAN           |XPKSKUSAFETYSTOCKPAR |     1 |      |      1 |
|  TABLE ACCESS BY INDEX ROWID   |SKUPERISHABLEPARAM   |     1 |   20 |      2 |
|   INDEX UNIQUE SCAN            |XPKSKUPERISHABLEPARA |     1 |      |      1 |
--------------------------------------------------------------------------------

Drill down on Object Statistics
• Object statistics...
  > Which objects are doing the most IO?
  > Which objects get the most Buffer Busy Waits?

                                           Subobject  Obj.      Physical
Owner      Tablespace Object Name          Name       Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
STSC       INDX       DFUTOSKUFCST_PK                 INDEX       16,784   28.24
STSC       DATA       DFUTOSKUFCST                    TABLE       14,792   24.89
STSC       DATA       SKUPLANNINGPARAM                TABLE        5,412    9.11
STSC       DATA       SOURCING                        TABLE        3,644    6.13
STSC       DATA       SKUSAFETYSTOCKPARAM             TABLE        2,923    4.92
          -------------------------------------------------------------

                                                                  Buffer
                                           Subobject  Obj.          Busy
Owner      Tablespace Object Name          Name       Type         Waits  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
STSC       DATA       PLANARRIV                       TABLE       95,897   94.22
STSC       INDX       PLANARRIV_PK                    INDEX        3,999    3.93
STSC       DATA       SKU                             TABLE          466     .46
STSC       INDX       XIF4RECSHIP                     INDEX          391     .38
STSC       DATA       RECSHIP                         TABLE          347     .34
          -------------------------------------------------------------

Using the Trace Wait Interface
• Oracle tracing is a lot like truss or DTrace for the database.
  > What is a particular “shadow” process doing? (SQL statements, wait events, ...)
  > Trace produces *.trc file in udump directory. (Post process with HOTSOS profiler or ORASRP)

    SQL> connect / as sysdba
    SQL> oradebug setospid 5544
    SQL> oradebug event 10046 trace name context forever, level 12
    ...wait for a while...
    SQL> oradebug event 10046 trace name context off

==ora_5544.trc file====
EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=677730703911
WAIT #2: nam='db file sequential read' ela= 5954 p1=1 p2=15356 p3=1
WAIT #2: nam='db file sequential read' ela= 7235 p1=1 p2=14168 p3=1
FETCH #2:c=10000,e=13869,p=2,cr=3,cu=0,mis=0,r=1,dep=1,og=4,tim=677730717849
STAT #1 id=1 cnt=0 pid=0 pos=1 obj=6251 op='TABLE ACCESS FULL SQLPLUS_PRODUCT_PROFILE (cr=3 r=0 w=0 time=121 us)'
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID OBJ#(18) (cr=3 r=2 w=0 time=13829 us)'
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX UNIQUE SCAN OBJ#(36) (cr=2 r=1 w=0 time=6350 us)'
WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 256 p1=1650815232 p2=1 p3=0

Response Time Profiling Trace Data
• Collect *.trc file as previously shown via oradebug or ???
• Analyze files with
  > HOTSOS / Method-R profiler
  > “orasrp” freeware which gives a similar profile to HOTSOS.
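To give a feel for what such a profiler does, here is a deliberately tiny sketch that totals the `ela=` values of WAIT lines per event. It is not how HOTSOS or orasrp are implemented; it ignores CPU time, recursive depth, and cursor attribution, and it assumes `ela` is reported in microseconds, as in the trace excerpt shown earlier in this deck.

```python
import re
from collections import defaultdict

# Matches lines such as:
# WAIT #2: nam='db file sequential read' ela= 5954 p1=1 p2=15356 p3=1
WAIT = re.compile(r"^WAIT #\d+: nam='(?P<event>[^']+)' ela= *(?P<ela>\d+)")

def wait_profile(trace_lines):
    """Total elapsed wait per event, largest first (units as found in the trace)."""
    totals = defaultdict(int)
    for line in trace_lines:
        match = WAIT.match(line)
        if match:
            totals[match.group("event")] += int(match.group("ela"))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Lines copied from the ora_5544.trc excerpt in this deck.
trace = [
    "EXEC #2:c=0,e=324,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=677730703911",
    "WAIT #2: nam='db file sequential read' ela= 5954 p1=1 p2=15356 p3=1",
    "WAIT #2: nam='db file sequential read' ela= 7235 p1=1 p2=14168 p3=1",
    "WAIT #1: nam='SQL*Net message to client' ela= 3 p1=1650815232 p2=1 p3=0",
    "WAIT #1: nam='SQL*Net message from client' ela= 256 p1=1650815232 p2=1 p3=0",
]

profile = wait_profile(trace)
for event, ela in profile:
    print(f"{ela:>8}  {event}")
```

Even this crude total shows the idea behind response-time profiling: rank where the session spent its time, then drill into the largest contributor.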