553: oracle database performance: are database users telling me the truth?

Oracle Database Performance: Are Database Users Telling Me The Truth? Session # 553 REMINDER Check in on the COLLABORATE mobile app Prepared by: Alfredo Krieg Database Administrator The Sherwin Williams Company

Upload: alfredokrieg

Post on 16-Jul-2015


TRANSCRIPT

Oracle Database Performance: Are Database Users Telling Me The Truth?

Session # 553

REMINDER

Check in on the

COLLABORATE mobile app

Prepared by:

Alfredo Krieg

Database Administrator

The Sherwin Williams Company

Agenda

■ Introduction

■ Performance Challenge

■ Case Study

■ The Method

– Set Goals

– Measure

– Graph

– Compare

– Analyze

■ Anticipate

■ Tools

■ Summary

Who am I?

■ Alfredo Krieg ([email protected])

■ Senior Oracle Cloud Administrator at The Sherwin Williams Company, based in Cleveland, OH

■ OEM Cloud Control 12c and Database Performance Tuning

■ Oracle Technologies since 2004 & 11g Certified

■ Blog bitkode.blogspot.com

Performance Challenge

■ Difficult without a performance tuning method

■ Multiple metrics, ratios and performance data to look into

■ DBAs sometimes rely on what users experience and communicate

■ The wrong snapshots get analyzed, not the ones from when the problem really happened

■ Are DB users telling me the truth?

What to do?

Case Study

■ Friday, 10 am: the DBA receives a call from a database user stating that the production DB is slow. The DBA questions the user in order to better understand the situation:

▪ When did the performance issue start?

▪ Has this happened before?

▪ Is it just your session or are other users affected as well?

■ The database user responds using his/her best knowledge of the situation:

▪ When did it start? I arrived today at 9 am and it was already slow!

▪ Has this happened before? Yes, it happened last Friday and the Friday before that too!

▪ Is it just your session or are other users affected as well? Seems like the entire department is being affected!

Case Study

■ The DBA trusts the user and starts comparing previous AWR/Statspack snapshots against the one taken between 9:30 am and 10 am. The DBA gathers a couple of reports and looks at the ratios, CPU utilization and some other performance metrics, finding not much change in the values, when suddenly the user states that performance has returned to normal and the call ends. “DBA Magic”

■ What happened?

▪ There was no method

▪ Bottleneck was not identified

▪ Root cause was not identified

▪ No permanent fix applied

Case Study

■ Key data communicated by the user

▪ At 9 am the problem was present

▪ Last 2 Fridays as well

▪ Entire department impacted

■ Do you trust the user? We need to look into the DB system to see who’s right!

The Method

Set Goals

■ Must establish the stop tuning criteria

■ Define the scope of tuning

■ CTD (Compulsive Tuning Disorder)

“Find out if the DB is slow/How slow? (Find the Response Time)”

“Identify the period of time and the bottleneck of the performance issue reported by the DB user”

“Identify if this issue has any relationship with the issue of the last 2 Fridays”

The Method

Measure

■ Oracle’s Kernel Instrumentation (Wait Interface)

■ System views v$sysstat, v$system_event & v$sys_time_model (session level views can be used to troubleshoot performance problems at session level)

▪ TIMED_STATISTICS=TRUE (modifiable)

■ Calculate DB Response Time for a period of time

▪ Service Time ms (Total DB CPU consumption)

▪ Queue Time ms (Total Non Idle wait time)

▪ User calls (Unit of Work)

▪ Snapshot’s time frame (e.g. 15 min, 30 min, 60 min, etc.)

The Method

Measure

■ Queuing Theory (R = S + W)

RT (ms/uc) = (Service Time + Queue Time) / User Calls

■ As per Oracle Documentation

▪ Response time refers to the amount of time Enterprise Server takes to return the results of a request to the user. The response time is affected by factors such as network bandwidth, number of users, number and type of requests submitted, and average think time.
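The response time formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the input numbers are made up, and the inputs are assumed to be deltas for one snapshot interval, already converted to milliseconds.

```python
def response_time_ms_per_uc(db_cpu_ms, non_idle_wait_ms, user_calls):
    """Return R = S + W per user call, in milliseconds.

    S (service time) is DB CPU per user call.
    W (queue time) is non-idle wait time per user call.
    """
    if user_calls <= 0:
        raise ValueError("user_calls must be positive")
    service_time = db_cpu_ms / user_calls
    queue_time = non_idle_wait_ms / user_calls
    return service_time + queue_time


# Made-up deltas for a single snapshot interval (not real measurements).
rt = response_time_ms_per_uc(db_cpu_ms=600_000,
                             non_idle_wait_ms=1_400_000,
                             user_calls=1_000)
print(rt)  # 2000.0 ms per user call
```

Splitting the result into service time and queue time is useful later on: a spike driven by queue time points at waits rather than CPU.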

The Method

Measure

■ Choose a Unit of Work

▪ User Calls (overall activity)

▪ Physical Reads/Writes (I/O bound)

▪ Logical Reads/Writes (CPU bound)

■ User Calls as per Oracle Documentation

▪ “User Calls” represents the number of logins, parses, or execute calls during the sample period and is an overall activity level monitor.

http://bitkode.blogspot.com/2013/08/testing-user-calls.html

http://shallahamer-orapub.blogspot.com/2010/05/understanding-user-calls.html

The Method

Measure

■ Take two snapshots from v$sysstat, v$system_event & v$sys_time_model and extract “Delta” values (Data Capture)

▪ Make use of AWR or Statspack (AWR requires a license); verify retention!

▪ An AWR extraction SQL script can be obtained from

http://bitkode.blogspot.com/2014/02/extract-awr-data-to-build-response-time.html

■ The smaller the time between snapshots, the more accurate the measure (but the higher the overhead?)

■ Calculate the Response Time of every delta and import the data into a spreadsheet
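The delta step can be sketched as follows. This is a minimal illustration, not the script from the blog post above: the `Snapshot` class, its field names and the sample values are hypothetical, and each snapshot is assumed to hold cumulative counters already converted to milliseconds.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    snap_time: str            # label for the end of the interval
    db_cpu_ms: float          # cumulative DB CPU
    non_idle_wait_ms: float   # cumulative non-idle wait time
    user_calls: int           # cumulative user calls


def delta_response_times(snaps):
    """Return (snap_time, rt_ms_per_uc) for each pair of consecutive snapshots."""
    rows = []
    for prev, cur in zip(snaps, snaps[1:]):
        d_cpu = cur.db_cpu_ms - prev.db_cpu_ms
        d_wait = cur.non_idle_wait_ms - prev.non_idle_wait_ms
        d_uc = cur.user_calls - prev.user_calls
        # Skip intervals with no activity or with counters that went
        # backwards (for example after an instance restart).
        if d_uc <= 0 or d_cpu < 0 or d_wait < 0:
            continue
        rows.append((cur.snap_time, (d_cpu + d_wait) / d_uc))
    return rows


# Made-up cumulative values for three snapshots.
snaps = [
    Snapshot("09:00", 1_000, 3_000, 100),
    Snapshot("09:30", 1_500, 4_500, 200),
    Snapshot("10:00", 2_000, 9_500, 300),
]
print(delta_response_times(snaps))
```

Each returned row corresponds to one line of the spreadsheet: the snapshot time and the response time per user call for that interval.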

The Method

Measure

■ Calculate the Mode/Average Response Time of your system

■ The Mode is the value that appears most often in a set of data

■ The Average can make you lose the clear picture of the DB response time (skewed data). The Median is good as well!

■ See the difference between average, mode & median!
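A small sketch of why the average can mislead on skewed data. The sample values are invented, and rounding to 100 ms bins before taking the mode is an assumption made here so that near-identical response times count as the same value.

```python
import statistics


def binned_mode(values, bin_ms=100):
    """Mode of the values after rounding each one to the nearest bin_ms."""
    return statistics.mode(round(v / bin_ms) * bin_ms for v in values)


# Made-up response times (ms per user call) with one large spike.
sample = [2200, 2200, 2300, 2200, 2100, 2200, 41000, 2300]

print("average:", statistics.mean(sample))
print("median: ", statistics.median(sample))
print("mode:   ", binned_mode(sample))
```

With this sample the single spike pulls the average to roughly three times the typical value, while the median and the mode both stay at 2200 ms.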

The Method

Graph

■ Plot column SNAP_TIME against NON_IDLE_WAIT, DB_CPU, UC & RT_MS_PER_UC

■ Great Visuals of Database Performance

▪ But still unclear!

The Method

Compare

■ Slice and dice the data to compare only the last 3 Fridays

■ Remember that RT’s Mode was 2220 ms/uc

The Method

Compare

■ Identify the time of the performance problem

■ Spikes in Response Time & Non Idle Wait Time

■ Is it slow? How slow? 2.2 sec (mode) | > 40 sec (spikes)
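The comparison step can be sketched as a filter over the response time rows. The 2220 ms/uc baseline is the mode quoted on the slide; the function name, the factor of 5 and the sample rows are assumptions chosen only for illustration.

```python
from datetime import datetime

FRIDAY = 4  # datetime.weekday(): Monday is 0, Friday is 4


def friday_spikes(rows, baseline_ms, factor=5.0):
    """Keep Friday rows whose response time exceeds factor x baseline.

    rows is an iterable of (snap_time, rt_ms_per_uc) tuples.
    """
    threshold = factor * baseline_ms
    return [(t, rt) for t, rt in rows
            if t.weekday() == FRIDAY and rt > threshold]


# Made-up rows: two Fridays and one Thursday.
rows = [
    (datetime(2015, 4, 3, 7, 0), 45_000.0),   # Friday, spike
    (datetime(2015, 4, 9, 7, 0), 50_000.0),   # Thursday, ignored
    (datetime(2015, 4, 10, 8, 0), 2_300.0),   # Friday, normal
    (datetime(2015, 4, 10, 9, 0), 41_000.0),  # Friday, spike
]

for snap_time, rt in friday_spikes(rows, baseline_ms=2220):
    print(snap_time, rt)
```

The surviving rows mark the snapshot intervals worth opening in AWR/Statspack, rather than the interval the user happened to mention.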

The Method

Analyze

■ Focus on those 3 periods of time

▪ 5:30 am to 9:00 am

▪ 5:30 am to 9:00 am

▪ 7:00 am to 10:30 am

■ What changed when the performance “got bad”?

■ Use AWR/Statspack reports to find the bottleneck

The Method

Analyze

■ Find configuration problems, bad SQL, OS problems

■ Sessions asking for the same buffers through multiple instances

The Method

Analyze

■ What changed?

The Method

Who was correct?

■ Both were right! But always get the whole picture of the issue.

■ Question your system!

The Method

Possible Solutions

■ Change application design

■ Increase network interconnect speed

■ Set up a DB service

■ All solutions tied to company’s needs!

Anticipate

■ AWR & Statspack Not Real Time Data!

■ System Views v$sysstat, v$system_event & v$sys_time_model Close to Real Time

▪ Measure Response Time (Shrink time between Snapshots)

▪ Compare Against RT’s Mode/Median/Average

▪ Alert if Threshold Reached (Threshold not easy to define!)
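One way to sketch the alerting idea is shown below. It is only an illustration: the class name, the window size and the factor of 3 over the rolling median are assumptions, not recommended values, which is exactly the "threshold not easy to define" problem noted above.

```python
from collections import deque
from statistics import median


class ResponseTimeMonitor:
    """Flag response time samples that exceed factor x the rolling median."""

    def __init__(self, window=12, factor=3.0, min_samples=3):
        self.history = deque(maxlen=window)  # recent non-alerting samples
        self.factor = factor
        self.min_samples = min_samples

    def observe(self, rt_ms_per_uc):
        """Return True if this sample should raise an alert."""
        alert = (len(self.history) >= self.min_samples
                 and rt_ms_per_uc > self.factor * median(self.history))
        if not alert:
            # Spikes are kept out of the baseline so they do not inflate it.
            self.history.append(rt_ms_per_uc)
        return alert


# Made-up response times (ms per user call) from frequent snapshots.
monitor = ResponseTimeMonitor()
for rt in [2200, 2300, 2100, 2250, 40000, 2200]:
    print(rt, monitor.observe(rt))
```

Feeding the monitor from short-interval deltas of the v$ views keeps the check close to real time, instead of waiting for the next AWR or Statspack snapshot.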

Tools

■ Oracle Enterprise Manager

▪ Out of the box Metrics (RT per TXN)

▪ Adaptive Thresholds (fixed value, percent of maximum and significance level)

▪ Metric Extensions

■ 3rd Party Performance Tuning SW (cost)

Summary

■ Follow a method

■ Ask your DB system, not just your users. Know your system, but also ask your users!

■ Users experience different Response Times than DB system

■ Don’t waste your time by analyzing useless snapshots

■ Time is money!

■ If available, make use of provided tools

Want more?

■ Shallahamer, Craig

▪ Book: “Oracle Performance Firefighting” (ISBN 978-0-9841023-0-3)

▪ http://shallahamer-orapub.blogspot.mx/2010/11/average-challenge-part-1.html

■ Sztrik, Dr. János

▪ Basic Queuing Theory

http://irh.inf.unideb.hu/~jsztrik/education/16/SOR_Main_Angol.pdf

■ Millsap, Cary

▪ Thinking clearly about performance

▪ http://method-r.com/papers/file/44-thinking-clearly-about-performance

http://method-r.com/papers/doc_download/44-thinking-clearly-about-performance-cary-millsap

Thank You!

Special thanks to:

My manager & supervisor at The Sherwin Williams Company

Friends and colleagues who helped with suggestions

Friend and mentor from OR for all the support and suggestions

My family, for understanding the extra time spent working

Please complete the session evaluation on the mobile app

We appreciate your feedback and insight
