Copyright © 2006 Quest Software RAC be Nimble, RAC be Quick Bert Scalzo, Domain Expert, Oracle Solutions [email protected]

Upload: simon-hopkins

Post on 16-Dec-2015


Copyright © 2006 Quest Software

RAC be Nimble, RAC be Quick

Bert Scalzo, Domain Expert, Oracle Solutions

[email protected]

2

About the Author …

Domain Expert & Product Architect for Quest Software

Oracle Background:
• Worked with Oracle databases for well over two decades (starting with version 4)
• Work history includes time at both Oracle Education and Oracle Consulting

Academic Background:
• Several Oracle Masters certifications
• BS, MS and PhD in Computer Science
• MBA (general business)
• Several insurance industry designations

Key Interests:
• Data Modeling
• Database Benchmarking
• Database Tuning & Optimization
• "Star Schema" Data Warehouses
• Oracle on Linux – and specifically: RAC on Linux

Articles for:
• Oracle’s Technology Network (OTN)
• Oracle Magazine
• Oracle Informant
• PC Week (eWeek)
• Dell Power Solutions Magazine
• The Linux Journal
• www.linux.com
• www.orafaq.com

3

Books by Author … Potentially in 2008 …

4

Agenda

• RAC Performance
– “RAC in the Box” syndrome
– Must Optimize every subsystem

• RAC Optimization Approach
– RAC Optimization Methodology
– via Quest’s RAC Tools (just an example, not a sales pitch)

• Real-world Scenario
– Dell’s use of Quest’s Solution for Oracle RAC
– “Best Practices” applied incrementally & results

5

Optimization Focus

• “Low Hanging Fruit”

• Obvious yet Overlooked

• Subtle yet Highly Critical

Radically different from other RAC tuning sessions:

Not going to delve into obscure hardware, OS, network, Oracle and RAC tuning parameters or configurations.

Just easy stuff that makes a big difference.

6

Oracle RAC is Great, but …

• Too often people expect RAC to “auto-magically” function out of the box with little to no optimization

• During RAC optimization attempts, people far too often concentrate on just a single dimension – the Oracle database “stuff” (i.e. waits, parameters, etc)

• During RAC optimization attempts, Oracle is often too heavily weighted as the source and reason for most, if not all, of the performance bottlenecks

• Not enough “application nature” is identified and accounted for during the optimization process

• Result: Too many people achieve sub-par results!

7

I call it “RAC in the BOX” syndrome

• That’s just my stupid name for it (hope it catches on)

• But nearly half the RAC sites I visit are suffering from RAC performance issues related to this!

• Still far too few RAC experts among the general DBA population (although improving every day)

• Really no simple, single button tools yet to make RAC “auto-magically” “fire on all cylinders”

• Many people bail on RAC far too soon and simply fall-back to big SMP boxes (the evil that they know)

• But with just a little manual “box winding”, anyone should be able to “pop the RAC weasel” free

8

RAC is a System, Must Tune it in its Entirety

9

RAC Performance = Sum of its Parts

• Application Nature (affects everything else below)

• Public Network

• Storage Network

• Storage Sub-System

• Oracle Instance Configuration

• Oracle Cluster Configuration

• Private Network (i.e. Interconnect)

10

RAC Performance Testing Methodology

1. Benchmark Factory: www.TPC.org tests or Oracle trace file playback generator

2. Spotlight on RAC: Monitor RAC system for performance bottlenecks

3. TOAD with DBA Module: “AWR/ADDM Reports”, identify and correct top five wait events

11

Quest Solutions for Oracle RAC

Benchmark Factory®

• Test Oracle RAC environments rapidly and reliably

• Perform database “scalability” or “goal” testing to determine the most optimal RAC configuration

• Tests:

– TPC-C

– TPC-H

– Trace File playback

– etc, etc, etc …

• Lets DBA concentrate on task at hand - Optimization

12

13

Quest Solutions for Oracle RAC

Spotlight® on RAC

• Monitor Oracle RAC environments rapidly and reliably

• Diagnose Oracle RAC environment health levels at
– Node
– Cluster
– ASM
– Instance
– Interconnect

• Intelligent performance alerts plus market-leading GUI covering architecture & bottlenecks from the entire RAC down to its instances

• Lets DBA concentrate on task at hand - Diagnosing

14

15

16

17

Quest Solutions for Oracle RAC

TOAD® with DBA Module

• Expedite typical DBA management & tuning tasks

• Great Productivity Enhancing Features
– Database Health Check
– Database Probe
– Database Monitor
– AWR/ADDM Reports
– UNIX Monitor
– Stats Pack Reports (coming soon)

• See Toad World paper
– Title: “Maximize Database Performance Via Toad for Oracle”
– http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx

• Lets DBA concentrate on task at hand – Correcting (i.e. Fixing)

18

19

Real-world Scenario

Quest strategic partner & customer Dell uses Quest’s solution for Oracle RAC to test the performance of the Oracle RAC architecture running on Dell PowerEdge servers and EMC CLARiiON SAN & iSCSI Disk Arrays

20

DELL Success Story

www.Quest.com/success_stories/Dell-Quest.pdf

21

Database Configuration used at DELL for the RAC test environment

DELL Success Story

22

DELL Success Story

Near Linear Scalability!

23

Step-By-Step Example

Apply Methodology, “Best Practices”, and Quest’s RAC tools to optimize

and quantify the approximate percentage of the improvements

Note – will quote some specific examples for a given RAC setup; your mileage will surely vary

Test = TPC-C (OLTP) for 200-2000 users, 10 GB

24

Step 1 - Application Nature

Know Your Application Demands (this info flows downstream)

• OLTP vs. Data Warehousing
– Primarily Read vs. Write
– Average Transaction Size
– Likelihood of “Deadlock”
– Logging, Flashback and Recovery Requirements
– etc, etc, etc …

• Concurrent User Load Profile (i.e. user load over time)

• Focus on User Response Time Requirements
– For example, TPC-C must run each transaction in <= 2 seconds
– Response Time ~= Wait Events
   • Cary Millsap (Hotsos) calls this “Method R”
   • Anjo Kolk et al. (Oracle) call this “YAPP Method”
   • Kyle Hailey (PerfVision) paper “Waits Defined”
   • etc, etc, etc …

• Don’t skip this step – the cost can be enormous, and no network, OS, or database tuning can compensate for it!
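A response-time requirement like the one above can be checked mechanically once per-transaction timings are available. The following is a minimal sketch in Python, assuming the timings have already been collected as a list of seconds. The helper name `meets_response_target` and the sample values are illustrative only and do not come from the benchmark.

```python
from statistics import mean

# Hypothetical helper: checks a batch of per-transaction response times
# (in seconds) against a limit such as the 2-second TPC-C figure.
def meets_response_target(times_sec, limit_sec=2.0):
    """Return (all_within_limit, average, worst) for the given timings."""
    if not times_sec:
        raise ValueError("no timings supplied")
    worst = max(times_sec)
    return worst <= limit_sec, mean(times_sec), worst

# Illustrative timings only, not measured results.
sample = [0.4, 0.9, 1.7, 0.6]
ok, avg, worst = meets_response_target(sample)
print(f"within limit: {ok}, average: {avg:.2f}s, worst: {worst:.2f}s")
```

Tracking the worst case alongside the average matters here because an acceptable average can hide individual transactions that break the limit.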

25

Application “Best Practices”

Well known rules:

• Write efficient SQL and/or PL/SQL code (explain plans)

• Use bind variables to reduce unnecessary “re-parsing”

• Often the underlying Application Code (e.g. Benchmark) is NOT changeable, so there is nothing you can do here

Deeper TPC-C Analysis (remember across all next steps):

• Primarily Read with Some Writes

• Small Average Transaction Size

• High Concurrency with Potential Deadlocks

• Logging for ACID compliance and no flashback

26

Public Network “Best Practices”

Well known rules:

• Isolate Network (for single or related applications only)

• Use Gigabit Ethernet (consider “bonding” multiple cards)

• Use Layer 2 or 3 Switches and verify Gigabit throughput

TPC-C Analysis Ramifications:

• Primarily Reads = Nothing

• Small Transaction = No jumbo frames, Standard SDU/TDU

• High Concurrency = Multiple Ethernet Segments (collisions)

• No Logging, etc… = Nothing

27

Storage Network “Best Practices”

Well known rules:

• Isolate Network (for single or related applications only)

• Use Fibre Channel for SAN, 10 Gigabit Ethernet for NAS/iSCSI

• Consider multiple pathways per storage controller and HBA

• Consider TCP/IP offload engine (TOE) NIC’s or iSCSI HBA’s

TPC-C Analysis Ramifications:

• Primarily Reads = Nothing

• Small Transaction = Jumbo frames since “Block Level” IO

• High Concurrency = Fibre Channel and Multiple Pathways (if budget)

• No Logging, etc… = Nothing

28

Storage Sub-System “Best Practices”

Well known rules:

• More Smaller Disks generally higher overall throughput

• More memory cache generally higher overall throughput

• Avoid “write-back” mode if no backup power source (e.g. battery)

• Align Stripe Boundaries: drive, OS block, LVM, file sys, database block, etc

• Stripe Depth (i.e. size) from 256 KB to 1 MB

• Stripe Width (i.e. # disks) between 4 and 16

• Stripe Depth = Stripe Width X Drive IO Size = One IO per Disk per IO request

• Average I/O <= Stripe Width X Stripe Depth

• Write-intensive = RAID 0+1/1+0 and Read-intensive = RAID 3 (sequential) or 5 (scattered)

TPC-C Analysis Ramifications:

• Primarily Reads = RAID 5, Adjust cache memory allocations & look-ahead algorithms

• Small Transaction = Stripe Depth >= db_block_size X db_file_multiblock_read_count

• High Concurrency = Low Deadlock, so spread DB objects and partitions across LUN’s

• No Logging, etc… = Logging but no flashback, so write IO is reasonable, so RAID 5 OK
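Two of the stripe rules above are plain arithmetic and easy to sanity-check before touching the array. The sketch below works them through in Python: the smallest stripe depth that holds one multiblock read, and the largest I/O a given stripe can absorb. The function names and input values are illustrative assumptions, and all sizes are in KB.

```python
# Sketch of the stripe arithmetic from the storage rules.
# Function names are illustrative; all sizes are in KB.

def min_stripe_depth_kb(db_block_size_kb, multiblock_read_count):
    """Stripe Depth >= db_block_size X db_file_multiblock_read_count."""
    return db_block_size_kb * multiblock_read_count

def max_average_io_kb(stripe_width_disks, stripe_depth_kb):
    """Average I/O <= Stripe Width X Stripe Depth."""
    return stripe_width_disks * stripe_depth_kb

# Example inputs: 8 KB blocks with a multiblock read count of 16.
depth = min_stripe_depth_kb(8, 16)
print(f"minimum stripe depth: {depth} KB")  # 128 KB

# Example inputs: 4 disks at a 256 KB stripe depth.
limit = max_average_io_kb(4, 256)
print(f"largest average I/O the stripe absorbs: {limit} KB")  # 1024 KB
```

With these inputs, even the low end of the suggested 256 KB to 1 MB stripe depth range already satisfies the multiblock-read rule.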

29

Oracle Instance “Best Practices”

Well known rules:

• Size & Tune the SGA appropriately for application nature

• Choose reasonable block size based on application nature (?8K?)

• Partition large objects & indexes across storage devices & spindles

• Don’t assume any “golden rules” – i.e. test all assumptions!

TPC-C Analysis Ramifications:

• Primarily Reads = optimizer_index_caching=80, optimizer_index_cost_adj=20

• Small Transaction = Size redo logs correctly for small size X high load

• High Concurrency = cursor_space_for_time=true, cursor_sharing=similar

• No Logging, etc… = Turn off “Recycle Bin” but keep “LOGGING”

30

Oracle Cluster “Best Practices”

Well known rules:

• Increase default SGA size for all ASM instances (64M too small)

• Interconnect is the most important bottleneck – many bonded NIC’s

• Consider hash partitions & reverse key indexes to spread IO across nodes

• Don’t assume any “golden rules” – i.e. really test all assumptions!!!

TPC-C Analysis Ramifications:

• Primarily Reads = Nothing

• Small Transaction = Decrease db_file_multiblock_read_count

• High Concurrency = Decrease block size (?4K?) – see next slide

• No Logging, etc… = Nothing
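One common rationale for the smaller block size under high concurrency is that fewer rows share each block, so two nodes are less likely to need the same block at the same time. The Python sketch below only illustrates that arithmetic. The 200-byte row length is a made-up figure, and block header overhead and PCTFREE are ignored.

```python
# Rough illustration: fewer rows per block means fewer rows that two
# nodes can collide on when both need the same block.
# Assumptions (not from the slides): fixed 200-byte rows, with block
# overhead and PCTFREE ignored.

ROW_SIZE_BYTES = 200  # hypothetical average row length

def rows_per_block(block_size_bytes, row_size_bytes=ROW_SIZE_BYTES):
    """Whole rows that fit in one block under the assumptions above."""
    return block_size_bytes // row_size_bytes

for block_kb in (8, 4):
    rows = rows_per_block(block_kb * 1024)
    print(f"{block_kb}K block: about {rows} rows share one block")
```

Halving the block size roughly halves the number of rows competing for each block, at the price of more blocks to manage, which is why the slides insist on testing rather than assuming.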

31

Single Instance – No Block Contention

32

Cluster – Block Contention Costs

33

Private Network “Best Practices”

Well known rules:

• Isolate Network (for single cluster only – and 0% public)

• Use 10 Gigabit Ethernet or InfiniBand (Dell found 15% RTI)

• Consider multiple pathways per HBA and storage controller

• Jumbo frames since high “Block Level” IO between nodes

TPC-C Analysis Ramifications:

• Primarily Reads = Nothing

• Small Transaction = Nothing

• High Concurrency = Lower block size until interconnect traffic OK

– Consider increasing the OS priority of the global cache cluster services

• No Logging, etc… = Nothing
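The jumbo frame and block size advice for the interconnect can be made concrete by counting how many Ethernet frames one database block needs. The Python sketch below is an approximation under stated assumptions: block transfers over UDP/IP, 28 bytes of IP plus UDP header charged to every frame, and 9000 bytes as the jumbo MTU. None of these figures come from the slides.

```python
import math

# Approximate number of Ethernet frames needed to ship one database
# block across the interconnect.
# Assumptions (not from the slides): UDP over IP, 28 header bytes charged
# to every frame (slightly overstates overhead on fragments), and a
# jumbo MTU of 9000 bytes.
HEADER_BYTES = 28

def frames_per_block(block_bytes, mtu_bytes):
    """Frames required to carry one block at the given MTU."""
    payload = mtu_bytes - HEADER_BYTES
    return math.ceil(block_bytes / payload)

for mtu in (1500, 9000):
    for block_kb in (8, 4):
        count = frames_per_block(block_kb * 1024, mtu)
        print(f"MTU {mtu}, {block_kb}K block: {count} frame(s)")
```

Under these assumptions a standard MTU splits every block into several frames, while a jumbo MTU carries an 8K or 4K block in one.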

34

Let’s Apply the Recommendations

Test    Read Count  Block Size (KB)  Cursor Space  Cursor Share  Index Cache  Index Cost  Jumbo
Test 1  16          8                False         Exact         0            100         False
Test 2  2           8                False         Exact         0            100         False
Test 3  2           4                False         Exact         0            100         False
Test 4  2           4                True          True          80           20          False
Test 5  2           4                True          True          80           20          True
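The five tests above change only a few settings at a time, which is what lets each improvement be attributed to a specific recommendation. The Python sketch below encodes that incremental structure. Mapping the table headings to full Oracle parameter names is an assumption made for illustration, `jumbo_frames` is a network setting rather than an Oracle parameter, and the values are copied as they appear in the table.

```python
# Test 1 baseline, using assumed parameter names for the table headings.
BASE = {
    "db_file_multiblock_read_count": 16,
    "db_block_size_kb": 8,
    "cursor_space_for_time": False,
    "cursor_sharing": "Exact",
    "optimizer_index_caching": 0,
    "optimizer_index_cost_adj": 100,
    "jumbo_frames": False,
}

# What each test changes relative to the test before it.
CHANGES = [
    {},                                      # Test 1: baseline
    {"db_file_multiblock_read_count": 2},    # Test 2
    {"db_block_size_kb": 4},                 # Test 3
    {"cursor_space_for_time": True,          # Test 4
     "cursor_sharing": "True",               # value as shown in the table
     "optimizer_index_caching": 80,
     "optimizer_index_cost_adj": 20},
    {"jumbo_frames": True},                  # Test 5
]

def build_tests(base, changes):
    """Apply each set of changes on top of the previous configuration."""
    tests, current = [], dict(base)
    for delta in changes:
        current = {**current, **delta}
        tests.append(current)
    return tests

tests = build_tests(BASE, CHANGES)
for number, delta in enumerate(CHANGES, start=1):
    changed = ", ".join(sorted(delta)) or "nothing (baseline)"
    print(f"Test {number}: changed {changed}")
```

Keeping the deltas separate from the baseline makes it obvious which run to compare against when a result moves.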

35

Results – TPS (lesser interest here)

[Chart: Transactions / Second (y-axis, 0.00 to 35.00) plotted against an x-axis running from 50 to 500, one series each for Run 1 to Run 5]

36

Results – Average Response Time

[Chart: Average Response Time (y-axis, 0.00 to 6.00) plotted against an x-axis running from 50 to 500, one series each for Run 1 to Run 5, with a “SubSecond” annotation]

37

Thank you

Please offer any questions or comments

Remember:

• Eliminate “RAC in the Box” syndrome – eat low hanging fruit

• Example was for TPC-C or OLTP type application

• TPC-H or Data Warehouse would NOT be the same

• Your mileage may well vary (especially percentages)

Toad World Article: “Maximize Database Performance Via Toad for Oracle”

http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx

Dell Power Solutions article:

http://www.quest.com/success_stories/Dell-Quest.pdf