
Copyright © 2006 Quest Software

RAC be Nimble, RAC be Quick

Bert Scalzo, Domain Expert, Oracle Solutions


About the Author

Domain Expert & Product Architect for Quest Software

Oracle Background:

• Worked with Oracle databases for well over two decades (starting with version 4)

• Work history includes time at both Oracle Education and Oracle Consulting

Academic Background:

• Several Oracle Masters certifications

• BS, MS and PhD in Computer Science

• MBA (general business)

• Several insurance industry designations

Key Interests:

• Data Modeling

• Database Benchmarking

• Database Tuning & Optimization

• "Star Schema" Data Warehouses

• Oracle on Linux – and specifically: RAC on Linux

Articles for:

• Oracle’s Technology Network (OTN)

• Oracle Magazine

• Oracle Informant

• PC Week (eWeek)

• Dell Power Solutions Magazine

• The Linux Journal

• www.linux.com

• www.orafaq.com


Books by Author … Potentially in 2008 …


Agenda

• RAC Performance

– “RAC in the Box” syndrome

– Must Optimize every subsystem

• RAC Optimization Approach

– RAC Optimization Methodology

– via Quest’s RAC Tools (just an example, not a sales pitch)

• Real-world Scenario

– Dell’s use of Quest’s Solution for Oracle RAC

– “Best Practices” applied incrementally & results


Optimization Focus

• “Low Hanging Fruit”

• Obvious yet Overlooked

• Subtle yet Highly Critical

Radically different from other RAC tuning sessions:

Not going to delve into obscure hardware, OS, network, Oracle and RAC tuning parameters or configurations.

Just easy stuff that makes a big difference.


Oracle RAC is Great, but …

• Too often people expect RAC to “auto-magically” function out of the box with little to no optimization

• During RAC optimization attempts, people far too often concentrate on just a single dimension – the Oracle database “stuff” (e.g. waits, parameters, etc.)

• During RAC optimization attempts, Oracle is often too heavily weighted as the source and reason for most, if not all, of the performance bottlenecks

• Not enough “application nature” is identified and accounted for during the optimization process

• Result: Too many people achieve sub-par results!


I call it “RAC in the BOX” syndrome

• That’s just my stupid name for it (hope it catches on)

• But nearly half the RAC sites I visit are suffering from RAC performance issues related to this!

• Still far too few RAC experts among the general DBA population (although improving every day)

• Really no simple, single button tools yet to make RAC “auto-magically” “fire on all cylinders”

• Many people bail on RAC far too soon and simply fall back to big SMP boxes (the evil that they know)

• But with just a little manual “box winding”, anyone should be able to “pop the RAC weasel” free


RAC is a System: Must Tune It in Its Entirety


RAC Performance = Sum of its Parts

• Application Nature (affects everything else below)

• Public Network

• Storage Network

• Storage Sub-System

• Oracle Instance Configuration

• Oracle Cluster Configuration

• Private Network (i.e. Interconnect)


RAC Performance Testing Methodology

1. Benchmark Factory: www.TPC.org tests or Oracle trace file playback generator

2. Spotlight on RAC: Monitor RAC system for performance bottlenecks

3. TOAD with DBA Module: “AWR/ADDM Reports”, identify and correct top five wait events
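The "identify and correct top five wait events" step can be sketched in a few lines of Python. This is only an illustration: the function name `top_wait_events` is hypothetical, and the event names and timings are made-up sample data rather than measurements from this session. In practice the numbers would be read off an AWR/ADDM report.

```python
from typing import Dict, List, Tuple


def top_wait_events(waits: Dict[str, float], n: int = 5) -> List[Tuple[str, float, float]]:
    """Return the n largest wait events as (name, seconds, percent of total wait time)."""
    total = sum(waits.values())
    if total == 0:
        return []
    # Sort by time waited, largest first, and keep the first n entries.
    ranked = sorted(waits.items(), key=lambda item: item[1], reverse=True)[:n]
    return [(name, secs, round(100.0 * secs / total, 1)) for name, secs in ranked]


# Hypothetical sample: seconds waited per event during one benchmark run.
sample_waits = {
    "gc buffer busy": 420.0,
    "log file sync": 310.0,
    "db file sequential read": 150.0,
    "gc cr block busy": 60.0,
    "enq: TX - row lock contention": 40.0,
    "latch: cache buffers chains": 15.0,
    "SQL*Net message to client": 5.0,
}

for name, secs, pct in top_wait_events(sample_waits):
    print(f"{name:32s} {secs:8.1f} s {pct:5.1f} %")
```

Ranking by share of total wait time keeps attention on the few events that dominate, which is the point of limiting the list to five.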


Quest Solutions for Oracle RAC

Benchmark Factory®

• Test Oracle RAC environments rapidly and reliably

• Perform database “scalability” or “goal” testing to determine the optimal RAC configuration

• Tests:

– TPC-C

– TPC-H

– Trace File playback

– etc, etc, etc …

• Lets the DBA concentrate on the task at hand: Optimization


Quest Solutions for Oracle RAC

Spotlight® on RAC

• Monitor Oracle RAC environments rapidly and reliably

• Diagnose Oracle RAC environment health levels at

– Node

– Cluster

– ASM

– Instance

– Interconnect

• Intelligent performance alerts plus a market-leading GUI covering the whole architecture, from the RAC cluster down to its instances, and their bottlenecks

• Lets the DBA concentrate on the task at hand: Diagnosing


Quest Solutions for Oracle RAC

TOAD® with DBA Module

• Expedite typical DBA management & tuning tasks

• Great Productivity Enhancing Features

– Database Health Check

– Database Probe

– Database Monitor

– AWR/ADDM Reports

– UNIX Monitor

– Stats Pack Reports (coming soon)

• See Toad World paper

– Title: “Maximize Database Performance Via Toad for Oracle”

– http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx

• Lets the DBA concentrate on the task at hand: Correcting (i.e. Fixing)


Real-world Scenario

Quest strategic partner & customer Dell uses Quest’s solution for Oracle RAC to test the performance of the Oracle RAC architecture running on Dell PowerEdge servers and EMC CLARiiON SAN & iSCSI disk arrays


DELL Success Story

www.Quest.com/success_stories/Dell-Quest.pdf


DELL Success Story

Database Configuration used at DELL for the RAC test environment


DELL Success Story

Near Linear Scalability!


Step-By-Step Example

Apply Methodology, “Best Practices”, and Quest’s RAC tools to optimize and quantify the approximate percentage of the improvements

Note: will quote some specific examples for a given RAC setup; your mileage will surely vary

Test = TPC-C (OLTP) for 200-2000 users, 10 GB


Step 1 - Application Nature

Know Your Application Demands (this info flows downstream)

• OLTP vs. Data Warehousing

– Primarily Read vs. Write

– Average Transaction Size

– Likelihood of “Deadlock”

– Logging, Flashback and Recovery Requirements

– etc, etc, etc …

• Concurrent User Load Profile (i.e. user load over time)

• Focus on User Response Time Requirements

– For example, TPC-C must run each transaction <= 2 seconds

– Response Time ~= Wait Events

• Cary Millsap (Hotsos) calls this “Method R”

• Anjo Kolk et al. (Oracle) call this the “YAPP Method”

• Kyle Hailey (PerfVision) paper “Waits Defined”

• etc, etc, etc …

• Don’t skip this step: the cost can be enormous, and no network, OS, or database tuning can compensate for it!
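A response time requirement like the 2-second example above is easy to check mechanically once per-transaction timings are available. The sketch below assumes the timings have already been collected into a plain list; the function names and the sample values are hypothetical.

```python
from typing import Sequence


def fraction_within_limit(response_times: Sequence[float], limit_seconds: float = 2.0) -> float:
    """Fraction of transactions that finished within the response time limit."""
    if not response_times:
        raise ValueError("need at least one response time")
    within = sum(1 for t in response_times if t <= limit_seconds)
    return within / len(response_times)


def meets_requirement(response_times: Sequence[float], limit_seconds: float = 2.0) -> bool:
    """True only if every transaction finished within the limit."""
    return max(response_times) <= limit_seconds


# Hypothetical per-transaction response times in seconds for one test run.
sample_times = [0.4, 0.7, 1.9, 2.0, 3.5]

print(fraction_within_limit(sample_times))  # share of transactions at or under 2 seconds
print(meets_requirement(sample_times))      # False here, because of the 3.5 second outlier
```

Tracking the fraction alongside the pass/fail result shows how far a run is from the requirement, not just whether it missed.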


Application “Best Practices”

Well known rules:

• Write efficient SQL and/or PL/SQL code (explain plans)

• Use bind variables to reduce unnecessary “re-parsing”

• Often the underlying application code (e.g. a benchmark) is NOT changeable, so there is nothing you can do here

Deeper TPC-C Analysis (remember across all next steps):

• Primarily Read with Some Writes

• Small Average Transaction Size

• High Concurrency with Potential Deadlocks

• Logging for ACID compliance and no flashback


Public Network “Best Practices”

Well known rules:

• Isolate Network (for single or related applications only)

• Use Gigabit Ethernet (consider “bonding” multiple cards)

• Use Layer 2 or 3 Switches and verify Gigabit throughput

TPC-C Analysis Ramifications:

• Primarily Reads = Nothing

• Small Transaction = No jumbo frames, Standard SDU/TDU

• High Concurrency = Multiple Ethernet Segments (collisions)

• No Logging, etc… = Nothing


Storage Network “Best Practices”

Well known rules:

• Isolate Network (for single or related applications only)

• Use Fibre Channel for SAN, 10 Gigabit Ethernet for NAS/iSCSI

• Consider multiple pathways per storage controller and HBA

• Consider TCP/IP offload engine (TOE) NICs or iSCSI HBAs

TPC-C Analysis Ramifications:

• Small Transaction = Jumbo frames since “Block Level” IO

• High Concurrency = Fibre Channel and Multiple Pathways (if budget)



Storage Sub-System “Best Practices”

Well known rules:

• More, smaller disks generally give higher overall throughput

• More memory cache generally gives higher overall throughput

• Avoid “write-back” mode if no backup power source (e.g. battery)

• Align Stripe Boundaries: drive, OS block, LVM, file sys, database block, etc

• Stripe Depth (i.e. size) from 256 KB to 1 MB

• Stripe Width (i.e. # disks) between 4 and 16

• Stripe Depth = Stripe Width X Drive IO Size = One IO per Disk per IO request

• Average I/O <= Stripe Width X Stripe Depth

• Write-intensive = RAID 0+1/1+0 and Read-intensive = RAID 3 (sequential) or 5 (scattered)

TPC-C Analysis Ramifications:

• Primarily Reads = RAID 5, Adjust cache memory allocations & look-ahead algorithms

• Small Transaction = Stripe Depth >= db_block_size X db_file_multiblock_read_count

• High Concurrency = Low Deadlock, so spread DB objects and partitions across LUNs

• No Logging, etc… = Logging but no flashback, so write IO is reasonable, so RAID 5 OK
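Two of the sizing rules above are simple arithmetic, so a quick worked check may help. The sketch below is an illustration under stated assumptions: the function names are hypothetical, sizes are in KB, and the example values (8 KB blocks with a multiblock read count of 16, a 256 KB stripe depth, 8 disks) are chosen only to fall inside the ranges quoted on this slide.

```python
def min_stripe_depth_kb(db_block_size_kb: int, multiblock_read_count: int) -> int:
    """Largest multiblock read in KB, used as the lower bound for stripe depth."""
    return db_block_size_kb * multiblock_read_count


def max_average_io_kb(stripe_width_disks: int, stripe_depth_kb: int) -> int:
    """Upper bound on average I/O size: stripe width times stripe depth."""
    return stripe_width_disks * stripe_depth_kb


def stripe_depth_ok(stripe_depth_kb: int, db_block_size_kb: int, multiblock_read_count: int) -> bool:
    """Check the rule: stripe depth >= db_block_size x db_file_multiblock_read_count."""
    return stripe_depth_kb >= min_stripe_depth_kb(db_block_size_kb, multiblock_read_count)


# Assumed example: 8 KB blocks, read count 16, 256 KB stripe depth across 8 disks.
print(min_stripe_depth_kb(8, 16))    # 128 KB needed per multiblock read
print(stripe_depth_ok(256, 8, 16))   # a 256 KB stripe depth covers it
print(max_average_io_kb(8, 256))     # average I/O should stay at or under 2048 KB
```

With a smaller block size and read count the lower bound shrinks (4 KB x 2 = 8 KB), so any stripe depth in the 256 KB to 1 MB range satisfies the rule.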


Oracle Instance “Best Practices”

Well known rules:

• Size & Tune the SGA appropriately for application nature

• Choose reasonable block size based on application nature (?8K?)

• Partition large objects & indexes across storage devices & spindles

• Don’t assume any “golden rules” – i.e. test all assumptions!

TPC-C Analysis Ramifications:

• Primarily Reads = optimizer_index_caching=80, optimizer_index_cost_adj=20

• Small Transaction = Size redo logs correctly for small size X high load

• High Concurrency = cursor_space_for_time=true, cursor_sharing=similar

• No Logging, etc… = Turn off “Recycle Bin” but keep “LOGGING”
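The instance settings listed above could be turned into statements with a small helper like the one below. Treat it as a sketch, not a recipe: the helper name `render_alter_system` is hypothetical, the full parameter names (`optimizer_index_caching`, `optimizer_index_cost_adj`, `cursor_space_for_time`, `cursor_sharing`) are our reading of the settings on this slide, and writing them with `SCOPE = SPFILE` for all instances is an assumption. Check each parameter against the documentation for your Oracle release before applying anything.

```python
from typing import Dict, List, Union

ParamValue = Union[int, bool, str]


def render_alter_system(params: Dict[str, ParamValue]) -> List[str]:
    """Render one ALTER SYSTEM statement per parameter, written to the spfile for all instances."""
    statements = []
    for name, value in params.items():
        if isinstance(value, bool):      # test bool first: bool is a subclass of int
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, int):
            rendered = str(value)
        else:
            rendered = f"'{value}'"      # string values are quoted
        statements.append(
            f"ALTER SYSTEM SET {name} = {rendered} SCOPE = SPFILE SID = '*';"
        )
    return statements


# Settings taken from the TPC-C analysis on this slide (parameter names assumed).
tpcc_params: Dict[str, ParamValue] = {
    "optimizer_index_caching": 80,
    "optimizer_index_cost_adj": 20,
    "cursor_space_for_time": True,
    "cursor_sharing": "SIMILAR",
}

for statement in render_alter_system(tpcc_params):
    print(statement)
```

Generating the statements from one dictionary keeps each test run's settings in a single reviewable place, which helps when changes are applied incrementally.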


Oracle Cluster “Best Practices”

Well known rules:

• Increase default SGA size for all ASM instances (64 MB is too small)

• Interconnect is the most important bottleneck – use many bonded NICs

• Consider hash partitions & reverse indexes to spread IO across nodes

• Don’t assume any “golden rules” – i.e. really test all assumptions!

TPC-C Analysis Ramifications:

• Small Transaction = Decrease db_file_multiblock_read_count

• High Concurrency = Decrease block size (?4K?) – see next slide



Single Instance – No Block Contention


Cluster – Block Contention Costs


Private Network “Best Practices”

Well known rules:

• Isolate Network (for single cluster only – and 0% public)

• Use 10 Gigabit Ethernet or InfiniBand (Dell found 15% RTI)

• Consider multiple pathways per HBA and storage controller

• Jumbo frames since high “Block Level” IO between nodes

TPC-C Analysis Ramifications:

• Small Transaction = Nothing

• High Concurrency = Lower block size until interconnect traffic OK

– Consider increasing the OS priority of the global cache cluster services
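A rough calculation shows why jumbo frames matter for block transfers between nodes. The sketch below is simplified and rests on assumptions not stated on the slide: a standard MTU of 1500 bytes, a jumbo MTU of 9000 bytes, UDP over IPv4 on the interconnect, and 28 bytes of IP plus UDP headers per frame. It ignores Oracle's own message headers and fragment alignment, so the real counts can differ slightly.

```python
import math

IP_UDP_HEADER_BYTES = 28  # 20-byte IPv4 header + 8-byte UDP header (assumes no IP options)


def frames_per_block(block_size_bytes: int, mtu_bytes: int) -> int:
    """Approximate number of Ethernet frames needed to ship one database block."""
    payload = mtu_bytes - IP_UDP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    return math.ceil(block_size_bytes / payload)


# Compare standard and jumbo frames for 8 KB and 4 KB blocks.
for block in (8192, 4096):
    for mtu in (1500, 9000):
        print(f"block={block} bytes, MTU={mtu}: {frames_per_block(block, mtu)} frame(s)")
```

Under these assumptions an 8 KB block needs about six standard frames but fits in one jumbo frame, and a 4 KB block drops from about three frames to one.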



Let’s Apply the Recommendations

Test | Read Count | Block Size | Cursor Space | Cursor Share | Index Cache | Index Cost | Jumbo

Test 1 | 16 | 8 | False | Exact | 0 | 100 | False

Test 4 | 2 | 4 | True | True | 80 | 20 | False

Test 5 | 2 | 4 | True | True | 80 | 20 | True


Results – TPS (lesser interest here)

[Chart: Transactions / Second (axis 0.00 to 35.00) by load (axis 50 to 500) for Run 1 through Run 5]


Results – Average Response Time

[Chart: Average Response Time (axis 0.00 to 6.00) by load (axis 50 to 500) for Run 1 through Run 5, annotated “SubSecond”]


Thank you

Please offer any questions or comments

Remember:

• Eliminate “RAC in the Box” syndrome – eat low hanging fruit

• Example was for TPC-C or OLTP type application

• TPC-H or Data Warehouse would NOT be the same

• Your mileage may well vary (especially percentages)

Toad World Article: “Maximize Database Performance Via Toad for Oracle”

http://www.toadworld.com/Education/ToadWorldPapersandPodcasts/tabid/82/Default.aspx

Dell Power Solutions article:

http://www.quest.com/success_stories/Dell-Quest.pdf
