Data Dependent Routing may not be necessary when using Oracle RAC
Ken Gottry
Apr-2003
Through technology improvements in:
• Oracle 9i - RAC
• Oracle 9i - CacheFusion
• Solaris - RSM
• Sun Cluster - SunFire Link
www.gottry.com
Objective
• To provide a brief overview of several new technologies that have been implemented by Oracle and Sun over the past 18 months. These include:
• Oracle 9i RAC database cluster
• Oracle 9i CacheFusion
• Solaris Remote Shared Memory (RSM)
• Sun Cluster SunFire Link
• To suggest that, based on the above improvements, application logic to implement data dependent routing may no longer be as important when using an Oracle RAC database cluster.
Agenda
• Executive Summary
• HA-Oracle vs. OPS/RAC
• Pinging in OPS
• Pinging in RAC
• Data dependent routing (DDR)
• Oracle 9i CacheFusion
• Solaris remote shared memory (RSM)
• Sun Cluster Interconnect - SunFire Link
Executive Summary
• What was called Oracle Parallel Server (OPS) in 8i is now called Real Application Cluster (RAC) in 9i
• CacheFusion in 9i reduces pinging degradation from 20% in OPS to 5-10%
• Oracle 9i can use Solaris Remote Shared Memory (RSM) to move CacheFusion into the kernel level. Pinging degradation may be reduced to 3-5%
• Sun Cluster supports SunFire Link, a 1.6 Gbps pipe between cluster nodes with less than 1 ms latency. Up to 6 SunFire Link interconnects between nodes will allow striping of data transfer. Pinging degradation may be reduced to 1-3%
• With such reduction in pinging degradation, is data dependent routing (DDR) a design concern any more?
HA-Oracle vs. RAC
[Diagram: app servers and DB servers sharing one database; failover arrows between the DB servers, with GCS on each RAC node]

HA-Oracle
• Only one DB server active at a time
• Failover may take a long time

RAC
• Both DB servers active, so throughput is often 80-90% more than with HA-Oracle
• Distributed Lock Mgr (DLM) is called Global Cache Service (GCS) in 9i
• Failover is immediate
• Requires application coding
Pinging with 8i OPS
Pinging: reduced throughput when DB node #1 has to ask DB node #2 whether it has the needed block before DB node #1 can update it.

[Diagram: app server, DB node #1 and DB node #2 each running a DLM, sharing one database]

1. UPDATE salary = $1M WHERE emp = "Gottry"
2. Do you have block #123?
3. Writes block #123 to disk
4. It's all yours
5. Reads and updates block #123
6. Update complete

Oracle 8i OPS: DB node #2 had to flush the block to disk before DB node #1 could have it.
Throughput was degraded about 20% with OPS pinging.
Example: assume one DB node could process 100 tps. When adding a second DB node, you would expect the OPS database cluster to process 200 tps. However, due to the pinging overhead, you would normally see
(100 + 100) – (20% * (100 + 100)) = 200 – 40 = 160 tps
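The arithmetic above can be sketched as a small helper (Python here purely for illustration; the function name is my own, not from the deck):

```python
def effective_tps(per_node_tps, nodes, degradation):
    """Cluster throughput after pinging overhead.

    degradation is the fraction of ideal throughput lost to pinging
    (e.g. 0.20 for the ~20% observed with 8i OPS).
    """
    ideal = per_node_tps * nodes
    return ideal - degradation * ideal

# Two 100-tps nodes with 20% OPS pinging overhead:
print(effective_tps(100, 2, 0.20))  # 160.0
```

The same helper with a 0.10 degradation gives the 180 tps figure quoted for RAC on the next slide.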
Pinging with 9i RAC
[Diagram: app server, DB node #1 and DB node #2 each running GCS, caches joined by CacheFusion, sharing one database]

1. UPDATE salary = $1M WHERE emp = "Gottry"
2. Do you have block #123?
3. Here's block #123. It's all yours
4. Updates block #123 in cache and writes to disk
5. Update complete

Oracle 9i RAC: using CacheFusion, DB node #2 pushes the block to DB node #1 over the cluster interconnect. Pinging still occurs within RAC, but it is much faster because the block is transferred cache-to-cache without a disk write by DB node #2.
Throughput degraded about 10% with RAC pinging.
Example: assume one DB node could process 100 tps. When adding a second DB node, you would expect the RAC database cluster to process 200 tps. However, due to the pinging overhead, you would normally see
(100 + 100) – (10% * (100 + 100)) = 200 – 20 = 180 tps
Data Dependent Routing (DDR)
The app knows that DB server #1 is the primary handler of the portion of the DB containing Patient IDs 1-500, so it sends the SQL request for patient ID 200 to DB server #1 to minimize the impact of pinging. Likewise, DB server #2 is the primary handler of Patient IDs 501-1000, so the request for patient ID 800 goes to DB server #2.

[Diagram: DB partitioned so node #1 is primary for Patient IDs 1-500 and node #2 is primary for Patient IDs 501-1000]

1. Patient_ID = 200 (routed to DB server #1)
2. Do you have block #123?
3. Nope
4. Reads and updates block #123

The request for patient ID 800 follows the same steps on DB server #2 against block #456.
Notice ping still happens, but no block transfer is required. It’s the block transfer that can degrade throughput by up to 5-20%
To minimize the impact of pinging, architects often partition the DB, making one DB node primarily responsible for one-half the DB and the other DB node primarily responsible for the other half. The application must then contain data dependent routing logic that decides to which DB node to send each SQL call
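A minimal sketch of that routing logic (Python for illustration; the partition map, node names, and function name are my own assumptions, not from the deck):

```python
# Hypothetical partition map: each DB node is primary for one key range.
PARTITIONS = [
    (range(1, 501), "db-server-1"),     # Patient IDs 1-500
    (range(501, 1001), "db-server-2"),  # Patient IDs 501-1000
]

def route(patient_id):
    """Pick the DB node that is primary for this patient's partition,
    so an update rarely triggers a cross-node block transfer."""
    for ids, node in PARTITIONS:
        if patient_id in ids:
            return node
    raise ValueError(f"no partition covers patient {patient_id}")

print(route(200))  # db-server-1
print(route(800))  # db-server-2
```

The point of the slide is that this extra application logic, and the maintenance burden of keeping the partition map in sync with the DB layout, may no longer pay for itself once pinging overhead drops to a few percent.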
CacheFusion and Remote Shared Memory (RSM)
Oracle 9i CacheFusion makes the cache on multiple DB nodes act as one. This speeds up block transfer when it’s needed.
[Diagram: on each DB node the stack is Oracle / cluster / kernel; at the Oracle level the two caches are joined by CacheFusion, and with RSM the cache transfer moves down to the kernel level]
Taking a closer look, CacheFusion is implemented at the application (Oracle) level
Solaris Remote Shared Memory (RSM) allows clustered apps to share memory at the kernel level. Oracle 9.1 implements the RSM API
SunFire Link Interconnect
Nodes of a cluster communicate over a private network connection.
Heartbeat (“are you alive”) info is exchanged over the cluster interconnect
Previously Sun Cluster supported two types of interconnect:
• Ethernet (100 Mbps)
• proprietary SCI (200 Mbps)
In Apr-2003, Sun Cluster announced support for proprietary SunFire Link interconnect (1.6Gbps). Up to 6 SFL interconnects can be used to stripe the data as it is transferred
[Diagram: DB node #1 and DB node #2 joined at the kernel level, first by a generic cluster interconnect, then by a SunFire Link interconnect with striped transfer across multiple links]
Is Data Dependent Routing Needed?
This chart and table show the relative improvement in throughput using the new technologies.
Perhaps this improvement is good enough to avoid adding data dependent routing logic to your application.
Based on a 2 node DB cluster with each node capable of 100 tps
Configuration                    Degradation   Throughput   Total
OPS                              20%           80 + 80      160
RAC with CacheFusion             10%           90 + 90      180
RAC with RSM                     7%            93 + 93      186
RAC with RSM and SunFire Link    3%            97 + 97      194
Ideal                            0%            100 + 100    200
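The totals in the table follow directly from the degradation percentages; a quick check (Python for illustration, with names of my own choosing):

```python
PER_NODE_TPS = 100
NODES = 2

# Degradation fractions taken from the table above.
CONFIGS = {
    "OPS": 0.20,
    "RAC with CacheFusion": 0.10,
    "RAC with RSM": 0.07,
    "RAC with RSM and SunFire Link": 0.03,
    "Ideal": 0.00,
}

for name, degradation in CONFIGS.items():
    ideal = PER_NODE_TPS * NODES
    total = round(ideal * (1 - degradation))
    print(f"{name:32s} {total} tps")
```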
[Bar chart: throughput for the OPS, CacheFusion, RSM, and SunFire Link configurations, compared against the ideal throughput of 200 tps]