Data Dependent Routing may not be necessary when using Oracle RAC
Ken Gottry
Apr-2003
Through technology improvements in:
• Oracle 9i - RAC
• Oracle 9i - CacheFusion
• Solaris - RSM
• Sun Cluster - SunFire Link
www.gottry.com
Objective
• To provide a brief overview of several new technologies that have been implemented by Oracle and Sun over the past 18 months. These include:
• Oracle 9i RAC database cluster
• Oracle 9i CacheFusion
• Solaris Remote Shared Memory (RSM)
• Sun Cluster SunFire Link
• To suggest that, based on the above improvements, application logic to implement data dependent routing may no longer be as important when using an Oracle RAC database cluster.
Agenda
• Executive Summary
• HA-Oracle vs. OPS/RAC
• Pinging in OPS
• Pinging in RAC
• Data dependent routing (DDR)
• Oracle 9i CacheFusion
• Solaris remote shared memory (RSM)
• Sun Cluster Interconnect - SunFire Link
Executive Summary
• What was called Oracle Parallel Server (OPS) in 8i is now called Real Application Cluster (RAC) in 9i
• CacheFusion in 9i reduces pinging degradation from 20% in OPS to 5-10%
• Oracle 9i can use Solaris Remote Shared Memory (RSM) to move CacheFusion into the kernel level. Pinging degradation may be reduced to 3-5%
• Sun Cluster supports SunFire Link, a 1.6 Gbps pipe between cluster nodes with less than 1 ms latency. Up to 6 SunFire Link interconnects between nodes will allow striping of data transfer. Pinging degradation may be reduced to 1-3%
• With such reduction in pinging degradation, is data dependent routing (DDR) a design concern any more?
HA-Oracle vs. RAC
[Diagram: app servers and DB servers sharing one database; failover arrows between the DB servers, with GCS on each RAC node]

HA-Oracle
• Only one DB server active at a time
• Failover may take a long time

RAC
• Both DB servers active, so throughput is often 80-90% more than with HA-Oracle
• Distributed Lock Mgr (DLM) is called Global Cache Service (GCS) in 9i
• Failover is immediate
• Requires application coding
Pinging with 8i OPS
Pinging: reduced throughput when DB node #1 has to ask DB node #2 whether it has the needed block before DB node #1 can update it.

[Diagram: app server, DB node #1 and DB node #2 each running a DLM, sharing one database]

1. UPDATE salary = $1M WHERE emp = "Gottry"
2. Do you have block #123?
3. Writes block #123 to disk
4. It's all yours
5. Reads and updates block #123
6. Update complete

Oracle 8i OPS: DB node #2 had to flush the block to disk before DB node #1 could have it.
Throughput was degraded about 20% with OPS pinging.
Example: assume one DB node could process 100 tps. When adding a second DB node, you would expect the OPS database cluster to process 200 tps. However, due to the pinging overhead, you would normally see
(100 + 100) – (20% * (100 + 100)) = 200 – 40 = 160 tps
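The arithmetic above can be sketched as a small helper (Python here purely for illustration; the function name is my own, not from the deck):

```python
def effective_tps(per_node_tps, nodes, degradation):
    """Cluster throughput after pinging overhead.

    degradation is the fraction of ideal throughput lost to pinging
    (e.g. 0.20 for the ~20% observed with 8i OPS).
    """
    ideal = per_node_tps * nodes
    return ideal - degradation * ideal

# Two 100-tps nodes with 20% OPS pinging overhead:
print(effective_tps(100, 2, 0.20))  # 160.0
```

The same helper with a 0.10 degradation gives the 180 tps figure quoted for RAC on the next slide.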
Pinging with 9i RAC
[Diagram: app server, DB node #1 and DB node #2 each running GCS, caches joined by CacheFusion, sharing one database]

1. UPDATE salary = $1M WHERE emp = "Gottry"
2. Do you have block #123?
3. Here's block #123. It's all yours
4. Updates block #123 in cache and writes to disk
5. Update complete

Oracle 9i RAC: using CacheFusion, DB node #2 pushes the block to DB node #1 over the cluster interconnect. Pinging still occurs within RAC, but it is much faster because the block is transferred cache-to-cache without a disk write by DB node #2.
Throughput degraded about 10% with RAC pinging.
Example: assume one DB node could process 100 tps. When adding a second DB node, you would expect the RAC database cluster to process 200 tps. However, due to the pinging overhead, you would normally see
(100 + 100) – (10% * (100 + 100)) = 200 – 20 = 180 tps
Data Dependent Routing (DDR)
The app knows that DB server #1 is the primary handler of the portion of the DB containing Patient IDs 1-500, so it sends the SQL request for patient ID 200 to DB server #1 to minimize the impact of pinging. Likewise, DB server #2 is the primary handler of Patient IDs 501-1000, so the request for patient ID 800 goes to DB server #2.

[Diagram: DB partitioned so node #1 is primary for Patient IDs 1-500 and node #2 is primary for Patient IDs 501-1000]

1. Patient_ID = 200 (routed to DB server #1)
2. Do you have block #123?
3. Nope
4. Reads and updates block #123

The request for patient ID 800 follows the same steps on DB server #2 against block #456.
Notice ping still happens, but no block transfer is required. It’s the block transfer that can degrade throughput by up to 5-20%
To minimize the impact of pinging, architects often partition the DB, making one DB node primarily responsible for one-half the DB and the other DB node primarily responsible for the other half. The application must then contain data dependent routing logic that decides to which DB node to send each SQL call
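A minimal sketch of that routing logic (Python for illustration; the partition map, node names, and function name are my own assumptions, not from the deck):

```python
# Hypothetical partition map: each DB node is primary for one key range.
PARTITIONS = [
    (range(1, 501), "db-server-1"),     # Patient IDs 1-500
    (range(501, 1001), "db-server-2"),  # Patient IDs 501-1000
]

def route(patient_id):
    """Pick the DB node that is primary for this patient's partition,
    so an update rarely triggers a cross-node block transfer."""
    for ids, node in PARTITIONS:
        if patient_id in ids:
            return node
    raise ValueError(f"no partition covers patient {patient_id}")

print(route(200))  # db-server-1
print(route(800))  # db-server-2
```

The point of the slide is that this extra application logic, and the maintenance burden of keeping the partition map in sync with the DB layout, may no longer pay for itself once pinging overhead drops to a few percent.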
CacheFusion and Remote Shared Memory (RSM)
Oracle 9i CacheFusion makes the cache on multiple DB nodes act as one. This speeds up block transfer when it’s needed.
[Diagram: on each DB node the stack is Oracle / cluster / kernel; at the Oracle level the two caches are joined by CacheFusion, and with RSM the cache transfer moves down to the kernel level]
Taking a closer look, CacheFusion is implemented at the application (Oracle) level
Solaris Remote Shared Memory (RSM) allows clustered apps to share memory at the kernel level. Oracle 9.1 implements the RSM API
SunFire Link Interconnect
Nodes of a cluster communicate over a private network connection.
Heartbeat (“are you alive”) info is exchanged over the cluster interconnect
Previously Sun Cluster supported two types of interconnect:
• Ethernet (100 Mbps)
• proprietary SCI (200 Mbps)
In Apr-2003, Sun Cluster announced support for proprietary SunFire Link interconnect (1.6Gbps). Up to 6 SFL interconnects can be used to stripe the data as it is transferred
[Diagram: DB node #1 and DB node #2 joined at the kernel level, first by a generic cluster interconnect, then by a SunFire Link interconnect with striped transfer across multiple links]
Is Data Dependent Routing Needed?
This chart and table show the relative improvement in throughput using the new technologies.
Perhaps this improvement is good enough to avoid adding data dependent routing logic to your application.
Based on a 2 node DB cluster with each node capable of 100 tps
Configuration                    Degradation   Throughput   Total
OPS                              20%           80 + 80      160
RAC with CacheFusion             10%           90 + 90      180
RAC with RSM                     7%            93 + 93      186
RAC with RSM and SunFire Link    3%            97 + 97      194
Ideal                            0%            100 + 100    200
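The totals in the table follow directly from the degradation percentages; a quick check (Python for illustration, with names of my own choosing):

```python
PER_NODE_TPS = 100
NODES = 2

# Degradation fractions taken from the table above.
CONFIGS = {
    "OPS": 0.20,
    "RAC with CacheFusion": 0.10,
    "RAC with RSM": 0.07,
    "RAC with RSM and SunFire Link": 0.03,
    "Ideal": 0.00,
}

for name, degradation in CONFIGS.items():
    ideal = PER_NODE_TPS * NODES
    total = round(ideal * (1 - degradation))
    print(f"{name:32s} {total} tps")
```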
[Bar chart: throughput for the OPS, CacheFusion, RSM, and SunFire Link configurations, compared against the ideal throughput of 200 tps]