racsig rac internals

Demystifying Oracle RAC Internals

Barb Lundhild, RAC Product Management

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Upload: pvnarayanan

Post on 11-May-2015



Agenda

Answer the most common questions about Oracle Clusterware and Oracle RAC

• Architecture

• Oracle Clusterware – Group Membership

• Oracle Cluster Registry

• The Interconnect

• The Public Network and the Virtual IP (VIP)

• Oracle RAC Startup/Shutdown

• Advanced Features of Oracle RAC

• Appendix

Architecture

RAC Architecture

[Diagram: Node 1 through Node n, each running an operating system, Oracle Clusterware, ASM, a database instance, a listener, a VIP (VIP1 ... VIPn), and services. The nodes are joined by the public network and the cluster interconnect. Shared storage holds the redo / archive logs of all instances, the database / control files, and the OCR and voting disks; storage labels: "Managed by ASM" and "RAW Devices".]

What does Clusterware provide?

[Diagram: Oracle Clusterware, layered on the operating system, provides:]

• Group Membership
• High Availability Framework
• Process Monitor
• VIP
• Event Management

Oracle Clusterware Architecture

[Diagram: Oracle Clusterware, layered on the operating system, consists of CSS, CRS, OPROC, VIP, RACG, and EVM.]

Oracle Clusterware: Group Membership and Heartbeats

• The cluster needs to know who is a member at all times
• Oracle Clusterware has 2 heartbeats:
  • Network heartbeat: if a node does not send a heartbeat for MissCount (time in seconds), the node is evicted from the cluster
  • Disk heartbeat: if the disk heartbeat is not updated within the I/O timeout, the node is evicted from the cluster
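The two heartbeat rules above can be sketched as a simple timeout check. This is only an illustration, not how Oracle CSS is implemented: the `NodeHeartbeats` record, the `nodes_to_evict` function, and the 200-second disk timeout are assumptions made for the example; only the 30-second MissCount default is quoted in this deck.

```python
from dataclasses import dataclass

MISSCOUNT_S = 30      # network heartbeat limit; 30s is the default quoted in this deck
DISK_TIMEOUT_S = 200  # disk heartbeat I/O timeout; assumed value for illustration


@dataclass
class NodeHeartbeats:
    """Hypothetical record of the last heartbeats seen from one node."""
    name: str
    last_network_hb: float  # time of the last network heartbeat, in seconds
    last_disk_hb: float     # time of the last voting-disk heartbeat, in seconds


def nodes_to_evict(nodes, now):
    """Return the names of nodes whose network or disk heartbeat is overdue."""
    evicted = []
    for node in nodes:
        network_late = now - node.last_network_hb > MISSCOUNT_S
        disk_late = now - node.last_disk_hb > DISK_TIMEOUT_S
        if network_late or disk_late:
            evicted.append(node.name)
    return evicted


nodes = [
    NodeHeartbeats("node1", last_network_hb=995.0, last_disk_hb=990.0),
    NodeHeartbeats("node2", last_network_hb=960.0, last_disk_hb=990.0),  # network silent for 40s
    NodeHeartbeats("node3", last_network_hb=999.0, last_disk_hb=700.0),  # disk silent for 300s
]
print(nodes_to_evict(nodes, now=1000.0))
```

Either missed heartbeat is enough on its own to mark a node for eviction, which is why the two checks are combined with `or`.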

Oracle Clusterware Split Brain Resolution

• When the interconnect breaks, Clusterware keeps the largest possible sub-cluster up and evicts the other nodes; in a 2-node cluster, the lowest-numbered node remains
• I/O fencing, similar to the STONITH algorithm
• The voting disk is used to detect network problems that could lead to a split brain
• It is the final arbiter of the status of configured nodes (up or down) and delivers eviction notices
• At least 3 voting disks are recommended
• Standard NFS is supported for the 3rd voting disk on Linux, AIX, or Solaris
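The survivor rule from the first bullet can be sketched as follows. The function names are hypothetical. The tie-break used for equally sized groups (the group holding the lowest node number wins) is an assumption extrapolated from the 2-node case described above.

```python
def surviving_subcluster(subclusters):
    """Pick the sub-cluster that stays up after the interconnect breaks.

    subclusters: list of sets of node numbers that can still reach each other.
    The largest group wins; on a tie, the group holding the lowest node
    number wins (assumed tie rule, see the lead-in).
    """
    return max(subclusters, key=lambda group: (len(group), -min(group)))


def evicted_nodes(subclusters):
    """Return the node numbers outside the surviving sub-cluster, sorted."""
    survivor = surviving_subcluster(subclusters)
    return sorted(n for group in subclusters if group is not survivor for n in group)


print(evicted_nodes([{1}, {2}]))        # 2-node cluster split in half
print(evicted_nodes([{1}, {2, 3}]))     # 3-node cluster, node 1 isolated
```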

IT IS NOT SUPPORTED TO REDUCE MISSCOUNT BELOW THE DEFAULT (30s)

Oracle Cluster Registry (OCR)

• A repository containing the definition of the cluster configuration and the status of the resources managed by the cluster
• Required file(s) for Oracle Clusterware
• Initialized during install of Oracle Clusterware
• Location defined in the Registry on Windows or in OCR.LOC on Linux and Unix
• Mirrored by Oracle Clusterware or externally (RAID)
• Supports both automatic (every 4 hours) and manual (new in 11.1) backups
  • ocrconfig -manualbackup

Oracle Cluster Registry (OCR)

• Tools to manage OCR
  • OCRCONFIG: command line tool to manage backups, restore, import, export, repair, and replace
    • Make sure you have a good backup before changing the cluster configuration!
  • OCRCHECK: checks integrity and displays the version of the OCR's block format, total space available, used space, and the OCR locations that you have configured
  • OCRDUMP: view the OCR contents by writing them to a file or stdout in a readable format

Interconnect: Failure Protection and Scalability

Private Interconnect

[Diagram: Node 1, Node 2, ... Node n, each running an operating system, Oracle Clusterware, ASM, a database instance, a listener, a VIP, and services. All nodes attach to the public network; the cluster interconnect runs through Switch 1 and Switch 2.]

The Interconnect

• The interconnect is typically a standard GigE network
  • IP over InfiniBand (IB) is supported
• The network should use a private, dedicated, non-routable switch or VLAN
  • A crossover cable is not supported as an interconnect
• For high availability and scalability, use an OS-based solution to combine multiple physical links into a single logical link
  • The same technology can be applied to the public network
  • Only the logical link should be provided to Oracle Clusterware and therefore Oracle RAC

Public Network and VIP: Failure Protection

Why does Oracle RAC have a VIP?

• Protects database clients from long TCP/IP timeouts (can be >10 minutes)
• During normal operation, it works the same as a hostname
• During a failure, it removes the network timeout from the connection request time; the client fails over immediately to the next address in the list

sales.us.acme.com =
  (DESCRIPTION=
    (ADDRESS_LIST=
      (LOAD_BALANCE=on)(FAILOVER=on)
      (ADDRESS=(PROTOCOL=tcp)(HOST=sales1-vip)(PORT=1521))
      (ADDRESS=(PROTOCOL=tcp)(HOST=sales2-vip)(PORT=1521)))
    (CONNECT_DATA=
      (SERVICE_NAME=sales.us.acme.com)))
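The behaviour that LOAD_BALANCE and FAILOVER ask of the client can be sketched as below. This is a simplified model, not Oracle Net code: `connect_with_failover` and `fake_connect` are hypothetical names, and the "connection attempt" is a stand-in function that pretends node 1 has failed and its relocated VIP rejects the request at once.

```python
import random

# The two addresses from the descriptor above.
ADDRESSES = [("sales1-vip", 1521), ("sales2-vip", 1521)]


def connect_with_failover(addresses, try_connect, load_balance=True, failover=True, rng=random):
    """Try addresses in (optionally shuffled) order; return the first that accepts."""
    candidates = list(addresses)
    if load_balance:
        rng.shuffle(candidates)      # LOAD_BALANCE=on: start from a random address
    if not failover:
        candidates = candidates[:1]  # FAILOVER off: a single attempt only
    for address in candidates:
        if try_connect(address):
            return address
    raise ConnectionError("no listener accepted the connection")


def fake_connect(address):
    """Stand-in for a real connection attempt: sales1-vip refuses immediately."""
    return address[0] != "sales1-vip"


print(connect_with_failover(ADDRESSES, fake_connect))
```

Because the relocated VIP answers with an error instead of letting the request time out, the loop moves to the next address without waiting.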

Oracle RAC VIP: The Details

• One for each node in the cluster
• Required for Oracle Clusterware installation
• The IP address and network name should not currently be in use
• Should be registered in DNS and must be on the same subnet as the public IP address
• Configuration managed by VIPCA and SRVCTL
  • Note that the netmask defaults to 255.255.255.0, rather than to the netmask of the underlying physical interface
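The subnet requirement and the netmask note can be checked with Python's standard `ipaddress` module. The sketch below uses the addresses from the `ifconfig` listing in this deck (public IP 144.15.214.10, VIP 144.15.214.30, mask 255.255.252.0); the helper names and the extra address 144.15.213.30 are assumptions added for illustration.

```python
import ipaddress


def same_subnet(public_ip, vip, netmask):
    """True if the VIP falls inside the network of the public interface."""
    network = ipaddress.ip_network(f"{public_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(vip) in network


def broadcast(ip, netmask):
    """Broadcast address implied by an IP address and netmask."""
    return str(ipaddress.ip_network(f"{ip}/{netmask}", strict=False).broadcast_address)


public_ip, vip = "144.15.214.10", "144.15.214.30"
interface_mask = "255.255.252.0"   # mask of the physical interface
default_mask = "255.255.255.0"     # default VIP mask mentioned above

print(same_subnet(public_ip, vip, interface_mask))
print(broadcast(vip, interface_mask), broadcast(vip, default_mask))

# A hypothetical VIP that is valid under the interface mask
# but falls outside the narrower default mask.
print(same_subnet(public_ip, "144.15.213.30", interface_mask))
print(same_subnet(public_ip, "144.15.213.30", default_mask))
```

The two masks imply different broadcast addresses for the same VIP, which is one reason the default is worth checking against the physical interface.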

Oracle RAC VIP is DIFFERENT

• Only accepts connections when on its home node
• Failure on home node: relocates to another node in the cluster only to send an error back to the client (it will not be in the listener, so connections are not accepted!)
• You will only have one active RAC VIP per node (there may be others that have relocated due to failure!)
• Independent of the number of databases running in the cluster

Oracle RAC VIP

[root@pmrac1 root]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:12:79:D8:90:93
          inet addr:144.15.214.10  Bcast:144.15.215.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5070815 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3064435 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:509963813 (486.3 Mb)  TX bytes:3621223517 (3453.4 Mb)
          Interrupt:25

eth0:1    Link encap:Ethernet  HWaddr 00:12:79:D8:90:93
          inet addr:144.15.214.30  Bcast:144.15.215.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5762695 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5679252 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3400642002 (3243.1 Mb)  TX bytes:3166774792 (3020.0 Mb)
          Interrupt:25

Listener.ora

SID_LIST_LISTENER_PMRAC1 =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /u01/oracle/product/10gR2/asm)
      (PROGRAM = extproc)
    )
  )

LISTENER_PMRAC1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1))
      (ADDRESS = (PROTOCOL = TCP)(HOST = pmrac1-vip)(PORT = 1521)(IP = FIRST))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 144.25.214.45)(PORT = 1521)(IP = FIRST))
    )
  )

Use the VIP in the Address List

Automatically completed by DBCA

• init.ora
  remote_listener = listeners_sales
  local_listener = listeners_sales1

• tnsnames.ora in RAC ORACLE_HOME
  LISTENERS_SALES =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node3-vip)(PORT = 1521))
    )

  LISTENERS_SALES1 =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
    )

Application VIPs

• New resource as of Oracle RAC 10g Release 2
• Created as functional VIPs which can be used to connect to an application regardless of the node it is running on
• The VIP is a dependent resource of the user-registered application
• There can be many VIPs, one per user application

Creating an Application VIP

• The usrvip script must run as root
• The default permissions need to be changed after registration
  • As root…
    crs_setperm ApplicationVIP1 -o root
• Allow the oracle user to execute this script
  • As root…
    crs_setperm ApplicationVIP1 -u user:oracle:r-x
• Start the VIP
  • As oracle…
    crs_start ApplicationVIP1

Oracle RAC Startup/Shutdown

Node Startup Sequence

[Diagram: the components on one node; labels shown: Operating System, Oracle Clusterware, ASM, Instance 1, VIP1, Listener, Service.]

Oracle Dependencies

[Diagram: Node 1 and Node 2, joined by the public network and the cluster interconnect. Each node shows its operating system, Oracle Clusterware, ASM, instance, VIP, listener, and service; VIP1 is shown twice. Shared storage holds the redo / archive logs of all instances, the database / control files, and the OCR and voting disks; storage labels: "Managed by ASM" and "RAW Devices".]

Oracle Dependencies Prior to 10.2.0.3

[Diagram: the same two-node layout as the previous slide (operating system, Oracle Clusterware, ASM, instance, VIP, listener, and service on each node, with shared storage for the redo / archive logs, database / control files, and OCR and voting disks); VIP1 is shown twice.]

Advanced Features of RAC: High Availability and Load Balancing for Applications

Services

• Application workloads can be defined as Services
• Individually managed and controlled
• Assigned to instances during normal startup
• On instance failure, automatic re-assignment
• Service performance individually tracked
• Finer grained control with Resource Manager
• Integrated with other Oracle tools / facilities (e.g., Scheduler, Streams)
• Managed by Oracle Clusterware
• Several services created and managed by the database server

Many features discussed do not apply to the default database service

Cluster Managed Services

• A service has a set of resources defined to Oracle Clusterware
• Oracle Clusterware manages start/stop/relocate based on the definition
• Define Preferred (normal operations) and Available (if failure occurs) instances
• Dependent on Instance and VIP
• Manage using Enterprise Manager
• SRVCTL CLI for cluster configuration
• DBMS_SERVICE PL/SQL package
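The Preferred/Available idea can be sketched as a small placement function. This is a simplified model for illustration only, not Clusterware's placement logic: `place_service` and the instance names are hypothetical, and the rule "one available instance replaces each failed preferred instance" is an assumption.

```python
def place_service(preferred, available, running_instances):
    """Return the instances a service should run on.

    Every preferred instance that is up hosts the service. For each preferred
    instance that is down, one running available instance takes its place
    (assumed rule, see the lead-in).
    """
    running = set(running_instances)
    placement = [inst for inst in preferred if inst in running]
    missing = len(preferred) - len(placement)
    spares = [inst for inst in available if inst in running]
    return placement + spares[:missing]


preferred, available = ["sales1", "sales2"], ["sales3"]
print(place_service(preferred, available, ["sales1", "sales2", "sales3"]))  # normal operation
print(place_service(preferred, available, ["sales2", "sales3"]))            # sales1 has failed
```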

What is FAN?

• Fast Application Notification (FAN) is a RAC notification mechanism
  • FAN HA Events: notification of Up/Down for service, instance & node
  • Load Balancing Advisory Events: advise clients of the current load for a service and where to send connection requests
• Enable it, and forget it.

Oracle Notification Service (ONS)

• Publish/Subscribe messaging system
• Allows both local and remote consumption
• Used by Fast Application Notification (FAN) to publish HA Events and Load Balancing Events
• Used by FAN clients to subscribe to events
• Automatically installed and configured by the installation of Oracle Clusterware
• DO NOT TURN OFF: required by Oracle Clusterware and RAC
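The publish/subscribe pattern described above can be illustrated with a tiny in-process bus. This is a generic sketch, not the ONS API or the FAN event format: `NotificationBus`, `ConnectionPool`, the `"ha"` event type, and the event fields are all hypothetical.

```python
from collections import defaultdict


class NotificationBus:
    """Tiny in-process publish/subscribe bus (a stand-in, not the ONS API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        for callback in self._subscribers[event_type]:
            callback(payload)


class ConnectionPool:
    """Hypothetical FAN client: drops connections to instances reported down."""

    def __init__(self, connections):
        self.connections = list(connections)  # (connection id, instance) pairs

    def on_ha_event(self, event):
        if event["status"] == "down":
            self.connections = [c for c in self.connections if c[1] != event["instance"]]


bus = NotificationBus()
pool = ConnectionPool([(1, "sales1"), (2, "sales2"), (3, "sales1")])
bus.subscribe("ha", pool.on_ha_event)
bus.publish("ha", {"instance": "sales1", "status": "down"})
print(pool.connections)
```

The publisher does not need to know who is listening, which is what lets both local and remote consumers subscribe to the same events.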

FAN Clients

• HA Events: JDBC Implicit Connection Cache, OCI, ODP.NET Connection Pools, Listener, Server Side Callouts, CMAN
• Load Balancing Advisory Events: JDBC Implicit Connection Cache, ODP.NET Connection Pools, Listener, CMAN
• New with 11.1.0.7: Universal Connection Pool for Java

Fast Connection Failover

• Fast and reliable high availability for connections in an Oracle Real Application Clusters 10g environment
• Enable it and forget it
• The application can make it transparent to the user by trapping the SQL exception and retrying
• Supported by Oracle JDBC, OCI, and ODP.NET

Load Balancing Advisory

• Load Balancing Advisory is an advisory for balancing work across RAC instances
• Balances load at the transaction level (not connections!)
• Directs work to where services are executing well and resources are available
• Adjusts the distribution for nodes of different power, workloads of different priority and shape, and changing demand
• Stops sending work to slow, hung, or failed nodes early

Runtime Connection Load Balancing

• When the application does “getConnection”, the connection given is the one that will provide the best service
• Supported by Oracle JDBC, OCI, and ODP.NET connection pools
• Policy defined by setting GOAL on the Service
• Requires Oracle Net Services Connection Load Balancing
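One way to picture runtime connection load balancing is a weighted random choice over instances. The sketch below is an assumption-laden illustration, not the pool's actual algorithm: `pick_instance` is a hypothetical helper, and the `advice` percentages stand in for whatever the Load Balancing Advisory would publish.

```python
import random


def pick_instance(advice, rng=random):
    """Choose an instance with probability proportional to its advised share."""
    instances = list(advice)
    weights = [advice[name] for name in instances]
    return rng.choices(instances, weights=weights, k=1)[0]


# Hypothetical advisory: share of new work each instance should receive.
advice = {"Instance1": 60, "Instance2": 30, "Instance3": 10}

rng = random.Random(42)  # fixed seed so the run is repeatable
counts = {name: 0 for name in advice}
for _ in range(10_000):
    counts[pick_instance(advice, rng)] += 1
print(counts)
```

An instance with an advised share of 0 is never chosen, which models "stops sending work to slow, hung, or failed nodes".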

Leverage Temporal Connection Affinity (New with 11.1.0.7)

[Diagram: web clients obtain connections from a pool in front of a RAC database with Instance1, Instance2, and Instance3; labels: "Affinity Context", "Connection", "Connect to me".]

Leverage XA Connection Affinity (New with 11.1.0.7)

• DB 11g fixes the correctness problem. XA Affinity adds performance and scalability.
• Eliminates the current single DTP service limitation for XA/RAC
• XA affinity is the ability to automatically localize a global transaction to a single RAC instance
• Scope is the life of a global transaction
• The first connection request for a global transaction uses Runtime Connection Load Balancing (RCLB)
• Subsequent requests use affinity and are routed to the same RAC instance where XA first started
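The routing rule in the last two bullets can be sketched as a lookup table keyed by global transaction. This is an illustration under stated assumptions, not the connection pool's implementation: `XAAffinityRouter` is a hypothetical class, and a simple rotation stands in for RCLB.

```python
import itertools


class XAAffinityRouter:
    """Route every request of a global transaction to one instance."""

    def __init__(self, choose_instance):
        self._choose_instance = choose_instance  # stand-in for RCLB
        self._affinity = {}                      # global transaction id -> instance

    def route(self, xid):
        # First request: ask the load balancer. Later requests: reuse the choice.
        if xid not in self._affinity:
            self._affinity[xid] = self._choose_instance()
        return self._affinity[xid]

    def end_transaction(self, xid):
        # Affinity lasts only for the life of the global transaction.
        self._affinity.pop(xid, None)


rotation = itertools.cycle(["Instance1", "Instance2", "Instance3"])
router = XAAffinityRouter(lambda: next(rotation))

first = router.route("gtx-1")
other = router.route("gtx-2")
again = router.route("gtx-1")
print(first, other, again)
```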

Q&A

QUESTIONS

ANSWERS

Appendix

For More Information

http://search.oracle.com

or

otn.oracle.com/rac

REAL APPLICATION CLUSTERS


Useful Metalink Notes

• Note 342082.1 “How to Change Subnet Masks for VIPs”
• Note 294430.1 “CSS Timeout Computation in RAC 10g”
• Note 284752.1 “10g RAC: Steps To Increase CSS Misscount, Reboottime and Disktimeout”
• Note 291962.1 “Setting Up Bonding in SLES 9”
• Note 291958.1 “Setting Up Bonding in Suse SLES8”
• Note 298891.1 “Configuring Linux for the Oracle 10g VIP using bonding”
• Note 283107.1 “Configuring Solaris IP Multipathing (IPMP) for the Oracle 10g VIP”

OTN.ORACLE.COM/RAC

• Workload Management with Oracle Real Application Clusters (FAN, FCF, Load Balancing)
• Using standard NFS to support a third voting disk on a stretch cluster configuration on Linux
• Using Oracle Clusterware to Protect 3rd Party Applications
• New: otn.oracle.com/clusterware
• RAC Sample Code Page: http://www.oracle.com/technology/sample_code/products/rac/index.html