7/18/2019 11gR2 Clusterware Technical Wp
http://slidepdf.com/reader/full/11gr2-clusterware-technical-wp 1/108
Oracle Clusterware 11g Release 2 (11.2) Technical White Paper
Internal / Confidential
Version 1.0 update 3
Oracle White Paper— Oracle Clusterware 11g Release 2 (11.2) Technical White Paper
1 Oracle Clusterware Architecture
  1.1 Oracle Clusterware Daemons and Agent Overview
  1.2 Oracle High Availability Service Daemon (OHASD)
  1.3 Agents
  1.4 Cluster Synchronization Services (CSS)
  1.5 Cluster Ready Services (CRS)
  1.6 Grid Plug and Play (GPnP)
  1.7 Oracle Grid Naming Service (GNS)
  1.8 Grid Interprocess Communication
  1.9 Cluster Time Synchronization Service Daemon (CTSS)
  1.10 mdnsd
2 Voting Files and Oracle Cluster Repository Architecture
  2.1 Voting File in ASM
  2.2 Voting File Changes
  2.3 Oracle Cluster Registry (OCR)
  2.4 Oracle Local Registry (OLR)
  2.5 Bootstrap and Shutdown if OCR is located in ASM
  2.6 OCR in ASM diagnostics
  2.7 The ASM Diskgroup Resource
  2.8 The Quorum Failure Group
  2.9 ASM spfile
3 Resources
  3.1 Resource types
  3.2 Resource Dependencies
4 Fast Application Notification (FAN)
  4.1 Event Sources
  4.2 Event Processing architecture in oraagent
5 Configuration best practices
  5.1 Cluster interconnect
  5.2 misscount
6 Clusterware Diagnostics and Debugging
  6.1 Check Cluster Health
  6.2 crsctl command line tool
  6.3 Trace File Infrastructure and Location
  6.4 OUI / SRVM / JAVA related GUI tracing
  6.5 Reboot Advisory
7 Other Tools
  7.1 ocrpatch
  7.2 vdpatch
  7.3 Appvipcfg – adding an application VIP
  7.4 Application and Script Agent
  7.5 Oracle Cluster Health Monitor - OS Tool (IPD/OS)
8 Appendix
Oracle Clusterware 11g Release 2 (11.2)
Introduction
With the transition to Oracle Database 11g release 2 (11.2), Oracle Clusterware introduced a
wide array of changes, ranging from a complete redesign of CRSD, the introduction of a "local
CRS" (OHASD) and the replacement of the RACG layer with a tightly integrated agent layer, to new
features such as Grid Naming Service, Grid Plug and Play, the Cluster Time Synchronization
Service and Grid IPC. Cluster Synchronization Services (CSS) is probably the layer least
affected by the 11.2 changes, but it gained functionality to support the new features,
as well as new capabilities of its own, such as IPMI support.
With this technical paper, we wanted to take the opportunity to provide all the know-how
that we have accumulated over the 11.2 development years, and relay it to everyone else
who is just starting out learning Oracle Clusterware 11.2. The paper provides conceptual
overviews as well as detailed information related to diagnostics and debugging.
Since this is the first version of the Oracle Clusterware diagnostics paper, not all components
of the 11.2 stack are covered in equal detail. If you feel you can contribute to the
paper, please let us know.
Disclaimer
The information contained in this document is subject to change without notice. If you find
any problems in this paper, or have any comments, corrections or suggestions, please report
them to us via E-Mail (mailto:[email protected]). We do not warrant that this
document is error-free. No part of this document may be reproduced in any form or by any
means, electronic or mechanical, for any purpose, without the permission of the authors.
This document is for internal use only and may not be distributed outside of Oracle.
1 Oracle Clusterware Architecture
This section describes the main Oracle Clusterware daemons.
1.1 Oracle Clusterware Daemons and Agent Overview
The diagram below gives a high-level overview of the daemons, resources and agents used
in Oracle Clusterware 11g release 2 (11.2).
The first big change between pre-11.2 and 11.2 is the new OHASD daemon, which replaces
all the init scripts known from pre-11.2 releases.
Figure 1: Resource startup chart.
1.2 Oracle High Availability Service Daemon (OHASD):
Oracle Clusterware consists of two separate stacks: the upper stack, anchored by the Cluster
Ready Services daemon (crsd), and the lower stack, anchored by the Oracle High Availability
Services daemon (ohasd). These two stacks host several processes that facilitate cluster
operations. The following chapters describe them in detail.
OHASD is the daemon which starts every other daemon that is part of the Oracle
Clusterware stack on a node. OHASD replaces all the init scripts that existed in pre-11.2 releases.
The entry point for OHASD is /etc/inittab, which executes the /etc/init.d/ohasd and
/etc/init.d/init.ohasd control scripts. The /etc/init.d/ohasd script is an RC script including the
start and the stop actions. The /etc/init.d/init.ohasd script is the OHASD framework control
script which spawns the Grid_home/bin/ohasd.bin executable.
The cluster control files are located in /etc/oracle/scls_scr/<hostname>/root (this is the
location for Linux) and are maintained by crsctl; in other words, a "crsctl enable / disable crs"
updates the files in this directory.
# crsctl enable -h
Usage:
crsctl enable crs
Enable OHAS autostart on this server
# crsctl disable -h
Usage:
crsctl disable crs
Disable OHAS autostart on this server
The content of the file scls_scr/<hostname>/root/ohasdstr controls the autostart of the CRS
stack; the two possible values in the file are “enable” – autostart enabled, or “disable” –
autostart disabled.
The file scls_scr/<hostname>/root/ohasdrun controls the init.ohasd script. The three
possible values are “reboot” – sync with OHASD, “restart” – restart crashed OHASD, “stop” –
scheduled OHASD shutdown.
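The two control files can be read as a simple dispatch. The sketch below is purely illustrative (the real control logic lives in the /etc/init.d/init.ohasd shell script); the file values are the ones described above.

```python
# Illustrative sketch only: models how an init.ohasd-style wrapper could
# interpret the two control files described above. File names and values
# follow the text; the dispatch itself is a simplification.

def ohasd_action(ohasdstr: str, ohasdrun: str) -> str:
    """Map the contents of ohasdstr/ohasdrun to the intended action."""
    if ohasdstr == "disable":
        return "autostart disabled: do not start OHASD"
    # autostart is enabled; ohasdrun drives the init.ohasd framework script
    actions = {
        "reboot": "sync with OHASD",
        "restart": "restart crashed OHASD",
        "stop": "scheduled OHASD shutdown",
    }
    return actions.get(ohasdrun, f"unknown ohasdrun value: {ohasdrun}")

print(ohasd_action("enable", "restart"))  # restart crashed OHASD
```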
The big benefit of having OHASD in Oracle Clusterware 11g release 2 (11.2) is the ability to
run certain crsctl commands in a clusterized manner. Clusterized commands are completely
operating system independent, as they rely only on OHASD. If OHASD is running, remote
operations, such as starting, stopping, and checking the stack status on remote nodes,
can be performed.
Clusterized commands include the following:
– crsctl check cluster
– crsctl start cluster
– crsctl stop cluster
OHASD performs additional functions, such as processing and managing the
Oracle Local Registry (OLR) and acting as the OLR server. In a cluster, OHASD runs
as root; in an Oracle Restart environment, where OHASD manages application resources, it
runs as the oracle user.
1.2.1 OHASD Resource Dependency
The clusterware stack in Oracle Clusterware 11g release 2 (11.2) is started by the OHASD
daemon, which itself is spawned by the script /etc/init.d/init.ohasd when a node is started.
Alternatively, ohasd is started on a running node with ‘crsctl start crs’ after a prior ‘crsctl
stop crs’. The OHASD daemon then starts the other daemons and agents. Each Clusterware
daemon is represented by an OHASD resource stored in the OLR. The chart below shows
the association between the OHASD resources / Clusterware daemons and their respective agent
processes and owners.
Resource Name Agent Name Owner
ora.gipcd oraagent crs user
ora.gpnpd oraagent crs user
ora.mdnsd oraagent crs user
ora.cssd cssdagent root
ora.cssdmonitor cssdmonitor root
ora.diskmon orarootagent root
ora.ctssd orarootagent root
ora.evmd oraagent crs user
ora.crsd orarootagent root
ora.asm oraagent crs user
ora.drivers.acfs orarootagent root
ora.crf (new in 11.2.0.2) orarootagent root
Figure 2: Resource and Agent associated table.
The figure below shows all the resource dependencies between the OHASD managed resources
/ daemons. The diagram connects MDNSD, GIPCD, GPNPD, CSSDMONITOR, DISKMON, CSSD, CTSSD,
EVMD and CRSD with START:weak, START:hard, pullup, STOP:hard and STOP:hard(intermediate)
dependency arrows.
Figure 3: For details regarding the hard/weak and pullup/intermediate resource dependencies see 3.2.
1.2.2 Daemon Resources
A typical daemon resource list from a node is shown below. To get the daemon resource list,
the -init flag must be used with the crsctl command.
# crsctl stat res -init -t
--------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE node1 Started
ora.crsd
1 ONLINE ONLINE node1
ora.cssd
1 ONLINE ONLINE node1
ora.cssdmonitor
1 ONLINE ONLINE node1
ora.ctssd
1 ONLINE ONLINE node1 OBSERVER
ora.diskmon
1 ONLINE ONLINE node1
ora.drivers.acfs
1 ONLINE ONLINE node1
ora.evmd
1 ONLINE ONLINE node1
ora.gipcd
1 ONLINE ONLINE node1
ora.gpnpd
1 ONLINE ONLINE node1
ora.mdnsd
1 ONLINE ONLINE node1
The list below shows the resource types used, and their hierarchy. Everything is built on the
base "resource" type. The cluster_resource type uses "resource" as its base type, and
cluster_resource in turn is the base type of ora.daemon.type, which is the building block for
ora.cssd.type and all the other daemon resource types.
To print the "internal" resource type names and resources, use the crsctl -init flag.
# crsctl stat type -init
TYPE_NAME=application
BASE_TYPE=cluster_resource
TYPE_NAME=cluster_resource
BASE_TYPE=resource
TYPE_NAME=local_resource
BASE_TYPE=resource
TYPE_NAME=ora.asm.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.crs.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.cssd.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.cssdmonitor.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.ctss.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.daemon.type
BASE_TYPE=cluster_resource
TYPE_NAME=ora.diskmon.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.drivers.acfs.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.evm.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.gipc.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.gpnp.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.mdns.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=resource
BASE_TYPE=
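The TYPE_NAME/BASE_TYPE output above is easy to post-process when checking a type hierarchy. The sketch below is an illustrative parser for that output format (it is not part of any Oracle tooling); the sample input is a subset of the listing above.

```python
# Sketch: parse `crsctl stat type -init` output (TYPE_NAME=/BASE_TYPE=
# pairs, as shown above) and resolve a type's inheritance chain down to
# the root "resource" type, whose BASE_TYPE is empty.

def parse_types(text):
    """Return {TYPE_NAME: BASE_TYPE} from the listing."""
    types, name = {}, None
    for line in text.splitlines():
        if line.startswith("TYPE_NAME="):
            name = line.split("=", 1)[1]
        elif line.startswith("BASE_TYPE=") and name:
            types[name] = line.split("=", 1)[1]
    return types

def ancestry(types, name):
    """Walk BASE_TYPE links until the base type is empty."""
    chain = [name]
    while types.get(name):
        name = types[name]
        chain.append(name)
    return chain

sample = """TYPE_NAME=ora.cssd.type
BASE_TYPE=ora.daemon.type
TYPE_NAME=ora.daemon.type
BASE_TYPE=cluster_resource
TYPE_NAME=cluster_resource
BASE_TYPE=resource
TYPE_NAME=resource
BASE_TYPE="""

print(ancestry(parse_types(sample), "ora.cssd.type"))
# ['ora.cssd.type', 'ora.daemon.type', 'cluster_resource', 'resource']
```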
Using the ora.cssd resource as an example, all the ora.cssd attributes can be shown using
crsctl stat res ora.cssd -init -f (note that not all attributes are listed in the example below,
only the most important ones).
# crsctl stat res ora.cssd -init -f
NAME=ora.cssd
TYPE=ora.cssd.type
STATE=ONLINE
TARGET=ONLINE
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:oracle11:r-x
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%
CHECK_INTERVAL=30
ocssd_PATH=%CRS_HOME%/bin/ocssd%CRS_EXE_SUFFIX%
CSS_USER=oracle11
ID=ora.cssd
LOGGING_LEVEL=1
START_DEPENDENCIES=weak(ora.gpnpd,concurrent:ora.diskmon)hard(ora.cssdmonitor)
STOP_DEPENDENCIES=hard(intermediate:ora.gipcd,shutdown:ora.diskmon)
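Dependency attributes like the two above pack several clauses into one string. The sketch below splits such a value into its clauses; it is an illustrative parser for the syntax shown in the examples, not the CRS implementation.

```python
import re

# Sketch: split a dependency attribute value, e.g. the START_DEPENDENCIES
# string above, into (kind, [members]) clauses. Modifiers such as
# "concurrent:" or "intermediate:" stay attached to their member.

def parse_deps(value):
    """Return [(kind, [members])] from e.g. 'weak(a,b)hard(c)'."""
    return [(kind, members.split(","))
            for kind, members in re.findall(r"(\w+)\(([^)]*)\)", value)]

deps = parse_deps("weak(ora.gpnpd,concurrent:ora.diskmon)hard(ora.cssdmonitor)")
print(deps)
# [('weak', ['ora.gpnpd', 'concurrent:ora.diskmon']), ('hard', ['ora.cssdmonitor'])]
```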
When debugging daemon resources, the -init flag must always be used. To enable
additional debugging, e.g. for ora.cssd:
# crsctl set log res ora.cssd:3 -init
To check a log level:
# crsctl get log res ora.cssd -init
Get Resource ora.cssd Log Level: 3
To check resource properties like logging level run:
# crsctl stat res ora.cssd -init -f | grep LOGGING_LEVEL
DAEMON_LOGGING_LEVELS=
LOGGING_LEVEL=3
1.3 Agents
Oracle Clusterware 11g Release 2 (11.2) introduces a new agent concept which makes
Oracle Clusterware more robust and performant. These agents are multi-threaded daemons
which implement entry points for multiple resource types and which spawn new processes
for different users. The agents are highly available; besides the oraagent, the orarootagent
and the cssdagent/cssdmonitor, there can be an application agent and a script agent.
The two main agents are the oraagent and the orarootagent. Both ohasd and crsd employ
one oraagent and one orarootagent each. If the CRS user is different from the ORACLE user,
crsd utilizes two oraagents and one orarootagent.
1.3.1 oraagent
ohasd’s oraagent:
– Performs start/stop/check/clean actions for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd
crsd’s oraagent:
– Performs start/stop/check/clean actions for ora.asm, ora.eons, ora.LISTENER.lsnr,
SCAN listeners, ora.ons
– Performs start/stop/check/clean actions for service, database and diskgroup
resources
– Receives eONS events, and translates and forwards them to interested clients
(eONS will be removed and its functionality included in EVM in 11.2.0.2)
– Receives CRS state change events, dequeues RLB events, and enqueues HA events for OCI and ODP.NET clients
1.3.2 orarootagent
ohasd’s orarootagent:
– Performs start/stop/check/clean actions for ora.crsd, ora.ctssd, ora.diskmon,
ora.drivers.acfs, ora.crf (11.2.0.2)
crsd’s orarootagent:
– Performs start/stop/check/clean actions for GNS, VIP, SCAN VIP and network
resources
1.3.3 cssdagent / cssdmonitor
Please refer to the chapter “cssdagent and cssdmonitor”.
1.3.4 Application agent / scriptagent
Please refer to the chapter “application and scriptagent”.
1.3.5 Agent Log Files
The log files for the ohasd/crsd agents are located in
Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>/<agentname>_<owner>.log.
For example, for ora.crsd, which is managed by ohasd and owned by root, the
agent log file is named
Grid_home/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log.
The same agent log file can contain log messages for more than one resource, if those
resources are managed by the same daemon.
If an agent process crashes,
– a core file is written to Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>, and
– a call stack is written to
Grid_home/log/<hostname>/agent/{ohasd|crsd}/<agentname>_<owner>/<agentname>_<owner>OUT.log
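The path naming scheme above can be captured in a few lines. The sketch below builds the agent log path; the Grid home and hostname used in the example are placeholders.

```python
import posixpath

# Sketch of the agent log path naming scheme described above:
# Grid_home/log/<hostname>/agent/<daemon>/<agent>_<owner>/<agent>_<owner>.log

def agent_log(grid_home, hostname, daemon, agent, owner):
    """Build the agent log file path for a given daemon/agent/owner."""
    name = f"{agent}_{owner}"
    return posixpath.join(grid_home, "log", hostname, "agent", daemon,
                          name, name + ".log")

print(agent_log("/u01/app/11.2.0/grid", "node1", "ohasd",
                "orarootagent", "root"))
# /u01/app/11.2.0/grid/log/node1/agent/ohasd/orarootagent_root/orarootagent_root.log
```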
The agent log file format is the following:
<timestamp>:[<component>][<thread id>]…
<timestamp>:[<component>][<thread id>][<entry point>]…
Example:
2009-10-07 13:25:00.640: [ora.ctssd][2991836048] [check] In code translate, return
= 0, state detail = OBSERVER
2009-10-07 13:25:08.545: [ AGFW][2991836048] check for resource: ora.diskmon 1
1 completed with status: ONLINE
2009-10-07 13:25:18.231: [ora.crsd][2991836048] [check] DaemonAgent::check
returned 0
2009-10-07 13:25:18.231: [ora.crsd][2991836048] [check] CRSD Deep Check
If any error occurs, the entry points for determining what happened are:
– clusterware alert log file Grid_home/log/<hostname>/alert<hostname>.log
– OHASD/CRSD log file
Grid_home/log/<hostname>/ohasd/ohasd.log
Grid_home/log/<hostname>/crsd/crsd.log
– The corresponding agent log file.
Bear in mind that one agent log file will contain the start/stop/check entries for multiple
resources. Taking the crsd orarootagent as an example, in case of a SCAN VIP failure,
grep for the resource name, e.g. "ora.scan2.vip".
2009-11-25 06:20:24.766: [ora.scan2.vip] [check] Checking if IP 10.137.12.214 is
present on NIC eth0
2009-11-25 06:20:24.766: [ AGFW] check for resource: ora.scan2.vip 1 1
completed with status: ONLINE
2009-11-25 06:20:25.765: [ AGFW] CHECK initiated by timer for: ora.scan2.vip 1
1
2009-11-25 06:20:25.767: [ AGFW] Executing command: check for resource:
ora.scan2.vip 1 1
2009-11-25 06:20:25.768: [ora.scan2.vip] [check] Checking if IP 10.137.12.214 is
present on NIC eth0
2009-11-25 06:20:25.768: [ AGFW] check for resource: ora.scan2.vip 1 1
completed with status: ONLINE
2009-11-25 06:20:26.767: [ AGFW] CHECK initiated by timer for: ora.scan2.vip 1
1
2009-11-25 06:20:26.769: [ AGFW] Executing command: check for resource:
ora.scan2.vip 1 1
reconfig manager thread writes an eviction notification (a.k.a. a kill block)
to the voting files. The RMT also sends a shutdown message to the victim.
Voting file heartbeats are monitored for split-brain checking, and remote
nodes are not considered gone until their disk-heartbeats have ceased for
<misscount> seconds.
– Discovery thread – for voting file discovery
– Fencing thread – for communicating with the diskmon process for I/O fencing, if
EXADATA is used.
1.4.2 Voting File Cluster Membership Threads
– Disk Ping thread – (one per voting file)
o writes the current view of cluster membership along with an incarnation
number and incrementing sequence number to voting file with which it is
associated, and
o Reads the kill block to see if its host node has been evicted.
o This thread also monitors the voting-disk heartbeat for remote nodes. The
disk heartbeat information is used during reconfigurations in order to
determine whether a remote ocssd has terminated.
– Kill Block thread – (one per voting file) monitors voting file availability to ensure a
sufficient number of voting files are accessible. If Oracle redundancy is used, a majority
of the configured voting disks must be online.
– Worker thread – (new in 11.2.0.1, one per voting file) performs miscellaneous I/O to the voting files
– Disk Ping Monitor thread – monitors voting file I/O status
o This thread watches to ensure that the disk ping threads are correctly reading
their kill blocks on a majority of the configured voting files. If I/O to a voting
file cannot be performed due to an I/O hang, I/O failures or other reasons, that
voting file is taken offline. If CSS is unable to read a majority of the voting
files, it is possible that it no longer shares access to at least one disk with
each other node. Such a node could miss an eviction notice; in other words, CSS
is no longer able to cooperate and must be terminated.
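Two of the rules described above reduce to simple checks: a remote node is only treated as gone once its disk heartbeat has been silent for misscount seconds, and CSS must retain a majority of the configured voting files. The sketch below is illustrative only; the numbers are examples, not CSS internals.

```python
# Illustrative sketch of the disk-heartbeat and voting-file-majority
# rules described above. Times are epoch seconds; misscount defaults to
# the 30-second Linux value cited later in the text.

def remote_node_gone(last_dhb, now, misscount=30):
    """True once the voting-file heartbeat has ceased for >= misscount s."""
    return (now - last_dhb) >= misscount

def has_voting_majority(online_files, configured_files):
    """True while CSS can still access a majority of the voting files."""
    return online_files > configured_files // 2

print(remote_node_gone(last_dhb=100, now=125))  # False: only 25 s of silence
print(remote_node_gone(last_dhb=100, now=131))  # True: 31 s >= misscount
print(has_voting_majority(online_files=2, configured_files=3))  # True
print(has_voting_majority(online_files=1, configured_files=3))  # False
```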
1.4.3 Other Threads – Occasionally
– Node Kill threads – (transient) used for killing nodes via IPMI
– Member kill thread – (transient) used during member kill
o member-kill (monitor) thread
o local-kill thread – when a CSS client initiates a member kill, the local CSS kill thread is created
– skgxn monitor (skgxnmon only present with vendor clusterware)
o This thread registers as a member of the node group with skgxn and
watches for changes in node-group membership. When a reconfig event
occurs, this thread requests the current node-group membership bitmap
from skgxn and compares it to the bitmap it received last time and the
current values of two other bitmaps: eviction pending, which identifies
nodes that are in the process of going down, and VMON’s group
membership, which indicates nodes whose oclsmon process is still running
(nodes that are (still) up). When a membership transition is identified, the
node-monitor thread initiates the appropriate action.
1.4.4 Other CSS trivia
In Oracle Clusterware 11g release 2 (11.2) there are diminished configuration requirements,
meaning nodes are added back automatically when started and deleted if they have been
down for too long. Unpinned servers that stop for longer than a week are no longer
reported by olsnodes. These servers are automatically administered when they leave the
cluster, so you do not need to explicitly remove them from the cluster.
1.4.4.1 Pinning nodes
The appropriate command to change the node pin behavior (i.e. to pin or unpin any specific
node), is the crsctl pin/unpin css command. Pinning a node means that the association of a
node name with a node number is fixed. If a node is not pinned, its node number may
change if the lease expires while it is down. The lease of a pinned node never expires.
Deleting a node with the crsctl delete node command implicitly unpins the node.
– During upgrade of Oracle Clusterware, all servers are pinned, whereas after a fresh
installation of Oracle Clusterware 11g release 2 (11.2), all servers you add to the
cluster are unpinned.
– You cannot unpin a server that has an instance of Oracle RAC that is older than
Oracle Clusterware 11g release 2 (11.2) if you installed Oracle Clusterware 11g
release 2 (11.2) on that server.
Pinning a node is required for a rolling upgrade to Oracle Clusterware 11g release 2 (11.2) and
is done automatically. We have seen cases where customers performed a manual
upgrade that failed due to unpinned nodes.
1.4.4.2 Port assignment
The fixed port assignment for the CSS and node monitor has been removed, so there should
be no contention with other applications for ports. The only exception is during rolling
upgrade where we assign two fixed ports.
1.4.4.3 GIPC
The CSS layer uses the new Grid IPC (GIPC) communication layer and still supports
interaction with the pre-11.2 CLSC communication layer. In 11.2.0.2, GIPC will support the
use of multiple NICs for a single communications link, e.g. for CSS/NM internode
communications.
1.4.4.4 Cluster alert.log
More cluster_alert.log messages have been added to allow faster location of entries
associated with a problem. An identifier will be printed in both the alert.log and the daemon
log entries that are linked to the problem. The identifier will be unique within the
component, e.g. CSS or CRS.
2009-11-24 03:46:21.110
[crsd(27731)]CRS-2757:Command 'Start' timed out waiting for response from the
resource 'ora.stnsp006.vip'. Details at (:CRSPE00111:) in
/scratch/grid_home_11.2/log/stnsp005/crsd/crsd.log.
2009-11-24 03:58:07.375
[cssd(27413)]CRS-1605:CSSD voting file is online: /dev/sdj2; details in
/scratch/grid_home_11.2/log/stnsp005/cssd/ocssd.log .
1.4.4.5 Exclusive mode
A new concept in Oracle Clusterware 11g release 2 (11.2) is the Clusterware exclusive mode.
This mode allows you to start the stack on one node with virtually nothing required: no
voting files and no network connectivity are needed. This mode is for maintenance or
troubleshooting only. Because this is a user-invoked command, the user must ensure that
only one node is up at a time. The command to start the stack in this mode is
crsctl start crs -excl; only the root user can execute it, and it should be run on one node only.
If another node is already up in the cluster, the exclusive startup will fail. The ocssd daemon
checks for active nodes and, if it finds one, the startup fails with CRS-4402. This is not an
error; it is the expected behaviour when another node is already up. John Leys said "do
not file bugs because you receive CRS-4402".
1.4.4.6 Voting file discovery
The method of identifying voting files has changed in 11.2. While voting files were
configured in OCR in 11.1 and earlier, in 11.2 voting files are located via the CSS voting file
discovery string in the GPNP profile. Examples:
1.4.4.6.1 CSS voting file discovery string referring to ASM
The CSS voting file discovery string refers to ASM, so the value of the ASM discovery
string is used. You will most commonly see this configuration on systems (e.g. Linux with
older 2.6 kernels) where raw devices can still be configured, and where raw devices are used
for the LUNs to be used by CRS and ASM.
Example:
<orcl:CSS-Profile id="css"
DiscoveryString="+asm"
LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="" SPFile=""/>
The empty value for the ASM discovery string means that it will revert to an OS-specific
default, which on Linux is “/dev/raw/raw*”.
1.4.4.6.2 CSS voting file discovery string referring to a list of LUNs/disks
In the example below, the CSS voting file discovery string refers to a list of disks/LUNs.
This is the likely configuration when block devices or devices in non-default locations
are used. In that scenario, the values of the CSS voting file discovery string and the ASM
discovery string are identical.
<orcl:CSS-Profile id="css"
DiscoveryString="/dev/shared/sdsk-a[123]-*-part8"
LeaseDuration="400"/>
<orcl:ASM-Profile id="asm"
DiscoveryString="/dev/shared/sdsk-a[123]-*-part8"
SPFile=""/>
Several voting file identifiers must be found on a disk for it to be accepted as a voting
file: a unique identifier for the file, the cluster GUID, and a matching configuration
incarnation number (CIN). vdpatch can be used to inspect whether a device is a voting file.
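The acceptance rule above boils down to checking three identifiers. The sketch below is purely illustrative: the field names and header layout are hypothetical stand-ins (the real on-disk format is internal to CSS and vdpatch); only the three-identifier rule comes from the text.

```python
# Illustrative sketch of the acceptance rule described above: a device is
# treated as a voting file only if all three identifiers match. The
# header dict and its field names are hypothetical, not the real layout.

def is_voting_file(header, cluster_guid, expected_cin):
    """Check the unique file identifier, cluster GUID and CIN."""
    return (header.get("vf_magic") == "VOTINGFILE"       # unique identifier
            and header.get("cluster_guid") == cluster_guid
            and header.get("cin") == expected_cin)       # config incarnation

hdr = {"vf_magic": "VOTINGFILE", "cluster_guid": "abc-123", "cin": 7}
print(is_voting_file(hdr, "abc-123", 7))  # True
print(is_voting_file(hdr, "abc-123", 8))  # False: stale CIN
```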
1.4.5 CSS lease
Lease acquisition is the mechanism through which a node acquires a node number. A lease
denotes that a node owns the associated node number for a period defined by the lease
duration, which is hardcoded in the GPnP profile to one week. A node owns the lease for
the lease duration from the time of the last lease renewal, and a lease is considered
renewed with every disk heartbeat (DHB). Hence the lease expiry is defined as:
lease expiry time = last DHB time + lease duration.
There are two types of leases.
– Pinned leases
A node uses a hardcoded static node number. A pinned lease is used in upgrade
scenarios involving older-version clusterware that uses static node numbers.
– Unpinned leases
A node acquires a node number dynamically using a lease acquisition algorithm.
The algorithm is designed to resolve conflicts among nodes which try to acquire the
same slot at the same time.
For a successful lease operation, the message below is written to
Grid_home/log/<hostname>/alert<hostname>.log:
[cssd(8433)]CRS-1707:Lease acquisition for node staiv10 number 5 completed
For a lease acquisition failure, an appropriate message is also written to alert<hostname>.log
and ocssd.log. In the current release there are no tunables for the lease duration.
1.4.6 Split Brain Resolution
This chapter describes the main components and techniques used to resolve split
brain situations.
1.4.6.1 Heartbeats
The CSS uses two main heartbeat mechanisms for cluster membership, the network
heartbeat (NHB) and the disk heartbeat (DHB). The heartbeat mechanisms are intentionally
redundant and they are used for different purposes. The NHB is used for the detection of
loss of cluster connectivity, whereas the DHB is mainly used for network split brain
resolution. Each cluster node must participate in the heartbeat protocols in order to be
considered a healthy member of the cluster.
1.4.6.1.1 Network Heartbeat (NHB)
The NHB is sent over the private network interface that was configured as the private
interconnect during Clusterware installation. CSS sends an NHB every second from each node
to all the other nodes in a cluster, and every second receives an NHB from each remote
node. The NHB is also sent to the cssdmonitor and the cssdagent.
The NHB contains timestamp information from the local node, and is used by the remote
node to determine when the NHB was sent. It indicates that a node can participate in cluster
activities, e.g. group membership changes, message sends etc. If the NHB is missing for
<misscount> seconds (30 seconds on Linux in 11.2), a cluster membership change (cluster
reconfiguration) is required. The loss of network connectivity is not necessarily fatal if
connectivity is restored in less than <misscount> seconds.
To debug NHB issues, it is sometimes useful to increase the ocssd log level to 3 to see each
heartbeat message. Run the crsctl set log command as root user on each node:
# crsctl set log css ocssd:3
Monitor the largest misstime value in milliseconds to see whether the misstime is growing
towards misscount, which would indicate network problems.
# tail -f ocssd.log | grep -i misstime
2009-10-22 06:06:07.275: [ ocssd][2840566672]clssnmPollingThread: node 2,
stnsp006, ninfmisstime 270, misstime 270, skgxnbit 4, vcwmisstime 0, syncstage 0
2009-10-22 06:06:08.220: [ ocssd][2830076816]clssnmHBInfo: css timestmp
1256205968 220 slgtime 246596654 DTO 28030 (index=1) biggest misstime 220 NTO
28280
2009-10-22 06:06:08.277: [ ocssd][2840566672]clssnmPollingThread: node 2,
stnsp006, ninfmisstime 280, misstime 280, skgxnbit 4, vcwmisstime 0, syncstage 0
2009-10-22 06:06:09.223: [ ocssd][2830076816]clssnmHBInfo: css timestmp
1256205969 223 slgtime 246597654 DTO 28030 (index=1) biggest misstime 1230 NTO
28290
2009-10-22 06:06:09.279: [ ocssd][2840566672]clssnmPollingThread: node 2,
stnsp006, ninfmisstime 270, misstime 270, skgxnbit 4, vcwmisstime 0, syncstage 0
2009-10-22 06:06:10.226: [ ocssd][2830076816]clssnmHBInfo: css timestmp
1256205970 226 slgtime 246598654 DTO 28030 (index=1) biggest misstime 2785 NTO
28290
To display the value of the current misscount setting, use the command crsctl get css
misscount. A misscount setting other than the default is not supported; customers
with more stringent HA requirements should contact Oracle Support / Development.
1.4.6.1.2 Disk Heartbeat (DHB)
Apart from the NHB, CSS uses the DHB, which is required for split brain resolution. It contains a
timestamp of the local time in UNIX epoch seconds, as well as a millisecond timer.
The DHB is the definitive mechanism for deciding whether a node is still
alive. When the DHB is missing for too long, the node is assumed to be dead. When
connectivity to the disk is lost for 'too long', the disk is considered offline.
The definition of 'too long' for the DHB depends on the following timeouts. First, the
Long disk I/O Timeout (LIOT), which has a default of 200 seconds: if an I/O to a voting file
cannot complete within that time, the voting file is taken offline.
Second, the Short disk I/O Timeout (SIOT), which CSS uses during a cluster reconfiguration.
The SIOT is derived from misscount (misscount (30) – reboottime (3) = 27 sec.); the default
reboottime is 3 seconds. To display the value of the disktimeout parameter for CSS, use the
command crsctl get css disktimeout.
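The timeout arithmetic above can be sketched in shell. This is illustrative only, not Oracle code; the values are the 11.2 Linux defaults quoted in the text (on a real cluster they come from crsctl get css misscount and crsctl get css reboottime):

```shell
#!/bin/sh
# Derive the Short disk I/O Timeout (SIOT) from misscount and reboottime,
# using the documented 11.2 Linux defaults.
MISSCOUNT=30    # seconds without NHB before eviction (Linux default in 11.2)
REBOOTTIME=3    # seconds allowed for a node reboot (default)
SIOT=$((MISSCOUNT - REBOOTTIME))
echo "SIOT=${SIOT} seconds"    # 27 seconds with the defaults
```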
1.4.6.2 Network Split Detection
The timestamp of the last NHB is compared to the timestamp of the most recent DHB to
determine whether a node is still alive.
When the delta between the timestamps of the most recent DHB and the last NHB is greater
than the SIOT (misscount – reboottime), the node is considered still active.
When the delta between the timestamps is less than reboottime, the node is considered still
alive.
If the last DHB was read more than SIOT seconds ago, the node is considered dead (see
bug 5949311). If the delta between the timestamps is greater than reboottime but less than the SIOT, the
status of the node is unclear, and the decision must wait until the node falls into one of
the three categories above.
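The DHB staleness part of these rules can be sketched as a small shell function. The function name and argument order are illustrative, not Oracle code; it classifies a node by the age, in seconds, of its last disk heartbeat:

```shell
#!/bin/sh
# dhb_verdict <seconds_since_last_DHB> <SIOT> <reboottime>
# Hedged sketch of the DHB staleness categories described in the text.
dhb_verdict() {
    age=$1; siot=$2; rbt=$3
    if [ "$age" -gt "$siot" ]; then
        echo "dead"          # no DHB for more than SIOT
    elif [ "$age" -lt "$rbt" ]; then
        echo "alive"         # DHB is fresher than reboottime
    else
        echo "unclear"       # between reboottime and SIOT: wait
    fi
}
dhb_verdict 30 27 3    # dead
dhb_verdict 1 27 3     # alive
dhb_verdict 10 27 3    # unclear
```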
When the network fails and nodes that are still up cannot communicate with each other, the
network is considered split. To maintain data integrity when a split occurs, one of the nodes
must fail and the surviving nodes should be an optimal sub-cluster of the original cluster.
Nodes that are not to survive are evicted in one of three possible ways:
– via an eviction message sent through the network; in most cases this will fail
because of the existing network failure
– via the voting file (the kill block)
– via IPMI, if supported and configured
To explain this in more detail we use the following example for a cluster with nodes A, B, C
and D:
– Nodes A and B receive each other's heartbeats
– Nodes C and D receive each other's heartbeats
– Nodes A and B cannot see heartbeats of C or D
– Nodes C and D cannot see heartbeats of A or B
– Nodes A and B are one cohort, C and D are another cohort
– Split begins when 2 cohorts stop receiving NHB’s from each other
CSS assumes a symmetric failure, i.e. the cohort of A+B stops receiving NHB’s from the
cohort of C+D at the same time that C+D stop receiving NHB’s from A+B.
In scenarios like this, CSS uses the voting file and DHB for split brain resolution. The kill
block, which is one part of the voting file structure, will be updated and used to notify nodes
that they have been evicted. Each node reads its kill block every second and will commit
suicide after another node has updated its kill block section.
In cases like the above, where the sub-clusters are of similar size, the sub-cluster
containing the node with the lowest node number will survive and the nodes of the other
sub-cluster will reboot.
In case of a split in a larger cluster, the bigger sub-cluster will survive. In the two-node
cluster case, the node with the lower node number will survive a network split,
independent of where the network error occurred.
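The survivor selection described above can be sketched in shell (the function and names are illustrative, not Oracle code): the larger cohort wins, and on a tie the cohort containing the lowest node number wins:

```shell
#!/bin/sh
# survivor <sizeA> <lowestNodeA> <sizeB> <lowestNodeB>
# Prints which cohort (A or B) survives the split, per the rules in the text.
survivor() {
    if [ "$1" -gt "$3" ]; then echo "A"       # bigger sub-cluster survives
    elif [ "$3" -gt "$1" ]; then echo "B"
    elif [ "$2" -lt "$4" ]; then echo "A"     # equal size: lowest node number wins
    else echo "B"
    fi
}
survivor 2 1 2 3    # A: equal size, cohort A contains node 1
survivor 1 1 3 2    # B: bigger sub-cluster survives
```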
Connectivity to a majority of voting files is required for a node to stay active.
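The voting-file majority requirement is simple integer arithmetic; with N voting files a node must see at least floor(N/2)+1 of them. A sketch, not Oracle code:

```shell
#!/bin/sh
# majority <number_of_voting_files> : smallest count that is a strict majority
majority() { echo $(( $1 / 2 + 1 )); }
majority 1    # 1
majority 3    # 2
majority 5    # 3
```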
1.4.7 Member Kill Architecture
The kill daemon in 11.2.0.1 is an unprivileged process that kills members of CSS groups. It is
spawned by the ocssd library code when an I/O capable client joins a group, and it is
respawned when required. There is ONE kill daemon (oclskd) per user (e.g. crsowner,
oracle).
1.4.7.1 Member kill description
The following ocssd threads are involved in member kill / member kill escalation:
– client_listener – receives group join and kill requests
– peer_listener – receives kill requests from remote nodes
– death_check – provides confirmation of termination
– member_kill – spawned to manage a member kill request
– local_kill – spawned to carry out member kills on local node
– node termination – spawned to carry out escalation
Member kills are issued by clients who want to eliminate group members doing IO, for
example:
– LMON of the ASM instance
– LMON of a database instance
– crsd on Policy Engine (PE) master node (new in 11.2)
Member kills always involve a remote target; either a remote ASM or database instance, or
a remote, non-PE master crsd. The member kill request is handed over to the local ocssd,
which then sends the request to the ocssd on the target node. In 11.1 and 11.2.0.1, ocssd hands
over the process IDs of the primary and shared members of the group to be killed to
oclskd, which then performs a kill -9 on these processes. In 11.2.0.2 and later, the kill
daemon runs as a thread in the cssdagent and cssdmonitor processes, so there is no longer a
running oclskd.bin process. The kill daemon / thread registers with CSS separately in
the KILLD group.
In some situations, and more likely in 11.2.0.1 and earlier, such as extreme CPU and memory
starvation, the remote node's kill daemon or remote ocssd cannot service the local ocssd’s
member kill request in time (misscount seconds), and therefore the member kill request will
time out. If LMON (ASM and/or RDBMS) requested the member kill, then the request will be
escalated by the local ocssd to a remote node kill. A member kill request by crsd will never
be escalated to a node kill; instead, we rely on the orarootagent's check action to detect the
dysfunctional crsd and restart it. The target node's ocssd will receive the member kill
escalation request and will commit suicide, thereby forcing a node reboot.
With the kill daemon running as real-time thread in cssdagent/cssdmonitor (11.2.0.2),
there's a higher chance that the kill request succeeds despite high system load.
If IPMI is configured and functional, the ocssd node monitor will spawn a node termination
thread to shut down the remote node using IPMI. The node termination thread
communicates with the remote BMC via the management LAN; it will establish an
authenticated session (only a privileged user can shut down a node) and check the power
status. The next step is to request a power-off and repeatedly check the status until
the node status is OFF. After receiving the OFF status, the remote node is powered on
again, and the node termination thread exits.
1.4.7.2 Member kill example:
LMON of database instance 3 issues a member kill for the instance on node 2 due to CPU
starvation:
2009-10-21 12:22:03.613810 : kjxgrKillEM: schedule kill of inst 2 inc 20
in 20 sec
2009-10-21 12:22:03.613854 : kjxgrKillEM: total 1 kill(s) scheduled
kgxgnmkill: Memberkill called - group: DBPOMMI, bitmap: 1
2009-10-21 12:22:22.151: [ CSSCLNT]clssgsmbrkill: Member kill request: Members map
0x00000002
2009-10-21 12:22:22.152: [ CSSCLNT]clssgsmbrkill: Success from kill call rc 0
The local ocssd (third node, internal node number 2) receives the member kill request:
2009-10-21 12:22:22.151: [ ocssd][2996095904]clssgmExecuteClientRequest: Member
kill request from client (0x8b054a8)
2009-10-21 12:22:22.151: [ ocssd][2996095904]clssgmReqMemberKill: Kill
requested map 0x00000002 flags 0x2 escalate 0xffffffff
2009-10-21 12:22:22.152: [ ocssd][2712714144]clssgmMbrKillThread: Kill
requested map 0x00000002 id 1 Group name DBPOMMI flags 0x00000001 start time
0x91794756 end time 0x91797442 time out 11500 req node 2
DBPOMMI is the database group where LMON registers as the primary member
time out = misscount (in milliseconds) + 500 ms
map = 0x2 = 0010 = second member = member 1 (another example: map = 0x7 = 0111 =
members 0, 1, 2)
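The bitmap decoding shown above can be expressed as a small shell helper (the function name is illustrative, not Oracle code); it lists the member numbers set in a CSS kill map:

```shell
#!/bin/sh
# decode_member_map <hex_or_decimal_map> : print the member numbers set in the map
decode_member_map() {
    map=$(( $1 )); bit=0; out=""
    while [ "$map" -ne 0 ]; do
        if [ $(( map & 1 )) -eq 1 ]; then out="$out $bit"; fi
        map=$(( map >> 1 )); bit=$(( bit + 1 ))
    done
    echo "members:$out"
}
decode_member_map 0x2    # members: 1
decode_member_map 0x7    # members: 0 1 2
```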
The remote ocssd on the target node (second node, internal node number 1) receives the
request and submits the PIDs to the kill daemon:
2009-10-21 12:22:22.201: [ ocssd][3799477152]clssgmmkLocalKillThread: Local
kill requested: id 1 mbr map 0x00000002 Group name DBPOMMI flags 0x00000000 st
time 1088320132 end time 1088331632 time out 11500 req node 2
2009-10-21 12:22:22.201: [ ocssd][3799477152]clssgmmkLocalKillThread: Kill
requested for member 1 group (0xe88ceda0/DBPOMMI)
2009-10-21 12:22:22.201: [ ocssd][3799477152]clssgmUnreferenceMember: global
grock DBPOMMI member 1 refcount is 7
2009-10-21 12:22:22.201: [ ocssd][3799477152]GM Diagnostics started for
mbrnum/grockname: 1/DBPOMMI
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DBPOMMI, member 1 (client
0xe330d5b0, pid 23929)
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DBPOMMI, member 1 (client
0xe331fd68, pid 23973) sharing group DBPOMMI, member 1, share type normal
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0
(client 0x89f7858, pid 23957) sharing group DBPOMMI, member 1, share type xmbr
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DBPOMMI, member 1 (client
0x8a1e648, pid 23949) sharing group DBPOMMI, member 1, share type normal
2009-10-21 12:22:22.201: [ ocssd][3799477152]group DBPOMMI, member 1 (client
0x89e7ef0, pid 23951) sharing group DBPOMMI, member 1, share type normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DBPOMMI, member 1 (client
0xe8aabbb8, pid 23947) sharing group DBPOMMI, member 1, share type normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0
(client 0x8a23df0, pid 23949) sharing group DG_LOCAL_POMMIDG, member 0, share type
normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0
(client 0x8a25268, pid 23929) sharing group DG_LOCAL_POMMIDG, member 0, share type
normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0
(client 0x89e9f78, pid 23951) sharing group DG_LOCAL_POMMIDG, member 0, share type
normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]group DG_LOCAL_POMMIDG, member 0
(client 0xe8ab5cc0, pid 23947) sharing group DG_LOCAL_POMMIDG, member 0, share
type normal
2009-10-21 12:22:22.202: [ ocssd][3799477152]GM Diagnostics completed for
mbrnum/grockname: 1/DBPOMMI
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23929
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23973
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23957
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23949
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23951
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23947
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23949
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23929
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23951
2009-10-21 12:22:22.202: [ ocssd][3799477152]clssgmmkLocalSendKD: Copy pid
23947
At this point, the oclskd.log should indicate the successful kill of these processes, and
thereby the completion of the kill request. In 11.2.0.2 and later, the kill daemon thread will
perform the kill:
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsnkillagent_main:killreq
received:
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23929
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23973
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23957
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23949
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23951
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23947
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23949
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23929
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23951
2009-10-21 12:22:22.295: [ USRTHRD][3980221344] clsskdKillMembers: kill status 0
pid 23947
However, if the request does not complete within (misscount + 0.5) seconds, the ocssd on the
local node escalates the request to a node kill:
2009-10-21 12:22:33.655: [ ocssd][2712714144]clssgmMbrKillThread: Time up:
Start time -1854322858 End time -1854311358 Current time -1854311358 timeout 11500
2009-10-21 12:22:33.655: [ ocssd][2712714144]clssgmMbrKillThread: Member kill
request complete.
2009-10-21 12:22:33.655: [ ocssd][2712714144]clssgmMbrKillSendEvent: Missing
answers or immediate escalation: Req member 2 Req node 2 Number of answers
expected 0 Number of answers outstanding 1
2009-10-21 12:22:33.656: [ ocssd][2712714144]clssgmQueueGrockEvent:
groupName(DBPOMMI) count(4) master(0) event(11), incarn 0, mbrc 0, to member 2,
events 0x68, state 0x0
2009-10-21 12:22:33.656: [ ocssd][2712714144]clssgmMbrKillEsc: Escalating node
1 Member request 0x00000002 Member success 0x00000000 Member failure 0x00000000
Number left to kill 1
2009-10-21 12:22:33.656: [ ocssd][2712714144]clssnmKillNode: node 1 (staiu02)
kill initiated
2009-10-21 12:22:33.656: [ ocssd][2712714144]clssgmMbrKillThread: Exiting
ocssd on the target node will abort, forcing a node reboot:
2009-10-21 12:22:33.705: [ ocssd][3799477152]clssgmmkLocalKillThread: Time up.
Timeout 11500 Start time 1088320132 End time 1088331632 Current time 1088331632
2009-10-21 12:22:33.705: [ ocssd][3799477152]clssgmmkLocalKillResults: Replying
to kill request from remote node 2 kill id 1 Success map 0x00000000 Fail map
0x00000000
2009-10-21 12:22:33.705: [ ocssd][3799477152]clssgmmkLocalKillThread: Exiting
...
2009-10-21 12:22:34.679: [ ocssd][3948735392](:CSSNM00005:)clssnmvDiskKillCheck: Aborting, evicted by node 2,
sync 151438398, stamp 2440656688
2009-10-21 12:22:34.679: [ ocssd][3948735392]###################################
2009-10-21 12:22:34.679: [ ocssd][3948735392]clssscExit: ocssd aborting from
thread clssnmvKillBlockThread
2009-10-21 12:22:34.679: [ ocssd][3948735392]###################################
1.4.7.3 How to identify the client that originally requested the member kill
From the ocssd.log, the requestor can also be derived:
2009-10-21 12:22:22.151: [ocssd][2996095904]clssgmExecuteClientRequest: Member
kill request from client (0x8b054a8)
<search backwards to when client registered>
2009-10-21 12:13:24.913: [ocssd][2996095904]clssgmRegisterClient:
proc(22/0x8a5d5e0), client(1/0x8b054a8)
<search backwards to when process connected to ocssd>
2009-10-21 12:13:24.897: [ocssd][2996095904]clssgmClientConnectMsg: Connect from
con(0x677b23) proc(0x8a5d5e0) pid(20485/20485) version 11:2:1:4, properties:
1,2,3,4,5
Using 'ps', or from other history (e.g. trace file, IPD/OS, OSWatcher), the process can be
identified via the process id:
$ ps -ef|grep ora_lmon
spommere 20485 1 0 01:46 ? 00:01:15 ora_lmon_pommi_3
1.4.8 Intelligent Platform Management Interface (IPMI)
Intelligent Platform Management Interface (IPMI) is an industry standard management
protocol that is included with many servers today. IPMI operates independently of the
operating system, and can operate even if the system is not powered on. Servers with IPMI
contain a baseboard management controller (BMC), which is used to communicate with the
server.
1.4.8.1 About Using IPMI for Node Fencing
To support the member-kill escalation to node-termination, you must configure and use an
external mechanism capable of restarting a problem node without cooperation, either from
Oracle Clusterware or from the operating system running on that node. IPMI is such a
mechanism, supported starting with 11.2. Normally, node termination using IPMI is
configured during installation, when the Failure Isolation Support screen offers the
option of configuring IPMI. If IPMI is not configured during installation, it can be
configured using crsctl after the installation of CRS is complete.
1.4.8.2 About Node-termination Escalation with IPMI
To use IPMI for node termination, each cluster member node must be equipped with a
Baseboard Management Controller (BMC) running firmware compatible with IPMI version
1.5, which supports IPMI over a local area network (LAN). During database operation, member-kill escalation is accomplished by communication from the evicting ocssd daemon
to the victim node’s BMC over LAN. The IPMI over LAN protocol is carried over an
authenticated session protected by a user name and password, which are obtained from the
administrator during installation. If the BMC IP addresses are DHCP assigned, ocssd requires
direct communication with the local BMC during CSS startup. This is accomplished using a
BMC probe command (OSD), which communicates with the BMC through an IPMI driver,
which must be installed and loaded on each cluster system.
1.4.8.3 OLR Configuration for IPMI
There are two ways to configure IPMI, either during the Oracle Clusterware installation via
the Oracle Universal Installer or afterwards via crsctl.
OUI – asks about node-fencing via IPMI
– tests for driver to enable full support (DHCP addresses)
– obtains IPMI username and password and configures OLR on all cluster nodes
Manual configuration - after install or when using static IP addresses for BMCs
– crsctl query css ipmidevice
– crsctl set css ipmiadmin <ipmi-admin>
– crsctl set css ipmiaddr
See Also: Oracle Clusterware Administration and Deployment Guide, “Configuration and
Installation for Node Fencing" for more information and Oracle Grid Infrastructure
Installation Guide, “Enabling Intelligent Platform Management Interface (IPMI)”
1.4.9 Debugging CSS
Sometimes it is necessary to change the default logging level for ocssd.
The default logging level for ocssd in 11.2 is 2. In order to change the logging level, run the
following command as root user on a node with the clusterware stack up:
# crsctl set log css CSSD:N (where N is the logging level)
– Logging level 2 = default
– Logging level 3 = verbose e.g. displays each heartbeat message including the
misstime which can be helpful debugging NHB related problems
– Logging level 4 = super verbose
Most problems can be solved with level 2; some require level 3, and few require level 4. At
level 3 or 4, trace information may only be kept for a few hours (or even minutes) because
the trace files fill up and information is overwritten. Please note that a high logging
level incurs a performance impact on ocssd due to the amount of tracing. If you need to
keep data for a longer period of time, create a cron job to back up and compress the CSS
logs.
To trace the cssdagent or the cssdmonitor, the following enhanced tracing can be set
via crsctl:
# crsctl set log res ora.cssd=2 -init
# crsctl set log res ora.cssdmonitor=2 -init
In Oracle Clusterware 11g release 2 (11.2), CSS prints the stack dump into the cssdOUT.log.
There are enhancements which will help to flush diagnostic data to disk before a reboot
occurs. So in 11.2 we don’t consider it necessary to change the diagwait (default 0) unless
advised by support or development.
In very rare cases, and only during debugging, it might be necessary to disable ocssd
reboots. This can be done via the crsctl commands below. Disabling reboots should only be done
when instructed by support or development, and can be done online without a clusterware
stack restart.
# crsctl modify resource ora.cssd -attr "ENV_OPTS=DEV_ENV" -init
# crsctl modify resource ora.cssdmonitor -attr "ENV_OPTS=DEV_ENV" -init
Starting with 11.2.0.2, it is possible to set higher log levels for individual modules.
To list all the module names for the css daemon, the following command should be used:
# crsctl lsmodules css
List CSSD Debug Module: CLSF
List CSSD Debug Module: CSSD
List CSSD Debug Module: GIPCCM
List CSSD Debug Module: GIPCGM
List CSSD Debug Module: GIPCNM
List CSSD Debug Module: GPNP
List CSSD Debug Module: OLR
List CSSD Debug Module: SKGFD
CLSF and SKGFD - are related to the I/O layer to the voting disks
CSSD - the main CSS daemon trace module (same as in previous releases)
GIPCCM - gipc communication between applications and CSS
GIPCGM - communication between peers in the GM layer
GIPCNM - communication between nodes in the NM layer
GPNP - trace for gpnp calls within CSS
OLR - trace for olr calls within CSS
The following example shows how to set different trace levels for various modules:
# crsctl set log css GIPCCM=1,GIPCGM=2,GIPCNM=3
# crsctl set log css CSSD=4
To check which trace levels are currently set, the following commands can be used:
# crsctl get log ALL
# crsctl get log css GIPCCM
1.4.10 CSSDAGENT and CSSDMONITOR
The cssdagent and cssdmonitor provide almost the same functionality. The cssdagent (represented by the ora.cssd resource) starts, stops, and checks the status of the ocssd
daemon. The cssdmonitor (represented by the ora.cssdmonitor resource) monitors the
cssdagent. There is no ora.cssdagent resource, and there is no resource for the ocssd
daemon.
Both agents implement the functionality of several pre-11.2 daemons, such as oprocd
and oclsomon; the thread that implements the oclsvmon functionality runs in either process,
not both. The cssdagent and cssdmonitor run at real-time priority with locked-down
memory, just like ocssd.
In addition, the cssdagent and cssdmonitor provide the following services to guarantee data
integrity:
– Monitoring ocssd; if ocssd fails, then cssd* reboot the node.
– Monitoring the node scheduling: if node is hung / not scheduled, reboot the node.
To make more comprehensive decisions about whether a reboot is required, both cssdagent and
cssdmonitor receive state information from ocssd via the NHB, to ensure that the state of the
local node as perceived by remote nodes is accurate. Furthermore, the integration
leverages the time before other nodes perceive the local node to be down for purposes such
as a filesystem sync, to obtain complete diagnostic data.
1.4.10.1 CSSDAGENT and CSSDMONITOR debugging
To enable ocssd agent debugging, use the command crsctl set log res ora.cssd=3 -init.
The operation is logged in
Grid_home/log/<hostname>/agent/ohasd/oracssdagent_root/oracssdagent_root.log, and
immediately more trace information is written to the oracssdagent_root.log.
2009-11-25 10:00:52.386: [ AGFW][2945420176] Agent received the message:
RESOURCE_MODIFY_ATTR[ora.cssd 1 1] ID 4355:106099
2009-11-25 10:00:52.387: [ AGFW][2966399888] Executing command:
res_attr_modified for resource: ora.cssd 1 1
2009-11-25 10:00:52.387: [ USRTHRD][2966399888] clsncssd_upd_attr: setting trace
to level 3
2009-11-25 10:00:52.388: [ CSSCLNT][2966399888]clssstrace: trace level set to 2
2009-11-25 10:00:52.388: [ AGFW][2966399888] Command: res_attr_modified for
resource: ora.cssd 1 1 completed with status: SUCCESS
2009-11-25 10:00:52.388: [ AGFW][2945420176] Attribute: LOGGING_LEVEL for
resource ora.cssd modified to: 3
2009-11-25 10:00:52.388: [ AGFW][2945420176] config version updated to : 7 for
ora.cssd 1 1
2009-11-25 10:00:52.388: [ AGFW][2945420176] Agent sending last reply for:
RESOURCE_MODIFY_ATTR[ora.cssd 1 1] ID 4355:106099
2009-11-25 10:00:52.484: [ CSSCLNT][3031063440]clssgsgrpstat: rc 0, gev 0, incarn
2, mc 2, mast 1, map 0x00000003, not posted
The same applies for the cssdmonitor (ora.cssdmonitor) resource.
1.4.11 Concepts
1.4.11.1 HEARTBEATS
– Disk HeartBeat (DHB) is written to the voting file periodically, once per second
– Network HeartBeat (NHB) is sent to the other nodes periodically, once per second
– Local HeartBeat (LHB) is sent to the agent/monitor periodically, once per second
1.4.11.2 ocssd threads
– Sending Thread (ST) sends NHBs and LHBs (at the same time)
– Disk Ping thread writes DHBs to the voting files (one thread per voting file)
– Cluster Listener (CLT) receives messages from other nodes, mostly NHBs
1.4.11.3 Agent/Monitor threads
– HeartBeat thread (HBT) receives LHB from ocssd and detects connection failures
– OMON thread (OMT) monitors for connection failure and state of its local peer
– OPROCD thread (OPT) monitors scheduling of agent/monitor processes
– VMON thread (VMT) replaces clssvmon executable, registers in skgxn group when
vendor clusterware present
1.4.11.4 Timeouts
– Misscount (MC) amount of time with no NHB from a node before removing the
node from the cluster
– Network Time Out (NTO) maximum time remaining with no NHB from a node
before removing the node from the cluster
– Disk Time Out (DTO) maximum time left before a majority of voting files are
considered inaccessible
– ReBoot Time (RBT) the amount of time allowed for a reboot; historically had to
account for init script latencies in rebooting. The default is 3 seconds.
1.4.11.5 Misscount, SIOT, RBT
– Disk I/O Timeout amount of time for a voting file to be offline before it is unusable
o SIOT – Short I/O Timeout, in effect during reconfig
o LIOT – Long I/O Timeout, in effect otherwise
– Long I/O Timeout – (LIOT) is configurable via ‘crsctl set css disktimeout’ and the
default is 200 seconds
– Short I/O Timeout (SIOT) is (misscount – reboot time)
o In effect when NHBs have been missed for misscount/2
o ocssd terminates if no DHB for SIOT
o Allows RBT seconds after termination for reboot to complete
1.4.11.6 Disk Heartbeat Perceptions
– Other node perception of local state in reconfig
o No NHB for misscount, node not visible on network
o No DHB for SIOT, node not alive
o If node alive, wait full misscount for DHB activity to be missing, i.e. node
not alive
– As long as DHB’s are written, other nodes must wait
– Perception of local state by other nodes must be valid to avoid data corruption
1.4.11.7 Disk Heartbeat Relevance
– DHB only read starting shortly before a reconfig to remove the node is started
– When no reconfig is impending, the I/O timeout is not important, so it need not be
monitored
– If the disk timeout expires, but the NHB’s have been sent to and received from
other nodes, it will still be misscount seconds before other nodes will start a
reconfig
– The proximity to a reconfig is important state information for OPT
1.4.11.8 Clocks
– Time Of Day Clock (TODC) the clock that indicates the hour/minute/second of the
day (may change as a result of commands)
o aTODC is the agent TODC
o cTODC is the ocssd TODC
– Invariant Time Clock (ITC) a monotonically increasing clock that is invariant (i.e. it does
not change as a result of commands). The invariant clock does not change if time is set
backwards or forwards; its rate is always constant.
o aITC is the agent ITC
o cITC is the ocssd ITC
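On Linux the two clock types can be observed directly from the shell. This is an illustration of the semantics, not what ocssd actually calls internally: the time-of-day clock can jump when the clock is set, while boot-relative time only moves forward:

```shell
#!/bin/sh
# Illustration of TODC vs ITC semantics on Linux (not Oracle code):
date +%s                        # TODC-like: epoch seconds, jumps if the clock is set
cut -d' ' -f1 /proc/uptime      # ITC-like: seconds since boot, monotonically increasing
```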
1.4.12 How It Works
ocssd state information contains the current clock information, the network time out (NTO)
based on the node with the longest time since the last NHB and a disk I/O timeout based on
the amount of time since the majority of voting files was last online. The sending thread
gathers this current state information and sends both a NHB and local heartbeat to ensure
that the agent perception of the aliveness of ocssd is the same as that of other nodes.
The cluster listener thread monitors the sending thread. It ensures the sending thread has
been scheduled recently and wakes it up if necessary. There are enhancements here to ensure
that even after clock shifts backwards and forwards, the sending thread is scheduled
accurately.
There are several agent threads; one is the oprocd thread, which just sleeps and wakes up
periodically. Upon wakeup, it checks if it should initiate a reboot, based on the last known
ocssd state information and the local invariant time clock (ITC). The wakeup is timer driven.
The heartbeat thread waits for a local heartbeat from the ocssd. The heartbeat
thread calculates the value that the oprocd thread looks at to determine whether to
reboot. It checks if the oprocd thread has been awake recently and, if not, pings it awake.
The heartbeat thread is event driven and not timer driven.
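The oprocd/heartbeat interplay described above can be sketched as follows. This is a simplified illustration only; the class, its field names, and the use of misscount as the reboot threshold are assumptions, not Oracle's implementation:

```python
import time

class OprocdState:
    """Toy model of the agent-side state shared by the heartbeat and oprocd threads."""

    def __init__(self, misscount_secs):
        self.misscount_secs = misscount_secs
        # aITC value at the last local heartbeat received from ocssd.
        self.last_lhb_itc = time.monotonic()

    def record_local_heartbeat(self):
        # Event driven: called by the heartbeat thread on each local heartbeat.
        self.last_lhb_itc = time.monotonic()

    def should_reboot(self):
        # Timer driven: called by the oprocd thread on each wakeup. Using the
        # invariant clock makes the comparison immune to clock changes.
        return time.monotonic() - self.last_lhb_itc > self.misscount_secs

state = OprocdState(misscount_secs=30)
state.record_local_heartbeat()
print(state.should_reboot())  # False: a heartbeat just arrived
```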
1.4.13 Filesystem Sync
When the ocssd fails, a filesystem sync is started. There is a fair amount of time to get this
done, so we can wait several seconds for a sync. The last local heartbeat indicates how long
we can wait, and the wait time is based on misscount. When the wait time expires, oprocd
will reboot the node. In most cases, diagnostic data will get written to disk. There are rare
cases when this may not be possible, e.g. when the sync is not issued because CSS is hung.
1.5 Cluster Ready Services (CRS):
Cluster Ready Services is the primary program for managing high availability operations in a
cluster. The CRS daemon (crsd) manages cluster resources based on the configuration information that is stored in OCR for each resource. This includes start, stop, monitor, and
failover operations. The crsd daemon monitors the Oracle database instance, listener, and
so on, and automatically restarts these components when a failure occurs.
The crsd daemon runs as root and restarts automatically after a failure. When Oracle
Clusterware is installed in a single-instance database environment for Oracle ASM and
Oracle Restart, ohasd instead of crsd manages application resources.
1.5.1 Policy Engine
1.5.1.1 Overview
Resource High Availability in 11.2 is handled by OHASD (usually for infrastructure resources) and CRSD (for applications deployed in the cluster). Both daemons share the same
architecture and most of the code base. For most intents and purposes, OHASD can be seen
as a CRSD in a cluster of one node. The discussion in the subsequent sections applies to both
daemons, to the extent it makes sense (“OHASD is like a CRSD in a single node cluster!”).
Since 11.2, the architecture of CRSD implements the master-slave model: a single CRSD in
the cluster is picked to be the master and others are all slaves. Upon daemon start-up and
every time the master is re-elected, every CRSD writes the current master into its crsd.log
(grep for “PE MASTER NAME”) e.g.
grep "PE MASTER" Grid_home/log/hostname/crsd/crsd.*
crsd.log:2010-01-07 07:59:36.529: [ CRSPE][2614045584] PE MASTER NAME: staiv13
CRSD is a distributed application comprised of several “modules”. Modules are mostly stateless
and operate by exchanging messages. The state (context) is always carried with each individual
message; most interactions are asynchronous in nature. Some modules have dedicated threads,
others share a single thread, and some operate with a pool of threads. The important CRSD
modules are as follows:
- The Policy Engine (a.k.a PE/CRSPE in logs) is responsible for rendering all policy decisions
- The Agent Proxy Server (a.k.a Proxy/AGFW in logs) is responsible for agent management
and proxy-ing commands/events between the Policy Engine and the agents
- The UI Server (a.k.a UI/UiServer in logs) is responsible for managing client connections
(APIs/crsctl), and being a proxy between the PE and client programs
- The OCR/OLR module (OCR in logs) is the front-end for all OCR/OLR interactions
- The Reporter module (CRSRPT in logs) is responsible for all event publishing out of CRSD
For example, a client request to modify a resource will produce the following interaction:
CRSCTL → UI Server → PE → OCR Module
PE → Reporter (event publishing)
PE → Proxy (to notify the agent)
PE → UI Server → CRSCTL
Note that the UiServer/PE/Proxy can each be on different nodes, as shown on Figure 4 below.
Figure 4: UiServer / PE / Proxy picture
1.5.1.2 Resource Instances & IDs
In 11.2, CRS modeling supports two concepts of resource multiplicity: cardinality and
degree. The former controls the number of nodes where the resource can run concurrently,
while the latter controls the number of instances of the resource that can be run on each
node. To support these concepts, the PE now distinguishes between resources and resource
instances. The former can be seen as a configuration profile for the entire resource, while the
latter represents the state data for each instance of the resource. For example, a resource
with CARDINALITY=2, DEGREE=3 will have 6 resource instances. Operations that affect
resource state (start/stop/etc.) are performed using resource instances. Internally,
resource instances are referred to with IDs of the following format: “<A> <B>
<C>” (note the space separation), where <A> is the resource name, <C> is the degree of the
instance (mostly 1), and <B> is the cardinality of the instance for cluster_resource resources
or the name of the node to which the instance is assigned for local_resource resources. That is
why resource names have “funny” decorations in logs:
[ CRSPE][2660580256] {1:25747:256} RI [r1 1 1] new target state: [ONLINE] old
value: [OFFLINE]
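The ID scheme can be illustrated with a small, hypothetical generator; only the “<A> <B> <C>” layout comes from the description above, the function itself is not Oracle code:

```python
def resource_instance_ids(name, cardinality=None, degree=1, node=None):
    """Build IDs of the form "<name> <B> <degree>": for cluster_resource
    resources <B> is the cardinality position, for local_resource resources
    it is the name of the node the instance is assigned to."""
    positions = [node] if node is not None else range(1, cardinality + 1)
    return [f"{name} {b} {d}" for b in positions for d in range(1, degree + 1)]

# A resource with CARDINALITY=2, DEGREE=3 has 6 resource instances:
ids = resource_instance_ids("r1", cardinality=2, degree=3)
print(len(ids), ids[0])  # 6 r1 1 1

# A local_resource instance is keyed by node name instead:
print(resource_instance_ids("ora.net1.network", node="node0"))
```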
[Figure 4 diagram: crsctl connects to the CRSD at Node 1, whose UI Server forwards the
request to the PE master CRSD at Node 0; the PE messages the Proxy CRSD at Node 2, which
starts the agent; numbered arrows 1–8 trace the messages and the replies back.]
1.5.1.3 Log Correlation
CRSD is event-driven in nature. Everything of interest is an event/command to process. Two
kinds of commands are distinguished: planned and unplanned. The former are usually
administrator-initiated (add/start/stop/update a resource, etc.) or system-initiated
(resource auto start at node reboot, for instance) actions while the latter are normally
unsolicited state changes (a resource failure, for example). In either case, processing such
events/commands is what CRSD does and that’s when module interaction takes place. One
can easily follow the interaction/processing of each event in the logs, right from the point of
origination (say from the UI module) through to PE and then all the way to the agent and
back all the way using the concept referred to as a “tint”. A tint is basically a cluster-unique
event ID of the following format: {X:Y:Z}, where X is the node number, Y a node-unique
number of a process where the event first entered the system, and Z is a monotonically
increasing sequence number, per process. For instance, {1:25747:254} is a tint for the
254th event that originated in the process internally referred to as 25747 on node
number 1. Tints are new in 11.2.0.2 and can be seen in CRSD/OHASD/agent logs. Each event
in the system gets assigned a unique tint at the point of entering the system and modules
prefix each log message while working on the event with that tint.
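Because every module prefixes its log messages with the tint, correlating one event across several logs reduces to pattern matching. A sketch (the grouping helper is illustrative, not an Oracle tool):

```python
import re
from collections import defaultdict

TINT = re.compile(r"\{(\d+):(\d+):(\d+)\}")  # {node:process:sequence}

def group_by_tint(log_lines):
    """Group CRSD/OHASD/agent log lines by their tint."""
    groups = defaultdict(list)
    for line in log_lines:
        m = TINT.search(line)
        if m:
            groups[m.group(0)].append(line)
    return groups

lines = [
    '2009-12-29 17:07:24.742: [UiServer][2689649568] {1:25747:256} Sending message to PE.',
    '2009-12-29 17:07:24.745: [ CRSPE][2660580256] {1:25747:256} Processing PE command id=347.',
    '2009-12-29 17:07:25.122: [ AGFW][2987383712] {1:25747:256} Executing command: start',
]
groups = group_by_tint(lines)
print(len(groups["{1:25747:256}"]))  # 3: one event traced across three modules
```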
For example, in a 3-node cluster where node0 is the PE, issuing a “crsctl start resource r1 –n
node2” from node1, exactly as illustrated on Figure 4 above, will produce the following in
the logs:
CRSD log node1 (crsctl always connects to the local CRSD; UI server forwards the
command to the PE):
2009-12-29 17:07:24.742: [UiServer][2689649568] {1:25747:256} Container [ Name:UI_START
…
RESOURCE:
TextMessage[r1]
2009-12-29 17:07:24.742: [UiServer][2689649568] {1:25747:256} Sending message to
PE. ctx= 0xa3819430
CRSD log node 0 (with PE master)
2009-12-29 17:07:24.745: [ CRSPE][2660580256] {1:25747:256} Cmd : 0xa7258ba8 :
flags: HOST_TAG | QUEUE_TAG
2009-12-29 17:07:24.745: [ CRSPE][2660580256] {1:25747:256} Processing PE
command id=347. Description: [Start Resource : 0xa7258ba8]
2009-12-29 17:07:24.748: [ CRSPE][2660580256] {1:25747:256} RI [r1 1 1] new
target state: [ONLINE] old value: [OFFLINE]
2009-12-29 17:07:24.748: [ CRSOCR][2664782752] {1:25747:256} Multi Write Batch
processing...
2009-12-29 17:07:24.753: [ CRSPE][2660580256] {1:25747:256} Sending message to
agfw: id = 2198
Here, the PE performs a policy evaluation and interacts with the Proxy on the
destination node (to issue the start action) and the OCR (to record the new value for the
TARGET).
CRSD log node 2 (The proxy starts the agent, forwards the message to it)
2009-12-29 17:07:24.763: [ AGFW][2703780768] {1:25747:256} Agfw Proxy Server
received the message: RESOURCE_START[r1 1 1] ID 4098:2198
2009-12-29 17:07:24.767: [ AGFW][2703780768] {1:25747:256} Starting the agent:
/ade/agusev_bug/oracle/bin/scriptagent with user id: agusev and incarnation:1
AGENT log node 2 (the agent executes the start command)
2009-12-29 17:07:25.120: [ AGFW][2966404000] {1:25747:256} Agent received the
message: RESOURCE_START[r1 1 1] ID 4098:1459
2009-12-29 17:07:25.122: [ AGFW][2987383712] {1:25747:256} Executing command:
start for resource: r1 1 1
2009-12-29 17:07:26.990: [ AGFW][2987383712] {1:25747:256} Command: start for
resource: r1 1 1 completed with status: SUCCESS
2009-12-29 17:07:26.991: [ AGFW][2966404000] {1:25747:256} Agent sending reply
for: RESOURCE_START[r1 1 1] ID 4098:1459
CRSD log node 2 (The proxy gets a reply, forwards it back to the PE)
2009-12-29 17:07:27.514: [ AGFW][2703780768] {1:25747:256} Agfw Proxy Server
received the message: CMD_COMPLETED[Proxy] ID 20482:2212
2009-12-29 17:07:27.514: [ AGFW][2703780768] {1:25747:256} Agfw Proxy Server
replying to the message: CMD_COMPLETED[Proxy] ID 20482:2212
CRSD log node 0 (with PE master: receives the reply, notifies the Reporter and replies to UI
Server; the Reporter publishes to EVM)
2009-12-29 17:07:27.012: [ CRSPE][2660580256] {1:25747:256} Received reply to
action [Start] message ID: 2198
2009-12-29 17:07:27.504: [ CRSPE][2660580256] {1:25747:256} RI [r1 1 1] new
external state [ONLINE] old value: [OFFLINE] on agusev_bug_2 label = []
2009-12-29 17:07:27.504: [ CRSRPT][2658479008] {1:25747:256} Sending UseEvm mesg
2009-12-29 17:07:27.513: [ CRSPE][2660580256] {1:25747:256} UI Command [Start
Resource : 0xa7258ba8] is replying to sender.
CRSD log node1 (where crsctl command was issued; UI server writes out the response,
completes the API call)
2009-12-29 17:07:27.525: [UiServer][2689649568] {1:25747:256} Container [ Name:
UI_DATA
r1:
TextMessage[0]
]
2009-12-29 17:07:27.526: [UiServer][2689649568] {1:25747:256} Done for
ctx=0xa3819430
The above demonstrates the ease of following the distributed processing of a single request
across 4 processes on 3 nodes by using tints as a way to filter, extract, group and correlate
information pertaining to a single event across multiple diagnostic logs.
1.6 Grid Plug and Play (GPnP)
A new feature in Oracle Clusterware 11g release 2 (11.2) is Grid Plug and Play, which is
mainly managed by the Grid Plug and Play Daemon (GPnPD). The GPnPD provides access to the GPnP profile, and coordinates updates to the profile among the nodes of the cluster to
ensure that all of the nodes have the most recent profile.
1.6.1 GPnP Configuration
The GPnP configuration is a profile and wallet configuration, identical for every peer node.
The profile and wallet are created and copied by the Oracle Universal Installer. The GPnP
profile is an XML text file which contains the bootstrap information necessary to form a cluster:
the cluster name, the GUID, the discovery strings, and the expected network
connectivity. It does not contain node-specific information. The profile is managed by GPnPD, and it
exists on every node in the GPnP cache. When there are no updates to the profile, it is
identical on all cluster nodes. The most recent profile is identified via a sequence number.
The GPnP wallet is a binary blob containing the public/private RSA keys used to sign and
verify the GPnP profile. The wallet is identical for all GPnP peers; once created by the
Oracle Universal Installer, it never changes.
A typical profile contains the information below. Never change the XML file directly;
instead, use supported tools such as OUI, ASMCA, asmcmd, or oifcfg to modify GPnP
profile information.
The use of gpnptool to make changes to the GPnP profile is discouraged, as multiple steps
have to be executed to even get a modification into the profile. If the modification adds
invalid content, it will corrupt the profile information and subsequent errors will
occur.
# gpnptool get
Warning: some command line parameters were defaulted. Resulting command line:
/scratch/grid_home_11.2/bin/gpnptool.bin get -o-
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0"
xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-
pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-
profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd"
ProfileSequence="4" ClusterUId="0cd26848cf4fdfdebfac2138791d6cf1"
ClusterName="stnsp0506" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork
id="gen" HostName="*"><gpnp:Network id="net1" IP="10.137.8.0" Adapter="eth0"
Use="public"/><gpnp:Network id="net2" IP="10.137.20.0" Adapter="eth2"
Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-
Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile
id="asm" DiscoveryString="/dev/sdf*,/dev/sdg*,/voting_disk/vote_node1"
SPFile="+DATA/stnsp0506/asmparameterfile/registry.253.699162981"/>
<ds:Signature
xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationM
ethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod
Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference
URI=""><ds:Transforms><ds:Transform
Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform
Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces
xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl
xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod
Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>ORAmrPMJ/plFtG
Tg/mZP0fU8ypM=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>K
u7QBc1/fZ/RPT6BcHRaQ+sOwQswRfECwtA5SlQ2psCopVrO6XJV+BMJ1UG6sS3vuP7CrS8LXrOTyoIxSkU
7xWAIB2Okzo/Zh/sej5O03GAgOvt+2OsFWX0iZ1+2e6QkAABHEsqCZwRdI4za3KJeTkIOPliGPPEmLuImu
DiBgMk=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>
Success.
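Since the profile is plain XML, its bootstrap fields can be read with any XML parser. A hedged sketch using Python's standard library, for inspection only (remember the profile must never be edited directly); the sample below is a trimmed, unsigned version of the profile shown above:

```python
import xml.etree.ElementTree as ET

NS = {
    "gpnp": "http://www.grid-pnp.org/2005/11/gpnp-profile",
    "orcl": "http://www.oracle.com/gpnp/2005/11/gpnp-profile",
}

def summarize_profile(xml_text):
    """Extract the bootstrap fields discussed above from a GPnP profile."""
    root = ET.fromstring(xml_text)
    networks = {n.get("Use"): n.get("Adapter")
                for n in root.findall(".//gpnp:Network", NS)}
    css = root.find(".//orcl:CSS-Profile", NS)
    return {
        "cluster": root.get("ClusterName"),
        "sequence": int(root.get("ProfileSequence")),
        "networks": networks,
        "css_discovery": css.get("DiscoveryString"),
    }

sample = """<gpnp:GPnP-Profile Version="1.0"
  xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
  xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
  ProfileSequence="4" ClusterName="stnsp0506">
 <gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*">
  <gpnp:Network id="net1" IP="10.137.8.0" Adapter="eth0" Use="public"/>
  <gpnp:Network id="net2" IP="10.137.20.0" Adapter="eth2" Use="cluster_interconnect"/>
 </gpnp:HostNetwork></gpnp:Network-Profile>
 <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
</gpnp:GPnP-Profile>"""

print(summarize_profile(sample)["networks"]["cluster_interconnect"])  # eth2
```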
The initial GPnP configuration is created and propagated by the root script as part of the
Oracle Clusterware installation. During a fresh install the profile content is sourced from the
Oracle Universal Installer interview results in Grid_home/crs/install/crsconfig_params.
1.6.2 GPnP Daemon
The GPnP daemon is, like all other daemons, managed by OHASD and spawned by the OHASD
oraagent. The main purpose of the GPnPD is to serve the profiles; therefore it must run in
order for the stack to start. The GPnPD startup sequence is mainly:
– detects running gpnpd, connects back to oraagent
– opens wallet/profile
– opens local/remote endpoints
– advertises remote endpoint with mdnsd
– starts OCR availability check
– discovers remote gpnpds
– equalizes profile
– starts to service clients
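The "equalizes profile" step can be pictured as a sequence-number comparison. A deliberately simplified sketch (the real daemon also verifies wallet signatures and pushes the winning profile to its peers):

```python
def equalize_target(cached_sequences):
    """Pick the profile the cluster should converge to: the cached copy
    with the highest ProfileSequence is considered the current best."""
    best_node = max(cached_sequences, key=cached_sequences.get)
    return best_node, cached_sequences[best_node]

# node2 was down during the last update and still caches sequence 3:
node, seq = equalize_target({"node1": 4, "node2": 3})
print(node, seq)  # node1 4
```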
1.6.3 GPnP CLI Tools
There are a few client tools which indirectly perform GPnP profile changes. They require
ocssd to be running:
– crsctl replace discoverystring
– oifcfg getif / setif
– ASM – srvctl or sqlplus changing the spfile location or the ASM disk discoverystring
Note that profile changes are serialized cluster-wide with a CSS lock (bug 7327595).
Grid_home/bin/gpnptool is the actual tool to manipulate the gpnp profile. To see the
detailed usage, run ‘gpnptool help’.
Oracle GPnP Tool
Usage:
"gpnptool <verb> <switches>", where verbs are:
create Create a new GPnP Profile
edit Edit existing GPnP Profile
getpval Get value(s) from GPnP Profile
get Get profile in effect on local node
rget Get profile in effect on remote GPnP node
put Put profile as a current best
find Find all RD-discoverable resources of given type
lfind Find local gpnpd server
check Perform basic profile sanity checks
c14n Canonicalize, format profile text (XML C14N)
sign Sign/re-sign profile with wallet's private key
unsign Remove profile signature, if any
verify Verify profile signature against wallet certificate
help Print detailed tool help
ver Show tool version
1.6.4 Debugging and Troubleshooting
In order to get more log and trace information there is a tracing environment variable,
GPNP_TRACELEVEL, whose range is 0–6. The GPnP traces are located mainly at:
Grid_home/log/<hostname>/alert*,
Grid_home/log/<hostname>/client/gpnptool*, other client logs
Grid_home/log/<hostname>/gpnpd|mdnsd/*
Grid_home/log/<hostname>/agent/ohasd/oraagent_<username>/*
The product setup files which are holding the initial information are located at
Grid_home/crs/install/crsconfig_params
Grid_home/cfgtoollogs/crsconfig/root*
Grid_home/gpnp/*, Grid_home/gpnp/<hostname>/* [profile+wallet]
If the GPnP setup is failing, the following failure scenario checks should be performed.
– Failed to create wallet, profile? Failed to sign profile? Wrong signature? No access
to wallet or profile? [gpnpd is dead, stack is dead] (bug:8609709,bug:8445816)
– Missing/bad settings in profile (e.g. no discovery string, no interconnect, too many
interconnects)? [gpnpd is up, stack is dead – e.g. no voting files, no interconnects]
– Failed to propagate cluster-wide config? [gpnpd daemons are not communicating,
no put]
If something is failing during the GPnP runtime, the following checks should be done.
– Is mdnsd running? Gpnpd failed to register with mdnsd? Discovery fails? [no put,
rget]
– Is gpnpd dead/not running? [no get, immediately fails]
– Is gpnpd not fully up? [no get, no put, client spins in retries, times out]
– Discovering spurious nodes as a part of the cluster? [no put, can block gpnpd
dispatch]
– Is ocssd not up? [no put]
– OCR was up, but failed [gpnpd dispatch can block, client waits in receive until OCR
recovers]
For all of the above, a first source would be the appropriate daemon log files; also check
the resource status via crsctl stat res –init –t
Other troubleshooting steps if GPnPD is not running are:
– Check if the GPnP configuration is valid and check the GPnP log files for errors.
Some sanity checks can be done with gpnptool check or gpnptool verify
# gpnptool check -p=/scratch/grid_home_11.2/gpnp/stnsp006/profiles/peer/profile.xml
Profile cluster="stnsp0506", version=4
GPnP profile signed by peer, signature valid.
Got GPnP Service current profile to check against.
Current GPnP Service Profile cluster="stnsp0506", version=4
Error: profile version 4 is older than- or duplicate of- GPnP Service
current profile version 4.
Profile appears valid, but push will not succeed.
# gpnptool verify
Oracle GPnP Tool
verify Verify profile signature against wallet certificate
Usage:
"gpnptool verify <switches>", where switches are:
-p[=profile.xml] GPnP profile name
-w[=file:./] WRL-locator of OracleWallet with crypto
keys
-wp=<val> OracleWallet password, optional
-wu[=owner] Wallet certificate user (enum: owner,peer,pa)
-t[=3] Trace level (min..max=0..7), optional
-f=<val> Command file name, optional
-? Print verb help and exit
– Is gpnpd serving locally? This can be checked with gpnptool lfind:
# gpnptool lfind
Success. Local gpnpd found.
‘gpnptool get’ should return the local profile information. If gpnptool lfind|get
hangs, a pstack from the hanging client and the GPnPD log files under
Grid_home/log/<hostname>/gpnpd would be beneficial for further debugging.
– To check if the remote GPnPD daemon is responding, the ‘find’ option is very
helpful:
# gpnptool find -h=stnsp006
Found 1 instances of service 'gpnp'.
mdns:service:gpnp._tcp.local.://stnsp006:17452/agent=gpnpd,cname=stnsp0506
,host=stnsp006,pid=13133/gpnpd h:stnsp006 c:stnsp0506
If the above is hanging or returns with an error, check the
Grid_home/log/<hostname>/mdnsd/*.log files and the gpnpd logs.
– To check if all the peers are responding, run gpnptool find –c=<clustername>
# gpnptool find -c=stnsp0506
Found 2 instances of service 'gpnp'.
mdns:service:gpnp._tcp.local.://stnsp005:23810/agent=gpnpd,cname=stnsp0506
,host=stnsp005,pid=12408/gpnpd h:stnsp005 c:stnsp0506
mdns:service:gpnp._tcp.local.://stnsp006:17452/agent=gpnpd,cname=stnsp0506
,host=stnsp006,pid=13133/gpnpd h:stnsp006 c:stnsp0506
We store copies of the GPnP profile in the local OLR and the OCR. In case of loss or
corruption, GPnPD pulls the information from there and recreates the profile.
1.7 Oracle Grid Naming Service (GNS):
GNS performs name resolution in the cluster. For performance reasons, GNS does not always
use mDNS.
In Oracle Clusterware 11g release 2 (11.2) we support the use of DHCP for both the private
interconnect and for almost all virtual IP addresses on the public network. For clients outside
the cluster to find the virtual hosts in the cluster, we provide a Grid Naming Service (GNS).
This works with any higher-level DNS to provide resolvable names to external clients.
This section explains how to perform a simple setup of DHCP and GNS. A complex network
environment may require a more elaborate solution. The GNS and DHCP setup must be in
place before the grid infrastructure installation.
1.7.1 What Grid Naming Service Provides
DHCP provides dynamic configuration of a host's IP address, but does not provide a good way to produce names that are useful to external clients. As a result, it has been uncommon
in server complexes. In Oracle Clusterware 11g release 2 (11.2), this problem is solved by
providing our own service for resolving names in the cluster, and connecting this to the DNS
that is visible to the clients.
1.7.2 Network Configuration Steps
To get GNS to work for clients, it is necessary to configure the higher-level DNS to “delegate”
a subdomain to the cluster, and the cluster must run GNS on an address known to the DNS.
The GNS address will be maintained as a statically configured VIP in the cluster. The GNS
daemon (GNSD) will follow that VIP around the cluster and service names in the subdomain.
Four things need to be configured:
– A single static address in the public network for the cluster to use as the GNS VIP.
– Delegation from the higher-level DNS for names within the cluster sub-domain to
the GNS VIP.
– A DHCP server for dynamic address provision on the public network.
– A running cluster with properly configured GNS.
1.7.2.1 Obtain an IP address for the GNS-VIP
Request an IP address from your network administrator to be assigned as the GNS-VIP. This
IP address is to be registered with the corporate DNS as the GNS-VIP for a given cluster, for
example strdv0108-gns.mycorp.com. Do not plumb this IP address; it will be
managed by Oracle Clusterware after installation.
1.7.2.2 Establish DNS delegation for the GNS sub-domain to the GNS-VIP
Create an entry of the following format in the appropriate DNS zone file:
# Delegate to gns on strdv0108
strdv0108.mycorp.com NS strdv0108-gns.mycorp.com
#Let the world know the address of the GNS vip
strdv0108-gns.mycorp.com A 10.9.8.7
Here, the sub-domain is strdv0108.mycorp.com, the GNS VIP has been assigned the
name strdv0108-gns.mycorp.com (corresponding to a chosen static IP address),
and the GNS daemon will listen on the default port 53.
NOTE: This does not establish an address for the name strdv0108.mycorp.com – it
creates a way of resolving a name within this sub-domain, such as clusterNode1-
VIP.strdv0108.mycorp.com.
1.7.3 DHCP
With DHCP, a host requiring an IP address sends a broadcast message to the hardware
network. A DHCP server on the segment can respond to the request, and give back an
address, along with other information such as what gateway to use, what DNS server(s) to
use, what domain should be used, what NTP server should be used, etc.
When we use DHCP for the public network, we need several IP addresses:
– One IP address per host (the node VIP)
– Three IP addresses per cluster for the cluster-wide SCAN.
The GNS VIP can’t be obtained from DHCP, because it must be known in advance, so it must
be statically assigned.
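The resulting address arithmetic is simple enough to state as code; the helper below is illustrative only:

```python
def dynamic_addresses_needed(node_count, scan_vips=3):
    """DHCP-served addresses on the public network: one node VIP per host
    plus three SCAN VIPs per cluster. The GNS VIP is static and therefore
    deliberately not counted here."""
    return node_count + scan_vips

print(dynamic_addresses_needed(2))  # 5 for a two-node cluster
```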
The DHCP server configuration file is /etc/dhcpd.conf.
Using the following configuration example:
– the interface on the subnet is 10.228.212.0/22 (netmask 255.255.252.0)
– the addresses allowed to be served are 10.228.212.10 through 10.228.215.254
– the gateway is 10.228.212.1
– the domain the machines will reside in for DNS purposes is strdv0108.mycorp.com
/etc/dhcpd.conf would contain something similar to:
subnet 10.228.212.0 netmask 255.255.252.0
{
default-lease-time 43200;
max-lease-time 86400;
option subnet-mask 255.255.252.0;
option broadcast-address 10.228.215.255;
option routers 10.228.212.1;
option domain-name-servers M.N.P.Q, W.X.Y.Z;
option domain-name "strdv0108.mycorp.com";
pool
{
range 10.228.212.10 10.228.215.254;
}
}
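Pool, netmask, broadcast and router values in a configuration like this must agree with each other; Python's ipaddress module makes that easy to sanity-check:

```python
import ipaddress

# Values from the DHCP configuration example above.
subnet = ipaddress.ip_network("10.228.212.0/255.255.252.0")
pool_lo = ipaddress.ip_address("10.228.212.10")
pool_hi = ipaddress.ip_address("10.228.215.254")
router = ipaddress.ip_address("10.228.212.1")

assert subnet.prefixlen == 22                     # 255.255.252.0 is a /22
assert str(subnet.broadcast_address) == "10.228.215.255"
assert pool_lo in subnet and pool_hi in subnet and router in subnet
assert router < pool_lo                           # gateway sits outside the pool
print(subnet.num_addresses - 2)  # usable host addresses: 1022
```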
1.7.3.1 Name resolution
The /etc/resolv.conf must contain nameserver entries that are resolvable to corporate DNS
servers, and the total timeout period configured (a combination of options attempts
[retries] and options timeout [exponential backoff]) should be less than 30 seconds. For
example:
/etc/resolv.conf:
options attempts:2
options timeout:1
search us.mycorp.com mycorp.com
nameserver 130.32.234.42
nameserver 133.2.2.15
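The "total timeout under 30 seconds" guidance can be estimated from the options. The formula below assumes the per-query timeout doubles on each retry round; this is an approximation of typical resolver backoff behavior, not an exact glibc model:

```python
def worst_case_resolver_wait(nameservers, attempts, timeout):
    """Rough worst-case wait (seconds) before the resolver gives up, assuming
    each retry round doubles the per-nameserver timeout (exponential backoff).
    Exact behavior varies by libc version; treat this as an estimate."""
    return sum(nameservers * timeout * (2 ** r) for r in range(attempts))

# The /etc/resolv.conf example: 2 nameservers, attempts 2, timeout 1
print(worst_case_resolver_wait(2, 2, 1))  # 6 -- well under the 30 second limit
```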
The /etc/nsswitch.conf controls name service lookup order. In some system configurations,
the Network Information System (NIS) can cause problems with Oracle SCAN address
resolution. It is suggested to place the NIS entry at the end of the search list.
/etc/nsswitch.conf
hosts: files dns nis
See Also: Oracle Grid Infrastructure Installation Guide,
"DNS Configuration for Domain Delegation to Grid Naming Service" for more information.
In Oracle Clusterware 11g release 2 (11.2) GNS is managed by a Clusterware agent
(orarootagent). The agent will start, stop and check the GNS. The SCAN agent advertises its
name and address with GNS and each SCAN VIP registers itself as well. All this is done during
the Oracle Universal Installer installation. The information about GNS is added to the OCR
and the GNS is added to the cluster through the srvctl add gns –d <mycluster.company.com>
command.
1.7.4 The GNS Server
During server startup, the GNS server retrieves the name of the subdomain to be
serviced from the OCR and starts its threads. The first thing the GNS server does, once all
the threads are running, is a self check: it performs a test to see if name resolution is
working. The client API is called to register a dummy name and address, and the server then
attempts to resolve the name. If the resolution succeeds and one of the addresses matches
the dummy address, the self check has succeeded and a message is written to the cluster
alert<hostname>.log. This self check is done only once, and even if the test fails the GNS
server keeps running.
The default trace location for GNS server is Grid_home/log/<hostname>/gnsd/. The trace file
format looks like the following:
<Time stamp>: [GNS][Thread ID]<Thread name>::<function>:<message>
2009-09-21 10:33:14.344: [GNS][3045873888] Resolve::clsgnmxInitialize:
initializing mutex 0x86a7770 (SLTS 0x86a777c).
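A trace line in this format can be split into its fields with a regular expression. The pattern below is derived from the single example line above, so treat it as a sketch rather than a definitive grammar of GNS traces:

```python
import re

# <Time stamp>: [GNS][Thread ID] <Thread name>::<function>: <message>
GNS_LINE = re.compile(
    r"(?P<ts>[\d\- :.]+): \[GNS\]\[(?P<tid>\d+)\] "
    r"(?P<thread>\w+)::(?P<func>\w+):\s*(?P<msg>.*)"
)

line = ("2009-09-21 10:33:14.344: [GNS][3045873888] "
        "Resolve::clsgnmxInitialize: initializing mutex 0x86a7770 (SLTS 0x86a777c).")
m = GNS_LINE.match(line)
print(m.group("thread"), m.group("func"))  # Resolve clsgnmxInitialize
```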
1.7.5 The GNS Agent
The GNS Agent (orarootagent) will check the GNS server periodically. The check is done by
querying the GNS for its status.
To see if the agent is successfully advertising with GNS, run:
#grep -i 'updat.*gns'
Grid_home/log/<hostname>/agent/crsd/orarootagent_root/orarootagent_*
orarootagent_root.log:2009-10-07 10:17:23.513: [ora.gns.vip] [check] Updating GNS
with stnsp0506-gns-vip 10.137.13.245
orarootagent_root.log:2009-10-07 10:17:23.540: [ora.scan1.vip] [check] Updating
GNS with stnsp0506-scan1-vip 10.137.12.200
orarootagent_root.log:2009-10-07 10:17:23.562: [ora.scan2.vip] [check] Updating
GNS with stnsp0506-scan2-vip 10.137.8.17
orarootagent_root.log:2009-10-07 10:17:23.580: [ora.scan3.vip] [check] Updating
GNS with stnsp0506-scan3-vip 10.137.12.214
orarootagent_root.log:2009-10-07 10:17:23.597: [ora.stnsp005.vip] [check] Updating
GNS with stnsp005-vip 10.137.12.228
orarootagent_root.log:2009-10-07 10:17:23.615: [ora.stnsp006.vip] [check] Updating
GNS with stnsp006-vip 10.137.12.226
1.7.6 Command Line Interface
The command line interface to interact with GNS is srvctl (the only supported way).
crsctl can stop and start the ora.gns resource, but this is not supported unless directed by
development.
GNS operations are run by performing operations on the “gns” noun, for example:
# srvctl {start|stop|modify|etc.} gns ...
To start gns:
# srvctl start gns [-l <log_level>] - where –l is the level of logging that GNS
should run with.
To stop gns:
# srvctl stop gns
To advertise a name and address:
# srvctl modify gns -N <name> -A <address>
1.7.7 Debugging GNS
The default GNS server logging level is 0, which can be seen via a simple ps –ef | grep
gnsd.bin.
/scratch/grid_home_11.2/bin/gnsd.bin -trace-level 0 -ip-address 10.137.13.245 -
startup-endpoint ipc://GNS_stnsp005_31802_429f8c0476f4e1
To debug GNS server issues, it is sometimes necessary to increase this log level. This can be
done by stopping the GNS server via srvctl stop gns and restarting it via srvctl start gns -v -l 5. Only the root user can stop and start GNS.
Usage: srvctl start gns [-v] [-l <log_level>] [-n <node_name>]
-v Verbose output
-l <log_level> Specify the level of logging that GNS should run
with.
-n <node_name> Node name
-h Print usage
The trace level ranges from 0 to 6; level 5 should be sufficient in all cases. Setting the
trace level to 6 is not recommended, as gnsd will consume a lot of CPU.
Due to bug 8705125 in 11.2.0.1, the default logging level for the GNS server (gnsd daemon)
will be level 6 after the initial installation. To set the log level back to the default value of 0,
stop and start GNS using ‘srvctl stop / start’. This will only stop and start gnsd.bin, and
will not cause any harm to the running cluster.
– srvctl stop gns
– srvctl start gns -l 0
To list the current GNS configuration, use srvctl:
srvctl config gns -a
GNS is enabled.
GNS is listening for DNS server requests on port 53
GNS is using port 5353 to connect to mDNS
GNS status: OK
Domain served by GNS: stnsp0506.oraclecorp.com
GNS version: 11.2.0.1.0
GNS VIP network: ora.net1.network
Starting with 11.2.0.2, the -l option (list all records in GNS) is very helpful for
debugging GNS issues.
1.8 Grid Interprocess Communication
Grid Interprocess Communication (GIPC) is a new common communications infrastructure that
replaces CLSC/NS. It provides full control of the communications stack from the operating
system up to whatever client library uses it. The pre-11.2 dependency on network services
(NS) is removed, but backwards compatibility with existing CLSC clients (mainly from 11.1)
is retained.
GIPC can support multiple communications types: CLSC, TCP, UDP, IPC and the
communication type GIPC.
The configuration of listening endpoints with GIPC is a little different: the
private/cluster interconnects are now defined in the GPnP profile.
The requirement that the same interfaces exist with the same name on all nodes is more
relaxed, as long as communication can be established. The part of the GPnP profile
describing the private and public network configuration is:
<gpnp:Network id="net1" IP="10.137.8.0" Adapter="eth0" Use="public"/>
<gpnp:Network id="net2" IP="10.137.20.0" Adapter="eth2" Use="cluster_interconnect"/>
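As a quick illustration, the network elements of the profile fragment above can be pulled out with a simple regex (a hypothetical sketch; it deliberately avoids a full XML parser since the fragment is not a complete document):

```python
import re

# Matches gpnp:Network elements with the attribute order shown above.
NET_RE = re.compile(
    r'<gpnp:Network\s+id="(?P<id>[^"]+)"\s+IP="(?P<ip>[^"]+)"\s+'
    r'Adapter="(?P<adapter>[^"]+)"\s+Use="(?P<use>[^"]+)"\s*/>'
)

def parse_gpnp_networks(profile_fragment):
    """Return {use: (adapter, subnet)} from gpnp:Network elements."""
    return {
        m.group("use"): (m.group("adapter"), m.group("ip"))
        for m in NET_RE.finditer(profile_fragment)
    }
```

Applied to the fragment above, this yields the public adapter (eth0 on 10.137.8.0) and the cluster interconnect (eth2 on 10.137.20.0).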
1.8.1 Logs and Diagnostics
The GIPC default trace level only prints errors, and the default trace level for the different
components ranges from 0 to 2. To debug GIPC related issues, it might be necessary to
increase the trace levels, which are described below.
1.8.2 Setting trace levels via crsctl
With crsctl it is possible to set a GIPC trace level for different components.
Example:
# crsctl set log css COMMCRS:abcd
Where
• a denotes the trace level for NM
• b denotes the trace level for GM
• c denotes the trace level for GIPC
• d denotes the trace level for PROC
If the component of interest is GIPC and you want to modify only the GIPC trace level, up
from its default value of 2, simply run:
# crsctl set log css COMMCRS:2242
To turn on GIPC tracing for all components (NM, GM, etc.), set
# crsctl set log css COMMCRS:3
or
# crsctl set log css COMMCRS:4
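The COMMCRS value is thus a small positional encoding: one digit per sub-component, in the order NM, GM, GIPC, PROC, with a single digit applying to all of them. A hypothetical helper (for illustration only) that builds and decodes such values:

```python
# Sub-components in the positional order described above.
COMPONENTS = ("NM", "GM", "GIPC", "PROC")

def commcrs_value(nm=2, gm=2, gipc=2, proc=2):
    """Build the four-digit COMMCRS level string, e.g. '2242'."""
    levels = (nm, gm, gipc, proc)
    if any(not 0 <= lvl <= 9 for lvl in levels):
        raise ValueError("each trace level must be a single digit")
    return "".join(str(lvl) for lvl in levels)

def decode_commcrs(value):
    """Map a COMMCRS digit string back to {component: level}."""
    if len(value) == 1:
        # A single digit applies to all components (e.g. COMMCRS:3).
        value = value * len(COMPONENTS)
    return dict(zip(COMPONENTS, (int(c) for c in value)))
```

For example, commcrs_value(gipc=4) produces "2242", matching the example above.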
With level 4, a lot of tracing is generated, so the ocssd.log will wrap around fairly quickly.
1.8.3 Setting trace levels via GIPC_TRACE_LEVEL and GIPC_FIELD_LEVEL
Another option is to set a pair of environment variables for the component that uses GIPC
for communication, e.g. ocssd. This requires a wrapper script. Taking ocssd as an example,
the wrapper script is Grid_home/bin/ocssd, which invokes ocssd.bin.
Adding the variables below to the wrapper script (under the LD_LIBRARY_PATH) and
restarting ocssd will enable GIPC tracing. To restart ocssd.bin, perform a crsctl stop/start
cluster.
case `/bin/uname` in
Linux)
    LD_LIBRARY_PATH=/scratch/grid_home_11.2/lib
    export LD_LIBRARY_PATH
    export GIPC_TRACE_LEVEL=4
    export GIPC_FIELD_LEVEL=0x80
    # forcibly eliminate LD_ASSUME_KERNEL to ensure NPTL where available
    LD_ASSUME_KERNEL=
    export LD_ASSUME_KERNEL
    LOGGER="/usr/bin/logger"
    if [ ! -f "$LOGGER" ]; then
        LOGGER="/bin/logger"
    fi
    LOGMSG="$LOGGER -puser.err"
    ;;
This will set the trace level to 4. The values for the trace environment variables are
GIPC_TRACE_LEVEL=3 (valid range [0-6])
GIPC_FIELD_LEVEL=0x80 (only 0x80 is supported)
1.8.4 Setting trace levels via GIPC_COMPONENT_TRACE
To enable more fine-grained tracing, use the environment variable
GIPC_COMPONENT_TRACE. The defined components are:
GIPCGEN, GIPCTRAC, GIPCWAIT, GIPCXCPT, GIPCOSD, GIPCBASE, GIPCCLSA, GIPCCLSC,
GIPCEXMP, GIPCGMOD, GIPCHEAD, GIPCMUX, GIPCNET, GIPCNULL, GIPCPKT, GIPCSMEM,
GIPCHAUP, GIPCHALO, GIPCHTHR, GIPCHGEN, GIPCHLCK, GIPCHDEM, GIPCHWRK
Example:
# export GIPC_COMPONENT_TRACE=GIPCWAIT:4,GIPCNET:3
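The GIPC_COMPONENT_TRACE value is a comma-separated list of component:level pairs. A small sketch (a hypothetical helper, not shipped with the product) that parses such a value into a dictionary:

```python
def parse_component_trace(value):
    """Parse a GIPC_COMPONENT_TRACE value like 'GIPCWAIT:4,GIPCNET:3'
    into a {component: level} dictionary."""
    levels = {}
    for item in value.split(","):
        comp, _, lvl = item.partition(":")
        levels[comp.strip()] = int(lvl)
    return levels
```

This mirrors how the example above assigns level 4 to GIPCWAIT and level 3 to GIPCNET.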
What does a trace message look like?
2009-10-23 05:47:40.952: [GIPCMUX][2993683344]gipcmodMuxCompleteSend: [mux]
Completed send req 0xa481c0e0 [00000000000093a6] { gipcSendRequest : addr '', data
0xa481c830, len 104, olen 104, parentEndp 0x8f99118, ret gipcretSuccess (0),
objFlags 0x0, reqFlags 0x2 }
2009-10-23 05:47:40.952: [GIPCWAIT][2993683344]gipcRequestSaveInfo: [req]
Completed req 0xa481c0e0 [00000000000093a6] { gipcSendRequest : addr '', data
0xa481c830, len 104, olen 104, parentEndp 0x8f99118, ret gipcretSuccess (0),
objFlags 0x0, reqFlags 0x4 }
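Each trace line starts with a fixed header (timestamp, component, thread id, function name), so filtering a large ocssd.log by component or function is straightforward. A hypothetical parser for that header, assuming the line format shown above:

```python
import re

# Header format: '2009-10-23 05:47:40.952: [GIPCMUX][2993683344]gipcmodMuxCompleteSend: ...'
TRACE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"\[(?P<component>\w+)\]\[(?P<tid>\d+)\](?P<function>\w+):"
)

def parse_trace_header(line):
    """Return {ts, component, tid, function} for a GIPC trace line, or None."""
    m = TRACE_RE.match(line)
    return m.groupdict() if m else None
```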
Only some layers, like CSS (client and server), GPNPD, GNSD, and small parts of MDNSD, are
using GIPC right now. Others, like CRS/EVM/OCR/CTSS, will use GIPC starting with 11.2.0.2.
This is important to know when deciding whether to turn on GIPC tracing or the old
NS/CLSC tracing to debug communication issues.
1.9 Cluster time synchronization service daemon (CTSS):
The CTSS is a new feature in Oracle Clusterware 11g release 2 (11.2), which takes care of
time synchronization in a cluster in case the network time protocol daemon is not running
or is not configured properly.
The CTSS synchronizes the time on all of the nodes in a cluster to match the time setting on
the CTSS master node. When Oracle Clusterware is installed, the Cluster Time
Synchronization Service (CTSS) is installed as part of the software package. During
installation, the Cluster Verification Utility (CVU) determines if the network time protocol
(NTP) is in use on any nodes in the cluster. On Windows systems, CVU checks for NTP and
Windows Time Service.
If Oracle Clusterware finds that NTP is running or that NTP has been configured, then NTP is
not affected by the CTSS installation. Instead, CTSS starts in observer mode (this condition is
logged in the alert log for Oracle Clusterware). CTSS then monitors the cluster time and logs
alert messages if necessary, but CTSS does not modify the system time. If Oracle
Clusterware detects that NTP is not running and is not configured, then CTSS designates one
node as a clock reference, and synchronizes all of the other cluster members' time and date
settings to those of the clock reference.
Oracle Clusterware considers an NTP installation to be misconfigured if one of the following
is true:
– NTP is not installed on all nodes of the cluster; CVU detects an NTP installation by a
configuration file, such as ntp.conf
– The primary and alternate clock references are different for all of the nodes of the
cluster
– The NTP processes are not running on all of the nodes of the cluster; only one type
of time synchronization service can be active on the cluster.
To check whether CTSS is running in active or observer mode, run crsctl check ctss:
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
or
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset from the reference node (in msec): 100
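The CRS-47xx messages above are stable enough to check from a script. The following sketch (a hypothetical monitoring helper) classifies the mode and extracts the offset, if reported:

```python
import re

def parse_ctss_check(output):
    """Return (mode, offset_msec) from 'crsctl check ctss' output.
    offset_msec is None when no CRS-4702 line is present."""
    mode, offset = None, None
    for line in output.splitlines():
        if "CRS-4700" in line:
            mode = "observer"
        elif "CRS-4701" in line:
            mode = "active"
        m = re.search(r"CRS-4702: Offset from the reference node \(in msec\): (-?\d+)", line)
        if m:
            offset = int(m.group(1))
    return mode, offset
```

Such a check could feed a monitoring system that alerts when the offset grows too large in active mode.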
The tracing for the ctssd daemon is written to the octssd.log. The alert log
(alert<hostname>.log) also contains information about the mode in which CTSS is running.
[ctssd(13936)]CRS-2403:The Cluster Time Synchronization Service on host node1 is
in observer mode.
[ctssd(13936)]CRS-2407:The new Cluster Time Synchronization Service reference node
is host node1.
[ctssd(13936)]CRS-2401:The Cluster Time Synchronization Service started on host
node1.
1.9.1 CVU checks
There are pre-install CVU checks performed automatically during installation, like cluvfy
stage -pre crsinst <>.
This step checks that the operating system time synchronization software
(e.g. NTP) is either properly configured and running on all cluster nodes, or on none of the
nodes.
During the post-install check, CVU runs cluvfy comp clocksync -n all. If CTSS is in observer
mode, it performs a configuration check as above. If CTSS is in active mode, it verifies
that the time difference is within the limit.
1.9.2 CTSS resource
When CTSS comes up as part of the clusterware startup, it performs a step time
synchronization, and if everything goes well, it publishes its state as ONLINE. There is a
start dependency on ora.cssd, but note that it has no stop dependency, so if for some
reason CTSSD dumps core or exits, nothing else should be affected.
The chart below shows the start dependencies built on ora.ctssd by other resources.
Figure 5: ora.ctssd start dependency picture.
crsctl stat res ora.ctssd -init -t
----------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
----------------------------------------------------------------------
ora.ctssd
1 ONLINE ONLINE node1 OBSERVER
1.10 mdnsd
1.10.1 Debugging mdnsd
In order to capture mdnsd network traffic, use the mDNS Network Monitor located in
Grid_home/bin:
# mkdir Grid_home/log/$HOSTNAME/netmon
# Grid_home/bin/oranetmonitor &
The output from oranetmonitor will be captured in netmonOUT.log in the above directory.
2 Voting Files and Oracle Cluster Repository Architecture
Storing OCR and the voting files in ASM eliminates the need for third-party cluster volume
managers and eliminates the complexity of managing disk partitions for OCR and voting files
in Oracle Clusterware installations.
2.1 Voting File in ASM
ASM manages voting files differently from other files that it stores. When voting files are
placed on disks in an ASM disk group, Oracle Clusterware records exactly on which disks in
that diskgroup they are located. If ASM fails, then CSS can still access the voting files. If you
choose to store voting files in ASM, then all voting files must reside in ASM, i.e. we do not
support mixed configurations, like storing some voting files in ASM and some on NAS.
The number of voting files you can store in a particular Oracle ASM disk group depends upon
the redundancy of the disk group.
– External redundancy: A disk group with external redundancy can store only one
voting file
– Normal redundancy: A disk group with normal redundancy can store up to three
voting files
– High redundancy: A disk group with high redundancy can store up to five voting files
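The mapping above is a simple lookup. A hypothetical helper, useful for example when sizing a disk group for voting files:

```python
# Per-redundancy voting file limits, as listed above.
VOTING_FILES_BY_REDUNDANCY = {"external": 1, "normal": 3, "high": 5}

def max_voting_files(redundancy):
    """Return the maximum number of voting files an ASM disk group
    of the given redundancy can store."""
    try:
        return VOTING_FILES_BY_REDUNDANCY[redundancy.lower()]
    except KeyError:
        raise ValueError("redundancy must be external, normal or high")
```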
By default, Oracle ASM puts each voting file in its own failure group within the disk group. A
failure group is a subset of the disks in a disk group, which could fail at the same time
because they share hardware, e.g. a disk controller. The failure of common hardware must
be tolerated. For example, four drives that are in a single removable tray of a large JBOD
(Just a Bunch of Disks) array are in the same failure group because the tray could be
removed, making all four drives fail at the same time. Conversely, drives in the same cabinet
can be in multiple failure groups if the cabinet has redundant power and cooling so that it is
not necessary to protect against failure of the entire cabinet. However, Oracle ASM
mirroring is not intended to protect against a fire in the computer room that destroys the
entire cabinet. If voting files are stored on Oracle ASM with normal or high redundancy, and
the storage hardware in one failure group suffers a failure, then if there is another disk
available in an unaffected failure group of the disk group, Oracle ASM recovers the voting
file in the unaffected failure group.
2.2 Voting File Changes
– The formation-critical data is now stored in the voting file itself, and no longer in the
OCR. From a voting file perspective, the OCR is not touched at all. The critical data
each node must agree on to form a cluster includes, for example, misscount and the
list of configured voting files.
– In Oracle Clusterware 11g release 2 (11.2), it is no longer necessary to back up the
voting disk. The voting disk data is automatically backed up in OCR as part of any
configuration change and is automatically restored to any voting disk that is being
added. If all voting disks are corrupted, however, you can restore them as described
in the Oracle Clusterware Administration and Deployment Guide.
– A new block added to the voting file is the voting file identifier block (needed for
voting files stored in ASM); it contains the cluster GUID and the file UID. The
committed and pending configuration incarnation numbers (CCIN and PCIN) contain
this formation-critical data.
– To query the configured voting files and see their location, run crsctl query css
votedisk:
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ----------
1. ONLINE 3e1836343f534f51bf2a19dff275da59 (/dev/sdf10) [DATA]
2. ONLINE 138cbee15b394f3ebf57dbfee7cec633 (/dev/sdg11) [DATA]
3. ONLINE 462722bd24c94f70bf4d90539c42ad4c (/dev/sdu12) [DATA]
Located 3 voting file(s).
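The votedisk listing above has a fixed column layout, so it can be parsed with a regex. A hypothetical sketch that turns each entry into a dictionary:

```python
import re

# Matches lines like:
#   1. ONLINE 3e1836343f534f51bf2a19dff275da59 (/dev/sdf10) [DATA]
VOTEDISK_RE = re.compile(
    r"^\s*\d+\.\s+(?P<state>\w+)\s+(?P<fuid>[0-9a-f]+)\s+\((?P<path>[^)]+)\)\s+\[(?P<dg>\w+)\]"
)

def parse_votedisks(output):
    """Return one dict per voting file from 'crsctl query css votedisk' output."""
    return [m.groupdict() for m in map(VOTEDISK_RE.match, output.splitlines()) if m]
```

The FUIDs extracted this way are exactly what crsctl delete css votedisk expects.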
– Voting files that reside in ASM may be automatically deleted and added back if one
of the existing voting files gets corrupted.
– Voting files can be migrated from NAS to ASM (and vice versa) and from ASM to ASM, e.g.:
$ crsctl replace css votedisk /nas/vdfile1 /nas/vdfile2 /nas/vdfile3
or
$ crsctl replace css votedisk +OTHERDG
– If all voting files are corrupted, however, you can restore them as described below.
If the cluster is down and cannot restart due to lost voting files, then you must start
CSS in exclusive mode to replace the voting files by entering the following
command:
o # crsctl start crs -excl (on one node only)
o # crsctl delete css votedisk FUID
o # crsctl add css votedisk path_to_voting_disk
– In case of an extended Oracle Clusterware / extended RAC configuration, the third
voting file must be located on a third storage at a third site to protect against a
data center outage. A third voting file on standard NFS is supported. For more
information see Appendix “Oracle Clusterware 11g release 2 (11.2) - Using standard
NFS to support a third voting file on a stretch cluster configuration”.
See Also: Oracle Clusterware Administration and Deployment Guide, "Voting file, Oracle
Cluster Registry, and Oracle Local Registry" for more information. For information about
extended clusters and how to configure the quorum voting file, see the Appendix.
2.3 Oracle Cluster Registry (OCR)
As of 11.2, OCR can also be stored in ASM. The ASM partnership and status table (PST) is
replicated on multiple disks and is extended to store OCR. Consequently, OCR can tolerate
the loss of the same number of disks as the underlying disk group, and can be
relocated / rebalanced in response to disk failures.
In order to store an OCR on a disk group, the disk group has a ‘special’ file type called ‘ocr’.
The default configuration location is /etc/oracle/ocr.loc
# cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE
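Since ocr.loc (and olr.loc, discussed later) are plain key=value files, they are trivial to read programmatically. A hypothetical sketch:

```python
def parse_ocr_loc(text):
    """Parse key=value pairs from an ocr.loc / olr.loc style file."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            conf[key.strip()] = value.strip()
    return conf
```

For the file above, this returns ocrconfig_loc pointing at the +DATA disk group and local_only set to FALSE.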
From a user and maintenance perspective, the rest remains the same. The OCR can only be
configured in ASM once the cluster has completely migrated to 11.2 (crsctl query crs
activeversion >= 11.2.0.1.0). Mixed configurations are still supported, so one OCR could be
stored in ASM and another on a supported NAS device; up to 5 OCR locations are supported
in 11.2.0.1. Raw or block devices are no longer supported for either OCR or voting files.
The OCR diskgroup is auto-mounted by the ASM instance during startup. The CRSD and ASM
dependency is maintained by OHASD.
OCRCHECK
There are small enhancements in ocrcheck, like the -config option, which checks only the
configuration. Run ocrcheck as root, otherwise the logical corruption check will not run. To
check OLR data, use the -local keyword.
Usage: ocrcheck [-config] [-local]
Shows OCR version, total, used and available space
Performs OCR block integrity (header and checksum) checks
Performs OCR logical corruption checks (11.1.0.7)
‘-config’ checks just the configuration (11.2)
‘-local’ checks the OLR instead of the default OCR
Can be run when the stack is up or down
The output is similar to the following:
# ocrcheck
Status of Oracle Cluster Registry is as follows:
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3072
Available space (kbytes) : 259048
ID : 701301903
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File Name : /nas/cluster3/ocr3
Device/File integrity check succeeded
Device/File Name : /nas/cluster5/ocr1
Device/File integrity check succeeded
Device/File Name : /nas/cluster2/ocr2
Device/File integrity check succeeded
Device/File Name : /nas/cluster4/ocr4
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded
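One simple sanity check on ocrcheck output is that used plus available space equals the total. A hypothetical sketch that extracts the space figures and verifies this invariant:

```python
import re

def parse_ocr_space(output):
    """Extract total/used/available kbytes from ocrcheck output and
    verify that used + available == total."""
    fields = {}
    for label in ("Total space", "Used space", "Available space"):
        m = re.search(label + r" \(kbytes\)\s*:\s*(\d+)", output)
        if m:
            fields[label] = int(m.group(1))
    ok = fields["Used space"] + fields["Available space"] == fields["Total space"]
    return fields, ok
```

For the output above (262120 total, 3072 used, 259048 available) the invariant holds.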
2.4 Oracle Local Registry (OLR)
The OLR, similar in structure to the OCR, is a node-local repository managed by
OHASD. The configuration data in the OLR pertains to the local node only, and is not shared
among other nodes.
The configuration is stored in ‘/etc/oracle/olr.loc’ (on Linux) or the equivalent on other
operating systems. The default location after installing Oracle Clusterware is:
– RAC: Grid_home/cdata/<hostname>.olr
– Oracle Restart: Grid_home/cdata/localhost/hostname.
The information stored in the OLR is needed by OHASD to start or join a cluster; this includes
data about GPnP wallets, clusterware configuration and version information.
OLR keys have the same properties as OCR keys and the same tools are used to either check
or dump them.
To see the OLR location, run the command:
# ocrcheck -local -config
Oracle Local Registry configuration is :
Device/File Name : Grid_home/cdata/node1.olr
To dump the OLR content, run the command:
# ocrdump -local -stdout (or a filename)
Run ocrdump -h to get the usage.
See Also: Oracle Clusterware Administration and Deployment Guide, "Managing the Oracle
Cluster Registry and Oracle Local Registries" for more information about using the ocrconfig
and ocrcheck utilities.
2.5 Bootstrap and Shutdown if OCR is located in ASM
ASM has to be up with the diskgroup mounted before any OCR operations can be
performed. There are bugs reported for cases where the diskgroup containing the OCR was
force-dismounted and/or the ASM instance was shut down with abort.
When the stack is running, CRSD keeps reading/writing OCR.
OHASD maintains the resource dependency and will bring up ASM with the required
diskgroup mounted before it starts CRSD.
Once ASM is up with the diskgroup mounted, the usual ocr* commands (ocrcheck,
ocrconfig, etc.) can be used.
The shutdown command will fail with an ORA-15097 for an ASM instance with an active
OCR in it (meaning that crsd is running on this node). In order to see which clients are
accessing ASM, use the commands:
asmcmd lsct (v$asm_client)
DB_Name Status Software_Version Compatible_version Instance_Name Disk_Group
+ASM CONNECTED 11.2.0.1.0 11.2.0.1.0 +ASM2 DATA
asmcmd lsof
DB_Name Instance_Name Path
+ASM +ASM2 +data.255.4294967295
Where +data.255 is the OCR file number, which is used to identify the OCR file within ASM.
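ASM file paths like the one above follow the pattern +<diskgroup>.<file_number>.<incarnation>. A hypothetical helper that splits such a path into its components:

```python
def parse_asm_file_path(path):
    """Split an ASM path like '+data.255.4294967295' into
    (diskgroup, file_number, incarnation)."""
    dg, fileno, inc = path.lstrip("+").split(".")
    return dg, int(fileno), int(inc)
```

For the lsof output above, this yields disk group data and file number 255, identifying the OCR file.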
2.6 OCR in ASM diagnostics
If any error occurs,
– Ensure that the ASM instance is up and running with the required diskgroup
mounted, and/or check ASM alert.log for the status for the ASM instance.
– Verify that the OCR files were properly created in the diskgroup, using asmcmd ls.
Since the clusterware stack keeps accessing OCR files, most of the time the error
will show up as a CRSD error in crsd.log. Any error related to an ocr* command
(which, like crsd, is also considered an ASM client) will generate a trace file in the
Grid_home/log/<hostname>/client directory; in either case, look for kgfo / kgfp /
kgfn at the top of the error stack.
– Confirm that the ASM compatible.asm property of the diskgroup is set to at least
11.2.0.0.
2.7 The ASM Diskgroup Resource
When the diskgroup is created, the diskgroup resource is automatically created with the
name ora.<DGNAME>.dg, and its status is set to ONLINE. The status is set to OFFLINE if
the diskgroup is dismounted, as this is a CRS-managed resource now. When the diskgroup
is dropped, the diskgroup resource is removed as well.
A dependency between the database and the diskgroup is automatically created when the
database tries to access the ASM files. However, when the database no longer uses
the ASM files or the ASM files are removed, the database dependency is not removed
automatically. This must be done using the srvctl command line tool.
Typical ASM alert.log messages for success/failure and warnings are
Success:
NOTE: diskgroup resource ora.DATA.dg is offline
NOTE: diskgroup resource ora.DATA.dg is online
Failure
ERROR: failed to online diskgroup resource ora.DATA.dg
ERROR: failed to offline diskgroup resource ora.DATA.dg
Warning
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to
communicate with CRSD/OHASD)
This warning may appear when the stack is started
WARNING: unknown state for diskgroup resource ora.DATA.dg
If errors happen, look at the ASM alert.log for the related resource operation status message
like,
“ERROR”: the resource operation failed; check the CRSD log and agent log for more
details:
Grid_home/log/<hostname>/crsd/
Grid_home/log/<hostname>/agent/crsd/oraagent_user/
“WARNING”: cannot communicate with CRSD.
This warning can be ignored during bootstrap, as the ASM instance starts up and mounts
the diskgroup before CRSD.
The status of the diskgroup resource and the diskgroup should be consistent. In rare cases,
they may become out of sync transiently. To get them back in sync, manually run srvctl to
sync the status, or wait some time for the agent to refresh the status. If they stay out of
sync for a long period, please check the CRSD log and ASM log for more details.
To turn on more comprehensive tracing use event="39505 trace name context forever, level 1".
2.8 The Quorum Failure Group
A quorum failure group is a special type of failure group: disks in these failure groups do
not contain user data and are not considered when determining redundancy requirements.
The COMPATIBLE.ASM disk group compatibility attribute must be set to 11.2 or greater to
store OCR or voting file data in a disk group.
During Oracle Clusterware installation, we do not offer to create a quorum failure group,
which is needed for a third voting file in case of extended / stretched clusters or two
storage arrays.
Create a diskgroup with failgroups, and optionally a quorum failgroup if a third array is
available:
SQL> CREATE DISKGROUP PROD NORMAL REDUNDANCY
FAILGROUP fg1 DISK '<a disk in SAN1>'
FAILGROUP fg2 DISK '<a disk in SAN2>'
QUORUM FAILGROUP fg3 DISK '<another disk or file on a third location>'
ATTRIBUTE 'compatible.asm' = '11.2.0.0';
If the diskgroup creation was done using ASMCA, then after adding a quorum disk to the disk
group, Oracle Clusterware will automatically change the CSS votedisk location to something
like below:
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 3e1836343f534f51bf2a19dff275da59 (/dev/sdg10) [DATA]
2. ONLINE 138cbee15b394f3ebf57dbfee7cec633 (/dev/sdf11) [DATA]
3. ONLINE 462722bd24c94f70bf4d90539c42ad4c (/voting_disk/vote_node1)
[DATA]
Located 3 voting file(s).
If it is done via SQL*Plus, crsctl replace css votedisk must be used.
See Also: Oracle Database Storage Administrator's Guide, "Oracle ASM Failure Groups" for
more information. Oracle Clusterware Administration and Deployment Guide, "Voting file,
Oracle Cluster Registry, and Oracle Local Registry" for more information about backup and
restore and failure recovery.
2.9 ASM spfile
2.9.1 ASM spfile location
Oracle recommends that the Oracle ASM SPFILE be placed in a disk group. You cannot use a
new alias created on an existing Oracle ASM SPFILE to start up the Oracle ASM instance.
If you do not use a shared Oracle grid infrastructure home, then the Oracle ASM instance
can use a PFILE. The same rules for file name, default location, and search order that apply
to database initialization parameter files also apply to Oracle ASM initialization parameter
files.
When an Oracle ASM instance searches for an initialization parameter file, the search order
is:
– The location of the initialization parameter file specified in the Grid Plug and Play
(GPnP) profile
– If the location has not been set in the GPnP profile, the search order changes to:
o SPFILE in the Oracle ASM instance home
For example, the SPFILE for Oracle ASM has the following default path in
the Oracle grid infrastructure home in a Linux environment:
$ORACLE_HOME/dbs/spfile+ASM.ora
o PFILE in the Oracle ASM instance home
2.9.2 Backing Up and Moving an ASM spfile
You can back up, copy, or move an Oracle ASM SPFILE with the ASMCMD spbackup, spcopy
or spmove commands. For information about these ASMCMD commands see the Oracle
Database Storage Administrator's Guide.
See Also: Oracle Database Storage Administrator's Guide "Configuring Initialization
Parameters for an Oracle ASM Instance" for more information.
3 Resources
Oracle Clusterware manages applications and processes as resources that you register with
Oracle Clusterware. The number of resources you register with Oracle Clusterware to
manage an application depends on the application. Applications that consist of only one
process are usually represented by only one resource. More complex applications, built on
multiple processes or components, may require multiple resources.
3.1 Resource types
Generally, all resources are unique but some resources may have common attributes. Oracle
Clusterware uses resource types to organize these similar resources. Using resource types
provides the following benefits:
– Manage only necessary resource attributes
– Manage all resources based on the resource type
Every resource that is registered in Oracle Clusterware must have a certain resource type. In
addition to the resource types included in Oracle Clusterware, custom resource types can be
defined using the crsctl utility. The included resource types are:
– Base resource: base type
– Local resource: instances of local resources (type name is local_resource) run on
each server of the cluster, e.g. ora.node14.vip.
– Cluster resource: cluster-aware resource types (type name is cluster_resource) are
aware of the cluster environment and are subject to cardinality and cross-server
switchover and failover; example: ora.asm.
All user-defined resource types must be based, directly or indirectly, on either the
local_resource or cluster_resource type.
In order to list all defined types and their base types, run the crsctl stat type command:
TYPE_NAME=application
BASE_TYPE=cluster_resource
TYPE_NAME=cluster_resource
BASE_TYPE=resource
TYPE_NAME=local_resource
BASE_TYPE=resource
TYPE_NAME=ora.asm.type
BASE_TYPE=ora.local_resource.type
TYPE_NAME=ora.cluster_resource.type
BASE_TYPE=cluster_resource
TYPE_NAME=ora.cluster_vip.type
BASE_TYPE=ora.cluster_resource.type
TYPE_NAME=ora.cluster_vip_net1.type
BASE_TYPE=ora.cluster_vip.type
TYPE_NAME=ora.database.type
BASE_TYPE=ora.cluster_resource.type
TYPE_NAME=ora.diskgroup.type
BASE_TYPE=ora.local_resource.type
TYPE_NAME=ora.eons.type
BASE_TYPE=ora.local_resource.type
TYPE_NAME=ora.gns.type
BASE_TYPE=ora.cluster_resource.type
TYPE_NAME=ora.gns_vip.type
BASE_TYPE=ora.cluster_vip.type
TYPE_NAME=ora.gsd.type
BASE_TYPE=ora.local_resource.type
TYPE_NAME=ora.listener.type
BASE_TYPE=ora.local_resource.type
TYPE_NAME=ora.local_resource.type
BASE_TYPE=local_resource
TYPE_NAME=ora.network.type
BASE_TYPE=ora.local_resource.type
TYPE_NAME=ora.oc4j.type
BASE_TYPE=ora.cluster_resource.type
TYPE_NAME=ora.ons.type
BASE_TYPE=ora.local_resource.type
TYPE_NAME=ora.registry.acfs.type
BASE_TYPE=ora.local_resource.type
TYPE_NAME=ora.scan_listener.type
BASE_TYPE=ora.cluster_resource.type
TYPE_NAME=ora.scan_vip.type
BASE_TYPE=ora.cluster_vip.type
TYPE_NAME=resource
BASE_TYPE=
To list all attributes and default values for a type, run crsctl stat type <typeName> -f (for the full configuration) or –p (for the static configuration).
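For example, a single type from the listing above can be inspected like this (output varies by installation and is therefore not shown):

```shell
# Print the static configuration (profile) of the SCAN VIP type;
# use -f instead of -p to see the full configuration
crsctl stat type ora.scan_vip.type -p
```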
3.1.1 Base Resource Type Definition
This section specifies the attributes that make up the resource type definition. A resource is
an abstract and read-only type definition. The type may only serve as a base for other types.
Oracle Clusterware 11.2.0.1 will not allow user-defined types to extend this type directly.
To see all default values and names from the base resource type, run crsctl stat type
resource –p.
Name History Description
NAME From
10gR2
The name of the resource. Resource names must be unique
and may not be modified once the resource is created.
TYPE From
10gR2,
modified
Semantics are unchanged; values other than application exist
Type: string
Special Values: No
CHECK_INTERVAL From
10gR2
Unchanged
Type: unsigned integer
Special Values: No
Per-X Support: Yes
DESCRIPTION From
10gR2
Unchanged
Type: string
Special Values: No
RESTART_ATTEMPTS From
10gR2
Unchanged
Type: unsigned integer
Special Values: No
Per-X Support: Yes
START_TIMEOUT From
10gR2
Unchanged
Type: unsigned integer
Special Values: No
Per-X Support: Yes
STOP_TIMEOUT From
10gR2
Unchanged
Type: unsigned integer
Special Values: No
Per-X Support: Yes
SCRIPT_TIMEOUT From
10gR2
Unchanged
Type: unsigned integer
Special Values: No
Per-X Support: Yes
UPTIME_THRESHOLD From
10gR2
Unchanged
Type: string
Special Values: No
Per-X Support: Yes
AUTO_START From
10gR2
Unchanged
Type: string
Format: restore|never|always
Required: No
Default: restore
Special Values: No
BASE_TYPE New The name of the base type from which this type extends. This
is the value of the “TYPE” in the base type’s profile.
Type: string
Format: [name of the base type]
Required: Yes
Default: empty string (none)
Special Values: No
Per-X Support: No
DEGREE New The number of instances of the resource that are allowed
to run on a single server. Today's application has a fixed
degree of one. DEGREE supports multiplicity within a server.
Type: unsigned integer
Format: [number of attempts, >=1]
Required: No
Default: 1
Special Values: No
ENABLED New The flag that governs whether the resource is managed by
Oracle Clusterware, which will not attempt to manage a
disabled resource, whether directly or because of a
dependency on another resource. However, stopping the
resource when requested by the administrator will be allowed
(so as to make it possible to disable a resource without having
to stop it). Additionally, any change to the resource’s state
performed by an ‘outside force’ will still be proxied into the
clusterware.
Type: unsigned integer
Format: 1 | 0
Required: No
Default: 1
Special Values: No
Per-X Support: Yes
START_DEPENDENCIES New Specifies a set of relationships that govern the start of the
resource.
Type: string
Required: No
Default:
Special Values: No
STOP_DEPENDENCIES New Specifies a set of relationships that govern the stop of the
resource.
Type: string
Required: No
Default:
Special Values: No
AGENT_FILENAME New An absolute filename (that is, inclusive of the path and file
name) of the agent program that handles this type. Every
resource type must have an agent program that handles its
resources. Types can do so by either specifying the value for
this attribute or inheriting it from their base type.
Type: string
Required: Yes
Special Values: Yes
Per-X Support: Yes (per-server only)
ACTION_SCRIPT From
10gR2,
modified
An absolute filename (that is, inclusive of the path and file
name) of the action script file. This attribute is used in
conjunction with the AGENT_FILENAME. CRSD will invoke the
script in the manner it did in 10g for all entry points
(operations) not implemented in the agent binary. That is, if
the agent program implements a particular entry point, it is
invoked; if it does not, the script specified in this attribute will
be executed.
Please note that for backwards compatibility with previous releases, a built-in agent for the application type will be
included with CRS. This agent is implemented to always
invoke the script specified with this attribute.
Type: string
Required: No
Default:
Special Values: Yes
Per-X Support: Yes (per-server only)
ACL New Contains permission attributes. The value is populated at
resource creation time based on the identity of the process creating the resource, unless explicitly overridden. The value
can subsequently be changed using the APIs/command line
utilities, provided that such a change is allowed based on the
existing permissions of the resource.
Format: owner:<user>:rwx,pgrp:<group>:rwx,other::r--
Where
owner: the OS User of the resource owner, followed by the
permissions that the owner has. Resource actions will be
executed with this user ID.
pgrp: the OS Group that is the resource’s primary group,
followed by the permissions that members of the group have
other: followed by permissions that others have
Type: string
Required: No
Special Values: No
STATE_CHANGE_EVENT_TEMPLATE New The template for the State Change events.
Type: string
Required: No
Default:
Special Values: No
PROFILE_CHANGE_EVENT_TEMPLATE New The template for the Profile Change events.
Type: string
Required: No
Default:
Special Values: No
ACTION_FAILURE_EVENT_TEMPLATE New The template for the Action Failure events.
Type: string
Required: No
Default:
Special Values: No
LAST_SERVER New An internally managed, read-only attribute that contains the
name of the server on which the last start action has
succeeded.
Type: string
Required: No, read-only
Default: empty
Special Values: No
OFFLINE_CHECK_INTERVAL New Used for controlling off-line monitoring of a resource. The
value represents the interval (in seconds) to use for implicitly
monitoring the resource when it is OFFLINE. The monitoring is
turned off if the value is 0.
Type: unsigned integer
Required: No
Default: 0
Special Values: No
Per-X Support: Yes
STATE_DETAILS New An internally managed, read-only attribute that contains
details about the state of the resource. The attribute fulfills
the following needs:
1. CRSD understood resource states (Online, Offline,
Intermediate, etc) may map to different resource-specific
values (mounted, unmounted, open, closed, etc). In order to
provide a better description of this mapping, resource agent
developers may choose to provide a ‘state label’ as part of
providing the value of the STATE.
2. Providing the label, unlike the value of the resource state,
is optional. If not provided, the Policy Engine will use CRSD-
understood state values (Online, Offline, etc). Additionally, in
the event the agent is unable to provide the label (as may also
happen to the value of STATE), the Policy Engine will set the
value of this attribute, doing its best to provide the details
as to why the resource is in the state it is (why it is
Intermediate and/or why it is Unknown).
Type: string
Required: No, read-only
Default: empty
Special Values: No
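As a sketch of how a user-defined type builds on these base types, a new type could be derived from cluster_resource as follows (the type name app.appvip.type and the custom attribute are hypothetical; since the base resource type cannot be extended directly, the example uses cluster_resource):

```shell
# Create a new type derived from cluster_resource, adding one
# custom attribute with a default value
crsctl add type app.appvip.type -basetype cluster_resource \
  -attr "ATTRIBUTE=APP_PORT,TYPE=int,DEFAULT_VALUE=8080"

# Verify the new type and its attributes
crsctl stat type app.appvip.type -p
```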
3.1.2 Local Resource Type Definition
The local_resource type is the basic building block for resources that are instantiated for
each server but are cluster oblivious and have a locally visible state. While the definition of
the type is global to the clusterware, the exact property values of the resource instantiation
on a particular server are stored on that server. This resource type has no equivalent in
Oracle Clusterware 10gR2 and is a totally new concept to Oracle Clusterware.
The following table specifies the attributes that make up the local_resource type definition.
To see all default values run the command crsctl stat type local_resource –p.
Name Description
ALIAS_NAME Type: string
Required: No
Special Values: Yes
Per-X Support: No
LAST_SERVER Overridden from resource: the name of the server to which the resource
is assigned (“pinned”).
Only Cluster Administrators will be allowed to register local resources.
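A minimal, illustrative registration of a local resource might look as follows (the resource name, action script path, and OS user are hypothetical; the script must implement the start/stop/check entry points). The ACL populated at creation time, as described for the base type, can then be inspected and adjusted:

```shell
# Register a node-local resource based on local_resource
crsctl add resource app.logmonitor -type local_resource \
  -attr "ACTION_SCRIPT=/opt/app/bin/logmonitor.sh,CHECK_INTERVAL=30"

# Inspect the ACL that was populated at creation time
crsctl getperm resource app.logmonitor

# Change the resource owner (allowed only if the existing
# permissions of the resource permit the change)
crsctl setperm resource app.logmonitor -o oracle

crsctl start resource app.logmonitor
```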
3.1.3 Cluster Resource Type Definition
The cluster_resource is the basic building block for resources that are cluster aware and
have globally visible state. 11.1's application is a cluster_resource. The type's base is
resource. The type definition is read-only.
The following table specifies the attributes that make up the cluster_resource type
definition. Run crsctl stat type cluster_resource –p to see all default values.
Name History Description
ACTIVE_PLACEMENT From 10gR2 Unchanged
Type: unsigned integer
Special Values: No
FAILOVER_DELAY From 10gR2 Unchanged, Deprecated
Special Values: No
FAILURE_INTERVAL From 10gR2 Unchanged
Type: unsigned integer
Special Values: No
Per-X Support: Yes
FAILURE_THRESHOLD From 10gR2 Unchanged
Type: unsigned integer
Special Values: No
Per-X Support: Yes
PLACEMENT From 10gR2 Format: value
where value is one of the following:
restricted
Only servers that belong to the associated server
pool(s) or hosting members may host instances of the
resource.
favored
If only SERVER_POOLS or HOSTING_MEMBERS
attribute is non-empty, servers belonging to the
specified server pool(s)/hosting member list will be
considered first if available; if/when none are available,
any other server will be used.
If both SERVER_POOLS and HOSTING_MEMBERS are
populated, the former indicates preference while the
latter restricts the choices to the servers within that
preference.
balanced
Any ONLINE, enabled server may be used for
placement. Less loaded servers will be preferred to
more loaded ones. To measure how loaded a server is,
clusterware will use the LOAD attribute of resources
that are ONLINE on the server. The sum total of LOAD values is used as the absolute measure of the current
server load.
Type: string
Default: balanced
Special Values: No
HOSTING_MEMBERS From 10g The meaning of this attribute is carried over from the
previous release.
Although not officially deprecated, the use of this
attribute is discouraged.
Special Values: No
Required: @see SERVER_POOLS
SERVER_POOLS New Format:
* | [<pool name1> […]]
This attribute creates an affinity between the resource
and one or more server pools as far as placement goes.
The meaning of this attribute depends on what the
value of PLACEMENT is.
When a resource should be able to run on any server of
the cluster, a special value of * needs to be used. Note
that only Cluster Administrators can specify * as the
value for this attribute.
Required: No
Default: 1
Special Values: No
Per-X Support: Yes
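Putting the placement attributes together, a cluster resource restricted to one server pool might be registered like this (the resource name, script path, and pool name are illustrative; CARDINALITY is a cluster_resource attribute governing how many instances run in the cluster):

```shell
# A cluster-aware resource that may only run on servers in pool 'appPool'
crsctl add resource app.service -type cluster_resource \
  -attr "ACTION_SCRIPT=/opt/app/bin/service.sh,PLACEMENT=restricted,SERVER_POOLS=appPool,CARDINALITY=1"
```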
3.2 Resource Dependencies
With Oracle Clusterware 11.2 a new dependency concept is introduced, which allows
dependencies for start and stop actions to be defined independently and with much finer granularity.
3.2.1 Hard Dependency
If resource A has a hard dependency on resource B, B must be ONLINE before A will be
started. Please note there is no requirement that A and B be located on the same server.
A possible parameter to this dependency would allow resource B to be in either ONLINE
or INTERMEDIATE state. Such a variation is sometimes referred to as the intermediate
dependency.
Another possible parameter to this dependency would make it possible to differentiate if A
requires that B be present on the same server or on any server in the cluster. In the former
case, the presence of resource B on the same server as A is a must for resource
A to start.
If the dependency is on a resource type, as opposed to a concrete resource, this should be
interpreted as “any resource of the type”. The aforementioned modifiers for locality/state
still apply accordingly.
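A hard dependency and its modifiers are expressed in the START_DEPENDENCIES profile attribute; as a hedged sketch (resource names are illustrative):

```shell
# A must not start unless B is ONLINE; the global: modifier accepts B
# on any server, intermediate: would also accept B in INTERMEDIATE state
crsctl modify resource app.A \
  -attr "START_DEPENDENCIES='hard(global:app.B)'"
```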
3.2.2 Weak Dependency
If resource A has a weak dependency on resource B, an attempt to start A will also attempt to
start B if B is not ONLINE. The result of the attempt to start B is, however, of no consequence
to the result of starting A (it is ignored). Additionally, if the start of A causes an attempt to start
B, failure to start A has no effect on B.
A possible parameter to this dependency is whether or not the start of A should wait for
start of B to complete or may execute concurrently.
Another possible parameter to this dependency would make it possible to differentiate if A
desires that B be running on the same server or on any server in the cluster. In the former
case, the presence of resource B on the same server as A is desired for
resource A to start. In addition to the desire to have the dependent resource started locally
or on any server in the cluster, another possible parameter is to start the dependent
resource on every server where it can run.
If the dependency is on a resource type, as opposed to a concrete resource, this should be
interpreted as “every resource of the type”. The aforementioned modifiers for locality/state
still apply accordingly.
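A weak dependency with the parameters described above might be declared as follows (names are illustrative):

```shell
# Starting A also attempts to start B, but a failure of B is ignored;
# concurrent: lets both starts proceed in parallel, global: accepts
# B running on any server in the cluster
crsctl modify resource app.A \
  -attr "START_DEPENDENCIES='weak(concurrent:global:app.B)'"
```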
3.2.3 Attraction
If resource A attracts B, then whenever B needs to be started, servers that currently have A
running will be first on the list of placement candidates. Since a resource may have more
than one resource to which it is attracted, the number of attraction-exhibiting resources will
govern the order of precedence as far as server placement goes.
If the dependency is on a resource type, as opposed to a concrete resource, this should be
interpreted as “any resource of the type”.
A possible flavor of this relation is to require that a resource’s placement be re-evaluated
when a related resource’s state changes. For example, resource A is attracted to B and C. At
the time of starting A, A is started where B is. Resource C may either be running or started
thereafter. Resource B is subsequently shut down/fails and does not restart. Then resource
A requires that at this moment its placement be re-evaluated and it be moved to C. This is
somewhat similar to the AUTO_START attribute of the resource profile, with the dependent
resource’s state change acting as a trigger as opposed to a server joining the cluster.
A possible parameter to this relation is whether or not resources in INTERMEDIATE state
should be counted as running and thus exhibit attraction.
If resource A excludes resource B, then starting resource A on a server where B is
running will be impossible. However, please see the dependency's namesake for STOP to
find out how B may be stopped or relocated so that A may start.
3.2.4 Pull-up
If a resource A needs to be auto-started whenever resource B is started, this dependency is
used. Note that the dependency will only affect A if it is not already running. As is the case
for other dependency types, pull-up may cause the dependent resource to start on any or
the same server, which is parameterized. Another possible parameter to this dependency
would allow resource B going to either ONLINE or INTERMEDIATE state to trigger pull-up
of A. Such a variation is sometimes referred to as the intermediate dependency. Note that if
resource A has pull-up relation to resources B and C, then it will only be pulled up when both
B and C are started. In other words, the meaning of resources mentioned in the pull-up
specification is interpreted as a Boolean AND.
Another variation in this dependency is if the value of the TARGET of resource A plays a role:
in some cases, a resource needs to be pulled-up irrespective of its TARGET while in others
only if the value of TARGET is ONLINE. To accommodate both needs, the relation offers a
modifier to let users specify if the value of the TARGET is irrelevant; by default, pull-up will
only start resources if their TARGET is ONLINE. Note that this modifier is on the relation, not
on any of the targets as it applies to the entire relation.
If the dependency is on a resource type, as opposed to a concrete resource, this should be interpreted as “any resource of the type”. The aforementioned modifiers for locality/state
still apply accordingly.
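A pull-up relation, including the TARGET modifier discussed above, might be expressed like this (names are illustrative):

```shell
# Whenever B starts, pull up A as well (if A is not already running);
# the always modifier pulls A up irrespective of A's TARGET value,
# by default pull-up only applies when A's TARGET is ONLINE
crsctl modify resource app.A \
  -attr "START_DEPENDENCIES='pullup:always(app.B)'"
```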
3.2.5 Dispersion
The dispersion relation describes two resources that desire to avoid being co-located,
provided there is an alternative other than one of them being stopped. In other words, if
resource A prefers to run on a different server than the one occupied by resource B, then
resource A is said to have a dispersion relation to resource B at
start time. This sort of relation between resources has an advisory effect, much like that of
attraction: it is not binding as the two resources may still end up on the same server.
A special variation on this relation is whether or not crsd is allowed/expected to disperse
resources that are already running, once doing so becomes possible. In other words,
normally crsd will not disperse co-located resources when, for example, a new server comes
online: it will not actively relocate resources once they are running, only disperse them when
starting them. However, if the dispersion is ‘active’, then crsd will try to relocate one of the
resources that disperse to the newly available server.
A possible parameter to this relation is whether or not resources in INTERMEDIATE state
should be counted as running and thus exhibit dispersion.
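Attraction and dispersion might be declared as follows (resource names are illustrative; an active dispersion would use the dispersion:active modifier):

```shell
# A prefers to start on servers where B is already running (attraction)
crsctl modify resource app.A \
  -attr "START_DEPENDENCIES='attraction(app.B)'"

# C prefers to avoid servers where A runs, if alternatives exist (dispersion)
crsctl modify resource app.C \
  -attr "START_DEPENDENCIES='dispersion(app.A)'"
```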
4 Fast Application Notification (FAN)
4.1 Event Sources
In 11.2, the CRSD master is the originator of most events, and the database is the source of the Remote Load Balance (RLB) events. The CRSD master passes events from the
PolicyEngine thread to the ReporterModule thread, in which the events are translated to
eONS events, and then the events are sent out to peers within the cluster. If eONS is not
running, the ReporterModule attempts to cache the events until the eONS server is
running, and then retries. The events are guaranteed to be sent and received in the order in
which the actions happened.
4.2 Event Processing architecture in oraagent
4.2.1 database / ONS / eONS agents
Every node runs one database agent, one ONS agent, and one eONS agent within crsd's
oraagent process. These agents are responsible for stop/start/check actions. There are no
dedicated threads for each agent; instead, oraagent uses a pool of threads to execute these
actions for the various resources.
4.2.2 eONS subscriber threads
Each of the three agents (as mentioned above) is associated with one other thread in the
oraagent that is blocked on ons_subscriber_receive(). These eONS subscriber threads can be
identified by the string "Thread:[EonsSub ONS]", "Thread:[EonsSub EONS]" and
"Thread:[EonsSub FAN]" in the oraagent log. In the example below, a service was stopped
and this node's crsd oraagent process and its three eONS subscribers received the event:
2009-05-26 23:36:40.479: [AGENTUSR][2868419488][UNKNOWN] Thread:[EonsSub FAN]
process {
2009-05-26 23:36:40.500: [AGENTUSR][2868419488][UNKNOWN] Thread:[EonsSub FAN]
process }
2009-05-26 23:36:40.540: [AGENTUSR][2934963104][UNKNOWN] Thread:[EonsSub ONS]
process }
2009-05-26 23:36:40.558: [AGENTUSR][2934963104][UNKNOWN] Thread:[EonsSub ONS]
process {
2009-05-26 23:36:40.563: [AGENTUSR][2924329888][UNKNOWN] Thread:[EonsSub EONS]
process {
2009-05-26 23:36:40.564: [AGENTUSR][2924329888][UNKNOWN] Thread:[EonsSub EONS]
process }
4.2.3 Event Publishers/processors in general
On one node of the cluster, the eONS subscriber of the following agents also assumes the
role of a publisher or processor or master (pick your favorite terminology):
– One dbagent's eONS subscriber assumes the role "CLSN.FAN.pommi.FANPROC"; this
subscriber is responsible for publishing ONS events (FAN events) to the HA alerts
queue for database 'pommi'. There is one FAN publisher per database in the cluster.
– One onsagent's eONS subscriber assumes the role "CLSN.ONS.ONSPROC", publisher
for ONS events; this subscriber is responsible for sending eONS events to ONS
clients.
– Each eonsagent's eONS subscriber on every node publishes eONS events as user
callouts. There is no single eONS publisher in the cluster. User callouts are no longer
produced by racgevtf.
The publishers/processors can be identified by searching for "got lock":
staiu01/agent/crsd/oraagent_spommere/oraagent_spommere.l01:2009-05-26
19:51:41.549: [AGENTUSR][2934959008][UNKNOWN] CssLock::tryLock, got lock
CLSN.ONS.ONSPROC
staiu02/agent/crsd/oraagent_spommere/oraagent_spommere.l01:2009-05-26
19:51:41.626: [AGENTUSR][3992972192][UNKNOWN] CssLock::tryLock, got lock
CLSN.ONS.ONSNETPROC
staiu03/agent/crsd/oraagent_spommere/oraagent_spommere.l01:2009-05-26
20:00:21.214: [AGENTUSR][2856319904][UNKNOWN] CssLock::tryLock, got lock
CLSN.RLB.pommi
staiu02/agent/crsd/oraagent_spommere/oraagent_spommere.l01:2009-05-26
20:00:27.108: [AGENTUSR][3926576032][UNKNOWN] CssLock::tryLock, got lock
CLSN.FAN.pommi.FANPROC
These CSS-based locks work in such a way that any node can grab the lock if it is not already
held. If the process of the lock holder goes away, or CSS thinks the node went away, the lock
is released and someone else tries to get the lock. The different processors try to grab the
lock whenever they see an event. If a processor previously was holding the lock, it doesn't
have to acquire it again. There is currently no implementation of a "backup" or designated
failover-publisher.
4.2.4 ONSNETPROC
In a cluster of 2 or more nodes, one onsagent's eONS subscriber will also assume the role of
CLSN.ONS.ONSNETPROC, i.e., it is responsible for publishing just the network down events. The
publishers with the roles of CLSN.ONS.ONSPROC and CLSN.ONS.ONSNETPROC cannot and
will not run on the same node, i.e. they must run on distinct nodes.
If both the CLSN.ONS.ONSPROC and CLSN.ONS.ONSNETPROC simultaneously get their public
network interface pulled down, there may not be any event.
4.2.5 RLB publisher
Another thread, tied to the dbagent thread in the oraagent process of only one node in
the cluster, is "Thread:[RLB:dbname]". It dequeues the LBA/RLB/affinity event
from the SYS$SERVICE_METRICS queue, and publishes the event to eONS clients. It assumes
the lock role of CLSN.RLB.dbname. The CLSN.RLB.dbname publisher can run on any node,
and is not related to the location of the MMON master (which enqueues LBA events into the
SYS$SERVICE_METRICS queue). Since the RLB publisher (RLB.dbname) can run on a
different node than the ONS publisher (ONSPROC), RLB events can be dequeued on one
node, and published to ONS on another node. There is one RLB publisher per database in
the cluster.
Sample trace, where Node 3 is the RLB publisher, and Node 2 has the ONSPROC role:
– Node 3:
2009-05-28 19:29:10.754: [AGENTUSR][2857368480][UNKNOWN]
Thread:[RLB:pommi] publishing message srvname = rlb
2009-05-28 19:29:10.754: [AGENTUSR][2857368480][UNKNOWN]
Thread:[RLB:pommi] publishing message payload = VERSION=1.0 database=pommi
service=rlb { {instance=pommi_3 percent=25 flag=UNKNOWN
aff=FALSE}{instance=pommi_4 percent=25 flag=UNKNOWN
aff=FALSE}{instance=pommi_2 percent=25 flag=UNKNOWN
aff=FALSE}{instance=pommi_1 percent=25 flag=UNKNOWN aff=FALSE} }
timestamp=2009-05-28 19:29:10
The RLB events will be received by the eONS subscriber of the ONS publisher
(ONSPROC), which then posts the event to ONS:
– Node 2:
2009-05-28 19:29:40.773: [AGENTUSR][3992976288][UNKNOWN] Publishing the
ONS event type database/event/servicemetrics/rlb
4.2.6 Example
– Node 1
o assumes role of FAN/AQ publisher CLSN.FAN.dbname.FANPROC, enqueues
HA events into HA alerts queue
o assumes role of eONS publisher to generate user callouts
MMON enqueues RLB events into SYS$SERVICE_METRICS queue
– Node 2
o assumes role of ONS publisher CLSN.ONS.ONSPROC to publish ONS and RLB
events to ONS subscribers (listener, JDBC ICC/UCP)
o assumes role of eONS publisher to generate user callouts
– Node 3
o assumes role of ONSNET publisher CLSN.ONS.ONSNETPROC to publish ONS
events to ONS subscribers (listener, JDBC ICC/UCP)
o assumes role of eONS publisher to generate user callouts
– Node 4
o assumes role of RLB publisher CLSN.RLB.dbname, dequeues RLB events
from SYS$SERVICE_METRICS queue and posts them to eONS
o assumes role of eONS publisher to generate user callouts
4.2.7 Coming up in 11.2.0.2
The above description is only valid for 11.2.0.1. In 11.2.0.2, the eONS proxy (a.k.a. eONS
server) will be removed, and its functionality will be assumed by evmd. In addition, the
tracing described above will change significantly. The major reason for this change was
the high resource usage of the eONS JVM.
In order to find the publishers in the oraagent.log in 11.2.0.2, search for these patterns:
“ONS.ONSNETPROC CssLockMM::tryMaster I am the master”
“ONS.ONSPROC CssLockMM::tryMaster I am the master”
“FAN.<dbname> CssLockMM::tryMaster I am the master”
“RLB.<dbname> CssSemMM::tryMaster I am the master”
5 Configuration best practices
5.1 Cluster interconnect
Oracle does not recommend configuring separate interfaces for Oracle Clusterware and
Oracle RAC; instead, if multiple private interfaces are configured in the system, we
recommend bonding them into a single interface in order to provide redundancy in case
of a NIC failure. Unless bonded, multiple private interfaces provide only load balancing, not
failover capabilities.
The consequences of changing interface names depend on which name you are changing,
and whether you are also changing the IP address. In cases where you are only changing the
interface names, the consequences are minor. If you change the name for the public
interface that is stored in the OCR, then you also must modify the node applications for each
node. Therefore, you must stop the node applications for this change to take effect.
Changes made with oifcfg delif / setif for the cluster interconnect also change the private interconnect used by the clusterware; hence, an Oracle Clusterware restart is required.
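A hedged sketch of inspecting and changing the interconnect classification with oifcfg (the interface names and subnet are examples; remember that changing the cluster interconnect requires a Clusterware restart):

```shell
# List the currently configured interfaces and their roles
oifcfg getif

# Classify bond0 as the cluster interconnect, cluster-wide
oifcfg setif -global bond0/192.168.10.0:cluster_interconnect

# Remove the classification of the old interface
oifcfg delif -global eth2/192.168.10.0
```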
The interface used by the Oracle RAC (RDBMS) interconnect must be the same interface that
Oracle Clusterware is using with the hostname. Do not configure the private interconnect
for Oracle RAC on a separate interface that is not monitored by Oracle Clusterware.
See Also: Oracle Clusterware Administration and Deployment Guide, "Changing Network
Addresses on Manually Configured Networks" for more information.
5.2 misscount
As misscount is a critical value, Oracle does not support changing the default value. The
current misscount value can be checked with
# crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
In case of vendor clusterware integration we set misscount to 600 in order to give the
vendor clusterware enough time to make a node join / leave decision. Never change the
default in a vendor clusterware configuration.
6 Clusterware Diagnostics and Debugging
6.1 Check Cluster Health
After a successful cluster installation or node startup the health of the entire cluster or a
node can be checked.
‘crsctl check has’ will check whether OHASD is started on the local node and whether the
daemon is healthy.
# crsctl check has
CRS-4638: Oracle High Availability Services is online
‘crsctl check crs’ will check the OHASD, the CRSD, the ocssd and the EVM daemon.
# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
‘crsctl check cluster –all’ will check all the daemons from all nodes belonging to that cluster.
# crsctl check cluster –all
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
When facing startup issues, monitor the output from the crsctl start cluster command; all
attempts to start a resource should be successful. If the start of a resource fails, consult the
appropriate log file to see the errors.
# crsctl start cluster
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'node1'
CRS-2672: Attempting to start 'ora.asm' on 'node1'
CRS-2676: Start of 'ora.evmd' on 'node1' succeeded
CRS-2676: Start of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded
6.2 crsctl command line tool
crsctl is the Oracle Clusterware management utility; it provides commands to manage all
Clusterware entities under the Oracle Clusterware framework. This includes the daemons
that are part of the Clusterware, wallet management, and clusterized commands that work
on all or some of the nodes in the cluster.
You can use CRSCTL commands to perform several operations on Oracle Clusterware, such
as:
– Starting and stopping Oracle Clusterware resources
– Enabling and disabling Oracle Clusterware daemons
– Checking the health of the cluster
– Managing resources that represent third-party applications
– Integrating Intelligent Platform Management Interface (IPMI) with Oracle Clusterware to
provide failure isolation support and to ensure cluster integrity
– Debugging Oracle Clusterware components
Most of these operations are cluster-wide.
See Also: Oracle Clusterware Administration and Deployment Guide, "CRSCTL Utility
Reference" for more information about using crsctl.
You can use crsctl set log commands as the root user to enable dynamic debugging for
Cluster Ready Services (CRS), Cluster Synchronization Services (CSS), the Event Manager
(EVM), and the Clusterware subcomponents. You can dynamically change debugging levels
using crsctl debug commands. Debugging information remains in the Oracle Cluster Registry
for use during the next startup. You can also enable debugging for resources.
A comprehensive list of all debugging features and options can be found in the
“Troubleshooting and Diagnostic Output” section of the “Oracle Clusterware Administration
and Deployment Guide”.
6.3 Trace File Infrastructure and Location
Oracle Clusterware uses a unified log directory structure to consolidate component log files.
This consolidated structure simplifies diagnostic information collection and assists during
data retrieval and problem analysis.
Oracle Clusterware uses a file rotation approach for log files. If you cannot find the file
referenced in the "Details in" section of an alert file message, the file may have been rolled
over to a rollover version, typically ending in *.lnumber, where number starts at 01 and
increments up to however many logs are kept; the total can differ from log to log. While
there is usually no need to follow the reference unless you are asked to do so by Oracle
Support, you can check the given path for rollover versions of the file. The log retention
policy, however, foresees that older logs are purged as required by the amount of logs
generated.
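As an illustrative sketch of working with rolled-over logs (the file names and message text below are made up, not taken from a live installation), the current log and its rollover versions can be searched in one pass:

```shell
# Create a stand-in log directory with a current log and one rollover.
LOGDIR=$(mktemp -d)
printf 'older entry: rolled-over message\n' > "$LOGDIR/crsd.l01"
printf 'recent entry: current message\n'    > "$LOGDIR/crsd.log"
# Search the current log and all rollover versions (*.l01, *.l02, ...)
# together; -h suppresses file name prefixes in the output.
hits=$(grep -h 'rolled-over' "$LOGDIR"/crsd.log "$LOGDIR"/crsd.l[0-9]*)
echo "$hits"
rm -rf "$LOGDIR"
```

In a real installation the same pattern applies under GRID_HOME/log/<host>/<component>.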
GRID_HOME/log/<host>/diskmon – Disk Monitor Daemon
GRID_HOME/log/<host>/client – OCRDUMP, OCRCHECK, OCRCONFIG, CRSCTL – edit the GRID_HOME/srvm/admin/ocrlog.ini file to increase the trace level
GRID_HOME/log/<host>/admin – not used
GRID_HOME/log/<host>/ctssd – Cluster Time Synchronization Service
GRID_HOME/log/<host>/gipcd – Grid Interprocess Communication Daemon
GRID_HOME/log/<host>/ohasd – Oracle High Availability Services Daemon
GRID_HOME/log/<host>/crsd – Cluster Ready Services Daemon
GRID_HOME/log/<host>/gpnpd – Grid Plug and Play Daemon
GRID_HOME/log/<host>/mdnsd – Multicast Domain Name Service Daemon
GRID_HOME/log/<host>/evmd – Event Manager Daemon
GRID_HOME/log/<host>/racg/racgmain – RAC RACG
GRID_HOME/log/<host>/racg/racgeut – RAC RACG
GRID_HOME/log/<host>/racg/racgevtf – RAC RACG
GRID_HOME/log/<host>/racg – RAC RACG (only used if a pre-11.1 database is installed)
GRID_HOME/log/<host>/cssd – Cluster Synchronization Service Daemon
GRID_HOME/log/<host>/srvm – Server Manager
GRID_HOME/log/<host>/agent/ohasd/oraagent_oracle11 – HA Service Daemon Agent
GRID_HOME/log/<host>/agent/ohasd/oracssdagent_root – HA Service Daemon CSS Agent
GRID_HOME/log/<host>/agent/ohasd/oracssdmonitor_root – HA Service Daemon ocssdMonitor Agent
GRID_HOME/log/<host>/agent/ohasd/orarootagent_root – HA Service Daemon Oracle Root Agent
GRID_HOME/log/<host>/agent/crsd/oraagent_oracle11 – CRS Daemon Oracle Agent
GRID_HOME/log/<host>/agent/crsd/orarootagent_root – CRS Daemon Oracle Root Agent
GRID_HOME/log/<host>/agent/crsd/ora_oc4j_type_oracle11 – CRS Daemon OC4J Agent (an 11.2.0.2 feature, not used in 11.2.0.1)
GRID_HOME/log/<host>/gnsd – Grid Naming Services Daemon
6.3.1 Diagcollection
The best way to get all Clusterware-related traces for an incident is to use
Grid_home/bin/diagcollection.pl. To collect all traces and an OCRDUMP, run the command
“diagcollection.pl --collect --crshome <GRID_HOME>” as the root user on all nodes of the
cluster and provide the collected traces to Support or Development.
# Grid_home/bin/diagcollection.pl
Production Copyright 2004, 2008, Oracle. All rights reserved
Cluster Ready Services (CRS) diagnostic collection tool
diagcollection
--collect
[--crs] For collecting crs diag information
[--adr] For collecting diag information for ADR
[--ipd] For collecting IPD-OS data
[--all] Default.For collecting all diag information.
[--core] UNIX only. Package core files with CRS data
[--afterdate] UNIX only. Collects archives from the specified
date. Specify in mm/dd/yyyy format
[--aftertime] Supported with -adr option. Collects archives
after the specified time. Specify in YYYYMMDDHHMISS24 format
[--beforetime] Supported with -adr option. Collects archives
before the specified date. Specify in YYYYMMDDHHMISS24 format
[--crshome] Argument that specifies the CRS Home location
[--incidenttime] Collects IPD data from the specified time.
Specify in MM/DD/YYYY24HH:MM:SS format
If not specified, IPD data generated in the past 2
hours are collected
[--incidentduration] Collects IPD data for the duration after
the specified time. Specify in HH:MM format.
If not specified, all IPD data after incidenttime are
collected
NOTE:
1. You can also do the following
./diagcollection.pl --collect --crs --crshome <CRS Home>
--clean cleans up the diagnosability
information gathered by this script
--coreanalyze UNIX only. Extracts information from core files
and stores it in a text file
For more information about collection of IPD data please see section 6.4.
In case of a vendor clusterware installation it is important to collect and provide all related
vendor clusterware files to Oracle Support.
6.3.2 Alert Messages Using Diagnostic Record Unique IDs
Beginning with Oracle Database 11g release 2 (11.2), certain Oracle Clusterware messages
contain a text identifier surrounded by "(:" and ":)". Usually, the identifier is part of the
message text that begins with "Details in..." and includes an Oracle Clusterware diagnostic
log file path and name similar to the following example. The identifier is called a DRUID, or
Diagnostic Record Unique ID:
2009-07-16 00:18:44.472
[/scratch/11.2/grid/bin/orarootagent.bin(13098)]CRS-5822:Agent
'/scratch/11.2/grid/bin/orarootagent_root' disconnected from server. Details at
(:CRSAGF00117:) in
/scratch/11.2/grid/log/stnsp014/agent/crsd/orarootagent_root/orarootagent_root.log
DRUIDs are used to relate external product messages to entries in a diagnostic log file and to
internal Oracle Clusterware program code locations. They are not directly meaningful to
customers and are used primarily by Oracle Support when diagnosing problems.
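When scanning logs for an incident, the DRUID token can be pulled out of such a message with a simple pattern match. A sketch (the message text is the example quoted above):

```shell
# Extract the DRUID token "(:...:)" from a Clusterware message line.
msg="CRS-5822:Agent '/scratch/11.2/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) in /scratch/11.2/grid/log/stnsp014/agent/crsd/orarootagent_root/orarootagent_root.log"
# DRUIDs are surrounded by "(:" and ":)".
druid=$(printf '%s\n' "$msg" | grep -o '(:[A-Z0-9]*:)')
echo "$druid"    # (:CRSAGF00117:)
```

The extracted token can then be searched for in the diagnostic log file named in the same message.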
6.4 OUI / SRVM / JAVA related GUI tracing
There are several Java-based GUI tools which, in case of errors, should be run with the
following trace levels set:
"setenv SRVM_TRACE true" (or "export SRVM_TRACE=true")
"setenv SRVM_TRACE_LEVEL 2" (or "export SRVM_TRACE_LEVEL=2")
The Oracle Universal Installer can be run with the -debug flag in case of installer errors (e.g.
"./runInstaller -debug" for install).
6.5 Reboot Advisory
Oracle Clusterware may, in certain circumstances, instigate a reboot of a node to ensure
the overall health of the cluster and of the databases and other applications running on it.
The decision to reboot a node can be made by the Clusterware running on that node or by
the Clusterware on another node in the cluster. When the decision is made on the problematic
node, ordinary activity logging (such as the Clusterware alert log) is not reliable: time is of
the essence in most reboot scenarios, and the reboot usually occurs before the operating
system flushes buffered log data to disk. This means that an explanation of what led to the
reboot may be lost.
New in the 11.2 release of Oracle Clusterware is a feature called Reboot Advisory that
improves the chances of preserving an explanation for a Clusterware-initiated reboot. At
the moment a reboot decision is made by Clusterware, a short explanatory message is
produced and an attempt is made to “publish” it in two ways:
The reboot decision is written to a small file (normally on locally-attached storage) using a
“direct”, non-buffered I/O request. The file is created and preformatted in advance of the
failure (during Clusterware startup), so this I/O has a high probability of success, even on a
failing system. The reboot decision is also broadcast over all available network interfaces on
the failing system.
These operations are executed in parallel and are subject to an elapsed time limit so as not
to delay the impending reboot. Attempting both disk and network publication of the
message makes it likely that at least one succeeds, and often both do. Successfully stored
or transmitted Reboot Advisory messages ultimately appear in a Clusterware alert log on
one or more nodes of the cluster.
When network broadcast of a Reboot Advisory is successful, the associated messages
appear in the alert logs of other nodes in the cluster. This happens more or less
instantaneously, so the messages can be viewed immediately to determine the cause of the
reboot. The message includes the host name of the node that is being rebooted, to distinguish it
from the normal flow of alert messages for that node. Only nodes in the same cluster as the
failing node will display these messages.
If the Reboot Advisory was successfully written to a disk file, the next time Oracle Clusterware
starts on that node it will produce messages related to the prior reboot in the Clusterware
alert log. Reboot Advisories are timestamped, and the startup scan for these files will
announce any occurrences that are less than 3 days old. The scan does not empty or mark
already-announced files, so the same Reboot Advisory can appear in the alert log multiple
times if Clusterware is restarted on a node multiple times within a 3-day period.
Whether from a file or a network broadcast, Reboot Advisories use the same alert log
messages, normally two per advisory. The first is message CRS-8011, which displays the host
name of the rebooting node, a software component identifier, and a timestamp
(approximately the time of the reboot). An example looks like this:
[ohasd(24687)]CRS-8011:reboot advisory message from host: sta00129, component:
CSSMON, with timestamp: L-2009-05-05-10:03:25.340
Following message CRS-8011 will be CRS-8013, which conveys the explanatory message for
the forced reboot, as in this example:
[ohasd(24687)]CRS-8013:reboot advisory message text: Rebooting after limit 28500
exceeded; disk timeout 27630, network timeout 28500, last heartbeat from ocssd at
epoch seconds 1241543005.340, 4294967295 milliseconds ago based on invariant clock
value of 93235653
Note that everything in message CRS-8013 after “text:” originates in the Clusterware
component that instigated the reboot. Because of the critical circumstances in which it is
produced, this text does not come from an Oracle NLS message file: it is always in the English
language and the USASCII7 character set.
In some circumstances, Reboot Advisories may convey binary diagnostic data in addition to a
text message. If so, message CRS-8014 and one or more of message CRS-8015 will also
appear. This binary data is used only if the reboot situation is reported to Oracle for
resolution.
Because multiple components can write to the Clusterware alert log at the same time, it is
possible that the messages associated with a given Reboot Advisory appear with other
(unrelated) messages interspersed. However, messages for different Reboot Advisories are
never interleaved: all of the messages for one Advisory are written before any message for
another Advisory.
For additional information, refer to the Oracle Errors manual discussion of messages
CRS-8011 and CRS-8013.
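Since an advisory can surface in the alert log of any node in the cluster, a quick way to gather all related messages is to scan for the CRS-8011 through CRS-8015 range. A sketch, using a stand-in for the alert log (real alert logs live under GRID_HOME/log/<host>/):

```shell
# Scan a stand-in alert log for Reboot Advisory messages (CRS-8011..8015);
# the log content below reuses the examples shown earlier plus one
# unrelated message that should NOT match.
ALERT=$(mktemp)
cat > "$ALERT" <<'EOF'
[ohasd(24687)]CRS-8011:reboot advisory message from host: sta00129, component: CSSMON, with timestamp: L-2009-05-05-10:03:25.340
[ohasd(24687)]CRS-8013:reboot advisory message text: Rebooting after limit 28500 exceeded
[crsd(1234)]CRS-2772:Server assigned to pool Generic
EOF
advisories=$(grep -E 'CRS-801[1-5]' "$ALERT")
echo "$advisories"
rm -f "$ALERT"
```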
7 Other Tools
7.1 ocrpatch
ocrpatch was developed in 2005 in order to provide Development and Support with a tool that
is able to fix corruptions or make other changes in the OCR in cases where official tools such
as ocrconfig or crsctl are unable to handle such changes. ocrpatch is NOT distributed
as part of the software release. The functionality of ocrpatch is already well described in a
separate document, therefore this paper does not go into details; the ocrpatch document
is located in the public RAC Performance Group Folder on stcontent.
7.2 vdpatch
7.2.1 Introduction
vdpatch is a new, Oracle-internal tool developed for Oracle Clusterware 11g release 2
(11.2). vdpatch largely uses the same code as ocrpatch, i.e. the look and feel is very
similar. The purpose of this tool is to facilitate diagnosis of CSS-related issues where voting
file content is involved. vdpatch operates on a per-block basis, i.e. it can read (not write)
512-byte blocks from a voting file by block number or name. Similarly to ocrpatch, it
attempts to interpret the content in a meaningful way instead of just presenting columns of
hexadecimal values. vdpatch allows online (Clusterware stack and ocssd running) and offline
(Clusterware stack / ocssd not running) access. vdpatch works for voting files both on NAS
and in ASM. At this time, vdpatch, like ocrpatch, is not actively distributed;
Development and Support have to obtain a binary from a production ADE label.
7.2.2 General Usage
vdpatch can only be run as root; otherwise it reports:
$ vdpatch
VD Patch Tool Version 11.2 (20090724)
Oracle Clusterware Release 11.2.0.2.0
Copyright (c) 2008, 2009, Oracle. All rights reserved.
[FATAL] not privileged
[OK] Exiting due to fatal error ...
The filename/pathname of the voting file(s) can be obtained via the 'crsctl query css votedisk'
command; note that this command only works if ocssd is running. If ocssd is not up, crsctl
will signal:
# crsctl query css votedisk
Unable to communicate with the Cluster Synchronization Services daemon.
If ocssd is running, you will receive the following output:
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 0909c24b14da4f89bfbaf025cd228109 (/dev/raw/raw100) [VDDG]
2. ONLINE 9c74b39a1cfd4f84bf27559638812106 (/dev/raw/raw104) [VDDG]
3. ONLINE 1bb06db216434fadbfa3336b720da252 (/dev/raw/raw108) [VDDG]
Located 3 voting file(s).
The above output indicates that there are three voting files defined in the diskgroup +VDDG,
each located on a particular raw device that is part of the ASM diskgroup. vdpatch allows
opening only ONE device at a time to read its content:
# vdpatch
VD Patch Tool Version 11.2 (20090724)
Oracle Clusterware Release 11.2.0.2.0
Copyright (c) 2008, 2009, Oracle. All rights reserved.
vdpatch> op /dev/raw/raw100
[OK] Opened /dev/raw/raw100, type: ASM
If the voting file is on a raw device, crsctl and vdpatch would show
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 1de94f4db65a4f9bbf8b9bf3eba6f43b (/dev/raw/raw126) []
2. ONLINE 26d28a7311264f77bf8df6463420e614 (/dev/raw/raw130) []
3. ONLINE 9f862a63239b4f52bfdbce6d262dc349 (/dev/raw/raw134) []
Located 3 voting file(s).
# vdpatch
VD Patch Tool Version 11.2 (20090724)
Oracle Clusterware Release 11.2.0.2.0
Copyright (c) 2008, 2009, Oracle. All rights reserved.
vdpatch> op /dev/raw/raw126
[OK] Opened /dev/raw/raw126, type: Raw/FS
In order to open another voting file, simply run 'op' again:
vdpatch> op /dev/raw/raw126
[OK] Opened /dev/raw/raw126, type: Raw/FS
vdpatch> op /dev/raw/raw130
[INFO] closing voting file /dev/raw/raw126
[OK] Opened /dev/raw/raw130, type: Raw/FS
The 'h' command lists all other available commands:
vdpatch> h
Usage: vdpatch
BLOCK operations
op <path to voting file> open voting file
rb <block#> read block by block#
rb status|kill|lease <index> read named block
index=[0..n] => Devenv nodes 1..(n-1)
index=[1..n] => shiphome nodes 1..n
rb toc|info|op|ccin|pcin|limbo read named block
du dump native block from offset
di display interpreted block
of <offset> set offset in block, range 0-511
MISC operations
i show parameters, version, info
h this help screen
exit / quit exit vdpatch
7.2.3 Common Use Case
The common use case for vdpatch is reading voting file content. Voting file blocks can be read
either by block number or by named block type. For the types TOC, INFO, OP, CCIN, PCIN and
LIMBO, there exists just one block in the voting file, so reading that block is done by e.g.
running 'rb toc'; the output shows both a hex/ascii dump of the 512-byte block and
the interpreted content of that block:
vdpatch> rb toc
[OK] Read block 4
[INFO] clssnmvtoc block
0 73734C63 6B636F54 01040000 00020000 00000000 ssLckcoT............
20 00000000 40A00000 00020000 00000000 10000000 ....@...............
40 05000000 10000000 00020000 10020000 00020000 ....................
…
…
420 00000000 00000000 00000000 00000000 00000000 ....................
440 00000000 00000000 00000000 00000000 00000000 ....................
460 00000000 00000000 00000000 00000000 00000000 ....................
480 00000000 00000000 00000000 00000000 00000000 ....................
500 00000000 00000000 00000000 ............
[OK] Displayed block 4 at offset 0, length 512
[INFO] clssnmvtoc block
magic1_clssnmvtoc: 0x634c7373 - 1665954675
magic2_clssnmvtoc: 0x546f636b - 1416586091
fmtvmaj_clssnmvtoc: 0x01 - 1
fmtvmin_clssnmvtoc: 0x04 - 4
resrvd_clssnmvtoc: 0x0000 - 0
maxnodes_clssnmvtoc: 0x00000200 - 512
incarn1_clssnmvtoc: 0x00000000 - 0
incarn2_clssnmvtoc: 0x00000000 - 0
filesz_clssnmvtoc: 0x0000a040 - 41024
blocksz_clssnmvtoc: 0x00000200 - 512
hdroff_clssnmvtoc: 0x00000000 - 0
hdrsz_clssnmvtoc: 0x00000010 - 16
opoff_clssnmvtoc: 0x00000005 - 5
statusoff_clssnmvtoc: 0x00000010 - 16
statussz_clssnmvtoc: 0x00000200 - 512
killoff_clssnmvtoc: 0x00000210 - 528
killsz_clssnmvtoc: 0x00000200 - 512
leaseoff_clssnmvtoc: 0x0410 - 1040
leasesz_clssnmvtoc: 0x0200 - 512
ccinoff_clssnmvtoc: 0x0006 - 6
pcinoff_clssnmvtoc: 0x0008 - 8
limbooff_clssnmvtoc: 0x000a - 10
volinfooff_clssnmvtoc: 0x0003 - 3
For the block types STATUS, KILL and LEASE, there exists one block per defined cluster node, so
the 'rb' command needs to be used in combination with an index that denotes the node
number. In a Development environment the index starts with 0, while in a
shiphome/production environment the index starts with 1. So in order to read the 5th
node's KILL block in a Development environment, submit 'rb kill 4', while in a production
environment, use 'rb kill 5'.
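The index arithmetic can be sketched as a small helper (illustrative only; vdpatch itself simply takes the raw index):

```shell
# Sketch: derive the 'rb kill' index for a given node number. In a
# Development environment the index is 0-based (node number minus 1);
# in a shiphome/production environment it is the node number itself.
kill_index() {   # $1 = node number (1..n), $2 = devenv | shiphome
  if [ "$2" = "devenv" ]; then
    echo $(( $1 - 1 ))
  else
    echo "$1"
  fi
}
kill_index 5 devenv     # prints 4 -> 'rb kill 4'
kill_index 5 shiphome   # prints 5 -> 'rb kill 5'
```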
Example to read the STATUS block of node 3 (here: staiu03) in a Development environment:
vdpatch> rb status 2
[OK] Read block 18
[INFO] clssnmdsknodei vote block
0 65746F56 02000000 01040B02 00000000 73746169 etoV............stai
20 75303300 00000000 00000000 00000000 00000000 u03.................
40 00000000 00000000 00000000 00000000 00000000 ....................
60 00000000 00000000 00000000 00000000 00000000 ....................
80 00000000 3EC40609 8A340200 03000000 03030303 ....>....4..........
100 00000000 00000000 00000000 00000000 00000000 ....................
120 00000000 00000000 00000000 00000000 00000000 ....................
140 00000000 00000000 00000000 00000000 00000000 ....................
160 00000000 00000000 00000000 00000000 00000000 ....................
180 00000000 00000000 00000000 00000000 00000000 ....................
200 00000000 00000000 00000000 00000000 00000000 ....................
220 00000000 00000000 00000000 00000000 00000000 ....................
240 00000000 00000000 00000000 00000000 00000000 ....................
260 00000000 00000000 00000000 00000000 00000000 ....................
280 00000000 00000000 00000000 00000000 00000000 ....................
300 00000000 00000000 00000000 00000000 00000000 ....................
320 00000000 00000000 00000000 00000000 00000000 ....................
340 00000000 00000000 00000000 8E53DF4A ACE84A91 .............S.J..J.
360 E4350200 00000000 03000000 441DDD4A 6051DF4A .5..........D..J`Q.J
380 00000000 00000000 00000000 00000000 00000000 ....................
400 00000000 00000000 00000000 00000000 00000000 ....................
420 00000000 00000000 00000000 00000000 00000000 ....................
440 00000000 00000000 00000000 00000000 00000000 ....................
460 00000000 00000000 00000000 00000000 00000000 ....................
480 00000000 00000000 00000000 00000000 00000000 ....................
500 00000000 00000000 00000000 ............
[OK] Displayed block 18 at offset 0, length 512
[INFO] clssnmdsknodei vote block
magic_clssnmdsknodei: 0x566f7465 - 1450144869
nodeNum_clssnmdsknodei: 0x00000002 - 2
fmtvmaj_clssnmdsknodei: 0x01 - 1
fmtvmin_clssnmdsknodei: 0x04 - 4
prodvmaj_clssnmdsknodei: 0x0b - 11
prodvmin_clssnmdsknodei: 0x02 - 2
killtime_clssnmdsknodei: 0x00000000 - 0
nodeName_clssnmdsknodei: staiu03
inSync_clssnmdsknodei: 0x00000000 - 0
reconfigGen_clssnmdsknodei: 0x0906c43e - 151438398
dskWrtCnt_clssnmdsknodei: 0x0002348a - 144522
nodeStatus_clssnmdsknodei: 0x00000003 - 3
nodeState_clssnmdsknodei[CLSSGC_MAX_NODES]:node 0: 0x03 - 3 - MEMBER
node 1: 0x03 - 3 - MEMBER
node 2: 0x03 - 3 - MEMBER
node 3: 0x03 - 3 - MEMBER
timing_clssnmdsknodei.sts_clssnmTimingStmp: 0x4adf538e - 1256149902 - Wed Oct 21
11:31:42 2009
timing_clssnmdsknodei.stms_clssnmTimingStmp: 0x914ae8ac - 2437605548
timing_clssnmdsknodei.stc_clssnmTimingStmp: 0x000235e4 - 144868
timing_clssnmdsknodei.stsi_clssnmTimingStmp: 0x00000000 - 0
timing_clssnmdsknodei.flags_clssnmdsknodei: 0x00000003 - 3
unique_clssnmdsknodei.eptime_clssnmunique: 0x4add1d44 - 1256004932 - Mon Oct 19
19:15:32 2009
ccinid_clssnmdsknodei.cin_clssnmcinid: 0x4adf5160 - 1256149344 - Wed Oct 21
11:22:24 2009
ccinid_clssnmdsknodei.unique_clssnmcinid: 0x00000000 - 0
pcinid_clssnmdsknodei.cin_clssnmcinid: 0x00000000 - 0 - Wed Dec 31 16:00:00 1969
pcinid_clssnmdsknodei.unique_clssnmcinid: 0x00000000 - 0
We do not plan to allow vdpatch to make any changes to a voting file. The only
recommended way of modifying voting files is to drop and recreate them using the crsctl
command.
7.3 Appvipcfg – adding an application VIP
In 11.2, the creation and deletion of an application VIP (user VIP) can be managed via
Grid_home/bin/appvipcfg:
Production Copyright 2007, 2008, Oracle.All rights reserved
Usage: appvipcfg create -network=<network_number> -ip=<ip_address>
-vipname=<vipname>
-user=<user_name>[-group=<group_name>]
delete -vipname=<vipname>
The appvipcfg command line tool can only create an application VIP on the default network,
for which the resource ora.net1.network is created by default. If you need to create
an application VIP on a different network or subnet, this must be done manually.
Example of creating a user VIP on a different network (ora.net2.network):
srvctl add vip -n node1 -k 2 -A appsvip1/255.255.252.0/eth2
crsctl add type coldfailover.vip.type -basetype ora.cluster_vip_net2.type
crsctl add resource coldfailover.vip -type coldfailover.vip.type -attr \
"DESCRIPTION=USRVIP_resource,RESTART_ATTEMPTS=0,START_TIMEOUT=0, STOP_TIMEOUT=0, \
CHECK_INTERVAL=10, USR_ORA_VIP=10.137.11.163, \
START_DEPENDENCIES=hard(ora.net2.network)pullup(ora.net2.network), \
STOP_DEPENDENCIES=hard(ora.net2.network), \
ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:oracle11:r-x'"
There are a couple of known bugs in this area; for tracking purposes and completeness
they are listed here:
– 8623900 srvctl remove vip -i <ora.vipname> is removing the associated
ora.netx.network
– 8620119 appvipcfg should be expanded to create a network resource
– 8632344 srvctl modify nodeapps -a will modify the vip even if the interface is not
valid
– 8703112 appsvip should have the same behavior as ora.vip like vip failback
– 8758455 uservip start failed and orarootagent core dump in clsn_agent::agentassert
– 8761666 appsvipcfg should respect /etc/hosts entry for apps ip even if gns is
configured
– 8820801 using a second network (k 2) I’m able to add and start the same ip twice
7.4 Application and Script Agent
The application or script agent manages the application/resource through application-specific
user code. Oracle Clusterware contains a special shared library (libagfw) which
allows users to plug in application-specific actions using a well-defined interface.
The following sections describe how to build an agent using Oracle Clusterware's agent
framework interface.
7.4.1 Action Entry Points
Action entry points refer to user defined code that needs to be executed whenever an
action has to be taken on a resource (start resource, stop resource etc.). For every resource
type, Clusterware requires that action entry points are defined for the following actions:
start : Actions to be taken to start the resource
stop : Actions to gracefully stop the resource
check : Actions taken to check the status of the resource
clean : Actions to forcefully stop the resource.
These action entry points can be defined using C++ code or in a script. If any of these actions
are not explicitly defined, Clusterware assumes by default that they are defined in a script.
This script is located via the ACTION_SCRIPT attribute of the resource type. Hence it is
possible to have hybrid agents, which define some action entry points in a script and others
in C++. It is also possible to define action entry points for other actions (e.g. for changes in
attribute values), but these are not mandatory.
7.4.2 Sample Agents
Consider a file as the resource that needs to be managed by Clusterware. An agent that
manages this resource has the following tasks:
On startup : Create the file.
On shutdown : Gracefully delete the file.
On check command: Detect whether the file is present or not.
On clean command: Forcefully delete the file.
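These four tasks can be sketched as a single action script (illustrative only; the shipped Grid_home/crs/demo/demoActionScript is the authoritative example, and the exact mechanism by which the PATH_NAME attribute reaches the script's environment is an assumption here):

```shell
#!/bin/sh
# Sketch of an action script for a file resource. The file to manage is
# taken from a PATH_NAME environment variable (assumed to be supplied by
# the agent framework; defaulted here for standalone testing).
FILE=${PATH_NAME:-/tmp/demo_resource.txt}

action() {
  case "$1" in
    start)      touch "$FILE" ;;     # create the file
    stop|clean) rm -f "$FILE" ;;     # delete the file (clean = forced stop)
    check)      [ -f "$FILE" ] ;;    # exit 0 if present, non-zero if not
    *)          return 1 ;;
  esac
}

# Dispatch on the entry point name passed as the first argument.
if [ $# -gt 0 ]; then action "$1"; fi
```

The agent framework invokes the script with the entry point name (start, stop, check, clean) and evaluates its exit status.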
To describe this particular resource to Oracle Clusterware, a specialized resource type is first
created that contains all the characteristic attributes for this resource class. In this case, the
only special attribute to be described is the filename to be monitored. This can be done with
the crsctl command. While defining the resource type, we can also specify the
ACTION_SCRIPT and AGENT_FILENAME attributes. These refer to the shell script
and executables that contain the action entry points for the agents.
Once the resource type is defined, there are several options to write a specialized agent
which does the required tasks - the agent could be written as a script, as a C/C++ program or
as a hybrid.
Examples for each of them are given below.
7.4.3 Shell script agent
The file Grid_home/crs/demo/demoActionScript is a shell script which already contains all
the required action entry points and can act as an agent for the file resource. To test this
script, the following steps need to be performed:
(1) Start the Clusterware.
(2) Add a new resource type using the crsctl utility as below
$ crsctl add type test_type1 -basetype cluster_resource -attr \
"ATTRIBUTE=PATH_NAME,TYPE=string,DEFAULT_VALUE=default.txt" -attr \
"ATTRIBUTE=ACTION_SCRIPT,TYPE=string,DEFAULT_VALUE=/path/to/demoActionScript"
Modify the path to the file appropriately. This adds a new resource type to Clusterware.
Alternatively, the attributes can be added in a text file which is passed as a parameter to the
crsctl utility.
(3) Add new resources to the cluster using the crsctl utility. The commands to do this are:
$ crsctl add resource r1 -type test_type1 -attr "PATH_NAME=/tmp/r1.txt"
$ crsctl add resource r2 -type test_type1 -attr "PATH_NAME=/tmp/r2.txt"
Modify the PATH_NAME attribute for the resources as needed. This adds resources named
r1 and r2 to be monitored by clusterware. Here we are overriding the default value for the
PATH_NAME attribute for our resources.
(4) Start/stop the resources using the crsctl utility. The commands to do this are:
$ crsctl start res r1
$ crsctl start res r2
$ crsctl check res r1
$ crsctl stop res r2
The files /tmp/r1.txt and /tmp/r2.txt get created and deleted as the resources r1 and r2 get
started and stopped.
7.4.4 Option 2: C++ agent
Oracle provides a demoagent1.cpp in the Grid_home/crs/demo directory. The
demoagent1.cpp is a sample C++ program that has similar functionality to the shell script
above. This program also monitors a specified file on the local machine. To test this
program, the following steps need to be performed:
(1) Compile the C++ agent using the provided source file demoagent1.cpp and makefile.
The makefile needs to be modified based on the local compiler/linker paths and install
locations. The output will be an executable named demoagent1
(2) Start the Clusterware
(3) Add a new resource type using the crsctl utility as below
$ crsctl add type test_type1 -basetype cluster_resource \
-attr "ATTRIBUTE=PATH_NAME,TYPE=string,DEFAULT_VALUE=default.txt" \
-attr "ATTRIBUTE=AGENT_FILENAME,TYPE=string,DEFAULT_VALUE=/path/to/demoagent1"
Modify the path to the file appropriately. This adds a new resource type to Clusterware.
(4) Create new resources based on the type defined above. The commands are as
follows:
$ crsctl add res r3 -type test_type1 -attr "PATH_NAME=/tmp/r3.txt"
$ crsctl add res r4 -type test_type1 -attr "PATH_NAME=/tmp/r4.txt"
This adds resources named r3 and r4 to be monitored by Clusterware.
(5) Start/stop the resources using the crsctl utility. The commands to do so are:
$ crsctl start res r3
$ crsctl start res r4
$ crsctl check res r3
$ crsctl stop res r4
The files /tmp/r3.txt and /tmp/r4.txt get created and deleted as the resources get started
and stopped.
7.4.5 Option 3: Hybrid agent
The Grid_home/crs/demo/demoagent2.cpp is a sample C++ program that has similar
functionality to the shell script above. This program also monitors a specified file on the local
machine. However, this program defines only the CHECK action entry point; all other action
entry points are left undefined and are taken from the script named in the ACTION_SCRIPT
attribute. To test this program, perform the following steps:
(1) Compile the C++ agent using the provided source file demoagent2.cpp and makefile.
The makefile needs to be modified based on the local compiler/linker paths and install
locations. The output is an executable named demoagent2.
(2) Start Oracle Clusterware.
(3) Add a new resource type using the crsctl utility as follows:
$ crsctl add type test_type1 -basetype cluster_resource \
-attr "ATTRIBUTE=PATH_NAME,TYPE=string,DEFAULT_VALUE=default.txt" \
-attr "ATTRIBUTE=AGENT_FILENAME,TYPE=string,DEFAULT_VALUE=/path/demoagent2" \
-attr "ATTRIBUTE=ACTION_SCRIPT,TYPE=string,DEFAULT_VALUE=/path/demoActionScript"
Modify the path to the files appropriately. This adds a new resource type to Clusterware.
(4) Create new resources based on the type that is defined above. The commands are as
follows:
$ crsctl add res r5 -type test_type1 -attr "PATH_NAME=/tmp/r5.txt"
$ crsctl add res r6 -type test_type1 -attr "PATH_NAME=/tmp/r6.txt"
This adds resources named r5 and r6 to be monitored by Clusterware.
(5) Start/stop the resources using the crsctl utility. The commands to do so are:
$ crsctl start res r5
$ crsctl start res r6
$ crsctl check res r5
$ crsctl stop res r6
The files /tmp/r5.txt and /tmp/r6.txt get created and deleted as the resources get started
and stopped.
7.5 Oracle Cluster Health Monitor - OS Tool (IPD/OS)
7.5.1 Overview
This tool (formerly known as the Instantaneous Problem Detection tool) is designed to detect
and analyze operating system (OS) and cluster resource related degradation and failures, in
order to provide more insight into many Oracle Clusterware and Oracle RAC issues, such as
node evictions.
It continuously tracks OS resource consumption at the node, process, and device level, and
collects and analyzes the data cluster-wide. In real-time mode, an alert is shown to the
operator when a threshold is hit. For root cause analysis, historical data can be replayed to
understand what was happening at the time of a failure.
Installation of the tool is straightforward and is described in the README shipped with the
zip file. The latest version for Linux and Windows is available on OTN at
http://www.oracle.com/technology/products/database/clustering/ipd_download_homepage.html
7.5.2 Install the Oracle Cluster Health Monitor
To install the tool on a list of nodes, perform the following basic steps (for more detailed
information, read the README):
– Unzip the package
– Create user crfuser:oinstall on all nodes
– Make sure crfuser’s home directory is the same on all nodes
– Set up password-less ssh for crfuser on all nodes
– Log in as crfuser and run crfinst.pl with the appropriate options
– To finalize the install, log in as root and run crfinst.pl -f on all installed nodes
– CRF_home is set to /usr/lib/oracrf on Linux
7.5.3 Running the OS Tool stack
The OS tool stack is started via /etc/init.d/init.crfd start. This command spawns the
osysmond process, which in turn spawns the ologgerd daemon. The ologgerd then picks a
replica node (if the cluster has two or more nodes) and instructs the osysmond on that node
to spawn the replica ologgerd.
The OS tool stack can be shut down on a node as follows:
# /etc/init.d/init.crfd disable
7.5.4 Overview of Monitoring Process (osysmond)
The osysmond (one daemon per cluster node) performs the following steps to collect the
data:
– Monitors and gathers system metrics periodically
– Runs as a real-time process
– Runs validation rules against the system metrics
– Marks color-coded alerts based on thresholds
– Sends the data to the master logger daemon
– Logs data to local disk if sending fails
The osysmond alerts on perceived node hangs (under-utilized resources despite many
potential consumer tasks):
– CPU usage < 5%
– CPU Iowait > 50%
– MemFree < 25%
– # disk I/Os per second < 10% of the maximum possible disk I/Os per second
– # bytes of outbound network traffic limited to the data sent by osysmond
– # tasks node-wide > 1024
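These rules are simple threshold comparisons on the gathered metrics. As an illustrative sketch only (the cut-off values below are invented for the example and are not osysmond's actual thresholds), mapping a metric sample to a color-coded state could look like:

```shell
# Illustrative only: map a CPU iowait percentage (integer) to a
# color-coded state. The 50/40/30 cut-offs are hypothetical example
# thresholds, not the values osysmond actually uses.
iowait_state() {
    pct=$1
    if   [ "$pct" -gt 50 ]; then echo RED
    elif [ "$pct" -gt 40 ]; then echo ORANGE
    elif [ "$pct" -gt 30 ]; then echo YELLOW
    else                         echo GREEN
    fi
}
```

osysmond evaluates rules of this shape against each metric sample and forwards the resulting state, together with the raw data, to the master logger daemon.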
7.5.5 CRFGUI
The Oracle Cluster Health Monitor ships with two data retrieval tools; the first is crfgui,
the main GUI display.
Crfgui connects to the local or remote master ologgerd. If the GUI is installed inside the
cluster, it auto-detects the ologgerd; when running outside the cluster, a cluster node
must be specified with the ‘-m’ switch.
The GUI alerts on critical resource usage events and perceived system hangs. After starting
it, different views are available, such as the cluster view, node view, and device view.
Usage: crfgui [-m <node>] [-d <time>] [-r <sec>] [-h <sec>]
[-W <sec>] [-i] [-f <name>] [-D <int>]
-m <node> Name of the master node (tmp)
-d <time> Delayed at a past time point
-r <sec> Refresh rate
-h <sec> Highlight rate
-W <sec> Maximal poll time for connection
-I interactive with cmd prompt
-f <name> read from file, ".trc" added if no suffix given
-D <int> sets an internal debug level
7.5.6 oclumon
A command-line tool, oclumon, is included in the package; it queries the Berkeley DB
backend and prints the node-specific metrics for a specified time period to the terminal.
The tool also supports queries that print the durations and states for a resource on a node
during a specified time period. These states are based on predefined thresholds for each
resource metric and are denoted as red, orange, yellow, and green, in decreasing order of
criticality. For example, you could ask how many seconds the CPU on node "node1"
remained in the RED state during the last hour. Oclumon can also be used to perform
miscellaneous administrative tasks, such as changing debug levels, querying the version of
the tool, and changing the metrics database size.
The usage of oclumon can be printed with oclumon -h. To get more information about
each verb, run oclumon <verb> -h.
Currently supported verbs are:
showtrail, showobjects, dumpnodeview, manage, version, debug, quit and help
Below are some useful example invocations of oclumon. The default location of oclumon
is /usr/lib/oracrf/bin/oclumon.
Showobjects
oclumon showobjects -n node -time "2009-10-07 15:11:00"
Dumpnodeview
oclumon dumpnodeview -n node
Showgaps
oclumon showgaps -n node1 -s "2009-10-07 02:40:00" \
-e "2009-10-07 03:59:00"
Number of gaps found = 0
Showtrail
oclumon showtrail -n node1 -diskid sde qlen totalwaittime \
-s "2009-07-09 03:40:00" -e "2009-07-09 03:50:00" \
-c "red" "yellow" "green"
Parameter=QUEUE LENGTH
2009-07-09 03:40:00 TO 2009-07-09 03:41:31 GREEN
2009-07-09 03:41:31 TO 2009-07-09 03:45:21 GREEN
2009-07-09 03:45:21 TO 2009-07-09 03:49:18 GREEN
2009-07-09 03:49:18 TO 2009-07-09 03:50:00 GREEN
Parameter=TOTAL WAIT TIME
oclumon showtrail -n node1 -sys cpuqlen -s \
"2009-07-09 03:40:00" -e "2009-07-09 03:50:00" \
-c "red" "yellow" "green"
Parameter=CPU QUEUELENGTH
2009-07-09 03:40:00 TO 2009-07-09 03:41:31 GREEN
2009-07-09 03:41:31 TO 2009-07-09 03:45:21 GREEN
2009-07-09 03:45:21 TO 2009-07-09 03:49:18 GREEN
2009-07-09 03:49:18 TO 2009-07-09 03:50:00 GREEN
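The interval listing above is easy to post-process, for example to answer the earlier question of how many seconds a metric spent in a given state. A sketch using plain awk (it parses only the time-of-day fields, so it assumes the intervals do not cross midnight):

```shell
# Sum the seconds spent in each state from showtrail interval lines of
# the form: 2009-07-09 03:40:00 TO 2009-07-09 03:41:31 GREEN
# Only the HH:MM:SS fields are parsed, so intervals must stay within
# one calendar day.
sum_states() {
    awk '$3 == "TO" {
        split($2, a, ":"); split($5, b, ":")
        start = a[1]*3600 + a[2]*60 + a[3]
        end   = b[1]*3600 + b[2]*60 + b[3]
        secs[$6] += end - start
    }
    END { for (s in secs) print s, secs[s] }'
}
```

Piping showtrail output through sum_states prints one "STATE seconds" line per state seen in the input.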
7.5.7 What to collect for cluster related issues
With Oracle Clusterware 11g Release 2, the Grid_home/bin/diagcollection.pl script also
collects Oracle Cluster Health Monitor data if it finds the tool installed on a cluster, which
Oracle recommends.
To collect the data after a hang or node eviction, perform the following steps to analyze the
issue:
– Run the 'Grid_home/bin/diagcollection.pl --collect --ipd --incidenttime <inc time> --
incidentduration <duration>' command on the IPD master, LOGGERD node, where the --
incidenttime format is MM/DD/YYYY24HH:MM:SS and --incidentduration is HH:MM
– Identify the LOGGERD node using the
/usr/lib/oracrf/bin/oclumon manage -getkey "MASTER=" command. Starting with
11.2.0.2, oclumon is located in the Grid_home/bin directory.
– Collect data for at least 30 min before and after the incident.
masterloggerhost:$./bin/diagcollection.pl --collect --ipd --incidenttime
10/05/200909:10:11 --incidentduration 02:00
Starting with 11.2.0.2 and the CRS integrated IPD/OS the syntax to get the IPD data
collected is "masterloggerhost:$./bin/diagcollection.pl --collect --crshome
/scratch/grid_home_11.2/ --ipdhome /scratch/grid_home_11.2/ --ipd --
incidenttime 01/14/201001:00:00 --incidentduration 04:00"
– The IPD data file will look like:
ipdData_<hostname>_<curr time>.tar.gz
e.g. ipdData_node1_20091006_2321.tar.gz
– Approximate diagcollection run times:
4-node cluster, 4 hours of data: ~10 minutes
32-node cluster, 1 hour of data: ~20 minutes
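The --incidenttime value (MM/DD/YYYY24HH:MM:SS, i.e. the date immediately followed by the 24-hour time with no separator) is easy to mistype. A small helper can build it from a Unix epoch timestamp; this sketch assumes GNU date (for the -d "@<epoch>" syntax) and formats the result in UTC:

```shell
# Build a diagcollection.pl --incidenttime argument from a Unix epoch
# timestamp: MM/DD/YYYY followed directly by HH:MM:SS, in UTC.
# Assumes GNU date, which accepts -d "@<epoch>".
make_incident_time() {
    date -u -d "@$1" +"%m/%d/%Y%H:%M:%S"
}
```

For example, make_incident_time 1254733811 prints 10/05/200909:10:11, the same form used in the diagcollection.pl example above.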
7.5.8 Debugging
To turn on debugging for osysmond or ologgerd, run ‘oclumon debug log all
allcomp:5’ as the root user. This turns on debugging for all components.
Starting with 11.2.0.2, the IPD/CHM log files are located under:
Grid_home/log/<hostname>/crfmond
Grid_home/log/<hostname>/crfproxy
Grid_home/log/<hostname>/crflogd
7.5.9 For ADE users
Installation and start of IPD/OS in a development environment is simpler:
$ cd crfutl && make setup && runcrf
osysmond usually starts immediately, while it may take seconds (or minutes, if your I/O
subsystem is slow) for ologgerd and oproxyd to start, due to the initialization of the Berkeley
Database (bdb). The first node to call 'runcrf' is configured as the master; the first node after
the master to run 'runcrf' is configured as the replica. From there on, these roles move as
required. The daemons to look out for are: osysmond (on all nodes), ologgerd (on the
master and replica nodes), and oproxyd (on all nodes).
In a development environment, the IPD/OS processes do not run as root or in real time.
7.5.10 11.2.0.2
– The oproxyd process may or may not exist anymore; as of the time of publication of this
document, the oproxyd process is disabled.
– IPD/OS will be represented by the OHASD resource ora.crf, eliminating the need for
manual installation and configuration in both development and production environments.
8 Appendix
References
Oracle Clusterware 11g Release 2 (11.2) – Using standard NFS to support a third
voting file for extended cluster configurations
Grid Infrastructure Installation Guide 11g Release 2 (11.2)
Clusterware Administration and Deployment Guide 11g Release 2 (11.2)
Storage Administrator's Guide 11g Release 2 (11.2)
Oracle Clusterware 11g Release 2 Technical Overview
http://www.oracle.com/technology/products/database/clustering/ipd_download_homepage.html
Functional Specification for CRS Resource Modeling Capabilities, Oracle
Clusterware, 11gR2
Useful Notes
Note 294430.1 - CSS Timeout Computation in Oracle Clusterware
Note 1050693.1 - Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
Note 1053010.1 - How to Dump the Contents of an Spfile on ASM when ASM/GRID
is down
Note 338706.1 - Oracle Clusterware (formerly CRS) Rolling Upgrades
Note 785351.1 - Upgrade Companion 11g Release 2
http://www.oracle.com/technology/products/database/oracle11g/upgrade/index.html