
Page 1: Gi Rac Diagnostics Racsig 2

Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Page 2: Gi Rac Diagnostics Racsig 2

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Page 3: Gi Rac Diagnostics Racsig 2

Oracle Grid Infrastructure and RAC
Troubleshooting and Diagnostics 2

Sandesh Rao, Bob Caldwell
RAC Assurance Team – Oracle Product Development

Page 4: Gi Rac Diagnostics Racsig 2

Agenda

• Architectural Overview
• Grid Infrastructure Processes
• Installation Troubleshooting
• RAC Performance
• Dynamic Resource Mastering (DRM)
• Q&A

Page 5: Gi Rac Diagnostics Racsig 2

Architectural Overview


Page 6: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• Oracle Clusterware is required for 11gR2 RAC databases.
• Oracle Clusterware can manage non-RAC database resources using agents.
• Oracle Clusterware can manage HA for any business-critical application with the agent infrastructure.
• Oracle publishes agents for some non-RAC DB resources:
  – Bundled agents for SAP, GoldenGate, Siebel, Apache, etc.

Page 7: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• Grid Infrastructure is the name for the combination of:
  – Oracle Cluster Ready Services (CRS)
  – Oracle Automatic Storage Management (ASM)
• The Grid Home contains the software for both products.
• CRS can also run standalone for ASM and/or Oracle Restart.
• CRS can run by itself or in combination with other vendor clusterware.
• The Grid Home and RDBMS home must be installed in different locations.
  – The installer locks the Grid Home path by setting root permissions.

Page 8: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• CRS requires shared Oracle Cluster Registry (OCR) and voting files.
  – They must be in ASM or a cluster file system (raw is not supported for install).
  – The OCR is backed up automatically every 4 hours to GI_HOME/cdata.
  – Backups are kept at 4, 8, and 12 hours, 1 day, and 1 week.
  – The OCR is restored with ocrconfig.
  – The voting file is backed up into the OCR at each change.
  – The voting file is restored with crsctl (see the sketch below).
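A minimal sketch of the corresponding commands, run as root (the cluster name and the +DATA disk group below are hypothetical):

  # ocrconfig -showbackup                    # list automatic OCR backups
  # ocrconfig -restore /u01/app/11.2.0/grid/cdata/mycluster/backup00.ocr
  # crsctl query css votedisk                # list current voting files
  # crsctl replace votedisk +DATA            # re-create voting files in an ASM disk group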

Page 9: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• For the network, CRS requires:
  – One high-speed, low-latency, redundant private network for inter-node communications.
  – It should be a separate physical network.
  – VLANs are supported with restrictions.
  – It is used for:
    • Clusterware messaging
    • RDBMS messaging and block transfer
    • ASM messaging

Page 10: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• For the network, CRS also requires either:
  – A standard public network setup:
    • One public IP and one VIP per node in DNS
    • One SCAN name set up in DNS
  – Or a Grid Naming Service (GNS) public network setup:
    • One public IP per node (recommended)
    • One GNS VIP per cluster
    • DHCP allocation of hostnames

Page 11: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• Single Client Access Name (SCAN)
  – A single name for clients to access Oracle databases running in a cluster.
  – Acts as a cluster alias for the databases in the cluster.
  – Provides load balancing and failover for client connections to the database.
  – Cluster topology changes do not require client configuration changes.
  – Allows clients to use EZConnect and the simple JDBC thin URL for transparent access to any database running in the cluster.
  – Examples:
    • sqlplus system/manager@sales1-scan:1521/oltp
    • jdbc:oracle:thin:@sales1-scan:1521/oltp
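For this to work, the SCAN's DNS entry should resolve to three addresses served round-robin. A quick sanity check, using the example name and addresses from the next slide:

  $ nslookup sales1-scan.example.com
  Name:    sales1-scan.example.com
  Address: 133.22.67.192
  Name:    sales1-scan.example.com
  Address: 133.22.67.193
  Name:    sales1-scan.example.com
  Address: 133.22.67.194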

Page 12: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• SCAN in the cluster
  – Each SCAN IP has a SCAN listener, dispersed across the cluster.

  [oracle@mynode] srvctl config scan_listener
  SCAN Listener LISTENER_SCAN1 exists. Port: TCP:1521
  SCAN Listener LISTENER_SCAN2 exists. Port: TCP:1521
  SCAN Listener LISTENER_SCAN3 exists. Port: TCP:1521

  [oracle@mynode] srvctl config scan
  SCAN name: sales1-scan, Network: 1/133.22.67.0/255.255.255.0/
  SCAN VIP name: scan1, IP: /sales1-scan.example.com/133.22.67.192
  SCAN VIP name: scan2, IP: /sales1-scan.example.com/133.22.67.193
  SCAN VIP name: scan3, IP: /sales1-scan.example.com/133.22.67.194

Page 13: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• Only one set of Clusterware daemons can run on each node.
• The entire CRS stack spawns from the Oracle HA Services Daemon (ohasd).
• On Unix, ohasd runs out of inittab with respawn.
• A node can be evicted when deemed unhealthy.
  – This may require a reboot, but at minimum a CRS stack restart (rebootless restart).
• CRS provides Cluster Time Synchronization services.
  – It always runs, but in observer mode if ntpd is configured.

Page 14: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Overview

What you need to know.

• Nodes only lease a node number.
  – The stack is not guaranteed to always start with the same node number.
  – The only way to influence numbering is at first install/upgrade, and then to ensure nodes remain fairly active (almost true).
  – Pre-11.2 databases cannot handle leased node numbers.
    • Pin node numbers – pinning is only allowed to the currently leased number (see the sketch below).
• The CRS stack should be started/stopped on boot/shutdown by init, or with:
  – crsctl start/stop crs for the local clusterware stack
  – crsctl start/stop cluster for all nodes (ohasd must be running)
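A minimal sketch of checking and pinning node numbers (racnode1/racnode2 are hypothetical node names; olsnodes and crsctl pin css are the tools involved):

  $ olsnodes -n -t               # node names, node numbers, pinned/unpinned state
  racnode1  1  Unpinned
  racnode2  2  Unpinned
  # crsctl pin css -n racnode1   # as root: pin racnode1 to its currently leased number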

Page 15: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes


Page 16: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes

11.2 agents change everything.

• Multi-threaded daemons.
• Manage multiple resources and types.
• Implement entry points for multiple resource types:
  – start, stop, check, clean, fail
• Agents: oraagent, orarootagent, application agent, script agent, cssdagent.
• A single process is started from init on Unix (ohasd).
• The diagram below shows all core resources.

Page 17: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes

[Diagram: the core resources arranged by startup level – Level 0 (init), Level 1 (OHASD and its agents), Levels 2a/2b (daemons spawned by the OHASD agents), Level 3 (CRSD and its agents), and Levels 4a/4b (CRSD-managed resources). The levels are detailed on the following slides.]

Page 18: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes – Init Scripts

• /etc/init.d/ohasd (location O/S dependent)
  – RC script with "start" and "stop" actions.
  – Initiates Oracle Clusterware autostart.
  – A control file coordinates with CRSCTL.
• /etc/init.d/init.ohasd (location O/S dependent)
  – OHASD framework script; runs from init/upstart.
  – A control file coordinates with CRSCTL.
  – A named pipe syncs with OHASD.
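On Linux, the init integration is visible directly in /etc/inittab; the entry below is the typical form (exact runlevels vary by platform and release):

  $ grep ohasd /etc/inittab
  h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null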

Page 19: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes

Startup Sequence 11gR2.

• Level 1: OHASD spawns:
  – cssdagent – agent responsible for spawning CSSD.
  – orarootagent – agent responsible for managing all root-owned ohasd resources.
  – oraagent – agent responsible for managing all oracle-owned ohasd resources.
  – cssdmonitor – monitors CSSD and node health (along with the cssdagent).

Page 20: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes

Startup Sequence 11gR2.

• Level 2a: OHASD orarootagent spawns:
  – CRSD – primary daemon responsible for managing cluster resources.
  – CTSSD – Cluster Time Synchronization Services daemon.
  – Diskmon (Exadata).
  – ACFS (ASM Cluster File System) drivers.

Page 21: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes

Startup Sequence 11gR2.

• Level 2b: OHASD oraagent spawns:
  – MDNSD – Multicast DNS daemon.
  – GIPCD – Grid IPC daemon.
  – GPNPD – Grid Plug and Play daemon.
  – EVMD – Event Monitor daemon.
  – ASM – the ASM instance is started here, as it may be required by CRSD.

Page 22: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes

Startup Sequence 11gR2.

• Level 3: CRSD spawns:
  – orarootagent – agent responsible for managing all root-owned crsd resources.
  – oraagent – agent responsible for managing all non-root-owned crsd resources. One is spawned for every user that has CRS resources to manage.

Page 23: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes

Startup Sequence 11gR2.

• Level 4: CRSD oraagent spawns:
  – ASM resource – ASM instance(s) resource (proxy resource).
  – Diskgroup – used for managing/monitoring ASM disk groups.
  – DB resource – used for monitoring and managing the database and instances.
  – SCAN listener – listener for the single client access name, listening on the SCAN VIP.
  – Listener – node listener, listening on the node VIP.
  – Services – used for monitoring and managing services.
  – ONS – Oracle Notification Service.
  – eONS – Enhanced Oracle Notification Service (pre-11.2.0.2).
  – GSD – for 9i backward compatibility.
  – GNS (optional) – Grid Naming Service; performs name resolution.

Page 24: Gi Rac Diagnostics Racsig 2

Grid Infrastructure Processes – ohasd Managed Resources

Resource Name      Agent Name     Owner
ora.gipcd          oraagent       crs user
ora.gpnpd          oraagent       crs user
ora.mdnsd          oraagent       crs user
ora.cssd           cssdagent      root
ora.cssdmonitor    cssdmonitor    root
ora.diskmon        orarootagent   root
ora.ctssd          orarootagent   root
ora.evmd           oraagent       crs user
ora.crsd           orarootagent   root
ora.asm            oraagent       crs user
ora.driver.acfs    orarootagent   root

Page 25: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Cluster Startup Problem Triage (11.2+)

[Flowchart: Cluster Startup Diagnostic Flow]

• Check the init integration and OHASD:
    ps -ef | grep init.ohasd
    ps -ef | grep ohasd.bin
  – If they are not running, verify autostart with crsctl config has and review ohasd.log. If the cause is obvious, engage the sysadmin team; if not, run the TFA Collector and engage Oracle Support.
• If OHASD is running, check the processes it spawns:
    ps -ef | grep cssdagent
    ps -ef | grep ocssd.bin
    ps -ef | grep cssdmonitor
    ps -ef | grep orarootagent
    ps -ef | grep oraagent
    ps -ef | grep ctssd.bin
    ps -ef | grep crsd.bin
    ps -ef | grep gpnpd.bin
    ps -ef | grep mdnsd.bin
    ps -ef | grep evmd.bin
    ps -ef | grep ora.asm
    etc.
  – If any are not running, review ohasd.log, the agent logs, and the process logs; check OLR permissions and compare against a reference system. If the cause is obvious, engage the sysadmin team; if not, run the TFA Collector and engage Oracle Support.
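A minimal script sketch consolidating the same checks (process names as listed above; adjust the list for your platform and version):

  #!/bin/sh
  # Quick cluster-startup triage: report which GI daemons are running.
  for p in init.ohasd ohasd.bin cssdagent ocssd.bin cssdmonitor \
           orarootagent oraagent ctssd.bin crsd.bin gpnpd.bin \
           mdnsd.bin evmd.bin; do
      if pgrep -f "$p" >/dev/null 2>&1; then
          echo "$p: running"
      else
          echo "$p: NOT running"
      fi
  done
  crsctl check crs       # overall CRS/CSS/EVM status
  crsctl config has      # is autostart enabled?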

Page 26: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Cluster Startup Problem Triage

• Multicast Domain Name Service daemon (mDNSd)
  – Used by Grid Plug and Play to locate profiles in the cluster, and by GNS to perform name resolution. The mDNS process is a background process on Linux, UNIX, and Windows.
  – Uses multicast for cache updates on service advertisement arrival/departure.
  – Advertises/serves on all found node interfaces.
  – Log: GI_HOME/log/<node>/mdnsd/mdnsd.log

Page 27: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Cluster Startup Problem Triage

• Grid Plug 'n' Play daemon (gpnpd)
  – Provides access to the Grid Plug and Play profile.
  – Coordinates updates to the profile from clients among the nodes of the cluster.
  – Ensures all nodes have the most recent profile.
  – Registers with mDNS to advertise profile availability.
  – Log: GI_HOME/log/<node>/gpnpd/gpnpd.log

Page 28: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Cluster Startup Problem Triage

<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0"
    xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile"
    xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
    xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd"
    ProfileSequence="6" ClusterUId="b1eec1fcdd355f2bbf7910ce9cc4a228"
    ClusterName="staij-cluster" PALocation="">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="140.87.152.0" Adapter="eth0" Use="public"/>
      <gpnp:Network id="net2" IP="140.87.148.0" Adapter="eth1" Use="cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
  <orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
  <orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+SYSTEM/staij-cluster/asmparameterfile/registry.253.693925293"/>
  <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><!-- XML digital signature (truncated in the original) --></ds:Signature>
</gpnp:GPnP-Profile>
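The live profile can be dumped with the gpnptool utility shipped in the Grid Home, which is a quick way to verify that all nodes agree on the profile:

  $ $GRID_HOME/bin/gpnptool get     # prints the current GPnP profile XML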

Page 29: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Cluster Startup Problem Triage

• cssd agent and monitor
  – Same functionality in both agent and monitor.
  – The functionality of several pre-11.2 daemons is consolidated in both:
    • OPROCD – system hang detection
    • OMON – Oracle clusterware monitor
    • VMON – vendor clusterware monitor
  – Run realtime with locked-down memory, like CSSD.
  – Provide enhanced stability and diagnosability.
  – Logs:
    • GI_HOME/log/<node>/agent/oracssdagent_root/oracssdagent_root.log
    • GI_HOME/log/<node>/agent/oracssdmonitor_root/oracssdmonitor_root.log

Page 30: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Cluster Startup Problem Triage

• cssd agent and monitor – oprocd
  – The basic objective of both OPROCD and OMON was to ensure that the perceptions of other nodes were correct:
    • If CSSD failed, other nodes assumed that the node would fail within a certain amount of time, and OMON ensured that it would.
    • If the node hung for long enough, other nodes would assume that it was gone, and OPROCD would ensure that it was gone.
  – The goal of the 11.2 change is to do this more accurately and avoid false terminations.

Page 31: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Node Eviction Triage

• Cluster Time Synchronisation Services daemon (CTSSD)
  – Provides time management in a cluster for Oracle.
  – Runs in observer mode when vendor time-synchronisation software is found:
    • Logs the time difference to the CRS alert log.
  – Runs in active mode when no vendor time-sync software is found.
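The current CTSS mode can be confirmed with crsctl (the exact message text varies slightly by version; the form below is indicative):

  $ crsctl check ctss
  CRS-4700: The Cluster Time Synchronization Service is in Observer mode.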

Page 32: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Node Eviction Triage

• Cluster Ready Services Daemon (CRSD)
  – The CRSD daemon is primarily responsible for maintaining the availability of application resources, such as database instances. CRSD is responsible for starting and stopping these resources, relocating them to another node in the event of failure, and maintaining the resource profiles in the OCR (Oracle Cluster Registry). In addition, CRSD is responsible for overseeing the caching of the OCR for faster access, and for backing up the OCR.
  – Log file: GI_HOME/log/<node>/crsd/crsd.log
    • Rotation policy: 10 MB
    • Retention policy: 10 logs
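With that policy, a listing of the crsd log directory looks roughly like this (the node name is hypothetical, and the crsd.lNN rotation naming is an assumption based on 11.2 behavior):

  $ ls $GRID_HOME/log/mynode/crsd/
  crsd.log  crsd.l01  crsd.l02  ...  crsd.l10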

Page 33: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Node Eviction Triage

• CRSD oraagent
  – CRSD's oraagent manages:
    • all database, instance, service, and diskgroup resources
    • node listeners
    • SCAN listeners, and ONS
  – If the Grid Infrastructure owner is different from the RDBMS home owner, there are two oraagents, each running as one of the installation owners. The database and service resources are managed by the RDBMS home owner's oraagent, and the other resources by the Grid Infrastructure home owner's oraagent.
  – Log file:
    • GI_HOME/log/<node>/agent/crsd/oraagent_<user>/oraagent_<user>.log

Page 34: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Node Eviction Triage

• CRSD orarootagent
  – CRSD's orarootagent manages:
    • GNS and its VIP
    • node VIPs
    • SCAN VIPs
    • network resources
  – Log file:
    • GI_HOME/log/<node>/agent/crsd/orarootagent_root/orarootagent_root.log

Page 35: Gi Rac Diagnostics Racsig 2

Troubleshooting Scenarios – Node Eviction Triage

• Agent return codes
  – The check entry point must return one of the following codes (a minimal script-agent sketch follows this list):
    • ONLINE
    • UNPLANNED_OFFLINE
      – If target=online, the resource may be recovered or failed over.
    • PLANNED_OFFLINE
    • UNKNOWN
      – The state cannot be determined; if previously ONLINE or PARTIAL, keep monitoring.
    • PARTIAL
      – Some of a resource's services are available, e.g. an instance that is up but not open.
    • FAILED
      – Requires the clean action.
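For the script agent, these states map onto a simple action script whose check exit code conveys the status. A minimal sketch (the resource name myapp and all paths are hypothetical):

  #!/bin/sh
  # /u01/app/myapp.scr – minimal CRS action script (sketch)
  case "$1" in
      start) /u01/app/myapp/bin/myapp & ;;           # start the application
      stop)  pkill -f myapp ;;                       # orderly stop
      check) pgrep -f myapp >/dev/null || exit 1 ;;  # 0 = ONLINE, non-zero = OFFLINE
      clean) pkill -9 -f myapp ;;                    # forced cleanup after FAILED
  esac
  exit 0

Registered, for example, with:

  # crsctl add resource myapp -type cluster_resource -attr "ACTION_SCRIPT=/u01/app/myapp.scr"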

Page 36: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting


Page 37: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting – Install/Upgrade Scenario Process Flow

[Flowchart: Install/Upgrade Scenario Process Flow]

Fresh install:
• System provisioning – see Doc IDs 810394.1, 1096952.1, and 169706.1.
• Check prerequisites with runcluvfy.sh. If they are not met, apply CVU fixup jobs and engage the appropriate team (DBAs, sysadmin, networking, storage, OS vendor, HW vendor, Oracle Support, etc.), then re-check.
• Install.
  – Problem before root.sh? See Doc IDs 1056322.1 and 1367631.1.
  – Problem running root.sh? See Doc ID 942166.1.
• If the problem is not resolved, run the TFA Collector and engage Oracle Support.

Upgrade:
• Check prerequisites with runcluvfy.sh and raccheck –u –o pre.
• Upgrade.
  – Problem before rootupgrade.sh? See Doc IDs 1056322.1 and 1366558.1.
  – Problem running rootupgrade.sh? See Doc IDs 1364947.1 and 1121573.1.
• If the problem is not resolved, run the TFA Collector and engage Oracle Support.

Page 38: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

• References:
  – RAC and Oracle Clusterware Best Practices … (Platform Independent) (Doc ID 810394.1)
  – Master Note for Real Application Clusters (RAC) Oracle Clusterware … (Doc ID 1096952.1)
  – Oracle Database … Operating Systems Installation and Configuration … (Doc ID 169706.1)
  – Troubleshoot 11gR2 Grid Infrastructure/RAC Database runInstaller Issues (Doc ID 1056322.1)
  – Top 5 CRS/Grid Infrastructure Install issues (Doc ID 1367631.1)
  – How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation (Doc ID 942166.1)
  – How to Proceed When Upgrade to 11.2 Grid Infrastructure Cluster Fails (Doc ID 1364947.1)
  – How To Proceed After The Failed Upgrade … In Standalone Environments (Doc ID 1121573.1)
  – Top 11gR2 Grid Infrastructure Upgrade Issues (Doc ID 1366558.1)
  – TFA Collector - Tool for Enhanced Diagnostic Gathering (Doc ID 1513912.1)

Page 39: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

• runInstaller issue diagnostics
  – Installation logs:
    • installActions${TIMESTAMP}.log
    • oraInstall${TIMESTAMP}.err
    • oraInstall${TIMESTAMP}.out
  – Relink errors in installActions*.log due to missing RPMs on Linux, e.g.:
    • Error:
      /usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../libpthread.so when searching for -lpthread
      /usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../libpthread.a when searching for -lpthread
      /usr/bin/ld: cannot find -lpthread
      collect2: ld returned 1 exit status
    • Affected version:
      10.2 on RHEL3 (x86-64), RHEL4 (x86-64), and RHEL5 (x86-64)
    • Missing RPM:
      glibc-devel (64-bit)

Page 40: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

• runInstaller issue diagnostics
  – Relink errors in installActions*.log on AIX, e.g.:
      ld: 0706-006 Cannot find or open library file: -l m
      INFO: End output from spawned process.
      INFO: ----------------------------------
      INFO: Exception thrown from action: make
      Exception Name: MakefileException
      Exception String: Error in invoking target 'links proc gen_pcscfg' of makefile … See '/app/oracle/oraInventory/logs/installActions2012-10-01_03-34-41PM.log' for details
      Exception Severity: 1
  – MOS search terms: links proc gen_pcscfg makefile
  – MOS search result: solution – a filesystem mount option configuration problem

Page 41: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

• Problem avoidance
  – Standard builds with proper configuration baked in.
  – Pre-flight checklist:
    • ssh configuration
      – Follow How To Configure SSH for a RAC Installation (Doc ID 300548.1).
      – Some customers do not follow the guidelines in the note: manual checking of ssh ($ ssh hostname date) and the CVU checks of ssh pass, but the Oracle Universal Installer still fails with messages about ssh configuration.
      – Sanity check and verify the way OUI expects (a multi-node loop is sketched below):

        $ /usr/bin/ssh -o FallBackToRsh=no -o PasswordAuthentication=no -o StrictHostKeyChecking=yes -o NumberOfPasswordPrompts=0 <hostname> date
        Tue Jan 14 12:49:48 PST 2014

    • Installations/upgrades – Cluster Verification Utility (CVU)
    • Upgrades – raccheck/orachk pre-upgrade mode (./orachk –u –o pre)
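A small sketch running that same OUI-style check against every node in one pass (the node list is hypothetical):

  for h in racnode1 racnode2 racnode3; do
      echo "== $h =="
      /usr/bin/ssh -o FallBackToRsh=no -o PasswordAuthentication=no \
          -o StrictHostKeyChecking=yes -o NumberOfPasswordPrompts=0 "$h" date
  done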

Page 42: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

Top 5 CRS/Grid Infrastructure Install issues

• #1: 11.2.0.2+ root.sh or rootupgrade.sh fails on the 2nd node due to multicast issues
  – Symptom:
    • Failed to start Cluster Synchronization Service in clustered mode at /u01/app/crs/11.2.0.2/crs/install/crsconfig_lib.pm line 1016.
  – Cause:
    • Improper multicast configuration for the cluster interconnect network.
  – Solution:
    • Prior to install, follow Grid Infrastructure Startup During Patching, Install or Upgrade May Fail Due to Multicasting Requirement (Doc ID 1212703.1).

Page 43: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

Top 5 CRS/Grid Infrastructure Install issues

• #2: root.sh fails to start the 11.2 GI stack due to known defects
  – Symptom:
    • GI install failure when running root.sh.
  – Cause:
    • Known issues for which fixes already exist.
  – Solution:
    • In-flight application of the most recent PSU:
      – Proceed with the install up to the step requiring root.sh.
      – Before running the root.sh script, apply the PSU.
      – In general you'll want the latest PSUs anyway, but this step may help avoid problems.
    • For upgrades, run ./raccheck –u –o pre before beginning; it checks for prerequisite patches.

Page 44: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

Top 5 CRS/Grid Infrastructure Install issues

• #3: How to complete a GI installation if the OUI session died while running root.sh on the cluster nodes
  – Symptom:
    • Incomplete or interrupted installation.
  – Cause:
    • Unexpected reboot/failure of the node running the OUI session before it could confirm that root.sh had been run on all the nodes; root.sh did complete on all nodes prior to the reboot/failure, but the configuration assistants had not yet run.
  – Solution:
    • As the grid user, execute "$GRID_HOME/cfgtoollogs/configToolAllCommands" on the first node (only).

Page 45: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

Top 5 CRS/Grid Infrastructure Install issues

• #4: Installation fails because network requirements aren't met
  – Symptom:
    • Clusterware startup problems.
    • Individual clusterware component startup problems.
  – Cause:
    • Improper network configuration for the public and/or private network.
  – Solution:
    • Prior to installation, follow:
      – How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1)
      – Grid Infrastructure Startup During Patching, Install or Upgrade May Fail Due to Multicasting Requirement (Doc ID 1212703.1)

Page 46: Gi Rac Diagnostics Racsig 2

Installation Diagnostics and Troubleshooting

Top 5 CRS/Grid Infrastructure Install issues

• #5: 11.2 rolling GI upgrade fails
  – Symptom:
    • Rolling upgrade failure.
  – Cause:
    • Potential ASM bugs.
  – Solution:
    • Prior to the rolling GI upgrade:
      – Run ./raccheck –u –o pre; it checks for prerequisite patches.
      – Install the prerequisite patches to avoid the ASM bugs, and
      – If a complete cluster outage is allowable, optionally perform a non-rolling GI upgrade.
  – References:
    • Top 5 CRS/Grid Infrastructure Install issues (Doc ID 1367631.1) for more details
    • Things to Consider Before Upgrading to 11.2.0.3/11.2.0.4 Grid Infrastructure/ASM (Doc ID 1363369.1)

Page 47: Gi Rac Diagnostics Racsig 2

Dynamic Resource Mastering (DRM)


Page 48: Gi Rac Diagnostics Racsig 2

Dynamic Resource Mastering

• What is it?
  – Not something you would ordinarily need to worry about.
  – Part of the "plumbing" of Cache Fusion.
  – Optimizations to speed access to data and reduce interconnect traffic.
  – DRM – Dynamic Resource Management (Doc ID 390483.1)
• How does it work?
  – Lock element (LE) resources for the data blocks of objects are hashed and mastered across all nodes in the cluster.
  – Access statistics are collected and compared to policies in the database (50:1 access pattern).
  – Depending on workload access patterns, resource mastership may migrate: resources are automatically remastered to the node where they are most often accessed.
  – The LMON, LMD, and LMS processes are responsible for DRM.
  – DRMs can be seen in:
    • LMON trace files
    • gv$dynamic_remaster_stats (queried in the sketch below)
  – Insert/update/delete operations continue without interruption.
  – An example use case that might trigger DRM: hybrid workloads, OLTP vs. batch.
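A minimal sketch of checking DRM activity from SQL*Plus (the column list is a sample; verify the names in gv$dynamic_remaster_stats against your version's reference):

  $ sqlplus -s / as sysdba <<'EOF'
  -- remastering counts per instance
  SELECT inst_id, remaster_ops, remastered_objects, current_objects
  FROM   gv$dynamic_remaster_stats;
  EOF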

Page 49: Gi Rac Diagnostics Racsig 2

Dynamic Resource Mastering

• Affinity locks
  – An optimization introduced in 10.2 with object affinity to manage buffers.
  – Smaller and more efficient than fusion locks (LEs):
    • Less memory required.
    • Fewer instructions performed.
  – The master node grants affinity locks.
  – Affinity locks can be expanded to fusion locks:
    • if another instance needs to access the block, or
    • if mastership is changed.
  – Affinity locks apply to data and undo segment blocks.
• Affinity lock example:
  – A GCS lock (LE) is mastered on instance 2.
  – Instance 1 accesses buffers for this object 50x more than instance 2.
  – The LEs are dissolved and affinity locks created; mastership is stored in memory.
  – Instance 1 can now cheaply read/write these buffers.
  – If instance 2 accesses the buffers, the affinity locks are expanded to fusion locks (LEs).

Page 50: Gi Rac Diagnostics Racsig 2

Dynamic Resource Mastering

• Symptoms of a problem with DRM
  – High DRM-related wait events:
    • gcs drm freeze in enter server mode
      – Script to Collect DRM Information (drmdiag.sql) (Doc ID 1492990.1)
      – Open an SR and submit the diagnostics collected by the script.
    • With a large buffer cache (> 100 GB):
      – gcs resource directory to be unfrozen
      – gcs remaster waits
      – Bug 12879027 – LMON gets stuck in DRM quiesce causing intermittent pseudo reconfiguration (Doc ID 12879027.8)
      – DRM hang causes frequent RAC Instances Reconfiguration (Doc ID 1528362.1)
  – Database slowdowns that correlate with DRMs:
    • Script to Collect DRM Information (drmdiag.sql) (Doc ID 1492990.1)
    • Open an SR and submit the diagnostics collected by the script.

Page 51: Gi Rac Diagnostics Racsig 2

Questions & Answers
