Under the Hood of Oracle Clusterware
Miracle OpenWorld 2010
15-Apr-2010
Alex Gorbachev, The Pythian Group
© 2009/2010 Pythian
Alex Gorbachev
• CTO, The Pythian Group
• Blogger
• OakTable Network member
• Oracle ACE Director
• BattleAgainstAnyGuess.com
• Vice-president, Oracle RAC SIG
Why Companies Trust Pythian
• Recognized Leader:
  • Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server
  • Work with over 150 multinational companies such as Forbes.com, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments
• Expertise:
  • One of the world’s largest concentrations of dedicated, full-time DBA expertise.
• Global Reach & Scalability:
  • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
Agenda
• Place of Clusterware in Oracle RAC
• Node membership and evictions
• Clusterware startup sequence
• Oracle Cluster Registry
• Resources Management and troubleshooting
• 11gR2 Grid Infrastructure
[Chart: "Need to memorize" (Low to High) plotted against "Understanding" (Shallow to In-depth)]
The more you understand, the less you need to memorize
Architecture
[Diagram: three cluster nodes, each running an OS, Clusterware, ASM, an Instance, a Listener, a VIP and a Service. The nodes are linked by the interconnect and have storage access to the shared storage that holds the OCR and the voting disk.]
[Diagram, built up step by step: the Clusterware daemons that run on top of the OS]
• CSSD - Cluster Synchronization Services
• CRSD - Cluster Ready Services
• RACG, VIP - HA Framework scripts
• EVMD - Event Manager
• OPROCD - Oracle Process Monitor
[Diagram sequence on node membership: two nodes, each running CSSD, CRSD, EVMD, OPROCD and RACG/VIP on top of the OS and Clusterware. The CSSD daemons are connected through the interconnect and the voting disk. Labels that appear during the sequence: "ShootTheOtherNodeInTheHead"; "AskTheOtherNodeToRebootItself (c) known quote"; OCLSOMON; OPROCD. Some frames show only one of the two nodes, and the last frames show only the two CSSD daemons and the interconnect.]
Evictions
• Network heartbeat lost
• Voting disk access lost
• CSSD is not healthy
• OS is not healthy
  • OPROCD - Unix, Windows, 11g Linux
  • hangcheck-timer - 10g Linux
DEMO: NHB failure
• Simulate with “ifconfig eth1 down”
• Both nodes notice the loss
• Racing to evict each other
  • from voting disk => 2 equal sub-clusters
  • the sub-cluster with the lowest leader # survives
  • leader is the node with the lowest # in its sub-cluster
• Winner evicts the other node
  • Setting kill-block in voting disk
  • CSSD and OCLSOMON race to suicide
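
The survival rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide, not Oracle's implementation: `leader` and `pick_survivor` are hypothetical names, each sub-cluster is modelled as a list of node numbers, and the "larger sub-cluster wins" ordering is an assumption added so that unequal splits are handled too (the slide only covers the equal-split case).

```python
def leader(sub_cluster):
    """Leader of a sub-cluster: the node with the lowest node number."""
    return min(sub_cluster)


def pick_survivor(sub_clusters):
    """Return the sub-cluster that survives a split.

    Assumption (not from the slide): a larger sub-cluster beats a smaller one.
    From the slide: among equal-sized sub-clusters, the one whose leader has
    the lowest node number survives.
    """
    if not sub_clusters or any(len(sc) == 0 for sc in sub_clusters):
        raise ValueError("need at least one non-empty sub-cluster")
    # Sort key: biggest size first, then lowest leader number.
    return min(sub_clusters, key=lambda sc: (-len(sc), leader(sc)))


# Two-node cluster with the interconnect down: two equal sub-clusters.
print(pick_survivor([[2], [1]]))  # [1]: node 1 is the lowest leader
```

In the two-node demo this means node 1 always wins the race, whichever node had its interface taken down.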
NHB failure symptoms
• NHB failure on several nodes
  • ocssd.log
• Evicted node can contain other traces
  • maybe - syslog (Linux - /var/log/messages)
  • maybe - oclsomon.log
  • almost always - console
• Network is only a *possible* root cause
  • check syslog, ifconfig, netstat
  • Network engineering - switch logs
DEMO: CSSD is not healthy
• Simulate using kill -STOP <cssd.bin pid>
• Another node observes NHB loss
  • After misscount seconds => attempt eviction
  • but CSSD is frozen and can’t commit suicide
• OCLSOMON detects CSSD timeout
  • Commits suicide
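
The "after misscount seconds" condition can be pictured as a simple timeout check on the observing node. A minimal sketch, assuming time is measured in seconds and that the observer re-checks periodically; the function names and all numbers are hypothetical, chosen only for the example.

```python
def heartbeat_overdue(last_heartbeat, now, misscount):
    """True once no network heartbeat has been seen for `misscount` seconds."""
    return (now - last_heartbeat) >= misscount


def first_overdue_tick(last_heartbeat, misscount, ticks):
    """Return the first observation time at which eviction would be attempted."""
    for now in ticks:
        if heartbeat_overdue(last_heartbeat, now, misscount):
            return now
    return None


# Hypothetical numbers: last heartbeat seen at t=100 s, misscount of 30 s,
# and the surviving node re-checking every 10 s.
print(first_overdue_tick(100, 30, range(100, 200, 10)))  # 130
```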
OCSSD sick - symptoms
• Error in oclsomon.log
• OCSSD log might be clean on evicted node
• syslog might contain OCLSOMON diag. err.
• Console often contains diag. err.
  • Depending on syslogd settings
• Set diagwait to more than 3 for better diagnosability
  • 3 seconds is reboottime
  • Increases risk of corruption
DEMO: host sick - CPU stalled
• Simulate by pausing OPROCD
  • kill -STOP <oprocd pid>
  • sleep 1 or 2
  • kill -CONT <oprocd pid>
• oprocd.log
  • Usually nothing if node is reset
• Immediate reboot
  • Console might contain diag msg
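
Why pausing OPROCD for a second or two resets the node: a hang-check style watchdog sleeps for a fixed interval and treats a late wake-up as evidence that the host stalled. A minimal sketch of that check follows; the function name is hypothetical and the 1000 ms interval and 500 ms margin are illustrative values for the example, not a statement of OPROCD's actual code or settings.

```python
def would_reset(expected_wake_ms, actual_wake_ms, margin_ms):
    """Hang-check style test: reset if the wake-up is later than the margin allows."""
    lateness_ms = actual_wake_ms - expected_wake_ms
    return lateness_ms > margin_ms


INTERVAL_MS = 1000   # illustrative sleep interval
MARGIN_MS = 500      # illustrative tolerated lateness

# Normal wake-up: 20 ms late, well within the margin.
print(would_reset(INTERVAL_MS, 1020, MARGIN_MS))   # False

# Watchdog frozen with kill -STOP for about 2 s before kill -CONT.
print(would_reset(INTERVAL_MS, 3000, MARGIN_MS))   # True
```

The watchdog cannot tell a paused process from a stalled CPU, which is exactly why the kill -STOP / kill -CONT trick works as a simulation.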
Killed by OPROCD - symptoms
• Hard to confirm (nothing in oprocd.log)
  • Console output often helps
  • “SysRq: resetting” could be in syslog as well
• Root cause
  • Faulty hardware, drivers, caused by IO/network
  • Kernel bugs, NTP bugs
  • Investigate syslog messages
• Margin can be tuned
  • diagwait and reboottime CSSD parameters
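
As a rough illustration of how the two parameters interact: Oracle support notes for 10.2.0.4-era Clusterware are commonly summarized as "OPROCD margin = diagwait - reboottime". Treat that relationship as an assumption here and verify it against the notes for your version; the helper below is hypothetical, not a Clusterware API.

```python
def oprocd_margin_seconds(diagwait, reboottime=3):
    """Assumed relationship: margin = diagwait - reboottime (both in seconds)."""
    if diagwait <= reboottime:
        raise ValueError("diagwait must exceed reboottime to leave any margin")
    return diagwait - reboottime


# reboottime of 3 seconds is the value quoted in the slides;
# diagwait=13 is an example value only.
print(oprocd_margin_seconds(13))  # 10
```

A larger margin leaves more time to flush diagnostics before the reset, at the cost of a stalled node staying alive longer.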
10g on Linux - hangcheck-timer
• Replaced by OPROCD in 11g and 10.2.0.4+
• Most of the time useless and inactive!
• Metalink Note 726833.1
  • Updated 21-JUL-08!
• Oracle suggests keeping both
  • I would only leave OPROCD
• Metalink Note 567730.1
  • OPROCD in 10.2.0.4
Killed by hangcheck-timer
• Rarely can be confirmed
  • “Hangcheck: hangcheck is restarting the machine”
  • Can set hangcheck_dump_tasks to dump state
  • See source code...
Clusterware startup
• Linux & UNIX inittab
  • init.cssd
  • init.evmd
  • init.crsd
• Linux & UNIX init.d
  • init.crs
• Windows Services
Daemons startup sequence
[Diagram: startup sequence of CSSD, EVMD and CRSD, with third-party clusterware shown alongside]
• Triggered
  • by init.crs from init.d sequence
  • manually
Startup in Linux & Unix
[gorby@dime ~]$ ps -fe | grep 'init\.' | grep -v grep
root 6352 1 0 10:24 ... /bin/sh /etc/init.d/init.evmd run
root 6353 1 0 10:24 ... /bin/sh /etc/init.d/init.cssd fatal
root 6354 1 0 10:24 ... /bin/sh /etc/init.d/init.crsd run
root 7356 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oprocd
root 7364 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd oclsomon
root 7383 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd daemon
[gorby@dime ~]$ tail -3 /etc/inittab
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
[gorby@dime ~]$ ls -l /etc/rc3.d/S96init.crs
lrwxrwxrwx 1 root root 20 Aug 1 23:51 /etc/rc3.d/S96init.crs -> /etc/init.d/init.crs
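
The three inittab entries follow the standard `id:runlevels:action:process` layout, so they are easy to pick apart when checking a node. A small sketch, assuming that four-field format; `InittabEntry` and `parse_inittab_line` are hypothetical helper names.

```python
from typing import NamedTuple


class InittabEntry(NamedTuple):
    ident: str
    runlevels: str
    action: str
    process: str


def parse_inittab_line(line):
    """Split one inittab line into its four colon-separated fields."""
    # maxsplit=3 keeps any colons inside the command itself intact.
    ident, runlevels, action, process = line.strip().split(":", 3)
    return InittabEntry(ident, runlevels, action, process)


lines = [
    "h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null",
    "h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null",
    "h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null",
]
for entry in map(parse_inittab_line, lines):
    script, mode = entry.process.split()[:2]
    print(entry.ident, entry.runlevels, entry.action, script, mode)
```

All three wrappers use the `respawn` action in runlevels 3 and 5, which is why init restarts them whenever they exit.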
Startup flow
[Timeline diagram (t), built up step by step. Labels along the timeline: init.crsd run, init.evmd run, init.cssd fatal; then init.crs start and init.cssd autostart]
• /etc/oracle/scls_scr/{host}/root/cssrun
• /etc/oracle/scls_scr/{host}/root/crsstart
  • enable
  • disable
• init.cssd fatal
  • init.cssd oprocd => oprocd
  • init.cssd oclsomon => oclsomon.bin
  • init.cssd daemon => ocssd.bin
  • init.cssd oclsvmon => oclsvmon.bin
• init.evmd run => evmd.bin
• init.crsd run => crsd.bin
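
The crsstart control file can be read as a simple switch. A minimal sketch, assuming the file holds the single word `enable` or `disable` as the slide suggests; the helper name and the temporary stand-in file are hypothetical, and the real file lives under /etc/oracle/scls_scr/{host}/root/.

```python
import os
import tempfile


def autostart_enabled(crsstart_path):
    """Return True if the control file contains 'enable', False for 'disable'."""
    with open(crsstart_path) as fh:
        value = fh.read().strip()
    if value not in ("enable", "disable"):
        raise ValueError(f"unexpected crsstart content: {value!r}")
    return value == "enable"


# Demonstrate with a temporary stand-in for the real control file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crsstart")
    with open(path, "w") as fh:
        fh.write("enable\n")
    print(autostart_enabled(path))  # True
```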
DEMO: Startup troubleshooting
• Check processes using “ps -fe | grep init”
• Check syslog (/var/log/messages)
  • Can point to /tmp/crsctl.#####
• Remember boot sequence
• Clusterware log files
  • if *.bin processes are running already
• crsctl
  • crsctl check crs/cssd/crsd/evmd
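
The first check, looking at the init wrappers in the process list, can be automated. A sketch that scans captured `ps -fe` output for the three wrappers started from inittab; the helper name is hypothetical and the sample text is an abbreviated, made-up variation of the listing shown earlier in the deck.

```python
EXPECTED_WRAPPERS = ("init.cssd fatal", "init.evmd run", "init.crsd run")


def missing_wrappers(ps_output):
    """Return the expected init.* wrapper invocations absent from `ps -fe` output."""
    lines = [line.rstrip() for line in ps_output.splitlines()]
    return [w for w in EXPECTED_WRAPPERS
            if not any(line.endswith("/etc/init.d/" + w) for line in lines)]


sample = """\
root 6352 1 0 10:24 ... /bin/sh /etc/init.d/init.evmd run
root 6353 1 0 10:24 ... /bin/sh /etc/init.d/init.cssd fatal
root 7383 6353 0 10:25 ... /bin/sh /etc/init.d/init.cssd daemon
"""
print(missing_wrappers(sample))  # ['init.crsd run']
```

An empty list means all three wrappers are present, and the next step is to check whether the *.bin daemons are running.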
Log files
• log/{host}/cssd/ocssd.log
• log/{host}/cssd/oclsomon/oclsomon.log
  • oclsomon.ba1, oclsomon.ba2,...
• /etc/oracle/oprocd/{host}.oprocd.log
  • {host}.oprocd.log.{timestamp}
• syslog
  • Linux /var/log/messages
  • Solaris /var/adm/messages
• Console logs
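
When collecting evidence after an eviction it helps to build all of these paths for a node in one go. A sketch under a few assumptions: `clusterware_log_paths` is a hypothetical helper, the CRS home `/u01/app/crs` is an example value, the OCLSOMON log is taken to be named after the daemon (oclsomon.log), and the crsd.log and alert log locations are the ones listed on the troubleshooting-flow slide of this deck.

```python
import posixpath


def clusterware_log_paths(crs_home, host):
    """Build the log locations mentioned in the slides for one host."""
    log_dir = posixpath.join(crs_home, "log", host)
    return {
        "ocssd": posixpath.join(log_dir, "cssd", "ocssd.log"),
        "oclsomon": posixpath.join(log_dir, "cssd", "oclsomon", "oclsomon.log"),
        "crsd": posixpath.join(log_dir, "crs", "crsd.log"),
        "alert": posixpath.join(log_dir, "alert" + host + ".log"),
        # OPROCD logs outside the CRS home.
        "oprocd": "/etc/oracle/oprocd/" + host + ".oprocd.log",
    }


for name, path in clusterware_log_paths("/u01/app/crs", "dime").items():
    print(name, path)
```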
Windows world
• OPROCD = OraFenceService
• EVMD = OracleEVMService
• CRSD = OracleCRService
• CSSD = OracleCSService
• OPMD
  • Oracle Process Manager Daemon
  • Start trigger like init.crs in *nix
  • registered with Windows Service Control Manager (WSCM) and delays start by 60 seconds
• EVMD: passing clusterware events
• Usually not a problem
• Verify
  • evmwatch -A
  • evmpost -u "my message"
• CRSD manages cluster resources
  • Stop / Start
  • Failover
  • VIP management
  • New resources, etc.
• RACG helper scripts
CRSD startup
• After CSSD and EVMD
• Re-spawned on failure
  • No eviction
• Runs as root
  • VIP control
  • OCR management
  • root ulimits are in place!
• Can run resources owned by any user
  • owner is a property of a resource
Oracle Cluster Registry
• Repository for all configuration data
  • Except OCR location itself
• OCR is accessed mostly read-only
  • Every component reads OCR
• OCR is written only by CRS
  • only from a single OCR master node
### crsd.log ###
2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13:I AM THE NEW OCR MASTER at incar 12. Node Number 1
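
To find out which node currently holds the OCR master role, the message above can be searched for in crsd.log. A sketch that extracts the timestamp, incarnation and node number from such a line; the regular expression is written against the single sample shown here, so treat the exact message layout as an assumption, and `parse_ocr_master` as a hypothetical helper.

```python
import re

OCR_MASTER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):"
    r".*I AM THE NEW OCR MASTER at incar (?P<incar>\d+)\. "
    r"Node Number (?P<node>\d+)"
)


def parse_ocr_master(line):
    """Return (timestamp, incarnation, node number) or None if the line differs."""
    m = OCR_MASTER_RE.search(line)
    if m is None:
        return None
    return m.group("ts"), int(m.group("incar")), int(m.group("node"))


line = ("2008-08-02 22:23:50.958: [ OCRMAS] [3065154448]th_master:13:"
        "I AM THE NEW OCR MASTER at incar 12. Node Number 1")
print(parse_ocr_master(line))
```

Scanning the log for the last matching line gives the most recent master change seen by that node.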
CRS resources
• Standard Oracle resources
  • ASM
  • Listener
  • VIP
  • Database and Instance
  • etc.
  • srvctl => manages Oracle resources
• Custom user resources
  • crs_% => manages any resources
CRS resource internals
• Unique name
• Associated action script
  • stop / start / check functions
• Other attributes
  • check frequency
  • pre-requisites
  • restart retries
  • etc.
• All info stored in OCR
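
The shape of an action script, one entry point dispatching to start, stop and check, can be sketched as follows. This is a toy model rather than a real Clusterware action script: the in-memory `_state` stands in for an actual application, and the exit-code convention (0 for success or "up", non-zero otherwise) is an assumption to verify against the documentation for your release.

```python
import sys

# Stand-in for the managed application's state; a real script would
# start, stop or probe an actual process instead.
_state = {"running": False}


def start():
    _state["running"] = True
    return 0


def stop():
    _state["running"] = False
    return 0


def check():
    # Assumed convention: 0 means the resource is up, non-zero means it is not.
    return 0 if _state["running"] else 1


ACTIONS = {"start": start, "stop": stop, "check": check}


def main(argv):
    """Dispatch on the first argument, as an action script would."""
    if len(argv) != 2 or argv[1] not in ACTIONS:
        print("usage: action_script {start|stop|check}", file=sys.stderr)
        return 2
    return ACTIONS[argv[1]]()


print(main(["action_script", "check"]))  # 1: not started yet
print(main(["action_script", "start"]))  # 0
print(main(["action_script", "check"]))  # 0
```

The check function is the one called repeatedly at the resource's check frequency, so it should be cheap and free of side effects.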
DEMO: Resource profiles
• Use crs_stat [-t] to check status
• Use crs_stat -p to check attributes
• crs_* vs srvctl (like srvctl config ... -a)
• Standard action scripts
  • racgimon
  • racgwrap / racgmain
  • racgvip
  • racgons
  • usrvip
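
Resource attributes are easier to compare across resources once they are in a dictionary. A sketch that assumes the profile text is a list of NAME=VALUE lines; `parse_profile` is a hypothetical helper and the sample values below are illustrative, not captured from a real cluster.

```python
def parse_profile(text):
    """Parse NAME=VALUE lines into a dict, skipping blank or malformed lines."""
    profile = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # partition splits on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        profile[key] = value
    return profile


# Illustrative profile text with made-up values.
sample = """\
NAME=ora.example.vip
TYPE=application
ACTION_SCRIPT=/path/to/racgwrap
CHECK_INTERVAL=60
RESTART_ATTEMPTS=0
"""
profile = parse_profile(sample)
print(profile["NAME"], profile["CHECK_INTERVAL"])
```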
DEMO: OCR internals
• ocrcheck
• ocrconfig
  • used during install/upgrade
  • backup OCR
  • recover OCR
• ocrdump
  • txt or xml
DEMO: racgvip case study
• Check the script
• Set env. vars and simulate the call
• Use _USR_ORA_DEBUG=1 in the script
Resources hierarchy
• 10.2.0.2 (?)
  • released dependency of ASM and Instance on VIP
• If DB registered manually with srvctl
  • ASM dependency missing
[Diagram: dependency hierarchy of DB, Service, CS (Collective Service), Instance, ASM, Listener, VIP and Nodeapps (GSD, ONS); one dependency is marked "Only 10.1 and 10.2.0.1"]
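
A dependency hierarchy like this one determines the order in which resources can be started. The sketch below uses a topological sort to derive a valid start order; the edges in `DEPENDS_ON` are an illustrative reading of the slide (including the older 10.1 / 10.2.0.1 variant where ASM and Instance also depend on VIP), not a dump of a real OCR, and `start_order` is a hypothetical helper. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Illustrative dependency map: resource -> resources it requires.
DEPENDS_ON = {
    "VIP": [],
    "Listener": ["VIP"],
    "ASM": [],
    "Instance": ["ASM"],
    "Service": ["Instance"],
}

# Assumed older layout: ASM and Instance additionally depend on VIP.
DEPENDS_ON_OLD = dict(DEPENDS_ON, ASM=["VIP"], Instance=["ASM", "VIP"])


def start_order(depends_on):
    """Return one valid start order in which prerequisites come first."""
    return list(TopologicalSorter(depends_on).static_order())


print(start_order(DEPENDS_ON))
print(start_order(DEPENDS_ON_OLD))
```

With the older layout a VIP failure drags ASM and the instance down with it, which is the dependency the slide says was released later.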
Resources and Oracle homes
[Diagram: the same resource hierarchy, grouped into DB Home, ASM Home and CRS Home]
• Listener can be in ASM home
• ASM home can be Oracle home
• Logs are in appropriate home
DEMO: troubleshooting resources
• {home}/log/{host}/racg/{resource_name}.log
• Old way - edit racgwrap
  • Uncomment _USR_ORA_DEBUG=1
• crsctl debug log res '{res_name}:{0|1}'
  • crs_stat -p | grep DEBUG
• Run “srvctl start ...” manually
  • SRVM_TRACE=TRUE
Troubleshooting summary
• crsctl check crs | crsd | cssd | evmd
• crs_stat [-t]
• crs_stat -p [{res_name}]
• crsctl debug log css | crs | evm | res
• crsctl lsmodules css | crs | evm
• crs_stop {res_name} [-f] (force-stop a resource)
• ocrdump
• See scripts
Troubleshooting flow
• Is Clusterware up?
• Are Oracle resources up?
  • Listener & VIP
  • Database & ASM instance
  • Services
• Did any nodes get rebooted?
• Did any resources get restarted?
  • $ORA_CRS_HOME/log/{host}/crs/crsd.log
  • $ORA_CRS_HOME/log/{host}/alert{host}.log
• MOS Note 265769.1 “Troubleshooting 10g and 11.1 Clusterware Reboots”
Enter the 11gR2 World - Grid Infrastructure
[Figure; source: Oracle Clusterware Administration and Deployment Guide]
[Figure; source: My Oracle Support Note 1053147.1]
11g Grid Infrastructure Documentation
• Oracle Clusterware Administration and Deployment Guide
• MOS Note 1053147.1
  • 11gR2 Clusterware and Grid Home - What You Need to Know
• MOS Note 1050908.1
  • How to Troubleshoot Grid Infrastructure Startup Issues
• MOS Note 1053970.1
  • Troubleshooting 11.2 Grid Infrastructure Installation Root.sh Issues
• MOS Note 1050693.1
  • Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
11gR2 Node Evictions
• Same as in 10g + member kill escalation
  • LMON process may request CSS to remove an instance from the cluster via the instance eviction mechanism. If this times out, it could escalate to a node kill.
• Processes evicting
  • CSSD
  • CSSDAGENT
  • CSSDMONITOR
Questions?
Thank you!
http://www.pythian.com/