TRANSCRIPT
8/3/2019 HA With MC Guard Concepts
1/55
HA with MC/ServiceGuard (Concepts)
http://uxsl.europe.hp.com/doc/tech/ha/HAtrain/
Prepared by Anand
Other platforms have other HA software
HA means the following :
- No SPOF (single point of failure)
- N+1 redundancy
- Ideal : dual power sources/vendors ; hubs and switches connected to dual power sources
- Not load balancing (Foundry / Cisco LocalDirector, software load balancers)
HA Terminology :
- Cluster (1)
- Node (1 to many)
- Package (1 to many)
- Floating IPs (single/multiple, e.g. BAMM) ; can specify hostnames in DNS for each floating IP
Question : Can we have a node in 2 clusters ?
Answer : Not advisable - dependencies.
Availability :
- 99% - standard server
- 99.5% - MC/ServiceGuard (the application, not the node)
- 99.99% - ??
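As a sanity check on those figures, the downtime budget implied by an availability percentage can be computed directly (a sketch using the values quoted above; a 365-day year is assumed) :

```shell
# Annual downtime implied by an availability percentage:
# downtime = (100 - availability) / 100 * 365 * 24 hours
for avail in 99 99.5 99.99; do
  awk -v a="$avail" 'BEGIN { printf "%s%% availability -> %.1f hours downtime/year\n", a, (100 - a) / 100 * 365 * 24 }'
done
```

So even 99.5% still allows roughly 44 hours of outage per year - and that figure covers the application/package, not the individual node.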
Criteria for HA : Ensure that both (all) nodes in the cluster
- are of the same build, hardware- and software-wise (patch level, kernel changes, user accounts)
Type of disks applicable for use with HA MC/ServiceGuard :
- In general, disks with 2 SPUs/controllers :
  - VA
  - FC10, SC10
  - XP
  - DS
  - AutoRAID 12H
  - Nike disks
Not recommended :
- Jamaica disks
- Desktop
Note : The disks themselves should have HA (RAID1, RAID5) as well.
Question : Can MC/ServiceGuard work across DCs or countries, i.e. one node in Singapore, the other node in Japan ?
Answer : Yes, provided the heartbeat cable is long enough, or more importantly the subnet is the same and the shared disk system is accessible by both servers.
Software Licenses
Part#          Description                                              Qty  Unit Price
B3935DA        MC/SG software system license for HP-UX 11.x              2   USD 0.00
B3935DA-AE5    MC/SG software license for K/N class                      2   USD 5117.00
B3935DA-ABA    MC/SG software English localization                       2   USD 0.00
B3935DA-0S6    MC/SG 24x7 Support (first year)                           2   USD 496.80
B5140BA        MC/SG NFS toolkit license                                 2   USD 322.50
B5140BA-0S6    MC/SG NFS toolkit 24x7 support (first year)               2   USD 64.80
B5139DA        Enterprise Cluster Extension                              2   USD 427.85
B5139DA-0S6    Enterprise Cluster Extension 24x7 Support (first year)    2   USD 86.40
H6194AA        MC/SG Implementation                                      1   USD 15000.00
               (to be included only if you want to buy consulting and implementation service from HPC)
B7885BA        MC/SG LTU Extension for SAP (per SAP instance)            1   USD 12900.00
** Please verify with the SAP team if any other SAP-related license is needed.
If you would like to buy service from HPC, what our team usually does is to approach Vincent, who is the Account manager for HPO, and he will arrange for someone from HPC to work with us. (Do remember to include the USD 15k.)
Software Installation
Note : MC/ServiceGuard can be installed from the ctss144 depots (/var/depot/applications/11.00/hp-ux., /var/depot/applications/11.11/hp-ux.)
We have in our depots :
- Version 11.09
- Version 11.13 - recommended
MC/ServiceGuard software to install (basic setup, install on both machines):
B3935DA   A.11.13      MC/ServiceGuard
B5140BA   A.11.00.04   MC/ServiceGuard NFS Toolkit - install only if NFS is required to work within the cluster
B5139DA   B.01.06      Enterprise Cluster Master Toolkit - optional
B8324BA   A.01.03      HP Cluster Object Manager - optional
Note : Only install the above software from the same DART/CD version; do not try to mix and match from different releases.
Note : If the OS is ver 11.11 (11i) and it is the Mission Critical Operating Environment, then it should come with MC/ServiceGuard installed.
Note : Do check the /etc/services and /etc/inetd.conf files for the MC/ServiceGuard related services, especially for the 11i mission critical OS.
/etc/services
hacl-hb     5300/tcp  # High Availability (HA) Cluster heartbeat
hacl-gs     5301/tcp  # HA Cluster General Services
hacl-cfg    5302/tcp  # HA Cluster TCP configuration
hacl-cfg    5302/udp  # HA Cluster UDP configuration
hacl-probe  5303/tcp  # HA Cluster TCP probe
hacl-probe  5303/udp  # HA Cluster UDP probe
hacl-local  5304/tcp  # HA Cluster Commands
hacl-test   5305/tcp  # HA Cluster Test
hacl-dlm    5408/tcp  # HA Cluster distributed lock manager
/etc/inetd.conf
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd /opt/cmom/lbin/cmomd -f /var/opt/cmom/cmo
Depending on what version of MC/ServiceGuard is installed, MC/ServiceGuard patches must be installed:
http://haweb.cup.hp.com/Support/Patches/SG11.00.html
Question : Can we install one node with MC/ServiceGuard version 11.09 and the other with version 11.13 or something else, i.e. different versions ?
Answer : Not advisable - compatibility issues - unless you are doing rolling upgrades.
MC/ServiceGuard Network Design
Note : The heartbeat LAN usually uses the internal LAN card; the Primary and Secondary LANs use 2 separate LAN cards.
Question : What would a 3-node or 4-node cluster look like ? How can we configure the packages to fail over ? Many possibilities.
Heartbeat network :
- cross UTP
- serial cable
- dedicated heartbeat subnet
- Primary LAN usually set as secondary heartbeat
Cluster/Package Node Configurations :
- ACTIVE ; ACTIVE
- ACTIVE ; PASSIVE

Cluster Lock Disk :
- Tie breaker
- Whoever gets the lock disk will reform the cluster; the other node will usually panic and reboot
- What if the cluster lock disk is dead ?? - UNPLANNED OUTAGE
[Diagram : Keychain Database MC/ServiceGuard network design. Two nodes, sgpue036 and sgpue037, each connect lan0 and lan1 to switch 1 and switch 2 on the user LAN (Securenet). Primary LAN : 15.209.0.25 on sgpue036 (cable name sgpue036) and 15.209.0.26 on sgpue037 (cable name sgpue037). Each node also has a failover LAN with no physical IP : cable sgpue036s must be connected to switch 2, cable sgpue037s to switch 1. lan2 on each node carries the heartbeat LAN over a cross UTP cable (192.0.0.1 / 192.0.0.2), and both nodes attach to the shared FC2 storage.]
MC/ServiceGuard Monitoring
- Hardware
- Application
- ITO
- ClusterViewPlus
- NNM
MC/ServiceGuard Commands

cmquerycl
cmcheckconf
cmapplyconf - distributes the binary configuration details to all nodes in the cluster
cmgetconf

Cluster specific commands
cmruncl
cmviewcl
cmhaltcl

Node specific commands
cmrunnode
cmhaltnode

Package specific commands
cmrunpkg
cmhaltpkg
cmmodpkg
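As an illustration of how these commands combine, a typical manual package move might look like the following. This is a sketch only; the package name kci2prd and the node names are the examples used later in this document.

```shell
# Move package kci2prd onto sgpue037 by hand (illustrative sequence)
cmviewcl -v                     # check current cluster/package state
cmhaltpkg kci2prd               # halt the package on its current node
cmrunpkg -n sgpue037 kci2prd    # start it on the adoptive node
cmmodpkg -e kci2prd             # re-enable package switching (cmhaltpkg disables it)
```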
MC/ServiceGuard with SAM
MC/ServiceGuard backups
- Database vendors' online backup tools
- Split mirror
- Business Copy (VA, XP) KNET
- JFS snapshots
Practice of backup for HPMS, if no special requests :
o For filesystems : back up whatever filesystem is mounted on whichever system, even if failed over.
o For databases : SAP/DBA will consult the tools team on the backup strategy; usually configure OmniBack to detect and back up by floating IP.
Issues with BAMM ??
Project Timeline (TAT)
- Gathering information - 2 days
- Hardware setup (LAN) - 2 days
- Configuration - 3 days (varies); dependencies : Application/DB scripts
- Testing - 1 day (requires CE presence)
Configure /etc/rc.config.d/netconf on each of the nodes in the cluster with the heartbeat LAN (if using a LAN and not a serial interface).
[Mis-encoded listing in the original; only these node access entries survive - most likely the /etc/cmcluster/cmclnodelist security file, which must list each node and the root user on every cluster member :]

sgpue036.sgp.hp.com root
sgpue037.sgp.hp.com root
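A sketch of what the heartbeat entries in /etc/rc.config.d/netconf would look like on sgpue036. The array index [2] and the netmask are assumptions; the interface and address match the network diagram earlier.

```shell
# /etc/rc.config.d/netconf fragment for the dedicated heartbeat LAN (lan2)
INTERFACE_NAME[2]=lan2
IP_ADDRESS[2]=192.0.0.1       # 192.0.0.2 on sgpue037
SUBNET_MASK[2]=255.255.255.0
```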
Unmount Logical Volumes and deactivate the Volume Groups that will be controlled/run by the cluster. (These do not need to be entered in /etc/fstab.)
E.g.
1. vgchange -a n vg02
2. vgchange -a n vg03
Note : It is possible that a cluster does not have any cluster lock disk, or even a VG at all.
The same applies to packages. Also, each VG must be unique to a package; the same VG cannot be used by other packages.
Export and distribute the Volume Groups to the secondary (failover) node.
E.g.
1. vgexport -p -v -s -m /tmp/vg02.map /dev/vg02
2. vgexport -p -v -s -m /tmp/vg03.map /dev/vg03
-p option : preview mode, so that the volume group will not be exported
            off the original node.
-s option : sharable option, Series 800 only. When the -s option is
            specified, the -p, -v, and -m options must also be
            specified. A map file is created that can be used to
            create volume group entries on other systems in the high
            availability cluster (with the vgimport command).
-m option : generates the map file.
-v option : print verbose output.
FTP the .map files to secondary (failover) node.
On Secondary (failover) node, create the volume group directories:
E.g.
3. mkdir /dev/vg02
4. mkdir /dev/vg03
5. ls -l /dev/*/group
6. mknod /dev/vg02/group c 64 0x020000
7. mknod /dev/vg03/group c 64 0x030000
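The minor number in each mknod command must be unique per volume group on the system; by common convention it carries the VG number in its third byte, i.e. 0xNN0000. A quick sketch of that arithmetic :

```shell
# /dev/vgNN/group minor numbers: the VG number in the third byte,
# so vg02 -> 0x020000 and vg03 -> 0x030000 (the values used above).
for vg in 2 3; do
  printf "vg%02d -> 0x%06x\n" "$vg" $((vg * 65536))
done
```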
Import the volume groups onto the secondary (failover) node
E.g.
8. vgimport -s -m /tmp/vg02.map /dev/vg02
9. vgimport -s -m /tmp/vg03.map /dev/vg03
Note : Leave the cluster volume groups deactivated.
Configure the Cluster (do this on one node).
cmquerycl [-w full] -v -C /etc/cmcluster/cluster.conf -n <primary node> -n <secondary node> [-n <other nodes in the cluster>]
(Note : This will generate the cluster config file.)
Edit the /etc/cmcluster/cluster.conf file :
# **********************************************************************
# ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
# ***** For complete details about cluster parameters and how to ****
# ***** set them, consult the ServiceGuard manual. ****
# **********************************************************************
# Enter a name for this cluster. This name will be used to identify the
# cluster when viewing or manipulating it.
CLUSTER_NAME Kcdatabases
# Cluster Lock Parameters
#
# The cluster lock is used as a tie-breaker for situations
# in which a running cluster fails, and then two equal-sized
# sub-clusters are both trying to form a new cluster. The
# cluster lock may be configured using either a lock disk
# or a quorum server.
#
# You can use either the quorum server or the lock disk as
# a cluster lock but not both in the same cluster.
#
# Consider the following when configuring a cluster.
# For a two-node cluster, you must use a cluster lock. For
# a cluster of three or four nodes, a cluster lock is strongly
# recommended. For a cluster of more than four nodes, a
# cluster lock is recommended. If you decide to configure
# a lock for a cluster of more than four nodes, it must be
# a quorum server.
# Lock Disk Parameters. Use the FIRST_CLUSTER_LOCK_VG and
# FIRST_CLUSTER_LOCK_PV parameters to define a lock disk.
# The FIRST_CLUSTER_LOCK_VG is the LVM volume group that
# holds the cluster lock. This volume group should not be
# used by any other cluster as a cluster lock device.
# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.
# The QS_HOST is the host name or IP address of the system
# that is running the quorum server process. The
# QS_POLLING_INTERVAL (microseconds) is the interval at which
# ServiceGuard checks to make sure the quorum server is running.
# The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase
# the time interval after which the quorum server is marked DOWN.
#
# The default quorum server timeout is calculated from the
# ServiceGuard cluster parameters, including NODE_TIMEOUT and
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
#
# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST qshost
# QS_POLLING_INTERVAL 120000000
# QS_TIMEOUT_EXTENSION 2000000
FIRST_CLUSTER_LOCK_VG /dev/vg02   <-- This is automatically searched for.
# Definition of nodes in the cluster.
# Repeat node definitions as necessary for additional nodes.
NODE_NAME sgpue036
NETWORK_INTERFACE lan0
HEARTBEAT_IP 192.0.0.1
# Enter the maximum number of packages which will be configured in the cluster.
# You can not add packages beyond this limit.
# This parameter is required.
MAX_CONFIGURED_PACKAGES 8
# List of cluster aware LVM Volume Groups. These volume groups will
# be used by package applications via the vgchange -a e command.
# Neither CVM or VxVM Disk Groups should be used here.
# For example:
# VOLUME_GROUP /dev/vgdatabase
# VOLUME_GROUP /dev/vg02
VOLUME_GROUP /dev/vg02
VOLUME_GROUP /dev/vg03
Verify the Cluster Configuration (do this on one node)
1. cmcheckconf [-k] -v -C /etc/cmcluster/cluster.conf
Note : If there are no errors, it means that the cluster configuration is ready to be applied.
Distributing the Binary Configuration File (do this on one node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-k] -v -C /etc/cmcluster/cluster.conf
3. vgchange -a n /dev/vg02
Note : The cluster lock volume group needs to be activated in order for the configuration to be applied for first-time clusters. Subsequent changes to the cluster may not need the cluster lock activated, and may not even need the cluster to be brought down, i.e. they can be done online - but this is not recommended.
Note : Deactivate the cluster lock disk right after cluster changes are applied.
Backing up Volume Group and Cluster Lock Configuration Data (optional)
1. vgcfgbackup -u /dev/vg02
2. vgcfgbackup -u /dev/vg03
Note : This does not require the volume groups to be activated.
Checking Cluster Operation (do on either node)
1. cmruncl -v
2. cmhaltnode -v <primary node>
3. cmrunnode -v <primary node>
4. cmhaltcl -v
5. cmruncl -v
6. cmhaltcl -v
Note : Try this on all other nodes in the cluster as well.
Disable Automount of Volume Groups (On both nodes)
1. Edit /etc/lvmrc file and set AUTO_VG_ACTIVATE=0
Note : This is necessary as we do not want the cluster volume groups to be activated when a
system reboots. They are now under the control of the cluster.
Disable Autostart Features (On both nodes)
1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=0
Note : This is to prevent the cluster node from automatically joining the cluster after a
reboot. Usually done when doing maintenance.
Create Packages
E.g.
1. mkdir /etc/cmcluster/kci2prd   <- can be any name
2. cmmakepkg -p /etc/cmcluster/kci2prd.conf   <- can be any name
3. Edit the configuration file
Note : If the package and control file is special (e.g. NFS required) then do not run the
cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS
extension toolkit (similar for the SAP extension). You still need to adjust the files to
suit your needs.
# **********************************************************************
# ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) *******
# **********************************************************************
# ******* Note: This file MUST be edited before it can be used. ********
# * For complete details about package parameters and how to set them, *
# * consult the MC/ServiceGuard or ServiceGuard OPS Edition manuals ****
# **********************************************************************
# Enter a name for this package. This name will be used to identify the
# package when viewing or manipulating it. It must be different from
# the other configured package names.
PACKAGE_NAME kci2prd
# Enter the package type for this package. PACKAGE_TYPE indicates
# whether this package is to run as a FAILOVER or SYSTEM_MULTI_NODE
# package.
#
# FAILOVER package runs on one node at a time and if a failure
# occurs it can switch to an alternate node.
#
# SYSTEM_MULTI_NODE
# package runs on multiple nodes at the same time.
# It can not be started and halted on individual nodes.
# Both NODE_FAIL_FAST_ENABLED and AUTO_RUN must be set
# to YES for this type of package. All SERVICES must
# have SERVICE_FAIL_FAST_ENABLED set to YES.
#
# NOTE: Packages which have a PACKAGE_TYPE of SYSTEM_MULTI_NODE are
# not failover packages and should only be used for applications
# provided by Hewlett-Packard.
#
# Since SYSTEM_MULTI_NODE packages run on multiple nodes at
# one time, following parameters are ignored:
#
# FAILOVER_POLICY
# FAILBACK_POLICY
#
# Since an IP address can not be assigned to more than one node at a
# time, relocatable IP addresses can not be assigned in the
# package control script for multiple node packages. If
# volume groups are assigned to multiple node packages they must be
# activated in a shared mode and data integrity is left to the
# application. Shared access requires a shared volume manager.
#
#
# Examples : PACKAGE_TYPE FAILOVER (default)
# PACKAGE_TYPE SYSTEM_MULTI_NODE
#
PACKAGE_TYPE FAILOVER
# Enter the failover policy for this package. This policy will be used
# to select an adoptive node whenever the package needs to be started.
# The default policy unless otherwise specified is CONFIGURED_NODE.
# This policy will select nodes in priority order from the list of
# NODE_NAME entries specified below.
#
# The alternative policy is MIN_PACKAGE_NODE. This policy will select
# the node, from the list of NODE_NAME entries below, which is
# running the least number of packages at the time this package needs
# to start.
FAILOVER_POLICY CONFIGURED_NODE
# Enter the failback policy for this package. This policy will be used
# to determine what action to take when a package is not running on
# its primary node and its primary node is capable of running the
# package. The default policy unless otherwise specified is MANUAL.
# The MANUAL policy means no attempt will be made to move the package
# back to its primary node when it is running on an adoptive node.
#
# The alternative policy is AUTOMATIC. This policy will attempt to
# move the package back to its primary node whenever the primary node
# is capable of running the package.
FAILBACK_POLICY MANUAL
# Enter the names of the nodes configured for this package. Repeat
# this line as necessary for additional adoptive nodes.
#
# NOTE: The order is relevant.
# Put the second Adoptive Node after the first one.
#
# Example : NODE_NAME original_node
# NODE_NAME adoptive_node
#
# If all nodes in cluster is to be specified and order is not
# important, "NODE_NAME *" may be specified.
#
# Example : NODE_NAME *
NODE_NAME sgpue036
NODE_NAME sgpue037
# Enter the value for AUTO_RUN. Possible values are YES and NO.
# The default for AUTO_RUN is YES. When the cluster is started the
# package will be automatically started. In the event of a failure the
# package will be started on an adoptive node. Adjust as necessary.
#
# AUTO_RUN replaces obsolete PKG_SWITCHING_ENABLED.
AUTO_RUN YES
# Enter the value for LOCAL_LAN_FAILOVER_ALLOWED.
# Possible values are YES and NO.
# The default for LOCAL_LAN_FAILOVER_ALLOWED is YES. In the event of a
# failure, this permits the cluster software to switch LANs locally
# (transfer to a standby LAN card). Adjust as necessary.
#
# LOCAL_LAN_FAILOVER_ALLOWED replaces obsolete NET_SWITCHING_ENABLED.
LOCAL_LAN_FAILOVER_ALLOWED YES
# Enter the value for NODE_FAIL_FAST_ENABLED.
# Possible values are YES and NO.
# The default for NODE_FAIL_FAST_ENABLED is NO. If set to YES,
# in the event of a failure, the cluster software will halt the node
# on which the package is running. All SYSTEM_MULTI_NODE packages must have
# NODE_FAIL_FAST_ENABLED set to YES. Adjust as necessary.
NODE_FAIL_FAST_ENABLED NO
# Enter the complete path for the run and halt scripts. In most cases
# the run script and halt script specified here will be the same script,
# the package control script generated by the cmmakepkg command. This
# control script handles the run(ning) and halt(ing) of the package.
# Enter the timeout, specified in seconds, for the run and halt scripts.
# If the script has not completed by the specified timeout value,
# it will be terminated. The default for each script timeout is
# NO_TIMEOUT. Adjust the timeouts as necessary to permit full
# execution of each script.
# Note: The HALT_SCRIPT_TIMEOUT should be greater than the sum of
# all SERVICE_HALT_TIMEOUT specified for all services.
RUN_SCRIPT /etc/cmcluster/kci2prd/kci2prd.cntl
RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT /etc/cmcluster/kci2prd/kci2prd.cntl
HALT_SCRIPT_TIMEOUT NO_TIMEOUT
# Enter the names of the storage groups configured for this package.
# Repeat this line as necessary for additional storage groups.
#
# Storage groups are only used with CVM disk groups. Neither
# VxVM disk groups or LVM volume groups should be listed here.
# By specifying a CVM disk group with the STORAGE_GROUP keyword
# this package will not run until the VxVM-CVM-pkg package is
# running and thus the CVM shared disk groups are ready for
# activation.
#
# NOTE: Should only be used by applications provided by
# Hewlett-Packard.
#
# Example : STORAGE_GROUP dg01
# STORAGE_GROUP dg02
# STORAGE_GROUP dg03
# STORAGE_GROUP dg04
#
# Enter the SERVICE_NAME, the SERVICE_FAIL_FAST_ENABLED and the
# SERVICE_HALT_TIMEOUT values for this package. Repeat these
# three lines as necessary for additional service names. All
# service names MUST correspond to the service names used by
# cmrunserv and cmhaltserv commands in the run and halt scripts.
#
# The value for SERVICE_FAIL_FAST_ENABLED can be either YES or
# NO. If set to YES, in the event of a service failure, the
# cluster software will halt the node on which the service is
# running. If SERVICE_FAIL_FAST_ENABLED is not specified, the
# default will be NO.
#
# SERVICE_HALT_TIMEOUT is represented in the number of seconds.
# This timeout is used to determine the length of time (in
# seconds) the cluster software will wait for the service to
# halt before a SIGKILL signal is sent to force the termination
# of the service. In the event of a service halt, the cluster
# software will first send a SIGTERM signal to terminate the
# service. If the service does not halt, after waiting for the
# specified SERVICE_HALT_TIMEOUT, the cluster software will send
# out the SIGKILL signal to the service to force its termination.
# This timeout value should be large enough to allow all cleanup
# processes associated with the service to complete. If the
# SERVICE_HALT_TIMEOUT is not specified, a zero timeout will be
# assumed, meaning the cluster software will not wait at all
# before sending the SIGKILL signal to halt the service.
#
# Example: SERVICE_NAME DB_SERVICE
# SERVICE_FAIL_FAST_ENABLED NO
# SERVICE_HALT_TIMEOUT 300
#
# To configure a service, uncomment the following lines and
# fill in the values for all of the keywords.
#
SERVICE_NAME kci2prd
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 300
# Enter the network subnet name that is to be monitored for this package.
# Repeat this line as necessary for additional subnet names. If any of
# the subnets defined goes down, the package will be switched to another
# node that is configured for this package and has all the defined subnets
# available.
SUBNET 15.209.0.0
# The keywords RESOURCE_NAME, RESOURCE_POLLING_INTERVAL,
# RESOURCE_START, and RESOURCE_UP_VALUE are used to specify Package
# Resource Dependencies. To define a package Resource Dependency, a
# RESOURCE_NAME line with a fully qualified resource path name, and
# one or more RESOURCE_UP_VALUE lines are required. The
# RESOURCE_POLLING_INTERVAL and the RESOURCE_START are optional.
#
# The RESOURCE_POLLING_INTERVAL indicates how often, in seconds, the
# resource is to be monitored. It will be defaulted to 60 seconds if
# RESOURCE_POLLING_INTERVAL is not specified.
#
# The RESOURCE_START option can be set to either AUTOMATIC or DEFERRED.
# The default setting for RESOURCE_START is AUTOMATIC. If AUTOMATIC
# is specified, ServiceGuard will start up resource monitoring for
# these AUTOMATIC resources automatically when the node starts up.
# If DEFERRED is selected, ServiceGuard will not attempt to start
# resource monitoring for these resources during node start up. User
# should specify all the DEFERRED resources in the package run script
# so that these DEFERRED resources will be started up from the package
# run script during package run time.
#
# RESOURCE_UP_VALUE requires an operator and a value. This defines
# the resource 'UP' condition. The operators are =, !=, >, <, >=,
# and <=. Only > or >= may be used for the first operator, and
# only < or <= for the second operator when specifying a range.
#
# Example : RESOURCE_UP_VALUE > 5.1               greater than 5.1 (threshold)
#           RESOURCE_UP_VALUE > -5 and < 10       between -5 and 10 (range)
#
# Note that "and" is required between the lower limit and upper limit
# when specifying a range. The upper limit must be greater than the lower
# limit. If RESOURCE_UP_VALUE is repeated within a RESOURCE_NAME block, then
# they are inclusively OR'd together. Package Resource Dependencies may be
# defined by repeating the entire RESOURCE_NAME block.
#
# Example : RESOURCE_NAME /net/interfaces/lan/status/lan0
# RESOURCE_POLLING_INTERVAL 120
# RESOURCE_START AUTOMATIC
# RESOURCE_UP_VALUE = RUNNING
# RESOURCE_UP_VALUE = ONLINE
#
# Means that the value of resource /net/interfaces/lan/status/lan0
# will be checked every 120 seconds, and is considered to
# be 'up' when its value is "RUNNING" or "ONLINE".
#
# Uncomment the following lines to specify Package Resource Dependencies.
#
#RESOURCE_NAME
#RESOURCE_POLLING_INTERVAL
#RESOURCE_START
#RESOURCE_UP_VALUE <op> <value> [and <op> <value>]
Create Package Control Scripts
1. cmmakepkg -s /etc/cmcluster/kci2prd/kci2prd.cntl
2. Edit the control script.
Note : If the package and control file is special (e.g. NFS required) then do not run the
cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS
extension toolkit (similar for the SAP extension). You still need to adjust the files to suit your needs.
Note : It is possible that packages do not use any volume groups.
# **********************************************************************
# * *
# * HIGH AVAILABILITY PACKAGE CONTROL SCRIPT (template) *
# * *
# * Note: This file MUST be edited before it can be used. *
# * *
# **********************************************************************
# The PACKAGE and NODE environment variables are set by
# ServiceGuard at the time the control script is executed.
# Do not set these environment variables yourself!
# The package may fail to start or halt if the values for
# these environment variables are altered.
# UNCOMMENT the variables as you set them.
# Set PATH to reference the appropriate directories.
PATH=/usr/bin:/usr/sbin:/etc:/bin
# VOLUME GROUP ACTIVATION:
# Specify the method of activation for volume groups.
# Leave the default ("VGCHANGE="vgchange -a e") if you want volume
# groups activated in exclusive mode. This assumes the volume groups have
# been initialized with 'vgchange -c y' at the time of creation.
#
# Uncomment the first line (VGCHANGE="vgchange -a e -q n"), and comment
# out the default, if your disks are mirrored on separate physical paths,
#
# Uncomment the second line (VGCHANGE="vgchange -a e -q n -s"), and comment
# out the default, if your disks are mirrored on separate physical paths,
# and you want the mirror resynchronization to occur in parallel with
# the package startup.
#
# Uncomment the third line (VGCHANGE="vgchange -a y") if you wish to
# use non-exclusive activation mode. Single node cluster configurations
# must use non-exclusive activation.
#
# VGCHANGE="vgchange -a e -q n"
# VGCHANGE="vgchange -a e -q n -s"
# VGCHANGE="vgchange -a y"
VGCHANGE="vgchange -a e"              # Default
# CVM DISK GROUP ACTIVATION:
# Specify the method of activation for CVM disk groups.
# Leave the default
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite")
# if you want disk groups activated in the exclusive write mode.
#
# Uncomment the first line
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"),
# and comment out the default, if you want disk groups activated in
# the readonly mode.
#
# Uncomment the second line
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"),
# and comment out the default, if you want disk groups activated in the
# shared read mode.
#
# Uncomment the third line
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"),
# and comment out the default, if you want disk groups activated in the
# shared write mode.
#
# CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"
# CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"
# CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"
CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite"
# VOLUME GROUPS
# Specify which volume groups are used by this package. Uncomment VG[0]=""
# and fill in the name of your first volume group. You must begin with
# VG[0], and increment the list in sequence.
#
# For example, if this package uses your volume groups vg01 and vg02, enter:
# VG[0]=vg01
# VG[1]=vg02
#
# The volume group activation method is defined above. The filesystems
# associated with these volume groups are specified below.
#
VG[0]=vg02
VG[1]=vg03
# CVM DISK GROUPS
# Specify which cvm disk groups are used by this package. Uncomment
# CVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with CVM_DG[0], and increment the list in sequence.
#
# For example, if this package uses your disk groups dg01 and dg02, enter:
# CVM_DG[0]=dg01
# CVM_DG[1]=dg02
#
# The cvm disk group activation method is defined above. The filesystems
# associated with these volume groups are specified below in the CVM_*
# variables.
#
#CVM_DG[0]=""
# VxVM DISK GROUPS
# Specify which VxVM disk groups are used by this package. Uncomment
# VXVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with VXVM_DG[0], and increment the list in sequence.
#
# For example, if this package uses your disk groups dg01 and dg02, enter:
# VXVM_DG[0]=dg01
# VXVM_DG[1]=dg02
#
# The cvm disk group activation method is defined above.
#
#VXVM_DG[0]=""
#
# NOTE: A package could have LVM volume groups, CVM disk groups and VxVM
# disk groups.
#
# FILESYSTEMS
# Specify the filesystems which are used by this package. Uncomment
# LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]="" and fill in the name of your first
# logical volume, filesystem and mount option for the file system. You must
# begin with LV[0], FS[0] and FS_MOUNT_OPT[0] and increment the list in
# sequence.
#
# For the LVM example, if this package uses the file systems pkg1a and
# pkg1b, which are mounted on the logical volumes lvol1 and lvol2 with
# read and write options enter:
# LV[0]=/dev/vg01/lvol1; FS[0]=/pkg1a; FS_MOUNT_OPT[0]="-o rw"
# LV[1]=/dev/vg01/lvol2; FS[1]=/pkg1b; FS_MOUNT_OPT[1]="-o rw"
#
# For the CVM or VxVM example, if this package uses the file systems
# pkg1a and pkg1b, which are mounted on the volumes lvol1 and lvol2
# with read and write options enter:
# LV[0]="/dev/vx/dsk/dg01/vol01"; FS[0]="/pkg1a"; FS_MOUNT_OPT[0]="-o rw"
# LV[1]="/dev/vx/dsk/dg01/vol02"; FS[1]="/pkg1b"; FS_MOUNT_OPT[1]="-o rw"
#
# The filesystems are defined as triplets of entries specifying the logical
# volume, the mount point and the mount options for the file system. Each
# filesystem will be fsck'd prior to being mounted. The filesystems will be
# mounted in the order specified during package startup and will be unmounted
# in reverse order during package shutdown. Ensure that volume groups
# referenced by the logical volume definitions below are included in
# volume group definitions above.
#
#LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]=""
LV[0]=/dev/vg02/lvol1; FS[0]=/oracle/KCI2PRD/data01; FS_MOUNT_OPT[0]="-o rw,suid,largefiles"
LV[1]=/dev/vg02/lvol2; FS[1]=/oracle/KCI2PRD/data02; FS_MOUNT_OPT[1]="-o rw,suid,largefiles"
LV[2]=/dev/vg02/lvol3; FS[2]=/oracle/KCI2PRD/data03; FS_MOUNT_OPT[2]="-o rw,suid,largefiles"
LV[3]=/dev/vg02/lvol4; FS[3]=/oracle/KCI2PRD/data04; FS_MOUNT_OPT[3]="-o rw,suid,largefiles"
LV[4]=/dev/vg02/lvol5; FS[4]=/oracle/KCI2PRD/data05; FS_MOUNT_OPT[4]="-o rw,suid,largefiles"
LV[5]=/dev/vg02/lvol6; FS[5]=/oracle/KCI2PRD/data06; FS_MOUNT_OPT[5]="-o rw,suid,largefiles"
LV[6]=/dev/vg02/lvol7; FS[6]=/oracle/KCI2PRD/data07; FS_MOUNT_OPT[6]="-o rw,suid,largefiles"
LV[7]=/dev/vg02/lvol8; FS[7]=/oracle/KCI2PRD/data08; FS_MOUNT_OPT[7]="-o rw,suid,largefiles"
LV[8]=/dev/vg02/lvol9; FS[8]=/oracle/KCI2PRD/data09; FS_MOUNT_OPT[8]="-o rw,suid,largefiles"
LV[9]=/dev/vg02/lvol10; FS[9]=/oracle/KCI2PRD/data10; FS_MOUNT_OPT[9]="-o rw,suid,largefiles"
LV[10]=/dev/vg02/lvol11; FS[10]=/oracle/KCI2PRD/mirrlogA; FS_MOUNT_OPT[10]="-o rw,suid,largefiles"
LV[11]=/dev/vg02/lvol12; FS[11]=/oracle/KCI2PRD/mirrlogB; FS_MOUNT_OPT[11]="-o rw,suid,largefiles"
LV[12]=/dev/vg02/lvol13; FS[12]=/oracle/KCI2PRD/origlogA; FS_MOUNT_OPT[12]="-o rw,suid,largefiles"
LV[13]=/dev/vg02/lvol14; FS[13]=/oracle/KCI2PRD/origlogB; FS_MOUNT_OPT[13]="-o rw,suid,largefiles"
LV[14]=/dev/vg03/lvol1; FS[14]=/oracle/KCI2PRD/arch; FS_MOUNT_OPT[14]="-o rw,suid,largefiles"
LV[15]=/dev/vg03/lvol2; FS[15]=/oracle/KCI2PRD/bkup01; FS_MOUNT_OPT[15]="-o rw,suid,largefiles"
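The control script walks these triplets in index order - fsck first, then mount - and unmounts them in reverse order at halt. A minimal dry-run sketch of that loop (the loop shape is illustrative, not the real control-script code; the commands are echoed rather than executed so it is safe to run anywhere):

```shell
# Dry-run sketch: walk the LV/FS/FS_MOUNT_OPT triplets in index order,
# echoing the fsck and mount commands a control script would issue.
LV[0]=/dev/vg02/lvol1; FS[0]=/oracle/KCI2PRD/data01; FS_MOUNT_OPT[0]="-o rw,suid,largefiles"
LV[1]=/dev/vg02/lvol2; FS[1]=/oracle/KCI2PRD/data02; FS_MOUNT_OPT[1]="-o rw,suid,largefiles"

activate_filesystems() {
    i=0
    while [ -n "${LV[$i]}" ]        # stop at the first unset index
    do
        echo "fsck -y ${LV[$i]}"
        echo "mount ${FS_MOUNT_OPT[$i]} ${LV[$i]} ${FS[$i]}"
        i=$((i + 1))
    done
}
activate_filesystems
```

At package halt, the same list would be walked from the highest index down, umounting each filesystem in turn.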
#
# VOLUME RECOVERY
#
# When mirrored VxVM volumes are started during the package control
# bring up, if recovery is required the default behavior is for
# the package control script to wait until recovery has been
# completed.
#
# To allow mirror resynchronization to occur in parallel with
# the package startup, uncomment the line
# VXVOL="vxvol -g \$DiskGroup -o bg startall" and comment out the default.
#
# VXVOL="vxvol -g \$DiskGroup -o bg startall"
VXVOL="vxvol -g \$DiskGroup startall" # Default
# FILESYSTEM UNMOUNT COUNT
# Specify the number of unmount attempts for each filesystem during package
# shutdown. The default is set to 1.
FS_UMOUNT_COUNT=1
# FILESYSTEM MOUNT RETRY COUNT.
# Specify the number of mount retries for each filesystem.
# The default is 0. During startup, if a mount point is busy
# and FS_MOUNT_RETRY_COUNT is 0, package startup will fail and
# the script will exit with 1. If a mount point is busy and
# FS_MOUNT_RETRY_COUNT is greater than 0, the script will attempt
# to kill the user responsible for the busy mount point
# and then mount the file system. It will attempt to kill user and
# retry mount, for the number of times specified in FS_MOUNT_RETRY_COUNT.
# If the mount still fails after this number of attempts, the script
# will exit with 1.
# NOTE: If FS_MOUNT_RETRY_COUNT > 0, the script will execute
# "fuser -ku" to free up the busy mount point.
FS_MOUNT_RETRY_COUNT=0
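The retry behaviour described above can be sketched as follows. This is a hypothetical stand-in, not the real control-script function: the mount and "fuser -ku" calls are simulated (via try_mount and busy_until) so the logic is runnable anywhere:

```shell
# Sketch of FS_MOUNT_RETRY_COUNT handling: on a busy mount point, run
# "fuser -ku" (simulated here) and retry the mount, up to the configured
# count; report failure once the retries are exhausted.
FS_MOUNT_RETRY_COUNT=2
busy_until=2          # simulate a mount point that stays busy for 2 attempts

try_mount() {         # stand-in for the real mount: fails while "busy"
    attempts=$((attempts + 1))
    [ "$attempts" -gt "$busy_until" ]
}

retry_mount() {
    attempts=0
    tries=0
    until try_mount
    do
        if [ "$tries" -ge "$FS_MOUNT_RETRY_COUNT" ]; then
            echo "mount failed after $tries retries"
            return 1
        fi
        echo "busy: would run fuser -ku on the mount point"
        tries=$((tries + 1))
    done
    echo "mounted after $tries retries"
}
retry_mount
```

With FS_MOUNT_RETRY_COUNT=0 (the default above), the first busy mount would fail the package startup immediately.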
# CONCURRENT VGCHANGE OPERATIONS
# Specify the number of concurrent volume group activations or
# deactivations to allow during package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while activating or deactivating a large number of volume groups in the
# package. If the specified value is less than 1, the script defaults it
# to 1 and proceeds with a warning message in the package control script
# logfile.
CONCURRENT_VGCHANGE_OPERATIONS=1
# CONCURRENT DISK GROUP OPERATIONS
# Specify the number of concurrent VxVM DG imports or deports to allow
# during package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while importing or deporting a large number of disk groups in the
# package. If the specified value is less than 1, the script defaults it
# to 1 and proceeds with a warning message in the package control script
# logfile.
CONCURRENT_DISKGROUP_OPERATIONS=1
# CONCURRENT FSCK OPERATIONS
# Specify the number of concurrent fsck operations to allow during package startup.
# Setting this value to an appropriate number may improve the performance
# while checking a large number of file systems in the package. If the
# specified value is less than 1, the script defaults it to 1 and proceeds
# with a warning message in the package control script logfile.
CONCURRENT_FSCK_OPERATIONS=1
# CONCURRENT MOUNT AND UMOUNT OPERATIONS
# Specify the number of concurrent mounts and umounts to allow during
# package startup or shutdown.
# Setting this value to an appropriate number may improve the performance
# while mounting or un-mounting a large number of file systems in the package.
# If the specified value is less than 1, the script defaults it to 1 and
# proceeds with a warning message in the package control script logfile.
CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS=1
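The effect of these CONCURRENT_* knobs can be pictured with plain shell job control: start up to N operations in the background, wait for the batch to drain, then continue. A simplified, runnable illustration - do_fsck is a stand-in that just echoes, since the batching rather than the command is the point:

```shell
# Run "fsck" operations CONCURRENT_FSCK_OPERATIONS at a time: launch up
# to N background jobs, wait for the batch to finish, then continue.
CONCURRENT_FSCK_OPERATIONS=3
VOLUMES="lvol1 lvol2 lvol3 lvol4 lvol5 lvol6 lvol7"

do_fsck() { echo "fsck -y /dev/vg02/$1"; }   # stand-in for the real fsck

run_checks() {
    count=0
    for vol in $VOLUMES
    do
        do_fsck "$vol" &
        count=$((count + 1))
        if [ "$count" -ge "$CONCURRENT_FSCK_OPERATIONS" ]; then
            wait        # drain the current batch before starting more
            count=0
        fi
    done
    wait                # drain the final partial batch
}
run_checks
```

With a large number of volumes per package, raising the concurrency can shorten package startup noticeably; with the default of 1, everything runs serially.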
# IP ADDRESSES
# Specify the IP and Subnet address pairs which are used by this package.
# Uncomment IP[0]="" and SUBNET[0]="" and fill in the name of your first
# IP and subnet address. You must begin with IP[0] and SUBNET[0] and
# increment the list in sequence.
#
# For example, if this package uses an IP of 192.10.25.12 and a subnet of
# 192.10.25.0 enter:
# IP[0]=192.10.25.12
# SUBNET[0]=192.10.25.0 # (netmask=255.255.255.0)
#
# Hint: Run "netstat -i" to see the available subnets in the Network field.
#
# IP/Subnet address pairs for each IP address you want to add to a subnet
# interface card. Must be set in pairs, even for IP addresses on the same
# subnet.
#
#IP[0]=""
#SUBNET[0]=""
IP[0]="15.209.0.33"
SUBNET[0]="15.209.0.0" # netmask 255.255.255.192
# SERVICE NAMES AND COMMANDS.
# Specify the service name, command, and restart parameters which are
# used by this package. Uncomment SERVICE_NAME[0]="", SERVICE_CMD[0]="",
# SERVICE_RESTART[0]="" and fill in the name of the first service, command,
# and restart parameters. You must begin with SERVICE_NAME[0], SERVICE_CMD[0],
# and SERVICE_RESTART[0] and increment the list in sequence.
#
# For example:
# SERVICE_NAME[0]=pkg1a
# SERVICE_CMD[0]="/usr/bin/X11/xclock -display 192.10.25.54:0"
# SERVICE_RESTART[0]="" # Will not restart the service.
#
# SERVICE_NAME[1]=pkg1b
# SERVICE_CMD[1]="/usr/bin/X11/xload -display 192.10.25.54:0"
# SERVICE_RESTART[1]="-r 2" # Will restart the service twice.
#
# SERVICE_NAME[2]=pkg1c
# SERVICE_CMD[2]="/usr/sbin/ping"
# SERVICE_RESTART[2]="-R" # Will restart the service an infinite
# number of times.
#
# Note: No environmental variables will be passed to the command, this
# includes the PATH variable. Absolute path names are required for the
# service command definition. Default shell is /usr/bin/sh.
#
#SERVICE_NAME[0]=""
#SERVICE_CMD[0]=""
#SERVICE_RESTART[0]=""
SERVICE_NAME[0]=kci2prd
SERVICE_CMD[0]="/etc/cmcluster/kci2prd/kci2prd.sh monitor"
SERVICE_RESTART[0]=""
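The kci2prd.sh script itself is not reproduced in these notes. A "monitor" service of this kind typically loops, checking that the application's processes are still alive, and exits non-zero when one disappears - which makes ServiceGuard mark the service (and hence the package) as failed. A hypothetical sketch; the process names and the process_alive helper are illustrative, not the real script:

```shell
# Hypothetical monitor-service sketch: loop forever; if any monitored
# process is missing, report it and return non-zero so ServiceGuard
# treats the service as failed.
MONITOR_PROCESSES="ora_pmon ora_smon"   # illustrative process names
MONITOR_INTERVAL=30

process_alive() {                       # stand-in for a ps/grep check
    ps -e | grep -q "$1"
}

monitor_loop() {
    while :
    do
        for p in $MONITOR_PROCESSES
        do
            if ! process_alive "$p"; then
                echo "process $p not running - failing service"
                return 1
            fi
        done
        sleep "$MONITOR_INTERVAL"       # pause between sweeps
    done
}
# monitor_loop would be invoked by ServiceGuard as the SERVICE_CMD;
# it is deliberately not called here.
```

Because SERVICE_RESTART[0]="" above, a single failure of the monitor is enough to fail the package over.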
# DEFERRED_RESOURCE NAME
# Specify the full path name of the 'DEFERRED' resources configured for
# this package. Uncomment DEFERRED_RESOURCE_NAME[0]="" and fill in the
# full path name of the resource.
#
#DEFERRED_RESOURCE_NAME[0]=""
# DTC manager information for each DTC.
# Example: DTC[0]=dtc_20
#DTC_NAME[0]=
# START OF CUSTOMER DEFINED FUNCTIONS
# This function is a place holder for customer-defined functions.
# You should define all actions you want to happen here, before the service is
# started. You can create as many functions as you need.
function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.
/etc/cmcluster/kci2prd/kci2prd.sh start
test_return 51
}
# This function is a place holder for customer-defined functions.
# You should define all actions you want to happen here, before the service is
# halted.
function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
: # do nothing instruction, because a function must contain some command.
/etc/cmcluster/kci2prd/kci2prd.sh shutdown
test_return 52
}
# END OF CUSTOMER DEFINED FUNCTIONS
FTP all ASCII scripts to the secondary (failover) node/nodes.

Verify the Cluster Configuration (Do this on the package's primary node)
cmcheckconf [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf
Note : If there are no errors, the package is ready to be applied.
Distribute the Cluster Configuration File (Do this on the package's primary node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-v] [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf
3. vgchange -a n /dev/vg02
Note : You should not need to activate and later deactivate the cluster lock volume group while
applying packages.
Note : Repeat the steps from Create Packages to here if more packages are required in the
cluster.
Configure Automounter (Do this only if your system is using the automounter)
Check that in /etc/rc.config.d/nfsconf the automounter section is:
AUTOMOUNT=1
AUTOMASTER="/etc/auto_master"
AUTOMOUNT_OPTIONS="-f $AUTOMASTER"
AUTOMOUNTD_OPTIONS=
Check in /etc/rc.config.d/nfsconf that one NFS client and one NFS server daemon are configured to
run:
NFS_CLIENT=1
NFS_SERVER=1
NUM_NFSD=4
NUM_NFSIOD=4
Add this line to /etc/auto_master:
/- /etc/auto.direct
Create an /etc/auto.direct file
/oracle :/export/
Restart the automounter with
/sbin/init.d/nfs.client stop
/sbin/init.d/nfs.client start
Disable Automount of Volume Groups (On both nodes)
1. Edit /etc/lvmrc file and set AUTO_VG_ACTIVATE=0
-
8/3/2019 HA With MC Guard Concepts
24/55
Enable Autostart Features (On both nodes)
1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=1
Checking Package Operation (do on either node)
7. cmruncl -v
8. cmhaltnode -v primary_node (node will be halted and package failed
over to secondary (adoptive) node)
9. cmrunnode -v primary_node (node will rejoin the cluster)
10. cmhaltpkg package_name (halt package on adoptive node)
11. cmrunpkg package_name (run package on original node)
12. cmmodpkg -e package_name (enable package switching)
13. cmhaltcl -v
Note : Use cmviewcl or cmviewcl -v to view the results of each command.
MC/ServiceGuard Template
System Configuration
Hardware Information
Hostname
Model
Operating System version
Physical Memory
Swap Space
Non-Shared HDs
Shared HDs
Tapes
LAN Cards
Primary and Standby Network
Type
Heartbeat Network Type
MC/ServiceGuard Version
MirrorDisk/UX Version
Online JFS Version
Application name / Application
version
Database name / Database version
OS/Appls Patch Level
System Information
Server Hostname
Server IP Address
Server IP Netmask
Server Default Router
Primary Network on separate
Switch
Standby Network on separate
Switch
Operating System File System Layout
Volume Group | Logical Volume | FS Type | Size (MB) | Mount point
MC/ServiceGuard Configuration
Cluster Information
Cluster Name
Cluster Members
Cluster Lock Disk
Heartbeat Interval Default Value is 1
Node Timeout Default Value is 2 ; recommended 8
Network Polling Interval Default Value is 2
Autostart Delay Default Value is 10mins
Maximum Configured Packages To allow online package reconfiguration
Packages Overview
The cluster consists of ________ packages:
1.
2.
3.
Detailed Package Information:
Package Name
Re-locatable Hostname
Re-locatable IP Address
Monitor Subnet
Primary Node
Adoptive Node
Run/Halt Script
Run/Halt Script Timeout
Package Switch Enabled
Network Switch Enabled
Node Failfast Enabled
Service Name
Volume Groups
Logical Volume and File System Details
Device file | Size | Type | Mount Point | Owner | Group | Perm.
Parameter Value
CLUSTER_NAME
FIRST_CLUSTER_LOCK_VG
NODE_NAME
NETWORK_INTERFACE
HEARTBEAT_IP
NETWORK_INTERFACE
HEARTBEAT_IP
FIRST_CLUSTER_LOCK_PV
NODE_NAME
NETWORK_INTERFACE
HEARTBEAT_IP
NETWORK_INTERFACE
HEARTBEAT_IP
FIRST_CLUSTER_LOCK_PV
HEARTBEAT_INTERVAL (Default value is 1s)
NODE_TIMEOUT (Default value is 2s)
AUTO_START_TIMEOUT (Default value is 10 mins)
NETWORK_POLLING_INTERVAL (Default value is 2s)
MAX_CONFIGURED_PACKAGES (allow extra to permit online package
reconfiguration)
VOLUME_GROUP
Package Configuration File Parameters
Parameter Value
PACKAGE_NAME
NODE_NAME
NODE_NAME
RUN_SCRIPT
RUN_SCRIPT_TIMEOUT
HALT_SCRIPT
HALT_SCRIPT_TIMEOUT
SERVICE_NAME
SUBNET
AUTO_RUN
(PKG_SWITCHING_ENABLED)
YES
LOCAL_LAN_FAILOVER_ALLOWED
(NET_SWITCHING_ENABLED)
YES
NODE_FAIL_FAST_ENABLED NO
Package Control Script Parameters
Parameter Value
PATH
VGCHANGE "vgchange -a e"
VG[0]
VG[1]
LV[0]
LV[1]
LV[2]
FS[0]
FS[1]
FS[2]
IP[0]
SUBNET[0]
SERVICE_NAME[0]
SERVICE_CMD[0]
SERVICE_RESTART[0]
function
customer_defined_run_cmds
function
customer_defined_halt_cmds
Database Toolkit Parameters (Informix/Oracle)
Parameter Value
INFORMIX_HOME or
ORACLE_HOME
INFORMIX_SESSION_NAME or
ORACLE_SESSION_NAME
(Mount point and session name)
MONITOR_INTERVAL (Time between checks)
MONITOR_PROCESSES (Processes like dataserver etc)
PACKAGE_NAME
TIME_OUT (Waiting time in seconds for the Informix/Oracle
abort to complete before killing the
Informix/Oracle processes)
Note : If it is Oracle, SAP or NFS, there are pre-defined scripts for these, provided you
install the Enterprise Cluster Master Toolkit and the NFS toolkit - /opt/cmcluster/
TESTING MC/SERVICEGUARD
1.1 Test Overview
This section contains the test requirements and test plan for MC/ServiceGuard.
1.2 Test Requirement
The MC/ServiceGuard product is a High Availability solution that performs system failure detection and transfers the application
from the primary node to the adoptive node when a system failure occurs.
Note : We assume that there is only 1 package in the cluster. In the event there are more packages, please change/add steps accordingly.
The faults to be tested and the appropriate methods are listed below:
Type of Failure | Method of Simulation
CPU, Memory, Power Supply and Operating System | Reset of server
Active LAN | Removal of LAN cable from active LAN card
Total Data LAN | Removal of all Data LAN cables from server
1.3 Verification method
Upon startup of the package, the verification checkpoints are
- Log onto the surviving server and run the command cmviewcl to check that the package application is RUNNING
- Ping the relocatable IP from another station in the same network
- Check that all shared file systems are mounted.
1.4 Test Checklist
Five categories of test that will be performed are as follows:
a. Normal Bootup
b. Manual Package Switching Functionality
c. LAN Failure Tests (Heartbeat Failure, Data LAN Failure)
d. System Failure Tests
e. Failures not affecting package - sanity checks to ensure that failure of the adoptive node in the cluster has no side effect on the primary node.
No. | Test | Method of Simulation | Expected Result | Check | Remarks

NORMAL BOOTUP SEQUENCE
1 | Normal bootup | Power on or reboot both servers | Cluster is up with node1 and node2 running, and the package is running on node1

MANUAL PACKAGE SWITCHING FUNCTIONALITY
1 | Package halts successfully on node1 | Run "cmhaltpkg -v package" | Application shuts down successfully and the package is halted properly
2 | Package starts successfully on node2 | Run "cmrunpkg -v -n node2 package" | Package starts up successfully on node2
3 | Package halts successfully on node2 | Run "cmhaltpkg -v package" | Application shuts down successfully and the package is halted properly
4 | Package starts successfully on node1 | Run "cmrunpkg -v -n node1 package" | Package starts up successfully on node1
No. | Test | Method of Simulation | Expected Result

LAN FAILURE TESTS
1 | Heartbeat LAN failure on node1 (package is running on node1) | Pull out the lan0 cable on node1 | lan1 takes over as Heartbeat LAN and the package remains running on node1
2 | Pri Data LAN failure on node1 (package is running on node1) | Pull out the lan1 cable on node1 | Sec LAN (lan5) takes over as active LAN and the package remains running on node1
3 | Sec Data LAN failure on node1 (package is running on node1) | Pull out the lan5 cable on node1 | Pri LAN (lan1) takes over as active LAN and the package remains running on node1
4 | Total Data LAN failure on node1 (package is running on node1) | Pull out lan1 and lan5 from node1 | Package fails over to node2 if it is running as a node in the cluster; 50% chance of failing on the adoptive node, as it is unable to get the cluster lock and panic reboots
5 | Heartbeat LAN failure on node2 (package is running on node2) | Pull out the lan0 cable on node2 | lan1 takes over as Heartbeat LAN and the package remains running on node2
6 | Pri Data LAN failure on node2 (package is running on node2) | Pull out the lan1 cable on node2 | Sec LAN (lan5) takes over as active LAN and the package remains running on node2
7 | Sec Data LAN failure on node2 (package is running on node2) | Pull out the lan5 cable on node2 | Pri LAN (lan1) takes over as active LAN and the package remains running on node2
8 | Total Data LAN failure on node2 (package is running on node2) | Pull out lan1 and lan5 from node2 | Package fails over to node1 if it is running as a node in the cluster; 50% chance of failing on the adoptive node, as it is unable to get the cluster lock and panic reboots
You may wish to extend the test to verify the functionality of MC/ServiceGuard with regard to application monitoring
scripts and application failover.
No. | Test | Method of Simulation | Result | Check

SYSTEM FAILURE TESTS
1 | node1 failure (package is running on node1) | Reset node1 (try both "shutdown -ry" and reboot, or "rs" from the console) | Package fails over to node2 if it is running as a node in the cluster | Yes
2 | node2 failure (package is running on node2) | Reset node2 (try both "shutdown -ry" and reboot, or "rs" from the console) | Package fails over to node1 if it is running as a node in the cluster | Yes

FAILURES NOT AFFECTING PACKAGE
1 | node2 failure (package is running on node1) | | Cluster reforms to a single-node cluster and the package continues to run on node1 | Yes
MC/ServiceGuard Troubleshooting
Troubleshooting using log files
For troubleshooting, there are a few files that log problems experienced by MC/ServiceGuard. These are:
a. /var/adm/syslog/syslog.log
b. /etc/cmcluster/packagedir/packagename.cntl.log
These files need to be maintained, as they will grow; left unmaintained, they can ultimately fill the / file system.
The package control log file contains information regarding package
start/stop. Each package has its own package control log file.
Note : Always use cmviewcl or cmviewcl -v to see the status of your
cluster.
Common Problems :

- Problems of configuration
  - missing entries in /etc/services, /etc/inetd.conf
  - .rhosts or cmclnodelist not configured
  - grammatical errors in config and control files

- Warning : Missing cluster lock disk
  - Repeats every hour, logged by the cmcld daemon in syslog.log
  - This problem occurs after something has changed affecting the cluster lock disk, eg. the SCSI ID of the disk changed
  - No issue at the moment, but when a tie-breaker period occurs, nodes will not be able to detect the disk and all nodes may panic reboot.
Solution :

1. Schedule downtime to halt the cluster (cmhaltcl)
2. Run vgchange -c n vgsh to remove the cluster lock volume group from the cluster.
3. Activate vgsh on the node where the cluster configuration ASCII file exists by running
vgchange -a y vgsh, then do a cmapplyconf -v -C /etc/cmcluster/cluster.ascii. Answer yes
to the change, then run vgchange -a n vgsh to deactivate the cluster
lock volume group.
4. Start the cluster (cmruncl)
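The four steps above, consolidated into one sequence. The run helper below only echoes each command (a dry-run), so the sketch is safe to execute anywhere; vgsh and cluster.ascii are the names used in the steps above - substitute your own lock volume group and ASCII file names:

```shell
# Dry-run of the missing-lock-disk fix: every command is echoed, not executed.
run() { echo "# $*"; }

lock_disk_fix() {
    run cmhaltcl                                        # 1. halt the cluster
    run vgchange -c n vgsh                              # 2. remove lock VG from cluster
    run vgchange -a y vgsh                              # 3. activate the lock VG...
    run cmapplyconf -v -C /etc/cmcluster/cluster.ascii  #    ...re-apply the config
    run vgchange -a n vgsh                              #    deactivate the lock VG
    run cmruncl                                         # 4. start the cluster
}
lock_disk_fix
```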
- Warning : I/O error on cluster lock disk
  - Repeats every hour, logged by the cmcld daemon in syslog.log
  - This problem usually occurs if something is wrong with one of the SPUs or controllers of the disk array connected to one of the nodes.
  - If it happened on the primary node, it is possible that the application has already hung.
  - No issue if it occurs on the adoptive node at the moment, but when a tie-breaker period occurs, nodes will not be able to detect the disk and all nodes may panic reboot.
  - In other cases, the cluster lock disk itself could be faulty, and a hung situation with respect to the application and bdf will occur.
Solution :
- Schedule downtime and ask CE to check the SPU or controller
- Cluster failures
  - Cluster cannot start
  - missing entries in /etc/services, /etc/inetd.conf
  - .rhosts or cmclnodelist not configured
  - grammatical errors in config and control files
  - could be hardware, package-induced, or an application problem. Again, check the
log files.
- Package failures
  - Package unable to start at all on any node
    - Check syslog and the package log file. Possible config problem, control script problem, or the application script name changed.
  - Package cannot fail over to the adoptive node but can start on the primary node
    - Check syslog and the package log file. Possibly package switching or the node is disabled.
    - cmmodpkg -e package_name to enable package switching
    - cmmodpkg -e package_name -n node_name to enable the node (allow the package to run on this node)
  - Package cannot mount/umount filesystems (from the package log)
    - Package failed to start because of mount problems. Possibly the shared VG is not marked as cluster-aware or not activated, a manually mounted filesystem, or someone accessing an unmounted directory.
    - Unmount all filesystems, check who is accessing the directory and get that person to exit, run vgchange -c y vgsh to mark the VG cluster-aware, deactivate it, and try starting again.
    - Hard disk problem
  - Package failed to halt
    - application process hung and could not be killed
    - Hard disk problem
- Service failure
  - cmviewcl -v to see the status of all packages and their services.
  - Trace from the package control file and syslog to see why it failed, etc.
  - Possible config problem, control script problem, or the application script name changed.

- Node timeout
  - Recommended node timeout value in the cluster config file is 5-8 seconds
  - If the default 2 seconds is used, the system may panic reboot due to a tie-breaker scenario caused by poor network performance.

- GSP problems
  - Known problem for L-class servers (certain generations)
  - Causes the system to panic reboot and fail the package over to the adoptive node
  - A recommended patch/GSP firmware upgrade needs to be applied

- LAN problems
  - NMID problems

- Disk problems
  - SCSI ID changed/conflict, perhaps due to a controller card factory default setting. The cluster cannot be brought up; need the CE to change it accordingly.
  - Cluster lock disk failed
    - If the lock disk is RAID1 or RAID5: no problem
    - If the lock disk is an LVM mirror: need to do vgcfgrestore and vgsync to recover the lock info, which is stored in the BBR table part of the disk
    - If there is no mirror, then the cluster configuration needs to be re-applied
On-Going Upgrades/Changes to systems/cluster/package

- Pro-active patch installation (node by node)
- Data Centre outages (shutdown entire cluster)
- Rolling upgrades (node by node)
Keychain Cluster - Shutdown and Startup Procedure
-------------------------------------------------
Last update: 19 June 2002 (SGP)
*******************************************************************
Please follow these steps whenever you need to arrange a shutdown
for sgpue036.sgp.hp.com & sgpue037.sgp.hp.com.
Special handling is required because of their MC/Serviceguard HA
environment.
*******************************************************************
Before you shutdown a node
--------------------------
1. Get agreement with application support on schedule, scope and
duration of shutdown.
2. Ensure both nodes in the cluster are up and running. If any node
is down or appears to be having problems, DO NOT proceed with
shutdown.
3. If shutting down a primary node, go to the section titled "Shutting down
and restarting the primary node".
If shutting down a secondary node, go to the section titled "Shutting down
and restarting the secondary node".
If shutting down the entire cluster, go to the section titled "Shutting down
and restarting the MC/SG cluster".
If doing a rolling upgrade, go to the section titled "Doing a rolling upgrade".
Shutting down and restarting the primary node
------------------------------------------------
We assume primary node = sgpue036 and secondary node = sgpue037
in the following examples.
1. Before shutdown, make a note of all packages currently running
on each node.
sgpue036# cmviewcl
> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037
2. Halt primary node sgpue036
sgpue036# cmhaltnode -f -v sgpue036
Production packages will failover from sgpue036 to sgpue037. sgpue036
will cease to be a member of the active cluster.
3. Check package status on cluster
sgpue036# cmviewcl
> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running disabled sgpue037
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037
>
> NODE STATUS STATE
> sgpue036 down halted
4. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the
following line:
AUTOSTART_CMCLD=0
5. Now we can proceed to shutdown (for PM, repair) or reboot
(for patching, kernel regen) sgpue036, eg:
sgpue036# /etc/shutdown -h 0
sgpue036# /etc/shutdown -r 0
6. When repair or reboot is over, sgpue036 should be booted up to
run level 3
sgpue036# who -r
. run-level 3 Jan 17 08:01 3 0 S
7. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the
following line:
AUTOSTART_CMCLD=1
8. Make sgpue036 join the cluster
sgpue036# cmrunnode -v sgpue036
9. Halt production packages on sgpue037
sgpue037# cmhaltpkg kci2stg
10. Restart production packages on sgpue036
sgpue036# cmrunpkg kci2stg
11. Re-enable package switching on production packages
sgpue036# cmmodpkg -e kci2stg
12. Check package status on cluster.
You should see the same listing as shown in Step 1 ie.
sgpue036# cmviewcl
> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037
13. Release sgpue036 to customers (notify by phone, email etc)
Shutting down and restarting the secondary node
---------------------------------------------
1. Before shutdown, make a note of all packages currently running
on each node
sgpue037# cmviewcl
> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037
2. Halt secondary node sgpue037
sgpue037# cmhaltnode -f -v sgpue037
Production packages will failover from sgpue037 to sgpue036. sgpue037
will cease to be a member of the active cluster.
3. Check package status on cluster
sgpue037# cmviewcl
> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
> kcdbstg up running disabled sgpue036
> kcnfs up running disabled sgpue036
>
> NODE STATUS STATE
> sgpue037 down halted
4. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the
following line:
AUTOSTART_CMCLD=0
5. Now we can proceed to shutdown (for PM, repair) or reboot
(for patching, kernel regen) sgpue037, eg:
sgpue037# /etc/shutdown -h 0
sgpue037# /etc/shutdown -r 0
6. When repair or reboot is over, sgpue037 should be booted up to
run level 3
sgpue037# who -r
. run-level 3 Jan 17 08:01 3 0 S
7. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the
following line:
AUTOSTART_CMCLD=1
8. Make sgpue037 join the cluster
sgpue037# cmrunnode -v sgpue037
9. Halt production packages on sgpue036
sgpue036# cmhaltpkg kcdbstg
sgpue036# cmhaltpkg kcnfs
10. Restart production packages on sgpue037
sgpue037# cmrunpkg kcdbstg
sgpue037# cmrunpkg kcnfs
11. Re-enable package switching on production packages
sgpue037# cmmodpkg -e kcdbstg
sgpue037# cmmodpkg -e kcnfs
12. Check package status on cluster.
You should see the same listing as shown in Step 1 ie.
sgpue037# cmviewcl
> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037
13. Release sgpue037 to customers (notify by phone, email etc)
Shutting down and restarting the MC/SG cluster
----------------------------------------------
We assume primary node = sgpue036 and secondary node = sgpue037 in
the following examples.
1. Log in to sgpue036 or sgpue037 as superuser and issue command to
halt cluster daemon
sgpue036# cmhaltcl -f -v
2. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include
the following line:
AUTOSTART_CMCLD=0
3. Proceed to shutdown each node
sgpue036# /etc/shutdown -h 0
sgpue037# /etc/shutdown -h 0
4. After planned activity is over, bootup each node to run level 3
sgpue036# who -r
sgpue037# who -r
. run-level 3 Jan 17 08:01 3 0 S
5. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include the
following line:
AUTOSTART_CMCLD=1
6. Startup the cluster daemon from any node
sgpue036# cmruncl -v
7. Check package status on cluster.
It should look exactly like the following
sgpue036# cmviewcl
> CLUSTER STATUS
> knet up
>
> NODE STATUS STATE
> sgpue036 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kci2stg up running enabled sgpue036
>
> NODE STATUS STATE
> sgpue037 up running
>
> PACKAGE STATUS STATE AUTO_RUN NODE
> kcdbstg up running enabled sgpue037
> kcnfs up running enabled sgpue037
8. Release machines to customers (notify by phone, email etc)
Doing a rolling upgrade
-----------------------
This is the most common scenario, where we work on 1 node at a time
without bringing down the entire cluster. This ensures there is at
least 1 node available to run the application packages. The steps
are already detailed above. Either:
1. Shutting down and restarting the primary node
2. Shutting down and restarting the secondary node
or
1. Shutting down and restarting the secondary node
2. Shutting down and restarting the primary node
Note : This may apply to OS upgrades, eg. 10.20 to 11.00, whereby MC/SG goes from
ver 10.10 to 11.X.
Another method you may deploy is building a separate cluster on a
separate machine with the latest OS, copying all the config files over,
and just swapping the package IPs.
- Modifying the cluster
  o Any change to the cluster requires the cluster configuration to be re-applied (go
    through the cluster.conf file to see the parameters), so downtime is needed to halt
    the cluster - except for adding/removing nodes and packages, which can be done
    while the cluster is still up and running.
    Eg. node timeout, heartbeat interval
    Eg. cluster name
    Eg. heartbeat IPs
    Eg. number of packages
    Eg. change of node names
    Eg. manual change/add of volume group
  o Steps
    1. Schedule downtime to halt the entire cluster
    2. cmhaltcl -f to halt the cluster
    3. After the cluster is halted, run cmgetconf -v -c cluster_name outputfilename
       (cluster ASCII file - name it something different) to get the latest copy of the
       cluster config file.
    4. Modify the outputfilename to make the intended changes to the cluster.
    5. cmcheckconf -v -C outputfilename (cluster ASCII file) - check for any errors
    6. cmapplyconf -v -C outputfilename (cluster ASCII file) - if no errors
    7. Start the cluster: cmruncl
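The steps above can be sketched as a single dry-run sequence. The run helper echoes instead of executing, so it is safe to run anywhere; the cluster name "knet" and the output file name are illustrative:

```shell
# Dry-run of the cluster-modification workflow: halt, dump the current
# config, edit it, check it, apply it, restart. Commands are echoed only.
run() { echo "# $*"; }

modify_cluster() {
    run cmhaltcl -f                                       # halt the cluster
    run cmgetconf -v -c knet /etc/cmcluster/cluster.new   # dump current config
    run vi /etc/cmcluster/cluster.new                     # make the changes
    run cmcheckconf -v -C /etc/cmcluster/cluster.new      # check for errors
    run cmapplyconf -v -C /etc/cmcluster/cluster.new      # apply if clean
    run cmruncl                                           # start the cluster
}
modify_cluster
```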
- Adding/removing nodes to the cluster
  o Adding
  o Online method
    - Heartbeat must be configured and the network ready
    - Can be done on any node (preferably the node where the original cluster
      config file was placed)
    - cmquerycl [-w full] -v -C /etc/cmcluster/outputfilename -n primary_node -n secondary_node -n new_node
      (Note : This will query the system configuration and generate the new cluster config file, under whatever name you specified as the
      outputfilename.)
    - cmgetconf -v -c cluster_name outputfilename (cluster ASCII file
      - name it something different) to get the latest copy of the cluster config file.
    - Check and combine the 2 configurations into one final config
      file.
  cmcheckconf -v -C finalconfigfile (cluster ascii file) to check for any errors
  cmapplyconf -v -C finalconfigfile (cluster ascii file) if no errors
  cmrunnode nodename to join the cluster
  Modify all package config files to include the new node if desired. (Remember modifying the package config file will need a downtime to apply the package config file)
o Offline method
  Same, except perform with the cluster halted, and when all the changes are made, start the cluster
o Removing
o Online method
  Modify all package config files to exclude the node if it is configured in the package. (Remember modifying the package config file will need a downtime to apply the package config file)
  Halt all ACTIVE packages on the node: cmhaltpkg packagenames
  Halt the node: cmhaltnode -v nodename
  cmgetconf -v -c clustername outputfilename (cluster ascii file; name it something different) to get the latest copy of the cluster config file.
  Edit this cluster ascii file to remove the node details
  cmcheckconf -v -C outputfilename (cluster ascii file) to check for any errors
  cmapplyconf -v -C outputfilename (cluster ascii file) if no errors
  Do whatever with the node: power down, redeploy
  vgexport vgsh (off the removed node)
o Offline method
  Same, except perform with the cluster halted, and when all the changes are made, start the cluster; skip the halt package and halt node steps

Note : While the cluster is running, you can remove a node from the cluster while the node is reachable, ie connected to the LAN (recommended). If the node is unreachable, it can still be removed from the cluster, but only if there are no packages which specify the unreachable node. If there are packages that depend on the unreachable node, then it is best to halt the cluster and make the changes to the package and cluster config files to remove the node from the cluster.
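The online node-removal steps can be sketched as another dry-run script. The cluster, node, and package names are hypothetical; each step is echoed rather than run.

```shell
#!/bin/sh
# Dry-run sketch of removing a node from a running cluster.
remove_node() {
    cluster="$1"; node="$2"
    conf="/etc/cmcluster/${cluster}.ascii"
    echo "cmhaltpkg pkg_on_${node}"        # halt ACTIVE packages on the node
    echo "cmhaltnode -v $node"             # halt the node itself
    echo "cmgetconf -v -c $cluster $conf"  # get the latest cluster config
    echo "vi $conf"                        # delete the node's details
    echo "cmcheckconf -v -C $conf"         # verify before applying
    echo "cmapplyconf -v -C $conf"         # apply the reduced cluster config
    echo "vgexport vgsh"                   # on the removed node, drop the shared VG
}

plan=$(remove_node prodcl nodeb)
echo "$plan"
```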
- Adding/removing packages to the cluster
o Adding
o Online method
o Create packages on the primary node
  mkdir /etc/cmcluster/packagedir
  cmmakepkg -p /etc/cmcluster/packagedir/packagename.conf
  Edit the configuration file
  cmmakepkg -s /etc/cmcluster/packagedir/packagename.cntl
  Edit the control script.

Note : If the package and control file are special (e.g. NFS is required) then do not run the cmmakepkg command; just get the pre-defined scripts from the MC/SG NFS extension toolkit. You may still need to make some adjustments (similar for the SAP extension).

Note : It is possible that packages do not use any volume groups.

  ftp the control script file to the adoptive nodes
  On the primary node:
  cmcheckconf -v -P packagename.conf (package config file) to check for any errors
  cmapplyconf -v -P packagename.conf (package config file) if no errors
  Start the package: cmrunpkg packagename
  cmmodpkg -e packagename to re-enable package switching
  Test the package on all adoptive nodes if possible

Note : Repeat the steps from Create packages to here if more packages are required in the cluster.
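The package-creation flow can be sketched end to end as a dry run. Directory, file, and package names (dbpkg, adoptive) are hypothetical placeholders, and rcp stands in here for whatever file-copy method (the notes use ftp) you prefer.

```shell
#!/bin/sh
# Dry-run sketch of creating and applying a new package online.
create_package() {
    pkg="$1"; dir="/etc/cmcluster/$pkg"
    echo "mkdir $dir"
    echo "cmmakepkg -p $dir/$pkg.conf"    # generate the package config template
    echo "vi $dir/$pkg.conf"              # edit config (nodes, services, subnets)
    echo "cmmakepkg -s $dir/$pkg.cntl"    # generate the control script template
    echo "vi $dir/$pkg.cntl"              # edit VGs, filesystems, package IPs
    echo "rcp $dir/$pkg.cntl adoptive:$dir/"  # copy control script to adoptive nodes
    echo "cmcheckconf -v -P $dir/$pkg.conf"   # verify the package config
    echo "cmapplyconf -v -P $dir/$pkg.conf"   # apply it to the running cluster
    echo "cmrunpkg $pkg"                      # start the package
    echo "cmmodpkg -e $pkg"                   # re-enable package switching
}

plan=$(create_package dbpkg)
echo "$plan"
```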
o Offline method
  Same, except perform with the cluster halted, and when all the changes are made, start the cluster
o Removing
o Online method
  cmhaltpkg -v packagename
  cmdeleteconf -f -v -p packagename
  cmviewcl (to view that it is no longer part of the cluster)

Note : The package config and control files are not removed ie deleted from the system, just removed from the cluster.

o Offline method
  Same, except perform with the cluster halted, and when all the changes are made, start the cluster
- Modifying packages
o 2 parts: the package config file and the package control file
o Anything to do with modifying the package config file will need a reapply of the package (go through the package.conf file to see what the parameters are)
  Parameters that can be changed without stopping the package, ie the cluster and package are up and running:
  Eg. Failover policy, failback policy
  Eg. Add/remove/modify node names
  Eg. Switching parameters
  Steps
  cmgetconf -v -p packagename outputfilename (package config file; name it something different) to get the latest copy of the package config file.
  Modify outputfilename to make the intended changes to the package config.
  cmcheckconf -v -P outputfilename (package config file) to check for any errors
  cmapplyconf -v -P outputfilename (package config file) if no errors
  Parameters that must be changed by stopping the package, ie the package is down but the cluster is up and running:
  Eg. Package name (if possible change the hosting directory name as well)
  Eg. Change run/halt scripts
  Eg. Add/remove service names
  Eg. Add/remove subnet
  Steps
  Schedule downtime to halt the package affected
  cmhaltpkg packagename to halt the package
  After the package is halted, run cmgetconf -v -p packagename outputfilename (package config file; name it something different) to get the latest copy of the package config file.
  Modify outputfilename to make the intended changes to the package config.
  cmcheckconf -v -P outputfilename (package config file) to check for any errors
  cmapplyconf -v -P outputfilename (package config file) if no errors
  Start the package
o cmrunpkg packagename
o cmmodpkg -e packagename to re-enable package switching
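The split between parameters that can and cannot be re-applied while the package is running can be captured in a small lookup. The labels below are informal names for the parameters listed in the notes, not the literal .conf keywords.

```shell
#!/bin/sh
# Sketch: does changing this package config parameter require halting
# the package first? Classification follows the notes above.
needs_package_halt() {
    case "$1" in
        failover_policy|failback_policy|node_names|switching) echo "no"  ;;
        package_name|run_halt_scripts|service_names|subnet)   echo "yes" ;;
        *) echo "unknown" ;;
    esac
}

needs_package_halt failover_policy   # -> no  (changeable while running)
needs_package_halt service_names     # -> yes (halt the package first)
```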
o Anything to do with modifying the package control file will NOT need a reapply of the package (go through the package.cntl file to see what the parameters in the script are), but it needs downtime to halt the package; the cluster and other packages in the cluster can still be running.
  Eg. VG name and no. of VGs
  Eg. LVs, names of mount points and no.s
  Eg. NFS mounts
  Eg. Package IPs and subnet
  Eg. Service names
  Eg. Subnet
  Eg. Application start/stop scripts
o Steps
  Schedule downtime to halt the package affected
  cmhaltpkg packagename to halt the package
  After the package is halted, modify the package control file to make the intended changes.
  Start the package
  cmrunpkg packagename
  cmmodpkg -e packagename to re-enable package switching
- Adding/modifying LAN cards in the cluster
o If there is a need to add or upgrade/replace LAN cards in a cluster environment, take note of the LAN ID (NMID)
o Usually adding will not cause an issue, unless the card will be part of the cluster and it is already connected to the network; then you need to reconfigure and reapply the cluster config file.
o For upgrading/replacing LAN cards, the NMID may change, eg. upgrading from a 10BT to a 100BT or replacing a 1 port LAN card with a 4 port LAN card. In such a case, the cluster cannot start up, because the cluster setting is different (the cluster is trying to find LAN1 configured in the cluster config file, but the NMID has already changed to LAN2). We will need to reform and re-apply the cluster before running it.
o Steps
  Method 1
  Schedule downtime to halt the entire cluster
  cmhaltcl -f to halt the cluster
  After the cluster is halted, run
o cmquerycl [-w full] -v -C /etc/cmcluster/outputfilename -n primarynode -n secondarynode [-n other nodes in the cluster]
  (Note : This will query the system configuration and generate the new cluster config file, under whatever name you specified as the outputfilename. This should have automatically generated the cluster config file with the new LAN card NMID.)
  Run cmgetconf -v -c clustername outputfilename (cluster ascii file; name it something different) to get the latest copy of the cluster config file.
  Check and combine the 2 configurations into one final config file.
  cmcheckconf -v -C finalconfigfile (cluster ascii file) to check for any errors
  cmapplyconf -v -C finalconfigfile (cluster ascii file) if no errors
  Start the cluster: cmruncl
  Method 2 (not recommended)
  Schedule downtime to halt the entire cluster
  cmhaltcl -f to halt the cluster
  Run cmgetconf -v -c clustername outputfilename (cluster ascii file; name it something different) to get the latest copy of the cluster config file.
  Modify outputfilename to make the intended changes to the cluster (if you are aware of the change in the NMID of the LAN card).
  cmcheckconf -v -C outputfilename (cluster ascii file) to check for any errors
  cmapplyconf -v -C outputfilename (cluster ascii file) if no errors
  Start the cluster: cmruncl
- Extending/Reducing logical volumes in the cluster packages
o (ONLINE) No downtime required provided OnlineJFS is installed
o Make changes on the node where the logical volumes are mounted
o No action required on adoptive nodes
o Extending :
  lvextend -L newsize_in_MB /dev/vgsh/shlvol
  fsadm -F vxfs -b newsize_in_KB /shname
o Reducing :
  fsadm -F vxfs -b newsize_in_KB /shname
  lvreduce -L newsize_in_MB /dev/vgsh/shlvol
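The ordering matters: grow the LV before the filesystem, shrink the filesystem before the LV. A dry-run sketch, with placeholder sizes and names:

```shell
#!/bin/sh
# Dry-run sketch of growing/shrinking a shared vxfs filesystem online
# with OnlineJFS. Note the order reverses between the two directions.
grow_fs() {   # args: lv mountpoint new_mb new_kb
    echo "lvextend -L $3 $1"       # grow the logical volume first (size in MB)
    echo "fsadm -F vxfs -b $4 $2"  # then grow the filesystem into it (size in KB)
}
shrink_fs() { # args: lv mountpoint new_mb new_kb
    echo "fsadm -F vxfs -b $4 $2"  # shrink the filesystem first (size in KB)
    echo "lvreduce -L $3 $1"       # then shrink the logical volume (size in MB)
}

plan=$(grow_fs /dev/vgsh/shlvol /shname 2048 2097152)
echo "$plan"
```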
- LVMTAB needs to be updated when :
o Adding/removing
  Disks
  Logical volumes
  Volume groups
- Adding/Removing new physical volumes/disks in the volume group owned by the package
o Adding
o On the primary node (node where the shared VG is activated, where the package is running)
  pvcreate the new disk
  vgextend the new disk into the identified shared volume group
  vgexport with the preview option the particular shared VG mapfile
  vgexport -m vgsh.map -p -s -v vgsh
  ftp the mapfile to the adoptive nodes
o On the adoptive nodes
  vgexport the identified shared volume group off the system
  vgexport vgsh
  mkdir /dev/vgsh
  mknod /dev/vgsh/group c 64 0x... (same vgid)
  vgimport the shared volume group into the system with the mapfile
  vgimport -m vgsh.map -s -v vgsh
o Removing
  Same steps except use vgreduce (no pvcreate required)
o (Online) No downtime required, but it will be good to schedule one if you want to test the failover.
o Do I need to re-apply the cluster and package? No.
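The export/import map-file dance can be sketched as a dry run covering both sides. Device paths are hypothetical, and the group-file minor number is deliberately left elided as in the notes.

```shell
#!/bin/sh
# Dry-run sketch of adding a disk to a shared VG and propagating the
# LVM metadata to an adoptive node via a map file.
primary_steps() {
    echo "pvcreate /dev/rdsk/c5t0d0"                # initialize the new disk
    echo "vgextend /dev/vgsh /dev/dsk/c5t0d0"       # add it to the shared VG
    echo "vgexport -m /tmp/vgsh.map -p -s -v vgsh"  # preview export: write map only
    echo "rcp /tmp/vgsh.map adoptive:/tmp/"         # ship the map to adoptive nodes
}
adoptive_steps() {
    echo "vgexport vgsh"                            # drop the stale VG definition
    echo "mkdir /dev/vgsh"
    echo "mknod /dev/vgsh/group c 64 0x..."         # same minor number (elided here)
    echo "vgimport -m /tmp/vgsh.map -s -v vgsh"     # re-import, now with the new disk
}

plan=$(primary_steps; adoptive_steps)
echo "$plan"
```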
- Adding/Removing logical volumes in the volume group owned by the package
o Adding
o On the primary node (node where the shared VG is activated, where the package is running)
  lvcreate -L ...
  newfs ...
  mkdir /filesystem
  Mount the filesystem manually and assign correct ownerships and permissions
  umount the filesystem
  vgexport with the preview option the particular shared VG mapfile
  vgexport -m vgsh.map -p -s -v vgsh
  ftp the mapfile to the adoptive nodes
o On the adoptive nodes
  vgexport the identified shared volume group off the system
  vgexport vgsh
  mkdir /dev/vgsh
  mknod /dev/vgsh/group c 64 0x... (same vgid)
  mkdir /filesystem
  vgimport the shared volume group into the system with the mapfile
  vgimport -m vgsh.map -s -v vgsh
o Schedule time to halt the package (only the package affected).
  cmhaltpkg packagename
o After the package is halted, modify the package control script (.cntl) to include the new filesystem on all nodes.
o Start the package
  cmrunpkg packagename
  cmmodpkg -e packagename to re-enable package switching
o Verify that the filesystem is mounted and accessible.
o Test on all adoptive nodes.
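Once the LV exists on all nodes, only the affected package needs a short outage while its control script is updated; no cmapplyconf is needed because the package config file itself is unchanged. A dry-run sketch with a hypothetical package name and mount point:

```shell
#!/bin/sh
# Dry-run sketch of the package-side steps after a new LV/filesystem
# has been created and imported on all nodes.
add_lv_to_package() {
    pkg="$1"; fs="$2"
    echo "cmhaltpkg $pkg"                    # halt only the affected package
    echo "vi /etc/cmcluster/$pkg/$pkg.cntl"  # add the new LV/FS entries on ALL nodes
    echo "cmrunpkg $pkg"                     # restart the package
    echo "cmmodpkg -e $pkg"                  # re-enable package switching
    echo "bdf $fs"                           # verify the filesystem is mounted
}

plan=$(add_lv_to_package dbpkg /share2)
echo "$plan"
```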
o Removing
  Schedule downtime to halt the package
  cmhaltpkg packagename - on the primary node
  vgchange -c n vgsh - to unmark the VG that belongs to the package from the cluster
  vgchange -a y vgsh to activate the vg
  lvremove the logical volume
  vgchange -a n vgsh to deactivate the vg
  vgchange -c y vgsh to mark the vg as part of the cluster
  Modify the package control files on all nodes to exclude this LV and filesystem
  cmrunpkg packagename - to restart the package
  cmmodpkg -e ... to re-enable package switching
  vgexport the mapfile on the primary and ftp it to all adoptive nodes
  vgexport ..., vgimport ... the mapfile on the adoptive nodes
  Test on all adoptive nodes
o Offline for the package affected, but the cluster can be up and running, and other packages can be up and running.
o Do I need to re-apply the cluster/package (changing the package control file does not need a reapplication)? No.
o Can I create an LV/filesystem that is not mounted by my package but belongs to the same volume group, ie I mount it via /etc/fstab ? No, this will cause a problem since the VG will need to be activated/deactivated by the package; the package may fail.
- Adding new volume groups to the cluster packages
o Adding
o On the primary node (node where the shared VG is activated, where the package is running)
  pvcreate the new disk
  mkdir /dev/vgsh (the new shared vg)
  mknod /dev/vgsh/group c 64 0x0...
  vgcreate the new shared volume group
  Create the necessary lvols and filesystems or raw devices for the VG
  Mount the filesystems and change permissions and ownerships accordingly
  vgexport with the preview option the particular shared VG mapfile
  vgexport -m vgsh.map -p -s -v vgsh
  ftp the mapfile to the adoptive nodes
o On the adoptive nodes
  vgexport the identified shared volume group off the system
  vgexport vgsh
  mkdir /dev/vgsh
  mknod /dev/vgsh/group c 64 0x... (same vgid)
  vgimport the shared volume group into the system with the mapfile
  vgimport -m vgsh.map -s -v vgsh
  mkdir the /filesystems for the logical volumes
o On the primary node,
  vgchange -c y /dev/vgsh to mark the VG as part of the cluster
  Umount all filesystems in this new shared VG and deactivate it: vgchange -a n vgsh
  Check /var/adm/syslog/syslog.log to see if this vg has been successfully marked in the cluster
  cmgetconf -v -c clustername outputfilename (name it something different) to see that it has been entered into the cluster config file.
  If not, then we will need to bring down the entire cluster, check, and re-apply the cluster.
o Method 1 (do this if successfully marked)
o Schedule time to halt the package (only the package affected).
  cmhaltpkg packagename
o After the package is halted, modify the package control script (.cntl) to include the new filesystem and volume group on all nodes.
o Start the package
  cmrunpkg packagename
  cmmodpkg -e packagename to re-enable package switching
o Verify that the VG is activated and the filesystems are mounted and accessible.
o Test on all adoptive nodes.
o Method 2 (do this if not marked successfully)
o Schedule time to halt the entire cluster.
  cmhaltcl
o After the cluster is halted, run cmgetconf -v -c clustername outputfilename (cluster ascii file; name it something different) to see that it has been entered into the cluster config file.
o If not entered, try to manually type the new shared VG into the new cluster outputfilename.
o cmcheckconf -v -C outputfilename (cluster ascii file) to check for any errors
o cmapplyconf -v -C outputfilename (cluster ascii file) if no errors
o Modify the package control script (.cntl) to include the new filesystem and volume group on all nodes.
o Start the cluster: cmruncl
o Verify that the VG is activated and the filesystems are mounted and accessible.
o Test that the VG can be mounted on all adoptive nodes.
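The primary-node portion of creating a new shared VG and marking it cluster-aware can be sketched as a dry run. Device and VG names are placeholders, and the group-file minor number is left elided as in the notes.

```shell
#!/bin/sh
# Dry-run sketch of creating a new shared VG on the primary node and
# marking it as part of the cluster before handing it to a package.
add_shared_vg() {
    vg="$1"; disk="$2"
    echo "pvcreate $disk"
    echo "mkdir /dev/$vg"
    echo "mknod /dev/$vg/group c 64 0x..."        # unique minor number (elided)
    echo "vgcreate /dev/$vg $disk"
    echo "vgexport -m /tmp/$vg.map -p -s -v $vg"  # preview export for adoptive nodes
    echo "vgchange -c y /dev/$vg"                 # mark the VG as part of the cluster
    echo "vgchange -a n $vg"                      # deactivate before the package takes over
    echo "grep -i $vg /var/adm/syslog/syslog.log" # confirm it was marked successfully
}

plan=$(add_shared_vg vgsh2 /dev/dsk/c6t0d0)
echo "$plan"
```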
o Removing
  Schedule downtime to halt the package
  cmhaltpkg packagename - on the primary node
  vgchange -c n vgsh - to unmark the VG that belongs to the package from the cluster
Modi