
    HA with MC/ServiceGuard (Concepts)

http://uxsl.europe.hp.com/doc/tech/ha/HAtrain/
Prepared by Anand

    Other platforms have other HA software

HA means the following :
- no SPOF (single point of failure)
- N+1 redundancy
- Ideal : dual power sources/vendors ; hubs and switches connected to dual power sources
- Not load balancing (Foundry / Cisco LocalDirector, software load balancers)

HA Terminology :
- Cluster (1)
- Node (1 to many)
- Package (1 to many)
- Floating IPs (single/multiple, e.g. BAMM) ; can specify hostnames in DNS for each floating IP

    Question : Can we have a node in 2 clusters ? Not advisable - dependencies

Availability :
- 99%    - standard server
- 99.5%  - MC/ServiceGuard (the application, not the node)
- 99.99% - ??

Criteria for HA : Ensure that both (all) nodes in the cluster

- are of the same build, hardware- and software-wise (patch level, kernel changes, user accounts)

Types of disks applicable for use with HA MC/ServiceGuard :

- In general, disks with 2 SPUs/controllers :
  - VA
  - FC10, SC10
  - XP
  - DS
  - AutoRAID 12H
  - Nike disks

Not recommended :
- Jamaica disks
- Desktop disks

    Note : Disks should have HA (RAID1, RAID5) as well.

    Question : Can MC/ServiceGuard work across DCs or countries ie one node in Singapore, the other node Japan ?

Answer : Yes, provided the heartbeat cable is long enough, or more importantly the subnet is the same and the shared disk system is accessible by both servers.


    Software Licenses

Part#          Description                                             Qty   Unit Price
B3935DA        MC/SG software system license for HP-UX 11.x             2    USD 0.00
B3935DA-AE5    MC/SG software license for K/N class                     2    USD 5117.00
B3935DA-ABA    MC/SG software English localization                      2    USD 0.00
B3935DA-0S6    MC/SG 24x7 Support (first year)                          2    USD 496.80
B5140BA        MC/SG NFS toolkit license                                2    USD 322.50
B5140BA-0S6    MC/SG NFS toolkit 24x7 support (first year)              2    USD 64.80
B5139DA        Enterprise Cluster Extension                             2    USD 427.85
B5139DA-0S6    Enterprise Cluster Extension 24x7 Support (first year)   2    USD 86.40
H6194AA        MC/SG Implementation                                     1    USD 15000.00
               (to be included only if you want to buy consulting and implementation service from HPC)
B7885BA        MC/SG LTU Extension for SAP (per SAP instance)           1    USD 12900.00

** Please verify with the SAP team if any other SAP-related license is needed.

If you would like to buy service from HPC, what our team usually does is to approach Vincent, who's the Account Manager for HPO, and he will arrange for someone from HPC to work with us. (Do remember to include the USD 15k.)

    Software Installation

Note : MC/ServiceGuard can be installed from the ctss144 depots (/var/depot/applications/11.00/hp-ux, /var/depot/applications/11.11/hp-ux).

We have in our depots :
- Version 11.09
- Version 11.13 - recommended

    MC/ServiceGuard software to install (basic setup, install on both machines):

B3935DA   A.11.13      MC/ServiceGuard
B5140BA   A.11.00.04   MC/ServiceGuard NFS Toolkit - install only if NFS is required to work within the cluster
B5139DA   B.01.06      Enterprise Cluster Master Toolkit - optional
B8324BA   A.01.03      HP Cluster Object Manager - optional

Note : Only install the above software from the same DART/CD version; do not try to mix and match from different releases.

Note : If the OS is version 11.11 (11i) and it is the Mission Critical Operating Environment, then it should come with MC/ServiceGuard installed.
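As a hedged illustration of pulling the software from those depots (the depot host and paths come from the note above and should be verified first; the product numbers come from the license table earlier):

    swlist -d @ ctss144:/var/depot/applications/11.11/hp-ux                     # list what the depot contains
    swinstall -s ctss144:/var/depot/applications/11.11/hp-ux B3935DA B5140BA    # MC/SG plus NFS toolkit
    swlist -l product B3935DA                                                   # confirm the installed revision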


Note : Do check the /etc/services and /etc/inetd.conf files for the MC/ServiceGuard related services, especially for the 11i mission critical OS.

/etc/services
hacl-hb     5300/tcp   # High Availability (HA) Cluster heartbeat
hacl-gs     5301/tcp   # HA Cluster General Services
hacl-cfg    5302/tcp   # HA Cluster TCP configuration
hacl-cfg    5302/udp   # HA Cluster UDP configuration
hacl-probe  5303/tcp   # HA Cluster TCP probe
hacl-probe  5303/udp   # HA Cluster UDP probe
hacl-local  5304/tcp   # HA Cluster Commands
hacl-test   5305/tcp   # HA Cluster Test
hacl-dlm    5408/tcp   # HA Cluster distributed lock manager

/etc/inetd.conf
hacl-cfg   dgram  udp wait   root /usr/lbin/cmclconfd   cmclconfd -p
hacl-cfg   stream tcp nowait root /usr/lbin/cmclconfd   cmclconfd -c
hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd  /opt/cmom/lbin/cmomd -f /var/opt/cmom/cmo

    Depending on what version of MC/ServiceGuard is installed, MC/ServiceGuard patches must be installed:

    http://haweb.cup.hp.com/Support/Patches/SG11.00.html

Question : Can we install one node with MC/ServiceGuard version 11.09 and the other with version 11.13, or something else, i.e. different versions?

Answer : Not advisable due to compatibility issues, unless you're doing rolling upgrades.


    MC/ServiceGuard Network Design

Note : Usually the heartbeat LAN uses the internal LAN card; the Primary and Secondary LANs use 2 separate LAN cards.

Question : What would a 3-node or 4-node cluster look like?

How can we configure the packages to fail over? Many possibilities.

Heartbeat network
- cross UTP
- serial cable
- dedicated heartbeat subnet
- Primary LAN usually set as secondary heartbeat

Cluster/Package Node Configurations
- ACTIVE ; ACTIVE
- ACTIVE ; PASSIVE

Cluster Lock Disk
- Tie breaker
- The node that gets the lock disk reforms the cluster; the other will usually panic reboot
- What if the cluster lock disk is dead? - UNPLANNED OUTAGE

[Diagram: MC/ServiceGuard network design. Two nodes, sgpue036 and sgpue037, each connected to the User LAN (Securenet) through switch 1 and switch 2. The primary LAN of sgpue036 is 15.209.0.25 (cable name: sgpue036) and of sgpue037 is 15.209.0.26 (cable name: sgpue037). Each node also has a failover LAN with no physical IP: cable sgpue036s must be connected to switch 2, and cable sgpue037s must be connected to switch 1. A dedicated heartbeat LAN runs over a cross UTP cable on lan2 (192.0.0.1 on sgpue036, 192.0.0.2 on sgpue037). Both nodes attach over FC to the shared Keychain Database disk storage.]


MC/ServiceGuard Monitoring
- Hardware
- Application
- ITO
- ClusterViewPlus
- NNM

    MC/ServiceGuard Commands

cmquerycl
cmcheckconf
cmapplyconf - will distribute the binary configuration details to all nodes in the cluster
cmgetconf

Cluster specific commands : cmruncl, cmviewcl, cmhaltcl

Node specific commands : cmrunnode, cmhaltnode

Package specific commands : cmrunpkg, cmhaltpkg, cmmodpkg
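For orientation, cmviewcl output on a healthy two-node cluster looks roughly like the sketch below (cluster, node, and package names follow the examples used later in this document; exact column headings vary by MC/SG version, e.g. PKG_SWITCH vs AUTO_RUN):

    # cmviewcl

    CLUSTER        STATUS
    Kcdatabases    up

      NODE         STATUS       STATE
      sgpue036     up           running

        PACKAGE    STATUS       STATE        PKG_SWITCH   NODE
        kci2prd    up           running      enabled      sgpue036

      NODE         STATUS       STATE
      sgpue037     up           running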

    MC/ServiceGuard with SAM

MC/ServiceGuard backups
- Database vendors' online backup tools
- Split mirror
- Business Copy (VA, XP), KNET
- JFS snapshots

Practice of backup for HPMS, if no special request:

o For filesystem backup : back up whatever filesystem is mounted on whichever system it currently resides, hence following the package if it has failed over.

o For databases : SAP/DBA will consult the tools team on the backup strategy; usually OmniBack is configured to detect and back up by floating IP.

    Issues with BAMM ??


Project Timeline (TAT)
- Gathering information - 2 days
- Hardware setup (LAN) - 2 days
- Configuration - 3 days (varies; dependencies : application/DB scripts)
- Testing - 1 day (requires CE presence)


Configure /etc/rc.config.d/netconf on each of the nodes in the cluster with the heartbeat LAN (if using LAN and not a serial interface).

    !"#

    #

    !"

    #

    $%&'&($)*+,-.+/)$&0$%&&)%$1&($.-23*&)

    $.-23*&%4&5*$$%54$6&5*$$%5*4$0%0$&(4$.%4&$)&0-%4&(30%&*%$-

    -%0&.)$%4&*$)&0-%4&(30%&*

    2222

    $)&*$$%$)&*$$%$)&*$$%$)&*$$%

    $)&*$$%$)&*$$%$)&*$$%$)&*$$%

    Sgpue036.sgp.hp.com root

    Sgpue037.sgp.hp.com root

    $%&'%&*%-7&+8&((*&%&%4&5&%(5(6(30%$%&'%&*%-7&+8&((*&%&%4&5&%(5(6(30%$%&'%&*%-7&+8&((*&%&%4&5&%(5(6(30%$%&'%&*%-7&+8&((*&%&%4&5&%(5(6(30%&*5(6($)&-0%.-&$&*5(6($)&-0%.-&$&*5(6($)&-0%.-&$&*5(6($)&-0%.-&$

    $)&04-0-0&(&00*+.$*%4&(30%&*%$-)&%-.+-%0$)&0$)&04-0-0&(&00*+.$*%4&(30%&*%$-)&%-.+-%0$)&0$)&04-0-0&(&00*+.$*%4&(30%&*%$-)&%-.+-%0$)&0$)&04-0-0&(&00*+.$*%4&(30%&*%$-)&%-.+-%0$)&0
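A minimal sketch of the netconf entries for the dedicated heartbeat interface, assuming the lan2 / 192.0.0.x addressing shown in the network design diagram (the array index 1 is arbitrary; use the next free slot on each node):

    # /etc/rc.config.d/netconf on sgpue036 (use 192.0.0.2 on sgpue037)
    INTERFACE_NAME[1]=lan2
    IP_ADDRESS[1]=192.0.0.1
    SUBNET_MASK[1]=255.255.255.0
    BROADCAST_ADDRESS[1]=""
    INTERFACE_STATE[1]=up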

    Unmount Logical Volumes and deactivate the Volume Groups that will be controlled/run by the cluster.(These do not need to be entered in /etc/fstab)

E.g.
1. vgchange -a n vg02
2. vgchange -a n vg03

Note : It is possible that a cluster does not have any cluster lock disk or even a VG at all.

Same for packages. Also, each VG must be unique to each package; the same VG cannot be used for other packages.

    Export and distribute the Volume Groups to the secondary (failover) node.

    E.g.

1. vgexport -p -v -s -m /tmp/vg02.map /dev/vg02
2. vgexport -p -v -s -m /tmp/vg03.map /dev/vg03


    -p option : preview mode, so that the volume group will not be exported

    off the original node.

-s option : sharable option, Series 800 only. When the -s option is
            specified, then the -p, -v, and -m options must also be
            specified. A mapfile is created that can be used to
            create volume group entries on other systems in the high
            availability cluster (with the vgimport command).

-m option : generates the map file

-v option : print verbose

    FTP the .map files to secondary (failover) node.

    On Secondary (failover) node, create the volume group directories:

    E.g.

3. mkdir /dev/vg02
4. mkdir /dev/vg03
5. ls -l /dev/*/group
6. mknod /dev/vg02/group c 64 0x020000
7. mknod /dev/vg03/group c 64 0x030000

Import the volume groups onto the secondary (failover) node.
E.g.
8. vgimport -s -m /tmp/vg02.map /dev/vg02
9. vgimport -s -m /tmp/vg03.map /dev/vg03

Note : Leave the cluster volume groups deactivated.
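To confirm the imports registered on the failover node without activating anything, the LVM table can be inspected (lvmtab is a binary file, hence strings):

    strings /etc/lvmtab    # lists each VG and its physical volume device files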

Configure the Cluster (do this on one node).

cmquerycl [-w full] -v -C /etc/cmcluster/cluster.conf -n <primary node> -n <secondary node> [-n <other nodes in the cluster>]

    (Note : This will generate the cluster config file.)
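With the node names used throughout this document, the concrete invocation would be:

    cmquerycl -v -C /etc/cmcluster/cluster.conf -n sgpue036 -n sgpue037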

Edit the /etc/cmcluster/cluster.conf file:

    # **********************************************************************

    # ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************


    # ***** For complete details about cluster parameters and how to ****

    # ***** set them, consult the ServiceGuard manual. ****

    # **********************************************************************

    # Enter a name for this cluster. This name will be used to identify the

    # cluster when viewing or manipulating it.

    CLUSTER_NAME Kcdatabases

    # Cluster Lock Parameters

    #

    # The cluster lock is used as a tie-breaker for situations

    # in which a running cluster fails, and then two equal-sized

    # sub-clusters are both trying to form a new cluster. The

# cluster lock may be configured using either a lock disk
# or a quorum server.

    #

    # You can use either the quorum server or the lock disk as

    # a cluster lock but not both in the same cluster.

    #

# Consider the following when configuring a cluster.
# For a two-node cluster, you must use a cluster lock. For

    # a cluster of three or four nodes, a cluster lock is strongly

    # recommended. For a cluster of more than four nodes, a

    # cluster lock is recommended. If you decide to configure

    # a lock for a cluster of more than four nodes, it must be

    # a quorum server.

    # Lock Disk Parameters. Use the FIRST_CLUSTER_LOCK_VG and

    # FIRST_CLUSTER_LOCK_PV parameters to define a lock disk.

    # The FIRST_CLUSTER_LOCK_VG is the LVM volume group that

    # holds the cluster lock. This volume group should not be

    # used by any other cluster as a cluster lock device.

# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.

    # The QS_HOST is the host name or IP address of the system

    # that is running the quorum server process. The

    # QS_POLLING_INTERVAL (microseconds) is the interval at which

    # ServiceGuard checks to make sure the quorum server is running.

    # The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase

    # the time interval after which the quorum server is marked DOWN.

    #

    # The default quorum server timeout is calculated from the

    # ServiceGuard cluster parameters, including NODE_TIMEOUT and

    # HEARTBEAT_INTERVAL. If you are experiencing quorum server

    # timeouts, you can adjust these parameters, or you can include

    # the QS_TIMEOUT_EXTENSION parameter.

    #

    # For example, to configure a quorum server running on node

    # "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to

    # add 2 seconds to the system assigned value for the quorum server

    # timeout, enter:

    #

    # QS_HOST qshost

    # QS_POLLING_INTERVAL 120000000

    # QS_TIMEOUT_EXTENSION 2000000

FIRST_CLUSTER_LOCK_VG /dev/vg02    <-- This is automatically searched for.


# Definition of nodes in the cluster.
# Repeat node definitions as necessary for additional nodes.

    NODE_NAME sgpue036

    NETWORK_INTERFACE lan0

HEARTBEAT_IP 192.0.0.1


    # Enter the maximum number of packages which will be configured in the cluster.

    # You can not add packages beyond this limit.

    # This parameter is required.

    MAX_CONFIGURED_PACKAGES 8

# List of cluster aware LVM Volume Groups. These volume groups will
# be used by package applications via the vgchange -a e command.

    # Neither CVM or VxVM Disk Groups should be used here.

    # For example:

    # VOLUME_GROUP /dev/vgdatabase

    # VOLUME_GROUP /dev/vg02

    VOLUME_GROUP /dev/vg02

    VOLUME_GROUP /dev/vg03

Verify the Cluster Configuration (do this on one node)
1. cmcheckconf [-k] -v -C /etc/cmcluster/cluster.conf

Note : If there are no errors, it means that the cluster configuration is ready to be applied.

Distributing the Binary Configuration File (do this on one node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-k] -v -C /etc/cmcluster/cluster.conf
3. vgchange -a n /dev/vg02

Note : The cluster lock volume group needs to be activated in order for the configuration to be applied for first-time clusters. Subsequent changes to the cluster may not need the cluster lock activated, or may not even need the cluster to be taken down, i.e. they can be done online, but this is not recommended.

    Note : Need to deactivate cluster lock disk right after cluster changes are applied.

    Backing up Volume Group and Cluster Lock Configuration Data (optional)

1. vgcfgbackup -u /dev/vg02
2. vgcfgbackup -u /dev/vg03

Note : This does not require the volume groups to be activated.

    Checking Cluster Operation (do on either node)

1. cmruncl -v
2. cmhaltnode -v <primary node>
3. cmrunnode -v <primary node>
4. cmhaltcl -v
5. cmruncl -v
6. cmhaltcl -v

    Note : Try this on all other nodes in the cluster as well.

    Disable Automount of Volume Groups (On both nodes)

    1. Edit /etc/lvmrc file and set AUTO_VG_ACTIVATE=0

Note : This is necessary as we do not want the cluster volume groups to be activated when a system reboots; they are now under the control of the cluster.
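With AUTO_VG_ACTIVATE=0, any volume groups that are NOT under cluster control must instead be activated from the custom_vg_activation function in the same file. A minimal hedged sketch (vg01 is a placeholder for a local, non-cluster VG):

    AUTO_VG_ACTIVATE=0

    custom_vg_activation()
    {
            # Activate/sync only local (non-cluster) volume groups here, e.g.:
            # parallel_vg_sync "/dev/vg01"
            return 0
    }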


    Disable Autostart Features (On both nodes)

    1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=0

    Note : This is to prevent the cluster node from automatically joining the cluster after a

    reboot. Usually done when doing maintenance.

    Create Packages

E.g.
1. mkdir /etc/cmcluster/kci2prd                 <- can be any name
2. cmmakepkg -p /etc/cmcluster/kci2prd.conf     <- can be any name
3. Edit the configuration file

Note : If the package and control file is special (e.g. NFS required) then do not run the cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS extension toolkit (similar for the SAP extension). You still need to make adjustments to the files to suit your needs.

    # **********************************************************************

    # ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) *******

    # **********************************************************************

    # ******* Note: This file MUST be edited before it can be used. ********

    # * For complete details about package parameters and how to set them, *

    # * consult the MC/ServiceGuard ServiceGuard OPS Edition manuals *******

    # **********************************************************************

    # Enter a name for this package. This name will be used to identify the

    # package when viewing or manipulating it. It must be different from

    # the other configured package names.

    PACKAGE_NAME kci2prd

    # Enter the package type for this package. PACKAGE_TYPE indicates

    # whether this package is to run as a FAILOVER or SYSTEM_MULTI_NODE

# package.
#

    # FAILOVER package runs on one node at a time and if a failure

    # occurs it can switch to an alternate node.

    #

    # SYSTEM_MULTI_NODE

    # package runs on multiple nodes at the same time.

    # It can not be started and halted on individual nodes.

    # Both NODE_FAIL_FAST_ENABLED and AUTO_RUN must be set

    # to YES for this type of package. All SERVICES must

    # have SERVICE_FAIL_FAST_ENABLED set to YES.

    #

    # NOTE: Packages which have a PACKAGE_TYPE of SYSTEM_MULTI_NODE are

    # not failover packages and should only be used for applications

# provided by Hewlett-Packard.
#

    # Since SYSTEM_MULTI_NODE packages run on multiple nodes at

    # one time, following parameters are ignored:

    #

    # FAILOVER_POLICY

    # FAILBACK_POLICY

    #

# Since an IP address can not be assigned to more than one node at a

    # time, relocatable IP addresses can not be assigned in the

    # package control script for multiple node packages. If


    # volume groups are assigned to multiple node packages they must

    # activated in a shared mode and data integrity is left to the

    # application. Shared access requires a shared volume manager.

    #

    #

    # Examples : PACKAGE_TYPE FAILOVER (default)

    # PACKAGE_TYPE SYSTEM_MULTI_NODE

    #

    PACKAGE_TYPE FAILOVER

    # Enter the failover policy for this package. This policy will be used

    # to select an adoptive node whenever the package needs to be started.

    # The default policy unless otherwise specified is CONFIGURED_NODE.

    # This policy will select nodes in priority order from the list of

    # NODE_NAME entries specified below.

    #

    # The alternative policy is MIN_PACKAGE_NODE. This policy will select

    # the node, from the list of NODE_NAME entries below, which is

    # running the least number of packages at the time this package needs

    # to start.

    FAILOVER_POLICY CONFIGURED_NODE

    # Enter the failback policy for this package. This policy will be used

    # to determine what action to take when a package is not running on

    # its primary node and its primary node is capable of running the

    # package. The default policy unless otherwise specified is MANUAL.

    # The MANUAL policy means no attempt will be made to move the package

    # back to its primary node when it is running on an adoptive node.

    #

    # The alternative policy is AUTOMATIC. This policy will attempt to

    # move the package back to its primary node whenever the primary node

    # is capable of running the package.

    FAILBACK_POLICY MANUAL

    # Enter the names of the nodes configured for this package. Repeat

    # this line as necessary for additional adoptive nodes.

    #

    # NOTE: The order is relevant.

    # Put the second Adoptive Node after the first one.

    #

    # Example : NODE_NAME original_node

    # NODE_NAME adoptive_node

    #

    # If all nodes in cluster is to be specified and order is not

    # important, "NODE_NAME *" may be specified.

    #

    # Example : NODE_NAME *

NODE_NAME sgpue036
NODE_NAME sgpue037

    # Enter the value for AUTO_RUN. Possible values are YES and NO.

    # The default for AUTO_RUN is YES. When the cluster is started the

    # package will be automatically started. In the event of a failure the


    # package will be started on an adoptive node. Adjust as necessary.

    #

    # AUTO_RUN replaces obsolete PKG_SWITCHING_ENABLED.

    AUTO_RUN YES

    # Enter the value for LOCAL_LAN_FAILOVER_ALLOWED.

    # Possible values are YES and NO.

    # The default for LOCAL_LAN_FAILOVER_ALLOWED is YES. In the event of a

    # failure, this permits the cluster software to switch LANs locally

    # (transfer to a standby LAN card). Adjust as necessary.

    #

    # LOCAL_LAN_FAILOVER_ALLOWED replaces obsolete NET_SWITCHING_ENABLED.

    LOCAL_LAN_FAILOVER_ALLOWED YES

    # Enter the value for NODE_FAIL_FAST_ENABLED.

    # Possible values are YES and NO.

    # The default for NODE_FAIL_FAST_ENABLED is NO. If set to YES,

    # in the event of a failure, the cluster software will halt the node

# on which the package is running. All SYSTEM_MULTI_NODE packages must have
# NODE_FAIL_FAST_ENABLED set to YES. Adjust as necessary.

    NODE_FAIL_FAST_ENABLED NO

    # Enter the complete path for the run and halt scripts. In most cases

    # the run script and halt script specified here will be the same script,

    # the package control script generated by the cmmakepkg command. This

    # control script handles the run(ning) and halt(ing) of the package.

    # Enter the timeout, specified in seconds, for the run and halt scripts.

    # If the script has not completed by the specified timeout value,

    # it will be terminated. The default for each script timeout is

    # NO_TIMEOUT. Adjust the timeouts as necessary to permit full

    # execution of each script.

# Note: The HALT_SCRIPT_TIMEOUT should be greater than the sum of
# all SERVICE_HALT_TIMEOUT specified for all services.

    RUN_SCRIPT /etc/cmcluster/kci2prd/kci2prd.cntl

RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT /etc/cmcluster/kci2prd/kci2prd.cntl

    HALT_SCRIPT_TIMEOUT NO_TIMEOUT

    # Enter the names of the storage groups configured for this package.

    # Repeat this line as necessary for additional storage groups.

    #

    # Storage groups are only used with CVM disk groups. Neither

    # VxVM disk groups or LVM volume groups should be listed here.

    # By specifying a CVM disk group with the STORAGE_GROUP keyword

    # this package will not run until the VxVM-CVM-pkg package is

    # running and thus the CVM shared disk groups are ready for

    # activation.

    #

    # NOTE: Should only be used by applications provided by

    # Hewlett-Packard.

    #

    # Example : STORAGE_GROUP dg01

    # STORAGE_GROUP dg02

    # STORAGE_GROUP dg03


    # STORAGE_GROUP dg04

    #

    # Enter the SERVICE_NAME, the SERVICE_FAIL_FAST_ENABLED and the

    # SERVICE_HALT_TIMEOUT values for this package. Repeat these

    # three lines as necessary for additional service names. All

    # service names MUST correspond to the service names used by

    # cmrunserv and cmhaltserv commands in the run and halt scripts.

    #

    # The value for SERVICE_FAIL_FAST_ENABLED can be either YES or

    # NO. If set to YES, in the event of a service failure, the

    # cluster software will halt the node on which the service is

    # running. If SERVICE_FAIL_FAST_ENABLED is not specified, the

    # default will be NO.

    #

    # SERVICE_HALT_TIMEOUT is represented in the number of seconds.

    # This timeout is used to determine the length of time (in

    # seconds) the cluster software will wait for the service to

    # halt before a SIGKILL signal is sent to force the termination

    # of the service. In the event of a service halt, the cluster

    # software will first send a SIGTERM signal to terminate the

# service. If the service does not halt, after waiting for the
# specified SERVICE_HALT_TIMEOUT, the cluster software will send

    # out the SIGKILL signal to the service to force its termination.

    # This timeout value should be large enough to allow all cleanup

    # processes associated with the service to complete. If the

    # SERVICE_HALT_TIMEOUT is not specified, a zero timeout will be

    # assumed, meaning the cluster software will not wait at all

    # before sending the SIGKILL signal to halt the service.

    #

    # Example: SERVICE_NAME DB_SERVICE

    # SERVICE_FAIL_FAST_ENABLED NO

    # SERVICE_HALT_TIMEOUT 300

    #

    # To configure a service, uncomment the following lines and

# fill in the values for all of the keywords.
#

SERVICE_NAME kci2prd
SERVICE_FAIL_FAST_ENABLED NO

    SERVICE_HALT_TIMEOUT 300

    # Enter the network subnet name that is to be monitored for this package.

    # Repeat this line as necessary for additional subnet names. If any of

    # the subnets defined goes down, the package will be switched to another

    # node that is configured for this package and has all the defined subnets

    # available.

    SUBNET 15.209.0.0

    # The keywords RESOURCE_NAME, RESOURCE_POLLING_INTERVAL,

    # RESOURCE_START, and RESOURCE_UP_VALUE are used to specify Package

    # Resource Dependencies. To define a package Resource Dependency, a

    # RESOURCE_NAME line with a fully qualified resource path name, and

    # one or more RESOURCE_UP_VALUE lines are required. The

    # RESOURCE_POLLING_INTERVAL and the RESOURCE_START are optional.

    #

    # The RESOURCE_POLLING_INTERVAL indicates how often, in seconds, the

    # resource is to be monitored. It will be defaulted to 60 seconds if


    # RESOURCE_POLLING_INTERVAL is not specified.

    #

    # The RESOURCE_START option can be set to either AUTOMATIC or DEFERRED.

    # The default setting for RESOURCE_START is AUTOMATIC. If AUTOMATIC

    # is specified, ServiceGuard will start up resource monitoring for

    # these AUTOMATIC resources automatically when the node starts up.

    # If DEFERRED is selected, ServiceGuard will not attempt to start

    # resource monitoring for these resources during node start up. User

    # should specify all the DEFERRED resources in the package run script

    # so that these DEFERRED resources will be started up from the package

    # run script during package run time.

    #

# RESOURCE_UP_VALUE requires an operator and a value. This defines
# the resource 'UP' condition. The operators are =, !=, >, <, >=,
# and <=, depending on the type of value. If a range is to be
# specified, only > or >= may be used for the first operator, and
# only < or <= may be used for the second operator. For example,
#
# RESOURCE_UP_VALUE > 5.1             greater than 5.1 (threshold)
# RESOURCE_UP_VALUE > -5 and < 10     between -5 and 10 (range)

    #

    # Note that "and" is required between the lower limit and upper limit# when specifying a range. The upper limit must be greater than the lower

    # limit. If RESOURCE_UP_VALUE is repeated within a RESOURCE_NAME block, then

    # they are inclusively OR'd together. Package Resource Dependencies may be

    # defined by repeating the entire RESOURCE_NAME block.

    #

    # Example : RESOURCE_NAME /net/interfaces/lan/status/lan0

    # RESOURCE_POLLING_INTERVAL 120

    # RESOURCE_START AUTOMATIC

    # RESOURCE_UP_VALUE = RUNNING

    # RESOURCE_UP_VALUE = ONLINE

    #

    # Means that the value of resource /net/interfaces/lan/status/lan0

    # will be checked every 120 seconds, and is considered to

    # be 'up' when its value is "RUNNING" or "ONLINE".

    #

    # Uncomment the following lines to specify Package Resource Dependencies.

    #

    #RESOURCE_NAME

    #RESOURCE_POLLING_INTERVAL

    #RESOURCE_START

    #RESOURCE_UP_VALUE [and ]


Create Package Control Scripts
1. cmmakepkg -s /etc/cmcluster/kci2prd/kci2prd.cntl
2. Edit the control script.

Note : If the package and control file is special (e.g. NFS required) then do not run the cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS extension toolkit (similar for the SAP extension). You still need to make adjustments to the files to suit your needs.

    Note : It is possible that packages do not use any volume groups.

    # **********************************************************************

    # * *

    # * HIGH AVAILABILITY PACKAGE CONTROL SCRIPT (template) *

    # * *

    # * Note: This file MUST be edited before it can be used. *

    # * *

    # **********************************************************************

    # The PACKAGE and NODE environment variables are set by

    # ServiceGuard at the time the control script is executed.

    # Do not set these environment variables yourself!

    # The package may fail to start or halt if the values for

    # these environment variables are altered.

    # UNCOMMENT the variables as you set them.

    # Set PATH to reference the appropriate directories.


    PATH=/usr/bin:/usr/sbin:/etc:/bin

    # VOLUME GROUP ACTIVATION:

    # Specify the method of activation for volume groups.

    # Leave the default ("VGCHANGE="vgchange -a e") if you want volume

    # groups activated in exclusive mode. This assumes the volume groups have

    # been initialized with 'vgchange -c y' at the time of creation.

    #

    # Uncomment the first line (VGCHANGE="vgchange -a e -q n"), and comment

    # out the default, if your disks are mirrored on separate physical paths,

    #

    # Uncomment the second line (VGCHANGE="vgchange -a e -q n -s"), and comment

    # out the default, if your disks are mirrored on separate physical paths,

# and you want the mirror resynchronization to occur in parallel with

    # the package startup.

    #

    # Uncomment the third line (VGCHANGE="vgchange -a y") if you wish to

    # use non-exclusive activation mode. Single node cluster configurations

    # must use non-exclusive activation.

    #

    # VGCHANGE="vgchange -a e -q n"

    # VGCHANGE="vgchange -a e -q n -s"

    # VGCHANGE="vgchange -a y"VGCHANGE="vgchange -a e" # Default

    # CVM DISK GROUP ACTIVATION:

    # Specify the method of activation for CVM disk groups.

    # Leave the default

    # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite")

    # if you want disk groups activated in the exclusive write mode.

    #

    # Uncomment the first line

    # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"),

    # and comment out the default, if you want disk groups activated in

    # the readonly mode.

    #

# Uncomment the second line
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"),

    # and comment out the default, if you want disk groups activated in the

    # shared read mode.

    #

    # Uncomment the third line

    # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"),

    # and comment out the default, if you want disk groups activated in the

    # shared write mode.

    #

    # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"

    # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"

    # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"

    CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite"

    # VOLUME GROUPS

    # Specify which volume groups are used by this package. Uncomment VG[0]=""

    # and fill in the name of your first volume group. You must begin with

    # VG[0], and increment the list in sequence.

    #

    # For example, if this package uses your volume groups vg01 and vg02, enter:

    # VG[0]=vg01

    # VG[1]=vg02

    #

    # The volume group activation method is defined above. The filesystems


    # associated with these volume groups are specified below.

    #

VG[0]=vg02
VG[1]=vg03

    # CVM DISK GROUPS

    # Specify which cvm disk groups are used by this package. Uncomment

    # CVM_DG[0]="" and fill in the name of your first disk group. You must

    # begin with CVM_DG[0], and increment the list in sequence.

    #

    # For example, if this package uses your disk groups dg01 and dg02, enter:

    # CVM_DG[0]=dg01

    # CVM_DG[1]=dg02

    #

    # The cvm disk group activation method is defined above. The filesystems

    # associated with these volume groups are specified below in the CVM_*

    # variables.

    #

    #CVM_DG[0]=""

    # VxVM DISK GROUPS

    # Specify which VxVM disk groups are used by this package. Uncomment

# VXVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with VXVM_DG[0], and increment the list in sequence.

    #

    # For example, if this package uses your disk groups dg01 and dg02, enter:

    # VXVM_DG[0]=dg01

    # VXVM_DG[1]=dg02

    #

    # The cvm disk group activation method is defined above.

    #

    #VXVM_DG[0]=""

    #

    # NOTE: A package could have LVM volume groups, CVM disk groups and VxVM

    # disk groups.

#
# FILESYSTEMS

    # Specify the filesystems which are used by this package. Uncomment

    # LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]="" and fill in the name of your first

    # logical volume, filesystem and mount option for the file system. You must

    # begin with LV[0], FS[0] and FS_MOUNT_OPT[0] and increment the list in

    # sequence.

    #

    # For the LVM example, if this package uses the file systems pkg1a and

    # pkg1b, which are mounted on the logical volumes lvol1 and lvol2 with

    # read and write options enter:

    # LV[0]=/dev/vg01/lvol1; FS[0]=/pkg1a; FS_MOUNT_OPT[0]="-o rw"

    # LV[1]=/dev/vg01/lvol2; FS[1]=/pkg1b; FS_MOUNT_OPT[1]="-o rw"

    #

    # For the CVM or VxVM example, if this package uses the file systems

    # pkg1a and pkg1b, which are mounted on the volumes lvol1 and lvol2

    # with read and write options enter:

    # LV[0]="/dev/vx/dsk/dg01/vol01"; FS[0]="/pkg1a"; FS_MOUNT_OPT[0]="-o rw"

    # LV[1]="/dev/vx/dsk/dg01/vol02"; FS[1]="/pkg1b"; FS_MOUNT_OPT[1]="-o rw"

    #

    # The filesystems are defined as triplets of entries specifying the logical

    # volume, the mount point and the mount options for the file system. Each

    # filesystem will be fsck'd prior to being mounted. The filesystems will be

    # mounted in the order specified during package startup and will be unmounted

    # in reverse order during package shutdown. Ensure that volume groups


    # referenced by the logical volume definitions below are included in

    # volume group definitions above.

    #

    #LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]=""

LV[0]=/dev/vg02/lvol1; FS[0]=/oracle/KCI2PRD/data01; FS_MOUNT_OPT[0]="-o rw,suid,largefiles"
LV[1]=/dev/vg02/lvol2; FS[1]=/oracle/KCI2PRD/data02; FS_MOUNT_OPT[1]="-o rw,suid,largefiles"
LV[2]=/dev/vg02/lvol3; FS[2]=/oracle/KCI2PRD/data03; FS_MOUNT_OPT[2]="-o rw,suid,largefiles"
LV[3]=/dev/vg02/lvol4; FS[3]=/oracle/KCI2PRD/data04; FS_MOUNT_OPT[3]="-o rw,suid,largefiles"
LV[4]=/dev/vg02/lvol5; FS[4]=/oracle/KCI2PRD/data05; FS_MOUNT_OPT[4]="-o rw,suid,largefiles"
LV[5]=/dev/vg02/lvol6; FS[5]=/oracle/KCI2PRD/data06; FS_MOUNT_OPT[5]="-o rw,suid,largefiles"
LV[6]=/dev/vg02/lvol7; FS[6]=/oracle/KCI2PRD/data07; FS_MOUNT_OPT[6]="-o rw,suid,largefiles"
LV[7]=/dev/vg02/lvol8; FS[7]=/oracle/KCI2PRD/data08; FS_MOUNT_OPT[7]="-o rw,suid,largefiles"
LV[8]=/dev/vg02/lvol9; FS[8]=/oracle/KCI2PRD/data09; FS_MOUNT_OPT[8]="-o rw,suid,largefiles"
LV[9]=/dev/vg02/lvol10; FS[9]=/oracle/KCI2PRD/data10; FS_MOUNT_OPT[9]="-o rw,suid,largefiles"
LV[10]=/dev/vg02/lvol11; FS[10]=/oracle/KCI2PRD/mirrlogA; FS_MOUNT_OPT[10]="-o rw,suid,largefiles"
LV[11]=/dev/vg02/lvol12; FS[11]=/oracle/KCI2PRD/mirrlogB; FS_MOUNT_OPT[11]="-o rw,suid,largefiles"
LV[12]=/dev/vg02/lvol13; FS[12]=/oracle/KCI2PRD/origlogA; FS_MOUNT_OPT[12]="-o rw,suid,largefiles"
LV[13]=/dev/vg02/lvol14; FS[13]=/oracle/KCI2PRD/origlogB; FS_MOUNT_OPT[13]="-o rw,suid,largefiles"
LV[14]=/dev/vg03/lvol1; FS[14]=/oracle/KCI2PRD/arch; FS_MOUNT_OPT[14]="-o rw,suid,largefiles"
LV[15]=/dev/vg03/lvol2; FS[15]=/oracle/KCI2PRD/bkup01; FS_MOUNT_OPT[15]="-o rw,suid,largefiles"

    #

    # VOLUME RECOVERY

    #

    # When mirrored VxVM volumes are started during the package control

    # bring up, if recovery is required the default behavior is for

    # the package control script to wait until recovery has been

    # completed.

    #

# To allow mirror resynchronization to occur in parallel with

    # the package startup, uncomment the line

    # VXVOL="vxvol -g \$DiskGroup -o bg startall" and comment out the default.#

    # VXVOL="vxvol -g \$DiskGroup -o bg startall"

    VXVOL="vxvol -g \$DiskGroup startall" # Default

    # FILESYSTEM UNMOUNT COUNT

    # Specify the number of unmount attempts for each filesystem during package

    # shutdown. The default is set to 1.

    FS_UMOUNT_COUNT=1

    # FILESYSTEM MOUNT RETRY COUNT.

    # Specify the number of mount retrys for each filesystem.

    # The default is 0. During startup, if a mount point is busy

    # and FS_MOUNT_RETRY_COUNT is 0, package startup will fail and

    # the script will exit with 1. If a mount point is busy and

    # FS_MOUNT_RETRY_COUNT is greater than 0, the script will attempt

    # to kill the user responsible for the busy mount point

    # and then mount the file system. It will attempt to kill user and

    # retry mount, for the number of times specified in FS_MOUNT_RETRY_COUNT.

    # If the mount still fails after this number of attempts, the script

    # will exit with 1.

    # NOTE: If the FS_MOUNT_RETRY_COUNT > 0, the script will execute

    # "fuser -ku" to freeup busy mount point.

    FS_MOUNT_RETRY_COUNT=0


    # CONCURRENT VGCHANGE OPERATIONS

    # Specify the number of concurrent volume group activations or

    # deactivations to allow during package startup or shutdown.

    # Setting this value to an appropriate number may improve the performance

    # while activating or deactivating a large number of volume groups in the

    # package. If the specified value is less than 1, the script defaults it

    # to 1 and proceeds with a warning message in the package control script

    # logfile.

    CONCURRENT_VGCHANGE_OPERATIONS=1

    # CONCURRENT DISK GROUP OPERATIONS

    # Specify the number of concurrent VxVM DG imports or deports to allow

    # during package startup or shutdown.

    # Setting this value to an appropriate number may improve the performance

    # while importing or deporting a large number of disk groups in the

    # package. If the specified value is less than 1, the script defaults it

    # to 1 and proceeds with a warning message in the package control script

    # logfile.

    CONCURRENT_DISKGROUP_OPERATIONS=1

    # CONCURRENT FSCK OPERATIONS

# Specify the number of concurrent fsck to allow during package startup.
# Setting this value to an appropriate number may improve the performance

    # while checking a large number of file systems in the package. If the

    # specified value is less than 1, the script defaults it to 1 and proceeds

    # with a warning message in the package control script logfile.

    CONCURRENT_FSCK_OPERATIONS=1

    # CONCURRENT MOUNT AND UMOUNT OPERATIONS

    # Specify the number of concurrent mounts and umounts to allow during

    # package startup or shutdown.

    # Setting this value to an appropriate number may improve the performance

    # while mounting or un-mounting a large number of file systems in the package.

    # If the specified value is less than 1, the script defaults it to 1 and

    # proceeds with a warning message in the package control script logfile.

    CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS=1

    # IP ADDRESSES

    # Specify the IP and Subnet address pairs which are used by this package.

    # Uncomment IP[0]="" and SUBNET[0]="" and fill in the name of your first

    # IP and subnet address. You must begin with IP[0] and SUBNET[0] and

    # increment the list in sequence.

    #

    # For example, if this package uses an IP of 192.10.25.12 and a subnet of

    # 192.10.25.0 enter:

    # IP[0]=192.10.25.12

    # SUBNET[0]=192.10.25.0 # (netmask=255.255.255.0)

    #

    # Hint: Run "netstat -i" to see the available subnets in the Network field.

    #

    # IP/Subnet address pairs for each IP address you want to add to a subnet

    # interface card. Must be set in pairs, even for IP addresses on the same

    # subnet.

    #

    #IP[0]=""

    #SUBNET[0]=""

    IP[0]="15.209.0.33"

    SUBNET[0]="15.209.0.0" # netmask 255.255.255.192


    # SERVICE NAMES AND COMMANDS.

    # Specify the service name, command, and restart parameters which are

    # used by this package. Uncomment SERVICE_NAME[0]="", SERVICE_CMD[0]="",

    # SERVICE_RESTART[0]="" and fill in the name of the first service, command,

    # and restart parameters. You must begin with SERVICE_NAME[0], SERVICE_CMD[0],

    # and SERVICE_RESTART[0] and increment the list in sequence.

    #

    # For example:

    # SERVICE_NAME[0]=pkg1a

    # SERVICE_CMD[0]="/usr/bin/X11/xclock -display 192.10.25.54:0"

    # SERVICE_RESTART[0]="" # Will not restart the service.

    #

    # SERVICE_NAME[1]=pkg1b

    # SERVICE_CMD[1]="/usr/bin/X11/xload -display 192.10.25.54:0"

    # SERVICE_RESTART[1]="-r 2" # Will restart the service twice.

    #

    # SERVICE_NAME[2]=pkg1c

    # SERVICE_CMD[2]="/usr/sbin/ping"

    # SERVICE_RESTART[2]="-R" # Will restart the service an infinite

    # number of times.

    #

    # Note: No environmental variables will be passed to the command, this

# includes the PATH variable. Absolute path names are required for the
# service command definition. Default shell is /usr/bin/sh.

    #

    #SERVICE_NAME[0]=""

    #SERVICE_CMD[0]=""

    #SERVICE_RESTART[0]=""

SERVICE_NAME[0]=kci2prd
SERVICE_CMD[0]="/etc/cmcluster/kci2prd/kci2prd.sh monitor"
SERVICE_RESTART[0]=""

    # DEFERRED_RESOURCE NAME

    # Specify the full path name of the 'DEFERRED' resources configured for

# this package. Uncomment DEFERRED_RESOURCE_NAME[0]="" and fill in the
# full path name of the resource.

    #

    #DEFERRED_RESOURCE_NAME[0]=""

    # DTC manager information for each DTC.

    # Example: DTC[0]=dtc_20

    #DTC_NAME[0]=

    # START OF CUSTOMER DEFINED FUNCTIONS

    # This function is a place holder for customer define functions.

    # You should define all actions you want to happen here, before the service is

    # started. You can create as many functions as you need.

    function customer_defined_run_cmds

    {

    # ADD customer defined run commands.

    : # do nothing instruction, because a function must contain some command.

    /etc/cmcluster/kci2prd/kci2prd.sh start

    test_return 51

    }


    # This function is a place holder for customer define functions.

    # You should define all actions you want to happen here, before the service is

    # halted.

    function customer_defined_halt_cmds

    {

    # ADD customer defined halt commands.

    : # do nothing instruction, because a function must contain some command.

    /etc/cmcluster/kci2prd/kci2prd.sh shutdown

    test_return 52

    }

    # END OF CUSTOMER DEFINED FUNCTIONS

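The control script above hands off to /etc/cmcluster/kci2prd/kci2prd.sh with start, shutdown, and monitor arguments, but that script is not reproduced in this document. A minimal hedged skeleton of what such an application script typically looks like (the Oracle process name and the start/stop actions are placeholders, not the actual HPMS script):

    #!/usr/bin/sh
    # /etc/cmcluster/kci2prd/kci2prd.sh -- illustrative skeleton only

    case "$1" in
    start)
            # Placeholder: start the application/database here.
            ;;
    shutdown)
            # Placeholder: stop the application/database here.
            ;;
    monitor)
            # Loop for as long as the monitored process is alive; exiting
            # makes MC/ServiceGuard treat the service (and package) as failed.
            while true
            do
                    ps -ef | grep -v grep | grep ora_pmon_KCI2PRD > /dev/null || exit 1
                    sleep 30
            done
            ;;
    *)
            echo "usage: $0 {start|shutdown|monitor}"
            exit 1
            ;;
    esac
    exit 0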

    Ftp all ascii scripts to secondary (failover) node/nodes.

Verify the Cluster Configuration (Do this on the package's primary node)
cmcheckconf [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf

Note : If there are no errors, it means that the package is ready to be applied.

Distribute the Cluster Configuration File (Do this on the package's primary node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-v] [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf
3. vgchange -a n /dev/vg02

    Note : You should not need to activate and later deactivate cluster lock volume group while

    applying packages.

Note : Repeat the steps from Create Packages to here if more packages are required in the cluster.

Configure Automounter (Do this only if your system is using the automounter)
Check that in /etc/rc.config.d/nfsconf, the automounter section reads:

    AUTOMOUNT=1

    AUTOMASTER="/etc/auto_master"

    AUTOMOUNT_OPTIONS="-f $AUTO_MASTER"

    AUTOMOUNTD_OPTIONS=

Check in /etc/rc.config.d/nfsconf that the NFS client and NFS server daemons are configured to run:

    NFS_CLIENT=1

    NFS_SERVER=1

    NUM_NFSD=4

NUM_NFSIOD=4

Add this line to /etc/auto_master:

    /- /etc/auto.direct

    Create an /etc/auto.direct file

    /oracle :/export/

    Restart the automounter with

    /sbin/init.d/nfs.client stop

    /sbin/init.d/nfs.client start

    Disable Automount of Volume Groups (On both nodes)

    1. Edit /etc/lvmrc file and set AUTO_VG_ACTIVATE=0


    Enable Autostart Features (On both nodes)

    1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=1

    Checking Package Operation (do on either node)

7. cmruncl -v
8. cmhaltnode -v <primary node> (node will be halted and the package failed over to the secondary (adoptive) node)
9. cmrunnode -v <primary node> (node will rejoin the cluster)
10. cmhaltpkg <package name> (halt the package on the adoptive node)
11. cmrunpkg <package name> (run the package on the original node)
12. cmmodpkg -e <package name> (enable package switching)
13. cmhaltcl -v

Note : Use cmviewcl or cmviewcl -v to view the results of each command.

    MC/ServiceGuard Template

    System Configuration

    Hardware Information

    Hostname

    Model

    Operating System version

    Physical Memory

    Swap Space

    Non-Shared HDs

    Shared HDs

    Tapes

    LAN Cards

    Primary and Standby Network

    Type

Heartbeat Network Type

MC/ServiceGuard Version

    MirrorDisk/UX Version

    Online JFS Version

    Application name / Application

    version

    Database name / Database version

    OS/Appls Patch Level


    System Information

    Server Hostname

    Server IP Address

    Server IP Netmask

    Server Default Router

    Primary Network on separate

    Switch

    Standby Network on separate

    Switch

Operating System File System Layout

Volume Group    Logical Volume    FS Type    Size (MB)    Mount point

    MC/ServiceGuard Configuration

    Cluster Information

    Cluster Name

    Cluster Members

    Cluster Lock Disk

    Heartbeat Interval Default Value is 1

    Node Timeout Default Value is 2 ; recommended 8

    Network Polling Interval Default Value is 2

    Autostart Delay Default Value is 10mins

    Maximum Configured Packages To allow online package reconfiguration

    Packages Overview

The cluster consists of ________ packages:

1.

2.

3.

    Detailed Package Information:

    Package Name

    Re-locatable Hostname

    Re-locatable IP Address

    Monitor Subnet


    Primary Node

    Adoptive Node

    Run/Halt Script

    Run/Halt Script Timeout

    Package Switch Enabled

    Network Switch Enabled

    Node Failfast Enabled

Service Name

Volume Groups

Logical Volume and File System Details

    Device file Size/ Type Mount Point Owner Group Perm.

    Parameter Value

    CLUSTER_NAME

    FIRST_CLUSTER_LOCK_VG

    NODE_NAME

    NETWORK_INTERFACE

    HEARTBEAT_IP

    NETWORK_INTERFACE

    HEARTBEAT_IP

    FIRST_CLUSTER_LOCK_PV

    NODE_NAME

    NETWORK_INTERFACE

    HEARTBEAT_IP

    NETWORK_INTERFACE

    HEARTBEAT_IP


    FIRST_CLUSTER_LOCK_PV

    HEARTBEAT_INTERVAL (Default value is 1s)

    NODE_TIMEOUT (Default value is 2s)

    AUTO_START_TIMEOUT (Default value is 10 mins)

    NETWORK_POLLING_INTERVAL (Default value is 2s)

    MAX_CONFIGURED_PACKAGES (To allow and add for online package

    reconfiguration)

    VOLUME_GROUP

    !"

    Parameter Value

    PACKAGE_NAME

    NODE_NAME

    NODE_NAME

    RUN_SCRIPT

    RUN_SCRIPT_TIMEOUT

    HALT_SCRIPT

    HALT_SCRIPT_TIMEOUT

    SERVICE_NAME

    SUBNET

    AUTO_RUN

    (PKG_SWITCHING_ENABLED)

    YES

    LOCAL_LAN_FAILOVER_ALLOWED

    (NET_SWITCHING_ENABLED)

    YES


    NODE_FAIL_FAST_ENABLED NO

    !#"

    Parameter Value

PATH

    VGCHANGE "vgchange a e"

    VG[0]

    VG[1]

    LV[0]

    LV[1]

    LV[2]

    FS[0]

    FS[1]

    FS[2]

    IP[0]

    SUBNET[0]

    SERVICE_NAME[0]

    SERVICE_CMD[0]

    SERVICE_RESTART[0]

    function

    customer_defined_run_cmds


    function

    customer_defined_halt_cmds

    $%#'"

    Parameter Value

    INFORMIX_HOME or

    ORACLE_HOME

INFORMIX_SESSION_NAME or
ORACLE_SESSION_NAME
(Mount point and session name)

    MONITOR_INTERVAL (Time between checks)

    MONITOR_PROCESSES (Processes like dataserver etc)

    PACKAGE_NAME

TIME_OUT (Waiting time in seconds for the Informix/Oracle abort to complete before killing the Informix/Oracle processes)

Note : If it is Oracle, SAP or NFS, there are pre-defined scripts for these, provided you install the Enterprise Cluster Master Toolkit and NFS toolkit - /opt/cmcluster/


TESTING MC/SERVICEGUARD

    1.1 Test Overview

This section contains the test requirements and test plan for MC/ServiceGuard.

    1.2 Test Requirement

The MC/ServiceGuard product is a High Availability solution that performs system failure detection and transfers the application from the primary node to the adoptive node when a system failure occurs.

Note : We assume that there is only 1 package in the cluster. In the event that there are more packages, please change/add steps accordingly.

    The faults to be tested and the appropriate methods are listed below:

Type of Failure                                     Method of Simulation
CPU, Memory, Power Supply and Operating System      Reset of server
Active LAN                                          Removal of LAN cable from active LAN card
Total Data LAN                                      Removal of all Data LAN cables from server

    1.3 Verification method

    Upon startup of the package, the verification checkpoints are


- Log onto the surviving server and run the command cmviewcl to check that the package application is RUNNING.

- Ping the relocatable IP from another station in the same network.

- Check that all shared file systems are mounted.


    1.4 Test Checklist

The five categories of test that will be performed are as follows:

a. Normal Bootup
b. Manual Package Switching Functionality
c. LAN Failure Tests (Heartbeat Failure, Data LAN Failure)
d. System Failure Tests
e. Failures not affecting the package - sanity checks to ensure that failure of the adoptive node in the cluster has no side effect on the primary node.

NORMAL BOOTUP SEQUENCE

1. Normal boot up
   Method: Power on or reboot both servers
   Expected: Cluster is up with node1 and node2 running, and the package is running on node1

MANUAL PACKAGE SWITCHING FUNCTIONALITY

1. Package halts successfully on node1
   Method: Run the cmhaltpkg -v <package> command
   Expected: Application shuts down successfully and the package is halted properly

2. Package starts successfully on node2
   Method: Run the cmrunpkg -v -n node2 <package> command
   Expected: Package starts up successfully on node2

3. Package halts successfully on node2
   Method: Run the cmhaltpkg -v <package> command
   Expected: Application shuts down successfully and the package is halted properly

4. Package starts successfully on node1
   Method: Run the cmrunpkg -v -n node1 <package> command
   Expected: Package starts up successfully on node1


LAN FAILURE TESTS

1. Heartbeat LAN failure on node1 (package is running on node1)
   Method: Pull out the lan0 cable on node1
   Expected: lan1 takes over as heartbeat LAN and the package remains running on node1

2. Primary Data LAN failure on node1 (package is running on node1)
   Method: Pull out the lan1 cable on node1
   Expected: Secondary LAN, lan5, takes over as active LAN and the package remains running on node1

3. Secondary Data LAN failure on node1 (package is running on node1)
   Method: Pull out the lan5 cable on node1
   Expected: Primary LAN, lan1, takes over as active LAN and the package remains running on node1

4. Total Data LAN failure on node1 (package is running on node1)
   Method: Pull out lan1 and lan5 from node1
   Expected: Package fails over to node2 if it is running as a node in the cluster; 50% chance of failing on the adoptive node if it is unable to get the cluster lock and panic reboots

5. Heartbeat LAN failure on node2 (package is running on node2)
   Method: Pull out the lan0 cable on node2
   Expected: lan1 takes over as heartbeat LAN and the package remains running on node2

6. Primary Data LAN failure on node2 (package is running on node2)
   Method: Pull out the lan1 cable on node2
   Expected: Secondary LAN, lan5, takes over as active LAN and the package remains running on node2

7. Secondary Data LAN failure on node2 (package is running on node2)
   Method: Pull out the lan5 cable on node2
   Expected: Primary LAN, lan1, takes over as active LAN and the package remains running on node2

8. Total Data LAN failure on node2 (package is running on node2)
   Method: Pull out lan1 and lan5 from node2
   Expected: Package fails over to node1 if it is running as a node in the cluster; 50% chance of failing on the adoptive node if it is unable to get the cluster lock and panic reboots


You may wish to extend the tests to cover the functionality of MC/ServiceGuard with regard to application monitoring scripts and application failover.

SYSTEM FAILURE TESTS

1. node1 failure (package is running on node1)
   Method: Reset node1 (try both "shutdown -ry" and "reboot", or "rs" from the console)
   Expected: Package fails over to node2 if it is running as a node in the cluster
   Check: Yes

2. node2 failure (package is running on node2)
   Method: Reset node2 (try both "shutdown -ry" and "reboot", or "rs" from the console)
   Expected: Package fails over to node1 if it is running as a node in the cluster
   Check: Yes

FAILURES NOT AFFECTING PACKAGE

1. node2 failure (package is running on node1)
   Expected: Cluster reforms as a single-node cluster and the package continues to run on node1
   Check: Yes


    MC/ServiceGuard Troubleshooting

    Troubleshooting using log files

For troubleshooting, there are a few files that will help to log problems experienced by MC/ServiceGuard; these are:

a. /var/adm/syslog/syslog.log
b. /etc/cmcluster/packagedir/packagename.cntl.log

These files need to be maintained as their size will grow; this can ultimately affect the / file system if not maintained.

The package control log file will contain information regarding package start/stop. Each package will have its own package control log file.

Note : Always use cmviewcl or cmviewcl -v to help see the status of your cluster.
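A simple hedged way to watch both logs while starting or halting a package (the package log name follows the kci2prd example used earlier):

    tail -f /var/adm/syslog/syslog.log &
    tail -f /etc/cmcluster/kci2prd/kci2prd.cntl.log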

    Common Problems :

. Problems of configuration
- missing entries in /etc/services, /etc/inetd.conf
- .rhosts or cmclnodelist not configured
- grammatical errors in the config and control files

. Warning : Missing cluster lock disk
- Repeated every hour by the cmcld daemon in syslog.log
- This problem occurs after something has changed affecting the cluster lock disk, e.g. the SCSI ID of the disk changed
- No issue at the moment, but when a tie-breaker situation occurs, nodes will not be able to detect the disk and all nodes may panic reboot.

    Solution :

    1. Schedule downtime to halt the cluster (cmhaltcl)

    2. Run vgchange -c n vgsh to remove the cluster lock volume group from
    the cluster.

    3. Activate vgsh on the node where the cluster configuration ASCII file
    exists by running vgchange -a y vgsh, and do a
    cmapplyconf -v -C /etc/cmcluster/cluster.ascii. Answer yes to the
    change, and then run vgchange -a n vgsh to deactivate the cluster lock
    volume group.


    4. Start the cluster (cmruncl)

    . Warning : I/O error on cluster lock disk
      - Repeated every hour by the cmcld daemon in syslog.log
      - This problem usually occurs when something is wrong with one of the
        SPUs or controllers of the disk array connected to one of the nodes.
      - If it happens on the primary node, the application may already have
        hung.
      - No issue if it occurs on the adoptive node at the moment, but when
        a tie-breaker period occurs, nodes will not be able to detect the
        disk and all nodes may panic reboot.
      - In other cases, the cluster lock disk itself could be faulty, and a
        hung situation with respect to the application and bdf will occur.

    Solution :
      - Schedule downtime and ask the CE to check the SPU or controller

    . Cluster failures
      - Cluster cannot start
        - missing entries in /etc/services, /etc/inetd.conf
        - .rhosts or cmclnodelist not configured
        - syntax errors in the config and control files
        - could be hardware, package induced, or an application problem.
          Again, check the log files.

    . Package failures
      - Package unable to start at all on any node
        - Check syslog and the package log file. Possible config problem,
          control script problem, or the application script name changed.
      - Package cannot fail over to the adoptive node but can start on the
        primary node
        - Check syslog and the package log file. Package switching or the
          node may be disabled:
            cmmodpkg -e packagename              (enable package switching)
            cmmodpkg -e -n nodename packagename  (enable the package to run
                                                  on this node)
      - Package cannot mount/umount filesystems (from the package log)
        - Package failed to start because of mount problems. Possibly the
          shared VG is not marked as part of the cluster or is still
          activated, a filesystem was mounted manually, or someone is
          accessing the directory to be unmounted.
          Unmount all filesystems, check who is accessing the directory and
          get that person to exit, run vgchange -c y vgsh to mark the VG
          for the cluster, deactivate it, and try starting again.
        - Hard disk problem
      - Package failed to halt
        - Application process hung and could not be killed.


        - Hard disk problem

    . Service failure
      - cmviewcl -v to see the status of all packages and their services.
      - Trace from the package control log file and syslog to see why it
        failed, etc.
      - Possible config problem, control script problem, or the application
        script name changed.

    . Node timeout
      - The recommended node timeout value in the cluster config file is
        5-8 seconds.
      - Otherwise, with the default of 2 seconds, a system may panic reboot
        due to a tie-breaker scenario caused by poor network performance
        (see the snippet below).
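    In the cluster ASCII file these values are specified in microseconds; a
    minimal sketch for an 8-second timeout:

        HEARTBEAT_INTERVAL    1000000    # 1 second
        NODE_TIMEOUT          8000000    # 8 seconds (default is 2000000)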

    . GSP problems
      - Known problem for L class servers (certain generations)
      - Causes the system to panic reboot and fail the package over to the
        adoptive node
      - Recommended patch / GSP firmware upgrade needs to be done

    . LAN problems
      - NMID problems

    . Disk problems
      - SCSI ID changed/conflicts, perhaps due to a controller card factory
        default setting. Cannot bring up the cluster. Need the CE to change
        it accordingly.
      - Cluster lock disk failed
        - If the lock disk is RAID1 or RAID5, no problem.
        - If the lock disk is LVM mirrored, do a vgcfgrestore and vgsync to
          recover the lock info, which is stored in the BBR table part of
          the disk (sketched below).
        - If there is no mirror, the cluster needs to be re-applied.
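    A minimal sketch of the mirrored-lock-disk recovery, assuming the lock
    VG is vgsh and the replaced disk is at the hypothetical path
    /dev/dsk/c1t2d0:

        node1# vgcfgrestore -n vgsh /dev/rdsk/c1t2d0   # restore LVM config to the new disk
        node1# vgchange -a y vgsh                      # activate the volume group
        node1# vgsync vgsh                             # resynchronize the mirror copies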


    On-Going Upgrades/Changes to systems/cluster /package

    - Pro-active Patch installation (node by node)- Data Centre outages (shutdown entire cluster)- Rolling upgrades (node by node)

    Keychain Cluster - Shutdown and Startup Procedure

    -------------------------------------------------

    Last update: 19 June 2002 SGP

    *******************************************************************

    Please follow these steps whenever you need to arrange a shutdown

    for sgpue036.sgp.hp.com & sgpue037.sgp.hp.com.

    Special handling is required because of their MC/Serviceguard HA

    environment.

    *******************************************************************

    Before you shutdown a node

    --------------------------

    1. Get agreement with application support on schedule, scope and

    duration of shutdown.

    2. Ensure both nodes in the cluster are up and running. If any node

    is down or appears to be having problems, DO NOT proceed with

    shutdown.

    3. If shutting down a primary node, goto section titled "Shutting down

    and restarting the primary node".

    If shutting down a secondary node, goto section titled "Shutting down

    and restarting the secondary node".

    If shutting down the entire cluster, goto section titled "Shutting down

    and restarting the MC/SG cluster".

    If doing rolling upgrade, goto section titled "Doing a rolling upgrade".

    Shutting down and restarting the primary node

    ------------------------------------------------

    We assume primary node = sgpue036 and secondary node = sgpue037


    in the following examples.

    1. Before shutdown, make a note of all packages currently running

    on each node.

    sgpue036# cmviewcl

    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >
    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    2. Halt primary node sgpue036

    sgpue036# cmhaltnode -f -v sgpue036

    Production packages will fail over from sgpue036 to sgpue037. sgpue036

    will cease to be a member of the active cluster.

    3. Check package status on cluster

    sgpue036# cmviewcl

    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running disabled sgpue037

    > kcdbstg up running enabled sgpue037
    > kcnfs up running enabled sgpue037

    >

    > NODE STATUS STATE

    > sgpue036 down halted

    4. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the

    following line:


    AUTOSTART_CMCLD=0

    5. Now we can proceed to shutdown (for PM, repair) or reboot

    (for patching, kernel regen) sgpue036, eg:

    sgpue036# /etc/shutdown -h 0

    sgpue036# /etc/shutdown -r 0

    6. When repair or reboot is over, sgpue036 should be booted up to

    run level 3

    sgpue036# who -r

    . run-level 3 Jan 17 08:01 3 0 S

    7. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the

    following line:

    AUTOSTART_CMCLD=1

    8. Make sgpue036 join the cluster

    sgpue036# cmrunnode -v sgpue036

    9. Halt production packages on sgpue037

    sgpue037# cmhaltpkg kci2stg

    10. Restart production packages on sgpue036

    sgpue036# cmrunpkg kci2stg

    11. Re-enable package switching on production packages

    sgpue036# cmmodpkg -e kci2stg

    12. Check package status on cluster.

    You should see the same listing as shown in Step 1 ie.

    sgpue036# cmviewcl

    > CLUSTER STATUS

    > knet up

    >
    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running


    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    13. Release sgpue036 to customers (notify by phone, email etc)

    Shutting down and restarting the secondary node

    ---------------------------------------------

    1. Before shutdown, make a note of all packages currently running

    on each node

    sgpue037# cmviewcl

    > CLUSTER STATUS

    > knet up

    >
    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    2. Halt secondary node sgpue037

    sgpue037# cmhaltnode -f -v sgpue037

    Production packages will fail over from sgpue037 to sgpue036. sgpue037

    will cease to be a member of the active cluster.

    3. Check package status on cluster

    sgpue037# cmviewcl

    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036


    > kcdbstg up running disabled sgpue036

    > kcnfs up running disabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 down halted

    4. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the
    following line:

    AUTOSTART_CMCLD=0

    5. Now we can proceed to shutdown (for PM, repair) or reboot

    (for patching, kernel regen) sgpue037, eg:

    sgpue037# /etc/shutdown -h 0

    sgpue037# /etc/shutdown -r 0

    6. When repair or reboot is over, sgpue037 should be booted up to

    run level 3

    sgpue037# who -r

    . run-level 3 Jan 17 08:01 3 0 S

    7. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the

    following line:

    AUTOSTART_CMCLD=1

    8. Make sgpue037 join the cluster

    sgpue037# cmrunnode -v sgpue037

    9. Halt production packages on sgpue036

    sgpue036# cmhaltpkg kcdbstg

    sgpue036# cmhaltpkg kcnfs

    10. Restart production packages on sgpue037

    sgpue037# cmrunpkg kcdbstg

    sgpue037# cmrunpkg kcnfs

    11. Re-enable package switching on production packages

    sgpue037# cmmodpkg -e kcdbstg

    sgpue037# cmmodpkg -e kcnfs

    12. Check package status on cluster.

    You should see the same listing as shown in Step 1 ie.

    sgpue037# cmviewcl


    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue036 up running

    >
    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    13. Release sgpue037 to customers (notify by phone, email etc)

    Shutting down and restarting the MC/SG cluster
    ----------------------------------------------

    We assume primary node = sgpue036 and secondary node = sgpue037 in

    the following examples.

    1. Log in to sgpue036 or sgpue037 as superuser and issue command to

    halt cluster daemon

    sgpue036# cmhaltcl -f -v

    2. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include

    the following line:

    AUTOSTART_CMCLD=0

    3. Proceed to shutdown each node

    sgpue036# /etc/shutdown -h 0

    sgpue037# /etc/shutdown -h 0

    4. After planned activity is over, bootup each node to run level 3

    sgpue036# who -r

    sgpue037# who -r

    . run-level 3 Jan 17 08:01 3 0 S

    5. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include the

    following line:

    AUTOSTART_CMCLD=1

    6. Startup the cluster daemon from any node


    sgpue036# cmruncl -v

    7. Check package status on cluster.

    It should look exactly like the following

    sgpue036# cmviewcl

    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >
    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    8. Release machines to customers (notify by phone, email etc)

    Doing a rolling upgrade

    -----------------------

    This is the most common scenario where we work on 1 node at a time

    without bringing down the entire cluster. This ensures there is at

    least 1 node available to run the application packages. The steps are already detailed above. Either:

    1. Shutting down and restarting the primary node

    2. Shutting down and restarting the secondary node

    or

    1. Shutting down and restarting the secondary node

    2. Shutting down and restarting the primary node

    Note : This may apply to OS upgrades, eg. 10.20 to 11.00, whereby MC/SG goes from ver 10.10 to 11.X.

    Another method you may deploy is to build a separate cluster on
    separate machines with the latest OS, copy all the config files over,
    and swap the package IPs.


    - Modifying the cluster
      o Anything to do with the cluster will need a re-apply of the cluster
        (go through the cluster.conf file to see what the parameters are),
        so downtime is needed to halt the cluster, except for
        adding/removing nodes and packages, which can be done while the
        cluster is still up and running.
        - Eg. node timeout, heartbeat interval
        - Eg. cluster name
        - Eg. heartbeat IPs
        - Eg. no. of packages
        - Eg. change of node names
        - Eg. manual change/add of a volume group
      o Steps (see the sketch after this list)
        - Schedule downtime to halt the entire cluster
        - cmhaltcl -f to halt the cluster
        - After the cluster is halted, run cmgetconf -v -c clustername
          outputfilename (cluster ascii file - name it something different)
          to get the latest copy of the cluster config file.
        - Modify the outputfilename to make the intended changes to the
          cluster.
        - cmcheckconf -v -C outputfilename (cluster ascii file) - check for
          any errors
        - cmapplyconf -v -C outputfilename (cluster ascii file) - if no
          errors
        - Start the cluster - cmruncl
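    Putting those steps together, a minimal sketch using the cluster name
    knet from the earlier examples and a hypothetical output file name:

        node1# cmhaltcl -f -v
        node1# cmgetconf -v -c knet /etc/cmcluster/knet.new.ascii
        node1# vi /etc/cmcluster/knet.new.ascii      # make the intended changes
        node1# cmcheckconf -v -C /etc/cmcluster/knet.new.ascii
        node1# cmapplyconf -v -C /etc/cmcluster/knet.new.ascii
        node1# cmruncl -v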

    - Adding/removing nodes to the cluster
      o Adding
      o Online method
        - Heartbeat must be configured and the network ready
        - Can be done on any node (preferably the node where the original
          cluster config file was placed)
        - cmquerycl [-w full] -v -C /etc/cmcluster/outputfilename
          -n primarynode -n secondarynode -n newnode
          (Note : This will query the system configuration and generate the
          new cluster config file, according to whatever name you specified
          as the outputfilename.)
        - cmgetconf -v -c clustername outputfilename (cluster ascii file -
          name it something different) to get the latest copy of the
          cluster config file.
        - Check and combine the 2 configurations into one final config
          file.


        - cmcheckconf -v -C finalconfigfile (cluster ascii file) - check
          for any errors
        - cmapplyconf -v -C finalconfigfile (cluster ascii file) - if no
          errors
        - cmrunnode nodename to join the cluster (see the sketch below)
        - Modify all package config files to include the new node if
          desired. (Remember modifying the package config file will need
          downtime to apply the package config file.)
      o Offline method
        - Same, except performed with the cluster halted; when all the
          changes are made, start the cluster.
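    A sketch of the online add, assuming the existing nodes node1 and node2
    and a hypothetical new node node3:

        node1# cmquerycl -w full -v -C /etc/cmcluster/threenode.ascii \
               -n node1 -n node2 -n node3
        node1# cmgetconf -v -c knet /etc/cmcluster/current.ascii
        (check and combine the two files into /etc/cmcluster/final.ascii)
        node1# cmcheckconf -v -C /etc/cmcluster/final.ascii
        node1# cmapplyconf -v -C /etc/cmcluster/final.ascii
        node1# cmrunnode -v node3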

      o Removing
      o Online method
        - Modify all package config files to exclude the node if it is
          configured in the package. (Remember modifying the package config
          file will need downtime to apply the package config file.)
        - Halt all ACTIVE packages on the node - cmhaltpkg packagenames
        - Halt the node - cmhaltnode -v nodename
        - cmgetconf -v -c clustername outputfilename (cluster ascii file -
          name it something different) to get the latest copy of the
          cluster config file.
        - Edit this cluster ascii file to remove the node details
        - cmcheckconf -v -C outputfilename (cluster ascii file) - check for
          any errors
        - cmapplyconf -v -C outputfilename (cluster ascii file) - if no
          errors
        - Do whatever with the node - power down, redeploy
        - vgexport vgsh (off the removed node)
      o Offline method
        - Same, except performed with the cluster halted; when all the
          changes are made, start the cluster. Skip the halt-package and
          halt-node steps.

    Note : While the cluster is running, you can remove a node from the
    cluster while the node is reachable, ie connected to the LAN -
    recommended. If the node is unreachable, it can still be removed from
    the cluster, but only if there are no packages which specify the
    unreachable node. If there are packages that depend on the unreachable
    node, it is best to halt the cluster and make the changes to the
    package and cluster config files to remove the node from the cluster.

    - Adding/removing packages to the cluster
      o Adding
      o Online method
      o Create packages on the primary node
        - mkdir /etc/cmcluster/packagedir
        - cmmakepkg -p /etc/cmcluster/packagedir/packagename.conf
        - Edit the configuration file
        - cmmakepkg -s /etc/cmcluster/packagedir/packagename.cntl


        - Edit the control script.

    Note : If the package and control file is special (e.g. NFS required),
    then do not run the cmmakepkg command; just get the pre-defined scripts
    from the MC/SG NFS extension toolkit.
    You may still need to make some adjustments. (Similar for the SAP
    extension.)

    Note : It is possible that packages do not use any volume groups.

        - ftp the control script file to the adoptive nodes
        - On the primary node:
          - cmcheckconf -v -P packagename.conf (package config file) -
            check for any errors
          - cmapplyconf -v -P packagename.conf (package config file) - if
            no errors
        - Start the package - cmrunpkg packagename
        - cmmodpkg -e packagename to re-enable package switching
        - Test the package on all adoptive nodes if possible

    Note : Repeat the steps from "Create packages" to here again if more packages are required in the cluster. (An end-to-end sketch follows the offline method below.)

      o Offline method
        - Same, except performed with the cluster halted; when all the
          changes are made, start the cluster.
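    The online add, end to end, with a hypothetical package name newpkg:

        node1# mkdir /etc/cmcluster/newpkg
        node1# cmmakepkg -p /etc/cmcluster/newpkg/newpkg.conf   # config template
        node1# cmmakepkg -s /etc/cmcluster/newpkg/newpkg.cntl   # control template
        (edit both files, then copy the package directory to the adoptive nodes)
        node1# rcp -r /etc/cmcluster/newpkg node2:/etc/cmcluster
        node1# cmcheckconf -v -P /etc/cmcluster/newpkg/newpkg.conf
        node1# cmapplyconf -v -P /etc/cmcluster/newpkg/newpkg.conf
        node1# cmrunpkg newpkg
        node1# cmmodpkg -e newpkg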

      o Removing
      o Online method
        - cmhaltpkg -v packagename
        - cmdeleteconf -f -v -p packagename
        - cmviewcl (to verify that it is no longer part of the cluster)
          Note : The package config and control files are not deleted from
          the system, just removed from the cluster.

      o Offline method
        - Same, except performed with the cluster halted; when all the
          changes are made, start the cluster.

    - Modifying packages
      o 2 parts - the package config file and the package control file
      o Anything to do with modifying the package config file will need a
        re-apply of the package (go through the package.conf file to see
        what the parameters are).
        - Parameters that can be changed without stopping the package, ie
          the cluster and package are up and running:


          - Eg. failover policy, failback policy
          - Eg. add/remove/modify node names
          - Eg. switching parameters
        - Steps
          - cmgetconf -v -p packagename outputfilename (package config
            file - name it something different) to get the latest copy of
            the package config file.
          - Modify the outputfilename to make the intended changes to the
            package config.
          - cmcheckconf -v -P outputfilename (package config file) - check
            for any errors
          - cmapplyconf -v -P outputfilename (package config file) - if no
            errors
        - Parameters that must be changed by stopping the package, ie the
          package is down but the cluster is up and running:
          - Eg. package name (if possible, change the hosting directory
            name as well)
          - Eg. change run/halt scripts
          - Eg. add/remove service names
          - Eg. add/remove subnet
        - Steps
          - Schedule downtime to halt the package affected
          - cmhaltpkg packagename to halt the package
          - After the package is halted, run cmgetconf -v -p packagename
            outputfilename (package config file - name it something
            different) to get the latest copy of the package config file.
          - Modify the outputfilename to make the intended changes to the
            package config.
          - cmcheckconf -v -P outputfilename (package config file) - check
            for any errors
          - cmapplyconf -v -P outputfilename (package config file) - if no
            errors
          - Start the package
            - cmrunpkg packagename
            - cmmodpkg -e packagename to re-enable package switching

      o Anything to do with modifying the package control file (script)
        will NOT need a re-apply of the package (go through the
        package.cntl file to see what the parameters are), but it does
        need downtime to halt the package; the cluster and the other
        packages in the cluster can still be running.


        - Eg. VG name and no. of VGs
        - Eg. LVs, names of mount points and their numbers
        - Eg. NFS mounts
        - Eg. package IPs and subnet
        - Eg. service names
        - Eg. subnet
        - Eg. application start/stop scripts
      o Steps (see the sketch below)
        - Schedule downtime to halt the package affected
        - cmhaltpkg packagename to halt the package
        - After the package is halted, modify the package control file to
          make the intended changes.
        - Start the package
          - cmrunpkg packagename
          - cmmodpkg -e packagename to re-enable package switching
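    For instance, changing a mount point in the control file of the
    hypothetical package newpkg, with the cluster and the other packages
    left running:

        node1# cmhaltpkg newpkg
        node1# vi /etc/cmcluster/newpkg/newpkg.cntl   # make the same edit on ALL nodes
        node1# cmrunpkg newpkg
        node1# cmmodpkg -e newpkg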

    - Adding/modifying LAN cards in the cluster
      o If there is a need to add or upgrade/replace LAN cards in a cluster
        environment, take note of the LAN ID (NMID).
      o Usually adding will not cause an issue, unless the card will be
        part of the cluster and is already connected to the network - then
        you need to reconfigure and re-apply the cluster config file.
      o For upgrading/replacing LAN cards, the NMID may change, eg.
        upgrading from a 10BT to a 100BT card, or replacing a 1-port LAN
        card with a 4-port LAN card. In such a case, the cluster cannot
        start up, because the cluster setting is different (the cluster is
        trying to find lan1 as configured in the cluster config file, but
        the NMID has already changed to lan2). We will need to reform and
        re-apply the cluster before running it.

      o Steps
        - Method 1
          - Schedule downtime to halt the entire cluster
          - cmhaltcl -f to halt the cluster
          - After the cluster is halted, run
            cmquerycl [-w full] -v -C /etc/cmcluster/outputfilename
            -n primarynode -n secondarynode [-n other nodes in the cluster]
            (Note : This will query the system configuration and generate
            the new cluster config file, according to whatever name you
            specified as the outputfilename. This should automatically
            generate the cluster config file with the new LAN card NMID.)


          - Run cmgetconf -v -c clustername outputfilename (cluster ascii
            file - name it something different) to get the latest copy of
            the cluster config file.
          - Check and combine the 2 configurations into one final config
            file.
          - cmcheckconf -v -C finalconfigfile (cluster ascii file) - check
            for any errors
          - cmapplyconf -v -C finalconfigfile (cluster ascii file) - if no
            errors
          - Start the cluster - cmruncl
            (A sketch of this method follows Method 2 below.)

        - Method 2 - not recommended
          - Schedule downtime to halt the entire cluster
          - cmhaltcl -f to halt the cluster
          - Run cmgetconf -v -c clustername outputfilename (cluster ascii
            file - name it something different) to get the latest copy of
            the cluster config file.
          - Modify the outputfilename to make the intended changes to the
            cluster (if you are aware of the change in the NMID of the LAN
            card).
          - cmcheckconf -v -C outputfilename (cluster ascii file) - check
            for any errors
          - cmapplyconf -v -C outputfilename (cluster ascii file) - if no
            errors
          - Start the cluster - cmruncl
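    A sketch of Method 1 after a card replacement; lanscan is the usual way
    to confirm the new interface numbering before re-querying:

        node1# lanscan                    # note the new lanN instance / NMID
        node1# cmhaltcl -f -v
        node1# cmquerycl -w full -v -C /etc/cmcluster/newlan.ascii -n node1 -n node2
        (check and combine with the cmgetconf output as described above)
        node1# cmcheckconf -v -C /etc/cmcluster/newlan.ascii
        node1# cmapplyconf -v -C /etc/cmcluster/newlan.ascii
        node1# cmruncl -v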

    - Extending/Reducing logical volumes in the cluster packages
      o (ONLINE) No downtime required, provided OnlineJFS is installed
      o Make the changes on the node where the logical volumes are mounted
      o No action required on the adoptive nodes
      o Extending (worked example below) :
        - lvextend -L newsize_in_MB /dev/vgsh/shlvol
        - fsadm -F vxfs -b newsize_in_KB /shname
      o Reducing :
        - fsadm -F vxfs -b newsize_in_KB /shname
        - lvreduce -L newsize_in_MB /dev/vgsh/shlvol
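    A worked example, assuming OnlineJFS and a logical volume
    /dev/vgsh/shlvol mounted at /shdata (hypothetical names), grown from
    1024 MB to 2048 MB (2048 MB = 2097152 KB):

        node1# lvextend -L 2048 /dev/vgsh/shlvol    # new size in MB
        node1# fsadm -F vxfs -b 2097152 /shdata     # new size in KB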

    - LVMTAB needs to be updated when adding/removing :
      o Disks
      o Logical volumes


      o Volume groups

    - Adding/Removing new physical volumes/disks to the volume group owned
      by a package
      o Adding
      o On the primary node (node where the shared VG is activated, where
        the package is running)
        - pvcreate the new disk
        - vgextend the new disk into the identified shared volume group
        - VGEXPORT, with the preview option, the particular shared VG
          mapfile
          - vgexport -m vgsh.map -p -s -v vgsh
        - ftp the mapfile to the adoptive nodes
      o On the adoptive nodes
        - VGEXPORT the identified shared volume group off the system
          - vgexport vgsh
        - mkdir /dev/vgsh
        - mknod /dev/vgsh/group c 64 0x... (same minor number as before)
        - VGIMPORT the shared volume group into the system with the mapfile
          - vgimport -m vgsh.map -s -v vgsh
      o Removing
        - Same steps, except use vgreduce (no pvcreate required)
      o (ONLINE) No downtime required, but it is good to schedule one if
        you want to test the failover.
      o Do I need to re-apply the cluster and package? No. (See the sketch
        below.)
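    The full sequence, assuming a hypothetical new disk at c5t0d0 and node2
    as the adoptive node:

        node1# pvcreate /dev/rdsk/c5t0d0
        node1# vgextend vgsh /dev/dsk/c5t0d0
        node1# vgexport -p -s -v -m /tmp/vgsh.map vgsh    # -p = preview only
        node1# rcp /tmp/vgsh.map node2:/tmp/vgsh.map

        node2# vgexport vgsh                              # remove the old definition
        node2# mkdir /dev/vgsh
        node2# mknod /dev/vgsh/group c 64 0x080000        # minor number is an example
        node2# vgimport -s -v -m /tmp/vgsh.map vgsh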

    - Adding/Removing logical volumes to the volume group owned by the
      package
      o Adding
      o On the primary node (node where the shared VG is activated, where
        the package is running)
        - lvcreate -L ...
        - newfs ...
        - mkdir /filesystem
        - Mount the filesystem manually and assign the correct ownership
          and permissions


        - umount the filesystem
        - VGEXPORT, with the preview option, the particular shared VG
          mapfile
          - vgexport -m vgsh.map -p -s -v vgsh
        - ftp the mapfile to the adoptive nodes
      o On the adoptive nodes
        - VGEXPORT the identified shared volume group off the system
          - vgexport vgsh
        - mkdir /dev/vgsh
        - mknod /dev/vgsh/group c 64 0x... (same minor number as before)
        - mkdir /filesystem
        - VGIMPORT the shared volume group into the system with the mapfile
          - vgimport -m vgsh.map -s -v vgsh
      o Schedule time to halt the package (only the package affected)
        - cmhaltpkg packagename
      o After the package is halted, modify the package control script
        (.cntl) to include the new filesystem on all nodes.
      o Start the package
        - cmrunpkg packagename
        - cmmodpkg -e packagename to re-enable package switching
      o Verify that the filesystem is mounted and accessible.
      o Test on all adoptive nodes.

      o Removing
        - Schedule downtime to halt the package
          - cmhaltpkg packagename - on the primary node
        - vgchange -c n vgsh - to unmark the VG that belongs to the package
          from the cluster
        - vgchange -a y vgsh to activate the VG
        - lvremove the logical volume
        - vgchange -a n vgsh to deactivate the VG
        - vgchange -c y vgsh to mark the VG as part of the cluster
        - Modify the package control files on all nodes to exclude this LV
          and filesystem
        - cmrunpkg packagename - to restart the package
        - cmmodpkg -e ... to re-enable package switching
        - vgexport the mapfile on the primary node and ftp it to all
          adoptive nodes
        - vgexport ..., vgimport ... the mapfile on the adoptive nodes
        - Test on all adoptive nodes


      o Offline for the package affected, but the cluster can be up and
        running, and the other packages can be up and running.
      o Do I need to re-apply the cluster/package (changing the package
        control file does not need a re-application)? No.
      o Can I create an LV/filesystem that is not mounted by my package but
        belongs to the same volume group, ie I mount it via /etc/fstab? No;
        this will cause a problem, since the VG will need to be
        activated/deactivated and the package may fail.

    - Adding new volume groups to the cluster packages
      o Adding
      o On the primary node (node where the shared VG is activated, where
        the package is running)
        - pvcreate the new disk
        - mkdir /dev/vgsh (the new shared VG)
        - mknod /dev/vgsh/group c 64 0x0...
        - vgcreate the new shared volume group
        - Create the necessary lvols and filesystems or raw devices for
          the VG
        - Mount the filesystems and change permissions and ownerships
          accordingly
        - VGEXPORT, with the preview option, the particular shared VG
          mapfile
          - vgexport -m vgsh.map -p -s -v vgsh
        - ftp the mapfile to the adoptive nodes
      o On the adoptive nodes
        - VGEXPORT the identified shared volume group off the system
          - vgexport vgsh
        - mkdir /dev/vgsh
        - mknod /dev/vgsh/group c 64 0x... (same minor number as before)
        - VGIMPORT the shared volume group into the system with the mapfile
          - vgimport -m vgsh.map -s -v vgsh
        - mkdir the /filesystems for the logical volumes
      o On the primary node,
        - vgchange -c y vgsh to mark the VG as part of the cluster
        - umount all filesystems in this new shared VG and deactivate it -
          vgchange -a n vgsh
        - Check /var/adm/syslog/syslog.log to see if this VG has been
          successfully marked in the cluster
        - cmgetconf -v -c clustername outputfilename (name it something
          different) to see that it has been entered into the cluster
          config file.


        - If not, we will need to bring down the entire cluster, check,
          and re-apply the cluster.

      o Method 1 (do this if successfully marked)
        - Schedule time to halt the package (only the package affected).
          - cmhaltpkg packagename
        - After the package is halted, modify the package control script
          (.cntl) to include the new filesystem and volume group on all
          nodes.
        - Start the package
          - cmrunpkg packagename
          - cmmodpkg -e packagename to re-enable package switching
        - Verify that the VG is activated and the filesystems are mounted
          and accessible.
        - Test on all adoptive nodes.

      o Method 2 (do this if not marked successfully)
        - Schedule time to halt the entire cluster.
          - cmhaltcl
        - After the cluster is halted, run cmgetconf -v -c clustername
          outputfilename (cluster ascii file - name it something
          different) to see that it has been entered into the cluster
          config file.
        - If not entered, try to manually type the new shared VG into the
          new cluster outputfilename.
        - cmcheckconf -v -C outputfilename (cluster ascii file) - check
          for any errors
        - cmapplyconf -v -C outputfilename (cluster ascii file) - if no
          errors
        - Modify the package control script (.cntl) to include the new
          filesystem and volume group on all nodes.
        - Start the cluster - cmruncl
        - Verify that the VG is activated and the filesystems are mounted
          and accessible.
        - Test that the VG can be mounted on all adoptive nodes. (The
          creation steps are sketched below.)
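    The creation steps on the primary node, sketched with a hypothetical
    disk and VG name vgnew:

        node1# pvcreate /dev/rdsk/c6t0d0
        node1# mkdir /dev/vgnew
        node1# mknod /dev/vgnew/group c 64 0x090000    # an unused minor number
        node1# vgcreate vgnew /dev/dsk/c6t0d0
        node1# lvcreate -L 1024 -n lvol1 vgnew
        node1# newfs -F vxfs /dev/vgnew/rlvol1
        node1# vgchange -c y vgnew                     # mark as cluster-aware (cluster must be up)
        node1# vgchange -a n vgnew                     # deactivate; the package will activate it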

      o Removing
        - Schedule downtime to halt the package
          - cmhaltpkg packagename - on the primary node
        - vgchange -c n vgsh - to unmark the VG that belongs to the package
          from the cluster
        - Modify the package control files on all nodes to exclude this VG
          and its filesystems