
    HA with MC/ServiceGuard (Concepts)

http://uxsl.europe.hp.com/doc/tech/ha/HAtrain/
Prepared by Anand

    Other platforms have other HA software

HA means the following :
- no SPOF (single point of failure)
- N+1 redundancy
- Ideal : dual power sources/vendors ; hubs and switches connected to dual power sources
- Not load balancing (Foundry / Cisco LocalDirector, software load balancers)

HA Terminology :
- Cluster (1)
- Node (1 to many)
- Package (1 to many)
- Floating IPs (single/multiple, e.g. BAMM) ; can specify hostnames in DNS for each floating IP

    Question : Can we have a node in 2 clusters ? Not advisable - dependencies

Availability :
- 99%    - standard server
- 99.5%  - MC/ServiceGuard (the application, not the node)
- 99.99% - ??

Criteria for HA : Ensure that both (all) nodes in the cluster

- are of the same build, hardware- and software-wise (patch level, kernel changes, user accounts)

Types of disks applicable for use with HA MC/ServiceGuard :

- In general, disks with 2 SPUs/controllers :
  - VA
  - FC10, SC10
  - XP
  - DS
  - AutoRAID 12H
  - Nike disks

Not recommended :
- Jamaica disks
- Desktop disks

    Note : Disks should have HA (RAID1, RAID5) as well.

    Question : Can MC/ServiceGuard work across DCs or countries ie one node in Singapore, the other node Japan ?

Answer : Yes, provided the heartbeat cable is long enough, or more importantly the subnet is the same and the shared disk system is accessible by both servers.


    Software Licenses

Part#          Description                                             Qty   Unit Price
B3935DA        MC/SG software system license for HP-UX 11.x             2    USD 0.00
B3935DA-AE5    MC/SG software license for K/N class                     2    USD 5117.00
B3935DA-ABA    MC/SG software English localization                      2    USD 0.00
B3935DA-0S6    MC/SG 24x7 Support (first year)                          2    USD 496.80
B5140BA        MC/SG NFS toolkit license                                2    USD 322.50
B5140BA-0S6    MC/SG NFS toolkit 24x7 support (first year)              2    USD 64.80
B5139DA        Enterprise Cluster Extension                             2    USD 427.85
B5139DA-0S6    Enterprise Cluster Extension 24x7 Support (first year)   2    USD 86.40
H6194AA        MC/SG Implementation                                     1    USD 15000.00
               (to be included only if you want to buy consulting and implementation service from HPC)
B7885BA        MC/SG LTU Extension for SAP (per SAP instance)           1    USD 12900.00

** Please verify with the SAP team if any other SAP-related license is needed.

If you would like to buy service from HPC, what our team usually does is to approach Vincent, who's the Account Manager for HPO, and he will arrange for someone from HPC to work with us. (Do remember to include the USD 15k.)

    Software Installation

Note : MC/ServiceGuard can be installed from the ctss144 depots (/var/depot/applications/11.00/hp-ux, /var/depot/applications/11.11/hp-ux).

We have in our depots :
- Version 11.09
- Version 11.13 - recommended

    MC/ServiceGuard software to install (basic setup, install on both machines):

B3935DA   A.11.13      MC/ServiceGuard
B5140BA   A.11.00.04   MC/ServiceGuard NFS Toolkit - install only if NFS is required to work within the cluster
B5139DA   B.01.06      Enterprise Cluster Master Toolkit - optional
B8324BA   A.01.03      HP Cluster Object Manager - optional

Note : Only install the above software from the same DART/CD version; do not try to mix and match from different releases.

Note : If the OS is version 11.11 (11i) and it is the Mission Critical Operating Environment, then it should come with MC/ServiceGuard installed.
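As a hedged illustration of pulling the software from those depots (the depot host and paths come from the note above and should be verified first; the product numbers come from the license table earlier):

    swlist -d @ ctss144:/var/depot/applications/11.11/hp-ux                     # list what the depot contains
    swinstall -s ctss144:/var/depot/applications/11.11/hp-ux B3935DA B5140BA    # MC/SG plus NFS toolkit
    swlist -l product B3935DA                                                   # confirm the installed revision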


Note : Do check the /etc/services and /etc/inetd.conf files for the MC/ServiceGuard related services, especially for the 11i mission critical OS.

/etc/services
hacl-hb     5300/tcp   # High Availability (HA) Cluster heartbeat
hacl-gs     5301/tcp   # HA Cluster General Services
hacl-cfg    5302/tcp   # HA Cluster TCP configuration
hacl-cfg    5302/udp   # HA Cluster UDP configuration
hacl-probe  5303/tcp   # HA Cluster TCP probe
hacl-probe  5303/udp   # HA Cluster UDP probe
hacl-local  5304/tcp   # HA Cluster Commands
hacl-test   5305/tcp   # HA Cluster Test
hacl-dlm    5408/tcp   # HA Cluster distributed lock manager

/etc/inetd.conf
hacl-cfg   dgram  udp wait   root /usr/lbin/cmclconfd   cmclconfd -p
hacl-cfg   stream tcp nowait root /usr/lbin/cmclconfd   cmclconfd -c
hacl-probe stream tcp nowait root /opt/cmom/lbin/cmomd  /opt/cmom/lbin/cmomd -f /var/opt/cmom/cmo

    Depending on what version of MC/ServiceGuard is installed, MC/ServiceGuard patches must be installed:

    http://haweb.cup.hp.com/Support/Patches/SG11.00.html

Question : Can we install one node with MC/ServiceGuard version 11.09 and the other with version 11.13, or something else, i.e. different versions?

Answer : Not advisable due to compatibility issues, unless you're doing rolling upgrades.


    MC/ServiceGuard Network Design

Note : Usually the heartbeat LAN uses the internal LAN card; the Primary and Secondary LANs use 2 separate LAN cards.

Question : What would a 3-node or 4-node cluster look like?

How can we configure the packages to fail over? Many possibilities.

Heartbeat network
- cross UTP
- serial cable
- dedicated heartbeat subnet
- Primary LAN usually set as secondary heartbeat

Cluster/Package Node Configurations
- ACTIVE ; ACTIVE
- ACTIVE ; PASSIVE

Cluster Lock Disk
- Tie breaker
- The node that gets the lock disk reforms the cluster; the other will usually panic reboot
- What if the cluster lock disk is dead? - UNPLANNED OUTAGE

[Diagram: MC/ServiceGuard network design. Two nodes, sgpue036 and sgpue037, each connected to the User LAN (Securenet) through switch 1 and switch 2. The primary LAN of sgpue036 is 15.209.0.25 (cable name: sgpue036) and of sgpue037 is 15.209.0.26 (cable name: sgpue037). Each node also has a failover LAN with no physical IP: cable sgpue036s must be connected to switch 2, and cable sgpue037s must be connected to switch 1. A dedicated heartbeat LAN runs over a cross UTP cable on lan2 (192.0.0.1 on sgpue036, 192.0.0.2 on sgpue037). Both nodes attach over FC to the shared Keychain Database disk storage.]


MC/ServiceGuard Monitoring
- Hardware
- Application
- ITO
- ClusterViewPlus
- NNM

    MC/ServiceGuard Commands

cmquerycl
cmcheckconf
cmapplyconf - will distribute the binary configuration details to all nodes in the cluster
cmgetconf

Cluster specific commands : cmruncl, cmviewcl, cmhaltcl

Node specific commands : cmrunnode, cmhaltnode

Package specific commands : cmrunpkg, cmhaltpkg, cmmodpkg
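For orientation, cmviewcl output on a healthy two-node cluster looks roughly like the sketch below (cluster, node, and package names follow the examples used later in this document; exact column headings vary by MC/SG version, e.g. PKG_SWITCH vs AUTO_RUN):

    # cmviewcl

    CLUSTER        STATUS
    Kcdatabases    up

      NODE         STATUS       STATE
      sgpue036     up           running

        PACKAGE    STATUS       STATE        PKG_SWITCH   NODE
        kci2prd    up           running      enabled      sgpue036

      NODE         STATUS       STATE
      sgpue037     up           running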

    MC/ServiceGuard with SAM

MC/ServiceGuard backups
- Database vendors' online backup tools
- Split mirror
- Business Copy (VA, XP), KNET
- JFS snapshots

Practice of backup for HPMS, if no special request:

o For filesystem backup : back up whatever filesystem is mounted on whichever system it currently resides, hence following the package if it has failed over.

o For databases : SAP/DBA will consult the tools team on the backup strategy; usually OmniBack is configured to detect and back up by floating IP.

    Issues with BAMM ??


Project Timeline (TAT)
- Gathering information - 2 days
- Hardware setup (LAN) - 2 days
- Configuration - 3 days (varies; dependencies : application/DB scripts)
- Testing - 1 day (requires CE presence)


Configure /etc/rc.config.d/netconf on each of the nodes in the cluster with the heartbeat LAN (if using LAN and not a serial interface).

    !"#

    #

    !"

    #

    $%&'&($)*+,-.+/)$&0$%&&)%$1&($.-23*&)

    $.-23*&%4&5*$$%54$6&5*$$%5*4$0%0$&(4$.%4&$)&0-%4&(30%&*%$-

    -%0&.)$%4&*$)&0-%4&(30%&*

    2222

    $)&*$$%$)&*$$%$)&*$$%$)&*$$%

    $)&*$$%$)&*$$%$)&*$$%$)&*$$%

    Sgpue036.sgp.hp.com root

    Sgpue037.sgp.hp.com root

    $%&'%&*%-7&+8&((*&%&%4&5&%(5(6(30%$%&'%&*%-7&+8&((*&%&%4&5&%(5(6(30%$%&'%&*%-7&+8&((*&%&%4&5&%(5(6(30%$%&'%&*%-7&+8&((*&%&%4&5&%(5(6(30%&*5(6($)&-0%.-&$&*5(6($)&-0%.-&$&*5(6($)&-0%.-&$&*5(6($)&-0%.-&$

    $)&04-0-0&(&00*+.$*%4&(30%&*%$-)&%-.+-%0$)&0$)&04-0-0&(&00*+.$*%4&(30%&*%$-)&%-.+-%0$)&0$)&04-0-0&(&00*+.$*%4&(30%&*%$-)&%-.+-%0$)&0$)&04-0-0&(&00*+.$*%4&(30%&*%$-)&%-.+-%0$)&0
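A minimal sketch of the netconf entries for the dedicated heartbeat interface, assuming the lan2 / 192.0.0.x addressing shown in the network design diagram (the array index 1 is arbitrary; use the next free slot on each node):

    # /etc/rc.config.d/netconf on sgpue036 (use 192.0.0.2 on sgpue037)
    INTERFACE_NAME[1]=lan2
    IP_ADDRESS[1]=192.0.0.1
    SUBNET_MASK[1]=255.255.255.0
    BROADCAST_ADDRESS[1]=""
    INTERFACE_STATE[1]=up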

    Unmount Logical Volumes and deactivate the Volume Groups that will be controlled/run by the cluster.(These do not need to be entered in /etc/fstab)

E.g.
1. vgchange -a n vg02
2. vgchange -a n vg03

Note : It is possible that a cluster does not have any cluster lock disk or even a VG at all.

Same for packages. Also, each VG must be unique to each package; the same VG cannot be used for other packages.

    Export and distribute the Volume Groups to the secondary (failover) node.

    E.g.

1. vgexport -p -v -s -m /tmp/vg02.map /dev/vg02
2. vgexport -p -v -s -m /tmp/vg03.map /dev/vg03


    -p option : preview mode, so that the volume group will not be exported

    off the original node.

-s option : sharable option, Series 800 only. When the -s option is
            specified, then the -p, -v, and -m options must also be
            specified. A mapfile is created that can be used to
            create volume group entries on other systems in the high
            availability cluster (with the vgimport command).

-m option : generates the map file

-v option : print verbose

    FTP the .map files to secondary (failover) node.

    On Secondary (failover) node, create the volume group directories:

    E.g.

3. mkdir /dev/vg02
4. mkdir /dev/vg03
5. ls -l /dev/*/group
6. mknod /dev/vg02/group c 64 0x020000
7. mknod /dev/vg03/group c 64 0x030000

Import the volume groups onto the secondary (failover) node.
E.g.
8. vgimport -s -m /tmp/vg02.map /dev/vg02
9. vgimport -s -m /tmp/vg03.map /dev/vg03

Note : Leave the cluster volume groups deactivated.
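To confirm the imports registered on the failover node without activating anything, the LVM table can be inspected (lvmtab is a binary file, hence strings):

    strings /etc/lvmtab    # lists each VG and its physical volume device files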

Configure the Cluster (do this on one node).

cmquerycl [-w full] -v -C /etc/cmcluster/cluster.conf -n <primary node> -n <secondary node> [-n <other nodes in the cluster>]

    (Note : This will generate the cluster config file.)
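With the node names used throughout this document, the concrete invocation would be:

    cmquerycl -v -C /etc/cmcluster/cluster.conf -n sgpue036 -n sgpue037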

Edit the /etc/cmcluster/cluster.conf file:

    # **********************************************************************

    # ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************


    # ***** For complete details about cluster parameters and how to ****

    # ***** set them, consult the ServiceGuard manual. ****

    # **********************************************************************

    # Enter a name for this cluster. This name will be used to identify the

    # cluster when viewing or manipulating it.

    CLUSTER_NAME Kcdatabases

    # Cluster Lock Parameters

    #

    # The cluster lock is used as a tie-breaker for situations

    # in which a running cluster fails, and then two equal-sized

    # sub-clusters are both trying to form a new cluster. The

# cluster lock may be configured using either a lock disk
# or a quorum server.

    #

    # You can use either the quorum server or the lock disk as

    # a cluster lock but not both in the same cluster.

    #

# Consider the following when configuring a cluster.
# For a two-node cluster, you must use a cluster lock. For

    # a cluster of three or four nodes, a cluster lock is strongly

    # recommended. For a cluster of more than four nodes, a

    # cluster lock is recommended. If you decide to configure

    # a lock for a cluster of more than four nodes, it must be

    # a quorum server.

    # Lock Disk Parameters. Use the FIRST_CLUSTER_LOCK_VG and

    # FIRST_CLUSTER_LOCK_PV parameters to define a lock disk.

    # The FIRST_CLUSTER_LOCK_VG is the LVM volume group that

    # holds the cluster lock. This volume group should not be

    # used by any other cluster as a cluster lock device.

# Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
# and QS_TIMEOUT_EXTENSION parameters to define a quorum server.

    # The QS_HOST is the host name or IP address of the system

    # that is running the quorum server process. The

    # QS_POLLING_INTERVAL (microseconds) is the interval at which

    # ServiceGuard checks to make sure the quorum server is running.

    # The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase

    # the time interval after which the quorum server is marked DOWN.

    #

    # The default quorum server timeout is calculated from the

    # ServiceGuard cluster parameters, including NODE_TIMEOUT and

    # HEARTBEAT_INTERVAL. If you are experiencing quorum server

    # timeouts, you can adjust these parameters, or you can include

    # the QS_TIMEOUT_EXTENSION parameter.

    #

    # For example, to configure a quorum server running on node

    # "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to

    # add 2 seconds to the system assigned value for the quorum server

    # timeout, enter:

    #

    # QS_HOST qshost

    # QS_POLLING_INTERVAL 120000000

    # QS_TIMEOUT_EXTENSION 2000000

FIRST_CLUSTER_LOCK_VG /dev/vg02    <-- This is automatically searched for.


# Definition of nodes in the cluster.
# Repeat node definitions as necessary for additional nodes.

    NODE_NAME sgpue036

    NETWORK_INTERFACE lan0

HEARTBEAT_IP 192.0.0.1


    # Enter the maximum number of packages which will be configured in the cluster.

    # You can not add packages beyond this limit.

    # This parameter is required.

    MAX_CONFIGURED_PACKAGES 8

# List of cluster aware LVM Volume Groups. These volume groups will
# be used by package applications via the vgchange -a e command.

    # Neither CVM or VxVM Disk Groups should be used here.

    # For example:

    # VOLUME_GROUP /dev/vgdatabase

    # VOLUME_GROUP /dev/vg02

    VOLUME_GROUP /dev/vg02

    VOLUME_GROUP /dev/vg03

Verify the Cluster Configuration (do this on one node)
1. cmcheckconf [-k] -v -C /etc/cmcluster/cluster.conf

Note : If there are no errors, it means that the cluster configuration is ready to be applied.

Distributing the Binary Configuration File (do this on one node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-k] -v -C /etc/cmcluster/cluster.conf
3. vgchange -a n /dev/vg02

Note : The cluster lock volume group needs to be activated in order for the configuration to be applied for first-time clusters. Subsequent changes to the cluster may not need the cluster lock activated, or may not even need the cluster to be taken down, i.e. they can be done online, but this is not recommended.

    Note : Need to deactivate cluster lock disk right after cluster changes are applied.

    Backing up Volume Group and Cluster Lock Configuration Data (optional)

1. vgcfgbackup -u /dev/vg02
2. vgcfgbackup -u /dev/vg03

Note : This does not require the volume groups to be activated.

    Checking Cluster Operation (do on either node)

1. cmruncl -v
2. cmhaltnode -v <primary node>
3. cmrunnode -v <primary node>
4. cmhaltcl -v
5. cmruncl -v
6. cmhaltcl -v

    Note : Try this on all other nodes in the cluster as well.

    Disable Automount of Volume Groups (On both nodes)

    1. Edit /etc/lvmrc file and set AUTO_VG_ACTIVATE=0

Note : This is necessary as we do not want the cluster volume groups to be activated when a system reboots; they are now under the control of the cluster.
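With AUTO_VG_ACTIVATE=0, any volume groups that are NOT under cluster control must instead be activated from the custom_vg_activation function in the same file. A minimal hedged sketch (vg01 is a placeholder for a local, non-cluster VG):

    AUTO_VG_ACTIVATE=0

    custom_vg_activation()
    {
            # Activate/sync only local (non-cluster) volume groups here, e.g.:
            # parallel_vg_sync "/dev/vg01"
            return 0
    }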


    Disable Autostart Features (On both nodes)

    1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=0

    Note : This is to prevent the cluster node from automatically joining the cluster after a

    reboot. Usually done when doing maintenance.

    Create Packages

E.g.
1. mkdir /etc/cmcluster/kci2prd                 <- can be any name
2. cmmakepkg -p /etc/cmcluster/kci2prd.conf     <- can be any name
3. Edit the configuration file

Note : If the package and control file is special (e.g. NFS required) then do not run the cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS extension toolkit (similar for the SAP extension). You still need to make adjustments to the files to suit your needs.

    # **********************************************************************

    # ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template) *******

    # **********************************************************************

    # ******* Note: This file MUST be edited before it can be used. ********

    # * For complete details about package parameters and how to set them, *

    # * consult the MC/ServiceGuard ServiceGuard OPS Edition manuals *******

    # **********************************************************************

    # Enter a name for this package. This name will be used to identify the

    # package when viewing or manipulating it. It must be different from

    # the other configured package names.

    PACKAGE_NAME kci2prd

    # Enter the package type for this package. PACKAGE_TYPE indicates

    # whether this package is to run as a FAILOVER or SYSTEM_MULTI_NODE

# package.
#

    # FAILOVER package runs on one node at a time and if a failure

    # occurs it can switch to an alternate node.

    #

    # SYSTEM_MULTI_NODE

    # package runs on multiple nodes at the same time.

    # It can not be started and halted on individual nodes.

    # Both NODE_FAIL_FAST_ENABLED and AUTO_RUN must be set

    # to YES for this type of package. All SERVICES must

    # have SERVICE_FAIL_FAST_ENABLED set to YES.

    #

    # NOTE: Packages which have a PACKAGE_TYPE of SYSTEM_MULTI_NODE are

    # not failover packages and should only be used for applications

# provided by Hewlett-Packard.
#

    # Since SYSTEM_MULTI_NODE packages run on multiple nodes at

    # one time, following parameters are ignored:

    #

    # FAILOVER_POLICY

    # FAILBACK_POLICY

    #

# Since an IP address can not be assigned to more than one node at a

    # time, relocatable IP addresses can not be assigned in the

    # package control script for multiple node packages. If


    # volume groups are assigned to multiple node packages they must

    # activated in a shared mode and data integrity is left to the

    # application. Shared access requires a shared volume manager.

    #

    #

    # Examples : PACKAGE_TYPE FAILOVER (default)

    # PACKAGE_TYPE SYSTEM_MULTI_NODE

    #

    PACKAGE_TYPE FAILOVER

    # Enter the failover policy for this package. This policy will be used

    # to select an adoptive node whenever the package needs to be started.

    # The default policy unless otherwise specified is CONFIGURED_NODE.

    # This policy will select nodes in priority order from the list of

    # NODE_NAME entries specified below.

    #

    # The alternative policy is MIN_PACKAGE_NODE. This policy will select

    # the node, from the list of NODE_NAME entries below, which is

    # running the least number of packages at the time this package needs

    # to start.

    FAILOVER_POLICY CONFIGURED_NODE

    # Enter the failback policy for this package. This policy will be used

    # to determine what action to take when a package is not running on

    # its primary node and its primary node is capable of running the

    # package. The default policy unless otherwise specified is MANUAL.

    # The MANUAL policy means no attempt will be made to move the package

    # back to its primary node when it is running on an adoptive node.

    #

    # The alternative policy is AUTOMATIC. This policy will attempt to

    # move the package back to its primary node whenever the primary node

    # is capable of running the package.

    FAILBACK_POLICY MANUAL

    # Enter the names of the nodes configured for this package. Repeat

    # this line as necessary for additional adoptive nodes.

    #

    # NOTE: The order is relevant.

    # Put the second Adoptive Node after the first one.

    #

    # Example : NODE_NAME original_node

    # NODE_NAME adoptive_node

    #

    # If all nodes in cluster is to be specified and order is not

    # important, "NODE_NAME *" may be specified.

    #

    # Example : NODE_NAME *

NODE_NAME sgpue036
NODE_NAME sgpue037

    # Enter the value for AUTO_RUN. Possible values are YES and NO.

    # The default for AUTO_RUN is YES. When the cluster is started the

    # package will be automatically started. In the event of a failure the


    # package will be started on an adoptive node. Adjust as necessary.

    #

    # AUTO_RUN replaces obsolete PKG_SWITCHING_ENABLED.

    AUTO_RUN YES

    # Enter the value for LOCAL_LAN_FAILOVER_ALLOWED.

    # Possible values are YES and NO.

    # The default for LOCAL_LAN_FAILOVER_ALLOWED is YES. In the event of a

    # failure, this permits the cluster software to switch LANs locally

    # (transfer to a standby LAN card). Adjust as necessary.

    #

    # LOCAL_LAN_FAILOVER_ALLOWED replaces obsolete NET_SWITCHING_ENABLED.

    LOCAL_LAN_FAILOVER_ALLOWED YES

    # Enter the value for NODE_FAIL_FAST_ENABLED.

    # Possible values are YES and NO.

    # The default for NODE_FAIL_FAST_ENABLED is NO. If set to YES,

    # in the event of a failure, the cluster software will halt the node

# on which the package is running. All SYSTEM_MULTI_NODE packages must have
# NODE_FAIL_FAST_ENABLED set to YES. Adjust as necessary.

    NODE_FAIL_FAST_ENABLED NO

    # Enter the complete path for the run and halt scripts. In most cases

    # the run script and halt script specified here will be the same script,

    # the package control script generated by the cmmakepkg command. This

    # control script handles the run(ning) and halt(ing) of the package.

    # Enter the timeout, specified in seconds, for the run and halt scripts.

    # If the script has not completed by the specified timeout value,

    # it will be terminated. The default for each script timeout is

    # NO_TIMEOUT. Adjust the timeouts as necessary to permit full

    # execution of each script.

# Note: The HALT_SCRIPT_TIMEOUT should be greater than the sum of
# all SERVICE_HALT_TIMEOUT specified for all services.

    RUN_SCRIPT /etc/cmcluster/kci2prd/kci2prd.cntl

RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT /etc/cmcluster/kci2prd/kci2prd.cntl

    HALT_SCRIPT_TIMEOUT NO_TIMEOUT

    # Enter the names of the storage groups configured for this package.

    # Repeat this line as necessary for additional storage groups.

    #

    # Storage groups are only used with CVM disk groups. Neither

    # VxVM disk groups or LVM volume groups should be listed here.

    # By specifying a CVM disk group with the STORAGE_GROUP keyword

    # this package will not run until the VxVM-CVM-pkg package is

    # running and thus the CVM shared disk groups are ready for

    # activation.

    #

    # NOTE: Should only be used by applications provided by

    # Hewlett-Packard.

    #

    # Example : STORAGE_GROUP dg01

    # STORAGE_GROUP dg02

    # STORAGE_GROUP dg03


    # STORAGE_GROUP dg04

    #

    # Enter the SERVICE_NAME, the SERVICE_FAIL_FAST_ENABLED and the

    # SERVICE_HALT_TIMEOUT values for this package. Repeat these

    # three lines as necessary for additional service names. All

    # service names MUST correspond to the service names used by

    # cmrunserv and cmhaltserv commands in the run and halt scripts.

    #

    # The value for SERVICE_FAIL_FAST_ENABLED can be either YES or

    # NO. If set to YES, in the event of a service failure, the

    # cluster software will halt the node on which the service is

    # running. If SERVICE_FAIL_FAST_ENABLED is not specified, the

    # default will be NO.

    #

    # SERVICE_HALT_TIMEOUT is represented in the number of seconds.

    # This timeout is used to determine the length of time (in

    # seconds) the cluster software will wait for the service to

    # halt before a SIGKILL signal is sent to force the termination

    # of the service. In the event of a service halt, the cluster

    # software will first send a SIGTERM signal to terminate the

# service. If the service does not halt, after waiting for the
# specified SERVICE_HALT_TIMEOUT, the cluster software will send

    # out the SIGKILL signal to the service to force its termination.

    # This timeout value should be large enough to allow all cleanup

    # processes associated with the service to complete. If the

    # SERVICE_HALT_TIMEOUT is not specified, a zero timeout will be

    # assumed, meaning the cluster software will not wait at all

    # before sending the SIGKILL signal to halt the service.

    #

    # Example: SERVICE_NAME DB_SERVICE

    # SERVICE_FAIL_FAST_ENABLED NO

    # SERVICE_HALT_TIMEOUT 300

    #

    # To configure a service, uncomment the following lines and

# fill in the values for all of the keywords.
#

SERVICE_NAME kci2prd
SERVICE_FAIL_FAST_ENABLED NO

    SERVICE_HALT_TIMEOUT 300

    # Enter the network subnet name that is to be monitored for this package.

    # Repeat this line as necessary for additional subnet names. If any of

    # the subnets defined goes down, the package will be switched to another

    # node that is configured for this package and has all the defined subnets

    # available.

    SUBNET 15.209.0.0

    # The keywords RESOURCE_NAME, RESOURCE_POLLING_INTERVAL,

    # RESOURCE_START, and RESOURCE_UP_VALUE are used to specify Package

    # Resource Dependencies. To define a package Resource Dependency, a

    # RESOURCE_NAME line with a fully qualified resource path name, and

    # one or more RESOURCE_UP_VALUE lines are required. The

    # RESOURCE_POLLING_INTERVAL and the RESOURCE_START are optional.

    #

    # The RESOURCE_POLLING_INTERVAL indicates how often, in seconds, the

    # resource is to be monitored. It will be defaulted to 60 seconds if


    # RESOURCE_POLLING_INTERVAL is not specified.

    #

    # The RESOURCE_START option can be set to either AUTOMATIC or DEFERRED.

    # The default setting for RESOURCE_START is AUTOMATIC. If AUTOMATIC

    # is specified, ServiceGuard will start up resource monitoring for

    # these AUTOMATIC resources automatically when the node starts up.

    # If DEFERRED is selected, ServiceGuard will not attempt to start

    # resource monitoring for these resources during node start up. User

    # should specify all the DEFERRED resources in the package run script

    # so that these DEFERRED resources will be started up from the package

    # run script during package run time.

    #

# RESOURCE_UP_VALUE requires an operator and a value. This defines
# the resource 'UP' condition. The operators are =, !=, >, <, >=,
# and <=, depending on the type of value. If a range is to be
# specified, only > or >= may be used for the first operator, and
# only < or <= may be used for the second operator. For example,
#
# RESOURCE_UP_VALUE > 5.1             greater than 5.1 (threshold)
# RESOURCE_UP_VALUE > -5 and < 10     between -5 and 10 (range)

    #

    # Note that "and" is required between the lower limit and upper limit# when specifying a range. The upper limit must be greater than the lower

    # limit. If RESOURCE_UP_VALUE is repeated within a RESOURCE_NAME block, then

    # they are inclusively OR'd together. Package Resource Dependencies may be

    # defined by repeating the entire RESOURCE_NAME block.

    #

    # Example : RESOURCE_NAME /net/interfaces/lan/status/lan0

    # RESOURCE_POLLING_INTERVAL 120

    # RESOURCE_START AUTOMATIC

    # RESOURCE_UP_VALUE = RUNNING

    # RESOURCE_UP_VALUE = ONLINE

    #

    # Means that the value of resource /net/interfaces/lan/status/lan0

    # will be checked every 120 seconds, and is considered to

    # be 'up' when its value is "RUNNING" or "ONLINE".

    #

    # Uncomment the following lines to specify Package Resource Dependencies.

    #

    #RESOURCE_NAME

    #RESOURCE_POLLING_INTERVAL

    #RESOURCE_START

    #RESOURCE_UP_VALUE [and ]


Create Package Control Scripts
1. cmmakepkg -s /etc/cmcluster/kci2prd/kci2prd.cntl
2. Edit the control script.

Note : If the package and control file is special (e.g. NFS required) then do not run the cmmakepkg command; just get the predefined config and control scripts from the MC/SG NFS extension toolkit (similar for the SAP extension). You still need to make adjustments to the files to suit your needs.

    Note : It is possible that packages do not use any volume groups.

    # **********************************************************************

    # * *

    # * HIGH AVAILABILITY PACKAGE CONTROL SCRIPT (template) *

    # * *

    # * Note: This file MUST be edited before it can be used. *

    # * *

    # **********************************************************************

    # The PACKAGE and NODE environment variables are set by

    # ServiceGuard at the time the control script is executed.

    # Do not set these environment variables yourself!

    # The package may fail to start or halt if the values for

    # these environment variables are altered.

    # UNCOMMENT the variables as you set them.

    # Set PATH to reference the appropriate directories.


    PATH=/usr/bin:/usr/sbin:/etc:/bin

    # VOLUME GROUP ACTIVATION:

    # Specify the method of activation for volume groups.

    # Leave the default ("VGCHANGE="vgchange -a e") if you want volume

    # groups activated in exclusive mode. This assumes the volume groups have

    # been initialized with 'vgchange -c y' at the time of creation.

    #

    # Uncomment the first line (VGCHANGE="vgchange -a e -q n"), and comment

    # out the default, if your disks are mirrored on separate physical paths,

    #

    # Uncomment the second line (VGCHANGE="vgchange -a e -q n -s"), and comment

    # out the default, if your disks are mirrored on separate physical paths,

# and you want the mirror resynchronization to occur in parallel with

    # the package startup.

    #

    # Uncomment the third line (VGCHANGE="vgchange -a y") if you wish to

    # use non-exclusive activation mode. Single node cluster configurations

    # must use non-exclusive activation.

    #

    # VGCHANGE="vgchange -a e -q n"

    # VGCHANGE="vgchange -a e -q n -s"

    # VGCHANGE="vgchange -a y"VGCHANGE="vgchange -a e" # Default

    # CVM DISK GROUP ACTIVATION:

    # Specify the method of activation for CVM disk groups.

    # Leave the default

    # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite")

    # if you want disk groups activated in the exclusive write mode.

    #

    # Uncomment the first line

    # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"),

    # and comment out the default, if you want disk groups activated in

    # the readonly mode.

    #

# Uncomment the second line
# (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"),

    # and comment out the default, if you want disk groups activated in the

    # shared read mode.

    #

    # Uncomment the third line

    # (CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"),

    # and comment out the default, if you want disk groups activated in the

    # shared write mode.

    #

    # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=readonly"

    # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedread"

    # CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sharedwrite"

    CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite"

    # VOLUME GROUPS

    # Specify which volume groups are used by this package. Uncomment VG[0]=""

    # and fill in the name of your first volume group. You must begin with

    # VG[0], and increment the list in sequence.

    #

    # For example, if this package uses your volume groups vg01 and vg02, enter:

    # VG[0]=vg01

    # VG[1]=vg02

    #

    # The volume group activation method is defined above. The filesystems


    # associated with these volume groups are specified below.

    #

VG[0]=vg02
VG[1]=vg03

    # CVM DISK GROUPS

    # Specify which cvm disk groups are used by this package. Uncomment

    # CVM_DG[0]="" and fill in the name of your first disk group. You must

    # begin with CVM_DG[0], and increment the list in sequence.

    #

    # For example, if this package uses your disk groups dg01 and dg02, enter:

    # CVM_DG[0]=dg01

    # CVM_DG[1]=dg02

    #

    # The cvm disk group activation method is defined above. The filesystems

    # associated with these volume groups are specified below in the CVM_*

    # variables.

    #

    #CVM_DG[0]=""

    # VxVM DISK GROUPS

    # Specify which VxVM disk groups are used by this package. Uncomment

# VXVM_DG[0]="" and fill in the name of your first disk group. You must
# begin with VXVM_DG[0], and increment the list in sequence.

    #

    # For example, if this package uses your disk groups dg01 and dg02, enter:

    # VXVM_DG[0]=dg01

    # VXVM_DG[1]=dg02

    #

    # The cvm disk group activation method is defined above.

    #

    #VXVM_DG[0]=""

    #

    # NOTE: A package could have LVM volume groups, CVM disk groups and VxVM

    # disk groups.

#
# FILESYSTEMS

    # Specify the filesystems which are used by this package. Uncomment

    # LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]="" and fill in the name of your first

    # logical volume, filesystem and mount option for the file system. You must

    # begin with LV[0], FS[0] and FS_MOUNT_OPT[0] and increment the list in

    # sequence.

    #

    # For the LVM example, if this package uses the file systems pkg1a and

    # pkg1b, which are mounted on the logical volumes lvol1 and lvol2 with

    # read and write options enter:

    # LV[0]=/dev/vg01/lvol1; FS[0]=/pkg1a; FS_MOUNT_OPT[0]="-o rw"

    # LV[1]=/dev/vg01/lvol2; FS[1]=/pkg1b; FS_MOUNT_OPT[1]="-o rw"

    #

    # For the CVM or VxVM example, if this package uses the file systems

    # pkg1a and pkg1b, which are mounted on the volumes lvol1 and lvol2

    # with read and write options enter:

    # LV[0]="/dev/vx/dsk/dg01/vol01"; FS[0]="/pkg1a"; FS_MOUNT_OPT[0]="-o rw"

    # LV[1]="/dev/vx/dsk/dg01/vol02"; FS[1]="/pkg1b"; FS_MOUNT_OPT[1]="-o rw"

    #

    # The filesystems are defined as triplets of entries specifying the logical

    # volume, the mount point and the mount options for the file system. Each

    # filesystem will be fsck'd prior to being mounted. The filesystems will be

    # mounted in the order specified during package startup and will be unmounted

    # in reverse order during package shutdown. Ensure that volume groups


    # referenced by the logical volume definitions below are included in

    # volume group definitions above.

    #

    #LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]=""

LV[0]=/dev/vg02/lvol1; FS[0]=/oracle/KCI2PRD/data01; FS_MOUNT_OPT[0]="-o rw,suid,largefiles"
LV[1]=/dev/vg02/lvol2; FS[1]=/oracle/KCI2PRD/data02; FS_MOUNT_OPT[1]="-o rw,suid,largefiles"
LV[2]=/dev/vg02/lvol3; FS[2]=/oracle/KCI2PRD/data03; FS_MOUNT_OPT[2]="-o rw,suid,largefiles"
LV[3]=/dev/vg02/lvol4; FS[3]=/oracle/KCI2PRD/data04; FS_MOUNT_OPT[3]="-o rw,suid,largefiles"
LV[4]=/dev/vg02/lvol5; FS[4]=/oracle/KCI2PRD/data05; FS_MOUNT_OPT[4]="-o rw,suid,largefiles"
LV[5]=/dev/vg02/lvol6; FS[5]=/oracle/KCI2PRD/data06; FS_MOUNT_OPT[5]="-o rw,suid,largefiles"
LV[6]=/dev/vg02/lvol7; FS[6]=/oracle/KCI2PRD/data07; FS_MOUNT_OPT[6]="-o rw,suid,largefiles"
LV[7]=/dev/vg02/lvol8; FS[7]=/oracle/KCI2PRD/data08; FS_MOUNT_OPT[7]="-o rw,suid,largefiles"
LV[8]=/dev/vg02/lvol9; FS[8]=/oracle/KCI2PRD/data09; FS_MOUNT_OPT[8]="-o rw,suid,largefiles"
LV[9]=/dev/vg02/lvol10; FS[9]=/oracle/KCI2PRD/data10; FS_MOUNT_OPT[9]="-o rw,suid,largefiles"
LV[10]=/dev/vg02/lvol11; FS[10]=/oracle/KCI2PRD/mirrlogA; FS_MOUNT_OPT[10]="-o rw,suid,largefiles"
LV[11]=/dev/vg02/lvol12; FS[11]=/oracle/KCI2PRD/mirrlogB; FS_MOUNT_OPT[11]="-o rw,suid,largefiles"
LV[12]=/dev/vg02/lvol13; FS[12]=/oracle/KCI2PRD/origlogA; FS_MOUNT_OPT[12]="-o rw,suid,largefiles"
LV[13]=/dev/vg02/lvol14; FS[13]=/oracle/KCI2PRD/origlogB; FS_MOUNT_OPT[13]="-o rw,suid,largefiles"
LV[14]=/dev/vg03/lvol1; FS[14]=/oracle/KCI2PRD/arch; FS_MOUNT_OPT[14]="-o rw,suid,largefiles"
LV[15]=/dev/vg03/lvol2; FS[15]=/oracle/KCI2PRD/bkup01; FS_MOUNT_OPT[15]="-o rw,suid,largefiles"

    #

    # VOLUME RECOVERY

    #

    # When mirrored VxVM volumes are started during the package control

    # bring up, if recovery is required the default behavior is for

    # the package control script to wait until recovery has been

    # completed.

    #

# To allow mirror resynchronization to occur in parallel with

    # the package startup, uncomment the line

    # VXVOL="vxvol -g \$DiskGroup -o bg startall" and comment out the default.#

    # VXVOL="vxvol -g \$DiskGroup -o bg startall"

    VXVOL="vxvol -g \$DiskGroup startall" # Default

    # FILESYSTEM UNMOUNT COUNT

    # Specify the number of unmount attempts for each filesystem during package

    # shutdown. The default is set to 1.

    FS_UMOUNT_COUNT=1

    # FILESYSTEM MOUNT RETRY COUNT.

    # Specify the number of mount retrys for each filesystem.

    # The default is 0. During startup, if a mount point is busy

    # and FS_MOUNT_RETRY_COUNT is 0, package startup will fail and

    # the script will exit with 1. If a mount point is busy and

    # FS_MOUNT_RETRY_COUNT is greater than 0, the script will attempt

    # to kill the user responsible for the busy mount point

    # and then mount the file system. It will attempt to kill user and

    # retry mount, for the number of times specified in FS_MOUNT_RETRY_COUNT.

    # If the mount still fails after this number of attempts, the script

    # will exit with 1.

    # NOTE: If the FS_MOUNT_RETRY_COUNT > 0, the script will execute

    # "fuser -ku" to freeup busy mount point.

    FS_MOUNT_RETRY_COUNT=0


    # CONCURRENT VGCHANGE OPERATIONS

    # Specify the number of concurrent volume group activations or

    # deactivations to allow during package startup or shutdown.

    # Setting this value to an appropriate number may improve the performance

    # while activating or deactivating a large number of volume groups in the

    # package. If the specified value is less than 1, the script defaults it

    # to 1 and proceeds with a warning message in the package control script

    # logfile.

    CONCURRENT_VGCHANGE_OPERATIONS=1

    # CONCURRENT DISK GROUP OPERATIONS

    # Specify the number of concurrent VxVM DG imports or deports to allow

    # during package startup or shutdown.

    # Setting this value to an appropriate number may improve the performance

    # while importing or deporting a large number of disk groups in the

    # package. If the specified value is less than 1, the script defaults it

    # to 1 and proceeds with a warning message in the package control script

    # logfile.

    CONCURRENT_DISKGROUP_OPERATIONS=1

    # CONCURRENT FSCK OPERATIONS

# Specify the number of concurrent fsck to allow during package startup.
# Setting this value to an appropriate number may improve the performance

    # while checking a large number of file systems in the package. If the

    # specified value is less than 1, the script defaults it to 1 and proceeds

    # with a warning message in the package control script logfile.

    CONCURRENT_FSCK_OPERATIONS=1

    # CONCURRENT MOUNT AND UMOUNT OPERATIONS

    # Specify the number of concurrent mounts and umounts to allow during

    # package startup or shutdown.

    # Setting this value to an appropriate number may improve the performance

    # while mounting or un-mounting a large number of file systems in the package.

    # If the specified value is less than 1, the script defaults it to 1 and

    # proceeds with a warning message in the package control script logfile.

    CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS=1

    # IP ADDRESSES

    # Specify the IP and Subnet address pairs which are used by this package.

    # Uncomment IP[0]="" and SUBNET[0]="" and fill in the name of your first

    # IP and subnet address. You must begin with IP[0] and SUBNET[0] and

    # increment the list in sequence.

    #

    # For example, if this package uses an IP of 192.10.25.12 and a subnet of

    # 192.10.25.0 enter:

    # IP[0]=192.10.25.12

    # SUBNET[0]=192.10.25.0 # (netmask=255.255.255.0)

    #

    # Hint: Run "netstat -i" to see the available subnets in the Network field.

    #

    # IP/Subnet address pairs for each IP address you want to add to a subnet

    # interface card. Must be set in pairs, even for IP addresses on the same

    # subnet.

    #

    #IP[0]=""

    #SUBNET[0]=""

    IP[0]="15.209.0.33"

    SUBNET[0]="15.209.0.0" # netmask 255.255.255.192


    # SERVICE NAMES AND COMMANDS.

    # Specify the service name, command, and restart parameters which are

    # used by this package. Uncomment SERVICE_NAME[0]="", SERVICE_CMD[0]="",

    # SERVICE_RESTART[0]="" and fill in the name of the first service, command,

    # and restart parameters. You must begin with SERVICE_NAME[0], SERVICE_CMD[0],

    # and SERVICE_RESTART[0] and increment the list in sequence.

    #

    # For example:

    # SERVICE_NAME[0]=pkg1a

    # SERVICE_CMD[0]="/usr/bin/X11/xclock -display 192.10.25.54:0"

    # SERVICE_RESTART[0]="" # Will not restart the service.

    #

    # SERVICE_NAME[1]=pkg1b

    # SERVICE_CMD[1]="/usr/bin/X11/xload -display 192.10.25.54:0"

    # SERVICE_RESTART[1]="-r 2" # Will restart the service twice.

    #

    # SERVICE_NAME[2]=pkg1c

    # SERVICE_CMD[2]="/usr/sbin/ping"

    # SERVICE_RESTART[2]="-R" # Will restart the service an infinite

    # number of times.

    #

    # Note: No environmental variables will be passed to the command, this

# includes the PATH variable. Absolute path names are required for the
# service command definition. Default shell is /usr/bin/sh.

    #

    #SERVICE_NAME[0]=""

    #SERVICE_CMD[0]=""

    #SERVICE_RESTART[0]=""

SERVICE_NAME[0]=kci2prd
SERVICE_CMD[0]="/etc/cmcluster/kci2prd/kci2prd.sh monitor"
SERVICE_RESTART[0]=""

    # DEFERRED_RESOURCE NAME

    # Specify the full path name of the 'DEFERRED' resources configured for

# this package. Uncomment DEFERRED_RESOURCE_NAME[0]="" and fill in the
# full path name of the resource.

    #

    #DEFERRED_RESOURCE_NAME[0]=""

    # DTC manager information for each DTC.

    # Example: DTC[0]=dtc_20

    #DTC_NAME[0]=

    # START OF CUSTOMER DEFINED FUNCTIONS

    # This function is a place holder for customer define functions.

    # You should define all actions you want to happen here, before the service is

    # started. You can create as many functions as you need.

    function customer_defined_run_cmds

    {

    # ADD customer defined run commands.

    : # do nothing instruction, because a function must contain some command.

    /etc/cmcluster/kci2prd/kci2prd.sh start

    test_return 51

    }


    # This function is a place holder for customer define functions.

    # You should define all actions you want to happen here, before the service is

    # halted.

    function customer_defined_halt_cmds

    {

    # ADD customer defined halt commands.

    : # do nothing instruction, because a function must contain some command.

    /etc/cmcluster/kci2prd/kci2prd.sh shutdown

    test_return 52

    }

    # END OF CUSTOMER DEFINED FUNCTIONS

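The control script above hands off to /etc/cmcluster/kci2prd/kci2prd.sh with start, shutdown, and monitor arguments, but that script is not reproduced in this document. A minimal hedged skeleton of what such an application script typically looks like (the Oracle process name and the start/stop actions are placeholders, not the actual HPMS script):

    #!/usr/bin/sh
    # /etc/cmcluster/kci2prd/kci2prd.sh -- illustrative skeleton only

    case "$1" in
    start)
            # Placeholder: start the application/database here.
            ;;
    shutdown)
            # Placeholder: stop the application/database here.
            ;;
    monitor)
            # Loop for as long as the monitored process is alive; exiting
            # makes MC/ServiceGuard treat the service (and package) as failed.
            while true
            do
                    ps -ef | grep -v grep | grep ora_pmon_KCI2PRD > /dev/null || exit 1
                    sleep 30
            done
            ;;
    *)
            echo "usage: $0 {start|shutdown|monitor}"
            exit 1
            ;;
    esac
    exit 0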

    Ftp all ascii scripts to secondary (failover) node/nodes.

Verify the Cluster Configuration (Do this on the package's primary node)
cmcheckconf [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf

Note : If there are no errors, it means that the package is ready to be applied.

Distribute the Cluster Configuration File (Do this on the package's primary node)
1. vgchange -a y /dev/vg02 (cluster lock volume group)
2. cmapplyconf [-v] [-C /etc/cmcluster/cluster.conf] -P /etc/cmcluster/kci2prd/kci2prd.conf
3. vgchange -a n /dev/vg02

    Note : You should not need to activate and later deactivate cluster lock volume group while

    applying packages.

Note : Repeat the steps from Create Packages to here if more packages are required in the cluster.

Configure Automounter (Do this only if your system is using the automounter)
Check that in /etc/rc.config.d/nfsconf, the automounter section reads:

    AUTOMOUNT=1

    AUTOMASTER="/etc/auto_master"

    AUTOMOUNT_OPTIONS="-f $AUTO_MASTER"

    AUTOMOUNTD_OPTIONS=

Check in /etc/rc.config.d/nfsconf that the NFS client and NFS server daemons are configured to run:

    NFS_CLIENT=1

    NFS_SERVER=1

    NUM_NFSD=4

NUM_NFSIOD=4

Add this line to /etc/auto_master:

    /- /etc/auto.direct

    Create an /etc/auto.direct file

    /oracle :/export/

    Restart the automounter with

    /sbin/init.d/nfs.client stop

    /sbin/init.d/nfs.client start

    Disable Automount of Volume Groups (On both nodes)

    1. Edit /etc/lvmrc file and set AUTO_VG_ACTIVATE=0


    Enable Autostart Features (On both nodes)

    1. Edit /etc/rc.config.d/cmcluster and set AUTOSTART_CMCLD=1

    Checking Package Operation (do on either node)

7. cmruncl -v
8. cmhaltnode -v <primary node> (node will be halted and the package failed over to the secondary (adoptive) node)
9. cmrunnode -v <primary node> (node will rejoin the cluster)
10. cmhaltpkg <package name> (halt the package on the adoptive node)
11. cmrunpkg <package name> (run the package on the original node)
12. cmmodpkg -e <package name> (enable package switching)
13. cmhaltcl -v

Note : Use cmviewcl or cmviewcl -v to view the results of each command.

    MC/ServiceGuard Template

    System Configuration

    Hardware Information

    Hostname

    Model

    Operating System version

    Physical Memory

    Swap Space

    Non-Shared HDs

    Shared HDs

    Tapes

    LAN Cards

    Primary and Standby Network

    Type

Heartbeat Network Type

MC/ServiceGuard Version

    MirrorDisk/UX Version

    Online JFS Version

    Application name / Application

    version

    Database name / Database version

    OS/Appls Patch Level


    System Information

    Server Hostname

    Server IP Address

    Server IP Netmask

    Server Default Router

    Primary Network on separate

    Switch

    Standby Network on separate

    Switch

Operating System File System Layout

Volume Group    Logical Volume    FS Type    Size (MB)    Mount point

    MC/ServiceGuard Configuration

    Cluster Information

    Cluster Name

    Cluster Members

    Cluster Lock Disk

    Heartbeat Interval Default Value is 1

    Node Timeout Default Value is 2 ; recommended 8

    Network Polling Interval Default Value is 2

    Autostart Delay Default Value is 10mins

    Maximum Configured Packages To allow online package reconfiguration

    Packages Overview

The cluster consists of ________ packages:

1.

2.

3.

    Detailed Package Information:

    Package Name

    Re-locatable Hostname

    Re-locatable IP Address

    Monitor Subnet


    Primary Node

    Adoptive Node

    Run/Halt Script

    Run/Halt Script Timeout

    Package Switch Enabled

    Network Switch Enabled

    Node Failfast Enabled

Service Name

Volume Groups

Logical Volume and File System Details

    Device file Size/ Type Mount Point Owner Group Perm.

    Parameter Value

    CLUSTER_NAME

    FIRST_CLUSTER_LOCK_VG

    NODE_NAME

    NETWORK_INTERFACE

    HEARTBEAT_IP

    NETWORK_INTERFACE

    HEARTBEAT_IP

    FIRST_CLUSTER_LOCK_PV

    NODE_NAME

    NETWORK_INTERFACE

    HEARTBEAT_IP

    NETWORK_INTERFACE

    HEARTBEAT_IP


    FIRST_CLUSTER_LOCK_PV

    HEARTBEAT_INTERVAL (Default value is 1s)

    NODE_TIMEOUT (Default value is 2s)

    AUTO_START_TIMEOUT (Default value is 10 mins)

    NETWORK_POLLING_INTERVAL (Default value is 2s)

    MAX_CONFIGURED_PACKAGES (To allow and add for online package

    reconfiguration)

    VOLUME_GROUP

    !"

    Parameter Value

    PACKAGE_NAME

    NODE_NAME

    NODE_NAME

    RUN_SCRIPT

    RUN_SCRIPT_TIMEOUT

    HALT_SCRIPT

    HALT_SCRIPT_TIMEOUT

    SERVICE_NAME

    SUBNET

    AUTO_RUN

    (PKG_SWITCHING_ENABLED)

    YES

    LOCAL_LAN_FAILOVER_ALLOWED

    (NET_SWITCHING_ENABLED)

    YES


    NODE_FAIL_FAST_ENABLED NO

    !#"

    Parameter Value

PATH

    VGCHANGE "vgchange a e"

    VG[0]

    VG[1]

    LV[0]

    LV[1]

    LV[2]

    FS[0]

    FS[1]

    FS[2]

    IP[0]

    SUBNET[0]

    SERVICE_NAME[0]

    SERVICE_CMD[0]

    SERVICE_RESTART[0]

    function

    customer_defined_run_cmds


    function

    customer_defined_halt_cmds

    $%#'"

    Parameter Value

    INFORMIX_HOME or

    ORACLE_HOME

INFORMIX_SESSION_NAME or
ORACLE_SESSION_NAME
(Mount point and session name)

    MONITOR_INTERVAL (Time between checks)

    MONITOR_PROCESSES (Processes like dataserver etc)

    PACKAGE_NAME

TIME_OUT (Waiting time in seconds for the Informix/Oracle abort to complete before killing the Informix/Oracle processes)

Note : If it is Oracle, SAP or NFS, there are pre-defined scripts for these, provided you install the Enterprise Cluster Master Toolkit and NFS toolkit - /opt/cmcluster/


TESTING MC/SERVICEGUARD

    1.1 Test Overview

This section contains the test requirements and test plan for MC/ServiceGuard.

    1.2 Test Requirement

The MC/ServiceGuard product is a High Availability solution that performs system failure detection and transfers the application from the primary node to the adoptive node when a system failure occurs.

Note : We assume that there is only 1 package in the cluster. In the event that there are more packages, please change/add steps accordingly.

    The faults to be tested and the appropriate methods are listed below:

Type of Failure                                     Method of Simulation
CPU, Memory, Power Supply and Operating System      Reset of server
Active LAN                                          Removal of LAN cable from active LAN card
Total Data LAN                                      Removal of all Data LAN cables from server

    1.3 Verification method

    Upon startup of the package, the verification checkpoints are


- Log onto the surviving server and run the command cmviewcl to check that the package application is RUNNING.

- Ping the relocatable IP from another station in the same network.

- Check that all shared file systems are mounted.


    1.4 Test Checklist

The five categories of test that will be performed are as follows:

a. Normal Bootup
b. Manual Package Switching Functionality
c. LAN Failure Tests (Heartbeat Failure, Data LAN Failure)
d. System Failure Tests
e. Failures not affecting the package - sanity checks to ensure that failure of the adoptive node in the cluster has no side effect on the primary node.

NORMAL BOOTUP SEQUENCE

1. Normal boot up
   Method: Power on or reboot both servers
   Expected: Cluster is up with node1 and node2 running, and the package is running on node1

MANUAL PACKAGE SWITCHING FUNCTIONALITY

1. Package halts successfully on node1
   Method: Run the cmhaltpkg -v <package> command
   Expected: Application shuts down successfully and the package is halted properly

2. Package starts successfully on node2
   Method: Run the cmrunpkg -v -n node2 <package> command
   Expected: Package starts up successfully on node2

3. Package halts successfully on node2
   Method: Run the cmhaltpkg -v <package> command
   Expected: Application shuts down successfully and the package is halted properly

4. Package starts successfully on node1
   Method: Run the cmrunpkg -v -n node1 <package> command
   Expected: Package starts up successfully on node1


LAN FAILURE TESTS

1. Heartbeat LAN failure on node1 (package is running on node1)
   Method: Pull out the lan0 cable on node1
   Expected: lan1 takes over as heartbeat LAN and the package remains running on node1

2. Primary Data LAN failure on node1 (package is running on node1)
   Method: Pull out the lan1 cable on node1
   Expected: Secondary LAN, lan5, takes over as active LAN and the package remains running on node1

3. Secondary Data LAN failure on node1 (package is running on node1)
   Method: Pull out the lan5 cable on node1
   Expected: Primary LAN, lan1, takes over as active LAN and the package remains running on node1

4. Total Data LAN failure on node1 (package is running on node1)
   Method: Pull out lan1 and lan5 from node1
   Expected: Package fails over to node2 if it is running as a node in the cluster; 50% chance of failing on the adoptive node if it is unable to get the cluster lock and panic reboots

5. Heartbeat LAN failure on node2 (package is running on node2)
   Method: Pull out the lan0 cable on node2
   Expected: lan1 takes over as heartbeat LAN and the package remains running on node2

6. Primary Data LAN failure on node2 (package is running on node2)
   Method: Pull out the lan1 cable on node2
   Expected: Secondary LAN, lan5, takes over as active LAN and the package remains running on node2

7. Secondary Data LAN failure on node2 (package is running on node2)
   Method: Pull out the lan5 cable on node2
   Expected: Primary LAN, lan1, takes over as active LAN and the package remains running on node2

8. Total Data LAN failure on node2 (package is running on node2)
   Method: Pull out lan1 and lan5 from node2
   Expected: Package fails over to node1 if it is running as a node in the cluster; 50% chance of failing on the adoptive node if it is unable to get the cluster lock and panic reboots


You may wish to extend the tests to cover the functionality of MC/ServiceGuard with regard to application monitoring scripts and application failover.

SYSTEM FAILURE TESTS

1. node1 failure (package is running on node1)
   Method: Reset node1 (try both "shutdown -ry" and "reboot", or "rs" from the console)
   Expected: Package fails over to node2 if it is running as a node in the cluster
   Check: Yes

2. node2 failure (package is running on node2)
   Method: Reset node2 (try both "shutdown -ry" and "reboot", or "rs" from the console)
   Expected: Package fails over to node1 if it is running as a node in the cluster
   Check: Yes

FAILURES NOT AFFECTING PACKAGE

1. node2 failure (package is running on node1)
   Expected: Cluster reforms as a single-node cluster and the package continues to run on node1
   Check: Yes


    MC/ServiceGuard Troubleshooting

    Troubleshooting using log files

For troubleshooting, there are a few files that will help to log problems experienced by MC/ServiceGuard; these are:

a. /var/adm/syslog/syslog.log
b. /etc/cmcluster/packagedir/packagename.cntl.log

These files need to be maintained as their size will grow; this can ultimately affect the / file system if not maintained.

The package control log file will contain information regarding package start/stop. Each package will have its own package control log file.

Note : Always use cmviewcl or cmviewcl -v to help see the status of your cluster.
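A simple hedged way to watch both logs while starting or halting a package (the package log name follows the kci2prd example used earlier):

    tail -f /var/adm/syslog/syslog.log &
    tail -f /etc/cmcluster/kci2prd/kci2prd.cntl.log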

    Common Problems :

. Problems of configuration
- missing entries in /etc/services, /etc/inetd.conf
- .rhosts or cmclnodelist not configured
- grammatical errors in the config and control files

. Warning : Missing cluster lock disk
- Repeated every hour by the cmcld daemon in syslog.log
- This problem occurs after something has changed affecting the cluster lock disk, e.g. the SCSI ID of the disk changed
- No issue at the moment, but when a tie-breaker situation occurs, nodes will not be able to detect the disk and all nodes may panic reboot.

    Solution :

    1. Schedule downtime to halt the cluster (cmhaltcl)

    2. Run vgchange -c n vgsh to remove the cluster lock volume group from
    the cluster.

    3. Activate vgsh on the node where the cluster configuration ASCII file
    exists by running vgchange -a y vgsh, and do a
    cmapplyconf -v -C /etc/cmcluster/cluster.ascii. Answer yes to the
    change, and then run vgchange -a n vgsh to deactivate the cluster lock
    volume group.


    4. Start the cluster (cmruncl)

    . Warning : I/O error on cluster lock disk
      - Repeated every hour by the cmcld daemon in syslog.log
      - This problem usually occurs when something is wrong with one of the
        SPUs or controllers of the disk array connected to one of the nodes.
      - If it happens on the primary node, the application may already have
        hung.
      - No issue if it occurs on the adoptive node at the moment, but when
        a tie-breaker period occurs, nodes will not be able to detect the
        disk and all nodes may panic reboot.
      - In other cases, the cluster lock disk itself could be faulty, and a
        hung situation with respect to the application and bdf will occur.

    Solution :
      - Schedule downtime and ask the CE to check the SPU or controller

    . Cluster failures
      - Cluster cannot start
        - missing entries in /etc/services, /etc/inetd.conf
        - .rhosts or cmclnodelist not configured
        - syntax errors in the config and control files
        - could be hardware, package induced, or an application problem.
          Again, check the log files.

    . Package failures
      - Package unable to start at all on any node
        - Check syslog and the package log file. Possible config problem,
          control script problem, or the application script name changed.
      - Package cannot fail over to the adoptive node but can start on the
        primary node
        - Check syslog and the package log file. Package switching or the
          node may be disabled:
            cmmodpkg -e packagename              (enable package switching)
            cmmodpkg -e -n nodename packagename  (enable the package to run
                                                  on this node)
      - Package cannot mount/umount filesystems (from the package log)
        - Package failed to start because of mount problems. Possibly the
          shared VG is not marked as part of the cluster or is still
          activated, a filesystem was mounted manually, or someone is
          accessing the directory to be unmounted.
          Unmount all filesystems, check who is accessing the directory and
          get that person to exit, run vgchange -c y vgsh to mark the VG
          for the cluster, deactivate it, and try starting again.
        - Hard disk problem
      - Package failed to halt
        - Application process hung and could not be killed.


        - Hard disk problem

    . Service failure
      - cmviewcl -v to see the status of all packages and their services.
      - Trace from the package control log file and syslog to see why it
        failed, etc.
      - Possible config problem, control script problem, or the application
        script name changed.

    . Node timeout
      - The recommended node timeout value in the cluster config file is
        5-8 seconds.
      - Otherwise, with the default of 2 seconds, a system may panic reboot
        due to a tie-breaker scenario caused by poor network performance
        (see the snippet below).
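    In the cluster ASCII file these values are specified in microseconds; a
    minimal sketch for an 8-second timeout:

        HEARTBEAT_INTERVAL    1000000    # 1 second
        NODE_TIMEOUT          8000000    # 8 seconds (default is 2000000)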

    . GSP problems
      - Known problem for L class servers (certain generations)
      - Causes the system to panic reboot and fail the package over to the
        adoptive node
      - Recommended patch / GSP firmware upgrade needs to be done

    . LAN problems
      - NMID problems

    . Disk problems
      - SCSI ID changed/conflicts, perhaps due to a controller card factory
        default setting. Cannot bring up the cluster. Need the CE to change
        it accordingly.
      - Cluster lock disk failed
        - If the lock disk is RAID1 or RAID5, no problem.
        - If the lock disk is LVM mirrored, do a vgcfgrestore and vgsync to
          recover the lock info, which is stored in the BBR table part of
          the disk (sketched below).
        - If there is no mirror, the cluster needs to be re-applied.
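    A minimal sketch of the mirrored-lock-disk recovery, assuming the lock
    VG is vgsh and the replaced disk is at the hypothetical path
    /dev/dsk/c1t2d0:

        node1# vgcfgrestore -n vgsh /dev/rdsk/c1t2d0   # restore LVM config to the new disk
        node1# vgchange -a y vgsh                      # activate the volume group
        node1# vgsync vgsh                             # resynchronize the mirror copies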


    On-Going Upgrades/Changes to systems/cluster /package

    - Pro-active Patch installation (node by node)- Data Centre outages (shutdown entire cluster)- Rolling upgrades (node by node)

    Keychain Cluster - Shutdown and Startup Procedure

    -------------------------------------------------

    Last update: 19 June 2002 SGP

    *******************************************************************

    Please follow these steps whenever you need to arrange a shutdown

    for sgpue036.sgp.hp.com & sgpue037.sgp.hp.com.

    Special handling is required because of their MC/Serviceguard HA

    environment.

    *******************************************************************

    Before you shutdown a node

    --------------------------

    1. Get agreement with application support on schedule, scope and

    duration of shutdown.

    2. Ensure both nodes in the cluster are up and running. If any node

    is down or appears to be having problems, DO NOT proceed with

    shutdown.

    3. If shutting down a primary node, goto section titled "Shutting down

    and restarting the primary node".

    If shutting down a secondary node, goto section titled "Shutting down

    and restarting the secondary node".

    If shutting down the entire cluster, goto section titled "Shutting down

    and restarting the MC/SG cluster".

    If doing rolling upgrade, goto section titled "Doing a rolling upgrade".

    Shutting down and restarting the primary node

    ------------------------------------------------

    We assume primary node = sgpue036 and secondary node = sgpue037


    in the following examples.

    1. Before shutdown, make a note of all packages currently running

    on each node.

    sgpue036# cmviewcl

    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >
    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    2. Halt primary node sgpue036

    sgpue036# cmhaltnode -f -v sgpue036

    Production packages will fail over from sgpue036 to sgpue037. sgpue036

    will cease to be a member of the active cluster.

    3. Check package status on cluster

    sgpue036# cmviewcl

    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running disabled sgpue037

    > kcdbstg up running enabled sgpue037
    > kcnfs up running enabled sgpue037

    >

    > NODE STATUS STATE

    > sgpue036 down halted

    4. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the

    following line:


    AUTOSTART_CMCLD=0

    5. Now we can proceed to shutdown (for PM, repair) or reboot

    (for patching, kernel regen) sgpue036, eg:

    sgpue036# /etc/shutdown -h 0

    sgpue036# /etc/shutdown -r 0

    6. When repair or reboot is over, sgpue036 should be booted up to

    run level 3

    sgpue036# who -r

    . run-level 3 Jan 17 08:01 3 0 S

    7. Edit /etc/rc.config.d/cmcluster file on sgpue036 to include the

    following line:

    AUTOSTART_CMCLD=1

    8. Make sgpue036 join the cluster

    sgpue036# cmrunnode -v sgpue036

    9. Halt production packages on sgpue037

    sgpue037# cmhaltpkg kci2stg

    10. Restart production packages on sgpue036

    sgpue036# cmrunpkg kci2stg

    11. Re-enable package switching on production packages

    sgpue036# cmmodpkg -e kci2stg

    12. Check package status on cluster.

    You should see the same listing as shown in Step 1 ie.

    sgpue036# cmviewcl

    > CLUSTER STATUS

    > knet up

    >
    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running


    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    13. Release sgpue036 to customers (notify by phone, email etc)

    Shutting down and restarting the secondary node

    ---------------------------------------------

    1. Before shutdown, make a note of all packages currently running

    on each node

    sgpue037# cmviewcl

    > CLUSTER STATUS

    > knet up

    >
    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    2. Halt secondary node sgpue037

    sgpue037# cmhaltnode -f -v sgpue037

    Production packages will fail over from sgpue037 to sgpue036. sgpue037

    will cease to be a member of the active cluster.

    3. Check package status on cluster

    sgpue037# cmviewcl

    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036


    > kcdbstg up running disabled sgpue036

    > kcnfs up running disabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 down halted

    4. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the
    following line:

    AUTOSTART_CMCLD=0

    5. Now we can proceed to shutdown (for PM, repair) or reboot

    (for patching, kernel regen) sgpue037, eg:

    sgpue037# /etc/shutdown -h 0

    sgpue037# /etc/shutdown -r 0

    6. When repair or reboot is over, sgpue037 should be booted up to

    run level 3

    sgpue037# who -r

    . run-level 3 Jan 17 08:01 3 0 S

    7. Edit /etc/rc.config.d/cmcluster file on sgpue037 to include the

    following line:

    AUTOSTART_CMCLD=1

    8. Make sgpue037 join the cluster

    sgpue037# cmrunnode -v sgpue037

    9. Halt production packages on sgpue036

    sgpue036# cmhaltpkg kcdbstg

    sgpue036# cmhaltpkg kcnfs

    10. Restart production packages on sgpue037

    sgpue037# cmrunpkg kcdbstg

    sgpue037# cmrunpkg kcnfs

    11. Re-enable package switching on production packages

    sgpue037# cmmodpkg -e kcdbstg

    sgpue037# cmmodpkg -e kcnfs

    12. Check package status on cluster.

    You should see the same listing as shown in Step 1 ie.

    sgpue037# cmviewcl


    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue036 up running

    >
    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    13. Release sgpue037 to customers (notify by phone, email etc)

    Shutting down and restarting the MC/SG cluster
    ----------------------------------------------

    We assume primary node = sgpue036 and secondary node = sgpue037 in

    the following examples.

    1. Log in to sgpue036 or sgpue037 as superuser and issue command to

    halt cluster daemon

    sgpue036# cmhaltcl -f -v

    2. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include

    the following line:

    AUTOSTART_CMCLD=0

    3. Proceed to shutdown each node

    sgpue036# /etc/shutdown -h 0

    sgpue037# /etc/shutdown -h 0

    4. After planned activity is over, bootup each node to run level 3

    sgpue036# who -r

    sgpue037# who -r

    . run-level 3 Jan 17 08:01 3 0 S

    5. Edit /etc/rc.config.d/cmcluster file on ALL nodes to include the

    following line:

    AUTOSTART_CMCLD=1

    6. Startup the cluster daemon from any node


    sgpue036# cmruncl -v

    7. Check package status on cluster.

    It should look exactly like the following

    sgpue036# cmviewcl

    > CLUSTER STATUS

    > knet up

    >

    > NODE STATUS STATE

    > sgpue036 up running

    >

    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kci2stg up running enabled sgpue036

    >

    > NODE STATUS STATE

    > sgpue037 up running

    >
    > PACKAGE STATUS STATE AUTO_RUN NODE

    > kcdbstg up running enabled sgpue037

    > kcnfs up running enabled sgpue037

    8. Release machines to customers (notify by phone, email etc)

    Doing a rolling upgrade

    -----------------------

    This is the most common scenario where we work on 1 node at a time

    without bringing down the entire cluster. This ensures there is at

    least 1 node available to run the application packages. The steps are already detailed above. Either:

    1. Shutting down and restarting the primary node

    2. Shutting down and restarting the secondary node

    or

    1. Shutting down and restarting the secondary node

    2. Shutting down and restarting the primary node

    Note : This may apply to OS upgrades, eg. 10.20 to 11.00, whereby MC/SG goes from ver 10.10 to 11.X.

    Another method you may deploy is to build a separate cluster on
    separate machines with the latest OS, copy all the config files over,
    and swap the package IPs.


    - Modifying the cluster
      o Anything to do with the cluster will need a re-apply of the cluster
        (go through the cluster.conf file to see what the parameters are),
        so downtime is needed to halt the cluster, except for
        adding/removing nodes and packages, which can be done while the
        cluster is still up and running.
        - Eg. node timeout, heartbeat interval
        - Eg. cluster name
        - Eg. heartbeat IPs
        - Eg. no. of packages
        - Eg. change of node names
        - Eg. manual change/add of a volume group
      o Steps (see the sketch after this list)
        - Schedule downtime to halt the entire cluster
        - cmhaltcl -f to halt the cluster
        - After the cluster is halted, run cmgetconf -v -c clustername
          outputfilename (cluster ascii file - name it something different)
          to get the latest copy of the cluster config file.
        - Modify the outputfilename to make the intended changes to the
          cluster.
        - cmcheckconf -v -C outputfilename (cluster ascii file) - check for
          any errors
        - cmapplyconf -v -C outputfilename (cluster ascii file) - if no
          errors
        - Start the cluster - cmruncl
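    Putting those steps together, a minimal sketch using the cluster name
    knet from the earlier examples and a hypothetical output file name:

        node1# cmhaltcl -f -v
        node1# cmgetconf -v -c knet /etc/cmcluster/knet.new.ascii
        node1# vi /etc/cmcluster/knet.new.ascii      # make the intended changes
        node1# cmcheckconf -v -C /etc/cmcluster/knet.new.ascii
        node1# cmapplyconf -v -C /etc/cmcluster/knet.new.ascii
        node1# cmruncl -v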

    - Adding/removing nodes to the cluster
      o Adding
      o Online method
        - Heartbeat must be configured and the network ready
        - Can be done on any node (preferably the node where the original
          cluster config file was placed)
        - cmquerycl [-w full] -v -C /etc/cmcluster/outputfilename
          -n primarynode -n secondarynode -n newnode
          (Note : This will query the system configuration and generate the
          new cluster config file, according to whatever name you specified
          as the outputfilename.)
        - cmgetconf -v -c clustername outputfilename (cluster ascii file -
          name it something different) to get the latest copy of the
          cluster config file.
        - Check and combine the 2 configurations into one final config
          file.


        - cmcheckconf -v -C finalconfigfile (cluster ascii file) - check
          for any errors
        - cmapplyconf -v -C finalconfigfile (cluster ascii file) - if no
          errors
        - cmrunnode nodename to join the cluster (see the sketch below)
        - Modify all package config files to include the new node if
          desired. (Remember modifying the package config file will need
          downtime to apply the package config file.)
      o Offline method
        - Same, except performed with the cluster halted; when all the
          changes are made, start the cluster.
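    A sketch of the online add, assuming the existing nodes node1 and node2
    and a hypothetical new node node3:

        node1# cmquerycl -w full -v -C /etc/cmcluster/threenode.ascii \
               -n node1 -n node2 -n node3
        node1# cmgetconf -v -c knet /etc/cmcluster/current.ascii
        (check and combine the two files into /etc/cmcluster/final.ascii)
        node1# cmcheckconf -v -C /etc/cmcluster/final.ascii
        node1# cmapplyconf -v -C /etc/cmcluster/final.ascii
        node1# cmrunnode -v node3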

      o Removing
      o Online method
        - Modify all package config files to exclude the node if it is
          configured in the package. (Remember modifying the package config
          file will need downtime to apply the package config file.)
        - Halt all ACTIVE packages on the node - cmhaltpkg packagenames
        - Halt the node - cmhaltnode -v nodename
        - cmgetconf -v -c clustername outputfilename (cluster ascii file -
          name it something different) to get the latest copy of the
          cluster config file.
        - Edit this cluster ascii file to remove the node details
        - cmcheckconf -v -C outputfilename (cluster ascii file) - check for
          any errors
        - cmapplyconf -v -C outputfilename (cluster ascii file) - if no
          errors
        - Do whatever with the node - power down, redeploy
        - vgexport vgsh (off the removed node)
      o Offline method
        - Same, except performed with the cluster halted; when all the
          changes are made, start the cluster. Skip the halt-package and
          halt-node steps.

    Note : While the cluster is running, you can remove a node from the
    cluster while the node is reachable, ie connected to the LAN -
    recommended. If the node is unreachable, it can still be removed from
    the cluster, but only if there are no packages which specify the
    unreachable node. If there are packages that depend on the unreachable
    node, it is best to halt the cluster and make the changes to the
    package and cluster config files to remove the node from the cluster.

    - Adding/removing packages to the cluster
      o Adding
      o Online method
      o Create packages on the primary node
        - mkdir /etc/cmcluster/packagedir
        - cmmakepkg -p /etc/cmcluster/packagedir/packagename.conf
        - Edit the configuration file
        - cmmakepkg -s /etc/cmcluster/packagedir/packagename.cntl


        - Edit the control script.

    Note : If the package and control file is special (e.g. NFS required),
    then do not run the cmmakepkg command; just get the pre-defined scripts
    from the MC/SG NFS extension toolkit.
    You may still need to make some adjustments. (Similar for the SAP
    extension.)

    Note : It is possible that packages do not use any volume groups.

        - ftp the control script file to the adoptive nodes
        - On the primary node:
          - cmcheckconf -v -P packagename.conf (package config file) -
            check for any errors
          - cmapplyconf -v -P packagename.conf (package config file) - if
            no errors
        - Start the package - cmrunpkg packagename
        - cmmodpkg -e packagename to re-enable package switching
        - Test the package on all adoptive nodes if possible

    Note : Repeat the steps from "Create packages" to here again if more packages are required in the cluster. (An end-to-end sketch follows the offline method below.)

      o Offline method
        - Same, except performed with the cluster halted; when all the
          changes are made, start the cluster.
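    The online add, end to end, with a hypothetical package name newpkg:

        node1# mkdir /etc/cmcluster/newpkg
        node1# cmmakepkg -p /etc/cmcluster/newpkg/newpkg.conf   # config template
        node1# cmmakepkg -s /etc/cmcluster/newpkg/newpkg.cntl   # control template
        (edit both files, then copy the package directory to the adoptive nodes)
        node1# rcp -r /etc/cmcluster/newpkg node2:/etc/cmcluster
        node1# cmcheckconf -v -P /etc/cmcluster/newpkg/newpkg.conf
        node1# cmapplyconf -v -P /etc/cmcluster/newpkg/newpkg.conf
        node1# cmrunpkg newpkg
        node1# cmmodpkg -e newpkg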

      o Removing
      o Online method
        - cmhaltpkg -v packagename
        - cmdeleteconf -f -v -p packagename
        - cmviewcl (to verify that it is no longer part of the cluster)
          Note : The package config and control files are not deleted from
          the system, just removed from the cluster.

      o Offline method
        - Same, except performed with the cluster halted; when all the
          changes are made, start the cluster.

    - Modifying packages
      o 2 parts - the package config file and the package control file
      o Anything to do with modifying the package config file will need a
        re-apply of the package (go through the package.conf file to see
        what the parameters are).
        - Parameters that can be changed without stopping the package, ie
          the cluster and package are up and running:


          - Eg. failover policy, failback policy
          - Eg. add/remove/modify node names
          - Eg. switching parameters
        - Steps
          - cmgetconf -v -p packagename outputfilename (package config
            file - name it something different) to get the latest copy of
            the package config file.
          - Modify the outputfilename to make the intended changes to the
            package config.
          - cmcheckconf -v -P outputfilename (package config file) - check
            for any errors
          - cmapplyconf -v -P outputfilename (package config file) - if no
            errors
        - Parameters that must be changed by stopping the package, ie the
          package is down but the cluster is up and running:
          - Eg. package name (if possible, change the hosting directory
            name as well)
          - Eg. change run/halt scripts
          - Eg. add/remove service names
          - Eg. add/remove subnet
        - Steps
          - Schedule downtime to halt the package affected
          - cmhaltpkg packagename to halt the package
          - After the package is halted, run cmgetconf -v -p packagename
            outputfilename (package config file - name it something
            different) to get the latest copy of the package config file.
          - Modify the outputfilename to make the intended changes to the
            package config.
          - cmcheckconf -v -P outputfilename (package config file) - check
            for any errors
          - cmapplyconf -v -P outputfilename (package config file) - if no
            errors
          - Start the package
            - cmrunpkg packagename
            - cmmodpkg -e packagename to re-enable package switching

      o Anything to do with modifying the package control file (script)
        will NOT need a re-apply of the package (go through the
        package.cntl file to see what the parameters are), but it does
        need downtime to halt the package; the cluster and the other
        packages in the cluster can still be running.


        - Eg. VG name and no. of VGs
        - Eg. LVs, names of mount points and their numbers
        - Eg. NFS mounts
        - Eg. package IPs and subnet
        - Eg. service names
        - Eg. subnet
        - Eg. application start/stop scripts
      o Steps (see the sketch below)
        - Schedule downtime to halt the package affected
        - cmhaltpkg packagename to halt the package
        - After the package is halted, modify the package control file to
          make the intended changes.
        - Start the package
          - cmrunpkg packagename
          - cmmodpkg -e packagename to re-enable package switching
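    For instance, changing a mount point in the control file of the
    hypothetical package newpkg, with the cluster and the other packages
    left running:

        node1# cmhaltpkg newpkg
        node1# vi /etc/cmcluster/newpkg/newpkg.cntl   # make the same edit on ALL nodes
        node1# cmrunpkg newpkg
        node1# cmmodpkg -e newpkg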

    - Adding/modifying LAN cards in the cluster
      o If there is a need to add or upgrade/replace LAN cards in a cluster
        environment, take note of the LAN ID (NMID).
      o Usually adding will not cause an issue, unless the card will be
        part of the cluster and is already connected to the network - then
        you need to reconfigure and re-apply the cluster config file.
      o For upgrading/replacing LAN cards, the NMID may change, eg.
        upgrading from a 10BT to a 100BT card, or replacing a 1-port LAN
        card with a 4-port LAN card. In such a case, the cluster cannot
        start up, because the cluster setting is different (the cluster is
        trying to find lan1 as configured in the cluster config file, but
        the NMID has already changed to lan2). We will need to reform and
        re-apply the cluster before running it.

      o Steps
        - Method 1
          - Schedule downtime to halt the entire cluster
          - cmhaltcl -f to halt the cluster
          - After the cluster is halted, run
            cmquerycl [-w full] -v -C /etc/cmcluster/outputfilename
            -n primarynode -n secondarynode [-n other nodes in the cluster]
            (Note : This will query the system configuration and generate
            the new cluster config file, according to whatever name you
            specified as the outputfilename. This should automatically
            generate the cluster config file with the new LAN card NMID.)


          - Run cmgetconf -v -c clustername outputfilename (cluster ascii
            file - name it something different) to get the latest copy of
            the cluster config file.
          - Check and combine the 2 configurations into one final config
            file.
          - cmcheckconf -v -C finalconfigfile (cluster ascii file) - check
            for any errors
          - cmapplyconf -v -C finalconfigfile (cluster ascii file) - if no
            errors
          - Start the cluster - cmruncl
            (A sketch of this method follows Method 2 below.)

        - Method 2 - not recommended
          - Schedule downtime to halt the entire cluster
          - cmhaltcl -f to halt the cluster
          - Run cmgetconf -v -c clustername outputfilename (cluster ascii
            file - name it something different) to get the latest copy of
            the cluster config file.
          - Modify the outputfilename to make the intended changes to the
            cluster (if you are aware of the change in the NMID of the LAN
            card).
          - cmcheckconf -v -C outputfilename (cluster ascii file) - check
            for any errors
          - cmapplyconf -v -C outputfilename (cluster ascii file) - if no
            errors
          - Start the cluster - cmruncl
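    A sketch of Method 1 after a card replacement; lanscan is the usual way
    to confirm the new interface numbering before re-querying:

        node1# lanscan                    # note the new lanN instance / NMID
        node1# cmhaltcl -f -v
        node1# cmquerycl -w full -v -C /etc/cmcluster/newlan.ascii -n node1 -n node2
        (check and combine with the cmgetconf output as described above)
        node1# cmcheckconf -v -C /etc/cmcluster/newlan.ascii
        node1# cmapplyconf -v -C /etc/cmcluster/newlan.ascii
        node1# cmruncl -v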

    - Extending/Reducing logical volumes in the cluster packages
      o (ONLINE) No downtime required, provided OnlineJFS is installed
      o Make the changes on the node where the logical volumes are mounted
      o No action required on the adoptive nodes
      o Extending (worked example below) :
        - lvextend -L newsize_in_MB /dev/vgsh/shlvol
        - fsadm -F vxfs -b newsize_in_KB /shname
      o Reducing :
        - fsadm -F vxfs -b newsize_in_KB /shname
        - lvreduce -L newsize_in_MB /dev/vgsh/shlvol
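    A worked example, assuming OnlineJFS and a logical volume
    /dev/vgsh/shlvol mounted at /shdata (hypothetical names), grown from
    1024 MB to 2048 MB (2048 MB = 2097152 KB):

        node1# lvextend -L 2048 /dev/vgsh/shlvol    # new size in MB
        node1# fsadm -F vxfs -b 2097152 /shdata     # new size in KB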

    - LVMTAB needs to be updated when adding/removing :
      o Disks
      o Logical volumes


      o Volume groups

    - Adding/Removing new physical volumes/disks to the volume group owned
      by a package
      o Adding
      o On the primary node (node where the shared VG is activated, where
        the package is running)
        - pvcreate the new disk
        - vgextend the new disk into the identified shared volume group
        - VGEXPORT, with the preview option, the particular shared VG
          mapfile
          - vgexport -m vgsh.map -p -s -v vgsh
        - ftp the mapfile to the adoptive nodes
      o On the adoptive nodes
        - VGEXPORT the identified shared volume group off the system
          - vgexport vgsh
        - mkdir /dev/vgsh
        - mknod /dev/vgsh/group c 64 0x... (same minor number as before)
        - VGIMPORT the shared volume group into the system with the mapfile
          - vgimport -m vgsh.map -s -v vgsh
      o Removing
        - Same steps, except use vgreduce (no pvcreate required)
      o (ONLINE) No downtime required, but it is good to schedule one if
        you want to test the failover.
      o Do I need to re-apply the cluster and package? No. (See the sketch
        below.)
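    The full sequence, assuming a hypothetical new disk at c5t0d0 and node2
    as the adoptive node:

        node1# pvcreate /dev/rdsk/c5t0d0
        node1# vgextend vgsh /dev/dsk/c5t0d0
        node1# vgexport -p -s -v -m /tmp/vgsh.map vgsh    # -p = preview only
        node1# rcp /tmp/vgsh.map node2:/tmp/vgsh.map

        node2# vgexport vgsh                              # remove the old definition
        node2# mkdir /dev/vgsh
        node2# mknod /dev/vgsh/group c 64 0x080000        # minor number is an example
        node2# vgimport -s -v -m /tmp/vgsh.map vgsh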

    - Adding/Removing logical volumes to the volume group owned by the
      package
      o Adding
      o On the primary node (node where the shared VG is activated, where
        the package is running)
        - lvcreate -L ...
        - newfs ...
        - mkdir /filesystem
        - Mount the filesystem manually and assign the correct ownership
          and permissions


        - umount the filesystem
        - VGEXPORT, with the preview option, the particular shared VG
          mapfile
          - vgexport -m vgsh.map -p -s -v vgsh
        - ftp the mapfile to the adoptive nodes
      o On the adoptive nodes
        - VGEXPORT the identified shared volume group off the system
          - vgexport vgsh
        - mkdir /dev/vgsh
        - mknod /dev/vgsh/group c 64 0x... (same minor number as before)
        - mkdir /filesystem
        - VGIMPORT the shared volume group into the system with the mapfile
          - vgimport -m vgsh.map -s -v vgsh
      o Schedule time to halt the package (only the package affected)
        - cmhaltpkg packagename
      o After the package is halted, modify the package control script
        (.cntl) to include the new filesystem on all nodes.
      o Start the package
        - cmrunpkg packagename
        - cmmodpkg -e packagename to re-enable package switching
      o Verify that the filesystem is mounted and accessible.
      o Test on all adoptive nodes.

      o Removing
        - Schedule downtime to halt the package
          - cmhaltpkg packagename - on the primary node
        - vgchange -c n vgsh - to unmark the VG that belongs to the package
          from the cluster
        - vgchange -a y vgsh to activate the VG
        - lvremove the logical volume
        - vgchange -a n vgsh to deactivate the VG
        - vgchange -c y vgsh to mark the VG as part of the cluster
        - Modify the package control files on all nodes to exclude this LV
          and filesystem
        - cmrunpkg packagename - to restart the package
        - cmmodpkg -e ... to re-enable package switching
        - vgexport the mapfile on the primary node and ftp it to all
          adoptive nodes
        - vgexport ..., vgimport ... the mapfile on the adoptive nodes
        - Test on all adoptive nodes


      o Offline for the package affected, but the cluster can be up and
        running, and the other packages can be up and running.
      o Do I need to re-apply the cluster/package (changing the package
        control file does not need a re-application)? No.
      o Can I create an LV/filesystem that is not mounted by my package but
        belongs to the same volume group, ie I mount it via /etc/fstab? No;
        this will cause a problem, since the VG will need to be
        activated/deactivated and the package may fail.

    - Adding new volume groups to the cluster packages
      o Adding
      o On the primary node (node where the shared VG is activated, where
        the package is running)
        - pvcreate the new disk
        - mkdir /dev/vgsh (the new shared VG)
        - mknod /dev/vgsh/group c 64 0x0...
        - vgcreate the new shared volume group
        - Create the necessary lvols and filesystems or raw devices for
          the VG
        - Mount the filesystems and change permissions and ownerships
          accordingly
        - VGEXPORT, with the preview option, the particular shared VG
          mapfile
          - vgexport -m vgsh.map -p -s -v vgsh
        - ftp the mapfile to the adoptive nodes
      o On the adoptive nodes
        - VGEXPORT the identified shared volume group off the system
          - vgexport vgsh
        - mkdir /dev/vgsh
        - mknod /dev/vgsh/group c 64 0x... (same minor number as before)
        - VGIMPORT the shared volume group into the system with the mapfile
          - vgimport -m vgsh.map -s -v vgsh
        - mkdir the /filesystems for the logical volumes
      o On the primary node,
        - vgchange -c y vgsh to mark the VG as part of the cluster
        - umount all filesystems in this new shared VG and deactivate it -
          vgchange -a n vgsh
        - Check /var/adm/syslog/syslog.log to see if this VG has been
          successfully marked in the cluster
        - cmgetconf -v -c clustername outputfilename (name it something
          different) to see that it has been entered into the cluster
          config file.


        - If not, we will need to bring down the entire cluster, check,
          and re-apply the cluster.

      o Method 1 (do this if successfully marked)
        - Schedule time to halt the package (only the package affected).
          - cmhaltpkg packagename
        - After the package is halted, modify the package control script
          (.cntl) to include the new filesystem and volume group on all
          nodes.
        - Start the package
          - cmrunpkg packagename
          - cmmodpkg -e packagename to re-enable package switching
        - Verify that the VG is activated and the filesystems are mounted
          and accessible.
        - Test on all adoptive nodes.

      o Method 2 (do this if not marked successfully)
        - Schedule time to halt the entire cluster.
          - cmhaltcl
        - After the cluster is halted, run cmgetconf -v -c clustername
          outputfilename (cluster ascii file - name it something
          different) to see that it has been entered into the cluster
          config file.
        - If not entered, try to manually type the new shared VG into the
          new cluster outputfilename.
        - cmcheckconf -v -C outputfilename (cluster ascii file) - check
          for any errors
        - cmapplyconf -v -C outputfilename (cluster ascii file) - if no
          errors
        - Modify the package control script (.cntl) to include the new
          filesystem and volume group on all nodes.
        - Start the cluster - cmruncl
        - Verify that the VG is activated and the filesystems are mounted
          and accessible.
        - Test that the VG can be mounted on all adoptive nodes. (The
          creation steps are sketched below.)
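    The creation steps on the primary node, sketched with a hypothetical
    disk and VG name vgnew:

        node1# pvcreate /dev/rdsk/c6t0d0
        node1# mkdir /dev/vgnew
        node1# mknod /dev/vgnew/group c 64 0x090000    # an unused minor number
        node1# vgcreate vgnew /dev/dsk/c6t0d0
        node1# lvcreate -L 1024 -n lvol1 vgnew
        node1# newfs -F vxfs /dev/vgnew/rlvol1
        node1# vgchange -c y vgnew                     # mark as cluster-aware (cluster must be up)
        node1# vgchange -a n vgnew                     # deactivate; the package will activate it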

      o Removing
        - Schedule downtime to halt the package
          - cmhaltpkg packagename - on the primary node
        - vgchange -c n vgsh - to unmark the VG that belongs to the package
          from the cluster
        - Modify the package control files on all nodes to exclude this VG
          and its filesystems