ra dev guide

Upload: salamsalar

Post on 02-Apr-2018


  • 7/27/2019 Ra Dev Guide

    1/34

    The OCF Resource Agent Developer's Guide

    Florian Haas

    Copyright 2010 LINBIT HA-Solutions GmbH

    License information

    The text of and illustrations in this document are licensed under a Creative Commons Attribution-Share Alike 3.0 Unported license ("CC-BY-SA").

    A summary of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/.

    The full license text is available at http://creativecommons.org/licenses/by-sa/3.0/legalcode.

    In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.


    1. Introduction
        1.1. What is a resource agent?
        1.2. Who or what uses a resource agent?
        1.3. Which language is a resource agent written in?

    2. API definitions
        2.1. Environment variables
        2.2. Actions
        2.3. Timeouts
        2.4. Metadata

    3. Return codes
        3.1. OCF_SUCCESS (0)
        3.2. OCF_ERR_GENERIC (1)
        3.3. OCF_ERR_ARGS (2)
        3.4. OCF_ERR_UNIMPLEMENTED (3)
        3.5. OCF_ERR_PERM (4)
        3.6. OCF_ERR_INSTALLED (5)
        3.7. OCF_ERR_CONFIGURED (6)
        3.8. OCF_NOT_RUNNING (7)
        3.9. OCF_RUNNING_MASTER (8)
        3.10. OCF_FAILED_MASTER (9)

    4. Resource agent structure
        4.1. Resource agent interpreter
        4.2. Author and license information
        4.3. Initialization
        4.4. Functions implementing resource agent actions
        4.5. Execution block

    5. Resource agent actions
        5.1. start action
        5.2. stop action
        5.3. monitor action
        5.4. validate-all action
        5.5. meta-data action
        5.6. promote action
        5.7. demote action
        5.8. migrate_to action
        5.9. migrate_from action
        5.10. notify action

    6. Script variables
        6.6. $HA_RSCTMP

    7. Convenience functions
        7.1. Logging: ocf_log
        7.2. Testing for binaries: have_binary and check_binary
        7.3. Executing commands and capturing their output: ocf_run
        7.4. Locks: ocf_take_lock and ocf_release_lock_on_exit
        7.5. Testing for numerical values: ocf_is_decimal
        7.6. Testing for boolean values: ocf_is_true
        7.7. Pseudo resources: ha_pseudo_resource

    8. Special considerations
        8.1. Licensing
        8.2. Locale settings
        8.3. Testing for running processes
        8.4. Specifying a master preference

    9. Testing, installing, and packaging resource agents
        9.1. Testing resource agents
        9.2. Installing resource agents
        9.3. Packaging resource agents
            9.3.1. RPM packaging
            9.3.2. Debian packaging
        9.4. Submitting resource agents


    Chapter 1. Introduction

    This document is to serve as a guide and reference for all developers, maintainers, and contributors working on OCF (Open Cluster Framework) compliant cluster resource agents. It explains the anatomy and general functionality of a resource agent, illustrates the resource agent API, and provides valuable hints and tips to resource agent authors.

    1.1. What is a resource agent?

    A resource agent is an executable that manages a cluster resource. No formal definition of a cluster resource exists, other than "anything a cluster manages is a resource." Cluster resources can be as diverse as IP addresses, file systems, database services, and entire virtual machines, to name just a few examples.

    1.2. Who or what uses a resource agent?

    Any Open Cluster Framework (OCF) compliant cluster management application is capable of managing resources using the resource agents described in this document. At the time of writing, two OCF compliant cluster management applications exist for the Linux platform:

    Pacemaker, a cluster manager supporting both the Corosync and Heartbeat cluster messaging frameworks. Pacemaker evolved out of the Linux-HA project.

    RGmanager, the cluster manager bundled in Red Hat Cluster Suite. It supports the Corosync cluster messaging framework exclusively.

    1.3. Which language is a resource agent written in?

    An OCF compliant resource agent can be implemented in any programming language. The API is not language specific. However, most resource agents are implemented as shell scripts, which is why this guide primarily uses example code written in shell language.


    Chapter 2. API definitions

    2.1. Environment variables

    A resource agent receives all configuration information about the resource it manages via environment variables. The names of these environment variables are always the name of the resource parameter, prefixed with OCF_RESKEY_. For example, if the resource has an ip parameter set to 192.168.1.1, then the resource agent will have access to an environment variable OCF_RESKEY_ip holding that value.

    For any resource parameter that is not required to be set by the user (that is, its parameter definition in the resource agent metadata does not specify required="true"), the resource agent must do one of the following:

    Provide a reasonable default. This should be advertised in the metadata. By convention, the resource agent uses a variable named OCF_RESKEY_<parametername>_default that holds this default.

    Alternatively, cater correctly for the value being empty.

    In addition, the cluster manager may also support meta resource parameters. These do not apply directly to the resource configuration, but rather specify how the cluster resource manager is expected to manage the resource. For example, the Pacemaker cluster manager uses the target-role meta parameter to specify whether the resource should be started or stopped.

    Meta parameters are passed into the resource agent in the OCF_RESKEY_CRM_meta_ namespace, with any hyphens converted to underscores. Thus, the target-role attribute maps to an environment variable named OCF_RESKEY_CRM_meta_target_role.
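As a minimal sketch of the naming rules above, the following function reads the ip parameter and the target-role meta parameter from the environment. The helper name describe_config is hypothetical and not part of any OCF library; the fallback text "unset" is likewise only for illustration.

```shell
#!/bin/sh
# Sketch only: describe_config is a hypothetical helper, not part of the
# OCF shell function library.
describe_config() {
    # The resource parameter "ip" arrives as OCF_RESKEY_ip.
    ip=${OCF_RESKEY_ip:-unset}
    # The meta parameter "target-role" arrives with its hyphen converted
    # to an underscore, in the OCF_RESKEY_CRM_meta_ namespace.
    role=${OCF_RESKEY_CRM_meta_target_role:-unset}
    echo "ip=${ip} target-role=${role}"
}
```

When the cluster manager invokes the agent, both variables are already present in the environment, so the agent never has to parse a configuration file to find them.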

    2.2. Actions

    Any resource agent must support one command-line argument which specifies the action the resource agent is about to execute. The following actions must be supported by any resource agent:

    start starts the resource.

    stop shuts down the resource.

    monitor queries the resource for its state.

    meta-data dumps the resource agent metadata.

    In addition, resource agents may optionally support the following actions:

    promote turns a resource into the Master role (Master/Slave resources only).

    demote turns a resource into the Slave role (Master/Slave resources only).

    migrate_to and migrate_from implement live migration of resources.

    validate-all validates a resource's configuration.

    usage or help displays a usage message when the resource agent is invoked from the command line, rather than by the cluster manager.

    status is a historical (deprecated) synonym for monitor.


    2.3. Timeouts

    Action timeouts are enforced outside the resource agent proper. It is the cluster manager's responsibility to monitor how long a resource agent action has been running, and terminate it if it does not meet its completion deadline. Thus, resource agents need not themselves check for any timeout expiry.

    Resource agents can, however, advise the user of sensible timeout values (which, when correctly set, will be duly enforced by the cluster manager). See the following section for details on how a resource agent advertises its suggested timeouts.

    2.4. Metadata

    Every resource agent must describe its own purpose and supported parameters in a set of XML metadata. This metadata is used by cluster management applications for on-line help, and resource agent man pages are generated from it as well. The following is a fictitious set of metadata from an imaginary resource agent:

    0.1

    This is a fictitious example resource agent written for the OCF Resource Agent Developer's Guide.

    Example resource agent for budding OCF RA developers

    Number of eggs, an example numeric parameter

    Number of eggs

    Enable superfrobnication, an example boolean parameter

    Enable superfrobnication

    Data directory, an example string parameter

    Data directory


    The resource-agent element, of which there must only be one per resource agent, defines the resource agent name and version.

    The longdesc and shortdesc elements in resource-agent provide a long and short description of the resource agent's functionality. While shortdesc is a one-line description of what the resource agent does and is usually used in terse listings, longdesc should give a full-blown description of the resource agent in as much detail as possible.

    The parameters element describes the resource agent parameters, and should hold any number of parameter children, one for each parameter that the resource agent supports.

    Every parameter should, like the resource-agent as a whole, come with a shortdesc and a longdesc, and also a content child that describes the parameter's expected content.

    On the content element, there may be four different attributes:

    type describes the parameter type (string, integer, or boolean). If unset, type defaults to string.

    required indicates whether setting the parameter is mandatory (required="true") or optional (required="false").

    For optional parameters, it is customary to provide a sensible default via the default attribute.

    Finally, the unique attribute (allowed values: true or false) indicates that a specific value must be unique across the cluster, for this parameter of this particular resource type. For example, a highly available floating IP address is declared unique, as that one IP address should run only once throughout the cluster, avoiding duplicates.

    The actions list defines the actions that the resource agent advertises as supported.

    Every action should list its own timeout value. This is a hint to the user about the minimal timeout that should be configured for the action. It caters for the fact that some resources are quick to start and stop (IP addresses or filesystems, for example), while others may take several minutes to do so (such as databases).

    In addition, recurring actions (such as monitor) should also specify a recommended minimum interval, which is the time between two consecutive invocations of the same action. Like timeout, this value does not constitute a default; it is merely a hint to the user about which action interval to configure, at minimum.
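The elements described above can be assembled into a function that prints the metadata. The following is a minimal sketch and not the guide's original listing: the parameter names (eggs, superfrobnicate, datadir), the attribute values, the datadir default, the list of actions, and every timeout and interval value are assumptions chosen for illustration. The attributes are placed on the content element, as the text above describes.

```shell
#!/bin/sh
# Sketch only. Element names follow the descriptions in this section; all
# attribute values, the action list, and the timeouts are assumptions.
foobar_meta_data() {
    cat <<'EOF'
<?xml version="1.0"?>
<resource-agent name="foobar" version="0.1">
  <version>0.1</version>
  <longdesc>
This is a fictitious example resource agent written for the
OCF Resource Agent Developer's Guide.
  </longdesc>
  <shortdesc>Example resource agent for budding OCF RA developers</shortdesc>
  <parameters>
    <parameter name="eggs">
      <longdesc>Number of eggs, an example numeric parameter</longdesc>
      <shortdesc>Number of eggs</shortdesc>
      <content type="integer" required="true" unique="false"/>
    </parameter>
    <parameter name="superfrobnicate">
      <longdesc>Enable superfrobnication, an example boolean parameter</longdesc>
      <shortdesc>Enable superfrobnication</shortdesc>
      <content type="boolean" required="false" default="0"/>
    </parameter>
    <parameter name="datadir">
      <longdesc>Data directory, an example string parameter</longdesc>
      <shortdesc>Data directory</shortdesc>
      <content type="string" required="false" default="/var/lib/foobar"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20"/>
    <action name="stop" timeout="20"/>
    <action name="monitor" timeout="20" interval="10"/>
    <action name="validate-all" timeout="20"/>
    <action name="meta-data" timeout="5"/>
  </actions>
</resource-agent>
EOF
}
```

Quoting the here-document delimiter ('EOF') keeps the shell from expanding anything inside the XML, so the text is printed exactly as written.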


    Chapter 3. Return codes

    For any invocation, resource agents must exit with a defined return code that informs the caller of the outcome of the invoked action. The return codes are explained in detail in the following subsections.

    3.1. OCF_SUCCESS (0)

    The action completed successfully. This is the expected return code for any successful start, stop, promote, demote, migrate_from, migrate_to, meta-data, help, and usage action.

    For monitor (and its deprecated alias, status), however, a modified convention applies:

    For primitive (stateless) resources, OCF_SUCCESS from monitor means that the resource is running. Non-running and gracefully shut-down resources must instead return OCF_NOT_RUNNING.

    For master/slave (stateful) resources, OCF_SUCCESS from monitor means that the resource is running in Slave mode. Resources running in Master mode must instead return OCF_RUNNING_MASTER, and gracefully shut-down resources must instead return OCF_NOT_RUNNING.

    3.2. OCF_ERR_GENERIC (1)

    The action returned a generic error. A resource agent should use this exit code only when none of the more specific error codes, defined below, accurately describes the problem.

    The cluster resource manager interprets this exit code as a soft error. This means that unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with OCF_ERR_GENERIC in place, usually by restarting the resource on the same node.

    3.3. OCF_ERR_ARGS (2)

    The resource agent was invoked with incorrect arguments. This is a safety net "can't happen" error which the resource agent should only return when invoked with, for example, an incorrect number of command line arguments.

    Note

    The resource agent should not return this error when instructed to perform an action that it does not support. Instead, under those circumstances, it should return OCF_ERR_UNIMPLEMENTED.

    3.4. OCF_ERR_UNIMPLEMENTED (3)

    The resource agent was instructed to execute an action that the agent does not implement.

    Not all resource agent actions are mandatory. promote, demote, migrate_to, migrate_from, and notify are all optional actions which the resource agent may or may not implement. When a non-stateful resource agent is misconfigured as a master/slave resource, for example, then the resource agent should alert the user about this misconfiguration by returning OCF_ERR_UNIMPLEMENTED on the promote and demote actions.
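The difference between OCF_ERR_ARGS and OCF_ERR_UNIMPLEMENTED can be sketched as follows. The function check_invocation is a hypothetical helper, and the set of actions it accepts is an assumption describing an imaginary, non-stateful agent.

```shell
#!/bin/sh
# Sketch only: check_invocation is a hypothetical helper. The numeric
# values are the return codes listed in this chapter.
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_UNIMPLEMENTED=3

check_invocation() {
    # Exactly one command-line argument (the action) is expected.
    if [ "$#" -ne 1 ]; then
        return $OCF_ERR_ARGS
    fi
    case "$1" in
        start|stop|monitor|meta-data|validate-all)
            return $OCF_SUCCESS
            ;;
        *)
            # Well-formed invocation, but an action this agent lacks,
            # such as promote or demote on a non-stateful agent.
            return $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}
```

A wrong argument count is reported as an argument error, whereas a well-formed request for an unsupported action is reported as unimplemented.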


    3.5. OCF_ERR_PERM (4)

    The action failed due to insufficient permissions. This may be due to the agent not being able to open a certain file, to listen on a specific socket, to write to a directory, or similar.

    The cluster resource manager interprets this exit code as a hard error. This means that unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with this error by restarting the resource on a different node (where the permission problem may not exist).

    3.6. OCF_ERR_INSTALLED (5)

    The action failed because a required component is missing on the node where the action was executed. This may be due to a required binary not being executable, or a vital configuration file being unreadable.

    The cluster resource manager interprets this exit code as a hard error. This means that unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with this error by restarting the resource on a different node (where the required files or binaries may be present).

    3.7. OCF_ERR_CONFIGURED (6)

    The action failed because the user misconfigured the resource. For example, the user may have configured an alphanumeric string for a parameter that really should be an integer.

    The cluster resource manager interprets this exit code as a fatal error. Since this is a configuration error that is present cluster-wide, it would make no sense to recover such a resource on a different node, let alone in place. When a resource fails with this error, the cluster manager will attempt to shut down the resource, and wait for administrator intervention.
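The integer example above might be checked along the following lines. This is a sketch in plain POSIX shell: validate_eggs is a hypothetical helper, and eggs is the example numeric parameter from the metadata section.

```shell
#!/bin/sh
# Sketch only: validate_eggs is a hypothetical helper that rejects any
# value that is empty or contains a non-digit character.
OCF_SUCCESS=0
OCF_ERR_CONFIGURED=6

validate_eggs() {
    case "$1" in
        ''|*[!0-9]*)
            # Empty, or contains something other than digits.
            return $OCF_ERR_CONFIGURED
            ;;
    esac
    return $OCF_SUCCESS
}
```

Because the same bad value would be seen on every node, the agent reports a configuration error and leaves it to the administrator to correct it.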

    3.8. OCF_NOT_RUNNING (7)

    The resource was found not to be running. This is an exit code that may be returned by the monitor action exclusively. Note that this implies that the resource has either gracefully shut down, or has never been started.

    If the resource is not running due to an error condition, the monitor action should instead return one of the OCF_ERR_ exit codes or OCF_FAILED_MASTER.

    3.9. OCF_RUNNING_MASTER (8)

    The resource was found to be running in the Master role. This applies only to stateful (Master/Slave) resources, and only to their monitor action.

    Note that there is no specific exit code for "running in slave mode". This is because there is no functional distinction between a primitive resource running normally, and a stateful resource running as a slave. The monitor action of a stateful resource running normally in the Slave role should simply return OCF_SUCCESS.

    3.10. OCF_FAILED_MASTER (9)

    The resource was found to have failed in the Master role. This applies only to stateful (Master/Slave) resources, and only to their monitor action.


    The cluster resource manager interprets this exit code as a soft error. This means that unless specifically configured otherwise, the resource manager will attempt to recover a resource which failed with $OCF_FAILED_MASTER in place, usually by demoting, stopping, starting and then promoting the resource on the same node.
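For quick reference, the return codes of this chapter can be summarized in a small lookup function. This is a sketch only: describe_rc is a hypothetical helper, and the recovery class (soft, hard, fatal) is shown only for those codes where the text above states one.

```shell
#!/bin/sh
# Sketch only: describe_rc is a hypothetical helper mapping a numeric
# return code to its name and, where this chapter states it, the way the
# cluster resource manager classifies the error.
describe_rc() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC (soft error)" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM (hard error)" ;;
        5) echo "OCF_ERR_INSTALLED (hard error)" ;;
        6) echo "OCF_ERR_CONFIGURED (fatal error)" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER (soft error)" ;;
        *) echo "unknown return code: $1" ;;
    esac
}
```

A helper like this can be handy when reading agent logs, for example describe_rc "$rc" after running an action by hand.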


    Chapter 4. Resource agent structure

    A typical (shell-based) resource agent contains standard structural items, in the order listed in this chapter. The chapter describes the expected behavior of a resource agent with respect to the various actions it supports, using a fictitious resource agent named foobar as an example.

    4.1. Resource agent interpreter

    Any resource agent implemented as a script must specify its interpreter using standard "shebang" (#!) header syntax.

    #!/bin/sh

    If a resource agent is written in shell, specifying the generic shell interpreter (#!/bin/sh) is generally preferred, though not required. Resource agents declared as /bin/sh compatible must not use constructs native to a specific shell (such as, for example, ${!variable} syntax native to bash). It is advisable to occasionally run such resource agents through a sanitization utility such as checkbashisms.

    It is considered a regression to introduce a patch that will make a previously sh compatible resource agent suitable only for bash, ksh, or any other non-generic shell. It is, however, perfectly acceptable for a new resource agent to explicitly define a specific shell, such as /bin/bash, as its interpreter.
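To illustrate the ${!variable} case mentioned above, the following sketch shows one portable way to look up a variable by name without the bash-only syntax. The helper get_indirect is hypothetical; it validates the name before passing it to eval, since eval on unchecked input would be unsafe.

```shell
#!/bin/sh
# Sketch only: get_indirect is a hypothetical helper.
#   bash only:  value=${!name}
#   POSIX sh:   eval, after checking that the name is a valid identifier.
get_indirect() {
    case "$1" in
        ''|[0-9]*|*[!A-Za-z0-9_]*)
            # Empty, starts with a digit, or contains an invalid character.
            return 1
            ;;
    esac
    eval "printf '%s\n' \"\${$1}\""
}
```

For example, with OCF_RESKEY_ip set to 192.168.1.1, calling get_indirect OCF_RESKEY_ip prints that address, and the function works under any POSIX-compatible /bin/sh.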

    4.2. Author and license information

    The resource agent should contain a comment listing the resource agent author(s) and/or copyright holder(s), and stating the license that applies to the resource agent:

    #
    # Resource Agent for managing foobar resources.
    #
    # License: GNU General Public License (GPL)
    # (c) 2008-2010 John Doe, Jane Roe,
    # and Linux-HA contributors

    When a resource agent refers to a license for which multiple versions exist, it is assumed that the current version applies.

    4.3. Initialization

    Any shell resource agent should source the .ocf-shellfuncs function library. With the syntax below, this is done in terms of $OCF_FUNCTIONS_DIR, which (for testing purposes, and also for generating documentation) may be overridden from the command line.

    # Initialization:
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

    Defaults for resource agent parameters should be set by initializing variables with the suffix _default:

    # Defaults
    OCF_RESKEY_superfrobnicate_default=0

    : ${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}


    Note

    The resource agent should make sure that it sets a default for any parameter not marked as required in the metadata.
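The ": ${VAR=default}" idiom used for defaults assigns only when the variable is unset; a value that is set but empty is left alone. The sketch below demonstrates this with the superfrobnicate parameter; the result variables (unset_result and so on) are hypothetical names used only to record each outcome.

```shell
#!/bin/sh
# Sketch only: demonstrates how the default-assignment idiom behaves.
OCF_RESKEY_superfrobnicate_default=0

# Case 1: parameter not set at all, so the default is assigned.
unset OCF_RESKEY_superfrobnicate
: "${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}"
unset_result=$OCF_RESKEY_superfrobnicate

# Case 2: parameter set by the user, so the user's value wins.
OCF_RESKEY_superfrobnicate=1
: "${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}"
set_result=$OCF_RESKEY_superfrobnicate

# Case 3: parameter set but empty. The "=" form leaves it empty ...
OCF_RESKEY_superfrobnicate=""
: "${OCF_RESKEY_superfrobnicate=${OCF_RESKEY_superfrobnicate_default}}"
empty_result=$OCF_RESKEY_superfrobnicate

# ... whereas the ":=" form would also replace an empty value.
: "${OCF_RESKEY_superfrobnicate:=${OCF_RESKEY_superfrobnicate_default}}"
colon_result=$OCF_RESKEY_superfrobnicate
```

Whether an empty value should fall back to the default is a design decision for each agent; the form shown in this guide keeps an explicitly empty value as it is.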

    4.4. Functions implementing resource agent actions

    What follows next are the functions implementing the resource agent's advertised actions. The individual actions are described in detail in Chapter 5, Resource agent actions.

    4.5. Execution block

    This is the part of the resource agent that actually executes when the resource agent is invoked. It typically follows a fairly standard structure:

    # Make sure meta-data and usage always succeed
    case $__OCF_ACTION in
    meta-data)      foobar_meta_data
                    exit $OCF_SUCCESS
                    ;;
    usage|help)     foobar_usage
                    exit $OCF_SUCCESS
                    ;;
    esac

    # Anything other than meta-data and usage must pass validation
    foobar_validate_all || exit $?

    # Translate each action into the appropriate function call
    case $__OCF_ACTION in
    start)          foobar_start;;
    stop)           foobar_stop;;
    status|monitor) foobar_monitor;;
    promote)        foobar_promote;;
    demote)         foobar_demote;;
    reload)         ocf_log info "Reloading..."
                    foobar_start
                    ;;
    validate-all)   ;;
    *)              foobar_usage
                    exit $OCF_ERR_UNIMPLEMENTED
                    ;;
    esac
    rc=$?

    # The resource agent may optionally log a debug message
    ocf_log debug "${OCF_RESOURCE_INSTANCE} $__OCF_ACTION returned $rc"
    exit $rc


    Chapter 5. Resource agent actions

    Each action is typically implemented in a separate function or method in the resource agent. By convention, these are usually named <agent>_<action>, so the function implementing the start action in foobar would be named foobar_start().

    As a general rule, whenever the resource agent encounters an error that it is not able to recover from, it is permitted to immediately exit, throw an exception, or otherwise cease execution. Examples of this include configuration issues, missing binaries, permission problems, etc. It is not necessary to pass these errors up the call stack.

    It is the cluster manager's responsibility to initiate the appropriate recovery action based on the user's configuration. The resource agent should not guess at said configuration.

    5.1. start action

    When invoked with the start action, the resource agent must start the resource if it is not yet running. This means that the agent must verify the resource's configuration, query its state, and then start it only if it is not running. A common way of doing this would be to invoke the validate_all and monitor functions first, as in the following example:

    foobar_start() {
        # exit immediately if configuration is not valid
        foobar_validate_all || exit $?

        # if resource is already running, bail out early
        if foobar_monitor; then
            ocf_log info "Resource is already running"
            return $OCF_SUCCESS
        fi

        # actually start up the resource here (make sure to immediately
        # exit with an $OCF_ERR_ error code if anything goes seriously
        # wrong)
        ...

        # After the resource has been started, check whether it started up
        # correctly. If the resource starts asynchronously, the agent may
        # spin on the monitor function here -- if the resource does not
        # start up within the defined timeout, the cluster manager will
        # consider the start action failed
        while ! foobar_monitor; do
            ocf_log debug "Resource has not started yet, waiting"
            sleep 1
        done

        # only return $OCF_SUCCESS if _everything_ succeeded as expected
        return $OCF_SUCCESS
    }

    5.2. stop action

    When invoked with the stop action, the resource agent must stop the resource, if it is running. This means that the agent must verify the resource configuration, query its state, and then stop it only if it is currently running. A common way of doing this would be to invoke the validate_all and monitor functions first. It is important to understand that stop is a force operation: the resource agent must do everything in its power to shut down the resource, short of rebooting the node or shutting it off. Consider the following example:

    foobar_stop() {
        local rc

        # exit immediately if configuration is not valid
        foobar_validate_all || exit $?

        foobar_monitor
        rc=$?
        case "$rc" in
            "$OCF_SUCCESS")
                # Currently running. Normal, expected behavior.
                ocf_log debug "Resource is currently running"
                ;;
            "$OCF_RUNNING_MASTER")
                # Running as a Master. Need to demote before stopping.
                ocf_log info "Resource is currently running as Master"
                foobar_demote || \
                    ocf_log warn "Demote failed, trying to stop anyway"
                ;;
            "$OCF_NOT_RUNNING")
                # Currently not running. Nothing to do.
                ocf_log info "Resource is already stopped"
                return $OCF_SUCCESS
                ;;
        esac

        # actually shut down the resource here (make sure to immediately
        # exit with an $OCF_ERR_ error code if anything goes seriously
        # wrong)
        ...

        # After the resource has been stopped, check whether it shut down
        # correctly. If the resource stops asynchronously, the agent may
        # spin on the monitor function here -- if the resource does not
        # shut down within the defined timeout, the cluster manager will
        # consider the stop action failed
        while foobar_monitor; do
            ocf_log debug "Resource has not stopped yet, waiting"
            sleep 1
        done

        # only return $OCF_SUCCESS if _everything_ succeeded as expected
        return $OCF_SUCCESS
    }

    Note

    The expected exit code for a successful stop operation is $OCF_SUCCESS, not $OCF_NOT_RUNNING.

    Important

    A failed stop operation is a potentially dangerous situation which the cluster manager will almost invariably try to resolve by means of node fencing. In other words, the cluster manager will forcibly evict from the cluster a node on which a stop operation has failed. While this measure serves ultimately to protect data, it does cause disruption to applications and their users. Thus, a resource agent should make sure that it exits with an error only if all avenues for proper resource shutdown have been exhausted.
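One way to exhaust the available avenues is to escalate from a polite request to a forced kill before reporting failure. The following is a sketch under stated assumptions: stop_with_escalation is a hypothetical helper, the resource is assumed to be a single process identified by its PID, and the grace period is an arbitrary illustrative value. A real agent would normally try the resource's own shutdown command before resorting to signals.

```shell
#!/bin/sh
# Sketch only: stop_with_escalation is a hypothetical helper.
# Usage: stop_with_escalation <pid> [grace-seconds]
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

stop_with_escalation() {
    _pid=$1
    _grace=${2:-5}      # seconds to wait after SIGTERM (assumed value)

    # Already gone: nothing to do, and stop still counts as successful.
    kill -0 "$_pid" 2>/dev/null || return $OCF_SUCCESS

    # First avenue: ask the process to terminate and wait for it.
    kill -TERM "$_pid" 2>/dev/null
    while [ "$_grace" -gt 0 ] && kill -0 "$_pid" 2>/dev/null; do
        sleep 1
        _grace=$((_grace - 1))
    done

    # Second avenue: force termination.
    if kill -0 "$_pid" 2>/dev/null; then
        kill -KILL "$_pid" 2>/dev/null
        sleep 1
    fi

    # Report an error only if the process survived every attempt.
    if kill -0 "$_pid" 2>/dev/null; then
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```

The function returns an error only after both signals have failed, which matches the rule that a stop failure should be the last resort.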

    5.3. monitor action

    The monitor action queries the current status of a resource. It must discern between three different states:

    resource is currently running (return $OCF_SUCCESS);

    resource has stopped gracefully (return $OCF_NOT_RUNNING);

    resource has run into a problem and must be considered failed (return the appropriate $OCF_ERR_ code to indicate the nature of the problem).

    foobar_monitor() {
        local rc

        # exit immediately if configuration is not valid
        foobar_validate_all || exit $?

        ocf_run frobnicate --test

        # This example assumes the following exit code convention
        # for frobnicate:
        # 0: running, and fully caught up with master
        # 1: gracefully stopped
        # any other: error
        case "$?" in
            0)
                rc=$OCF_SUCCESS
                ocf_log debug "Resource is running"
                ;;
            1)
                rc=$OCF_NOT_RUNNING
                ocf_log debug "Resource is not running"
                ;;
            *)
                ocf_log err "Resource has failed"
                exit $OCF_ERR_GENERIC
                ;;
        esac

        return $rc
    }

    Stateful (master/slave) resource agents may use a more elaborate monitoring scheme where they can provide "hints" to the cluster manager identifying which instance is best suited to assume the Master role. Section 8.4, Specifying a master preference, explains the details.

    Note

    The cluster manager may invoke the monitor action for a probe, which is a test of whether the resource is currently running. Normally, the monitor operation would behave exactly the same during a probe and a "real" monitor action. If a specific resource does require special treatment for probes, however, the ocf_is_probe convenience function is available in the OCF shell functions library for that purpose.
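To make the idea of probe detection concrete, here is a stand-alone sketch of such a test. It is not the library's implementation: is_probe is a hypothetical stand-in for ocf_is_probe, and it rests on the assumption that the cluster manager (Pacemaker, for example) runs a probe as a monitor action whose interval meta parameter is 0.

```shell
#!/bin/sh
# Sketch only: is_probe is a hypothetical stand-in for ocf_is_probe.
# Assumption: a probe is a monitor action with an interval of 0.
is_probe() {
    [ "$__OCF_ACTION" = "monitor" ] &&
        [ "$OCF_RESKEY_CRM_meta_interval" = "0" ]
}
```

An agent could then skip checks that only make sense for a running resource, for example by wrapping them in "if ! is_probe; then ... fi".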


    5.4. validate-all action

    The validate-all action tests for correct resource agent configuration and a working environment. validate-all should exit with one of the following return codes:

    $OCF_SUCCESS: all is well, the configuration is valid and usable.

    $OCF_ERR_CONFIGURED: the user has misconfigured the resource.

    $OCF_ERR_INSTALLED: the resource has possibly been configured correctly, but a vital component is missing on the node where validate-all is being executed.

    $OCF_ERR_PERM: the resource is configured correctly and is not missing any required components, but is suffering from a permission issue (such as not being able to create a necessary file).

    validate-all is usually wrapped in a function that is not only called when explicitly invoking the corresponding action, but also as a sanity check from just about any other function.

    Therefore, the resource agent author must keep in mind that the function may be invoked during the start, stop, and monitor operations, and also during probes.

    Probes pose a separate challenge for validation. During a probe (when the cluster manager may expect the resource not to be running on the node where the probe is executed), some required components may be expected to not be available on the affected node. For example, this includes any shared data on storage devices not available for reading during the probe. The validate-all function may thus need to treat probes specially, using the ocf_is_probe convenience function:

    foobar_validate_all() {

    # Test for configuration errors first

    if ! ocf_is_decimal $OCF_RESKEY_eggs; then

    ocf_log err "eggs is not numeric!"

    exit $OCF_ERR_CONFIGURED

    fi

    # Test for required binaries

    check_binary frobnicate

    # Check for data directory (this may be on shared storage, so

    # disable this test during probes)

    if ! ocf_is_probe; then

    if ! [ -d "$OCF_RESKEY_datadir" ]; then

    ocf_log err "$OCF_RESKEY_datadir does not exist or is not a directory!"

    exit $OCF_ERR_INSTALLED

    fi

    fi

    return $OCF_SUCCESS

    }

    5.5. meta-data action

    The meta-data action dumps the resource agent metadata to standard output. The output must follow the metadata format as specified in Section 2.4, "Metadata".

    foobar_meta_data() {

    cat <<EOF

    <?xml version="1.0"?>

    <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">

    <resource-agent name="foobar" version="0.1">

    <version>0.1</version>

    ...

    EOF

    }

    5.6. promote action

    The promote action is optional. It must only be supported by stateful resource agents, which means agents that discern between two distinct roles: Master and Slave. Slave is functionally identical to the Started state in a stateless resource agent. Thus, while a regular (stateless) resource agent only needs to implement start and stop, a stateful resource agent must also support the promote action to be able to make a transition between the Started (Slave) and Master roles.

    foobar_promote() {

    local rc

    # exit immediately if configuration is not valid

    foobar_validate_all || exit $?

    # test the resource's current state

    foobar_monitor

    rc=$?

    case "$rc" in

    "$OCF_SUCCESS")

    # Running as slave. Normal, expected behavior.

    ocf_log debug "Resource is currently running as Slave"

    ;;

    "$OCF_RUNNING_MASTER")

    # Already a master. Unexpected, but not a problem.

    ocf_log info "Resource is already running as Master"

    return $OCF_SUCCESS

    ;;

    "$OCF_NOT_RUNNING")

    # Currently not running. Need to start before promoting.

    ocf_log info "Resource is currently not running"

    foobar_start

    ;;

    *)

    # Failed resource. Let the cluster manager recover.

    ocf_log err "Unexpected error, cannot promote"

    exit $rc

    ;;

    esac

    # actually promote the resource here (make sure to immediately

    # exit with an $OCF_ERR_ error code if anything goes seriously

    # wrong)

    ocf_run frobnicate --master-mode || exit $OCF_ERR_GENERIC

    # After the resource has been promoted, check whether the

    # promotion worked. If the resource promotion is asynchronous, the

    # agent may spin on the monitor function here -- if the resource

    # does not assume the Master role within the defined timeout, the

    # cluster manager will consider the promote action failed.

    while true; do

    foobar_monitor

    if [ $? -eq $OCF_RUNNING_MASTER ]; then

    ocf_log debug "Resource promoted"

    break

    else

    ocf_log debug "Resource still awaiting promotion"

    sleep 1

    fi

    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected

    return $OCF_SUCCESS

    }

    5.7. demote action

    The demote action is optional. It must only be supported by stateful resource agents, which means agents that discern between two distinct roles: Master and Slave. Slave is functionally identical to the Started state in a stateless resource agent. Thus, while a regular (stateless) resource agent only needs to implement start and stop, a stateful resource agent must also support the demote action to be able to make a transition between the Master and Started (Slave) roles.

    foobar_demote() {

    local rc

    # exit immediately if configuration is not valid

    foobar_validate_all || exit $?

    # test the resource's current state

    foobar_monitor

    rc=$?

    case "$rc" in

    "$OCF_RUNNING_MASTER")

    # Running as master. Normal, expected behavior.

    ocf_log debug "Resource is currently running as Master"

    ;;

    "$OCF_SUCCESS")

    # Already running as slave. Nothing to do.

    ocf_log debug "Resource is currently running as Slave"

    return $OCF_SUCCESS

    ;;

    "$OCF_NOT_RUNNING")

    # Currently not running. Getting a demote action

    # in this state is unexpected. Exit with an error

    # and let the cluster manager recover.

    ocf_log err "Resource is currently not running"

    exit $OCF_ERR_GENERIC

    ;;

    *)

    # Failed resource. Let the cluster manager recover.

    ocf_log err "Unexpected error, cannot demote"

    exit $rc

    ;;

    esac

    # actually demote the resource here (make sure to immediately

    # exit with an $OCF_ERR_ error code if anything goes seriously

    # wrong)

    ocf_run frobnicate --unset-master-mode || exit $OCF_ERR_GENERIC

    # After the resource has been demoted, check whether the

    # demotion worked. If the resource demotion is asynchronous, the

    # agent may spin on the monitor function here -- if the resource

    # does not assume the Slave role within the defined timeout, the

    # cluster manager will consider the demote action failed.

    while true; do

    foobar_monitor

    if [ $? -eq $OCF_RUNNING_MASTER ]; then

    ocf_log debug "Resource still awaiting demotion"

    sleep 1

    else

    ocf_log debug "Resource demoted"

    break

    fi

    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected

    return $OCF_SUCCESS

    }

    5.8. migrate_to action

    The migrate_to action can serve one of two purposes:

    Initiate a native push type migration for the resource. In other words, instruct the resource to move to a specific node from the node it is currently running on. The resource agent knows about its destination node via the $OCF_RESKEY_CRM_meta_migrate_target environment variable.

    Freeze the resource in a freeze/thaw (also known as suspend/resume) type migration. In this mode, the resource does not need any information about its destination node at this point.

    The example below illustrates a push type migration:

    foobar_migrate_to() {

    # exit immediately if configuration is not valid

    foobar_validate_all || exit $?

    # if resource is not running, bail out early

    if ! foobar_monitor; then

    ocf_log err "Resource is not running"

    exit $OCF_ERR_GENERIC

    fi

    # actually migrate the resource here (make sure to immediately

    # exit with an $OCF_ERR_ error code if anything goes seriously

    # wrong)

    ocf_run frobnicate --migrate \

    --dest=$OCF_RESKEY_CRM_meta_migrate_target \

    || exit $OCF_ERR_GENERIC

    ...

    # only return $OCF_SUCCESS if _everything_ succeeded as expected

    return $OCF_SUCCESS

    }

    In contrast, a freeze/thaw type migration may implement its freeze operation like this:

    foobar_migrate_to() {

    # exit immediately if configuration is not valid

    foobar_validate_all || exit $?

    # if resource is not running, bail out early

    if ! foobar_monitor; then

    ocf_log err "Resource is not running"

    exit $OCF_ERR_GENERIC

    fi

    # actually freeze the resource here (make sure to immediately

    # exit with an $OCF_ERR_ error code if anything goes seriously

    # wrong)

    ocf_run frobnicate --freeze || exit $OCF_ERR_GENERIC

    ...

    # only return $OCF_SUCCESS if _everything_ succeeded as expected

    return $OCF_SUCCESS

    }

    5.9. migrate_from action

    The migrate_from action can serve one of two purposes:

    Complete a native push type migration for the resource. In other words, check whether the migration has succeeded properly, and the resource is running on the local node. The resource agent knows about the migration source via the $OCF_RESKEY_CRM_meta_migrate_source environment variable.

    Thaw the resource in a freeze/thaw (also known as suspend/resume) type migration. In this mode, the resource usually does not need any information about its source node at this point.

    The example below illustrates a push type migration:

    foobar_migrate_from() {

    # exit immediately if configuration is not valid

    foobar_validate_all || exit $?

    # After the resource has been migrated, check whether it resumed

    # correctly. If the resource starts asynchronously, the agent may

    # spin on the monitor function here -- if the resource does not

    # run within the defined timeout, the cluster manager will

    # consider the migrate_from action failed

    while ! foobar_monitor; do

    ocf_log debug "Resource has not yet migrated, waiting"

    sleep 1

    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected

    return $OCF_SUCCESS

    }

    In contrast, a freeze/thaw type migration may implement its thaw operation like this:

    foobar_migrate_from() {

    # exit immediately if configuration is not valid

    foobar_validate_all || exit $?

    # actually thaw the resource here (make sure to immediately

    # exit with an $OCF_ERR_ error code if anything goes seriously

    # wrong)

    ocf_run frobnicate --thaw || exit $OCF_ERR_GENERIC

    # After the resource has been migrated, check whether it resumed

    # correctly. If the resource starts asynchronously, the agent may

    # spin on the monitor function here -- if the resource does not

    # run within the defined timeout, the cluster manager will

    # consider the migrate_from action failed

    while ! foobar_monitor; do

    ocf_log debug "Resource has not yet migrated, waiting"

    sleep 1

    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected

    return $OCF_SUCCESS

    }

    5.10. notify action

    With notifications, instances of clones (and of master/slave resources, which are an extended kind of clones) can inform each other about their state. When notifications are enabled, any action on any instance of a clone carries a pre and post notification. Then, the cluster manager invokes the notify operation on all clone instances. For notify operations, additional environment variables are passed into the resource agent during execution:

    $OCF_RESKEY_CRM_meta_notify_type: the notification type (pre or post)

    $OCF_RESKEY_CRM_meta_notify_operation: the operation (action) that the notification is about (start, stop, promote, demote etc.)

    $OCF_RESKEY_CRM_meta_notify_start_uname: node name of the node where the resource is being started (start notifications only)

    $OCF_RESKEY_CRM_meta_notify_stop_uname: node name of the node where the resource is being stopped (stop notifications only)

    $OCF_RESKEY_CRM_meta_notify_master_uname: node name of the node where the resource currently is in the Master role

    $OCF_RESKEY_CRM_meta_notify_promote_uname: node name of the node where the resource currently is being promoted to the Master role (promote notifications only)

    $OCF_RESKEY_CRM_meta_notify_demote_uname: node name of the node where the resource currently is being demoted to the Slave role (demote notifications only)

    Notifications come in particularly handy for master/slave resources using a "pull" scheme, where the master is a publisher and the slave a subscriber. Since the master is obviously only available as such when a promotion has occurred, the slaves can use a "pre-promote" notification to configure themselves to subscribe to the right publisher.

    Likewise, the subscribers may want to unsubscribe from the publisher after it has relinquished its master status, and a "post-demote" notification can be used for that purpose.

    Consider the example below to illustrate the concept.

    foobar_notify() {

    local type_op

    type_op="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"

    ocf_log debug "Received $type_op notification."

    case "$type_op" in

    'pre-promote')

    ocf_run frobnicate --slave-mode \

    --master=$OCF_RESKEY_CRM_meta_notify_promote_uname \

    || exit $OCF_ERR_GENERIC

    ;;

    'post-demote')

    ocf_run frobnicate --unset-slave-mode || exit $OCF_ERR_GENERIC

    ;;

    esac

    return $OCF_SUCCESS

    }

    Note

    A master/slave resource agent may support a multi-master configuration, where there is possibly more than one master at any given time. If that is the case, then the $OCF_RESKEY_CRM_meta_notify_*_uname variables may each contain a space-separated list of host names, rather than a single host name as shown in the example. Under those circumstances the resource agent would have to properly iterate over this list.
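
    Iterating over such a list could look like the sketch below. The foobar_subscribe and foobar_subscribe_all helpers are hypothetical and merely print what they would do; a real agent would call its own application command instead. The value 1 for $OCF_ERR_GENERIC is set here only so that the snippet runs without the shell functions library.

```shell
# Defined here only so that the sketch is self-contained.
: ${OCF_ERR_GENERIC=1}

# Hypothetical helper: subscribe to a single master.
foobar_subscribe() {
    echo "subscribing to master on $1"
}

foobar_subscribe_all() {
    # The variable is intentionally left unquoted so that the shell
    # splits it on whitespace into individual host names.
    for master in $OCF_RESKEY_CRM_meta_notify_promote_uname; do
        foobar_subscribe "$master" || return $OCF_ERR_GENERIC
    done
    return 0
}

# Simulate a pre-promote notification naming two nodes.
OCF_RESKEY_CRM_meta_notify_promote_uname="alice bob"
foobar_subscribe_all
```

    Because the loop body runs once per host name, the same function also covers the single-master case, and it does nothing when the variable is empty.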

    Chapter 6. Script variables

    This section outlines variables typically available to resource agents, primarily for convenience purposes. For additional variables available while the agent is being executed, refer to Section 2.1, "Environment variables", and Chapter 3, "Return codes".

    6.1. $OCF_ROOT

    The root of the OCF resource agent hierarchy. This should never be changed by a resource agent. This is usually /usr/lib/ocf.

    6.2. $OCF_FUNCTIONS_DIR

    The directory where the resource agents' shell function library, .ocf-shellfuncs, resides. This is usually defined in terms of $OCF_ROOT and should never be changed by a resource agent. This variable may, however, be overridden from the command line while testing a new or modified resource agent.
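
    One way to allow such an override is the shell's default-assignment idiom, sketched below. The fallback for $OCF_ROOT and the resource.d/heartbeat subdirectory are assumptions made for illustration; the actual location depends on the installation.

```shell
# Assumed fallback so that the sketch runs outside a cluster; the
# cluster manager normally provides OCF_ROOT in the environment.
: ${OCF_ROOT=/usr/lib/ocf}

# Assign a default only if the variable is not already set, so that a
# value given on the command line takes precedence.
# The heartbeat subdirectory is an assumed location.
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}

echo "shell functions expected in: ${OCF_FUNCTIONS_DIR}"
```

    With this idiom, prefixing the agent invocation with an OCF_FUNCTIONS_DIR=... assignment would point it at a library in a working copy, while an unmodified environment keeps the default.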

    6.3. $OCF_RESOURCE_INSTANCE

    The resource instance name. For primitive (non-clone, non-stateful) resources, this is simply the resource name. For clones and stateful resources, this is the primitive name, followed by a colon and the clone instance number (such as p_foobar:0).
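
    If an agent needs the two parts separately, plain parameter expansion is enough. The foobar_split_instance helper below is a hypothetical name used only for this sketch.

```shell
# Hypothetical helper: split an instance name of the form
# <primitive>:<number> into its two parts.
foobar_split_instance() {
    # Remove everything from the first colon onward: primitive name.
    primitive="${1%%:*}"
    # The instance number exists only for clones and stateful resources.
    case "$1" in
        *:*) instance="${1##*:}" ;;
        *)   instance="" ;;
    esac
    echo "primitive=$primitive instance=$instance"
}

OCF_RESOURCE_INSTANCE="p_foobar:0"
foobar_split_instance "$OCF_RESOURCE_INSTANCE"
# prints "primitive=p_foobar instance=0"
```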

    6.4. $__OCF_ACTION

    The currently invoked action. This is exactly the first command-line argument that the cluster manager specifies when it invokes the resource agent.

    6.5. $__SCRIPT_NAME

    The name of the resource agent. This is exactly the base name of the resource agent script, with leading directory names removed.

    6.6. $HA_RSCTMP

    A temporary directory for use by resource agents. The system startup sequence (on any LSB compliant Linux distribution) guarantees that this directory is emptied on system startup, so this directory will not contain any stale data after a node reboot.

    Chapter 7. Convenience functions

    7.1. Logging: ocf_log

    Resource agents should use the ocf_log function for logging purposes. This convenient logging wrapper is invoked as follows:

    ocf_log <severity> "Log message"

    It supports the following severity levels:

    debug: for debugging messages. Most logging configurations suppress this level by default.

    info: for informational messages about the agent's behavior or status.

    warn: for warnings. This is for any messages which reflect unexpected behavior that does not constitute an unrecoverable error.

    err: for errors. As a general rule, this logging level should only be used immediately prior to an exit with the appropriate error code.

    crit: for critical errors. As with err, this logging level should not be used unless the resource agent also exits with an error code. Very rarely used.

    7.2. Testing for binaries: have_binary and check_binary

    A resource agent may need to test for the availability of a specific executable. The have_binary convenience function comes in handy here:

    if ! have_binary frobnicate; then

    ocf_log warn "Missing frobnicate binary, frobnication disabled!"

    fi

    If a missing binary is a fatal problem for the resource, then the check_binary function should be used:

    check_binary frobnicate

    Using check_binary is a shorthand method for testing for the existence (and executability) of the specified binary, and exiting with $OCF_ERR_INSTALLED if it cannot be found or executed.

    Note

    Both have_binary and check_binary honor $PATH when the binary to test for is not specified as a full path. It is usually wise not to test for a full path, as binary installation paths may vary by distribution or user policy.
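
    To illustrate what a $PATH-based lookup means in practice, the sketch below uses the standard "command -v" shell builtin. The have_binary_sketch function is a hypothetical stand-in and not the library's implementation; unlike a dedicated check, "command -v" also matches shell builtins and functions of the same name.

```shell
# Illustrative stand-in, not the library's implementation.
# "command -v" resolves a bare name by searching $PATH, so no
# distribution-specific directory needs to be hard-coded.
have_binary_sketch() {
    command -v "$1" >/dev/null 2>&1
}

if have_binary_sketch sh; then
    echo "sh found via PATH"
fi

if ! have_binary_sketch no-such-frobnicate-binary; then
    echo "no-such-frobnicate-binary is missing"
fi
```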

    7.3. Executing commands and capturing their output: ocf_run

    Whenever a resource agent needs to execute a command and capture its output, it should use the ocf_run convenience function, invoked as in this example:

    ocf_run "frobnicate --spam=eggs" || exit $OCF_ERR_GENERIC

    With the command specified above, the resource agent will invoke frobnicate --spam=eggs and capture its output and exit code. If the exit code is nonzero (indicating an error), ocf_run logs the command output with the err logging severity, and the resource agent subsequently exits.

    If the resource agent wishes to capture the output of both a successful and a failed command execution, it can use the -v flag with ocf_run. In the example below, ocf_run will log any output from the command with the info severity if the command exit code is zero (indicating success), and with err if it is nonzero.

    ocf_run -v "frobnicate --spam=eggs" || exit $OCF_ERR_GENERIC

    Finally, if the resource agent wants to log the output of a command with a nonzero exit code with a severity other than error, it may do so by adding the -info or -warn option to ocf_run:

    ocf_run -warn "frobnicate --spam=eggs"
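
    The behavior described in this section can be approximated in a few lines of shell. The sketch below is a simplified illustration and not the library's implementation: run_and_log_sketch is a hypothetical name, it accepts at most one option, it takes the command as separate words, and ocf_log is replaced by a stand-in that prints to standard output.

```shell
# Stand-in logger so that the sketch runs without the OCF shell library.
ocf_log() {
    level="$1"
    shift
    echo "$level: $*"
}

# Simplified illustration of the described behavior: capture output
# and exit code, log failures, and log successes only when asked to.
run_and_log_sketch() {
    verbose=0
    fail_level=err
    case "$1" in
        -v)          verbose=1; shift ;;
        -info|-warn) fail_level="${1#-}"; shift ;;
    esac
    output=$("$@" 2>&1)
    rc=$?
    if [ "$rc" -ne 0 ]; then
        ocf_log "$fail_level" "$output"
    elif [ "$verbose" -eq 1 ]; then
        ocf_log info "$output"
    fi
    return "$rc"
}

run_and_log_sketch -v echo "frobnication complete"
run_and_log_sketch -warn sh -c 'echo "disk almost full"; exit 3' \
    || echo "exit code was $?"
```

    The exit code of the wrapped command is passed through unchanged, which is what allows the "|| exit $OCF_ERR_GENERIC" pattern shown in the examples above.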

    7.4. Locks: ocf_take_lock and ocf_release_lock_on_exit

    Occasionally, there may be different resources of the same type in a cluster configuration that should not execute actions in parallel. When a resource agent needs to guard against parallel execution on the same machine, it can use the ocf_take_lock and ocf_release_lock_on_exit convenience functions:

    LOCKFILE=${HA_RSCTMP}/foobar

    ocf_release_lock_on_exit $LOCKFILE

    foobar_start() {

    ...

    ocf_take_lock $LOCKFILE

    ...

    }

    ocf_take_lock attempts to acquire the designated $LOCKFILE. When it is unavailable, it sleeps a random amount of time between 0 and 1 seconds, and retries. ocf_release_lock_on_exit releases the lock file when the agent exits (for any reason).
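
    The acquire, retry, and release-on-exit pattern can be illustrated with standard shell tools. The sketch below is not the library's implementation: take_lock_sketch and release_lock_on_exit_sketch are hypothetical names, the lock is a directory created with mkdir (which either succeeds or fails atomically), the retry delay is a fixed second rather than a random interval, and the number of attempts is bounded, which is a choice made for this example only.

```shell
# Illustrative stand-ins only; the library functions may work differently.
take_lock_sketch() {
    tries=0
    # mkdir fails if the directory already exists, so only one
    # process at a time can create it.
    until mkdir "$1" 2>/dev/null; do
        tries=$((tries + 1))
        # Bounded retries are a choice of this sketch.
        [ "$tries" -lt 30 ] || return 1
        sleep 1
    done
}

release_lock_on_exit_sketch() {
    # Remove the lock when the shell exits, whatever the reason.
    trap "rmdir '$1' 2>/dev/null" EXIT
}

LOCKDIR="${TMPDIR:-/tmp}/foobar.lock.$$"

# Run the critical section in a subshell so that the EXIT trap fires
# as soon as the section is left.
(
    release_lock_on_exit_sketch "$LOCKDIR"
    take_lock_sketch "$LOCKDIR" || exit 1
    echo "lock held, doing exclusive work"
)
```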

    7.5. Testing for numerical values: ocf_is_decimal

    Specifically for parameter validation, it can be helpful to test whether a given value is numeric.

    The ocf_is_decimal function exists for that purpose:

    foobar_validate_all() {

    if ! ocf_is_decimal $OCF_RESKEY_eggs; then

    ocf_log err "eggs is not numeric!"

    exit $OCF_ERR_CONFIGURED

    fi

    ...

    }
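
    For readers curious what such a numeric test involves, the sketch below shows one way to write it with a case pattern. The is_decimal_sketch function is a hypothetical stand-in and may differ from the library function in detail; it accepts only non-empty strings made up entirely of digits.

```shell
# Illustrative stand-in, not the library's implementation: accept only
# non-empty strings consisting entirely of the digits 0-9.
is_decimal_sketch() {
    case "$1" in
        ""|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

for value in 42 007 4.2 -1 eggs; do
    if is_decimal_sketch "$value"; then
        echo "'$value' is numeric"
    else
        echo "'$value' is not numeric"
    fi
done
```

    Quoting the tested value, as in "$value", makes sure that an empty parameter is passed as an empty argument and rejected, rather than disappearing from the command line.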

    7.6. Testing for boolean values: ocf_is_true

    When a resource agent defines a boolean parameter, the value for this parameter may be specified by the user as 0/1, true/false, or on/off. Since it is tedious to test for all these values from within the resource agent, the agent should instead use the ocf_is_true convenience function:

    if ocf_is_true $OCF_RESKEY_superfrobnicate; then

    ocf_run "frobnicate --super"

    fi

    Note

    If ocf_is_true is used against an empty or non-existent variable, it always returns an exit code of 1, which is equivalent to false.
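
    The sketch below illustrates the semantics described in this section, including the treatment of empty values. The is_true_sketch function is a hypothetical stand-in that covers only the "true" spellings listed above; the library function may accept additional spellings.

```shell
# Illustrative stand-in covering the spellings named in this section.
# Anything else, including an empty string, counts as false.
is_true_sketch() {
    case "$1" in
        1|true|on) return 0 ;;
        *)         return 1 ;;
    esac
}

# An unset parameter expands to an empty string and is thus false.
unset OCF_RESKEY_superfrobnicate
if is_true_sketch "$OCF_RESKEY_superfrobnicate"; then
    echo "super frobnication enabled"
else
    echo "super frobnication disabled"
fi

OCF_RESKEY_superfrobnicate=on
if is_true_sketch "$OCF_RESKEY_superfrobnicate"; then
    echo "super frobnication enabled"
fi
```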

    7.7. Pseudo resources: ha_pseudo_resource

    "Pseudo resources" are those where the resource agent in fact does not actually start or stop something akin to a runnable process, but merely executes a single action and then needs some form of tracking whether that action has been executed or not. The portblock resource agent is an example of this.

    Resource agents for pseudo resources can use a convenience function, ha_pseudo_resource, which makes use of tracking files to keep tabs on the status of a resource. If foobar was designed to manage a pseudo resource, then its start action could look like this:

    foobar_start() {

    # exit immediately if configuration is not valid

    foobar_validate_all || exit $?

    # if resource is already running, bail out early

    if foobar_monitor; then

    ocf_log info "Resource is already running"

    return $OCF_SUCCESS

    fi

    # start the pseudo resource

    ha_pseudo_resource ${OCF_RESOURCE_INSTANCE} start

    # After the resource has been started, check whether it started up

    # correctly. If the resource starts asynchronously, the agent may

    # spin on the monitor function here -- if the resource does not

    # start up within the defined timeout, the cluster manager will

    # consider the start action failed

    while ! foobar_monitor; do

    ocf_log debug "Resource has not started yet, waiting"

    sleep 1

    done

    # only return $OCF_SUCCESS if _everything_ succeeded as expected

    return $OCF_SUCCESS

    }
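
    The tracking file idea itself is simple enough to sketch in a few lines. The function below is a hypothetical stand-in, and the library's ha_pseudo_resource may differ in file naming and return codes. The numeric return codes are set here only so that the snippet runs without the shell functions library, and the fallback to a generic temporary directory is an assumption for use outside a cluster.

```shell
# Defined here only so that the sketch is self-contained.
: ${OCF_SUCCESS=0}
: ${OCF_ERR_UNIMPLEMENTED=3}
: ${OCF_NOT_RUNNING=7}

# Illustrative stand-in: the presence of a tracking file means
# "started", its absence means "stopped".
pseudo_resource_sketch() {
    tracking_file="${HA_RSCTMP:-${TMPDIR:-/tmp}}/foobar-$1.state"
    case "$2" in
        start)   touch "$tracking_file" || return 1 ;;
        stop)    rm -f "$tracking_file" || return 1 ;;
        monitor) [ -f "$tracking_file" ] || return $OCF_NOT_RUNNING ;;
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
    return $OCF_SUCCESS
}

pseudo_resource_sketch demo start
pseudo_resource_sketch demo monitor && echo "running"
pseudo_resource_sketch demo stop
pseudo_resource_sketch demo monitor || echo "not running (rc=$?)"
```

    Keeping the tracking file in a directory that is emptied at boot, such as $HA_RSCTMP, means that a rebooted node does not report a stale "started" state.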

    Chapter 8. Special considerations

    8.1. Licensing

    Whenever possible, resource agent contributors are encouraged to use the GNU General Public License (GPL), version 2 and later, for any new resource agents. The shell functions library does not strictly mandate this, however, as it is licensed under the GNU Lesser General Public License (LGPL), version 2.1 and later (so it can be used by non-GPL agents).

    The resource agent must explicitly state its own license in the agent source code.

    8.2. Locale settings

    When sourcing .ocf-shellfuncs as explained in Section 4.3, "Initialization", any resource agent automatically sets LANG and LC_ALL to the C locale. Resource agents can thus expect to always operate in the C locale, and need not reset LANG or any of the LC_* environment variables themselves.

    8.3. Testing for running processes

    For testing whether a particular process (with a known process ID) is currently running, a frequently found method is to send it a 0 signal and catch errors, similar to this example:

    if kill -s 0 `cat $daemon_pid_file`; then

    ocf_log debug "Process is currently running"

    else

    ocf_log warn "Process is dead, removing pid file"

    rm -f $daemon_pid_file

    fi

    This method has a significant drawback: kill -s 0 does return successfully for zombie processes. Zombies, also known as defunct processes, are processes that no longer run but still hold an entry in the process table. Thus, they must be considered failed resources for all intents and purposes, and for them the kill -s 0 approach yields a misleading, successful result.

    The kill -s 0 approach can employ an additional safeguard (which, however, will work on Linux only):

    pid=`cat $daemon_pid_file`

    if kill -s 0 $pid; then

    # Process exists in process table, check its status

    if grep -E "State:[[:space:]]+Z \(zombie\)" /proc/$pid/status; then

    ocf_log err "Process is defunct"

    # Bail out and let the cluster manager recover

    exit $OCF_ERR_GENERIC

    else

    ocf_log debug "Process is currently running"

    fi

    else

    ocf_log warn "Process is dead, removing pid file"

    rm -f $daemon_pid_file

    fi

    Important

    An approach far superior to both these examples is to instead test the functionality of the daemon by connecting to it with a client process, as shown in the example in Section 5.3, "monitor action".

    8.4. Specifying a master preference

    Stateful (master/slave) resources must set their own master preference; they can thus provide hints to the cluster manager as to which is the best instance to promote to the Master role.

    Important

    It is acceptable for multiple instances to have identical positive master preferences. In that case, the cluster resource manager will automatically select an instance to promote. However, if all instances have the (default) master score of zero, the cluster manager will not promote any instance at all. Thus, it is crucial that at least one instance has a positive master score.

    For this purpose, crm_master comes in handy. This convenience wrapper around the crm_attribute command sets a node attribute named master-$OCF_RESOURCE_INSTANCE for the node it is being executed on, and fills this attribute with the specified value. The cluster manager is then expected to translate this into a promotion score for the corresponding instance, and base its promotion preference on that score.

    Stateful resource agents typically execute crm_master during the monitor and/or notify action.

    The following example assumes that the foobar resource agent can test the application's status by executing a binary that returns certain exit codes based on whether

    the resource is either in the master role, or is a slave that is fully caught up with the master (at any rate, it has current data), or

    the resource is in the slave role, but through some form of asynchronous replication has "fallen behind" the master, or

    the resource has gracefully stopped, or

    the resource has unexpectedly failed.

    foobar_monitor() {

    local rc

    # exit immediately if configuration is not valid

    foobar_validate_all || exit $?

    ocf_run frobnicate --test

    # This example assumes the following exit code convention

    # for frobnicate:

    # 0: running, and fully caught up with master

    # 1: gracefully stopped

    # 2: running, but lagging behind master

    # any other: error

    case "$?" in

    0)

    rc=$OCF_SUCCESS

    ocf_log debug "Resource is running"

    # Set a high master preference. The current master

    # will always get this, plus 1. Any current slaves

    # will get a high preference so that if the master

    # fails, they are next in line to take over.

    crm_master -l reboot -v 100

    ;;

    1)

    rc=$OCF_NOT_RUNNING

    ocf_log debug "Resource is not running"

    # Remove the master preference for this node

    crm_master -l reboot -D

    ;;

    2)

    rc=$OCF_SUCCESS

    ocf_log debug "Resource is lagging behind master"

    # Set a low master preference: if the master fails

    # right now, and there is another slave that does

    # not lag behind the master, its higher master

    # preference will win and that slave will become

    # the new master

    crm_master -l reboot -v 5

    ;;

    *)

    ocf_log err "Resource has failed"

    exit $OCF_ERR_GENERIC

    esac

    return $rc

    }
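
    The score selection in the example above can also be separated from the monitor logic, which makes it easier to test in isolation. The sketch below assumes the same frobnicate exit code convention and the same score values; foobar_master_score and foobar_master_command are hypothetical helper names, and the crm_master command line is only printed, not executed.

```shell
# Hypothetical helper: translate the assumed frobnicate exit code into
# a master preference. Prints a score, or nothing if the preference
# should be removed.
foobar_master_score() {
    case "$1" in
        0) echo 100 ;;   # running and fully caught up
        2) echo 5 ;;     # running, but lagging behind the master
        *) ;;            # stopped or failed: no preference
    esac
}

# Print the crm_master invocation that a monitor action would run.
foobar_master_command() {
    score=$(foobar_master_score "$1")
    if [ -n "$score" ]; then
        echo "crm_master -l reboot -v $score"
    else
        echo "crm_master -l reboot -D"
    fi
}

for status in 0 1 2; do
    foobar_master_command "$status"
done
```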

    Chapter 9. Testing, installing, and packaging resource agents

    This section discusses what to do with your resource agent once it is done: how to test it, where to install it, and how to include it in either your own application package or in the Linux-HA resource agents repository.

    9.1. Testing resource agents

    The resource agents repository (and hence, any installed resource agents package) contains a utility named ocf-tester. This shell script allows you to conveniently and easily test the functionality of your resource agent.

    ocf-tester is commonly invoked, as root, like this:

    ocf-tester -n <name> [-o <param>=<value> ... ] <resource agent>

    <name> is an arbitrary resource name.

    You may set any number of <param>=<value> with the -o option, corresponding to any resource parameters you wish to set for testing.

    <resource agent> is the full path to your resource agent.

    When invoked, ocf-tester executes all mandatory actions and enforces action behavior as explained in Chapter 5, "Resource agent actions".

    It also tests for optional actions. Optional actions must behave as expected when advertised, but do not cause ocf-tester to flag an error if not implemented.

    Important

    ocf-tester does not initiate "dry runs" of actions, nor does it create resource dummies of any kind. Instead, it exercises the actual resource agent as-is, whether that may include opening and closing databases, mounting file systems, starting or stopping virtual machines, etc. Use with care.

    For example, you could run ocf-tester on the foobar resource agent as follows:

    # ocf-tester -n foobartest \

    -o superfrobnicate=true \

    -o datadir=/tmp \

    /home/johndoe/ra-dev/foobar

    Beginning tests for /home/johndoe/ra-dev/foobar...

    * Your agent does not support the notify action (optional)

    * Your agent does not support the reload action (optional)

    /home/johndoe/ra-dev/foobar passed all tests

    9.2. Installingresourceagents

    If you choose to include your resource agent in your own project, make sure it installs into the correct location. Resource agents should install into the /usr/lib/ocf/resource.d/<provider> directory, where <provider> is the name of your project or any other name you wish to identify the resource agent with.

    For example, if your foobar resource agent is being packaged as part of a project named fortytwo, then the correct full path to your resource agent would be /usr/lib/ocf/resource.d/fortytwo/foobar. Make sure your resource agent installs with 0755 (-rwxr-xr-x) permission bits.

    When installed this way, OCF-compliant cluster resource managers will be able to properly identify, parse, and execute your resource agent. The Pacemaker cluster manager, for example, would map the above-mentioned installation path to the ocf:fortytwo:foobar resource type identifier.
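
    The mapping from installation path to resource type identifier follows directly from the last two path components. The sketch below illustrates it with plain parameter expansion; ra_type_from_path is a hypothetical helper and not part of any cluster tool.

```shell
# Hypothetical helper: derive the resource type identifier from an
# installation path of the form .../resource.d/<provider>/<agent>.
ra_type_from_path() {
    agent="${1##*/}"        # base name, for example foobar
    dir="${1%/*}"           # directory part of the path
    provider="${dir##*/}"   # last directory component, for example fortytwo
    echo "ocf:${provider}:${agent}"
}

ra_type_from_path /usr/lib/ocf/resource.d/fortytwo/foobar
# prints "ocf:fortytwo:foobar"
```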

    9.3. Packaging resource agents

    When you package resource agents as part of your own project, you should apply the considerations outlined in this section.

    Note

    If you instead prefer to submit your resource agent to the Linux-HA resource agents repository, see Section 9.4, "Submitting resource agents", for information on doing so.

    9.3.1. RPM packaging

    It is recommended to put your OCF resource agent(s) in an RPM sub-package, with the name <toppackage>-resource-agents. Ensure that the package owns its provider directory, and depends on the upstream resource-agents package which lays out the directory hierarchy and provides convenience shell functions. An example RPM spec snippet is given below:

    %package resource-agents

    Summary: OCF resource agent for Foobar

    Group: System Environment/Base

    Requires: %{name} = %{version}-%{release}, resource-agents

    %description resource-agents

    This package contains the OCF-compliant resource agents for Foobar.

    %files resource-agents

    %defattr(755,root,root,-)

    %dir %{_prefix}/lib/ocf/resource.d/fortytwo

    %{_prefix}/lib/ocf/resource.d/fortytwo/foobar

    Note

    If an RPM spec file contains a %package declaration, then RPM considers this a sub-package which inherits top-level fields such as Name, Version, License, etc. Sub-packages have the top-level package name automatically prepended to their own name. Thus the snippet above would create a sub-package named foobar-resource-agents (presuming the package Name is foobar).

    9.3.2. Debian packaging

    For Debian packages, as for RPMs, it is recommended to create a separate package holding your resource agents, which then should depend on the cluster-agents package.

    Note

    This section assumes that you are packaging with debhelper.


    An example debian/control snippet is given below:

    Package: foobar-cluster-agents

    Priority: extra

    Architecture: all

    Depends: cluster-agents

    Description: OCF-compliant resource agents for Foobar

    You will also create a separate .install file. Sticking with the example of installing the foobar resource agent as a sub-package of fortytwo, the debian/fortytwo-cluster-agents.install file could consist of the following content:

    usr/lib/ocf/resource.d/fortytwo/foobar
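
Before building the package, it can help to confirm that every path named in the .install file really exists in the tree being packaged and is executable. The sketch below is one possible pre-build check, not a debhelper feature: check_install_file is a hypothetical helper, the staging directory is a temporary stand-in created for the demonstration, and the check assumes the simple one-path-per-line form shown above.

```shell
#!/bin/sh
# Sketch: verify that each path in a .install file exists under a staging
# root and is executable. All names here are illustrative assumptions.
set -e

STAGE=$(mktemp -d)    # hypothetical staging root standing in for your build
INSTALL_FILE="$STAGE/fortytwo-cluster-agents.install"

# Demo setup: a placeholder agent plus the one-line .install file.
mkdir -p "$STAGE/usr/lib/ocf/resource.d/fortytwo"
printf '#!/bin/sh\nexit 0\n' > "$STAGE/usr/lib/ocf/resource.d/fortytwo/foobar"
chmod 0755 "$STAGE/usr/lib/ocf/resource.d/fortytwo/foobar"
echo 'usr/lib/ocf/resource.d/fortytwo/foobar' > "$INSTALL_FILE"

check_install_file() {
    # $1: .install file, $2: staging root the listed paths are relative to
    status=0
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue          # skip empty lines
        if [ ! -x "$2/$entry" ]; then
            echo "missing or not executable: $entry" >&2
            status=1
        fi
    done < "$1"
    return $status
}

check_install_file "$INSTALL_FILE" "$STAGE" && echo "install list OK"
```

The function returns non-zero as soon as any listed path is absent or lacks the execute bit, so it can be used as a gate in a build script.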

    9.4. Submitting resource agents

    If you choose not to bundle your resource agent with your own package, but instead wish to submit it to the upstream resource agent repository hosted on the Linux-HA Mercurial server [http://hg.linux-ha.org/agents], please follow the steps outlined in this section.

    Create a working copy (a Mercurial clone) of the upstream repository with the following command:

    hg clone http://hg.linux-ha.org/agents resource-agents

    Create a new Mercurial queue, and a new patchset:

    cd resource-agents

    hg qinit

    hg qnew --edit foobar-ra

    In your patch message, be sure to include a meaningful description, for example:

    High: foobar: new resource agent

    This new resource agent adds functionality to manage a foobar service.

    It supports being configured as a primitive or as a master/slave set,

    and also optionally supports superfrobnication.

    Then, copy your resource agent into the heartbeat subdirectory:

    cd heartbeat

    cp /path/to/your/local/copy/of/foobar .

    chmod 0755 foobar

    hg add foobar

    cd ..

    Next, modify the Makefile.am file in resource-agents/heartbeat and add your new resource agent to the ocf_SCRIPTS list. This will make sure the agent is properly installed.

    Lastly, open Makefile.am in resource-agents/doc and add ocf_heartbeat_<name>.7 (where <name> is the name of your resource agent) to the man_MANS variable. This will automatically generate a resource agent manual page from its metadata, and then install that man page into the correct location.
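
Since both Makefile.am edits are easy to forget, a quick check before refreshing the patch may be useful. The following is a rough sketch, not part of the upstream tooling: check_listed is a hypothetical helper, the two Makefile.am files are tiny demo files created in a temporary directory rather than the real upstream ones, and the man page name ocf_heartbeat_foobar.7 is an assumption for the foobar example. The match is a plain substring search, so it can be fooled by agents whose names contain one another.

```shell
#!/bin/sh
# Sketch: confirm a new agent appears in both Makefile.am files.
# REPO is a hypothetical stand-in for your resource-agents working copy.
set -e

AGENT=foobar
REPO=$(mktemp -d)

# Demo content only; the real Makefile.am files are far longer.
mkdir -p "$REPO/heartbeat" "$REPO/doc"
printf 'ocf_SCRIPTS = Dummy \\\n\tfoobar\n' > "$REPO/heartbeat/Makefile.am"
printf 'man_MANS = ocf_heartbeat_Dummy.7 \\\n\tocf_heartbeat_foobar.7\n' \
    > "$REPO/doc/Makefile.am"

check_listed() {
    # $1: fixed string to look for, $2: file to search
    if grep -qF -- "$1" "$2"; then
        echo "ok: $1 listed in $2"
    else
        echo "MISSING: $1 not found in $2" >&2
        return 1
    fi
}

check_listed "$AGENT" "$REPO/heartbeat/Makefile.am"
check_listed "ocf_heartbeat_$AGENT.7" "$REPO/doc/Makefile.am"
```

To use this against a real clone, point REPO at your working copy and drop the demo setup lines.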

    Once all that is done, you can update your patch set:

    hg qrefresh

    Now the patch set is good for review on the mailing list:

    hg email [email protected] foobar-ra


    Once your new resource agent has been accepted for merging, one of the upstream developers will push your patch into the upstream repository. At that point, you can update your checkout from upstream, and remove your own patch set.

    hg qpop -a

    hg pull --update

    hg qdelete foobar-ra