Linux-HA with Pacemaker

Linux High Availability, Kris Buytaert

Uploaded by kris-buytaert, posted on 06-May-2015

DESCRIPTION

My Linux HA with Pacemaker presentation, as given at #load11.

TRANSCRIPT

Page 1: Linux-HA with Pacemaker

Linux High Availability

Kris Buytaert

Page 2: Linux-HA with Pacemaker

Kris Buytaert @krisbuytaert

● I used to be a Dev, then became an Op

● Senior Linux and Open Source Consultant @inuits.be

● „Infrastructure Architect“

● Building Clouds since before the Cloud

● Surviving the 10th floor test

● Co-Author of some books

● Guest Editor at some sites

Page 3: Linux-HA with Pacemaker

What is HA Clustering?

● One service goes down => others take over its work

● IP address takeover, service takeover

● Not designed for high performance

● Not designed for high throughput (load balancing)

Page 4: Linux-HA with Pacemaker

Does it Matter?

● Downtime is expensive

● You miss out on $$$

● Your boss complains

● New users don't return

Page 5: Linux-HA with Pacemaker

Lies, Damn Lies, and Statistics

Counting nines (slide by Alan R)

    Availability   Downtime per year
    99.9999%       30 sec
    99.999%        5 min
    99.99%         52 min
    99.9%          9 hr
    99%            3.5 days
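The figures in the table follow from multiplying the allowed unavailability by the length of a year. A minimal Python sketch, assuming a 365-day year (leap years and planned maintenance windows are ignored):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year


def downtime_per_year(availability_percent: float) -> float:
    """Seconds of downtime per year allowed by an availability percentage."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for nines in (99.9999, 99.999, 99.99, 99.9, 99.0):
        seconds = downtime_per_year(nines)
        print(f"{nines}%: {seconds:9.0f} s = {seconds / 60:8.1f} min "
              f"= {seconds / 3600:6.2f} h = {seconds / 86400:5.2f} days")
```

For example, 99.99% works out to about 52.6 minutes per year and 99% to about 3.65 days, in line with the rounded values on the slide.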

Page 6: Linux-HA with Pacemaker

The Rules of HA

● Keep it Simple

● Keep it Simple

● Prepare for Failure

● Complexity is the enemy of reliability

● Test your HA setup

Page 7: Linux-HA with Pacemaker

Myths

● Virtualization will solve your HA needs

● Live migration is the solution to all your problems

● VM mirroring is the solution to all your problems

● HA will make your platform more stable

Page 8: Linux-HA with Pacemaker

Eliminating the SPOF

● Find out what Will Fail
  • Disks
  • Fans
  • Power (Supplies)

● Find out what Can Fail
  • Network
  • Going Out Of Memory

Page 9: Linux-HA with Pacemaker

Split Brain

● Communication failures can split the cluster into separate partitions

● If each of those partitions tries to take control of the cluster, it's called a split-brain condition

● If this happens, then bad things will happen
  • http://linux-ha.org/BadThingsWillHappen
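A common guard against split brain is quorum: only a partition holding a strict majority of the votes may run resources. A simplified Python sketch of that majority rule, assuming one vote per node (real membership layers can weight votes and may treat two-node clusters specially); it needs Python 3.9+ for the `list[int]` annotation:

```python
def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """True if a partition holds a strict majority of all configured votes."""
    return 2 * partition_votes > total_votes


def partitions_with_quorum(partition_sizes: list[int]) -> list[bool]:
    """Evaluate each partition of a split cluster (one vote per node)."""
    total = sum(partition_sizes)
    return [has_quorum(size, total) for size in partition_sizes]


if __name__ == "__main__":
    print(partitions_with_quorum([2, 1]))  # 3 nodes split 2/1 -> [True, False]
    print(partitions_with_quorum([1, 1]))  # 2 nodes split 1/1 -> [False, False]
```

In a two-node cluster neither half can ever hold a majority after a split, which is why a two-node setup has to decide what to do on loss of quorum (the no-quorum-policy cluster property).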

Page 10: Linux-HA with Pacemaker

What do you care about?

● Your data?
  • Consistent
  • Realtime
  • Eventually consistent

● Your connection?
  • Always
  • Most of the time

Page 11: Linux-HA with Pacemaker

Shared Storage

● Shared storage

● Filesystem
  • e.g. GFS, GPFS

● Replicated?

● Exported filesystem?

● $$$ 1+1 <> 2

● Storage = SPOF

● Split brain :(

● STONITH

Page 12: Linux-HA with Pacemaker

(Shared) Data

● Issues:
  • Who writes?
  • Who reads?
  • What if 2 active applications want to write?
  • What if an active server crashes during writing?
  • Can we accept delays?
  • Can we accept read-only data?

● Hardware requirements

● Filesystem requirements (GFS, GPFS, ...)

Page 13: Linux-HA with Pacemaker

DRBD

● Distributed Replicated Block Device

● In the Linux kernel (as of very recently)

● Usually only 1 mount
  • Multi-mount (dual-primary) as of 8.x
  • Requires GFS / OCFS2

● With a regular FS (ext3, ...):
  • Only 1 application instance actively accessing the data
  • Upon failover, the application needs to be started on the other node

Page 14: Linux-HA with Pacemaker

DRBD (2)

● What happens when you pull the plug on a physical machine?
  • Minimal timeout
  • Why did the crash happen?
  • Is my data still correct?

Page 15: Linux-HA with Pacemaker

Alternatives to DRBD

● GlusterFS looked promising
  • “Friends don't let Friends use Gluster”
  • Consistency problems
  • Stability problems
  • Maybe later

● MogileFS
  • Not POSIX
  • App needs to implement the API

● Ceph
  • ?

Page 16: Linux-HA with Pacemaker

HA Projects

● Linux HA Project

● Red Hat Cluster Suite

● LVS/Keepalived

● Application-specific clustering software
  • e.g. Terracotta, MySQL NDBD

Page 17: Linux-HA with Pacemaker

Heartbeat

● Heartbeat v1
  • Max 2 nodes
  • No fine-grained resources
  • Monitoring using “mon”

● Heartbeat v2
  • XML usage was a consulting opportunity
  • Stability issues
  • Forking?

Page 18: Linux-HA with Pacemaker

Heartbeat v1

/etc/ha.d/ha.cf

/etc/ha.d/haresources

    mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 \
        IPaddr2::10.16.0.13/16/bond0.16 mon

/etc/ha.d/authkeys
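In an haresources entry the first field is the preferred node and every following field is a resource, with any arguments appended after `::`; Heartbeat v1 starts the resources left to right and stops them in reverse. A small Python sketch (3.9+) that parses such an entry, including `\` continuations, using the line from this slide as input; `parse_haresources` is a hypothetical helper, not part of Heartbeat:

```python
def parse_haresources(text: str) -> tuple[str, list[tuple[str, list[str]]]]:
    """Parse one haresources entry into (node, [(resource, args), ...])."""
    # Join backslash-continued physical lines into one logical line.
    logical = " ".join(part.strip().rstrip("\\") for part in text.strip().splitlines())
    node, *fields = logical.split()
    resources = []
    for field in fields:
        name, *args = field.split("::")  # arguments follow the name after '::'
        resources.append((name, args))
    return node, resources


SAMPLE = """mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 \\
    IPaddr2::10.16.0.13/16/bond0.16 mon"""

if __name__ == "__main__":
    node, resources = parse_haresources(SAMPLE)
    print("preferred node:", node)
    print("start order:", [name for name, _ in resources])
    print("stop order: ", [name for name, _ in reversed(resources)])
```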

Page 19: Linux-HA with Pacemaker

Heartbeat v2

“A consulting opportunity” (LMB)

Page 20: Linux-HA with Pacemaker

Clone Resource

Clones in v2 were buggy

Resources were started on 2 nodes

Stopped again on “1”

Page 21: Linux-HA with Pacemaker

Heartbeat v3

• No more /etc/ha.d/haresources

• No more XML

• Better integrated monitoring

• /etc/ha.d/ha.cf has
  • crm=yes

Page 22: Linux-HA with Pacemaker

Pacemaker?

● Not a fork

● Only the CRM code taken out of Heartbeat

● As of Heartbeat 2.1.3
  • Support for both OpenAIS / Heartbeat
  • Different release cycles from Heartbeat

Page 23: Linux-HA with Pacemaker

Heartbeat, OpenAIS, Corosync?

● All messaging layers

● Initially only Heartbeat

● OpenAIS

● Heartbeat became unmaintained

● OpenAIS had heisenbugs :(

● Corosync

● Heartbeat maintenance taken over by LinBit

● The CRM detects which layer is in use

Page 24: Linux-HA with Pacemaker

[Diagram: Pacemaker and Cluster Glue, running on either OpenAIS or Heartbeat as the messaging layer]

Page 25: Linux-HA with Pacemaker

Pacemaker Architecture

● stonithd: the Heartbeat fencing subsystem.

● lrmd: Local Resource Management Daemon. Interacts directly with resource agents (scripts).

● pengine: Policy Engine. Computes the next state of the cluster based on the current state and the configuration.

● cib: Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.

● crmd: Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM, it also elects a leader to co-ordinate the activities of the cluster.

● openais: messaging and membership layer.

● heartbeat: messaging layer, an alternative to OpenAIS.

● ccm: short for Consensus Cluster Membership. The Heartbeat membership layer.
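The pengine's role, computing the next state from the current state and the configuration, can be illustrated with a deliberately tiny model. This Python sketch (3.9+) is not Pacemaker's algorithm: it assumes each resource has exactly one desired node (or none), ignores scores, constraints, ordering and failures, and only derives the stop and start actions that turn the current placement into the desired one, with all stops first:

```python
from typing import Optional

Placement = dict[str, Optional[str]]  # resource id -> node name, None = stopped


def compute_transition(current: Placement, desired: Placement) -> list[tuple[str, str, str]]:
    """Return (action, resource, node) steps; all stops precede all starts."""
    stops, starts = [], []
    for rsc in sorted(set(current) | set(desired)):
        now, target = current.get(rsc), desired.get(rsc)
        if now == target:
            continue  # already where it should be
        if now is not None:
            stops.append(("stop", rsc, now))
        if target is not None:
            starts.append(("start", rsc, target))
    return stops + starts


if __name__ == "__main__":
    current = {"d_mysql": "xms-1", "ip_db": "xms-1"}
    desired = {"d_mysql": "xms-2", "ip_db": "xms-2"}
    for step in compute_transition(current, desired):
        print(step)
```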

Page 26: Linux-HA with Pacemaker

Configuring Heartbeat with Puppet

    heartbeat::hacf {"clustername":
        hosts   => ["host-a","host-b"],
        hb_nic  => ["bond0"],
        hostip1 => ["10.0.128.11"],
        hostip2 => ["10.0.128.12"],
        ping    => ["10.0.128.4"],
    }

    heartbeat::authkeys {"ClusterName":
        password => "ClusterName",
    }

http://github.com/jtimberman/puppet/tree/master/heartbeat/

Page 27: Linux-HA with Pacemaker

CRM

● Cluster Resource Manager

● Keeps nodes in sync

● XML based

● cibadm

● CLI manageable

● crm

    configure
    property $id="cib-bootstrap-options" \
            stonith-enabled="FALSE" \
            no-quorum-policy=ignore \
            start-failure-is-fatal="FALSE"
    rsc_defaults $id="rsc_defaults-options" \
            migration-threshold="1" \
            failure-timeout="1"
    primitive d_mysql ocf:local:mysql \
            params test_user="sure" test_passwd="illtell" test_table="test.table" \
            op monitor interval="30s"
    primitive ip_db ocf:heartbeat:IPaddr2 \
            params ip="172.17.4.202" nic="bond0" \
            op monitor interval="10s"
    group svc_db d_mysql ip_db
    commit

Page 28: Linux-HA with Pacemaker

Heartbeat Resources

● LSB

● Heartbeat resource (+status)

● OCF (Open Cluster Framework) (+monitor)

● Clones (don't use in HAv2)

● Multi-state resources

Page 29: Linux-HA with Pacemaker

LSB Resource Agents

● LSB == Linux Standard Base

● LSB resource agents are standard System V-style init scripts commonly used on Linux and other UNIX-like OSes

● LSB init scripts are stored under /etc/init.d/

● This enables Linux-HA to immediately support nearly every service that comes with your system, and most packages which come with their own init script

● It's straightforward to change an LSB script to an OCF script

Page 30: Linux-HA with Pacemaker

OCF

● OCF == Open Cluster Framework

● OCF resource agents are the most powerful type of resource agent we support

● OCF RAs are extended init scripts
  • They have additional actions:
  • monitor: for monitoring resource health
  • meta-data: for providing information about the RA

● OCF RAs are located in /usr/lib/ocf/resource.d/provider-name/
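An OCF agent does not have to be a shell script; any executable that implements the actions and returns the OCF exit codes will do. A minimal Python sketch (3.9+) in the spirit of the Dummy agent, for a hypothetical agent `dummy-py` that treats "running" as "a state file exists"; the state file path comes from a hypothetical `state` parameter, since instance parameters reach agents as `OCF_RESKEY_<name>` environment variables:

```python
import os
import sys
import tempfile

# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

META_DATA = """<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="dummy-py">
  <version>0.1</version>
  <shortdesc lang="en">Hypothetical state-file agent</shortdesc>
  <longdesc lang="en">Illustration only: running means the state file exists.</longdesc>
  <parameters>
    <parameter name="state" unique="1">
      <shortdesc lang="en">Path of the state file</shortdesc>
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="stop" timeout="20s"/>
    <action name="monitor" timeout="20s" interval="10s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>"""


def state_file() -> str:
    # Instance parameters arrive as OCF_RESKEY_<name> environment variables.
    default = os.path.join(tempfile.gettempdir(), "dummy-py.state")
    return os.environ.get("OCF_RESKEY_state", default)


def monitor() -> int:
    return OCF_SUCCESS if os.path.exists(state_file()) else OCF_NOT_RUNNING


def start() -> int:
    if monitor() == OCF_SUCCESS:
        return OCF_SUCCESS  # starting a running resource must succeed
    try:
        with open(state_file(), "w"):
            pass
    except OSError:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


def stop() -> int:
    try:
        os.remove(state_file())
    except FileNotFoundError:
        pass  # stopping a stopped resource must succeed too
    except OSError:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


def main(argv: list[str]) -> int:
    action = argv[1] if len(argv) > 1 else ""
    if action == "meta-data":
        print(META_DATA)
        return OCF_SUCCESS
    handlers = {"start": start, "stop": stop, "monitor": monitor}
    handler = handlers.get(action)
    return handler() if handler else OCF_ERR_UNIMPLEMENTED


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Installed and made executable as, say, /usr/lib/ocf/resource.d/custom/dummy-py (the `custom` provider name is an assumption), it could be referenced as ocf:custom:dummy-py. A production agent would also implement validate-all and report a failed resource from monitor rather than only "running" or "not running".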

Page 31: Linux-HA with Pacemaker

Monitoring

● Defined in the OCF resource script

● Configured in the parameters

● You have to support multiple states
  • Not running
  • Running
  • Failed
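The three states map onto OCF monitor exit codes: 7 (OCF_NOT_RUNNING) for a clean stop, 0 (OCF_SUCCESS) for running, and an error such as 1 (OCF_ERR_GENERIC) for failed. A Python sketch for a hypothetical daemon that writes its PID to a file; treating "PID file present but process gone" as failed is a design choice of this sketch:

```python
import os

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def classify(pid_file_exists: bool, process_alive: bool) -> int:
    """Map an observation onto an OCF monitor exit code."""
    if not pid_file_exists:
        return OCF_NOT_RUNNING  # cleanly stopped
    if process_alive:
        return OCF_SUCCESS      # running
    return OCF_ERR_GENERIC      # PID file left behind by a dead process: failed


def is_process_alive(pid: int) -> bool:
    """Signal 0 checks whether a process exists without disturbing it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def monitor_pid_file(path: str) -> int:
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return classify(False, False)
    except (OSError, ValueError):
        return OCF_ERR_GENERIC  # unreadable or garbled PID file
    if pid <= 0:
        return OCF_ERR_GENERIC  # never signal 0 or a process group
    return classify(True, is_process_alive(pid))
```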

Page 32: Linux-HA with Pacemaker

Anatomy of a Cluster config

• Cluster properties

• Resource defaults

• Primitive definitions

• Resource groups and constraints

Page 33: Linux-HA with Pacemaker

Cluster Properties

    property $id="cib-bootstrap-options" \
            stonith-enabled="FALSE" \
            no-quorum-policy="ignore" \
            start-failure-is-fatal="FALSE"

no-quorum-policy: we ignore the loss of quorum on a 2-node cluster.

start-failure-is-fatal: when set to FALSE, a failed start is not automatically fatal on that node; the cluster instead uses the resource's failcount and its resource-failure-stickiness value.

Page 34: Linux-HA with Pacemaker

Resource Defaults

    rsc_defaults $id="rsc_defaults-options" \
            migration-threshold="1" \
            failure-timeout="1" \
            resource-stickiness="INFINITY"

failure-timeout is how long to wait after a failure before the resource can come back to the node on which it failed. Values without a unit are in seconds, so "1" here means 1 second; a 60 second timeout would be written failure-timeout="60s".

migration-threshold=1 means that after 1 failure the resource will try to start on the other node.

resource-stickiness=INFINITY means that the resource really wants to stay where it is now.
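How migration-threshold and failure-timeout interact can be sketched as a per-node eligibility check. A simplified Python model, assuming times in seconds and ignoring stickiness, constraints and the fact that Pacemaker only notices an expired failure when it re-evaluates the cluster; treating a threshold of 0 as "disabled" is also an assumption of the model:

```python
def node_eligible(fail_count: int, migration_threshold: int,
                  seconds_since_last_failure: float,
                  failure_timeout: float) -> bool:
    """May the resource (still) run on this node, in this simplified model?"""
    if migration_threshold <= 0:
        return True  # threshold disabled in this model
    if (fail_count > 0 and failure_timeout > 0
            and seconds_since_last_failure >= failure_timeout):
        return True  # the failures have expired and no longer count
    return fail_count < migration_threshold


if __name__ == "__main__":
    # migration-threshold=1 combined with a 60 second failure-timeout
    print(node_eligible(1, 1, 10, 60))  # False: failed 10 s ago, node avoided
    print(node_eligible(1, 1, 61, 60))  # True: the failure has expired
```

With migration-threshold set to 1, a single failure makes the node ineligible until the failure expires, which is what pushes the resource to the other node.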

Page 35: Linux-HA with Pacemaker

Primitive DefinitionsPrimitive Definitions

    primitive d_mine ocf:custom:tomcat \
            params instance_name="mine" \
                    monitor_urls="health.html" \
                    monitor_use_ssl="no" \
            op monitor interval="15s" on-fail="restart"
    primitive ip_mine_svc ocf:heartbeat:IPaddr2 \
            params ip="10.8.4.131" cidr_netmask="16" nic="bond0" \
            op monitor interval="10s"

Page 36: Linux-HA with Pacemaker

Parsing a config

● Isn't always done correctly

● Even a verify won't find all issues

● Unexpected behaviour might occur

Page 37: Linux-HA with Pacemaker

Where a resource runs

• Multi-state resources
  • Master/Slave, e.g. MySQL master-slave, DRBD

• Clones
  • Resources that can run on multiple nodes, e.g.
    • Multimaster MySQL servers
    • MySQL slaves
    • Stateless applications

• location
  • Preferred location to run a resource, e.g. based on hostname

• colocation
  • Resources that have to live together, e.g. IP address + service

• order
  • Define which resource has to start first, or wait for another resource

• groups
  • Colocation + order
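Groups and order constraints together define a start sequence, and stopping runs it backwards. A Python sketch that expands each group into a chain of its members, adds explicit (first, then) order constraints, and derives the sequence with the standard library's graphlib (Python 3.9+); the resource names are borrowed from the primitive and DRBD examples in this deck, and promote/start are collapsed into one step:

```python
from graphlib import TopologicalSorter


def start_sequence(groups: dict[str, list[str]],
                   orders: list[tuple[str, str]]) -> list[str]:
    """Start order implied by groups and explicit (first, then) constraints."""
    graph: dict[str, set[str]] = {}  # resource -> resources that start before it

    def first_of(name: str) -> str:
        return groups[name][0] if name in groups else name

    def last_of(name: str) -> str:
        return groups[name][-1] if name in groups else name

    for members in groups.values():
        for member in members:
            graph.setdefault(member, set())
        # A group starts its members in the listed order.
        for before, after in zip(members, members[1:]):
            graph[after].add(before)

    for first, then in orders:
        # The whole 'first' (group) must be up before the whole 'then' (group).
        graph.setdefault(last_of(first), set())
        graph.setdefault(first_of(then), set()).add(last_of(first))

    # static_order() raises graphlib.CycleError on contradictory constraints.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    groups = {"svc_mine": ["d_mine", "ip_mine_svc"]}
    orders = [("ms_drbd_storage", "svc_mine")]
    start = start_sequence(groups, orders)
    print("start:", start)  # ['ms_drbd_storage', 'd_mine', 'ip_mine_svc']
    print("stop: ", list(reversed(start)))
```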

Page 38: Linux-HA with Pacemaker

e.g. A Service on DRBD

● DRBD can only be active on 1 node

● The filesystem needs to be mounted on that active DRBD node

    group svc_mine d_mine ip_mine_svc
    ms ms_drbd_storage drbd_storage \
            meta master_max="1" master_node_max="1" clone_max="2" clone_node_max="1" notify="true"
    colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master
    order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start
    location cli-prefer-svc_db svc_db \
            rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a

Page 39: Linux-HA with Pacemaker

crm commands

    crm                   Start the (interactive) crm shell
    crm resource          Change into resource mode
    crm configure         Change into configure mode
    crm configure show    Show the current resource config
    crm resource show     Show the current resource state
    cibadm -Q             Dump the full Cluster Information Base in XML

Page 40: Linux-HA with Pacemaker

Using crm

● crm configure

● Edit primitive

● Verify

● Commit

Page 41: Linux-HA with Pacemaker

But We love XML

● cibadm -Q
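Since cibadm -Q prints the CIB as XML, ordinary XML tooling can inspect it. A Python sketch (3.9+) using xml.etree.ElementTree that lists each primitive with its agent and instance parameters; the embedded document is a hand-written, abbreviated CIB fragment modelled on the ip_db primitive and svc_db group from the CRM slide, not real cibadm output:

```python
import xml.etree.ElementTree as ET

SAMPLE_CIB = """<cib>
  <configuration>
    <resources>
      <group id="svc_db">
        <primitive id="ip_db" class="ocf" provider="heartbeat" type="IPaddr2">
          <instance_attributes id="ip_db-instance_attributes">
            <nvpair id="ip_db-ip" name="ip" value="172.17.4.202"/>
            <nvpair id="ip_db-nic" name="nic" value="bond0"/>
          </instance_attributes>
        </primitive>
      </group>
    </resources>
  </configuration>
</cib>"""


def list_primitives(cib_xml: str) -> dict[str, dict]:
    """Map primitive id -> agent string and instance parameters."""
    root = ET.fromstring(cib_xml)
    result = {}
    for prim in root.iter("primitive"):  # also finds primitives inside groups
        parts = (prim.get("class"), prim.get("provider"), prim.get("type"))
        agent = ":".join(p for p in parts if p)  # LSB agents have no provider
        params = {nv.get("name"): nv.get("value")
                  for nv in prim.findall("./instance_attributes/nvpair")}
        result[prim.get("id")] = {"agent": agent, "params": params}
    return result


if __name__ == "__main__":
    for rid, info in list_primitives(SAMPLE_CIB).items():
        print(rid, info["agent"], info["params"])
```

On a cluster node, the input string would come from running cibadm -Q.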

Page 42: Linux-HA with Pacemaker

Checking the Cluster State

    crm_mon -1

    ============
    Last updated: Wed Nov 4 16:44:26 2009
    Stack: Heartbeat
    Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
    Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
    2 Nodes configured, unknown expected votes
    2 Resources configured.
    ============

    Online: [ xms-1 xms-2 ]

     Resource Group: svc_mysql
         d_mysql        (ocf::ntc:mysql):       Started xms-1
         ip_mysql       (ocf::heartbeat:IPaddr2):       Started xms-1
     Resource Group: svc_XMS
         d_XMS          (ocf::ntc:XMS): Started xms-2
         ip_XMS         (ocf::heartbeat:IPaddr2):       Started xms-2
         ip_XMS_public  (ocf::heartbeat:IPaddr2):       Started xms-2
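For scripts or monitoring checks, the one-shot output of crm_mon -1 can be scraped. A Python sketch (3.9+) whose regular expressions are tuned to the 1.0.x layout shown on this slide; other versions format their output differently, so treat the patterns as assumptions:

```python
import re

GROUP_RE = re.compile(r"^(?P<indent>\s*)Resource Group:\s+(?P<group>\S+)\s*$")
RESOURCE_RE = re.compile(
    r"^(?P<indent>\s*)(?P<id>\S+)\s+\((?P<agent>[^)]+)\):?\s+"
    r"(?P<state>\w+)(?:\s+(?P<node>\S+))?\s*$"
)


def parse_crm_mon(text: str) -> dict[str, dict]:
    """Map resource id -> group, agent, state and node (None if stopped)."""
    resources = {}
    group, group_indent = None, -1
    for line in text.splitlines():
        m = GROUP_RE.match(line)
        if m:
            group, group_indent = m.group("group"), len(m.group("indent"))
            continue
        m = RESOURCE_RE.match(line)
        if not m:
            continue  # header, separator or node lines
        if len(m.group("indent")) <= group_indent:
            group, group_indent = None, -1  # dedented: no longer inside a group
        resources[m.group("id")] = {
            "group": group,
            "agent": m.group("agent"),
            "state": m.group("state"),
            "node": m.group("node"),
        }
    return resources


SAMPLE = """\
============
Last updated: Wed Nov 4 16:44:26 2009
Stack: Heartbeat
Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
============

Online: [ xms-1 xms-2 ]

 Resource Group: svc_mysql
     d_mysql        (ocf::ntc:mysql):       Started xms-1
     ip_mysql       (ocf::heartbeat:IPaddr2):       Started xms-1
 Resource Group: svc_XMS
     d_XMS          (ocf::ntc:XMS): Started xms-2
     ip_XMS         (ocf::heartbeat:IPaddr2):       Started xms-2
     ip_XMS_public  (ocf::heartbeat:IPaddr2):       Started xms-2
"""

if __name__ == "__main__":
    for rsc, info in parse_crm_mon(SAMPLE).items():
        print(rsc, info)
```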

Page 43: Linux-HA with Pacemaker

Stopping a resource

    crm resource stop svc_XMS
    crm_mon -1

    ============
    Last updated: Wed Nov 4 16:56:05 2009
    Stack: Heartbeat
    Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
    Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
    2 Nodes configured, unknown expected votes
    2 Resources configured.
    ============

    Online: [ xms-1 xms-2 ]

     Resource Group: svc_mysql
         d_mysql        (ocf::ntc:mysql):       Started xms-1
         ip_mysql       (ocf::heartbeat:IPaddr2):       Started xms-1

Page 44: Linux-HA with Pacemaker

Starting a resource

    crm resource start svc_XMS
    crm_mon -1

    ============
    Last updated: Wed Nov 4 17:04:56 2009
    Stack: Heartbeat
    Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
    Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
    2 Nodes configured, unknown expected votes
    2 Resources configured.
    ============

    Online: [ xms-1 xms-2 ]

     Resource Group: svc_mysql
         d_mysql        (ocf::ntc:mysql):       Started xms-1
         ip_mysql       (ocf::heartbeat:IPaddr2):       Started xms-1
     Resource Group: svc_XMS

Page 45: Linux-HA with Pacemaker

Moving a resource

● resource migrate

● Is permanent, even upon failure: it leaves a cli-prefer location constraint behind

● Useful in upgrade scenarios

● Use resource unmigrate to restore

Page 46: Linux-HA with Pacemaker

Moving a resource

    [xpoll-root@XMS-1 ~]# crm resource migrate svc_XMS xms-1
    [xpoll-root@XMS-1 ~]# crm_mon -1
    Last updated: Wed Nov 4 17:32:50 2009
    Stack: Heartbeat
    Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
    Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
    2 Nodes configured, unknown expected votes
    2 Resources configured.

    Online: [ xms-1 xms-2 ]

     Resource Group: svc_mysql
         d_mysql        (ocf::ntc:mysql):       Started xms-1
         ip_mysql       (ocf::heartbeat:IPaddr2):       Started xms-1
     Resource Group: svc_XMS
         d_XMS          (ocf::ntc:XMS): Started xms-1
         ip_XMS         (ocf::heartbeat:IPaddr2):       Started xms-1
         ip_XMS_public  (ocf::heartbeat:IPaddr2):       Started xms-1

Page 47: Linux-HA with Pacemaker

Migrate vs Standby

● Think of clusters with more than 2 nodes

● Migrate: send resources to node X
  • Only use that one available node

● Standby: do not send resources to node X
  • But use the other available ones
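The difference only shows with three or more nodes. A simplified Python model of the two behaviours as described here, assuming all nodes are online and ignoring scores and other constraints; the node names are hypothetical, and falling back to the other eligible nodes when the migrate target cannot be used is an assumption of this sketch:

```python
from typing import Optional


def candidate_nodes(nodes: list[str], standby: set[str],
                    migrate_to: Optional[str] = None) -> list[str]:
    """Nodes a resource may be placed on, in this simplified model."""
    # Standby: the node is excluded, every other node stays available.
    eligible = [n for n in nodes if n not in standby]
    if migrate_to is not None and migrate_to in eligible:
        # Migrate: only the chosen node is used.
        return [migrate_to]
    # Assumption: an unusable migrate target falls back to the eligible nodes.
    return eligible


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    print(candidate_nodes(nodes, standby={"node-a"}))                  # ['node-b', 'node-c']
    print(candidate_nodes(nodes, standby=set(), migrate_to="node-c"))  # ['node-c']
```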

Page 48: Linux-HA with Pacemaker

Debugging

● Check crm_mon -f

● Failcounts?

● Did the application launch correctly?

● /var/log/messages
  • Warning: very verbose

Page 49: Linux-HA with Pacemaker

Resource not running

    [menos-val3-root@mrs-a ~]# crm
    crm(live)# resource
    crm(live)resource# show
     Resource Group: svc-MRS
         d_MRS          (ocf::ntc:tomcat) Stopped
         ip_MRS_svc     (ocf::heartbeat:IPaddr2) Stopped
         ip_MRS_usr     (ocf::heartbeat:IPaddr2) Stopped

Page 50: Linux-HA with Pacemaker

Resource Failcount

    [menos-val3-root@mrs-a ~]# crm
    crm(live)# resource
    crm(live)resource# failcount d_MRS show mrs-a
    scope=status name=fail-count-d_MRS value=1
    crm(live)resource# failcount d_MRS delete mrs-a
    crm(live)resource# failcount d_MRS show mrs-a
    scope=status name=fail-count-d_MRS value=0

Page 53: Linux-HA with Pacemaker

Pacemaker and Puppet

● Plenty of non-usable modules around
  • HAv1

● https://github.com/rodjek/puppet-pacemaker.git
  • Strict set of ops / parameters

● Make sure your modules don't enable resources

● I've been using templates to populate the configuration

● cibadm to configure

● crm is complex, even crm doesn't parse correctly yet

● Plenty of work ahead!

Page 54: Linux-HA with Pacemaker

Getting Help

● http://clusterlabs.org

● #linux-ha on irc.freenode.org

● http://www.drbd.org/users-guide/

Page 55: Linux-HA with Pacemaker

Contact: Kris Buytaert

Further Reading: @krisbuytaert, http://www.krisbuytaert.be/blog/, http://www.inuits.be/, http://www.virtualization.com/, http://www.oreillygmt.com/

Esquimaux
Kheops Business Center
Avenue Georges Lemaître 54
6041 Gosselies
889.780.406
+32 495 698 668

Inuits
't Hemeltje
Gemeentepark 2
2930 Brasschaat
891.514.231
+32 473 441 636