mysql ha with pacemaker


Upload: kris-buytaert

Post on 06-May-2015


DESCRIPTION

My opendbcamp 2011 presentation on Pacemaker and MySQL opportunities

TRANSCRIPT

Page 1: MySQL HA with  Pacemaker

MySQL HA with Pacemaker

Kris Buytaert

#opendbcamp

Page 2: MySQL HA with  Pacemaker

Kris Buytaert

● I used to be a Dev, then became an Op
● Today I feel like a dev again
● Senior Linux and Open Source Consultant @inuits.be
● "Infrastructure Architect"
● Building Clouds since before the Cloud
● Surviving the 10th floor test
● Co-Author of some books
● Guest Editor at some sites

Page 3: MySQL HA with  Pacemaker

In this presentation

● High Availability ?
● MySQL HA Solutions
● Linux HA / Pacemaker

Page 4: MySQL HA with  Pacemaker

What is HA Clustering ?

● One service goes down => others take over its work
● IP address takeover, service takeover
● Not designed for high performance
● Not designed for high throughput (load balancing)

Page 5: MySQL HA with  Pacemaker

Lies, Damn Lies, and Statistics

Counting nines (slide by Alan R)

Availability   Downtime per year
99.9999%       30 sec
99.999%        5 min
99.99%         52 min
99.9%          9 hr
99%            3.5 day
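The downtime figures follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is only illustrative):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_seconds(availability_percent: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for nines in (99.9999, 99.999, 99.99, 99.9, 99.0):
        secs = downtime_seconds(nines)
        print(f"{nines}% -> {secs:.0f} s = {secs / 60:.1f} min = {secs / 3600:.2f} h")
```

Each extra nine divides the allowed downtime by ten, which is why the step from 99.9% to 99.999% is so much harder than it looks on paper.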

Page 6: MySQL HA with  Pacemaker

The Rules of HA

● Keep it Simple
● Keep it Simple
● Prepare for Failure
● Complexity is the enemy of reliability
● Test your HA setup

Page 7: MySQL HA with  Pacemaker

Eliminating the SPOF

● Find out what Will Fail
  • Disks
  • Fans
  • Power (Supplies)
● Find out what Can Fail
  • Network
  • Going Out Of Memory

Page 8: MySQL HA with  Pacemaker

Data vs Connection

● Data:
  • Replication
  • Shared storage
  • DRBD
● Connection:
  • LVS
  • Proxy
  • Heartbeat / Pacemaker

Page 9: MySQL HA with  Pacemaker

Shared Storage

● 1 MySQL instance
● Monitor MySQL node
● Stonith
● $$$ 1+1 <> 2
● Storage = SPOF
● Split Brain :(

Page 10: MySQL HA with  Pacemaker

DRBD

● Distributed Replicated Block Device
● In the Linux Kernel
● Usually only 1 mount
  • Multi mount as of 8.X
  • Requires GFS / OCFS2
● Regular FS ext3 ...
● Only 1 MySQL instance active, accessing data
● Upon failover MySQL needs to be started on the other node

Page 11: MySQL HA with  Pacemaker

DRBD (2)

● What happens when you pull the plug of a physical machine ?
  • Minimal Timeout
  • Why did the crash happen ?
  • Is my data still correct ?
  • InnoDB Consistency Checks ?
  • Lengthy ?
  • Check your BinLog size

Page 12: MySQL HA with  Pacemaker

Other Solutions Today

● MySQL Cluster NDBD
● Multi Master Replication
● MySQL Proxy
● MMM
● Flipper
● BYO
● ....

Page 13: MySQL HA with  Pacemaker

Pulling Traffic

● E.g. for Cluster, MultiMaster setups
  • DNS
  • Advanced Routing
  • LVS
  • Or the upcoming slides

Page 14: MySQL HA with  Pacemaker

Linux-HA Pacemaker

● Plays well with others
● Manages more than MySQL
● ...v3 .. don't even think about the rest anymore
● http://clusterlabs.org/

Page 15: MySQL HA with  Pacemaker

Heartbeat v1

• Max 2 nodes
• No fine-grained resources
• Monitoring using "mon"

/etc/ha.d/ha.cf

/etc/ha.d/haresources

    mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 \
        IPaddr2::10.16.0.13/16/bond0.16 mon

/etc/ha.d/authkeys

Page 16: MySQL HA with  Pacemaker

Heartbeat v2

• Stability issues
• Forking ?

"A consulting Opportunity"
LMB

Page 17: MySQL HA with  Pacemaker

Clone Resource

Clones in v2 were buggy

Resources were started on 2 nodes

Stopped again on "1"

Page 18: MySQL HA with  Pacemaker

Heartbeat v3

• No more /etc/ha.d/haresources
• No more xml
• Better integrated monitoring
• /etc/ha.d/ha.cf has
  • crm=yes

Page 19: MySQL HA with  Pacemaker

Pacemaker ?

● Not a fork
● Only CRM code taken out of Heartbeat
● As of Heartbeat 2.1.3
  • Support for both OpenAIS / Heartbeat
  • Different release cycle than Heartbeat

Page 20: MySQL HA with  Pacemaker

Heartbeat, OpenAIS, Corosync ?

● All Messaging Layers
● Initially only Heartbeat
● OpenAIS
● Heartbeat got unmaintained
● OpenAIS had heisenbugs :(
● Corosync
● Heartbeat maintenance taken over by LinBit
● CRM detects which layer

Page 21: MySQL HA with  Pacemaker

[Diagram: Pacemaker and Cluster Glue on top of a messaging layer, OpenAIS or Heartbeat]

Page 22: MySQL HA with  Pacemaker

Pacemaker Architecture

● stonithd: The Heartbeat fencing subsystem.
● lrmd: Local Resource Management Daemon. Interacts directly with resource agents (scripts).
● pengine: Policy Engine. Computes the next state of the cluster based on the current state and the configuration.
● cib: Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.
● crmd: Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM, it also elects a leader to co-ordinate the activities of the cluster.
● openais: Messaging and membership layer.
● heartbeat: Messaging layer, an alternative to OpenAIS.
● ccm: Short for Consensus Cluster Membership. The Heartbeat membership layer.

Page 23: MySQL HA with  Pacemaker

Configuring Heartbeat Correctly

    heartbeat::hacf {"clustername":
        hosts   => ["host-a","host-b"],
        hb_nic  => ["bond0"],
        hostip1 => ["10.0.128.11"],
        hostip2 => ["10.0.128.12"],
        ping    => ["10.0.128.4"],
    }

    heartbeat::authkeys {"ClusterName":
        password => "ClusterName",
    }

http://github.com/jtimberman/puppet/tree/master/heartbeat/

Page 24: MySQL HA with  Pacemaker

CRM

● Cluster Resource Manager
● Keeps Nodes in Sync
● XML Based
● cibadmin
● CLI manageable
● crm

    configure
    property $id="cib-bootstrap-options" \
        stonith-enabled="FALSE" \
        no-quorum-policy=ignore \
        start-failure-is-fatal="FALSE"
    rsc_defaults $id="rsc_defaults-options" \
        migration-threshold="1" \
        failure-timeout="1"
    primitive d_mysql ocf:local:mysql \
        op monitor interval="30s" \
        params test_user="sure" test_passwd="illtell" test_table="test.table"
    primitive ip_db ocf:heartbeat:IPaddr2 \
        params ip="172.17.4.202" nic="bond0" \
        op monitor interval="10s"
    group svc_db d_mysql ip_db
    commit

Page 25: MySQL HA with  Pacemaker

Heartbeat Resources

● LSB
● Heartbeat resource (+status)
● OCF (Open Cluster Framework) (+monitor)
● Clones (don't use in HAv2)
● Multi State Resources

Page 26: MySQL HA with  Pacemaker

LSB Resource Agents

● LSB == Linux Standards Base
● LSB resource agents are standard System V-style init scripts commonly used on Linux and other UNIX-like OSes
● LSB init scripts are stored under /etc/init.d/
● This enables Linux-HA to immediately support nearly every service that comes with your system, and most packages which come with their own init script
● It's straightforward to change an LSB script to an OCF script

Page 27: MySQL HA with  Pacemaker

OCF

● OCF == Open Cluster Framework
● OCF Resource agents are the most powerful type of resource agent we support
● OCF RAs are extended init scripts
  • They have additional actions:
    • monitor: for monitoring resource health
    • meta-data: for providing information about the RA
● OCF RAs are located in /usr/lib/ocf/resource.d/provider-name/

Page 28: MySQL HA with  Pacemaker

Monitoring

● Defined in the OCF Resource script
● Configured in the parameters
● You have to support multiple states
  • Not running
  • Running
  • Failed
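The three states map onto the exit codes an OCF monitor action returns: 0 (OCF_SUCCESS) for running, 7 (OCF_NOT_RUNNING) for cleanly stopped, and an error code such as 1 (OCF_ERR_GENERIC) for failed. A minimal sketch of that decision, assuming two hypothetical inputs (whether the process exists and whether a health check passed); a real agent is usually a shell script, so this only illustrates the logic:

```python
# Standard OCF exit codes used by the monitor action.
OCF_SUCCESS = 0       # resource is running and healthy
OCF_ERR_GENERIC = 1   # resource is running but broken ("failed")
OCF_NOT_RUNNING = 7   # resource is cleanly stopped


def monitor_exit_code(process_alive: bool, health_check_ok: bool) -> int:
    """Map the observed state to the exit code a monitor action should return.

    `process_alive` and `health_check_ok` are hypothetical inputs; a real
    agent would derive them from a pid file, a test query, and so on.
    """
    if not process_alive:
        return OCF_NOT_RUNNING
    if health_check_ok:
        return OCF_SUCCESS
    return OCF_ERR_GENERIC
```

Distinguishing "not running" from "failed" matters: the cluster treats the first as a clean state it can start from, and the second as a failure that counts against the node.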

Page 29: MySQL HA with  Pacemaker

Anatomy of a Cluster config

• Cluster properties
• Resource Defaults
• Primitive Definitions
• Resource Groups and Constraints

Page 30: MySQL HA with  Pacemaker

Cluster Properties

    property $id="cib-bootstrap-options" \
        stonith-enabled="FALSE" \
        no-quorum-policy="ignore" \
        start-failure-is-fatal="FALSE"

no-quorum-policy: we ignore the loss of quorum on a 2 node cluster.

start-failure-is-fatal: when set to FALSE, the cluster will instead use the resource's failcount and value for resource-failure-stickiness.
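Why a 2 node cluster has to ignore quorum loss can be shown with the usual majority rule. A minimal sketch, assuming one vote per node (the function name is illustrative, not part of Pacemaker):

```python
def has_quorum(total_nodes: int, live_nodes: int) -> bool:
    """A partition has quorum when it holds more than half of all votes."""
    return live_nodes * 2 > total_nodes


if __name__ == "__main__":
    # With 2 nodes, the survivor of a single failure holds exactly half the
    # votes, which is not a majority, so it would stop all resources unless
    # no-quorum-policy is set to "ignore".
    print(has_quorum(2, 1))  # False
    print(has_quorum(3, 2))  # True
```

With three or more nodes a single failure leaves a majority, so the default quorum policy can stay in place.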

Page 31: MySQL HA with  Pacemaker

Resource Defaults

    rsc_defaults $id="rsc_defaults-options" \
        migration-threshold="1" \
        failure-timeout="1" \
        resource-stickiness="INFINITY"

failure-timeout means that after a failure there is a timeout before the resource can come back to the node on which it failed; for a 60 second timeout, set failure-timeout="60s".

migration-threshold=1 means that after 1 failure the resource will try to start on the other node.

resource-stickiness=INFINITY means that the resource really wants to stay where it is now.

Page 32: MySQL HA with  Pacemaker

Primitive Definitions

    primitive d_mine ocf:custom:tomcat \
        params instance_name="mine" \
            monitor_urls="health.html" \
            monitor_use_ssl="no" \
        op monitor interval="15s" \
            on-fail="restart"

    primitive ip_mine_svc ocf:heartbeat:IPaddr2 \
        params ip="10.8.4.131" cidr_netmask="16" nic="bond0" \
        op monitor interval="10s"

Page 33: MySQL HA with  Pacemaker

Parsing a config

● Isn't always done correctly
● Even a verify won't find all issues
● Unexpected behaviour might occur

Page 34: MySQL HA with  Pacemaker

Where a resource runs

• multi state resources
  • Master – Slave, e.g. mysql master-slave, drbd
• clones
  • Resources that can run on multiple nodes, e.g.
    • Multimaster mysql servers
    • Mysql slaves
    • Stateless applications
• location
  • Preferred location to run resource, e.g. based on hostname
• colocation
  • Resources that have to live together
    • e.g. ip address + service
• order
  • Define what resource has to start first, or wait for another resource
• groups
  • Colocation + order
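"Groups = colocation + order" can be made concrete by expanding a group into the pairwise constraints it implies. A minimal sketch, assuming each member is ordered after and colocated with the member before it; the function and tuple layout are illustrative, not Pacemaker's internal representation:

```python
def expand_group(members):
    """Return the (order, colocation) pairs implied by a group.

    order:      (first, then)     -> `first` starts before `then`
    colocation: (rsc, with_rsc)   -> `rsc` runs where `with_rsc` runs
    """
    order = list(zip(members, members[1:]))
    colocation = [(later, earlier) for earlier, later in order]
    return order, colocation


if __name__ == "__main__":
    order, colocation = expand_group(["d_mysql", "ip_db"])
    print(order)       # [('d_mysql', 'ip_db')]
    print(colocation)  # [('ip_db', 'd_mysql')]
```

For the two-member group used in these slides this yields one ordering and one colocation constraint, which is exactly what a group saves you from writing by hand.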

Page 35: MySQL HA with  Pacemaker

E.g. a Service on DRBD

● DRBD can only be active on 1 node
● The filesystem needs to be mounted on that active DRBD node

    group svc_mine d_mine ip_mine

    ms ms_drbd_storage drbd_storage \
        meta master_max="1" master_node_max="1" clone_max="2" clone_node_max="1" notify="true"

    colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master

    order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start

    location cli-prefer-svc_db svc_db \
        rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a

Page 36: MySQL HA with  Pacemaker

A MySQL Resource

● OCF
  • Clone
    • Where do you hook up the IP ?
  • Multi State
    • But we have Master Master replication
  • Meta Resource
    • Dummy resource that can monitor
      • Connection
      • Replication state
      • ....

Page 37: MySQL HA with  Pacemaker

Simple 2 node example

    primitive d_mysql ocf:ntc:mysql \
        op monitor interval="30s" \
        params test_user="just" test_passwd="kidding" test_table="really"

    primitive ip_mysql_svc ocf:heartbeat:IPaddr2 \
        params ip="10.8.0.30" cidr_netmask="24" nic="bond0" \
        op monitor interval="10s"

    group svc_mysql d_mysql ip_mysql_svc

Page 38: MySQL HA with  Pacemaker

Monitor your Setup

● Not just connectivity
● Also functional
  • Query data
  • Check resultset is correct
● Check replication
  • MaatKit
  • OpenARK
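A functional check goes one step beyond "can I connect": it runs a query and compares the result set with what is expected. A minimal sketch against any Python DB-API connection; the in-memory sqlite3 database, the table name and the helper names are stand-ins for illustration only, and a real check would use a MySQL driver and the test table configured in the resource agent:

```python
import sqlite3


def functional_check(conn, query, expected_rows):
    """Return True only if the query runs and returns exactly the expected rows."""
    try:
        cur = conn.cursor()
        cur.execute(query)
        rows = cur.fetchall()
    except Exception:
        # Any error (connection lost, table missing, ...) counts as a failure.
        return False
    return rows == expected_rows


def demo_connection():
    """Build a throwaway database standing in for the monitored server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ha_test (id INTEGER, value TEXT)")
    conn.execute("INSERT INTO ha_test VALUES (1, 'ok')")
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(functional_check(conn, "SELECT id, value FROM ha_test", [(1, "ok")]))
```

A server that accepts connections but returns wrong or no data fails this check, which a plain port or ping test would never notice.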

Page 39: MySQL HA with  Pacemaker

How to deal with replication state ?

● Multiple slaves
  • Use DRBD ocf resource
● 2 masters only: use own script
  • Replication is slow on the active node
    • Shouldn't happen; talk to HR / cfgmt people
  • Replication is slow on the passive node
    • Weight--
  • Replication breaks on the active node
    • Send out warning, don't modify weights and check other node
  • Replication breaks on the passive node
    • Fence off the passive node
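The four cases for a two-master setup amount to a small decision table. A minimal sketch of such a "own script" policy, assuming the script already knows the node's role and replication state; the names and string values are illustrative:

```python
from enum import Enum


class Action(Enum):
    NONE = "nothing to do"
    ESCALATE = "shouldn't happen: talk to HR / config management people"
    LOWER_WEIGHT = "decrease the weight of the passive node"
    WARN_AND_CHECK_PEER = "send warning, keep weights, check the other node"
    FENCE_PASSIVE = "fence off the passive node"


# (role, replication state) -> action, as listed on the slide.
POLICY = {
    ("active", "slow"): Action.ESCALATE,
    ("passive", "slow"): Action.LOWER_WEIGHT,
    ("active", "broken"): Action.WARN_AND_CHECK_PEER,
    ("passive", "broken"): Action.FENCE_PASSIVE,
}


def decide(role: str, state: str) -> Action:
    """Pick the action for a node; healthy replication needs no action."""
    if role not in ("active", "passive"):
        raise ValueError(f"unknown role: {role}")
    if state not in ("ok", "slow", "broken"):
        raise ValueError(f"unknown replication state: {state}")
    return POLICY.get((role, state), Action.NONE)
```

Note the asymmetry: problems on the passive node are handled automatically, while problems on the active node only raise a warning, because moving the service to a peer whose state is unknown could make things worse.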

Page 40: MySQL HA with  Pacemaker

Adding MySQL to the stack

[Diagram: Node A and Node B, each layered as Hardware, Cluster Stack (Heartbeat, Pacemaker) and Resource MySQL ("MySQLd"), with replication between the two MySQL instances and a service IP for MySQL]

Page 41: MySQL HA with  Pacemaker

Pitfalls & Solutions

● Monitor
  • Replication state
  • Replication Lag
● MaatKit
● OpenARK

Page 42: MySQL HA with  Pacemaker

Conclusion

● Plenty of Alternatives
● Think about your Data
● Think about getting Queries to that Data
● Complexity is the enemy of reliability
● Keep it Simple
● Monitor inside the DB

Page 43: MySQL HA with  Pacemaker

Contact

Kris Buytaert
[email protected]

Further Reading

@KrisBuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.be/
http://www.virtualization.com/
http://www.oreillygmt.com/

Esquimaux
Kheops Business Center
Avenue Georges Lemaître 54
6041 Gosselies
889.780.406
+32 495 698 668

Inuits
't Hemeltje
Gemeentepark 2
2930 Brasschaat
891.514.231
+32 473 441 636