-- Linux-HA Release 2 Preview February/March, 2005 Linux-HA Release 2 Alan Robertson IBM Linux Technology Center [email protected]


Linux-HA Release 2

Alan Robertson

IBM Linux Technology Center

[email protected]


Linux-HA Release 2

What is High-Availability (HA) Clustering?
What can HA do for me?
What is the Linux-HA project?
Linux-HA applications
Linux-HA customers
Linux-HA release 1 capabilities
Linux-HA release 2 capabilities
Comparative Architectures
Release 2 Details
Futures


What Is HA Clustering?

Putting together a group of computers which trust each other to provide a service even when system components fail
When one machine goes down, others take over its work
This involves IP address takeover, service takeover, etc.
New work comes to the “takeover” machine
Not primarily designed for high-performance


What Can HA Clustering Do For You?

It cannot achieve 100% availability – nothing can.
HA clustering is designed to recover from single faults
It can make your outages very short
    From about a second to a few minutes
It is like a Magician's (Illusionist's) trick:
    When it goes well, the hand is faster than the eye
    When it goes not-so-well, it can be reasonably visible

A good HA clustering system adds a “9” to your base availability

99->99.9, 99.9->99.99, 99.99->99.999, etc.

Complexity is the enemy of reliability!
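To make the “adds a 9” rule concrete, the following is a minimal sketch that converts an availability percentage into expected downtime per year. The function names are hypothetical and the 365-day year is a simplifying assumption; only the 99->99.9 style progression comes from the slide.

```python
# Illustrative arithmetic only; names are hypothetical, not part of Linux-HA.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected unavailable minutes per year at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def add_a_nine(availability_percent: float) -> float:
    """Cut the unavailable fraction by a factor of ten (99 -> 99.9, and so on)."""
    unavailable = 100 - availability_percent
    return 100 - unavailable / 10


if __name__ == "__main__":
    for base in (99.0, 99.9, 99.99):
        improved = add_a_nine(base)
        print(
            f"{base}% -> {improved:.4f}%: "
            f"{downtime_minutes_per_year(base):.1f} -> "
            f"{downtime_minutes_per_year(improved):.1f} minutes/year"
        )
```

Each added “9” divides the expected yearly downtime by ten, which is why the step from 99% to 99.9% is worth far more minutes than the step from 99.99% to 99.999%.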


Single Points of Failure (SPOFs)

A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service
Good HA design eliminates single points of failure


How Does HA work?

Manage redundancy to improve service availability
Like a cluster-wide-super-init on steroids
Even complex services are now “respawn”
    on node (computer) death
    on “impairment” of nodes
    on loss of connectivity
    for services that aren't working (not necessarily stopped)
    managing very complex dependency relationships
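The “cluster-wide respawn” idea can be sketched as a toy placement function: when a node dies, its services are restarted on a surviving node. This is only an illustration under stated assumptions. The node and service names are made up, and the least-loaded placement policy is an assumption for the example, not something the slides describe.

```python
def fail_over(placement, live_nodes):
    """Return a new service -> node map with services from dead nodes moved.

    placement:  dict mapping service name -> node it currently runs on
    live_nodes: collection of nodes that are still alive
    Policy (an assumption): each orphaned service goes to the live node
    that currently runs the fewest services.
    """
    live_nodes = set(live_nodes)
    if not live_nodes:
        raise RuntimeError("no live nodes left to take over services")

    # Services on surviving nodes stay where they are.
    new_placement = {s: n for s, n in placement.items() if n in live_nodes}

    load = {node: 0 for node in live_nodes}
    for node in new_placement.values():
        load[node] += 1

    # "Respawn" every orphaned service on the least-loaded survivor.
    for service in sorted(placement):
        if service not in new_placement:
            target = min(sorted(load), key=load.get)
            new_placement[service] = target
            load[target] += 1
    return new_placement


# Hypothetical three-node cluster in which node1 has just died.
before = {"web": "node1", "db": "node2", "ip": "node1"}
after = fail_over(before, live_nodes={"node2", "node3"})
```

A real cluster manager also has to honour the dependency and location constraints described on later slides. This sketch deliberately ignores them.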


Redundant Communications

Intra-cluster communication is critical to HA system operation
    Most HA clustering systems provide mechanisms for redundant internal communication for heartbeats, etc.
External communication is usually essential to provision of service
    External communication redundancy is usually accomplished through routing tricks
    Having an expert in BGP or OSPF is a help


Redundant Data Access

Replicated
    Copies of data are kept updated on more than one computer in the cluster
Shared
    Typically Fiber Channel Disk (SAN)
    Sometimes shared SCSI
Back-end Storage (“”)
    NFS, SMB
    Back-end database


The Desire for HA systems

Who wants low-availability systems?
Why are so few systems High-Availability?


Why isn't everything HA?

Cost

Complexity


The Linux-HA Project

Linux-HA is the oldest high-availability project for Linux, with the largest associated community
The core piece of Linux-HA is called “heartbeat” (though it does much more than heartbeat)
Linux-HA has been in production since 1999, and is currently in use on about ten thousand sites
Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others
Linux-HA is shipped with every major Linux distribution except one.


Linux-HA Release 1 Applications

Load Balancers
Web Servers
Database Servers
Custom Applications
Firewalls
Retail Point of Sale Solutions
Authentication
File Servers
Proxy Servers
Medical Imaging

Almost any type of server application you can think of – except SAP


Linux-HA customers

Emageon – medical imaging services
Contraloria General de la Republica (Colombian government)
Incredimail bases their mail service on Linux-HA on IBM hardware
Karstadts' uses Linux-HA in each of several hundred stores
Bavarian Radio Station (Munich) coverage of 2002 Olympics in Salt Lake City
Circuit City, Autozone, others use Linux-HA in each of several hundred stores
Citysavings Bank in Munich (infrastructure)
University of Toledo (US) – 20k student Computer Aided Instruction system
Autostrada – 230 clusters across country
The Weather Channel (weather.com)
Sony (manufacturing)
ISO New England manages power grid using 25 Linux-HA clusters


Linux-HA Release 1 capabilities

Supports 2-node clusters
Can use serial, UDP bcast, mcast, ucast comm.
Fails over on node failure
Fails over on loss of IP connectivity
Capability for failing over on loss of SAN connectivity
Limited command line administrative tools to fail over, query current status, etc.
Active/Active or Active/Passive
Simple resource group dependency model
Requires external tool for resource monitoring
SNMP monitoring


Linux-HA Release 2 capabilities

Built-in resource monitoring
Support for the OCF resource standard
Much larger clusters supported (>= 8 nodes)
Sophisticated dependency model with rich constraint support (resources, groups, incarnations, master/slave) (needed for SAP)
XML-based resource configuration
Configuration and monitoring GUI
Support for GFS cluster filesystem
Multi-state (master/slave) resource support

Initially - no IP, SAN monitoring


Release 2 Credits

Andrew Beekhof – CRM, CIB
Gouchun Shi – significant infrastructure improvements
Sun, Jiang Dong and Huang, Zhen – LRM, Stonithd and testing
Lars Marowsky-Bree – architecture, PHB :-)
Alan Robertson – architecture, project leadership, original heartbeat code and testing


Linux-HA Release 1 Architecture


Linux-HA Release 2 Architecture (add TE and PE)


Resource Objects in Release 2

Release 2 supports “resource objects” which can be any of the following:
    Primitive Resources
        OCF, heartbeat-style, or LSB resource agent scripts
    Resource Incarnations – need “n” resource objects - somewhere
    Resource groups – a group of resources with implied co-location and linear ordering constraints
    Multi-state resources (master/slave)
        Designed to model master/slave (replication) resources (DRBD, et al)


Basic Dependencies in Release 2

Ordering Dependencies
    start before (implies stop after)
    start after (implies stop before)
Mandatory Co-location Dependencies
    must be co-located with
    cannot be co-located with
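One way to picture these dependencies is as a graph: “start after” edges give a start order by topological sort, and the stop order is simply the reverse. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The resource names and data layout are hypothetical and do not reflect the Release 2 configuration format.

```python
from graphlib import TopologicalSorter

# Hypothetical resources: each key must start after every resource in its set.
start_after = {
    "filesystem": {"drbd"},
    "database": {"filesystem", "ip_address"},
    "web_server": {"database"},
}


def start_order(start_after):
    """Return one start order that satisfies all 'start after' dependencies."""
    return list(TopologicalSorter(start_after).static_order())


def stop_order(start_after):
    """'start after' implies 'stop before', so stopping is the reverse order."""
    return list(reversed(start_order(start_after)))


def colocation_violations(placement, must_with, cannot_with):
    """List the mandatory co-location rules broken by a resource -> node map."""
    problems = []
    for a, b in must_with:
        if placement[a] != placement[b]:
            problems.append(f"{a} must be co-located with {b}")
    for a, b in cannot_with:
        if placement[a] == placement[b]:
            problems.append(f"{a} cannot be co-located with {b}")
    return problems
```

For example, `start_order(start_after)` always places `drbd` before `filesystem` and `database` before `web_server`, while the relative position of independent resources such as `ip_address` is left to the sorter.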


Resource Location Constraints

Mandatory Constraints:
    Resource Objects can be constrained to run on any selected subset of nodes. Default is none.
Preferential Constraints:
    Resource Objects can also be preferentially constrained to run on specified nodes by providing weightings for arbitrary logical conditions
    The resource object is run on the node which has the highest weight (score)
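The two kinds of location constraint can be sketched as a filter followed by a score: mandatory constraints narrow the candidate nodes, and preferential constraints add weight to each remaining node. The function name, node attributes, and the choice to sum the weights are all assumptions made for illustration. The slide states only that the highest-scoring node wins.

```python
def choose_node(candidates, preferences, node_attrs):
    """Pick the candidate node with the highest total preference weight.

    candidates:  nodes already allowed by the mandatory constraints
    preferences: list of (predicate, weight); predicate(node, attrs) -> bool
    node_attrs:  dict mapping node name -> dict of attributes
    Returns None when no node is allowed. Ties go to the first name in
    alphabetical order (an arbitrary choice for this sketch).
    """
    scores = {}
    for node in candidates:
        attrs = node_attrs.get(node, {})
        scores[node] = sum(w for pred, w in preferences if pred(node, attrs))
    if not scores:
        return None
    return max(sorted(scores), key=scores.get)


# Hypothetical cluster: node3 is excluded by a mandatory constraint.
node_attrs = {
    "node1": {"ram_gb": 4},
    "node2": {"ram_gb": 16},
    "node3": {"ram_gb": 32},
}
allowed = ["node1", "node2"]
preferences = [
    (lambda node, attrs: attrs["ram_gb"] >= 8, 100),  # prefer larger machines
    (lambda node, attrs: node == "node1", 50),        # mild preference for node1
]
chosen = choose_node(allowed, preferences, node_attrs)
```

Here `node2` scores 100 and `node1` scores 50, so `chosen` is `node2` even though `node3` has more memory, because `node3` was never a candidate.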


Resource Incarnations

Resource Incarnations allow one to have a resource which runs multiple (“n”) times on the cluster
This is useful for managing
    load balancing clusters where you want “n” of them to be slave servers
    Cluster filesystems
    Cluster Alias IP addresses


Resource Groups

Resource Groups provide a shorthand for creating ordering and co-location dependencies
Each resource object in the group is declared to have linear start-after ordering relationships
Each resource object in the group is declared to have co-location dependencies on each other
This is an easy way of converting release 1 resource groups to release 2
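The shorthand can be illustrated by expanding a group into the explicit constraints it implies. This is a sketch under assumptions: the tuple representation and the function name are invented for the example and are not the Release 2 configuration syntax.

```python
def expand_group(resources):
    """Expand an ordered resource group into explicit constraints.

    Returns (ordering, colocation) where
      ordering   is a list of (resource, starts_after) pairs forming a chain
      colocation is a list of (a, b) pairs that must share a node
    """
    # Linear ordering: each member starts after the one listed before it.
    ordering = [(later, earlier) for earlier, later in zip(resources, resources[1:])]
    # Every member is co-located with every other member.
    colocation = [
        (a, b) for i, a in enumerate(resources) for b in resources[i + 1:]
    ]
    return ordering, colocation


# Hypothetical release-1 style group: IP address, then filesystem, then database.
ordering, colocation = expand_group(["ip", "fs", "db"])
```

A three-member group therefore stands in for two ordering constraints and three co-location constraints, and the saving grows quickly with group size.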


Multi-State (master/slave) Resources

Normal resources can be in one of two stable states:

Multi-state resources can have more than two stable states. For example:

This is ideal for modeling replication resources like DRBD


Advanced Constraints

Nodes can have arbitrary attributes associated with them in name=value form
Attributes have types: int, string, version
Constraint expressions can use these attributes as well as node names, etc. in largely arbitrary ways
Operators:
    =, !=, <, >, <=, >=
    defined(attrname), undefined(attrname)
    colocated(resource id), not colocated(resource id)
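The attribute types matter because the same comparison can give different answers depending on how a value is interpreted. The following sketch evaluates a single expression against a node's attributes. The function signature, the dotted-integer version format, and the decision to treat comparisons against a missing attribute as false are assumptions for illustration. The colocated operators are omitted because they need placement information.

```python
import operator


def parse_version(text):
    """Turn a dotted version such as '2.6.10' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


CASTS = {"int": int, "string": str, "version": parse_version}
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def evaluate(attrs, name, op, value=None, type_="string"):
    """Evaluate one constraint expression against a node's name=value attributes."""
    if op == "defined":
        return name in attrs
    if op == "undefined":
        return name not in attrs
    if name not in attrs:
        return False  # assumption: comparing a missing attribute is false
    cast = CASTS[type_]
    return OPERATORS[op](cast(attrs[name]), cast(value))


# Hypothetical node attributes, stored as strings in name=value form.
attrs = {"kernel": "2.6.10", "cpus": "4"}
as_version = evaluate(attrs, "kernel", ">=", "2.6.9", type_="version")
as_string = evaluate(attrs, "kernel", ">=", "2.6.9", type_="string")
```

Compared as versions, 2.6.10 is newer than 2.6.9, so `as_version` is true. Compared as strings, "2.6.10" sorts before "2.6.9", so `as_string` is false, which is the kind of mistake a dedicated version type avoids.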


Advanced Constraints (cont'd)

Each constraint is associated with a particular resource, and is evaluated in the context of a particular node.
A given constraint has a boolean predicate built from the expressions described earlier, and is associated with a weight and a condition.
If the predicate is true, then the condition is used to compute the weight associated with locating the given resource on the given node.
Supported conditions are: (these distinctions may be unneeded ?)
    can: same as prefer with MAXINT weight
    cannot: same as prefer with -MAXINT weight
    prefer: positive weight
    prefer not: same as prefer with negative weight
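The four conditions reduce to signed weights, which can be sketched as follows. The mapping of conditions to weights comes from the slide. Everything else is an assumption: `sys.maxsize` stands in for MAXINT, weights are combined by summing, and a matching “cannot” overrides all other constraints.

```python
import sys

MAXINT = sys.maxsize  # stand-in for the slide's MAXINT


def condition_weight(condition, weight=0):
    """Translate a condition and weight into one signed score contribution."""
    if condition == "can":
        return MAXINT
    if condition == "cannot":
        return -MAXINT
    if condition == "prefer":
        return abs(weight)
    if condition == "prefer not":
        return -abs(weight)
    raise ValueError(f"unknown condition: {condition!r}")


def node_score(constraints, node_attrs):
    """Score one node for one resource.

    constraints: list of (predicate, condition, weight) where
                 predicate(node_attrs) -> bool
    Assumptions: contributions are summed and clamped to +/-MAXINT, and a
    matching 'cannot' immediately rules the node out.
    """
    total = 0
    for predicate, condition, weight in constraints:
        if not predicate(node_attrs):
            continue
        contribution = condition_weight(condition, weight)
        if contribution == -MAXINT:
            return -MAXINT
        total = max(-MAXINT, min(MAXINT, total + contribution))
    return total


# Hypothetical constraints for a single resource.
constraints = [
    (lambda a: a.get("site") == "munich", "prefer", 100),
    (lambda a: "san" not in a, "cannot", 0),
    (lambda a: a.get("cpus", 0) < 2, "prefer not", 30),
]
score_ok = node_score(constraints, {"site": "munich", "san": "yes", "cpus": 1})
score_banned = node_score(constraints, {"site": "munich"})
```

The first node collects 100 for its site and loses 30 for its single CPU, so `score_ok` is 70. The second node has no `san` attribute, so the “cannot” constraint matches and `score_banned` is `-MAXINT`.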


Security Considerations

Cluster: A computer whose backplane is the Internet
    If this isn't frightening, you don't understand...
You may think you have a secure cluster network
    You're probably mistaken now
    You will be in the future


Secure Networks are Difficult Because...

Security is not often well-understood by admins
Security is well-understood by “black hats”
Network security is easy to breach accidentally
    Users bypass it
    Hardware installers don't fully understand it
Most security breaches come from “trusted” staff
Staff turnover is often a big issue
Virus/Worm/P2P technologies will create new holes especially for Windows machines


Security Advice

Good HA software should be designed to assume insecure networks
    Not all HA software assumes insecure networks
Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication
Crossover cables are reasonably secure – all else is suspect




Legal Statements

IBM is a trademark of International Business Machines Corporation.
Linux is a registered trademark of Linus Torvalds.
Other company, product, and service names may be trademarks or service marks of others.
This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.