Episode 3: DB2 pureScale Availability and Recovery

DB2 pureScale Availability & Recovery © 2010 IBM Corporation October 13, 2010 Aamer Sachedina ([email protected]) Kelly Schlamb ([email protected])

Upload: laura-hood

Post on 10-Jun-2015


DESCRIPTION

Slides from episode 3 of the recent DB2 pureScale webcast series with the IBM Lab team.

TRANSCRIPT

Page 1: Episode 3  DB2 pureScale Availability And Recovery [Read Only] [Compatibility Mode]

DB2 pureScale Availability & Recovery

© 2010 IBM Corporation, October 13, 2010

Aamer Sachedina ([email protected]), Kelly Schlamb ([email protected])

Page 2

Information Management

Continuous Availability

• Protect from infrastructure outages
• Automatic workload balancing
• Automatically recovers from component failures
• Tolerates multiple node failures
• Duplexed secondary global lock and memory manager

© 2009 IBM Corporation

Page 3

DB2 Cluster Services Overview

• Integrated DB2 component
• Single install as part of DB2 installation
• Upgrades and maintenance through DB2 fixpack

[Diagram labels: DB2 members; CF; DB2 Cluster Services: Cluster File System (GPFS); DB2 Cluster Services: Cluster Manager (RSCT), Cluster Automation (Tivoli SA MP)]

Page 4

DB2 Cluster Services

[Diagram labels: DB2 Cluster Services; Reliable Scalable Cluster Technology; Tivoli Systems Automation for Multi-Platforms; IBM General Parallel File System]

• DB2 CS tightly integrates these IBM products into DB2 pureScale
• DB2 instance creation creates RSCT and GPFS domains across hosts
• Single command used to add hosts to the instance:

db2iupdt -add -m newhost.acme.com db2inst1
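When several hosts need to be added, the single command above can be generated per host. The following is a minimal sketch, assuming only the flags shown on the slide (`-add`, `-m`); the helper names `build_add_host_command` and `build_add_host_commands` are hypothetical, and a real deployment may require further options that the slide does not show. The script only prints the command lines; it does not run them.

```python
from typing import Iterable, List


def build_add_host_command(host: str, instance: str) -> List[str]:
    """Return the argv for the add-host command as written on the slide.

    Hypothetical helper: it only assembles strings and never calls DB2.
    """
    if not host or not instance:
        raise ValueError("host and instance must be non-empty")
    return ["db2iupdt", "-add", "-m", host, instance]


def build_add_host_commands(hosts: Iterable[str], instance: str) -> List[List[str]]:
    """Build one add-host command per host, in the order given."""
    return [build_add_host_command(h, instance) for h in hosts]


if __name__ == "__main__":
    # Host and instance names are the ones used on the slide.
    for argv in build_add_host_commands(["newhost.acme.com"], "db2inst1"):
        print(" ".join(argv))
```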

Page 5

DB2 pureScale HA Architecture

[Diagram labels: four Members, each with DB2 CS; Cluster Interconnect; Primary and Secondary (2nd-ary), each with CS; GPFS]

Page 6

Virtually Instantaneous Recovery From Node Failure

• Protect from infrastructure related outages
  – Automatically redistribute workload to surviving nodes
  – Automatically recover in-flight transactions in as little as 15-20 seconds including detection of the problem

[Diagram label: Application Servers and DB2 Clients]

Page 7

Minimize the Impact of Planned Outages

• Keep your system up
  – During OS fixes
  – HW updates
  – Administration

[Diagram steps: Identify Member; Do Maintenance; Bring node back online]

Page 8

Member Hardware Failure

• Power cord tripped over accidentally
• DB2 Cluster Services loses heartbeat and declares member down
  – Informs other members & CF servers
  – Fences member from logs and data
  – Initiates automated member restart on another ("guest") host
    > Using reduced and pre-allocated memory model
  – Member restart is like a database crash recovery in a single system database, but is much faster
    • Redo limited to in-flight transactions (due to FAC)
    • Benefits from page cache in CF
• In the meantime, client connections are automatically re-routed to healthy members
  – Based on least load (by default), or
  – Pre-designated failover member
• Other members remain fully available throughout ("Online Failover")
  – Primary retains update locks held by member at the time of failure
  – Other members can continue to read and update data not locked for write access by failed member
• Member restart completes
  – Retained locks released and all data fully available

[Diagram labels: Clients; Single Database View; DB2; CS; Log; Shared Data; Primary and Secondary, each with Updated Pages and Global Locks. Callouts: Automatic; Ultra Fast; Online]

Almost all data remains available. Affected connections transparently re-routed to other members.
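The two re-routing policies named on this slide (least load by default, or a pre-designated failover member) can be illustrated with a small sketch. This is a toy model, not the DB2 client's actual logic, which the slides do not show: the function name `choose_reroute_target`, the load figures, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Dict, Optional


def choose_reroute_target(
    failed_member: int,
    load_by_member: Dict[int, float],
    preferred_member: Optional[int] = None,
) -> int:
    """Pick the member that should receive connections from a failed member.

    Assumed policy, mirroring the slide's two options:
      1. If a pre-designated failover member is given and healthy, use it.
      2. Otherwise choose the healthy member with the least load
         (ties broken by lowest member ID, an arbitrary choice).
    """
    healthy = {m: load for m, load in load_by_member.items() if m != failed_member}
    if not healthy:
        raise RuntimeError("no healthy member available for re-routing")
    if preferred_member is not None and preferred_member in healthy:
        return preferred_member
    return min(healthy, key=lambda m: (healthy[m], m))


if __name__ == "__main__":
    # Illustrative load values only; member 3 has just failed.
    load = {0: 0.7, 1: 0.2, 2: 0.5, 3: 0.9}
    print(choose_reroute_target(3, load))
    print(choose_reroute_target(3, load, preferred_member=0))
```

Keeping the policy in one pure function makes it easy to see that a pre-designated member is honored only while it is itself healthy; otherwise the sketch falls back to least load.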

Page 9

Member Failback

• Power restored and system re-booted
• DB2 Cluster Services automatically detects system availability
  – Informs other members and PowerHA pureScale servers
  – Removes fence
  – Brings up member on home host
• Client connections automatically re-routed back to member

[Diagram labels: Clients; Single Database View; DB2; CS; Log; Shared Data; Primary and Secondary, each with Updated Pages and Global Locks]
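The failure and failback sequence can be read as a small state machine over the member states that `db2instance -list` reports in this deck (STARTED, RESTARTING, WAITING_FOR_FAILBACK). The sketch below is only a model for reasoning about the sequence: the event names (`host_failed`, `restart_complete`, `home_host_available`) and the `Member` class are hypothetical and are not DB2 interfaces.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed (state, event) -> next state. State names come from the
# db2instance output in the slides; event names are hypothetical.
TRANSITIONS = {
    ("STARTED", "host_failed"): "RESTARTING",
    ("RESTARTING", "restart_complete"): "WAITING_FOR_FAILBACK",
    ("WAITING_FOR_FAILBACK", "home_host_available"): "STARTED",
}


@dataclass
class Member:
    member_id: int
    home_host: str
    current_host: str
    state: str = "STARTED"

    def apply(self, event: str, guest_host: Optional[str] = None) -> None:
        """Advance the member through one step of failure or failback."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not valid in state {self.state!r}")
        if event == "host_failed":
            if guest_host is None:
                raise ValueError("a guest host is required for member restart")
            self.current_host = guest_host  # restart on a guest host
        elif event == "home_host_available":
            self.current_host = self.home_host  # fail back to the home host
        self.state = TRANSITIONS[key]


if __name__ == "__main__":
    m = Member(member_id=3, home_host="host3", current_host="host3")
    for event, guest in [("host_failed", "host2"),
                         ("restart_complete", None),
                         ("home_host_available", None)]:
        m.apply(event, guest_host=guest)
        print(m.state, m.current_host)
```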

Page 10

Member Hardware Failure and Failback

[Diagram labels: DB2 members on host0, host1, host2, host3; Primary on host4; Secondary on host5; CS; Log; Shared Data]

db2nodes.cfg

0 host0 0 - MEMBER
1 host1 0 - MEMBER
2 host2 0 - MEMBER
3 host3 0 - MEMBER
4 host4 0 - CF
5 host5 0 - CF

All members started on their home hosts:

> db2instance -list

ID  TYPE    STATE                 HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED               host0      host0         NO
1   MEMBER  STARTED               host1      host1         NO
2   MEMBER  STARTED               host2      host2         NO
3   MEMBER  STARTED               host3      host3         NO
4   CF      PRIMARY               host4      host4         NO
5   CF      PEER                  host5      host5         NO

HOST_NAME  STATE     INSTANCE_STOPPED  ALERT
host0      ACTIVE    NO                NO
host1      ACTIVE    NO                NO
host2      ACTIVE    NO                NO
host3      ACTIVE    NO                NO
host4      ACTIVE    NO                NO
host5      ACTIVE    NO                NO

host3 inactive, member 3 restarting on host2:

> db2instance -list

ID  TYPE    STATE                 HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED               host0      host0         NO
1   MEMBER  STARTED               host1      host1         NO
2   MEMBER  STARTED               host2      host2         NO
3   MEMBER  RESTARTING            host3      host2         NO
4   CF      PRIMARY               host4      host4         NO
5   CF      PEER                  host5      host5         NO

HOST_NAME  STATE     INSTANCE_STOPPED  ALERT
host0      ACTIVE    NO                NO
host1      ACTIVE    NO                NO
host2      ACTIVE    NO                NO
host3      INACTIVE  NO                YES
host4      ACTIVE    NO                NO
host5      ACTIVE    NO                NO

Member 3 waiting for failback to host3:

> db2instance -list

ID  TYPE    STATE                 HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED               host0      host0         NO
1   MEMBER  STARTED               host1      host1         NO
2   MEMBER  STARTED               host2      host2         NO
3   MEMBER  WAITING_FOR_FAILBACK  host3      host2         NO
4   CF      PRIMARY               host4      host4         NO
5   CF      PEER                  host5      host5         NO

HOST_NAME  STATE     INSTANCE_STOPPED  ALERT
host0      ACTIVE    NO                NO
host1      ACTIVE    NO                NO
host2      ACTIVE    NO                NO
host3      INACTIVE  NO                YES
host4      ACTIVE    NO                NO
host5      ACTIVE    NO                NO
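A status listing like the ones on this slide is easy to check mechanically. The sketch below parses the member/CF table and reports members that are not running on their home host. It assumes the six-column layout exactly as it appears in these slides; the output of a real `db2instance -list` may contain additional columns or separator lines, so the parser would need adjusting. The function names are hypothetical.

```python
from typing import Dict, List

# Column names as they appear in the slide's db2instance output.
FIELDS = ["ID", "TYPE", "STATE", "HOME_HOST", "CURRENT_HOST", "ALERT"]

# Sample text copied from the slide (member 3 waiting for failback).
SAMPLE = """\
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT
0 MEMBER STARTED host0 host0 NO
1 MEMBER STARTED host1 host1 NO
2 MEMBER STARTED host2 host2 NO
3 MEMBER WAITING_FOR_FAILBACK host3 host2 NO
4 CF PRIMARY host4 host4 NO
5 CF PEER host5 host5 NO
"""


def parse_instance_list(text: str) -> List[Dict[str, str]]:
    """Parse rows of the member/CF table, skipping headers and other lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only data rows: six fields, first one a numeric ID.
        if len(parts) != len(FIELDS) or not parts[0].isdigit():
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def members_off_home(rows: List[Dict[str, str]]) -> List[int]:
    """Return IDs of members whose current host differs from their home host."""
    return [
        int(r["ID"])
        for r in rows
        if r["TYPE"] == "MEMBER" and r["HOME_HOST"] != r["CURRENT_HOST"]
    ]


if __name__ == "__main__":
    rows = parse_instance_list(SAMPLE)
    print(members_off_home(rows))
```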

Page 11

Summary: Single Failure

Table columns: Failure Mode; Other Members Remain Online?; Automatic & Transparent?; Comments. Each failure mode is illustrated on a diagram of four DB2 members and two CFs.

Failure mode: Member
Comments: Only data that was in-flight on failed member remains locked temporarily. Connections to failed member transparently move to another member.

Failure mode: Primary CF
Comments: Momentary "blip" in CF service. Transparent to members (in-flight CF requests just take a few more seconds before completing normally).

Failure mode: Secondary CF
Comments: Momentary "blip" in CF service. Transparent to members (in-flight CF requests just take a few more seconds before completing normally).
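The comment for a member failure ("only data that was in-flight on failed member remains locked temporarily") can be modeled with a toy lock table. This is a simplified sketch under stated assumptions, not the CF's global lock manager: locks are keyed by an opaque row name, only write locks are tracked, and the class and method names are hypothetical. Retention is modeled simply by not releasing a failed member's locks until its restart completes.

```python
from typing import Dict, Hashable


class RetainedLockTable:
    """Toy model of write locks that stay held until member restart completes."""

    def __init__(self) -> None:
        # row -> ID of the member holding the write lock on that row
        self._write_locks: Dict[Hashable, int] = {}

    def lock_for_write(self, member: int, row: Hashable) -> bool:
        """Try to take a write lock; fail if another member already holds it."""
        holder = self._write_locks.get(row)
        if holder is not None and holder != member:
            return False
        self._write_locks[row] = member
        return True

    def can_access(self, member: int, row: Hashable) -> bool:
        """A row is accessible unless a different member holds its write lock."""
        holder = self._write_locks.get(row)
        return holder is None or holder == member

    def release_member(self, member: int) -> None:
        """Release all locks of a member, e.g. once its restart has completed."""
        self._write_locks = {
            row: holder for row, holder in self._write_locks.items() if holder != member
        }


if __name__ == "__main__":
    locks = RetainedLockTable()
    locks.lock_for_write(3, "row-a")        # member 3 updates row-a, then fails
    print(locks.can_access(0, "row-a"))     # retained lock blocks member 0
    print(locks.can_access(0, "row-b"))     # all other data stays available
    locks.release_member(3)                 # member restart completes
    print(locks.can_access(0, "row-a"))
```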

Page 12

Summary: Multiple Failures

Table columns: Failure Mode; Other Members Remain Online?; Automatic & Transparent?; Comments. Each row's failure mode is shown only as a diagram of four DB2 members and two CFs.

Row 1 comments: Only data that was in-flight on failed members remains locked temporarily. Recoveries done in parallel. Connections to failed member transparently move to another member.

Row 2 comments: Same as member failure. Momentary, transparent "blip" in CF service. Connections to failed member transparently move to another member.

Row 3 comments: Same as member failure. Momentary, transparent "blip" in CF service. Connections to failed member transparently move to another member.