Enable Active-Active Enterprise Messaging Technology to extend workload balancing and high availability
Session AME-1934
© 2015 IBM Corporation
Wang Bo - IBM [email protected]
Agenda
• Concepts of Business Continuity
• Business Continuity
• High Availability
• Continuous Serviceability
• Continuous Availability Cross Sites
• Messaging Technologies for Business Continuity
• Cases Sharing
What does business continuity mean to you?
• Why do we need a business continuity plan (BCP)?
• Don’t panic in the event of a disaster or crisis
• What do we need to consider when preparing a BCP?
• "backups" and their locations
• a central command center, which IBM calls the "Crisis Management Team (CMT)"
• maintain a "contact list"
• think about all possible "scenarios" and their corresponding action plans
• consider "critical" information or applications first
Different levels of business continuity
• Enterprise Business Requires Business Continuity
Standby to Active-Active, in four levels:
0. Disaster Recovery
• Restore the business after a disaster
1. High Availability
• Meet service availability objectives, e.g., 99.9% availability or no more than 8 hours of downtime a year for maintenance and failures
2. Continuous Serviceability
• No downtime within one data center (planned or not)
3. Continuous Availability Cross Sites
• No downtime ever (planned or not)
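The availability objective quoted for the High Availability level can be turned into a yearly downtime budget with simple arithmetic. The sketch below assumes a 365-day year; the function name is illustrative and does not come from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours


def downtime_budget_hours(availability: float) -> float:
    """Return the maximum yearly downtime (in hours) allowed by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} availability -> {downtime_budget_hours(target):.2f} h/year")
```

For 99.9% this gives about 8.76 hours per year, in line with the roughly 8 hours quoted on the slide.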
BC Level 1 – High Availability
• HA at different levels (AIX example)
• Apps follow HA principles
• Middleware HA technologies
– Clustering, DB2 pureScale, MQ multi-instance
• OS HA technologies
– PowerHA (HACMP)
• Hardware HA technologies
– Disk redundancy (RAID, SDD, etc.)
– FlashCopy, Metro/Global Mirror
– Server redundancy (CPU, power, etc.)
– Network redundancy
• Key point is eliminating SPOF (single points of failure)
• Redundancy
• RPO = 0!
BC Level 2 – Continuous Serviceability
• Usually based on workload takeover
• Automatic takeover
• A challenge for application affinity and sequence
• Decoupling of components – easier maintenance
• Old data may be lost – could combine with HA
• Maintenance
• Planned and unplanned downtime
• Rolling updates
• Coexistence
• Short RTO!
BC Level 3 – Continuous Availability Cross Sites
• Two or more sites, separated by unlimited distances, running
the same applications and having the same data to provide
cross-site workload balancing and Continuous Availability /
Disaster Recovery
• Customer data at geographically dispersed sites kept in sync via synchronization
• Care about both RPO & RTO!

                GDPS/PPRC        GDPS/XRC or GDPS/GM   Active/Active
Model           Failover model   Failover model        Near CA model
Recovery time   = 2 minutes      < 1 hour              < 1 minute
Distance        < 20 km          Unlimited distance    Unlimited distance
Workload Balancing Through Data Replication
• Both sides run workload simultaneously, with the same or different volumes, but both have the full picture of the data!
• Replicate data from one platform to another
• Both sides may work equally, or have a different focus, as below:
• The main server still does the existing critical work.
• Meanwhile, the offload server can run data analysis, data queries, etc.
• New business requirements, but you don’t want to touch the existing server!
• An acquired organization may bring a different database on a different platform. How do you centralize the data?
[Diagram: Site A (OLTP) and Site B (QLTP), linked by synchronization]
Site A:
• Powerful
• Critical production work (DB updates/inserts)
• Strict maintenance process
• Caution: nobody wants it down
Site B:
• Less powerful
• Less critical work (DB queries)
• Work can be delayed, but may cost high CPU (data analysis, credit card anti-fraud, etc.)
• New workloads
Agenda
• Concepts of Business Continuity
• Messaging Technologies for Business Continuity
• HA Technologies
• Continuous Serviceability Technologies
• Continuous Availability Cross Sites
• Cases Sharing
MQ Technologies
• HA Technologies
• QSG for MQ on z/OS
• Failover Technologies
• Application HA
• Continuous Serviceability Technologies
• MQ Clustering
• Rolling Upgrade
• Continuous Availability Cross Sites
• Data Synchronization
• Synchronization Application Design
• How To Replicate Data
• Performance Consideration
HA - QSG for MQ on z/OS
[Diagram: queue managers in a queue-sharing group, each with its own private queues, all accessing the shared queues]
Queue manager failure
• Nonpersistent messages on private queues: lost (deleted)
• Messages on shared queues: OK (kept)
Coupling facility failure
• Nonpersistent messages on private queues: OK (kept)
• Nonpersistent messages on shared queues: lost (deleted)
• Persistent messages on shared queues: restored from log
HA - Failover Technologies
• Failover
• The automatic switching of availability of a service
• Data accessible on all servers
• Multi-instance queue manager
• Integrated into the WebSphere MQ product
• Faster failover than HA cluster
• Runtime performance of networked storage
• More susceptible to MQ and OS defects
• HA cluster
• Capable of handling a wider range of failures
• Failover historically slower, but some HA clusters are improving
• Some customers frustrated by unnecessary failovers
• Extra product purchase and skills required
HA - Application Availability
• Application environment
• Dependencies like a specific DB, broker, WAS?
• machine-specific or server-specific?
• Start/stop operations – sequence?
• Message loss
• Really need every message delivered?
• Application affinities
• MQ connectivity
[Diagram: an MQ client application and queue managers QM1, QM2, and QM3]
Continuous Serviceability – MQ Cluster
• Workload Balancing
• Service Availability
• Location Transparency (of a kind)
[Diagram: Client 1 (App 1), a gateway queue manager, and Service 1 instances on queue managers at Site 1 and Site 2]
Multi-Data Center using MQ Cluster
• Global applications
• Data centers in New York and London, but separated by an ocean and 3500 miles
• Prefer traffic to stay geographically local
• Except when you have to look further afield
• How do you do this with clusters that span geographies?…
[Diagram: New York and London each have a client (App 1) queue manager and service queue managers]
Set this up – The one cluster solution
• Clients always open AppQ
• Local alias determines the preferred region
• Cluster workload priority is used to target geographically local cluster aliases
• Use of CLWLPRTY enables automatic failover
• CLWLRANK can be used for manual failover
New York:
DEF QALIAS(AppQ) TARGET(NYQ)
DEF QALIAS(NYQ) TARGET(ReqQ) CLUSTER(Global) CLWLPRTY(9)
DEF QALIAS(LonQ) TARGET(ReqQ) CLUSTER(Global) CLWLPRTY(4)
London:
DEF QALIAS(AppQ) TARGET(LonQ)
DEF QALIAS(LonQ) TARGET(ReqQ) CLUSTER(Global) CLWLPRTY(9)
DEF QALIAS(NYQ) TARGET(ReqQ) CLUSTER(Global) CLWLPRTY(4)
[Diagram: in each city a client (App 1) queue manager holds the AppQ alias and a service queue manager holds the NYQ and LonQ aliases in front of ReqQ]
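The failover behaviour of the one cluster solution can be illustrated with a small model. This is a simplified sketch, not the MQ cluster workload algorithm itself (which also weighs channel state, CLWLRANK, and other attributes); it models only the rule that the reachable instance with the highest CLWLPRTY is chosen. The queue manager names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterAlias:
    name: str        # clustered alias queue, e.g. NYQ or LonQ
    host_qmgr: str   # queue manager that hosts this instance (hypothetical name)
    clwlprty: int    # cluster workload priority, 0 (lowest) to 9 (highest)


def choose_instance(instances, available_qmgrs):
    """Pick the reachable instance with the highest CLWLPRTY, or None if none is reachable."""
    reachable = [i for i in instances if i.host_qmgr in available_qmgrs]
    if not reachable:
        return None
    return max(reachable, key=lambda i: i.clwlprty)


# Instances of the NYQ alias, as targeted by a New York client (AppQ -> NYQ).
NYQ_INSTANCES = [
    ClusterAlias("NYQ", "NY_SERVICE_QM", 9),
    ClusterAlias("NYQ", "LONDON_SERVICE_QM", 4),
]

if __name__ == "__main__":
    both_up = {"NY_SERVICE_QM", "LONDON_SERVICE_QM"}
    print(choose_instance(NYQ_INSTANCES, both_up).host_qmgr)
    print(choose_instance(NYQ_INSTANCES, {"LONDON_SERVICE_QM"}).host_qmgr)
```

While New York is reachable its priority 9 instance wins; once it is unavailable, the priority 4 instance in London takes the traffic without any change on the client side.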
Set this up - The two cluster solution
• The service queue managers join both geographical clusters
• Each with separate cluster receivers for each cluster, at different cluster priorities. Queues are clustered in both clusters.
• The client queue managers are in their local cluster only.
[Diagram: two clusters, USA (New York) and EUROPE (London); each city has a client (App 1) queue manager and service queue managers]
Continuous Availability Cross Sites
• Data synchronization is the key component in Active-Active
• Capture transaction changes in real time
• Publish the changes with high performance and low latency
• A messaging-based implementation has proven to be the simplest of the various data transmission methods
• A high-performance, reliable messaging product is needed to meet the following requirements:
• Simplified application development
• Ease of use
• Assured message delivery
• High performance and scalability
• Ease of management
Active-Active Common Model based on Messaging
• Cross-site workload distribution
• Data synchronization
• Relies on high-performance, reliable messaging transmission
• Flexible application design
• Automation & management
[Diagram: a Workload Distributor in front of two sites at a distance; each site has a Business App with its Business Data, a Sync App, and Messaging]
How to replicate data?
• Capture transaction activities through DB2 logs – an independent tool (Q Replication)
[Diagram: Source -> log-based Q Capture -> WebSphere MQ -> highly parallel Q Apply -> Target]
• Modify the existing applications – send out transactional data with the MQ API (WebSphere MQ)
• At the end of the existing logic, add an MQPUT call to send the data. Program an apply application at the target end.
• Flexible; can cross different platforms, even different database products. But it needs a robust application.
• Option to put within or outside syncpoint. Will the existing transaction fail (roll back) if the send fails?
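The syncpoint choice raised on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not call any MQ API, the dictionary and list stand in for the database and the transmission queue, and it assumes the application backs out its unit of work when a put under syncpoint fails.

```python
class SendError(Exception):
    """Raised by the simulated messaging layer when a send fails."""


def run_transaction(db, outbox, change, send_ok, in_syncpoint):
    """Apply `change` to `db` and send it for replication.

    db           -- dict standing in for the source database
    outbox       -- list standing in for the MQ transmission queue
    send_ok      -- False simulates a failing MQPUT
    in_syncpoint -- True puts the message inside the unit of work
    Returns True if the business update was committed.
    """
    snapshot = dict(db)      # state to restore on rollback
    staged = []              # messages put under syncpoint, not yet visible
    db.update(change)        # existing business logic

    try:
        if not send_ok:
            raise SendError("simulated MQPUT failure")
        if in_syncpoint:
            staged.append(change)   # becomes visible only at commit
        else:
            outbox.append(change)   # visible immediately, independent of commit
    except SendError:
        if in_syncpoint:
            db.clear()
            db.update(snapshot)     # send is part of the unit of work: roll back
            return False
        # Outside syncpoint: the business update survives, the message is lost.

    outbox.extend(staged)           # commit: staged messages are released
    return True
```

Within syncpoint, a failed send costs the business transaction; outside syncpoint, the business transaction is protected but the two sides can drift apart until the lost change is resent.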
Performance Tuning Considerations
• Synchronize only the changed data, thus reducing the data volume
• Introduce more parallelism
• Multiple synchronization channels for different types of workload
• More threads in the sync application for parallel processing
• Multiple MQ channels to relieve a single busy channel
• Invest in new MQ features
• Bigger buffer pools above the bar
• Sequential pre-fetch
• Page set read/write performance enhancement
• Channel performance improvement
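The first tuning point, synchronizing only the changed data, can be sketched for an application-based synchronization as follows. The message layout (a key plus a dictionary of changed columns, encoded as JSON) is an assumption made for illustration; the slides do not prescribe a format.

```python
import json

_MISSING = object()  # sentinel for columns absent from the before-image


def changed_columns(before: dict, after: dict) -> dict:
    """Return only the columns whose values differ between two row images."""
    return {
        col: val
        for col, val in after.items()
        if before.get(col, _MISSING) != val
    }


def build_sync_message(key, before: dict, after: dict):
    """Build a compact sync message, or None when nothing changed."""
    delta = changed_columns(before, after)
    if not delta:
        return None
    return json.dumps({"key": key, "changes": delta}, sort_keys=True)


if __name__ == "__main__":
    old = {"name": "A", "limit": 5000, "balance": 120}
    new = {"name": "A", "limit": 5000, "balance": 95}
    print(build_sync_message("card-1", old, new))
```

Only the `balance` column travels in this example; unchanged rows produce no message at all.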
MQ Buffer pools read ahead enhancement
• Symptom: When the number of messages overruns the buffer pool allocated for the queue, messages are spilled to disk and must then be retrieved from disk.
• The read ahead enhancement enables message pre-fetch from disk storage and improves MQGET performance.
• Available in PM63802/UK79853 (2012) and PM81785/UK91439 (2013).
• Internal testing shows ~50% improvement with read ahead enabled (msglen=6KB).
• Enable this feature if the MQ buffer pool may overrun.
Agenda
• Concepts of Business Continuity
• Messaging Technologies for Business Continuity
• Cases sharing
• Case 1 (Active/Active with QREP tool)
• Case 2 (Active/Active with application)
• Case 3 (Workload offload)
• Case 4 (Workload offload to multiple systems)
Requirements of a bank – Active/Active
• A commercial bank with data centers in Shanghai and Beijing
• Beijing: one existing data center for disaster recovery
• Shanghai: one existing data center for production, and one new data center for Active-Active, 70 km apart
• The bank plans to achieve Active-Active between the two Shanghai data centers for its core banking business.
[Diagram: Shanghai data center 1 (production center) and Shanghai data center 2, 70 km apart; Beijing data center (for disaster recovery), 1200 km away]

Workload                 rows/s    MB/s
OLTP                     45K-50K   45
Batch                    140K      50
Month-End Batch          130K      70-80
Interest Accrual Batch   440K      172.5
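As a rough sizing aid, the rows/s and MB/s figures above can be combined into an average number of bytes per changed row. The sketch assumes 1 MB = 1,000,000 bytes and uses only the two workloads for which the slide gives single values; the function name is illustrative.

```python
def bytes_per_row(rows_per_second: float, megabytes_per_second: float) -> float:
    """Average bytes per changed row (assumes 1 MB = 1,000,000 bytes)."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return megabytes_per_second * 1_000_000 / rows_per_second


# (rows/s, MB/s) pairs taken from the requirements table.
WORKLOADS = {
    "Batch": (140_000, 50.0),
    "Interest Accrual Batch": (440_000, 172.5),
}

if __name__ == "__main__":
    for name, (rows, mb) in WORKLOADS.items():
        print(f"{name}: {bytes_per_row(rows, mb):.0f} bytes/row")
```

Both workloads come out at a few hundred bytes per row, which suggests many small changes rather than a few large ones.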
MQ in Q Replication
• Part of the InfoSphere Data Replication product
• A software-based asynchronous replication solution
• For Relational Databases
• Changes are captured from the database recovery log, transmitted as (compact) binary data, and then applied to the remote database(s) using SQL statements.
• Leverages WebSphere MQ for Staging/Transport
• Each captured database transaction is published in an MQ message (messages sent at each commit interval)
• Staging makes it possible to achieve continuous operation even when the target database is down for some time or the network encounters a problem.
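The staging role that MQ plays between capture and apply can be pictured with a toy model. This is a sketch only: the in-memory queue stands in for a persistent MQ queue, and the class and function names are hypothetical rather than part of Q Replication.

```python
from collections import deque


class StagingQueue:
    """In-memory stand-in for the MQ queue between capture and apply."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        return self._messages.popleft() if self._messages else None

    def depth(self):
        return len(self._messages)


def apply_pending(queue, target, target_up):
    """Drain the queue into `target` (a list) while the target is available."""
    applied = 0
    while target_up and queue.depth() > 0:
        target.append(queue.get())
        applied += 1
    return applied


if __name__ == "__main__":
    q, target_db = StagingQueue(), []
    for tx in ("tx1", "tx2", "tx3"):
        q.put(tx)                                    # capture keeps publishing
    print(apply_pending(q, target_db, target_up=False), q.depth())
    print(apply_pending(q, target_db, target_up=True), target_db)
```

Capture never blocks on the target: transactions accumulate on the queue during the outage and are replayed in order once the target is back.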
[Diagram: Q Replication between two active DB2 sites over an unlimited distance. At Site A, Q Capture reads the database recovery log (logrdr) and publishes changes to the user tables through WebSphere MQ. At Site B, Q Apply agents apply them to the user tables with SQL statements. Other labels: DB2 control tables at each site, Configuration & Monitoring, Data Center.]
• Asynchronous log change data capture
• Persistent staging capability
• DB2 transaction parallel replay
MQ v8.0 features for Q Rep scenarios
• Sequential pre-fetch on z/OS
• TUNE READAHEAD(ON) and TUNE RAHGET(ON) were delivered to the bank as a PTF on V7.1 and are still applicable to V8
• Page set read/write performance enhancements for QREP on z/OS
• Changes to the queue manager deferred write processor, now the default behavior in V8
• 64-bit enablement of buffer pools on z/OS
• More real storage can be used for buffers
• SMF enhancements on z/OS
• CHINIT SMF data helps with tuning channel performance
• 64-bit log RBA
• We probably want QREP users to get to this
• Other improvements
• z/OS miscellaneous improvements (performance and serviceability)
• Channel performance on z/OS
Case 2 (Active/Active with application)
• Active-Active Adaptability in Small/Medium-sized Banks
• Banks in China have set up storage-based DR solutions, but the business recovery time is too long
• A Sysplex solution is expensive, its return on investment is not high, and the distance is limited.
• Need to consider an application-based solution, mixed with the storage-based solution
• Active-Active is the target model of the modern data center
• Not only the mainframe, but also heterogeneous and peripheral distributed platforms need to be active-active
Business Requirement of Active-Active
• The credit card system on the mainframe is based on the VisionPlus (V+) solution by First Data.
• Improve the capacity and availability of the whole credit card system.
• More comprehensive and more efficient services from the bank's payment systems.
• More flexible access, more comprehensive liquidity risk management functions, and a wider scope of system monitoring
• Refinement of backup infrastructure
The target Active-Active System Structure
• Both the main system and the secondary system are active
• Real-time data synchronization for OLTP transactions
• The main system and the secondary system back each other up
• Workload can be taken over in case of planned or unplanned failure
[Diagram: the Headquarter Gateway splits workload by card BIN and sends it to BJ and SH. The (Main) V+ Mainframe and the (Secondary) V+ Mainframe each run OLTP processing and batch processing, and are linked by 1. OLTP Transaction (MQ), 2. File Transfer, and 3. Global Mirror files. Other labels: File Transfer, OLTP, Batch, Terminal, Anti-fraud, Reporting, Debt-collection, DRNET, VISA/MC/JCB, Finance Processing in BJ, Finance Processing in SH, Non-Finance Processing.]
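The gateway behaviour described here, splitting workload by card BIN and letting one site take over for the other, can be sketched as follows. The BIN values, the six-digit BIN length, and the BIN-to-site table are all hypothetical; the slides do not give the real assignments.

```python
BIN_LENGTH = 6  # assumption: the BIN is the first six digits of the card number

# Hypothetical BIN-to-site table; real assignments are not given in the slides.
BIN_TO_SITE = {"400000": "BJ", "510000": "SH"}
PEER = {"BJ": "SH", "SH": "BJ"}


def route(card_number: str, sites_up: set) -> str:
    """Return the site that should process a transaction for this card."""
    home = BIN_TO_SITE.get(card_number[:BIN_LENGTH])
    if home is None:
        raise KeyError("unknown BIN")
    if home in sites_up:
        return home
    peer = PEER[home]
    if peer in sites_up:
        return peer  # takeover by the other active site
    raise RuntimeError("no site available")


if __name__ == "__main__":
    print(route("4000001234567890", {"BJ", "SH"}))
    print(route("4000001234567890", {"SH"}))
```

Because both sites hold synchronized data, the takeover path needs no data movement at switch time, only a routing decision.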
Active-Active Deployment Model
Continuous Availability – Active-Active
• Business Continuous Availability: achieve business continuous availability by front end and mainframe active-active
• Reliable Services: synchronize application data based on MQ reliable messaging, keep data consistency in real time
• Data Backup: back up key business data through MQ series
• Data interchange in real time: the data centers could be located at a long distance
[Diagram: the Headquarter Gateway (route by BIN) in front of the Beijing and Shanghai (main) sites; each site has Encryption, a Front-end App System, and Core Data, with Sync between the sites]
Active-Active Logical model for OLTP
• Self-implemented replication service based on WebSphere MQ for z/OS
[Diagram: a Workload Distributor in front of the Beijing and Shanghai sites. At each site the Credit Card System (AOR, VSAM) has a Transaction Publisher that sends changes to an MQ queue manager and a Transaction Replay that retrieves changes from it; the two queue managers are labeled MQ queue manager 1 and MQ queue manager 2.]
Planned Site Switch Over Procedure
• Stop workload routing to the BJ site
• Wait until the SH site data is in duplex with the BJ site data
• Re-route workload to the SH site
• Reverse GM from site B to site A
Unplanned Site Switch Over Procedure
• Stop workload routing to the BJ site
• Re-route workload to the SH site
• Reverse GM from site B to site A
Characteristics of this case
• For businesses with less complex master data and fewer dependent database tables, for example the credit card business.
• The synchronization applications need to be developed according to your business and technical requirements, rather than taken from an out-of-the-box product.
Case 3 (Workload offload)
• Purpose
• A new business that issues SELECTs frequently.
• The existing DB2 is on z/OS, but the customer wants to buy an existing solution on Linux.
• So this is active-active data replication within the same data center, cross-platform.
• Implementation
• Modify the existing core banking applications: add MQ send logic at the end.
• On the distributed side, develop another application for DB updates/inserts.
• Minimize the impact on the existing applications: send outside syncpoint.
Workload offload
• Expand the business more easily and quickly
• The existing business is only slightly touched (nearly untouched).
• Flexible, no dependencies on the type of target database.
[Diagram: a Workload Distributor in front of the core banking system (z/OS), which runs the core workload, and the QUERY system (Linux), which runs the query workload; the systems carry Active and Standby labels. z/OS MQ and Linux MQ are connected by an MQ channel.]
App logic on z/OS:
• Existing logic
• MQPUT (data to update in DB)
• EXEC CICS SYNCPOINT
Apply application logic on Linux:
• According to the data received, update the target with an SQL statement or SP
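The apply application on the Linux side can be sketched as a function that turns each received message into an SQL statement. This is a minimal illustration, not the bank's implementation: it uses an in-memory SQLite database in place of the target database, and the JSON message layout, the `account` table, and its columns are all hypothetical.

```python
import json
import sqlite3


def make_target() -> sqlite3.Connection:
    """Create a stand-in query database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    return conn


def apply_message(conn: sqlite3.Connection, message: str) -> None:
    """Apply one replicated insert/update to the query database."""
    change = json.loads(message)
    if change["op"] != "upsert":
        raise ValueError("unknown operation: " + str(change["op"]))
    # INSERT OR REPLACE covers both new rows and updates to existing rows.
    conn.execute(
        "INSERT OR REPLACE INTO account (id, balance) VALUES (?, ?)",
        (change["id"], change["balance"]),
    )
    conn.commit()


if __name__ == "__main__":
    db = make_target()
    apply_message(db, json.dumps({"op": "upsert", "id": "A1", "balance": 100}))
    apply_message(db, json.dumps({"op": "upsert", "id": "A1", "balance": 80}))
    print(db.execute("SELECT id, balance FROM account").fetchall())
```

Keeping the message format independent of the source database is what lets the target be a different database product on a different platform.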
Case 4 (Workload offload to multiple systems)
• Purpose
• Replicate the z/OS database of the core credit card system to a Linux database in a near-real-time window. Multiple consumers on different Linux boxes want the same data.
• Implementation
• z/OS MQ does a normal put (same as the data replication discussed on previous pages), so only one copy of the data is transferred to Linux MQ. The Linux MQ then does the 1-n publication with the MQ pub/sub engine.
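[Diagram: CICS/Batch on z/OS issues MQPUT(/credit/deposit/) to the QM on z/OS (QM1). The message reaches the distributed QM (QM2) through the cluster XMITQ, or an XMITQ (hierarchy). On QM2, subscriptions APP1.SUB, APP2.SUB, … (10 subs in total) deliver to SUB1.Q, SUB2.Q, …, which are read by MQGET or by a remote QMGR.]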
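The 1-n publication step can be pictured with a toy pub/sub engine: one message arrives from z/OS and every subscription's destination queue receives its own copy. This is a sketch under simplifying assumptions (exact topic-string matching only, in-memory queues); it does not use the MQ API, and the class is hypothetical.

```python
from collections import defaultdict, deque


class PubSubEngine:
    """Toy model of a queue manager's pub/sub engine (exact topic match only)."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic string -> destination queue names
        self.queues = {}                # queue name -> deque of messages

    def subscribe(self, topic: str, dest_queue: str) -> None:
        self.queues.setdefault(dest_queue, deque())
        self._subs[topic].append(dest_queue)

    def publish(self, topic: str, message) -> int:
        """Copy `message` to every subscriber queue; return the number of copies."""
        destinations = self._subs.get(topic, [])
        for dest in destinations:
            self.queues[dest].append(message)
        return len(destinations)


if __name__ == "__main__":
    engine = PubSubEngine()
    for n in range(1, 11):  # 10 subs in total, as on the slide
        engine.subscribe("/credit/deposit/", f"SUB{n}.Q")
    print(engine.publish("/credit/deposit/", {"card": "c1", "amount": 50}))
```

Only one copy crosses the channel from z/OS; the fan-out cost is paid on the distributed queue manager.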
Detailed implementation on pub/sub + HA
[Diagram: publishers issue MQPUT to gateway queue managers QMGW01, QMGW02, and QMGW03, which reach QM0A and QM0B through the MQ cluster; labels CLWLPRTY = 7 and CLWLPRTY = 5]
QM0A/QM0B:
DEFINE TOPIC(MYTOPIC) TOPICSTR('/Price/Bread')
DEFINE QALIAS(MYTARGET) TARGET(MYTOPIC) TARGTYPE(TOPIC) CLUSTER(CL0)
Duplicated apps (on gateway QMGRs):
Just put messages to queue 'MYTARGET'; the cluster will use workload-balancing logic to route them to either QM0A or QM0B.
Hierarchy
[Diagram: QM01, QM02, QM03, and QM04, each serving an app with 5 subscriptions]
QM01/QM02/QM03/QM04:
ALTER QMGR PARENT(QM0A)
* For QM03/QM04, the parent is QM0B
DEFINE QL(MYTARGETQ1)
DEFINE QL(MYTARGETQ2)
DEFINE QL(MYTARGETQ3)
DEFINE QL(MYTARGETQ4)
DEFINE QL(MYTARGETQ5)
DEFINE SUB(SUB01) TOPICSTR('/Price/Bread') DEST(MYTARGETQ1)
DEFINE SUB(SUB02) TOPICSTR('/Price/Bread') DEST(MYTARGETQ2)
DEFINE SUB(SUB03) TOPICSTR('/Price/Bread') DEST(MYTARGETQ3)
DEFINE SUB(SUB04) TOPICSTR('/Price/Bread') DEST(MYTARGETQ4)
DEFINE SUB(SUB05) TOPICSTR('/Price/Bread') DEST(MYTARGETQ5)