GDOC: The Distributed Continuous Availability / Disaster Recovery Solution
February 2009

GDOC
Geographically Dispersed Open Clusters

Providing Continuous Availability & Disaster Recovery for Distributed Systems

(Methodology Adapted for Symantec Storage Foundation HA/DR Technology)

Teddi Maranzano

Funso Daramola

Alfredo Fernandez


Table of Contents

Abstract
Executive Summary
Lessons Learned about IT Survival
What is GDOC?
  Recovery Automation
  Data Replication
    Replication Modes
    Replication Technologies
  Testing
  Monitoring
    Agents
    Notification
    Management Console
    Monitoring DR Readiness
GDOC Architecture
Case Study
GDPS-GDOC Interface
IBM Global Technology Services (GTS) Offerings
Summary
Additional Information


Abstract

Geographically Dispersed Open Clusters (GDOC) is a high availability and disaster recovery solution for highly critical applications with aggressive recovery point and recovery time objectives. GDOC is a solution based on both IBM’s extensive high availability and disaster recovery experience, and on high availability and disaster recovery software from Symantec. This white paper describes the GDOC solution in the context of distributed systems (such as AIX®, Solaris, HP/UX, Linux®, and Microsoft® Windows®) and it also describes an interface to Geographically Dispersed Parallel Sysplex™ (GDPS®), IBM’s corresponding continuous availability / disaster recovery solution on mainframe systems (z/OS®).

Executive Summary

Unlike other business investments in infrastructure and inventory that can immediately increase revenue or decrease costs, investments in disaster recovery (DR) are realized when extended and unplanned downtime threatens the viability of the enterprise. Therefore, corporate management is often reluctant to make large investments in disaster recovery unless they have personally been involved in an event that impacted areas such as revenue, customer satisfaction, and regulatory compliance.

Even delaying system maintenance to avoid downtime can jeopardize operations of the enterprise when known hardware or software issues are not addressed, or when business critical applications continue to run on platforms that are no longer supported by the various vendors.

Further, in the era of the global economy, many industries are under mandates to have business continuity plans in place. For example, in 2003 the Federal Reserve, the U.S. Office of the Comptroller of the Currency and the U.S. Securities and Exchange Commission (SEC) created new business continuity objectives for all financial institutions. On the international level, the Basel II Accords and the standards proposed by the International Accounting Standards Board include frameworks for managing physical risks.

What would the impact to your enterprise be if your business critical applications became unavailable? Would you lose revenue? Would you lose customers? Would you be in violation of regulatory requirements? Would your competition be able to capitalize on your extended downtime? Calculating the impact and costs of downtime and data loss is not a simple process because some of the effects are less recognizable than others. For example, while revenue lost during the outage can be measured, the impact to the enterprise’s reputation and goodwill is less apparent. When the business impact analysis and risk assessment are completed, the value of investing in a disaster recovery solution will become clear.


In response to the various mandates, many large enterprises have deployed some type of recovery solution. In many cases it evolves as a piecemeal DR solution, incorporating the technologies already in-house to supplement manual processes. Also, as corporations merge, the resulting enterprise may inherit a variety of DR solutions from various vendors using very different technologies, designs and business processes. Ultimately, these solutions grow more complex: they become expensive and labor-intensive, often fail when tested (if they are tested at all), and require the IT organization to learn how to manage multiple components. GDOC is designed to help the business enterprise meet these challenges by streamlining the automation, testing and management requirements of a world-class recovery solution.

Lessons Learned about IT Survival

Events such as those on September 11, 2001 in the United States and more recent events such as the 2003 power failure in the Northeast United States and Hurricane Katrina in 2005 show how critical it is for businesses to be ready for both expected and unexpected interruptions.

Various agencies, including the Federal Reserve, the Securities and Exchange Commission, and the Office of the Comptroller of the Currency, met with industry participants to discuss lessons learned about IT survival. The following is a summary of those lessons:

Geographical separation of facilities and resources is critical to maintaining business continuity. Any resource that cannot be replaced from external sources within the Recovery Time Objective (RTO) should be available within the enterprise, in multiple locations. This applies not only to buildings and hardware resources, but also to employees and data, since planning employee and data survival is very critical. Allowing staff to work out of a home office should not be overlooked as one way of being DR ready.

Depending on the RTO and Recovery Point Objective (RPO), both of which are typically expressed in hours or minutes, it may be necessary for some enterprises to implement an in-house DR solution. If this is the case, the facilities required to achieve geographical separation may need to be owned by the enterprise.

The installed server capacity at the second data center can be used to help meet normal day-to-day data processing needs, and fallback capacity can be provided either by prioritizing workloads (production, test, development, data mining) or by implementing capacity upgrades based on changing a license agreement, rather than by installing additional capacity. Disk resources need to be duplicated for disk data that is mirrored.

Recovery procedures must be well-documented, tested, maintained and available after a disaster. Data backup and/or data mirroring must run like clockwork all the time.


It is highly recommended that the DR solution be based on as much automation as possible since, in case of a disaster, one cannot assume that key skills will be readily available to restore IT services.

The enterprise’s critical service providers, suppliers and vendors may be affected by the same disaster; therefore, one must enter into a discussion with them about their DR readiness.

What is GDOC?

GDOC is IBM’s services framework and methodology for architecting high availability and disaster recovery across distributed platforms such as AIX, Solaris, HP/UX, Linux, and Windows. A GDOC solution is a combination of Symantec high availability software and IBM’s service delivery methodology. GDOC uses Symantec Veritas Cluster Server HA/DR as the mechanism to automate recovery, and offers a choice of many different data replication technologies.

While the focus of this whitepaper is on high availability and disaster recovery on the distributed platforms, it should be noted that there is a Veritas™ Cluster Server (VCS) agent for GDPS (IBM’s corresponding continuous availability / disaster recovery solution on z/OS). For enterprises that wish to automate both the distributed and mainframe platforms, recovery of both environments can be driven from a single console on z/OS using the VCS agent for GDPS to integrate both the GDOC and GDPS solutions.

In the event of an outage or disaster, it can be quite challenging to manually coordinate isolated pockets of processes and procedures necessary to achieve minimal down time. Multiple manual processes are often used to implement and coordinate recovery plans that are inherently error-prone and could significantly offset time and resources invested in even the best of IT solutions, leading to further unplanned and costly downtime.

GDOC addresses these problems by automating recovery at both the primary site and the disaster recovery site (a secondary customer site or a third party site). In addition to recovery automation, GDOC also has a proactive method for measuring DR-readiness using a simple, non-disruptive testing feature of Veritas Cluster Server called Fire Drill, including a point-in-time copy of data at the secondary site. GDOC is an enterprise solution that automates recovery from planned and unplanned failures across multiple platforms within a single site and across sites.


Recovery Automation

The GDOC methodology places particular emphasis on automation in order to make disaster recovery more reliable. With GDOC, the business has the opportunity to make a business decision on whether or not to move operations to the DR site. This is not a decision for a piece of software or an administrator to make. The administrator’s job in the event of a disaster is to inform senior management of the actual event, what business critical applications are affected, the SLAs of those applications, and the estimated time to resolve the problem that caused the event. Senior management will then make an expedited decision on whether or not to declare a disaster. If a disaster is declared, instructions will be passed back to the administrator that site operations are to be moved. The administrator can start the recovery at the DR site with just a few clicks. From this point forward the recovery will be completely automated – automated, but not automatic.
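The distinction between automated and automatic recovery can be illustrated with a small sketch in which the scripted steps run unattended, but only after a human declaration has been recorded. This is a toy model under stated assumptions: the class, method, and step names are hypothetical and are not part of VCS or GDOC.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SiteRecovery:
    """Toy model: recovery steps run unattended, but only after a human declares a disaster."""
    steps: list                          # ordered, scripted recovery steps (hypothetical names)
    declared_by: Optional[str] = None    # who declared the disaster, if anyone
    log: list = field(default_factory=list)

    def declare_disaster(self, approver: str) -> None:
        # The business decision is recorded explicitly; software never makes it.
        self.declared_by = approver

    def start_recovery(self) -> list:
        if self.declared_by is None:
            raise PermissionError("no disaster declared; refusing to move operations")
        # From here on, every step runs without further human intervention.
        for step in self.steps:
            self.log.append(f"completed: {step}")
        return self.log
```

Calling `start_recovery()` before `declare_disaster()` raises an error, which mirrors the point that neither the software nor the administrator decides to move operations.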

The need for near continuous availability of business critical applications, spanning multiple OS platforms and hardware technologies, drives the importance of automation in any enterprise HA/DR solution. In GDOC, automation is used to replace several repeatable activities that would otherwise require human intervention. VCS’s core engine, the High Availability Daemon (HAD), communicates with various resources within the cluster through agents. These agents monitor and orchestrate various components, and specify policies to act deterministically in case of failures and other events.

VCS maintains an open framework to accommodate applications for which an agent doesn't currently exist. Accelerated development is possible using this standard agent framework and IBM’s development experience.

Data Replication

Moving critical data between servers used to mean shuttling backup tapes to the DR site for restoration there. The notable shortcomings with this approach are extended down time and unnecessary loss of data. But today’s relatively low cost of technology makes data transmission over high speed networks and across long distances the superior choice.

Choosing the best replication mode (point-in-time, asynchronous, or synchronous) depends on a thorough understanding of the business application. A Business Impact Analysis will determine how critical an application is to the business organization relative to other applications, and some of the impacts of data loss for that application.

The result of the Business Impact Analysis will be two key factors in the IT Business Continuity plan: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). The RPO is a business decision. It represents how much data can be lost before the business organization begins to suffer. It is the result of combining the application’s I/O profile with various replication technologies and then factoring in the network bandwidth. The RTO is a business process decision and is part of the design of the failover mechanism. It indicates how quickly the application must be restarted in order to avoid negatively impacting the business.

Replication Modes

Periodic, or point-in-time, replication creates a snapshot of the data and copies it to the target site. The application is affected only while the snapshot is being made, although it may be necessary to suspend writes in the application database. While this may be appropriate for small and relatively static files, it represents the highest exposure to data loss. Any data written to the primary volumes between snapshots will be lost.
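The exposure of periodic replication can be put into numbers with a short calculation. The sketch below assumes a constant write rate and ignores the time needed to copy the snapshot to the target site, so it understates the real exposure; the function names and units are illustrative only.

```python
def worst_case_loss_mb(snapshot_interval_min: float, write_rate_mb_per_min: float) -> float:
    """Data written since the last snapshot is lost, so the worst case is one full interval."""
    return snapshot_interval_min * write_rate_mb_per_min


def meets_rpo(snapshot_interval_min: float, rpo_min: float) -> bool:
    """A periodic schedule can meet an RPO only if the interval is no longer than the RPO."""
    return snapshot_interval_min <= rpo_min
```

For example, hourly snapshots of a volume that receives 5 MB of writes per minute risk up to 300 MB of lost data, and cannot satisfy a 15-minute RPO.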

Asynchronous replication (Figure 2) significantly reduces the exposure to data loss but may require additional infrastructure to minimize any performance impact to the application. Each write operation to the primary storage volume is almost immediately duplicated to the remote site. The application can continue processing almost immediately without waiting for an acknowledgement that the write operation completed successfully. The data replication technology provides data consistency by writing data at the remote site in the same order in which it was written at the local site. If an outage occurs, in-flight I/Os (data written to the primary volume but not yet written to the remote site) will be lost. Intervention by the storage administrator or database administrator should not be required to re-establish database integrity.
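The behavior described for asynchronous replication (local acknowledgement, ordered delivery, and loss of in-flight I/Os on an outage) can be sketched with a simple queue. This is a toy model, not how any of the replication products named in this paper is implemented; all names are hypothetical.

```python
from collections import deque


class AsyncReplicator:
    """Toy asynchronous replication: acknowledge locally, ship writes later in order."""

    def __init__(self):
        self.primary = []          # writes applied at the primary site
        self.remote = []           # writes applied at the remote site
        self.in_flight = deque()   # written locally, not yet at the remote site

    def write(self, block: str) -> None:
        # The application continues as soon as the local write completes.
        self.primary.append(block)
        self.in_flight.append(block)

    def drain(self, n: int) -> None:
        # Ship up to n queued writes, oldest first, preserving write order.
        for _ in range(min(n, len(self.in_flight))):
            self.remote.append(self.in_flight.popleft())

    def fail_primary(self) -> list:
        # On an outage, whatever is still queued never reaches the remote site.
        lost = list(self.in_flight)
        self.in_flight.clear()
        return lost
```

Because writes leave the queue strictly in arrival order, the remote copy is always a prefix of the primary's write history, which is why the remote database stays consistent even though it may be slightly behind.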

Synchronous replication (Figure 1) offers the least exposure to data loss, but has the highest impact on application performance. Application write operations are simultaneously sent to the volumes at the primary and remote sites. Because the application waits for acknowledgment that both write operations completed successfully, application performance is negatively impacted as distance increases.

Two important considerations when choosing a replication technology are bandwidth and latency. Network bandwidth, also called throughput, measures the amount of data that can move over the channel per unit of time. Latency is the time required to complete a write operation due to the finite speed of light in optical fiber plus the delays caused by routers and switches. Latency has an impact on the performance of write operations when synchronous replication is used. Network bandwidth must be adequate for both synchronous and asynchronous replication to achieve the recovery point objective.
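The link between bandwidth and the recovery point objective can be made concrete with a rough estimate of how far the remote copy falls behind during a burst of writes. The sketch assumes constant rates during the burst and no new writes afterwards; the function name and parameters are illustrative, not taken from any product.

```python
def replication_lag_seconds(write_mb_per_s: float, link_mb_per_s: float,
                            burst_seconds: float) -> float:
    """Estimate how far the remote copy trails after a write burst.

    During the burst the backlog grows at (write rate - link rate); the lag is
    the time the link then needs to ship that backlog.
    """
    backlog_mb = max(0.0, write_mb_per_s - link_mb_per_s) * burst_seconds
    return backlog_mb / link_mb_per_s
```

With writes at 50 MB/s over a 40 MB/s link, a ten-minute burst leaves 6000 MB queued, which the link needs 150 seconds to clear. If the link is faster than the write rate, the estimated lag is zero.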


When choosing a replication mode, consider the requirements of the application. In synchronous mode, can the application tolerate waiting for the acknowledgement of the write operation from the DR site, or will the transactions time out and fail? Also consider the distance between the primary site and the DR site. The recommended range for synchronous replication is up to 100 km.
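The effect of distance on synchronous writes can be estimated from the propagation speed of light in optical fiber, which is roughly 200,000 km per second. The sketch below assumes one round trip per write and treats router and switch delay as a single added figure; real protocols may need more than one round trip, so this is a lower bound rather than a prediction.

```python
FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per millisecond


def sync_write_penalty_ms(distance_km: float, device_delay_ms: float = 0.0) -> float:
    """Minimum extra time per synchronous write: one round trip plus equipment delay."""
    round_trip_ms = 2 * distance_km / FIBER_KM_PER_MS
    return round_trip_ms + device_delay_ms
```

At the 100 km mark, propagation alone adds about 1 ms to every write, and at 1000 km about 10 ms, which suggests why synchronous replication is recommended only over shorter distances.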

Figure 1: Synchronous Replication

Examples: Veritas Volume Replicator, IBM Metro Mirror, EMC SRDF

[Figure 1 diagram labels: Primary Site (Database Server, Database); Recovery Site (Disaster Recovery Server, Copy of Database); latency between the sites. Sequence: write to remote target; acknowledge remote write; wait on remote write to continue processing.]


Figure 2: Asynchronous Replication

Replication Technologies

GDOC is an open and flexible solution that works with various replication technologies to deliver the desired RPO and RTO.

With volume-based replication, also called host-based, the replication technology is implemented as a software product running in the server environment, usually along with the business application. It is usually managed by the server administrators.

The replication software intercepts write operations to the primary volume or file system and duplicates them to a secondary volume or file system. For asynchronous replication mode, server-based replication technologies contain the functionality to help ensure data consistency across sites.

While being volume-based makes this replication technology independent of the storage subsystem, it can impact the application’s performance by adding complexity and consuming valuable CPU and memory server resources.

Figure 2 examples: Veritas Volume Replicator, IBM Global Mirror, EMC SRDF/A

[Figure 2 diagram labels: Primary Site (Database Server, Database); Recovery Site (Disaster Recovery Server, Copy of Database); latency between the sites. Sequence: do not wait on remote write, continue processing; in parallel, asynchronously, write to remote target; in parallel, asynchronously, acknowledge remote write.]


Examples of site or campus-wide volume-based mirroring technologies include Veritas Volume Manager and AIX LVM. Remote replication technologies include Veritas Volume Replicator, Double-Take (Windows), IBM/Softek Replicator for UNIX®, and AIX GLVM.

Storage-based replication is functionally similar to volume-based replication on the server in that write operations from the application result in writes to volumes in the primary storage array and writes to corresponding volumes in the secondary storage array. Many storage-based replication technologies have the volume-based replication technology embedded within the storage subsystem’s controller.

Storage-based replication is a leading technology in the IT marketplace, and is usually managed by storage administrators. It has the advantage of not consuming server resources and can support multiple server types, but the replication function can add overhead to the storage subsystem. Additionally, it often locks the enterprise into a specific storage platform at both the primary and remote sites.

Examples of storage-based replication include IBM Metro Mirror, IBM Global Mirror, EMC SRDF, and HDS TrueCopy.

Database replication, another major replication technology, is provided by specially engineered database management engines. While it works well for managing database consistency, it adds overhead and complexity to the database and it does not provide replication services for any data outside of the database. Management of database replication is usually done by database administrators, who may have to manually intervene to re-synchronize the database.

Popular examples of database replication include DB2® Universal Database HA/DR, Sybase Replication, Oracle Advanced Replication, and Informix® Dynamic Server.

When planning for a replication technology, consider which technologies are currently in place and whether they are performing effectively. Weigh the time and cost to implement the technology against the ease of operation and the data integrity that it provides. In some cases, different replication technologies may be in use within an enterprise depending on the application platforms in play and the specific business requirements for a given application. Fortunately, the GDOC solution design coupled with the VCS technology can support these types of complex environments.


Symantec provides replication agents for VCS that allow Cluster Server to monitor system and application resources. Please see the Symantec Corporation Web site for more information on supported Cluster Server agents:
http://www.symantec.com/business/products/agents_options.jsp?pcid=pcat_business_cont&pvid=20_1

Testing

Testing DR capability has become a requirement in today’s business environment. The behavior of a VCS cluster in a GDOC environment can be predicted and tested without the risk of any downtime. With the VCS Fire Drill tool, recovery readiness can be routinely validated without interrupting production processing at the primary site, without the extensive planning, cost and disruption that are usually associated with traditional DR testing. VCS Fire Drill is executed at the DR site.

Fire Drill builds a near copy of the production service group being tested, leaving the network components out of the copy. VCS Fire Drill will use the native snapshot capability of the replication being used in the GDOC environment. VCS then tests whether or not the Fire Drill copy can be started against the snapshot of the replicated data at the DR site. If it can be started, the automated start-up of the business critical application will succeed in the event of a disaster. If it fails, information will be provided as to why, and any problems can be remediated, and the environment can be tested again.

These testing features of VCS are incorporated into the design of a GDOC solution to provide rigorous and exhaustive testing of the cluster throughout the phases of deployment.
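The Fire Drill sequence described above (copy the service group without its network components, take a snapshot, try to start the copy, report failures) can be outlined as follows. This is a conceptual sketch only: the function, its arguments, and the resource names are hypothetical and do not represent the VCS Fire Drill interface.

```python
def fire_drill(service_group: dict, take_snapshot, try_start) -> dict:
    """Toy fire drill: copy the group without network resources, then try to start it
    against a snapshot of the replicated data. Production is never touched."""
    # Leave network components out so the copy cannot collide with production.
    drill_copy = {name: kind for name, kind in service_group.items() if kind != "network"}
    snapshot = take_snapshot()   # stand-in for a point-in-time copy at the DR site
    failures = [name for name in drill_copy if not try_start(name, snapshot)]
    return {"passed": not failures, "failed_resources": failures}
```

Here `service_group` maps resource names to a kind such as "network", "storage", or "application", and the two callables stand in for the snapshot and start-up mechanisms. A failed drill returns the names of the resources that could not be started, so the problem can be fixed and the drill repeated.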

Monitoring

Organizations are now deploying business critical solutions with components spanning multiple system platforms. Therefore it is important to track events as they occur across such heterogeneous environments in order to monitor the health of any IT infrastructure. IT departments are usually faced with the task of delivering solutions that will provide prompt and reliable event monitoring. Installing and maintaining Veritas Cluster Server as a single product across several different Open System platforms enables you to use a single solution to manage multiple, complex integrated systems. VCS uses programs called agents¹ to monitor and recover cluster² components called resources³ based on configurable parameters.

¹ Agents are programs that manage computer resources, such as a volume group or IP address, within a node in a cluster environment. Each type of resource requires an agent. An agent can manage multiple resources of the same type.
² A VCS cluster comprises related hardware and software components called resources.


Figure 3: Veritas Cluster Server Agents

Agents

Veritas Cluster Server communicates through its agents (Figure 3) with various resources within each node in the cluster to monitor, control and recover resources. There are primarily three types of agents: bundled, enterprise and custom. Bundled agents such as NIC, Mount and Notifier are installed as part of VCS software. Enterprise agents can be installed as optional packages and are available for major applications such as DB2, Oracle, DataGuard, WebSphere® and WebLogic. These enterprise agents have predefined functions compiled into their framework that interface with specific characteristics of their enterprise application resource type. These functions provide the interface necessary to manage complex application resources through common configurable options. Veritas Cluster Server comes with over fifty agents and new applications are supported each quarter with the quarterly Agent Pack release, thus reducing the consulting costs that come with custom development.

³ Resources are entities that can be managed, such as an application, file system, or database. Resources are classified into types based on common definition. The resources of the same type within a cluster node are managed by an agent.


If a compatible bundled or enterprise agent is not available for a resource, then a custom⁴ agent can be developed by following published guidelines⁵. Based on experience deploying GDOC HA/DR solutions, IBM has developed custom agents for several uncommon resource types that can be deployed without needing any further development.
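The role of an agent (monitoring, controlling, and recovering resources of one type on a node) can be illustrated with a small model. This sketch is an assumption-laden illustration, not a VCS agent: the class, the method names, and the in-memory process table are hypothetical and are not taken from the Agent Developer's Guide.

```python
class ProcessAgent:
    """Toy agent managing resources of one type: named processes in a process table."""

    def __init__(self, process_table: set):
        self.process_table = process_table   # stand-in for the node's running processes

    def monitor(self, name: str) -> str:
        # Report the state the cluster engine would act on.
        return "ONLINE" if name in self.process_table else "OFFLINE"

    def online(self, name: str) -> None:
        self.process_table.add(name)         # stand-in for starting the resource

    def offline(self, name: str) -> None:
        self.process_table.discard(name)     # stand-in for stopping the resource


def monitor_cycle(agent: ProcessAgent, resources: list) -> dict:
    """One monitoring pass: restart any offline resource, then report final states."""
    states = {}
    for name in resources:
        if agent.monitor(name) == "OFFLINE":
            agent.online(name)               # recovery action
        states[name] = agent.monitor(name)
    return states
```

One agent instance handles every resource of its type, which matches the description that an agent can manage multiple resources of the same type within a node.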

Notification

A GDOC solution uses the Notifier agent for providing notifications on cluster events. Cluster events captured by the Notifier agent are sent to the Cluster Server Management Console, and can also be forwarded as SNMP traps to an SNMP V2 MIB compatible enterprise monitoring tool such as HP OpenView and IBM Tivoli® Monitoring. The Notifier agent can also use Veritas Cluster Server triggers to simultaneously send the events as an e-mail alert to a user-supplied recipient list. During GDOC implementation, IBM works closely with clients to seamlessly integrate the event notification capabilities of the GDOC solution with any existing enterprise monitoring system.
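The fan-out of a single cluster event to several destinations (console, SNMP trap receiver, e-mail list) can be sketched as a list of registered sinks. The example is purely illustrative: it sends no real SNMP traps or e-mail, and the class and field names are hypothetical rather than the Notifier agent's actual interface.

```python
class Notifier:
    """Toy notifier: fan each cluster event out to every registered sink."""

    def __init__(self):
        self.sinks = []   # callables that accept an event dict

    def register(self, sink) -> None:
        self.sinks.append(sink)

    def notify(self, severity: str, message: str) -> int:
        event = {"severity": severity, "message": message}
        for sink in self.sinks:
            sink(event)
        # Return how many sinks received the event.
        return len(self.sinks)
```

Integrating with an existing enterprise monitoring system then amounts to registering one more sink, without changing how events are produced.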

Management Console

A typical GDOC environment includes HA within a site and DR across multiple sites. The VCS Management Console is a tool that provides centralized management for an entire GDOC environment from a single console. Other features of the VCS Management Console are:

• Centralized monitoring and control
• Centralized point for deploying configuration changes
• Management of multiple cluster environments from almost anywhere
• Cluster capacity trend analysis
• One-click migration
• Failed site recovery

Also, the robust capabilities of the VCS Management Console CLI and API interfaces make it possible to integrate with other enterprise management solutions.

⁴ Custom agents are not supported by Symantec Technical Support.
⁵ The guidelines can be found in Veritas™ Cluster Server Agent Developer’s Guide.


Monitoring DR Readiness

As described earlier, regular testing at the DR site validates DR-readiness. Continuous monitoring of key components provides reassurance of DR-readiness between regular tests. Monitoring is typically done in the context of the existing systems management framework using existing monitoring tools. The types of enhancements to monitoring that are required for DR-readiness include monitoring key files on internal disks at each site, and monitoring the state of data replication.
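The two monitoring enhancements named above, checking key files at each site and checking the state of data replication, can be combined into a simple readiness report. This is a minimal sketch under stated assumptions: file contents are passed in as bytes, the replication state is a plain string, and "consistent" is a hypothetical label for a healthy state.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint file content so the two sites can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def readiness_report(primary_files: dict, dr_files: dict, replication_state: str) -> list:
    """Return a list of findings; an empty list means no drift was detected."""
    findings = []
    for path, content in primary_files.items():
        if path not in dr_files:
            findings.append(f"missing at DR site: {path}")
        elif digest(content) != digest(dr_files[path]):
            findings.append(f"differs at DR site: {path}")
    if replication_state != "consistent":
        findings.append(f"replication state is {replication_state}")
    return findings
```

A report like this could be fed into whatever monitoring tool is already in place, so that drift between sites is noticed between scheduled tests rather than during a real recovery.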

GDOC Architecture

Figure 4: GDOC Architecture

The GDOC Architecture (Figure 4) provides a solution based on automating application startup, shutdown, and recovery from a failure using Veritas Cluster Server HA/DR. Veritas Cluster Server is used because it supports AIX, Solaris, HP/UX, Red Hat Linux, SUSE Linux, VMware, and Windows. There are four key elements to the GDOC solution:

• Redundant infrastructure within the production site and at a secondary site
• Asynchronous data replication using a supported replication technology chosen by the client
• Point-in-time copy of data used for non-disruptive disaster recovery testing
• Recovery automation using Veritas Cluster Server HA/DR

[Figure 4 diagram labels: Primary Site for the production application; Secondary Site for application recovery; VCS Cluster Active/Standby; Asynchronous Data Replication; Point-in-Time Copy; Monitoring; Automated Recovery. Callout "Enable remote recovery by adding": infrastructure at second site; data replication across sites; non-disruptive remote recovery testing using point-in-time copy; DR readiness monitoring.]


Case Study

Figure 5: Both local and geographically remote high availability

Availability within a Metropolitan Area

The reference architecture in the Case Study (Figure 5) is primarily for the database server and middleware servers at the primary site only. Static servers such as Web servers can be redundant but do not need to share a data repository. Veritas Volume Manager is used to mirror the critical database files across IBM SAN Volume Controller clusters. This type of host-based mirroring prevents outages caused by a failure of any SAN component, including a storage array. The two servers shown in the diagram can be physically next to each other or they can be in adjacent buildings on the same campus (while still on the same storage area network).

Veritas Cluster Server is used to protect the database or middleware (i.e. message hub) servers against server failures and to minimize downtime during planned outages. Both servers have physical access to the disks that contain the database or data store, but only one server logically controls the files at any given time.

FlashCopy® is used to create point-in-time copies of data that can provide a backup in case of logical data corruption. Additional point-in-time copies can also be made to refresh a test or quality assurance environment.

[Figure 5 labels: VCS DB Cluster Active/Standby; Host Based Mirror (M1) on SVC Cluster 1; Host Based Mirror (M2) on SVC Cluster 2; Disk Group A; Asynchronous Replication (Global Mirror). Improve Local Availability by adding: local clustering, host based mirroring, point-in-time copy. Improve Remote Availability by adding: remote clustering, data replication to remote site, remote point-in-time copy.]


Both host based mirroring and FlashCopy provide a contingency (backout capability) for major upgrades or changes to the environment.

Availability across a Wide Area

To enable rapid recovery in the event of a disaster, hardware, software, and network infrastructure are deployed at a secondary site. Data is replicated to the secondary site using storage, database, or volume-based replication. The same application recovery automation used at the primary site is extended to the secondary site. In the case of a site-level disaster, the remote recovery site is used to continue processing.

A point-in-time copy of the data is used at the recovery site to perform non-disruptive disaster recovery testing. Data replication is not stopped, even during a test, so that the Recovery Point Objective (RPO) service level is not suspended. The startup of the application(s), using a point-in-time copy of data, is fully automated.
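The test sequence described above can be sketched as a toy model in Python. This is only an illustration of the ordering of steps (copy, start the application on the copy, discard the copy) while replication keeps running. The RecoverySite class and its method names are hypothetical and do not correspond to any FlashCopy or Veritas Cluster Server interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecoverySite:
    """Toy model of the secondary site; all names here are hypothetical."""
    replication_running: bool = True
    snapshot: Optional[dict] = None
    log: list = field(default_factory=list)

    def take_point_in_time_copy(self, replica: dict) -> None:
        # Copy the replica so the test never touches the replication target.
        self.snapshot = dict(replica)
        self.log.append("point-in-time copy taken")

    def start_application(self) -> bool:
        # The application is started against the copy, not the live replica.
        self.log.append("application started on copy")
        return self.snapshot is not None

    def discard_copy(self) -> None:
        self.snapshot = None
        self.log.append("copy discarded")


def run_dr_test(site: RecoverySite, replica: dict) -> bool:
    """Run a DR test without pausing replication; True if the app came up."""
    site.take_point_in_time_copy(replica)
    try:
        started = site.start_application()
        if started:
            # Writes made during the test land on the copy only.
            site.snapshot["test_marker"] = "written during DR test"
    finally:
        # Always clean up the copy, even if the application start fails.
        site.discard_copy()
    return started and site.replication_running


replica = {"orders": 1200}
site = RecoverySite()
ok = run_dr_test(site, replica)
print(ok, replica, site.replication_running)
```

The point of the model is that the replica and the replication flag are never modified by the test, which is what keeps the RPO service level intact while testing.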

GDPS-GDOC Interface

With the introduction of the Veritas Cluster Server (VCS) agent for IBM Geographically Dispersed Parallel Sysplex (GDPS) Distributed Cluster Management (DCM), the z/OS environment can now be integrated with the GDOC Open Systems environment. With this agent, GDOC can participate in coordinated, cross-platform recovery that is managed by the GDPS DCM console.

The agent is installed in a global service group at either the primary or secondary site within a Veritas Cluster Server global cluster. By connecting to the GDPS DCM console, the agent provides periodic cluster status information to GDPS.

The agent also executes VCS commands on behalf of the GDPS DCM console to control the VCS environment and sends trigger alerts to GDPS. For example, the agent can respond to GDPS DCM requests for Veritas Cluster Server to stop a service group or cluster, switch a cluster to a remote site, or declare a site failure.
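One way to picture this request handling is a small dispatch table that maps each kind of request to a VCS command line. The Python sketch below only builds the argument lists and does not execute anything. The request names are hypothetical, since the GDPS DCM protocol is not described here, and the command lines are modeled on the standard VCS commands (hagrp, hastop, haclus). Which commands and options the agent actually issues is an assumption and should be checked against the agent and VCS documentation for the installed release.

```python
from typing import Callable, Dict, List

# Hypothetical request names mapped to builders that return the argv of a
# VCS command. Nothing is executed; the options shown are assumptions.
COMMAND_BUILDERS: Dict[str, Callable[..., List[str]]] = {
    "stop_service_group": lambda group, system: [
        "hagrp", "-offline", group, "-sys", system],
    "stop_cluster": lambda: ["hastop", "-all"],
    "switch_to_remote_site": lambda group, cluster: [
        "hagrp", "-switch", group, "-any", "-clus", cluster],
    "declare_site_failure": lambda cluster: [
        "haclus", "-declare", "disaster", "-clus", cluster],
}


def build_vcs_command(request: str, **params: str) -> List[str]:
    """Translate a console request into a VCS command line (not executed)."""
    try:
        builder = COMMAND_BUILDERS[request]
    except KeyError:
        raise ValueError(f"unsupported request: {request}") from None
    return builder(**params)


# Example with made-up service group and cluster names.
cmd = build_vcs_command("switch_to_remote_site",
                        group="oracle_sg", cluster="dr_cluster")
print(" ".join(cmd))
```

Keeping the mapping in one table makes it easy to see which requests are supported and to reject anything else before a command reaches the cluster.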

The Veritas GDPS DCM agent requires:

- IBM GDPS 3.5 (and all requisite products)
- Veritas Cluster Server HA/DR 5.0 in a supported AIX, HP-UX, Linux or Solaris configuration

Please review the Veritas Cluster Server Agent for IBM GDPS DCM Installation and Configuration Guide for more information about the capabilities of this VCS agent.


IBM Global Technology Services (GTS) Offerings

The following GDOC services and offerings are available from IBM Global Services:

Technical Consulting Workshop (TCW)

The Technology Consulting Workshop, or TCW, is a 2-day onsite workshop that helps you gain consensus within the organization on what your requirements are, what a high level solution might be, and what the next steps should be.

There are pre-workshop activities and a pre-workshop questionnaire that enable IBM to gather as much information in advance as possible and to set expectations on which topics will be discussed during (and after) the workshop. The pre-workshop questionnaire also helps identify the necessary attendees for the workshop (those who can provide the required information).

There are also post-workshop activities that result in a TCW Summary Report provided to you that includes:

- Findings
- Recommendations
- Next Steps

IBM Implementation Services for Geographically Dispersed Open Clusters (GDOC)

This is a multi-vendor solution designed to protect the availability of critical applications that run on UNIX, Microsoft Windows, VMware or Linux operating system based servers. GDOC is based on an Open Systems Cluster architecture spread across two or more sites with data mirrored between sites to provide high availability and disaster recovery. It is designed to provide you with functionality for open systems similar to what GDPS provides for the IBM System z® mainframe running z/OS. This type of solution can provide a much shorter recovery time for critical business applications, and is easier than recovering from tape backup or replicating data with manually initiated recovery processes. IBM and Symantec have co-developed an integration that links GDOC and GDPS and provides benefits such as enterprise level disaster recovery and single console management and monitoring of both the mainframe and distributed disaster recovery platforms.

[Figure labels, TCW Summary Report sample: Finding; Importance; Support; items A to E; Action; Recommendations.]

[Figure labels, GDOC configuration: Any Web Browser (or telnet client); CCA Manager Console; CCA Management Cluster; VCS Cluster w/ Global Cluster Option with Primary Server and High Availability Standby; Oracle Instance(s); Data Replication; One (or Two) Node VCS Cluster w/ Global Cluster Option with Disaster Recovery Server.]

[Figure labels, Project Plan: phases including Mobilise, Vision, Design, Plan and Sustain, on a timeline from Week 1 / Month 1 to Week 12 / Month 6.]


GDOC is a services framework and methodology that includes the integration of Veritas Cluster Server and associated software modules from Symantec Corporation. The solution comes with a base set of implementation services including:

- Assessment and planning
- Design
- Solution build
- Testing and deployment

Summary

Many of today’s medium and large enterprises want to deploy in-house continuous availability or disaster recovery solutions that provide low (< 2 hours) recovery point objectives and low (< 4 hours) recovery time objectives for the most critical applications.

GDOC improves local availability by designing, implementing and testing server clustering, host based mirroring, and point-in-time data copies. GDOC provides continuous availability with infrastructure at a second site, data replication across sites, non-disruptive remote recovery testing using point-in-time copy, disaster recovery readiness monitoring, and fully automated recovery.

GDOC is a key component of a disaster recovery solution for the most critical applications that run on either Windows or UNIX/Linux systems. In addition to data redundancy provided by continuous data replication, GDOC provides continuous availability through the implementation of automated recovery, continuous monitoring of application recoverability and the ability to regularly test disaster recovery with minimal effort.

Additional Information

Veritas Data Center Software:
http://www.symantec.com/business/theme.jsp?themeid=datacenter

GDOC Home Page:
http://www-935.ibm.com/services/us/index.wss/offering/its/a1026541

IBM Optimization and Integration Services:
http://www-935.ibm.com/services/us/index.wss/offerfamily/gts/a1027708


© IBM Corporation 2009

IBM Systems and Technology Group
Route 100
Somers, New York 10589
U.S.A.

Produced in the United States of America, 02/2009
All Rights Reserved

IBM, IBM logo, AIX, DB2, DB2 Universal Database, FlashCopy, GDPS, Geographically Dispersed Parallel Sysplex, Informix, System z, WebSphere and z/OS are trademarks or registered trademarks of the International Business Machines Corporation.

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.

InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Symantec, the Symantec Logo, and Veritas are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.

IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.

All statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

ZSW03064-USEN-00