-
8/13/2019 6.Data Center Disaster Recovery and Business Continuance
1/99
2009 Cisco Systems, Inc. All rights reserved. Cisco PublicBRKDCT-2987 1
Data Center Disaster Recovery and Business Continuance
BRKDCT-2987
-
Business Continuance Drivers
Cost of application downtime, lost data and productivity
Regulatory mandates (Homeland Defense, Basel II, HIPAA, GLB, SEC)
  Firms must recover business operations the same business day a disruption occurs
  Out-of-region data center, 200+ miles away
  Mandates backup data centers on separate grids
Hurricanes
The Northeast Blackout
NYC Blizzard of 2003
-
Business Continuance Is More Critical than Ever
75% of IT decision-makers have altered Disaster Recovery/Business Continuance programs as a result of September 11
Following a disaster, 43% of directly affected businesses do not reopen and 29% fail within 24 months as a result
Only 15% of Global 2000 enterprises have a full-fledged business continuity plan
Disasters: fire, storms, floods, earthquakes, chemical accidents, nuclear accidents, wars
Sources: Disaster Recovery Journal, Gartner Group
-
Agenda
Introduction to Data Center - The Evolution
Data Center Disaster Recovery
  Objectives
  Failure Scenarios
  Design Options
Components of Disaster Recovery
  Site Selection - Front End GSLB
  Server High Availability - Clustering
  Data Replication and Synchronization - SAN Extension
Sample Design
-
The Evolution of Data Centers
-
Data Center Evolution
[Figure: timeline from 1960 to 2010 plotting business agility against compute evolution (mainframes, terminal, client/server, Internet computing, thin client: HTTP) and network evolution (TCP/IP, network optimization, content networking). Networked data center phases: 1. Consolidation, 2. Integration, 3. Distributed, 4. High Availability. Other labels: Data Center Networking, Data Center Consolidation, Data Center Distributed, Data Center Continuous Availability]
-
What Is Involved in a Data Center
Application solution: Linux/HP, Solaris/SunFire, WebLogic, J2EE custom app, etc.
Database solution: Linux/HP, Solaris/SunFire, Oracle 10g RAC, etc.
Storage solution: MDS 9000
Network infrastructure solution: Cisco GSRs, Cisco Catalyst 6500, Cisco Catalyst 4000
Layer 4-7 services solution: CSM, SSLM, CSS, CE, GSS
Network security solution: PIX, FWSM, IDSM, VPNSM, CSA
Management and instrumentation solution: terminal servers, NAM, CiscoWorks LMS/VMS, HSE
-
What Is a Distributed Data Center
[Figure: primary data center (APP A, APP B) and secondary data center (APP A, APP C), each with FC storage, linked by data replication]
-
Why Distributed Data Centers
Provide disaster recovery and business continuance
Avoid a single, concentrated data depository
High availability of applications and data access
Load balancing together with performance scalability
Better response and optimal content routing: proximity to clients
-
Front-end IP Access Layer
Content routing: site selection
[Figure: primary data center (APP A, APP B) and secondary data center (APP A, APP C), each with FC storage]
-
Application and Database Layer
Content switching: load balancing
Server clustering: high availability
[Figure: primary data center (APP A, APP B) and secondary data center (APP A, APP C), each with FC storage]
-
Backend SAN Extension
Storage and optical: data mirroring and replication
[Figure: primary data center (APP A, APP B) and secondary data center (APP A, APP C), each with FC storage]
-
Data Center Disaster Recovery
-
Disaster Recovery
Recovery of data and resumption of service: ensuring business can recover and continue after failure or disaster
Ability of a business to adapt, change and continue when confronted with various outside impacts
Mitigating the impact of a disaster
-
What It Means for Business
Business Resilience: continued operation of business during a failure
Disaster Recovery: protecting data through offsite data replication and backup
Business Continuance: restoration of business after a failure
Zero downtime is the ultimate goal
-
Disaster Recovery Planning
Business Impact Analysis (BIA)
  Determines the impacts of various disasters on specific business functions and company assets
Risk Analysis
  Identifies important functions and assets that are critical to a company's operations
Disaster Recovery Plan (DRP)
  Restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster
-
Disaster Recovery Objectives
Recovery Point Objective (RPO)
  The point in time (prior to the outage) to which systems and data must be restored
  Tolerable loss of data in the event of disaster or failure
  The impact of data loss and the cost associated with the loss
Recovery Time Objective (RTO)
  The period of time after an outage in which the systems and data must be restored to the predetermined RPO
  The maximum tolerable outage time
Recovery Access Objective (RAO)
  Time required to reconnect users to the recovered application, regardless of where it is recovered
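The three objectives can be read as upper bounds that a candidate recovery strategy either meets or misses. The sketch below is only an illustration of that check: the `Strategy` fields, the strategy names, and every number in `STRATEGIES` are assumptions made up for the example, not figures from this session.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    data_loss_s: float   # worst-case age of the newest recoverable data
    restore_s: float     # worst-case time to restore systems and data
    reconnect_s: float   # worst-case time to reconnect users


def meets_objectives(s: Strategy, rpo_s: float, rto_s: float, rao_s: float) -> bool:
    """Return True if the strategy stays within RPO, RTO and RAO."""
    return (s.data_loss_s <= rpo_s
            and s.restore_s <= rto_s
            and s.reconnect_s <= rao_s)


# Hypothetical strategies with illustrative numbers only (seconds).
STRATEGIES = [
    Strategy("tape backup, cold standby", 24 * 3600, 72 * 3600, 4 * 3600),
    Strategy("async replication, warm standby", 300, 4 * 3600, 1800),
    Strategy("sync replication, hot standby", 0, 60, 60),
]


def candidates(rpo_s: float, rto_s: float, rao_s: float) -> list:
    """Names of the strategies that satisfy all three objectives."""
    return [s.name for s in STRATEGIES
            if meets_objectives(s, rpo_s, rto_s, rao_s)]
```

For example, `candidates(rpo_s=600, rto_s=8 * 3600, rao_s=3600)` keeps the two replication-based strategies and rejects the tape-based one, because its worst-case data loss exceeds the 10-minute RPO.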
-
Recovery Point/Time vs. Cost
Smaller RPO/RTO: higher $$$, replication, hot standby
Larger RPO/RTO: lower $$$, tape backup/restore, cold standby
[Figure: timeline with time t0 (critical data is recovered), time t1 (disaster strikes) and time t2 (systems recovered and operational). Recovery point axis, from days down to secs with increasing cost: tape backup, periodic replication, asynchronous replication, synchronous replication. Recovery time axis, from secs up to weeks with cost increasing toward shorter times: extended cluster, manual migration, tape restore]
-
Network Failures
ISP failure
  Dual ISP connections
  Multiple ISPs
Connection failure within the network
  EtherChannel
  Multiple route paths
[Figure: data center connected to the Internet through Service Provider A and Service Provider B]
-
Device Failures
Routers, switches, FWs
  HSRP
  VRRP
Hosts
  HA cluster
[Figure: data center connected to the Internet through Service Provider A and Service Provider B]
-
Storage Failures
Disk arrays
  RAID
Disk controllers
[Figure: data center connected to the Internet through Service Provider A and Service Provider B]
-
Site Failures
Partial site failure
  Application maintenance
  Application migration
  Application scheduled DR exercise
Complete site failure
  Disaster
[Figure: data center connected to the Internet through Service Provider A and Service Provider B]
-
Cold Standby
One or more data centers with appropriately configured space equipped with pre-qualified environmental, electrical, and communication conditioning
Hardware and software installation, network access, and data restoration all need manual intervention
Least expensive to implement and maintain
Substantial delay from standby to full operation
-
Disaster Recovery Active/Standby
[Figure: primary data center (APP A, APP B) and secondary data center in cold standby (APP A, APP B), each with FC storage]
-
Warm Standby
A data center that is partially equipped with hardware and communications interfaces capable of providing backup operating support
Latest backups from the production data center must be delivered
Network access needs to be activated
Provides better RTO and RPO than cold standby backup
-
Disaster Recovery Active/Standby
[Figure: primary data center (APP A, APP B) and secondary data center in warm standby (APP A, APP B), each with FC storage, connected by an IP/optical network]
-
Hot Standby
A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little or no downtime
Hot backup offers disaster recovery with little or no human intervention
Application data is replicated from the primary site
A hot backup site provides very good RTO and RPO
-
Disaster Recovery Active/Standby
[Figure: primary data center (APP A, APP B) and secondary data center (APP A, APP C), each with FC storage, connected by an IP/optical network]
-
Multiple Tiers of Application
Presentation tier
Application tier
Storage tier
[Figure: tiered data center connected to the Internet through Service Provider A and Service Provider B]
-
Active/Active Data Centers
Active/Active web hosting
Active/Active application processing
Active/Standby database processing (or Active/Active)
[Figure: two data centers, each with an internal network, connected to the Internet through Service Provider A and Service Provider B]
-
Site Selection Mechanisms
Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
1. HTTP Redirect
2. DNS Based
3. L3 Routing with Route Health Injection (RHI)
Health of servers and/or applications needs to be taken into account
Optionally, other metrics (like load) can be measured and utilized for a better selection
-
HTTP Redirection: The Idea
Leveraging the HTTP redirect function: HTTP return code 302
Proper site selection made after the initial DNS request has been resolved, via redirection
Mainly as a method of providing site persistence while providing local server farm failure recovery
Can be used with the Location Cookie feature of the CSS to provide redirection after wrong site selection
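The redirect idea can be sketched with a tiny HTTP front end that answers every request with a 302 pointing at a healthy site. This is a minimal illustration and not how the CSS implements it: the `SITES` table, the `pick_site` helper and the health flags are assumptions for the example; the www1/www2 host names are borrowed from the traffic-flow slide.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical site table: site URL -> is the local server farm healthy?
SITES = {
    "http://www1.cisco.com": True,
    "http://www2.cisco.com": True,
}


def pick_site(sites, preferred=None):
    """Return the preferred site if it is healthy, else the first healthy site."""
    if preferred and sites.get(preferred):
        return preferred
    for url, healthy in sites.items():
        if healthy:
            return url
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET with a 302 to the selected site, keeping the path."""

    def do_GET(self):
        target = pick_site(SITES)
        if target is None:
            self.send_error(503, "No site available")
            return
        self.send_response(302)
        self.send_header("Location", target + self.path)
        self.end_headers()
```

To try it locally, the handler can be served with `http.server.HTTPServer(("127.0.0.1", 8080), RedirectHandler).serve_forever()`. Passing a `preferred` site to `pick_site` is one way to model site persistence: the client keeps landing on the same site as long as that site stays healthy.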
-
HTTP Redirection: Traffic Flow
[Figure: traffic flow between http://www.cisco.com/, http://www1.cisco.com/ and http://www2.cisco.com/]
-
Advantages of the HTTP Redirection Approach
Can be implemented without any other GSLB devices or mechanisms
Inherent persistence to the selected location
Can be used in conjunction with other methods to provide more sophisticated site selection
-
DNS-Based Site Selection: The Idea
The client D-proxy (local name server) performs iterative queries
The device which acts as site selector is the authoritative name server for the domain(s) distributed in multiple locations
The site selector sends keepalives to servers or server load balancers in the local and remote locations
The site selector selects a site for the name resolution, according to the pre-defined answers and site load balance method
The user traffic is sent to the selected location
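The selection step above can be sketched as a small in-memory model: the selector holds pre-defined answers, tracks keepalive results, and hands out only live answers. The class and method names are hypothetical, weighted round robin stands in for whatever site load balance method is configured, and the addresses are documentation-range placeholders; no DNS packets are involved.

```python
import itertools


class SiteSelector:
    """Toy model of an authoritative name server acting as site selector."""

    def __init__(self, answers):
        # answers: list of (vip, weight) tuples, i.e. the pre-defined answers
        self.answers = answers
        self.alive = {vip: True for vip, _ in answers}
        self._cycle = None
        self._rebuild()

    def _rebuild(self):
        # Weighted round robin over the answers whose keepalives succeed
        pool = [vip for vip, weight in self.answers
                if self.alive[vip] for _ in range(weight)]
        self._cycle = itertools.cycle(pool) if pool else None

    def keepalive_result(self, vip, ok):
        """Record the outcome of a keepalive sent to a site's VIP."""
        if self.alive[vip] != ok:
            self.alive[vip] = ok
            self._rebuild()

    def resolve(self):
        """Return the A-record answer for the next query, or None if no site is up."""
        return next(self._cycle) if self._cycle else None
```

With `SiteSelector([("192.0.2.10", 2), ("198.51.100.10", 1)])`, two of every three answers point at the first site; after `keepalive_result("192.0.2.10", False)` every answer points at the second site until the first recovers.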
-
DNS-Based Site Selection: Traffic Flow
[Figure: numbered steps 1-10. The client asks its DNS proxy to resolve http://www.cisco.com/; the proxy queries the root name server / authoritative name server for .com, the authoritative name server for cisco.com, and the authoritative name server for www.cisco.com (UDP:53); the client then connects to Data Center 1 or Data Center 2 (TCP:80)]
-
Advantages of the DNS Approach
Protocol independent: works with any application that uses name resolution
Minimal configuration changes in the current IP and DNS infrastructure (DNS authoritative server)
Implementation can be different for specific host names
A-records can be changed on the fly
Can take load or data center size into account
Can provide proximity
-
Limitations of the DNS-Based Approach
Visibility limited to the D-proxy (not the client)
Cannot guarantee 100% session persistence
DNS caching in the D-proxy
DNS caching in the client application
Order of multiple A-record answers can be altered by D-proxies
-
Route Health Injection: The Idea
Server and application health monitoring provided by local Server Load Balancers
SLB can advertise or withdraw the VIP address to upstream routing devices depending on the availability of the local server farm
Same VIP addresses can be advertised from multiple data centers (IP Anycast)
Relying on L3 routing protocols for route propagation and content request routing
Disaster recovery provided by network convergence
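The advertise/withdraw behavior can be modeled in a few lines: each site injects a host route for the shared VIP only while its local server farm is healthy, and the routed network prefers the lowest-cost advertisement. This is a simplified sketch; the dictionary keys, function names, cost values and the documentation-range VIP are assumptions for illustration, and real route selection depends on the routing protocol in use.

```python
def advertised_routes(sites):
    """Host routes currently injected; a site advertises only while its farm is healthy.

    sites: list of dicts with keys name, vip, cost, farm_healthy (hypothetical layout).
    """
    return [(s["vip"] + "/32", s["cost"], s["name"])
            for s in sites if s["farm_healthy"]]


def best_path(routes, vip):
    """Pick the lowest-cost advertisement for the VIP, as the routed network would."""
    matches = [r for r in routes if r[0] == vip + "/32"]
    return min(matches, key=lambda r: r[1])[2] if matches else None
```

If a preferred site advertises the VIP at low cost and a backup site advertises it at very high cost, `best_path` returns the preferred site; once the preferred site's farm fails its route disappears from `advertised_routes` and traffic converges on the backup.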
-
Route Health Injection: Implementation
[Figure: Client A and Client B reach VIP x.y.w.z through Routers 10, 11, 12 and 13. Location B is the preferred location for the VIP (low cost); Location A is the backup location for the VIP (very high cost)]
-
Advantages of the RHI Approach
Supports legacy applications and does not rely on a DNS infrastructure
Very good re-convergence time, especially in intranets where L3 protocols can be fine-tuned appropriately
Protocol-independent: works with any application
Robust protocols and proven features
-
Limitations of the RHI Approach
Relies on host routes (32 bits), which cannot be propagated all over the Internet (more on this later)
Requires tight integration between the application-aware devices and the L3 routers
Inability to intelligently load balance among the data centers
-
Cluster Overview
A cluster is two or more servers configured to appear as one
Two types of clustering: Load Balancing (LB) and High Availability (HA)
Clustering provides benefits for availability, reliability, scalability, and manageability
LB clustering: multiple copies of the same application against the same data set, usually read only
HA clustering: multiple copies of a long-running application that requires access to a common data depository, usually read and write
[Figure: web servers, application servers, database servers]
-
HA Cluster Connections
Public network (typically Ethernet) for client/application requests
Servers with the same hardware, OS, and application software
Private network (typically Ethernet) for interconnection between nodes. Could be direct connect, or optionally going through the public network
Storage disk (typically Fibre Channel): shared storage array, NAS or SAN
-
Typical HA Cluster Components
Application software that is clustered to provide High Availability. Example: Microsoft Exchange, SQL, Oracle database, File and Print Services
Operating System that runs on the server hardware. Example: Microsoft Windows 2000 or 2003, Linux (and the other flavors of UNIX), IBM MVS or z/OS (for mainframe)
Cluster Software that provides the HA clustering service for the application. Example: Microsoft MSCS, EMC AutoStart (Legato), Veritas Cluster Server, HP TruCluster and OpenVMS
Optionally, Cluster Enabler, software that synchronizes the cluster software with the storage disk array software
-
File System Approaches for HA Clusters
Shared Everything
  Equal access to all storage
  Each node mounts all storage resources
  Provides a single layout reference system for all nodes
  Changes updated in the layout reference
Shared Nothing
  Traditional file system with peer-to-peer communication
  Each node mounts only its semi-private storage
  Data stored on the peer system's storage is accessed via the peer-to-peer communication
  A failed node's storage needs to be mounted by the peer
-
Geo-clusters
Geo-cluster: a cluster that spans multiple data centers
[Figure: node1 in the local datacenter and node2 in the remote datacenter, connected over the WAN. Labels: disk replication (synchronous or asynchronous), 2 x RTT]
-
Split-Brain
Split-brain happens when all of the network communication links between two or more cluster nodes fail
Both nodes could potentially go active and concurrently access the disk, thus corrupting data
[Figure: node1 and node2 both writing to the shared disk, causing data corruption]
-
Resolution for Split-Brain: Quorum
A quorum device serves as a tiebreaker to arbitrate which system has access to resources
The quorum ensures that even if there is no communication between the nodes, only one node can continue to access the disk
Only the node that owns the quorum (or a majority of quorum votes) can bring resources online
Any resource can be used as the arbitrator to break the tie
[Figure: node1 and node2 sharing a quorum disk and application data]
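The majority-vote rule can be written down directly. The sketch below assumes one vote per node plus one vote for the quorum device, which is a common arrangement but an assumption here; the function names and parameters are hypothetical and do not correspond to any particular cluster product.

```python
def partition_votes(nodes_in_partition, node_votes, holds_quorum_device, device_votes=1):
    """Votes visible to one side of a partitioned cluster."""
    votes = sum(node_votes[n] for n in nodes_in_partition)
    return votes + (device_votes if holds_quorum_device else 0)


def may_bring_online(votes_seen, total_votes):
    """A partition may bring resources online only with a strict majority of votes."""
    return votes_seen > total_votes // 2
```

In a two-node cluster with a quorum device there are three votes in total. After a split, the node that reserves the quorum device sees two votes and may continue; the other node sees one vote and must stay offline, so only one side ever writes to the shared disk.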
-
Extended Layer 2 Network
In most implementations, a common L2 network is needed for the heartbeat between the nodes, as well as public client access
Extending a VLAN on a geographical basis is not considered best practice because of the impact of broadcasts, multicast, flooding and Spanning-Tree integration issues
[Figure: node1 in the local datacenter and node2 in the remote datacenter, joined across the WAN by a public Layer 2 network and a private Layer 2 network. Disk replication: synchronous or asynchronous]
-
Resolution: L3 Routed Solution
In certain cases an L3 routed solution is possible
Microsoft MSCS
  Requires that the 2 nodes be on the same subnet
  The communication between the 2 nodes is UDP unicast
  Local Area Mobility (LAM) allows the placement of the nodes on 2 different subnets
Veritas VCS
  Allows having nodes with IP addresses in different subnets
  The virtual address needs to change when moving from node1 to node2
  DNS can be used to provide name-to-multiple-IP mapping
[Figure: node1 (11.20.5.x) and node2 (172.28.210.x) on an extended SAN. Disk replication: synchronous or asynchronous]
-
Storage Disk Zoning
What storage disk array should node2 be zoned to before and after a failure on node1?
To complete the failover you need to change the zoning configuration
Software is needed to synchronize the Cluster Software with the Disk Array's software, i.e. Cluster Enabler
[Figure: node1 and node2 on an extended SAN with disk arrays sym1320 and sym1291. Labels: RW, RD, active, standby]
-
Resolution: Cluster Enabler
The Cluster Enabler (CE) provides the interface between the Clustering Software and the Disk Array's software
When the Clustering Software detects a failure and wants to fail the node, the Cluster Enabler instructs the Disk Array to perform a failover
Cluster Enabler also allows node1 to be zoned to sym1320 and node2 to be zoned to sym1291
The Cluster Enabler running on each node typically communicates with the Cluster Enabler software running on the remote node with Local Multicast messages
[Figure: node1 (active) and node2 (standby) on an extended SAN with disk arrays sym1320 and sym1291. Labels: RW, WD]
-
Terminology
Storage subsystem
  Just a bunch of disks (JBOD)
  Redundant array of independent disks (RAID)
Storage I/O devices
  Host Bus Adapter (HBA)
  Small Computer System Interface (SCSI)
Storage protocols
  SCSI
  iSCSI
  FC (FCIP)
-
Terminology (Cont'd)
Direct Attached Storage (DAS)
  Storage is local behind the server
  No storage sharing possible
  Costly to scale; complex to manage
Network Attached Storage (NAS)
  Storage is accessed at a file level over an IP network
  Storage can be shared between servers
Storage Area Networks (SAN)
  Storage is accessed at a block level
  Separation of storage from the server
  High-performance interconnect providing high I/O throughput
-
Storage for Applications
Presentation Tier
  Unrelated small data files commonly stored on internal disks
  Manual distribution
Application Processing Tier
  Transitional, unrelated data
  Small files residing on file systems
  May use RAID to spread data over multiple disks
Storage Tier
  Large, permanent data files or raw data
  Large batch updates, most likely real time
  Log and data on separate volumes
-
Backup and Replication
Offsite tape vaulting
  Backup tapes stored at offsite location
Electronic vaulting
  Transmission of backup data to offsite location
Remote disk replication
  Continuous copying of data to offsite location
  Transparent to host
Other methods of replication
  Host-based mirroring
  Network-based replication
-
Replication: Modes of Operation
Synchronous
  All data written to cache of local and remote arrays before I/O is complete and acknowledged to host
Asynchronous
  Write acknowledged after write to local array cache; changes (writes) are replicated to remote array asynchronously
Semi-synchronous
  Write acknowledged with a single subsequent WRITE command pending from remote array
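The difference between the synchronous and asynchronous modes comes down to when the host sees the acknowledgement relative to the remote copy. The toy model below makes that ordering explicit; the `ReplicatedVolume` class and its method names are hypothetical, array caches are modeled as plain lists, and the semi-synchronous mode is left out.

```python
class ReplicatedVolume:
    """Toy model of a volume replicated between a local and a remote array."""

    def __init__(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.local = []     # writes held in the local array cache
        self.remote = []    # writes held in the remote array cache
        self.pending = []   # acknowledged writes not yet shipped to the remote array

    def write(self, block):
        """Return at the point where the host would see the write acknowledged."""
        self.local.append(block)
        if self.mode == "sync":
            self.remote.append(block)    # remote cache is updated before the ack
        else:
            self.pending.append(block)   # ack now, replicate later

    def drain(self):
        """Ship all pending asynchronous writes to the remote array."""
        self.remote.extend(self.pending)
        self.pending.clear()

    def lost_if_primary_fails(self):
        """Acknowledged writes that would be missing at the remote site right now."""
        return list(self.pending)
```

In `"sync"` mode `lost_if_primary_fails()` is always empty, at the price of waiting for the remote array on every write; in `"async"` mode it lists whatever has been acknowledged but not yet drained.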
-
Synchronous vs. Asynchronous Trade-Off
Synchronous
  Impact to application performance
  Distance limited (are both sites within the same threat radius?)
  No data loss
Asynchronous
  No application performance impact
  Unlimited distance (second site outside threat radius)
  Exposure to possible data loss
Enterprises must evaluate the trade-offs
  Maximum tolerable distance ascertained by assessing each application
  Cost of data loss
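A rough way to assess the maximum tolerable distance for synchronous replication is to convert distance into added write latency. The sketch below assumes about 5 microseconds of one-way propagation delay per kilometer of fiber and, following the "2 x RTT" label on the geo-cluster slide, two round trips per replicated write. Both values are assumptions that vary with the array and transport, and switching, protocol and queuing delays are ignored.

```python
FIBER_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber


def sync_write_penalty_ms(distance_km, round_trips=2):
    """Latency added to each synchronous write by distance alone, in milliseconds."""
    rtt_us = 2 * distance_km * FIBER_US_PER_KM
    return round_trips * rtt_us / 1000.0


def max_distance_km(latency_budget_ms, round_trips=2):
    """Largest site separation that keeps the added write latency within budget."""
    return latency_budget_ms * 1000.0 / (round_trips * 2 * FIBER_US_PER_KM)
```

Under these assumptions, 100 km of separation adds roughly 2 ms to every synchronous write, and an application that tolerates 1 ms of added write latency limits the sites to roughly 50 km apart.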
-
Data Replication with DB Example
Control files identify other files making up the database and record the content and state of the db
Datafile is only updated periodically
Redo logs record db changes resulting from transactions
  Used to play back changes that may not have been written to the datafile when the failure occurred
  Typically archived as they fill to local and DR site destinations
[Figure: control files (DB name, creation date, backup performed, redo log time period, datafile state) identify the datafiles (tablespaces, indexes, data dictionary) and the redo log files; redo log files record changes to the datafiles (database changes)]
-
Data Replication with DB Example (Cont'd)
Database restored to state at time of failure (time t1) by:
1. Restoring control files and datafiles from last hot backup (time t0)
2. Sequentially replaying changes from subsequent redo logs (archived and online): changes made between time t0 and t1
[Figure: timeline from t0 to t1. Hot backup of datafiles and control files taken at time t0; archived redo logs and online redo logs cover the interval; failure or disaster occurs at time t1: media failure (e.g. disk), human error (datafile deletion), database corruption]
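The two restore steps can be sketched with plain Python data structures. This is a conceptual illustration only and not how a database engine performs recovery: the datafiles are modeled as a dictionary, each redo record as a hypothetical `(time, key, value)` tuple, and the function name is made up for the example.

```python
def restore(backup, redo_logs, failure_time):
    """Rebuild the database state as of failure_time.

    backup:    (t0, dict) pair, the hot backup taken at time t0
    redo_logs: list of (time, key, value) change records, archived and online
    """
    t0, datafiles = backup
    state = dict(datafiles)  # step 1: restore datafiles from the last hot backup
    # step 2: replay, in time order, the changes made after t0 up to the failure
    for t, key, value in sorted(redo_logs, key=lambda record: record[0]):
        if t0 < t <= failure_time:
            state[key] = value
    return state
```

Changes recorded at or before t0 are skipped because the backup already contains them, and records later than the failure time are ignored.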
-
Data Replication with DB Example (Cont'd)
Mixture of sync and async replication technologies commonly usedUsually only redo logs sync replicated to remote site
Archive logs created from redo log and copied when redo log switches
Point in time (PiT) copies of datafiles and control files copied periodically(e.g. nightly)
[Diagram: Primary Site and Secondary Site connected by a SAN extension transport. Redo logs (cyclic) hold a copy of every committed transaction and are synchronously replicated for zero loss. Archive logs and earlier DB backups are replicated/copied. A point-in-time copy of the database is taken when the DB is quiescent (database copy at time t0) and is replicated/copied to the secondary site.]
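To make the effect of this mix concrete, here is a rough Python sketch of how far forward the secondary site could recover, depending on which pieces have reached it. The function name, its parameters, and the sample times are illustrative assumptions, and the model ignores details such as partially filled logs.

```python
def recovery_point(pit_copy_time, last_archive_log_time=None, last_sync_redo_time=None):
    """Latest time the secondary site can roll forward to (simplified model).

    All times are minutes since an arbitrary start. The PiT copy is the
    baseline; archive logs extend it; synchronously replicated redo logs
    extend it further, but only when archive logs bridge the gap, because
    redo logs are cyclic and get overwritten.
    """
    point = pit_copy_time
    if last_archive_log_time is not None:
        point = max(point, last_archive_log_time)
        if last_sync_redo_time is not None:
            point = max(point, last_sync_redo_time)
    return point


def data_loss_minutes(failure_time, **copies):
    """Minutes of committed work lost if the primary site fails at failure_time."""
    return failure_time - recovery_point(**copies)


# Hypothetical day: nightly PiT copy at minute 0, failure at minute 615.
print(data_loss_minutes(615, pit_copy_time=0))
print(data_loss_minutes(615, pit_copy_time=0, last_archive_log_time=600))
print(data_loss_minutes(615, pit_copy_time=0, last_archive_log_time=600,
                        last_sync_redo_time=615))
```

With only the nightly copy the whole day is lost; archive logs narrow the loss to the time since the last log switch; synchronously replicated redo logs close the gap entirely.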
-
Data Center Interconnection Options
[Diagram: two data centers, each with Internet access, intrusion detection, stateful firewalls, server load balancing, content caching, a high-density multilayer LAN switch, front-end application servers, back-end application servers, a high-density multilayer SAN director, and enterprise-class storage arrays]
Interconnection options: SONET/SDH, DWDM/CWDM, IP/Metro E
-
Data Center Transport Options
Distance increases from Data Center to Campus, Metro, Regional, and National

Type     Transport      Replication modes                    Limit
Optical  Dark Fiber     Sync                                 Limited by optics (power budget)
Optical  CWDM           Sync (2 Gbps)                        Limited by optics (power budget)
Optical  DWDM           Sync (2 Gbps lambda)                 Limited by BB_Credits
Optical  SONET/SDH      Sync (1 Gbps+ subrate), Async
IP       MDS9000 FCIP   Sync (Metro Eth), Async (1 Gbps+)
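The "limited by BB_Credits" constraint can be estimated with a short calculation. The Python sketch below uses common rules of thumb that are assumptions, not figures from the slides: roughly 5 microseconds of one-way delay per km of fiber, 2148 bytes for a full-size Fibre Channel frame, and about 100 MB/s of throughput per 1 Gbps of Fibre Channel speed. The function names are hypothetical.

```python
import math

# Assumptions (rules of thumb, not taken from the slides):
FIBER_DELAY_US_PER_KM = 5.0   # one-way propagation delay in fiber
FULL_FRAME_BYTES = 2148       # full-size Fibre Channel frame including overhead


def bb_credits_needed(distance_km, throughput_mb_per_s, frame_bytes=FULL_FRAME_BYTES):
    """Credits needed to keep the link full while waiting for credit returns."""
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    # bytes / (MB/s) gives microseconds when 1 MB = 10**6 bytes
    frame_time_us = frame_bytes / throughput_mb_per_s
    return math.ceil(round_trip_us / frame_time_us)


def sync_write_penalty_ms(distance_km, round_trips=1):
    """Latency added to each synchronous write by distance alone."""
    return round_trips * 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000.0


for km in (10, 50, 100):
    print(km, "km:", bb_credits_needed(km, 200), "credits at ~2 Gbps FC,",
          sync_write_penalty_ms(km), "ms added per synchronous write")
```

Under these assumptions a 2 Gbps link needs roughly one credit per kilometer, which is why long-distance synchronous replication depends on large credit pools, and why propagation delay rather than bandwidth eventually limits synchronous distance.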
-
Data Center Replication with SAN Extension
Extend the normal reach of a Fibre Channel fabric
Replication
Remote host to target array
Shared data clusters
[Diagram: two FC fabrics joined by a SAN extension network, used for replication and for a shared data cluster or remote host access to storage]
-
SAN Design for Data Replication
Servers with two Fibre Channel connections to storage arrays for high availability
Use of multipath software is required in dual fabric host design
SAN extension fabrics typically separate from host access fabrics
Replication fabric requirements generally specified by array vendor
[Diagram: Site A and Site B, each with FC server access fabrics and separate replication fabrics, joined by the DC interconnect network]
-
Data Center Disaster Recovery
Sample Design
-
Disaster Impact Radius
Disasters are characterized by their impact
Local, metro, regional, global
Fire, flood, earthquake, attack
Is the backup site within the threat radius?
[Diagram: impact radii labeled Local (1-2 km), Metro (< 50 km), Regional (< 400 km), and Global, showing the Primary Data Center, Secondary Data Center, and DR Site]
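The threat-radius question can be checked numerically. The Python sketch below classifies the separation between two sites using the radii on the slide (about 2 km local, 50 km metro, 400 km regional) and computes great-circle distance with the standard haversine formula. The function names are hypothetical, and a real assessment would also weigh shared power grids, flood plains, and similar factors.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impact_scope(separation_km):
    """Smallest disaster scope from the slide that would cover both sites."""
    if separation_km <= 2:
        return "local"
    if separation_km < 50:
        return "metro"
    if separation_km < 400:
        return "regional"
    return "global"


def outside_threat_radius(separation_km, threat_radius_km):
    """True if the backup site lies beyond the given threat radius."""
    return separation_km > threat_radius_km


for km in (1.5, 30, 250, 900):
    print(km, "km ->", impact_scope(km), "| survives a 50 km event:",
          outside_threat_radius(km, 50))
```

A site pair classified as "metro" would both be hit by a metro-scale event, so it suits high availability but not disaster recovery.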
-
Active/Standby Architecture - Today
[Diagram: three sites, each with hosts (Hosts 1, 2, 3), MDS 9509 switches, an MDS 9509 gateway, and storage (Storage 1, 2, 3): CA High Availability Site 1, CA High Availability Site 2, and NC Disaster Recovery Site. Other labels: HA cluster(s), bunker, synchronous CWDM replication, synchronous FCIP replication, asynchronous FCIP replication, dual OC12, electronic journaling.]
-
Frame Based Replication
[Diagram: Data Center 1 (production cluster, MDS, EMC/DMX arrays holding PROD, Redo, Arch, and PiT copies) connected over dual OC12 to Data Center 2 (D/R, MDS, EMC/DMX holding D/R, Redo, Arch, and BCV copies). Other labels: R2 BCV/R1 SRDF, SRDF/A, Timefinder, Triple Threat.]
-
Active/Active Architecture - Tomorrow
[Diagram: user request flow across two data centers, in three layers: Service Locator Group, Presentation Layer, Data Centers]
GSS performs site (DC) selection according to a pre-configured condition, using the FQDN
SSLM decrypts the request
CSM routes the request
CSM probes track application health
ACNS Content Engine caches pages
Requests are directed to the primary application or to the backup application
DC1: clustered backend, X active, Y standby
DC2: clustered backend, Y active, X standby
Data labels: Active Data X, Active Data Y, Standby Data X, Standby Data Y, Mirror, Asynchronous Replication
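The site-selection behavior described here can be sketched in a few lines of Python. This is only an illustration of the idea of a pre-configured primary/backup preference combined with health probes. It does not represent how GSS or CSM are implemented or configured, and the FQDNs, the policy table, and `select_site` are hypothetical.

```python
# Pre-configured condition (hypothetical): each FQDN maps to (primary DC, backup DC).
# Application X is active in DC1 and standby in DC2; application Y is the reverse.
POLICY = {
    "x.example.com": ("DC1", "DC2"),
    "y.example.com": ("DC2", "DC1"),
}


def select_site(fqdn, policy, healthy):
    """Pick the data center that should answer requests for fqdn.

    healthy maps a (fqdn, data center) pair to the latest probe result.
    Returns None when neither the primary nor the backup is healthy.
    """
    primary, backup = policy[fqdn]
    if healthy.get((fqdn, primary), False):
        return primary          # requests directed to the primary application
    if healthy.get((fqdn, backup), False):
        return backup           # requests directed to the backup application
    return None


probes = {
    ("x.example.com", "DC1"): False,   # application X failed its probe in DC1
    ("x.example.com", "DC2"): True,
    ("y.example.com", "DC2"): True,
    ("y.example.com", "DC1"): True,
}
print(select_site("x.example.com", POLICY, probes))
print(select_site("y.example.com", POLICY, probes))
```

Keying the health state by application and site, rather than by site alone, lets one application fail over while the other stays on its primary data center.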
-
SANTap and Continuous Data Protection
SANTap: appliance-based storage replication; reliable copy of WRITE operations; SCSI-FCIP communication
Continuous Data Protection: automatic and continuous backups; Time Addressable Storage (TAS); any point-in-time recovery; application based or network based
[Diagram: production servers attached to an MDS SAN with primary and secondary storage; SANTap connects the SAN to the CDP appliance]
-
Fabric Based Replication with CDP
[Diagram: Data Center 1 (production cluster, MDS with SANTap, EMC/DMX holding PROD, Redo, and Arch, replication/CDP appliance with TAS/SATA storage holding APiT copies) connected over dual OC12 to Data Center 2 (D/R, MDS, EMC/DMX holding D/R, Redo, Arch, and BCV, replication/CDP appliance with TAS/SATA storage holding APiT copies). Other label: SRDF/A.]
-
End-End Data Center Resilience
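[Diagram: Primary Location with DC-1 and DC-2 linked by FC over CWDM/DWDM, and Secondary Location with DC-3, connected by an IP/optical network. Each data center has a Web/App server farm, DB, and a CSS (CSS-1, CSS-2, CSS-3). GSS-1 and GSS-2 work with the corporate DNS.]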
-
Design Details
Data Centers 1 and 2 are in the primary location, close enough to each other to provide DC HA for active/active access
Data Center 3 (DR) is farther than the tolerable disaster radius from primary DCs 1 and 2
Web/App server farms are load balanced geographically
DB servers are within a geo-HA cluster and running in an L3 design
Synchronous data replication between the data centers within the primary location
Asynchronous data replication between the primary and secondary storage systems
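These design rules can be expressed as a small consistency check. The Python sketch below is illustrative only: the site distances, the 100 km synchronous limit, the 400 km disaster radius, and the names `SitePair`, `replication_mode`, and `dr_site_problems` are all assumptions, not values from the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SitePair:
    """Two data centers and the distance between them (hypothetical values)."""
    a: str
    b: str
    distance_km: float


def replication_mode(pair, max_sync_km):
    """Synchronous replication within the sync limit, asynchronous beyond it."""
    return "sync" if pair.distance_km <= max_sync_km else "async"


def dr_site_problems(pairs, dr_site, disaster_radius_km):
    """List site pairs where the DR site is not beyond the disaster radius."""
    return [
        f"{p.a}-{p.b}: only {p.distance_km} km apart"
        for p in pairs
        if dr_site in (p.a, p.b) and p.distance_km <= disaster_radius_km
    ]


pairs = [
    SitePair("DC1", "DC2", 20),    # both in the primary location
    SitePair("DC1", "DC3", 600),   # DC3 is the DR site
    SitePair("DC2", "DC3", 610),
]
for p in pairs:
    print(p.a, p.b, replication_mode(p, max_sync_km=100))
print(dr_site_problems(pairs, dr_site="DC3", disaster_radius_km=400))
```

With these sample numbers the DC1-DC2 pair replicates synchronously, both links to DC3 replicate asynchronously, and no placement problems are reported.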
-
Business Continuity Planning
-
BCP Concept: Two Tiers
BCP Management Tier: issues BCP policy; champions the process; executes the plan
BCP Process Tier: develops and maintains the plan; consists of stages
[Diagram: BCP Management and BCP Process tiers, with the BCP Lifecycle]
-
Management and Process Activities
BCP Management:
Create Business Continuity Policy
Establish BCP Steering Committee
Establish BC Plan Development Project
Establish BCP Training and Awareness Program
Coordinate BCP with Pertinent Laws, Regulations, and Industry Standards
Coordinate with Other Internal / External BCP Related Agencies
BCP Process:
Plan Development Project: Risk Management, Business Impact Analysis, BC Strategy Development, BC Plan Development, BC Plan Testing
Maintain Disaster Readiness Project: BC Plan Maintenance and Regular Testing
Execute BC Plan
-
Example: Cisco's Corporate Program
BCP Concept is a model
Commonly adapted (tailored) to a specific organization's needs
[Diagram: BCM program cycle with the labels BCM Program Initiation, Assessment, Business Continuity Plan Development, BCM Training, Testing, and Embedding BCM]
-
BCP Deliverables
Plan Development Project stages: Risk Management, Business Impact Analysis, BC Strategy Development, BC Plan Development, BC Plan Testing
Risk and Controls: Threats, Exposures, Risk Levels, and Risk Controls
Business Impacts: Critical Processes, Operational and Financial Impacts, and Recovery Requirements
Continuity Strategy: Alternative Critical Resources and Services, and Recovery Methods
-
Business Impact Analysis
-
References
www.drj.com
www.drii.org
www.contingencyplanning.org
www.thebci.org
-
Complete Your Session Evaluation
Please give us your feedback!!
Complete the evaluation form you were given when you entered the room
This is session BRKDCT-2987
Don't forget to complete the overall event evaluation form included in your registration kit
YOUR FEEDBACK IS VERY IMPORTANT FOR US!!! THANKS