Cloudifying High Availability
Application-level Disaster Recovery on OpenStack
Ali Hodroj
Director, Solution Architecture
AGENDA
• Context and Concepts
• Regions, Zones, and Single Points of Failure
• Challenges and Trade-Offs
• Architecting HA/DR Solutions with Cloudify
• Case Studies
• Resources and Q&A
Copyright 2014 Gigaspaces. All Rights Reserved
Context and Concepts
FOCUS OF THIS SESSION
HA/DR Layers
• Application DR, fault isolation strategies, deployment patterns
• Pacemaker, Corosync messaging, HAProxy, Galera MySQL replication
• Power, air conditioning, fire protection, etc.
CONCEPTS
High Availability
• Fault Tolerance: ability to withstand failure and operate with normal or degraded performance, through redundancy and replication
• High Availability: “the nines”, e.g. 99.99% ≈ 53 minutes of downtime per year; expressed as minutes/hours of downtime per year
• Single Point of Failure: part of a system that, if it fails, brings down the entire system
Disaster Recovery
• RTO: how much downtime are you willing to tolerate?
• RPO: how much data are you willing to lose?
• Cost: development effort, redundant environments
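Each availability target translates directly into a yearly downtime budget. The following is a minimal sketch of that arithmetic, assuming a 365-day year; the function name is illustrative and not part of any Cloudify API.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget (in minutes) for an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


# Print the downtime budget for a few common "nines".
for target in (99.0, 99.9, 99.95, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
```

For 99.99% the budget works out to roughly 52.6 minutes per year; every additional nine shrinks it by a factor of ten.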
CONCEPTS…IN THE REAL WORLD
High Availability
• Availability includes both planned and unplanned outages
• “Everything fails, all the time”
• Cloud vendor SLAs typically apply only to multi-zone deployments and multi-zone outages
• SLAs of chained components multiply:
  99.95% + 99.90% + 99.90% + 99.99% = 99.74%
  21.6 min + 43.2 min + 43.2 min + 4.3 min = 112.3 minutes of downtime per 30-day month
Disaster Recovery
• Accomplishing high levels of redundancy in the cloud is expensive
• Determining an appropriate RPO and RTO is ultimately a financial calculation
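The combined SLA of components that all have to be up at the same time is the product of their individual availabilities. Here is a small sketch of that calculation using the four SLA figures from this slide; the 30-day month and the function names are assumptions made for illustration.

```python
from math import prod

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming a 30-day month


def composite_availability(slas_percent):
    """Availability (in percent) of components in series: product of the parts."""
    return prod(p / 100.0 for p in slas_percent) * 100.0


def monthly_downtime_minutes(availability_percent):
    """Downtime budget in minutes per 30-day month for an availability target."""
    return MINUTES_PER_MONTH * (1.0 - availability_percent / 100.0)


slas = [99.95, 99.90, 99.90, 99.99]
combined = composite_availability(slas)
print(f"combined availability: {combined:.2f}%")
print(f"downtime from combined SLA: {monthly_downtime_minutes(combined):.1f} min/month")
print(f"sum of per-component budgets: {sum(monthly_downtime_minutes(s) for s in slas):.1f} min/month")
```

The sum of the per-component budgets (about 112.3 minutes) is a slight overestimate of the product-based figure (about 112.2 minutes), because it counts overlapping outages twice.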
Regions, Zones, and Single Points of Failure
CLOUD HIGH AVAILABILITY: MATURITY MODEL
Single Points of Failure
• Single server instance, same data center
• Same geographical region
• Same operational procedures, provider
MULTI-ZONE ARCHITECTURE
• Physically separated data centers within a region
• Each availability zone has independent power feeds from separate substations, plus redundant power on each rack and diverse cabling
• Shared images, security groups, and floating IPs
MULTI-REGION ARCHITECTURE
• Characteristics: geographically dispersed architecture
• Disaster Recovery Patterns: replicate stateful tiers, orchestrate stateless tiers upon failure
• Challenges: data replication costs and performance, network flow, orchestrating recovery
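The pattern "replicate stateful tiers, orchestrate stateless upon failure" amounts to splitting an application's tiers into two groups. The sketch below illustrates that split; the `Tier` type, the plan keys, and the example tier names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tier:
    name: str
    stateful: bool  # True if the tier holds data that must survive a failover


def dr_plan(tiers: List[Tier]) -> Dict[str, List[str]]:
    """Split tiers into those replicated ahead of time and those rebuilt on failover."""
    return {
        "replicate_continuously": [t.name for t in tiers if t.stateful],
        "provision_on_failover": [t.name for t in tiers if not t.stateful],
    }


# Illustrative application: one stateless web/app tier and two data tiers.
app = [Tier("jboss", stateful=False), Tier("postgresql", stateful=True), Tier("cassandra", stateful=True)]
print(dr_plan(app))
```

Only the stateful group incurs ongoing replication cost; the stateless group costs nothing in the secondary region until a failover actually happens.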
MULTI-CLOUD ARCHITECTURE
• Characteristics: leverages cloud economics; workload migration (“Own the base, rent the spike”); fewest single points of failure
• Disaster Recovery Patterns: replicate stateful tiers, orchestrate stateless tiers upon failure
• Challenges: bootstrapping data for stateful services (snapshot or async replication?); data replication over WAN; complex setup
Challenges and Trade-Offs
DEPLOYMENT (ACCIDENTAL) COMPLEXITY
Consistent deployment
Cross zone configuration
Machine images, security groups, keys
Different API, zone/region hierarchies
Accidental Complexity: The higher we move in the HA scale, the less manageable the deployments become
COST OF REDUNDANCY
Replication in itself is useless; it’s the recovery orchestration that counts
Cost
• Compute and storage cost
• Bandwidth cost
RTO/RPO Impacting
• VM startup time / instance acquisition
• Latency/bandwidth across regions
• General performance (IOPS, SSD)
http://www.slideshare.net/mingtemp/a-performance-study-on-the-vm-startup-time-in-the-cloud
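To see how VM startup time and cross-region bandwidth feed into RTO, a rough estimate can simply add up the sequential recovery steps. This is a back-of-the-envelope sketch; the step breakdown, field names, and example numbers are all assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    detection_s: float     # time to notice the outage
    vm_startup_s: float    # instance acquisition plus boot in the recovery cloud
    install_s: float       # install and configure the application tiers
    restore_gb: float      # data that must be copied into the recovery site
    bandwidth_mbps: float  # usable cross-region bandwidth in megabits per second

    def transfer_s(self) -> float:
        # 1 GB = 8,000 megabits (decimal units); divide by the link rate.
        return self.restore_gb * 8000.0 / self.bandwidth_mbps

    def rto_s(self) -> float:
        # Assumes the steps run strictly one after another.
        return self.detection_s + self.vm_startup_s + self.install_s + self.transfer_s()


# Hypothetical numbers: 50 GB to restore over a 1 Gbps link.
estimate = RecoveryEstimate(detection_s=60, vm_startup_s=120, install_s=300,
                            restore_gb=50, bandwidth_mbps=1000)
print(f"data transfer: {estimate.transfer_s():.0f} s, estimated RTO: {estimate.rto_s():.0f} s")
```

With these example inputs the data transfer alone accounts for 400 of the 880 seconds, which is why replicating stateful tiers ahead of time lowers RTO.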
Architecting HA/DR Solutions with Cloudify
APP CENTRIC DEVOPS
Cloudify provides the equivalent of Amazon OpsWorks on OpenStack
http://appcatalog.cloudifysource.org/
• Nova, Cinder, Neutron
• Heat
• OpenShift, CloudFoundry
ORCHESTRATORS, RECIPES, AND “CLOUDS”
Cloud: a set of shared compute, storage, and network resources behind an OpenStack API, e.g. resources in:
• Availability zone
• Region
• Public cloud
Cloud Driver
• Existing Data Center
• OpenStack Private Cloud
• OpenStack Public Cloud
• OpenStack Micro Cloud
Examples
• HP Cloud US-West / AZ1
• RackSpace Chicago (ORD) region
• DevStack, Vagrant
• Recipe development & DR testing
• Bare metal or virtual environment
KEY PRINCIPLES
• Automation first (operational processes)
• Decouple the application from the infrastructure (design for failure)
• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
• Aggressive monitoring across the app stack
KEY PRINCIPLES
• Automation first (operational processes)
Provision
Install
Configure
Deploy
Monitor
Scale
https://github.com/CloudifySource/cloudify-recipes/
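The six lifecycle stages above can be thought of as an ordered pipeline in which each service supplies only the handlers it needs. The sketch below shows that idea in plain Python; it is not Cloudify's recipe format, and `run_lifecycle` and the handler names are hypothetical.

```python
from typing import Callable, Dict, List

# Stage order taken from the slide.
LIFECYCLE = ["provision", "install", "configure", "deploy", "monitor", "scale"]


def run_lifecycle(handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the supplied handlers in lifecycle order; an exception stops the run."""
    completed = []
    for stage in LIFECYCLE:
        handler = handlers.get(stage)
        if handler is None:
            continue  # this service does not need the stage
        handler()
        completed.append(stage)
    return completed


# Illustrative handlers that only record the order in which they were called.
log: List[str] = []
handlers = {
    "deploy": lambda: log.append("deploy app"),
    "provision": lambda: log.append("boot VM"),
    "install": lambda: log.append("install packages"),
}
print(run_lifecycle(handlers), log)
```

Because the order lives in one place, the same handlers can be replayed unchanged against a different cloud during recovery.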
KEY PRINCIPLES
• Decouple the application from the infrastructure (design for failure)
Cloud Templates
• Compute
• Storage
• Network
KEY PRINCIPLES
• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
KEY PRINCIPLES
• Aggressive monitoring across the app stack
Scaling rules
Automatic failover
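Automatic failover driven by monitoring usually needs some damping so that a single missed probe does not trigger a recovery. One common approach, sketched here, is to fail over only after several consecutive failed health checks; the class name and the threshold of three are assumptions for illustration.

```python
class FailoverMonitor:
    """Trigger failover once after `threshold` consecutive failed health probes."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0
        self.failed_over = False

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True only when this probe triggers failover."""
        if self.failed_over:
            return False  # already failed over, nothing more to trigger
        if healthy:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failed_over = True
            return True
        return False


monitor = FailoverMonitor(threshold=3)
probes = [False, False, True, False, False, False]
decisions = [monitor.record(p) for p in probes]
print(decisions)
```

In this run the healthy probe in the middle resets the counter, so failover fires only on the last probe.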
Case Studies (putting it all together)
DR ELASTICITY CONTINUUM
• Cold/Warm Disaster Recovery: higher RTO, lower cost
• Hot Disaster Recovery: lower RTO, higher cost
From cold to hot: Operationally Critical, Business Critical, Mission Critical
OPERATIONALLY CRITICAL: COLD DR
Characteristics Design / Recipe Implementation
Financial Services customer, post-trade processing application
• Cold Disaster Recovery (clone your recipe on another cloud in case of disaster)
• Recipes used for Disaster Recovery planning trade-off analysis
BUSINESS CRITICAL: CROSS-REGION DR
Characteristics Design / Recipe Implementation
Transportation/Logistics Big Data / Realtime Analytics
• Autoscaling JBoss
• 4 services recipes deployed across both regions
• Recipes orchestrate setup, snapshot, and provisioning of PostgreSQL, Cassandra replication
• Federated data between cloud controllers (failover, polling, SQL master/slave promotion)
MISSION CRITICAL: IN-MEMORY WAN REPLICATION
Characteristics Design / Recipe Implementation
Transportation/Logistics
• Replication as a Service https://github.com/dfilppi/repl-service
• Low-latency asynchronous replication across regions using in-memory replication technology (GigaSpaces XAP)
• Topologies: Master-Slave, Master-Master, Hub/Spoke, Ring
• Reference data, HTTP session sharing
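The four topologies differ only in which sites replicate to which. The sketch below generates the directed replication links for each one under a common reading of these terms (the first site acts as master or hub); it is not the repl-service or XAP API, and the site names are hypothetical.

```python
from typing import List, Tuple

Link = Tuple[str, str]  # (source site, target site)


def replication_links(topology: str, sites: List[str]) -> List[Link]:
    """Return the directed replication links for a topology; sites[0] is master/hub."""
    if len(sites) < 2:
        raise ValueError("replication needs at least two sites")
    first, rest = sites[0], sites[1:]
    if topology == "master-slave":
        # One-way: the master pushes changes to every other site.
        return [(first, s) for s in rest]
    if topology == "master-master":
        # Every site replicates to every other site.
        return [(a, b) for a in sites for b in sites if a != b]
    if topology == "hub-spoke":
        # Spokes exchange data only through the hub, in both directions.
        links: List[Link] = []
        for s in rest:
            links += [(first, s), (s, first)]
        return links
    if topology == "ring":
        # Each site replicates to its successor; the last wraps around to the first.
        return [(sites[i], sites[(i + 1) % len(sites)]) for i in range(len(sites))]
    raise ValueError(f"unknown topology: {topology}")


sites = ["us-west", "us-east", "eu"]
for name in ("master-slave", "master-master", "hub-spoke", "ring"):
    print(name, replication_links(name, sites))
```

The link count is a rough proxy for WAN bandwidth cost: master-master grows quadratically with the number of sites, while the other three grow linearly.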
Resources
TRY IT OUT TODAY
Join the community: http://www.cloudifysource.org
Try out and contribute some recipes: https://github.com/CloudifySource/cloudify-recipes
Questions?