Cloudifying High Availability: Application-level Disaster Recovery on OpenStack

Ali Hodroj, Director, Solution Architecture

AGENDA

• Context and Concepts
• Regions, Zones, and Single Points of Failure
• Challenges and Trade-Offs
• Architecting HA/DR Solutions with Cloudify
• Case Studies
• Resources and Q&A

Copyright 2014 Gigaspaces. All Rights Reserved

Context and Concepts

FOCUS OF THIS SESSION


HA/DR Layers

Application DR, fault isolation strategies, deployment patterns

• Pacemaker
• Corosync messaging
• HAProxy
• Galera MySQL replication

Power, air conditioning, fire protection, etc.

CONCEPTS

High Availability

• Fault Tolerance: the ability to withstand failure and operate with normal or degraded performance, achieved through redundancy and replication
• High Availability: “the nines”, expressed as minutes or hours of downtime per year (99.99% is about 53 minutes per year)
• Single Point of Failure: a part of a system that, if it fails, brings down the entire system

Disaster Recovery

• RTO (Recovery Time Objective): how much downtime are you willing to tolerate?
• RPO (Recovery Point Objective): how much data are you willing to lose?
• Cost: development effort, redundant environments


CONCEPTS…IN THE REAL WORLD

High Availability and Disaster Recovery

• Availability includes both planned and unplanned outages
• “Everything fails, all the time”
• Cloud vendor SLAs require multi-zone deployments, and a multi-zone outage, to take effect


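Availability of components in series (downtime per 30-day month):

Component 1: 99.95% (21 minutes)
Component 2: 99.90% (43 minutes)
Component 3: 99.90% (43 minutes)
Component 4: 99.99% (4 minutes)
Combined: 99.74% (112.3 minutes)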
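The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, assuming a 30-day month and components that must all be up at the same time; the function names are hypothetical, not part of any tool mentioned in this deck.

```python
from math import prod

MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200; assumes a 30-day month
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600


def downtime_minutes(availability_pct, period_minutes=MINUTES_PER_MONTH):
    """Expected downtime for one availability figure over a period."""
    return (1 - availability_pct / 100) * period_minutes


def serial_availability(availabilities_pct):
    """Availability of a chain in which every component must be up."""
    return 100 * prod(a / 100 for a in availabilities_pct)


components = [99.95, 99.90, 99.90, 99.99]
combined = serial_availability(components)

print(f"combined availability: {combined:.2f}%")
print(f"downtime per month:    {downtime_minutes(combined):.1f} min")
print(f"99.99% per year:       {downtime_minutes(99.99, MINUTES_PER_YEAR):.1f} min")
```

Adding up the per-component downtimes (21.6 + 43.2 + 43.2 + 4.32 minutes) gives 112.3 minutes, a slight overestimate of the downtime implied by the multiplied availabilities, because it counts overlapping outages twice.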

Accomplishing high levels of redundancy in the cloud is expensive

Determining an appropriate RPO and RTO is ultimately a financial calculation

Regions, Zones, and Single Points of Failure

CLOUD HIGH AVAILABILITY: MATURITY MODEL

Single Points of Failure

• Single server instance, same data center
• Same geographical region
• Same operational procedures, provider


MULTI-ZONE ARCHITECTURE


Physically separated data centers within a region

Each availability zone:
• Independent power feeds from separate substations
• Redundant power on each rack and diverse cabling

Shared images, security groups, and floating IPs

MULTI-REGION ARCHITECTURE


Characteristics
• Geographically dispersed architecture

Disaster Recovery Patterns
• Replicate stateful tiers, orchestrate stateless tiers upon failure

Challenges
• Data replication costs and performance
• Network flow
• Orchestrating recovery
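The pattern "replicate stateful tiers, orchestrate stateless upon failure" can be sketched as follows. This is a toy model under stated assumptions: the `Region` class, its fields, and `fail_over` are hypothetical names, and appending to a list stands in for actually provisioning a tier.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    healthy: bool = True
    has_data_replica: bool = False          # stateful tier is replicated here ahead of time
    stateless_running: list = field(default_factory=list)


def fail_over(primary, standby, stateless_tiers):
    """Return the region that should serve traffic."""
    if primary.healthy:
        return primary
    if not standby.has_data_replica:
        # Without replicated state there is nothing to recover onto.
        raise RuntimeError(f"{standby.name} has no data replica")
    for tier in stateless_tiers:
        if tier not in standby.stateless_running:
            # Stands in for provisioning the tier on demand (assumption).
            standby.stateless_running.append(tier)
    return standby


east = Region("east", healthy=False)
west = Region("west", has_data_replica=True)
active = fail_over(east, west, ["web", "app"])
print(active.name, active.stateless_running)
```

Only the stateful tier costs money while the primary is healthy; the stateless tiers are created when the failure is detected, which trades a longer recovery time for a cheaper standby.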

MULTI-CLOUD ARCHITECTURE


Characteristics
• Leverages cloud economics
• Workload migration (“Own the base, rent the spike”)
• Fewest single points of failure

Disaster Recovery Patterns
• Replicate stateful tiers, orchestrate stateless tiers upon failure

Challenges
• Bootstrapping data for stateful services (snapshot or async replication?)
• Data replication challenges over WAN
• Complex setup

Challenges and Trade-Offs

DEPLOYMENT (ACCIDENTAL) COMPLEXITY

Consistent deployment

Cross zone configuration

Machine images, security groups, keys

Different API, zone/region hierarchies

Accidental Complexity: The higher we move in the HA scale, the less manageable the deployments become


Replication in itself is useless; it’s the recovery orchestration that counts

COST OF REDUNDANCY

Cost
• Compute and storage cost
• Bandwidth cost

RTO/RPO Impacting
• VM startup time / instance acquisition
• Latency/bandwidth across regions
• General performance (IOPS, SSD)


http://www.slideshare.net/mingtemp/a-performance-study-on-the-vm-startup-time-in-the-cloud
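One way to reason about how these factors affect RTO is to add up the recovery phases and compare the total with a target. The sketch below is a rough illustration only: the phase names and all durations are made-up assumptions, not measurements from this deck or from the linked study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryPhases:
    """Durations in seconds; all values used below are illustrative assumptions."""
    detect_failure: float
    acquire_instances: float      # VM startup / instance acquisition
    install_and_configure: float
    catch_up_data: float          # bounded by cross-region latency and bandwidth

    def estimated_rto(self):
        return (self.detect_failure + self.acquire_instances
                + self.install_and_configure + self.catch_up_data)


def meets_target(phases, target_rto_s):
    """True when the summed phases fit inside the RTO target."""
    return phases.estimated_rto() <= target_rto_s


cold = RecoveryPhases(60, 300, 900, 1800)   # nothing running in the DR site
hot = RecoveryPhases(60, 0, 0, 30)          # everything already running

for name, plan in (("cold", cold), ("hot", hot)):
    print(name, plan.estimated_rto(), meets_target(plan, target_rto_s=300))
```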




Architecting HA/DR Solutions with Cloudify

Cloudify provides the equivalent of Amazon OpsWorks on OpenStack

APP CENTRIC DEVOPS
http://appcatalog.cloudifysource.org/

• Nova, Cinder, Neutron
• Heat
• OpenShift, CloudFoundry

ORCHESTRATORS, RECIPES, AND “CLOUDS”

Cloud Driver

• Existing Data Center
• OpenStack Private Cloud
• OpenStack Public Cloud
• OpenStack Micro Cloud

Cloud: a set of shared compute, storage, and network resources behind an OpenStack API, e.g. resources in:
• Availability zone
• Region
• Public cloud

Examples:
• HP Cloud US-West / AZ1
• RackSpace Chicago (ORD) region
• DevStack, Vagrant
• Recipe development & DR testing
• Bare metal or virtual environment

KEY PRINCIPLES


• Automation first (operational processes)
• Decouple the application from the infrastructure (design for failure)
• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
• Aggressive monitoring across the app stack

KEY PRINCIPLES


• Automation first (operational processes)

Provision

Install

Configure

Deploy

Monitor

Scale

https://github.com/CloudifySource/cloudify-recipes/
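The lifecycle steps above can be pictured as an ordered pipeline of handlers. The sketch below is plain Python for illustration, assuming hypothetical handler names; it is not Cloudify recipe syntax, which is documented in the recipes repository linked above.

```python
LIFECYCLE = ["provision", "install", "configure", "deploy", "monitor", "scale"]


def run_lifecycle(handlers):
    """Run the handlers in lifecycle order and return the steps that completed.

    A handler that raises stops the run, so later steps never execute
    against a half-built environment.
    """
    completed = []
    for step in LIFECYCLE:
        handler = handlers.get(step)
        if handler is None:
            continue                # step not needed for this service
        handler()
        completed.append(step)
    return completed


log = []
handlers = {
    "install": lambda: log.append("packages installed"),
    "provision": lambda: log.append("machine started"),
    "deploy": lambda: log.append("application deployed"),
}
print(run_lifecycle(handlers))
print(log)
```

Because the order lives in one place, the same handlers can be replayed unchanged on a second cloud, which is what makes the recovery repeatable.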



KEY PRINCIPLES


• Decouple the application from the infrastructure (design for failure)

Cloud Templates
• Compute
• Storage
• Network

KEY PRINCIPLES


• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
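A plug-in approach usually comes down to the application code depending on a small interface while each cloud supplies its own implementation. The following is a minimal sketch of that idea; `CloudDriver`, `InMemoryDriver`, and `deploy` are hypothetical names chosen for illustration and do not reproduce Cloudify's actual cloud driver API.

```python
from abc import ABC, abstractmethod


class CloudDriver(ABC):
    """The only thing the deployment logic knows about a cloud."""

    @abstractmethod
    def start_machine(self, template):
        """Start a machine from a named template and return its id."""


class InMemoryDriver(CloudDriver):
    """Stand-in driver for tests; a real one would call a cloud API."""

    def __init__(self, cloud_name):
        self.cloud_name = cloud_name
        self.machines = []

    def start_machine(self, template):
        machine_id = f"{self.cloud_name}-{template}-{len(self.machines) + 1}"
        self.machines.append(machine_id)
        return machine_id


def deploy(tiers, driver):
    """Deploy every tier through whichever driver is plugged in."""
    return [driver.start_machine(tier) for tier in tiers]


print(deploy(["web", "db"], InMemoryDriver("private")))
print(deploy(["web", "db"], InMemoryDriver("public")))
```

The same `deploy` call runs against a cheap test cloud or a production cloud, which is how cost, complexity, and testing can be balanced per job.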

KEY PRINCIPLES


• Aggressive monitoring across the app stack

• Scaling rules
• Automatic failover
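A scaling rule can be reduced to a metric, two thresholds, and instance bounds. The sketch below is a simplified assumption about how such a rule might be evaluated; the field names are hypothetical and are not taken from Cloudify's recipe DSL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    metric: str
    scale_out_above: float
    scale_in_below: float
    min_instances: int
    max_instances: int


def desired_instances(rule, current, metric_value):
    """Return the instance count the rule asks for, one step at a time."""
    if metric_value > rule.scale_out_above:
        target = current + 1
    elif metric_value < rule.scale_in_below:
        target = current - 1
    else:
        target = current
    # Clamping to the minimum also replaces instances lost to failure.
    return max(rule.min_instances, min(rule.max_instances, target))


rule = ScalingRule("cpu_percent", scale_out_above=80, scale_in_below=20,
                   min_instances=2, max_instances=4)

print(desired_instances(rule, current=2, metric_value=95))   # load spike
print(desired_instances(rule, current=1, metric_value=50))   # one instance died
```

The lower bound is what ties scaling to failover: if monitoring reports fewer live instances than the minimum, the same rule that handles load also restores capacity.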

Case Studies (putting it all together)

DR ELASTICITY CONTINUUM

• Cold/Warm Disaster Recovery: higher RTO, lower cost
• Hot Disaster Recovery: lower RTO, higher cost

Application criticality along the continuum:
• Operationally Critical
• Business Critical
• Mission Critical

OPERATIONALLY CRITICAL: COLD DR


Characteristics
• Financial Services customer, post-trade processing application

Design / Recipe Implementation
• Cold Disaster Recovery (clone your recipe on another cloud in case of disaster)
• Recipes used for Disaster Recovery planning trade-off analysis

BUSINESS CRITICAL: CROSS-REGION DR



Transportation/Logistics Big Data / Realtime Analytics

• Autoscaling JBoss

• 4 services recipes deployed across both regions

• Recipes orchestrate setup, snapshot, and provisioning of PostgreSQL, Cassandra replication

• Federated data between cloud controllers (failover, polling, SQL master/slave promotion)

MISSION CRITICAL: IN-MEMORY WAN REPLICATION



Transportation/Logistics

• Replication as a Service https://github.com/dfilppi/repl-service

• Low-latency asynchronous replication across regions using in-memory replication technology (GigaSpaces XAP)

• Topologies: Master-Slave, Master-Master, Hub/Spoke, Ring

• Reference data, HTTP session sharing
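The four topologies differ only in which sites replicate to which. The sketch below enumerates the directed links for each, as a generic illustration; it is not code from the repl-service project or from GigaSpaces XAP, and it assumes that the first site in the list is the master or hub and that hub/spoke links are bidirectional.

```python
def replication_links(topology, sites):
    """Return the set of directed (source, target) replication links."""
    if len(sites) < 2:
        raise ValueError("replication needs at least two sites")
    first, rest = sites[0], sites[1:]
    if topology == "master-slave":
        # One writer, one-way replication to every other site.
        return {(first, s) for s in rest}
    if topology == "master-master":
        # Every site replicates to every other site.
        return {(a, b) for a in sites for b in sites if a != b}
    if topology == "hub-spoke":
        # Spokes exchange data only through the hub (assumed bidirectional).
        return {(first, s) for s in rest} | {(s, first) for s in rest}
    if topology == "ring":
        # Each site replicates to the next one, wrapping around.
        return {(sites[i], sites[(i + 1) % len(sites)]) for i in range(len(sites))}
    raise ValueError(f"unknown topology: {topology}")


sites = ["us-west", "us-east", "eu"]
for topology in ("master-slave", "master-master", "hub-spoke", "ring"):
    print(topology, sorted(replication_links(topology, sites)))
```

The link count is a quick proxy for WAN cost: master-master grows with the square of the number of sites, while ring and master-slave grow linearly.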

Resources

TRY IT OUT TODAY


Join the community
http://www.cloudifysource.org

Try out and contribute some recipes
https://github.com/CloudifySource/cloudify-recipes



Questions?
