Application-level Disaster Recovery on OpenStack

Cloudifying High Availability
Application-level Disaster Recovery on OpenStack
Ali Hodroj, Director, Solution Architecture

Upload: ali-hodroj

Post on 15-Jan-2015

TRANSCRIPT

Page 1: Application-level Disaster Recovery on OpenStack

Cloudifying High Availability
Application-level Disaster Recovery on OpenStack

Ali Hodroj
Director, Solution Architecture

Page 2: Application-level Disaster Recovery on OpenStack

AGENDA

• Context and Concepts
• Regions, Zones, and Single Points of Failure
• Challenges and Trade-Offs
• Architecting HA/DR Solutions with Cloudify
• Case Studies
• Resources and Q&A

Copyright 2014 Gigaspaces. All Rights Reserved

Page 3: Application-level Disaster Recovery on OpenStack

Context and Concepts

Page 4: Application-level Disaster Recovery on OpenStack

FOCUS OF THIS SESSION

HA/DR Layers

• Application DR, fault isolation strategies, deployment patterns
• Pacemaker, Corosync messaging, HAProxy, Galera MySQL replication
• Power, air conditioning, fire protection, etc.

Page 5: Application-level Disaster Recovery on OpenStack

CONCEPTS

High Availability

• Fault Tolerance: the ability to withstand failure and operate with normal or degraded performance; redundancy and replication
• High Availability: “the nines”, expressed as minutes or hours of downtime per year (99.99% ≈ 53 minutes/year)
• Single Point of Failure: a part of a system that, if it fails, brings down the entire system

Disaster Recovery

• RTO: how much downtime are you willing to tolerate?
• RPO: how much data are you willing to lose?
• Cost: development effort, redundant environments
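A minimal sketch of how an availability percentage translates into a yearly downtime budget, assuming a 365-day year. The function name `downtime_minutes_per_year` is illustrative and not taken from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Print the budget for a few common "nines" targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Each additional nine shrinks the budget by a factor of ten, which is one reason the RTO question on this slide becomes harder to answer cheaply as the target rises.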

Page 6: Application-level Disaster Recovery on OpenStack

CONCEPTS…IN THE REAL WORLD

High Availability
• Availability includes both planned and unplanned outages
• “Everything fails, all the time”
• Cloud vendor SLAs require multi-zone deployments, and multi-zone outages, to take effect

Chained component SLAs multiply (downtime per 30-day month):

99.95% × 99.90% × 99.90% × 99.99% = 99.74%
21.6 min + 43.2 min + 43.2 min + 4.3 min = 112.3 min

Disaster Recovery
• Accomplishing high levels of redundancy in the cloud is expensive
• Determining an appropriate RPO and RTO is ultimately a financial calculation
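The compound availability figure on this slide can be reproduced with a short calculation. This is a sketch that assumes the four components are chained in series (every one must be up) and that downtime is counted over a 30-day month; both assumptions are mine, chosen because they match the slide's numbers.

```python
from math import prod

MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month (43,200 minutes)


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def monthly_downtime_minutes(availability):
    """Downtime budget per 30-day month for a fractional availability."""
    return (1.0 - availability) * MINUTES_PER_MONTH


component_slas = [0.9995, 0.9990, 0.9990, 0.9999]
composite = serial_availability(component_slas)
print(f"composite availability: {composite * 100:.2f}%")
print(f"composite downtime: {monthly_downtime_minutes(composite):.1f} minutes per month")
```

The product comes to roughly 99.74%, so the stack as a whole is less available than its weakest component.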

Page 7: Application-level Disaster Recovery on OpenStack

Regions, Zones, and Single Points of Failure

Page 8: Application-level Disaster Recovery on OpenStack

CLOUD HIGH AVAILABILITY: MATURITY MODEL

Single Points of Failure
• Single server instance, same data center
• Same geographical region
• Same operational procedures, provider

Page 9: Application-level Disaster Recovery on OpenStack

MULTI-ZONE ARCHITECTURE

Physically separated data centers within a region

Each availability zone:
• Independent power feeds from separate substations
• Redundant power on each rack and diverse cabling
• Shared images, security groups, and floating IPs
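One way to take advantage of a multi-zone layout is to spread replicas of a tier across zones so that losing one zone never removes every instance. The sketch below is a plain round-robin placement; the instance and zone names are hypothetical, and no OpenStack API is called.

```python
from collections import defaultdict
from itertools import cycle


def spread_across_zones(instance_names, zones):
    """Assign instances to zones round-robin so no zone holds every replica."""
    placement = defaultdict(list)
    for name, zone in zip(instance_names, cycle(zones)):
        placement[zone].append(name)
    return dict(placement)


def survivors(placement, failed_zone):
    """Instances still running after one zone is lost."""
    return [n for zone, names in placement.items() if zone != failed_zone for n in names]


# Hypothetical tier of four web servers spread over two zones.
placement = spread_across_zones(["web-1", "web-2", "web-3", "web-4"], ["az1", "az2"])
print(placement)
print(survivors(placement, "az1"))
```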

Page 10: Application-level Disaster Recovery on OpenStack

MULTI-REGION ARCHITECTURE

Characteristics
• Geographically dispersed architecture

Disaster Recovery Patterns
• Replicate stateful tiers, orchestrate stateless upon failure

Challenges
• Data replication costs and performance
• Network flow
• Orchestrating recovery
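The pattern "replicate stateful tiers, orchestrate stateless upon failure" implies an ordering at recovery time: bring the replicated data tiers online first, then re-create everything that holds no state. The following is a hypothetical sketch of that ordering; the `Tier` type and the step wording are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    stateful: bool


def recovery_plan(tiers):
    """Order recovery steps: promote replicated stateful tiers, then re-create stateless ones."""
    steps = []
    for tier in tiers:
        if tier.stateful:
            steps.append(f"promote replica of {tier.name} in the recovery region")
    for tier in tiers:
        if not tier.stateful:
            steps.append(f"provision and deploy {tier.name} in the recovery region")
    return steps


# Hypothetical three-tier application.
app = [Tier("load-balancer", False), Tier("app-server", False), Tier("database", True)]
for step in recovery_plan(app):
    print(step)
```

Only the stateful tier costs money while idle, because it must replicate continuously; the stateless tiers exist in the recovery region only as automation until a failure occurs.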

Page 11: Application-level Disaster Recovery on OpenStack

MULTI-CLOUD ARCHITECTURE

Characteristics
• Leverages cloud economics
• Workload migration (“Own the base, rent the spike”)
• Fewest single points of failure

Disaster Recovery Patterns
• Replicate stateful tiers, orchestrate stateless upon failure

Challenges
• Bootstrapping data for stateful services (snapshot or async replication?)
• Data replication challenges over WAN
• Complex setup

Page 12: Application-level Disaster Recovery on OpenStack

Challenges and Trade-Offs

Page 13: Application-level Disaster Recovery on OpenStack

DEPLOYMENT (ACCIDENTAL) COMPLEXITY

• Consistent deployment
• Cross-zone configuration
• Machine images, security groups, keys
• Different APIs, zone/region hierarchies

Accidental complexity: the higher we move up the HA scale, the less manageable the deployments become.

Replication in itself is useless; it is the recovery orchestration that counts.

Page 14: Application-level Disaster Recovery on OpenStack

COST OF REDUNDANCY

Cost
• Compute and storage cost
• Bandwidth cost

RTO/RPO Impacting
• VM startup time / instance acquisition
• Latency/bandwidth across regions
• General performance (IOPS, SSD)

http://www.slideshare.net/mingtemp/a-performance-study-on-the-vm-startup-time-in-the-cloud
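To see how the factors on this slide feed into RTO, here is a rough back-of-the-envelope sketch. It assumes the recovery phases run one after another and that data is restored over a single cross-region link; every number in the example is hypothetical.

```python
def transfer_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    """Time to move size_gb gigabytes over a bandwidth_mbps link (decimal units)."""
    megabits = size_gb * 8 * 1000
    return megabits / bandwidth_mbps


def estimated_rto_seconds(detection_s, vm_startup_s, install_s, restore_gb, bandwidth_mbps):
    """Sum of sequential recovery phases; real recoveries may overlap some of them."""
    return detection_s + vm_startup_s + install_s + transfer_seconds(restore_gb, bandwidth_mbps)


# Hypothetical figures: 1 min to detect, 2 min to acquire VMs, 5 min to install,
# and 50 GB of data restored over a 1 Gbit/s cross-region link.
rto = estimated_rto_seconds(60, 120, 300, 50, 1000)
print(f"estimated RTO: {rto / 60:.1f} minutes")
```

In this model the data transfer is the largest single term, which suggests why replicated stateful tiers shorten RTO more than faster VM startup does.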

Page 15: Application-level Disaster Recovery on OpenStack

Architecting HA/DR Solutions with Cloudify

Page 16: Application-level Disaster Recovery on OpenStack

APP CENTRIC DEVOPS

Cloudify provides the equivalent of Amazon OpsWorks on OpenStack

http://appcatalog.cloudifysource.org/

• Nova, Cinder, Neutron
• Heat
• OpenShift, CloudFoundry

Page 17: Application-level Disaster Recovery on OpenStack

ORCHESTRATORS, RECIPES, AND “CLOUDS”

Cloud: a set of shared compute, storage, and network resources behind an OpenStack API, e.g. resources in:
• Availability zone
• Region
• Public cloud

Recipe → Cloud Driver →
• Existing Data Center
• OpenStack Private Cloud
• OpenStack Public Cloud
• OpenStack Micro Cloud

Examples
• HP Cloud US-West / AZ1
• Rackspace Chicago (ORD) region
• DevStack, Vagrant
• Development & DR testing
• Bare metal or virtual environment

Page 18: Application-level Disaster Recovery on OpenStack

KEY PRINCIPLES

• Automation first (operational processes)
• Decouple the application from the infrastructure (design for failure)
• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)
• Aggressive monitoring across the app stack

Page 19: Application-level Disaster Recovery on OpenStack

KEY PRINCIPLES

• Automation first (operational processes)

Provision

Install

Configure

Deploy

Monitor

Scale

https://github.com/CloudifySource/cloudify-recipes/
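The lifecycle on this slide (provision, install, configure, deploy, monitor, scale) can be pictured as an ordered pipeline in which a failed phase stops everything after it. The sketch below is a generic Python illustration of that idea, not Cloudify's recipe DSL; the handler names and the simulated failure are hypothetical.

```python
LIFECYCLE = ["provision", "install", "configure", "deploy", "monitor", "scale"]


def run_lifecycle(handlers, phases=LIFECYCLE):
    """Run each phase handler in order; stop at the first phase that fails."""
    completed = []
    for phase in phases:
        handler = handlers.get(phase)
        if handler is None:
            continue  # no automation registered for this phase
        if not handler():
            return completed, phase
        completed.append(phase)
    return completed, None


# Hypothetical handlers: each returns True on success.
handlers = {
    "provision": lambda: True,
    "install": lambda: True,
    "configure": lambda: False,  # simulate a configuration failure
    "deploy": lambda: True,
}
completed, failed = run_lifecycle(handlers)
print(completed, failed)
```

Because the same ordered steps run on any target cloud, the pipeline can be replayed in a recovery environment without manual intervention.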

Page 20: Application-level Disaster Recovery on OpenStack

KEY PRINCIPLES

• Decouple the application from the infrastructure (design for failure)

Cloud Templates
• Compute
• Storage
• Network

Page 21: Application-level Disaster Recovery on OpenStack

KEY PRINCIPLES

• Use a plug-in approach to plug in the right cloud for the job (balance cost, complexity, testing)

Page 22: Application-level Disaster Recovery on OpenStack

KEY PRINCIPLES

• Aggressive monitoring across the app stack

Scaling rules

Automatic failover
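A monitoring rule that drives automatic failover is often a simple threshold on consecutive failed health checks. The class below is a minimal sketch of such a rule, assuming a threshold of three; it is not Cloudify's monitoring API, and `FailoverMonitor` is a hypothetical name.

```python
class FailoverMonitor:
    """Trigger failover after `threshold` consecutive failed health checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.failed_over = False

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True when this check triggers failover."""
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.failed_over:
            self.failed_over = True
            return True
        return False


# A single healthy check resets the counter, so a brief blip does not cause failover.
monitor = FailoverMonitor(threshold=3)
results = [monitor.record(h) for h in [True, False, False, True, False, False, False, False]]
print(results)
```

Requiring consecutive failures trades a slightly longer detection time for fewer false failovers, which matters when failing over is itself expensive.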

Page 23: Application-level Disaster Recovery on OpenStack

Case Studies (putting it all together)

Page 24: Application-level Disaster Recovery on OpenStack

DR ELASTICITY CONTINUUM

• Cold/Warm Disaster Recovery: higher RTO, lower cost
• Hot Disaster Recovery: lower RTO, higher cost

Operationally Critical → Business Critical → Mission Critical

Page 25: Application-level Disaster Recovery on OpenStack

OPERATIONALLY CRITICAL: COLD DR

Characteristics
• Financial Services customer, post-trade processing application

Design / Recipe Implementation
• Cold Disaster Recovery (clone your recipe on another cloud in case of disaster)
• Recipes used for Disaster Recovery planning trade-off analysis

Page 26: Application-level Disaster Recovery on OpenStack

BUSINESS CRITICAL: CROSS-REGION DR

Characteristics
• Transportation/Logistics
• Big Data / Real-time Analytics

Design / Recipe Implementation
• Autoscaling JBoss
• 4 service recipes deployed across both regions
• Recipes orchestrate setup, snapshot, and provisioning of PostgreSQL, Cassandra replication
• Federated data between cloud controllers (failover, polling, SQL master/slave promotion)

Page 27: Application-level Disaster Recovery on OpenStack

MISSION CRITICAL: IN-MEMORY WAN REPLICATION

Characteristics
• Transportation/Logistics

Design / Recipe Implementation
• Replication as a Service: https://github.com/dfilppi/repl-service
• Low-latency asynchronous replication across regions using in-memory replication technology (GigaSpaces XAP)
• Topologies: Master-Slave, Master-Master, Hub/Spoke, Ring
• Reference data, HTTP session sharing
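The replication topologies named on this slide differ mainly in which sites send updates to which. The sketch below enumerates the directed links for each topology under my own assumptions (the first site acts as master or hub, and hub links are bidirectional); it does not use the GigaSpaces XAP API or the repl-service recipe.

```python
def replication_links(topology, sites):
    """Return directed (source, target) replication links for a named topology."""
    if topology == "master-slave":
        master, *slaves = sites
        return [(master, s) for s in slaves]
    if topology == "master-master":
        return [(a, b) for a in sites for b in sites if a != b]
    if topology == "hub-spoke":
        hub, *spokes = sites
        # Assumption: spokes exchange updates only through the hub, in both directions.
        return [link for s in spokes for link in ((hub, s), (s, hub))]
    if topology == "ring":
        return [(sites[i], sites[(i + 1) % len(sites)]) for i in range(len(sites))]
    raise ValueError(f"unknown topology: {topology}")


# Hypothetical region names.
regions = ["us-west", "us-east", "eu"]
for name in ("master-slave", "master-master", "hub-spoke", "ring"):
    print(name, replication_links(name, regions))
```

Counting the links gives a quick feel for WAN cost: master-master grows quadratically with the number of sites, while ring and hub/spoke grow linearly.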

Page 28: Application-level Disaster Recovery on OpenStack

Resources

Page 29: Application-level Disaster Recovery on OpenStack

TRY IT OUT TODAY

Join the community: http://www.cloudifysource.org

Try out and contribute some recipes: https://github.com/CloudifySource/cloudify-recipes

Page 30: Application-level Disaster Recovery on OpenStack

Questions?
