NICTA, Disaster Recovery using OpenStack
DESCRIPTION
Jorke Odolphi, NICTA, Disaster Recovery Solution using OpenStack, Thurs, 3:50 pm session
TRANSCRIPT
Building a Disaster Recovery Solution using OpenStack
Jorke Odolphi
Principal Research Engineer
NICTA
@jorke
http://bionicvision.org.au/eye
The Team
Yuru – ‘cloud’, Gamilaraay People NSW
Problem
The cloud can fail.
Online businesses that rely on, and benefit most from, the cloud often lack the skills
to handle failure.
Disaster Recovery
the processes, policies and procedures related to preparing for the recovery or continuation of
technology infrastructure critical to an organisation after a natural or human-induced
disaster *
*according to wikipedia..
RPO
Recovery Point Objective
“maximum tolerable period in which data might be lost from an IT Service due to a Major
incident…” *
*according to wikipedia..
RTO
Recovery Time Objective
“duration of time and a service level within which a business process must be restored after
a disaster…” *
*according to wikipedia..
[Diagram: a Recovery Time Objective axis running from 0 downtime (realtime recovery/failover) to "sometime...", against a Recovery Point Objective axis running from realtime to "somewhere..."]
Our Goal
Without re-architecting your application:
– Provide a configurable warm standby solution,
– with a known, consistent RPO,
– reducing RTO,
– minimising business impact.
Goals and Challenges
Replicate application over to OpenStack in case of a disaster
– Preserve the running environment of the application, including:
• Compute instances
• Networks
• DNS
Minimise RTO and RPO AND cost!
mypizzashop.com.au (Public IP / Load Balanced)
Web front end: Apache/Nginx/IIS
app.mypizzashop.com.au (Private IP)
Application: Processing/memcache
db.mypizzashop.com.au (Private IP)
Database: MySQL/PostgreSQL/MSSQL
Architecting for DR in Cloud
Virtualise your servers
– snapshotting support in the hypervisor, primarily at the disk level
Use Dynamic DNS solutions
– E.g. Route 53, Anycast DNS
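As a concrete illustration of the dynamic-DNS piece, the sketch below builds the kind of low-TTL record update a failover would issue. The record name, IP addresses and helper function are hypothetical examples; with Amazon Route 53 a change batch like this would be passed to `change_resource_record_sets`.

```python
# Hypothetical sketch: on failover, repoint a low-TTL DNS A record from the
# primary cloud's IP to the warm standby. The function and values are
# illustrative, not a real deployment.

def failover_change_batch(record_name, standby_ip, ttl=60):
    """Build a Route 53-style UPSERT that redirects traffic to the standby."""
    return {
        "Comment": "DR failover to warm standby",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "TTL": ttl,  # keep TTL low so clients re-resolve quickly
                "ResourceRecords": [{"Value": standby_ip}],
            },
        }],
    }

batch = failover_change_batch("mypizzashop.com.au.", "203.0.113.10")
```

The low TTL is what makes the warm-standby approach workable: clients re-resolve within about a minute of the switch, which directly bounds the DNS contribution to RTO.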
Compatibility across IaaS Clouds

Cloud Provider | Framework  | Compute Instance | Object Store | Block Storage | Network | Security Group
AWS            | Custom     | ✓                | ✓            | ✓             | DHCP    | ✓
Rackspace      | Custom     | ✓                | ✓            | ✗             | STATIC  | ✗
Ninefold       | CloudStack | ✓                | ✓            | ✓             | DHCP    | ✓
TryStack       | OpenStack  | ✓                | ✓            | ✓             | DHCP    | ✓
HP Cloud       | OpenStack  | ✓                | ✓            | ✗             | DHCP    | ✓
• Replication from one cloud to another is NOT always possible
• Some clouds do not have all the technology pieces (e.g., Block Storage)
• Minimum requirements for replicating application servers:
  • Compute instance and persistent storage, such as object store or block storage
  • Snapshot service (to ensure point-in-time consistency)
  • Hypervisor support (e.g., PVGrub)
Overview of DR Process
AWS side: Take snapshot → Create volume → Partition → Send to storage
OpenStack side: Download from storage → Mount on new instance
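The flow above can be sketched as a simple pipeline. Every function here is a hypothetical placeholder standing in for the corresponding cloud API call, not part of the actual NICTA implementation:

```python
# Sketch of the DR replication pipeline: snapshot on the source cloud,
# ship the data via object storage, mount on OpenStack. All step
# callables are hypothetical stand-ins for real cloud API calls.

def replicate(instance_id, snapshot, create_volume, upload, download, mount):
    """Run the source-to-standby replication steps in order."""
    snap = snapshot(instance_id)      # AWS: point-in-time snapshot
    volume = create_volume(snap)      # materialise the snapshot as a volume
    blob = upload(volume)             # send volume data to object storage
    data = download(blob)             # fetch on the OpenStack side
    return mount(data)                # attach to a new OpenStack instance
```

Expressing the process as a chain of pluggable steps is one way to keep the pipeline portable: only the step implementations change when the source or target cloud does.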
Building DR using OpenStack
Progress:
– Deployed OpenStack in our NICTA lab
– Successfully replicated AWS compute instances to OpenStack
  • In the Rackspace OpenStack public cloud (private beta)
  • Instances created from a standard 64-bit EXT3 AWS OpenSuse image
Requirements:
– Xen support for PVGrub
– Write access to the partition table
– Network support
Problems
• Latency
• Point in time: log and replay / transactional
• How do modern databases handle broken transactions / problem disks? Rollback
Optimisations: Incremental Backup
Typical AWS system volume is around 10GB
Replication is tricky for large data volumes
– Initial backup:
• Send the whole data volume (unavoidable!)
• Optimise by compression and skipping empty space (0’s)
– Subsequent backups:
• Incremental – partition a volume into chunks and resend only the difference (the ‘delta’)
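The chunk-and-delta scheme above can be sketched as follows. The chunk size, hashing choice and function names are illustrative assumptions, not the talk's actual implementation:

```python
# Sketch of chunk-based incremental backup: hash each fixed-size chunk,
# skip all-zero chunks on the initial backup, and on later backups send
# only chunks whose digest changed. Details are illustrative assumptions.
import hashlib

def chunk_digests(volume, chunk_size):
    """SHA-256 digest of each fixed-size chunk of the volume."""
    return [hashlib.sha256(volume[i:i + chunk_size]).hexdigest()
            for i in range(0, len(volume), chunk_size)]

def delta(volume, previous_digests, chunk_size=4 * 1024 * 1024):
    """Return (chunk_index, chunk_bytes) pairs that need to be sent."""
    changed = []
    for i, digest in enumerate(chunk_digests(volume, chunk_size)):
        chunk = volume[i * chunk_size:(i + 1) * chunk_size]
        if i < len(previous_digests) and previous_digests[i] == digest:
            continue  # unchanged since the last backup
        if not previous_digests and chunk == b"\x00" * len(chunk):
            continue  # initial backup: skip empty (all-zero) space
        changed.append((i, chunk))
    return changed
```

After each backup, the sender keeps the digest list; the next run compares against it so only the delta crosses the WAN, which is what keeps the hourly sync affordable for large volumes.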
Large Data Transfer Across Cloud Datacenters: Why So Slow?
Optimisations: Large Data Transfer Across Cloud Datacenters for DR
Problem: Transferring large data volumes is slow
– Where is the bottleneck?
  • Reading from the source volume? YES!!
  • Transferring across LAN/WAN?
  • Writing to the destination volume?
Our solution:
– Rapidly cloning data volumes from snapshots
– Parallel transfers
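Since the read side of the source volume is the bottleneck, one way to apply the parallel-transfer idea is to copy several chunk streams concurrently. This is a minimal sketch with hypothetical I/O callables, not the measured implementation:

```python
# Sketch of parallel volume cloning: copy chunks over several concurrent
# streams so a single slow reader does not bound end-to-end throughput.
# read_chunk and write_chunk are hypothetical I/O callables.
from concurrent.futures import ThreadPoolExecutor

def clone_volume(read_chunk, write_chunk, n_chunks, workers=4):
    """Copy n_chunks chunks using `workers` concurrent copy streams."""
    def copy(i):
        write_chunk(i, read_chunk(i))  # each stream handles its own chunks
        return i
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(copy, range(n_chunks)))
```

Threads suit this workload because each copy is I/O-bound (network and disk), so several transfers can overlap even under Python's GIL.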
Data Transfer Evaluations
           Volume Scan (MB/s)   End-to-end Transfer (MB/s)
1 Clone    50                   40
4 Clones   190                  140
Reversing..
Point us to your instances
Replicate to new cloud/region
Automatically sync changes every hour
If the worst happens: failover