© 2013 IBM Corporation
Surviving the Worst: A Vision for OpenStack Disaster Recovery
Presenter: Michael Factor
IBM Research – Haifa
Outline and Presentation Objective
Basic Disaster Recovery (DR) concepts
• What is Disaster Recovery (DR)?
• Recovery time objective (RTO), recovery point objective (RPO)
• Data replication
• Consistency
Workload Example
Vision for OpenStack DR
Objective: Motivate the requirements for DR for OpenStack and encourage involvement in ongoing efforts to enable OpenStack DR.
What is Disaster Recovery?
According to Wikipedia, Disaster Recovery (DR) is “the process, policies and procedures . . . for recovery . . . of technology infrastructure . . . after a natural or human-induced disaster.”
Implication: Surviving a disaster requires geographic dispersion
The technology infrastructure to recover spans servers, storage, network, software, and configuration
Up front (good path)
• Planning
• Copy
• Testing
[Diagram: the Primary DC sends a DR copy to the Secondary DC; the copy is periodically tested.]
Copy approaches
• Continuous
  Synchronous
  Asynchronous
• Periodic
  Online
  Offline
Detection
[Diagram: the Primary DC fails ("Oops!"); the disaster must be detected.]
Recovery
• Infrastructure
• Application
[Diagram: recovery proceeds at the Secondary DC.]
Recovery Point Objective (RPO) and Recovery Time Objective (RTO)
Recovery Point Objective: how far back in time a disaster takes one
Recovery Time Objective: how long until operational after a disaster
RPO = 0 → synchronous copy; RTO = 0 → hot backup site
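The two objectives can be made concrete with a little arithmetic. A minimal sketch in Python, using invented timestamps for illustration: the achieved RPO is the gap between the last consistent copy and the disaster, and the achieved RTO is the gap between the disaster and resuming operation.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for a disaster scenario (illustrative only).
last_replicated = datetime(2013, 11, 5, 14, 0)   # last consistent copy at secondary
disaster        = datetime(2013, 11, 5, 14, 30)  # primary site lost
operational     = datetime(2013, 11, 5, 16, 30)  # workload running at secondary

rpo_achieved = disaster - last_replicated   # data-loss window
rto_achieved = operational - disaster       # downtime window

print(rpo_achieved)  # 0:30:00 -> 30 minutes of writes lost
print(rto_achieved)  # 2:00:00 -> 2 hours until operational
```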
Replicating the data: Synchronous
[Diagram: (1) the host writes A to the primary; (2) the primary forwards A to the secondary; (3) the secondary acknowledges; (4) the primary acknowledges the host. The host sees the ACK only after the secondary has the data.]
Replicating the data: Asynchronous
[Diagram: (1) the host writes A to the primary; (2) the primary acknowledges the host immediately; (3) the primary forwards A to the secondary; (4) the secondary acknowledges. The host's ACK does not wait for the secondary.]
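The ordering difference can be sketched in a few lines of Python (an illustrative model, not OpenStack code): synchronous replication acknowledges the host only after the secondary has the write, while asynchronous replication acknowledges immediately and forwards in the background.

```python
import queue
import threading

secondary = []           # the remote copy
pending = queue.Queue()  # async forwarding channel

def write_sync(block):
    """Synchronous: the secondary has the block before we ACK. RPO = 0."""
    secondary.append(block)  # forward to secondary and wait for its ACK
    return "ACK"             # host ACK only after replication completes

def write_async(block):
    """Asynchronous: ACK first, replicate later. RPO > 0."""
    pending.put(block)       # forwarding happens in the background
    return "ACK"             # host ACK without waiting for the secondary

def replicator():
    while True:
        secondary.append(pending.get())  # secondary applies the write
        pending.task_done()

threading.Thread(target=replicator, daemon=True).start()

write_sync("A")    # "A" is on the secondary before the ACK returns
write_async("B")   # ACK returned; "B" may not be at the secondary yet
pending.join()     # drain the channel: now the secondary has "B" too
print(secondary)   # ['A', 'B']
```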
Consistency
Consistency: “Data at secondary is what the application could have seen at the primary”
• May require some “fix-up” to make consistent
If application writes A,B,C,D,E
• Having A,B,C at secondary is consistent
• Having E,B,D at secondary is inconsistent
Inconsistent data results when data is not forwarded in the order of the host's writes
• Primarily an issue with asynchronous replication
Inconsistent data is essentially garbage
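The consistency rule above can be sketched as a prefix check (an illustrative model, assuming each write is applied whole and the host's write order is known): the secondary is consistent if it holds some prefix of the host's write sequence, with only trailing writes missing.

```python
def is_consistent(primary_writes, secondary_writes):
    """The secondary is consistent iff its contents are a prefix of the
    order in which the host wrote at the primary: trailing writes may be
    missing, but nothing may be applied out of order or skipped."""
    return primary_writes[:len(secondary_writes)] == secondary_writes

writes = ["A", "B", "C", "D", "E"]
print(is_consistent(writes, ["A", "B", "C"]))  # True: a valid earlier state
print(is_consistent(writes, ["E", "B", "D"]))  # False: essentially garbage
```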
Outline
Basic Disaster Recovery (DR) concepts
Workload Example
• A three tier application
• What is needed for the workload to survive a disaster?
Vision for OpenStack DR
Workload example: Three Tier Application
[Diagram: three clients reach App Servers (VMs on hosts) that talk to Data Servers backed by persistent storage; an Image Repo and an Identity Service support the deployment.]
What is needed for the workload to survive a disaster?
Images for the Application and Data Servers are obtained from Glance
• Ensure image content is available at the recovery data center
• Ensure compatible image metadata
[Diagram: the three-tier deployment, with the Image Repo highlighted.]
What is needed for the workload to survive a disaster?
Application and Data Servers:
• VMs managed by Nova
• Security managed by Keystone
• Network configured by Neutron
At the recovery site:
• Compatible metadata for network, security, and VMs
• Consistent with the application's persistent data
• May be different approaches for different metadata
[Diagram: the three-tier deployment, with hosts, VMs, and the Identity Service highlighted.]
What is needed for the workload to survive a disaster?
Persistent storage is managed by Cinder
• Persistent state modified by the application should be replicated to the secondary
• Configuration information needs to be replicated, e.g., volume size
[Diagram: the three-tier deployment, with persistent storage highlighted.]
What is needed for the workload to survive a disaster?
• Heat can be a means of extracting configuration information and deploying at the recovery site
• Templates for the primary and recovery sites may not be identical
[Diagram: the three-tier deployment, orchestrated via Heat.]
Outline
Basic Disaster Recovery (DR) concepts
Workload Example
Vision for OpenStack DR
• Overview
• State
Images
Data
Metadata
• Automate
Some basic tenets of our vision
DR is between a primary cloud and a target cloud
• Independent of one another – share-nothing
Except perhaps a geo-distributed Swift deployment
• Primary and target clouds interact through a “mediator”.
Enable hybrid deployments between private and public cloud
• Also private-private and public-public
Protect a set of VMs and related resources
• A workload or a set of workloads owned by a tenant.
Allow flexibility in choice of RPO and RTO
Vision for OpenStack DR: The big picture
[Diagram: a primary cloud and a target cloud, each with security, storage, VMs, images, and network, and each running DR Middleware. The middleware exchanges a workload description; security details, VM images, metadata, etc. flow between the clouds (Icehouse and beyond), and storage replication links the two storage layers (Icehouse).]
State: Images
Glance registry
• Handle like metadata
Glance backend store
• Swift global cluster
• Cinder
  Storage replication
  Manual replication
[Diagram: the Swift Global Cluster (a proxy layer over storage servers connected by a private network) serving as the Glance backend store.]
State: Data
High RPO
• Back up to Swift global/remote cluster
Low RPO
• Storage-level replication
  Cinder managed (storage)
  Nova managed (host)
Cinder-managed replication flow
[Diagram: each site's Cinder has a volume driver and a replication gateway driver.]
1. The tenant requests replicated storage ("I need storage replicated in Europe")
2. The scheduler locates a driver that supports an appropriate local-remote pairing; the driver creates the local copy and performs any initialization for pairing
3. Cinder asks the gateway to create the remote volume via the remote Cinder
4. All host writes are now replicated
Work for Icehouse
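The replication flow can be modeled in a few lines of Python. This is an illustrative sketch only: the class and method names (`ReplicationGatewayDriver`, `create_replicated_volume`, `FakeBackend`) are invented for this example and are not the actual Cinder driver API.

```python
class ReplicationGatewayDriver:
    """Illustrative model of the Cinder replication flow; the names here
    are invented for this sketch, not the real Cinder driver interface."""

    def __init__(self, local_backend, remote_cinder):
        self.local_backend = local_backend
        self.remote_cinder = remote_cinder   # client for the remote site

    def create_replicated_volume(self, name, size_gb):
        # Step 2: create the local copy and initialize pairing state.
        local = self.local_backend.create_volume(name, size_gb)
        # Step 3: ask the remote Cinder to create the paired volume.
        remote = self.remote_cinder.create_volume(name, size_gb)
        # Step 4: from here on, host writes are forwarded to the pair.
        local["replica"] = remote["id"]
        return local

class FakeBackend:
    """Stand-in for a storage backend or remote Cinder endpoint."""
    def __init__(self, site):
        self.site, self.volumes = site, {}
    def create_volume(self, name, size_gb):
        vol = {"id": f"{self.site}:{name}", "size": size_gb}
        self.volumes[vol["id"]] = vol
        return vol

driver = ReplicationGatewayDriver(FakeBackend("primary"), FakeBackend("europe"))
vol = driver.create_replicated_volume("db-data", 100)
print(vol["id"], "->", vol["replica"])   # primary:db-data -> europe:db-data
```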
State: Metadata
Examples of OpenStack metadata
Nova
• VM flavors
• SSH keys
Keystone
• Identities of users
Neutron
• Virtual networks between VMs
Cinder
• Volume types
• Pairing
Glance Registry
• Image metadata
Approaches to replicating metadata
• Periodic
• Continuous
Transfer and apply at remote site
• Some “clean up” required
• Some commands are applied immediately, others only at recovery
Copying raw data from controller DBs does not work
• Non-selective
• Requires same configuration/hardware at both sites
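The selective export-and-apply cycle described above can be sketched as follows. The resource categories come from the slide; the functions, field names, and "clean up" rule (dropping a site-specific `host_aggregate` field) are invented for illustration.

```python
# Selective metadata replication: export only what the protected workload
# needs, rather than copying raw controller DBs, then apply at the
# recovery site with some clean-up (all names here are illustrative).

def export_metadata(primary):
    """Pull just the workload's metadata from each service."""
    return {
        "nova":     {"flavors": primary["flavors"], "ssh_keys": primary["ssh_keys"]},
        "keystone": {"users": primary["users"]},
        "cinder":   {"volume_types": primary["volume_types"]},
    }

def apply_metadata(secondary, bundle):
    """Apply with 'clean up': drop site-specific fields before applying."""
    for flavor in bundle["nova"]["flavors"]:
        flavor = {k: v for k, v in flavor.items() if k != "host_aggregate"}
        secondary.setdefault("flavors", []).append(flavor)
    return secondary

primary = {"flavors": [{"name": "foo", "host_aggregate": "rack1"}],
           "ssh_keys": ["key1"], "users": ["alice"], "volume_types": ["gold"]}
site2 = apply_metadata({}, export_metadata(primary))
print(site2["flavors"])   # [{'name': 'foo'}]: site-specific field dropped
```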
Automation
Identify what to protect and set up
• Images
  Ensure content is available at the secondary
• Metadata
  Extract description and dependencies
  Replicate
  Create dependencies
  Set up to deploy
• Data
  Set up for replication
Test
_________________________
Failover
• Deploy at secondary
Automation Example: Glance
[Diagram: a primary Glance and a secondary Glance share a Swift Global Cluster (proxy layer, storage servers, private network); image A3B5 is stored and replicated within the cluster.]
1. Create the image in the Swift Global Cluster
2. Define the primary Glance to point to the image in Swift
3. Extract the metadata from the primary
4. Replicate to the secondary site and apply to the secondary Glance
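The four steps can be sketched with plain dictionaries standing in for Swift and the two Glance registries. This is an illustrative model, not the real python-swiftclient/glanceclient APIs; the object path and image name are invented.

```python
# Illustrative model of the four Glance automation steps above.

swift_global = {}                     # global cluster: visible to both sites
glance = {"primary": {}, "secondary": {}}

# 1. Create the image content in the Swift Global Cluster.
swift_global["images/A3B5"] = b"...disk image bytes..."

# 2. Define the primary Glance entry to point at the Swift object.
glance["primary"]["A3B5"] = {"name": "app-server",
                             "location": "swift://images/A3B5"}

# 3. Extract the image metadata from the primary registry.
metadata = dict(glance["primary"]["A3B5"])

# 4. Replicate to the secondary site and apply to the secondary Glance.
#    The content itself needs no copy: the global cluster already has it.
glance["secondary"]["A3B5"] = metadata

print(glance["secondary"]["A3B5"]["location"])   # swift://images/A3B5
```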
Automation Example: Nova
1. Provision VMs at the primary (possibly with Heat)
2. Extract metadata and dependencies, e.g., flavors, from the primary and replicate to the secondary
3. Create the dependencies and a Heat template at the secondary
4. Use the template to deploy on failure
[Diagram: the primary Nova has flavors foo, bar, and baz, and VMs VM1 (flavor foo) and VM2 (flavor baz); the secondary initially has only flavor baz. The replicated description (VMs VM1:foo and VM2:baz; dependency on flavors foo and baz) lets the secondary create the missing flavor foo and build a template that provisions VM1 and VM2 on failover.]
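Steps 2 through 4 can be sketched as follows, using the flavor and VM names from the slide. The data structures and `deploy` helper are illustrative stand-ins, not the Nova or Heat APIs (the template merely borrows Heat's `OS::Nova::Server` resource type as a label).

```python
# Extract a workload's dependencies at the primary, create the missing
# ones at the secondary, and build a deploy template (illustrative only).

primary = {"flavors": {"foo", "bar", "baz"},
           "vms": {"VM1": "foo", "VM2": "baz"}}
secondary = {"flavors": {"baz"}, "vms": {}}

# Step 2: extract only what the protected VMs depend on (not flavor "bar").
needed_flavors = set(primary["vms"].values())          # {'foo', 'baz'}

# Step 3: create missing dependencies and a Heat-like template.
secondary["flavors"] |= needed_flavors                 # adds 'foo'
template = {"resources": {vm: {"type": "OS::Nova::Server", "flavor": fl}
                          for vm, fl in primary["vms"].items()}}

# Step 4: on failover, deploy the template at the secondary.
def deploy(site, tmpl):
    for vm, spec in tmpl["resources"].items():
        assert spec["flavor"] in site["flavors"]       # dependency satisfied
        site["vms"][vm] = spec["flavor"]

deploy(secondary, template)
print(secondary["vms"])   # {'VM1': 'foo', 'VM2': 'baz'}
```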
Outline and Presentation Objective
Basic Disaster Recovery (DR) concepts
Workload Example
Vision for OpenStack DR
Objective: Motivate the requirements for DR for OpenStack and encourage involvement in ongoing efforts to enable OpenStack DR.
We are just scratching the surface and encourage involvement.
• See https://wiki.openstack.org/wiki/DisasterRecovery
• Cinder design summit session on volume continuous replication today at 2:40 PM.
• Unconference session tomorrow at 9:50am