AWS Gov Cloud Summit II Storage and Disaster Recovery Matt Tavis | Principal Solutions Architect


The Business Continuity Continuum

• High Availability, Storage Backup and Disaster Recovery form a continuum of continuity of operations (COOP) solutions to avert data loss and application downtime
  – In the face of internal or external events, how do you:
    • Keep your application running 24x7? – HA
    • Make sure your data is safe? – Backup Storage
    • Get an application back up after a major disaster? – DR

[Diagram: a continuum running from High Availability through Data Backup to Disaster Recovery]

Disaster Recovery

• DR is one end of the continuum
  – Recover from any event within a defined period of time (recovery time objective, RTO) and with a defined limit on data loss (recovery point objective, RPO)

• Goal: Application restarted and data recovered within an acceptable period of time
  – Application may run at lower function or lower capacity

• Traditional IT model: DR is “off-site”
  – Low-end DR: Off-Site Backups
  – High-end DR: full Hot-Site DR

How Does AWS Change Traditional DR?

• AWS is useful for traditional low-end DR to high-end HA, but…

• AWS encourages a rethinking of traditional DR / HA practices
  – Everything in the cloud is “off-site” and (potentially) “multi-site”
  – Using multiple sites (multiple AZs) comes largely for free
  – Using multiple geographically-distributed sites (multiple Regions) is significantly cheaper and easier

• Tends to move the default design point away from “cold” Disaster Recovery toward “hot” High Availability, which blends application scaling and COOP design points

• Makes it easier to stack multiple mechanisms
  – e.g., Basic HA within one Region, DR site in second Region

AWS Regions and Availability Zones

• Regions are completely separate clouds
• Multiple connected Availability Zones in each Region with private inter-AZ connectivity
• AWS services use AZs to provide their high reliability SLAs; you should too

[Diagram: Regions and their Availability Zones]
  – US East Region: Availability Zones A, B, C
  – US West Region: Availability Zones A, B
  – EU West Region: Availability Zones A, B
  – Japan: Availability Zones A, B
  – Singapore: Availability Zones A, B
  – GovCloud: Availability Zones A, B

AWS Backup Storage Capabilities

• Amazon Simple Storage Service (S3)
  – Highly-durable blob storage
  – Highly useful for archival and backup

• Elastic Block Store (EBS) and EBS Snapshots
  – Persistent data volumes for EC2 instances
  – Redundant within a single Availability Zone
  – Snapshot backups provide long-term durability, and volume sharing / cloning capability within a Region

Copyright © 2011 Amazon Web Services

Data Migration to AWS

• Amazon Import/Export
  – Migration of large amounts of data to AWS
  – “Virtual Sneakernet”: send hard drives to AWS

• Continual Data Backup
  – Backup products: many products and partners here
  – Replication (mirroring, DB replication, log shipping, etc.)
  – Managed File Transfer products
  – Scripted rsync, tsunami, etc.

• Amazon VM Import
  – Support for migrating virtual machines & disks to AWS
  – Windows-only today with more OSes over time

Architectural Patterns Overview

• Variety of approaches exist
  – Tradeoff between RTO/RPO vs. cost and complexity

Example Architectural Patterns:

Approach                                  RTO                 RPO
Backup and Restore                        Hours to Days       Day(s)
“Pilot Light” for Quick Recovery          Hours               Minutes to Hours
Fully Functioning Low Capacity Standby    Minutes to Hours    Minutes to Hours
Multi-Site Hot Standby                    Zero to Minutes     Immediate to Minutes
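
The pattern table can be read as a lookup: given a required RTO and RPO, pick the least costly pattern that meets both. The following is a minimal Python sketch of that idea, not a sizing tool. It assumes the patterns are ordered from cheapest to most expensive (which the cost vs. RTO/RPO tradeoff suggests), and the numeric upper bounds in hours are illustrative assumptions that do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pattern:
    name: str
    rto: str              # qualitative range, as given in the table
    rpo: str              # qualitative range, as given in the table
    max_rto_hours: float  # ASSUMPTION: illustrative bound, not from the slides
    max_rpo_hours: float  # ASSUMPTION: illustrative bound, not from the slides


# Ordered from cheapest/simplest to most expensive/complex (assumed ordering).
PATTERNS = [
    Pattern("Backup and Restore", "Hours to Days", "Day(s)", 72.0, 48.0),
    Pattern("Pilot Light", "Hours", "Minutes to Hours", 8.0, 4.0),
    Pattern("Low-Capacity Standby", "Minutes to Hours", "Minutes to Hours", 2.0, 4.0),
    Pattern("Multi-Site Hot Standby", "Zero to Minutes", "Immediate to Minutes", 0.25, 0.25),
]


def cheapest_pattern(rto_hours: float, rpo_hours: float) -> Optional[Pattern]:
    """Return the first (cheapest) pattern whose assumed bounds fit the targets."""
    for pattern in PATTERNS:
        if pattern.max_rto_hours <= rto_hours and pattern.max_rpo_hours <= rpo_hours:
            return pattern
    return None  # no listed pattern is fast enough under these assumed bounds
```

For example, `cheapest_pattern(8, 4)` selects the pilot-light entry under these assumed bounds; real bounds would have to come from measuring your own recovery drills.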

Backup and Restore – Pros and Prep

• Advantages
  – Simple to get started
  – Extremely cost effective (mostly backup storage)

• Preparation Phase
  – Take backups of current systems
  – Store backups in S3
  – Describe procedure to restore from backup on AWS
    • Know which AMI to use, build your own as needed
    • Know how to restore system from backups
    • Know how to switch to new system
    • Know how to configure the deployment

Backup and Restore – Recovery Approach

• In Case of Disaster
  – Bring up required infrastructure in AWS
    • EC2 instances with prepared AMIs, Load Balancing, etc.
  – Restore system from S3 backups
  – Switch over to the new system
    • Adjust DNS records to point to AWS

• Objectives
  – RTO: as long as it takes to bring up infrastructure and restore system from backups
  – RPO: time since last backup
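
The two objectives for backup and restore reduce to simple time arithmetic. A minimal Python sketch follows; the function names are hypothetical, and the optional `switch_over` term for the DNS change is an assumption added for illustration, since the slide counts only infrastructure bring-up and restore time.

```python
from datetime import datetime, timedelta


def backup_restore_rpo(last_backup: datetime, disaster: datetime) -> timedelta:
    """RPO for this pattern: the time elapsed since the last backup."""
    if disaster < last_backup:
        raise ValueError("disaster time precedes the last backup")
    return disaster - last_backup


def backup_restore_rto(
    bring_up: timedelta,
    restore: timedelta,
    switch_over: timedelta = timedelta(0),  # ASSUMPTION: optional DNS switch time
) -> timedelta:
    """RTO for this pattern: infrastructure bring-up plus restore from backups."""
    return bring_up + restore + switch_over


# A nightly backup at 02:00 and a disaster at 14:30 the same day
# means up to 12.5 hours of data could be lost.
rpo = backup_restore_rpo(datetime(2011, 6, 1, 2, 0), datetime(2011, 6, 1, 14, 30))
rto = backup_restore_rto(timedelta(hours=1), timedelta(hours=3))
```

Shortening the backup interval reduces the RPO directly, while the RTO only shrinks if bring-up or restore gets faster.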

Backup and Restore – High-level Architecture

[Diagram: an existing data center (Front-end Server, Application Server, Database Server, Storage) sends Code/Logs, Data Dumps and Data Files as a Data Backup to a Bucket]

“Pilot Light” for Quick Recovery – Pros and Prep

• Advantages
  – Reduced RTO and RPO
  – Very cost effective (very few 24/7 resources)

• Preparation Phase
  – Enable replication of all critical data to AWS
    • Standby DB, replica, mirror, etc.
    • Reduced infrastructure that runs 24/7 in AWS
  – Prepare all required resources for automatic start
    • AMIs, Network Settings, Load Balancing, etc.
    • Only runs when used for DR
  – Reserved Instances

“Pilot Light” for Quick Recovery – Recovery Approach

• In Case of Disaster
  – Automatically bring up resources around the replicated core data set
  – Scale the system as needed to handle current production traffic
  – Switch over to the new system
    • Adjust DNS records to point to AWS

• Objectives
  – RTO: as long as it takes to detect need for DR and automatically scale up replacement system
  – RPO: depends on replication type
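
Because the pilot-light RTO includes the time needed to detect that DR is required, the detection rule is worth making explicit. The sketch below is one possible rule, not something the slides prescribe: it assumes periodic health checks against the primary site and triggers failover after a run of consecutive failures. The function names, the threshold of three and the check interval are all hypothetical.

```python
from typing import Sequence


def should_fail_over(checks: Sequence[bool], threshold: int = 3) -> bool:
    """Return True when the most recent `threshold` health checks all failed.

    `checks` is ordered oldest to newest; True means the primary site answered.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    recent = list(checks)[-threshold:]
    return len(recent) == threshold and not any(recent)


def detection_delay_seconds(interval_seconds: float, threshold: int = 3) -> float:
    """Approximate time spent detecting the outage before recovery can start."""
    return interval_seconds * threshold
```

With a 30-second check interval and a threshold of three, roughly 90 seconds of the RTO go to detection alone, before any resources are started.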

“Pilot Light” for Quick Recovery – High-level Architecture

[Diagram: an existing data center (Front-end Server, Application Server, Database Server, Storage) sends Data Backups to a Bucket and uses “Real-time” DB Replication to a Replication DB; pre-canned, role-based AMIs are kept ready]

Fully Functioning Low-Capacity Standby – Pros and Prep

• Advantages
  – Can take some production traffic at any time
  – Cost savings (IT footprint smaller than full DR)

• Preparation
  – Similar to “Pilot Light”
  – All necessary components running 24/7, but not scaled for production traffic
  – Best practice – continuous testing
    • “Trickle” a statistical subset of production traffic to DR site
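
Trickling a statistical subset of traffic to the DR site amounts to weighted routing between the two sites. The Python sketch below simulates that choice locally to show the effect of the weights; it is not a DNS client, and the site names and weight values are hypothetical.

```python
import random
from typing import Dict


def pick_site(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose a site with probability proportional to its weight.

    A weight of zero keeps a site configured but sends it no traffic.
    """
    eligible = {site: weight for site, weight in weights.items() if weight > 0}
    if not eligible:
        raise ValueError("at least one site needs a positive weight")
    sites = list(eligible)
    return rng.choices(sites, weights=[eligible[s] for s in sites], k=1)[0]


# Hypothetical weight sets: standby idle, standby under continuous test,
# and full failover after a disaster.
IDLE = {"datacenter": 100, "aws-standby": 0}
TRICKLE = {"datacenter": 95, "aws-standby": 5}
FAILED_OVER = {"datacenter": 0, "aws-standby": 100}
```

Moving from `TRICKLE` to `FAILED_OVER` is then only a change of weights, which is one reason a standby that already serves a little real traffic is easier to trust during a failover.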

Fully Functioning Low-Capacity Standby – Recovery Approach

• In Case of Disaster
  – Immediately fail over most critical production load
    • Adjust DNS records to point to AWS
  – (Auto) Scale the system further to handle all production load

• Objectives
  – RTO: for critical load: as long as it takes to fail over; for all other load, as long as it takes to scale further
  – RPO: depends on replication type
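
The two-stage recovery (critical load first, everything else after scaling) can be sized with a short calculation. This is a rough Python sketch under stated assumptions: load is measured in requests per second, every instance handles the same load, and all names and figures are hypothetical.

```python
from typing import Dict


def instances_needed(load_rps: int, per_instance_rps: int) -> int:
    """Smallest number of instances able to serve `load_rps` (ceiling division)."""
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    if load_rps < 0:
        raise ValueError("load_rps must not be negative")
    return -(-load_rps // per_instance_rps)


def scale_out_plan(
    running: int, critical_rps: int, total_rps: int, per_instance_rps: int
) -> Dict[str, int]:
    """Instances to add for the critical load, then for the full production load."""
    critical = instances_needed(critical_rps, per_instance_rps)
    full = instances_needed(total_rps, per_instance_rps)
    return {
        "add_for_critical": max(0, critical - running),
        "add_for_full": max(0, full - max(running, critical)),
    }


# A standby with 2 instances, 300 rps of critical load, 1000 rps in total,
# and 100 rps per instance.
plan = scale_out_plan(running=2, critical_rps=300, total_rps=1000, per_instance_rps=100)
```

If `add_for_critical` is zero, the critical load can fail over as soon as DNS changes, so the remaining RTO is governed only by how quickly the `add_for_full` instances come up.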

Fully Functioning Low-Capacity Standby – High-level Architecture

[Diagram: an existing data center (Front-end Server, Application Server, Database Server, Storage) sends Data Backups to a Bucket and uses “Real-time” DB Replication to a Replication DB; a Warm FE Tier and a Warm App Tier each run in an Auto scaling Group behind a Zero Weight DNS Route]

Multi-Site Hot Standby – Pros and Prep

• Advantages
  – At any moment can take all production load

• Preparation
  – Similar to Low-Capacity Standby
  – But fully scaling in/out with production load

Multi-Site Hot Standby – Recovery Approach

• In Case of Disaster
  – Immediately fail over all production load
    • Adjust DNS records to point to AWS

• Objectives
  – RTO: as long as it takes to fail over
  – RPO: depends on replication type

Multi-Site Hot Standby – High-level Architecture

[Diagram: an existing data center (Front-end Server, Application Server, Database Server, Storage) sends Data Backups to a Bucket and uses “Real-time” DB Synchronization with a Replication DB; a Hot FE Tier and a Hot App Tier each run in an Auto scaling Group behind an Active DNS Route]

Best Practices for Being Prepared

• Start simple and work your way up
  – Backups in AWS as a first step
  – Incrementally improve RTO/RPO as a continuous effort

• Check for any software licensing issues

• Exercise your DR Solution
  – Game Day
  – Ensure backups, snapshots, AMIs, etc. are working
  – Monitor your monitoring system
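
Ensuring that backups are working can be partly automated. The sketch below shows one possible check in Python, assuming each backup is stored with a SHA-256 checksum and a timestamp; the function names and the one-day age limit in the example are hypothetical. It verifies integrity and freshness only and does not replace a full restore drill.

```python
import hashlib
from datetime import datetime, timedelta


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the backup payload."""
    return hashlib.sha256(data).hexdigest()


def backup_is_healthy(
    data: bytes,
    expected_sha256: str,
    taken_at: datetime,
    now: datetime,
    max_age: timedelta,
) -> bool:
    """A backup passes if it is both intact and recent enough.

    `max_age` would normally be derived from the RPO target.
    """
    intact = sha256_of(data) == expected_sha256
    fresh = timedelta(0) <= now - taken_at <= max_age
    return intact and fresh


# Example: a dump taken at noon, checked at midnight against a one-day limit.
payload = b"example database dump"
recorded_digest = sha256_of(payload)
ok = backup_is_healthy(
    payload,
    recorded_digest,
    taken_at=datetime(2011, 6, 1, 12, 0),
    now=datetime(2011, 6, 2, 0, 0),
    max_age=timedelta(days=1),
)
```

Running a check like this on a schedule, and alerting when it fails, also covers part of the "monitor your monitoring system" point, since a silent backup job shows up as a stale timestamp.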

Conclusion – Advantages of DR with AWS

• Various building blocks available

• Fine control over cost vs. RTO/RPO tradeoffs

• Ability to scale up rapidly when needed

• Pay for what you use, and only when you use it (when an event happens)

• Ability to easily and effectively test your DR plan

• Availability of multiple locations worldwide

• Variety of Solution Providers

Thank You!