Storage and Disaster Recovery
d36cz9buwru1tt.cloudfront.net/aws-gov-summit-2011/...aws gov cloud...
TRANSCRIPT
AWS Gov Cloud Summit II
• High Availability, Storage Backup and Disaster Recovery form a continuum of continuity of operations (COOP) solutions to avert data loss and application downtime
– In the face of internal or external events, how do you:
• Keep your application running 24x7? – HA
• Make sure your data is safe? – Backup Storage
• Get an application back up after a major disaster? – DR
The Business Continuity Continuum
[Diagram: a continuum running from High Availability through Data Backup to Disaster Recovery]
• DR is one end of the continuum
– Recover from any event within a defined recovery time (RTO) and with a bounded amount of data loss (RPO)
• Goal: Application restarted and data recovered within an acceptable period of time
– Application may run at lower function or lower capacity
• Traditional IT model: DR is “off-site”
– Low end DR: Off-Site Backups
– High end DR: full Hot-Site DR
Disaster Recovery
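The two objectives above can be checked mechanically after a DR exercise. The following is a minimal sketch, not part of the original deck: the names `RecoveryObjectives` and `meets_objectives` are hypothetical, and it assumes downtime is measured from the disaster to the restored service, and data loss from the last good copy to the disaster.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of lost data


def meets_objectives(objectives, disaster_at, last_good_copy_at, restored_at):
    """Return (rto_met, rpo_met) for a single recovery event."""
    downtime = restored_at - disaster_at
    data_loss_window = disaster_at - last_good_copy_at
    return downtime <= objectives.rto, data_loss_window <= objectives.rpo


if __name__ == "__main__":
    objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
    disaster = datetime(2011, 6, 1, 12, 0)
    print(meets_objectives(
        objectives,
        disaster_at=disaster,
        last_good_copy_at=disaster - timedelta(minutes=30),
        restored_at=disaster + timedelta(hours=3),
    ))
```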
• AWS is useful for traditional low-end DR to high-end HA, but…
• AWS encourages a rethinking of traditional DR / HA practices
– Everything in the cloud is “off-site” and (potentially) “multi-site”
– Using multiple sites (multiple AZs) comes largely for free
– Using multiple geographically-distributed sites (multiple Regions) is significantly cheaper and easier than with traditional off-site DR
• Tends to move the default design point away from “cold” Disaster Recovery toward “hot” High Availability, which blends application scaling and COOP design points
• Makes it easier to stack multiple mechanisms
– e.g., Basic HA within one Region, DR site in second Region
How Does AWS Change Traditional DR?
• Regions are completely separate clouds
• Multiple connected Availability Zones in each Region, with private connectivity between them
• AWS services use AZs to provide their high reliability SLAs; you should too
[Diagram: Regions and their Availability Zones: Japan (A, B), EU West (A, B), US East (A, B, C), US West (A, B), Singapore (A, B), GovCloud (A, B)]
AWS Regions and Availability Zones
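To illustrate the advice to use multiple AZs, here is a small sketch that spreads a fleet evenly across zones and reports how much capacity survives the loss of one zone. The function names and zone labels are hypothetical, and no AWS API is called.

```python
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin; return {zone: count}."""
    if not zones:
        raise ValueError("at least one zone is required")
    placement = {zone: 0 for zone in zones}
    for _, zone in zip(range(instance_count), cycle(zones)):
        placement[zone] += 1
    return placement


def surviving_capacity(placement, failed_zone):
    """Number of instances still running if one zone is lost."""
    return sum(count for zone, count in placement.items() if zone != failed_zone)


if __name__ == "__main__":
    placement = spread_across_zones(5, ["zone-a", "zone-b"])
    print(placement, surviving_capacity(placement, "zone-a"))
```

With two zones, losing one leaves roughly half the fleet, so a design that must carry full load through a zone failure needs spare capacity in each zone.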
• Amazon Simple Storage Service (S3)
– Highly-durable blob storage
– Highly useful for archival and backup
• Elastic Block Store (EBS) and EBS Snapshots
– Persistent Data volumes for EC2 instances
– Redundant within a single Availability Zone
– Snapshot backups provide long-term durability, and volume sharing / cloning capability within a Region
AWS Backup Storage Capabilities
Copyright © 2011 Amazon Web Services
• Amazon Import/Export
– Migration of large amounts of data to AWS
– “Virtual Sneakernet”: send hard drives to AWS
• Continual Data Backup
– Backup products: many products and partners here
– Replication (mirroring, DB replication, log shipping, etc.)
– Managed File Transfer products
– Scripted rsync, tsunami, etc.
• Amazon VM Import
– Support for migrating virtual machines & disks to AWS
– Windows-only today with more OSes over time
Data Migration to AWS
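Whether to ship drives or copy over the network comes down to simple arithmetic. The sketch below estimates network transfer time and compares it with a shipping turnaround. The 80% link utilization default and the shipping time are illustrative assumptions, not figures from the deck, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def network_transfer_days(data_terabytes, link_mbps, utilization=0.8):
    """Days needed to push the data over a link at the given utilization."""
    total_bits = data_terabytes * 8 * 10**12      # decimal terabytes to bits
    effective_bps = link_mbps * 10**6 * utilization
    return total_bits / effective_bps / SECONDS_PER_DAY


def prefer_shipping(data_terabytes, link_mbps, shipping_days):
    """True when mailing drives is expected to finish sooner than the network copy."""
    return network_transfer_days(data_terabytes, link_mbps) > shipping_days


if __name__ == "__main__":
    # 10 TB over a 100 Mbps link, versus an assumed 5-day shipping turnaround.
    print(round(network_transfer_days(10, 100), 1), prefer_shipping(10, 100, 5))
```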
• Variety of approaches exist
– Tradeoff between RTO/RPO vs. cost and complexity
Example Architectural Patterns:
Approach                                | RTO              | RPO
Backup and Restore                      | Hours to Days    | Day(s)
“Pilot Light” for Quick Recovery        | Hours            | Minutes to Hours
Fully Functioning Low-Capacity Standby  | Minutes to Hours | Minutes to Hours
Multi-Site Hot Standby                  | Zero to Minutes  | Immediate to Minutes
Architectural Patterns Overview
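The table can be read as a selection rule: pick the least expensive pattern whose worst-case RTO and RPO still satisfy the requirement. The sketch below encodes that rule. The numeric bounds are hypothetical stand-ins for the qualitative ranges in the table (for example, 48 hours for "Hours to Days"), and the ordering assumes cost and complexity rise from the first pattern to the last.

```python
# (name, assumed worst-case RTO in hours, assumed worst-case RPO in hours),
# ordered from lowest to highest cost and complexity. Numbers are illustrative.
PATTERNS = [
    ("Backup and Restore", 48.0, 24.0),
    ("Pilot Light", 8.0, 4.0),
    ("Fully Functioning Low-Capacity Standby", 2.0, 2.0),
    ("Multi-Site Hot Standby", 0.25, 0.25),
]


def cheapest_pattern(required_rto_hours, required_rpo_hours):
    """Return the first (cheapest) pattern meeting both objectives, or None."""
    for name, worst_rto, worst_rpo in PATTERNS:
        if worst_rto <= required_rto_hours and worst_rpo <= required_rpo_hours:
            return name
    return None


if __name__ == "__main__":
    print(cheapest_pattern(72, 24))
    print(cheapest_pattern(4, 4))
```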
• Advantages
– Simple to get started
– Extremely cost effective (mostly backup storage)
• Preparation Phase
– Take backups of current systems
– Store backups in S3
– Describe procedure to restore from backup on AWS
• Know which AMI to use, build your own as needed
• Know how to restore system from backups
• Know how to switch to new system
• Know how to configure the deployment
Backup and Restore – Pros and Prep
• In Case of Disaster
– Bring up required infrastructure in AWS
• EC2 instances with prepared AMIs, Load Balancing, etc.
– Restore system from S3 backups
– Switch over to the new system
• Adjust DNS records to point to AWS
• Objectives
– RTO: as long as it takes to bring up infrastructure and restore system from backups
– RPO: time since last backup
Backup and Restore – Recovery Approach
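The objectives for this pattern reduce to arithmetic. As a rough sketch under stated assumptions, RTO is modeled as provisioning time plus restore time plus cutover time, and worst-case RPO as the interval between backups. The function names, the restore throughput parameter, and the explicit cutover term are hypothetical simplifications.

```python
from datetime import timedelta


def estimate_rto(provision_time, backup_gb, restore_gb_per_hour, cutover_time):
    """Bring up infrastructure, restore from backup, then switch DNS."""
    restore_time = timedelta(hours=backup_gb / restore_gb_per_hour)
    return provision_time + restore_time + cutover_time


def worst_case_rpo(backup_interval):
    """A disaster just before the next backup loses one full interval of data."""
    return backup_interval


if __name__ == "__main__":
    rto = estimate_rto(
        provision_time=timedelta(hours=1),
        backup_gb=500,
        restore_gb_per_hour=250,
        cutover_time=timedelta(minutes=30),
    )
    print(rto, worst_case_rpo(timedelta(days=1)))
```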
Backup and Restore – High-level Architecture
[Diagram: Existing Data Center (Front-end Server, Application Server, Database Server, Storage) backing up Code/Logs, Data Dumps, and Data Files to a Data Backup Bucket]
• Advantages
– Reduced RTO and RPO
– Very cost effective (very few 24/7 resources)
• Preparation Phase
– Enable replication of all critical data to AWS
• Standby DB, replica, mirror, etc.
• Reduced infrastructure that runs 24/7 in AWS
– Prepare all required resources for automatic start
• AMIs, Network Settings, Load Balancing, etc.
• Only runs when used for DR
– Reserved Instances
“Pilot Light” for Quick Recovery – Pros and Prep
• In Case of Disaster
– Automatically bring up resources around the replicated core data set
– Scale the system as needed to handle current production traffic
– Switch over to the new system
• Adjust DNS records to point to AWS
• Objectives
– RTO: as long as it takes to detect need for DR and automatically scale up replacement system
– RPO: depends on replication type
“Pilot Light” for Quick Recovery – Recovery Approach
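Because the RTO here includes the time to detect the need for DR, detection and the automated runbook are worth sketching together. The following is one possible shape, not the deck's method: `FailureDetector` and `run_recovery` are hypothetical names, the consecutive-failure threshold is an assumed detection policy, and the recovery steps are placeholders for real provisioning calls.

```python
class FailureDetector:
    """Declares a disaster after `threshold` consecutive failed health checks."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, healthy):
        """Record one health-check result; return True when DR should start."""
        self.consecutive_failures = 0 if healthy else self.consecutive_failures + 1
        return self.consecutive_failures >= self.threshold


def run_recovery(steps):
    """Run (name, action) pairs in order and return the completed step names.

    An exception raised by any action propagates and halts the runbook.
    """
    completed = []
    for name, action in steps:
        action()
        completed.append(name)
    return completed


if __name__ == "__main__":
    detector = FailureDetector(threshold=3)
    checks = [True, False, False, False]
    if any(detector.record(result) for result in checks):
        print(run_recovery([
            ("start instances from prepared AMIs", lambda: None),
            ("scale to production traffic", lambda: None),
            ("point DNS at AWS", lambda: None),
        ]))
```

With a fixed check interval, detection adds roughly the threshold multiplied by the interval to the RTO, which is the tradeoff against false alarms.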
“Pilot Light” for Quick Recovery – High-level Architecture
[Diagram: Existing Data Center (Front-end Server, Application Server, Database Server, Storage); Data Backups to a Data Backup Bucket; pre-canned, role-based AMIs; DB Replication to a “real-time” Replication DB]
• Advantages
– Can take some production traffic at any time
– Cost savings (IT footprint smaller than full DR)
• Preparation
– Similar to “Pilot Light”
– All necessary components running 24/7, but not scaled for production traffic
– Best practice – continuous testing
• “Trickle” a statistical subset of production traffic to DR site
Fully Functioning Low-Capacity Standby – Pros and Prep
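One way to "trickle" a statistical subset of traffic to the DR site is to hash a stable request or session identifier into 100 buckets and send the lowest buckets to DR. This is only a sketch of the idea: the `route` function and the site labels are hypothetical, and in practice the split would more likely be done with weighted DNS records or at a load balancer.

```python
import hashlib


def route(request_id, dr_percent):
    """Return "dr" for roughly dr_percent% of identifiers, else "primary".

    The same identifier always maps to the same site, which keeps a
    session on one side of the split.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "dr" if bucket < dr_percent else "primary"


if __name__ == "__main__":
    sample = [route(f"session-{i}", dr_percent=5) for i in range(10_000)]
    print(sample.count("dr") / len(sample))
```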
• In Case of Disaster
– Immediately fail over most critical production load
• Adjust DNS records to point to AWS
– (Auto) Scale the system further to handle all production load
• Objectives
– RTO: for critical load: as long as it takes to fail over; for all other load, as long as it takes to scale further
– RPO: depends on replication type
Fully Functioning Low-Capacity Standby – Recovery Approach
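The "scale further" step amounts to a capacity calculation: how many instances must be added to the warm tier to carry the full production load. A minimal sketch follows. The function name, the per-instance throughput figure, and the headroom percentage are illustrative assumptions.

```python
def instances_to_add(production_rps, rps_per_instance, running_instances,
                     headroom_percent=20):
    """Instances to launch so the standby tier can carry full production load.

    Uses integer arithmetic so the ceiling is exact.
    """
    required_rps_x100 = production_rps * (100 + headroom_percent)
    capacity_x100 = rps_per_instance * 100
    needed = -(-required_rps_x100 // capacity_x100)  # ceiling division
    return max(0, needed - running_instances)


if __name__ == "__main__":
    # Warm tier of 2 instances, 1000 requests/s of production load,
    # each instance assumed to handle 100 requests/s.
    print(instances_to_add(1000, 100, 2))
```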
Fully Functioning Low-Capacity Standby – High-level Architecture
[Diagram: Existing Data Center (Front-end Server, Application Server, Database Server, Storage); Data Backups to a Data Backup Bucket; DB Replication to a “real-time” Replication DB; Warm FE Tier and Warm App Tier, each in an Auto Scaling Group; Zero Weight DNS Route]
• Advantages
– At any moment can take all production load
• Preparation
– Similar to Low-Capacity Standby
– But scaled in/out fully, in step with production load
Multi-Site Hot Standby – Pros and Prep
• In Case of Disaster
– Immediately fail over all production load
• Adjust DNS records to point to AWS
• Objectives
– RTO: as long as it takes to fail over
– RPO: depends on replication type
Multi-Site Hot Standby – Recovery Approach
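Failing over by adjusting DNS records can be thought of as recomputing routing weights from site health. The sketch below models that decision only and does not call any DNS service. The function name, the 0 to 100 weight scale, and the `normal_dr_weight` parameter are hypothetical.

```python
def dns_weights(primary_healthy, dr_healthy, normal_dr_weight=0):
    """Return routing weights (summing to 100) for the two sites.

    normal_dr_weight is the share the AWS site takes when both sites are
    healthy: 0 for a zero-weight standby route, higher for an active route.
    """
    if not 0 <= normal_dr_weight <= 100:
        raise ValueError("normal_dr_weight must be between 0 and 100")
    if primary_healthy and dr_healthy:
        return {"primary": 100 - normal_dr_weight, "dr": normal_dr_weight}
    if primary_healthy:
        return {"primary": 100, "dr": 0}
    if dr_healthy:
        return {"primary": 0, "dr": 100}
    raise RuntimeError("no healthy site to route traffic to")


if __name__ == "__main__":
    print(dns_weights(True, True, normal_dr_weight=50))  # both sites active
    print(dns_weights(False, True, normal_dr_weight=50))  # primary lost
```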
Multi-Site Hot Standby – High-level Architecture
[Diagram: Existing Data Center (Front-end Server, Application Server, Database Server, Storage); Data Backups to a Data Backup Bucket; DB Synchronization with a “real-time” Replication DB; Hot FE Tier and Hot App Tier, each in an Auto Scaling Group; Active DNS Route]
• Start simple and work your way up
– Backups in AWS as a first step
– Incrementally improve RTO/RPO as a continuous effort
• Check for any software licensing issues
• Exercise your DR Solution
– Game Day
– Ensure backups, snapshots, AMIs, etc. are working
– Monitor your monitoring system
Best Practices for Being Prepared
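Checking that backups are working and that the monitoring system itself is alive can both be framed as a staleness check on timestamps. A minimal sketch follows. The `stale_items` helper and the item names are hypothetical, and the timestamps would come from whatever backup and monitoring tooling is in use.

```python
from datetime import datetime, timedelta


def stale_items(last_seen, now, max_age):
    """Return the names whose most recent timestamp is older than max_age."""
    return sorted(name for name, seen_at in last_seen.items()
                  if now - seen_at > max_age)


if __name__ == "__main__":
    now = datetime(2011, 6, 1, 12, 0)
    last_seen = {
        "database backup": now - timedelta(hours=30),
        "ami build": now - timedelta(hours=2),
        "monitoring heartbeat": now - timedelta(hours=1),
    }
    print(stale_items(last_seen, now, max_age=timedelta(hours=24)))
```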
http://aws.amazon.com/solutions/solution-providers/
http://aws.amazon.com/solutions/case-studies/
DR Solution Providers
• Various building blocks available
• Fine control over cost vs. RTO/RPO tradeoffs
• Ability to scale up rapidly when needed
• Pay for what you use, and only when you use it (when an event happens)
• Ability to easily and effectively test your DR plan
• Availability of multiple locations world wide
• Variety of Solution Providers
Conclusion – Advantages of DR with AWS