disaster recovery site on aws - minimal cost maximum efficiency (stg305) | aws re:invent 2013
DESCRIPTION
Implementation of a disaster recovery (DR) site is crucial for the business continuity of any enterprise. Due to the fundamental nature of features like elasticity, scalability, and geographic distribution, DR implementation on AWS can be done at 10-50% of the conventional cost. In this session, we do a deep dive into proven DR architectures on AWS and the best practices, tools and techniques to get the most out of them.
TRANSCRIPT
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Disaster Recovery Site on AWS:
Minimal Cost Maximum Efficiency
Abdul Sathar Sait, Vikram Garlapati, and Kamal Arora (AWS)
November 15, 2013
What you will learn
• Why AWS for disaster recovery?
• Common DR architectures
– Pilot light architecture
• Demo
• Code walkthrough
– Backup and restore
• Customer case studies
• Where to go next
Conventional Disaster Recovery sites
• High cost
• Low ROI
• Implemented only for most critical systems
• Usually scaled down to 50% of production
• Systems in a remote region are challenging
• Costly software licenses based on hardware usage
Disaster Recovery site on AWS
• Unprecedented capabilities to implement DR sites
• Easily set up DR sites in different geographic regions
• Cut down DR site cost by up to 70%
• Substantial savings on software licenses
Global reach from your desktop
Common DR architectures
• Backup and restore
• Pilot light
• Warm standby
• Hot standby
Pilot light architecture
Create instances from AMIs
Build resources around replicated dataset
• Keep ‘pilot light’ on by replicating core databases
• Build AWS resources around dataset and leave in stopped state
Scale resources in AWS in response to a DR event
• Start up pool of resources in AWS when events dictate
• Scale up the database instance to handle production capacity
Pilot light architecture
Switchover to AWS
• Make necessary DNS changes to redirect traffic to the DR site on AWS
Pilot Light
DEMO
Simple DR solution – awsdrdemo.com
[Diagram, before failover. Labels: Set up data replication; Amazon Route 53 (Active / Passive); US East (N. Virginia); US West (N. California); Active; Scaled-down standby; Elastic Load Balancing; Web/App servers; Auto Scaling group; Web/App server AMI; Copy AMI; Oracle master DB; Oracle slave DB; Data volume]
[Diagram, after failover. Labels: Amazon Route 53; US East (N. Virginia); US West (N. California); Gone; Active; Elastic Load Balancing; Web/App servers; Auto Scaling group; Oracle master DB; Oracle slave DB; Data volume; DNS failover; Autoscale; Scale up DB]
Architecture
[Diagram labels: Amazon Route 53 (Active / Passive); Active mirroring / replication; US East (N. Virginia); US West (N. California); Primary; Web/App server; Webserver AMI; AMI copy (ami-996634f0); AMI, scaled-down standby; Secondary DB; Data volume; Failover App]
VPC ID: vpc-a4f2efcc
Subnet IDs: subnet-bbf2efd3, subnet-884b01ce, subnet-bef2efd6
VPC ID: vpc-5f9ef53e
Subnet IDs: subnet-440c786c, subnet-289ef549, subnet-2c9ef54d
DR ELB: created on failover
Web servers: i-36af5751
awsdrdemo.com
Active ELB: DRDemoPrimaryELB-52152634.us-east-1.elb.amazonaws.com
Primary database server: i-026aad65, private IP 174.168.1.11
Secondary database server: i-3b266960, private IP 174.168.1.11
Failover app instance: i-55cfde0e, Elastic IP 54.215.157.25
Web servers: created on failover
failover.awsdrdemo.com
console.aws.amazon.com
Demo – AWS Resources
awsdrdemo.com
Demo – Application
failover.awsdrdemo.com
Demo – Failover Kickoff
status.awsdrdemo.com/dr
Demo – Failover Status Updates
Failover Steps
• Launch Failover Application
• AWS CloudFormation – Launch web servers
• Resize Target Database Instance
• Route 53 DNS Updates
• AWS CloudFormation – Launch ELB
• Go Live
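The failover steps listed on this slide could be driven by a small wrapper that runs them in order and reports progress. The following is only a sketch under stated assumptions: every function name (`notify`, `run_step`, `run_failover`, and the four step placeholders) and the `SNS_ARN` and `DR_REGION` variables are hypothetical and not taken from the demo application. The placeholder step bodies do nothing and would need to be replaced with real commands.

```shell
#!/usr/bin/env bash
# Sketch only: all names below are illustrative, not from the talk.

# Publish a status message. Uses Amazon SNS when SNS_ARN is set, stdout otherwise.
notify() {
  if [ -n "${SNS_ARN:-}" ]; then
    aws --region "${DR_REGION:-us-west-1}" sns publish \
        --topic-arn "$SNS_ARN" --message "$1" >/dev/null
  else
    echo "STATUS: $1"
  fi
}

# Run one named step and report its start, success, or failure.
run_step() {
  local name=$1
  shift
  notify "START $name"
  if "$@"; then
    notify "DONE $name"
  else
    notify "FAILED $name"
    return 1
  fi
}

# Placeholder steps (hypothetical): replace each body with the real commands.
launch_web_servers() { :; }
resize_database()    { :; }
update_dns()         { :; }
launch_elb()         { :; }

# Run the steps in the order listed on the slide; stop at the first failure.
run_failover() {
  run_step "launch web servers" launch_web_servers &&
  run_step "resize database"    resize_database &&
  run_step "update DNS"         update_dns &&
  run_step "launch ELB"         launch_elb &&
  notify "GO LIVE"
}
```

Chaining the steps with `&&` means a failed step stops the run before DNS is touched, which seems a reasonable default for a failover that should not go live half-built.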
Failover Application Architecture
(1) Trigger DR procedure
(2) Invoke shell script
(3) Launch CloudFormation
(4) Script updates
(5) CF updates
(6) Real-time feed from SNS
[Diagram labels: AWS Region; Webserver AMI; Failover App; CLI; Admin; Users; SNS HTTP Notification]
Metadata Requests
// Sample code for metadata request using .NET API SDK
using System.IO;
using System.Net;
using System.Text;

string uri = "http://169.254.169.254/latest/meta-data/placement/availability-zone";
// Create the web request and read the response
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(uri);
HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse();
Encoding enc = Encoding.GetEncoding(1252);
StreamReader loResponseStream = new StreamReader(webresponse.GetResponseStream(), enc);
// Get the Availability Zone value
string availzone = loResponseStream.ReadToEnd();
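Because the failover scripts in this deck are shell scripts, the same metadata lookup can also be sketched in shell. This is a minimal sketch, assuming `curl` is installed and the instance still answers unauthenticated metadata requests as in the sample above. The function names `current_az` and `region_from_az` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not from the talk.

METADATA_URL="http://169.254.169.254/latest/meta-data"

# Ask the instance metadata service for this instance's Availability Zone.
# Only works from inside an EC2 instance; gives up after 2 seconds elsewhere.
current_az() {
  curl --silent --fail --max-time 2 "$METADATA_URL/placement/availability-zone"
}

# Derive the region by dropping the trailing zone letter (us-west-1a -> us-west-1).
region_from_az() {
  local az=$1
  printf '%s\n' "${az%[a-z]}"
}
```

A script could then pass `$(region_from_az "$(current_az)")` to `--region` instead of hard-coding the region name.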
Amazon Route53 Updates
# Retrieve existing ELB details from the Route53 hosted zone
domainname=www.awsdrdemo.com
hostedzoneid="ZXXXXXXXXXXXXR"
# Retrieve the ELB alias zone ID and DNS name from the existing Route53 record
zoneid=$(aws --region us-west-1 --output text route53 list-resource-record-sets --hosted-zone-id "$hostedzoneid" --start-record-name "$domainname" --start-record-type A --max-items 1 | grep ALIASTARGET | awk '{print $2}')
dns=$(aws --region us-west-1 --output text route53 list-resource-record-sets --hosted-zone-id "$hostedzoneid" --start-record-name "$domainname" --start-record-type A --max-items 1 | grep ALIASTARGET | awk '{print $4}')
# Apply the change batch to the hosted zone
aws --region us-west-1 route53 change-resource-record-sets --hosted-zone-id "$hostedzoneid" --change-batch file:///usr/local/bin/route53.json
http://vrg.s3.amazonaws.com/downloads/route53.json
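The slide applies a change batch from `route53.json` but does not show the file. As an illustration of the general shape of such a file, the sketch below prints a change batch for an alias A record. The function name `make_alias_change_batch`, the `UPSERT` action, and `EvaluateTargetHealth: false` are assumptions made for this example; the demo's actual file may differ, for instance by deleting the existing record instead.

```shell
#!/usr/bin/env bash
# Sketch only: not the demo's route53.json, just the general change-batch shape.

# Print a Route 53 change batch that points an alias A record at a load balancer.
# Arguments: record name, load balancer alias hosted zone ID, load balancer DNS name.
make_alias_change_batch() {
  cat <<EOF
{
  "Comment": "Point $1 at the DR load balancer",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$2",
          "DNSName": "$3",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF
}

# Example use (values are placeholders):
#   make_alias_change_batch www.example.com. ZEXAMPLE dr-elb.example.com > /tmp/route53.json
#   aws route53 change-resource-record-sets --hosted-zone-id "$hostedzoneid" \
#       --change-batch file:///tmp/route53.json
```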
Resize Database Instance
# Stop the DB instance for resizing
aws --region us-west-1 ec2 stop-instances --instance-ids "$dbInstanceId"
# Publish an Amazon SNS message for the action
aws --region us-west-1 sns publish --topic-arn "$snsarn" --message "Resizing the stopped instance"
# Resize the DB instance
aws --region us-west-1 ec2 modify-instance-attribute --instance-id "$dbInstanceId" --instance-type "{\"Value\": \"m1.small\"}"
# Start the resized DB instance
aws --region us-west-1 ec2 start-instances --instance-ids "$dbInstanceId"
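An instance type can only be changed while the instance is stopped, and `stop-instances` returns before the stop completes. The sketch below shows one way the resize could wait for each state change, using the AWS CLI waiters `instance-stopped` and `instance-running`. It assumes a CLI version that includes the `ec2 wait` subcommands. The function name `resize_instance` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and argument order are illustrative.

# Stop an instance, wait until it is fully stopped, change its type, then start it.
# Arguments: region, instance ID, new instance type.
resize_instance() {
  local region=$1 instance=$2 new_type=$3
  aws --region "$region" ec2 stop-instances --instance-ids "$instance" &&
  aws --region "$region" ec2 wait instance-stopped --instance-ids "$instance" &&
  aws --region "$region" ec2 modify-instance-attribute --instance-id "$instance" \
      --instance-type "{\"Value\": \"$new_type\"}" &&
  aws --region "$region" ec2 start-instances --instance-ids "$instance" &&
  aws --region "$region" ec2 wait instance-running --instance-ids "$instance"
}

# Example use (placeholder values):
#   resize_instance us-west-1 "$dbInstanceId" m1.large
```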
AWS CloudFormation Stack Launch
# Launch the DR stack from the AWS CloudFormation template
launchedstackid=$(aws --region us-west-1 --output text cloudformation create-stack --stack-name "$stackname" --template-body file:///usr/local/bin/ELBWithEC2Instances.template --notification-arns "$snsarn" --parameters ParameterKey="HostedZoneId",ParameterValue="$hostedzoneid")
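`create-stack` returns as soon as the request is accepted, and the demo follows progress through SNS notifications. A script that needs to block until the stack is ready could poll the stack status instead. The following is a rough sketch: the function names `classify_stack_status` and `wait_for_stack`, and the default retry count and delay, are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: function names and defaults are illustrative.

# Map a stack status seen during creation to one of: ready, pending, failed.
classify_stack_status() {
  case $1 in
    CREATE_COMPLETE)    echo ready ;;
    CREATE_IN_PROGRESS) echo pending ;;
    *)                  echo failed ;;
  esac
}

# Poll a stack until it is created, fails, or the retries run out.
# Arguments: region, stack name, max tries (default 60), delay in seconds (default 10).
wait_for_stack() {
  local region=$1 stack=$2 tries=${3:-60} delay=${4:-10} status
  while [ "$tries" -gt 0 ]; do
    status=$(aws --region "$region" --output text cloudformation describe-stacks \
             --stack-name "$stack" --query 'Stacks[0].StackStatus') || return 1
    case $(classify_stack_status "$status") in
      ready)  return 0 ;;
      failed) echo "stack $stack ended in $status" >&2; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$delay"
  done
  echo "timed out waiting for stack $stack" >&2
  return 1
}
```

Any status other than the two creation states is treated as a failure here, so a rollback is reported immediately instead of being waited on.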
AWS CloudFormation Template
{
"AWSTemplateFormatVersion" : "2010-09-09",
"Description" : "AWS CloudFormation Template ELBWithEC2Instances: Create a load balanced, Auto Scaled sample website where the instances are locked down to only accept traffic from the load balancer. This script creates an Auto Scaling group behind a load balancer with a simple health check. The web site is available on port 80, however, the instances can be configured to listen on any port (8888 by default).",
"Parameters" : {
"KeyPairName" : {
"Description" : "Name of an existing Amazon EC2 key pair for SSH access",
"Type" : "String",
"Default" : "kamalkeydr"
},
"InstanceType" : {
"Description" : "WebServer EC2 instance type",
"Type" : "String",
"Default" : "m1.small",
"AllowedValues" : [ "t1.micro","m1.small","m1.medium","m1.large","m1.xlarge","m2.xlarge","m2.2xlarge","m2.4xlarge","c1.medium","c1.xlarge","cc1.4xlarge","cc2.8xlarge","cg1.4xlarge"],
"ConstraintDescription" : "must be a valid EC2 instance type."
},
"WebServerPort" : {
"Description" : "TCP/IP port of the web server",
"Type" : "String",
"Default" : "80"
},
"HostedZoneId" : {
"Type" : "String",
"Description" : "The Record Set's Hosted Zone Id for the existing hosted zone",
"Default" : "Z1M58G0W56PQJA"
}
},
"Mappings" : {
"AWSInstanceType2Arch" : {
"t1.micro" : { "Arch" : "64" },
"m1.small" : { "Arch" : "64" },
"m1.medium" : { "Arch" : "64" },
"m1.large" : { "Arch" : "64" },
"m1.xlarge" : { "Arch" : "64" },
"m2.xlarge" : { "Arch" : "64" },
"m2.2xlarge" : { "Arch" : "64" },
"m2.4xlarge" : { "Arch" : "64" },
"c1.medium" : { "Arch" : "64" },
"c1.xlarge" : { "Arch" : "64" }
},
"AWSRegionArch2AMI" : {
"us-west-1" : { "32" : "ami-5e41761b", "64" : "ami-5e41761b" }
}
},
"Resources" : {
"WebServerGroup" : {
"Type" : "AWS::AutoScaling::AutoScalingGroup",
"Properties" : {
"AvailabilityZones" : [ "us-west-1a"],
"LaunchConfigurationName" : { "Ref" : "LaunchConfig" },
"MinSize" : "2",
"MaxSize" : "2",
"LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" }],
"VPCZoneIdentifier" : ["subnet-bbf2efd3"]
}
},
"LaunchConfig" : {
"Type" : "AWS::AutoScaling::LaunchConfiguration",
"Properties" : {
"ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" },
{ "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" },
"Arch" ] } ] },
"UserData" : { "Fn::Base64" : { "Ref" : "WebServerPort" }},
"SecurityGroups" : [ { "Ref" : "InstanceSecurityGroup" } ],
"InstanceType" : { "Ref" : "InstanceType" },
"KeyName" : { "Ref" : "KeyPairName" },
"AssociatePublicIpAddress" : "true"
}
},
"ElasticLoadBalancer" : {
"Type" : "AWS::ElasticLoadBalancing::LoadBalancer",
"Properties" : {
"SecurityGroups" : [ { "Ref" : "LoadBalancerSecurityGroup" } ],
"Subnets" : ["subnet-bbf2efd3"],
"Listeners" : [ {
"LoadBalancerPort" : "80",
"InstancePort" : { "Ref" : "WebServerPort" },
"Protocol" : "HTTP"
} ],
"HealthCheck" : {
"Target" : { "Fn::Join" : [ "", ["HTTP:", { "Ref" : "WebServerPort" }, "/"]]},
"HealthyThreshold" : "2",
"UnhealthyThreshold" : "10",
"Interval" : "10",
"Timeout" : "3"
}
}
},
"LoadBalancerSecurityGroup" : {
"Type" : "AWS::EC2::SecurityGroup",
"Properties" : {
"GroupDescription" : "Enable HTTP access on port 80",
"VpcId" : "vpc-a4f2efcc",
"SecurityGroupIngress" : [ {
"IpProtocol" : "tcp",
"FromPort" : "80",
"ToPort" : "80",
"CidrIp" : "0.0.0.0/0"
} ],
"SecurityGroupEgress" : [ {
"IpProtocol" : "tcp",
"FromPort" : { "Ref" : "WebServerPort" },
"ToPort" : { "Ref" : "WebServerPort" },
"CidrIp" : "0.0.0.0/0"
} ]
}
},
"myDNS" : {
"Type" : "AWS::Route53::RecordSetGroup",
"Properties" : {
"HostedZoneName" : "awsdrdemo.com.",
"Comment" : "Zone apex alias targeted to myELB LoadBalancer.",
"RecordSets" : [
{
"Name" : "www.awsdrdemo.com.",
"Type" : "A",
"AliasTarget" : {
"HostedZoneId" : { "Fn::GetAtt" : ["ElasticLoadBalancer", "CanonicalHostedZoneNameID"] },
"DNSName" : { "Fn::GetAtt" : ["ElasticLoadBalancer","CanonicalHostedZoneName"] }
}
}
]
}
},
"InstanceSecurityGroup" : {
"Type" : "AWS::EC2::SecurityGroup",
"Properties" : {
"GroupDescription" : "Enable SSH access and HTTP access on the inbound port",
"VpcId" : "vpc-a4f2efcc",
"SecurityGroupIngress" : [ {
"IpProtocol" : "tcp",
"FromPort" : { "Ref" : "WebServerPort" },
"ToPort" : { "Ref" : "WebServerPort" },
"CidrIp" : "0.0.0.0/0"
} ]
}
}
},
"Outputs" : {
"URL" : {
"Description" : "URL of the website",
"Value" : { "Fn::Join" : [ "", [ "http://", { "Fn::GetAtt" : [ "ElasticLoadBalancer", "DNSName" ]}]]}
}
}
}
HEADERS
PARAMETERS
MAPPINGS
RESOURCES
OUTPUTS
http://vrg.s3.amazonaws.com/downloads/ELBWithEC2Instances.template
Parameters
"Parameters" : {
"KeyPairName" : {
"Description" : "Name of an existing Amazon EC2 key pair for SSH access",
"Type" : "String"
},
"InstanceType" : {
"Description" : "WebServer EC2 instance type",
"Type" : "String",
"Default" : "m1.small",
"AllowedValues" : [ "t1.micro","m1.small","m1.medium","m1.large","m1.xlarge","m2.xlarge","m2.2xlarge","m2.4xlarge","c1.medium","c1.xlarge","cc1.4xlarge","cc2.8xlarge","cg1.4xlarge"],
"ConstraintDescription" : "must be a valid EC2 instance type."
},
"HostedZoneId" : {
"Type" : "String",
"Description" : "The Record Set's Hosted Zone Id for the existing hosted zone"
}
}
Resources – Web Servers
"WebServerGroup" : {
"Type" : "AWS::AutoScaling::AutoScalingGroup",
"Properties" : {
"AvailabilityZones" : [ "us-west-1a"],
"LaunchConfigurationName" : { "Ref" : "LaunchConfig" },
"MinSize" : "2",
"MaxSize" : "2",
"LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" }],
"VPCZoneIdentifier" : ["subnet-bbf2efd3"]
}
},
"LaunchConfig" : {
"Type" : "AWS::AutoScaling::LaunchConfiguration",
"Properties" : {
"ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" },
{ "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" }, "Arch" ] } ] },
"UserData" : { "Fn::Base64" : { "Ref" : "WebServerPort" }},
"SecurityGroups" : [ { "Ref" : "InstanceSecurityGroup" } ],
"KeyName" : { "Ref" : "KeyPairName" }
}
}
A disaster recovery site on AWS can serve
• A primary site in a customer data center
• A primary site on AWS itself
Primary and DR sites on AWS
Backup & Restore pattern
Simple to get started
• Easy starting point for exploring the AWS cloud
Low technical barrier to entry
• Focus on incorporating cloud into your DR strategy, not on complex technical issues related to hot-hot systems
Cost-effective
• Very high levels of data durability at low price
• Cost of storing snapshots in Amazon S3
• Archiving possibilities beyond tape using Amazon Glacier
Backup and restore
Create instances from AMIs
Restore data from backups
Backup and restore
Many ways to back up
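One of those ways, sketched here as an illustration only, is to snapshot an EBS data volume in the primary region and copy the snapshot into the DR region so that data can be restored there. The function names `snapshot_volume` and `copy_snapshot_to_dr` are hypothetical, and the sketch assumes an AWS CLI version that includes the `ec2 wait snapshot-completed` waiter.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not from the talk.

# Snapshot a volume in the primary region and print the new snapshot ID.
# Arguments: region, volume ID.
snapshot_volume() {
  local region=$1 volume=$2
  aws --region "$region" --output text ec2 create-snapshot \
      --volume-id "$volume" --description "DR backup of $volume" \
      --query 'SnapshotId'
}

# Wait for a snapshot to complete, copy it into the DR region, print the copy's ID.
# Arguments: source region, DR region, snapshot ID.
copy_snapshot_to_dr() {
  local src_region=$1 dr_region=$2 snapshot=$3
  aws --region "$src_region" ec2 wait snapshot-completed --snapshot-ids "$snapshot" &&
  aws --region "$dr_region" --output text ec2 copy-snapshot \
      --source-region "$src_region" --source-snapshot-id "$snapshot" \
      --description "DR copy of $snapshot" --query 'SnapshotId'
}

# Example use (placeholder values):
#   snap=$(snapshot_volume us-east-1 vol-0123456789abcdef0)
#   copy_snapshot_to_dr us-east-1 us-west-1 "$snap"
```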
Customer case study