Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Disaster Recovery Site on AWS: Minimal Cost Maximum Efficiency Abdul Sathar Sait, Vikram Garlapati, and Kamal Arora (AWS) November 15, 2013


DESCRIPTION

Implementing a disaster recovery (DR) site is crucial for the business continuity of any enterprise. Because of AWS features such as elasticity, scalability, and geographic distribution, a DR site on AWS can be implemented at 10-50% of the cost of a conventional one. In this session, we do a deep dive into proven DR architectures on AWS and the best practices, tools, and techniques for getting the most out of them.

TRANSCRIPT

Page 1: Disaster Recovery Site on AWS - Minimal Cost Maximum Efficiency (STG305) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Disaster Recovery Site on AWS:

Minimal Cost Maximum Efficiency

Abdul Sathar Sait, Vikram Garlapati, and Kamal Arora (AWS)

November 15, 2013

Page 2

What you will learn

• Why AWS for disaster recovery?

• Common DR architectures

– Pilot light architecture

• Demo

• Code walkthrough

– Backup and restore

• Customer case studies

• Where to go next

Page 3

Conventional Disaster Recovery sites

• High cost

• Low ROI

• Implemented only for most critical systems

• Usually scaled down to 50% of production

• Hosting systems in a remote region is challenging

• Costly software licenses based on hardware usage

Page 4

Disaster Recovery site on AWS

• Unprecedented capabilities to implement DR sites

• Easily set up DR sites in different geographic regions

• Cut down DR site cost by up to 70%

• Substantial savings on software licenses

Page 5

Global reach from your desktop

Page 6

Common DR architectures

Backup and restore

Pilot light

Warm standby

Hot standby

Page 7

Pilot light architecture

Page 8

Pilot light architecture

Create instances from AMIs

Page 9

Build resources around replicated dataset

Keep ‘pilot light’ on by replicating core databases

Build AWS resources around dataset and leave in stopped state

Pilot light architecture

Page 10

Build resources around replicated dataset

Keep ‘pilot light’ on by replicating core databases

Build AWS resources around dataset and leave in stopped state

Scale resources in AWS in response to a DR event

Start up pool of resources in AWS when events dictate

Scale up the database instance to handle production capacity

Pilot light architecture

Page 11

Pilot light architecture

Switch over to AWS

Make necessary DNS changes to redirect traffic to the DR site on AWS

Page 12

Pilot Light

DEMO

Page 13

Simple DR solution – awsdrdemo.com (diagram)

Active site in US East (N. Virginia): Amazon Route 53 (active-passive), Elastic Load Balancing, web/app servers in an Auto Scaling group, Oracle master DB, and data volume.

Scaled-down standby in US West (N. California): Oracle slave DB with data replication set up from the master, and the web/app server AMI copied from the primary region.

Page 14

Simple DR solution – awsdrdemo.com after failover (diagram)

US West (N. California) has gone active: Amazon Route 53 DNS failover sends traffic to a new Elastic Load Balancing load balancer and an Auto Scaling group of web/app servers (autoscale), and the Oracle slave DB is scaled up.

US East (N. Virginia) shows the original active site: Elastic Load Balancing, web/app servers, Oracle master DB, and data volume.

Page 15

Architecture

Active mirroring / replication

Amazon Route 53 (active-passive)

Scaled-down standby in US West (N. California): AMI, secondary DB, data volume

Primary in US East (N. Virginia): web/app server, webserver AMI, data volume

AMI copy (ami-996634f0)

Failover App

VPC ID: vpc-a4f2efcc; subnet IDs: subnet-bbf2efd3, subnet-884b01ce, subnet-bef2efd6

VPC ID: vpc-5f9ef53e; subnet IDs: subnet-440c786c, subnet-289ef549, subnet-2c9ef54d

DR ELB: created on failover

Web servers: i-36af5751

awsdrdemo.com

Active ELB: DRDemoPrimaryELB-52152634.us-east-1.elb.amazonaws.com

Primary database server: i-026aad65, private IP 174.168.1.11

Secondary database server: i-3b266960, private IP 174.168.1.11

Failover app instance: i-55cfde0e, Elastic IP 54.215.157.25

Web servers: created on failover

failover.awsdrdemo.com

Page 16

console.aws.amazon.com

Demo – AWS Resources

Page 17

awsdrdemo.com

Demo – Application

Page 18

failover.awsdrdemo.com

Demo – Failover Kickoff

Page 19

status.awsdrdemo.com/dr

Demo – Failover Status Updates

Page 20

Failover Steps

• Launch failover application

• AWS CloudFormation: launch web servers

• Resize target database instance

• Route 53 DNS updates

• AWS CloudFormation: launch ELB

• Go live
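The failover steps can be strung together in one script. The following is a minimal sketch, not the presenters' actual script: it reuses the CLI calls from the code walkthrough and assumes hypothetical variables (REGION, SNS_ARN, DB_INSTANCE_ID, DB_TYPE, STACK_NAME, TEMPLATE, HOSTED_ZONE_ID, CHANGE_BATCH). Setting DRY_RUN=1 prints each command instead of running it, which is a cheap way to rehearse a DR drill.

```shell
#!/usr/bin/env bash
# Sketch of the pilot-light failover sequence. All variable names
# (REGION, SNS_ARN, DB_INSTANCE_ID, DB_TYPE, STACK_NAME, TEMPLATE,
# HOSTED_ZONE_ID, CHANGE_BATCH) are hypothetical placeholders.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'DRY-RUN: %s\n' "$*"
  else
    "$@"
  fi
}

# Publish a progress message to the SNS topic that feeds the status page.
status() {
  run aws --region "$REGION" sns publish --topic-arn "$SNS_ARN" --message "$1"
}

failover() {
  status "DR failover started" || return 1

  # Remove the primary site's alias record so the stack can create the new one.
  run aws route53 change-resource-record-sets --hosted-zone-id "$HOSTED_ZONE_ID" \
    --change-batch "file://$CHANGE_BATCH" || return 1

  # Launch web servers, ELB and DNS record from the template (returns at once).
  run aws --region "$REGION" cloudformation create-stack --stack-name "$STACK_NAME" \
    --template-body "file://$TEMPLATE" --notification-arns "$SNS_ARN" \
    --parameters "ParameterKey=HostedZoneId,ParameterValue=$HOSTED_ZONE_ID" || return 1

  # While the stack builds, resize the standby DB; it must be stopped first.
  run aws --region "$REGION" ec2 stop-instances --instance-ids "$DB_INSTANCE_ID" || return 1
  run aws --region "$REGION" ec2 wait instance-stopped --instance-ids "$DB_INSTANCE_ID" || return 1
  run aws --region "$REGION" ec2 modify-instance-attribute --instance-id "$DB_INSTANCE_ID" \
    --instance-type "{\"Value\": \"$DB_TYPE\"}" || return 1
  run aws --region "$REGION" ec2 start-instances --instance-ids "$DB_INSTANCE_ID" || return 1
  status "Database instance resized to $DB_TYPE"

  # Wait for the stack (web servers, ELB, DNS) before declaring the site live.
  run aws --region "$REGION" cloudformation wait stack-create-complete \
    --stack-name "$STACK_NAME" || return 1
  status "DR site is live"
}
```

The slide lists the steps in a slightly different order. Because the template creates the web servers, ELB, and www record in a single stack, this sketch removes the old DNS record before `create-stack` (so the template's `myDNS` resource does not collide with it) and resizes the database while the stack builds. The `aws ec2 wait` and `aws cloudformation wait` commands come from newer CLI releases than the one used in 2013.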

Page 21

Failover Application Architecture

Components in the AWS Region: Webserver AMI, Failover App, AWS CloudFormation, Amazon SNS (HTTP notification), admin users

(1) Admin users trigger the DR procedure

(2) Invoke shell script

(3) Launch CloudFormation through the CLI

(4) Script updates

(5) CloudFormation updates via SNS HTTP notification

(6) Real-time feed from SNS
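In step (5), CloudFormation publishes stack events to SNS, which pushes them over HTTP to the status page. The slides do not show how the page renders those messages; one hypothetical approach is to reduce each event to a single line. This sketch assumes the Key='Value' per-line message format that CloudFormation uses for stack-event notifications and reads only three fields.

```shell
#!/usr/bin/env bash
# Sketch: turn one CloudFormation stack-event notification (the SNS message
# body) into a single status line. Assumes the message is a series of
# Key='Value' lines; cfn_field and cfn_status_line are hypothetical helpers.

# Print the value of one Key='Value' field read from stdin.
cfn_field() {
  sed -n "s/^$1='\(.*\)'\$/\1/p"
}

# Format "LogicalResourceId (ResourceType): ResourceStatus".
cfn_status_line() {
  local msg="$1" id rtype rstatus
  id=$(printf '%s\n' "$msg" | cfn_field LogicalResourceId)
  rtype=$(printf '%s\n' "$msg" | cfn_field ResourceType)
  rstatus=$(printf '%s\n' "$msg" | cfn_field ResourceStatus)
  printf '%s (%s): %s\n' "$id" "$rtype" "$rstatus"
}
```

Because the pattern requires `='` right after the key, `ResourceStatus` does not accidentally match the neighbouring `ResourceStatusReason` line.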

Page 22

Metadata Requests

// Sample code for an instance metadata request using the .NET API
using System.IO;
using System.Net;
using System.Text;

string uri = "http://169.254.169.254/latest/meta-data/placement/availability-zone";

// Create the web request to the EC2 instance metadata service
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(uri);

// Send the request; dispose of the response and reader when done.
// The response is plain ASCII text, so UTF-8 decoding is safe.
using (HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse())
using (StreamReader loResponseStream = new StreamReader(webresponse.GetResponseStream(), Encoding.UTF8))
{
    // Get the availability zone value, e.g. "us-west-1a"
    string availzone = loResponseStream.ReadToEnd();
}
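The .NET sample reads the instance's Availability Zone from the metadata service. A failover script written against the AWS CLI usually needs the region instead, which can be derived by dropping the zone letter. A minimal shell sketch, assuming the standard `<region><letter>` zone naming; `current_az` is a hypothetical helper that first requests an IMDSv2 session token, which newer instances may require (IMDSv2 postdates this talk).

```shell
#!/usr/bin/env bash
# Sketch: derive the region from the instance's Availability Zone.
# az_to_region and current_az are hypothetical helper names.

# "us-west-1a" -> "us-west-1": drop the trailing zone letter.
# Assumes the standard <region><letter> Availability Zone naming.
az_to_region() {
  local az="$1"
  printf '%s\n' "${az%?}"
}

# Read the AZ from instance metadata (works only on an EC2 instance).
# Requests an IMDSv2 token first; this also works where IMDSv1 is enabled.
current_az() {
  local token
  token=$(curl -s --max-time 2 -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || return 1
  curl -s --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/placement/availability-zone"
}
```

On an instance, `region=$(az_to_region "$(current_az)")` yields a value suitable for `aws --region "$region" ...`; `az_to_region us-west-1a` prints `us-west-1`.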

Page 23

Amazon Route 53 Updates

# Retrieve the existing ELB alias details from the Route 53 hosted zone
domainname=www.awsdrdemo.com
hostedzoneid="ZXXXXXXXXXXXXR"

# Read the alias target's hosted zone ID and DNS name from the existing A record
# (one API call; --no-paginate keeps --query from being applied per page)
read -r zoneid dns <<< "$(aws --output text route53 list-resource-record-sets \
  --hosted-zone-id "$hostedzoneid" --start-record-name "$domainname" \
  --start-record-type A --no-paginate \
  --query 'ResourceRecordSets[0].AliasTarget.[HostedZoneId,DNSName]')"

# Apply the change batch in route53.json to the hosted zone
aws route53 change-resource-record-sets --hosted-zone-id "$hostedzoneid" \
  --change-batch file:///usr/local/bin/route53.json

http://vrg.s3.amazonaws.com/downloads/route53.json
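The slide links to route53.json but does not show its contents. Since the CloudFormation template's `myDNS` resource creates the www.awsdrdemo.com alias for the new ELB, one plausible reading (an assumption, not confirmed by the slides) is that route53.json deletes the existing primary-site alias first. A Route 53 DELETE must match the current record exactly, which would explain why the script reads the alias zone ID and DNS name beforehand. The helper name and the default `EvaluateTargetHealth` value are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Route 53 change batch that DELETEs an existing
# alias A record. Every field must match the current record exactly,
# including EvaluateTargetHealth (assumed false unless passed as $4).
build_delete_alias_batch() {
  local name="$1" alias_zone_id="$2" alias_dns="$3" evaluate="${4:-false}"
  cat <<EOF
{
  "Comment": "Remove primary-site alias for ${name} before DR failover",
  "Changes": [
    {
      "Action": "DELETE",
      "ResourceRecordSet": {
        "Name": "${name}",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "${alias_zone_id}",
          "DNSName": "${alias_dns}",
          "EvaluateTargetHealth": ${evaluate}
        }
      }
    }
  ]
}
EOF
}
```

For example, `build_delete_alias_batch "$domainname" "$zoneid" "$dns" > /tmp/route53.json` followed by `aws route53 change-resource-record-sets --hosted-zone-id "$hostedzoneid" --change-batch file:///tmp/route53.json`.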

Page 24

Resize Database Instance

# Stop the DB instance for resizing
aws --region us-west-1 ec2 stop-instances --instance-ids "$dbInstanceId"

# The instance type can only be changed once the instance is fully stopped
aws --region us-west-1 ec2 wait instance-stopped --instance-ids "$dbInstanceId"

# Publish an Amazon SNS message for each action
aws --region us-west-1 sns publish --topic-arn "$snsarn" --message "Resizing the stopped instance"

# Resize the DB instance
aws --region us-west-1 ec2 modify-instance-attribute --instance-id "$dbInstanceId" \
  --instance-type "{\"Value\": \"m1.small\"}"

# Start the resized DB instance
aws --region us-west-1 ec2 start-instances --instance-ids "$dbInstanceId"
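`stop-instances` returns while the instance is still "stopping", and `modify-instance-attribute` fails until it is fully "stopped". Newer CLI releases provide `aws ec2 wait instance-stopped` for this; where no waiter exists (or for other states you need to poll), a small polling loop does the job. A minimal sketch; `poll_until` and `instance_state` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper: run a command until it prints the expected
# value, giving up after a fixed number of attempts.
# Usage: poll_until EXPECTED ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
poll_until() {
  local expected="$1" attempts="$2" delay="$3" i=1 value
  shift 3
  while [ "$i" -le "$attempts" ]; do
    value=$("$@")
    [ "$value" = "$expected" ] && return 0
    # Skip the sleep after the final attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Prints the state name ("pending", "running", "stopping", "stopped", ...)
# of a single EC2 instance. Arguments: REGION INSTANCE_ID.
instance_state() {
  aws --region "$1" ec2 describe-instances --instance-ids "$2" \
    --query 'Reservations[0].Instances[0].State.Name' --output text
}
```

For example: `poll_until stopped 60 5 instance_state us-west-1 "$dbInstanceId" || exit 1` before the `modify-instance-attribute` call.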

Page 25

AWS CloudFormation Stack Launch

# Launch the DR stack using the AWS CloudFormation template
launchedstackid=$(aws --region us-west-1 --output text cloudformation create-stack \
  --stack-name "$stackname" \
  --template-body file:///usr/local/bin/ELBWithEC2Instances.template \
  --notification-arns "$snsarn" \
  --parameters ParameterKey="HostedZoneId",ParameterValue="$hostedzoneid")

Page 26

AWS CloudFormation Template

{

"AWSTemplateFormatVersion" : "2010-09-09",

"Description" : "AWS CloudFormation Template ELBWithEC2Instances: Create a load balanced, Auto Scaled sample website where the instances are locked down to only accept traffic from the load balancer. This script creates an Auto Scaling group behind a load balancer with a simple health check. The web site is available on port 80; however, the instances can be configured to listen on any port (set by the WebServerPort parameter, 80 by default).",

"Parameters" : {

"KeyPairName" : {

"Description" : "Name of an existing Amazon EC2 key pair for SSH access",

"Type" : "String",

"Default" : "kamalkeydr"

},

"InstanceType" : {

"Description" : "WebServer EC2 instance type",

"Type" : "String",

"Default" : "m1.small",

"AllowedValues" : [ "t1.micro","m1.small","m1.medium","m1.large","m1.xlarge","m2.xlarge","m2.2xlarge","m2.4xlarge","c1.medium","c1.xlarge"],

"ConstraintDescription" : "must be a valid EC2 instance type."

},

"WebServerPort" : {

"Description" : "TCP/IP port of the web server",

"Type" : "String",

"Default" : "80"

},

"HostedZoneId" : {

"Type" : "String",

"Description" : "The Record Set's Hosted Zone Id for the existing hosted zone",

"Default" : "Z1M58G0W56PQJA"

}

},

"Mappings" : {

"AWSInstanceType2Arch" : {

"t1.micro" : { "Arch" : "64" },

"m1.small" : { "Arch" : "64" },

"m1.medium" : { "Arch" : "64" },

"m1.large" : { "Arch" : "64" },

"m1.xlarge" : { "Arch" : "64" },

"m2.xlarge" : { "Arch" : "64" },

"m2.2xlarge" : { "Arch" : "64" },

"m2.4xlarge" : { "Arch" : "64" },

"c1.medium" : { "Arch" : "64" },

"c1.xlarge" : { "Arch" : "64" }

},

"AWSRegionArch2AMI" : {

"us-west-1" : { "32" : "ami-5e41761b", "64" : "ami-5e41761b" }

}

},

"Resources" : {

"WebServerGroup" : {

"Type" : "AWS::AutoScaling::AutoScalingGroup",

"Properties" : {

"AvailabilityZones" : [ "us-west-1a"],

"LaunchConfigurationName" : { "Ref" : "LaunchConfig" },

"MinSize" : "2",

"MaxSize" : "2",

"LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" }],

"VPCZoneIdentifier" : ["subnet-bbf2efd3"]

}

},

"LaunchConfig" : {

"Type" : "AWS::AutoScaling::LaunchConfiguration",

"Properties" : {

"ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" },

{ "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" },

"Arch" ] } ] },

"UserData" : { "Fn::Base64" : { "Ref" : "WebServerPort" }},

"SecurityGroups" : [ { "Ref" : "InstanceSecurityGroup" } ],

"InstanceType" : { "Ref" : "InstanceType" },

"KeyName" : { "Ref" : "KeyPairName" },

"AssociatePublicIpAddress" : "true"

}

},

"ElasticLoadBalancer" : {

"Type" : "AWS::ElasticLoadBalancing::LoadBalancer",

"Properties" : {

"SecurityGroups" : [ { "Ref" : "LoadBalancerSecurityGroup" } ],

"Subnets" : ["subnet-bbf2efd3"],

"Listeners" : [ {

"LoadBalancerPort" : "80",

"InstancePort" : { "Ref" : "WebServerPort" },

"Protocol" : "HTTP"

} ],

"HealthCheck" : {

"Target" : { "Fn::Join" : [ "", ["HTTP:", { "Ref" : "WebServerPort" }, "/"]]},

"HealthyThreshold" : "2",

"UnhealthyThreshold" : "10",

"Interval" : "10",

"Timeout" : "3"

}

}

},

"LoadBalancerSecurityGroup" : {

"Type" : "AWS::EC2::SecurityGroup",

"Properties" : {

"GroupDescription" : "Enable HTTP access on port 80",

"VpcId" : "vpc-a4f2efcc",

"SecurityGroupIngress" : [ {

"IpProtocol" : "tcp",

"FromPort" : "80",

"ToPort" : "80",

"CidrIp" : "0.0.0.0/0"

} ],

"SecurityGroupEgress" : [ {

"IpProtocol" : "tcp",

"FromPort" : { "Ref" : "WebServerPort" },

"ToPort" : { "Ref" : "WebServerPort" },

"CidrIp" : "0.0.0.0/0"

} ]

}

},

"myDNS" : {

"Type" : "AWS::Route53::RecordSetGroup",

"Properties" : {

"HostedZoneId" : { "Ref" : "HostedZoneId" },

"Comment" : "www alias record targeted to the ElasticLoadBalancer load balancer.",

"RecordSets" : [

{

"Name" : "www.awsdrdemo.com.",

"Type" : "A",

"AliasTarget" : {

"HostedZoneId" : { "Fn::GetAtt" : ["ElasticLoadBalancer", "CanonicalHostedZoneNameID"] },

"DNSName" : { "Fn::GetAtt" : ["ElasticLoadBalancer","CanonicalHostedZoneName"] }

}

}

]

}

},

"InstanceSecurityGroup" : {

"Type" : "AWS::EC2::SecurityGroup",

"Properties" : {

"GroupDescription" : "Enable HTTP access on the inbound port from the load balancer only",

"VpcId" : "vpc-a4f2efcc",

"SecurityGroupIngress" : [ {

"IpProtocol" : "tcp",

"FromPort" : { "Ref" : "WebServerPort" },

"ToPort" : { "Ref" : "WebServerPort" },

"SourceSecurityGroupId" : { "Fn::GetAtt" : [ "LoadBalancerSecurityGroup", "GroupId" ] }

} ]

}

}

},

"Outputs" : {

"URL" : {

"Description" : "URL of the website",

"Value" : { "Fn::Join" : [ "", [ "http://", { "Fn::GetAtt" : [ "ElasticLoadBalancer", "DNSName" ]}]]}

}

}

}

Template sections: HEADERS, PARAMETERS, MAPPINGS, RESOURCES, OUTPUTS

http://vrg.s3.amazonaws.com/downloads/ELBWithEC2Instances.template

Page 27

Parameters

"Parameters" : {

"KeyPairName" : {

"Description" : "Name of an existing Amazon EC2 key pair for SSH access",

"Type" : "String"

},

"InstanceType" : {

"Description" : "WebServer EC2 instance type",

"Type" : "String",

"Default" : "m1.small",

"AllowedValues" : [ "t1.micro","m1.small","m1.medium","m1.large","m1.xlarge","m2.xlarge","m2.2xlarge","m2.4xlarge","c1.medium","c1.xlarge"],

"ConstraintDescription" : "must be a valid EC2 instance type."

},

"HostedZoneId" : {

"Type" : "String",

"Description" : "The Record Set's Hosted Zone Id for the existing hosted zone"

}

}

Page 28

Resources – Web Servers

"WebServerGroup" : {

"Type" : "AWS::AutoScaling::AutoScalingGroup",

"Properties" : {

"AvailabilityZones" : [ "us-west-1a"],

"LaunchConfigurationName" : { "Ref" : "LaunchConfig" },

"MinSize" : "2",

"MaxSize" : "2",

"LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" }],

"VPCZoneIdentifier" : ["subnet-bbf2efd3"]

}

},

"LaunchConfig" : {

"Type" : "AWS::AutoScaling::LaunchConfiguration",

"Properties" : {

"ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" },

{ "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" }, "Arch" ] } ] },

"UserData" : { "Fn::Base64" : { "Ref" : "WebServerPort" }},

"SecurityGroups" : [ { "Ref" : "InstanceSecurityGroup" } ],

"KeyName" : { "Ref" : "KeyPairName" }

}

}

Page 29

status.awsdrdemo.com/dr

Demo – Failover Status Updates

Page 30

A disaster recovery site on AWS can serve

• A primary site in the customer's data center

• A primary site that is itself on AWS

Page 31

Primary and DR sites on AWS

Page 32

Backup & Restore pattern

Simple to get started

• Easy starting point for exploring the AWS cloud

• Low technical barrier to entry

• Focus on incorporating cloud into your DR strategy, not on complex technical issues related to hot-hot systems

Cost-effective

• Very high levels of data durability at a low price

• Cost of storing snapshots in Amazon S3

• Archiving possibilities beyond tape using Amazon Glacier
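For the archiving point, S3 lifecycle rules can move backup objects to Amazon Glacier automatically. A minimal sketch that prints such a rule; the prefix, rule ID, and day count are hypothetical, and the shape follows the `Filter` element of the current `put-bucket-lifecycle-configuration` API.

```shell
#!/usr/bin/env bash
# Sketch: print an S3 lifecycle configuration that transitions objects under
# a prefix to Amazon Glacier after a number of days.
# glacier_lifecycle_json, the rule ID, prefix and days are hypothetical.
glacier_lifecycle_json() {
  local prefix="$1" days="$2"
  cat <<EOF
{
  "Rules": [
    {
      "ID": "archive-backups-to-glacier",
      "Filter": { "Prefix": "${prefix}" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": ${days}, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}
```

For example, `glacier_lifecycle_json backups/ 30 > lifecycle.json` and then `aws s3api put-bucket-lifecycle-configuration --bucket my-dr-backups --lifecycle-configuration file://lifecycle.json` (the bucket name is a placeholder).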

Page 33

Backup and restore

Page 34

Backup and restore

Page 35

Create instances from AMIs

Restore data from backups

Backup and restore
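Cross-region backup and restore can be built on EBS snapshots: snapshot a volume in the primary region, copy the snapshot to the DR region, and create volumes from the copy when a disaster strikes. The slides do not include code for this pattern, so the following is an assumed sketch with the AWS CLI; the function names and all IDs are hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of EBS-snapshot backup and restore across regions. Function names,
# volume/snapshot IDs, regions and AZs are hypothetical.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'DRY-RUN: %s\n' "$*"
  else
    "$@"
  fi
}

# 1. Take a point-in-time snapshot of a data volume in the primary region.
snapshot_volume() {  # VOLUME_ID PRIMARY_REGION
  run aws --region "$2" ec2 create-snapshot --volume-id "$1" \
    --description "DR backup of $1"
}

# 2. Copy a completed snapshot to the DR region (--region is the destination).
#    Wait first: aws --region PRIMARY_REGION ec2 wait snapshot-completed --snapshot-ids ID
copy_snapshot_to_dr() {  # SNAPSHOT_ID PRIMARY_REGION DR_REGION
  run aws --region "$3" ec2 copy-snapshot --source-region "$2" \
    --source-snapshot-id "$1" --description "DR copy of $1"
}

# 3. At recovery time, create a volume from the DR copy in a DR-region AZ,
#    then attach it to an instance launched from the AMI.
restore_volume() {  # DR_SNAPSHOT_ID DR_REGION AVAILABILITY_ZONE
  run aws --region "$2" ec2 create-volume --snapshot-id "$1" \
    --availability-zone "$3"
}
```

Only the restore step has to happen during the outage; snapshotting and copying run on a schedule, which is what keeps this pattern cheap.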

Page 36

Many ways to back up

Page 37

Disaster Recovery site on AWS can be for

• Primary site on customer data center

• Primary on AWS itself

Page 38

Primary and DR sites on AWS

Page 39

Customer case study

Page 40

We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.