Up and running, even during disaster
A cost-effective, fully automated disaster recovery solution with reduced time to recover for a leading consumer goods company
CLIENT SUCCESS STORY
Brand websites are today an important medium to deliver brand experiences. And when you are one of the top three consumer brand companies in the world, ensuring continuity of your brand websites in case of a disaster is critical. CSS Corp planned and executed a cost-effective disaster recovery of the content management platform of a leading consumer brand company without disturbing their existing production infrastructure.
Being prepared for possible risks and unplanned outages is a must to minimize, if not eliminate, the reputational damage and financial losses that disasters may cause for any business. The cost and impact are far greater when you are one of the top three players in your industry. A carefully planned, crafted, and tested disaster recovery (DR) setup is essential. Our client, a leading consumer goods company, knew the importance of DR and approached us to test its business-critical content delivery infrastructure. Our pilot light method gave a quick recovery time and ensured cost-effectiveness without disturbing the existing production infrastructure.
About the Client
One of the top three consumer brand companies in the world, providing some of the best-known global products to people in more than 190 countries.
Client Situation
The company uses a well-known content management system to host its top brand commercial websites. The client's business treats this as a platinum platform. The infrastructure is hosted in the US East, and the company wanted to set up disaster recovery (DR) in the US West.
The company tasked CSS Corp with planning and architecting the recovery of the infrastructure that hosts the customer-facing Content Delivery Application (CDA) from the US East to the US West in case of a disaster.
Disaster recovery in Amazon Web Services (AWS) cloud – the context
Understanding the difference between AWS Availability Zones and AWS regions is very important when designing DR for the AWS cloud. In AWS, an Availability Zone consists of one or more independent data centers. Designing deployment architectures across Availability Zones provides high availability for the application. CSS Corp designs and deploys every tier of the deployment in high availability mode.
Further, in AWS, a region is made up of all the Availability Zones within a particular geographic area. A regional failure is considered a disaster. In this context, recovering from one region to another region is disaster recovery.
The CSS Corp Solution
CSS Corp adopted a pilot light DR model for the platinum platform to achieve operating efficiencies and cost optimization. The team created a comprehensive plan to provision and deploy the entire platinum platform's CDA infrastructure in the US West region once the recovery site was built. CSS Corp also executed the disaster recovery process to test the recovery without disturbing the current production infrastructure in the US East.
To this end, the CSS Corp team worked collaboratively with the company’s digital marketing team, the service continuity team, their platform management partners, and AWS solution architects.
[Figure: CSS Corp - Automation and Recovery Strategy. The diagram shows the production site in US East and the pilot light DR site in US West (minimal running footprint). Each side has a VPC with public and private subnets, security groups, and MS SQL instances across Availability Zones. Labeled data paths: snapshot / AMI, log shipping, synchronous mirror, master/slave data transfer, active/passive, and witness monitoring. Labeled recovery flow: regional failure detected, push-button recovery, orchestration build on region 2, DNS switch-over. Steps are numbered 1 to 6.]
Pilot light DR model
As backup is the basis for any recovery process, CSS Corp worked on a custom backup strategy.
– Instances are bundled and their snapshots are placed in S3
Database point-in-time recovery
1. Take transaction log backup
2. Take full backup
3. Restore the full backup with the “No Recovery” option
4. Restore the transaction log backup with the stop-at option set to the desired point in time
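The restore half of the steps above (steps 3 and 4) can be sketched as follows, assuming the database is Microsoft SQL Server, where the “No Recovery” and stop-at options correspond to the T-SQL keywords NORECOVERY and STOPAT. The function name, database name, and backup paths are hypothetical and used only for illustration; this is not the script CSS Corp used.

```python
from datetime import datetime


def build_point_in_time_restore(database, full_backup_path, log_backup_path, stop_at):
    """Return the T-SQL statements for a point-in-time restore, in execution order.

    All argument values are illustrative assumptions, not taken from the case study.
    """
    def quote(value):
        # Escape single quotes so the value is safe inside a T-SQL string literal.
        return value.replace("'", "''")

    # Escape closing brackets so the name is safe inside a [bracketed] identifier.
    db_name = database.replace("]", "]]")
    stop_at_literal = stop_at.strftime("%Y-%m-%dT%H:%M:%S")

    return [
        # Step 3: restore the full backup but leave the database in the
        # restoring state so that log backups can still be applied.
        f"RESTORE DATABASE [{db_name}] FROM DISK = N'{quote(full_backup_path)}' "
        "WITH NORECOVERY;",
        # Step 4: roll the log forward to the chosen moment, then bring the
        # database online.
        f"RESTORE LOG [{db_name}] FROM DISK = N'{quote(log_backup_path)}' "
        f"WITH STOPAT = N'{stop_at_literal}', RECOVERY;",
    ]


if __name__ == "__main__":
    for statement in build_point_in_time_restore(
        "cms",                          # hypothetical database name
        r"D:\backup\cms_full.bak",      # hypothetical full backup file
        r"D:\backup\cms_log.trn",       # hypothetical log backup file
        datetime(2015, 1, 1, 12, 30, 0),
    ):
        print(statement)
```

Generating the statements as plain strings keeps the sketch independent of any particular database driver; in practice they would be run in order against the DR database server.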
The team began by building a lightweight replica of the production infrastructure in the cloud in the US West. This was followed by building an “as-is” database server with log shipping in the US West. All other components were replicated in passive mode through snapshots and AMI shipping.
Mind map of the DR flow
Prerequisites
– AMI
– Snapcopy
– Log shipping
During DR
– Make pilot light production-heavy
– Add all moving parts: ELBs, SSLs
– Dummy DNS to test production websites
– Validate production sites with platform partners
– Sign-off from client
Post-implementation review
– Log shipping and DB mirroring for high availability
– Ongoing ELB / SSL cert mapping in DR
– EBS snapcopy, ongoing
Business Outcomes
Our approach resulted in six key benefits for the client:
Cost-effective: By planning and executing a pilot light disaster recovery, which can be scaled-up and made production-ready, our team ensured cost-effective deployment.
Low Recovery Time Objective (RTO): The pilot light method gave a quicker recovery time (of 5 hours) than the ‘backup and restore’ scenario as the core pieces of the system are already running and are continually kept up-to-date. Target RTO from business: 24 hours.
Recovery Point Objective (RPO): We currently have RPO of four hours as per the backup schedule and recovery strategy.
Fully automated, reduced time to recover: Our fully integrated automated platform (HAPX) and the AWS API facilitate automated provisioning and configuration of the infrastructure resources.
Data consistency: Log shipping ensured data consistency.
PII compliance: As DR for US East was deployed in US West, data stayed within the same geography, thus ensuring compliance with PII regulations.
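The RTO and RPO figures above reduce to simple time arithmetic. The sketch below is a minimal illustration, assuming recovery points are taken every four hours; the timestamps and function names are hypothetical and not taken from the client's environment. It computes the data-loss window for a given disaster time and compares measured values against their objectives.

```python
from datetime import datetime, timedelta


def achieved_rpo(recovery_points, disaster_time):
    """Return the data-loss window: time from the last usable recovery point to the disaster."""
    usable = [point for point in recovery_points if point <= disaster_time]
    if not usable:
        raise ValueError("no recovery point exists before the disaster")
    return disaster_time - max(usable)


def meets_objective(measured, objective):
    """True when the measured duration does not exceed the objective."""
    return measured <= objective


if __name__ == "__main__":
    # Hypothetical recovery points taken every four hours.
    start = datetime(2015, 1, 1, 0, 0)
    points = [start + timedelta(hours=4 * i) for i in range(3)]  # 00:00, 04:00, 08:00
    loss = achieved_rpo(points, datetime(2015, 1, 1, 11, 30))
    print("data-loss window:", loss)
    print("within 4-hour RPO:", meets_objective(loss, timedelta(hours=4)))
    # Figures from the case study: measured RTO of 5 hours against a 24-hour target.
    print("within 24-hour RTO:", meets_objective(timedelta(hours=5), timedelta(hours=24)))
```

With recovery points every four hours, the worst case is a disaster just before the next point is taken, so the data-loss window approaches but does not exceed four hours.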
For more information, please mail us at [email protected]
[Figure: Production site (US East) and disaster recovery site (US West) on the AWS common technology platform. Diagram labels: Elastic Load Balancing, SSL, EC2 DB on Instance, Login Sync Job, EBS Snapshot, Snap Copy, Log Shipping, Not Running.]
Out of 16 servers, 9 AMIs will be bundled from the US East region to the US West region.
A script automates EBS snapcopy from region 1 to region 2 every 12 hours.
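A cross-region EBS snapshot copy of the kind described above could look roughly like the following sketch. It is not the script CSS Corp used. It assumes boto3 EC2 clients, the region names us-east-1 and us-west-2, and a hypothetical "dr-copy" tag that marks the snapshots to ship; the 12-hour schedule would come from an external scheduler such as cron, which is not shown.

```python
def copy_snapshots_to_dr(source_ec2, dest_ec2, source_region, tag_key="dr-copy"):
    """Copy completed, self-owned snapshots that carry tag_key into the DR region.

    source_ec2 and dest_ec2 are boto3 EC2 clients for the production and DR
    regions. The tag key is a hypothetical convention for this sketch.
    Pagination of describe_snapshots is omitted for brevity.
    """
    response = source_ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[
            {"Name": "tag-key", "Values": [tag_key]},
            {"Name": "status", "Values": ["completed"]},
        ],
    )
    copied = []
    for snapshot in response["Snapshots"]:
        # copy_snapshot is called on the destination-region client and
        # pulls the snapshot from the source region.
        result = dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snapshot["SnapshotId"],
            Description=f"DR copy of {snapshot['SnapshotId']} from {source_region}",
        )
        copied.append(result["SnapshotId"])
    return copied


if __name__ == "__main__":
    import boto3  # imported here so the function above stays testable without AWS

    # Assumed region names; the case study only says "US East" and "US West".
    production = boto3.client("ec2", region_name="us-east-1")
    recovery = boto3.client("ec2", region_name="us-west-2")
    print(copy_snapshots_to_dr(production, recovery, "us-east-1"))
```

Passing the clients in as arguments keeps the copy logic separate from credentials and region choice, so the same function can be exercised against stub clients before it is pointed at real accounts.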