Up and running, even during disaster

A cost-effective, fully automated, reduced time-to-recover disaster recovery solution for a leading consumer goods company

Client success story

Brand websites are today an important medium to deliver brand experiences. And when you are one of the top three consumer brand companies in the world, ensuring continuity of your brand websites in case of a disaster is critical. CSS Corp planned and executed a cost-effective disaster recovery of the content management platform of a leading consumer brand company without disturbing their existing production infrastructure.

Upload: css-corp

Post on 21-Aug-2015


Category: Technology





Any business must prepare for possible risks and unplanned outages to minimize, if not eliminate, the reputational damage and financial losses that disasters may cause. The cost and impact are far greater when you are one of the top three players in your industry. A carefully planned, crafted, and tested disaster recovery (DR) setup is essential. Our client, a leading consumer goods company, knew the importance of DR and approached us to test the recovery of its business-critical content delivery infrastructure. Our pilot light method gave a quick recovery time and kept costs down without disturbing the existing production infrastructure.

About the Client

One of the top three consumer brand companies in the world, providing some of the best-known global products to people in more than 190 countries.

Client Situation

The company uses a well-known content management system to host the commercial websites of its top brands. The client's business classifies this as a platinum platform. The infrastructure is hosted in the US East region, and the company wanted to set up disaster recovery (DR) in the US West region.

The company mandated CSS Corp with planning and architecting the recovery of the infrastructure that hosts the customer-facing Content Delivery Application (CDA) from the US East to the US West, in case of a disaster.

Disaster recovery in Amazon Web Services (AWS) cloud – the context

Understanding the difference between AWS Availability Zones and AWS regions is essential when designing DR for the AWS cloud. In AWS, each Availability Zone consists of one or more independent data centers. Designing a deployment architecture that spans Availability Zones provides high availability for the application. CSS Corp designs and deploys every tier of the deployment in high availability mode.

A region, in turn, is made up of all the Availability Zones within a particular geographic area. The failure of an entire region is considered a disaster. In this context, disaster recovery means recovering from one region to another.
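The distinction above can be illustrated with a minimal sketch. It assumes a hypothetical health feed that reports one status per Availability Zone; the `ZoneStatus` type, the `classify_outage` function, and the region and zone names in the usage note are illustrative placeholders, not part of the client's setup.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ZoneStatus:
    """Health of one Availability Zone (hypothetical record, for illustration)."""
    region: str
    zone: str
    healthy: bool


def classify_outage(statuses: Iterable[ZoneStatus], region: str) -> str:
    """Classify the state of one region.

    Returns "healthy" when every zone is up, "az_failure" when some but not
    all zones are down (high availability across zones should absorb this),
    and "regional_failure" when every zone is down (the disaster case that
    calls for recovery into another region).
    """
    zones = [s for s in statuses if s.region == region]
    if not zones:
        raise ValueError(f"no status reported for region {region!r}")
    down = [s for s in zones if not s.healthy]
    if not down:
        return "healthy"
    if len(down) == len(zones):
        return "regional_failure"
    return "az_failure"
```

For example, if one of two zones in a region is unhealthy the function returns `"az_failure"`, and only when both are unhealthy does it return `"regional_failure"`.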

The CSS Corp Solution

CSS Corp adopted a pilot light DR model for the platinum platform to achieve operating efficiencies and cost optimization. The team created a comprehensive plan to provision and deploy the entire CDA infrastructure of the platinum platform in the US West region once the recovery site was built. CSS Corp also executed the disaster recovery process to test the recovery without disturbing the current production infrastructure in the US East. To this end, the CSS Corp team worked collaboratively with the company's digital marketing team, the service continuity team, their platform management partners, and AWS solution architects.

Figure: Pilot light DR model (CSS Corp automation and recovery strategy). Labels recoverable from the diagram: production VPC in US East with public and private subnets, security groups, and MS SQL across Availability Zones 1 to 3; synchronous mirror with witness and monitoring; active/passive and master/slave data transfer via log shipping; snapshots and AMIs; pilot light DR in US West with a minimal running footprint; regional failure detected; push-button recovery; orchestration build on region 2; DNS switchover. The diagram numbers its steps 1 to 6.


Because backup is the basis of any recovery process, CSS Corp devised a custom backup strategy.

– Instances are bundled and snapshots placed in S3

Database point-in-time recovery

1. Take transaction log backup

2. Take full backup

3. Restore full backup with the “No Recovery” (NORECOVERY) option

4. Restore transaction log backup with the stop-at (STOPAT) option, up to the required point in time
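The restore half of this procedure (steps 3 and 4) can be sketched as a small helper that builds the corresponding T-SQL statements. The database is MS SQL according to the architecture diagram; the function name, database name, and backup paths are hypothetical, and the sketch only produces the statements as text, leaving execution to whatever SQL client is in use.

```python
from datetime import datetime
from typing import List


def build_point_in_time_restore(database: str, full_backup_path: str,
                                log_backup_path: str, stop_at: datetime) -> List[str]:
    """Build the T-SQL for restoring a database to a point in time.

    Statement 1 restores the full backup WITH NORECOVERY so the database
    stays in the restoring state and can still accept log backups.
    Statement 2 replays the transaction log up to `stop_at` (STOPAT) and
    brings the database online (RECOVERY).
    """
    for value in (database, full_backup_path, log_backup_path):
        # Deliberately strict: reject characters that would break the quoting.
        if "'" in value or "]" in value:
            raise ValueError(f"unsupported character in {value!r}")
    stamp = stop_at.strftime("%Y-%m-%dT%H:%M:%S")
    return [
        f"RESTORE DATABASE [{database}] FROM DISK = N'{full_backup_path}' "
        f"WITH NORECOVERY;",
        f"RESTORE LOG [{database}] FROM DISK = N'{log_backup_path}' "
        f"WITH STOPAT = N'{stamp}', RECOVERY;",
    ]
```

If several log backups were taken after the full backup, each earlier one would be restored WITH NORECOVERY in sequence, and only the final one would carry STOPAT and RECOVERY.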

The team began by building a lightweight replica of the production infrastructure in the cloud in the US West. This was followed by building an “as-is” database server in the US West, kept current through log shipping. All other components were replicated in passive mode through snapshots and AMI shipping.

Prerequisites

AMI

Snapcopy

Log shipping

During DR

Make pilot light – production heavy

Add all moving parts – ELBs, SSLs

Dummy DNS to test production websites

Validate production sites with Platform Partners

Sign off from client

Post implementation review

Log shipping and DB mirroring for High availability

Ongoing ELB / SSL cert mapping in DR

EBS snapcopy – ongoing


Mind map of the DR flow

Business Outcomes

Our approach resulted in six key benefits for the client:

Cost-effective: By planning and executing a pilot light disaster recovery, which can be scaled up and made production-ready, our team ensured cost-effective deployment.

Low Recovery Time Objective (RTO): The pilot light method gave a recovery time of 5 hours, quicker than the ‘backup and restore’ scenario, because the core pieces of the system are already running and are continually kept up to date. The target RTO set by the business was 24 hours.

Recovery Point Objective (RPO): We currently have an RPO of four hours as per the backup schedule and recovery strategy.

Fully automated, reduced time to recover: Our fully integrated automated platform (HAPX) and the AWS API facilitate automated provisioning and configuration of the infrastructure resources.

Data consistency: Log shipping ensured data consistency.

PII compliance: As DR for US East was deployed in US West, data stayed within the same geography, thus ensuring compliance with PII regulations.

For more information, please mail us at [email protected]

Figure: DR architecture on the AWS common technology platform. Labels recoverable from the diagram: Production Site (US East) and Disaster Recovery Site (US West), each with Elastic Load Balancing, SSL, and an EC2 instance hosting the database; Login Sync Job, Snap Copy, EBS Snapshot, and Log Shipping between the two sites; one element marked "Not Running".

Out of 16 servers, 9 AMIs will be bundled from the US East region to the US West region.

A script automates EBS snapcopy from region 1 to region 2 every 12 hours.
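The 12-hourly snapcopy script noted in the diagram is not reproduced in this story. A minimal sketch of such a job could look like the following, assuming two EC2 clients (for example, boto3 clients created for the source and destination regions) are passed in. The function name and the 12-hour lookback window are assumptions for illustration; pagination, tagging, and error handling are omitted.

```python
from datetime import datetime, timedelta
from typing import List


def copy_recent_snapshots(source_ec2, dest_ec2, source_region: str,
                          now: datetime,
                          window: timedelta = timedelta(hours=12)) -> List[str]:
    """Copy completed EBS snapshots started within `window` to the DR region.

    `source_ec2` and `dest_ec2` are EC2 clients for the production and DR
    regions, e.g. boto3.client("ec2", region_name=...). The copy is requested
    on the destination client, which pulls from `source_region`.
    Returns the IDs of the new snapshots created in the destination region.
    """
    cutoff = now - window
    response = source_ec2.describe_snapshots(OwnerIds=["self"])
    copied = []
    for snap in response["Snapshots"]:
        # Only completed snapshots can be copied; skip older ones so each
        # run handles just the snapshots taken since the previous run.
        if snap["State"] != "completed" or snap["StartTime"] < cutoff:
            continue
        result = dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snap["SnapshotId"],
            Description=f"DR copy of {snap['SnapshotId']} from {source_region}",
        )
        copied.append(result["SnapshotId"])
    return copied
```

Scheduling the function every 12 hours (for instance from cron) with a matching window would approximate the behavior described, although a production job would also need to track which snapshots were already copied.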