Page 1

April 2008, BCNET 2008 Advanced Networks Conference

Are We Ready? A Disaster Recovery Plan for the BCNET Community

Page 2

Introductions

Scott Owen, UBC

Steve Hillman, SFU

Jaime Garcia, BCIT

Colin Leavett-Brown, UVic

Stan Shaw, BCNET project manager, Disaster Recovery Business Case (facilitator)

Page 10

Data is particularly vulnerable to earthquake, fire, and water damage.

So what are we doing about it?

Page 11

Panel Discussion

Page 12

UBC

Scott Owen
– Business continuity
– Disaster recovery

Page 13

SFU

Steve Hillman
– DR Planning and Technologies

Page 14

Technologies in use at SFU

Disk-to-disk replication
– NetApp SnapVault
  – Lightweight, point-in-time copy of data
  – Doesn’t provide CDP (continuous data protection)

Database active-standby replication
– In production on ERP DB2 database servers
  – But the standby server is in the same data centre
– In pilot with one MySQL server
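The gap between point-in-time copies and CDP is the recovery point. A restore from snapshots can only return to the newest copy taken before the failure. The minimal Python sketch below computes that window. It assumes a hypothetical four-hour snapshot schedule, because the actual SnapVault schedule at SFU is not stated.

```python
from datetime import datetime, timedelta


def data_at_risk(snapshot_times, failure_time):
    """Return the window of writes lost when restoring from the newest
    point-in-time snapshot taken at or before the failure."""
    earlier = [t for t in snapshot_times if t <= failure_time]
    if not earlier:
        raise ValueError("no snapshot precedes the failure")
    return failure_time - max(earlier)


# Hypothetical schedule: one snapshot every 4 hours from midnight.
start = datetime(2008, 4, 1, 0, 0)
snapshots = [start + timedelta(hours=4 * i) for i in range(6)]

# A failure at 11:30 can only be rolled back to the 08:00 snapshot.
print(data_at_risk(snapshots, datetime(2008, 4, 1, 11, 30)))  # 3:30:00
```

In the worst case, the loss approaches the full snapshot interval. CDP aims to shrink this window toward zero by capturing every write.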

Page 15

Technologies (cont.)

Server Virtualization
– VMware with vMotion
  – 10-30 server instances per physical server
  – Instances can be moved live between physical servers
– Solaris Zones
  – Multiple Solaris servers per physical server allow us to host redundant services at a remote location using one physical box
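Because SFU quotes a density of 10-30 instances per host, sizing reduces to simple arithmetic. The sketch below is a rough estimate only. Two inputs are illustrative assumptions, not SFU figures: the instance count of 60, and the `spare_hosts` headroom, which leaves room to move one host's instances elsewhere with vMotion before taking that host down.

```python
import math


def hosts_needed(instances, per_host, spare_hosts=1):
    """Physical servers needed to run `instances` VMs at `per_host` VMs each,
    plus `spare_hosts` of headroom so one host can be evacuated with vMotion."""
    if per_host <= 0:
        raise ValueError("per_host must be positive")
    return math.ceil(instances / per_host) + spare_hosts


# A hypothetical 60 instances at both ends of the quoted 10-30 density range.
for density in (10, 30):
    print(density, hosts_needed(60, density))
# 10 7
# 30 3
```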

Page 16

Technologies (continued)

Redundant servers
– “Easiest” servers were done first, with redundant servers located at SFU Surrey campus:
  – AD servers
  – DNS
  – LDAP server
  – Incoming mail gateway
  – Authenticated mail relay host
  – Syslog/monitoring host
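Redundant copies of these services only help if clients try the remote copy when the primary fails. Many DNS and LDAP clients already accept an ordered server list. The generic sketch below shows the same idea with a TCP health check. The hostnames are hypothetical, and this does not describe how SFU's clients are configured.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_check(port: int, timeout: float = 2.0) -> Callable[[str], bool]:
    """Build a health check that succeeds if a TCP connection opens."""
    def is_up(host: str) -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return is_up


def first_available(hosts: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first host, in preference order, that passes the check."""
    for host in hosts:
        if is_up(host):
            return host
    return None


# Hypothetical primary (Burnaby) and redundant (Surrey) LDAP servers.
servers = ["ldap.burnaby.example.edu", "ldap.surrey.example.edu"]

# Simulate the primary campus being down instead of probing the network;
# in practice you would pass tcp_check(389), 389 being the standard LDAP port.
print(first_available(servers, lambda host: "surrey" in host))
```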

Page 17

DR Planning

Planned or under discussion:
– Application clustering with some nodes at each site
– Standby databases remotely located
– Solaris ZFS send/receive for remote snapshots
– More redundant servers
– Very remote server for ‘dark’ website and ERP backup
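ZFS send/receive replicates a snapshot as a stream. After one full send, `zfs send -i` ships only the blocks changed since the previous snapshot, and `zfs receive` applies them on the remote pool. The sketch below only builds the commands and does not run them. The dataset, snapshot, and host names are hypothetical.

```python
import shlex


def zfs_incremental_commands(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Return shell commands that snapshot `dataset` and ship the changes
    since `prev_snap` to `remote_dataset` on `remote_host` over SSH.

    Assumes `prev_snap` already exists on both ends (from an earlier full send).
    """
    take = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # -F rolls the target back to its latest snapshot before receiving,
    # discarding any stray changes made on the replica.
    receive = ["ssh", remote_host, "zfs", "receive", "-F", remote_dataset]
    return [shlex.join(take), shlex.join(send) + " | " + shlex.join(receive)]


for cmd in zfs_incremental_commands(
    "tank/web", "2008-04-01", "2008-04-02", "dr.example.edu", "backup/web"
):
    print(cmd)
# zfs snapshot tank/web@2008-04-02
# zfs send -i tank/web@2008-04-01 tank/web@2008-04-02 | ssh dr.example.edu zfs receive -F backup/web
```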

Page 18

BCIT

Jaime Garcia
– An IT Continuity Update

Page 19

[Network diagram, as of Jan 2007. Labels: I-net, BCNET, HC, Kel, ATC/BMC, Kel, SE12, Banner, Notes, Web, DTC, Switch]

Page 20

Actions taken

Page 21

[Network diagram, as of Dec 2007. Labels: I-net, Shaw, BCNET, HC, Kel, NE25, ATC/BMC, Kel, SE12, Banner, Notes, Web, DTC, Banner, Notes, Web, Switch (×3)]

Page 22

Failure Impact Analysis – Loss of SE12

Columns: Jan 2007 | Dec 2007 | Notes

Internet Access
– Burnaby: RTO in seconds, for NE 12 locations only
– DTC: not impacted
– BMC/ATC: Internet traffic still routed through Burnaby
– Kelowna: not impacted
– DHCP/DNS: RTO in seconds

Applications
– Banner: RTO in hours
– Notes: RTO in minutes
– Public Web: RTO in hours, but static, public content only
– myBCIT
– Cognos
– WebCT
– Others
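One way to keep an analysis like this actionable is to record each service's RTO as an order of magnitude and query it against a target. The minimal sketch below encodes only the SE12 entries that state an RTO. The target passed in is a hypothetical example.

```python
# Order-of-magnitude RTO labels, from fastest to slowest recovery.
RTO_RANK = {"seconds": 0, "minutes": 1, "hours": 2}

# Entries from the "Loss of SE12" analysis that state an RTO.
LOSS_OF_SE12 = {
    "Internet access (Burnaby)": "seconds",
    "DHCP/DNS": "seconds",
    "Banner": "hours",
    "Notes": "minutes",
    "Public Web": "hours",
}


def exceeds_target(rtos, target):
    """List services whose recovery takes longer than the target magnitude."""
    limit = RTO_RANK[target]
    return sorted(name for name, rto in rtos.items() if RTO_RANK[rto] > limit)


print(exceeds_target(LOSS_OF_SE12, "minutes"))  # ['Banner', 'Public Web']
```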

Page 23

Failure Impact Analysis – Loss of BCNET Connection

November 23rd, 2007

Columns: Jan 2007 | Dec 2007 | Notes

Internet Access
– Burnaby: RTO in seconds
– DTC: not impacted
– BMC/ATC
– Kelowna: not impacted
– DHCP/DNS: RTO in seconds

Page 24

Failure Impact Analysis – Loss of BCNET Harbour Centre

Columns: Jan 2007 | Dec 2007 | Notes

Internet Access
– Burnaby: RTO in seconds
– DTC
– BMC/ATC
– Kelowna: not impacted

Page 25

Backup / Restore Improvements

Disk-to-disk backup to DTC, onto a highly available Data Domain device

Preliminary testing results:
– 30 GB of data consisting of 247,000 files in just under 30 minutes
– Web site restored: 90,000 files, 8.47 GB, in 14 minutes and 6 seconds
– Half of the Banner database restored in 27-35 minutes, compared to 1-2½ hours from tape
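Converting these results into rates makes them easier to compare with other backup targets and with link speeds. The quick check below assumes decimal units (1 GB = 1,000 MB), since the slide does not specify. Because the first run finished in just under 30 minutes, its computed rate is a slight underestimate.

```python
def mb_per_s(gigabytes, seconds):
    """Throughput in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / seconds


bulk = mb_per_s(30, 30 * 60)       # 30 GB in (just under) 30 minutes
web = mb_per_s(8.47, 14 * 60 + 6)  # 8.47 GB in 14 min 6 s
print(f"{bulk:.1f} MB/s, {web:.1f} MB/s")  # 16.7 MB/s, 10.0 MB/s

# Half of the Banner database: 27-35 min from disk vs 60-150 min from tape.
low = 60 / 35    # slowest disk restore vs fastest tape restore
high = 150 / 27  # fastest disk restore vs slowest tape restore
print(f"{low:.1f}x to {high:.1f}x faster")  # 1.7x to 5.6x faster
```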

Page 26

Next Steps

Remainder of 07/08
– Test application-level continuity for Banner, Notes and Public Web
– Set RTO/RPOs based on testing
– Test network failover

08/09
– Increase RTO coverage for other critical applications:
  – WebCT
  – Dynamic portions of Web
  – myBCIT

09/10
– Move DRP datacentre out of earthquake zone
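Setting RTOs from testing can be as simple as taking the slowest measured recovery, adding a safety margin, and rounding up to a convenient boundary. The sketch below only illustrates that approach. The 1.5× margin and the 15-minute rounding are hypothetical choices, not BCIT's method. It uses the 27-35 minute Banner restore times reported on Page 25.

```python
import math
from datetime import timedelta


def candidate_rto(test_durations, margin=1.5, step_minutes=15):
    """Suggest an RTO: the slowest test run times a safety margin,
    rounded up to the next multiple of `step_minutes`."""
    worst = max(test_durations)
    padded_minutes = worst.total_seconds() * margin / 60
    steps = math.ceil(padded_minutes / step_minutes)
    return timedelta(minutes=step_minutes * steps)


# Half-database Banner restores from disk took 27-35 minutes.
runs = [timedelta(minutes=27), timedelta(minutes=35)]
print(candidate_rto(runs))  # 1:00:00
```

Those tests restored only half of the database, so a full restore would need its own timing before an RTO is set.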

Page 27

UVic

Colin Leavett-Brown
– Disaster recovery and business continuity

Page 28

BCNET

Stan Shaw
– Can we take advantage of BCNET’s advanced network to enhance Disaster Recovery projects currently underway in our institutions?

Page 29

A BCNET Collaborative Effort

Goal: Develop strategies and a business case using
– BCNET’s advanced optical network
– BCNET member institutions’ own resources
– and our strengths as an organization
to develop innovative, collaborative solutions that address the risks associated with major data loss due to a wide-scale disaster.

Page 30

How we did it

Determine scope
– Develop a flexible business model to address the need to protect core administrative systems

Determine what is NOT in scope
– We are not seeking to develop an inter-institutional model of business continuity
– Covers only BCNET core and higher education institution members
– While recognizing that research is essential to the life of a university, it would be beyond the scope of this project to develop a comprehensive business plan to recover research and departmental data.

Page 31

Case Study

Ohio State University, University of Cincinnati

Page 32

Ohio State University and University of Cincinnati

– Faced significant challenges:
  – obtaining funding
  – dealing with hardware and software compatibility
  – addressing policies and politics
  – getting legal agreements in place
– Required a phased approach: a working short-term solution
– Create a system flexible enough that other organizations could participate

Page 33

Case Study: OSU and UC (continued)

Solutions:
– Cold sites: reserve rack/floor space for off-site backup at $2,200 US (minimum) per year.
– Warm/hot sites: for open systems, drop-shipped and installed at the same price, $2,200 US per year.

Page 34

Case Study #2

University of California
– 209,000 students, 170,000 staff, 10 campus sites across the state

The challenge: provision a DR solution into independently managed campuses across multiple earthquake zones.
– Would external DR providers assist a university first, should a true crisis occur?

Page 36

Case Study #2 (continued)

Solution: Provision a DR solution internally across the University of California
– A DR pilot was undertaken between University of California San Diego and the Office of the President (Berkeley)

Page 37

Case Study #2 (continued)

Solution:
– Addressed both mainframe and non-mainframe environments
– Mirror each institution’s computer environment to create a seamless DR solution; utilize identical, co-purchased hardware
– Utilize a 10 Gigabit backbone network for high-speed data replication and near-real-time backup
– Non-mainframe: replicate Windows and Linux using SANs
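A back-of-the-envelope calculation shows why a 10 Gigabit backbone makes near-real-time replication plausible. The sketch assumes decimal gigabytes and a hypothetical 70% effective throughput to allow for protocol and storage overhead. Real replication rates depend on the SAN and replication software. The 30 GB example matches the size of BCIT's disk-to-disk test set.

```python
def transfer_seconds(gigabytes, link_gbps=10.0, efficiency=0.7):
    """Seconds to move `gigabytes` (decimal GB) over a `link_gbps` link
    running at an assumed fraction `efficiency` of its line rate."""
    bits = gigabytes * 8e9
    return bits / (link_gbps * 1e9 * efficiency)


print(round(transfer_seconds(30, efficiency=1.0)))  # 24 (full line rate)
print(round(transfer_seconds(30)))                  # 34 (at 70%)
```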

Page 38

A Business Case: Recommendations for BCNET

Principles:
KISS (for now): provisioning shared facilities is much simpler than trying to implement a ‘turn-key’ DR service.
– Run an inter-site pilot that tests the utility of sharing space to co-locate disaster recovery servers
– Develop and sign an inter-site contract with broad applicability across BCNET to address governance and service issues
– Once successful, broaden deployment to mutually placed DR backup servers across BCNET member sites.
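Placing servers mutually implies a reciprocal arrangement, in which each member hosts another member's DR servers. A simple rotation is one way to guarantee that every site both hosts and is hosted, and that no site backs up to itself. This is an illustrative sketch only. The site ordering is hypothetical, and a real assignment would also need to weigh the geographic separation of each pair.

```python
def rotation_pairing(sites):
    """Map each site to the site that hosts its DR servers: the next site
    in the list, wrapping around, so no site hosts its own backups."""
    if len(sites) < 2:
        raise ValueError("need at least two sites")
    return {site: sites[(i + 1) % len(sites)] for i, site in enumerate(sites)}


# Hypothetical ordering that alternates regions so neighbours are far apart.
sites = ["UBC", "UVic", "SFU", "UNBC", "BCIT", "TRU"]
for site, host in rotation_pairing(sites).items():
    print(f"{site} -> {host}")
```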

Page 39

Practical Steps

– We’ve identified basic requirements and potential facilities.
– We’ve worked out a formula to share data centre space across BCNET member sites.
– We’ve worked out the basis for providing services.
– We’ve identified commercial and alternative data centres.
– It enhances, rather than replaces, existing BC and DR efforts.
– It is cost-effective.
– It is achievable: servers can be rapidly deployed, with milestone deliverables and tasks developed with greater confidence that they can be achieved on time and on budget.

Page 40

Thank you!

The Disaster Recovery Working Group

UVic: Colin Leavett-Brown
UBC: Margaret Sayer and Scott Owen
RR: Steve Beaudry
UNBC: Robert Lucas
BCIT: Jaime Garcia
TRU: Wesley Cole and David Burkholder
SFU: Steve Hillman

Page 41

BCNET

Questions?

www.bc.net

Page 42

Interested?

For a copy of the DR business case, contact:

Stan Shaw: [email protected]