Disaster Recovery at the University of Alberta

Rob Lake (Presenter and Co-producer)
Information Technology Planning and Forecasting Officer
Office of the Vice Provost (Information Technology)
University of Alberta
[email protected]
www.vpit.ualberta.ca

Co-produced with (and thanks to):

Marika Bourque
Associate CITO and Executive Director of AICT
University of Alberta

Kevin Moodie
Director of AICT
University of Alberta

Brian Acheson
Director of AICT
University of Alberta

2007 EDUCAUSE Top Ten IT Issues

1. Funding IT

2. Security

3. Administrative/ERP/Information Systems

4. Identity/Access Management

5. Disaster Recovery/Business Continuity

6. Faculty Development, Support and Training

7. Infrastructure

8. Strategic Planning

9. Course/Learning Management Systems

10. Governance, Organization and Leadership for IT

Source: EDUCAUSE Review, May/June 2007

2006 ECAR Survey Results

“If central IT systems and services were not operational at my institution, business units could carry out essential operations.”

31.6% - Strongly Disagree
35.0% - Disagree
10.0% - Neutral
18.6% - Agree
2.7% - Strongly Agree
2.1% - Don't Know

Source: Ron Yanosky, ECAR Symposium, 30 June 2006

2006 ECAR Survey Results

“My institution is prepared to restore centrally controlled systems in the event of a disruption.”

5.0% - Strongly Disagree
20.7% - Disagree
22.3% - Neutral
44.6% - Agree
6.5% - Strongly Agree
0.9% - Don't Know

Source: Ron Yanosky, ECAR Symposium, 30 June 2006
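The two survey slides are easier to compare once the "strongly" answers are folded into plain agree and disagree totals. The sketch below does that arithmetic using only the percentages on the slides; the dictionary and function names are illustrative and not part of the ECAR material.

```python
# Response shares (percent) transcribed from the two 2006 ECAR survey slides.
could_operate_without_it = {
    "Strongly Disagree": 31.6,
    "Disagree": 35.0,
    "Neutral": 10.0,
    "Agree": 18.6,
    "Strongly Agree": 2.7,
    "Don't Know": 2.1,
}
prepared_to_restore = {
    "Strongly Disagree": 5.0,
    "Disagree": 20.7,
    "Neutral": 22.3,
    "Agree": 44.6,
    "Strongly Agree": 6.5,
    "Don't Know": 0.9,
}


def net_shares(responses):
    """Return (disagree, agree) totals, each including the 'strongly' answers."""
    disagree = responses["Strongly Disagree"] + responses["Disagree"]
    agree = responses["Agree"] + responses["Strongly Agree"]
    return round(disagree, 1), round(agree, 1)


for label, responses in [
    ("Business units could operate without central IT", could_operate_without_it),
    ("Institution is prepared to restore central systems", prepared_to_restore),
]:
    disagree, agree = net_shares(responses)
    print(f"{label}: {disagree}% disagree, {agree}% agree")
```

Read together, about two thirds of respondents doubted that business units could operate without central IT, while only about half felt prepared to restore those systems.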

ECAR Most Critical Services

• Campus Internet connection
• Institutional Web site
• Campus network
• E-mail
• Voice telephony
• Course management system
• Recovery time objective (RTO): 48 hours or less

Source: ECAR Research Bulletin Volume 2007 Issue 4

Disaster Recovery at the UofA

• Four components to disaster recovery plan:
  - Academic information systems
  - Administrative information systems
  - Off-site data recovery centre
  - Emergency notification

Academic Information Systems

• AICT Disaster Recovery Overview plan completed in March 2007

• Core academic services:
  - Voice and data network connectivity
  - Web
  - Email / Webmail
  - Telephony
  - E-Learning (WebCT)
  - DNS
  - Identity Management
  - AFS

Academic Information Systems

• Recovery Time Objective: 48 hours
• Current plan requires a hot site to meet RTO for core academic services
• Will investigate warm site possibilities and virtualization opportunities in the future

Academic Information Systems

• Overall requirements (worst case):
  - $6.0 million for basic infrastructure
  - $5.6 million to meet specified RTO of critical services
  - $9.4 million for restoration of secondary services
  - Total cost: $21.0 million
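As a quick check on the worst-case figures, the short sketch below adds the three line items and shows each one's share of the total. It uses only the dollar amounts on the slide; the variable names are illustrative, and `Decimal` is used so the sum is exact.

```python
from decimal import Decimal

# Worst-case cost items from the slide, in millions of dollars.
cost_items_millions = {
    "basic infrastructure": Decimal("6.0"),
    "meeting the RTO for critical services": Decimal("5.6"),
    "restoring secondary services": Decimal("9.4"),
}

total_millions = sum(cost_items_millions.values(), Decimal("0"))
print(f"Total: ${total_millions} million")  # Total: $21.0 million

for item, cost in cost_items_millions.items():
    share = cost / total_millions * 100
    print(f"{item}: ${cost} million ({share:.1f}% of total)")
```

The items do sum to the $21.0 million quoted, and restoring secondary services is the largest single component.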

Academic Information Systems

• Plan considers two scenarios:
  - Restoration with a secondary hot site
  - Restoration without a secondary hot site

Academic Information Systems

• Restoration with a secondary hot site:
  - Basic infrastructure already in place
  - Fully functioning equipment for the core academic services already in place
  - Installation of secondary services
  - 48-hour RTO for the core services
  - Minimum 3-month restoration timeframe for the secondary services

Academic Information Systems

• Restoration without a secondary hot site:
  - Requires selection of a hot site
  - Installation of basic infrastructure
  - Installation of core services
  - Installation of secondary services
  - 3 to 6 month downtime for core services
  - Up to 9 month downtime for secondary services
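The gap between the two scenarios is clearer when the core-service downtime is put on the same scale as the 48-hour RTO. The sketch below is a rough comparison only: it assumes a 30-day month to convert months to hours (the slides do not specify one), and the names are illustrative.

```python
RTO_HOURS = 48
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month, not stated in the plan

# Core-service downtime per scenario as (best case, worst case) in hours.
core_downtime_hours = {
    "with a secondary hot site": (48, 48),
    "without a secondary hot site": (3 * HOURS_PER_MONTH, 6 * HOURS_PER_MONTH),
}


def meets_rto(downtime_hours, rto_hours=RTO_HOURS):
    """True when the downtime is within the recovery time objective."""
    return downtime_hours <= rto_hours


for scenario, (best, worst) in core_downtime_hours.items():
    verdict = "meets" if meets_rto(worst) else "misses"
    print(f"Core services {scenario}: {best} to {worst} hours, "
          f"{verdict} the {RTO_HOURS}-hour RTO "
          f"(best case is {best / RTO_HOURS:.0f}x the RTO)")
```

Under that assumption, even the best case without a hot site (3 months) is 45 times the RTO, which is why the current plan depends on a hot site.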

Administrative Information Systems

• Outsourced to IBM Global Services since 2000
• Relocated to Markham, Ontario in 2005
• Warm site options in Montreal and Edmonton
• Deferred until outsourcing contract renewal in 2010

Regional Data Centre

• Toma and Bouma Management Consultants and Stantec engaged in April 2006 to develop a Business Plan for a new Disaster Recovery Centre (DRC)

• Preliminary Business Case completed in late 2006

• Approved by Vice-Presidents in April 2007

Data Centre Standards

• Defined by the Telecommunications Infrastructure Standard for Data Centers (TIA 942)

• Classifies data centers into Tiers
• Each Tier offers a higher degree of sophistication and reliability

Tier 1

• Basic: 99.671% availability
• Annual downtime of 28.8 hours
• Susceptible to disruptions from both planned and unplanned activity
• Single path for power and cooling distribution, no redundant components (N)
• May or may not have a raised floor, UPS or generator
• 3 months to implement

Tier 2

• Redundant Components: 99.741% availability
• Annual downtime of 22.0 hours
• Less susceptible to disruption from both planned and unplanned activity
• Single path for power and cooling distribution, includes redundant components (N+1)
• Includes raised floor, UPS and generator
• 3 to 6 months to implement

Tier 3

• Concurrently Maintainable: 99.982% availability
• Annual downtime of 1.6 hours
• Enables planned activity without disrupting computer hardware operation, but unplanned events will still cause disruption
• Multiple power and cooling distribution paths but with only one path active, includes redundant components (N+1)
• Includes raised floor, UPS and generator
• 15 to 20 months to implement

Tier 4

• Fault Tolerant: 99.995% availability
• Annual downtime of 0.4 hours
• Planned activity does not disrupt critical load and data center can sustain at least one worst-case unplanned event with no critical load impact
• Multiple active power and cooling distribution paths with redundant components
• 15 to 20 months to implement
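The annual downtime figures on the Tier slides follow from the availability percentages. As a minimal sketch, assuming a non-leap year of 8,760 hours and illustrative names, the conversion looks like this:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored (assumption)

# Availability percentages as quoted on the Tier 1 to Tier 4 slides.
TIER_AVAILABILITY = {
    "Tier 1": 99.671,
    "Tier 2": 99.741,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}


def annual_downtime_hours(availability_percent: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


for tier, availability in TIER_AVAILABILITY.items():
    print(f"{tier}: {annual_downtime_hours(availability):.1f} hours/year")
```

With these inputs the calculation reproduces the quoted 28.8, 1.6 and 0.4 hours for Tiers 1, 3 and 4. For Tier 2 it gives roughly 22.7 hours rather than the 22.0 on the slide, so the published Tier 2 pair appears to be rounded or derived differently.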

Regional Data Centre

• Data Centre Requirements:
  - Tier 3
  - 18,000 sq. ft.
    • 6000 sq. ft. for servers and racks
    • 3000 sq. ft. for future growth
    • 9000 sq. ft. for support
  - Minimum 5 km from primary computing centre
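The floor-space breakdown can be checked the same way. This small sketch uses only the areas on the slide (names are illustrative) and shows how the 18,000 sq. ft. divides between uses:

```python
# Floor-space allocation from the slide, in square feet.
floor_space_sq_ft = {
    "servers and racks": 6000,
    "future growth": 3000,
    "support": 9000,
}

total_sq_ft = sum(floor_space_sq_ft.values())
print(f"Total: {total_sq_ft:,} sq. ft.")  # Total: 18,000 sq. ft.

for use, area in floor_space_sq_ft.items():
    print(f"{use}: {area:,} sq. ft. ({area / total_sq_ft:.0%} of total)")
```

Half of the footprint is support space, and the growth allowance is half the size of the initial server and rack area.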

Regional Data Centre

• Options:
  1. Exchange computing centre space with other institutions
  2. Lease space from service providers
  3. Build new DRC alone
  4. Build new DRC with public and / or private partners

Option 1

1. Exchange computing centre space
• Minimal exchange with U of Calgary
• No space in either computing centre
• Reliance on external staff
• Would require new capital investment

Option 2

2. Lease space from service provider
• Four vendors surveyed
• Lack of capacity in Alberta at this time, but that is changing
• Vendors would consider building a facility in the Edmonton area if they could find an “anchor” tenant
• Costs unclear, but would be about $3.5 million per year

Option 3

3. Build new DRC alone
• Requires provincial funding assistance
• Unlikely for University of Alberta only

Option 4

4. Build new DRC with public and / or private partners (P3 arrangement)
• Northern Alberta post-secondary institutions
• Government of Alberta
• City of Edmonton
• Capital Health
• TELUS / Epcor / etc.

Option 4

• 16 Alberta post-secondary institutions surveyed:
  - 12 responded
  - 5 have no plan
  - 6 have a plan in progress
  - 1 plan completed
• 8 interested in a regional solution
• 4 “somewhat interested”
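The survey counts above can be turned into rates with a few lines of arithmetic. The sketch below uses only the counts on the slide; the names and the helper function are illustrative.

```python
surveyed = 16
responded = 12

# Counts among the institutions that responded, as listed on the slide.
plan_status = {"no plan": 5, "plan in progress": 6, "plan completed": 1}
regional_interest = {"interested": 8, "somewhat interested": 4}


def share(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


print(f"Response rate: {share(responded, surveyed):.0f}%")
for status, count in plan_status.items():
    print(f"{status}: {count} of {responded} ({share(count, responded):.0f}%)")
for level, count in regional_interest.items():
    print(f"{level}: {count} of {responded} ({share(count, responded):.0f}%)")
```

Both breakdowns account for all 12 respondents, so every institution that replied expressed at least some interest in a regional solution, and only one had a completed plan.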

Regional Data Centre

• 30,000 sq. ft. facility required
• Capital costs range from $12 million to $36 million (average: $22 million)
• Operating costs range from $1 million to $4 million per year (average: $3.5 million)
• Better chance to be funded by provincial government
• Governance model required
• Rural or urban, but travel time important

Regional Data Centre

• Working group established in August 2008 between Government of Alberta, City of Edmonton, Capital Health and the University of Alberta

• Four meetings held

• Government of Alberta leading the initiative

• Consultants to finish long term strategy by end of February

• 50,000 sq. ft. Tier 3 facility for 20 year period

Regional Data Centre

• Four possible locations considered
• Location needs to satisfy Provincial Auditors
• May still involve a P3 model
• Jubilee Auditorium model

Risk Mitigation

• 3000 square foot server room to open in Enterprise Square in March 2008

• Lights out facility
• Limited hot site capability – storage of offsite backup tapes
• Intended for building tenants – one room per building
• Green computing
• Virtualization
• IT Principles of Operation

Emergency Communication System

• To be implemented by the start of the 2008/09 academic year

• Emergency Communication Work Group established in November 2007

• Work in progress on an Emergency Communication Plan

• RFP for an alert system to be released by the end of February 2008

Getting the Message Out

• Home page announcement
• Email
• Telephony (cell and VoIP-enabled phones)
• Facebook
• Campus and local radio / media
• Sirens
• Flashing lights

Notification Software Criteria

• Flexible
• Easy-to-use
• Continuously available (24x7x365)
• Accommodate two-way communications
• Accessible from multiple locations
• Handle high volumes for calls or messages within a reasonable timeframe
• Support an educational institution environment

Getting Started…

• This affects everyone – need buy-in from all constituents on campus

• Roll-out will include a sign-up campaign and a public awareness campaign

• Many roll-out strategies will be employed
• Several emergency exercises have recently been held

Summary

• Still an outstanding IT liability
• No inexpensive solution available
• Lack of capability for service providers
• Partnership for a new regional facility the best option at this time
• Some mitigation with the Enterprise Square server room
• Currently exploring many partnership options