Maintaining Business Continuity After Internal and External Incidents
John Duff, Ph.D.
Copyright John Duff 2008. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
2004 Got Our Attention…….
We are here
The class of 2008 was evacuated 4 times during their freshman Year
We are six feet above sea level
Actually lower than......
Old plan was….
Keep teaching until the water is chest deep…..
This is How We Responded - New plan….
PREPARE
RECOVER
CONTINUE
Keep Everyone Safe
Preserve the Enterprise
The ITS plan was developed in this context
Katrina Effect on top of 2004
The effect of Katrina was to cause us to focus on Business Continuity
How can we survive as an
enterprise if our campus is
damaged substantially?
Question & Challenge
Can we transition from a Residential College to
a distributed, virtual college?
What can ITS do to help make this happen?
Components of Strategy
Leadership
Emergency Management Group
Executive Emergency Management Team
Local Emergency Management groups
Equipped with:
Satellite Phones
Aircards
Components of Strategy
Students
Emergency shelters identified
Transportation provided
Faculty
Evacuate
Severe weather syllabus required & posted online
Staff
Follow advice of local emergency officials
ITS Challenge
Maintain full functionality – anywhere, anytime access:
• Web services
• Payment methods
• Remote access to critical business and academic processes
• Email – existing email and means to stay in contact during and after an event
• Support delivery of Academic Program & Library Services
Become a virtual organization
Identified Requirements
Where did we need Hot Fail Over?
• WWW
• Intranet – myEckerd
• WebCT
• CGI
• Webmail
• ECWeb
• LDAP
• Sendmail – manual DNS entry change
Fail over managed by:
Cisco CSS global and local load balancing
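The failover decision itself can be illustrated with a short sketch. This is not the Cisco CSS configuration, which the slides do not show; it is a minimal Python example, assuming two hypothetical site labels ("campus" and "colo") and a health report supplied as plain data, of how a load balancer might prefer the primary site and fall back to the co-location site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str      # hypothetical label, e.g. "campus" or "colo"
    priority: int  # lower number = preferred site


def pick_active_site(sites, healthy):
    """Return the preferred healthy site, or None if every site is down.

    `healthy` maps site name -> bool, as a health probe might report it.
    A site missing from the report is treated as down.
    """
    candidates = [s for s in sites if healthy.get(s.name, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.priority)


# Assumed layout: campus is primary, the co-location site is the fallback.
SITES = [Site("campus", 0), Site("colo", 1)]

if __name__ == "__main__":
    print(pick_active_site(SITES, {"campus": True, "colo": True}).name)
    print(pick_active_site(SITES, {"campus": False, "colo": True}).name)
```

Services that cannot be switched this way, such as Sendmail in the list above, still need the manual DNS entry change.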
Identified Requirements
What additional services can be made available on short notice?
• Administrative Software
• Banner
• Touchnet – payment gateway
• Library database access
• Academic resources
• Wiki
• Online course materials
• Schedules
• Rosters
To Deliver Services Requires Co-location
Selected Peak-10 in Tampa
Factors influencing the decision
- Cost of the pipe (50 Mb Metro Ethernet)
- Proximity
- Elevation - 30’ vs 6’
Co-location Servers
Assigned multiple roles to servers to reduce cost
• Database Sun Fire V240
  • Banner
  • Aims
  • Oracle
  • mysql
  • postgresql
• Mail Sun Fire V210
  • DNS
  • SMTP
  • POP/IMAP
  • LDAP
  • IMP Webmail
• Web Services Sun Fire V210
• Payment/Windows Dell PowerEdge 700
Additional Hardware
Network
Firewall
Switch
VPN
Router
Cisco CSS
Console Access
WTi 16-port Serial Switch
Three 9-pin/RJ45 null-modem adapters
External Modem
Tape Backup attached to Backup Express
Storage System
• Sun StorEdge 6130s replaced single-host RAIDs; before this project, no storage consolidation
• Critical systems (our student information system, billing system, Web and distance learning, and, according to our Board of Trustees, e-mail) now consolidated on SSE 6130
Storage Environment
• Two identical configurations in St. Pete and Tampa
• Single Cisco fiber switch with FCIP gateway
• Sun StorEdge 6130: single tray, 2 TB of storage
• Dual fiber connections to hosts
• Old-school backups: remote ufsdump server or array FS snapshots
Replication Strategies
• Application-based, operating system–based, or array-based?
• Pluses and minuses to each approach; in the end we chose a mix
• Replication of sendmail server was our biggest question mark
• Oracle Data Guard for Oracle on our student information system
• Mix of array replication and other tools (rsync, etc.) for Web services
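For the rsync part of the Web services mix, a small sketch may help. The slides do not give the actual commands, so the paths and host name below are hypothetical; the example only builds an rsync command line (archive mode, compression, deletion of files removed at the source) and defaults to a dry run so nothing is copied until someone opts in.

```python
import shlex


def build_rsync_command(source_dir, remote_host, remote_dir, dry_run=True):
    """Build an rsync command that mirrors source_dir to remote_host:remote_dir.

    -a         archive mode (recursive, preserves permissions and times)
    -z         compress data in transit
    --delete   remove files on the receiver that no longer exist at the source
    --dry-run  report what would change without copying anything
    """
    cmd = ["rsync", "-az", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{remote_host}:{remote_dir}")
    return cmd


if __name__ == "__main__":
    # Hypothetical web root and co-location host name.
    cmd = build_rsync_command("/var/www", "colo-web.example.edu", "/var/www")
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` once the dry-run output looks right. Because `--delete` removes files at the destination, reviewing a dry run first is the cautious choice.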
When do We Execute?
Established a five-day pre-event timeline
Timeline markers: Day 1, 5-day CONE, Day 2, Day 3, 3-day CONE, Day 4, Day 5, EVENT
Power down
Staff and students evacuate campus
24-hour window for staff to evacuate the area
Schedule full backups on key servers, switch Library remote dbs to colo site
Power down campus servers
Activate colo servers
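The server-side steps of the pre-event timeline lend themselves to a scripted checklist. The sketch below is an illustration only: it assumes the slide order is the execution order, and the step actions are hypothetical placeholders that always succeed. The point is the shape of the runner, which stops at the first failed step so that later steps never run after an earlier one has failed.

```python
def run_checklist(steps):
    """Run (name, action) pairs in order and stop at the first failure.

    Each action is a zero-argument callable returning True on success.
    An exception raised by an action is treated as a failure.
    Returns (completed_names, failed_name), where failed_name is None
    if every step succeeded.
    """
    completed = []
    for name, action in steps:
        try:
            ok = bool(action())
        except Exception:
            ok = False
        if not ok:
            return completed, name
        completed.append(name)
    return completed, None


# Step names come from the slide; the lambdas are placeholders (assumption).
PRE_EVENT_STEPS = [
    ("Schedule full backups on key servers", lambda: True),
    ("Switch Library remote dbs to colo site", lambda: True),
    ("Power down campus servers", lambda: True),
    ("Activate colo servers", lambda: True),
]

if __name__ == "__main__":
    done, failed = run_checklist(PRE_EVENT_STEPS)
    for name in done:
        print("done:", name)
    if failed is not None:
        print("stopped at:", failed)
```

In a real runbook each placeholder would be replaced by the script that performs the step.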
Post-event timeline
D-Day +1, D-Day +2, D-Day +3, D-Day +4, D-Day +5
Damage assessment
Staff Returns to campus
Begin power up on Campus
Power up campus servers
Re-sync begins
Novell, Library return to normal
Up to 4 days
Developing a Culture of Testing
ITS
Tabletop exercise
Live tests
Scripts, power down procedures, etc.
Now at <30 minutes to bring up co-location site
Campus-wide test of Unit Plans
An average of 40 users test annually
Blue Sky assignments – test VPN access & query database
Pre/Post meetings
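One way to keep live tests honest is to record each drill against the 30-minute bring-up figure quoted above. The sketch below assumes drills are logged with a start and finish timestamp; the function name and the example times are hypothetical.

```python
from datetime import datetime, timedelta

# Target taken from the slide: co-location site up in under 30 minutes.
TARGET = timedelta(minutes=30)


def drill_result(started, finished, target=TARGET):
    """Return (elapsed, met_target) for one bring-up drill.

    Raises ValueError if the finish time is earlier than the start time,
    which usually means the log entry was recorded incorrectly.
    """
    elapsed = finished - started
    if elapsed < timedelta(0):
        raise ValueError("drill finished before it started")
    return elapsed, elapsed < target


if __name__ == "__main__":
    # Hypothetical drill: started 09:00, co-location site serving at 09:25.
    elapsed, ok = drill_result(datetime(2008, 1, 1, 9, 0),
                               datetime(2008, 1, 1, 9, 25))
    print(elapsed, "met target" if ok else "missed target")
```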
Other Considerations
Mailroom
Stop
Locate
Re-route
Timing
Phone Service
How do we operate without the switch?
Lessons Learned
• More cultural than technical at this point
• Cost is always an issue – how can we best leverage the co-located site and other resources?
• Knowledge transfer and sharing is critical – technology is great, but the single point of failure is an individual
• There is never a good time to test – build a schedule and stay with it
THANK YOU
John Duff, Ph.D. - Acting Director of ITS
727-864-8318
Walter Moore – Senior Systems Administrator
727-864-8318