1
Disaster Recovery & Data Integrity
CPTE 433
John Beckett
2
Disaster Recovery Plan
• Identify risks
• Company’s legal and fiduciary responsibilities
  – …and mission
• Plan for how to handle each risk
3
What’s Your Budget?
Formula: (Cost - Cost after mitigation) x Risk
• Cost: $500,000
• Cost after mitigation: $100,000
• Risk: .001
• Budget for preventing this event should then be $400,000 x .001 = $400.
• Can you prevent this event for that amount?
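The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `mitigation_budget` and its parameter names are assumptions made for the example, and the figures are the ones from the slide.

```python
def mitigation_budget(cost, cost_after_mitigation, risk):
    """Return (cost - cost_after_mitigation) * risk, the slide's formula.

    `risk` is treated as a probability, so it must lie between 0 and 1.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    return (cost - cost_after_mitigation) * risk


# Figures from the slide: $500,000 loss, $100,000 after mitigation, risk .001
budget = mitigation_budget(500_000, 100_000, 0.001)
print(f"Budget for preventing this event: ${budget:,.2f}")
# prints: Budget for preventing this event: $400.00
```

The result is only a ceiling on what is worth spending; whether the event can actually be prevented for that amount is a separate question.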
4
Budgeting Pitfalls
• Your company might not survive an event, so it is not just a “cost benefit” calculation.
  – Common response is to simply ignore what might kill us.
  – That is an improper translation of “what we cannot control we should not waste money trying to control.”
• Presuming that money solves problems.
  – People solve problems. Money is a tool.
  – Perhaps you need to re-think the way you do things.
    • It might even be less expensive to do it right!
5
The Ireland Case
• People focused on risks to equipment, not themselves.
• Nobody was making a judgment of when it was no longer safe for people to stay in the building.
• A fire-safety professional should have been in charge of the operation.
6
A Special Danger
• Generators emit carbon monoxide
  – Colorless
  – Tasteless
  – Kills
• A person going in to save someone who has collapsed may suffer the same fate.
• There have been cases where a string of people have died, each trying to save the previous person.
7
Prepare For Most-Likely Risk
Case of call center
• Exploited advantages of site
  – Good weather
• Evaluated most-likely risk
  – Earthquake
• Survived an event without mishap
8
Data Integrity
• Off-site backup
• How trusted is the site?
• How available is your data to you?
• How available is your data to bad guys?
• How current is the backup?
• How long will it take to recover?
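Two of these questions, how current the backup is and whether the copy still matches the original, can be checked mechanically. The sketch below is an assumption-laden illustration, not a procedure from the lecture: the function names, the 24-hour threshold, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import os
import time


def backup_age_hours(path, now=None):
    """Hours elapsed since the backup file at `path` was last modified."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 3600.0


def is_current(path, max_age_hours=24.0, now=None):
    """True if the backup is no older than `max_age_hours` (assumed policy)."""
    return backup_age_hours(path, now) <= max_age_hours


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, backup_path):
    """True if the original and the backup have identical contents."""
    return sha256_of(original_path) == sha256_of(backup_path)
```

A nightly job could call `is_current` on the newest backup file and `copies_match` on a sample of files, and raise an alert when either check fails. Neither check says anything about how trusted the site is or how long recovery will take; those still need a human answer.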
9
Former Plan at IS
• All data backed up nightly, current backups in fire-proof safe.
• Quarterly backups (at strategic points in the business cycle) stored at auditor’s office.
• Fire procedure involved removing all disks.
10
Newer Plan
• Separate disk farm located across the street
• Connected with gigabit Ethernet
• Duplicates existing data up-to-the-moment
11
Disaster Plan Binder Debacle
• No prominent “DISASTER PLAN” binder.
• Auditors assumed there was no plan.
• Management acted on that assumption.
Was this reasonable?
• No: There was a plan.
• Yes: Without “publishing” the plan, it could easily get lost along the way.
12
The Environment Changes
• Disks no longer had removable packs
• Data spread through many computers, not just a single disk-farm
• Need to revise the plan!
  – Live copies at an alternate location
13
Do People Read the Plan?
• Only if they are afraid not to.
• Favorable reinforcement:
  – Threat of being embarrassed if they don’t know it.
  – Loss of job status (seniority, whatever) – built into contract.
  – Drills
• Unfavorable reinforcement:
  – Lack of connection between the plan and reality
14
When Planning Emergency Operations
What resources are needed?
• Servers
• Power/Environment
• Security
• People
  – Don’t forget they have physical and emotional needs
15
How to Organize Emergency Operations
• Organize according to the service being offered.
• Each service has necessary inputs and outputs – so grouping makes sense.
16
Intellectual Property
• Establishing ownership of IP is a good reason for “forever” backups.
• Your IP is more likely to fit on a CD/DVD than all the other stuff you’d like to back up, so a separate IP CD/DVD may make sense.
17
What’s a Good Backup?
• Reasonable service interruption
• Covers “Working Set” of data
• Do-able for users
• Media are affordable given a generous retention policy
  – Three Zip disks isn’t going to cut it!
• Commodity media if possible: DVD (or CD for small applications). -> BluRay
• Partition your backup logically
• Reader needs to work if your host dies
  – In my case, external BluRay RW device
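Whether media are affordable under a generous retention policy comes down to a disc count. The sketch below is a rough illustration: the function name `discs_needed`, the 40 GB working set, and the 12 retained copies are hypothetical figures, while 4.7 GB and 25 GB are the nominal single-layer capacities of DVD and Blu-ray discs.

```python
import math


def discs_needed(working_set_gb, disc_capacity_gb, retained_copies):
    """Discs required to keep `retained_copies` full backups of the working set."""
    if disc_capacity_gb <= 0:
        raise ValueError("disc capacity must be positive")
    # Each backup needs a whole number of discs, so round up per backup.
    discs_per_backup = math.ceil(working_set_gb / disc_capacity_gb)
    return discs_per_backup * retained_copies


# Hypothetical case: 40 GB working set, 12 backups retained
print(discs_needed(40, 4.7, 12))  # single-layer DVD: 9 discs per backup, 108 total
print(discs_needed(40, 25, 12))   # single-layer Blu-ray: 2 discs per backup, 24 total
```

Multiplying the count by the current price per disc gives the media budget; the number of discs per backup also says something about whether the procedure is do-able for users.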
18
That SSH exploit
• Rule: Never have a no-password login.
• Problem: What about network shares brought up when the machine is booted up?
• Solution: Firewall those shares so they don’t go out. Not a total solution.
  – Supplement with user log-in to the PC.
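One narrow, mechanical check related to the "no-password login" rule is to look for the OpenSSH server option `PermitEmptyPasswords` being set to `yes`. The sketch below assumes that reading of the rule, which the slide does not spell out. The function name and the sample configuration are hypothetical, and the scan is deliberately conservative: it flags every matching line, including any inside `Match` blocks.

```python
def empty_password_lines(config_text):
    """Return 1-based line numbers that set PermitEmptyPasswords to yes."""
    hits = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Keywords are case-insensitive; accept "key value" or "key=value".
        parts = line.replace("=", " ").split()
        if (len(parts) >= 2
                and parts[0].lower() == "permitemptypasswords"
                and parts[1].strip('"').lower() == "yes"):
            hits.append(number)
    return hits


# Hypothetical sshd_config fragment, for illustration only
sample = """\
# example server settings
PasswordAuthentication yes
PermitEmptyPasswords yes
"""
print(empty_password_lines(sample))  # prints: [3]
```

This does not cover the network-share problem on the slide, and it says nothing about accounts that have no password set in the first place; it only catches one server setting that would let such accounts log in over SSH.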
19
Media Relations
• Plan ahead who will handle media relations.
• Be careful what answers you give: they may appear (distorted) in the newspaper tomorrow.
  – Recurring issue: software piracy and security at SAU
  – The job of a reporter is to come up with something “newsworthy.” Beware of “would you say…” questions.
20
What’s a UPS?
• It’s a way to shut down in an orderly manner.
• It’s a fair warning that you need to restore power or shut down.
• It isn’t a way to operate.
• Application: Running computers for a transaction system at an event. The UPS will notify you if somebody kicks the cord loose.
• Laptop bonus: Add up the cost of components for desktop including UPS, and you may very well have a laptop price (and less install cost).
21
The “Next Event” Syndrome
• Often the thing a user does after they discover something is wrong is what prevents recovery.
• Design systems in a “fail-soft” manner.
• Treat people in a “fail-soft” manner.
  – Be more interested in how you can design your system to facilitate getting things right, than who pushed the wrong button.
  – Deming: Abolish “blame”, focus on “process”