
Post on 29-Dec-2015


1

Disaster Recovery & Data Integrity

CPTE 433

John Beckett

2

Disaster Recovery Plan

• Identify risks

• Company’s legal and fiduciary responsibilities

– …and mission

• Plan for how to handle each risk

3

What’s Your Budget?

Formula: (Cost - Cost after mitigation) x Risk

• Cost: $500,000

• Cost after mitigation: $100,000

• Risk: .001

• Budget for preventing this event should then be $400,000 x .001 = $400.

• Can you prevent this event for that amount?
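
The arithmetic on this slide can be sketched in a few lines of Python. The function name and the probability check are illustrative assumptions, not part of the original slides; the figures are the ones given above.

```python
def mitigation_budget(cost, cost_after_mitigation, risk):
    """Spend justified by the slide's formula:
    (cost - cost_after_mitigation) * risk."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    return (cost - cost_after_mitigation) * risk


# Figures from the slide: $500,000 loss, $100,000 after mitigation, risk .001
budget = mitigation_budget(500_000, 100_000, 0.001)
print(f"Mitigation budget: ${budget:,.2f}")  # prints: Mitigation budget: $400.00
```

If the cheapest available mitigation costs more than this figure, the formula alone says not to buy it, which is the limitation the next slide takes up.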

4

Budgeting Pitfalls

• Your company might not survive an event, so it is not just a “cost benefit” calculation.

– Common response is to simply ignore what might kill us.

– That is an improper translation of “what we cannot control we should not waste money trying to control.”

• Presuming that money solves problems.

– People solve problems. Money is a tool.

– Perhaps you need to re-think the way you do things.

• It might even be less expensive to do it right!

5

The Ireland Case

• People focused on risks to equipment, not themselves.

• Nobody was making a judgment of when it was no longer safe for people to stay in the building.

• A fire-safety professional should have been in charge of the operation.

6

A Special Danger

• Generators emit carbon monoxide

– Colorless

– Tasteless

– Kills

• A person going in to save someone who has collapsed may suffer the same fate.

• There have been cases where a string of people have died, each trying to save the previous person.

7

Prepare For Most-Likely Risk

Case of call center

• Exploited advantages of site

– Good weather

• Evaluated most-likely risk

– Earthquake

• Survived an event without mishap

8

Data Integrity

• Off-site backup

• How trusted is the site?

• How available is your data to you?

• How available is your data to bad guys?

• How current is the backup?

• How long will it take to recover?
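
The question "How current is the backup?" can be checked mechanically. The following is a minimal sketch, assuming the backup is a single file whose modification time reflects when it was written. The function names and the 24-hour default are hypothetical, not taken from the slides.

```python
import os
import time


def backup_age_hours(path, now=None):
    """Hours elapsed since the backup file at `path` was last modified."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 3600.0


def is_backup_current(path, max_age_hours=24.0, now=None):
    """True if the backup is no older than the allowed age (assumed policy)."""
    return backup_age_hours(path, now) <= max_age_hours
```

A check like this answers only the currency question. Whether the file can actually be restored, and how long that takes, still has to be tested by doing a restore.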

9

Former Plan at IS

• All data backed up nightly, current backups in fire-proof safe.

• Quarterly backups (at strategic points in the business cycle) stored at auditor’s office.

• Fire procedure involved removing all discs.

10

Newer Plan

• Separate disk farm located across the street

• Connected with gigabit Ethernet

• Duplicates existing data up-to-the-moment
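
As a rough sketch of what a gigabit link implies for an initial copy or a full restore, the estimate below converts a data volume into transfer time. The slides give no data volume, so the 500 GB figure and the 70% link efficiency are assumptions chosen purely for illustration.

```python
def transfer_time_seconds(data_bytes, link_bits_per_second=1_000_000_000,
                          efficiency=0.7):
    """Estimated seconds to move `data_bytes` over a network link.

    `efficiency` is an assumed fraction of the nominal line rate that is
    actually achieved after protocol overhead and contention.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_bits_per_second * efficiency)


# Hypothetical 500 GB data set over gigabit Ethernet
hours = transfer_time_seconds(500 * 10**9) / 3600
print(f"Estimated full copy: {hours:.1f} hours")
```

Continuous duplication only has to keep up with the rate of change, but a full resynchronization after an outage is bounded by this kind of estimate.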

11

Disaster Plan Binder Debacle

• No prominent “DISASTER PLAN” binder.

• Auditors assumed there was no plan.

• Management acted on that assumption.

Was this reasonable?

• No: There was a plan.

• Yes: Without “publishing” the plan, it could easily get lost along the way.

12

The Environment Changes

• Disks no longer had removable packs

• Data spread through many computers, not just a single disk-farm

• Need to revise the plan!

– Live copies at an alternate location

13

Do People Read the Plan?

• Only if they are afraid not to.

• Favorable reinforcement:

– Threat of being embarrassed if they don’t know it.

– Loss of job status (seniority, whatever), built into contract.

– Drills

• Unfavorable reinforcement:

– Lack of connection between the plan and reality

14

When Planning Emergency Operations

What resources are needed?

• Servers

• Power/Environment

• Security

• People

– Don’t forget they have physical and emotional needs

15

How to Organize Emergency Operations

• Organize according to the service being offered.

• Each service has necessary inputs and outputs – so grouping makes sense.

16

Intellectual Property

• Establishing ownership of IP is a good reason for “forever” backups.

• Your IP is more likely to fit on a CD/DVD than all the other stuff you’d like to back up, so a separate IP CD/DVD may make sense.

17

What’s a Good Backup?

• Reasonable service interruption

• Covers “Working Set” of data

• Do-able for users

• Media are affordable given a generous retention policy

– Three Zip disks aren’t going to cut it!

• Commodity media if possible: DVD (or CD for small applications). -> BluRay

• Partition your backup logically

• Reader needs to work if your host dies

– In my case, external BluRay RW device
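
Whether media are affordable depends on how many discs one backup of the working set consumes. The sketch below uses nominal single-layer capacities. The function name, the 90% fill ratio, and the 100 GB working set are assumptions for illustration, not figures from the slides.

```python
import math

# Nominal single-layer capacities in bytes (decimal units, as marketed).
MEDIA_CAPACITY_BYTES = {
    "CD": 700 * 10**6,
    "DVD": 4_700_000_000,
    "BluRay": 25_000_000_000,
}


def discs_needed(working_set_bytes, medium, fill_ratio=0.9):
    """Number of discs required for one full backup of the working set.

    `fill_ratio` is an assumed safety margin for file-system overhead.
    """
    usable = MEDIA_CAPACITY_BYTES[medium] * fill_ratio
    return math.ceil(working_set_bytes / usable)


# Hypothetical 100 GB working set
for medium in MEDIA_CAPACITY_BYTES:
    print(medium, discs_needed(100 * 10**9, medium))
```

Multiplying the disc count by the number of backups kept under the retention policy gives the media budget, and a large count is a signal to partition the backup or move to a denser medium.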

18

That SSH exploit

• Rule: Never have a no-password login.

• Problem: What about network shares brought up when the machine is booted up?

• Solution: Firewall those shares so they don’t go out. Not a total solution.

– Supplement with user log-in to the PC.
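
The rule "never have a no-password login" can be audited rather than trusted. The slides do not say which systems were involved, so the following is only a sketch for one case: an OpenSSH server, whose `sshd_config` has a `PermitEmptyPasswords` keyword. The function name is hypothetical, and the parser is deliberately simplified (it ignores `Match` blocks, `Include` files, and the `keyword=value` form).

```python
def allows_empty_passwords(sshd_config_text):
    """Return True if the config text enables empty-password logins.

    sshd uses the first value it finds for a keyword, and keywords are
    case-insensitive, so the first PermitEmptyPasswords line decides.
    """
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == "permitemptypasswords":
            return parts[1].lower() == "yes"
    return False  # keyword absent: the OpenSSH default is "no"


sample = "Port 22\nPermitEmptyPasswords no\n"
print(allows_empty_passwords(sample))  # prints: False
```

This covers only the SSH side. Shares mounted automatically at boot, the problem raised on this slide, need their own check.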

19

Media Relations

• Plan ahead who will handle media relations.

• Be careful what answers you give: they may appear (distorted) in the newspaper tomorrow.

– Recurring issue: software piracy and security at SAU

– The job of a reporter is to come up with something “newsworthy.” Beware of “would you say…” questions.

20

What’s a UPS?

• It’s a way to shut down in an orderly manner.

• It’s a fair warning that you need to restore power or shut down.

• It isn’t a way to operate.

• Application: Running computers for a transaction system at an event. The UPS will notify you if somebody kicks the cord loose.

• Laptop bonus: Add up the cost of components for desktop including UPS, and you may very well have a laptop price (and less install cost).
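
The first three points on this slide amount to a small decision rule: keep running on mains power, treat battery power as a warning, and shut down while there is still time to do it cleanly. Here is a minimal sketch of that rule. The function, its parameters, and the 60-second margin are hypothetical and do not correspond to any particular UPS vendor's interface.

```python
def ups_action(on_battery, runtime_remaining_s, shutdown_duration_s,
               safety_margin_s=60):
    """Decide what to do given the UPS state.

    Returns "run" on mains power, "warn" while on battery with time to
    spare, and "shutdown" once remaining runtime approaches the time an
    orderly shutdown needs (plus an assumed safety margin).
    """
    if not on_battery:
        return "run"
    if runtime_remaining_s <= shutdown_duration_s + safety_margin_s:
        return "shutdown"
    return "warn"


print(ups_action(True, 600, 120))  # prints: warn
print(ups_action(True, 150, 120))  # prints: shutdown
```

Nothing in the rule tries to keep working through the outage, which matches the slide's point that a UPS is not a way to operate.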

21

The “Next Event” Syndrome

• Often the thing a user does after they discover something is wrong is what prevents recovery.

• Design systems in a “fail-soft” manner.

• Treat people in a “fail-soft” manner.

– Be more interested in how you can design your system to facilitate getting things right, than who pushed the wrong button.

– Deming: Abolish “blame”, focus on “process”

22

Post-Mortem

• No-Fault meeting

• Look honestly at causes

• Get accurate info on actions that were taken

• Evaluate results of those actions

• Plan for better reactions in the future

• Separate meetings for techs and users