disaster recovery - university at albany, suny · 2014-08-07 · sanjay goel, school of business,...
Disaster Recovery
Sanjay Goel, School of Business
University at Albany, SUNY
Sanjay Goel, School of Business, University at Albany, SUNY
Disasters: Definitions
• A disaster is an event that may lead to undesirable subsequent events and can cause large-scale destruction and loss of property, life, etc.
• Disasters are catastrophic events, as opposed to normal failures that can be handled by controls imposed in organizations
• Disasters can be natural or man-made
Tsunami, Thailand, 2004 (public domain, by creator David Rydevik)
Disasters: Natural Disasters
• A natural disaster is the consequence of a naturally occurring physical event (e.g. volcanic eruption, earthquake, landslide) that may lead to significant damage to life, property or operations
Mount Pinatubo eruption, 1991 (U.S. Federal Govt., public domain)
Tsunami in Sumatra, 2004 (U.S. Federal Govt., public domain)
Disasters: Man-made Disasters
• Disasters involving human intent, negligence or error, or involving the failure of a system, are called human-made disasters.
Hurricane Katrina, 2005
Space Shuttle Challenger, 1986 (NASA, public domain)
September 11 Disaster Case Study
Disasters: September 11 - lessons
• People and Information
– Virtually everything else was replaceable or re-creatable
• Email was vital
• Communications were difficult
• Crisis Management became critical
– command post and friends
• Communicate well-being of company
• Finances are strained
Disaster Recovery: September 11 - lessons
• Alternate workplaces
• IT issues were significant
– Tapes were inaccessible, poor backup, slow recovery
– Disaster recovery staff were not dispersed in some cases
– Lack of automation
• Paper records lost
• Supply chain severely impacted
Disaster Recovery: September 11 - lessons
• NY economic impact = US$83B
• 57,000 jobs lost by 2003
• 30% of office space lost in NY
• 25%: power outage of over 8 hours (since 1997)
• Key needs during disasters
– People
– Information Technology
– Facilities
– Connectivity
– Supply Chain
Disaster Recovery: Impact on Organizations
• September 11 and Hurricane Katrina happen once in a while!
• Disasters can happen around you in everyday life.
Disaster Recovery: Impact on Organizations
Impact areas: lost revenue, business interruption, competitiveness, litigation, company reputation
• E-commerce down; applications down; lost billing records; lost business information
• Used against you; lost business; lost market share; higher expenses; opportunity costs
• Customer perception; investor uncertainty; lender uncertainty; hiring slowdown; employee turnover; impact to brand and image
• End-users cannot do their jobs; IT operations disrupted; customers cannot access data; suppliers cannot complete service; higher phone volume; lost orders; customer care calls disconnected
• Investor filings; supplier misunderstandings; customer contracts unmet; service levels unmet
Disaster Recovery: Events that can lead to disasters
• Logical Outage
– Software bug
– Virus/hack
– Data corruption
– Accidental deletion of data
– DoS attack
• Component Outage
– CPU fault
– Disk failure
– Network card failure
– Software
– Fiber cuts
• Damage to Premises
– Flooding / water leaks
– Storms
– Hurricane
– Fire
– Power outage
– Terrorism/war
Disaster Recovery: Consequences of Disasters
• Tangible losses
1. Employee productivity loss (62%)
2. Data loss (43%)
3. Reduction in profits (40%)
4. Damage to customer relationships (38%)
5. Reduction in revenue (27%)
• Intangible losses
– Reputation
– Market
– Criminal Liability
– Customer Satisfaction
– Stock Price
– Brand Equity
Source: VERITAS Disaster Recovery Research, Sept. 2004
Disaster Recovery: Scenario 1: Small Organization
• Local organization
– No data center facilities
– Limited budget/resources available
• Small closet with non-enterprise equipment
• Sprinkler system malfunction
– Sprinkler soaks equipment
– Servers short out
– DSL router crashes
– HD head crash
– No personnel injured
Disaster Recovery: Scenario 2: Large Organization
• National/Multi-national organization
– Data center facilities
– Large budget
– Multiple personnel
• Data center in large city
• Bomb/Explosion
– Explosion in data center building
– Some personnel injured
– Most equipment destroyed
– Onsite backups destroyed
Disaster Recovery: Levels
• Large organizations typically have a two-level disaster recovery plan.
• Level 1: Build enough redundancy in network and equipment to recover from a minor disaster, such as loss of a major server or portion of the network (Business Continuity)
• Level 2: Create contingency plans in case the services are completely disabled (Contingency Planning)
Disaster Recovery: Business Continuity
• Business Continuity Planning involves identifying the potential impacts of catastrophic failures that threaten the survival of an organization. It provides a framework for building resilience and the capability for an effective response that protects organizational assets, including property, reputation and brand value (adapted from: British Standards Institution PAS 56)
Disaster Recovery: Business Contingency Planning (BCP)
• BCP reduces the impact of business interruption to an acceptable level following large-scale disruptions due to catastrophic failures, by resuming interrupted business functions.
• These may include
– Recovering operations using alternate equipment
– Performing affected business processes using manual methods
• It also helps management maintain customer confidence and service satisfaction: crisis management control can help the corporation maintain market share and can provide a basis for promoting its industry image.
Disaster Recovery: Business Contingency Planning Cont’d.
• Business Contingency Planning provides a control on revenue loss and cash flow exposures during any business interruption.
• Each business function is analyzed to define the consequences of an outage of service in quantifiable financial terms, operational impacts, and legal or regulatory restrictions.
• These consequences are then assessed by management, who define the point at which the consequences become unacceptable. That point becomes the recovery time frame.
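The idea of a recovery time frame can be sketched numerically. This is only an illustration: it assumes losses accrue linearly with outage time, and the function name, business functions and dollar figures are all hypothetical.

```python
def recovery_time_frame(hourly_loss, max_acceptable_loss):
    """Hours of outage after which cumulative loss reaches management's limit.

    Assumes (for illustration) that loss grows linearly with outage time.
    """
    if hourly_loss <= 0:
        raise ValueError("hourly_loss must be positive")
    return max_acceptable_loss / hourly_loss

# Hypothetical functions: (loss per hour of outage, loss management will tolerate)
functions = {
    "order entry": (5_000.0, 20_000.0),
    "payroll": (500.0, 12_000.0),
}

time_frames = {name: recovery_time_frame(*figures) for name, figures in functions.items()}
for name, hours in time_frames.items():
    print(f"{name}: must be recovered within {hours:.0f} hours")
```

A function with a shorter time frame needs a faster (and usually costlier) recovery solution.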
Disaster Recovery: Business Contingency Planning Cont’d.
• BCP identifies recovery alternatives to restore critical business functions, which are weighed using cost-benefit analysis
• Solutions are selected to obtain a balance between acceptable potential losses and acceptable one-time and annual costs
• A recovery plan is developed around the recovery solution authorized by management
• The recovery plan is exercised to train the recovery organization, to define changes necessary in the plan to strengthen it, and to provide a tested vehicle which, when executed, will permit an effective resumption of interrupted business functions or computer operations
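One way to weigh recovery alternatives is to compare their total cost over a planning horizon. The sketch below is a simplified assumption, not a method given in the slides: the `Alternative` class, the five-year horizon and all figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alternative:
    name: str
    one_time_cost: float         # setup or purchase cost
    annual_cost: float           # recurring cost per year
    expected_annual_loss: float  # residual loss still expected per year

    def total_cost(self, years: int) -> float:
        # One-time cost plus recurring cost and residual loss over the horizon
        return self.one_time_cost + years * (self.annual_cost + self.expected_annual_loss)

def cheapest(alternatives, years=5):
    """Pick the alternative with the lowest total cost over the horizon."""
    return min(alternatives, key=lambda a: a.total_cost(years))

# Hypothetical figures for illustration only
options = [
    Alternative("do nothing", 0, 0, 200_000),
    Alternative("cold site", 50_000, 20_000, 80_000),
    Alternative("hot site", 300_000, 120_000, 10_000),
]
best = cheapest(options)
print(best.name, best.total_cost(5))
```

With these numbers the middle option wins; a different horizon or loss estimate can change the ranking, which is why management has to authorize the final choice.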
Disaster Recovery: BCP Objectives
• Ensure continuity and survival of the business, protect corporate assets, provide management control of risks and exposures, provide preventative measures where appropriate, and take proactive management control of any business interruption.
• The business continuity plan answers several questions
– How do I reestablish my business function?
– What is a disaster?
– When do the impacts begin?
– How much loss can be tolerated?
– What are the options?
– What will a recovery plan cost?
– How much is enough?
Disaster Recovery: External Vendors
• Contract with professional disaster recovery firms to provide second-level support for major disasters.
• Disaster recovery firms offer services such as secure storage for backups and a complete networked data center that clients can use if their network is destroyed.
• Full services are expensive, but worthwhile when disruption can cause large losses of revenue
Plan
Disaster Recovery: The Plan
• Disaster Recovery Plan
– Describes how an organization deals with potential disasters
– Consists of precautions taken so that the effects of a disaster are minimized and the organization is able to maintain or quickly resume mission-critical functions
– Planning involves an analysis of business processes and continuity needs; it may also include a significant focus on disaster prevention.
• Critical organizational assets include
– People
– Data and systems
– Communications & Networking
• Challenges include
– Data growth at 50-80% per year (Gartner)
– Increasing complexity of IT infrastructure
Disaster Recovery: Objectives
• Plan responses to possible disasters, providing for partial or complete recovery of all data, application software, network components, and physical facilities.
• Develop backup and recovery controls that enable an organization to recover its data and restart its application software should some part of the network fail.
• Address anomalous situations, such as destruction of the main database or of the data center itself.
Disaster Recovery: Disaster Recovery Plan (Elements)
• Names of responsible individuals
• Staff assignments and responsibilities
• List of priorities of “fix-firsts”
• Location of alternative facilities
• Recovery procedures for data communications facilities, servers and application systems
• Actions to be taken under various contingencies
• Manual processes
• Updating & testing procedures
• Safe storage of data, software and the plan itself
Disaster Recovery: Prevention (Best Solution)
• Create network redundancy
• Protect from natural disasters
– Set your office away from a flood plain
• Prevent theft
• Prevent computer virus attacks
• Prevent DoS attacks
• Redundancy & fault tolerance
– Uninterruptible power supplies (UPS)
– Fault-tolerant servers
– Disk mirroring
– Disk duplexing
Disaster Recovery: Natural Disasters (Prevention)
• The best solution is to have a completely redundant network that duplicates every network component, but in a different location.
• Stopping disasters is difficult. The most fundamental principle is to decentralize network resources.
• Steps should be taken based on the expected risk of each specific type of disaster (e.g. flood, earthquake, etc.)
Disaster Recovery: Equipment Theft (Prevention)
• Equipment theft can lead to serious disruptions if adequate precautions against it are not taken.
– Industry sources indicate that about $1 billion is lost each year to theft of computers and related equipment (USA statistic).
– For this reason, security plans should include an evaluation of ways to prevent equipment theft.
Disaster Recovery: Prevention (Best Solution)
• Viruses and worms can lead to catastrophic failures by disrupting networks and destroying the integrity of data and systems
• Several different types of viruses and worms exist
– Macro viruses, which attach themselves to documents and become active when the files are opened, are common.
– These can also facilitate bot infections in organizations
– Anti-virus software packages are available to check disks and files to ensure that they are virus-free.
– Incoming e-mail messages are one of the most common sources of viruses. Attachments to incoming e-mail should be routinely checked for viruses
– Use of e-mail filtering programs should be considered
Disaster Recovery: Assessing Needs
[Figure: Recovery Point and Recovery Time axes, each scaled in weeks, days, hours, minutes and seconds, with technologies placed along them: tape backup, periodic replication, snapshots, replication, clustering, tape restore]
• Recovery Point Objective
– Amount of data loss acceptable
– The point to which data must be restored
• Recovery Time Objective
– Amount of time it takes to come back online
– The time by which data must be restored
Disaster Recovery: Data Backup
• How often do you back up your data?
– Backup is the foundation of any good DR strategy, as it is a point-in-time snapshot of data. The more often you back up, the safer you are.
• How much data loss can you afford? (RPO)
– Data is critical to the success of organizations and must be protected at all costs.
– New laws and regulations also dictate requirements for acceptable data loss.
– Not all data is created equal, and because there’s a high cost associated with safeguarding data, pick and choose what to protect.
• How much downtime can you afford? (RTO)
– Clustering (fastest) and Bare Metal Restore (fast) simply automate the tasks of getting back to business faster. The more critical the system, the higher the need for automation.
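The RPO and RTO questions can be turned into a simple check. The sketch below is illustrative only: the function names and sample intervals are hypothetical, and it assumes the worst-case data loss of a periodic backup equals the interval between backups.

```python
from datetime import timedelta

def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic backups, everything written since the last backup can be lost."""
    return backup_interval

def meets_objectives(backup_interval, restore_time, rpo, rto) -> bool:
    """True if the strategy satisfies both the recovery point and recovery time objectives."""
    return worst_case_data_loss(backup_interval) <= rpo and restore_time <= rto

# Hypothetical objectives and strategies
rpo = timedelta(hours=4)
rto = timedelta(hours=8)

nightly_tape = meets_objectives(timedelta(hours=24), timedelta(hours=36), rpo, rto)
hourly_replication = meets_objectives(timedelta(hours=1), timedelta(minutes=30), rpo, rto)
print("nightly tape:", nightly_tape)
print("hourly replication:", hourly_replication)
```

A strategy that fails either objective has to be replaced or the objective renegotiated; the check itself does not say which.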
Disaster Recovery: Preventative Controls
• Security policies, firewalls
• Backups
• Redundancy
– Employee (cross-training)
– Resource (power, network, utility)
– Hardware (servers, UPS, etc.)
• ID staff and enforce strict access control
• Training
• Testing & revisions
• Documentation
• Facility access (evacuation)
• Licensing compliance review
• Risk assessment
Disaster Recovery: Strategies to recover
• Prioritize services (based on time, safety)
• Risk assessment (based on time of year)
• Notification & communications (tree process, cell phones vs. VoIP vs. traditional means)
• Local TV & radio news resources
• Verification of strategies & plans
• Need buy-in from staff, unions, management
• Contingency for non-locatable staff
• Alternate workspace locations
• Business continuity measures
• Coordinate with FEMA, law enforcement (& other agencies)
Disaster Recovery: Strategies to recover
• ID unrecoverable scenarios
• Configuration management
• ID mission-critical data
• Ongoing or fail-back strategy for extended disasters
• Identify coordinators (and chain of command)
– Escalation process
• Prioritize & allocate resources
– Reassign resources
– Skills matrix
– Work with partners & clients
Disaster Recovery: IT Contingency Plan
• Regular communication with IT staff
• Identify critical services
• Testing of recovered services
• Follow documented procedures to fix problems
• Contingency clause with vendors, service providers, contractors
• Relocate services away from affected area (rent, lease, safer space within building)
• Monitor systems continuously
• Work with disaster recovery team
• Test and validate systems one by one based on priority
Risk
Risk: Definition
• Risk
– Perception of uncertainty in events that occur and actions taken.
• Risks are encountered in everyday decision-making
• Multiple ways to consider risks:
– Risk as feelings
– Risk as analysis
– Risk as politics
• We primarily evaluate risk intuitively (as feelings)
Risk: Opposing Views
• Statisticians
– Probabilities
– Consequences of Adverse Events
– Quantifiable
• Social scientists
– Invented to cope with uncertainties
– Dependent on perception
– Risk perception: blending of science and judgment with important psychological, social, cultural, and political factors
Risk: Human Factors
• Uncertainty in computing risk is unavoidable
• Reactions to risk are based on emotion rather than scientific evidence.
– When people become outraged, they may overreact.
– If people are not outraged, they may under-react.
– An industrial process producing an unpronounceable chemical is perceived as a much less acceptable risk than something more everyday, like driving or eating junk food.
Risk: Human Factors
• Risk comparisons may be clearer than absolute numbers
• Emotions must be considered along with scientific evidence.
• People become uneasy when scientists are not certain about the risk posed by a hazard (effect, severity, or prevalence).
– Rather than diminish legitimate concerns or heighten illegitimate ones, psychological factors must be addressed to encourage constructive action.
Risk: Formal Definition
• Risk is the probability that a specific threat will successfully exploit a vulnerability, causing a loss.
• Risks are evaluated by three distinguishing characteristics:
1. Loss associated with an event, e.g., disclosure of confidential data, lost time and revenues.
2. Likelihood that the event will occur, i.e. probability of occurrence
3. Degree to which the risk outcome can be influenced, i.e. controls
• Various forms of threats exist
– Different stakeholders have different perceptions
– Several sources of threats exist simultaneously
Risk: Risk Management Process
• Risk is the probability that a specific threat will successfully exploit a vulnerability, causing a loss.
[Figure: risk management process, with the following boxes]
• What can go wrong (initiating events)?
• How bad (consequences)?
• How often (likelihood of failure)?
• Aggregate risk (likelihood of consequences calculated for every possible combination of precipitating events)
• Measures to reduce the consequences of risk until they reach acceptable levels (Benefits > Aggregated Risk)
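A minimal numeric sketch of this process follows. It simplifies the aggregation to a sum over independent single events rather than every combination of precipitating events, and the event list, likelihoods, dollar consequences and benefit figure are all hypothetical.

```python
# Hypothetical initiating events: (name, annual likelihood, consequence in dollars)
events = [
    ("disk failure", 0.20, 10_000),
    ("data center flood", 0.01, 500_000),
]

def aggregate_risk(event_list):
    """Sum of likelihood x consequence over all events (simplified aggregation)."""
    return sum(likelihood * consequence for _, likelihood, consequence in event_list)

def acceptable(benefit, event_list):
    """The slide's acceptance test: benefits must exceed aggregated risk."""
    return benefit > aggregate_risk(event_list)

total = aggregate_risk(events)
print(f"aggregate risk: {total:.0f}")
print("acceptable at benefit 5,000:", acceptable(5_000, events))

# A control that halves the flood consequence lowers the aggregate risk
mitigated = [("disk failure", 0.20, 10_000), ("data center flood", 0.01, 250_000)]
print("acceptable after mitigation:", acceptable(5_000, mitigated))
```

Measures are added until the test passes, which mirrors the last box of the process.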
Risk: Example #1: Caveman Going to Hunt
• Potential Accidents
– Being eaten by prey
– Being mistakenly hurt by tribe member
– Accidentally getting hurt on terrain
• Hazard Control (reduce likelihood of damage)
– Avoid dangerous terrain
– Scare animals with fire or sticks
– Hide from animals
– Hunt in groups
• Protection & Damage Limitation (reduce consequences)
– Apply first aid
– Run once animal follows you
• How Bad (consequences)
– Injury
– Death
• Risk = Consequence x Likelihood
• Cost-Benefit Analysis: Total Risk vs. Total Benefit (food)
Risk: Example #2: Participating in Sports Event
• Potential Accidents
– Collision
– Slipping
– Tripping
• Hazard Control (reduce likelihood of damage)
– Training
– Being Careful
– Using proper footwear & protective gear
– Following Rules
• Protection & Damage Limitation (reduce consequences)
– First Aid
– Ambulance
– Medical & Hospital Services
• How Bad (consequences)
– Out for Match
– Out for Season
– Broken Bone
– Sprained Muscle
– Torn Ligament
• Risk = Consequence x Likelihood
• Cost-Benefit Analysis: Total Risk vs. Total Benefit (thrill & pride)
Risk: Example #3: Driving to Work
• Potential Accidents
– Head-on Collision
– Side/Rear-end impact
– Hit pedestrian
– Overturn Car
– Carjacking
• Hazard Control (reduce likelihood of damage)
– License
– Proper road & signal construction
– Safety Barriers
– Police Surveillance & speed control
– Obeying traffic rules
• Protection & Damage Limitation (reduce consequences)
– Having Airbags Installed in Vehicle
– Wearing Seatbelts
– First Aid & Hospitalization
• How Bad (consequences)
– Vehicle Damage
– Traffic Ticket
– Death
– Insurance Premium Hike
– Injury
• Causes
– Fatigue
– Poor Judgment
– Environmental Conditions
– Failure to see traffic signals
• Risk = Consequence x Likelihood
• Cost-Benefit Analysis: Total Risk vs. Total Benefit (employment)
• Risk is defined as the expected losses as a result of potential threats that can manifest themselves and cause damage to assets.
• Risk can be analyzed by assessing the probability of an event, the vulnerability of the elements at risk, and the value of assets that are in danger
• Risk assessment forms an important input in disaster management, in the design of development plans, and in emergency response planning.
• Disaster planning should follow the same procedure as routine risk assessment but typically covers more catastrophic events
Disaster Recovery Risk
Disaster Recovery: Risk Concept Map
• Threats exploit system vulnerabilities which expose system assets.
• Security controls protect against threats by meeting security requirements established on the basis of asset values.
Source: Australian Standard Handbook of Information Security Risk Management, HB231-2000
Disaster Recovery Risk Analysis – Matrix Based Approach
Matrix Based Approach: Methodology
• Consists of three matrices
– Vulnerability Matrix: links assets to vulnerabilities
– Threat Matrix: links vulnerabilities to threats
– Control Matrix: links threats to the controls
• Step 1
– Identify the assets & compute the relative importance of assets
• Step 2
– List assets in the columns of the matrix.
– List vulnerabilities in the rows within the matrix.
– The value row should contain asset values.
– Rank the assets based on the impact to the organization.
– Compute the aggregate value of relative importance of different vulnerabilities
Matrix Based Approach: Methodology
• Step 3
– Add aggregate values of vulnerabilities from the vulnerability matrix to the column side of the threat matrix
– Identify the threats and add them to the row side of the threat matrix
– Determine the relative influence of threats on the vulnerabilities
– Compute aggregate values of importance of different threats
• Step 4
– Add aggregate values of threats from the threat matrix to the column side of the control matrix
– Identify the controls and add them to the row side of the control matrix
– Compute aggregate values of importance of different controls
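The four steps chain three sum-of-products calculations, which can be sketched as follows. The helper name `aggregate`, the choice of two assets, vulnerabilities, threats and controls, and every value and correlation are hypothetical and chosen only to show the mechanics.

```python
def aggregate(column_values, matrix):
    """One aggregate per row: sum of (column value x row/column correlation)."""
    return [sum(v * c for v, c in zip(column_values, row)) for row in matrix]

# Steps 1-2: assets in columns with their values, vulnerabilities in rows
assets = ["trade secrets", "reputation"]
asset_values = [9, 3]
vulnerabilities = ["web servers", "databases"]
vulnerability_matrix = [
    [1, 9],  # web servers vs. each asset
    [9, 3],  # databases vs. each asset
]
vulnerability_impact = aggregate(asset_values, vulnerability_matrix)

# Step 3: vulnerability aggregates become the column values of the threat matrix
threats = ["hacking attacks", "floods"]
threat_matrix = [
    [9, 9],  # hacking attacks vs. each vulnerability
    [1, 3],  # floods vs. each vulnerability
]
threat_importance = aggregate(vulnerability_impact, threat_matrix)

# Step 4: threat aggregates become the column values of the control matrix
controls = ["firewalls", "training"]
control_matrix = [
    [9, 0],  # firewalls vs. each threat
    [3, 1],  # training vs. each threat
]
control_value = aggregate(threat_importance, control_matrix)

for name, value in zip(controls, control_value):
    print(f"{name}: {value}")
```

The resulting numbers are only meaningful relative to each other; they rank controls rather than price them.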
Matrix Based Approach: Determining L/M/H
• There needs to be a threshold for determining the correlations within the matrices. For each matrix, the thresholds can be different. This can be done in two ways:
• Qualitatively
– Determined relative to other correlations
– e.g. the asset1/vulnerability1 correlation (L) is much lower than the asset3/vulnerability3 correlation (H); the asset2/vulnerability2 correlation is in between (M)
• Quantitatively
– Determined by setting limits
– e.g. no correlation (0); lower than 10% correlation (L); lower than 35% (M); greater than 35% (H)
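The quantitative limits can be written as a small classifier. This is a sketch using the example thresholds from the slide; the slide does not say how a correlation of exactly 35% is treated, so assigning it to H is an assumption, as is the function name.

```python
def level(correlation: float) -> str:
    """Map a correlation (0.0 to 1.0) to 0/L/M/H using the example limits."""
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must be between 0 and 1")
    if correlation == 0:
        return "0"  # not relevant
    if correlation < 0.10:
        return "L"
    if correlation < 0.35:
        return "M"
    return "H"      # assumption: exactly 35% counts as high

for sample in (0.0, 0.05, 0.20, 0.35, 0.80):
    print(f"{sample:.0%} -> {level(sample)}")
```

Each matrix can use its own limits by swapping the two threshold constants.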
Matrix Based Approach: Extension of L/M/H
• Although the example provided gives 4 different levels (Not Relevant, Low, Medium, and High), organizations may choose to have more levels for finer-grained evaluation.
• For example:
– Not Relevant (0)
– Very Low (1)
– Low (2)
– Medium-Low (3)
– Medium (4)
– Medium-High (5)
– High (6)
Matrix Based Approach: Assets and Vulnerabilities
• Customize matrix to assets & vulnerabilities applicable to case
– Compute cost of each asset and put them in the value row
– Determine correlation between vulnerability and asset (L/M/H)
– Compute the sum of products of vulnerability & asset values; add to impact column
Scale: Not Relevant = 0, Low = 1, Medium = 3, High = 9
[Matrix: assets & costs in columns (Critical Infrastructure, Trade Secrets (IP), Client Secrets, Reputation (Trust), Lost Sales/Revenue, Cleanup Costs, Info/Integrity, Hardware, Software, Services) with a Value row; vulnerabilities in rows (Web Servers, Compute Servers, Firewalls, Routers, Client Nodes, Databases); final column: Relative Impact]
Matrix Based Approach: Vulnerabilities and Threats
• Complete matrix based on the specific case
– Add values from the Impact column of the previous matrix
– Determine association between threat and vulnerability
– Compute aggregate exposure values by multiplying impact and the associations
Scale: Not Relevant = 0, Low = 1, Medium = 3, High = 9
[Matrix: vulnerabilities in columns (Web Servers, Compute Servers, Firewalls, Routers, Client Nodes, Databases, ...) with a Value row; threats in rows (Hacking Attacks, Floods, Earthquakes, Human Errors, Insider Attacks, Hurricane); final column: Relative Threat Importance]
Matrix Based Approach: Threats and Controls
• Customize matrix based on the specific case
– Add values from the relative exposure column of the previous matrix
– Determine impact of different controls on different threats
– Compute the aggregate value of benefit of each control
Scale: Not Relevant = 0, Low = 1, Medium = 3, High = 9
[Matrix: threats in columns (Denial of Service, Spoofing, Malicious Code, Human Errors, Insider Attacks, Intrusion, Spam, Physical Damage, ...) with a Value row; controls in rows (Firewalls, IDS, Single Sign-On, DMZ, Training, Security Policy, Network Configuration, Hardening of Environment); final column: Value of Control]
When Disasters Happen
When Disasters Happen Physical Damage
• Flood/leak
• Fire
• Lightning damage
• Insects/rodents
• Mechanical shock
• Overheating
• Damage to media
• Natural disasters
• Hard drive failure
• Power supply failure
• Other failure leading to physical damage
• Media (CD-ROM, tape) decay
When Disasters Happen Mechanical Failure
• Accidental file deletion
• Accidental file replacement
• Loss of entire system (e.g., lost laptop)
• Loss of media (e.g., CDRWs, USB keys)
When Disasters Happen Human Error
• Malware (viruses, worms, trojans, spyware)
• Disgruntled employees
• External attackers
• Theft of systems or media
• Arson
• Terrorism
When Disasters Happen Malice & Evil
When Disasters Happen: Layered Defense
• Human error or malicious attack can damage data on hard drives
• Backup tapes can be damaged by viruses or by natural disasters that destroy the tapes
• Since no single defense is perfect, you must layer your defenses
– Multiple backups over time (archival backups)
– Copies stored in multiple locations (business continuity backups)
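Whether a set of backups is layered in both senses (over time and across locations) can be checked mechanically. The sketch below assumes a hypothetical backup catalogue; the record fields, dates, location names and the minimum of two versions and two locations are illustrative choices, not requirements from the slides.

```python
from datetime import date

# Hypothetical backup catalogue: when each copy was taken and where it is stored
backups = [
    {"taken": date(2014, 8, 1), "location": "on-site"},
    {"taken": date(2014, 8, 6), "location": "on-site"},
    {"taken": date(2014, 8, 6), "location": "off-site vault"},
]

def is_layered(catalogue, min_versions=2, min_locations=2):
    """True if copies span several points in time AND several locations."""
    versions = {b["taken"] for b in catalogue}
    locations = {b["location"] for b in catalogue}
    return len(versions) >= min_versions and len(locations) >= min_locations

layered = is_layered(backups)
single_site = is_layered([b for b in backups if b["location"] == "on-site"])
print("all copies:", layered)
print("on-site copies only:", single_site)
```

The on-site-only subset fails because a single fire or flood could destroy every copy, however many versions exist.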
When Disasters Happen: Creating a Plan
• Determine the systems to back up through risk analysis
• Assign responsibility for performing the backups
• Determine the priority of systems to restore (e.g., do you restore the payroll system before the marketing web site?)
• Create a recovery plan with a clear chain of command to initiate recovery.
• Keep backup copies offline
Disaster Recovery Plan
Disaster Recovery Plan: Definitions
• A disaster recovery plan is a comprehensive statement of consistent actions to be taken before, during and after a disaster.
• The primary objective of disaster recovery planning is to protect the organization in the event that all or part of its operations and/or computer services are rendered unusable.
– The plan should minimize disruption of operations and ensure organizational stability and an orderly recovery after a disaster.
• Other objectives of disaster recovery planning include:
– Provide a sense of security for employees
– Minimize risk of delays
– Guarantee the reliability of standby systems
– Provide a standard for testing the plan
– Minimize decision-making during a disaster
• The plan should be documented and tested
Disaster Recovery Plan: Reasons
• Insurance alone is insufficient to manage disaster recovery since it does not compensate for the loss of business during the interruption
• There are several reasons to have a disaster recovery plan
– Minimizing potential economic loss
– Decreasing potential exposures
– Reducing the probability of occurrence
– Reducing disruptions to operations
– Ensuring organizational stability
– Providing an orderly recovery
– Minimizing insurance premiums
– Reducing reliance on certain key individuals
– Protecting the assets of the organization
– Ensuring the safety of personnel and customers
– Minimizing decision-making during a disastrous event
– Minimizing legal liability
Disaster Recovery Plan: Obtain Top Management Commitment
• Management should endorse the disaster planning effort
• It should be responsible for coordinating the planning efforts and ensuring the plan's dissemination in the organization.
• Resources must be committed to the development of an effective plan.
– Both financial and labor resources must be provided
• A planning committee should oversee the development and implementation of the plan.
• It should include representatives from all functional areas of the organization.
• Key committee members should include the Information Security Officer, Chief Information Officer, operations manager and the data processing manager.
• The committee should be responsible for defining the scope of the plan.
Disaster Recovery Plan Establish a Planning Committee
Disaster Recovery Plan: Perform Risk Analysis
• The Disaster Recovery Plan should be linked closely with the organization's risk assessment
• All the assets, vulnerabilities and threats should be comprehensively collected
• The primary focus should, however, be on the more catastrophic events, including natural, technical and human threats.
– The plan should be written for the worst-case scenario
• Each functional area of the organization should be analyzed to determine the potential consequence and impact associated with several disaster scenarios.
• The risk assessment should also include the safety of critical documents and vital records.
Sanjay Goel, School of Business, University at Albany, SUNY
70
• Critical needs are defined as the procedures and equipment essential to continue operations in case of a disaster
• To determine critical needs
– All the operations and processes of each department should be documented
– They should be ranked in order of priority (essential, important and non-essential)
• The maximum time that the organization can operate without each critical system should also be determined
• Critical areas include
– Functional operations
– Key personnel
– Information
– Processing systems
– Service
– Documentation
– Vital records
– Policies and procedures
Disaster Recovery Plan Establish Critical Needs
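One way to record the ranking and the maximum tolerable outage is sketched below. The `Process` record, the field names and the sample values are hypothetical; only the three priority levels come from the slide.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    ESSENTIAL = 1
    IMPORTANT = 2
    NON_ESSENTIAL = 3


@dataclass(frozen=True)
class Process:
    department: str
    name: str
    priority: Priority
    max_outage_hours: float  # longest time the organization can run without it


def recovery_order(processes):
    """Sort by priority, then by how soon the outage becomes intolerable."""
    return sorted(processes, key=lambda p: (p.priority, p.max_outage_hours))


# Hypothetical inventory for illustration only.
processes = [
    Process("Finance", "Payroll", Priority.IMPORTANT, 72),
    Process("IT", "Email", Priority.ESSENTIAL, 4),
    Process("Sales", "Order entry", Priority.ESSENTIAL, 8),
    Process("Facilities", "Newsletter", Priority.NON_ESSENTIAL, 720),
]

for p in recovery_order(processes):
    print(f"{p.priority.name:14} {p.name:12} max outage {p.max_outage_hours} h")
```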
• Elements to Consider:
– Facilities
– Hardware
– Software
– Communications
– Data files
– Customer services
– User operations
– Management Information Systems
– End-user systems
– Other processing operations
• Options for recovery
– Hot sites
– Warm sites
– Cold sites
– Reciprocal agreements
– Two data centers
– Multiple computers
– Service centers
– Consortium arrangement
– Vendor supplied equipment
Disaster Recovery Plan Strategies
• Backup position listing
• Critical telephone numbers
• Communications inventory
• Distribution register
• Documentation inventory
• Equipment inventory
• Forms inventory
• Insurance policy inventory
• Main computer hardware inventory
• Master call list
• Master vendor list
• Hardware and software inventory
• Notification checklist
• Office supply inventory
• Off-site storage location inventory
• Software and data files backup/retention schedules
• Telephone inventory
• Temporary location specifications
• Other materials and documentation
Disaster Recovery Plan Emergency Information to Collect
• First, an outline of the detailed procedures in the plan needs to be created
• The outline should be approved by top management
• This outline becomes the table of contents for the plan
• The plan should
– Provide a roadmap for the detailed procedures
– Identify the scope clearly
– Identify any potential redundancies in the plan
Disaster Recovery Plan Writing the Plan
• Create a standard template for all detailed procedures
– Makes training and dissemination easier
– Removes ambiguities in the plan
– Facilitates collaboration during writing
• Procedures should include pre- and post-disaster procedures
• The plan should include provisions for periodic evaluation and revision
• Specific teams should be created for different functional areas (e.g. administration, facilities, logistics, user support, computer backup, restoration, etc.)
Disaster Recovery Plan Creating Detailed Procedures
• Each team should have a leader, and each member of the team should have a clearly delineated role.
• A management team should be created to coordinate the recovery process, assess damage, activate the recovery plan, and work with team leaders
• Despite the management structure, teams may need to operate autonomously during the disaster
• Management team members should set priorities, policies and procedures to handle unforeseen contingencies
Disaster Recovery Plan Creating Detailed Procedures Cont’d.
• The procedures should be validated periodically (at least annually)
• Validation procedures should be a part of the plan
• Validating the procedures helps to
– Determine the reliability of backup facilities and procedures
– Identify weaknesses in the procedures
– Provide training to the different teams
– Protect the company from legal liabilities (due diligence)
Disaster Recovery Plan Validation & Testing of Procedures
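The "at least annually" rule lends itself to a simple overdue check. The sketch below assumes a hypothetical mapping from procedure name to the date it was last validated and treats "annually" as 365 days.

```python
from datetime import date, timedelta

# Assumption: "at least annually" is read as no more than 365 days apart.
REVIEW_INTERVAL = timedelta(days=365)


def overdue_procedures(last_validated, today):
    """Return the names of procedures whose last validation is too old."""
    return sorted(
        name
        for name, validated_on in last_validated.items()
        if today - validated_on > REVIEW_INTERVAL
    )


# Hypothetical validation log for illustration only.
last_validated = {
    "Off-site backup restore": date(2013, 3, 1),
    "Alternate site activation": date(2014, 2, 15),
    "Master call list check": date(2012, 11, 30),
}

for name in overdue_procedures(last_validated, today=date(2014, 8, 7)):
    print("Overdue for validation:", name)
```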
• The initial testing involves a walk-through of the entire plan
• The plan should be updated if any discrepancies or inconsistencies are observed
• The testing of the plan should be done in sections first to avoid large scale work disruptions
• Testing procedures may vary considerably
– Checklist tests
– Simulation tests
– Parallel tests
– Full interruption tests
Disaster Recovery Plan Testing Entire Plan
• The plan requires the approval of top management
• Management responsibilities include
– Establishing policies, procedures, and responsibilities for comprehensive contingency planning
– Reviewing and approving the plan annually, documenting the reviews in writing
Disaster Recovery Plan Approval of the Plan
Writing the Plan
• The contents of the plan should follow a logical sequence
• It should be written in a standard and understandable format
– Standardization facilitates consistency and conformity throughout the plan.
– Standardization is especially important if several people write the procedures
• Plans should be brief and to the point with properly documented procedures
• Well-written plans reduce the time required to read and understand them and therefore improve the chance of success
Disaster Recovery Plan Writing
• The plan has two major elements: Background and Instructions
• Background information should be written using indicative (direct subject-verb-predicate structure) sentences and should include
– Purpose of the procedure
– Scope of the procedure (e.g. location, equipment, personnel, and expected time to complete the procedure)
– Reference materials (i.e., other manuals, information, or materials that should be consulted)
– Documentation describing the applicable forms that must be used when performing the procedures
– Authorizations listing the specific approvals required
– Particular policies applicable to the procedures
Disaster Recovery Plan Writing
• Instructions should be developed using the imperative style (start with a verb while the pronoun “you” is assumed) and issue directions to be followed.
• Headings should include:
– Subject category number and description
– Subject subcategory number and description
– Page number
– Revision number
– Superseded date
• A suggested format is to have standard headings on each page separated from the details of procedures.
Disaster Recovery Plan Writing
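The standard page heading can be captured as a small record so that every procedure page carries the same fields. This is a minimal sketch: the class name, field names, numbering scheme and rendered layout are assumptions, while the five fields themselves are the ones listed on the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProcedureHeading:
    category_number: str
    category_description: str
    subcategory_number: str
    subcategory_description: str
    page_number: int
    revision_number: int
    superseded_date: Optional[date] = None  # None for a first revision

    def render(self) -> str:
        """Render the heading block that sits above the procedure details."""
        superseded = (
            self.superseded_date.isoformat() if self.superseded_date else "n/a"
        )
        return "\n".join([
            f"Subject: {self.category_number} {self.category_description}",
            f"Subcategory: {self.subcategory_number} {self.subcategory_description}",
            f"Page: {self.page_number}  Revision: {self.revision_number}  "
            f"Superseded date: {superseded}",
        ])


# Hypothetical heading for illustration only.
heading = ProcedureHeading(
    "4", "Data backup & recovery",
    "4.2", "Restore from off-site storage",
    page_number=1, revision_number=3, superseded_date=date(2013, 6, 1),
)
print(heading.render())
```

Keeping the heading separate from the procedure text mirrors the suggested format: the heading is filled in mechanically, while the body holds only imperative instructions.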
• Write the plan with the assumption it will be implemented by personnel completely unfamiliar with the function and operation.
• Use short, simple sentences that present one idea at a time
• Use short paragraphs with topic sentences to start each paragraph.
• To improve brevity, use active voice verbs in the present tense.
• Avoid jargon.
• Use position titles rather than specific names of individuals and avoid gendered nouns and pronouns
• Identify events that occur in parallel and events that must occur sequentially.
• Use descriptive verbs (e.g. Acquire, Activate, Advise, Answer, Create, Move, Declare, Pay, etc.)
Disaster Recovery Plan Writing Style
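Separating parallel from sequential events can be sketched as a dependency graph, using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The step names and their dependencies are hypothetical; each batch in the result holds steps that can run in parallel once all earlier batches are complete.

```python
from graphlib import TopologicalSorter


def parallel_batches(depends_on):
    """Group steps into batches; steps within one batch can run in parallel.

    depends_on maps each step to the set of steps that must finish first.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical recovery steps for illustration only.
steps = {
    "Declare disaster": set(),
    "Notify team leaders": {"Declare disaster"},
    "Assess damage": {"Declare disaster"},
    "Activate alternate site": {"Assess damage"},
    "Restore data from off-site backup": {"Activate alternate site"},
}

for number, batch in enumerate(parallel_batches(steps), start=1):
    print(f"Stage {number}: {', '.join(batch)}")
```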
• A comprehensive plan should include areas of operation outside data processing.
• The plan should be developed for the worst case scenario, i.e., the primary facility is destroyed
– Less critical situations can be handled by using a part of the plan.
• No disaster plan can cover every conceivable disaster
– The scope of the plan should be created based on a risk analysis (and cost-benefit analysis)
Disaster Recovery Plan Scope
• Some basic assumptions need to be made while developing the plan
– The primary location of the organization has been destroyed but an alternate facility is available
– Staff are available to perform critical functions and can report to backup sites for recovery and reconstruction activities
– Off-site storage facilities and materials survive, and an adequate supply of critical forms and supplies is stored off-site
– The disaster recovery plan is current
– Long distance and local communications lines are still available
– Surface transportation in the local area is possible
– Vendors will perform according to their preexisting contracts
Disaster Recovery Plan Assumptions
• The organization structure during an emergency may be different from the organization chart
• Teams should be created, each with a manager and an alternate
• Each team should have specific responsibilities; the teams are listed below
– Management
– Business recovery
– Data backup & recovery
– Computing restoration
– Damage assessment
– Security
– Facilities support
– Administrative support
– Logistics support
– User support
– Off-site storage
– Software & Applications
– Network & Communications
– Human relations
– Marketing/Customer relations
Disaster Recovery Plan Creating Teams
• One of the most important elements of creating the plan is data collection
• The basic elements of data collection come from the organization's risk assessment
• The fundamental difference between risk assessment and disaster planning is that disaster planning must identify catastrophic events that have a low probability of occurrence; similar techniques can be used for both
• There are several questions that you need to ask, as shown in the questionnaire
Disaster Recovery Plan Data Collection
Your assignment is to create a disaster recovery plan for your home. You will start with a risk assessment of your household using the risk matrices and then decide what controls you can put into place.
Disaster Recovery Plan Case
• Business continuity plans are important for businesses of all sizes
• The best solution is to prevent disasters
• If disasters do happen, the organization should be ready
• Many of the issues involve logistics during the disaster
• The problem boils down to risk analysis
Disaster Recovery Summary