(c) 2007 Charles G. Gray 1
IT Risk Management, Planning and Mitigation
TCOM 5253/MSIS 4373
Business Continuity Planning
6 December 2007
Charles G. Gray
Business Continuity and Disaster Recovery
• Business Continuity - Continuation of the “business” (revenue generation) in the face of any unusual or unforeseen event
  – Overall identification of potential events and the predicted impact on the organization
• Disaster – an event that causes significant damage to business operations and requires some action to recover
DR vs. BCP
• Disaster recovery is no longer enough
• Business operations must be sustained
  – Legal requirements
  – Cash flow
  – Customer retention
• Business continuity is the first priority, then disaster recovery
Business Continuity Planning
• An exercise in risk management
• Not a “revenue producing” activity
  – Business overhead (“cost of doing business”)
• A form of business insurance, justified by the losses that might occur
• Adequate budgets must be planned
  – Money
  – Staff
  – Time
Key Components of Business Operations
• People
• Equipment
• Workplace
• Suppliers
• Logistics
• Finance
Disaster Examples
• Fire
• Flood
• Malicious damage
• Theft
• Terrorism
• Sabotage
• Explosion
• Chemical spill
• Gas leak
• Disease
• Earthquake
• Tropical storm
• Biological agent
• Hostage situation
• Threat of action
• Criminal damage
• Accidental damage
• Fault or failure
Key Factors Affected by a Disaster
• Financial
• Reputation
  – Business (Tylenol, Arthur Andersen)
  – Personal
    • NYC Mayor Giuliani
    • Enron CEO Ken Lay
• Customer service
• National security
• Health and safety
  – Employees
  – General public
• Regulatory
What is the Cost of Downtime?
• Productivity
  – Number of employees times loaded pay rate
• Damaged reputation
  – Customers
  – Suppliers and business partners
  – Banks and financial markets
• Revenue
  – Direct loss, billing losses
  – Compensatory payments
  – Loss of future revenue
What is the Cost of Downtime?
• Financial performance
  – Revenue recognition
  – Cash flow
  – Lost discounts (Accounts payable)
  – Credit rating
  – Stock price
• Other expenses
  – Temporary employees, equipment rental, overtime costs, extra shipping, travel expenses, legal obligations
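As a rough illustration of the productivity item ("number of employees times loaded pay rate"), the sketch below also multiplies by the length of the outage. The outage duration, the function name and all of the figures are assumptions added for illustration; the slides state only the basic rule.

```python
def productivity_loss(employees: int, loaded_hourly_rate: float, outage_hours: float) -> float:
    """Estimate lost productivity as employees x loaded pay rate x hours idle.

    Assumes every affected employee is fully idle for the whole outage,
    which overstates the loss if some work can continue.
    """
    if employees < 0 or loaded_hourly_rate < 0 or outage_hours < 0:
        raise ValueError("inputs must be non-negative")
    return employees * loaded_hourly_rate * outage_hours


# Hypothetical figures: 200 idle staff, $60 loaded hourly rate, 4-hour outage.
loss = productivity_loss(employees=200, loaded_hourly_rate=60.0, outage_hours=4.0)
print(f"Estimated productivity loss: ${loss:,.0f}")
```

This covers only one line of the downtime cost; reputation, revenue and the other expenses listed above would have to be estimated separately.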
Examples of Downtime Costs
• Energy $2.8 M per hour
• Telecommunications $2.1 M per hour
• Manufacturing $1.6 M per hour
• Finance/brokerage $1.5 M per hour
• Info Technology $1.3 M per hour
• Insurance $1.2 M per hour
• Retail $1.1 M per hour
• Pharmaceuticals $1.1 M per hour
Source: Meta Group, 2006
The Ultimate Cost of Downtime
• 80% of businesses that suffer a major disruption fail within 18 months (Financial Times, 18 April 2007)
• Most disruptions are relatively mundane
  – Drilling through an outside power cable
  – Failure of air conditioning
  – “Banana skins”: business slips that result in loss of customers
BCP and IT
• IT facilitates the majority of key business processes in a modern company
  – IT systems control the workflow, production, shipping, billing, customer service (?), etc.
  – Even the simplest operations can fail when “the computer is down”
• IT is a strong management tool
  – Anything with costs associated with it is tracked for audit and control
Disaster Recovery
• Implementation of a response to a specific type of event
  – A plan with supporting infrastructure, which is implemented in the event of a disaster
• Usually treated as an “add on”
  – Tested occasionally, but rarely emphasized
  – Financial considerations (cost-benefit analysis, CBA)
    • Cost of downtime vs. cost of system resilience
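One way to frame the comparison of downtime cost against resilience cost is as an annual net benefit. The sketch below is a simplified model, not a method given in the slides: the function name, the idea of comparing expected downtime hours per year with and without the measure, and every figure are assumptions for illustration.

```python
def resilience_net_benefit(hourly_downtime_cost: float,
                           hours_down_without: float,
                           hours_down_with: float,
                           annual_resilience_cost: float) -> float:
    """Annual downtime loss avoided minus the annual cost of the resilience measure.

    A positive result suggests the measure pays for itself under these assumptions.
    """
    avoided_loss = hourly_downtime_cost * (hours_down_without - hours_down_with)
    return avoided_loss - annual_resilience_cost


# Hypothetical: downtime costs $50,000 per hour; the measure cuts expected
# downtime from 10 to 2 hours per year and costs $250,000 per year to run.
net = resilience_net_benefit(50_000, 10, 2, 250_000)
print(f"Net annual benefit: ${net:,.0f}")
```

The model ignores reputational damage and other losses that are hard to price, so a negative result does not by itself show that a measure is unjustified.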
“Gap” Analysis
• Lack of knowledge transfer between business continuity and technical disaster recovery
• IT security and physical security operate autonomously
• No clear quantitative methodology to rate and benchmark
• Health and safety issues not integrated into the business
• Continuity planning is isolated
  – No senior-level champion
  – Not integrated throughout the business
IT/Business Boundary
• IT segmented apart from the “business”
  – Creators of technology on one side, users on the other
• Business analysts, project managers and “relationship” managers are expected to bridge the gap
• The business may duplicate some IT support functions to gain some “control”
  – IT may not even know about it
• Highly inefficient
Cultural Issues - Mistrust
• Business tells IT that the requirement was misunderstood
• Business rejects the technology as not working
• Business realizes its error but, to “save face”, accepts the technology and does not implement it
• Business realizes its error and tries to negotiate
• Business finds any other way possible to “save face”
Relationship to BCP
• BCP is about building a solid and resilient organization that can deal with difficult circumstances or situations
• Organization must be designed with business continuity in mind, rather than having it “bolted on” later
  – Ugly to look at
  – Difficult to manage
  – Costly!
Health and Safety
• Most important business continuity indicators
• People are the principal asset of any business – without them, nothing happens
• Most companies comply with the “letter of the law” – even if they don’t understand what the law is trying to effect
• Companies are responsible for doing all they can to provide a safe workplace
Not Just Fire Anymore
• Fire escapes are needed, but that’s not all
  – Think about emergency slides (airplanes)
• Terrorism
• Natural disaster (global warming??)
  – Tropical storms, tornados, tsunami, etc.
• Workplace must be designed for protection and evacuation
  – Flying glass is the biggest cause of injury
  – ADA compliance (rules on access, but not egress)
Terrorism
• Direct loss of life
• General economic impact
  – “Multiplier” effect (trickle-down)
    • A company with 10,000 employees may influence $1B in indirect community economic impact
      – Salaries, goods, services, taxes
• Mere threat of direct and indirect impact
• Psychological effect on employees
  – Highest impact on business continuity is employee perception and panic
Risk, Motivation and CBA
• In failing to protect against a disaster that could be foreseen, is a company being negligent?
• When acts of terror can strike any business at any time, is there not a predictable risk to ALL businesses?
• What is the cost of lost business, loss of reputation or loss of life?
• Are not all businesses bound to protect employees against such events?
Key Issues (1)
• Business continuity measures are typically reactive and need to be more proactive
• No standard approach to business continuity across organizations/industries
• Organizations are not designed with business continuity in the forefront
• The threat of terrorism needs to be addressed more specifically when planning for business continuity
Key Issues (2)
• Focus needs to be put on people as the core asset of the organization
• Organizations need to be motivated toward better continuity preparation, security, and health and safety
• A means of financially justifying these or even more comprehensive measures must be found
• Insurers need to cooperate with industry to ensure that individuals, economies and national security are better protected
The Continuity Assurance Framework
[Figure: diagram of the framework, marked as an iterative process. Labels: Communication; Security & Safety; Quality Assurance; Governance and Strategy; Management; Rationalization; Risk Reduction; Rating; Rigor; Robustness; Resilience; Recovery.]
Continuity Assurance Methodology
• Strategy sets the direction
• Governance is the navigation that keeps us on course
• Management controls the day-to-day operation of the continuity assurance machine
• QA measures progress in terms of achievement
  – Interfaces across and around all other functions
The “Machine” Model
• Seven levels of quality (continuity) assurance are the spokes in the wheel
• The hub and spokes of the wheel are encircled by a ring of security and safety
• Encircling all of the elements is communication and knowledge transfer
Core Methodology
• Rationalization
• Risk Reduction
• Rating
• Rigor
• Robustness
• Resilience
• Recovery
Rationalization
• First step on the path to continuity assurance
  – If the foundation is wrong the whole method is undermined
• Rationalize the organization to harmonize security, continuity and recovery functional areas
• Review of processes to avoid overlap
• Ensure that business continuity is integrated into the organization rather than “bolted on”
Risk Reduction
• Identification of risks to the business
• Measures the organization decides to put in place to reduce each identified risk
• Eliminate as many risks as possible in order to accurately rate true criticality of processes, people, and systems
Rating
• Rating of people, processes and systems to ensure that the organization is aware of its critical components and assets
  – You may not even know what the components are!
• Must understand the business structure before looking at individual components in detail
• Expose weaknesses
Rigor of Process
• Processes must be in place to manage component configuration
  – Configuration/change management
  – Very few organizations have adequate controls
• Rating should identify business areas that require reinforced/improved processes
• Identify which supporting systems need to be reinforced or made more robust
Robustness of Architecture
• Determine the vulnerabilities in the infrastructure and take an integrated architectural approach to correction
• Exercise control of the environment to safely manage any fundamental changes to the architecture
  – Proceed cautiously!
• Make sure underlying architecture is sound so as to not replicate something less than ideal
Resilience
• Once the underlying systems architecture is strengthened, add new levels of insular resilience to the critical components
• Applies to more than just IT systems – includes people
  – Need the information that systems AND people have for business continuity
• Geographic diversity can avoid having to go to “recovery” from a localized event
Resilience–Sun Microsystems
• Championed by a senior executive at HQ
• Plan “owned” by business units
• Ask “what is most critical to the business?”
  – Why are we doing this?
  – Will this work in the event of a catastrophe?
• Plan must be simple and workable
  – Simulation/dry run/dress rehearsal is a necessity
    • You may be amazed at the glitches discovered
Recovery
• Recovery is what is left after all else failed because the “event” was too widespread or severe
  – If you have been successful at all of the previous levels then recovery will be necessary only in the most severe circumstances
• Recovery process has its own set of risks
Success of the Framework
• Iterative process
  – Continuous improvement
  – Revisit each level to tweak its capabilities
• Each level builds on the previous levels
  – Holistic view of the organization
  – Employ new capabilities in response to the ever-changing business environment
• Key performance indicators (KPI) at every level
Continuity Rating
• Continuity Assurance Achievement Rating (CAAR)
  – Overall rating of all KPIs across all levels of the model
  – Measure of overall business continuity capability
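The slides do not say how the KPIs are combined into a CAAR. As one possible sketch, the code below assumes each KPI is scored from 0 to 100, averages the KPIs within each of the seven levels, and then averages the levels with equal weight. The scoring scale, the equal weighting, the function name and the sample scores are all assumptions.

```python
from statistics import mean

# The seven levels of the core methodology, as named in the slides.
LEVELS = ["Rationalization", "Risk Reduction", "Rating", "Rigor",
          "Robustness", "Resilience", "Recovery"]


def caar(kpi_scores: dict[str, list[float]]) -> float:
    """Average the KPI scores within each level, then average the seven levels.

    Equal weighting of levels is an assumption, not part of the source model.
    """
    missing = [level for level in LEVELS if not kpi_scores.get(level)]
    if missing:
        raise ValueError(f"no KPI scores for: {', '.join(missing)}")
    return mean(mean(kpi_scores[level]) for level in LEVELS)


# Hypothetical KPI scores (0-100) for each level.
scores = {
    "Rationalization": [80, 70],
    "Risk Reduction": [60],
    "Rating": [90, 70],
    "Rigor": [50],
    "Robustness": [70],
    "Resilience": [65, 75],
    "Recovery": [85],
}
print(f"CAAR: {caar(scores):.1f}")
```

Averaging per level first keeps a level with many KPIs from dominating the overall rating; a real scheme might instead weight levels by criticality.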
Solving Continuity Problems
• Root cause analysis
  – Pareto charts
    • The “80/20 rule”
    • The “trivial many, and vital few”
  – Fishbone (Ishikawa) process
    • Cause and effect diagrams
    • Systematically list all of the different causes that can be attributed to a specific problem
• Ask the “why” question five levels down
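A Pareto analysis of the kind mentioned above can be sketched in a few lines: count how often each cause appears, rank the causes, and keep the "vital few" that together account for a chosen share of incidents. The incident log, the function name and the 80% threshold parameter are hypothetical and only illustrate the idea.

```python
from collections import Counter


def vital_few(causes: list[str], threshold: float = 0.8) -> list[tuple[str, int, float]]:
    """Return the most frequent causes that together reach `threshold` of all incidents.

    Each tuple is (cause, count, cumulative share of incidents).
    """
    counts = Counter(causes)
    total = sum(counts.values())
    ranked = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        ranked.append((cause, count, cumulative / total))
        if cumulative / total >= threshold:
            break
    return ranked


# Hypothetical incident log: one entry per recorded disruption.
incidents = (["power failure"] * 10 + ["cooling failure"] * 6 +
             ["operator error"] * 2 + ["disk failure"] + ["network cut"])

for cause, count, share in vital_few(incidents):
    print(f"{cause}: {count} incidents, cumulative {share:.0%}")
```

In this made-up log, two of the five causes account for 80% of the incidents, which is where root-cause work such as the fishbone diagram or the "five whys" would start.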
Communication
• Changes needed to truly incorporate business continuity processes are traumatic
  – The only thing worse may be a merger
• Consistent and complete communication across the organization is imperative
• Akin to PR and marketing
• Must have “buy-in” from top to bottom
  – Everyone becomes part of the solution
• Demanding task requiring full-time resources and materials
Summary
• Business continuity capabilities are not simple and may require fundamental change across the entire organization
• Disaster comes not only in random events
  – Can be planned by some and thrust upon others
  – Not just natural and indiscriminate but can be orchestrated and targeted
• Business must orchestrate responses and target defenses to maintain safety, security, and overall continuity