NWPPA 2016 Disaster Recovery
NWPPA, Reno, NV
Copyright 2016, IVOXY Consulting, LLC
About Us
Our Consulting Services
‣ We perform assessments, design, implementation, project management, support & training services for a focused set of solutions:
  ‣ Virtualized servers & desktops.
  ‣ Hyper-converged infrastructure.
  ‣ Centralized storage (SAN, NAS).
  ‣ Networking (switching, routing, intelligent gateways, wireless, network security).
  ‣ Data protection & compliance (backup, archiving, encryption & e-discovery).
  ‣ Information security.
  ‣ Mobile device management.
  ‣ Infrastructure monitoring & management.
Educational Services
‣ We have created our own, custom training classes on technologies we sell & support.
‣ We offer open-enrollment classes in Seattle & Portland, as well as on-site training (which can be customized to your needs).
‣ Our classes include:
  ‣ Hands On Upgrade to VMware vSphere 6
  ‣ Advanced VMware vSphere Performance Tuning
  ‣ Hands On NetApp cMode
  ‣ Hands On VMware View/Horizon
  ‣ Hands On Cisco UCS
‣ Visit our training site for dates & locations.
Hardware & Software
‣ IVOXY sells the hardware & software that we recommend, implement & support.
About BC/DR
Which One?
‣ Disaster recovery focuses on the IT infrastructure (hardware, software, connectivity) required to recover data & systems.
‣ Business continuance also includes the necessary components to allow the business to function.
‣ A DR plan doesn’t cover office space, telephones, or fax machines; a BC plan covers these needs.
Is BC Important?
‣ BC may be important to the business, but should be considered outside of the purview of an IT DR plan.
‣ IT creates DR plans; the business creates BC plans.
Phase I: Getting Buy-In
Is the Business On Board?
‣ Will the business agree to fund a provisional DR investigation?
‣ Avoid all-or-nothing projects; create realistic estimates of time, effort, expertise & cost before moving forward.
‣ A phased approach is best; this strategy allows for several go/no-go gates where the business can stop or wait while conserving resources.
‣ Will the business set priorities & assign values for its data, downtime & lost productivity?
‣ It is not IT’s responsibility to assign meaningful values to data & the systems that manage the business’ data.
Qualifying Questions for the Business
‣ What IT applications are mission critical to the business & why?
‣ What does each hour of downtime for these applications cost?
‣ What does the loss of data for each of these applications cost? How would the business recover from an hour of lost data?
‣ Are there regulatory governance requirements for any systems & their associated data?
‣ What types of disasters is the business looking to avoid?
Qualifying Questions for the Business
‣ What is the organization’s tolerance to a loss in performance (application responsiveness) during a DR event?
‣ How long will the business use the DR site before it becomes the primary IT location, or before the primary location is rebuilt (i.e. permanent outage/complete loss of site)?
‣ How long can the business wait for a fail-back event (recovering back to the primary site)?
The Importance of Phase I
‣ If the business is unwilling or unable to answer these questions, any DR strategy is unlikely to succeed.
‣ It’s often helpful for the business to engage a consultancy to work through these questions; when the clock is ticking on the meter, people behave in an accountable fashion.
‣ This phase helps create & justify a budget.
Terminology
Terms
‣ Consistency Point Objective (CPO): The complete data & supporting systems needed to run an application. Absolutely necessary to actually recover a platform.
‣ Recovery Point Objective (RPO): The maximum amount of data an application/user can afford to lose, according to the plan.
‣ Recovery Time Objective (RTO): The longest amount of time an application can be unavailable, according to the plan.
‣ Native Site Redundancy (NSR): These are systems that are always on at the DR site, like networking, Active Directory and email. In some cases, they use built-in technology to keep all replicas consistent.
Terms
‣ Business Critical Application (BCA): This is an IT service, consisting of one or more pieces of software, that is essential to the operation of the organization’s activities.
‣ Snapshot: A snapshot is the state of a system at a particular point in time. In many cases, they allow systems to create copies of data nearly instantaneously. Many systems use these as an integral part of how they replicate data. Also, they can take advantage of technologies like de-duplication and compression to reduce the amount of data that must be sent from the primary to the secondary site.
‣ Backup: The process of copying and archiving computer data so it may be used to restore the original after a data loss event.
‣ Service Level Agreement (SLA): A commitment to provide a certain level of IT functionality, availability, performance or capacity to the organization.
Terms
‣ Replication: Sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
‣ Fail-over: Moving the active/in-use IT systems & associated data to the DR site.
‣ Fail-back: Restoring a system, component, or service from a failed-over state to its original state (before the failure). This involves reversing the direction of replication so that it runs from the secondary (DR) site to the primary (production) site.
‣ Minimum Recovery Distance (MRD): The minimum distance between production & DR sites.
Concepts
Minimum Recovery Distance
How far is too far? How close is too close? What disaster(s) are you trying to avoid?
What disaster(s) are likely to affect the production site?
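One way to sanity-check a candidate DR site against a chosen MRD is to compute the great-circle distance between the two sites. The Python sketch below does this with the haversine formula. The coordinates (roughly Seattle and Reno) and the 500 km threshold are illustrative assumptions, not recommendations; the right MRD depends on the disasters being planned for.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def meets_mrd(distance_km, mrd_km):
    """True when the sites are at least the minimum recovery distance apart."""
    return distance_km >= mrd_km


# Hypothetical values: rough coordinates for two cities and an assumed MRD.
PRODUCTION = (47.61, -122.33)
DR_SITE = (39.53, -119.81)
MRD_KM = 500.0

distance = great_circle_km(*PRODUCTION, *DR_SITE)
print(f"Sites are {distance:.0f} km apart; meets MRD: {meets_mrd(distance, MRD_KM)}")
```

Straight-line distance is only a first filter; two sites can be far apart and still share a power grid, flood plain, or network provider.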
Consistency Point Objective
What data - and therefore what systems - are required to make each application consistent?
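Answering this question amounts to walking each application's dependency chain. As a minimal sketch, assuming the dependencies have already been documented as a simple map, the Python below collects every system an application needs, directly or indirectly. All system names here are hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each system lists what it directly depends on.
DEPENDS_ON = {
    "billing-app": ["billing-db", "active-directory"],
    "billing-db": ["san-volume-12"],
    "active-directory": ["dns"],
    "san-volume-12": [],
    "dns": [],
}


def consistency_group(app, depends_on):
    """Return every system reachable from app, including app itself."""
    seen = {app}
    queue = deque([app])
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in seen:  # also guards against circular dependencies
                seen.add(dep)
                queue.append(dep)
    return seen


group = consistency_group("billing-app", DEPENDS_ON)
print(sorted(group))
```

Everything in the resulting set has to be protected and recovered together for the application to come back in a consistent state.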
Recovery Point Objective
What is the per-hour cost of lost data? Who in the business can determine that number?
Recovery Time Objective
What is the per-hour cost of downtime? Who in the organization can determine that number?
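Once the business has supplied both per-hour figures, the exposure a given RPO/RTO pair still leaves is simple arithmetic. The Python sketch below assumes a linear cost model (cost scales with hours), which is a simplification; every dollar figure and objective shown is hypothetical.

```python
def planned_exposure(rto_hours, downtime_cost_per_hour,
                     rpo_hours, data_loss_cost_per_hour):
    """Worst-case cost of one DR event in which RTO and RPO are both just met."""
    downtime_cost = rto_hours * downtime_cost_per_hour
    data_loss_cost = rpo_hours * data_loss_cost_per_hour
    return downtime_cost + data_loss_cost


# Hypothetical inputs; the business, not IT, should provide the real ones.
exposure = planned_exposure(
    rto_hours=4, downtime_cost_per_hour=10_000,
    rpo_hours=1, data_loss_cost_per_hour=25_000,
)
print(f"Exposure per event: ${exposure:,}")
```

Comparing this figure across candidate RPO/RTO targets gives the business a concrete basis for deciding how much DR spend is justified.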
Native Site Redundancy
Many systems have their own availability technology. Think Active Directory, Exchange DAGs & SQL replication.
Phase II: Getting Specific
Phase II Topics
‣ What IT resources are required to deliver each application? Ensure the CPO is met for each application.
‣ What is the process to protect each application?
‣ What is the process to recover each application?
‣ What are the performance & capacity requirements to deliver each application?
‣ What are the licensing implications for running each application in the DR site?
Phase II Topics
‣ What technologies does the business own that can be leveraged for DR?
‣ What is the rate of data change within each CPO group (application & dependencies), within a single RPO period?
‣ What non-live datasets must be protected by the DR solution?
  ‣ Archive data (may be regulated).
  ‣ Backup data (to meet an SLA to the business).
Phase II Topics
‣ Does the organization have access to another site that can be used for DR?
‣ What are the availability requirements for the DR site?
‣ What is the rate of data growth for systems that are part of the DR plan?
‣ What data protection solutions are supported by the various vendors that make the application software?
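The data growth question feeds directly into DR site sizing. As a rough sketch, assuming steady compound annual growth (real growth is usually lumpier), the Python below projects future capacity and estimates when a DR array would fill. The 20 TB starting size, 25% growth rate, and 40 TB array are hypothetical.

```python
def projected_tb(current_tb, annual_growth, years):
    """Capacity after the given number of years of compound growth."""
    return current_tb * (1 + annual_growth) ** years


def years_until_full(current_tb, annual_growth, capacity_tb):
    """Whole years of growth before the data no longer fits in capacity_tb."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years, size = 0, current_tb
    while size <= capacity_tb:
        size *= 1 + annual_growth
        years += 1
    return years


# Hypothetical figures for illustration only.
print(f"In 3 years: {projected_tb(20, 0.25, 3):.1f} TB")
print(f"A 40 TB DR array is outgrown in year {years_until_full(20, 0.25, 40)}")
```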
Phase II Topics
‣ How will user sessions be redirected to the DR site?
‣ How will data/systems be failed back?
‣ How will the DR solution be tested?
What the Business Should Expect
‣ The costs for a DR strategy can be estimated with reasonable accuracy from the answers in Phase II.
‣ Based on this estimate, the organization may or may not want to proceed to the next phase: Execution.
Insight & Advice
Integrated Data Protection
‣ Think integrated data protection, rather than backup & DR.
‣ For example, consolidate backup & DR into a single data protection event.
‣ Array snapshots remain the fastest external technique to combine backup & DR (replicated snapshots).
‣ Popular array vendors & popular data protection vendors integrate with each other (CommVault Simpana, for example).
Integrated Data Protection
‣ A backup makes for a poor DR solution.
  ‣ Can the backup run frequently enough to meet your RPO?
  ‣ What is the performance impact of running a backup job during business hours?
  ‣ Can the backup data be sent off site frequently enough to meet the off-site requirements?
  ‣ How long will recovery take?
  ‣ What skills are needed to perform the recovery?
  ‣ What systems are needed to perform the recovery?
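The first and third questions above can be checked with simple arithmetic. The Python sketch below uses a deliberately simplified model: data written just after one backup starts is not safe until the next backup has started, finished, and been moved off site. The schedule and the 4-hour RPO are hypothetical.

```python
def worst_case_loss_hours(interval_hours, run_hours, offsite_hours):
    """Oldest unprotected data at the worst moment for a disaster to strike.

    Simplified model: backup interval + backup run time + off-site transfer time.
    """
    return interval_hours + run_hours + offsite_hours


def backup_meets_rpo(interval_hours, run_hours, offsite_hours, rpo_hours):
    """True when the worst-case loss stays within the RPO."""
    return worst_case_loss_hours(interval_hours, run_hours, offsite_hours) <= rpo_hours


# Hypothetical schedule: nightly backup, 6-hour run, 12 hours to get off site.
loss = worst_case_loss_hours(interval_hours=24, run_hours=6, offsite_hours=12)
ok = backup_meets_rpo(24, 6, 12, rpo_hours=4)
print(f"Worst-case loss: {loss} h; meets a 4 h RPO: {ok}")
```

With numbers like these, a nightly backup misses a 4-hour RPO by a wide margin, which is the point the slide is making.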
Keep It Simple
‣ Every solution you own...
  ‣ Costs something to buy, something to keep.
  ‣ Must have trained administrators.
  ‣ Will have patches, updates & enhancements that must be managed.
  ‣ Must be regularly tested.
  ‣ Has a support matrix that, in turn, must dovetail with other support matrices.
  ‣ Has best practices that must be followed.
Keep It Simple
‣ Will you be the first customer to integrate solution X with solution Y? Avoid being the guinea pig at all costs.
‣ Will the vendors that make up your applications & data protection solution(s) support your design?
‣ Does the benefit of adding boutique solution X outweigh the additional cost & effort?
‣ The fewer data protection solutions you own, the better.
Hosted Solutions
‣ Are you looking at hosted solutions?
‣ Have you created a set of requirements for the hosting provider?
  ‣ SLAs for uptime, performance, network connectivity.
  ‣ Data governance (encryption, access control, ISO certification, etc.).
  ‣ Access to equipment.
  ‣ Future expansion considerations.
Network Connectivity
‣ Use dedicated links whenever possible; enforce bandwidth policies if not.
‣ Has the network connection been tested for throughput, reliability & errors?
  ‣ Your 100Mb circuit might become 70Mb during peak hours.
  ‣ Surprise: Your dedicated MPLS circuit runs on a shared/best-effort backbone.
  ‣ Providers will often discard packets when the network is under stress; you’ll see this as errors on the line & a drop in speed.
Network Connectivity
‣ What SLA does your network provider offer? How is it enforced?
‣ If your bandwidth needs increase, can the provider deliver more?
‣ If availability matters, look for single-source providers; the fewer companies that touch your connection, the better.
Network Connectivity
‣ WAN acceleration is worth it.
  ‣ Plan on 30% overhead for TCP/IP encapsulation.
  ‣ Add 10% to that for encryption.
  ‣ WAN acceleration technology is very inexpensive.
  ‣ WAN acceleration can address error-prone connections (CRCs in lieu of retransmits).
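The overhead figures above can be combined with the data change rate to estimate how large a replication link needs to be. The Python sketch below is a back-of-the-envelope calculation, not a sizing tool. It reads "add 10% to that" as additive (40% total overhead), which is an assumption, and the 50 GB change rate, 1-hour RPO, and 70 Mb peak throughput are hypothetical.

```python
def payload_mbps(changed_gb, rpo_hours):
    """Average megabits/second needed to move changed_gb within one RPO window."""
    megabits = changed_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / (rpo_hours * 3600)


def required_mbps(changed_gb, rpo_hours,
                  tcp_overhead=0.30, encryption_overhead=0.10):
    """Payload rate plus protocol and encryption overhead (assumed additive)."""
    overhead_factor = 1 + tcp_overhead + encryption_overhead
    return payload_mbps(changed_gb, rpo_hours) * overhead_factor


# Hypothetical inputs: 50 GB changes per 1-hour RPO; link delivers 70 Mb at peak.
PEAK_LINK_MBPS = 70
need = required_mbps(changed_gb=50, rpo_hours=1)
print(f"Need about {need:.0f} Mb/s; link delivers {PEAK_LINK_MBPS} Mb/s at peak")
```

This ignores any reduction from WAN acceleration, de-duplication, or compression, so it is best treated as an upper bound to test against measured link throughput.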
Network Connectivity
‣ Network gateways are an easy way to simplify redirecting traffic.
‣ Shameless plug: F5 Networks.
On the Radar: NSX
Common DR Solutions
Common DR Solutions
‣ Array snapshots, replicated off-site to a partner array.
‣ Virtual machines in lieu of physical servers.
‣ VMware Site Recovery Manager (SRM).
‣ Veeam Backup/Replication.
‣ F5 gateways.
‣ Exchange DAGs.
‣ Oracle, SQL transactional replication.
‣ Zerto.
In Closing
DR/BC Workshop
‣ We offer on-site DR workshops, which cover the phases & topics presented in this workshop.
‣ We sell hardware & software solutions to support your DR initiative.
‣ For example, a Phase II DR consulting engagement can be as short as three (3) days.
‣ If your DR plan isn’t successful, we can send your resume to some people we know.
For NWPPA Members
‣ DR/BC mini-assessment at no cost.
‣ Email [email protected] to schedule the assessment.
‣ Plan on 4-6 hours for discovery.