Mohamed Sohail, Data Protection and Availability Specialist, Dell, [email protected]
Emanuela Caramagna, System Engineer, Pure, [email protected]
Sameh Gad, Senior Consultant, Dell, [email protected]
HIGH AVAILABILITY AS A SERVICE (HAAAS)
2016 EMC Proven Professional Knowledge Sharing 2
Table of Contents
Abstract ............................................................. 2
Introduction ......................................................... 5
Components of the Design ............................................. 5
Solution Roadmap ..................................................... 6
  List Failure Scenarios ............................................. 7
  Example 1: (Failure scenario for an engineering system) ............ 7
  Evaluate Failure Scenarios ......................................... 10
  Map Scenarios to Requirements ...................................... 11
Design solution ...................................................... 13
The core architecture ................................................ 14
Why High Availability ................................................ 15
The journey towards HAaaS ............................................ 16
Importance of HAaaS business model ................................... 16
Risks of the Cloud – Fear of flying .................................. 21
HAaaS Use Cases ...................................................... 22
  1st use case: High Availability as a Service in the Cloud .......... 22
  2nd use case: High Availability as a Service to the Cloud approach . 24
Conclusion ........................................................... 26
References ........................................................... 27
List of figures ...................................................... 28
Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do not necessarily reflect Dell EMC’s views, processes or methodologies. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.
Abstract
As society increasingly depends on computer-based systems, the need for ensuring that
services are provided to end-users continuously has become critical. To build a computer
system upon which people can depend, a system designer must first have a clear idea of all
the potential causes that may bring down a system. Back in 1965, Digital Equipment Corp
(DEC) changed the computer world by introducing the PDP-8, an open alternative to IBM
mainframe technology.
Figure 1: Mainframe system
This was the first commercial success in the minicomputer area, and opened the door for
future scenarios and today’s open systems technology.
After almost 20 years of computer evolution, customers were waiting for more robust
solutions with high availability. In 1984, DEC introduced the first cluster system for the
VAX/VMS operating system.
Figure 2: Digital cluster system
Over time, cluster architectures were adopted by most of the major vendors: HP, IBM,
Sun, Microsoft, and Linux distributions such as SUSE and Red Hat. Most cluster solutions
worked in an active/passive configuration: the service is hosted on one node and, in case of
node failure, is switched to another node.
This approach had several limitations:

- Resources are dedicated to a specific cluster and cannot be shared.
- Most cluster platforms use an active/passive approach, meaning some resources remain in a stand-by state.
- In most cases, the cluster protects against hardware failure, but restart on other nodes is not guaranteed; manual intervention is required in the event of an unattended failure.
- The cluster approach requires a complex architecture and a very high level of knowledge to implement and manage.
This limitation created the need for a new architecture that can allow dynamic sharing of
resources between nodes and provide more robust options in case of hardware failure.
Introduction
High Availability is all about redundancy and the ability to accurately identify failures at
every level of the IT infrastructure, with automated corrective action.

Services must be classified based on each organization's business requirements. Critical
services require 24/7 availability with minimal or no downtime and optimum performance.
This usually requires significant investment in IT infrastructure, with redundancy at all
levels of the datacenter facilities (network, compute, application clusters, and so on) to
achieve the target availability. With today's modern solutions, it has become much easier
and faster to achieve these targets.
High availability is vital to keep services alive; a single server cannot fulfill such a target.
There are many possible causes of failure: hardware faults, data corruption, network
outages, operating system crashes, or software bugs (in databases, application servers,
web servers). The goal is a solution that prevents downtime in such cases and recovers
immediately.
The cluster is a key component of any high availability solution and is used for redundancy.
It distributes load between the different cluster nodes to achieve the required scalability,
response time, load balancing, and performance for mission-critical applications and
systems.
Components of the Design
A sound design is essential to minimize downtime. New automation technologies are
based on virtualization of the infrastructure, which makes it possible to define the
requirements and deploy them with just a click (a request). The main objective is to make
high availability available as a service, selectable as a simple choice, for example: the
website needs five-nines availability. This choice can then be reflected in the subsystems,
which build all the prerequisites needed to achieve a 99.999% available website.
The parameters below need to be fulfilled:

- Expected number of users/traffic
- Web servers
- Application servers
- Database servers
- Storage IOPS
- Network traffic
- Backup & recovery
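As a rough illustration of what a five-nines target means, the downtime budget implied by an availability percentage can be computed directly (the helper below is our own sketch, assuming 24/7 operation):

```python
# Downtime budget implied by an availability target, assuming 24/7 operation.
# The availability figures are standard; the helper function is our own sketch.

def downtime_budget_minutes(availability_pct: float, period_hours: float = 365 * 24) -> float:
    """Maximum allowed downtime (in minutes) over the period for a given availability %."""
    return period_hours * 60 * (1 - availability_pct / 100)

# Five nines over a year leaves roughly five minutes of downtime.
yearly = downtime_budget_minutes(99.999)
print(f"99.999% over a year allows {yearly:.2f} minutes of downtime")  # prints 5.26
```

Comparing targets this way makes the cost conversation with business owners concrete: each extra nine shrinks the yearly budget by a factor of ten.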
Solution Roadmap
Up to now, we have learned about the basic principles of good system design for high
availability: categorization in the system stack, redundancy, robustness and simplicity, and
virtualization. But what does this mean for you if you are responsible for producing a
solution and want to create the technical system design? Let us assume that you have
written down the business objectives and the business processes that are relevant for your
system architecture. After that you need to consider the following steps:
1. List failure scenarios
2. Evaluate scenarios, and determine their probability
3. Map scenarios to requirements
4. Design the solution, using the dependency chart methodology
5. Review the solution, and check its behavior against failure scenarios
These steps are not just executed in sequence. Most important, solutions, requirements,
and failure scenarios are not independent: a different solution may bring different failures
to consider. In addition, different solutions come with very different price tags attached.
Business owners sometimes want to reconsider their requirements when they recognize
that protection against some failure scenarios costs more than the damage those scenarios
might cause. Therefore, during each step, evaluate whether the results make it necessary
to reconsider the previous steps' results. These feedback loops prevent the
consistent-but-wrong design syndrome.
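The iterative roadmap can be sketched as a loop in which a failed review or an over-budget design feeds back into the requirements (every step below is a placeholder for real project work, not an actual analysis):

```python
# Conceptual sketch of the iterative design loop described above.
# Every step here is a stand-in; a real project fills these in with analysis.

def design_iteration(requirements):
    scenarios = ["disk failure", "operator error"]          # 1. list failure scenarios
    estimates = {s: "high probability" for s in scenarios}  # 2. evaluate scenarios
    mapping = {s: "minor outage" for s in estimates}        # 3. map to requirements
    solution = {"design": "clustered", "cost": 100}         # 4. design the solution
    review_ok = solution["cost"] <= requirements["budget"]  # 5. review the solution
    return solution, review_ok

requirements = {"budget": 80}
solution, ok = design_iteration(requirements)
while not ok:                      # feedback loop: requirements are reconsidered
    requirements["budget"] += 20   # e.g. business owners grant more budget
    solution, ok = design_iteration(requirements)
print(f"accepted design: {solution['design']} at cost {solution['cost']}")
```

The point of the sketch is the `while` loop: design is not a one-pass pipeline but a negotiation between cost, requirements, and the failure scenarios themselves.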
With this iterative approach in mind, let’s examine each of those steps in more detail.
List Failure Scenarios
It is not realistic to list each and every incident
that can render such complex systems unusable.
For example, one can have an outage owing to
resource overload that may be caused by many
reasons: too many users, a software error, either
in the application or the operating system, a
denial of service attack, etc. It is not possible to list all the reasons, but it is possible to list all
components that can fail and what happens if they fail alone or in combination.
We can start by writing up the specific system stack without any redundancy information.
Then we list for each component how that component can fail. The system stack already
gives us a good component categorization that will help us categorize the failure scenarios
as well. First, we will write up high-level failure scenarios, and then iterate over them and
make them more precise by providing more detailed (and more technical) descriptions of
what can go wrong.
Sometimes the owner of a business process has their own failure scenarios, e.g. from past
incidents, that they want to see covered. Usually, it is easy to add them to the list of generic
failure scenarios. That is worth doing even if they are already present in generalized form,
since it brings better buy-in from that important stakeholder.
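A first draft of such a list can be generated mechanically from the system stack and then refined by hand; the components and failure modes below are illustrative, not an exhaustive catalog:

```python
# Build a first-draft failure-scenario list from the system stack.
# Components and failure modes are examples drawn from the text, not a full catalog.

system_stack = {
    "application":      ["abort", "data corruption", "memory leak"],
    "database":         ["file corruption", "deadlock", "log corruption"],
    "operating system": ["disk full", "frozen process", "driver I/O error"],
    "hardware":         ["CPU failure", "memory failure", "NIC failure"],
}

# One scenario string per (component, failure mode) pair.
scenarios = [f"{component}: {failure}"
             for component, failures in system_stack.items()
             for failure in failures]

for s in scenarios:
    print(s)
```

Each generated line is then iterated over, exactly as the example below does, splitting coarse scenarios into more technical ones.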
Example 1: (Failure scenario for an engineering system).
--------------------------------------------------------------------------------
The following list is an excerpt from failure scenarios for an engineering system that also
utilizes a database with part detail information. This is the second iteration, where high-level
failure scenarios (marked with bullets) are dissected into more specific scenarios (marked
with dashes). The iteration process is not finished yet; the failure scenario list therefore is
not complete and covers only exemplary failures.
But if you compare that list with the one from Table 2.5, it is clear that this is more
structured and oriented along the system stack. It is the result of a structured analysis, and
not of a brainstorming session:
• User- or usage-caused failure
  - Deletion of a small amount of data (up to a few megabytes)
  - Deletion of a large amount of data (some gigabytes, up to terabytes)
  - Utilization of too many resources in a thread-based application
  - Flood of requests/jobs/transactions for a system
• Administrator-caused failure
  - Deletion of application data
  - Deletion of user or group information
  - Change to configuration or program makes service nonfunctional
  - Incomplete change to configuration or program that makes failure protection nonfunctional (e.g. configuration change on a single cluster node)
• Engineering application failures
  - Aborting of application
  - Corruption of data by application error
  - Loss of data by application error
  - Hung Java virtual machines
  - Memory leak consuming available main memory
  - File access denied owing to erroneous security setup
• Database failures
  - Database file corrupted
  - Database content corrupted
  - Index corrupted
  - Database log corrupted
  - Deadlocks
  - Automatic recovery not successful, manual intervention needed
• Operating system failures
  - Log files out of space
  - Disk full
  - Dead, frozen, or runaway processes
  - Operating system queues full (CPU load queue, disk, network, …)
  - Error in hardware driver leads to I/O corruption
• File system corruption
  - Recover by journal possible
  - Automatic file system check time within the service level agreement (SLA)
  - Automatic file system check time beyond the SLA
  - Manual file system repair needed
• Storage subsystem failure
  - Disk media failure
  - Microcode controller failure
  - Volume manager failure
  - Backplane failure
  - Storage switch interface failure
• Hardware failure
  - CPU failure
  - Memory failure
  - Network interface card failure
  - Backplane failure
  - Uninterruptible power supply (UPS) failure
• Physical environment destroyed
  - Power outage
  - Room destroyed (e.g. by fire)
  - Building destroyed (e.g. by flood)
  - Site destroyed (e.g. by airplane crash)
  - Town destroyed (e.g. by hurricane, large earthquake, war)
• Infrastructure service unavailable
  - Active Directory/Lightweight Directory Access Protocol (LDAP) outage, not reachable, or corrupted
  - DNS not reachable
  - Loss of shared network infrastructure
  - Network latency extended beyond functionality
  - Virus attack
  - Switch or router failure
  - Email not available
  - Backup server not reachable
  - License server outage or not reachable
• Security incidents
  - Sabotage
  - Virus attacks
  - Denial of service attacks
  - Break-ins with suspected change of data
You might have noticed that some failure descriptions are quite coarse and do not go into
much detail. Failure scenario selection is guided by experience, particularly experience
with potential solutions. When one knows that all process-related faults will be handled
the same way (namely, the system must be restarted), it does not make much sense to
distinguish whether the CPU load queue or the memory queue is full.
Evaluate Failure Scenarios
For each failure scenario, you have to estimate two properties:
1. The probability of the failure
2. The damage that is caused by that failure
In practice, however, we cannot determine exact numbers for either the probability or the
damage. If we have a similar system running and have had incidents there, we can use that
data for better approximations.
What we can do is determine the relative probability and the relative damage of each
scenario and map them on a two-dimensional graph. Figure 3 shows such a mapping for
selected scenarios.
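Such a relative mapping can be sketched with placeholder estimates on a 1-5 scale (the numbers below are illustrative, not measured values):

```python
# Relative probability/damage estimates (scale 1-5) for selected scenarios.
# The numbers are illustrative placeholders, not measured values.

scenario_map = {
    "disk media failure":     {"probability": 5, "damage": 1},
    "operator deletes data":  {"probability": 3, "damage": 3},
    "file system corruption": {"probability": 2, "damage": 4},
    "site destroyed":         {"probability": 1, "damage": 5},
}

# Rank by a simple risk score (probability x damage) to see which scenarios dominate.
ranked = sorted(scenario_map.items(),
                key=lambda kv: kv[1]["probability"] * kv[1]["damage"],
                reverse=True)
for name, est in ranked:
    print(name, est["probability"] * est["damage"])
```

The product score is only one possible ordering; the two-dimensional plot in Figure 3 keeps probability and damage separate precisely because a frequent cheap failure and a rare catastrophe need different protection, even with equal scores.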
Figure 3: Scenario mapping on probability and damage estimation
Map Scenarios to Requirements
Scenarios with high probability must be covered within the SLA requirements. All of these
failures must lead only to minor outages, i.e. outages after which work can continue within
a short timeframe. Protection against this class of failures falls in the realm of high
availability.

Usually, some failure scenarios are expected to lead to no outage at all, not even to
aborted user sessions. In particular, this is true for defects in disk storage media, which
happen quite often. When disks fail, backup disks must take over without any interruption
and without any state changes beyond the operating system or the storage subsystem.
Our knowledge of business objectives and processes, i.e., of the requirements, gives an
initial assumption about maximum outage times per event and maximum outage times per
month or per year for this class of failure scenarios. For example, business objectives
might demand at most 1 minute per incident and 2 minutes per month, during 14×5
business hours. (Such concrete measurements are more illustrative than a figure like
99.99%.) Later, when we have seen the costs of such a solution, the
business owners might want to lower their requirements; then we have to iterate the
process described.
There are failure scenarios with low probability and high potential damage that should be
considered as major outages and will not be covered by SLAs. If we choose to protect against
these failures as well, we need to introduce disaster-recovery solutions.
Again, requirements for disaster recovery come from business objectives and processes. The
requirements are expressed in terms of recovery time objectives and recovery point
objectives. For example, requirements might be to achieve functionality again within 72
hours of declaring the disaster, and to lose at most 4 hours of data.
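A disaster-recovery design can be checked mechanically against such objectives; the 72-hour RTO and 4-hour RPO come from the example above, while the achieved values below are hypothetical:

```python
# Check a DR design against recovery time / recovery point objectives.
# Objectives are taken from the example in the text; achieved values are hypothetical.

RTO_HOURS = 72   # must be functional again within 72 hours of declaring the disaster
RPO_HOURS = 4    # must lose at most 4 hours of data

def meets_objectives(achieved_rto_hours: float, achieved_rpo_hours: float) -> bool:
    """True if both the recovery time and the data-loss window are within objectives."""
    return achieved_rto_hours <= RTO_HOURS and achieved_rpo_hours <= RPO_HOURS

print(meets_objectives(48, 1))   # restore in 48 h with 1 h replication lag -> True
print(meets_objectives(96, 1))   # restore takes too long -> False
```

In practice both achieved values come out of DR tests, not estimates, which is why regular failover testing belongs in the design review step.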
At the very end, there are failure scenarios that we choose not to defend against. Most
often, these failure scenarios are associated with damage to non-IT processes or systems
that is even larger and makes the repair of IT systems unnecessary. It might also be that we
judge their probability to be so low that we will live with it and do not want to spend money
for protection. For example, while coastal regions and cities near rivers will often find it
necessary to protect themselves against floods, businesses in inland areas will often forgo
protection against large-scale natural catastrophes like hurricanes or tsunamis.
Eventually, such scenario/requirements mapping means categorization of our scenario map.
We color different areas and tell which kind of protection we want for these failure
scenarios.
Figure 4 takes up Figure 3 and adds those areas. We can also have two other, similar, figures
where we exchange the meaning of the x-axis. In the first one, we use outage times. Then
we can have two markers, one for the maximum minor outage time and one for recovery
time objective. The locations of some failure scenarios in this graph will change, but the idea
is the same: We can show which failure scenario must be handled by which fault protection
method. The second additional figure would use recovery point objectives on the x-axis and
would show requirements on maximum data loss.
Figure 4: Requirement areas added to scenario mapping
It is important to point out that the chart has a large area where no scenario is placed and
which is not touched by any of the requirement areas. We call this area the forbidden zone,
as failure scenarios that appear subsequently must not be located there. If they are, we have
to remap the scenarios and redesign our solution.
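This categorization, including the forbidden zone, can be expressed as a small classifier over the same relative scales (the thresholds are illustrative):

```python
# Classify a scenario by relative probability and damage (scale 1-5).
# Thresholds are illustrative; a real project calibrates them against the SLA.

def classify(probability: int, damage: int) -> str:
    if probability >= 4 and damage >= 4:
        return "forbidden zone"        # must remap or redesign: no acceptable protection
    if probability >= 3:
        return "high availability"     # cover within the SLA as a minor outage
    if damage >= 4:
        return "disaster recovery"     # low probability, high damage
    return "accepted risk"             # live with it; protection costs more than the damage

print(classify(5, 1))   # frequent, cheap failure -> high availability
print(classify(1, 5))   # rare catastrophe        -> disaster recovery
print(classify(5, 5))   # must not exist          -> forbidden zone
```

Running every listed scenario through such a classifier is a cheap way to detect when a newly discovered scenario lands in the forbidden zone.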
The possibility exists that there is a failure scenario with high probability and high damage,
where the protection cost would be very high as well. For example, if an application
allowed a user to erase several hundred gigabytes of data without being able to cancel the
process, and without any undo facility, this might very well lead to a major outage. In such
cases, the only possibility might be to change the application's code, or to select another
application that provides similar functionality.
Design solution
This step applies the principles introduced earlier: categorization in the system stack,
redundancy, robustness and simplicity, and virtualization. Starting from the documented
business objectives and business processes, the technical system design is produced with
the dependency chart methodology, following the five steps listed in the Solution
Roadmap: list failure scenarios, evaluate them, map them to requirements, design the
solution, and review it against the failure scenarios. As noted there, these steps are
iterated with feedback loops rather than executed strictly in sequence, since solutions,
requirements, and failure scenarios depend on one another and carry very different price
tags.
The core architecture
“Virtualization” is the keyword of today's architectures and of the years to come. VMware
created the first-in-class hypervisor solution for open systems, then changed the game
again with a complete hypervisor infrastructure that allows installing and managing a fully
virtualized infrastructure.
Figure 5 shows some key points in VMware development history.
Figure 5: VMware development history overview
The new strategy includes a software layer that enables High Availability & Disaster
Recovery, with products like RecoverPoint and VPLEX deployed as a service in ViPR, plus
advanced reporting features such as chargeback, capacity planning, service status reports
and history, and self-service deployment.
Why High Availability?
The answer lies in the consequences when the desired services are not available. Imagine
you were one of the one million mobile phone users in Finland who were affected by a
widespread disturbance of a mobile telephone service [1] and had problems receiving your
incoming calls and text messages. The interruption of service, reportedly caused by a data
overload in the network, lasted for about seven hours during the day. You could also
picture yourself as one of the four million mobile phone subscribers in Sweden when an
unspecified fault caused the network to fail, leaving it unable to provide you with mobile
phone services [2]. The disruption lasted about twelve hours, beginning in the afternoon
and continuing until around midnight.
Another high-profile, high-impact computer system failure was at Amazon Web Services
[4], which provides web hosting services by means of its cloud infrastructure to many web
sites. The failure was reportedly caused by an upgrade of network capacity and lasted
almost four days before the last affected consumer data were recovered [5], although
0.07% of the affected data could not be restored. The consequence of this failure was the
unavailability of services to the end customers of the web sites using the hosting services.
Amazon also paid 10-day service credits to the affected customers.
The journey towards HAaaS
EMC has taken many steps toward achieving the HAaaS vision with its federation portfolio.
Figure 6 illustrates the new strategy of a software layer that enables high availability based
on a Software-Defined Data Center architecture.

Figure 6: SDDC diagram

In the following pages we present our vision for achieving five-nines (99.999%) availability.
Importance of HAaaS business model
High availability (HA) is paramount for modern business and mission-critical applications
because of their critical position and open design. A key strength of the modern business is
the ability to interact with multiple cross-format applications; however, this strength also
creates multiple touch-points that can affect availability. The conclusion? Mission-critical
applications require a well-designed HA solution that can protect and maintain uptime not
only for infrastructure services but also for application-level services.
Figure 7: Infrastructure as a Service as well as Platform as a Service leverage the high availability active-active solution
One successful model is EMC's. The EMC High Availability Stack is a solid solution to
minimize the impact and downtime of critical applications and to automate recovery at the
service level, ensuring continuous, maximum service availability. This is done by
integrating VMware HA with third-party cluster software to ensure 360-degree service
availability.
With the new approach, recovery time, or estimated time to repair (ETR), can be
minimized if the failure or fault is detected through hypervisor-level platforms.
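The idea of fast fault detection followed by an automated service-level restart can be sketched as a minimal monitor loop (the health check and restart hooks are placeholders for whatever the platform actually provides, not a real API):

```python
# Minimal service monitor: detect a failed health check and fail over.
# check_health and restart_on_standby are placeholders for platform hooks.

def check_health(node: str, state: dict) -> bool:
    """Placeholder health probe; real code would query the hypervisor or service."""
    return state.get(node, False)

def restart_on_standby(service: str, standby: str) -> str:
    """Placeholder restart hook; real code would start the service on the standby node."""
    return standby

state = {"node-a": False, "node-b": True}  # node-a has failed its health check
active = "node-a"
if not check_health(active, state):
    active = restart_on_standby("database", "node-b")
print(f"service now active on {active}")
```

The shorter the probe interval and restart path, the lower the ETR, which is why hypervisor-level detection beats waiting for users to report the outage.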
High Availability includes:

- HA Infrastructure/Storage
- HA Network/Security
- HA Servers/OS

HAaaS primary components are:

- Cluster software
- Services integration with the cluster software
- Cluster file system
- Application
Figure 8: Extended Cluster Service between the sites on the application level availability
Figure 9: Protecting the database services against ESXi host failures
VMware capabilities can include the cluster services and manage third-party products, as
shown in Figures 10, 11, and 12.
- File systems
- Web servers
- Application servers
- Database servers
Figure 10: High Availability with the application components included on top of the virtualization layer
Figure 11: The HAaaS targets control each tier's availability, ensuring end-to-end redundancy and consistency at the virtualization level
Figure 12: High availability matters at the application level as well, not only for the VMs or the hypervisors
Cluster software for virtualized applications/databases is available from 3rd-party products
and can fulfill the requirements for an end-to-end HAaaS solution:

- HA Infrastructure/Storage
- HA Network/Security
- HA Servers/OS
- HA File Systems
- HA Applications (existing 3rd-party cluster products/services at the logical layer)
- HA Backup & Recovery
Figure 13: A cluster active/passive example; failing over the DB service requires minimal downtime
Risks of the Cloud – Fear of flying
Enterprises know that the cloud will change IT, but security and performance are concerns.
Each cloud model has potential risks: reliability, adaptability, application compatibility,
efficiency, scaling, lock-in, security, and compliance.

Companies must select an enterprise cloud solution that suits a complex mix of
applications, and these decisions require great care. The solution should be
enterprise-class, designed for mission-critical applications, with performance and security
built for the enterprise. To achieve this, enterprises and service providers should combine
existing IT by building private clouds, using virtual private clouds, and accessing public
clouds.

This may require going beyond conventional virtualization: a mechanism that manages
compute, memory, storage, and networking in small cloud units, μVMs. Unlike fixed-size
VMs, μVMs are dynamically allocated and optimized per application, with simple automatic
monitoring and control.
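The μVM idea, allocating small per-application resource units instead of fixed-size VMs, can be sketched as follows (the unit sizes and the allocator are our own illustration, not any product's implementation):

```python
# Sketch of dynamic allocation in small units (muVMs) instead of fixed-size VMs.
# The unit size and allocator are illustrative, not any product's implementation.
import math

UNIT = {"cpu": 0.5, "ram_gb": 0.5}   # one small allocation unit

def units_needed(cpu: float, ram_gb: float) -> int:
    """Smallest number of units covering both the CPU and RAM demand."""
    return max(math.ceil(cpu / UNIT["cpu"]), math.ceil(ram_gb / UNIT["ram_gb"]))

# An app needing 1.2 vCPU and 2 GB fits in 4 small units,
# instead of occupying a whole fixed 4-vCPU/8-GB VM.
print(units_needed(1.2, 2.0))   # prints 4
```

Sizing to demand rather than to a fixed VM shape is what lets the provider pack applications densely while still growing each one automatically.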
Figure 14: Layers of services of data protection
In general, protection services are divided into three levels of protection, and EMC can
cover all these requirements, adding Virtustream for mission-critical application
requirements. In addition, it is possible to create an architecture with multiple levels of
protection, enabling different service levels and multiple copies managed or retained
inside the infrastructure. The next two sections use two real use cases to describe this in
more detail.
HAaaS Use Cases
1st use case
High Availability as a Service in the Cloud
In this real-world example, the entire infrastructure is inside the Cloud Service provider's
sites. The service provider has three sites with an HA + DR approach, and the HA sites can
host active vApps.

Five years ago, a regional telecommunications company decided to change its strategy by
adding Cloud services to its service catalog. It was a challenging situation: it was a new
deployment from scratch, they did not know the market horizons or the technologies that
could enable them, and they were guided by customer requests.
Their internal infrastructure was based on VMware, Cisco, and EMC, and they decided to
keep this infrastructure for the services provided to their customers as well. Their first
Cloud service was compute, in a two-site configuration with VMware vCloud Director and
EMC VPLEX.
The company is well positioned within its region because it owns the entire connectivity
infrastructure, but in the initial phase the problem was to “learn” how to sell Cloud
services. EMC supported them in creating new services and offering them to customers in
an innovative way.
Every HA architecture requires a platform that supports HA, such as VMware, Oracle, or
Hyper-V. Enabling customers to create a vApp or a virtual machine is the easiest and most
flexible approach to implementing HAaaS.
This Cloud provider can now offer different and combined levels of protection with different
RTO / RPO and related cost. Compute services are delivered with VMware orchestrated by a
vCloud Director with a virtual Data Center configured for every customer.
The Cloud service provider's configuration allows implementing different protection levels:

- Local protection, with VPLEX HA on a specific site
- Remote protection in HA, with a VPLEX Metro configuration between two sites
- Disaster recovery to a remote site with RecoverPoint
- Recovery from backup with an Avamar and Data Domain configuration
Figure 15: Cloud provider disaster recovery
The Cloud service provider's service catalog allows combining different levels of protection
to match customer requirements.
One real example of a customer environment is shown below:
Level             Number of VMs   Total size   Configured services
Mission Critical  10              4 TB         HA Local + Remote + DR + Backup
Critical          30              6 TB         HA Remote + DR + Backup
Standard          40              8 TB         DR + Backup
Test&Dev          30              5 TB         Backup
This approach creates additional value for the Cloud provider's services and enables
customers to model costs on the real business value of each virtual machine.
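A tiered catalog like this one can be modeled directly; the RTO figures below are hypothetical placeholders, not the provider's published values:

```python
# Model of the tiered protection catalog from the use case.
# RTO values (hours) are hypothetical placeholders, not published figures.

catalog = {
    "Mission Critical": {"services": ["HA Local", "HA Remote", "DR", "Backup"], "rto_hours": 0.1},
    "Critical":         {"services": ["HA Remote", "DR", "Backup"],             "rto_hours": 1.0},
    "Standard":         {"services": ["DR", "Backup"],                          "rto_hours": 24.0},
    "Test&Dev":         {"services": ["Backup"],                                "rto_hours": 72.0},
}

def tier_for(required_rto_hours: float) -> str:
    """Cheapest tier whose RTO meets the requirement (tiers ordered costly -> cheap)."""
    eligible = [t for t, spec in catalog.items() if spec["rto_hours"] <= required_rto_hours]
    return eligible[-1] if eligible else "Mission Critical"

print(tier_for(2.0))    # an app that must be back within 2 hours -> Critical
```

This is how the catalog lets a customer buy protection matched to the business value of each VM rather than paying mission-critical prices for test systems.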
2nd use case
High Availability as a Service to the Cloud approach
One year ago, an Italian service provider wanted a solution to offer Hybrid Cloud
HAaaS/DRaaS to its customers. This service provider focuses on virtual environments and
offers Cloud services and managed services. Its offering addresses several market
segments, but its focus is the public sector, where it has a considerable presence.
When RecoverPoint for Virtual Machines was presented, they immediately agreed that it
was the right solution for the new HAaaS/DRaaS services they wanted to launch. They were
not a strong EMC customer (most of their infrastructure was based on IBM and Dell
technologies), but with RecoverPoint for VMs they could offer an independent service
platform with different levels of RTO/RPO depending on connectivity.
The most important point for this service provider was being able to offer a real Hybrid
Cloud solution, with the possibility of migrating end customers' infrastructure into the
Cloud without any additional effort. This scenario can be deployed as a real HA
implementation, and the schema can be the same as for the Cloud. In this scenario, the
customer's virtual machines are distributed dynamically between the available redundant
sites.
The customer can decide to adopt a near-HA strategy with RecoverPoint for Virtual
Machines. With this technology, a customer can have synchronous replication with a
minimal RTO. This approach is positioned near full availability, as represented in the figure
below.
Figure 16: Near high availability - Near HA
With this technology, the Cloud service provider can define different levels of protection:
- Near HA, with synchronous replication and local/remote protection
- Disaster recovery, with asynchronous replication and local/remote protection
This architecture allows Cloud resources to be shared, reducing service costs for both provider and customer.
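The two protection levels above can be sketched as policy definitions. This is a minimal, hypothetical illustration: the `ProtectionPolicy` class and the RPO figures are assumptions for the sketch, not product specifications.

```python
from dataclasses import dataclass

# Illustrative policy definitions for the two protection levels described
# above. The RPO values are placeholder assumptions, not product guarantees.
@dataclass(frozen=True)
class ProtectionPolicy:
    name: str
    replication: str      # "synchronous" or "asynchronous"
    local_copy: bool
    remote_copy: bool
    rpo_seconds: int      # worst-case data-loss window

NEAR_HA = ProtectionPolicy("Near HA", "synchronous", True, True, 0)
DISASTER_RECOVERY = ProtectionPolicy(
    "Disaster Recovery", "asynchronous", True, True, 300)
```

Defining the levels as immutable policy objects makes it straightforward for the provider to attach one policy per tenant VM and bill accordingly.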
Figure 17: Architecture overview
With this approach, it is possible to configure multi-site protection; the Cloud provider can manage, for example, two remote sites with two different copies and journaling of the VMs.
This is an example of a topology with a shared vCenter inside the Cloud Provider site.
Figure 18: vCenter schema
RecoverPoint for Virtual Machines has several advantages inside a VMware environment:
- Protection granularity is at the VM level, not the LUN level.
- Customers can define a restart sequence for disaster recovery, helping to create a flexible disaster recovery plan.
- The Cloud provider can manage VMs directly from a central vCenter.
- Customers have a self-service portal with a guided procedure to protect a VM and to test or manage a start on the remote site.
- The product allows DR tests to run inside an isolated network and with different network addresses.
- The entire infrastructure is virtual, without any hardware appliance at the customer site.
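The restart-sequence idea from the list above can be sketched in a few lines. The VM names, priorities, and helper function below are invented for illustration; a real DR plan would come from the product's own configuration.

```python
# Hypothetical sketch of a DR restart sequence: VMs are brought up in
# priority order so that dependencies (e.g. the database before the
# application servers) are respected. Names and priorities are invented.
RESTART_PLAN = [
    ("app-server-1", 2),
    ("db-server", 1),     # the database must start first
    ("web-frontend", 3),
]

def restart_order(plan):
    """Return VM names sorted by restart priority (lowest number first)."""
    return [vm for vm, _ in sorted(plan, key=lambda entry: entry[1])]
```

Sorting on an explicit priority keeps the recovery plan flexible: reordering a dependency only means changing one number, not rewriting the plan.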
Conclusion
High Availability as a Service has become an essential part of modern virtualization solutions to ensure service availability. Figure 19 shows the importance of integration between the application solutions in reaching the targeted availability; each product can play a vital role during system design and the automation process.
Considerations while building HAaaS:
- Understand the exact business requirements, including RTO, RPO, SLO, SLA, and SLM.
- High availability requires significant investment, so it is crucial to evaluate the environment and choose the right products and components; doing so is a key success factor.
- Skills are very important in the design, build, and operation phases in order to maximize the efficiency and benefits of the HAaaS solution and to build a modern, stable cloud solution.
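When weighing the investment points above, it helps to translate availability targets into concrete downtime budgets. The sketch below shows the standard "nines" arithmetic; the function names are ours, but the formulas are the usual availability definitions.

```python
# Translate an availability SLA ("nines") into the downtime it permits per
# year, and combine availabilities of components that must all be up.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime budget for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR

def serial_availability(*components: float) -> float:
    """Combined availability of components in series (all must be up)."""
    result = 1.0
    for a in components:
        result *= a
    return result
```

For example, a 99.9% SLA allows roughly 525 minutes of downtime per year, and two 99.9% components in series together deliver only about 99.8%; this is why end-to-end redundancy across every tier, not just one layer, matters.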
Figure 19: High availability scenario for Oracle Database Virtual Machine
References
- Service Availability: Principles and Practice, Maria Toeroe and Francis Tam (eds.)
- https://en.wikipedia.org/wiki/High_availability
- https://en.wikipedia.org/wiki/High-availability_cluster
- https://virtuallylg.wordpress.com/2013/10/10/comparing-vmware-vsphere-app-ha-with-symantec-applicationha/
- http://www.storagereview.com/vmware_vmmark_virtualization_benchmark
- http://mandarshinde.com/elasticsearch-basics/
- http://virtcloud.blogspot.com.eg/2011/07/designing-your-private-cloud-with.html
- High Availability and Disaster Recovery: Concepts, Design, Implementation
- Virtustream.com
List of figures
Figure 1: Mainframe system (page 3)
Figure 2: Digital cluster system (page 4)
Figure 3: Scenario mapping on probability and damage estimation (page 11)
Figure 4: Requirement areas added to scenario mapping (page 13)
Figure 5: VMware development history overview (page 15)
Figure 6: SDDC diagram (page 16)
Figure 7: IaaS, PaaS, and SaaS; alongside Infrastructure as a Service and Platform as a Service, High Availability as a Service is important (page 17)
Figure 8: Extended cluster service between the sites for application-level availability (page 18)
Figure 9: Protecting the database services against ESXi host failures (page 18)
Figure 10: High availability with the application components included on top of the virtualization layer (page 19)
Figure 11: The HAaaS targets are to control each tier's availability, ensuring end-to-end redundancy and consistency at the virtualization level (page 19)
Figure 12: High availability matters on the application level as well, not only on the VMs or the hypervisors (page 20)
Figure 13: Cluster Active/Passive example; failing over the DB service requires minimal downtime (page 21)
Figure 14: Layers of services of data protection (page 22)
Figure 15: Cloud provider disaster recovery (page 23)
Figure 16: Near high availability (Near HA) (page 25)
Figure 17: Architecture overview (page 25)
Figure 18: vCenter schema (page 26)
Figure 19: High availability scenario for Oracle Database Virtual Machine (page 27)
Dell EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying and distribution of any Dell EMC software described in this publication requires an applicable software license.
Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.