Page 1: HIGH AVAILABILITY AS A SERVICE (HAAAS) · 2020. 7. 10. · Mohamed Sohail Data Protection and Availability Specialist Dell EMC mohamed.sohail@dell.com Emanuela Caramagna System Engineer

Mohamed Sohail, Data Protection and Availability Specialist, Dell EMC, mohamed.sohail@dell.com

Emanuela Caramagna, System Engineer, Pure Storage, [email protected]

Sameh Gad, Senior Consultant, Dell EMC, [email protected]

HIGH AVAILABILITY AS A SERVICE (HAAAS)

2016 EMC Proven Professional Knowledge Sharing 2

Table of Contents

Abstract
Introduction
Components of the Design
Solution Roadmap
    List Failure Scenarios
    Example 1: (Failure scenario for an engineering system)
    Evaluate Failure Scenarios
    Map Scenarios to Requirements
Design solution
The core architecture
Why High availability
The journey towards HAaaS
Importance of HAaaS business model
    Risks of the Cloud – Fear of flying
HAaaS Use Cases
    1st use case
    High Availability as a Service in the Cloud
    2nd use case
    High Availability as a Service to the Cloud approach
Conclusion
References
List of figures

Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do not necessarily reflect Dell EMC’s views, processes or methodologies. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.

Abstract

As society increasingly depends on computer-based systems, the need to ensure that services are provided to end users continuously has become critical. To build a computer system upon which people can depend, a system designer must first have a clear idea of all the potential causes that may bring down a system. Back in 1965, Digital Equipment Corp (DEC) changed the computer world by introducing the PDP-8, a minicomputer that offered an alternative to IBM mainframe technology.

Figure 1: Mainframe system

This was the first commercial success in the minicomputer area, and it opened the door for future scenarios and today’s open systems technology.

After almost 20 years of computer evolution, customers were looking for more robust solutions with high availability. In 1984, DEC introduced the first cluster system for the VAX/VMS operating system.

Figure 2: Digital cluster system

Over time, cluster architectures have been adopted by most of the major vendors: HP, IBM, Sun, Linux distributions like SUSE and Red Hat, and Microsoft. Most of the cluster solutions worked in an active/passive configuration: the service is hosted on one node and, in case of node failure, is switched to another node.

This approach had weak points, such as:

Resources are dedicated to a specific cluster and cannot be shared.

Most cluster platforms take an active/passive approach, meaning some resources remain in a stand-by state.

In most cases, the cluster protects against hardware failure, but restart on other nodes is not guaranteed; manual intervention is required in the event of an unattended failure.

The cluster approach requires a complex architecture and a very high level of knowledge to implement and manage.

These limitations created the need for a new architecture that allows dynamic sharing of resources between nodes and provides more robust options in case of hardware failure.
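The active/passive behavior described above can be illustrated with a minimal sketch. The class and node names are hypothetical and do not correspond to any vendor's cluster software; the model only shows the service moving to a standby node after a failure, and the case where no healthy node remains and manual intervention would be needed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


class ActivePassiveCluster:
    """Toy model: one node hosts the service, the others stand by."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("cluster needs at least one node")
        self.nodes = list(nodes)
        self.active = self.nodes[0]

    def standby_nodes(self):
        return [n for n in self.nodes if n is not self.active]

    def check_and_failover(self):
        """Move the service to the first healthy standby if the active node failed.

        Returns the name of the node hosting the service, or None when no
        healthy node is left (manual intervention would be needed).
        """
        if self.active.healthy:
            return self.active.name
        for candidate in self.standby_nodes():
            if candidate.healthy:
                self.active = candidate
                return candidate.name
        return None


cluster = ActivePassiveCluster([Node("node-a"), Node("node-b")])
cluster.nodes[0].healthy = False      # simulate a hardware failure on the active node
print(cluster.check_and_failover())   # the service is now hosted on the standby node
```

Note that the standby node does no useful work until the failure occurs, which is the resource cost listed among the weak points above.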

Introduction

High availability is all about redundancy and the ability to accurately identify failures at every level of the IT infrastructure and respond with an automated action.

Services must be classified based on each organization’s business requirements. Critical services require 24/7 availability with minimal or no downtime and optimum performance. This usually requires significant investment in IT infrastructure, with redundancy at all levels of the datacenter facilities, including network, compute, application clusters, and so on, to achieve the target availability. With today’s modern solutions, it has become much easier and faster to achieve these targets.

High availability is vital to keep services alive, as a single server cannot meet such a target. There are also other factors to consider, such as hardware failures, data corruption, network outages, operating system crashes, or even bugs (in databases, application servers, web servers). The goal is therefore a solution that prevents downtime in such cases and recovers from the situation immediately.

The cluster is a key component of any high availability solution and is used for redundancy. It distributes the load between the different cluster nodes to achieve the required scalability, fast response times, load balancing, and performance for mission-critical applications/systems.

Components of the Design

The goal of a high availability design is to minimize downtime. New automation technologies are based on virtualization of the infrastructure, which makes it possible to define the requirements and deploy them with a single click (request). The main objective is for high availability as a service to become as easy as selecting an option: for example, a website needs five-nines availability. This choice is then reflected on the subsystems, which start to build all the prerequisites for a 99.999% highly available website.
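To make the five-nines target concrete, an availability percentage can be converted into a yearly downtime budget. The sketch below is only an illustration; the function name is an assumption, and a year is taken as 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} minutes/year")
```

Five nines leaves a budget of roughly 5.26 minutes of downtime per year, which is why every subsystem behind the website has to be designed for redundancy.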

The parameters below need to be defined:

Expected Number of users/traffic

Web Servers

Application Servers

Database Servers

Storage IOPS

Network Traffic

Backup & Recovery
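A target set for the website has to be reflected on every tier in this list, because the tiers depend on each other. As a rough sketch, assuming that tiers fail independently (a simplification) and using hypothetical per-instance availability values, the end-to-end availability is the product of the tier availabilities, and redundancy within a tier raises that tier's figure:

```python
from math import prod


def redundant(availability: float, copies: int) -> float:
    """Availability of `copies` identical, independent components in parallel."""
    return 1 - (1 - availability) ** copies


def chain(availabilities) -> float:
    """Availability of tiers that must all be up (series dependency)."""
    return prod(availabilities)


# Hypothetical per-instance availabilities, not measured values.
tiers = {
    "web": redundant(0.999, copies=2),
    "application": redundant(0.999, copies=2),
    "database": redundant(0.999, copies=2),
    "storage": 0.99999,
    "network": 0.99999,
}
overall = chain(tiers.values())
print(f"end-to-end availability: {overall:.6%}")
```

The end-to-end figure is always lower than the weakest tier, so a five-nines website needs each tier to exceed five nines on its own.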

Solution Roadmap

Up to now, we have learned about the basic principles of good system design for high

availability: categorization in the system stack, redundancy, robustness and simplicity, and

virtualization. But what does this mean for you if you are responsible for producing a

solution and want to create the technical system design? Let us assume that you have

written down the business objectives and the business processes that are relevant for your

system architecture. After that you need to consider the following steps:

List failure scenarios

Evaluate scenarios, and determine their probability

Map scenarios to requirements

Design solution, using the dependency chart methodology

Review the solution, and check its behavior against failure scenarios

These steps are not just executed in sequence. Most important,

solutions, requirements, and failure scenarios are not

independent. If one has a different solution, there might well be

different failures to consider. In addition, different solutions

come with very different price tags attached. Business owners

sometimes want to reconsider their requirements when they

recognize that the protection against some failure scenarios costs more than the damage

that might be caused by them. Therefore, during each step, evaluate if the results make it

necessary to reconsider the previous steps' results. These feedback loops prevent the

consistent-but-wrong design syndrome.

With this iterative approach in mind, let’s examine each of those steps in more detail.

List Failure Scenarios

It is not realistic to list each and every incident

that can render such complex systems unusable.

For example, one can have an outage owing to

resource overload that may be caused by many

reasons: too many users, a software error, either

in the application or the operating system, a

denial of service attack, etc. It is not possible to list all the reasons, but it is possible to list all

components that can fail and what happens if they fail alone or in combination.

We can start by writing up the specific system stack without any redundancy information.

Then we list for each component how that component can fail. The system stack already

gives us a good component categorization that will help us categorize the failure scenarios

as well. First, we will write up high-level failure scenarios, and then iterate over them and

make them more precise by providing more detailed (and more technical) descriptions of

what can go wrong.

Sometimes the owner of a business process has their own failure scenarios, e.g. from past incidents, that they want to see covered. Usually, it is easy to add them to the list of generic failure scenarios. That is a good thing to do even if they are already there in a generalized form, as it will bring better buy-in from that important stakeholder.

Example 1: (Failure scenario for an engineering system).

--------------------------------------------------------------------------------

The following list is an excerpt from failure scenarios for an engineering system that also

utilizes a database with part detail information. This is the second iteration, where high-level

failure scenarios (marked with bullets) are dissected into more specific scenarios (marked

with dashes). The iteration process is not finished yet; the failure scenario list therefore is

not complete and covers only exemplary failures.

But if you compare that list with the one from Table 2.5, it is clear that this is more

structured and oriented along the system stack. It is the result of a structured analysis, and

not of a brainstorming session:

• User- or usage-caused failure
  - Deletion of a small amount of data (up to a few megabytes)
  - Deletion of a large amount of data (some gigabytes, up to terabytes)
  - Utilization of too many resources in a thread-based application
  - Flood of requests/jobs/transactions for a system

• Administrator-caused failure
  - Deletion of application data
  - Deletion of user or group information
  - Change to configuration or program makes service nonfunctional
  - Incomplete change to configuration or program that makes failure protection nonfunctional (e.g. configuration change on a single cluster node)

• Engineering application failures
  - Aborting of application
  - Corruption of data by application error
  - Loss of data by application error
  - Hung Java virtual machines
  - Memory leak consuming available main memory
  - File access denied owing to erroneous security setup

• Database failures
  - Database file corrupted
  - Database content corrupted
  - Index corrupted
  - Database log corrupted
  - Deadlocks
  - Automatic recovery not successful, manual intervention needed

• Operating system failures
  - Log files out of space
  - Disk full
  - Dead, frozen, or runaway processes
  - Operating system queues full (CPU load queue, disk, network, …)
  - Error in hardware driver leads to I/O corruption

• File system corruption
  - Recover by journal possible
  - Automatic file system check time within the service level agreement (SLA)
  - Automatic file system check time beyond the SLA
  - Manual file system repair needed

• Storage subsystem failure
  - Disk media failure
  - Microcode controller failure
  - Volume manager failure
  - Backplane failure
  - Storage switch interface failure

• Hardware failure
  - CPU failure
  - Memory failure
  - Network interface card failure
  - Backplane failure
  - Uninterruptible power supply (UPS) failure

• Physical environment destroyed
  - Power outage
  - Room destroyed (e.g. by fire)
  - Building destroyed (e.g. by flood)
  - Site destroyed (e.g. by airplane crash)
  - Town destroyed (e.g. by hurricane, large earthquake, war)

• Infrastructure service unavailable
  - Active Directory/Lightweight Directory Access Protocol (LDAP) outage, not reachable, or corrupted
  - DNS not reachable
  - Loss of shared network infrastructure
  - Network latency extended beyond functionality
  - Virus attack
  - Switch or router failure
  - Email not available
  - Backup server not reachable
  - License server outage or not reachable

• Security incidents
  - Sabotage
  - Virus attacks
  - Denial of service attacks
  - Break-ins with suspected change of data

You might have noticed that some failure descriptions are quite coarse and do not go into

much detail. Failure scenario selection is guided by experience, and in particular by

experience with potential solutions. When one knows that all faults that are related to

processes will have to be handled the same way (namely, the system must be restarted) it

does not make much sense to distinguish whether the CPU load or the memory queue is full.

Evaluate Failure Scenarios

For each failure scenario, you have to estimate two properties:

1. The probability of the failure

2. The damage that is caused by that failure

In practice, however, we cannot determine exact numbers for either the probability or the damage. If we have a similar system running and have had incidents there, we can use this data for better approximations.

What we can do is determine the relative probability and the relative damage of the scenarios and map them on a two-dimensional graph. Figure 3 shows such a mapping for selected scenarios.

Figure 3: Scenario mapping on probability and damage estimation
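When incident data from a comparable system is available, relative probabilities can be approximated from it. The sketch below is one simple way to do this, assuming a hypothetical incident log with one entry per incident; each scenario's share of all incidents serves as its relative probability.

```python
from collections import Counter

# Hypothetical incident log from a comparable system: one entry per incident.
incidents = [
    "disk media failure", "disk media failure", "disk media failure",
    "process hang", "process hang",
    "administrator error",
]


def relative_probabilities(log):
    """Return each scenario's share of all recorded incidents."""
    counts = Counter(log)
    total = sum(counts.values())
    return {scenario: n / total for scenario, n in counts.items()}


ranked = sorted(relative_probabilities(incidents).items(),
                key=lambda item: item[1], reverse=True)
for scenario, share in ranked:
    print(f"{scenario:22s} {share:.2f}")
```

These shares only order the scenarios relative to each other along the probability axis; they say nothing about scenarios that have not occurred yet.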

Map Scenarios to Requirements

Scenarios with high probability must be covered within the SLA requirements. All these

failures must lead only to minor outages, i.e. to outages where work can continue in short

timeframes. Protection against this class of failures falls in the realm of high availability.

Usually, some of the failure scenarios are expected to lead to no outage at all, and to no aborted user sessions either. In particular, this is true for defects in disk storage media, which happen quite often. When disks fail, backup disks must take over functionality without any interruption and without any state changes beyond the operating system or the storage subsystem.

Our knowledge of business objectives and processes, i.e.,

about the requirements, gives an initial assumption about

maximum outage times per event and maximum outage

times per month or per year for this class of failure

scenarios. For example, business objectives would strive for

at maximum 1 minute per incident and 2 minutes per

month, during 14×5 business hours. (As mentioned in Chapter 1, such measurements are

more illustrative than 99.99%.) Later, when we have seen the costs for such a solution, the

business owners might want to lower their requirements; then we have to iterate the

process described.
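Outage limits like the ones in the example above (at most 1 minute per incident and 2 minutes per month) can be checked against an outage log. The following is a minimal sketch with a hypothetical log and function name; for brevity it does not filter for the 14×5 business hours.

```python
from collections import defaultdict
from datetime import datetime


def sla_violations(outages, max_per_incident_s=60, max_per_month_s=120):
    """Check (start, duration_seconds) outage records against two limits.

    Returns a list of human-readable violation messages.
    """
    violations = []
    monthly = defaultdict(int)
    for start, duration in outages:
        if duration > max_per_incident_s:
            violations.append(f"{start:%Y-%m-%d %H:%M}: incident lasted {duration}s")
        monthly[(start.year, start.month)] += duration
    for (year, month), total in sorted(monthly.items()):
        if total > max_per_month_s:
            violations.append(f"{year}-{month:02d}: {total}s of outage in the month")
    return violations


# Hypothetical outage log.
log = [
    (datetime(2016, 3, 2, 9, 15), 45),
    (datetime(2016, 3, 17, 14, 5), 50),
    (datetime(2016, 3, 29, 11, 40), 40),
    (datetime(2016, 4, 4, 10, 0), 90),
]
for message in sla_violations(log):
    print(message)
```

In this log, no single March incident exceeds the per-incident limit, yet the month as a whole does, which shows why both limits are stated separately.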

There are failure scenarios with low probability and high potential damage that should be

considered as major outages and will not be covered by SLAs. If we choose to protect against

these failures as well, we need to introduce disaster-recovery solutions.

Again, requirements for disaster recovery come from business objectives and processes. The

requirements are expressed in terms of recovery time objectives and recovery point

objectives. For example, requirements might be to achieve functionality again within 72

hours of declaring the disaster, and to lose at most 4 hours of data.
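The example objectives above (service restored within 72 hours, at most 4 hours of data lost) can be used to screen candidate disaster-recovery approaches. The sketch below uses hypothetical plans and assumes that the worst-case data loss equals one full interval between data copies.

```python
from dataclasses import dataclass


@dataclass
class DisasterRecoveryPlan:
    name: str
    recovery_hours: float        # time to restore service after declaring the disaster
    copy_interval_hours: float   # time between data copies to the recovery site


def meets_objectives(plan, rto_hours=72, rpo_hours=4):
    """True if the plan restores service within the RTO and stays within the RPO.

    Worst-case data loss is taken to be one full copy interval.
    """
    return plan.recovery_hours <= rto_hours and plan.copy_interval_hours <= rpo_hours


# Hypothetical candidate plans.
plans = [
    DisasterRecoveryPlan("nightly tape backup", recovery_hours=96, copy_interval_hours=24),
    DisasterRecoveryPlan("hourly replication", recovery_hours=48, copy_interval_hours=1),
]
for plan in plans:
    print(plan.name, "->", "ok" if meets_objectives(plan) else "does not meet RTO/RPO")
```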

At the very end, there are failure scenarios that we choose not to defend against. Most

often, these failure scenarios are associated with damage to non-IT processes or systems

that is even larger and makes the repair of IT systems unnecessary. It might also be that we

judge their probability to be so low that we will live with it and do not want to spend money

for protection. For example, while coastal regions or cities near rivers will often find it

necessary to protect themselves against floods, businesses in inner areas will often shun

protection against large-scale natural catastrophes like hurricanes or tsunamis.

Eventually, such scenario/requirements mapping means categorization of our scenario map.

We color different areas and tell which kind of protection we want for these failure

scenarios.

Figure 4 takes up Figure 3 and adds those areas. We can also have two other, similar, figures

where we exchange the meaning of the x-axis. In the first one, we use outage times. Then

we can have two markers, one for the maximum minor outage time and one for recovery

time objective. The locations of some failure scenarios in this graph will change, but the idea

is the same: We can show which failure scenario must be handled by which fault protection

method. The second additional figure would use recovery point objectives on the x-axis and

would show requirements on maximum data loss.

Figure 4: Requirement areas added to scenario mapping

It is important to point out that the chart has a large area where no scenario is placed and

which is not touched by any of the requirement areas. We call this area the forbidden zone,

as failure scenarios that appear subsequently must not be located there. If they are, we have

to remap the scenarios and redesign our solution.
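The check for the forbidden zone can be sketched as a lookup against the requirement areas. The area boundaries and the 0 to 10 scales below are hypothetical, since the real areas follow from the requirements; any scenario that falls into none of the areas lands in the forbidden zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Axis-aligned requirement area on the probability/damage chart."""
    name: str
    prob_min: float
    prob_max: float
    damage_min: float
    damage_max: float

    def contains(self, probability, damage):
        return (self.prob_min <= probability <= self.prob_max
                and self.damage_min <= damage <= self.damage_max)


# Hypothetical areas on relative 0-10 scales.
AREAS = [
    Area("high availability", prob_min=4, prob_max=10, damage_min=0, damage_max=4),
    Area("disaster recovery", prob_min=1, prob_max=4, damage_min=4, damage_max=8),
    Area("no protection", prob_min=0, prob_max=1, damage_min=8, damage_max=10),
]


def classify(probability, damage):
    """Return the first requirement area containing the point, else the forbidden zone."""
    for area in AREAS:
        if area.contains(probability, damage):
            return area.name
    return "forbidden zone"


print(classify(7, 2))   # inside the high availability area
print(classify(2, 6))   # inside the disaster recovery area
print(classify(8, 9))   # in no area: the scenarios must be remapped
```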

It is possible that there is a failure scenario with high probability and high damage, where the protection cost would be very high as well. For example, if an application allowed a user to erase several hundred gigabytes of data without being able to cancel the process, and without any undo facility, this might very well lead to a major outage. In such cases, the only possibility might be to change the application's code, or to select another application that provides similar functionality.

Design solution

Up to now, we have learned about the basic principles of good system design for high

availability: categorization in the system stack,

redundancy, robustness and simplicity, and virtualization.

But what does this mean for you if you are responsible

for producing a solution and want to create the technical

system design? Let us assume that you have written down the business objectives and the

business processes that are relevant for your system architecture. If you want to produce

the what and how cells of the system architecture, you need to proceed in the following

steps:

1. List failure scenarios

2. Evaluate scenarios, and determine their probability

3. Map scenarios to requirements

4. Design solution, using the dependency chart methodology

5. Review the solution, and check its behavior against failure scenarios

These steps are not just executed in sequence. Most important, solutions, requirements,

and failure scenarios are not independent. If one has a different solution there might well be

different failures to consider. Also, different solutions come with very different price tags

attached. Business owners sometimes want to reconsider their requirements when they

recognize that the protection against some failure scenarios costs more than the damage

that might be caused by them. Therefore, during each step, we need to evaluate if the

results make it necessary to reconsider the previous steps' results. These feedback loops

prevent the consistent-but-wrong design syndrome.

With this iterative approach in mind, let us have a look at each of those steps in more detail.

The core architecture

“Virtualization” is the key word for today’s architectures and for years to come. VMware created the first-in-class hypervisor solution for open systems, then changed the game again by creating a complete hypervisor infrastructure that allows installing and managing a completely virtualized infrastructure.

Figure 5 shows some key points in VMware development history.

Figure 5: VMware development history overview

The new strategy includes a software layer that enables high availability and disaster recovery, with products like RecoverPoint and VPLEX deployed as a service in ViPR, together with advanced reporting features like chargeback, capacity planning, service status reports and history, and self-service deployment.

Why High Availability?

The answer lies in the consequences when the desired services are not available. Imagine you were one of the one million mobile phone users in Finland who were affected by a widespread disturbance of a mobile telephone service [1] and had problems receiving your incoming calls and text messages. The interruption of service, reportedly caused by a data overload in the network, lasted for about seven hours during the day. You could also picture yourself as one of the four million mobile phone subscribers in Sweden when an unspecified fault caused the network to fail, leaving it unable to provide you with mobile phone services [2]. The disruption lasted for about twelve hours, beginning in the afternoon and continuing until around midnight.

Another high-profile and high-impact computer system failure occurred at Amazon Web Services [4], which provides web hosting services to many web sites by means of its cloud infrastructure. The failure was reportedly caused by an upgrade of network capacity and lasted for almost four days before the last affected consumer data were recovered [5], although 0.07% of the affected data could not be restored. The consequence of this failure was the unavailability of services to the end customers of the web sites using the hosting services. Amazon also paid 10-day service credits to the affected customers.

The journey towards HAaaS

EMC has taken many steps toward achieving this vision of HAaaS with its federation portfolio. Figure 6 illustrates the new strategy of a software layer that enables high availability based on a Software-Defined Data Center (SDDC) architecture.

Figure 6: SDDC diagram

In the following pages, we present our vision for reaching five-nines (99.999%) availability.

Importance of HAaaS business model

High availability (HA) is paramount to the modern business and to mission-critical applications because of their critical position and open design. A key strength of the modern business is the ability to interact with multiple cross-format applications; however, this strength also creates multiple touch-points that can affect availability. The conclusion? Mission-critical applications require a well-designed HA solution that can protect and maintain uptime not only for the infrastructure services but also for the application-level services.

Figure 7: Infrastructure as a Service as well as Platform as a Service leverage the high availability active-active solution

One of the successful models is the EMC model. The EMC High Availability Stack is a solid solution to minimize the impact and downtime of critical applications and automate recovery at the service level to ensure continuous/maximum service availability. This can be done by integrating VMware HA with third-party cluster software to ensure 360-degree service availability.

With the new approach, recovery time or estimated time of repair (ETR) can be minimized if the failure/fault is detected through hypervisor-level platforms.

High Availability includes:

HA Infrastructure/Storage

HA Network/Security

HA Servers/OS

HAaaS primary components are:

Cluster Software

Services integration with the cluster software

Cluster File System

Application

Figure 8: Extended Cluster Service between the sites on the application level availability

Figure 9: Protecting the database services against ESXi host failures.

The VMware capability can include the cluster services and manage third-party products such as the following, as shown in Figures 10, 11, and 12:

File Systems

Web Servers

Application Servers

Database Servers

Figure 10: High Availability with the application components included on top of the virtualization layer

Figure 11: The HAaaS targets control the availability of each tier, making sure there is end-to-end redundancy and consistency at the virtualization level

Figure 12: High availability is also important at the application level, not only for the VMs or the hypervisors

Cluster software for virtualized applications/databases is available from third-party products that can fulfill the requirements for an end-to-end HAaaS solution:

HA Infrastructure/Storage

HA Network/Security

HA Servers/OS

HA File Systems

HA Applications (Existing 3rd Party Cluster Products/Services the Logical Layer)

HA Backup & Recovery

Figure 13: Active/passive cluster example, which requires minimal downtime to fail over the DB service

Risks of the Cloud – Fear of flying

Enterprises know that the cloud will change IT, but security and performance are a concern. Each cloud model has potential risks: reliability, adaptability, application compatibility, efficiency, scaling, lock-in, security, and compliance.

Companies must select an enterprise cloud solution to suit a complex mix of applications, and these decisions require great care. The solution should be an enterprise-class cloud designed for mission-critical applications, with performance and security built for the enterprise. To achieve this, enterprises or service providers should extend existing IT by building private clouds, using virtual private clouds, and accessing public clouds.

This may therefore go beyond virtualization. It requires a mechanism that manages compute, memory, storage, and networking in small cloud units, μVMs. Unlike fixed-size VMs, μVMs are dynamically allocated and optimized per application, with simple automatic monitoring and control.


Figure 14: Layers of data protection services

In general, protection services are divided into three levels, and EMC can cover all of these requirements, adding Virtustream for mission-critical application requirements. In addition, it is possible to create an architecture with multiple levels of protection to enable different service levels, with multiple copies managed or retained inside the infrastructure.

The next two sections use two real use cases to describe this in more detail.

HAaaS Use Cases

1st use case

High Availability as a Service in the Cloud

In this real-world example, the entire infrastructure is inside the cloud service provider's site. The service provider has three sites with an HA + DR approach, and the HA sites can host active vApps.

Five years ago, a regional telecommunications company decided to change its strategy by adding cloud services to its service catalog. This was a genuinely challenging situation: it was a new deployment from scratch, the company did not know the market horizons or the technologies that could enable such services, and it was guided by customer requests.

Their internal infrastructure was based on VMware, Cisco, and EMC, and they decided to keep the same infrastructure for the services provided to their customers.

Their first cloud service was compute, in a two-site configuration with VMware vCloud Director and EMC VPLEX.


The company is well positioned within its region because it owns the entire connectivity infrastructure, but in the initial phase the problem was to “learn” how to sell cloud services.

EMC helped them create new services and offer them to customers in an innovative way.

Every HA architecture requires a platform or application that supports HA, such as VMware, Oracle, or Hyper-V. Enabling customers to create a vApp or a virtual machine is the easiest and most flexible approach to implementing HAaaS.

This cloud provider can now offer different and combined levels of protection, each with its own RTO/RPO and related cost. Compute services are delivered with VMware, orchestrated by vCloud Director, with a virtual data center configured for every customer.

The cloud service provider's configuration supports different protection levels:

Local Protection, with VPLEX HA on a specific site

Remote protection in HA with a VPLEX Metro configuration between two sites

Disaster Recovery to a remote site with RecoverPoint

Recovery from backup with Avamar and Data Domain configuration

Figure 15: Cloud provider disaster recovery


The cloud service provider's service catalog allows different levels of protection to be combined to match customer requirements.

One real example of a customer environment is shown below:

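| Level            | Number of VMs | Total size | Configured services             |
|------------------|---------------|------------|---------------------------------|
| Mission Critical | 10            | 4 TB       | HA Local + Remote + DR + Backup |
| Critical         | 30            | 6 TB       | HA Remote + DR + Backup         |
| Standard         | 40            | 8 TB       | DR + Backup                     |
| Test&Dev         | 30            | 5 TB       | Backup                          |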

This approach adds value to the cloud provider's services and enables customers to align cost with the real business value of each virtual machine.
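The catalog in the customer example can be modeled as plain data, which makes it easy to see how many VMs and how much capacity each protection service has to cover. The following is a minimal Python sketch, not the provider's tooling: the class and function names are hypothetical, and reading "HA Local + Remote" as local HA plus remote HA is an assumption.

```python
from dataclasses import dataclass

# Protection services named in the provider's catalog.
HA_LOCAL = "HA Local"    # VPLEX HA on a specific site
HA_REMOTE = "HA Remote"  # VPLEX Metro between two sites
DR = "DR"                # RecoverPoint to a remote site
BACKUP = "Backup"        # Avamar and Data Domain


@dataclass(frozen=True)
class Tier:
    """One row of the customer example (hypothetical structure)."""
    level: str
    vm_count: int
    size_tb: int
    services: tuple


# Values copied from the customer example table.
# Assumption: "HA Local + Remote" means HA Local plus HA Remote.
CATALOG = (
    Tier("Mission Critical", 10, 4, (HA_LOCAL, HA_REMOTE, DR, BACKUP)),
    Tier("Critical", 30, 6, (HA_REMOTE, DR, BACKUP)),
    Tier("Standard", 40, 8, (DR, BACKUP)),
    Tier("Test&Dev", 30, 5, (BACKUP,)),
)


def footprint(service):
    """Return (VM count, total TB) that a given service must protect."""
    tiers = [t for t in CATALOG if service in t.services]
    return sum(t.vm_count for t in tiers), sum(t.size_tb for t in tiers)


if __name__ == "__main__":
    for service in (HA_LOCAL, HA_REMOTE, DR, BACKUP):
        vms, tb = footprint(service)
        print(f"{service}: {vms} VMs, {tb} TB")
```

Under these assumptions, backup covers all 110 VMs (23 TB), while local HA is needed only for the 10 mission-critical VMs (4 TB), which is where the cost difference between tiers comes from.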

2nd use case

High Availability as a Service to the Cloud approach

One year ago, an Italian service provider wanted to find a solution for offering hybrid cloud HAaaS/DRaaS to its customers. This service provider focuses on virtual environments and offers cloud services and managed services. Its offering addresses several market segments, but its focus is on the public sector, where it has a considerable presence.

When RecoverPoint for Virtual Machines was presented, they immediately agreed that it was the solution for the new HAaaS/DRaaS services that they wanted to launch.

They were not a strong EMC customer (most of the infrastructure was based on IBM and Dell technologies), but with RecoverPoint for Virtual Machines they could offer an independent service platform with different levels of RTO/RPO depending on connectivity.

The most important point for this service provider was the ability to offer a real hybrid cloud solution, with the possibility of migrating end customers’ infrastructure to the cloud without additional effort.

This scenario could be deployed as a true HA implementation, using the same scheme as in the cloud use case. In this scenario, the customer's virtual machines are distributed dynamically between the available redundant sites.


The customer can decide to adopt a near-HA strategy with RecoverPoint for Virtual Machines. With this technology, a customer can use synchronous replication with a minimal RTO. This approach sits close to high availability, as represented in the figure below.

Figure 16: Near high availability - Near HA

With this technology, the cloud service provider can define different levels of protection:

Near HA with synchronous replication and local/remote protection

Disaster recovery with asynchronous replication and local/remote protection
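The choice between the two levels depends on the recovery point the customer needs and on the connectivity between sites. A minimal Python sketch of that decision follows. The function name and the latency limit are hypothetical: the limit is supplied by the caller and is not a RecoverPoint specification. The only assumption built in is the general one that a zero RPO needs synchronous replication, which in turn needs a sufficiently low-latency link.

```python
NEAR_HA = "Near HA (synchronous replication)"
DR_ASYNC = "Disaster recovery (asynchronous replication)"


def choose_protection(rpo_seconds, link_rtt_ms, max_sync_rtt_ms):
    """Pick a protection level from the requested RPO and link latency.

    max_sync_rtt_ms is a caller-supplied, illustrative limit for
    synchronous replication; it is not taken from any product documentation.
    """
    if rpo_seconds < 0:
        raise ValueError("RPO cannot be negative")
    if rpo_seconds == 0:
        # Zero data loss requires every write to be acknowledged remotely.
        if link_rtt_ms > max_sync_rtt_ms:
            raise ValueError("link latency too high for synchronous replication")
        return NEAR_HA
    # A non-zero RPO tolerates replication lag, so asynchronous is enough.
    return DR_ASYNC
```

For example, `choose_protection(0, 2.0, 5.0)` selects the near-HA level, while `choose_protection(300, 40.0, 5.0)` selects asynchronous disaster recovery.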

This architecture allows cloud resources to be shared, reducing service costs for both provider and customer.

Figure 17: Architecture overview

With this approach, it is possible to configure multi-site protection. The cloud provider can manage, for example, two remote sites with two different copies and journaling of VMs.

[Diagram labels: Provider Site; Tenant A; Tenant B; Tenant C]


This is an example of a topology with a shared vCenter inside the Cloud Provider site.

Figure 18: vCenter schema

RecoverPoint for Virtual Machines has several advantages inside a VMware environment:

Protection granularity is at the VM level, not the LUN level.

The customer can define a restart sequence for disaster recovery, which helps create a flexible disaster recovery plan.

The cloud provider can manage VMs directly from a central vCenter.

The customer has a self-service portal with a guided procedure to protect a VM and to test or manage a start at the remote site.

The product allows DR tests to be run inside an isolated network and with different network addresses.

The entire infrastructure is virtual, without any hardware appliance at the customer site.
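The restart sequence mentioned in the list above can be thought of as a set of ordered waves: every VM in one wave is started before the next wave begins. The sketch below illustrates the idea in Python. It is not the RecoverPoint API; the function name, the priority numbers, and the VM names are all hypothetical.

```python
from collections import defaultdict


def restart_plan(vm_priorities):
    """Group VMs into ordered restart waves.

    vm_priorities maps a VM name to an integer priority; lower numbers
    start first. Returns a list of waves, each a sorted list of VM names.
    """
    waves = defaultdict(list)
    for vm, priority in vm_priorities.items():
        waves[priority].append(vm)
    return [sorted(waves[p]) for p in sorted(waves)]


if __name__ == "__main__":
    # Hypothetical three-tier application: database, then app, then web.
    plan = restart_plan({"web01": 3, "app01": 2, "db01": 1, "app02": 2})
    for number, wave in enumerate(plan, start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Starting the database tier before the application and web tiers is a common ordering, but the right sequence depends on each customer's application dependencies.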

Conclusion

High Availability as a Service is becoming an essential part of modern virtualization solutions to ensure service availability. Figure 19 shows the importance of integration between the application solutions in reaching the targeted availability. Each product can also play a vital role in system design and in the automation process.


Considerations while building HAaaS:

Understand the exact business requirements including RTO, RPO, SLO, SLA, and SLM.

High availability requires significant investment, so it is crucial to evaluate the environment and choose exactly the right products and components. This is a key success factor.

Skills are very important in the design, build, and operation phases in order to maximize the efficiency and benefits of HAaaS solutions and to build a modern and stable cloud solution.
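When translating business requirements into a design, it helps to convert an availability target into a yearly downtime budget and compare it with the expected recovery time per outage. The Python sketch below shows that arithmetic. The function names are hypothetical, a 365-day year is assumed, and the outage counts and RTO values in the example are illustrative rather than taken from any customer.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def allowed_downtime_hours(availability_percent):
    """Yearly downtime budget implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def meets_target(availability_percent, outages_per_year, rto_hours):
    """True if the expected outages, each lasting one RTO, fit the budget."""
    expected_downtime = outages_per_year * rto_hours
    return expected_downtime <= allowed_downtime_hours(availability_percent)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% -> {allowed_downtime_hours(target):.2f} h/year")
```

A 99.9% target allows about 8.76 hours of downtime per year, so two outages with a four-hour RTO fit the budget, while the same outages would break a 99.99% target.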

Figure 19: High availability scenario for Oracle Database Virtual Machine

References

Service Availability: Principles and Practice by Maria Toeroe and Francis Tam (eds)

https://en.wikipedia.org/wiki/High_availability

https://en.wikipedia.org/wiki/High-availability_cluster

https://virtuallylg.wordpress.com/2013/10/10/comparing-vmware-vsphere-app-ha-with-symantec-applicationha/

http://www.storagereview.com/vmware_vmmark_virtualization_benchmark

http://mandarshinde.com/elasticsearch-basics/

http://virtcloud.blogspot.com.eg/2011/07/designing-your-private-cloud-with.html

High Availability and Disaster Recovery—Concepts, Design, Implementation

Virtuostream.com


List of figures

Figure 1: Mainframe system
Figure 2: Digital cluster system
Figure 3: Scenario mapping on probability and damage estimation
Figure 4: Requirement areas added to scenario mapping
Figure 5: VMware development history overview
Figure 6: SDDC diagram
Figure 7: IaaS, PaaS and SaaS as part of Infrastructure as service as well the Platform as Service the High availability, as service is important
Figure 8: Extended Cluster Service between the sites on the application level availability
Figure 9: protecting the database services against ESXi host failures.
Figure 10: High Availability with the application components included on the top of the virtualization layer
Figure 11: the HAaaS targets are to control each tier availability making sure there is end-to-end redundancy & consistency on the virtualization level
Figure 12: High Availability important as well on the application level not the VMs or the Hypervisors
Figure 13: Cluster Active/Passive example will require minimal downtime to failover the DB service
Figure 14: layers of services of data protection
Figure 15: Cloud provider disaster recovery
Figure 16: Near high availability - Near HA
Figure 17: Architecture overview
Figure 18: vCenter schema
Figure 19: High availability scenario for Oracle Database Virtual Machine

Dell EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying and distribution of any Dell EMC software described in this publication requires an applicable software license.

Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.