
Can we design an efficient Byzantine Fault Tolerant mechanism for a cloud computing environment?

Karan Chhabra

Submitted as part of the requirements for the degree

of MSc in Cloud Computing

at the School of Computing,

National College of Ireland

Dublin, Ireland.

April 2015

Supervisor Dr. Adriana Chis

Abstract

The substantial advancements in Information Technology (IT) over the last century have triggered a vision of computing one day becoming the fifth utility, after water, gas, electricity and telephony. To deliver this vision, various computing paradigms have been proposed, of which cloud computing is the latest. Cloud computing has benefits of its own, including on-demand service and pay-per-use pricing, but along with these benefits come various challenges.

One of the major research challenges in cloud computing is to ensure the reliability and availability of the resources it provides, which is only possible if the cloud computing environment is not prone to faults and has a proper fault tolerant system or mechanism to handle them. This concern becomes even stronger with the introduction of federated computing, i.e. a type of computing in which customers using cloud services are allowed to scale applications across many domains. Building such reliable clouds is an area of concern due to faults such as byzantine faults, which are arbitrary in nature.

This research work focuses on designing an efficient Byzantine fault tolerant mechanism, BFS (Byzantine Fault Solution), for a cloud computing environment by creating a robust Fault Tolerant (FT) system that handles byzantine faults. The research problem is important because these arbitrary (byzantine) faults can occur in any running process within any cloud computing environment at any point in time, and a proper fault tolerant mechanism is required to prevent or overcome them.

Keywords: Cloud computing, Fault tolerance, Byzantine faults, Byzantine fault

tolerance.


Contents

Abstract
1 Introduction
2 Background
2.1 Cloud computing
2.2 Fault Tolerance: A challenge in cloud computing
2.3 Byzantine Fault Tolerance (BFT)
2.4 Current Byzantine Fault Tolerance (BFT) work in cloud
2.5 Hypothesis
2.6 Expected Contribution
Bibliography


Chapter 1

Introduction

Cloud computing is a term used to describe a category of on-demand (pay-as-you-go) computing services first offered by providers such as Google, Amazon and Microsoft. It is a model in which the computing infrastructure is viewed as a cloud through which individuals access applications on demand. Cloud computing is considered a form of utility computing and has enabled computing infrastructures that are highly scalable and flexible. Such highly scalable compute resources have led to substantial capital savings for both consumers and service providers.

Before we proceed further to the research problem, it is important to understand the

concepts of cloud computing along with the different areas of cloud computing related

to the research problem.

In cloud computing, the major emphasis is on providing computing as a service, i.e. on demand. This is achieved through a virtualized infrastructure of data centers that are maintained and monitored by content providers. The services offered by cloud computing vendors are based on the fundamental service models of cloud computing: IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and SaaS (Software as a Service). Cloud computing stands out from other computing paradigms such as grid computing, utility computing, mainframe computing and peer-to-peer computing due to characteristics such as agility, Application Programming Interface (API) access, cost, device and location independence, maintenance, multitenancy, performance, productivity, reliability, scalability, elasticity and security.

Cloud computing provides benefits such as cost efficiency, on-demand services and multitenancy, but it also carries risks. One of the major research challenges in cloud computing is to ensure the reliability and availability of the resources it provides, which is only possible if the cloud computing environment is not prone to faults and has a proper fault tolerant system or mechanism to handle them. There is therefore a requirement for a robust Fault Tolerant (FT) system in cloud computing. Faults can be of many types, but this research problem focuses on byzantine faults, which can occur at any time and can completely disrupt an ongoing process in a cloud computing environment. This document also focuses on measures to overcome this problem in order to provide good quality of service.

[9] say that a fundamental problem in distributed systems is constructing robust network services that can withstand a wide range of failure types. Byzantine fault tolerance is the most general approach for masking arbitrary failures, but it is considered too expensive to deploy in practice, and many solutions are not resilient to performance attacks. According to [1], the major requirement for clients dealing with clouds is data security, as clouds may fail due to faults occurring in software or hardware, or due to attacks from malicious insiders. Hence, the construction of a highly reliable and consistent cloud system has become a vital research requirement.

According to [19], cloud computing is becoming a highly popular and efficient solution for constructing dependable applications on distributed resources. However, assuring the dependability of applications within the system is a critical challenge because the environment is highly dynamic. This document discusses the problems caused by byzantine faults in cloud computing environments, along with the different byzantine fault tolerant mechanisms that can be applied in a cloud environment to overcome them.

The rest of this document is structured as follows: Chapter 2 discusses related work in the domain of the research problem, along with the hypothesis and the expected contribution.


Chapter 2

Background

This chapter discusses the background of cloud computing and the domain of the research problem: fault tolerance as a challenge in cloud computing, Byzantine Fault Tolerance (BFT), current BFT work in the cloud, the hypothesis, and the expected contribution.

2.1 Cloud computing

This section first discusses the background of cloud computing and then the main issues that occur in a cloud computing environment.

[2] define cloud computing as referring to both the applications delivered as services over the internet and the hardware and systems software in the data centers that provide those services. The services themselves have long been referred to as Software as a Service (SaaS). Some vendors use terms such as IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) to describe their products, but [2] eschew these terms because accepted definitions for them still vary widely.

[2] and [3] describe cloud computing as a computing revolution that has the capability to change a large part of the IT industry by making software more attractive as a service and by changing the way IT hardware is designed and purchased. They also compare cloud computing with several other computing paradigms that have promised to deliver utility computing, such as Grid computing and Cluster computing. According to NIST [16], cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

The advent of cloud computing has brought a new dimension to the domain of Information Technology (IT), but along with its many benefits there are some issues. One of the major research challenges in cloud computing is to make sure that resources are reliable and continuously available. [11] identify the need for a robust Fault Tolerant (FT) system in cloud computing. Building robust fault tolerant systems is itself a big challenge, but it plays an important role in improving quality of service (QoS) in cloud computing.

In the next section we will discuss Fault Tolerance in more detail.

2.2 Fault Tolerance: A challenge in cloud computing

This section discusses fault tolerance, its resilience and its management in cloud computing, along with some fault tolerant techniques applied to build a fault tolerant cloud computing environment. It also considers the number, nature and kind of faults that appear in cloud computing infrastructures, the impact of these faults on a user's applications, and measures to handle them in a cost-effective and efficient manner. The first paragraph discusses fault tolerance and how the literature describes it in cloud computing, while the second paragraph discusses the different approaches to fault tolerance in different computing environments.

[17] classify the faults that appear as failures to end users into two types: crash faults, which cause system components to stop functioning completely or to remain inactive during failures, and byzantine faults, which cause system components to behave arbitrarily during failures, so that the system as a whole behaves incorrectly in unpredictable ways. [11] say that a single cloud consists of different layers, each of which can be affected by various types of faults, so these layers require different levels of fault tolerance techniques in order to provide seamless service. Both sets of authors share common ground on the need for robust fault tolerance to overcome faults. [17] describe fault tolerance as the capability of a system to achieve its purpose even in the presence of failures, and as one of the mechanisms for improving the overall dependability of the system.
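The difference between the two fault classes described by [17] can be illustrated with a minimal Python sketch. The names `FaultMode`, `Replica` and `handle` are hypothetical and are not taken from any of the cited works; the sketch simply assumes a service that should return the square of its input.

```python
import random
from enum import Enum
from typing import Optional


class FaultMode(Enum):
    NONE = "none"            # component behaves correctly
    CRASH = "crash"          # component stops and gives no answer
    BYZANTINE = "byzantine"  # component answers, but arbitrarily


class Replica:
    """Hypothetical service replica that should return x * x."""

    def __init__(self, mode: FaultMode, seed: int = 0) -> None:
        self.mode = mode
        self._rng = random.Random(seed)

    def handle(self, x: int) -> Optional[int]:
        if self.mode is FaultMode.CRASH:
            # A crash fault is visible to the caller as a missing reply.
            return None
        if self.mode is FaultMode.BYZANTINE:
            # Assumption for illustration: the correct result is corrupted
            # by a non-zero offset, so the reply looks valid but is wrong.
            return x * x + self._rng.randint(1, 100)
        return x * x
```

A crashed replica can be detected by its silence, whereas a byzantine replica returns a well-formed but wrong value, which is why it cannot be detected by inspecting a single reply.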

Fault tolerance is a key factor in achieving good quality of service in a computing environment: a system that is prone to faults cannot provide good service quality, so a strong fault tolerance mechanism is required. Because of this, several approaches to fault tolerance have been proposed for achieving quality of service in a cloud computing environment. [14] present an innovative, integrated perspective on the creation and management of fault tolerance in clouds. They introduce a conceptual framework called Fault Tolerance Manager (FTM), which provides a basis for the service provider to offer fault tolerance as a service, and propose a comprehensive approach that hides the implementation details of fault tolerance techniques from developers and users by means of a dedicated service layer. [20] treat reliable cloud applications as a critical research problem and propose the FTCloud framework for building cloud applications that can tolerate faults and hence are more reliable. [11] agree with [14] on the requirement for a robust fault tolerance framework. [11] discuss the elementary concepts of fault tolerance by considering different fault tolerance policies, such as the reactive and proactive fault tolerance policies, and the related fault tolerance techniques applied to diverse types of faults. [15] discuss the problems that arise from the continuous growth in the size of data sets. Meeting the resulting processing needs requires added parallelism (a greater number of nodes and/or cores in the system), but this exposes the system to frequent failures during processing. To recover from such failures, they present a possible design and implementation of a fault-tolerant environment for processing large queries on huge datasets.

To conclude, fault tolerance is becoming a crucial area of research, as faults occurring during processing strongly affect the performance of a system, which in turn affects the quality of the service it offers. [17] discuss the importance of fault tolerance in a cloud computing environment for handling faults such as byzantine faults, and develop an efficient fault tolerance model to prevent this kind of fault.

In the next section we will be discussing Byzantine fault tolerance in further detail.

2.3 Byzantine Fault Tolerance (BFT)

This section discusses Byzantine Fault Tolerance (BFT), a fault tolerant mechanism required to handle byzantine faults in a cloud computing environment. The first paragraph discusses byzantine fault tolerance and how the literature describes it in a cloud computing environment, while the second paragraph concludes with a glimpse of the different approaches to byzantine fault tolerance in a cloud computing environment.

[19] describe byzantine faults as arbitrary faults which, when they occur in a cloud computing environment, may damage a process or an entire application. To build dependable applications on a cloud infrastructure, it is therefore vital to design a fault tolerance structure for handling this type of fault. Normally, the dependability of applications in a cloud computing environment is affected by different types of faults, such as network faults (disconnection), node faults (crashing) and byzantine faults (arbitrary faults). The research problem focuses on one of these, byzantine faults, and on the application of a byzantine fault tolerant mechanism in a cloud computing environment. [4] discuss the arbitrary behavior that results from malicious attacks, software errors and operator mistakes, and point to the need for highly available systems so that growing online services can be provided without interruption. [10] discuss the need for a byzantine fault tolerant mechanism to overcome hardware and software errors, as well as the arbitrary (byzantine) failures generated by malicious attacks in modern distributed systems. Byzantine faults have a large impact on quality of service in a cloud computing environment, because a faulty component can send inconsistent responses to a request, which may cause a process to fail. [9] agree with [19] that byzantine fault tolerance is a necessary framework for overcoming byzantine faults. Both sets of authors hold that, despite significant progress in making byzantine fault tolerance practical, it has not been widely adopted, mainly because of the complexity of the techniques involved and the high overheads. [9] add that BFT is not a panacea, since there are a variety of attacks, such as various performance attacks, that BFT does not handle well. [18] describes byzantine faults as malicious behavior or arbitrary faults that have become an important issue in a cloud computing environment, so an efficient fault tolerant structure for handling them is required. [13] agree with [10] on the requirement for a robust fault tolerant mechanism in a cloud computing environment, and discuss the requirement for byzantine fault tolerant protocols to tolerate arbitrary (byzantine) failures of hardware and software components. Byzantine fault tolerance has thus attracted many researchers ever since byzantine faults were introduced, but despite being a large research area it has seen only limited practical adoption, for example in real-time systems such as those of the aerospace industry.
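How replication can mask the inconsistent responses mentioned above can be sketched as follows. This is an illustrative voting check and not the protocol of any cited work. It assumes that at most f replicas are faulty and that a client accepts a value once f + 1 replicas report it, so that at least one correct replica vouches for the value. The function name `accept_reply` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def accept_reply(replies: Sequence[Optional[Hashable]], f: int) -> Optional[Hashable]:
    """Return a value reported by at least f + 1 replicas, else None.

    `replies` holds one entry per replica; None marks a missing reply
    (for example from a crashed replica). With at most f faulty replicas,
    f + 1 matching replies must include one from a correct replica.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    # Count only the replies that actually arrived.
    counts = Counter(r for r in replies if r is not None)
    for value, votes in counts.most_common():
        if votes >= f + 1:
            return value
    # No value has enough support; the client cannot decide yet.
    return None
```

For example, with f = 1 and replies `[16, 16, 16, 99]`, the single deviating reply is outvoted and 16 is accepted; if every replica reports a different value, no reply is accepted.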

To conclude, byzantine fault tolerance is an approach for overcoming the effects of byzantine (arbitrary) faults in a cloud computing environment in order to enhance quality of service. Because it is an area of concern, many attempts have been made to apply byzantine fault tolerance in cloud computing environments. The impact that byzantine faults can have on a cloud computing environment makes the research problem even more important, and the remainder of this review discusses the different approaches to applying byzantine fault tolerance in a cloud computing environment.

In the next section we will analyze the different approaches to Byzantine fault tolerance in detail.

2.4 Current Byzantine Fault Tolerance (BFT) work in cloud

This section discusses current work on byzantine fault tolerance (BFT) in a cloud computing environment and analyses the different methods that have been adopted to provide it. The first paragraph analyses the diverse approaches to byzantine fault tolerance in a cloud computing environment, while the second paragraph concludes the discussion of these methodologies.

[1] consider data security a significant requirement in clouds, because a cloud may fail due to faults occurring in hardware or software, which makes it a critical research problem. They propose a practical model, BFT-MCDB (Byzantine Fault Tolerance Multi-Clouds Database), which provides byzantine fault tolerance in a multi-cloud environment by combining Shamir's secret sharing approach (to detect byzantine failures) with Byzantine Agreement protocols, ensuring the security of data stored inside the cloud. [19] agree with [1] that byzantine fault tolerance is an important research area in a cloud computing environment. [19] discuss the importance of building highly dependable applications in a cloud computing environment. They say that guaranteeing the dependability of such applications is a big challenge, especially in a voluntary-resource cloud, because of the highly dynamic environment. They therefore propose BFTCloud (Byzantine Fault Tolerant Cloud), a byzantine fault tolerance framework for constructing robust systems in a voluntary-resource cloud environment, which guarantees the robustness of systems when up to f out of 3f + 1 resource providers are faulty, including providers with arbitrary faults. [19] also say that BFTCloud guarantees high reliability of systems built on top of voluntary-resource cloud infrastructure and ensures good performance of these systems. [4] propose a new replication algorithm that tolerates byzantine faults and produces highly available systems. [12] support [19] and [1] in proposing a fault tolerant system that can overcome arbitrary (byzantine) faults, and note that the need for such a system is strengthened by the rise of federated clouds, which help to meet Quality of Service targets by allowing users to scale applications across various domains. They analyze the application of byzantine fault tolerance to federated clouds in detail through an experiment in which a cloud framework called FT-FC is built. FT-FC allows them to create diversity-based byzantine fault tolerant systems and apply them to federated clouds in order to examine the efficiency of byzantine fault tolerance in that setting. [5] describe how to build an interoperable, heterogeneous cloud environment within a horizontally federated configuration, where clouds cooperate with each other in order to build trust and provide new business opportunities including power saving, reduced asset costs and on-demand provisioning of resources. [7] discuss the importance of MapReduce for running scientific data analysis and how the results of MapReduce jobs are affected by arbitrary faults. They say that MapReduce runtimes such as Hadoop tolerate crash faults but not arbitrary (byzantine) faults, so they propose a MapReduce algorithm that tolerates this type of fault. Both [8] and [6] agree with [7]: they describe byzantine faults in a cloud and propose a MapReduce runtime that can tolerate these faults at a low cost in terms of execution time.
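The "f out of 3f + 1" bound quoted for BFTCloud [19] can be made concrete with some simple arithmetic. The sketch below uses hypothetical helper names. The quorum size of 2f + 1 is the usual choice in replication protocols in the style of [4] and is stated here as an assumption, not as a detail of BFTCloud itself.

```python
def replicas_needed(f: int) -> int:
    """Minimum number of replicas needed to tolerate f byzantine faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest f such that n replicas satisfy n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3


def quorum_size(f: int) -> int:
    """Assumed quorum of 2f + 1 replicas.

    Any two quorums drawn from 3f + 1 replicas overlap in at least
    2 * (2f + 1) - (3f + 1) = f + 1 replicas, so the overlap always
    contains at least one correct replica.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1
```

Under these assumptions, tolerating a single faulty resource provider requires four providers and a quorum of three, while tolerating two requires seven providers and a quorum of five. Adding providers without reaching the next multiple of three plus one does not raise the number of faults tolerated.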

To conclude, byzantine fault tolerance is a large domain in which a lot of work has been done. Beyond the approaches discussed above, there are various other approaches to applying byzantine fault tolerance in a cloud computing environment, which makes it a wide area for research.

In the next section we will present the hypothesis of this document.

2.5 Hypothesis

This document focuses on the following research problem:

Can we design an efficient Byzantine Fault Tolerant mechanism for a cloud

computing environment?

This research problem is important because arbitrary (byzantine) faults can occur in any running process within any cloud computing environment at any point in time, and a proper fault tolerant mechanism is required to prevent or overcome them. In this research work, we propose an efficient byzantine fault tolerant framework named BFS (Byzantine Fault Solution) for a cloud computing environment.

The next section describes the expected contribution for this research problem.

2.6 Expected Contribution

This paper addresses byzantine (arbitrary) faults, builds on the approach of [12], and proposes an effective fault tolerant framework, Byzantine Fault Solution (BFS), for a cloud computing environment by trying to:

• Inject complex faults into the application layer,

• Explain the reasons for failures caused by byzantine (arbitrary) faults,

• Increase the dynamicity of the cloud and record results for the same framework, and

• Compare Byzantine Fault Solution (BFS) with the framework proposed by [12].


Bibliography

[1] M.A. AlZain, B. Soh, and E. Pardede. A Byzantine fault tolerance model for a multi-cloud computing. In Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on, pages 130–137, Dec 2013.

[2] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. A view of cloud computing. Commun. ACM, 53(4):50–58, April 2010.

[3] Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6):599–616, 2009.

[4] Miguel Castro and Barbara Liskov. Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst., 20(4):398–461, November 2002.

[5] A. Celesti, F. Tusa, M. Villari, and A. Puliafito. Three-phase cross-cloud federation model: The cloud SSO authentication. In Advances in Future Internet (AFIN), 2010 Second International Conference on, pages 94–101, July 2010.

[6] M. Correia, P. Costa, M. Pasin, A. Bessani, F. Ramos, and P. Verissimo. On the feasibility of Byzantine fault-tolerant MapReduce in clouds-of-clouds. In Reliable Distributed Systems (SRDS), 2012 IEEE 31st Symposium on, pages 448–453, Oct 2012.

[7] P. Costa, M. Pasin, A.N. Bessani, and M. Correia. Byzantine fault-tolerant MapReduce: Faults are not just crashes. In Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on, pages 32–39, Nov 2011.

[8] P. Costa, M. Pasin, A.N. Bessani, and M.P. Correia. On the performance of Byzantine fault-tolerant MapReduce. Dependable and Secure Computing, IEEE Transactions on, 10(5):301–313, Sept 2013.

[9] S. Duan, K. Levitt, Hein Meling, S. Peisert, and Haibin Zhang. ByzID: Byzantine fault tolerance from intrusion detection. In Reliable Distributed Systems (SRDS), 2014 IEEE 33rd International Symposium on, pages 253–264, Oct 2014.

[10] S. Duan, S. Peisert, and K.N. Levitt. hBFT: Speculative Byzantine fault tolerance with minimum cost. Dependable and Secure Computing, IEEE Transactions on, 12(1):58–70, Jan 2015.

[11] A. Ganesh, M. Sandhya, and S. Shankar. A study on fault tolerance methods in cloud computing. In Advance Computing Conference (IACC), 2014 IEEE International, pages 844–849, Feb 2014.

[12] P. Garraghan, P. Townend, and Jie Xu. Byzantine fault-tolerance in federated cloud computing. In Service Oriented System Engineering (SOSE), 2011 IEEE 6th International Symposium on, pages 280–285, Dec 2011.

[13] Rachid Guerraoui and Maysam Yabandeh. Independent faults in the cloud. In Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware, LADIS '10, pages 12–17, New York, NY, USA, 2010. ACM.

[14] R. Jhawar, V. Piuri, and M. Santambrogio. Fault tolerance management in cloud computing: A system-level perspective. Systems Journal, IEEE, 7(2):288–297, June 2013.

[15] M.C. Kurt and G. Agrawal. A fault-tolerant environment for large-scale query processing. In High Performance Computing (HiPC), 2012 19th International Conference on, pages 1–10, Dec 2012.

[16] Peter Mell and Timothy Grance. The NIST definition of cloud computing. September 2011. [Online; accessed 26-December-2014].

[17] Ravi Jhawar and Vincenzo Piuri. Fault tolerance and resilience in cloud computing environment. In Cyber Security and IT Infrastructure Protection, pages 1–28, Boston: Syngress, 2014.

[18] Marko Vukolic. The Byzantine empire in the intercloud. SIGACT News, 41(3):105–111, September 2010.

[19] Yilei Zhang, Zibin Zheng, and M.R. Lyu. BFTCloud: A Byzantine fault tolerance framework for voluntary-resource cloud computing. In Cloud Computing (CLOUD), 2011 IEEE International Conference on, pages 444–451, July 2011.

[20] Zibin Zheng, T.C. Zhou, M.R. Lyu, and I. King. FTCloud: A component ranking framework for fault-tolerant cloud applications. In Software Reliability Engineering (ISSRE), 2010 IEEE 21st International Symposium on, pages 398–407, Nov 2010.
