
A Load Balancing Model Based on Cloud Partitioning for the Public Cloud

Gaochao Xu, Junjie Pang, and Xiaodong Fu*

Abstract: Load balancing in the cloud computing environment has an important impact on performance. Good load balancing makes cloud computing more efficient and improves user satisfaction. This article introduces a better load balancing model for the public cloud based on the cloud partitioning concept, with a switch mechanism to choose different strategies for different situations. The algorithm applies game theory to the load balancing strategy to improve efficiency in the public cloud environment.

Key words: load balancing model; public cloud; cloud partition; game theory

1 Introduction

Cloud computing is an attractive technology in the field of computer science. Gartner's report [1] says that the cloud will bring changes to the IT industry. The cloud is changing our lives by providing users with new types of services. Users get service from a cloud without paying attention to the details [2]. NIST defined cloud computing as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction [3]. More and more people are paying attention to cloud computing [4, 5]. Cloud computing is efficient and scalable, but maintaining the stability of processing so many jobs in the cloud computing environment is a very complex problem, so load balancing has received much attention from researchers.

Since the job arrival pattern is not predictable and the capacities of the nodes in the cloud differ, workload control for the load balancing problem is

Gaochao Xu and Xiaodong Fu are with the College of Computer Science and Technology, Jilin University, Changchun 130012, China. E-mail: {xugc, fuxd}@jlu.edu.cn.
Junjie Pang is with the College of Software, Jilin University, Changchun 130012, China. E-mail: [email protected].
* To whom correspondence should be addressed.

Manuscript received: 2012-12-14; accepted: 2013-01-15

crucial for improving system performance and maintaining stability. Load balancing schemes, depending on whether the system dynamics are important, can be either static or dynamic [6]. Static schemes do not use the system information and are less complex, while dynamic schemes bring additional costs for the system but can change as the system status changes. A dynamic scheme is used here for its flexibility. The model has a main controller and balancers to gather and analyze the information. Thus, the dynamic control has little influence on the other working nodes. The system status then provides a basis for choosing the right load balancing strategy.

The load balancing model given in this article is aimed at the public cloud, which has numerous nodes with distributed computing resources in many different geographic locations. Thus, this model divides the public cloud into several cloud partitions. When the environment is very large and complex, these divisions simplify the load balancing. The cloud has a main controller that chooses the suitable partitions for arriving jobs, while the balancer for each cloud partition chooses the best load balancing strategy.

2 Related Work

There have been many studies of load balancing for the cloud environment. Load balancing in cloud computing was described in a white paper written by Adler [7], who introduced the tools and techniques commonly used for


load balancing in the cloud. However, load balancingin the cloud is still a new problem that needs newarchitectures to adapt to many changes. Chaczko etal.[8] described the role that load balancing plays inimproving the performance and maintaining stability.

There are many load balancing algorithms, such as Round Robin, the Equally Spread Current Execution (ESCE) algorithm, and the Ant Colony algorithm. Nishant et al. [9] used the ant colony optimization method for node load balancing. Randles et al. [10] gave a comparative analysis of some algorithms in cloud computing by examining performance time and cost. They concluded that the ESCE algorithm and the throttled algorithm are better than the Round Robin algorithm. Some of the classical load balancing methods are similar to the allocation methods in operating systems, for example, the Round Robin algorithm and the First Come First Served (FCFS) rule. The Round Robin algorithm is used here because it is fairly simple.

3 System Model

There are several cloud computing categories, with this work focused on the public cloud. A public cloud is based on the standard cloud computing model, with service provided by a service provider [11]. A large public cloud will include many nodes in different geographical locations. Cloud partitioning is used to manage this large cloud. A cloud partition is a subarea of the public cloud with divisions based on the geographic locations. The architecture is shown in Fig. 1.

The load balancing strategy is based on the cloud partitioning concept. After the cloud partitions are created, the load balancing starts: when a job arrives at the system, the main controller decides which cloud partition should receive the job. The partition load balancer then decides how to assign the job to the nodes. When the load status of a cloud partition is normal, this step can be accomplished locally. If the cloud partition load status is not normal, the job should be transferred to another partition. The whole process is shown in Fig. 2.

Fig. 1  Typical cloud partitioning.

3.1 Main controller and balancers

The load balancing solution is implemented by the main controller and the balancers.

The main controller first assigns jobs to the suitable cloud partition and then communicates with the balancers in each partition to refresh the status information. Since the main controller deals with information for each partition, smaller data sets lead to higher processing rates. The balancers in each partition gather the status information from every node and then choose the right strategy to distribute the jobs. The relationship between the balancers and the main controller is shown in Fig. 3.

3.2 Assigning jobs to the cloud partition

When a job arrives at the public cloud, the first step is to choose the right partition. The cloud partition status can be divided into three types:
(1) Idle: When the percentage of idle nodes exceeds α, change to idle status.
(2) Normal: When the percentage of normal nodes exceeds β, change to normal load status.
(3) Overload: When the percentage of overloaded nodes exceeds γ, change to overloaded status.
The parameters α, β, and γ are set by the cloud partition balancers.

Fig. 2  Job assignment strategy.

Fig. 3  Relationships between the main controller, the balancers, and the nodes.
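To make these three rules concrete, the following Python fragment is a minimal sketch, assuming the balancer already knows each node's status and the thresholds α, β, and γ. The threshold values, the function name, and the fallback case when no rule fires are illustrative assumptions, not taken from the paper.

    def partition_status(node_statuses, alpha=0.5, beta=0.6, gamma=0.7):
        """node_statuses: list of 'idle', 'normal', or 'overloaded' strings."""
        total = len(node_statuses)
        if total == 0:
            return "idle"                                       # an empty partition carries no load
        if node_statuses.count("idle") / total > alpha:         # rule (1)
            return "idle"
        if node_statuses.count("normal") / total > beta:        # rule (2)
            return "normal"
        if node_statuses.count("overloaded") / total > gamma:   # rule (3)
            return "overloaded"
        return "normal"                                         # assumed fallback; the paper leaves this case open

    print(partition_status(["idle", "idle", "idle", "normal"]))  # -> idle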

The main controller has to communicate with the balancers frequently to refresh the status information. The main controller then dispatches the jobs using the following strategy:

When job i arrives at the system, the main controller queries the cloud partition where the job is located. If this location's status is idle or normal, the job is handled locally. If not, another cloud partition that is not overloaded is found. The algorithm is shown in Algorithm 1.

3.3 Assigning jobs to the nodes in the cloud partition

The cloud partition balancer gathers load information from every node to evaluate the cloud partition status. This evaluation of each node's load status is very important. The first task is to define the load degree of each node.

Algorithm 1  Best Partition Searching
begin
  while job do
    searchBestPartition(job);
    if partitionState == idle || partitionState == normal then
      Send Job to Partition;
    else
      search for another Partition;
    end if
  end while
end
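Read as ordinary code, Algorithm 1 could look like the Python sketch below. The dictionary-based interfaces and field names are hypothetical and only illustrate the control flow: try the job's home partition first, otherwise fall back to any partition that is not overloaded.

    def search_best_partition(job, partitions, partition_status):
        """partitions: dict name -> job queue; partition_status: dict name -> status string."""
        home = job["location"]
        if partition_status.get(home) in ("idle", "normal"):
            partitions[home].append(job)               # handle the job locally
            return home
        for name, status in partition_status.items():  # search for another partition
            if status != "overloaded":
                partitions[name].append(job)
                return name
        return None                                    # every partition is overloaded

    partitions = {"east": [], "west": []}
    status = {"east": "overloaded", "west": "normal"}
    print(search_best_partition({"id": 1, "location": "east"}, partitions, status))  # -> west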

The node load degree is related to various static and dynamic parameters. The static parameters include the number of CPUs, the CPU processing speeds, the memory size, etc. The dynamic parameters are the memory utilization ratio, the CPU utilization ratio, the network bandwidth, etc. The load degree is computed from these parameters as follows:

Step 1  Define a load parameter set F = {F_1, F_2, ..., F_m}, with each parameter F_i (1 ≤ i ≤ m, F_i ∈ [0, 1]) being either static or dynamic, where m is the total number of parameters.

Step 2  Compute the load degree as
    Load_degree(N) = Σ_{i=1}^{m} α_i F_i,
where the weights α_i (Σ α_i = 1) may differ for different kinds of jobs and N represents the current node.

Step 3  Define evaluation benchmarks. Calculate the average cloud partition degree from the node load degree statistics as
    Load_degree_avg = (1/n) Σ_{i=1}^{n} Load_degree(N_i).
The benchmark Load_degree_high is then set for different situations based on Load_degree_avg.

Step 4  Three node load status levels are then defined as:
• Idle: When Load_degree(N) = 0, there is no job being processed by this node, so its status is changed to Idle.
• Normal: For 0 < Load_degree(N) ≤ Load_degree_high, the node is normal and it can process other jobs.
• Overloaded: When Load_degree_high ≤ Load_degree(N), the node is not available and cannot receive jobs until it returns to the normal status.
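Steps 1-4 can be condensed into a short sketch. The parameter values, weights, and threshold below are illustrative assumptions used only to show how the weighted sum and the status levels fit together.

    def load_degree(params, weights):
        """params and weights are equal-length lists; each F_i is in [0, 1], weights sum to 1."""
        assert abs(sum(weights) - 1.0) < 1e-9
        return sum(a * f for a, f in zip(weights, params))

    def node_status(degree, high):
        if degree == 0:
            return "idle"          # no job is being processed on this node
        if degree <= high:
            return "normal"        # the node can still accept jobs
        return "overloaded"        # the node cannot receive jobs for now

    # Illustrative numbers: CPU utilization 0.6, memory utilization 0.3, bandwidth usage 0.2,
    # weighted 0.5 / 0.3 / 0.2, with Load_degree_high set to 0.7.
    d = load_degree([0.6, 0.3, 0.2], [0.5, 0.3, 0.2])
    print(d, node_status(d, high=0.7))   # -> 0.43 normal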

The load degree results are input into the Load Status Tables created by the cloud partition balancers. Each balancer has a Load Status Table and refreshes it at a fixed period T. The table is then used by the balancer to calculate the partition status. Each partition status has a different load balancing solution. When a job arrives at a cloud partition, the balancer assigns the job to the nodes based on its current load strategy. This strategy is changed by the balancer as the cloud partition status changes.


4 Cloud Partition Load Balancing Strategy

4.1 Motivation

Good load balancing will improve the performance of the entire cloud. However, there is no common method that can adapt to all possible situations. Various methods have been developed to improve existing solutions and to resolve new problems.

Each particular method has advantages in a particular area but not in all situations. Therefore, the current model integrates several methods and switches between the load balancing methods based on the system status.

A relatively simple method can be used for the partition idle state, with a more complex method for the normal state. The load balancers then switch methods as the status changes. Here, the idle status uses an improved Round Robin algorithm, while the normal status uses a game theory based load balancing strategy.
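One way to picture this switch mechanism is a small dispatch table keyed by partition status, sketched below in Python. The strategy functions are empty placeholders for the methods of Sections 4.2 and 4.3, and all names and the handling of the overloaded case are assumptions for illustration.

    def round_robin_assign(job, nodes):
        """Placeholder for the idle-status strategy described in Section 4.2."""
        return nodes[0]

    def game_theory_assign(job, nodes):
        """Placeholder for the normal-status strategy described in Section 4.3."""
        return nodes[0]

    STRATEGY_TABLE = {
        "idle": round_robin_assign,      # plenty of free resources: keep it simple
        "normal": game_theory_assign,    # moderate load: optimize response time
    }

    def dispatch(job, nodes, partition_status):
        strategy = STRATEGY_TABLE.get(partition_status)
        if strategy is None:             # overloaded: the partition does not accept the job
            raise RuntimeError("partition overloaded; transfer the job elsewhere")
        return strategy(job, nodes)

    print(dispatch("job-1", ["node-a", "node-b"], "idle"))   # -> node-a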

4.2 Load balancing strategy for the idle status

When the cloud partition is idle, many computing resources are available and relatively few jobs are arriving. In this situation, the cloud partition has the ability to process jobs as quickly as possible, so a simple load balancing method can be used.

There are many simple load balancing methods, such as the Random algorithm, Weighted Round Robin, and Dynamic Round Robin [12]. The Round Robin algorithm is used here for its simplicity.

The Round Robin algorithm is one of the simplest load balancing algorithms: it passes each new request to the next server in the queue. The algorithm does not record the status of each connection, so it has no status information. In the regular Round Robin algorithm, every node has an equal opportunity to be chosen. However, in a public cloud, the configuration and the performance of each node will not be the same; thus, this method may overload some nodes. Therefore, an improved Round Robin algorithm is used, called "Round Robin based on the load degree evaluation".

The algorithm is still fairly simple. Before the Round Robin step, the nodes in the load balancing table are ordered based on the load degree from the lowest to the highest. The system builds a circular queue and walks through the queue again and again. Jobs are then assigned to nodes with low load degrees. The node order is changed when the balancer refreshes the Load Status Table.
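One possible reading of this ordered circular queue, sketched in Python with illustrative node names and load degrees (the real ordering and queue handling in the model may differ):

    from itertools import cycle

    def build_queue(load_status_table):
        """load_status_table: dict node name -> load degree; lowest load degree first."""
        ordered = sorted(load_status_table, key=load_status_table.get)
        return cycle(ordered)            # circular queue walked again and again

    queue = build_queue({"node-a": 0.9, "node-b": 0.1, "node-c": 0.4})
    for job in ["job-1", "job-2", "job-3", "job-4"]:
        print(job, "->", next(queue))    # job-1 -> node-b, job-2 -> node-c, job-3 -> node-a, ...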

However, there may be read and write inconsistency at the refresh period T. If a job arrives at the cloud partition at the moment the balance table is being refreshed, an inconsistency problem arises: the system status will have changed but the information will still be old. This may lead to an erroneous load strategy choice and an erroneous node order. To resolve this problem, two Load Status Tables should be created: Load Status Table 1 and Load Status Table 2. A flag is also assigned to each table to indicate Read or Write.

When the flag = "Read", the Round Robin based on the load degree evaluation algorithm is using this table.

When the flag = "Write", the table is being refreshed and new information is written into it.

Thus, at each moment, one table gives the correct node locations in the queue for the improved Round Robin algorithm, while the other is being prepared with the updated information. Once the data is refreshed, the table flag is changed to "Read" and the other table's flag is changed to "Write". The two tables then alternate to solve the inconsistency. The process is shown in Fig. 4.
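The two-table scheme behaves like a simple double buffer. The Python sketch below is one assumed realization of the description above; the class and method names are illustrative.

    class LoadStatusTables:
        """Two tables with Read/Write flags; readers never see a half-written table."""

        def __init__(self):
            self.tables = [dict(), dict()]   # Load Status Table 1 and Load Status Table 2
            self.read_index = 0              # index of the table currently flagged "Read"

        def read(self):
            return self.tables[self.read_index]

        def refresh(self, fresh_data):
            write_index = 1 - self.read_index       # the table flagged "Write"
            self.tables[write_index] = dict(fresh_data)
            self.read_index = write_index           # swap the flags after writing

    tables = LoadStatusTables()
    tables.refresh({"node-a": 0.2, "node-b": 0.7})
    print(tables.read())                     # readers now see the refreshed data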

4.3 Load balancing strategy for the normal status

When the cloud partition is in the normal status, jobs arrive much faster than in the idle state and the situation is far more complex, so a different strategy is used for the load balancing. Each user wants his jobs completed in the shortest time, so the public cloud needs a method that can complete the jobs of all users with a reasonable response time.

Penmatsa and Chronopoulos [13] proposed a static load balancing strategy based on game theory for distributed systems, and this work provides a new perspective on the load balancing problem in the cloud environment. Since the cloud is an implementation of a distributed system, load balancing in the cloud computing environment can also be viewed as a game.

Game theory includes non-cooperative games and cooperative games. In cooperative games, the decision makers eventually come to an agreement, which is called a binding agreement; each decision maker decides by comparing notes with the others. In non-cooperative games, each decision maker makes decisions only for his own benefit. The system then reaches the Nash equilibrium, where each decision maker makes the optimized decision. The Nash equilibrium is reached when each player in the game has chosen a strategy and no player can benefit by changing his or her strategy while the other players' strategies remain unchanged.


Fig. 4  The solution to the inconsistency problem.

There have been many studies on using game theory for load balancing. Grosu et al. [14] proposed a load balancing strategy based on game theory for distributed systems as a non-cooperative game using a distributed structure. They compared this algorithm with other traditional methods to show that their algorithm had lower complexity and better performance. Aote and Kharat [15] gave a dynamic load balancing model based on game theory. This model is based on the dynamic load status of the system, with the users being the decision makers in a non-cooperative game.

Since grid computing and cloud computing environments are also distributed systems, these algorithms can also be used in grid and cloud computing environments. Previous studies have shown that the load balancing strategy for a cloud partition in the normal load status can be viewed as a non-cooperative game, as described here.

The players in the game are the nodes and the jobs. Suppose there are n nodes in the current cloud partition with N jobs arriving; then define the following parameters:
μ_i: processing ability of node i, i = 1, ..., n;
φ_j: time spent on each job j;
Φ = Σ_{j=1}^{N} φ_j: time spent by the entire cloud partition, where Φ < Σ_{i=1}^{n} μ_i;
s_ji: fraction of job j that is assigned to node i (Σ_{i=1}^{n} s_ji = 1 and 0 ≤ s_ji ≤ 1).

In this model, the most important step is finding the appropriate value of s_ji. The current model uses the method of Grosu et al. [14], called "the best reply", to calculate s_ji for each node, with a greedy algorithm then used to calculate s_ji for all nodes. This procedure gives the Nash equilibrium that minimizes the response time of each job. The strategy then changes as the node statuses change.
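The actual fractions come from the best-reply computation of Grosu et al. [14], which is not reproduced here. Purely to make the s_ji data layout concrete, the sketch below splits each job across nodes in proportion to processing ability; this naive proportional split is a stand-in assumption, not the best-reply algorithm itself.

    def proportional_fractions(mu, num_jobs):
        """mu: processing ability of each node; returns s[j][i] with sum_i s[j][i] == 1."""
        total = sum(mu)
        row = [m / total for m in mu]        # every job uses the same split in this toy version
        return [list(row) for _ in range(num_jobs)]

    mu = [4.0, 2.0, 2.0]                     # processing abilities of three nodes
    s = proportional_fractions(mu, num_jobs=2)
    print(s[0])                              # job 0 split as [0.5, 0.25, 0.25]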

5 Future Work

Since this work is just a conceptual framework, more work is needed to implement the framework and resolve new problems. Some important points are:
(1) Cloud division rules: Cloud division is not a simple problem. Thus, the framework will need a detailed cloud division methodology. For example, nodes in a cluster may be far from other nodes, or there may be some clusters in the same geographic area that are still far apart. The division rule should simply be based on the geographic location (province or state).

(2) How to set the refresh period: In the data statistics analysis, the main controller and the cloud partition balancers need to refresh the information at a fixed period. If the period is too short, the high frequency will influence the system performance. If the period is too long, the information will be too old to make good decisions. Thus, tests and statistical tools are needed to set a reasonable refresh period.

(3) A better load status evaluation: A good algorithm is needed to set Load_degree_high and Load_degree_low, and the evaluation mechanism needs to be more comprehensive.

(4) Finding other load balancing strategies: Other load balancing strategies may provide better results, so tests are needed to compare different strategies. Many tests are needed to guarantee system availability and efficiency.

Acknowledgements

We would like to thank the editors and anonymous reviewers for their valuable comments and helpful suggestions.

References

[1] R. Hunter, The why of cloud, http://www.gartner.com/DisplayDocument?doc_cd=226469&ref=g_noreg, 2012.

[2] M. D. Dikaiakos, D. Katsaros, P. Mehra, G. Pallis, and A. Vakali, Cloud computing: Distributed internet computing for IT and scientific research, Internet Computing, vol. 13, no. 5, pp. 10-13, Sept.-Oct. 2009.

[3] P. Mell and T. Grance, The NIST definition of cloud computing, http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf, 2012.

[4] Microsoft Academic Research, Cloud computing, http://libra.msra.cn/Keyword/6051/cloud-computing?query=cloud%20computing, 2012.

[5] Google Trends, Cloud computing, http://www.google.com/trends/explore#q=cloud%20computing, 2012.

[6] N. G. Shivaratri, P. Krueger, and M. Singhal, Load distributing for locally distributed systems, Computer, vol. 25, no. 12, pp. 33-44, Dec. 1992.

[7] B. Adler, Load balancing in the cloud: Tools, tips and techniques, http://www.rightscale.com/info_center/white-papers/Load-Balancing-in-the-Cloud.pdf, 2012.

[8] Z. Chaczko, V. Mahadevan, S. Aslanzadeh, and C. Mcdermid, Availability and load balancing in cloud computing, presented at the 2011 International Conference on Computer and Software Modeling, Singapore, 2011.

[9] K. Nishant, P. Sharma, V. Krishna, C. Gupta, K. P. Singh, N. Nitin, and R. Rastogi, Load balancing of nodes in cloud using ant colony optimization, in Proc. 14th International Conference on Computer Modelling and Simulation (UKSim), Cambridgeshire, United Kingdom, Mar. 2012, pp. 28-30.

[10] M. Randles, D. Lamb, and A. Taleb-Bendiab, A comparative study into distributed load balancing algorithms for cloud computing, in Proc. IEEE 24th International Conference on Advanced Information Networking and Applications, Perth, Australia, 2010, pp. 551-556.

[11] A. Rouse, Public cloud, http://searchcloudcomputing.techtarget.com/definition/public-cloud, 2012.

[12] D. MacVittie, Intro to load balancing for developers - The algorithms, https://devcentral.f5.com/blogs/us/intro-to-load-balancing-for-developers-ndash-the-algorithms, 2012.

[13] S. Penmatsa and A. T. Chronopoulos, Game-theoretic static load balancing for distributed systems, Journal of Parallel and Distributed Computing, vol. 71, no. 4, pp. 537-555, Apr. 2011.

[14] D. Grosu, A. T. Chronopoulos, and M. Y. Leung, Load balancing in distributed systems: An approach using cooperative games, in Proc. 16th IEEE Intl. Parallel and Distributed Processing Symp., Florida, USA, Apr. 2002, pp. 52-61.

[15] S. Aote and M. U. Kharat, A game-theoretic model for dynamic load balancing in distributed systems, in Proc. International Conference on Advances in Computing, Communication and Control (ICAC3 '09), New York, USA, 2009, pp. 235-238.

Gaochao Xu received his BEng degree in computer science in 1988, MEng degree in 1991, and PhD degree from Jilin University in 1995. Currently, he is a professor and PhD supervisor in the College of Computer Science and Technology, Jilin University, China. His main research interests include distributed systems, grid computing, cloud computing, Internet of Things, information security, software testing, and software reliability assessment.

Junjie Pang received his BEng degree in computer science in 2010 from Changchun University of Technology. Currently, he is studying for a master's degree in software engineering in the College of Software, Jilin University, China. His main research interests include cloud computing, load balancing strategy, and visualization technology.

Xiaodong Fu received her MEng degree from Jilin University in 2005. Currently, she is a senior engineer in the College of Computer Science and Technology, Jilin University, China. Her main research interests include distributed computation and software reliability analysis.