[ieee seventh ieee international symposium on cluster computing and the grid - rio de janeiro,...

BBCLB: A Bulletin-Board based Cooperative Load Balance Strategy for Service Grid

Tianyu Wo, Liang Zhong, Chunming Hu, Jinpeng Huai
Computer Science School, Beihang University

{woty, zhongl, hucm, huaijp}@act.buaa.edu.cn

Abstract

Although much effort has been devoted to load balancing in network and job scheduling systems, most existing approaches cannot be applied directly in a service grid environment, since they are often designed for homogeneous systems with limited scalability. It remains a challenging problem to balance the load among service grid nodes, which are often highly dynamic, heterogeneous and linked by a wide-area network. In this paper, we present a load balance strategy that uses several bulletin-boards as load intermediates among grid nodes. A modified threshold-based load transfer algorithm is applied together with a non-preemptive selection policy. Based on this strategy, a load balance system is realized in CROWN, a service-oriented grid middleware, and deployed in the CROWN testbed. The performance evaluations show that our strategy can effectively balance the load of service invocation and improve the system throughput.

1. Introduction

Providing a high performance computing environment is one of the primary goals of the grid approach. The computation load on different work units may differ significantly. This imbalance becomes a major limitation on system performance and may cause a considerable waste of system resources. In a tightly coupled system, a straightforward answer to this issue is to have a load balancer that receives all requests and dispatches them onto different work units. Previous work in the literature[1][2] mostly focuses on algorithms that select proper jobs and bind them to proper resources.

With the emergence of the Open Grid Service Architecture (OGSA)[3], service-based grid architecture has become a significant trend in future grid technology. In a service grid environment, interoperability is ensured by using open standard information exchange patterns and protocols, so that many heterogeneous resources belonging to different domains can cooperate to fulfill requirements that are hard to achieve on a single system. It is even harder to realize load balance in such an environment, since resources are highly dynamic and distributed over a wide area. Furthermore, the autonomy of resources makes it impractical to deploy a central load balancer. The transport of service request and response messages is also quite different from that in traditional distributed systems. Existing work does not fully address these issues and cannot be applied directly in the service grid environment.

In this paper, we propose a load balance strategy using several bulletin-board services as the load intermediates. Computing nodes in the grid system can publish part of their load to the bulletin-board according to the load status they observe. Nodes with lighter load can fetch queued requests, so that different nodes interact cooperatively to achieve a balanced load situation. A load estimator and an adaptive load transfer policy are applied in our approach. They help the service containers determine their load situation and select proper requests to be published on the bulletin-boards. Our work has been realized in CROWN NodeServer, a service container used in CROWN Grid[4]. We also conduct a series of performance evaluations on our system. The results show that our strategy can effectively balance the load of service invocation among service containers and improve the overall system throughput.

The rest of the paper is arranged as follows. In Section 2 we present work related to load balance. Section 3 outlines the requirements and design principles of bulletin-board based cooperative load balance. Detailed implementation is discussed in Section 4. Then we provide a series of system evaluations in Section 5. Finally we conclude our work and discuss future work in Section 6.

Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid'07) 0-7695-2833-3/07 $20.00 © 2007

2. Related Work

Load balance issues arise both in network connections and in distributed systems. LVS[5] is a load balance solution for network connections and applications. LVS provides server clustering for Linux systems. A load balancer is used to dispatch network accesses to a set of work units by using virtual IP address technology. Four different scheduling algorithms are provided: Round Robin Scheduling, Weighted Round Robin Scheduling, Least Connection Scheduling and Weighted Least Connection Scheduling. Applications like web servers can benefit from LVS transparently, since it provides efficient kernel-level support for load balance. Some commercial products are also available, for example Network Dispatch, Big/IP and Local Director.

Condor[6], LSF[7] and Ninf[8] provide load balance scheduling in distributed systems, especially in cluster systems. In these systems, local schedulers can exchange load information with front-end schedulers (meta-schedulers in some cases) and dispatch jobs according to certain strategies.

Levy et al.[9] proposed a framework for Web Service performance management. They evaluated the system load by using QoS metrics. Requests to SOAP Web Services are queued and scheduled by an inner-level scheduler, and an outer-level scheduler is responsible for periodically adjusting the parameters of the inner-level scheduler.

When several large-scale computing resources are interconnected to form a grid environment, some systems extend cluster load balance solutions and integrate them directly with grid platforms such as the Globus Toolkit[10] (for example, Condor-G[11], Nimrod/G[12], and Ninf-G[13]). An agent-based load balance strategy[14] has also been proposed, in which local schedulers are kept unchanged within each autonomous domain while agents exchange load information among the domains.

Dobber et al.[20] proposed a dynamic load balance strategy which predicts the future load situation from history information and reschedules newly incoming requests to balance the overall system load.

Our work differs from the above efforts in that we use several bulletin-board services as the load intermediates. There is no central scheduler that collects load information and dispatches requests. Instead, each node in the system participates in the load management and shares workload with the others cooperatively. This strategy can be more suitable for a service-based wide-area grid computing environment, because in such a situation resources are often highly dynamic, autonomous and heterogeneous.

3. Design Overview

In a service grid environment, all nodes receive requests directly, so the load balance strategy has to be distributed, i.e. each node may act as both a job worker and a load balancer. In this section we present the way to define the system load situation, the policy that triggers load migration, and the deployment in a wide-area environment.

3.1 Determining the Load Situation

Several metrics can be used to model the load situation of a node. The status of the hardware, such as the load of the CPU, memory and disk space, is often taken as the physical load factor. These low-level metrics reflect the load situation of the overall system, but in certain scenarios we may also use higher-level metrics that focus on the grid applications and avoid interference from other running processes. One high-level metric we may use as a logical load factor is the number of requests waiting in the processing queue. In practice, the definition of the load situation on a node is largely application specific and may be measured by a combination of many different metrics.
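As an illustration of combining metrics, the following sketch blends a physical factor (CPU utilization) with a logical factor (queue length) into a single load value. The function name, the weights and the queue capacity used for normalization are assumptions made for this example; the paper does not prescribe a specific combination.

```python
def combined_load(cpu_utilization, queue_length, queue_capacity,
                  physical_weight=0.5, logical_weight=0.5):
    """Blend a physical and a logical load factor into a value in [0, 1].

    All parameters and weights are hypothetical; they only illustrate
    how several metrics could be combined into one load figure.
    """
    if queue_capacity <= 0:
        raise ValueError("queue_capacity must be positive")
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be in [0, 1]")
    # Normalize the queue length so both factors share the same scale.
    logical = min(queue_length / queue_capacity, 1.0)
    total_weight = physical_weight + logical_weight
    return (physical_weight * cpu_utilization
            + logical_weight * logical) / total_weight
```

A service that cares only about queued requests could set physical_weight to 0, which reduces the value to the normalized queue length.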

In BBCLB we provide an extensible framework that supports various methods of modeling the load of a node. For each kind of service, a different algorithm for estimating the workload may be applied. We also provide a default load metric based on the physical load factor.

3.2 Load Migration

The basic idea of a load balance system is to transfer part of the system workload from busy nodes to idle ones. A policy is needed to determine the time and the target of load migration.

The trigger of a load transfer is often based on the current load situation of the whole system and some prediction about the future. The cost of load migration also has to be taken into account, especially in a wide-area environment.

Migration target selection is also a key factor in the effectiveness of a load balance strategy. One can pick a target randomly from the other work nodes. This policy is very easy to deploy and requires few resources to determine the load migration target. However, it uses little information about the load of the target node, so it is unlikely to select a target node with a lower workload, and jobs may be rescheduled several times. When the cost of load transfer is not negligible, as in the service grid environment, the overhead of this policy is unacceptable. Another choice is to propagate load information among nodes and pick a target node according to the load situation on the nodes. This is known as the gradient model based scheduling algorithm[15]. However, the way load information is disseminated can greatly affect the performance of load balance, and in a highly dynamic environment the cost of propagating the system load is extremely high. To limit the target discovery range, a node can select a target from among its neighbors[16].

In our solution the load transfer policy is mainly based on two thresholds, α and β, which distinguish the node statuses from idle to busy. As shown in Figure 1, when the system workload L is less than α, the node is lightly loaded and becomes a potential receiver of extra requests. Conversely, when L > β, the node is running out of resources and part of its workload may be transferred. A node runs in normal status when α ≤ L ≤ β.

Figure 1. Load thresholds
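The two-threshold classification can be sketched as follows. This is a minimal illustration; the names NodeStatus and classify are hypothetical and do not come from the BBCLB implementation.

```python
from enum import Enum


class NodeStatus(Enum):
    LIGHT = "light"    # L < alpha: potential receiver of extra requests
    NORMAL = "normal"  # alpha <= L <= beta
    BUSY = "busy"      # L > beta: part of the workload may be transferred


def classify(load, alpha, beta):
    """Map a workload value L to a node status using thresholds alpha and beta."""
    if alpha > beta:
        raise ValueError("alpha must not exceed beta")
    if load < alpha:
        return NodeStatus.LIGHT
    if load > beta:
        return NodeStatus.BUSY
    return NodeStatus.NORMAL
```

With the thresholds (α,β)=(16,128), a queue length of 10 is classified as light, 16 and 128 as normal, and 129 as busy.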

When a node is busy, i.e. L > β, part of its workload can be delegated out. Considering the overhead of load migration, it is not necessary to transfer workload as soon as the system becomes busy. Instead, we calculate a probability pβ that is proportional to L according to formula (1), and then decide whether to start the migration process according to pβ. Here Lmax denotes the maximum value of the system workload.

$$p_\beta = \begin{cases} 0, & L < \beta \\ \dfrac{L}{L_{max}}, & L \ge \beta \end{cases} \qquad (1)$$

Idle nodes can also ask for some extra work with a probability pα, which decreases as L grows according to formula (2).

$$p_\alpha = \begin{cases} 1 - \dfrac{L}{L_{max}}, & 0 \le L \le \alpha \\ 0, & L > \alpha \end{cases} \qquad (2)$$
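The probabilistic triggers can be sketched in code. The sketch assumes that formula (1) reads pβ = L/Lmax for L ≥ β (and 0 otherwise) and that formula (2) reads pα = 1 − L/Lmax for 0 ≤ L ≤ α (and 0 otherwise); the function names and the injected random source are hypothetical.

```python
import random


def p_beta(load, beta, load_max):
    """Probability that a busy node starts migrating work (assumed formula (1))."""
    if load_max <= 0:
        raise ValueError("load_max must be positive")
    if load < beta:
        return 0.0
    return min(load / load_max, 1.0)


def p_alpha(load, alpha, load_max):
    """Probability that an idle node asks for extra work (assumed formula (2))."""
    if load_max <= 0:
        raise ValueError("load_max must be positive")
    if 0 <= load <= alpha:
        return 1.0 - load / load_max
    return 0.0


def should_migrate(load, beta, load_max, rng=random):
    """Draw once and decide whether to start the migration process."""
    return rng.random() < p_beta(load, beta, load_max)


def should_fetch(load, alpha, load_max, rng=random):
    """Draw once and decide whether to ask for extra work."""
    return rng.random() < p_alpha(load, alpha, load_max)
```

Because the decision is a random draw, a node just above β does not always migrate, which is one way to avoid paying the transfer overhead on every brief load spike.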

As shown in Figure 2, a load balance intermediate called the bulletin-board service (BBS) is introduced. Nodes may delegate workload to a BBS or ask for extra jobs from it.

Figure 2. Load transferring procedure

The BBS not only maintains a request buffer but also acts as a load information exchange center. When some workload is transferred to a BBS, the load situation on the source node is also reported. This piece of information is collected by idle nodes when they interact with the BBS.

Obviously, the BBS may become a bottleneck of the whole system, since it acts as a hub between busy and idle nodes. So in our solution, an idle node acquires extra work from the BBS with a probability pBBS. It can also acquire extra work from a busy node directly, according to its knowledge about the load situation of that node. More precisely, for node i, a probability pi is assigned according to (3), where Li is the latest load information of node i and Δti denotes the time elapsed since that load information was last reported. Here we assume that recent information reflects the current system situation more accurately. The probability pi associated with node i indicates the likelihood that extra work will be fetched from that node. With this mechanism the load on the BBS can be effectively reduced.

$$p_i = \frac{L_i / \Delta t_i}{\sum_{j} \left( L_j / \Delta t_j \right)} \qquad (3)$$
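The weighting in (3) can be sketched as below, assuming that pi is the ratio Li/Δti normalized over all known nodes. The data layout (a dictionary from node identifier to a pair of load and report time) and the function names are hypothetical, and how this choice is combined with pBBS is not specified here.

```python
import random


def source_probabilities(reports, now):
    """Compute p_i for every known node.

    reports maps a node identifier to (load, reported_at); the age of a
    report, now - reported_at, plays the role of delta t_i.
    """
    weights = {}
    for node_id, (load, reported_at) in reports.items():
        age = now - reported_at
        if age <= 0:
            raise ValueError("report time must be earlier than now")
        weights[node_id] = load / age
    total = sum(weights.values())
    if total == 0:
        return {node_id: 0.0 for node_id in weights}
    return {node_id: weight / total for node_id, weight in weights.items()}


def pick_source(reports, now, rng=random):
    """Pick a busy node to pull work from, or None if nothing is known."""
    probabilities = source_probabilities(reports, now)
    nodes = list(probabilities)
    if not nodes or sum(probabilities.values()) == 0:
        return None
    node_weights = [probabilities[node] for node in nodes]
    return rng.choices(nodes, weights=node_weights, k=1)[0]
```

For two nodes reporting the same load, the one whose report is half as old receives twice the probability, which matches the stated preference for recent information.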

We use a non-preemptive workload selection policy, so only requests in the waiting queue can be transferred. In a service grid environment, many instances of the same kind of service can be deployed on different nodes. The business logic of service instances of the same service type remains identical, while the statuses of running instances may differ. It is therefore feasible to transfer queued requests among service nodes.
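A minimal sketch of the non-preemptive selection follows: requests that have started processing are never moved, and only waiting requests are candidates for transfer. The class name and the choice to take the most recently queued requests first are assumptions for illustration.

```python
from collections import deque


class ServiceNode:
    """Hypothetical node with a waiting queue and a set of running requests."""

    def __init__(self):
        self.waiting = deque()  # queued requests, eligible for transfer
        self.running = set()    # requests being processed, never transferred

    def enqueue(self, request_id):
        self.waiting.append(request_id)

    def start_next(self):
        """Move the oldest waiting request into the running set."""
        if not self.waiting:
            return None
        request_id = self.waiting.popleft()
        self.running.add(request_id)
        return request_id

    def select_for_transfer(self, count):
        """Remove up to count requests from the tail of the waiting queue."""
        selected = []
        while self.waiting and len(selected) < count:
            selected.append(self.waiting.pop())
        return selected
```

Taking requests from the tail leaves the requests that have waited longest on the local node; other selection orders are equally compatible with a non-preemptive policy.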

3.3 Wide-area Deployment

Another issue for a distributed load balance strategy is the exchange of load information among service nodes. As shown in Figure 3, BBCLB employs a two-tiered architecture. In the outer tier, computing nodes are grouped into different domains, forming the CROWN Computing Environment. A BBS is deployed for each domain to coordinate the load of the nodes in that domain. All BBSs may exchange load information with each other over the logical load information backbone.

With this architecture BBCLB can scale in a wide-area environment. When a BBS is overloaded, an inter-domain load balance procedure is started. It is very similar to the intra-domain procedure discussed earlier: new requests may be redirected to other BBSs.

Figure 3. Two-tier architecture of BBCLB

4. Implementation

In this section we present the details of the implementation of BBCLB.

4.1 Asynchronous Message Exchange Pattern

A large number of services in a grid environment perform complicated computing processes that may have a rather long running time. The synchronous RPC message exchange pattern (MEP) described in the SOAP specification[17] may not be applicable, since the HTTP connections usually used to carry SOAP messages may time out before the response is produced. An asynchronous request/response protocol is therefore used in our system. A client generates a request and sends it to a service container. The service queues the request and sends a receipt immediately. The client can query for the result by referencing the receipt, or get a notification when the result is ready.
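The receipt-based exchange can be illustrated with a small in-process sketch. It is not the SOAP or WSRF implementation used in CROWN; the class AsyncService, its method names and the use of a thread pool are assumptions that only mimic the pattern of submitting a request, receiving a receipt at once, and later querying or being notified.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncService:
    """In-process stand-in for a service container that queues requests."""

    def __init__(self, handler, workers=2):
        self._handler = handler
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # receipt -> Future holding the eventual result

    def submit(self, payload):
        """Queue the request and return a receipt immediately."""
        receipt = str(uuid.uuid4())
        self._pending[receipt] = self._pool.submit(self._handler, payload)
        return receipt

    def is_ready(self, receipt):
        """Non-blocking query by receipt."""
        return self._pending[receipt].done()

    def result(self, receipt, timeout=None):
        """Block until the result for this receipt is available."""
        return self._pending[receipt].result(timeout=timeout)

    def on_ready(self, receipt, callback):
        """Call callback(receipt, result) once the result is ready."""
        self._pending[receipt].add_done_callback(
            lambda future: callback(receipt, future.result()))

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

The caller never holds a connection open while the computation runs; it keeps only the receipt, which is the property that avoids the timeout problem described above.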

It is very important to have this message exchange pattern standardized in order to support heterogeneous systems in a wide-area environment. As shown in Figure 4, we implement BBCLB in CROWN Grid as an extension of NodeServer, a service container based on Globus Toolkit 4. Our solution for the asynchronous MEP is based on the WSRF / WSN specifications[18], so that the asynchronous protocol described above can be applied in a standard way. The client creates a resource on the target service to start a computing procedure. The resource creation request returns immediately with an endpoint reference (EPR). The client can later query the result associated with the EPR, or it can subscribe to the resource and be notified whenever a property of that resource changes.

Figure 4. The BBCLB extension of CROWN NodeServer

4.2 Loose Coupling

We assume that there is no system (or data) dependency among grid services, so service requests can be transferred among service instances deployed on different nodes. That is, each service can be substituted by another service of the same type. This loose coupling condition holds in many computation-intensive applications. In this situation the client does not care which node actually generates the result, as long as the proper result is eventually delivered to the client. Under this assumption we can implement BBCLB by modifying the service container only. Neither the business logic of a service nor its implementation needs to be changed.

Figure 5 shows the modules involved in BBCLB. We implemented three major modules in CROWN NodeServer and an independent bulletin-board service.

Figure 5. Modules of BBCLB

A load balance handler is deployed in the service container and intercepts all incoming requests. It queries the load analyzer for the system load information. When the system is busy, the handler may start the load transfer process and redirect the request to a BBS through a load transmitter. The current load situation is also reported to the BBS.

A load analyzer is used to collect the load information of the node. Each service can assign its own definition of load. By default the load analyzer reports the system load based on the CPU usage of the hosting environment.

A load receiver periodically checks the BBS for extra work when the system is idle. It also maintains a list of the load information of other nodes, so that it can pull workload from a busy node directly without the help of the BBS. The list is updated using the load information that accompanies the extra work.

A bulletin-board service (BBS) is deployed as a grid service. It maintains a queue of requests transferred from busy nodes. For each request, the service type, a timestamp of the receiving time, an identifier of the source node and the load of the source node are recorded. Figure 6 presents the structure of the BBS. It exposes 3 interfaces to support workload transfer as well as workload information exchange.
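The per-request record and the queue kept by the BBS can be sketched as a small data structure. The record fields follow the description above; the three method names (publish, fetch, load_info) are hypothetical stand-ins, since the actual interface names are not given in the text.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class TransferredRequest:
    service_type: str
    received_at: float   # timestamp of the receiving time
    source_node: str     # identifier of the source node
    source_load: float   # load reported by the source node
    payload: object = None


class BulletinBoard:
    """Hypothetical in-memory model of a bulletin-board service."""

    def __init__(self):
        self._queue = deque()
        self._latest_load = {}  # source node -> (load, report time)

    def publish(self, service_type, source_node, source_load,
                payload=None, now=None):
        """Accept a request from a busy node and record its reported load."""
        now = time.time() if now is None else now
        self._queue.append(TransferredRequest(
            service_type, now, source_node, source_load, payload))
        self._latest_load[source_node] = (source_load, now)

    def fetch(self, service_type):
        """Hand the oldest queued request of this type to an idle node."""
        for index, request in enumerate(self._queue):
            if request.service_type == service_type:
                del self._queue[index]
                return request
        return None

    def load_info(self):
        """Return the latest load report of every source node."""
        return dict(self._latest_load)
```

Matching on service_type reflects the loose coupling assumption: a request can be served only by a node that hosts an instance of the same service type.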

Figure 6. Structure of Bulletin-Board Service

4.3 Service Deployment and Configuration

When a user deploys a new service into CROWN NodeServer, the BBCLB feature can be enabled by modifying the deployment descriptor file (WSDD). We add several extra parameters to the WSDD, which are interpreted by the Load Balance Handler that intercepts the requests.

BBS_URL: assigning the URL of the BBS. Each service can use its own BBS overriding the default BBS assigned in the global setting.

THRESHOLD_ALPHA: assigning the threshold value of α which is the lower bound of normal status.

THRESHOLD_BETA: assigning the threshold value of β which is the upper bound of normal status.

5. System Evaluation

In this section we present the performance evaluation of our system.

5.1 Experiment Setup

The experimental resources are mainly from the CROWN test-bed[4]. A cluster of 21 nodes, named cu01~cu21, is employed to form a single domain. Each node has two Intel Nocona Xeon 2.8 GHz CPUs, 2 GB of memory and a 36 GB SCSI hard drive. The nodes are interconnected by gigabit Ethernet and run Linux kernel 2.4. All system clocks are synchronized. We deploy CROWN NodeServer with BBCLB on each node, and a bulletin-board service is deployed on cu21. Requests are generated by a client running on a separate PC. The target of an individual request is picked from cu01~cu20 according to a normal distribution, so that cu10 and cu11 receive the most requests while cu01 and cu20 receive the fewest. The waiting time between requests follows an exponential distribution with a mean of 200 ms. The target service performs some calculation for a specific period (controlled by the parameter service_thinking_time, which equals 10 seconds in this test) and then returns.
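A request sequence with these properties can be generated as in the sketch below. The standard deviation of the target distribution and the clamping of out-of-range draws to the first and last node are assumptions, since the setup states only that the targets follow a normal distribution centered between cu10 and cu11; the function name and the seed are also hypothetical.

```python
import random


def generate_workload(request_count, node_count=20, mean_interval_ms=200.0,
                      target_stddev=3.0, seed=0):
    """Return a list of (send_time_ms, target_node) pairs.

    target_stddev and the clamping to [1, node_count] are assumptions.
    """
    rng = random.Random(seed)
    center = (node_count + 1) / 2.0  # 10.5 for 20 nodes
    send_time_ms = 0.0
    workload = []
    for _ in range(request_count):
        # Exponentially distributed gap with the given mean.
        send_time_ms += rng.expovariate(1.0 / mean_interval_ms)
        index = round(rng.gauss(center, target_stddev))
        index = max(1, min(node_count, index))
        workload.append((send_time_ms, f"cu{index:02d}"))
    return workload
```

Fixing the seed reproduces the same targets and the same intervals on every run, which is what allows different threshold settings to be compared on identical input.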

Four tests are performed. In the first test we disable BBCLB so that each node processes all the requests it receives. We then enable BBCLB with 3 different sets of thresholds. 10000 requests are sent in each test case. We collect the service invocation information by using the logging and statistics functions that CROWN provides. We send the same sequence of requests with the same sequence of intervals, so the input of every test is identical. We use the request queue length as the indicator of the system load.

5.3 Result

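$$\sigma = \frac{1}{n}\sum_{i=1}^{n}\left| r_i - \bar{r} \right|, \quad \text{where } \bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i \qquad (4)$$

$$\sigma(t) = \frac{1}{n}\sum_{i=1}^{n}\left| r_i(t) - \bar{r}(t) \right|, \quad \text{where } \bar{r}(t) = \frac{1}{n}\sum_{i=1}^{n} r_i(t) \qquad (5)$$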

First we analyze the overall effect of load balancing when BBCLB is introduced. Figure 7 shows the number of requests sent to and processed by each node in one test. In this test BBCLB is enabled and the thresholds are (α,β)=(16,128). We can see that, by using BBCLB, the load of each node can be effectively balanced. We define the load balance factor σ in (4) to quantify this effect. In (4), n is the number of nodes and ri is the number of requests processed by node i. Without BBCLB we have σ = 209.7, while with BBCLB it is reduced to 3.4. This is not surprising, because the size of the request queue of each node is bounded by the thresholds.

Figure 7. Number of requests on each node (BBCLB enabled, α=16, β=128). [Plot over nodes cu01 to cu20 with a request-count axis from 0 to 900; series: Request Received, Request Processed.]

Figure 8. Change of Load Balance Factor (σ) during the test

We also calculate the runtime load balance factor σ(t) for each test, which shows how unbalanced the system is at time t. In this case ri(t) is defined as the number of requests waiting in the queue of node i at time t. We present the change of σ during the test in Figure 8. As we can see, the test runs in a much more balanced way when BBCLB is used, since the threshold β controls the upper bound of the request queue on each node. We also notice that a smaller β may lead to a more balanced run. However, the overhead of interacting with the BBS may be considerable when β is very small, so a proper value of β should be set according to the processing capability of a node.
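The load balance factor can be computed as in the sketch below. It assumes that σ is the mean absolute deviation of the per-node values from their mean; if a standard deviation is intended instead, the absolute value would be replaced by a squared term under a square root. The function name is hypothetical.

```python
def load_balance_factor(values):
    """Mean absolute deviation of per-node values from their mean.

    values holds r_i (requests processed per node) for sigma, or r_i(t)
    (queue lengths at time t) for sigma(t). The use of an absolute
    deviation is an assumption.
    """
    count = len(values)
    if count == 0:
        raise ValueError("at least one node is required")
    mean = sum(values) / count
    return sum(abs(value - mean) for value in values) / count
```

A perfectly even split of 10000 requests over 20 nodes gives a factor of 0, and the factor grows as the per-node counts move away from 500.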

Figure 9 shows the workload of the nodes during the test. We choose 5 typical nodes from the 20 test nodes. Without BBCLB a node (cu11) may reach a load peak of 600, while the peak is clearly reduced when BBCLB is used. With BBCLB the workloads of the nodes are similar to each other, and the total time needed to finish all requests is reduced by about 30%.

Figure 9. Work load of nodes during the test. The workload is evaluated by the number of waiting jobs in the queue. Panels: (a) without using BBCLB; (b) α=4, β=64; (c) α=8, β=64; (d) α=16, β=128.


We log the processing of each request. The processing time span, i.e. the time between a request being sent from the client and the response being generated for that request, is analyzed, and the result is shown in Figure 10. Without BBCLB, every request must wait in the queue of the node that received it before it is processed, while with BBCLB enabled the BBS may help transfer the request to another node. The processing time span of a request therefore does not increase linearly, as it does in the test with BBCLB disabled, but levels off at a value related to the processing capability of the nodes and the density of requests.

Figure 10. Request processing time span. x-axis shows the start time of a request. Panels: (a) without using BBCLB; (b) α=4, β=64; (c) α=8, β=64; (d) α=16, β=128.

The evaluation in this section shows the effectiveness of BBCLB. It can be used to balance service requests between service containers with acceptable overhead, so that the throughput of the whole system can be improved.

6. Conclusion and Future Work

In this paper we presented techniques for balancing the load of service invocation among service containers in a grid environment. Instead of deploying a central load balancer, we introduced several load intermediates called bulletin-board services. Using the BBS, we proposed a new collaborative load balance strategy with a non-preemptive load selection policy. Distributed nodes can share load information and exchange the load of service invocation with each other. We implemented our system by developing a set of services and extending the grid service container.

Finally we deployed our system in CROWN Grid and performed a set of evaluations. The results show the effectiveness of our approach.

Although the system we proposed supports load information exchange between BBSs, the topology of these BBSs can be a key factor affecting system performance, especially in some application-specific scenarios. The test-bed we used for the evaluation is not as dynamic as we desired. We are building a grid environment that contains many more resources, is highly dynamic, and is closer to a real production system. We believe that with such a test-bed the evaluation results can be much more convincing and the approach we proposed can be refined in depth. The model we applied to determine the load status of a single node can also be improved by using a more precise mathematical model. We leave these issues as future work.


Acknowledgements

Part of this work is supported by grants from the China National Science Foundation (Project No. 91412011, 60525209, 60473010) and the China Ministry of Science and Technology (under contract 2005CB321803). The authors would like to thank Lei Li for her valuable contribution to this work.

References

[1] M. Livny and M. Melman. Load Balancing in Homogeneous Broadcast Distributed Systems. In Proceedings of the ACM Computer Network Performance Symposium, pages 47-55, College Park, Maryland, United States, 1982.

[2] T. Kunz. The Influence of Different Workload Descriptions on a Heuristic Load Balancing Scheme. IEEE Transactions on Software Engineering, 17(7): 725-730, 1991.

[3] I. Foster. The Physiology of the Grid – An Open Grid Service Architecture for Distributed Systems Integration. Open Grid Service Infrastructure WG, Global Grid Forum, 2002.

[4] CROWN Project. http://www.crown.org.cn.

[5] Linux Virtual Server Project web site. http://www.linuxvirtualserver.org/.

[6] M. Litzkow, M. Livny, and M. Mutka. Condor – a hunter of idle workstations. In Proceedings of the 8th International Conference on Distributed Computing Systems, pages 104-111, San Jose, CA, USA, 1988.

[7] S. Zhou. LSF: load sharing in large-scale heterogeneous distributed systems. In Proceedings of Workshop on Cluster Computing, 1992.

[8] H. Nakada, M. Sato, and S. Sekiguchi. Design and implementations of Ninf: towards a global computing infrastructure. Future Generation Computing Systems 5(6): 649-658, 1999.

[9] R. Levy, J. Nagarajarao, G. Pacifici, M. Spreitzer, A. Tantawi, and A. Youssef. Performance Management for Cluster Based Web Services. Technical Report, IBM TJ Watson Research Center, 2003.

[10] I. Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, 11(2): 115-129, 1997.

[11] J. Frey, T. Tannenbaum, I. Foster, M. Livny, and S. Tuecke. Condor-G: a computation management agent for multi-institutional grids. In Proceedings of the 10th IEEE Symposium on High Performance Distributed Computing, San Francisco, CA, USA, 2001.

[12] D. Abramson, J. Giddy, and L. Kotler. High performance parametric modeling with Nimrod/G: killer application for the global grid? In Proceedings of the 14th International Parallel and Distributed Processing Symposium, Cancun, Mexico, 2000.

[13] Y. Tanaka. Ninf-G: grid RPC system based on the Globus toolkit. In The 2001 Globus Retreat, San Francisco, CA, USA, 2001.

[14] Junwei Cao. Agent-Based Grid Load Balancing Using Performance-Driven Task Scheduling. In Proceedings of 17th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2003), Nice, France, April 2003.

[15] F. C. H. Lin and R. M. Keller. The gradient model load balancing method. IEEE Transactions on Software Engineering, Special issue on distributed systems, 13(1): 32-38, 1987.

[16] W. Shu and L. V. Kale. A dynamic scheduling strategy for the Chare-Kernel system. In Proceedings of Supercomputing '89, pages 389-398, Reno, Nevada, 1989.

[17] Simple Object Access Protocol (SOAP) Specification. http://www.w3.org/2000/xp/Group/

[18] WSRF / WSN Specifications. http://www.oasis-open.org/committees/tc_home.php.

[19] Chuming Hu. Research on Service Oriented Grid Middleware with End to End Quality of Service. Doctoral thesis, Beihang University, 2005. (In Chinese)

[20] Menno Dobber, Ger Koole, Rob van der Mei. Dynamic Load Balancing Experiments in a Grid. In Proceedings of the 5th IEEE International Symposium on Cluster Computing and the Grid, vol 2, page 1063-1070, Cardiff, UK, 2005.

