

Illinois Institute Of Technology 

Comparative Operating Systems – FALL 2001

Professor: Marius Soneru
Site: Internet (Section 251)
Report Topic: Load Balancing in Distributed Systems
Name: Ms. Lata S. Rao
SID: 333-96-1950
Email: [email protected]


Introduction to and Need for load balancing in distributed systems:

Distributed systems are composed of several loosely coupled, independent computers communicating over a high-bandwidth network. This collection of independent computers presents a uni-processor view to the user, i.e. several computers collectively "co-operate" to satisfy the user's request. Users of such a system submit tasks at their host computers for processing. On a standalone workstation, a submitted task would be immediately scheduled for processing according to the scheduling algorithm in use. But in a distributed system, where resources are shared and the user has a uni-processor view of the resource pool, the first step in scheduling a submitted task is deciding where to schedule it, and this brings the issue of load balancing into the picture.

In general, load distribution, a general form of load balancing, can be defined as the process of allocating resources to a task from a pool of resources depending upon parameters like:

1) Task requirements (type, size, priority, etc.),
2) Resource availability, that is, the current state of the distributed system, and
3) Static, dynamic or adaptive rules for placing such tasks in the distributed system.

The basic fact that submission of tasks by users on different hosts is random leads to a situation where some hosts are highly loaded while others are not. This issue is more prominent in heterogeneous distributed systems, where each host has varied computing power. Even in a homogeneous distributed system, performance can be improved by appropriately transferring load from heavily loaded computers (senders) to idle or lightly loaded computers (receivers). The two basic criteria for any load-balancing system are the definitions of performance and load; these criteria drive the active decision making in the load-balancing process. Evaluating these criteria efficiently at run time, and deciding upon their representations (e.g. average response time as a measure of performance), is the most important issue, as elaborated in later sections.

Load Balancing as a special case of load distribution:

Load balancing and load sharing can be considered special cases of load distribution, differing in their load-distributing principle. Both strategies attempt to decrease the probability that a resource sits idle by transferring tasks to lightly loaded nodes. Load-balancing algorithms go further and attempt to "equalize" the loads at all nodes. This implies that a load-balancing strategy generally involves higher overhead, since task transfers occur at a higher rate than under a load-sharing strategy.
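The distinction can be sketched in a few lines of Python. This is an illustrative toy, not from the report: the node names, the threshold value and the function names are all assumptions.

```python
# Hypothetical sketch contrasting load sharing with load balancing.

def sharing_transfers(loads, threshold=3):
    """Load sharing: transfer only when some node is overloaded and
    another sits idle (senders above the threshold, idle receivers)."""
    senders = [n for n, load in loads.items() if load > threshold]
    receivers = [n for n, load in loads.items() if load == 0]
    return senders, receivers

def balancing_target(loads):
    """Load balancing: drive every node toward the system-wide average
    load, which implies more transfers and hence more overhead."""
    return sum(loads.values()) / len(loads)

loads = {"A": 6, "B": 0, "C": 3}
print(sharing_transfers(loads))   # -> (['A'], ['B'])
print(balancing_target(loads))    # -> 3.0
```

Note that under sharing, node C (load 3) is left alone, while under balancing it too would be nudged toward the average.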


Load Balancing Architectures:

Depending upon the logical placement and implementation of the load-balancing module, there are two possible architectures:

1) Centralized Load Balancer:

In this scenario the load balancer sits at the core of the network connecting the different hosts in the distributed system. It is responsible for:

- Implementing the information policy (elaborated later) using one of the possible algorithms.
- Implementing the transfer policy (elaborated later) using one of the possible algorithms.
- Providing the basic function of load balancing.
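A minimal sketch of such a centralized balancer follows, assuming hosts push an integer load count to the balancer and tasks are routed to the least-loaded host. The class and method names (CentralBalancer, report_load, place_task) are illustrative, not from the report.

```python
# Hypothetical centralized load balancer: one load table, one decision point.

class CentralBalancer:
    def __init__(self):
        self.loads = {}          # information policy: central load table

    def report_load(self, host, load):
        """Hosts push their current load to the central balancer."""
        self.loads[host] = load

    def place_task(self):
        """Transfer/location decision: pick the least-loaded host."""
        host = min(self.loads, key=self.loads.get)
        self.loads[host] += 1    # account for the newly placed task
        return host

b = CentralBalancer()
b.report_load("h1", 5)
b.report_load("h2", 2)
print(b.place_task())  # -> h2 (the least-loaded host)
```

The individual hosts do no decision making at all, which is exactly the advantage (and the single point of failure) discussed next.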


Advantages:

- Load-balancing algorithms are implemented at one location rather than at every host.
- The policy for transfer of tasks can be centralized.
- Highly adaptive implementations are possible with correspondingly lower complexity at all the hosts.
- Higher level of adaptability to heterogeneous environments and networks.
- No or minimal additional overhead on each individual host for load-balancing support, as hosts do not spend time making load-balancing decisions.

Disadvantages:

- Relatively more expensive, as a dedicated host and resources are needed for the load balancer.
- Single point of failure for load-balancing purposes (can be avoided by a hot standby).
- Scalability issues as the distributed system grows in the amount of resources it handles and the number of clients it serves.

2) Peer-to-Peer Load Balancer:


In this scenario each individual host is responsible for performing the load-balancing function. This makes each module highly independent in its load-balancing operation, and there is no single point of failure for the load-balancing system. But it also increases the complexity of, and the overhead on, each system.

Advantages:

- No single point of failure, i.e. failure of any single host does not bring the whole system down.
- Independent unit of operation for each host.
- Highly scalable architecture, as the power of the load-balancing infrastructure grows with the resource pool.

Disadvantages:

- More complex functionality, especially for adaptive algorithm implementations.
- The overhead on each individual host adds to the load of the system.
- Less adaptable to heterogeneous environments.
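The peer-to-peer arrangement can be sketched as below: every host runs the same decision logic locally, consulting only its own load and whatever it knows about its peers. The threshold value and peer data are assumptions for illustration.

```python
# Hypothetical peer-to-peer decision: each host decides for itself
# whether it is a sender and, if so, which peer should receive work.

def local_decision(my_load, peer_loads, threshold=4):
    """Return the chosen receiver peer, or None if this host should
    keep the task (it is not a sender)."""
    if my_load <= threshold:
        return None                       # not overloaded: keep the task
    return min(peer_loads, key=peer_loads.get)  # lightest known peer

# An overloaded host independently chooses a receiver.
print(local_decision(7, {"h2": 1, "h3": 5}))  # -> h2
# A lightly loaded host keeps its own work.
print(local_decision(2, {"h2": 1, "h3": 5}))  # -> None
```

Because every host carries this logic, losing any one host removes only that host's decisions, which mirrors the no-single-point-of-failure advantage above.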

The different peripheral blocks in the above diagrams are elaborated in the following sections.

Client Instances:

These are the user processes, which require appropriate resources and hence have to be passed through the load-balancing rules before they can be scheduled and allocated resources. Load-balancing algorithms generally employ "location transparency" for user processes, i.e. the user processes do not need to know where they get executed. These processes have parameters like their priority, their type, etc., which may be taken into consideration by the load-balancing policy.

Resources:

These are the various resources that have to be load balanced, for example CPU, memory, bandwidth, a software server process, etc. These resources may be considered as a composite or individually, depending upon the granularity of the load-balancing strategy. When resources are treated as a composite, an important issue is the factor of influence each resource has on overall system resource utilization. The load-status reporting, or information policy, for the resources can be categorized as follows:

- Broadcast: In this case each resource broadcasts its load state to every load-balancer module, which may use this information in deciding "where" to execute a task. This policy is simple to implement and quite general purpose; but since the broadcast occurs whether or not the information is needed, it involves considerable overhead in terms of message passing.

- Kernel monitor process based: In this case a dedicated kernel process is responsible for collecting information from the various resources about their load states, either periodically or in response to some criterion. This simplifies the implementation of the resources themselves but adds the overhead of the kernel process for monitoring and statistics collection. It has the advantage that it can be highly adaptive to different load scenarios, and hence may decrease its own operating overhead under high load.

- Publish-subscribe model: In this scenario, the load-balancer module subscribes to the various resources to receive their load status, and it is the responsibility of each individual resource to publish its status (periodically, on demand, or after a state change). More complicated implementations may allow changes to the subscription content at runtime to improve adaptability.
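The publish-subscribe variant can be sketched as a tiny in-process event bus. This is an assumed illustration: the class ResourceStatusBus, the resource name "cpu@h1" and the callback shape are not from the report.

```python
# Hypothetical publish-subscribe information policy: resources publish
# load-state changes; the balancer subscribes and keeps a load table.

class ResourceStatusBus:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        """The load balancer registers interest in status updates."""
        self.subscribers.append(callback)

    def publish(self, resource, load):
        """A resource pushes its new load state to every subscriber."""
        for cb in self.subscribers:
            cb(resource, load)

load_table = {}                       # the balancer's view of the system
bus = ResourceStatusBus()
bus.subscribe(lambda r, l: load_table.update({r: l}))

bus.publish("cpu@h1", 0.9)            # resource reports a state change
print(load_table)                     # -> {'cpu@h1': 0.9}
```

The design choice here is that the cost of an update is paid only when state actually changes, unlike the broadcast policy above.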

Communication Network:

This forms the interconnecting link between the various hosts of the distributed system. It is responsible for providing a fast, reliable and fault-tolerant message-passing mechanism among the different hosts, and for providing protocol conversions in distributed systems built over heterogeneous networks.

Load Balancer:

This module is responsible for implementing the various load-balancing algorithms and ensuring that load is fairly distributed among the various resources in the system. It should be able to avoid unstable system states such as "processor thrashing".

Apart from improving system performance, which is the main goal of the load balancer, there are other goals it must satisfy:

1) Scalability: This implies comparable behavior of the load balancer when more resources are added to the distributed system, and hence minimal overhead on scheduling decisions.

2) Consistency of task operation: The results produced by executing a task at a host other than the one at which it was submitted should be the same as if it had not been transferred.

3) Location transparency: Remote execution of tasks should not impose any specific requirements on the task structure, and the task should not be aware of its location of execution.

4) Heterogeneous architecture support: This would involve supporting various types of CPUs in terms of their architectures, processing power, special hardware, etc.

Load Balancing Policies:

Load-balancing policies fall into two broad groups:

(a) Static: Static policies use algorithms that operate without regard to run-time load across the system.

(b) Dynamic: A dynamic load-balancing policy uses run-time state information in making scheduling decisions, and can therefore make more informed balancing decisions. There are two kinds of dynamic policies:

b.1 Adaptive: May adjust policy parameters in order to gradually improve performance.
b.2 Non-adaptive: Always uses the same (fixed, load-dependent) policy.
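The three kinds can be contrasted in code. This is a toy sketch under assumed details: the round-robin rule, the host names and the threshold-raising feedback rule are illustrative, not prescribed by the report.

```python
# Hypothetical sketch: static vs. dynamic vs. adaptive policies.
import itertools

hosts = ["h1", "h2", "h3"]
_rr = itertools.cycle(hosts)

def static_policy(loads):
    """Static: ignores run-time load entirely (e.g. round-robin)."""
    return next(_rr)

def dynamic_policy(loads):
    """Dynamic, non-adaptive: reads run-time load, fixed rule."""
    return min(loads, key=loads.get)

class AdaptivePolicy:
    """Adaptive: also adjusts its own parameters from performance
    feedback (e.g. raise the sender threshold when past migrations
    slowed jobs down rather than speeding them up)."""
    def __init__(self, threshold=4):
        self.threshold = threshold
    def feedback(self, slowdown):
        if slowdown:
            self.threshold += 1   # become more conservative
    def is_sender(self, load):
        return load > self.threshold

loads = {"h1": 5, "h2": 1, "h3": 3}
print(static_policy(loads))       # -> h1 (load-blind)
print(dynamic_policy(loads))      # -> h2 (least loaded)
p = AdaptivePolicy()
p.feedback(slowdown=True)
print(p.threshold)                # -> 5 (parameter adjusted)
```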


The key point is that while non-adaptive policies use only information about the run-time state, adaptive policies use, in addition, information about current performance. In adaptive policies, the rules for adjusting policy parameters may themselves be static or dynamic. An example of a static rule might be "shift to a conservative migration rule when system-wide load patterns are varying too rapidly". An example of a dynamic rule could be "increase the sender-side threshold when migrated jobs cause slowdown rather than speedup". Some researchers refer to the performance-driven adaptation exhibited by the second policy as learning. Since both non-adaptive policies and adaptive policies with static rules really use only load information, it is confusing to distinguish between them. One way to avoid such confusion is to restrict the word "adaptive" to policies that use performance feedback to drive the adjustment of their policy parameters.

Issues in Load Balancing Policy:

Several factors and configurations (apart from its classification as static, dynamic, adaptive, etc.) determine the most efficient load-balancing policy for a particular requirement. Some of the crucial factors that go into choosing a load-balancing policy are as follows:

Load and Performance Metrics:

Defining proper load and performance indices is crucial to improving the efficiency of a distributed system. Some of the indicators for load are the CPU queue length (i.e. the number of processes waiting for the CPU) and processor utilization. Although processor utilization is a better indicator of load, it requires a background process to monitor CPU utilization continuously and hence imposes more overhead than evaluating the CPU queue length. Different implementations use either of the above or a hybrid combination of the two.
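A hybrid index could be sketched as below. The weighting scheme and the queue normalization are assumptions for illustration; real systems choose these empirically.

```python
# Hypothetical hybrid load index: blend CPU queue length (cheap to read)
# with processor utilization (more accurate, costlier to monitor).

def load_index(queue_length, utilization, w=0.5):
    """Return a load value in [0, 1). Dividing the queue length by
    (queue_length + 1) maps it onto [0, 1) so the two indicators are
    on a comparable scale before weighting."""
    queue_part = queue_length / (queue_length + 1)
    return w * queue_part + (1 - w) * utilization

# Three waiting processes and 80% utilization:
print(load_index(queue_length=3, utilization=0.8))  # -> 0.775
```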

Pre-emptive vs. Non-pre-emptive Transfers:

To achieve an even distribution of the workload in a distributed system, either preemptive or non-preemptive load distribution strategies are used. Preemptive load distribution involves process migration, while non-preemptive strategies are based on the initial placement of processes on machines. Process migration is a mechanism whereby a process on one machine is moved to another machine in the distributed system. It is an expensive operation, as the state of the task (i.e. its virtual memory image, process control block, unread I/O buffers, etc.) has to be transferred in addition to the environment details the task needs.

Load Balancing Algorithm Granularity:

The load-balancing algorithm granularity depends upon whether the entire host is considered a single resource, in which case the load refers to the load on the entire system, or as a composite of all its resources. More complex implementations segregate the various resources on all hosts into classes of resources and apply load balancing to each class. Further complexity may involve applying different load-balancing policies to different classes of resources.


Composition of Load-Distributing Algorithms:

A load-distributing algorithm is composed of the following:

a) A transfer policy, which determines whether a node can participate in a task transfer. Transfer policies are based upon thresholds and special resource requirements. Under a threshold policy, tasks are transferred if the number of tasks on a particular node exceeds the threshold; the host where the threshold is exceeded becomes the sender, and a host where the threshold has not yet been reached is made the receiver. Implementations of this policy mostly involve non-pre-emptive task transfers. Under special resource requirements, the most recently spawned task requires a special resource that is not available on the local host (even though the local threshold on the number of tasks has not yet been reached). Most implementations use a hybrid combination of the two, and some also treat the imbalance in load among the nodes as an important factor.
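The threshold form of the transfer policy can be sketched directly; the threshold value of 5 is an assumed example, not a recommendation.

```python
# Hypothetical threshold-based transfer policy: a node's role depends
# only on how its task count compares with the threshold.

def classify(task_count, threshold=5):
    """Return the node's role under the transfer policy."""
    if task_count > threshold:
        return "sender"       # overloaded: eligible to transfer tasks out
    if task_count < threshold:
        return "receiver"     # underloaded: willing to accept tasks
    return "neutral"          # exactly at the threshold

print(classify(8), classify(2), classify(5))  # -> sender receiver neutral
```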

b) A selection policy, which determines which task is eligible for transfer once the transfer policy has decided that the node is a sender. The selection policy is based upon criteria like thresholds, special requirements, etc. Under a threshold criterion, the task which has just arrived and caused the threshold to be exceeded is transferred, making it a non-pre-emptive transfer. Under special requirements, the threshold has been exceeded but there is information that an already-executing task's requirements are more suitably satisfied on some other node, so the decision is made to migrate that process rather than transfer the process which caused the threshold to be exceeded.

Any selection policy implementation is limited by the following factors:

- The overhead incurred in the transfer should be compensated for by the reduction in response time realized by the task.
- The transferred task should probably experience a decreased response time.
- Task transfer overhead should be minimal (as it would be for smaller tasks).
- Location-dependent calls by the selected task should be minimal.

c) A location policy, which determines to which node a selected task should be sent. This policy is tightly coupled with the selection policy. For example, if a newly created task caused the threshold to be exceeded, then any node which is a "receiver" can be selected as the host on which the task executes next. On the other hand, if the criterion for selecting the task was a "special" requirement, then deciding where to send the task involves finding all receivers and then determining the most suitable one, depending upon the task's requirements. Finding out which node is a receiver may involve polling individual nodes (either periodically or on demand, and either serially or in parallel via a broadcast), or collecting information from a single central system responsible for maintaining the state of the distributed system. Also, nearest-neighbour heuristics or previous polling results may be used to decide which node to poll.
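The serial-polling variant of the location policy might look like the sketch below. The peer list, poll limit and threshold are illustrative assumptions; get_load stands in for a network probe.

```python
# Hypothetical location policy: poll peers serially until a receiver
# (a node whose load is below the threshold) is found.

def find_receiver(peers, get_load, threshold=5, poll_limit=3):
    """Poll up to poll_limit peers in order; return the first receiver
    found, or None if every polled peer is too loaded."""
    for peer in peers[:poll_limit]:
        if get_load(peer) < threshold:
            return peer
    return None

loads = {"h1": 9, "h2": 7, "h3": 2, "h4": 1}
# h1 and h2 are probed and rejected; h3 is the first receiver found.
print(find_receiver(["h1", "h2", "h3", "h4"], loads.get))  # -> h3
```

Capping the number of probes (poll_limit) is one way to keep the location policy's own overhead bounded, at the cost of sometimes finding no receiver.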


d) An information policy, which triggers the collection of system-state information. This information is used by the transfer policy and by the location policy for decision making. This policy forms the heart of the load-balancing algorithm in terms of maintaining its consistency, and hence its proper run-time implementation. It should not impose considerable overhead, and it should be quite adaptive to the various possible states of the distributed system. The logical classification of information policies by implementation is as follows:

- Central controller based: In this scenario a central system is responsible for collecting the state information of the entire distributed system. Each individual node may request appropriate information from this controller as and when it requires. Further, in more complicated systems, the controller can implement the location policy instead of the individual hosts. The advantage of such an architecture is that it simplifies the implementation of the location and information policies, since they are centralized, and the probability of state inconsistencies is reduced. The disadvantage is that it increases the cost of the overall system and introduces a single point of failure in the load-balancing strategy.

- Peer-to-peer based: In this scenario the task of implementing the information policy and the location policy rests with each individual node. This involves a more complicated implementation and an overhead on each host, but the advantage is that there is no single point of failure, and the load-balancing strategy still works if any individual node fails.

Depending upon the means and the time of collecting the information, most information policies fall into the following three categories:

- Demand driven: In this case information about the state of peer nodes is collected only when a requirement arises to do so, i.e. there is no anticipation of future need for load balancing. In most cases this means information is collected whenever a node changes state between sender and receiver or vice versa. Thus, this is inherently a dynamic policy. Demand-driven policies are further classified into three types:

  - Sender initiated: Senders look for receivers to transfer load to.
  - Receiver initiated: Receivers solicit load from senders.
  - Symmetrically initiated: Load-sharing actions are initiated by the demand for extra processing power or extra work.

- Periodic: Here nodes mutually exchange load information at periodic intervals. This is the simplest implementation, but since it is not adaptive it suffers during heavy loads, as the information sharing itself imposes a big overhead.

- State-change driven: In this case information about individual states is exchanged whenever state changes exceed a particular degree. A "publish-subscribe" model could be used here: nodes interested in specific changes in the states of specific nodes register either with each of the nodes (in a peer-to-peer model) or with the centralized controller (in the centralized-controller model). It is then the responsibility of each node, or of the controller, to broadcast state-information changes to the interested nodes.

Stability of a Load Balancing Algorithm:

Stability is generally measured in terms of the sizes of the CPU queues, since a queue represents the number of pending requests for a particular CPU. When the arrival rate of work exceeds the rate at which the system can perform work, the queues grow unboundedly and the system becomes unstable. Such instability can also be caused by an ineffective algorithm, where the algorithm itself imposes enough overhead on the system to push it to unstable limits. It is also possible for a stable algorithm to be ineffective in improving the performance of the system. Hence, ideally, the "effectiveness" of the algorithm is used as the measure for evaluating its functionality in terms of stability.
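The queue-growth condition can be stated numerically. The rates below are assumed example numbers, not measurements from the report.

```python
# Illustrative stability check: queues stay bounded only if work arrives
# more slowly than the system can perform it.

def is_stable(arrival_rate, service_rate):
    """Necessary condition for bounded CPU queues: utilization
    rho = arrival_rate / service_rate must stay below 1."""
    return arrival_rate / service_rate < 1

print(is_stable(arrival_rate=8, service_rate=10))   # -> True  (bounded)
print(is_stable(arrival_rate=12, service_rate=10))  # -> False (unbounded)
```

Note that an ineffective balancing algorithm effectively lowers the service rate (its own overhead consumes capacity), which is how the algorithm itself can drive a previously stable system past this boundary.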

From another viewpoint, the algorithm itself may be fooled by its inherent rules into performing fruitless actions indefinitely, operating in a "run-away" scenario. This would mean that the algorithm has an unstable boundary condition built into its own rules.

Summary:

Load balancing forms an important strategy for improving the average response time of any user process. Load balancing relies on better network, CPU and resource architectures and speeds. With the advances in these peripheral technologies, the main focus of development has shifted to efficient implementation of load-balancing algorithms, which assume better infrastructure and hence employ efficient adaptive strategies. A higher level of load-balancing granularity, based on individual resources, provides more efficient control over the utilization of resources. Some of the factors which have supported the rapid development of load-balancing equipment are as follows:

- Dropping prices for network and semiconductor resources, due to rapid advances in these areas.

- The Internet serving as a major "requirement" driver for the load-balancing methodology, with its demand for 24x7 support.

The concept of load balancing and advances in its implementation have led to the development of a newer concept, the "cluster". A cluster is a logical grouping of several resources (basically a distributed system) which addresses load balancing, failover and availability issues.

