[ieee 2006 seventh international conference on parallel and distributed computing, applications and...

Processor Allocation in Mesh Multiprocessors Using a Hybrid Method

Sanya Attari, Islamic Azad University, Ahar Branch, Ahar, IRAN

Ayaz Isazadeh, Department of Computer Science, Tabriz University, Tabriz, IRAN

sanya. [email protected] [email protected]. ir

Abstract

Mesh-connected systems have become popular because of their simple structure. Most allocation strategies for mesh systems are either contiguous or non-contiguous. We propose a new hybrid processor allocation algorithm for mesh-connected systems. The method first attempts to allocate processors contiguously; when contiguous allocation is not possible, the request is decomposed into smaller sub-meshes such that a region can be allocated for each sub-mesh. The regions formed by this method need not have regular shapes; as a result, all the free processors in a mesh are useful in the allocation process and the number of rejected requests is minimized. Compared to the other schemes, the proposed algorithm minimizes the communication delay among the selected processors. Our method combines the advantages of both contiguous and non-contiguous allocation schemes. We will show that it achieves the lowest job response time and waiting time among the compared strategies and improves the system utilization by using all idle processors in the system.

Index Terms- mesh multiprocessor, allocation, algorithm, noncontiguous allocation, contiguous allocation, fragmentation.

1. Introduction

Among the topologies proposed for multicomputers, the mesh topology has gained popularity because of its simplicity. In this topology we assume that the processors are arranged as a mesh of nodes. One of the most popular mesh-connected models is the two-dimensional mesh, for which a number of different processor allocation approaches have been proposed. To improve system performance in mesh-connected multicomputers, it is important to find an efficient processor allocation scheme that allocates free sub-meshes to incoming jobs.

In this paper we propose a new algorithm that addresses some of the problems of both contiguous and non-contiguous allocation approaches, namely fragmentation, communication delay, rejected requests, and complexity.

Our method first tries to allocate the requested processors contiguously. If that is not possible, it uses a non-contiguous scheme to allocate the requested processors. The underlying data structure for this method is provided by a simple algorithm, which constructs and maintains updated regions of free processors. Working with these regions, the method can allocate the requested processors in the best possible way and thereby improve the overall system performance.

2. Background

In a multiprocessor environment an incoming job requests a number of processors to complete the task. Processor allocation is the process of selecting and allocating available processors for the incoming job. The selection process attempts to select the requested processors with minimal communication cost.

Two types of approaches are available for processor allocation on mesh-connected systems: contiguous and non-contiguous [1,2,3]. Many contiguous processor allocation schemes have been proposed to minimize the delay of interprocessor communication. The fragmentation problem, which occurs in contiguous approaches, can increase the job waiting time and decrease system performance. To solve these problems, non-contiguous allocation strategies have been considered. These strategies offer several significant advantages over contiguous allocation, including the elimination of internal and external fragmentation and the limitation of the queuing delay of jobs. Note, however, that these schemes also introduce potential problems due to message contention and communication interference with other jobs. The most successful scheme may therefore be a hybrid between contiguous and non-contiguous schemes. We provide a brief description of existing processor allocation strategies, compare their characteristics, and point out their strengths and weaknesses.

Proceedings of the Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06), 0-7695-2736-1/06 $20.00 © 2006

2.1. Contiguous allocation schemes

Some major contiguous processor allocation schemes are outlined here.

Two Dimensional Buddy Strategy: Two-dimensional meshes are a popular interconnection topology for multicomputers because of their simplicity. Most of the processor allocation algorithms for two-dimensional meshes, including the Two Dimensional Buddy Strategy [1], have a virtual fragmentation problem. The two-dimensional buddy system is usable only in square mesh systems of size $k \times k$, where $k = 2^n$ for an integer $n$. The sizes of requested sub-meshes are rounded up to the nearest power of 2. This rounding leads to a large number of processors being wasted due to internal fragmentation. For a mesh $M(k, k)$ (where $k = 2^n$), the scheme maintains $(n + 1)$ lists of free sub-meshes, the $a$-th list storing the free sub-meshes of size $2^a \times 2^a$, $0 \le a \le n$. If the list for the required size is empty, a free sub-mesh from a list of larger size is decomposed into smaller ones to allocate the task.
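The cost of this rounding can be made concrete with a small sketch. It assumes that a $w \times h$ request is served by the smallest $2^a \times 2^a$ block that covers it; the function names are hypothetical and are not taken from [1].

```python
def buddy_block_side(w: int, h: int) -> int:
    """Smallest power-of-two side length that covers a w x h request."""
    side = 1
    while side < max(w, h):
        side *= 2
    return side


def internal_fragmentation(w: int, h: int) -> int:
    """Processors allocated by the buddy block but not requested."""
    side = buddy_block_side(w, h)
    return side * side - w * h


# A 3 x 5 request is rounded up to an 8 x 8 block under this assumption.
wasted = internal_fragmentation(3, 5)
```

Under this assumption, only requests that are already square with a power-of-two side incur no internal fragmentation.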

Frame Sliding Strategy: This strategy [2] is applicable to a mesh of arbitrary size and shape. FS allocates a sub-mesh of exactly the requested size, which eliminates internal fragmentation. To allocate an incoming request S (w, h), the FS strategy searches for an available frame of the requested size. The search starts from the lowest leftmost available node. If the processors in the currently examined frame are not all available, the frame is slid over the mesh system to the next candidate frame, which is either w nodes away in the x-dimension or h nodes away in the y-dimension, depending on the current position of the frame. It should be noted that the FS strategy is better than the buddy strategy at finding a sub-mesh for an incoming request.
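The search described above can be sketched as follows. This is a simplified illustration and not the full procedure of [2]: it assumes a 0/1 `busy` matrix whose row 0 is the lowest row of the mesh, and it only visits frames whose bases lie at multiples of the strides from the starting node. The function names are hypothetical.

```python
def frame_is_free(busy, x, y, w, h):
    """True if the w x h frame with lower-left node (x, y) fits and is all free."""
    rows, cols = len(busy), len(busy[0])
    if x + w > cols or y + h > rows:
        return False
    return all(busy[r][c] == 0
               for r in range(y, y + h)
               for c in range(x, x + w))


def frame_sliding(busy, w, h):
    """Return the base (x, y) of the first free frame found, or None."""
    rows, cols = len(busy), len(busy[0])
    # Start from the lowest leftmost available node.
    start = next(((c, r) for r in range(rows) for c in range(cols)
                  if busy[r][c] == 0), None)
    if start is None:
        return None
    x0, y0 = start
    # Slide with a horizontal stride of w and a vertical stride of h.
    for y in range(y0, rows, h):
        for x in range(x0, cols, w):
            if frame_is_free(busy, x, y, w, h):
                return (x, y)
    return None


mesh = [[0, 1, 0],
        [0, 0, 0],
        [0, 0, 0]]
base = frame_sliding(mesh, 2, 2)
```

In this 3 x 3 example the sketch returns `None` even though the 2 x 2 frame based at (0, 1) is free, because the fixed strides never visit that base. Finer strides, as used by the Adaptive Scan strategy, avoid this kind of miss at the cost of examining more frames.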

Adaptive Scan Strategy: The Adaptive Scan strategy [4] was proposed as an improvement over the Frame Sliding strategy. This strategy is usable for mesh systems of all sizes and shapes, and it eliminates internal fragmentation by allocating a sub-mesh of exactly the requested size. AS searches for a frame like the FS strategy, but instead of using fixed strides of w and h, it uses a fixed vertical stride of 1 and an adaptive stride for the horizontal direction. If no free frame is found to allocate the current request, the orientation of the required frame is rotated by 90 degrees.

EFPA Strategy: When an incoming job requests a sub-mesh of size m x n, EFPA (Extended Flexible Processor Allocation) [5] first tries to allocate the conventional rectangular sub-meshes (m x n, n x m, m/2 x 2n, 2m x n/2, 2n x m/2 and n/2 x 2m). If these sub-meshes are not available, EFPA allocates L-shaped sub-meshes instead of signaling an allocation failure. These mesh manipulations sometimes increase the running time of the job, but allocating an L-shaped sub-mesh instead of waiting in a queue can provide high system utilization. The free list of rectangular sub-meshes used in this strategy is sorted in increasing order of the shorter edge of the sub-meshes. The free list of the L-shaped sub-meshes is arranged in increasing order of the longest edge of the L-shaped sub-meshes.

Best-Fit and First-Fit Strategy: The Best-Fit and First-Fit strategies [6] are applicable to mesh systems of any size and shape. FF and BF are able to allocate sub-meshes of precisely the requested size, which eliminates internal fragmentation. Two binary (0/1) arrays are used to speed up the search process. The first array, called the busy array, is used to store the allocation state of the mesh. The second array, called the free-base array with respect to a task, is an array in which element FB [i, j] has the value 0 if processor [i, j] can serve as the free base to host the task. These strategies check all possible frames of the requested size without checking all the processors in the mesh system. The BF policy chooses a corner from the smallest region, and the Best-Fit frame is allocated to the current request. The FF strategy stops its search as soon as it finds a free base in the coverage array.

Leapfrog Method: In the Leapfrog method [3], a new data structure, the Run-length array (R-array), is proposed to represent the mesh. The elements of the R-array store statistical information about the occupancy of the mesh. The scheme only checks the left boundary and the set of right-border line segments (RBL) of the allocated sub-meshes with respect to the task. An R-array is a two-dimensional array representing a mesh. For each processor p, there is an element in the R-array storing the length of the free or occupied run counted from p. To search for free bases, the first-fit process of the Leapfrog scheme scans the elements in the R-array from the lowest-left corner. The process continues unless it reaches an element whose top and right processors are not free, in which case the element is rejected. The first-fit process aborts its search once it finds a free base. However, the best-fit process will not stop its search until all the candidate processors are checked.
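A run-length array of this kind can be built in one pass per row. The sketch below is an illustration only: it assumes runs are counted to the right along a row, and it adopts a sign convention (positive for free runs, negative for occupied runs) that is an assumption of this example and is not stated in the text above. The name `build_r_array` is hypothetical.

```python
def build_r_array(busy):
    """For each processor, store the length of the run of equal state
    (free or busy) starting at that processor and extending to the right.
    Assumed convention: positive = free run, negative = occupied run."""
    rows, cols = len(busy), len(busy[0])
    r = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        # Scan right to left so each run length extends the neighbor's run.
        for j in range(cols - 1, -1, -1):
            same = j + 1 < cols and busy[i][j + 1] == busy[i][j]
            length = abs(r[i][j + 1]) + 1 if same else 1
            r[i][j] = length if busy[i][j] == 0 else -length
    return r


r_array = build_r_array([[0, 0, 1, 1, 0]])
# r_array == [[2, 1, -2, -1, 1]]
```

With such an array, a scan that lands on an occupied element can skip the whole occupied run in one step instead of testing each processor in it.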

2.2. Non-contiguous allocation schemes

Several non-contiguous processor allocation schemes have been proposed.

Random and Multiple-Buddy System (MBS): In the random scheme, when a job requests n processors, it is allocated n randomly selected nodes. MBS [7] is an extension of the Two-Dimensional Buddy strategy. In this strategy, if an incoming job requests n processors, the request is factorized into a base-four representation of the form

$\sum_{i=0}^{\lfloor \log_4 n \rfloor} d_i \, (2^i \times 2^i)$, where $0 \le d_i \le 3$.

Then the request is allocated to the system according to the number of blocks required. If a required block is not available, MBS recursively searches for a bigger block and repeatedly breaks it into buddies until it produces a block of the desired size.
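The base-four factorization of a request can be illustrated with a short sketch. The function name `mbs_factorize` is hypothetical; the code only computes the digits $d_i$ and does not model the search for free blocks.

```python
def mbs_factorize(n: int) -> list:
    """Return the base-4 digits of n, least significant first.
    digits[i] is the number of 2^i x 2^i blocks requested."""
    digits = []
    while n > 0:
        digits.append(n % 4)
        n //= 4
    return digits


# 23 = 3 * (1 x 1) + 1 * (2 x 2) + 1 * (4 x 4)
blocks = mbs_factorize(23)
```

Because each digit is at most 3, a request for n processors is covered exactly by the blocks, with no processors wasted.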

Naive: Another non-contiguous allocation strategy is called Naive [7]. In this strategy a request for k processors is satisfied by the first k free processors in a row-major scan. Like the Random strategy, Naive does not have any fragmentation. The complexity of the allocation and deallocation algorithms for Random and Naive is O(k).
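A minimal sketch of the Naive strategy is given below. It assumes a 0/1 `busy` matrix and scans the whole mesh for simplicity, so it does not reproduce the O(k) bound, which would require a maintained list of free processors. The name `naive_allocate` is hypothetical.

```python
def naive_allocate(busy, k):
    """Allocate the first k free processors in row-major order.
    Marks them busy and returns their (row, col) positions, or None
    if fewer than k processors are free."""
    free = [(r, c)
            for r in range(len(busy))
            for c in range(len(busy[0]))
            if busy[r][c] == 0]
    if len(free) < k:
        return None
    chosen = free[:k]
    for r, c in chosen:
        busy[r][c] = 1
    return chosen


mesh = [[1, 0],
        [0, 0]]
nodes = naive_allocate(mesh, 2)
```

The allocated nodes need not be adjacent, which is why the strategy avoids fragmentation but may increase communication distance.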

Paging Allocation Strategy: Another simple non-contiguous strategy is the Paging allocation strategy [8]. This method initially divides the entire mesh into pages, which are square blocks with side length $2^{page\_size}$. Pages are the basic units of allocation. A request for k processors is satisfied by allocating free pages until all the requested processors are allocated. The order in which the pages are scanned is determined by the indexing scheme. The strategy keeps an ordered list of unallocated pages. Each page entry contains the row and column indices and a unique order index assigned by the indexing scheme. When a new job requests k processors, $\lceil k / (2^{page\_size} \times 2^{page\_size}) \rceil$ entries are removed from the list and the corresponding pages are assigned to the job. The complexity of this algorithm is O(k).
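The page count and the removal of entries from the ordered list can be sketched as follows. The tuple layout `(order_index, row, col)` and the function names are assumptions made for this illustration; the indexing scheme that produces the order is not modeled.

```python
def pages_needed(k: int, page_size: int) -> int:
    """Number of pages for k processors when a page has side 2**page_size."""
    page_capacity = (2 ** page_size) ** 2
    return -(-k // page_capacity)  # integer ceiling division


def paging_allocate(free_pages, k, page_size):
    """Remove the required number of pages from the front of the ordered
    free list and return them, or return None if too few pages are free."""
    need = pages_needed(k, page_size)
    if need > len(free_pages):
        return None
    allocated = free_pages[:need]
    del free_pages[:need]
    return allocated


# Four 2 x 2 pages of a 4 x 4 mesh, as (order_index, row, col) entries.
free_list = [(0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 1, 1)]
job_pages = paging_allocate(free_list, 10, 1)
```

In this example a request for 10 processors receives 3 pages, that is 12 processors, so pages larger than a single processor reintroduce a small amount of internal fragmentation.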

3. The Proposed Algorithm

When a task arrives, the allocation process searches the mesh and assigns a sufficiently large free sub-mesh to the task. Generally the process allocates a contiguous sub-mesh to the task. Recently, some researchers have proposed different ideas, such as non-contiguous allocation schemes, to increase the degree of system utilization. We propose a new scheme for processor allocation that uses all idle processors in the system to decrease system fragmentation.

First of all we build a matrix composed of 0s and 1s (called the Basic Matrix), where 0s represent free processors and 1s represent busy processors.

Representing the current condition of the mesh, this matrix is used for specifying the free processors in the mesh and as a basis for an optimal matrix, which in turn will be used in proposed method. We are, in fact, looking for all the mesh regions of free connected processors. To accomplish this, we scan the Basic Matrix line by line, finding free regions. While scanning, every time an element of value 0 is hit, we change it to 2, indicating that the element has been scanned. Then we consider the four neighboring elements. For any of these elements with value 0 (i.e., the corresponding processor is free), we continue the scanning process in the direction of the element. This process continues until no more neighboring 0 (i.e., free processor) is left, at which time the framing process for this region stops. Consequently the regions constructed in this way represent regions of free connected processors.

During the framing process, a counter is set for every region, specifying the number of free processors available in the region; this information is kept in a small array, called the f-array, and is used by the system in allocating processors to each request.
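The framing process and the construction of the f-array can be sketched as a flood fill over the Basic Matrix. This is a minimal illustration under stated assumptions: the matrix is a list of lists, an explicit stack replaces the directional scan, and the names `find_free_regions`, `regions` and `f_array` are hypothetical.

```python
def find_free_regions(basic):
    """Scan the Basic Matrix (0 = free, 1 = busy) line by line and group
    free processors into 4-connected regions. Scanned cells are marked 2.
    Returns (regions, f_array), where f_array[i] == len(regions[i])."""
    rows, cols = len(basic), len(basic[0])
    grid = [row[:] for row in basic]  # work on a copy of the Basic Matrix
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:
                continue
            grid[r][c] = 2  # mark as scanned
            stack, cells = [(r, c)], []
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                # Consider the four neighboring elements.
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] == 0:
                        grid[ny][nx] = 2
                        stack.append((ny, nx))
            regions.append(cells)
    f_array = [len(cells) for cells in regions]
    return regions, f_array


basic_matrix = [[0, 0, 1],
                [1, 1, 1],
                [0, 1, 0]]
regions, f_array = find_free_regions(basic_matrix)
```

For this matrix the sketch finds three regions of sizes 2, 1 and 1. Each processor is visited a constant number of times, so the framing cost grows linearly with the size of the mesh.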

The execution phase starts with the arrival of a request to the system. Based on the number of processors requested, a suitable region is selected from the f-array using the Best-Fit model. If no region contains the number of processors requested, then we first select the biggest possible region and next, based on the remaining number of processors, start searching for another suitable region. This process continues until all the requested processors are allocated to the incoming job.
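The selection step can be sketched as follows. The sketch works only on region sizes and assumes that a request is rejected (here, `None` is returned) when the total number of free processors is insufficient; that rule, the tie-breaking by lowest index, and the name `select_regions` are assumptions of this illustration.

```python
def select_regions(f_array, k):
    """Plan an allocation of k processors over free regions.
    Returns a list of (region_index, processors_taken), or None if
    the mesh does not hold k free processors in total."""
    if sum(f_array) < k:
        return None
    remaining = dict(enumerate(f_array))
    plan = []
    while k > 0:
        fitting = [i for i, size in remaining.items() if size >= k]
        if fitting:
            # Best fit: the smallest region that can hold the rest.
            idx = min(fitting, key=lambda i: remaining[i])
            take = k
        else:
            # No region is large enough: take the biggest region whole.
            idx = max(remaining, key=lambda i: remaining[i])
            take = remaining[idx]
        plan.append((idx, take))
        del remaining[idx]
        k -= take
    return plan


plan = select_regions([5, 3, 9], 12)
```

With regions of sizes 5, 3 and 9, a request for 12 processors takes the whole region of size 9 and then the best-fitting region of size 3, whereas a request for 4 processors is placed contiguously in the region of size 5.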

As described, by finding the regions in the best way, allocating regions based on the requested number of processors, and using the Best-Fit model, this method attempts to solve the fragmentation problem. The method is also capable of identifying the regions containing the maximum number of connected processors, and consequently addresses the communication delay problem.

An advantage of this method is its capability of identifying regions of all possible shapes, not just rectangular ones; thus, the regions can contain all the connected processors, whatever shape they form.

Also, when the number of requested processors is smaller than the number of processors in the smallest region, this method selects that region and allocates the processors with the most appropriate connection. Figure 1 shows an example of the proposed method.

Figure 1: An example of the proposed method (with the corresponding f-array)

4. Evaluation

Simulations were performed to compare the performance of the proposed algorithm with other processor allocation schemes. The performance of a non-contiguous allocation scheme depends on the communication latency of each job. Performance is measured in terms of the mean job response time and system utilization. The size of the mesh system is 10 x 10. The requested job size has a side length of 2 - 30. The service time follows an exponential distribution with a mean of 8 time units. Figure 2 shows the mean waiting time of our algorithm, which is lower than that of the other strategies. The mean job response time of this method (also shown in Figure 2) is better than that of the other strategies as the number of jobs increases. The improvement in mean response time and system utilization will be significant for multiple computer systems.

Figure 2: Mean job response time and mean waiting time vs. the system workload (workload axis from 0 to 0.9)

5. Conclusion

We have proposed a new efficient hybrid processor allocation algorithm for mesh-connected systems. It first attempts to allocate processors contiguously; when contiguous allocation is not possible, the request is decomposed into smaller sub-meshes such that a region can be allocated for each sub-mesh. Allocating jobs to non-contiguous nodes allows jobs to be executed without waiting whenever the number of free processors is sufficient (i.e., it reduces queuing delay). Our method is applicable to any size of requested sub-mesh and reduces the fragmentation problem. Because of the reduced fragmentation and queuing delay, our algorithm has a shorter mean response time than the other approaches. Compared to the other schemes, the proposed algorithm minimizes the distance, and consequently the communication delay, among the selected processors. Our method combines the advantages of both contiguous and non-contiguous allocation schemes. We have shown that it achieves the lowest job response time and waiting time among the compared strategies and improves the system utilization by using all idle processors in the system.

References

[1] K. Li and K. Cheng. A two-dimensional buddy system for dynamic resource allocation in a partitionable mesh-connected system. Parallel and Distributed Computing Journal, 12(1), 1991, 79-83.

[2] P. Chuang and N. Tzeng. Allocating precise submeshes in mesh connected systems. IEEE Transaction on Parallel and Distributed Systems, 5(2), 1994, 211-217.

[3] F. Wu, C. Hsu, and L. Chou. Processor allocation in mesh multiprocessors using the leapfrog method. IEEE Transaction on Parallel and Distributed Systems, 14(3), 2003, 274-289.

[4] J. Ding and L. Bhuyan. An adaptive submesh allocation strategy for two-dimensional mesh-connected systems. In Proceedings of Int'l Conference on Parallel Processing, 1993, 193-200.

[5] Kh. Seo and SC. Kim. Extended Flexible Processor Allocation Strategy for Mesh-Connected Systems Using Shape Manipulations. In Proceedings of Int'l Conference on Parallel and Distributed Systems, 1997, 780.

[6] Y. Zhu. Efficient processor allocation strategies for mesh-connected parallel computers. Parallel and Distributed Computing Journal, 16, 1992, 328-337.

[7] C. Chang and P. Mohapatra. An adaptive job allocation method for the directly connected multicomputer systems. In Proceedings of Int'l Conference on Distributed Computing Systems, 1996, 224-232.

[8] V. Lo, K. J. Windisch, W. Liu and B. Nitzberg. Noncontiguous processor allocation algorithms for mesh-connected multicomputers. IEEE Transaction on Parallel and Distributed Systems, 8(7), 1997, 227-236.

