Received 25 May 2007; Revised 9 November 2007; Accepted 11 November 2007

Copyright © 2008 John Wiley & Sons, Ltd.

A tree-based approach to matchmaking algorithms for resource discovery

Md. Rafiqul Islam, Md. Zahidul Islam*,† and Nazia Leyla

Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh

SUMMARY

One of the essential operations in distributed computing is resource discovery. A resource discovery service provides mechanisms to identify, from a large collection of resources, the set of resources capable of satisfying the requirements of a job. The matchmaking framework provides a reasonable solution to resource management in a distributed environment; it is composed of four components: classified advertisements (classads), the matchmaking protocol, the matchmaking algorithm and the claiming protocol. Most of the time required to find a resource depends on the performance of the matchmaking algorithm. A distributed environment introduces a large set of heterogeneous resources that is always changing, and the matchmaking algorithm should cope with this highly dynamic environment. In this paper we propose a fast and efficient searching method for matchmaking algorithms which also deals with resource heterogeneity. The proposed approach reduces the searching time to a linear function from the cubic function of the algorithm proposed by R. Raman, M. Livny, and M. Solomon. We briefly discuss the working principles of the method and compare the experimental results of the proposed matchmaking algorithm with those of the existing algorithm. Copyright © 2008 John Wiley & Sons, Ltd.

1. INTRODUCTION

Management of the highly variable resource pool of distributed systems has become very important. The ultimate target of any resource-sharing environment is to pool together large sets of resources and to make them available to its users. One of the essential tasks in developing discovery and composition mechanisms is matchmaking, which allows service providers to fulfill service consumers’ requests [1]. Matchmaking is regarded as a search or discovery problem in which service consumers attempt to locate the services they require in order to accomplish their tasks. Matchmaking uses a classified advertisements data model to represent the principals of the system and folds the query language into the data model, allowing entities to publish queries (i.e., requirements) as attributes [2]. Among the four components of the matchmaking framework, the matchmaking algorithm is very important [3].

Constantinescu and Faltings [4] proposed a matchmaking technique that can be developed into numerical encoding and indexing techniques for service directories. Raman et al. [5] proposed a matchmaking algorithm that uses iterative backtracking and runs in $O(n^3)$ time. Here we propose a hierarchical approach to the matchmaking algorithm, in which the resources are grouped dynamically. The proposed algorithm organizes the resources as a tree according to the attributes of the machines, and the tree is searched according to the requirements of the job. The analysis of this approach shows that the proposed mechanism is able to serve new searches submitted to the resource discovery service in a reasonable time with linear complexity.

INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT
Int. J. Network Mgmt 2008; 18: 427–436
Published online 17 January 2008 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/nem.686

*Correspondence to: Md. Zahidul Islam, Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh.
†E-mail: [email protected]

The rest of the paper is organized as follows. The next section describes some background topics in this field. The existing matchmaking algorithm is discussed in Section 3. Section 4 gives a brief overview of the proposed algorithm. Experimental results are discussed in Section 5. Some limitations are stated in Section 6.

2. BACKGROUND TOPICS

This section describes background topics that will increase the readability of this paper.

2.1 The matchmaking framework

The basic concept of matchmaking is as follows: entities (machines or jobs) that provide (a machine) or require (a job) a service advertise their characteristics and requirements in classified advertisements (classads). The heart of a matchmaking framework is a matchmaker, which matches classads in a manner that satisfies the constraints specified in the respective advertisements and informs the relevant entities of the match. The responsibility of the matchmaker is then to map the job to the matched machine. The matched entities establish contact, possibly negotiate further terms, and then cooperate to perform the desired service.

The matchmaking framework may be decomposed into four components as proposed by Raman et al. [5,6]:

1. the classad specification, which defines a language for expressing characteristics and constraints, and the semantics of evaluating these attributes of the machines as well as jobs;

2. the matchmaking algorithm, which is used by the matchmaker to create matches. The matchmaking algorithm relates the contents of submitted classads and the state of the system to the matches that will be created;

3. the matchmaking protocol, which defines how matched entities are notified and what information they are given in case of a match; and

4. the claiming protocol, which defines what actions the matched entities take to enable discharge of service. It also establishes the required relationship between the job and the machine to run the job.

2.2 Classified advertisements

A classad is a fundamental component of matchmaking: a highly flexible and extensible data model that is used to represent entities in the form of advertisements [7]. The matchmaking framework provides a general resource selection mechanism based on the classad language, which allows users to describe arbitrary resource requests and resource owners to describe their resources. Classads can be further divided into two subclasses as follows.

Resource classads

The resource classads advertise the machine’s attributes and the conditions under which it will run jobs. Some typical machine attributes include the amount of memory, CPU type, CPU clock speed, and current load. The conditions for accepting and running jobs on the machine can be specified so that jobs run only when there is no other activity or only during certain hours of the day or night. The owner of the resource controls when and how the resource is used. The owner can also specify special priority for users or groups.

Job classads

The job classad specifies the type of machine the job needs and the minimum or maximum attribute values acceptable for the job to run. For example, the job classad could specify that the job must be run on the Linux operating system with Intel architecture and have 2 GB of memory and 40 GB of disk space available. Condor reviews the job classad and all machine classads to identify matches that satisfy the requirements of both. If multiple resources meet the job requirements, the resources can be ranked by a resource classad attribute. For example, of the resources that meet or exceed the job requirements, select the one with the fastest clock speed or with the most memory.
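The selection just described can be sketched in a few lines. This is only an illustration and not the classad language itself: the machine dictionaries, the attribute names (`OpSys`, `Arch`, `Memory`, `Disk`, `ClockMHz`) and the helper `select_machine` are assumptions chosen to mirror the example in the text.

```python
# Hypothetical machine classads, modelled as plain dictionaries
# (memory and disk in GB, clock speed in MHz).
machines = [
    {"Name": "m1", "OpSys": "Linux", "Arch": "Intel", "Memory": 4, "Disk": 80, "ClockMHz": 2400},
    {"Name": "m2", "OpSys": "Linux", "Arch": "Intel", "Memory": 2, "Disk": 40, "ClockMHz": 3000},
    {"Name": "m3", "OpSys": "Solaris", "Arch": "Intel", "Memory": 8, "Disk": 120, "ClockMHz": 3200},
    {"Name": "m4", "OpSys": "Linux", "Arch": "Intel", "Memory": 1, "Disk": 40, "ClockMHz": 3600},
]


def job_requirements(machine):
    """Job from the text: Linux on Intel with at least 2 GB memory and 40 GB disk."""
    return (machine["OpSys"] == "Linux" and machine["Arch"] == "Intel"
            and machine["Memory"] >= 2 and machine["Disk"] >= 40)


def select_machine(machines, requirements, rank_attr):
    """Keep the machines that satisfy the job, then pick the highest-ranked one."""
    candidates = [m for m in machines if requirements(m)]
    return max(candidates, key=lambda m: m[rank_attr], default=None)


fastest = select_machine(machines, job_requirements, "ClockMHz")
largest = select_machine(machines, job_requirements, "Memory")
```

Ranking by a different attribute only changes `rank_attr`; the requirement check stays the same.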

2.3 Advertising

In a resource pool, resources periodically send a resource classad to the central manager. The central manager includes a process that collects the resource classads for all the resources in the pool. When jobs are submitted, another central manager process queries the collection of resource classads to see which match the job requirements. In order to notify the matchmaker about the remote resources and be able to schedule jobs to run on the resources, the resource must manually advertise by generating and sending a representative resource classad.

2.4 Matchmaking mechanism

The matchmaking process follows the architecture proposed by Raman et al. [7]. Let us now look at the specific actions taken by entities that require matchmaking services. The working principle is illustrated in Figure 1:

• Step 1: Providers (resource classad) and customers (job classad) construct classads and send them to the matchmaker. These classads must be constructed according to the advertising protocol specified by the matchmaker.

• Step 2: The matchmaker then invokes a matchmaking algorithm by which matches are identified.

• Step 3: After the matching phase, the matchmaker invokes a matchmaking protocol to notify the two parties that were matched and sends them the matching ads.

• Step 4: The customer then contacts the server directly, using a claiming protocol to establish a working relationship with the provider.

Figure 1. Actions involved in matchmaking

To perform the match, the matchmaker evaluates expressions in an environment that allows each classad to access attributes of the other: an attribute reference of the form ‘self.attribute-name’ refers to another attribute of the classad containing the reference, while ‘other.attribute-name’ refers to an attribute of the other ad. A matchmaking algorithm considers a pair of ads to be incompatible unless their Constraint expressions both evaluate to true. The Rank attribute is used to choose among compatible matches: among provider ads matching a given customer ad, the matchmaker chooses the one with the highest Rank value (non-integer values are treated as zero), breaking ties according to the provider’s Rank value.
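A minimal sketch of this bilateral evaluation is given below. It assumes that each ad is a dictionary whose `Constraint` and `Rank` entries are Python callables taking `(self, other)`; the helper names and the sample attributes (`Owner`, `Mips`) are hypothetical and are not classad syntax.

```python
def compatible(ad, other):
    """A pair of ads matches only if both Constraint expressions are true."""
    return bool(ad["Constraint"](ad, other)) and bool(other["Constraint"](other, ad))


def rank(ad, other):
    """Evaluate ad's Rank against other; non-integer values count as zero."""
    value = ad["Rank"](ad, other)
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return 0


def best_provider(customer, providers):
    """Highest customer Rank wins; ties are broken by the provider's own Rank."""
    matches = [p for p in providers if compatible(customer, p)]
    if not matches:
        return None
    return max(matches, key=lambda p: (rank(customer, p), rank(p, customer)))


job = {
    "Owner": "alice", "Memory": 1,
    "Constraint": lambda self, other: other["Arch"] == "Intel"
    and other["Memory"] >= self["Memory"],
    "Rank": lambda self, other: other["Mips"],
}
providers = [
    {"Name": "a", "Arch": "Intel", "Memory": 2, "Mips": 100,
     "Constraint": lambda self, other: True,
     "Rank": lambda self, other: 0},
    {"Name": "b", "Arch": "Intel", "Memory": 2, "Mips": 100,
     "Constraint": lambda self, other: True,
     "Rank": lambda self, other: 1 if other["Owner"] == "alice" else 0},
    {"Name": "c", "Arch": "IBM", "Memory": 4, "Mips": 500,
     "Constraint": lambda self, other: True,
     "Rank": lambda self, other: 0},
]
chosen = best_provider(job, providers)
```

Here machines `a` and `b` tie on the job's Rank, so the provider's own Rank decides in favour of `b`; machine `c` is excluded because the job's Constraint rejects its architecture.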

Existing matchmaking mechanisms

In this section we present some existing matchmaking mechanisms. A matchmaking mechanism using fuzzy logic is proposed by Kuo-Ming Chao et al. [1] and Chun-Lung Huang et al. [8]. The proposed fuzzy matchmaking framework represents the underlying data using fuzzy logic and semantic web technologies in order to optimize the discovery process. It also enables service consumers to employ any terms in queries in order to find appropriate services. The first step is to generalize contents (resource attributes) into fuzzy terms by employing a fuzzy classifier, which represents a service or a sub-service as descriptive fuzzy terms. These terms are structured as a hierarchy via fuzzy rules and the predefined fuzzy sets. In the next step, a service consumer initiates a query, and the fuzzy classifier checks and transforms the query into fuzzy terms. The fuzzy matchmaking mechanism is then triggered. If there is a match in terms of their parts or types, fuzzy reasoning is used to map the fuzzy request to the appropriate service data.

Another matchmaking process based on balanced directory trees is proposed by Constantinescu and co-workers [4,9]. The proposed matchmaking mechanism consists of two logical parts: one handles the encoding of DAML-S service descriptions into a numeric form and the other handles the management and search of the directory of service descriptions. Upon insertion, each child is assigned an integer key that is unique and persistent at the parent, and each child is also assigned an interval. The numeric encoding of a service description includes the map between sets of intervals representing properties and sets of intervals representing classes. The second logical part of the matchmaker relies on the Generalized Search Tree (GiST) for creating and maintaining the directory. Each internal node of the tree holds a key in the form of a predicate P and can hold at most a predetermined number of pointers to other nodes. To search for records that satisfy a query predicate Q, the paths of the tree whose keys P satisfy Q are followed. Thus, in GiST terms, the only requirement for a general search tree is that the search key of a given node is a predicate that holds for all the nodes below it. As such, the matchmaker contains a library implementing the generic GiST algorithms and specific code implementing the functionality required for handling numeric service descriptions.

3. EXISTING MATCHMAKING ALGORITHM

The matchmaking algorithm proposed by Raman et al. [5,6] uses a docking paradigm for matchmaking. Each classad is divided into an ordered list of labeled ports; each port requests a ‘submatch’. A multilateral match occurs by docking the individual ports of distinct advertisements, thus forming tree-shaped ‘gangs’ of linked classads, as illustrated in Figure 2.

The algorithm takes a classad as input, which is considered the root advertisement (or root) of the required gang. The algorithm attempts to fill each port in the order in which the ports are listed in the ports attribute. Compatibility between ports of advertisements is determined by evaluating the constraints defined in those ports.

If no compatible candidates are found for a particular port of the root advertisement, the corresponding port cannot be filled and the algorithm backtracks to the previously filled port and attempts to refill that port with another candidate. If the refill operation succeeds, the algorithm continues to the succeeding port; otherwise it backtracks yet again to the preceding port. As the terminating condition, each port maintains a history set of candidate ports with which docking attempts have already been made. During the refill operation, only candidate ports that do not exist in the history set are considered. The algorithm considers that all the ports preceding the one under consideration are docked and all ports that succeed it are undocked.
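The port-filling loop can be sketched as follows. This is a simplified illustration rather than the gangmatching implementation of Raman et al.: each port is assumed to be a predicate over a single candidate ad, each candidate is assumed to dock with at most one port, and a port's history set is assumed to be cleared whenever the algorithm backtracks past it. The function name `fill_ports` is hypothetical.

```python
def fill_ports(ports, candidates):
    """Fill the root's ports in order with iterative backtracking.

    ports      -- list of predicates, one per port, in listing order
    candidates -- list of candidate ads
    Returns the index of the candidate docked at each port, or None.
    """
    docked = []                        # docked[j] = candidate docked at port j
    history = [set() for _ in ports]   # candidates already tried per port
    i = 0                              # port under consideration
    while 0 <= i < len(ports):
        choice = next(
            (c for c in range(len(candidates))
             if c not in history[i]
             and c not in docked
             and ports[i](candidates[c])),
            None,
        )
        if choice is None:
            # Port i cannot be filled: forget its history (assumption) and
            # backtrack so that the preceding port can be refilled.
            history[i].clear()
            i -= 1
            if i >= 0:
                docked.pop()
        else:
            history[i].add(choice)
            docked.append(choice)
            i += 1
    return docked if i == len(ports) else None


ads = [{"Name": "A", "Linux": True}, {"Name": "B", "Linux": False}]
any_machine = lambda ad: True
linux_only = lambda ad: ad["Linux"]

gang = fill_ports([any_machine, linux_only], ads)
no_gang = fill_ports([linux_only, linux_only], ads)
```

In the first call the first port initially takes ad A, the second port then finds no free Linux machine, and the algorithm backtracks and refills the first port with ad B.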



The non-root advertisements of a gang may themselves have multiple ports, so to complete the search the above algorithm must be applied recursively to each advertisement included in the gang. A fundamental difference between non-root advertisements and the root advertisement is that one of the ports of a non-root advertisement must serve as a ‘parent link’ in the gang tree. If the constraints between the parent link port and its counterpart in the parent advertisement evaluate to true or false, the compatibility of the parent link port is known. Otherwise, the port is marked as a ‘tentative yes’ and used as the parent link. If the parent link is compatible, the port’s parent link status is changed from ‘tentative yes’ to ‘positive yes’, after which the algorithm proceeds as usual.

The best case occurs for a workload consisting of a given number of jobs, each of which is compatible with every machine and every license, with the machines and licenses perfectly interleaved.

The worst-case performance of the algorithm, shown in the research work by Raman [6], may be easily derived and expressed in terms of the number of expression evaluations performed. Given r requests (i.e., roots), n machine advertisements, no licenses, and the assumption that the machine port of each root is compatible (on average) with k of the n machines, the number of expression evaluations performed by the naïve algorithm is $r(n + kn)$. Thus, the algorithm runs in $O(n^3)$ time in the worst case, when both r and k are equal to n.
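Substituting the worst-case values into the evaluation count makes the cubic bound explicit:

$$
r(n + kn)\,\Big|_{r = n,\; k = n} = n\,(n + n \cdot n) = n^{2} + n^{3} = O(n^{3}).
$$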

4. MATCHMAKING ALGORITHM WITH TREE SEARCHING

In our proposed algorithm, the matchmaker maintains a dynamically allocated tree based on the attributes described in machine classads. A new node is created for each distinct request from a separate machine. If a previously allocated machine sends a new request with changed attributes, the existing node for that machine is deleted and a new node is inserted into the tree. The tree is formed as follows: a dummy root has a left subtree of busy machines and a right subtree of idle machines. For example, if a machine classad contains the attribute Activity = ‘Idle’, the machine is inserted into the right subtree; otherwise it is added to the left subtree.

The left subtree will be maintained as a complete binary tree. The nodes will be added level by level without any special effort. When a machine becomes free, completing its workload, the machine will be removed from the left subtree and will be inserted into the right subtree [3].
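One simple way to keep the busy-machine subtree complete is to store it in level order in a list, so that appending fills the tree level by level. The sketch below assumes this array representation and a hypothetical `BusyTree` class; the paper does not prescribe a particular storage layout.

```python
class BusyTree:
    """Complete binary tree of busy machines stored in level order.

    The node at index i has its children at indices 2*i + 1 and 2*i + 2.
    """

    def __init__(self):
        self.nodes = []

    def add(self, machine_id):
        """Append at the next free position of the lowest level."""
        self.nodes.append(machine_id)

    def remove(self, machine_id):
        """Remove a machine that became idle.

        The last node is moved into the gap so the tree stays complete.
        """
        i = self.nodes.index(machine_id)
        last = self.nodes.pop()
        if i < len(self.nodes):
            self.nodes[i] = last
        return machine_id

    def children(self, i):
        """Return the machines stored in the children of node i."""
        return [self.nodes[c] for c in (2 * i + 1, 2 * i + 2)
                if c < len(self.nodes)]


tree = BusyTree()
for name in ["m1", "m2", "m3", "m4", "m5"]:
    tree.add(name)
root_children = tree.children(0)
tree.remove("m2")   # m2 finished its workload and moves to the idle subtree
```

Since no ordering is imposed on the busy machines, moving the last node into the freed slot is enough to preserve completeness.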

Figure 2. The gangmatching operation

In the right subtree, each level checks a specific condition of the classified advertisement of the machine. If the machine is ‘idle’ we move to the right subtree from the dummy root. The first level checks the architecture of the machine, e.g., Intel, IBM, Motorola. The next level checks the operating system of the machine. At the leaf node we therefore know the architecture and the operating system of the machine. Each leaf node contains the identity number of a table, and the corresponding table contains the necessary information about the machines. The table is sorted according to rank. For example, consider 10 machines in the grid with architecture Intel and operating system Linux. The attributes of these 10 machines appear together in the corresponding table if all the machines are idle. If a job requires a machine with architecture Intel and operating system Linux, the matchmaker has to search only that table.
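The idle subtree can be approximated by nested dictionaries, one level per attribute, with a rank-sorted table at each leaf. The sketch below is an assumption-laden illustration: the class `IdleIndex`, its method names and the use of (rank, machine id) pairs are not taken from the paper.

```python
from collections import defaultdict


class IdleIndex:
    """Right subtree: architecture level, then operating-system level, then table."""

    def __init__(self):
        # architecture -> operating system -> list of (rank, machine_id)
        self.levels = defaultdict(lambda: defaultdict(list))

    def insert(self, machine_id, arch, opsys, rank):
        """Add an idle machine to its table and keep the table sorted by rank."""
        table = self.levels[arch][opsys]
        table.append((rank, machine_id))
        table.sort(key=lambda entry: entry[0], reverse=True)  # highest rank first

    def table_for(self, arch, opsys):
        """Return the single table a job with these requirements must search."""
        return self.levels.get(arch, {}).get(opsys, [])


index = IdleIndex()
index.insert("m1", "Intel", "Linux", rank=5)
index.insert("m2", "Intel", "Linux", rank=9)
index.insert("m3", "Intel", "Solaris", rank=7)

linux_table = index.table_for("Intel", "Linux")
missing = index.table_for("IBM", "Linux")
```

A job that asks for Intel and Linux only ever inspects `linux_table`; machines of other architectures or operating systems are never visited.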

4.1 Analysis of the matchmaking algorithm with tree searching

For simplicity we have assumed that all the machines have two common properties. Thus in Figure 3 there are two levels in the right subtree. In a real case, for a large and heterogeneous grid environment properties obviously vary. Thus the number of levels (as well as number of nodes) will increase in the right subtree accordingly.

To analyze the performance of the system let us consider a simple example, which will give a rough estimate. Let us think of a system consisting of 1000 resources. Among the resources let 550 machines be of Intel architecture, among which 200 machines run Windows, 200 machines run Linux and 150 machines run Solaris. The other 450 machines are of IBM architecture, among which 150 machines run Windows, 150 machines run Linux and 150 machines run Solaris. According to Figure 3, the corresponding table for Intel machines running Solaris will contain 150 entries at most, if all the machines are idle.

Now a job classad arrives at the matchmaker, requiring an Intel machine running Solaris. Reaching the required table takes three searches. In the best case, the required machine has the highest rank and only 3 (path length) + 1 (table entry) = 4 searches are required. In the worst case, the required machine has the lowest rank and 3 (path length) + 150 (table entries) = 153 searches are required.
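The same arithmetic can be applied to every table of the 1000-resource example. The sketch below is illustrative only; the constant and function names are hypothetical, and all machines are assumed to be idle, as in the text.

```python
PATH_LENGTH = 3  # idle/busy, architecture and operating-system levels

# Table sizes of the 1000-resource example when all machines are idle.
TABLE_SIZES = {
    ("Intel", "Windows"): 200, ("Intel", "Linux"): 200, ("Intel", "Solaris"): 150,
    ("IBM", "Windows"): 150, ("IBM", "Linux"): 150, ("IBM", "Solaris"): 150,
}


def search_bounds(arch, opsys):
    """Return (best, worst) search counts for a job needing this machine type."""
    entries = TABLE_SIZES[(arch, opsys)]
    best = PATH_LENGTH + 1         # target is the first (highest-rank) entry
    worst = PATH_LENGTH + entries  # target is the last (lowest-rank) entry
    return best, worst


intel_solaris = search_bounds("Intel", "Solaris")
intel_windows = search_bounds("Intel", "Windows")
```

Even the largest table in this example bounds the search at 203 steps, compared with a scan over all 1000 resources.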

4.2 Validity and complexity of the proposed algorithm

Here the searching procedure is based on the branch-and-bound strategy. A pictorial view of the search tree at an instant is depicted in Figure 4. We use a FIFO list (queue) to hold the live nodes. The live nodes become the E-node one by one as they are taken from the queue: at each cycle the node at the head of the queue is removed and expanded, and its children are placed at the end of the queue [3].

Initially the queue is empty. Hence there will be only one live node that is the root of the tree. Now the E-node is node 1, i.e., the root. Exploring the root we get two child nodes (2,3), which are inserted into the queue. Current live nodes are 2 and 3. The next E-node is node 2. Node 2 is immediately killed and we backtrack to the parent of node 2, i.e., the root. The next node from the queue becomes the E-node. The searching algorithm is depicted in Algorithm 1.

Figure 3. A schematic representation of the tree



Algorithm 1: Searching algorithm for matchmaking using a tree.
Input: Job classad specifying the attributes required by the job.
Output: Pointer to the table containing the machine.

This algorithm uses a queue, Q, to keep the nodes of the search tree. R is the root of the tree. The front element of the queue is assigned to a variable x. The condition associated with the node held by x is then matched against the condition to be checked at that level. The function MATCH(x, Con_i) returns true if the condition associated with the node held by x matches the condition Con_i (the condition associated with level i).

1. INSERT(R) into Q.
2. x ← FRONT(Q).
3. If x is a decision node, then:
4.     Return pointer to the destined table.
5. [End of If structure.]
6. [Match the current node with the conditions at each level.]
7. If MATCH(x, Con_i) = false, then:
8.     DELETE(Q).
9. Else:
10.     Repeat for i = 1 to n // n is the number of children of the node held by x
11.         INSERT(c_i) into Q. // c_i ∈ the children of the node held by x
12.     [End of for loop.]
13. [End of If structure.]
14. If EMPTY(Q), then:
15.     Return “not found”.
16.     Algorithm terminates.
17. [End of If structure.]
18. Go to step 2.
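A runnable sketch of Algorithm 1 follows. It makes several assumptions that are not stated in the pseudocode: decision nodes are modelled as separate leaves that hold the table, the dummy root carries no condition and always matches, the job dictionary includes the `Activity` value to search for, and the front node is dequeued in both branches so that the loop always advances. The names `Node`, `match` and `find_table` are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, condition=None, children=None, table=None):
        self.condition = condition     # (attribute, value); None for the dummy root
        self.children = children or []
        self.table = table             # set only on decision nodes

    @property
    def is_decision(self):
        return self.table is not None


def match(node, job):
    """MATCH(x, Con_i): does the node's condition agree with the job classad?"""
    if node.condition is None:
        return True
    attribute, value = node.condition
    return job.get(attribute) == value


def find_table(root, job):
    queue = deque([root])              # step 1
    while queue:                       # step 14: stop when the queue is empty
        x = queue.popleft()            # step 2 (dequeued in both branches)
        if x.is_decision:              # step 3
            return x.table             # step 4
        if match(x, job):              # step 7
            queue.extend(x.children)   # steps 10-11
    return None                        # step 15: "not found"


intel = Node(("Arch", "Intel"), [
    Node(("OpSys", "Linux"), [Node(table=["m1", "m2"])]),
    Node(("OpSys", "Solaris"), [Node(table=["m3"])]),
])
ibm = Node(("Arch", "IBM"), [Node(("OpSys", "Linux"), [Node(table=["m4"])])])
root = Node(children=[Node(("Activity", "Busy")),
                      Node(("Activity", "Idle"), [intel, ibm])])

found = find_table(root, {"Activity": "Idle", "Arch": "Intel", "OpSys": "Solaris"})
absent = find_table(root, {"Activity": "Idle", "Arch": "Motorola", "OpSys": "Linux"})
```

Because only the children of matching nodes are enqueued, at most one branch per level survives, which is the pruning behaviour the analysis relies on.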

Figure 4. The search tree at an instant

The algorithm accepts a request in the form of a job classad containing the characteristics of the resources that are required by the job. The tree is searched based on the conditions specified in the job classad. If the condition of the job classad does not match the node of the tree (line 7), the element is deleted from the queue (line 8). On the other hand, if the condition matches then we insert the children of the current node into the queue, which is indicated in line 10. The matching procedure is performed by the segment of lines 7–13. If the queue becomes empty this indicates that the required resource is not available in the current resource pool; then the algorithm terminates, returning a ‘not found’ message by the condition of line 14. Each time, after matching the conditions, the algorithm returns to line 2. In line 2, the front element of the queue is taken for consideration. In line 3, the node under consideration is checked to see whether it is a decision node or not. If the current node is a decision node then the algorithm terminates, returning a pointer to the destined table; otherwise the rest of the procedure is executed as above. Hence the proposed algorithm has two termination points: line 3 and line 14. If the resource is found then the algorithm terminates, indicating the destined table of the resource by line 3. Otherwise the resource is not available and the algorithm terminates by line 14.

Here, according to our proposed method time complexity depends on three factors:

(a) the time required to construct the tree;
(b) the time required to reach the desired table;
(c) the time required to search the table.

We assume each node can be generated in constant time. We also note that for the branching condition only one node will be explored at each level. To be more specific, the node that matches the required condition will be explored further.

Let us consider T to be the state space tree with n nodes and d levels. Then the time required to reach the desired table is given by $O(n + d)$. If the corresponding table contains r entries then at most r searches are required to find the target machine. Thus we may conclude that the total complexity is $O(n + d + r)$.

5. EXPERIMENTAL RESULTS AND DISCUSSION

To evaluate the performance of the various matchmaking algorithms we devised the following simulation study. The experimental results of the gangmatching algorithm proposed by Raman et al. [5] and of our proposed algorithm are shown for various numbers of machines and jobs. Figure 5 illustrates the elapsed time of the naïve fixed-order and indexed algorithms.

To simulate the proposed algorithm in a heterogeneous environment the following considerations are made: there are always 40% of the machines busy and 60% idle. Among the 60% of the idle machines 30% are Intel, 15% IBM and 15% Motorola. Each of the architectures combines 15% Windows operating system, 8% Linux operating system and 7% Solaris operating system.

Figure 5. Comparison between naïve and indexed algorithm

At level three each node leads to a separate table, which contains the attributes of the respective machines. The best case for the proposed algorithm occurs when the machine is found as the first entry in the table (which always takes 4 searches) and the worst case occurs when the required machine is found in the last entry of the table. The time required for constructing and searching the tree is shown in Table 1.
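The search-time columns of Table 1 can be reproduced with a short calculation. The sketch below rests on two inferences from the reported figures rather than on statements in the text: the worst case is taken over the largest table, assumed to hold 15% of all machines, and the average case is taken as the mean of the best and worst cases. The constant and function names are hypothetical.

```python
PATH_LENGTH = 3                  # levels traversed before reaching a table
LARGEST_TABLE_PERCENT = 15       # assumed share of machines in the largest table


def search_times(total_machines):
    """Return (best, worst, average) search counts for a pool of this size."""
    largest_table = LARGEST_TABLE_PERCENT * total_machines // 100
    best = PATH_LENGTH + 1
    worst = PATH_LENGTH + largest_table
    average = (best + worst) / 2
    return best, worst, average


rows = {n: search_times(n) for n in range(500, 4001, 500)}
```

For 500 machines this gives a largest table of 75 entries, hence 4 searches in the best case, 78 in the worst case and 41 on average.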

From the simulated results for different numbers of machines, a comparison between the response times (only the average case for each method is considered) of the naïve, indexed and proposed algorithms is shown in Table 2. The performance of the three approaches is shown in Figure 6.

To compare the performance of the three algorithms the parameters are equalized. These comparisons illustrate that initially the elapsed time of the proposed algorithm is higher than that of the naïve and indexed algorithms, but as the number of machines increases the proposed algorithm scales better than the existing algorithms.

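| No. of machines | Idle (60%) | Busy (40%) | Construction time | Search time: best case | Search time: worst case | Search time: average case | Total time |
|---|---|---|---|---|---|---|---|
| 500 | 300 | 200 | 60 | 4 | 78 | 41 | 101 |
| 1000 | 600 | 400 | 120 | 4 | 153 | 78.5 | 198.5 |
| 1500 | 900 | 600 | 180 | 4 | 228 | 116 | 296 |
| 2000 | 1200 | 800 | 240 | 4 | 303 | 153.5 | 393.5 |
| 2500 | 1500 | 1000 | 300 | 4 | 378 | 191 | 491 |
| 3000 | 1800 | 1200 | 360 | 4 | 453 | 228.5 | 588.5 |
| 3500 | 2100 | 1400 | 420 | 4 | 528 | 266 | 646 |
| 4000 | 2400 | 1600 | 480 | 4 | 603 | 303.5 | 783.5 |

Table 1. Time calculation for constructing and searching the tree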

| No. of machines | Naïve algorithm | Indexed algorithm | Proposed algorithm |
|---|---|---|---|
| 500 | 10.5 | 10.5 | 101 |
| 1000 | 20.5 | 30.5 | 198.5 |
| 1500 | 35.5 | 80.5 | 296 |
| 2000 | 130.5 | 130.5 | 393.5 |
| 2500 | 210.5 | 210.5 | 491 |
| 3000 | 280.5 | 315.5 | 588.5 |
| 3500 | 500.5 | 410.5 | 646 |
| 4000 | 780.5 | 580.5 | 783.5 |

Table 2. Time comparison between naïve, indexed and proposed algorithm

Figure 6. Comparison between naïve, indexed and proposed algorithm



6. CONCLUSION

This paper focuses on resource discovery in distributed environments. We keep the overall structure of the resource management system of the grid environment unchanged. The main focus of this work is to reduce the searching complexity of resource discovery, and the whole system is formulated in a systematic manner. We introduced a new searching mechanism based on a dynamically allocated tree. The left subtree is a complete binary tree that consists of busy machines, but the right subtree is not a complete binary tree. Hence, the searching complexity increases slightly, but the mechanism is still promising and efficient enough to deliver better performance. Our main concern is to reduce the time complexity and to organize the resources so as to adapt to the highly dynamic nature of the distributed environment. The memory requirement of the algorithm is not considered. A shortcoming of our current formulation is that the number of nodes may become very large when the types of machine attributes vary widely.

REFERENCES

1. Chao K-M, Younas M, Lo C-C, Tan T-H. Fuzzy matchmaking for web services. In Proceedings of the 19th IEEE International Conference on Advanced Information Networking and Applications (AINA’05), 2005.

2. Abbas A. Grid Computing: A Practical Guide to Technology and Applications (1st edn). Charles River Media: Boston, MA, 2004.

3. Islam MR, Islam MZ, Leyla N. A matchmaking algorithm for resource discovery on grid. In Proceedings of the International Conference on Information and Communication Technology (ICICT), Dhaka, Bangladesh, March 2007.

4. Constantinescu I, Faltings B. Efficient matchmaking and directory services. In Proceedings of the IEEE/WIC International Conference on Web Intelligence (WI’03), 2003.

5. Raman R, Livny M, Solomon M. Policy driven heterogeneous resource co-allocation with gangmatching. In Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC’03), 2003.

6. Raman R. Matchmaking frameworks for distributed resource management. PhD thesis, University of Wisconsin–Madison, 2001.

7. Raman R, Livny M, Solomon M. Matchmaking: distributed resource management for high-throughput computing. In Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing (HPDC7), July 1998.

8. Huang C-L, Chao K-M, Lo C-C. A moderated fuzzy matchmaking for web services. In Proceedings of the 19th IEEE International Conference on Advanced Information Networking and Applications (AINA’05), 2005.

9. Constantinescu I, Binder W, Faltings B. Flexible and efficient matchmaking and ranking in service directories. In Proceedings of IEEE International Conference on Web Services (ICWS’05), 2005.

AUTHORS’ BIOGRAPHIES

Md. Rafiqul Islam received an M.Sc in Engineering (Computers) from Azerbaijan Polytechnic Institute in 1987 and a PhD in Computer Science from Universiti Teknologi Malaysia (UTM) in 1999. Currently he is a Professor and head of the Computer Science and Engineering Discipline and Dean of the Science, Engineering and Technology School of Khulna University, Bangladesh. He has published several papers in journals and conference proceedings. His areas of interest include design and analysis of algorithms in the areas of external sorting, data compression, bio-informatics, grid computing etc.

Md. Zahidul Islam received a B.Sc in Computer Science and Engineering (CSE) from Khulna University, Bangladesh in 2006. Currently he is a part-time lecturer in the Computer Science and Engineering Discipline of Khulna University, Bangladesh. His areas of interest include networking, database systems, distributed systems, computer security and genetic algorithms.

Nazia Leyla received a B.Sc in Computer Science and Engineering (CSE) from Khulna University, Bangladesh in 2006. Currently she is a lecturer in the Department of Computer Science and Engineering, Darul Ihsan University, Bangladesh. Her areas of interest include distributed systems, computer architecture, digital image processing and operating systems.
