
Future Generation Computer Systems 18 (2001) 79–87

Distributed particle simulation method on adaptive collaborative system

Yudong Sun∗, Zhengyu Liang, Cho-Li Wang

Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong, China

Abstract

This paper presents a distributed N-body method based on an adaptive collaborative system model. The collaborative system is formed by distributed objects on a distributed system. The system can be reconfigured during the computation to fully utilize the computing power of the networked hosts. The method is implemented in Java and RMI to support distributed computing in heterogeneous environments. A distributed tree structure is designed for communication-efficient computation of the N-body method. Performance tests show satisfactory speedup and portability of the method on both homogeneous and heterogeneous clusters. The collaborative system model can be used in various applications and is expandable to wide-area environments. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: N-body; Distributed object; Collaborative system; Java

1. Introduction

N-body problems study the evolution of a physical system of numerous bodies (particles) under the cumulative force exerted on every body by all other bodies. The force causes continuous body movement. Many systems in astrophysics, plasma physics, molecular dynamics, fluid dynamics, radiosity calculations in computer graphics, etc. exhibit this behavior [11]. The common feature of these systems is the wide range of precision in the information requirements of the bodies in a physical domain. A body requires progressively coarser information, less frequently, from parts of the domain that are farther away. The body distribution in the physical domain, and therefore the domain parts and the influences from those parts, change continuously during the evolution of the system. The force on each body must therefore be recomputed at every iteration. Thus N-body problems are computation-intensive.

This research was supported by Hong Kong Research Grants Council (RGC) grant 10201696 and The University of Hong Kong CRCG grant 10200544.

∗ Corresponding author.

0167-739X/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0167-739X(00)00077-7

The methods for solving N-body problems [1,5,6] are usually hierarchical methods in which a tree structure represents the body distribution in a physical domain. The tree is constructed by domain decomposition, and the forces on the bodies are computed by traversing the tree. The Barnes–Hut method [1,11] is a typical hierarchical N-body algorithm. In this method a physical domain is recursively divided into sub-spaces until each sub-space contains at most one body. The domain decomposition thus follows the body distribution in the space. Fig. 1a gives an example of domain decomposition in 2D space. A quadtree for the 2D space is then constructed from the domain decomposition, as Fig. 1b shows. For 3D space, an octree is built. After the force on each body from all other bodies has been computed by traversing the quadtree, the bodies move to their new positions under that force. This is one simulation step. The tree is reconstructed at the beginning of each simulation step to reflect the updated body distribution. The particle simulation proceeds by repeating the tree construction and force calculation.

Fig. 1. Barnes–Hut tree for 2D space: (a) Body distribution and domain decomposition; (b) Quad Barnes–Hut tree.
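The recursive decomposition described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Java used by the authors, and every name in it (`Body`, `Cell`, `build_tree`, `count_bodies`) is hypothetical. It assumes bodies lie in the unit square, have positive mass, and occupy distinct positions; coincident bodies would make the subdivision recurse without end.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Body:
    x: float
    y: float
    mass: float = 1.0


@dataclass
class Cell:
    cx: float                       # centre of the square cell
    cy: float
    half: float                     # half of the cell's side length
    body: Optional[Body] = None     # set only on a leaf holding one body
    children: List["Cell"] = field(default_factory=list)
    mass: float = 0.0               # total mass below this cell
    com: Tuple[float, float] = (0.0, 0.0)   # centre of mass below this cell

    def insert(self, b: Body) -> None:
        if not self.children and self.body is None:
            self.body = b           # empty leaf: keep the body here
            return
        if not self.children:       # occupied leaf: split, push old body down
            self._subdivide()
            old, self.body = self.body, None
            self._child_for(old).insert(old)
        self._child_for(b).insert(b)

    def _subdivide(self) -> None:
        h = self.half / 2
        self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]

    def _child_for(self, b: Body) -> "Cell":
        # Index 0..3 matches the (dy, dx) order used in _subdivide.
        return self.children[(b.x >= self.cx) + 2 * (b.y >= self.cy)]

    def summarize(self) -> None:
        """Fill in mass and centre of mass bottom-up."""
        if not self.children:
            if self.body is not None:
                self.mass = self.body.mass
                self.com = (self.body.x, self.body.y)
            return
        m = mx = my = 0.0
        for c in self.children:
            c.summarize()
            m += c.mass
            mx += c.mass * c.com[0]
            my += c.mass * c.com[1]
        self.mass, self.com = m, (mx / m, my / m)


def build_tree(bodies, cx=0.5, cy=0.5, half=0.5) -> Cell:
    root = Cell(cx, cy, half)
    for b in bodies:
        root.insert(b)
    root.summarize()
    return root


def count_bodies(cell: Cell) -> int:
    if not cell.children:
        return 0 if cell.body is None else 1
    return sum(count_bodies(c) for c in cell.children)
```

Rebuilding the tree at the start of every simulation step then amounts to calling `build_tree` again on the updated body list.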

Parallel N-body methods [4,7,10,11] can be derived from the sequential Barnes–Hut method. For example, Singh presented a parallel N-body method in the shared-address-space model [11,12]. Multiple processes cooperatively build a global Barnes–Hut tree in a shared memory segment and carry out the force computation by concurrently traversing the global tree. The shared-memory model is only applicable to shared-memory systems such as SMP machines, in which data sharing can be realized by shared memory access. In distributed-memory systems, however, message passing is the general communication approach [4]. If the global tree approach is used to solve N-body problems on a distributed system, each process has to duplicate the global tree for local access, and propagating the global tree inevitably involves heavy communication. One way to reduce the communication overhead is to decompose the global tree structure into subtrees [9], with each processor building one of the subtrees. In this case there is no need to build the global tree. With proper sub-domain decomposition and data transfer schemes, the subtree scheme can cut down inter-process communication.

To solve the N-body problem on a distributed system, we have designed a distributed object-oriented (DOO) N-body method based on a collaborative system model. A collaborative system is formed by a group of distributed objects. One of the objects works as the compute coordinator. It initiates the computing procedure by invoking the objects on remote hosts and dispatching computing tasks to them, and afterwards it collects the results of the computation. The other objects are called compute engines; they accept and process the computing tasks. The N-body method in this paper improves on the static method discussed in [13]. The method here adapts to the states of the underlying hosts: it allows the collaborative system to be reconfigured, so that computing tasks can be migrated from an overloaded host to a lightly loaded one to enhance computing efficiency. A distributed tree structure is constructed to support communication-efficient computation of the N-body method on a distributed system.

To run on heterogeneous platforms, the N-body method is implemented in Java and RMI (remote method invocation) [3]. Java is a platform-independent language that enables an application to run directly on various platforms. The object-oriented features of Java provide the foundation for implementing the DOO method. RMI is a component of the Java API that supports distributed object-oriented computing. It supplies inter-object communication facilities that allow an object on one host to call methods of objects on other hosts. RMI provides a registry for remote object references. Distributed objects can register themselves, locate each other through the registry, and invoke methods on remote objects [3]. These mechanisms make the collaborative system reconfigurable. When system resources change (e.g., the available hosts or their workloads are altered), the compute coordinator can select new hosts to join the computation and discard the stale ones. The computing tasks on stale hosts are migrated to the newly joined hosts. The distributed object-oriented method in Java and RMI is highly portable across heterogeneous platforms and flexible in a dynamically reconfigured environment.

In the following text, Section 2 illustrates the collaborative system model and Section 3 describes the distributed object-oriented N-body method based on the collaborative system model. Section 4 reports the performance of the N-body method on both homogeneous and heterogeneous clusters. Related work is covered in Section 5. The conclusions are summarized in Section 6.

2. Collaborative system

The N-body method is executed by a group of distributed objects on networked hosts. A collaborative system is formed by the distributed objects at the beginning of the computation. Such a system can be reconfigured during the computation in response to changes in the states of the hosts.

Fig. 2. The collaborative system built on P hosts.

2.1. System establishment

Fig. 2 shows the structure of a collaborative system. The compute coordinator is the first object created on one of the hosts to activate the computation. It finds the available hosts in the cluster using ClusterProbe, a Java-based tool for monitoring a large cluster developed by the HKU System Research Group [8]. ClusterProbe provides services for monitoring and managing all hosts in the cluster; one of these services reports the available hosts and their states, e.g., the workload. Based on this information, the compute coordinator selects a number of hosts to take part in the execution of an application. Hosts with low workload have high priority to be chosen. The compute coordinator starts the computing objects on these hosts, and the computing objects register themselves with the RMI registry. After the registration, an object can obtain references to all remote objects by looking them up in the registry. At this point the collaborative system has been set up and is ready to execute an application.
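The host selection rule ("hosts with low workload have high priority") can be sketched as follows. The paper does not describe ClusterProbe's programming interface, so the sketch assumes the monitoring data has already been obtained as a plain mapping from host name to a workload figure; the function name `select_hosts` and the host names are hypothetical, and the sketch is in Python rather than the authors' Java.

```python
from typing import Dict, List


def select_hosts(workloads: Dict[str, float], count: int) -> List[str]:
    """Return up to `count` host names, lightest workload first.

    `workloads` stands in for whatever the cluster monitor reports
    (an assumed format). Ties are broken by host name so the result
    is deterministic.
    """
    ranked = sorted(workloads, key=lambda host: (workloads[host], host))
    return ranked[:count]


# Example with made-up workload figures.
reported = {"node1": 0.90, "node2": 0.10, "node3": 0.50, "node4": 0.05}
chosen = select_hosts(reported, 3)
```

In this example `chosen` holds the three most lightly loaded hosts in ascending order of workload; the coordinator would then start one computing object on each of them.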

After the collaborative system has been established, the compute coordinator divides the application into computing tasks and allocates the tasks to remote objects (compute engines), one task per compute engine, by remote method invocation. The compute coordinator is also responsible for managing the collaborative system. It coordinates the computing procedures, which are application-dependent, on all compute engines. The compute engines communicate with one another through the method invocation interface of RMI. In the N-body method described in Section 3, the compute coordinator decomposes the physical domain, partitions the N bodies into subsets, and assigns the body subsets to remote compute engines. After initiating the computation, the compute coordinator also works as a compute engine to process one of the computing tasks.

2.2. System reconfiguration

The configuration of a distributed system can change dynamically. Some hosts may be removed from the system and new hosts may be added. As the hosts in a distributed system are shared by multiple users or multiple jobs, the states of the hosts, such as the workload, may vary from time to time. Only idle or lightly loaded hosts are proper candidates to join the collaborative system. ClusterProbe constantly monitors the states of the hosts, so the compute coordinator can reorganize the collaborative system according to the current status of the underlying distributed system. It retrieves the latest status information from ClusterProbe and selects lightly loaded hosts to replace the overloaded ones in the collaborative system. Computing objects are created on the newly joined hosts and become compute engines by registering themselves with the RMI registry. The computing tasks on the replaced hosts are sent directly to the corresponding new compute engines on the substitute hosts. The stale compute engines then stop working and the new compute engines take their place. The collaborative system has thus been reconfigured and the computation continues on it. System reconfiguration can improve the overall performance of the collaborative system.

In the N-body method, the particle simulation proceeds by iterating the simulation step. The end of each simulation step is a suitable point to reconfigure the collaborative system. At that moment, the compute coordinator checks the workload of the underlying hosts. If any host is overloaded, the compute coordinator tries to find a lightly loaded host in the cluster to replace it. If a substitute host is available, the compute coordinator creates a compute engine on it. The new compute engine gets the body set from the replaced host and so takes over the computing task of the overloaded host. Several overloaded hosts may be replaced in this way at the same time.
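The replacement decision made at the end of a simulation step can be sketched as below. This is an illustration only: the function name `reconfigure`, the data layout (a mapping from host to its assigned task), and the workload figures are assumptions, the sketch is in Python rather than the authors' Java, and it assumes every participating host appears in the reported workloads. The actual migration of body sets over RMI is not modelled.

```python
from typing import Dict, List, Tuple


def reconfigure(engines: Dict[str, str],
                workloads: Dict[str, float],
                threshold: float) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Replace overloaded hosts with lightly loaded hosts not yet in use.

    engines   : host name -> identifier of the task (body subset) it holds
    workloads : host name -> current workload, for all hosts in the cluster
    Returns the new host -> task mapping and a list of (old, new) migrations.
    """
    # Candidate substitutes: below the threshold and not already participating.
    spares = sorted((h for h, load in workloads.items()
                     if h not in engines and load < threshold),
                    key=lambda h: (workloads[h], h))
    new_engines: Dict[str, str] = {}
    migrations: List[Tuple[str, str]] = []
    for host, task in engines.items():
        if workloads[host] > threshold and spares:
            substitute = spares.pop(0)       # lightest spare host first
            new_engines[substitute] = task   # task moves to the substitute
            migrations.append((host, substitute))
        else:
            new_engines[host] = task         # host keeps its task
    return new_engines, migrations
```

An overloaded host stays in the system when no substitute is available, which matches the "if a substitute host is available" condition in the text.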

3. DOO-based N-body method

The distributed object-oriented N-body method originated from the Barnes–Hut application in the SPLASH-2 suite [14]. In our DOO method, a distributed tree structure is designed to reduce the communication overhead caused by global data sharing among the compute engines.

3.1. Distributed tree structure

The distributed tree structure is related to the domain partitioning. If N bodies and P hosts are involved in the computation, the physical space is partitioned into P sub-domains and thus the N bodies into P subsets. The distributed tree structure therefore contains P subtrees, one subtree per sub-domain. The domain partitioning is accomplished by decomposing the global Barnes–Hut tree, which includes all bodies in the physical space. The partitioning should preserve the spatial locality of the bodies within each sub-domain, and the number of bodies in each sub-domain should be approximately balanced. If the method is executed on four hosts, the physical space in Fig. 1a is partitioned into four sub-domains as Fig. 3a shows. Each compute engine gets a subset of the bodies and builds a local subtree for its sub-domain. The distributed tree structure in Fig. 3b is thus composed of four subtrees.
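One way to obtain P balanced, spatially local subsets is sketched below. It is not necessarily the authors' procedure: instead of cutting an explicit global tree, it sorts bodies by Morton (Z-order) key, which visits positions in the same order as a depth-first walk over quadtree cells, and then cuts the ordered list into P contiguous chunks. The function names, the 16-bit quantisation, and the assumption that coordinates lie in [0, 1] are all illustrative; the sketch is in Python rather than Java.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def morton_key(x: float, y: float, bits: int = 16) -> int:
    """Interleave the bits of the quantised x and y coordinates."""
    top = (1 << bits) - 1
    xi = min(int(x * (1 << bits)), top)   # clamp so that x == 1.0 is valid
    yi = min(int(y * (1 << bits)), top)
    key = 0
    for i in range(bits):
        key |= ((xi >> i) & 1) << (2 * i)
        key |= ((yi >> i) & 1) << (2 * i + 1)
    return key


def partition(bodies: Sequence[Point], p: int) -> List[List[Point]]:
    """Split bodies into p contiguous Z-order chunks of nearly equal size."""
    ordered = sorted(bodies, key=lambda b: morton_key(b[0], b[1]))
    base, extra = divmod(len(ordered), p)
    subsets, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)   # first `extra` chunks get one more
        subsets.append(ordered[start:start + size])
        start += size
    return subsets
```

The chunk sizes differ by at most one body, and bodies in the same chunk are close in Z-order, which keeps most of them spatially close as well. Balancing by body count is a simplification; a production scheme might weight bodies by their expected computation cost.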

The force computation for each body, however, requires data from all other sub-domains. A compute engine needs to access not only its local subtree but also remote subtrees. There are two extreme approaches to sharing the data of other sub-domains. The first is the complete tree approach: it propagates all N bodies to every compute engine, and each compute engine builds a complete Barnes–Hut tree, so that the computation on each engine can be carried out locally. The second approach sends the bodies to remote compute engines to access the subtrees there. The force contributions from those subtrees are computed on the remote compute engines and the results are sent back to the home compute engine of the bodies. Both approaches incur prohibitive communication overhead.

Fig. 3. A distributed Barnes–Hut tree for 2D space on four compute engines: (a) sub-domains after partitioning; (b) distributed Barnes–Hut tree and partial subtrees.

Our N-body method makes a compromise by sharing the subtrees partially, i.e., broadcasting parts of the subtrees to other compute engines. After its subtree has been constructed, every compute engine builds a partial copy of the subtree, called the partial subtree, as shown in Fig. 3b. The partial subtree contains the top levels of the local subtree and is broadcast to all other compute engines for their local use. Owing to this partial duplication of the subtrees, most of the force computation can be completed on the local compute engine. Only when a body requires data in the lower levels of a remote subtree is it sent to the remote compute engine. In this case, the force contribution from that sub-domain is calculated on the remote compute engine by accessing the full subtree there, and the result is sent back to the body's home compute engine. The partial subtree scheme can effectively reduce the communication overhead of solving N-body problems on a distributed system. The benefit is verified by the performance tests in Section 4.
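A partial subtree can be pictured as a copy of the local subtree cut off below a given depth. The sketch below is an illustration under assumptions: the paper does not say how many levels are copied, so `levels` is a free parameter, and the `Node` class, the `truncated` flag, and the helper names are hypothetical. It is written in Python rather than the authors' Java.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    mass: float
    com: Tuple[float, float]          # centre of mass of the bodies below
    children: List["Node"] = field(default_factory=list)
    truncated: bool = False           # True: deeper levels exist only remotely


def partial_copy(node: Node, levels: int) -> Node:
    """Copy the top `levels` levels of a subtree (the root is level 1)."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    if levels == 1 or not node.children:
        # Keep the summary data; mark the node if detail was cut away.
        return Node(node.mass, node.com, [], truncated=bool(node.children))
    return Node(node.mass, node.com,
                [partial_copy(c, levels - 1) for c in node.children])


def depth(node: Node) -> int:
    return 1 + max((depth(c) for c in node.children), default=0)
```

A traversal on another compute engine can use the mass and centre of mass stored in a truncated node as an approximation. If it needs finer detail than that node offers, the body has to be sent to the subtree's home engine, which is the remote case described in the text.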

3.2. Computing round

In the N-body method, the particle simulation procedure begins after the subsets of the bodies have been assigned to the remote compute engines. The simulation advances by iterating the following computing round on all compute engines. A computing round performs one simulation step. The compute coordinator synchronizes the computing round across all compute engines. A computing round consists of four sub-steps:

1. Subtree construction and propagation. Each compute engine builds a subtree and a partial subtree for the sub-domain assigned to it. The partial subtree is broadcast to the other compute engines, so that each compute engine holds all partial subtrees.

2. Force calculation. Each compute engine calculates the force exerted on every body in its sub-domain by traversing the local subtree and all partial subtrees. Bodies that need to access remote subtrees are sent to the remote compute engines. In this case, the force contributions from the remote sub-domains are computed on the remote compute engines by traversing the full subtrees there, and the results are sent back to the bodies' home compute engines. The force contributions from all sub-domains are then summed to get the total force on a body. Finally the state of each body, e.g., its velocity and position, is updated according to that force.

3. Body redistribution. At the beginning of the particle simulation, each compute engine is assigned a sub-domain, so the boundary of each sub-domain is known. At the end of sub-step 2, each body moves to its new position. Some bodies may leave the local sub-domain and enter adjacent sub-domains. The new position of every body is therefore checked to determine which sub-domain it lies in. If it is outside the local sub-domain, the body is transmitted to the compute engine whose sub-domain contains the new position, so that body locality is maintained in the sub-domain on each compute engine. As the bodies move in small steps, usually only a few bodies are transmitted to other compute engines in each computing round.

4. Load inspection and system reconfiguration (if necessary). At the end of each computing round, the compute coordinator checks the hosts' workload by referring to the state information supplied by ClusterProbe. If the workload on a host exceeds the pre-defined threshold, the host is considered overloaded. The coordinator tries to find another host in the cluster to replace the overloaded one. The substitute host must have a workload below the threshold and must not have joined the collaborative system yet. If a substitute is available, a compute engine is created on it and the new compute engine registers itself with the collaborative system. The computing task on the overloaded host is then migrated to the new host. In the N-body method, the computing task is represented by the bodies in a sub-domain, so the body set on the overloaded host is sent to the new host. Then the compute engine on the old host is terminated. Several overloaded hosts may be replaced in this way. The collaborative system is thus reconfigured with the newly joined compute engines and all remaining compute engines. After that, the simulation procedure starts the next computing round.

Fig. 4. Speedup of two N-body methods (distributed tree scheme and complete tree approach) on the homogeneous cluster of PCs, under different problem sizes N.
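The body redistribution of sub-step 3 can be sketched as a routing pass over the updated positions. The sketch makes simplifying assumptions that the paper does not state: each sub-domain is a single axis-aligned rectangle in 2D, engines are identified by integers, and a body that falls outside every rectangle stays where it is. The names `owner_of` and `redistribute` are hypothetical, and the sketch is in Python rather than the authors' Java.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]     # (xmin, ymin, xmax, ymax)


def owner_of(pos: Point, domains: Dict[int, Box]) -> Optional[int]:
    """Return the engine whose sub-domain contains pos, or None."""
    x, y = pos
    for engine, (xmin, ymin, xmax, ymax) in domains.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return engine
    return None


def redistribute(local: Dict[int, List[Point]],
                 domains: Dict[int, Box]) -> Tuple[Dict[int, List[Point]], int]:
    """Route every body to the engine owning its new position.

    Returns the new engine -> bodies mapping and the number of bodies
    that changed engine (the ones that would be transmitted).
    """
    result: Dict[int, List[Point]] = {engine: [] for engine in domains}
    moved = 0
    for engine, bodies in local.items():
        for pos in bodies:
            dest = owner_of(pos, domains)
            if dest is None:
                dest = engine               # assumption: keep strays locally
            if dest != engine:
                moved += 1
            result[dest].append(pos)
    return result, moved
```

Because bodies move only a short distance per step, `moved` is expected to be small relative to the number of bodies, which is why this sub-step adds little communication.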

4. Performance evaluation

The N-body method is implemented in Java and RMI. It has been tested on two clusters, one homogeneous and the other heterogeneous. The N-body application in the test simulates the three-dimensional motion of particles in the Plummer model. Speedup is achieved on both clusters. The platform independence of Java allows the N-body application to run on the heterogeneous cluster.

4.1. Tests on homogeneous cluster

The homogeneous cluster consists of 10 Pentium II 450 MHz PCs running Linux 2.0.36, connected by a 100 Mbps Fast Ethernet switch.

To demonstrate the lower communication overhead of the distributed tree structure, we implemented a straightforward alternative for comparison. It is the complete tree approach: the compute coordinator starts a computing round by broadcasting all of the bodies to every compute engine, where a complete tree of N bodies is built. This is the Barnes–Hut tree shown in Fig. 1b. In the complete tree approach, each compute engine still computes the forces on one subset of bodies, but the force calculation can be completed entirely within the local compute engine; no remote tree access is required. The new states of all bodies must then be broadcast to all compute engines again for the next computing round. The data communication overhead is much higher in the complete tree approach than in the distributed tree scheme.

Fig. 5. Execution time breakdowns of two N-body methods.

Fig. 4 displays the speedup of the DOO N-body method with the distributed tree structure and of the complete tree approach. For the distributed tree method, speedup is obtained in all test cases, and larger problem sizes achieve larger speedup. On the other hand, broadcasting N bodies inhibits the speedup of the complete tree approach. Speedup occurs only below three hosts; the execution slows down when more processors are used.
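For reference, speedup is conventionally defined as the ratio of sequential to parallel execution time. The paper does not spell out its formula, so the expression below is the standard definition, assumed to be the one used; the symbols $T_1$ (sequential time on one host) and $T_P$ (time on $P$ hosts) are notation introduced here.

```latex
S(P) = \frac{T_1}{T_P}, \qquad E(P) = \frac{S(P)}{P}
```

Under this definition, $S(P) > 1$ means the parallel run is faster than the sequential one, and a decline in $S(P)$ as $P$ grows corresponds to the slowdown reported for the complete tree approach. $E(P)$ is the usual parallel efficiency, included only for completeness.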

The time breakdowns in Fig. 5 show the communication efficiency of the distributed tree structure. In the distributed tree method, communication occupies a small proportion of the execution time. In the complete tree approach, however, most of the execution time is spent in communication above three hosts. The test results verify that the distributed tree is an appropriate data structure for the distributed N-body method.

4.2. Tests on heterogeneous cluster

The N-body method with the distributed tree structure has also been tested on a wider-range heterogeneous cluster. The cluster consisted of six hosts:

• two Pentium II 450 MHz PCs, running Linux 2.0.36;

• two Sun UltraSPARC-1 workstations, running Solaris 2.6;

• two processors in an SGI PowerChallenge SMP, running IRIX 6.2.

This is a heterogeneous system with three types of platforms. The hosts are located at different sites. The two Pentium II PCs are in the local cluster used in the former test. The two Sun UltraSPARC-1 workstations belong to a remote cluster of Sun workstations, and the SGI PowerChallenge is a remote server. The hosts are linked together across the HKU campus network. The N-body application in Java and RMI runs on these platforms without any modification. The application runs on 1–6 processors, added in the order of two Pentium II PCs, two Sun UltraSPARC-1 workstations, and two processors in the SGI PowerChallenge. Fig. 6 shows the speedup of the distributed tree method on the heterogeneous cluster. The sequential time is measured on one of the Pentium II PCs. Speedup is also observed on the heterogeneous cluster.

Fig. 6. Speedup of the N-body method (distributed tree structure) on the heterogeneous cluster.

The execution time breakdown on the heterogeneous cluster, shown in Fig. 7, is similar to that on the homogeneous cluster. The communication latency is somewhat longer owing to the heavy traffic on the campus network. However, the communication time is still shorter than the computation time. The result further verifies the communication efficiency of the distributed tree structure. It also confirms that the collaborative system model is an appropriate framework for distributed computing.

The execution time of a Java application is not satisfactory owing to the inefficiency of the Java interpreter. Nevertheless, Java possesses attractive properties, such as strong platform-independent portability and flexibility, that are demanded in distributed computing. In addition, the performance of Java applications is improving as the Java system is upgraded.

Fig. 7. Execution time breakdown of the N-body method (distributed tree structure) on the heterogeneous cluster.

5. Related work

There are other parallel N-body methods based on the Barnes–Hut method. Three examples are discussed below.

Grama [4] presented a parallel implementation of the Barnes–Hut method on a message-passing computer. In his method, a 2D physical domain was partitioned into subdomains, and the particles in one subdomain were assigned to one processor. A local tree was constructed per processor and then all local trees were merged to form a global tree. All nodes above a certain cut-off depth in the global tree were broadcast to all processors. Grama's method for 2D space was run on a 256-processor nCUBE2 parallel computer. In our DOO method, by contrast, no global tree is built; only the top levels of the subtrees, i.e., the partial subtrees, are broadcast. Our scheme saves the cost of building a global tree and increases the share of force computation done on the local compute engine. Our method is more appropriate for a distributed environment.

The other two methods are applications built on a general-purpose data structure layer and implemented in C++.

The object-oriented support for adaptive methods in [2] provided a global data structure, PTREE, implemented as a collection of local data structures on a distributed-memory machine. The data structure was distributed to multiple processors, where computations were carried out, and the partial results were merged. The global data structure could support different applications. A gravitational N-body simulation was implemented on the global data structure and tested on a 64-node iPSC/860 machine.

Liu [9] described an implementation of a parallel C++ N-body framework that could support various scientific simulations involving tree structures. The framework consisted of three layers: (1) The generic tree layer supported simple tree construction and manipulation methods; system programmers could build special libraries using classes in this layer. (2) The Barnes–Hut tree layer supported the tree operations required in most N-body tree algorithms. (3) The application layer implemented a gravitational N-body application upon the BH-tree layer. The communication library was implemented in MPI. The application was executed on a cluster of four UltraSPARC workstations connected by a fast Ethernet network.


In contrast, our method is based on a distributed tree structure dedicated to the DOO N-body method, which should be more efficient for distributed computing. Furthermore, our N-body application is implemented in Java and RMI. It is executable on heterogeneous systems and adaptive to system reconfiguration. Thus, it is a more flexible method that runs on various platforms.

6. Conclusions

We have discussed a distributed object-oriented (DOO) method for solving N-body problems on distributed systems. Distributed objects on different hosts form a collaborative system to execute the application. The method is implemented in Java and the RMI interface to support distributed computing, especially on heterogeneous platforms. Through the RMI registry mechanism for distributed objects, the collaborative system can be dynamically reconfigured during the computation to fully utilize the computing resources in a cluster. The collaborative system model, with the architecture-neutral features of Java and RMI, has high flexibility, portability and adaptability on heterogeneous platforms. The collaborative system is also feasible for other applications. It is a promising framework for distributed computing and can also be expanded to wide-area environments.

The N-body method deals with a great number of particles in physical space. There is high data dependency among all bodies, and heavy communication is the bottleneck for the performance of the method. We propose a distributed tree structure to reduce communication overhead. The N-body method has been tested on a local homogeneous cluster and a wider-range heterogeneous cluster. The results of the performance tests show that the DOO method is appropriate for distributed and heterogeneous environments.

References

[1] J. Barnes, P. Hut, A hierarchical O(N log N) force-calculation algorithm, Nature 324 (4) (1986) 446–449.

[2] S. Bhatt, M. Chen, et al., Object-oriented support for adaptive methods on parallel machines, Sci. Comput. 2 (1993) 179–192.

[3] J. Farley, Java Distributed Computing, O'Reilly & Associates Inc, USA, 1998.

[4] A.Y. Grama, V. Kumar, A. Sameh, n-Body simulation using message passing parallel computers, in: Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995, pp. 355–360.

[5] L. Greengard, V. Rokhlin, A fast algorithm for particle simulations, J. Comput. Phys. 73 (1987) 325–348.

[6] L. Hernquist, Hierarchical N-body methods, Comput. Phys. Commun. 48 (1988) 107–115.

[7] Y.C. Hu, S.L. Johnsson, S.H. Teng, A data-parallel adaptive n-body method, in: Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997.

[8] Z. Liang, Y. Sun, C.L. Wang, ClusterProbe: an open, flexible and scalable cluster monitoring tool, in: Proceedings of the First International Workshop on Cluster Computing, Australia, August 1999, pp. 261–268.

[9] P. Liu, J.J. Wu, A framework for parallel tree-based scientific simulations, in: Proceedings of the 26th International Conference on Parallel Processing, 1997.

[10] J. Salmon, M.S. Warren, Parallel, out-of-core methods for N-body simulation, in: Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997.

[11] J.P. Singh, J.L. Hennessy, A. Gupta, Implications of hierarchical N-body methods for multiprocessor architectures, ACM Trans. Comput. Syst. 13 (2) (1995) 141–202.

[12] J.P. Singh, C. Holt, et al., Load balancing and data locality in adaptive hierarchical N-body methods: Barnes–Hut, fast multipole, and radiosity, J. Parallel Distrib. Comput. 27 (2) (1995) 118–141.

[13] Y. Sun, Z. Liang, C.L. Wang, A distributed object-oriented method for particle simulations on clusters, in: Proceedings of Seventh International Conference on High-performance Computing and Networking, HPCN Europe, Lecture Notes in Computer Science, Vol. 1593, Springer, Berlin, 1999, pp. 251–259.

[14] S.C. Woo, M. Ohara, et al., The SPLASH-2 programs: characterization and methodological considerations, in: Proceedings of the 22nd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, 1995, pp. 24–36.