
The Distributed ASCI Supercomputer Project

Henri Bal, Raoul Bhoedjang, Rutger Hofman, Ceriel Jacobs, Thilo Kielmann, Jason Maassen, Rob van Nieuwpoort, John Romein, Luc Renambot, Tim Rühl, Ronald Veldema, Kees Verstoep, Aline Baggio, Gerco Ballintijn, Ihor Kuz, Guillaume Pierre, Maarten van Steen, Andy Tanenbaum, Gerben Doornbos, Desmond Germans, Hans Spoelder, Evert-Jan Baerends, Stan van Gisbergen
Faculty of Sciences, Vrije Universiteit

Hamideh Afsermanesh, Dick van Albada, Adam Belloum, David Dubbeldam, Zeger Hendrikse, Bob Hertzberger, Alfons Hoekstra, Kamil Iskra, Drona Kandhai, Dennis Koelma, Frank van der Linden, Benno Overeinder, Peter Sloot, Piero Spinnato
Department of Computer Science, University of Amsterdam

Dick Epema, Arjan van Gemund, Pieter Jonker, Andrei Radulescu, Cees van Reeuwijk, Henk Sips
Delft University of Technology

Peter Knijnenburg, Michael Lew, Floris Sluiter, Lex Wolters
Leiden Institute of Advanced Computer Science, Leiden University

Hans Blom, Cees de Laat, Aad van der Steen
Faculty of Physics and Astronomy, Utrecht University

Abstract

The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.

(More information about the DAS project can be found at http://www.cs.vu.nl/das/.)


1 Introduction

The Distributed ASCI Supercomputer (DAS) is an experimental testbed for research on wide-area distributed and parallel applications. The system was built for the Advanced School for Computing and Imaging (ASCI), a Dutch research school in which several universities participate. (The ASCI research school is unrelated to, and came into existence before, the Accelerated Strategic Computing Initiative.) The goal of DAS is to provide a common computational infrastructure for researchers within ASCI, who work on various aspects of parallel and distributed systems, including communication substrates, programming environments, and applications. Like a metacomputer [41] or computational grid [17], DAS is a physically distributed system that appears to its users as a single, coherent system. Unlike metacomputers, we designed DAS as a homogeneous system.

The DAS system consists of four cluster computers, located at four different universities in ASCI, linked together through wide-area networks (see Figure 1). All four clusters use the same processors and local network and run the same operating system. Each university has fast access to its own local cluster. In addition, a single application can use the entire wide-area system, for example for remote collaboration or distributed supercomputing. DAS can be seen as a prototype computational grid, but its homogeneous structure makes it easier to avoid the engineering problems of heterogeneous systems. (Heterogeneous systems are the object of study in several other projects, most notably Legion [18] and Globus [16].) DAS can also be seen as a cluster computer, except that it is physically distributed.

This paper gives a preview of some research results obtained in the DAS project since its start in June 1997. We first describe the DAS architecture in more detail (Section 2) and then discuss how DAS is used for research on systems software (Section 3) and applications (Section 4). Finally, in Section 5 we present our conclusions.

2 The DAS system

DAS was designed as a physically distributed homogeneous cluster computer. We decided to use cluster technology because of the excellent price/performance ratio of commodity (off-the-shelf) hardware. We wanted the system to be distributed, to give the participating universities fast access to some local resources. One of the most important design decisions was to keep the DAS system homogeneous. The reasons for this choice were to allow easy exchange of software and to stimulate cooperation between ASCI researchers. Both the hardware and the software of DAS are homogeneous: each node has the same processor and runs the same operating system. Also, the local area network within all clusters is the same. The only variations in the system are the amount of local memory and network interface memory (SRAM) in each node and the number of nodes in each cluster. Three clusters have 24 nodes; the cluster at the VU initially contained 64 nodes, but was expanded to 128 nodes in May 1998.

Figure 1: The wide-area DAS system. (The four clusters, at the VU Amsterdam, the UvA Amsterdam, Leiden, and Delft, contain 128, 24, 24, and 24 nodes, respectively, and are connected by 6 Mbit/s ATM links.)

We selected the 200 MHz Pentium Pro as the processor for the DAS nodes, at the time of purchase the fastest Intel CPU available. The choice of the local network was based on research on an earlier cluster computer, comparing the performance of parallel applications on Fast Ethernet, ATM, and Myrinet [29]. Myrinet was selected as the local network because it was by far the fastest of the networks considered. Myrinet is a switch-based network using wormhole routing. Each machine contains an interface card with the LANai 4.1 programmable network interface processor. The interface cards are connected through switches. The Myrinet switches are connected in a ring topology for the 24-node clusters and in a 4 by 8 torus for the 128-node cluster. Myrinet is used as a fast user-level interconnect. The nodes in each cluster are also connected by a partially switched Fast Ethernet, which is used for operating system traffic.

Since DAS is homogeneous, it runs a single operating system. We initially chose BSD/OS (from BSDI) as the OS, because it is a stable system with commercial support. The increasing popularity of Linux, both worldwide and within the ASCI school, made us switch to Linux (RedHat 5.2) in early 1999.

The clusters were assembled by Parsytec. Figure 2 shows the 128-node cluster of the Vrije Universiteit. Each node is a compact module rather than a desktop PC or minitower, but it contains a standard PC motherboard. The clusters were delivered in June 1997.

The clusters are connected by wide-area networks in two different ways:

- using the National Research Network infrastructure (best effort network) and the LANs of the universities
- using an ATM based Virtual Private Network (Quality of Service network)


Figure 2: The 128-node DAS cluster at the Vrije Universiteit.

This setup allowed us to compare a dedicated, fixed Quality of Service network with a best effort network. The best effort network generally consists of 100 Mbit/s Ethernet connections to a local infrastructure; each university typically had a 34 Mbit/s connection to the Dutch backbone, later increased to 155 Mbit/s. The ATM connections are all 6 Mbit/s constant bitrate permanent virtual circuits. The round trip times on the ATM connections have hardly any variation and typically are around 4 ms. On the best effort network the traffic always has to pass about 4 routers, which each add roughly a millisecond of delay. Measurements show that the round trip times vary by about an order of magnitude due to the other Internet traffic. Attainable throughput on the ATM network is also constant. On the best effort network, the potential throughput is much higher, but during daytime congestion typically limits throughput to about 1-2 Mbit/s. This problem improved later during the project.

For most research projects, we thus use the dedicated 6 Mbit/s ATM links. The clusters are connected by these links in a fully-connected graph topology, so there is a link between every pair of clusters.

3 Research on systems software

DAS is used for various research projects on systems software, including low-level communication protocols, languages, and schedulers.

3.1 Low-level communication software

High-speed networks like Myrinet can obtain communication speeds close to those of supercomputers, but realizing this potential is a challenging problem. There are many intricate design issues for low-level network interface (NI) protocols [8]. We have designed and implemented a network interface protocol for Myrinet, called LFC [9, 10]. LFC is efficient and provides the right functionality for higher-level programming systems. The LFC software runs partly on the host and partly on the embedded LANai processor of the Myrinet network interface card. An interesting feature of LFC is its spanning tree broadcast protocol, which is implemented on the NIs. By forwarding broadcast messages on the NI rather than on the host, fewer interactions are needed between the NI and the host, thus speeding up broadcasts substantially.
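The LFC code itself is not reproduced here; the following self-contained C sketch only illustrates the idea of NI-level spanning-tree forwarding. All names (packet_t, ni_send, deliver_to_host, the binary tree shape) are hypothetical stand-ins, not the real LFC interface, and the "network" is simulated by a direct function call.

/* Minimal, self-contained sketch (not LFC itself) of NI-level
 * spanning-tree broadcast forwarding.  ni_send() and deliver_to_host()
 * are stubs standing in for the firmware's send and host-DMA primitives. */
#include <stdio.h>

#define NNODES 8

typedef struct { int src; const char *payload; } packet_t;

static void ni_send(int to, const packet_t *p);

static void deliver_to_host(int node, const packet_t *p)
{
    printf("node %d: delivered \"%s\" from %d\n", node, p->payload, p->src);
}

/* Each NI forwards an incoming broadcast to its children in a binary
 * spanning tree rooted at node 0, then hands the packet to its own host.
 * The host is never involved in the forwarding itself. */
static void ni_handle_broadcast(int node, const packet_t *p)
{
    int left = 2 * node + 1, right = 2 * node + 2;
    if (left  < NNODES) ni_send(left,  p);
    if (right < NNODES) ni_send(right, p);
    deliver_to_host(node, p);
}

static void ni_send(int to, const packet_t *p)
{
    ni_handle_broadcast(to, p);      /* packet "arrives" at the destination NI */
}

int main(void)
{
    packet_t p = { 0, "hello" };
    ni_handle_broadcast(0, &p);      /* node 0 initiates the broadcast */
    return 0;
}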

We have also developed a higher-level communication library, called Panda [5], which supports asynchronous point-to-point communication, remote procedure calls, totally-ordered broadcast, and multithreading. On Myrinet, Panda is implemented on top of LFC. The LFC and Panda libraries have been used for a variety of programming systems, including MPI, PVM, Java, Orca, Jackal, and CRL.

3.2 Languages and programming systems

Various languages and libraries have been studied using DAS. Part of this work focuses on local clusters, but several programming environments have also been implemented on the entire wide-area DAS system.

Manta is an implementation of Java designed for high-performance computing. Manta uses a static (native) compiler rather than an interpreter or JIT (just-in-time compiler), to allow more aggressive optimizations. Manta's implementation of Remote Method Invocation (RMI) is far more efficient than that in other Java systems. On Myrinet, Manta obtains a null-latency for RMIs of 37 µs, while the JDK 1.1 obtains a latency of more than 1200 µs [31]. This dramatic performance improvement was obtained by generating specialized serialization routines at compile time, by reimplementing the RMI protocol itself, and by using LFC and Panda instead of TCP/IP. Manta uses its own RMI protocol, but can also interoperate with other Java Virtual Machines. To handle polymorphic RMIs [53], Manta is able to accept a Java class file (bytecode) from a JVM, compile it dynamically to a binary format, and link the result into the running application program.

We have also developed a fine-grained Distributed Shared Memory system for Java, called Jackal [52]. Jackal allows multithreaded (shared-memory) Java programs to be run on distributed-memory systems, such as a DAS cluster. It implements a software cache-coherence protocol that manages regions. A region is a contiguous block of memory that contains either a single object or a fixed-size partition of an array. Jackal caches regions and invalidates the cached copies at synchronization points (the start and end of a synchronized statement in Java). Jackal uses local and global mark-and-sweep garbage collection algorithms that are able to deal with replicated objects and partitioned arrays. Jackal has been implemented on DAS on top of LFC.


Spar/Java is a data- and task-parallel programming language for semi-automatic parallel programming, in particular for the programming of array-based applications [47]. Apart from a few minor modifications, the language is a superset of Java. This provides Spar/Java with a modern, solid language as its basis, and makes it accessible to a large group of users. Spar/Java extends Java with constructs for parallel programming, extensive support for array manipulation, and a number of other powerful language features. It has a flexible annotation language for specifying data and task mappings at any level of detail [46]. Alternatively, compile-time or run-time schedulers can do (part of) the scheduling. Spar/Java runs on the DAS using MPI and Panda.

Orca is an object-based distributed shared memory system. Its runtime system dynamically replicates and migrates shared objects, using heuristic information from the compiler. Orca has been implemented on top of Panda and LFC. An extensive performance analysis of Orca was performed on DAS, including a comparison with the TreadMarks page-based DSM and the CRL region-based DSM [5]. Also, a data-parallel extension to Orca has been designed and implemented, resulting in a language with mixed task and data parallelism. This extended language and its performance on DAS are described in [20].

In the ESPRIT project PREPARE, an HPF compiler has been developed with an advanced and efficient parallelization engine for regular array assignments [14, 39]. The PREPARE HPF compiler has been ported to the DAS and uses the CoSy compilation framework in combination with the MPI message passing library.

In another ESPRIT project, called Dynamite (see http://www.hoise.com/dynamite), we have developed an environment for the dynamic migration of tasks in a PVM program [21, 22, 44]. A version of this code is now available for SunOS 5.5.1, SunOS 5.6, and Linux/i386 2.0 and 2.2 (libc 5 and glibc 2.0). DAS is being used for developing and testing the Linux version of this code. The aim of this work is to develop an environment for Linux that supports dynamic task migration for PVM and MPI, and to make this environment available to the research community. Dynamite is minimally intrusive: it does not require modifications to the user's program, and it is implemented entirely in user space, so it does not require modifications to the kernel. The Dynamite system includes a modified version of the Linux ELF dynamic loader, which does checkpointing and restarting of tasks; a modified version of PVM, supporting the transparent migration of tasks; monitoring programs for the system load and the PVM system; a dynamic task scheduler; and, optionally, a GUI that guides the user through the steps needed to set up the environment and to start a program.

Several programming systems have also been implemented on multiple clusters of the wide-area DAS system. Both Orca and Manta have been implemented on the wide-area DAS and have been used to study the performance of wide-area parallel applications [35, 45]. In addition, we have developed a new MPI (Message Passing Interface) library for wide-area systems, called MagPIe [27]. MagPIe optimizes MPI's collective communication operations and takes the hierarchical structure of wide-area systems into account. With MagPIe, most collective operations require only a single wide-area latency. For example, an MPI broadcast message is performed by sending it in parallel over the different wide-area links and then forwarding it within each cluster. Existing MPI implementations that are not aware of the wide-area structure often forward a message over multiple wide-area links (thus taking multiple wide-area latencies) or even send the same information multiple times over the same wide-area link. On DAS, MagPIe outperforms MPICH by a factor of 2-8.
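MagPIe's real implementation is described in [27]; the C sketch below only illustrates the cluster-aware broadcast scheme just described, using standard MPI calls. It assumes each process can learn its cluster number (here from a hypothetical DAS_CLUSTER environment variable) and that the broadcast root is world rank 0; it splits the world communicator into one communicator per cluster plus one communicator of per-cluster coordinators, so the message crosses each wide-area link only once.

/* Cluster-aware broadcast sketch (not the MagPIe source): the message
 * travels over the wide area once per cluster, then locally within
 * each cluster.  Build with an MPI compiler, e.g. mpicc. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumes the data to broadcast originates at world rank 0. */
static void wide_area_bcast(void *buf, int count, MPI_Datatype type, int cluster)
{
    int rank, lrank;
    MPI_Comm local, coord;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One communicator per cluster; local rank 0 acts as coordinator. */
    MPI_Comm_split(MPI_COMM_WORLD, cluster, rank, &local);
    MPI_Comm_rank(local, &lrank);

    /* Communicator containing only the coordinators (one per cluster). */
    MPI_Comm_split(MPI_COMM_WORLD, lrank == 0 ? 0 : MPI_UNDEFINED, rank, &coord);

    /* Step 1: broadcast among coordinators; crosses each WAN link once. */
    if (coord != MPI_COMM_NULL)
        MPI_Bcast(buf, count, type, 0, coord);

    /* Step 2: broadcast within each cluster over the fast local network. */
    MPI_Bcast(buf, count, type, 0, local);

    if (coord != MPI_COMM_NULL) MPI_Comm_free(&coord);
    MPI_Comm_free(&local);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const char *c = getenv("DAS_CLUSTER");   /* hypothetical cluster id */
    int cluster = c ? atoi(c) : 0;

    int value = 0, rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) value = 42;               /* root holds the data */

    wide_area_bcast(&value, 1, MPI_INT, cluster);
    printf("rank %d received %d\n", rank, value);

    MPI_Finalize();
    return 0;
}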

3.3 Schedulers

Another topic we are investigating with the DAS is the scheduling of parallel programs across multiple DAS clusters. Our current DAS scheduler (prun) only operates on single clusters, so for multi-cluster scheduling we need a mechanism for co-allocation of processors in different clusters at the same time. We are currently investigating the use of Globus, with its support for co-allocation [13], for this purpose. So far, we have implemented a simple interface between Globus and prun, and we have been able to submit and run two-cluster jobs through Globus. An important feature of prun that facilitates co-allocation is its support for reservations. We are planning to enhance the interface between the local schedulers and Globus, and if necessary the co-allocation component of Globus, so that better global scheduling decisions can be made. Our first results on the performance of co-allocation in DAS-like systems can be found in [11].

4 Research on applications

We have used the DAS system for applications that run on a single cluster (Section 4.1) and for wide-area applications (Section 4.2). Also, we have studied Web-based applications (Section 4.3) and worldwide distributed applications (Section 4.4).

4.1 Parallel applications

DAS has been used for a large number of parallel applications, including discrete event simulation [33, 40], Lattice Gas and Lattice Boltzmann simulations [15, 24, 25], parallel imaging [28], image searching [12], data mining, N-body simulation [42], game tree search [38], simulation of branch-prediction mechanisms, ray tracing, molecular dynamics, and quantum chemistry [19]. We discuss some of these applications in more detail below.

The goal of the PILE project is to design a parallel programming model and environment for time-constrained image processing applications [28]. The programming model is based on the analysis of typical solutions employed by users from the image processing community. The PILE system is built around a software library containing a set of abstract data types and associated operations executing in a data parallel fashion. As implementation vehicles on the DAS, MPI, CRL, and Spar/Java are being investigated.

Another application area is image databases. Visual concept recognition algorithms [12] typically require the solution of computationally intensive sub-problems such as correlation and optimal linear principal component transforms. We have designed efficient algorithms for distributing the recognition problem across high-bandwidth distributed computing networks. This has led not only to new algorithms for parallelizing prevalent principal component transforms, but also to novel techniques for segmenting images and video for real-time applications.

Another project investigates hardware usage for branch predictors. Branch predictors are used in most processors to keep the instruction pipeline filled. As a first step, we want to investigate the effects of many different (flavors of) algorithms. For this purpose, a database was built which currently holds about 8000 SQL-searchable records. Each record contains a detailed description of the state of a branch predictor after the run of a trace. These traces were created on the DAS machine using a MIPS simulator, which simulates six different SPEC95 benchmarks. The database was also built on the DAS machine, which took about 20 hours using 24 nodes. The individual nodes were used as stand-alone computers. Each node ran a copy of a branch-predictor simulator and worked on its own data set. The main advantage of using the DAS machine for this project is that it provides a transparent multi-computer platform, which allowed the required database to be built in a fraction of the time needed by a single computer. With the resulting database, the investigation of the next steps in this project has been started.
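The paper does not say which predictor flavors were simulated; the C sketch below shows one common flavor, a table of two-bit saturating counters indexed by branch address, purely as an example of the per-trace simulation each node might run. The table size and the tiny synthetic trace are made up.

/* One possible branch-predictor "flavor": two-bit saturating counters
 * indexed by branch address.  This is an assumed example, not the
 * project's simulator. */
#include <stdio.h>

#define TABLE_BITS 12
#define TABLE_SIZE (1 << TABLE_BITS)

static unsigned char counters[TABLE_SIZE];  /* 0,1 = predict not taken; 2,3 = taken */

/* Feed one branch from a trace; returns 1 if the prediction was correct. */
static int predict_and_update(unsigned long pc, int taken)
{
    unsigned idx = (pc >> 2) & (TABLE_SIZE - 1);
    int predicted_taken = counters[idx] >= 2;

    if (taken  && counters[idx] < 3) counters[idx]++;
    if (!taken && counters[idx] > 0) counters[idx]--;

    return predicted_taken == taken;
}

int main(void)
{
    /* A real run would read a SPEC95 branch trace; a tiny synthetic
     * (address, outcome) sequence stands in for it here. */
    unsigned long pcs[]   = { 0x400100, 0x400100, 0x400100, 0x400200, 0x400100 };
    int           taken[] = { 1,        1,        1,        0,        1        };
    int correct = 0;

    for (int i = 0; i < 5; i++)
        correct += predict_and_update(pcs[i], taken[i]);

    printf("%d/5 predictions correct\n", correct);
    return 0;
}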

We also use DAS for experimental research on the Time Warp discrete event simulation method. The availability of fast, low-latency communication is an important asset here. We have made extensive use of the DAS to run Time Warp simulations for studying the influence of the application dynamics on the execution behavior of the Time Warp simulation kernel. The application under study is an Ising spin system. The results clearly show the influence of the Ising spin dynamics on the Time Warp execution behavior in terms of rollback length and frequency, and turnaround times. The results indicate the need for adaptive optimism control mechanisms, and successful experiments showed the versatility of optimism control. First results have been obtained for describing and measuring self-organization in parallel asynchronous cellular automata with Time Warp optimistic scheduling. It was shown that different scaling laws exist for rollback lengths with varying Time Warp windows. These results were experimentally validated for stochastic spin dynamics systems. The work is described in more detail in [40].

Another project studies N-body simulations. The numerical integration of gravitational N-body problems in its most basic formulation requires O(N²) operations per time step and O(N) time steps to study the evolution of the system over a dynamically interesting period of time. Realistic system sizes range from a few thousand (open clusters) through 10⁶ (globular clusters) to 10¹² (large galaxies). There are several options to speed up such calculations: use a parallel computer system; use fast, special-purpose, massively parallel hardware, such as the GRAPE-4 system of Makino [32]; avoid recomputing slowly varying forces too frequently (i.e., use individual time steps); or carefully combine groups of particles in computing the forces on somewhat distant particles (this leads to the well-known hierarchical methods). Each of these techniques has distinct advantages and drawbacks. In our research we strive to find optimal mixes of the above approaches for various classes of problems. We have attached two GRAPE-4 boards, which were kindly made available by Jun Makino, to two separate DAS nodes at the University of Amsterdam. The system is used both by the Astronomy Department for actual N-body simulations, and by the Section Computational Science to model the behavior of such hybrid computer systems and to guide the development of more advanced approaches to N-body simulations, combining some or all of the above techniques.
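To make the O(N²) cost concrete, the C sketch below shows the basic direct-summation formulation referred to above: every time step computes all pairwise gravitational accelerations. It is only an illustration of why GRAPE hardware or hierarchical methods are needed, not the project's code; the particle count, softening, and initial conditions are arbitrary.

/* Direct-summation gravity: O(N^2) pair interactions per time step. */
#include <math.h>
#include <stdio.h>

#define N   1024
#define EPS 1e-3            /* softening length to avoid singularities */

static double pos[N][3], vel[N][3], acc[N][3], mass[N];

static void compute_forces(void)
{
    for (int i = 0; i < N; i++)
        acc[i][0] = acc[i][1] = acc[i][2] = 0.0;

    for (int i = 0; i < N; i++)              /* the O(N^2) double loop */
        for (int j = 0; j < N; j++) {
            if (i == j) continue;
            double dx = pos[j][0] - pos[i][0];
            double dy = pos[j][1] - pos[i][1];
            double dz = pos[j][2] - pos[i][2];
            double r2 = dx*dx + dy*dy + dz*dz + EPS*EPS;
            double f  = mass[j] / (r2 * sqrt(r2));   /* G = 1 */
            acc[i][0] += f * dx;  acc[i][1] += f * dy;  acc[i][2] += f * dz;
        }
}

int main(void)
{
    for (int i = 0; i < N; i++) {            /* arbitrary initial conditions */
        mass[i] = 1.0 / N;
        pos[i][0] = i % 32;  pos[i][1] = i / 32;  pos[i][2] = 0.0;
    }
    double dt = 1e-3;
    for (int step = 0; step < 10; step++) {  /* simple symplectic Euler update */
        compute_forces();
        for (int i = 0; i < N; i++)
            for (int d = 0; d < 3; d++) {
                vel[i][d] += acc[i][d] * dt;
                pos[i][d] += vel[i][d] * dt;
            }
    }
    printf("done: %d bodies, 10 steps\n", N);
    return 0;
}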

The implementation of a hierarchical algorithm on a parallel computer and the control of the resulting numerical errors are important components of our research. The methodologies to be developed have a wider applicability than astrophysical N-body problems alone. Experiments have been performed on two astronomical N-body codes that have been instrumented to obtain performance data for individual software components and the GRAPE. One of the codes was adapted to run as a parallel code on the DAS with GRAPE. A simulation model for the resulting hybrid architecture has been developed that reproduces the actual performance of the system quite accurately, so we will use this model for performance analysis and prediction for similar hybrid systems.

We also study particle models with highly constrained dynamics. These Lattice Gas (LGA) and Lattice Boltzmann (LBM) models originated as mesoscopic particle models that can mimic hydrodynamics. We use these models to study fluctuations in fluids and to study flow and transport in disordered media, such as random fiber networks and random close packings of spheres. In all cases the computational demands are huge. Typical LGA runs require 50 hours of compute time on 4 to 8 DAS nodes and are compute bound. Typical LBM runs simulate flow on very large grids (256³ to 512³) and are usually bounded by the available memory in the parallel machine. Large production runs are typically executed on 8 to 16 DAS nodes.


Our parallel Lattice Boltzmann code was originally developed on a Cray T3E [24, 25] under MPI. Because of the inherent data locality of the LBM iteration, parallelization was straightforward. However, to guarantee good load balancing we use Orthogonal Recursive Bisection to obtain a good partitioning of the computational box. The code was ported to several systems (DAS, Parsytec CC, IBM SP2). Currently, we investigate flow and transport in random close packings of spheres, using the DAS.
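Orthogonal Recursive Bisection, in its simplest form, repeatedly splits the computational box along its longest axis so that both halves carry about the same work. The C sketch below illustrates only that idea; the work estimate (plain cell count), box sizes, and recursion depth are assumptions, not the project's actual partitioner.

/* Minimal Orthogonal Recursive Bisection sketch: split a 3-D box into
 * 2^levels sub-boxes of roughly equal cell count.  A real LBM code would
 * weight cells by actual work (e.g. fluid vs. solid cells). */
#include <stdio.h>

typedef struct { int lo[3], hi[3]; } box_t;   /* half-open ranges [lo, hi) */

static void orb(box_t b, int level, box_t *out, int *nout)
{
    if (level == 0) {                 /* leaf: one sub-domain per process */
        out[(*nout)++] = b;
        return;
    }
    /* Pick the longest axis and split it in the middle. */
    int axis = 0;
    for (int d = 1; d < 3; d++)
        if (b.hi[d] - b.lo[d] > b.hi[axis] - b.lo[axis]) axis = d;

    int mid = (b.lo[axis] + b.hi[axis]) / 2;
    box_t left = b, right = b;
    left.hi[axis]  = mid;
    right.lo[axis] = mid;

    orb(left,  level - 1, out, nout);
    orb(right, level - 1, out, nout);
}

int main(void)
{
    box_t domain = { {0, 0, 0}, {256, 256, 256} };
    box_t parts[16];
    int n = 0;

    orb(domain, 4, parts, &n);        /* 2^4 = 16 sub-domains */
    for (int i = 0; i < n; i++)
        printf("part %2d: [%3d,%3d) x [%3d,%3d) x [%3d,%3d)\n", i,
               parts[i].lo[0], parts[i].hi[0],
               parts[i].lo[1], parts[i].hi[1],
               parts[i].lo[2], parts[i].hi[2]);
    return 0;
}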

We developed a generic parallel simulation environment for thermal 2-dimensional LGA [15]. The decomposition of the rectangular computational grid is obtained by a strip-wise partitioning. An important part of the simulation is the continuous Fourier transformation of density fluctuations after each LGA iteration. As parallel FFT we use the very efficient public domain package FFTW (http://www.fftw.org/), which executes without any adaptations on the DAS.
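The project used the FFTW release available at the time; the C fragment below uses the current FFTW 3 interface and is only meant to show the kind of 2-D transform of a density field that is performed after each iteration. The grid size and dummy data are placeholders.

/* 2-D FFT of a density field with FFTW 3 (an illustration only; the
 * project predates FFTW 3).  Build with:  cc fft2d.c -lfftw3 -lm */
#include <fftw3.h>
#include <stdio.h>

#define NX 128
#define NY 128

int main(void)
{
    fftw_complex *rho  = fftw_malloc(sizeof(fftw_complex) * NX * NY);
    fftw_complex *rhok = fftw_malloc(sizeof(fftw_complex) * NX * NY);

    /* Plan once, reuse for every LGA iteration. */
    fftw_plan plan = fftw_plan_dft_2d(NX, NY, rho, rhok,
                                      FFTW_FORWARD, FFTW_MEASURE);

    /* Fill in a (dummy) density fluctuation field after planning,
     * since FFTW_MEASURE may overwrite the arrays while planning. */
    for (int i = 0; i < NX * NY; i++) {
        rho[i][0] = (i % 7) - 3.0;   /* real part */
        rho[i][1] = 0.0;             /* imaginary part */
    }

    fftw_execute(plan);              /* rho -> rhok (fluctuation spectrum) */
    printf("k=0 component: %g\n", rhok[0][0]);

    fftw_destroy_plan(plan);
    fftw_free(rho);
    fftw_free(rhok);
    return 0;
}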

Another project involves parallel ray tracing. The ray tracer we use is based on Radiance and uses PVM (on top of Panda) for inter-processor communication. The port to DAS was done with the aim of comparing performance results on different platforms, including the DAS, a Parsytec CC, and various clusters of workstations. The algorithm consists of a data-parallel part for those tasks which require either a large amount of data or data which is difficult to predict in advance. This leads to a basic, but unbalanced, workload. In order to achieve proper efficiency, demand-driven tasks are created where possible. These include tasks which are relatively compute intensive and require a small amount of data. The demand-driven tasks are then used to balance the workload. An overview of the algorithm, including results, is given in [36].

In the Multigame project, we have developed an environment for distributed game-tree search. A programmer describes the legal moves of a game in the Multigame language and the compiler generates a move generator for that game. The move generator is linked with a runtime system that contains parallel search engines and heuristics, resulting in a parallel program. Using the Multigame system, we developed a new search algorithm for single-agent search, called Transposition Driven Search, which obtains nearly perfect speedups up to 128 DAS nodes [38].

Several interactive applications are being studied that use parallelism to obtain real-time responses, for example for steering simulations. Examples are simulation of the chaotic behavior of lasers or of the motion of magnetic vortices in disordered superconductors, real-time analysis of experimental data (e.g., determining the current patterns in flat conductors from magneto-optical imaging [54]), and post-processing of experimental results. Examples of the latter category are reconstruction of 3D images of teeth from local CT data and reconstruction of the sea-bottom structure from acoustical data. As many problems use linear-algebraic operations or Fast Fourier Transforms, excellent speedups can be obtained.


One of the largest applications ported to DAS is the Amsterdam Density Functional (ADF) program [19] of the Theoretical Chemistry section of the Vrije Universiteit. ADF is a quantum chemical program. It uses density functional theory to calculate the electronic structure of molecules, which can be used for studying various chemical problems. ADF has been implemented on DAS on top of the MPI/Panda/LFC layers. The obtained speedup strongly depends on the chosen accuracy parameters, the molecule, and the calculated property. The best speedup measured so far is 70 on 90 CPUs, for a calculation on 30 water molecules. Current work focuses on eliminating the remaining sequential bottlenecks and improving the load balancing scheme.

We have also compared the application performance of the DAS clusters with that of a 4-processor SGI Origin200 system (at the University of Utrecht). This study mainly uses the EuroBen benchmark, which indicates performance for technical/scientific computations. The observed single-node peak performance of the Pentium Pro nodes of DAS is 3-5 times lower than that of the Origin200 nodes, mainly because the O200 nodes are superscalar (they have more independently schedulable floating-point operations per cycle). We compared the communication performance of DAS and the O200 by running a simple ping-pong test written in Fortran 77/MPI. The MPI system used for DAS is a port of MPICH on top of Panda and LFC. The bandwidth within a local DAS cluster is about half that of the O200. Finally, we measured the performance of a dense matrix-vector multiplication. The DAS version scales well, but for large problem sizes the program runs out of its L2 cache, resulting in a performance decrease. The maximum speed obtained on DAS (1228 Mflop/s) is a factor of 6 lower than on the O200 (7233 Mflop/s).
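The original ping-pong test was written in Fortran 77/MPI and is not reproduced here; the C sketch below shows the usual way such round-trip latency and bandwidth figures are obtained with MPI_Wtime. The message size and iteration count are arbitrary.

/* Ping-pong benchmark sketch: ranks 0 and 1 bounce a message back and
 * forth; the averaged round trip time yields latency and bandwidth. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, iters = 1000, size = 1024;      /* message size in bytes */
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0) {
        double rtt = (t1 - t0) / iters;       /* seconds per round trip */
        printf("round trip: %.1f us, bandwidth: %.1f Mbyte/s\n",
               rtt * 1e6, 2.0 * size / rtt / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}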

4.2 Wide-area applications

One of the goals of the DAS project was to do research on distributed supercomputing applications, which solve very large problems using multiple parallel systems at different geographic locations. Most experiments in this area done so far (e.g., SETI@home and RSA-155) use very coarse-grained applications. In our work, we also investigate whether more medium-grained applications can be run efficiently on a wide-area system. The problem here, of course, is the relatively poor performance of the wide-area links. On DAS, for example, most programming systems obtain a null-latency over the local Myrinet network of about 30-40 µs, whereas the wide-area ATM latency is several milliseconds. The throughput obtained for Myrinet is typically 30-60 Mbyte/s, while for the DAS ATM network it is about 0.5 Mbyte/s. So, there is about two orders of magnitude difference in performance between the local and wide-area links.

Many researchers have therefore come to believe that it is impossible to run medium-grained applications on wide-area systems. Our experience, however, contradicts this expectation. The reason is that it is possible to exploit the hierarchical structure of systems like DAS. Most communication links in DAS are fast Myrinet links, and only a few links are slow ATM links. We have discovered that many parallel algorithms can be optimized by taking this hierarchical structure into account [6, 35, 45]. The key is to avoid the overhead on the wide-area links, or to mask wide-area communication. Many applications can be made to run much faster on the wide-area DAS using well-known optimizations like message combining, hierarchical load balancing, and latency hiding. The speedups of the optimized programs are often close to those on a single cluster with the same total number of CPUs.
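One of the optimizations named above, message combining, simply buffers small messages destined for the same remote cluster and ships them as one wide-area message. The C sketch below illustrates that idea only; the buffer sizes and the wan_send placeholder are made up and do not reflect the DAS runtime systems.

/* Illustration of message combining for a slow wide-area link: small
 * messages to the same remote cluster accumulate in a buffer and are
 * sent as one large message.  wan_send() stands in for the transport. */
#include <stdio.h>
#include <string.h>

#define NCLUSTERS  4
#define COMBINE_SZ 8192             /* flush threshold in bytes (made up) */

static char buf[NCLUSTERS][COMBINE_SZ];
static int  used[NCLUSTERS];

static void wan_send(int cluster, const void *data, int len)
{
    (void)data;
    printf("WAN send to cluster %d: %d bytes\n", cluster, len);
}

static void flush_cluster(int cluster)
{
    if (used[cluster] > 0) {
        wan_send(cluster, buf[cluster], used[cluster]);
        used[cluster] = 0;
    }
}

/* Queue a small message; only send over the WAN when the buffer fills. */
static void combined_send(int cluster, const void *msg, int len)
{
    if (used[cluster] + len > COMBINE_SZ)
        flush_cluster(cluster);
    memcpy(buf[cluster] + used[cluster], msg, len);
    used[cluster] += len;
}

int main(void)
{
    char msg[100] = "update";
    for (int i = 0; i < 1000; i++)           /* 1000 small updates ...       */
        combined_send(i % NCLUSTERS, msg, sizeof(msg));
    for (int c = 0; c < NCLUSTERS; c++)      /* ... become a few WAN sends   */
        flush_cluster(c);
    return 0;
}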

In addition to this distributed supercomputing type of application, interest is also increasing in using DAS for other types of computational-grid applications. An important issue is to harness the large computational and storage capabilities that such systems provide. A next step is to develop new types of application environments on top of these grids. Virtual laboratories are one form of such new application environments. They will, in the near future, have to allow an experimental scientist (whether a physicist, a biologist, or an engineer) to do experiments or develop designs. A virtual laboratory environment will consist of equipment (like a mass spectrometer or a DNA microarray) that can be remotely controlled and that will provide data sets that can be stored in the information management part of the laboratory. Interaction with the data can, among other ways, take place in virtual reality equipment like a CAVE. We have used DAS and another SMP-based cluster computer (Arches) to develop the distributed information management for such systems, as well as to study the possibilities for more generic, user-oriented middleware for such laboratories [23].

In another project (which is part of the Dutch RoboCup initiative), we have developed an interactive and collaborative visualization system for multi-agent systems, in particular robot soccer. Our system allows human users in CAVEs at different geographic locations to participate in a virtual soccer match [37, 43]. The user in the CAVE can navigate over the soccer field and kick a virtual ball. In this way, the user interacts with a running (parallel) simulation program, which runs either on an SP2 or a DAS cluster. In general, many other scientific applications can benefit from such a form of high-level steering from a virtual reality environment. The wide-area networks of DAS are an interesting infrastructure for implementing distributed collaborative applications in general, as the link latency has hardly any variation.

4.3 Web-based applications

We have studied Web caching using DAS and Arches, focusing mainly on cache replacement and coherence strategies. This study showed that the techniques currently used for Web caching are still very simple and are mostly derived from earlier work in computer architecture. The experiments show that these techniques are nevertheless quite efficient compared to some new techniques that have been proposed specially for Web caching [1, 2, 3]. Since in Web caching both strong and weak document coherence are considered, we have performed experiments in which we studied the quality of the hits (good hits are hits on up-to-date cached documents). We have shown the existence of two categories of replacement strategies, performing the hits either on recently requested documents or (mainly) on documents that have been cached for a long time. The usage of the cached documents is quite different in the two classes, and the cache size has a different impact on each category. We have also compared the efficiency of strong and weak document coherence. The results show that with weak document coherence, between 10% and 26% of the forwarded documents were out of date, while the useless generated traffic remains quite high: 40% to 70% of the messages exchanged to check the state of the cached documents are useless. To study strong coherence, we used a typical method based on an invalidation protocol. The results show that the cost paid for this can be quite high: on average, 70% of the invalidation messages are useless and arrive at a cache server after the target document has been removed from the cache.

In another project, called ARCHIPEL [7], we study information service brokerage systems. In order to support the wide variety of application requirements, fundamental approaches to electronic commerce, Web-based clearing houses, distributed databases, and information service brokerage systems need to be developed. The ARCHIPEL system developed at the University of Amsterdam aims both at addressing and analyzing these complex needs and at providing the environment and system for their adequate support. The results achieved at this early stage of the project include the identification of the challenging requirements for the ARCHIPEL support infrastructure, such as Web-based operation, cooperation and interoperability, high performance, support for node autonomy, preservation of information visibility (and information modification) rights, and platform heterogeneity. So far, a comprehensive description of the ARCHIPEL infrastructure has been produced that unites different characteristics of existing parallel, distributed, and federated database management systems within one uniform architecture model. Currently, different methodologies and technologies are being evaluated to fulfill these requirements; among others, XML technology, the ODBC standard, and rapidly improving Java-based programming and distributed management tools are being evaluated.

4.4 Worldwide distributed applications

The goal of the Globe project [50, 51] is to design and build a prototype of a scalable infrastructure for future worldwide distributed applications. The infrastructure is designed to support up to a billion users, spread over the whole world. These users may own up to a trillion objects, some of them mobile. Many aspects of the system, such as replicating data over multiple locations and managing security, should be automatic or at least easily configurable. Although Globe interworks with the World Wide Web, it is intended to run natively on the Internet, just as email, USENET news, and FTP do. DAS, with its 200 nodes at four locations, has been used as a testbed to test pieces of the Globe system. We hope that a first prototype of Globe will be available in early 2001.

The software technology underlying Globe is the distributed shared object. Each object has methods that its users can invoke to obtain services. An object may be replicated on multiple machines around the world and accessed locally from each one as if it were a local object. The key to scalability is that each object has its own policies with respect to replication, coherence, communication, security, etc., and these policies are encapsulated within the object. For example, an object providing financial services may require sequential consistency, whereas an object providing sports scores may have a much weaker consistency. It is this ability to tailor the various policies per object that makes Globe scalable, because objects with demanding requirements can have them without affecting objects that can live with weaker guarantees.
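Globe's actual object model is defined in [50, 51]; the C fragment below only illustrates, with hypothetical types, what "policies encapsulated within the object" means in practice: each distributed object carries its own replication/coherence strategy behind a common, small interface, so strongly and weakly consistent objects coexist.

/* Hypothetical illustration (not the Globe API) of per-object policies:
 * every shared object bundles its own update strategy, so a financial
 * object and a sports-score object can use very different guarantees. */
#include <stdio.h>

typedef struct object object_t;

typedef struct {
    const char *name;
    void (*write)(object_t *obj, int value);   /* policy-specific update */
} policy_t;

struct object {
    int             value;
    const policy_t *policy;      /* encapsulated in the object itself */
};

static void write_sequential(object_t *obj, int value)
{
    /* A strongly consistent object would update all replicas in a fixed
     * order (e.g. via totally-ordered broadcast) before returning. */
    printf("[%s] ordered update to %d on all replicas\n",
           obj->policy->name, value);
    obj->value = value;
}

static void write_lazy(object_t *obj, int value)
{
    /* A weakly consistent object updates locally and propagates later. */
    obj->value = value;
    printf("[%s] local update to %d, propagate in background\n",
           obj->policy->name, value);
}

static const policy_t sequential = { "sequential", write_sequential };
static const policy_t lazy       = { "lazy",       write_lazy };

int main(void)
{
    object_t bank_account = { 0, &sequential };   /* needs strong guarantees */
    object_t sports_score = { 0, &lazy };         /* weak consistency is fine */

    bank_account.policy->write(&bank_account, 100);
    sports_score.policy->write(&sports_score, 3);
    return 0;
}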

One of the aspects studied in detail is locating objects (files, mailboxes, Web pages, etc.) in such a large system. When an object wishes to be found, it registers with the location service, which tracks the location of all objects in a worldwide tree [48, 49]. The tree exploits locality, caching, and other techniques to make it scalable. Recent work has focused on handling mobile objects. Other work has looked at automating the decision about where to place replicas of objects to minimize bandwidth and delay [34]. The main conclusion here is that the ability to tailor the replication policy to each object's access patterns provides significant gains over a single replication policy for all objects, and is far better than having only a single copy of each object. We have also looked at security issues [30].

Three applications of Globe have been built. The first one, GlobeDoc [26], is used to produce a better Web on top of Globe. GlobeDoc allows one or more HTML pages, plus an appropriate collection of icons, images, etc., to be packaged together into a single GlobeDoc and transmitted all at once, a vastly more efficient scheme than the current Web. Gateways to and from the Web have been constructed so GlobeDoc objects can be viewed with existing browsers. The second application is the Globe Distribution Network [4], an application that provides a scalable worldwide distribution scheme for complex free software packages. The third application is an instant-messaging service (called Loc8) built as a front end to the location service. This service allows users all over the world to contact each other, regardless of whether they are mobile or not. Special attention has been paid to security. In contrast to the centralized approach of existing services, our Loc8 service is highly distributed and exploits locality as much as possible to attain scalability.


5 Conclusions

The Distributed ASCI Supercomputer (DAS) is a 200-node homogeneous wide-area cluster computer that is used for experimental research within the ASCI research school. Since the start of the project in June 1997, a large number of people have used the system for research on communication substrates, scheduling, programming languages and environments, and applications.

The cluster computers of DAS use Myrinet as a fast user-level interconnect. Efficient communication software is the key to obtaining high communication performance on modern networks like Myrinet. We designed an efficient low-level communication substrate for Myrinet, called LFC. LFC provides the right functionality to higher-level layers (e.g., MPI, PVM, Java RMI, Orca), allowing them to obtain high communication performance. Most programming systems (even Java RMI) obtain null-latencies of 30-40 µs and throughputs of 30-60 Mbyte/s over Myrinet. We have therefore been able to successfully implement a wide variety of parallel applications on the DAS clusters, including many types of imaging applications and scientific simulations.

DAS is also an excellent vehicle for doing research on wide-area applications, because it is homogeneous and uses dedicated wide-area networks. The constant bandwidth and the low round trip times of the WANs make message passing between the clusters predictable. On heterogeneous computational grids [17], additional problems must be solved (e.g., due to differences in processors and networks). Our work on wide-area parallel applications on DAS shows that a much richer variety of applications than expected can benefit from distributed supercomputing. The basic assumption we rely on is that the distributed system is structured hierarchically. This assumption fails for systems built from individual workstations at random locations on the globe, but it does hold for grids built from MPPs, clusters, or networks of workstations. We expect that future computational grids will indeed be structured in such a hierarchical way, exhibiting locality like DAS.

Acknowledgements

The DAS system is financed partially by the Netherlands Organization for Scientific Research (NWO) and by the board of the Vrije Universiteit. The system was built by Parsytec, Germany. The wide-area ATM networks are part of the Surfnet infrastructure. A large number of other people have contributed to the DAS project, including Egon Amade, Arno Bakker, Sanya Ben Hassen, Christopher Hänle, Philip Homburg, Jim van Keulen, Jussi Leiwo, Aske Plaat, Patrick Verkaik, Ivo van der Wijk (Vrije Universiteit), Gijs Nelemans, Gert Polletiek (University of Amsterdam), Wil Denissen, Vincent Korstanje, Frits Kuijlman (Delft University of Technology), and Erik Reinhard (University of Bristol). We thank Jun Makino for kindly providing us with two GRAPE-4 boards.

References

[1] A. Belloum and B. Hertzberger. Dealing with One-Time Documents in Web Caching. In EUROMICRO'98 Conference, Sweden, August 1998.
[2] A. Belloum and B. Hertzberger. Replacement Strategies in Web Caching. In ISIC/CIRA/ISAS'98 IEEE Conference, Gaithersburg, Maryland, September 1998.
[3] A. Belloum, A.H.J. Peddemors, and B. Hertzberger. JERA: A Scalable Web Server. In Proceedings of the PDPTA'98 Conference, pages 167-174, Las Vegas, NV, 1998.
[4] A. Bakker, E. Amade, G. Ballintijn, I. Kuz, P. Verkaik, I. van der Wijk, M. van Steen, and A.S. Tanenbaum. The Globe Distribution Network. In Proc. 2000 USENIX Annual Conf. (FREENIX Track), pages 141-152, June 2000.
[5] H. Bal, R. Bhoedjang, R. Hofman, C. Jacobs, K. Langendoen, T. Rühl, and F. Kaashoek. Performance Evaluation of the Orca Shared Object System. ACM Transactions on Computer Systems, 16(1):1-40, February 1998.
[6] H.E. Bal, A. Plaat, M.G. Bakker, P. Dozy, and R.F.H. Hofman. Optimizing Parallel Applications for Wide-Area Clusters. In International Parallel Processing Symposium, pages 784-790, Orlando, FL, April 1998.
[7] A. Belloum, H. Muller, and B. Hertzberger. Scalable Federations of Web Caches. Submitted to the special issue on Web performance of the Journal of Performance Evaluation, 1999.
[8] R.A.F. Bhoedjang, T. Rühl, and H.E. Bal. Design Issues for User-Level Network Interface Protocols for Myrinet. IEEE Computer, 31(11):53-60, November 1998.
[9] R.A.F. Bhoedjang, T. Rühl, and H.E. Bal. Efficient Multicast on Myrinet Using Link-Level Flow Control. In Proc. of the 1998 Int. Conf. on Parallel Processing, pages 381-390, Minneapolis, MN, August 1998.
[10] R.A.F. Bhoedjang, K. Verstoep, T. Rühl, H.E. Bal, and R.F.H. Hofman. Evaluating Design Alternatives for Reliable Communication on High-Speed Networks. In Proc. 9th Int. Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-9), Cambridge, MA, November 2000.

[11] A.I.D. Bucur and D.H.J. Epema. The Influence of the Structure and Sizes of Jobs on the Performance of Co-Allocation. In Sixth Workshop on Job Scheduling Strategies for Parallel Processing (in conjunction with IPDPS 2000), Lecture Notes in Computer Science, Cancun, Mexico, May 2000. Springer-Verlag, Berlin.
[12] J. Buijs and M. Lew. Learning Visual Concepts. In ACM Multimedia'99, Orlando, FL, November 1999.
[13] K. Czajkowski, I. Foster, and C. Kesselman. Resource Co-Allocation in Computational Grids. In Proc. of the 8th IEEE Int'l Symp. on High Performance Distributed Computing, pages 219-228, Redondo Beach, CA, USA, July 1999.
[14] W.J.A. Denissen, V.J. Korstanje, and H.J. Sips. Integration of the HPF Data-parallel Model in the CoSy Compiler Framework. In 7th International Conference on Compilers for Parallel Computers (CPC'98), pages 141-158, Linkoping, Sweden, June 1998.
[15] D. Dubbeldam, A.G. Hoekstra, and P.M.A. Sloot. Computational Aspects of Multi-Species Lattice-Gas Automata. In P.M.A. Sloot, M. Bubak, A.G. Hoekstra, and L.O. Hertzberger, editors, High-Performance Computing and Networking (HPCN Europe '99), Amsterdam, The Netherlands, number 1593 in Lecture Notes in Computer Science, pages 339-349, Berlin, April 1999. Springer-Verlag.
[16] I. Foster and C. Kesselman. Globus: A Metacomputing Infrastructure Toolkit. Int. Journal of Supercomputer Applications, 11(2):115-128, Summer 1997.
[17] I. Foster and C. Kesselman, editors. The GRID: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1998.
[18] A.S. Grimshaw and Wm. A. Wulf. The Legion Vision of a Worldwide Virtual Computer. Comm. ACM, 40(1):39-45, January 1997.
[19] C. Fonseca Guerra, J.G. Snijders, G. te Velde, and E.J. Baerends. Towards an Order-N DFT Method. Theor. Chem. Acc., 99:391-403, 1998.
[20] S. Ben Hassen, H.E. Bal, and C. Jacobs. A Task and Data Parallel Programming Language Based on Shared Objects. ACM Trans. on Programming Languages and Systems, 20(6):1131-1170, November 1998.


[21] K.A. Iskra, Z.W. Hendrikse, G.D. van Albada, B.J. Overeinder, and P.M.A. Sloot. Experiments with Migration of PVM Tasks. In Research and Development for the Information Society Conference Proceedings (ISThmus 2000), pages 295-304, 2000.
[22] K.A. Iskra, F. van der Linden, Z.W. Hendrikse, B.J. Overeinder, G.D. van Albada, and P.M.A. Sloot. The Implementation of Dynamite - an Environment for Migrating PVM Tasks. ACM OS Review (submitted), 2000.
[23] E. Kaletas, A.H.J. Peddemors, and H. Afsarmanesh. ARCHIPEL Cooperative Islands of Information. Internal report, University of Amsterdam, Amsterdam, The Netherlands, June 1999.
[24] D. Kandhai. Large Scale Lattice-Boltzmann Simulations: Computational Methods and Applications. PhD thesis, University of Amsterdam, Amsterdam, The Netherlands, 1999.
[25] D. Kandhai, A. Koponen, A.G. Hoekstra, M. Kataja, J. Timonen, and P.M.A. Sloot. Lattice Boltzmann Hydrodynamics on Parallel Systems. Computer Physics Communications, 111:14-26, 1998.
[26] A.M. Kermarrec, I. Kuz, M. van Steen, and A.S. Tanenbaum. A Framework for Consistent, Replicated Web Objects. In Proceedings of the 18th International Conference on Distributed Computing Systems (ICDCS), May 1998.
[27] T. Kielmann, R.F.H. Hofman, H.E. Bal, A. Plaat, and R.A.F. Bhoedjang. MagPIe: MPI's Collective Communication Operations for Clustered Wide Area Systems. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 131-140, Atlanta, GA, May 1999.
[28] D. Koelma, P.P. Jonker, and H.J. Sips. A Software Architecture for Application Driven High Performance Image Processing. Parallel and Distributed Methods for Image Processing, Proceedings of SPIE, 3166:340-351, July 1997.
[29] K. Langendoen, R. Hofman, and H. Bal. Challenging Applications on Fast Networks. In HPCA-4 High-Performance Computer Architecture, pages 125-137, Las Vegas, NV, February 1998.
[30] J. Leiwo, C. Hänle, P. Homburg, C. Gamage, and A.S. Tanenbaum. A Security Design for a Wide-Area Distributed System. In Proc. Second Int'l Conf. Information Security and Cryptography (ICISC'99), LNCS 1878, December 1999.
[31] J. Maassen, R. van Nieuwpoort, R. Veldema, H.E. Bal, and A. Plaat. An Efficient Implementation of Java's Remote Method Invocation. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 173-182, Atlanta, GA, May 1999.


[32] J. Makino, M. Taiji, T. Ebisuzaki, and D. Sugimoto. GRAPE-4: A Massively-Parallel Special-Purpose Computer for Collisional N-body Simulation. Astrophysical Journal, 480:432-446, 1997.
[33] B.J. Overeinder, A. Schoneveld, and P.M.A. Sloot. Self-Organized Criticality in Optimistic Simulation of Correlated Systems. Submitted to Journal of Parallel and Distributed Computing, 2000.
[34] G. Pierre, I. Kuz, M. van Steen, and A.S. Tanenbaum. Differentiated Strategies for Replicating Web Documents. In Proc. 5th International Web Caching and Content Delivery Workshop, May 2000.
[35] A. Plaat, H. Bal, and R. Hofman. Sensitivity of Parallel Applications to Large Differences in Bandwidth and Latency in Two-Layer Interconnects. In Fifth International Symposium on High-Performance Computer Architecture, pages 244-253, Orlando, FL, January 1999. IEEE CS.
[36] E. Reinhard, A. Chalmers, and F.W. Jansen. Hybrid Scheduling for Parallel Rendering Using Coherent Ray Tasks. In J. Ahrens, A. Chalmers, and Han-Wei Shen, editors, 1999 IEEE Parallel Visualization and Graphics Symposium, pages 21-28, October 1999.
[37] L. Renambot, H.E. Bal, D. Germans, and H.J.W. Spoelder. CAVEStudy: An Infrastructure for Computational Steering in Virtual Reality Environments. In Ninth IEEE International Symposium on High Performance Distributed Computing, Pittsburgh, PA, August 2000.
[38] J.W. Romein, A. Plaat, H.E. Bal, and J. Schaeffer. Transposition Table Driven Work Scheduling in Distributed Search. In 16th National Conference on Artificial Intelligence (AAAI), pages 725-731, Orlando, FL, July 1999.
[39] H.J. Sips, W. Denissen, and C. van Reeuwijk. Analysis of Local Enumeration and Storage Schemes in HPF. Parallel Computing, 24:355-382, 1998.
[40] P.M.A. Sloot and B.J. Overeinder. Time Warped Automata: Parallel Discrete Event Simulation of Asynchronous CA's. In Proceedings of the Third International Conference on Parallel Processing and Applied Mathematics, number 1593 in Lecture Notes in Computer Science, pages 43-62. Springer-Verlag, Berlin, September 1999.
[41] L. Smarr and C.E. Catlett. Metacomputing. Communications of the ACM, 35(6):44-52, June 1992.
[42] P.F. Spinnato, G.D. van Albada, and P.M.A. Sloot. Performance Analysis of Parallel N-Body Codes. In High-Performance Computing and Networking (HPCN Europe 2000), number 1823 in Lecture Notes in Computer Science, pages 249-260. Springer-Verlag, Berlin, May 2000.


[43] H.J.W. Spoelder, L. Renambot, D. Germans, H.E. Bal, and F.C.A. Groen. Man Multi-Agent Interaction in VR: A Case Study with RoboCup. In IEEE Virtual Reality 2000 (poster), March 2000. The full paper is online at http://www.cs.vu.nl/~renambot/vr/.
[44] G.D. van Albada, J. Clinckemaillie, A.H.L. Emmen, J. Gehring, O. Heinz, F. van der Linden, B.J. Overeinder, A. Reinefeld, and P.M.A. Sloot. Dynamite - Blasting Obstacles to Parallel Cluster Computing. In High-Performance Computing and Networking (HPCN Europe '99), number 1593 in Lecture Notes in Computer Science, pages 300-310. Springer-Verlag, Berlin, April 1999.
[45] R. van Nieuwpoort, J. Maassen, H.E. Bal, T. Kielmann, and R. Veldema. Wide-Area Parallel Computing in Java. In ACM 1999 Java Grande Conference, pages 8-14, San Francisco, CA, June 1999.
[46] C. van Reeuwijk, W.J.A. Denissen, F. Kuijlman, and H.J. Sips. Annotating Spar/Java for the Placements of Tasks and Data on Heterogeneous Parallel Systems. In Proceedings CPC 2000, Aussois, January 2000.
[47] C. van Reeuwijk, A.J.C. van Gemund, and H.J. Sips. Spar: A Programming Language for Semi-automatic Compilation of Parallel Programs. Concurrency Practice and Experience, 9(11):1193-1205, November 1997.
[48] M. van Steen, F.J. Hauck, G. Ballintijn, and A.S. Tanenbaum. Algorithmic Design of the Globe Wide-Area Location Service. The Computer Journal, 41(5):297-310, 1998.
[49] M. van Steen, F.J. Hauck, P. Homburg, and A.S. Tanenbaum. Locating Objects in Wide-Area Systems. IEEE Communications Magazine, pages 104-109, January 1998.
[50] M. van Steen, P. Homburg, and A.S. Tanenbaum. Globe: A Wide-Area Distributed System. IEEE Concurrency, pages 70-78, January-March 1999.
[51] M. van Steen, A.S. Tanenbaum, I. Kuz, and H.J. Sips. A Scalable Middleware Solution for Advanced Wide-Area Web Services. Distributed Systems Engineering, 6(1):34-42, March 1999.
[52] R. Veldema, R.A.F. Bhoedjang, R.F.H. Hofman, C.J.H. Jacobs, and H.E. Bal. Jackal: A Compiler-Supported, Fine-Grained, Distributed Shared Memory Implementation of Java. Technical report, Vrije Universiteit Amsterdam, July 2000.
[53] J. Waldo. Remote Procedure Calls and Java Remote Method Invocation. IEEE Concurrency, pages 5-7, July-September 1998.


[54] R.J. Wijngaarden, H.J.W. Spoelder, R. Surdeanu, and R. Griessen. Determination of Two-Dimensional Current Patterns in Flat Superconductors from Magneto-optical Measurements: An Efficient Inversion Scheme. Phys. Rev. B, 54:6742-6749, 1996.
