[IEEE Comput. Soc. Press 11th International Parallel Processing Symposium - Geneva, Switzerland (1-5...


[IEEE Comput. Soc. Press 11th International Parallel Processing Symposium - Geneva, Switzerland (1-5 April 1997)] Proceedings 11th International Parallel Processing Symposium - Oblivious

Oblivious Routing Algorithms on the Mesh of Buses

Kazuo Iwama
Department of Computer Science and Communication Engineering
Kyushu University, Fukuoka 812-81, Japan
[email protected]

Eiji Miyano
Department of Computer Science and Communication Engineering
Kyushu University, Fukuoka 812-81, Japan
[email protected]

Abstract

An optimal $\lceil 1.5N^{1/2} \rceil$ lower bound is shown for oblivious routing on the mesh of buses, a two-dimensional parallel model consisting of $N^{1/2} \times N^{1/2}$ processors, $N^{1/2}$ row buses and $N^{1/2}$ column buses, but no local connections between neighbouring processors. Many lower bound proofs for routing on mesh-structured models use a single instance (adversary) which includes difficult packet movement. This approach does not work in our case; our proof is the first which exploits the fact that the routing algorithm has to cope with many different instances. Note that the two-dimensional mesh of buses includes $2N^{1/2}$ buses and each processor can access two different buses. The three-dimensional model apparently provides more communication facilities: it includes $3N^{2/3}$ buses and each processor can access three different buses. Surprisingly, however, oblivious routing on the three-dimensional mesh of buses needs more time, i.e., $\Omega(N^{2/3})$ steps, which is another important result of this paper.

1. Introduction

The two-dimensional mesh is widely considered to be a promising parallel architecture because of its scalability. In this architecture, processors are naturally placed at intersections of horizontal and vertical grids, while there can be two different types of communication links. The first type is shown in Figure 1: each processor is connected to its four neighbours, and such a system is called the mesh-connected computer (MC for short). Figure 2 shows the second type: each processor is connected to a pair of (row and column) buses. The system is then called the mesh of buses (MBUS for short). Routing is a basic form of communication among the processors, which is especially important in this type of parallel computer because it takes a relatively long time and there are no obvious algorithms. In the case of MCs including $N^{1/2} \times N^{1/2}$ processors, the tight $2N^{1/2} - 2$ bound for routing has long been known, which even holds under the condition of constant buffer size [GG95, LMT95, RO92].

Figure 1: MC Figure 2: MBUS

In this paper, we prove a $\lceil 1.5N^{1/2} \rceil$ lower bound for routing on the MBUS under the oblivious condition. This lower bound is exact since the same upper bound is already known [IMK96, LS94]. The $2N^{1/2} - 2$ lower bound for MCs follows easily from the physical distance, $2N^{1/2} - 2$, between the two farthest processors. Namely, the bound can be proved using a single instance that includes a packet that has to move between the two most distant processors. The situation is completely different in MBUSs: their physical distance is two, i.e., only two bus-rides are enough to move a packet between any two processors. Instead, the communication width is small, i.e., there are only $N^{1/2}$ horizontal and $N^{1/2}$ vertical buses. This fact combined with the oblivious condition immediately implies a weaker $1.0n$ lower bound (writing $n = N^{1/2}$) using the common bisection argument, again for a single instance in which packets in the left half move to the right half and vice versa. Unfortunately this is the best we can do, since any single instance can be routed in $(n + 1)$ steps [IMK96, LS94]. Thus, to obtain lower bounds better than $1.0n$, we can no longer depend on a single instance. To the best knowledge of the authors, our proof in this paper is the first one that exploits the fact that the algorithm has to cope with many different instances.

1063-7133/97 $10.00 © 1997 IEEE

Recall that our results need the oblivious condition. Suppose that a processor $P$ now holds a packet $z$. Then $P$ has to decide (i) on which link $z$ should be written and (ii) when $P$ should do so. In point-to-point connected systems like MCs and hypercubes, (ii) is less important since there is no concern about data collision. Therefore the usual oblivious condition, originally introduced for such systems, only takes care of (i) [BH85, KKT91, Kri91, RT92]. Namely, decision (i) is determined only by the packet $z$. In the case of MBUSs, (ii) is even more important since preventing data collision on the bus is a major task of routing algorithms. Our oblivious condition also takes care of (ii) and therefore is stronger than the usual one. However, it is very unlikely that the bound can be improved by relaxing this oblivious condition; this point is discussed further in Section 4. Without the oblivious condition, we have to be careful even in claiming seemingly trivial lower bounds [IMK96].

One can naturally think of the model which is equipped with both buses and local connections (MCs with buses). The known lower and upper bounds for this model are 0.691N [CL93] and i N [LS91], respectively.

In both MCs and MCs with buses, the bound for routing decreases to $O(N^{1/3})$ if the model becomes three-dimensional. In the case of MBUSs, the bound increases to $\Omega(N^{2/3})$; this result is another important contribution of this paper. Note that the MBUS can be regarded as a parallel model with restricted shared memory, which has been emerging recently as an implementable model [ABK95, MNV94]. Namely, the two-dimensional MBUS is a PRAM($2N^{1/2}$, 2), which means that the processors can access only $2N^{1/2}$ memory cells and each single processor can access only two memory cells. The three-dimensional MBUS is a PRAM($3N^{2/3}$, 3), which has many more memory cells and better accessibility. Even so, routing becomes harder and there is no obvious way of improving it, e.g., by relaxing the oblivious condition, which is quite surprising.

In the next section, more formal description is given about models, problems and the oblivious condition. The lower bound of the two-dimensional case is proved in Section 3. Section 4 discusses routing on three and more dimensional MBUSs.

2. Models and problems

In this and the next section, we regard the size of an MBUS as $n \times n$ instead of $N^{1/2} \times N^{1/2}$ just for simplicity of description. Thus an MBUS consists of $n^2$ processors $P_{i,j}$, $1 \le i, j \le n$, and $n$ row and $n$ column buses, $ROW_i$ and $COL_j$, respectively. $P_{i,j}$ is connected to $ROW_i$ and $COL_j$. The problem of permutation routing on the MBUS is defined as follows: The input is given by $n^2$ packets that are initially held by the $n^2$ processors, one for each. Each packet $(d, \sigma)$ consists of two portions: $d$ is a destination address that specifies the processor to which the packet should be moved, and the data portion $\sigma$ of the packet is an integer. No two packets have the same destination address. Routing requires that all $n^2$ such packets be moved to their correct destinations. This does not necessarily mean that each packet must be moved to its destination "physically"; it is enough that each destination processor can "create" the packet using whatever information it takes from its row and column buses.

Our discussion throughout this paper is based on the following four rules on the model: (i) We follow the common practice on how to measure the running time of MBUSs: The one-step computation of each processor $P$ consists of (a) reading the current data on both the row and column buses $P$ is connected to, (b) executing arbitrarily complicated instructions using the local memory and (c) if necessary, writing data to the row or column bus, or (possibly different data) to both. The written data will be read in the next step. (ii) Algorithms should be designed so that a collision of data on a single bus will never occur. (iii) The buffer size is not bounded, namely, an arbitrary number of packets can stay on a single processor temporarily. (iv) What can be written on the buses by the processor $P$ must be the packet originally given to $P$ as its input packet or one of the packets that have been read so far by $P$ from its row or column bus. (Nothing other than packets can be written.) The last condition (iv) means that no kind of data compression is allowed. Nevertheless, there is still a lot of room for tricks: For example, we need only $0.97n$ steps for routing some kinds of instances where every packet has to move horizontally [IMK96]. (Since there are $n^2$ packets and $n$ row buses, we need at least $n$ steps if every packet has to ride on some row bus.) As for general routing, however, the following upper bound seems to be the best possible, and the algorithm achieving this upper bound satisfies the oblivious condition.

Theorem 1 [IMK96]. There is a $\lceil 1.5n \rceil$-step oblivious routing algorithm on MBUSs.


In this paper, we prove that this $\lceil 1.5n \rceil$ is also a lower bound. To do so, however, we need to assume the oblivious condition: At some moment, the processor $P$ holds several packets $\alpha_1, \alpha_2, \ldots, \alpha_k$, one of which is an original input packet, in its local memory. Roughly speaking, what a routing algorithm has to do is to decide for each $\alpha_i$ (i) on which bus (row or column) $P$ writes $\alpha_i$ and (ii) in what step $P$ does so. A routing algorithm is said to be oblivious if it makes those two decisions depending only on the destination of $\alpha_i$ and the processor number of $P$, regardless of any other information such as the integer part of $\alpha_i$, when $P$ read it previously, any other packets $P$ holds, etc. Note that some (most) of $\alpha_1, \alpha_2, \ldots, \alpha_k$ are not written on the buses at all. So, we may assume that $P$ holds only packets it will write later. In this case $P$ decides whether it holds $\alpha_i$ or not from the destination of $\alpha_i$ at the moment $P$ reads it.

In the $0.97n$-step routing mentioned previously, there must be a processor $P$ that does not read the packet $\alpha$ whose destination is $P$ but can "create" $\alpha$. For $P$ to do so, it has to exploit all the information on the buses, which was allowed under the basic conditions (i)-(iv). However, such a trick is no longer possible under the oblivious condition.

Lemma 1. Suppose that the destination of a packet $\alpha$ is a processor $P_{i,j}$. Then, under the oblivious condition, $\alpha$ must ride on $ROW_i$ or $COL_j$ before the routing is completed. (The proof is omitted.)

Thus the $1.0n$ lower bound is now apparent for oblivious routing.
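To make the model and the oblivious condition concrete, the following is a minimal Python sketch of a naive schedule that uses time-slots $1, \ldots, 2n$. It is not the $\lceil 1.5n \rceil$-step algorithm of [IMK96]; it is only an illustration under the assumption that $P_{i,j}$ writes its primary packet on $ROW_i$ in step $j$ and that a packet destined for $P_{r,c}$ is forwarded on $COL_c$ in step $n + r$. The function names `route_naive_oblivious` and `check` are hypothetical.

```python
import random

def route_naive_oblivious(n, dest):
    """Route one instance on an n x n MBUS using time-slots 1..2n.

    dest[(i, j)] = (r, c) is the destination of the packet that starts at
    P[i][j] (1-indexed). Returns (writes, delivered), where writes maps
    (bus kind, bus index, step) to the list of packets written there.
    """
    writes = {}
    delivered = set()
    for (i, j), (r, c) in dest.items():
        if (r, c) == (i, j):
            delivered.add((r, c))      # already at its destination
            continue
        # Phase 1: P[i][j] writes on ROW_i in step j.
        # The slot depends on the processor number only.
        writes.setdefault(("ROW", i, j), []).append((r, c))
        if r == i:
            delivered.add((r, c))      # the destination reads it from ROW_i
            continue
        # Phase 2: P[i][c] has read the packet from ROW_i and forwards it
        # on COL_c in step n + r. The slot depends on the destination only.
        writes.setdefault(("COL", c, n + r), []).append((r, c))
        delivered.add((r, c))
    return writes, delivered

def check(n, trials=200, seed=0):
    """Try random permutations and verify rule (ii): no bus collision."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
    for _ in range(trials):
        perm = cells[:]
        rng.shuffle(perm)
        writes, delivered = route_naive_oblivious(n, dict(zip(cells, perm)))
        assert all(len(p) == 1 for p in writes.values())
        assert len(delivered) == n * n
        assert max((step for _, _, step in writes), default=0) <= 2 * n
    return True
```

Both decisions are functions of the destination and the processor number alone, so the schedule is oblivious in the sense defined above. Collisions cannot occur because each row bus has one writer per step in the first phase, and in the second phase each slot on $COL_c$ is reserved for a single destination.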

3. Lower bounds for oblivious routing

In this section, we prove the $\lceil 1.5n \rceil$ lower bound for oblivious routing on the two-dimensional MBUS of $n \times n$ processors.

Theorem 2 (Main Theorem). Any oblivious routing on MBUSs needs $\lceil 1.5n \rceil$ steps.

Proof. We only discuss the case that $n$ is even since the other case is similar. Suppose for contradiction that there is an oblivious algorithm, say $A$, which runs for all instances in at most $1.5n - 1$ steps. Recall that the integer portion of a packet plays no role under the oblivious condition; only its destination portion is important. Hence we can assume that there are only $n^2$ different packets, namely, $n^2$ different destinations, which are denoted by the set $\Gamma$. An instance is a sequence of these $n^2$ destination addresses, $(d_1, d_2, \ldots, d_{n^2})$, where $d_{(i-1)n+j} \in \Gamma$ is originally placed in its source processor $P_{i,j}$ for $1 \le i, j \le n$. In other words, the set $\Sigma$ of instances is a set of permutations over those $n^2$ packets in $\Gamma$. A packet held in its source processor is called a primary packet and a packet in an intermediate processor (other than its source processor) a secondary packet.

Let us review what the oblivious condition means: For a primary packet $\alpha \in \Gamma$ and its source processor $P_{i,j}$, the algorithm $A$ has to compute the following only from $\alpha$ and the position $(i, j)$ of the processor: (i) on which bus ($ROW_i$ or $COL_j$) $P_{i,j}$ writes $\alpha$ and (ii) in which time-slot $P_{i,j}$ does so, where a time-slot denotes an integer in the set $\Lambda = \{1, 2, \ldots, 1.5n - 1\}$. Suppose for example that $\alpha$ is written on $ROW_i$. Then $\alpha$ is read by all the processors on that row and is held in their local memories as a secondary packet. Now each processor decides in which time-slot in $\Lambda \cup \{nil\}$ it writes $\alpha$ on its column bus, where "nil" means it does not write $\alpha$ at any time-slot. ($\alpha$ may be written on the row bus as well by definition, but it does not make sense since $\alpha$ was read from the row bus. Also note that a primary packet may not be written on a row or a column bus at all if its destination is the source processor itself.) Thus two or more processors on $ROW_i$ may write $\alpha$ on the column buses. Again the particular time-slot has to be determined only by $\alpha$ and the processor number.

When considering different instances, a packet $\alpha$ can be placed originally in each of the processors $P_1, P_2, \ldots, P_n$ on $COL_i$. Then $m$ ($0 \le m \le n$) out of these $n$ processors will write $\alpha$ on $COL_i$, and $m$ is uniquely determined under the oblivious condition. For reasons described later, we need special care when $m = 1$, which makes the following definitions necessary: For a packet $\alpha$ and a processor $P$, $P$ is called a singular processor for $\alpha$ wrt column if $\alpha$ is written on the column bus when $\alpha$ is originally placed in $P$ but $\alpha$ is written on the row bus when $\alpha$ is originally placed in any of the other $n - 1$ processors on that column. A singular processor for $\alpha$ wrt row is defined similarly, i.e., by switching "row" and "column." Note that if $P$ is a singular processor for $\alpha$ wrt row, then $P$ is not a singular processor for $\alpha$ wrt column. If $P$ is a singular processor for two or more packets wrt column, we select an arbitrary one of them and call it a permissible packet and the others prohibited packets for the singular processor $P$. Let $\Sigma^-$ be a subset of $\Sigma$ such that if $I \in \Sigma^-$ then $I$ includes no prohibited packets placed in their singular processors. In the following, it is shown that the algorithm $A$ cannot cope with even this subset $\Sigma^-$.
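The definition of a singular processor can be restated as a small check. The sketch below is only an illustration: `first_bus(packet, i, j)` is a hypothetical stand-in for the oblivious decision (i) made at source processor $P_{i,j}$, and it is assumed to return either "row" or "col" for every position (the case where the packet is already at its destination is ignored).

```python
def singular_wrt_column(first_bus, packet, col, n):
    """Return the row index i such that P[i][col] is a singular processor
    for `packet` wrt column, or None if no such processor exists.

    first_bus(packet, i, j) -> "row" or "col" is a hypothetical model of
    the oblivious choice of the first bus at source processor P[i][j].
    """
    col_writers = [i for i in range(1, n + 1)
                   if first_bus(packet, i, col) == "col"]
    # Singular is the case m = 1: exactly one source position on this
    # column sends the packet vertically, and the other n - 1 positions
    # send it horizontally.
    return col_writers[0] if len(col_writers) == 1 else None
```

For example, if the packet is sent vertically only from $P_{2,3}$, then `singular_wrt_column` reports row 2 for column 3 and `None` for every other column.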

Now, fix an instance $I$ in $\Sigma^-$, some row bus $ROW_i$ and a processor $P_j$ on that bus. Then let $S(I, P_j, ROW_i)$ denote the set of packets ($\subseteq \Gamma$) which, when the given instance is $I$, are held by $P_j$ as secondary packets and are to be written on the row bus $ROW_i$ (i.e., their time-slots are not nil). Also, define

$$S(P_j, ROW_i) = \bigcup_{I \in \Sigma^-} S(I, P_j, ROW_i),$$

$$S(ROW_i) = \bigcup_{1 \le j \le n} S(P_j, ROW_i),$$

$$S_{ROW} = \bigcup_{1 \le i \le n} S(ROW_i).$$

Note that $S(P_j, ROW_i)$ can include up to $n^2$ packets even if $S(I, P_j, ROW_i)$ contains only a few packets for each instance $I$. Let $t(\alpha, P_j, ROW_i)$ be the time-slot (uniquely determined) at which the packet $\alpha \in S(P_j, ROW_i)$ is written by $P_j$ on $ROW_i$. Define also

$$T(\alpha, ROW_i) = \bigcup_{1 \le j \le n} \{t(\alpha, P_j, ROW_i)\},$$

$$T(P_j, ROW_i) = \bigcup_{\alpha \in S(P_j, ROW_i)} \{t(\alpha, P_j, ROW_i)\},$$

$$T(ROW_i) = \bigcup_{1 \le j \le n} T(P_j, ROW_i).$$

Lemma 2. Let $\alpha_1$ and $\alpha_2$ be two distinct packets in $S(ROW_i)$. Then $T(\alpha_1, ROW_i) \cap T(\alpha_2, ROW_i) = \emptyset$.

Proof. Consider two time-slots $t(\alpha_1, P_j, ROW_i)$ and $t(\alpha_2, P_k, ROW_i)$. If $j \ne k$, then $t(\alpha_1, P_j, ROW_i) = t(\alpha_2, P_k, ROW_i)$ clearly allows us to create an instance in $\Sigma^-$ in which $\alpha_1$ and $\alpha_2$ collide on $ROW_i$. Suppose that $j = k$. Since we only consider instances in $\Sigma^-$, one can construct a particular instance where $\alpha_1$ and $\alpha_2$ arrive at $P_j$ ($= P_k$) from two distinct processors in the same column. (This is the reason why we consider only $\Sigma^-$; if $\alpha_1$ and $\alpha_2$ had the same singular processor, then we could not obtain such an instance.) Thus $t(\alpha_1, P_j, ROW_i) = t(\alpha_2, P_j, ROW_i)$ also implies a collision. □

Lemma 3. If $|S(ROW_i)| \ge k$, then $|T(ROW_i)| \ge k$. In other words, if $|T(ROW_i)| < k$, then $|S(ROW_i)| < k$. (Immediate from Lemma 2.)

This lemma plays a key role in the following proof and also in the proof of Theorem 3 in the next section. We say that a packet $\alpha$ is placed at a general position if its source processor $P$ is not prohibited and $P$ is not on the same row or column bus as $\alpha$'s destination.

Lemma 4. Suppose that $\alpha \notin S_{ROW}$. Then $\alpha$ must be written on a row bus first if it is placed at a general position.

Proof. If $\alpha$ rides on a column bus first, then it cannot move horizontally, which is a contradiction. □

Now we are entering the main portion of the proof. The following three cases on the number of the time- slots will be considered:

(Case 1) For all $i$, $|T(ROW_i)| \le n - 1$.

(Case 2) For all $j$, $|T(COL_j)| \le n - 1$.

(Case 3) There is at least one row such that $|T(ROW_i)| \ge n$, and at least one column such that $|T(COL_j)| \ge n$.

Case 1. At this moment we temporarily abandon the $1.5n$ lower bound and try to prove a $1.5n - 1$ lower bound, namely, that the algorithm $A$ cannot finish the job within $1.5n - 2$ steps. Let $\Lambda^{-1} = \{1, 2, \ldots, 1.5n - 2\}$. At the end of the proof for this Case 1, we will describe how to strengthen the proof to obtain the original $1.5n$ lower bound. By Lemma 3, the assumption $|T(ROW_i)| \le n - 1$ implies $|S(ROW_i)| \le n - 1$ for all $i$. Then $|S_{ROW}| \le \sum_i |S(ROW_i)| \le n^2 - n$, i.e., at least $n$ packets are not in $S_{ROW}$. We furthermore assume that those $n$ packets are destined for at least two columns. Hence we can denote those $n$ packets by $\alpha_1, \alpha_2, \ldots, \alpha_n$ so that the column destination of $\alpha_i$ is different from the $i$th column for all $1 \le i \le n$. This multi-destination assumption will also be removed shortly.

Now consider the following instance $I_1$: The above packets $\alpha_1, \ldots, \alpha_n$ are placed on $P_{1,1}, \ldots, P_{1,n}$ of the first row, respectively. However, if $\alpha_i$ is prohibited for $P_{1,i}$, then that $\alpha_i$ is replaced by an arbitrary packet not in $\{\alpha_1, \ldots, \alpha_n\}$. As for the other rows, it does not matter how the remaining $n^2 - n$ packets are placed originally. Suppose that there are $v_1$ such prohibited packets among $\{\alpha_1, \ldots, \alpha_n\}$. Then the other $n - v_1$ packets must go horizontally first as primary packets and therefore they occupy $n - v_1$ time-slots. That means we have at most $1.5n - 2 - (n - v_1) = n/2 + v_1 - 2$ time-slots for secondary packets. Formally, $|T(ROW_1)| \le n/2 + v_1 - 2$. We can make a similar argument when $\alpha_1, \ldots, \alpha_n$ are placed on the $i$th row. It follows that $|T(ROW_i)| \le n/2 + v_i - 2$.

Claim 1. $|S_{ROW}| \le n^2/2 - n$.

Proof. Since $|T(ROW_i)| \le n/2 + v_i - 2$, we have $\sum_i |T(ROW_i)| \le n^2/2 - 2n + (v_1 + v_2 + \cdots + v_n)$. However, $(v_1 + \cdots + v_n) \le n$ because there is at most one singular processor for the particular $\alpha_j$ on the $j$th column. (If $P$ is a singular processor wrt column for some packet $\alpha$, then no other processor on that column can be a singular processor for $\alpha$.) Since $\sum_i |T(ROW_i)| \le n^2/2 - n$, it follows that $|S_{ROW}| \le \sum_i |S(ROW_i)| \le n^2/2 - n$ by Lemma 3. □


This claim means there are $n^2/2 + n$ packets which must move horizontally first (unless placed in singular processors). Namely, we were able to increase the number of those moving-horizontally-first packets from $n$ (i.e., $\alpha_1, \ldots, \alpha_n$ originally) to this many. Let $MHF$ be the set of those $n^2/2 + n$ packets.

Remark. We assumed that $\alpha_1, \ldots, \alpha_n$ have at least two different column destinations. Suppose that they all have the same column destination. Then the number of new moving-horizontally-first packets may be less than before, but we can still obtain at least one new such packet, say $\alpha$. The column destination of $\alpha$ must be different from the column destination of $\alpha_1, \ldots, \alpha_n$. Hence we can replace an arbitrary $\alpha_i$ by $\alpha$. Now we can restart the argument using this new $\alpha_1, \ldots, \alpha, \ldots, \alpha_n$.

Now we divide $MHF$ into two disjoint subsets $MHF_{up}$ and $MHF_{dwn}$. $MHF_{up}$ ($MHF_{dwn}$) contains the packets whose row destinations are in the upper half, i.e., rows 1 through $n/2$ (the lower half, i.e., rows $n/2 + 1$ through $n$). Now take $n$ packets from $MHF_{dwn}$ arbitrarily, but so that they cannot be decomposed into $n/2$ pairs $(\alpha_1, \beta_1), \ldots, (\alpha_{n/2}, \beta_{n/2})$ such that the row destinations of $\alpha_i$ and $\beta_i$ are the same for all $i$ and the row destinations of any two different pairs are different (namely, the row destinations of the $n/2$ pairs exactly cover all the rows of the lower half). Such a selection of $n$ packets is always possible if $|MHF_{dwn}| \ge n + 1$.

For these $n$ packets, again denoted by $\alpha_1, \ldots, \alpha_n$, we try to find a row $i$ which includes at most one "bad" position, say $j$. Here, a "bad" position means that (i) the processor $P_{i,j}$ on that row is a singular processor for $\alpha_j$ or (ii) the row destination of $\alpha_j$ is the row $i$ itself.

Claim 2. Such a row (including at most one bad position) can always be found.

Proof. We first try to select the row from the upper half. Since $\alpha_1, \ldots, \alpha_n$ are taken from $MHF_{dwn}$, condition (ii) above for the bad position never happens. So, we are done if there is a row that contains at most one prohibited processor. Suppose otherwise that every row contains at least two prohibited processors. Then there are no prohibited processors in the lower half (see Claim 1). So, we then try to find the desired row from the lower half. Now the condition we imposed when selecting $\alpha_1, \ldots, \alpha_n$ becomes important. Recall that there is at least one packet whose row destination, say $k$, is not shared by any other packet. Hence the row $k$ includes at most one bad position. □

Now we can place $\alpha_1, \ldots, \alpha_n$ on the row which includes at most one bad position. Then at least $n - 1$ packets out of $\alpha_1, \ldots, \alpha_n$ must go horizontally first and then vertically. (At most one packet is at the bad position, where either the processor is singular for it or it does not have to move vertically since its row destination is the current row.) Those $n - 1$ packets should be assigned different time-slots as primary packets, one of which must be $n - 1$ or larger. Let $\alpha$ be the packet which is assigned this largest time-slot. Then the time-slot at which $\alpha$ can move vertically must be $n$ or larger. Thus, from the $n$ packets $\alpha_1, \ldots, \alpha_n$, we can squeeze out one packet whose time-slot for the vertical move is $n$ or later. Let us call this packet a late packet.

After selecting such an $\alpha$, we return the remaining $n - 1$ packets to $MHF_{dwn}$. Then we repeat the same procedure to obtain another late packet. This can be continued until $|MHF_{dwn}|$ becomes $n$. Then we do the same thing for $MHF_{up}$ until $|MHF_{up}|$ becomes $n$. One can see that we have squeezed out $n^2/2 + n - 2n = n^2/2 - n$ late packets, since the original size of $MHF_{up} \cup MHF_{dwn}$ is $n^2/2 + n$.

Now let us squeeze out one more late packet. Recall that there remain $n$ packets $\alpha_1, \ldots, \alpha_n$ in $MHF_{dwn}$ and $\beta_1, \ldots, \beta_n$ in $MHF_{up}$. The worst case is that there are two bad positions no matter which row they are placed on. Suppose, for example, that $\alpha_1$ and $\alpha_2$ are at bad positions in the lower half of the space. Then we try to exchange $\alpha_1$ with $\beta_1$. The position may still be bad since it is prohibited for $\beta_1$. In that case, undo the previous exchange and try a new exchange, i.e., $\alpha_1$ and $\beta_2$. Since the position was prohibited for $\beta_1$, it can never be prohibited for $\beta_2$. Thus the number of bad positions is reduced to one, which allows us to select one late packet.

Now there are $n^2/2 - n + 1$ late packets. However, this implies a contradiction as follows. Recall that the time-slot of late packets is $n$ or more and the largest time-slot in $\Lambda^{-1}$ is $1.5n - 2$. So, the total number of time-slots that are at least $n$ on the $n$ vertical buses is $((1.5n - 2) - (n - 1)) \times n = n^2/2 - n$. This must be the maximum number of late packets.
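The counting that closes Case 1 can be checked mechanically. The following sketch simply evaluates the two quantities compared in the text for even $n$; the function names are hypothetical and the code proves nothing beyond the arithmetic.

```python
def late_slot_capacity(n):
    """Number of vertical time-slots n, n+1, ..., 1.5n - 2 summed over
    all n column buses (n is assumed to be even)."""
    assert n % 2 == 0
    last_slot = 3 * n // 2 - 2          # largest time-slot, 1.5n - 2
    return (last_slot - (n - 1)) * n    # equals n^2/2 - n

def late_packets_forced(n):
    """Number of late packets squeezed out in Case 1: n^2/2 - n + 1."""
    assert n % 2 == 0
    return n * n // 2 - n + 1
```

For every even $n$, `late_packets_forced(n)` exceeds `late_slot_capacity(n)` by exactly one, which is the contradiction used above.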

Remark. We need two modifications in order to strengthen the current $1.5n - 1$ lower bound into the $1.5n$ lower bound. One is to obtain $n^2/2 + n$ packets (the same number as the current one) in $MHF$ even if the largest time-slot increases by one. Review Claim 1: Let $v = v_1 + \cdots + v_n$. Then $\sum_i |T(ROW_i)| \le n^2/2 - n + v$ since the limit of the time-slot is now $1.5n - 1$. Then $|MHF|$ would become $n^2/2 + n - v$, which is less than what we want. The idea is to exploit permissible packets: By selecting prohibited and permissible packets carefully, it is possible to obtain $v$ different permissible packets, and we can also prove that those permissible packets are not in $MHF$ at this moment. We simply add them, which makes the size of $MHF$ increase to $n^2/2 + n$. The second modification is simpler: Recall that we squeezed out one more packet after $|MHF_{up}| = |MHF_{dwn}|$ becomes $n$. We just continue this procedure and can squeeze out $n$ more packets, which matches the increase of $n$ vertical time-slots.

This concludes the proof of Case 1.

Case 2. Everything is the same as Case 1.

Now we are going to investigate Case 3, which is divided further into the following three subcases:

Case 3-1. $|S_{ROW}| \le n^2 - n/2$. Let $MHF$ be the set of $n/2$ packets not in $S_{ROW}$. By the assumption of Case 3, there is at least one row bus, $ROW_i$, such that $|T(ROW_i)| \ge n$. If there exist only $n/2 - 1$ singular processors or less on $ROW_i$, then there exist at least $n/2 + 1$ positions where we can place primary packets in $MHF$. Among these positions, at most one position might be bad, but at least $n/2$ packets can be placed as primary packets. That means we need $1.5n$ steps. Suppose otherwise that $ROW_i$ has $n/2 + u$ ($u \ge 0$) singular processors. Then we can conclude that there are $n/2 + u$ permissible packets $\beta_1, \ldots, \beta_{n/2+u}$ (here we need the same care as mentioned in the remark above when we determine permissible and prohibited packets). Then the $n$ packets in $MHF \cup \{\beta_1, \ldots, \beta_{n/2}\}$ can act as the first $n$ packets $\alpha_1, \ldots, \alpha_n$ of Case 1. The rest of the argument is exactly the same as in Case 1.

Case 3-2. $|S_{COL}| \le n^2 - n/2$. Similar to Case 3-1.

Case 3-3. $|S_{ROW}| \ge n^2 - n/2 + 1$ and $|S_{COL}| \ge n^2 - n/2 + 1$. By Lemma 3, $\sum_i |T(ROW_i)| \ge |S_{ROW}| \ge n^2 - n/2 + 1$ and $\sum_i |T(COL_i)| \ge n^2 - n/2 + 1$. Hence the number of total time-slots for secondary packets for all the buses is at least $2n^2 - n + 2$. Since there are $n^2$ time-slots for primary packets, we need at least $3n^2 - n + 2$ time-slots in total. However, we actually have $2n \times (1.5n - 1) = 3n^2 - 2n$ of them, a contradiction. □
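The slot counting of Case 3-3 can be replayed numerically. This is only an arithmetic check of the quantities quoted in the text for even $n$; the helper name `case_3_3_counts` is hypothetical.

```python
def case_3_3_counts(n):
    """Return (needed, available) time-slots in Case 3-3 for even n."""
    assert n % 2 == 0
    # Secondary packets: at least n^2 - n/2 + 1 slots on the row buses
    # and the same number on the column buses.
    secondary = 2 * (n * n - n // 2 + 1)
    primary = n * n                        # slots for primary packets
    needed = secondary + primary           # 3n^2 - n + 2
    available = 2 * n * (3 * n // 2 - 1)   # 2n buses, 1.5n - 1 slots each
    return needed, available
```

The gap `needed - available` equals $n + 2$, so the requirement exceeds the supply for every even $n$.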

4. Higher dimensional routing

In this section, we first discuss the routing problem on $\delta$-dimensional $N^{1/\delta} \times N^{1/\delta} \times \cdots \times N^{1/\delta}$ MBUSs under the current oblivious condition:

Lemma 5. For oblivious routing on $\delta$-dimensional MBUSs ($\delta$ is a constant) there is at least one bus on which $\Omega(N^{(\delta-1)/\delta})$ secondary packets must appear when considering the whole set $\Sigma$ of instances.

Proof. Fix a processor P , and see what happens when we place each of N packets on P one by one. Then each of N - 1 packets is to be written on one

of the $\delta$ buses (one of the $N$ packets is destined for $P$ itself). On average, $(N-1)/\delta$ packets ride on a single bus. Namely, at least one bus, say $B$, carries at least that number of primary packets when handling all the instances. Since $(N-1)/\delta$ packets move from $P$ to the other $N^{1/\delta} - 1$ processors through $B$, one of those $N^{1/\delta} - 1$ processors, say $Q$, holds at least $\frac{N-1}{\delta(N^{1/\delta}-1)}$ packets. Even if the destination of one of those packets might be $Q$, the other $\frac{N-1}{\delta(N^{1/\delta}-1)} - 1$ packets have to go out as secondary packets through the $\delta - 1$ buses other than $B$. On average, $\left(\frac{N-1}{\delta(N^{1/\delta}-1)} - 1\right)/(\delta - 1)$ secondary packets appear per bus. Thus there must be at least one bus that has to carry at least $cN^{1-1/\delta}$ secondary packets for some constant $c$. □

Theorem 3. Oblivious routing on $\delta$-dimensional MBUSs ($\delta$ is a constant) needs $\Omega(N^{(\delta-1)/\delta})$ steps.

Proof. Immediate from Lemmas 3 and 5. Although we did not consider the singular processors, doing so can only change the constant factor if $\delta$ is a constant. □

It is not hard to obtain an oblivious routing algorithm that runs within the same time bound, i.e., this lower bound is also an upper bound. One can see that the time bound rises as $\delta$ increases. The reason is as follows. Consider some bus $B$. In the second phase of the algorithm, namely after each processor has sent its primary packet, the number of different packets that can reach the processors on bus $B$ can be as large as approximately $N/N^{1/\delta}$ if we take all the permutations into account. Hence we have to assign different time-slots to those packets in order to prevent a collision on the bus. However, if we consider a particular single instance, the number of packets that reach a single processor on bus $B$ is at most $N^{1/\delta}$. The difference between $N/N^{1/\delta}$ and $N^{1/\delta}$ vanishes if $\delta = 2$ but grows as $\delta$ increases; the former becomes much larger than the latter.

One can now think of a different strategy: instead of giving a time-slot to each packet on $B$, we give a sufficient number of time-slots to each processor on $B$. An oblivious routing algorithm based on this strategy is already known for hypercube networks [BH85], where “oblivious” no longer means our current condition.

Theorem 4 [BH85]. There is a deterministic oblivious algorithm for the $N$-processor hypercube that routes any permutation in time $O(N^{1/2})$.

In [KKT91], Kaklamanis et al. improved the upper bound to $O(N^{1/2}/\log N)$ and proved that it is also a lower bound. The oblivious condition in these papers requires that the route of each packet be determined only by its initial position, i.e., its route is not affected by other packets. Historically, this oblivious condition was introduced for point-to-point interconnection networks such as MCs and hypercubes, where there is no concern about the collision of two or more packets on communication links. Note that our current oblivious condition also satisfies this route-oblivious condition. However, this condition is clearly not sufficient for the purpose of preventing collisions. That is why another condition, about when each packet can be written on the bus, is introduced; it was successful in the sense that we were able to prove a lower bound that matches the best known upper bound. Unfortunately, the new condition seems to be an obstacle when the dimension increases.
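To see numerically how the dimension hurts, the quantities used in the averaging argument and in the discussion of bus $B$ can be evaluated for small cases. The following is only a sketch: the function names are hypothetical, and the averaging formula is assumed to read $\left(\frac{N-1}{\delta(N^{1/\delta}-1)} - 1\right)/(\delta-1)$, which is the per-bus average of secondary packets in the proof preceding Theorem 3.

```python
def bus_length(n, delta):
    """Number of processors on one bus, N^(1/delta); n must be a perfect power."""
    side = round(n ** (1.0 / delta))
    assert side ** delta == n, "n must be a perfect delta-th power"
    return side


def secondary_load_per_bus(n, delta):
    """Average number of secondary packets per bus in the averaging argument."""
    side = bus_length(n, delta)
    held_by_q = (n - 1) / (delta * (side - 1))  # packets that some Q must hold
    return (held_by_q - 1) / (delta - 1)        # spread over the other buses


def packet_counts(n, delta):
    """Return the two counts compared in the text for a bus B:
    (packets that may reach B over all permutations, about N / N^(1/delta),
     packets that reach one processor on B in a single instance, N^(1/delta))."""
    side = bus_length(n, delta)
    return n // side, side


if __name__ == "__main__":
    for n, delta in [(256, 2), (4096, 3), (65536, 4)]:
        print(n, delta, secondary_load_per_bus(n, delta), packet_counts(n, delta))
```

For $\delta = 2$ the two counts returned by `packet_counts` coincide, while for $\delta = 3$ and $N = 4096$ they are already 256 and 16, which is the gap that the per-processor time-slot strategy tries to exploit.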

The idea of [BH85] can be used to develop similar algorithms on multi-dimensional MBUSs.

Theorem 5. There are $O(N^{(m+1)/(2m)})$ and $O(N^{(m+1)/(2m+1)})$ step oblivious routing algorithms on $2m$- and $(2m+1)$-dimensional MBUSs for any constant $m$, respectively.

Now we can summarize the two different strategies to prevent collision: (i) Different time-slots are allocated to packets of different destinations. (ii) Each processor is provided with a sufficient number of time-slots, i.e., as many as the maximum number of packets that can gather at that processor. An obvious attempt is to make the algorithm exploit those two strategies dynamically. It actually works:

Theorem 6. There are $O(N^{1/2})$ step oblivious routing algorithms on $\delta$-dimensional MBUSs for any even $\delta$. (The proof is omitted.)

For specific values of $\delta$, see Table 1. The bound is still not so good for odd $\delta$'s, especially for $\delta = 3$, which will be an interesting topic for future research.

Table 1: Upper bounds on $\delta$-dimensional MBUSs
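The entries of Table 1 are not reproduced here. As a rough aid, the sketch below evaluates the exponents of $N$ stated in the theorems for small dimensions. It assumes that Theorem 5's exponents read $(m+1)/(2m)$ and $(m+1)/(2m+1)$ and that Theorem 6's exponent is $1/2$; the function names are hypothetical, and the output is not claimed to match the paper's table. Note that Theorem 3 and Theorems 5 and 6 use different oblivious conditions, so an upper-bound exponent below the lower-bound exponent is not a contradiction.

```python
from fractions import Fraction


def lower_bound_exponent(delta):
    """Exponent of N in Theorem 3, Omega(N^((delta-1)/delta))."""
    return Fraction(delta - 1, delta)


def upper_bound_exponent(delta):
    """Smallest exponent of N given by Theorems 5 and 6 (as read here), delta >= 2."""
    m, odd = divmod(delta, 2)
    if odd:
        # delta = 2m + 1: only Theorem 5 applies
        return Fraction(m + 1, 2 * m + 1)
    # delta = 2m: Theorem 5 gives (m+1)/(2m), Theorem 6 gives 1/2
    return min(Fraction(m + 1, 2 * m), Fraction(1, 2))


if __name__ == "__main__":
    for delta in range(2, 8):
        print(delta, lower_bound_exponent(delta), upper_bound_exponent(delta))
```

Under these readings the odd-dimensional exponent $(m+1)/(2m+1)$ is farthest from $1/2$ at $\delta = 3$, where it equals $2/3$, which agrees with the remark that $\delta = 3$ is the least satisfactory case.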

5. Concluding remarks

In this paper, we proved the $\lceil 1.5N^{1/2} \rceil$ lower bound for permutation routing on the two-dimensional MBUS. To do so, we needed the rather strong oblivious condition. However, we believe that the bound does not change if the condition is relaxed to a reasonable extent. Unfortunately, it appears to be very hard to obtain the same lower bound without the current form of the oblivious condition. So, what we can do in the future might be to show that our oblivious condition is necessary for fast routing algorithms on the two-dimensional MBUS.

References

[ABK95] M. Adler, J.W. Byers and R.M. Karp, “Parallel sorting with limited bandwidth,” In Proc. ACM Symp. on Parallel Algorithms and Architectures (1995) 129-136.

[BH85] A. Borodin and J.E. Hopcroft, “Routing, merging, and sorting on parallel models of computation,” J. Computer and System Sciences 30 (1985) 130-145.

[CL93] S. Cheung and F.C.M. Lau, “A lower bound for permutation routing on two-dimensional bused meshes,” Information Processing Letters 45 (1993) 225-228.

[GG95] Q.P. Gu and J. Gu, “Two packet routing algorithms on a mesh-connected computer,” IEEE Trans. on Parallel and Distributed Systems, Vol. 6, No. 4 (1995) 436-440.

[IK89] K. Iwama and Y. Kambayashi, “An O(log n) parallel connectivity algorithm on the mesh of buses,” In Proc. 11th IFIP World Computer Congress (1993) 225-228.

[IMK96] K. Iwama, E. Miyano and Y. Kambayashi, “Routing problems on the mesh of buses,” J. Algorithms 20 (1996) 613-631.

[KKT91] C. Kaklamanis, D. Krizanc and A. Tsantilas, “Tight bounds for oblivious routing in the hypercube,” Math. Systems Theory 24 (1991) 223-232.

[Kri91] D. Krizanc, “Oblivious routing with limited buffer capacity,” J. Computer and System Sciences 43 (1991) 317-327.

[LMT95] F.T. Leighton, F. Makedon and I. Tollis, “A 2n-2 step algorithm for routing in an n x n array with constant queue sizes,” Algorithmica 14 (1995) 291-304.

[LS91] L.Y.T. Leung and S.M. Shende, “Packet routing on square meshes with row and column buses,” In Proc. ACM Symp. on Parallel Algorithms and Architectures (1989) 328-335.

[LS94] L.Y.T. Leung and S.M. Shende, “On multidimensional packet routing for meshes with buses,” J. Parallel and Distributed Computing 20 (1994) 187-197.

[MNV94] Y. Mansour, N. Nisan and U. Vishkin, “Trade-offs between communication throughput and parallel time,” In Proc. ACM Symp. on Theory of Computing (1994) 372-381.

[RO92] S. Rajasekaran and R. Overholt, “Constant queue routing on a mesh,” J. Parallel and Distributed Comput. 15 (1992) 160-166.

[RT92] S. Rajasekaran and T. Tsantilas, “Optimal routing algorithms for mesh-connected processor arrays,” Algorithmica 8 (1992) 21-38.
