[ieee comput. soc. press 11th international parallel processing symposium - genva, switzerland (1-5...



Page 1: [IEEE Comput. Soc. Press 11th International Parallel Processing Symposium - Genva, Switzerland (1-5 April 1997)] Proceedings 11th International Parallel Processing Symposium - Characterization

Characterization of Deadlocks in Interconnection Networks*

Sugath Warnakulasuriya Timothy Mark Pinkston SMART Interconnects Group

EE-System Dept., University of Southern California, Los Angeles, CA 90089-2562 (sugath@scJ, tpink@charity).usc.edu; http://www.usc.edu/dept/ceng/pinkston/SMART.html

Abstract

Deadlock-free routing algorithms have been developed recently without fully understanding the frequency and characteristics of deadlocks. Using a simulator capable of true deadlock detection, we measure a network’s susceptibility to deadlock due to various design parameters. The effects of bidirectionality, routing adaptivity, virtual channels, buffer size and node degree on deadlock formation are studied. In the process, we provide insight into the frequency and characteristics of deadlocks and the relationship between routing flexibility, blocked messages, resource dependencies and the degree of correlation needed to form deadlock.

1 Introduction

Interconnection network routing algorithms aim to minimize message blocking by efficiently utilizing network virtual channel and physical channel resources while ensuring deadlock freedom. Routing approaches to accomplish this can be based on avoiding deadlock or on recovering from deadlock. The main distinction between these two approaches is the decision made in trading off routing freedom and deadlock formation.

Avoidance-based routing algorithms enforce certain routing restrictions in order to avoid deadlocks altogether [1, 2, 3]. Recovery-based routing algorithms relax routing restrictions and recover from potential deadlock situations [4, 5]. The circumstances under which either routing approach is preferable depend critically on the frequency with which deadlocks occur and the resulting effects. For instance, deadlock may be so infrequent for a particular network configuration that avoidance-based routing inefficiently uses network resources, resulting in frequent message blocking. On the other hand, deadlock may be so frequent and costly in some network configurations that avoidance-based routing outperforms recovery-based routing.

This paper precisely quantifies the frequency and characteristics of deadlock formation in wormhole and cut-through k-ary n-cube networks and identifies network design parameters which influence deadlock formation. This enables us to better understand the nature of deadlocks and their likelihood and to determine the circumstances under which routing algorithms should be based on recovery as opposed to avoidance.

*This research was supported by an NSF Research Initiation Award, grant ECS-9411587, and an NSF Career Award, grant ECS-9624251.

In accomplishing this, we analyze the effects of different traffic patterns, bidirectionality, routing adaptivity, node degree, number of virtual channels and buffer depth on the frequency and characteristics of deadlocks. To our knowledge, no other study of router-related deadlock in interconnection networks has been performed to the detail presented here.

In the next section, we classify deadlocks through example. Section 3 presents the experiments we performed and the results. Section 4 presents related work, and important findings are summarized in Section 5.

2 Deadlock Formation

Deadlocks in interconnection networks can occur as a result of cyclic resource dependencies formed when messages hold onto some resources (i.e., virtual channels) while waiting to acquire others. As a message progresses through a network, it acquires exclusive ownership of a virtual channel (VC) prior to each hop. When the header flit of a message blocks, it can be thought of as requesting the exclusive use of one of possibly many alternative VCs in order to progress to the next hop. A blocked message resumes once a new VC is acquired. As the tail of a message moves through the network, it releases previously acquired VCs that are no longer needed, so they can become available for other messages. The “exclusive ownership” and “resource wait-for” conditions, along with the condition that messages are not preempted, make cyclic dependencies and deadlock possible.

2.1 Depicting Deadlocks

We use channel wait-for graphs (CWGs) [6] to model resource dependencies within interconnection networks. Although similar to dependency graphs used in previous work [3, 7, 8], these graphs depict network state reflecting resource allocations and requests existing at a particular point in time, not the resource allocations allowed by the routing algorithm. Hence, in this context, CWGs depicting the entire network state are not necessarily connected.

Figures 1 through 4 show examples of messages being routed in k-ary n-cube wormhole networks, along with the corresponding CWGs. In the network illustrations (Figures 1a, 2a, 3a and 4a), the source and destination nodes of message mi are labeled si and di, respectively. VC labeling in these figures is done only to facilitate explanation and is not intended to convey information regarding the relative positions of VCs within the network. In the CWGs (Figures

1063-7133/97 $10.00 © 1997 IEEE


Figure 1. (a) A "single-cycle deadlock" for DOR with 1 VC. (b) The CWG contains a knot.

1b, 2b, 3b and 4b), vertices represent VCs. Outgoing arc(s) at each vertex are labeled with the message which currently owns that VC. A path formed by a series of solid arcs with the same label shows the temporal order in which VCs were acquired and continue to be owned by a particular message. Blocked messages are represented by connecting the ends of such paths to one or more desired VCs using dashed arcs. At any vertex, the labels of incoming dashed arcs represent the group of messages that desire to use that VC at this instant in time. Only those portions of the network's CWGs useful for illustrative purposes are shown in these figures.

Figure 1a shows five messages (m1, m2, m3, m4, and m5) being routed statically in dimension order within a torus-connected network with one VC. Note that messages m1, m2, and m3 are blocked while messages m4 and m5 have acquired all of the channels needed to reach their destinations. Message m1 has acquired channels vc1 and vc2, and requires vc3 to continue. Similarly, message m2 has acquired channels vc3, vc4, and vc5, and requires vc6 to continue; message m3 has acquired channels vc6, vc7, and vc0, and requires vc1 to continue. Thus, each of these blocked messages will wait indefinitely for one of the other messages in the group to release an owned VC.

Figure 1b shows the CWG for the scenario in Figure 1a. There is a single cycle in this graph consisting of vertices vc0, vc1, vc2, vc3, vc4, vc5, vc6 and vc7. Given the set of all resources involved in this cycle, R = {vc0, vc1, vc2, vc3, vc4, vc5, vc6, vc7}, observe that the set of vertices that can be reached by each and every member of R is R itself. This type of relationship formed by vertices in one or more cycles is referred to as a knot [9]. Assuming that the routing function is connected, a knot is a necessary and sufficient condition for deadlock [6].
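The knot condition can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation: the adjacency-dictionary encoding and the function names `reachable` and `is_knot` are assumptions made for illustration, and the arcs are transcribed from the description of Figure 1 in the text.

```python
def reachable(graph, start):
    """Vertices reachable from `start` by following one or more arcs."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return seen


def is_knot(graph, candidate):
    """True if every vertex in `candidate` reaches exactly `candidate`."""
    candidate = set(candidate)
    return bool(candidate) and all(
        reachable(graph, v) == candidate for v in candidate
    )


# Arcs as described for Figure 1: ownership paths plus one request per message.
cwg = {
    "vc1": ["vc2"], "vc2": ["vc3"],                  # m1 owns vc1, vc2; requests vc3
    "vc3": ["vc4"], "vc4": ["vc5"], "vc5": ["vc6"],  # m2 owns vc3..vc5; requests vc6
    "vc6": ["vc7"], "vc7": ["vc0"], "vc0": ["vc1"],  # m3 owns vc6, vc7, vc0; requests vc1
}
R = {f"vc{i}" for i in range(8)}

print(is_knot(cwg, R))                # True
print(is_knot(cwg, {"vc1", "vc2"}))   # False: vc1 and vc2 reach all of R
```

A proper subset of R fails the test because its members reach vertices outside the subset, which is why the whole cycle, and nothing smaller, forms the knot here.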

2.2 Classifying Deadlocks

2.2.1 Single-Cycle Deadlocks

Deadlock can be characterized by its deadlock set, resource set, and knot cycle density. The deadlock depicted in Figure 1 is what we refer to as a single-cycle deadlock. In this example, the deadlock involves 3 messages in its deadlock set {m1, m2, m3}, occupies 8 channels in its resource set {vc0, vc1, vc2, vc3, vc4, vc5, vc6, vc7}, and has a knot cycle density of one cycle (true of all single-cycle deadlocks).

Figure 2. (a) A "single-cycle deadlock" for minimal adaptive routing with 1 VC. (b) The CWG contains a knot.
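As an illustration of these three measures, the sketch below computes them for the Figure 1 scenario. It is a hedged example rather than the paper's tooling: the function names, the `owner` mapping and the brute-force cycle counter are all assumptions, and the counter is only practical for small graphs.

```python
def count_simple_cycles(graph, vertices):
    """Count simple directed cycles that stay inside `vertices`.

    Each cycle is counted once, from its lowest-ranked vertex.
    """
    order = sorted(vertices)
    rank = {v: i for i, v in enumerate(order)}
    count = 0

    def extend(start, v, on_path):
        nonlocal count
        for w in graph.get(v, ()):
            if w == start:
                count += 1
            elif w in rank and rank[w] > rank[start] and w not in on_path:
                extend(start, w, on_path | {w})

    for s in order:
        extend(s, s, {s})
    return count


def characterize(graph, owner, knot):
    """Return (deadlock set, resource set, knot cycle density)."""
    deadlock_set = {owner[v] for v in knot}
    resource_set = {v for v, m in owner.items() if m in deadlock_set}
    return deadlock_set, resource_set, count_simple_cycles(graph, knot)


# Figure 1 as described in the text (arcs: ownership paths plus requests).
cwg = {
    "vc1": {"vc2"}, "vc2": {"vc3"}, "vc3": {"vc4"}, "vc4": {"vc5"},
    "vc5": {"vc6"}, "vc6": {"vc7"}, "vc7": {"vc0"}, "vc0": {"vc1"},
}
owner = {"vc1": "m1", "vc2": "m1",
         "vc3": "m2", "vc4": "m2", "vc5": "m2",
         "vc6": "m3", "vc7": "m3", "vc0": "m3"}
knot = set(cwg)

messages, resources, density = characterize(cwg, owner, knot)
print(sorted(messages), len(resources), density)  # ['m1', 'm2', 'm3'] 8 1
```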

Single-cycle deadlocks are more likely to occur in networks having minimal resources and/or highly restrictive routing options on available resources. As in the above example (Figure 1) of a torus network with one VC that allows only non-adaptive (static) dimension ordered routing, the routing function returns at most a single channel option. This is reflected in the CWG by a single dashed outgoing arc at any vertex in Figure 1b (maximum "fan-out" of one). In such a network, a single cycle is sufficient to form a knot. However, for this to occur, a correlated resource dependency among multiple messages must form.

Single-cycle deadlocks are also possible in networks which use less restrictive routing (e.g., minimal adaptive routing with only one VC) when only one routing option is available to all messages comprising the deadlock set (e.g., due to faulty links or routing in the destination's dimension). An example is illustrated in Figures 2a and 2b. Here, each of the messages m1, m2, m3 and m4 has acquired 2 VCs, has exhausted its routing adaptivity, and is therefore waiting to acquire the one channel needed to reach its destination. However, the required channels are already owned by members of this group of messages. The CWG (Figure 2b) contains a single cycle, and the vertices in this cycle form a knot, R = {vc1, vc3, vc5, vc7}. Hence, with a knot cycle density of one, this too is a single-cycle deadlock; its deadlock set contains 4 messages {m1, m2, m3, m4} and its resource set includes 8 channels {vc0, vc1, vc2, vc3, vc4, vc5, vc6, vc7}.

This single-cycle deadlock not only requires all of the messages in the deadlock set to have exhausted their adaptivity, but also to own all of the resources needed by other messages in the deadlock set. Therefore, an even higher degree of correlation of message resource dependency is required for this type of deadlock to occur.

In this example, message m5 has acquired vc8 and vc9, and is waiting for a VC owned by message m3, which is involved in the deadlock. Although the message is not able to proceed until the deadlock is resolved, it is not considered to be in the deadlock set as its resources do not meet the condition for participation in a knot as described previously. This type of message is referred to as a dependent message and is distinguished from those messages actually in the deadlock set. The usefulness of this distinction is evident when developing deadlock detection mechanisms for recovery-based routers. The detection mechanism must be careful not to incorrectly identify dependent messages as being among those properly in the deadlock set, as removing them from the network will not resolve the deadlock. Moreover, dependent messages may be transient in that they may be able to proceed using an alternate resource not owned by one of the messages in the deadlock set.

Figure 3. (a) A "multi-cycle deadlock" for minimal adaptive routing with 2 VCs. (b) The CWG contains a knot.

Figure 4. (a) A "cyclic non-deadlock" for minimal adaptive routing with 2 VCs. (b) The CWG does not contain a knot.
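The distinction between deadlocked and dependent messages can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function name `classify_messages` is hypothetical, the assignment of particular VCs to particular messages is assumed (the text gives only the counts and the knot), and m5 is assumed to request vc5. Only direct dependents are found; messages waiting on other dependents are not traced.

```python
def classify_messages(owner, wants, knot):
    """Split blocked messages into the deadlock set and direct dependents.

    owner: VC -> message that holds it
    wants: blocked message -> VCs it is requesting
    knot:  set of VCs forming the knot
    """
    deadlock_set = {owner[v] for v in knot}
    held = {v for v, m in owner.items() if m in deadlock_set}
    dependent = {
        m for m, requested in wants.items()
        if m not in deadlock_set and any(v in held for v in requested)
    }
    return deadlock_set, dependent


# Assumed labeling consistent with the Figure 2 description.
owner = {"vc0": "m1", "vc1": "m1", "vc2": "m2", "vc3": "m2",
         "vc4": "m3", "vc5": "m3", "vc6": "m4", "vc7": "m4",
         "vc8": "m5", "vc9": "m5"}
wants = {"m1": ["vc3"], "m2": ["vc5"], "m3": ["vc7"], "m4": ["vc1"],
         "m5": ["vc5"]}  # m5's requested VC is an assumption
knot = {"vc1", "vc3", "vc5", "vc7"}

deadlocked, dependent = classify_messages(owner, wants, knot)
print(sorted(deadlocked), sorted(dependent))
# ['m1', 'm2', 'm3', 'm4'] ['m5']
```

A recovery mechanism built on such a classification would pick its victim from the first set only, since removing m5 leaves the knot intact.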

2.2.2 Multi-Cycle Deadlocks

Figures 3a and 3b depict the network and the CWG for a more complex example of a deadlock, one comprised of multiple resource dependency cycles. This network uses minimal adaptive routing and two VCs per physical channel. Once again, all messages (m1 ... m8) have exhausted their adaptivity and are blocked. Each message is waiting to acquire one of two VCs needed to continue routing, both of which are owned by other members of the group. There are multiple unique cycles in the CWG. The set of all vertices involved in this group of cycles, R = {vc1, vc3, vc5, vc7, vc9, vc11, vc13, vc15}, meets the requirement for a knot. This is an example of what is referred to as a multi-cycle deadlock; its deadlock set has 8 messages {m1 ... m8}, its resource set has 16 VCs {vc0, vc1, ..., vc15}, and its knot cycle density is 24 cycles.

CWGs similar to Figure 3b, where there are multiple outgoing dashed arcs per blocked message (fan-out > 1), are indicative of networks which allow a greater degree of routing flexibility (e.g., provide multiple VCs per physical channel, allow adaptive routing, etc.). Given that the messages in this example have exhausted their adaptivity, the vertices with a fan-out of two in Figure 3b correspond to a routing relation that supplies two alternative resources for each of the blocked messages. Should messages have blocked prior to exhausting their adaptivity, vertices with larger fan-out (i.e., 4) would exist in the graph. As can be seen by this example, the fan-out of vertices in the CWG, which is determined by routing adaptivity and the number of VCs per physical channel, greatly influences the number of unique cycles that can form. More importantly, increasing the routing flexibility exponentially increases the degree of correlation of resource dependency required for multiple cycles to form knots.
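The fan-out values quoted above (one, two and four) follow from a simple product, sketched below in Python. The model is an assumption for illustration: the function names are hypothetical, routing is taken to be minimal with unrestricted use of all VCs, and the case where both directions of a torus dimension are equally short is ignored.

```python
def productive_dimensions(current, destination):
    """Dimensions in which the message still has hops to make."""
    return sum(c != d for c, d in zip(current, destination))


def fan_out(current, destination, vcs_per_channel, adaptive=True):
    """Number of alternative VCs a blocked header may request."""
    dims = productive_dimensions(current, destination)
    if not adaptive:
        dims = min(dims, 1)  # static routing offers one dimension at a time
    return dims * vcs_per_channel


print(fan_out((0, 0), (3, 5), 1, adaptive=False))  # 1: DOR with 1 VC (Figure 1)
print(fan_out((0, 5), (3, 5), 2))                  # 2: adaptivity exhausted, 2 VCs
print(fan_out((0, 0), (3, 5), 2))                  # 4: two dimensions, 2 VCs
```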

2.2.3 Cyclic Non-Deadlocks

A scenario in which multiple cycles exist but which does not result in deadlock (referred to as cyclic non-deadlock) is depicted in Figures 4a and 4b. This is similar to the previous example except that message m4's destination is changed, allowing it to acquire the required VCs on its way to its destination. There are 8 unique cycles in the CWG. Given the set of all vertices in this group of cycles {vc1, vc3, vc5, vc9, vc11, vc13, vc15}, note that vertices vc7 and vc16 are reachable from members of this set, but the opposite does not hold. This set (or any subset thereof) does not meet the conditions for a knot; therefore, there is no deadlock in this network. This is because message m4 may eventually reach its destination and subsequently release vc7, which will allow one of the two messages waiting for this channel (m3 or m7) to continue. Other messages will then be able to proceed in a similar fashion.
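The draining argument in this paragraph can be mimicked with a small fixed-point computation. The sketch below is not the knot test itself but a swapped-in, optimistic reduction: it assumes that any message able to obtain one of its requested VCs eventually delivers and releases everything it holds. The function name `drainable` and the toy labels (vcA, vcB, vcC, vcE) are hypothetical and do not reproduce Figure 4.

```python
def drainable(owner, wants):
    """Messages that can eventually proceed.

    owner: VC -> message holding it
    wants: blocked message -> alternative VCs it requests
    Messages that hold VCs but are absent from `wants` are unblocked.
    """
    able = {m for m in owner.values() if m not in wants}
    changed = True
    while changed:
        changed = False
        for m, requested in wants.items():
            if m in able:
                continue
            # A request is satisfiable if the VC is free or its owner drains.
            if any(v not in owner or owner[v] in able for v in requested):
                able.add(m)
                changed = True
    return able


owner = {"vcA": "m1", "vcB": "m2", "vcC": "m3", "vcE": "m4"}

# m1 -> m2 -> m3 -> m1 is a cycle, but m3 has an alternative held by unblocked m4.
with_escape = {"m1": ["vcB"], "m2": ["vcC"], "m3": ["vcA", "vcE"]}
print(sorted(drainable(owner, with_escape)))  # ['m1', 'm2', 'm3', 'm4']

# Remove the alternative and the same cycle becomes a deadlock.
no_escape = {"m1": ["vcB"], "m2": ["vcC"], "m3": ["vcA"]}
print(sorted(drainable(owner, no_escape)))    # ['m4']
```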

This example confirms the notion that cycles are necessary but not sufficient for deadlock, as was concluded by Duato [7]. Resource dependency graphs of deadlock avoidance algorithms based on Duato's framework may have cycles but will always have an escape resource to avoid deadlock (such as vc7 in Figure 4b). The elimination of these cycles as required by some avoidance-based routing schemes is therefore overly restrictive. Similarly, eliminating cycles in a packet wait-for graph [10] to avoid deadlock is also overly restrictive: the packet wait-for graph for this example clearly contains cycles, yet no deadlock exists.

In summary, single-cycle deadlocks are possible in networks which have a single channel resource and limited adaptivity defined on that resource (due to static routing or exhausted adaptivity). Multi-cycle deadlocks involving highly correlated message resource dependencies are possible in networks using multiple resources and which allow greater routing adaptivity over those resources. It has been shown that the number of blocked messages (number of vertices which have outgoing dashed arcs) and the flexibility in routing (fan-out of these vertices) greatly influence the formation of cycles [11]. However, deadlock occurs only when a group of cycles form a knot.

3 Deadlock Characterization

Our approach for precisely detecting deadlocks is based on a theoretical framework which defines a deadlock as a knot within a CWG [6]. We implement a deadlock detection algorithm that is able to identify knots within the CWG of an ongoing network simulation. The deadlock detection algorithm involves maintaining a CWG, detecting cycles within this graph, and identifying groups of cycles which form knots. It is implemented in a flit-level simulator called FlexSim (an extension of FlitSim 2.0).
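One way to picture this bookkeeping is sketched below in Python. It is a hypothetical illustration, not FlexSim's interface: the class `ChannelWaitForGraph` and its method names are invented for this example, and knots are found by repeated reachability searches rather than by the cycle-grouping procedure the authors describe.

```python
from collections import defaultdict


class ChannelWaitForGraph:
    """Hypothetical CWG bookkeeping driven by simulator events."""

    def __init__(self):
        self.owned = defaultdict(list)  # message -> VCs held, oldest first
        self.wanted = {}                # blocked message -> VCs requested

    def acquire(self, msg, vc):
        self.owned[msg].append(vc)
        self.wanted.pop(msg, None)      # header advanced, so msg is unblocked

    def release(self, msg, vc):
        self.owned[msg].remove(vc)      # tail flit has left this VC

    def block(self, msg, vcs):
        self.wanted[msg] = list(vcs)

    def remove(self, msg):
        """Drop a message entirely, e.g. when recovery removes it."""
        self.owned.pop(msg, None)
        self.wanted.pop(msg, None)

    def arcs(self):
        graph = defaultdict(set)
        for msg, path in self.owned.items():
            for a, b in zip(path, path[1:]):
                graph[a].add(b)                           # solid arc
            if path and msg in self.wanted:
                graph[path[-1]].update(self.wanted[msg])  # dashed arcs
        return dict(graph)

    def find_knots(self):
        graph = self.arcs()

        def reach(start):
            seen, stack = set(), list(graph.get(start, ()))
            while stack:
                v = stack.pop()
                if v not in seen:
                    seen.add(v)
                    stack.extend(graph.get(v, ()))
            return seen

        knots, done = [], set()
        for v in graph:
            if v in done:
                continue
            r = reach(v)
            if v in r and all(reach(u) == r for u in r):
                knots.append(r)
                done |= r
        return knots


cwg = ChannelWaitForGraph()
for msg, path, want in [("m1", ["vc1", "vc2"], "vc3"),
                        ("m2", ["vc3", "vc4", "vc5"], "vc6"),
                        ("m3", ["vc6", "vc7", "vc0"], "vc1")]:
    for vc in path:
        cwg.acquire(msg, vc)
    cwg.block(msg, [want])

print(len(cwg.find_knots()))  # 1: the Figure 1 deadlock
cwg.remove("m3")              # recovery removes one deadlocked message
print(cwg.find_knots())       # []
```

A simulator loop would call `find_knots` periodically rather than on every event; the quadratic reachability search used here is chosen for clarity, not efficiency.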

All simulations are run for normalized loads up to full network capacity or until the network saturates with respect to the number of resource dependency cycles, generally well beyond the loads at which network performance saturates (shown in the figures by a vertical dashed line). Each simulation is run for 20,000 simulation cycles beyond steady state.

Unless otherwise stated, all simulations are performed using uniform traffic, a 16-ary 2-cube with bidirectional channels, a fixed message size of 32 flits, an edge buffer depth of two flits, one injection and reception channel, and a channel selection policy which favors continuing routing in the current dimension over turning. Minimal true fully adaptive routing (TFAR) is used for adaptive routing and dimension ordered routing (DOR) is used for static routing. Since no other restrictions are enforced, deadlocks are possible for both routing schemes. The deadlock detection algorithm is invoked every 50 simulation cycles. Deadlocks are “broken” by removing a message in the deadlock set (flit-by-flit) from the network so as to synthesize a recovery procedure (as in the Disha scheme [5]).

Deadlock frequency is presented as “normalized deadlocks”, the ratio of the number of deadlocks to the number of messages delivered. When no deadlocks exist, we instead use the total number of resource dependency cycles formed and the amount of congestion (number and percentage of blocked messages) to represent the conditions that could lead to deadlock formation. The size of deadlock and resource sets and the knot cycle density are used to describe the size and complexity of deadlocks.
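As a small worked illustration of these metrics, the Python sketch below computes the normalized deadlock ratio and the blocked-message percentage. The function names and the sample counts are hypothetical; only the definitions come from the text.

```python
def normalized_deadlocks(deadlocks_detected, messages_delivered):
    """Deadlocks per delivered message over a measurement interval."""
    if messages_delivered <= 0:
        raise ValueError("no messages were delivered in this interval")
    return deadlocks_detected / messages_delivered


def percent_blocked(blocked_messages, messages_in_network):
    """Share of in-flight messages whose header is currently blocked."""
    if messages_in_network <= 0:
        return 0.0
    return 100.0 * blocked_messages / messages_in_network


# Hypothetical counts: 7 deadlocks while 100 messages were delivered.
print(normalized_deadlocks(7, 100))  # 0.07
print(percent_blocked(30, 120))      # 25.0
```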

3.1 Effect of Physical Links on Deadlocks

In studying the effect of network links on deadlock formation, we measure the frequency of deadlocks in tori with uni- and bidirectional channels. We assume DOR with one VC per physical channel for both networks (all other parameters set to default values). Figures 5a and 5b show normalized deadlocks vs. load rate and deadlock set size vs. load rate for the two networks under uniform traffic. Normalized load rate is calculated based on total link bandwidth and average internode distance, which differ between the two networks. The figures show that the uni-torus leads to relatively more deadlock despite having generated less overall traffic.

Below network saturation, there are 1 and 7 deadlocks for every 100 messages delivered (on average) in the bi- and unidirectional networks, respectively. For the two networks, no more than 4 (bi) and 3 (uni) messages are involved in each deadlock below saturation loads. This indicates that unless messages experience deadlock more than once, up to 3% (bi) and 15% (uni) of all messages participate in deadlock. Deep into saturation, deadlock frequency grows to 11% (bi) and 60% (uni) while the number of messages involved in deadlock converges to around 6 for both networks. From this, we can infer that at highly saturated load rates messages may be involved in multiple deadlocks prior to being delivered, particularly in the unidirectional network.

Figure 5. (a) Normalized deadlocks vs. load rate. (b) Deadlock set size vs. load rate.

The deadlocks formed in both networks are of the single-cycle deadlock variety described in Section 2.2.1. The requirements and factors leading to deadlock for the two networks, however, are different, which helps to explain the disparity in deadlock frequency. For one, a bi-torus requires a minimum of 3 messages per deadlock whereas only 2 messages comprise the minimal deadlock set for a uni-torus. As confirmed by Figure 5b, the uni-torus has deadlocks involving fewer messages for all load rates up through deep saturation. Second, and more importantly, for uniform traffic in a torus with 16 nodes per dimension each bi-link is used by 13% of the messages traveling in a particular direction within a given dimension whereas each uni-link is used by 50% of the messages in the network. This suggests that the highly correlated resource dependencies resulting from all network traffic having to travel in the same direction (and turn) to reach their respective destinations are a major contributor to deadlock frequency.
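The 13% and 50% figures can be reproduced by brute-force enumeration under one reading of the text, sketched below in Python. The assumptions are ours, not the authors': a single 16-node ring (one dimension of the torus), one message for every ordered pair of distinct nodes, minimal routing, ties at distance k/2 split evenly between the two directions, and usage measured as the fraction of all ring messages that cross one given directed link. The function name `link_usage` is hypothetical.

```python
from fractions import Fraction


def link_usage(k, bidirectional):
    """Fraction of ring messages that cross the directed link 0 -> 1."""
    crossing = Fraction(0)
    total = 0
    for s in range(k):
        for d in range(k):
            if s == d:
                continue
            total += 1
            offset = (d - s) % k  # hops needed in the + direction
            # Going + from s crosses links s, s+1, ..., s+offset-1 (mod k).
            crosses = int((0 - s) % k < offset)
            if not bidirectional or 2 * offset < k:
                crossing += crosses
            elif 2 * offset == k:
                crossing += Fraction(crosses, 2)  # tie: half go each way
            # Otherwise the message takes the - direction and misses this link.
    return crossing / total


print(float(link_usage(16, bidirectional=False)))  # 0.5
print(float(link_usage(16, bidirectional=True)))   # 0.13333333333333333
```

Under these assumptions a unidirectional link carries almost four times the share of traffic of a bidirectional one, which is consistent with the stronger correlation of dependencies described above.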

Our results show that, as expected, adding routing resources (e.g., bidirectional physical links) reduces resource contention such that the correlated resource dependencies required for deadlock are less likely to form. Although bidirectionality significantly reduces deadlock frequency, it does not by itself reduce the likelihood of deadlock formation to sufficiently low levels. However, bidirectionality may be combined with other techniques (following sections) to reduce deadlock frequency to well within acceptable levels.

3.2 Effect of Adaptivity on Deadlocks

In studying the effect of adaptivity on deadlock formation, we measure the frequency of deadlocks and cycles in tori using DOR and TFAR. To focus on the effects of adaptivity alone, we again use a single VC per physical channel for both algorithms. Figures 6a and 6b show the normalized deadlocks and cycles vs. load rate and the deadlock and resource set size vs. load rate for the two algorithms under uniform traffic. DOR allows only single-cycle deadlocks to form (as in Figure 1), so one curve can represent both cycle and deadlock information. In contrast, TFAR allows cyclic non-deadlocks (similar to Figure 4). Since many more cycles can exist than there are deadlocks, two different curves are used to convey cycle and deadlock formation.

Figure 6. (a) Normalized deadlocks and cycles vs. load rate. (b) Deadlock and resource set size vs. load rate.

Our results show that TFAR suffers no deadlocks below network saturation, 1 deadlock per 100 messages delivered at saturation, and about the same number of deadlocks as messages delivered in deep saturation. The ratio of deadlocks to messages delivered for DOR is even smaller prior to saturation (less than 1 per 1000 messages delivered). This rate gradually increases to 1 deadlock for every 10 messages delivered in deep saturation. In terms of actual number of deadlocks (not normalized to throughput), DOR suffers more than TFAR by as much as a factor of 6. Interestingly, DOR has higher sustained throughput than TFAR despite having a larger number of deadlocks. This explains the discrepancy between actual deadlock and normalized deadlock. It is also observed that the performance of TFAR is highly sensitive to just a few deadlocks while the performance of DOR remains relatively unaffected even as the number of deadlocks grows.

The sizes of deadlock and resource sets in DOR are inherently limited by the single-cycle deadlocks which form. Given that deadlocks are broken immediately upon detection, the effects of deadlocks in DOR are “local”, isolated to a given row or column within the network. The relatively simpler correlation of message dependency required for these deadlocks makes them more likely but, at the same time, less severe. In contrast, TFAR can lead to large multi-cycle deadlocks which have a more “global” effect upon the network. Hence, the higher degree of correlation of message resource dependency required for these deadlocks makes them less likely but more severe.

The results shown in Figure 6b confirm our hypothesis. Large multi-cycle deadlocks appear in TFAR with deadlock sets and resource sets that are 5 to 7 and 7 to 10 times larger than those of DOR, respectively. Moreover, the knot cycle densities for TFAR deadlocks are greater by a factor of 10 to 20. Some of the larger deadlocks observed in TFAR involve as many as 35% of the messages within the network, occupy more than 40% of the channels, and involve hundreds of cycles, thus confirming their global nature. As a result, the residual effects of such large deadlocks are longer-term and widespread; just a few can greatly degrade performance. This is in contrast to the deadlocks in DOR, which have more “localized”, shorter-term effects, thereby making DOR’s performance less affected by a large number of deadlocks.

The cyclic non-deadlocks in TFAR may also degrade performance. Duato [3] has described situations where messages block cyclically faster than they can be drained and remain blocked for extended periods, leading to large message latencies. The large number of cycles we have observed even in the absence of deadlocks suggests that this may be occurring. Hence, low throughput resulting from these cyclic non-deadlocks contributes to the higher normalized deadlock frequency for TFAR although fewer actual deadlocks form. Given that TFAR with a single VC makes harmful deadlocks and cyclic non-deadlocks probable, recovery-based adaptive routing would benefit from additional VCs. Next we examine the effect of additional VCs on reducing the likelihood of deadlock formation.

Figure 7. (a) Normalized deadlocks vs. load rate. (b) Number of cycles vs. percent of messages blocked.

3.3 Effect of Virtual Channels on Deadlocks In investigating the effects of traffic flow on deadlock for-

mation using multiple VCs per physical channel, we measure the deadlock frequency of DOR and TFAR in tori networks which allow the unrestricted use of 2,3, and 4 VCs (all other parameters default). For experiments in which deadlock did not occur, we use network congestion and resource depen- dency cycles formed as a measure of the likelihood of possi- ble deadlock. Figures 7a and 7b show normalized deadlocks vs. load rate and number of cycles vs. percentage of blocked messages under uniform traffic. In Figure 7b, each curve is annotated with the load rate at which cycles first appear (first point) and the load rate at which the highest number of cycles were found (last point).

In Figure 7a, DOR with two VCs (DOR2) does not lead to deadlock prior to saturation; the second VC more than doubles the load at which deadlocks begin to appear relative to using only 1 VC. At its saturation load rate, approximately 1 deadlock occurs for every 100 messages delivered. Deadlock frequency increases to 1 for every 5 messages delivered in deep saturation. Beyond saturation, the actual number of deadlocks for DOR1 and DOR2 is roughly the same. However, a larger reduction in throughput at loads after saturation makes the normalized deadlock measure slightly higher for DOR2 (as shown in the figure). With 3 or more VCs, DOR suffers no deadlocks. In contrast, 2 VCs are sufficient to discourage deadlocks in TFAR (DOR3, DOR4, TFAR2, TFAR3 and TFAR4 are not plotted as no deadlocks occurred).

A number of factors contribute to the elimination of deadlocks when additional VCs are introduced. The new VCs are resources that become available to messages which would otherwise block. The likelihood of the formation of cycles and knots decreases when fewer messages are blocked within the network. The new VCs also provide a higher number of routing options for those messages which still block within the network. As was illustrated in Section 2, additional routing options increase the deadlock set size, resource set size, and knot cycle density needed for deadlock, thereby requiring a higher degree of correlation of message dependency in order for deadlock to form. This greatly diminishes the likelihood of deadlocks. Note that TFAR amplifies the effects of additional VCs since adaptivity makes new routing options available in each dimension. This explains why TFAR is able to eliminate all deadlocks with a smaller number of VCs (two instead of three for DOR). The simpler correlation of message dependencies required for deadlock in DOR, combined with restrictions in the use of the new resources, makes 2 VCs insufficient to eliminate deadlock in DOR.

Figure 7b indicates that adding VCs reduces congestion and allows higher loads to be applied before a large number of cyclic non-deadlocks form. TFAR1 results in increasingly higher congestion and a larger number of cycles starting at saturation. TFAR2 eliminates the cycles encountered at low load rates in TFAR1, and substantially reduces the overall congestion (from over 70% of the messages being blocked down to as few as 13%). As TFAR2 reaches saturation, its congestion increases while the number of cycles grows rapidly. The third and the fourth VCs for TFAR and DOR show a similar effect on reducing congestion and eliminating cycles at loads prior to saturation, leading to rapid growth in cycles once saturation is reached.

In summary, we observe that additional VCs are able to reduce the number of messages which block within the network, as expected. This, along with the higher degree of correlation of message dependencies required for deadlock in the presence of a larger number of routing options due to the additional VCs, greatly diminishes the likelihood of deadlock. We find the extent to which deadlocks are eliminated with as few as 2 VCs per physical channel to be surprising. However, as networks with multiple VCs reach saturation, a higher number of blocked messages along with the larger number of routing options increases the number of cycles exponentially. The fact that no knots exist even in the presence of hundreds of thousands of cycles suggests the formation of extremely large cyclic non-deadlocks at saturation loads. Operating below saturation avoids this performance degradation.
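The distinction drawn above between cycles and knots can be illustrated with a small Python sketch. It is a simplified illustration, not the detection procedure used by the simulator: the graph encoding (a dict mapping each channel to the channels its blocked occupant could use next), the function names, and the two toy graphs are all assumptions made here. A knot is taken to be a set of vertices in which every member reaches exactly that set and nothing else.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of vertices reachable from start via one or more edges."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        vertex = queue.popleft()
        if vertex in seen:
            continue
        seen.add(vertex)
        queue.extend(graph.get(vertex, ()))
    return seen


def find_knots(graph):
    """Return every knot: a set K such that each vertex in K reaches exactly K."""
    reach = {v: reachable(graph, v) for v in graph}
    knots = []
    for v in graph:
        r = reach[v]
        # v must lie on a cycle, and no vertex it reaches may escape the set.
        if v in r and all(reach.get(w, set()) == r for w in r):
            knot = frozenset(r)
            if knot not in knots:
                knots.append(knot)
    return knots


# Hypothetical toy graphs: channel "d" is free, so the cycle a -> b -> c -> a
# has an escape through "d" and is only a cyclic non-deadlock.
cyclic_non_deadlock = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
deadlock = {"a": ["b"], "b": ["c"], "c": ["a"]}

print(find_knots(cyclic_non_deadlock))
print(find_knots(deadlock))
```

The first graph contains a cycle but no knot because one message on the cycle has an alternative routing option; removing that option turns the same cycle into a knot. This mirrors the observation that many cycles can exist at saturation without any deadlock forming.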

3.4 Effect of Buffer Depth on Deadlocks

We now investigate the effects of increasing the channel buffer size on deadlock formation. We measure the frequency of deadlocks in bidirectional tori with channel buffer depths of 2, 4, 6, 8, 16, and 32 flits. TFAR with one VC per physical channel is used. Using a buffer of the same depth as the message length corresponds to virtual cut-through switching [12]. Other buffer depths correspond to wormhole or buffered wormhole switching [13]. Figures 8a and 8b show normalized deadlocks vs. load rate and normalized deadlocks vs. the number of messages in the network.

As shown in Figure 8a, networks with buffer depths of 2, 4 and 6 flits all saturate at a similar load rate. After saturation, these networks lead to a large number of deadlocks (15 to 25 deadlocks for every 10 messages delivered). The network with a buffer depth of 8 flits saturates at a 25% higher load rate, and leads to a similar deadlock frequency for load rates beyond saturation. Networks with buffer depths of 16 and 32 flits saturate at a 75% higher load than the smallest buffers, reflecting the larger capacity of these networks. A buffer depth of 16 flits leads to the highest number of deadlocks (15 to 35 deadlocks for every 10 messages delivered) while the virtual cut-through network (buffer depth of 32 flits) leads to the smallest number of deadlocks (1 deadlock for every 2 messages delivered) at load rates beyond saturation.

Figure 8. (a) Normalized deadlocks vs. load rate. (b) Normalized deadlocks vs. messages in the network.

The increase in saturation load as the buffer depth is increased confirms that each message requires the simultaneous use of fewer channels due to the higher capacity. This allows for "message compaction". Below saturation, compaction leads to less resource contention and allows more messages to be serviced by the network. The similar saturation load for buffer depths of 2, 4, and 6 flits (6%, 12%, and 18% of the message size) indicates that the amount of compaction occurring for these buffer sizes is alike, and suggests that messages have blocked close to their source nodes, thereby neutralizing the effect of compaction. Increases in saturation loads are greater for larger increments in buffer sizes, thereby suggesting effective compaction for these buffer sizes (buffer sizes of 8, 16 and 32 flits, which can accommodate 25%, 50%, and 100% of a message, respectively).
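The compaction argument can be made concrete with a short calculation. The sketch below assumes a 32-flit message (inferred from the statement that a 32-flit buffer holds 100% of a message) and an idealized blocked message whose flits completely fill each buffer behind its header; the function name is hypothetical and the model ignores partially filled buffers.

```python
import math

MESSAGE_LENGTH_FLITS = 32  # inferred: a 32-flit buffer holds a whole message


def channels_spanned(buffer_depth, message_length=MESSAGE_LENGTH_FLITS):
    """Channels held by a fully compacted, blocked message (idealized model)."""
    return math.ceil(message_length / buffer_depth)


for depth in (2, 4, 6, 8, 16, 32):
    fraction = depth / MESSAGE_LENGTH_FLITS
    print(f"depth {depth:2d} flits: buffer holds {fraction:.1%} of a message, "
          f"message spans {channels_spanned(depth)} channel(s)")
```

Under this model a blocked message holds 16 channels with 2-flit buffers but only one channel with 32-flit buffers, which is the sense in which deeper buffers reduce the number of resources each message needs at once.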

When normalized with respect to the number of messages in the network (Figure 8b), the networks with smaller capacity buffers lead to a substantially higher number of deadlocks. This is explained by the fact that in these networks, each message requires the simultaneous use of a larger number of resources, thereby leading to higher resource contention. Although higher capacity buffers allow more messages to enter the network and, potentially, a larger number of messages to block at saturation, the degree of correlation of message dependency required for deadlocks also increases due to the message compaction, thereby making multi-cycle deadlocks with large deadlock and resource sets less probable.

3.5 Effect of Network Node Degree on Deadlocks

To investigate the effects of node degree on the frequency of deadlocks, we measured deadlock frequency in a 16-ary 2-cube (2D) and a 4-ary 4-cube (4D) torus-connected network, both of which use TFAR routing with one VC. Load rate was normalized based on the total link bandwidth and average internode distance of the two networks.
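The two quantities used for load normalization can be computed directly for both topologies. The sketch below is an assumption-laden illustration rather than the normalization actually applied in the experiments: it counts one unidirectional channel per direction per dimension at every node, averages distance over all destinations including the source itself, and uses hypothetical function names.

```python
def ring_average_distance(k):
    """Mean shortest-path distance from one node to all k nodes of a bidirectional ring."""
    return sum(min(d, k - d) for d in range(k)) / k


def torus_stats(k, n):
    """Return (nodes, average internode distance, unidirectional channels) of a k-ary n-cube."""
    nodes = k ** n
    avg_distance = n * ring_average_distance(k)  # dimensions are independent
    channels = nodes * 2 * n                     # two directions per dimension
    return nodes, avg_distance, channels


for k, n in ((16, 2), (4, 4)):
    nodes, avg_distance, channels = torus_stats(k, n)
    print(f"{k}-ary {n}-cube: {nodes} nodes, "
          f"average distance {avg_distance}, {channels} channels")
```

Both networks have 256 nodes, but under these assumptions the 4D torus has twice the channels and half the average distance of the 2D torus, so the same per-node injection rate loads each channel far less heavily.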

The 4D network resulted in relatively fewer deadlocks at loads prior to saturation (less than 1% of the deadlocks which occurred for the 2D network). Also, the 4D network achieved higher performance well beyond the saturation load of the 2D network, thereby leading to an even larger gap in the normalized deadlock frequency. The two main factors contributing to this are the additional network resources (physical channels) and the increased routing freedom (dimensions). As in the other experiments, additional links serve to reduce resource contention, and the high node degree, along with adaptive routing, increases the required correlation of message dependencies in order for knots to form. The few deadlocks that did form in the 4D network were all single-cycle deadlocks, which suggests that the few messages in the deadlock sets were limited to restricted routing due to exhaustion of routing adaptivity towards the destination.

3.6 Effect of Non-Uniform Traffic on Deadlocks

The deadlock frequencies for non-uniform traffic patterns (bit-reversal, matrix-transpose, perfect-shuffle, and hot-spot) were similar to (in most cases, within 10% of) the deadlock frequencies for the uniform traffic patterns in the experiments described above. The characteristics of the deadlocks (deadlock set size, resource set size, and knot cycle density) were similar as well. The only exception to this was for DOR. Single-cycle deadlocks in DOR (as shown in Figure 1) require “circular overlap” of messages. The source and destination pairs designated by some of these non-uniform traffic patterns are such that this overlap is not possible.
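For reference, the three permutation patterns named above are commonly defined on the bits of the source node address, which fixes a single destination for every source. The Python sketch below uses those common textbook definitions; the exact variants used in these experiments are not specified here, and the 8-bit address width (a 256-node network) and the function names are assumptions. Hot-spot traffic is omitted because it is not a permutation.

```python
ADDRESS_BITS = 8  # assumption: 256 nodes, as in a 16-ary 2-cube


def bit_reversal(src, bits=ADDRESS_BITS):
    """Destination address is the source address with its bit order reversed."""
    return int(format(src, f"0{bits}b")[::-1], 2)


def perfect_shuffle(src, bits=ADDRESS_BITS):
    """Destination address is the source address rotated left by one bit."""
    mask = (1 << bits) - 1
    return ((src << 1) | (src >> (bits - 1))) & mask


def matrix_transpose(src, bits=ADDRESS_BITS):
    """Destination swaps the upper and lower halves of the address (bits must be even)."""
    half = bits // 2
    low = src & ((1 << half) - 1)
    high = src >> half
    return (low << half) | high


source = 0b00011111
for pattern in (bit_reversal, perfect_shuffle, matrix_transpose):
    print(f"{pattern.__name__}: {source:08b} -> {pattern(source):08b}")
```

Because each source always sends to the same destination, the set of paths that can overlap is fixed by the pattern, which is consistent with the observation that some patterns rule out the circular overlap DOR needs for single-cycle deadlock.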

4 Related Work

Deadlock approximation schemes proposed previously [4, 5] have provided little insight into the frequency of true deadlocks. In contrast, our work presents frequencies of actual deadlocks as well as their characteristics as they relate to key network parameters. CWGs and similar constructs have previously been used to statically represent connections allowed by deadlock-avoidance based algorithms [3, 8]. In contrast, we use these graphs to model dynamic resource allocation in unrestricted routing, and to precisely define and detect deadlocks. A summary of work characterizing deadlocks as knots in generalized resource graphs intended to describe deadlocks in operating systems is presented in [9]. Our work is a specialized application of this framework, intended for depicting deadlocks in interconnection networks.

5 Conclusions and Future Work

We characterize the causal effects of various network parameters on blocked messages, resource dependency cycles, and deadlocks to gain a greater understanding of the viability of deadlock recovery-based routing. Through simulation and analysis, we empirically show how deadlock probability is influenced by these factors when routing restrictions are not enforced so as to avoid deadlock.

Our results for k-ary n-cube networks with n ≥ 2 confirm that deadlock probability is less in bidirectional networks than in unidirectional networks, and that it decreases as node degree and adaptivity are increased. Localized deadlocks of limited harmful effect are more probable with dimension ordered routing whereas globally harmful deadlocks are probable with true fully adaptive routing. Deadlock probability is less in virtual cut-through networks than in buffered-wormhole and wormhole networks, as expected. Interestingly, however, deadlocks are highly improbable (none were detected) if as few as 3 VCs are used with dimension ordered routing and only 2 VCs are used with true fully adaptive routing in bidirectional wormhole networks. These results lead us to conclude that recovery-based routing is viable since the unrestricted use of only a few virtual channels is sufficient to make deadlock highly improbable. Providing greater routing flexibility and buffer capacity through increased routing adaptivity, number of virtual and physical channels (bidirectional), and buffer depth greatly increases the complexity of correlated resource dependencies required for deadlock to occur.

We will continue this characterization study by examining the effect of irregular network topology, hybrid message length, misrouting, etc., on deadlock. We also plan to characterize deadlock formation under hybrid non-uniform traffic loads using program-driven simulations.

References

[1] Andrew A. Chien and J. H. Kim. “Planar-Adaptive Routing: Low-Cost Adaptive Networks for Multiprocessors”. In Proc. of the 19th Symposium on Computer Architecture, pp. 268-277, May 1992.

[2] L. Ni and C. Glass. “The Turn Model for Adaptive Routing”. In Proc. of the 19th International Symposium on Computer Architecture, IEEE Computer Society, pp. 278-287, May 1992.

[3] J. Duato. “A New Theory of Deadlock-free Adaptive Routing in Wormhole Networks”. IEEE Transactions on Parallel and Distributed Systems, 4(12):1320-1331, December 1993.

[4] J. Kim, Z. Liu, and A. Chien. “Compressionless Routing: A Framework for Adaptive and Fault-tolerant Routing”. In Proc. of the 21st International Symposium on Computer Architecture, pp. 289-300, April 1994.

[5] Anjan K.V. and Timothy M. Pinkston. “An Efficient, Fully Adaptive Deadlock Recovery Scheme: Disha”. In Proc. of the 22nd International Symposium on Computer Architecture, pp. 201-210, June 1995.

[6] Sugath Warnakulasuriya and Timothy Mark Pinkston. “Implementation of Deadlock Detection in a Simulated Interconnection Network Environment”. Technical Report CENG 97-01, University of Southern California, January 1997.

[7] J. Duato. “A Necessary and Sufficient Condition for Deadlock-free Adaptive Routing in Wormhole Networks”. IEEE Transactions on Parallel and Distributed Systems, 6(10):1055-1067, October 1995.

[8] Loren Schwiebert and D.N. Jayasimha. “A Necessary and Sufficient Condition for Deadlock-Free Wormhole Routing”. Journal of Parallel and Distributed Computing, 32:103-117, 1996.

[9] Mamoru Maekawa, Arthur E. Oldehoft, and Rodney R. Oldehoft. Operating Systems: Advanced Concepts. Benjamin Cummings, 1987.

[10] William J. Dally and Hiromichi Aoki. “Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels”. IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 4, April 1993.

[11] Timothy Mark Pinkston and Sugath Warnakulasuriya. “On Deadlock in Interconnection Networks”. To appear in Proc. of the 24th International Symposium on Computer Architecture, June 1997.

[12] Parviz Kermani and Leonard Kleinrock. “Virtual cut-through: A new computer communication switching technique”. Computer Networks, pp. 267-286, 1979.

[13] C.B. Stunkel et al. “The SP2 high-performance switch”. IBM Systems Journal, vol. 34, no. 2, pp. 185-204, 1995.
