

Distributed multiple-microprocessor network
C.C. Lau, M.Phil., H.F. Li, B.Sc., Ph.D.

Indexing terms: Distributed computing, Microprocessors, Networks

Abstract: The design of a distributed multiprocessor system has received much attention in recent years. With the advent of technology in LSI and VLSI, a cost-effective multiple-microprocessor module can be designed. The module can be employed as a basic element in configuring a distributed system. Several problems are to be solved in such a design: (a) internal structure of each module, (b) handshake between modules, (c) information interchange via distributed mailbox, (d) distributed timing and control, and (e) distributed priority resolving and deadlock detection. In this paper, these problems will be discussed and experimental solutions proposed. A simple performance analysis is also included.

1 Introduction

The architecture of a distributed computing system consisting of closely located mini- and/or microcomputers is usually characterised by its coupling and control schemes, including:

Bus: time-shared common bus, or switch-controlled multibus

Memory: large common memory with or without private memory, or large private memory with small common memory

Interconnection control: centralised interconnection control, or distributed interconnection control

Each of these features carries certain advantages as well as drawbacks. To mention some briefly, a time-shared common bus requires the least hardware complexity and system cost, but is likely to provide the lowest system efficiency and expandability. A switch-controlled multibus provides the best chance for high system efficiency and data-transfer rate, but the switch matrix is often complex and costly. A large common memory seems ideal for sharing information and co-ordinating the execution of different tasks. However, severe contention problems [1] may arise and nullify its effectiveness. In addition, protecting private data in common memory against contamination imposes stringent requirements on hardware and software design. A small common memory with some usage restrictions alleviates the contention and contamination problems, but the data-transfer rate may suffer. Finally, centralised interconnection control achieves higher control efficiency and requires simpler hardware, but its modularity and survivability are poorer than those of distributed control. The latter also allows the designer to configure symmetrical systems of identical modules. True redundancy in both hardware and software can be achieved in such systems.

Because of these inherent tradeoffs, distributed computing systems have traditionally been designed for special-purpose applications. Special-purpose applications allow the designer to choose the most suitable schemes to be adopted. Since the advent of microprocessors and microcomputer networks, a general-purpose distributed system is becoming cost-effective. The approach to designing a general-purpose distributed system is slightly different from that for special-purpose systems. Instead of choosing particular coupling and control schemes, the designer has to consider mixing different alternatives and providing arrangements to reduce their drawbacks.

The current state of the art of distributed systems is best illustrated by a brief survey of some representative systems.

To start with, DATAPAC SL-10 [2] is an example using a time-shared interconnection bus and large common memory. Several sets of control, trunk and line modules are linked by a common bus to a common memory module of primary and secondary storage. Each processing module has its own private bus and memory, which are not accessible by others. It is an asymmetrical system, as different modules are designed for different functions. Indeed, SL-10 is a special-purpose system for servicing telephone subscribers and may not be efficient for other applications. C.mmp [3-5] is a much publicised distributed computing system using switch-controlled interconnection buses and large common memory. It consists of a maximum of 16 minicomputers (PDP 11/20s and 11/40s), each with a 4k private memory. These processors are linked to a centralised crossbar switch through address translators. The crossbar switch allows sharing of a large memory system of up to 16 modules under centralised control. C.mmp is designed to support general-purpose applications. Its next-of-kin, Cm* [6-7], is a system using mixed coupling and control schemes. A cluster of DEC LSI-11 microcomputers is coupled by a time-shared map bus, and the map bus is linked to other clusters through a K.map, which is itself a high-performance microprogrammed processor. Each K.map can be connected to two intercluster buses, which are communication paths between different clusters. A microcomputer is connected to a map bus through local switches. Data interchange between microcomputers of different clusters is centrally controlled by the K.map and local switches. The memory of each microcomputer can be accessed by others, incurring certain additional delays such as address translation time. Thus Cm* is a system with a large shared memory implemented distributively. The interconnection control is distributed at the cluster level, so whenever a K.map fails its cluster fails and becomes logically detached from the rest of the system. Finally, MIDSS [8] is an example system employing a small shared memory, called the mailbox. The microprocessor-controlled data preprocessor SPACE PIPE passes data to the host computer (PDP 11/40) via the mailbox. MIDSS is a special-purpose system designed for telemetry. To allow sharing, the mailbox is implemented by multiport memory modules.

Paper 1674E, received 11th August 1981
The authors are with the Electrical Engineering Department, University of Hong Kong, Hong Kong

All the systems mentioned have one common factor: they consist of general-purpose mini- and/or microcomputers which require additional centralised hardware specially designed for configuring a distributed system. This paper will discuss the configuration of a distributed system with completely decentralised control. Interconnection facilities such as the mailbox, interconnection buses and switch matrix, as well as timing and control, are resolved in each module. Thus a distributed system can be easily configured with these basic modules and cables. The architecture is made as simple as possible, so that no address translation or memory management is implemented. Each module is made of moderate-speed microprocessors, single-port memory and the least hardware for linking them together. Thus low cost can be expected.

2 Basic multiprocessor microcomputer module

The multiprocessor microcomputers are the exclusive elements in the proposed distributed system. Each module contains a number of microprocessors linked to a common memory through a time-shared common bus. The speed of the memory should be several times higher than the cycle time of the microprocessors, depending on the multiplicity of microprocessors in the module. For example, if the access time of the memory, including switch delay, is 200 ns, then it can be shared by three microprocessors with a cycle time of 600 ns. The microprocessors are then timed by interlaced clocks φ1, φ2 and φ3 respectively, and the memory is timed by φ1 + φ2 + φ3. Most of the memory is private to the microprocessors in the same module, except a small portion representing the mailbox for neighbouring modules to access. The time-shared bus inside each module is called the private bus (PB). In addition to this, each microcomputer has an interconnection bus (IB) to each of its neighbours. The PB and IB are controlled by a simple switch matrix consisting of tristate drivers, which in turn is controlled by the timing and control unit of the module. The architecture of this basic module is shown in Fig. 1.

IEE PROC., Vol. 129, Pt. E, No. 3, MAY 1982. 0143-7062/82/030123 + 08 $1.50/0
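The clock-interleaving arithmetic above can be sketched as follows; the helper names are ours, and the figures simply restate the 200 ns / 600 ns example.

```python
# Illustrative sketch: a 200 ns memory time-shared by three processors
# whose cycle time is 600 ns, using interlaced clock phases phi1..phi3.
MEMORY_ACCESS_NS = 200   # memory access time, including switch delay
CPU_CYCLE_NS = 600       # microprocessor cycle time

# Number of processors the memory can serve without contention:
multiplicity = CPU_CYCLE_NS // MEMORY_ACCESS_NS
assert multiplicity == 3

def phase_owner(t_ns, n=multiplicity, slot=MEMORY_ACCESS_NS):
    """Return which processor's phase owns the memory at time t_ns."""
    return (t_ns // slot) % n + 1  # 1-based: phi1, phi2, phi3

# Each processor sees the memory exactly once per 600 ns cycle:
schedule = [phase_owner(t) for t in range(0, 1200, 200)]
print(schedule)  # -> [1, 2, 3, 1, 2, 3]
```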

Fig. 1 Architecture of multiprocessor microcomputer (microprocessors, switch matrix, isolation logic, CNTL, private bus, memory, mailbox, I/O; handshake signals MRO, MRI, MACKO, MACKI)

The mailbox is the only portion of memory that a neighbouring module can access directly, reducing the chance of data contamination. A mailbox access is initiated by decoding the external mailbox address (EMA). The timing and control unit (CNTL) in the module will then generate a mailbox request output signal (MRO) and send it to the appropriate neighbouring module. Upon completing its current cycle, the latter will sense this request signal as a mailbox request input signal (MRI) and acknowledge it with a mailbox acknowledge output signal (MACKO). At the same time, this MACKO signal is sent to the requesting module as a mailbox acknowledge input signal (MACKI). Through this handshake process, a communication path is established. It will last for a period sufficient for a mailbox read/write or read-modify-write operation. Afterwards, the previous bus connections in both modules will be restored.
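The handshake sequence can be illustrated with a small sketch; the class and method names are hypothetical, and only the signal ordering (MRO, MRI, MACKO, MACKI) follows the text.

```python
# Illustrative sketch (not the authors' implementation) of the
# EMA -> MRO -> MRI -> MACKO -> MACKI mailbox handshake between
# a requesting module and its neighbour.
class Module:
    def __init__(self, name):
        self.name = name
        self.mri = False      # mailbox request input, as seen by this module
        self.macki = False    # mailbox acknowledge input

    def request_mailbox(self, neighbour):
        """Decode an external mailbox address (EMA) and raise MRO."""
        neighbour.mri = True              # our MRO is the neighbour's MRI
        log.append(f"{self.name}: MRO ->")

    def end_of_cycle(self, requester):
        """On completing its current cycle, sense MRI and acknowledge."""
        if self.mri:
            log.append(f"{self.name}: MACKO ->")
            requester.macki = True        # our MACKO is the requester's MACKI
            self.mri = False

log = []
a, b = Module("A"), Module("B")
a.request_mailbox(b)      # A decodes an EMA and asserts MRO
b.end_of_cycle(a)         # B finishes its cycle, sees MRI, asserts MACKO
assert a.macki            # path established; mailbox read/write may proceed
print(log)                # -> ["A: MRO ->", "B: MACKO ->"]
```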

The number of interconnection buses is determined by the number of neighbouring modules in the desired configuration. The private and interconnection buses are connected to memory and microprocessor through private bus switches (PBS) and interconnection bus switches (IBS), respectively. The PBSs are normally closed while the IBSs are normally open. Fig. 2 shows a simple example in which each module has two interconnection buses, IB0 and IB1. Note that IB0 is connected to IB1 of its neighbour and vice versa. In asynchronous operation, the module requesting mailbox access will open the PBS and close the IBS on its own side. The requested module closes both PBS and IBS on the other side of the communication path, and temporarily detaches its microprocessors from its local memory.

Structurally, these basic modules allow a large variety of interconnection configurations, depending on their suitability to particular applications. Microcomputers with two interconnection buses form a ring. Increasing the number of interconnection buses to four or six generates array or cubic configurations, respectively. It is generally not necessary to adopt such a uniform structure; for example, a tree or general graph structure may similarly be formed. In our subsequent description, however, we will assume an array structure for the sake of illustration. It should be apparent how to tailor the design for any general structure.

The microprocessors in the system can be programmed to execute dependent or independent tasks. They also take care of simultaneous data transfers on different interconnection buses, boosting the overall system efficiency.

3 Distributed mailbox

Systems using both common and private memories have a common problem: the total common and private address space is limited by the maximum address space of each microcomputer. The larger the common memory is, the smaller the private memory can be, unless extra paging or address-translation hardware is provided. It is desirable to minimise the reduction of private memory due to the mailbox, especially for 8-bit microprocessors whose maximum address space is only 64k.

Fig. 2 Interconnection structure

The distributed mailbox in the proposed system is selected to limit the private memory reduction. Each microcomputer module has four interconnection buses, labelled 00, 01, 10 and 11. Bus 00 is connected to bus 11 of its neighbour, bus 01 to bus 10 of its neighbour, and so on. In short, an interconnection bus numbered C1C0 is connected to the bus of its neighbour whose number is the bitwise complement of C1C0. In terms of the address format, say 16 bits long, we have an address interpreted as:

M3 M2 M1 M0 C1 C0 A9 ... A0

where

M3 M2 M1: indicates if it is a mailbox address

M0 = 0 if the local mailbox is accessed, 1 if the external mailbox at its neighbour is accessed

C1 C0: indicates the interconnection bus (neighbour) concerned

A9 ... A0: indicates a location within the selected partition of the mailbox

Based on this interpretation, the address map of each microcomputer is shown in Fig. 3a. Physically, the memory residing in each node is organised according to the map shown in Fig. 3b. Notice that each mailbox contains four partitions, one for each of its neighbours. In case M0 is equal to 1, the external mailbox is accessed, so an external memory access request (MRO) is generated and sent to the appropriate neighbour, controlled by C1 C0. On acknowledgment, the requested node will accept the address M3 M2 M1 1 C1 C0 A9 ... A0 and convert it automatically to M3 M2 M1 0 C1 C0 A9 ... A0. Thus both the local and the external microprocessor can access the same physical locations in the mailbox. Addresses headed with M3 M2 M1 1 C1 C0 are unusable locations. Notice that, with this arrangement, no address translator is required: M0 is converted from 1 to 0 just by holding the appropriate IB line low.
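A minimal sketch of this address interpretation, assuming M3..M0 occupy the top four bits of a 16-bit address as in the format above; the helper names are ours.

```python
# Hedged sketch of the 16-bit address interpretation
# M3 M2 M1 M0 C1 C0 A9..A0 and the automatic M0 1 -> 0 conversion.
# Field positions follow the format given in the text; helper names are ours.
def decode(addr):
    m = (addr >> 12) & 0xF        # M3 M2 M1 M0
    external = m & 0x1            # M0: 1 = neighbour's mailbox
    c = (addr >> 10) & 0x3        # C1 C0: which interconnection bus
    offset = addr & 0x3FF         # A9..A0: location within partition
    return external, c, offset

def translate(addr):
    """Requested node forces M0 low, so both sides hit the same cell."""
    return addr & ~(1 << 12)

addr = 0b1011_01_1000000001      # M0 = 1, bus 01, offset 0x201
ext, bus, off = decode(addr)
assert (ext, bus, off) == (1, 0b01, 0x201)
assert decode(translate(addr))[0] == 0   # M0 cleared at the requested node
```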

This simple address translation allows fast memory access across adjacent modules. The transfer of data from a source node to a distant destination node is controlled by software. Data is then transmitted in the form of a message with a header describing the source and destination, among other things. The mailbox transfer routine will check these node numbers and perform the desired routing. The detailed monitor and other software design for this system will be covered by a separate paper under preparation, and is outside the scope of this paper.

From the above example, however, it is quite apparent that the reduction of private memory space is independent of the number of nodes (modules) in the system. For the example, each mailbox size is 4k and the reduction of the private memory size is 8k.

Thus, the memory size can always increase with n, the number of microcomputers in the system, and expandability is not limited by memory space reduction.

Fig. 3 Memory map of each microcomputer

4 Distributed bus switch

Referring to Fig. 2, the bus switch in each microcomputer can be classified into two groups, one for the selection of microprocessors, and the other for the selection of memory (PBS for local and IBS for neighbour). Each group is further classified into two subgroups, one for address and one for data. For the array structure, the switch matrix is illustrated in Fig. 4.

PS1, PS2 and PS3 are used to select the microprocessors. They are closed one at a time and in a fixed sequence under the control of an interlaced clock. But in case the node is allowed to be accessed by a neighbouring node, all of them should be open. So the signals that control PS1, PS2 and PS3 are φ1·¬MACKO, φ2·¬MACKO and φ3·¬MACKO, respectively, where MACKO is the mailbox acknowledge output signal from this node to its four neighbours (MACKO = MACKO_00 + MACKO_01 + MACKO_10 + MACKO_11).

Fig. 4 Switch matrix
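The software-controlled routing of a message toward a distant node (Section 3) might look like the following sketch; the hop-by-hop rule and all names here are illustrative assumptions, since the monitor software is covered in a separate paper.

```python
# Illustrative sketch only: the monitor software is covered in a separate
# paper, but a mailbox transfer routine of the kind described could check
# the message header and forward hop by hop. Names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Message:
    source: tuple       # (row, col) of originating node in the array
    destination: tuple  # (row, col) of final destination
    payload: bytes

def next_hop(node, msg):
    """Pick the neighbour one step closer to the destination (array routing)."""
    r, c = node
    dr, dc = msg.destination
    if r != dr:
        return (r + (1 if dr > r else -1), c)
    if c != dc:
        return (r, c + (1 if dc > c else -1))
    return None  # message has arrived

msg = Message(source=(0, 0), destination=(2, 1), payload=b"data")
hops = []
node = msg.source
while (nxt := next_hop(node, msg)) is not None:
    hops.append(nxt)
    node = nxt
print(hops)  # -> [(1, 0), (2, 0), (2, 1)]
```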


PBS is a normally closed switch which connects a microprocessor (local or external) to the local memory. It is opened when a microprocessor is allowed by the timing and control unit (CNTL) to access an external mailbox (M3 M2 M1 1 C1 C0 A9 ... A0). In such a case, the local node scans MRO_C1C0 and generates a service signal SMRO_C1C0, while the neighbouring node generates an acknowledge MACKO_C1C0 and passes it to the requesting node as MACKI_C1C0. Thus PBS is closed by the signal

¬SMRO · ¬MACKI = ¬(SMRO + MACKI)

where

SMRO = SMRO_00 + SMRO_01 + SMRO_10 + SMRO_11

MACKI = MACKI_00 + MACKI_01 + MACKI_10 + MACKI_11

Except for the subgroups of PS and PBS for address lines, all other switches are bidirectional tristate buffers. The direction of data flow via the PS, PBS and IBS switches is controlled by the read/write signal (R/W). Address signal flow through the IBS address group is controlled by

SMRO_C1C0 · MACKI_C1C0 + MACKO_C1C0

where C1C0 is the bus number. The first term allows address signals from the requesting node to be routed to its neighbour via IBS_C1C0 and IB_C1C0. The latter allows address signals from a neighbour to be routed to the local memory via IB_C1C0, IBS_C1C0 and PBS.

The subgroup of IBS for data transfer is controlled by the logic

(SMRO_C1C0 · MACKI_C1C0 + MACKO_C1C0) · (R/W)

The authors' prototype has been implemented with an isolation register (ISR). Through software functional checks, a '1' can be written into the appropriate bit of ISR, indicating a faulty microprocessor or a faulty (inaccessible) neighbour. Thus, all the control signals mentioned above have to be ANDed with the complement of the appropriate isolating bits before they are applied to the various switches.
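The switch-control equations of this Section, with ISR masking, can be restated as Boolean functions; this is a sketch of the logic only, not of the tristate hardware.

```python
# Hedged sketch of the switch-control equations in this Section, with each
# control ANDed with the complement of its isolation-register (ISR) bit.
def pbs_closed(smro, macki):
    """PBS closed by (not SMRO) and (not MACKI), i.e. not (SMRO or MACKI)."""
    return not (smro or macki)

def ibs_address_enabled(smro_c, macki_c, macko_c):
    """Address flow through the IBS address group for bus C1C0."""
    return (smro_c and macki_c) or macko_c

def masked(control, isr_bit):
    """Isolate a faulty neighbour: a '1' in ISR disables its switches."""
    return control and not isr_bit

assert pbs_closed(smro=False, macki=False)          # idle: PBS stays closed
assert not pbs_closed(smro=True, macki=False)       # external access opens it
assert ibs_address_enabled(True, True, False)       # outgoing request path
assert ibs_address_enabled(False, False, True)      # incoming neighbour access
assert not masked(True, isr_bit=True)               # isolated bus stays off
```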

Fig. 5 Timing and control unit (request flip-flops, priority resolver, clock stretcher; signals MRI, EMA, MACKO, VMA, MCLK)

5 Distributed timing and control

The timing and control unit (CNTL) performs three functions: clocking the microprocessors, memory and I/O interfaces in the same node; supervising the handshake between neighbouring nodes; and controlling the switch matrix, as covered in the previous Section.

As illustrated in Fig. 5, a basic clock φ synchronises all activities in the same node. This clock in turn generates three interlaced clocks, φ1, φ2 and φ3, one for each of the microprocessors. Each φi (φ1, φ2 or φ3) can be stretched when (a) a memory refresh request is sensed and granted, (b) a mailbox request output is generated, or (c) a mailbox request input is received. The timing relationship is further illustrated in Fig. 6. In the system, we assume dynamic memory is used because of its speed and cost advantages, especially in recent products.

The control part of the timing and control unit consists of a clock stretcher and a priority resolver. External mailbox access is dealt with in a way similar to dynamic memory refresh. Mailbox request inputs (MRIs) are treated as memory refresh requests (RR). Mailbox acknowledge outputs (MACKOs) serve the same purpose as refresh grants (RG). However, an actual memory refresh request (dynamic memory is used) in the requesting node can be ignored if the node is accessing the mailbox of its neighbours. A mailbox acknowledge output should be delayed if the memory in the requested node requires memory refresh. The microprocessor clock of the requested node should be frozen for at least one more memory cycle, to allow the local access to be completed, as indicated by a valid memory address signal (VMA). This extension is controlled by a one-shot (φEXT), which is triggered by the MACKO·VMA signal as shown in Fig. 5. The effect of φEXT is illustrated in Fig. 6, marked with the letter A.

Fig. 6 Timing diagram (MCLK, DBE, φEXT and STRETCH waveforms)

Notice that a requesting node may also be requested by other neighbours. Thus, the priority resolver has to be able to solve the following problems which may arise during system operation:

(a) simultaneous requests for the same mailbox in a node
(b) simultaneous requests for the same communication path
(c) unsuccessful handshake (SMRO not supported by MACKI at the proper time)
(d) deadlock.

Either a centralised or a decentralised priority resolver can be designed. The centralised design can resolve handshakes more efficiently. However, it is expensive, as it has to handle a large number of mailbox requests. A decentralised priority resolver can work with request information local to each node only. Its complexity is obviously much smaller.

The circuit diagram of a decentralised priority resolver is shown in Fig. 7. It consists of three sections. The first section is a request register for sampling and identification of pending requests. The second section is a sequencer (scanner), which selects one request at a time and passes it to the third section for performing the handshake with the neighbouring node. The third section decides if two adjacent nodes can execute the requested data transfer. A handshake is successful if, at that memory cycle, the service of an MRO_C1C0 matches that of the corresponding MRI of the adjacent node. Subsequently, the MRO_C1C0 request in the requesting node is deleted from the request register. Deletion of MRO_C1C0 in the requesting node in turn deletes the corresponding MRI in the requested node.

All possible requests are labelled with corresponding interconnection bus numbers (C1C0) (Fig. 8). Those requests are sampled and queued in request registers in a fixed order. No further sampling occurs until the entire request register is cleared.

There are two different causes of unsuccessful handshake: (a) an MRI signal is not sensed by the requested node, unknown to the requesting node; (b) a node is performing a handshake with one of its neighbours, disregarding the requests of other nodes. These problems are solved because, if a request in the queue is not successfully served, it rejoins the end of the queue and priority is given to the next active request; thus the request queue is actually a variable-length circular list. Also, the sampled requests are queued in a proper order to ensure a high chance of successful handshake. The suggested order of the queue is MRI11, MRO11, MRI10, MRO10, MRI01, MRO01, MRI00 and MRO00. Other orderings can also be adopted, but some are inadvisable; detailed discussion of this will be covered in another paper.

Fig. 8 Notation of mailbox request (labelled by IB number)

Fig. 7 Priority resolver (request register, sequencer, handshake logic; MCLK1 and MCLK2 are delayed memory clocks)
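The variable-length circular request list can be sketched as follows; `handshake_ok` stands in for the match test performed by the third section of the resolver, and is an assumption of this sketch.

```python
# Sketch of the variable-length circular request list described above:
# an unserved request rejoins the tail and the next active request is tried.
from collections import deque

def serve(queue, handshake_ok):
    """Try requests in order; rotate a failed request to the end of the queue.

    handshake_ok maps a request label to whether its match (MRO vs MRI on
    the adjacent node) is present this memory cycle."""
    attempts = 0
    while queue and attempts < len(queue):
        req = queue[0]
        if handshake_ok(req):
            queue.popleft()           # successful handshake: delete request
            return req
        queue.rotate(-1)              # rejoin end of queue, try the next
        attempts += 1
    return None                       # nothing served this cycle

q = deque(["MRO11", "MRI10", "MRO01"])
served = serve(q, handshake_ok=lambda r: r == "MRI10")
assert served == "MRI10"
assert list(q) == ["MRO01", "MRO11"]
```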

Deadlock arises when a node is incapable of completing the service of its request queue within an allowable period. The possible cases are:

(a) nodes connected in a ring have the same active request queue (Fig. 9), so all nodes try to serve the same request (MRO_C1C0 or MRI_C1C0) without its match
(b) similar to case (a), but resulting from rotation of the requests in the sequencer
(c) a node is trying to access a faulty node which is incapable of responding to the requesting node.

Fig. 9 Mailbox request patterns which may cause deadlock

Deadlock can be detected by a deadlock detector incorporated in each priority resolver, which is actually a downcounter preset by the leading edge of each microprocessor clock. It counts the number of memory cycles which have elapsed. A carry-out from it is used to readjust the order of the request queues in alternate nodes. Another upcounter counts the occurrences of this carry-out and, after a fixed count, may generate an interrupt, which defreezes the microprocessor clock, informs the microprocessor to repeat the data transfer through a different route, and writes a '1' to the appropriate bit in the ISR to isolate the unresponsive neighbour. Thus, the deadlock detector functions as a watchdog timer.
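A behavioural model of this watchdog arrangement is sketched below; the PERIOD and LIMIT values are assumed, as the paper does not give the counter lengths.

```python
# Hedged model of the deadlock detector: a downcounter of memory cycles
# whose carry-out reorders the queue, and an upcounter that, after a fixed
# number of carry-outs, interrupts and isolates the unresponsive neighbour.
class DeadlockDetector:
    PERIOD = 8        # allowable memory cycles per service attempt (assumed)
    LIMIT = 4         # carry-outs tolerated before interrupting (assumed)

    def __init__(self):
        self.down = self.PERIOD   # preset on each microprocessor clock edge
        self.carries = 0

    def memory_cycle(self):
        """Count one elapsed memory cycle; return the action to take."""
        self.down -= 1
        if self.down > 0:
            return None
        self.down = self.PERIOD         # carry-out: preset and count it
        self.carries += 1
        if self.carries >= self.LIMIT:
            return "interrupt"          # reroute transfer, set ISR bit
        return "reorder-queue"          # readjust queue order in this node

d = DeadlockDetector()
actions = [a for a in (d.memory_cycle() for _ in range(32)) if a]
assert actions == ["reorder-queue"] * 3 + ["interrupt"]
```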

It should be pointed out that, with the designed priority resolver, the deadlock request patterns are software-dependent and predictable. Deadlock mainly occurs in synchronous systems. In asynchronous systems, its chance of occurrence is infinitesimally small. Thus, deadlock detectors in asynchronous systems are mainly for detecting faulty nodes.

Fig. 10 illustrates how an example request pattern is resolved. The microprocessor clock for microprocessor 1 in node 1 is denoted by φ11, and similarly for φ12, φ13, etc. Assuming that φ11 leads φ12 and φ12 leads φ13, microprocessor 1 in node 1 generates an MRO0 for node 2, which is then sensed by microprocessor 1 in node 2. At the same time, the latter also generates an MRO1 for node 1. Since the microprocessor clock of node 1 has been frozen, that request will not be sensed by node 1. Microprocessor 1 in node 3 generates an MRO1 for node 2, which is also unsampled. Thus, within the first memory cycle after φ11 is frozen, the request queues in the respective nodes will be:

node 1: MRO0
node 2: MRO1, MRI0
node 3: MRO1

No handshake request is successfully served. However, during the 2nd memory cycle, the request queues become

node 1: MRO0
node 2: MRI0, MRO1
node 3: MRO1

Thus the handshake between node 1 and node 2 is successful. The microprocessor clock of node 1 is defrozen. The request queues become

node 1: MRI1
node 2: MRO1
node 3: MRO1

Handshake between node 1 and node 2 is successful in the first memory cycle of node 1. Both microprocessor clocks of node 1 and node 2 are defrozen. Thus node 2 is able to sense MRI1 from node 3. The request queues then become:

node 1: none
node 2: MRI1
node 3: MRO1

Handshake between node 2 and node 3 is successful; the microprocessor clock of node 3 is defrozen immediately. However, if the microprocessor in node 2 has a local memory request (address held on the bus near the MPU, indicated by VMA), the φEXT one-shot will stretch φ22 by one more memory cycle, as indicated by the section marked A in Fig. 10.

Fig. 10 Timing diagram for resolving unsuccessful handshakes (request subscripts in the figure identify the MPU number and the node number)

128 IEE PROC., Vol. 129, Pt. E, No. 3, MAY 1982
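The queue evolution traced in this example can be replayed in a few lines. This is an illustrative reconstruction, not the actual control hardware: the pairing rule (a head-of-queue MRO matches the corresponding head-of-queue MRI at the neighbour) and the rotate step standing in for the deadlock detector's carryout are assumptions of the sketch.

```python
# Toy replay of the three-node example. Queue entries follow the text;
# the pairing rule and the rotate step are illustrative assumptions.

from collections import deque

# Request queues after the first memory cycle (phi11 frozen):
q = {1: deque(["MRO0"]), 2: deque(["MRO1", "MRI0"]), 3: deque(["MRO1"])}

def try_handshake(a, out, b, inn):
    """Serve node a's head-of-queue MRO if node b's head is the matching MRI."""
    if q[a] and q[a][0] == out and q[b] and q[b][0] == inn:
        q[a].popleft()
        q[b].popleft()
        return True
    return False

assert not try_handshake(1, "MRO0", 2, "MRI0")   # cycle 1: MRO1 heads node 2's queue

# Carryout from the deadlock detector readjusts node 2's queue order:
q[2].rotate(-1)                                  # -> MRI0, MRO1
assert try_handshake(1, "MRO0", 2, "MRI0")       # node 1 <-> node 2 served

# Node 1's clock is defrozen; it now senses node 2's MRO1 as MRI1:
q[1].append("MRI1")
assert try_handshake(2, "MRO1", 1, "MRI1")       # node 2 <-> node 1 served

# Node 2 is defrozen; it senses node 3's MRO1 as MRI1:
q[2].append("MRI1")
assert try_handshake(3, "MRO1", 2, "MRI1")       # node 3 <-> node 2 served
print(all(not v for v in q.values()))            # True: all queues drained
```

The single queue reordering is what breaks the circular wait; every pending request is then served within a few memory cycles.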

It is noteworthy that, with this asynchronous operation among different modules, the system effectiveness and data-transfer rate can be improved; the system does not operate in the same manner as traditional array processors, which work in full synchronism and perform the same task. The cycle-stealing approach for external mailbox access is most suitable for distributed processing, as it prevents the operation of any module from being severely impeded. With the distributed control, the failure of any one module does not cause a system crash, as suitable reconfiguration can be carried out and the failed module can be logically disconnected from the rest of the system.

6 Performance analysis

The performance of the distributed system can be reflected by a number of parameters. Here, we will consider two of them: total memory bandwidth and expected message-transfer delay.

The total memory bandwidth is the expected number of memory accesses in the system within one clock cycle (φ1). It also reflects the number of processor cycles completed within φ1, and is a measure of the system efficiency. Under the ideal situation, when each node does not access any other node, none of the processor cycles is stretched, and the effective memory bandwidth (B) is simply n, the number of nodes in the network. But in practical cases, cross accessing is inevitable and the bandwidth will be reduced, depending on the frequency of cross accessing. Let

r = percentage of time that a memory access is to the external mailbox

p_i = probability that i other MRs (mailbox requests) have priority over this particular mailbox access (0 ≤ i ≤ k*)

τ = expected clock skew between neighbouring nodes

Then

B = n(1 − r) + nr / Σ_{i=0..k} p_i(τ + i + 1)

  = n(1 − r) + nrβ

where

β = (expected access time to the external mailbox)⁻¹, measured in (memory cycles)⁻¹

So

B = n[1 − r(1 − β)]

Thus, the effective memory bandwidth is linearly proportional to n, r and β. The above relationship is plotted in Fig. 11A (B/n against r) and Fig. 11B (B/n against β). The system is thus not limited by n, but the effects of r and β on B are clearly revealed in the Figures.
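Using B = n[1 − r(1 − β)], with β the reciprocal of the expected external-access time, the bandwidth can be evaluated numerically. The values and function names below are illustrative, not taken from the paper.

```python
# Numerical sketch of the bandwidth expression B = n[1 - r(1 - beta)].
# Values and function names are illustrative.

def beta(p, tau):
    """Reciprocal of the expected external-mailbox access time (memory cycles).

    p[i] = probability that i other mailbox requests have priority,
    tau  = expected clock skew between neighbouring nodes.
    """
    return 1.0 / sum(p_i * (tau + i + 1) for i, p_i in enumerate(p))

def bandwidth(n, r, p, tau):
    """Effective memory bandwidth B = n[1 - r(1 - beta)]."""
    return n * (1 - r * (1 - beta(p, tau)))

# 4 nodes, 20% of accesses external, no contention (p_0 = 1), half a cycle skew:
print(round(bandwidth(4, 0.2, [1.0], 0.5), 3))   # 3.733
# With r = 0 every node runs unimpeded and B = n:
print(bandwidth(4, 0.0, [1.0], 0.5))             # 4.0
```

Note that B scales linearly with n: cross accessing only shaves a factor 1 − r(1 − β) off the ideal bandwidth.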

The message-transfer delay from one source node to a distant destination node is a crucial parameter characterising the system responsiveness to coupling among the distributed modules in the system. If the transfer path of a message has an expected length of l (so l − 1 intermediate nodes are

*In the case of an array structure, k is equal to 7, as an MRO may be superseded by four MRIs at the requesting node and three MRIs at the requested node. Also, deadlock is ignored in the analysis

traversed), and each message is of length M_T words (memory accesses), the expected delay in the transmission from source to destination is

D = l M_T / β

This delay, however, is reducible by suitable modification of the transmission software. Consider further partitioning each partition of a mailbox into two parts, A and B. While A is filled by a source node, B can be processed and forwarded by the receiving node, and vice versa. Based on this, a message can then be divided into submessages, or packets, each of size M_P (so that M_T = p M_P). The latter are transmitted in a

Fig. 11A Normalised bandwidth B/n against r

Fig. 11B Normalised bandwidth B/n against β

streamline fashion, filling into alternating parts of the corresponding partition of each intermediate node. With this overlapped processing embedded in the data transfer,

D = (l + p − 1) M_P / β

  ≈ p M_P / β = M_T / β    for p ≫ l

Consequently, the resulting delay D is independent of l, the path length, and we have a responsive system.
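The effect of packetising can be sketched numerically: with whole-message forwarding the delay grows as l M_T/β, while streamlined packet transfer takes roughly (l + p − 1) M_P/β, which approaches M_T/β for p ≫ l. The helper names and numerical values below are illustrative.

```python
# Sketch of the two delay formulas: whole-message forwarding over l hops
# against streamlined (double-buffered) packet transfer. Helper names and
# the numerical values are illustrative.

def delay_whole(l, m_t, beta):
    """D = l * M_T / beta: every hop forwards the complete message."""
    return l * m_t / beta

def delay_packets(l, m_t, p, beta):
    """D = (l + p - 1) * M_P / beta: packets of size M_P = M_T / p stream
    through the l-hop path, so only the pipeline fill-up depends on l."""
    m_p = m_t / p
    return (l + p - 1) * m_p / beta

l, m_t, b = 5, 64, 0.5                    # 5 hops, 64-word message
print(delay_whole(l, m_t, b))             # 640.0 memory cycles
print(delay_packets(l, m_t, 16, b))       # 160.0 memory cycles
print(m_t / b)                            # 128.0, the p >> l limit M_T / beta
```

As p grows, the (l + p − 1) factor is dominated by p, so the delay tends to M_T/β regardless of the path length.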

7 Conclusions

A distributed multiple-microprocessor system has been described. Although the presentation centres around the array structure, it is easily envisaged that the modular design is applicable to any graph structure required by a particular application. A prototype system in the form of a ring of three nodes has been designed and implemented in the laboratory, using 6800 microprocessors. In this case, the 6800 lends itself to easy conversion to a dual-processor configuration in each module. The stretching of φ1, the memory refresh and the other interconnection control are successfully implemented. The asynchronous operation of the modules facilitates both local computation and mutual coupling, as the wastage due to a synchronous design can be avoided. The distributed system design is particularly suited to real-time control applications, where the real-time task can be partitioned into individual subtasks executed on the distributed modules, synchronised whenever necessary via the memory handshake.

A multiprocessor microcomputer with a time-shared bus for accessing single-port common memory is a cost-effective design. The bandwidth of high-speed solid-state memory and the low-price computing power of microprocessors are both utilised. In such a system, time-multiplexed switches and the associated control are necessary hardware for routing address and data signals, for boosting device fanouts and for isolating faulty components. In fact, even in unimicroprocessor systems, such switches have been used as drivers and isolators (tristate control), though such isolation is usually not for fault-tolerance purposes. With slight modifications, such switches and control units will be able to perform the more powerful functions required by distributed systems. The modification is mainly the addition of a number of IB switches and a simple priority resolver. The communication mailbox is mainly implemented in software. Interconnection buses are actually pin terminals of IB switches. They do not incur much extra cost. However, the reward is the readiness for distributed system configuration. Individual multiprocessor microcomputers can be readily linked with only cables to form a distributed system of any size. It is also equally easy to decompose such a system into independent subsystems, through hardware or software reconfiguration.

In addition to inexpensive elements, such as single-port memory and moderate-speed microprocessors, the essential hardware in each multiprocessor microcomputer has also been reduced to a minimum. It can be seen that a distributed system consisting of such computing modules can operate without an address translator, memory management, a bus arbitrator, a DMA controller, etc. The priority resolver does a similar job to a bus arbitrator, and the DMA job is done by the individual microprocessors. Reducing such special-function units is also a way to improve system cost-effectiveness and utilisation. Thus, the usage of more microprocessors, and comparatively less special-function hardware and peripherals, will be a trend in microsystem design.

References

1 BURNETT, G.J., and COFFMAN, E.G.: 'A study of interleaved memory systems'. Proceedings of spring joint computer conference, AFIPS Press, Montvale, NJ, 1970, pp. 467-474
2 BEDARD, C.J., MELLOR, F., and OLDER, W.J.: 'A message-switched operating system for a multiprocessor'. Paper presented at Computer software and applications conference, Chicago, Nov. 1977, pp. 772-777
3 WULF, W.A., and BELL, C.G.: 'C.mmp - a multi-miniprocessor', Proc. AFIPS conf., 1972, Pt. 2, 41, pp. 765-777
4 JOOBBANI, R., and SIEWIOREK, D.P.: 'Reliability modelling of multiprocessor architectures', Proceedings of first international conference on distributed computing systems, Oct. 1979, pp. 384-398
5 THURBER, K.J., and FREEMAN, H.A.: 'Architecture considerations for local computer networks', ibid., pp. 131-140
6 SIEWIOREK, D.P., KINI, V., JOOBBANI, R., and BELLIS, H.: 'A case study of C.mmp, Cm* and C.vmp: Pt. 2 - Predicting and calibrating reliability of multiprocessor systems', Proc. IEEE, 1978, 66, pp. 1200-1220
7 SWAN, R.J., FULLER, S.H., and SIEWIOREK, D.P.: 'Cm* - a modular multi-microprocessor'. National computer conference proceedings, 1977
8 FEINBERG, D.L.: 'MIDSS: a unique multiprocessor telemetry ground station'. International telemetering conference, Los Angeles, CA, 1976
