design, implementation and evaluation of an abstract ... · calhoun: the nps institutional archive...
TRANSCRIPT
Calhoun: The NPS Institutional Archive
Theses and Dissertations Thesis Collection
1988-06
Design, implementation and evaluation of an
abstract programming and communications interface
for a network of transputers
Bryant, Gregory R.
Monterey, California. Naval Postgraduate School
http://hdl.handle.net/10945/23158
^_ _o-600i
NAVAL POSTGRADUATE SCHOOL
Monterey , California
bOT^3
THESIS
DESIGN. IMPLEMENTATION AND EVALUATIONOF AN ABSTRACT PROGRAMMING ANDCOMMUNICATIONS INTERFACE FOR A
NETWORK OF TRANSPUTERS
by
Gregory R. Bryant
June 1988
Thesis Advisor: Uno R. Kodres
Approved for public release; distribution is unlimited
T238713
iCURITY CLASSIFICATION OF THIS PAGE
REPORT DOCUMENTATION PAGE
a. REPORT SECURITY CLASSIFICATION
Unclassifiedlb RESTRICTIVE MARKINGS
a. SECURITY CLASSIFICATION AUTHORITY
b. DECLASSIFICATION /DOWNGRADING SCHEDULE
3 DISTRIBUTION/AVAILABILITY OF REPORT
Approved for public release;Distribution is unlimited
PERFORMING ORGANIZATION REPORT NUMBER(S) 5 MONITORING ORGANIZATION REPORT NUMBER(S)
a. NAME OF PERFORMING ORGANIZATION
Naval Postgraduate School
6b OFFICE SYMBOL(If applicable)
Code 52
7a NAME OF MONITORING ORGANIZATION
Naval Postgraduate School
c. ADDRESS {City, State, and ZIP Code)
Monterey, California 93943-5000
7b ADDRESS (Gty, State, and ZIP Code)
Monterey, California 93943-5000
a. NAME OF FUNDING / SPONSORINGORGANIZATION
8b OFFICE SYMBOL(If applicable)
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER
c. ADDRESS (City, State, and ZIP Code) 10- SOURCE OF FUNDING NUMBERS
PROGRAMELEMENT NO
PROJECTNO
TASKNO
WORK UNITACCESSION NO
1 TITLE (Include Security Classification)
)esign. Implementation and Evaluation of an Abstract Programming and CommunicationsInterface for a Network of Transputers. ^2 PERSONAL AUTHOR(S)
3a. TYPE OF REPORT
Master's Thesis13b. TIME COVEREDFROM TO
14 DATE OF REPORT {Year, Month, Day)
1988 June15 PAGE COUNT
176
6. SUPPLEMENTARY NOTATIONThe views in expressed in this thesis are those of the author and do not reflect the
official policy or position of the Department of Defense or the U.S. Government.COSATI CODES
FIELD GROUP SUB-GROUP
18 SUBJECT TERMS {Continue on reverse If necessary and identify by block number)
Distributed computing; Transputer; OCCAM; Network;Event Counts and Sequencers; Communicating SequentialProcesses
9 ABSTRACT {Continue on reverse if necessary and identify by block number)This thesis presents the design, implementation and evaluation of two abstractedprogramming and communication interfaces for developing distributed programs on a
network of Transputers. One interface uses a shared memory model for interprocesscommunication and synchronization. The other interface uses a message passing modelfor communication and synchronization. The programming interfaces allow developmentof distributed programs that are independent of the physical configuration of a network.This thesis also presents an evaluation of Transputer performance with a particularemphasis on the interaction of computation and inter-Transputer communication.
20 DISTRIBUTION/AVAILABILITY OF ABSTRACT
E UNCLASSIFIED/UNLIMITED D SAME AS RPT dTIC USERS
22a. NAME OF RESPONSIBLE INDIVIDUAL
Professor Uno R. Kodres
21 ABSTRACT SECURITY CLASSIFICATION
Unclassified
22b TELEPHONE (/nc/ude Area Code)
(408) 646-2197:2c. OFFICE SYMBOLCode 52Kr
DDFORM 1473, 84 MAR 83 APR edition may be used until exhausted.
All other editions are obsoleteSECURITY CLASSIFICATION OF THIS PAGE
ft U.S. Government Printing Office: 1986—606-24.
Unclassified
Approved for public release; distribution is unlimited.
Design, Implementation and Evaluation of an Abstract Programming
and Communication Interface for a Network of Transputers
by
Gregory R. Bryant
Lieutenant Commander, United States Navy
B.S.E.E., University of New Mexico, 1975
Submitted in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE IN COMPUTER SCIENCE
from the
NAVAL POSTGRADUATE SCHOOLJune 1988
/
ABSTRACT
This thesis presents the design, implementation and evaluation of
two abstracted programming and communication interfaces for devel-
oping distributed programs on a network of Transputers. One inter-
face uses a shared memory model for interprocess communication and
synchronization. The other interface uses a message passing model
for communication and synchronization. The programming interfaces
allow development of distributed programs that are independent of
the physical configuration of a network. This thesis also presents an
evaluation of Transputer performance with a particular emphasis on
the interaction of computation and inter-Transputer communication.
ni
THESIS DISCLAIMER
The reader is cautioned that computer programs developed in
this research may not have been exercised for all cases of interest.
While every effort has been made within the time available to ensure
that the programs are free of computational and logic errors, they
cannot be considered validated. Any application of these programs
without additional verification is at the risk of the user.
Many terms used in this thesis are registered trademarks of
commercial products. Rather than attempting to cite each individual
occurrence of a trademark, all registered trademarks appearing in this
thesis are listed below the firm holding the trademark:
INMOS Limited, Bristol, United Kingdom:
Transputer
OCCAMIMS T414IMS T800Transputer Development System (TDS)
Relational Technology Inc., Alameda, California:
Ingres
IV
BYrE SCHOOL:IA 93943-5002
TABLE OF CONTENTS
L BviRCoucna^. i
A BACKGROUND 1
B. THE TRANSPUTER 2
C ABSTRACT PROGRAMMING INTERFACES.......... .2
D. AVAILABLE HARDWARE AND SOFTWARE 3
E. THESIS ORGANIZATION 3
n. THE TRANSPUTER 5
A OVERVIEW 5
B. COMMUNICATING SEQUENTIAL PROCESSES 5
C. TRANSPUTER ARCHITECTURE 8
L Processor. 8
a Instruction Set 10
b. Concurrency 1
3
2. Floating Point Unit 14
3. Links .14
4. Memory 17
5. Timers 18
6. External Event Input 19
m. TRANSPUTER PERFORMANCE 20
A OVERVIEW 20
B. PRIORRESEARCH 20
C TEST METHODOLOGY... 21
1. Configuration 21
2. Software ..23
3. Conditions 23
4. Data Management ,.25
D. TEST RESULTS... 26
1. Isolated Processor Performance................. 26
2. Isolated Communications Link Performance............. 31
3. . Communication link Interaction 34
4. Communication Link Effects on Processor
Performance........ 38
5. Processor Effects on Communications Link
Performance ...43
6. Summaiy...... 44
IV. SHARED MEMORY MODEL PROGRAMMINGINTERFACE........ 45
A BACKGROUND .45
B. EVENT COUNTS AND SEQUENCERS 46
C IMPLEMENTATION... 47
1. The Kernel.. 50
a Input/Output Buffers 5
1
VI
b. Communications Manager , 5 1
c. Shared Memory Manager 52
2. Data Structures 52
a nodclink. 52
b. count.node.................o......o. 53
c. countarray 53
d. countarrayindices. ...54
e. node.awaits.................o............................................................ 54
f. free. list ..56
g. countsize... .56
h. node.data 56
3. Library Procedures ,57
a read(link, count.id, count.value) 59
b. advance(link, count.id) 59
c. await(link, count.id, count.value).... .59
d. ticket(link, count.id, ticketvalue) 59
e. put(link, count.id, index, byte.array).. .59
f. get(link, count.id, index, byte.array)..................... 60
g. ecs.kernel(link,array,count.node,count. size,
node. id, node. link) 60
D. PROGRAMMING .........60
E. EVALUATION ................61
1. Basic Interface Procedure Timing .62
2. Node Distance Effects on Communication Rate 66
Vll
3. Hardware Link Sharing Effects on
Communication Rate 68
4. Multiple Link Effects on Communication Rate 70
V. MESSAGE-PASSING MODEL PROGRAMMINGINTERFACE...... 73
A BACKGROUND........ 73
B. IMPLEMENTATION 73
1. The Kernel. 76
a Input/Output Buffers. 76
b. Channel Controllers 76
c. Communications Manager 78
2. Data Structures 78
a nodclink............ .79
b. gchan.node 79
c. gchanJchan 8
1
d. loc.chan 8
1
e. lchan.gchan.......<. 8
1
f. chan.map 8
1
2. Library Procedures 82
C. PROGRAMMING 82
D. EVALUATION 83
1. Node Distance Effects on Communications Rate 83
2. Hardware Link Sharing Effects on
Communication P?ate..... 87
3. Multiple Link Effects on Communication Rate 88
Vlll
VI. CONCLUSIONS AND RECOMMENDATIONS 89
A CONCLUSIONS 89
B. RECOMMENDATIONS 90
APPENDIX A DETAILED TRANSPUTER TIMING TESTCONFIGURATION AND SOURCE CODE 92
A SUMMARY. 92
B. SOURCE CODE. .............................................92
1. Configuration Section 92
2. Target Node Procedure .95
3. Satellite Node Procedure. 1 00
4. Echoing (Data Routing) Node Procedure 102
5. Host (Data Recording) Procedure .103
APPENDK B TIMING DATABASE DESCRIPTION 1 08
A DISCUSSION ....108
B. DATABASE DESCRIPTION 1 08
C. EXAMPLES Ill
1. Data Loading 1 1
1
2. Data Retrieval........ ........112
3. Data Unloading 114
APPENDIX C DETAILED SOURCE CODE FOR THE SHAREDMEMORY ABSTRACT INTERFACE LIBRARY 115
A SYMBOUC CONSTANTS 1 1
5
IX
a KERNEL PROCEDURE .116
C. READ PROCEDURE 1 26
D. ADVANCE PROCEDURE 126
E. AWAIT PROCEDURE.................. 127
R TICKET PROCEDURE 1 28
G. PUTPROCEDURE 129
H. GET PROCEDURE 1 30
APPENDIX D SAMPLE PROGRAM USING THE SHARED-MEMORYINTERFACE......... 1 32
A DESCRIPTION.. .....132
B. MODULARIZATION....... ........132
1. Producers 132
2. Consumer 133
C MODULE CODING 133
1. Producers 133
2. Consumer .134
D. APPORTIONMENT OF MODULES 134
1. Single Transputer 135
2. Three Transputers 1 36
X
APPENDIX E DETAILED SOURCE CODE FORTHE MESSAGE-PASSING ABSTRACTINTERFACE LIBRARY 1 38
A SYMBOUC CONSTANTS 138
B. KERNEL PROCEDURE 139
APPENDIX F SAMPLE PROGRAM USING THE MESSAGE-PASSINGINTERFACE 150
A DESCRIPTION....... 150
B. MODULARIZATION 1 50
1. Producers 150
2. Consumer 151
3. Buffer 1 5
1
C MODULE CODING 151
1. Producers 151
2. Consumer 1 52
3. Buffer. 152
D. APPORTIONMENT OF MODULES 153
1. Single Transputer 153
2. Four Transputers.... .154
LIST OF REFERENCES 1 57
INITIAL DISTRIBUTION LIST 159
XI
LIST OF TABLES
3.1 T414 Expected and Measured Instruction
Execution Times =...........,.,............0, 30
3.2 T800 Expected and Measured Instruction
Execution Times 3
1
3.3 Isolated Communication Link Performance 32
3.4 Processor Effects on Communications Performance..... 43
B.l Timing Data Relation Definition... 109
B.2 Relation Definition for Converting Encoded
Operation Names......... 1 10
B.3 Relation Definition for Converting Encoded
Priority Names ...1 10
B.4 Relation Definition for Converting Encoded
Location Names........................................ Ill
B.5 Relation Definition for The Retrieved New Relation. 114
xii
LIST OF FIGURES
2.1 Process Representation Ejcample 7
2.2 Process Abstraction Example 7
2.3 Transputer Functional Block Diagram........ 9
2.4 Processor Block Diagram 10
2.5 Example of Operand Register Operation................... 1
1
2.6 Hardware Communication Link Logical Block Diagram 1
5
2.7 Serial Communication Link Protocol.... 16
3.1 Logical Diagram of Test Configuration 22
3.2 T414 Multiple Link Effects on Communications Rate 35
3.3 T800 Multiple Link Effects on Communications Rate 35
3.4 T414 Packet Size Effects on Communications Rate 37
3.5 T800 Packet Size Effects on Communications Rate 37
3.6 T414 Performance Degradation with
No Loop Operation 39
3.7 T414 Performance Degradation with
Four Divide Operations .39
3.8 T800 Performance Degradation with
No Loop Operation..... .................................40
3.9 T800 Performance Degradation with
Four Divide Operations 40
3.10 T414 Bus Access Frequency Effects on
Performance Degradation 42
3.11 T800 Bus Access Frequency Effects on
Performance Degradation 42
Xlll
4.1 Physical Configuration of Shared Memory Kernel 48
4.2 Logical Configuration of Shared Memory Kernel. 49
4.3 Shared Memory Kernel Logical Block Diagram 50
4.4 Kernel Communication Packet Formats. 51
4.5 Kernel Waiting Process List Structure..... 55
4.6 Kernel Waiting Free List Management 57
4.7 Kernel Shared Memory Access Data Structures 58
4.8 Primitive Operation Timing Test Results 63
4.9 Put Operation Timing Test Results 63
4.10 Get Operation Timing Test Results 64
4.11 Count and Shared Memory Location Effects
on Communication Rate 66
4.12 Hop Distance Effects on Communication Data Rate. 67
4.13 Intermediate Process Degradation
During Communication 69
4.14 Shared Hardware Link Communication Rates..... 70
4.15 Multiple Hardware Link Communication Rates .71
5.1 Physical Configuration of Message Passing Kernel .74
5.2 Logical Configuration of Message Passing Kernel.. 75
5.3 Message Passing Kernel Logical Block Diagram 77
5.4 Kernel Communication Packet Format 78
5.5 Kernel Data Structure Interrelation 80
5.6 Hop Distance Effects on Communication Data Rate 84
5.7 Comparison of Hop Distance Effects 85
5.8 Intermediate Process Degradation During Communication 86
5.9 Shared Hardware Link Communication Rates 87
XIV
5.10 Multiple Hardware Link Communication Rates .88
A. 1 Detailed Test Configuration 9 3
B.l Query Language Listing for Loading the Database 1 12
B.2 Example of Retrieval for Display 1 13
B.3 Example of Retrieval for Forming a New Relation.. 1 13
B.4 Query Language Listing for Unloading the Database 1 14
XV
DEDICATION
I dedicate this thesis to my wife, Kathy, who gave me confidence
and encouragement and to my children, Betsey and Rusty, who were
patient and understanding.
XIV
I. INTRODUCTION
A BACKGROUND
The Aegis Modeling Project in the Computer Science Department
at the Naval Postgraduate School is engaged in researching advanced
computer architectures for potential future application aboard naval
ships. Currently, this research is centered on the application and
evaluation of distributed computing architectures. A distributed
architecture is particularly suited for shipboard applications. Ship-
board locations at which processing capabilities are required are
physically distributed, yet processors at all locations must cooperate to
monitor and control a ship's sensors and systems. Such an architec-
ture can also provide the necessary characteristics of high perfor-
mance, fault tolerance, and extensibility.
One emphasis of the current research has been to investigate sys-
tems that are composed of relatively low cost, "off-the-shelf compo-
nents. The single chip microprocessor is such a component. A range
of microprocessor-based distributed multicomputer systems are,
therefore, being evaluated. One such system already developed and
evaluated uses clusters of microprocessor-based single-board comput-
ers interconnected by a hierarchical bus structure [Ga86]. A dis-
tributed system architecture now being evaluated is based on the
single chip microprocessor known as the Transputer.
a THE TRANSPUTER
The Transputer is a single chip microprocessor that has been
specifically designed to function as a computing element in a dis-
tributed multicomputer system. The name 'Transputer" is an amal-
gam of the words transistor and computer. As the transistor was a
building block for large and varied electronic circuits, the Transputer
is intended by the manufacturer to be an analogous building block for
distributed computing systems.
To facilitate the use of the Transputer as an element in a dis-
tributed system, the Transputer implements the concept of Commu-
nicating Sequential Processes [Ho79]. Communicating Sequential
Processes is a paradigm which defines and describes the interaction of
programs that execute in parallel (as is the case in a distributed
system).
C ABSTRACT PROGRAMMING INTERFACES
Program development for a network of Transputers is currently
closely tied to the particular physical configuration of a given network.
The physical configuration of a network must be considered early and
throughout the software design process. In certain cases, the configu-
ration can actually dictate aspects of software design. This thesis
investigates isolating the software designer from this physical configu-
ration through the use of an abstract programming interface.
D. AVAILABLE HARDWARE AND SOFTWARE
The Transputer hardware available to the Aegis Modeling Project
is varied. The hardware includes Transputer interface cards for per-
sonal computers. Transputer-based serial interface and color graphics
interface cards, and cards with multiple Transputers. Aspects of this
hardware that pertain to portions of this thesis are discused where
applicable. A complete description of the hardware is available
elsewhere [In86].
Programs included in this thesis were developed using the Trans-
puter Development System(TDS) [In87a] and the OCCAM program-
ming language [PoMa87]. OCCAM is a high-level block-structured lan-
guage which includes constructs based on CSP to support program-
ming in a distributed environment. An assembler and Pascal and C
compilers for the Transputer are also available.
E. THESIS ORGANIZATION
Chapter II describes the Transputer and the basic concept of
Communicating Sequential Processes.
This thesis investigates the basic performance characteristics of a
Transputer as an element in a network of Transputers. Chapter III
describes testing that was performed and documents the results of
this testing.
Chapter IV and Chapter V describe and evaluate two different
prototype abstract programming interfaces developed for a network of
Transputers. One interface is based on a virtual globally shared mem-
ory. The other is based on message passing.
Chapter VI presents the conclusions reached as a result of devel-
oping, using, and evaluating the programming interfaces. Recommen-
dations for further research are also provided in Chapter VI.
II. THE TRANSPUTER
A OVERVIEW
Central to the to the Transputer is the concept of Communicating
Sequential Processes (CSP) [Ho 79]. The Transputer is, in fact, a hard-
ware implementation of this concept. The programming language
OCCAM, which is the primary language used for programming the
Transputer, is based on this concept. A summary of CSP is presented
in this chapter.
To effectively evaluate and use the Transputer requires an under-
standing of how the Transputer implements the concept of CSP. In
addition, in many cases this thesis refers to Transputer architectural
features and details. This chapter, therefore, also presents a brief
description of Transputer architecture.
a COMMUNICATING SEQUENTIAL PROCESSES
Communicating Sequential Processes (CSP) is one model for con-
current or parallel programming. In CSP, a program is composed of
processes. A process consists of a list of commands or instructions
that are to be executed in sequence. The different processes within a
program are combined and specified to be executed in sequence or in
parallel or in some sequential/parallel combination. The data spaces
for any processes executed in parallel are constrained to be disjoint.
This requirement for parallel processes to have disjoint data
spaces precludes using shared memory to communicate between the
processes. To provide for necessary process-to-process communica-
tion, CSP instead utilizes message passing. Messages are passed
between any pair of parallel processes via synchronous, unbuffered,
point-to-point communications channels connected between the
processes. These communication channels also provide the means for
synchronizing processes. To communicate between two processes,
one process must include an instruction for performing an output to
the other process and the other process must include a corresponding
input instruction. Both processes must be at that point in their
instruction execution where the communication is specified to occur
(If one process reaches this point first, it waits for the other process
to reach its point of communication). The communication is then
performed. Since the communication only occurs when both process
are at their points of communication, the processes are synchronized
at these points. In addition, CSP includes constructs for program
control and sequencing and for conditional selection between multiple
communications
.
Figure 2.1 depicts a set of processes (represented as circles)
interconnected by point-to-point communications links (represented
as directed lines). This set of processes would operate independently
and in parallel on a continuous stream of input values to produce a
continuous stream of output values.
6
Processes to Calculate Unity Based on the Formula
r"^
—
>/sin X +2
cos X
m AqrtW
Figure 2.1. Process Representation Example
Groups of processes may be logically aggregated to form larger,
more abstracted process constructs. Figure 2.2 shows one possible
abstraction of a subset of the processes shown in Figure 2.1.
Processes to Calculate Unity Based on the Formula
Vsin X +2
cos X
Figure 2.2. Process Abstraction Example
C TRANSPUTER ARCHITECTURE
A block diagram of a typical Transputer is shown in Figure 2.3.
This figure depicts the major architectural components a Transputer.
The following sections give a brief description of each of these com-
ponents. In particular, these sections point out some architectural
aspects of the Transputer that can influence the performance of pro-
grams written for the Transputer.
Several versions of the Transputer are currently available. This
thesis considers only Transputer types T414 and T800. For this rea-
son, the following sections describe the features of these Transputer
types. A complete description of all currently available Transputers
can be found elsewhere [In87b].
1. Processor
In general, the processor consists of a 32-bit integer arith-
metic unit and a set of 32-bit registers. Figure 2.4 shows a block dia-
gram of the processor. The I or instruction register points to the next
instruction to be executed in a process. The W or workspace register
is a pointer to the beginning of an area in memory that is the data
space for a process. Together, these two registers can be thought of
as representing the instructions and the data for a single process. The
A, B, and C registers form a push-down evaluation stack. In general,
all operations are performed on or using the values in these registers.
For example, the load and store operations load and store the value of
the A register. The add operation adds the values of the A and B
registers leaving the result in the A register.
8
Floating Point Unit
RAM Sn^
Timers \^^
^^ Event ^Cj:^
o
<=> LinkO
Processor
^P^^W"^W^
O Link 1
V-v' '-'"'^ 2
CC> Link 3
External MemoryInterface
ZV"
|-<
^
t
\7Figure 2.3. Transputer Functional Block Diagram
9
^IV/
AJ
O Register
A Register
B Register
C Register
Register
W Register
Micro-Coded
SequenceController
32-bit Integer
Arithmetic
Logic Unit
mjMMji*jiM^**i
Figure 2A. Processor Block Diagram
a Instruction Set
Each byte in a program can be viewed as having a four-bit
high-order half and a four-bit low-order half. The low-order half of an
instruction byte contains a data value. The high-order half contains a
function code. To execute such an instruction, the data value half of
the instruction byte is loaded into the operand register and the func-
tion encoded in the other half of the instruction byte is performed. At
this point, it would appear that this instruction format limits operand
values to the range of to 15 and that only 16 different functions
could be performed. A means is provided, however, for representing
larger data values and for encoding a greater number of functions. The
O or operand register is used to form operands and multi-byte
instructions from a sequence bytes. For data values represented by
10
more than four bits, special function codes cause individual four bit
"pieces" of the data value to be extracted from a series of instruction
bytes and accumulated in the operand register. The last function code
in this series of instruction bytes will actually operate on the accumu-
lated data value. Figure 2.5 shows an example of this operand forma-
tion process.
Example Adds $5A9 to Register A
(adc #$5A9}
Function
Codes
O Register
A Register
O Register
A Register
Instruction
Bytes
y"
; : ! ! : ;
! ! : B? 4 ! 1 : 3
Register |; ; ; 5 : 5 ;o 1
A Register|
; ; ; B ! 4 ; 1 .3 1
; : : ! ! 5 : A :
! ; ; ! B 5 4 ; 1 i 31
^ / 'y
X ; Y
Register | : : ; ! ; : 1
A Register | ! : : BJ 9 ; B i Q 1
Data
Values
Figure 2.5. Example of Operand Register Operation
11
To encode functions with value representations greater
than four bits, the value representing such a function is loaded into the
operand register in the same manner as if it were data. A special
function code then causes the value in the operand register to be exe-
cuted as an instruction. Using this instruction formatting scheme, the
instruction set has been optimized so that the most frequently used
operations are encoded using only a single byte. Measurements show
that about 70% of the instructions actually executed in a typical pro-
gram are, in fact, encoded using only a single byte [In87c]. Having
most instructions represented as single bytes not only reduces the
memory requirements for program code but tends to improve pro-
gram performance. This is because, since fewer bytes are fetched per
instruction executed, fewer memory accesses will be required to exe-
cute the program.
This method of encoding instructions has other effects
on processor performance. Each byte of a multi-byte instruction takes
one processor clock cycle to load into the operand register for assem-
bly of the instruction or its operand. Because of this, instructions with
a greater byte length take more time to assemble before they can be
executed. In most cases, the length of an instruction is fixed. How-
ever, in some cases, the length of an instruction is dependent on the
value or location of the instruction's operand. Since these factors can
be controlled by the programmer, it is useful to be aware of these
types of instructions. For example, data values located in the first 16
locations above the base of the processor workspace require only one
12
instruction byte to be accessed. Data values located elsewhere require
additional instruction bytes, which increases the time required to
access these data values. Because of this, frequently accessed data val-
ues should be located closest to the base of the workspace. Also,
loading a constant takes one instruction byte for each four bits of the
constant's length. This means that loading a 32-bit constant takes
eight processor clock cycles. If such a constant is to be used
repeatedly, it is often more efficient to name and store the constant as
a data value and use that data value instead.
b. Concurrency
A single Transputer directly supports running concur-
rent processes. These processes may be either of two priority levels:
high or low. To facilitate implementation of concurrency and prioriti-
zation, the Transputer has a micro-coded process scheduler. The
operations performed by this scheduler can be examined based on
whether a process is active or inactive. An inactive process is one that
is waiting on communication or on a programmed time delay
(operations for inactive processes are discussed later in this chapter).
An active process is one that is not waiting and is ready to execute.
To manage the active processes, the process scheduler
maintains a separate linked list of active processes for each priority
level. Instructions are included in the instruction set for starting new
processes by adding the process to an active process list. Processes
are selected for execution from these active process lists based on the
following generalized rules:
13
• High-priority processes are always executed in preference to low-priority processes. If a low-priority process is executing when ahigh-priority process is added to the high-priority active processlist, the low-priority process is preempted and the high-priorityprocess is executed. A high-priority process is executed until it
completes or until it must wait for communication or for a pro-
grammed time delay.
• When no high-priority processes are available for execution, a low-priority process may be executed. A low-priority processes is
executed until it must wait for communication or for a pro-grammed time delay. In addition, to ensure that one low-priority
process does not monopolize the processor, a low-priority pro-
cess that has been executing for more than about one millisecondis suspended. The suspended process is placed at the end of thelow-priority active process list and the process at the beginning of
the low-priority active process list is then executed.
2. Floating Point Unit
One version of the Transputer includes hardware for per-
forming floating point arithmetic operations. Internally, the floating
point unit includes a three-register floating-point evaluation stack that
operates in the same manner as the "normal" or integer processor's
evaluation stack. The floating-point unit operates in parallel with the
other components of the Transputer. Floating-point operations may,
therefore, be performed in the floating-point unit at the same time
that integer calculations are being performed in the integer processor.
Currently, only the Transputer model T800 includes this floating-
point unit.
3. Links
The hardware links provide the means for implementing CSP
communication channels between processes executing on different
Transputers. A link from one Transputer is connected to a link on
another Transputer to provide a bidirectional pair of communications
14
channels. Each hardware link can perform simultaneous, independent
input and output communication. Instructions are included in the
Transputer's instruction set for performing input and output opera-
tions using the links. A block diagram of a communications link is
shown in Figure 2.6. Each communications link consists of an inde-
pendent direct memory access controller and serial communication
logic.
\
«DMA Controller
Serial Output
ControllerMessage Location
Register
/
/
Message Length
Register
Serial Input
Controlleriiiplii;
Resume Process
Pointer ^
Figure 2.6. Hardware Communication Link Logical Block Diagram
To perform a communication via a hardware link, the com-
munication link address and the size and location of a message are
specified, then the communications instruction is executed. This ini-
tializes the direct memory access controller with the size and location
of the message to be communicated. The process executing the
15
communication instruction is suspended and a pointer for later
resuming the process is saved (another ready process from an active
process list may then be executed). When both the sending and
receiving links have been initialized in this manner, the message
communication is accomplished. When the communication is com-
plete, the process which was suspended for communication is added
to the end of the appropriate high- or low-priority active process list
to wait its turn for execution.
Messages are transmitted by the links one byte at a time in a
bit-serial format. After a receiver has recognized the reception of a
byte and is capable of receiving another byte, the receiver transmits an
acknowledge message. The transmitter will await reception of the
acknowledge message before transmitting the next message byte.
Since the link hardware performs no error checking on messages, the
purpose of the acknowledge message is solely to control the flow of
message bytes between the links. Figure 2.7 shows a formatted mes-
sage byte and an acknowledge message.
^Data Packet
1 i1
2 Start Bits
t > t t 1 t «
!
8 Data Bits
Acknowledge Packet
Stop Bit
,4_„1 ;
Figure 2.7. Serial Communication Link Protocol
16
A method for communicating between processes executing
on the same Transputer is also provided. To perform such a commu-
nication, a memory address and the size and location of a message are
specified, then the communications instruction is executed (note that
this is the same sequence required to initiate communication via a
link). When both the sending and receiving processes are ready to
communicate, the communication is accomplished by performing a
memory-to-memory transfer of the message data.
4. Memory
The Transputer can address four gigabytes of memory. This
memory space is divided into two non-overlapping segments: an
internal or on-chip segment of memory and an external or off-chip
segment of memory. Although these two segments of memory are
logically the same, the physical characteristics of the two segments
are quite different.
The on-chip memory consists of up to four kilobytes of static
random access memory (depending on the type of Transputer being
considered). Within the address space, the on-chip memory occupies
the lowest block of addresses. The on-chip memory can be accessed
for read or write via the internal processor bus in one processor clock
cycle.
The off-chip memory forms the balance of the address space.
The off-chip memory is accessed via an external memory interface.
This external memory interface provides access to the external mem-
ory by multiplexing a 32-bit address and 32 bits of data onto a single
17
32-bit external bus. The timing for this multiplexing slows access to
the external bus to a minimum access time of three processor clock
cycles. The actual number of cycles required to access external mem-
ory is also dependent on the requirements of the external memory
devices. For the Transputer systems available in our laboratory, the
external memory access times range from three to five processor
clock cycles. This difference in access times between the "fast" on-
chip and "slow" off-chip memory can affect program performance.
Frequently accessed variables or code segments should, therefore, be
preferentially located in the "fast" on-chip memory.
Additionally, the external memory interface includes control
logic for refreshing external dynamic random access memory devices.
Control lines and signals are also provided to facilitate peripheral
device direct memory access to the external segment of memory.
5. Timers
The Transputer provides two hardware timers. Each of these
timers can be viewed as free-running 32-bit binary counters. One of
tlrie timers is accessible to high-priority processes; the other timer is
accessible to low-priority processes. The high-priority timer incre-
ments at intervals of one [isecond, for a total cycle time of about 72
minutes. The low-priority timer increments at intervals of 64
[iseconds, for a total cycle time of about 76 hours.
Instructions are provided for initializing the value of these
timers and for reading the current value of a timer. An instruction is
also provided for suspending execution of a process until a specified
18
timer value is reached. To implement this instruction, the Transputer
maintains a linked list of suspended processes waiting on timer values.
Separate lists are maintained for the high- and low-priority timers.
These lists are ordered by the specified "wait-until" time value. The
first "wait-until" time value in each list is loaded into a dedicated reg-
ister and, using hardware, is compared with the current value of the
high- or low-priority timer. When the "wait-until" timer value is
reached, the suspended process is removed from the timer list and is
added to the end of the appropriate active process list to wait its turn
for execution. The next timer list entry is then loaded for comparison.
6. External Event Input
The external event input on the Transputer is similar to an
external interrupt input. To the Transputer, this external event input
appears as a communications channel which is capable of transmitting
a signal to a user's program. A user's program requesting input from
the external event channel will be suspended if the external event
input is not being asserted. Then, when the external event input is
asserted, the process will be added to the end of an active process list
to wait its turn for execution. Either a high- or a low-priority process
may request input from the external event channel.
19
III. TRANSPUTER PERFORMANCE
A OVERVIEW
As has been described in the previous chapter, the Transputer is a
complex microprocessor. Because of this complexity, the perfor-
mance of even a single Transputer can be affected by many factors.
Such factors might include whether or not the program and/or
associated data is in on-chip or off-chip memory or if there is internal
bus contention resulting from external communications link direct
memory access. When considering a network of Transputers, the fac-
tors that can affect overall network performance are multiplied con-
siderably.
Developing efficient Transputer-based systems in the face of this
complexity requires a firm understanding of Transputer performance
and the manner in which different factors influence that performance.
Evaluating Transputer-based systems requires accurate methods for
measuring the performance of individual and networked Transputers.
To begin to understand Transputer performance characteristics
and to gain experience in measuring individual and networked Trans-
puter performance, a series of timing and performance studies was
conducted.
a PRIOR RESEARCH
INMOS provides basic performance specifications for the
Transputer [In87b and In87d]. The basic specifications list the
20
performance for individual machine-level and high-level language
operations. The listed specifications consider the effects of some of
the factors that can potentially affect performance. While it is
expected that the manufacturer's performance specifications are
accurate, it is necessary to independently confirm this.
Prior theses [Va87 and Ha87] document some detailed tests and
analyses of Transputer performance. These theses point out, however,
that some aspects of their performance test results do not appear to
be consistent or cannot adequately be explained. They suggest that
further research be performed in this area to resolve the identified
problems and to extend the scope of performance testing.
C TEST METHODOLOGY
This chapter documents the suggested further timing research
and documents the results of an initial set of timing tests on the
recently released T800 20 MHz Transputer. Two major categories of
testing are addressed. The first category is testing to confirm the
basic manufacturer's performance specifications for the Transputer.
The second category is testing to determine the interaction that exists
between the operation of the central processing unit and communica-
tion link direct memory access activity.
1. Configuration
To accomplish both types of testing, a single test configura-
tion was developed. The test configuration consists of a central
"target" Transputer and four "satellite" Transputers, each attached to
the target Transputer by a communications link. In addition, there
21
are associated Transputers which perform the functions of control and
of data routing and recording. A logical diagram of this test configura-
tion is presented in Figure 3.1. A detailed diagram of the test set-up is
presented in Appendix A.
Host System
of Test Configtiration
Although based on and quite similar to a previously used test
configuration [Ha87], there is a subtle but significant difference
between the two test configurations. In the prior test configuration,
one of the satellites was the Transputer installed in the host develop-
ment system. User programs executed on the Transputer in the host
development system are run from within the Transputer Development
System (TDS) "shell."
22
After some initial experimentation, it became apparent that,
in some way, this shell affected the timing of programs run from
within the shell. The timing of programs running on Transputers
external to the shell but depending on communications to or from a
program run from within the shell also appeared to be affected. It is
postulated that some shell processes are active while the user pro-
gram is executing and, to an extent, interfere with the user program.
References to such processes can be found in [In87a].
Although it may be of interest to research this aspect of TDS
in the future, it is sufficient for the purposes of the current timing
tests to simply ensure that all timing measurements are performed
external to and independent of the host development system
Transputer.
2. Software
In general, the target Transputer performed some calculation
in a loop while the satellite Transputers placed different link input
and output communications loads on the target Transputer. The test
software monitored the time required to perform link communica-
tions and the number of calculation loops that could be performed
during those communications. Appendix A provides a listing of the
programs used to measure and record processor performance.
3. Conditions
Because so many factors can interact and affect Transputer
performance, it was necessary to set and hold some test conditions
fixed so that the effects of variations in parameters of interest could be
23
properly interpreted. Constant for the tests documented by this
thesis were:
• T414 Transputer processor speed was 15 MHz.
• T800 Transputer processor speed was 20 MHz.
• Communications link speed was fixed at 20 megabits per secondfor both Transputer types.
• Program code and scalar variable values in the calculation loopwere located in the "fast" on-chip memory for both Transputertypes.
• Scalar variables in the calculation loop were in "local" scope (i.e.,
within the first 16 bytes of the bottom of the workspace andaccessible by single byte load and store instructions).
• Memory-to-memory transfer blocks were located in "slow" off-
chip memory for both Transputer types.
• Communications data blocks were fixed at 100,000 bytes.Because this block size is much greater than the size of the on-chip memory, the block was located In "slow" off-chip memoryfor both Transputer types.
• CommLunlcatlons processes were run at high priority and the cal-
culation loop was run at low priority.
• Time measurements were taken in the each Transputer using thehigh-priority one-(isecond resolution timer.
The effects of varying certain parameters of interest were
investigated. The following list identifies the parameters of Interest
and the manner in which they were varied.
• The characteristics of the target Transputer calculation loop werevaried by placing different types and numbers of operations withinthe loop. The operations used within the loop were no operation,assignment, addition, subtraction, multiplication, division, and100- and 1000-byte memory-to-memory transfers. The numberof individual operations of a type in a loop ranged from to 4operations per loop.
24
• Communications conditions were varied by changing the numberof links that were active at one time. Additionally, the communi-cations conditions were varied by changing the size of a commu-nications packet. The number of active communications links
ranged from to 4 input links active and from to 4 output links
active. The individual data packet size was varied in 16 stepsfrom one byte per packet to 100,000 bytes per packet with anoverall constant data block size of 100,000 bytes.
4. Data Management
From the number of possible combinations of test conditions,
it was clear quite early that these timing tests would generate a large
amount of test data. To more effectively handle this test data, it was
decided to record the data in some database management system.
Using a database management system provided for ease of access to
selected aspects of the data. Additionally, since the potential exists
for performing further timing tests, such a system will facilitate the
incorporation and aggregation of any future timing data.
Since the Ingres relational database management system was
readily available on the departmental mini-computer, this system was
selected for use. To facilitate loading of the timing database, condi-
tions for a particular timing test and the test results were directed to
a disk file on the host development system, then electronically trans-
ferred to the departmental mini-computer for loading into the Ingres
timing database. Appendix B describes the Ingres database created for
the timing data and gives some examples of accessing timing informa-
tion from the database.
25
D. TEST RESULTS
Test runs covering the range of conditions described previously
were completed and the resulting data was loaded into the Ingres
timing test database. Data was extracted from this database to "view"
several aspects of Transputer performance. These aspects of perfor-
mance are:
1. Isolated Processor Performance
The first view of the timing results is to compare measured
instruction execution times with the manufacturer's specified instruc-
tion times for a selected subset of Transputer instructions. This is
intended as a confirmation of the manufacturer's stated execution
times. Additionally, if the timing results are consistent with the man-
ufacturer's specifications, it will tend to validate the timing test
methodology that is being used.
Determining the manufacturer's specified execution time for
a selected loop operation requires that the operation be examined at
the machine-language level. Using a disassembler [Br87] to facilitate
examination of the timing program object code, each of the selected
operations was decomposed into its machine-level components. The
manufacturer-specified number of processor clock cycles for each of
these components of an operation was summed, then multiplied by
the period of the processor clock. The result of this calculation is the
expected execution time for a particular loop operation executed in
on-chip memory. For example, the expected execution time for the
multiply operation is determined as follows:
26
Multiply Operation:
a := b * c
Machine-Language Equivalent Multiply Operation:
Idl b (2 cycles)
Idl c (2 cycles)
pfix; mult (1+38 cycles)
stl a (1 cycle)
Execution Time Calculation (T800 @ 20 MHz):
A A 10.050 iisec o o^^44 cycles x
^y^.{g= 2.200 jisec
Note that execution of instructions in "slow" off-chip memory
affects the calculation of expected execution times. To determine the
expected time for such an off-chip operation, the same basic method
described above is used, except that a separate accounting of on-chip
and off-chip cycle counts is maintained. The off-chip cycle count is
then multiplied by a hardware-dependent scale factor. This scale fac-
tor is the number of processor cycles required to make a single access
to the off-chip memory. In the case of the hardware used for these
timing tests, this scale factor is 4 for the T414 Transputer and 5 for
the T800 Transputer.
In addition, in the particular hardware configuration used for
these timing tests, off-chip memory consisted of dynamic random
access memory devices. Such devices must periodically be
"refreshed" to maintain their data. Although this refresh operation is
relatively fast and is handled automatically by the Transputer
27
hardware, access to the off-chip memory is restricted during a short
period of time. In the worst case, this increases the average time
required to access off-chip memory by a factor of 1.0237 for the T414
Transputer and 1.0213 for the T800 Transputer [In87b]. The worst-
case overall time for an off-chip operation is, then, the sum of the on-
chip time and the scaled and "refresh delayed" off-chip time. An
example calculation for execution of an off-chip memory-to-memory
transfer follows:
Memory-to-Memory Transfer Operation:
[array FROM FOR 1000] := [array FROM FOR 1000]
Machine Language Equivalent Operation:
i
pflx; pfix; Idc #$800 (1 + 1 + 1 cycles)
pfix; mint (1 + 1 cycles)
wsub (2 cycles)
stl temp (1 cycle)
pfix; pfix; Idc #$800 (1 + 1 + 1 cycles)
pfix; mint (1 + 1 cycles)
wsub . (2 cycles)
Idl temp (2 cycles)
pfix; pfix; Idc #$3E8 (1 + 1 + 1 cycles)
pfix; move ' (1 + 8 cycles maximum500 off-chip cycles)
Execution Time Calculation (T414 15 MHz):
29 cycles x^•^c^de^^'' = ^'^^^ ^^^"^
28
500 cycles x^'^^f cle
^^x (4 x 1.0237) off-chip = 136.493 |isec
1.933 |isec + 136.493 |isec = 138.426 |isec maximum
Measured execution times for different operations were
derived from the timing data as follows:
• Consider only timing data where no external communication links
were active.
• Calculate the average time per loop for a test set with no opera-
tions in the loop. (The average loop time is simply the time of
measurement divided by the number of loops executed duringthat time.)
• Extract the average time per loop for a test set with 1, 2, 3, and 4instances of a selected operation in the loop.
• Subtract the no-operation loop time from each of the selected
operation loop times. These loop times have now been "adjusted"to remove any loop overhead time.
• The incremental increase in adjusted loop times for the selectedoperation loops is the expected time required to perform a single
operation.
Tables 3.1 and 3.2 list the expected and measured execution
times for the selected loop operations. These results show that,
except for the divide operation, the measured results are consistent
with the expected results. The expected time for the divide operation
is based on the operation taking 39 processor clock cycles [In87d].
The measured results indicate that the divide operation takes 38 pro-
cessor clock cycles. Subsequent to performing the timing tests, it was
confirmed with the manufacturer that the divide operation, in fact,
takes 38 clock cycles as measured in the timing tests [Pe88].
29
TABLE 3.1
T414 EXPECTED AND MEASUREDINSTRUCTION EXECUTION TIMES
Operation Operations
per
Loop
Test
Duration
(|isec)
Loops
During
Test
LoopTime
(jLisec/Loop)
Measured
Op Time
(lisec)
Calculated
Op Time
(lisec)
Null Loop 1000014 214153 4.67
Assignment 1 1000013 205352 4.87 0.20 0.200
2 1000014 197246 5.07 0.20
3 1000011 189755 5.27 0.20
4 1000012 182813 5.47 0.20
Addition 1 1000013 197246 5.07 0.40 0.400
2 1000012 182813 5.47 0.40
3 1000014 170349 5.87 0.40
4 1000011 159475 6.27 0.40
Subtraction 1 1000013 197246 5.07 0.40 0.400
2 1000012 182813 5.47 0.40
3 1000015 170349 5.87 0.40
4 1000011 159475 6.27 0.40
Multiplication 1 1000011 131497 7.61 2.94 2.933
2 1000016 94878 10.54 2.94
3 1000022 74212 13.48 2.94
4 1000017 60938 16.41 2.94
Division 1 1000013 129230 7.74 3.07 3.133
2 1000012 92535 10.81 3.07
3 1000018 72071 13.88 3.07
4 1000022 59019 16.94 3.07
Block Move 1 1000030 50711 19.72 15.05 15.516
(100 Bytes) 2 1000036 28773 34.76 15.04 (maximum)3 1000042 20077 49.81 15.05
4 1000074 15433 64.80 15.03
Block Move 1 1000117 7007 142.73 138.06 138.426
(1000 Bytes) 2 1000100 3562 280.77 138.05 (maximum)3 1000056 2388 418.78 138.04
4 1000066 1796 556.83 138.04
*
30
TABLE 3.2
T800 EXPECTED AND MEASUREDINSTRUCTION EXECUTION TIMES
Operation Operations
per
Loop
Test
Duration
(iisec)
LoopsDuring
Test
LoopTime
(|LLsec/Loop)
Measured
Op Time(|j.sec)
Calculated
Op Time
(usee)
Null Loop 1000009 285581 3.50
Assignment 1 1000010 273845 3.65 0.15 0.150
2 1000008 263035 3.80 0.15
3 1000009 253038 3.95 0.15
4 1000010 243789 4.10 0.15
Addition 1 1000008 263035 3.80 0.30 0.300
2 1000011 243789 4.10 0.30
3 1000010 227167 4.40 0.30
4 1000011 212667 4.70 0.30
Subtraction 1 1000009 263035 3.80 0.30 0.300
2 1000010 243789 4.10 0.30
3 1000011 227167 4.40 0.30
4 1000010 212667 4.70 0.30
Multiplication 1 1000011 175357 5.70 2.20 2.200
2 1000015 126524 7.90 2.20
3 1000012 98964 10.11 2.20
4 1000010 81263 12.31 2.20
Division 1 1000013 172334 5.80 2.30 2.350
2 1000015 123400 8.10 2.30
3 1000009 96109 10.41 2.30
4 1000016 78704 12.71 2.30
Block Move 1 1000013 57789 17.31 13.80 14.166
(100 Bytes) 2 1000009 32118 31.14 13.82 (maximum)3 1000044 22262 44.92 13.81
4 1000036 17037 58.70 13.80
Block Move 1 1000012 7558 132.31 128.81 129.110
(1000 Bytes) 2 1000099 3830 261.12 128.81 (maximum)3 1000100 2565 389.90 128.80
4 1000524 1930 518.41 128.73
2. Isolated Communications Link Performance
The Transputer manufacturer also provides specifications for
communications link performance [In87b and In87c]. The measured
31
performance of a communications link can also be calculated from
timing test data. Measured communications link performance is cal-
culated by dividing the size of the largest single data block communi-
cated over a single link by the time required for the communication.
The largest block size was selected to minimize the affects of any
communications set-up overhead that might exist. For the timing
tests performed, the largest block size was 100,000 bytes. The speci-
fied and measured communications data rates are compared in Table
3.3. This table shows that the specified and measured data rates are
consistent.
TABLE 3.3
ISOLATED COMMUNICATION LINK PERFORMANCE
Processor
TypeCommunication
ModeSpecified Rate
(Kbytes/sec)
Measured Rate
(Kbytes/sec)
T414Input
Output
Input/Output
800
800
1600
759762
1505
T800Input
Output
Input/Output
1740
1740
2350
1738
1737
2344
Of particular interest here is the bidirectional data rate for
the T800 Transputer. For the T414 Transputer, the bidirectional data
rate is approximately 2 times the unidirectional data rate. For the
T800. however, the bidirectional data rate is only about 1.35 times the
unidirectional data rate. Since in the worst case of the external off-
chip memory the processor bus data rate is at least 20 megabytes per
32
second, accessing data from memory should not limit link perfor-
mance. Taking a closer look at the link communications protocol
provides the key to understanding this problem.
In unidirectional communication for the T800, the acknowl-
edge packet from the receiving Transputer is returned to the trans-
mitting Transputer before the transmitter completes its transmission.
Because of this, the transmitter is able to transmit bytes "head-to-tail"
without any intervening delays. This results in a maximum theoretical
unidirectional data rate of
20 Mbits 1 byte 1.82 Mbytessec "^11 bits transmitted
~sec
which is consistent with the actual unidirectional link data rate. In
bidirectional communications, however, each Transputer must
"sandwich" acknowledge packets between data packets. This
increases the total number of bits transmitted by a Transputer per
data byte from 11 to 13 and results in a maximum theoretical bidirec-
tional data rate of
20 Mbits 1 byte _ 3.08 Mbytessec ^ 13 bits transmitted ^ ~ sec
This is less than double the unidirectional rate. Additionally,
because a receiving Transputer is also transmitting its own data, it may
not immediately be able to return an acknowledge packet for data
received. The transmitter may, therefore, be delayed in transmitting
its next data byte, further reducing the data rate. However, since the
33
internal timing threshold requirements of the link hardware are not
known, it is not possible to exactly quantify this effect.
Because of this bidirectional link communications limitation,
two separate unidirectional communications links between T800
Transputers will provide about a 1.1 megabyte greater bidirectional
communications throughput than will a single communications link
operated bidirectionally.
Since the T414 unidirectional communication rate is signifi-
cantly less than the maximum possible rate for a 20 Mbit/second data
link, space already exists between successive transmitted packets to
"insert" an acknowledge packet for a received data packet. Because of
this, the T414 does not show the prominent bidirectional
communications protocol limitation shown for the T800.
3. Coimnunication Link Interaction
Prior research [Va87] has shown that, for the T414 Trans-
puter, there is minimal interaction between links when multiple links
are operated simultaneously. It was desired to determine whether
this is also a characteristic of the T800 Transputer with its greater
communication link data rates. Figures 3.2 and 3.3 show communica-
tions link performance for both the T414 and the T800 under condi-
tions when multiple links are active. Except for the bidirectional
protocol limitation previously discussed, these figures show that there
is minimal interaction when multiple links are being operated. Note
that in Figure 3.2, the number of links active includes both unidirec-
tionally and bidirectionally active links. Because of this, each number
34
of active links may include more than one data point. For example,
two links active on Figure 3.2 include the data points for both the case
when two unidirectional links are active and the case when one bidi-
rectional link is active.
Communications
Data
Rate
(bytes/second/link)
)
1000 "
800-^
600":
400 ;
200 ;
,9 r' ' [J c
i
C) 1 s 1 ^
Active L
; 6
nks
1 1
) 7 8 S
Figure 3.2. T414 Multiple Link Effects on Communications Rate
rd :=:
CO CC Oo o•^ oTO CO
C d)
E j3E ^oO
2000
1500
1000
500 -'Q Unidirectional
« Bidirectional
3 4 5 6
Active Links
Figure 3.3. T800 Multiple Link Effects on Communications Rate
35
Additionally, for the T414, it has been shown that individual
data packet size can affect the overall communications data rate
[VA87]. In general, as the data packet size is decreased, the overall
communications data rate also decreases. This is because the over-
head time associated with the set=up of a data transmission is more
significant when the size of the data packet to be communicated is
small. The current test configuration and software was used to extend
the scope of testing to further investigate this phenomenon. The
results of this testing for the T414 and the T800 are shown in Figures
3.4 and 3.5. These graphs, as well as others in this chapter, were
conducted with various numbers of links active. The number of links
active for a particular graph are identified on that graph. Figures 3.4
and 3.5 show that communications throughput decreases rapidly when
the packet size decreases below a threshold point of from about 200
down to 50 bytes per packet.
Further, note the communications characteristics shown in Figure
3.5. This graph shows the protocol limited nature of bidirectional link
operations for the T800 Transputer. Compare the plots for the two
bidirectional links case with those of the four unidirectional links case.
In either of these cases, a total of four links are operating. As packet
size increases from one byte per packet, the plots for both links are at
first identical. However, as the packet size passes about 20 bytes per
packet, the two plots begin to diverge. The four unidirectional link
case data rate continues to increase but the bidirectional link case
becomes protocol limited and data rate levels off.
36
Bre
BToaCofao"c3
OO
800
600
400
3=3=;j
cooCDwo5o>>
^ 200
Links Active
-a- 1 Unidirectional
-- 2 Bidirectional
-^ 4 Bidirectional
1000 10000 100000
Packet Size (bytes)
Figure 3.4. T414 Packet Size Effects on Communication Rate
2000
oB _ 1500
CO .S03
Qco
acooa
03 i2o w
? ^E -Q
E ^oO
1000 -"
500
Links Active
1 Unidirectional
2 Bidirectional
4 Unidirectional
4 Bidirectional
100 1000 10000 100000
Packet Size (bytes)
Figure 3.5. T800 Packet Size Effects on Communication Rate
37
4. Commmiication Link Effects on Processor Performance
One of the key questions raised about Transputer perfor-
mance has been the extent to which communications link activity
interferes with processor execution speed. There are two ways in
which such interference can occur. First, both the processor and the
communication links contend for access to the same internal data bus.
Secondly, the set-up of a communication and the rescheduling of a
process upon completion of its communication require some proces-
sor overhead. Recalling the timing test methodology, where opera-
tions in a loop were conducted in parallel with a variety communica-
tions load conditions, timing data is available to address this issue.
For the purposes of comparing processor performance, the
average number of loops that were performed per unit time under
each set of test conditions was taken as the relative measure of target
processor performance. For each set of test conditions, processor
performance has been normalized by dividing performance measure-
ments by the "no communication" performance for that set of test
conditions.
Figures 3.6. 3.7, 3.8 and 3.9 present representative results
from this analysis. As can be seen from these figures, processor per-
formance drops dramatically as the communications packet size is
decreased below a certain threshold value. In some cases, perfor-
mance was reduced to the point where the processor made no
progress on its looping calculation; the processor was completely
communications bound.
38
©ocCT3
Ek.
o"ti
O)
Q.
CLoo_l
03
ToE
Q.oo.S 0.6coca
<DO.OOc
S B
—
B—a B—El
<>
Links Active
1 Unidirectional
2 Bidirectional
• 4 Bidirectional
10 100 1000 10000 100000
Packet Size (bytes)
Figure 3.6. T414 Performance Degradation with No Loop Operation
OCO
Eoo
k- ro•n (n03 cQ. oQ. mO ^O 05_l Q.
n Oo OJN oTO
>E o
Tfo
Links Active
' 1 Unidirectional
2 Bidirectional
' 4 Bidirectional
10 100 1000 10000 100000
Packet Size (bytes)
Figure 3.7. T414 Performance Degradation
with Four Divide Operations
39
0)oc:m _„^
f- Q.Uc Oo o"Co c0. cQ. oOO CO_J
<x>o CLo o.N
cn cE
""^
Links Active
' 1 Unidirectional
• 2 Bidirectional
4 Bidirectional
0.0 -*
100 1000 10000 100000
Packet Size (byte)
Figure 3.8. T800 Performance Degradation with No Loop Operation
<uocCO
E
exoo
w= rut cnCD cCL oQ, mO k.
O <D™l a,
T3 o03 oN T3
ffl>
E ok. -^o
Links Active
1 Unidirectional
2 Bidirectional
4 Bidirectional
10 100 1000 10000 100000
Packet Size (bytes)
Figure 3.9. T800 Performance Degradation
with Four Divide Operations
40
In the left-hand region of the performance graphs, where the
individual data packet size is small, the processor overhead associated
with the communication is the primary contributor to performance
degradation. As the data packet size is increased, the time associated
with actually transmitting the data packet increases. As the actual
packet transmission time increases, the overhead associated with
communicating a packet becomes less significant. Therefore, when
the packet size is very large, any processor performance degradation
will be due to processor and communications link internal bus con-
tention. Towards the right-hand side of the performance graphs, pro-
cessor performance degradation can be seen asymptotically
approaching this bus contention-only degradation characteristic.
As can be seen by comparing the different performance
graphs, the type of calculation performed within the loop also affects
the amount of performance degradation. Although there is only a
minimal difference in the overhead limited (left-hand) area of the
graph, there is a noticeable difference in the bus contention limited
(right-hand) portion of the graph. In general, the difference in the
bus contention limited area of the graph is directly related to the fre-
quency of memory access required by the calculation loop. For short
instructions (such as assignment) that require frequent bus access, the
performance degradation is greater than for long instructions (such as
division) that require infrequent bus access. Figures 3.10 and 3.11
illustrate this characteristic.
41
CDC)cnj
F ^_^mo r•no CO
CL CD
n D.o Oo (/)
—J -i<:
T3 cON
03
03-*—
'
FoZ
"3° Assignment-° Division
I I I nil
10 100 1000 10000 100000
Packet Size (bytes)
Figure 3.10. T414 Bus Access Frequency Effects
on Perforinance Degradation
03OcmEo
CL
CLOO
>
CO
C/)
c
CD CO
CO
Eo
Assignment
Division
10 100 1000 10000 100000
Packet Size (bytes)
Figure 3.11. T800 Bus Access Frequency Effects
on Performance Degradation
42
5. Processor Effects on Communications Link Performance
As a corollary to the question of communications effects on
processor performance. It Is reasonable to ask what effect an actively
executing process has on the performance of a communications link.
It might be expected that if the processor is slowed by the effects of
communication link bus contention, a communication link would also
be slowed by the effects of processor bus contention. Data was
extracted from the timing database to determine if such effects did in
fact exist, A typical sample of this data is shown in Table 3.4.
TABLE 3.4
PROCESSOR EFFECTS ON COMMUNICATIONS PERFORMANCE
Data No Calculation With Calculation
Packet Size Data Rate Data Rate
(bytes/packet) (bytes/second/link) (bytes/second/link)
T800 1
10
100
1000
10000
100000
31.415
282.415
1120.561
1166.943
1171.413
1171.866
T414 1 23.561 23.561
10 226.078 226.072
100 705.184 701.262
1000 738.460 738.165
10000 742.242 742.975
100000 747.049 740.988
31.415
282.416
1112.013
1165.773
1171.097
1171.646
Table 3,4 shows that the effects of processor activity on
communication link performance is not significant. The reason for
this is that, when the communicating process is of an equal or higher
43
priority to the executing process, the communication link has priority
access to the data bus [In86e]. The communication link is, therefore,
virtually unaffected by bus contention. Bus arbitration priority was
defined in this manner so that the lower bandwidth communications
links would not be idled [Ha87].
6. Summary
Through the timing tests that have been performed, it has
been seen that a strong relationship exists between the calculational
performance of the Transputer and the operation of its communica-
tions links. Additionally, the size of individual communications pack-
ets has a pronounced effect on both the calculational and communica-
tions performance of the Transputer. Knowledge of these
characteristics can be used by the software designer to develop more
efficient software.
44
IV. SHARED MEMORY MODEL PROGRAMMING INTERFACE
A BACKGROUND
The concept of Communicating Sequential Processes as imple-
mented by the Transputer is one methodology for interprocess com-
munication and synchronization. An alternative concept utilizes
globally shared memory as a means for interprocess communication
and sjnichronlzation. A body of software and development experience
that makes use of this shared memory concept exists and could be
useful in developing programs for a network of Transputers. In
particular, distributed programs developed in the ADA language
environment use a shared memory model.
In a large network of Transputers, the physical connectivity of the
network is limited by the availability of only four bidirectional commu-
nications links per Transputer. This limited connectivity often
imposes restrictions upon the distributed systems software designer.
The software designer must be aware of and consider the physical
configuration of the network. As a result, the designer must often
work around the limitations that the configuration imposes. The use
of a shared memory model in a network of Transputers is one
methodology that might be used to isolate the software designer from
these physical configuration limitations.
These factors have motivated the development of prototype soft-
ware to implement a shared memory model environment in a network
45
of Transputers. The purpose of this prototyping effort was to gain
experience in developing such a system for a distributed system. This
chapter describes the design, implementation, and evaluation of this
prototype.
a EVENT COUNTS AND SEQUENCERS
When multiple processes are sharing a common resource, as in
the case of a shared memory system, some means must be available for
the processes to coordinate or synchronize their use of the resource.
One means for this synchronization is through the use of Event Counts
and Sequencers [ReKa79].
As its name implies, an event count is a value representing the
cumulative total number of events of a particular type that have
occurred in a system. An event count might be used, for example, to
record and monitor the number of times a particular shared memory
location has been written to or modified. Many separate event counts
may be defined in a system for use in monitoring different types of
events.
A sequencer is also a value representing a count. This count is
used to control the sequence in which events occur. Each sequencer
count value can be thought of as a reservation for a process to use a
shared resource. These reservations are issued in sequencer count
order to requesting processes. Processes with reservations are then
permitted access to the controlled resource in sequencer count order.
For example, several processes writing to the same memory location
may use a sequencer to ensure that only one process writes to the
46
location at a time. As with event counts, many sequencers may be
defined in a system to control access to different shared resources.
Several primitive operations are defined for event counts and
sequencers. These operations are:
• read(event_count)— The read operation returns the current valueof a specified event count.
• advance{event_count)— The advance operation increments thevalue of a specified event count.
• await(event_count, count_value)— The await operation suspendsthe process executing the await command until the value of aspecified event count has at least reached the identified countvalue. Note that execution of the suspended process is not neces-sarily resumed immediately when the specified count value is
reached. The process might not resume until some time after thecount is reached.
• ticket(sequencer)— The ticket operation returns the value of thenext available reservation "slot" for a specified sequencer.
C IMPLEMENTATION
The model of event counts and sequencers was selected as the
paradigm for implementing the prototype shared-memory model.
Since a network of Transputers does not physically share any memory,
the implementation must make use of software to simulate a sharing of
memory. The core of the implementation is a distributed software
kernel which executes on each node in the network. Figure 4.
1
depicts the general physical configuration of a network with this
kernel.
47
Transputer 1
Distributed
Kernel Transputer 2
Transputer 3
Application
Program
Modules
Figure 4. 1 . Physical Configuration of Shared Memory Kernel
The kernel contains the system's shared memory and maintains
the system's event count and sequencer values. The system's shared
memory as well as the event count and sequencer values are appor-
tioned amongst the kernels operating on different nodes. The kernel
also "hides" the specific physical configuration of the network from
the application program by managing all communications between
program modules and network nodes. Figure 4.2 shows the
48
distributed application program's view of the kernel. A detailed
description of the kernel is provided in this section.
Application
Program
Modules
Distributed
Kernel
"Hidden"
Physical
Configuration
Figure 4.2. Logical Configuration of Shared Memory Kernel
The prototype implementation also includes a library of proce-
dures for interfacing with the kernel. This library includes the basic
primitive operations defined for event counts and sequencers. In
addition, primitive operations have been defined and included for
49
accessing shared memory segments. The procedures in this library-
are also listed in this section.
1. The Kernel
Figure 4.3 shows a logical diagram of the software kernel. As
shown, the kernel consists of three major parts: the input/output
buffers, the communications manager, and the shared memory-
manager.
Figure 4.3. Shared Memory Kernel Logical Block Diagram
50
a Input/Output Buffers
A set of input and and a set of output buffers are provided
for the hardware communications links. These buffers decouple the
relatively slow link data transfers from the operation of the kernel.
The buffers enable the kernel to operate in parallel with network data
transfers.
be Coxnmunications Manager
All communications between a kernel and the library
interface procedures and between kernels on different nodes are for-
matted as packets. Two different packet formats are used. One
format is used for communication between a kernel and a library
interface procedure and one is used for communication between
kernels on different nodes in the network. The communications
manager perform any necessary conversions between the two packet
formats. Figure 4.4 depicts an example of each type of communi-
cations packet. A listing of all the packet types is included in
Appendix C.
Packet Format Used Between an Interface Module and the Kernel
f-data
action
codeevent
count id
Packet Format Used Between Kernels on Different Nodes
action
codefrom
node id
from
process
to
node id
to
process
event
count id
data
sizedata
Figure 4.4. Kernel Comiiniinication Packet Formats
51
The communications manager processes all received
packets. If a packet destination is a remote node, the packet is routed
to the node via the hardware communications link identified by the
node.link data structure. If the packet is for a local application pro-
gram module interface procedure, the packet is routed to that inter-
face procedure via the appropriate local link. Otherwise, the packet is
processed by the node's shared memory manager.
c. Shared Memory Manager
The shared memory manager actually performs the
operations defined by the various library interface procedures. As a
result of external requests, this module accesses the kernel data
structures and, if a response is required, generates appropriate mes-
sage packets for return to the requesting process.
2. Data Structures
Operation of the kernel is best described by examining the
set of data structures that support the kernel. These data structures
are central to operation of the kernel. Diagrams of these data struc-
tures and their interrelation are shown in Figures 4.5, 4.6, and 4.7.
a node.link
This array is used to represent the physical configuration
of a network. Each node in the network is assigned a unique identi-
fying number. By convention, the assigned numbers start at zero and
are consecutive. The node.link data structure is a one-dimensional
array with one element for every node in the network. The value of
the ith element in the array identifies the number of the hardware
52
link that is used to send messages to the network node with identify-
ing number i. This does not necessarily mean that the ith node is at
the other end of the specified link. Rather, it means that the node on
the other end of the specified link is the next node in the path to the
ith node.
Note that the node.link array may be viewed as one row
extracted from a type of adjacency matrix that defines the network.
Such an adjacency matrix array is, in fact, used at compile time to
define the individual node.link arrays. As a result, each kernel hold a
portion of the overall matrix. Currently, the programmer defines the
contents of this matrix as the last step of the software development
process.
b. count.node
As was done to uniquely identify nodes, each event count
and sequencer in the system is assigned a unique identifying number.
Again, by convention, these numbers start at zero and are consecutive.
The count.node structure is a one-dimensional array with one element
for each event count and sequencer in the system. The value of the ith
element in the array identifies the number of the node that maintains
the count assigned identifying number i. This array identifies which
processor in the network is responsible for maintaining each event
count.
c. count.array
This is a two-dimensional array that contains four ele-
ments for each event count and sequencer maintained by a kernel at a
53
node. The first two of these elements are the current values of the
event count and sequencer. The third element is either a nil pointer
or a pointer to the base of a shared memory segment associated with
the event count and sequencer. The fourth element is either a nil
pointer or a pointer to a list of processes which have been suspended
by executing the await library procedure. This list is ordered by value
of the count value specified when a processes executed the await pro-
cedure. Each event count and sequencer has a separate waiting pro-
cess list associated with it.
d. couiit.array.in<iices
This is a one-dimensional array that has one element for
each event count in the system. For counts maintained at a particular
node, this array contains the index of that count's data in the
count.array data structure. For counts not maintained at the node, the
array contains the nil token. Use of this array speeds access to a
count's data in the count.array data structure by providing a direct
pointer to the desired row and avoiding having to search through the
count.array to find the correct row.
e. node.awaits
As was mentioned earlier, associated with each event
count and sequencer is a list of suspended processes waiting for par-
ticular count values to be reached. These lists of suspended processes
are stored using the node.awaits data structure. The data structure is
a two-dimensional array where each row of the array represents an
entry for one suspended process. There are four data elements for
54
each entry. The first element of the entry is the count value for which
the suspended process is waiting. The next two elements identify
which process at which node has been suspended. The final element
is either a nil pointer or a pointer to the next entry in that particular
waiting process list. These pointers and the waiting process pointers
in the count. array data structure are row index values of the
node.awaits data structure. Figure 4.5 shows an example waiting pro-
cess list.
count .array
^J
Event Count
Value
SequencerValue
MemoryPointer
Await List
Pointer
42 44 nil •v^
650 • nil \12345 12348 • •s^
447 nil nil \
node .awaits
Wait-for
CountWaiting
Node Id
Waiting
Process Id
Next Await
Pointer
12347 10 5 nil
12346 7 6 w-^^43 1 8 nil
Figure 4.5. Kernel Waiting Process List Structure
55
f free.Ust
This is a one-dimensional array that has one element for
each array position in the node.awaits data structure. The free. list data
structure holds a list of node.await data structure row indices that are
not in use and may be allocated to any event count and sequencer's
waiting process list. As entries are removed from wait process lists,
the associated node.awaits row indices are returned to this free list.
A pointer into the free. list data structure identifies the
value of the next available node.awaits row index. When this next
index is allocated, the pointer is incremented to find yet the next
available node.awaits row index. When an index is returned to the
free. list data structure, the pointer is decremented and the returned
index is stored. Figure 4.6 illustrates the use of the free list and asso-
ciated pointer.
g. cotint.size
This is a one-dimensional array with one element for
each event count and sequencer in the system. The value of the ith
element in the array identifies the number of bytes of shared memory
that are to be associated with count i.
h. node.data
This array is the block of shared memory available at a
node. Segments of this memory block are apportioned based on the
values in the count. size data structure for counts being maintained at
that node. The memory pointers in the count. array data structure
56
node . awaits
Wait-for
Count
Waiting
Node !d
Waiting
Process Id
Next Await
Pointer
12347 10 5 nil
12346 7 6
43 1 8 nil
tNext Free
Pointer
Figure 4.6. Kernel Waiting Free List Management
point to the start of individual shared memory segments in this block
of memory. Figure 4.7 shows how this shared memory array is config-
ured and accessed.
3, Library Procedures
A functional description of each of the library interface pro-
cedures is provided below. Each procedure includes a reference to a
"link" parameter. This parameter is a bidirectional communication
channel that links or connects a distributed application program
module to the kernel. The program module does not directly use the
57
channel for any communication. The library interface procedures use
this channel internally for communicating with the kernel. The
detailed source code for the procedures is provided in Appendix C.
count . array . indices
count . size
256 26 16 6 2 512
Figure 4.7. Kernel Shared Memory Access Data Structures
58
a read(link, count.id, count.value)
This procedure performs the read operation on the
specified event count. The value of count.value is set to the current
value of the count.
b. advance(link, count.id]
This procedure increments the value of the specified
event count. If other processes are waiting on the new value of the
event count, these waiting processes are resumed.
c. awalt(link, count«id, count.value)
The executing process is suspended until the value of the
specified count reaches the argument count.value. If the value of the
specified count is already greater or equal to the argument count.value,
the process executing the await call continues execution.
d. ticket(link, count.id, ticket.value)
This procedure sets the value of ticket.value to the value
representing the next available reservation for the specified
sequencer.
e. put(link, coiuit.id, index, byte.array)
This procedure provides write access to the shared
mem.ory segment associated with the specified event count. The array
of bytes passed to the procedure is stored in the shared memory seg-
ment offset from beginning of the shared memory segment by the
number of bytes specified by the value of the index parameter.
59
t get(link, count.id, index, byte.array)
The get procedure complements the put procedure by
providing read access to the shared memory segment associated with
the specified count. The array of bytes passed to the procedure is
read from the shared memory segment offset from beginning of the
shared memory segment by the number of bytes specified by the value
of the index parameter. The number of bytes read is based on the size
of the byte.array parameter.
g. ecs.kemeUlink.array.count.node.count.size.node.id,
node.link)
The procedure for the kernel is itself included in the
library. This procedure is executed on each node in the system. The
link,array parameter for the kernel is an array of all the individual
links that connect the application program modules at a node to the
kernel. The other kernel parameters are described in detail in the
data structures section of this chapter.
D. PROGRAMMING
This section presents a brief overview of how to program using
the shared memory interface. A detailed programming example is
presented in Appendix D. Programming using the shared memory
interface is accomplished in three basic steps.
The first programming step is to divide the program into modules
that should operate in parallel and to define the shared memory seg-
ments and event counts and sequencers that will be required for
communication between and synchronization of the modules. This
60
step should performed without considering the physical configuration
of a network.
The next step is to code the individual modules using the library-
interface procedures to manipulate the event counts and sequencers
and to access the shared memory segments. Since this step is also
accomplished without considering the physical configuration of the
network, the modules may be coded independently.
The final step of the software development process is to apportion
modules and shared memory segments amongst different processors
in a network. Although any apportionment of the modules and mem-
ory segments is logically equivalent, this placement process may be
influenced by practical considerations. For example, one would not
want to place all the computationally intensive modules on the same
processor. Further, it would seem reasonable to locate modules shar-
ing the same memory segment physically close to the network loca-
tion of the memory segment. To help quantify the factors that may
influence the network placement of modules and shared memory
segments, the following section evaluates the performance of the
shared memory interface under a variety of conditions.
E. EVALUATION
To evaluate the prototype programming interface, a set of tests
was conducted using the interface. The objective of this testing was to
provide a representative measure of the communications performance
of a network when using the programming interface. It was desired to
determine how the network communications performance was
61
affected by variations in certain parameters. This testing utilized
T414 Transputers operating with a clock speed of 15 MHz and a link
speed of 20 Mbits per second. These tests are described in the
following paragraphs. It should be noted that the prototype program-
ming interface was not optimized for maximum performance. Because
of this, the interface performance documented here can likely be
improved. Chapter VI describes some ways in which the program-
ming interface performance can be improved.
1. Basic Interface Procedure Timing
As the first step in examining the performance of the pro-
gramming interface, it was chosen to measure the execution times of
the various interface procedures. A test program was developed to
execute each of these procedures in a loop. The time required to
execute the loop with each of the interface procedures was measured.
The time required to execute the loop without an interface procedure
(i.e., a "null loop") was measured and subtracted from the interface
procedure loop times. The resulting time was used to determine the
average time required for a single interface procedure execution.
Interface procedure execution times were measured under
several sets of conditions. Specifically, the network distance between
an interface procedure and its target event count and sequencer was
varied and, for the put and get operations, the size of the argument
data element was varied. The results of this testing are shown in
Figures 4.8. 4.9. and 4.10.
62
CD ^_^
E tont- cc Ro o3 C/D
o —X ELU
—
'
H3- read/ticket
-" advance
-a- await(ready)
-- await/advance
Hop Distance
Figure 4.8. Primitive Operation Timing Test Results
oE CO
1- cco
ooo
i: (no ""~
oX ELU
o- 64 Bytes
-- 256 Bytes
-B- 512 Bytes
-^ 1024 Bytes
Hop Distance
Figure 4.9. Put Operation Timing Test Results
63
E
c
oXLU
-B- 64 Bytes
-•- 256 Bytes
a- 512 Bytes
-•- 1 024 Bytes
3 4 5
Hop Distance
Figure 4.10. Get Operation Timing Test Results
Figure 4.8 shows two basic types of execution behavior. First,
the execution times for one group of interface procedures is relatively
constant over the range of conditions tested. This group of proce-
dures includes those that do not require any response from the dis-
tributed kernel. The procedure can pass its communications packet
to the kernel and the application program module can proceed witii-
out waiting for a reply. Since communication between the interface
procedure and the kernel is via memor^'^-to-memory transfer, the
transmission of a communications packet is fast and relatively insensi-
tive to variations in data element size over the range of kernel opera-
tion. Note also in Figure 4.9 that the execution time for the put
operation is significantly less for the zero-hop (same node) case than
for the one-hop case. This is because the zero-hop case is executed by
64
performing a memory-to-memory transfer of data as opposed to the
slower physical link transfer of data in the one-hop case.
The second t3^e of behavior is for interface procedures that
do require a response from the distributed kernel. As can be seen in
Figure 4.8 for the read, ticket, and await procedures and in Figure
4.10 for the get procedure, execution times are strongly dependent on
the test conditions. As the network distance is varied, the overall
execution time increases because the "round trip" communication
time in the network increases. Further, since data transfer between
nodes via the hardware links is significantly slower than the memory
to memory transfer, the effects of data size on execution time for the
get procedure become significant.
The results of this testing imply that there is a preferred
location for the counts and shared memory segment for a producer
and consumer of data located at different nodes in the network. Since
the put operation is relatively insensitive to network distance and
since the get operation is greatly affected by network distance, it
would appear that the shared memory segment for a producer
consumer pair should be located at the consumer's node. A test was
performed to confirm this. In this test, the count and shared memory
segment for a producer and consumer pair were placed at the
producer's node, then at the consumer's node. The resulting data
communication rates were measured for various data element sizes.
The results of this testing are shown in Figure 4.11 for a network
distance of seven nodes. The results of this test show that the highest
65
communications rate is obtained when the shared memory segment
and associated event count for a producer/consumer pair is located at
the consumer's node.
oCO
OCT3
CO C08 ooow v>
c ^o a
a.03 CO
c3 5,
E ^boO
200 400 600 800 1000
Data Element Size (bytes)
1200
Figure 4. 1 1 . Count and Shared Memory Location Effects
on Communication Rate
2. Node Distance Effects on Communication Rate
The distance between an application program module pro-
ducing data and the application program module consuming that data
can be measured as the number of physical links or "hops" in the
communications path between the node locations of the modules. A
test was performed to determine the effect that hop distance could
have on the overall data communication rate between application pro-
gram modules. In this test, a producer and consumer of data were
66
separated by increasing hop distances. The library interface proce-
dures were used to transfer data between the producer and consumer
at the maximum possible rate. In addition, since the size of a commu-
nication message has a demonstrated effect on performance (Chapter
III), the size of the data element being communicated using the pro-
gramming interface was varied. The resulting communications data
rates are shown in Figure 4.12.
-3-1 Byte
-- 64 Bytes-»- 51 2 Bytes-*- 1024 Bytes
Hop Distance
Figure 4.12. Hop Distance Effects on Communication Data Rate
This graph shows that as the hop distance increases, the
maximum data rate decreases. This decrease is rapid for first few
hops but less significant as the hop distance is further increased. This
is because each additional hop represents a proportionally smaller
increment of in-line delay to the data transfer. As was shown in
67
Chapter III for the Transputer in general, communication of larger
data elements is more efficient when using the programming
interface.
Processes executing on the nodes between the producer and
consumer may also be affected by the data transfer through a node. To
examine this, a looping calculation was initiated on a node immedi-
ately between a producer and consumer pair. The number of loops
executed per unit time during a data transfer was measured and com-
pared to the no-data transfer looping rate. Again, varying sizes of data
elements were used. The results of this testing are shown in Figure
4.13.
Figure 4.13 shows that lowest performance of about 70 per-
cent of normal occurs when using the smallest-sized data elements.
As the size of the data element increases, the performance increases
to a maximum of about 80 percent of normal. Performance of the
intermediate process improves when the data element size is
increased because the overhead associated with communicating a sin-
gle message becomes relatively smaller. This is the same general
effect as was shown in Chapter III for communication with varying
sized data packets.
3. Hardware Link Sharing Effects on Cominiinication Rate
When using the programming interface, it is likely that data
communication between several pairs of producers and consumers will
be conducted using the same hardware communications link. In this
case, the kernel in the node on either end of the link is also involved
68
in the communication. A test was performed to determine what
effects this link sharing had on the overall data communications rate
between the nodes. In this test, producers and consumers were
placed on adjacent nodes. Producers were placed on each node in the
same number to provide balanced bidirectional utilization of the hard-
ware link. The number of producers on each node and the size of a
communication data element was varied. The results of this test are
shown in Figure 4.14. This plot shows that, although some degrada-
tion in communications performance does occur due to kernel over-
head, a substantial data rate can be maintained when a hardware link
is shared between several users.
©ocCO
E?ot:03CL
Q.OO_J
•oo
E
200 400 600 800 1000
Data Element Size (bytes)
1200
Figure 4. 13. Intermediate Process Degradation
During Conuniinication
69
400
<u350
03
rr
ffl ^_^ 30003
QDCo
o ato
250
03 k.
O 2001 C/J
E O)
E 150uO i<:
15 100oH
50
•3- 1024 Bytes
-•- 512 Bytes
S- 256 Bytes
-0" 64 Bytes
3 4 5 6 7
Number of Producers
Figure 4.14, Shared Hardware Link Communications Rates
4. Multiple Link Effects on Communication Rate
In most cases, several links on each Transputer in a network
will be connected and utilized at any one time. To test the
communications performance of the kernel under these conditions, a
varying number of hardware links on one Transputer were
bidirectionally loaded. The resulting data rates were measured for
different data element sizes. The results of the test are shown in
Figure 4.15.
70
-s- 64 rate
-•- 256 rate
-a- 512 rate
-o- 1024 rate
2 3
Number of Links Active
Figure 4.15. Multiple Hardware Link Communications Rates
Figure 4.15 shows that, as the number of Unks used
increases, the data actually communicated via each link decreases.
However, in Chapter III it was shown that link communication rate
was essentially independent of the number of links active. The kernel
must, therefore, the cause of this reduced link utilization. Since all
data messages must be processed by the kernel and since the kernel
can only process and transfer a fixed maximum number of messages in
any time period, a limit on the data rate through the kernel must
exist. Since Figure 4.15 shows the communication rate per link
decreasing by more than a factor of two when a second link is acti-
vated, it appears that the kernel data rate limit is less than the capac-
ity of a single communication link. Activating additional links.
71
therefore, except to reduce the hop distance between nodes in a
network, does not increase the overall communication bandwidth of
the network. Chapter VI discusses potential changes to the kernel
which may improve the programming interface performance.
72
V. MESSAGE-PASSING MODEL PROGRAMMING INTERFACE
A BACKGROUND
The shared memory model is one method for isolating the dis-
tributed systems software designer and programmer from having to
consider the physical configuration of a network of Transputers. It is
also possible, however, to utilize a message passing model to provide
for this isolation. To investigate this alternative, a prototype pro-
gramming interface based on message passing has been developed.
This chapter describes and evaluates this prototype.
a IMPLEMENTATION
Since the Transputer is based on the concept of Communicating
Sequential Processes (CSP), using this concept as a basis for a mes-
sage-passing prototype was a natural choice. Communications in the
message-passing prototype are, therefore, based on CSP communica-
tions "rules." Specifically, message passing is logically point-to-point,
synchronousm and unbuffered.
A significant goal in the development of this interface was to per-
mit application program modules to be written in the Transputer's
"native" high-level language, OCCAM, without requiring any additional
external procedure references. In this way, existing modules pro-
grammed in OCCAM could be used with the interface.
As with the shared-memory model prototype, the core of the
message-passing prototype is a distributed kernel that executes on
73
each node in the network. Figure 5.1 shows the physical configuration
of such a network. Note that this physical configuration is very similar
to the physical configuration used in the shared-memory model
prototype.
Transputer 1 Transputer 3
Hardware
Links
Local
Channels
Distributed
Kernel Transputer 2
Application
Program
Modules
Figure 5. 1 . Physical Configuration of Message-Passing Kernel
Figure 5.2 depicts the logical configuration of the message-passing
network as seen by the application. The logical configuration of the
74
message-passing network shows individual application program mod-
ules interconnected by a network of global message-passing channels.
Global
Channels
via
Distributed
Kernel
Application
Program
Modules
Figure 5.2. Logical Configuration of Message-Passing Kernel
This chapter frequently uses the terms "local channel" and
"global channel" when referring to the operation of the message-
passing prototype. A local channel is an actual communications path
that exists between a process and a kernel. Figure 5.1 shows several
examples of local channels. Global channels, as shown in Figure 5.2,
are the virtual communication paths that exist between two processes.
These virtual communications paths are established and maintained by
75
the distributed kernel. Local channels may be thought of as connect-
ing processes to the ends of global channels.
1. The Kernel
Figure 5.3 shows a block diagram of the message-passing
kernel. As shown, the kernel consists of three major parts: the
Input/Output Buffers, Channel Controllers, and the Communications
Manager. In addition, the kernel performs some initialization of the
network. Each node broadcasts to the network a list of the global
channel ends that it has been assigned. This information is used by
the kernel to create a global routing table for network
communications.
' a Input/Output Buffers
These buffers are identical to those used in the shared
memory model prototj^e. They decouple the relatively slow link data
transfers from the operation of the kernel. The buffers enable the ker-
nel to operate in parallel with network data transfers.
b. Channel Controllers
The kernel creates one channel controller process for
each local channel at a node. The channel controller is the physical
connection between a local channel and the kernel and is the logical
connection between a local channel and the appropriate global chan-
nel. Communication over the global channel is controlled using a sim-
ple protocol which is managed by these channel controllers. When
the receiving end of a global channel is ready to receive an application
program module message, the receiving end channel controller
76
transmits a "ready to receive" message to the sending end controller.
The sending end controller is then released to accept the applica-
tion's message and send it to the receiving application program mod-
ule via the global channel.
Physical
Communication
Links
Application
Program
Modules
Figure 5.3. Message Passing Kernel Logical Block Diagram
77
c. Communications Manager
As in the shared memory interface, messages communi-
cated over the network are formatted as packets. Figure 5.4 diagrams
the packet format used. Each packet received from a hardware link or
from a channel controller is processed by the communications con-
troller and routed to its packet header identified destination.
Packet F Drmat Used Between Kernels on Different Nodes
^ 3
action
codeto
node id
global
channel id
data
sizedata
i <.
Figure 5.4. Kernel Communications Packet Format
At the current state of the kernel's development, mes-
sages transmitted from an application program module over a global
channel are restricted to only one of the simple protocols predefined
in the OCCAM language [PoMa87]. The available protocol provides for
transmission of variable-length byte arrays (INT::[1BYTE). This is not a
particularly limiting restriction since any structure in OCCAM can eas-
ily be RETYPED into an array of bytes for communications purposes.
2. Data Structures
The message-passing prototype kernel maintains a set of data
structures that defines the characteristics of the physical and logical
communications network. The overall operation of the kernel is fun-
damentally dependant on the interrelation of these structures. The
78
kernel data structures are described in the following paragraphs.
Figure 5.5 is an example of the manner in which these data structures
are interrelated at two different nodes.
a node.link
This data structure is identical to the node.link data
structure used in the shared-memory model. The value of the i*^
element in the array identifies the number of the hardware link that
connects to the next node in the path leading to node i. The array
defines the node's view of a network. Currently, the programmer
defines the contents of this data structure at the end of the software
development process based on the particular network configuration
that is to be used.
b. gchaiienode
This structure is used to identify the end locations of
network global channels. This is a two-dimensional array which has a
two-element entry for each global channel defined in the distributed
system. Each global channel in the network is assigned a unique
identifying number. By convention, the assigned numbers start at zero
and are consecutive. This array is indexed by a global channel's unique
identifying number. The first element of an entry identifies the node
location of the process that outputs to the global channel. The second
element identifies the node location of the process that inputs from
the channel. During kernel initialization, each node broadcasts a list-
ing of global channel ends at the node. The kernel uses these
79
Segment of a Logical Configuration
Data Structures for Logical Configuration
At node n At node m
g goutput a n c c b a n c c b
input b m c b a b m c b a
output
input
gchan.node
g goutput
nil
nil
'out nil nil nil nil nil nil nil nil
input nil nil nil nil nil1 ,n ni! nil nil
gchan.lchan
output
input
g Inil
Ichan.gchan
'out
out
in
mChan.map
Figure 5.5. Kernel Data Structure Interrelation
80
broadcast messages to construct this array. As a result, each node in
the system has an identical copy of this data structure.
c. gchan.lchan
If a global channel end is connected to a local channel at
a particular node, this structure identifies which local channel the
connection is to. This is also a two-dimensional array indexed by a
global channel's unique identifying number. Each entry in the array
consists of two elements. In general, the first element of an entry is
the identifier of the local channel that outputs to the global channel;
the second element is the identifier of the local channel that inputs
from the global channel. At a particular node, this array only includes
local channel identifiers for the global channels that output from or
input to that node. All other entries in the array are nil at that node.
d. loc.chan
This data structure is an array of OCCAM communication
channels. This array, potentially different at each node in the net-
work, defines all the local channels that connect to global channels at
a particular node.
e. Ichan.gchan
This is a one-dimensional array with an element for each
local channel at a node. The value of the i^^ element in the array is the
identifier for the global channel to which local channel i is connected.
£ chan.inap
Each node in a network has a different version of this
data structure. The data structure is an array that holds two elements
81
for each local channel defined at a node. For the i*^ channel defined
in the loc.chan array, the i^^ entry in the chan.map array defines the
end of the global channel "connecting to" that local channel. The first
element of an entry is the unique identifier for the global channel.
The second element of an entry identifies whether the local channel is
connected to the sending or receiving end of the global channel.
2. Library Procedures
The Library for the message passing interface consists only of
the distributed kernel procedure:
csp.kemel(node.id. node.Unk, chan.map, loc.chan).
This kernel procedure is executed on each node in the sys-
tem. In general, the parameters for this procedure identify the physi-
cal and logical channel mappings for that node.
C PROGRAMMING
This section presents a brief overview of how to develop a pro-
gram using the message-passing interface. A detailed programming
example is presented in Appendix F. Programming using the mes-
sage-passing interface is accomplished in essentially the same three
basic steps as is programming using the shared memory interface.
The first step is to divide the program into modules that should oper-
ate in parallel and to define the point-to-point channels that will be
required for communication between the modules. This modulariza-
tion need not, however, consider the physical four-link limiitation of an
82
individual Transputer or the actual connectivity of a particular
network.
The next step is to code the individual modules using OCCAM
programming guidelines [PoMa87]. Thus far. the program develop-
ment methodology is the same as would be used for developing any
program in OCCAM.
The final step of the software development process is to apportion
modules amongst different processors in a network. The same practi-
cal considerations that influence the placement of channels when
using the shared memory model also need to be considered when
using the message passing model.
D. EVALUATION
The same types of testing were performed for the message-pass-
ing interface as were performed for the shared-memory interface.
This section provides the results of the message-passing model inter-
face kernel testing and compares the test results of the two interfaces.
The message-passing interface testing also utilized T414 Transputers
operating with a clock speed of 15 MHz and a link speed of 20 Mbits
per second. Since the message passing interface has no separate set
of procedures for use in application program modules, separate inter-
face procedure timing tests were not needed.
1. Node Distance Effects on Cominunications Rate
A test was performed to determine the effect increasing dis-
tance between nodes has on the overall data communication rate
between application program modules. In this test, a producer and
83
consumer of data were separated by increasing hop distances. Data
messages were sent from the producer to the consumer at the maxi-
mum possible rate. The resulting communications data rates for dif-
ferent data message sizes are shown in Figure 5.6. Figure 5.7 provides
a comparison of the maximum data communications rates for the two
Interfaces.
1 024 Bytes
512 Bytes
256 Bytes
64 Bytes
3 4 5
Hop Distance
Figure 5.6. Hod Distance Effects on Communications Data Rate
Figure 5.6 shows that as the hop distance increases, the
maximum data rate decreases. This decrease is rapid for first few
hops, but less significant as the hop distance is further increased.
This is because each additional hop represents a proportionally
84
smaller increment of in-line delay to the data transfer. This same
effect was shown for the shared memory kernel in Chapter FV.
©nj
a: oCO cm oQ u
03
V) Ui
c \—
o oa.
2(1)c
VhoO
1000
900
800
700
600
500
400
300'
200'
100'
0'
Message Passing
Shared Memory
3 4 5
Hop Distance
Figure 5.7. Comparison of Hop Distance Effects
Figure 5.7 shows that for smaller network distances, the
message passing interface has a significant data communications rate
advantage over the shared memory interface. However, as the net-
work distance between the producer and consumer nodes increases,
the communication performance difference between the two
interfaces decreases. In both cases, minimizing the difference
between the producing and consuming nodes improves performance.
With the message-passing model's greater data communica-
tion rate, it might be expected that performance of processes on
85
nodes in the path between a producer and consumer would be
degraded to a greater degree with the message-passing model than
with the shared-memory model. Figure 5.8 shows the intermediate
process degradation for both interfaces.
1.0
0.9<DOc 0.8CC
£o 0.7"C0)
a. 0.6Q.oo 0.5_J
o0.4
NCT3
F 0.3u.oZ 0.2
0.1
0.0
Q- Shared Memory-- Message Passing
T200 400 600 800
Data Size (bytes)
i
1000 1200
Figure 5.8. Intermediate Process Degradation
During Communication
As can be seen, the degradation for the message-passing
model is less than for the the shared-memory interface. The reason
for this behavior appears to lie in the number of messages an
intermediate node must process when using the two interfaces. In
the shared-memory model case, each production/consumption cycle
requires at least the transmission of three messages between nodes
(an await message, an advance message, and either a put or a get
86
message). The message-passing model only requires that two
messages be sent (a receiver ready message and a data message). As
shown in Chapter III, a greater degradation should be expected when
using more messages to communicate a given sized data element.
2. Hardware Link Sharing Effects on Communication Rate
In the message-passing model, several global channels will
likely use the same physical link for network communication. As with
shared-memory interface testing, a varying number of producer/con-
sumer pairs were executed on two adjacent nodes to examine the
kernel's performance. The results of this test are shown in Figure 5.9.
Despite some performance degradation, these results show that a link
can be effectively shared between global channels.
o03
a:
m , ^
CO
Qoco
co
ooV)
m u.o oc Q.3 CO
E oE noO :^
03
o1—
600
500
400
300
200
100
3 4 5
Number of Producers
10
-a- 64 Bytes
-= 256 Bytes
-«- 512 Bytes
-o- 1024 Bytes
Figure 5.9. Shared Hardware Link Communication Rates
87
3. Multiple Link Effects on Cominimication Rate
Testing of the message-passing interface with a varying num-
ber of links active was performed for the message-passing interface.
The results of this testing are shown in Figure 5.10. This testing
revealed the same type of kernel data rate limitation found when
testing the shared-memory model interface. Although the data rate of
the message-passing kernel is significantly greater than that of the
shared memory interface, the message passing kernel's data rate limit
remains less than the capacity of one hardware communications link.
o
EC
Bre
Qco08o'c3
c
oex.
X3cooowa5Q.V)
o S
1 2 3
Number of Links Active
•<i- 64 Bytes
-- 256 Bytes
a- 512 Bytes
-o- 1024 Bytes
Figure 5.10. Multiple Hardware Link Communication Rates
88
VI. CONCLUSIONS AND RECOMMENDATIONS
A CONCLUSIONS
This thesis has evaluated the performance of an isolated Trans-
puter and the performance of two abstract programming interfaces
with distributed kernels for a network of Transputers. The results of
the evaluation show that, although the use of a distributed kernel
reduces the effective performance of individual Transputers in the
network, the performance remains relatively high. In general, the
performance of the message-passing interface was superior to that of
the shared-memory model interface. This comparison should not,
however, be taken as absolute. There are improvements discussed in
this chapter that can be made to both interfaces which could signifi-
cantly affect this comparison.
Thus far, all evaluation of the abstract programming interfaces has
been based on testing. The programming interfaces also need to be
examined on the basis of whether or not they can be effectively used in
the development of distributed programs. By its nature, such an
assessment is subjective and prone to be influenced by one's prior
experience and personal programming style preferences. Based on
the experience of developing test and example programs for use with
the interfaces, it appears that both of the interfaces do simplify the
programming of a physical network of Transputers. The primary fac-
tor in this is that the programmer is isolated from having to consider a
specific physical configuration of a network at all stages of program
89
development. Only after the program is developed need a particular
configuration be considered, at which point the physical configuration
is described by a set of simple data tables.
Both interfaces accomplish the goal of isolating the software
designer from physical configuration considerations. There was, how-
ever, no clear choice as to which interface methodology was the over-
all "best" from a programming standpoint. Each methodology had
certain advantages and disadvantages. In general, the initial design,
development, and coding of a distributed program was most effectively
accomplished using the message-passing model. However, once past
the initial implementation of a program, modifications, either to add
additional functionality or to correct initial design errors, are usually
required. In most cases, these modifications and additions were eas-
ier to make when using the shared-memory model interface.
a RECOMMENDATIONS
As was mentioned previously, the programs for these prototype
interface implementations are not fully optimized. As a direction for
further work in developing the abstract interfaces for a network of
Transputers, several modifications to the interface implementation
should be considered. In general, the same types of changes can be
made to both interfaces to improve performance. Some suggested
changes are listed as follows:
• The message packet format can be improved. The current mes-sage format is actually scheduled as three separate transmissions:transmission of the header, transmission of a size value, andtransmission of the data array. Efficiency can be improved byreducing the number of required transmissions. For example, the
90
header information and the data array could be combined into asingle array transmission preceded by a size transmission. Thiswould eliminate one scheduled transmission each time a messageis handled.
• The transmission of messages within the kernel uses OCCAMcommunications channels. This transmission is performed byprocessor-controlled memory-to-memory transfer. When the size
of a data message is large, this decreases the performance of anode. A simple memory manager could be incorporated into the
kernel to allocate space for messages in transit so that the trans-
fer of messages within the kernel could be performed by commu-nicating pointers or handles to the messages via the OCCAMchannels.
• In the case of the shared-memory model, the interface onlytransfers data from a shared-memory segment to a remote nodewhen an application program module at the remote node requeststhe data. If it is known in advance that certain remote nodes ormodules will require frequent access to a node's shared memorysegment, performance could be improved by having the dis-
tributed kernel automatically and periodically transfer selectedsegments of globally shared memory between nodes.
• Many of the data structures within the kernels are sparse. As thesize of a network becomes larger, the storage required for thesestructures will also become larger. At some point, it is likely thata compressed storage scheme for the kernel data structures will
need to be adopted.
In addition to these performance improvements, the kernel
should be modified to include a distributed algorithm for examining
the network to determine a network's physical configuration. Use of
such an algorithm would save the software designer from having to
specify the physical configuration of a network as the last step of the
design process. More importantly, however, to support fault toler-
ance, such an algorithm could be used to identify an altered physical
configuration so that alternate message routing paths could be
established.
91
APPENDIX A
DETAILED TRANSPUTER TIMING TEST CONFIGURATION
AND SOURCE CODE
A SUMMARY
The purpose of t±ie timing test configuration and associated test
code is to provide a general framework for testing Transputer
performance characteristics. The detailed configuration of the test
set-up is shown in Figure A. 1. The test set-up consists of a central
"target" Transputer and four "satellite" Transputers, each attached to
the target Transputer by a communications link. In addition, there
are associated Transputers to perform the functions of control and of
data routing and recording.
a SOURCE CODE
1. Configuration Section
TIMING TEST PROGRAM
{{{
VALVALVALVALVALVALVALVAL
}}}
{{{
CHAN
link definitionslinkOout IS
linklout IS 1
link2out IS 2
link3out IS 3
linkOin IS 4
linklin IS 5
link2in IS 6
link3in IS 7
declarationsOF ANY aim.mid. 0, arm.mid. 1, arm.mid. 2, arm.mid. 3,
mid. arm. 0, mid. arm. 1, mid.ann.2, mid. arm. 3,
92
oMHz
echo10 MHz
3 N*I
NX
echo2
1
Link Numbersand Frequencies
Are as Shown
10 MH:satellite
3
10 MHz
satellite
1
o
target
3 N
o
20 M^
_0_
satellite
1
satellite
2
Figure A. 1. Detailed Test Configuration
93
ann.echo.O, arm. echo, 1, arm. echo. 2, arm. echo. 3,
echo. arm. 0, echo. arm. 1, echo. arm. 2, echo. arm. 3,
host . echo . , host . echo . 2
,
echo . host . , echo . host . 2 :
{ SC target
.target procedure
}
{ SC satellite
.satellite procedure
}
{ SC echo.echo procedure
}
PLACED PAR
{ { { echo
PROCESSOR 10 T4
{ { { channel placementsPLACF, echo.host. AT linkOoutPLACF, host.echo. AT linkOinPLACE echo . arm. AT link3outPLACE arm, echo, AT link3inPLACE echo. arm, 1 AT linklout
PLACE arm, echo. 1 AT linklin
}}}
echo (echo . host . , host .echo. Of
echo. arm. 0, arm. echo, Of
echo. arm. 1, arm. echo. 1, 0)
}}}
{{{ echo 2
PROCESSOR 11 T4
( { { channel placementsPLACE echo .host .
2
AT linkOoutPLACE host .echo,
2
AT linkOinPLACE echo, arm, 2 AT linklout
PLACE arm. echo,
2
AT linklinPLACE echo, aim, 3 AT link2outPLACE arm. echo. 3 AT link2in
}}}
echo (echo . host . 2
,
host. echo. 2,
echo. arm, 2, arm. echo. 2,
echo. arm. 3, arm. echo. 3, 2)
}}}
{ { { satellite
PROCESSOR 13 T4
( ( { channel placements
PLACE arm. echo. AT link2out
PLACE echo .arm . AT link2in
PLACE arm.mid. AT linklout
PLACE mid. arm. AT linklin
}}}
satellite (arm. echo .0, echo. arm. 0, arm. mid. 0, mid. arm. 0, 0)
94
}}}
{ { { satellite 1
PROCESSOR 21 T4
{ { { channel placements
PLACE arm. echo. 1 AT linkOout
PLACE echo.acm.
1
AT linkOin
PLACE arm.mid. 1 AT linkSout
PLACE mid. arm. 1 AT linkSin
}}}
satellite (arm. echo. 1, echo.arm.l, arm.mid. 1, mid. arm. 1, 1)
}}}
{{{ satellite 2
PROCESSOR 23 T4
( { { channel placements
PLACE arm. echo. 2 AT linklout
PLACE echo. arm. 2 AT linklin
PLACE arm.mid, 2 AT link2out
PLACE mid. arm. 2 AT link2in
}}}
satellite (arm.echo. 2, echo.arm.2, arm.mid. 2, mid. arm. 2, 2)
}}}
{{{ satellite 3
PROCESSOR 12 T4
{ { { channel placonentsPLACE arm. echo. 3 AT link3out
PLACE echo. arm. 3 AT link3inPLACE ann.mid.3 AT linklout
PLACE mid. arm. 3 AT linklin
}}}
satellite (arm.echo. 3, echo.arm,3, arm.mid. 3, mid.arm.3, 3)
}}}
{ { { target
PROCESSOR 20 T4
{ { { channel placementsPLACE mid. arm. AT linklout
arm.mid. AT linklinmid. arm. 1 AT link2out
arm.mid. 1 AT link2in
mid . arm . 2 AT IdJik3out
arm.mid. 2 AT linkSin
mid . arm . 3 AT linkOout
ann.mid.3 AT linkOin
PLACEPLACEPLACEPLACEPLACEPLACEPLACE
}}}
target (mid. arm. ,
mid. arm. 2,
}}}
arm.mid. 0, mid. arm. 1,
arm.mid. 2, mid.arm.3,
aim.mid. 1,
arm.mid. 3)
2. Target Node Procedure
PROC target (CHAN OF ANY out. link. 0, in. link. 0, out. link. 1, in. link. 1,
out. link. 2, in. link. 2, out. link. 3, in. link. 3)
( ( { description— This procedure performs communications with adjacent nodes and performs
95
calculations in a loop to ireasure the interaction of communication andcalculation. In general, a timer is started and the conrmunications andthe calculation is started. When the communications are corplete, thetimer and the looping calculation are stopped. The time for thecommunication and the number of loops performed during this time is
extracted and reported.
}}}
{ { { declarationsVKL else IS TRUE :
vaL one. second IS [1000000/64, 1000000]
VAL ave . count IS 1
{{{ priority codesVAL low IS
VAL high IS 1
}}}
{{{ speed codesVAL slow IS
VAL fast IS 1
}}}
{{{ operation codesVAL none IS
VAL null IS 1
VAL assign IS 2
VAL add IS 3
VAL sub IS 4
VAL mult IS 5
VAL div IS 6
VAL movelOO IS 7
VAL movelOOO IS 8
}}}
{{{ timedata.tsrVAL block.size IS 100000:
VAL packet. counts IS [1,2,5,10,20,50,100,200,500,
1000,2000,5000,10000,20000,50000,100000]
}}}
CHAN OF BOOL Status :
CHAN OF INT result :
}}}
{{[ test set
VAL loop . op IS null
VAL op . count IS 4 :
VAL loop.pri IS low
VAL loop . loc IS fast
VAL cuititi.pri IS highVAL comm.loc IS fast
96
}}}
PRI PAR
{ {
{
do corrmunication and reporting
{ {
{
process declarationsINT trigger :
INT packet .count, packet. size :
INT loop. count :
INT start. time, stop. time :
[block. size] BYTE link. data:
PLACE link.data AT 4096:
TIMER clock :
}}}
SEQ in. links = FOR 5
SEQ out.loJiks = FOR 5
SEQ pjacket . count . index = FOR SIZE packet . counts
SEQ
{ { { initialize for test
packet . count : = packet . counts [packet , count . index]
packet. size := block. size / packet. count
}}}
{ { { take care of triggeringin. link. ? triggerPAR
triggertriggertriggertrigger
out. link.
out . link .
1
out. link,
2
out . link .
3
status ! FALSE
}}}
( { { start, the timerclock ? start. time
}}}
{ { { do communicationIF
(in, links + out. links) =
{ { { time one second delay for no links active caseclock ? AFTER start. time + one. second [corrm.pri]
}}}
else
{ { { set up links
PAR
{ ( ( input links
{ ( ( start link input
IF
in. links >
SEQ i = FOR packet. countin. link. ? [link. data FROyl FOR packet. size]
else
SKIP
}}}
{ { { start link 1 input
97
IFin. links > 1
SEQ i = FOR packet. countin. link. 1 ? [link. data FRCM FOR packet. size]
elseSKIP
}}}
{ { { start link 2 input
IF
in, links > 2
SEQ i = FOR packet. count
in. link. 2 ? [link.data FROM FOR packet. size]else
SKIP
}}}
{ { { Start link 3 input
IF
in. links > 3
SEQ i = FOR packet. count
in. link. 3 ? [link. data FRCM FOR packet. size]
else
SKIP
}}}
}}}
{ ( ( ouput links
{ { { start link output
IF
out. links >
SEQ i = FOR packet. count
out. link. ! [link. data FRCM FOR packet. size]else
SKIP
}}}
{ { { start link 1 outputIF
out. links > 1
SEQ i = FOR packet. count
out. link. 1 ! [link. data FRCM FOR packet. size]
else
SKIP
}}}
{ { [ start link 2 outputIF
out. links > 2
SEQ i = FOR packet. count
out. link. 2 ! [link. data FRCM FOR packet. size]
else
SKIP
}}}
{ { { start link 3 output
IF
out. links > 3
SEQ i = FOR packet. count
out. link. 3 ! [link. data FROM FOR packet. size]
else
98
SKIP
}}}
}}}
}}
{ stop the loop count
status ! TRUE
}}
{ { stop the timer
clock ? stop. time
}}
{ { get loop count resultsresult ? loop. count
}}
{( report results
ave . count
loop, op; op. count; loop.pri; loop.loc
ccntn.pri; carrm.loc; in, links; out. linksblock. size; packet. count; packet. size
start. time; stop.tirre; loop. count
out. link.
out . link .
out. link,
out . link .
out, link.
}}}
}}}
{ { ( do looping calculation
( { { process declarationsBOOL done :
INT iteration. count :
INT a, b, c :
[1000] BYTE move. data :
PLACE TOve.data AT 4096 :
}}}
SEQ
{{
bc
}}}
SEQ in. links = FOR 5
SEQ out. links = FDR 5
SEQ packet. count. index = FOR SIZE packet . coiants
{ { { do calculation loop
SEQiteration. count :=
status ? doneWHILE NOT done
PRI ALTStatus ? done
SKIP
TRUE & SKIP
SEQ
iteration . count := iteration. count + 1
— Calulation to be done in the loop goes hereiteration . count
{ init calc variables:= #FFFFFFE:= #FFF
}}}
result
}}
GCMMENT do no looping calculationdo no looping calculation
99
{ { { process declarationsBOOL done :
}}}
SEQ in. links = FOR 5
SEQ out. links = FOR 5
SEQ packet . count . index = FOR SIZE packet .counts
{ { { do wait for synchronization with comnunication processSEQ
statios ? donestatus ? doneresult !
}}}
}}}
}}}
3. Satellite Node Procedure
PROC satellite (CHAN OF ANY to. echo, from. echo, to.irdd, fron.mid,
VAL INT my. tag)
{ {
{
description— This procedure provides for sourcing and sinking communications to place— a communications load on the target node of the test configuration. If the— procedure is placed in a the data reporting path from the target node— to the host systan, the satellite also passes along the data frori the— target node. Packet sizes transmitted from this node are the same size— as the packets being received by the target
.
}}}
{ { { declarationsVAL else IS TRUE :
INT trigger :
INT packet. count, packet. size :
INT start .time, stop.time :
INT ave . count :
INT loop.cp, op. count, loop.pri, loop.loc :
INT comm.pri, comm.loc, mid. in, mid. out :
INT mid. block, mid. packet .count, mid. packet .size :
INT mid. start, mid. stop, mid. loop :
{{{ timedata.tsrVAL block. size IS 100000:
VAL packet. counts IS [1,2,5,10,20,50,100,200,500,1000,2000,5000,10000,20000,50000,100000]
}}}
[block. size] BYTE link. data:
PLACE link. data AT 4096 :
100
TIMER clock :
}}}
PRI PARSEQ in. links = FOR 5
SEQ out. links = FOR 5
SEQ packet . count . index = FOR SIZE packet . counts
SEQ
({{ if 'primary' satellite pass on the trigger
IF
my. tag =
SEQfran. echo ? trigger
to .mid ! trigger
else
SKIP
}}}
{ { { wait for target start trigger
from.mid ? trigger
}}}
{ { { initialize for test set
packet . count : = packet . counts [packet . count . index]
packet. size := block. size / packet. count
clock ? start. time
}}}
{ { { set up links
PAR
{ { { start link input
IF
out. links > rry.tag
SEQ i = FOR packet. count
from.mid ? [link. data FROM FOR packet. size]
elseSKIP
}}}
{ ( { start link output
IF
in. links > ity.tag
SEQ i = FOR packet. count
to.mid ! [link.data FPCM FOR packet. size]
elseSKIP
}}}
}}}
{ { { conplete test set
clock ? stop. time
}}}
{({ if 'primary' satellite pass on the mid results
IF
ave . count
loop. op; op. count; loop.pri; loop.loccorrm.pri; corrm.loc; mid. in; mid. out
'.tag =
SEQ
fron.mid 7
frcm .mid 7
frcm..mid 7
101
frcm.mid ? mid.block; mid. packet .count; mid. packet .size
fran.mid ? mid. start; mid. stop; mid. loop
to .echo ! ave . count
to.echo ! loop. op; op. count; loop.pri; loop.locto. echo ! comm.pri; ccmm.loc; mid. in; mid. out
to.echo ! raid.block; mid.packet .count; mid. packet .size
to.echo ! mid. start; mid. stop; mid. loopelse
SKIP
}}}
{ { { report own timing results
to. echo ! start. time; stop. time
}}}
SKIP
4. Echoing fPata Routing) Node Procedure
PROC echo (CHAN OF ANY to.root, fran. root,
to. aim. 0, from. arm. 0, to. arm. 1, from. arm. 1,
VAL INT m^^.tag)
{ (
(
description~ This procedure echos timing results from the target and satellite nodes to— the host system. The echoing procedure placed in the routing path from the— target to the host echos all the host data.
}}}
{ {
{
declarationsVAL. else IS TRUE :
{{{ timedata.tsrVAL block. size IS 100000:
VAL packet. counts IS [1,2,5,10,20,50,100,200,500,1000,2000, 5000, 10000, 20000, 50000, 100000]
}}}
INT ave . count :
INT loop. op, op. count, loop.pri, loop.loc :
INT CCTttn.pri, comn.loc, mid. in, mid. out :
INT mid. block, mid. packet .count, mid.packet .size :
INT mid. start, mid. stop, mid. loop :
INT trigger :
INT arm. start, arm. stop :
SEQ in. links = FOR 5
SEQ out. links = FOR 5
SEQ packet . count . index = FOR SIZE packet. counts
102
if 'primary' pass on the start triggerSEQ
{{{
IF
rry.tag =
SEQ
from, root
to. arm.
elseSKIP
}}}
( { { if 'primary' echo pass on the target results
IF
? trigger! trigger
rr^.tag =
SEQfrom. arm.
from. arm.
from. arm.
from. arm.
frcm.ann.O
to . root
to . root
to . root
to . root
to . root
else
ave . coiont
loop, op; op. count; loop.pri; loop.loc
comm.pri; comm.loc; mid. in; mid. out
mid.block; mid.packet .count; mid.packet .size
mid. start; mid. stop; mid. loop
ave. count
loop. op; op. count; loop.pri; loop.loc
comm.pri; comm.loc; mid. in; mid. out
mid.block; mid.packet .coiont; mid. packet .size
mid. start; mid. stop; mid. loop
SKIP
}}}
{ { ( report satellite timing results
from. aim.
to . rootfrom. arm. 1
to . root
}}}
arm. start; arm. stop
arm. start; arm. stoparm. start; arm. stoparm. start; arm. stop
5. Host (Data Recording) Procedure
PRCC root (CHAN OF ANY ke^tooard, screen,
[4]CHAN OF ANY fran.u. filer, to. u. filer,
CHAN OF ANY fran. fold, to. fold,
fran. filer, to. filer)
{{{ descriptionThis procedure runs on the host system. It triggers the start of a testrun and gathers data from the network upon conpletion of the test run.
The data gathered is sent to a disk file on the host system for laterevaluation or transfer. A subset of the gathered data is displayed onon the host's screen.
}}}
{ { { TDS library references
103
#USE "XtdsiolibXuserio.tsr";#USE "\tdsiolib\interf .tsr":
}}}
declarationsVALVALVALVAL
else IS TRUE :
trigger IS 1 :
uS. per. tic IS [64, 1]
tab IS BYTE 9
USTT
[63] BYTECHAN OF ANY
id. length, result
:
id. string:
data . file
:
{{{ timedata.tsrVAL block.size IS 100000:
VAL packet. comts IS [1,2,5,10,20,50,100,200,500,
1000,2000,5000,10000,20000,50000,100000]
}}}
{ (
{
network report
INT ave . count :
INT loop. op, op. count, loop.pri, loop.loc :
INT comm.pri, carm.loc, mid. in, mid. out :
INT mid. block, mid.packet .count, mid.packet, size :
INT mid. start, mid, stop, mid. loop, mid. time :
INT arm. 0. start, arm. 0. stop, arm. 0.time :
INT arm. 1. start, arm. 1. stop, arm. 1.time :
INT armc2. start, arm. 2. stop, arm. 2.time :
INT arm. 3. start, arm. 3. stop, arm. 3.time :
}}}
{ { { network channelsVALVALVALVALVALVALVALVAL
linkOout IS
liiiklout IS 1
link2out IS 2
link3out IS 3
linkOin IS 4
linklin IS 5
lixik2in IS 6
lixik3in IS 7
CHAN OF ANY from. net. 0, from. net. 2,
to. net. 0, to. net. 2 :
PLACEPLACE
PLACEPLACE
to . net . AT link2out
to . net .
2
AT linkSout
frem. net. AT link2infrom . net . 2 AT link3in
}}}
PAR
{ { { set the filename and run the host system filer
104
SEQ[id.strijig FRCM FOR 6] := "timeOl"
id. length := 6
scrstream. to . server (data . file,
from. filer, to. filer,
id. length, id. string,
result)
}}}
{ { { run the test network
SEQ
SEQ in. links = FOR 5
SEQ out. links = FOR 5
SEQ packet. count. index = FOR SIZE packet . counts
SEQ
{ { { trigger network
to. net. ! trigger
}}}
{ { { receive network results
fran, net. ? ave. count
from. net. ? loop. op; op. count; loop.pri; loop.loc
from. net. ? ccxnm.pri; carcm.loc; mid. in; mid. out
from. net. ? mid.block; mid.packet .count; mid.packet .size
from. net. ? mid. start; mid. stop; mid. loop
fram. net. ? arm. 0. start; arm. 0. stopfrom. net. ? arm. 1. start; ann.l.stopfrom. net.
2
? ann.2.start; arm. 2. stopfrom. net.
2
? arm. 3. start; arm. 3. stop
mid. time := (mid. step-mid. start) *uS. per. tic [corm.pri]
arm. 0. time := aim. 0. stop-arm. 0. start
arm. 1. time := arm. 1. stop-arm. 1. start
arm. 2. time := arm. 2. stop-arm. 2. startarm. 3. time := arm. 3. stop-arm. 3. start
}}}
{ { { put results to file
{ { { processor typewrite . int (data . file, 800, 1)
write . char (data . file, tab)
}}}
{ { { processor speedwrite . int (data . file, 20,1)
write . char (data . file, tab)
}}}
{ { ( link speedwrite . int (data . file, 20,1)
write . char (data . file, tab)
}}}
{ ( ( loop conditionswrite . int (data . file, loop . op, 1)
write . char (data . file, tab)
write . int (data . file, op . count , 1
)
write . char (data . file, tab)
write . int (data . file, loop .pri, 1)
105
write . char (data . file, tab)
write . int (data . file, loop . loc, 1)
write . char (data . file, tab)
}}}
{ { { link conditions
write. int (data.file,corrm.pri, 1)
write . char (data . file, tab)
write . int (data . file, coirm. loc, 1)
write . char (data . file, tab)
write . int (data . file,mid. in, 1)
write . char (data . file, tab)
write. int (data , file, mid, out, 1)
write . char (data . file, tab)
}}}
{{{ corm. conditions
write. int (data. file, mid.block, 1)
write . char (data . file, tab)
write . int (data . file,mid.packet .count, 1)
write . char (data . file, tab)
write . int (data . file,mid.packet . size, 1)
write . char (data . file, tab)
}}}
{ { { target results
write . int (data . file,mid . loop, 1
)
write , char (data . file, tab)
write. int (data. file, mid. time, 1)
write . char (data . file, tab)
}}}
{ { { satellite results
write . int (data . file, arm. .time, 1)
write . char (data . file, tab)
write . int (data . file, arm. 1 .time, 1)
write . char (data . file, tab)
write. int (data. file, arm. 2. time, 1)
write . char (data . file, tab)
write . int (data . file, arm. 3 .time, 1)
newline (data . file)
}}}
}}}
{ { { put results to screenwrite. int (screen, in. links, 3)
write . int ( screen, out . links , 3
)
write. int (screen, mid.packet .count,7)
write. int (screen, mid.packet .size,?)
write. int (screen, mid. loop, 9)
write. int (screen, mid. time, 9)
write. int (screen, arm. 0. time, 9)
write. int (screen, arm. 1. time, 9)
write. int (screen, arm. 2. time, 9)
write. int (screen, arm. 3. time, 9)
newline (screen)
}}}
{ { { terminate host system filer
data. file ! 24
}}}
106
}}}
107
APPENDIX B
TIMING DATABASE DESCRIPTION
A DISCUSSION
The large volume of data generated by individual timing tests was
loaded into a database management system, to facilitate ease of access
and aggregation of any future test results. The Ingres Relational
Database Management System, currently available on the departmental
mini-computer, was selected for this purpose. The Ingres database
management system provides a wide range of tools for accessing and
manipulating data. These tools range from a database query language
that can be used to program complex data manipulations and opera-
tions to a completely menu-driven interactive shell for accessing the
database.
This appendix documents the format of the Ingres timing
database. Also, some examples are provided on ways in which timing
data can be accessed through the use of the query language. Complete
documentation for the use of the many features of the Ingres relational
database system may be found in [Re85].
R DATABASE DESCRIPTION
A relational database consists of a set of relations. A relation is, in
turn, composed of a set of data fields. A relation can be thought of as a
table where the data fields represent the columns of the table. One
108
row of the table then represents one individual set of data entered in
the table.
In the timing database, there are several relations that are
defined. The first of these relations, shown in Table B.l, is the rela-
tion that actually holds the timing test data. This table shows each of
the fields in the timing relation, the tj^e of data the field is formatted
for and the length of the data field (in bj^es).
TABLE B.l
TIMING DATA RELATION DEFINITION
Relation Name : timing
column name type length
processor integer 2
procspeed integer 1
linkspeed integer 1
opcode integer 1
opcount integer 1
looppri integer 1
looploc integer 1
commpri integer 1
commloc integer 1
linksin integer 1
linksout integer 1
blocksize integer 4
packcount integer 4
packsize integer 4
loopcount integer 4
looptime integer 4
armOtime integer 4
armltime integer 4
arm2time integer 4
arm3time integer 4
109
The name of t±iis relation is "timing." Note that the fields defined
in this relation are the same as the data elements written to the host
system file by the timing test program of Appendix A.
There are two additional relations in the timing test database.
These relations, shown in Tables B.2, B.3, and B.4, provide for trans-
lating the numerically encoded calculation loop operation codes, the
timing test priorities, and memory locations in the "timing" relation
into a text description of the corresponding test condition.
TABLE B.2
RELATION DEFINITION FOR CONVERTING
ENCODED OPERATION CODE NAMES
Relation Name: opcodenames
column name type length
opcodeopcodename
integercharacter
1
32
TABLE B.3
RELATION DEFINITION FOR CONVERTING
ENCODED PRIORITY NAMES
Relation Name: prinames
column name type length
pripriname
integercharacter
1
32
110
TABLE B.4
RELATION DEFINITION FOR CONVERTING
ENCODED LOCATION NAMES
Relation Name: locnames
column name type length
loclocname
integer 1
character 32
C EXAMPLES
There are three database operations that are provided as exam-
ples of query language interaction with the database. These operations
are: loading an external text file of data into the database, retrieving
information from the database, and, finally, transferring data from the
database to an external text file.
1. Data Loading
Figure B.l is a listing of the query language instructions for
transferring a file of test data from the timing test program into the
timing database. For the most part, these instructions are self
explanatory. Of note are the formatting codes, "cOtab" or "cOnl,"
associated with each of the fields. These formatting codes mean that
the text file representation of of a field's data is a variable length char-
acter string terminated by a tab character or by a new line character.
Ill
copy timing
processor = cOtab,procspeed = cOtab,linkspeed = cOtab,opcode = cOtab,opcount = cOtab,looppri = cOtab,looploc = cOtab,commpri = cOtab,commloc = cOtab,linksin = cOtab,linksout = cOtab,blocksize = cOtab,packcount = cOtab,packsize = cOtab,loopcount = cOtab,looptime = cOtab,armOtime = cOtab,armltime = cOtab,arin2time = cOtab,arm3time = cOnl
from "/work/ingres/time file"
2c Data Retrieval
Figures B.2 and B.3 are examples of query language data
retrievalSc In Figure B.2, the time (in }iseconds) for single link bidi-
rectional communication of 100,000 bytes with a packet size of 10
bytes is retrieved for all types of processors. The results of this
retrieval are also displayed.
In Figure B.3, the same retrieval is processed but, instead of
displaying the results, the results of the query are used to form the
new relation "new relation."
112
retrieve
timing. process or,timing . looptime)
wheretiming. linksin = 1 andtiming . linksout = 1 andtiming. packsize = 10 andtiming. opcode = opcodenames . opcode andopcodenames . opcodename = "null loop"
sort bytiming. processor,timing . looptime
Executing . . .
1proces 1 looptime 1
1 1
1 1
1 4141 19165911 8001 12314311 1
1 1
Figure B.2. Example of Retrieval for Display
retrieve into new relation
timing. processor.timing . looptime>
wheretiming. linksin = 1 andtiming. linksout = 1 andtiming. packsize = 10 andtiming. opcode = opcodenames .opcode andopcodenames . opcodename = "null loop"
sort bytiming. processor.timing . looptime
Figure B.3. Example of Retrieval for Forming a New Relation
113
As can be seen in Table B.5, which lists the characteristics for
this new relation, the fields in the new relation inherit their charac-
teristics from the relation from which the field originated.
TABLE B.5
RELATION DEFINITION FOR THE RETRIEVED NEW RELATION
Relation Name: new_relation
column name type length
processorlooptime
integerinteger
3. Data UnLoading
Figure B.4 shows the query language listing for transferring
the previously created relation "new_relation'' to an external text file.
As with the loading of data into the database, a formatting code is
associated with each field to be transferred to the external text file.
The interpretation of these formatting codes is the same as for the
formatting codes for loading the database.
copy new_relation(
processor = cOtab,looptime = cOnl)
into "/work/ingres/text . file"
Figure B.4. Query Language Listing for Unloading the Database
114
APPENDIX C
DETAILED SOURCE CODE FOR THE SHARED MEMORY ABSTRACTINTERFACE LIBRARY
A SYMBOLIC CONSTANTS
{{{ LIB this is LIBRARY "\ecslib\ecssyiTib.tsr"
Library ID
{ { { VUL Declarations
{ { { useful symbolic constantsVAL ELSE IS TRUE :
VAL NIL IS -1 :
VAL OUTPUT IS :
VAL INPUT IS 1 :
}}}
{ { { definition of system limits— define the maximum size of an individual messageVAL MAX.MESSAGE.SIZE IS 1028 :
—' define the maximum size of one node's part of the
— globally shared, memoryVAL MAX.DATA.PER.NODE IS 4096 :
— define the maximum number of event counts allowed— in the systemVAL MAX. COUNTS IS 256 :
— define the maximum number of event counts to be— maintained, ty one nodeVAL MAX. COUNTS.PER.NODE IS 16 :
— define the maximum space allocation for the waiting— process listsVAL MAX.AWAIT. QUE.ENTRIES IS 16 :
— define the maxinxim number of processes per nodeVAL MAX. PRDC. PER.NODE IS 16 :
}}}
{ ( { action codes for communications packetsVAL READ.CMD IS 10
VAL READ. REP IS 11
VAL AWAIT.CMD IS 20VAL AWAIT.REP IS 21VAL AWAIT. QUE.FULL IS 22
VAL ADVANCE.CMD IS 30
VAL ADVT^CE.REP IS 31
115
VAL TICKET.CMD IS 40
VAL TICKET. REP IS 41
VM. GET.CMD IS 50
VAL GET. REP IS 51
VAL put.cm: IS 60
VAL PUT. REP IS 61
}}}
}}}
}}}
a KERNEL PROCEDURE
PROC ecs.kemeKt] [2]CHAN OF ANY links,
VAL INT node. id,
VAL []INT count. node,
VAL []INT count. size,
VAL []INT node. link)
( { ( description— Event Counts and Sequencers for the Transputer
— This program is an distributed kernel for irrplementing event counts and— sequencers on a network of transputers. It should be run at high™ priority in parallel with application program modules. The application— program modules are linked to the kernel via bidirectional comnnunication— channels that are elements of the links channel array parameter.
}}}
{ { { libraries#USE "\ecslib\ecssyrob.tsr"
}})
{ { { local declarations
{ { { local abbreviationsVAL INT num. nodes IS SIZE node. link :
VAL INT num. counts IS SIZE count .node :
VAL INI no. links IS SIZE links :
VAL INT no. h. links IS 4 :
VAL INT no. s. links IS no. links - no. h. links :
}}}
{ { { channel declarations and assignments— channels for kernel control of the hard/physical communications links[2 *no.h. links] CHAN OF ANY hard. links :
PLACE hard. links AT :
}}}
{ { ( communications data structures— basic format of a received ccmmunications header
[6] INT header :
[]INT short.header IS [header FROM FOR 2]
INT action . code IS header [0] :
INT count . id IS header [1] :
116
INT to. node IS header [2]
INT to.proc IS header [3]
INT from, node IS header [4]
INT from.proc IS header [5]
INT data. size :
— potential formats of received data arrays
[MAX.MESSAGE. SIZE] BYTE data. array :
INT coijnt.value RETYPES [data. array FRCM FOR 4]
INT index RETYPES [data. array FROM FOR 4]
INT seg.size RETYPES [data. array FRCM 4 FOR 4]
}}}
{ { { event count data structures— variables to track allocation of shared, resources
INT node . counts
,
base. count,
next . free :
— storage allocation for kernel data structures
[MAX. COUNTS] INT
[MAX.COUNTS. PER.NODE] [4]INT
[MAX .DATA . PER .NODE ] BYTE
[MAX.AWAIT. QUE. ENTRIES] [4] INT
[MAX .AWAIT . QUE .ENTRIES ] INT
count . array . indices
count . array :
node. data :
node . awaits :
free . list :
}}}
{ { { procedures
{{{ PROC buffer. in. link (chan. in, chan. out)
PRDC buffer. in. link (CHAN OF ANY in. link, out. link)
{ { { description— This procedure provides a 'soft' buffer for hard link input and output.— This buffer increases the overall throughput of the node.
}}}
{ { { declarations[MAX .MESSAGE . SIZE] BYTE
[6]INT
INT
}}}
data. array. 0,
data. array.
1
header. 0,
header . 1 :
data. size. 0,
data . size . 1 :
SEQ
in. link ?
WHILE TRUE
SEQ
PARout . linkin. link
PARout. link
header. 0; data. size. : :data. array.
header . ; data . size . : : data . array .
header. 1; data.size.l: :data.array.l
! header. 1; data.size.l: :data.array.l
117
in. link ? header. 0; data.size.O: :data. array.
}}}
{ { { PROC buffer . out . link (chan . in, chan . out
)
PRDC buffer, out. link (CHAN OF ANY in. link, out. link)
{
f
description~ This procedure buffers output to hardware communications links.— To irrprove performance, the buffer is sized to hold one message— for each process at the node. The buffer is organized in a ring— configuration. WARNING: This inplementation uses variables— shared between parallel processes. The ring buffer itself, and— next in and out pointers are shared. When this procedure is run— at high priority, the sequencing of the code guarantees that— there will be no access conflicts to these shared structixres.— The purpose for this sharing is that tlie irrplenentation is— slightly faster under average case loading conditions and is no— worse than a 'normal' request-next-item buffer under any conditions.
}}
{ { { declarations— a local alternate name for the system li, it used to~ size the bufferVAL INT buff .size IS MAX. PRDC. PER.NODE :
— pointers to positions in the ring bufferINT next . in,
next. out :
— define the ring buffer[buff .size] [6] INT header :
[buff .size] INT data. size :
[buff .size] [MAX.MESSAGE. SIZE] BYTE data. array :
— channels for ccammunicating between the input and output— parts of the bufferCHAN OF ANY wake. in,
wake . out :
}}}
SEQ
{ { { initialization
next. in :=
next. out :=
}}}
PAR
{ { { buffer input
{ { { local declarations
INT buff.no,
any :
}}}
118
WHILE TRUE
SEQbuff.no := next . inXbuff . size
PRI ALT— if the buffer is not full
(next. in < (next. out + buff. size)) & SKIP
SEO— input an item to the buffer
in. link ? header[buff .no] ; data.size [buff.no] :: data. array [buff .no]
next. in := next. in + 1
— if the buffer was enpty, let the sleeping output know
IF
(next. in - next. out) = 1
wake. out ! NIL
ELSE
SKIP— the bxiffer is full, go to sleep until itani output
wake . in ? anySKIP
}}}
{ { ( do output
{ { { local declarations
INT buff.no,
any :
}}}
WHILE TRUE
SEQ
buff.no := next. out\buff .size
PRI ALT— if the buffer is not errpty
(next. in > next. out) & SKIP
SEQ— output a buffered, item
out . link ! header [buff . no] ; data . size [biaff . no] : : data . array [buff .no]
next. out := next. out + 1
— if the buffer was full, wake the slewing input process
IF
(next. in - next. out) = (buff. size - 1)
wake . in ! NILELSE
SKIP— the buffer is errpty, wait for a wake-up after some input
wake . out ? anySKIP
}}}
{{{ PRDC send.packet (header, data.size, data. array)
PROC send.packet (VAL [6] INT header, VAL INT data.size, []BYTE data. array)
{ ( { description— This procedure packages rressages for sending either to a remote node or
119
— to a local process,
}}}
{ ( { declarations— define subset of overall header for local communicationV?iL [JUSTT short. header IS [header FRCM FOR 2] :
VAL INT to. node IS header [2] :
VAL INT to.proc IS header [3] :
}}}
IF
{ { { packet is to local procedure — return it locallyto.node = node. id
links [to.proc] [OUTPUT] ! short .header; data. size :: data. array
}}}
{ { { else packet is for remote nocte —- pass it on
ELSElinks [node. link [to. node]] [OUTPUT] ! header; data. size :: data. array
}}}
}}}
PROC process.packet (link . no)
PROC process .packet (VAL INT link.no)
{ { { description— This procedure performs a function based on the value of the action. code—
• element in the header of a carmunications packet. These functions— correspond to the basic calls provided by the event counts and sequencers— procedures (read, advance, await, ticket, put and get)
.
}}
IF
{ { { command packet requires processing by m/ node — handle it
(to.node = node. id) PM) (to.proc = 0)
{ { { get characteristics of count . idcount . array . indices [count .id] :
count . array [count . index] [ ]
count . array [count . index] [ 1
]
count. array [count. index] [2]
count . array [count . index] [3]
{ { { read commandaction, code = READ.OO— represent the count as an array of bytes[]BYTE out. count RETYPES current . count :
— return the value of the coiont
send. packet ( [READ. REP, count. id, from. node, from. proc, node. id, 0],
SIZE out. count, out. count)
}}}
INT count . i ndex IS
INT current. count IS
INT current. ticket IS
INT base IS
INT head IS
}}}
IF
120
node. id, 0]
,
list
{ { { advance ccammand
action. code = ADVANCE. CM)
{ ( { declarations
— Each time an advance is performed, the waiting process list— associated, with the target event count is checked to determine— if the advanced count is a waited-for count. If so, a wake-up— message is sent to the suspended process.
BOOL more . to . wake
:
}}}
SEQ
{ { { initialization— advance the event count
current . count := current . count + 1
more. to. wake := TRUE
}}}
WHILE iTore.to.wake
IF
{ { { no items on the waiting process list
head = NIL
more. to. wake := FALSE
}}}
{ { { check the first item on the waiting process list
ELSE
{ { { abbreviations— extract the wating process list entries for the first item—
• on the waiting process list.
INT wait. count IS node. awaits [head] [0]
INT wait. node IS node. awaits [head] [1]
INT wait.proc IS node. awaits [head] [2]
INT wait. next IS node.awaits[head] [3]
}}}
IF
{ { { count reached - wake the first process on the listwait. count <= current . count
SEQ
{ { { send wakeup messagesend.packet ( [AWAIT .REP, count . id, wait . node, wait .proc.
0, data. array)
}}}
{ { { remove the first item, return, array position to free
next. free := next. free - 1
free . list [next . free] : = headhead := wait .next
}}}
}}}
{ ( { count not reached - no more to wakeELSE
more. to. wake := FALSE
}}}
}}}
121
{ { { await carmandaction. code = AWAIT.CMD
IF
{ { { wakeup if count already reachedcount. value <= current . count
send.packet ( [AWAIT. REP, count. id, from. node, from.proc, node. id, 0]
,
0, data. array)
}}}
( ( { add to waiting process list if a future countELSE
IF
{ { { no room for an await list entry - send back await failnext, free = MAX.AWAIT.QUE. ENTRIES
send.packet ( [AWAIT. QUE. FULL, count, id, frcm.node, from.proc.
node . id, 0]
,
list
0, data. array)
}}}
{ { { is rxxm for an await list entry - add the entry to the await
ELSE
{ { { abbreviations— extract and abbreviate the await list entries for the—
' next available position (from free list)
INT wait. count IS node.awaits [free. list [next. free] ] [0]
INT wait. node IS node.awaits [free. list [next. free] ] [1]
INT wait.proc IS node . awaits [free . list [next . free] ] [2
]
INT wait. next IS node.awaits [free. list [next. free] ] [3]
}}}
SEQIF
{ i { handle list errpty case
head = NILSEQ
wait. next := NILhead := free . list [next . free]
}}}
[ [ ( handle insert at head of list case
count. value <= node . awaits [head] [0]
SEQ
wait. next := headhead := free . list [next . free
]
}}}
{ { { handle all other cases
ELSE
{ ( { declarations— walk the list to find the proper ordered insertion point— following local variables track position of the search
INT cursor,
prior :
}}}
SEQ
{ ( { initialize
prior := head
cursor := node.awaits [head] [3]
}}}
122
{ { ( scan list
WHILE cursor <> NIL
IF
count. value <= node. awaits [cursor] [0]
cursor := NIL
ELSESEQ
prior := cursor
cursor := node. awaits [cursor] [3]
}}}
{ { { insert into list
wait .next : = node .awaits [prior] [ 3
]
node.awaits[prior] [3] := free. list [next. free]
}}}
}}}
{ { { build table entry - rermove position from free list
wait .count := count. value
wait .node := from.nodewait .proc := from,proc
next . free := next. free +
}}}
}}}
}}}
}}}
{{{ ticket caritand
action. code = TICKET. CMD—
• transform ticket value to byte array for transmission as data
[]BYTE out. count RETYPES current . ticket :
— send packet and increment the ticket value
SEQsend.packet( [TICKET. REP, count. id, fron.node, from. proc, node. id, 0],
SIZE out. count, out. count)
current. ticket := current .ticket + 1
}}}
{ { { put commandaction. code = PUT.CMD
{ { { local definitions~ define the location in the node. data array of shared memory— for where to write the transferred dataVAL USTT put .base IS base + index :
VAL []BYTE index. array RETYPES index :
VAL INT index. size IS SIZE index. array :
VAL INT put. size IS data. size - index. size :
}}}— store the requested data array[node. data FRCM put .base FOR put. size] :=
[data. array FRCM index. size FOR put. size]
}}}
{ { { get carmand
action. code = 0;t.CMD— determine the starting location for the read operation— in the shared memory segmentVAL INT get. base IS base + index :
— send the requested datasend.packet ( [GET. REP, count. id, from. node, frcm.proc, node. id, 0],
123
seg.size, [node.cJata FPCM get .base FOR seg.size]
1}
}}}
{ ( { else — just pass the message on to its destination nodeELSE
send.packet (header, data. size, data. array)
}}}
}}}
{{{ PRDC build.packet (link.no)
PROC build.packet (VAL INT link, no)
{ ( { description— To reduce the internal node cormiunication load^ message headers from local— processes include only a subset of the elements used for communicating— between nodes. This procedure 'e^qsands' a local comraunications packet— header into an external packet header.
SEQfrom, nodefrom.proc
to. nodeto.proc
= node old= link.no= count.node [count, id]
=
}}}
}}}
SEQ
{ { { initialization
({{ initialize the count. array and count. array. indices
SEQ i = FOR MAX.COUNTScount . array , indices [ i] := NIL
node. counts :=
base. count :=
SEQ i = FOR num. countsIF
count.node [i] = node. idSEQ— enter quick look-up index
count . array . indices [ i] : = node . counts— build initial count datacount .array [node. counts] [0] :=
count. array [node. counts] [1] :=
count .array [node. counts] [2] := base. count
count .array [node. counts] [3] := NIL— allocate shared memory segment
node. counts := node. counts + 1
124
base. count := base. count + count. size [i]
ELSE
SKIP
}}}
{ { { initialize the free list
SEQ i = FOR MAX.AWAIT. QUE. ENTRIESfree. list [i] := i
next. free :=
}}}
}}}
( { { run systemPAR
{ { { buffer hardware links
PAR link.no = FOR no. h. links
PARbuffer, in. link (hard. links [link.no + no. h. links], links [link.no] [INPUT])
buffer.out.link (links [link.no] [OUTPUT], hard. links [link.no]
)
}}}
[ { { imonitor conmunications
{ { { local declarations— The conmunications monitoring procedure uses a 'fair' irrpleirentation
— of the ALT structure. In general, it provides that if a connmunication— was just received from one of several channels, that channel will have— the lowest priority for the next execution of the ALT. In this way,— no single cormunications channel cam 'starve' access to the kernel— from the other comnnunications channels. The variables defined— below are used to track the last comnnunicating channel for this— 'fair' ALT. Note that hard and soft links are treated separately.
INT last.h, last.s:
}}}
SEQ
{ { { initializationlast.s :=
last.h :=
}}}
WHILE TRUE
PRI ALT
{ { { handle external hardware linksALT link.no = FOR no. h, links
links [ (link. no + last .h) \no.h. links] [INPUT] ? header;data . size : : data .array
SEQ
process.packet ( (link.no + last.h) \no.h. links)
last.h := (no. h. links - 1) + link.no
}}}
{ { { handle local soft links
ALT i = FOR no. s. linkslinks[((i + last.s) \no.s. links) + no. h. links] [INPUT] ? short .header;
data . size : : data . arrayVAL INT link.no IS ( (i + last.s) \no.3. links) + no. h. links :
SEQ
build,packet (link.no)
process .packet (link.no)
last.s := i + (no. s. links - 1)
125
}}}
}}}
}}}
C READ PROCEDURE
PRCC read ([2] CHAN OF ANY link,
VAL INT coimt.id,
INT count . value
)
{ { { description— This procedure reads and returns the value of the argument specified— event count.
}}}
{ { { libraries#USE " \ecslib\ecssyrtib . tsr"
}}}
{ { { declarations— altemalte channel namesCHAN OF ANY linkin IS link[0] :
CHAN OF ANY linkout IS lin]c[l] :
— communications data structures
[2] INT short .header :
INT data, size :
[]BYTE data. array RETYPES count. value:
}}}
SEQ— request count value from kernel
linkout ! [READ.CMD, count. id];— receive count value from kernellinkin ? short. header; data. size :: data. array
D. ADVANCE PROCEDURE
PRDC advance ([2] CHAN OF ANY link,
VAL INT count. id)
{ { { description— This procedure requests that the distributed kernel
126
— increment the specified event counter.
}}}
{ { { libraries
#USE "\ecslib\ecssimb.tsr"
}}}
{ { { declarations
— alternate name for channel
CHAN OF ANY linkout IS link[l] :
}}}
— request the advance action
linkout ! [ADVANCE. CM}, count. id];
E. AWAIT PROCEDURE
PROC await ([2] CHAN OF ANY link,
VAL INT count. id,
VAL INT count .value)
{ {
{
description— Perfoim the await function on the specified event count and— await the argument count value.
— Note that the waiting process table (node. awaits) in the— kernel that maintains the event count may be full.— In this case, the kernel retioms a message to this process— identifying that this is the case. This procedure then— waits a period of time and retransmits the await request to— the kernel. If the await request is again rejected due to— a full waiting process table, the wait time is doubled before— retrying the await request. This process will continue until— either the requested wait-for count is reached or the await— request is accepted and placed in the kernel's waiting process— table.
}}}
{ { { libraries#USE "\ecslib\ecssymb.tsr"
}}}
{ { { declarations— alternate channel namesCHAN OF ANY linkin IS link[0] :
CHAN OF ANY linkout IS link[l] :
— communications data structures
127
[2] INT short .header :
INT retiim.tag IS short. header [0] :
VAL []BYTE data. array RETYPES count. value :
INT dummy. size :
[4] BYTE dummy. array :
— structures for controlling retransmit of rejected awaitsTIMER clock :
INT current. time, wait .time :
}}}
SEQ— request await from kernellijikout ! [AWAIT. CMD, count. id]; (SIZE data. array) : :data. array— receive reply from the kernellinkin ? short.header; durrrny. size : idumrry. array
{ { { check to see if the await was acceptedIF
return, tag = AWAIT. QUE ..FULL
{ { { wait by binary back-off and retry the await request
SEQ— initial wait time (short)
wait. time := 1
WHILE return. tag = AWAIT. QUE.FULLSEQclock ? current .time
clock ? AFTER (current. time PLUS wait. time)—
• set-i:^) the doubled delay timewait. time := wait. time * 2
— request await frem kernellinkout ! [AWAIT.CMD, count. id]; (SIZE data. array) :: data. array— receive response from kernellinkin ? short .header; dumrny . size : :dummy . array
}}}
ELSE
{ { { do nothing —• await was acceptedSKIP
}}}
}}}
F. TICKET PROCEDURE
PRDC ticket ( [2] CHAN OF ANY link,
VAL INT count. id,
INT count . value
)
{ { { description— This procedure request a reservation or 'ticket' from the— distributed kernel for the specified event count/sequencer.— The value of the ticket is returned in the count. value
128
— paraireter.
}}}
{ { { libraries
#USE "\ecslib\ecssymb.tsr"
}}}
{ { { declarations— alternate channel names
CHAN OF ANY linkin IS link[0] :
CHAN OF ANY linkout IS link[l] :
— corrniunication data structures
[2] INT short .header :
INT data. size :
[]BYTE data. array RETYPES count. value :
}}}
SEQ— request ticket from kernel
linkout ! [TICKET.CMD, count. id];— receive ticket from kernel
linkin ? short.header; data. size :: data. array
G. PUT PROCEDURE
PRDC put ([2] CHAN OF ANY link,
VAL INT count. id,
VAL INT index,
VAL []BYTE data. array)
{ {
{
description— This procedure writes data to the shared mertory segment— associated with the argument event count. The data array— is written offset from the start of the shared, memory— segment by the number bytes specified by the index— parameter.
}}}
{ { ( libraries
#USE "\ecslib\ecssvmb.tsr"
}}}
{ ( { declarations— alternate channel nameCHAN OF ANY linkout IS link[l] :
— convert the index value to an array of bytes for
129
— transmission as dataVAL []BYTE index. array RETYPES index :
— identify the size of the data arraysVAL INT index. size IS SIZE index. array :
VAL INT data. size IS SIZE data. array :
VAL INT local. size IS index. size + data. size :
— define a local array for the data and index value[MAX.MESSAGE. SIZE + index. size] BYTE local. array :
}}}
SEQ— carbine the data array and the array representation of— the index value into a single array[local. array FRCM FOR index. size] := index. array[local. array FRCM index. size FOR data. size] := data. array— request kernel to store the data arraylinkout ! [PUT.CMD, count. id]; local , size :: local. array
H. GET PROCEDURE
PROC get ([2] CHAN OF ANY link,
VAL INT count. id,
VAL INT inctex,
[]BYTE data, array)
{ { { description— This procedure performs a read of the shared memory~ segment associated with the argument event count.~ The number of bytes read is determined by the size— of the argument data array. The bytes read from— the shared memory are offset from the start of the— shared memory segment b^' the number of bytes specified— in the index parameter.
}}}
{ { { libraries
#USE "\ecslib\ecssymb.tsr"
{ { { declarations— alternate name for the channels to kernel
CHAN OF ANY linkin IS link[0] :
CHAN OF ANY linkout IS link[l] :
— communications data structures
[2] INT short.header :
INT data. size :
130
— define the size and index retrieval paraireters as— an array of bytes for sending as dataVAL []BYTE get. specs RETYPES [index, SIZE data. array]
VMu INT spec. size IS SIZE get. specs :
}}}
SEQ— request the shared memory read operation
linkout ! [GET .OO , count . id] ; spec . size : :get . specs—
• receive the results of the read operation
linkin ? short.header; data. size :: data. array
131
APPENDIX D
SAMPLE PROGRAM USING THE SHARED-MEMORY INTERFACE
A DESCRIPTION
This appendix provides an example of the meUiodology employed
to write a program using the shared-memory interface developed for
this thesis. The programming example selected to demonstrate the
methodology is the bounded buffer problem. In this problem, a
producer passes data to a consumer via a bounded buffer or queue.
The buffer serves to "smooth out" variations in the data production
and consumption rates. To add slightly to the single producer
bounded buffer problem, this implementation of the problem will
provide for two producers of data.
a MODULARIZATION
This particular problem can be naturally subdivided into three
basic modules. Each of these are listed and described below.
1- Producers
Each instantiation of this module generates a continuous
stream of data elements at a specified average rate with some
characteristic random variation in the rate. The data elements
produced are placed in a segment of memory shared with the
consumer of the data. Access to the shared-memory segment is
controlled using an event count and a sequencer. The sequencer
controls the access of the two producers to the buffer. The associated
132
event count is advanced when either producer uses its ticket to add
data to the buffer.
2. Consmner
This module continuously removes data elements from the
shared memory buffer at a specified average rate with some
characteristic random variation in the rate. The average rate for the
single consumer should be at least equal to the total of the producer's
rates. If not and the consumer can not keep up with the producers,
the buffer will eventually fill and the producers will be forced to
remain idle while waiting for the consumer.
An event count associated with the consumer is advanced
when a data element is removed from the buffer. Note that if the
producer and consumer event counts are initially equal, the number of
data elements in the buffer will be equal to the difference between the
event counts.
C MODULE CODING
The code for each of the program modules should then be
developed. The following sections provide an abbreviated listing of the
code for the modules.
1. Producers
PROG producer ([2] CHAN OF ANY link,
VAL INT in, out)
#USE "\ecslib\ecsproc.tsr" — the interface library#USE "globals . tsr" — global constants
INT my. ticket :
133
[data . element . size ] BYTE data .element :
WHILE TRUE
SBQ
— insert code to produce a data elanent
ticket (link, in, my. ticket)
await (link, in, my. ticket)
await (link , out
,
(it^. ticket - data.buffer. size) + 1)
put (link, in,
(ny.ticket + 1) \data.bxaffer.size, data. element)
advance (link , out
)
2. Consumer
PROC consun:Br([2]CHAN OF ANY link,
VAL INT in, out)
#USE "\ecslib\ecsproc.tsr" — the interface library#USE "globals.tsr" — global constants
INT itr/. count :
[data . element . size] BYTE data .element :
SEQmy.coxjnt :=
WHILE TRUE
SEQmy. count := my. count + 1
await (link, in, rtY. count)
get (link, out,
rry • count\data.buffer. size, data. element)
advance (link, out)
—• insert code for consuming the data element
D. APPORTIONMENT OF MODULES
To best demonstrate the nature of the abstracted programming
interface, the apportionment of modules will be done in two different
ways. The first way will assign all modules, shared memory, and event
134
counts to a single Transputer. The second way will distribute the
modules, shared memory, and event counts on three different
Transputers.
1. Single Transputer
Library aport.tsr
#USE "globals.tsr" — global constants
— symbolic constants for countsVAL in IS :
VAL out IS 1 :
— count node assignments (which node maintains the count)
VMi count. node IS [0, 0] :
— shared, memory segment sizes
VAL count. size IS [0, data.buffer. size] :
— network, adjacency rratri^ to match particular— physical configuration. This matrix matches— a clockwise ring on a BO03 board.VAL node. link IS [[0,2,2,2],
[2,0,2,2],
[2,2,0,2],
[2,2,2,0]] :
PROG nodeOO — procedure tc> run on Transputer
#USE "\ecslib\ecsproc.tsr" the interface library#USE "procs.tsr" — library of procedures#USE "aport.tsr" — apportionment structures
VAL INT node. id IS :
[8] [2]CHAN OF ANY links :
CHAN OF ANY pro. one. link IS links [4] :
CHAN OF ANY pro. two. link IS links [5] :
CHAN OF ANY consume
.
link IS links [6] :
PRI PARecs. kernel (links, node. id,
count . node, count . size,
node . link [node . id]
)
PARproducer (pro . one . link, in, out
)
producer (pro. two. link, in, out)
consumer (consume . link, in, out)
135
2. Three Transputers
Library aport.tsr
#USE "globals.tsr" — global constants
— symbolic constants for countsVAL in IS :
VAL out IS 1 :
— coiont node assignments (which node maintains the count)
VAL coxont.node IS [0, 2] :
— shared memory segment sizes
VAL count. size IS [0, data.bioffer. size] :
— network adjacency matrix to match particular—
• physical configuration. This matrix matches— a clockwise ring on a BO03 board.VAL node. link IS [[0,2,2,2],
[2,0,2,2],
[2,2,0,2],
[2,2,2,0]] :
PRDC nodeO ( )— procedure to run on Transputer
#USE "\ecslib\ecsproc.tsr" — the interface library#USE "procs.tsr" — library of procedures#USE "aport.tsr" — apportionment structures
VAL INT node. id. IS :
[5] [2] CHAN OF ANY links :
CHAN OF ANY pro. one. link IS links [4] :
PRI PARecs . kernel (links , node . id,
count . node, count . size,
node . link [node . id]
)
producer (pro . one . link, in , out
)
PRDC nodel ( ) — procedure to run on Transputer 1
#USE "\ecslib\ecsproc.tsr" — the interface library
#USE "procs.tsr" — library of procedures
136
#USE "aport.tsr" — apportionment structures
VAL INT node. id IS 1 :
[5] [2] CHAN OF ANY links :
CHAN OF ANY pro. two. link IS links[4] :
PRI PAR
ecs . kernel (links , node . id,
count . node, count .size,
node . link [node . id]
)
producer (pro . two . link, in, out
)
PROC node2 ( ) — procedure to run on Transputer 2
#USE "\ecslib\ecsproc.tsr" — the interface library#USE "procs.tsr" — library of procedures
#USE "aport.tsr" — apportionment structures
VAL INT node. id IS 2 :
[5] [2]CHAN OF ANY links :
CHAN OF ANY consume. link IS links [4] :
PRI PARecs . kernel (links , nocte . id,
count . node, count . size,
node . link [node . id]
)
consumer (consume . link, in, out
)
137
APPENDIX E
DETAILED SOURCE CODE FOR THE MESSAGE-PASSING ABSTRACTINTERFACE LIBRARY
A SYMBOLIC CONSTANTS
{{{ LIBLibrary ID
{ { { VMj declarations
{ { ( useful symbolic constantsVALVALVALVALVALVAL
}}}
( { { system limits—
, define the maximum ccmraunications data array sizeVAL MAX.MESSAGE.SIZE IS 1024 :
~ define the maximum number of syston nodes
VAL MAX.NCDES IS 16 :
— define the maximum number of global channelsVAL MAX.CHANS IS 256 :
— define the mazimum number of channels per nodeVAL MAX. PF!DC. PER.NODE IS 40 :
— define the mazimum number of processes per nodeVAL MAX. CHAN. PER.NODE IS 40 :
}}}
{ { { packet tags used during initialization
F.T.SR IS TRUE
NIL IS -1 :
SENDER IS :
oirrpuT IS :
RECEIVER IS 1 :
INPUT IS 1 :
VAL INIT. START IS 1 :
VAL START.ACK IS 2 :
VAL INIT.DATA IS 3 :
VAL DATA.JOC IS 4 :
VAL INIT . STOP IS 5 :
VAL STOP.ACK IS 6 :
VAL INIT.QUIT IS 9 :
}}}
{ {
{
packet tags used during cartnunication
VAL RECEIVER. READY IS 7 :
VAL SEND IS 8 :
138
}}}
}}}
}}}
R KERNEL PROCEDURE
PRDC csp. kernel (VMi INT node. id,
VAL []INT node. link,
VAL [][2]INT Chan.map,
[]CHAN OF ANY loc.chan)
descriptionThis procedure is the distributed kernel for a message passingbased model programming interface. This procedure should beexecuted in parallel with application program modules using the
interface. This procedure should be run at high priority andthe application modules run at low priority.
{ { { libraries
#USE "cspsymb.tsr"
}}}
{ { { declarations
{ { { local abbreviationsVAL INT no. h. links IS 4 :
VMi INT no. s.links IS SIZE chan.rrap :
VMi INT no . 1 . chan IS SIZE chan.map :
VAL INT no. links IS no. h. links + no. s. linksVAL INT num.nodes IS SIZE node. link :
}}}
{ { { channel declaration and placement[no.h.links*2]CHAN OF ANY hard. links:
PLACE hard, links AT :
[MAX.CHAN. PER.NCDE] [2]CHAN OF ANY links:
}}}
{ { { communications data structiires— definition of the communications header parts[3] INT header :
INT header. action. code IS header [0]
INT header. to. node IS header [1]
INT header. gchan IS header [2]
— definition of the communications packet data segmentINT data. size :
[MAX .MESSAffi. SIZE] BYTE data. array :
}}}
{ { { channel mapping data structures— kernel data structures for communications routing and— management[MAX.CHANS] [2]INT gchan. node.
139
gchan . Ichan
[MAX .CHANS ] INT Ichan . gchan
{ { { procedures
{ { { PROC get . local (local, return)
PRDC get , local (CHAN OF ANY local, return)
{ {
{
description— This procedure broadcasts local channel map data— to all nodes for network global structure— initialization
}}}
{ { { declarations— for receiving acknowledge of remote broadcast
INT acknowledge :
}}}
SEQ
{ { { broadcast start signals— Open a path to all nodes in the system.— Intermediate nodes receiving these start— tokens and relaying them will not shut-down— until a corresponding stop token is received.
SEQ to.node = FOR num.nodes
IF
to.node O node. idSEQ
local ! [INIT. START, to. node, node. id];
retxam ? acknowledge
ELSE
SKIP
}}}
{ { { broadcast all local link data— scan the channel map and build the local structures
SEQ chan.no = FOR no.l.chan
{ { { local abbreviations
VAL []INT global. ident RETYPES chan.map [chan.no] :
VAL []BYTE ident. array RETYPES chan.map [chan. no] :
VAL INT global. chan. id IS global. ident [0] :
VAL INT chan.mode IS global. ident [1] :
VAL INT link.no IS chan.no + no. h. links
}}}
SEQ— build local data structures
gchan. node [global. chan. id] [chan.mode] := node. id
gchan. Ichan [global. chan. id] [chan .mode] := link.no
Ichan. gchan [link.no] := global . chan . id— broadcast to other nodes
SEQ to. node = FOR num. nodes
IF
140
to. node <> node. id
SEQ
local ! [INIT.DATA, to. node, node. id]; 8 : : ident . arrayreturn ? acknowledge
ELSESKIP
}}}
{ { { broadcast done signals— notify other nodes that done with init transmission
SEQ to.node = FOR ntma. nodes
IF
to.node o node. idSEQ
local ! [INIT. STOP, to. node, node. id];
return ? acknowledgeELSE
SKIP
}}}
{ { { send local quit signal— notify that local transmissions over
local ! [ INIT . QUIT, node . id, node . id] ;
}}}
}}}
{{{ PBGC get. raTiote( remote, return)
PROC get . reciote (CHAN OF ANY rertote, return)
{ {
{
description— This procedure inputs remote nodes' channel map data— to all nodes for network global structure— initialization
}}}
{ { { local declarations[3] INT header :
INT ixiit.code IS header [0] :
INT to. node. id IS header [1] :
INT frem. node, id IS header [2] :
INT data. size :
[2] INT global . ident :
[]BYTE ident. array RETYPES global . ident :
INT global . chan . id IS global. ident [0]
INT Chan.mode IS global . ident [1
]
INT quit . count :
}}}
SEQ
{ ( { initialize— will receive at least 2 messages from every other node
141
quit. count := 2* (num. nodes - 1)
}}}
WHILE quit. count >
ALT link.no = FOR no. h. links— get a rressage from somev^erelinks [link.no] [INPUT] ? header; data.size: :ident .array
IF
{ { { for m/ node, process it
to. node. id = node. idSEQ
to. node. id := from. node. id— categorize and act on messageIF
{ ( { start messageinit.code = INIT. START
SEQ— acknowledge message to remote node
init.code := START .ACK
raiDte ! header; data.size: :ident. array
}}}
{ { { start ack
init.code = START.ACK— let local process know the ack rec'd
return ! NIL
}}}
{ { { data messageinit.code = INIT.DATA
SEQ
gchan.node[global. chan. id] [chan.mode] := fran. node. id
— acknowledge message to remote node
init.code := DATA.ACKremote ! header; data.size: :ident. array
}}}
{ { { data ack
init.code = DATA.ACK— let local process know the ack rec'dreturn ! NIL
}}}
{ { { stop messageinit.code = INIT.STOP
SEQ— one less message to get from a remote nodequit. count := quit. count - 1
— acknowledge message to remote node
init.code := STOP. ACKremote ! header ; data . size : : ident . array
}}}
{ { { stop ackinit.code = STOP .ACK
SEQ— one less message to get from a remote node
quit . count : = quit . count - 1
— let local process know the ack rec'd
return ! NIL
}}}
142
}}}
{ { { for a different node, forward it
ELSESEQ
remote ! header; data . size :: ident . array
IF
init.code = INIT. START— reserve a path for the sender
quit. count := quit. count + 1
init.code = START.ACKSKIP
init.code = HSriT.DATA
SKIP
init.code = DATA.ACKSKIP
init.code = INIT. STOP
SKIP
init.code = STOP.ACK— done with pass-thru reservation
quit. count := quit. count - 1
ELSESKIP
}}}
{ { { send local quit signal—
- done with remote receives
remote ! [INIT. QUIT, node. id, node. id];
}}}
}}}
{{{ PRDC itultiplex (local, remote)
PRDC multiplex (CHAN OF ANY local, reitote)
{ { { description— This procedure multiplexes locally generated initialization~ messages and 'pass-through' initilization messages onto— the hardware ccntnunication links
}}}
{ { { local declarations— packet header definition[3] INT header :
INT init.code IS header [0]
INT to.node. id IS header [1]
— init packet dataINT data. size :
[2] INT global. ident :
[]BYTE ident. array RETYPES global. ident
— local flags
BOOL local. done,
143
retote.done :
}}}
SEQ
{ { { initialize
local.done := FALSEremote.done := FALSE
}}}
WHILE (NOT local. done) OR (NOT rertote.done)
PRI ALT
{ { { rertotely generated nnessages
(NOT remote. done) & ratiote ? header; data. size : :ident.arrayIF
init.code = INIT.QUIT
ratiote. done := TRUE
ELSElinks [node .. link [to . node . id] ] [OUTPUT] ! header;
data . size : : ident . array
}}}
{ { { locally generated messages(NOT local. done) & local ? header; data. size :: ident. arrayIF
init.code = INIT.QUIT
local. done := TRUE
ELSElinks [node . link [to .node . id] ] [OUTPUT ] ! header
;
data . size : : ident . array
}}}
}}}
{ { { PROC map . out (local . link, local . chan)
PROC map. out ( [2]CHAN OF ANY local. link,
CHiysi OF ANY local. Chan)
{ {
{
description— Perform the data transfers needed to connect the output— end of a global channel to the receiving local channel
}}}
{ { { declarations— alternate name for the local bidirectional channel
CHAN OF ANY link. in IS local . link [ ] :
CHAN OF ANY link. out IS local. link [1] :
— for receiving signal to start transmission
INT any :
— data structure for transmitted messageINT data. size :
[MAX.MESSAGE. SIZE] BYTE data. array :
144
WHILE TRUE
SEQ
PAR— know that receiver is ready to receive
link . in ? any— get the transmitted data
local. chan ? data.size: :data.array— send the data via the kernel
link. out ! SEND; data.size: : data. array
}}}
{ { { PRDC nap . in (local . link, local . chan)
PRDC nap. in ([2] CHAN OF ANY local. link,
CHAN OF ANY local. chan)
{ { { description— Perform the data transfers needed to connect the input— end of a global channel to the transmitting local channel
}}}
{ { { declarations— alternate names for the local channelsCHAN OF ANY link. in IS local. link[0] :
CHAN OF ANY link.out IS local. link [1] :
— data structiire for received dataINT data.size :
[MAX.MESSAGE. SIZE] BYTE data. array :
}}}
WHILE TRUESEQ— identify to the kernel that the process is ready to receivelink.out ! RECEIVER. FEADY;— receive the transmitted data frem the keimellink. in ? data.size: :data. array— send data to the applicationlocal. chan ! data.size: : data. array
{ { { PROC buffer . link (chan . in, chan . out
)
PRDC buffer. link (CHAN OF ANY in. link, out. link)
{ ( ( description— This procedure provides a buffer for hard link irput.— This buffer increases the overall throughput of the node.
}}}
145
{ { { declarations
[MftX.MESSAGE. SIZE] BYTE data. array. 0,
data . array .
1
[3]I1SIT header. 0,
header . 1 ;
INT data. size. 0,
data . size .
1
}}}
SEQin. link. ? header. 0; data. size. : :data. array.
WHILE TRUE
SEQPAR
out. link ! header. 0; data.size.O: :data.array.O
in. link ? header. 1; data. size. 1 : idata.array.lPAR
out. link ! header. 1; data, size. 1 : :data. array ,1
in. link ? header. 0; data.size.O: :data.array.O
}}}
{ { { PRDC build packet (link.no)
PROC build.packet (VAL INT link.no)
{ { { description— Based on the local channel sending the rressage to the kernel,— construct an intemode conniunications packet.
}}}
SEQ— identify the global channel being usedheader.gchan := Ichan.gchan [link.no]— find the node to which the packet is to be routedIF
header. action. code = RECEIVER. READYheader. to.node
header . action . codeheader . to . node
ELSEheader . to . node
= gchan.node [header. gchan] [OUTPUT]= SE3SD
= gchan. node [header. gchan] [INPUT]
= node. id
}}
PRDC process.packet (link . no
)
PRDC process. packet (VAL INT link.no)
{ { { description
146
— This procedure performs communications routing and maintains— synchronization between the sending and receiving processes— by sequencing communications between the local and remote— processes
.
}}}
IF
{ { { packet for this node
header. to. node = node. idIF
header. action.code = RECEIVER. READY— let the sender know that receiver is ready
links [gchan.lchan [header. gchan] [OUTPUT]] [OUTPUT] ! SEND
header, action, code = SEISD
—• pass on the data packet to its destination
links [gchan . Ichan [header.gchan] [ INPUT] ] [OUTPUT] !
data . size : :data .array
}}}
{ { { packet for other node - forward it
ELSE
links[node. link[header. to. node] ] [OUTPUT] ! header; data. size :: data. array
}}}
}}}
}}}
PAR
{ { { buffer hardware links
PAR link.no = FOR no. h. links
PARbuffer. link (hard. links [link.no + no. h. links], links [link.no] [INPUT])
buffer. link (links [link.no] [OUTPUT], hard.links [link.no]
)
}}}
SEQ
{ { { initialization
{ { { initialize kernel data structures
SEQ i = FOR MAX.CHANSSEQ
Ichan. gchan [i] := NILgchan. node [i] := [NIL, NIL]
gchan. Ichan [i] := [NIL, NIL]
}}}
( { ( query local channels to build data structures
{ { { declare local channels for initializationCHAN OF ANY local,
remote,
return :
}}}
PARget . local (local, return)
get . remote (remote , return
)
multiplex (local, remote)
147
}}}
}}}
PAR
{ { { build mapping processes— Create the channel controller processes to connect the— local channels to global channel ends
{ { { local declarations
TIMER clock:
USTT time:
}}}
SEQ
{ { ( wait for init messages clear the network— This is a cobble to correct a problem with one— node corrpletinig initialization and sending a 'real'—
• message that is not properly handled by a node— that has not yet been initialized. Just wait for— a long time for the initilization messages to— clear the networkclock ? time
clock ? AFTER time + 5000000 — wait 5 seconds
}}}
PAR chan.no = FOR MAX. CHAN. PER.NODEIF — this channel is among those declaredchan.no < no.l.chan
IF — input or output global channel endChan.map [chan.no] [1] = INPUT
map.in(links [chan.no+no.h. links] ,loc.chan[chan.no]
)
ELSEmap.out (links [chan.no+no.h. links] ,loc.chan[chan. no]
)
ELSESKIP
}}}
{ { { monitor communications— accept input from channel controllers or from hard links
{ { { local declarations—
• The comnruinications monitoring procedure uses a 'fair' inplementation— of the ALT structure. In general, it provides that if a communication— was just received from one of several channels, that channel will have— the lowest priority for the next execution of the ALT. In this way,— no single communications channel cam 'starve' access to the kernel— from the other communications channels. The variables defined— below are used to track the last conmunicating channel for this— 'fair' ALT. Note that hard and soft links are treated separately.
HMT last.h, last.s:
}}}
SEQ
{ ( { initializationlast.s :=
last.h :=
}}}
WHILE TRUE
PRI ALT
{ { { handle external hardware links
ALT link.no = FOR no.h. links
148
header . action . code
;
links[ (link.no + last.h ) \no.h. links] [INPUT] ? header;
data . size : : data . array
SEQ
process. packet ( (link.no + last .h) \no.h. links)
last.h := (no. h. links - 1) + link.no
}}}
{ { { handle local soft links
ALT i = FOR no. s. links
links [((i + last. s)\no.s. links) + no. h. links] [INPUT] ?
data . size : : data . arrayVMj INT link.no IS ( (i + last. s ) \no.s. links) + no. h. links
SEQbuild .packet (link .no
)
process.packet (link . no
)
last.s := i + (no. s. links - 1)
}}
}}}
149
APPENDIX F
SAMPLE PROGRAM USING THE MESSAGE-PASSING INTERFACE
A. DESCRIPTION
This appendix provides an example of the met±iodology employed
to write a program using the message-passing interface developed for
this thesis. The programming example selected is the same bounded
buffer problem used in Appendix D to demonstrate programming with
the shared-memory interface. This problem consisted of two produc-
ers and one consumer with a bounded buffer or queue between them.
a MODULARIZATION
The modularization of the bounded buffer problem for program-
ming under the message-passing interface is similar to that specified
for programming with the shared memory interface. As with the
shared memory modularization, the producers and consumer are indi-
vidual modules. However, with the shared memory interface, the
buffer between the producers and consumer is a natural consequence
of the memory shared between the modules. In the message-passing
scheme, however, this buffer must be defined and coded as a separate
module.
1. Producers
Each instantiation of this module generates a continuous
stream of data elements at a specified average rate with some charac-
150
teristic random variation in the rate. The data elements produced are
output to a buffering process via a global communications channel.
2. Consmner
This module continuously attempts to input data elements
from a global communications channel. Data elements input are con-
sumed at a specified average rate with some characteristic random
variation in the rate. The average rate for the single consumer should
be at least equal to the total of the producer's rates. If not and the
consumer can not keep up with the producers, the buffer will eventu-
ally fill and the producers will be forced to remain idle while waiting
for the consumer.
3. Buffer
This module inputs communications from two global channels
and queues the input data elements for output on a third global com-
munications channel. Internally, this buffer should exhibit first-in,
first-out queue characteristics.
C MODULE CODING
The code for each of the program modules should then be devel-
oped. The following sections provide an abbreviated listing of the
code for the modules.
1. Producers
PROG producer (CHAN OF ANY output)
#USE "\csplib\cspproc.tsr" — the interface library#USE "globals.tsr" — global constants
151
[data.element. size] BYTE data. element :
WHILE TRUE
SBQ
— insert cede for producing a data element
output ! data. element
2. Consumer
PRDC consumer (CHAN OF ANY input)
#USE "\csplib\csFproc.tsr" — the interface library
#USE "globals . tsr" ~ global constants
[data.element. size] BYTE data. element :
WHIIE TPUE
SBQ
input ? data . element
— insert code to consume data elotient
3. Buffer
PRDC buffer(CHAN OF ANY buff .in. 1, buff .in. 2, buff .out)
— Note, procedure written in a manner to illustrate some—
- features of the OCCAM prograirming language; not for— efficiency
#USE "\csplib\cspproc.tsr" — the interface library
#USE "globals . tsr" — global constants
[buffer. size] CHAN OF ANY buff .chan :
PAR
— accept input from either producer[data. element .size] BYTE data . elerient :
WHILE TRUE
ALTbuff . in . 1 ? data . element
buff. chan [0] ! data. element
152
buff. in. 2 ? data. element
buff. Chan [0] ! data. element
— buffer the input
PAR i = FOR buffer. size - 1
[data.element. size] BYTE data. element :
WHILE TRUE
SEQ
buff. Chan [i] ? data. element
buff .chan[i + 1] ! data.eleinent
— send from the buffer[data. element. size] BYTE data.elerrent :
WHILE TRUE
SEQbuff .Chan [buffer. size - 1] ? data. element
buff. out ! data .element
D. APPORTIONMENT OF MODULES
As is done for the shared-memory interface, the apportionment of
modules for the message-passing interface is done in two different
ways. The first way will assign all modules to a single Transputer. The
second way will distribute the modules on four different Transputers.
1. Single Transputer
Tlibrary aporrt.tsr
#USE "globals.tsr" — global constants
— symbolic constants for global channelsVAL prol.global IS
VAL pro2. global IS 1
VAL con. global IS 2
— network adjacency matrix to match particular— physical configuration. This matrijx; matches— a clockwise ring on a BOOS board.VAL node.lijnk IS [[0,2,2,2],
[2,0,2,2],
[2,2,0,2],
[2,2,2,0]] :
PROC nodeOO — procedure to run on Transputer
153
#USE "\csplib\csFproc.tsr" — the interface library#USE "procs.tsr" — library of procedures#USE "aport.tsr" — apportionment structures
VAL INT node. id IS :
VAL [] [2]INT Chan.map IS [ [prol. global, SENDER]
,
[prol . global , RECEIVER]
,
[pro2 .global, SENDER]
,
[pro2 .global, RECEIVER]
,
[con . global, SENDER]
,
[con.global, RECEIVER] ]
[SIZE Chan.map] CHAN OF ANY loc.chan :
— abbreviations for convenience onlyCHAN OF ANY pro. one. out IS loc.chan[0]
CHAN OF ANY buff . in . one IS loc.chan[l]
CHAN OF ANY pro. two. out IS loc.chan [2]
CHAN OF ANY buff .in.two IS loc.chan [3]
CHAN OF ANY bxoff .out IS loc.chan [4]
CHAN OF ANY con. in IS loc.chan [5]
PRI PARcsp . kernel (node . id, node . link [node . id]
,
Chan.map, loc.chan)
PARproducer (pro . one . out
)
producer (pro .two . out
)
buffer (buff .in. one, buff .in. two, biaff.out)
consi:imer (con . in)
2e
Library aport.tsr
#USE "globals.tsr" ~ global constants
— symbolic constants for global channelsVAL prol. global IS
VAL pro2. global IS 1
VAL con. global IS 2
— network adjacency matrix to match particular— physical configuration. This matrix matches— a clockwise ring on a B003 board.
VAL node. link IS [[0,2,2,2],
[2,0,2,2],
[2,2,0,2],
[2,2,2,0]] :
154
PRDC nodeO ( ) — procedure to run on Transputer
#USE "\csplib\cspproc.tsr" — the interface library
#USE "procs.tsr" — library of procedures
#USE "aport.tsr" — apportionment structures
VKL INT node. id IS :
VAL [][2]INT chan.map IS [ [prol. global, SENDER]]
[SI2E Chan.imp] CHAN OF ANY loc.chan :
—• abbreviation for convenience only
CHAN OF ANY pro. one. out IS loc.chan[0] :
PRI PARcsp. kernel (node. id, node . link [node . id]
,
chan.nap, loc.chan)
producer (pro . one . out
)
PROC nodelO — procedure to run on Transputer 1
#USE "\csplib\cspproc.tsr" — the interface library#USE "procs.tsr" •— library of procedures#USE "aport.tsr" — apportionment structures
VAL INT node. id IS 1 :
VAL [][2]INT chan.map IS [ [pro2.global, SENDER]]
[SIZE chan.map] CHAN OF ANY loc.chan :
— abbreviations for convenience onlyCHAN OF ANY pro. two. out IS loc.chan [0] :
PRI PARcsp. kernel (node. id, node. link [nocte. id]
,
chan.map, loc.chan)
producer (pro . two . out)
PROC node2 ( ) — procedure to run on Transputer 2
#USE "\csplib\cspproc.tsr" ~ the interface library#USE "procs.tsr" — library of procedures#USE "aport.tsr" — apportionment structures
VAL INT node. id IS 2 :
VAL [][2]INT Chan.nap IS [ [prol. global, RECEIVER]
,
[pro2 .global, RECEIVER]
,
155
[con.global, SENDER]
]
[SIZE Chan.nap] CHAN OF ANY loc.chan :
— abbreviations for convenience onlyCHAN OF ANY buff .in. one IS loc.chan[0]
CHAN OF ANY buff. in. two IS loc.chan[l]CHAN OF ANY buff. out IS loc.chan [2]
PRI PARcsp. kernel (node. id, node. link [node. id]
,
Chan .map, loc . chan)
buffer (buff .in. one, buff .in. two, buff .out)
PRDC node3() ~ procedure to run on Transputer 3
#USE "\csplib\cspproc . tsr" — the interface library#USE "procs.tsr" — 1 -ibrai-y of procedurBS#USE "aport.tsr" — apportionment structures
VAL INT node. id IS 3 :
VAL [][2]INT chan.map IS [ [con .global, RECEIVER]
]
[SIZE Chan.map] CHAN OF ANY loc.chan :
—- abbreviations for convenience only
CHAN OF ANY con.in IS loc.chan[0]
PRI PARcsp. kernel (node. id, node . link [node . id]
,
chan.map, loc.chan)
consumer (con . in)
156
LIST OF REFERENCES
[Br87] Bryant, Gregory R., Transputer Instruction Set Disassembler,August 1987.
[Ga86] Garret, D. R, A Software System Implementation Guide andSystem Prototyping Facility for the MCORTEX Executive on the RealTime Cluster, M.S. Thesis, December 1986, Naval PostgraduateSchool, Monterey, California.
[Ha87] Hart, Simon J., Design, Implementation, and Evaluation Of AVirtual Shared Memory System In A Multi-Transputer Network, M.S.Thesis, December 1987, Naval Postgraduate School, Monterey,California.
[Ho79] Hoare, C. A. R., "Communicating Sequential Processes,"Communications of the ACM, vol. 21. no. 8, pp. 666-677, August 1978.
[In86] Product Information The Transputer Family, March 1986,INMOS Ltd., Bristol, United Kingdom.
[In87a] Transputer Development System 2.0, Programming Interface,April 1987, INMOS Ltd., Bristol, United Kingdom.
[In87b] Transputer Reference Manual, January 1987, INMOS Ltd.,
Bristol, United Kingdom.
Iln87c] T800 Preliminary Data Sheet, INMOS Ltd., 1987, Bristol,
United Kingdom.
[In87d] The Transputer Instruction Set—A Compiler Writer's Guide,May 1987, INMOS Ltd.. Bristol, United Kingdom.
[In87e] T424 Transputer Reference Manual, INMOS Ltd., 1987,Bristol, United Kingdom.
[Pe88] Discussion with Mr. Laurie Pegum, April 1988. INMOS Ltd.,
Colorado Springs, Colorado
[PoMa87] A Tutorial Introduction to OCCAM Including LanguageDefinition, March 1987, INMOS LTD., Bristol, United Kingdom.
157
[Re85] Ingres Reference Manual, 1985, Relational Technology Inc.,
Alameda, California.
[ReKa79] Reed, D. P., and Kanodia, R. K., "Synchronization withEvent Counts and Sequencers," Communications of the ACM, vol. 22,
no. 2, pp. 115-123, February 1979.
[Va87] Vanni, Filho J., Test and Evaluation of the Transputer in aMulti-Transputer Configuration, M.S. Thesis, June 1987, Naval Post-
graduate School, Monterey, California.
158
INITIAL DISTRIBUTION LIST
No. Copies
1. Defense Technical Information Center 2Cameron StationAlexandria. VA 22304-6145
2. Library, Code 0142 2Naval Postgraduate SchoolMonterey, CA 93943-5002
3. Department Chairman, Code 52 1
Department of Computer ScienceNaval Postgraduate SchoolMonterey, CA 93943
4. Dr. Uno R. Kodres, Code 52Kr 3Department of Computer ScienceNaval Postgraduate SchoolMonterey. CA 93943
5. Major Richard A. Adams, USAF, Code 52Ad 3Department of Computer ScienceNaval Postgraduate SchoolMonterey, CA 93943
6. Defense Technical Information Center 2Cameron StationAlexandria, VA 22304-6145
7. Daniel Green, Code 20F 1
Naval Surface Weapons CenterDahlgren, VA 22449
8. Jerry Gaston, Code N24 1
Naval Surface Weapons CenterDahlgren, VA 22449
9. Captain J. Hood, USN 1
PMS 400B5Naval Sea Systems CommandWashington, DC 20362
159
10. RCA AEGIS RepositoryRCA CorporationGovernment Systems DivisionMail Stop 127-327Moorestown, NJ 08057
11. Library (Code E33-05)Naval Surface Weapons CenterDahlgren, VA 22449
12. Dr. M. J. Gralia
Applied Physics LaboratoryJohns Hopkins RoadLaurel, MD 20702
13. Dana Small Code 8242Naval Ocean Systems CenterSan Diego, CA 92152
14. Lieutenant Commander G. R. Bryant, USNMare Island Naval ShipyardVallejo, CA 94592
15. Lieutenant Commander S. J. HartRoyal Australian NavyCombat Data Systems Centre84 Maryborough StreetFyshwick, ACT 2610AUSTRALIA
15. AEGIS Modeling Laboratory. Code 52Department of Computer ScienceNaval Postgraduate SchoolMonterey, CA 93943
16. Lieutenant W. R. Cloughley. USN1448 47th StreetSacramento, CA 95819
160
? - S9a.
ThesisB82923c.l
Bryant
,J^^^S^ i^'Plementation
and evaluation of anabstract programming andcommunications interfacefor a network of trans-puters.
d
e
Thesis
t82923c.i
Bryanti)ogi|frn5 iiupiementation
and evaluation of an
abstract programming and
communications interface
for a network of trans-
puters.
i^A KA»^