Improving Digital Computer Performance Using Residue Number Theory

ROY D. MERRILL, JR., MEMBER, IEEE

Summary-Residue arithmetic has the interesting characteristic that in multiplication, addition and subtraction any digit in the result is dependent only on its two corresponding operand digits. Consequently, for these operations, residue arithmetic is inherently faster than the conventional weighted arithmetics. A system design approach for exploiting the desirable characteristics of both residue and conventional number theory in digital computers is the principal topic of this paper. Criteria for selecting the moduli, and general techniques for implementing the residue arithmetic operations by simple modifications of conventional circuitry are described. A specific system with a conventional word length of 25 bits and a residue system with moduli 128, 127, 63 and 31 is treated in detail, showing that in comparison to the conventional mode of computation, residue arithmetic addition and subtraction are 3 times faster and multiplication is 12 times faster. As an example of the usefulness of the approach, the problem of solving systems of simultaneous linear equations is considered. It is shown that in obtaining solutions by residue arithmetic, the residue mode computation time approaches one sixth that required in the conventional mode as the equation systems become more complex.

INTRODUCTION

CONSIDERABLE attention has been given recently to the use of residue number theory for computation. Of particular interest are the residue computer system design approaches presented by Svoboda [1], Takahashi and Ishibashi [2], and Cheney [3] which take advantage of inherent characteristics of residue number theory to implement certain classes of problems. Szabó [4] and Keir, Cheney and Tannenbaum [5] made important contributions to the general theory in their discussion on sign detection, overflow determination, and general division in residue computer systems.

The principal advantage of computing in the residue number system is that in multiplication, addition and subtraction any particular digit of the result is dependent only on the corresponding operand digits. This property eliminates carries from digit to digit for all three arithmetic operations and removes the need to form partial products in multiplication. However, two basic disadvantages arise when computing in this system: first, the sign of a residue encoded number is a function of all residue digits of that number [4]; and second, overflow resulting from addition or multiplication is not discernible during the actual operations since carries from digit to digit, particularly the most significant digit, do not occur in residue arithmetic [5]. In contrast to conventional digital systems, addition, subtraction and multiplication can be performed in the residue number system quite rapidly, while detecting sign and the occurrence of overflow is a difficult and time-consuming task to accomplish.

The purpose of this paper is to present an approach to the design of computing systems which would take advantage of the desirable properties of both conventional and residue number systems. The main object is to demonstrate that a system of this type would be practical in applications in which certain classes of problems are encountered. Of particular interest are those problems which require large numbers of multiplications, additions and subtractions and infrequent scalings, such as might be the case in generating correlation functions [3], solving sets of simultaneous linear equations [6], and performing exact matrix inversions [2].

The general requirements of the organization of a computing system incorporating both conventional binary and residue computation capability are discussed first, followed by a general description of techniques for performing the residue arithmetic operations of addition, subtraction, multiplication, and scaling. In developing these techniques, criteria are established for selecting the set of moduli to be used in the residue system so that the various arithmetic operations can be easily implemented using conventional arithmetic circuitry. A specific system is described including the organization and equipment requirements for mechanizing the residue arithmetic operations of addition, subtraction, multiplication, sign determination and scaling, as well as the residue encoding and decoding functions. The utility of the system design approach is considered in describing the procedures and computing requirements for solving systems of simultaneous equations using residue arithmetic.

In the subsequent discussion it will be assumed that the reader is familiar with the basic concepts of residue arithmetic as presented by Garner [7] and Szabó [4].

Manuscript received July 1, 1963; revised January 2, 1964. This work was supported by the U. S. Air Force under Contract No. AF33(657)-8777.

The author is with the Electronic Sciences Laboratory, Lockheed Missiles and Space Company, A Group Division of Lockheed Aircraft Corp., Sunnyvale, Calif.

NOTATIONS AND DEFINITIONS

$N$ = Number of moduli

$m_i$ = $i$th modulus

$M$ = Range of the residue number system, i.e., the number of different integers which can be represented without ambiguity in the residue system, where $M = \prod_{i=1}^{N} m_i$

94 IEEE TRANSACTIONS ON ELECTRONIC COMPUTERS April

$|x|_{m_i}$ = The least positive residue of the number $x$ mod $m_i$, called the $i$th residue digit of $x$

$[a, b)$ = Set of all integers in the interval traversed when proceeding from $a$ to $b$ in the sense of increasing numbers, where $a$ is a number of the set but $b$ is not

$|1/a|_{m_i}$ = Multiplicative inverse of $a$ mod $m_i$; i.e., $0 < |1/a|_{m_i} < m_i$ and $\bigl|\,a\,|1/a|_{m_i}\bigr|_{m_i} = 1$

$|m_i - a|_{m_i}$ = Additive inverse of $a$ mod $m_i$; i.e., as defined by the identity $\bigl|\,|m_i - a|_{m_i} + a\,\bigr|_{m_i} = 0$

$[q/l]$ = The largest integer equal to or smaller than $q/l$

$[q/l]_{m_i}$ = Denotes $\bigl|[q/l]\bigr|_{m_i}$

Pairwise relatively prime moduli = A set of moduli where the greatest common divisor of $m_i$ and $m_j$ ($i \neq j$) is 1.

[Fig. 1-Computer organization. (a) Organization for the conventional computation mode ($K$-bit binary adder and shift register). (b) Organization for the residue computation mode (adder and shift register segmented into $k_4$, $k_3$, $k_2$ and $k_1$ bit sections).]

COMPUTER SYSTEM ORGANIZATION

The basic system design approach is best described as the modification of the arithmetic and control units of a parallel-organized conventional computer so that computation routines can be executed in either the weighted binary or residue number system. The arithmetic unit is utilized to execute residue system routines by the partitioning of the adder and shift register, from a control standpoint, into several segments. Each corresponding adder and shift register segment has the same number of bit positions. The number of bits in each segment is that required to binary encode the residue digit of the particular modulus. The adder-shift register segments are controlled separately during the residue computation mode so that certain shift and carry bit-transfer operations peculiar to each of the moduli can be accomplished. Since, in general, storage requirements are about the same for each computation mode, memory organization and retrieval procedures need not be changed. Thus the control unit must be modified to accept residue add and shifting instructions, but need not be altered for memory transfer and clear operations. This organization concept is shown schematically in Fig. 1 where $k_1$, $k_2$, $k_3$ and $k_4$ are the number of bits required to encode the residue digits of moduli $m_1$, $m_2$, $m_3$ and $m_4$, respectively, and $K$ is the word length of the conventional computer.

In succeeding paragraphs, general considerations relating to the residue system selection and the mechanization of residue arithmetic are presented followed by the description of a particular system wherein specific aspects of the design approach are discussed. The results obtained in the investigation of the specific system consisting of a parallel-organized conventional computer with a 25-bit word length modified to handle weighted binary and residue arithmetic (where the residue system is composed of moduli 128, 127, 63 and 31) can be summarized as follows: A conventional arithmetic unit without carry-look-ahead circuitry is assumed and, since binary encoding the residues of the largest moduli requires 7 bits, the residue add cycle is taken as one third the conventional add cycle.¹ The arithmetic and control units were modified according to Fig. 1, requiring approximately 140 additional AND gates, the majority of which have only two inputs, to execute combinationally certain shifting and gating operations peculiar to the residue mode. Interpreting residue mode instructions requires an estimated 40 gates, each with an average of up to five inputs depending on the instruction code used. System performance in the residue mode compared with that in the conventional mode is given in Table I in terms of equivalent conventional add cycles. The tabulations for the conventional mode are based on a system where the shifting is included as part of the add cycle. The shift register is assumed to be able to shift one, two, three or four stages. The average multiplication time assumes fixed-point multiplication where the number of bits in the multiplier has a uniform distribution. The tabulations for the residue mode are derived in later portions of this paper.

¹ If the conventional adder incorporates carry-look-ahead circuitry, the basic residue add cycle interval would be approximately one half the conventional add cycle interval [8].

TABLE I
COMPUTER SYSTEM PERFORMANCE

Operation | Conventional Computation (Add Cycles per Operation) | Residue Computation (Conventional Add Cycles per Operation*)
----------|------------------------------------------------------|--------------------------------------------------------------
Addition | 1 | 1/3
Subtraction | 1 | 1/3
Multiplication | 12 (average), 24 (maximum) | 1 (average), 2 (maximum)
Input Conversion | - | 4/3
Output Conversion | - | 11/3 (14/3 if $x < 0$)
Sign Detection | < 1 | 5/3

* The equivalent add cycles with respect to a conventional arithmetic unit having look-ahead carry are obtained by multiplying by 3/2 [8].

IMPLEMENTING RESIDUE ARITHMETIC OPERATIONS

By selecting moduli that are powers of 2 and one less than powers of 2, residue addition, subtraction and multiplication can be implemented very efficiently using conventional binary arithmetic circuitry. (This assumes that the residue digits are binary encoded.)

Residue Addition

Addition modulo $2^k$ of two residue digits, each modulo $2^k$, is equivalent to conventional $k$-bit binary addition except that the most significant bit carry is ignored since $|2^j|_{2^k} = 0$ for all $j \ge k$. Addition modulo $(2^k - 1)$ of two residue digits, each modulo $(2^k - 1)$, is equivalent to conventional $k$-bit binary addition with end-around carry since $|2^k|_{2^k-1} = 1$. As in the case for 1's complement subtraction in conventional binary arithmetic, the time allotted for addition of two numbers with end-around carry need not exceed the time required when end-around carry is not used.² Consequently, addition mod $(2^k - 1)$ can also be accomplished in one $k$-bit binary addition. (With this type of implementation, 0 mod $(2^k - 1)$ has two binary representations, 0 and $2^k - 1$.) The add times presented do not include memory transfer time which should be the same for corresponding residue mode and conventional mode computations.

² Richards, see [9], pp. 119-126.

Residue Subtraction

Residue subtraction can be accomplished by adding, mod $m_i$, the subtrahend's additive inverse to the minuend. The additive inverse for $|x|_{2^k-1}$ is $2^k - 1 - |x|_{2^k-1}$ or the 1's complement of the binary representation for $|x|_{2^k-1}$. The additive inverse for $|x|_{2^k}$ is $2^k - |x|_{2^k}$ or the 2's complement of the binary representation for $|x|_{2^k}$. By utilizing a $k$-bit adder which accepts a carry input in the lowest bit position, mod $2^k$ subtraction can be accomplished in one $k$-bit addition interval by forming the 1's complement of the binary representation for the subtrahend and adding the result to the minuend while at the same time introducing a 1 into the lowest bit position carry input.

Residue Multiplication

Multiplication, both mod $2^k$ and mod $(2^k - 1)$, can be accomplished by the conventional shift and add technique. Multiplying by $2^j$ is accomplished in the mod $2^k$ shift register exactly as in a conventional $k$-bit shift register because $|2^j|_{2^k} = 0$ for all $j \ge k$. However, it can be shown that $|2^j|_{2^k-1} = 2^{|j|_k}$ for $j = 0, 1, 2, \ldots$ and $k = 2, 3, 4, \ldots$; therefore, multiplying by a power of 2 in a mod $(2^k - 1)$ shift register is accomplished as in a conventional $k$-bit shift register except that each bit that is shifted out is recirculated into the register's lowest bit position. In general, multiplying by a power of 2, mod $2^k$ or $(2^k - 1)$, only requires shifting operations and no additions. For both mod $2^k$ and $(2^k - 1)$ multiplication, $k$ partial products will be formed, hence multiplication requires at most $(k - 1)$ residue additions or, with the residue addition mechanization described, $(k - 1)$ $k$-bit binary additions.

Scaling

Scaling by one or more of the system moduli in the residue system is analogous to division by powers of 2 (shifting right) in a conventional binary system. Scaling is accomplished by a modified residue to mixed radix conversion process [5]. The residue to mixed radix conversion, or simply the mixed radix conversion, provides a means of transforming a residue-encoded integer $x$ into the mixed radix representation

$x = p_N m_1 m_2 \cdots m_{N-1} + p_{N-1} m_1 m_2 \cdots m_{N-2} + \cdots + p_2 m_1 + p_1$   (1)

where the $m_i$ are the residue system moduli, the $p_i$ are the mixed radix digits for $x$, and $0 \le p_i < m_i$ ($i = 1, 2, \ldots, N$). Any integer in the range $[0, M)$ can be represented in this form and hence this representation has the same range as the residue system, provided the moduli are all pairwise relatively prime [4]. The mixed radix conversion process is iterative in nature and hence must be executed in a sequential manner. From (1), and letting $N = 4$ for convenience,

$p_1 = |x|_{m_1}$

$p_2 = \left| \dfrac{x - p_1}{m_1} \right|_{m_2}$

$p_3 = \left| \dfrac{\dfrac{x - p_1}{m_1} - p_2}{m_2} \right|_{m_3}$

$p_4 = \left| \dfrac{\dfrac{\dfrac{x - p_1}{m_1} - p_2}{m_2} - p_3}{m_3} \right|_{m_4}.$

The conversion process is shown in tabular form in Table II. In general, the conversion process requires $2(N - 1)$ procedural steps where $N$ is the number of moduli.

³ Szabó, see [4], Appendix I, for numerical example of the conversion procedure.
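The addition, subtraction and multiplication rules described in this section for moduli of the form $2^k$ and $2^k - 1$ can be sketched in software. This is only an illustrative model of the bit-level behavior, not the paper's circuitry; the function names (`add_mersenne`, `rotate_left`, `canonical`, and so on) are hypothetical and chosen here for readability.

```python
def add_pow2(a, b, k):
    """Add two digits mod 2**k: k-bit addition, carry out of the top bit dropped."""
    return (a + b) & ((1 << k) - 1)


def sub_pow2(a, b, k):
    """Subtract mod 2**k: add the 1's complement of b with a carry-in of 1."""
    mask = (1 << k) - 1
    return (a + (b ^ mask) + 1) & mask


def mul_pow2(a, b, k):
    """Shift-and-add multiply mod 2**k; bits shifted out of the register are lost."""
    mask = (1 << k) - 1
    acc = 0
    for j in range(k):
        if (b >> j) & 1:
            acc = add_pow2(acc, (a << j) & mask, k)
    return acc


def add_mersenne(a, b, k):
    """Add two digits mod 2**k - 1: k-bit addition with end-around carry."""
    mask = (1 << k) - 1
    s = a + b
    return (s & mask) + (s >> k)  # carry out is fed back into the lowest bit


def sub_mersenne(a, b, k):
    """Subtract mod 2**k - 1: add the 1's complement of the subtrahend."""
    mask = (1 << k) - 1
    return add_mersenne(a, b ^ mask, k)


def rotate_left(a, j, k):
    """Multiply by 2**j mod 2**k - 1: shifted-out bits re-enter at the low end."""
    mask = (1 << k) - 1
    j %= k
    return ((a << j) | (a >> (k - j))) & mask


def mul_mersenne(a, b, k):
    """Shift-and-add multiply mod 2**k - 1 using recirculating shifts."""
    acc = 0
    for j in range(k):
        if (b >> j) & 1:
            acc = add_mersenne(acc, rotate_left(a, j, k), k)
    return acc


def canonical(d, k):
    """Map the all-ones pattern (the second representation of 0) to 0."""
    return 0 if d == (1 << k) - 1 else d
```

For example, `canonical(add_mersenne(100, 60, 7), 7)` gives 33, which agrees with 160 mod 127. The `canonical` helper is needed only when comparing results, since the all-ones pattern is an equally valid representation of zero in the end-around-carry scheme.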


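TABLE II
MIXED RADIX CONVERSION

Step | Operation | $|x|_{m_1}$ | $|x|_{m_2}$ | $|x|_{m_3}$ | $|x|_{m_4}$
-----|-----------|-------------|-------------|-------------|------------
1 | Subtract $|p_1|_{m_i}$ | $p_1$ | $|x - p_1|_{m_2}$ | $|x - p_1|_{m_3}$ | $|x - p_1|_{m_4}$
2 | Multiply by $|1/m_1|_{m_i}$ | | $\left|\frac{x - p_1}{m_1}\right|_{m_2} = p_2$ | $\left|\frac{x - p_1}{m_1}\right|_{m_3}$ | $\left|\frac{x - p_1}{m_1}\right|_{m_4}$
3 | Subtract $|p_2|_{m_i}$ | | | $|P_2 - p_2|_{m_3}$ | $|P_2 - p_2|_{m_4}$
4 | Multiply by $|1/m_2|_{m_i}$ | | | $\left|\frac{P_2 - p_2}{m_2}\right|_{m_3} = p_3$ | $\left|\frac{P_2 - p_2}{m_2}\right|_{m_4}$
5 | Subtract $|p_3|_{m_i}$ | | | | $|P_3 - p_3|_{m_4}$
6 | Multiply by $|1/m_3|_{m_i}$ | | | | $\left|\frac{P_3 - p_3}{m_3}\right|_{m_4} = p_4$

where

$P_2 = \left[\dfrac{x}{m_1}\right]$

$P_3 = \dfrac{P_2 - p_2}{m_2} = \left[\dfrac{x}{m_1 m_2}\right]$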
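The conversion tabulated in Table II can be sketched as a short routine. This is a minimal illustration under stated assumptions, not the paper's hardware procedure: the names `mixed_radix_digits` and `from_mixed_radix` are hypothetical, and the multiplicative inverses are obtained with Python's built-in three-argument `pow` (Python 3.8 or later) rather than by the shift operations the paper relies on.

```python
def mixed_radix_digits(residues, moduli):
    """Return mixed radix digits p with x = p[0] + p[1]*m[0] + p[2]*m[0]*m[1] + ...

    residues[i] is x mod moduli[i]; the moduli must be pairwise relatively prime.
    Each pass mirrors one "subtract" step and one "multiply by inverse" step.
    """
    r = list(residues)
    digits = []
    for i, m in enumerate(moduli):
        p = r[i] % m
        digits.append(p)
        for j in range(i + 1, len(moduli)):
            inv = pow(m, -1, moduli[j])          # |1/m_i| mod m_j
            r[j] = ((r[j] - p) * inv) % moduli[j]
    return digits


def from_mixed_radix(digits, moduli):
    """Rebuild x from its mixed radix digits, following equation (1)."""
    x, weight = 0, 1
    for p, m in zip(digits, moduli):
        x += p * weight
        weight *= m
    return x


# Example with the moduli ordering used later in the paper for output conversion.
MODULI = (128, 31, 63, 127)
x = 1_000_000
digits = mixed_radix_digits([x % m for m in MODULI], MODULI)
print(digits)  # [64, 0, 0, 4]
```

The example can be checked by hand: 1,000,000 = 64 + 4 · (128 · 31 · 63), so only the first and last digits are nonzero.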

Suppose we wish to determine the residue encoded value of $x$ scaled by $m_1$ and $m_2$, that is, $[x/m_1 m_2]_{m_i}$ ($i = 1, 2, 3, 4$). Note from Table II that on completion of step 4, $[x/m_1 m_2]$ mod $m_3$ and $m_4$ are available. Further, since $0 \le x < M$,

$[x/m_1 m_2] = p_3 + p_4 m_3.$

Hence, to determine $[x/m_1 m_2]$ mod $m_1$ and $m_2$, form $|p_3|_{m_1}$ and $|p_3|_{m_2}$ on step 5, $|p_4 m_3|_{m_1}$ and $|p_4 m_3|_{m_2}$ on step 7, and $|p_3 + p_4 m_3|_{m_1}$ and $|p_3 + p_4 m_3|_{m_2}$ on step 8. It can be shown that scaling by other subsets of moduli can be accomplished in a similar manner by reordering the moduli and/or starting the extension process on a different step of the conversion process. In any event, scaling will take eight procedural steps for four moduli or, in general, $2N$ steps for $N$ moduli. It can be verified that by selecting the moduli as powers of 2 and one less than powers of 2, multiplying by a constant of the type $|1/m_i|_{m_j}$ or $|m_i|_{m_j}$ ($i \neq j$) is equivalent to a simple shifting operation in a majority of the cases. For example, $|1/128|_{31} = 8$ and $|1/31|_{63} = |-2|_{63}$. Consequently, for the systems under consideration the scaling process will usually require approximately $N$ additions and $N$ shifting operations.

SELECTING THE RESIDUE NUMBER SYSTEM

In choosing a residue system which can be implemented efficiently in a conventional computer, three requirements arise: first, each modulus should be a power of 2 or one less than a power of 2; second, the set of moduli should be pairwise relatively prime; and third, the range of the residue system should be very close, if not equal to, the binary computer range. The set of moduli are selected in the form

$(2^{k_1},\ 2^{k_2} - 1,\ 2^{k_3} - 1,\ \ldots,\ 2^{k_N} - 1)$

so that the arithmetic operations can be implemented in parallel using the conventional arithmetic circuitry as stipulated in previous paragraphs. Further, this set must be pairwise relatively prime so that the range of the residue system, $M$, is equal to the product of the moduli. In fulfilling the second and third requirements it is obvious that $M \neq 2^K$, the range of the binary computer. Therefore, so that the residue arithmetic operations can be executed in a parallel manner and the binary encoded residue digits of a number can be stored in one computer memory word length, $M < 2^K$. Note that a set of pairwise relatively prime moduli

$(2^{k_1},\ 2^{k_2} - 1,\ 2^{k_3} - 1,\ \ldots,\ 2^{k_N} - 1)$

can always be chosen such that $2^{K-1} < M < 2^K$. If $k_2, k_3, \ldots, k_N$ are chosen such that $m_2, m_3, \ldots, m_N$ are pairwise relatively prime and

$k_1 = K - \sum_{i=2}^{N} k_i,$

then the set $(m_1, m_2, \ldots, m_N)$ will always be pairwise relatively prime and

$2^{K-1} < 2^{k_1} \prod_{i=2}^{N} (2^{k_i} - 1)$


provided $k_i \ge 2$ since⁴

$\prod_{i=2}^{N} \left(1 - \dfrac{1}{2^{k_i}}\right) > \dfrac{1}{2}.$

⁴ Hardy and Wright, see [10], p. 284, give an expansion of $\prod_{i=1}^{\infty} (1 - x^i)$ from which it can be shown that for $x = 1/2$, $\prod_{i=2}^{\infty} \left(1 - (1/2)^i\right) > 0.5775756$.

Some examples of residue system moduli sets of interest are (32, 31, 15, 7) for $K = 17$, (128, 127, 63, 31) for $K = 25$, and (256, 255, 127, 31) for $K = 28$. Table III contains a partial list of moduli (of the form $2^k$ and $2^k - 1$) and their prime factors.

TABLE III
A PARTIAL LIST OF MODULI OF THE FORM $2^k$ AND $2^k - 1$ AND THEIR PRIME FACTORS

Moduli | Prime Factors
-------|--------------
3 | -
7 | -
15 | 3, 5
31 | -
63 | 3, 7
127 | -
255 | 3, 5, 17
511 | 7, 73
1023 | 3, 11, 31
2047 | 23, 89
4095 | 3, 5, 7, 13
8191 | -
$2^k$ ($k = 1, 2, 3, 4, \ldots$) | 2

A SPECIFIC SYSTEM

The system design approach will be demonstrated by examining the requirements for a particular system. The residue system to be considered has the moduli 128, 127, 63, 31 and a range $M = 31{,}747{,}968$ where the residue representation of $|x|_M$ is considered positive if $0 \le |x|_M < M/2$ and negative otherwise. The binary computer used in implementing the residue system is parallel-organized having a 25-bit word length.

Addition, Subtraction and Multiplication

Techniques for implementing these operations have been described. Mod 128 and 127 addition and subtraction require the equivalent of a 7-bit binary addition, while mod 63 and 31 addition and subtraction require a 6-bit and 5-bit binary addition, respectively. In all cases it can be assumed that, during subtraction, the subtrahend additive inverse is formed during the add cycle. The residue add cycle will be taken as the time to perform a 7-bit binary addition, or one third the conventional add cycle time since carry-look-ahead is not employed.

Residue multiplication will take at most six residue add cycles. However, to take full advantage of multiplication in the residue mode, it will be assumed that each shift register segment can shift one, two, three or four stages during the residue add cycle. With the capability of shifting over zeros, the average residue multiplication time will be approximately three residue add cycles.

Input-Output Conversion

The absolute magnitude of an integer $x$ coded in the weighted binary system will be denoted by

$|x| = x_{23} 2^{23} + x_{22} 2^{22} + \cdots + x_0$   (2)

where the $x_i$ are binary digits and $x_{24}$ is the sign bit. To residue encode $x$ it must first be ascertained if $x$ falls within the range of the residue code. This can be accomplished by determining if $-M/2 \le x \le (M/2) - 1$. This can be implemented combinationally or by subtracting $|x|$ from $M/2$ and checking the sign of the result, taking into consideration the sign of $x$. If $x$ is not in the interval specified above, then it must be scaled by 2 before evaluating the residue digits. From (2)

$\bigl|\,|x|\,\bigr|_{127} = \bigl|\,(x_{20} + (x_{13} + x_6))2^6 + (x_{19} + (x_{12} + x_5))2^5 + (x_{18} + (x_{11} + x_4))2^4 + (x_{17} + (x_{10} + x_3))2^3 + (x_{23} + (x_{16} + (x_9 + x_2)))2^2 + (x_{22} + (x_{15} + (x_8 + x_1)))2 + (x_{21} + (x_{14} + (x_7 + x_0)))\,\bigr|_{127}$   (3)

where if $x \ge 0$, $|x|_{127}$ is given by (3), otherwise the 1's complement of the binary representation for $\bigl|\,|x|\,\bigr|_{127}$ is formed to obtain $|x|_{127}$. Eq. (3) can be calculated with three shifting or gating operations together with three residue additions. The shifting or gating operation refers to the operation which, for example, supplies the operand bits $x_6$ and $x_{13}$ to the sixth bit location of the mod-127 adder prior to the first residue addition, then supplies $x_{20}$ to that location following the first but prior to the second residue addition. The $|x|_{63}$ and $|x|_{31}$ are formed in a similar manner, each requiring four residue additions along with four shifting or gating operations. Because $|2^j|_{128} = 0$ for $j \ge 7$, $|x|_{128}$ is determined by the first seven lower-order bits of $x$ where, if $x$ is negative, one residue addition is required to obtain the additive inverse of $\bigl|\,|x|\,\bigr|_{128}$.

Translating from the residue to the weighted binary representation requires two steps: first, a residue to mixed radix conversion, and second, a mixed radix to weighted binary conversion.

The first conversion is accomplished as shown in Table II. The system moduli are ordered as $m_1 = 128$, $m_2 = 31$, $m_3 = 63$, and $m_4 = 127$.⁵ With this moduli ordering, the mixed radix digits can be evaluated in five residue add cycles accompanied by three shifting or gating operations.

⁵ There are two advantages in selecting this ordering: first, all moduli multiplicative inverses required in the conversion process are powers of 2 except for $|1/31|_{127} = 41$ and, consequently, multiplying by these inverses can be accomplished by simple shift-left operations; and second, the first seven bits of $x$ are given by $|x|_{128}$ making possible a simple mixed radix to weighted binary conversion [see (4)].

The mixed radix to weighted binary conversion can be accomplished as follows. Using the binary encoded values of the mixed radix digits,

$x = \bigl(\bigl[\{p_4 2^{11} + p_3 2^5 + p_2\} - (p_4 2^6 + p_3 + p_4 2^5)\bigr] + p_4\bigr) 2^7 + p_1$   (4)

where, since $0 \le p_1 < 128$, $0 \le p_2 < 31$, $0 \le p_3 < 63$ and $0 \le p_4 < 127$, the quantities within the { } can be formed by suitable shifting and gating while the quantities within the ( ) and the [ ] can be evaluated by one conventional 18-bit binary addition and subtraction, respectively. If $x \ge M/2$, then $x$ is negative and the correct weighted binary value is obtained by subtracting $x$ from $M$ after the mixed radix to weighted binary conversion.

The sign of $x$, when coded in the residue system, can be determined in five residue additions and three shifting or gating operations by examining the mixed radix digits of $x$ obtained as for (4). Specifically, $M/2 \le x < M$ if $f = 1$ where the logic equation for $f$ is

$f = p_{46} \lor p_{45} p_{44} p_{43} p_{42} p_{41} p_{40} \bigl(p_{35} \lor p_{34} p_{33} p_{32} p_{31} p_{30} (p_{24} \lor p_{23} p_{22} p_{21} p_{20} p_{16})\bigr)$   (5)

with the binary encoded mixed radix digits denoted by

$p_1 = p_{16} 2^6 + \cdots + p_{10}$
$p_2 = p_{24} 2^4 + \cdots + p_{20}$
$p_3 = p_{35} 2^5 + \cdots + p_{30}$
$p_4 = p_{46} 2^6 + \cdots + p_{40}.$

In summary, residue system input conversion requires at most four residue additions accompanied by four residue system shifting or gating operations. (Each shifting or gating operation, even though performed differently for each modulus, is accomplished in parallel.) Similarly, residue system output conversion requires at most five residue additions accompanied by three residue system shifting or gating operations, one 18-bit binary addition and subtraction accompanied by three conventional shifting or gating operations, one conventional addition if the converted number is negative, plus the usual input/output storage transfers.

Scaling in the Residue System

It is desirable to be able to scale by $m_1 m_2 \cdots m_j 2^s$ where $0 \le j \le N - 1$ ($m_0 = 1$), $m_N = 2^{k_N} > m_i$ ($i \neq N$) and $0 \le s \le k_N$. Scaling in this manner provides essentially the same capability of scaling by powers of 2 available in the conventional mode. In addition, it can be shown that with this scaling procedure, and using the No. 2 division algorithm presented in "Modular Arithmetic Techniques,"⁶ integer division in the residue mode can be accomplished in approximately the same length of time required in the conventional mode.

⁶ See [11], pp. 2-4.

The scaling procedure for the residue system under consideration is illustrated in Table IV for the case in which $m_1 m_2 \cdots m_j 2^s = (31) 2^s$, $1 \le 2^s \le 128$. The general scaling procedure can be used to scale by $m_1 m_2 \cdots m_j 2^s$ where $0 \le j < N$ and $0 \le s \le k_N$ since

$\left[\dfrac{x}{m_1 m_2 \cdots m_j 2^s}\right] = 2^{k_N - s} \left[\dfrac{x}{m_1 m_2 \cdots m_j m_N}\right] + \left[\dfrac{\left[\dfrac{x}{m_1 m_2 \cdots m_j}\right]_{m_N}}{2^s}\right]$

provided $m_N = 2^{k_N} > m_i$ ($i \neq N$).⁷ The residue representation for $[x/m_1 m_2 \cdots m_j 2^s]$ is obtained by the following steps:

⁷ See [11], p. C-6, for proof of this relationship.

Step 1: Determine $[x/m_1 m_2 \cdots m_j m_N]_{m_i}$ ($i = 1, 2, 3, 4$) using the scaling procedure presented in Table IV where for convenience it was assumed that $j = 1$ and $N = 4$.

Step 2: Calculate $2^{k_N - s} [x/m_1 m_2 \cdots m_j m_N]_{m_i}$ ($i = 1, 2, 3, 4$) by shifting $[x/m_1 \cdots m_j m_N]_{m_i}$ left $(k_N - s)$ bit positions in each corresponding $m_i$ shift register as described previously.

Step 3: Retain $[x/m_1 m_2 \cdots m_j]_{m_N}$ as evaluated during the scaling procedure.

Step 4: Form $\bigl[[x/m_1 m_2 \cdots m_j]_{m_N} / 2^s\bigr]$ by shifting the binary representation for $[x/m_1 m_2 \cdots m_j]_{m_N}$ right $s$ bit positions, then residue encode the result mod $m_i$ ($i = 1, 2, 3, 4$).

Step 5: Add the results of steps 2 and 4 mod $m_i$ ($i = 1, 2, 3, 4$).

It can be shown that in the residue system under consideration (with $0 \le s \le k_N = 7$) scaling by $2^s$ or $(31) 2^s$ requires nine residue additions accompanied by four shifting or gating operations, and scaling by $(31)(63) 2^s$ or $(31)(63)(127) 2^s$ requires seven residue additions accompanied by three shifting or gating operations. Further, it can be shown that scaling by $2^s > 128$ will require either $[s/7]$ or $[s/7] + 1$ times as many operations as required for scaling by 128 depending on whether $s$ is or is not divisible by seven.

Overflow

The growth in the magnitudes of numerical operands under repeated integer addition and multiplication requires periodic divisions and scalings to avoid overflow. As discussed earlier, the problems best suited for processing in the residue mode require large numbers of multiplications and additions. Hence provisions must be made to eliminate occurrences of errors arising from undetected overflow. Rather than invest in the extensive equipment required for rapid overflow detection in


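TABLE IV
PROCEDURE FOR SCALING BY (31)(128)*

Steps | $|x|_{31}$ | $|x|_{128}$ | $|x|_{63}$ | $|x|_{127}$
------|------------|-------------|------------|------------
Subtract $|p_1|_{m_i}$ | | $|x - p_1|_{128}$ | $|x - p_1|_{63}$ | $|x - p_1|_{127}$
Multiply by $|1/31|_{m_i}$ | | $|-(32 + 1)(x - p_1)|_{128}$ | $|-2(x - p_1)|_{63}$ | $|(32 + 1)(x - p_1) + 8(x - p_1)|_{127}$
Form $|p_2|_{m_i}$ | $|p_2|_{31}$ | | $|p_2|_{63}$ | $|p_2|_{127}$
Subtract $|p_2|_{m_i}$ | | | $|P_2 - p_2|_{63}$ | $|P_2 - p_2|_{127}$
Multiply by $|1/128|_{m_i}$ | | | $|32(P_2 - p_2)|_{63}$ | $|P_2 - p_2|_{127}$
Form and subtract $|p_3|_{m_i}$ | $|p_3|_{31}$ | $|p_3|_{128}$ | | $|P_3 - p_3|_{127}$
Multiply by $|1/63|_{m_i}$ | | | | $|-2(P_3 - p_3)|_{127}$
Form $|63 p_4|_{m_i}$ | $|p_4|_{31}$ | $|-(64 p_4 + p_4)|_{128}$ | |
Form $|p_3 + 63 p_4|_{m_i}$ | $|p_3 + p_4|_{31}$ | $|p_3 - (64 p_4 + p_4)|_{128}$ | |

* The notation used is

$p_1 = |x|_{31}$

$P_2 = \left[\dfrac{x}{31}\right]$, $p_2 = |P_2|_{128} = |-(32 + 1)(x - p_1)|_{128}$

$P_3 = \left[\dfrac{x}{(31)(128)}\right]$, $p_3 = |P_3|_{63} = |32(P_2 - p_2)|_{63}$

$P_4 = \left[\dfrac{x}{(31)(128)(63)}\right]$, $p_4 = |P_4|_{127} = |-2(P_3 - p_3)|_{127}$

$|P_3|_{31} = |p_3 + p_4|_{31}$, $|P_3|_{128} = |p_3 - (64 + 1) p_4|_{128}$ and $|P_3|_{127} = |P_2 - p_2|_{127}$

The multiplier for the mod 127 column in the second step is $|1/31|_{127} = 41 = 32 + 8 + 1$, since $31 \cdot 41 = 1271 = 10 \cdot 127 + 1$.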
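The procedure of Table IV can be sketched as follows. This is an illustrative model only, assuming nonnegative $x$ in $[0, M)$ and the moduli ordering (31, 128, 63, 127) of the table; the names `encode` and `scale_by_31_128` are hypothetical, and ordinary integer arithmetic with `%` stands in for the shift and residue-add operations of the hardware. The mod 127 inverse of 31 is taken as 41, which follows from 31 · 41 = 1271 = 10 · 127 + 1.

```python
MODULI = (31, 128, 63, 127)  # column order of Table IV


def encode(x):
    """Residue encode a nonnegative integer x < 31*128*63*127."""
    return [x % m for m in MODULI]


def scale_by_31_128(residues):
    """Return the residue code of floor(x / (31*128)) from the residue code of x."""
    r31, r128, r63, r127 = residues

    # Subtract p1 = |x|_31, then multiply by |1/31| in the remaining columns.
    p1 = r31
    r128 = (-(32 + 1) * (r128 - p1)) % 128   # |1/31|_128 = -33
    r63 = (-2 * (r63 - p1)) % 63             # |1/31|_63  = -2
    r127 = (41 * (r127 - p1)) % 127          # |1/31|_127 = 41 = 32 + 8 + 1

    # Subtract p2 = |floor(x/31)|_128, then multiply by |1/128|.
    p2 = r128
    r63 = (32 * (r63 - p2)) % 63             # |1/128|_63  = 32
    r127 = (r127 - p2) % 127                 # |1/128|_127 = 1

    # r63 and r127 now hold floor(x / (31*128)) mod 63 and mod 127.
    p3 = r63
    p4 = (-2 * (r127 - p3)) % 127            # |1/63|_127 = -2

    # Extend floor(x / (31*128)) = p3 + 63*p4 to the mod 31 and mod 128 columns.
    q31 = (p3 + p4) % 31                     # 63 is congruent to 1 mod 31
    q128 = (p3 - (64 + 1) * p4) % 128        # 63 is congruent to -65 mod 128
    return [q31, q128, p3, r127]
```

A quick check is to compare `scale_by_31_128(encode(x))` with `encode(x // (31 * 128))` for sample values of `x`; the two should agree for every `x` in the stated range.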

the residue mode [5], the processing will be programmed with operand growth rates or periodic checks in the conventional mode, or both, establishing the schedule for the scaling and division operations. Growth rates may be predicted for problems envisioned through numerical analysis techniques supplemented, in some instances, with reduced accuracy processing in the conventional mode.

Special Circuitry Requirements

In general, it was assumed that the addition time for all moduli would be equal to the time required to add the residue digits of the largest modulus, thus making the residue system add cycle the time required to perform a parallel 7-bit addition.

Except for several infrequently used shifting operations, there are three situations which arise during a normal residue system computation algorithm which will require shifting or gating operations peculiar to the residue computation mode. The first of these involves the gating operations required in residue system input conversion. It can be shown that with 41 additional two-input AND gates, input conversion can be executed in, at most, four residue add cycles.

The second set of shifting or gating operations is required during residue multiplication and the scaling algorithm. Because scaling may occur with any of four different moduli orderings, up to three different shift-left operations can be required for each modulus. During residue multiplication, depending on the zero bit groupings in the multiplier residue digits, it will be desirable to shift left one, two, three, or four bits independently in each register segment. In addition, scaling can require a 5-bit, shift-left operation for the modulo-63 register, and 5- and 6-bit, shift-left operations for the modulo-128 register. Implementing these shifting operations will require 60 AND gates, each with at most four inputs.

The third set of shifting or gating operations is required during the mixed radix to weighted binary conversion algorithm. Implementing the gating operations of (4) will require 38 two-input AND gates. With this type of implementation, the residue system output conversion can be executed in five residue add cycles and two conventional add cycles.

The control circuitry required for interpreting instructions peculiar to the residue computation mode is estimated at 40 gates, the majority of which would be AND gates. An average of as many as five inputs per gate could be expected, depending on the instruction code used.

Computing in the Residue Mode

The usefulness of the approach is illustrated by considering the problem of solving systems of linear simultaneous equations. Guffin used this problem as a basis for designing a special purpose residue arithmetic computer which obtained solutions using the Gauss-Seidel iterative method [6]. Here we consider general successive iterative procedures, for which the Gauss-Seidel method is a special case, and show the advantages in obtaining solutions in this manner in the residue mode over the conventional mode of computation.

General successive iterative procedures can be formulated in the following manner. Let $Ax] = b]$ denote the system of linear equations to be solved and assume that $x^{(k-1)}]$, the $(k-1)$th trial solution for the unknown vector $x]$, has been calculated. Then the $k$th trial solution for $x]$ can be written in equation form as

$x_i^{(k)} = x_i^{(k-1)} + \dfrac{\omega}{a_{ii}} \Bigl\{ -\sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)} + b_i - a_{ii} x_i^{(k-1)} \Bigr\}, \quad i = 1, 2, \ldots, n$   (6)

where $\omega$, the relaxation factor, takes on a value in the range (0, 2). If $\omega = 1$, the iterative procedure corresponds exactly to the Gauss-Seidel method, and if $\omega > 1$ ($\omega < 1$), the procedure is called the overrelaxation (underrelaxation) method. Depending on the problem, $\omega$ is selected to minimize the number of iterations required in converging to the desired solution [12].

The procedures converge for all systems $Ax] = b]$ where the coefficient matrix $A$ is not reducible and has diagonal dominance. Boundary-value problems involving elliptic and parabolic differential equations (for example, the Laplace equation) are usually solved by successive iteration procedures because they meet these convergence requirements and, in addition, are generally large sparse systems, a desirable attribute for rapid convergence [12].

A successive iterative procedure starts with a set of [...] when consecutive sets of solutions agree within some predetermined accuracy.

The general approach to be used in the computing algorithm is suggested by writing (6) in the form

$s x_i^{(k)} = \dfrac{s \omega b_i}{a_{ii}} - \dfrac{1}{s} \Bigl\{ (s(\omega - 1))(s x_i^{(k-1)}) + \sum_{j=1}^{i-1} \Bigl(\dfrac{s \omega a_{ij}}{a_{ii}}\Bigr)(s x_j^{(k)}) + \sum_{j=i+1}^{n} \Bigl(\dfrac{s \omega a_{ij}}{a_{ii}}\Bigr)(s x_j^{(k-1)}) \Bigr\}, \quad i = 1, 2, \ldots, n$   (7)

where $s$ is the constant required to normalize the system of equations to ensure that all computations will take place in integer arithmetic with the precision desired. In general, $s$ or $s^{-1}$ may be integral; however, this discussion will be concerned only with $s$ an integer. For convenience in scaling by $s$ during the residue computation, $s$ will be taken as a product of a subset of the odd moduli and a power of 2 factor of the even modulus.

For purposes of comparing the computation times required in the residue mode vs the conventional mode for solving a system $Ax] = b]$, it will be assumed that all product terms within the { } of (7) are summed prior to scaling by $s$. With this procedure and knowing the maximum

$\Bigl| \dfrac{1}{a_{ii}} \sum_{j=1}^{n} a_{ij} x_j \Bigr|,$

the residue system range $M$ and constant $s$ are related by the inequality $M/2 > s^2 \omega \bigl(\bigl|(1/a_{ii}) \sum a_{ij} x_j\bigr|\bigr)_{\max}$. If the precision with which the input data and unknown trial solutions must be represented is such that the above inequality does not hold, it will be necessary to form several sums of the product terms within the { } of (7) and then scale each sum by $s$ before performing the final summing operation. This consideration comes about because of the difficulty encountered in detecting overflow while computing in residue arithmetic [5].

In both the residue and conventional mode the input data will be in the form $b_i/a_{ii}$, $\omega b_i/a_{ii}$ and $\omega a_{ij}/a_{ii}$, $i, j = 1, 2, \ldots, n$. Hence no attempt will be made to determine their required calculation times.

The residue mode computation requirements can be broken down into three areas: input conversion, trial solution calculations and accuracy tests. Converting the input data to the residue code consists of multiplying by $s$ and rounding off at the decimal point during the encoding operation. This is accomplished by sequentially encoding the integer part of the partial products produced when multiplying by $s$. Assuming that $s$ is binary encoded with six nonzero bits at most and given $n(n+2)$ input data values at most, the conversion will

trial solutions xl(0), x2(0), *., xnj0) where Xi °=bi1aij, require approximately 8n(n+2) conventional add cy-i= 1, 2, * * *, n, are usually the initial approximations for cles. (See Table I.) Calculating a trial solution accord-the elements of x]. One sweep through the system of (6) ing to (7) requires n residue multiplications and addi-constitutes an iteration. The procedure terminates tions to form the quantity in the { } and, on the aver-


age, 8/3 conventional add cycles to scale that quantity To simplify the modifications required, it was proposedby s. Consequently, one set of trial solutions requires that moduli have values of powers of 2 and one less thann((4/3)n+3) conventional add cycles. Termination of powers of 2. It was shown how a parallel-organizedthe procedure necessitates that it be determined if binary computer with a 25-bit word length could be-pi <sx(k) -sX(k-l) <p where pi is a predetermined modified to implement computations in a residue sys-value indicative of the accuracy desired. To perform tem whose moduli are 128, 127, 63 and 31. The utilitythis test in the residue mode, the simplest technique is of having both conventional and residue processingto form p,2- (SX,(k) -SX,(k-1))2 and check the sign of the capability was discussed and the attributes of eachresult; this requires approximately 10/3 conventional summarized (see Table I). The practicality of the sys-add cycles. Neglecting the bookkeeping operations, tem approach was demonstrated in showing that sys-calculating a new set of trial solutions and performing tems of linear equations can be solved in the residuethe accuracy tests necessary will require on the order of mode with one sixth the number of computations re-n((4/3)n+-10/3) conventional add cycles each iteration. quired in the conventional mode.Computing in the conventional mode will require

(n+2) additions and n multiplications (which on the ACKNOWLEDGMENTaverage requires 12n additions) for each new trial solu- The author wishes to express his appreciation to Dr.tion. This assumes that s is selected as a power of 2 R. I. Tanaka, Senior Member of the Lockheed Missilesmaking it possible to scale by shifting. Therefore, com- & Space Company Electronics Sciences Laboratory,puting a new set of trial solutions requires approxi- and to the members of the LMISC Computer Researchmately n(13n+2) conventional add cycles plus n sub- Group for their assistance in the preparation of thistractions to perform the accuracy tests. paper, with special appreciation to N. S. Szabo for hisThe number of iterations, r, required for the solution many constructive suggestions and criticisms.

to converge to the desired accuracy will be the same inboth the residue and conventional mode of computa- REFERENCEStion. Consequently, the ratio of conventional add cycles [1] A. Svoboda, "The numerical system of residual classes inrequired for solution in the residue mode to those re- mathematical machines," Information Processing (Proc.quired in the conventional mode is [2] UNESCO Conf., Paris, June, 1959) pp. 419-422; 1960.[2] H. Takahashi and Y. Ishibashi, "A new method for 'exact calcula-

tion' by a digital computer," J. Inform. Proc. Soc. Japan, vol.8(n + 2) + ((4/3)n + 19/3)r 1, pp.28-42; 1961.

[3] P. W. Cheney, "A digital correlator based on the residue num-(13 -+ 3)r ber system," IRE TRANS. ON ELECTRONIC COMPUTERS, VOl.

EC-10, pp. 63-70; March, 1961.From this ratio it is easily shown that performing the [41 N. Szab6, "Sign detection in nonredundant residue systems,"

IRE TRANS. ON ELECTRONIC COMPUTERS, vol. EC-11, pp.calculations in the residue mode will be faster when n 494-500; August, 1962.becomes larger than 2 and at least two iterations are [51 Y. A. Keir, P. WV. Cheney and M. Tannenbaum, "Division and

overflow in residue number systems," IRE TRANS. ON ELEC-required to obtain the desired solution. In general, as TRONIC COMPUTERS, vol. EC- 1I, pp. 500-507; August, 1962.n and r become large (n> 10 and r >40) the number of [6] R. M. Guffin, "A computer for solving linear simultaneous

equations using the residue number system," IRE TRANS. ONresidue mode computations approaches 1/6 that re- ELECTRONIC COMPUTERS, vol. EC-11, pp. 164-173; April, 1962.quiredin the conventional mode. [7] H. L. Garner, "The residue number system," IRE TRANS. ON

quired ln the conventlonal mode. ELECTRONIC COMPUTERS, vol. EC-8, pp. 140-147; June, 1959.[8] 0. L. MacSorley, "High-speed arithmetic in binary computers,"

CONCLUSIONS PROC. IRE, vol. 49, pp. 67-91; January, 1961.[9] R. K. Richards, "Arithmetic Operations in Digital Computers,"

The preceding work was intended to demonstrate how D. Van Nostrad Co., Inc., Princeton, N. J.; 1955.[10] G. M. Hardy and E. M. WVright, "An Introduction to the Theory

residue arithmetic might be used to improve digital of Numbers," Oxford Univ. Press, London, England; 1960.computer performance. It was determined that it is pos- [11] "Modular Arithmetic Techniques," Lockheed Missiles &

Space Company, Sunnyvale, Calif., ASD-TDR-62686; August,sible to modify a conventional computer so that process- 1962.ing can be accomplished in either the conventional [121 G. E. Forsythe and WV. R. XWasow, "Finite-Difference Methods

for Partial Differential Equations," John Wiley and Sons, Newweighted binary system or the residue number system. York, N. Y.; 1960.
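The add-cycle estimates given under "Computing in the Residue Mode" can be checked numerically. The following is a minimal sketch, not part of the original paper, that evaluates the stated cost expressions: 8n(n+2) cycles for input conversion plus n((4/3)n + 19/3) cycles per iteration in the residue mode, against n(13n+3) cycles per iteration in the conventional mode. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def residue_mode_cycles(n, r):
    """Conventional add cycles for r iterations in the residue mode.

    Uses the paper's estimates: 8n(n+2) for input conversion (paid once)
    and n((4/3)n + 19/3) per iteration (trial solutions plus accuracy tests).
    """
    conversion = 8 * n * (n + 2)
    per_iteration = n * (Fraction(4, 3) * n + Fraction(19, 3))
    return conversion + per_iteration * r


def conventional_mode_cycles(n, r):
    """Conventional add cycles for r iterations in the conventional mode.

    Uses the paper's estimate n(13n + 2) for the trial solutions plus
    n subtractions for the accuracy tests, i.e. n(13n + 3) per iteration.
    """
    return n * (13 * n + 3) * r


def cycle_ratio(n, r):
    """Ratio of residue-mode to conventional-mode add cycles (exact fraction)."""
    return residue_mode_cycles(n, r) / conventional_mode_cycles(n, r)


if __name__ == "__main__":
    # A ratio below 1 means the residue mode needs fewer add cycles.
    for n, r in [(3, 1), (3, 2), (10, 40)]:
        print(f"n={n:3d} r={r:3d} ratio={float(cycle_ratio(n, r)):.4f}")
```

Under these estimates the ratio at n = 10, r = 40 comes out at roughly 0.166, in line with the one-sixth figure quoted in the text, while a single iteration at n = 3 is still dominated by the input conversion cost.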

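The successive iterative procedure of (6) can be illustrated with a short sketch. It is written in ordinary floating-point arithmetic, not in the residue mode, and is meant only to show the sweep order and the termination test on successive trial solutions. The function names, the tolerance `p` and the iteration cap are assumptions made for this example and do not come from the paper.

```python
def sor_sweep(a, b, x, omega):
    """One sweep through (6): returns the kth trial solution from the (k-1)th.

    Elements with j < i are already updated (kth values); elements with
    j > i still hold the (k-1)th values, as in the Gauss-Seidel ordering.
    """
    n = len(b)
    x = list(x)  # copy so the caller's trial solution is left unchanged
    for i in range(n):
        lower = sum(a[i][j] * x[j] for j in range(i))
        upper = sum(a[i][j] * x[j] for j in range(i + 1, n))
        residual = -lower - upper + b[i] - a[i][i] * x[i]
        x[i] = x[i] + (omega / a[i][i]) * residual
    return x


def solve(a, b, omega=1.0, p=1e-10, max_iter=1000):
    """Iterate until every element changes by less than p between sweeps.

    omega = 1 gives the Gauss-Seidel method. Returns (solution, iterations).
    """
    x = [b[i] / a[i][i] for i in range(len(b))]  # initial trial solution
    for k in range(1, max_iter + 1):
        new_x = sor_sweep(a, b, x, omega)
        if all(abs(u - v) < p for u, v in zip(new_x, x)):
            return new_x, k
        x = new_x
    raise RuntimeError("no convergence within max_iter sweeps")


if __name__ == "__main__":
    # Diagonally dominant example whose exact solution is x = (1, 1, 1).
    a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    solution, iterations = solve(a, b)
    print(solution, iterations)
```

The example matrix is diagonally dominant, which is the convergence condition cited in the text; other values of `omega` in (0, 2) can be passed to `solve` to try over- or underrelaxation.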