
A Survey of Indexing Techniques for Sparse Matrices

UDO W. POOCH AND AL NIEDER

Texas A&M University,* College Station, Texas

* Department of Industrial Engineering.

A sparse matrix is defined to be a matrix containing a high proportion of elements that are zeros. Sparse matrices of large order are of great interest and application in science and industry; for example, electrical networks, structural engineering, power distribution, reactor diffusion, and solutions to differential equations.

While conclusions within this paper are primarily drawn considering orders of greater than 1000, much is applicable to sparse matrices of smaller orders in the hundreds.

Because of the increasing use of large order sparse matrices and the tendency to attempt to solve larger order problems, great attention must be focused on core storage and execution time. Every effort should be made to optimize both computer memory allocation and execution times, as these are the limiting factors that most often dictate the practicality of solving a given problem.

Indexing algorithms are the subject of this paper, as they are generally recognized as the most important factor in fast and efficient processing of large order sparse matrices.

Indexing schemes of main interest are the bit map, address map, row-column, and threaded list schemes. Major variations of the indexing techniques mentioned above are noted, as well as the particular indexing scheme inherent in diagonal or band matrices.

The concluding section of the paper compares the types of methods, discusses their suitability for different types of processing, and makes suggestions concerning the adaptability and flexibility of the major existing methods of indexing algorithms for application to user problems.

Key Words and Phrases: Matrix, sparse matrix, matrix manipulation, indexing.

CR Categories: 5.14, 5.19

CONTENTS

I. Introduction
II. Bit Map Scheme
III. Address Map Scheme
IV. Row-Column Scheme
V. Threaded List Scheme
VI. Diagonal or Band Indexing Scheme
VII. Conclusion
Appendix A
    Algorithm 1. Bit Map Scheme
    Algorithm 2. Address Map Scheme
    Algorithm 3. Address Map Scheme
Bibliography

Copyright © 1973, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

I. INTRODUCTION

Computations involving sparse matrices have been of widespread use since the 1950s, becoming increasingly popular with the advent of faster cycle times and larger computer memories. One cycle time is the time required for the central processing unit to send and to receive a data signal from main memory. Systems applications for sparse matrices include electrical networks and power distribution, structural engineering, reactor diffusion, and solutions to differential equations.

A sparse matrix is a matrix having few nonzero elements. Matrix density is defined as the number of nonzero elements of the matrix divided by the total number of elements in the full matrix. Most available references utilizing sparse matrices for calculations [1-8] consider matrices of order 50 or more [9, 10], with densities ranging from 15% to 25% and decreasing steadily as the order increases. This paper will accept these boundary conditions as a strict definition of a sparse matrix. Brayton, Gustavson, and Willoughby [8] say that a typical large (implied to be in the hundreds) order sparse matrix has 2 to 10 nonzero entries per row. Hays [5] says that an average of 20 nonzero elements per row is not an unreasonably small number in quite large (implied to be around 100 and greater) order. Livesley [1] indicates that an average of 3 or 4 elements per row in a large (implied to be around 1000) order structural problem is a good estimate.



If the order I of the matrix is reasonably small, i.e., about order 50 or less, it would make little difference if the full matrix were kept in core. However, if the sparse matrix is of larger order than about 50, it becomes efficient in terms of execution time and core allocation to store only the nonzero entries of the matrix.

The efficiency of retaining only the nonzero elements becomes obvious in the example of a 500 × 500 matrix with 10% density. With one word of storage allocated for each element, the matrix requires 250,000 words, which is very often more than is physically available. Storing only the nonzero elements requires 25,000 words. If the full matrix were multiplied by a similar full matrix, a minimum of 500 × 500 × 500 = 125 × 10^6 arithmetic operations are required, compared to a minimum of (500 × 10%)^3 = 125 × 10^3 arithmetic operations when only the nonzero elements are retained. If both 500 × 500 matrices were to be retained in core as full matrices, core allocation and execution time would be prohibitive on many computers, and the problem would be abandoned as infeasible for computer solution.

By storing the nonzero elements in some reasonable manner, and using logical operations to decide when arithmetic operations are necessary, Brayton, et al. [8] relate that both the storage requirements and the required amount of arithmetic can often, in practice, be decreased by a factor of I over the full matrix.

Sparse matrices are classified generally by the arrangement of the nonzero elements. When the matrix is in random form, nonzero elements appear in no specific pattern. A matrix is said to be a band matrix, or in band form, if its elements a(i,j) = 0 for |i - j| > m (where m is a small integer, and usually m << I) and where the nonzero elements form a band along the main diagonal. The band width is the number of nonzero elements that appear in one row of a band matrix (i.e., 2m + 1). A block-diagonal form occurs when submatrices of nonzero elements appear along the matrix diagonal. In block form, the matrix has submatrices of nonzero elements that occur in no specific pattern throughout the full matrix. The block dimension is the order of a submatrix in a block or block-diagonal matrix.
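The band-form definition above is easy to state in code. The following C fragment is a minimal sketch (not one of the paper's appendix algorithms) that tests whether a position (i, j) can lie inside a band of half-width m and gives the corresponding band width 2m + 1; the function names are illustrative only.

#include <stdlib.h>

/* Band-form test: element a(i,j) may be nonzero only when |i - j| <= m.  */
/* i and j are 1-based row and column numbers; m is the half-bandwidth.   */
int in_band(int i, int j, int m)
{
    return abs(i - j) <= m;
}

/* Number of elements per row of a full band, as defined in the text.     */
int band_width(int m)
{
    return 2 * m + 1;
}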

In electrical network and power distribution problems, the matrix is generally in random, band, or block-diagonal form, with the elements representing circuit voltages, currents, impedances, power sources, or users [9, 10]; in structural engineering applications, the sparse matrix is generally of band or block form, with the band width or block dimension representing the number of joints per floor [3, 11]; in reactor diffusion problems and differential equations, the band form of matrix is most common, with the band width being the number of points used in a point-difference formula [12-14].

This paper, while not concerned with the actual mathematical manipulations of sparse matrices, is primarily concerned with the indexing algorithms employed in such calculations. If the sparse matrix is stored in a haphazard manner, elements can only be retrieved by a search of all the data, which takes much time. If the sparse matrix is stored in some very convenient form, execution time will be much less. Conservation of execution time is of major importance in selecting an indexing algorithm.

Another major consideration in selecting a particular indexing method is the amount of fast core the method requires in addition to that used for the storage of the nonzero data elements. For most applications, a small difference in core allocation between two methods is not a critical factor. In this case, the critical consideration is the execution time difference between the two methods. Since execution times vary greatly with the methods of indexing, an exact comparison of execution times must reflect the type of mathematical manipulation that is to be performed on the sparse matrix.

One last major aspect of indexing algorithm selection concerns the adaptability and flexibility of programming the selected scheme. This depends in great part on the type of machine, business or scientific; machine configuration; operating system capabilities; number of bits per word; access times for peripheral devices; average instruction times; availability of the required instructions; the maximum row or column size to be used; the expected matrix density; and the availability and size of buffers.

As with most applications, the use of a high-level programming language may provide relative ease of implementation for a selected indexing scheme, but such use is frequently accompanied by penalties in execution time and storage requirements. However, on the positive side, use of high-level languages may well result in a minimum of elapsed time for problem solution with a given programming staff, as well as overall minimum cost, considering both personnel and computer usage. Problems involving large order sparse matrices focus their attention on core storage utilization and execution time minimization, and therefore all but eliminate the employment of high-level languages for indexing schemes.

In subsequent sections of this paper, current indexing schemes will be examined in an attempt to isolate a "fast" indexing algorithm, with "fast" being defined as producing an optimization of execution time and core storage for sparse matrices of large order. Particular advantages and disadvantages of each major type of indexing discussed will be brought to the attention of the reader. Parts II through VI discuss aspects of particular indexing schemes, while Part VII compares the requirements and advantages of the various schemes. Part VII, in conclusion, also makes recommendations concerning the adaptability and flexibility of the major existing indexing algorithms for application to user problems.

The authors have attempted, as much as possible, to make their discussions machine independent. However, the authors made use of an IBM System 360/65 Model I in their research, and certain basic aspects of this machine, such as the 32-bit word, are alluded to in the succeeding pages. The interested reader should have little difficulty in adapting the concepts presented to machines of differing architecture.


II. BIT MAP SCHEME

In a bit map scheme, a Boolean form of the matrix M is the basic indexing reference. Whenever a nonzero entry occurs in the sparse matrix, a 1 bit is placed in the bit map, with null entries remaining as zeros in the bit map. The position of each successive nonzero entry is found by counting over to the next 1 bit in the map.

More rapid access to any element of a row is achieved by providing an additional row index vector, where each element of that vector is the address of the first nonzero element of each row [16]. An additional column index vector may also be applied for more rapid column access, but this will also necessitate storing each nonzero entry twice. It should be noted, however, that any machine based on word, rather than bit, addressing techniques will give much slower access in one dimension of the matrix than in the other.

As an example, the following matrix M, its associated bit map BM, and the reduced Z-vector are given.

    M = | 0  3  0  0 |        BM = | 0 1 0 0 |
        | 2  0  5  0 |             | 1 0 1 0 |
        | 4  0  0  7 |             | 1 0 0 1 |
        | 0  1  0  8 |             | 0 1 0 1 |

    Z = [3, 2, 5, 4, 7, 1, 8]

Figure 1 demonstrates a sample bit map supplemented with the row index vector V; the Z elements are the nonzero elements of the matrix.

FIG. 1. Sample bit map: the row index vector V, where each element gives the first nonzero Z-vector element of the corresponding bit map row.

The bit map in Figure 1 is a matrix conception of the bit map. To conserve core, instead of using one word for each row of the bit map, all four rows (16 bits) are compacted into one word as shown in Figure 2, with byte (8 bit) boundaries marked.

    | 0100 1010 | 1001 0101 | ........ |
       byte 1      byte 2      byte 3

FIG. 2. Bit map of Figure 1 in core.

From Figure 2, it is simple to see that the bit map, being the Boolean form of the matrix, will, in fast core, require at least W = I·J/B words, where I and J are the dimensions of the matrix and B is the number of bits per word; W is rounded up to the nearest integer. The bit map uses at minimum

E(Bit Map) = (100/B) %

of the storage requirements of the full matrix for indexing. The additional row index vector adds W = I·A/B more words, where A is the number of bits required for an address. Supplemented with the row index vector,

E(Bit Map + Row Index) = (100/B)(1 + A/J) %

of the full matr ix is required for the indexing. Now, if the sparse matrix has less than

65,536 nonzero elements, then A can be 16 bits in excess 32,768 notation. In a 32-bit- word machine for example, 16 bits may be conveniently accessed if the instruction set has a complement of half-word instructions. Attent ion should be given to the number of bits required for an address to range through the max imum core size. If this number of bits is not conveniently manipulated, it will be necessary to use more than the minimum amount of core to gain an execution advan- tage. Execution times for full word instruc- tions are often less than execution times for half-word instructions. Therefore, when choosing a convenient number of bits for A, the number of bits used for an address, it is impor tant to realize the tradeoff between core conservation and access time.

Using B = 32 bits (word length) and A = 16 bits (half-word length), for a 500 × 500 matrix the bit map and row index vector require 8313 words, or 3.325% of the 250,000 words for the full matrix; if the matrix is only 5% dense, another 12,500 words are required for the nonzero elements; the total is 20,813 words, or 8.325% of the full matrix.


In order to reference the M(i,j) element, it is necessary to physically count across to the jth element in the ith "row" of the bit map. The correct bit will lie in word number S = ((i - 1)·J + j + (B - 1))/B of the bit map. To isolate the required bit, it will be necessary to either shift the word the necessary number of bits or mask all the other bits by a logical operation. If a shift is used, then repeated shifts perform a row operation when the bit map is stored by rows. Algorithm 1 (see Appendix) isolates the correct beginning word of a row in the bit map; a segment of the code shifts through one entire row, in preparation for a mathematical manipulation of the row.
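The word and bit arithmetic just described can be sketched in C. The sketch below is not Algorithm 1 from the Appendix; it assumes the bit map is packed row by row into 32-bit words (most significant bit first), that the nonzero values are stored in row-major order in a Z vector, and, for brevity, it uses the GCC population-count builtin to count the 1 bits that precede the referenced position.

#include <stdint.h>

/* Fetch M(i,j) (1-based indices) from a row-major bit map packed into    */
/* 32-bit words, with the nonzero elements stored row-major in z[].       */
/* Bit 0 of a word is taken as its leftmost (most significant) bit.       */
double bitmap_fetch(const uint32_t *map, const double *z,
                    long i, long j, long J)
{
    long pos  = (i - 1) * J + (j - 1);       /* 0-based bit position      */
    long word = pos / 32;
    int  bit  = (int)(pos % 32);

    if (!((map[word] >> (31 - bit)) & 1u))
        return 0.0;                          /* null entry                */

    long count = 0;                          /* 1 bits preceding pos      */
    for (long w = 0; w < word; w++)
        count += __builtin_popcount(map[w]);
    if (bit > 0)
        count += __builtin_popcount(map[word] >> (32 - bit));

    return z[count];
}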

Algorithm 1, with slight alteration, will accommodate matrices up to order 100,000. The restriction occurs in statement 06, where the multiplication must not result in loss of significant bits due to exceeding word size. In practice, the algorithm is limited either by the index vector being half-words, as indexing is provided for only 65,536 nonzero elements; or by 4095 rows or columns, the maximum number used in the indexing in statement 02.

When the bit map is stored by rows, as in the algorithm above, then to perform a column operation it is necessary to count to the correct j bit for all I rows. This means executing virtually the entire algorithm I times. If more than a few column operations are to be performed, then execution time will become an important factor. The execution time is dependent on the density of the sparse matrix, the order of the sparse matrix, and the number of column operations to be performed. The time factor is exemplified by the following:

EXAMPLE 1: A 500 × 500 matrix exists, and it is necessary to perform 10 column operations when the matrix is 5% dense. The average column execution time will be that of the 250th column. Assuming the entire algorithm is executed for each row, the execution time will be approximately:

500 rows × 10 column operations × [(time to locate beginning of each row)
    + .05 density × 500/2 columns × (time to process 1 bits)
    + (1 - .05 density) × 500/2 columns × (time to process 0 bits)
    + 500/2 columns × (time to locate bit in bit map)
    + 500/2 words × (time to locate word in bit map)]

which is about 10 seconds on the IBM 360/65, with additional microseconds incorporated for the mathematical operation not listed in the coding. Had the same procedure been carried out on the transpose of the bit map, that is, with the bit map column-oriented instead of row-oriented, then the execution time would have been cut by a factor of about 500, a considerable time savings. Not taken into consideration is any further computer processing, such as updating an index register after each 4095 characters or bytes, if necessary.

If the bit map of the sparse matrix can be transposed and the data rearranged in less time than the difference between the column and row execution times, then the transpose operation will conserve execution time. In the above example, the difference between column and row execution times is about 9.7 seconds.

For certain types of operations the bit map is ideal. Being in Boolean form, which means elements are either 1 or 0, true or false, or plus or minus, the bit map is the most compact form for logical operations, such as AND, OR, or EXCLUSIVE OR. Thus, if matrices MA and MB exist, and it is necessary to determine which elements are nonzero in both matrices, it is necessary only to AND each word of bit map MA with the corresponding word of bit map MB. If the result is zero, both are not present; if the result is nonzero, the indicated elements appear in both matrices. An EXCLUSIVE OR determines which elements are present in either, but not both, of the matrices; an OR determines which elements appear in either or both of the matrices. Logical operations performed on the bit map require about 1/32 of the execution time for the same logical operation on the full scale matrix, because the bit map on a 32 bit-word machine condenses 32 pieces of data into 1 word. Additionally, and often most importantly, the bit map conserves core storage.

To determine how many elements will be present in the sum of two rows, and their order, an OR is performed on the two rows of the bit map. Using similar techniques, the feasibility of rearranging the matrix in a form more convenient for the user, such as diagonal form, where nonzero elements appear all along the diagonal, is determined. Kettler and Well [15] discuss some of the aspects of such a rearrangement algorithm.
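The word-at-a-time logical operations described above are straightforward to express; the C sketch below (illustrative names, packed 32-bit words as before) forms the AND, OR, and EXCLUSIVE OR of two bit maps of equal size, so that, for example, the OR result gives the nonzero structure of the sum of the two matrices.

#include <stdint.h>

/* Combine two packed bit maps word by word; nwords is the number of      */
/* 32-bit words in each map.  and_out marks positions nonzero in both     */
/* matrices, or_out positions nonzero in either (the structure of the     */
/* matrix sum), and xor_out positions nonzero in exactly one matrix.      */
void bitmap_combine(const uint32_t *ma, const uint32_t *mb, long nwords,
                    uint32_t *and_out, uint32_t *or_out, uint32_t *xor_out)
{
    for (long w = 0; w < nwords; w++) {
        and_out[w] = ma[w] & mb[w];
        or_out[w]  = ma[w] | mb[w];
        xor_out[w] = ma[w] ^ mb[w];
    }
}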

Many references are found to endorse or suggest the use of a bit map scheme for sparse matrices [7, 15-20], but it is particularly difficult to ascertain the exact algorithms utilized, as most authors do not include these in their papers.

While a bit map scheme appears convenient and fast, it is restricted by the amount of fast core available for the bit map. In the case where the sparse matrix is less dense than the percentage of the full matrix that the bit map scheme occupies, core storage will be conserved by switching to an alternative method of indexing.

Givens [21] has suggested that the bit map scheme would be more attractive to users if some special instructions were designed and implemented, to further decrease execution times. One such instruction Givens references is CLEAR TO ZERO, which would clear a large block of core, e.g., the bit map, from a first to a last address. Another instruction would be LOAD NEXT NONZERO, which would fetch the address of the next nonzero entry of the bit map, given the previous nonzero element, thereby eliminating the necessity of counting through all the zero bits.

These special instructions would be implemented as microprogrammed subroutines [21]. To define a microprogram, it is necessary to understand that the execution of each assembly language instruction involves a specific sequence of transfers of information from one register in the processor to another; some of the transfers take place directly, and some through an adder or other logical circuit. Each of these steps defines a microinstruction, and the complete set of steps necessary to execute the assembly language instruction constitutes a microprogram [22].

III. ADDRESS MAP SCHEME

The address map is similar in form to the bit map, the main difference being that the address map stores an address or address displacement for each matrix element. If the matrix element is zero, then a zero address is stored. The bit map, by contrast, requires only one bit for each matrix element.

Since an address or address displacement requires more than one bit for each matrix element, the address map scheme will require N times more core storage than the bit map scheme, where N is the number of bits used for an address or address displacement. If address displacements instead of full-length addresses are used, then the address map must be augmented by a row index vector, as with the bit map.

Assuming there are less than 256 nonzero entries per row, for example, an address displacement would require only 8 bits (a common character size). If a particular computer allows character operations that are faster than the access time to an individual bit map entry, the improved column access time of the address map can warrant the increased core expenditure. On a system with 6-bit characters, up to 64 nonzero row entries can be accommodated.

The overall percentage storage requirement of the full matrix required for the address map with the row index vector will be

E(Address Map) = (100/B)(C + A/J) %

where B is the number of bits per word; C is the number of bits used for an address displacement; A is the number of bits used for an element of the row index vector; and J is the number of columns of the matrix. Using C = 8 bits, A = 32 bits, B = 32 bits, and J = 1000 columns, the address map and row index vector require 25.1% of the full matrix, that is, 251,000 words compared to 1 million for the full matrix. In addition, if the matrix is 5% dense, an additional 50,000 words are required for the storage of the nonzero elements.

In order to isolate the M(i,j) element, it is necessary to access character (byte) number (i - 1)·J + j of the address map. In terms of words, this is word S = {C[(i - 1)·J + (j - 1)] + B}/B, where i and j are respectively the row and column of interest. If that character (byte) is zero, it is a null entry; otherwise, the content of the character (byte) is added to the row index element to give the address of the nonzero element.
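A C sketch of the address map reference just described, under the assumptions of the example above: one 8-bit displacement per matrix element stored row-major, and a row index vector whose i-th entry is a base position into the data array. It is an illustration only, not the paper's Algorithm 2, and the names are hypothetical.

#include <stdint.h>

/* Fetch M(i,j) (1-based) through an address map of 8-bit displacements.   */
/* amap[(i-1)*J + (j-1)] is 0 for a null entry; otherwise it is added to   */
/* row_base[i-1], which here indexes the position just before the first    */
/* stored element of row i, so that displacements begin at 1.              */
double addrmap_fetch(const uint8_t *amap, const long *row_base,
                     const double *z, long i, long j, long J)
{
    uint8_t disp = amap[(i - 1) * J + (j - 1)];
    if (disp == 0)
        return 0.0;                          /* null entry                 */
    return z[row_base[i - 1] + disp];
}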

The address map scheme is subject to many of the same limitations as the bit map scheme, and requires a larger amount of core storage for indexing. A sample coding, Algorithm 2, which has the same characteristics as the example used in the bit map method (Algorithm 1), illustrates that fewer arithmetic operations than the bit map method are required when the computer is equipped with character addressing capabilities. If the computer used does not allow convenient arithmetic manipulation of individual characters, then the coding enclosed in brackets in Algorithm 2 must be added to overcome this difficulty. The bracketed coding requires much of the algorithm time, so if a computer has built-in arithmetic character manipulation, then the algorithm becomes increasingly faster.

With an example similar to Example 1, we find that the execution time, with the bracketed coding included, is drastically different from the bit map time. This is primarily because of the easy access to any character. To access by column instead of by row, only the first row location of the correct column need be found. To find the correct location of the character in row 2, it is sufficient to add just the column dimension. This process is continued until the end of the matrix is encountered.

For a column manipulation, then, we easily obtain Algorithm 3, similar to Algorithm 2.

EXAMPLE 2. As in Example 1, a 500 × 500 matrix exists with 5% density, and it is necessary to perform 10 column operations. It is therefore necessary to execute Algorithm 3 ten times, so the execution time will be approximately

10 column operations × [(initialization time to locate beginning of each row)
    + 500 rows × (time to locate bit in bit map)
    + (1 - .05 density) × (time to process 0 bits)
    + .05 density × (time to process 1 bits)]

which is about 30 msec on the IBM 360/65, and has incorporated 2 additional μsec that were included for the mathematical operation not listed in the coding. As with Algorithm 1, the limitations are due to the use of half-words for the index vector, and to the use of an index register. Note that there is a considerable time savings, but at the expense of computer memory. Again, not taken into consideration is any further computer processing, other than the above coding, such as updating index registers, which may be necessary and require more time.

Unlike the bit map scheme, where the entire row of the bit map up to the desired element must be scanned for nonzero entries before data manipulation can occur, the address map method requires only a reference to the desired element. Because the storage location of a data element is found independently of all except the desired address displacement, the address map method blends well with the concept of parallel processing. Parallel processing involves the simultaneous execution of a sequence of operations by independent central processing units. Thus, using the address map method, 4 separate central processing units could simultaneously execute the required arithmetic on 4 different elements of the matrix; at best, using the bit map method, different steps in the execution of 1 matrix element would be shared by the 4 central processing units. Employing the address map method, the processing units could work independently, except for the final results; while the bit map method would require transfers of information from one processing unit to the other processing units to execute the shared steps, which introduces an additional time lag.

While no references have been found to explicitly endorse or suggest this method, and comparatively large core requirements exist, the address map scheme may prove useful with some future computer that features both very fast core of a few million characters and a multitude of parallel processing units. Hoffman and McCormick [22] state that at present the value of parallel processing on a large scale is debatable as far as manipulating sparse matrices is concerned, as there are virtually no available computers with more than just a few parallel central processing units, and the field is quite unexplored.

IV. ROW-COLUMN SCHEME

Row-column indexing schemes refer to methods relying on paired vectors of some type; generally one vector contains the nonzero elements, which are most often ordered by rows or columns, and the other vector maintains the indexing information. Row-column indexing schemes are sometimes referred to as block index, row, or column packing schemes, depending on the author's description of how the indexing algorithm works [7, 15, 17, 20, 23-24].

In the simplest, but not the most core- and time-efficient, form, each nonzero element of the matrix has a corresponding index word that contains a specified number of bits for the row designation and another specified number of bits for the column designation (Figure 4).

    M = | 0  2  0  0 |
        | 6  0  4  0 |
        | 3  0  0  0 |
        | 7  9  0  5 |

FIG. 3. Sample matrix.

            Row   Column
    V(1)     1      2        Z(1) = 2
    V(2)     2      1        Z(2) = 6
    V(3)     2      3        Z(3) = 4
    V(4)     3      1        Z(4) = 3
    V(5)     4      1        Z(5) = 7
    V(6)     4      2        Z(6) = 9
    V(7)     4      4        Z(7) = 5

FIG. 4. Indexing with row and column designators.

If computations are to be performed in a row manner, it is highly practical and efficient to order the nonzero entries first by rows and then by columns. Ordering the entries by rows makes it unnecessary to maintain the row index for every nonzero element; only the row need be identified for the first nonzero element of each row, as it is known that all the following entries up to the next row indicator belong to the same row. In order to create the row marker, a check bit, such as a minus sign bit, can be set in the first column index word of each row (Figure 5), or, as is usually done, an additional and separate row index vector can be created (Figure 6). The row index element generally contains the address or index number of the first column index for the row. The same system may be applied to ordering the entries by columns if column operations are to be performed.

            Column index
            (sign bit as row indicator)
    V(1)       -2            Z(1) = 2
    V(2)       -1            Z(2) = 6
    V(3)       +3            Z(3) = 4
    V(4)       -1            Z(4) = 3
    V(5)       -1            Z(5) = 7
    V(6)       +2            Z(6) = 9
    V(7)       +4            Z(7) = 5

FIG. 5. Indexing with row indicator and column designation.

    First column index         Column index
    for each row (halfword)    (halfword)
    VR(1) = 1                  V(1) = 2        Z(1) = 2
    VR(2) = 2                  V(2) = 1        Z(2) = 6
    VR(3) = 4                  V(3) = 3        Z(3) = 4
    VR(4) = 5                  V(4) = 1        Z(4) = 3
                               V(5) = 1        Z(5) = 7
                               V(6) = 2        Z(6) = 9
                               V(7) = 4        Z(7) = 5

FIG. 6. Indexing with row vector and column index vector.

Figures 3 through 6 depict sample vectors for the row-column schemes described above. The index vectors are V and VR; the nonzero entries are contained in vector Z. The data matrix used in Figures 4 through 6 is displayed in Figure 3. The nonzero entries of the data matrix are stored by rows, in order of increasing column number. All index vectors are full words unless otherwise noted.

From the above figures it is evident that there exists a wide possibility of variation in the row-column scheme of indexing. Further variations and adaptations can occur as a result of optimizing for peculiar computer characteristics, or as a result of making calculations on special forms of sparse matrices, such as block matrices.

However, caution is advised, for such optimizations may result in a useless program whenever system changes occur; they should therefore be used only when they yield critical economies in the calculations.

In the instance of computer peculiarities, Smith [17] states that a particular type of second-generation IBM computer did not utilize, in extended-precision floating-point calculations, the bits of the second word that were normally used as the exponent bits in single-precision floating-point calculations. A sparse matrix row-column indexing algorithm was developed that employed these otherwise wasted 8 to 9 bits as the row or column indices, and could accommodate matrices up to order 255 and 511, respectively.

For the case of a special sparse matrix, the row-column indexing scheme for a block-diagonal matrix could become a blocked indexing scheme. The blocked indexing scheme would be identical to the row-column method, except that the large sparse matrix is partitioned into several smaller submatrices (blocks). Then each submatrix is identified with a separate row-column scheme of some sort.

A blocked indexing scheme may also be used to refer to combining several column indices into one block (word). For example, one 64-bit word would contain 4 column indices, each index of 16 bits. When a row operation is performed, then, 4 nonzero elements can be readied for processing at the expense of a loading time for only one block [17].

It should be noted that for many computers and algorithms more time is required to load a referenced word for arithmetic processing than is required to perform the necessary arithmetic to isolate the required bits of the referenced word. Likewise, more time is required to load extended-precision words than ordinary words. Also, since most computers are geared to utilize arithmetic data primarily by words, more time is required to load a half-word for arithmetic processing than is required to load a full word.

Another major variation, known as delta or displacement indexing, is also popular, and is somewhat similar to the address map form of indexing. In one particular example of a delta indexing scheme, one 64-bit extended-precision word contains one 16-bit index and six 8-bit displacements to the index. Therefore, the column indices of 7 elements can be referred to by loading and processing one extended-precision word, which can result in both a considerable time and core savings. For a delta of 8 bits, it is possible for 2 nonzero entries of the same row to be a maximum of 255 columns apart. If elements can appear farther apart than 255 columns, then a greater number of bits must be allocated for each delta or the method must be abandoned. To determine the column number of the first element paired with the 64-bit index word, the first 16 bits of the index word are used. To determine the column number of any subsequent element paired with the 64-bit index word, the appropriate delta and the sum of the deltas in between are added to the first 16 bits.

Smith [17] also states that delta indexing is more efficient for large order (implied order about 250) sparse matrices than a blocked index form. Figures 7 and 8 depict the blocked and delta indexed words mentioned above, and are equivalent.

EXAMPLE 3. From Figure 7, column index 3 = 1078. From Figure 8, column index 3 = 1027 + 20 + 31 = 1078.


    |   1027   |   1047   |   1078   |   1095   |
      Column     Column     Column     Column
      index 1    index 2    index 3    index 4
                (16 bits each index)

FIG. 7. Blocked index word.

    |   1027   |  20  |  31  |  17  |  ..  |  ..  |  ..  |
      Column     delta  delta  delta  delta  delta  delta
      index 1
     (16 bits)          (8 bits each delta)

FIG. 8. Delta index word.
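The delta word of Figure 8 can be unpacked with a few shifts. The C sketch below assumes the 16-bit column index occupies the high-order bits of a 64-bit word, followed by the six 8-bit deltas; the layout and names are assumptions for illustration, not the exact format used by Smith [17].

#include <stdint.h>

/* Recover the k-th column index (k = 0..6) from a delta-indexed word:     */
/* bits 63..48 hold column index 1; each following 8-bit field holds the   */
/* delta from the previous column index.                                   */
unsigned delta_column(uint64_t word, int k)
{
    unsigned col = (unsigned)(word >> 48);            /* column index 1    */
    for (int d = 0; d < k; d++)
        col += (unsigned)((word >> (40 - 8 * d)) & 0xFFu);
    return col;
}

/* With column index 1 = 1027 and deltas 20, 31, 17, ...,                  */
/* delta_column(word, 2) returns 1027 + 20 + 31 = 1078, as in Example 3.   */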

For the row-column indexing method, using a column index for each nonzero entry and a row index vector, there is a required minimum for indexing of W = I/B (J·T·D + V) words, where I is the number of rows; J is the number of columns; T is the number of bits used for a column index element; D is the density of the matrix; V is the number of bits used for a row index element; and B is the number of bits per word. In reality, however, for matrices up to order 65,535 (in excess 32,768 notation), half-words may be most conveniently and efficiently used for all the row and column indices. Half-word indices are used to increase core savings at a generally tolerable increase in execution time; few if any matrices of order 30,000 or greater have been of notable use. Using half-word indices, then, the above-mentioned indexing scheme requires a minimum core storage of

E(Row-Column) = (100/2J + D/2) %

of the full matrix for indexing. To access an M(i,j) element, it is necessary to refer to the ith row index, which points to the first nonzero element of the ith row. The column indices between the ith and (i + 1)st row indices are searched for j. If the column indices searched do not contain j, the M(i,j) element is zero; otherwise the data element paired with the j column index is fetched and processed.
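The access just described can be sketched in C for the Figure 6 layout: a row index vector giving the starting position of each row's entries, a half-word column index per nonzero element, and the data in Z. The 0-based positions and the names are illustrative assumptions.

#include <stdint.h>

/* Fetch M(i,j) (1-based) from a row-column scheme: row_start[i-1] is the  */
/* 0-based position of row i's first entry in col[] and z[], and           */
/* row_start[I] holds the total number of stored elements.                 */
double rowcol_fetch(const long *row_start, const uint16_t *col,
                    const double *z, long i, long j)
{
    for (long k = row_start[i - 1]; k < row_start[i]; k++)
        if (col[k] == (uint16_t)j)
            return z[k];         /* column index found: element is stored  */
    return 0.0;                  /* column index absent: element is zero   */
}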

For row operations, as long as the matrix remains ordered, execution time is very fast. For more than a few column operations, however, on a matrix of order greater than about 200, it is almost always more convenient and efficient to transpose the entire matrix and reorder all the data elements before performing the desired arithmetic. Again, the same situation exists as with the bit map; if the data and indexing scheme can be transposed in less time than the difference between the column and row execution times, then the transpose operation will conserve execution time.

Unlike the bit map and address map schemes, which have constant core requirements for indexing, the row-column method has a core requirement for indexing directly proportional to the matrix density. Since each nonzero element has a paired column index, only the number of elements in the row index vector is constant. For example, adding two 50 × 50 sparse matrices, MA and MB, does not in general produce the result that the total number of resulting nonzero elements is the sum of the nonzero elements of each matrix before the matrix addition: if MA has 250 data elements and MB has 450, the sum of matrices MA and MB will not, in most cases, have 700 elements. In the sum of matrices MA and MB, the only surety is that there will still be 50 row index elements. A variable amount of core for indexing creates core allocation difficulties that may not be readily acceptable to the user.

In comparison to the bit map method, the row-column indexing method is noted for its fast execution time, when data elements are properly ordered, and its ease of programming, even for matrices of very large order (in the thousands). A wide variety of references endorse (or imply an endorsement of) a row-column technique for indexing [15, 17, 25-30], or a block-diagonal method [31-34], especially for particular applications, as noted in the Introduction, or for special matrices, such as symmetric matrices.


C o m p u t i n g Su rveys , Vol 5, N o 2, J u n e 1973

Page 11: A survey of indexing techniques for sparse matrices

Indexing Techniques for Sparse Matrices • 119

It should be noted that a symmetric matrix decreases by almost 50% the core requirements of the row-column technique, both for the data elements and for the indexing elements.

Two of the more general sets of algorithms encountered for processing random, and some special, sparse matrices and employing the row-column indexing technique are MATLAN [29], an IBM product, and Algorithm 408 [30], a more recent private effort. As these algorithms are readily available and are of general interest, a particular coding example is not given for the row-column indexing technique. Both these algorithms were intended for use on sparse matrices of order less than about 32,700, and are more efficient for orders less than (about) 1,000.

MATLAN is a programming system, operating under the control of Operating System/360, and has a very wide applicability. MATLAN includes many supplementary features, such as different versions for an all-core problem and for a segmented problem, three overlay structures for core storage, and options on precision. A segmented problem exists when portions of the problem under consideration are stored in core and on tapes or disks; an all-core problem exists when the storage requirement is such that the entire problem is stored in fast memory. Because of the variable precision option and the all-core or segmented feature, it is difficult to assess execution times. Array dimensions are limited to 32,756, which indicates half-words are used for indexing purposes.

Algorithm 408 uses a variation of the indexing algorithm depicted in Figure 6. Instead of having the row index vector contain the address or index number of the first column index for the row, the row index vector contains the number of stored elements in the row. In addition, the row index vector is appended to the column index vector by using the same array name, M. While the scope of Algorithm 408 is not as broad as MATLAN's, Algorithm 408 has the distinct advantage of being readily alterable: a section of the reference is devoted to possible alterations, such as combining three or more indices to a word of the M array.
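The difference between the two row-vector conventions is small in code. The C sketch below is an illustration (not the published Algorithm 408): it converts a row vector of per-row element counts, as Algorithm 408 keeps, into the vector of starting positions used by the Figure 6 layout and by the access sketch shown earlier.

/* Convert per-row element counts (counts[0..I-1]) into starting           */
/* positions (row_start[0..I]), so that row i occupies positions           */
/* row_start[i-1] .. row_start[i]-1 of the column index and data vectors.  */
void counts_to_starts(const long *counts, long *row_start, long I)
{
    row_start[0] = 0;
    for (long i = 0; i < I; i++)
        row_start[i + 1] = row_start[i] + counts[i];
}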

Because of the great variation in coding, at present it is not considered economically worthwhile to compare actual core storage and execution times to determine which of the many different existing algorithms employing the row-column method is the most efficient or optimal.

A good basis for examining some of the row-column indexing scheme characteristics rests on using half-word indices, with a row index vector, for calculations. At worst, the method (as typified by Algorithm 408) will utilize less core than the full matrix up to a density of slightly over 66%. Conservation of core allocation and execution time increases as the density decreases.

It has been noted that the bit map method employs approximately 4% of the full matrix for indexing. Therefore, it can easily be seen that when the matrix density falls below about 4%, the row-column method will conserve more core than the bit map scheme. In addition, the advantage of the faster indexing into the data by the row-column method in this case almost excludes the use of the bit map, except for special cases, such as a Boolean problem.

V. THREADED LIST SCHEME

A threaded, or linked, list scheme contains one element of an array in core for each nonzero element of the sparse matrix. Each array element in a linked list method has at least three components: one component contains the row and column indices; another contains the matrix element (data); and the third contains the address of, or a pointer to, the next array element.
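The three components of a link element map naturally onto a structure; the C sketch below uses assumed field widths (half-word row and column indices, a full-word pointer, a one-word data element) and is not the exact layout of any of the cited programs.

#include <stdint.h>

/* One element of the threaded (linked) list: row and column indices,      */
/* the matrix element itself, and the address of the next element.         */
struct link_elem {
    uint16_t          row;      /* half-word row index                     */
    uint16_t          col;      /* half-word column index                  */
    double            value;    /* the matrix element (data)               */
    struct link_elem *next;     /* address of the next element, or NULL    */
};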

If the third component of an array element were not present, the linked list scheme would have, at an absolute minimum, the same core requirement for indexing as the row-column method. The third component adds W = I·J·A·D/B more words for indexing, which gives a minimum total of W = I/B (J(T + A)D + V) words for indexing a threaded list scheme, where I is the number of rows; J is the number of columns; D is the density of the matrix; T is the number of bits used for a column index; V is the number of bits used for a row index; A is the number of bits required for an address to range through the entire amount of core used to contain the complete threaded list; and B is the number of bits per word. For any practical application, however, both the row and column indices must be retained, which gives an overall minimum core allocation for indexing of W = I·J·D(T + V + A)/B words.

As in the previously discussed methods of indexing, half-words (16 bits) are used in practice for both the row and column indices, which gives capabilities of a matrix of order 65,535 (in excess 32,768 notation). In addition, because of the great difficulty and great time involved in manipulating addresses of less than full word size (refer to the Bit Map Scheme), full words (32 bits) are conveniently used for addresses. These considerations now require, for the overall minimum core storage for indexing, W = 2·I·J·D words. As a percentage (E) of the full matrix, this is

E(Linked List) = 2·D %

necessary for indexing. In order to reference an M(i,j) element, the entire threaded list must be searched if the nonzero elements are stored in a random manner. Elements can be stored (except for updates) and accessed more efficiently by rows and columns, which can reduce access time to particular elements or rows of elements. Elements need not be stored contiguously for reasonably efficient processing.

In one particular application of a threaded list scheme, data elements were initially stored by rows and columns, and a table of pointers was kept. Each pointer addressed the beginning element of a group of 8 elements. Any particular item, or row of items, could be found by a binary search on the list of pointers. Example 4 typifies the search for a particular matrix element in this application of linked list indexing.

EXAMPLE 4. Matrix elements are stored by rows and columns. The element to be found is in the middle row of the matrix, so the pointer in the middle of the pointer list is selected. The contents of the pointer word addresses an element of the linked list. The element is then examined, to compare its row and column components with the required row and column numbers. Three separate cases can now occur:

(1) If the row and column numbers match, the correct element has been found.

(2) The rest of the elements in the group of 8 are searched, and if the row matches, but not the column, it is known that the correct group can probably be found by a search on the next few pointers about the pointer last used. If the pointer indexed an element whose column number was greater than required, then the next lower pointer is used.

(3) The rest of the elements in the group of 8 are searched, and if the row doesn't match, then a binary search on the pointers is continued. In a binary search, if the pointer indexed an element whose row number was greater than required, the next pointer to be selected is the one halfway between the last pointer (the upper bound pointer in this case) and the lower bound pointer (the first pointer in this case).

When the procedure is iterated, as in (2) above, and the appropriate groups are searched but the correct row and column cannot be found, then it is known that the required matrix element is the null element.
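A simplified C sketch of the lookup of Example 4, assuming the link elements are ordered by row and then column and using the link element structure sketched above. It keeps only the binary search for the pointer group whose leading row is nearest the target and a forward scan along the chain; the case (2) refinement of probing the neighboring pointers is omitted, and the group size of 8 is an assumption.

#include <stddef.h>
#include <stdint.h>

struct link_elem {               /* as sketched earlier in this section    */
    uint16_t          row, col;
    double            value;
    struct link_elem *next;
};

/* ptrs[g] addresses the first element of pointer group g (8 elements per  */
/* group); elements are ordered by row, then column.  Returns M(i,j), or   */
/* 0.0 if the required matrix element is the null element.                 */
double list_fetch(struct link_elem **ptrs, long ngroups, long i, long j)
{
    if (ngroups == 0)
        return 0.0;

    long lo = 0, hi = ngroups - 1, g = 0;
    while (lo <= hi) {                  /* binary search on the pointer     */
        long mid = (lo + hi) / 2;       /* table for the last group whose   */
        if (ptrs[mid]->row <= i) {      /* first element is at or before    */
            g = mid;                    /* row i                            */
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }

    for (struct link_elem *e = ptrs[g]; e != NULL; e = e->next) {
        if (e->row > i || (e->row == i && e->col > j))
            break;                      /* passed where (i,j) would appear  */
        if (e->row == i && e->col == j)
            return e->value;
    }
    return 0.0;
}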

It should be noted that unless the data elements are in reasonable order, the binary search on the pointers is almost useless. The particular value of a linked list is that there is no longer the requirement that data elements be stored contiguously: updates, insertions, and deletions of matrix elements are performed by altering the address component of the appropriate linked elements. However, a linked list expansion or contraction results in some pointer groups having a greater number of link elements, and some other pointer groups having fewer link elements. The alterable number of link elements in each pointer group necessitates a periodic updating of the pointer table. A pointer table update is vital to the efficiency of the binary search, and may require a great amount of execution time. The amount of execution time required for a pointer table update depends directly on the number of link elements to be grouped, as each link element must be inspected in order to find each successive link element. For peak efficiency of the binary search, every group should have the same number of linked list elements.

Using the additional pointer table to combat the otherwise slow execution time of the linked list scheme, one pointer exists for each 8 nonzero matrix elements. Employing a full word for each pointer, which is an address, we now have a minimum indexing core requirement of W = 2.125·I·J·D words, for

E(Linked List) = 2.125·D %

of the full matrix. This is a much greater core requirement than the row-column methods of the previous section require for any matrix of order greater than three.

Figure 9 depicts a few elements of a linked list, and the correlation between elements. A pointer table is not included.

FIG. 9. Linked list elements: each element holds the address of the next element, the data element, and its row and column indices; the address components chain the elements together.

Not previously mentioned is the practical necessity of maintaining a table of available addresses, so that core allocation remains conservative during the insertion and deletion of matrix elements. When matrix elements are deleted, the address of the deleted link element must be appended to the table of available addresses. Not only must the table be maintained in fast core, but the threaded list scheme additionally requires a buffer area to be used for the inserted and/or deleted link elements. If such a buffer area is not used or kept, then core will not be conserved and the prime advantage of the threaded list will have been discarded.
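The table of available addresses behaves like a simple free list. A minimal C sketch (hypothetical names and a fixed capacity chosen only for illustration) keeps the addresses of deleted link elements and hands them back for insertions:

#include <stddef.h>

struct link_elem;                           /* as sketched earlier          */

#define AVAIL_MAX 1024                      /* capacity of the address table */

static struct link_elem *avail[AVAIL_MAX];  /* table of available addresses  */
static long navail = 0;

/* When a matrix element is deleted, the address of its link element is    */
/* appended to the table; when an element is inserted, an address is taken */
/* back from the table (NULL meaning the table is empty and fresh core,    */
/* i.e., the buffer area, must be used instead).                            */
void release_elem(struct link_elem *e)
{
    if (navail < AVAIL_MAX)
        avail[navail++] = e;
}

struct link_elem *acquire_elem(void)
{
    return (navail > 0) ? avail[--navail] : NULL;
}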

Few references endorse, or suggest endorsement of, the linked list scheme as a practical method for indexing sparse matrices [15, 34-37]. Only a few sources [15, 38-40] found in the literature survey actually utilized the threaded list scheme; while the actual algorithms were seldom described in great detail, the scheme basically followed the design of Example 4.

Overall, the threaded list technique of indexing into sparse matrices requires a significant amount of execution time for processing indices, in addition to the core requirements of a buffer and two separate tables. Inherent in the method, then, are considerable execution times for processing and considerable core expenditure, in comparison with the bit map and row-column schemes for identical matrices. Offsetting these disadvantages, however, the linked list scheme has the distinct advantage of not requiring a significant amount of execution time to update the linked list by insertion or deletion of single matrix elements or series of matrix elements. All other previously discussed indexing techniques require a shifting of data when an update is performed, which will take a great amount of execution time when numerous matrix elements have to be shifted to make the appropriate word available for the update. The linked list scheme is slow for random processing of matrix elements; however, in many applications items are accessed sequentially by row or column. In these applications, proper chains of pointers speed up processing greatly. As with previous methods, a definite symmetry of the sparse matrix reduces proportionately the core requirements for indexing.


VI. DIAGONAL OR BAND INDEXING SCHEME

Band and diagonal matrices are special types of matrices that occur frequently in electrical engineering, structural engineering, nuclear engineering and physics, solutions to differential equations, and a host of other fields, as mentioned in the Introduction. Band and diagonal matrices, while of frequent occurrence, should not be mistaken as a general case of sparse matrices.

When band or diagonal matrices occur, a special effort on the part of the user should be made to adapt his processing and/or indexing algorithms to the case at hand. This adaptation should be made because of the inherent simplicity of processing, manipulating, and solving band matrices, and also because of the opportunity to minimize core allocation and execution time.

In most cases, band or diagonal matrices are processed either wholly by rows or columns, and little or no processing of single elements occurs. For a band matrix, a common manipulation involves decreasing the band width. In such a manipulation, it is normal procedure for one entire row (column) to operate on the row (column) immediately above or below it (or to either side). With such a simple processing sequence, it is evident that only a few rows (columns) need be maintained in fast core for immediate use.

If data transmission rates are comparable to the rate with which rows (columns) are manipulated, then rows (columns) not in immediate use can be stored on slower access devices, such as tapes or disks. Storing data on tapes or disks frees the more expensive fast core. In most machine configurations there is a much larger amount of memory available in the slower devices. When slow devices can be used efficiently for processing band matrices, the capability of manipulating large order sparse matrices is limited by the maximum allowable execution time and the desired accuracy limits of the results, and not by the order of the matrix involved.

To further conserve execution time, but at the expense of fast memory, the entire band matrix can be stored in fast core. Preserving the entire matrix in fast core eliminates the transmission times between fast core and auxiliary devices, as well as the time required to restore elements in fast core, which is done prior to data manipulation and processing. Another prime advantage directly involved with data transmission is the use of overlapping channels in burst or select mode. However, when the matrix is fully maintained in fast core, channels will then be available to other users on multi-user computers.

FIG. 10. Band matrix: a 9 × 9 tridiagonal matrix whose diagonal elements are all -199, with subdiagonal elements 99, 98.5, 98, 97.5, 97, 96.5, 96, 95.5 and superdiagonal elements 100.5, 101, 101.5, 102, 102.5, 103, 103.5, 104; all other elements are zero.

If the band matrix has full bands, that is, no row has any zero elements within the band, then the total number of elements to be stored is the band width multiplied by the number of rows in the matrix. Figure 10 depicts a band matrix with full bands (a band width of 3 here):

EXAMPLE 5. Figure 10 is the resulting 9 x 9 matrix obtained by using a central difference approximation (3 points) to solve the boundary-value differential equation 2 + 3t^2 = y + y' + y'' using 10 intervals between the points y(t = 0) = 0 and y(t = 1) = 1.
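
To make the construction of such a band system concrete, the following sketch (an editorial illustration in C, not taken from the original text) assembles the three nonzero diagonals produced by a 3-point central difference approximation of y'' + y' + y = 2 + 3t^2 with step h = 0.1. The exact entries of Figure 10 depend on the details of the discretization used, so the coefficients below are illustrative only.

    #include <stdio.h>

    #define N 9            /* interior points: 10 intervals on [0, 1] */

    int main(void)
    {
        double h = 0.1;
        double sub[N], diag[N], sup[N], rhs[N];   /* three diagonals + right-hand side */

        for (int i = 0; i < N; i++) {
            double t = (i + 1) * h;               /* interior grid point t_i */
            /* y'' ~ (y[i-1] - 2 y[i] + y[i+1]) / h^2,  y' ~ (y[i+1] - y[i-1]) / (2h) */
            sub[i]  = 1.0/(h*h) - 1.0/(2.0*h);    /* coefficient of y[i-1] */
            diag[i] = -2.0/(h*h) + 1.0;           /* coefficient of y[i]   */
            sup[i]  = 1.0/(h*h) + 1.0/(2.0*h);    /* coefficient of y[i+1] */
            rhs[i]  = 2.0 + 3.0*t*t;
        }
        /* boundary values move to the right-hand side of the first and last rows */
        rhs[0]   -= sub[0] * 0.0;                 /* y(0) = 0 */
        rhs[N-1] -= sup[N-1] * 1.0;               /* y(1) = 1 */

        for (int i = 0; i < N; i++)
            printf("row %d: %8.1f %8.1f %8.1f | %8.3f\n",
                   i + 1, sub[i], diag[i], sup[i], rhs[i]);
        return 0;
    }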

A 5-point interpolation would yield a band width of 5; 50 intervals would result in a 49 x 49 matrix. Note that the augment column, a constant associated with each row of the matrix, is not considered here as an integral part of the sparse matrix. Accuracy of results depends on the number of intervals, the number of points in the interpolation formula, and computer roundoff.

In one particular application of processing a band matrix by rows (columns), it is convenient and efficient to store elements in full vectors, one vector for each super- or sub-diagonal of the band matrix. Since the diagonal has the greatest number of elements, the vector for the diagonal will be the largest vector. To avoid double indexing, which takes greater execution time, an additional table of addresses is created. Each element of the address table contains the address of the first element of the respective vector. The indexing scheme in the algorithm used to arithmetically manipulate the band matrix is then altered to suit the storage scheme.
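
A minimal sketch of this diagonal-by-diagonal storage scheme follows (the names and layout are illustrative assumptions, not from the original text): each nonzero diagonal is kept as one contiguous vector, and a small address table points at the first element of each vector, so an element a(i, j) is reached with a single index.

    #include <stdio.h>

    #define N     9     /* order of the band matrix           */
    #define HALFB 1     /* super-/sub-diagonals on each side  */

    /* one vector per diagonal, packed end to end; for simplicity every
       vector is allocated full length N                                */
    static double store[(2*HALFB + 1) * N];
    static double *addr[2*HALFB + 1];     /* address table: start of each diagonal */

    static void init(void)
    {
        for (int d = 0; d <= 2*HALFB; d++)
            addr[d] = &store[d * N];      /* diagonal index d = j - i + HALFB */
    }

    /* single-index access: returns 0.0 outside the band */
    static double get(int i, int j)
    {
        int d = j - i + HALFB;
        if (d < 0 || d > 2*HALFB) return 0.0;
        return addr[d][i];
    }

    static void set(int i, int j, double v)
    {
        int d = j - i + HALFB;
        if (d >= 0 && d <= 2*HALFB) addr[d][i] = v;
    }

    int main(void)
    {
        init();
        for (int i = 0; i < N; i++) set(i, i, -199.0);   /* fill the main diagonal */
        printf("a(4,4) = %g, a(0,5) = %g\n", get(4, 4), get(0, 5));
        return 0;
    }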

If, for some reason, it is more convenient to store elements in a row or column form, e.g., because of a very difficult or time-consuming arithmetic manipulation, most of the advantage of employing a band scheme is lost, and other methods of indexing should be considered.

Band matrices, as noted above, are unusual from an indexing standpoint because of the very slight core requirements for indexing. For the application described above, only W = I·V/B words are required for indexing, where I is the number of rows, V is the number of bits used for a row index element, and B is the number of bits per word. As a percentage (E) of the full matrix, this indexing requirement is

E_band = 100/J %

where J is the number of columns in the matrix, when full words are used for the table of addresses. If half-words are adequate, the requirement decreases further by one-half.
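
A one-line check of this figure (an editorial derivation using the symbols defined above, not part of the original text):

    E_{\mathrm{band}} \;=\; 100\,\frac{W}{I\,J} \;=\; 100\,\frac{I\,V/B}{I\,J} \;=\; \frac{100}{J}\cdot\frac{V}{B}\ \%

so that full words (V = B) give 100/J %, half-words give 50/J %, and the 9 x 9 example of Figure 10 would need roughly 11% of the full matrix for its index.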

It should be brought to the attention of the user that in the instance where bands do contain zero elements, a decision should be made whether to employ a band scheme, which may not be very efficient in use of core if a large number of null entries exists, or some other particular scheme, such as a block-diagonal scheme, which may not conserve execution time.

Many papers [4, 10, 34, 40-43] are concerned with band matrices, primarily, as said, because of the prevalence of band matrices in many specific fields of interest. Also, many algorithms are readily available for processing band matrices, FORTRAN M [44] being one of the more recent programming packages.

VII. CONCLUSION

In the previous sections four major types of indexing methods were discussed, three of which are in general use: the bit map scheme, the row-column scheme, and the threaded list scheme. Each major type, of course, has many variations (the address map method is not in general use at present, so no variations occur). The important special case of the band matrix is discussed as a separate entity, because it is not a general case of a sparse matrix, even though it has wide application.

As stated in the Introduction, one of the major considerations in selecting a particular indexing method is the amount of fast core the method requires, in addition to the data elements. The indexing in the bit map method requires a fast core allocation of approximately 4% of the full matrix; in the address map method indexing requires about 25% of the full matrix. The row-column and threaded list schemes have no definite core requirements for indexing, and fast memory for indexing is directly proportional to the sparse matrix density. The percentage of the full matrix required for indexing in a row-column scheme is about one times the matrix density, and about twice the density is required for a threaded list scheme.
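
To put these percentages side by side, the following is an editorial illustration with assumed parameters (a 1000 x 1000 matrix of 1% density, 32-bit words, one byte per address-map entry, one index word per nonzero for the row-column scheme, two link words per nonzero for the threaded list); the printed figures roughly reproduce the proportions quoted above.

    #include <stdio.h>

    int main(void)
    {
        double order   = 1000.0;                 /* 1000 x 1000 matrix            */
        double density = 0.01;                   /* 1% nonzero elements           */
        double bits    = 32.0;                   /* assumed bits per word         */

        double full    = order * order;          /* words to hold the full matrix */
        double nonzero = full * density;

        double bit_map     = full / bits;          /* one bit per element           */
        double address_map = full / (bits / 8.0);  /* one 8-bit byte per element    */
        double row_column  = nonzero * 1.0;        /* ~one index word per nonzero   */
        double threaded    = nonzero * 2.0;        /* ~two link words per nonzero   */

        printf("bit map      %5.1f%%\n", 100.0 * bit_map     / full);
        printf("address map  %5.1f%%\n", 100.0 * address_map / full);
        printf("row-column   %5.1f%%\n", 100.0 * row_column  / full);
        printf("threaded     %5.1f%%\n", 100.0 * threaded    / full);
        return 0;
    }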

Previous discussion indicated that an exact comparison of execution times must reflect the type of mathematical manipulation being performed on the sparse matrix. For example, the bit map method is of particular use when the matrix is used to produce an "optimal" ordering, so the matrix inverse will not have a greatly increased density. In contrast, the row-column method is faster than other methods when manipulations involve one row (column) acting on other rows (columns).

The second important aspect of indexing scheme selection is the conservation of execution time. If arithmetic operations are to be performed on the data, primary consideration should first be given to a row-column method; if Boolean arithmetic or


reordering algorithms are to be performed, the bit map scheme should be considered first; and if a great number of data elements are to be reordered, created, or annihilated, a threaded list scheme deserves first consideration.

The bit map scheme has a definite core allocation for indexing, offers a reasonable row access time, is quite fast in execution time when row operations are performed, is core efficient when the matrix density is greater than 4%, and allows very fast manipulation of logical (Boolean) operations. Logical operations can be conveniently used to determine when arithmetic operations are to be executed.
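
For instance (a sketch of the idea, not code from the original text), ANDing the bit-map words of two rows identifies in one machine operation the columns in which both rows have nonzero elements, so the arithmetic loop can skip everything else.

    #include <stdint.h>
    #include <stdio.h>

    #define WORDS_PER_ROW 4                  /* e.g. 128 columns at 32 bits/word */

    /* visit only the columns where rows a and b are both nonzero */
    static void both_nonzero(const uint32_t map_a[], const uint32_t map_b[])
    {
        for (int w = 0; w < WORDS_PER_ROW; w++) {
            uint32_t both = map_a[w] & map_b[w];      /* one AND replaces 32 element tests */
            for (int bit = 0; bit < 32; bit++) {
                if (both & (0x80000000u >> bit))      /* high-order bit = lowest column */
                    printf("column %d: arithmetic needed\n", w * 32 + bit);
            }
        }
    }

    int main(void)
    {
        uint32_t row_a[WORDS_PER_ROW] = { 0x80000001u, 0, 0x00F00000u, 0 };
        uint32_t row_b[WORDS_PER_ROW] = { 0x80000000u, 0, 0x00300000u, 0 };
        both_nonzero(row_a, row_b);
        return 0;
    }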

As to its disadvantages: the bit map scheme has extremely poor column access time when elements are ordered by rows, which in most cases requires transposing the bit map and reordering the data elements; it makes poor use of parallel processing, requires considerable time to reorder data elements, and is not core efficient when the matrix density falls below 4%.

The address map proves advantageous when character addressing is available, makes very efficient use of parallel processors, provides ready access to any element, does not require an extensive amount of execution time (in comparison to the bit map scheme) to reorder data elements, and exhibits a reasonable row and column execution time.

The primary disadvantages of the address map method are: a large fast core requirement for indexing; and the relatively large execution time, in comparison with the threaded list scheme, to reorder matrix elements.

Both bit and address maps require significant execution times to transpose the matrix: the map must be transposed, and all the data elements must be reordered. Execution time to transpose the matrix is directly proportional to the order of the matrix and the matrix density.

Primary advantages of the row-column schemes are: a very fast row access time in comparison with the bit and address maps; a relatively fast column access time in comparison to all other methods; conservation of core with matrices of less than 4% density when compared to the bit map method; an increase in efficiency as the order of the matrix increases, as more complex variations become more efficient; and faster reordering than the bit map or address map methods.

The main disadvantages of the row-column scheme are that column access time and the time required to reorder elements greatly increase as the matrix order and/or matrix density increases.

The threaded list technique is the sole technique that allows a simple and fast-executing method of reordering, adding, or annihilating data elements.

The threaded list scheme exhibits a variety of disadvantages, the primary ones being a large core requirement for indexing in comparison with the row-column method, a slow access time for rows when elements are stored by rows, and an even slower access time for columns compared with the row-column method. The inclusion of orthogonal links, as discussed by Knuth [35], removes some of the column access difficulties, but only at the price of additional storage.

For the special case of band matrices, a scheme similar to the one described in Part VI should be used unless either half or more of the elements within the band width are null, or the nature of the mathematical operations to be performed dictates otherwise (as described in Part VI). If the band matrix scheme cannot be utilized, the user must decide which characteristics of the other types of indexing are considered vital to the solution, and select a method on this basis.

A final major aspect of indexing the user must consider concerns the adaptability and flexibility of programming the selected scheme, which depends upon the factors enumerated in the Introduction. The following suggestions and comments concerning programming flexibility and adaptability are offered.

None of the major types of indexing schemes requires double indexing. Double indexing involves using one register (adder) to index across the row, and another register to index down the column. Double indices have at least three drawbacks: they require


more time than single indices; the computer may have a built-in limit on the number of characters or words that can be indexed by one or both of the registers before a new index (base) register must be designated; and registers are at a premium, because of the extremely fast register-to-register operation time, and should be used for more vital arithmetic. In the last analysis, the increased time involved in double indexing is the critical factor.
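
The difference can be seen in the addressing of an element (an illustrative sketch, assuming row-major storage; not from the original text): with double indexing, two registers hold the row and column separately and must be combined on every access, whereas a single running offset can step through a row with one addition per element.

    #include <stdio.h>

    #define ROWS 4
    #define COLS 5

    int main(void)
    {
        double a[ROWS * COLS];
        for (int k = 0; k < ROWS * COLS; k++) a[k] = k;

        double sum1 = 0.0, sum2 = 0.0;

        /* double indexing: i and j must be combined on every access */
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                sum1 += a[i * COLS + j];

        /* single indexing: one offset advances across the whole row */
        for (int i = 0; i < ROWS; i++) {
            int base = i * COLS;                 /* computed once per row */
            for (int k = base; k < base + COLS; k++)
                sum2 += a[k];
        }

        printf("%g %g\n", sum1, sum2);
        return 0;
    }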

In general, the larger the order of the matrix, the lower the matrix density. Because of this, the row-column method is preferred for matrices with orders of 1000 or more, especially when arithmetic manipulations or operations are to be performed.

As the order of the matrix increases, it becomes more efficient to employ more complex variations of the major types. For instance, the delta indexing scheme (as described in Part VI) conserves a considerable amount of fast core compared with the simpler row-column schemes, without a great increase in execution time, when the order approaches 1000.

If the matrix requires more fast core than is available, the user must decide either to segment the matrix between fast and slow core, or to reduce the complexity of the problem. If the problem can be simplified, or the matrix condensed or partitioned (blocked), then it is not necessary to segment the matrix between fast and slow core. Simplifying the matrix involves the real consideration of whether or not it is economically feasible to reorder rows and/or columns to produce a new matrix that can be more efficiently processed. Many schemes have been developed [7, 16, 18, 27] to attempt such an optimal ordering of matrix elements. Condensing the matrix involves the elimination of data elements that produce insignificant or negligible change in the results. Such condensing can often be done with reasonable competence by somebody skilled in the nature of the problem to be solved. If the matrix is of block-diagonal form, each block can be processed as a separate entity to produce a composite result.

The availability of a virtual memory processor might lead the user to the erroneous conclusion that the benefits of a proper indexing algorithm are negated. This is not so; at some time during the processing of a sparse matrix the matrix must reside in physical memory. It then follows that the fewer the number of pages occupied by the sparse matrix, the fewer the page faults generated, and therefore the less time involved in moving the matrix to and from peripheral paging devices. In other words, the same benefits accruing from indexing in an ordinary processor apply in a virtual memory processor.

When updating of the data files is anticipated, the user should designate buffer storage. When new matrix elements are introduced, they should be stored in the buffer area. When a considerable number of corrections to the data elements exists (about 5%), the matrix is reordered. The threaded list scheme requires no separate buffer area, as a buffer is inherent in the indexing scheme.
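
A minimal sketch of such a buffered update policy follows (the 5% trigger comes from the text; the names, sizes, and data layout are assumptions for illustration): new elements accumulate in a small buffer, and the main row-column structure is reordered only when the buffered corrections reach roughly 5% of the stored elements.

    #include <stdio.h>

    #define NNZ_MAX 10000
    #define BUF_MAX   500                     /* about 5% of NNZ_MAX */

    typedef struct { int row, col; double val; } Entry;

    static Entry matrix[NNZ_MAX];             /* row-column list, kept sorted by (row, col) */
    static int   nnz = 0;
    static Entry buffer[BUF_MAX];             /* unsorted buffer of new elements */
    static int   nbuf = 0;

    static void reorder(void)                 /* merge the buffer into the sorted list (stub) */
    {
        /* a real implementation would merge-sort buffer[] into matrix[] here */
        printf("reordering: merging %d buffered elements into %d stored\n", nbuf, nnz);
        nnz += nbuf;
        nbuf = 0;
    }

    static void insert(int row, int col, double val)
    {
        buffer[nbuf++] = (Entry){ row, col, val };
        if (nbuf >= BUF_MAX || nbuf * 20 >= nnz)   /* buffer full or > ~5% corrections */
            reorder();
    }

    int main(void)
    {
        nnz = 4000;                           /* pretend 4000 elements already stored */
        for (int k = 0; k < 300; k++) insert(k, k + 1, 1.0);
        return 0;
    }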

The segments of coding that contain the actual indexing algorithm should be programmed in a low-level language, such as assembly language, to conserve execution time. High-level languages, such as FORTRAN, utilize a compiler, which may not produce the most efficient coding. For instance, if a division by 32,768 is necessary, the high-level language may simply create a division by 32,768 in assembly language. If the high-level compiler, however, recognized that a division by 32,768 is identical to shifting an accumulator right 15 bits, the assembly language version would be a shift right logical or shift right double logical. The first version would require significantly more execution time than the more efficient assembly language program version. A considerable savings is realized when the computation is performed perhaps as many as several million times in a program.
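
The point can be illustrated in a few lines (an editorial sketch in C rather than assembly; the equivalence holds for non-negative values):

    #include <stdio.h>

    int main(void)
    {
        unsigned int x = 1000000u;

        unsigned int by_division = x / 32768u;    /* what a naive compiler might emit     */
        unsigned int by_shift    = x >> 15;       /* 32,768 = 2^15, so one shift suffices */

        printf("%u %u\n", by_division, by_shift); /* both print 30 */
        return 0;
    }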

The user should avoid placing the indexing algorithm in a subroutine, especially in a high-level language, because of the added linkage time during program execution.

While a "fast" algorithm for indexing into arbitrarily sparse matrices would allow very


efficient core storage allocation and execution times for matrix manipulations, it is also evident that no such single algorithm exists, at least at present. The advent of array processors and pipeline computers may eliminate the desire to handle sparse matrices in any special manner whatsoever. However, it also appears that no matter how large, or how fast and sophisticated, computing machines become, users will continue to strive for core storage conservation and faster execution times. It remains to be seen if sufficiently sophisticated indexing algorithms will be developed to accomplish those goals in array or pipeline machines; or whether such machines will come into

general use and provide an environment conducive to developing sparse matrix indexing schemes.

For the present, the choice of an indexing algorithm depends upon many considerations, with each major type of indexing discussed here having particular advantages and disadvantages. Careful selection of an algorithm can satisfactorily achieve the goals of conservation of core memory and execution time. In addition, whenever there exists some pattern to the nonzero entries, the possibility of reorganizing the calculations as a means to handle some sparse matrices should be carefully considered.


APPENDIX


ALGORITHM 1: BIT MAP SCHEME

Statement                                        Meaning
01          ROW ← i                              i is the row number that will be manipulated
02          RINDEX ← v(i)                        v is the row index vector
03          BITS ← b                             b = number of bits/word
04          ROW ← ROW - 1                        (i - 1)
05          COLS ← J                             J is the number of columns in the matrix
06          ROW ← ROW * COLS                     (i - 1) * J
07          SAVE ← ROW                           Save (i - 1) * J
08          ROW ← ROW + BITS - 1                 ((i - 1) * J) + b - 1
09          ROW ← ROW / BITS                     S = (((i - 1) * J) + b - 1)/b; this word contains the first bit of the required row
10          ROWEND ← COLS                        End-of-row counter (J)
11          START ← ROW                          Starting word of the row
12          ROWEND ← ROWEND AND MASK             Determine correct number of displacement bits; MASK = mask for maximum displacement bits
13          START ← START * 2**SAVE              Shift to eliminate incorrect bits (from the previous row)
14          ROWEND ← ROWEND - SAVE               Correct for the eliminated bits
15          GO TO ROWSCAN                        Branch to the code that scans the row of the bit map for 1 bits
16 COUNT    ROWEND ← BITS                        The following code scans one entire row of the bit map; after the first word of the row is scanned, the bit counter (ROWEND) = b
17          ROW ← ROW + 1                        Increment the bit map word address by one
18          WORD ← bit word from map             Word of the bit map
19 ROWSCAN  WORDBIT ← bit from bit word (WORD)   Pick up the high-order bit from the bit word
20          COLNUM ← COLNUM + 1                  Increment the column number
21          WORDBIT = 1?                         (Branch controls follow.)  Is the bit nonzero?
22          IF YES, GO TO MATH                   Yes: an element exists
23 ENDROW   COLNUM = COLS?                       Is the column counter equal to the row length?
24          IF YES, GO TO END1                   Yes: end of row
25          COLNUM = ROWEND?                     Have we shifted completely through the bit map word?
26          IF YES, GO TO COUNT                  Yes: fetch another word
27          GO TO ROWSCAN                        No: scan the next bit in the word
28 MATH     RINDEX ← RINDEX + 1                  RINDEX = address of the nonzero element; COLNUM = its column number; perform the required operation on the element
29          GO TO ENDROW                         Return
30 END1     STOP                                 End of the operation on the row
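
For readers who prefer a compiled language, the following is a rough rendering of Algorithm 1 in C (an editorial translation, not part of the original text); it scans one row of the bit map, counting columns and advancing the data pointer RINDEX at each 1 bit.

    #include <stdint.h>
    #include <stdio.h>

    #define BITS 32                    /* bits per word (b)     */
    #define J    64                    /* columns in the matrix */

    /* process row i (1-based): map[] is the bit map, packed row after row;
       data[] holds the nonzero elements stored by rows; v[] gives, for each
       row, the index in data[] of its first nonzero element.               */
    static void scan_row(int i, const uint32_t map[], const double data[], const int v[])
    {
        long first_bit = (long)(i - 1) * J;        /* bit offset of the row    */
        long word      = first_bit / BITS;         /* word holding its 1st bit */
        int  bit       = (int)(first_bit % BITS);  /* displacement in the word */
        int  rindex    = v[i - 1];                 /* data index for this row  */

        for (int colnum = 0; colnum < J; colnum++) {
            uint32_t w = map[word];
            if (w & (0x80000000u >> bit)) {        /* high-order bit = column  */
                printf("row %d, col %d: element %g\n", i, colnum + 1, data[rindex]);
                rindex++;                          /* advance to next element  */
            }
            if (++bit == BITS) { bit = 0; word++; }/* fetch next bit map word  */
        }
    }

    int main(void)
    {
        static uint32_t map[(64 * J) / BITS];      /* room for a 64 x 64 map   */
        static double   data[16] = { 5.0, -3.0 };
        static int      v[64]    = { 0 };

        map[2] = 0x80000001u;                      /* row 2: columns 1 and 32  */
        v[1]   = 0;
        scan_row(2, map, data, v);
        return 0;
    }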


ALGORITHM 2: ADDRESS MAP SCHEME

Statement                                        Meaning
01          ROW ← i                              i = row
02          RINDEX ← v(i)                        v = row index vector
03          ROW ← ROW - 1                        (i - 1)
04          COLS ← j                             j = number of columns
05          ROW ← ROW * COLS                     j * (i - 1)
06          ROW ← ROW - 1                        (j * (i - 1)) - 1
07 START    ROW ← ROW + 1                        Increment across the row
08          COLNUM ← COLNUM + 1                  Increment the column number
09          COLNUM > COLS?                       End of row?
10          IF YES, GO TO ENDROW                 Yes: done
11          BYTE ← byte from address map         Pick up the partial word
12          BYTE = 0?                            Is the byte zero?
13          IF YES, GO TO START                  Yes: reenter the scan process
14          CHECK ← 0                            Zero the work area
15          CHECK ← BYTE                         Byte to the work area
16 MATH     CHECK ← CHECK + RINDEX               Points to the nonzero element; the required operations are performed here
17          GO TO START                          Reenter the scan process
18 ENDROW   STOP                                 Finish
19          END
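
Again as an illustration only (an editorial translation with an assumed data layout), Algorithm 2 in C: the address map holds one byte per matrix position, zero for a null element and otherwise a displacement that, added to the row's base index, locates the element in the packed data array.

    #include <stdint.h>
    #include <stdio.h>

    #define COLS 6                              /* j: columns in the matrix */

    /* amap[] holds one byte per matrix position (row-major): 0 for a null
       element, otherwise the 1-based displacement of the element within the
       packed data for its row.  v[] is the row index vector into data[].    */
    static void scan_row(int i, const uint8_t amap[], const double data[], const int v[])
    {
        int rindex = v[i - 1];                          /* base index of row i       */
        int pos    = (i - 1) * COLS;                    /* first byte of row i       */

        for (int colnum = 1; colnum <= COLS; colnum++, pos++) {
            uint8_t byte = amap[pos];                   /* pick up the partial word  */
            if (byte == 0) continue;                    /* null element: keep going  */
            int check = rindex + byte - 1;              /* points to nonzero element */
            printf("row %d, col %d: element %g\n", i, colnum, data[check]);
        }
    }

    int main(void)
    {
        /* 2 x 6 example: row 1 = (7 0 0 9 0 0), row 2 = (0 0 4 0 0 0) */
        static const uint8_t amap[2 * COLS] = { 1, 0, 0, 2, 0, 0,   0, 0, 1, 0, 0, 0 };
        static const double  data[]         = { 7.0, 9.0, 4.0 };
        static const int     v[]            = { 0, 2 };

        scan_row(1, amap, data, v);
        scan_row(2, amap, data, v);
        return 0;
    }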

ALGORITHM 3: ADDRESS MAP SCHEME

Statement                                        Meaning
01          BEGIN ← address of address map       Pointer
02          BEGIN ← BEGIN + J                    J = column number
03          BEGIN ← BEGIN - 1
04          COLS ← j                             j = number of columns
05          ROWS ← i                             i = number of rows
06          BEGIN ← BEGIN - COLS
07 START    BEGIN ← BEGIN + COLS                 Increment the address
08          RINDEX ← v(I)                        Row index vector
09          ROWCTR ← ROWCTR + 1                  Increment the row counter
10          ROWCTR > COLS?                       Passed the end of the matrix?
11          IF YES, GO TO ENDROW                 Yes: passed the end
12          BYTE ← byte from address map         Pick up the partial word
13          BYTE = 0?                            Is the byte zero?
14          IF YES, GO TO START                  Yes: reenter the scan process
15          CHECK ← 0                            Zero the work area
16          CHECK ← BYTE                         Byte to the work area
17 MATH     CHECK ← CHECK + RINDEX               Points to the nonzero element; the required operations are performed here
18          GO TO START                          Reenter the scan process
19 ENDROW   STOP                                 Finish
20          END


I" ROW

I ROW + i I

)--

IR,,DEX + v(i)

D COLS÷ J (

[ Row ~- RO.*CO'S i $

, , . ¢

÷ (ROW + BITS - I ) / B I T S l !

[i.o.~,o: ~oc,~ .~

[START ÷ ROW I • ~ ,

I MASK & SHIFT ROWENDI

RowEND ~ - SAVE I I . . . . ROWEND

FIG A1.

,1,,

_ i_ __~ . . . . . . . ~, ~IT÷bit frombitmap I

¢ NO

NO

oc.o. ; O"E"9 @.o @

Flowchart--algorithm 1 bit map scheme.


FIG. A2. Flowchart for Algorithm 2 (address map scheme).


FIG. A3. Flowchart for Algorithm 3 (address map scheme).


BIBLIOGRAPHY

1. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." RC2332, IBM Watson Research Center (February 1969), 37-46.

2. LARSEN, L. "A modified inversion procedure for product form of the inverse linear programming codes." Comm. ACM 5, 7 (July 1962), 382-383.

3. LIVESLEY, R. "An analysis of large structural systems." Comp. J. 3 (1960), 34-39.

4. McCORMICK, C. W. "Application of partially banded matrix methods to structural analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 155-158.

5. ORCHARD-HAYS, W. Advanced Linear Programming Techniques. McGraw-Hill, New York, 1968, 73-82.

6. TEWARSON, R. "On the product form of inverse of sparse matrices." SIAM Review 8 (1966), 336-342.

7. TEWARSON, R. "Row column permutation of sparse matrices." Comp. J. 10 (1967/68), 300-305.

8. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." (Introduction), RC2332, IBM Watson Research Center (February 1969), 1-3.

9. BASHKOW, T. "Network analysis." Mathematical Methods for Digital Computers, A. Ralston and H. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 280-290.

10. TINNEY, W. F. "Comments on using sparsity techniques for power system problems." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 25-34.

11. PALACOL, E. L. "The finite element method of structural analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 101-105.

12. RALSTON, A. "Numerical integration methods for the solution of ordinary differential equations." Mathematical Methods for Digital Computers, A. Ralston and H. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 95-109.

13. ROMANELLI, M. "Runge-Kutta methods for the solution of ordinary differential equations." Mathematical Methods for Digital Computers, A. Ralston and H. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 110-120.

14. WACHSPRESS, E. "The numerical solution of boundary value problems." Mathematical Methods for Digital Computers, A. Ralston and H. S. Wilf, Eds., Vol. I, John Wiley and Sons, New York, 1967, 121-127.

15. WEIL, R., JR., AND KETTLER, P. "An algorithm to provide structure for decomposition." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 11-24.

16. GUSTAVSON, F., LINIGER, W., AND WILLOUGHBY, R. "Symbolic generation of an optimal Crout algorithm for sparse systems of linear equations." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 1-10.

17. SMITH, D. M. "Data logistics for matrix inversion." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 127-132.

18. SPILLERS, W. R., AND HICKERSON, N. "Optimal elimination for sparse symmetric systems as a graph problem." Quart. Appl. Math. 26 (1968), 425-432.

19. STEWARD, D. V. "On an approach to techniques for the analysis of the structure of large systems of equations." SIAM Rev. 4 (1962), 321-342.

20. TEWARSON, R. P. "The Gaussian elimination and sparse systems." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 35-42.

21. GIVENS, W., McCORMICK, HOFFMAN, et al. "Panel discussion on new and needed work and open questions." (Chairman P. Wolfe), Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 159-180.

22. WILKES, M. V. "The growth of interest in microprogramming: a literature survey." Comput. Surveys 1, 3 (September 1969), 139-145.

23. ORCHARD-HAYS, W. "MP systems technology for large sparse matrices." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 59-64.

24. CHANG, A. "Application of sparse matrix methods in electric power system analysis." Sparse Matrix Proceedings, R. Willoughby, Ed., IBM Watson Research Center, RA11707 (March 1969), 113-122.

25. BRAYTON, R., GUSTAVSON, F., AND WILLOUGHBY, R. "Some results on sparse matrices." IBM Watson Research Center, RC2332 (February 1969), 21-22.

26. CHARTRES, B. A., AND GEUDER, J. C. "Computable error bounds for direct solution of linear equations." J. ACM 14, 1 (Jan. 1967), 63-71.

27. FORSYTHE, G. E. "Crout with pivoting." Comm. ACM 3 (1960), 507-508.

28. JENNINGS, A. "A compact storage scheme for the solution of symmetric linear simultaneous equations." Comput. J. 9 (1966/67), 281-285.

29. System/360 Matrix Language (MATLAN) Application Description, IBM H20-0479; Program Description Manual, IBM H20-0564.

30. McNAMEE, J. M. "Algorithm 408: a sparse matrix package (Part I)." Comm. ACM 14, 4 (April 1971), 265-273.

31. DULMAGE, A. L., AND MENDELSOHN, N. S. "On the inversion of sparse matrices." Math. Comp. 16 (1962), 494-496.

32. MAYOH, B. H. "A graph technique for inverting certain matrices." Math. Comp. 19 (1965), 644-646.

33. ROTH, J. P. "An application of algebraic topology: Kron's method of tearing." Quart. Appl. Math. 17 (1959), 1-24.

34. SWIFT, G. "A comment on matrix inversion by partition." SIAM Rev. 2 (1960), 132-133.


35. KNUTH, D. E. The Art of Computer Programming, Vol. I, Addison-Wesley, Reading, Mass., 1968, 299-304, 554-556.

36. BERZTISS, A. T. Data Structures: Theory and Practice. Academic Press, New York, 1971, 276-279.

37. LARCOMBE, M. "A list processing approach to the solution of large sparse sets of matrix equations and the factorization of the overall matrix." in Large Sparse Sets of Linear Equations, Reid, J. K., Ed., Academic Press, London, 1971.

38. WEIL, R. L., AND KETTLER, P. C. "Rearranging matrices to block-angular form for decomposition (and other) algorithms." Management Science 18, 1 (Sept. 1971), 98-108.

39. GUSTAVSON, F. G. "Some basic techniques for solving sparse systems of linear equations." in Sparse Matrices and Their Applications, Rose, D. J., and Willoughby, R. A., Eds., Plenum Press, New York, 1972, 41-52.

40. FIKE, C. T. PL/I for Scientific Programmers, Prentice-Hall, Englewood Cliffs, N. J., 1970, 108, 180.

41. WILLOUGHBY, R. A. "A survey of sparse matrix technology." IBM Watson Research Center, RC3872 (May 1972).

42. CUTHILL, E. "Several strategies for reducing the band-width of matrices." in Sparse Matrices and Their Applications, Rose, D. J., and Willoughby, R. A., Eds., Plenum Press, New York, 1972, 34-38.

43. TEWARSON, R. P. "Computations with sparse matrices." SIAM Rev. 12, 4 (Oct. 1970), 527-543.

44. PETTY, J. S. "FORTRAN M: programming package for band matrices and vectors." Aerospace Research Labs., Wright-Patterson AFB, Ohio, ARL-69-0064 (April 1969).

45. SPILLERS, W. R. "On Diakoptics: Tearing an arbitrary system." Quart. Appl. Math. 23 (1965), 188-190.

46. IBM System/360 Model 65 Functional Characteristics, IBM A22-6884-3, File No. S360-01.
