
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, VOL. 23, 239-251 (1986)

AN ALGORITHM FOR PROFILE AND WAVEFRONT REDUCTION OF SPARSE MATRICES

S. W. SLOAN

Department of Civil Engineering and Surveying, University of Newcastle, NSW, Australia

SUMMARY

An algorithm for reducing the profile and wavefront of a sparse matrix is described. The scheme is applicable to any sparse matrix which has a symmetric pattern of zeros and may be used to generate efficient labellings for finite element grids. In particular, it is suitable for generating efficient labellings for profile and frontal solution schemes. Empirical evidence, obtained from analysis of the 30 test problems collected by Everstine, suggests that the new algorithm is superior to existing methods for profile and wavefront reduction. It is fast, requires only a small amount of memory, and is simple to program.

INTRODUCTION

Many problems in science and engineering require the solution of a set of sparse matrix equations of the form

$[A]\{x\} = \{b\}$  (1)

where [A] is a known N x N matrix which is sparse and has a symmetric pattern of zeros, {x} is a vector of unknowns of length N, and {b} is a known vector of length N. This type of equation arises, for example, in various finite element computations and is usually solved using some form of Gaussian elimination.

Bandwidth strategies for implementing Gaussian elimination attempt to cluster the non-zero elements of [A] along the diagonal and then exploit the banded nature of the equations. The bandwidth of [A] may be defined as

$B = \max_{1 \le i \le N} \{ b_i \}$  (2)

where $b_i$ is the difference between i + 1 and the column index of the first non-zero entry in row i of [A]. Note that this definition of bandwidth includes the diagonal term. In a traditional banded solution algorithm, all entries inside the envelope of the bandwidth are stored and operated on, even if they are zeros. If N and B are large, the number of arithmetic operations involved in a single elimination step for a bandwidth scheme is O(NB²). Since the size of B is related to the order in which the equations are eliminated (i.e. the order in which they are labelled), it is desirable to label the equations so as to procure a small bandwidth and thus avoid wasted operations on zero entries. In finite element analysis, where the order of elimination is dictated by the labelling of the nodes, a small bandwidth is procured by minimizing the maximum positive difference between the node numbers which define each element. For large systems of equations, or problems with a complicated matrix structure, it is impractical to generate efficient labellings by hand and automatic procedures are necessary. Several schemes for automatically reducing bandwidth have been devised, the most successful of which are those of Cuthill and McKee [1], Collins [2] and Gibbs et al. [3]
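To make the definitions concrete, the following is a minimal sketch (in Python with NumPy; the paper's own implementation is FORTRAN 77 and is not reproduced here) of how the row quantities $b_i$ and the bandwidth B of equation (2) might be computed. It assumes a dense 0/1 pattern with a non-zero diagonal; the matrix shown is purely illustrative.

```python
import numpy as np

def bandwidth(A):
    """Row bandwidths b_i and bandwidth B = max(b_i), per eq. (2).

    b_i = (i + 1) - (column index of first non-zero in row i), using
    1-based indices, so the diagonal term is included (b_i >= 1).
    """
    N = A.shape[0]
    b = np.empty(N, dtype=int)
    for i in range(N):
        first = np.flatnonzero(A[i, : i + 1])[0]  # first non-zero in row i
        b[i] = i - first + 1                      # 0-based form of (i+1) - j
    return b, b.max()

# A small symmetric 0/1 pattern, chosen only for illustration.
A = np.array([[1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1]])
b, B = bandwidth(A)
print(b, B)  # [1 2 2 2 2] 2
```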

0029-5981/86/030239-13$01.30 © 1986 by John Wiley & Sons, Ltd.

Received 30 October 1984. Revised 16 April 1985.


Although bandwidth solution schemes are simple to implement, they suffer from the disadvantage that their economy is governed solely by the bandwidth. These schemes may be inefficient for sparse matrices which contain a significant number of zeros inside the bandwidth envelope. One alternative to the bandwidth strategy involves discarding all leading zeros in each row and column and storing only the profile of the matrix. Various means of implementing profile schemes have been described, including those discussed by Jennings [4] and Taylor [5], and they are often used for solving the sparse matrix equations that arise in finite element computations. In order to optimize the performance of the profile method, it is necessary to label the equations so that few zeros follow the leading zero in each row and column. If we consider only the upper triangle of [A], this is equivalent to minimizing the sum of the column heights. For a matrix which has a symmetric pattern of zeros, the sum of the column heights is known as the profile and is defined by

$P = \sum_{i=1}^{N} b_i$  (3)

The number of arithmetic operations involved in a single elimination step for a profile scheme is roughly proportional to the sum of the squares of the column heights. A number of algorithms have been designed specifically to reduce the profile of a sparse matrix, including those of King [6] and Gibbs [7]. The reverse Cuthill-McKee method, proposed by George [8], reverses the labelling produced by the Cuthill-McKee scheme in order to reduce the profile. Recently, Lewis [9] has described an efficient implementation of the Gibbs algorithm which performs well on the 30 test matrices collected by Everstine [10]. The scheme published by Gibbs [7] appears to be one of the most effective means available for profile reduction since it is fast, reliable and often yields profiles which are less than those produced by other algorithms.

Another technique which is often used for solving sparse matrix equations is the frontal solution method (see, for example, Irons [11]). This procedure has found wide application in the field of finite element analysis, but is generally applicable to any system of sparse matrix equations. In the frontal algorithm the overall [A] matrix is never assembled explicitly. Instead, the assembly and elimination phases are interleaved, with each equation being eliminated as soon as it is fully assembled. It is possible to combine the assembly and elimination steps in this manner because of the nature of the Gaussian elimination algorithm. In order to optimize the performance of the frontal method, the equations should be labelled so as to minimize the number of equations that are active during each stage of the elimination process. With reference to equation (1), equation (or column) j of row i is said to be active if j ≥ i and there is a non-zero entry in column j with a row index, k, such that k ≤ i. Letting $f_i$ denote the number of equations that are active during the elimination of the variable $x_i$, the maximum wavefront of [A] is given by

$F = \max_{1 \le i \le N} \{ f_i \}$  (4)

Since it is assumed that [A] has a symmetric pattern of zeros, it follows that

$\sum_{i=1}^{N} f_i = \sum_{i=1}^{N} b_i = P$  (5)

Following Everstine [10], the average and root-mean-square wavefronts may be defined, respectively, as

$\bar{F} = \frac{1}{N} \sum_{i=1}^{N} f_i = \frac{P}{N}$  (6)

$F_{\mathrm{rms}} = \left( \frac{1}{N} \sum_{i=1}^{N} f_i^2 \right)^{1/2}$  (7)
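Equations (3)-(7) can be evaluated directly from the pattern of [A]. The sketch below is a hypothetical Python illustration (not taken from the paper) that computes P, F, $\bar{F}$ and $F_{\mathrm{rms}}$ for a small tridiagonal pattern; note that the computed $f_i$ satisfy $\sum f_i = P$, as equation (5) requires.

```python
import numpy as np

def profile_and_wavefronts(A):
    """Profile (eq. 3) and wavefront statistics (eqs. 4-7) of a
    symmetric 0/1 pattern A with non-zero diagonal."""
    N = A.shape[0]
    # b_i: row bandwidths including the diagonal term, as in eq. (2).
    b = np.array([i - np.flatnonzero(A[i, : i + 1])[0] + 1 for i in range(N)])
    P = b.sum()                                      # eq. (3)
    # f_i: number of active columns at step i, i.e. columns j >= i
    # holding a non-zero entry in some row k <= i.
    f = np.array([sum(A[: i + 1, j].any() for j in range(i, N))
                  for i in range(N)])
    F = f.max()                                      # eq. (4); sum(f) == P (eq. 5)
    F_bar = f.mean()                                 # eq. (6): equals P / N
    F_rms = np.sqrt((f.astype(float) ** 2).mean())   # eq. (7)
    return P, F, F_bar, F_rms

# Tridiagonal 5 x 5 pattern, purely for illustration.
A = np.eye(5, dtype=int) + np.eye(5, k=1, dtype=int) + np.eye(5, k=-1, dtype=int)
print(profile_and_wavefronts(A))  # (9, 2, 1.8, 1.84...)
```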


Figure 1. Stiffness matrix for mesh of one-dimensional bar elements (heavy lines denote active columns; the figure also tabulates $b_i$, $f_i$, $b_i^2$ and $f_i^2$ for each row, with totals)

The number of arithmetic operations in a single elimination step for a frontal algorithm, assuming that N and F are large, is O(NF²). For the type of frontal solver most commonly used in finite element computations it is the order of the elements, not the order of the nodes, that is important for an efficient solution. This is because the equations are assembled on an element-by-element basis, even though they are still eliminated node by node. Since it is the connectivity of the nodes that determines when an equation can be eliminated, an efficient element labelling may be achieved by first ordering the nodes to minimize $F_{\mathrm{rms}}$. The elements are then labelled so that the elimination sequence implied by the new nodal numbering is followed as closely as possible. In practice, this may be achieved by processing the elements in ascending sequence of their lowest numbered nodes. Various algorithms for reducing the maximum wavefront of a sparse matrix, with particular emphasis on finite element applications, have been proposed, including the methods described by Levy [12], Sloan and Randolph [13], Akin and Pardue [14], Razzaque [15] and Pina [16]. In addition, all of the profile reduction algorithms discussed previously may be used.

To illustrate the previous definitions of bandwidth, profile and wavefront, Figure 1 shows a grid of one-dimensional finite elements and the corresponding structural stiffness matrix. In this diagram, the heavy lines denote active columns. Using equations (2)-(7), we see that B = 4, P = 12, F = 3, $\bar{F} = 2.4$ and $F_{\mathrm{rms}} = 2.53$.

In the following sections, an algorithm for reducing the profile and wavefront of a sparse matrix is described. The procedure is applicable to any sparse matrix with a symmetric pattern of zeros and may be used to generate efficient labellings for finite element grids. In particular, it may be used to provide efficient nodal numberings for profile solution schemes, as well as efficient element numberings for frontal solution schemes. Application of the algorithm to Everstine's test problems indicates that it is more effective than the reverse Cuthill-McKee [8], Gibbs et al. [3], Lewis [9] and Levy [12] schemes. Detailed timing comparisons indicate that the new algorithm is substantially faster than the Lewis [9] algorithm, and also requires less storage. A major attraction of the proposed scheme is its simplicity.

NOTATION AND DEFINITIONS

As discussed by Cuthill and McKee [1], the derivation of an efficient ordering for a sparse matrix is related to the labelling of an undirected graph. Some elementary concepts from graph theory are useful in the development of heuristic labelling strategies and it is appropriate to state some basic definitions.


A graph G is defined to be a pair (N(G), E(G)), where N(G) is a non-empty finite set of members called nodes, and E(G) is a finite set of unordered pairs of distinct members of N(G), called edges. A graph satisfying the above definition is said to be undirected because E(G) is comprised of unordered pairs. The occurrence of loops (i.e. edges which join nodes to themselves) and multiple edges (i.e. pairs of nodes which are connected by more than one edge) is excluded.

The degree of a node i in G is defined as the number of edges incident to i. Two nodes i and j in G are said to be adjacent if there is an edge joining them.

A path in G is defined by a sequence of edges such that consecutive edges share a common node. Two nodes are said to be connected if there is a path joining them. A graph G is connected if each pair of distinct nodes is connected.

The distance between nodes i and j in G is denoted d(i, j), and is defined as the number of edges on the shortest path connecting them. The diameter of G is defined as the maximum distance between any pair of nodes, i.e.

$D(G) = \max \{ d(i, j) : i, j \in N(G) \}$

Nodes which are at opposite ends of the diameter of G are known as peripheral nodes. Following the notation of Gibbs et al. [3], a pseudo-diameter, δ(G), is defined by any pair of nodes i and j for which d(i, j) is close to D(G). A pseudo-diameter may be slightly less than, or equal to, the true diameter and is found by some approximate algorithm. Nodes which define a pseudo-diameter are known as pseudo-peripheral nodes [3].

An important concept in the development of graph labelling algorithms is the rooted level structure. A rooted level structure is defined as the partitioning of N(G) into levels $l_1(r), l_2(r), \ldots, l_h(r)$ such that:

1. $l_1(r) = \{r\}$, where r is the root node of the level structure.
2. For i > 1, $l_i(r)$ is the set of all nodes, not yet assigned a level, which are adjacent to nodes in $l_{i-1}(r)$.

The level structure rooted at node r may be expressed as the set $L(r) = \{l_1(r), l_2(r), \ldots, l_h(r)\}$, where h is the total number of levels and is known as the depth. The width of level i is defined by $|l_i(r)|$ (i.e. the number of nodes on level i) and the width of the level structure is given as

$w = \max_{1 \le i \le h} \{ |l_i(r)| \}$
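As an illustration only (the paper contains no such code), a rooted level structure can be generated by a breadth-first search. The adjacency used below is a dict-of-sets form of the graph of Figure 3, described in the following paragraph.

```python
def rooted_level_structure(adj, r):
    """Partition the nodes reachable from r into levels l_1(r), l_2(r), ...

    adj maps each node to the set of its neighbours; this is a plain
    breadth-first search that records the level each node falls in.
    """
    levels, seen = [[r]], {r}
    while True:
        nxt = sorted({j for i in levels[-1] for j in adj[i] if j not in seen})
        if not nxt:
            return levels
        seen.update(nxt)
        levels.append(nxt)

# Graph of Figure 3 (six nodes, two four-noded quadrilateral elements).
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4, 5, 6},
       4: {1, 2, 3, 5, 6}, 5: {3, 4, 6}, 6: {3, 4, 5}}
L1 = rooted_level_structure(adj, 1)
print(L1)                           # [[1], [2, 3, 4], [5, 6]]
print(len(L1), max(map(len, L1)))   # depth 3, width 3
```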

To illustrate some of these definitions, consider the grid of two-dimensional finite elements shown in Figure 2. This grid is comprised of four-noded quadrilaterals with two equations per node. For a displacement type of finite element formulation, the graph which corresponds to the global stiffness matrix is shown in Figure 3. (All nodes which share an element are connected by an edge.) With reference to Figure 3, N(G) is the set {1, 2, 3, 4, 5, 6} and E(G) is the set {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}, {3,5}, {3,6}, {4,5}, {4,6}, {5,6}. The distance between nodes one and six is two, and these nodes also define the diameter of the graph. (The diameter is also defined by nodes one and five, nodes two and six, and nodes two and five.) Figure 4 shows the level structure of this graph which is rooted at node one (boxes denote level numbers). More formally, this rooted level structure may be expressed as

$L(1) = \{ l_1(1), l_2(1), l_3(1) \}$

where $l_1(1) = \{1\}$, $l_2(1) = \{2, 3, 4\}$ and $l_3(1) = \{5, 6\}$. The width and depth of this level structure are identical and equal to three.

Figure 2. Grid of four-noded quadrilaterals (elements 1 and 2)

Figure 3. Graph corresponding to grid of four-noded quadrilaterals

Figure 4. A rooted level structure

THE ALGORITHM

Once the graph that corresponds to the sparse matrix has been established, the labelling scheme comprises two distinct steps. Each of these will be discussed in turn.

Selection of pseudo-peripheral nodes

When using the Cuthill-McKee [1] algorithm to reduce the bandwidth of a graph, Gibbs et al. [3] observed that it is often beneficial to begin the labelling process at peripheral or pseudo-peripheral nodes. The level structures associated with these nodes are generally deep and narrow, which enables the Cuthill-McKee scheme to produce a small bandwidth. (It can be shown that the Cuthill-McKee algorithm must produce a bandwidth which is less than or equal to 2w, where w is the width of the level structure rooted at the starting node.) It has been shown by Gibbs [7] and Sloan and Randolph [13] that pseudo-peripheral nodes also make good starting points for profile and wavefront reduction algorithms. Procedures for locating pseudo-peripheral nodes have been given by Gibbs et al. [3], Sloan and Randolph [13] and George and Liu [17].

A method for locating a pair of pseudo-peripheral nodes, which are the endpoints of a pseudo-diameter, is as follows:

1. (First guess for starting node.) Scan all nodes in G and select a node s with the smallest degree.
2. (Generate rooted level structure.) Generate the level structure rooted at node s, i.e. $L(s) = \{l_1(s), l_2(s), \ldots, l_h(s)\}$.
3. (Sort the last level.) Sort the nodes in $l_h(s)$ in ascending sequence of degree. These nodes are at maximum distance from s.
4. (Shrink the last level.) Let m equal $|l_h(s)|$. Shrink the last level by forming a list Q of the first ⌊(m + 2)/2⌋ entries in the sorted list $l_h(s)$, where ⌊x⌋ denotes the largest integer less than or equal to x.
5. (Initialize.) Set $w_{\min} \leftarrow \infty$ and $h_{\max} \leftarrow h$.
6. (Test for termination.) For each node i ∈ Q, in order of ascending degree, generate $L(i) = \{l_1(i), l_2(i), \ldots, l_h(i)\}$. If $h > h_{\max}$ and $w = \max_{1 \le j \le h} \{|l_j(i)|\} < w_{\min}$, set s ← i and go to step 3. Else, if $w < w_{\min}$, set e ← i and $w_{\min} \leftarrow w$.
7. (Exit.) Exit with starting node s and end node e, which define a pseudo-diameter.

The above algorithm is similar to the procedure given by Gibbs et al. [3], but includes two important modifications. The first modification is the introduction of the shrinking strategy in step 4. This step significantly reduces the amount of computation necessary to locate the pseudo-peripheral nodes, but at the same time ensures that their rooted level structures are deep and narrow. It follows naturally from the empirical observation that nodes with high degrees are not often selected as potential starting or end nodes in step 6. Choosing the first ⌊(m + 2)/2⌋ entries in the sorted last level restricts attention to nodes with degrees which are less than or equal to the median value for all nodes in $l_h(s)$. The shrinking strategies discussed by George and Liu [17] proved unsuitable for the present application as they were found to yield nodes with wide rooted level structures.

The second modification occurs in step 6 and incorporates the 'short circuiting' strategy suggested by George and Liu [17]. Inserting the condition that w must be less than $w_{\min}$ permits the assembly of wide level structures to be aborted before completion and often leads to considerable savings (especially for large graphs). The inclusion of the short circuiting strategy, which may occasionally overlook a slightly deeper level structure, does not affect the performance of the labelling algorithm. This is due to the fact that deep level structures are usually narrow. Moreover, it is not particularly important if the computed pseudo-diameter is slightly less than the true diameter, provided that the level structure associated with the end node is narrow.

The above algorithm usually locates the pseudo-peripheral nodes in two or three iterations, and is considered to be efficient. The pseudo-diameter produced is often a true diameter, but there is no guarantee of this.
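A Python sketch of steps 1-7 is given below. It is an illustrative reading of the algorithm, not the author's FORTRAN: the short-circuit test is implemented by aborting the breadth-first search as soon as a level exceeds the current $w_{\min}$, and a restart regenerates L(s) rather than reusing the already-built L(i), which gives the same result but is slightly wasteful.

```python
import math

def level_structure(adj, r, w_cap=math.inf):
    """Breadth-first level structure rooted at r.  If any level would
    exceed w_cap nodes, give up early and return None (short circuit)."""
    levels, seen = [[r]], {r}
    while True:
        nxt = sorted({j for i in levels[-1] for j in adj[i] if j not in seen})
        if not nxt:
            return levels
        if len(nxt) > w_cap:
            return None
        seen.update(nxt)
        levels.append(nxt)

def pseudo_peripheral_pair(adj):
    """Steps 1-7: return (s, e), the endpoints of a pseudo-diameter."""
    deg = {i: len(adj[i]) for i in adj}
    s = min(adj, key=deg.get)                        # step 1
    while True:
        L = level_structure(adj, s)                  # step 2
        h = len(L)
        last = sorted(L[-1], key=deg.get)            # step 3
        Q = last[: (len(last) + 2) // 2]             # step 4: shrink
        w_min, h_max, e = math.inf, h, None          # step 5
        restarted = False
        for i in Q:                                  # step 6
            Li = level_structure(adj, i, w_cap=w_min - 1)
            if Li is None:                           # aborted: too wide
                continue
            w = max(len(level) for level in Li)
            if len(Li) > h_max:                      # deeper (and, having
                s = i                                # survived the cap,
                restarted = True                     # also narrower):
                break                                # restart from node i
            if w < w_min:
                e, w_min = i, w
        if not restarted:
            return s, e                              # step 7

# Graph of Figure 3, as before.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4, 5, 6},
       4: {1, 2, 3, 5, 6}, 5: {3, 4, 6}, 6: {3, 4, 5}}
print(pseudo_peripheral_pair(adj))   # (1, 5): a pseudo-diameter of length 2
```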


Node labelling algorithm

This section describes an algorithm for labelling the nodes of a graph to produce a small profile and wavefront. Two pseudo-peripheral nodes, found by the procedure outlined in the previous section, are required as input. Initially the nodes are labelled arbitrarily with integers ranging from 1 to N, where N is the number of nodes in the graph. The new node labels are generated in a single pass.

In order to describe the algorithm succinctly, it is convenient to state some definitions. Any node which has been assigned a new label is said to have a postactive status. Any node which is adjacent to a postactive node, but does not have a postactive status, is defined to have an active status. Nodes which are adjacent to an active node, but do not have an active or postactive status, are said to be of a preactive status. Nodes which do not have a postactive, active or preactive status are said to be inactive.

The current degree is a quantity which measures the incremental growth in the number of active nodes during the labelling process, and is defined for each node in the graph. The current degree, $n_i$, of node i in G is defined by

$n_i = (m_i - c_i) + k_i$

where $m_i$ is the degree of i, $c_i$ is the number of nodes adjacent to i which have a postactive or active status, and $k_i$ is an integer of value zero or one (zero if node i is active or postactive, one otherwise). Before the labelling algorithm begins, the current degree of each node in G is simply equal to its degree plus one. After the labelling is completed, the current degree of all nodes in G is equal to zero.

Figure 5. Terminology for node-labelling algorithm (a partially labelled graph showing a node of 'inactive' status and sets of nodes with 'preactive', 'active' and 'postactive' status)

Figure 5 illustrates a partially labelled graph and the use of the above terms. The current degree of node x, for example, is two.

To begin the labelling procedure two pseudo-peripheral nodes, which define a pseudo-diameter, are required. These serve as the starting and end nodes for the labelling. The algorithm relabels the starting node as node one and then forms a list of nodes that are eligible to receive the next label. This list is comprised of all active and preactive nodes, and is maintained as a priority queue. The node with the highest priority is labelled next. The priority of each node in the queue is related to its current degree and its distance from the end node. Nodes with low current degrees and large distances from the end node assume the highest priority. Once a node is selected for labelling, it is deleted from the queue and renumbered. The queue of eligible nodes is then updated by using the connectivity information for the graph, and the process is repeated until all the nodes have been assigned new labels.

The algorithm for labelling a graph with N nodes is as follows:

1. (Entry.) Enter with the endpoints of a pseudo-diameter, nodes s and e.
2. (Compute distances.) Generate the level structure rooted at the end node, L(e), and use this to form a vector giving the distance of each node from the end node. Note that if node i is located on level j of L(e), then d(e, i) = j - 1.
3. (Assign initial status and priority.) Assign each node in G an inactive status and an initial priority. For each node i ∈ N(G) set

$P_i \leftarrow (n_{\max} - n_i) W_1 + d(e, i) W_2$

where $P_i$ is the initial priority of i, $W_1$ and $W_2$ are integer weights, $n_i$ is the initial current degree of i, and $n_{\max} = \max_{1 \le i \le N} \{n_i\}$. For convenience $n_{\max}$ may be set equal to N (since the maximum current degree in any graph with N nodes is N, this ensures that the priorities will always be non-negative).
4. (Initialize queue of eligible nodes.) Insert the starting node, s, in the queue of eligible nodes and assign it a preactive status.
5. (Test for termination.) While the queue of eligible nodes is not empty, do steps 6-9.
6. (Select node to be labelled.) Search the queue of eligible nodes and select the node with the highest priority, breaking ties arbitrarily. Let this node be i.
7. (Update priorities and queue.) Delete node i from the queue. If i is not preactive, go to step 8. Else, examine each node j which is adjacent to i and set $P_j \leftarrow P_j + W_1$. If j is inactive, insert j into the queue of eligible nodes and assign it a preactive status.
8. (Label next node.) Label node i with its new number and assign it a postactive status.
9. (Update priorities and queue.) Examine each node j which is adjacent to node i. If j is not preactive, take no action. Else, set $P_j \leftarrow P_j + W_1$, assign j an active status, and examine each node k which is adjacent to j. If node k is active or preactive, set $P_k \leftarrow P_k + W_1$. Else, if k is inactive, set $P_k \leftarrow P_k + W_1$, insert k into the queue of eligible nodes, and assign k a preactive status.
10. (Exit.) Exit with new node labels.

The basic idea behind the algorithm is that, during each stage of the labelling process, nodes with small current degrees and long distances from the end node are labelled first. Selecting nodes with small current degrees causes the current 'front' of active nodes to grow by a minimum amount during each step, while selecting nodes with large distances from the end node attempts to take the global structure of the graph into account. The values assigned to the weights $W_1$ and $W_2$ in step 3 determine the importance of each of these criteria. It will be shown that weightings of $W_1 = 2$ and $W_2 = 1$ give excellent labellings, and these values are used for all results presented in this paper. Note that if we select $W_1 = 1$ and $W_2 = 0$, the above algorithm is similar to one of the procedures proposed by King [6]. The performance of King's scheme, although good for certain types of problems, may be somewhat erratic. As noted by Gibbs et al. [18], this is due to the local nature of the labelling criterion, which ignores the global structure of the graph.
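The following Python sketch is an illustrative transcription of steps 1-10 with the weights $W_1 = 2$ and $W_2 = 1$ recommended above; it assumes a connected graph stored as a dict of neighbour sets and is not the author's FORTRAN 77 implementation. The eligible-node queue is kept as an unordered list, as in the paper's implementation discussed later.

```python
def sloan_ordering(adj, s, e, W1=2, W2=1):
    """Steps 1-10: relabel a connected graph {node: set_of_neighbours}.
    Returns the nodes in their new order (first entry receives label 1)."""
    # Step 2: distance of every node from the end node e, by BFS.
    dist, frontier, d = {e: 0}, [e], 0
    while frontier:
        d += 1
        nxt = {j for i in frontier for j in adj[i] if j not in dist}
        for j in nxt:
            dist[j] = d
        frontier = list(nxt)

    N = len(adj)
    INACTIVE, PREACTIVE, ACTIVE, POSTACTIVE = 0, 1, 2, 3
    status = {i: INACTIVE for i in adj}
    # Step 3: initial current degree is degree + 1; n_max is taken as N.
    P = {i: (N - (len(adj[i]) + 1)) * W1 + dist[i] * W2 for i in adj}
    queue, order = [s], []                       # step 4
    status[s] = PREACTIVE
    while queue:                                 # step 5
        i = max(queue, key=P.get)                # step 6: unordered-list scan
        queue.remove(i)                          # step 7
        if status[i] == PREACTIVE:
            for j in adj[i]:                     # labelling i lowers each
                P[j] += W1                       # neighbour's current degree
                if status[j] == INACTIVE:
                    status[j] = PREACTIVE
                    queue.append(j)
        order.append(i)                          # step 8
        status[i] = POSTACTIVE
        for j in adj[i]:                         # step 9
            if status[j] == PREACTIVE:
                P[j] += W1
                status[j] = ACTIVE
                for k in adj[j]:
                    if status[k] == INACTIVE:
                        P[k] += W1
                        status[k] = PREACTIVE
                        queue.append(k)
                    elif status[k] in (ACTIVE, PREACTIVE):
                        P[k] += W1
    return order                                 # step 10

# Graph of Figure 3 with the pseudo-diameter endpoints found earlier.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4, 5, 6},
       4: {1, 2, 3, 5, 6}, 5: {3, 4, 6}, 6: {3, 4, 5}}
order = sloan_ordering(adj, 1, 5)
new_label = {node: lab for lab, node in enumerate(order, start=1)}
print(order, new_label)
```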

The above scheme may be used to generate efficient node or element labellings for finite element grids. With a profile solution strategy it is the order of the nodes which is important, and the labelling is complete. For a frontal solution algorithm, however, it is still necessary to determine an efficient element labelling. This may be achieved by labelling the elements in ascending sequence of their (new) lowest numbered nodes [13]. Processing the elements in this manner ensures that the equations are eliminated in an order which is similar to that dictated by the new node labels. For grids comprised of a single type of high order element, Sloan and Randolph [13] have noted that it is necessary to consider only the vertex nodes when relabelling the elements for a frontal solution scheme. This follows from the observation that an optimal labelling for a grid of low order elements is the same as an optimal labelling for an equivalent grid of high order elements. Since a grid of high order elements may have a small number of vertex nodes, but a large number of nodes overall, this leads to considerable economies in the relabelling phase.
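A possible transcription of this element-reordering rule is sketched below; the element connectivities and relabelling shown are hypothetical, chosen only to match the two-element grid of Figure 2.

```python
def reorder_elements(elements, new_label):
    """Order elements for a frontal solver: ascending sequence of their
    lowest (new) node numbers.

    elements  : list of node tuples, one per element (vertex nodes
                suffice for grids of a single high-order element type)
    new_label : dict mapping old node number -> new node number
    """
    return sorted(range(len(elements)),
                  key=lambda e: min(new_label[n] for n in elements[e]))

# Two elements as in Figure 2, with a hypothetical node relabelling.
new_label = {1: 6, 2: 5, 3: 4, 4: 3, 5: 2, 6: 1}
print(reorder_elements([(1, 2, 4, 3), (3, 4, 6, 5)], new_label))  # [1, 0]
```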

IMPLEMENTATION

The algorithm described in the previous sections has been implemented in standard FORTRAN 77. The program can deal with connected and disconnected graphs. After the new labels have been generated, the code checks that the corresponding profile is less than the initial profile. If this is not the case, the initial node numbering is retained.

Following George and Liu [17], the graph is stored as an adjacency list which is accessed by a pointer vector. If ADJ and XADJ denote the adjacency list and pointer vector, the nodes adjacent to node I are found in ADJ(J), where J = XADJ(I), XADJ(I) + 1, ..., XADJ(I + 1) - 1. The degree of node I is given by XADJ(I + 1) - XADJ(I). For a graph with N nodes and E edges, this data structure requires 2E + N + 1 words of integer memory (note that each edge is stored twice for ease of access). In addition to this basic storage requirement, the algorithm requires a maximum of ⌊(7N + 3)/2⌋ words of memory to generate the node labels. Thus, for any graph, the total storage requirement is 2E + ⌊(9N + 5)/2⌋ integer words of memory. The Lewis [9] implementation of the Gibbs [7] scheme (which also has an option for bandwidth reduction using the method of Gibbs et al. [3]) has an average requirement of approximately 2E + 7N storage locations, and a worst case requirement of 2E + 9N + 3 storage locations.
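A sketch of this storage scheme in Python (0-based, whereas the paper's FORTRAN is 1-based) may help fix the layout; the graph used is hypothetical.

```python
def build_xadj_adj(n_nodes, edges):
    """Store a graph as an adjacency list ADJ indexed by a pointer vector
    XADJ, with each edge stored twice for ease of access."""
    deg = [0] * n_nodes
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    xadj = [0] * (n_nodes + 1)
    for i in range(n_nodes):
        xadj[i + 1] = xadj[i] + deg[i]       # prefix sums of the degrees
    adj = [0] * (2 * len(edges))
    slot = xadj[:-1].copy()                  # next free slot for each node
    for i, j in edges:
        adj[slot[i]] = j; slot[i] += 1
        adj[slot[j]] = i; slot[j] += 1
    return xadj, adj

# Neighbours of node I are ADJ[XADJ[I] : XADJ[I + 1]] and the degree of
# node I is XADJ[I + 1] - XADJ[I].
xadj, adj = build_xadj_adj(3, [(0, 1), (1, 2)])
print(xadj, adj)              # [0, 1, 3, 4] [1, 0, 2, 1]
print(adj[xadj[1]:xadj[2]])   # neighbours of node 1: [0, 2]
```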

A key feature of the algorithm is the list of active and preactive nodes, which is maintained as a priority queue. In the implementation discussed here, the priority queue is stored as a simple unordered list. Separate lists, of length N, are used to record the status and priority of each node. If the current length of the queue is M, finding the node with the maximum priority requires O(M) operations. Deleting, inserting or changing the priority of a node, however, requires only a constant number of operations. Another convenient method for implementing a priority queue, which is efficient if the queue is large, is to use a binary tree data structure (see, for example, Reference 19). A binary tree structure allows an item to be deleted, inserted, or have its priority changed, with $O(\log_2 M)$ operations, while the searching step is trivial. Both of the above methods for implementing a priority queue were tested using the 30 matrices collected by Everstine [10]. The binary tree implementation was generally less efficient than the unordered list implementation, except for a few isolated cases. It was concluded that, except for very large graphs where the number of active and preactive nodes during the labelling process is of the order of a thousand, the unordered list structure is the best means of implementing the priority queue.
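The two queue organizations might be contrasted as in the sketch below; this is an illustrative comparison (using Python's heapq, a min-heap, with negated priorities), not the paper's implementation. Since heapq cannot re-key an entry in place, a changed priority is handled by pushing a fresh entry and skipping stale ones on pop ('lazy deletion').

```python
import heapq

# Unordered list: O(M) scan to find the maximum, O(1) insert/update.
queue, P = [5, 9, 2], {5: 10, 9: 30, 2: 20}
best = max(queue, key=P.get)
queue.remove(best)                 # node 9 selected

# Binary heap on negated priority: O(log M) insert and select.
heap, current = [], {5: 10, 9: 30, 2: 20}
for n, p in current.items():
    heapq.heappush(heap, (-p, n))
current[5] = 40                    # priority of node 5 increases
heapq.heappush(heap, (-40, 5))     # push a fresh entry rather than re-key
while heap:
    p, n = heapq.heappop(heap)
    if -p == current[n]:           # ignore stale entries
        print("selected", n)       # selected 5
        break
```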

APPLICATIONS

In order to test the performance of various bandwidth, profile and wavefront reduction algorithms, Everstine [10] has assembled a collection of 30 sparse matrices. These matrices arise from a number of different finite element grids, and provide a useful means of assessing the performance of heuristic labelling schemes. A full description of these matrices, together with plots of the corresponding finite element meshes, may be found in Everstine [10].

Table I shows the profiles produced by the new algorithm, together with the profiles produced by the Lewis [9] implementation of the Gibbs [7] algorithm, for Everstine's test problems. The latter procedure was selected as a benchmark for comparison since it is one of the more successful methods for reducing profile and wavefront, and is known to be efficient. Lewis's implementation is written in FORTRAN IV and distributed by the Association for Computing Machinery as Algorithm 582. Inspection of Table I indicates that the new algorithm yields the lowest, or equal lowest, profile for 27 of the 30 test examples. On average, it reduces the profile by roughly 51 per cent and only fails to improve upon the initial labelling in one instance. The worst performance of the new scheme, relative to the Lewis scheme, results in a 1 per cent increase in the profile (problem 28). This is in contrast to the worst performance of the Lewis scheme which, relative to the new algorithm, results in a 62 per cent increase in the profile (problem 17). Overall, the new algorithm performs significantly better on seven occasions (for problems 4, 8, 10, 11, 17, 25 and 29) and provides an average profile which is 9 per cent less than that of the Lewis scheme.

The profiles produced by the new algorithm also compare well with the profiles produced by the reverse Cuthill-McKee [8], Gibbs et al. [3] and Levy [12] algorithms that are quoted in Table III of Reference 10. On 27 occasions, the new algorithm gives the lowest or equal lowest profile.

Table I. Results for profile reduction

[For each of the 30 test problems, Table I lists N (the order of the matrix), the original profile $P_0$, the profile $P_L$ produced by the Lewis [9] algorithm, the profile P produced by the proposed algorithm, and the ratios $P/P_L$ and $P/P_0$; the average ratios over the 30 problems are $P/P_L = 0.91$ and $P/P_0 = 0.49$. Some entries differ from the values quoted by Lewis [9] (note that all profiles quoted in Reference 9 exclude diagonal terms); entries for which no improvement on the original labelling was obtained are flagged.]

The root-mean-square wavefronts produced by the new algorithm and the Lewis algorithm are shown in Table II. The new scheme gives the lowest, or equal lowest, root-mean-square wavefront on 28 occasions and gives an average reduction of roughly 51 per cent. It achieves no reduction on one occasion. Relative to the Lewis algorithm, the new procedure performs worst on example 28, where it gives a 2 per cent increase in root-mean-square wavefront. This is in contrast to the worst performance of the Lewis algorithm which, relative to the new algorithm, gives a 68 per cent increase in root-mean-square wavefront (problem 17). Averaging over the 30 test cases, the new scheme gives a root-mean-square wavefront which is roughly 9 per cent less than that of the Lewis scheme. Its performance is clearly better on seven occasions (problems 4, 8, 10, 11, 17, 25 and 29).

The root-mean-square wavefront reductions for the new algorithm compare well with those of the reverse Cuthill-McKee [8], Gibbs et al. [3] and Levy [12] algorithms (Table II of Reference 10). The new scheme gives the lowest, or equal lowest, root-mean-square wavefront for 27 of the 30 test cases.

For completeness, the maximum wavefronts produced by the new algorithm are also shown in Table II. This quantity is of less interest than the profile or root-mean-square wavefront, since it only provides a rough indication of the efficiency of a labelling strategy. Nevertheless, when compared with the results for the Lewis algorithm, the new algorithm yields the lowest or equal lowest maximum wavefront for 23 examples.

Table II. Results for wavefront reduction

[For each of the 30 test problems, Table II lists N (the order of the matrix), the original, Lewis [9] and proposed-algorithm root-mean-square wavefronts, and the original, Lewis [9] and proposed-algorithm maximum wavefronts, together with the ratios of the proposed algorithm's root-mean-square wavefront to the Lewis and original values; the average ratios over the 30 problems are 0.91 and 0.49, respectively. Some entries differ from the values quoted by Lewis [9]; entries for which no improvement on the original labelling was obtained are flagged.]

The CPU times required by the new algorithm and the Lewis algorithm, for each of Everstine's test problems, are shown in Table III. These statistics are for a VAX 11/780 operating under VMS, and were obtained from the internal clock of the machine (which is accurate to the nearest hundredth of a second). Both implementations were compiled using the FORTRAN optimizing compiler. As can be seen from Table III, the new algorithm is significantly faster than the Lewis algorithm for each of the 30 test cases. The percentage saving in CPU time ranges from 60 per cent (examples 1, 2, 3 and 19) to 10 per cent (example 30) and, on average, is equal to 40 per cent.

Table III. CPU times for relabelling of the test examples

[For each of the 30 test problems, Table III lists N (the order of the matrix), $T_L$ (the CPU time for the Lewis [9] algorithm), T (the CPU time for the proposed algorithm) and the ratio $T/T_L$; the average ratio over the 30 problems is 0.6. All CPU times are in seconds for a VAX 11/780 operating under VMS with the FORTRAN 77 optimizing compiler, and are accurate to 0.01 sec.]

Overall, empirical evidence suggests that the new algorithm is capable of producing efficient labellings for a wide variety of graphs with a minimum of computational expense. Indeed, for the 30 test matrices collected by Everstine [10], the new scheme proved more reliable in reducing profile and root-mean-square wavefront than the Lewis [9], reverse Cuthill-McKee [8], Gibbs et al. [3] and Levy [12] schemes. Moreover, it is roughly 40 per cent faster than the Lewis procedure and is considerably simpler to implement. The new algorithm requires only 2E + ⌊(9N + 5)/2⌋ integer words of memory to label any graph. This is substantially less than that of the Lewis algorithm, which has a worst case requirement of 2E + 9N + 3 storage locations and an average requirement of roughly 2E + 7N storage locations. Although the Lewis implementation has the additional option of bandwidth reduction using the Gibbs et al. [3] algorithm, the new algorithm saves on space and execution time because of its simplicity.

CONCLUSIONS

An algorithm for reducing the profile and wavefront of a sparse matrix has been described. The scheme is applicable to any sparse matrix which has a symmetric pattern of zeros and should prove useful in finite element analysis. In particular, the algorithm may be used to generate efficient labellings for profile or frontal solution schemes. Empirical evidence suggests that the new procedure is a substantial improvement on existing algorithms since it is fast, reliable, requires little storage, and is simple to implement.

REFERENCES

1. E. Cuthill and J. McKee, 'Reducing the bandwidth of sparse symmetric matrices', Proc. ACM Nat. Conf., Association for Computing Machinery, New York (1969).
2. R. J. Collins, 'Bandwidth reduction by automatic renumbering', Int. j. numer. methods eng., 6, 345-356 (1973).
3. N. E. Gibbs, W. G. Poole and P. K. Stockmeyer, 'An algorithm for reducing the bandwidth and profile of a sparse matrix', SIAM J. Numer. Anal., 13, 236-250 (1976).
4. A. Jennings, Matrix Computation for Engineers and Scientists, Wiley, New York, 1977.
5. R. L. Taylor, in The Finite Element Method (Ed. O. C. Zienkiewicz), ch. 24, McGraw-Hill, London, 1977.
6. I. P. King, 'An automatic reordering scheme for simultaneous equations derived from network systems', Int. j. numer. methods eng., 2, 523-533 (1970).
7. N. E. Gibbs, 'A hybrid profile reduction algorithm', ACM Trans. Math. Software, 2, 378-387 (1976).
8. A. George, 'Computer implementation of the finite element method', Ph.D. thesis, Stanford University (1971).
9. J. G. Lewis, 'Implementation of the Gibbs-Poole-Stockmeyer and Gibbs-King algorithms', ACM Trans. Math. Software, 8, 180-189 (1982).
10. G. C. Everstine, 'A comparison of three resequencing algorithms for the reduction of matrix profile and wavefront', Int. j. numer. methods eng., 14, 837-853 (1979).
11. B. M. Irons, 'A frontal solution program for finite element analysis', Int. j. numer. methods eng., 2, 5-32 (1970).
12. R. Levy, 'Resequencing of the structural stiffness matrix to improve computational efficiency', Jet Propulsion Lab. Quart. Tech. Rev., 1, 61-70 (1971).
13. S. W. Sloan and M. F. Randolph, 'Automatic element reordering for finite element analysis with frontal solution schemes', Int. j. numer. methods eng., 19, 1153-1181 (1983).
14. J. E. Akin and R. M. Pardue, 'Element resequencing for frontal solutions', in The Mathematics of Finite Elements and Applications (MAFELAP) (Ed. J. R. Whiteman), Academic Press, London, pp. 535-541 (1975).
15. A. Razzaque, 'Automatic reduction of frontwidth for finite element analysis', Int. j. numer. methods eng., 15, 1315-1324 (1980).
16. H. L. Pina, 'An algorithm for frontwidth reduction', Int. j. numer. methods eng., 17, 1539-1545 (1981).
17. A. George and J. W. H. Liu, 'An implementation of a pseudo-peripheral node finder', ACM Trans. Math. Software, 5, 284-295 (1979).
18. N. E. Gibbs, W. G. Poole and P. K. Stockmeyer, 'A comparison of several bandwidth and profile reduction algorithms', ACM Trans. Math. Software, 2, 322-330 (1976).
19. R. Sedgewick, Algorithms, Addison-Wesley, New York, 1983.