
AA220/CS238

Parallel Methods in Numerical Analysis

Introduction to Sparse Direct Solver

(Symmetric Positive Definite Systems)

Kincho H. Law

Professor of Civil and Environmental Engineering

Stanford University

Email: [email protected]

May 13, 2005

User Interface (UI)

(Mesh Gen., B.C., etc.)

Element Library

Matrix, RHS

Formation

and Assembly

Linear Solver

(Direct, Iterative)

Post Processing

Element

Characteristics

Nonlinear and/or Adaptive Solver

Direct Solution Scheme

A Typical Finite Element Program

$$K u = p$$

1. $L D L^T u = p$
2. $L y = p$
3. $D L^T u = y$

K: Symmetric Positive Definite

L: Lower Triangular Matrix

D: Diagonal Matrix
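The three steps of the direct solution scheme can be sketched in a few lines of Python. This is only an illustration for small dense matrices, assuming the factors L (unit lower triangular, stored as a list of rows) and D (stored as a list of diagonal entries) are already available; the function name `ldlt_solve` and the 2x2 data are hypothetical.

```python
def ldlt_solve(L, D, p):
    """Solve L D L^T u = p given a unit lower triangular L and diagonal D."""
    n = len(p)
    # Step 2: forward solve L y = p
    y = list(p)
    for i in range(n):
        y[i] -= sum(L[i][k] * y[k] for k in range(i))
    # Step 3: solve D L^T u = y, first scaling by D ...
    u = [y[i] / D[i] for i in range(n)]
    # ... then back substitution with L^T
    for i in reversed(range(n)):
        u[i] -= sum(L[k][i] * u[k] for k in range(i + 1, n))
    return u


# Hypothetical 2x2 system: K = [[4, 2], [2, 4]] = L D L^T
L = [[1.0, 0.0], [0.5, 1.0]]
D = [4.0, 3.0]
u = ldlt_solve(L, D, [6.0, 6.0])
```

A sparse solver performs the same two triangular solves but visits only the stored nonzero entries of L.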

Motivation Example: Bandwidth Minimization

Reverse Cuthill McKee Algorithm

Motivation Example: Sparsity within Profile

Reverse Cuthill McKee Algorithm


Motivation Example: Sparse Matrix Ordering

Minimum Degree Ordering

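Introductory Example

[Figure: three-node model with nodes labeled 3, 1, 2]

Stiffness Matrix (element with nodes 1 and 2):

$$\begin{bmatrix} 4.0 & 2.0 \\ 2.0 & 4.0 \end{bmatrix}$$

Assembled system:

$$\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 2.000 & 8.000 & 0 \\ 2.000 & 0 & 8.000 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

Eq. 2 - (2/8) x Eq. 1 and Eq. 3 - (2/8) x Eq. 1:

$$\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & 0 & 1.000 \end{bmatrix} \begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 0 & 7.500 & -0.500 \\ 0 & -0.500 & 7.500 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

Eq. 3 - (-0.5/7.5) x Eq. 2:

$$\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & -0.067 & 1.000 \end{bmatrix} \begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 0 & 7.500 & -0.500 \\ 0 & 0 & 7.467 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$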

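Graph $G^0$ of $A$

Graph $G^1$ of $A_1$

[Figure: the two graphs, drawn on nodes 1, 2, 3, before and after the elimination of node 1]

Graph $G^1$ of $A_1$ is constructed as follows:

1. Deleting node $x_1$ and its incident edges
2. Adding edges to the graph so that the adjacent nodes of $x_1$ are now pairwise adjacent in graph $G^1$

$$\begin{bmatrix} 8.000 & 0.000 \\ 0.000 & 8.000 \end{bmatrix} - \begin{bmatrix} 0.250 \\ 0.250 \end{bmatrix} \begin{bmatrix} 2.000 & 2.000 \end{bmatrix} = \begin{bmatrix} 7.500 & -0.500 \\ -0.500 & 7.500 \end{bmatrix}$$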
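The two construction rules translate directly into a graph update. The sketch below assumes the graph is stored as a dictionary mapping each node to the set of its neighbours; the name `eliminate_node` and this storage scheme are illustrative choices, not taken from the lecture.

```python
def eliminate_node(adj, v):
    """Return (new_graph, fill_edges) after eliminating node v.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    nbrs = adj[v]
    # Rule 1: delete v and its incident edges
    new_adj = {u: set(ns) - {v} for u, ns in adj.items() if u != v}
    # Rule 2: make the former neighbours of v pairwise adjacent
    fill = set()
    for a in nbrs:
        for b in nbrs:
            if a < b and b not in new_adj[a]:
                new_adj[a].add(b)
                new_adj[b].add(a)
                fill.add((a, b))
    return new_adj, fill


# Graph of the 3x3 example: node 1 is adjacent to nodes 2 and 3
g0 = {1: {2, 3}, 2: {1}, 3: {1}}
g1, fill = eliminate_node(g0, 1)
```

Here `fill` contains the single edge (2, 3), which corresponds to the new nonzero entry -0.500 created in positions (2,3) and (3,2) of the reduced matrix.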

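Symbolic Representation of Symmetric Matrices and Gaussian Elimination

$$\begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 2.000 & 8.000 & 0 \\ 2.000 & 0 & 8.000 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & 0 & 1.000 \end{bmatrix} \begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 0 & 7.500 & -0.500 \\ 0 & -0.500 & 7.500 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

[Figure: Graph $G^0$ of $A$ and Graph $G^1$ of $A_1$, each drawn on nodes 1, 2, 3]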

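Symbolic Representation of Gaussian Elimination

$$\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & -0.067 & 1.000 \end{bmatrix} \begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 0 & 7.500 & -0.500 \\ 0 & 0 & 7.467 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

where the last diagonal entry is $7.500 - (-0.500)(-0.067) = 7.467$.

[Figure: Graph $G^2$ of $A_2$, drawn on nodes 1, 2, 3]

The same system written in the symmetric form $LDL^T$:

$$\begin{bmatrix} 1.000 & 0 & 0 \\ 0.250 & 1.000 & 0 \\ 0.250 & -0.067 & 1.000 \end{bmatrix} \begin{bmatrix} 8.000 & 0 & 0 \\ 0 & 7.500 & 0 \\ 0 & 0 & 7.467 \end{bmatrix} \begin{bmatrix} 1.000 & 0.250 & 0.250 \\ 0 & 1.000 & -0.067 \\ 0 & 0 & 1.000 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 8.000 & 2.000 & 2.000 \\ 2.000 & 8.000 & 0 \\ 2.000 & 0 & 8.000 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$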
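The factors of the 3x3 example can be reproduced with a short dense $LDL^T$ routine. This is a minimal sketch that ignores sparsity and assumes no pivoting is needed (true for symmetric positive definite matrices); the function name `ldlt_factor` is hypothetical.

```python
def ldlt_factor(K):
    """Dense LDL^T factorization: returns (L, D) with K = L diag(D) L^T."""
    n = len(K)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        # d_j = K_jj - sum_k L_jk^2 d_k
        D[j] = K[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            # L_ij = (K_ij - sum_k L_ik d_k L_jk) / d_j
            s = sum(L[i][k] * D[k] * L[j][k] for k in range(j))
            L[i][j] = (K[i][j] - s) / D[j]
    return L, D


A = [[8.0, 2.0, 2.0],
     [2.0, 8.0, 0.0],
     [2.0, 0.0, 8.0]]
L, D = ldlt_factor(A)
```

Rounded to three decimals, `D` is (8.000, 7.500, 7.467) and `L[2][1]` is -0.067, which is the fill-in entry in position (3,2).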

Summary of Results

Given a symmetric system of equations Ax = y

Numerical factorization of A into LU

For a symmetric system, A can be factored as $LDL^T$

[Figure: Graph of A and Graph of L+U, each drawn on nodes 1, 2, 3]

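Another Example

[Figure: six-node model]

$Ax = y$:

$$\begin{bmatrix}
8.0000 & 2.0000 & 2.0000 & 0 & 0 & 0 \\
2.0000 & 8.0000 & 0 & 0 & 0 & 2.0000 \\
2.0000 & 0 & 8.0000 & 0 & 0 & 0 \\
0 & 0 & 0 & 8.0000 & 2.0000 & 0 \\
0 & 0 & 0 & 2.0000 & 8.0000 & 2.0000 \\
0 & 2.0000 & 0 & 0 & 2.0000 & 8.0000
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} =
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix}$$

$A = LDL^T$, with

$$L = \begin{bmatrix}
1.0000 & & & & & \\
0.2500 & 1.0000 & & & & \\
0.2500 & -0.0667 & 1.0000 & & & \\
0 & 0 & 0 & 1.0000 & & \\
0 & 0 & 0 & 0.2500 & 1.0000 & \\
0 & 0.2667 & 0.0179 & 0 & 0.2667 & 1.0000
\end{bmatrix}$$

$$D = \mathrm{diag}(8.0000,\ 7.5000,\ 7.4667,\ 8.0000,\ 7.5000,\ 6.9310)$$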

Stiffness Matrix (element with nodes 1 and 2):

$$\begin{bmatrix} 4.0 & 2.0 \\ 2.0 & 4.0 \end{bmatrix}$$

Not all entries within the profile (band) are nonzero!

[Figure: nonzero structures of the 6x6 matrix, columns numbered 1 to 6]

Graph Representation of Gaussian Elimination

[Figure panels: Graph of A; Graph after elimination of node 1; matrix structures (columns 1 to 6) at successive elimination steps]





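Matrix A

$$A = \begin{bmatrix}
8.0000 & 2.0000 & 2.0000 & 0 & 0 & 0 \\
2.0000 & 8.0000 & 0 & 0 & 0 & 2.0000 \\
2.0000 & 0 & 8.0000 & 0 & 0 & 0 \\
0 & 0 & 0 & 8.0000 & 2.0000 & 0 \\
0 & 0 & 0 & 2.0000 & 8.0000 & 2.0000 \\
0 & 2.0000 & 0 & 0 & 2.0000 & 8.0000
\end{bmatrix}$$

Nonzero Structure of Matrix A

       1  2  3  4  5  6
    1  X  X  X
    2  X  X           X
    3  X     X
    4           X  X
    5           X  X  X
    6     X        X  X

Graph of Matrix A

[Figure: graph on nodes 1 to 6]

Matrix Factor L

$$L = \begin{bmatrix}
1.0000 & & & & & \\
0.2500 & 1.0000 & & & & \\
0.2500 & -0.0667 & 1.0000 & & & \\
0 & 0 & 0 & 1.0000 & & \\
0 & 0 & 0 & 0.2500 & 1.0000 & \\
0 & 0.2667 & 0.0179 & 0 & 0.2667 & 1.0000
\end{bmatrix}$$

Nonzero Structure of Matrix L+LT (F denotes a fill-in entry)

       1  2  3  4  5  6
    1  X  X  X
    2  X  X  F        X
    3  X  F  X        F
    4           X  X
    5           X  X  X
    6     X  F     X  X

Graph of Matrix L+LT

[Figure: graph on nodes 1 to 6, including the fill-in edges]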

Matrix Numbering as Graph Ordering

Minimum Degree Ordering Scheme (A Greedy Strategy)

1. Select a node with a minimum degree from the graph and label the node
2. Eliminate the node and perform graph transformation by adding fill-in edges if necessary
3. Repeat steps 1 and 2 until all nodes are labelled

[Figure: six-node graph and its matrix structure, columns 1 to 6]

Purpose: Minimize the number of “fill-in” nonzero entries in the matrix factor L
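The greedy strategy can be sketched as follows. The graph is assumed to be a dictionary of neighbour sets, ties are broken by the smallest node label (an arbitrary choice for this sketch), and the star-shaped test graph is hypothetical. Production codes use more elaborate data structures such as quotient graphs, which are not shown here.

```python
def minimum_degree_order(adj):
    """Greedy minimum degree ordering; returns (order, fill_edges)."""
    g = {u: set(ns) for u, ns in adj.items()}   # work on a copy
    order, fill = [], []
    while g:
        # Step 1: pick a node of minimum degree (ties: smallest label)
        v = min(g, key=lambda u: (len(g[u]), u))
        nbrs = sorted(g.pop(v))
        for u in nbrs:
            g[u].discard(v)
        # Step 2: eliminate v, making its neighbours pairwise adjacent
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in g[a]:
                    g[a].add(b)
                    g[b].add(a)
                    fill.append((a, b))
        order.append(v)                           # Step 3: repeat
    return order, fill


# Hypothetical star graph: node 0 is connected to nodes 1..4
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
order, fill = minimum_degree_order(star)
```

For the star graph the leaves are eliminated before the centre and no fill-in is produced, whereas eliminating the centre first would make all four leaves pairwise adjacent.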

[Figure: Original Graph and Matrix Structure (nodes a to g); the graphs after each of the minimum degree elimination steps 1 to 7; Reordered Graph and Matrix Structure (nodes 1 to 7)]

Summary of Numerical Factorization for

Sparse Symmetric Matrices

Given a symmetric matrix A, compute the matrix factors L and D such that $A = LDL^T$

1. Order the matrix A such that it has a desirable structure
2. Symbolic Factorization to determine the structure of L
3. Numerically factorize the matrix A into L and D utilizing the nonzero structure of L
4. Forward and backward solutions: $Lz = y$; $DL^T x = z$

Focus of Discussion:

• Graph and Tree Representation of the

Sparse Factor L

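Direct (Cholesky) Factorization

$$A = \begin{bmatrix} M & u \\ u^T & s \end{bmatrix}
= \begin{bmatrix} L_M & 0 \\ w^T & 1 \end{bmatrix}
\begin{bmatrix} D_M & 0 \\ 0 & d \end{bmatrix}
\begin{bmatrix} L_M^T & w \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} L_M D_M L_M^T & L_M D_M w \\ w^T D_M L_M^T & w^T D_M w + d \end{bmatrix}$$

Suppose $[M] = [L_M][D_M][L_M]^T$ has been computed for the leading 5x5 block of

$$A = \begin{bmatrix}
8.0000 & 2.0000 & 2.0000 & 0 & 0 & 0 \\
2.0000 & 8.0000 & 0 & 0 & 0 & 2.0000 \\
2.0000 & 0 & 8.0000 & 0 & 0 & 0 \\
0 & 0 & 0 & 8.0000 & 2.0000 & 0 \\
0 & 0 & 0 & 2.0000 & 8.0000 & 2.0000 \\
0 & 2.0000 & 0 & 0 & 2.0000 & 8.0000
\end{bmatrix}$$

with

$$[L_M] = \begin{bmatrix}
1.000 & & & & \\
0.250 & 1.000 & & & \\
0.250 & -0.067 & 1.000 & & \\
0 & 0 & 0 & 1.000 & \\
0 & 0 & 0 & 0.250 & 1.000
\end{bmatrix}, \qquad
[D_M] = \mathrm{diag}(8.000,\ 7.500,\ 7.467,\ 8.000,\ 7.500)$$

The problem is to compute w and d

$$\{u\} = [L_M][D_M]\{w\} \quad\Rightarrow\quad \{w\} = [D_M]^{-1}[L_M]^{-1}\{u\}$$

$$s = \{w\}^T[D_M]\{w\} + d \quad\Rightarrow\quad d = s - \{w\}^T[D_M]\{w\}$$

$$\{w\} = \mathrm{diag}\!\left(\tfrac{1}{8.000},\ \tfrac{1}{7.500},\ \tfrac{1}{7.467},\ \tfrac{1}{8.000},\ \tfrac{1}{7.500}\right) [L_M]^{-1} \begin{Bmatrix} 0 \\ 2.000 \\ 0 \\ 0 \\ 2.000 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0.267 \\ 0.018 \\ 0 \\ 0.267 \end{Bmatrix}$$

$$d = 8.000 - \{w\}^T\, \mathrm{diag}(8.000,\ 7.500,\ 7.467,\ 8.000,\ 7.500)\, \{w\} = 6.931$$

Structure of {w} is related to forward solve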
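The computation of w and d for the last row of the 6x6 example can be checked numerically. The sketch below is a dense illustration of the two formulas (a forward solve followed by a scaling); the name `border_step` is hypothetical, and the exact fractions -1/15 and 112/15 are used in place of the rounded values -0.067 and 7.467.

```python
def border_step(L_M, D_M, u, s):
    """Given M = L_M D_M L_M^T, the border column u and the diagonal s,
    return (w, d) so that the bordered matrix is again in LDL^T form."""
    n = len(u)
    z = list(u)
    for i in range(n):                         # forward solve L_M z = u
        z[i] -= sum(L_M[i][k] * z[k] for k in range(i))
    w = [z[i] / D_M[i] for i in range(n)]      # w = D_M^{-1} z
    d = s - sum(w[i] ** 2 * D_M[i] for i in range(n))
    return w, d


L_M = [[1.0, 0.0, 0.0, 0.0, 0.0],
       [0.25, 1.0, 0.0, 0.0, 0.0],
       [0.25, -1.0 / 15.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 0.25, 1.0]]
D_M = [8.0, 7.5, 112.0 / 15.0, 8.0, 7.5]
w, d = border_step(L_M, D_M, [0.0, 2.0, 0.0, 0.0, 2.0], 8.0)
```

The entry `w[2]` is nonzero even though the corresponding entry of u is zero: the forward solve propagates the nonzero in position 2 through L(3,2), which is why the structure of {w} follows from the structure of the forward solve.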

Definition for a Tree Structure of Matrix Factor

Define

$$\mathrm{PARENT}(j) = \min \{\, i > j \mid L(i,j) \neq 0 \,\}$$

Note: The list array PARENT represents the row subscript of the first nonzero off-diagonal entry in each column of the lower triangular matrix factor L

Lemma: If A(i,j) (or L(i,j)) ≠ 0, then for each k = PARENT(PARENT(…(PARENT(j)))), L(i,k) ≠ 0, where k < i.

That is, given the tree T(A) and the nonzero entries of A, we can obtain the nonzero entries per each row of the matrix factor L by tracing the path along the tree from the nonzero offdiagonal column subscript of A to the row number of interest

[Figure: Matrix Structure of L+LT and Tree of Matrix Factor L (root 6, with branches 3, 2, 1 and 5, 4)]

(Law and Fenves 81, 86; Liu 86, 88; Schreiber 82)

ALGORITHM: ROW_STRUCTURE

/* Determine the data structure for row i of matrix factor L */

BEGIN
    Sort the column subscripts of the nonzero entries of row i of A in ascending
    order and store them in a linked list array LIST;
    j = HEAD of LIST;
    WHILE j ≠ 0 DO
    BEGIN
        IF LIST(j) ≠ 0 THEN next = LIST(j)
        ELSE next = i;
        r = j;
        WHILE 0 < r < next DO
        BEGIN
            add subscript r to row i of L;
            r = PARENT(r);
        ENDWHILE;
        IF r ≠ 0 and r < i THEN sort r to LIST (if not already present);
        j = LIST(j);
    ENDWHILE;
END.
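The same row structure can be obtained with a simpler variant that marks visited nodes instead of maintaining a sorted linked list. The sketch below is that marking variant, not the linked-list formulation above; it assumes 1-based subscripts with index 0 unused, and the name `row_structure` is hypothetical. The PARENT array is the one for the 6x6 example.

```python
def row_structure(i, a_cols, parent):
    """Column subscripts of the off-diagonal nonzeros in row i of L.

    a_cols: column subscripts j < i with A(i, j) != 0.
    parent: PARENT array, 1-based (parent[0] unused, 0 means no parent).
    """
    marked = {i}
    for j in a_cols:
        r = j
        # Climb the tree from j toward i; stop at any node already visited.
        while r != 0 and r not in marked:
            marked.add(r)
            r = parent[r]
    marked.discard(i)
    return sorted(marked)


# PARENT array of the 6x6 example: 1 -> 2 -> 3 -> 6 and 4 -> 5 -> 6
PARENT = [0, 2, 3, 6, 5, 6, 0]
row6 = row_structure(6, [2, 5], PARENT)
row3 = row_structure(3, [1], PARENT)
```

Row 6 of A has nonzeros in columns 2 and 5; tracing the tree adds column 3, so `row6` is `[2, 3, 5]`.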


Example: Row 6 of Matrix K

[Figure: Matrix Structure and Tree of Matrix Factor L]

HEAD = 2; LIST: <0, 5, 0, 0, 6, 0>

j = 2
    next = LIST(2) = 5;
    r = 2;
    add entry 2 to row 6 of L (i.e. L(6,2) ≠ 0);
    r = PARENT(2) = 3;
    sort 3 to list (i.e. LIST: <0, 3, 5, 0, 6, 0>);
    (j = LIST(2) = 3;)


Example: Row 6 of Matrix K (cont’d)

[Figure: Matrix Structure and Tree of Matrix Factor L]

(HEAD = 2; LIST: <0, 3, 5, 0, 6, 0>;)

(j = LIST(2) = 3;)
    next = LIST(3) = 5;
    r = 3;
    add entry 3 to row 6 of L (i.e. L(6,3) ≠ 0);
    r = PARENT(3) = 6;

j = LIST(3) = 5
    (LIST(5) = 0) next = 6;
    r = 5;
    add entry 5 to row 6 of L (i.e. L(6,5) ≠ 0);
    r = PARENT(5) = 6;

j = LIST(6) = 0.

Nonzero entries denoted in the linked list LIST

ALGORITHM: TREE-STRUCTURE

/* Given the nonzero entries of A, determine the tree structure of matrix A */

BEGIN
    Initialize array PARENT to 0;
    FOR each row i = 2 TO n, DO
        FOR each nonzero entry A(i,j) of row i with j < i, DO
            r = j;
            WHILE ((PARENT(r) ≠ 0) AND (PARENT(r) ≠ i)) DO
                r = PARENT(r);
            ENDWHILE.
            IF (PARENT(r) = 0) THEN PARENT(r) = i;
        ENDFOR
    ENDFOR.
END.
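A direct transcription of TREE-STRUCTURE into Python might look like the sketch below. It assumes 1-based subscripts (index 0 is unused) and that `rows[i]` lists the column subscripts j < i of the nonzeros in row i of A; this input layout and the function name are illustrative assumptions.

```python
def elimination_tree(rows):
    """Return the PARENT array (1-based, 0 marks a root)."""
    n = len(rows) - 1
    parent = [0] * (n + 1)
    for i in range(2, n + 1):
        for j in rows[i]:
            r = j
            # Follow the existing path from j until it ends or reaches i
            while parent[r] != 0 and parent[r] != i:
                r = parent[r]
            if parent[r] == 0:
                parent[r] = i
    return parent


# Lower-triangular structure of the 6x6 example matrix A
rows = [[], [], [1], [1], [], [4], [2, 5]]
parent = elimination_tree(rows)
```

For the 6x6 example this gives PARENT = (2, 3, 6, 5, 6, 0) for nodes 1 to 6, the values used in the Row 6 example.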

[Figure: step-by-step growth of the tree as rows 1 to 6 are processed, shown with the nonzero structures of Matrix A and of L+LT]

ALGORITHM: TREE-STRUCTURE (with tree compression) (Ref: Liu 86, 88)

/* Given the nonzero entries of A, determine the tree structure of matrix A */

BEGIN
    Initialize arrays PARENT and ANCESTOR to 0;
    FOR each row i = 2 TO n, DO
        FOR each nonzero entry A(i,j) of row i with j < i, DO
            r = j;
            WHILE ((ANCESTOR(r) ≠ 0) AND (ANCESTOR(r) < i)) DO
                t = ANCESTOR(r);
                ANCESTOR(r) = i;
                r = t;
            ENDWHILE.
            IF (ANCESTOR(r) = 0) THEN DO
                ANCESTOR(r) = i;
                PARENT(r) = i;
            ENDIF.
        ENDFOR
    ENDFOR.
END.
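The compressed variant can be sketched in the same way. As before, 1-based subscripts and the `rows` input layout are assumptions of this sketch. The temporary `t` holds the old ancestor before it is overwritten, so the climb can continue while the path is shortened.

```python
def elimination_tree_compressed(rows):
    """PARENT array (1-based, 0 marks a root) using path compression."""
    n = len(rows) - 1
    parent = [0] * (n + 1)
    ancestor = [0] * (n + 1)
    for i in range(2, n + 1):
        for j in rows[i]:
            r = j
            while ancestor[r] != 0 and ancestor[r] != i:
                t = ancestor[r]      # remember where the path continues
                ancestor[r] = i      # compress: r now points directly to i
                r = t
            if ancestor[r] == 0:
                ancestor[r] = i
                parent[r] = i
    return parent


rows = [[], [], [1], [1], [], [4], [2, 5]]
parent = elimination_tree_compressed(rows)
```

The PARENT array is identical to the one from the uncompressed algorithm; only the number of steps taken along long paths changes.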

[Figure: example model (nodes numbered 1 to 36) with its elimination tree and matrix structure]

Example model ordered using RCM (bandwidth minimization) algorithm

Note: Zero entries within the band!

Restructuring of Ordered Elimination Tree :

ALGORITHM: Binary-Tree Representation

/* Given the PARENT array */

BEGIN
    Initialize arrays CHILD and SIBLING to 0;
    FOR each node i, DO
        IF (PARENT(i) ≠ 0) THEN DO
        BEGIN
            r = PARENT(i);
            IF (CHILD(r) ≠ 0) THEN DO
            BEGIN
                SIBLING(i) = CHILD(r);
                CHILD(r) = i;
            ENDIF.
            ELSE CHILD(r) = i;
        ENDIF
    ENDFOR.
END.
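In Python the first-child / next-sibling representation reduces to a single loop, since assigning SIBLING(i) = CHILD(r) is harmless when CHILD(r) is still 0. This is a sketch with 1-based arrays (index 0 unused); the function name is hypothetical and the PARENT array is the one of the 6x6 example.

```python
def child_sibling(parent):
    """Build CHILD and SIBLING arrays from a 1-based PARENT array."""
    n = len(parent) - 1
    child = [0] * (n + 1)
    sibling = [0] * (n + 1)
    for i in range(1, n + 1):
        r = parent[i]
        if r != 0:
            sibling[i] = child[r]   # previous first child becomes i's sibling
            child[r] = i            # i becomes the first child of r
    return child, sibling


PARENT = [0, 2, 3, 6, 5, 6, 0]
child, sibling = child_sibling(PARENT)
```

Node 6 has the two children 3 and 5: `child[6]` is 5 and `sibling[5]` is 3.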

[Figure: matrix structure with its elimination tree (PARENT) and the corresponding binary-tree representation (CHILD, SIBLING)]

Restructuring of Ordered Elimination Tree : Post-Order Traversal

ALGORITHM: POST-ORDER(r, number)

/* Given the Binary Tree Representation */

/* Initially set r = n (root of T(K)) and number = 1; number is shared by all recursive calls */

BEGIN
    t = r;
    IF (t ≠ 0) THEN DO
    BEGIN
        POST-ORDER(CHILD(t), number);
        label node t = number;
        number = number + 1;
        POST-ORDER(SIBLING(t), number);
    ENDIF
END.
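The traversal can be sketched in Python as follows. The recursion over SIBLING is replaced by a loop, which visits the nodes in the same order, and the shared counter is kept in an enclosing scope. The CHILD and SIBLING arrays are those of the 6x6 example in a 1-based layout; all names are illustrative.

```python
def postorder(child, sibling, root):
    """Return label[], where label[v] is the post-order number of node v."""
    label = [0] * len(child)
    number = 1

    def visit(t):
        nonlocal number
        while t != 0:
            visit(child[t])      # number the whole subtree below t first
            label[t] = number
            number += 1
            t = sibling[t]       # then continue with the next sibling

    visit(root)
    return label


CHILD = [0, 0, 1, 2, 0, 4, 5]
SIBLING = [0, 0, 0, 0, 0, 3, 0]
label = postorder(CHILD, SIBLING, 6)
```

Every node receives a number larger than all nodes in its subtree, so the root is always numbered last.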

[Figure: matrix structures and elimination trees before and after post-ordering]

[Figure: example model (nodes numbered 1 to 36) after post-ordering, with its elimination tree and matrix structure]

Post-ordering of the elimination tree

• Preserve number of fill-in entries

• Reveal matrix partitioning

• Allow “block” data structure : principal submatrix, row segments

Summary of Numerical Factorization for

Sparse Symmetric Matrices

Given a symmetric matrix A, compute the matrix factors L and D such that $A = LDL^T$

1. Order the matrix A such that it has a desirable structure
2. Symbolic Factorization to determine the structure of L
    1. Given structure of A, determine the tree structure T(A) of A
    2. Given T(A) and structure of A, determine the structure of L
3. Numerically factor the matrix A into L and D utilizing the nonzero structure of L

User Interface (UI)

(Mesh Gen., B.C., etc.)

Element Library

Matrix, RHS

Formation

and Assembly

Linear Solver

(Direct, Iterative)

Post Processing

Element

Characteristics

Nonlinear and/or Adaptive Solver

Ordering the equations

Data structure for

system matrix

Profile solver

Sparse solver

Direct Solution Scheme

A Typical Finite Element Program

Performance of Sparse Linear Solver

[Figure: Performance (second) versus Number of Equations (0 to 25000) for the Profile Solver, Minimum Degree, and Multilevel Nested Dissection]

Solution time in seconds (MultiND and MinD: sparse solvers; Profile: traditional solver)

    Model          Neq      MultiND   MinD     Profile
    Square40        3354    0.191     0.286     0.425
    Humboldt 1      5206    0.282     0.26      1.027
    Humboldt 2      7294    0.61      0.788     5.126
    Plate 100x20   12516    2.532     3.272    14.33
    Square100      20394    2.518     5.081    15.82

(Ref: Jun Peng 2002)

Comparison of Different Linear Solvers

The Models: 1. Brick6x8x50; 2. Humboldt1; 3. Humboldt2; 4. Square100x100

[Figure: Performance (Second), 0 to 30, for the 4 Tested Models; solvers compared: SymSparse::MultiND, SymSparse::MinD, SymSparse::GenND, Profile, SuperLU, UmfPack]

(Ref: Jun Peng 2002)

General Remarks

References:

• Law and Fenves, “A Node Addition Model for Symbolic Factorization,” ACM TOMS, 12(1):37-50, 1986.

• Mackay, Law and Raefsky, “An Implementation of a Generalized Sparse/Profile Finite Element Solution Method,” Computers and Structures, 41(4):723-737, 1991.

• George and Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice Hall, 1981.

• Duff, Erisman and Reid, Direct Methods for Sparse Matrices, Oxford Science Publications, 1986.

Software Packages: Sparspak, YSMP, UMFPACK, SuperLU, etc.

Next Lecture: Parallel Implementation of a Sparse Direct Solver
