
CR07 – Sparse Matrix Computations / Cours de Matrices Creuses (2010-2011)

Jean-Yves L'Excellent (INRIA) and Bora Uçar (CNRS)

LIP-ENS Lyon, Team: ROMA


Fridays 10h15-12h15

prepared in collaboration with P. Amestoy (ENSEEIHT-IRIT)


Motivations

- Applications with ever-increasing computing-power needs:
  - modeling,
  - simulation (rather than experimentation),
  - numerical optimization.
- Typically: continuous problem ⇒ discretization (mesh) ⇒ numerical solution algorithm (according to the physical laws) ⇒ matrix problem (Ax = b, ...).
- Needs:
  - More and more accurate models
  - More and more complex problems
  - Applications with critical response times
  - Minimization of the cost of the computations

⇒ High-performance computers, parallelism

⇒ Algorithms that take the best advantage of these computers and of the properties of the matrices at hand


Example of preprocessing for Ax = b

Original (A = lhr01) and preprocessed matrix (A′(lhr01)):

[Figure: sparsity patterns of the original matrix A and of the preprocessed matrix A′, both with nz = 18427.]

Modified problem: A′x′ = b′ with A′ = P_n P D_r A Q D_c P^t


A few examples in the field of scientific computing

- Duration constraints: weather forecast


A few examples in the field of scientific computing

- Cost constraints: wind tunnels, crash simulation, ...


Scale constraints

- Large scale: climate modelling, pollution, astrophysics
- Tiny scale: combustion, quantum chemistry


Contents of the course

- Introduction, reminders on graph theory and on numerical linear algebra:
  - Introduction to sparse matrices
  - Graphs and hypergraphs, trees, some classical algorithms
  - Gaussian elimination, LU factorization, fill-in
  - Conditioning/sensitivity of a problem and error analysis


Contents of the course

- Sparse Gaussian elimination (matrix ↔ graph correspondence):
  - Sparse matrices and graphs
  - Gaussian elimination of sparse matrices
  - Ordering and permuting sparse matrices


Contents of the course

- Maximum (weighted) matching algorithms and their use in sparse linear algebra
- Efficient factorization methods:
  - Implementation of sparse direct solvers
  - Parallel sparse solvers
  - Scheduling of computations to optimize memory usage and/or performance
- Graph and hypergraph partitioning
- Iterative methods
- Current research activities


Tentative outline

A. INTRODUCTION
   I.    Sparse matrices
   II.   Graph theory and algorithms
   III.  Linear algebra basics
B. SPARSE GAUSSIAN ELIMINATION
   IV.   Elimination tree and structure prediction
   V.    Fill-reducing ordering methods
   VI.   Matching in bipartite graphs
   VII.  Factorization: Methods
   VIII. Factorization: Parallelization aspects
C. SOME OTHER ESSENTIAL SPARSE MATRIX ALGORITHMS
   IX.   Graph and hypergraph partitioning
   X.    Iterative methods
D. CLOSING
   XI.   Current research activities
   XII.  Presentations


Tentative organization of the course

- References and teaching material will be made available at http://graal.ens-lyon.fr/~bucar/CR07/
- 2+ hours dedicated to exercises (manipulation of sparse matrices and graphs)
- Evaluation: mainly based on the study of research articles related to the contents of the course; the students will write a report and give an oral presentation.


Outline

Introduction to Sparse Matrix Computations
- Motivation and main issues
- Sparse matrices
- Gaussian elimination
- Parallel and high performance computing
- Numerical simulation and sparse matrices
- Direct vs iterative methods
- Conclusion


A selection of references

- Books
  - Duff, Erisman and Reid, Direct Methods for Sparse Matrices, Clarendon Press, Oxford, 1986.
  - Dongarra, Duff, Sorensen and van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers, SIAM, 1991.
  - Davis, Direct Methods for Sparse Linear Systems, SIAM, 2006.
  - Saad, Iterative Methods for Sparse Linear Systems, 2nd edition, SIAM, 2003.
- Articles
  - Gilbert and Liu, Elimination structures for unsymmetric sparse LU factors, SIMAX, 1993.
  - Liu, The role of elimination trees in sparse factorization, SIMAX, 1990.
  - Heath, Ng and Peyton, Parallel algorithms for sparse linear systems, SIAM Review, 1991.



Motivations

- Solution of linear systems of equations → key algorithmic kernel

      Continuous problem
             ↓
      Discretization
             ↓
      Solution of a linear system Ax = b

- Main parameters:
  - Numerical properties of the linear system (symmetry, positive definiteness, conditioning, ...)
  - Size and structure:
    - Large (> 1 000 000 × 1 000 000?), square/rectangular
    - Dense or sparse (structured / unstructured)
  - Target computer (sequential/parallel/multicore)

→ Algorithmic choices are critical


Motivations for designing efficient algorithms

- Time-critical applications
- Solve larger problems
- Decrease elapsed time (parallelism?)
- Minimize cost of computations (time, memory)


Difficulties

- Access to data:
  - Computer: complex memory hierarchy (registers, multilevel cache, main memory (shared or distributed), disk)
  - Sparse matrix: large, irregular, dynamic data structures
  → Exploit the locality of references to data on the computer (design algorithms providing such locality)
- Efficiency (time and memory):
  - The number of operations and the memory usage depend very much on the algorithm used and on the numerical and structural properties of the problem.
  - The algorithm depends on the target computer (vector, scalar, shared, distributed, clusters of Symmetric Multi-Processors (SMP), multicore).
  → Algorithmic choices are critical



Sparse matrices

Example: the system

    3x1 + 2x2        = 5
          2x2 − 5x3  = 1
    2x1        + 3x3 = 0

can be represented as Ax = b, where

$A = \begin{pmatrix} 3 & 2 & 0 \\ 0 & 2 & -5 \\ 2 & 0 & 3 \end{pmatrix}$, $x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$, and $b = \begin{pmatrix} 5 \\ 1 \\ 0 \end{pmatrix}$

Sparse matrix: only nonzeros are stored.


Sparse matrix ?

[Figure: sparsity pattern, nz = 5104.]

Matrix dwt 592.rua (N = 592, NZ = 5104); structural analysis of a submarine.


Sparse matrix ?

Matrix from computational fluid dynamics (collaboration with Univ. Tel Aviv); a "saddle-point" problem.

[Figure: sparsity pattern, nz = 43105.]


Preprocessing sparse matrices

Original (A = lhr01) and preprocessed matrix (A′(lhr01)):

[Figure: sparsity patterns of the original matrix A and of the preprocessed matrix A′, both with nz = 18427.]

Modified problem: A′x′ = b′ with A′ = P_n P D_r A D_c Q P^t


Factorization process

Solution of Ax = b:

- A is unsymmetric:
  - A is factorized as A = LU, where L is a lower triangular matrix and U is an upper triangular matrix.
  - Forward-backward substitution: Ly = b, then Ux = y.
- A is symmetric:
  - A = LDL^T or LL^T
- A is rectangular, m × n with m ≥ n, and min_x ‖Ax − b‖_2 is sought:
  - A = QR, where Q is orthogonal (Q^{-1} = Q^T) and R is triangular.
  - Solve: y = Q^T b, then Rx = y.


Difficulties

- Only nonzero values are stored
- The factors L and U have far more nonzeros than A
- Data structures are complex
- Computations are only a small portion of the code (the rest is data manipulation)
- Memory size is a limiting factor (→ out-of-core solvers)


Key numbers:

1. Small sizes: 500 MB matrix; factors = 5 GB; flops = 100 Gflops.
2. Example of a 2D problem (Lab. Geosciences Azur, Valbonne):
   - complex 2D finite-difference matrix, n = 16×10^6, 150×10^6 nonzeros
   - storage (single precision): 2 GB (12 GB with the factors)
   - flops: 10 TeraFlops
3. Example of a 3D problem (EDF, Code Aster, structural engineering):
   - real finite-element matrix, n = 10^6, nz = 71×10^6 nonzeros
   - storage: 3.5×10^9 entries (28 GB) for the factors, 35 GB total
   - flops: 2.1×10^13
4. Typical performance (MUMPS):
   - PC Linux, 1 core (P4, 2 GHz): 1.0 GFlop/s
   - Cray T3E (512 procs): speed-up ≈ 170, perf. 71 GFlop/s
   - AMD Opteron 8431, 24 cores @ 2.4 GHz: 50 GFlop/s (1 core: 7 GFlop/s)


Typical test problems:

BMW car body: 227,362 unknowns, 5,757,996 nonzeros (MSC.Software).
Size of factors: 51.1 million entries. Number of operations: 44.9×10^9.


Typical test problems:

BMW crankshaft: 148,770 unknowns, 5,396,386 nonzeros (MSC.Software).
Size of factors: 97.2 million entries. Number of operations: 127.9×10^9.


Sources of parallelism

Several levels of parallelism can be exploited:

- At problem level: the problem can be decomposed into sub-problems (e.g. domain decomposition)
- At matrix level: sparsity implies independence in the computations
- At submatrix level: within dense linear algebra computations (parallel BLAS, ...)


Data structure for sparse matrices

- The storage scheme depends on the pattern of the matrix and on the type of access required:
  - band or variable-band matrices
  - "block bordered" or block tridiagonal matrices
  - general matrix
  - row, column or diagonal access


Data formats for a general sparse matrix A

What needs to be represented:

- Assembled matrices: M×N matrix A with NNZ nonzeros.
- Elemental matrices (unassembled): M×N matrix A with NELT elements.
- Arithmetic: real (4 or 8 bytes) or complex (8 or 16 bytes).
- Symmetric (or Hermitian) → store only part of the data.
- Distributed format?
- Duplicate entries and/or out-of-range values?


Classical Data Formats for Assembled Matrices

Example of a 3×3 matrix with NNZ = 5 nonzeros:

$A = \begin{pmatrix} a_{11} & & \\ & a_{22} & a_{23} \\ a_{31} & & a_{33} \end{pmatrix}$

- Coordinate format:

      IRN [1:NNZ] = 1   3   2   2   3
      JCN [1:NNZ] = 1   1   2   3   3
      VAL [1:NNZ] = a11 a31 a22 a23 a33

- Compressed Sparse Column (CSC) format:

      IRN    [1:NNZ] = 1   3   2   2   3
      VAL    [1:NNZ] = a11 a31 a22 a23 a33
      COLPTR [1:N+1] = 1   3   4   6

  Column J is stored in IRN/VAL locations COLPTR(J)...COLPTR(J+1)-1.

- Compressed Sparse Row (CSR) format: similar to CSC, but row by row.

- Diagonal format (M = N):

      NDIAG = 3
      IDIAG = -2  0  1
      VAL   = ( na  a11 0   )
              ( na  a22 a23 )
              ( a31 a33 na  )   (na: not accessed)

  VAL(i,j) corresponds to A(i, i+IDIAG(j)) (for 1 ≤ i+IDIAG(j) ≤ N).
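
The COLPTR array of the CSC format can be built from the coordinate arrays by a counting pass followed by a prefix sum. A minimal Fortran sketch of this COO-to-CSC conversion, on the 3×3 example above (the numerical values chosen for VAL are illustrative only):

    program coo_to_csc
      implicit none
      integer, parameter :: N = 3, NNZ = 5
      integer :: IRN(NNZ) = (/ 1, 3, 2, 2, 3 /)   ! coordinate format
      integer :: JCN(NNZ) = (/ 1, 1, 2, 3, 3 /)
      double precision :: VAL(NNZ) = (/ 11.d0, 31.d0, 22.d0, 23.d0, 33.d0 /)
      integer :: COLPTR(N+1), IRN2(NNZ), NEXT(N), j, k, p
      double precision :: VAL2(NNZ)
      COLPTR = 0                          ! count the entries of each column
      do k = 1, NNZ
         COLPTR(JCN(k)+1) = COLPTR(JCN(k)+1) + 1
      end do
      COLPTR(1) = 1                       ! prefix sum: column pointers
      do j = 1, N
         COLPTR(j+1) = COLPTR(j+1) + COLPTR(j)
      end do
      NEXT = COLPTR(1:N)                  ! next free position in each column
      do k = 1, NNZ                       ! scatter entries into their columns
         p = NEXT(JCN(k))
         IRN2(p) = IRN(k)
         VAL2(p) = VAL(k)
         NEXT(JCN(k)) = p + 1
      end do
      print *, 'COLPTR =', COLPTR         ! 1 3 4 6, as above
      print *, 'IRN    =', IRN2           ! 1 3 2 2 3
    end program coo_to_csc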


Sparse Matrix-vector products Y ← AX

The algorithm depends on the sparse matrix format:

- Coordinate format:

      Y(1:M) = 0
      DO k = 1, NNZ
         Y(IRN(k)) = Y(IRN(k)) + VAL(k) * X(JCN(k))
      ENDDO

- CSC format:

      Y(1:M) = 0
      DO J = 1, N
         Xj = X(J)
         DO k = COLPTR(J), COLPTR(J+1) - 1
            Y(IRN(k)) = Y(IRN(k)) + VAL(k) * Xj
         ENDDO
      ENDDO

- CSR format:

      DO I = 1, M
         Yi = 0
         DO k = ROWPTR(I), ROWPTR(I+1) - 1
            Yi = Yi + VAL(k) * X(JCN(k))
         ENDDO
         Y(I) = Yi
      ENDDO

- Diagonal format (VAL(i,j) corresponds to A(i, i+IDIAG(j))):

      Y(1:N) = 0
      DO j = 1, NDIAG
         DO i = max(1, 1-IDIAG(j)), min(N, N-IDIAG(j))
            Y(i) = Y(i) + VAL(i,j) * X(i+IDIAG(j))
         END DO
      END DO


Jagged diagonal storage (JDS)

$A = \begin{pmatrix} a_{11} & & \\ & a_{22} & a_{23} \\ a_{31} & & a_{33} \end{pmatrix}$

1. Shift all elements left (similar to CSR) and keep the column indices:

       a11 (1)
       a22 (2)  a23 (3)
       a31 (1)  a33 (3)

2. Sort the rows in decreasing order of their number of nonzeros:

       a22 (2)  a23 (3)
       a31 (1)  a33 (3)
       a11 (1)

3. Store the corresponding row permutation: PERM = 2 3 1

4. Store the jagged diagonals (the columns of step 2):

       VAL     = a22 a31 a11 a23 a33
       COL_IND = 2   1   1   3   3
       COL_PTR = 1   4   6

Pros: one manipulates longer vectors than with CSR (interesting on vector computers or GPUs).
Cons: extra indirection due to the permutation array.
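
The jagged diagonals are laid out precisely so that Y ← AX runs over long contiguous vectors. A minimal sketch using the arrays built above, where COL_PTR(d) points to the start of jagged diagonal d and NJD denotes their number (NJD and the loop structure are our notation, not a fixed convention):

    ! Y <- A*X with A in jagged diagonal storage
    Y(1:N) = 0
    DO d = 1, NJD                             ! loop over the jagged diagonals
       DO i = 1, COL_PTR(d+1) - COL_PTR(d)    ! long, vectorizable inner loop
          k = COL_PTR(d) + i - 1
          Y(PERM(i)) = Y(PERM(i)) + VAL(k) * X(COL_IND(k))
       END DO
    END DO

On the example, the first pass updates rows 2, 3 and 1 (through PERM) with a22*x2, a31*x1 and a11*x1; the second pass finishes rows 2 and 3.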


Example of elemental matrix format

$A = \begin{pmatrix} -1 & 2 & 3 & 0 & 0 \\ 2 & 1 & 1 & 0 & 0 \\ 1 & 1 & 3 & -1 & 3 \\ 0 & 0 & 1 & 2 & -1 \\ 0 & 0 & 3 & 2 & 1 \end{pmatrix} = A_1 + A_2$

$A_1 = \begin{pmatrix} -1 & 2 & 3 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$ (variables 1, 2, 3),  $A_2 = \begin{pmatrix} 2 & -1 & 3 \\ 1 & 2 & -1 \\ 3 & 2 & 1 \end{pmatrix}$ (variables 3, 4, 5)


Example of elemental matrix format

$A_1$ (variables 1, 2, 3) and $A_2$ (variables 3, 4, 5) as on the previous slide.

- N = 5, NELT = 2, NVAR = 6, $A = \sum_{i=1}^{NELT} A_i$

      ELTPTR [1:NELT+1] = 1 4 7
      ELTVAR [1:NVAR]   = 1 2 3 3 4 5
      ELTVAL [1:NVAL]   = -1 2 1 2 1 1 3 1 1 2 1 3 -1 2 2 3 -1 1

- Remarks:
  - NVAR = ELTPTR(NELT+1) - 1
  - NVAL = $\sum S_i^2$ (unsymmetric) or $\sum S_i(S_i+1)/2$ (symmetric), with $S_i$ = ELTPTR(i+1) - ELTPTR(i)
  - storage of elements in ELTVAL: by columns
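
To make the column-by-column layout of ELTVAL concrete, here is a minimal Fortran sketch that assembles the two element matrices into a dense array A (the dense target is for illustration only; a real solver would assemble into a sparse structure):

    program assemble_elements
      implicit none
      integer, parameter :: N = 5, NELT = 2, NVAR = 6, NVAL = 18
      integer :: ELTPTR(NELT+1) = (/ 1, 4, 7 /)
      integer :: ELTVAR(NVAR) = (/ 1, 2, 3, 3, 4, 5 /)
      double precision :: ELTVAL(NVAL) = (/ -1.d0, 2.d0, 1.d0,  2.d0, 1.d0, 1.d0, &
                                             3.d0, 1.d0, 1.d0,  2.d0, 1.d0, 3.d0, &
                                            -1.d0, 2.d0, 2.d0,  3.d0,-1.d0, 1.d0 /)
      double precision :: A(N,N)
      integer :: e, s, ii, jj, i, j, p
      A = 0.d0
      p = 1                                 ! current position in ELTVAL
      do e = 1, NELT
         s = ELTPTR(e+1) - ELTPTR(e)        ! size S_e of element e
         do jj = 1, s                       ! elements are stored by columns
            j = ELTVAR(ELTPTR(e) + jj - 1)  ! global column index
            do ii = 1, s
               i = ELTVAR(ELTPTR(e) + ii - 1)
               A(i,j) = A(i,j) + ELTVAL(p)  ! overlapping contributions are summed
               p = p + 1
            end do
         end do
      end do
      do i = 1, N
         print '(5f6.1)', A(i,:)            ! reproduces the matrix A above
      end do
    end program assemble_elements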


File storage: Rutherford-Boeing

- Standard ASCII format for files
- Header + data (CSC format). Key xyz:
  - x = [rcp] (real, complex, pattern)
  - y = [suhzr] (symmetric, unsymmetric, Hermitian, skew symmetric, rectangular)
  - z = [ae] (assembled, elemental)
  - e.g.: M_T1.RSA, SHIP003.RSE
- Supplementary files: right-hand sides, solution, permutations, ...
- A canonical format was introduced to guarantee a unique representation (order of entries in each column, no duplicates).


File storage: Rutherford-Boeing

DNV-Ex 1 : Tubular joint-1999-01-17 M_T1

1733710 9758 492558 1231394 0

rsa 97578 97578 4925574 0

(10I8) (10I8) (3e26.16)

1 49 96 142 187 231 274 346 417 487

556 624 691 763 834 904 973 1041 1108 1180

1251 1321 1390 1458 1525 1573 1620 1666 1711 1755

1798 1870 1941 2011 2080 2148 2215 2287 2358 2428

2497 2565 2632 2704 2775 2845 2914 2982 3049 3115

...

1 2 3 4 5 6 7 8 9 10

11 12 49 50 51 52 53 54 55 56

57 58 59 60 67 68 69 70 71 72

223 224 225 226 227 228 229 230 231 232

233 234 433 434 435 436 437 438 2 3

4 5 6 7 8 9 10 11 12 49

50 51 52 53 54 55 56 57 58 59

...

-0.2624989288237320E+10 0.6622960540857440E+09 0.2362753266740760E+11

0.3372081648690030E+08 -0.4851430162799610E+08 0.1573652896140010E+08

0.1704332388419270E+10 -0.7300763190874110E+09 -0.7113520995891850E+10

0.1813048723097540E+08 0.2955124446119170E+07 -0.2606931100955540E+07

0.1606040913919180E+07 -0.2377860366909130E+08 -0.1105180386670390E+09

0.1610636280324100E+08 0.4230082475435230E+07 -0.1951280618776270E+07

0.4498200951891750E+08 0.2066239484615530E+09 0.3792237438608430E+08

0.9819999042370710E+08 0.3881169368090200E+08 -0.4624480572242580E+08


File storage: Matrix-market

- Example:

      %%MatrixMarket matrix coordinate real general
      % Comments
      5 5 8
      1 1  1.000e+00
      2 2  1.050e+01
      3 3  1.500e-02
      1 4  6.000e+00
      4 2  2.505e+02
      4 4 -2.800e+02
      4 5  3.332e+01
      5 5  1.200e+01
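
Reading this format takes only a banner line, optional comment lines, a size line, and one (i, j, value) triple per line. A minimal Fortran sketch for the real, general, coordinate case (the file name is hypothetical, and a robust reader would also parse the banner and handle symmetric or pattern matrices):

    program read_matrix_market
      implicit none
      character(len=256) :: line
      integer :: m, n, nnz, k
      integer, allocatable :: irn(:), jcn(:)
      double precision, allocatable :: val(:)
      open(10, file='example.mtx', status='old')
      read(10,'(a)') line                  ! banner: %%MatrixMarket matrix ...
      do                                   ! skip the '%' comment lines
         read(10,'(a)') line
         if (line(1:1) /= '%') exit
      end do
      read(line,*) m, n, nnz               ! size line: M N NNZ
      allocate(irn(nnz), jcn(nnz), val(nnz))
      do k = 1, nnz                        ! one entry per line
         read(10,*) irn(k), jcn(k), val(k)
      end do
      close(10)
      print *, 'read a', m, 'x', n, 'matrix with', nnz, 'entries'
    end program read_matrix_market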


Examples of sparse matrix collections

- The University of Florida Sparse Matrix Collection http://www.cise.ufl.edu/research/sparse/matrices/
- Matrix Market http://math.nist.gov/MatrixMarket/
- Rutherford-Boeing http://www.cerfacs.fr/algor/Softs/RB/index.html
- TLSE http://gridtlse.org/



Gaussian elimination

$A = A^{(1)}$, $b = b^{(1)}$, $A^{(1)}x = b^{(1)}$:

$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$

    row 2 ← row 2 − row 1 × a21/a11
    row 3 ← row 3 − row 1 × a31/a11

$A^{(2)}x = b^{(2)}$:

$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a^{(2)}_{22} & a^{(2)}_{23} \\ 0 & a^{(2)}_{32} & a^{(2)}_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b^{(2)}_2 \\ b^{(2)}_3 \end{pmatrix}$

with $b^{(2)}_2 = b_2 - a_{21}b_1/a_{11}$, ..., $a^{(2)}_{32} = a_{32} - a_{31}a_{12}/a_{11}$, ...

Finally, $A^{(3)}x = b^{(3)}$:

$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a^{(2)}_{22} & a^{(2)}_{23} \\ 0 & 0 & a^{(3)}_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b^{(2)}_2 \\ b^{(3)}_3 \end{pmatrix}$

with $a^{(3)}_{33} = a^{(2)}_{33} - a^{(2)}_{32}a^{(2)}_{23}/a^{(2)}_{22}$, ...

Typical Gaussian elimination step k: $a^{(k+1)}_{ij} = a^{(k)}_{ij} - \dfrac{a^{(k)}_{ik}\,a^{(k)}_{kj}}{a^{(k)}_{kk}}$
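
This elimination step translates directly into the classical triple loop of an in-place dense LU factorization (no pivoting, for clarity); a textbook sketch, not the algorithm of any particular solver:

    ! in-place LU: on exit, U is in the upper triangle of A,
    ! and the multipliers l_ik (unit lower triangular L) below it
    do k = 1, n-1
       do i = k+1, n
          A(i,k) = A(i,k) / A(k,k)              ! l_ik = a_ik / a_kk
          do j = k+1, n
             A(i,j) = A(i,j) - A(i,k) * A(k,j)  ! a_ij <- a_ij - l_ik * a_kj
          end do
       end do
    end do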


Relation with A = LU factorization

- One step of Gaussian elimination can be written $A^{(k+1)} = L^{(k)} A^{(k)}$, with

  $L^{(k)} = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & -l_{k+1,k} & \ddots & \\ & & -l_{n,k} & & 1 \end{pmatrix}$ and $l_{ik} = \dfrac{a^{(k)}_{ik}}{a^{(k)}_{kk}}$.

- Then $A^{(n)} = U = L^{(n-1)} \cdots L^{(1)} A$, which gives $A = LU$, with

  $L = [L^{(1)}]^{-1} \cdots [L^{(n-1)}]^{-1} = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ l_{i,j} & & 1 \end{pmatrix}$,

  i.e. L is unit lower triangular and simply collects the multipliers $l_{i,j}$.

- In dense codes, the entries of L and U overwrite the entries of A.

- Furthermore, if A is symmetric, $A = LDL^T$ with $d_{kk} = a^{(k)}_{kk}$:
  $A = LU = A^T = U^T L^T$ implies $U(L^T)^{-1} = L^{-1}U^T = D$ diagonal, and $U = DL^T$, thus $A = L(DL^T) = LDL^T$.


Gaussian elimination and sparsity

Step k of the LU factorization (pivot $a_{kk}$):

- For i > k, compute $l_{ik} = a_{ik}/a_{kk}$ ($= a'_{ik}$).
- For i > k, j > k:
  $a'_{ij} = a_{ij} - \dfrac{a_{ik} \times a_{kj}}{a_{kk}}$  or  $a'_{ij} = a_{ij} - l_{ik} \times a_{kj}$
- If $a_{ik} \neq 0$ and $a_{kj} \neq 0$, then $a'_{ij} \neq 0$.
- If $a_{ij}$ was zero, its new nonzero value must be stored: fill-in.

[Figure: if entries (i,k), (k,k) and (k,j) are nonzero, the elimination of column k creates a nonzero in position (i,j): fill-in.]


- The same holds for Cholesky:
  - For i > k, compute $l_{ik} = a_{ik}/\sqrt{a_{kk}}$ ($= a'_{ik}$).
  - For i > k, j > k, j ≤ i (lower triangular part):
    $a'_{ij} = a_{ij} - \dfrac{a_{ik} \times a_{jk}}{a_{kk}}$  or  $a'_{ij} = a_{ij} - l_{ik} \times a'_{jk}$


Example

- Original matrix:

      x x x x x
      x x
      x   x
      x     x
      x       x

- The matrix is full after the first step of elimination.
- After reordering the matrix (1st row and column ↔ last row and column):


      x       x
        x     x
          x   x
            x x
      x x x x x

- No fill-in.
- Ordering the variables has a strong impact on
  - the fill-in,
  - the number of operations.
- Finding the best ordering is an NP-hard problem in general (Yannakakis, 1981).
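
The effect of this reordering can be checked by a symbolic elimination that only propagates the nonzero pattern (rule: $a'_{ij}$ becomes nonzero when $a_{ik}$ and $a_{kj}$ are). A minimal sketch on the two 5×5 arrow matrices above (all names are ours):

    program fill_demo
      implicit none
      logical :: s(5,5)
      integer :: fill
      call arrow(s, .true.)                 ! dense first row and column
      call symbolic(s, fill)
      print *, 'arrow first: fill-in =', fill   ! 12 entries created
      call arrow(s, .false.)                ! dense last row and column
      call symbolic(s, fill)
      print *, 'arrow last:  fill-in =', fill   ! 0 entries created
    contains
      subroutine arrow(s, first)
        logical, intent(out) :: s(5,5)
        logical, intent(in)  :: first
        integer :: i, d
        d = merge(1, 5, first)
        s = .false.
        do i = 1, 5
           s(i,i) = .true.; s(d,i) = .true.; s(i,d) = .true.
        end do
      end subroutine arrow
      subroutine symbolic(s, fill)          ! pattern-only Gaussian elimination
        logical, intent(inout) :: s(5,5)
        integer, intent(out) :: fill
        integer :: i, j, k
        fill = 0
        do k = 1, 4
           do i = k+1, 5
              if (.not. s(i,k)) cycle
              do j = k+1, 5
                 if (s(k,j) .and. .not. s(i,j)) then
                    s(i,j) = .true.         ! new nonzero: fill-in
                    fill = fill + 1
                 end if
              end do
           end do
        end do
      end subroutine symbolic
    end program fill_demo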


Illustration: Reverse Cuthill-McKee on matrix dwt 592.rua

Harwell-Boeing matrix dwt 592.rua, structural computing on a submarine. NZ(LU factors) = 58202.

[Figure: original matrix (nz = 5104) and factorized matrix (nz = 58202).]


Illustration: Reverse Cuthill-McKee on matrix dwt 592.rua

NZ(LU factors) = 16924.

[Figure: permuted matrix (RCM, nz = 5104) and factorized permuted matrix (nz = 16924).]


Table: benefits of sparsity on a matrix of order 2021 × 2021 with 7353 nonzeros (Dongarra et al., 1991).

    Procedure                      Total storage   Flops         Time (sec.) on CRAY J90
    Full system                    4084 Kwords     5503 x 10^6   34.5
    Sparse system                    71 Kwords     1073 x 10^6    3.4
    Sparse system and reordering     14 Kwords       42 x 10^3    0.9


Control of numerical stability: numerical pivoting

- In dense linear algebra, partial pivoting is commonly used (at each step, the largest entry in the column is selected).
- In sparse linear algebra, flexibility to preserve sparsity is offered:
  - Partial threshold pivoting: eligible pivots are not too small with respect to the maximum in the column.
    Set of eligible pivots = $\{\, r : |a^{(k)}_{rk}| \ge u \times \max_i |a^{(k)}_{ik}| \,\}$, where $0 < u \le 1$.
  - Then, among the eligible pivots, one that better preserves sparsity is selected.
  - u is called the threshold parameter (u = 1 → partial pivoting).
  - It restricts the maximum possible growth of $a_{ij} = a_{ij} - \frac{a_{ik} \times a_{kj}}{a_{kk}}$ to $1 + \frac{1}{u}$ per step, which is sufficient to preserve numerical stability.
  - u ≈ 0.1 is often chosen in practice.
- For symmetric indefinite problems, 2×2 pivots (with threshold) are also used, to preserve symmetry and sparsity.


Threshold pivoting and numerical accuracy

Table: effect of the variation of the threshold parameter u on a 541 × 541 matrix with 4285 nonzeros (Dongarra et al., 1991).

    u        Nonzeros in LU factors   Error
    1.0      16767                    3 x 10^-9
    0.25     14249                    6 x 10^-10
    0.1      13660                    4 x 10^-9
    0.01     15045                    1 x 10^-5
    10^-4    16198                    1 x 10^2
    10^-10   16553                    3 x 10^23


Difficulty: numerical pivoting implies dynamic data structures that cannot be forecast symbolically.


Three-phase scheme to solve Ax = b

1. Analysis step
   - Preprocessing of A (symmetric/unsymmetric orderings, scalings)
   - Build the dependency graph (elimination tree, eDAG, ...)
2. Factorization (A = LU, LDL^T, LL^T, QR), with numerical pivoting
3. Solution based on the factored matrices
   - Triangular solves: Ly = b, then Ux = y
   - Improvement of the solution (iterative refinement), error analysis


Efficient implementation of sparse algorithms

- Indirect addressing is often used in sparse calculations, e.g. sparse SAXPY:

      do i = 1, m
         A( ind(i) ) = A( ind(i) ) + alpha * w( i )
      enddo

- Even if manufacturers provide hardware support to improve indirect addressing, it penalizes performance.
- Identify dense blocks, or switch to dense calculations as soon as the matrix is not sparse enough.


Effect of switch to dense calculations

Matrix from a 5-point discretization of the Laplacian on a 50 × 50 grid (Dongarra et al., 1991):

    Density for switch   Order of full   Millions   Time
    to full code         submatrix       of flops   (seconds)
    No switch               0                7       21.8
    1.00                   74                7       21.4
    0.80                  190                8       15.0
    0.60                  235               11       12.5
    0.40                  305               21        9.0
    0.20                  422               50        5.5
    0.10                  531              100        3.7
    0.005                1420             1908        6.1

The sparse structure should be exploited if the density is below 10%.



Main processor (r)evolutions

- pipelined functional units
- superscalar processors
- out-of-order execution (ILP)
- larger caches
- evolution of the instruction sets (CISC, RISC, EPIC, ...)
- multicores


Why parallel processing?

- Computing needs are not met in many disciplines (to solve significant problems)
- Uniprocessor performance is approaching the physical limits:
  cycle time of 0.5 nanosecond ↔ 8 GFlop/s (with 4 floating-point operations per cycle)
- A 40 TFlop/s computer ⇒ 5000 cores → massively parallel computers
- Not because it is the simplest way, but because it is necessary

Current power (June 2010, cf. http://www.top500.org):
Cray XT5, Oak Ridge National Lab: 1.7 PFlop/s, 300 TBytes of memory, 224256 cores


Some units for high-performance computing

Speed:

    1 MFlop/s   1 Megaflop/s   10^6  operations / second
    1 GFlop/s   1 Gigaflop/s   10^9  operations / second
    1 TFlop/s   1 Teraflop/s   10^12 operations / second
    1 PFlop/s   1 Petaflop/s   10^15 operations / second
    1 EFlop/s   1 Exaflop/s    10^18 operations / second

Memory:

    1 kB / 1 ko   1 kilobyte   10^3  bytes
    1 MB / 1 Mo   1 Megabyte   10^6  bytes
    1 GB / 1 Go   1 Gigabyte   10^9  bytes
    1 TB / 1 To   1 Terabyte   10^12 bytes
    1 PB / 1 Po   1 Petabyte   10^15 bytes


Performance measures

- Number of floating-point operations per second (not MIPS)
- Peak performance:
  - what appears in the vendors' advertisements
  - assumes that all functional units are active
  - you are guaranteed not to go faster:
    peak performance = #functional units / cycle time (sec.)
- Real performance:
  - usually much lower than the peak (unfortunately)


The ratio (real performance / peak performance) is often low!!
Let P be a program:

1. Sequential processor:
   - 1 scalar unit (1 GFlop/s)
   - execution time of P: 100 s
2. Parallel machine with 100 processors:
   - each processor: 1 GFlop/s
   - peak performance: 100 GFlop/s
3. If P consists of 10% sequential code + 90% parallelized code:
   - execution time of P: 10 + 0.9 = 10.9 s
   - real performance: 9.2 GFlop/s
4. Real performance / peak performance = 0.1


Moore’s law

- Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of integrated circuits would double every 24 months.
- It has also served as a target for manufacturers.
- It has been distorted over time:
  - 24 → 18 months
  - number of transistors → performance


How to increase computing speed?

- Increase the frequency with faster technologies. We are approaching the limits:
  - chip design
  - power consumption and heat dissipation
  - cooling ⇒ space problems
- Further miniaturization is possible, but:
  - not indefinitely
  - the resistance of the conductors (R = ρ × l / s) increases, and resistance is responsible for energy dissipation (Joule effect)
  - capacitance effects are difficult to control

  Remark: 0.5 nanosecond = the time for a signal to travel along 15 cm of cable

- Cycle time of 0.5 nanosecond ↔ 8 GFlop/s (with 4 floating-point operations per cycle)


The only solution: parallelism

- Parallelism: simultaneous execution of several instructions within a program
- Inside a core:
  - micro-instructions
  - pipelined processing
  - overlap of instructions executed by distinct computing units
  → transparent for the programmer (handled by the compiler or at run time)
- Between distinct processors or cores:
  - different instruction streams are executed → synchronizations:
    - implicit (compiler, automatic parallelization, use of parallel libraries)
    - or explicit (message passing, multithreaded programming, critical sections)


The data access problem

- In practice, one often runs at 10% of the peak performance
- Faster processors → data must be accessed faster:
  - processor organization,
  - memory organization,
  - inter-processor communication
- More complex hardware: pipeline, technology, network, ...
- More complex software: compiler, operating system, programming languages, management of parallelism, debugging tools, ... and applications

It becomes more difficult to program efficiently.


Memory bandwidth problems

- Access to data is a crucial problem in modern computers:
  - processor performance: +60% per year
  - DRAM memory: +9% per year
  → the ratio (processor performance / memory access time) increases by about 50% per year!
  MFlop/s are easier to obtain than MB/s of memory bandwidth.
- The memory hierarchy is more and more complex (and latency increases)
- The way data are accessed becomes more and more critical:
  - minimize cache misses
  - minimize memory paging
  - locality: improve the ratio of references to local memories over references to remote memories
  - reuse, blocking: increase the flops / memory-accesses ratio
  - manage data transfers "by hand"? (Cell, GPU)


    Level            Size             Average access time (# cycles), hit/miss
    Registers                         < 1
    Cache level 1    1 - 128 KB       1-2 / 8-66
    Cache level 2    256 KB - 16 MB   6-15 / 30-200
    Main memory                       10 - 100
    Remote memory                     500 - 5000
    Disks            1 - 10 GB        700,000 / 6,000,000

Figure: example of a memory hierarchy.


Memory design for a large number of processors?

How can 500 processors access data stored in a shared memory (technology, interconnection, price?)
→ Solution with a reasonable cost: physically distributed memory (each processor or group of processors has its own local memory)

- Two approaches:
  - globally addressable local memories: virtual shared memory computers
  - explicit data transfers between processors using message passing
- Scalability requires:
  - a linear increase of the memory bandwidth with the processor speed
  - an increase of the communication bandwidth with the number of processors
- Cost/performance ratio → distributed memory and a good cost/performance ratio for the processors


Architecture of multiprocessors

A high number of processors → physically distributed memory.

    Logical        Physical organization
    organization   Shared (32 procs max)        Distributed
    ------------------------------------------------------------------------
    Shared         shared-memory                global address space (hard/soft)
                   multiprocessors              on top of message passing
                                                (virtual shared memory)
    Distributed    emulation of message         message passing
                   passing (buffers)

Table: organization of the processors.


Terminology

SMP (Symmetric Multi-Processor) architecture:

- Memory shared both physically and logically
- Uniform memory access time
- Similar, from the application point of view, to multicore architectures (1 core = 1 logical processor)
- But communications are much faster inside multicores (latency < 3 ns, bandwidth > 20 GB/s) than inside SMPs (latency ≈ 60 ns, bandwidth ≈ 2 GB/s)

NUMA (Non Uniform Memory Access) architecture:

- Memory physically distributed and logically shared
- Easier to increase the number of processors than with SMP
- The access time depends on the location of the data
- Local accesses are faster than remote accesses
- The hardware maintains cache coherence (ccNUMA)


Programming

Programming standards:

- Shared logical organization: POSIX threads, OpenMP directives
- Distributed logical organization: PVM, MPI, sockets (message passing)
- Hybrid programming: MPI + OpenMP
- Machines with 1 million cores? Emerging architectures such as GPGPUs? No standard yet!


Evolution of high-performance computing

- Rapid evolution of high-performance architectures (SMP, clusters, NUMA, multicores, Cell, GPUs, ...)
- Parallelism at several levels
- More and more complex memory hierarchies
⇒ Programming becomes more and more difficult, with software tools and standards that always lag behind.



Numerical simulation and sparse matrices

- General approach in scientific computing:
  1. Simulation problem (continuous problem)
  2. Application of physical laws (partial differential equations)
  3. Discretization, finite-dimensional equations
  4. Solution of linear systems (Ax = b)
  5. (Study of the results, possible revision of the model or of the method)
- The solution of linear systems is a fundamental algorithmic kernel. Parameters to take into account:
  - properties of the system (symmetry, positive definiteness, conditioning, over-determination, ...)
  - structure: dense or sparse,
  - size: several millions of equations?


Partial differential equations

- Modeling of a physical phenomenon
- Differential equations involving: forces, moments, temperatures, velocities, energies, time
- Analytical solutions are rarely available


Examples of partial differential equations

- Find the electric potential $\varphi$ for a given charge distribution f:
  $\nabla^2\varphi = f \Leftrightarrow \Delta\varphi = f$, i.e.
  $\frac{\partial^2\varphi}{\partial x^2}(x,y,z) + \frac{\partial^2\varphi}{\partial y^2}(x,y,z) + \frac{\partial^2\varphi}{\partial z^2}(x,y,z) = f(x,y,z)$
- Heat equation (Fourier's equation):
  $\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = \frac{1}{\alpha}\frac{\partial u}{\partial t}$
  with
  - $u = u(x,y,z,t)$: temperature,
  - $\alpha$: thermal diffusivity of the medium.
- Wave propagation equations, Schrödinger equation, Navier-Stokes, ...


Discretization (the step that follows the physical modeling)

The numerical analyst's tasks:

- Construction of a mesh (regular, irregular)
- Choice of the solution methods and study of their behavior
- Study of the loss of information due to the passage to finite dimension

Main discretization techniques:

- Finite differences
- Finite elements
- Finite volumes


Discretization with finite differences (1D)

- Basic approximation (valid if h is small enough):
  $\frac{du}{dx}(x) \approx \frac{u(x+h) - u(x)}{h}$
- This results from Taylor's formula:
  $u(x+h) = u(x) + h\frac{du}{dx} + \frac{h^2}{2}\frac{d^2u}{dx^2} + \frac{h^3}{6}\frac{d^3u}{dx^3} + O(h^4)$
- Replacing h by −h:
  $u(x-h) = u(x) - h\frac{du}{dx} + \frac{h^2}{2}\frac{d^2u}{dx^2} - \frac{h^3}{6}\frac{d^3u}{dx^3} + O(h^4)$
- Thus, adding the two expansions:
  $\frac{d^2u}{dx^2} = \frac{u(x+h) - 2u(x) + u(x-h)}{h^2} + O(h^2)$
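
The $O(h^2)$ behavior is easy to observe numerically. A small self-contained check on $u(x) = \sin(x)$, whose second derivative is $-\sin(x)$ (the test point and step sizes are arbitrary):

    program check_centered_difference
      implicit none
      double precision :: h, x, approx, err
      integer :: p
      x = 1.d0
      do p = 1, 5
         h = 10.d0**(-p)
         approx = (sin(x+h) - 2.d0*sin(x) + sin(x-h)) / h**2
         err = abs(approx + sin(x))        ! exact value is -sin(x)
         print '(a,es9.2,a,es12.4)', ' h =', h, '   error =', err
      end do
    end program check_centered_difference

Until rounding error takes over, dividing h by 10 divides the error by about 100.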


Discretization with finite differences (1D)

$\frac{d^2u}{dx^2} = \frac{u(x+h) - 2u(x) + u(x-h)}{h^2} + O(h^2)$

3-point stencil for the centered difference approximation to the second-order derivative: weights (1, −2, 1).


Finite Differences for the Laplacian Operator (2D)

Assuming the same mesh refinement h in the x and y directions:

$\Delta u(x,y) \approx \frac{u(x-h,y) - 2u(x,y) + u(x+h,y)}{h^2} + \frac{u(x,y-h) - 2u(x,y) + u(x,y+h)}{h^2}$

$\Delta u(x,y) \approx \frac{1}{h^2}\left(u(x-h,y) + u(x+h,y) + u(x,y-h) + u(x,y+h) - 4u(x,y)\right)$

[Figure: 5-point stencils for the centered difference approximation to the Laplacian operator: (left) standard, (right) skewed. Center weight −4, neighbor weights 1.]


27-point stencil used for 3D geophysical applications (collaboration with Geoscience Azur, S. Operto and J. Virieux).


1D example

- Consider the problem
  $-u''(x) = f(x)$ for $x \in (0,1)$, with $u(0) = u(1) = 0$
- $x_i = i \times h$, $i = 0, \dots, n+1$, $f(x_i) = f_i$, $u(x_i) = u_i$, $h = 1/(n+1)$
- Centered difference approximation:
  $-u_{i-1} + 2u_i - u_{i+1} = h^2 f_i$ (with $u_0 = u_{n+1} = 0$)
- We obtain a linear system $Au = f$, or, for $n = 6$:

$\frac{1}{h^2}\begin{pmatrix} 2 & -1 & 0 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 & 0 \\ 0 & 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & 0 & -1 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \\ f_5 \\ f_6 \end{pmatrix}$
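
For this tridiagonal system, Gaussian elimination reduces to the Thomas algorithm (forward elimination plus back substitution, with no fill-in). A minimal sketch for n = 6 and the illustrative right-hand side f ≡ 1, for which the exact solution is u(x) = x(1−x)/2 and the scheme reproduces it exactly:

    program fd1d_thomas
      implicit none
      integer, parameter :: n = 6
      double precision :: h, d(n), u(n)
      integer :: i
      h = 1.d0 / (n+1)
      d = 2.d0                              ! diagonal; off-diagonals are -1
      u = h*h * 1.d0                        ! right-hand side h^2 * f_i, f = 1
      do i = 2, n                           ! forward elimination
         d(i) = d(i) - 1.d0 / d(i-1)        ! (-1)*(-1)/d(i-1)
         u(i) = u(i) + u(i-1) / d(i-1)
      end do
      u(n) = u(n) / d(n)                    ! back substitution
      do i = n-1, 1, -1
         u(i) = (u(i) + u(i+1)) / d(i)
      end do
      do i = 1, n                           ! compare with x(1-x)/2
         print '(2f12.6)', u(i), (i*h)*(1.d0 - i*h)/2.d0
      end do
    end program fd1d_thomas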


Slightly more complicated (2D)

Consider an elliptic PDE:

$-\frac{\partial}{\partial x}\!\left(a(x,y)\frac{\partial u}{\partial x}\right) - \frac{\partial}{\partial y}\!\left(b(x,y)\frac{\partial u}{\partial y}\right) + c(x,y)\,u = g(x,y)$ on $\Omega$

$u(x,y) = 0$ on $\partial\Omega$

with $0 \le x, y \le 1$, $a(x,y) > 0$, $b(x,y) > 0$, $c(x,y) \ge 0$.


- Case of a regular 2D mesh: discretization step $h = \frac{1}{n+1}$, with $n = 4$.

  [Figure: unit square meshed by a regular grid; the interior unknowns are numbered 1, 2, 3, 4, 5, ... row by row.]

- 5-point finite difference scheme:

  $\frac{\partial\left(a\,\frac{\partial u}{\partial x}\right)_{ij}}{\partial x} = \frac{a_{i+\frac12,j}\,(u_{i+1,j} - u_{i,j})}{h^2} - \frac{a_{i-\frac12,j}\,(u_{i,j} - u_{i-1,j})}{h^2} + O(h^2)$

- Similarly:

  $\frac{\partial\left(b\,\frac{\partial u}{\partial y}\right)_{ij}}{\partial y} = \frac{b_{i,j+\frac12}\,(u_{i,j+1} - u_{i,j})}{h^2} - \frac{b_{i,j-\frac12}\,(u_{i,j} - u_{i,j-1})}{h^2} + O(h^2)$


- $a_{i+\frac12,j}$, $b_{i,j+\frac12}$, $c_{ij}$, ... are known.
- With the ordering of the unknowns of the example, we obtain a linear system of the form $Ax = b$, where
  $x_1 \leftrightarrow u_{1,1} = u(\frac{1}{n+1}, \frac{1}{n+1})$,
  $x_2 \leftrightarrow u_{2,1} = u(\frac{2}{n+1}, \frac{1}{n+1})$,
  $x_3 \leftrightarrow u_{3,1}$, $x_4 \leftrightarrow u_{4,1}$, $x_5 \leftrightarrow u_{1,2}$, ...
- A is $n^2 \times n^2$ and b is of size $n^2$, with the following structure:


(x = nonzero entry; 0 = zero entry inside the band, at the boundaries between mesh rows)

     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
    |x x     x                      |  1  |g11|
    |x x x     x                    |  2  |g21|
    |  x x x     x                  |  3  |g31|
    |    x x 0     x                |  4  |g41|
    |x     0 x x     x              |  5  |g12|
    |  x     x x x     x            |  6  |g22|
    |    x     x x x     x          |  7  |g32|
A = |      x     x x 0     x        |  8  |g42| = b
    |        x     0 x x     x      |  9  |g13|
    |          x     x x x     x    | 10  |g23|
    |            x     x x x     x  | 11  |g33|
    |              x     x x 0     x| 12  |g43|
    |                x     0 x x    | 13  |g14|
    |                  x     x x x  | 14  |g24|
    |                    x     x x x| 15  |g34|
    |                      x     x x| 16  |g44|
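For constant coefficients (a = b = 1, c = 0) the matrix above is the 5-point Laplacian, and one convenient way to build it, a hedged sketch rather than the slides' construction, is as a Kronecker sum of 1D tridiagonal matrices:

  import scipy.sparse as sp

  n = 4
  T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
  I = sp.identity(n)
  A = sp.kron(I, T) + sp.kron(T, I)   # n^2 x n^2, same nonzero pattern as above
                                      # (values for a = b = 1, c = 0, times h^2)
  print(A.toarray().astype(int))      # 4 on the diagonal, -1 for mesh neighbours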


Solution of the linear system

Often the most costly part in a numerical simulation code

I Direct methods:
  I LU factorization followed by triangular substitutions
  I parallelism depends highly on the structure of the matrix

I Iterative methods:
  I usually rely on sparse matrix-vector products
  I algebraic preconditioners useful


Evolution in time of a complex phenomenon

I Examples:
  I climate modeling, evolution of radioactive waste, …

I heat equation:

  ∆u(x, y, z, t) = ∂u(x, y, z, t)/∂t
  u(x, y, z, t_0) = u_0(x, y, z)

I Discretization in both space and time (1D case):

  I Explicit approaches:

    (u_j^{n+1} − u_j^n) / (t_{n+1} − t_n) = (u_{j+1}^n − 2u_j^n + u_{j−1}^n) / h²

  I Implicit approaches:

    (u_j^{n+1} − u_j^n) / (t_{n+1} − t_n) = (u_{j+1}^{n+1} − 2u_j^{n+1} + u_{j−1}^{n+1}) / h²

I Implicit approaches are preferred (more stable, larger timesteps possible) but are more numerically intensive: a sparse linear system must be solved at each time step, as in the sketch below.
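A minimal Python/SciPy sketch (illustrative only; grid size, time step, and initial condition are made up) of backward Euler for the 1D heat equation: each step solves the sparse system (I + dt/h² T) u^{n+1} = u^n, and the factorization of that matrix can be reused at every step:

  import numpy as np
  import scipy.sparse as sp
  import scipy.sparse.linalg as spla

  n, dt = 100, 1e-3
  h = 1.0 / (n + 1)
  T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
  A = sp.identity(n, format="csc") + (dt / h**2) * T

  x = np.arange(1, n + 1) * h
  u = np.sin(np.pi * x)               # initial condition u_0
  lu = spla.splu(A)                   # factor once ...
  for _ in range(10):
      u = lu.solve(u)                 # ... reuse the factors at every time step
  print(u.max())                      # decays, roughly like exp(-pi^2 t)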


Discretization with Finite elements

I Consider a partial differential equation of the form (Poisson equation):

  ∆u = ∂²u/∂x² + ∂²u/∂y² = f
  u = 0 on ∂Ω

I We can show (using Green's formula) that the previous problem is equivalent to:

  a(u, v) = −∫_Ω f v dx dy   for all v such that v = 0 on ∂Ω

  where a(u, v) = ∫_Ω (∂u/∂x ∂v/∂x + ∂u/∂y ∂v/∂y) dx dy


Finite element scheme: 1D Poisson Equation

I ∆u = ∂²u/∂x² = f, u = 0 on ∂Ω

I Equivalent to:

  a(u, v) = g(v) for all v (v|∂Ω = 0)

  where a(u, v) = ∫_Ω (∂u/∂x)(∂v/∂x) and g(v) = −∫_Ω f(x) v(x) dx

  (1D: similar to integration by parts)

I Idea: we search u of the form u = Σ_k α_k Φ_k(x), with (Φ_k)_{k=1,n} a basis of functions such that Φ_k is linear on each E_i, and Φ_k(x_i) = δ_ik (= 1 if k = i, 0 otherwise).

(figure: hat functions Φ_{k−1}, Φ_k, Φ_{k+1} on Ω, with Φ_k supported on the elements E_k and E_{k+1} around node x_k)


Finite Element Scheme: 1D Poisson Equation

(figure: hat functions Φ_{k−1}, Φ_k, Φ_{k+1} on Ω, with Φ_k supported on the elements E_k and E_{k+1} around node x_k)

I We rewrite a(u, v) = g(v) for all Φ_k:

  a(u, Φ_k) = g(Φ_k) for all k  ⇔  Σ_i α_i a(Φ_i, Φ_k) = g(Φ_k)

  a(Φ_i, Φ_k) = ∫_Ω (∂Φ_i/∂x)(∂Φ_k/∂x) = 0 when |i − k| ≥ 2

I kth equation, associated with Φ_k:

  α_{k−1} a(Φ_{k−1}, Φ_k) + α_k a(Φ_k, Φ_k) + α_{k+1} a(Φ_{k+1}, Φ_k) = g(Φ_k)

I a(Φ_{k−1}, Φ_k) = ∫_{E_k} (∂Φ_{k−1}/∂x)(∂Φ_k/∂x)

  a(Φ_{k+1}, Φ_k) = ∫_{E_{k+1}} (∂Φ_{k+1}/∂x)(∂Φ_k/∂x)

  a(Φ_k, Φ_k) = ∫_{E_k} (∂Φ_k/∂x)² + ∫_{E_{k+1}} (∂Φ_k/∂x)²


Finite Element Scheme: 1D Poisson Equation

From the point of view of E_k, we have a 2×2 contribution matrix:

  ( ∫_{E_k} (∂Φ_{k−1}/∂x)²           ∫_{E_k} (∂Φ_{k−1}/∂x)(∂Φ_k/∂x) )     ( I_{E_k}(Φ_{k−1}, Φ_{k−1})   I_{E_k}(Φ_{k−1}, Φ_k) )
  ( ∫_{E_k} (∂Φ_k/∂x)(∂Φ_{k−1}/∂x)   ∫_{E_k} (∂Φ_k/∂x)²             )  =  ( I_{E_k}(Φ_k, Φ_{k−1})       I_{E_k}(Φ_k, Φ_k)     )

(figure: Ω split into elements E_1, E_2, E_3, E_4 with nodes 0, 1, 2, 3, 4 and hat functions Φ_1, Φ_2, Φ_3)

Assembling the element contributions gives, for the 3 interior nodes:

  [ I_{E_1}(Φ_1,Φ_1) + I_{E_2}(Φ_1,Φ_1)   I_{E_2}(Φ_1,Φ_2)                                  ] [α_1]   [g(Φ_1)]
  [ I_{E_2}(Φ_2,Φ_1)                      I_{E_2}(Φ_2,Φ_2) + I_{E_3}(Φ_2,Φ_2)   I_{E_3}(Φ_2,Φ_3)                    ] [α_2] = [g(Φ_2)]
  [                                       I_{E_3}(Φ_3,Φ_2)                      I_{E_3}(Φ_3,Φ_3) + I_{E_4}(Φ_3,Φ_3) ] [α_3]   [g(Φ_3)]
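A small Python sketch of exactly this assembly by element contributions for the 1D P1 stiffness matrix (illustrative only; real codes assemble directly into a sparse structure):

  import numpy as np
  import scipy.sparse as sp

  n_el = 4                          # elements E_1..E_4, nodes x_0..x_4 as above
  h = 1.0 / n_el
  C = np.array([[1.0, -1.0],
                [-1.0, 1.0]]) / h   # 2x2 element matrix: I_E(Φ_i, Φ_j) = ∫_E Φ_i' Φ_j'

  K = np.zeros((n_el + 1, n_el + 1))
  for e in range(n_el):             # element E_{e+1} couples nodes e and e+1
      K[e:e + 2, e:e + 2] += C

  K_int = sp.csr_matrix(K[1:-1, 1:-1])  # keep interior nodes (u = 0 at the ends)
  print(K_int.toarray() * h)            # (1/h) * tridiag(-1, 2, -1) pattern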


Finite Element Scheme in Higher Dimension

I Can be used for higher dimensions

I Mesh can be irregular

I Φi can be a higher degree polynomial

I Matrix pattern depends on mesh connectivity/ordering


Finite Element Scheme in Higher Dimension

I Set of elements (tetrahedra, triangles) to assemble:

  (figure: triangle T with vertices i, j, k)

         [ a^T_{i,i}  a^T_{i,j}  a^T_{i,k} ]
  C(T) = [ a^T_{j,i}  a^T_{j,j}  a^T_{j,k} ]
         [ a^T_{k,i}  a^T_{k,j}  a^T_{k,k} ]

Needs for the parallel case

I Assemble the sparse matrix A = Σ_k C(T_k): graph coloring algorithms

I Parallelization domain by domain: graph partitioning

I Solution of Ax = b: high performance matrix computation kernels


Other example: linear least squares

I mathematical model + approximate measurements ⇒ estimate parameters of the model

I m "experiments" + n parameters x_i: min ‖Ax − b‖ with:
  I A ∈ R^{m×n}, m ≥ n: data matrix
  I b ∈ R^m: vector of observations
  I x ∈ R^n: parameters of the model

I Solving the problem (see the sketch below):
  I Decompose A under the form A = QR, with Q orthogonal, R triangular
  I ‖Ax − b‖ = ‖Q^T Ax − Q^T b‖ = ‖Q^T QRx − Q^T b‖ = ‖Rx − Q^T b‖

I Problems can be large (meteorological data, …), sparse or not
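A hedged Python/NumPy-SciPy sketch (the data is random, for illustration only): solve min ‖Ax − b‖ either with a dense QR factorization, as outlined above, or with LSQR, an iterative solver suited to sparse A:

  import numpy as np
  import scipy.sparse as sp
  import scipy.sparse.linalg as spla

  m, n = 20, 5
  rng = np.random.default_rng(0)
  A = rng.standard_normal((m, n))       # data matrix: m experiments, n parameters
  b = rng.standard_normal(m)            # vector of observations

  Q, R = np.linalg.qr(A)                # A = QR, Q with orthonormal columns
  x_qr = np.linalg.solve(R, Q.T @ b)    # solve R x = Q^T b

  x_lsqr = spla.lsqr(sp.csr_matrix(A), b)[0]   # sparse-friendly alternative
  print(np.allclose(x_qr, x_lsqr, atol=1e-6))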


Introduction to Sparse Matrix Computations
  Motivation and main issues
  Sparse matrices
  Gaussian elimination
  Parallel and high performance computing
  Numerical simulation and sparse matrices
  Direct vs iterative methods
  Conclusion


Solution of sparse linear systems Ax = b: direct or iterative approaches?

Direct methods

I Very general/robust
  I Numerical accuracy
  I Irregular/unstructured problems

I Factorization of matrix A
  I May be costly (flops/memory)
  I Factors can be reused for multiple right-hand sides b

I Computing issues
  I Good granularity of computations
  I Several levels of parallelism can be exploited

Iterative methods

I Efficiency depends on:
  I convergence – preconditioning
  I numerical properties / structure of A

I Rely on an efficient matrix-vector product
  I memory effective
  I successive right-hand sides are problematic

I Computing issues
  I Smaller granularity of computations
  I Often, only one level of parallelism
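The factor-reuse point on the direct side can be made concrete with a hedged Python/SciPy illustration (matrix and right-hand sides made up for the example): the direct solver factors A once and reuses the factors, while the iterative solver (here CG) redoes its iterations for each right-hand side:

  import numpy as np
  import scipy.sparse as sp
  import scipy.sparse.linalg as spla

  n = 200
  A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
  rng = np.random.default_rng(0)
  B = rng.random((n, 3))                        # three successive right-hand sides

  lu = spla.splu(A)                             # direct: factor A once ...
  X_dir = np.column_stack([lu.solve(B[:, k]) for k in range(3)])

  # iterative: one full CG run per right-hand side (A is SPD here)
  X_it = np.column_stack([spla.cg(A, B[:, k])[0] for k in range(3)])
  print(np.max(np.abs(X_dir - X_it)))           # agree up to the CG tolerance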


Introduction to Sparse Matrix Computations
  Motivation and main issues
  Sparse matrices
  Gaussian elimination
  Parallel and high performance computing
  Numerical simulation and sparse matrices
  Direct vs iterative methods
  Conclusion


Summary – sparse matrices

I Widely used in engineering and industry

I Irregular data structures

I Strong relations with graph theory

I Efficient algorithms are critical:
  I Ordering
  I Sparse Gaussian elimination
  I Sparse matrix-vector multiplication
  I Parallelization

I Challenges:
  I Modern applications leading to
    I bigger and bigger problems
    I different types of matrices and requirements

  I Dynamic data structures (numerical pivoting) → need for dynamic scheduling

I More and more parallelism (evolution of parallel architectures)


Suggested home reading

I Google PageRank: "The World's Largest Matrix Computation", Cleve Moler.

I "Architecture and Performance Characteristics of Modern High Performance Computers", Georg Hager and Gerhard Wellein, Lect. Notes in Physics.

I "Optimization Techniques for Modern High Performance Computers", Georg Hager and Gerhard Wellein, Lect. Notes in Physics.
