Efficient computation of sparse matrix
functions for large scale electronic
structure calculations: The CheSS
library
Stephan Mohr, William Dawson, Michael Wagner, Damien
Caliste, Takahito Nakajima, Luigi Genovese
Barcelona Supercomputing Center
MaX Conference
Trieste, 30th January 2018
Motivation and Applicability Theory Sparsity Accuracy and scaling Parallel scaling Comparison Outlook
Outline
1 Motivation and Applicability
2 Theory behind CheSS
3 Sparsity and truncation
4 Accuracy and scaling with matrix properties
5 Parallel scaling
6 Comparison with other methods
7 Outlook and conclusions
Motivation for CheSS
Linear algebra operations appear in all electronic structure codes and are in principle easy to handle (standard libraries).
However, they can become a severe bottleneck (up to O(N³)).
The straightforward approach is wasteful in the case of sparse matrices, which arise as a:
consequence of a localized basis set (e.g. Gaussians, wavelets, etc.)
consequence of the intrinsic localization properties of the system
We created a standalone library for sparse linear algebra operations, specifically tailored for electronic structure codes: Chebyshev Sparse Solvers (CheSS).
CheSS can be obtained for free from https://launchpad.net/chess
More details in J. Chem. Theory Comput., 2017, 13 (10), pp 4684–4698
Applicability of CheSS
CheSS performs best for matrices exhibiting a small spectral width.
Can this be obtained in practice?
                          overlap matrix S                     Hamiltonian H
system          #atoms   sparsity   εmin   εmax    κ     sparsity    εmin    εmax      λ    ∆HL
DNA              15613   99.57 %    0.72   1.65   2.29   98.46 %   -29.58   19.67   49.25   2.76
bulk pentacene    6876   98.96 %    0.78   1.77   2.26   97.11 %   -21.83   20.47   42.30   1.03
perovskite         768   90.34 %    0.70   1.50   2.15   76.47 %   -20.41   26.85   47.25   2.19
Si nanowire       706    93.24 %    0.72   1.54   2.16   81.61 %   -16.03   25.50   41.54   2.29
water            1800    96.71 %    0.83   1.30   1.57   90.06 %   -26.55   11.71   38.26   9.95
(λ = εmax − εmin; ∆HL is the HOMO-LUMO gap.)
Basic idea
In CheSS we approximate matrix functions by Chebyshev polynomials:
p(M) = \frac{c_0}{2} I + \sum_{i=1}^{n_{\mathrm{pl}}} c_i T_i(\tilde{M}),

with

\tilde{M} = \sigma (M - \tau I), \quad \sigma = \frac{2}{\varepsilon_{\max} - \varepsilon_{\min}}, \quad \tau = \frac{\varepsilon_{\min} + \varepsilon_{\max}}{2},

and

c_j = \frac{2}{n_{\mathrm{pl}}} \sum_{k=0}^{n_{\mathrm{pl}}-1} f\left[ \frac{1}{\sigma} \cos\left( \frac{\pi (k + \tfrac{1}{2})}{n_{\mathrm{pl}}} \right) + \tau \right] \cos\left( \frac{\pi j (k + \tfrac{1}{2})}{n_{\mathrm{pl}}} \right).
Recursion relation for the Chebyshev polynomials:
T0(M̃) = I ,
T1(M̃) = M̃ ,
Tj+1(M̃) = 2M̃Tj(M̃)− Tj−1(M̃) .
Each column is independent ⇒ easily parallelizable
Strict sparsity pattern ⇒ linear scaling
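As a sketch, the expansion and the three-term recursion above can be written in a few lines of NumPy. The function names and the dense test matrix are illustrative only (CheSS itself works on sparse matrices within a fixed pattern):

```python
import numpy as np

def chebyshev_coeffs(f, npl, emin, emax):
    """Chebyshev expansion coefficients c_j of f on [emin, emax]
    (discrete cosine form, as on the slide)."""
    sigma = 2.0 / (emax - emin)
    tau = 0.5 * (emin + emax)
    theta = np.pi * (np.arange(npl) + 0.5) / npl          # Chebyshev nodes
    fk = f(np.cos(theta) / sigma + tau)                   # f at the mapped nodes
    return (2.0 / npl) * np.cos(np.outer(np.arange(npl), theta)) @ fk

def chebyshev_matfunc(M, f, npl, emin, emax):
    """Approximate f(M) via the three-term Chebyshev recursion."""
    sigma = 2.0 / (emax - emin)
    tau = 0.5 * (emin + emax)
    n = M.shape[0]
    Mt = sigma * (M - tau * np.eye(n))                    # spectrum scaled to [-1, 1]
    c = chebyshev_coeffs(f, npl, emin, emax)
    T_prev, T_curr = np.eye(n), Mt                        # T_0, T_1
    p = 0.5 * c[0] * T_prev + c[1] * T_curr
    for j in range(2, npl):
        T_prev, T_curr = T_curr, 2.0 * Mt @ T_curr - T_prev
        p += c[j] * T_curr
    return p

# Example: the inverse of a well-conditioned symmetric matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
M = A @ A.T / 50 + np.eye(50)                             # eigenvalues >= 1
emin, emax = np.linalg.eigvalsh(M)[[0, -1]]
Minv = chebyshev_matfunc(M, lambda x: 1.0 / x, 60, emin, emax)
```

With exact eigenvalue bounds and a moderate condition number, the residual `Minv @ M - I` is close to machine precision.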
Available functions
CheSS can calculate those matrix functions needed for DFT:
density matrix: f(x) = \frac{1}{1 + e^{\beta(x - \mu)}} (or f(x) = \frac{1}{2}\left[1 - \mathrm{erf}\left(\beta(\varepsilon - x)\right)\right])
energy density matrix: f(x) = \frac{x}{1 + e^{\beta(x - \mu)}} (or f(x) = \frac{x}{2}\left[1 - \mathrm{erf}\left(\beta(\varepsilon - x)\right)\right])
matrix powers: f(x) = x^a (a can be non-integer!)
We can calculate arbitrary functions by changing only the coefficients c_j!
Only requirement: the function f must be well representable by Chebyshev polynomials over the entire eigenvalue spectrum.
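This requirement can be checked at the scalar level, since the same coefficients c_j are used for the matrix. A small hypothetical check (the names `cheb_fit_error`, `fermi`, `root` are ours, not part of the CheSS API):

```python
import numpy as np

def cheb_fit_error(f, npl, emin, emax, nsamp=2000):
    """Max deviation of an npl-term Chebyshev fit of f over [emin, emax]."""
    sigma = 2.0 / (emax - emin)
    tau = 0.5 * (emin + emax)
    theta = np.pi * (np.arange(npl) + 0.5) / npl          # Chebyshev nodes
    c = (2.0 / npl) * np.cos(np.outer(np.arange(npl), theta)) @ f(np.cos(theta) / sigma + tau)
    x = np.linspace(emin, emax, nsamp)
    t = np.arccos(np.clip(sigma * (x - tau), -1.0, 1.0))
    p = 0.5 * c[0] + np.sum(c[1:, None] * np.cos(np.outer(np.arange(1, npl), t)), axis=0)
    return np.max(np.abs(p - f(x)))

beta, mu = 10.0, 0.0                                      # inverse temperature, chemical potential
fermi = lambda x: 0.5 * (1.0 - np.tanh(0.5 * beta * (x - mu)))  # overflow-safe Fermi function
root = lambda x: x ** 0.5                                 # non-integer matrix power
err_fermi = cheb_fit_error(fermi, 500, -10.0, 10.0)       # wide spectrum: needs a high degree
err_root = cheb_fit_error(root, 50, 0.5, 2.0)             # smooth on [0.5, 2]: converges fast
```

The contrast between the two errors illustrates the slide's point: the same machinery handles any function, but the required polynomial degree depends on how smooth f is over the eigenvalue spectrum.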
Sparsity and truncation
CheSS works with predefined sparsity patterns.
In general there are three:
pattern for the original matrix M
pattern for the matrix function f(M)
auxiliary pattern to perform the matrix-vector multiplications
At the moment all of them must be defined by the user. Typically they are derived from the distances between atoms / basis functions.
1 original matrix M
2 exact calculation of M⁻¹ without sparsity constraints
3 sparse calculation of M⁻¹ using CheSS within the sparsity pattern
4 difference between Fig. 2 and Fig. 3
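A minimal sketch of how such distance-based patterns could be built and enforced. This is a dense NumPy stand-in with illustrative names (`distance_pattern`, `crop`); CheSS stores the patterns in a sparse format:

```python
import numpy as np

def distance_pattern(coords, cutoff):
    """Boolean sparsity pattern: keep entry (i, j) when particles i and j
    are closer than `cutoff` (a typical distance-based criterion)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return d < cutoff

def crop(A, pattern):
    """Enforce a predefined pattern by zeroing all entries outside of it."""
    return np.where(pattern, A, 0.0)

rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(200, 3))   # 200 pseudo-atoms in a box
pat_M = distance_pattern(coords, 3.0)            # pattern for the original matrix M
pat_f = distance_pattern(coords, 4.5)            # looser pattern for f(M)
A = rng.standard_normal((200, 200))
A_sparse = crop(A, pat_M)
sparsity = 100.0 * (1.0 - pat_M.mean())          # sparsity in %
```

Using a larger cutoff for f(M) than for M reflects the fact that a matrix function is generally less sparse than the matrix itself.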
Accuracy – error definition
There are two possible factors affecting the accuracy of CheSS:
error introduced due to the enforced sparsity (truncating the matrix-vector multiplications)
error introduced by the Chebyshev fit
This also affects the definition of the "exact solution". Two possibilities:
1 calculate the solution exactly and without sparsity constraints, then crop it to the sparsity pattern. Shortcoming: in general this violates the identity f^{-1}(f(M)) = M.
Associated error:
w_{\mathrm{sparse}}^{\hat f} = \frac{1}{|\hat f(M)|} \sqrt{ \sum_{(\alpha\beta) \in \hat f(M)} \left( \hat f(M)_{\alpha\beta} - f(M)_{\alpha\beta} \right)^2 }
2 calculate the solution within the sparsity pattern, and define as exact the one that fulfills \hat f^{-1}(\hat f(M)) = M.
Associated error:
w_{\hat f^{-1}(\hat f)} = \frac{1}{|\hat f(M)|} \sqrt{ \sum_{(\alpha\beta) \in \hat f(M)} \left( \hat f^{-1}(\hat f(M))_{\alpha\beta} - M_{\alpha\beta} \right)^2 }
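Both error measures can be evaluated directly for the inverse on a toy banded matrix. This is an illustration of the two definitions, not CheSS code; `w_error` and the test matrix are our own constructions:

```python
import numpy as np

def w_error(A, B, pattern):
    """Error measure from the slide:
    (1/|P|) * sqrt( sum over (a,b) in pattern P of (A_ab - B_ab)^2 )."""
    diff = (A - B)[pattern]
    return np.sqrt(np.sum(diff**2)) / pattern.sum()

n = 100
i = np.arange(n)
pattern = np.abs(i[:, None] - i[None, :]) <= 5                 # banded sparsity pattern
M = np.where(pattern, np.exp(-np.abs(i[:, None] - i[None, :])), 0.0) + 2.0 * np.eye(n)
Minv = np.linalg.inv(M)
f_hat = np.where(pattern, Minv, 0.0)      # possibility 1: exact inverse cropped to the pattern
w_sparse = w_error(f_hat, Minv, pattern)  # zero here by construction (crop agrees on-pattern)
w_consist = w_error(np.linalg.inv(f_hat), M, pattern)  # possibility 2: f^-1(f(M)) vs. M
```

The small but nonzero `w_consist` shows exactly the shortcoming named above: cropping the exact inverse to the pattern breaks the identity f^{-1}(f(M)) = M.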
Fixed versus variable sparsity
Alternative to a fixed sparsity pattern: variable adaptation during the iterations (e.g. based on the magnitude of the entries).
Advantage: flexible control of the truncation error
Shortcoming: the sparsity may decrease considerably
Example: purification method (TRS4 by Niklasson)
Sparsity defined by magnitude
Strong fill-in during the iterations
Cost of the matrix multiplications increases
Unlike CheSS, the matrix to be applied changes ⇒ harder to parallelize
[Plot: sparsity in % (falling from about 99 % to 92 %) vs. iteration (1–16)]
Accuracy
Inverse:
w_{f̂⁻¹(f̂)}: error due to the Chebyshev fit, basically zero
w^{f̂}_{sparse}: error due to the sparsity pattern, very small
[Plot: mean relative error (10⁻¹² to 10⁻⁶) vs. condition number (10–1000), for w_{f̂⁻¹(f̂)} and w^{f̂}_{sparse}]
Density matrix:
energy (i.e. Tr(KH)): relative error of only 0.01 %
slightly larger error for small spectral width: the eigenvalues are denser, so the finite-temperature smearing has a larger effect
[Plot: relative error in % (0.000–0.025) vs. HOMO-LUMO gap (0.001–1 eV), for spectral widths 50.0, 100.0 and 150.0 eV]
Scaling with matrix size and sparsity
Series of matrices with the same "degree of sparsity" (DFT calculations of water droplets of various sizes).
Example: calculation of the inverse
The runtime only depends on the number of non-zero elements of M; there is no dependence on the total matrix size.
[Plot: runtime (seconds) vs. number of non-zero elements in the original matrix (up to 2×10⁷), for matrix sizes 6000–36000]
Scaling with spectral properties
CheSS is extremely sensitive to the eigenvalue spectrum:
required polynomial degree strongly increases with the spectral width
as a consequence the runtime strongly increases as well
a good input guess for the eigenvalue bounds helps a lot
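One cheap way to obtain such an input guess for the bounds is Gershgorin's circle theorem; this is a generic illustration, not necessarily the scheme CheSS itself uses:

```python
import numpy as np

def gershgorin_bounds(M):
    """Eigenvalue-bound estimate via Gershgorin circles:
    every eigenvalue lies within radius r_i = sum_j |M_ij| (j != i) of M_ii."""
    d = np.diag(M)
    r = np.sum(np.abs(M), axis=1) - np.abs(d)
    return np.min(d - r), np.max(d + r)

rng = np.random.default_rng(2)
A = rng.standard_normal((80, 80))
M = (A + A.T) / 2 + 10.0 * np.eye(80)    # symmetric test matrix
lo, hi = gershgorin_bounds(M)
ev = np.linalg.eigvalsh(M)               # true spectrum, for comparison
```

The Gershgorin interval is guaranteed to enclose the true spectrum, though it can be rather loose; a loose interval inflates the spectral width and therefore the required polynomial degree.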
Example: calculation of the inverse
[Plot: runtime in seconds (1–1000) and polynomial degree n_pl (10–10000) vs. condition number (10–1000), with default and adjusted eigenvalue bounds]
Scaling with spectral properties
For the density matrix the performance depends on two parameters:
spectral width (the smaller the better)
HOMO-LUMO gap (the larger the better)
In both cases the polynomial degree can increase considerably ⇒ CheSS becomes less efficient.
[Plot: runtime in seconds (0–200) and polynomial degree n_pl (0–3000) vs. HOMO-LUMO gap (0.001–1 eV), for spectral widths εmax−εmin = 50.0, 100.0 and 150.0 eV]
Parallel scaling
Most compute intensive part of CheSS: matrix-vector multiplications.
Easily parallelizable
identical for all operations
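The column-wise independence can be illustrated with a toy serial stand-in for the MPI distribution: each worker could take its own slice of columns, and the concatenated partial results equal the full product.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((60, 60))
X = rng.standard_normal((60, 60))
# Each block of columns of X is an independent task (here 4 hypothetical workers).
col_blocks = np.array_split(np.arange(60), 4)
partials = [M @ X[:, c] for c in col_blocks]
result = np.concatenate(partials, axis=1)     # identical to the full product M @ X
```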
Example: Calculation of M−1 (runs performed with 16 OpenMP threads)
[Plots: parallel speedup (0–35) vs. number of cores (500–2500), and runtime in seconds (4–128) vs. number of cores (128–2048), for matrix sizes 12000, 24000 and 36000, each compared with ideal scaling]
Extreme scaling
We have also performed extreme-scaling tests from 1536 to 16384 cores.
Example: Calculation of M−1 (runs performed with 8 OpenMP threads)
[Plot: speedup (1–11) vs. number of cores (2000–16000), for matrix size 96000, compared with ideal scaling]
Given the small matrix size (96000 × 96000), the obtained scaling is good.
We will try to further improve the scaling.
Comparison with other methods: Inverse
Comparison of the matrix inversion between:
CheSS
SelInv
ScaLAPACK
LAPACK
[Plots: runtime in seconds (1–100) vs. condition number κ (2–128) for CheSS, SelInv, ScaLAPACK and LAPACK; matrix sizes 6000–30000, with sparsity of S from 97.95 % to 99.56 % and sparsity of S⁻¹ from 88.45 % to 97.28 %]
Comparison with other methods: Density matrix
Comparison of the density matrix calculation between:
CheSS hybrid MPI/OpenMP
PEXSI hybrid MPI/OpenMP
CheSS MPI-only
PEXSI MPI-only
[Plots: runtime in seconds (10–1000) vs. spectral width (50–200 eV) for CheSS and PEXSI, both hybrid (160 MPI × 12 OpenMP) and MPI-only (1920 MPI × 1 OpenMP); matrix sizes 6000–30000, with sparsities of S, H and K up to 99.27 %, 97.42 % and 95.65 %, respectively]
Integration into ELSI
There are various other libraries to accelerate linear algebra operations, e.g.
ELPA
libOMM
PEXSI
...
The ELSI project tries to provide one single interface for all these libraries.
We are planning to also include CheSS in ELSI.
Conclusions
CheSS is a flexible tool to calculate matrix functions for DFT
can easily be extended to other functions
exploits sparsity of the matrices, linear scaling possible
works best for small spectral widths of the matrices
very good parallel scaling (both MPI and OpenMP)