
Daubechies wavelets as a basis set for density functional pseudopotential calculations

Luigi Genovese,1 Alexey Neelov,2 Stefan Goedecker,2 Thierry Deutsch,1 Seyed Alireza Ghasemi,2 Alexander Willand,2 Damien Caliste,1 Oded Zilberberg,2 Mark Rayson,2 Anders Bergman,1 and Reinhold Schneider3
1 Institut de Nanosciences et Criogenie, SP2M/L Sim, CEA-Grenoble, 38054 Grenoble cedex 9, France
2 Institut für Physik, Universität Basel, Klingelbergstr. 82, 4056 Basel, Switzerland
3 Kiel-Berlin, Germany

Daubechies wavelets are a powerful systematic basis set for electronic structure calculations because they are orthogonal and localized both in real and Fourier space. We describe in detail how this basis set can be used to obtain a highly efficient and accurate method for density functional electronic structure calculations. An implementation of this method is available in the ABINIT free software package.

PACS numbers:

I. INTRODUCTION

In recent years, the Kohn-Sham formalism of density functional theory (DFT) has proven to be one of the most efficient and reliable first-principles methods for predicting material properties and processes that exhibit quantum mechanical behavior. The high accuracy of the results, together with the relatively simple form of the exchange-correlation functionals, makes this method probably the most powerful tool for ab-initio simulations of the properties of matter. The computational machinery of DFT calculations has been widely developed in the last decade, giving rise to a plethora of DFT codes. The use of DFT calculations has thus become more and more common, and its domain of application comprises solid state physics, chemistry, materials science, biology and geology.

One of the most important characteristics of a DFT code is the set of basis functions used for expressing the Kohn-Sham (KS) orbitals. The domain of applicability of a code is tightly connected to this choice. For example, a non-localised basis set like plane waves is highly suitable for electronic structure calculations of periodic and/or homogeneous systems like crystals or solids, while it is much less efficient in expanding localised information, which has a wider range of components in reciprocal space. For these reasons DFT codes based on plane waves are not convenient for simulating inhomogeneous or isolated systems like molecules, due to the high memory requirements of such simulations.

An important distinction should also be made between codes that use systematic and non-systematic basis sets. A systematic basis set allows us to calculate the exact solution of the KS equations with arbitrarily high precision as the number of basis functions is increased. In other terms, the numerical precision of the results is related to the number of basis functions used to expand the KS orbitals. With such a basis set it is thus possible to obtain results that are free of errors related to the choice of the basis, eliminating a source of uncertainty. This is particularly important in view of the fact that highly accurate approximations to the exchange correlation functional are now available, such as the PBE functional (21). Some of these functionals also contain van der Waals interactions (28). A systematic basis set thus allows us to really calculate the solution for a particular exchange correlation functional. On the other hand, an example of a non-systematic set is provided by Gaussian type bases, for which over-completeness may be reached before convergence. Such basis sets are more difficult to use, since the basis set must be carefully tuned by hand by the user, which will sometimes require some preliminary knowledge of the system under investigation. This is the most important weakness of this popular basis set.

Another property that plays a role in the performance of a DFT code is the orthogonality of the basis set. The use of nonorthogonal basis sets requires the calculation of the overlap matrix of the basis functions and various operations with this overlap matrix, such as inverting it. This makes methods based on non-orthogonal basis functions not only more complicated but also slower.

Daubechies wavelets (9) have virtually all the properties that one might desire for a basis set. They form a systematic, orthogonal and smooth basis that is localized both in real and Fourier space and that allows for adaptivity. A DFT approach based on such functions will meet both the requirements of precision and localisation found in many applications. In this paper, we will describe in detail a DFT method based on a Daubechies wavelets basis set. This method is implemented in a DFT code, distributed under the GNU-GPL license and integrated in the ABINIT (1) software package. A separate, standalone version of this code is also available and distributed under the GNU-GPL license (2). In the next few paragraphs we will discuss the importance of the properties of Daubechies wavelets in the context of electronic structure calculations.


A wavelet basis consists of a family of functions generated from a mother function and its translations on the points of a uniform grid of spacing $h$. For a wavelet basis the number of basis functions is increased by decreasing the spacing of the grid on whose points the wavelets are centered. The degree of smoothness determines the speed with which one converges to the exact result as $h$ is decreased. The degree of smoothness increases as one goes to higher order Daubechies wavelets. In our method we use Daubechies wavelets of order 16. This, together with the fact that our method is quasi variational, gives a convergence rate of $h^{14}$. Obtaining such a high convergence rate is essential in the context of electronic structure calculations, where one needs highly accurate results for basis sets of acceptable size. The combination of adaptivity and a high order convergence rate is typically not achieved in other electronic structure programs using systematic real space methods (3). An adaptive finite element code (4) has a convergence rate of $h^{6}$. Finite difference methods sometimes have low ($h^{3}$) (5) or high convergence rates (31) but are not adaptive.

As discussed above, localization in real space is essential for molecular systems. Basis sets that are not localized in real space are wasteful in this context. With plane waves one has to fill, for instance, an orthorhombic cell into which the molecule fits. Large subregions of the cell may contain no atoms and therefore no charge density, but this feature cannot be exploited with plane waves. Since Daubechies wavelets are defined on a compact support, one can consistently define a set of localisation parameters which allows us to put the basis functions only on the points which are sufficiently close to the atoms. The computational volume in our method is thus given only by the union of spheres centered on all the atoms in the system. Real space localization is also necessary for the implementation of linear scaling algorithms (6). This basis set is thus a promising candidate for developing such algorithms.

Localization in Fourier space is useful for preconditioning purposes. As a matter of fact, the condition number of the Hamiltonian operator depends explicitly on its highest eigenvalue. Since the high frequency spectrum of the Hamiltonian is dominated by the kinetic energy operator, high kinetic energy basis functions are also approximate eigenfunctions of the Hamiltonian. A function localised in Fourier space is an approximate eigenfunction of the kinetic energy operator. By using such functions as basis functions for the KS orbitals, the high energy spectrum of the Hamiltonian can thus easily be preconditioned.

A high degree of adaptivity is necessary for all-electron calculations, since highly localized core electrons require a much higher spatial resolution than the valence wavefunction away from the atomic core. High adaptivity can in principle be obtained with a wavelet basis, and wavelet based all-electron electronic structure programs have been developed (7; 8). In contrast to these developments we use pseudopotentials, since pseudopotentials are the easiest way to incorporate the relativistic effects that are important for heavy elements. The use of pseudopotentials drastically reduces the need for adaptivity and we therefore have only two levels of adaptivity. We have a high resolution region that contains all the chemical bonds and a low resolution region further away from the atoms where the wavefunctions decay exponentially to zero. In the low resolution region each grid point carries a scaling function. In the high resolution region it carries in addition 7 wavelets. In comparison with a plane wave method our wavelet method is therefore particularly efficient for open structures with large empty spaces and a relatively small bonding region.

The outline of this paper is as follows: in the next section we describe the fundamental properties of Daubechies wavelets. Then we will describe how the various operations needed in an electronic structure calculation are done in a scaling function/wavelet basis. The last part of the paper illustrates the performance of our DFT code based on Daubechies wavelets.

II. ADAPTIVITY IN A WAVELET BASIS

There are two fundamental functions in wavelet theory (9; 10), the scaling function φ(x) and the wavelet ψ(x).

The most important property of these functions is that they satisfy the so-called refinement equations

$$\phi(x) = \sqrt{2} \sum_{j=-m}^{m} h_j\, \phi(2x - j) \tag{1}$$

$$\psi(x) = \sqrt{2} \sum_{j=-m}^{m} g_j\, \phi(2x - j)$$

which establish a relation between the scaling functions on a grid with grid spacing $h$ and another one with spacing $h/2$. $h_j$ and $g_j = (-1)^j h_{-j+1}$ are the elements of a filter that characterizes the wavelet family, and $m$ is the order of the scaling function-wavelet family. All the properties of these functions can be obtained from the relations (1). The full basis set can be obtained from all the translations by a certain grid spacing $h$ of the mother function centered at the origin. The mother function is localised, with compact support. The maximally symmetric Daubechies scaling function and wavelet of order 16 that are used in this work are shown in Fig. 1.
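The filter relation $g_j = (-1)^j h_{-j+1}$ and the orthogonality it implies can be checked numerically. The sketch below is only an illustration: it uses the well-known 4-tap Daubechies filter as a stand-in (the order-16 filter of this work is not listed here), and the helper names `wavelet_filter` and `shifted_dot` are hypothetical.

```python
import numpy as np

# 4-tap Daubechies scaling filter, indices j = 0..3.
# Assumption: illustrative stand-in, NOT the order-16 filter used in the paper.
SQ3 = np.sqrt(3.0)
H = {j: c / (4.0 * np.sqrt(2.0))
     for j, c in enumerate([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3])}

def wavelet_filter(h):
    """Build g_j = (-1)^j h_{-j+1} from a scaling filter given as {index: value}."""
    # With k = 1 - j, the entry h_k contributes to g_{1-k}.
    return {1 - k: (-1) ** (1 - k) * hk for k, hk in h.items()}

def shifted_dot(a, b, shift):
    """Return sum_j a_j * b_{j + 2*shift}; indices missing from b count as zero."""
    return sum(aj * b.get(j + 2 * shift, 0.0) for j, aj in a.items())

G = wavelet_filter(H)

print("sum h_j      :", sum(H.values()))        # sqrt(2) for a normalised filter
print("sum g_j      :", sum(G.values()))        # vanishes for a wavelet filter
print("<h, h> shift :", shifted_dot(H, H, 1))   # orthogonality of translates
print("<h, g>       :", shifted_dot(H, G, 0))   # scaling function vs wavelet
```

The same checks apply to any orthogonal filter pair, so the dictionary `H` could be replaced by the coefficients of a longer filter if they are available.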

For a three-dimensional description, the simplest basis set is obtained by a set of equally spaced scaling functions on a grid of grid spacing $h'$

$$\phi_{i,j,k}(r) = \phi(x/h' - i)\, \phi(y/h' - j)\, \phi(z/h' - k) . \tag{2}$$

In other terms, the three-dimensional basis functions are a tensor product of one-dimensional basis functions. Note that we are using a cubic grid, where the grid spacing is the same in all directions, but the following description can be straightforwardly applied to general orthorhombic grids.

[Figure: φ(x) and ψ(x) plotted against x.]

FIG. 1 Daubechies scaling function φ and wavelet ψ of order 16. Both are different from zero only in the interval from -7 to 8.

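The basis set of Eq. 2 is equivalent to a mixed basis set of scaling functions on a twice coarser grid of grid spacing $h = 2h'$

$$\phi_{i,j,k}(r) = \phi(x/h - i)\, \phi(y/h - j)\, \phi(z/h - k)$$

augmented by a set of 7 wavelets

$$\begin{aligned}
\psi^{1}_{i,j,k}(r) &= \psi(x/h - i)\, \phi(y/h - j)\, \phi(z/h - k) \\
\psi^{2}_{i,j,k}(r) &= \phi(x/h - i)\, \psi(y/h - j)\, \phi(z/h - k) \\
\psi^{3}_{i,j,k}(r) &= \psi(x/h - i)\, \psi(y/h - j)\, \phi(z/h - k) \\
\psi^{4}_{i,j,k}(r) &= \phi(x/h - i)\, \phi(y/h - j)\, \psi(z/h - k) \\
\psi^{5}_{i,j,k}(r) &= \psi(x/h - i)\, \phi(y/h - j)\, \psi(z/h - k) \\
\psi^{6}_{i,j,k}(r) &= \phi(x/h - i)\, \psi(y/h - j)\, \psi(z/h - k) \\
\psi^{7}_{i,j,k}(r) &= \psi(x/h - i)\, \psi(y/h - j)\, \psi(z/h - k)
\end{aligned}$$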

This equivalence follows from the fact that every scaling function and wavelet at a coarse grid spacing $h$ can be expressed as a linear combination of scaling functions at the fine grid level $h'$ and vice versa.

The points of the simulation grid fall into 3 different classes. The points which are very far from the atoms will have virtually zero charge density and thus will not carry any basis functions. The remaining grid points are either in the high resolution region, which contains the chemical bonds, or in the low resolution region, which contains the exponentially decaying tails of the wavefunctions. In the low resolution region one uses only one scaling function per coarse grid point, whereas in the high resolution region one uses both the scaling function and the 7 wavelets. In this region the resolution is thus doubled compared to the low resolution region. Fig. 2 shows the 2-level adaptive grid around a water molecule.
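The three classes of grid points can be illustrated with a short sketch. It assumes, as stated in the introduction, that each region is a union of spheres centered on the atoms; the function names and the two radii `r_fine` and `r_coarse` are hypothetical parameters chosen for this example, not the localisation parameters of the actual code.

```python
import numpy as np

def classify_grid(atoms, shape, h, r_fine, r_coarse):
    """Label coarse grid points: 0 = empty, 1 = low resolution, 2 = high resolution.

    atoms    : list of (x, y, z) positions
    shape    : number of coarse grid points along each direction
    h        : coarse grid spacing
    r_fine   : radius of the high resolution spheres (assumed parameter)
    r_coarse : radius of the low resolution spheres (assumed parameter)
    """
    axes = [np.arange(n) * h for n in shape]
    pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)   # (n1, n2, n3, 3)
    atoms = np.asarray(atoms, dtype=float)                       # (n_atoms, 3)
    dist = np.linalg.norm(pts[..., None, :] - atoms, axis=-1)    # (n1, n2, n3, n_atoms)
    dmin = dist.min(axis=-1)                                     # distance to nearest atom
    labels = np.zeros(shape, dtype=np.int8)
    labels[dmin <= r_coarse] = 1
    labels[dmin <= r_fine] = 2
    return labels

def count_basis_functions(labels):
    """One scaling function per low resolution point, 1 + 7 functions per high resolution point."""
    return int(np.sum(labels == 1) + 8 * np.sum(labels == 2))

labels = classify_grid([[4.0, 4.0, 4.0]], (9, 9, 9), h=1.0, r_fine=1.0, r_coarse=2.0)
print(count_basis_functions(labels), "basis functions on", labels.size, "grid points")
```

Only the points with a nonzero label would carry expansion coefficients, which is what makes the compressed storage described later possible.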

FIG. 2 A 2-level adaptive grid around a H2O molecule. The high resolution grid points carrying both scaling functions and wavelets are shown in blue (thicker points), the low resolution grid points carrying only a single scaling function are shown in yellow (thinner points).

A wavefunction Ψ(r) can thus be expanded in this basis:

$$\Psi(r) = \sum_{i_1,i_2,i_3} s_{i_1,i_2,i_3}\, \phi_{i_1,i_2,i_3}(r) + \sum_{j_1,j_2,j_3} \sum_{\nu=1}^{7} d^{\nu}_{j_1,j_2,j_3}\, \psi^{\nu}_{j_1,j_2,j_3}(r) \tag{3}$$

The sum over $i_1, i_2, i_3$ runs over all the grid points contained in the low resolution region and the sum over $j_1, j_2, j_3$ over all the points contained in the smaller high resolution region.

The decomposition of scaling functions into coarser scaling functions and wavelets can be continued recursively to obtain more than 2 resolution levels. We found, however, that a high degree of adaptivity is not needed in pseudopotential calculations. In other terms, the pseudopotentials smooth the wavefunctions such that two levels of resolution are enough to achieve good computational accuracy. In addition, more than two resolution levels lead to more complicated algorithms, such as the non-standard operator form (32), that in turn lead to larger prefactors.

The transformation from a pure fine scaling function representation (a basis set which contains only scaling functions centered on a finer grid of spacing $h'$) to a mixed coarse scaling function/wavelet representation is done by the fast wavelet transformation (10), which is a convolution and scales linearly with respect to the number of basis functions being transformed.
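A one-dimensional, one-level version of such a transformation can be sketched as follows. This is a minimal illustration under stated assumptions: periodic boundary conditions, the short 4-tap Daubechies filter instead of the order-16 filter, and a wavelet filter indexed from 0 to 3 (a shifted version of the convention of Eq. 1). The function names are hypothetical.

```python
import numpy as np

# 4-tap Daubechies filters (assumption: stand-in for the order-16 filters).
SQ3 = np.sqrt(3.0)
H = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4.0 * np.sqrt(2.0))
G = np.array([(-1) ** j * H[3 - j] for j in range(4)])

def _window_indices(n):
    """Index table: row i holds the fine grid points 2i, 2i+1, 2i+2, 2i+3 (periodic)."""
    return (2 * np.arange(n // 2)[:, None] + np.arange(4)[None, :]) % n

def forward(fine):
    """Fine scaling coefficients -> coarse scaling (s) and wavelet (d) coefficients."""
    win = fine[_window_indices(len(fine))]      # shape (n/2, 4)
    return win @ H, win @ G

def inverse(s, d):
    """Coarse scaling and wavelet coefficients -> fine scaling coefficients."""
    n = 2 * len(s)
    fine = np.zeros(n)
    # Transpose of the forward transform; exact inverse because the filters are orthogonal.
    np.add.at(fine, _window_indices(n), s[:, None] * H + d[:, None] * G)
    return fine

rng = np.random.default_rng(0)
f = rng.standard_normal(16)
s, d = forward(f)
print("reconstruction error:", np.max(np.abs(inverse(s, d) - f)))
print("norm preserved      :", np.isclose(f @ f, s @ s + d @ d))
```

Each output coefficient touches a fixed number of inputs (the filter length), so the cost grows linearly with the number of coefficients, in line with the statement above.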

The wavefunctions are stored in a compressed form where only the nonzero scaling function and wavelet coefficients are stored. The basis set being orthogonal, several operations such as scalar products among different orbitals and between orbitals and the projectors of the non-local pseudopotential can directly be done in this compressed form. In the following sections we will illustrate the main operations which must be performed in the context of a DFT calculation.

III. OVERVIEW OF THE METHOD

In the KS formulation of DFT, the electronic density of a system of N electrons can be calculated from the square modulus of a set of wavefunctions:

$$\rho(r) = \sum_{i=1}^{N/2} n^{(i)}_{occ}\, |\Psi_i(r)|^2 , \tag{4}$$

where the KS wavefunctions $|\Psi_i\rangle$ are eigenfunctions of the KS Hamiltonian, with pseudopotential $V_{psp}$:

$$\left( -\frac{1}{2}\nabla^2 + V_{KS}[\rho] + V_{psp} \right) |\Psi_i\rangle = \varepsilon_i |\Psi_i\rangle \tag{5}$$

For simplicity we assumed in this description that our electronic system is a closed-shell system of non spin-polarised electronic orbitals. For this reason we have exactly N/2 KS wavefunctions and $n^{(i)}_{occ} = 2$ for all $i$. The KS potential

$$V_{KS}[\rho] = V_H[\rho] + V_{xc}[\rho] + V_{ext} , \tag{6}$$

contains the Hartree potential, solution of Poisson's equation $\nabla^2 V_H = -4\pi\rho$, the exchange-correlation potential $V_{xc}$ and the external ionic potential $V_{ext}$ acting on the electrons. The method we illustrate in this paper is conceived for isolated systems, namely free boundary conditions.

In our method, we choose the pseudopotential term $V_{psp}$ to be of the form of norm-conserving GTH-HGH pseudopotentials (15–17), which have a local and a nonlocal term, $V_{psp} = V_{local} + V_{nonlocal}$. For each of the ions these potentials have this form:

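$$V_{local}(r) = -\frac{Z_{ion}}{r}\, \mathrm{erf}\!\left( \frac{r}{\sqrt{2}\, r_{loc}} \right) + \exp\!\left[ -\frac{1}{2} \left( \frac{r}{r_{loc}} \right)^{2} \right] \times \left[ C_1 + C_2 \left( \frac{r}{r_{loc}} \right)^{2} + C_3 \left( \frac{r}{r_{loc}} \right)^{4} + C_4 \left( \frac{r}{r_{loc}} \right)^{6} \right] \tag{7}$$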

$$V_{nonlocal} = \sum_{\ell} \sum_{i,j=1}^{3} h^{(\ell)}_{ij}\, |p^{(\ell)}_i\rangle \langle p^{(\ell)}_j| \tag{8}$$

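$$\langle r | p^{(\ell)}_i \rangle = \frac{\sqrt{2}\, r^{\ell + 2(i-1)} \exp\!\left[ -\frac{1}{2} \left( \frac{r}{r_\ell} \right)^{2} \right]}{r_\ell^{\,\ell + (4i-1)/2}\, \sqrt{\Gamma\!\left( \ell + \frac{4i-1}{2} \right)}} \sum_{m=-\ell}^{+\ell} Y_{\ell m}(\theta, \phi) ,$$

where $Y_{\ell m}$ are the spherical harmonics, and $r_{loc}$, $r_\ell$ are, respectively, the localization radius of the local pseudopotential term and of each projector.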
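The local term of Eq. 7 is straightforward to evaluate on a radial distance. The sketch below is an illustration only: the function name `v_local` is hypothetical and the parameter values in the usage lines are made up for the example, they are not tabulated GTH-HGH parameters of any element. The value at r = 0 uses the limit of erf(x)/r for small r.

```python
import math

def v_local(r, z_ion, r_loc, c):
    """Local pseudopotential of Eq. 7 at distance r; c = (C1, C2, C3, C4)."""
    x = r / r_loc
    if r == 0.0:
        # Limit of -Z_ion/r * erf(r / (sqrt(2) r_loc)) for r -> 0.
        coulomb = -z_ion * math.sqrt(2.0 / math.pi) / r_loc
    else:
        coulomb = -z_ion / r * math.erf(x / math.sqrt(2.0))
    poly = c[0] + c[1] * x**2 + c[2] * x**4 + c[3] * x**6
    return coulomb + math.exp(-0.5 * x**2) * poly

# Made-up parameters, for illustration only.
params = dict(z_ion=4.0, r_loc=0.5, c=(-1.0, 0.5, 0.0, 0.0))
for r in (0.0, 0.5, 2.0, 20.0):
    print(f"r = {r:5.1f}   V_local = {v_local(r, **params): .6f}")
```

Far from the ion the Gaussian factor vanishes and the expression reduces to the Coulomb potential of the ionic charge, which the last line of the loop illustrates.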

The analytic form of the pseudopotentials, together with the fact that their expression in real space can be written in terms of a linear combination of tensor products of one-dimensional functions, is of great utility in our method.

Each of the terms of the Hamiltonian is implemented differently, and will be illustrated in the following sections. After the application of the Hamiltonian, the KS wavefunctions are updated via a direct minimisation scheme (33), which is fast and reliable but only for non-zero gap systems, namely insulators.

IV. TREATMENT OF KINETIC ENERGY

The matrix elements of the kinetic energy operator among the basis functions of our mixed representation (i.e. scaling functions with scaling functions, scaling functions with wavelets and wavelets with wavelets) can be calculated analytically (11). For simplicity, let us illustrate the application of the kinetic energy operator onto a wavefunction Ψ that is only expressed in terms of scaling functions.

$$\Psi(x, y, z) = \sum_{i_1,i_2,i_3} s_{i_1,i_2,i_3}\, \phi(x/h - i_1)\, \phi(y/h - i_2)\, \phi(z/h - i_3)$$

The result of the application of the kinetic energy operator on this wavefunction will again only be expressed in terms of scaling functions

$$\frac{1}{2}\nabla^2 \Psi(x, y, z) = \sum_{i_1,i_2,i_3} \hat{s}_{i_1,i_2,i_3}\, \phi(x/h - i_1)\, \phi(y/h - i_2)\, \phi(z/h - i_3) . \tag{9}$$

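Analytically the coefficients $s_{i_1,i_2,i_3}$ and $\hat{s}_{i_1,i_2,i_3}$ are related by a convolution

$$\hat{s}_{i_1,i_2,i_3} = \frac{1}{2} \sum_{j_1,j_2,j_3} K_{i_1-j_1,\, i_2-j_2,\, i_3-j_3}\, s_{j_1,j_2,j_3} \tag{10}$$

where

$$K_{i_1,i_2,i_3} = T_{i_1} T_{i_2} T_{i_3} , \tag{11}$$

and

$$T_{i_1} = \int dx\, \phi(x/h - i_1)\, \partial_x^2 \phi(x/h) . \tag{12}$$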

Using the refinement equation (1), the $T_i$'s can be calculated analytically, from a suitable eigenvector of a matrix derived from the wavelet filters (11). For this reason the expression of the kinetic energy operator is exact in a given Daubechies basis.


Since the 3-dimensional kinetic energy filter $K_{i_1,i_2,i_3}$ is a product of three 1-dim filters (Eq. 11), the convolution in Eq. 10 can be evaluated with $3 N_1 N_2 N_3 L$ operations for a 3-dimensional grid of $N_1 N_2 N_3$ grid points. $L$ is the length of the 1-dimensional filter, which is 29 for our Daubechies family. The kinetic energy can thus be evaluated with linear scaling with respect to the number of nonvanishing expansion coefficients of the wavefunction. This statement remains true for a mixed scaling function-wavelet basis, where we have both nonvanishing $s$ and $d$ coefficients, and for the case where the low and high resolution regions cover only parts of the cube of $N_1 N_2 N_3$ grid points.

The Daubechies wavelets of degree 16 have an approximation error of $h^{8}$, i.e. the difference between the exact wavefunction and its representation in a finite basis set (Eq. 3) decreases as $h^{8}$. The error of the kinetic energy in a variational scheme then decreases as $h^{2\cdot 8 - 2} = h^{14}$ (12). As we will see, the kinetic energy limits the convergence rate in our scheme and the overall convergence rate is thus $h^{14}$. Figure 3 shows this asymptotic convergence rate.
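The operation count for a filter that is a product of three 1-dimensional filters can be illustrated with a small sketch. It assumes periodic boundaries and a generic short filter (not the kinetic filter $T_i$, whose values are not listed here), and the function names are hypothetical. Three successive 1-dimensional passes give the same result as the direct 3-dimensional convolution with the product kernel, at a cost proportional to $3L$ instead of $L^3$ per grid point.

```python
import itertools
import numpy as np

def conv_axis(a, filt, axis):
    """Periodic 1-D convolution out[i] = sum_k filt[k] * a[i - k] along one axis."""
    out = np.zeros_like(a)
    for k, c in enumerate(filt):
        out += c * np.roll(a, k, axis=axis)
    return out

def conv_separable(a, filt):
    """Apply the product kernel as three successive 1-D passes (3 * N * L operations)."""
    for axis in range(3):
        a = conv_axis(a, filt, axis)
    return a

def conv_direct(a, filt):
    """Apply the same product kernel directly (N * L**3 operations), for comparison."""
    out = np.zeros_like(a)
    for k1, k2, k3 in itertools.product(range(len(filt)), repeat=3):
        weight = filt[k1] * filt[k2] * filt[k3]
        out += weight * np.roll(a, (k1, k2, k3), axis=(0, 1, 2))
    return out

rng = np.random.default_rng(0)
grid = rng.standard_normal((6, 5, 4))
filt = np.array([0.25, 0.5, 0.25])          # generic filter, assumption for the example
print("same result:", np.allclose(conv_separable(grid, filt), conv_direct(grid, filt)))
print("cost ratio for L = 29:", 29**3 / (3 * 29))
```

For the filter length quoted above the separable evaluation is cheaper by a factor of roughly 280 per grid point, which is why the tensor-product structure matters in practice.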

[Figure: absolute energy precision (Ha) versus grid spacing h (bohr); the wavelet code results are compared with the fit $A h^{14} + B h^{15} + C h^{16}$.]

FIG. 3 Convergence rate $O(h^{14})$ of the wavelet code for a test run of a carbon atom. For this run the interpolation parameters are found to be, within 2% accuracy: A = 344, B = −1239, C = 1139. Other test systems gave comparable convergence rates.

V. TREATMENT OF LOCAL POTENTIAL ENERGY

In spite of the striking advantages of Daubechies wavelets, the initial exploration of this basis set (13) did not lead to any algorithm that would be useful for real electronic structure calculations. This was due to the fact that an accurate evaluation of the local potential energy is difficult in a Daubechies wavelet basis.

By definition, the local potential V(r) can be easily known on the nodes of the uniform grid of the simulation box. Approximating a potential energy matrix element $V_{i,j,k;i',j',k'}$

$$V_{i,j,k;i',j',k'} = \int dr\, \phi_{i',j',k'}(r)\, V(r)\, \phi_{i,j,k}(r)$$

by

$$V_{i,j,k;i',j',k'} \approx \sum_{l,m,n} \phi_{i',j',k'}(r_{l,m,n})\, V(r_{l,m,n})\, \phi_{i,j,k}(r_{l,m,n})$$

gives an extremely slow convergence rate with respect to the number of grid points used to approximate the integral, because a single scaling function is not very smooth, i.e. it has a rather low number of continuous derivatives. A. Neelov and S. Goedecker (14) have shown that one should not try to approximate a single matrix element as accurately as possible but that one should try instead to approximate directly the expectation value of the local potential. The reason for this strategy is that the wavefunction expressed in the Daubechies basis is smoother than a single Daubechies basis function. A single Daubechies scaling function of order 16 has only 4 continuous derivatives. By suitable linear combinations of Daubechies 16 one can however exactly represent polynomials up to degree 7, i.e. functions that have 7 non-vanishing continuous derivatives. The discontinuities thus get canceled by taking suitable linear combinations. Since we use pseudopotentials, our exact wavefunctions are analytic and can locally be represented by a Taylor series. We are thus approximating functions that are approximately polynomials of order 7 and the discontinuities nearly cancel.

Instead of calculating the exact matrix elements we therefore use matrix elements with respect to a smoothed version $\tilde{\phi}$ of the Daubechies scaling functions.

$$V_{i,j,k;i',j',k'} \approx \sum_{l,m,n} \tilde{\phi}_{i',j',k'}(r_{l,m,n})\, V(r_{l,m,n})\, \tilde{\phi}_{i,j,k}(r_{l,m,n}) = \sum_{l,m,n} \tilde{\phi}_{0,0,0}(r_{i'+l,\, j'+m,\, k'+n})\, V(r_{l,m,n})\, \tilde{\phi}_{0,0,0}(r_{i+l,\, j+m,\, k+n}) , \tag{13}$$

where the magic filter ω is given by

$$\omega_{l,m,n} = \tilde{\phi}_{0,0,0}(r_{l,m,n})$$

The relation between the true functional values, i.e. the scaling function, and ω is shown in figure 4. Even though Eq. 13 is not a particularly good approximation for a single matrix element, it gives an excellent approximation for the expectation values of the local potential energy

$$\int dx \int dy \int dz\, \Psi(x, y, z)\, V(x, y, z)\, \Psi(x, y, z)$$

or also for matrix elements between different wavefunctions

$$\int dx \int dy \int dz\, \Psi_i(x, y, z)\, V(x, y, z)\, \Psi_j(x, y, z)$$

in case they are needed. In practice we do not explicitly calculate any matrix elements but we apply only filters to the wavefunction expansion coefficients, as will be shown in the following. This is mathematically equivalent but numerically much more efficient.

[Figure: φ(x) and $\omega_j$ plotted against x.]

FIG. 4 The magic filter $\omega_i$ for the least asymmetric Daubechies-16 basis.

Since the operations with the local potential V are performed in the computational box on the double resolution grid with grid spacing $h' = h/2$, we must perform a wavelet transformation before applying the magic filters. These two operations can be combined into one, giving rise to modified magic filters both for scaling functions and wavelets on the original grid of spacing $h$. These modified magic filters can be obtained from the original ones using the refinement relations and they are shown in Figures 5 and 6. Following the same guidelines as for the kinetic energy filters, the smoothed real space values $\tilde{\Psi}_{i,j,k}$ of a wavefunction Ψ are calculated by performing a product of three 1-dim convolutions with the magic filters along the x, y and z directions. For the scaling function part of the wavefunction the corresponding formula is

$$\tilde{\Psi}_{i_1,i_2,i_3} = \sum_{j_1,j_2,j_3} s_{j_1,j_2,j_3}\, v^{(1)}_{i_1-2j_1}\, v^{(1)}_{i_2-2j_2}\, v^{(1)}_{i_3-2j_3}$$

where $v^{(1)}_i$ is the filter that maps a scaling function on a double resolution grid. Similar convolutions are needed for the wavelet part. The calculation is thus similar to the treatment of the Laplacian in the kinetic energy.

Once we have calculated $\tilde{\Psi}_{i,j,k}$, the approximate expectation value $\epsilon_V$ of the local potential V for a wavefunction Ψ is obtained by simple summation on the double resolution real space grid:

$$\epsilon_V = \sum_{j_1,j_2,j_3} \tilde{\Psi}_{j_1,j_2,j_3}\, V_{j_1,j_2,j_3}\, \tilde{\Psi}_{j_1,j_2,j_3}$$

The evaluation of the local potential energy $\epsilon_V$ converges with a convergence rate of $h^{16}$ to the exact value, where $h$ is the grid spacing. The potential energy thus has a convergence rate that is two powers of $h$ faster than the rate for the kinetic energy.

[Figure: φ(x) and $\sqrt{2}\, v^{(1)}_i$ plotted against x.]

FIG. 5 The fine scale magic filter $v^{(1)}_i$ (combination of a wavelet transform and the magic filter in figure 4) for the least asymmetric Daubechies-16 basis, scaled by $\sqrt{2}$ for comparison with the scaling function.

[Figure: ψ(x) and $\sqrt{2}\, v^{(2)}_i$ plotted against x.]

FIG. 6 The fine scale magic filter $v^{(2)}_i$ (combination of a wavelet transform and the magic filter in figure 4) for the least asymmetric Daubechies-16 wavelet, scaled by $\sqrt{2}$ for comparison with the wavelet itself.

VI. CALCULATION OF HARTREE POTENTIAL

We saw in the section on the treatment of the local potential energy how to express efficiently the point values of the smoothed wavefunction $\tilde{\Psi}$ on the fine grid mesh. From these values the charge density on a grid point $j_1, j_2, j_3$ of the double resolution grid is given by

$$\rho_{j_1,j_2,j_3} = \sum_{i} n^{(i)}_{occ}\, \tilde{\Psi}^{2}_{i;\, j_1,j_2,j_3} \tag{14}$$

where $n^{(i)}_{occ}$ are the occupation numbers. For a closed shell system they equal 2 for the occupied orbitals and zero for all other orbitals. The discrete charge density $\rho_{j_1,j_2,j_3}$ is a very good approximation to the charge distribution of the continuous wavefunctions $|\Psi\rangle$ in the sense that the first multipoles of the discrete charge distribution converge rapidly to the values of the continuous charge distribution. The monopole converges with a rate of $h^{16}$. For each higher multipole moment the convergence rate is reduced by one power of $h$, i.e. dipoles converge with a rate of $h^{15}$, quadrupoles with $h^{14}$, etc. The discrete charge density ρ on the double resolution grid is then the input to various Poisson solvers that are available for different boundary conditions. In the case of free boundary conditions appropriate for isolated molecules, the values $\rho_{j_1,j_2,j_3}$ form the expansion coefficients for an expansion in interpolating scaling functions of order 16. This expansion strictly conserves all the multipoles up to the angular moment ℓ = 15 and allows one to solve the integral equation for the potential exactly with the correct boundary conditions (22). In addition to free boundary conditions we have also implemented surface boundary conditions (23), i.e. periodicity in 2 directions and free boundary conditions in the third direction. In this case the charge density is represented in a mixed plane wave-scaling function representation.

These Poisson solvers have a convergence rate of $h'^{m}$, where $m$ is the order of the interpolating scaling functions used for expressing the Poisson kernel. Since we use interpolating scaling functions of order 16, the convergence rate of the electrostatic potential is faster than the rate for the kinetic energy. All these Poisson solvers have in common that they perform explicitly the convolution of the density with the Green's function of Poisson's equation. The necessary convolutions are done by a traditional zero-padded FFT procedure which leads to an $O(N \log N)$ operation count with respect to the number of grid points $N$. The fraction of the computational time needed for the solution of Poisson's equation decreases with increasing system size and is roughly 1% for large systems, see section XVI. Moreover, the explicit Green's function treatment of the Poisson solver allows us to treat isolated systems with a net charge directly, without the insertion of compensating charges.
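The zero-padded FFT convolution for free boundary conditions can be sketched as follows. This is a simplified illustration, not the solver of refs. (22; 23): the Green's function is simply sampled as 1/r on the grid points instead of being expanded in interpolating scaling functions, the self term at r = 0 is replaced by the mean of 1/r over one grid cell, and the function name `hartree_free_bc` is hypothetical.

```python
import numpy as np

def hartree_free_bc(rho, h):
    """Potential V(r) = sum_r' rho(r') h^3 / |r - r'| without periodic images.

    rho : charge density on a grid of shape (n1, n2, n3)
    h   : grid spacing
    Assumption: point-sampled 1/r kernel, a crude stand-in for the
    interpolating scaling function kernel described in the text.
    """
    n1, n2, n3 = rho.shape

    def offsets(n):
        # Signed displacements stored in wrap-around order on the doubled grid.
        k = np.arange(2 * n)
        return np.where(k < n, k, k - 2 * n) * h

    x, y, z = np.meshgrid(offsets(n1), offsets(n2), offsets(n3), indexing="ij")
    r = np.sqrt(x**2 + y**2 + z**2)
    kernel = np.zeros_like(r)
    kernel[r > 0] = 1.0 / r[r > 0]
    # Self term: mean value of 1/r over a cube of side h.
    kernel[0, 0, 0] = (3.0 * np.log(2.0 + np.sqrt(3.0)) - np.pi / 2.0) / h

    padded = np.zeros((2 * n1, 2 * n2, 2 * n3))
    padded[:n1, :n2, :n3] = rho                      # zero padding removes the images
    product = np.fft.rfftn(padded) * np.fft.rfftn(kernel)
    v = np.fft.irfftn(product, s=padded.shape)
    return v[:n1, :n2, :n3] * h**3

# A unit net charge concentrated in one grid cell: no compensating charge is needed.
h = 0.5
rho = np.zeros((8, 8, 8))
rho[0, 0, 0] = 1.0 / h**3
v = hartree_free_bc(rho, h)
print("V at distance 7h:", v[7, 0, 0], " expected 1/(7h) =", 1.0 / (7 * h))
```

Without the zero padding the circular convolution would add the potential of periodic images, and a system with a net charge could not be treated this way.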

VII. XC FUNCTIONALS AND IMPLEMENTATION OF GGA'S

The charge density expression used for calculating the Hartree potential

$$\rho(r) = \sum_{i} n^{(i)}_{occ}\, |\Psi_i(r)|^2 , \tag{15}$$

is also used for the calculation of the exchange correlation energy $E_{xc}$ and the corresponding potential $V_{xc}$. Any real-space based implementation of the XC functionals fits well with this density representation. In our program we use the XC functionals as implemented in the ABINIT code. To this aim, we use the same ABINIT XC routines to calculate the exchange correlation energy

$$E_{xc} = \int \rho(r)\, \epsilon_{xc}(r)\, dr , \tag{16}$$

together with the XC potential

$$V_{xc}(r) = \frac{\delta E_{xc}}{\delta \rho(r)} . \tag{17}$$

The spin-polarised (collinear) versions of the ABINIT XC functionals can also be used with our method.

In the case of GGA exchange-correlation functionals the XC energy density depends both on the local values of the charge density ρ and on the modulus of its gradient:

$$\epsilon_{xc}(r) = \epsilon_{xc}\left( \rho(r), |\nabla\rho|(r) \right) . \tag{18}$$

A traditional finite difference scheme of fourth order is used on the double resolution grid to calculate the gradient of the charge density

$$\partial_w \rho(r_{i_1,i_2,i_3}) = \sum_{j_1,j_2,j_3} c^{(w)}_{i_1,i_2,i_3;\, j_1,j_2,j_3}\, \rho_{j_1,j_2,j_3} , \tag{19}$$

where $w = x, y, z$. For grid points close to the boundary of the computational volume the above formula requires grid points outside the volume. For free boundary conditions the values of the charge density outside of the computational volume in a given direction are taken to be equal to the value at the border of the grid.
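A finite difference gradient of this kind, including the boundary rule for free boundary conditions, can be sketched as follows. The text only states that the scheme is of fourth order, so the standard five-point central stencil used here is an assumption, as is the function name `gradient_4th`.

```python
import numpy as np

def gradient_4th(rho, h, axis):
    """Fourth-order central difference of rho along one axis.

    Points outside the grid take the value at the border of the grid,
    as described for free boundary conditions.
    Assumption: standard stencil [1, -8, 0, 8, -1] / (12 h).
    """
    pad = [(0, 0)] * rho.ndim
    pad[axis] = (2, 2)
    padded = np.pad(rho, pad, mode="edge")     # replicate the border values
    n = rho.shape[axis]

    def shifted(k):
        # Values rho[i + k] for every grid index i along the chosen axis.
        return np.take(padded, np.arange(2 + k, 2 + k + n), axis=axis)

    return (shifted(-2) - 8.0 * shifted(-1) + 8.0 * shifted(1) - shifted(2)) / (12.0 * h)

# Usage: the stencil differentiates a cubic polynomial exactly away from the border.
h = 0.1
x = np.arange(10) * h
rho = (x**3)[:, None, None] * np.ones((10, 3, 3))
grad_x = gradient_4th(rho, h, axis=0)
print(np.max(np.abs(grad_x[2:-2, 0, 0] - 3.0 * x[2:-2] ** 2)))
```

Near the border the replicated values make the derivative only approximate, which is harmless when the density has already decayed to zero there.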

The relation between the gradient and the density must be taken into account when calculating $V_{xc}$ in the standard White-Bird approach (24), where the density gradient is considered as an explicit functional of the density. There the XC potential can be split in two terms:

$$V_{xc}(r_{i_1,i_2,i_3}) = V^{o}_{xc}(r) + V^{c}_{xc}(r) , \tag{20}$$

where

$$V^{o}_{xc}(r_{i_1,i_2,i_3}) = \epsilon_{xc}(r) + \rho(r) \frac{\partial \epsilon_{xc}}{\partial \rho}(r) , \tag{21}$$

$$V^{c}_{xc}(r_{i_1,i_2,i_3}) = \sum_{j_1,j_2,j_3} \frac{\rho}{|\nabla\rho|} \frac{\partial \epsilon_{xc}}{\partial |\nabla\rho|}(r_{j_1,j_2,j_3}) \times \sum_{w=x,y,z} \partial_w \rho(r_{j_1,j_2,j_3})\, c^{(w)}_{j_1,j_2,j_3;\, i_1,i_2,i_3} ,$$

where the "ordinary" part $V^{o}_{xc}$ is present in the same form also for LDA functionals, while the White-Bird "correction" term $V^{c}_{xc}$ appears only when the XC energy depends explicitly on $|\nabla\rho|$. The coefficients $c^{(w)}$ are the coefficients of the finite difference formula used for calculating the gradient of the charge density.

The evaluation of the XC terms and also, when needed, the calculation of the gradient of the charge density, may easily be performed together with the Poisson solver used for evaluating the Hartree potential. This allows us to save computational time.


VIII. TREATMENT OF THE NON-LOCAL PSEUDOPOTENTIAL

The energy contributions from the non-local pseudopotential have for each angular moment ℓ the form

$$\sum_{i,j} \langle \Psi | p_i \rangle\, h_{ij}\, \langle p_j | \Psi \rangle$$

where $|p_i\rangle$ is a pseudopotential projector. When applying the Hamiltonian operator, the application of one projector on the wavefunctions requires the calculation of

$$|\Psi\rangle \rightarrow |\Psi\rangle + \sum_{i,j} |p_i\rangle\, h_{ij}\, \langle p_j | \Psi \rangle .$$

If we use for the projectors the representation of Eq. 3, i.e. the same as for the wavefunction, both operations are trivial to perform. Because of the orthogonality of the basis set we just have to calculate scalar products among the coefficient vectors and to update the wavefunction. The scaling function and wavelet expansion coefficients for the projectors are given by (10)

$$\int p(r)\, \phi_{i_1,i_2,i_3}(r)\, dr , \qquad \int p(r)\, \psi^{\nu}_{i_1,i_2,i_3}(r)\, dr . \tag{22}$$
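In an orthogonal basis both operations reduce to products of coefficient vectors, as the following sketch illustrates. It assumes real coefficients, stores the expansion coefficients of the projectors as rows of a matrix, and uses random numbers in place of actual projector coefficients; the function name `apply_nonlocal` is hypothetical.

```python
import numpy as np

def apply_nonlocal(psi, projectors, h_matrix):
    """Apply sum_ij |p_i> h_ij <p_j| to a wavefunction given by its coefficients.

    psi        : (n_basis,) expansion coefficients of the wavefunction
    projectors : (n_proj, n_basis) expansion coefficients of the projectors |p_i>
    h_matrix   : (n_proj, n_proj) symmetric matrix h_ij
    Returns the updated coefficient vector and the non-local energy.
    """
    overlaps = projectors @ psi                       # scalar products <p_j|Psi>
    energy = overlaps @ h_matrix @ overlaps           # sum_ij <Psi|p_i> h_ij <p_j|Psi>
    psi_new = psi + projectors.T @ (h_matrix @ overlaps)
    return psi_new, energy

rng = np.random.default_rng(0)
psi = rng.standard_normal(20)
projectors = rng.standard_normal((3, 20))             # placeholder coefficients
h_matrix = np.diag([1.0, 0.5, 0.25])
psi_new, energy = apply_nonlocal(psi, projectors, h_matrix)
print("non-local energy:", energy)
```

No overlap matrix appears anywhere in the sketch, which is the practical benefit of the orthogonality of the basis set mentioned above.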

The GTH-HGH pseudopotentials (15; 16) have projectors which are written in terms of gaussians times polynomials. This form of projectors is particularly convenient to be expanded in the Daubechies basis. In other terms, since the general form of the projector is

$$\langle r | p \rangle = e^{-c r^2}\, x^{\ell_x} y^{\ell_y} z^{\ell_z} ,$$

the 3-dimensional integrals can be calculated easily since they can be factorized into a product of three 1-dimensional integrals.

$$\int \langle r | p \rangle\, \phi_{i_1,i_2,i_3}(r)\, dr = W_{i_1}(c, \ell_x)\, W_{i_2}(c, \ell_y)\, W_{i_3}(c, \ell_z) , \tag{23}$$

$$W_j(c, \ell) = \int_{-\infty}^{+\infty} e^{-c t^2}\, t^{\ell}\, \phi(t/h - j)\, dt \tag{24}$$
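The factorization of Eqs. 23 and 24 can be illustrated with a short sketch. Two simplifications are assumed and should not be read as the method of the paper: a hat function replaces the Daubechies scaling function, and a plain Riemann sum replaces the magic filter quadrature described in the text. All function names are hypothetical.

```python
import numpy as np

def hat(t):
    """Compactly supported stand-in for the scaling function (NOT a Daubechies function)."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def w_coeffs(c, ell, h, indices, t):
    """W_j(c, ell) of Eq. 24 for each j in indices, by a Riemann sum on the points t."""
    dt = t[1] - t[0]
    gauss_poly = np.exp(-c * t**2) * t**ell
    return np.array([np.sum(gauss_poly * hat(t / h - j)) * dt for j in indices])

def projector_coeffs(c, ells, h, indices, t):
    """All 3-D coefficients of Eq. 23 as an outer product of three 1-D vectors."""
    wx, wy, wz = (w_coeffs(c, ell, h, indices, t) for ell in ells)
    return np.einsum("i,j,k->ijk", wx, wy, wz)

t = np.linspace(-4.0, 4.0, 81)
indices = list(range(-2, 3))
coeffs = projector_coeffs(c=1.0, ells=(1, 0, 2), h=0.5, indices=indices, t=t)
print("coefficient tensor shape:", coeffs.shape)
```

Only three vectors of 1-dimensional integrals have to be computed, after which every 3-dimensional coefficient is a product of three numbers.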

The 1-dimensional integrals are calculated in the following way. We first calculate the scaling function expansion coefficients for scaling functions on a 1-dimensional grid that is 16 times denser. The integration on this dense grid is done by summing the product of the Gaussian and the smoothed scaling function that is obtained by filtering the original scaling function with the magic filter (14). This integration scheme based on the magic filter has a convergence rate of $h^{14}$, and we therefore gain a factor of $16^{14}$ (about $7\times10^{16}$) in accuracy by going to the denser grid. This means that, for reasonable grid spacings h, the expansion coefficients are accurate to machine precision. After having obtained the expansion coefficients with respect to the fine scaling functions, we obtain the expansion coefficients with respect to the scaling functions and wavelets on the required resolution level by one-dimensional fast wavelet transformations. No accuracy is lost in the wavelet transforms, and our representation of the projectors is therefore typically accurate to nearly machine precision.
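The factorization of Eq. 23 can be checked numerically with a small sketch. Two assumptions are made purely for illustration: a piecewise-linear hat function stands in for the Daubechies scaling function (which has no closed form), and a plain midpoint rule replaces the magic-filter quadrature described above. The function names are hypothetical.

```python
import math


def hat(u):
    """Stand-in 1D basis function (assumption): linear hat supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(u))


def w_1d(j, c, ell, h, m=24):
    """Midpoint-rule estimate of W_j(c, ell) = int exp(-c t^2) t^ell phi(t/h - j) dt."""
    start = (j - 1) * h          # the hat centred at j*h vanishes outside [(j-1)h, (j+1)h]
    dt = 2.0 * h / m
    total = 0.0
    for k in range(m):
        t = start + (k + 0.5) * dt
        total += math.exp(-c * t * t) * t ** ell * hat(t / h - j)
    return total * dt


def w_3d_direct(idx, c, ells, h, m=24):
    """Same quadrature applied directly to the 3D integral, without factorizing."""
    dt = 2.0 * h / m
    pts = [[(i - 1) * h + (k + 0.5) * dt for k in range(m)] for i in idx]
    total = 0.0
    for x in pts[0]:
        fx = x ** ells[0] * hat(x / h - idx[0])
        for y in pts[1]:
            fy = y ** ells[1] * hat(y / h - idx[1])
            for z in pts[2]:
                fz = z ** ells[2] * hat(z / h - idx[2])
                total += math.exp(-c * (x * x + y * y + z * z)) * fx * fy * fz
    return total * dt ** 3


if __name__ == "__main__":
    c, h = 0.8, 0.45             # made-up exponent and grid spacing
    product = w_1d(1, c, 1, h) * w_1d(0, c, 0, h) * w_1d(-1, c, 2, h)
    direct = w_3d_direct((1, 0, -1), c, (1, 0, 2), h)
    print(product, direct)
```

The factorized form needs three sums of length m instead of one sum of length m cubed, which is the reason the separable form of the projectors is convenient.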

IX. PRECONDITIONING

As already mentioned, direct minimisation of the total energy is used for finding the converged wavefunctions. The gradient $|g_i\rangle$ of the total energy with respect to the i-th wavefunction $|\Psi_i\rangle$ is given by

$$|g_i\rangle = H|\Psi_i\rangle - \sum_j \Lambda_{ij} |\Psi_j\rangle , \tag{25}$$

where $\Lambda_{ij} = \langle\Psi_j|H|\Psi_i\rangle$ are the Lagrange multipliers enforcing the orthogonality constraints. Convergence is achieved when the average norm of the residue $\langle g_i|g_i\rangle^{1/2}$ is below a user-defined numerical tolerance.

Given the gradient direction at each step, several algorithms can be used for improving convergence. In our method we use either a preconditioned steepest-descent algorithm or a preconditioned DIIS method (18; 19). These methods work very well for improving the convergence for non-zero gap systems if a good preconditioner is available.

The preconditioned gradient $|\tilde g_i\rangle$, which approximately points in the direction of the minimum, is obtained by solving the linear system of equations obtained by discretizing Eq. 26:

$$\left(\frac{1}{2}\nabla^2 - \epsilon_i\right) \tilde g_i(\mathbf r) = g_i(\mathbf r) . \tag{26}$$

The values $\epsilon_i$ are approximate eigenvalues obtained by a subspace diagonalization in a minimal basis of atomic pseudopotential orbitals during the generation of the input guess.

Eq. (26) is solved by a preconditioned conjugate gradient (CG) method. The preconditioning is done by using the diagonal elements of the matrix representing the operator $\frac{1}{2}\nabla^2 - \epsilon_i$ in a scaling function-wavelet basis. In the initial step we use $\ell$ resolution levels of wavelets, where $\ell$ is typically 4. To do this we have to enlarge the domain where the scaling function part of the gradient is defined to a grid whose size is a multiple of $2^{\ell}$. This means that the preconditioned gradient $\tilde g_i$ will also exist in a domain that is larger than the domain of the wavefunction $\Psi_i$. Nevertheless this approach is useful, since it allows us to obtain rapidly a preconditioned gradient that has the correct overall shape. In the following iterations of the conjugate gradient we use only one wavelet level in addition to the scaling functions for preconditioning. In this way we can do the preconditioning exactly in the domain of basis functions that are used to represent the wavefunction (Eq. 3). A typical number of CG iterations necessary to obtain a meaningful preconditioned gradient is 5.
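The diagonally preconditioned CG iteration can be illustrated on a one-dimensional toy problem. The sketch below is not the wavelet implementation: it assumes a second-order finite-difference grid with zero boundary values instead of a scaling function-wavelet basis, and it writes the operator as $-\frac{1}{2}d^2/dx^2 - \epsilon$ with $\epsilon < 0$, an assumption made here so that the matrix is symmetric positive definite as CG requires. On such a uniform grid the diagonal is constant, so the preconditioner acts only as a scaling; in a multiresolution basis the diagonal differs from level to level, which is where the benefit comes from.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def apply_operator(v, h, eps):
    """Apply (-1/2 d^2/dx^2 - eps) with finite differences and zero boundary values."""
    n = len(v)
    out = [0.0] * n
    for k in range(n):
        left = v[k - 1] if k > 0 else 0.0
        right = v[k + 1] if k < n - 1 else 0.0
        out[k] = -0.5 * (left - 2.0 * v[k] + right) / (h * h) - eps * v[k]
    return out


def preconditioned_cg(g, h, eps, iterations=5, tol=1e-12):
    """Approximately solve A x = g with CG, preconditioned by the diagonal of A.

    Assumes eps < 0 so that A is positive definite.
    """
    diag = 1.0 / (h * h) - eps           # diagonal element of the discretized operator
    x = [0.0] * len(g)
    r = list(g)                          # residual g - A x for the start vector x = 0
    z = [ri / diag for ri in r]          # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(iterations):
        if math.sqrt(rz) < tol:          # already converged
            break
        ap = apply_operator(p, h, eps)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / diag for ri in r]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    h, eps = 0.4, -0.5                   # made-up grid spacing and eigenvalue
    g = [math.exp(-((k - 7.5) * h) ** 2) for k in range(20)]
    x5 = preconditioned_cg(g, h, eps, iterations=5)
    residual = [a - b for a, b in zip(apply_operator(x5, h, eps), g)]
    print(max(abs(v) for v in residual))
```

Stopping after a handful of iterations, as in the text, gives an approximate solution with the correct overall shape rather than a fully converged one.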


X. ORTHOGONALIZATION

We saw the need to keep the wavefunctions $\Psi_i$ orthonormal at each step of the minimisation loop. This means that the overlap matrix S, with matrix elements

$$S_{ij} = \langle\Psi_j|\Psi_i\rangle \tag{27}$$

must be equal to the identity matrix.

All of the orthogonalization algorithms have a cubic complexity, causing this part of the program to dominate for large systems, see Fig. 11. We therefore optimized this part carefully and found that a pseudo-Gram-Schmidt algorithm that uses a Cholesky factorization of the overlap matrix S is the most efficient method on parallel computers. In the following, we discuss the reasons for this choice by comparing it to two other orthogonalization algorithms: classical Gram-Schmidt and Loewdin orthogonalizations.

A. Gram-Schmidt orthogonalization

The classical Gram-Schmidt orthonormalization algorithm generates an orthonormal set of orbitals $\{|\tilde\Psi_i\rangle\}$ out of a non-orthogonal set $\{|\Psi_i\rangle\}$ by processing each orbital separately. The overlap of the currently processed orbital $|\Psi_i\rangle$ with the set of the already processed orbitals $\{|\tilde\Psi_j\rangle\}_{j=1,\cdots,i-1}$ is calculated and is removed from $|\Psi_i\rangle$. Thereafter, the transformed orbital $|\tilde\Psi_i\rangle$ is normalized:

$$|\tilde\Psi_i\rangle = |\Psi_i\rangle - \sum_{j=1}^{i-1} \langle\tilde\Psi_j|\Psi_i\rangle\, |\tilde\Psi_j\rangle \tag{28}$$

$$|\tilde\Psi_j\rangle \longrightarrow \frac{|\tilde\Psi_j\rangle}{\sqrt{\langle\tilde\Psi_j|\tilde\Psi_j\rangle}} \tag{29}$$

The algorithm consists of the calculation of n(n + 1)/2 scalar products and wavefunction updates. If the coefficients of each orbital are distributed among several processors, n(n + 1)/2 communication steps are needed to sum up the various contributions from each processor to each scalar product. Such a large number of communication steps leads to a large latency overhead on a parallel computer and therefore to poor performance.
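A minimal serial sketch of Eqs. 28 and 29 makes the count of scalar products explicit. It assumes real coefficient vectors held as Python lists; the function name and the returned counter are illustrative only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(orbitals):
    """Classical Gram-Schmidt; returns the orthonormal set and the number of scalar products."""
    ortho = []
    n_dot = 0
    for psi in orbitals:
        new = list(psi)
        for prev in ortho:
            overlap = dot(prev, psi)     # overlap with the original orbital (classical variant)
            n_dot += 1
            new = [a - overlap * b for a, b in zip(new, prev)]
        norm = math.sqrt(dot(new, new))  # the normalization is one more scalar product
        n_dot += 1
        ortho.append([a / norm for a in new])
    return ortho, n_dot


if __name__ == "__main__":
    vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    ortho, n_dot = gram_schmidt(vectors)
    print(n_dot)                         # n(n + 1)/2 scalar products for n orbitals
```

Each scalar product depends on orbitals produced by earlier steps, so in a distributed setting every one of them needs its own reduction before the loop can continue.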

B. Loewdin orthogonalization

The Loewdin orthonormalization algorithm is based on the following equation

$$|\tilde\Psi_i\rangle = \sum_j |\Psi_j\rangle\, S^{-\frac{1}{2}}_{ij} , \tag{30}$$

where a new set of orthonormal orbitals $|\tilde\Psi_i\rangle$ is obtained by multiplying the inverse square-root of the overlap matrix S with the original orbital set.

The implementation of this algorithm requires that the overlap matrix S is calculated. As S is a symmetric matrix, we need to calculate only one triangle of the matrix, which results in n(n + 1)/2 scalar products. In contrast to the classical Gram-Schmidt algorithm, the matrix elements $S_{ij}$ depend only on the original set of orbitals and can be calculated in parallel in the case where each processor holds a certain subset of the coefficients of each wavefunction. At the end of this calculation a single communication step is needed to sum up the entire overlap matrix out of the contributions to each matrix element calculated by the different processors. Thereafter, the inverse square-root of S is calculated. For this, we use the fact that S is a Hermitian positive definite matrix. Thus, there exists a unitary matrix U which diagonalizes S, $S = U^{\dagger}\Lambda U$, where $\Lambda$ is a diagonal matrix with positive eigenvalues. Consequently, $S^{-\frac{1}{2}} = U^{\dagger}\Lambda^{-\frac{1}{2}}U$. Hence, an eigenvalue problem must be solved in order to find U and $\Lambda$.
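As a short consistency check, the orbitals of Eq. 30 are indeed orthonormal. The derivation below is a sketch that assumes real orbitals, so that S and its inverse square root are real symmetric matrices, and uses the definition of S from Eq. 27.

```latex
\begin{aligned}
\langle\tilde\Psi_k|\tilde\Psi_i\rangle
  &= \sum_{l,j} S^{-1/2}_{kl}\,\langle\Psi_l|\Psi_j\rangle\, S^{-1/2}_{ij}
   = \sum_{l,j} S^{-1/2}_{ij}\, S_{jl}\, S^{-1/2}_{lk} \\
  &= \left(S^{-1/2}\, S\, S^{-1/2}\right)_{ik}
   = \left(U^{\dagger}\Lambda^{-1/2}U\; U^{\dagger}\Lambda U\; U^{\dagger}\Lambda^{-1/2}U\right)_{ik}
   = \left(U^{\dagger}U\right)_{ik} = \delta_{ik} .
\end{aligned}
```

The second equality uses $S_{jl} = \langle\Psi_l|\Psi_j\rangle$ and the symmetry of $S^{-1/2}$; the last steps use $UU^{\dagger} = 1$.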

C. Pseudo Gram-Schmidt using Cholesky Factorization

In this scheme a Cholesky factorization of the overlap matrix $S = LL^{T}$ is calculated. The new orthonormal orbitals are obtained by

$$|\tilde\Psi_i\rangle = \sum_j \left(L^{-1}\right)_{ij} |\Psi_j\rangle , \tag{31}$$

and are equivalent to the orbitals obtained by the classical Gram-Schmidt algorithm. The procedure for calculating the overlap matrix out of the contributions calculated by each processor is identical to the Loewdin case. Instead of solving an eigenvalue problem we have, however, to calculate the Cholesky decomposition of the overlap matrix. This can be done much faster. Thus, this algorithm has a lower pre-factor than the Loewdin scheme and requires only one communication step on a parallel computer.
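The scheme of Eq. 31 can be sketched in a few lines of serial code. The sketch assumes real coefficient vectors as Python lists and a hand-written Cholesky routine (a production code would call a LAPACK routine instead); rather than forming $L^{-1}$ explicitly, it applies it by forward substitution. All function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def overlap_matrix(orbitals):
    """S_ij = <Psi_j|Psi_i> for real coefficient vectors."""
    n = len(orbitals)
    return [[dot(orbitals[j], orbitals[i]) for j in range(n)] for i in range(n)]


def cholesky(s):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def cholesky_orthonormalize(orbitals):
    """New orbitals sum_j (L^-1)_ij Psi_j, obtained by forward substitution."""
    low = cholesky(overlap_matrix(orbitals))
    new = []
    for i, psi in enumerate(orbitals):
        vec = list(psi)
        for j in range(i):
            vec = [a - low[i][j] * b for a, b in zip(vec, new[j])]
        new.append([a / low[i][i] for a in vec])
    return new


if __name__ == "__main__":
    vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    for row in cholesky_orthonormalize(vectors):
        print(row)
```

All scalar products are gathered in `overlap_matrix` before any orbital is modified, which is why a single reduction suffices when the coefficients are distributed.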

XI. CALCULATION OF FORCES

Atomic forces can be calculated with the same methods used for the application of the hamiltonian onto a wavefunction. Since the scaling function/wavelet basis has no dependence on the atomic positions, we have no Pulay forces (30) and atomic forces can be evaluated directly through the Feynman-Hellmann theorem. Except for the force arising from the trivial ion-ion interaction, which for the i-th atom is

$$\mathbf F^{(ionic)}_i = \sum_{j\neq i} \frac{Z_i Z_j}{R_{ij}^3}\,(\mathbf R_i - \mathbf R_j) , \tag{32}$$

the energy terms which depend explicitly on the atomic positions are related to the pseudopotentials. As shown in the previous sections, the GTH-HGH pseudopotentials we are using are based on separable functions (15; 16), and can be split into a local and a non-local contribution.
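The ion-ion term of Eq. 32 is simple enough to write out directly. The sketch below assumes atomic units, takes $R_{ij}$ to be the distance $|\mathbf R_i - \mathbf R_j|$, and uses a plain double loop; the function name and the input values are illustrative.

```python
import math


def ionic_forces(charges, positions):
    """Force on each ion from Eq. 32: sum over j != i of Z_i Z_j (R_i - R_j) / R_ij^3."""
    n = len(charges)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = [positions[i][k] - positions[j][k] for k in range(3)]
            r3 = math.sqrt(sum(d * d for d in diff)) ** 3
            for k in range(3):
                forces[i][k] += charges[i] * charges[j] * diff[k] / r3
    return forces


if __name__ == "__main__":
    # Two made-up ions, 2 bohr apart along x.
    print(ionic_forces([1.0, 4.0], [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]))
```

For like charges the force on each ion points away from the other, and the forces on a pair are equal and opposite.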

For an atom placed at position $\mathbf R_i$, the contribution to the energy that comes from the local part of the pseudopotential is

$$E_{local}(\mathbf R_i) = \int d\mathbf r\; V_{local}(|\mathbf r - \mathbf R_i|)\,\rho(\mathbf r) . \tag{33}$$

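Here the local pseudopotential can be split into a long-range and a short-range term, $V_{local}(\lambda) = V_L(\lambda) + V_S(\lambda)$, with

$$V_L(\lambda) = -\frac{Z_i}{\lambda}\,\mathrm{erf}\!\left(\frac{\lambda}{\sqrt{2}\,r_\ell}\right) ,$$

$$V_S(\lambda) = \exp\!\left(-\frac{\lambda^2}{2 r_\ell^2}\right)\left[C_1 + C_2\left(\frac{\lambda}{r_\ell}\right)^2 + C_3\left(\frac{\lambda}{r_\ell}\right)^4 + C_4\left(\frac{\lambda}{r_\ell}\right)^6\right] , \tag{34}$$

where the $C_i$ and $r_\ell$ are the pseudopotential parameters, depending on the atom of atomic number $Z_i$ under consideration. The energy contribution $E_{local}(\mathbf R_i)$ can be rewritten in an equivalent form. It is straightforward to verify that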

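$$E_{local}(\mathbf R_i) = \int d\mathbf r\; \rho_L(|\mathbf r - \mathbf R_i|)\,V_H(\mathbf r) + \int d\mathbf r\; V_S(|\mathbf r - \mathbf R_i|)\,\rho(\mathbf r) , \tag{35}$$

where $V_H$ is the Hartree potential, and $\rho_L$ is such that $\nabla^2_{\mathbf r} V_L(|\mathbf r - \mathbf R_i|) = -4\pi\rho_L(|\mathbf r - \mathbf R_i|)$. This analytical transformation also remains valid in our procedure for solving the discretized Poisson equation. From equation (35) we can calculate

$$\rho_L(\lambda) = -\frac{1}{(2\pi)^{3/2}}\,\frac{Z_i}{r_\ell^3}\, e^{-\frac{\lambda^2}{2 r_\ell^2}} , \tag{36}$$

which is a localised (thus short-ranged) function. The forces coming from the local pseudopotential are thus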

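$$\mathbf F^{(local)}_i = -\frac{\partial E_{local}(\mathbf R_i)}{\partial \mathbf R_i} = \frac{1}{r_\ell}\int d\mathbf r\; \frac{\mathbf r - \mathbf R_i}{|\mathbf r - \mathbf R_i|}\left[\rho'_L(|\mathbf r - \mathbf R_i|)\,V_H(\mathbf r) + V'_S(|\mathbf r - \mathbf R_i|)\,\rho(\mathbf r)\right] , \tag{37}$$

where

$$\rho'_L(\lambda) = \frac{1}{(2\pi)^{3/2}}\,\frac{Z_i}{r_\ell^4}\,\lambda\, e^{-\frac{\lambda^2}{2 r_\ell^2}} ,$$

$$V'_S(\lambda) = \frac{\lambda}{r_\ell}\, e^{-\frac{\lambda^2}{2 r_\ell^2}}\left[(2C_2 - C_1) + (4C_3 - C_2)\left(\frac{\lambda}{r_\ell}\right)^2 + (6C_4 - C_3)\left(\frac{\lambda}{r_\ell}\right)^4 - C_4\left(\frac{\lambda}{r_\ell}\right)^6\right] . \tag{38}$$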
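With the factor $1/r_\ell$ pulled out in front of the integral in Eq. 37, the primed functions of Eq. 38 are $r_\ell$ times the ordinary derivatives of $\rho_L$ and $V_S$. The following sketch checks this numerically by central differences. The parameter values are made up for the test and are not actual GTH-HGH parameters; the function names are illustrative.

```python
import math


def rho_l(lam, z, r):
    """Eq. 36: Gaussian charge density associated with the long-range potential."""
    return -z / ((2.0 * math.pi) ** 1.5 * r ** 3) * math.exp(-lam * lam / (2.0 * r * r))


def v_s(lam, c, r):
    """Eq. 34: short-range part of the local pseudopotential, c = (C1, C2, C3, C4)."""
    x = lam / r
    return math.exp(-0.5 * x * x) * (c[0] + c[1] * x ** 2 + c[2] * x ** 4 + c[3] * x ** 6)


def rho_l_prime(lam, z, r):
    """Eq. 38, first line (equals r times d rho_L / d lambda)."""
    return z / ((2.0 * math.pi) ** 1.5 * r ** 4) * lam * math.exp(-lam * lam / (2.0 * r * r))


def v_s_prime(lam, c, r):
    """Eq. 38, second line (equals r times d V_S / d lambda)."""
    x = lam / r
    poly = ((2 * c[1] - c[0]) + (4 * c[2] - c[1]) * x ** 2
            + (6 * c[3] - c[2]) * x ** 4 - c[3] * x ** 6)
    return x * math.exp(-0.5 * x * x) * poly


def central_difference(f, lam, step=1e-5):
    return (f(lam + step) - f(lam - step)) / (2.0 * step)


if __name__ == "__main__":
    z, r, c = 4.0, 0.4, (-5.0, 0.8, 0.1, -0.02)   # hypothetical parameters
    lam = 0.7
    print(v_s_prime(lam, c, r) / r, central_difference(lambda t: v_s(t, c, r), lam))
    print(rho_l_prime(lam, z, r) / r, central_difference(lambda t: rho_l(t, z, r), lam))
```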

Within this formulation, the contribution to the forces from the local part of the pseudopotential is written in terms of integrals of localized functions (Gaussians times polynomials) times the charge density and the Hartree potential. This allows us to perform the integrals only in a relatively small region around the atom position and to assign different integrations to different processors. Moreover, the calculation is performed with nearly linear, O(N log N), scaling.

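The contribution to the energy that comes from the nonlocal part of the pseudopotential is, as we saw in section VIII,

$$E_{nonlocal}(\mathbf R_i) = \sum_l \sum_{mn} \langle\Psi|p^l_m(\mathbf R_i)\rangle\, h^l_{mn}\, \langle p^l_n(\mathbf R_i)|\Psi\rangle , \tag{39}$$

where we wrote explicitly the dependence of the projector on the atom position $\mathbf R_i$. The contribution of this term to the atomic forces is thus

$$\mathbf F^{(nonlocal)}_i = -\sum_l \sum_{m,n} \left\langle\Psi\middle|\frac{\partial p^l_m(\mathbf R_i)}{\partial \mathbf R_i}\right\rangle h^l_{mn}\, \langle p^l_n(\mathbf R_i)|\Psi\rangle - \sum_l \sum_{m,n} \langle\Psi|p^l_m(\mathbf R_i)\rangle\, h^l_{mn} \left\langle\frac{\partial p^l_n(\mathbf R_i)}{\partial \mathbf R_i}\middle|\Psi\right\rangle . \tag{40}$$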

Expressing the derivatives of the projectors in the Daubechies basis, the evaluation of the scalar products is straightforward. The scaling function and wavelet expansion coefficients of the projector derivatives can be calculated with machine precision accuracy in the same way as the projectors themselves were calculated. This is due to the fact that the derivatives of the projectors are, like the projectors themselves, products of Gaussians and polynomials.
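To make the last statement explicit, consider one Cartesian factor of a projector centred at an atom whose coordinate along that axis is X. This is a worked example under the assumption that the projector depends on the atom position only through the difference $u = x - X$.

```latex
\frac{\partial}{\partial X}\left[(x - X)^{\ell}\, e^{-c\,(x - X)^2}\right]
  = \left[-\ell\,(x - X)^{\ell - 1} + 2c\,(x - X)^{\ell + 1}\right] e^{-c\,(x - X)^2}
```

The result is again a Gaussian times a polynomial (for $\ell = 0$ the first term is absent), so each term has the form of the integrand in Eq. 24 with $\ell$ replaced by $\ell - 1$ or $\ell + 1$.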

XII. LOCALISATION PROPERTIES AND SMOOTHNESS OF THE BASIS FUNCTIONS

As discussed above, Daubechies basis functions are suitable for expanding localised functions. There is no need to put basis functions on grid points which are too far from the atoms. For this reason, we choose to associate the basis functions with points lying inside the union of atom-centered spheres defined by their radii. This operation must be performed both for the high and low resolution grid points (see Figure 2). In our method, we measure these radii in two different units. For the high resolution region the radius is expressed in terms of the shortest localisation radius of the atom pseudopotential. For the low resolution region, the distance is expressed in units of the asymptotic decay length of the atomic wavefunction, $1/\sqrt{2\epsilon_{HOMO}}$, calculated from the energy $\epsilon_{HOMO}$ of the highest occupied atomic orbital, obtained from (20). In this way we can easily determine nearly optimal sizes for the high and low resolution regions and minimize the number of degrees of freedom needed to achieve a target accuracy (see section XVI).


We saw that Daubechies wavelets have the property that linear combinations of them can be smoother than a single Daubechies scaling function or wavelet. The wavefunction of Eq. 3 is thus typically smoother than the scaling functions and wavelets used to represent it. The reduced smoothness of the Daubechies scaling function of order 16 in the tail region can be seen from Fig. 7. The cancellation of discontinuities in the basis set by suitable linear combinations is only possible in an infinite interval where several basis functions are present between any two grid points. Since we use a finite grid of scaling functions in the tail region, the number of scaling functions that contribute to the value of the wavefunction at a certain point drops as we move out of the computational volume. The outermost interval of the wavefunction is actually described only by the tail of a single scaling function. Hence the wavefunction gets less smooth towards its end. This reduced smoothness principally affects the kinetic energy. For systems without a net charge, the potential is very small far from the atoms, and for this reason errors in the potential energy decrease exponentially with respect to the size of the computational volume.

XIII. PERTURBATIVE CALCULATION OF THE FINITE SIZE CORRECTIONS

Far from the atoms each wavefunction decays exponentially with a decay rate which depends on its KS eigenvalue. The kinetic energy contribution of the non-smooth wavefunction in its tail region is of the order of $A/h^2$, where A is the amplitude of the tail of the wavefunction, whereas the exact wavefunction has a kinetic energy of the order of A. As a consequence the kinetic energy error increases as one decreases h, and the total energy increases as well if the computational volume is too small. We know however that the contribution to the kinetic energy in this region depends only on the asymptotic behaviour of the wavefunction, which is determined by its KS eigenvalue. In other terms, the magnitude of the kinetic energy error due to the localisation of the system into a finite size volume can in principle be estimated by knowing the KS eigenvalue of the wavefunction.

If on the other hand the computational volume is large enough such that the amplitude A is very small, our method shows a strict variational behaviour with a convergence rate of $h^{14}$ over a large range of grid spacings h. This is illustrated in Fig. 3.

FIG. 7 Zoom of the Daubechies scaling function near the border of its support. Both the function φ(x) and its absolute value |φ(x)| (logarithmic scale) are plotted as a function of x.

The facts described above prompted us to develop a method that cuts off the wavefunction tail at a very large radius but which is computationally much less expensive than a fully selfconsistent calculation in a very large computational volume. We first do a fully selfconsistent calculation in a medium size box and afterwards add the missing far tail to the wavefunction. Let us denote the wavefunction that we have calculated in the medium size box by $|\Psi\rangle$ and the wavefunction in the very large box by $|\Psi\rangle + |\Delta\Psi\rangle$. As we will see, $|\Delta\Psi\rangle$ is negligible inside the medium size box. It is essentially the tail outside the original medium size box plus a part that cancels the non-smooth behaviour in the surface region of the medium size box. Evidently $|\Psi\rangle + |\Delta\Psi\rangle$ has to satisfy the Schrödinger equation

$$\left(\frac{1}{2}\nabla^2 + V(\mathbf r)\right)\left(|\Psi\rangle + |\Delta\Psi\rangle\right) = \epsilon\left(|\Psi\rangle + |\Delta\Psi\rangle\right) .$$

Rearranging the terms one obtains

$$\left(\frac{1}{2}\nabla^2 + V(\mathbf r) - \epsilon\right)|\Delta\Psi\rangle = -\left(\frac{1}{2}\nabla^2 + V(\mathbf r) - \epsilon\right)|\Psi\rangle . \tag{41}$$

The term on the right hand side of the above equation is the gradient $|g\rangle$ that is needed in any minimization scheme. When the calculation of the wavefunctions is converged, the gradient is zero (actually less than a small numerical tolerance) when projected onto the subspace of the basis functions spanning the medium size volume. The gradient is however no longer zero when it is projected onto the basis set of the larger volume. In this case the projection onto the basis functions just outside the medium size volume gives a nonzero contribution. Remember that the absence of these basis functions from the basis set of the medium size volume is what causes the non-smooth behaviour. Projections onto basis functions that are far outside the surface region of the medium size volume are again zero, since $|\Psi\rangle$ is identically zero there. So, in this context, the gradient is a quantity that is nonzero only in a small shell outside the original medium size volume. The width of this shell is given by the length of the kinetic energy filter. Since the potential is very small in the tail region, Eq. 41 can be approximated by

$$\left(\frac{1}{2}\nabla^2 - \epsilon\right)|\Delta\Psi\rangle = |g\rangle .$$


As usual in a perturbative treatment, we rely on the fact that the eigenvalues ε converge faster than the wavefunction and that the zeroth order eigenvalues can therefore be used for the first order correction of the wavefunction. The above equation is identical to the preconditioning equation Eq. 26 and can be solved with the same method, just within a larger volume. In this way we can eliminate, in a single preconditioning step at the end of the fully selfconsistent calculation in the medium size volume, a large fraction of the error arising from cutting off the wavefunction at the surface of our computational volume. We can thus obtain a reliable estimate of the approximation which is made by localising the system in a finite-size region. Fig. 8 shows an example of the convergence rate of the total energy with respect to the size of the computational volume, both with and without tail correction.

FIG. 8 Absolute convergence of the total energy of a methane molecule as a function of the low resolution localization radius with and without the tail corrections. The curves for two different values of the grid spacing are plotted, showing the h convergence for the localization parameter sufficiently extended. [Plot: absolute energy precision (Ha, logarithmic scale) versus localisation radius of the coarse region (arb. units), for h = 0.35 bohr and h = 0.30 bohr, with ordinary wavefunction minimisation and with perturbative tail corrections.]

XIV. PARALLELIZATION

Two data distribution schemes are used in the parallel version of our program. In the orbital distribution scheme each processor works on one or a few orbitals, for which it holds all the scaling function and wavelet coefficients. In the coefficient distribution scheme each processor holds a certain subset of the coefficients of all the orbitals. Most of the operations, such as applying the Hamiltonian to the orbitals and the preconditioning, are done in the orbital distribution scheme. This has the advantage that we do not have to parallelize all these routines and that we therefore achieve perfect parallel speedup. The calculation of the Lagrange multipliers that enforce the orthogonality constraints on the gradient, as well as the orthogonalization of the orbitals, is done in the coefficient distribution scheme. For the orthogonalization we have to calculate the matrix $\langle\Psi_j|\Psi_i\rangle$ and for the Lagrange multipliers the matrix $\langle\Psi_j|H|\Psi_i\rangle$. Each matrix element is a scalar product, and each processor calculates the contribution to this scalar product from the coefficients it is holding. A global reduction sum is then used to sum the contributions and obtain the correct matrix. Such sums can easily be performed with the very well optimised BLAS-LAPACK libraries. Switching back and forth between the orbital distribution scheme and the coefficient distribution scheme is done by the MPI global transposition routine MPI_ALLTOALL. For parallel computers where the cross sectional bandwidth (25) scales well with the number of processors, this global transposition does not require a lot of CPU time. The most time consuming communication is the global reduction sum required to obtain the total charge distribution from the partial charge distributions of the individual orbitals (sum in Eq. 14).
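The coefficient distribution scheme and the reduction sum can be emulated in a few lines of serial code. The sketch below does not use MPI: nested Python lists stand in for the data held by each processor, a slicing step plays the role of the global transposition, and an explicit sum plays the role of the global reduction. All names are illustrative.

```python
def to_coefficient_distribution(orbitals, nproc):
    """Emulate the transposition: processor p receives coefficients p, p+nproc, ... of every orbital."""
    return [[orb[p::nproc] for orb in orbitals] for p in range(nproc)]


def partial_overlap(local_orbitals):
    """Contribution to S_ij = <Psi_j|Psi_i> from the coefficients held by one processor."""
    n = len(local_orbitals)
    return [[sum(a * b for a, b in zip(local_orbitals[j], local_orbitals[i]))
             for j in range(n)] for i in range(n)]


def reduce_sum(matrices):
    """Emulate the global reduction: element-wise sum of the partial matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Three made-up orbitals with five coefficients each, spread over two "processors".
    orbitals = [[1, 2, 3, 4, 5], [0, 1, 0, 1, 0], [2, 0, 2, 0, 2]]
    chunks = to_coefficient_distribution(orbitals, nproc=2)
    overlap = reduce_sum([partial_overlap(chunk) for chunk in chunks])
    print(overlap)
```

Each processor computes a full but partial matrix from its own slice, so one reduction of a matrix with one entry per orbital pair replaces a separate reduction for every scalar product.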

XV. CALCULATION OF UNOCCUPIED ORBITALS

In order to calculate the unoccupied Kohn-Sham orbitals we use the Davidson method (29) after having found the selfconsistent occupied Kohn-Sham orbitals. An initial guess for the $N_{virt}$ unoccupied eigenvectors $\Psi_j$ and eigenvalues $\epsilon_j$ of the Kohn-Sham Hamiltonian $H_{KS}$ is obtained from the subspace diagonalization in a minimal atomic basis set that is also used to generate the input guess for the occupied orbitals. For any given set of virtual orbitals we then calculate the gradients (Eq. 25, where the Lagrange multipliers ensure only orthogonality to the occupied orbitals) and precondition these gradients according to Eq. 26. A subspace diagonalization is then done in the space spanned by the present set of approximate eigenvectors and their preconditioned gradients. In the original Davidson method the dimension of the subspace is increased in each iteration, since one keeps all the previous preconditioned gradients in the subspace. To save memory we have limited the dimension of the subspace in each iteration to $2N_{virt}$ by using only the present set of approximate eigenvectors together with their preconditioned gradients. Even though the number of requested unoccupied orbitals is typically small (frequently only the LUMO), a larger set of $N_{virt}$ vectors is considered in our method (in a parallel calculation at least one per processor), but only the gradients of the desired number of orbitals are taken into account in the convergence criterion for the norm of the gradients. This, together with the fact that our preconditioner is rather good, allows us to achieve fast convergence rates comparable to the ones achieved in the calculation of the occupied orbitals. Some 20 iterations are typically needed.


XVI. PERFORMANCE RESULTS

We have applied our method to different molecular systems in order to test its performance. As expected, the localisation of the basis set allows us to reduce considerably the number of degrees of freedom (i.e. the number of basis functions which must be used) needed to attain a given absolute precision, compared with a plane-wave code. This reduces the memory requirements and the number of floating point operations. Figure 9 shows the absolute precision of the calculation for a 44-atom molecule as a function of the number of degrees of freedom used for the calculation. Table I compares the timings of a single SCF cycle with those of plane-wave based codes. Since the system is relatively small the cubic terms do not dominate. For large systems of several hundred atoms the gain in CPU time compared to a plane wave program is proportional to the reduction in the number of degrees of freedom squared (compare Eq. 42) and can thus be very significant, as one can conclude from Fig. 9.

FIG. 9 Absolute precision (not precision per atom) as a function of the number of degrees of freedom for a cinchonidine molecule (44 atoms). Our method is compared with a plane wave code. In the case of the plane wave code the plane wave cutoff and the volume of the computational box were chosen so as to obtain the required precision with the smallest number of degrees of freedom. In the case of our wavelet program the grid spacing h and the localization radii were optimized. For very high accuracies the exponential convergence rate of the plane waves beats the algebraic convergence rate of the wavelets. The wavelet code is however much faster for a given accuracy, thanks to the localisation properties of the basis set, which allow for highly optimised operations (see table I). [Plot: absolute energy precision (Ha) versus number of degrees of freedom, both on logarithmic scales; plane wave points are labelled Ec = 40, 90 and 125 Ha, wavelet points h = 0.4 and 0.3 bohr.]

The parallelisation scheme of the code has been tested and gives the efficiency detailed in Figure 10. The overall efficiency is always higher than 88%, even for large systems with a large number of processors.

TABLE I Computational time in seconds for a single minimisation iteration for different runs of the cinchonidine molecule used for the plot in figure 9. The values for different cutoff energies Ec for the plane wave runs are shown. The input parameters for the wavelet runs are chosen so as to obtain the same absolute precision as the plane wave calculations. The plane wave runs are performed with the ABINIT code, which uses iterative diagonalisation, and with the CPMD code (26) in direct minimisation. These timings are taken for a serial run on a 2.4 GHz AMD Opteron CPU.

Ec (Ha)    ABINIT (s)    CPMD (s)    Abs. precision (Ha)    Wavelets (s)
40         403           173         3.7 · 10^-1            30
50         570           207         1.6 · 10^-1            45
75         1123          422         2.5 · 10^-2            94
90         1659          538         9.3 · 10^-3            129
145        4109          -           2 · 10^-4              474

FIG. 10 Efficiency of the parallel implementation of the code for several runs with different numbers of atoms. The number close to each point indicates the number of orbitals treated by each processor, in the orbital distribution scheme. [Plot: efficiency (%) versus number of processors (logarithmic scale).]

It is also interesting to see the computational share of the different sections of the code with respect to the total execution time. Figure 11 shows the percentage of the computational time spent in the different sections of the code as a function of the number of orbitals, while keeping the number of orbitals per processor constant. The sections considered are the application of the hamiltonian (kinetic, local plus nonlocal potential), the construction of the density (Eq. 14), the Poisson solver for creating the Hartree potential, the preconditioning-DIIS, and the operations needed for the orthogonality constraint as well as the orthogonalisation, which are mainly matrix-matrix products or matrix decompositions. These operations are all performed by linear algebra subroutines provided by the LAPACK libraries (27). The percentage of the communication time is also shown. While for relatively small systems the most time-consuming part of the code is the Poisson solver, for large systems the most expensive section is by far the linear algebra. The operations performed in this section scale cubically with respect to the number of atoms. Apart from the Cholesky factorisation, which has a scaling of $O(n_{orb}^3)$, where $n_{orb}$ is the number of orbitals, the cubic terms are of the form

$$O(n \cdot n_{orb}^2) , \tag{42}$$

where n is the number of degrees of freedom, i.e. the number of scaling function and wavelet expansion coefficients. Both the calculation of the overlap matrix in Eq. 27 and the orthogonality transformation of the orbitals in Eq. 31 lead to this scaling. The number of coefficients n is typically much larger than the number of orbitals.
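A rough operation count illustrates why the terms of Eq. 42 dominate over the Cholesky factorisation when n is much larger than the number of orbitals. The sketch below uses leading-order flop estimates that are assumptions of this example (two flops per multiply-add, one triangle of the overlap matrix, a triangular transformation, and the textbook $n_{orb}^3/3$ estimate for a Cholesky factorisation); the input sizes are made up.

```python
def overlap_flops(n_coeff, n_orb):
    """n_orb (n_orb + 1) / 2 scalar products of length n_coeff, 2 flops per element."""
    return n_coeff * n_orb * (n_orb + 1)


def transform_flops(n_coeff, n_orb):
    """Triangular transformation of Eq. 31: n_orb (n_orb + 1) / 2 vector updates of length n_coeff."""
    return n_coeff * n_orb * (n_orb + 1)


def cholesky_flops(n_orb):
    """Leading-order cost of a Cholesky factorisation of an n_orb x n_orb matrix."""
    return n_orb ** 3 / 3


if __name__ == "__main__":
    n_coeff, n_orb = 200_000, 300        # hypothetical sizes, for illustration only
    linear_algebra = overlap_flops(n_coeff, n_orb) + transform_flops(n_coeff, n_orb)
    print(linear_algebra / cholesky_flops(n_orb))
```

Since both n and the number of orbitals grow linearly with the number of atoms, all three terms are cubic in the system size, but the prefactor of the first two is larger by a factor of order n divided by the number of orbitals.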

FIG. 11 Relative importance of different code sections as a function of the number of atoms of a simple alkane chain, starting from a single carbon atom. The calculation is performed in parallel such that each processor holds the same number of orbitals (two in this figure). The time in seconds for a single minimisation iteration is also indicated, showing the asymptotic cubic scaling of the present implementation. [Plot: percentage of time per code section (LinAlg, sumrho, PSolver, HamApp, Precond, Other, Comm) and time in seconds (logarithmic scale) versus number of atoms, from 1 to 1025.]

XVII. CONCLUSIONS

In this paper we have shown the principal features of an electronic structure pseudopotential method based on Daubechies wavelets. Their properties make this basis set a powerful and promising tool for electronic structure calculations. The matrix elements of the kinetic energy and nonlocal pseudopotential operators can be calculated analytically in this basis. The other operations are mainly based on convolutions with short-range filters, which can be highly optimised in order to obtain good computational performance. Our code shows systematic convergence properties, very good performance and excellent efficiency for parallel calculations. This code is integrated in the ABINIT software package and is available under the GNU-GPL license. At present, several development efforts are in progress to extend the features of this code. They mainly concern the extension of this formalism to fully periodic and surface systems, as well as the inclusion of non-collinear spin-polarised XC functionals. A linear scaling version of this wavelet code is also in preparation and will be presented in a forthcoming paper.

XVIII. ACKNOWLEDGEMENTS

We acknowledge support from the European Commission within the Sixth Framework Program through NEST-BigDFT (contract no. BigDFT-511815) and the Swiss National Science Foundation. LN3M. Computer calculations were also performed at the Centre de Calcul Recherche et Technologie at CEA-Saclay, France and at the Swiss National Scientific Computing Center (CSCS) in Manno.

References

[1] X. Gonze, J.-M. Beuken, R. Caracas, F. Detraux, M. Fuchs, G.-M. Rignanese, L. Sindic, M. Verstraete, G. Zerah, F. Jollet, M. Torrent, A. Roy, M. Mikami, Ph. Ghosez, J.-Y. Raty, D. C. Allan, Computational Materials Science 25, 478-492 (2002). http://www.abinit.org

[2] http://www-drfmc.cea.fr/sp2m/L Sim/BigDFT ; http://www.unibas.ch/comphys/comphys/SOFTWARE

[3] Thomas L. Beck, Rev. Mod. Phys. 72, 1041 (2000)

[4] J. E. Pask, B. M. Klein, C. Y. Fong, and P. A. Sterne, Phys. Rev. B 59, 12352 (1999)

[5] J. J. Mortensen, L. B. Hansen, and K. W. Jacobsen, Phys. Rev. B 71, 035109 (2005)

[6] Stefan Goedecker, Rev. Mod. Phys. 71, 1085 (1999)

[7] T. A. Arias, Rev. Mod. Phys. 71, 267 (1999)

[8] Takeshi Yanai, George I. Fann, Zhenting Gan, Robert J. Harrison, and Gregory Beylkin, J. Chem. Phys. 121, 6680 (2004)

[9] I. Daubechies, “Ten Lectures on Wavelets”, SIAM, Philadelphia (1992)

[10] S. Goedecker, “Wavelets and their application for the solution of partial differential equations”, Presses Polytechniques Universitaires et Romandes, Lausanne, Switzerland 1998 (ISBN 2-88074-398-2)

[11] G. Beylkin, SIAM J. on Numerical Analysis 6, 1716 (1992)

[12] Strang J, Fix G J, An analysis of the Finite Element Method (Wellesley-Cambridge Press, 1988)

[13] C. J. Tymczak and Xiao-Qian Wang, Phys. Rev. Lett. 78, 3654 (1997)

[14] A. I. Neelov and S. Goedecker, J. of. Comp. Phys. 217, 312-339 (2006)

[15] S. Goedecker, M. Teter, J. Hutter, Phys. Rev. B 54, 1703 (1996)

[16] C. Hartwigsen, S. Goedecker and J. Hutter, Phys. Rev. B 58, 3641 (1998)

[17] M. Krack, Theor. Chem. Acc. 114, 145 (2005)

[18] P. Pulay, Chem. Phys. Lett. 73, 393 (1980)

[19] J. Hutter, H.P. Luthi and M. Parrinello, Comp. Mat. Sci. 2, 244 (1994)

[20] http://physics.nist.gov/PhysRefData/DFTdata/Tables/ptable.html

[21] J. Perdew, K. Burke and M. Ernzerhof, Phys. Rev. Lett 77, 3865 (1996)

[22] L. Genovese, T. Deutsch, A. Neelov, S. Goedecker, G. Beylkin, “Efficient solution of Poisson’s equation with free boundary conditions”, J. Chem. Phys. 125, 074105 (2006)

[23] L. Genovese, T. Deutsch, S. Goedecker, “Efficient and accurate three dimensional Poisson solver for surface problems”, J. Chem. Phys. 127, 054704 (2007)

[24] J. A. White and D. M. Bird, Phys. Rev. B 50, 4954 (1994)

[25] S. Goedecker, A. Hoisie, “Performance Optimization of Numerically Intensive Codes”, SIAM publishing company, Philadelphia, USA 2001 (ISBN 0-89871-484-2)

[26] CPMD Version 3.8: developed by J. Hutter, A. Alavi, T. Deutsch, M. Bernasconi, S. Goedecker, D. Marx, M. Tuckerman and M. Parrinello, Max-Planck-Institut für Festkörperforschung and IBM Zurich Research Laboratory (1995-1999)

[27] E. Anderson et al., “LAPACK Users’ Guide”, SIAM publishing company, Philadelphia, USA 1999 (ISBN 0-89871-447-8)

[28] M. Dion, H. Rydberg, E. Schr, D. C. Langreth, and B. I. Lundqvist, Phys. Rev. Lett. 92, 246401 (2004)

[29] E. R. Davidson, J. Comp. Phys. 17, 87 (1975)

[30] P. Pulay, in Modern Theoretical Chemistry, H. F. Schaefer editor (Plenum Press, New York), 1977

[31] J. R. Chelikowsky, N. Troullier, and Y. Phys. Rev. Lett. 72, 1240 (1994)

[32] G. Beylkin, R. Coifman and V. Rokhlin, Comm. Pure and Appl. Math. 44, 141 (1991)

[33] M. Payne, M. Teter, D. Allan, T. Arias and J. Joannopoulos, Rev. of Mod. Phys. 64, 1045 (1992)
