preliminary cpmd benchmarks on ranger, pople, and abe tg aus materials science project matt mckenzie...

Preliminary CPMD Benchmarks on Ranger, Pople, and Abe

TG AUS Materials Science Project
Matt McKenzie
LONI

What is CPMD?

• Car-Parrinello Molecular Dynamics
▫ www.cpmd.org

• Parallelized plane wave / pseudopotential implementation of Density Functional Theory

• Common chemical systems: liquids, solids, interfaces, gas clusters, reactions
▫ Large systems, ~500 atoms
▫ Scales with the number of electrons, NOT atoms

Key Points in Optimizing CPMD

• Developers have done a lot of work here

• The Intel compiler is used in this study

• BLAS/LAPACK
▫ BLAS levels 1 (vector ops) and 3 (matrix-matrix ops)
▫ Some level 2 (matrix-vector)

• Integrated optimized FFT library
▫ Compiler flag: -DFFT_DEFAULT

Benchmarking CPMD is difficult because…

• Nature of the modeled chemical system
▫ Solids, liquids, interfaces: each requires different parameters, stressing the memory along the way
▫ Volume and number of electrons

• Choice of the pseudopotential (psp)
▫ Norm-conserving, ‘soft’, non-linear core correction (++memory)

• Type of simulation conducted
▫ CPMD, BOMD, Path Integral, Simulated Annealing, etc.
▫ CPMD is a robust code

• Very chemical system specific
▫ Any one CPMD simulation cannot be easily compared to another
▫ However, THERE ARE TRENDS

• FOCUS: simple wave function optimization timing
▫ This is a common ab initio calculation

Probing Memory Limitations

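• For any ab initio calculation:
▫ Accuracy improves with the number of basis functions used
▫ These are stored in matrices, requiring increased RAM

• The energy cutoff determines the size of the plane wave basis set, where Ω is the volume of the simulation cell:

$N_{PW} = \frac{1}{2\pi^2}\,\Omega\,E_{cut}^{3/2}$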
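The relation above can be sketched as a small Python helper. This is only an illustration: it assumes atomic units (Ω in bohr³, Ecut in Hartree, with the Rydberg value converted first), and the function name and the example volume are hypothetical, not taken from the benchmark.

```python
import math

RYDBERG_PER_HARTREE = 2.0  # 1 Hartree = 2 Rydberg


def plane_wave_count(volume_bohr3: float, ecut_rydberg: float) -> float:
    """Estimate N_PW = (1 / (2 * pi**2)) * Omega * Ecut**(3/2).

    Assumption: atomic units, i.e. Omega in bohr^3 and Ecut in Hartree,
    so the Rydberg cutoff is converted before use.
    """
    ecut_hartree = ecut_rydberg / RYDBERG_PER_HARTREE
    return volume_bohr3 * ecut_hartree ** 1.5 / (2.0 * math.pi ** 2)


# Hypothetical cell volume, chosen only to illustrate the call.
example_volume = 10_000.0  # bohr^3
for ecut in (50.0, 70.0):
    npw = plane_wave_count(example_volume, ecut)
    print(f"Ecut = {ecut:.0f} Ryd -> N_PW ~ {npw:,.0f}")
```

Because the estimate is linear in Ω and grows as Ecut^(3/2), doubling the cell volume doubles the basis size, while doubling the cutoff increases it by a factor of about 2.8.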

Model Accuracy & Memory Overview

Image obtained from the CPMD user’s manual

Pseudopotential’s convergence behavior w.r.t. basis set size (cutoff)

NOTE: Choice of psp is important
▫ i.e. ‘softer’ psp = lower cutoff = loss of transferability
▫ VASP specializes in soft psp’s; CPMD works with any psp

Memory Comparison

Ψ optimization, 63 Si atoms, SGS psp

Ecut = 50 Ryd
• NPW ≈ 134,000
• Memory = 1.0 GB

Ecut = 70 Ryd
• NPW ≈ 222,000
• Memory = 1.8 GB

Well known CPMD benchmarking model: www.cpmd.org
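As a quick consistency check, the reported basis sizes for the two cutoffs can be compared with the Ecut^(3/2) scaling of the plane wave count. The sketch below uses only the numbers listed above; the variable names are illustrative.

```python
# Values reported for the 63 Si atom benchmark.
ecut_low, ecut_high = 50.0, 70.0        # Ryd
npw_low, npw_high = 134_000, 222_000    # plane waves
mem_low_gb, mem_high_gb = 1.0, 1.8      # GB

# N_PW is proportional to Ecut**1.5 at fixed cell volume.
predicted_npw_ratio = (ecut_high / ecut_low) ** 1.5
reported_npw_ratio = npw_high / npw_low
reported_memory_ratio = mem_high_gb / mem_low_gb

print(f"predicted N_PW ratio: {predicted_npw_ratio:.3f}")
print(f"reported N_PW ratio:  {reported_npw_ratio:.3f}")
print(f"reported memory ratio: {reported_memory_ratio:.2f}")
```

The reported basis sizes follow the 3/2 power law closely, while the reported memory grows somewhat faster than the plane wave count in this case.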

Typical Results / Interpretations, nothing new here

Results can be shown either by:
• Wall time = (n steps × iteration time/step) + network overhead
• Iteration time = fundamental unit, used throughout any given CPMD calculation
▫ It neglects the network, yet results are comparable

Note: CPMD runs well on a few nodes connected with gigabit Ethernet

Two important factors that affect CPMD performance:
• Memory bandwidth
• Floating-point performance
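The wall-time relation can be written out directly, and inverted to estimate the network overhead from a measured run. This is a minimal sketch; the function names are hypothetical and the numbers in the usage lines are placeholders, not benchmark results.

```python
def wall_time(n_steps: int, iteration_time_s: float,
              network_overhead_s: float = 0.0) -> float:
    """Wall time = (n steps x iteration time per step) + network overhead."""
    return n_steps * iteration_time_s + network_overhead_s


def network_overhead(measured_wall_time_s: float, n_steps: int,
                     iteration_time_s: float) -> float:
    """Overhead implied by a measured wall time and the average iteration time."""
    return measured_wall_time_s - n_steps * iteration_time_s


# Placeholder inputs for illustration only.
total = wall_time(n_steps=100, iteration_time_s=2.0, network_overhead_s=30.0)
print(total)
print(network_overhead(total, n_steps=100, iteration_time_s=2.0))
```

Comparing machines by iteration time alone amounts to dropping the overhead term, which is why the two measures can rank systems differently once the network matters.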

Pople, Abe, Ranger CPMD Benchmarks

[Figure: average iteration time in seconds (0 to 8) versus number of cores (0 to 256). Series: Pople 50 Ryd, Pople 70 Ryd, Abe 50 Ryd, Abe 70 Ryd, Ranger 50 Ryd, Ranger 70 Ryd.]

Results I

• All calculations ran no longer than 2 hours

• Ranger is not the preferred machine for CPMD

• Scales well between 8 and 96 cores
▫ This is a common CPMD trend

• CPMD is known to scale super-linearly above ~1000 processors
▫ Will look into this
▫ The chemical system would have to change, as this smaller simulation is unlikely to scale in this manner
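Whether a run scales "well" or "super-linearly" can be made concrete with speedup and parallel efficiency relative to a reference core count. The sketch below is one common way to define these quantities; the function names are hypothetical and the timings in the usage lines are placeholders, not values read from the benchmark plot.

```python
def speedup(t_ref: float, t_n: float) -> float:
    """Speedup of a run with iteration time t_n relative to a reference run."""
    return t_ref / t_n


def parallel_efficiency(cores_ref: int, t_ref: float,
                        cores_n: int, t_n: float) -> float:
    """Measured speedup divided by the ideal speedup (cores_n / cores_ref)."""
    return speedup(t_ref, t_n) / (cores_n / cores_ref)


def is_superlinear(cores_ref: int, t_ref: float,
                   cores_n: int, t_n: float) -> bool:
    """Super-linear scaling means efficiency above 1 relative to the reference."""
    return parallel_efficiency(cores_ref, t_ref, cores_n, t_n) > 1.0


# Placeholder timings for illustration only.
print(parallel_efficiency(8, 6.0, 16, 3.0))   # ideal scaling
print(is_superlinear(8, 6.0, 16, 2.5))        # faster than ideal
```

Efficiency above 1 usually points to a memory effect, for example the per-core working set starting to fit in cache as the problem is spread over more cores.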

Results II

• Pople and Abe gave the best performance

• IF a system requires more than 96 procs, Abe would be a slightly better choice

• Given the difficulties in benchmarking CPMD (psp, volume, system phase, simulation protocol), this benchmark is not a good representation of all the possible uses of CPMD
▫ Only explored one part of the code

• How each system performs when taxed with additional memory requirements is a better indicator of CPMD’s performance
▫ To increase system accuracy, increase Ecut

Percent Difference between 70 and 50 Ryd

$\%\mathrm{Diff} = \frac{t_{70} - t_{50}}{t_{50}} \times 100$

[Figure: percent difference relative to 50 Ryd (0 to 70) versus number of cores (0 to 256). Series: Pople, Abe, Ranger.]
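The percent-difference metric used for this plot can be computed per core count as follows. This is a minimal sketch: the function name is hypothetical and the timings in the usage line are placeholders, not measured values.

```python
def percent_difference(t50: float, t70: float) -> float:
    """%Diff = [(t70 - t50) / t50] * 100, relative to the 50 Ryd iteration time."""
    return (t70 - t50) / t50 * 100.0


# Placeholder iteration times (seconds) for illustration only.
print(percent_difference(t50=2.0, t70=3.0))
```

A smaller and flatter percent difference across core counts indicates a machine that absorbs the larger 70 Ryd memory footprint with less slowdown.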

Conclusions

RANGER
• Re-ran Ranger calculations
• Lower performance may be linked to the Intel compiler on AMD chips
▫ The PGI compiler could show an improvement
▫ No more than a 5% gain is expected, so Ranger would still be the slowest
▫ Wanted to use the same compiler/math libraries

ABE
• Possible super-linear scaling: t(Abe, 256 procs) < t(others, 256 procs)
• Memory size effects hinder performance below 96 procs

POPLE
• Is the best system for wave function optimization
• Shows a (relatively) stable, modest speed decrease as the memory requirement is increased
• It is the recommended system

Future Work

• Half-node benchmarking

• Profiling tools

• Test the MD part of CPMD
▫ Force calculations involving the non-local parts of the psp will increase memory
▫ Extensive level 3 BLAS & some level 2
▫ Many FFT all-to-all calls; now the network plays a role
▫ Memory > 2 GB
▫ A new variable! Monitor the fictitious electron mass

• Changing the model
▫ Metallic system (lots of electrons, change of psp; Ecut)
▫ Check super-linear scaling
