Preliminary CPMD Benchmarks on Ranger, Pople, and Abe
TG AUS Materials Science Project
Matt McKenzie
LONI
What is CPMD?
• Car-Parrinello Molecular Dynamics
  ▫ www.cpmd.org
• Parallelized plane wave / pseudopotential implementation of Density Functional Theory
• Common chemical systems: liquids, solids, interfaces, gas clusters, reactions
  ▫ Large systems: ~500 atoms
  ▫ Scales with # electrons, NOT # atoms
Key Points in Optimizing CPMD
• The CPMD developers have already done much of the optimization work
• The Intel compiler is used in this study
• BLAS/LAPACK
  ▫ BLAS levels 1 (vector ops) and 3 (matrix-matrix ops)
  ▫ Some level 2 (matrix-vector ops)
• Integrated optimized FFT library
  ▫ Compiler flag: -DFFT_DEFAULT
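The BLAS levels listed above differ mainly in how much arithmetic they perform per value fetched from memory, which is why level 3 routines tend to be limited by floating-point speed while levels 1 and 2 tend to be limited by memory bandwidth. A minimal Python sketch, assuming idealized counts (square n × n operands, each operand read once, each result written once) for one representative routine per level; the function name is hypothetical and the numbers are not CPMD measurements:

```python
# Idealized flop and memory-traffic counts for one representative routine per
# BLAS level (square n x n operands; each operand read once, result written once).

def blas_intensity(n: int) -> dict[str, float]:
    """Return flops per word moved for AXPY (L1), GEMV (L2) and GEMM (L3)."""
    counts = {
        # Level 1, y <- a*x + y: n multiplies + n adds; read x, read y, write y
        "level1_axpy": (2 * n, 3 * n),
        # Level 2, y <- A x + y: n^2 multiply-adds; read A, x, y; write y
        "level2_gemv": (2 * n * n, n * n + 3 * n),
        # Level 3, C <- A B + C: n^3 multiply-adds; read A, B, C; write C
        "level3_gemm": (2 * n ** 3, 4 * n * n),
    }
    return {name: flops / words for name, (flops, words) in counts.items()}


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, {k: round(v, 2) for k, v in blas_intensity(n).items()})
```

Levels 1 and 2 stay near a constant 2/3 and 2 flops per word regardless of n, while level 3 grows as n/2, so only the level 3 work can keep the floating-point units busy.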
Benchmarking CPMD is difficult because…
• Nature of the modeled chemical system
  ▫ Solids, liquids, interfaces
  ▫ Each requires different parameters (volume, # electrons), which stress memory in different ways
• Choice of the pseudopotential (psp)
  ▫ Norm-conserving, ‘soft’, non-linear core correction (++memory)
• Type of simulation conducted
  ▫ CPMD, BOMD, Path Integral, Simulated Annealing, etc.
  ▫ CPMD is a robust code
• Results are very specific to the chemical system
  ▫ One CPMD simulation cannot easily be compared to another
  ▫ However, THERE ARE TRENDS
• FOCUS: simple wave function optimization timing
  ▫ This is a common ab initio calculation
Probing Memory Limitations
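• For any ab initio calculation:
  ▫ Accuracy improves with the size of the basis set used
  ▫ The basis set is stored in matrices, so a larger basis requires more RAM
• The energy cutoff determines the size of the plane wave basis set, where Ω is the simulation cell volume:
  $N_{PW} = \frac{1}{2\pi^2}\,\Omega\,E_{cut}^{3/2}$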
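The formula above makes the memory pressure easy to estimate: N_PW grows linearly with the cell volume and as the 3/2 power of the cutoff. A minimal sketch, assuming Ω and E_cut are given in consistent atomic units as the formula requires; the cell volume used here is hypothetical and only illustrates the scaling:

```python
import math


def n_pw(omega: float, e_cut: float) -> float:
    """Approximate plane-wave count N_PW = Omega * E_cut**1.5 / (2 * pi**2).

    omega and e_cut must be in consistent (atomic) units; the absolute
    value is only an estimate, but ratios between runs are unit-independent.
    """
    return omega * e_cut ** 1.5 / (2.0 * math.pi ** 2)


# Hypothetical cell volume, chosen only to illustrate the scaling.
omega = 1000.0
base = n_pw(omega, 25.0)
print(n_pw(2 * omega, 25.0) / base)   # doubling the volume: 2.0
print(n_pw(omega, 50.0) / base)       # doubling the cutoff: 2**1.5, about 2.83
```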
Model Accuracy & Memory Overview
Image obtained from the CPMD user’s manual
Pseudopotential convergence behavior w.r.t. basis set size (cutoff)
NOTE: The choice of psp is important, i.e. a ‘softer’ psp = lower cutoff = loss of transferability
VASP specializes in soft psp’s; CPMD works with any psp
Memory Comparison: Ψ optimization, 63 Si atoms, SGS psp
Ecut = 50 Ryd
• NPW ≈ 134,000
• Memory = 1.0 GB
Ecut = 70 Ryd
• NPW ≈ 222,000
• Memory = 1.8 GB
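The two Si runs above can be checked against the E_cut^(3/2) scaling of N_PW. A short sketch using only the numbers from the comparison; the dictionary layout is just for illustration:

```python
# Values copied from the memory comparison (63 Si atoms, SGS psp).
data = {50: {"npw": 134_000, "mem_gb": 1.0},
        70: {"npw": 222_000, "mem_gb": 1.8}}

predicted = (70 / 50) ** 1.5                      # N_PW grows as E_cut**1.5
observed = data[70]["npw"] / data[50]["npw"]
mem_ratio = data[70]["mem_gb"] / data[50]["mem_gb"]

print(f"predicted N_PW ratio: {predicted:.3f}")   # 1.657
print(f"observed  N_PW ratio: {observed:.3f}")    # 1.657
print(f"memory ratio:         {mem_ratio:.2f}")   # 1.80
```

The reported basis sizes follow the 3/2-power law almost exactly, while memory grows slightly faster than N_PW (1.8× vs. about 1.66×) for this system.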
Well-known CPMD benchmarking model: www.cpmd.org
Results can be reported either as:
• Wall time = (n steps × iteration time/step) + network overhead
  ▫ Typical results / interpretations, nothing new here
• Iteration time = the fundamental unit, used throughout any given CPMD calculation
  ▫ It neglects the network, yet results are comparable
Note: CPMD runs well on a few nodes connected with gigabit Ethernet
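The wall-time decomposition above shows why iteration time is the more portable metric: it drops the machine-specific overhead term. A minimal sketch with hypothetical numbers (200 steps, 3.5 s per step, 40 s of network overhead), not values from these runs:

```python
def wall_time(n_steps: int, iter_time_s: float, network_overhead_s: float = 0.0) -> float:
    """Wall time = (n steps x iteration time/step) + network overhead."""
    return n_steps * iter_time_s + network_overhead_s


# Hypothetical run: 200 wave-function optimization steps at 3.5 s/step,
# plus an assumed 40 s of network overhead.
total = wall_time(200, 3.5, 40.0)
print(total)                                          # 740.0
print(f"{40.0 / total:.1%} of the wall time is overhead")  # 5.4%
```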
Two important factors affect CPMD performance:
• MEMORY BANDWIDTH
• FLOATING-POINT performance
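One common way to reason about these two factors together is the roofline model: attainable throughput is the smaller of the peak floating-point rate and memory bandwidth × arithmetic intensity. The sketch below assumes a hypothetical node (80 GFLOP/s peak, 10 GB/s sustained bandwidth), not the real specifications of Pople, Abe, or Ranger, and uses idealized intensities (an AXPY-like update moves 24 bytes per 2 flops):

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, intensity: float) -> float:
    """Roofline bound: min(peak compute, bandwidth x arithmetic intensity).

    intensity is flops per byte moved from main memory.
    """
    return min(peak_gflops, bandwidth_gb_s * intensity)


# Hypothetical node: 80 GFLOP/s peak, 10 GB/s sustained memory bandwidth.
peak, bw = 80.0, 10.0
for name, intensity in [("vector update (AXPY-like)", 2 / 24),
                        ("large matrix multiply (GEMM-like)", 50.0)]:
    print(name, attainable_gflops(peak, bw, intensity))
```

Under these assumptions the vector update is capped near 0.8 GFLOP/s by bandwidth, while the matrix multiply reaches the 80 GFLOP/s compute ceiling, so a code's mix of BLAS levels decides which factor dominates.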
Pople, Abe, Ranger CPMD Benchmarks
[Figure: Average iteration time (seconds, 0 to 8) vs. number of cores (0 to 256); series: Pople, Abe, and Ranger at 50 Ryd and 70 Ryd]
Results I
• No calculation ran longer than 2 hours
• Ranger is not the preferred machine for CPMD
• Scales well between 8 and 96 cores
  ▫ This is a common CPMD trend
• CPMD is known to scale super-linearly above ~1000 processors
  ▫ Will look into this
  ▫ The chemical system would have to change, as this smaller simulation is unlikely to scale in this manner
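Super-linear scaling means the parallel efficiency relative to a reference run exceeds 1.0. A minimal sketch of that bookkeeping, assuming the 8-core run as the reference; the iteration times are hypothetical, not the measured Pople/Abe/Ranger data:

```python
def parallel_efficiency(t_ref: float, p_ref: int, t_p: float, p: int) -> float:
    """Efficiency relative to a reference run: (t_ref * p_ref) / (t_p * p).

    1.0 means linear scaling; above 1.0 is super-linear (one common cause is
    the per-core data starting to fit in cache); below 1.0 is sub-linear.
    """
    return (t_ref * p_ref) / (t_p * p)


# Hypothetical average iteration times (seconds) keyed by core count.
times = {8: 6.0, 32: 1.4, 96: 0.55, 256: 0.30}
for p, t in times.items():
    eff = parallel_efficiency(times[8], 8, t, p)
    label = "super-linear" if eff > 1.0 else "sub-linear" if eff < 1.0 else "linear"
    print(f"{p:4d} cores: speedup {times[8] / t:6.2f}, efficiency {eff:.2f} ({label})")
```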
Results II
• Pople and Abe gave the best performance
• IF a system requires more than 96 procs, Abe would be a slightly better choice
• Given the difficulties in benchmarking CPMD (psp, volume, system phase, sim. protocol), this benchmark does not represent all the possible uses of CPMD
  ▫ Only one part of the code was explored
• How each system performs when taxed with additional memory requirements is a better indicator of CPMD’s performance
  ▫ To increase system accuracy, increase Ecut
Percent Difference between 70 and 50 Ryd
$\%\mathrm{Diff} = \frac{t_{70} - t_{50}}{t_{50}} \times 100$
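The percent difference is the relative slowdown caused by raising the cutoff from 50 to 70 Ryd on the same core count; 0% would mean the larger basis costs nothing extra. A minimal sketch with hypothetical iteration times:

```python
def percent_diff(t70: float, t50: float) -> float:
    """%Diff = [(t70 - t50) / t50] * 100, the slowdown from raising Ecut."""
    return (t70 - t50) / t50 * 100.0


# Hypothetical iteration times (s) at 70 and 50 Ryd on the same core count.
print(percent_diff(t70=3.0, t50=2.0))   # 50.0
```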
[Figure: Percent difference in iteration time relative to 50 Ryd (0 to 70%) vs. number of cores (0 to 256); series: Pople, Abe, Ranger]
Conclusions
RANGER
• Re-ran Ranger calculations
• Lower performance may be linked to using the Intel compiler on AMD chips
  ▫ The PGI compiler could show an improvement
  ▫ No more than a 5% gain is expected: Ranger would still be the slowest
  ▫ Wanted to use the same compiler/math libraries on every machine
ABE
• Possible super-linear scaling: $t_{\mathrm{Abe},\,256\ \mathrm{procs}} < t_{\mathrm{others},\,256\ \mathrm{procs}}$
• Memory size effects hinder performance below 96 procs
POPLE
• The best system for wave function optimization
• Shows a (relatively) stable, modest slowdown as the memory requirement increases; it is the recommended system
Future Work
• Half-node benchmarking
• Profiling tools
• Test the MD part of CPMD
  ▫ Force calculations involving the non-local parts of the psp will increase memory
  ▫ Extensive level 3 BLAS & some level 2
  ▫ Many FFT all-to-all calls: now the network plays a role
  ▫ Memory > 2 GB
  ▫ A new variable! Monitor the fictitious electron mass
• Changing the model
  ▫ Metallic system (lots of electrons, change of psp; Ecut)
  ▫ Check super-linear scaling