how to run vasp - nscpla/downloads/... · efficiency using as few cores as possible speed as many...

17
How to run VASP Part 1: Influential settings

Upload: others

Post on 16-Mar-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

How to run VASPPart 1: Influential settings

Page 2: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

Quick summary

• Processor cores ≤ atoms

• NCORE = cores per node NPAR = compute nodes

• NSIM = 1 (Intel), 4 (AMD)

• KPAR = compute nodes

• ALGO = Fast (or VeryFast)

• NBANDS = N*cores

• Use the right VASP binary (NGZhalf, and wNGZhalf for gamma-point calculations)

Q: What’s missing?

A: Read the VASP manual!

Page 3: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

Efficiency:Running as many jobs as possible within a given allocation of computer time.

Speed:The amount of time (in real time, "human time") to run a specific simulation from the time it starts.

Time-to-solution:The amount of time (in real time, "human time") it takes to get the results = runtime + the time waiting in the queue.

Page 4: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

EfficiencyUsing as few cores as possible

SpeedAs many cores as possible (up to a limit)

Time-to-solution •Finding the "right" size which fits the cluster. •Not using up all your allocated time (in order to get higher priority in the queue).

Page 5: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

1

10

100

1000

10000 1 10 100 1000 10000

Run-

time

(s)

Number of cores

VASP scaling on Carter NiSi-504 PbSO4-24 NiSi-504 (12c) NiSi-1200 (16c)

Page 6: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

Safe & Efficient: cores < n(atoms)

Daring & not as efficient:cores ≈ 2 x n(atoms)

Crash! cores > n(bands)

Page 7: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

Why should I care?

0"

10"

20"

30"

40"

50"

60"

70"

NPA

R=1"

NPA

R=2"

NPA

R=3"

NPA

R=6"

NPA

R=8"

NPA

R=12"

NPA

R=16"

NPA

R=48"

Number'of'SCF'itera0ons'vs'NPAR'FeCo%alloy%(53%atoms)%on%48%cores%

Does 10% matter?

Dear Professor N.N., !We regret to inform you that we have decided to cut your VSC time allocation from 100,000 core hours/month to 90,000 core hours/month. !Best regards, VSC Staff

Page 8: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

NPAR & NCORE

• Applies to the Davidson/RMM-DIIS algorithms (not hybrid calculations)

• Controls/activates band parallelization

• NPAR = 1: no parallelization over bands = BAD

• NPAR = (1-2)x number of compute nodes

• NCORE = cores working on one band

NCORE = cores per node (or socket)

Page 9: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

0

5

10

15

20

25

30

35

NPAR=1 NPAR=2 NPAR=4 NPAR=8 NPAR=16

Spee

d (J

obs/

h)

Li2FeSiO4 (128 atoms) on Triolith Standard DFT / 4 compute nodes / 64 cores

Page 10: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which
Page 11: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

NPAR: caveat

0"

10"

20"

30"

40"

50"

60"

70"

NPA

R=1"

NPA

R=2"

NPA

R=3"

NPA

R=6"

NPA

R=8"

NPA

R=12"

NPA

R=16"

NPA

R=48"

Number'of'SCF'itera0ons'vs'NPAR'FeCo%alloy%(53%atoms)%on%48%cores%

Page 12: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

K-point parallelization ("KPAR")

• Controls parallelization over k-points (version 5.3.2+)

• Great for hybrid-DFT jobs

• Not so great for metals with 100-1000s of k-points…

KPAR = "number of k-points treated in parallel" cores / KPAR = "group size" = cores/node?

!

KPAR = min(nodes,#kpts)

Page 13: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5

4 8 12 24

Spee

d (J

obs/

h)

Number of nodes (16c/node)

MgO (63 atoms) on Triolith HSE06 hybrid calculation with 4 k-points

KPAR=1 KPAR=2 KPAR=4

Page 14: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

NSIM

• Blocking mode for RMM-DIIS algorithm(so only applies for ALGO=Fast/VeryFast)

• Effect depends on BLAS library. Less important nowadays?

• NSIM = 1 for single-node jobs on Intel Sandy Bridge

• NSIM = 4 for older systems (incl. AMD). This is VASP's default.

Page 15: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

0

5

10

15

20

25

30

35

NSIM=1 NSIM=2 NSIM=4 NSIM=8

Spee

d (J

obs/

h)

Li2FeSiO4 (128 atoms) on Triolith Standard DFT / 4 compute nodes / 64 cores

Page 16: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

0

10

20

30

40

50

60

70

NSIM=1 NSIM=2 NSIM=4 NSIM=6 NSIM=8

Spee

d (J

obs/

h)

PbSO4 (24 atoms) on Triolith Standard DFT / 1 compute node / 16 cores

Page 17: How to run VASP - NSCpla/downloads/... · Efficiency Using as few cores as possible Speed As many cores as possible (up to a limit) Time-to-solution •Finding the "right" size which

Core binding / pinning

• It is important to lock an MPI process to a specified processor core. Otherwise they can jump around…

• Intel MPI binds by default.

• Older versions of OpenMPI does not. Try e.g.:

mpirun -np 32 --display-map —bind-to-core …

• Core binding on AMD Opteron 6300-series “Interlagos” processors is not intuitive. [Cf. one of my blog posts]