lammps olcf titan summit · prepare lammps for titan 1. give an overview of molecular dynamics...

70
LAMMPS: Code Transforma3ons in Preparing for Titan A Hybrid Mul3Core Architecture Presenter: Arnold Tharrington Prepared by Arnold Tharrington and William Brown Scientific Computing Group, OLCF

Upload: others

Post on 24-Mar-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

LAMMPS:  Code  Transforma3ons  in  Preparing  for  Titan  -­‐  A  Hybrid  Mul3-­‐Core  Architecture  

Presenter: Arnold Tharrington

Prepared by Arnold Tharrington and William Brown

Scientific Computing Group, OLCF

Page 2: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

2

Outline

1.  Convey ongoing code modifications needed to prepare LAMMPS for Titan

1.  Give an overview of Molecular Dynamics   Overview of MD/LAMMPS   Runtime profile of LAMMPS

2.  Accelerating LAMMPS   Accelerating the nonbonded interactions   Accelerating the electrostatic interactions

2.  Summary

Page 3: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

3

1. Science problem

Page 4: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

4

Science Problem: Molecular Level Insight into Membrane Fusion from Molecular Dynamics • Gain molecular level understanding of membrane fusion mechanisms

• A major means for materials to enter and exit cells • Role of lipids and membrane proteins

To date, the majority of simulation work on membranes has focused on model systems containing typically one to two lipid types. However biological membranes are complex mixtures, and to model these systems without finite size effects, simulations of significantly larger systems than what is currently possible needs to be performed.

Computer image of membrane vesicles undergoing fusion.

Page 5: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

5

Science Problem: Molecular Level Insight into Membrane Fusion from Molecular Dynamics (cont.)

Computer image of the 2 petaflop target science problem. The membrane bilayer represented by coarse grain bead model.

 Some requirements for realistic simulations are

o  Proper solvation o  Eliminate modeling artifacts

due to interactions between periodic images

o  Capture long wavelength modes

o  Proper treatment of coulombic or electrostatic interactions

 The smallest, experimentally accessible vesicles to date have diameters of approximately 500 Å.

 We need simulations with the minimum size of 750,000,000 coarse grain particles to model experimentally produced vesicles

 Acceleration of the simulation is needed to reduce the time to solution!!

Page 6: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

6

Science Problem: Molecular Level Insight into Membrane Fusion from Molecular Dynamics (cont.) Coarse grain model of Dipalmitoylphosphatidylcholine (DPPC) stacked bilayers 2 PF problem:

 A charged system of 834,403 coarse grained beads  System dimensions ~1000Å x 1000Å x 33 Å

20 PF problem:

  A charge system of ~850,000,000 coarse grained beads  System Dimension ~ 32,000Å x 32,000Å x 33 Å

Computer image of the 2PF science problem. The membrane bilayer is represented by coarse grain beads.

Page 7: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

7

2. LAMMPS and MD Overview

Page 8: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

8

MD Simulations and Algorithm, F=ma

( ) ( )i

NiN r

rrrUFrrrUU

∂−==

,,,;,,, 2121

( )

( )

( )NNNN

N

N

N

rrrUFdtrdm

rrrUFdtrdm

rrrUFdtrdm

,,,

,,,

,,,

212

2

211222

2

2

211121

2

1

−∇==

−∇==

−∇==

Calculate the trajectory of a system of particles interacting with potential U.

mi is the mass of particle i ri is the position of particle i vi is the velocity of particle i

Solve by Velocity Verlet algorithm

Page 9: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

9

Force fields details

H

O

H

HO

H

(1)

(2) (3)

(4)

(5)

(6)

l1 ɵ2

l2

ɵ1

l4

l3

( ) ( ) ( ) ( )242

32

22

1 21

21

21

21 llkllkllkllkU OH

bOH

bOH

bOH

bbond OHOHOHOH

−+−+−+−=

( ) ( )222

1 21

21

Θ−Θ+Θ−Θ= ΘΘHOHHOHangle OHOH

kkU

∑∑= <

=N

i

N

ij ji

ji

oticelectrosta r

qqU

1 ,41πε

nonbondedticelectrostaanglebondsystem UUUUU +++=

O = -q

H = +q/2

∑ ∑=

<<

⎟⎟⎠

⎞⎜⎜⎝

⎛−=

N

i

N

rrij ji

ji

ji

jinonbonded

cutoffji

rB

rA

U1

6,

,12,

,

,

Page 10: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

10

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  H  

O  

H  

H  O  H  

H  O  

H  H  O  

H  

Periodic Boundary Conditions

Page 11: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

11

Spatial Decomposition

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

H  O  H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  

H  

H  O  H  

H  O  H  H  O  

H  

0 1

23

∑ ∑=

<<

⎟⎟⎠

⎞⎜⎜⎝

⎛−=

N

i

N

rrij ji

ji

ji

jinonbonded

cutoffji

rB

rA

U1

6,

,12,

,

,

Page 12: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

12

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  H  

O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

Ghost Atom Region

0

H  O  

H  H  

O  H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  

H  

H  O  H  

H  O  

H  H  O  

H  

0

Page 13: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

13

Force Calculation Implementation Within LAMMPS

Compute  nonbonded  

forces  

Compute  neighbor  List  

Compute  bond  forces  

Compute  electrosta8c  

forces    

Compute  angle  forces  

Compute  dihedral  forces  

Compute  improper  forces  

time step N

Update  coordinates  

(Repeat previous steps)

time step N+1

Page 14: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

14

Compute  neighbor  List  

Compute  electrosta8c  

forces    

Compute  nonbond/bond  

forces  

Update  coordinates  

Compute  neighbor  List  

Compute  electrosta8c  

forces    

Compute  nonbond/bond  

forces  

Update  coordinates  

Compute  neighbor  List  

Compute  electrosta8c  

forces    

Compute  nonbond/bond  

forces  

Update  coordinates  

Compute  neighbor  List  

Compute  electrosta8c  

forces    

Compute  nonbond/bond  

forces  

Update  coordinates  

0 1 2 3

Page 15: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

15

• Approximately 80% of the total computational work is in computing the forces

i Terms Force The F

∑∑

∑∑∑∑

<

Θ

+

⎥⎥

⎢⎢

⎟⎟⎠

⎞⎜⎜⎝

⎛−⎟

⎟⎠

⎞⎜⎜⎝

⎛+

−+−+Θ−Θ+−=

i ji ji

ji

o

nonbonded ijij

improperso

dihedralso

angleso

bondsobsystem

rqq

rR

rR

knkkbbkU

ijij

,

6min

12min

2222

41

)()()()(

πε

ε

ωωφφ ωφ

These two terms, nonbonded and electrostatic, respectively dominate the computational work

i

systemiii r

UamF

∂−==

Page 16: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

16

DPPC Stacked Bilayer Profile: 2 PF Problem

0  

500  

1000  

1500  

2000  

2500  

3000  Time  (secs)  

Job  size  

DPPC  Stacked  Bilayer  Run  3me  profile  for  100  steps  

Pair  3me  Bond  3me  Kspace  3me  Neigh  3me  Comm  3me  Output  3me  Other  3me  

nsinteractio rqq

4π1 U theofpart iscomponent Kspace The

i ji ji,

ji

oticelectrosta ∑∑

nsinteractio r

Rr

Rε U theofpart iscomponent pair The

i ji

6

ij

min12

ij

minnonbonded

ijij∑∑≠ ⎥

⎢⎢

⎟⎟⎠

⎞⎜⎜⎝

⎛−⎟

⎟⎠

⎞⎜⎜⎝

⎛= These two terms,

nonbonded and electrostatic, respectively dominate the computational work

Page 17: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

17

0%  

10%  

20%  

30%  

40%  

50%  

60%  

70%  

80%  

90%  

100%  

Job  Size  

Other  3me  

Comm  3me  

Neigh  3me  

Kspace  3me  Bond  3me  

Pair  3me  

DPPC Stacked Bilayer Profile: 2 PF Problem DPPC  Stacked  Bilayer  

Normalized  run  3me  profile  without  I/O  Normalized

   Cu

mula3

ve  Tim

e  Distrib

u3on

 

nsinteractio rqq

4π1U the ofpart iscomponent Kspace The

i ji ji,

ji

oticelectrosta ∑∑

nsinteractio r

R

r

Rε U theofpart iscomponent pair The

i ji

6

ij

min12

ij

minnonbonded

ijij∑∑≠ ⎥

⎢⎢

⎟⎟⎠

⎞⎜⎜⎝

⎛−⎟

⎟⎠

⎞⎜⎜⎝

⎛= These two terms,

nonbonded and electrostatic, respectively dominate the computational work

Page 18: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

18

DPPC Stacked Bilayer Profile: 2 PF Problem

0  2  4  6  8  

10  12  14  16  18  20  

1  cpu  

2  cpu  

4  cpu  

8  cpu    

16  cpu

 

32  cpu

 

64  cpu

 

128  cpu  

256  cpu  

512  cpu  

1024  cpu

 

Time  (secs)  

Kspace  cumula3ve  3me  for  100  steps  

Kspace  

nsinteractio rqq

4π1U the ofpart iscomponent Kspace The

i ji ji,

ji

oticelectrosta ∑∑

Page 19: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

19

Compute  neighbor  List  

Compute  electrosta8c  

forces    

Compute  non-­‐bonded  forces  

Update  coordinates  

Compute  neighbor  List  

Compute  electrosta8c  

forces    

Compute  non-­‐bonded  forces  

Update  coordinates  

Compute  neighbor  List  

Compute  electrosta8c  

forces    

Compute  non-­‐bonded  forces  

Update  coordinates  

Compute  neighbor  List  

Compute  electrosta8c  

forces    

Compute  non-­‐bonded  forces  

Update  coordinates  

0 1 2 3

Page 20: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

20

3. Accelerating LAMMPS

Need to accelerate the nonbonded forces and the Kspace component of the electrostatic forces nsinteractio

rqq

4π1U the ofpart iscomponent Kspace The

i ji ji,

ji

oticelectrosta ∑∑

nsinteractio r

Rr

RεU the ofpart iscomponent pair The

i ji

6

ij

min12

ij

minnonbonded

ijij∑∑≠ ⎥

⎢⎢

⎟⎟⎠

⎞⎜⎜⎝

⎛−⎟

⎟⎠

⎞⎜⎜⎝

⎛=

Page 21: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

21

Accelerating the Nonbonded Interactions within LAMMPS

nsinteractio r

Rr

RεU the ofpart iscomponent pair The

i ji

6

ij

min12

ij

minnonbonded

ijij∑∑≠ ⎥

⎢⎢

⎟⎟⎠

⎞⎜⎜⎝

⎛−⎟

⎟⎠

⎞⎜⎜⎝

⎛=

0%  

10%  

20%  

30%  

40%  

50%  

60%  

70%  

80%  

90%  

100%  

Normalized

   Cu

mula3

ve  Tim

e  Distrib

u3on

 

Job  Size  

DPPC  Stacked  Bilayer  Normalized  run  3me  profile  without  I/O  

Other  3me  

Comm  3me  

Neigh  3me  

Kspace  3me  Bond  3me  

Pair  3me  

Page 22: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

22

Calculation of Nonbonded Interactions (Current CPU Only Implementation)

Each spatial region is assigned to a MPI task.

doenddoend

MjdoNido

j particle to due i on force compute particles ji, of energy compute

,1,1=

=

Done on host CPU

Page 23: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

23

Calculation of Non-bonded Interactions Accelerated Implementation (Planned Work)

• Spatial region maps to a node. • Spatial region hosts a MPI task • Particles in the region will be evolved by CPUs and GPUs attached to this node

doenddoend

NeighborsjdoNNido

i

cpu

particles ji, ninteractio compute,1

1,,1

=

+=

Asynchronously done on GPUs

Done on host CPUs

doenddoend

NeighborsjdoNido

i

cpu

particles ji, of ninteractio compute,1

1,,1

=

=

• A numerical fraction X of the N particles will be assigned the CPUs • The remaining numerical fraction (1-X) will be assign to the accelerator

Page 24: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

24

Load Balancing of Nonbonded Interactions

Lennard-Jones kernel (864k Particles) cutoff=2.5 x radius (~78 neighbors) Low Arithmetic Intensity

Gay-Berne kernel (125k Particles) cutoff=7 x short radius High Arithmetic Intensity

•  Multiple cores can be assigned to each GPU

•  Dynamic load balancing assigns particles to CPU and GPU for asynchronous force calculation

–  Not the only benefit, splitting per GPU data into smaller subdomains improves memory latencies and splits “other” time (integration, etc.) across more cores

Dual Hex-Core AMD Istanbul 2.6 GHz, Dual Fermi C2050, Infiniband

Page 25: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

25

Acceleration of the Electrostatic Interactions by Means of Multilevel Summation Method (MSM)

nsinteractio rqq

4π1U the ofpart iscomponent Kspace The

i ji ji,

ji

oticelectrosta ∑∑

0%  

10%  

20%  

30%  

40%  

50%  

60%  

70%  

80%  

90%  

100%  

Normalized

   Cu

mula3

ve  Tim

e  Distrib

u3on

   

Job  Size  

DPPC  Stacked  Bilayer  Normalized  run  3me  profile  without  I/O  

 

Other  3me  

Comm  3me  

Neigh  3me  

Kspace  3me  

Bond  3me  

Pair  3me  

0  

2  

4  

6  

8  

10  

12  

14  

16  

18  

20  

Time  (secs)  

Kspace  Cumula3ve  for  100  Steps  

Kspace  

Page 26: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

26

Multilevel Summation Method (MSM) vs. Particle Mesh Ewald (PME)

• LAMMPS currently uses the PME algorithm solve the electrostatic interactions

• MSM is more scalable than PME  MSM is grid based method whose communication pattern is nearest neighbor while PME involves 2 ALL-TO-ALLs over reciprocal lattice points to compute 2 3d FFTs (Hardy, 2009)  MSM scale like O(N) whereas PME scales as O(NlnN)  For typical systems, the crossover between MSM and PME is approximately 100,000 atoms

 MSM is essential for the scalability of large problems like the ones proposed for OLCF3

Page 27: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

27

ij rr −

1

( )jia rrg ,0

( )⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

−−

jiaij

rrgrr

,1

1.  Separation of the electrostatic interaction potential into a MSM short range part plus a MSM long range, slowly varying part

2.  Approximate the slowly varying MSM long range part to a grid.

3.  Recursive application of the first two ideas on a hierarchy of coarser grids (Hardy, 2006, p. 12).

Multilevel Summation Method Overview

MSM Short range part Long range part

( )jia rrg ,+

Page 28: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

28

MSM Algorithmic Steps

qq 00 ϕ=

q

anterpolation

01

1 qq ϕ=

restriction

restriction

MSM short range part

qGe Shortshort =

21

1 −−

− = ll

l qq ϕ

1−= ll

l qq ϕ

11

00000 eqGqGe Short ϕ+≅=

22

11111 eqGqGe Short ϕ+≅=

ll

llShort

lll eqGqGe ϕ+≅= −−−−− 11111

ll qGlll qGe =

1−le

1e

0e

interpolation

00 eqGqGe Short ϕ+≅=

Grid lattice spacing h, lattice cutoff a

Grid lattice spacing 2h, lattice cutoff 2a

Grid lattice spacing 2(l-1)h, lattice cutoff 2(l-1)a

Grid lattice spacing 2lh, lattice cutoff 2la

Page 29: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

29

MSM Direct

{ }[ ]

[ ] ( )( ) ( )( )( )[ ]kkjjiiq

kjihgkjihgkjie

ha

hai

ha

haj

ha

hak

kjieindiceskji

cccn

nnnnccc

shortn

xx

yy

zz

cccshortn

nccc

+++×

−=+

⎥⎦

⎥⎢⎣

⎢⎥⎦

⎥⎢⎣

⎢−=

⎥⎥⎦

⎢⎢⎣

⎥⎥⎦

⎢⎢⎣

⎢−=

⎥⎦

⎥⎢⎣

⎢⎥⎦

⎥⎢⎣

⎢−=

=

Ω∈

+

,,

,,2,,,2,,,

2,,2

2,,2

2,,2

0,,

),,(

1,

,

00

for

for

for

for

Page 30: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

30

Compute  neighbor  List  

MSM  

Compute  non-­‐bonded  forces  

Update  coordinates  

Compute  neighbor  List  

MSM  

Compute  non-­‐bonded  forces  

Update  coordinates  

Compute  neighbor  List  

MSM  

Compute  non-­‐bonded  forces  

Update  coordinates  

Compute  neighbor  List  

MSM  

Compute  non-­‐bonded  forces  

Update  coordinates  

0 1 2 3

acc   acc   acc   acc  

Page 31: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

31

• Need non-trivial knowledge of what the code is doing –  Algorithmic details

•  Use a debugger to step through the code –  Problem/data decomposition –  Interproccessor data communication patterns

• Profile the code to discover bottlenecks –  VampirTrace, CrayPat, homemade timers. etc.

•  “There is no free lunch for porting to GPUs –  (Except for those who registered for this workshop) –  There is a significant communication cost with the

accelerator; some algorithms may not be computational intense enough to be worth this cost.

Page 32: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

32

Summary •  We have identified the major challenges and have a technologically

reasonable plan for accelerating LAMMPS –  Nonbonded interactions

•  Capturing various levels of parallelism on CPU and GPU •  Load balancing between CPU and GPU

–  Electrostatic interactions •  Developing the MSM algorithm

– Capturing various levels of parallelism on CPU and GPU – Load balancing between CPU and GPU

–  Regardless of the accelerator, we are forced to restructure some parts LAMMPS to effectively capture the inter and intra node parallelism

Page 33: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

33

Acknowledgements

• Bobby Phillips and Richard Mills – Multi-grid discussions

•  Jim Schwarzmeier and Sarah Anderson – Algorithmic discussions, benchmarking, ….

• Scott Hampton and Pratul Agarwal – CUDA and algorithmic discussions

Page 34: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

34

References

• Hardy, David. “Multilevel Summation For the Fast Evaluation of Forces for the Simulation of Biomolecules" Ph.D. diss., University of Illinois at Urbana-Champaign, 2006.

• Hardy, David., Stone John., and Schulten K. 2009. Multilevel Summation of Electrostatic Potentials Using Graphics Processing Units. Parallel Computing vol 35: pgs 164-177.

Page 35: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

35

•  Team Members and Expertise –  Oak Ridge National Laboratories

•  Arnold Tharrington - Computational biophysicist, primary programmer for code modifications

•  Michael Brown - Computational biophysicist, primary programmer for code modifications

•  Bobby Philips - Mathematician and multigrid expert –  Sandia National Laboratories

•  Steve Plimpton - Computational scientist, originator of LAMMPS •  Paul Crozier - Computational scientist, LAMMPS developer

–  NVIDIA •  Peng Wang - Computational scientist and CUDA expert

–  Cray Inc. •  James Schwarzmeier - Computational scientist, benchmarking, code profile and

optimization expert, threading, etc. •  Sarah Anderson - Computational Scientist, benchmarking, code profile and optimization

expert, threading, etc. –  Institute for Computational Molecular Science at Temple University (Director: Michael Klein)

•  Axel Kohlmeyer - Computational Biophysicist, science lead for 20 PF target problem

Page 36: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

36

Supplementary Slides

Page 37: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

37

Energy and Forces

( ) [ ]

( )

11

0000

0

0021

21

21

21,,,

eqGqGe

rr

eqqGr

qF

eqqGqrrrU

Short

ll

lShortl

l

TShortN

ϕ

ϕ

ϕ

ααα

+≅=

∂−

∂≅

+≅

where

Page 38: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

38

Matrix Vector Decomposition

[ ] 11

11

++

++

+≅=

=

nTn

nnShort

nnn

nn

n

eqGqGe

qq

ϕ

ϕ

q [ ] 00 eqGqGe T

Short ϕ+≅=

qq 00 ϕ= [ ] 1

10000

0 eqGqGe TShort ϕ+≅=

01

1 qq ϕ=[ ] 22

11111 eqGqGe TShort ϕ+≅=

21

1 −−

− = ll

l qq ϕ

1−= ll

l qq ϕ

[ ] lTl

llShort

lll eqGqGe ϕ+≅= −−−−− 11111

lll qGe =0=ll qG

1−le

1e

0e

e

This product is zero for neutral charged systems

Page 39: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

39

Calculation of MSM Short Range Part

doenddoend

NeighborsjdoNNido

i

cpu

particles ji, ninteractio compute,1

1,,1

=

+=

Asynchronously done on GPUs.

Done on host CPUs

doenddoend

NeighborsjdoNido

i

cpu

particles ji, of ninteractio compute,1

1,,1

=

=

• A numerical fraction X of the N particles will be assigned the CPUs • The remaining numerical fraction (1-X) will be assign to the accelerator

• Spatial region maps to a node. • Spatial region hosts a MPI task • Particles in the region will be evolved by CPUs and GPUs attached to this node

Page 40: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

40

MSM Grid Computations

Done on host CPUs

doend

foripoint grid on compute

pointsgridcpu

Done on GPUs

Kn total grid points for level n

• A numerical fraction X of the grid points will be assigned the CPUs • The remaining numerical fraction (1-X) will be assign to the accelerator

X 1-X

doend

foripoint grid on compute

pointsgridraccelerato

• Spatial region maps to a node. • Spatial region hosts a MPI task • Particles in the region will be evolved by CPUs and GPUs attached to this node

Page 41: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

41

Algorithmic Steps

Particle Level

anterpolation

restriction

restriction

short range part

direct part

direct part

direct part

direct part (top level)

prolongation

prolongation

prolongation

interpolation

Page 42: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

42

Operational Counts

• Short range part ~ (a/h)3N • Anterpolation ~ (p3 + 7p2 + 9p + 6)N • Direct part ~ (a/h)3Kn

•  Interpolation ~ (5p3 + 24p2 + 27p + 11)N + K0

• Restriction and Prolongation ~(2p+2)(1-1/8l)K0

Hardy, J. 2006 Multilevel Summation for the Fast Evaluation of Forces For The Simulation of Biomolecules. PhD Thesis, University of Illinois at Urbana-Champaign.

Page 43: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

43

Splitting Operation

ij rr −

1

( )jia rrg ,0

( ) ( )jiajiaij

rrgrrgrr

,,1

+⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

−−

Short range Long range

Page 44: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

44

Splitting Operation (cont.)

( )

( )

( )

( ) 11)(,22

1,

,0)(,,

)(;,1

,0,,

0

≥=⎟⎟

⎜⎜

⎛ −≡

⎪⎪⎪

⎪⎪⎪

=

≠∈−

≠∉−−

=

⎩⎨⎧

=

≠=

ρρ

ργγ

χ

χ

for and

where

and

and

arr

arrg

jijiijrrg

jiijrrgrr

G

jijirrg

G

nij

njina

jia

jiaij

Short

jiaLong

Page 45: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

45

Splitting Operation (cont.)

( ) ( )jiajiaij

rrgrrgrr

,,1

+⎪⎭

⎪⎬⎫

⎪⎩

⎪⎨⎧

−−

( ) ( ){ } ( )jiajiajia rrgrrgrrg ,,, 22 +−

( ) ( ){ } ( )jiajiajia rrgrrgrrg ,,, 442 +−

convolve charges to grid of length a; split long range part

convolve charges to grid of length 2a; split long range part

Page 46: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

46

MPI Internode Communication

Cutoff distance r for atom

MPI task 0

MPI task 1

r

r

Page 47: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

47

MPI Internode Communication (cont.)

Neighbor  list  

Compute  forces  

Integrate  and  update  

atom  coordinates  

• Many small messages between neighbors • Typical MPI message packet <100 Mbytes

MPI task 0

MPI task 1

Time  step  N  

Page 48: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

48

MSM Grid Structure

Level 0 Lattice spacing h

16x16 grid

Level 1 Lattice spacing 2h

8x8 grid

Level 2 Lattice spacing 4h

4x4 grid

Page 49: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

49

Anterpolation – Convolution of Charges

Particle level Level 0

Level 0

Basis functions provide local support about each grid point

Level 0

Page 50: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

50

Anterpolation (cont.) Basis functions provide local support about each grid point

Level 0

MPI task 0

MPI task 2

MPI task 1

MPI task 3

Page 51: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

51

Nodal Basis Functions

),( ξϕ m

ξ

Page 52: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

52

Anterpolation Communication-Nearest Neighbor

MPI task 0

MPI task 1

MPI task 2

MPI task 3

Page 53: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

53

Restriction of Grid

Level 0 grid

Level 1 grid

Restrict charges from level 0 to level 1

Level 2 grid

Restrict charges from level 1 to level 2

MPI task 0 MPI task 1

MPI task 2 MPI task 3

Animation of grid restriction

Page 54: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

54

MSM Short Range

a

rr

ar

arqqf

rr

ar

arqqU

ijjij

ijji

Short

Short

⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎠

⎞⎜⎝

⎛ʹ′−=

⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎠

⎞⎜⎝

⎛−=

γ

γ

22

11

11

MSM short range cutoff of distance a

a

MPI task 0 MPI task 1

MPI task 2 MPI task 3

Page 55: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

55

MSM Direct MPI task 0 MPI task 1

MPI task 2 MPI task 3

[ ] ( )( ) ( )( )( ) [ ]kkjjiiqkjihgkjihgkjie cccnnnnn

cccshortn +++×−=+ + ,,,,2,,,2,,, 1, 00

( )ccc k,j,i

Animation of direct calculation for MPI task 0

Page 56: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

56

MSM Prolongation

?

1

=

Ω −

n

n

e1+

Ωn

n

e

[ ] 11

11

++

++

+≅=

=

nTn

nnShort

nnn

nn

n

eqGqGe

qq

ϕ

ϕAnimation of prolongation calculation for MPI task 0

MPI task 0 MPI task 0 MPI task 1 MPI task 1

MPI task 2 MPI task 2 MPI task 3 MPI task 3

Page 57: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

57

Initial Results of NAMD Lite with MSM

(GPU: NVIDIA GTX-285, using CUDA 3.0; CPU: 2.4 GHz Intel Core 2 Q6600 quad core)

Box  of  21950  flexible  waters,  12  A  cutoff,  1ps  

CPU  only   with  GPU   Speedup  vs.  NAMD/CPU  

NAMD  with  PME   1199.8  s   210.5  s   5.7  x  

NAMD-­‐Lite  with  MSM  

5183.3  s  (4598.6  short,  572.23  long)  

176.6  s  (93.9  short,  63.1  

long)  

6.8  x  (19%  over  NAMD/

GPU)  

Hardy, D. 2010, “Multiscale Molecular Modeling”, Edinburgh, UK, June 30 - July 3, 2010

Page 58: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

58

Data Transfer Between Host and Device

Number of MPI tasks: 18,000 Unit Cell Dimensions: 32,000 Å x 32,000 Å x 33 Å

Maximum  number  of  Levels  

X  grid  length  

Y  grid  length  

Z  grid  length  

Total  Number  of  grid  points  

Number  of  grid  

points  per  MPI  task  

8   3.90625   3.90625   0.12890625   1.68E+07   9.32E+02  

9   1.953125   1.953125   0.064453125   1.34E+08   7.46E+03  

10   0.9765625   0.9765625   0.032226563   1.07E+09   5.97E+04  

11   0.48828125   0.48828125   0.016113281   8.59E+09   4.77E+05  

12   0.244140625   0.244140625   0.008056641   6.87E+10   3.82E+06  

For the finest grid only

Page 59: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

59

Data Transfer Between Host and Device(cont.) Maximum  number  of  levels   Bytes  per  MPI  task   Bytes  per  MPI  task   Bytes  per  MPI  task   Bytes  per  MPI  task  

Level  0   Level  1   Level  2   Level  3  

8  

Data  in   Data  out   Data  in   Data  out   Data  in   Data  out   Data  in   Data  out  29.8E+3   7.5E+3   3.7E+3   932.1E+0   466.0E+0   116.5E+0   58.3E+0   14.6E+0  

9   238.6E+3   59.7E+3   29.8E+3   7.5E+3   3.7E+3   932.1E+0   466.0E+0   116.5E+0  

10   1.9E+6   477.2E+3   238.6E+3   59.7E+3   29.8E+3   7.5E+3   3.7E+3   932.1E+0  

11   15.3E+6   3.8E+6   1.9E+6   477.2E+3   238.6E+3   59.7E+3   29.8E+3   7.5E+3  

12   122.2E+6   30.5E+6   15.3E+6   3.8E+6   1.9E+6   477.2E+3   238.6E+3   59.7E+3  

Page 60: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

60

MSM Short Range

11

11

++

++

+≅=

=

nn

nnShort

nnn

nn

n

eqGqGe

qq

ϕ

ϕ

q0

0 eqGqGe Short ϕ+≅=

qq 00 ϕ=

11

00000 eqGqGe Short ϕ+≅=

01

1 qq ϕ=2

211111 eqGqGe Short ϕ+≅=

21

1 −−

− = ll

l qq ϕ

1−= ll

l qq ϕ

ll

llShort

lll eqGqGe ϕ+≅= −−−−− 11111

lll qGe =0=ll qG

1−le

1e

0e

e

This product is zero for neutral charged systems

Page 61: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

61

MSM Short Range

( ) ( ) ( )

rr

ar

arqqf

rr

ar

arqqU

ar

zzyyxxr

zzyyxxr

ijjij

ijji

Short

ijijij

ijijijij

Short

⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎠

⎞⎜⎝

⎛ʹ′−=−

⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎠

⎞⎜⎝

⎛−=+

<

−+−+−=

−−−=

γ

γ

22

22

2222

11

11

)(

],,[

then if

Page 62: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

62

MSM Direct

11

11

++

++

+≅=

=

nn

nnShort

nnn

nn

n

eqGqGe

qq

ϕ

ϕ

q0

0 eqGqGe Short ϕ+≅=

qq 00 ϕ=

11

00000 eqGqGe Short ϕ+≅=

01

1 qq ϕ=2

211111 eqGqGe Short ϕ+≅=

21

1 −−

− = ll

l qq ϕ

1−= ll

l qq ϕ

ll

llShort

lll eqGqGe ϕ+≅= −−−−− 11111

lll qGe =0=ll qG

1−le

1e

0e

e

This product is zero for neutral charged systems

Page 63: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

63

MSM Direct

{ }[ ]

[ ] ( )( ) ( )( )( )[ ]kkjjiiq

kjihgkjihgkjie

ha

hai

ha

haj

ha

hak

kjieindiceskji

cccn

nnnnccc

shortn

xx

yy

zz

cccshortn

nccc

+++×

−=+

⎥⎦

⎥⎢⎣

⎢⎥⎦

⎥⎢⎣

⎢−=

⎥⎥⎦

⎢⎢⎣

⎥⎥⎦

⎢⎢⎣

⎢−=

⎥⎦

⎥⎢⎣

⎢⎥⎦

⎥⎢⎣

⎢−=

=

Ω∈

+

,,

,,2,,,2,,,

2,,2

2,,2

2,,2

0,,

),,(

1,

,

00

for

for

for

for

Page 64: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

64

MSM Anterpolation

11

11

++

++

+≅=

=

nn

nnShort

nnn

nn

n

eqGqGe

qq

ϕ

ϕ

q0

0 eqGqGe Short ϕ+≅=

qq 00 ϕ=

11

00000 eqGqGe Short ϕ+≅=

01

1 qq ϕ=2

211111 eqGqGe Short ϕ+≅=

21

1 −−

− = ll

l qq ϕ

1−= ll

l qq ϕ

ll

llShort

lll eqGqGe ϕ+≅= −−−−− 11111

lll qGe =0=ll qG

1−le

1e

0e

e

This product is zero for neutral charged systems

Page 65: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

65

Anterpolation

( )

( )

( )

( )

( )

( )

nxyzgridgridgrid

xgrid

ygrid

yzyz

zgrid

zz

z

z

kz

y

y

jy

x

x

ix

z

y

x

qqzyxq

xp

y

qq

pzq

pkhzz

j

hyy

ihxx

p

phzzk

phyyj

phxxi

zyxNn

][],,[

,...,0

][

,...,0

][

,...,0

][

][

][

,...,02

1

21

21

),,(stencil gsurroundin ofindex gridlowest thefind,...,1

0

0

0

0

00

00

00

000

0

0

0

x_stencil

x_stencil

y_stencil

y_stencil

z_stencil

z_stencil

N

Nfor

N

Nfor

N

Nfor

for

for

ϕ

νλ

νλ

ϕ

νλ

ϕ

ννλ

νϕ

ννλ

νϕ

ννλ

νϕ

ν

ν

ν

ν

=+

=

=

=

=

=

=

=

=

+=

⎟⎟⎠

⎞⎜⎜⎝

⎛ −Φ=

+=

⎟⎟⎠

⎞⎜⎜⎝

⎛ −Φ=

+=

⎟⎟⎠

⎞⎜⎜⎝

⎛ −Φ=

=

−−⎥⎦

⎥⎢⎣

⎢ −=

−−⎥⎥⎦

⎢⎢⎣

⎢ −=

−−⎥⎦

⎥⎢⎣

⎢ −=

=

+

+

+

Nodal basis function of degree p=3

( )

⎪⎪⎪

⎪⎪⎪

≤≤−−−

≤−+−

otherwisefor

for

cubic

,021,)2)(1(

1),231)(1(

221

2

ξξξ

ξξξξ

ξ

),,(000 kji zyx

Page 66: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

66

Anterpolation

h

Particle level (k=0)

for i=-3,3 for j=-3,3 for k=-3,3 Q(m)=Q(m)+dQ(ijk)

Node i

Page 67: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

67

Restriction

11

11

++

++

+≅=

=

nn

nnShort

nnn

nn

n

eqGqGe

qq

ϕ

ϕ

q0

0 eqGqGe Short ϕ+≅=

qq 00 ϕ=

11

00000 eqGqGe Short ϕ+≅=

01

1 qq ϕ=2

211111 eqGqGe Short ϕ+≅=

21

1 −−

− = ll

l qq ϕ

1−= ll

l qq ϕ

ll

llShort

lll eqGqGe ϕ+≅= −−−−− 11111

lll qGe =0=ll qG

1−le

1e

0e

e

This product is zero for neutral charged systems

Page 68: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

68

MSM Restriction

( )[ ]

( )[ ]

( )[ ]

( ){ }( ){ }( ){ }

[ ]

[ ] [ ]( ){ }

( ){ }

[ ]

[ ] [ ]( ){ }

[ ] [ ]ννϕ

ν

ννϕ

ν

ννϕ

ν

φ

φ

φ

ν

ν

ν

ν

++ʹ′⋅=+ʹ′ʹ′ʹ′

+−=

Ω∈ʹ′

++ʹ′⋅=+

+−=

=

Ω∈

Ω∈ʹ′

++ʹ′⋅=+

+−=

=

Ω∈

Ω∈

Ω∈ʹ′

⎟⎟

⎜⎜

⎛ −Φ=

⎟⎟

⎜⎜

⎛ −Φ=

⎟⎟

⎜⎜

⎛ −Φ=

+−=

⎥⎦

⎥⎢⎣

⎢ −=⎥

⎥⎢⎣

⎢ −=⎥

⎥⎢⎣

⎢ −=

Ω

++

+

++

+

+

+

+

+

+

++

+

++

+

++

+++

032

)(1

1

031

)(32

32

1

0)(31

31

1

1

10

1

10

1

10

01

00

01

00

01

00

2][,,

1,,

2,][

1,,0

2,,][,

1,,0,

2

2

2

1,,2

,2

,2

iikjiq

ppforifor

jjii

ppfori

ifor

jfor

kkjiqji

ppforji

ifor

jforkfor

hzz

v

hyy

v

hxx

v

ppforhxxk

hxxj

hxxi

q

n

xn

xn

n

y

n

n

xn

yn

nz

n

n

xn

yn

zn

zn

nni

z

yn

nni

y

xn

nni

x

n

nn

n

nn

n

nn

n

offset

offset

offset

Q

indices

QQ

Q

indices

indices

Q

Q

indices

indices indices

charge with grid given N

Page 69: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

69

MSM Prolongation

11

11

++

++

+≅=

=

nn

nnShort

nnn

nn

n

eqGqGe

qq

ϕ

ϕ

q0

0 eqGqGe Short ϕ+≅=

qq 00 ϕ=

11

00000 eqGqGe Short ϕ+≅=

01

1 qq ϕ=2

211111 eqGqGe Short ϕ+≅=

21

1 −−

− = ll

l qq ϕ

1−= ll

l qq ϕ

ll

llShort

lll eqGqGe ϕ+≅= −−−−− 11111

lll qGe =0=ll qG

1−le

1e

0e

e

This product is zero for neutral charged systems

Page 70: LAMMPS OLCF Titan Summit · prepare LAMMPS for Titan 1. Give an overview of Molecular Dynamics Overview of MD/LAMMPS Runtime profile of LAMMPS 2. Accelerating LAMMPS Accelerating

70

Prolongation

( )[ ]

( )[ ]

( )[ ]

( ){ }

( ){ }

( ){ }

[ ] [ ]( ){ }

[ ] [ ]( ){ }( ){ }

[ ] [ ] [ ]jikkjie

ppforifor

jfor

ijji

ppforifor

kjieii

ppforifor

jfor

kfor

hzz

v

hyy

v

hxx

v

ppforhxxk

hxxj

hxxi

q

n

ylongn

xn

yn

n

y

n

xn

nx

n

xn

n

yn

n

zn

zn

nni

z

yn

nni

y

xn

nni

x

n

nn

n

nn

n

nn

n

offset

offset

offset

,2,,

1,,

][2,

1,,

,,][2

1,,

2

2

2

1,,2

,2

,2

31

)(0,

32

)(031

1)(0

32

1

32

1

31

1

1

10

1

10

1

10

01

00

01

00

01

00

+

++

++

+

+

+

+

+

+

++

+

++

+

++

+++

⋅=+++ʹ′

+−=

Ω∈

Ω∈

ʹ′⋅=+++ʹ′

+−=

Ω∈

ʹ′ʹ′ʹ′⋅=+++ʹ′

+−=

Ω∈ʹ′

Ω∈ʹ′

Ω∈ʹ′

⎟⎟

⎜⎜

⎛ −Φ=

⎟⎟

⎜⎜

⎛ −Φ=

⎟⎟

⎜⎜

⎛ −Φ=

+−=

⎥⎦

⎥⎢⎣

⎢ −=⎥

⎥⎢⎣

⎢ −=⎥

⎥⎢⎣

⎢ −=

Ω

E

indices

indices

EE

indices

E

indices 0 to Ereset

indices 0 to Ereset

indices

charge with grid given N

νϕν

ν

νϕν

ν

νϕν

ν

φ

φ

φ

ν

ν

ν

ν