What’s New With NAMD: Triumph and Torture with New Platforms


Post on 30-Dec-2015

Page 1: What’s New With NAMD Triumph and Torture with New Platforms

NIH Resource for Biomolecular Modeling and Bioinformatics, http://www.ks.uiuc.edu/

Beckman Institute, UIUC

What’s New With NAMD: Triumph and Torture with New Platforms

Jim Phillips and Chee Wai Lee
Theoretical and Computational Biophysics Group

http://www.ks.uiuc.edu/Research/namd/

Page 2: What’s New With NAMD Triumph and Torture with New Platforms


What is NAMD?

• Molecular dynamics and related algorithms
– e.g., minimization, steering, locally enhanced sampling, alchemical and conformational free energy perturbation

• Efficient algorithms for full electrostatics

• Effective on affordable commodity hardware

• Reads file formats from standard packages: X-PLOR (NAMD 1.0), CHARMM (NAMD 2.0), Amber (NAMD 2.3), GROMACS (NAMD 2.4)

• Building a complete modeling environment

Page 3: What’s New With NAMD Triumph and Torture with New Platforms


Towards Understanding Membrane Channels

The versatile, highly selective and efficient aquaporin

Deposited at the web site of the Nobel Museum

Page 4: What’s New With NAMD Triumph and Torture with New Platforms


Algal Hydrogenase: Protein Redesign Seeks a Photosynthetic Source for Hydrogen Gas

• 57,000 atoms
• Periodic boundary conditions
• CHARMM27 force field
• NVT: constant volume and temperature
• PME full electrostatics
• TeraGrid benchmark: 0.24 day/ns on 64 Itanium 1.5 GHz processors

Collaboration with DOE National Renewable Energy Lab, Golden, CO

Page 5: What’s New With NAMD Triumph and Torture with New Platforms


ATP-Synthase: One shaft, two motors

Soluble part, F1-ATPase
– Synthesizes ATP when torque is applied to it (main function of this unit)
– Produces torque when it hydrolyzes ATP (not main function)

Membrane-bound part, F0 complex
– Produces torque when there is a positive proton gradient across the membrane (main function of this unit)
– Pumps protons when torque is applied (not main function)

Torque is transmitted between the motors via the central stalk.

[Figure labels: dimensions ~80 Å, ~200 Å, ~60 Å, ~60 Å, ~100 Å; system sizes 330,000 atoms and 130,000 atoms]

Page 6: What’s New With NAMD Triumph and Torture with New Platforms


Molecular Mechanics Force Field

Page 7: What’s New With NAMD Triumph and Torture with New Platforms


Biomolecular Time Scales

Motion                            Time scale (sec)
Bond stretching                   10^-14 to 10^-13
Elastic vibrations                10^-12 to 10^-11
Rotations of surface sidechains   10^-11 to 10^-10
Hinge bending                     10^-11 to 10^-7
Rotation of buried sidechains     10^-4 to 1
Allosteric transitions            10^-5 to 1
Local denaturations               10^-5 to 10

Max Timestep: 1 fs

Page 8: What’s New With NAMD Triumph and Torture with New Platforms


Example Simulation: GlpF

NAMD with PME
Periodic boundary conditions
NPT ensemble at 310 K

Protein: ~15,000 atoms
Lipids: ~40,000 atoms
Water: ~51,000 atoms
Total: ~106,000 atoms

1024 PSC TCS CPUs: 4 hours per ns

M. Jensen, E. Tajkhorshid, K. Schulten, Structure 9, 1083 (2001)
E. Tajkhorshid et al., Science 296, 525-530 (2002)

Page 9: What’s New With NAMD Triumph and Torture with New Platforms


Typical Simulation Statistics

• 100,000 atoms (including water, lipid)

• 10-20 MB of data for entire system

• 100 Å per side periodic cell

• 12 Å cutoff for short-range nonbonded terms

• 10,000,000 timesteps (10 ns)

• 4 s/step on one processor (1.3 years total!)
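
The last two figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes a 1 fs timestep (which is what makes 10,000,000 steps equal 10 ns) and a 365.25-day year; the function names are illustrative only.

```python
# Worked arithmetic for the "typical simulation" figures.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def simulated_ns(n_steps, timestep_fs=1.0):
    """Simulated time in nanoseconds (1 ns = 1,000,000 fs)."""
    return n_steps * timestep_fs / 1.0e6


def serial_runtime_years(n_steps, seconds_per_step):
    """Wall-clock years needed to run n_steps back to back on one processor."""
    return n_steps * seconds_per_step / SECONDS_PER_YEAR


steps = 10_000_000
print(simulated_ns(steps))               # simulated nanoseconds
print(serial_runtime_years(steps, 4.0))  # roughly 1.27 years
```

The result rounds to the "1.3 years" quoted on the slide, which is the motivation for parallel runs in the slides that follow.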

Page 10: What’s New With NAMD Triumph and Torture with New Platforms


Parallel MD: Easy or Hard?

• Easy

– Tiny working data

– Spatial locality

– Uniform atom density

– Persistent repetition

– Multiple timestepping

• Hard

– Sequential timesteps

– Short iteration time

– Full electrostatics

– Fixed problem size

Page 11: What’s New With NAMD Triumph and Torture with New Platforms


Poorly Scaling Approaches

• Replicated data
– All atom coordinates stored on each processor
– Communication/Computation (C/C) ratio: O(P log P)

• Partition the atom array across processors
– Nearby atoms may not be on the same processor
– C/C ratio: O(P)

• Distribute force matrix to processors
– Matrix is sparse, non-uniform
– C/C ratio: O(sqrt P)

Page 12: What’s New With NAMD Triumph and Torture with New Platforms


Spatial Decomposition: NAMD 1

• Atoms spatially distributed to cubes

• Size of each cube: just a bit larger than the cutoff radius

• Communicate only w/ neighbors

• Work for each pair of neighbors

• C/C ratio: O(1)

• However:

• Load Imbalance

• Limited Parallelism

Cells, Cubes or “Patches”
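
The cube scheme can be sketched in a few lines. This is a minimal illustration rather than NAMD's implementation: it assumes a cubic periodic cell, coordinates already wrapped into [0, box), and hypothetical function names. Because each cube edge is at least the cutoff, every atom within the cutoff of a given atom sits in the same cube or one of its 26 neighbors.

```python
from collections import defaultdict
from itertools import product


def assign_patches(coords, box, cutoff):
    """Bin atoms into cubic patches whose edge is at least the cutoff.

    Assumes a cubic periodic cell of side `box` and coordinates in [0, box).
    """
    n = max(1, int(box // cutoff))  # patches per side
    edge = box / n                  # edge >= cutoff by construction
    patches = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (int(x / edge) % n, int(y / edge) % n, int(z / edge) % n)
        patches[key].append(i)
    return patches, n


def neighbor_patches(key, n):
    """The patch itself plus its periodic neighbors (27 in total when n >= 3)."""
    return {tuple((k + d) % n for k, d in zip(key, offset))
            for offset in product((-1, 0, 1), repeat=3)}


# A 100 Å cell with a 12 Å cutoff gives 8 patches per side, 12.5 Å each.
patches, n = assign_patches([(1.0, 1.0, 1.0), (99.0, 99.0, 99.0)], 100.0, 12.0)
print(n, len(neighbor_patches((0, 0, 0), n)))
```

With one cube per processor, the amount of available parallelism is capped by the number of cubes (8 × 8 × 8 = 512 here), which is one way to read the "limited parallelism" bullet.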

Page 13: What’s New With NAMD Triumph and Torture with New Platforms


Hybrid Decomposition: NAMD 2

• Spatially decompose data and communication.
• Separate but related work decomposition.
• “Compute objects” facilitate iterative, measurement-based load balancing system.

Page 14: What’s New With NAMD Triumph and Torture with New Platforms


Particle Mesh Ewald

• Particle Mesh Ewald (PME) calculation adds:

– A global grid of modest size (e.g. 192x144x144).

– Distributing charge from each atom to 4x4x4 sub-grid.

– 3D FFT over the grid, hence O(N log N) performance.

• Strategy:

– Use a smaller subset of processors for PME.

– Overlap PME with cutoff computation.

– Use same processors for both PME and cutoff.

– Multiple time-step reduces scaling impact.
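
To give a feel for the sizes involved, the sketch below counts grid points and charge-spreading updates for the example grid. The atom count passed in at the end is an assumption (the slide does not say which system the 192x144x144 grid belongs to), and the function names are illustrative.

```python
import math


def pme_grid_points(nx, ny, nz):
    """Total number of points in the global PME grid."""
    return nx * ny * nz


def spreading_updates(n_atoms, order=4):
    """Grid updates when each atom spreads charge onto an order^3 sub-grid."""
    return n_atoms * order ** 3


points = pme_grid_points(192, 144, 144)
print(points)                          # 3981312 grid points
print(points * math.log2(points))      # rough N log N operation count for the FFT
print(spreading_updates(327_000))      # assumed atom count, for illustration only
```

The grid stays at a few million points regardless of processor count, which is why the strategy bullets restrict PME to a subset of processors and overlap it with the cutoff work.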

Page 15: What’s New With NAMD Triumph and Torture with New Platforms


NAMD 2 w/PME Parallelization using Charm++

[Figure: parallelization diagram; surviving labels: 192, 144, 700, 30,000]

Page 16: What’s New With NAMD Triumph and Torture with New Platforms


Avoiding Barriers

• In NAMD:
– The energy reductions were made asynchronous.
– No other global barriers are used in cut-off simulations.

• This came in handy when:
– Running on Pittsburgh Lemieux (3000 processors).
– The machine (and how Converse uses the network) produced unpredictable, random communication delays.
  • A send call would remain stuck for 20 ms, for example.
– Each timestep, ideally, was 12-14 ms.

Page 17: What’s New With NAMD Triumph and Torture with New Platforms


Handling Network Delays

Page 18: What’s New With NAMD Triumph and Torture with New Platforms


SC2002 Gordon Bell Award

[Figure: time per step (seconds, log scale from 0.01 to 100) versus number of processors (1 to 1024) on Lemieux (PSC), 327K atoms with PME, plotted against a linear-scaling reference. Annotations: 28 s per step; 36 ms per step, 76% efficiency.]
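
The annotated numbers are mutually consistent if one assumes that 28 s per step is the single-processor time and 36 ms per step is the time on the largest run shown. The sketch below makes that check; the assumption about which endpoint each annotation belongs to is mine, and the function names are illustrative.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Speedup divided by processor count."""
    return t_serial / (t_parallel * n_procs)


def implied_processors(t_serial, t_parallel, efficiency):
    """Processor count that would give the stated efficiency."""
    return t_serial / (t_parallel * efficiency)


print(parallel_efficiency(28.0, 0.036, 1024))  # close to 0.76
print(implied_processors(28.0, 0.036, 0.76))   # close to 1024
```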

Page 19: What’s New With NAMD Triumph and Torture with New Platforms


Major New Platforms

• SGI Altix

• Cray XT3 “Red Storm”

• IBM BG/L “Blue Gene”

Page 20: What’s New With NAMD Triumph and Torture with New Platforms


SGI Altix 3000

• Itanium-based successor to Origin series
• 1.6 GHz Itanium 2 CPUs w/ 9 MB Cache
• Cache-coherent NUMA shared memory
• Runs Linux (with some SGI modifications)
• NCSA has two 512 processor machines

Page 21: What’s New With NAMD Triumph and Torture with New Platforms


Porting NAMD to the Altix

• Normal Itanium binary just works.
• Best serial performance ever, better than other Itanium platforms (TeraGrid) at same clock speed.
• Building with SGI MPI just works.
• setenv MPI_DSM_DISTRIBUTE needed.
• Superlinear speedups 16 to 64 processors (good network, running mostly in cache at 64).
• Decent scaling to 256 (for ApoA1 benchmark).
• Intel 8.1 and later compiler performance issues.

Page 22: What’s New With NAMD Triumph and Torture with New Platforms


NAMD on New Platforms

[Figure: processors × seconds per step (axis from 2 to 7) versus number of processors (2 to 256) for a 92K-atom system with PME; perfect scaling is a horizontal line. Series: PSC Cray XT3, NCSA 3.06 GHz Xeon, TeraGrid 1.5 GHz Itanium 2, NCSA Altix 1.6 GHz Itanium 2. Annotation: 21 ms/step, 4.1 ns/day.]
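
The two annotated rates (21 ms/step and 4.1 ns/day) are the same number in different units, provided the timestep is 1 fs. A small conversion helper, with the 1 fs default stated as an assumption:

```python
def ns_per_day(ms_per_step, timestep_fs=1.0):
    """Convert wall-clock milliseconds per step to simulated ns per day."""
    steps_per_day = 86_400 / (ms_per_step / 1000.0)
    return steps_per_day * timestep_fs / 1.0e6


print(round(ns_per_day(21.0), 1))  # 4.1
```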

Page 23: What’s New With NAMD Triumph and Torture with New Platforms


Altix Conclusions

• Nice machine, easy to port to
– Code must run well on Itanium

• Perfect for typical NAMD user
– Fastest serial performance
– Scales well to typical number of processors
– Full environment, no surprises
– TCBG’s favorite platform for the past year

Page 24: What’s New With NAMD Triumph and Torture with New Platforms


Altix Transforms Interactive NAMD

User, VMD, NAMD (HHS Secretary Thompson)

GlpF IMD Benchmark:

• 4210 atoms

• 3295 fixed atoms

• 10 Å cutoff, no PME

8-fold Performance Growth

• 2001 to 2003: 72% faster

• 2003 to 2004: 39% faster

• 2004 to 2005: 239% faster

[Figure: steps per second (0 to 600) versus processors (0 to 32). Series labels: 1.33 GHz Athlon, 2.13 GHz, 3.06 GHz Xeon, 1.6 GHz Altix; year labels: 2001, 2003, 2004, 2005. Annotation: 2 fs step = 1 ps/s.]
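
The "8-fold" headline follows from compounding the three year-over-year gains, reading "72% faster" as a factor of 1.72 and so on. The sketch also works out how many steps per second the "2 fs step = 1 ps/s" annotation corresponds to. Function names are illustrative.

```python
import math


def compounded_growth(percent_gains):
    """Overall speedup factor from successive 'N% faster' improvements."""
    return math.prod(1.0 + p / 100.0 for p in percent_gains)


def steps_per_second(ps_per_second, timestep_fs):
    """Steps per wall-clock second needed to simulate ps_per_second (1 ps = 1000 fs)."""
    return ps_per_second * 1000.0 / timestep_fs


print(compounded_growth([72, 39, 239]))  # a little over 8
print(steps_per_second(1.0, 2.0))        # 500.0
```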

Page 25: What’s New With NAMD Triumph and Torture with New Platforms


Cray XT3 (Red Storm)

• Each node:
– Single AMD Opteron 100-series processor
  • 57 ns memory latency
  • 6.4 GB/s memory bandwidth
  • 6.4 GB/s HyperTransport to Seastar network
– Seastar router chip:
  • 6 ports (3D torus topology)
  • 7.6 GB/s per port (in fixed Seastar 2)
  • Poor latency (vs. XD1, according to Cray)

Page 26: What’s New With NAMD Triumph and Torture with New Platforms


Cray XT3 (Red Storm)

• 4 nodes per blade

• 8 blades per chassis

• 3 chassis per cabinet, plus one big fan

• PSC machine (Big Ben) has 22 cabinets
– 2068 compute processors
– Performance boost for TCS system (Lemieux)

Page 27: What’s New With NAMD Triumph and Torture with New Platforms


Cray XT3 (Red Storm)

• Service and I/O nodes run Linux
– Normal x86-64 binaries just work on them

• Compute nodes run Catamount kernel

– No OS interference for fine-grained parallelism

– No time sharing…one process at a time

– No sockets

– No interrupts

– No virtual memory

– System calls forwarded to head node (slow!)

Page 28: What’s New With NAMD Triumph and Torture with New Platforms


Cray XT3 Porting

• Initial compile mostly straightforward
– Disable Tcl, sockets, hostname, username code.

• Initial runs horribly slow on startup
– Almost like memory allocation was O(n^2)
– Found docs:
  • “simple implementation of malloc(), optimized for the lightweight kernel and large memory allocations”
  • Sounds like they assume a stack-based structure
  • Using -lgmalloc restores sane performance

Page 29: What’s New With NAMD Triumph and Torture with New Platforms


Cray XT3 Porting

• Still somewhat slow on startup
– Need to do all I/O to Lustre scratch space
– May be better when head node isn’t overloaded

• Tried SHMEM port (old T3E layer)
– New library doesn’t support locks yet
– SHMEM was optimized for T3E, not XT3

• Need Tcl for fully functional NAMD
– #ifdef out all socket and user info code
– Same approach should work on BG/L

Page 30: What’s New With NAMD Triumph and Torture with New Platforms


Cray XT3 Porting

• Random crashes even on short benchmarks
– Same NAMD code as elsewhere
– Same MPI layer as other platforms
– Try the debugger (TotalView)
  • Still buggy, won’t attach to running jobs
  • Managed to load a core file
  • Found pcqueue with item count of -1
  • Checking item count apparently fixes problem
  • Probably a compiler bug…the code looks fine

Page 31: What’s New With NAMD Triumph and Torture with New Platforms


Cray XT3 Porting

• Performance limited (on 256 CPUs)
– Only when printing energies every step
– NAMD streams better than direct CmiPrintf()
– I/O is unbuffered by default, 20 ms per write
– Create large buffer, remove NAMD flushes

• Fixes performance problem

• Can hit 6ms/step on 1024 CPUs…very good

• No output until end of job, may lose all in crash
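
The trade-off in the last bullet can be shown with a small analogy. The slides do not show the actual change made to NAMD, so this Python sketch only illustrates the general technique: route per-step output through a large buffer and never flush early. The file name, buffer size, and log line format are placeholders.

```python
import os
import tempfile


def write_energies(path, n_steps, buffer_bytes=1 << 20):
    """Write one line per step through a large buffer, never flushing early.

    Returns the file size visible on disk just before the file is closed.
    """
    with open(path, "w", buffering=buffer_bytes) as log:
        for step in range(n_steps):
            log.write(f"ENERGY: {step} 0.0\n")  # placeholder values
        size_before_close = os.path.getsize(path)
    return size_before_close


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "energies.log")
    pending = write_energies(path, 100)
    final = os.path.getsize(path)

print(pending, final)  # nothing on disk until close, then everything at once
```

While the job runs, the file on disk stays empty because all output is still in the buffer; that avoids a slow write on every step, and it is also why a crash before the final flush would lose the whole log.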

Page 32: What’s New With NAMD Triumph and Torture with New Platforms


NAMD on New Platforms

[Figure: processors × seconds per step (axis from 2 to 7) versus number of processors (2 to 256) for a 92K-atom system with PME; perfect scaling is a horizontal line. Series: PSC Cray XT3, NCSA 3.06 GHz Xeon, TeraGrid 1.5 GHz Itanium 2, NCSA Altix 1.6 GHz Itanium 2. Annotation: 21 ms/step, 4.1 ns/day.]

Page 33: What’s New With NAMD Triumph and Torture with New Platforms


Cray XT3 Conclusions

• Serial performance is reasonable
– Itanium is faster for NAMD
– Opteron requires less tuning work

• Scaling is outstanding (eventually)
– Low system noise allows 6 ms timesteps
– NAMD latency tolerance may help

• Lack of OS features annoying, but workable

• TCBG’s main allocation for this year