
June 30, 2010

Member of the Helmholtz Association

Protein Folding with Python on Supercomputers

Jan H. Meinke


Research Centre Jülich

[Map showing Austin and Jülich]


Jülich Supercomputing Centre


JUGENE

- IBM Blue Gene/P, 72 racks
- 32-bit PowerPC 450 SMP processor @ 850 MHz
- 294,912 cores
- 144 TB RAM
- 1 Petaflop/s peak, 0.826 Petaflop/s Linpack
- 3D torus network

Number 5 worldwide, number 1 in Europe (Top500, June 2010)


JUGENE Compute Card

Source: IBM


Blue Gene/P Node Card

Source: IBM


Blue Gene/P Design


JuRoPA

- Intel Nehalem cluster: dual-socket, quad-core Intel Nehalem @ 2.93 GHz
- 3288 nodes, 26,304 cores
- 79 TB RAM
- 308 Teraflop/s peak, 275 Teraflop/s Linpack
- InfiniBand with a fat-tree topology

Number 14 worldwide, number 3 in Europe (Top500, June 2010)


JuRoPA Interconnect

Non-blocking full fat tree (InfiniBand)


Simulation Laboratory Biology

Olav Zimmermann, Jan H. Meinke, Sandipan Mohanty

sl-bio.jsc@fz-juelich.de


Simulation Laboratory Biology: service, research, community

SL Bio: 3 Ph.D. scientists, 1 M.S. student

- Structure prediction, protein folding and aggregation, parallel algorithms
- Projects with SL Bio, scientific support, workshops
- Databases, software, benchmarks


Python at the SimLab Biology

Workflow, prototyping, production

Analysis and visualization

Irbäck, A., Mitternacht, S. & Mohanty, S. PMC Biophysics 2, 2 (2009).


Zimmermann, O. & Hansmann, U.H.E. Journal of Chemical Information and Modeling 48, 1903-1908 (2008).


Proteins

ERVRISITARTKKEAEKFAAILIKVFAELGYNDINVTWDGDTVTEGQL

α-helix

β-sheet


Simple Molecular Mechanics for Proteins (SMMP)

- Protein simulations with Monte Carlo
- Standard geometry (bond lengths and bond angles fixed)
- Dihedral angles are the degrees of freedom
- Force field: ECEPP/3

[Figure: dihedral angles (ω labeled); source: http://apple.sysbio.info/~mjhsieh/sstour/]
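Because SMMP keeps bond lengths and angles fixed, a conformation is described entirely by its dihedral angles. As a minimal sketch (not SMMP's own code), the dihedral defined by four consecutive atom positions can be computed with NumPy using the common atan2 formulation; the function name `dihedral` is hypothetical.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in [-180, 180], defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond p1 -> p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` returns 90.0; using atan2 keeps the sign of the rotation, which a plain arccos of the dot product would lose.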


PySMMP

- Compiled Fortran code with binding: smmp.so, built with f2py
- Python modules universe.py and protein.py: wrapper around SMMP's internal data structure and property functions
- Python modules ParallelTempering.py and algorithms.py: algorithms implemented on top of PySMMP


import universe, protein
from ParallelTempering import ParallelTempering  # module assumed to define this class

seq = "EXAMPLES/1LQ7.seq"; var = ' '
myUniverse = universe.Universe()
myProtein = protein.Protein(seq, var)
myUniverse.add(myProtein)

Tmin = 250; Tmax = 1000; n = 32
nequi = 10; sweeps = 60; nup = 1
try:  # n temperatures spaced linearly between Tmin and Tmax
    dT = (Tmax - Tmin) / (n - 1.0)
except ZeroDivisionError:  # only one temperature (n == 1)
    dT = 0
T = [int(Tmin + i * dT) for i in range(n)]
myPT = ParallelTempering(myUniverse, nequi, sweeps, nup, T, seed=314)
myPT.run()


Compiling PySMMP on JUGENE

# Set environment variables
export BGPGNU=/bgsys/drivers/ppcfloor/gnu-linux
export F90=$BGPGNU/powerpc-bgp-linux/bin/gfortran

# Use correct Python binary
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BGPGNU/lib
$BGPGNU/bin/python /bgsys/local/numpy/1.2.1/bin/f2py


helloworld.py:

from mpi4py import MPI
import sys

size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()

sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n" % (rank, size, name))


Launching Python


Parallel Tempering Monte Carlo

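Replicas run at temperatures $T_0 < T_1 < T_2$. Configurations at neighboring temperatures $T_i$ and $T_j$ are exchanged with probability

$P_{PT} = \min\left(1, e^{\Delta\beta\,\Delta E}\right)$, with $\Delta\beta = \beta_i - \beta_j$, $\Delta E = E_i - E_j$, and $\beta = 1/(k_B T)$.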
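The exchange step can be sketched in plain NumPy. This is a minimal sketch, not PySMMP's ParallelTempering API: it assumes energies are in the same units as $k_B T$, and the function name `attempt_swaps` and its `offset` parameter are hypothetical. Each neighboring pair is accepted with the probability given above.

```python
import numpy as np

def attempt_swaps(betas, energies, rng, offset=0):
    """One round of exchange attempts between neighboring temperatures.

    betas: inverse temperatures 1/(k_B T), ordered from cold to hot
    energies: energy of the configuration currently at each temperature
    offset: 0 pairs (0, 1), (2, 3), ...; 1 pairs (1, 2), (3, 4), ...
    Returns perm such that configuration perm[i] now sits at temperature i.
    """
    energies = np.asarray(energies, dtype=float)
    perm = np.arange(len(betas))
    # Pairs in one round are disjoint, so each index is touched at most once
    for i in range(offset, len(betas) - 1, 2):
        j = i + 1
        delta = (betas[i] - betas[j]) * (energies[i] - energies[j])
        # Accept with probability min(1, exp(delta))
        if delta >= 0 or rng.random() < np.exp(delta):
            perm[i], perm[j] = perm[j], perm[i]
    return perm
```

Alternating `offset=0` and `offset=1` between rounds lets every neighboring pair exchange; in an MPI setting, each replica would typically live on its own rank and only energies and temperatures need to be communicated.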


Parallel Tempering with SMMP and PySMMP

Calculation of Cartesian coordinates and energy.

SMMP: partem_p, metropolis, common blocks

PySMMP: ParallelTempering.py, algorithms.py, and the Python modules universe.py and protein.py
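Both stacks rest on Metropolis updates of the dihedral angles. A minimal sketch of one sweep, assuming a hypothetical toy `energy` callable in place of ECEPP/3 and a uniformly random new value for each angle as the proposal (SMMP's actual move set may differ):

```python
import numpy as np

def metropolis_sweep(angles, energy, beta, rng):
    """One Monte Carlo sweep: propose a new value for each dihedral in turn.

    angles: 1-D array of dihedral angles in radians (modified in place)
    energy: callable returning the energy of a full set of angles
    beta: inverse temperature 1/(k_B T)
    Returns the energy after the sweep and the number of accepted moves.
    """
    e_old = energy(angles)
    accepted = 0
    for k in range(len(angles)):
        old = angles[k]
        angles[k] = rng.uniform(-np.pi, np.pi)  # proposal: random new angle
        e_new = energy(angles)
        delta = e_new - e_old
        # Metropolis criterion: always accept downhill, uphill with exp(-beta*delta)
        if delta <= 0 or rng.random() < np.exp(-beta * delta):
            e_old = e_new
            accepted += 1
        else:
            angles[k] = old  # reject: restore the previous angle
    return e_old, accepted
```

A toy energy such as `lambda a: float(np.sum(1.0 - np.cos(a)))` is enough to exercise it; in a parallel tempering run, each replica would call such a sweep at its own `beta` between exchange attempts.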


GS-α3W (1LQ7)

Designed three-helix bundle, 67 amino acids, 1110 atoms


Scaling of the Energy Function

[Plots: one panel for JuRoPA and one for JUGENE, each comparing SMMP and PySMMP]


Weak scaling of Parallel Tempering


Scaling of Parallel Tempering
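Scaling plots like these are usually read as parallel efficiency. A small helper with the standard definitions (function names hypothetical; any timings passed in are placeholders, not results from these runs): for strong scaling, the total problem size is fixed, so the ideal runtime falls as 1/p; for weak scaling, the work per core is fixed, so the ideal runtime stays constant.

```python
def strong_scaling_efficiency(p_ref, t_ref, p, t):
    """Fixed total problem size: ideal runtime on p cores is t_ref * p_ref / p."""
    return (t_ref * p_ref) / (t * p)

def weak_scaling_efficiency(t_ref, t):
    """Fixed work per core: ideal runtime stays equal to t_ref as cores grow."""
    return t_ref / t
```

For example, a run that takes 100 s on 1 core and 50 s on 4 cores has a strong-scaling efficiency of `strong_scaling_efficiency(1, 100.0, 4, 50.0)`, that is 0.5.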


Protein Clusters

Meinke, J.H. & Hansmann, U.H.E. J. Comp. Chem. 30, 1642-1648 (2009).


Clustering

Fully connected

Connected components with a minimum number of links
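The connected-components criterion can be sketched with a union-find structure over a pairwise distance matrix. This is one possible reading of the slide, not the authors' code: it assumes two conformations are linked when their distance is below a `cutoff`, and treats the "minimum number of links" as a hypothetical `min_links` parameter that leaves weakly linked conformations unclustered.

```python
import numpy as np

def connected_component_clusters(dist, cutoff, min_links=1):
    """Label conformations by connected components of the graph dist < cutoff.

    Conformations with fewer than min_links neighbors get label -1 (unclustered).
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    adj = (dist < cutoff) & ~np.eye(n, dtype=bool)
    keep = adj.sum(axis=1) >= min_links
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every linked pair of kept conformations
    for i, j in zip(*np.nonzero(np.triu(adj, k=1))):
        if keep[i] and keep[j]:
            parent[find(i)] = find(j)

    # Number the components 0, 1, 2, ... in order of first appearance
    labels = np.full(n, -1)
    roots = {}
    for i in range(n):
        if keep[i]:
            labels[i] = roots.setdefault(find(i), len(roots))
    return labels
```

A fully connected criterion would instead require every pair inside a cluster to be linked, which amounts to finding cliques and is much more expensive than this near-linear pass over the links.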


Distance Between Two Protein Conformations

- Root-mean-square deviation (rmsd)
- Dihedral rmsd
- Overlap of contacts
- Scores

→ n² operations: every pair of n conformations has to be compared
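Two of these measures can be sketched with NumPy. The function names are hypothetical, and the slide does not say how structures are superimposed; the Kabsch algorithm (optimal rotation via SVD) is assumed here for coordinate rmsd. Dihedral rmsd must wrap angle differences into [-180°, 180°). The last function makes the n² cost explicit: n conformations need n(n-1)/2 distance evaluations.

```python
import numpy as np

def rmsd(x, y):
    """Coordinate rmsd of two (N, 3) arrays after optimal superposition (Kabsch)."""
    x = np.asarray(x, dtype=float) - np.mean(x, axis=0)
    y = np.asarray(y, dtype=float) - np.mean(y, axis=0)
    # Singular values of the covariance matrix determine the optimal rotation
    u, s, vt = np.linalg.svd(x.T @ y)
    # Flip the smallest singular value if the best fit would be a reflection
    s[-1] *= np.sign(np.linalg.det(u @ vt))
    msd = (np.sum(x * x) + np.sum(y * y) - 2.0 * np.sum(s)) / len(x)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative rounding errors

def dihedral_rmsd(a, b):
    """rmsd of two sets of dihedral angles in degrees, respecting periodicity."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    diff = (diff + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return float(np.sqrt(np.mean(diff ** 2)))

def distance_matrix(conformations, metric):
    """All pairwise distances: n(n-1)/2 metric evaluations for n conformations."""
    n = len(conformations)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = metric(conformations[i], conformations[j])
    return dist
```

For example, `dihedral_rmsd([179.0], [-179.0])` is 2.0 rather than 358.0. Because the pair loop grows quadratically, it is the natural part to distribute across MPI ranks for large ensembles.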


Density-Based Clustering


MAFIA

Nagesh, H., Goil, S. & Choudhary, A. Data mining for scientific and engineering applications (2001).


PyMAFIA

def determineClusters(self):
    self.buildAdaptiveGrid()
    # Start with every 1-D bin of every dimension as a candidate dense unit (CDU)
    self.CDU = []
    for A in range(self.d):
        for i in range(len(self.thresholds[A])):
            self.CDU.append(((A, i), ))
    A = 0
    while self.CDU:
        if A > 0:
            # Combine dense units into higher-dimensional candidates
            self.findCandidateDenseUnits()
            self.eliminateDuplicateCandidates()
        self.getDensityOfCDU()
        self.identifyDenseUnits()
        A += 1
    # Join adjacent dense units into clusters
    self.buildGraphOfDenseUnits()
    self.findClustersOfDenseUnits()

def buildAdaptiveGrid(self):
    # Local extrema of each of the d dimensions on this rank
    minimum = np.array([self.data[:, A].min() for A in range(self.d)])
    maximum = np.array([self.data[:, A].max() for A in range(self.d)])

    # Get collective extrema across all ranks
    self.globalMinimum = np.zeros(self.d)
    self.globalMaximum = np.zeros(self.d)
    self.comm.Allreduce([minimum, self.dataType],
                        [self.globalMinimum, self.dataType], op=MPI.MIN)
    self.comm.Allreduce(maximum, self.globalMaximum, op=MPI.MAX)
    …
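The loop in determineClusters is a bottom-up search for dense units. A serial sketch of the same idea, under explicit assumptions: it uses a fixed uniform grid (as in the related CLIQUE algorithm) instead of MAFIA's adaptive grid, runs without MPI, and stops at the dense units, leaving out the graph step that joins them into clusters. All names are hypothetical; the counting, joining, and de-duplication steps only roughly correspond to getDensityOfCDU, findCandidateDenseUnits, and eliminateDuplicateCandidates.

```python
import itertools
import numpy as np

def dense_units(data, bins, threshold):
    """Bottom-up search for dense units on a uniform grid (CLIQUE-style sketch).

    A unit is a sorted tuple of (dimension, bin) pairs with distinct dimensions;
    it is dense if at least `threshold` points fall into it.
    """
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    lo, hi = data.min(axis=0), data.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / bins, 1.0)
    # Bin index of every point along every dimension
    idx = np.minimum(((data - lo) / width).astype(int), bins - 1)

    def count(unit):
        mask = np.ones(n, dtype=bool)
        for dim, b in unit:
            mask &= idx[:, dim] == b
        return int(mask.sum())

    # 1-D candidates: every bin of every dimension
    candidates = [((dim, b),) for dim in range(d) for b in range(bins)]
    found = []
    while candidates:
        dense = [u for u in candidates if count(u) >= threshold]
        found.extend(dense)
        dense_set = set(dense)
        new = set()  # a set removes duplicate candidates
        # Join two k-D dense units that share k-1 of their (dimension, bin) pairs
        for u, v in itertools.combinations(dense, 2):
            merged = tuple(sorted(set(u) | set(v)))
            dims = {dim for dim, _ in merged}
            if len(merged) != len(u) + 1 or len(dims) != len(merged):
                continue
            # Prune: every k-D sub-unit of the candidate must itself be dense
            if all(sub in dense_set
                   for sub in itertools.combinations(merged, len(u))):
                new.add(merged)
        candidates = sorted(new)
    return found
```

With 10 points at (0.1, 0.1), 10 points at (0.9, 0.9), `bins=2`, and `threshold=5`, the search finds the four 1-D units plus the two diagonal 2-D units. The adaptive grid in MAFIA replaces the uniform `width` with variable-size bins derived from per-dimension histograms.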


PyMAFIA in Action


Conclusion

Python
- is ready for developing HPC algorithms,
- is ready for production runs in HPC,
- scales to 100k cores on BG/P.
