IBM Research
Blue Gene Science May 11, 2005 © 2005 IBM Corporation
Project (Protein) Science Mission
Robert S. Germain
Biomolecular Dynamics & Scalable Modeling
http://www.research.ibm.com/bluegene
Outline
Science Overview
Application
Planning infrastructure for BGW to support protein science mission
December 1999:
IBM Announces $100 Million Research Initiative to build World's Fastest Supercomputer
"Blue Gene" to Tackle Protein Folding Grand Challenge
YORKTOWN HEIGHTS, NY, December 6, 1999 -- IBM today announced a new $100 million exploratory research initiative to build a supercomputer 500 times more powerful than the world’s fastest computers today. The new computer -- nicknamed "Blue Gene" by IBM researchers -- will be capable of more than one quadrillion operations per second (one petaflop). This level of performance will make Blue Gene 1,000 times more powerful than the Deep Blue machine that beat world chess champion Garry Kasparov in 1997, and about 2 million times more powerful than today's top desktop PCs.
Blue Gene's massive computing power will initially be used to model the folding of human proteins, making this fundamental study of biology the company's first computing "grand challenge" since the Deep Blue experiment. Learning more about how proteins fold is expected to give medical researchers better understanding of diseases, as well as potential cures.
Blue Gene Science Mission
Advance our understanding of biologically important processes via simulation, in particular the mechanisms behind protein folding
Current activities include:
– Thermodynamic & kinetic studies of model peptide systems
– Structural and dynamical studies of membrane and membrane/protein systems
Time Scales: Biopolymers and Membranes
[Figure: time-scale axis in seconds with ticks at 10^-15, 10^-12, 10^-9, 10^-6, 10^-3, 1, 10^3, 10^6, 10^9, and ranges marked "Simulation" and "Experiment". Processes placed along the axis: Bond Vibration, DNA Twisting, Hinge Motion, Helix-Coil Transition, Protein Folding, Ligand-Protein Binding, Electron Transfer, Lipid exchange via diffusion, Torsional correlation in lipid headgroups.]
Adapted from “The Protein Folding Problem”, Chan and Dill, Physics Today, Feb. 1993
| Time Scale | System | Comp. Platform | Investigators | Year |
| 1 µsec | Villin headpiece (36 residues) in 3000 water molecules | Cray T3E | Duan & Kollman | 1998 |
| 100 ps | Segment B1 of Protein G | Cray T3D, T3E, C-90 | Sheinerman & Brooks | 1998 |
| | BPTI | | McCammon, Gelin et al. | 1977 |
| | Water, 216 molecules | | Rahman & Stillinger | 1971 |
| 10 ps | Lennard-Jones liquid (argon), 864 particles | CDC 3600 | Rahman | 1964 |
| | Hard spheres | | Alder & Wainwright | 1959 |
| 50,000 cycles | 64 particle chain with non-linear interactions | MANIAC | Fermi, Pasta, Ulam | 1955 |
The science plan – a spectrum of projects
Systematically cover a range of system sizes, topological complexity
– discovering the "rules" of folding
– applying those rules to have impact on disease
Address a broad range of scientific questions and impact areas:
– thermodynamics
– folding kinetics
– folding-related disease (CF, Alzheimer's, GPCR's)
Improve our understanding not just of protein folding but protein function
[Figure: example systems labeled 1LE1, 1L2Y, 1EOM, 1ENH, 1BBL, 1LMB, 1FME, and GPCR in membrane]
How can we use large scale computational resources?
Capability
– Increase time scales probed (strong scaling)
– Increase system size studied
Capacity
– Improve sampling to reduce statistical uncertainties
– Run large ensembles of trajectories
Make contact with experiment
Protein Folding Simulations
Initial thermodynamic studies:
– 32-64 replicas running on SP2 (one node/replica)
– nanosecond-scale trajectories in each replica
– small systems (~5000 atoms)
Future studies:
– 64-128 replicas running on BG/L (8-512 nodes/replica; 64-rack simulations possible)
– tens of nanoseconds or more/replica
– Larger systems (20,000+ atoms)
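The replica counts above can be made concrete with a small sketch. The Selected Publications list describes the Trp-cage work as replica-exchange simulations, so the example assumes that method and shows the standard Metropolis criterion for swapping two replicas held at different temperatures. It also checks the node arithmetic behind the "64-rack" remark, using the fact that a BG/L rack holds 1024 compute nodes (a detail not stated on the slide). The function names and energy values are illustrative only and are not taken from Blue Matter.

```python
import math

K_B = 0.0019872041   # Boltzmann constant in kcal/(mol*K)
NODES_PER_RACK = 1024  # BG/L compute nodes per rack (not stated on the slide)


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for exchanging replicas i and j.

    Energies are potential energies in kcal/mol, temperatures in kelvin.
    P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)])
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_j - energy_i)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def racks_needed(replicas, nodes_per_replica):
    """Number of racks required to run all replicas concurrently."""
    total_nodes = replicas * nodes_per_replica
    return math.ceil(total_nodes / NODES_PER_RACK)


# The colder replica holds the higher energy: the swap is always accepted.
print(swap_probability(-100.0, -120.0, 300.0, 320.0))
# The colder replica already holds the lower energy: accepted only sometimes.
print(swap_probability(-120.0, -100.0, 300.0, 320.0))
# Upper end of the "future studies" range: 128 replicas on 512 nodes each.
print(racks_needed(128, 512))
```

With 128 replicas on 512 nodes each, the total is 65,536 nodes, which is where the 64-rack figure comes from under the 1024-nodes-per-rack assumption.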
β-hairpin Simulation
Free Energy Landscape of Beta Hairpin (PNAS 2001)
“trp-cage” folding (PNAS 2003)
Small 20 amino acid miniprotein
Simulations started from a completely unfolded state
Simulations could reproduce & explain sequence-dependent folding
Membrane Proteins
Membrane processes enable:
– cell signal detection, ion and nutrient transport
– infection processes target specific membranes
– Over 50% of drug discovery research targets are membrane proteins
Experiment and simulation play a concerted role in understanding membrane biophysics
Simulation can be validated by experiment
Simulation can then help to interpret experiment
GPCR-based drugs among the 200 best-selling prescriptions, and their GPCR targets
| 2000 sales (US $m) | Company | Disease | Drug | GPCR target |
| 900 | Bristol-Myers Squibb | Stroke | Plavix | ADP receptors |
| 100 | Pharmacia | Ulcers | Cytotec | Prostaglandin (PGE1) receptors |
| 90 | AstraZeneca | Parkinson’s disease | Requip | Dopamine receptors |
| 740 | AstraZeneca | Cancer | Zoladex | GnRH receptors |
| 600 | Boehringer Ingelheim | COPD | Atrovent | Muscarinic acetylcholine receptors |
| 940 | GlaxoSmithKline | Asthma | Serevent | Adrenoceptors |
| 250 | GlaxoSmithKline | Congestive heart failure | Coreg | Adrenoceptors |
| 580 | AstraZeneca | | Toprol-XL | Adrenoceptors |
| 1,700 | Merck | Hypertension | Cozaar | Angiotensin receptors |
| 2,400 | Eli Lilly | Schizophrenia | Zyprexa | 5-HT receptors |
| 714 | Bristol-Myers Squibb | Anxiety | BuSpar | 5-HT receptors |
| 1,100 | GlaxoSmithKline | Migraine | Imitrex | 5-HT receptors |
| 1,600 | Johnson & Johnson | Psychosis | Risperdal | 5-HT receptors |
| 1,100 | Aventis | | Allegra | Histamine receptors |
| 2,200 | Schering-Plough | Allergies | Claritin | Histamine receptors |
| 850 | Merck | | Pepcid | Histamine receptors |
| 870 | AstraZeneca | Ulcers | Zantac | Histamine receptors |
Source: http://www.predixpharm.com/market_table.htm
Rhodopsin and the Eye
http://www2.mrc-lmb.cam.ac.uk/groups/GS/eye.html
[Figure labels: Retina; Outer segment of Rod; Light sensitive Protein]
Overview of Blue Gene Membrane Protein Studies
SOPE
– Extensive hydrogen bonding network with headgroups
– Excellent agreement with experiment for both structural and dynamic properties
3:1 SDPC/Cholesterol
– Cholesterol induces dramatic lateral organization
– Cholesterol shows preference for STEA over DHA
– Significant angular anisotropy of cholesterol environment
GPCR in a membrane environment
– Rhodopsin with 2:2:1 SDPC/SDPE/CHOL
– 100 ns cis-retinal - 200+ ns trans-retinal
– Current production rate ~5 hrs / ns on 1024 nodes BG/L
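The quoted production rate of about 5 hours per nanosecond can be turned into wall-clock estimates with simple arithmetic. The sketch below assumes the rate stays constant for the whole run, which ignores restarts, queueing, and any later performance tuning; the function names are illustrative. The 1000 ns target corresponds to the microsecond-scale goal mentioned elsewhere in this deck.

```python
HOURS_PER_DAY = 24.0


def rate_ns_per_day(hours_per_ns):
    """Convert a production rate in hours per nanosecond to ns per day."""
    return HOURS_PER_DAY / hours_per_ns


def days_to_simulate(target_ns, hours_per_ns):
    """Wall-clock days needed to reach target_ns at a constant rate."""
    return target_ns * hours_per_ns / HOURS_PER_DAY


print(f"{rate_ns_per_day(5.0):.1f} ns/day")
print(f"{days_to_simulate(100.0, 5.0):.1f} days for 100 ns")
print(f"{days_to_simulate(1000.0, 5.0):.1f} days for 1 microsecond")
```

At 5 hours per nanosecond the rate is 4.8 ns/day, so a full microsecond would take roughly 208 days of uninterrupted running on 1024 nodes.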
Current Simulations of Rhodopsin in Membrane
Rhodopsin in 2:2:1 SDPE/SDPC/Cholesterol after 120ns
Simulations of Membranes & Membrane-bound proteins
Lipid bilayers (12,000 atoms): 10-30 ns
Lipid bilayers with cholesterol: 10-30 ns
Rhodopsin in lipid bilayer with cholesterol (44,000 atoms): goal of microsecond-scale simulation
Simulations of larger functional units might involve 100,000+ atoms
Parameterization of coarse-grained models
Usage
[Workflow diagram with three stages]
Problem set-up: starting structure generation, force field assignment
Simulation (with Monitoring and Restart)
Post-processing: analysis, visualization, ...
Blue Matter – Application Platform for BG Protein Science
Architected from the ground up to track hardware architecture and incorporate “state-of-the-art” methods
Prototype platform for exploration of application frameworks suitable for cellular architecture machines
Blue Matter comprises all the necessary application components: those that run on BG/L and those that run on the host systems (offload function to host whenever possible)
Protein folding simulations at SC2003 within 4 months of first hardware
Limited production science runs since May 2004, published work in early 2005
Setup
Monitoring & analysis
MD Core (massively parallel, minimal in size)
What Limits the Scalability of MD?
Inherent limitations on concurrency:
– Bonded force evaluation
  * Represents only a small fraction of computation, can be distributed moderately well.
– Real space non-bond force evaluation
  * Large fraction of computation, but good distribution can be achieved using volume or interaction decompositions.
– Reciprocal space contribution to force evaluation for Ewald/P3ME
  * P3ME uses 3D FFT with global communication (global data dependencies)
  * Ewald with direct evaluation uses floating point reduction
Load balancing
Hardware and software overheads
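To illustrate the volume decomposition and load balancing points above, here is a minimal sketch that assigns atoms to nodes on a px x py x pz processor grid by position and then measures how uneven the resulting atom counts are. It is a generic textbook-style spatial decomposition under assumed names and a periodic rectangular box. It is not a description of how Blue Matter actually distributes work.

```python
def owner_node(position, box, grid):
    """Return the (ix, iy, iz) grid coordinate of the node owning an atom.

    position: (x, y, z) coordinates; box: (Lx, Ly, Lz) box lengths;
    grid: (px, py, pz) number of nodes along each axis.
    """
    coords = []
    for x, length, n in zip(position, box, grid):
        wrapped = x % length                 # periodic wrap into the box
        index = int(wrapped / length * n)    # which slab along this axis
        coords.append(min(index, n - 1))     # guard against rounding at the edge
    return tuple(coords)


def atoms_per_node(positions, box, grid):
    """Count how many atoms each node owns (nodes with no atoms are omitted)."""
    counts = {}
    for pos in positions:
        key = owner_node(pos, box, grid)
        counts[key] = counts.get(key, 0) + 1
    return counts


def imbalance(counts, grid):
    """Ratio of the busiest node's atom count to the mean count per node."""
    total_nodes = grid[0] * grid[1] * grid[2]
    mean = sum(counts.values()) / total_nodes
    return max(counts.values()) / mean


box = (8.0, 8.0, 8.0)
grid = (4, 2, 2)
positions = [(0.5, 0.5, 0.5), (7.9, 4.1, -0.1), (0.6, 0.4, 0.2)]
counts = atoms_per_node(positions, box, grid)
print(counts)
print(imbalance(counts, grid))
```

An imbalance of 1.0 means every node owns the same number of atoms. Values well above 1.0 mean the busiest node sets the pace for each time step, which is one reason load balancing appears on the slide as a separate limit on scalability.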
Scalability Results for Blue Matter
[Figure: log-log plot of Elapsed Time (seconds), 0.001 to 1, versus Node Count, 10 to 10000. Series: 91K atom Factor IX; 51K atom Mini FBP; 43K atom Rhodopsin; 23K atom DHFR.]
Blue Matter on BG/L vs. NAMD on PSC Lemieux
[Figure: log-log plot of Elapsed Time (seconds), 0.001 to 10, versus Node/CPU Count, 1 to 10000. Series: NAMD ApoA1 on BG/L (MPI), PME every step; NAMD ApoA1 on Lemieux (Elan/Quadrics); Blue Matter Factor IX on BG/L (MPI).]
3D-FFT Performance on MPI vs. BG/L Advanced Diagnostic Environment
[Figure: log-log plot of Elapsed Time (seconds), 0.0001 to 0.01, versus Node Count, 100 to 100000. Series: $128^3$ MPI; $128^3$ Blade Single Core; $64^3$ MPI; $64^3$ Blade Single Core.]
Blue Matter Dataflow
[Diagram: boxes and labels, connections not recoverable from the extracted text]
Blue Matter Runtime User Interface
Molecular System Setup: CHARMM, IMPACT, AMBER, etc.
XML Filter
Blue Matter Molecular System Spec
RTP File
DVS File
MSD.cpp File
Blue Matter Start Process
Blue Matter Parallel Application Kernel on BG/W
Raw Data Management System (UDP/IP-based)
Restart Process (Checkpoint)
Long Term Archival Storage System
Online Analysis/Monitoring/Visualization
Regression Test Driver Scripts
DVS Filter
Offline Analysis
Reliable Datagrams
Probspec Db2
C++ Compiler
MD UDF Registry
Hardware Platform Description
Visualization
Blue Matter Performance on 43K Atom Rhodopsin
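| Node Count | px | py | pz | Time/Time-step (s), Dual Core | Time/Time-step (s), Single Core | Computation Rate (ns/day), Dual Core | Computation Rate (ns/day), Single Core |
| 16 | 4 | 2 | 2 | 0.3646 | 0.4471 | 0.47 | 0.39 |
| 128 | 8 | 4 | 4 | 0.09113 | 0.13215 | 1.90 | 1.31 |
| 512 | 8 | 8 | 8 | 0.02527 | 0.03172 | 6.84 | 5.45 |
| 1024 | 16 | 8 | 8 | 0.01845 | 0.02057 | 9.37 | 8.40 |
| 2048 | 16 | 16 | 8 | 0.01022 | 0.0137 | 16.91 | 12.61 |
| 4096 | 32 | 16 | 8 | 0.0135 | 0.0157 | 12.80 | 11.01 |
| 4096 | 16 | 16 | 16 | 0.00896 | 0.01035 | 19.29 | 16.70 |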
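The two halves of the table are related by a simple conversion, sketched below. The slide does not state the integration time step; the value of 2 fs used here is an assumption, chosen because it reproduces the tabulated ns/day figures from the tabulated seconds per time step. The parallel efficiency helper is a standard strong-scaling measure added for illustration, with the 16-node run as the baseline.

```python
SECONDS_PER_DAY = 86400.0
TIME_STEP_FS = 2.0  # assumption: inferred from the table, not stated on the slide


def ns_per_day(seconds_per_step, time_step_fs=TIME_STEP_FS):
    """Simulated nanoseconds per wall-clock day for a given cost per step."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * time_step_fs * 1e-6  # 1 fs = 1e-6 ns


def parallel_efficiency(base_nodes, base_time, nodes, time):
    """Measured speedup divided by the ideal speedup over the baseline run."""
    speedup = base_time / time
    ideal = nodes / base_nodes
    return speedup / ideal


# Dual-core rows (node count, seconds per time step) copied from the table.
# The 4096-node entry is the 16 x 16 x 16 layout.
DUAL_CORE = [
    (16, 0.3646),
    (128, 0.09113),
    (512, 0.02527),
    (1024, 0.01845),
    (2048, 0.01022),
    (4096, 0.00896),
]

for nodes, seconds in DUAL_CORE:
    eff = parallel_efficiency(16, 0.3646, nodes, seconds)
    print(f"{nodes:5d} nodes: {ns_per_day(seconds):6.2f} ns/day, "
          f"efficiency vs 16 nodes: {eff:.2f}")
```

The two 4096-node rows in the table differ only in processor-grid shape, and the cubic 16 x 16 x 16 layout is about a third faster per step than 32 x 16 x 8, so the mapping of the simulation volume onto the machine matters as much as the node count.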
Data Funnel to Tape
[Diagram labels]
40 GB/sec. bandwidth from BG/L
0.5 GB/sec
Science Host
Hierarchical Storage Management
Tape Archive: 12 x 3592 tape drives
Data Rates for Molecular Dynamics
Data rates extrapolated from measured 512 node results on 43K atom Rhodopsin system
| Node Count | Snapshot Period (ts) | Data Rate (MB/sec.) | Data Rate (GB/day) |
| 512 | 1 | 78.78 | 6646.69 |
| 20480 | 1 | 3151.03 | 265867.77 |
| 20480 | 10 | 315.10 | 26586.78 |
| 20480 | 100 | 31.51 | 2658.68 |
| 20480 | 1000 | 3.15 | 265.87 |
| 20480 | 10000 | 0.32 | 26.59 |
| 20480 | 100000 | 0.03 | 2.66 |
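The extrapolation in this table can be reproduced with a short sketch. It assumes, consistent with the tabulated values, that the data rate scales in proportion to node count and inversely with the snapshot period, and that 1 GB = 1024 MB. The comparison against 0.5 GB/sec uses the figure from the "Data Funnel to Tape" slide, read here as the bandwidth toward tape, which is an interpretation of that diagram. Function names are illustrative.

```python
MEASURED_NODES = 512
MEASURED_MB_PER_SEC = 78.78      # measured rate with a snapshot every time step
TAPE_MB_PER_SEC = 0.5 * 1024.0   # 0.5 GB/sec, read as the path toward tape
SECONDS_PER_DAY = 86400.0


def data_rate_mb_per_sec(nodes, snapshot_period):
    """Extrapolated output rate for a node count and snapshot period (in steps)."""
    return MEASURED_MB_PER_SEC * (nodes / MEASURED_NODES) / snapshot_period


def gb_per_day(mb_per_sec):
    """Convert a sustained rate in MB/sec to GB/day (1 GB = 1024 MB)."""
    return mb_per_sec * SECONDS_PER_DAY / 1024.0


def min_period_for_tape(nodes):
    """Smallest power-of-ten snapshot period whose rate fits the tape bandwidth."""
    period = 1
    while data_rate_mb_per_sec(nodes, period) > TAPE_MB_PER_SEC:
        period *= 10
    return period


for period in (1, 10, 100, 1000):
    rate = data_rate_mb_per_sec(20480, period)
    print(f"period {period:5d}: {rate:9.2f} MB/sec, {gb_per_day(rate):11.2f} GB/day")
print("minimum period at 20480 nodes:", min_period_for_tape(20480))
```

Small differences from the table in the last digits come from starting with the rounded 78.78 MB/sec value. Under these assumptions, saving every time step at 20480 nodes would exceed the 0.5 GB/sec figure roughly sixfold, while saving every tenth step fits within it.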
Outreach Activities
Blue Gene Protein Science Workshops
– San Diego 2001
– Edinburgh 2002
– Brookhaven 2003
Blue Gene Seminar Series
– Over 40 external speakers since inception (2000)
Collaborations
– Bruce Berne (Columbia), Scott Feller (Wabash), Martin Gruebele (UIUC), Klaus Gawrisch (NIH), Vijay Pande (Stanford), Ken Dill (UCSF), Teresa Head-Gordon (UC Berkeley), Jeff Madura (Duquesne), Hans Andersen (Stanford)
Acknowledgements
Alex Balaeff, Bruce Berne, Maria Eleftheriou, Scott Feller, Blake Fitch, Klaus Gawrisch, Alan Grossfield, Jed Pitera, Mike Pitman, Alex Rayshubskiy, Yuk Sham, Frank Suits, Bill Swope, Chris Ward, Yuri Zhestkov, Ruhong Zhou
Blue Gene Hardware and System Software teams, particularly Mark Giampapa
Facilities & BGW Support Staff
Selected Publications
Role of Cholesterol and Polyunsaturated Chains in Lipid-Protein Interactions: Molecular Dynamics Simulation of Rhodopsin in a Realistic Membrane Environment; J. Am. Chem. Soc.; 2005; 127(13); 4576-4577
Molecular-Level Organization of Saturated and Polyunsaturated Fatty Acids in a Phosphatidylcholine Bilayer Containing Cholesterol; Biochemistry; 2004; 43(49); 15318-15328
Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory; The Journal of Physical Chemistry B; 2004; 108(21); 6571-6581
Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 2. Example Applications to Alanine Dipeptide and a beta-Hairpin Peptide; The Journal of Physical Chemistry B; 2004; 108(21); 6582-6594
Understanding folding and design: Replica-exchange simulations of "Trp-cage" miniproteins; Proc. Natl. Acad. Sci. USA; 2003; 100(13); 7587-7592
Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?; Proc. Natl. Acad. Sci. USA; 2002; 99(20); 12777-12782
The free energy landscape for beta-hairpin folding in explicit water; Proc. Natl. Acad. Sci. USA; 2001; 98(26); 14931-14936