
IBM Research

Blue Gene Science May 11, 2005 © 2005 IBM Corporation

Project (Protein) Science Mission

Robert S. Germain
Biomolecular Dynamics & Scalable Modeling
http://www.research.ibm.com/bluegene


Outline

Science Overview

Application

Planning infrastructure for BGW to support protein science mission


December 1999:

IBM Announces $100 Million Research Initiative to build World's Fastest Supercomputer

"Blue Gene" to Tackle Protein Folding Grand Challenge

YORKTOWN HEIGHTS, NY, December 6, 1999 -- IBM today announced a new $100 million exploratory research initiative to build a supercomputer 500 times more powerful than the world’s fastest computers today. The new computer -- nicknamed "Blue Gene" by IBM researchers -- will be capable of more than one quadrillion operations per second (one petaflop). This level of performance will make Blue Gene 1,000 times more powerful than the Deep Blue machine that beat world chess champion Garry Kasparov in 1997, and about 2 million times more powerful than today's top desktop PCs.

Blue Gene's massive computing power will initially be used to model the folding of human proteins, making this fundamental study of biology the company's first computing "grand challenge" since the Deep Blue experiment. Learning more about how proteins fold is expected to give medical researchers better understanding of diseases, as well as potential cures.


Blue Gene Science Mission

Advance our understanding of biologically important processes via simulation, in particular the mechanisms behind protein folding

Current activities include:
– Thermodynamic & kinetic studies of model peptide systems
– Structural and dynamical studies of membrane and membrane/protein systems


Time Scales: Biopolymers and Membranes

[Figure: logarithmic time axis in seconds, with ticks at 10^-15, 10^-12, 10^-9, 10^-6, 10^-3, 1, 10^3, 10^6 and 10^9. Processes placed along the axis: bond vibration, DNA twisting, hinge motion, helix-coil transition, protein folding, ligand-protein binding, electron transfer, lipid exchange via diffusion, torsional correlation in lipid headgroups. The ranges reachable by simulation and by experiment are marked. Adapted from “The Protein Folding Problem”, Chan and Dill, Physics Today, Feb. 1993.]


| Year | Investigators | Comp. Platform | System | Time Scale |
|------|---------------|----------------|--------|------------|
| 1955 | Fermi, Pasta, Ulam | MANIAC | 64 particle chain with non-linear interactions | 50,000 cycles |
| 1959 | Alder & Wainwright | | Hard spheres | |
| 1964 | Rahman | CDC 3600 | Lennard-Jones liquid (argon), 864 particles | 10 ps |
| 1971 | Rahman & Stillinger | | Water, 216 molecules | |
| 1977 | McCammon, Gelin et al. | | BPTI | |
| 1998 | Sheinerman & Brooks | Cray T3D, T3E, C-90 | Segment B1 of Protein G | 100 ps |
| 1998 | Duan & Kollman | Cray T3E | Villin headpiece (36 residues) in 3000 water molecules | 1 μs |


The science plan – a spectrum of projects

Systematically cover a range of system sizes and topological complexity
– discovering the "rules" of folding
– applying those rules to have impact on disease

Address a broad range of scientific questions and impact areas:
– thermodynamics
– folding kinetics
– folding-related disease (CF, Alzheimer's, GPCRs)

Improve our understanding not just of protein folding but of protein function

[Figure: structures of the systems in the plan, labeled by PDB ID: 1LE1, 1L2Y, 1EOM, 1ENH, 1BBL, 1LMB, 1FME, plus a GPCR in membrane.]


How can we use large scale computational resources?

Capability
– Increase time scales probed (strong scaling)
– Increase system size studied

Capacity
– Improve sampling to reduce statistical uncertainties
– Run large ensembles of trajectories

Make contact with experiment


Protein Folding Simulations

Initial thermodynamic studies
– 32-64 replicas running on SP2 (one node/replica)
– nanosecond-scale trajectories in each replica
– small systems (~5000 atoms)

Future studies:
– 64-128 replicas running on BG/L (8-512 nodes/replica; 64-rack simulations possible)
– tens of nanoseconds or more per replica
– larger systems (20,000+ atoms)
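
The slide does not spell out how the replicas interact. The publication list cites replica-exchange simulations, so the following is a minimal sketch of the standard temperature replica-exchange swap test, not a description of the Blue Matter implementation. The function names, the kcal/mol energy units, and the example energies and temperatures are all assumptions made for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis probability of exchanging the configurations held at two temperatures.

    energy_i is the potential energy of the configuration currently at temp_i,
    energy_j the one currently at temp_j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(energies, temps, rng, offset=0):
    """Attempt swaps between neighbouring temperature slots (k, k+1).

    energies is modified in place to reflect accepted exchanges; the list of
    accepted pairs is returned. Alternating offset between 0 and 1 on
    successive calls covers all neighbouring pairs.
    """
    accepted = []
    for k in range(offset, len(temps) - 1, 2):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted.append((k, k + 1))
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    temps = [300.0, 310.0, 320.0, 330.0]        # hypothetical temperature ladder
    energies = [-120.0, -118.0, -110.0, -112.0]  # hypothetical energies, kcal/mol
    print(attempt_swaps(energies, temps, rng), energies)
```

Because each replica only needs the two energies and temperatures of a neighbouring pair to decide a swap, the replicas can run as largely independent simulations between exchange attempts, which is what makes the per-replica node allocation on the slide possible.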


β-hairpin Simulation


Free Energy Landscape of Beta Hairpin (PNAS 2001)


“trp-cage” folding (PNAS 2003)

Small 20 amino acid miniprotein

Simulations started from a completely unfolded state

Simulations could reproduce & explain sequence-dependent folding


Membrane Proteins

Membrane processes enable:
– cell signal detection, ion and nutrient transport
– infection processes target specific membranes
– Over 50% of drug discovery research targets are membrane proteins

Experiment and simulation play a concerted role in understanding membrane biophysics

Simulation can be validated by experiment

Simulation can then help to interpret experiment


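GPCR-based drugs among the 200 best-selling prescriptions, and their GPCR targets

| GPCR target | Drug | Disease | Company | 2000 sales (US $m) |
|-------------|------|---------|---------|--------------------|
| Histamine receptors | Zantac | Ulcers | AstraZeneca | 870 |
| Histamine receptors | Pepcid | Ulcers | Merck | 850 |
| Histamine receptors | Claritin | Allergies | Schering-Plough | 2,200 |
| Histamine receptors | Allegra | Allergies | Aventis | 1,100 |
| 5-HT receptors | Risperdal | Psychosis | Johnson & Johnson | 1,600 |
| 5-HT receptors | Imitrex | Migraine | GlaxoSmithKline | 1,100 |
| 5-HT receptors | BuSpar | Anxiety | Bristol-Myers Squibb | 714 |
| 5-HT receptors | Zyprexa | Schizophrenia | Eli Lilly | 2,400 |
| Angiotensin receptors | Cozaar | Hypertension | Merck | 1,700 |
| Adrenoceptors | Toprol-XL | | AstraZeneca | 580 |
| Adrenoceptors | Coreg | Congestive heart failure | GlaxoSmithKline | 250 |
| Adrenoceptors | Serevent | Asthma | GlaxoSmithKline | 940 |
| Muscarinic acetylcholine receptors | Atrovent | COPD | Boehringer Ingelheim | 600 |
| GnRH receptors | Zoladex | Cancer | AstraZeneca | 740 |
| Dopamine receptors | Requip | Parkinson's disease | AstraZeneca | 90 |
| Prostaglandin (PGE1) receptors | Cytotec | Ulcers | Pharmacia | 100 |
| ADP receptors | Plavix | Stroke | Bristol-Myers Squibb | 900 |

Source: http://www.predixpharm.com/market_table.htm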


Rhodopsin and the Eye

[Figure: the eye, with labels for the retina, the outer segment of a rod, and the light-sensitive protein. Source: http://www2.mrc-lmb.cam.ac.uk/groups/GS/eye.html]


Overview of Blue Gene Membrane Protein Studies

SOPE
– Extensive hydrogen bonding network with headgroups
– Excellent agreement with experiment for both structural and dynamic properties

3:1 SDPC/Cholesterol
– Cholesterol induces dramatic lateral organization
– Cholesterol shows preference of STEA over DHA
– Significant angular anisotropy of cholesterol environment

GPCR in a membrane environment
– Rhodopsin with 2:2:1 SDPC/SDPE/CHOL
– 100 ns cis-retinal - 200+ ns trans-retinal
– Current production rate ~5 hrs / ns on 1024 nodes BG/L
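
To put the quoted production rate in perspective, the short calculation below converts "~5 hrs / ns" into nanoseconds per day and into the wall-clock time needed for a longer trajectory. It is only a back-of-the-envelope sketch: it assumes the rate stays constant and ignores restarts, queueing and I/O, and the function names are made up for this example.

```python
HOURS_PER_NS = 5.0  # approximate production rate quoted on the slide (1024 nodes of BG/L)


def ns_per_day(hours_per_ns):
    """Simulated nanoseconds produced per wall-clock day."""
    return 24.0 / hours_per_ns


def days_to_reach(target_ns, hours_per_ns):
    """Wall-clock days needed to accumulate target_ns of trajectory at a constant rate."""
    return target_ns * hours_per_ns / 24.0


if __name__ == "__main__":
    print(f"{ns_per_day(HOURS_PER_NS):.1f} ns/day")
    for target_ns in (100.0, 200.0, 1000.0):  # 1000 ns = 1 microsecond
        days = days_to_reach(target_ns, HOURS_PER_NS)
        print(f"{target_ns:7.0f} ns -> {days:6.1f} days")
```

At this rate a microsecond of trajectory would take roughly 200 days of uninterrupted running on 1024 nodes, which shows why larger partitions and better strong scaling matter for the science plan.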


Current Simulations of Rhodopsin in Membrane


Rhodopsin in 2:2:1 SDPE/SDPC/Cholesterol after 120ns


Simulations of Membranes & Membrane-bound proteins

Lipid bilayers (12,000 atoms) -- 10-30ns

Lipid bilayers with cholesterol – 10-30ns

Rhodopsin in lipid bilayer with cholesterol (44,000 atoms) – goal of microsecond-scale simulation

Simulations of larger functional units might involve 100,000+ atoms

Parameterization of coarse-grained models


Usage

[Diagram: usage workflow]
– Problem set-up: starting structure generation, force field assignment, ....
– Simulation, with monitoring and restart
– Post-processing: analysis, visualization, ....


Blue Matter – Application Platform for BG Protein Science

– Architected from the ground up to track hardware architecture and incorporate “state-of-the-art” methods
– Prototype platform for exploration of application frameworks suitable for cellular architecture machines
– Blue Matter comprises all the necessary application components: those that run on BG/L and those that run on the host systems (offload function to host whenever possible)
– Protein folding simulations at SC2003 within 4 months of first hardware
– Limited production science runs since May 2004, published work in early 2005

Setup

Monitoring & analysis

MD Core (massively parallel, minimal in size)


What Limits the Scalability of MD?

Inherent limitations on concurrency:
– Bonded force evaluation
  * Represents only a small fraction of computation, can be distributed moderately well.
– Real space non-bond force evaluation
  * Large fraction of computation, but good distribution can be achieved using volume or interaction decompositions.
– Reciprocal space contribution to force evaluation for Ewald/P3ME
  * P3ME uses 3D FFT with global communication (global data dependencies)
  * Ewald with direct evaluation uses floating point reduction

Load balancing

Hardware and software overheads
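
As an illustration of the volume decomposition and load-balancing points above, the sketch below maps atom positions in a periodic box onto a px × py × pz grid of nodes and reports how unevenly the atoms are spread. It is a generic textbook-style sketch, not Blue Matter's actual decomposition; the box size, the random coordinates and all function names are assumptions.

```python
import random


def node_of(position, box, grid):
    """Rank of the node that owns the sub-volume containing an (x, y, z) position."""
    cell = []
    for x, length, n in zip(position, box, grid):
        frac = (x % length) / length          # wrap into the periodic box
        cell.append(min(int(frac * n), n - 1))  # clamp guards against frac == 1.0
    ix, iy, iz = cell
    _, py, pz = grid
    return (ix * py + iy) * pz + iz


def atoms_per_node(positions, box, grid):
    """Number of atoms assigned to each node rank."""
    counts = [0] * (grid[0] * grid[1] * grid[2])
    for position in positions:
        counts[node_of(position, box, grid)] += 1
    return counts


def imbalance(counts):
    """Ratio of the busiest node's atom count to the mean (1.0 is perfect balance)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


if __name__ == "__main__":
    rng = random.Random(1)
    box = (80.0, 80.0, 80.0)   # hypothetical box edge lengths
    grid = (8, 8, 8)           # 512 nodes
    atoms = [tuple(rng.uniform(0.0, edge) for edge in box) for _ in range(43000)]
    counts = atoms_per_node(atoms, box, grid)
    print(f"atoms/node: min {min(counts)}, max {max(counts)}, "
          f"imbalance {imbalance(counts):.2f}")
```

With tens of thousands of atoms spread over thousands of nodes, each node holds only a handful of atoms, so statistical fluctuations in the per-node count become a visible source of load imbalance.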


Scalability Results for Blue Matter

[Figure: log-log plot of elapsed time (seconds, 0.001 to 1) versus node count (10 to 10,000). Series: 91K atom Factor IX, 51K atom Mini FBP, 43K atom Rhodopsin, 23K atom DHFR.]


Blue Matter on BG/L vs. NAMD on PSC Lemieux

[Figure: log-log plot of elapsed time (seconds, 0.001 to 10) versus node/CPU count (1 to 10,000). Series: NAMD ApoA1 on BG/L (MPI), PME every step; NAMD ApoA1 on Lemieux (Elan/Quadrics); Blue Matter Factor IX on BG/L (MPI).]


3D-FFT Performance on MPI vs. BG/L Advanced Diagnostic Environment

[Figure: log-log plot of elapsed time (seconds, 0.0001 to 0.01) versus node count (100 to 100,000). Series: $128^3$ MPI; $128^3$ Blade Single Core; $64^3$ MPI; $64^3$ Blade Single Core.]


Blue Matter Dataflow

[Diagram: dataflow between the Blue Matter components. Labeled regions: User Interface, Blue Matter Runtime. Labeled boxes: Molecular System Setup (CHARMM, IMPACT, AMBER, etc.); XML Filter; Blue Matter Molecular System Spec; Probspec Db2; MD UDF Registry; Hardware Platform Description; C++ Compiler; RTP File; DVS File; MSD.cpp File; Blue Matter Start Process; Blue Matter Parallel Application Kernel on BG/W; Raw Data Management System (UDP/IP-based); Reliable Datagrams; Restart Process (Checkpoint); DVS Filter; Online Analysis/Monitoring/Visualization; Offline Analysis; Visualization; Long Term Archival Storage System; Regression Test Driver Scripts.]


Blue Matter Performance on 43K Atom Rhodopsin

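| Node Count | px | py | pz | Time/Time-step, Dual Core | Time/Time-step, Single Core | Computation Rate (ns/day), Dual Core | Computation Rate (ns/day), Single Core |
|------------|----|----|----|---------------------------|-----------------------------|--------------------------------------|----------------------------------------|
| 16 | 4 | 2 | 2 | 0.3646 | 0.4471 | 0.47 | 0.39 |
| 128 | 8 | 4 | 4 | 0.09113 | 0.13215 | 1.90 | 1.31 |
| 512 | 8 | 8 | 8 | 0.02527 | 0.03172 | 6.84 | 5.45 |
| 1024 | 16 | 8 | 8 | 0.01845 | 0.02057 | 9.37 | 8.40 |
| 2048 | 16 | 16 | 8 | 0.01022 | 0.0137 | 16.91 | 12.61 |
| 4096 | 32 | 16 | 8 | 0.0135 | 0.0157 | 12.80 | 11.01 |
| 4096 | 16 | 16 | 16 | 0.00896 | 0.01035 | 19.29 | 16.70 |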
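
The two halves of the table are related by a simple conversion, sketched below. The slide states neither the time unit nor the integration time step; the sketch assumes seconds per time step and a 2 fs time step, because those values reproduce the tabulated ns/day figures. The function names and the efficiency calculation are illustrative additions.

```python
SECONDS_PER_DAY = 86400.0


def ns_per_day(seconds_per_step, timestep_fs=2.0):
    """Convert wall-clock seconds per MD step into simulated ns per day.

    timestep_fs = 2.0 is an assumption inferred from the table, not stated on the slide.
    """
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs * 1e-6  # 1 ns = 1e6 fs


def parallel_efficiency(t_ref, n_ref, t, n):
    """Strong-scaling efficiency of (t, n) relative to a reference run (t_ref, n_ref)."""
    return (t_ref * n_ref) / (t * n)


if __name__ == "__main__":
    # (node count, dual-core seconds per step) taken from the table
    dual_core = [(16, 0.3646), (512, 0.02527), (4096, 0.00896)]
    n_ref, t_ref = dual_core[0]
    for nodes, t in dual_core:
        print(f"{nodes:5d} nodes: {ns_per_day(t):6.2f} ns/day, "
              f"efficiency vs {n_ref} nodes {parallel_efficiency(t_ref, n_ref, t, nodes):.2f}")
```

The two 4096-node rows differ only in the px × py × pz layout, and the cubic 16 × 16 × 16 arrangement is noticeably faster than 32 × 16 × 8, so the node geometry matters as well as the node count.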


Data Funnel to Tape

[Diagram: data funnel from BG/L to tape. Labels: 40 GB/sec. bandwidth from BG/L; 0.5 GB/sec; Science Host; Hierarchical Storage Management; Tape Archive (12 x 3592 tape drives).]


Data Rates for Molecular Dynamics

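Data rates extrapolated from measured 512 node results on 43K atom Rhodopsin system

| Node Count | Snapshot Period (ts) | Data Rate (MB/sec.) | Data Rate (GB/day) |
|------------|----------------------|---------------------|--------------------|
| 512 | 1 | 78.78 | 6646.69 |
| 20480 | 1 | 3151.03 | 265867.77 |
| 20480 | 10 | 315.10 | 26586.78 |
| 20480 | 100 | 31.51 | 2658.68 |
| 20480 | 1000 | 3.15 | 265.87 |
| 20480 | 10000 | 0.32 | 26.59 |
| 20480 | 100000 | 0.03 | 2.66 |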
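
The extrapolation behind this table can be approximated as follows. The sketch assumes that the output rate grows linearly with node count (time per step shrinking in proportion) and falls inversely with the snapshot period, and that 1 GB = 1024 MB; both assumptions are inferred from the tabulated numbers rather than stated on the slide. The comparison against 0.5 GB/sec uses the figure from the "Data Funnel to Tape" slide, treated here as a hypothetical bottleneck. Small differences from the table come from the rounded 78.78 MB/sec input.

```python
MEASURED_NODES = 512
MEASURED_MB_S = 78.78  # measured rate at 512 nodes, one snapshot per time step


def extrapolated_rate_mb_s(nodes, snapshot_period,
                           measured_mb_s=MEASURED_MB_S, measured_nodes=MEASURED_NODES):
    """Output rate in MB/sec assuming ideal scaling with node count (an assumption)."""
    return measured_mb_s * (nodes / measured_nodes) / snapshot_period


def mb_s_to_gb_day(rate_mb_s):
    """Convert MB/sec to GB/day using 1 GB = 1024 MB."""
    return rate_mb_s * 86400.0 / 1024.0


def fits(rate_mb_s, link_gb_s=0.5):
    """True if the rate is within a link of link_gb_s GB/sec (hypothetical bottleneck)."""
    return rate_mb_s <= link_gb_s * 1024.0


if __name__ == "__main__":
    for period in (1, 10, 100, 1000):
        rate = extrapolated_rate_mb_s(20480, period)
        print(f"period {period:5d} ts: {rate:9.2f} MB/sec, "
              f"{mb_s_to_gb_day(rate):11.2f} GB/day, fits 0.5 GB/sec: {fits(rate)}")
```

Under these assumptions, writing every time step at 20480 nodes would exceed a 0.5 GB/sec path several times over, while a snapshot period of 10 time steps or more would fit.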


Outreach Activities

Blue Gene Protein Science Workshops
– San Diego 2001
– Edinburgh 2002
– Brookhaven 2003

Blue Gene Seminar Series
– Over 40 external speakers since inception (2000)

Collaborations
– Bruce Berne (Columbia), Scott Feller (Wabash), Martin Gruebele (UIUC), Klaus Gawrisch (NIH), Vijay Pande (Stanford), Ken Dill (UCSF), Teresa Head-Gordon (UC Berkeley), Jeff Madura (Duquesne), Hans Andersen (Stanford)


Acknowledgements

Alex Balaeff, Bruce Berne, Maria Eleftheriou, Scott Feller, Blake Fitch, Klaus Gawrisch, Alan Grossfield, Jed Pitera, Mike Pitman, Alex Rayshubskiy, Yuk Sham, Frank Suits, Bill Swope, Chris Ward, Yuri Zhestkov, Ruhong Zhou

Blue Gene Hardware and System Software teams, particularly Mark Giampapa

Facilities & BGW Support Staff


Selected Publications

– Role of Cholesterol and Polyunsaturated Chains in Lipid-Protein Interactions: Molecular Dynamics Simulation of Rhodopsin in a Realistic Membrane Environment; J. Am. Chem. Soc.; 2005; 127(13); 4576-4577
– Molecular-Level Organization of Saturated and Polyunsaturated Fatty Acids in a Phosphatidylcholine Bilayer Containing Cholesterol; Biochemistry; 2004; 43(49); 15318-15328
– Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory; The Journal of Physical Chemistry B; 2004; 108(21); 6571-6581
– Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 2. Example Applications to Alanine Dipeptide and a beta-Hairpin Peptide; The Journal of Physical Chemistry B; 2004; 108(21); 6582-6594
– Understanding folding and design: Replica-exchange simulations of "Trp-cage" miniproteins; Proc. Natl. Acad. Sci. USA; Vol. 100, Issue 13, June 24, 2003; pp. 7587-7592
– Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?; Proc. Natl. Acad. Sci. USA; Vol. 99, Issue 20, October 1, 2002; pp. 12777-12782
– The free energy landscape for beta-hairpin folding in explicit water; Proc. Natl. Acad. Sci. USA; Vol. 98, Issue 26, December 18, 2001; pp. 14931-14936