BioSimGRID and BioSimGRID ’lite’
Towards a worldwide repository for biomolecular simulation
www.biosimgrid.org
Philip C Biggin
Overview
• Introduction
- Motivation
- Consortium
- Case studies – added value from comparisons
• Design
- Architecture
- Data schema
• How to use
- Deposition
- Analysis
- Worldwide application
• The Future
- Towards computational systems biology
Current Paradigm for MD Simulations
Target selection: literature based; interesting protein/problem
System preparation: highly interactive; slow; idiosyncratic
Simulation: diversity of protocols
Analysis: highly interactive; slow; idiosyncratic
Dissemination: traditional – papers, posters, talks
Archival: ‘archive’ data … and then mislay the tape!
No third party involvement
Integrating Simulations and Structural Biology of Proteins
Novel structure (RCSB)
Sequence alignment
Biomedically relevant homologue(s)
Homology model(s)
MD simulations
Biomolecular simulation database
Comparative analysis
Evaluation/refinement of model
Biological and pharmacological simulation & modelling, e.g. drug discovery
bacterial K channel
mammalian K channel
dynamics in membrane
drug docking calculations
Interaction site dynamics
[Diagram labels: bioinformatics & structural biology; BioSimGRID; drug discovery]
Consortium
[Map of sites: York, Nottingham, Oxford, RAL, Southampton, London, Bristol]
• Oxford: Mark Sansom, Paul Jeffreys, Bing Wu, Kaihsu Tai
• Southampton: Jon Essex, Simon Cox, Stuart Murdock, Muan Hong Ng, Hans Fangohr, Steven Johnston
• London: David Moss
• Nottingham: Charlie Laughton
• York: Leo Caves
• Bristol: Adrian Mulholland
Comparative Simulations: Drug Receptors
Why? – increase significance of results
Sampling – long simulations and multiple simulations
Sampling via biology – exploiting evolution
Biology emerges from comparisons…
e.g. mammalian receptor vs. bacterial binding protein
Rat GluR2 EC fragment
Major receptor in mammalian brains – drug target
MD simulations with/without bound ligands
Analyse inter-domain motions
[Figure labels: glutamate; D1; D2]
GluR2 – Flexibility & Gating…
Flexibility depends on ligand occupancy & species
Gating mechanism – decrease in flexibility on channel activation
But … incomplete sampling
Need: longer simulations & comparative simulations
Flexibility: empty >> Kainate > Glutamate
“OFF” → “ON”
[Plot: RMSD (Å), 0 to 4, against time (ns), 0 to 2.0, for empty, +Kai and +Glu]
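The RMSD-against-time curves on this slide can be illustrated with a minimal sketch. It assumes each frame is a plain list of (x, y, z) tuples in Å with atoms in a fixed order, uses the first frame as the reference, and skips the superposition step that a real analysis would normally include. The function names and the toy trajectory are illustrative, not part of BioSimGrid.

```python
import math

def rmsd(frame, reference):
    """RMSD between two frames (lists of (x, y, z) tuples, same atom order).

    No superposition is performed; this is the plain coordinate RMSD.
    """
    if not frame or len(frame) != len(reference):
        raise ValueError("frames must be non-empty and of equal length")
    total = 0.0
    for (x, y, z), (rx, ry, rz) in zip(frame, reference):
        total += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(total / len(frame))

def rmsd_series(trajectory, dt_ns):
    """Return (time in ns, RMSD from the first frame) for every frame."""
    reference = trajectory[0]
    return [(i * dt_ns, rmsd(frame, reference))
            for i, frame in enumerate(trajectory)]

# Toy two-atom trajectory (hypothetical data): each frame shifts
# both atoms by 1 Å along x relative to the previous frame.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
series = rmsd_series(toy, dt_ns=0.5)
```

Plotting the second element of each pair against the first gives a curve of the kind shown for the empty, +Kai and +Glu simulations.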
GlnBP – A Bacterial Binding Protein
GlnBP – bacterial 2-domain periplasmic binding protein
Similar fold to mammalian GluR2
X-ray shows ligand binding induces domain closure
MD shows ligand binding reduces inter-domain motions - cf. GluR2 simulations
[Figure panels: X-ray structures (empty; Gln bound, + Gln) and MD simulation (empty; Gln bound)]
So how do we compare…
Similar active sites or similar motions
Different structures
Simulated with different MD packages (analysis, if not visualization, is difficult)
On different hard drives/tapes/CDs/DVDs.
Under different graduate students’ desks
Under different postdocs’ beds
In different rubbish bins!
Answer…
Create a worldwide repository of molecular simulations…
BioSimGrid = BioSimDB + Toolkits + Integration
BioSimGrid Architecture…
[Architecture diagram. Layers: GUI; Service; DB/Data]
Web Application; Python Application
HTTP(S); SSH; TCP/IP
Apache / Tomcat / SSL / Python
Authentication, Authorisation, Accounting
DataRetrievalTool; AnalysisTool; HTML Generator; DataDepositionTool; SQLEditor; Trajectory Query Tool; Video/Img Engine
Middle-ware
BioSim Data Engine / Storage Resource Broker
Database; Flat Files

                     DB       Flat File
Size/GB              7.5      3.0
Random Access /s     560.8    18.6
Sequential Access    389.0    5.5
Cross-software Analysis…
• BioSimDB = PDB (or NDB) for MD
enable discovery of new science (cf. genomics/proteomic initiatives)
[Diagram labels: BioSimDB; CHARMM, AMBER, NAMD, LAMMPS, TINKER, GROMACS]
It’s a Distributed Database
Nobody has enough disk space in one place anyway
Distributed and duplicate
Any piece of information is stored in at least two sites
…for resilience
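The "at least two sites" rule can be sketched as a simple placement function. This is only an illustration under stated assumptions: the two hostnames are the ones shown on the architecture slide, while the hash-based placement scheme, the function names and the item identifier are hypothetical and say nothing about how BioSimGrid actually assigns data to sites.

```python
import hashlib

# Site hostnames as shown on the architecture slide.
SITES = ["oxford.biosimgrid.org", "soton.biosimgrid.org"]

def place_replicas(item_id, sites, copies=2):
    """Choose `copies` distinct sites for an item (hypothetical scheme).

    A stable hash of the item identifier picks the first site; further
    copies go to the following sites in list order.
    """
    if copies > len(sites):
        raise ValueError("not enough sites for the requested number of copies")
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(sites)
    return [sites[(start + k) % len(sites)] for k in range(copies)]

def surviving_copies(placement, failed_site):
    """Sites that still hold the item after one site becomes unavailable."""
    return [site for site in placement if site != failed_site]

placement = place_replicas("trajectory-0001", SITES)  # hypothetical item id
still_available = surviving_copies(placement, "oxford.biosimgrid.org")
```

With two copies on distinct sites, the loss of any single site still leaves one copy reachable, which is the resilience argument made on the slide.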
Current Architecture
[Diagram: two sites, oxford.biosimgrid.org and soton.biosimgrid.org. Each site shows BioSim Data Engine Services, a Cache, a DB Interface and DB Engine with the Database, an F/F Interface and F/F Engine with the Flat Files, and an SRB Agent. Also labelled: SRB Server, MCAT, IDA.]
Data Schema
The hierarchy is like that in the PDB: chain → residue → atom → coordinate
…but also extended in the time dimension: frames
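The hierarchy described above can be sketched with a few dataclasses. This is an assumed, simplified in-memory layout for illustration only: the class and field names are hypothetical, and the actual BioSimGrid schema is a database schema that is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coordinate = Tuple[float, float, float]

@dataclass
class Atom:
    name: str

@dataclass
class Residue:
    name: str
    atoms: List[Atom] = field(default_factory=list)

@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)

@dataclass
class Trajectory:
    """Static topology (chain, residue, atom) plus coordinates per frame."""
    chains: List[Chain]
    # frames[t][i] holds the coordinate of atom i at frame t.
    frames: List[List[Coordinate]] = field(default_factory=list)

    def atom_count(self) -> int:
        return sum(len(r.atoms) for c in self.chains for r in c.residues)

    def add_frame(self, coords: List[Coordinate]) -> None:
        # Every frame must supply one coordinate per atom in the topology.
        if len(coords) != self.atom_count():
            raise ValueError("frame does not match the number of atoms")
        self.frames.append(list(coords))

# Hypothetical two-atom example.
gly = Residue("GLY", [Atom("N"), Atom("CA")])
traj = Trajectory(chains=[Chain("A", [gly])])
traj.add_frame([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)])
```

Keeping the topology separate from the frames means the chain, residue and atom records are stored once, while only the coordinates repeat along the time dimension.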
Metadata…
…is the data about data
MD setup, parameters, instantaneous properties, etc.
People currently write this in papers
People forget something
The disciplined way:-
…structured schema
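A structured schema makes it possible to check mechanically for the things people forget. The sketch below assumes a small set of required fields; the field names and the example records are hypothetical and are not taken from the BioSimGrid schema.

```python
# Hypothetical required metadata fields and their expected types.
REQUIRED_FIELDS = {
    "package": str,         # MD package used for the run
    "force_field": str,
    "timestep_fs": float,
    "temperature_K": float,
}

def validate_metadata(record):
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type: {name}")
    return problems

complete = {
    "package": "GROMACS",
    "force_field": "example-ff",  # placeholder value
    "timestep_fs": 2.0,
    "temperature_K": 300.0,
}
forgotten = {"package": "GROMACS", "timestep_fs": 2.0}

complete_problems = validate_metadata(complete)
forgotten_problems = validate_metadata(forgotten)
```

A deposition step could refuse a trajectory until the list of problems is empty, so that omissions are caught at deposit time rather than discovered when writing the paper.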
BioSimDB Toolkit
• Analysis tools
Radius of Gyration
Surface and Volume
RMSD/RMSF
Centre of Mass
Inter-atomic distances
Distance matrix
Internal angles
Principal Component Analysis
Average structure
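Three of the listed analyses (centre of mass, radius of gyration, distance matrix) are short enough to sketch for a single frame. The sketch assumes coordinates as (x, y, z) tuples and a parallel list of masses; the function names are illustrative and are not the toolkit's own API.

```python
import math

def centre_of_mass(coords, masses):
    """Mass-weighted mean position of one frame."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total
        for k in range(3)
    )

def radius_of_gyration(coords, masses):
    """Mass-weighted RMS distance of the atoms from the centre of mass."""
    com = centre_of_mass(coords, masses)
    total = sum(masses)
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total)

def distance_matrix(coords):
    """Symmetric matrix of inter-atomic distances for one frame."""
    return [[math.dist(a, b) for b in coords] for a in coords]

# Hypothetical two-atom frame with equal masses, 2 Å apart.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [1.0, 1.0]
com = centre_of_mass(coords, masses)
rg = radius_of_gyration(coords, masses)
dm = distance_matrix(coords)
```

Because these functions only need coordinates and masses, the same code applies to frames from any MD package once they are stored in a common schema.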
New workflow with BioSimGrid
Target selection: literature based; interesting protein/problem
Perform simulation (or use someone else’s)
Protocols more systematically recorded/checked/confirmed
Archive data to BioSimGrid
Analyse shared data (either locally or distributed)
Dissemination: traditional – papers, posters, talks
Store results in BioSimGrid
Third parties can analyse data you deposit
That’s dandy - but who is this aimed at?
• Novice and Expert…
Novice (web/GUI)
- Makes selections
- Guided through the options
- Can only do specific things
- Difficult to make mistakes
Expert (employ scripting)
- Python interpreter
- Much available
- Reasonably unrestricted
Example sessions
Even in script mode the syntax is quite informative:
FC = FrameCollection('2, 100-200')
myRMSD = RMSD(FC)
myRMSD.createPNG()
Provide biochemists with little computational experience a means of analysing computational data and obtaining meaningful results.
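The selection string passed to FrameCollection in the script above is not explained on the slide. One plausible reading, assumed here purely for illustration, is "trajectory 2, frames 100 to 200 inclusive". The helper below is hypothetical and is not part of the BioSimGrid API; it only shows how such a string could be turned into something a script can iterate over.

```python
def parse_frame_selection(selection):
    """Parse 'TRAJ, START-END' into (trajectory id, inclusive frame range).

    Assumed format only: the real FrameCollection syntax may differ.
    """
    parts = [p.strip() for p in selection.split(",")]
    if len(parts) != 2:
        raise ValueError("expected 'TRAJ, START-END'")
    trajectory_id = int(parts[0])
    bounds = parts[1].split("-")
    if len(bounds) != 2:
        raise ValueError("expected a frame range such as '100-200'")
    start, end = int(bounds[0]), int(bounds[1])
    if end < start:
        raise ValueError("frame range must not be reversed")
    return trajectory_id, range(start, end + 1)

trajectory_id, frames = parse_frame_selection("2, 100-200")
```

Under this reading the selection covers 101 frames, which is the kind of subset an RMSD calculation would then be run over.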
BioSimGrid ‘Lite’
Light version before final rollout
Provides equilibrated lipid bilayer boxes
Also provides ontogeny: How the box came about…
…metadata
…equilibration process (all the frames)
Deliverables to Date…
• Database schema
• Sample database (with test trajectories)
• Prototype shared between 2 sites
• Analysis tools – preliminary versions (about 14 tools)
• Interface to database for data retrieval
• Python hosting environment
Roadmap
Dec 2002 – project started
July 2003 – (internal) prototype
September 2003 – working prototype (All Hands meeting)
November 2003 – test ‘real world’ applications
December 2003 – multi-site prototype
2004 – multi-site deposition of data
2005 – open up to additional groups for deposition/testing
If you are interested…
The team would like to hear from interested parties, especially those with new ideas.
Benefits to you
- New directions are implemented
- Toolkit suits your needs
- Shared development of code
- Faster and more thorough development
BioSimGrid Benefits
- Larger user community
- More work gets done
- Code is efficient
- BioSimGrid and the community are successful
Future Directions in the GRID context
1. HTMD – simulations coupled to structural genomics
Diamond light source
2. Computational systems biology – virtual outer membrane
HPCx
3. Multiscale biomolecular simulations – from QM/MM to meso-scale modelling
GRID-enabled simulations
BioSimGrid
Structural Genomics & HTMD
Overall vision – simulation as an integral component of structural genomics
Needs capacity computation – GRID?
MD database (distributed) – BioSimGRID
[Diagram labels: synchrotron; compute GRID; MD database; novel biology…]
Towards a Virtual Outer Membrane (vOM)
[Figure labels: OmpT, OmpX, OmpA, OmpF, PhoE, FhuA, TolC, LamB, FhuD, MalE, PiBP, OMPLA, OpcA, TonB, Pi]
First step towards computational systems biology – a suitable system
Bacterial OMs – 5 or 6 proteins = 90% of protein content
Structures or good homology models of proteins are available
Complex lipid – outer leaflet is lipopolysaccharide (LPS)
Minimum system size ca. 2.5 × 10^6 atoms; simulation times ca. 50 ns
cf. current FhuA – 80,000 atoms & 10 ns – need HPCx
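The gap between the current FhuA simulation and the proposed vOM can be put into numbers with a short calculation. The atom counts and simulation lengths are the ones quoted above; the assumption that cost grows in proportion to atoms × simulated time is a rough simplification introduced here, not a claim from the talk.

```python
# Figures quoted on the slide.
vom_atoms, vom_ns = 2.5e6, 50      # proposed virtual outer membrane
fhua_atoms, fhua_ns = 80_000, 10   # current FhuA simulation

size_ratio = vom_atoms / fhua_atoms   # how many times more atoms
time_ratio = vom_ns / fhua_ns         # how many times longer

# Rough assumption: cost scales linearly with atoms x simulated time.
cost_ratio = size_ratio * time_ratio

print(f"{size_ratio:.2f}x atoms, {time_ratio:.0f}x time, "
      f"about {cost_ratio:.0f}x the compute under linear scaling")
```

Under that simplification the vOM needs roughly 31 times the atoms and 5 times the simulated time, or about 156 times the compute of the current FhuA run, which is consistent with the call for HPCx.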
Multiscale Biomolecular Simulations
Membrane bound enzymes – major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids)
Complex multi-scale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases
Need for GRID-based integrated simulations
QM (Bristol)
Drug-binding (Southampton)
Protein Motions (Oxford)
Drug Diffusion (London)
Acknowledgements
Elsewhere
Leo Caves (York)
Charles Laughton (Nottingham)
David Moss (Birkbeck)
Oliver Smart (Birmingham)
Adrian Mulholland (Bristol)
Marc Baaden (Paris)
Southampton
Dr Stuart Murdock (generic analysis tools)
Dr Muan Hong Ng (data retrieval)
Dr Hans Fangohr
Steven Johnston
Prof Simon Cox
Dr Jon Essex
Oxford
Professor Mark Sansom
Dr Carmen Domene
Dr Alessandro Grottesi
Dr Andrew Hung
Dr Daniele Bemporad
Dr Shozeb Haider
Dr Kaihsu Tai (curation and integration)
Dr George Patargias
Oliver Beckstein
Syma Khalid
Pete Bond
Jonathan Cuthbertson
Jeff Campbell
Loredana Vaccaro
Katherine Cox
John Holyoake
Anthony Ivetac
Jennifer Johnston
Jorge Pikunic
Zara Sands
Sundeep Deol
Yalini Pathy
Shiva Amiri
Robert d’Rozario
Samantha Kaye
Sylvanna Ho
Oxford e-Science Centre
Professor Paul Jeffreys
Dr Bing Wu (database management)
Matthew Dovey
Ivaylo Kostadinov
BBSRC; DTI; The Wellcome Trust; GSK; EC (TMR); OeSC (EPSRC & DTI); EPSRC; OSC (JIF); MRC