GTL Facilities
Computing Infrastructure for 21st Century Systems Biology
Ed Uberbacher, ORNL
& Mike Colvin, LLNL
Experimental:
• Complete datasets
• Quantitative measurements
• Comprehensive physical characterization: protein expression and interactions, spatial distributions, process kinetics
Computational:
• Automated data analysis and validation
• Automated integration of diverse data sets
• Human- and computer-accessible databases
• Molecular, pathway, and cell-level simulations
The goals require a new synergy between computing and biology.
Ultimate Goal is to Provide Predictive Models of Microbes
This goal drives data collection and computing strategy.
GTL Biology Paradigm: Integrated Large-Scale Experiment-Computing Cycles
[Cycle diagram with stages: Experiment; Large-Scale Data Sets; Real-Time Analysis; Design or Revise Models; Simulate and Generate Hypotheses]
Facility I: Production and Characterization of Proteins
Estimating Microbial Genome Capability
Computational Analysis
• Genome analysis of genes, proteins, and operons
• Metabolic pathways analysis from reference data
• Protein machines estimate from PM reference data
Knowledge Captured
• Initial annotation of genome
• Initial perceptions of pathways and processes
• Recognized machines, function, and homology
• Novel proteins/machines (including prioritization)
• Production conditions and experience
Facility II: Whole Proteome Analysis
Modeling Proteome Expression, Regulation, and Pathways
Analysis and Modeling
• Mass spectrometry expression analysis
• Metabolic and regulatory pathway/network analysis and modeling
Knowledge Captured
• Expression data and conditions
• Novel pathways and processes
• Functional inferences about novel proteins/machines
• Genome super annotation: regulation, function, and processes (deep knowledge about cellular subsystems)
Facility III: Characterization and Imaging of Molecular Machines
Exploring Molecular Machine Geometry and Dynamics
Computational Analysis, Modeling and Simulation
• Image analysis/cryoelectron microscopy
• Protein interaction analysis/mass spec
• Machine geometry and docking modeling
• Machine biophysical dynamic simulation
Knowledge Captured
• Machine composition, organization, geometry, assembly and disassembly
• Component docking and dynamic simulations of machines
Example of Combined Experiment and Modeling to Understand a Multiprotein Complex: DNA Clamps & Clamp-Loading Mechanisms
• Classical molecular dynamics: Jeruzalmi et al. Cell 106:417 (2001)
• Mechanistic model based on physical and biochemical data: Jeruzalmi et al. Cell 106:429 (2001)
• Electron microscopy: Mayanagi et al. J. Struct. Biol. 134:35 (2001)
• Homology modeling: Venclovas et al. Prot. Sci. 11:2403 (2002)
• Atomic force microscopy: Shiomi et al. PNAS 97:14127 (2000)
Facility IV: Analysis and Modeling of Cellular Systems
Simulating Cell and Community Dynamics
Analysis, Modeling and Simulation
• Couple knowledge of pathways, networks, and machines to generate an understanding of cellular and multi-cellular systems
• Metabolism, regulation, and machine simulation
• Cell and multicell modeling and flux visualization
Knowledge Captured
• Cell and community measurement data sets
• Protein machine assembly time-course data sets
• Dynamic models and simulations of cell processes
Centrally Planned Analysis and Modeling Tools Libraries
Facility 1: genome annotation; regulatory element and operon identification; metabolic pathway analysis
Facility 2: mass spec data analysis; expression analysis and clustering; metabolic and regulatory network modeling
Facility 3: image analysis; mass spec analysis; protein/machine modeling; docking and molecular dynamics
Facility 4: metabolic simulation; regulatory simulation; cell modeling and simulations
Collect and manage software:
- Maintain current versions
- Ensure hardware compatibility
- User interfaces
- Documentation
GTL Facilities Will Require High-Performance Computing for Both Capacity and Capability
Capacity: e.g., high-throughput protein structure predictions
[Figure: sequences (ATCGTAGCAATCGACCGT..., CGGCTATAGCCGTTACCG…, TTATGCTATCCATAATCGA..., GGCTTAATCGCATACGAC...); thread onto templates; best match]
Capability: e.g., large-scale biophysical simulations
• Large size and timescale classical simulations
• Highly accurate quantum mechanical simulations
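The capacity case (many independent predictions, each keeping its best template match) can be sketched as follows. This is a minimal illustration, not the method used by any GTL tool: the template names, the sequences, and the position-by-position identity score are all hypothetical stand-ins for a real threading energy function, and a thread pool stands in for a compute cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical template library (name -> sequence). Real threading scores a
# sequence against 3D structural templates; simple identity is a stand-in.
TEMPLATES = {
    "template_A": "MKTAYIAKQR",
    "template_B": "MKVLYIGKQR",
    "template_C": "GSHMLEDPAA",
}

def identity_score(query: str, template: str) -> float:
    """Toy score: fraction of identical positions over the longer length."""
    matches = sum(q == t for q, t in zip(query, template))
    return matches / max(len(query), len(template))

def best_match(query: str) -> tuple[str, float]:
    """Score one query against every template and keep the best one."""
    scored = ((name, identity_score(query, seq)) for name, seq in TEMPLATES.items())
    return max(scored, key=lambda pair: pair[1])

def predict_all(queries: dict[str, str], workers: int = 4) -> dict[str, tuple[str, float]]:
    """Queries are independent of each other, so they can be farmed out to a pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(best_match, queries.values())
        return dict(zip(queries.keys(), results))

if __name__ == "__main__":
    queries = {"protein_1": "MKTAYIAKQK", "protein_2": "GSHMLEDPAG"}
    for name, (template, score) in predict_all(queries).items():
        print(f"{name}: best match {template} (score {score:.2f})")
```

Because no query depends on another, throughput scales with the number of workers; that independence is what separates capacity workloads from capability workloads such as a single large molecular dynamics run.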
GTL High-Performance Computing Roadmap
[Chart: computing capability (1 TF*, 10 TF, 100 TF, 1000 TF; current U.S. computing marked) versus biological complexity. Applications plotted: comparative genomics; genome-scale protein threading; constrained rigid docking; constraint-based flexible docking; protein machine interactions; molecular machine classical simulation; cell, pathway, and network simulation; community metabolic, regulatory, and signaling simulations; molecule-based cell simulation]
*Teraflops
Swimming in Data: Exploding Need to Capture and Manipulate Data
● From Acquisition, Refinement, Reduction and Deposition
● Across Scales of Space and Time - Petabytes
Data Repositories
• Genomes, annotation and community ‘genomes’
• Expression data and proteome composition
• Metabolite and flux data
• Metabolic pathways and kinetic parameters
• Protein interactions
• Protein machines repository: machine composition, function, homology, models
• Image data repository
• Regulatory network data and models
• Cell models repository
Integrated or integrable
Requires development of cross-facilities approach
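One way to read "integrated or integrable" is that records from different repositories can be joined on a shared identifier. The sketch below assumes this; the `gene_id` key, the field names, and the sample records are hypothetical and do not describe any actual GTL schema.

```python
from collections import defaultdict

# Hypothetical records from three repositories; all values are illustrative.
annotation = [
    {"gene_id": "g001", "function": "hypothetical kinase"},
    {"gene_id": "g002", "function": "hypothetical transporter"},
]
expression = [
    {"gene_id": "g001", "condition": "aerobic", "level": 12.5},
    {"gene_id": "g001", "condition": "anaerobic", "level": 3.1},
]
interactions = [{"gene_id": "g002", "partner": "g001"}]

def integrate(**repositories):
    """Group records from each named repository under their shared gene_id."""
    merged = defaultdict(dict)
    for repo_name, records in repositories.items():
        for record in records:
            fields = {k: v for k, v in record.items() if k != "gene_id"}
            merged[record["gene_id"]].setdefault(repo_name, []).append(fields)
    return dict(merged)

view = integrate(annotation=annotation, expression=expression,
                 interactions=interactions)
print(view["g001"])
```

The join only works if every facility uses the same identifiers, which is one concrete reason a cross-facilities approach has to be planned up front.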
Central Database Planning
[Diagram labels: phylogeny; microbial genomes; protein domains; pathways; regulatory elements; community genomes; literature; metabolic models; expression; proteomics; protein machines; regulatory networks; protein structure]
The GTL Knowledge Base: Integration of Large Datasets Is a Precursor to Predictive Modeling
Simulation of even a “simple” metabolic pathway depends on a large volume of data
[Figure labels: annotated data sets; raw data sets]
• GTL knowledge base will change how information about microbes reaches the community
• Models and simulations will be online
• We will know more and more about systems in each consecutive microbe
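The dependence of even a "simple" pathway simulation on measured data can be illustrated with a two-step pathway S -> I -> P. This is a minimal sketch assuming Michaelis-Menten kinetics and forward-Euler integration; the parameter values are made up, and the function names are hypothetical.

```python
def mm_rate(vmax: float, km: float, conc: float) -> float:
    """Michaelis-Menten rate: v = vmax * conc / (km + conc)."""
    return vmax * conc / (km + conc)

def simulate(s0: float, params: dict, dt: float = 0.01, steps: int = 1000):
    """Forward-Euler integration of S -> I -> P; returns final (S, I, P)."""
    s, i, p = s0, 0.0, 0.0
    for _ in range(steps):
        v1 = mm_rate(params["vmax1"], params["km1"], s)  # S -> I
        v2 = mm_rate(params["vmax2"], params["km2"], i)  # I -> P
        s -= v1 * dt
        i += (v1 - v2) * dt
        p += v2 * dt
    return s, i, p

# Made-up kinetic parameters; in practice each must come from experiment.
params = {"vmax1": 1.0, "km1": 0.5, "vmax2": 0.8, "km2": 0.3}
s, i, p = simulate(s0=2.0, params=params)
print(f"S={s:.3f} I={i:.3f} P={p:.3f}")
```

Two enzymatic steps already require four kinetic parameters and an initial concentration, each of which has to be measured under the relevant conditions. A genome-scale model multiplies that requirement across every reaction it includes.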