The Subliminal Toolbox: automating steps in the reconstruction of metabolic networks
Neil Swainston
Manchester Centre for Integrative Systems Biology
Integrative Bioinformatics 2011, Wageningen, Netherlands
22 March 2011
Metabolic networks
• Computational and mathematical representation of the metabolic capabilities of a given organism
• On a genome-scale
• ~1000 unique metabolites
• ~1000 unique reactions
• Predictive, simulatable
Metabolic networks
[Figure: schematic metabolic network built up in stages. Panel labels: metabolic reactions (A + B, C + D); gene / enzyme (E1); protonation (DH+); mass balancing (A + 2B + H+); transport reactions (Aext, extracellular / intracellular, T1); compartmentalisation (mitochondria, cytosol, Cm); biomass objective (aa, nucl, biomass)]
• Goal: generate biomass from growth medium?
How are they generated?
• Traditionally: manually
• Start with KEGG download? Genome sequence?
• Collated / edited in spreadsheets
• Many steps done by hand
• Curated in focussed meetings (“jamborees”)
• Expensive
• Boring
Automation
• Many of these steps can be automated
• Subliminal Toolbox
• Goal is to generate a metabolic reconstruction automatically
• Manual curation still necessary
• BUT reduce what needs to be done
• Investigation
• Can we automate the generation of a metabolic network in yeast?
[Figure: Subliminal workflow. Boxes shown: KEGG and MetaCyc sources, each with merge pathways, (de)protonate metabolites and balance reactions; then merge, add compartmentalisation, add transport reactions, add biomass reaction, draft]
Initial draft
• Both KEGG and MetaCyc allow export of pathways / networks in SBML
• BUT these are representations of the database, NOT computational models
• Merging issue:
• Components are named inconsistently
Naming
• Glucose, glc, D-glucose, alpha-D-glucose?
• Need to be reconciled
• Use semantic annotations
• ChEBI terms for metabolites
• UniProt terms for enzymes
• Apply MIRIAM standard (RDF and URIs)
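As a rough illustration of why shared identifiers help with merging, the sketch below groups differently named components by their annotation. The name-to-identifier mapping is made up for this example and is not taken from the toolbox; only CHEBI:17634 appears in the talk.

```python
from collections import defaultdict

# Hypothetical name -> ChEBI identifier annotations, for illustration only.
annotations = {
    "Glucose": "CHEBI:17634",
    "glc": "CHEBI:17634",
    "D-glucose": "CHEBI:17634",
    "pyruvate": "CHEBI:15361",
}


def group_by_identifier(name_to_id):
    """Collect all names that share the same identifier."""
    groups = defaultdict(list)
    for name, identifier in name_to_id.items():
        groups[identifier].append(name)
    return dict(groups)


merged = group_by_identifier(annotations)
for identifier, names in merged.items():
    print(identifier, names)
```

Components that end up in the same group can be treated as one metabolite when two drafts are merged.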
MIRIAM annotation
<species metaid="_glc" id="glc" name="D-Glucose">
</species>
MIRIAM annotation
<species metaid="_glc" id="glc" name="D-Glucose">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#_glc">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI:17634"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
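A minimal sketch of reading such an annotation back out, using only the Python standard library. The namespace declarations on the species element are added here so the fragment parses on its own (in a full SBML file they are declared elsewhere); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# The species fragment from the slide, with namespaces declared so that
# it is parseable as a standalone document.
SPECIES_XML = f"""
<species xmlns:rdf="{RDF}" xmlns:bqbiol="{BQBIOL}"
         metaid="_glc" id="glc" name="D-Glucose">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#_glc">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI:17634"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
""".strip()


def miriam_urns(species_xml):
    """Return every rdf:resource value found on rdf:li elements."""
    root = ET.fromstring(species_xml)
    resource_attr = f"{{{RDF}}}resource"
    return [li.get(resource_attr) for li in root.iter(f"{{{RDF}}}li")]


urns = miriam_urns(SPECIES_XML)
# Keep only the ChEBI identifier part of each URN.
chebi_ids = [u.split("obo.chebi:")[1] for u in urns if "obo.chebi:" in u]
print(chebi_ids)
```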
Exploiting model annotations
• MIRIAM annotations provide unambiguous, unique identifiers for model components
• But they also provide links to chem/bioinformatics resources via web services
• Models become “live” and increase in utility as resources develop
• Kinetic parameters accessible from ChEBI
• Improving annotation in UniProt (phospho sites, etc.)
• Extract data through web services
[Figure: libAnnotationSBML linking MIRIAM / RDF annotations to UniProt, ChEBI and KEGG (molecular formula, protein sequence, etc.)]
Merging
• Standard identifiers: job done?
• Inconsistent charge states
• Pyruvic acid and pyruvate
Charge state determination
• Annotated ChEBI terms provide web service access to structural data
• InChI, SMILES strings
• InChI=1/C3H4O3/c1-2(4)3(5)6/h1H3,(H,5,6)/p-1/fC3H3O3/q-1
• CC(=O)C([O-])=O
• Cheminformatics software (ChemAxon MARVIN) can be used to predict charge state at given pH
• Consistency
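The talk uses ChemAxon MARVIN for this step. As a much simpler stand-in, and not the toolbox's method, the sketch below applies the Henderson-Hasselbalch relation to a single acidic group. The pKa of about 2.5 for pyruvic acid is an approximate literature value assumed for illustration, and both function names are hypothetical.

```python
def deprotonated_fraction(pka, ph):
    """Fraction of an acidic group that is deprotonated at a given pH
    (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def dominant_charge(neutral_charge, acidic_pkas, ph):
    """Charge of the dominant species: each acidic group whose pKa lies
    below the pH is counted as having lost one proton."""
    return neutral_charge - sum(1 for pka in acidic_pkas if ph > pka)


PYRUVIC_ACID_PKA = 2.5  # assumed, approximate value

print(deprotonated_fraction(PYRUVIC_ACID_PKA, 7.0))
print(dominant_charge(0, [PYRUVIC_ACID_PKA], 7.0))
```

Under these assumptions the dominant form at neutral pH is the singly charged anion, which is why pyruvate rather than pyruvic acid would be the consistent choice.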
Stereochemistry
• KEGG and MetaCyc are inconsistent in their definition of stereochemical precision
• Considered different: apparently minor but can cause gaps in the network
[Figure: structures of beta-D-glucose and D-glucose]
ChEBI ontology
• ChEBI is an ontology and contains relationships between metabolites
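One way ontology relationships could be used to spot metabolites that differ only in stereochemical precision is sketched below. The "is_a" hierarchy is a toy, hand-written for this example; it is not extracted from ChEBI, and the function names are hypothetical.

```python
# Toy child -> parent "is_a" relationships, made up for illustration.
IS_A = {
    "beta-D-glucose": "D-glucose",
    "alpha-D-glucose": "D-glucose",
    "D-glucose": "glucose",
}


def ancestors(term):
    """Walk up the hierarchy and return all parents of a term, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def same_or_related(a, b):
    """True if the terms are identical or one is a more specific form of the other."""
    return a == b or b in ancestors(a) or a in ancestors(b)


print(same_or_related("beta-D-glucose", "D-glucose"))
print(same_or_related("beta-D-glucose", "alpha-D-glucose"))
```

Pairs flagged as related are candidates for being treated as the same metabolite during merging, subject to manual review.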
Stereochemistry-induced gaps
[Figure: three-step illustration of a stereochemistry-induced gap involving metabolites X and Y]
Reaction balancing
• Reaction elemental and charge balancing
• Prevents mass violation and inconsistencies arising from “magical” production or disappearance of matter
• KEGG and MetaCyc reactions don’t always balance
• Incorrect stoichiometry
• Missing protons, water, etc.
• Solution: use linear programming
Reaction balancing
carbon dioxide + 2-Acetolactate → Pyruvate
CO2 + C5H7O4- → C3H3O3-

Ab = 0

A =
          Reactants       Products   Optional reactants   Optional products
          CO2   C5H7O4    C3H3O3     H+    H2O            H+    H2O   CO2
C          1     5        -3          0     0              0     0    -1
O          2     4        -3          0     1              0    -1    -2
H          0     7        -3          1     2             -1    -2     0
charge     0    -1         1          1     0             -1     0     0

bmin       1     1         1          0     0              0     0     0

b represents a vector of stoichiometries
Reaction balancing
• Linear programming solver solves Ab = 0
          Reactants       Products   Optional reactants   Optional products
          CO2   C5H7O4    C3H3O3     H+    H2O            H+    H2O   CO2
C          1     5        -3          0     0              0     0    -1
O          2     4        -3          0     1              0    -1    -2
H          0     7        -3          1     2             -1    -2     0
charge     0    -1         1          1     0             -1     0     0

bmin       1     1         1          0     0              0     0     0
b          1     1         2          0     0              1     0     0

carbon dioxide + 2-Acetolactate → 2 Pyruvate + H+
CO2 + C5H7O4- → 2 C3H3O3- + H+
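The balancing step can be reproduced with an off-the-shelf solver. The sketch below uses SciPy's linprog on the matrix from the slide. The objective (minimise the sum of stoichiometries) is an assumption made here, since the talk only says that a linear programming solver is used; a plain LP also does not guarantee integer coefficients in general, although it yields them for this example.

```python
import numpy as np
from scipy.optimize import linprog

# Columns: CO2, C5H7O4- (reactants); C3H3O3- (product);
# optional reactants H+, H2O; optional products H+, H2O, CO2.
A = np.array([
    [1,  5, -3, 0, 0,  0,  0, -1],  # C
    [2,  4, -3, 0, 1,  0, -1, -2],  # O
    [0,  7, -3, 1, 2, -1, -2,  0],  # H
    [0, -1,  1, 1, 0, -1,  0,  0],  # charge
])
b_min = [1, 1, 1, 0, 0, 0, 0, 0]  # listed reactants/products must take part

result = linprog(
    c=np.ones(A.shape[1]),               # assumed objective: smallest total stoichiometry
    A_eq=A,
    b_eq=np.zeros(A.shape[0]),           # Ab = 0: every element and charge balances
    bounds=[(lo, None) for lo in b_min],
    method="highs",
)
b = np.round(result.x, 6)
print(b)
```

For this matrix the solver returns the same b row as the slide: one CO2, one 2-acetolactate, two pyruvate and one H+ added as a product.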
Compartmentalisation
• Determination of intracellular compartment in which enzymes operate
• Two approaches:
• Extract curated information from UniProt annotation
• Extract protein sequence from UniProt and pipe to WoLF PSORT localisation prediction algorithm
• Infer reaction localisation from enzyme localisation
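A sketch of the inference in the last bullet, assuming a reaction is placed in every compartment where at least one of its enzymes is found. That rule, the identifiers and the function name are all assumptions for illustration; the talk does not state the exact rule used.

```python
def reaction_compartments(reaction_enzymes, enzyme_compartments):
    """Assign each reaction the union of its enzymes' compartments."""
    result = {}
    for reaction, enzymes in reaction_enzymes.items():
        compartments = set()
        for enzyme in enzymes:
            # Enzymes with no known localisation contribute nothing.
            compartments |= enzyme_compartments.get(enzyme, set())
        result[reaction] = compartments
    return result


# Hypothetical reactions, enzymes and localisations.
reaction_enzymes = {"R1": ["E1", "E2"], "R2": ["E3"]}
enzyme_compartments = {"E1": {"cytosol"}, "E2": {"mitochondrion"}}

localised = reaction_compartments(reaction_enzymes, enzyme_compartments)
print(localised)
```

Reactions left with an empty set, such as R2 here, would need a default compartment or manual curation.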
Biomass function
• Flux Balance Analysis requires an objective function to maximise
• Traditionally, a biomass function is specified
• Simulates cell growth
• Subliminal adds a generic biomass function
• Production of amino acids, nucleotides, lipids, ATP
• Formats model such that it can be loaded into the COBRA Toolbox
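To make the role of the objective concrete, here is a toy Flux Balance Analysis written directly as a linear programme with SciPy rather than with the COBRA Toolbox. The three-reaction network and its flux bounds are invented for this sketch.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical):
#   v0: uptake  -> A
#   v1: A -> B
#   v2: B -> biomass (leaves the system)
S = np.array([
    [1, -1,  0],  # metabolite A
    [0,  1, -1],  # metabolite B
])
bounds = [(0, 10), (0, 1000), (0, 1000)]  # uptake is limited to 10 units

# linprog minimises, so maximise the biomass flux v2 by minimising -v2.
objective = np.array([0, 0, -1])

result = linprog(
    c=objective,
    A_eq=S,
    b_eq=np.zeros(S.shape[0]),  # steady state: S v = 0
    bounds=bounds,
    method="highs",
)
biomass_flux = result.x[2]
print(biomass_flux)
```

In this toy case the biomass flux is limited only by the uptake bound; in a genome-scale model the same formulation applies with a much larger stoichiometric matrix.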
Analysis
• Goal: can biomass be generated from growth medium?
• Simulate biomass generation
• Specify a “sensible” growth medium
• Glucose, NH4+, etc.
• Only histidine had to be added to the growth medium
• Suggests good connectivity
• BUT suggests gap(s) in histidine synthesis pathways
Analysis
Components                   Subliminal        Manual
Compartments                 7                 17
Unique metabolites           1277              728
Unique enzymes               847               939
Unique metabolic reactions   1394              947
Unreachable metabolites      1281/2287 (57%)   75/758 (9.9%)
Blocked reactions            728/1687 (43%)    140/1102 (13%)
• Many more metabolites
• Better coverage? Poor merging?
• Many unreachable metabolites
• Many blocked reactions
• Network gaps?
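How blocked reactions were counted is not described in the talk. One common approach, sketched here on an invented four-reaction network, is to compute the minimum and maximum steady-state flux through each reaction and flag those that can only carry zero flux.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical):
#   v0: -> A,  v1: A -> B,  v2: B ->,  v3: A -> C
# C is produced but never consumed, so v3 cannot carry flux at steady state.
S = np.array([
    [1, -1,  0, -1],  # A
    [0,  1, -1,  0],  # B
    [0,  0,  0,  1],  # C
])
bounds = [(0, 10)] * S.shape[1]


def flux_range(j):
    """Minimum and maximum steady-state flux through reaction j."""
    c = np.zeros(S.shape[1])
    c[j] = 1.0
    zeros = np.zeros(S.shape[0])
    low = linprog(c, A_eq=S, b_eq=zeros, bounds=bounds, method="highs")
    high = linprog(-c, A_eq=S, b_eq=zeros, bounds=bounds, method="highs")
    return low.x[j], high.x[j]


blocked = [
    j for j in range(S.shape[1])
    if max(abs(v) for v in flux_range(j)) < 1e-9
]
print(blocked)
```

Here only v3 is blocked, because its product C is a dead end; in a real reconstruction such dead ends point to the network gaps discussed above.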
Future developments
• Directionality
• Use thermodynamic predictions of reaction reversibility
• Possible to automate due to extraction of chemical structures (SMILES, InChI) from ChEBI
• Editing
• Online, graphical editor (with checking) would be incredibly useful
• Difficult to render genome-scale reconstructions
• Pathway by pathway?
• WikiPathways? Payao?
Conclusion
• Many steps can be automated in generating genome-scale metabolic reconstructions
• Additional modules would be useful
• Manual curation still necessary… but…
• Subliminal Toolbox is modular
• Can be used in manual curation phase
• Back-end for graphical editors?
• Approach is better than starting from scratch
• Capable of producing reconstructions covering central carbon metabolism
Thanks…
Transporters
• Transporters are required to transport metabolites into and out of the cell
• TransportDB is a source of transporter proteins
• BUT not comprehensive enough to assign these to individual reactions
• Approach taken is a pragmatic one
• Add all transport proteins from TransportDB
• Generate transport reactions for ALL metabolites
• Map the proteins to the reactions manually
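Generating a transport reaction for every metabolite is mechanical, as the sketch below suggests. The compartment suffixes ("e" for extracellular, "c" for cytosol), the reaction string format and the function name are assumptions for illustration, not the toolbox's conventions.

```python
def transport_reactions(metabolites, inside="c", outside="e"):
    """Create one reversible transport reaction per metabolite."""
    return [f"{m}_{outside} <=> {m}_{inside}" for m in metabolites]


# Hypothetical metabolite identifiers.
reactions = transport_reactions(["glc", "nh4", "his"])
for reaction in reactions:
    print(reaction)
```

Assigning transporter proteins to these reactions remains the manual step described above.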