the subliminal toolbox: automating steps in the reconstruction of metabolic networks

Post on 18-Nov-2014


The Subliminal Toolbox: automating steps in the reconstruction of metabolic networks

Neil Swainston
Manchester Centre for Integrative Systems Biology

Integrative Bioinformatics 2011, Wageningen, Netherlands
22 March 2011

Metabolic networks

• Computational and mathematical representation of the metabolic capabilities of a given organism

• On a genome-scale
  • ~1000 unique metabolites
  • ~1000 unique reactions

• Predictive, simulatable

Metabolic networks

[Figure: schematic of a metabolic network. Labels: metabolic reactions (A + B, C + D); gene / enzyme (E1); protonation (H+); mass balancing (A + 2B + H+); transport reactions (T1) between extracellular (Aext) and intracellular; compartmentalisation (cytosol, mitochondria; Cm); biomass objective (aa, nucl, biomass).]

• Goal: generate biomass from growth medium?

How are they generated?

• Traditionally: manually

• Start with KEGG download? Genome sequence?
  • Collated / edited in spreadsheets
  • Many steps done by hand
  • Curated in focussed meetings (“jamborees”)

• Expensive
• Boring

Automation

• Many of these steps can be automated
  • Subliminal Toolbox

• Goal is to generate a metabolic reconstruction automatically
  • Manual curation still necessary
  • BUT reduce what needs to be done

• Investigation
  • Can we automate the generation of a metabolic network in yeast?

[Figure: Subliminal workflow diagram. Boxes: KEGG, MetaCyc, Draft, (De)protonate metabolites, Balance reactions, Merge, Merge pathways, Add compartmentalisation, Add transport reactions, Add biomass reaction.]

Initial draft

• Both KEGG and MetaCyc allow export of pathways / networks in SBML

• BUT these are representations of the database, NOT computational models

• Merging issue:
  • Components are named inconsistently

Naming

• Glucose, glc, D-glucose, alpha-D-glucose?

• Need to be reconciled

• Use semantic annotations
  • ChEBI terms for metabolites
  • UniProt terms for enzymes
  • Apply MIRIAM standard (RDF and URIs)
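A minimal sketch of how shared identifiers could reconcile inconsistent names, assuming each species already carries a ChEBI annotation. The records and the `group_by_annotation` helper are hypothetical; CHEBI:17634 is the D-glucose term used in the talk, and the pyruvate entry is included only for illustration.

```python
from collections import defaultdict

# Hypothetical species records from two draft models:
# (source database, local name, ChEBI identifier).
species = [
    ("KEGG", "D-Glucose", "CHEBI:17634"),
    ("MetaCyc", "glc", "CHEBI:17634"),
    ("KEGG", "Pyruvate", "CHEBI:15361"),
]

def group_by_annotation(records):
    """Group differently named species by their shared ChEBI identifier."""
    groups = defaultdict(list)
    for source, name, chebi_id in records:
        groups[chebi_id].append((source, name))
    return dict(groups)

merged = group_by_annotation(species)
for chebi_id, names in merged.items():
    print(chebi_id, names)
```

Species that end up in the same group are candidates for merging, whatever they were called in the source database.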

MIRIAM annotation

<species metaid="_glc" id="glc" name="D-Glucose">
</species>

MIRIAM annotation

<species metaid="_glc" id="glc" name="D-Glucose">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#_glc">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI:17634"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
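The annotation above can be read back with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`, not the toolbox's own code. The snippet is the species element from the slide with the `rdf` and `bqbiol` namespace declarations added, which a complete SBML file would carry; `miriam_resources` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"

# Species element from the slide, with namespace declarations added (assumption:
# in a full SBML document these are declared on an enclosing element).
sbml_species = f"""<species xmlns:rdf="{RDF}" xmlns:bqbiol="{BQBIOL}"
         metaid="_glc" id="glc" name="D-Glucose">
  <annotation>
    <rdf:RDF>
      <rdf:Description rdf:about="#_glc">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI:17634"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>"""

def miriam_resources(species_xml):
    """Return every rdf:resource URI found in the species annotation."""
    root = ET.fromstring(species_xml)
    return [li.get(f"{{{RDF}}}resource") for li in root.iter(f"{{{RDF}}}li")]

resources = miriam_resources(sbml_species)
# Keep only ChEBI terms and strip the MIRIAM URN prefix.
chebi_ids = [r.rsplit("obo.chebi:", 1)[1] for r in resources if "obo.chebi:" in r]
print(chebi_ids)
```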

Exploiting model annotations

• MIRIAM annotations provide unambiguous, unique identifiers for model components

• But they also provide links to cheminformatics / bioinformatics resources via web services
  • Models become “live” and increase in utility as resources develop
  • Kinetic parameters accessible from ChEBI
  • Improving annotation in UniProt (phospho sites, etc.)
  • Extract data through web services

[Figure: libAnnotationSBML linking MIRIAM / RDF annotation to UniProt, ChEBI and KEGG (molecular formula, protein sequence, etc.)]

Merging

• Standard identifiers: job done?

• Inconsistent charge states
  • Pyruvic acid and pyruvate

Charge state determination

• Annotated ChEBI terms provide web service access to structural data
  • InChI, SMILES strings
  • InChI=1/C3H4O3/c1-2(4)3(5)6/h1H3,(H,5,6)/p-1/fC3H3O3/q-1
  • CC(=O)C([O-])=O

• Cheminformatics software (ChemAxon MARVIN) can be used to predict charge state at given pH
  • Consistency
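To illustrate why a single reference pH gives a consistent charge state, here is a minimal sketch using the Henderson–Hasselbalch relation for a monoprotic acid. This is a textbook stand-in, not the prediction method used by MARVIN, and the pKa of about 2.5 for pyruvic acid is an approximate value assumed for the example.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a monoprotic acid present as its conjugate base."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

def dominant_charge(pka, ph, charge_protonated=0):
    """Charge of the dominant species of a monoprotic acid at the given pH."""
    if fraction_deprotonated(pka, ph) > 0.5:
        return charge_protonated - 1
    return charge_protonated

PYRUVIC_ACID_PKA = 2.5  # approximate value, assumed for illustration

for ph in (1.0, 7.0):
    print(ph, dominant_charge(PYRUVIC_ACID_PKA, ph))
```

At a physiological pH the deprotonated form (pyruvate, charge -1) dominates, so pyruvic acid and pyruvate from different sources can be mapped to the same charged species.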

Stereochemistry

• KEGG and MetaCyc are inconsistent in their definition of stereochemical precision

• Considered different: apparently minor but can cause gaps in the network

[Figure: beta-D-glucose and D-glucose]

Stereochemistry-induced gaps

[Figure: network diagram with nodes X and Y]

ChEBI ontology

• ChEBI is an ontology and contains relationships between metabolites
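A minimal sketch of how ontology relationships could be used to spot metabolites that differ only in stereochemical precision. The `IS_A` table is hand-written and hypothetical; in practice the relationships would be retrieved from ChEBI, where a term may also have more than one parent.

```python
# Hypothetical "is a" links (child -> parent), simplified to one parent per term.
IS_A = {
    "beta-D-glucose": "D-glucose",
    "alpha-D-glucose": "D-glucose",
    "D-glucose": "glucose",
}

def ancestors(term, is_a=IS_A):
    """Return the chain of increasingly generic terms above `term`."""
    chain = []
    while term in is_a:
        term = is_a[term]
        chain.append(term)
    return chain

def related(a, b):
    """True if the terms are identical or one is a more generic form of the other."""
    return a == b or b in ancestors(a) or a in ancestors(b)

print(related("beta-D-glucose", "D-glucose"))        # specific vs. generic form
print(related("beta-D-glucose", "alpha-D-glucose"))  # two distinct specific forms
```

A pair flagged by `related` is a candidate for merging, which can close a gap between a reaction producing the specific form and one consuming the generic form.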

Stereochemistry-induced gaps

[Figure: network diagram with nodes X and Y, repeated over three build steps]

Reaction balancing

• Reaction elemental and charge balancing
  • Prevents mass violation and inconsistencies arising from “magical” production or disappearance of matter

• KEGG and MetaCyc reactions don’t always balance
  • Incorrect stoichiometry
  • Missing protons, water, etc.

• Solution: use linear programming
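Before a reaction can be balanced it has to be recognised as unbalanced. The sketch below is one simple way to do that check from molecular formulae and charges; the `parse_formula` and `imbalance` helpers are hypothetical names, and the parser only handles flat formulae without brackets. The example uses the acetolactate reaction from the talk.

```python
import re

def parse_formula(formula):
    """Count atoms in a flat formula such as 'C5H7O4' (no brackets supported)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def imbalance(reactants, products):
    """Return the non-zero element and charge differences (reactants minus products).

    Each side is a list of (stoichiometry, formula, charge) tuples.
    An empty result means the reaction is balanced.
    """
    totals = {}
    for sign, side in ((1, reactants), (-1, products)):
        for stoich, formula, charge in side:
            for element, n in parse_formula(formula).items():
                totals[element] = totals.get(element, 0) + sign * stoich * n
            totals["charge"] = totals.get("charge", 0) + sign * stoich * charge
    return {key: value for key, value in totals.items() if value != 0}

lhs = [(1, "CO2", 0), (1, "C5H7O4", -1)]
unbalanced = imbalance(lhs, [(1, "C3H3O3", -1)])
balanced = imbalance(lhs, [(2, "C3H3O3", -1), (1, "H", 1)])
print(unbalanced, balanced)
```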

Reaction balancing

carbon dioxide + 2-Acetolactate → Pyruvate

Ab = 0

A =

          Reactants         Products    Optional reactants    Optional products
          CO2     C5H7O4    C3H3O3      H+      H2O           H+      H2O     CO2
C         1       5         -3          0       0             0       0       -1
O         2       4         -3          0       1             0       -1      -2
H         0       7         -3          1       2             -1      -2      0
charge    0       -1        1           1       0             -1      0       0

bmin      1       1         1           0       0             0       0       0

b represents a vector of stoichiometries

CO2 + C5H7O4- → C3H3O3-

Reaction balancing

• Linear programming solver solves Ab = 0

          Reactants         Products    Optional reactants    Optional products
          CO2     C5H7O4    C3H3O3      H+      H2O           H+      H2O     CO2
C         1       5         -3          0       0             0       0       -1
O         2       4         -3          0       1             0       -1      -2
H         0       7         -3          1       2             -1      -2      0
charge    0       -1        1           1       0             -1      0       0

bmin      1       1         1           0       0             0       0       0
b         1       1         2           0       0             1       0       0

carbon dioxide + 2-Acetolactate → 2 Pyruvate + H+

CO2 + C5H7O4- → 2 C3H3O3- + H+
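The worked example can be reproduced in a few lines. The sketch below replaces the linear programming solver with an exhaustive search over small integer stoichiometries, which is only practical for a toy problem of this size. The objective (smallest total stoichiometry) and the search bound `MAX_STOICH` are assumptions; the talk does not state the objective used.

```python
from itertools import product

# Columns: CO2, C5H7O4-, C3H3O3-, then optional H+ and H2O as reactants,
# then optional H+, H2O and CO2 as products (product columns are negated).
A = [
    [1, 5, -3, 0, 0, 0, 0, -1],   # C
    [2, 4, -3, 0, 1, 0, -1, -2],  # O
    [0, 7, -3, 1, 2, -1, -2, 0],  # H
    [0, -1, 1, 1, 0, -1, 0, 0],   # charge
]
B_MIN = [1, 1, 1, 0, 0, 0, 0, 0]  # original participants must be present
MAX_STOICH = 3                    # assumed upper bound for the search

def balance(a, b_min, max_stoich=MAX_STOICH):
    """Find integer b >= b_min with A b = 0 and the smallest total stoichiometry."""
    best = None
    ranges = [range(low, max_stoich + 1) for low in b_min]
    for b in product(*ranges):
        if all(sum(coef * x for coef, x in zip(row, b)) == 0 for row in a):
            if best is None or (sum(b), b) < (sum(best), best):
                best = b
    return best

b = balance(A, B_MIN)
print(b)
```

The result matches the b row in the table: the pyruvate stoichiometry becomes 2 and one H+ is added as a product.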

Compartmentalisation

• Determination of intracellular compartment in which enzymes operate

• Two approaches:
  • Extract curated information from UniProt annotation
  • Extract protein sequence from UniProt and pipe to WoLF PSORT localisation prediction algorithm

• Infer reaction localisation from enzyme localisation
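A minimal sketch of the inference step, assuming enzyme localisations have already been obtained by one of the two approaches. The enzyme and reaction identifiers are hypothetical, and placing reactions without a localised enzyme in the cytosol is an assumption made for the example.

```python
# Hypothetical inputs: enzyme -> compartments, reaction -> catalysing enzymes.
ENZYME_COMPARTMENTS = {
    "E1": {"cytosol"},
    "E2": {"mitochondrion"},
    "E3": {"cytosol", "mitochondrion"},
}
REACTION_ENZYMES = {"R1": ["E1"], "R2": ["E2", "E3"], "R3": []}

def reaction_compartments(reaction_enzymes, enzyme_compartments, default="cytosol"):
    """Assign each reaction to every compartment in which one of its enzymes is found."""
    result = {}
    for reaction, enzymes in reaction_enzymes.items():
        found = set()
        for enzyme in enzymes:
            found |= enzyme_compartments.get(enzyme, set())
        # Assumption: reactions without a localised enzyme default to the cytosol.
        result[reaction] = found or {default}
    return result

localisation = reaction_compartments(REACTION_ENZYMES, ENZYME_COMPARTMENTS)
print(localisation)
```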

Biomass function

• Flux Balance Analysis requires an objective function to maximise
  • Traditionally, a biomass function is specified
  • Simulates cell growth

• Subliminal adds a generic biomass function
  • Production of amino acids, nucleotides, lipids, ATP

• Formats model such that it can be loaded into the COBRA Toolbox
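For reference, the optimisation that Flux Balance Analysis solves can be written in its standard textbook form. The symbols are not from the talk: S is the stoichiometric matrix, v the vector of reaction fluxes, c a vector selecting the biomass reaction, and v_min, v_max the flux bounds (including uptake limits set by the growth medium).

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

The constraint S v = 0 is the steady-state assumption: every internal metabolite is produced as fast as it is consumed.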


Analysis

• Goal: can biomass be generated from growth medium?

• Simulate biomass generation

• Specify a “sensible” growth medium
  • Glucose, NH4+, etc.
  • Only histidine had to be added to the growth medium
  • Suggests good connectivity
  • BUT suggests gap(s) in histidine synthesis pathways

Analysis

Components                    Subliminal         Manual
Compartments                  7                  17
Unique metabolites            1277               728
Unique enzymes                847                939
Unique metabolic reactions    1394               947
Unreachable metabolites       1281/2287 (57%)    75/758 (9.9%)
Blocked reactions             728/1687 (43%)     140/1102 (13%)

• Many more metabolites
  • Better coverage? Poor merging?

• Many unreachable metabolites
• Many blocked reactions
  • Network gaps?
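One simple way to find unreachable metabolites and blocked reactions is a topological expansion from the growth medium: a reaction fires once all of its substrates are available, and its products then become available. This is a sketch of that idea on a hypothetical toy network, not necessarily the method used to produce the figures above, and it treats every reaction as irreversible.

```python
def expand(reactions, seeds):
    """Grow the set of available metabolites from the seed (medium) metabolites.

    reactions: dict of reaction id -> (substrates, products), all irreversible.
    Returns the reachable metabolites and the reactions that could fire.
    """
    available = set(seeds)
    fired = set()
    changed = True
    while changed:
        changed = False
        for rid, (substrates, products) in reactions.items():
            if rid not in fired and set(substrates) <= available:
                fired.add(rid)
                available |= set(products)
                changed = True
    return available, fired

# Hypothetical toy network: nothing produces "x", so histidine cannot be made.
TOY = {
    "R1": (["glc"], ["g6p"]),
    "R2": (["g6p"], ["pyr"]),
    "R3": (["x"], ["his"]),
}
available, fired = expand(TOY, {"glc"})
all_metabolites = {m for subs, prods in TOY.values() for m in subs + prods}
unreachable = all_metabolites - available
blocked = set(TOY) - fired
print(sorted(unreachable), sorted(blocked))
```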

Future developments

• Directionality
  • Use thermodynamic predictions of reaction reversibility
  • Possible to automate due to extraction of chemical structures (SMILES, InChI) from ChEBI

• Editing
  • Online, graphical editor (with checking) would be incredibly useful
  • Difficult to render genome-scale reconstructions
  • Pathway by pathway?
  • WikiPathways? Payao?

Conclusion

• Many steps can be automated in generating genome-scale metabolic reconstructions

• Additional modules would be useful

• Manual curation still necessary… but…
  • Subliminal Toolbox is modular
    • Can be used in manual curation phase
    • Back-end for graphical editors?

• Approach is better than starting from scratch
  • Capable of producing reconstructions covering central carbon metabolism

Thanks…


Transporters

• Transporters are required to transport metabolites into and out of the cell

• TransportDB is a source of transporter proteins
  • BUT not comprehensive enough to assign these to individual reactions

• Approach taken is a pragmatic one
  • Add all transport proteins from TransportDB
  • Generate transport reactions for ALL metabolites
  • Map the proteins to the reactions manually
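The "transport reactions for ALL metabolites" step could look roughly like the sketch below. The compartment suffixes, identifier scheme and dictionary layout are all hypothetical; the empty `proteins` list reflects the fact that proteins are mapped to reactions manually afterwards.

```python
def transport_reactions(metabolites, outside="e", inside="c"):
    """Create one reversible transport reaction per metabolite (extracellular <-> cytosol)."""
    reactions = []
    for met in metabolites:
        reactions.append({
            "id": f"T_{met}",
            "reactants": {f"{met}_{outside}": 1},
            "products": {f"{met}_{inside}": 1},
            "reversible": True,
            "proteins": [],  # filled in manually from TransportDB candidates
        })
    return reactions

transports = transport_reactions(["glc", "his"])
for reaction in transports:
    print(reaction["id"], reaction["reactants"], reaction["products"])
```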
