an expanded genomescale model of escherichia coli

Upload: peeyush-kumar

Post on 04-Jun-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    1/12

    Genome Biology 2003, 4:R54

    Open Access2003Reedet al. Volume 4, Issue 9, Article R54Research

    An expanded genome-scale model of Escherichia coli K-12 ( i JR904GSM/GPR)

    Jennifer L Reed*

    , Thuy D Vo*

    , Christophe H Schilling

    andBernhard O Palsson *

    Addresses: *Department of Bioengineering, University of California, San Diego, Gilman Drive, La Jolla, CA 92092, USA. Genomatica, Inc.,Morehouse Drive, San Diego, CA 92121, USA.

    Correspondence: Bernhard O Palsson. E-mail: [email protected]

    2003 Reed et al .; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all mediafor any purpose, provided this notice is preserved along with the article's original URL. An expanded genome-scale model of Escherichia coli K-12 (i JR904 GSM/GPR)Diverse datasets, including genomic, transcriptomic, proteomic and metabolomic data, are becoming readily available for specific organ-isms. There is currently a need to integrate these datasets within an in silico modeling framework. Constraint-based models of Escherichiacoli K-12 MG1655 have been developed and used to study the bacteriums metabolism and phenotypic behavior. The most comprehensiveE. coli model to date (E. coli iJE660a GSM) accounts for 660 genes and includes 627 unique biochemical reactions.

    Abstract

    Background: Diverse datasets, including genomic, transcriptomic, proteomic and metabolomicdata, are becoming readily available for specific organisms. There is currently a need to integratethese datasets within an in silicomodeling framework. Constraint-based models of Escherichia coli K-12 MG1655 have been developed and used to study the bacterium's metabolism and phenotypicbehavior. The most comprehensive E. colimodel to date ( E. coli i JE660a GSM) accounts for 660genes and includes 627 unique biochemical reactions.

    Results: An expanded genome-scale metabolic model of E. coli (i JR904 GSM/GPR) has beenreconstructed which includes 904 genes and 931 unique biochemical reactions. The reactions inthe expanded model are both elementally and charge balanced. Network gap analysis led toputative assignments for 55 open reading frames (ORFs). Gene to protein to reaction associations(GPR) are now directly included in the model. Comparisons between predictions made by i JR904and i JE660a models show that they are generally similar but differ under certain circumstances.Analysis of genome-scale proton balancing shows how the flux of protons into and out of themedium is important for maximizing cellular growth.

    Conclusions: E. coli i JR904 has improved capabilities over i JE660a. i JR904 is a more complete andchemically accurate description of E. colimetabolism than i JE660a. Perhaps most importantly, i JR904can be used for analyzing and integrating the diverse datasets. i JR904 will help to outline thegenotype-phenotype relationship for E. coliK-12, as it can account for genomic, transcriptomic,proteomic and fluxomic data simultaneously.

    Background Escherichia coli is perhaps the best characterized and studied bacterium and is of interest industrially, genetically and path-ologically. For these reasons, in silico modeling efforts have been made to describe and predict its cellular behavior. Withthe vast amounts of '-omics' data that are being generated,there is a growing need for incorporating and reconciling

    heterogeneous datasets, including genomic, transcriptomic,proteomic and metabolomic data [ 1]. A constraint-basedmodel of E. coli metabolism can accomplish this and serve asa model centric database.

    In addition to providing the context for various '-omics' datatypes, constraint-based models provide a framework to

    Published: 28 August 2003

    Genome Biology 2003, 4:R54

    Received: 9 May 2003Revised: 11 July 2003Accepted: 18 July 2003

    The electronic version of this article is the complete one and can befound online at http://genomebiology.com/2003/4/9/R54

    http://www.biomedcentral.com/info/about/charter/http://genomebiology.com/2003/4/9/R54http://genomebiology.com/2003/4/9/R54http://www.biomedcentral.com/info/about/charter/http://genomebiology.com/2003/4/9/R54
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    2/12

    R54.2 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. http://genomebiology.com/2003/4/9/R54

    Genome Biology 2003, 4:R54

    compute cellular functions [ 2]. This modeling method findsthe limits of cellular, biochemical and systemic functions,thereby identifying all allowable solutions. Searches withinthe allowable solution space can identify solutions of interest,for example a solution that maximizes a particular objective.This approach to genome-scale model building has beenreviewed in detail [ 3-6]. In general, the application of succes-

    sive constraints (stoichiometric, thermodynamic and enzymecapacity constraints), with respect to the metabolic network,restricts the number of possible solutions. Linear optimiza-tion is often used to find a particular solution in the allowablesolution space that maximizes a chosen objective function,such as cellular growth (Figure 1). A more detailed descrip-tion of the constraint-based modeling approach can be foundin Materials and methods.

    The constraint-based modeling approach has been used tostudy E. coli metabolism for over ten years; the history of suchmodel building efforts has recently been reviewed [ 7]. Thefirst genome-scale metabolic (GSM) model accounting for

    660 gene products ( i JE660 GSM) was reconstructed usinggenomic information, biochemical data and physiologicaldata [ 8 ]. This genome-scale model has been used to performin silico gene deletion studies [ 8 ] and to predict both optimalgrowth behavior [ 9] and the outcome of adaptive evolution[10].

    This paper reports an expansion of i JE660a GSM, which itself is a slight modification of the original genome-scale meta- bolic model ( i JE660 GSM) [ 8]. Gene to protein to reaction(GPR) associations are included directly in the new model(i JR904 GSM/GPR). These associations describe the

    dependence of reactions on proteins and proteins on genes(Figure 2). The metabolic network described by i JR904 hasalso changed; individual reactions are now elementally andcharge balanced, and a significant number of new genes andnovel reactions have been added to the model. i JR904 GSM/GPR accounts for over 904 genes and the 931 unique bio-chemical reactions the encoded proteins carry out. This paper

    discusses the effects that these additional reactions have onthe predictive capabilities of the model and identifies putativeORFs in the genome which could resolve gaps in the meta- bolic network.

    Since computational models of E. coli will continue to grow insize and scope [ 7] it will become important to be able to dis-tinguish between the different models - a naming convention will aid in this effort. The naming convention we chose to usemirrors the one already established for plasmids. The generalform of the names of in silico strains used is i XXxxxa YYY.The ' i' in the name refers to an in silico model (that is, a com-puter model). This ' i ' is followed by the initials (XX) of the

    person who developed the model and then the number of genes (xxx) included in the model. Any letters (a) after thenumber of genes indicates that slight modifications weremade to the model, for instance i JE660a is derived fromi JE660. Further designation of the content and scope of amodel are found in YYY; here the acronyms GSM and GPR stand for genome-scale model and gene-protein-reactionassociations, respectively. The contents of i JE660a andi JR904 can be found on our website [ 11], and i JR904 is alsodetailed in the additional data files provided with this publi-cation online.

    Principles of constraint-based modelingFigure 1Principles of constraint-based modeling. A three-dimensional flux space for a given metabolic network is depicted here. Without any constraints the fluxescan take on any real value. After application of stoichiometric, thermodynamic and enzyme capacity constraints, the possible solutions are confined to a

    region in the total flux space, termed the allowable solution space. Any point outside of this space violates one or more of the applied constraints. Linearoptimization can then be applied to identify a solution in the allowable solution space that maximizes or minimizes a defined objective, for example ATP orbiomass production [ 3-6].

    Constraints

    1) S .v = 02) v i

    Optimization

    Unconstrainedsolution space

    Optimal solution

    maximize objective

    Allowablesolution space

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    3/12

    http://genomebiology.com/2003/4/9/R54 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. R54.

    Genome Biology 2003, 4:R54

    Results and discussionProperties of the i JR904 metabolic network An update on the annotation of the E. coli K-12 genome waspublished in 2001 [ 12], facilitating the process of updating thegenome-scale in silico E. coli model ( i JE660a GSM). Genesencoding known or putative enzymes and transporters notincluded in the i JE660a model were further examined. Liter-ature and database searches (LIGAND [ 13], EcoCyc [ 14] andTC-DB [ 15]) on each of the genes provided the biochemicalinformation needed to expand E. coli i JE660a. The i JR904model was built using the software SimPheny (Genomatica,San Diego, CA) and accounts for 904 genes with a knownlocus in the genome, as compared to 660 genes in the previ-ous model.

    The metabolic network described by E. coli i JE660a hasexpanded in size from 627 unique reactions and 438 metabo-lites to 931 unique reactions and 625 metabolites in i JR904.Complete maps containing all the reactions in the metabolicnetwork are available in the additional data files and can also be downloaded from [ 11]. The molecular formulae andcharges for the metabolites in the model were determinedassuming a pH of 7.2. Fifty-eight of the reactions in i JR904currently do not have associated genes. A complete list of thereactions can be found in the additional data files. Putativefunctional gene assignments account for 23 of the added reac-tions, with the majority of these being putative transporters.

    In addition to these new reactions, old reactions wereupdated to be both elementally and charge balanced by including water and protons as participants in the reactions.Six reactions in i JR904 are elementally balanced but notcharge balanced (Table 1). Five out of the six reactions areimbalanced because they have not been fully characterized biochemically so assumptions had to be made about the par-ticipating metabolites; the remaining reaction (abbreviated ADOCBLS) is charge imbalanced since we were unable to

    GPR AssociationsLevel

    Succinate dehydrogenase

    D-Xylose ABC transporter

    Glyceraldehyde 3-phosphate dehydrogenase

    Gene

    Peptide

    Protein

    Reaction

    Gene

    Peptide

    Protein

    Reaction

    Gene

    Peptide

    Protein

    Reaction

    b0721

    b3567 b3566 b3568

    xylG xylF xylH

    b1779 b1416 b1417

    gapA gapC_2 gapC_1

    XylG

    GapA GapC

    XylF XylH

    b0722 b0723 b0724

    sdhC sdhD

    SUCD1i

    XYLabc

    GAPD

    SUCD4

    sdhA

    Sdh

    sdhB

    &

    &

    &

    Representation of gene to protein to reaction (GPR) associationsFigure 2Representation of gene to protein to reaction (GPR) associations. Eachgene included in the model is associated with at least one reaction.Examples of different types of associations are shown, where the top layeris the gene locus, the second layer is the translated peptide, the third layeris the functional protein and the bottom layer is the reaction (shown as itscorresponding abbreviation listed in the additional data file). Subunits (forexample, sdhABCD and gapC_1C_2) and enzyme complexes (forexample, xylFGH) are connected to reactions with '&' associations,indicating that all have to be expressed for the reaction to occur. ForsdhABCD, the '&' is shown above the functional protein level, denotingthat all of these gene products are needed for the functional enzyme. WithxylFGH the '&' association is shown above the reaction level, indicatingthat the different proteins form a complex that carries out the reaction.Isozymes (for example, gapC_1C_2 and gapA) are independent proteinswhich carry out identical reactions where only one of the isozymes needsto be present for the reaction to occur. Isozymes are shown as two ormore arrows leaving different proteins but impinging on the samereaction.

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    4/12

    R54.4 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. http://genomebiology.com/2003/4/9/R54

    Genome Biology 2003, 4:R54

    determine the charge associated with the ion complex. The biomass reaction [ 8], representing the drain of biosynthetic

    constituents from the network and the growth-associated ATP requirement, was also changed to include internal pro-tons and water. The amount of water needed in the biomassreaction is equal to the amount of ATP hydrolyzed to meet thegrowth-associated ATP requirement. The hydrolysis of ATPresults in the production of a proton while the utilization of NADPH and NADH consumes a proton; this results in the netproduction of protons in the biomass reaction.

    Other updates to the i JE660a reaction network are also nota- ble. A number of reactions in i JE660a could not previously beassigned to an ORF, but are now assigned to an ORF as aresult of the updated genome annotation [ 12], such as mtn

    and uppS . Other ORF names have changed and these werealso updated. Some of the original model reactions were mod-ified in addition to including internal protons and water.These modifications mainly included changes in the stoichio-metric coefficients, cofactor usage and reaction reversibility.In some cases, the metabolites that participate in the reaction were changed. Forty-two reactions were removed fromi JE660a, and these are listed in the additional data files along with the reasons for their removal.

    i JR904 also accounts for the specificity of the quinones in theindividual reactions involved in the electron transport chain(Figure 3). E. coli K-12 uses three quinones: ubiquinone (Q),

    menaquinone (MK) and demethylmenaquinone (DMK) totransfer electrons from the electron donor to the terminalacceptor; in i JE660a there was no distinction between thethree quinones and they were all treated as ubiquinone, which led to inaccurate electron donor/electron acceptorpairs.

    GPR associations are for the first time directly included in thei JR904 model, and examples of some are shown in Figure 2.GPR associations have been constructed and their images can be found in the additional data files. These GPR associationscan be used to evaluate the reactions remaining in the

    metabolic network after deletion of a specific gene. Includingthese associations directly into i JR904 will lead to a moreaccurate assessment of the effects of gene deletions. In addi-tion, these GPR associations are necessary for analyzingdiverse datasets by the model and using these datasets to fur-ther identify physiological states.

    Thus, i JR904 accounts more accurately for a number of the

    metabolic processes in E. coli K-12 MG1655, and expands inscope significantly through the addition of GPR associations. E. coli i JR904 GSM/GPR is no longer a purely metabolicmodel.

    Systemic properties A list of network components alone does not explain how thecomponents work together to produce a biological function.These systemic properties can only be investigated when con-sidering all the components simultaneously; it is the interac-tion of these components that provides the most informationabout cellular behavior. As with i JE660a, a myriad of other

    Table 1

    List of charge imbalanced reactions used in i JR904

    Abbreviation Reaction

    ADOCBLS agdpcbi + rdmbzi adocbl + gmp + hAMPMS air + h2o 4ampm + (2) for + (4) hBTS2 cys-L + dtbt ala-L + btn + (2) hDHNAOT dhna + octdp 2dmmq8 + co2 + h + ppiDKMPPD2 dkmpp + (3) h2o 2kmb + for + (6) h + piMECDPDH 2mecdp + h h2mb4p + h2o

    All of these reactions are elementally balanced but not charge balanced(see Additional data file for definition of metabolite abbreviations).

    Quinone specificity for electron donors and terminal acceptorsFigure 3Quinone specificity for electron donors and terminal acceptors. E. colii JR904 also accounts for the quinone specificity of different reactions.Electron donors are listed on the left, terminal electron acceptors are onthe right, and the three types of quinones (DMK, MK, Q; note that theseabbreviations differ from those listed in the additional data files) that serveas carriers are shown in the middle. The electron donors shown in redwere able to donate their electrons to fumarate in i JE660a, but are nowunable to do so. DMK, demethylmenaquinone; DMSO, dimethyl sulfoxide;

    MK, menaquinone; Q, ubiquinone; TMAO, trimethylamine N-oxide.

    O2

    NO 3NO 2

    FumarateDMSOTMAO

    SuccinatePyruvateD-Lactate

    D-AlanineL-Proline2-Octaprenyl-3-methyl-5-hydroxy- 6-methoxy-1,4-benzoquinol

    H2NADH

    Glycerol-3- phosphate

    MK

    DMK

    Q

    Electrondonors

    Electronacceptors

    Quinones

    S-DihydrooroateL-Aspartate

    FormateL-Lactate

    http://-/?-http://-/?-
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    5/12

    http://genomebiology.com/2003/4/9/R54 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. R54.

    Genome Biology 2003, 4:R54

    issues can be addressed with i JR904. We will address threetypes of issues in this paper using i JR904: gap analysis andputative ORF assignments, the importance of global proton balancing, and phase plane analysis [ 16].

    Identification and resolution of dead ends A 'dead end' exists in a metabolic network if a metabolite iseither only produced or only consumed in the network. If ametabolic network contains a gap, it is missing the biochemi-cal reactions that can produce or consume the dead endmetabolites. i JR904 has 70 dead end metabolites or gaps inthe network; these are listed in the additional data files. These70 metabolites participate in 89 reactions, indicating that atleast 89 model reactions can never be used if the network is tooperate at steady state. The reactions that lead up to the deadends in i JR904 are included so that when the gaps are filledin at a future date, the network will be fully functional. Someof these network gaps could be reconciled by the addition of

    transporters, while others could be reconciled by modifyingthe growth function, that is, including dead end metabolitesas requirements for biomass production. Neither of thesesteps was taken here since genomic or biochemical evidencecould not be found to support their inclusion.

    Using these gaps, we attempted to identify new functionalassignments based on sequence homology searches. A list of EC numbers corresponding to enzymes (not included in themodel) that could resolve the gaps, was generated. This list was pooled together with the enzymes known to occur in E.coli which lack assigned loci (EcoCyc [ 14]). We subsequently collected amino acid sequences from other organisms

    (orthologous sequences) assigned to these enzymatic func-tions for a homology search study.

    A total of 83 training sets, each with an average of 11 ortholo-gous sequences, was collected and compared against the E.coli genome. Each training set includes multiple orthologoussequences that correspond to an enzyme of interest. Of the 83training sets, 61 are for enzymes which connect to gaps in theexisting network and the remaining 22 are for some of theenzymes listed in EcoCyc. Each training set was processedusing the alignment programs MEME [ 17] and ClustalW [ 18]to generate a profile for the corresponding enzyme. Usingthese profiles MAST [ 19] and HMMER [ 20 ] were used to

    identify similar ORFs in the E. coli genome. Of the 61 enzymesthat could resolve network gaps, we assigned putative loci for12 of them. In addition, putative loci could be found for 15 of the 22 enzymes listed in EcoCyc. Some of these enzymes havemultiple matches within the E. coli genome, which togetherresulted in 55 putative assignments.

    These results were inspected manually and found to be rele- vant and consistent across the three search methods (MAST, with and without end-gap penalty, and HMMER). The resultsof this study largely coincided with the annotation performed by Serres et al. [12]; however, most annotation updates added

    more specificity to the type of reactions the enzymes werepredicted to catalyze and, in some cases, suggested additionalsubstrates that known enzymes might act upon (Table 2).Table 2 lists the top three matches for each enzyme (except forthe case when the match is a known isozyme); a complete list

    of all results and expected values (e-values) can be found inthe additional data files. The putative assignments presentedin Table 2 should be used with care. For example, there aremultiple putative assignments for the acyl-CoA dehydroge-nase enzyme for which the gene has recently been found [ 21], but the actual gene locus for this enzyme (b0221) does nothave the most significant e-value among the list of potentialloci.

    Effects of constraining proton exchange flux The importance of balancing protons internally can now beinvestigated with i JR904 since the metabolic network accounts for all the protons being generated and consumed by

    the individual metabolic and transport reactions (only exter-nal protons associated with the proton motive force wereaccounted for in i JE660a). The medium can serve as a pool both supplying and dissipating external protons as needed by the cell. During growth on some carbon sources, the genera-tion of internal protons by the metabolic reactions is relieved by secreting protons into the medium. This subsequently reduces the amount of ATP made by ATPase since theseprotons could be used to drive this reaction. Under other con-ditions, a shortage of internal protons is compensated for by taking up protons from the medium and transporting themacross the cell membrane into the cytosol.

    The effects that the exchange of protons across the system boundary have on predicted growth rates were investigated. A robustness analysis [ 22 ], in which the flux through the protonexchange reaction was constrained from its optimal valuedown to zero, was performed under aerobic conditions for a variety of carbon sources (Figure 4). The predicted growthrates for different carbon sources respond differently as theexchange flux of protons (between the cell and the medium)is reduced to zero; glucose and glycerol were the most sensi-tive to proton exchange while D- and L-lactate were the leastsensitive.

    When either glucose or glycerol was used as the carbon

    source, excess protons were generated intracellularly; thisexcess was relieved by secreting extracellular protons,thereby lowering the pH of the medium. For pyruvate, D- andL-lactate, acetate, -ketoglutarate ( KG), succinate andmalate there is a shortage of internal protons; as a result, cells would uptake protons from the medium thereby raising thepH. Since there can be no net accumulation of charge withinthe system, the total charge entering the system must equalthe charge leaving the system. Pyruvate, lactate, acetate, -ketoglutarate, succinate and malate, as used in these simula-tions, have a negative charge so H + must be taken up;however, if the uncharged acidic form of these compounds

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    6/12

    R54.6 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. http://genomebiology.com/2003/4/9/R54

    Genome Biology 2003, 4:R54

    Table 2

    List of ORFs in E. coli with updated annotations based on sequence similarity

    EC number Bnum Gene Published annotation [ 12] Suggested annotation

    1.1.1.48 b1315 ycjS Putative NADH-dependent dehydrogenase D-Galactose 1-dehydrogenase1.1.1.48 b1624 ydgJ Putative NAD(P)-binding dehydrogenase D-Galactose 1-dehydrogenase1.1.1.48 b3440 yhhX Putative NAD(P)-binding dehydrogenase D-Galactose 1-dehydrogenase1.1.1.5 b2426 ucpA Putative oxidoreductase, NAD(P)-binding Acetoin dehydrogenase, diacetyl reductase1.1.1.5 b2137 yohF Putative oxidoreductase Acetoin dehydrogenase, diacetyl reductase1.1.1.5 b4266 idnO 5-Keto-D-gluconate-5-reductase Acetoin dehydrogenase, diacetyl reductase1.2.1.19 b1385 feaB Phenylacetaldehyde dehydrogenase Aminobutyraldehyde dehydrogenase1.2.1.19 b1444 ydcW Putative aldehyde dehydrogenase Aminobutyraldehyde dehydrogenase1.2.1.19 b3588 aldB Aldehyde dehydrogenase B (lactaldehyde dehydrogenase) Aminobutyraldehyde dehydrogenase1.2.1.24 b1385 feaB Phenylacetaldehyde dehydrogenase Succinate-semialdehyde dehydrogenase1.2.1.24 b1444 ydcW Putative aldehyde dehydrogenase Succinate-semialdehyde dehydrogenase1.2.1.24 b1415 aldA Aldehyde dehydrogenase A, NAD-linked Succinate-semialdehyde dehydrogenase1.2.7.1 b1378 nifJ Putative pyruvate-flavodoxin oxidoreductase Pyruvate synthase1.3.1.2 b2146 yeiT Putative glutamate synthase Dihydrothymine dehydrogenase1.3.1.2 b2147 yeiA Putative dihydropyrimidine dehydrogenase, FMN-

    linkedDihydrothymine dehydrogenase

    1.3.1.2 b2878 b2878 Putative oxidoreductase Dihydrothymine dehydrogenase1.3.99.3 b1695 ydiO Putative acyl-CoA dehydrogenase Acyl-CoA dehydrogenase1.3.99.3 b0039 caiA Putative acyl-CoA dehydrogenase, carnitine metabolism Acyl-CoA dehydrogenase1.3.99.3 b4187 aidB Putative acyl-CoA dehydrogenase; adaptive response

    (transcription activated by Ada)Acyl-CoA dehydrogenase

    1.6.6.9 b3551 bisC Biotin sulfoxide reductase Trimethylamine-N-oxide reductase (TMAOreductase II)

    1.6.6.9 b1587 b1587 Putative reductase Trimethylamine-N-oxide reductase (TMAOreductase II)

    1.6.6.9 b1588 b1588 Putative reductase Trimethylamine-N-oxide reductase (TMAOreductase II)

    1.6.99.2 b0046 yabF Putative electron transfer flavoprotein-NAD/FAD/quinoneoxidoreductase, subunit for KefC K+ efflux system

    NAD(P)H dehydrogenase

    1.6.99.2 b3351 yheR Putative electron transfer flavoprotein-NAD/FAD/quinoneoxidoreductase

    NAD(P)H dehydrogenase

    1.6.99.2 b0901 ycaK Putative electron transfer flavoprotein-NAD/FAD/quinoneoxidoreductase

    NAD(P)H dehydrogenase

    2.3.3.11(formerly4.1.3.9)

    b2264 menD Bifunctional modular MenD: 2-oxoglutaratedecarboxylase and SHCHC synthase

    2-Hydroxyglutarate synthase

    2.3.3.11(formerly4.1.3.9)

    b3671 ilvB Acetolactate synthase I, large subunit, valine-sensitive,FAD and thiamine PPi binding

    2-Hydroxyglutarate synthase

    2.7.1.23 b2615 yfjB Conserved protein NAD+ kinase

    2.7.1.23 b3916 pfkA 6-phosphofructokinase I NAD+ kinase2.7.2.1 b3115 tdcD Propionate kinase/acetate kinase II, anaerobic Acetate kinase (ackB)2.7.8.2 b1408 b1408 Putative enzyme Diacylglycerol

    cholinephosphotransferase2.8.1.2 b2521 sseA Putative sulfurtransferase 3-Mercaptopyruvate sulfurtransferase2.8.1.2 b1757 ynjE Putative thiosulfate sulfur transferase 3-Mercaptopyruvate sulfurtransferase2.8.3.8 b2222 atoA Acetyl-CoA:acetoacetyl-CoA transferase, beta subunit Acetate CoA-transferase2.8.3.8 b2221 atoD Acetyl-CoA:acetoacetyl-CoA transferase, alpha subunit Acetate CoA-transferase2.8.3.8 b1694 ydiF Putative acetyl-CoA:acetoacetyl-CoA transferase alpha subunit Acetate CoA-transferase3.1.1.31 b0678 nagB Glucosamine-6-phosphate deaminase 6-Phosphogluconolactonase (Pgl)3.1.1.31 b3718 yieK Putative isomerase 6-Phosphogluconolactonase (Pgl)

  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    7/12

    http://genomebiology.com/2003/4/9/R54 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. R54.

    Genome Biology 2003, 4:R54

    was used instead (and the acid-base reaction was included inthe network - generating proton(s) and the basic form of thecompound) it would be predicted that H + would be secreted by the cell thereby lowering the pH.

    It is generally thought that the pH of the medium becomesmore acidic as E. coli grows. A lowering of pH has beenobserved for growth on unbuffered medium with glycerol;however, during growth on unbuffered medium with diso-dium succinate or sodium acetate the medium becomes more basic (J.L.R. and B.O.P., unpublished data). Clearly i JR904 with charge-balanced reactions highlights the challenge thatcells have with globally balancing protons. As a part of theiterative model building process [2 ] i JR904 can be used todesign informative experiments to systematically address thisissue.

    Phenotypic phase plane (PhPP) comparisonsThe phenotypic phase planes [ 16] for growth on different car- bon sources were calculated using both models i JR904 andi JE660a. A description of phenotypic phase planes can befound in Materials and methods. For these simulations thecarbon uptake rate and oxygen uptake rate were varied. Thecarbon substrates tested included: glucose, pyruvate, acetate,glycerol, D-lactate, KG, succinate and malate. The phaseplanes for pyruvate, D-lactate and succinate calculated using

    i JR904 were nearly identical to those calculated with modeli JE660a (the lines demarcating the different phases only moved slightly). These results show that the modified andnew reactions contained in i JR904 did not significantly affectthe phase planes for these substrates, indicating that i JR904makes similar predictions regarding optimal growth on thesesubstrates.

    The line of optimality (LO) on a phenotypic phase plane cor-responds to the conditions (oxygen and substrate uptakerates) which can maximize the biomass yield. The largestshifts in the line of optimality were observed for growth onpyruvate and KG. However, these shifts are relatively smallindicating that the optimal oxygen uptake and carbon sourceuptake rates needed to generate the maximal amount of bio-mass do not change significantly (less than 10%).

    The phenotypic phase planes for carbon sources other thanpyruvate, D-lactate, and succinate have more significantchanges when calculated with i JR904 as compared toi JE660a. These include growth on the following carbonsources: glycerol, glucose, acetate, malate and KG. Theresulting phase planes calculated using i JR904 and i JE660aare shown in red and blue, respectively, in Figure 5a-e. Forthe malate and glucose phase planes (Figure 5a,b) one of thelines only appears on the phase plane calculated using

    3.1.1.31 b3141 agaI Putative galactosamine-6-phosphate i somerase 6-Phosphogluconolactonase (Pgl)3.2.1.68 b3431 glgX Glycosyl hydrolase Isoamylase

    3.5.1.2 b1524 yneH Putative glutaminase Glutaminase A or B3.5.1.2 b0485 ybaS Putative glutaminase Glutaminase A or B3.6.1.19 b2954 yggV Conserved hypothetical protein Nucleoside-triphosphate (ITP)

    diphosphatase3.6.1.22 b3996 yjaD Conserved hypothetical protein, MutT-like protein NAD+ diphosphatase3.6.1.5 b3615 yibD Putative glycosyltransferase Inosine triphosphate diphosphatase3.6.1.6 b3614 yibQ Conserved hypothetical protein Nucleoside (IDP) diphosphatase4.1.1.39 b3615 yibD Putative glycosyltransferase Ribulose-bisphosphate carboxylase (2-

    phosphoglycolate forming)4.1.2.10 b0311 betA Choline dehydrogenase, a flavoprotein Hydroxynitrile lyase4.1.2.10 b3168 infB Protein chain initiation factor IF-2 Hydroxynitrile lyase4.2.1.66 b3022 ygiU Conserved protein Cyanide hydratase6.2.1.12 b4069 acs Acetyl-CoA synthetase Hydroxycinnamate-CoA ligase6.2.1.12 b1701 ydiD Putative CoA-dependent ligase Hydroxycinnamate-CoA ligase6.2.1.12 b2836 aas bifunctional multimodular Aas: 2-acylglycerophospho-

    ethanolamine acyl transferaseHydroxycinnamate-CoA ligase

    6.2.1.4 b0728 sucC Succinyl-CoA synthetase, beta subunit Succinate-CoA ligase (ITP forming)6.2.1.4 b0729 sucD Succinyl-CoA synthetase, alpha subunit, NAD(P)

    bindingSuccinate-CoA ligase (ITP forming)

    Annotations shown in bold arose from analyzing the network gaps in i JR904.

    Table 2 (Continued)

    List of ORFs in E. coli with updated annotations based on sequence similarity

    http://-/?-http://-/?-
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    8/12

    R54.8 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. http://genomebiology.com/2003/4/9/R54

    Genome Biology 2003, 4:R54

    i J E 660a; these changes are attributed to the effects of globalproton balancing described in the previous section. This issuealso accounts for changes in the acetate phase plane (Figure5c) and is a contributing factor to the changes observed withthe glycerol and KG phase planes. However, other changes(discussed below) to the metabolic networks have more sig-nificant effects on these two phase planes.

    Glycerol phase planeThe changes in the glycerol phenotypic phase plane (Figure5d) were less drastic than those for the KG phase plane (Fig-ure 5e, see comments below). The only major change was that

    the lines in the microaerobic region shifted downward. Thischange is a result of the removal of two reactions involved inpyrridoxal recycling which were previously assigned to pdxH .The two reactions are not included in the metabolic network defined by i JR904 for reasons explained in the additionaldata files.

    -Ketoglutarate phase planeThe two-dimensional phenotypic phase plane for KG isnoticeably different when calculated for i JR904 and i JE660a(Figure 5e). One notable feature is that i JR904 predicts com-pletely anaerobic growth on KG, while i JE660a predicts that

    oxygen is required for growth on KG. Under oxygen limita-tions, corresponding to regions below the line of optimality,the expanded model is also more efficient at producing bio-mass from KG than the previous model. As oxygen becomesmore limiting, i JR904 becomes increasingly more efficient at

    generating biomass than i JE660a.

    Examination of the calculated optimal flux distributions pro- vides insights into why i JR904 is more efficient. One of thereasons for this increased growth efficiency is that i JR904includes the citrate lyase enzyme, which converts citrate(CIT) to acetate (AC) and oxaloacetate (OAA). During oxygen-limited growth, i JR904 predicts that some of the KG is con- verted to OAA by first reversing some of the TCA cycle reac-tions to generate CIT and then splitting CIT into AC and OAA by citrate lyase. The rest of the KG is consumed through theforward reactions of the TCA cycle to produce malate (Figure6). Removal of the citrate lyase reaction from the network

    under anaerobic conditions shows two other, less-efficientroutes that would still enable i JR904 to predict anaerobicgrowth. These new metabolic routes, which are dependent onat least one of the new additions, are depicted in Figure 5f.

    ConclusionsThis paper reports the curation and expansion of a previousgenome-scale constraint-based model of E. coli metabolism(i JE660a GSM) that is now used in multiple laboratories(A.L. Barabasi, personal communication; Church and col-leagues [ 23]; H. Greenberg, personal communication and C.Maranas, personal communication). This expanded model,

    i JR904 GSM/GPR, includes 37% more metabolic genes and47% more metabolic reactions. Each reaction in the network is now both elementally and charge balanced with the excep-tion of the six reactions listed in Table 1. While the new reac-tions added to the network do not change many of thepredicted optimal phenotypes, there are instances in whichthe expanded model makes significantly different predic-tions, examples of which occur when glycerol, glucose,malate, acetate and KG are used as the carbon sources underoxygen-limited conditions.

    The analysis of dead ends or gaps in the network has led toputative annotations of 55 ORFs. If these putative functional

    assignments are verified biochemically they can be includedin future updates of i JR904. Ideally an iterative process will be developed, within which the model can help identify new targets, and, if verified, can lead to an updated model. Thisiterative process [ 2] would be likely to produce more usefulresults in less-characterized organisms and has already beensuccessful in helping to identify malate dehydrogenase in Helicobacter pylori [24 ] and citrate synthase in Geobactersulfurreducens (D. Lovley, unpublished results).

    The incorporation of GPR associations into i JR904 will allow for the analysis of transcriptomic and proteomic data

    Effect of proton balancing on predicted growth rateFigure 4Effect of proton balancing on predicted growth rate. The figure showshow limiting the exchange of protons between the cell and the mediumaffects the predicted growth rates under aerobic conditions. The relativegrowth rate ( y -axis) is the ratio of the predicted growth rate when theproton exchange flux is limited over the predicted growth rate when theproton exchange flux is not limited (at its optimal value). Thesecalculations are for the conditions of aerobic growth on minimal media;the carbon source uptake rates were set to 5 mmol/g DW-hr and themaximum oxygen uptake rate was set to 20 mmol/g DW-hr. Carbonsources resulting in an outward flux of protons (lines to the right of the y -axis) would make the medium more acidic, while carbon sources resultingin an inward flux of protons (lines to the left of the y -axis) would make themedium more basic.

    0

    0.2

    0.4

    0.6

    0.8

    1

    10 8 6 4 2 0 2 4 6

    Acetate -KetoglutarateGlucoseL-LactateD-LactateMalatePyruvateSuccinateGlycerol

    Limiting exchange of protons across system boundary

    Proton secretion flux (mmol/g DW-hr)

    R e l a t

    i v e g r o w

    t h r a t e

    Producing an acidic environment

    Producing a basic environment

    http://-/?-http://-/?-
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    9/12

    http://genomebiology.com/2003/4/9/R54 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. R54.

    Genome Biology 2003, 4:R54

    directly; it also enables the incorporation of these datasets tofurther constrain the solution space leading to more accurate

    predictions of phenotypic data. E. coli i JR904 can now serveas a model centric database which could analyze and reconcile

    Comparisons between phenotypic phase planes (PhPP) calculated using i JR904 and i JE660aFigure 5Comparisons between phenotypic phase planes (PhPP) calculated using i JR904 and i JE660a. (a-e) The PhPP for growth on (a) malate, (b) glucose, (c)acetate, (d) glycerol, and (e) -ketoglutarate ( KG) are shown, where the red lines show the PhPP calculated from i JR904 and the blue lines the PhPPcalculated from i JE660a. The line of optimality (LO) corresponds to maximal biomass yield. (e) With KG as the carbon source, the phase planescalculated from i JR904 (red line) and i JE660a (blue line) are drastically different in the oxygen-limited region (area below the LO). (f) The differentmetabolic routes (along with their associated genes) added to the network in i JR904 that enable fermentative growth on KG are shown in red, blue andgreen. AC, acetate; CIT, citrate; GLYCLT, glycolate; GLYC-R, R-glycerate; GLX, glyoxylate; ICIT, isocitrate; KG, -ketoglutarate, OAA, oxaloacetate;3PG, 3-phospho-D-glycerate; SUCC, succinate; SUCCOA, succinyl-CoA.

    0

    5

    10

    15

    20

    0 5 10 15 20

    0

    5

    10

    O x y g e n u p

    t a k e r a

    t e

    ( m m o l

    / g D W - h r )

    O x y g e n u p

    t a k e r a

    t e

    ( m m o l

    / g D W - h r )

    Acetate uptake rate (mmol/g DW-hr) Glycerol uptake rate (mmol/g DW-hr)

    Malate uptake rate (mmol/g DW-hr)

    -Ketoglutarate uptake rate (mmol/g DW-hr)

    Glucose uptake rate (mmol/g DW-hr)

    O x y g e n u p

    t a k e

    r a t e

    ( m m o l

    / g D W - h r )

    O x y g e n u p

    t a k e

    r a t e

    ( m m o l

    / g D W - h r )

    O x y g e n u p

    t a k e r a

    t e

    ( m m o l

    / g D W - h r )

    15

    20

    0 5 10 15 20

    0

    5

    1015

    20

    0 5 10 15 20

    0

    5

    10

    15

    20

    0 5 10 15 20

    LO(i JR904)

    LO(i JE660a)

    LO (i JR904)LO (i JE660a)

    0

    5

    1015

    20

    0 5 10 15 20

    LO

    (i JE660a)

    LO(i JR904)

    LO(i JE660a)

    LO(i JR904)

    LO (i JR904)LO (i JE660a)

    KG

    ICIT

    SUCCOA

    SUCC

    CITOAAACGLYC-R

    GLX

    ycdW,yiaE

    garK,glxK citDEF

    gcl,garR,glxR

    3PG

    GLYCLT

    (c) Acetate phase plane (d) Glycerol phase plane

    (a) Malate phase plane (b) Glucose phase plane

    (e) -Ketoglutarate phase plane (f) Anaerobic growth on -Ketoglutarate

  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    10/12

    R54.10 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. http://genomebiology.com/2003/4/9/R54

    Genome Biology 2003, 4:R54

    heterogeneous datasets as well as use these datasets to aid in

    model predictions.

    Materials and methodsConstraint-based modeling A stoichiometric matrix, S (m n), is constructed where m isthe number of metabolites and n the number of reactions.Each column of S specifies the stoichiometry of the metabo-lites in a given reaction from the metabolic network. Mass balance equations can be written for each metabolite by tak-ing the dot product of a row in S , corresponding to a particu-lar metabolite, and a vector, v , containing the values of thefluxes through all reactions in the network. A system of mass

    balance equations for all the metabolites can be representedas follows:

    where X is a concentration vector of length m , and v is a flux vector of length n . At steady-state, the time derivatives of metabolite concentrations are zero, and equation (1) can besimplified to:

    S v = 0

    It follows that in order for a flux vector v to satisfy this rela-tionship, the rate of production must equal the rate of con-sumption for each metabolite. Application of additionalconstraints further reduces the number of allowable flux dis-tributions, v .

    Limits on the range of individual flux values can furtherreduce the number of allowable solutions. These constraintshave the form:

    v i

    where and are the lower and upper limits, respectively.Maximum flux values ( ) can be estimated based onenzymatic capacity limitations or, for the case of exchangereactions, measured maximal uptake rates can be used. Ther-modynamic constraints, regarding the reversibility or irre- versibility of a reaction, can be applied by setting the for the

    corresponding flux to zero if the reaction is irreversible.

    These constraints are not sufficient to shrink the originalsolution space to a single solution. Instead a number of solu-tions remain which make up the allowable solution space.Linear optimization can be used to find the solution that max-imizes a particular objective function. Some examples of objective functions include the production of ATP, NADH,NADPH or a particular metabolite. An objective function witha combination of the metabolic precursors, energy and redoxpotential required for the production of biomass has provenuseful in predicting in vivo cellular behavior [ 9,10 ,25,26 ].

    Simulation conditionsSimulations with i JR904 were all done using the softwarepackage SimPheny (Genomatica, San Diego, CA); this soft- ware was also used to build i JR904. All calculations weremade using the conditions outlined in this section. The bio-mass reaction was the same as that reported previously [ 8] with the addition of intracellular protons and water, and can be found in the additional data files. All flux values reportedin this section are in units of mmol/g DW-hr. The fluxthrough the non-growth associated ATP maintenance reac-tion (ATPM in the additional data files) was fixed to 7.6.Fluxes through all other internal reactions have an upperlimit of 1 10 30 ; if the reaction is reversible the lower limit is

    -1 1030

    and if it is irreversible the lower limit is zero.

    In addition to the metabolic reactions listed in the additionaldata files, reversible exchange reactions for all externalmetabolites were also included in the simulations to allow external metabolites to cross the system boundaries. If theseexchange reactions are used in the forward direction theexternal metabolites leave the system and if used in thereverse direction (that is, a negative flux value through thereaction) the external metabolites enter the system.

    Flux map of TCA cycle and citrate lyaseFigure 6Flux map of TCA cycle and citrate lyase. During oxygen-limited growth on KG maximal biomass will be made by utilizing the citrate lyase reaction.As oxygen becomes more limiting, the reactions shown in red, culminatingin the production of oxaloacetate (OAA), were predicted by i JR904 to beused more heavily than the forward direction of the TCA cycle reactions(shown in black). i JE660a does not include the citrate lyase reaction socarbon flow is directed only in the forward direction of the TCA cycle;these reactions produce more reduced redox carriers, which are difficultfor the cell to oxidize in an oxygen-limited environment. The '+' signindicates that the cofactor is produced by the reaction in the directionshown and the '-' sign indicates that the cofactor is consumed. Forabbreviations see Figure 5, with the following additions: ACCOA, acetyl-CoA; FUM, fumarate; MAL, malate, PYR, pyruvate.

    CIT

    ICIT

    KG

    SUCCOA

    SUCC

    FUM

    + NADH

    AC

    NADH

    + FADH2

    NADPH

    OAA

    + ATP

    + NADH

    Citrate lyase

    MAL

    d

    dt

    XS= ( )v 1

    http://-/?-http://-/?-
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    11/12

    http://genomebiology.com/2003/4/9/R54 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. R54.

    Genome Biology 2003, 4:R54

    The following external metabolites were allowed to freely enter and leave the system: ammonia, water, phosphate, sul-fate, potassium, sodium, iron (II), carbon dioxide and pro-tons (except during the robustness study where protonexchange flux was constrained down to zero). The corre-

    sponding exchange fluxes for these metabolites have a lowerand upper flux limit of -1 10 30 and 1 10 30 , respectively. Aerobic conditions were simulated with a maximum oxygenuptake rate of 20 mmol/g DW-hr, by setting the lower andupper limits for the oxygen exchange flux to -20 and 0 respec-tively, and anaerobic conditions were simulated by fixing theoxygen uptake rate to 0. All other external metabolites, exceptfor the carbon source, were only allowed to leave the system.The lower and upper limits on their corresponding exchangefluxes were 0 and 1 10 30 , respectively. Growth on differentcarbon sources was simulated by allowing those externalmetabolites to enter the system; the actual flux values foruptake rates used in the simulations are noted in the text and

    figures, where the upper limit is 0 and the lower limit is thenegative of the uptake rate listed. These constraints are alsosummarized in the additional data files.

    Phenotypic phase planes (PhPP)Phenotypic phase plane analysis was developed to generate aglobal view of the optimality properties of a network [ 16]. Thephenotypic phase plane is constructed from a large number of individual optimal solutions and gives an overall view of theoptimality properties of the network. PhPPs are used to show all possible quantitative flux distributions through a network while varying two or three constrained fluxes. The differentregions of the phase plane have qualitatively different flux

    distributions that translate to different metabolicphenotypes. One important feature of the PhPP is the line of optimality (LO). Points that lie on the LO optimize the objec-tive function for a given substrate uptake rate. When the cellis operating along the LO, calculated when the growth rate isused as the objective function, the cell is growing with a max-imal biomass yield.

    Identification of dead endsDead end metabolites are classified as such if a metabolite caneither be produced but not consumed or consumed but notproduced. By examining the connectivity of the metabolites inthe S matrix, a list of dead ends can be generated. Once these

    are identified, the number of reactions directly involved withthese metabolites can be determined by enumerating thenumber of non-zero elements in the row corresponding toeach dead end metabolite.

    Sequence annotations A list of reactions and associated enzymes that could connectthe dead ends with the rest of the network was gathered fromLIGAND (Database of Chemical Compounds and Reactionsin Biological Pathways [ 13]). These are enzymes known to acton those metabolites in other organisms. The enzymes listedin EcoCyc which are known to be in E. coli but lack assigned

    loci, were also added to the search list. The enzyme commis-sion numbers for this combined list of enzymes were used inqueries against the SwissProt, TremBl and TremBlnew data- banks (using Sequence Retrieval System [ 27]) to retrieve theenzymes' corresponding amino acid sequences. Known

    orthologous sequences of all of the enzymes of interest weregrouped together to construct multiple sequence alignments.Two separate programs, MEME (Multiple Expectation maxi-mization for Motifs Elicitation [ 17]) and ClustalW [ 18] wereused for this purpose. MEME was run assuming that eachsequence may contain a variable number of non-overlappingoccurrences of each motif and up to three distinct motifs.Each training set was processed twice with MEME, once withthe program's default end-gap penalty and once without it.Default values were used for gap-opening penalty and gap-extension penalty. All sequences in each training set were weighed equally by MEME. ClustalW, however, down- weighed similar sequences in proportion to their degree of

    relatedness. Pair-wise alignments in ClustalW were run witha dynamic programming algorithm (slow option) and withthe Reset Gap option off.

    The MEME output files were submitted to MAST [ 19] (Motif Alignment and Search Tool, version 3.0 online) and searchedfor matching sequences against the E. coli genome. MASTreturned a list of high-scoring sequences and their annota-tions. ClustalW output files were used by HMMER'shmmbuild and hmmcalibrate (version 2.2g [ 20 ]) with defaultparameters to train profile HMMs (Hidden Markov Models).hmmsearch was subsequently applied to find sequences inthe E. coli genome that matched each profile. Results from

    MAST's and HMMER's searches were manually examined,and relevant matches were reported.

    Additional data filesThe additional data consists of three files. The Excel file(Additional data file 1) contains the following: a list of thereactions in i JR904; definitions of the metaboliteabbreviations; a list of the exchange fluxes used in simula-tions and their constraints; a list of the dead end metabolitesin the metabolic network; a list of the reactions that were notincluded from i J E 660a and a complete list of the sequencecomparison results (including e-values). A zip file (Additional

    data file 2) is also provided, including the JPEG images of allthe GPR associations and a detailed document describinghow to interpret these images. The GPR image files are eachlabeled to correspond to either an individual gene or reaction.The third file (Additional data file 3) contains six maps of metabolism which together include all the reactions ini JR904; the reaction and metabolite abbreviations are thesame as those listed in the Excel file.

    Additional data file 1 A list of the reactions in i JR904; definitions of the metaboliteabbreviations; a list of the exchange fluxes used in simulations andtheir constraints; a list of the dead end metabolites in the metabolicnetwork; a list of the reactions that were not included fromi JR660a and a complete list of the sequence comparison results(including e-values) A list of the reactions in i JR904; definitions of the metaboliteabbreviations; a list of the exchange fluxes used in simulations andtheir constraints; a list of the dead end metabolites in the metabolicnetwork; a list of the reactions that were not included fromi JR660a and a complete list of the sequence comparison results(including e-values)Click here for additional data file Additional data file 2The JPEG images of all the GPR associations and a detailed docu-ment describing how to interpret these imagesThe JPEG images of all the GPR associations and a detailed docu-ment describing how to interpret these imagesClick here for additional data file Additional data file 3Six maps of metabolism which together include all the reactions ini JR904Six maps of metabolism which together include all the reactions ini JR904Click here for additional data file

    AcknowledgementsThe authors would like to thank Markus Herrgard for his advice regardingsequence comparisons, Evelyn Travnik for her help putting together the

    http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-http://-/?-
  • 8/13/2019 An Expanded Genomescale Model of Escherichia Coli

    12/12

    R54.12 Genome Biology 2003, Volume 4, Issue 9, Article R54 Reed et al. http://genomebiology.com/2003/4/9/R54

    Genome Biology 2003 4:R54

    additional data files, as well as Timothy Allen, Jason Papin and SharonWiback for their comments on the manuscript. Support for this work wasprovided by the NIH (grants GM57092 and GM62791) and the San DiegoFellowship to T.V.

    References1. Palsson BO: In silico biology through "omics". Nat Biotechnol 2002, 20: 649-650.

    2. Palsson BO: The challenges of in silico biology. Nat Biotechnol 2000, 18: 1147-1150.

    3. Edwards JS, Covert M, Palsson B: Metabolic modelling of microbes: the flux-balance approach. Environ Microbiol 2002,4:133-140.

    4. Varma A, Palsson BO: Metabolic flux balancing: basic concepts,scientific and practical use. Bio/Technology 1994, 12: 994-998.

    5. Bonarius HPJ, Schmid G, Tramper J: Flux analysis of underdeter-mined metabolic networks: the quest for the missingconstraints. Trends Biotechnol 1997, 15: 308-314.

    6. Price ND, Papin JA, Schilling CH, Palsson BO: Genome-scalemicrobial in silico models: the constraints-based approach.Trends Biotechnol 2003, 21: 162-169.

    7. Reed JL, Palsson BO: Thirteen years of building constraint-based in silico models of Escherichia coli . J Bacteriol 2003,

    185: 2692-2699.8. Edwards JS, Palsson BO: The Escherichia coli MG1655 in silicometabolic genotype: its definition, characteristics, andcapabilities. Proc Natl Acad Sci USA 2000, 97: 5528-5533.

    9. Edwards JS, Ibarra RU, Palsson BO: In silico predictions of Escherichia coli metabolic capabilities are consistent withexperimental data. Nat Biotechnol 2001, 19: 125-130.

    10. Ibarra RU, Edwards JS, Palsson BO: Escherichia coli K-12 under-goes adaptive evolution to achieve in silico predicted optimalgrowth. Nature 2002, 420: 186-189.

    11. Systems Biology Research Group [http://systemsbiology.ucsd.edu/organisms/ecoli.html]

    12. Serres MH, Gopal S, Nahum LA, Liang P, Gaasterland T, Riley M: Afunctional update of the Escherichia coli K-12 genome. GenomeBiol 2001, 2:research0035-0035.7.

    13. Goto S, Okuno Y, Hattori M, Nishioka T, Kanehisa M: LIGAND:database of chemical compounds and reactions in biologicalpathways. Nucleic Acids Res 2002, 30: 402-404.

    14. Karp PD, Riley M, Saier M, Paulsen IT, Collado-Vides J, Paley SM, Pel-legrini-Toole A, Bonavides C, Gama-Castro S: The EcoCycdatabase. Nucleic Acids Res 2002, 30: 56-58.

    15. TC-DB: Transport Protein Database [http://tcdb.ucsd.edu/tcdb/background.php]

    16. Edwards JS, Ramakrishna R, Palsson BO: Characterizing the met-abolic phenotype: a phenotype phase plane analysis. Biotechnol Bioeng 2002, 77: 27-36.

    17. Bailey TL, Elkan C: Fitting a mixture model by expectationmaximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 1994, 2:28-36.

    18. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improvingthe sensitivity of progressive multiple sequence alignmentthrough sequence weighting, position-specific gap penaltiesand weight matrix choice. Nucleic Acids Res 1994, 22: 4673-4680.

    19. Bailey TL, Gribskov M: Combining evidence using p-values:application to sequence homology searches. Bioinformatics1998, 14: 48-54.

    20. HMMER: Profile HMMs for protein sequence analysis [http://hmmer.wustl.edu]

    21. Campbell JW, Cronan JE Jr: The enigmatic Escherichia coli fadEgene is yafH. J Bacteriol 2002, 184: 3759-3764.

    22. Edwards JS, Palsson BO: Robustness analysis of the Escherichiacoli metabolic network. Biotechnol Prog 2000, 16: 927-939.

    23. Segre D, Vitkup D, Church GM: Analysis of optimality in naturaland perturbed metabolic networks. Proc Natl Acad Sci USA 2002,99: 15112-15117.

    24. Covert MW, Schilling CH, Famili I, Edwards JS, Goryanin II, Selkov E,Palsson BO: Metabolic modeling of microbial strains in silico .Trends Biochem Sci 2001, 26: 179-186.

    25. Varma A, Palsson BO: Stoichiometric flux balance modelsquantitatively predict growth and metabolic by-productsecretion in wild-type Escherichia coli W3110. Appl Environ

    Microbiol 1994, 60: 3724-3731.

    26. Pramanik J, Keasling JD:Stoichiometric model of Escherichia colimetabolism: incorporation of growth-rate dependent bio-mass composition and mechanistic energy requirements. Bio-technol Bioeng 1997, 56: 398-421.

    27. Zdobnov EM, Lopez R, Apweiler R, Etzold T: The EBI SRS server - recent developments. Bioinformatics 2002, 18: 368-373.

    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12089538http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11062431http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12000313http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12000313http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12679064http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12700248http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=10805808http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=10805808http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=10805808http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11175725http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11175725http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11175725http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12432395http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12432395http://systemsbiology.ucsd.edu/organisms/ecoli.htmlhttp://systemsbiology.ucsd.edu/organisms/ecoli.htmlhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11574054http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752349http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752349http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752349http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752253http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752253http://tcdb.ucsd.edu/tcdb/background.phphttp://tcdb.ucsd.edu/tcdb/background.phphttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11745171http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11745171http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11745171http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7584402http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7584402http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7984417http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7984417http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7984417http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7984417http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7984417http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=9520501http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=9520501http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=9520501http://hmmer.wustl.edu/http://hmmer.wustl.edu/http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12057976http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12057976http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11101318http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11101318http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12415116http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12415116http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12415116http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11246024http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7986045http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7986045http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11847095http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11847095http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11847095http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11847095http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7986045http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11246024http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12415116http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12415116http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11101318http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12057976http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12057976http://hmmer.wustl.edu/http://hmmer.wustl.edu/http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=9520501http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=9520501http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7984417http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7984417http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7984417http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7584402http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=7584402http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11745171http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11745171http://tcdb.ucsd.edu/tcdb/background.phphttp://tcdb.ucsd.edu/tcdb/background.phphttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752253http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752253http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752349http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752349http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11752349http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11574054http://systemsbiology.ucsd.edu/organisms/ecoli.htmlhttp://systemsbiology.ucsd.edu/organisms/ecoli.htmlhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12432395http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12432395http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11175725http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11175725http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=10805808http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=10805808http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12700248http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12679064http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12000313http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12000313http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=11062431http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12089538