Tel Aviv University
The Raymond and Beverly Sackler Faculty of Exact Sciences
The Blavatnik School of Computer Science
iMAT: A Tool for Metabolic Integration
Application to Breast Cancer
This thesis was submitted as part of the requirements for the degree of
"Master of Science" (M.Sc.) at Tel Aviv University
School of Computer Science
by
Hadas Zur
The work was prepared under the supervision of Professor Eytan Ruppin
Shevat 5770
Tel-Aviv University
The Raymond and Beverly Sackler Faculty of Exact Sciences
Blavatnik School of Computer Science
iMAT: Integrative Metabolic Analysis Tool
Application to Human Breast Cancer
This thesis is submitted in partial fulfillment of the requirements
towards the M.Sc. degree in Computer Science
Tel-Aviv University
Blavatnik School of Computer Science
by
Hadas Zur
The research work in this thesis has been carried out
under the supervision of Prof. Eytan Ruppin
January, 2010
Acknowledgments
First, I would like to express my deepest gratitude to my supervisor, Prof. Eytan Ruppin.
His guidance enabled me to view up close a scientist and a man. An accomplished
scientist with innovative ideas, motivated by passion and a desire for true understanding.
A man who consistently assists and advises, patiently and always with a smile. For his
non-trivial support during my mother’s hospitalization period, I will forever be grateful.
I would also like to thank Dr. Tomer Shlomi for his guidance in my first steps in the lab
and for all his help during my research.
Many thanks to Prof. Ilan Tsarfati for his willingness, insight and knowledge.
Very special thanks to Tomer Benyamini for his open ear, invaluable advice and
generous availability. Keren Yizhak, your reason and humor were instrumental, thank
you. To my lab colleagues, my friends, Adi Shabi, Ori Folger and Livnat Jerby, much
gratitude.
Finally, I would like to thank my family for all their help and support.
Abstract (translated from the Hebrew)
Cancer is a complex disease involving a variety of biological phenomena. This complexity
poses considerable challenges for characterizing the biology of cancer, and motivates the
study of cancer in its molecular and physiological context. Computational models of cancer
are being developed as an aid to biological and medical research. The development of these
models is supported by the accelerating progress of experimental and analytical tools that
generate information-rich, large-scale data. In this work we present iMAT (Integrative
Metabolic Analysis Tool), which enables the integration of genomic, proteomic and reactome
data (reactome array data) with a metabolic model to obtain predictions of metabolic fluxes,
extending the approach presented in (Shlomi, Cabili et al. 2008). In particular, we present
a new CBM method for integrating reactome data to predict metabolic flux. While large-scale
genomic and proteomic data have been available for some time, the reactome technology was
developed only very recently by (Beloqui, Guazzaroni et al. 2009). Predicting metabolic
fluxes from large-scale molecular data could advance our understanding of cellular
metabolism, since current experimental approaches are limited to measuring individual
fluxes. Here we demonstrate the utility of iMAT in predicting metabolic fluxes of breast
cancer cells, where the predictions agree with known metabolic changes.
Abstract
Cancer is a complex disease that involves multiple types of biological interactions across
diverse physical, temporal, and biological scales. This complexity presents substantial
challenges for the characterization of cancer biology, and motivates the study of cancer in
the context of molecular, cellular, and physiological systems. Computational models of
cancer are being developed to aid both biological discovery and clinical medicine. The
development of these in silico models is facilitated by rapidly advancing experimental
and analytical tools that generate information-rich, high-throughput biological data. In
this work we introduce iMAT, an Integrative Metabolic Analysis Tool, enabling the
integration of transcriptomic, proteomic, and reactome array data with metabolic network
models to predict metabolic flux, developing variants of the approach presented in
(Shlomi, Cabili et al. 2008). Specifically, we present a new constraint-based method for
the integration of a genome-scale metabolic network with reactome array data to predict
metabolic flux activity. While high-throughput transcriptomic and proteomic data have
been available for quite some time now, the reactome array technology has been very
recently developed by (Beloqui, Guazzaroni et al. 2009), providing exciting new genome
scale data on the rate of metabolite transformation by enzymes present in cell extracts.
The prediction of metabolic fluxes based on high-throughput molecular data sources
could help advance our understanding of cellular metabolism, since current experimental
approaches are limited to measuring fluxes through merely a few dozen enzymes. Here
we demonstrate the utility of iMAT in predicting metabolic flux activities in breast-
cancer cell-lines, where its predictions correspond with previously measured cancer
metabolic alterations.
Contents
1 Introduction
1.1 Modeling Cellular Metabolism
1.2 Constraint-Based Modelling
1.3 Modeling Human Metabolism
1.4 High-throughput molecular data and Metabolic Model Integration
1.5 Human Cancer and Metabolism
1.5.1 Breast Cancer Metabolism
1.5.2 Modeling Cancer Metabolism
1.6 Reactome Array data Technology
1.7 Automatically generated Metabolic Models
2 iMAT: Integrative Metabolic Analysis Tool
2.1 Online Tool Development
2.1.1 Online availability
2.1.2 An illustrative example of applying iMAT to a toy network model
2.2 Expanding approach: Integration of Reactome Array data
2.2.1 Modeling P. putida’s Metabolic Profile via Reactome Array Integration
3 Modeling Human Breast Cancer
3.1 Results
3.1.1 Data Acquisition and Preprocessing
3.1.2 Analysis Overview
3.1.3 iMAT on general human model integrated with expression data
3.2 Future Directions
3.2.1 Integrating iMAT’s flux predictions to model a cancer metabolic profile via quantification
3.2.2 Weighted iMAT
4 Discussion
5 Bibliography
6 Supplementary material
1 Introduction
Metabolism is widely known to play a key part in human physiology. Understanding its
function is crucial for understanding disease states and progression, aging and nutrition,
and for improving the performance of athletes, astronauts and soldiers.
In particular, metabolism has been known to be involved in many major disease states,
such as diabetes, obesity and cardiovascular disease. Cancers display highly abnormal
metabolic phenotypes, and metabolic targets have long been used in cancer
chemotherapy. More recently, evidence is growing that the effects of metabolism on
physiological and pathophysiological brain functions are significant, from schizophrenia
to neurodegenerative disorders. Successful implementation of molecular systems biology
of human metabolism is thus likely to have broad consequences. (Mo and Palsson 2009)
1.1 Modeling Cellular Metabolism
The intricate nature of human physiology renders its study an arduous undertaking, and a
systems biology approach is necessary to comprehend the complex interactions involved.
Network reconstruction is a pivotal step in systems biology and represents a common
denominator as all systems biology research on a target organism relies on such a
representation (Mo and Palsson 2009). Genome-scale metabolic networks represent the
repertoire of chemical transformations that take place in an organism, in a biochemically,
genetically and genomically structured (BiGG) manner (Figure 1) (Mahadevan, Bond et
al. 2006) and allow the formulation of genome-scale models (GEMs). GEMs enable the
computation of phenotypic traits based on the genetic composition of the target organism
(Palsson 2009). Modern genomic sequencing technologies enable the rapid reconstruction
of metabolic networks, giving rise to more than 50 highly curated metabolic
reconstructions that have been published to date (Duarte, Herrgård et al. 2004; Feist and
Palsson 2008). Such network reconstructions span all three domains of life, Eukaryota,
Bacteria, and Archaea. These encompass dozens of bacterial and yeast species, including
various pathogens and industrially relevant organisms, the model plant Arabidopsis and
mammalian metabolic networks including mouse and human (Figure 2) (Duarte, Becker
et al. 2007). The scope and content of network reconstructions continue to grow, for
instance to include the entire transcription/translation apparatus of a cell and the
structural information about the metabolic enzymes (Palsson 2009). Major ongoing
efforts are currently made to develop computational methods to automatically reconstruct
metabolic network models for additional organisms based on genomic and functional
genomic data. Such efforts have recently resulted in draft network reconstructions for 160
microbial species (Overbeek, Begley et al. 2005). Reconstructed metabolic networks have
been used toward five major ends: (1) contextualization of high-throughput data, (2)
guidance of metabolic engineering, (3) directing hypothesis-driven discovery, (4)
interrogation of multi-species relationships, and (5) network property discovery (Figure
3) (Oberhardt, Palsson et al. 2009). Specifically, common uses span metabolic phenotype
prediction (Guldberg, Rey et al. 1998), metabolic engineering (Pharkya, Burgard et al.
2004), studies of network evolution (Fong, Joyce et al. 2005), and biomedical
applications (Apic, Ignjatovic et al. 2005). These studies employ various constraint-based
modeling (CBM) methods to analyze the network function by solely relying on simple
physical-chemical constraints (Price, Reed et al. 2004), while more traditional modeling
techniques are based on mathematical approaches that require detailed information on
kinetics and on enzyme and metabolite concentrations (Fell 1997; Domach, Leung et al.
2000).
Figure 1: Incorporation of genomic and biochemical knowledge derived from the genome annotation and experimental literature into a BiGG-structured knowledge base network. High-throughput annotation data provides information on gene products, transcript variants and their associated functions, as well as localization (i.e. cellular compartment and tissue). Literature documents specific biochemical details from experiments on the gene product functions, such as reaction mechanism and substrate specificity. Figure from (Mo and Palsson 2009)
Figure 2: Reconstruction statistics. The cumulative number of metabolic GENREs published over the past decade is shown in (A). (B–D) Histograms of the number of metabolic GENREs containing varying numbers of genes (B), metabolites (C), and reactions (D). (E) Histogram of the number of reconstructions published per species. All histograms display prokaryotic (green) and eukaryotic (brown) statistics. Figure from (Oberhardt, Palsson et al. 2009)
Figure 3: Uses of metabolic GENREs. The building and analysis of metabolic GENREs are shown in the left panels, and the five categories of uses of metabolic GENREs are described in the right set of panels. Each panel on the right includes a representative example from literature. Figure from (Oberhardt, Palsson et al. 2009)
Essentially, a metabolic network is composed of a set of chemical reactions that
can be represented as a set of chemical equations (Figure 4a-b), in which the reaction
stoichiometry is embedded (Palsson 2006). The reaction stoichiometry specifies the
quantitative relationships between the reactants and products in a balanced
chemical reaction (Figure 4b). Metabolic reactions can be divided into two distinct types:
catabolic reactions that break down molecules and release energy and anabolic reactions
that use energy to build up essential cell components. The dynamics of a metabolic
network pertain to the metabolic flux of each reaction, i.e. the rate at which compounds are
produced or consumed by the reaction. The stoichiometry of a metabolic network can be
represented in matrix form, and is commonly referred to as the stoichiometric matrix,
denoted by S (Figure 4c). The stoichiometric matrix is organized such that rows
correspond to biochemical compounds (metabolites), columns correspond to biochemical
reactions, and the entries are the integer stoichiometric coefficients, thereby each column
conveys the compound balance of the corresponding reaction. Similarly, each row depicts
all the reactions in which a certain compound participates, thus representing reaction
interconnections in the metabolic network. The stoichiometric matrix embodies both
chemical and network information enabling transformation of the network flux vector
(containing the flux rate for each reaction in the network) to the time derivatives of the
metabolites' concentrations (Figure 4d). Thus, the addition of information such as protein
localization, kinetic constants and enzyme and metabolite intracellular concentrations, to
the stoichiometric matrix, results in a structured database which constitutes a basis for
various mathematical and in silico network analyses.
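As a minimal sketch of how S maps a flux vector to the time derivatives of the metabolite concentrations, the following Python snippet uses a hypothetical three-metabolite, four-reaction linear pathway. This toy network, its reaction names and its flux values are assumptions chosen for illustration; it is not the network of Figure 4.

```python
# Hypothetical toy network (an assumption for illustration, not Figure 4):
#   r1: -> a        (uptake of a)
#   r2: a -> b
#   r3: b -> c
#   r4: c ->        (secretion of c)
# Rows are metabolites (a, b, c); columns are reactions (r1..r4).
S = [
    [1, -1,  0,  0],   # a: produced by r1, consumed by r2
    [0,  1, -1,  0],   # b: produced by r2, consumed by r3
    [0,  0,  1, -1],   # c: produced by r3, consumed by r4
]

def concentration_derivatives(S, v):
    """Return dx/dt = S . v for a flux vector v (one entry per reaction)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

# Equal flux through every step: each metabolite is produced as fast as it is consumed.
balanced = concentration_derivatives(S, [2, 2, 2, 2])
# Uptake exceeds the downstream fluxes: metabolite a accumulates.
unbalanced = concentration_derivatives(S, [2, 1, 1, 1])

print(balanced)
print(unbalanced)
```

Each entry of the result is the net production rate of one metabolite, which is the row-wise reading of S described above.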
While the stoichiometric matrix represents the static aspect of a metabolic model,
its dynamic facet can be captured by a kinetic model, composed of a set of differential
equations describing the change in metabolite concentrations over time. Such a model may
be assembled when kinetic constants and enzyme and metabolite intracellular
concentrations are available.
For instance, comprehensive kinetic models are available for the human red blood cell
(Joshi and Palsson 1989; Joshi and Palsson 1990; Lee and Palsson 1990; Mulquiney and
Kuchel 1999). However, lack of accurate and comprehensive data on such parameters
limits the current applicability of such methods to small-scale systems. Only recently, a
proposed workflow for the formulation of large scale kinetic models was outlined
(Jamshidi and Palsson 2008). An alternative approach bypassing this hurdle is
Constraint-Based Modeling (CBM), which serves to analyze the function of large-scale
metabolic networks by solely relying on simple physical-chemical and physiological
constraints (Price, Reed et al. 2004). In recent years, CBM has been frequently used to
successfully predict various phenotypes of microorganisms. In this thesis I will describe a
CBM approach for the integration of a static metabolic network with transcriptomic,
proteomic, or reactome array data, and show its applicability to the prediction of human
breast-cancer metabolism via gene expression data, and of Pseudomonas putida metabolism
via reactome array data.
Figure 4: Formal representation of a metabolic network model. (a) A schematic illustration of a metabolic network. The nodes a, b, c represent metabolites and the edges r_i represent the reactions. (b) Each reaction is formulated as a chemical equation which is balanced according to the integer coefficients. (c) The stoichiometric matrix S; rows represent metabolites and columns represent reactions. (d) The dynamic mass balance. Matrix S transforms the reactions' flux vector v into a vector containing the time derivatives of all metabolite concentrations. The figure is adapted from (Lee, Gianchandani et al. 2006).
1.2 Constraint-Based Modelling
The recent flood of genomic, transcriptomic, and other high-throughput data makes the
need to interpret this information in a systemic fashion increasingly pressing. The
construction of in silico models represents a way to interpret these data and place them in
the context of cellular physiology. A variety of in silico modeling approaches in biology
have been developed, including detailed kinetic models, cybernetic models, stochastic
models, metabolic control analysis, biochemical systems theory, and constraint-based
methods. Modern modeling approaches in biology need to be easily scalable and able to
integrate available “-omics” data that may contain tens of thousands of measurements. A
constraint-based modeling approach meets these criteria and at present is the only
methodology by which genome-scale models have been constructed. The few parameters
used in a constraint-based framework enable models to be built quickly and to encompass
a larger portion of biochemical reaction networks than the portion currently spanned by
other modeling methodologies. To date, constraint-based models account for the largest
metabolic models in terms of numbers of genes and reactions and have proven to be
predictive of some types of data, including phenomic data, qualitative transcriptomic
data, and gene knockout data. (Reed and Palsson 2003)
These advances in genomic sequencing and annotation along with a wealth of
chemical literature enabled the reconstruction of several genome scale metabolic
networks comprised of hundreds to thousands of enzymes, reactions and metabolites
(Price, Reed et al. 2004). Given a stoichiometric matrix S that encapsulates the
biochemical reactions and compounds involved in a metabolic system, a CBM model
imposes mass balance, thermodynamic and maximum/minimum flux constraints to define
a set of flux vectors representing all possible steady states of the network. Over the past
decade, constraint-based models have been developed for a variety of systems
including: bacterial and yeast metabolism (Edwards and Palsson 2000; Schilling and
Palsson 2000; Schilling, Covert et al. 2002; Förster, Famili et al. 2003), the red blood
cell (Wiback and Palsson 2002), the human cardiac mitochondria (Vo, Greenberg et al.
2004), glutamate neurotransmission (Chatziioannou, Palaiologos et al. 2003), the human
cell metabolic model (Duarte, Becker et al. 2007), and recently, a mouse cell metabolic
model (Selvarasu, Karimi et al.).
While kinetic models may ultimately provide a detailed understanding of
integrated cellular functions, they are limited by the current availability of the
information needed to construct them and by the fact that kinetic constants can vary
across a population and change over time through evolution. The constraint-based
modeling procedure does not strive to find a single solution but rather finds a collection
of all allowable solutions to the governing equations that can be defined. Solutions that
violate any of the imposed constraints are excluded from the collection, which
mathematically is called a solution space. The subsequent application of additional
constraints further reduces the solution space and, consequently, reduces the number of
allowable solutions that a cell can utilize. The constraints that have been used in the first
generation of constraint-based models include stoichiometric constraints (mass balance),
thermodynamic constraints (regarding the reversibility of a reaction), and enzymatic
capacity constraints (using an appropriate value). (Reed and Palsson 2003)
The CBM approach represents the constraints on the network as a set of linear
equations on the network's flux vector v (Price, Reed et al. 2004).
(1) $S \cdot v = 0$
(2) $v_{min} \le v \le v_{max}$
The steady state assumption represented in equation (1) assumes that there is no
accumulation or depletion of metabolites in the metabolic network. Therefore, the
production rate of each metabolite equals its consumption rate and there is no
concentration change (the time derivatives of all the metabolites' concentrations equal
zero). The thermodynamic constraints (i.e., under which physiological conditions certain
reactions are reversible while others are not) and flux capacity constraints (i.e.,
constraints on enzyme production rate) define bounds on the flux vector and are
embedded in equation (2). Additional constraints such as ones describing the available
nutrients in the environment or a genetic perturbation may also be included. For example,
in order to eliminate the activity of a gene ("knockout experiment") the minimal and
maximal flux bounds of the corresponding reaction should be set to zero (i.e., $v_{min,i} =
v_{max,i} = 0$ for that reaction $i$). Similarly, to restrict the consumption of a metabolite from the environment the
corresponding uptake reactions’ flux bounds should be set to zero. As more constraints
are applied to the system, the attained solution space is reduced, and models the specific
biological system at hand more accurately, with its solution space describing more likely
functional physiological states (Mahadevan and Schilling 2003; Price, Reed et al. 2004).
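The constraints of equations (1) and (2), and the way a knockout tightens them, can be illustrated with a short Python sketch. The two-metabolite network with two parallel routes, the capacity bounds of 10, and the helper names `is_feasible` and `knock_out` are all assumptions made for this example.

```python
# Hypothetical toy network (assumed for illustration):
#   r1: -> a, r2: a -> b, r3: a -> b (parallel route), r4: b ->
S = [
    [1, -1, -1,  0],   # a: produced by r1, consumed by r2 and r3
    [0,  1,  1, -1],   # b: produced by r2 and r3, consumed by r4
]
v_min = [0, 0, 0, 0]       # all reactions treated as irreversible
v_max = [10, 10, 10, 10]   # assumed capacity bounds

def is_feasible(S, v, v_min, v_max, tol=1e-9):
    """Check the steady-state constraint S.v = 0 and the flux bounds."""
    steady = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    bounded = all(lo - tol <= x <= hi + tol
                  for lo, x, hi in zip(v_min, v, v_max))
    return steady and bounded

def knock_out(v_min, v_max, reaction):
    """Return copies of the bounds with one reaction forced to zero flux."""
    lo, hi = list(v_min), list(v_max)
    lo[reaction] = hi[reaction] = 0
    return lo, hi

v = [4, 3, 1, 4]                              # flux through both routes
ko_min, ko_max = knock_out(v_min, v_max, 1)   # knock out r2 (index 1)

wild_type_ok = is_feasible(S, v, v_min, v_max)
knockout_ok = is_feasible(S, v, ko_min, ko_max)
rerouted_ok = is_feasible(S, [4, 0, 4, 4], ko_min, ko_max)
print(wild_type_ok, knockout_ok, rerouted_ok)
```

The original flux vector falls outside the knockout solution space, while a vector that reroutes all flux through r3 remains inside it, showing how an added constraint shrinks the set of allowable states without necessarily emptying it.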
Extreme pathways and elementary modes represent sets of vectors that describe
the solution space and are themselves biochemically valid flux distributions through a
defined metabolic network. Elementary modes are unique vectors that characterize the
solution space. An elementary mode is defined as a “minimal set of enzymes that could
operate at steady-state with all irreversible reactions proceeding in the appropriate
direction”. Extreme pathways are related to elementary modes and correspond directly to
the edges of the convex solution space. Positive linear combinations of these vectors can
be used to generate any valid steady-state flux solution under the governing constraints.
These analysis methods are useful for characterizing the solution space, and the next step
is to try to determine what solution in the solution space the cell actually opts to use.
(Reed and Palsson 2003)
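The statement that positive linear combinations of pathway vectors yield valid steady-state flux distributions can be checked on a small example. In the sketch below, the network and the two pathway vectors `p1` and `p2` are assumptions chosen for illustration; for this particular irreversible network they are the two routes from uptake to secretion.

```python
# Hypothetical toy network (assumed for illustration):
#   r1: -> a, r2: a -> b, r3: a -> b (parallel route), r4: b ->
S = [
    [1, -1, -1,  0],   # a
    [0,  1,  1, -1],   # b
]

# Two pathway vectors, each a steady-state route through the network.
p1 = [1, 1, 0, 1]   # uptake, route via r2, secretion
p2 = [1, 0, 1, 1]   # uptake, route via r3, secretion

def net_production(S, v):
    """Return S . v, the net production rate of each metabolite."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]

def combine(weights, pathways):
    """Non-negative weighted sum of pathway vectors."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(pathways[0])
    return [sum(w * p[j] for w, p in zip(weights, pathways)) for j in range(n)]

v = combine([3, 1], [p1, p2])
print(v)                     # combined flux distribution
print(net_production(S, v))  # zero for every metabolite at steady state
```

Because S is linear, any non-negative weighting of `p1` and `p2` stays at steady state; the weights only determine how the flux is split between the two routes.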
The aforementioned set of constraints defines a solution space of alternative flux
distributions that can be explored via different optimization and sampling techniques
(Price, Reed et al. 2004; Palsson 2006). Flux Balance Analysis (FBA) is the most widely
studied CBM method (Varma and Palsson 1994; Kauffman, Prakash et al. 2003) which
searches for an optimal steady state solution that maximizes a certain objective function
among all feasible steady state solutions. In FBA of microorganisms, the prevailing
assumption is that the organism strives to maximize its growth rate (or biomass
production), which therefore serves as the objective function. To implement this, an
artificial reaction that drains biosynthetic precursors in the ratio required to produce
the cellular components is added to CBM models of microorganisms (Varma, Boesch et al. 1993). Notably,
FBA under the biomass maximization hypothesis was found to successfully predict an
impressive array of phenotypes observed in microorganisms, such as their growth rates
(Edwards, Ibarra et al. 2001), uptake rates, by-product secretion (Varma, Boesch et al.
1993), the outcomes of adaptive evolution (Ibarra, Edwards et al. 2002; Fong and Palsson
2004), gene expression levels (Famili, Förster et al. 2003), metabolic flux rates (Segre,
Vitkup et al. 2002; Wiback, Mahadevan et al. 2004; Shlomi, Berkman et al. 2005), and
knockout lethality (Edwards and Palsson 2000).
FBA utilizes Linear Programming (LP) to find an optimal flux vector v satisfying
the linear equations (1-2) and optimizing a linear objective function (Figure 5). However,
while investigating different metabolic phenotypes and conditions, other cellular objective
functions and optimization techniques were explored (Price, Reed et al. 2004). For
example, maximization of ATP production was postulated as a cellular objective
(Majewski and Domach 1990; Ramakrishna, Edwards et al. 2001). In (Burgard and
Maranas 2003) a Quadratic Programming (QP) method was used to find a flux
distribution with a minimum Euclidean distance from a set of experimentally measured
fluxes. Minimization of Metabolic Adjustment (MOMA) also employs QP to identify a
flux distribution in the flux space of a knockout strain, with a minimum Euclidean
distance from the wild-type flux distribution (Segre, Vitkup et al. 2002). In a similar
manner, Regulatory On/Off Minimization (ROOM) employs Mixed Integer Linear
Programming (MILP) to minimize the Boolean regulatory changes between the wild-type
and knockout strain fluxes (Shlomi, Berkman et al. 2005).
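For reference, the FBA and MOMA problems described above can be written compactly as follows. This is a sketch rather than the formulation of the cited papers: the objective weight vector $c$, the lower and upper flux bounds $v_{min}$ and $v_{max}$, the wild-type flux vector $w$, and the set $KO$ of reactions disabled by the knockout are symbols introduced here for illustration.

```latex
% FBA: a linear program over the steady-state flux space
\max_{v} \; c^{T} v
\quad \text{s.t.} \quad S \cdot v = 0, \quad v_{min} \le v \le v_{max}

% MOMA: a quadratic program over the flux space of the knockout strain
\min_{v} \; \lVert v - w \rVert_{2}^{2}
\quad \text{s.t.} \quad S \cdot v = 0, \quad v_{min} \le v \le v_{max},
\quad v_{j} = 0 \;\; \forall j \in KO
```

Minimizing the squared Euclidean distance gives the same minimizer as minimizing the distance itself, and keeps the objective quadratic.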
To exhaustively determine all alternative optimal solutions, (Lee, Palakornkule et
al. 2000) applied a MILP formulation to a small metabolic network, consisting of 33
reactions and 30 metabolites. Later, a combined metabolic/regulatory CBM
model was reconstructed for E. coli (Covert, Knight et al. 2004). In SR-FBA, MILP is
utilized to consider a set of additional Boolean variables that translate the Boolean logic
underlying regulatory constraints and the mapping between genes and reactions into a form
of linear equations (Shlomi, Eisenberg et al. 2007). OptKnock defines a bi-level
optimization problem that finds the minimal set of gene deletions that maximize the
production of a desired metabolite under the noted stoichiometric constraints (Burgard,
Pharkya et al. 2003). Drawing upon LP duality theory, the bi-level optimization problem
is elegantly transformed into a single MILP problem (Bard 1998). These examples
of MILP methods and other computational approaches (Lee, Gianchandani et al. 2006)
were largely applied to CBM models of microorganisms.
Figure 5: Flux Balance Analysis formulation: An illustrative example of employing LP to find a steady state flux distribution for the network shown in Figure 4a. The figure is adapted from (Lee, Gianchandani et al. 2006).
Figure 6: Constraint-based modeling: Application of constraints to a reconstructed metabolic network leads to a defined solution space in which a cell’s network must operate. From this solution space a number of methods have been developed that help predict or explain phenotypic behaviour. Linear optimization can be used to find solutions in the space that maximize or minimize a given objective, and mixed-integer linear programming (MILP) can be used to find multiple optima if they exist. Elementary mode analysis and extreme pathway analysis can be used to characterize vectors in the solution space; the edges of the space correspond to extreme pathways (Lee, Gianchandani et al.) and are a subset of the elementary modes (EM). Phenotypic phase plane analysis shows for what conditions the metabolic network operates under different limitations. The effects of gene deletions can also be computed. In the diagram the old optimal solution (point a) does not lie in the new solution space. A new optimum can be calculated (point b), or a suboptimal solution that is closest to the old optimum can be calculated (point c). In addition, work has been done by using experimental flux measurements (indicated by a point) to back-calculate objective functions (indicated by vectors). Figure from (Reed and Palsson 2003)
1.3 Modeling Human Metabolism
Research into human metabolism and its regulation has expanded rapidly due to the
emergence of metabolic diseases such as diabetes and obesity as major sources of
morbidity and mortality (Lanpher, Brunetti-Pierri et al. 2006; Muoio and Newgard 2006),
with metabolic enzymes and their regulators increasingly emerging as viable drug targets
(Shi and Burn 2004; Altucci, Leibowitz et al. 2007). In addition, a common hypothesis
exists that malfunctions in energy metabolism may play a central role in a wide range of
age-related disorders and various forms of cancer (Wallace 2005). However, while much
work has been done in the context of applying constraint-based modeling to study the
metabolism of micro-organisms, large-scale modeling of human metabolism is still in its
infancy. In terms of reconstructing human metabolic networks, most of the previous work
has focused on characterizing distinct metabolic pathways (Kanehisa and Goto 2000;
Romero, Wagg et al. 2004).
Reconstructions of large-scale human metabolic networks had until recently been
performed only for specific cell types and organelles. The human red blood cell (RBC)
has a simple and well-characterized metabolism that is described by both
kinetic (Mulquiney and Kuchel 1999) and CBM models (~30 reactions, ~40 metabolites;
(Wiback and Palsson 2002)) that were utilized to study its metabolic behaviour under
a variety of conditions. For example, (Mulquiney and Kuchel 1999) studied the
regulation and control of the key regulatory metabolite 2,3-bisphosphoglycerate in glycolysis.
In (Durmuş Tekir, Çakır et al. 2006) CBM techniques were applied to study several RBC
enzymopathies and indicated that RBC metabolism is mostly affected by the glucose-6-
phosphate dehydrogenase and phosphoglycerate kinase enzymopathies. An FBA model
for the metabolism of neurotransmitter glutamate was constructed to study its metabolism
in the brain, and pointed at several regulatory points that govern the release of this major
stimulatory neurotransmitter (Chatziioannou, Palaiologos et al. 2003). However, this
model is partial (16 reactions, 13 metabolites) and includes only a subset of reactions
associated with glutamate metabolism.
A study on the cardiac mitochondria exhibited a wider scope reconstruction
(~190 reactions, ~230 metabolites) and employed a CBM approach to examine the
capabilities of the reconstructed network to fulfil three metabolic functions: ATP
production, heme synthesis, and mixed phospholipid synthesis (Vo, Greenberg et al. 2004). In a later
study, LP and uniform random sampling were applied to study mitochondrial activity
under four metabolic conditions: normal physiologic, diabetic, ischemic, and dietetic
(Thiele, Price et al. 2005), implying reduced flexibility of the metabolic network under
abnormal conditions. This study simulated suggested treatments to evaluate their impact
on diabetic conditions and deduced that neither normalized glucose uptake nor decreased
ketone body uptake have a positive effect on the mitochondrial energy metabolism. It
also showed that the experimentally observed reduced activity of pyruvate dehydrogenase
in vivo under diabetic conditions could be a result of stoichiometric constraints and
therefore would not necessarily require enzymatic inhibition.
A major step forward was made in recent studies by (Duarte, Becker et
al. 2007) and by (Ma, Sorokin et al. 2007), which reconstructed the global human metabolic
network based on an extensive evaluation of genomic and bibliomic data (about 1500
references). These networks included ~4000 reactions, ~3000 metabolites, and ~1500
genes mapped to the various reactions over 7 organelles. A comparison between the two
networks implies that the network reconstructed by Ma et al. is more extensive, but it was
not assembled as a CBM model. The resulting network models are, however, not
tissue-specific. Furthermore, CBM methods that explore the solution space of metabolic states
described by the network model of Duarte et al. were not applied in a comprehensive
manner. Rather, the characterization by Duarte et al. of genome-scale changes in metabolic
behaviour following gastric bypass surgery was essentially based on the topological
properties of the network, thus not utilizing the stoichiometric data embedded in the
model.
The task of adapting constraint-based modeling methods from the realm of
microorganisms to that of multi-cellular organisms encounters two main hurdles: One
major difficulty relates to the fact that different tissues have different metabolic
objectives that are not well characterized and are largely unknown. This is in contrast to
modeling microorganisms where a simple objective function (such as maximizing the
biomass production rate) can be used together with the FBA method to predict
biologically plausible flux distributions. Another major difficulty relates to the lack of
information on tissue-specific metabolite uptake and secretion, which is essential for
FBA employment.
1.4 High-throughput Molecular Data and Metabolic Model Integration
The availability of high throughput transcriptomic, proteomic and metabolomic data
raises an emerging challenge of overlaying this data on top of the reconstructed metabolic
networks, to more accurately infer the metabolic activity reflected in the data. Similar
challenges arise when integrating such high-throughput functional data with networks of
physical molecular interactions, such as protein-protein and protein-DNA, towards the
inference of functional modules and control mechanisms (Luscombe, Madan Babu et al.
2004; Chuang, Lee et al. 2007). Existing methods for integrating functional data with a
metabolic network are of two types: (i) Network topology-based – considering only the
structure of a metabolic network and overlaying high-throughput data to foster insight
into metabolic hotspots or pathways that are significantly altered under certain conditions
(Hu, Mellor et al. 2005; Joyce and Palsson 2006); and (ii) Constraint-based – integrating
the high-throughput
functional data within a constraint-based modeling approach, to improve the prediction of
the actual metabolic flux distribution through the network.
Utilizing gene and protein expression to predict metabolic flux is a challenging
task due to the complex mapping between the two. Previous studies have found a strong
qualitative correspondence between gene expression and measured (Daran-Lapujade,
Jansen et al. 2004; Fong and Palsson 2004) as well as predicted (Stelling, Klamt et al.
2002; Famili, Förster et al. 2003; Akesson, Forster et al. 2004; Bilu, Shlomi et al. 2006)
metabolic fluxes in microbes. However, the correlation between expression and
metabolic flux is generally moderate and in some cases significant transcriptional
changes do not reflect changes in flux (Banta, Vemula et al. 2007), and vice-versa,
significant changes in measured flux may not reflect transcriptional changes (Yang, Hua
et al. 2002; Tummala, Junne et al. 2003). These discrepancies may result from
hierarchical regulation, reflecting post-transcriptional regulation of protein synthesis and
degradation rates, and post-translational modifications that represent additional regulatory
mechanisms which affect the potential activity rate of metabolic enzymes (Park, Lee et al.
2005). Furthermore, they also arise due to an additional level of flux regulation that is not
reflected in gene expression, termed metabolic regulation. The latter denotes the effect of
metabolite concentrations on the actual enzyme activity through allosteric and mass
action effects (Rossell, van der Weijden et al. 2006).
High-throughput experiments give rise to different biological networks, such as
signaling, transcriptional regulatory and metabolic networks. The analysis of these
networks has mostly involved exploration of static (topological) properties (Luscombe,
Madan Babu et al. 2004). However, while static analysis provides some insight into the
network function, it does not reveal the dynamics arising from a specific temporal, spatial
and physiological biological context. The dynamics of a biological system can be
revealed by the integration of diverse genome and metabolome wide data, such as gene
expression levels, with the aforementioned static networks. Integration of networks with
high-throughput data was shown to be advantageous for various networks and data
sources. Network-level analysis of experimental data facilitates a more comprehensive
perception of the investigated system and its underlying molecular mechanism
(Workman, Mak et al. 2006; Chuang, Lee et al. 2007). For example, integration of
protein–protein interaction networks and gene expression data identifies markers that are
more reproducible than individual marker genes selected without network information;
Chuang et al. demonstrated that such markers achieve higher accuracy in the
classification of metastatic versus non-metastatic tumors. Large unexpected changes in
the underlying regulatory network architecture can be uncovered via the dynamics of a
biological network on a genomic scale, by integrating transcriptional regulatory
information and gene-expression data. Luscombe et al. performed such integrations for
multiple conditions in Saccharomyces cerevisiae.
The emergence of metabolism as a key factor in common diseases makes the
integration of genome and metabolome wide data with a metabolic network potentially
very informative. Several CBM methods for analyzing and predicting metabolic flux
distributions based on gene expression data have been previously suggested. The methods
of (Akesson, Forster et al. 2004) and (Becker and Palsson 2008) use gene expression data
to identify genes that are absent or likely to be absent in certain contexts and search for
metabolic states which prevent (or minimize) the flux through the associated metabolic
reactions.
A recent method by (Shlomi, Cabili et al. 2008) considers data on both lowly and
highly expressed genes in a given context as cues for the likelihood that their associated
reactions carry metabolic flux, and employs CBM to accumulate these cues into a global,
consistent prediction of the metabolic state. Applied to a metabolic network model of the
yeast S. cerevisiae, this method was shown to accurately predict changes in metabolic
fluxes across different growth media, in accordance with measured flux data. The method
was further applied to predict human tissue metabolism, based on tissue-specific gene and
protein expression data. The analysis showed that the activity of genes responsible for
metabolic diseases is not directly manifested in enzyme-expression data, though can still
be correctly predicted by expression integration with a metabolic network, as validated by
large-scale mining of tissue-specificity data.
(Shlomi, Cabili et al. 2008) Formulation:
A detailed Boolean gene-to-reaction mapping (part of the metabolic network model of
Duarte et al.) is employed to identify a tissue-specific expression state for each reaction,
reflecting whether its enzyme-encoding genes are classified as expressed in the tissue.
Specifically, this is done by modifying the Boolean mapping to account for tri-valued
expression states, assigning highly, lowly and moderately expressed genes values of 1,
-1 and 0, respectively, and replacing the logical ‘and’ and ‘or’ operators with ‘min’ and
‘max’ expressions, respectively. This analysis results in a subset of the reactions in the
model (denoted $R_H$) that is defined to be highly expressed and another subset (denoted
$R_L$) defined as lowly expressed. For each tissue, the following mixed integer linear
programming (MILP) problem is formulated to find a steady-state flux distribution
satisfying stoichiometric and thermodynamic constraints, while maximizing the number
of reactions whose activity is consistent with their expression state:
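$\max_{v,\,y^+,\,y^-} \; \sum_{i \in R_H} (y_i^+ + y_i^-) + \sum_{i \in R_L} y_i^+$
s.t.
(1) $S \cdot v = 0$
(2) $v_{\min} \le v \le v_{\max}$
(3) $v_i + y_i^+ (v_{\min,i} - \varepsilon) \ge v_{\min,i}, \quad i \in R_H$
(4) $v_i + y_i^- (v_{\max,i} + \varepsilon) \le v_{\max,i}, \quad i \in R_H$
(5) $v_{\min,i} (1 - y_i^+) \le v_i \le v_{\max,i} (1 - y_i^+), \quad i \in R_L$
$v \in \mathbb{R}^m, \quad y_i^+, y_i^- \in \{0, 1\}$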
where $v$ is the flux vector and $S$ is an $n \times m$ stoichiometric matrix, in which $n$ is the
number of metabolites and $m$ is the number of reactions. The mass balance constraint is
enforced in equation (1). Thermodynamic constraints that restrict flow direction are
imposed by setting $v_{\min}$ and $v_{\max}$ as lower and upper bounds on flux values in equation
(2), respectively. For each expressed reaction, the Boolean variables $y_i^+$ and $y_i^-$ represent
whether the reaction is active (in either direction) or not. Specifically, a highly expressed
reaction is considered to be active if it carries a significant positive flux that is greater
than a positive threshold $\varepsilon$ (equation (3)) or a significant negative flux (equation (4),
for reversible reactions). We chose a threshold of $\varepsilon = 1$ to determine reactions’ flux
activity for highly expressed reactions, though various other choices provide qualitatively
similar results. For each lowly expressed reaction, the Boolean variable $y_i^+$ represents
whether the reaction is inactive (equation (5)). Specifically, lowly expressed reactions are
considered to be inactive if they carry zero metabolic flux, though changing equation (5)
to enable these reactions to carry a low metabolite flux (that is, with an upper bound
lower than $\varepsilon$) and still be considered inactive provides qualitatively similar results.
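The two ingredients of this formulation, the tri-valued gene-to-reaction mapping and the count of reactions whose activity agrees with their expression state, can be illustrated on a toy network. The following is a minimal sketch and not the implementation used in (Shlomi, Cabili et al. 2008): the rule encoding, the function names, the three-reaction chain and the gene labels are all hypothetical, and the sketch assumes the usual convention in which a logical ‘and’ (all subunits required) takes the minimum and a logical ‘or’ (isozymes) takes the maximum. It only evaluates the objective for a given candidate flux vector; it does not solve the MILP.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical rule encoding: a gene name, or ("and" | "or", rule, rule, ...).
Rule = Union[str, tuple]


def reaction_state(rule: Rule, gene_state: Dict[str, int]) -> int:
    """Tri-valued state of a reaction: 1 (high), -1 (low) or 0 (moderate)."""
    if isinstance(rule, str):
        # Genes without data are treated as moderately expressed (assumption).
        return gene_state.get(rule, 0)
    op, *operands = rule
    values = [reaction_state(r, gene_state) for r in operands]
    if op == "and":   # every subunit is required, so the weakest one decides
        return min(values)
    if op == "or":    # isozymes are interchangeable, so the strongest decides
        return max(values)
    raise ValueError(f"unknown operator: {op!r}")


def is_steady_state(S: Sequence[Sequence[float]], v: Sequence[float],
                    tol: float = 1e-9) -> bool:
    """Check the mass balance constraint S.v = 0 row by row."""
    return all(abs(sum(s * f for s, f in zip(row, v))) <= tol for row in S)


def similarity(v: Sequence[float], states: Sequence[int], eps: float = 1.0) -> int:
    """Count reactions whose flux activity agrees with their expression state."""
    score = 0
    for flux, state in zip(v, states):
        if state == 1 and abs(flux) >= eps:    # highly expressed and active
            score += 1
        elif state == -1 and flux == 0:        # lowly expressed and inactive
            score += 1
    return score


# Toy chain (hypothetical): R1 imports A, R2 converts A to B, R3 exports B.
S = [[1, -1, 0],    # metabolite A
     [0, 1, -1]]    # metabolite B
rules: List[Rule] = ["g1", ("and", "g2", "g3"), ("or", "g4", "g5")]
genes = {"g1": 1, "g2": 1, "g3": -1, "g4": -1, "g5": 1}
states = [reaction_state(r, genes) for r in rules]

for v in ([2, 2, 2], [0, 0, 0]):
    print(v, is_steady_state(S, v), similarity(v, states))
```

In this toy chain the expression states come out as [1, -1, 1]. The steady-state flux vector [2, 2, 2] agrees with two of them and [0, 0, 0] with only one, so an optimizer maximizing the objective would prefer the former even though it forces flux through the lowly expressed middle reaction.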
The optimization maximizes the number of highly expressed reactions ($R_H$) that are
active and the number of lowly expressed reactions ($R_L$) that are inactive. A solution
found by the MILP solver is guaranteed to be an optimal one in terms of the objective
function maximized, but the solution identified may not be unique as a space of
alternative optimal solutions may exist. In this case, the space of optimal solutions
represents alternative steady-state flux distributions obtaining the same similarity with the
expression data. To account for these alternative solutions, a variant of Flux Variability
Analysis was employed. The method computes for each metabolic reaction whether it is
predicted to be always active (or, in the opposite case, always inactive) in a certain tissue
across the entire solution space. This is performed by solving two MILP problems (each
similar to the one described above) for each reaction to find the maximal attainable
similarity with the expression data when the reaction is forced to be activated (denoting
this similarity $Z_{active}$) and when it is forced to be inactivated (denoting this similarity $Z_{inactive}$).
A reaction is then considered to be active in this tissue if $Z_{active} > Z_{inactive}$ (that is, a higher
similarity with the expression data is achieved when the reaction is active than
when it is inactive) with a confidence level of $Z_{active} - Z_{inactive}$. Conversely, it is considered to be
inactive if $Z_{active} < Z_{inactive}$, with a confidence of $Z_{inactive} - Z_{active}$. In case $Z_{active} = Z_{inactive}$ (that is, the same similarity
with the expression data can be achieved both when the reaction is forced to be active or
inactive), the activity state is considered to be undetermined.
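The decision rule applied to each reaction once the two MILP problems have been solved can be sketched as follows. This is an illustration only: the function name and arguments are hypothetical, the two optimal similarity values are assumed to have been computed already by a MILP solver, and taking the confidence to be the difference between them is an assumption of this sketch.

```python
from typing import Tuple


def call_activity(z_active: float, z_inactive: float) -> Tuple[str, float]:
    """Classify one reaction from the two optimal similarity values.

    z_active   -- best similarity when the reaction is forced to be active
    z_inactive -- best similarity when the reaction is forced to be inactive
    Returns the activity call and a confidence (assumed: the gap between them).
    """
    gap = z_active - z_inactive
    if gap > 0:
        return "active", gap
    if gap < 0:
        return "inactive", -gap
    return "undetermined", 0.0


# Hypothetical similarity values for three reactions.
for name, za, zi in [("R1", 10, 7), ("R2", 5, 8), ("R3", 6, 6)]:
    print(name, call_activity(za, zi))
```

A reaction for which both forced problems reach the same optimum carries no information from the expression data and is reported as undetermined rather than being assigned to either class.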
The method implementation requires solving multiple, complex Mixed-Integer
Linear Programming (MILP) optimization problems, requiring extensive parallel
computing resources, and has hence not been readily accessible for the research
community since its publication.
In this work we introduce an Integrative Metabolic Analysis Tool (iMAT),
enabling the integration of transcriptomic, proteomic, and reactome array data with
metabolic network models to predict metabolic flux, developing variants of the approach
presented in (Shlomi, Cabili et al. 2008). Specifically, we present a new constraint-based
method for the integration of a genome-scale metabolic network with reactome array data
to predict metabolic flux activity. While high-throughput transcriptomic and proteomic
data have been available for quite some time now, the reactome array technology has
been very recently developed by (Beloqui, Guazzaroni et al. 2009), providing exciting
new genome scale data on the rate of metabolite transformation by enzymes present in
cell extracts. All together, iMAT supports the integration of functional data with an array
of different models, including: (i) a highly curated metabolic network model of human
metabolism by (Duarte, Becker et al. 2007), enabling the prediction of metabolic activity
in various tissues and cell types; (ii) common model organisms such as E. coli and
S. cerevisiae; and (iii) an
array of automatically reconstructed networks for 160 bacteria (Overbeek, Begley et al.
2005), enabling the prediction of metabolic activity under various environmental and
genetic conditions. Importantly, the usage of iMAT is straightforward and user friendly,
starting with the submission of the functional data for a certain organism via the web and
receiving a visualization of the organism’s metabolic network showing the most likely
predicted metabolic flux. The applicability of iMAT is demonstrated here for the
prediction of human breast cancer metabolism via gene expression data, and
Pseudomonas putida metabolism via reactome array data.
1.5 Human Cancer and Metabolism
In 2000, Douglas Hanahan and Robert Weinberg published a review detailing the six
hallmarks of cancer. These are six phenotypes that a tumour requires in order to become a
fully fledged malignancy: (1) persistent growth signals, (2) evasion of apoptosis, (3)
insensitivity to anti-growth signals, (4) unlimited replicative potential, (5) angiogenesis
and (6) invasion and metastasis. However, it is becoming increasingly clear that these
phenotypes do not portray the whole story and that other hallmarks are necessary: one of
which is a shift in cellular metabolism. The tumour environment creates a unique
collection of stresses to which cells must adapt in order to survive. This environment is
formed by the uncontrolled proliferation of cells, which ignore the cues that would create
normal tissue architecture. As a result, the cells forming the tumour are exposed to low
oxygen and nutrient levels, as well as high levels of toxic cellular waste products, which
are thought to propel cells towards a more transformed phenotype, resistant to cell death
and pro-metastatic. (Tennant, Duran et al. 2009)
In order to sustain the rapid proliferation and to counteract the hostile
environment observed in tumours, cells must increase the rate of metabolic reactions to
provide the adenosine triphosphate (ATP), lipids, nucleotides and amino acids necessary
for daughter cell production. Cells that do not undergo these changes will not survive the
tumour environment, resulting in the selection of those with a transformed metabolic
phenotype. One seemingly necessary metabolic alteration is the increase in the rate of
glycolysis, the conversion of glucose to pyruvate. In work beginning over 80 years ago, Otto
Warburg noted that tumour cells use glycolysis (‘fermentation’), even in the presence of
O2. This was termed ‘aerobic glycolysis’ and since then has been considered as a
universal phenotype of tumours. In normal cells, an interplay exists between
mitochondrial respiration and glycolysis in which mitochondrial respiration inhibits
glycolytic flux, a phenomenon originally described in yeast by Pasteur in 1861 (the
‘Pasteur Effect’) and later expanded upon and extended to mammalian tissues by
Crabtree. High rates of aerobic glycolysis are not a mechanism unique to tumours, as all
energy-demanding cells utilize glycolysis as well as mitochondrial respiration for ATP
production. However, the phenotype that is unique to cancer is the high levels of lactate
that are produced from the increased rate of aerobic glycolysis. Forcing proliferating cells
into a resting, differentiated phenotype can decrease glycolytic rate and promote
oxidative phosphorylation (OXPHOS) as the major ATP generating process, indicating
that, at least in the case of normal cells, this loss of the mitochondrial inhibition of
glycolysis is reversible. Glycolysis produces only 2 mol of ATP per mole glucose, an
inefficient bioenergetic process when compared with OXPHOS (up to 36 mol of ATP per
mole glucose); so in order to maintain normal ATP levels in the tumour, the rate of
glycolysis must be much greater than that observed in most normal tissues (exceptions
include the heart and kidney). This hunger of tumours for glucose is utilized in current
clinical practice, as primary and distant metastatic sites of tumours can be imaged in
patients using their uptake of a radiolabelled glucose derivative (18fluoro-2-
deoxyglucose). The change in metabolism cannot be purely attributed to alterations in
allosteric and product/substrate regulation of the metabolic enzymes. A concerted ‘energy
response’ also occurs involving factors such as mammalian target of rapamycin (mTOR),
Myc and the hypoxia-inducible factors (HIFs), which is vital for the long term metabolic
transformation of tumours. (Tennant, Duran et al. 2009)
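The ATP yields quoted above give a rough sense of how much faster glycolysis must run to match OXPHOS. The following back-of-the-envelope sketch uses only the two figures from the text (2 mol and up to 36 mol of ATP per mole of glucose); the function names are hypothetical, and the calculation idealises a cell that relies on one pathway exclusively, so the result is an upper bound rather than a measured value.

```python
ATP_PER_GLUCOSE_GLYCOLYSIS = 2    # mol ATP per mol glucose (from the text)
ATP_PER_GLUCOSE_OXPHOS_MAX = 36   # upper bound quoted in the text


def glucose_needed(atp_demand: float, atp_per_glucose: float) -> float:
    """Moles of glucose consumed to supply a given ATP demand."""
    return atp_demand / atp_per_glucose


def glycolytic_fold_increase(
    oxphos_yield: float = ATP_PER_GLUCOSE_OXPHOS_MAX,
    glycolysis_yield: float = ATP_PER_GLUCOSE_GLYCOLYSIS,
) -> float:
    """How many times more glucose glycolysis alone needs for the same ATP."""
    return oxphos_yield / glycolysis_yield


print(glycolytic_fold_increase())   # 18.0
print(glucose_needed(36, ATP_PER_GLUCOSE_GLYCOLYSIS))   # 18.0 mol glucose for 36 mol ATP
```

Under these idealised assumptions, a cell producing all of its ATP by glycolysis would consume up to 18 times more glucose than one using OXPHOS at its maximal yield.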
Other than increased aerobic glycolysis, cancer cells also utilize glucose under
anaerobic conditions to compensate for the reduced mitochondrial ATP generation.
Hypoxia (low oxygen) and anoxia (complete lack of oxygen) are both present in most, if
not all solid tumours. Hypoxia specifically is thought to be an important factor in
supporting and directing tumour progression. However, contrary to being under constant
hypoxia, one important facet of the tumour environment is that the hypoxia experienced
by the cell is thought to be variable, even cycling between normal oxygen tension and
acute hypoxia (< 10 mm Hg O2). (Tennant, Duran et al. 2009)
Although strongly and rapidly up-regulated under short periods of hypoxia, during
chronic hypoxia, HIF levels are decreased. Only the areas furthest from functional blood
vessels experience this effect, and in the absence of an angiogenic response, they are
thought to form the necrotic areas in a tumour. The down-regulation of HIFα in these
circumstances is thought to help protect against the necrosis of cells, but this may only be
for a limited time period. Most of the hypoxic regions found in tumours are exposed to
fluctuating levels of O2, which allows for continued HIF stabilization. However,
fluctuating levels of O2 can also cause an increase in intracellular levels of reactive
oxygen species (ROS). It has been observed that ROS can be produced under hypoxia
due to inefficient electron transport chain (ETC) activity, and the resultant leakage of
electrons, mainly from complexes I and III of the ETC. (Tennant, Duran et al. 2009)
HIF expression in tumours—whether due to hypoxia, TCA cycle enzyme
mutation, mitochondrial dysfunction or aberrant growth factor stimulation—is known to
be vitally important for their progression. Studies in xenografts have shown that
decreasing HIF expression in tumours inhibits growth, and data from patient samples
have shown a correlation between HIF, HIF target gene expression and disease
progression and patient survival. This positions HIFα firmly as a therapeutic target, and a
number of antitumour therapies have been designed to interfere with HIF and its target
genes. (Tennant, Duran et al. 2009)
In order to undergo glycolysis, glucose enters the cell via a facilitative glucose
transporter (Figure 7). A number of glucose transporters are up-regulated in tumours,
Glut1 being particularly important in the tumour response to hypoxia. Up-regulation of
this transporter immediately increases the intracellular availability of glucose for
metabolic reactions, most of which are initiated by its phosphorylation by hexokinase to
give glucose 6-phosphate (G6P, see Figure 7). Hexokinase II, one of the four hexokinase
isozymes, is a target of many transcription factors important in tumorigenesis, including
HIF1 and Myc (through the ‘carbohydrate response element’). Hexokinase is also thought
to have a role in protecting the cell against apoptosis. (Tennant, Duran et al. 2009)
The conversion of pyruvate to lactate appears important for the maintenance of
tumour cell viability. This is carried out by lactate dehydrogenase (Figures 7 and 8A), of
which the A isoform is strongly upregulated in tumours. Lactate production is important
for the recycling of cytosolic nicotinamide adenine dinucleotide (NAD+) in the absence of
functional mitochondrial-cytoplasmic NADH (the reduced form of NAD+) shuttles due to
decreased OXPHOS (Figure 8B). The regeneration of cytosolic NAD+ is vital for efficient
glycolysis. In studies carried out by Fantin et al., lactate dehydrogenase A suppression
not only pushed cells towards a more OXPHOS phenotype but also slowed their
proliferation in vitro, and in an in vivo model of breast cancer almost tripled the survival
of mice compared with a lactate dehydrogenase A-expressing control. (Tennant, Duran
et al. 2009)
In order to sustain the rapid proliferation characteristic of tumours, increased
synthesis of both fatty acids and nucleotide precursors must occur. A mechanism used by
cells to support this is the diversion of glycolytic intermediates into the pentose
phosphate pathway, either from G6P (using the oxidative arm) or from fructose 6-
phosphate (using the non-oxidative arm). These intermediates can then be used to reduce
nicotinamide adenine dinucleotide phosphate (NADP+) to NADPH (from the oxidative
arm only) and synthesize ribose 5-phosphate (Figure 7). The control of glycolysis by
PFK2/FBPase and TIGAR (as mentioned earlier) has the ability to divert substrates into
the oxidative arm of the PPP. Increasing PFK2/FBPase phosphatase activity or inhibiting
PFK1 by some other means (such as increase in ATP or citrate) will therefore increase
PPP activity and support rapid cellular proliferation. The diversion of G6P into the PPP
flow not only has the capacity to increase nucleotide biosynthesis but also to increase the
antioxidant capacity of the cell due to the generation of NADPH required for the
reduction of oxidized glutathione. In this respect, the acceleration of the PPP after DNA
damage, or during tumorigenesis in general, may prove important, as it provides much of
the necessary equipment with which to replicate and repair the DNA. NADPH generated by
the oxidative PPP also supports fatty acid biosynthesis required for tumour growth (see
below). Interestingly, the first two enzymes in this pathway, G6P dehydrogenase and
6-phosphogluconate dehydrogenase, are also up-regulated in transformed cells. (Tennant,
Duran et al. 2009)
Figure 7: Glycolysis and cancer. Green text: enhanced/activated in cancer. Red text: reduced/inhibited in cancer. Figure from (Tennant, Duran et al. 2009).
Figure 8: (A) TCA cycle and cancer. (B) The malate/aspartate shuttle. This process is used to transfer electrons from the cytosolic NADH pool to the mitochondria to be oxidized by the ETC. Green text—enhanced/activated in cancer. Red text—reduced/inhibited in cancer. Figure from (Tennant, Duran et al. 2009).
Proliferating cells in general and cancer cells in particular require de novo
synthesis of lipids for membrane assembly. Under conditions where PDH is not inhibited,
pyruvate is converted into acetyl-CoA and enters the TCA cycle by condensing with
oxaloacetate to form citrate (Figure 8A). This intermediate is mostly further oxidized in
the TCA cycle to produce reducing potential for the mitochondrial ETC, but can also be
used for fatty acid synthesis in the cytosol. Cytosolic citrate is converted back into
oxaloacetate and acetyl-CoA by the action of ATP citrate lyase. The reduction in levels or
activity of any of the three enzymes involved in fatty acid synthesis has been shown to
inhibit tumour growth and may therefore represent a target for tumour therapies.
Interestingly, activation of AKT has been found to inhibit the β oxidation (degradation)
of lipids by inhibiting the expression of carnitine palmitoyltransferase (CPT1A). This
further supports the anabolic reprogramming observed in tumorigenesis and the push
towards increased proliferation. (Tennant, Duran et al. 2009)
Glutaminolysis: There are two major sources of energy and carbon for cancer
cells: glucose and glutamine. Cancer cells appear to use excessive amounts of both
nutrients: more than they need for their function. One possible explanation is that high
rates of flux through these metabolic pathways can affect the regulation of other
metabolic branches, allowing high rates of proliferation. A consequence of this excess is
the increased secretion of by-products of glucose and glutamine degradation, mainly
lactate, alanine and ammonium (Figure 8A). It has been recently proposed that in this
context, glucose accounts mainly for lipid and nucleotide synthesis, whereas glutamine is
responsible for anaplerotic re-feeding of the TCA cycle, for amino acid synthesis and for
nitrogen incorporation into purine and pyrimidine for nucleotide synthesis. Glycolysis is
capable of re-feeding the TCA cycle in the presence of functional pyruvate carboxylase
(Figure 8A). In light of the rapid growth and proliferation of tumour cells, catabolic
reactions are unlikely to be used to feed the TCA cycle, predicting that in order for cells
to efficiently use glutamine for anabolic reactions, at least some pyruvate must enter the
TCA cycle, instead of being converted to lactate. (Tennant, Duran et al. 2009)
Once in the cell, glutamine is initially deaminated to form glutamate, a process
catalysed by the enzyme glutaminase (Figure 8A). Glutamate in turn can be converted
into α-ketoglutarate either by a second deamination process catalysed by the enzyme
glutamate dehydrogenase or through transamination. On entering the TCA cycle, α-
ketoglutarate is metabolized to eventually generate oxaloacetate, an important anabolic
precursor that will condense with the acetyl-CoA generated from glycolysis or
glutaminolysis to produce citrate. The importance of glutaminolysis in cancer
metabolism is evident from the considerable release of ammonium in the venous
effluent of cancer patients, and by the fact that, with time, the majority of patients
develop glutamine depletion. In fact, glutaminase has been found to be over-expressed in
a variety of tumour models and human malignancies, and the rate of glutaminase activity
correlates with the rate of tumour growth. Unfortunately, despite promising signs in
leukaemic mouse models, mammary tumours and colon carcinoma, therapeutic strategies
designed to limit the availability of glutamine to cancer cells with inhibitors of
glutaminase (6-diazo-5-oxo-L-norleucine or acivicin) failed due to severe side effects
during clinical trials. However, better knowledge of the biochemical and regulatory
processes of glutamine uptake and degradation in normal and cancer cells could
constitute a major goal in designing new strategies against cancer. (Tennant, Duran et al.
2009)
Mitochondrial citrate not exported for anabolic use is used in the TCA cycle to
produce reducing equivalents for the ETC (Figure 8A). Two of the enzymes in this
pathway, succinate dehydrogenase (SDH) and fumarate hydratase (FH), are of particular
importance for cancer. SDH is also complex II of the ETC, where reduced flavin
adenine dinucleotide (FADH2) is generated and further oxidized. It consists of four
subunits: A and B, which are associated with the inner leaflet of the mitochondrial inner
membrane and C and D, which are embedded in the mitochondrial inner membrane.
Although their function is vital for the normal working of the TCA cycle, mutations in
either FH or SDHB, SDHC or SDHD are known causes of a number of familial and
sporadic cancers, namely leiomyoma, leiomyosarcoma or renal cell carcinoma (FH),
paraganglioma and pheochromocytoma (SDHB, SDHC and SDHD). Phenotypically, all
of these mutations result in pseudohypoxia, referring to the normoxic induction of HIFα
subunits. Mechanistically, it has been shown that the increase in succinate (SDH
mutations) or fumarate (FH mutations) levels is responsible for inactivation of the PHDs
even in the presence of O2, leading to the normoxic stabilization of HIFα and up-
regulation of its downstream effectors (Figure 9). As discussed earlier, mitochondrial
ROS can be produced from both complexes I and III of the ETC under hypoxia.
However, it has been suggested that SDHB, SDHC and SDHD mutations can also result
in normoxic ROS production and HIF activation, though the role of ROS in
pseudohypoxia of SDH-deficient cells is debatable. (Tennant, Duran et al. 2009)
Figure 9: Synthesis, degradation and regulation of HIFα. αKG, α-ketoglutarate. Figure from (Tennant, Duran et al. 2009).
Amino acids and their transporters: To sustain high proliferation rates, cancer
cells are extremely dependent on extra energy and nutrient supply. Therefore, nutrient
uptake and metabolism are frequently altered and enhanced in tumour cells. Amino acids
are the primary source of cellular nitrogen. In addition to being the building blocks for
protein synthesis, they are used for nucleotide and glutathione synthesis, and the carbon
backbone can also be used for ATP synthesis. Moreover, amino acids have an important
role in regulating signalling pathways that govern cell growth and survival. Many human
tumour cells express high levels of amino acid transporters, and this correlates with
disease progression. A notable example is the alanine serine cysteine transporter 2
(ASCT2), a non-specific neutral amino acid transporter that functions as the
major transporter of glutamine in numerous cell lines. Given that glutamine has a key role
in tumour cell metabolism (discussed above) and that glutamine transport is increased in
tumour cells, it is not surprising that ASCT2 expression is also enhanced during tumour
development. ASCT2 expression is enhanced in breast, liver and brain tumours, and
inhibition of ASCT2-dependent glutamine transport inhibited the growth of colon
carcinoma cell lines. Moreover, silencing of the ASCT2 messenger RNA transcript
causes dramatic apoptosis in hepatoma cells and this appears to occur in parallel with its
role in glutamine uptake. Enhanced expression of L-type amino acid transporter 1
(LAT1), another amino acid transporter with high affinity for several essential amino
acids including leucine, tryptophan and methionine, has been reported in high-grade
astrocytomas and correlates with poor survival. LAT1 inhibition has been shown to block
glioma cell growth in both in vitro and in vivo models. These findings highlight the
growth advantage conferred to tumour cells by increased amino acid transporter
expression and point to a potential role for amino acid transporter inhibition as a
therapeutic strategy. (Tennant, Duran et al. 2009)
Mammalian target of rapamycin: As previously mentioned, cells strictly depend
on nutrient availability and growth stimuli to sustain growth and proliferation. The
regulation of these stimuli is integrated by target of rapamycin (TOR), a highly
evolutionarily conserved mechanism, present from unicellular eukaryotes to mammals. In
mammals, mTOR is involved in four different sensing mechanisms: growth factor
signalling; nutrient availability; oxygen availability and internal energetic status. All four
factors are especially important for tumour development. Solid tumours can become
limited for nutrients, oxygen and growth factors, a situation potentially leading to
energetic limitations inside the cancer cell. Therefore, de-regulation of the molecular
process that controls these mechanisms could be critical for tumour development.
(Tennant, Duran et al. 2009)
Summary: The extent to which metabolism plays a role in tumorigenesis should
not be underestimated, and drugs that can selectively target the metabolic phenotype of
the tumour and its environment are likely to at least delay, if not halt, tumour progression.
The resistance of tumours to both radiotherapy and chemotherapy can often be attributed
to their aberrant metabolism. It therefore follows that the reactivation of a more ‘normal’
metabolism could very well re-sensitize tumours to these agents. Cell metabolism is
inextricably linked to its differentiated state: if we can reverse the metabolism of a de-
differentiated, aggressive tumour to that of a more quiescent state it may become more
amenable to other interventions. Therapies that target tumour metabolism are already
being tested in pre-clinical and clinical studies, but this field is very much in its infancy.
It is anticipated that the next few years will provide more new therapeutic approaches that
target metabolic transformation. (Tennant, Duran et al. 2009)
1.5.1 Breast Cancer Metabolism
Breast cancer is the most common cancer type for women in the western world. Despite
decades of research, the molecular processes associated with breast cancer
progression are still inadequately defined. Recently, (Lu, Bennet et al. 2010) directed their
focus to the systematic alteration of metabolism, using state-of-the-art metabolomic
profiling techniques to investigate the changes of 157 metabolites during the progression
of normal mouse mammary epithelial cells to an isogenic series of mammary tumour cell
lines with increasing metastatic potentials. The results suggest a two-step metabolic
progression hypothesis during the acquisition of tumourigenic and metastatic abilities.
Metabolite changes accompanying tumour progression are identified in the intracellular
and secreted forms in several pathways, including glycolysis, tricarboxylic acid cycle,
pentose phosphate pathway, fatty acid and nucleotide biosynthesis and the GSH-
dependent anti-oxidative pathway. These results suggest possible biomarkers of breast
cancer progression as well as opportunities of interrupting tumour progression through
the targeting of metabolic pathways.
Approximately 40,000 American women succumb to breast cancer each year,
with metastasis causing the overwhelming majority of these deaths. Metastasis is a multi-
step process, requiring tumour cells to intravasate into the bloodstream, survive in the
circulation, adhere to and extravasate from the vascular network in the secondary organ,
and finally adapt to a foreign microenvironment. Recent functional transcriptomics
studies identified genes that play important roles in individual steps of breast cancer
metastasis with the MDA-MB-231 xenograft model or the 4T1 syngeneic mouse models.
However, studies with metabolomic approaches to identify key metabolites that
characterize metastasis progression are still scarce. Metabolic reprogramming was linked
to the major hallmarks of cancer, including tissue invasion and metastasis. However, its
functional role in tumour progression and metastasis remains largely undefined. In a
recent study, metabolomic profiling identified increased sarcosine synthesis as a
functionally important metabolic alteration during prostate cancer progression. Similar
efforts to identify important metabolic changes during breast cancer progression hold the
potential for providing putative diagnostic and prognostic biomarkers as well as new
therapeutic targets. In the current study, we used the 4T1 syngeneic mouse model to
systematically identify metabolite changes. (Lu, Bennet et al. 2010)
Malignant transformation of normal epithelial cells and metastasis ability
acquisition has usually been studied with the aim to identify genes and proteins that play
tumour-promoting or suppressive roles. The pace of finding such molecules has been
tremendously expedited with the development of transcriptomics and proteomics to look
for transcripts and proteins with altered abundance during malignancy. Another aspect of
molecular changes, altered metabolism, was less explored, even though the phenomenon
of aerobic glycolysis was one of the first major discoveries in cancer research. (Lu,
Bennet et al. 2010)
Fortunately, the situation is improving owing to the recent technical advances in global
analysis of metabolites using mass spectrometry or high-resolution 1H nuclear magnetic
resonance spectroscopy. High-throughput metabolomic analysis allows simultaneous
quantification of hundreds of metabolites belonging to a diverse array of metabolic
pathways in a panel of cell lines or tissues. As illustrated in (Lu, Bennet et al. 2010), 157
metabolites were profiled in six cell lines with progressively increased tumourigenicity
and metastatic ability. The analysis of intracellular metabolites clustered the lines into
three categories, normal, tumourigenic (but non-metastatic) and metastatic in general.
Results from the analysis favour a two-step metabolic progression hypothesis during
mammary tumour progression: the first step accompanies the acquisition of
tumourigenicity and includes altered glycolysis, PPP and fatty acid synthesis as well as
decreased GSH/GSSG redox pool; the second step is correlated with the gain of the
general metastatic ability and includes further changes in glycolysis and TCA cycle,
further depletion of the glutathione species, and increased nucleotides. No further
metabolite alterations correlated with stepwise increase of metastasis potential in the four
metastatic lines were resolved in the analysis. This model suggests that the fine regulation
of the ability to colonize distant organs in breast cancer may not require further dramatic
biochemical reprogramming and may instead rely more on alterations in gene expression
regulation and cellular behaviours, although we cannot rule out the potential importance
of metabolomic changes in metastatic lesions in vivo in different target organs. Our
analysis of extracellular metabolites identified increased abundance of TCA cycle
components as well as nucleotide metabolism intermediates, similar to the intracellular
results. Our findings agree with a recent study profiling a more limited set of metabolites
in the MCF10 model of mammary carcinoma. Both studies find evidence for increased
pentose phosphate pathway, TCA cycle, and fatty acid biosynthetic activity in
transformed and/or metastatic cells. Further efforts should investigate the universality of
these findings with other in vitro and in vivo preclinical models as well as with human
samples. Confirmed altered metabolic pathways may open new therapeutic avenues for
treating malignant breast cancer. Several secreted metabolites accompanying the
increased metastatic potential (malate, fumarate, deoxyguanosine, guanine, xanthine, and
hypoxanthine) should be tested for their value as diagnostic and prognostic biomarkers of
malignant breast cancer in future studies. (Lu, Bennet et al. 2010)
1.5.2 Modeling Cancer Metabolism
Cancer is a complex disease that involves multiple types of biological interactions across
diverse physical, temporal, and biological scales. This complexity presents substantial
challenges for the characterization of cancer biology, and motivates the study of cancer in
the context of molecular, cellular, and physiological systems. Computational models of
cancer are being developed to aid both biological discovery and clinical medicine. The
development of these in silico models is facilitated by rapidly advancing experimental
and analytical tools that generate information-rich, high-throughput biological data.
Statistical models of cancer at the genomic, transcriptomic, and pathway levels have
proven effective in developing diagnostic and prognostic molecular signatures, as well as
in identifying perturbed pathways. Statistically inferred network models can prove useful
in settings where data overfitting can be avoided, and provide an important means for
biological discovery. Mechanistically based signalling and metabolic models that apply a
priori knowledge of biochemical processes derived from experiments can also be
reconstructed where data are available, and can provide insight and predictive ability
regarding the behaviour of these systems. At longer length scales, continuum and agent-
based models of the tumour microenvironment and other tissue-level interactions enable
modeling of cancer cell populations and tumour progression. Even though cancer has
been among the most-studied human diseases using systems approaches, significant
challenges remain before the enormous potential of in silico cancer biology can be fully
realized (Edelman, Eddy et al. 2009).
Monumental advances in molecular and cellular biology, beginning in the latter
half of the 20th century and continuing today, have provided an increasingly detailed
portrait of human biology from the molecular to physiological levels. These advances
have centred on ‘reductionist’ experimental approaches aiming to annotate a vast array of
biological components, from cells and tissues to genes and proteins. Collectively, these
components represent a ‘parts list’ for biological systems (e.g., biochemical pathways,
larger interaction networks). At scales beyond a handful of interacting components,
however, simple analysis techniques can become limited in providing comprehensible
insight into resulting phenotypic behaviours. Systems biology is a rapidly growing
discipline that employs an integrative approach to characterize biological systems, in
which interactions among all components in a system are described mathematically to
establish a computable model. These in silico models—which complement traditional in
vivo animal models—can be simulated to quantitatively study the emergent behaviour of
a system of interacting components. Model development in the systems biology paradigm
is enabled by the description of parts and interactions from reductionist biology, and also
depends upon quantitative measurements. The advent of high-throughput experimental
tools has allowed for the simultaneous measurement of thousands of biomolecules,
paving the way for in silico model construction of increasingly large and diverse
biological systems. Integrating heterogeneous dynamic data into quantitative predictive
models holds great promise to significantly increase our ability to understand and
rationally intervene in disease-perturbed biological systems. This promise, particularly
with regard to personalized medicine and medical intervention, has motivated the
development of new methods for systems analysis of human biology and disease.
(Edelman, Eddy et al. 2009)
Cancer is an intrinsically complex and heterogeneous disease, making it
particularly amenable to systems biology approaches. Malignant tumours develop as a
function of multiple biological interactions and events, both in the molecular domain
among individual genes and proteins, and at the cellular and physiological levels between
functionally diverse somatic cells and tissues (Figure 10). At the molecular level, genetic
lesions interact synergistically to evade tumour suppression pathways, with no single
mutation typically sufficient to cause transformation. Beyond genetic mutations,
transformed cells can exhibit changes in expression of hundreds to thousands of genes
and proteins. Genetic modifications observed in cancer are often accompanied by
changes at the epigenetic level. The convolution of genetic effects and epigenetic
modifications illustrates the complex, nonlinear relationship between molecular state and
cellular cancer phenotype, emphasizing the need for heterogeneous data integration
through in silico models. The diversity of cancer models mirrors the broad array of
molecular and physiological events characteristic of the disease (Figure 11). The most
coarse-grained approaches use statistical analysis of high-throughput expression data to
identify molecular signatures of cancer phenotypes. Such signatures are indicative of
aberrant function of genes or pathways, and can be used to predict the type, stage, or
grade of biopsied tumour samples. More advanced methods aim to statistically infer the
structure and/or quantitative relationships among biomolecules within interaction and
regulatory networks of importance in cancer. Alternatively, stoichiometric or kinetic
models of biochemical reaction networks, constructed in a bottom-up, annotation-based
manner, can be used to simulate in mechanistic detail the behaviour of metabolism or
signal transduction in cancer. (Edelman, Eddy et al. 2009)
Figure 10: Molecular and physiological complexity in cancer. Figure from (Edelman, Eddy et al. 2009)
Figure 11: Biological scales and potential modeling approaches. Figure from (Edelman, Eddy et al. 2009)
The complexity of intracellular phenomena observed in cancer is mirrored by
equally intricate interactions between cells and across somatic tissues. Among the most
important biological systems mediating cancer development is the local tumour
microenvironment, a complex, interacting system of cells and extracellular moieties.
Contributory agents include the extracellular matrix, cooperating tumour and proximate
‘host’ cells, extracellular signaling factors, and the metabolic context of local tissue.
Other important agents include the infiltrating leukocytes and cytokines of the immune
system. Human cancers also exhibit other major interactions with somatic tissues
concomitant to malignant invasion, such as tumour-induced angiogenesis. The potential
responses to chemotherapeutics, radiotherapy, and surgical procedures represent additional
confounding factors in the cellular and physiological behaviour of cancer cells. The
heterogeneous nature of the tumour microenvironment poses substantial modeling
challenges, yet ongoing research has sought to characterize these cancer systems,
including continuum and discrete models. (Edelman, Eddy et al. 2009)
Despite significant experimental and analytic challenges arising from cancer’s
complexity, modeling has already successfully led to insights into cancer biology and
treatment, as will be discussed herein. Some of the earliest models describing the
molecular basis of cancer over half a century ago implicated the absolute number of
genetic mutations as causative for malignancy. Today, important efforts in sequencing the
human genome and now individual cancers mean that malignant genetic transformations
can be studied and modeled in the context of the entire genome. (Edelman, Eddy et al.
2009) describe key examples of recent in silico modeling efforts in cancer. These include
(1) statistical models of cancer, such as molecular signatures of perturbed genes and
molecular pathways, and statistically inferred reaction networks; (2) models that
represent biochemical, metabolic, and signaling reaction networks important in
oncogenesis, including constraint-based and dynamic approaches for the reconstruction
of such networks; and (3) continuum and agent-based models of the tumour
microenvironment and tissue level interactions. (Edelman, Eddy et al. 2009)
In contrast to statistically inferred networks, biochemical reaction networks are
constructed to represent explicitly the mechanistic relationships between genes, proteins,
and the chemical interconversion of metabolites within a biological system (Figure 12).
In these models, network links are based on pre-established biomolecular interactions
rather than statistical associations; significant experimental characterization is thus
needed to reconstruct biochemical reaction networks in human cells. These biochemical
reaction networks require, at a minimum, knowledge of the stoichiometry of the
participating reactions. Additional information such as thermodynamics, enzyme capacity
constraints, time-series concentration profiles, and kinetic rate constants can be
incorporated to compose more detailed dynamic models. (Edelman, Eddy et al. 2009)
Figure 12: Comparison of biochemical reaction network and statistical network models. Figure from (Edelman, Eddy et al. 2009)
The most basic mathematical representation of a biochemical reaction network is
a stoichiometric model. Stoichiometric models describe the interconversion of
biomolecules purely in terms of the number of reactants and products participating in
each reaction. The generation of stoichiometric models and the analysis of their properties is a
well-established process, and genome-scale models of metabolism have been completed for a
diverse range of organisms. Methods have also been developed for reconstructing
signalling networks, transcriptional and translational networks, and regulatory networks;
these models are fundamentally analogous to reconstructed metabolic networks (Figure
13). The reconstruction of a biochemical reaction network results in a database of
stoichiometric equations that can be represented mathematically to form the foundation of
a genome-scale, computable model. Computational tools for constraint-based analysis are
then used to interrogate the properties of the reconstructed network in silico, and to
facilitate model-driven validation and refinement. Physico-chemical and environmental
constraints under which the network operates are applied in the form of balances,
including mass, energy, and charge, and bounds, such as flux capacities and
thermodynamic constraints. The statement of constraints defines a solution space
comprising all non-excluded network states, thereby describing possible functions or
allowable phenotypes. These methods are now being adapted for modeling human
systems in greater detail. (Edelman, Eddy et al. 2009)
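As a minimal sketch of the constraint-based idea described above, the following Python fragment encodes a hypothetical four-reaction linear pathway as a stoichiometric matrix S and checks whether a candidate flux vector v satisfies the mass-balance constraint S·v = 0 together with flux capacity bounds. The network, the bounds, and the helper names are illustrative assumptions and are not taken from this thesis or from (Edelman, Eddy et al. 2009).

```python
# Hypothetical toy network (an assumption made for illustration only):
#   R1:   -> A   (uptake)
#   R2: A -> B
#   R3: B -> C
#   R4: C ->     (secretion)
# Rows are metabolites A, B, C; columns are reactions R1..R4.
S = [
    [1, -1, 0, 0],   # A: produced by R1, consumed by R2
    [0, 1, -1, 0],   # B: produced by R2, consumed by R3
    [0, 0, 1, -1],   # C: produced by R3, consumed by R4
]

# Assumed flux capacity bounds (lower, upper) for each reaction.
BOUNDS = [(0.0, 10.0)] * 4


def mass_balance(S, v):
    """Return S·v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_feasible(S, v, bounds, tol=1e-9):
    """True if v lies in the solution space: steady state and within bounds."""
    balanced = all(abs(x) <= tol for x in mass_balance(S, v))
    within = all(lo - tol <= v_j <= hi + tol for v_j, (lo, hi) in zip(v, bounds))
    return balanced and within


if __name__ == "__main__":
    print(is_feasible(S, [2.0, 2.0, 2.0, 2.0], BOUNDS))  # balanced, within bounds
    print(is_feasible(S, [2.0, 1.0, 1.0, 1.0], BOUNDS))  # metabolite A accumulates
```

Every flux vector accepted by is_feasible is one of the "non-excluded network states" in the sense of the paragraph above; genome-scale analyses typically explore this space with linear programming rather than by testing individual vectors.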
Figure 13: Mathematical representation of reaction links in biochemical networks.
The global human metabolic reconstruction provides a basis for the known set of
metabolic reactions catalyzed by human proteins. However, the utility of these models for
cancer research going forward depends upon overcoming several challenges. First,
further refinement of the global human metabolic map is essential to increase its
accuracy. Second, each of the approximately 200 cell types in the human body exhibits
only a portion of the full metabolic capability contained in the genome. The high
percentage of undetermined activities for metabolic enzymes in human tissues clearly
shows how much more we have to learn about even this very well-studied cellular
process. Effectively representing which portions of the global human metabolic network
are active in any given cell type, and at what level, is thus of critical importance. Cancers,
in particular, are known to exhibit diverse metabolic phenotypes compared with their
progenitor cells, typically with an increased rate of overall metabolic activity to support
their increased growth and the highest metabolic activity observed in the most aggressive
malignancies. Multiple other hallmarks of cancer including angiogenesis, metastasis,
evasion of apoptosis, and avoidance of immune detection have been previously linked to
human tumour metabolism. Metabolic targets have also been used in cancer
chemotherapy. For these reasons, metabolic networks in human cancer have the potential
to be a rich focus area for systems modeling going forward. (Edelman, Eddy et al. 2009)
In silico models of cancer can be built not only for intracellular networks, but also
at larger length scales. Alternative computational methods must be applied to consider the
interface between cancers and the tissue contexts in which they reside. These settings
exhibit complex interactions with multiple factors of different function and scale,
including extracellular biomolecules, a spatially intricate and dynamic vasculature, and
the immune system. Models of cancer at the tissue level that account for these
functionally divergent parameters can be broadly divided into ‘continuum’ models, and
discrete or ‘agent-based’ models. The latter are often applied when the number of
individual interacting units, such as cancer cells, is constrained to remain small; the
former is more practical at population scales where agent-based modeling can be
computationally prohibitive. Both methods can integrate information about the biological
context in which cancers develop, and thus represent a multi-scale consideration of
oncogenesis as it occurs within somatic tissues. (Edelman, Eddy et al. 2009)
Extracellular parameters can be represented as continuously distributed variables
to mathematically model cell–cell or cell–environment interactions in the context of
cancers and the tumour microenvironment. Systems of partial differential equations have
been used to simulate the magnitude of interaction between these factors, including the
effects of hypoxia on cell cycle progression, the impact of mechanical forces on tumour
invasiveness and extracellular matrix interactions. Recent studies have examined cell
population dynamics within colonic crypts in colorectal cancer. These models consider
interactions between stem cells, differentiating cells, and differentiated cell populations to
quantitatively predict tissue-level invasion and the growth of tumour mass. Other models
have represented solid tumours as a multiphase system of both bound and ‘mobile’ forms.
Such ‘mixture’ models consider differential growth and apoptosis rates, as well as mass
transfer and regulatory interactions between phases. Alternative models have considered
nonlinear and combinatoric effects of multiple factors, including nutrient availability and
mechanical parameters, and the effects of mutation rate on invasion and metastasis. These
numerical systems embody a robust method to incorporate the effects of somatic
biological phenomena into computational representations of cancer. Continuum-based
models are thus a powerful tool to simulate and characterize interactions between
intracellular and extracellular factors in oncogenic processes. (Edelman, Eddy et al. 2009)
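To illustrate what a continuum model of this kind can look like computationally, the sketch below integrates a one-dimensional reaction-diffusion equation, du/dt = D d²u/dx² + r u (1 - u), for a normalized tumour cell density u, using an explicit finite-difference scheme. This is a generic textbook form chosen for illustration and is not a model taken from the studies summarized above; the parameter values and function names are assumptions.

```python
def step(u, D, r, dx, dt):
    """Advance the density profile u by one explicit Euler time step."""
    n = len(u)
    new_u = []
    for i in range(n):
        # No-flux boundaries: mirror the edge value so no mass leaves the domain.
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        diffusion = D * (left - 2.0 * u[i] + right) / dx ** 2
        growth = r * u[i] * (1.0 - u[i])  # logistic proliferation term
        new_u.append(u[i] + dt * (diffusion + growth))
    return new_u


def simulate(n_points=21, n_steps=50, D=1.0, r=0.5, dx=1.0, dt=0.1):
    """Start from a small seed of cells in the middle of the domain."""
    u = [0.0] * n_points
    u[n_points // 2] = 0.1
    for _ in range(n_steps):
        u = step(u, D, r, dx, dt)
    return u


if __name__ == "__main__":
    profile = simulate()
    print("total cell density:", round(sum(profile), 4))
```

The default values keep D·dt/dx² at 0.1, below the 0.5 limit usually quoted for stability of this explicit scheme. Setting r to zero leaves pure diffusion, in which case the total density is conserved.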
Multivariate continuum models are able to represent the effects of several
physiological or biochemical events on cancer development. However, in situ, these
factors are highly heterogeneous, and interact discontinuously with tumour cells. Cellular
automata models represent cancer cells as discrete entities of defined location and scale,
interacting with one another and external factors in discrete time intervals according to
predefined rules. Agent-based models expand the cellular automata paradigm to include
entities of divergent functionalities interacting together in a single spatial representation,
including different cell types, genetic elements, and environmental factors. With
sensitivity to starting conditions, and the ability to incorporate probabilistic interactions at
each time step, these models can exhibit similar stochastic behaviours to those observed
in oncogenesis in vivo. Phenomena that have been modeled using agent-based models
include three dimensional tumour cell patterning, immune system surveillance,
angiogenesis, and the kinetics of cell motility. Another recent model integrated diverse
parameters such as extracellular signals, blood flow, and tissue degradation to simulate
the spatiotemporal formation of tumour vasculature. (Edelman, Eddy et al. 2009)
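The cellular automaton idea described above (discrete cells on a lattice, updated in discrete time steps by predefined, possibly probabilistic rules) can be sketched in a few lines of Python. The single rule used here, in which each tumour cell divides into a free neighbouring site with probability p_divide, is a deliberately simplified assumption and does not reproduce any of the cited models; all names and parameter values are hypothetical.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def step(grid, p_divide, rng):
    """One time step: each existing cell may place a daughter in a free neighbour."""
    n = len(grid)
    new = [row[:] for row in grid]
    for r in range(n):
        for c in range(n):
            if grid[r][c] != 1:
                continue  # only cells present at the start of the step can divide
            free = [
                (r + dr, c + dc)
                for dr, dc in NEIGHBOURS
                if 0 <= r + dr < n and 0 <= c + dc < n and new[r + dr][c + dc] == 0
            ]
            if free and rng.random() < p_divide:
                rr, cc = rng.choice(free)
                new[rr][cc] = 1
    return new


def simulate(size=11, steps=5, p_divide=0.5, seed=0):
    """Grow a tumour from a single cell placed at the centre of the lattice."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    grid = [[0] * size for _ in range(size)]
    grid[size // 2][size // 2] = 1
    counts = [1]
    for _ in range(steps):
        grid = step(grid, p_divide, rng)
        counts.append(sum(map(sum, grid)))
    return grid, counts


if __name__ == "__main__":
    final_grid, counts = simulate()
    print("cell count per step:", counts)
```

Running the simulation with different seeds gives different growth patterns from the same starting condition, which is the kind of stochastic behaviour referred to in the text. An agent-based extension would attach further state (cell type, mutations, local nutrient level) to each lattice site.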
Increasingly, ‘hybrid’ models have been created which incorporate both
continuum and agent-based variables in a modular approach. For example, a recent study
considered continuous extracellular biomolecule distributions and discrete cell locations
to simulate the interaction between intracellular decision-making processes and malignant
growth. Another recent model incorporated a continuous model of a receptor signaling
pathway, an intracellular transcriptional regulatory network, cell–cycle kinetics, and
three-dimensional cell migration in an integrated, agent-based simulation of solid brain
tumour development. The interactions between cellular and microenvironment states have
also been considered in a multi-scale model that predicts tumour morphology and
phenotypic evolution in response to such extracellular pressures. These and other
techniques which incorporate multiple, nested scales of interacting biology embody
promising paradigms to understand cancer as a cascade of information across levels of
size and complexity. This ability to interrogate cancer across multiple biological agents
and compartments presents a unique framework to elucidate oncogenic processes, and to
evaluate potential therapeutic interventions through digital simulation prior to
experimental deployment. (Edelman, Eddy et al. 2009)
In this work, human breast cancer was investigated via the integration of a human
metabolic model and gene expression data.
1.6 Reactome Array data Technology
(Beloqui, Guazzaroni et al. 2009) have applied reactome array technology to measure
metabolite transformation in P. putida for genome sequence–independent functional
analysis of metabolic phenotypes and networks. The array includes 1676 substrate
compounds collectively representing central metabolic pathways of all forms of life.
Proof of concept was shown, inter alia, by the reconstruction of P. putida’s metabolic
network, demonstrating that the array discriminates compounds metabolized by extracts
of P. putida from those that are not.
Functional genomics has greatly accelerated research on the genomic basis of life
processes in health and disease and provided a quantum advance in our understanding of
such processes, their regulation, and underlying mechanisms. Functional assignments and
metabolic network reconstructions have generally depended on both the genome
sequence of the organism(s) in question and bioinformatic analyses based on homology
to known genes and proteins. However, many genes in databases have questionable
annotations or are not annotated at all, which hinders effective exploitation of the rapidly
growing volume of genome sequence data. Metabolomics provides new insights into the
metabolic state of a cell under a given set of environmental parameters, or in response
to a parameter change, independently of a genome sequence, although problems of
metabolite identification and quantification exist. Functionally associating the
metabolic profile obtained with the enzymes and pathways responsible still depends
heavily on sequence-based metabolic reconstructions. There is thus a need for a new
method to causally link metabolites with cognate enzymes, which, in addition to
delivering global descriptions of metabolic responses to given environmental conditions,
simultaneously provides annotation of the enzymes featured. The “reactome array” was
designed to forge this link between genome and metabolome, providing a global
metabolic phenotype of a cell extract derived from a clonal population of cells or a
mixture of cell types, as is found in clone libraries, tissues, or multicellular organisms.
The array constitutes a generic tool for metabolic phenotyping of cells and annotation of
proteins and has applications in diverse aspects of biology and medicine. The reactome array is
a sensitive metabolite array for genome sequence–independent functional analysis of
metabolic phenotypes and networks, of cell populations and communities. Application of
cell extracts to the array leads to specific binding of enzymes to cognate substrates,
transformation to products, and concomitant activation of the dye signals. Utility of the
array for unsequenced organisms was demonstrated, inter alia, by reconstruction of the
global metabolisms of three microbial communities derived from an acidic volcanic pool,
a deep-sea brine lake, and hydrocarbon-polluted seawater. Enzymes of interest are captured
on nanoparticles coated with cognate metabolites, sequenced, and their functions
unequivocally established. (Beloqui, Guazzaroni et al. 2009)
1.7 Automatically generated Metabolic Models
Presently, model reconstruction lags behind genome sequencing, with ~1000 completely
sequenced prokaryotes vs ~50 published genome‐scale models. Models are often
constructed one‐at‐a‐time by individuals working independently, resulting in replication
of work, propagation of errors, and extensive manual curation. It currently requires
approximately one year to produce a complete manually curated model. Rapid
Annotation using Subsystem Technology (RAST) and SEED have made high‐speed,
quality annotation of prokaryotic genomes a reality. In RAST, each biological subsystem
is continuously annotated and curated across all known genomes by an annotator with
expert knowledge in that subsystem. The modeling pipeline exploits the high quality of
RAST annotations along with a variety of optimization algorithms to automatically
generate genome‐scale models. The automated model reconstruction pipeline produces
genome‐scale models that are comparable in size with the available published
genome‐scale models. The reactions added automatically during the auto‐completion process of
the pipeline exposed regions of metabolism where more annotation efforts are necessary.
The optimization steps of the pipeline boosted model accuracy from initial values of
66% to optimized values of 87%, which approaches the accuracy typical of manually
reconstructed models. The model optimization process also enabled the identification of
missing transporters, additional missing reactions, under‐constrained reactions, and
annotations that are inconsistent with available essentiality data. (Overbeek, Begley et al.
2005)
2 iMAT: Integrative Metabolic Analysis Tool
2.1 Online Tool Development
In this work we introduce an Integrative Metabolic Analysis Tool (iMAT), enabling the
integration of transcriptomic, proteomic, and reactome array data with metabolic network
models to predict metabolic flux, developing variants of the approach presented in
(Shlomi, Cabili et al. 2008). Specifically, we present a new constraint-based method for
the integration of a genome-scale metabolic network with reactome array data to predict
metabolic flux activity. While high-throughput transcriptomic and proteomic data have
been available for quite some time now, the reactome array technology has been very
recently developed by (Beloqui, Guazzaroni et al. 2009), providing exciting new genome
scale data on the rate of metabolite transformation by enzymes present in cell extracts.
Altogether, iMAT supports the integration of functional data with an array of different
models, including: (i) a highly curated metabolic network model of human metabolism by
(Duarte, Becker et al. 2007), enabling the prediction of metabolic activity in various
tissues and cell types; (ii) common model organisms such
as E. coli and S. cerevisiae; and (iii) an array of automatically reconstructed networks for
160 bacteria (Overbeek, Begley et al. 2005), enabling the prediction of metabolic activity
under various environmental and genetic conditions. Importantly, iMAT is
straightforward and user-friendly to use: the user submits the functional data for a
certain organism via the web and receives a visualization of the organism’s metabolic
network showing the most likely, predicted metabolic flux. The applicability of iMAT is
demonstrated here by predicting human breast cancer metabolism from gene
expression data, and Pseudomonas putida metabolism from reactome array data.
2.1.1 Online availability
iMAT is available at http://imat.cs.tau.ac.il/
Utilizing iMAT to predict metabolic flux based on transcriptomic, proteomic, or reactome
array data requires the specification of an organism of interest and uploading the input
data file (for a list of supported organisms, see the iMAT website). To predict metabolic
flux based on gene expression or proteomic data, the user is required to supply a discrete,
tri-valued expression state for each gene: lowly, moderately, or highly expressed
in the condition studied. For reactome array data, the user is required to supply the
discrete transformation rate of metabolites, being either lowly, moderately, or highly
transformed (reflecting the strength of metabolic consumption by the corresponding
enzymatic reactions). Various parameters can be tuned to control the discretization of the
raw input files, as described on the iMAT website. Given the above input, iMAT
predicts a flux activity state for each reaction in the model, reflecting the presence or
absence of its associated metabolic flux. For some of the reactions, the flux activity state
can be uniquely determined to be active or inactive, with associated confidence
estimations. For others, the activity state cannot be uniquely determined because of
potential alternative flux distributions with the same overall consistency with the
expression data due to isozymes or alternative pathways. In cases where the predicted
flux activity of a reaction deviates from the given expression state of the corresponding
enzyme-coding gene, the gene is considered to be post-transcriptionally
up- or down-regulated. iMAT provides as output the predicted flux activity state and the
corresponding confidence values over all network reactions in both tabular and network
visualization forms. The network visualization displays the relevant transcriptomic,
proteomic and reactome array data given as input, as well as the predicted metabolic flux,
superimposed on top of the organism’s metabolic network, employing the publicly
available Cytoscape software (Cline, Smoot et al. 2007). In addition, iMAT provides a
pathway enrichment analysis based on the flux activity predictions. (A detailed
description of iMAT’s functionality can be found in the attached user guide).
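The rule for flagging post-transcriptional regulation can be sketched as follows. This is a minimal illustration and not iMAT's actual implementation: it assumes a one-to-one gene-to-reaction mapping, treats "active despite low or moderate expression" as upregulation and "inactive despite high expression" as downregulation, and makes no call when the activity state is undetermined. The function name and data layout are hypothetical.

```python
def post_transcriptional_calls(expression, activity):
    """Flag genes whose expression state disagrees with predicted flux activity.

    expression: gene -> -1 (low), 0 (moderate), 1 (high)
    activity:   gene -> True (confidently active), False (confidently
                inactive) or None (undetermined); one reaction per gene
                is assumed here for simplicity.
    """
    calls = {}
    for gene, state in expression.items():
        act = activity.get(gene)
        if act is None:
            continue  # undetermined activity: no call is made
        if act and state < 1:
            calls[gene] = "up"    # active despite low/moderate expression
        elif not act and state == 1:
            calls[gene] = "down"  # inactive despite high expression
    return calls


# Hypothetical gene labels and states, for illustration only.
example = post_transcriptional_calls(
    {"G1": 1, "G2": -1, "G3": 1, "G4": 0, "G5": 0},
    {"G1": True, "G2": True, "G3": False, "G4": None, "G5": False},
)
print(example)
```

In a real model a gene may be associated with several reactions (and a reaction with several isozymes), so the comparison would have to be made through the gene-to-reaction rules rather than a flat dictionary.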
2.1.2 An illustrative example of applying iMAT to a toy network model
Figure 14 shows two examples of applying iMAT to a small toy model given either gene
expression or reactome array data as input, predicting the same metabolic flux
distribution in both cases. The toy model is comprised of ten metabolites and thirteen
reactions, including seven exchange reactions that enable the uptake of substrates and the
secretion of metabolic by-products.
Figure 14: An illustrative example of applying iMAT to a toy metabolic network (shown in b) given either (a) gene expression or (c) reactome array data as input. Circular nodes represent metabolites, edges represent biochemical reactions, and diamond-shaped nodes represent enzyme-coding genes. iMAT’s output is an optimal flux distribution that is the most consistent with (d) expression data or (e) reactome array data given as input. (d) Reactions associated with highly, lowly or moderately expressed genes are colored in green, red, or black respectively. (e) Nodes colored in green, red or black represent highly, lowly or moderately transformed metabolites (based on reactome array data), respectively. Solid (dashed) edges represent reactions predicted to be active (inactive). Reactions whose flux activity state is uniquely determined to be active or inactive (across the whole space of alternative optimal flux distributions) are marked with thick edges.
In the example application of iMAT to integrate gene expression data, the predicted
flux is consistent with the expression state of 4 of the 5 reactions, predicted to be active
(inactive) in accordance with the high (low) expression state of their enzyme-coding
genes. One reaction (M6->M9) is predicted to be inactive though its
corresponding gene is highly expressed, suggesting potential post-transcriptional
regulation. Of the five metabolites that can be transported across the membrane
boundary in the toy model (M1-2, M7-9), iMAT predicts the uptake of one metabolite
(M1) and the secretion of two others (M7 and M8). Notably, while the high expression
level of the membrane transporter of M1 indicates that it may potentially be active, it
does not provide information whether M1 is taken up or secreted from the tissue. In
contrast, iMAT can predict flux directionality in some cases by propagating known
constraints on the reversibility of other enzymes (inferred based on thermodynamic
principles; in this case based on the known irreversibility of G3 and G4, and spontaneous
reactions ->M3, M3+M4->M7+M8, M8->).
In the second example of applying iMAT utilizing reactome array data, iMAT
predicts the same flux distribution, which is consistent with the metabolic transformation
state of five out of the six metabolites in the network. In this case, the high transformation
state of metabolite M1 indicates that it is used as a substrate by some enzymatic reaction,
without an indication of the specific reaction in which it participates. iMAT predicts that
M1 will be transformed by M1->M4, and not by M1+M2->M5+M6, by considering the
network-wide flux distribution in which both metabolites M2 and M9 have a low
transformation rate.
Both expression and reactome array examples exhibit an equivalent metabolic
flux prediction (in panels (d) and (e) respectively). The pathway predicted to be active in
both panels is portrayed by solid edges, where thick edges denote reactions predicted to
be active with high confidence (i.e. the flux activity state can be uniquely determined to
be active across the optimal solution space). The activity prediction of M1->M4 has low
confidence, since an equally viable alternative path exists via M1->M10->M4.
2.2 Expanding approach: Integration of Reactome Array data
We introduce a new constraint-based computational method for the integration of a
genome-scale metabolic network with reactome array data (Beloqui, Guazzaroni et al.
2009), in order to predict metabolic flux activity. It is reasonable to assume that if a
metabolite substrate of a reaction is transformed by the reaction's enzyme, then the
reaction can be considered to be active. Similarly, if the metabolite substrate is not
transformed, then the reaction can be considered to be inactive. The reactome array
input (the rate of metabolite transformation by enzymes) is discretized in a similar manner
to that of the gene expression data, such that metabolites which have a high
transformation rate are considered to be highly transformed, and receive the value 1,
while metabolites with a low transformation rate are considered to be lowly transformed
and receive the value -1. Metabolites in the intermediate range are considered to be moderately
transformed and receive the value 0.
This problem can be formulated in the following manner: Given a subset of highly
transformed metabolite substrates ($M_H$) and lowly transformed metabolite substrates ($M_L$),
from the subset of existing model metabolites ($M$) measured by the reactome array ($M_R$):
$M_H, M_L \subseteq M_R \subseteq M$; find an optimal feasible solution that would maximize the number of
metabolites from $M_H$ which are transformed by at least one reaction, and from $M_L$ which
are not transformed by any reaction. A simple metabolite-to-reaction mapping was
employed to determine the transformation state for each reaction. Specifically, this was
achieved by assigning the transformation state of the metabolite substrates to the reaction
(a reaction that has both highly and lowly transformed metabolites is not included in the
constraints, allowing iMAT to determine its activity state without a priori knowledge).
This pre-processing results in a subset of the reactions in the model (denoted $R_H$) that is
defined to be highly transformed and another subset (denoted $R_L$) defined as lowly
transformed. We then formulated the following mixed integer linear programming
(MILP) problem to find a steady-state flux distribution satisfying stoichiometric and
thermodynamic constraints, while maximizing the number of reactions whose activity is
consistent with their metabolic transformation state (equation (1)):

(1) $\max_{v,\, y^+,\, y^-,\, h,\, l} \; \sum_{j \in M_H} h_j + \sum_{j \in M_L} l_j$
s.t.
(2) $S \cdot v = 0$
(3) $v_{\min} \le v \le v_{\max}$
(4) $v_i + y_i^+ (v_{\min,i} - \varepsilon) \ge v_{\min,i}, \quad i \in R_j : j \in M_H$
(5) $v_i + y_i^- (v_{\max,i} + \varepsilon) \le v_{\max,i}, \quad i \in R_j : j \in M_H$
(6) $h_j \le \sum_{i \in R_j} (y_i^+ + y_i^-), \quad j \in M_H$
(7) $v_{\min,i} (1 - l_j) \le v_i \le v_{\max,i} (1 - l_j), \quad i \in R_j : j \in M_L$
$y_i^+, y_i^-, h_j, l_j \in \{0, 1\}$

Where $v$ is the flux vector and $S$ is an $m \times n$ stoichiometric matrix, in which $m$ is the
number of metabolites and $n$ is the number of reactions. The mass balance constraint is
enforced in equation (2). Thermodynamic constraints that restrict flow direction are
imposed by setting $v_{\min}$ and $v_{\max}$ as lower and upper bounds on flux values in equation
(3), respectively. For each reaction $i$, the Boolean variables $y_i^+$ and $y_i^-$ represent whether
the reaction is active (in either direction) or not (when both $y_i^+$ and $y_i^-$ are 0); $R_j$ is the set
of reactions of which highly transformed metabolite j is a substrate. Specifically, a
reaction is considered to be active if it carries a significant positive flux that is greater
than a positive threshold ε (equation (4)) or a significant negative flux < –ε (equation (5),
for reversible reactions). $h_j$ is a Boolean variable per highly transformed metabolite j:
$h_j = 1$ if at least one of the reactions of which metabolite j is a substrate is active
(equation (6)). For each lowly transformed metabolite j, the Boolean variable $l_j$
represents whether all the reactions of which metabolite j is a substrate are inactive, or if
some are active (when $l_j$ is 0); here $R_j$ is the set of reactions of which lowly transformed
metabolite j is a substrate. Specifically, a reaction is considered to be inactive if it does
not carry a flux that is greater than 0 in either direction (equation (7)). The optimization
maximizes the number of reactions whose activity is similar to their metabolic
transformation state. The commercial CPLEX solver was used for solving MILP
problems on a Pentium-4 machine running Linux in a few dozen seconds per
problem.
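The metabolite-to-reaction mapping used in the pre-processing step can be sketched as follows. This is only an illustration under stated assumptions: the function name and data layout are hypothetical, unmeasured metabolites are treated as moderately transformed (state 0), and the example states are chosen for illustration (M1 high and M2 low, as in the toy example above; the exchange reaction "M2->" is a hypothetical addition).

```python
def map_states_to_reactions(substrates, metabolite_state):
    """Assign metabolite transformation states to reactions.

    substrates:       reaction -> iterable of its substrate metabolites
    metabolite_state: metabolite -> 1 (high), -1 (low) or 0 (moderate);
                      metabolites missing from the dict are treated as 0.
    Returns the pair (highly transformed reactions, lowly transformed reactions).
    """
    r_high, r_low = set(), set()
    for rxn, mets in substrates.items():
        states = {metabolite_state.get(m, 0) for m in mets}
        has_high, has_low = 1 in states, -1 in states
        if has_high and has_low:
            continue  # conflicting evidence: reaction is left unconstrained
        if has_high:
            r_high.add(rxn)
        elif has_low:
            r_low.add(rxn)
    return r_high, r_low


toy_substrates = {
    "M1->M4": ["M1"],
    "M1+M2->M5+M6": ["M1", "M2"],
    "M6->M9": ["M6"],
    "M2->": ["M2"],  # hypothetical exchange reaction, for illustration
}
toy_states = {"M1": 1, "M2": -1}
r_high, r_low = map_states_to_reactions(toy_substrates, toy_states)
print(sorted(r_high), sorted(r_low))
```

Reactions with mixed evidence, such as M1+M2->M5+M6 here, are skipped on purpose so that the optimization is free to decide their activity.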
A solution found by the MILP solver is guaranteed to be optimal in the sense of
the objective function maximized, but the solution found may not be unique as a space of
alternative optimal solutions may exist. In this case, the space of optimal solutions
represents alternative steady-state flux distributions attaining the same similarity with the
metabolic data. To account for these alternative solutions, we employed a variant of Flux
Variability Analysis (Mahadevan and Schilling 2003). Our method computes for each
metabolic reaction whether it is predicted to be always active (or alternately, always
inactive) across the entire solution space. This is achieved by solving two MILP problems
(similar to the one described above) for each reaction, in order to find the maximal
attainable similarity with the metabolic data when the reaction is forced to be activated
(denoting this similarity $s_{act}$) and when it is forced to be inactivated (denoting this similarity
$s_{inact}$). A reaction is then considered to be active if $s_{act} > s_{inact}$ (i.e., a higher similarity with the
metabolic data is achieved when the reaction is active than when it is inactive) with a
confidence level of $s_{act} - s_{inact}$. Alternately, a reaction is considered to be inactive if $s_{inact} > s_{act}$,
with a confidence of $s_{inact} - s_{act}$. If $s_{act} = s_{inact}$ (i.e., the same similarity with the metabolic data can
be obtained both when the reaction is forced to be active or inactive), the activity state is
considered to be undetermined.
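The decision rule applied to the two optimal similarity values can be sketched as below. The sketch assumes that the two MILP optima have already been computed for a reaction, and it assumes that the confidence is taken as the gap between them; the function name is hypothetical.

```python
def classify_reaction(sim_active, sim_inactive):
    """Classify one reaction from its two forced-state MILP optima.

    sim_active:   best similarity when the reaction is forced active
    sim_inactive: best similarity when the reaction is forced inactive
    Returns (state, confidence); the confidence is assumed to be the
    difference between the two optima.
    """
    if sim_active > sim_inactive:
        return "active", sim_active - sim_inactive
    if sim_inactive > sim_active:
        return "inactive", sim_inactive - sim_active
    return "undetermined", 0


# Illustrative similarity values only.
print(classify_reaction(12, 9))
print(classify_reaction(7, 10))
print(classify_reaction(8, 8))
```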
2.2.1 Modeling P. putida’s Metabolic Profile via Reactome Array Integration
(Beloqui, Guazzaroni et al. 2009) have applied reactome array technology to measure
metabolite transformation in P. putida for genome sequence–independent functional
analysis of metabolic phenotypes and networks. The array includes 1676 substrate
compounds collectively representing central metabolic pathways of all forms of life.
Proof of concept was shown inter alia, by the reconstruction of P. putida’s metabolic
network, demonstrating that the array discriminates compounds metabolized by extracts
of P. putida from those that are not. Here, we utilize iMAT to integrate the reactome
array data with a genome-scale metabolic network model of P. putida [ref] to predict the
actual metabolic flux reflected in the array data.
2.2.1.1 Data Acquisition and Preprocessing
The raw reactome array data was first used to assign each metabolite a
transformation state (i.e. lowly, moderately, or highly transformed), reflecting whether it
is being consumed by some enzymatic reaction. This pre-processing resulted in 91 lowly
and 263 highly transformed metabolites, out of the 1191 metabolites of the P. putida
metabolic model.
2.2.1.2 Results
Utilizing iMAT to predict metabolic flux in P. putida given this data results in a
confident prediction of 459 active reactions and 792 inactive reactions (out of 1373
reactions in the model). The predicted flux distribution reflects the activity of amino acid
metabolism, biosynthesis of secondary metabolites, carbohydrate metabolism and energy
metabolism (based on a hypergeometric pathway enrichment test), in accordance
with the findings of Beloqui et al. (Table 1; see the supplementary results for a detailed
description).
P-value  Active enriched pathway
0.0000   Branched-chain amino acid biosynthesis
0.0000   TCA cycle
0.0000   De novo purine biosynthesis
0.0000   Histidine biosynthesis
0.0001   Purine conversions
0.0001   Isoleucine degradation
0.0001   Valine degradation
0.0003   Methionine biosynthesis
0.0004   Common pathway for synthesis of aromatic compounds (DAHP synthase to chorismate)
0.0004   Lysine biosynthesis DAP pathway
0.0013   Leucine degradation and HMG-CoA metabolism
0.0025   Arginine and Ornithine degradation
0.0025   N-phenylalkanoic acid degradation
0.0040   Histidine degradation
0.0080   Tryptophan synthesis
0.0175   Glycolysis and Gluconeogenesis
0.0192   Formaldehyde assimilation: Ribulose monophosphate pathway
0.0367   Serine biosynthesis
0.0416   Glutamine; Glutamate; Aspartate and Asparagine biosynthesis
0.0447   Proline synthesis
0.0447   Pyruvate metabolism I: anaplerotic reactions; PEP
Table 1: Significantly enriched active pathways as predicted by iMAT.
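A hypergeometric enrichment p-value of the kind reported in Table 1 can be computed as in the sketch below. It assumes a one-sided test, namely the probability of seeing at least the observed number of active reactions in a pathway when the pathway's reactions are drawn at random from the model. The function and parameter names are hypothetical, and the exact test variant used by iMAT is not specified in the text.

```python
from math import comb


def hypergeom_pvalue(total, active, pathway_size, pathway_active):
    """One-sided hypergeometric tail P(X >= pathway_active).

    total:          number of reactions in the model
    active:         number of reactions predicted to be active
    pathway_size:   number of reactions in the pathway
    pathway_active: number of pathway reactions predicted to be active
    """
    upper = min(active, pathway_size)
    tail = sum(
        comb(active, k) * comb(total - active, pathway_size - k)
        for k in range(pathway_active, upper + 1)
    )
    return tail / comb(total, pathway_size)


# Small worked case: 10 reactions, 5 active, a pathway of 2 reactions
# that are both active: C(5,2) / C(10,2) = 10/45.
p_small = hypergeom_pvalue(10, 5, 2, 2)
print(round(p_small, 4))

# With the model sizes quoted above (1373 reactions, 459 active) and a
# hypothetical pathway of 20 reactions, 15 of them active:
print(hypergeom_pvalue(1373, 459, 20, 15))
```

Pathways would then be reported as enriched when this value falls below the chosen significance threshold.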
The authors describe an analysis where 549 proteins were captured by gold
nanoparticles, of which 191 enzymes acting on 158 of the 525 P. putida metabolites were
unambiguously identified as active. Of the 191 enzymes and 158 metabolites, 123 (of
1082 enzymes in the model) and 47 (of 1191 metabolites in the model) were found
respectively in the P. putida metabolic model. We calculated iMAT's recall by testing if
at least one reaction associated with the above enzymes or metabolites was predicted to
be active, obtaining 0.6748 and 0.4894 respectively. To evaluate the significance of these
results, we calculated p-values for a random predictor which, for each enzyme and
metabolite, draws a reaction from the distribution of its associated reactions and
succeeds if the drawn reaction is predicted to be active. The random predictor obtained p-values of
0.7387 and 0.7904, for enzymes and metabolites respectively, confirming iMAT's
significance.
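The recall computation can be sketched as follows. The helper name and the toy data are hypothetical. The hit counts 83 and 23 used at the end are not stated in the text; they are back-calculated here from the reported recalls and the reported totals (123 enzymes and 47 metabolites), so they should be read as an arithmetic consistency check only.

```python
def recall_at_least_one(entity_reactions, active_reactions):
    """Fraction of entities with at least one associated active reaction.

    entity_reactions: enzyme or metabolite -> list of associated reactions
    active_reactions: set of reactions predicted to be active
    """
    hits = sum(
        1
        for rxns in entity_reactions.values()
        if any(r in active_reactions for r in rxns)
    )
    return hits / len(entity_reactions)


# Toy illustration with hypothetical enzyme and reaction labels.
toy = {"E1": ["r1", "r2"], "E2": ["r3"], "E3": ["r4", "r5"], "E4": ["r6"]}
toy_recall = recall_at_least_one(toy, {"r2", "r5", "r6"})
print(toy_recall)

# Consistency check against the reported values (inferred hit counts).
print(round(83 / 123, 4), round(23 / 47, 4))
```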
To validate iMAT’s predictive accuracy, we performed a 5-fold cross-validation
test in which a training set of 80% of the reactome array metabolites was used as input for
iMAT to predict the transformation state of the remaining 20%. Specifically, in each
cross-validation trial, the transformation state of a metabolite (within the test set) was
predicted to be high, if at least a single reaction in which it participates as a substrate is
predicted to be active by iMAT, and low, if all reactions in which it participates as a
substrate are predicted to be inactive. The prediction of metabolites with high and low
transformation states was found to be highly significant, with a precision of 0.6077 and
0.615, recall of 0.8187 and 0.3544, and a p-value of 0.000283 and 0.000137 respectively.
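The prediction rule used in each cross-validation trial, together with the precision and recall computation, can be sketched as below. This is a minimal illustration with hypothetical names and toy data; it assumes that a metabolite whose substrate reactions are neither all inactive nor include an active one receives no prediction.

```python
def predict_state(substrate_rxns, activity):
    """Predict a held-out metabolite's transformation state.

    substrate_rxns: reactions in which the metabolite is a substrate
    activity: reaction -> "active", "inactive" or "undetermined"
    """
    states = [activity.get(r, "undetermined") for r in substrate_rxns]
    if any(s == "active" for s in states):
        return "high"
    if states and all(s == "inactive" for s in states):
        return "low"
    return None  # no prediction is made


def precision_recall(predicted, observed, label):
    """Precision and recall of one label over the test-set metabolites."""
    tp = sum(1 for m, p in predicted.items() if p == label and observed.get(m) == label)
    n_pred = sum(1 for p in predicted.values() if p == label)
    n_obs = sum(1 for m in predicted if observed.get(m) == label)
    precision = tp / n_pred if n_pred else float("nan")
    recall = tp / n_obs if n_obs else float("nan")
    return precision, recall


# Toy test fold with hypothetical labels.
activity = {"r1": "active", "r2": "inactive", "r3": "inactive", "r4": "undetermined"}
substrate_of = {"A": ["r1", "r2"], "B": ["r2", "r3"], "C": ["r3", "r4"], "D": ["r1"]}
observed = {"A": "high", "B": "low", "C": "high", "D": "low"}
predicted = {m: predict_state(rxns, activity) for m, rxns in substrate_of.items()}
print(predicted)
print(precision_recall(predicted, observed, "high"))
print(precision_recall(predicted, observed, "low"))
```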
3 Modeling Human Breast Cancer
We applied iMAT to the human metabolic model of (Duarte, Becker et al. 2007) to
predict the metabolic state of met induced breast cancer, by integrating gene expression
measurements from the pertaining cancer cell lines (Kaplan, Firon et al. 2000). met is a
proto-oncogene that encodes a protein Met, which is a membrane receptor activated by
the hepatocyte growth factor (HGF/SF) ligand, the only known ligand of the Met protein.
Met is a tyrosine kinase growth factor receptor that is imperative to embryonic growth
and wound healing. When spurred by HGF/SF, Met induces tumor growth, angiogenesis
and metastasis, which correlates with poor prognosis (Bottaro, Rubin et al. 1991; Cooper
1992).
Hepatocyte growth factor/scatter factor (HGF/SF) is a paracrine growth factor
which increases cellular motility and has also been implicated in tumor development and
progression and in angiogenesis. Little is known about the metabolic alteration induced in
cells following Met-HGF/SF signal transduction. The hypothesis that HGF/SF alters the
energy metabolism of cancer cells was investigated in perfused DA3 murine mammary
cancer cells by nuclear magnetic resonance (NMR) spectroscopy, oxygen and glucose
consumption assays and confocal laser scanning microscopy (CLSM). 31P NMR
demonstrated that HGF/SF induced remarkable alterations in phospholipid metabolites,
and enhanced the rate of glucose phosphorylation (P < .05). 13C NMR measurements,
using [13C1]-glucose-enriched medium, showed that HGF/SF reduced the steady state
levels of glucose and elevated those of lactate (P < .05). In addition, HGF/SF treatment
increased oxygen consumption from 0.58±0.02 to 0.71±0.03 µmol/hour per milligram
protein (P < .05). However, it decreased CO2 levels, and attenuated pH decrease. The
mechanisms of these unexpected effects were delineated by CLSM, using NAD(P)H
fluorescence measurements, which showed that HGF/SF increased the oxidation of the
mitochondrial NAD system. (Kaplan, Firon et al. 2000) propose that concomitant with
induction of ruffling, HGF/SF enhances both the glycolytic and oxidative
phosphorylation pathways of energy production.
3.1 Results
3.1.1 Data Acquisition and Preprocessing
We utilized normalized gene expression data from breast cancer cell-lines with high Met
expression (MDA231, BT549, Hs578T) and low Met expression (MCF7, T47D,
MCF10), 24 hours after treatment with HGF/SF. The raw cell-line expression data was
transformed into qualitative expression states, in which each gene is either highly, lowly or
moderately expressed, using a bidirectional threshold of half a standard deviation from
the mean. The derived gene expression states for each cell-line were given as input to
iMAT to predict a flux distribution that is most consistent with the corresponding
expression signature.
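The discretization into three expression states can be sketched as follows. This is an illustration under assumptions that the text does not settle: the mean and standard deviation are taken over all values supplied for one cell-line, the population standard deviation is used, and values exactly on a threshold count as moderate. The function name and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def discretize(values, k=0.5):
    """Map raw expression values to -1 (low), 0 (moderate) or 1 (high).

    values: gene -> normalized expression value for one cell-line
    k:      threshold width in standard deviations (half an SD by default)
    """
    mu = mean(values.values())
    sd = pstdev(values.values())
    high_cut, low_cut = mu + k * sd, mu - k * sd
    return {
        gene: 1 if x > high_cut else -1 if x < low_cut else 0
        for gene, x in values.items()
    }


# Toy values: mean 5, SD about 3.27, so the cuts are about 3.37 and 6.63.
states = discretize({"g1": 1.0, "g2": 5.0, "g3": 9.0})
print(states)
```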
3.1.2 Analysis Overview
The following is a rough overview of the various analyses I performed, utilizing iMAT’s metabolic flux
predictive power to deduce and differentiate the metabolic state of met induced breast
cancer. Pathway enrichment analysis was performed, deeming pathways with p-values
below 0.05 as significantly enriched. This analysis created a first-level differentiating
metabolic profile for the high Met cell-lines as compared with the low Met cell-lines, with
validations from the literature supporting this differentiating profile. Differential genes
that are post-transcriptionally upregulated (and thus could not be discerned by expression
data alone), reactions, and uptake and secretion of metabolites, which create a second
differentiating level comprising the high Met metabolic signature, were again corroborated
by the literature. Following personal correspondence with Prof. Ilan Tsarfaty, fatty acid
biosynthesis was selected for an in-depth look due to its critical function in Met induced
breast cancer. We conclude with two immediate augmenting modifications.
3.1.3 iMAT on general human model integrated with expression data
3.1.3.1 Pathway Enrichment Analysis
To track the differences between the metabolic response of high vs low Met cell lines to
ligand stimulation, we performed a pathway enrichment analysis of the predicted
metabolic flux activity profiles for both the high and low Met cell-lines. The analysis
revealed 9 metabolic pathways which are significantly enriched with reactions predicted
to be active in all three high Met cell-lines, and not in the low Met cell-lines (Table 2).
Correspondingly, the analysis uncovered 5 metabolic pathways which are significantly
enriched with reactions predicted to be active in all three low Met cell-lines, and not in
the high Met cell-lines. Reassuringly, the pathways identified by iMAT correspond nicely
to the known underlying biology of Met signaling in cancer. As already shown by
(Kaplan, Firon et al. 2000), HGF/SF activates Oxidative Phosphorylation and the TCA
cycle in DA3 murine mammary cancer cells, consistent with the corresponding metabolic
pathways iMAT predicted to be significantly active. Additionally, Table 2 points to a
quite global activation of amino acid pathways, and typical anaplerotic activation of
pathways involved in glutamine metabolism and the TCA cycle. These findings are in
line with the observations of (DeBerardinis, Sayed et al. 2008), who have asserted that
glutamine metabolism enables macromolecular synthesis in proliferating cells, allowing
cells to meet both the anaplerotic and NADPH demands of growth. Thus, iMAT’s
predictions fit well with these putative metabolic requirements of the highly proliferating
high-Met cell-lines. The low-Met cell-line active pathway predictions do not portray the
metabolic signature of HGF/SF incitement of Met. Notably, the differential activation of these
pathways is not directly reflected in the gene expression data, as neither Gene Set
Enrichment Analysis (GSEA) (Subramanian, Kuehn et al. 2007) nor the commonly used
hypergeometric-based pathway enrichment analysis enables their detection. Only two of
the high-Met differentiating pathways uncovered by iMAT were discerned by
hypergeometric expression analysis, and the remaining predicted pathways are not indicative of
HGF/SF activation of Met (Supplementary material).
iMAT high-Met P-value   iMAT low-Met P-value   Enriched active pathway
0.0000                  -                      Inositol phosphate metabolism
0.0013                  -                      TCA cycle
0.0034                  -                      Phenylalanine metabolism
0.0080                  -                      Oxidative Phosphorylation
0.0111                  -                      Glutamate and Glutamine metabolism
0.0265                  -                      Glycine, Serine, and Threonine metabolism
0.0350                  -                      Tyrosine metabolism
0.0493                  -                      Propanoate metabolism
0.0493                  -                      Histidine Metabolism
-                       0.0010                 Pentose Phosphate pathway
-                       0.0013                 Eicosanoid metabolism
-                       0.0375                 Glutathione metabolism
-                       0.0425                 Fatty acid activation
-                       0.0475                 Alanine and Aspartate metabolism
Table 2: Significant results of the pathway enrichment analysis performed on the predicted metabolic flux activity profiles generated by iMAT for both the high and low Met cell-lines, and the common pathways discovered by gene expression data alone for the high-Met cell-lines. iMAT’s analysis uncovered 9 pathways out of the 99 pathways in the human metabolic model predicted to be active (i.e., significantly enriched with predicted active reactions using a hypergeometric test) in all three high Met cell-lines, and not in all three low Met cell-lines; and 5 pathways predicted to be active in all three low Met cell-lines, and not in all three high Met cell-lines. The analysis based solely on gene expression data revealed 5 pathways predicted to be active in all three high Met cell-lines, and not in all three low Met cell-lines (Supplementary Results, Section 1). Highlighted in yellow are pathways common to iMAT and the gene expression analysis.
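The selection of differentiating pathways summarized in Table 2 can be sketched with set operations. The sketch assumes that "not in the low Met cell-lines" means enriched in none of them, which is one possible reading of the caption. The cell-line names are those used in this work, while the per-cell-line pathway sets below are hypothetical and serve only to show the mechanics.

```python
def differential_pathways(group_a, group_b):
    """Pathways enriched in every cell-line of group_a and in none of group_b.

    group_a, group_b: cell-line name -> set of significantly enriched pathways
    """
    in_all_a = set.intersection(*group_a.values())
    in_any_b = set.union(*group_b.values())
    return in_all_a - in_any_b


# Hypothetical enrichment results, for illustration only.
high_met = {
    "MDA231": {"TCA cycle", "Oxidative Phosphorylation", "Tyrosine metabolism"},
    "BT549": {"TCA cycle", "Oxidative Phosphorylation", "Fatty acid activation"},
    "Hs578T": {"TCA cycle", "Oxidative Phosphorylation", "Glutathione metabolism"},
}
low_met = {
    "MCF7": {"Glutathione metabolism", "Oxidative Phosphorylation"},
    "T47D": {"Glutathione metabolism"},
    "MCF10": {"Glutathione metabolism", "Fatty acid activation"},
}
print(differential_pathways(high_met, low_met))
print(differential_pathways(low_met, high_met))
```

Swapping the two arguments gives the pathways that characterize the second group instead.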
3.1.3.2 Differential genes and post-transcriptional regulation
I performed a differential reaction analysis to find reactions whose activity predictions
differentiate high Met cell-lines from low Met cell-lines. Through this analysis, the differential
driving enzymes can be identified in the case of non-spontaneous reactions; more
specifically, post-transcriptionally regulated genes can be identified via this process, in
addition to the uptake and secretion of differential metabolites.
(Lu, Bennet et al. 2010) show that metabolite changes accompanying mammary
tumour progression can be identified in both intracellular and secreted forms in several
pathways, including glycolysis, the tricarboxylic acid cycle, the pentose phosphate pathway, fatty
acid and nucleotide biosynthesis, and the GSH-dependent anti-oxidative pathway. As
illustrated in (Lu, Bennet et al. 2010) 157 metabolites were profiled in six cell lines with
progressively increased tumourigenicity and metastatic ability. The analysis of
intracellular metabolites clustered the lines into three categories, normal, tumourigenic
(but non-metastatic) and metastatic in general. Results from the analysis favour a two-
step metabolic progression hypothesis during mammary tumour progression: the first step
accompanies the acquisition of tumourigenicity and includes altered glycolysis, PPP and
fatty acid synthesis as well as decreased GSH/GSSG redox pool; the second step is
correlated with the gain of the general metastatic ability and includes further changes in
glycolysis and TCA cycle (three of the four metabolites in the last steps of glycolysis
were enriched in metastatic cells: 3-phosphoglycerate, phosphoenolpyruvate (PEP) and
pyruvate, suggesting differences in lower glycolysis. Although the exact mechanism is
not clear, these findings in lower glycolysis may relate to the pivotal role of specific
pyruvate kinase isozymes in oncogenesis. Moving down from pyruvate, several TCA
cycle intermediates were enriched in the metastatic lines, including aconitate, citrate,
isocitrate and malate. The “Warburg effect” suggests that tumor cells prefer aerobic
glycolysis to the TCA cycle for producing ATP and reductants. (Lu, Bennet et al. 2010)’s
observation that TCA intermediates are exclusively upregulated in metastatic cell lines
suggests that invasive cancer cells are using the TCA cycle differently than the non-
metastatic cells. It is unclear, however, whether these differences are associated with
increased TCA cycle flux, and, if so, whether this flux is driven primarily by glucose or
glutamine. Fluxomic analysis is well suited for answering these questions. The changes
of nucleotide species follow an interesting pattern: levels are lower in the nonmetastatic
tumour cells than in the nontransformed cells, perhaps due to enhanced nucleotide
consumption to feed growth and DNA replication in the transformed cells. The metastatic
tumour cells, however, have increased nucleotide levels, reflecting altered nucleotide
turnover), further depletion of the glutathione species, and increased nucleotides. No
further metabolite alterations correlated with stepwise increase of metastasis potential in
the four metastatic lines were resolved in the analysis. (Lu, Bennet et al. 2010) analysis of
extracellular metabolites identified increased abundance of TCA cycle components as
well as nucleotide metabolism intermediates, similar to the intracellular results. Their
findings agree with a recent study profiling a more limited set of metabolites in the
MCF10 model of mammary carcinoma. Both studies find evidence for increased pentose
phosphate pathway, TCA cycle, and fatty acid biosynthetic activity in transformed and/or
metastatic cells. Several secreted metabolites accompanying the increased metastatic
potential (malate, fumarate, deoxyguanosine, guanine, xanthine, and hypoxanthine)
should be tested for their value as diagnostic and prognostic biomarkers of malignant
breast cancer in future studies.
My analysis of differential metabolic activity recovered the following pathways,
enzymes and metabolites as differentiating metastatic high Met HGF/SF-treated
cell-lines from non-metastatic low Met cell-lines.
HGF/SF enhances the glycolytic pathway of energy production (Kaplan, Firon et
al. 2000). Many tumour cells contain elevated levels of total hexokinase activity, the first
enzyme involved in the commitment step of glycolysis as well as an increased amount of
hexokinase type II bound to the outer mitochondrial membrane. In contrast to normal
cells, tumour cells obtain most of their ATP from glycolysis rather than from the TCA cycle and
respiration. The mitochondrial association of hexokinase has been proposed to drive
the process of glycolysis in tumour cells by providing preferential access to inorganic
phosphate and ADP, as well as protection against product inhibition by glucose-6-
phosphate (Figure 15) (Copeland, Wachsman et al. 2002). Hexokinase II, one of the four
hexokinase isozymes, is a target of many transcription factors important in tumorigenesis,
including HIF1 and Myc (through the ‘carbohydrate response element’). Hexokinase is
also thought to have a role in protecting the cell against apoptosis. It has been shown that
hexokinases I and II are associated with mitochondria, binding the voltage-dependent
anion channel on the mitochondrial outer membrane. Hexokinase binding to voltage
dependent anion channel is thought to be dependent on both glycolytic flux and AKT
activity, although the former is both necessary and sufficient for its anti-apoptotic activity
(Copeland, Wachsman et al. 2002). iMAT predicts post transcriptional upregulation of
HK2 and HK3 in all three high Met cell-lines, while HK1 is upregulated only in one of
the cell-lines.
A glycolytic enzyme whose levels can be altered by p53 (tumor suppressor
protein) expression is phosphoglycerate mutase (PGM, which catalyzes the interconversion of
3-phosphoglycerate and 2-phosphoglycerate). In cells with high p53 expression,
PGM expression is reduced, but loss of function or low levels of p53 allows increased
PGM and hence glycolysis. Interestingly, over-expression of PGM can immortalize
mouse embryonic fibroblasts (MEFs): a phenotype that is dependent upon its catalytic
activity. The correlation between the rate of glycolysis and immortalization was
strengthened by two further strands of evidence: that inhibition of a number of glycolytic
enzymes [PGM, PGI, glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and
phosphoglycerate kinase (PGK)] can trigger MEF senescence and that spontaneously
immortalized MEFs also increase their glycolytic rate (Tennant, Duran et al. 2009).
iMAT predicted PGM1 and PGM2 to be post-transcriptionally upregulated in the high
Met cell-lines (in which they exhibited low and moderate expression levels), in concordance with these findings.
The TCA cycle intermediates are known to be increased by HGF/SF activation of
Met. The TCA cycle was found to be differentially activated, including the reactions of MDH1 and MDH2 (malate
dehydrogenase, localized in the cytoplasm and mitochondria respectively), which catalyze
the reversible oxidation of malate to oxaloacetate, utilizing the NAD/NADH cofactor
system in the citric acid cycle. Both enzymes are predicted by iMAT to catalyze the
reaction in the direction of malate production, in accordance with (Lu, Bennet et al. 2010).
Furthermore, MDH1 and MDH2 were found to be post-transcriptionally upregulated
across all three HGF/SF high Met cell-lines, in which they exhibited only moderate
expression levels, thus further attesting to iMAT’s predictive value.
PRPS1, PRPS2, and PRPS1L1 of the Pentose phosphate pathway were found to
be post-transcriptionally upregulated in the high Met cell-lines (predicted to drive
phosphoribosylpyrophosphate synthetase in the atp[c] + r5p[c] => amp[c] + h[c] +
prpp[c] direction), in accordance with (Lu, Bennet et al. 2010). These genes encode
enzymes that catalyze the phosphoribosylation of ribose 5-phosphate to 5-
phosphoribosyl-1-pyrophosphate, which is necessary for purine metabolism and
nucleotide biosynthesis.
(Lu, Bennet et al. 2010) describe increased nucleotide biosynthesis, and iMAT
predicted GUK1 to be post-transcriptionally upregulated. GUK1 encodes guanylate
kinase, which catalyses the conversion of GMP to GTP as part of
the cGMP cycle; in mammalian phototransduction, this cycle is essential for the
regeneration of cGMP following its hydrolysis by phosphodiesterase. This prediction is
in accordance with genes identified as over-expressed
in frequently gained/amplified chromosome regions in multiple myeloma
(Largo, Alvarez et al. 2006; Young, Ebner et al. 2006). In addition, (da Rocha, Giorgi et
al. 2006) found over-expression of HGF and GUK1 in GH-secreting pituitary adenomas.
Hepatocyte growth factor (HGF) down-modulates FSH-dependent estradiol-17b
(E2) production in ovarian granulosa cells in vitro. The mechanisms of action underlying
the antiestrogenic effects of HGF are vague, although evidence indicates that HGF may
affect cAMP signal transduction in rat granulosa cells. (Zachow and Woolery 2002)
demonstrate that the effects of HGF on cyclic nucleotide PDE activities were manifested
in a selective time-dependent and hormone-dependent manner, in addition to a decrease in
cAMP at 24 h and an increase in cGMP after the HGF treatment. FSH (pituitary
glycoprotein)-induced cAMP (cyclic adenosine monophosphate) PDE was suppressed by
HGF at 24 h but not at 36 h, whereas FSH-dependent cGMP PDE was impaired at 36 h,
but not at 24 h. HGF prevented the IGF-I-dependent reduction in FSH-stimulated cAMP-
PDE activity at 24 and 36 h, and lowered FSH + IGF-I-stimulated cGMP-PDE activity at
36 h, concomitant with an HGF-dependent increase in cGMP content at 24 h. These data
indicate that HGF affects cAMP-directed and cGMP-directed signaling pathways at
multiple sites in granulosa cells. These HGF-dependent effects may provide insight into the
mechanisms of action whereby HGF reduces E2 secretion by granulosa cells. The PDE
gene family (phosphodiesterases: enzymes that catalyze the hydrolysis of phosphodiester
bonds, play an important role in the repair of oxidative DNA damage and belong to the
nucleotide biosynthesis pathway) was predicted by iMAT to be post-transcriptionally
upregulated. Recent studies comparing a normal human mammary
epithelial cell line and a transformed human breast cancer cell line demonstrated that the
levels of PMEs (phosphomonoester) as well as PDEs were extremely low in the normal
cells, and significantly less than in the breast cancer cell line. A further serial study of 25
patients undergoing hormone, chemotherapy and radiotherapy treatments showed a
84
significant correlation between a decrease in PME, PDE and total NTP levels and
response to therapy as measured by a decrease in tumour volume. (Ronen and Leach
2000)
iMAT predicts greatly increased inositol phosphate metabolism activity, with the
associated genes post-transcriptionally upregulated. (Harris, Burns et al. 1993) show
that hepatocyte growth factor stimulates phosphoinositide hydrolysis and mitogenesis in
cultured renal epithelial cells. (Koch, Mancini et al. 2005) determine that SH2-domain-
containing inositol 5-phosphatase (SHIP)-2 binds to c-Met directly via tyrosine residue
1356 and is involved in hepatocyte growth factor (HGF)-induced lamellipodium
formation, cell scattering and cell spreading.
Fatty acid biosynthesis is hypothesized to have a pivotal role in Met-HGF/SF
induced breast cancer (Prof. Ilan Tsarfaty, personal correspondence). (Quash, Fournet et
al. 2003) provide evidence that certain oxoacids formed in anaplerotic reactions control
cell proliferation/apoptosis. Normal human fibroblasts in culture in a serum-deprived
medium require the presence of one of the oxoacids (glyoxylate, pyruvate, 2-
oxoglutarate, or oxaloacetate) for their proliferation, and, of these, glyoxylate is the most
effective. iMAT predicted the post-transcriptional upregulation of the ALDH (aldehyde
dehydrogenase) family of glyoxylate and dicarboxylate metabolism, whose members are
involved in the metabolism of many molecules including certain fats (cholesterol and
other fatty acids) and protein building blocks (amino acids). iMAT also predicts the post-
transcriptional upregulation of aldose reductase of pyruvate metabolism, in accordance
with the speculation of (Gharbi, Gaffney et al. 2002) that increased expression of the
metabolic enzymes carbamoyl-phosphate synthetase, glutaminase, and aldose reductase
in the HBc3.6 cells is a direct consequence of their enhanced proliferation caused by
ErbB-2 over-expression. The ErbB protein family, or epidermal growth factor receptor
(EGFR) family, is a family of four structurally related receptor tyrosine kinases.
Excessive ErbB signaling is associated with the development of a wide variety of solid
tumours. ErbB-1 and ErbB-2 are found in many human cancers and their excessive
signaling may be a critical factor in the development and malignancy of these tumours
(Cho and Leahy 2002).
Cholesterol is essential for the multiplication of all mammalian cells and is expected
to be in higher demand in fast growing cells such as tumour cells. Most cholesterol is
usually supplied to the tumour by the host; however, tumours may also have the
machinery to synthesize it. Animal studies have shown that cholesterol lowering drugs
such as lovastatin attenuate tumour formation and metastasis. Cholesterol is essential for
cell viability and growth, being a critical component of cell membranes, where it serves
several functions including regulation of membrane fluidity, of the activity of membrane
bound proteins such as integrins and membrane bound enzymes, and of several signal
transduction pathways. It has recently been shown that cholesterol is also required for
cell cycle progression from G2 to M phase (Awad, Williams et al. 2003). iMAT predicted
cholesterol metabolism to differentiate between HGF/SF activated metastatic high Met
and low Met cell-lines.
HGF/SF enhances the oxidative phosphorylation pathway of energy production
(Kaplan, Firon et al. 2000). Tumour development is often associated with mitochondrial
DNA (mtDNA) mutations and alterations in mitochondrial genomic function. These
mutations have been identified in bladder, breast, colon, head and neck, kidney, liver,
lung and stomach cancers, and in the hematologic malignancies leukaemia and
lymphoma. Altered expression and mutations in mtDNA-encoded Complexes I, III, IV
and V, as well as mutations in the hypervariable regions of mtDNA, comprise some of
the mitochondrial genomic aberrations found in cancer tissue. Cytochrome c oxidase
belongs to Complex IV of the electron respiratory chain oxidative phosphorylation
system that produces cellular ATP. The mtDNA aberrations in Complex IV were also
identified by (Copeland, Wachsman et al. 2002) in breast cancer. Compared to nuclear
DNA, mtDNA is more susceptible to oxidative damage and is, in general, more mutable.
The occurrence of mtDNA mutations (deletions, point mutations, duplications) in tumour
cells is consistent with the concept that tumour cells are under persistent (constitutive)
oxidative stress, generating higher levels of the ROS superoxide and hydrogen peroxide
than their normal counterparts. This notion is consistent with the fact that mitochondria
contain the complete electron transport system involved in both respiration and oxidative
phosphorylation. Reactive oxygen species are known to function in both the initiation and
promotion of cancer, as well as in decreasing mitochondrial ATP production (Copeland,
Wachsman et al. 2002). iMAT predicts COX IV to be differentially post-transcriptionally
upregulated in the high Met cell-lines.
Several differential metabolites were found to be transported to and secreted from
the high Met cell-lines. Among them is xanthine (a purine base produced on the pathway
of purine degradation and subsequently converted to uric acid by the action of the
xanthine oxidase enzyme), which is predicted by iMAT to be transported from the
cytoplasm to the peroxisome. This is in agreement with the findings of (Lu, Bennet et al.
2010).
This analysis focused on active pathways and post-transcriptionally upregulated
genes, since validation from the literature is more readily available for them than for the
inactive pathways. The tabulated results of this analysis and of the inactive pathways
differential genes analysis can be found in the accompanying Excel files (“Differentiated
Genes from predicted in/active rxns High Low MET_24h”). These results suggest
possible biomarkers of breast cancer progression as well as opportunities for interrupting
tumour progression through the targeting of metabolic pathways.
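The rule behind the post-transcriptional calls discussed in this section can be illustrated with a minimal sketch. It assumes that each gene carries a discretized expression state (low, moderate or high) and that its associated reaction has either a confident iMAT activity prediction or none. A gene whose reaction is predicted active although its expression is not high is flagged as post-transcriptionally upregulated, and, by symmetry (an assumption made here), a gene whose reaction is predicted inactive although its expression is not low is flagged as downregulated. The function name, the labels and the example genes are hypothetical and are not taken from the iMAT implementation.

```python
from typing import Optional

# Hypothetical encoding of the discretized expression states.
LEVEL = {"low": -1, "moderate": 0, "high": 1}


def regulation_call(expression: str, predicted_active: Optional[bool]) -> str:
    """Classify one gene from its expression state and its reaction's predicted activity.

    predicted_active is True/False for a confident prediction and None otherwise.
    """
    if predicted_active is None:
        return "undetermined"
    level = LEVEL[expression]
    if predicted_active and level < 1:
        # Reaction carries flux although the gene is not highly expressed.
        return "post-transcriptionally upregulated"
    if not predicted_active and level > -1:
        # Reaction is silent although the gene is not lowly expressed.
        return "post-transcriptionally downregulated"
    return "consistent with expression"


# Made-up example input: gene -> (expression state, predicted activity).
genes = {
    "GENE_A": ("moderate", True),
    "GENE_B": ("high", True),
    "GENE_C": ("high", False),
    "GENE_D": ("low", None),
    "GENE_E": ("low", False),
}
calls = {gene: regulation_call(expr, act) for gene, (expr, act) in genes.items()}
for gene, call in calls.items():
    print(gene, "->", call)
```

In this sketch GENE_A mirrors the MDH1/MDH2 situation described above: moderate expression combined with a reaction that is predicted to be active.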
3.2 Future Directions
In the immediate future the following feasible enhancements to iMAT’s integrative
approach can be easily implemented, the first having already passed a preliminary test of
viability.
3.2.1 Integrating iMAT’s flux predictions to model a cancer metabolic profile via quantification
One of iMAT’s drawbacks is its inability to predict elevation or reduction in
metabolic flux activity when comparing biological conditions (such as wild-type
compared to cancer cells, or aggressive cancer compared to first-stage cancer, where the
differential changes are in many cases the phenomenon of interest), since its prediction
pertains only to a Boolean activity state. By taking iMAT’s confident flux activity
predictions, imposing them as constraints on the relevant model (in the HGF/SF-Met
analysis, the general human model), and applying FVA, we create a reduced optimal
solution space and thus narrow the FVA predicted flux ranges, which enables the
prediction of flux activity elevation and reduction.
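The idea can be sketched on a toy network. The example below is only an illustration under stated assumptions: the five-reaction network, the integer flux grid, the activity threshold `eps` and the rule that calls a flux elevated or reduced only when the two flux ranges do not overlap are all hypothetical, and exhaustive enumeration stands in for the linear programs that FVA solves on a genome-scale model.

```python
from itertools import product

# Hypothetical toy network: R_in: -> A, R1: A -> B, R2: A -> C,
# R_outB: B ->, R_outC: C ->. Rows of S are the metabolites A, B, C.
REACTIONS = ["R_in", "R1", "R2", "R_outB", "R_outC"]
S = [
    [1, -1, -1, 0, 0],
    [0, 1, 0, -1, 0],
    [0, 0, 1, 0, -1],
]


def flux_ranges(bounds):
    """Return {reaction: (min, max)} over all integer steady-state flux vectors.

    Brute-force stand-in for FVA; only practical for toy networks.
    """
    grids = [range(bounds[r][0], bounds[r][1] + 1) for r in REACTIONS]
    feasible = [
        v for v in product(*grids)
        if all(sum(c * x for c, x in zip(row, v)) == 0 for row in S)
    ]
    if not feasible:
        raise ValueError("constraints admit no steady-state flux vector")
    return {
        r: (min(v[i] for v in feasible), max(v[i] for v in feasible))
        for i, r in enumerate(REACTIONS)
    }


def constrain(bounds, active=(), inactive=(), eps=1):
    """Copy bounds, forcing active reactions to flux >= eps and inactive ones to zero."""
    new = dict(bounds)
    for r in active:
        lo, hi = new[r]
        new[r] = (max(lo, eps), hi)
    for r in inactive:
        new[r] = (0, 0)
    return new


def compare(range_a, range_b):
    """Call a flux elevated/reduced in A relative to B only if the ranges do not overlap."""
    if range_a[0] > range_b[1]:
        return "elevated"
    if range_a[1] < range_b[0]:
        return "reduced"
    return "undetermined"


base = {r: (0, 4) for r in REACTIONS}
unconstrained = flux_ranges(base)
# Two made-up conditions with opposite activity predictions for R1 and R2.
ranges_a = flux_ranges(constrain(base, active=["R1"], inactive=["R2"]))
ranges_b = flux_ranges(constrain(base, active=["R2"], inactive=["R1"]))
calls = {r: compare(ranges_a[r], ranges_b[r]) for r in REACTIONS}
for r in REACTIONS:
    print(r, unconstrained[r], ranges_a[r], ranges_b[r], calls[r])
```

Without the activity constraints every reaction spans the full range in both conditions and no difference can be called. Once the predicted activity states are imposed, the ranges of R1 and R2 separate between the two conditions, which is what allows an elevation or reduction to be stated.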
We performed preliminary tests of this model via pathway enrichment analysis,
across all 4 measured time-stamps (0min, 10min, 30min, 24h) of the 6 high and low Met
cell-lines, and obtained very encouraging results. We found that the predicted elevated
and reduced pathways fit the known pathway signature of each high Met cell-line time-
stamp, forming a predicted kinetic trajectory of HGF/SF Met induction. Our goal is to
project such kinetic trajectories, mapping the predicted behavioural (elevation/reduction)
cycle of the HGF/SF induced high Met metabolic signature.
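A minimal sketch of the enrichment computation follows, assuming that it is the standard one-sided hypergeometric test (the supplementary files describe the enrichment as hyper-geometric based). Given a model with `total` reactions, of which `in_pathway` belong to a pathway, and a set of `selected` reactions (for instance those predicted to be elevated), the p-value is the probability of observing at least `overlap` pathway reactions in the selected set by chance. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total: int, in_pathway: int, selected: int, overlap: int) -> float:
    """Return P(X >= overlap) for X ~ Hypergeometric(total, in_pathway, selected)."""
    if not (0 <= in_pathway <= total and 0 <= selected <= total):
        raise ValueError("in_pathway and selected must lie between 0 and total")
    upper = min(in_pathway, selected)
    # Sum the probability mass of every outcome at least as extreme as the observed overlap.
    tail = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    return tail / comb(total, selected)


# Made-up numbers: 10 reactions, 4 in the pathway, 3 selected, 2 of them in the pathway.
p_value = hypergeom_enrichment_p(total=10, in_pathway=4, selected=3, overlap=2)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice the test is repeated for every pathway, so a correction for multiple testing would normally be applied to the resulting p-values; that step is omitted here.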
When comparing these initial results with the pathway enrichment analysis
described in section 3.1.3.1 for the 30min time-stamp, we see that known HGF/SF driven
elevated pathways such as glycolysis, ROS detoxification and pyrimidine biosynthesis
are uncovered (Kaplan, Firon et al. 2000; personal correspondence with Prof. Ilan
Tsarfaty). Since these pathways are active in both high and low Met cell-lines, they
cannot be discerned by simple pathway enrichment analysis on iMAT’s raw flux activity
predictions.
3.2.2 Weighted iMAT
A criticism of iMAT’s qualitative input (discrete tri-values representing
low/moderate/high expression levels) is that it loses the fine granularity of the expression
intensities. Mitigating this concern, (Banta, Vemula et al. 2007) describe only a moderate
correlation between expression and metabolic flux, such that the quantitative values
should not have much impact. In addition, it is well known in gene expression
measurement that it is almost impossible to compare one set of expression levels with
another, even when measured by the same scientist with the same technology, due to the
tremendous noise levels, which further strengthens the discretization logic. Having said
that, it would still be interesting and worthwhile to confirm that a weighted version of
iMAT does not provide added value. One possible means of integrating expression
weights (levels) is to modify iMAT’s objective function to include the relative expression
weights.
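The difference between the discretized and a weighted objective can be sketched as follows. The sketch only scores given Boolean activity assignments; it does not perform the optimization over flux distributions that iMAT carries out. The thresholds, the choice of weight (the distance of the expression value from the threshold it crosses) and all names are assumptions made for illustration.

```python
def discretize(value, low_thr, high_thr):
    """Map an expression value to -1 (low), 0 (moderate) or 1 (high)."""
    if value <= low_thr:
        return -1
    if value >= high_thr:
        return 1
    return 0


def agreement_score(expression, active, low_thr, high_thr, weighted=False):
    """Score how well a Boolean activity assignment agrees with expression.

    Unweighted: each high/active or low/inactive reaction contributes 1.
    Weighted (assumed variant): it contributes its distance from the threshold.
    """
    score = 0.0
    for rxn, value in expression.items():
        state = discretize(value, low_thr, high_thr)
        if state == 1 and active[rxn]:
            score += (value - high_thr) if weighted else 1.0
        elif state == -1 and not active[rxn]:
            score += (low_thr - value) if weighted else 1.0
    return score


# Made-up expression values and two candidate activity assignments.
expression = {"r1": 9.0, "r2": 6.5, "r3": 2.0, "r4": 5.0}
cand_a = {"r1": True, "r2": False, "r3": False, "r4": True}
cand_b = {"r1": False, "r2": True, "r3": False, "r4": True}

for name, cand in (("A", cand_a), ("B", cand_b)):
    plain = agreement_score(expression, cand, 3.0, 6.0)
    heavy = agreement_score(expression, cand, 3.0, 6.0, weighted=True)
    print(name, plain, heavy)
```

The two candidates tie under the discretized score, whereas the weighted score prefers the one that activates the strongly expressed reaction r1 over the one that activates the marginally expressed r2. Whether such a preference adds value in the presence of measurement noise is exactly the question raised above.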
4 Discussion
This thesis addresses two challenges. The first is a computational endeavour: bringing
a research oriented method to the systems biology community at industry-level
quality. The second is an exploratory venture: understanding the limitations and
applicability of the integrative approach presented by (Shlomi, Cabili et al. 2008),
expanding it to span other forms of high-throughput molecular data, and assessing its
illustrative effectiveness and potential in the quest of modeling cancer.
iMAT online availability: We introduced here an integrative metabolic analysis tool
(iMAT), a web-based implementation of the method of (Shlomi, Cabili et al. 2008),
which will serve the community by enabling the prediction of metabolic fluxes through
the integration of metabolic networks with gene and protein expression and reactome
array data. We demonstrated its utility in the prediction of human breast cancer
metabolism. As a side benefit, this will enable the construction and accumulation of a
corpus of high-throughput molecular data, which can further advance and facilitate our
research objectives.
Metabolic Breast Cancer Modeling: The various manipulations of iMAT’s flux
activity predictions uncovered pathways, enzymes and metabolites significant to the
description of HGF/SF induced Met breast cancer. Experimental validations now need to
be carried out to further explore the interesting directions unearthed.
Metabolic signature: The approaches suggested here initiate a course for the
computational investigation of cancer by aggregating metabolic alterations into a
differential metabolic signature, which can then be utilized for disease diagnosis,
prognosis and treatment. iMAT’s metabolic flux distribution predictions denote the first
step towards disease profiling, and can be employed in various statistical inference and
descriptive tests such as active and inactive pathway enrichment analysis, determining
gene post-transcriptional regulation, prediction of metabolite uptake and secretion under
various conditions, and so forth. From this emerged the need for the design of a learning
framework to process metabolic transformations into an aggregated, comprehensive and
distinctive signature.
5 Bibliography
Akesson, M., J. Forster, et al. (2004). "Integration of gene expression data into genome-scale metabolic models." Metabolic Engineering 6(4): 285-293.
Altucci, L., M. Leibowitz, et al. (2007). "RAR and RXR modulation in cancer and metabolic disease." Nature Reviews Drug Discovery 6(10): 793-810.
Apic, G., T. Ignjatovic, et al. (2005). "Illuminating drug discovery with biological pathways." FEBS letters 579(8): 1872-1877.
Awad, A., H. Williams, et al. (2003). "Effect of phytosterols on cholesterol metabolism and MAP kinase in MDA-MB-231 human breast cancer cells." The Journal of Nutritional Biochemistry 14(2): 111-119.
Banta, S., M. Vemula, et al. (2007). "Contribution of gene expression to metabolic fluxes in hypermetabolic livers induced through burn injury and cecal ligation and puncture in rats." Biotechnology and bioengineering 97(1): 118.
Bard, J. (1998). Practical bilevel optimization: algorithms and applications, Kluwer Academic Pub.
Becker, S. A. and B. O. Palsson (2008). "Context-specific metabolic networks are consistent with experiments." PLoS Computational Biology 4(5).
Beloqui, A., M. E. Guazzaroni, et al. (2009). "Reactome array: Forging a link between metabolome and genome." Science 326(5950): 252.
Bilu, Y., T. Shlomi, et al. (2006). "Conservation of expression and sequence of metabolic genes is reflected by activity across metabolic states." PLoS Comput Biol 2: e106.
Bottaro, D. P., J. S. Rubin, et al. (1991). "Identification of the hepatocyte growth factor receptor as the c-met proto-oncogene product." Science 251(4995): 802.
Burgard, A. and C. Maranas (2003). "Optimization-based framework for inferring and testing hypothesized metabolic objective functions." Biotechnology and Bioengineering 82(6): 670-677.
Burgard, A., P. Pharkya, et al. (2003). "Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization." Biotechnology and Bioengineering 84(6): 647-657.
Chatziioannou, A., G. Palaiologos, et al. (2003). "Metabolic flux analysis as a tool for the elucidation of the metabolism of neurotransmitter glutamate." Metabolic engineering 5(3): 201-210.
Cho, H. and D. Leahy (2002). "Structure of the extracellular region of HER3 reveals an interdomain tether." Science 297(5585): 1330.
Chuang, H. Y., E. Lee, et al. (2007). "Network-based classification of breast cancer metastasis." Molecular systems biology 3: 140.
Cline, M. S., M. Smoot, et al. (2007). "Integration of biological networks and gene expression data using Cytoscape." Nature Protocols 2(10): 2366.
Cooper, C. S. (1992). "The met oncogene: from detection by transfection to transmembrane receptor for hepatocyte growth factor." Oncogene 7(1): 3.
Copeland, W., J. Wachsman, et al. (2002). "Mitochondrial DNA alterations in cancer." Cancer investigation 20(4): 557-569.
Covert, M., E. Knight, et al. (2004). "Integrating high-throughput and computational data elucidates bacterial networks." Nature 429(6987): 92-96.
da Rocha, A., R. Giorgi, et al. (2006). "Hepatocyte growth factor-regulated tyrosine kinase substrate (HGS) and guanylate kinase 1 (GUK1) are differentially expressed in GH-secreting adenomas." Pituitary 9(2): 83-92.
Daran-Lapujade, P., M. L. A. Jansen, et al. (2004). "Role of transcriptional regulation in controlling fluxes in central carbon metabolism of Saccharomyces cerevisiae: a chemostat culture study." Journal of Biological Chemistry 279(10): 9125-9138.
DeBerardinis, R. J., N. Sayed, et al. (2008). "Brick by brick: metabolism and tumor cell growth." Current opinion in genetics & development 18(1): 54-61.
Domach, M., S. Leung, et al. (2000). "Computer model for glucose-limited growth of a single cell of Escherichia coli B/rA (Reprinted from Biotechnology and Bioengineering, vol 26, pg 203-216, 1984)." Biotechnology and Bioengineering 67(6): 827-840.
Duarte, N., S. Becker, et al. (2007). "Global reconstruction of the human metabolic network based on genomic and bibliomic data." Proceedings of the National Academy of Sciences 104(6): 1777.
Duarte, N., M. Herrgård, et al. (2004). "Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model." Genome Research 14(7): 1298.
Durmuş Tekir, S., T. Çakır, et al. (2006). "Analysis of enzymopathies in the human red blood cells by constraint-based stoichiometric modeling approaches." Computational Biology and Chemistry 30(5): 327-338.
Edelman, L., J. Eddy, et al. (2009). "In silico models of cancer." Wiley Interdisciplinary Reviews: Systems Biology and Medicine.
Edwards, J., R. Ibarra, et al. (2001). "In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data." Nature biotechnology 19(2): 125-130.
Edwards, J. and B. Palsson (2000). "The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities." Proceedings of the National Academy of Sciences 97(10): 5528.
Famili, I., J. Förster, et al. (2003). "Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network." Proceedings of the National Academy of Sciences of the United States of America 100(23): 13134.
Feist, A. and B. Palsson (2008). "The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli." Nature biotechnology 26(6): 659-667.
Fell, D. (1997). Understanding the control of metabolism, Portland Press London.
Fong, S., A. Joyce, et al. (2005). "Parallel adaptive evolution cultures of Escherichia coli lead to convergent growth phenotypes with different gene expression states." Genome Research 15(10): 1365.
Fong, S. and B. Palsson (2004). "Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes." Nature genetics 36(10): 1056-1058.
Förster, J., I. Famili, et al. (2003). "Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network." Genome Research 13(2): 244.
Gharbi, S., P. Gaffney, et al. (2002). "Evaluation of two-dimensional differential gel electrophoresis for proteomic expression analysis of a model breast cancer cell system." Molecular & Cellular Proteomics 1(2): 91.
Guldberg, P., F. Rey, et al. (1998). "A European multicenter study of phenylalanine hydroxylase deficiency: classification of 105 mutations and a general system for genotype-based prediction of metabolic phenotype." The American journal of human genetics 63(1): 71-79.
Harris, R., K. Burns, et al. (1993). "Hepatocyte growth factor stimulates phosphoinositide hydrolysis and mitogenesis in cultured renal epithelial cells." Life sciences 52(13): 1091.
Hu, Z., J. Mellor, et al. (2005). "VisANT: data-integrating visual framework for biological networks and modules." Nucleic acids research 33(Web Server Issue): W352.
Ibarra, R., J. Edwards, et al. (2002). "Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth." Nature 420(6912): 186-189.
Jamshidi, N. and B. Palsson (2008). "Formulating genome-scale kinetic models in the post-genome era." Molecular Systems Biology 4: 171.
Joshi, A. and B. Palsson (1990). "Metabolic dynamics in the human red cell. Part III--Metabolic reaction rates." Journal of theoretical biology 142(1): 41.
Joshi, A. and B. O. Palsson (1989). "Metabolic dynamics in the human red cell: Part I--A comprehensive kinetic model." Journal of theoretical biology 141(4): 515-528.
Joyce, A. R. and B. O. Palsson (2006). "The model organism as a system: integrating 'omics' data sets." Nature Reviews Molecular Cell Biology 7(3): 198-210.
Kanehisa, M. and S. Goto (2000). "KEGG: Kyoto encyclopedia of genes and genomes." Nucleic acids research 28(1): 27.
Kaplan, O., M. Firon, et al. (2000). "HGF/SF activates glycolysis and oxidative phosphorylation in DA3 murine mammary cancer cells." Neoplasia (New York, NY) 2(4): 365.
Kauffman, K., P. Prakash, et al. (2003). "Advances in flux balance analysis." Current Opinion in Biotechnology 14(5): 491-496.
Koch, A., A. Mancini, et al. (2005). "The SH2-domian-containing inositol 5-phosphatase (SHIP)-2 binds to c-Met directly via tyrosine residue 1356 and involves hepatocyte growth factor (HGF)-induced lamellipodium formation, cell scattering and cell spreading." Oncogene 24(21): 3436-3447.
Lanpher, B., N. Brunetti-Pierri, et al. (2006). "Inborn errors of metabolism: the flux from Mendelian to complex diseases." Nature Reviews Genetics 7(6): 449-460.
Largo, C., S. Alvarez, et al. (2006). "Identification of overexpressed genes in frequently gained/amplified chromosome regions in multiple myeloma." Haematologica 91(2): 184.
Lee, I. and B. Palsson (1990). "A comprehensive model of human erythrocyte metabolism: extensions to include pH effects." Biomedica biochimica acta 49(8-9): 771.
Lee, J., E. Gianchandani, et al. (2006). "Flux balance analysis in the era of metabolomics." Briefings in bioinformatics 7(2): 140.
Lee, S., C. Palakornkule, et al. (2000). "Recursive MILP model for finding all the alternate optima in LP models for metabolic networks." Computers and Chemical Engineering 24(2-7): 711-716.
Lu, X., B. Bennet, et al. (2010). "Metabolomic changes accompanying transformation and acquisition of metastatic potential in a syngeneic mouse mammary tumor model." The Journal of biological chemistry.
Luscombe, N. M., M. Madan Babu, et al. (2004). "Genomic analysis of regulatory network dynamics reveals large topological changes." Nature 431(7006): 308-312.
Ma, H., A. Sorokin, et al. (2007). "The Edinburgh human metabolic network reconstruction and its functional analysis." Molecular Systems Biology 3: 135.
Mahadevan, R., D. Bond, et al. (2006). "Characterization of metabolism in the Fe (III)-reducing organism Geobacter sulfurreducens by constraint-based modeling." Applied and Environmental Microbiology 72(2): 1558.
Mahadevan, R. and C. Schilling (2003). "The effects of alternate optimal solutions in constraint-based genome-scale metabolic models." Metabolic engineering 5(4): 264-276.
Majewski, R. and M. Domach (1990). "Simple constrained-optimization view of acetate overflow in E. coli." Biotechnology and Bioengineering 35(7): 732-738.
Mo, M. and B. Palsson (2009). "Understanding human metabolic physiology: a genome-to-systems approach." Trends in Biotechnology 27(1): 37-44.
Mulquiney, P. and P. Kuchel (1999). "Model of 2, 3-bisphosphoglycerate metabolism in the human erythrocyte based on detailed enzyme kinetic equations: computer simulation and metabolic control analysis." Biochemical Journal 342(Pt 3): 597.
Mulquiney, P. and P. Kuchel (1999). "Model of 2, 3-bisphosphoglycerate metabolism in the human erythrocyte based on detailed enzyme kinetic equations: equations and parameter refinement." Biochemical Journal 342(Pt 3): 581.
Muoio, D. and C. Newgard (2006). "Obesity-related derangements in metabolic regulation."
Oberhardt, M., B. Palsson, et al. (2009). "Applications of genome-scale metabolic reconstructions." Molecular Systems Biology 5(1).
Overbeek, R., T. Begley, et al. (2005). "The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes." Nucleic acids research 33(17): 5691.
Palsson, B. (2006). Systems biology: properties of reconstructed networks, Cambridge University Press New York, NY, USA.
Palsson, B. (2009). "Metabolic systems biology." FEBS letters.
Park, S. J., S. Y. Lee, et al. (2005). "Global physiological understanding and metabolic engineering of microorganisms based on omics studies." Applied microbiology and biotechnology 68(5): 567-579.
Pharkya, P., A. Burgard, et al. (2004). "OptStrain: a computational framework for redesign of microbial production systems." Genome Research 14(11): 2367.
Price, N., J. Reed, et al. (2004). "Genome-scale models of microbial cells: evaluating the consequences of constraints." Nature Reviews Microbiology 2(11): 886-897.
Quash, G., G. Fournet, et al. (2003). "Anaplerotic reactions in tumour proliferation and apoptosis." Biochemical pharmacology 66(3): 365-370.
Ramakrishna, R., J. Edwards, et al. (2001). "Flux-balance analysis of mitochondrial energy metabolism: consequences of systemic stoichiometric constraints." American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 280(3): 695.
Reed, J. and B. Palsson (2003). "Thirteen years of building constraint-based in silico models of Escherichia coli." Journal of bacteriology 185(9): 2692.
Romero, P., J. Wagg, et al. (2004). "Computational prediction of human metabolic pathways from the complete human genome." Genome biology 6(1): R2.
Ronen, S. and M. Leach (2000). "Breast imaging technology: Imaging biochemistry- applications to breast cancer." Breast Cancer Res 3(1): 36.
Rossell, S., C. C. van der Weijden, et al. (2006). "Unraveling the complexity of flux regulation: a new method demonstrated for nutrient starvation in Saccharomyces cerevisiae." Proceedings of the National Academy of Sciences of the United States of America 103(7): 2166.
Schilling, C., M. Covert, et al. (2002). "Genome-scale metabolic model of Helicobacter pylori 26695." Journal of bacteriology 184(16): 4582.
Schilling, C. and B. Palsson (2000). "Assessment of the metabolic capabilities of Haemophilus influenzae Rd through a genome-scale pathway analysis." Journal of theoretical biology 203(3): 249-283.
Segre, D., D. Vitkup, et al. (2002). "Analysis of optimality in natural and perturbed metabolic networks." Proceedings of the National Academy of Sciences 99(23): 15112.
Selvarasu, S., I. Karimi, et al. "Genome-scale modeling and in silico analysis of mouse cell metabolic network."
Shi, Y. and P. Burn (2004). "Lipid metabolic enzymes: emerging drug targets for the treatment of obesity." Nature Reviews Drug Discovery 3(8): 695-710.
Shlomi, T., O. Berkman, et al. (2005). "Regulatory on/off minimization of metabolic flux changes after genetic perturbations." Proceedings of the National Academy of Sciences of the United States of America 102(21): 7695.
Shlomi, T., M. Cabili, et al. (2008). "Network-based prediction of human tissue-specific metabolism." Nat Biotechnol 26(9): 1003–1010.
Shlomi, T., Y. Eisenberg, et al. (2007). "A genome-scale computational study of the interplay between transcriptional regulation and metabolism." Molecular Systems Biology 3: 101.
Stelling, J., S. Klamt, et al. (2002). "Metabolic network structure determines key aspects of functionality and regulation." Nature 420(6912): 190-193.
Subramanian, A., H. Kuehn, et al. (2007). "GSEA-P: a desktop application for Gene Set Enrichment Analysis." Bioinformatics 23(23): 3251.
Tennant, D., R. Duran, et al. (2009). "Metabolic transformation in cancer." Carcinogenesis 30(8): 1269.
Thiele, I., N. Price, et al. (2005). "Candidate metabolic network states in human mitochondria: Impact of diabetes, ischemia, and diet." Journal of Biological Chemistry 280(12): 11683-11695.
Tummala, S. B., S. G. Junne, et al. (2003). "Transcriptional analysis of product-concentration driven changes in cellular programs of recombinant Clostridium acetobutylicum strains." Biotechnology and bioengineering 84(7): 842-854.
Varma, A., B. Boesch, et al. (1993). "Stoichiometric interpretation of Escherichia coli glucose catabolism under various oxygenation rates." Applied and Environmental Microbiology 59(8): 2465.
Varma, A. and B. Palsson (1994). "Metabolic flux balancing: basic concepts, scientific and practical use." Nature biotechnology 12(10): 994-998.
Vo, T., H. Greenberg, et al. (2004). "Reconstruction and functional characterization of the human mitochondrial metabolic network based on proteomic and biochemical data." Journal of Biological Chemistry 279(38): 39532-39540.
Wallace, D. (2005). "A mitochondrial paradigm of metabolic and degenerative diseases, aging, and cancer: a dawn for evolutionary medicine."
Wiback, S., R. Mahadevan, et al. (2004). "Using metabolic flux data to further constrain the metabolic solution space and predict internal flux patterns: the Escherichia coli spectrum." Biotechnology and Bioengineering 86(3): 317-331.
Wiback, S. and B. Palsson (2002). "Extreme pathway analysis of human red blood cell metabolism." Biophysical Journal 83(2): 808-818.
Workman, C., H. Mak, et al. (2006). "A systems approach to mapping DNA damage response pathways." Science 312(5776): 1054.
Yang, C., Q. Hua, et al. (2002). "Integration of the information from gene expression and metabolic fluxes for the analysis of the regulatory mechanisms in Synechocystis." Applied microbiology and biotechnology 58(6): 813-822.
Young, P., R. Ebner, et al. (2006). Cancer-linked genes as targets for chemotherapy, Google Patents.
Zachow, R. and J. Woolery (2002). "Effects of hepatocyte growth factor on cyclic nucleotide-dependent signaling and steroidogenesis in rat ovarian granulosa cells in vitro." Biology of reproduction 67(2): 454.
6 Supplementary material
Please see attached files:
1. Automatic models list.xls – iMAT supported automatic metabolic models
2. Differentiated Enriched Pathways High Low MET _10.xls – differentiated high
Met significantly enriched pathways for the 10min time-stamp
3. Differentiated Enriched Pathways High Low MET _30.xls - differentiated high
Met significantly enriched pathways for the 30min time-stamp
4. Differentiated Enriched Pathways High Low MET_0.xls - differentiated high Met
significantly enriched pathways for the 0min time-stamp
5. Differentiated Enriched Pathways High Low MET_24h.xls - differentiated high
Met significantly enriched pathways for the 24h time-stamp
6. Differentiated Genes from predicted active rxns High Low MET_24h.xls
7. Differentiated Genes from predicted inactive rxns High Low MET_24h.xls
8. Elevated High Met Celline Enriched Pathways_0.xls – Enriched pathways found
to be elevated in the high Met cell-line for the 0min time-stamp
9. Elevated High Met Celline Enriched Pathways_10.xls – Enriched pathways found
to be elevated in the high Met cell-line for the 10min time-stamp
10. Elevated High Met Celline Enriched Pathways_24h.xls – Enriched pathways
found to be elevated in the high Met cell-line for the 24h time-stamp
11. Elevated High Met Celline Enriched Pathways_30.xls – Enriched pathways found
to be elevated in the high Met cell-line for the 30min time-stamp
12. Expression Differentiated Enriched Pathways High Low MET_24h.xls – High
Met differential hyper-geometric based pathway enrichment analysis based solely
on expression data.
13. Expression High Met Celline Enriched Pathways_24h.xls - High Met hyper-
geometric based pathway enrichment analysis based solely on expression data.
14. Expression Low Met Celline Enriched Pathways_24h.xls – Low Met hyper-
geometric based pathway enrichment analysis based solely on expression data.
15. Genes requested by Prof. Ilan Tsarfaty.xls
16. iMAT User Guide.doc
17. Reduced High Met Celline Enriched Pathways_0.xls – Enriched pathways found
to be reduced in the high Met cell-line for the 0min time-stamp
18. Reduced High Met Celline Enriched Pathways_10.xls – Enriched pathways found
to be reduced in the high Met cell-line for the 10min time-stamp
19. Reduced High Met Celline Enriched Pathways_24h.xls – Enriched pathways
found to be reduced in the high Met cell-line for the 24h time-stamp
20. Reduced High Met Celline Enriched Pathways_30.xls – Enriched pathways found
to be reduced in the high Met cell-line for the 30min time-stamp
21. Low Met Celline Enriched Pathways_24h.xls
22. High Met Celline Enriched Pathways_0.xls
23. High Met Celline Enriched Pathways_24h.xls
24. Low Met Celline Enriched Pathways_0.xls
25. Low Met Celline Enriched Pathways_30.xls
26. High Met Celline Enriched Pathways_10.xls
27. High Met Celline Enriched Pathways_30.xls
28. Low Met Celline Enriched Pathways_10.xls