
Tel Aviv University

The Raymond and Beverly Sackler Faculty of Exact Sciences

The Blavatnik School of Computer Science

iMAT: A Metabolic Integration Tool

Application to Breast Cancer

This thesis was submitted in partial fulfillment of the requirements for the degree

"Master of Science" (M.Sc.) at Tel Aviv University

The Blavatnik School of Computer Science

by

Hadas Zur

The work was carried out under the supervision of Prof. Eytan Ruppin

Shevat, 5770

Tel-Aviv University

The Raymond and Beverly Sackler Faculty of Exact Sciences

Blavatnik School of Computer Science

iMAT: Integrative Metabolic Analysis Tool

Application to Human Breast Cancer

This thesis is submitted in partial fulfillment of the requirements

towards the M.Sc. degree in Computer Science

Tel-Aviv University

Blavatnik School of Computer Science

by

Hadas Zur

The research work in this thesis has been carried out

under the supervision of Prof. Eytan Ruppin

January, 2010

Acknowledgments

First, I would like to express my deepest gratitude to my supervisor, Prof. Eytan Ruppin.

His guidance enabled me to view up close a scientist and a man. An accomplished

scientist with innovative ideas, motivated by passion and a desire for true understanding.

A man who consistently assists and advises, patiently and always with a smile. For his

non-trivial support during my mother’s hospitalization period, I will forever be grateful.

I would also like to thank Dr. Tomer Shlomi for his guidance in my first steps in the lab

and for all his help during my research.

Many thanks to Prof. Ilan Tsarfati for his willingness, insight and knowledge.

Very special thanks to Tomer Benyamini for his open ear, invaluable advice and

generous availability. Keren Yizhak, your reason and humor were instrumental, thank

you. To my lab colleagues, my friends, Adi Shabi, Ori Folger and Livnat Jerby, much

gratitude.

Finally, I would like to thank my family for all their help and support.

Abstract

Cancer is a complex disease that involves a variety of biological phenomena. This complexity presents considerable challenges for characterizing cancer biology, and motivates the study of cancer in its molecular and physiological context. Computational models of cancer are being developed to aid biological and medical research. The development of these models is facilitated by the rapid advance of experimental and analytical tools that generate information-rich, large-scale data. In this work we present iMAT (Integrative Metabolic Analysis Tool), which enables the integration of genomic (gene expression), proteomic and reactome array data with a metabolic model to predict metabolic fluxes, extending the approach presented in (Shlomi, Cabili et al. 2008). Specifically, we present a new CBM method for integrating reactome data to predict metabolic flux. While large-scale genomic and proteomic data have been available for some time, reactome technology was developed only very recently (Beloqui, Guazzaroni et al. 2009). Predicting metabolic fluxes from large-scale molecular data could advance our understanding of cellular metabolism, since current experimental approaches are limited to measuring a few individual fluxes. Here we demonstrate the utility of iMAT in predicting the metabolic fluxes of breast cancer cells, where the predictions correspond to known metabolic alterations.

Abstract

Cancer is a complex disease that involves multiple types of biological interactions across

diverse physical, temporal, and biological scales. This complexity presents substantial

challenges for the characterization of cancer biology, and motivates the study of cancer in

the context of molecular, cellular, and physiological systems. Computational models of

cancer are being developed to aid both biological discovery and clinical medicine. The

development of these in silico models is facilitated by rapidly advancing experimental

and analytical tools that generate information-rich, high-throughput biological data. In

this work we introduce iMAT, an Integrative Metabolic Analysis Tool, enabling the

integration of transcriptomic, proteomic, and reactome array data with metabolic network

models to predict metabolic flux, developing variants of the approach presented in

(Shlomi, Cabili et al. 2008). Specifically, we present a new constraint-based method for

the integration of a genome-scale metabolic network with reactome array data to predict

metabolic flux activity. While high-throughput transcriptomic and proteomic data have

been available for quite some time, reactome array technology was developed only very

recently (Beloqui, Guazzaroni et al. 2009), providing exciting new genome-scale

data on the rate of metabolite transformation by enzymes present in cell extracts.

The prediction of metabolic fluxes based on high-throughput molecular data sources

could help advance our understanding of cellular metabolism, since current experimental

approaches are limited to measuring fluxes through merely a few dozen enzymes. Here

we demonstrate the utility of iMAT in predicting metabolic flux activities in breast

cancer cell lines, where its predictions correspond with previously measured cancer

metabolic alterations.


Contents

1 Introduction
1.1 Modeling Cellular Metabolism
1.2 Constraint-Based Modelling
1.3 Modeling Human Metabolism
1.4 High-throughput molecular data and Metabolic Model Integration
1.5 Human Cancer and Metabolism
1.5.1 Breast Cancer Metabolism
1.5.2 Modeling Cancer Metabolism
1.6 Reactome Array data Technology
1.7 Automatically generated Metabolic Models
2 iMAT: Integrative Metabolic Analysis Tool
2.1 Online Tool Development
2.1.1 Online availability
2.1.2 An illustrative example of applying iMAT to a toy network model
2.2 Expanding approach: Integration of Reactome Array data
2.2.1 Modeling P. putida's Metabolic Profile via Reactome Array Integration
3 Modeling Human Breast Cancer
3.1 Results
3.1.1 Data Acquisition and Preprocessing
3.1.2 Analysis Overview
3.1.3 iMAT on general human model integrated with expression data
3.2 Future Directions
3.2.1 Integrating iMAT's flux predictions to model a cancer metabolic profile via quantification
3.2.2 Weighted iMAT
4 Discussion
5 Bibliography
6 Supplementary material


1 Introduction

Metabolism is widely known to play a key part in human physiology. Its study is crucial for understanding disease states and their progression, aging, and nutrition, and for improving the performance of athletes, astronauts and soldiers.

In particular, metabolism is known to be involved in many major disease states, such as diabetes, obesity and cardiovascular disease. Cancers display highly abnormal metabolic phenotypes, and metabolic targets have long been used in cancer chemotherapy. More recently, there is growing evidence that metabolism has significant effects on physiological and pathophysiological brain function, in conditions ranging from schizophrenia to neurodegenerative disorders. Successful implementation of molecular systems biology of human metabolism is thus likely to have broad consequences (Mo and Palsson 2009).

1.1 Modeling Cellular Metabolism

The intricate nature of human physiology renders its study an arduous undertaking, and a systems biology approach is necessary to comprehend the complex interactions involved. Network reconstruction is a pivotal step in systems biology and a common denominator of the field, as all systems biology research on a target organism relies on such a representation (Mo and Palsson 2009). Genome-scale metabolic networks represent the repertoire of chemical transformations that take place in an organism in a biochemically, genetically and genomically structured (BiGG) manner (Figure 1) (Mahadevan, Bond et al. 2006), and allow the formulation of genome-scale models (GEMs). GEMs enable the computation of phenotypic traits based on the genetic composition of the target organism (Palsson 2009). Modern genome sequencing technologies enable the rapid reconstruction of metabolic networks, and more than 50 highly curated metabolic reconstructions have been published to date (Duarte, Herrgård et al. 2004; Feist and Palsson 2008). These reconstructions span all three domains of life (Eukaryota, Bacteria and Archaea) and encompass dozens of bacterial and yeast species, including various pathogens and industrially relevant organisms, the model plant Arabidopsis, and mammalian metabolic networks including mouse and human (Figure 2) (Duarte, Becker et al. 2007). The scope and content of network reconstructions continue to grow, for instance to include the entire transcription/translation apparatus of a cell and structural information about the metabolic enzymes (Palsson 2009). Major efforts are currently under way to develop computational methods that automatically reconstruct metabolic network models for additional organisms from genomic and functional genomic data; such efforts have recently produced draft network reconstructions for 160 microbial species (Overbeek, Begley et al. 2005). Reconstructed metabolic networks have been used toward five major ends: (1) contextualization of high-throughput data, (2) guidance of metabolic engineering, (3) directing hypothesis-driven discovery, (4) interrogation of multi-species relationships, and (5) network property discovery (Figure 3) (Oberhardt, Palsson et al. 2009). Specific common uses include metabolic phenotype prediction (Guldberg, Rey et al. 1998), metabolic engineering (Pharkya, Burgard et al. 2004), studies of network evolution (Fong, Joyce et al. 2005), and biomedical applications (Apic, Ignjatovic et al. 2005). These studies employ various constraint-based modeling (CBM) methods, which analyze network function by relying solely on simple physicochemical constraints (Price, Reed et al. 2004), whereas more traditional modeling techniques are based on mathematical approaches that require detailed information on kinetics and on enzyme and metabolite concentrations (Fell 1997; Domach, Leung et al. 2000).

Figure 1: Incorporation of genomic and biochemical knowledge derived from the genome annotation and experimental literature into a BiGG-structured knowledge base network. High-throughput annotation data provides information on gene products, transcript variants and their associated functions, as well as localization (i.e. cellular compartment and tissue). Literature documents specific biochemical details from experiments on the gene product functions, such as reaction mechanism and substrate specificity. Figure from (Mo and Palsson 2009)


Figure 2: Reconstruction statistics. The cumulative number of metabolic GENREs published over the past decade is shown in (A). (B–D) Histograms of the number of metabolic GENREs containing varying numbers of genes (B), metabolites (C), and reactions (D). (E) Histogram of the number of reconstructions published per species. All histograms display prokaryotic (green) and eukaryotic (brown) statistics. Figure from (Oberhardt, Palsson et al. 2009)


Figure 3: Uses of metabolic GENREs. The building and analysis of metabolic GENREs are shown in the left panels, and the five categories of uses of metabolic GENREs are described in the right set of panels. Each panel on the right includes a representative example from literature. Figure from (Oberhardt, Palsson et al. 2009)

Essentially, a metabolic network consists of a set of chemical reactions that can be represented as a set of chemical equations (Figure 4a-b) in which the reaction stoichiometry, i.e., the quantitative relationship between the reactants and products of a balanced chemical reaction, is embedded (Palsson 2006) (Figure 4b). Metabolic reactions can be divided into two distinct types: catabolic reactions, which break down molecules and release energy, and anabolic reactions, which use energy to build essential cell components. The dynamics of a metabolic network are described by the metabolic fluxes of its reactions, i.e., the rates at which compounds are produced or consumed by each reaction. The stoichiometry of a metabolic network can be represented in matrix form, commonly referred to as the stoichiometric matrix and denoted by S (Figure 4c). The rows of S correspond to biochemical compounds (metabolites), its columns correspond to biochemical reactions, and its entries are the integer stoichiometric coefficients (negative for consumed substrates, positive for formed products), so that each column conveys the compound balance of the corresponding reaction. Similarly, each row lists all the reactions in which a given compound participates, thus representing reaction interconnections in the metabolic network. The stoichiometric matrix embodies both chemical and network information, and transforms the network flux vector (containing the flux rate of each reaction in the network) into the time derivatives of the metabolite concentrations (Figure 4d). Adding information such as protein localization, kinetic constants, and intracellular enzyme and metabolite concentrations to the stoichiometric matrix yields a structured database that forms the basis for various mathematical and in silico analyses of the network.
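To make the matrix representation concrete, the following minimal sketch builds S for a small hypothetical network (not the network of Figure 4; reaction and metabolite names are illustrative assumptions) and computes the time derivatives of the metabolite concentrations as dx/dt = S · v, using plain Python lists so no external library is assumed.

```python
# Hypothetical toy network (not the network of Figure 4):
#   R_up : (uptake) -> A
#   R_AB : 2 A      -> B
#   R_AC : A        -> (secreted by-product)
#   R_bio: B        -> (biomass drain)
METABOLITES = ["A", "B"]
REACTIONS = ["R_up", "R_AB", "R_AC", "R_bio"]

# Rows = metabolites, columns = reactions.
# Negative entries: consumed substrates; positive entries: formed products.
S = [
    [1, -2, -1, 0],   # A
    [0,  1,  0, -1],  # B
]


def concentration_derivatives(S, v):
    """Return dx/dt = S . v: one time derivative per metabolite (row of S)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


# Each column is the compound balance of one reaction.
column_R_AB = [row[REACTIONS.index("R_AB")] for row in S]
print(dict(zip(METABOLITES, column_R_AB)))    # {'A': -2, 'B': 1}

# Each row lists the reactions in which a metabolite participates.
row_A = S[METABOLITES.index("A")]
print([r for r, s in zip(REACTIONS, row_A) if s != 0])   # ['R_up', 'R_AB', 'R_AC']

# An arbitrary flux vector in which A accumulates and B is depleted.
v = [10, 3, 2, 4]
print(dict(zip(METABOLITES, concentration_derivatives(S, v))))   # {'A': 2, 'B': -1}
```

A flux vector such as [7, 3, 1, 3] gives dx/dt = [0, 0], i.e., a steady state, which is the condition the constraint-based approach imposes.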

While the stoichiometric matrix captures the static aspect of a metabolic model, its dynamic facet, a kinetic model composed of a set of differential equations describing the change in metabolite concentrations over time, can be assembled when kinetic constants and intracellular enzyme and metabolite concentrations are available. For instance, comprehensive kinetic models are available for the human red blood cell (Joshi and Palsson 1989; Joshi and Palsson 1990; Lee and Palsson 1990; Mulquiney and Kuchel 1999). However, the lack of accurate and comprehensive data on such parameters currently limits the applicability of these methods to small-scale systems; only recently was a workflow for formulating large-scale kinetic models outlined (Jamshidi and Palsson 2008). An alternative approach that bypasses this hurdle is Constraint-Based Modeling (CBM), which analyzes the function of large-scale metabolic networks by relying solely on simple physicochemical and physiological constraints (Price, Reed et al. 2004). In recent years, CBM has frequently been used to successfully predict various phenotypes of microorganisms. In this thesis I describe a CBM approach for integrating a static metabolic network with transcriptomic, proteomic, or reactome array data, and show its applicability to predicting human breast cancer metabolism from gene expression data and Pseudomonas putida metabolism from reactome array data.


Figure 4: Formal representation of a metabolic network model. (a) A schematic illustration of a metabolic network. The nodes a, b, c represent metabolites and the edges r_i represent reactions. (b) Each reaction is formulated as a chemical equation that is balanced according to the integer coefficients. (c) The stoichiometric matrix S; rows represent metabolites and columns represent reactions. (d) The dynamic mass balance: matrix S transforms the reactions' flux vector v into a vector containing the time derivatives of all metabolite concentrations. The figure is adapted from (Lee, Gianchandani et al. 2006).


1.2 Constraint-Based Modelling

The recent flood of genomic, transcriptomic, and other high-throughput data makes it increasingly pressing to interpret this information in a systemic fashion. The construction of in silico models offers a way to interpret these data and place them in the context of cellular physiology. A variety of in silico modeling approaches have been developed in biology, including detailed kinetic models, cybernetic models, stochastic models, metabolic control analysis, biochemical systems theory, and constraint-based methods. Modern modeling approaches in biology need to be easily scalable and able to integrate available “-omics” data that may contain tens of thousands of measurements. The constraint-based modeling approach meets these criteria and is at present the only methodology by which genome-scale models have been constructed. The few parameters used in a constraint-based framework enable models to be built quickly and to encompass a larger portion of biochemical reaction networks than other modeling methodologies currently do. To date, constraint-based models account for the largest metabolic models in terms of numbers of genes and reactions, and have proven predictive of several types of data, including phenomic data, qualitative transcriptomic data, and gene knockout data (Reed and Palsson 2003).

Advances in genome sequencing and annotation, along with a wealth of chemical literature, have enabled the reconstruction of several genome-scale metabolic networks comprising hundreds to thousands of enzymes, reactions and metabolites (Price, Reed et al. 2004). Given a stoichiometric matrix S that encapsulates the biochemical reactions and compounds involved in a metabolic system, a CBM model imposes mass balance, thermodynamic and maximum/minimum flux constraints to define a set of flux vectors representing all possible steady states of the network. Over the past decade, constraint-based models have been developed for a variety of systems, including bacterial and yeast metabolism (Edwards and Palsson 2000; Schilling and Palsson 2000; Schilling, Covert et al. 2002; Förster, Famili et al. 2003), the red blood cell (Wiback and Palsson 2002), the human cardiac mitochondria (Vo, Greenberg et al. 2004), glutamate neurotransmission (Chatziioannou, Palaiologos et al. 2003), the human cell metabolic model (Duarte, Becker et al. 2007), and, recently, a mouse cell metabolic model (Selvarasu, Karimi et al.).

While kinetic models may ultimately provide a detailed understanding of integrated cellular functions, they are limited by the current availability of the information needed to construct them, and by the fact that kinetic constants can vary across a population and change over time through evolution. The constraint-based modeling procedure does not strive to find a single solution, but rather identifies the set of all allowable solutions to the governing equations. Solutions that violate any of the imposed constraints are excluded from this set, which is mathematically termed the solution space. Applying additional constraints further reduces the solution space and, consequently, the number of allowable solutions that a cell can utilize. The constraints used in the first generation of constraint-based models include stoichiometric constraints (mass balance), thermodynamic constraints (reaction reversibility), and enzymatic capacity constraints (upper bounds on reaction fluxes set to appropriate values) (Reed and Palsson 2003).


The CBM approach represents the constraints on the network as a set of linear equations and inequalities on the network's flux vector v (Price, Reed et al. 2004):

$$S \cdot v = 0 \qquad (1)$$

$$v_{\min} \le v \le v_{\max} \qquad (2)$$

The steady-state assumption represented in equation (1) states that there is no accumulation or depletion of metabolites in the metabolic network. Therefore, the production rate of each metabolite equals its consumption rate and there is no concentration change (the time derivatives of all metabolite concentrations equal zero). The thermodynamic constraints (i.e., which reactions are reversible and which are irreversible under physiological conditions) and flux capacity constraints (i.e., the maximal rate that a reaction's enzyme can support) define bounds on the flux vector and are embedded in equation (2). Additional constraints, such as those describing the nutrients available in the environment or a genetic perturbation, may also be included. For example, to eliminate the activity of a gene (a "knockout experiment"), the minimal and maximal flux bounds of the corresponding reaction i should be set to zero (i.e., $v_{\min,i} = v_{\max,i} = 0$). Similarly, to prevent the uptake of a metabolite from the environment, the flux bounds of the corresponding uptake reactions should be set to zero. As more constraints are applied, the solution space shrinks and models the specific biological system at hand more accurately, with the remaining solutions describing more likely functional physiological states (Mahadevan and Schilling 2003; Price, Reed et al. 2004).
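The following sketch checks whether a given flux vector lies in the solution space defined by equations (1) and (2), and shows how a knockout, implemented by setting a reaction's bounds to zero, removes a previously feasible flux distribution. The toy network, its bounds, and the helper names `is_feasible` and `knockout` are hypothetical, chosen for illustration only.

```python
# Hypothetical toy network: R_up: -> A, R_AB: 2 A -> B, R_AC: A -> by-product, R_bio: B -> biomass
S = [
    [1, -2, -1, 0],   # A
    [0,  1,  0, -1],  # B
]
REACTIONS = ["R_up", "R_AB", "R_AC", "R_bio"]
# Illustrative bounds (arbitrary units); all reactions irreversible, so lower bounds are 0.
V_MIN = [0, 0, 0, 0]
V_MAX = [10, 6, 10, 10]


def is_feasible(S, v, v_min, v_max, tol=1e-9):
    """True if v satisfies S.v = 0 (eq. 1) and v_min <= v <= v_max (eq. 2)."""
    steady_state = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    within_bounds = all(lo - tol <= x <= hi + tol for lo, x, hi in zip(v_min, v, v_max))
    return steady_state and within_bounds


def knockout(v_min, v_max, reaction_index):
    """Return copies of the bounds with one reaction forced to zero flux."""
    lo, hi = list(v_min), list(v_max)
    lo[reaction_index] = hi[reaction_index] = 0
    return lo, hi


v = [10, 4, 2, 4]   # A: 10 - 8 - 2 = 0, B: 4 - 4 = 0
print(is_feasible(S, v, V_MIN, V_MAX))                    # True

ko_min, ko_max = knockout(V_MIN, V_MAX, REACTIONS.index("R_AB"))
print(is_feasible(S, v, ko_min, ko_max))                  # False: v uses R_AB
print(is_feasible(S, [10, 0, 10, 0], ko_min, ko_max))     # True: all A is secreted
```

The tolerance `tol` is an assumption that guards against floating-point round-off when fluxes come from a numerical solver; with integer fluxes, as here, it has no effect.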


Extreme pathways and elementary modes represent sets of vectors that describe

the solution space and are themselves biochemically valid flux distributions through a

defined metabolic network. Elementary modes are unique vectors that characterize the

solution space. An elementary mode is defined as a “minimal set of enzymes that could

operate at steady-state with all irreversible reactions proceeding in the appropriate

direction”. Extreme pathways are related to elementary modes and correspond directly to

the edges of the convex solution space. Positive linear combinations of these vectors can

be used to generate any valid steady-state flux solution under the governing constraints.

These analysis methods are useful for characterizing the solution space; the next step is to determine which solution within it the cell actually uses (Reed and Palsson 2003).
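A small worked example, using a hypothetical four-reaction toy network that is not the network of Figure 4: R_up: → A, R_AB: 2 A → B, R_AC: A → by-product, and R_bio: B → biomass, all irreversible. Its two elementary modes are e_1 (uptake converted to biomass via B) and e_2 (uptake secreted as by-product), and a steady-state flux vector decomposes as a nonnegative combination of them:

```latex
S=\begin{pmatrix}1&-2&-1&0\\0&1&0&-1\end{pmatrix},\qquad
e_1=\begin{pmatrix}2\\1\\0\\1\end{pmatrix},\qquad
e_2=\begin{pmatrix}1\\0\\1\\0\end{pmatrix},\qquad
S\,e_1 = S\,e_2 = 0

v=\begin{pmatrix}10\\4\\2\\4\end{pmatrix}=4\,e_1+2\,e_2,
\qquad S\,v = 4\,S\,e_1 + 2\,S\,e_2 = 0
```

In this toy network every steady-state flux with v ≥ 0 has the form $a\,e_1 + b\,e_2$ with $a = v_{AB} \ge 0$ and $b = v_{AC} \ge 0$; the specific vector and coefficients are illustrative only.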

The aforementioned set of constraints defines a solution space of alternative flux distributions that can be explored via different optimization and sampling techniques (Price, Reed et al. 2004; Palsson 2006). Flux Balance Analysis (FBA) is the most widely studied CBM method (Varma and Palsson 1994; Kauffman, Prakash et al. 2003); it searches, among all feasible steady-state solutions, for one that maximizes a given objective function. In FBA of microorganisms, the prevailing assumption is that the organism strives to maximize its growth rate (or biomass production). To implement this, an artificial reaction that drains biosynthetic precursors in the ratios required to produce the cellular components is added to microbial CBM models (Varma, Boesch et al. 1993). Notably, FBA under the biomass maximization hypothesis has successfully predicted an impressive array of phenotypes observed in microorganisms, such as their growth rates (Edwards, Ibarra et al. 2001), uptake rates, by-product secretion (Varma, Boesch et al. 1993), the outcomes of adaptive evolution (Ibarra, Edwards et al. 2002; Fong and Palsson 2004), gene expression levels (Famili, Förster et al. 2003), metabolic flux rates (Segre, Vitkup et al. 2002; Wiback, Mahadevan et al. 2004; Shlomi, Berkman et al. 2005), and knockout lethality (Edwards and Palsson 2000).
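FBA as described above is a linear program. Practical implementations pass it to an LP solver (simplex or interior-point). To keep this sketch dependency-free, it instead uses vertex enumeration: it checks every vertex of the bounded feasible polytope, which is exact but exponential and suitable only for toy networks. The network, bounds, biomass objective and function names are hypothetical, and S is assumed to have full row rank.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(A, b):
    """Solve A x = b exactly (Gauss-Jordan over Fractions); return None if A is singular."""
    n = len(A)
    M = [[Fraction(a) for a in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fba_vertex_enumeration(S, v_min, v_max, objective):
    """Maximize objective . v subject to S v = 0 and v_min <= v <= v_max.

    Assumes S has full row rank. Each vertex of the bounded feasible polytope is the
    unique solution of S v = 0 plus (n - rows of S) active bounds, and an LP optimum
    over such a polytope is attained at a vertex. Exponential in n: toy networks only.
    """
    n = len(v_min)
    bound_rows = []   # (unit row e_j, bound value) for every lower and upper bound
    for j in range(n):
        unit = [1 if k == j else 0 for k in range(n)]
        bound_rows.append((unit, v_min[j]))
        bound_rows.append((unit, v_max[j]))
    best_v, best_z = None, None
    for active in combinations(bound_rows, n - len(S)):
        A = [list(row) for row in S] + [row for row, _ in active]
        b = [0] * len(S) + [value for _, value in active]
        v = solve_square(A, b)
        if v is None or not all(lo <= x <= hi for lo, x, hi in zip(v_min, v, v_max)):
            continue   # singular system, or a vertex outside the bounds
        z = sum(c * x for c, x in zip(objective, v))
        if best_z is None or z > best_z:
            best_v, best_z = v, z
    return best_v, best_z


# Hypothetical toy network: R_up: -> A, R_AB: 2 A -> B, R_AC: A -> by-product, R_bio: B -> biomass
S = [[1, -2, -1, 0],
     [0, 1, 0, -1]]
V_MIN = [0, 0, 0, 0]       # all reactions irreversible
V_MAX = [10, 6, 10, 10]    # illustrative capacity and uptake limits
BIOMASS = [0, 0, 0, 1]     # objective: maximize flux through R_bio

v_opt, growth = fba_vertex_enumeration(S, V_MIN, V_MAX, BIOMASS)
print([int(x) for x in v_opt], growth)   # [10, 5, 0, 5] 5  (limited by uptake)

# Knockout of R_AB: both of its bounds are set to zero.
ko_max = list(V_MAX)
ko_max[1] = 0
_, ko_growth = fba_vertex_enumeration(S, V_MIN, ko_max, BIOMASS)
print(ko_growth)                         # 0
```

Here the optimal growth is limited by uptake (2 v_AB ≤ 10) rather than by the capacity of R_AB (6), and the R_AB knockout is predicted to be lethal, mirroring in miniature how FBA is used to predict growth rates and knockout lethality.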

FBA uses Linear Programming (LP) to find an optimal flux vector v that satisfies the linear constraints (1-2) and optimizes a linear objective function (Figure 5). However, other cellular objective functions and optimization techniques have been explored when investigating different metabolic phenotypes and conditions (Price, Reed et al. 2004). For example, maximization of ATP production was postulated as a cellular objective (Majewski and Domach 1990; Ramakrishna, Edwards et al. 2001). In (Burgard and Maranas 2003), a Quadratic Programming (QP) method was used to find the flux distribution with minimum Euclidean distance from a set of experimentally measured fluxes. Minimization of Metabolic Adjustment (MOMA) also employs QP, to identify a flux distribution in the flux space of a knockout strain that has minimum Euclidean distance from the wild-type flux distribution (Segre, Vitkup et al. 2002). In a similar manner, Regulatory On/Off Minimization (ROOM) employs Mixed Integer Linear Programming (MILP) to minimize the number of significant (Boolean) flux changes between the wild-type and knockout strains (Shlomi, Berkman et al. 2005).
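For reference, the MOMA problem described above can be sketched as the following quadratic program, written to be consistent with equations (1) and (2). Here w denotes the wild-type flux distribution (for example, an FBA optimum) and k the index of the knocked-out reaction; these symbol names are chosen for illustration.

```latex
\begin{aligned}
\min_{v}\quad & \lVert v - w \rVert_2^2 = \sum_{j} (v_j - w_j)^2 \\
\text{s.t.}\quad & S \cdot v = 0, \\
& v_{\min} \le v \le v_{\max}, \\
& v_{\min,k} = v_{\max,k} = 0 .
\end{aligned}
```

Because the objective is a convex quadratic and all constraints are linear, this is a convex QP, whereas FBA optimizes a linear objective over the same kind of constraint set.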


To exhaustively enumerate all alternative optimal solutions, Lee et al. applied a MILP formulation to a small metabolic network consisting of 33 reactions and 30 metabolites (Lee, Palakornkule et al. 2000). Later, a combined metabolic/regulatory CBM model was reconstructed for E. coli (Covert, Knight et al. 2004). In SR-FBA, MILP is used to introduce a set of additional Boolean variables that translate the Boolean logic underlying regulatory constraints, and the mapping between genes and reactions, into linear equations (Shlomi, Eisenberg et al. 2007). OptKnock defines a bi-level optimization problem that finds a limited set of gene deletions maximizing the production of a desired metabolite, assuming that the mutant cell still maximizes its growth under the stoichiometric constraints noted above (Burgard, Pharkya et al. 2003). Drawing upon LP duality theory, the bi-level optimization problem is elegantly transformed into a single MILP problem (Bard 1998). These examples of MILP methods, along with other computational approaches (Lee, Gianchandani et al. 2006), have been applied largely to CBM models of microorganisms.
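MILP formulations such as SR-FBA rely on expressing Boolean logic as linear constraints over 0/1 variables. SR-FBA's actual formulation is more elaborate; this sketch shows only the standard textbook linearization of AND and OR gates, and verifies by truth table that the linear constraints admit exactly the Boolean result. The helper names are hypothetical.

```python
from itertools import product

# Standard linearization of Boolean gates over binary (0/1) variables:
#   z = x AND y  <=>  z <= x,  z <= y,  z >= x + y - 1
#   z = x OR  y  <=>  z >= x,  z >= y,  z <= x + y


def and_constraints_hold(x, y, z):
    return z <= x and z <= y and z >= x + y - 1


def or_constraints_hold(x, y, z):
    return z >= x and z >= y and z <= x + y


def feasible_z(constraints, x, y):
    """All binary z values that satisfy the linear constraints for binary x, y."""
    return [z for z in (0, 1) if constraints(x, y, z)]


for x, y in product((0, 1), repeat=2):
    # Columns: x, y, feasible z for AND, feasible z for OR
    print(x, y, feasible_z(and_constraints_hold, x, y), feasible_z(or_constraints_hold, x, y))
```

Because each input combination leaves exactly one feasible z, a solver is forced to set z to the Boolean value; composing such gates lets gene-to-reaction rules (e.g., a reaction requiring two enzyme subunits) be written as linear constraints.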


Figure 5: Flux Balance Analysis formulation: An illustrative example of employing LP to find a steady state flux distribution for the network shown in Figure 4a. The figure is adapted from (Lee, Gianchandani et al. 2006).


Figure 6: Constraint-based modeling: Application of constraints to a reconstructed metabolic network leads to a defined solution space in which a cell’s network must operate. From this solution space a number of methods have been developed that help predict or explain phenotypic behaviour. Linear optimization can be used to find solutions in the space that maximize or minimize a given objective, and mixed-integer linear programming (MILP) can be used to find multiple optima if they exist. Elementary mode analysis and extreme pathway analysis can be used to characterize vectors in the solution space; the edges of the space correspond to extreme pathways (ExPa) and are a subset of the elementary modes (EM). Phenotypic phase plane analysis shows for what conditions the metabolic network operates under different limitations. The effects of gene deletions can also be computed. In the diagram the old optimal solution (point a) does not lie in the new solution space. A new optimum can be calculated (point b), or a suboptimal solution that is closest to the old optimum can be calculated (point c). In addition, work has been done by using experimental flux measurements (indicated by a point) to back-calculate objective functions (indicated by vectors). Figure from (Reed and Palsson 2003)


1.3 Modeling Human Metabolism

Research into human metabolism and its regulation has expanded rapidly due to the

emergence of metabolic diseases such as diabetes and obesity as major sources of

morbidity and mortality (Lanpher, Brunetti-Pierri et al. 2006; Muoio and Newgard 2006),

with metabolic enzymes and their regulators increasingly emerging as viable drug targets

(Shi and Burn 2004; Altucci, Leibowitz et al. 2007). In addition, a common hypothesis

exists that malfunctions in energy metabolism may play a central role in a wide range of

age-related disorders and various forms of cancer (Wallace 2005). However, while much

work has been done in the context of applying constraint-based modeling to study the

metabolism of micro-organisms, large-scale modeling of human metabolism is still in its

infancy. In terms of reconstructing human metabolic networks, most of the previous work

has focused on characterizing distinct metabolic pathways (Kanehisa and Goto 2000;

Romero, Wagg et al. 2004).

Until recently, reconstructions of large-scale human metabolic networks had been performed only for specific cell types and organelles. The human red blood cell (RBC) has a relatively simple, well-characterized metabolism, described by both kinetic (Mulquiney and Kuchel 1999) and CBM models (~30 reactions, ~40 metabolites; (Wiback and Palsson 2002)) that have been used to study its metabolic behaviour under diverse conditions. For example, (Mulquiney and Kuchel 1999) studied the regulation and control of 2,3-bisphosphoglycerate metabolism in glycolysis. In (Durmuş Tekir, Çakır et al. 2006), CBM techniques were applied to study several RBC enzymopathies, indicating that RBC metabolism is most affected by the glucose-6-phosphate dehydrogenase and phosphoglycerate kinase enzymopathies. An FBA model of the metabolism of the neurotransmitter glutamate was constructed to study its metabolism in the brain, and pointed to several regulatory points that govern the release of this major excitatory neurotransmitter (Chatziioannou, Palaiologos et al. 2003). However, this model is partial (16 reactions, 13 metabolites) and includes only a subset of the reactions associated with glutamate metabolism.

A study of cardiac mitochondria presented a wider-scope reconstruction (~190 reactions, ~230 metabolites) and employed a CBM approach to examine the capability of the reconstructed network to fulfil three metabolic functions: ATP production, heme synthesis, and mixed phospholipid synthesis (Vo, Greenberg et al. 2004). In a later study, LP and uniform random sampling were applied to study mitochondrial activity under four metabolic conditions: normal physiologic, diabetic, ischemic, and dietetic (Thiele, Price et al. 2005), indicating reduced flexibility of the metabolic network under abnormal conditions. This study also simulated proposed treatments to evaluate their impact under diabetic conditions, and concluded that neither normalized glucose uptake nor decreased ketone body uptake has a positive effect on mitochondrial energy metabolism. It also showed that the experimentally observed reduced activity of pyruvate dehydrogenase under diabetic conditions in vivo could result from stoichiometric constraints, and therefore would not necessarily require enzymatic inhibition.


A major step forward was presented in recent studies by (Duarte, Becker et al. 2007) and (Ma, Sorokin et al. 2007), which reconstructed the global human metabolic network based on an extensive evaluation of genomic and bibliomic data (about 1500 references). These networks included ~4000 reactions, ~3000 metabolites, and ~1500 genes mapped to the reactions, distributed over 7 organelles. A comparison between the two networks suggests that the network reconstructed by Ma et al. is more extensive, but it was not assembled as a CBM model. The resulting network models are, however, not tissue specific. Furthermore, CBM methods that explore the solution space of metabolic states described by the network model of Duarte et al. were not applied in a comprehensive manner. Rather, Duarte et al.'s characterization of genome-scale metabolic changes following gastric bypass surgery was essentially based on the topological properties of the network, and thus did not utilize the stoichiometric data embedded in the model.

The task of adapting constraint-based modeling methods from the realm of microorganisms to that of multi-cellular organisms encounters two main hurdles. First, different tissues have different metabolic objectives that are not well characterized and are largely unknown. This is in contrast to modeling microorganisms, where a simple objective function (such as maximizing the biomass production rate) can be used together with the FBA method to predict biologically plausible flux distributions. Second, information on tissue-specific metabolite uptake and secretion, which is essential for applying FBA, is lacking.
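To illustrate why a single objective makes FBA tractable for microorganisms, a minimal sketch on a hypothetical four-reaction network follows. It assumes SciPy's `scipy.optimize.linprog` with the HiGHS method; the network, its bounds and the function name `fba` are invented for illustration only. Note that both the objective (`R_bio`) and the uptake bound on `R_up` must be supplied by the modeller, which are precisely the two pieces of information that are missing for human tissues.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (for illustration only, not a real organism model):
#   R_up : -> A   (nutrient uptake, capped at 10 flux units)
#   R_1  : A -> B
#   R_sec: A ->   (secretion of A)
#   R_bio: B ->   (drain standing in for biomass production)
REACTIONS = ["R_up", "R_1", "R_sec", "R_bio"]
S = np.array([
    [1, -1, -1,  0],  # mass balance of metabolite A
    [0,  1,  0, -1],  # mass balance of metabolite B
], dtype=float)
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]


def fba(objective: str):
    """Maximize the flux through `objective` subject to S v = 0 and the flux bounds."""
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index(objective)] = -1.0  # linprog minimizes, so negate the objective
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=BOUNDS, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, dict(zip(REACTIONS, res.x))


if __name__ == "__main__":
    growth, fluxes = fba("R_bio")
    print(growth, fluxes)
```

Here the optimal biomass flux is set entirely by the assumed uptake cap; without a known objective and known uptake rates, as in most human tissues, the same LP has no biologically meaningful answer.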


1.4 High-throughput Molecular Data and Metabolic Model Integration

The availability of high-throughput transcriptomic, proteomic and metabolomic data raises the emerging challenge of overlaying these data on the reconstructed metabolic networks, in order to infer more accurately the metabolic activity reflected in the data. Similar challenges arise when integrating such high-throughput functional data with networks of physical molecular interactions, such as protein-protein and protein-DNA networks, towards the inference of functional modules and control mechanisms (Luscombe, Madan Babu et al. 2004; Chuang, Lee et al. 2007). Existing methods for integrating functional data with a metabolic network are of two types: (i) network topology-based methods, which consider only the structure of a metabolic network and overlay high-throughput data on it to gain insight into metabolic hotspots or pathways that are significantly altered under certain conditions (Hu, Mellor et al. 2005; Joyce and Palsson 2006); and (ii) constraint-based methods, which integrate the high-throughput functional data within a constraint-based modeling approach to improve the prediction of the actual metabolic flux distribution through the network.

Utilizing gene and protein expression to predict metabolic flux is a challenging task due to the complex mapping between the two. Previous studies have found a strong qualitative correspondence between gene expression and measured (Daran-Lapujade, Jansen et al. 2004; Fong and Palsson 2004) as well as predicted (Stelling, Klamt et al. 2002; Famili, Förster et al. 2003; Akesson, Forster et al. 2004; Bilu, Shlomi et al. 2006) metabolic fluxes in microbes. However, the correlation between expression and metabolic flux is generally moderate: in some cases significant transcriptional changes do not reflect changes in flux (Banta, Vemula et al. 2007), and vice versa, significant changes in measured flux may not reflect transcriptional changes (Yang, Hua et al. 2002; Tummala, Junne et al. 2003). These discrepancies may result from hierarchical regulation, that is, post-transcriptional regulation of protein synthesis and degradation rates, and post-translational modifications that represent additional regulatory mechanisms affecting the potential activity rate of metabolic enzymes (Park, Lee et al. 2005). They may also arise from an additional level of flux regulation that is not reflected in gene expression, termed metabolic regulation, which denotes the effect of metabolite concentrations on the actual enzyme activity through allosteric and mass-action effects (Rossell, van der Weijden et al. 2006).

High-throughput experiments give rise to different biological networks, such as signaling, transcriptional regulatory and metabolic networks. The analysis of these networks has mostly involved exploration of static (topological) properties (Luscombe, Madan Babu et al. 2004). However, while static analysis provides some insight into network function, it does not reveal the dynamics arising from a specific temporal, spatial and physiological context. The dynamics of a biological system can be revealed by integrating diverse genome- and metabolome-wide data, such as gene expression levels, with these static networks. Integration of networks with high-throughput data has been shown to be advantageous for various networks and data sources: network-level analysis of experimental data facilitates a more comprehensive perception of the investigated system and its underlying molecular mechanisms (Workman, Mak et al. 2006; Chuang, Lee et al. 2007). For example, integrating protein–protein interaction networks with gene expression data identifies markers that are more reproducible than individual marker genes selected without network information, and Chuang et al. demonstrated that such network-based markers achieve higher accuracy in classifying metastatic versus non-metastatic tumors. Similarly, integrating transcriptional regulatory information with gene-expression data to follow the dynamics of a biological network on a genomic scale can uncover large, unexpected changes in the underlying regulatory network architecture; Luscombe et al. performed such integrations for multiple conditions in Saccharomyces cerevisiae.

The emergence of metabolism as a key factor in common diseases makes the integration of genome- and metabolome-wide data with a metabolic network potentially very informative. Several CBM methods for analyzing and predicting metabolic flux distributions based on gene expression data have been previously suggested. The methods of (Akesson, Forster et al. 2004) and (Becker and Palsson 2008) use gene expression data to identify genes that are absent, or likely to be absent, in certain contexts, and search for metabolic states that prevent (or minimize) flux through the associated metabolic reactions.

A recent method by (Shlomi, Cabili et al. 2008) considers data on both lowly and highly expressed genes in a given context as cues for the likelihood that their associated reactions carry metabolic flux, and employs CBM to accumulate these cues into a global, consistent prediction of the metabolic state. Applied to a metabolic network model of the yeast S. cerevisiae, this method was shown to accurately predict changes in metabolic fluxes across different growth media, in accordance with measured flux data. The method was further applied to predict human tissue metabolism based on tissue-specific gene and protein expression data. The analysis showed that the activity of genes responsible for metabolic diseases is not directly manifested in enzyme-expression data, but can still be correctly predicted by integrating expression data with a metabolic network, as validated by large-scale mining of tissue-specificity data.

(Shlomi, Cabili et al. 2008) Formulation:

A detailed Boolean gene-to-reaction mapping (part of the metabolic network model of Duarte et al.) is employed to identify a tissue-specific expression state for each reaction, reflecting whether its enzyme-encoding genes are classified as expressed in the tissue. Specifically, this is done by modifying the Boolean mapping to account for tri-valued expression states, assigning highly, lowly and moderately expressed genes the values 1, −1 and 0, respectively, and replacing the logical 'and' and 'or' operators with 'min' and 'max' expressions, respectively (so that an enzyme complex is limited by its least expressed subunit, while a reaction catalysed by isozymes takes the state of its most expressed isozyme). This analysis results in a subset of the reactions in the model (denoted $R_H$) that is defined as highly expressed and another subset (denoted $R_L$) defined as lowly expressed. For each tissue, the following mixed integer linear programming (MILP) problem is formulated to find a steady-state flux distribution satisfying stoichiometric and thermodynamic constraints, while maximizing the number of reactions whose activity is consistent with their expression state:

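$$
\begin{aligned}
\max_{v,\,y^+,\,y^-}\quad & \sum_{i \in R_H} \left(y_i^+ + y_i^-\right) + \sum_{i \in R_L} y_i^+ \\
\text{s.t.}\quad & (1)\quad S \cdot v = 0 \\
& (2)\quad v_{\min} \le v \le v_{\max} \\
& (3)\quad v_i + y_i^+ \left(v_{\min,i} - \varepsilon\right) \ge v_{\min,i}, \quad i \in R_H \\
& (4)\quad v_i + y_i^- \left(v_{\max,i} + \varepsilon\right) \le v_{\max,i}, \quad i \in R_H \\
& (5)\quad v_{\min,i}\left(1 - y_i^+\right) \le v_i \le v_{\max,i}\left(1 - y_i^+\right), \quad i \in R_L \\
& v \in \mathbb{R}^m, \quad y_i^+, y_i^- \in \{0, 1\}
\end{aligned}
$$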

where $v$ is the flux vector and $S$ is an $n \times m$ stoichiometric matrix, in which $n$ is the number of metabolites and $m$ is the number of reactions. The mass balance constraint is enforced in equation (1). Thermodynamic constraints that restrict flow direction are imposed by setting $v_{\min}$ and $v_{\max}$ as lower and upper bounds on flux values in equation (2), respectively. For each highly expressed reaction, the Boolean variables $y_i^+$ and $y_i^-$ represent whether the reaction is active (in either direction) or not. Specifically, a highly expressed reaction is considered to be active if it carries a significant positive flux that is greater than a positive threshold $\varepsilon$ (equation (3)) or a significant negative flux, smaller than $-\varepsilon$ (equation (4), for reversible reactions). We chose a threshold of $\varepsilon = 1$ to determine the flux activity of highly expressed reactions, though various other choices provide qualitatively similar results. For each lowly expressed reaction, the Boolean variable $y_i^+$ represents whether the reaction is inactive (equation (5)). Specifically, lowly expressed reactions are considered to be inactive if they carry zero metabolic flux, though changing equation (5) to allow these reactions to carry a low metabolic flux (that is, with an upper bound lower than $\varepsilon$) and still be considered inactive provides qualitatively similar results.
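The tri-valued gene-to-reaction evaluation described at the start of this formulation can be sketched in Python. The nested-tuple encoding of the Boolean rules, the function names `reaction_state` and `split_reactions`, and the choice to treat genes missing from the data as moderately expressed (0) are assumptions made for this illustration. The sketch applies 'min' to 'and' and 'max' to 'or', so a complex is limited by its least expressed subunit while any highly expressed isozyme suffices.

```python
from typing import Dict, Tuple, Union

# A gene-to-reaction rule as a nested tuple (a hypothetical encoding), e.g.
# ("or", "g1", ("and", "g2", "g3")) means g1 OR (g2 AND g3).
GPR = Union[str, Tuple]


def reaction_state(rule: GPR, gene_state: Dict[str, int]) -> int:
    """Tri-valued reaction state: 1 (high), 0 (moderate) or -1 (low).

    Genes absent from gene_state are treated as moderate (0), an assumption.
    """
    if isinstance(rule, str):
        return gene_state.get(rule, 0)
    op, *args = rule
    values = [reaction_state(arg, gene_state) for arg in args]
    if op == "and":   # enzyme complex: limited by its least expressed subunit
        return min(values)
    if op == "or":    # isozymes: the most expressed isozyme determines the state
        return max(values)
    raise ValueError(f"unknown operator: {op}")


def split_reactions(gprs: Dict[str, GPR], gene_state: Dict[str, int]):
    """Return (R_H, R_L): the sets of highly and lowly expressed reactions."""
    states = {r: reaction_state(rule, gene_state) for r, rule in gprs.items()}
    r_high = {r for r, s in states.items() if s == 1}
    r_low = {r for r, s in states.items() if s == -1}
    return r_high, r_low


if __name__ == "__main__":
    genes = {"g1": -1, "g2": 1, "g3": 1, "g4": -1}
    rules = {"rxnA": ("or", "g1", ("and", "g2", "g3")),
             "rxnB": ("and", "g2", "g4"),
             "rxnC": "g5"}
    print(split_reactions(rules, genes))
```

In this example `rxnA` is classified as highly expressed (its complex g2/g3 is fully expressed), `rxnB` as lowly expressed (subunit g4 is low), and `rxnC` as moderate, so it enters neither $R_H$ nor $R_L$.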


The optimization maximizes the number of highly expressed reactions ($R_H$) that are active and the number of lowly expressed reactions ($R_L$) that are inactive. A solution found by the MILP solver is guaranteed to be optimal in terms of the maximized objective function, but it may not be unique, as a space of alternative optimal solutions may exist. In this case, the space of optimal solutions represents alternative steady-state flux distributions that achieve the same similarity with the expression data. To account for these alternative solutions, a variant of Flux Variability Analysis was employed. The method determines, for each metabolic reaction, whether it is predicted to be always active (or, conversely, always inactive) in a certain tissue across the entire solution space. This is performed by solving two MILP problems (each similar to the one described above) for each reaction $i$, to find the maximal attainable similarity with the expression data when the reaction is forced to be active (denoting this similarity $Z_i^+$) and when it is forced to be inactive (denoting this similarity $Z_i^-$). A reaction is then considered to be active in this tissue if $Z_i^+ > Z_i^-$ (that is, a higher similarity with the expression data is achieved when the reaction is active than when it is inactive), with a confidence level of $Z_i^+ - Z_i^-$. Conversely, it is considered to be inactive if $Z_i^- > Z_i^+$, with a confidence of $Z_i^- - Z_i^+$. If $Z_i^+ = Z_i^-$ (that is, the same similarity with the expression data can be achieved whether the reaction is forced to be active or inactive), its activity state is considered to be undetermined.
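To make the MILP and the $Z_i^+$ / $Z_i^-$ comparison concrete, a minimal sketch follows, assuming SciPy ≥ 1.9, which provides `scipy.optimize.milp` (HiGHS solver). The function names `imat_milp` and `activity_call`, the variable layout and the toy network are hypothetical illustrations, not the implementation used by Shlomi et al. or by iMAT. Forcing a possibly reversible reaction to be active ($|v_k| \ge \varepsilon$) is modelled here with one extra binary variable, a modelling choice made only for this sketch.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def imat_milp(S, vmin, vmax, r_high, r_low, eps=1.0, force=None):
    """Toy version of MILP (1)-(5) (hypothetical helper).

    Returns the optimal number of reactions whose activity matches their
    expression state, or None if infeasible. `force` is an optional
    (k, "active") or (k, "inactive") pair used to compute Z_k^+ / Z_k^-.
    """
    n, m = S.shape
    vmin = np.array(vmin, dtype=float)
    vmax = np.array(vmax, dtype=float)
    if force is not None and force[1] == "inactive":
        vmin[force[0]] = vmax[force[0]] = 0.0              # v_k = 0
    h, l = len(r_high), len(r_low)
    extra = 1 if force is not None and force[1] == "active" else 0
    # Variable layout: [ v (m) | y+ on R_H (h) | y- on R_H (h) | y+ on R_L (l) | z (extra) ]
    nvar = m + 2 * h + l + extra
    rows, lbs, ubs = [], [], []

    def add(terms, lb, ub):
        row = np.zeros(nvar)
        for j, a in terms:
            row[j] += a
        rows.append(row)
        lbs.append(lb)
        ubs.append(ub)

    for met in range(n):                                       # (1) S v = 0
        add([(j, S[met, j]) for j in range(m)], 0.0, 0.0)
    for t, i in enumerate(r_high):
        yp, ym = m + t, m + h + t
        add([(i, 1.0), (yp, vmin[i] - eps)], vmin[i], np.inf)  # (3) y+ = 1 -> v_i >= eps
        add([(i, 1.0), (ym, vmax[i] + eps)], -np.inf, vmax[i]) # (4) y- = 1 -> v_i <= -eps
    for t, i in enumerate(r_low):
        y = m + 2 * h + t
        add([(i, 1.0), (y, vmin[i])], vmin[i], np.inf)         # (5) lower half
        add([(i, 1.0), (y, vmax[i])], -np.inf, vmax[i])        # (5) upper half
    if extra:                                                  # force |v_k| >= eps
        k, z = force[0], nvar - 1
        add([(k, 1.0), (z, vmin[k] - eps)], vmin[k], np.inf)   # z = 1 -> v_k >= eps
        add([(k, 1.0), (z, -(vmax[k] + eps))], -np.inf, -eps)  # z = 0 -> v_k <= -eps

    c = np.zeros(nvar)
    c[m:m + 2 * h + l] = -1.0       # milp minimizes, so negate the count of matches
    integrality = np.zeros(nvar, dtype=int)
    integrality[m:] = 1             # all y (and z) variables are binary
    bounds = Bounds(np.r_[vmin, np.zeros(nvar - m)], np.r_[vmax, np.ones(nvar - m)])
    res = milp(c, integrality=integrality, bounds=bounds,
               constraints=LinearConstraint(np.array(rows), lbs, ubs))
    return int(round(-res.fun)) if res.success else None


def activity_call(S, vmin, vmax, r_high, r_low, k, eps=1.0):
    """Compare Z_k^+ and Z_k^- and return (state, confidence)."""
    zp = imat_milp(S, vmin, vmax, r_high, r_low, eps, force=(k, "active"))
    zm = imat_milp(S, vmin, vmax, r_high, r_low, eps, force=(k, "inactive"))
    zp = -np.inf if zp is None else zp   # forcing infeasible: state unattainable
    zm = -np.inf if zm is None else zm
    if zp > zm:
        return "active", zp - zm
    if zm > zp:
        return "inactive", zm - zp
    return "undetermined", 0


if __name__ == "__main__":
    # Hypothetical toy: R0: -> A, R1: A -> B, R2: B ->, R3: A ->
    S = np.array([[1, -1, 0, -1], [0, 1, -1, 0]], dtype=float)
    vmin, vmax = [0] * 4, [10] * 4
    for k in range(4):
        print(k, activity_call(S, vmin, vmax, r_high=[1, 2], r_low=[3], k=k))
```

On this toy network, with R1 and R2 in $R_H$ and R3 in $R_L$, forcing R3 to carry flux costs one consistent reaction ($Z^+ = 2$, $Z^- = 3$), so R3 is called inactive with confidence 1, whereas forcing R1 to be inactive also shuts down R2, so R1 is called active with confidence 2. Solving two MILPs per reaction in this way illustrates why genome-scale runs require substantial computing resources.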

Implementing the method requires solving multiple complex MILP optimization problems, which demands extensive parallel computing resources; hence, the method has not been readily accessible to the research community since its publication.


In this work we introduce an Integrative Metabolic Analysis Tool (iMAT), which enables the integration of transcriptomic, proteomic and reactome array data with metabolic network models to predict metabolic flux, by developing variants of the approach presented in (Shlomi, Cabili et al. 2008). Specifically, we present a new constraint-based method for integrating a genome-scale metabolic network with reactome array data to predict metabolic flux activity. While high-throughput transcriptomic and proteomic data have been available for quite some time, the reactome array technology was only recently developed by (Beloqui, Guazzaroni et al. 2009), providing exciting new genome-scale data on the rate of metabolite transformation by enzymes present in cell extracts. Altogether, iMAT supports the integration of functional data with an array of different models, including: (i) a highly curated metabolic network model of human metabolism by (Duarte, Becker et al. 2007), enabling the prediction of metabolic activity in various tissues and cell types; (ii) models of common model organisms such as E. coli and S. cerevisiae; and (iii) an array of automatically reconstructed networks for 160 bacteria (Overbeek, Begley et al. 2005), enabling the prediction of metabolic activity under various environmental and genetic conditions. Importantly, iMAT is straightforward and user-friendly: the user submits functional data for a given organism via the web and receives a visualization of the organism's metabolic network showing the most likely predicted metabolic flux. The applicability of iMAT is demonstrated here by predicting human breast-cancer metabolism from gene expression data, and Pseudomonas putida metabolism from reactome array data.


1.5 Human Cancer and Metabolism

In 2000, Douglas Hanahan and Robert Weinberg published a review detailing the six

hallmarks of cancer. These are six phenotypes that a tumour requires in order to become a

fully fledged malignancy: (1) persistent growth signals, (2) evasion of apoptosis, (3)

insensitivity to anti-growth signals, (4) unlimited replicative potential, (5) angiogenesis

and (6) invasion and metastasis. However, it is becoming increasingly clear that these phenotypes do not tell the whole story and that additional hallmarks are needed, one of which is a shift in cellular metabolism. The tumour environment creates a unique

collection of stresses to which cells must adapt in order to survive. This environment is

formed by the uncontrolled proliferation of cells, which ignore the cues that would create

normal tissue architecture. As a result, the cells forming the tumour are exposed to low

oxygen and nutrient levels, as well as high levels of toxic cellular waste products, which

are thought to propel cells towards a more transformed phenotype, resistant to cell death

and pro-metastatic. (Tennant, Duran et al. 2009)

In order to sustain the rapid proliferation and to counteract the hostile

environment observed in tumours, cells must increase the rate of metabolic reactions to

provide the adenosine triphosphate (ATP), lipids, nucleotides and amino acids necessary

for daughter cell production. Cells that do not undergo these changes will not survive the

tumour environment, resulting in the selection of those with a transformed metabolic

phenotype. One seemingly necessary metabolic alteration is the increase in the rate of

glycolysis, the conversion of glucose to pyruvate. In work beginning more than 80 years ago, Otto Warburg noted that tumour cells use glycolysis ('fermentation') even in the presence of O2. This was termed 'aerobic glycolysis' and has since been considered a universal phenotype of tumours. In normal cells, an interplay exists between mitochondrial respiration and glycolysis in which mitochondrial respiration inhibits glycolytic flux, a phenomenon originally described in yeast by Pasteur in 1861 (the 'Pasteur Effect') and later expanded upon and extended to mammalian tissues by Crabtree. High rates of aerobic glycolysis are not unique to tumours, as all energy-demanding cells utilize glycolysis as well as mitochondrial respiration for ATP production. However, what is unique to cancer is the high level of lactate produced by the increased rate of aerobic glycolysis. Forcing proliferating cells into a resting, differentiated phenotype can decrease the glycolytic rate and promote oxidative phosphorylation (OXPHOS) as the major ATP-generating process, indicating that, at least in normal cells, this loss of mitochondrial inhibition of glycolysis is reversible. Glycolysis produces only 2 mol of ATP per mole of glucose, an inefficient bioenergetic process compared with OXPHOS (up to 36 mol of ATP per mole of glucose); therefore, to maintain normal ATP levels in the tumour, the rate of glycolysis must be much greater than that observed in most normal tissues (exceptions include the heart and kidney). This hunger of tumours for glucose is exploited in current clinical practice, as primary and distant metastatic tumour sites can be imaged in patients through their uptake of a radiolabelled glucose derivative (18F-fluoro-2-deoxyglucose). The change in metabolism cannot be attributed purely to alterations in allosteric and product/substrate regulation of the metabolic enzymes. A concerted 'energy response' also occurs, involving factors such as mammalian target of rapamycin (mTOR), Myc and the hypoxia-inducible factors (HIFs), which is vital for the long-term metabolic transformation of tumours. (Tennant, Duran et al. 2009)
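The arithmetic behind the required increase in glycolytic rate can be made explicit. As a simplifying assumption, compare a cell that meets a fixed ATP demand $J_{\mathrm{ATP}}$ (a symbol introduced here for illustration) entirely by glycolysis (2 mol ATP per mol glucose) with one that meets it entirely by OXPHOS (up to 36 mol ATP per mol glucose); the required glucose consumption rates then differ by:

$$
\frac{J_{\mathrm{glc}}^{\mathrm{glycolysis}}}{J_{\mathrm{glc}}^{\mathrm{OXPHOS}}} = \frac{J_{\mathrm{ATP}}/2}{J_{\mathrm{ATP}}/36} = \frac{36}{2} = 18
$$

That is, up to an 18-fold higher glucose flux. Since real cells combine both pathways, this figure is an upper bound under the stated assumption rather than a measured value.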

In addition to increased aerobic glycolysis, cancer cells also utilize glucose under anaerobic conditions to compensate for reduced mitochondrial ATP generation. Hypoxia (low oxygen) and anoxia (complete lack of oxygen) are both present in most, if not all, solid tumours. Hypoxia in particular is thought to be an important factor in supporting and directing tumour progression. However, rather than being constant, the hypoxia experienced by tumour cells is thought to be variable, even cycling between normal oxygen tension and acute hypoxia (< 10 mm Hg O2). (Tennant, Duran et al. 2009)

Although HIF is strongly and rapidly up-regulated during short periods of hypoxia, HIF levels decrease during chronic hypoxia. Only the areas furthest from functional blood

vessels experience this effect, and in the absence of an angiogenic response, they are

thought to form the necrotic areas in a tumour. The down-regulation of HIFα in these

circumstances is thought to help protect against the necrosis of cells, but this may only be

for a limited time period. Most of the hypoxic regions found in tumours are exposed to

fluctuating levels of O2, which allows for continued HIF stabilization. However,

fluctuating levels of O2 can also cause an increase in intracellular levels of reactive

oxygen species (ROS). It has been observed that ROS can be produced under hypoxia

due to inefficient electron transport chain (ETC) activity, and the resultant leakage of

electrons, mainly from complexes I and III of the ETC. (Tennant, Duran et al. 2009)


HIF expression in tumours—whether due to hypoxia, TCA cycle enzyme

mutation, mitochondrial dysfunction or aberrant growth factor stimulation—is known to

be vitally important for their progression. Studies in xenografts have shown that

decreasing HIF expression in tumours inhibits growth, and data from patient samples

have shown a correlation between HIF, HIF target gene expression and disease

progression and patient survival. This positions HIFα firmly as a therapeutic target, and a

number of antitumour therapies have been designed to interfere with HIF and its target

genes. (Tennant, Duran et al. 2009)

In order to undergo glycolysis, glucose enters the cell via a facilitative glucose

transporter (Figure 7). A number of glucose transporters are up-regulated in tumours,

Glut1 being particularly important in the tumour response to hypoxia. Up-regulation of

this transporter immediately increases the intracellular availability of glucose for

metabolic reactions, most of which are initiated by its phosphorylation by hexokinase to

give glucose 6-phosphate (G6P, see Figure 7). Hexokinase II, one of the four hexokinase

isozymes, is a target of many transcription factors important in tumorigenesis, including

HIF1 and Myc (through the ‘carbohydrate response element’). Hexokinase is also thought

to have a role in protecting the cell against apoptosis. (Tennant, Duran et al. 2009)

The conversion of pyruvate to lactate appears important for the maintenance of

tumour cell viability. This is carried out by lactate dehydrogenase (Figures 7 and 8A), of

which the A isoform is strongly up-regulated in tumours. Lactate production is important for recycling cytosolic nicotinamide adenine dinucleotide (NAD+) in the absence of functional mitochondrial-cytoplasmic shuttles for NADH (the reduced form of NAD+), which results from decreased OXPHOS (Figure 8B). The regeneration of cytosolic NAD+ is vital for efficient glycolysis. In studies carried out by Fantin et al., suppression of lactate dehydrogenase A not only pushed cells towards a more OXPHOS-based phenotype but also slowed their proliferation in vitro, and in an in vivo model of breast cancer almost tripled the survival of mice compared with a lactate dehydrogenase A-expressing control. (Tennant, Duran et al. 2009)
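Why lactate production recycles NAD+ can be seen from the standard textbook stoichiometry of glycolysis followed by lactate formation: glycolysis (at the glyceraldehyde-3-phosphate dehydrogenase step) consumes NAD+, and lactate dehydrogenase (LDH) regenerates it, so the combined pathway is NAD+-neutral and can proceed without mitochondrial NADH shuttles:

$$
\begin{aligned}
\text{Glycolysis:}\quad & \mathrm{Glucose} + 2\,\mathrm{NAD^+} + 2\,\mathrm{ADP} + 2\,\mathrm{P_i} \rightarrow 2\,\mathrm{Pyruvate} + 2\,\mathrm{NADH} + 2\,\mathrm{H^+} + 2\,\mathrm{ATP} + 2\,\mathrm{H_2O}\\
\text{LDH:}\quad & 2\,\mathrm{Pyruvate} + 2\,\mathrm{NADH} + 2\,\mathrm{H^+} \rightarrow 2\,\mathrm{Lactate} + 2\,\mathrm{NAD^+}\\
\text{Net:}\quad & \mathrm{Glucose} + 2\,\mathrm{ADP} + 2\,\mathrm{P_i} \rightarrow 2\,\mathrm{Lactate} + 2\,\mathrm{ATP} + 2\,\mathrm{H_2O}
\end{aligned}
$$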

In order to sustain the rapid proliferation characteristic of tumours, increased

synthesis of both fatty acids and nucleotide precursors must occur. A mechanism used by

cells to support this is the diversion of glycolytic intermediates into the pentose phosphate pathway (PPP), either from G6P (using the oxidative arm) or from fructose 6-phosphate (using the non-oxidative arm). These intermediates can then be used to reduce nicotinamide adenine dinucleotide phosphate (NADP+) to NADPH (in the oxidative arm only) and to synthesize ribose 5-phosphate (Figure 7). The control of glycolysis by PFK2/FBPase and TIGAR (as mentioned earlier) can divert substrates into the oxidative arm of the PPP. Increasing PFK2/FBPase phosphatase activity, or inhibiting PFK1 by other means (such as an increase in ATP or citrate), will therefore increase PPP activity and support rapid cellular proliferation. Diverting G6P into the PPP can increase not only nucleotide biosynthesis but also the antioxidant capacity of the cell, owing to the generation of the NADPH required for the reduction of oxidized glutathione. In this respect, the acceleration of the PPP after DNA damage, or during tumorigenesis in general, may prove important, as it provides much of the equipment necessary to replicate and repair DNA. NADPH generated by the oxidative PPP also supports the fatty acid biosynthesis required for tumour growth (see below). Interestingly, the first two enzymes in this pathway, G6P dehydrogenase and 6-phosphogluconate dehydrogenase, are also up-regulated in transformed cells. (Tennant, Duran et al. 2009)


Figure 7: Glycolysis and cancer. Green text: enhanced/activated in cancer. Red text: reduced/inhibited in cancer. Figure from (Tennant, Duran et al. 2009).


Figure 8: (A) TCA cycle and cancer. (B) The malate/aspartate shuttle. This process is used to transfer electrons from the cytosolic NADH pool to the mitochondria to be oxidized by the ETC. Green text—enhanced/activated in cancer. Red text—reduced/inhibited in cancer. Figure from (Tennant, Duran et al. 2009).

Proliferating cells in general and cancer cells in particular require de novo

synthesis of lipids for membrane assembly. Under conditions where PDH is not inhibited,

pyruvate is converted into acetyl-CoA and enters the TCA cycle by condensing with

oxaloacetate to form citrate (Figure 8A). This intermediate is mostly further oxidized in

the TCA cycle to produce reducing potential for the mitochondrial ETC, but can also be

used for fatty acid synthesis in the cytosol. Cytosolic citrate is converted back into

oxaloacetate and acetyl-CoA by the action of ATP citrate lyase. The reduction in levels or

activity of any of the three enzymes involved in fatty acid synthesis has been shown to

inhibit tumour growth and may therefore represent a target for tumour therapies.

Interestingly, activation of AKT has been found to inhibit the β oxidation (degradation)

of lipids by inhibiting the expression of carnitine palmitoyltransferase (CPT1A). This

further supports the anabolic reprogramming observed in tumorigenesis and the push towards increased proliferation. (Tennant, Duran et al. 2009)

Glutaminolysis: There are two major sources of energy and carbon for cancer

cells: glucose and glutamine. Cancer cells appear to use excessive amounts of both

nutrients: more than they need for their function. One possible explanation is that high

rates of flux through these metabolic pathways can affect the regulation of other

metabolic branches, allowing high rates of proliferation. A consequence of this excess is

the increased secretion of by-products of glucose and glutamine degradation, mainly

lactate, alanine and ammonium (Figure 8A). It has been recently proposed that in this


context, glucose accounts mainly for lipid and nucleotide synthesis, whereas glutamine is

responsible for anaplerotic re-feeding of the TCA cycle, for amino acid synthesis and for

nitrogen incorporation into purine and pyrimidine for nucleotide synthesis. Glycolysis is

capable of re-feeding the TCA cycle in the presence of functional pyruvate carboxylase

(Figure 8A). In light of the rapid growth and proliferation of tumour cells, catabolic

reactions are unlikely to be used to feed the TCA cycle, which implies that in order for cells

to efficiently use glutamine for anabolic reactions, at least some pyruvate must enter the

TCA cycle, instead of being converted to lactate. (Tennant, Duran et al. 2009)

Once in the cell, glutamine is initially deaminated to form glutamate, a process

catalysed by the enzyme glutaminase (Figure 8A). Glutamate in turn can be converted

into α-ketoglutarate either by a second deamination process catalysed by the enzyme

glutamate dehydrogenase or through transamination. On entering the TCA cycle, α-

ketoglutarate is metabolized to eventually generate oxaloacetate, an important anabolic

precursor that will condense with the acetyl-CoA generated from glycolysis or

glutaminolysis to produce citrate. The importance of glutaminolysis in cancer metabolism is evident from the considerable release of ammonium in the venous effluent of cancer patients, and from the fact that, with time, the majority of patients develop glutamine depletion. In fact, glutaminase has been found to be over-expressed in a variety of tumour models and human malignancies, and the rate of glutaminase activity correlates with the rate of tumour growth. Unfortunately, despite promising signs in leukaemic mouse models, mammary tumours and colon carcinoma, therapeutic strategies designed to limit the availability of glutamine to cancer cells with inhibitors of glutaminase (6-diazo-5-oxo-L-norleucine or acivicin) failed due to severe side effects during clinical trials. Nevertheless, better knowledge of the biochemical and regulatory processes of glutamine uptake and degradation in normal and cancer cells remains a major goal for designing new strategies against cancer. (Tennant, Duran et al. 2009)
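The ammonium release follows directly from the stoichiometry of the two deamination steps (standard textbook reactions, shown here for illustration). When the second step proceeds by transamination instead, for example with pyruvate via alanine aminotransferase (ALT, an example chosen here), the amino group is transferred to alanine, which is another of the secreted by-products noted above:

$$
\begin{aligned}
\text{Glutaminase:}\quad & \mathrm{Glutamine} + \mathrm{H_2O} \rightarrow \mathrm{Glutamate} + \mathrm{NH_4^+}\\
\text{Glutamate dehydrogenase:}\quad & \mathrm{Glutamate} + \mathrm{NAD(P)^+} + \mathrm{H_2O} \rightarrow \alpha\text{-ketoglutarate} + \mathrm{NH_4^+} + \mathrm{NAD(P)H} + \mathrm{H^+}\\
\text{Transamination (ALT):}\quad & \mathrm{Glutamate} + \mathrm{Pyruvate} \rightleftharpoons \alpha\text{-ketoglutarate} + \mathrm{Alanine}
\end{aligned}
$$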

Mitochondrial citrate not exported for anabolic use is used in the TCA cycle to

produce reducing equivalents for the ETC (Figure 8A). Two of the enzymes in this pathway, succinate dehydrogenase (SDH) and fumarate hydratase (FH), are of particular importance for cancer. SDH is also complex II of the ETC, where reduced flavin adenine dinucleotide (FADH2) is generated and further oxidized. It consists of four

subunits: A and B, which are associated with the inner leaflet of the mitochondrial inner

membrane and C and D, which are embedded in the mitochondrial inner membrane.

Although their function is vital for the normal working of the TCA cycle, mutations in

either FH or SDHB, SDHC or SDHD are known causes of a number of familial and

sporadic cancers, namely leiomyoma, leiomyosarcoma or renal cell carcinoma (FH),

paraganglioma and pheochromocytoma (SDHB, SDHC and SDHD). Phenotypically, all

of these mutations result in pseudohypoxia, referring to the normoxic induction of HIFα

subunits. Mechanistically, it has been shown that the increase in succinate (SDH

mutations) or fumarate (FH mutations) levels is responsible for inactivation of the PHDs

even in the presence of O2, leading to the normoxic stabilization of HIFα and up-

regulation of its downstream effectors (Figure 9). As discussed earlier, mitochondrial

ROS can be produced from both complexes I and III of the ETC under hypoxia.

However, it has been suggested that SDHB, SDHC and SDHD mutations can also result


in normoxic ROS production and HIF activation, though the role of ROS in

pseudohypoxia of SDH-deficient cells is debatable. (Tennant, Duran et al. 2009)

Figure 9: Synthesis, degradation and regulation of HIFα. αKG, α-ketoglutarate. Figure from (Tennant, Duran et al. 2009).

Amino acids and their transporters: To sustain high proliferation rates, cancer

cells are extremely dependent on extra-energy and nutrient supply. Therefore, nutrient

uptake and metabolism are frequently altered and enhanced in tumour cells. Amino acids

are the primary source of cellular nitrogen. In addition to being the building blocks for


protein synthesis, they are used for nucleotide and glutathione synthesis, and the carbon

backbone can also be used for ATP synthesis. Moreover, amino acids have an important

role in regulating signalling pathways that govern cell growth and survival. Many human

tumour cells express high levels of amino acid transporters, and this correlates with

disease progression. A notable example is the alanine serine cysteine transporter 2

(ASCT2) transporter, a non-specific neutral amino acid transporter that functions as the

major transporter of glutamine in numerous cell lines. Given that glutamine has a key role

in tumour cell metabolism (discussed above) and that glutamine transport is increased in

tumour cells, it is not surprising that ASCT2 expression is also enhanced during tumour

development. ASCT2 expression is enhanced in breast, liver and brain tumours, and

inhibition of ASCT2-dependent glutamine transport inhibited the growth of colon

carcinoma cell lines. Moreover, silencing of the ASCT2 messenger RNA transcript

causes dramatic apoptosis in hepatoma cells and this appears to occur in parallel with its

role in glutamine uptake. Enhanced expression of L-type amino acid transporter 1

(LAT1), another amino acid transporter with high affinity for several essential amino

acids including leucine, tryptophan and methionine, has been reported in high-grade

astrocytomas and correlates with poor survival. LAT1 inhibition has been shown to block

glioma cell growth in both in vitro and in vivo models. These findings highlight the

growth advantage conferred to tumour cells by increased amino acid transporter

expression and point to a potential role for amino acid transporter inhibition as a

therapeutic strategy. (Tennant, Duran et al. 2009)


Mammalian target of rapamycin: As previously mentioned, cells strictly depend

on nutrient availability and growth stimuli to sustain growth and proliferation. The

regulation of these stimuli is integrated by target of rapamycin (TOR), a highly

evolutionarily conserved mechanism, present from unicellular eukaryotes to mammals. In

mammals, mTOR is involved in four different sensing mechanisms: growth factor

signalling; nutrient availability; oxygen availability and internal energetic status. All four

factors are especially important for tumour development. Solid tumours can become

limited in nutrients, oxygen and growth factors, a situation potentially leading to

energetic limitations inside the cancer cell. Therefore, de-regulation of the molecular

process that controls these mechanisms could be critical for tumour development.

(Tennant, Duran et al. 2009)

Summary: The extent to which metabolism plays a role in tumorigenesis should

not be underestimated, and drugs that can selectively target the metabolic phenotype of

the tumour and its environment are likely to at least delay, if not halt, tumour progression.

The resistance of tumours to both radiotherapy and chemotherapy can often be attributed

to their aberrant metabolism. It therefore follows that the reactivation of a more ‘normal’

metabolism could very well re-sensitize tumours to these agents. A cell's metabolism is inextricably linked to its differentiation state: if the metabolism of a de-differentiated, aggressive tumour can be reverted to that of a more quiescent state, the tumour may become more

amenable to other interventions. Therapies that target tumour metabolism are already

being tested in pre-clinical and clinical studies, but this field is very much in its infancy.


It is anticipated that the next few years will provide more new therapeutic approaches that

target metabolic transformation. (Tennant, Duran et al. 2009)

1.5.1 Breast Cancer Metabolism

Breast cancer is the most common cancer type among women in the Western world. Despite decades of research, the molecular processes associated with breast cancer progression are still inadequately defined. Recently, (Lu, Bennet et al. 2010) shifted the focus to the systematic alteration of metabolism, using state-of-the-art metabolomic profiling techniques to investigate changes in 157 metabolites during the progression of normal mouse mammary epithelial cells to an isogenic series of mammary tumour cell lines with increasing metastatic potential. The results suggest a two-step metabolic progression hypothesis during the acquisition of tumourigenic and metastatic abilities.

Metabolite changes accompanying tumour progression were identified in the intracellular

and secreted forms in several pathways, including glycolysis, tricarboxylic acid cycle,

pentose phosphate pathway, fatty acid and nucleotide biosynthesis and the GSH-

dependent anti-oxidative pathway. These results suggest possible biomarkers of breast

cancer progression as well as opportunities of interrupting tumour progression through

the targeting of metabolic pathways.

Approximately 40,000 American women succumb to breast cancer each year,

with metastasis causing the overwhelming majority of these deaths. Metastasis is a multi-

step process, requiring tumour cells to intravasate into the bloodstream, survive in the


circulation, adhere to and extravasate from the vascular network in the secondary organ,

and finally adapt to a foreign microenvironment. Recent functional transcriptomics

studies identified genes that play important roles in individual steps of breast cancer

metastasis with the MDA-MB-231 xenograft model or the 4T1 syngeneic mouse model.

However, studies with metabolomic approaches to identify key metabolites that

characterize metastasis progression are still scarce. Metabolic reprogramming has been linked

to the major hallmarks of cancer, including tissue invasion and metastasis. However, its

functional role in tumour progression and metastasis remains largely undefined. In a

recent study, metabolomic profiling identified increased sarcosine synthesis as a

functionally important metabolic alteration during prostate cancer progression. Similar

efforts to identify important metabolic changes during breast cancer progression hold the

potential for providing putative diagnostic and prognostic biomarkers as well as new

therapeutic targets. Lu et al. used the 4T1 syngeneic mouse model to systematically identify such metabolite changes. (Lu, Bennet et al. 2010)

Malignant transformation of normal epithelial cells and the acquisition of metastatic ability have usually been studied with the aim of identifying genes and proteins that play

tumour-promoting or suppressive roles. The pace of finding such molecules has been

tremendously expedited with the development of transcriptomics and proteomics to look

for transcripts and proteins with altered abundance during malignancy. Another aspect of

molecular changes, altered metabolism, has been less explored, even though the phenomenon

of aerobic glycolysis was one of the first major discoveries in cancer research. (Lu,

Bennet et al. 2010)


Fortunately, the situation is improving owing to the recent technical advances in global

analysis of metabolites using mass spectrometry or high-resolution 1H nuclear magnetic

resonance spectroscopy. High-throughput metabolomic analysis allows simultaneous

quantification of hundreds of metabolites belonging to a diverse array of metabolic

pathways in a panel of cell lines or tissues. As illustrated in (Lu, Bennet et al. 2010), 157

metabolites were profiled in six cell lines with progressively increased tumourigenicity

and metastatic ability. The analysis of intracellular metabolites clustered the lines into

three categories: normal, tumourigenic (but non-metastatic), and generally metastatic.

Results from the analysis favour a two-step metabolic progression hypothesis during

mammary tumour progression: the first step accompanies the acquisition of

tumourigenicity and includes altered glycolysis, PPP and fatty acid synthesis as well as

decreased GSH/GSSG redox pool; the second step is correlated with the gain of the

general metastatic ability and includes further changes in glycolysis and TCA cycle,

further depletion of the glutathione species, and increased nucleotide levels. The analysis resolved no further metabolite alterations correlating with the stepwise increase in metastatic potential across the four metastatic lines. This model suggests that the fine regulation

of the ability to colonize distant organs in breast cancer may not require further dramatic

biochemical reprogramming and may instead rely more on alterations in gene expression

regulation and cellular behaviours, although the authors could not rule out the potential importance of metabolomic changes in metastatic lesions in vivo in different target organs. Their analysis of extracellular metabolites identified increased abundance of TCA cycle components as well as nucleotide metabolism intermediates, similar to the intracellular results. These findings agree with a recent study profiling a more limited set of metabolites


in the MCF10 model of mammary carcinoma. Both studies find evidence for increased

pentose phosphate pathway, TCA cycle, and fatty acid biosynthetic activity in

transformed and/or metastatic cells. Further efforts should investigate the universality of

these findings with other in vitro and in vivo preclinical models as well as with human

samples. Confirmed altered metabolic pathways may open new therapeutic avenues for

treating malignant breast cancer. Several secreted metabolites accompanying the

increased metastatic potential (malate, fumarate, deoxyguanosine, guanine, xanthine, and

hypoxanthine) should be tested for their value as diagnostic and prognostic biomarkers of

malignant breast cancer in future studies. (Lu, Bennet et al. 2010)

1.5.2 Modeling Cancer Metabolism

Cancer is a complex disease that involves multiple types of biological interactions across

diverse physical, temporal, and biological scales. This complexity presents substantial

challenges for the characterization of cancer biology, and motivates the study of cancer in

the context of molecular, cellular, and physiological systems. Computational models of

cancer are being developed to aid both biological discovery and clinical medicine. The

development of these in silico models is facilitated by rapidly advancing experimental

and analytical tools that generate information-rich, high-throughput biological data.

Statistical models of cancer at the genomic, transcriptomic, and pathway levels have

proven effective in developing diagnostic and prognostic molecular signatures, as well as

in identifying perturbed pathways. Statistically inferred network models can prove useful

in settings where data overfitting can be avoided, and provide an important means for

biological discovery. Mechanistically based signalling and metabolic models that apply a


priori knowledge of biochemical processes derived from experiments can also be

reconstructed where data are available, and can provide insight and predictive ability

regarding the behaviour of these systems. At longer length scales, continuum and agent-

based models of the tumour microenvironment and other tissue-level interactions enable

modeling of cancer cell populations and tumour progression. Even though cancer has

been among the most-studied human diseases using systems approaches, significant

challenges remain before the enormous potential of in silico cancer biology can be fully

realized (Edelman, Eddy et al. 2009).

Monumental advances in molecular and cellular biology, beginning in the latter

half of the 20th century and continuing today, have provided an increasingly detailed

portrait of human biology from the molecular to physiological levels. These advances

have centred on ‘reductionist’ experimental approaches aiming to annotate a vast array of

biological components, from cells and tissues to genes and proteins. Collectively, these

components represent a ‘parts list’ for biological systems (e.g., biochemical pathways,

larger interaction networks). At scales beyond a handful of interacting components,

however, simple analysis techniques can become limited in providing comprehensible

insight into resulting phenotypic behaviours. Systems biology is a rapidly growing

discipline that employs an integrative approach to characterize biological systems, in

which interactions among all components in a system are described mathematically to

establish a computable model. These in silico models—which complement traditional in

vivo animal models—can be simulated to quantitatively study the emergent behaviour of

a system of interacting components. Model development in the systems biology paradigm


is enabled by the description of parts and interactions from reductionist biology, and also

depends upon quantitative measurements. The advent of high-throughput experimental

tools has allowed for the simultaneous measurement of thousands of biomolecules,

paving the way for in silico model construction of increasingly large and diverse

biological systems. Integrating heterogeneous dynamic data into quantitative predictive

models holds great promise to significantly increase our ability to understand and

rationally intervene in disease-perturbed biological systems. This promise, particularly

with regard to personalized medicine and medical intervention, has motivated the

development of new methods for systems analysis of human biology and disease.

(Edelman, Eddy et al. 2009)

Cancer is an intrinsically complex and heterogeneous disease, making it

particularly amenable to systems biology approaches. Malignant tumours develop as a

function of multiple biological interactions and events, both in the molecular domain

among individual genes and proteins, and at the cellular and physiological levels between

functionally diverse somatic cells and tissues (Figure 10). At the molecular level, genetic

lesions interact synergistically to evade tumour suppression pathways, with no single

mutation typically sufficient to cause transformation. Beyond genetic mutations,

transformed cells can exhibit changes in expression of hundreds to thousands of genes

and proteins. Genetic modifications observed in cancer are often accompanied by

changes at the epigenetic level. The convolution of genetic effects and epigenetic

modifications illustrates the complex, nonlinear relationship between molecular state and

cellular cancer phenotype, emphasizing the need for heterogeneous data integration


through in silico models. The diversity of cancer models mirrors the broad array of

molecular and physiological events characteristic of the disease (Figure 11). The most

coarse-grained approaches use statistical analysis of high-throughput expression data to

identify molecular signatures of cancer phenotypes. Such signatures are indicative of

aberrant function of genes or pathways, and can be used to predict the type, stage, or

grade of biopsied tumour samples. More advanced methods aim to statistically infer the

structure and/or quantitative relationships among biomolecules within interaction and

regulatory networks of importance in cancer. Alternatively, stoichiometric or kinetic

models of biochemical reaction networks, constructed in a bottom-up, annotation-based

manner, can be used to simulate in mechanistic detail the behaviour of metabolism or

signal transduction in cancer. (Edelman, Eddy et al. 2009)

Figure 10: Molecular and physiological complexity in cancer. Figure from (Edelman, Eddy et al. 2009)


Figure 11: Biological scales and potential modeling approaches. Figure from (Edelman, Eddy et al. 2009)

The complexity of intracellular phenomena observed in cancer is mirrored by

equally intricate interactions between cells and across somatic tissues. Among the most

important biological systems mediating cancer development is the local tumour

microenvironment, a complex, interacting system of cells and extracellular moieties.

Contributory agents include the extracellular matrix, cooperating tumour and proximate

‘host’ cells, extracellular signaling factors, and the metabolic context of local tissue.

Other important agents include the infiltrating leukocytes and cytokines of the immune

system. Human cancers also exhibit other major interactions with somatic tissues

concomitant with malignant invasion, such as tumour-induced angiogenesis. Potential responses to chemotherapeutics, radiotherapy, and surgical procedures represent additional

confounding factors in the cellular and physiological behaviour of cancer cells. The

heterogeneous nature of the tumour microenvironment poses substantial modeling


challenges, yet ongoing research has sought to characterize these cancer systems,

including continuum and discrete models. (Edelman, Eddy et al. 2009)

Despite significant experimental and analytic challenges arising from cancer’s

complexity, modeling has already successfully led to insights into cancer biology and

treatment, as will be discussed herein. Some of the earliest models describing the

molecular basis of cancer over half a century ago implicated the absolute number of

genetic mutations as causative for malignancy. Today, important efforts in sequencing the

human genome and now individual cancers mean that malignant genetic transformations

can be studied and modeled in the context of the entire genome. (Edelman, Eddy et al.

2009) describe key examples of recent in silico modeling efforts in cancer. These include

(1) statistical models of cancer, such as molecular signatures of perturbed genes and

molecular pathways, and statistically inferred reaction networks; (2) models that

represent biochemical, metabolic, and signaling reaction networks important in

oncogenesis, including constraint-based and dynamic approaches for the reconstruction

of such networks; and (3) continuum and agent-based models of the tumour

microenvironment and tissue level interactions. (Edelman, Eddy et al. 2009)

In contrast to statistically inferred networks, biochemical reaction networks are

constructed to represent explicitly the mechanistic relationships between genes, proteins,

and the chemical interconversion of metabolites within a biological system (Figure 12).

In these models, network links are based on pre-established biomolecular interactions

rather than statistical associations; significant experimental characterization is thus


needed to reconstruct biochemical reaction networks in human cells. These biochemical

reaction networks require, at a minimum, knowledge of the stoichiometry of the

participating reactions. Additional information such as thermodynamics, enzyme capacity

constraints, time-series concentration profiles, and kinetic rate constants can be

incorporated to compose more detailed dynamic models. (Edelman, Eddy et al. 2009)
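The following uses standard constraint-based notation. It is added here for illustration, and these symbols are not defined in the cited review. A network of m metabolites and n reactions is summarized by a stoichiometric matrix S, where S_ij is the coefficient of metabolite i in reaction j (negative for substrates, positive for products). A kinetic model supplies rate laws v(x, k) for the metabolite concentrations x. A purely stoichiometric model drops these rate laws and assumes steady state. The additional information listed above then enters, for example, as bounds on each flux:

```latex
\frac{d\mathbf{x}}{dt} = S\,\mathbf{v}(\mathbf{x},\mathbf{k})
\quad\xrightarrow{\ \text{steady state}\ }\quad
S\,\mathbf{v} = \mathbf{0},
\qquad
v_j^{\min} \le v_j \le v_j^{\max}, \quad j = 1, \dots, n.
```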

Figure 12: Comparison of biochemical reaction network and statistical network models. Figure from (Edelman, Eddy et al. 2009)


The most basic mathematical representation of a biochemical reaction network is

a stoichiometric model. Stoichiometric models describe the interconversion of

biomolecules purely in terms of the number of reactants and products participating in

each reaction. The generation of stoichiometric models and the analysis of their properties is a well-established process, and genome-scale models of metabolism have been completed for a

diverse range of organisms. Methods have also been developed for reconstructing

signalling networks, transcriptional and translational networks, and regulatory networks;

these models are fundamentally analogous to reconstructed metabolic networks (Figure

13). The reconstruction of a biochemical reaction network results in a database of

stoichiometric equations that can be represented mathematically to form the foundation of

a genome-scale, computable model. Computational tools for constraint-based analysis are

then used to interrogate the properties of the reconstructed network in silico, and to

facilitate model-driven validation and refinement. Physico-chemical and environmental

constraints under which the network operates are applied in the form of balances,

including mass, energy, and charge, and bounds, such as flux capacities and

thermodynamic constraints. The statement of constraints defines a solution space

comprising all non-excluded network states, thereby describing possible functions or

allowable phenotypes. These methods are now being adapted for modeling human

systems in greater detail. (Edelman, Eddy et al. 2009)
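The balances and bounds can be made concrete with a minimal flux balance analysis sketch on a hypothetical five-reaction network. The network, its bounds and the uptake cap of 10 are invented for illustration. The sketch builds S from reaction equations, imposes S v = 0 and lb ≤ v ≤ ub, and then selects one point of the resulting solution space by maximizing a "biomass" drain with SciPy's `linprog` (HiGHS method).

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: uptake -> A, A -> B directly (r1) or via C (r2, r3),
# and a "biomass" drain on B standing in for growth.
metabolites = ["A", "B", "C"]
reactions = {
    "uptake":  {"A": 1},
    "r1":      {"A": -1, "B": 1},
    "r2":      {"A": -1, "C": 1},
    "r3":      {"C": -1, "B": 1},
    "biomass": {"B": -1},
}
rxn_ids = list(reactions)

# Stoichiometric matrix: rows are metabolites, columns are reactions.
S = np.zeros((len(metabolites), len(rxn_ids)))
for j, rid in enumerate(rxn_ids):
    for met, coeff in reactions[rid].items():
        S[metabolites.index(met), j] = coeff

# Bounds: all reactions irreversible; uptake capped at 10 (arbitrary units).
bounds = [(0, 10)] + [(0, 1000)] * (len(rxn_ids) - 1)

# Mass balance S v = 0; linprog minimizes, so maximize biomass via -v_biomass.
c = np.zeros(len(rxn_ids))
c[rxn_ids.index("biomass")] = -1.0
res = linprog(c, A_eq=S, b_eq=np.zeros(len(metabolites)), bounds=bounds, method="highs")

fluxes = dict(zip(rxn_ids, res.x))
print(res.status, -res.fun, fluxes)
```

The uptake cap fixes the optimal biomass flux, but it does not fix how flux splits between `r1` and the `r2`→`r3` route: every split is equally optimal. Redundant pathways of this kind produce alternative flux distributions within the solution space.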


Figure 13: Mathematical representation of reaction links in biochemical networks.

The global human metabolic reconstruction provides a basis for the known set of

metabolic reactions catalyzed by human proteins. However, the utility of these models for

cancer research going forward depends upon overcoming several challenges. First,

further refinement of the global human metabolic map is essential to increase its

accuracy. Second, each of the approximately 200 cell types in the human body exhibits

only a portion of the full metabolic capability contained in the genome. The high

percentage of undetermined activities for metabolic enzymes in human tissues clearly

shows how much more we have to learn about even this very well-studied cellular


process. Effectively representing which portions of the global human metabolic network

are active in any given cell type, and at what level, is thus of critical importance. Cancers,

in particular, are known to exhibit diverse metabolic phenotypes compared with their

progenitor cells, typically with an increased rate of overall metabolic activity to support

their increased growth, with the highest metabolic activity observed in the most aggressive

malignancies. Multiple other hallmarks of cancer including angiogenesis, metastasis,

evasion of apoptosis, and avoidance of immune detection have been previously linked to

human tumour metabolism. Metabolic targets have also been used in cancer

chemotherapy. For these reasons, metabolic networks in human cancer have the potential

to be a rich focus area for systems modeling going forward. (Edelman, Eddy et al. 2009)
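One common way to estimate which portions of a generic model a given cell type uses is to map gene-level measurements onto reactions through gene–protein–reaction (GPR) rules. The sketch below assumes a widely used convention: AND joins subunits of an enzyme complex and is scored by the minimum, while OR joins isozymes and is scored by the maximum. The helper name `gpr_score` and the choice to skip unmeasured genes are assumptions for illustration. They are not part of the cited review.

```python
import re

# Tokens: parentheses, the operators "and"/"or", or a gene identifier.
TOKEN = re.compile(r"\(|\)|\band\b|\bor\b|[^\s()]+")


def gpr_score(rule, expression):
    """Score a GPR rule from per-gene values (hypothetical helper).

    AND (enzyme subunits) takes the minimum, OR (isozymes) the maximum.
    Genes absent from `expression` are skipped; returns None if none is measured.
    """
    tokens = TOKEN.findall(rule)
    pos = 0

    def parse_or():
        nonlocal pos
        values = [parse_and()]
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            values.append(parse_and())
        values = [v for v in values if v is not None]
        return max(values) if values else None

    def parse_and():
        nonlocal pos
        values = [parse_atom()]
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            values.append(parse_atom())
        values = [v for v in values if v is not None]
        return min(values) if values else None

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # consume the matching ")"
            return value
        return expression.get(token)

    return parse_or()


# Continuous expression values (arbitrary illustrative numbers).
print(gpr_score("(g1 and g2) or g3", {"g1": 5.0, "g2": 2.0, "g3": 3.0}))
# Tri-valued states (-1 low, 0 moderate, 1 high) propagate the same way.
print(gpr_score("(g1 and g2) or g3", {"g1": 1, "g2": -1, "g3": 0}))
```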

In silico models of cancer can be built not only for intracellular networks, but also

at larger length scales. Alternative computational methods must be applied to consider the

interface between cancers and the tissue contexts in which they reside. These settings

exhibit complex interactions with multiple factors of different function and scale,

including extracellular biomolecules, a spatially intricate and dynamic vasculature, and

the immune system. Models of cancer at the tissue level that account for these

functionally divergent parameters can be broadly divided into ‘continuum’ models, and

discrete or ‘agent-based’ models. The latter are often applied when the number of

individual interacting units, such as cancer cells, is constrained to remain small; the

former is more practical at population scales where agent-based modeling can be

computationally prohibitive. Both methods can integrate information about the biological


context in which cancers develop, and thus represent a multi-scale consideration of

oncogenesis as it occurs within somatic tissues. (Edelman, Eddy et al. 2009)

Extracellular parameters can be represented as continuously distributed variables

to mathematically model cell–cell or cell–environment interactions in the context of

cancers and the tumour microenvironment. Systems of partial differential equations have

been used to simulate the magnitude of interaction between these factors, including the

effects of hypoxia on cell cycle progression, the impact of mechanical forces on tumour

invasiveness and extracellular matrix interactions. Recent studies have examined cell

population dynamics within colonic crypts in colorectal cancer. These models consider

interactions between stem cells, differentiating cells, and differentiated cell populations to

quantitatively predict tissue-level invasion and the growth of tumour mass. Other models

have represented solid tumours as a multiphase system of both bound and ‘mobile’ forms.

Such ‘mixture’ models consider differential growth and apoptosis rates, as well as mass

transfer and regulatory interactions between phases. Alternative models have considered

nonlinear and combinatorial effects of multiple factors, including nutrient availability and

mechanical parameters, and the effects of mutation rate on invasion and metastasis. These

numerical systems embody a robust method to incorporate the effects of somatic

biological phenomena into computational representations of cancer. Continuum-based

models are thus a powerful tool to simulate and characterize interactions between

intracellular and extracellular factors in oncogenic processes. (Edelman, Eddy et al. 2009)
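The following is a minimal sketch of a continuum model in this spirit. It is a one-dimensional, dimensionless reaction–diffusion equation for oxygen supplied by a vessel at x = 0 and consumed by tumour cells: ∂c/∂t = D ∂²c/∂x² − k c. The geometry, the parameter values and the hypoxia threshold are illustrative assumptions, not values from the cited studies. An explicit finite-difference scheme is used, and the far boundary is a zero-flux wall. At steady state the profile should approach cosh(λ(1 − x))/cosh(λ), where λ = √(k/D).

```python
import numpy as np


def simulate_oxygen(n=51, D=1.0, k=25.0, t_end=3.0):
    """Explicit finite differences for dc/dt = D d2c/dx2 - k c on [0, 1].

    Dimensionless toy setting (assumed): a vessel holds c = 1 at x = 0,
    cells consume oxygen at first-order rate k, and x = 1 is a zero-flux wall.
    """
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    dt = 0.4 * dx**2 / D            # below the diffusive stability limit dx^2 / (2 D)
    c = np.zeros(n)
    c[0] = 1.0
    for _ in range(int(t_end / dt)):
        lap = np.zeros(n)
        lap[1:-1] = (c[2:] - 2.0 * c[1:-1] + c[:-2]) / dx**2
        lap[-1] = 2.0 * (c[-2] - c[-1]) / dx**2   # mirror node enforces zero flux
        c = c + dt * (D * lap - k * c)
        c[0] = 1.0                                 # fixed concentration at the vessel
    return x, c


x, c = simulate_oxygen()
hypoxic = x[c < 0.1]                # illustrative hypoxia threshold
print(f"hypoxic beyond x = {hypoxic.min():.2f}")
```

Raising `k` (faster consumption) or lowering `D` shortens the oxygenated rim. This illustrates in the simplest form how such models link tissue-level supply to hypoxic regions.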


Multivariate continuum models are able to represent the effects of several

physiological or biochemical events on cancer development. However, in situ, these

factors are highly heterogeneous, and interact discontinuously with tumour cells. Cellular

automata models represent cancer cells as discrete entities of defined location and scale,

interacting with one another and external factors in discrete time intervals according to

predefined rules. Agent based models expand the cellular automata paradigm to include

entities of divergent functionalities interacting together in a single spatial representation,

including different cell types, genetic elements, and environmental factors. With

sensitivity to starting conditions, and the ability to incorporate probabilistic interactions at

each time step, these models can exhibit stochastic behaviours similar to those observed

in oncogenesis in vivo. Phenomena that have been modeled using agent-based models

include three-dimensional tumour cell patterning, immune system surveillance,

angiogenesis, and the kinetics of cell motility. Another recent model integrated diverse

parameters such as extracellular signals, blood flow, and tissue degradation to simulate

the spatiotemporal formation of tumour vasculature. (Edelman, Eddy et al. 2009)
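The cellular automata idea can be illustrated with a toy lattice model. Each tumour cell occupies one grid site. At every time step, each cell divides with probability `p_divide` into a randomly chosen empty von Neumann neighbour, so growth stalls where space runs out. The rules and parameter values are assumptions chosen for illustration. Published agent-based models add cell types, signalling and environmental fields on top of such a skeleton.

```python
import numpy as np

EMPTY, TUMOUR = 0, 1
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # von Neumann neighbourhood


def grow_tumour(size=41, steps=30, p_divide=0.3, seed=0):
    """Toy stochastic cellular automaton of tumour growth (illustrative rules)."""
    rng = np.random.default_rng(seed)
    grid = np.zeros((size, size), dtype=np.int8)
    grid[size // 2, size // 2] = TUMOUR       # single seed cell in the centre
    history = [1]
    for _ in range(steps):
        cells = np.argwhere(grid == TUMOUR)   # snapshot of cells at this step
        rng.shuffle(cells)                    # random update order
        for r, c in cells:
            if rng.random() >= p_divide:
                continue                      # this cell does not divide now
            free = [(r + dr, c + dc) for dr, dc in NEIGHBOURS
                    if 0 <= r + dr < size and 0 <= c + dc < size
                    and grid[r + dr, c + dc] == EMPTY]
            if free:                          # divide only into empty space
                nr, nc = free[rng.integers(len(free))]
                grid[nr, nc] = TUMOUR
        history.append(int(grid.sum()))
    return grid, history


grid, history = grow_tumour()
print(history[-1], "cells after", len(history) - 1, "steps")
```

Because division needs an empty neighbour, interior cells stop proliferating and growth concentrates at the rim. Fixing `seed` makes a single stochastic realization reproducible.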

Increasingly, ‘hybrid’ models have been created which incorporate both

continuum and agent based variables in a modular approach. For example, a recent study

considered continuous extracellular biomolecule distributions and discrete cell locations

to simulate the interaction between intracellular decision-making processes and malignant

growth. Another recent model incorporated a continuous model of a receptor signaling

pathway, an intracellular transcriptional regulatory network, cell–cycle kinetics, and

three-dimensional cell migration in an integrated, agent-based simulation of solid brain


tumour development. The interaction between cellular and microenvironment states has

also been considered in a multi-scale model that predicts tumour morphology and

phenotypic evolution in response to such extracellular pressures. These and other

techniques which incorporate multiple, nested scales of interacting biology embody

promising paradigms to understand cancer as a cascade of information across levels of

size and complexity. This ability to interrogate cancer across multiple biological agents

and compartments presents a unique framework to elucidate oncogenic processes, and to

evaluate potential therapeutic interventions through digital simulation prior to

experimental deployment. (Edelman, Eddy et al. 2009)

In this work, human breast cancer was investigated via the integration of a human

metabolic model and gene expression data.

1.6 Reactome Array Data Technology

(Beloqui, Guazzaroni et al. 2009) have applied reactome array technology to measure

metabolite transformation in P. putida for genome sequence–independent functional

analysis of metabolic phenotypes and networks. The array includes 1676 substrate

compounds collectively representing central metabolic pathways of all forms of life.

Proof of concept was shown, inter alia, by the reconstruction of P. putida’s metabolic

network, demonstrating that the array discriminates compounds metabolized by extracts

of P. putida from those that are not.


Functional genomics has greatly accelerated research on the genomic basis of life

processes in health and disease and provided a quantum advance in our understanding of

such processes, their regulation, and underlying mechanisms. Functional assignments and

metabolic network reconstructions have generally depended on both the genome

sequence of the organism(s) in question and bioinformatic analyses based on homology

to known genes and proteins. However, many genes in databases have questionable

annotations or are not annotated at all, which hinders effective exploitation of the rapidly

growing volume of genome sequence data. Metabolomics provides new insights into the

metabolic state of a cell under a given set of environmental parameters, or in response

to a parameter change, independently of a genome sequence, although problems of

metabolite identification and quantification exist. Functionally associating the

metabolic profile obtained with the enzymes and pathways responsible still depends

heavily on sequence-based metabolic reconstructions. There is thus a need for a new

method to causally link metabolites with cognate enzymes, which, in addition to

delivering global descriptions of metabolic responses to given environmental conditions,

simultaneously provides annotation of the enzymes featured. The “reactome array” was

designed to forge this link between genome and metabolome, providing a global

metabolic phenotype of a cell extract derived from a clonal population of cells or a

mixture of cell types, as is found in clone libraries, tissues, or multicellular organisms.

The array constitutes a generic tool for metabolic phenotyping of cells and annotation of

proteins and has applications in diverse aspects of biology and medicine. The reactome array is

a sensitive metabolite array for genome sequence–independent functional analysis of

metabolic phenotypes and networks, of cell populations and communities. Application of


cell extracts to the array leads to specific binding of enzymes to cognate substrates,

transformation to products, and concomitant activation of the dye signals. Utility of the

array for unsequenced organisms was demonstrated, inter alia, by reconstruction of the

global metabolisms of three microbial communities derived from an acidic volcanic pool, a deep-sea brine lake, and hydrocarbon-polluted seawater. Enzymes of interest are captured

on nanoparticles coated with cognate metabolites, sequenced, and their functions

unequivocally established. (Beloqui, Guazzaroni et al. 2009)

1.7 Automatically Generated Metabolic Models

At present, model reconstruction lags behind genome sequencing, with ~1000 completely

sequenced prokaryotes vs ~50 published genome‐scale models. Models are often

constructed one‐at‐a‐time by individuals working independently, resulting in replication

of work, propagation of errors, and extensive manual curation. It currently requires

approximately one year to produce a complete manually curated model. Rapid

Annotation using Subsystem Technology (RAST) and SEED have made high‐speed,

quality annotation of prokaryotic genomes a reality. In RAST, each biological subsystem

is continuously annotated and curated across all known genomes by an annotator with

expert knowledge in that subsystem. The modeling pipeline exploits the high quality of

RAST annotations along with a variety of optimization algorithms to automatically

generate genome‐scale models. The automated model reconstruction pipeline produces

genome‐scale models that are comparable in size with the available published

genome‐scale models. The reactions added automatically during the auto‐completion process of

the pipeline exposed regions of metabolism where more annotation efforts are necessary.


The optimization steps of the pipeline boosted model accuracy from initial values of

66% to optimized values of 87%, which approaches the accuracy typical of manually

reconstructed models. The model optimization process also enabled the identification of

missing transporters, additional missing reactions, under‐constrained reactions, and

annotations that are inconsistent with available essentiality data. (Overbeek, Begley et al.

2005)


2 iMAT: Integrative Metabolic Analysis Tool

2.1 Online Tool Development

In this work we introduce an Integrative Metabolic Analysis Tool (iMAT), enabling the

integration of transcriptomic, proteomic, and reactome array data with metabolic network

models to predict metabolic flux, developing variants of the approach presented in

(Shlomi, Cabili et al. 2008). Specifically, we present a new constraint-based method for

the integration of a genome-scale metabolic network with reactome array data to predict

metabolic flux activity. While high-throughput transcriptomic and proteomic data have

been available for quite some time, the reactome array technology was developed only recently by (Beloqui, Guazzaroni et al. 2009), providing exciting new genome-scale data on the rate of metabolite transformation by enzymes present in cell extracts.

Altogether, iMAT supports the integration of functional data with an array of different

models, including: (i) a highly curated metabolic network model of human metabolism by

(Duarte, Becker et al. 2007), enabling the prediction of metabolic activity under various

tissues and cell types; (ii) common model organisms such

as E. coli and S. cerevisiae; and (iii) an array of automatically reconstructed networks for

160 bacteria (Overbeek, Begley et al. 2005), enabling the prediction of metabolic activity

under various environmental and genetic conditions. Importantly, iMAT is straightforward and user-friendly: the user submits functional data for a given organism via the web and receives a visualization of the organism’s metabolic network showing the most likely predicted metabolic flux. The applicability of iMAT is demonstrated here through the prediction of human breast cancer metabolism from gene expression data, and of Pseudomonas putida metabolism from reactome array data.

2.1.1 Online availability

iMAT is available at http://imat.cs.tau.ac.il/

Utilizing iMAT to predict metabolic flux from transcriptomic, proteomic, or reactome array data requires specifying an organism of interest and uploading the input data file (for a list of supported organisms, see the iMAT website). To predict metabolic flux from gene expression or proteomic data, the user must supply a discrete, tri-valued expression state for each gene: lowly, moderately, or highly expressed in the condition studied. For reactome array data, the user must supply the discrete transformation state of each metabolite: lowly, moderately, or highly transformed (reflecting the strength of its consumption by the corresponding enzymatic reactions). Various parameters can be tuned to control the discretization of the raw input files, as described on the iMAT website. Given this input, iMAT predicts a flux activity state for each reaction in the model, reflecting the presence or absence of metabolic flux through it. For some reactions, the flux activity state can be uniquely determined to be active or inactive, with associated confidence estimates. For others, the activity state cannot be uniquely determined, because isozymes or alternative pathways give rise to alternative flux distributions with the same overall consistency with the expression data. When the predicted flux activity of a reaction deviates from the expression state of the corresponding enzyme-coding gene, that gene is considered to be post-transcriptionally up- or downregulated. iMAT outputs the predicted flux activity states and the corresponding confidence values for all network reactions, in both tabular and network visualization forms. The network visualization superimposes the transcriptomic, proteomic and reactome array data given as input, as well as the predicted metabolic flux, on top of the organism's metabolic network, using the publicly available Cytoscape software (Cline, Smoot et al. 2007). In addition, iMAT provides a pathway enrichment analysis based on the flux activity predictions. (A detailed description of iMAT's functionality can be found in the attached user guide.)
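To make the post-transcriptional call concrete, the following Python sketch compares each gene's discrete expression state with the predicted activity of the reaction it encodes. It is a minimal illustration under stated assumptions, not iMAT's implementation: it assumes one reaction per gene, expression states coded as 1 (high), 0 (moderate) and -1 (low), predicted activity coded as 1 (active), -1 (inactive) and 0 (undetermined), and it treats a moderate expression state as deviating from both an active and an inactive prediction. The function name `flag_post_transcriptional` is hypothetical.

```python
def flag_post_transcriptional(expression, activity):
    """Flag genes whose predicted reaction activity deviates from their expression state.

    expression: dict gene -> 1 (high), 0 (moderate), -1 (low)   [assumed coding]
    activity:   dict gene -> 1 (active), -1 (inactive), 0 (undetermined)
    Returns a dict gene -> "up" or "down", containing deviating genes only.
    """
    flags = {}
    for gene, state in expression.items():
        predicted = activity.get(gene, 0)
        if predicted == 1 and state < 1:
            # Predicted active although not highly expressed: candidate upregulation
            flags[gene] = "up"
        elif predicted == -1 and state > -1:
            # Predicted inactive although not lowly expressed: candidate downregulation
            flags[gene] = "down"
    return flags
```

Undetermined reactions (activity 0) are never flagged in this sketch, since alternative optimal flux distributions leave their state open.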

2.1.2 An illustrative example of applying iMAT to a toy network model

Figure 14 shows two examples of applying iMAT to a small toy model, given either gene expression or reactome array data as input, both predicting the same metabolic flux distribution. The toy model comprises ten metabolites and thirteen reactions, including seven exchange reactions that enable the uptake of substrates and the secretion of metabolic by-products.


Figure 14: An illustrative example of applying iMAT to a toy metabolic network (b), given either (a) gene expression or (c) reactome array data as input. Circular nodes represent metabolites, edges represent biochemical reactions, and diamond-shaped nodes represent enzyme-coding genes. iMAT's output is an optimal flux distribution that is the most consistent with (d) the expression data or (e) the reactome array data given as input. (d) Reactions associated with highly, lowly or moderately expressed genes are colored green, red, or black, respectively. (e) Nodes colored green, red or black represent highly, lowly or moderately transformed metabolites (based on reactome array data), respectively. Solid (dashed) edges represent reactions predicted to be active (inactive). Reactions whose flux activity state is uniquely determined to be active or inactive (across the whole space of alternative optimal flux distributions) are marked with thick edges.

In the first example, in which iMAT integrates gene expression data, the predicted flux is consistent with the expression state of 4 of the 5 reactions, which are predicted to be active (inactive) in accordance with the high (low) expression state of their enzyme-coding genes. One reaction (M6->M9) is predicted to be inactive although its corresponding gene is highly expressed, suggesting potential post-transcriptional regulation. Of the five metabolites that can be transported across the membrane boundary in the toy model (M1-2, M7-9), iMAT predicts the uptake of one metabolite (M1) and the secretion of two others (M7 and M8). Notably, while the high expression level of the membrane transporter of M1 indicates that it may be active, it does not indicate whether M1 is taken up or secreted by the tissue. In contrast, iMAT can predict flux directionality in some cases by propagating known constraints on the reversibility of other enzymes (inferred from thermodynamic principles; in this case, the known irreversibility of G3 and G4 and of the spontaneous reactions ->M3, M3+M4->M7+M8, and M8->).

In the second example, applying iMAT to reactome array data predicts the same flux distribution, which is consistent with the metabolic transformation state of five out of the six metabolites in the network. In this case, the high transformation state of metabolite M1 indicates that it is used as a substrate by some enzymatic reaction, without indicating the specific reaction in which it participates. iMAT predicts that M1 is transformed by M1->M4, and not by M1+M2->M5+M6, by considering the network-wide flux distribution in which both M2 and M9 have a low transformation rate.

Both expression and reactome array examples exhibit an equivalent metabolic

flux prediction (in panels (d) and (e) respectively). The pathway predicted to be active in

both panels is portrayed by solid edges, where thick edges denote reactions predicted to

be active with high confidence (i.e. the flux activity state can be uniquely determined to

be active across the optimal solution space). The activity prediction of M1->M4 has low

confidence, since an equally viable alternative path exists via M1->M10->M4.


2.2 Expanding approach: Integration of Reactome Array data

We introduce a new constraint-based computational method for integrating a genome-scale metabolic network with reactome array data (Beloqui, Guazzaroni et al. 2009) in order to predict metabolic flux activity. It is reasonable to assume that if a metabolite substrate of a reaction is transformed by the reaction's enzyme, then that reaction can be considered active. Similarly, if the metabolite substrate is not transformed, the reaction can be considered inactive. The reactome array input, i.e., the rate of metabolite transformation by enzymes, is discretized in a manner similar to gene expression data: metabolites with a high transformation rate are considered highly transformed and receive the value 1, metabolites with a low transformation rate are considered lowly transformed and receive the value -1, and metabolites in the intermediate range are considered moderately transformed and receive the value 0.

This problem can be formulated as follows. Given a subset of highly transformed metabolite substrates ($M_H$) and a subset of lowly transformed metabolite substrates ($M_L$), taken from the model metabolites measured on the reactome array ($M_R \subseteq M$, with $M_H, M_L \subseteq M_R$), find an optimal feasible solution that maximizes the number of metabolites in $M_H$ that are transformed by at least one reaction and the number of metabolites in $M_L$ that are not transformed by any reaction. A simple metabolite-to-reaction mapping was employed to determine the transformation state of each reaction: the transformation state of a reaction's metabolite substrates is assigned to the reaction (a reaction that has both highly and lowly transformed substrates is not included in the constraints, allowing iMAT to determine its activity state without a priori knowledge). This pre-processing results in a subset of the model reactions ($R_H$) that is defined to be highly transformed and another subset ($R_L$) defined to be lowly transformed. We then formulated the following mixed integer linear programming (MILP) problem to find a steady-state flux distribution satisfying stoichiometric and thermodynamic constraints, while maximizing the agreement between reaction activity and the metabolic transformation state (equation (1)):

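$$
\begin{aligned}
(1)\quad & \max \; \sum_{j \in M_H} y_j + \sum_{j \in M_L} z_j \\
& \text{s.t.} \\
(2)\quad & S\,v = 0 \\
(3)\quad & v_{\min} \le v \le v_{\max} \\
(4)\quad & v_i + y_i^{+}\,(v_{\min,i} - \varepsilon) \ge v_{\min,i}, && i \in R_H(j),\; j \in M_H \\
(5)\quad & v_i + y_i^{-}\,(v_{\max,i} + \varepsilon) \le v_{\max,i}, && i \in R_H(j),\; j \in M_H \\
(6)\quad & y_j \le \sum_{i \in R_H(j)} \left(y_i^{+} + y_i^{-}\right), && j \in M_H \\
(7)\quad & v_{\min,i}\,(1 - z_j) \le v_i \le v_{\max,i}\,(1 - z_j), && i \in R_L(j),\; j \in M_L \\
& v \in \mathbb{R}^{n},\quad y_i^{+},\, y_i^{-},\, y_j,\, z_j \in \{0,1\}
\end{aligned}
$$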

where $v$ is the flux vector and $S$ is an $m \times n$ stoichiometric matrix, in which $m$ is the number of metabolites and $n$ is the number of reactions. The mass balance constraint is enforced in equation (2). Thermodynamic constraints that restrict flow direction are imposed in equation (3) by setting $v_{\min}$ and $v_{\max}$ as lower and upper bounds on flux values, respectively. For each reaction $i$, the Boolean variables $y_i^{+}$ and $y_i^{-}$ represent whether the reaction is active (in either direction) or not (when both $y_i^{+}$ and $y_i^{-}$ are 0); $R_H(j)$ is the set of reactions of which highly transformed metabolite $j$ is a substrate. Specifically, a reaction is considered active if it carries a significant positive flux greater than a positive threshold $\varepsilon$ (equation (4)) or a significant negative flux below $-\varepsilon$ (equation (5), for reversible reactions). $y_j$ is a Boolean variable for each highly transformed metabolite $j$, which can equal 1 only if at least one of the reactions of which $j$ is a substrate is active (equation (6)). For each lowly transformed metabolite $j$, the Boolean variable $z_j$ represents whether all the reactions of which $j$ is a substrate are inactive ($z_j = 1$) or some may be active ($z_j = 0$); $R_L(j)$ is the set of reactions of which lowly transformed metabolite $j$ is a substrate. Specifically, a reaction is considered inactive if it carries no flux in either direction (equation (7)). The optimization thus maximizes the number of measured metabolites whose predicted transformation agrees with their metabolic transformation state. The commercial CPLEX solver was used to solve the MILP problems on a Pentium 4 machine running Linux, requiring a few tens of seconds per problem.
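The objective and constraints above can be checked on a candidate flux vector with a small Python sketch. It is illustrative only: it does not solve the MILP (that requires a solver such as CPLEX), it treats a metabolite as a substrate of a reaction regardless of the reaction's direction, and the function names (`reaction_sets`, `is_steady_state`, `objective`) and the default value of `eps` are hypothetical. It builds the per-metabolite reaction sets R_H(j) and R_L(j), skipping reactions with both highly and lowly transformed substrates as described above, checks the mass balance S v = 0, and counts the terms of objective (1).

```python
def reaction_sets(substrates, states):
    """Build R_H(j) and R_L(j): reactions consuming each highly / lowly transformed metabolite.

    substrates: dict reaction -> set of substrate metabolites
    states:     dict metabolite -> 1 (high), 0 (moderate), -1 (low transformation)
    """
    r_high = {m: set() for m, s in states.items() if s == 1}
    r_low = {m: set() for m, s in states.items() if s == -1}
    for rxn, subs in substrates.items():
        has_high = any(states.get(m) == 1 for m in subs)
        has_low = any(states.get(m) == -1 for m in subs)
        if has_high and has_low:
            continue  # conflicting reaction: left out of the constraints
        for m in subs:
            if m in r_high:
                r_high[m].add(rxn)
            elif m in r_low:
                r_low[m].add(rxn)
    return r_high, r_low


def is_steady_state(stoich, flux, tol=1e-9):
    """Mass balance S v = 0; stoich maps metabolite -> {reaction: coefficient}."""
    return all(abs(sum(c * flux[r] for r, c in row.items())) <= tol
               for row in stoich.values())


def objective(flux, r_high, r_low, eps=1e-3):
    """Value of objective (1) for a fixed flux vector (dict reaction -> flux)."""
    # y_j = 1 if some reaction consuming highly transformed metabolite j has |v| >= eps
    y = sum(1 for rxns in r_high.values() if any(abs(flux[r]) >= eps for r in rxns))
    # z_j = 1 if every reaction consuming lowly transformed metabolite j carries zero flux
    z = sum(1 for rxns in r_low.values() if all(flux[r] == 0 for r in rxns))
    return y + z
```

For instance, with reactions R1: {A}, R2: {B}, R3: {A, C} and states A = 1, B = -1, C = -1, reaction R3 is dropped as conflicting, and a flux vector in which only R1 carries flux scores 3.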

A solution found by the MILP solver is guaranteed to be optimal with respect to the objective function, but it may not be unique, as a space of alternative optimal solutions may exist. In this case, the space of optimal solutions represents alternative steady-state flux distributions attaining the same similarity with the metabolic data. To account for these alternative solutions, we employed a variant of Flux Variability Analysis (Mahadevan and Schilling 2003). Our method computes, for each metabolic reaction, whether it is predicted to be always active (or, alternatively, always inactive) across the entire solution space. This is achieved by solving two MILP problems (similar to the one described above) for each reaction, in order to find the maximal attainable similarity with the metabolic data when the reaction is forced to be active (denoting this similarity $S^{+}$) and when it is forced to be inactive (denoting this similarity $S^{-}$). A reaction is then considered active if $S^{+} > S^{-}$ (i.e., a higher similarity with the metabolic data is achieved when the reaction is active than when it is inactive), with a confidence level of $S^{+} - S^{-}$. Alternatively, a reaction is considered inactive if $S^{-} > S^{+}$, with a confidence of $S^{-} - S^{+}$. If $S^{+} = S^{-}$ (i.e., the same similarity with the metabolic data can be obtained whether the reaction is forced to be active or inactive), the activity state is considered undetermined.
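The decision rule above reduces to comparing the two forced-state optima. The Python sketch below shows that rule; `solve_forced` is a hypothetical stand-in for re-solving the MILP with a reaction forced active or inactive, and is not part of iMAT's published interface.

```python
def activity_call(s_active, s_inactive):
    """Classify a reaction from S+ (optimum when forced active) and S- (forced inactive)."""
    if s_active > s_inactive:
        return "active", s_active - s_inactive
    if s_inactive > s_active:
        return "inactive", s_inactive - s_active
    return "undetermined", 0


def classify_reactions(reactions, solve_forced):
    """Apply activity_call to every reaction.

    solve_forced(rxn, active) is a hypothetical wrapper returning the MILP optimum with
    the reaction forced active (active=True) or inactive (active=False).
    """
    return {r: activity_call(solve_forced(r, True), solve_forced(r, False))
            for r in reactions}
```

Since each reaction needs two MILP solves, the cost of this step grows linearly with the number of reactions, which is why per-problem solve times matter.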

2.2.1 Modeling P. putida’s Metabolic Profile via Reactome Array Integration

(Beloqui, Guazzaroni et al. 2009) applied reactome array technology to measure metabolite transformation in P. putida for genome sequence–independent functional analysis of metabolic phenotypes and networks. The array includes 1676 substrate compounds collectively representing the central metabolic pathways of all forms of life. Proof of concept was shown, inter alia, by the reconstruction of P. putida's metabolic network, demonstrating that the array discriminates compounds metabolized by extracts of P. putida from those that are not. Here, we utilize iMAT to integrate the reactome array data with a genome-scale metabolic network model of P. putida [ref] to predict the actual metabolic flux reflected in the array data.


2.2.1.1 Data Acquisition and Preprocessing

The raw reactome array data was first used to assign each metabolite a transformation state (i.e., lowly, moderately, or highly transformed), reflecting whether it is consumed by some enzymatic reaction. This pre-processing resulted in 91 lowly and 263 highly transformed metabolites, out of the 1191 metabolites of the P. putida metabolic model.

2.2.1.2 Results

Utilizing iMAT to predict metabolic flux in P. putida given these data results in a confident prediction of 459 active and 792 inactive reactions (out of 1373 reactions in the model). The predicted flux distribution reflects the activity of amino acid metabolism, biosynthesis of secondary metabolites, carbohydrate metabolism and energy metabolism (based on a hypergeometric pathway enrichment test), in accordance with the findings of Beloqui et al. (Table 1). (See the supplementary results for a detailed description.)

Active enriched pathways	P-value
Branched-chain amino acid biosynthesis	0.0000
TCA cycle	0.0000
De novo purine biosynthesis	0.0000
Histidine biosynthesis	0.0000
Purine conversions	0.0001
Isoleucine degradation	0.0001
Valine degradation	0.0001
Methionine biosynthesis	0.0003
Common pathway for synthesis of aromatic compounds (DAHP synthase to chorismate)	0.0004
Lysine biosynthesis, DAP pathway	0.0004
Leucine degradation and HMG-CoA metabolism	0.0013
Arginine and Ornithine degradation	0.0025
N-phenylalkanoic acid degradation	0.0025
Histidine degradation	0.0040
Tryptophan synthesis	0.0080
Glycolysis and Gluconeogenesis	0.0175
Formaldehyde assimilation: Ribulose monophosphate pathway	0.0192
Serine biosynthesis	0.0367
Glutamine, Glutamate, Aspartate and Asparagine biosynthesis	0.0416
Proline synthesis	0.0447
Pyruvate metabolism I: anaplerotic reactions; PEP	0.0447

Table 1: Significantly enriched active pathways as predicted by iMAT.
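The hypergeometric pathway enrichment test used for Table 1 can be sketched in a few lines of Python. This is a minimal version under an assumption we make explicit: the background is the full set of model reactions and the "hits" are the reactions predicted active; the exact background and any multiple-testing handling used in the analysis are not specified here, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_pathway, n_active, k):
    """One-sided hypergeometric p-value P(X >= k).

    n_total:   reactions in the background (e.g. all model reactions)
    n_pathway: reactions belonging to the pathway
    n_active:  reactions predicted active
    k:         pathway reactions among the predicted active ones
    """
    upper = min(n_pathway, n_active)
    # Sum the probabilities of observing k or more pathway reactions among the active ones
    hits = sum(comb(n_pathway, i) * comb(n_total - n_pathway, n_active - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_active)
```

With the P. putida model as background, a call such as `hypergeom_enrichment_p(1373, size, 459, k)` would score a pathway of `size` reactions with `k` of them among the 459 confidently active reactions; exact integer arithmetic in `math.comb` avoids overflow for genome-scale counts.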

The authors describe an analysis in which 549 proteins were captured by gold nanoparticles, of which 191 enzymes acting on 158 of the 525 P. putida metabolites were unambiguously identified as active. Of the 191 enzymes and 158 metabolites, 123 (of the 1082 enzymes in the model) and 47 (of the 1191 metabolites in the model), respectively, were found in the P. putida metabolic model. We calculated iMAT's recall by testing whether at least one reaction associated with each of these enzymes or metabolites was predicted to be active, obtaining 0.6748 and 0.4894, respectively. To evaluate the significance of these results, we calculated p-values for a random predictor which, for each enzyme and metabolite, draws an associated reaction predicted to be active from the distribution of its associated reactions. The random predictor obtained p-values of 0.7387 and 0.7904 for enzymes and metabolites, respectively, confirming iMAT's significance.
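The recall measure above, and one way of comparing it against chance, can be sketched in Python. The recall function follows the definition in the text (an item counts as recovered if at least one associated reaction is predicted active). The significance test is a swapped-in permutation baseline that randomly reassigns which reactions are called active while keeping their number fixed; it is an assumption for illustration and not necessarily the random predictor used in the analysis above. Function names are hypothetical.

```python
import random


def recall(associations, predicted_active):
    """Fraction of items (enzymes or metabolites) with >= 1 associated reaction predicted active.

    associations:     dict item -> set of associated reactions
    predicted_active: set of reactions predicted active
    """
    hits = sum(1 for rxns in associations.values() if rxns & predicted_active)
    return hits / len(associations)


def permutation_p(associations, predicted_active, all_reactions, trials=1000, seed=0):
    """Empirical p-value: share of random active sets (same size) reaching the observed recall."""
    rng = random.Random(seed)
    observed = recall(associations, predicted_active)
    pool = sorted(all_reactions)
    at_least = 0
    for _ in range(trials):
        random_active = set(rng.sample(pool, len(predicted_active)))
        if recall(associations, random_active) >= observed:
            at_least += 1
    return (at_least + 1) / (trials + 1)  # add-one correction avoids reporting p = 0
```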

To validate iMAT’s predictive accuracy, we performed a 5-fold cross-validation

test in which a training set of 80% of the reactome array metabolites was used as input for

iMAT to predict the transformation state of the remaining 20%. Specifically, in each

cross-validation trial, the transformation state of a metabolite (within the test set) was


predicted to be high, if at least a single reaction in which it participates as a substrate is

predicted to be active by iMAT, and low, if all reactions in which it participates as a

substrate are predicted to be inactive. The prediction of metabolites with high and low

transformation states was found to be highly significant, with a precision of 0.6077 and

0.615, recall of 0.8187 and 0.3544, and a p-value of 0.000283 and 0.000137 respectively.
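The 5-fold cross-validation procedure can be outlined in Python as follows. In the analysis above, the prediction step is iMAT itself (a held-out metabolite is called high if some reaction consuming it is predicted active, and low if all are predicted inactive); here it is abstracted as a hypothetical `predict` callable, and metabolites left without a call are assumed to count against recall but not precision. This is a sketch of the evaluation loop only.

```python
import random


def precision_recall(true_states, predicted, label):
    """Precision and recall for one label (1 = high, -1 = low transformation)."""
    tp = sum(1 for m, s in true_states.items() if s == label and predicted.get(m) == label)
    n_pred = sum(1 for m in true_states if predicted.get(m) == label)
    n_true = sum(1 for s in true_states.values() if s == label)
    return (tp / n_pred if n_pred else 0.0, tp / n_true if n_true else 0.0)


def cross_validate(states, predict, k=5, seed=0):
    """k-fold cross-validation over the measured metabolites.

    states:  dict metabolite -> 1 (high) or -1 (low)
    predict: hypothetical callable (train_states, test_metabolites) -> dict of predicted
             states for the test metabolites; a metabolite may be omitted (no call).
    """
    mets = sorted(states)
    random.Random(seed).shuffle(mets)
    folds = [mets[i::k] for i in range(k)]  # each fold holds about 1/k of the metabolites
    pooled = {}
    for test in folds:
        test_set = set(test)
        train = {m: states[m] for m in mets if m not in test_set}
        pooled.update(predict(train, test))
    return {label: precision_recall(states, pooled, label) for label in (1, -1)}
```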


3 Modeling Human Breast Cancer

We applied iMAT to the human metabolic model of (Duarte, Becker et al. 2007) to predict the metabolic state of Met-induced breast cancer, by integrating gene expression measurements from the pertinent cancer cell lines (Kaplan, Firon et al. 2000). The met proto-oncogene encodes the protein Met, a membrane receptor activated by the hepatocyte growth factor (HGF/SF) ligand, the only known ligand of Met. Met is a tyrosine kinase growth factor receptor that is essential for embryonic growth and wound healing. When stimulated by HGF/SF, Met induces tumor growth, angiogenesis and metastasis, which correlate with poor prognosis (Bottaro, Rubin et al. 1991; Cooper 1992).

Hepatocyte growth factor/scatter factor (HGF/SF) is a paracrine growth factor that increases cellular motility and has also been implicated in tumor development and progression and in angiogenesis. Little is known about the metabolic alterations induced in cells following Met-HGF/SF signal transduction. The hypothesis that HGF/SF alters the energy metabolism of cancer cells was investigated in perfused DA3 murine mammary cancer cells by nuclear magnetic resonance (NMR) spectroscopy, oxygen and glucose consumption assays, and confocal laser scanning microscopy (CLSM). 31P NMR demonstrated that HGF/SF induced remarkable alterations in phospholipid metabolites and enhanced the rate of glucose phosphorylation (P < .05). 13C NMR measurements, using [13C1]-glucose-enriched medium, showed that HGF/SF reduced the steady-state levels of glucose and elevated those of lactate (P < .05). In addition, HGF/SF treatment increased oxygen consumption from 0.58±0.02 to 0.71±0.03 µmol/hour per milligram protein (P < .05). However, it decreased CO2 levels and attenuated the decrease in pH. The mechanisms of these unexpected effects were delineated by CLSM using NAD(P)H fluorescence measurements, which showed that HGF/SF increased the oxidation of the mitochondrial NAD system. (Kaplan, Firon et al. 2000) propose that, concomitant with the induction of ruffling, HGF/SF enhances both the glycolytic and the oxidative phosphorylation pathways of energy production.

3.1 Results

3.1.1 Data Acquisition and Preprocessing

We utilized normalized gene expression data from breast cancer cell-lines with high Met expression (MDA231, BT549, Hs578T) and low Met expression (MCF7, T47D, MCF10), measured 24 hours after treatment with HGF/SF. The raw cell-line expression data was transformed into qualitative expression states, in which each gene is either highly, lowly or moderately expressed, using a bidirectional threshold of half a standard deviation from the mean. The derived gene expression states for each cell-line were given as input to iMAT to predict the flux distribution most consistent with the corresponding expression signature.
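The discretization step can be sketched in Python. The sketch assumes that the mean and standard deviation are computed per cell-line across all genes, using the population standard deviation; the text does not state whether the statistics are taken per gene or per sample, so this choice, like the function name `discretize`, is an assumption.

```python
from statistics import mean, pstdev


def discretize(expression, k=0.5):
    """Tri-valued expression states for one cell-line.

    expression: dict gene -> normalized expression value
    Genes above mean + k*SD -> 1 (high), below mean - k*SD -> -1 (low), otherwise 0 (moderate).
    """
    values = list(expression.values())
    mu, sd = mean(values), pstdev(values)
    upper, lower = mu + k * sd, mu - k * sd
    return {g: 1 if x > upper else -1 if x < lower else 0 for g, x in expression.items()}
```

The parameter `k = 0.5` corresponds to the half-standard-deviation threshold; raising it shrinks the high and low sets and leaves more genes unconstrained as moderately expressed.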

3.1.2 Analysis Overview

The following is a rough overview of the various analyses I performed using iMAT's metabolic flux predictions to deduce and characterize the metabolic state of Met-induced breast cancer. Pathway enrichment analysis was performed, deeming pathways with p-values below 0.05 significantly enriched. This analysis produced a first-level metabolic profile differentiating the high Met cell-lines from the low Met cell-lines, and this profile is supported by the literature. Differential genes that are post-transcriptionally upregulated (and thus could not be discerned from expression data alone), differential reactions, and the differential uptake and secretion of metabolites form a second level of the high Met metabolic signature, which was again corroborated by the literature. Following personal correspondence with Prof. Ilan Tsarfaty, fatty acid biosynthesis was selected for an in-depth look due to its critical function in Met-induced breast cancer. We conclude with two immediate augmenting modifications.

3.1.3 iMAT on general human model integrated with expression data

3.1.3.1 Pathway Enrichment Analysis

To track the differences between the metabolic responses of high vs. low Met cell lines to ligand stimulation, we performed a pathway enrichment analysis of the predicted metabolic flux activity profiles of both the high and low Met cell-lines. The analysis revealed 9 metabolic pathways that are significantly enriched with reactions predicted to be active in all three high Met cell-lines, and not in the low Met cell-lines (Table 2). Correspondingly, the analysis uncovered 5 metabolic pathways that are significantly enriched with reactions predicted to be active in all three low Met cell-lines, and not in the high Met cell-lines. Reassuringly, the pathways identified by iMAT correspond well to the known underlying biology of Met signaling in cancer. As already shown by (Kaplan, Firon et al. 2000), HGF/SF activates oxidative phosphorylation and the TCA cycle in DA3 murine mammary cancer cells, consistent with the corresponding metabolic pathways that iMAT predicted to be significantly active. Additionally, Table 2 points to a fairly global activation of amino acid pathways, and a typical anaplerotic activation of pathways involved in glutamine metabolism and the TCA cycle. These findings are in line with the observations of (DeBerardinis, Sayed et al. 2008), who showed that glutamine metabolism enables macromolecular synthesis in proliferating cells, allowing cells to meet both the anaplerotic and the NADPH demands of growth. Thus, iMAT's predictions fit these putative metabolic requirements of the highly proliferating high-Met cell-lines. The active pathways predicted for the low-Met cell-lines do not portray the metabolic signature of HGF/SF stimulation of Met. Notably, the differential activation of these pathways is not directly reflected in the gene expression data, as neither Gene Set Enrichment Analysis (GSEA) (Subramanian, Kuehn et al. 2007) nor the commonly used hypergeometric pathway enrichment analysis enables their detection. Only two of the high-Met differentiating pathways uncovered by iMAT were discerned by hypergeometric expression analysis, and the remaining pathways predicted from expression alone are not indicative of HGF/SF activation of Met (Supplementary material).

iMAT high-Met: enriched active pathways	P-value
Inositol phosphate metabolism	0.0000
TCA cycle	0.0013
Phenylalanine metabolism	0.0034
Oxidative Phosphorylation	0.0080
Glutamate and Glutamine metabolism	0.0111
Glycine, Serine, and Threonine metabolism	0.0265
Tyrosine metabolism	0.0350
Propanoate metabolism	0.0493
Histidine Metabolism	0.0493

iMAT low-Met: enriched active pathways	P-value
Pentose Phosphate pathway	0.0010
Eicosanoid metabolism	0.0013
Glutathione metabolism	0.0375
Fatty acid activation	0.0425
Alanine and Aspartate metabolism	0.0475

Table 2: Significant results of the pathway enrichment analysis performed on the predicted metabolic flux activity profiles generated by iMAT for the high and low Met cell-lines, and the common pathways discovered from gene expression data alone for the high-Met cell-lines. iMAT's analysis uncovered 9 of the 99 pathways in the human metabolic model that are predicted to be active (i.e., significantly enriched with predicted active reactions using a hypergeometric test) in all three high Met cell-lines and not in all three low Met cell-lines, and 5 pathways predicted to be active in all three low Met cell-lines and not in all three high Met cell-lines. The analysis based solely on gene expression data revealed 5 pathways predicted to be active in all three high Met cell-lines and not in all three low Met cell-lines (Supplementary Results, Section 1). Highlighted in yellow are pathways common to iMAT and the gene expression analysis.
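The selection of differentiating pathways amounts to set operations over the per-cell-line enrichment results. The body text and the caption of Table 2 phrase the exclusion criterion slightly differently ("not in the low Met cell-lines" versus "not in all three low Met cell-lines"), so this hypothetical Python sketch exposes both readings through a `strict` flag; the cell-line and pathway labels in the usage check are made up.

```python
def differential_pathways(enriched, group, other, strict=True):
    """Pathways enriched in every cell-line of `group` but not in `other`.

    enriched: dict cell-line -> set of significantly enriched pathways (p < 0.05)
    strict=True excludes pathways enriched in ANY line of `other`;
    strict=False excludes only pathways enriched in ALL lines of `other`.
    """
    common = set.intersection(*(enriched[c] for c in group))
    other_sets = [enriched[c] for c in other]
    excluded = set.union(*other_sets) if strict else set.intersection(*other_sets)
    return common - excluded
```

Calling it with the roles of the two groups swapped yields the pathways specific to the low-Met cell-lines.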

3.1.3.2 Differential genes and post-transcriptional regulation

I performed a differential reaction analysis, identifying reactions whose activity predictions differentiate high Met cell-lines from low Met cell-lines. For non-spontaneous reactions, this analysis identifies the differential driving enzymes, and more specifically the post-transcriptionally regulated genes, in addition to the differential uptake and secretion of metabolites.
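A minimal Python sketch of this differential reaction analysis is shown below. It assumes a reaction is differential when it is predicted active in every high-Met cell-line and inactive in every low-Met cell-line, and that an enzyme-coding gene of such a reaction is a candidate post-transcriptionally upregulated driver when it is not highly expressed in any high-Met cell-line. Both criteria and the function names are illustrative assumptions; the usage check uses made-up reaction and gene labels.

```python
def differential_reactions(activity, high_lines, low_lines):
    """Reactions predicted active (1) in every high-Met line and inactive (-1) in every low-Met line.

    activity: dict cell-line -> {reaction: 1 (active), -1 (inactive), 0 (undetermined)}
    """
    lines = list(high_lines) + list(low_lines)
    shared = set.intersection(*(set(activity[c]) for c in lines))
    return {r for r in shared
            if all(activity[c][r] == 1 for c in high_lines)
            and all(activity[c][r] == -1 for c in low_lines)}


def upregulated_drivers(diff_rxns, genes_of, expression, high_lines):
    """Enzyme genes of differential reactions that are not highly expressed in any high-Met line.

    genes_of:   dict reaction -> list of enzyme-coding genes (spontaneous reactions have none)
    expression: dict cell-line -> {gene: 1 (high), 0 (moderate), -1 (low)}
    """
    drivers = set()
    for r in diff_rxns:
        for g in genes_of.get(r, ()):
            if all(expression[c].get(g, 0) < 1 for c in high_lines):
                drivers.add(g)  # active despite low/moderate expression in all high-Met lines
    return drivers
```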

(Lu, Bennet et al. 2010) show that the metabolite changes accompanying mammary tumour progression can be identified, in both intracellular and secreted forms, in several pathways, including glycolysis, the tricarboxylic acid (TCA) cycle, the pentose phosphate pathway (PPP), fatty acid and nucleotide biosynthesis, and the GSH-dependent anti-oxidative pathway. In that study, 157 metabolites were profiled in six cell lines with progressively increased tumourigenicity and metastatic ability. The analysis of intracellular metabolites clustered the lines into three categories: normal, tumourigenic (but non-metastatic), and generally metastatic. The results favour a two-step metabolic progression hypothesis during mammary tumour progression. The first step accompanies the acquisition of tumourigenicity and includes altered glycolysis, PPP and fatty acid synthesis, as well as a decreased GSH/GSSG redox pool. The second step is correlated with the gain of general metastatic ability and includes further changes in glycolysis and the TCA cycle, further depletion of the glutathione species, and increased nucleotide levels.

In more detail, three of the four metabolites in the last steps of glycolysis (3-phosphoglycerate, phosphoenolpyruvate (PEP) and pyruvate) were enriched in metastatic cells, suggesting differences in lower glycolysis. Although the exact mechanism is not clear, these findings may relate to the pivotal role of specific pyruvate kinase isozymes in oncogenesis. Moving down from pyruvate, several TCA cycle intermediates were enriched in the metastatic lines, including aconitate, citrate, isocitrate and malate. The Warburg effect suggests that tumour cells prefer aerobic glycolysis to the TCA cycle for producing ATP and reductants. The observation of (Lu, Bennet et al. 2010) that TCA intermediates are exclusively upregulated in metastatic cell lines therefore suggests that invasive cancer cells use the TCA cycle differently from non-metastatic cells. It is unclear, however, whether these differences are associated with increased TCA cycle flux and, if so, whether this flux is driven primarily by glucose or by glutamine; fluxomic analysis is well suited to answering these questions. Nucleotide levels follow an interesting pattern: they are lower in the non-metastatic tumour cells than in the non-transformed cells, perhaps due to enhanced nucleotide consumption to feed growth and DNA replication in the transformed cells, whereas the metastatic tumour cells have increased nucleotide levels, reflecting altered nucleotide turnover. No further metabolite alterations correlating with a stepwise increase in metastatic potential across the four metastatic lines were resolved.

The analysis of extracellular metabolites by (Lu, Bennet et al. 2010) identified an increased abundance of TCA cycle components and of nucleotide metabolism intermediates, similar to the intracellular results. Their findings agree with a recent study profiling a more limited set of metabolites in the MCF10 model of mammary carcinoma; both studies find evidence for increased pentose phosphate pathway, TCA cycle, and fatty acid biosynthetic activity in transformed and/or metastatic cells. Several secreted metabolites accompanying increased metastatic potential (malate, fumarate, deoxyguanosine, guanine, xanthine, and hypoxanthine) should be tested in future studies for their value as diagnostic and prognostic biomarkers of malignant breast cancer.

My analysis of differential metabolic activity recovered the following pathways, enzymes and metabolites as differentiating the metastatic, HGF/SF-treated high Met cell-lines from the non-metastatic low Met cell-lines.

HGF/SF enhances the glycolytic pathway of energy production (Kaplan, Firon et al. 2000). Many tumour cells contain elevated levels of total hexokinase activity (hexokinase being the first enzyme involved in the commitment step of glycolysis), as well as an increased amount of hexokinase type II bound to the outer mitochondrial membrane. In contrast to normal cells, tumour cells obtain most of their ATP from glycolysis rather than from the TCA cycle and respiration. The mitochondrial association of hexokinase has been proposed to drive glycolysis in tumour cells by providing preferential access to inorganic phosphate and ADP, as well as protection against product inhibition by glucose-6-phosphate (Figure 15) (Copeland, Wachsman et al. 2002). Hexokinase II, one of the four hexokinase isozymes, is a target of many transcription factors important in tumorigenesis, including HIF1 and Myc (through the ‘carbohydrate response element’). Hexokinase is also thought to have a role in protecting the cell against apoptosis. It has been shown that hexokinases I and II are associated with mitochondria, binding the voltage-dependent anion channel on the mitochondrial outer membrane. Hexokinase binding to the voltage-dependent anion channel is thought to depend on both glycolytic flux and AKT activity, although the former is both necessary and sufficient for its anti-apoptotic activity (Copeland, Wachsman et al. 2002). iMAT predicts post-transcriptional upregulation of HK2 and HK3 in all three high Met cell-lines, while HK1 is upregulated in only one of the cell-lines.

A glycolytic enzyme whose levels can be altered by the expression of p53 (a tumor suppressor protein) is phosphoglycerate mutase (PGM, which catalyzes the transfer of phosphate between the 1 and 6 positions of glucose). In cells with high p53 expression, PGM expression is reduced, but loss of function or low levels of p53 allow increased PGM and hence glycolysis. Interestingly, over-expression of PGM can immortalize mouse embryonic fibroblasts (MEFs), a phenotype that is dependent on its catalytic activity. The correlation between the rate of glycolysis and immortalization was strengthened by two further strands of evidence: inhibition of a number of glycolytic enzymes [PGM, PGI, glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and phosphoglycerate kinase (PGK)] can trigger MEF senescence, and spontaneously immortalized MEFs also increase their glycolytic rate (Tennant, Duran et al. 2009). In concordance, iMAT predicted PGM1 and PGM2 to be post-transcriptionally upregulated in the high Met cell-lines (where they showed low and moderate expression levels).

The TCA cycle, whose intermediates are known to be increased by HGF/SF activation of Met, was found to be differentially activated, involving MDH1 and MDH2 (malate dehydrogenase, localized in the cytoplasm and mitochondria, respectively), which catalyze the reversible oxidation of malate to oxaloacetate using the NAD/NADH cofactor system in the citric acid cycle. Both enzymes are predicted by iMAT to catalyze the reaction in the direction of malate production, in accordance with (Lu, Bennet et al. 2010). Furthermore, MDH1 and MDH2 were found to be post-transcriptionally upregulated across all three HGF/SF-treated high Met cell-lines, in which they exhibited only moderate expression levels, further attesting to iMAT's predictive value.

PRPS1, PRPS2, and PRPS1L1 of the pentose phosphate pathway were found to be post-transcriptionally upregulated in the high Met cell-lines (predicted to drive phosphoribosylpyrophosphate synthetase in the atp[c] + r5p[c] => amp[c] + h[c] + prpp[c] direction), in accordance with (Lu, Bennet et al. 2010). These genes encode enzymes that catalyze the phosphoribosylation of ribose 5-phosphate to 5-phosphoribosyl-1-pyrophosphate, which is necessary for purine metabolism and nucleotide biosynthesis.

(Lu, Bennet et al. 2010) describe increased nucleotide biosynthesis, and iMAT uncovered GUK1 to be post-transcriptionally upregulated. GUK1 (guanylate kinase) catalyzes the phosphorylation of GMP to GDP and thereby participates in the cGMP cycle; in mammalian phototransduction, this cycle is essential for the regeneration of cGMP following its hydrolysis by phosphodiesterase. This prediction is consistent with the identification of GUK1 among genes over-expressed in frequently gained/amplified chromosome regions in multiple myeloma (Largo, Alvarez et al. 2006; Young, Ebner et al. 2006). In addition, (da Rocha, Giorgi et al. 2006) found HGS (hepatocyte growth factor-regulated tyrosine kinase substrate) and GUK1 to be differentially expressed in GH-secreting pituitary adenomas.


Hepatocyte growth factor (HGF) down-modulates FSH-dependent estradiol-17β (E2) production in ovarian granulosa cells in vitro. The mechanisms underlying the antiestrogenic effects of HGF are unclear, although evidence indicates that HGF may affect cAMP signal transduction in rat granulosa cells. (Zachow and Woolery 2002) demonstrate that the effects of HGF on cyclic nucleotide phosphodiesterase (PDE) activities are selective and both time- and hormone-dependent, and that HGF treatment leads to a decrease in cAMP at 24 h and an increase in cGMP. FSH (a pituitary glycoprotein hormone)-induced cAMP (cyclic adenosine monophosphate) PDE activity was suppressed by HGF at 24 h but not at 36 h, whereas FSH-dependent cGMP PDE activity was impaired at 36 h but not at 24 h. HGF prevented the IGF-I-dependent reduction in FSH-stimulated cAMP-PDE activity at 24 and 36 h, and lowered FSH + IGF-I-stimulated cGMP-PDE activity at 36 h, concomitant with an HGF-dependent increase in cGMP content at 24 h. These data indicate that HGF affects cAMP-directed and cGMP-directed signaling pathways at multiple sites in granulosa cells, and may provide insight into the mechanisms by which HGF reduces E2 secretion by granulosa cells. The PDE (phosphodiesterase) gene family was predicted by iMAT to be post-transcriptionally upregulated. These enzymes catalyze the hydrolysis of phosphodiester bonds, play an important role in the repair of oxidative DNA damage, and belong to the nucleotide biosynthesis pathway.

Studies comparing a normal human mammary epithelial cell line with a transformed human breast cancer cell line demonstrated that the levels of PMEs (phosphomonoesters) and PDEs (phosphodiesters) were extremely low in the normal cells, and significantly lower than in the breast cancer cell line. A further serial study of 25 patients undergoing hormone, chemotherapy and radiotherapy treatments showed a significant correlation between a decrease in PME, PDE and total NTP levels and response to therapy, as measured by a decrease in tumour volume (Ronen and Leach 2000).

iMAT predicts greatly increased inositol phosphate metabolism activity, with the associated genes post-transcriptionally upregulated. (Harris, Burns et al. 1993) showed that hepatocyte growth factor stimulates phosphoinositide hydrolysis and mitogenesis in cultured renal epithelial cells. (Koch, Mancini et al. 2005) determined that SH2-domain-containing inositol 5-phosphatase (SHIP)-2 binds to c-Met directly via tyrosine residue 1356 and is involved in hepatocyte growth factor (HGF)-induced lamellipodium formation, cell scattering and cell spreading.

Fatty acid biosynthesis is hypothesized to have a pivotal role in Met-HGF/SF induced breast cancer (Prof. Ilan Tsarfaty, personal correspondence). (Quash, Fournet et al. 2003) provide evidence that certain oxoacids formed in anaplerotic reactions control cell proliferation and apoptosis. Normal human fibroblasts cultured in a serum-deprived medium require the presence of one of the oxoacids (glyoxylate, pyruvate, 2-oxoglutarate or oxaloacetate) for their proliferation, and of these, glyoxylate is the most effective. iMAT predicted the post-transcriptional upregulation of the ALDH (aldehyde dehydrogenase) family of glyoxylate and dicarboxylate metabolism; these enzymes are involved in the metabolism of many molecules, including certain lipids (cholesterol and fatty acids) and amino acids. iMAT also predicts the post-transcriptional upregulation of aldose reductase of pyruvate metabolism, in accordance with the speculation of (Gharbi, Gaffney et al. 2002) that increased expression of the metabolic enzymes carbamoyl-phosphate synthetase, glutaminase and aldose reductase in the HBc3.6 cells is a direct consequence of their enhanced proliferation caused by ErbB-2 over-expression. The ErbB protein family, or epidermal growth factor receptor (EGFR) family, is a family of four structurally related receptor tyrosine kinases. Excessive ErbB signaling is associated with the development of a wide variety of solid tumors. ErbB-1 and ErbB-2 are found in many human cancers, and their excessive signaling may be a critical factor in the development and malignancy of these tumors (Cho and Leahy 2002).

Cholesterol is essential for the multiplication of all mammalian cells and is expected to be in higher demand in fast-growing cells such as tumour cells. Most cholesterol is usually supplied to the tumour by the host; however, tumours may also have the machinery to synthesize it. Animal studies have shown that cholesterol-lowering drugs such as lovastatin attenuate tumour formation and metastasis. Cholesterol supports cell viability and growth as a critical component of cell membranes, where it serves several functions, including regulation of membrane fluidity and of the activity of membrane-bound proteins such as integrins, membrane-bound enzymes and several signal transduction pathways. It has recently been shown that cholesterol is also required for cell cycle progression from the G2 to the M phase (Awad, Williams et al. 2003). iMAT predicted cholesterol metabolism to differentiate between the HGF/SF activated metastatic high Met and the low Met cell-lines.


HGF/SF enhances the oxidative phosphorylation pathway of energy production (Kaplan, Firon et al. 2000). Tumour development is often associated with mitochondrial DNA (mtDNA) mutations and alterations in mitochondrial genomic function. These mutations have been identified in bladder, breast, colon, head and neck, kidney, liver, lung and stomach cancers, and in the hematologic malignancies leukaemia and lymphoma. Altered expression and mutations in the mtDNA-encoded Complexes I, III, IV and V, as well as mutations in the hypervariable regions of mtDNA, are among the mitochondrial genomic aberrations found in cancer tissue. Cytochrome c oxidase constitutes Complex IV of the respiratory electron transport chain, the oxidative phosphorylation system that produces cellular ATP. mtDNA aberrations in Complex IV were also identified in breast cancer by (Copeland, Wachsman et al. 2002). Compared to nuclear DNA, mtDNA is more susceptible to oxidative damage and is, in general, more mutable. The occurrence of mtDNA mutations (deletions, point mutations, duplications) in tumour cells is consistent with the concept that tumour cells are under persistent (constitutive) oxidative stress, generating higher levels of the reactive oxygen species (ROS) superoxide and hydrogen peroxide than their normal counterparts. This notion is consistent with the fact that mitochondria contain the complete electron transport system involved in both respiration and oxidative phosphorylation. ROS are known to function in both the initiation and promotion of cancer, as well as in decreasing mitochondrial ATP production (Copeland, Wachsman et al. 2002). iMAT predicts COX IV to be differentially post-transcriptionally upregulated in the high Met cell-lines.


Several differential metabolites were found to be transported and secreted in the high Met cell-lines. Among them is xanthine, a purine base formed in purine degradation and subsequently converted to uric acid by the enzyme xanthine oxidase; iMAT predicts it to be transported from the cytoplasm to the peroxisome. This is in agreement with the findings of (Lu, Bennet et al. 2010).

This analysis focused on active pathways and post-transcriptionally upregulated genes, since literature validation is more readily available for them than for inactive pathways. The tabulated results of this analysis, together with the differential-gene analysis of the inactive pathways, can be found in the accompanying Excel files (“Differentiated Genes from predicted in/active rxns High Low MET_24h”). These results suggest possible biomarkers of breast cancer progression, as well as opportunities for interrupting tumour progression by targeting metabolic pathways.


3.2 Future Directions

In the immediate future, the following enhancements to iMAT’s integrative approach could readily be implemented; the first has already passed a preliminary feasibility test.

3.2.1 Integrating iMAT’s flux predictions to model a cancer metabolic profile via quantification

One of iMAT’s drawbacks is its inability to predict an elevation or reduction in metabolic flux activity when comparing biological conditions (such as wild-type versus cancer cells, or aggressive versus first-stage cancer, where the differential changes are in many cases the phenomenon of interest), since its predictions pertain only to a Boolean activity state. By taking iMAT’s confident flux activity predictions, imposing them as constraints on the relevant model (in the HGF/SF-Met analysis, the general human model), and applying flux variability analysis (FVA), we obtain a reduced optimal solution space and thus narrow the FVA-predicted flux ranges, which enables the prediction of flux activity elevation and reduction.
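The procedure described above can be sketched on a toy network. Everything in the sketch is an illustrative assumption rather than the thesis implementation: the five-reaction network, the activity threshold `EPS`, the fixed uptake rate, and the rule in `compare` that calls a change only when two flux ranges do not overlap. For simplicity, FVA here ranges over the whole feasible flux space rather than an optimal face of an objective, and all reactions are irreversible, so that "active" becomes a simple lower bound. Each minimization and maximization is solved with `scipy.optimize.linprog` using the HiGHS solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: metabolites A, B, C and five irreversible reactions
#   R_in: -> A   R1: A -> B   R2: A -> C   R3: B ->   R4: C ->
REACTIONS = ["R_in", "R1", "R2", "R3", "R4"]
S = np.array([
    [1.0, -1.0, -1.0,  0.0,  0.0],  # A
    [0.0,  1.0,  0.0, -1.0,  0.0],  # B
    [0.0,  0.0,  1.0,  0.0, -1.0],  # C
])
EPS = 1.0      # assumed minimal flux for a reaction called "active"
UPTAKE = 10.0  # assumed fixed uptake rate, identical in both conditions
VMAX = 100.0   # assumed generic upper bound on every other reaction


def constrained_bounds(active, inactive):
    """Turn iMAT-style on/off calls into flux bounds (all reactions irreversible)."""
    bounds = []
    for name in REACTIONS:
        if name == "R_in":
            bounds.append((UPTAKE, UPTAKE))
        elif name in inactive:
            bounds.append((0.0, 0.0))
        elif name in active:
            bounds.append((EPS, VMAX))
        else:
            bounds.append((0.0, VMAX))
    return bounds


def fva(bounds):
    """Minimize and maximize each flux subject to S v = 0 and the given bounds."""
    zeros = np.zeros(S.shape[0])
    ranges = {}
    for j, name in enumerate(REACTIONS):
        c = np.zeros(len(REACTIONS))
        c[j] = 1.0
        lo = linprog(c, A_eq=S, b_eq=zeros, bounds=bounds, method="highs")
        hi = linprog(-c, A_eq=S, b_eq=zeros, bounds=bounds, method="highs")
        if not (lo.success and hi.success):
            raise ValueError(f"constraints are infeasible for {name}")
        ranges[name] = (lo.fun, -hi.fun)
    return ranges


def compare(reference, test, tol=1e-6):
    """Hypothetical rule: call a change only when the two FVA ranges do not overlap."""
    calls = {}
    for name in REACTIONS:
        (ref_lo, ref_hi), (test_lo, test_hi) = reference[name], test[name]
        if test_lo > ref_hi + tol:
            calls[name] = "elevated"
        elif test_hi < ref_lo - tol:
            calls[name] = "reduced"
        else:
            calls[name] = "undetermined"
    return calls


if __name__ == "__main__":
    cond_a = fva(constrained_bounds(active={"R1", "R3"}, inactive={"R2", "R4"}))
    cond_b = fva(constrained_bounds(active={"R1", "R2", "R3", "R4"}, inactive=set()))
    print(compare(cond_a, cond_b))
```

Running the script prints `{'R_in': 'undetermined', 'R1': 'reduced', 'R2': 'elevated', 'R3': 'reduced', 'R4': 'elevated'}`. R1 is called active in both toy conditions, yet the second condition's constraints narrow its range from [10, 10] to [1, 9]. The comparison therefore reports a reduction that the Boolean activity calls alone would not reveal.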

We performed preliminary tests of this approach via pathway enrichment analysis across all four measured time-stamps (0min, 10min, 30min, 24h) of the six high and low Met cell-lines, and obtained encouraging results: the predicted elevated and reduced pathways fit the known pathway signature of each high Met cell-line time-stamp, forming a predicted kinetic trajectory of HGF/SF Met induction. Our goal is to construct such kinetic trajectories, mapping the predicted behavioural (elevation/reduction) cycle of the HGF/SF induced high Met metabolic signature.
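Pathway enrichment of this kind can be computed with a one-sided hypergeometric test, the test named for the expression-based tables in the supplementary material. A minimal sketch follows. It assumes that reactions are the counting unit and that pathway membership is given as sets of reaction IDs; the function name and toy data are hypothetical, and the tail probability comes from `scipy.stats.hypergeom`.

```python
from scipy.stats import hypergeom


def pathway_enrichment(selected, pathways, universe):
    """One-sided hypergeometric enrichment of `selected` reactions in each pathway.

    selected : set of reaction IDs of interest (e.g. reactions predicted elevated)
    pathways : dict mapping a pathway name to its set of reaction IDs
    universe : set of all reaction IDs considered (e.g. all model reactions)
    Returns {pathway: (overlap, p_value)} with p_value = P(X >= overlap).
    """
    selected = set(selected) & universe
    total, drawn = len(universe), len(selected)
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(members & selected)
        # sf(k - 1) = P(X > k - 1) = P(X >= k) for the hypergeometric distribution
        p_value = hypergeom.sf(overlap - 1, total, len(members), drawn)
        results[name] = (overlap, float(p_value))
    return results


if __name__ == "__main__":
    universe = {f"r{i}" for i in range(20)}
    pathways = {"P1": {f"r{i}" for i in range(5)}, "P2": {f"r{i}" for i in range(5, 15)}}
    selected = {"r0", "r1", "r2", "r3", "r10"}
    for name, (k, p) in pathway_enrichment(selected, pathways, universe).items():
        print(f"{name}: overlap={k}, p={p:.4g}")
```

In the toy example, four of the five selected reactions fall in the five-reaction pathway P1, giving p = 76/15504 (about 0.0049), whereas P2 is not enriched. In practice a multiple-testing correction, such as Benjamini-Hochberg, would be applied across pathways.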


Comparing these initial results with the pathway enrichment analysis described in section 3.1.3.1 for the 30min time-stamp, we see that known HGF/SF-driven elevated pathways such as glycolysis, ROS detoxification and pyrimidine biosynthesis are uncovered (Kaplan, Firon et al. 2000; Prof. Ilan Tsarfaty, personal correspondence). Since these pathways are active in both the high and low Met cell-lines, they cannot be discerned by simple pathway enrichment analysis of iMAT’s raw flux activity predictions.

3.2.2 Weighted iMAT

A criticism of iMAT’s qualitative input (discrete tri-values representing low/moderate/high expression levels) is that it loses the fine granularity of the expression intensities. Mitigating this concern, (Banta, Vemula et al. 2007) describe only a moderate correlation between expression and metabolic flux, suggesting that the quantitative values should not have much impact. In addition, it is well known that expression levels from one measurement set are very difficult to compare with those of another, even when measured by the same scientist with the same technology, owing to high noise levels; this further supports the discretization approach. Having said that, it would still be worthwhile to confirm that a weighted version of iMAT does not provide added value.
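To make the criticism concrete, the sketch below discretizes expression into the three levels iMAT takes as input. The thresholding rule (half a standard deviation around each sample's mean) and the function name are assumptions for illustration, not the cut-offs used in this work.

```python
import numpy as np


def discretize(expr, k=0.5):
    """Map a genes x samples expression matrix to -1 (low), 0 (moderate), 1 (high).

    Assumed rule (illustrative): within each sample, a gene is 'high' if its value
    exceeds mean + k*sd of that sample, 'low' if below mean - k*sd, else 'moderate'.
    """
    expr = np.asarray(expr, dtype=float)
    mu = expr.mean(axis=0, keepdims=True)
    sd = expr.std(axis=0, keepdims=True)
    levels = np.zeros(expr.shape, dtype=int)
    levels[expr > mu + k * sd] = 1
    levels[expr < mu - k * sd] = -1
    return levels


if __name__ == "__main__":
    sample = [[1.0], [2.0], [3.0], [4.0], [5.0]]  # five genes, one sample
    print(discretize(sample).ravel().tolist())  # prints [-1, -1, 0, 1, 1]
```

Genes with values 4 and 5 both map to 1, which is exactly the loss of granularity discussed above.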

One possible means of integrating expression weights (levels) is to modify iMAT’s objective function to include the relative expression weights.
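One way to write such a modification is shown below, following the commonly cited form of the mixed-integer program of (Shlomi, Cabili et al. 2008). Here S is the stoichiometric matrix, v the flux vector, R_H and R_L the sets of reactions associated with highly and lowly expressed genes, ε the minimal flux for a reaction to count as active, and y⁺, y⁻ binary indicators of forward or backward activity (for R_L, y⁺ indicates inactivity). The nonnegative weights w_i are the hypothetical addition; setting every w_i = 1 recovers the unweighted objective.

```latex
\begin{aligned}
\max_{v,\,y^{+},\,y^{-}} \quad & \sum_{i \in R_H} w_i \left( y_i^{+} + y_i^{-} \right) + \sum_{i \in R_L} w_i \, y_i^{+} \\
\text{s.t.} \quad & S v = 0, \qquad v_{\min} \le v \le v_{\max}, \\
& v_i + y_i^{+} \left( v_{\min,i} - \varepsilon \right) \ge v_{\min,i}, \qquad i \in R_H, \\
& v_i + y_i^{-} \left( v_{\max,i} + \varepsilon \right) \le v_{\max,i}, \qquad i \in R_H, \\
& v_{\min,i} \left( 1 - y_i^{+} \right) \le v_i \le v_{\max,i} \left( 1 - y_i^{+} \right), \qquad i \in R_L, \\
& y_i^{+},\, y_i^{-} \in \{0, 1\}, \qquad w_i \ge 0 .
\end{aligned}
```

For i in R_H, y_i⁺ = 1 forces v_i ≥ ε and y_i⁻ = 1 forces v_i ≤ -ε; for i in R_L, y_i⁺ = 1 forces v_i = 0. One illustrative, untested choice would derive w_i from how far a reaction's associated expression lies from the discretization threshold.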


4 Discussion

This thesis addresses two challenges. The first is a computational endeavour: bringing a research-oriented method to the systems biology community at industry-level quality. The second is an exploratory venture: understanding the limitations and applicability of the integrative approach presented by (Shlomi, Cabili et al. 2008), extending it to other forms of high-throughput molecular data, and assessing its effectiveness and potential in modeling cancer.

iMAT online availability: We have introduced here an integrative metabolic analysis tool (iMAT), a web-based implementation of the method of (Shlomi, Cabili et al. 2008), which will serve the community by enabling the prediction of metabolic fluxes through the integration of metabolic networks with gene expression, protein expression and reactome array data. We demonstrated its utility in predicting human breast cancer metabolism. As a side benefit, this will enable the construction and accumulation of a corpus of high-throughput molecular data, which can further advance and facilitate our research objectives.

Metabolic Breast Cancer Modeling: The various analyses of iMAT’s flux activity predictions uncovered pathways, enzymes and metabolites relevant to the description of HGF/SF induced Met breast cancer. Experimental validation is now needed to further explore the promising directions unearthed.

Metabolic signature: The approaches suggested here initiate a course for the computational investigation of cancer by aggregating metabolic alterations into a differential metabolic signature, which can then be utilized for disease diagnosis, prognosis and treatment. iMAT’s metabolic flux distribution predictions constitute a first step towards disease profiling, and can be employed in various statistical inference and descriptive tests, such as active and inactive pathway enrichment analysis, determination of genes’ post-transcriptional regulation, and prediction of metabolite uptake and secretion under diverse conditions. From this emerged the need for the design of a learning framework that processes metabolic transformations into an aggregated, comprehensive and distinctive signature.


5 Bibliography

Akesson, M., J. Forster, et al. (2004). "Integration of gene expression data into genome-scale metabolic models." Metabolic Engineering 6(4): 285-293.

Altucci, L., M. Leibowitz, et al. (2007). "RAR and RXR modulation in cancer and metabolic disease." Nature Reviews Drug Discovery 6(10): 793-810.

Apic, G., T. Ignjatovic, et al. (2005). "Illuminating drug discovery with biological pathways." FEBS letters 579(8): 1872-1877.

Awad, A., H. Williams, et al. (2003). "Effect of phytosterols on cholesterol metabolism and MAP kinase in MDA-MB-231 human breast cancer cells." The Journal of Nutritional Biochemistry 14(2): 111-119.

Banta, S., M. Vemula, et al. (2007). "Contribution of gene expression to metabolic fluxes in hypermetabolic livers induced through burn injury and cecal ligation and puncture in rats." Biotechnology and bioengineering 97(1): 118.

Bard, J. (1998). Practical bilevel optimization: algorithms and applications, Kluwer Academic Pub.

Becker, S. A. and B. O. Palsson (2008). "Context-specific metabolic networks are consistent with experiments." PLoS Computational Biology 4(5).

Beloqui, A., M. E. Guazzaroni, et al. (2009). "Reactome array: Forging a link between metabolome and genome." Science 326(5950): 252.

Bilu, Y., T. Shlomi, et al. (2006). "Conservation of expression and sequence of metabolic genes is reflected by activity across metabolic states." PLoS Comput Biol 2: e106.

Bottaro, D. P., J. S. Rubin, et al. (1991). "Identification of the hepatocyte growth factor receptor as the c-met proto-oncogene product." Science 251(4995): 802.

Burgard, A. and C. Maranas (2003). "Optimization-based framework for inferring and testing hypothesized metabolic objective functions." Biotechnology and Bioengineering 82(6): 670-677.

Burgard, A., P. Pharkya, et al. (2003). "Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization." Biotechnology and Bioengineering 84(6): 647-657.

Chatziioannou, A., G. Palaiologos, et al. (2003). "Metabolic flux analysis as a tool for the elucidation of the metabolism of neurotransmitter glutamate." Metabolic engineering 5(3): 201-210.

Cho, H. and D. Leahy (2002). "Structure of the extracellular region of HER3 reveals an interdomain tether." Science 297(5585): 1330.

Chuang, H. Y., E. Lee, et al. (2007). "Network-based classification of breast cancer metastasis." Molecular systems biology 3: 140.

Cline, M. S., M. Smoot, et al. (2007). "Integration of biological networks and gene expression data using Cytoscape." Nature Protocols 2(10): 2366.

Cooper, C. S. (1992). "The met oncogene: from detection by transfection to transmembrane receptor for hepatocyte growth factor." Oncogene 7(1): 3.

Copeland, W., J. Wachsman, et al. (2002). "Mitochondrial DNA alterations in cancer." Cancer investigation 20(4): 557-569.


Covert, M., E. Knight, et al. (2004). "Integrating high-throughput and computational data elucidates bacterial networks." Nature 429(6987): 92-96.

da Rocha, A., R. Giorgi, et al. (2006). "Hepatocyte growth factor-regulated tyrosine kinase substrate (HGS) and guanylate kinase 1 (GUK1) are differentially expressed in GH-secreting adenomas." Pituitary 9(2): 83-92.

Daran-Lapujade, P., M. L. A. Jansen, et al. (2004). "Role of transcriptional regulation in controlling fluxes in central carbon metabolism of Saccharomyces cerevisiae: a chemostat culture study." Journal of Biological Chemistry 279(10): 9125-9138.

DeBerardinis, R. J., N. Sayed, et al. (2008). "Brick by brick: metabolism and tumor cell growth." Current opinion in genetics & development 18(1): 54-61.

Domach, M., S. Leung, et al. (2000). "Computer model for glucose-limited growth of a single cell of Escherichia coli B/rA (Reprinted from Biotechnology and Bioengineering, vol 26, pg 203-216, 1984)." Biotechnology and Bioengineering 67(6): 827-840.

Duarte, N., S. Becker, et al. (2007). "Global reconstruction of the human metabolic network based on genomic and bibliomic data." Proceedings of the National Academy of Sciences 104(6): 1777.

Duarte, N., M. Herrgård, et al. (2004). "Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model." Genome Research 14(7): 1298.

Durmuş Tekir, S., T. Çakır, et al. (2006). "Analysis of enzymopathies in the human red blood cells by constraint-based stoichiometric modeling approaches." Computational Biology and Chemistry 30(5): 327-338.

Edelman, L., J. Eddy, et al. (2009). "In silico models of cancer." Wiley Interdisciplinary Reviews: Systems Biology and Medicine.

Edwards, J., R. Ibarra, et al. (2001). "In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data." Nature biotechnology 19(2): 125-130.

Edwards, J. and B. Palsson (2000). "The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities." Proceedings of the National Academy of Sciences 97(10): 5528.

Famili, I., J. Förster, et al. (2003). "Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network." Proceedings of the National Academy of Sciences of the United States of America 100(23): 13134.


Feist, A. and B. Palsson (2008). "The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli." Nature biotechnology 26(6): 659-667.

Fell, D. (1997). Understanding the control of metabolism, Portland Press London.

Fong, S., A. Joyce, et al. (2005). "Parallel adaptive evolution cultures of Escherichia coli lead to convergent growth phenotypes with different gene expression states." Genome Research 15(10): 1365.


Fong, S. and B. Palsson (2004). "Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes." Nature genetics 36(10): 1056-1058.


Förster, J., I. Famili, et al. (2003). "Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network." Genome Research 13(2): 244.

Gharbi, S., P. Gaffney, et al. (2002). "Evaluation of two-dimensional differential gel electrophoresis for proteomic expression analysis of a model breast cancer cell system." Molecular & Cellular Proteomics 1(2): 91.

Guldberg, P., F. Rey, et al. (1998). "A European multicenter study of phenylalanine hydroxylase deficiency: classification of 105 mutations and a general system for genotype-based prediction of metabolic phenotype." The American journal of human genetics 63(1): 71-79.

Harris, R., K. Burns, et al. (1993). "Hepatocyte growth factor stimulates phosphoinositide hydrolysis and mitogenesis in cultured renal epithelial cells." Life sciences 52(13): 1091.

Hu, Z., J. Mellor, et al. (2005). "VisANT: data-integrating visual framework for biological networks and modules." Nucleic acids research 33(Web Server Issue): W352.

Ibarra, R., J. Edwards, et al. (2002). "Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth." Nature 420(6912): 186-189.

Jamshidi, N. and B. Palsson (2008). "Formulating genome-scale kinetic models in the post-genome era." Molecular Systems Biology 4: 171.

Joshi, A. and B. Palsson (1990). "Metabolic dynamics in the human red cell. Part III--Metabolic reaction rates." Journal of theoretical biology 142(1): 41.

Joshi, A. and B. O. Palsson (1989). "Metabolic dynamics in the human red cell: Part I--A comprehensive kinetic model." Journal of theoretical biology 141(4): 515-528.

Joyce, A. R. and B. O. Palsson (2006). "The model organism as a system: integrating 'omics' data sets." Nature Reviews Molecular Cell Biology 7(3): 198-210.

Kanehisa, M. and S. Goto (2000). "KEGG: Kyoto encyclopedia of genes and genomes." Nucleic acids research 28(1): 27.

Kaplan, O., M. Firon, et al. (2000). "HGF/SF activates glycolysis and oxidative phosphorylation in DA3 murine mammary cancer cells." Neoplasia (New York, NY) 2(4): 365.

Kauffman, K., P. Prakash, et al. (2003). "Advances in flux balance analysis." Current Opinion in Biotechnology 14(5): 491-496.

Koch, A., A. Mancini, et al. (2005). "The SH2-domian-containing inositol 5-phosphatase (SHIP)-2 binds to c-Met directly via tyrosine residue 1356 and involves hepatocyte growth factor (HGF)-induced lamellipodium formation, cell scattering and cell spreading." Oncogene 24(21): 3436-3447.

Lanpher, B., N. Brunetti-Pierri, et al. (2006). "Inborn errors of metabolism: the flux from Mendelian to complex diseases." Nature Reviews Genetics 7(6): 449-460.


Largo, C., S. Alvarez, et al. (2006). "Identification of overexpressed genes in frequently gained/amplified chromosome regions in multiple myeloma." Haematologica 91(2): 184.

Lee, I. and B. Palsson (1990). "A comprehensive model of human erythrocyte metabolism: extensions to include pH effects." Biomedica biochimica acta 49(8-9): 771.

Lee, J., E. Gianchandani, et al. (2006). "Flux balance analysis in the era of metabolomics." Briefings in bioinformatics 7(2): 140.

Lee, S., C. Palakornkule, et al. (2000). "Recursive MILP model for finding all the alternate optima in LP models for metabolic networks." Computers and Chemical Engineering 24(2-7): 711-716.

Lu, X., B. Bennet, et al. (2010). "Metabolomic changes accompanying transformation and acquisition of metastatic potential in a syngeneic mouse mammary tumor model." The Journal of biological chemistry.

Luscombe, N. M., M. Madan Babu, et al. (2004). "Genomic analysis of regulatory network dynamics reveals large topological changes." Nature 431(7006): 308-312.

Ma, H., A. Sorokin, et al. (2007). "The Edinburgh human metabolic network reconstruction and its functional analysis." Molecular Systems Biology 3: 135.

Mahadevan, R., D. Bond, et al. (2006). "Characterization of metabolism in the Fe (III)-reducing organism Geobacter sulfurreducens by constraint-based modeling." Applied and Environmental Microbiology 72(2): 1558.

Mahadevan, R. and C. Schilling (2003). "The effects of alternate optimal solutions in constraint-based genome-scale metabolic models." Metabolic engineering 5(4): 264-276.

Majewski, R. and M. Domach (1990). "Simple constrained-optimization view of acetate overflow in E. coli." Biotechnology and Bioengineering 35(7): 732-738.

Mo, M. and B. Palsson (2009). "Understanding human metabolic physiology: a genome-to-systems approach." Trends in Biotechnology 27(1): 37-44.

Mulquiney, P. and P. Kuchel (1999). "Model of 2, 3-bisphosphoglycerate metabolism in the human erythrocyte based on detailed enzyme kinetic equations: computer simulation and metabolic control analysis." Biochemical Journal 342(Pt 3): 597.

Mulquiney, P. and P. Kuchel (1999). "Model of 2, 3-bisphosphoglycerate metabolism in the human erythrocyte based on detailed enzyme kinetic equations: equations and parameter refinement." Biochemical Journal 342(Pt 3): 581.

Muoio, D. and C. Newgard (2006). "Obesity-related derangements in metabolic regulation."

Oberhardt, M., B. Palsson, et al. (2009). "Applications of genome-scale metabolic reconstructions." Molecular Systems Biology 5(1).

Overbeek, R., T. Begley, et al. (2005). "The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes." Nucleic acids research 33(17): 5691.

Palsson, B. (2006). Systems biology: properties of reconstructed networks, Cambridge University Press New York, NY, USA.

Palsson, B. (2009). "Metabolic systems biology." FEBS letters.


Park, S. J., S. Y. Lee, et al. (2005). "Global physiological understanding and metabolic engineering of microorganisms based on omics studies." Applied microbiology and biotechnology 68(5): 567-579.

Pharkya, P., A. Burgard, et al. (2004). "OptStrain: a computational framework for redesign of microbial production systems." Genome Research 14(11): 2367.

Price, N., J. Reed, et al. (2004). "Genome-scale models of microbial cells: evaluating the consequences of constraints." Nature Reviews Microbiology 2(11): 886-897.

Quash, G., G. Fournet, et al. (2003). "Anaplerotic reactions in tumour proliferation and apoptosis." Biochemical pharmacology 66(3): 365-370.

Ramakrishna, R., J. Edwards, et al. (2001). "Flux-balance analysis of mitochondrial energy metabolism: consequences of systemic stoichiometric constraints." American Journal of Physiology- Regulatory, Integrative and Comparative Physiology 280(3): 695.

Reed, J. and B. Palsson (2003). "Thirteen years of building constraint-based in silico models of Escherichia coli." Journal of bacteriology 185(9): 2692.

Romero, P., J. Wagg, et al. (2004). "Computational prediction of human metabolic pathways from the complete human genome." Genome biology 6(1): R2.

Ronen, S. and M. Leach (2000). "Breast imaging technology: Imaging biochemistry- applications to breast cancer." Breast Cancer Res 3(1): 36.

Rossell, S., C. C. van der Weijden, et al. (2006). "Unraveling the complexity of flux regulation: a new method demonstrated for nutrient starvation in Saccharomyces cerevisiae." Proceedings of the National Academy of Sciences of the United States of America 103(7): 2166.

Schilling, C., M. Covert, et al. (2002). "Genome-scale metabolic model of Helicobacter pylori 26695." Journal of bacteriology 184(16): 4582.

Schilling, C. and B. Palsson (2000). "Assessment of the metabolic capabilities of Haemophilus influenzae Rd through a genome-scale pathway analysis." Journal of theoretical biology 203(3): 249-283.

Segre, D., D. Vitkup, et al. (2002). "Analysis of optimality in natural and perturbed metabolic networks." Proceedings of the National Academy of Sciences 99(23): 15112.

Selvarasu, S., I. Karimi, et al. "Genome-scale modeling and in silico analysis of mouse cell metabolic network."

Shi, Y. and P. Burn (2004). "Lipid metabolic enzymes: emerging drug targets for the treatment of obesity." Nature Reviews Drug Discovery 3(8): 695-710.

Shlomi, T., O. Berkman, et al. (2005). "Regulatory on/off minimization of metabolic flux changes after genetic perturbations." Proceedings of the National Academy of Sciences of the United States of America 102(21): 7695.

Shlomi, T., M. Cabili, et al. (2008). "Network-based prediction of human tissue-specific metabolism." Nat Biotechnol 26(9): 1003–1010.

Shlomi, T., Y. Eisenberg, et al. (2007). "A genome-scale computational study of the interplay between transcriptional regulation and metabolism." Molecular Systems Biology 3: 101.

Stelling, J., S. Klamt, et al. (2002). "Metabolic network structure determines key aspects of functionality and regulation." Nature 420(6912): 190-193.


Subramanian, A., H. Kuehn, et al. (2007). "GSEA-P: a desktop application for Gene Set Enrichment Analysis." Bioinformatics 23(23): 3251.

Tennant, D., R. Duran, et al. (2009). "Metabolic transformation in cancer." Carcinogenesis 30(8): 1269.

Thiele, I., N. Price, et al. (2005). "Candidate metabolic network states in human mitochondria: Impact of diabetes, ischemia, and diet." Journal of Biological Chemistry 280(12): 11683-11695.

Tummala, S. B., S. G. Junne, et al. (2003). "Transcriptional analysis of product-concentration driven changes in cellular programs of recombinant Clostridium acetobutylicum strains." Biotechnology and bioengineering 84(7): 842-854.

Varma, A., B. Boesch, et al. (1993). "Stoichiometric interpretation of Escherichia coli glucose catabolism under various oxygenation rates." Applied and Environmental Microbiology 59(8): 2465.

Varma, A. and B. Palsson (1994). "Metabolic flux balancing: basic concepts, scientific and practical use." Nature biotechnology 12(10): 994-998.

Vo, T., H. Greenberg, et al. (2004). "Reconstruction and functional characterization of the human mitochondrial metabolic network based on proteomic and biochemical data." Journal of Biological Chemistry 279(38): 39532-39540.

Wallace, D. (2005). "A mitochondrial paradigm of metabolic and degenerative diseases, aging, and cancer: a dawn for evolutionary medicine."

Wiback, S., R. Mahadevan, et al. (2004). "Using metabolic flux data to further constrain the metabolic solution space and predict internal flux patterns: the Escherichia coli spectrum." Biotechnology and Bioengineering 86(3): 317-331.

Wiback, S. and B. Palsson (2002). "Extreme pathway analysis of human red blood cell metabolism." Biophysical Journal 83(2): 808-818.

Workman, C., H. Mak, et al. (2006). "A systems approach to mapping DNA damage response pathways." Science 312(5776): 1054.

Yang, C., Q. Hua, et al. (2002). "Integration of the information from gene expression and metabolic fluxes for the analysis of the regulatory mechanisms in Synechocystis." Applied microbiology and biotechnology 58(6): 813-822.

Young, P., R. Ebner, et al. (2006). Cancer-linked genes as targets for chemotherapy, Google Patents.

Zachow, R. and J. Woolery (2002). "Effects of hepatocyte growth factor on cyclic nucleotide-dependent signaling and steroidogenesis in rat ovarian granulosa cells in vitro." Biology of reproduction 67(2): 454.


6 Supplementary material

Please see the attached files:

1. Automatic models list.xls – iMAT supported automatic metabolic models

2. Differentiated Enriched Pathways High Low MET _10.xls – significantly enriched pathways differentiating the high-Met from the low-Met cell line at the 10 min time-stamp

3. Differentiated Enriched Pathways High Low MET _30.xls – significantly enriched pathways differentiating the high-Met from the low-Met cell line at the 30 min time-stamp

4. Differentiated Enriched Pathways High Low MET_0.xls – significantly enriched pathways differentiating the high-Met from the low-Met cell line at the 0 min time-stamp

5. Differentiated Enriched Pathways High Low MET_24h.xls – significantly enriched pathways differentiating the high-Met from the low-Met cell line at the 24 h time-stamp

6. Differentiated Genes from predicted active rxns High Low MET_24h.xls

7. Differentiated Genes from predicted inactive rxns High Low MET_24h.xls

8. Elevated High Met Celline Enriched Pathways_0.xls – enriched pathways found to be elevated in the high-Met cell line at the 0 min time-stamp



10. Elevated High Met Celline Enriched Pathways_24h.xls – enriched pathways found to be elevated in the high-Met cell line at the 24 h time-stamp




12. Expression Differentiated Enriched Pathways High Low MET_24h.xls – high-Met differential hypergeometric pathway enrichment analysis, based solely on expression data

13. Expression High Met Celline Enriched Pathways_24h.xls – high-Met hypergeometric pathway enrichment analysis, based solely on expression data

14. Expression Low Met Celline Enriched Pathways_24h.xls – low-Met hypergeometric pathway enrichment analysis, based solely on expression data

15. Genes requested by Prof. Ilan Tsarfaty.xls

16. iMAT User Guide.doc

17. Reduced High Met Celline Enriched Pathways_0.xls – enriched pathways found to be reduced in the high-Met cell line at the 0 min time-stamp



19. Reduced High Met Celline Enriched Pathways_24h.xls – enriched pathways found to be reduced in the high-Met cell line at the 24 h time-stamp



21. Low Met Celline Enriched Pathways_24h.xls

22. High Met Celline Enriched Pathways_0.xls

23. High Met Celline Enriched Pathways_24h.xls

24. Low Met Celline Enriched Pathways_0.xls


