genome t.s. t.f biochemical pathway physical interactions functional interactions challenge of the...

Challenge of the Post Genomic Era

Understand how genes and proteins interact to give rise to cellular processes

[Figure: schematic of a living cell with the labels genome, t.s., t.f, biochemical pathway, physical interactions, functional interactions, Protein 1 and Protein 2.]

Systems Biology (according to Sydney Brenner, 1997):

“The first requirement will be for a theoretical framework in which to embed all the detailed knowledge we have accumulated, to allow us to compute outcomes of the complex interactions and to start to understand the dynamics of the system.

The second will be to make parallel measurements of the behavior of many components during the execution by the cell of an integrated action in order to test whether the theory is right.

Is there some other approach?

If I knew it I would be doing it, and not writing about the problem.”

Huge networks of thousands of molecules interconnected via thousands of interactions, which depend on time, sub-cellular localization and cellular state

The biological function of a gene is defined by its interactions (physical or functional) with other genes/proteins.

These interactions define the role it plays in the system.

Major efforts are devoted world-wide to characterizing interactions on the genome scale: physical interactome, genetic interactions, cellular processes (metabolic, signal transduction, regulatory pathways).

Descriptions of the cellular circuitry

Interactions & Networks:

Only a small fraction of the routes is charted: known metabolic, regulatory and signal transduction pathways.

Descriptions of the cellular circuitry

Networks and pathways

Pathways: the charted ‘routes’

Analyzing the Networks

I. Metabolic networks: pathway inference and functional distance using weighted graphs

II. Protein-protein interaction networks: identifying functional modules by combined data analysis

Pathway analysis

Protein interactions

Analysis of Metabolic Network

Stringing together chemical reactions via their substrates and products:

Must be done in a biologically and chemically meaningful way:

- Treatment of reversible reactions
- Treatment of bulk compounds

I-Building the network:

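Metabolic networks as graphs

[Figure: one example network with compounds A to E and reactions R1 to R3, drawn three ways: compound graph, reaction graph and bipartite graph. A weighted example with nodes P1, P6, P7, P8 and weights 2.5, 51.0, 7.0, 15.0 is also shown.]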

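Invalid pathways through shortcuts

[Figure: two reactions that have H2O in common. Reaction 4.2.1.52 is drawn with L-aspartic semialdehyde, pyruvate, dihydrodipicolinic acid and H2O; reaction 3.5.1.18 is drawn with succinyl diaminopimelate, H2O, LL-diaminopimelic acid and succinate. Linking them through H2O gives the invalid shortcut L-aspartic semialdehyde, 4.2.1.52, H2O, 3.5.1.18, LL-diaminopimelic acid.]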

The problem of pool metabolites

Compound          Reactions
H2O               1615
NAD+              578
NADH              569
NADP+             564
NADPH             559
Oxygen            527
ATP               435
Orthophosphate    349
ADP               324
CO2               323
CoA               303
H+                272
NH3               270
Pyrophosphate     252
UDP               190

[Figure: compound connectivity. Number of compounds against number of reactions per compound, both on log scales (avg = 4.9, std = 34.9).]

[Figure: reaction connectivity, all compounds. Number of reactions (log scale) against number of reactants. Products: avg = 2.0, std = 0.6; substrates: avg = 2.0, std = 0.6.]

[Figure: reaction connectivity, without ubiquitous compounds. Number of reactions (log scale) against number of reactants. Products: avg = 1.3, std = 0.5; substrates: avg = 1.4, std = 0.5.]

van Helden et al., (2002); Croes et al. (2006)

Properties of metabolic networks

“Hubs” (Jeong et al., 2000)

Pool metabolites

[Figure: panels A, B and C, with axes labelled number of compounds and number of reaction pairs; one panel is computed without pool metabolites.]

Metabolic network representations (Croes et al., 2006)

[Figure: the example network (compounds A to E, reactions R1 to R3) as a compound graph, a reaction graph and a bipartite graph, followed by two variants of the bipartite graph: the raw bipartite graph and the weighted bipartite graph, in which the compounds carry the weights (30), (10), (25), (67) and (15).]

Each compound is assigned a weight equal to its connectivity in the entire network.
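The weighting rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the three toy reactions, the function names, the directed treatment of reactions and the choice of weight 0 for reaction nodes are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy network: reaction name -> (substrates, products).
REACTIONS = {
    "R1": (["A", "H2O"], ["B"]),
    "R2": (["B"], ["C", "H2O"]),
    "R3": (["C", "H2O"], ["D"]),
}

def compound_connectivity(reactions):
    """Count the number of reactions in which each compound takes part."""
    counts = defaultdict(int)
    for substrates, products in reactions.values():
        for compound in set(substrates) | set(products):
            counts[compound] += 1
    return dict(counts)

def build_weighted_bipartite(reactions):
    """Return (edges, weights) for a directed bipartite graph.

    Edges run substrate -> reaction -> product. Compound nodes are
    weighted by their connectivity; reaction nodes get weight 0
    (an assumption of this sketch).
    """
    weights = compound_connectivity(reactions)
    edges = defaultdict(set)
    for name, (substrates, products) in reactions.items():
        weights[name] = 0
        for s in substrates:
            edges[s].add(name)
        for p in products:
            edges[name].add(p)
    return dict(edges), weights

edges, weights = build_weighted_bipartite(REACTIONS)
print(weights["H2O"], weights["A"])  # H2O takes part in 3 reactions, A in 1
```

In a real network the connectivity would be counted over all reactions in the database, so that pool metabolites such as H2O or ATP receive very large weights.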

Path finding in the global metabolic graph.

[Figure: two enzymes, E1 and E2, in the global graph.]

A very large number of paths lead from E1 to E2. Which ones are meaningful?

A very simple approach: look for the shortest path

I. in the raw graph: path with the minimum number of steps
II. in the raw graph with ubiquitous compounds excluded: path with the minimum number of steps
III. in the weighted graph: lightest path, i.e. the path where the sum of the weights is minimal
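The three search modes can be compared with one routine. The sketch below uses Dijkstra's algorithm with node weights, which is an assumption about how a lightest path could be computed, not a description of the authors' tool. The toy graph is hypothetical; the H2O weight of 1615 is borrowed from the pool metabolite table, the other weights are invented, and reaction nodes are given weight 0 in the weighted mode.

```python
import heapq

def lightest_path(edges, weights, source, target, excluded=frozenset()):
    """Dijkstra on node weights: path cost = sum of the weights of its nodes."""
    start_cost = weights.get(source, 0)
    best = {source: start_cost}
    queue = [(start_cost, source, [source])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in edges.get(node, ()):
            if nxt in excluded:
                continue
            new_cost = cost + weights.get(nxt, 0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None

# Hypothetical toy network: R1: A -> B + H2O, R2: B -> C, R3: C + H2O -> D
EDGES = {
    "A": ["R1"], "R1": ["B", "H2O"], "B": ["R2"], "R2": ["C"],
    "C": ["R3"], "H2O": ["R3"], "R3": ["D"],
}
NODES = ["A", "B", "C", "D", "H2O", "R1", "R2", "R3"]
RAW = {n: 1 for n in NODES}  # every node costs one step
WEIGHTED = {"A": 1, "B": 2, "C": 2, "D": 1, "H2O": 1615,
            "R1": 0, "R2": 0, "R3": 0}

raw = lightest_path(EDGES, RAW, "R1", "R3")                           # mode I
filtered = lightest_path(EDGES, RAW, "R1", "R3", excluded={"H2O"})    # mode II
weighted = lightest_path(EDGES, WEIGHTED, "R1", "R3")                 # mode III
print(raw)       # (3, ['R1', 'H2O', 'R3'])
print(filtered)  # (5, ['R1', 'B', 'R2', 'C', 'R3'])
print(weighted)  # (4, ['R1', 'B', 'R2', 'C', 'R3'])
```

In this toy case the raw search takes the shortcut through H2O, while both the filtered and the weighted searches recover the route through the specific intermediates B and C. The weighted mode does so without needing a hand-made exclusion list.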

Path finding in metabolic graphs: the test

Known pathway: lysine biosynthesis, E. coli

Rebuild the pathway. How? By searching for the lightest path between the first and last reactions in the ‘global’ graph.

(http://www.scmbb.ulb.ac.be/pathfinding/)

Escherichia coli lysine biosynthesis

A) Annotated pathway (each reaction is listed with the compound it produces):

2.7.2.4     4-phospho-l-aspartate
1.2.1.11    l-aspartic 4-semialdehyde
4.2.1.52    2,3-dihydrodipicolinate
1.3.1.26    2,3,4,5-tetrahydrodipicolinate
2.3.1.117   n-succinyl-l-2-amino-6-oxopimelate
2.6.1.17    n-succinyl-l-2,6-diaminopimelate
3.5.1.18    ll-2,6-diaminopimelate
5.1.1.7     meso-2,6-diaminopimelate
4.1.1.20

D) Weighted graph

Mode: weighted graph
Maximum weight: 2000
Maximum metabolic steps: 16
Number of queries: 1
Number of paths per query: 5
All possible paths having a score from 28.0 to 38.0

[Figure: network of the inferred paths. Reactions shown: 2.7.2.4, 1.2.1.11, 4.2.1.52, 1.3.1.26, R04336, 2.3.1.117, 2.3.1.-, 2.6.1.17, 2.6.1.-, 1.4.1.16, 3.5.1.18, 3.5.1.47, 5.1.1.7, 4.1.1.20. Compounds shown: l-aspartate, atp, adp, 4-phospho-l-aspartate, l-aspartic 4-semialdehyde, 2,3-dihydrodipicolinate, 2,3,4,5-tetrahydrodipicolinate, l-2-amino-6-oxopimelate, n-succinyl-l-2-amino-6-oxopimelate, n-succinyl-l-2,6-diaminopimelate, n-acetyl-l-2-amino-6-oxopimelate, n6-acetyl-l-2,6-diaminopimelate, ll-2,6-diaminopimelate, co2, l-lysine.]

C) Raw graph with excluded compounds

Mode: unweighted (raw) graph, highly connected compounds excluded
Maximum weight: 2000
Maximum metabolic steps: 16
Number of queries: 1
Number of paths per query: 5
All possible paths having a length from 6.0 to 8.0

[Figure: network of the inferred paths. Reactions shown: 2.7.2.4, 4.1.1.11, 4.1.1.12, 4.1.1.15, 4.1.1.20, 2.6.1.12, 2.6.1.18, 2.6.1.71, 3.4.13.4, 6.3.2.11. Compounds shown: l-aspartate, atp, adp, 4-phospho-l-aspartate, l-alanine, beta-alanine, beta-alanyl-l-lysine, meso-2,6-diaminopimelate, co2, l-lysine.]

B) Raw graph

Mode: unweighted graph, raw graph
Maximum weight: 2000
Maximum metabolic steps: 16
Number of queries: 1
Number of paths per query: 5
All possible paths having a length from 4.0 to 6.0

[Figure: network of the inferred paths. Reactions shown: 2.7.2.4, 6.3.2.7, 6.3.2.13, 6.3.2.20, 6.3.3.3, 4.1.1.20. Compounds shown: l-aspartate, atp, adp, 4-phospho-l-aspartate, co2, l-lysine.]

Panels: A) annotated; B) raw; C) raw with excluded compounds; D) weighted.

A. Shortest path

Representation        Average sensitivity   Average PPV   Accuracy
Raw graph             31.4%                 25.4%         28.4%
Excluded compounds    68.0%                 63.0%         65.5%
Weighted graph        88.5%                 83.4%         85.9%

B. Best among the 5 shortest paths

Representation        Average sensitivity   Average PPV   Accuracy
Raw graph             33.3%                 26.5%         29.9%
Excluded compounds    71.4%                 66.7%         69.1%
Weighted graph        92.2%                 88.1%         90.1%

Test results on all known pathways of E. coli and yeast (with >3 reactions)
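The three scores can be computed per pathway as sketched below. The definitions used here are assumptions: sensitivity as the fraction of annotated reactions recovered, PPV as the fraction of inferred reactions that are annotated, and accuracy as their arithmetic mean (the accuracy column in the tables matches that mean to within rounding). The annotated list is the lysine pathway from the slides; the inferred path is a made-up example.

```python
def score_path(inferred, annotated):
    """Return (sensitivity, PPV, accuracy) for one inferred path.

    sensitivity = TP / number of annotated reactions
    PPV         = TP / number of inferred reactions
    accuracy    = mean of the two (assumed definition)
    """
    inferred, annotated = set(inferred), set(annotated)
    true_positives = len(inferred & annotated)
    sensitivity = true_positives / len(annotated)
    ppv = true_positives / len(inferred)
    return sensitivity, ppv, (sensitivity + ppv) / 2

# Annotated lysine biosynthesis reactions (EC numbers from the slides).
ANNOTATED = ["2.7.2.4", "1.2.1.11", "4.2.1.52", "1.3.1.26", "2.3.1.117",
             "2.6.1.17", "3.5.1.18", "5.1.1.7", "4.1.1.20"]

# Hypothetical inferred path: two correct reactions and one spurious one.
INFERRED = ["2.7.2.4", "6.3.2.13", "4.1.1.20"]

sensitivity, ppv, accuracy = score_path(INFERRED, ANNOTATED)
print(f"sensitivity={sensitivity:.3f} PPV={ppv:.3f} accuracy={accuracy:.3f}")
```

Averaging these per-pathway scores over all tested pathways would give numbers of the kind reported in the tables.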

What does this tell us??

Metabolic pathways seem to represent paths through the least ‘promiscuous’ compounds.

They seem to be ‘specific’ in terms of the chemistry of their metabolites.

This might reflect evolutionary pressure to keep the number of possible routes for the chemical transformation of compound A into compound B low.

What is the origin of such pressure?

Likely: the requirement to minimize the free energy balance.

Biochemically feasible pathways may correspond to paths with the lowest free energy.

This could be used in inferring pathways for newly sequenced genomes.

Network of physically interacting proteins

The physical interactome and protein complexes

Complexes are the cell’s factories:

- spliceosome
- proteasome
- ribosome
- replication complex
- …
- cytochrome bc1

Essential role!

Our knowledge about them is still limited:

- At best, only a fraction of the complexes is well characterized
- Even for those, we know little about their stability and lifetimes
- Components of complexes may change dynamically, as a function of cellular state, localization, time, etc.

To what extent are multi-protein complexes co-regulated at the transcriptional level?

[Figure: a complex made of proteins A, B, C, D, E and Y; the corresponding genes a, b, c, d, e and y are under the control of transcription factors TF1, TF2, TF3, TF4, ... Example ORFs shown: YPR184w, YKL085w, YPR160w.]

Identify regulatory motifs, and use them to predict ‘regulons’ (Simonis et al., 2004a,b)

Here:

Identify transcriptional modules in protein complexes

mRNA expression levels in different conditions and time points: 549 DNA chips

http://bioinfo.mbb.yale.edu/genome/expression/array.jpg

Protein complexes:

Entities identified by ‘pull-down’ experiments, low- or high-throughput

[Figure: tandem affinity purification workflow: bank of ORFs fused with a TAP tag; expression in yeast and lysis; tandem affinity purification; 1D SDS-PAGE; identification of components by mass spectrometry. In the example, the tagged protein Y is recovered together with A, B, C, D and E from among the other cellular proteins. ORF labels shown: B, C, E with YLR258w, YER133w, YER054c; A, Y, D with YPR184w, YKL085w, YPR160w.]

MIPS: repository of curated data on yeast protein complexes from individual labs, a total of 243 complexes

MIPS/CYGD & SGD database

[Figure: network of the 243 S. cerevisiae MIPS complexes, shown in GenePro (Cytoscape), with views after 5 min and 15 min of heat shock.]

Spellman: cell cycle (Mol Biol Cell, 1998)

Rosetta compendium: mutant analysis (Hughes et al., Cell, 2000)

Gasch: stress response, drugs, carbon sources (Gasch et al., Mol Biol Cell, 2000)

[Figure: expression matrix of 6296 ORFs by 549 conditions, after cleaning and normalising.]

Condition-specific expression of complexes

[Figure: 243 MIPS complexes by 549 conditions; normalized expression ratios matrix.]

Identify conditions in which components of individual complexes are coherently expressed -> t-test

t-test p-value matrix
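The slides do not say which form of t-test was used. As one possible reading, the sketch below computes, for each condition, a one-sample t statistic testing whether the mean log expression ratio of a complex's members differs from zero. The gene names, the toy values and the function names are hypothetical. Turning the statistic into a p-value requires the Student t distribution with n - 1 degrees of freedom (for instance via `scipy.stats.ttest_1samp`), which is left out to keep the example dependency-free.

```python
import math
import statistics

def one_sample_t(values, mu=0.0):
    """t statistic for the hypothesis that the mean of `values` equals `mu`."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        raise ValueError("zero variance: t statistic undefined")
    return (statistics.mean(values) - mu) / (sd / math.sqrt(n))

def coherence_by_condition(matrix, members):
    """One t statistic per condition for the genes of one complex.

    matrix: gene -> list of log expression ratios, one per condition.
    """
    n_conditions = len(matrix[members[0]])
    return [one_sample_t([matrix[g][c] for g in members])
            for c in range(n_conditions)]

# Hypothetical log ratios for a 4-gene complex in 2 conditions.
MATRIX = {
    "g1": [1.0, 1.0],
    "g2": [1.2, -1.0],
    "g3": [0.8, 0.5],
    "g4": [1.0, -0.5],
}

t_values = coherence_by_condition(MATRIX, ["g1", "g2", "g3", "g4"])
print(t_values)  # large |t| in condition 0 (coherent), t = 0 in condition 1
```

Repeating this for every complex and every condition would fill a complexes-by-conditions matrix of the kind named on the slide.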

Conclusion I

Complexes can be subdivided into 2 categories:

1. Complexes coherently expressed in many conditions

- ribosome
- proteasome
- respiration chain complexes
- ...

2. Complexes coherently expressed in only a few conditions

- DNA repair complexes
- cytoplasmic translation elongation
- cytoskeleton, microtubules
- ...

Identify transcriptional modules within protein complexes

Transcriptional modules: groups of genes that are up- and down-regulated together under the same set of conditions. They have coherent expression profiles.

Discriminant analysis based on expression ratios in different conditions

Results:

Out of 63 complexes with >5 components (differentially expressed in at least 2 conditions):

- 26 complexes have > 90% of their genes assigned to the complex
- 15 complexes have between 50% and 90% of their genes assigned to the complex
- 14 complexes have none of their genes assigned

Coat complexes (25 genes; 45 selected conditions): 18 genes assigned, 7 genes unassigned

[Figure: coat complexes, shown in GenePro (Cytoscape).]

Replication fork complex: 16 genes assigned, 14 genes unassigned

[Figure: replication fork complex, shown in GenePro (Cytoscape).]

COMPLEX                               GENE    ORF       P(da)
DNA polymerase α I primase complex    POL1    YNL102W   0.95155315
DNA polymerase α I primase complex    PRI2    YKL045W   0.92757655
DNA polymerase α I primase complex    POL12   YBL035C   0.8740073
DNA polymerase α I primase complex    PRI1    YIR008C   0.14922267
DNA polymerase δ III                  POL32   YJR043C   0.88528264
DNA polymerase δ III                  HYS2    YJR006W   0.79712089
DNA polymerase δ III                  CDC2    YDL102W   0.79079457
DNA polymerase ε II                   POL2    YNL262W   0.85845235
DNA polymerase ε II                   DPB2    YPR175W   0.7575824
DNA polymerase ε II                   DPB3    YBR278W   0.64557167
Exonucleases                          RAD27   YKL113C   0.99432872
PCNA                                  POL30   YBR088C   0.69668482
Replication factor A complex          RFA1    YAR007C   0.98673268
Replication factor A complex          RFA2    YNL312W   0.87875323
Replication factor A complex          RFA3    YJL173C   0.73210048
Topoisomerases                        TOP1    YOL006C   0.82395424
Topoisomerases                        TOP2    YNL088W   0.73032433
DNA helicases                         ECM32   YER176W   0.23499023
DNA helicases                         DNA2    YHR164C   0.05406037
DNA ligases                           CDC9    YDL164C   0.03708159
DNA polymerase β IV                   POL4    YCR014C   0.03919368
DNA polymerase                        MIP1    YOR330C   0.03182497
DNA polymerase ζ                      REV7    YIL139C   0.05724429
DNA polymerase ζ                      REV3    YPL167C   0.04090946
Replication factor C complex          RFC4    YOL094C   0.50870756
Replication factor C complex          RFC5    YBR087W   0.33589936
Replication factor C complex          RFC3    YNL290W   0.26089619
Replication factor C complex          RFC2    YJR068W   0.10584985
Replication factor C complex          RFC1    YOR217W   0.05246123
RNase H1                              RNH1    YMR234W   0.02989109

Replication fork complexes

Regulon predicted on the basis of predicted regulatory motifs (Simonis et al., 2004)

[Figure: scatter plot of P_Co-Regulation (Expression) against P_Co-Regulation (Motifs), both axes from 0 to 1.2, with the fitted line y = 0.6764x + 0.2066, R² = 0.5906.]
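A fitted line and R² of the kind quoted for this plot can be obtained by ordinary least squares. The sketch below is a generic illustration: the function name and the data points are hypothetical, and the values reported on the slide (slope 0.6764, intercept 0.2066, R² = 0.5906) come from the authors' data, which is not reproduced here.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical co-regulation probabilities for five genes:
# x from motifs, y from expression.
motif_p = [0.1, 0.3, 0.5, 0.7, 0.9]
expression_p = [0.2, 0.5, 0.4, 0.8, 0.9]

slope, intercept, r_squared = linear_fit(motif_p, expression_p)
print(f"y = {slope:.4f}x + {intercept:.4f}, R2 = {r_squared:.4f}")
```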

Conclusion II

Some complexes behave as a single transcriptional module.

Others do not: they contain more than one module, usually a large and a smaller one, whose expression is often anti-regulated; or none at all.

What are the biological reasons for this behaviour? This question is currently under investigation.

Acknowledgements

Nicolas Simonis (ULB, Belgium)
Didier Gonze (ULB, Belgium)
Didier Croes (ULB, Belgium)
Jacques van Helden (ULB, Belgium)
Chris Orsi (HSC, Toronto)
Mark Superina (HSC, Toronto)
Gina Liu (HSC, Toronto)
Shuye Pu (HSC, Toronto)
Jim Vlasblom (HSC, Toronto)
CCB Systems Support team (HSC, Toronto)

Ran Kafri (Weizmann Inst., Israel)
Werner Mewes & team, MIPS (CYGD)

Funding: EU CYGD project; CIHR; HSC
