Challenge of the Post Genomic Era
Understand how genes and proteins interact to give rise to cellular processes
[Figure labels: Genome; t.s.; t.f; Biochemical pathway; Physical interactions; Functional interactions; Protein 1; Protein 2; Living cell]
Systems Biology (according to Sydney Brenner, 1997):
“The first requirement will be for a theoretical framework in which to embed all the detailed knowledge we have accumulated, to allow us to compute outcomes of the complex interactions and to start to understand the dynamics of the system.
The second will be to make parallel measurements of the behavior of many components during the execution by the cell of an integrated action in order to test whether the theory is right.
Is there some other approach?
If I knew it I would be doing it, and not writing about the problem.”
Huge networks of thousands of molecules interconnected via thousands of interactions, which depend on time, sub-cellular localization, and cellular state
The biological function of a gene is defined by its interactions (physical or functional) with other genes/proteins.
These interactions define the role it plays in the system
Major efforts devoted world-wide to characterizing interactions on the genome scale: physical interactome, genetic interactions, cellular processes (metabolic, signal transduction, regulatory pathways).
Descriptions of the cellular circuitry
Interactions & Networks:
Only a small fraction of the routes is charted: known metabolic, regulatory & signal transduction pathways.
Networks and pathways
Pathways: the charted ‘routes’
Analyzing the Networks
I. Metabolic networks: pathway inference and functional distance using weighted graphs
II. Protein-protein interaction networks: identifying functional modules by combined data analysis
Pathway analysis
Protein interactions
Analysis of Metabolic Network
I- Building the network:
Stringing together chemical reactions via their substrates and products.
Must be done in a biologically and chemically meaningful way:
- Treatment of reversible reactions
- Treatment of bulk compounds
Metabolic networks as graphs
[Figure: the same reactions (R1, R2, R3) and compounds (A, B, C, D, E) drawn as a compound graph, a reaction graph and a bipartite graph, plus a weighted graph with nodes P1, P6, P7, P8 and weights 2.5, 51.0, 7.0, 15.0]
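The three representations named on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the reaction list `REACTIONS` is a hypothetical toy network (the actual topology of the figure is not recoverable from the transcript), reactions are treated as irreversible, and the function names are made up for the example.

```python
# Hypothetical toy reactions (not the figure's network): name -> (substrates, products)
REACTIONS = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C", "D"]),
    "R3": (["D"], ["E"]),
}

def bipartite_graph(reactions):
    """Two node types: edges run compound -> reaction and reaction -> compound."""
    edges = set()
    for name, (substrates, products) in reactions.items():
        for s in substrates:
            edges.add((s, name))
        for p in products:
            edges.add((name, p))
    return edges

def compound_graph(reactions):
    """Compounds only: edge substrate -> product for each reaction linking them."""
    edges = set()
    for substrates, products in reactions.values():
        for s in substrates:
            for p in products:
                edges.add((s, p))
    return edges

def reaction_graph(reactions):
    """Reactions only: edge Ri -> Rj when a product of Ri is a substrate of Rj."""
    edges = set()
    for ri, (_, products) in reactions.items():
        for rj, (substrates, _) in reactions.items():
            if ri != rj and set(products) & set(substrates):
                edges.add((ri, rj))
    return edges

print(sorted(compound_graph(REACTIONS)))
print(sorted(reaction_graph(REACTIONS)))
print(len(bipartite_graph(REACTIONS)))
```

A reversible reaction could be handled by adding a second entry with substrates and products swapped; that choice is one of the "treatment" questions raised on the previous slide and is left open here.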
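Invalid pathways, through shortcut
[Figure: reaction 4.2.1.52 (labels: L-aspartic semialdehyde, pyruvate, dihydrodipicolinic acid, H2O) and reaction 3.5.1.18 (labels: succinyl diaminopimelate, H2O, LL-diaminopimelic acid, succinate). Linking reactions through the shared H2O gives the invalid shortcut L-aspartic semialdehyde -> 4.2.1.52 -> H2O -> 3.5.1.18 -> LL-diaminopimelic acid]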
The problem of pool metabolites

Compound         Reactions
H2O              1615
NAD+             578
NADH             569
NADP+            564
NADPH            559
Oxygen           527
ATP              435
Orthophosphate   349
ADP              324
CO2              323
CoA              303
H+               272
NH3              270
Pyrophosphate    252
UDP              190
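A connectivity table like the one above can be obtained by counting, for each compound, the reactions in which it takes part. The sketch below assumes a hypothetical toy reaction list (`REACTIONS`) and made-up function names; it is not the procedure used to produce the numbers on the slide.

```python
from collections import Counter

# Hypothetical toy reactions: name -> (substrates, products)
REACTIONS = {
    "R1": (["A", "ATP"], ["B", "ADP"]),
    "R2": (["B", "H2O"], ["C"]),
    "R3": (["C", "ATP"], ["D", "ADP"]),
    "R4": (["D", "ATP"], ["E", "ADP"]),
}

def compound_connectivity(reactions):
    """Number of reactions in which each compound takes part."""
    counts = Counter()
    for substrates, products in reactions.values():
        # Count a compound once per reaction, whichever side it is on.
        counts.update(set(substrates) | set(products))
    return counts

def most_connected(reactions, top=3):
    """Compounds ranked by decreasing connectivity (ties broken by name)."""
    counts = compound_connectivity(reactions)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

print(most_connected(REACTIONS, top=2))  # [('ADP', 3), ('ATP', 3)]
```

In this toy network ATP and ADP come out on top, which mirrors how cofactors dominate the real table.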
Compound connectivity
[Figure: log-log plot of the number of compounds against the number of reactions (avg=4.9, std=34.9)]
Reaction connectivity: all compounds
[Figure: number of reactions (log scale) against number of reactants (0 to 7); products (avg=2.0, std=0.6), substrates (avg=2.0, std=0.6)]
Reaction connectivity: without ubiquitous compounds
[Figure: number of reactions (log scale) against number of reactants (0 to 6); products (avg=1.3, std=0.5), substrates (avg=1.4, std=0.5)]
van Helden et al. (2002); Croes et al. (2006)
Properties of metabolic networks
“hubs” (Jeong et al., 2000)
Pool metabolites
[Figure: distribution panels with axes "Number of reaction pairs" and "Number of compounds"; one panel is drawn without pool metabolites]
Metabolic network representations (Croes et al., 2006)
[Figure: compound graph, reaction graph, bipartite graph and weighted graph of the same reactions (R1, R2, R3) and compounds (A, B, C, D, E)]
Raw bipartite graph
[Figure: bipartite graph of reactions R1, R2, R3 and compounds A, B, C, D, E]
Weighted bipartite graph
[Figure: the same graph with compound weights (30), (10), (25), (67), (15)]
Each compound is assigned a weight equal to its connectivity in the entire network
Path finding in the global metabolic graph.
[Figure: nodes E1 and E2 in the global graph]
A very large number of paths lead from E1 to E2.
Which ones are meaningful?
A very simple approach: look for the shortest path
I- in the raw graph: path with the minimum number of steps
II- in the raw graph with ubiquitous compounds excluded: path with the minimum number of steps
III- in the weighted graph: lightest path, i.e. the path where the sum of the weights is minimal
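The difference between the shortest path (fewest steps) and the lightest path (smallest sum of weights) can be illustrated on a toy bipartite graph. This is only a sketch under stated assumptions: the network `REACTIONS` is hypothetical, compounds are weighted by their connectivity as described on the slides, reaction nodes are given a weight of 1 (an assumption, the slides do not specify it), and Dijkstra's algorithm is used as a generic way to find the lightest path.

```python
import heapq
from collections import deque

# Hypothetical toy network: name -> (substrates, products)
REACTIONS = {
    "R1": (["A", "ATP"], ["B", "ADP"]),
    "R2": (["B"], ["C"]),
    "R3": (["C", "ADP"], ["D", "ATP"]),
}
# Five unrelated reactions that also use ATP/ADP, making both highly connected.
for i in range(4, 9):
    REACTIONS[f"R{i}"] = ([f"S{i}", "ATP"], [f"P{i}", "ADP"])

def build_graph(reactions):
    """Directed bipartite graph: compound -> reaction -> compound."""
    graph = {}
    for name, (substrates, products) in reactions.items():
        graph.setdefault(name, set()).update(products)
        for s in substrates:
            graph.setdefault(s, set()).add(name)
        for p in products:
            graph.setdefault(p, set())
    return graph

def node_weights(reactions, graph):
    """Compound weight = its connectivity; reaction weight = 1 (assumption)."""
    weights = {node: 1 for node in graph}
    counts = {}
    for substrates, products in reactions.values():
        for c in set(substrates) | set(products):
            counts[c] = counts.get(c, 0) + 1
    weights.update(counts)
    return weights

def _trace(previous, target):
    """Walk back from target to source using the predecessor map."""
    if target not in previous:
        return None
    path, node = [], target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]

def shortest_path(graph, source, target):
    """Breadth-first search: path with the minimum number of steps."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in sorted(graph[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return _trace(previous, target)

def lightest_path(graph, weights, source, target):
    """Dijkstra on node weights: path minimising the sum of its node weights."""
    best = {source: weights[source]}
    previous = {source: None}
    heap = [(weights[source], source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best[node]:
            continue  # stale heap entry
        for nxt in graph[node]:
            new_cost = cost + weights[nxt]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return _trace(previous, target)

GRAPH = build_graph(REACTIONS)
WEIGHTS = node_weights(REACTIONS, GRAPH)
print(shortest_path(GRAPH, "R1", "R3"))           # ['R1', 'ADP', 'R3']
print(lightest_path(GRAPH, WEIGHTS, "R1", "R3"))  # ['R1', 'B', 'R2', 'C', 'R3']
```

In the raw graph the search shortcuts through ADP, while the weighted search avoids the highly connected compound and recovers the chain R1, R2, R3. Variant II (ubiquitous compounds excluded) would correspond to deleting such compounds from the graph before running the breadth-first search.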
Path finding in metabolic graphs: the test
Known pathway: lysine biosynthesis, E. coli
Rebuild the pathway. How? By searching for the lightest path between the first and last reactions in the ‘global’ graph
(http://www.scmbb.ulb.ac.be/pathfinding/)
Escherichia coli Lysine Biosynthesis
[Figure: Escherichia coli lysine biosynthesis, annotated pathway compared with the paths found in each graph representation. Panels: A) Annotated; B) Raw; C) Raw with excluded compounds; D) Weighted]

A) Annotated pathway
Reactions (EC numbers) with the compound labelled after each:
2.7.2.4: 4-phospho-l-aspartate
1.2.1.11: l-aspartic 4-semialdehyde
4.2.1.52: 2,3-dihydrodipicolinate
1.3.1.26: 2,3,4,5-tetrahydrodipicolinate
2.3.1.117: n-succinyl-l-2-amino-6-oxopimelate
2.6.1.17: n-succinyl-l-2,6-diaminopimelate
3.5.1.18: ll-2,6-diaminopimelate
5.1.1.7: meso-2,6-diaminopimelate
4.1.1.20

B) Raw graph
Mode: unweighted graph, raw graph. Maximum weight: 2000. Maximum metabolic steps: 16. Number of queries: 1. Number of paths per query: 5.
All possible paths having a length from 4.0 to 6.0
Reactions: 2.7.2.4, 6.3.2.20, 6.3.2.7, 4.1.1.20, 6.3.2.13, 6.3.3.3
Compounds: l-aspartate, adp, atp, 4-phospho-l-aspartate, l-lysine, co2, meso-2,6-diaminopimelate

C) Raw graph with excluded compounds
Mode: unweighted graph (raw), highly connected compounds excluded. Maximum weight: 2000. Maximum metabolic steps: 16. Number of queries: 1. Number of paths per query: 5.
All possible paths having a length from 6.0 to 8.0
Reactions: 4.1.1.12, 3.4.13.4, 6.3.2.11, 2.6.1.18, 2.6.1.71, 4.1.1.11, 4.1.1.15, 2.6.1.12, 4.1.1.20, 2.7.2.4
Compounds: l-alanine, beta-alanine, co2, beta-alanyl-l-lysine, l-lysine, atp, meso-2,6-diaminopimelate, l-aspartate, adp, 4-phospho-l-aspartate

D) Weighted graph
Mode: weighted graph. Maximum weight: 2000. Maximum metabolic steps: 16. Number of queries: 1. Number of paths per query: 5.
All possible paths having a score from 28.0 to 38.0
Reactions: 2.7.2.4, 1.2.1.11, 4.2.1.52, 1.3.1.26, 2.3.1.117, 2.6.1.17, 3.5.1.18, 5.1.1.7, 4.1.1.20, 1.4.1.16, 3.5.1.47, R04336, 2.6.1.-, 2.3.1.-
Compounds: l-aspartate, atp, adp, 4-phospho-l-aspartate, l-aspartic 4-semialdehyde, 2,3-dihydrodipicolinate, 2,3,4,5-tetrahydrodipicolinate, l-2-amino-6-oxopimelate, n-succinyl-l-2-amino-6-oxopimelate, n-succinyl-l-2,6-diaminopimelate, n-acetyl-l-2-amino-6-oxopimelate, n6-acetyl-l-2,6-diaminopimelate, ll-2,6-diaminopimelate, meso-2,6-diaminopimelate, co2, l-lysine
Test results on all known pathways of E. coli and yeast (with >3 reactions)

A. Shortest path
Representation        Average sensitivity   Average PPV   Accuracy
Raw graph             31.4%                 25.4%         28.4%
Excluded compounds    68.0%                 63.0%         65.5%
Weighted graph        88.5%                 83.4%         85.9%

B. Best among the 5 shortest paths
Representation        Average sensitivity   Average PPV   Accuracy
Raw graph             33.3%                 26.5%         29.9%
Excluded compounds    71.4%                 66.7%         69.1%
Weighted graph        92.2%                 88.1%         90.1%
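The three scores in these tables can be computed by comparing the reactions of an inferred path with those of the annotated pathway. The sketch below assumes that accuracy is the arithmetic mean of sensitivity and PPV; the slides do not state the formula, but the tabulated values are consistent with it (for example (31.4 + 25.4) / 2 = 28.4). The reaction sets in the example are hypothetical.

```python
def path_scores(annotated, inferred):
    """Sensitivity, PPV and their mean for one inferred path."""
    annotated, inferred = set(annotated), set(inferred)
    true_positives = len(annotated & inferred)
    sensitivity = true_positives / len(annotated)  # annotated reactions recovered
    ppv = true_positives / len(inferred)           # inferred reactions that are annotated
    accuracy = (sensitivity + ppv) / 2             # assumed definition, see lead-in
    return sensitivity, ppv, accuracy

# Hypothetical example: 4 annotated reactions, 3 inferred, 2 in common.
sn, ppv, acc = path_scores({"R1", "R2", "R3", "R4"}, {"R1", "R2", "R5"})
print(f"sensitivity={sn:.2f} PPV={ppv:.2f} accuracy={acc:.2f}")
```

Averaging these per-pathway scores over all tested pathways would give table entries of the kind shown above.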
What does this tell us?
Metabolic pathways seem to represent paths through the least ‘promiscuous’ compounds
They seem to be ‘specific’ in terms of the chemistry of their metabolites
This might reflect evolutionary pressure to keep the number of possible routes for the chemical transformation of compound A into compound B low.
What is the origin of such pressure?
Likely: the requirement to minimize the free energy balance
Biochemically feasible pathways may correspond to paths with the lowest free energy
This could be used in inferring pathways for newly sequenced genomes
Network of physically interacting proteins
The physical interactome and protein complexes
Complexes are the cell’s factories
- spliceosome
- proteasome
- ribosome
- replication complex
- …
- cytochrome bc1
Essential role!
Our knowledge about them is still limited…
- At best, only a fraction of the complexes is well characterized
- Even for those, we know little about their stability and lifetimes
- Also, components of complexes may change dynamically, as a function of cellular state, localization, time, etc.
To what extent are multi-protein complexes co-regulated at the transcriptional level?
[Figure: a complex of proteins A, B, C, D, E and Y; the corresponding genes a, b, c, d, e and y; transcription factors TF1, TF2, TF3, TF4, ...; ORF labels YPR184w, YKL085w, YPR160w]
Identify regulatory motifs, and use them to predict ‘regulons’ (Simonis et al., 2004a,b)
Here:
Identify transcriptional modules in protein complexes
mRNA expression levels in different conditions and time points: 549 DNA chips
Protein complexes:
Entities identified by ‘pull-down’ experiments, low- or high-throughput
[Figure: a complex of proteins A, B, C, D, E and Y among other cellular proteins; labels B = YLR258w, C = YER133w, E = YER054c, A = YPR184w, Y = YKL085w, D = YPR160w]
[Figure: tandem affinity purification (TAP) of a tagged ORF "y"; steps shown:]
- Bank of ORFs fused with a tag
- Expression in yeast and lysis
- Tandem affinity purification
- Identification of components by mass spectrometry
- 1D SDS-PAGE
MIPS: repository of curated data on yeast protein complexes from individual labs, total of 243 complexes
MIPS/CYGD & SGD database
[Figure: network of S. cerevisiae MIPS complexes (243); GenePro (Cytoscape)]
[Figure: Heat Shock 5 min.; GenePro (Cytoscape)]
[Figure: Heat Shock 15 min.; GenePro (Cytoscape)]
Spellman: cell cycle (Mol Biol Cell., 1998)
Rosetta compendium: mutants analysis (Hughes et al., Cell, 2000)
Gasch: stress response, drugs, carbon sources (Gasch et al., Mol Biol Cell., 2000)
[Figure: expression matrix of 6296 ORFs by 549 conditions, after cleaning and normalising]
Condition-specific expression of complexes
Identify conditions in which components of individual complexes are coherently expressed -> t-test
[Figure: normalized expression ratios matrix and t-test p-value matrix; axes: 243 MIPS complexes, 549 conditions]
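One way to read the t-test step is as a per-condition test of whether the expression ratios of a complex's genes deviate coherently from zero. The slides do not say which t-test was used, so the sketch below makes a loud assumption: a one-sample t statistic on log-ratios against a mean of zero. The data and function names are hypothetical, and the conversion to a p-value (which needs Student's t distribution) is left out.

```python
import math
from statistics import mean, stdev

def t_statistic(log_ratios):
    """One-sample t statistic of the log-ratios against a mean of zero (assumed test)."""
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two genes")
    m, s = mean(log_ratios), stdev(log_ratios)
    if s == 0:
        # All genes identical: infinitely coherent unless they are all zero.
        return math.copysign(math.inf, m) if m else 0.0
    return m / (s / math.sqrt(n))

def complex_t_by_condition(expression, members):
    """t statistic of the complex's genes in each condition (column)."""
    n_conditions = len(next(iter(expression.values())))
    return [
        t_statistic([expression[g][c] for g in members])
        for c in range(n_conditions)
    ]

# Hypothetical log-ratios for four genes of one complex in two conditions.
EXPRESSION = {
    "g1": [1.0, 1.0],
    "g2": [1.2, -1.0],
    "g3": [0.8, 0.5],
    "g4": [1.0, -0.5],
}
print(complex_t_by_condition(EXPRESSION, ["g1", "g2", "g3", "g4"]))
```

In the first condition all four genes are up-regulated by a similar amount and the statistic is large; in the second the ratios cancel out and the statistic is zero, so only the first condition would count as coherent expression.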
Conclusion I
Complexes can be subdivided into 2 categories:
1- Complexes coherently expressed in many conditions
- ribosome
- proteasome
- respiration chain complexes
- ..
2- Complexes coherently expressed in only a few conditions
- DNA repair complexes
- cytoplasmic translation elongation
- cytoskeleton, microtubules...
Identify transcriptional modules within protein complexes
Transcriptional modules: groups of genes that are up- and down-regulated together under the same set of conditions. They have coherent expression profiles.
Discriminant analysis based on expression ratios in different conditions
Results:
Out of 63 complexes with >5 components (differentially expressed in at least 2 conditions):
- 26 complexes have >90% of their genes assigned to the complex
- 15 complexes have between 50 and 90% of their genes assigned to the complex
- 8 complexes have between 10 and 50% of their genes assigned to the complex
- 14 complexes have none of their genes assigned
Coat Complexes (25 genes; 45 selected conditions)
18 genes assigned, 7 genes unassigned
[Figure: GenePro (Cytoscape)]
Replication Fork Complex
16 genes assigned, 14 genes unassigned
[Figure: GenePro (Cytoscape)]
COMPLEX                             GENE    ORF       P(da)
DNA polymerase I primase complex    POL1    YNL102W   0.95155315
                                    PRI2    YKL045W   0.92757655
                                    POL12   YBL035C   0.8740073
                                    PRI1    YIR008C   0.14922267
DNA polymerase III                  POL32   YJR043C   0.88528264
                                    HYS2    YJR006W   0.79712089
                                    CDC2    YDL102W   0.79079457
DNA polymerase II                   POL2    YNL262W   0.85845235
                                    DPB2    YPR175W   0.7575824
                                    DPB3    YBR278W   0.64557167
Exonucleases                        RAD27   YKL113C   0.99432872
PCNA                                POL30   YBR088C   0.69668482
Replication factor A complex        RFA1    YAR007C   0.98673268
                                    RFA2    YNL312W   0.87875323
                                    RFA3    YJL173C   0.73210048
Topoisomerases                      TOP1    YOL006C   0.82395424
                                    TOP2    YNL088W   0.73032433
DNA helicases                       ECM32   YER176W   0.23499023
                                    DNA2    YHR164C   0.05406037
DNA ligases                         CDC9    YDL164C   0.03708159
DNA polymerase IV                   POL4    YCR014C   0.03919368
DNA polymerase                      MIP1    YOR330C   0.03182497
DNA polymerase ζ                    REV7    YIL139C   0.05724429
                                    REV3    YPL167C   0.04090946
Replication factor C complex        RFC4    YOL094C   0.50870756
                                    RFC5    YBR087W   0.33589936
                                    RFC3    YNL290W   0.26089619
                                    RFC2    YJR068W   0.10584985
                                    RFC1    YOR217W   0.05246123
RNase H1                            RNH1    YMR234W   0.02989109
Replication Fork Complexes
Regulon predicted on the basis of predicted regulatory motifs (Simonis et al., 2004)
[Figure: scatter plot of P_Co-Regulation (Expression) against P_Co-Regulation (Motifs), both axes from 0 to 1.2; linear fit y = 0.6764x + 0.2066, R² = 0.5906]
Conclusion II
Some complexes behave as a single transcriptional module.
Others do not: they contain more than one module, usually a large and a smaller one, whose expressions are often anti-regulated; or none at all.
What are the biological reasons for this behaviour?
Question currently under investigation
Acknowledgements
Nicolas Simonis (ULB, Belgium)
Didier Gonze (ULB, Belgium)
Didier Croes (ULB, Belgium)
Jacques van Helden (ULB, Belgium)
Chris Orsi (HSC, Toronto)
Mark Superina (HSC, Toronto)
Gina Liu (HSC, Toronto)
Shuye Pu (HSC, Toronto)
Jim Vlasblom (HSC, Toronto)
CCB Systems Support team (HSC, Toronto)

Ran Kafri (Weizmann Inst., Israel)
Werner Mewes & team, MIPS (CYGD)
Funding: EU CYGD project; CIHR; HSC