bioinformatics and evolutionary genomics : pathway evolution
Post on 19-Dec-2015
Bioinformatics and Evolutionary Genomics: Pathway evolution
What is a pathway ?
-An ordered set of proteins and substrates (boundaries)
-A graph
-A system (systems biology) (includes a notion of function, regulation)
-A set of proteins that “do something together” (includes complexes, regulatory and signalling pathways), a.k.a. a functional module
-A set of proteins that are co-regulated, or behave similarly in evolution
Tracing the evolution of NADH:ubiquinone oxidoreductase (Complex I of oxidative phosphorylation), from 14 subunits (Bacteria) to 46 subunits (Mammals), by comparative genome analysis
Bacteria: 14 subunits
Algae: 30
Fungi: 37
Mammals: 46
Plants: 30
Name Name Bt Mm Tr Dm Ag Ce Nc Ca Yl Sc Sp At Atc Cr Bacteria
NU1M Chain1 NuoH
NU2M Chain2 NuoN
NU3M Chain3 NuoA
NU4M Chain4 NuoM
NULM Chain4L NuoK
NU5M Chain5 NuoL
NU6M Chain6 NuoJ
NUAM 75kD NuoG
NUBM 51kD NuoF
NUCM 49kD NuoC
NUGM 30kD NuoD
NUHM 24kD NuoE
NUIM TYKY NuoI
NUKM 20kD NuoB
ACPM SDAP COG0236
NUEM 39kD COG0702
N5BM B14.7
NESM ESSM
NI8M B8
NUYM AQDQ NOG07158
NIDM PDSW
NUFM B13
NUPM 19kD
NIPM 15kD
Distribution of Complex I subunits among model species: in red, identified at the protein level (exp.); in yellow, at the gene level.
Column groups: Fungi, Mammals, Plants/Algae, Arthro.
NUMM 13kD COG4391
N7BM B17.2 COG3761
NI2M B22
NB6M B16.6
NB8M B18
NB4M B14
NB2M B12
CI30 CI30 ZP_00241795
CI84 CI84
NIMM MWFE
NB5M B15
NIAM ASHI
NIGM AGGG
NISM SGDH
NUDM 42kD COG1428
N4AM B14.5a
NB7M B17
NUOM 9/10kD
N4BM B14.5b
NUML MLRQ
NINM MNLL
NIKM KFYI
NI9M B9
NUXM 20.9kD
NUZM 21.3a
NURM 17.8kD
Plant1 25/27kD FBP-like
Plant2 30/32kD FBP-like
Plant3 29kD FBP-like
Plant4 6kD
Plant5 8kD
Plant6 17kD
Plant7 NDH11
Plant8 NDH16
Plant9 9kD
Plant10 16kD
Plant11 19kD
Bt Mm Tr Dm Ag Ce Nc Ca Yl Sc Sp At Cr
Distribution of Complex I subunits among model species: in red, identified at the protein level (exp.); in yellow, at the gene level; in white, at the DNA level.
Column groups: Fungi, Mammals, Plants/Algae, Insects
Reconstructing Complex I evolution by mapping the variation onto a phylogenetic tree. After an initial “surge” in complexity (from 14 to 35 subunits in early eukaryotic evolution), new subunits have been gradually added and occasionally lost.
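As an illustration of mapping presence/absence onto a tree, here is a minimal sketch using Fitch parsimony. This is not necessarily the method used in the study. The tree shape, the presence/absence values and the function name `fitch` are assumptions made for this example; only the species codes come from the table above.

```python
def fitch(tree, states):
    """Fitch small parsimony for one binary character.

    tree:   nested 2-tuples whose leaves are species codes (strings)
    states: dict mapping species code -> 1 (subunit present) or 0 (absent)
    Returns (candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):              # leaf: state is observed
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:                             # children agree: no extra change
        return common, left_changes + right_changes
    # children disagree: one gain or loss is needed on this part of the tree
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical example: a subunit present everywhere except in Sc.
tree = ((("Bt", "Dm"), ("Nc", "Sc")), "At")
states = {"Bt": 1, "Dm": 1, "Nc": 1, "Sc": 0, "At": 1}

root_states, n_changes = fitch(tree, states)
print(root_states, n_changes)  # {1} 1
```

In this toy case the subunit is inferred to be ancestral, with a single change (a loss on the Sc lineage). Running the same function per subunit gives a rough count of gains and losses along the tree.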
Complex I loss is not always “complete”: S. cerevisiae and S. pombe have retained 1 and 3 proteins, respectively.
Six of the eukaryotic Complex I proteins have been “recruited” from the alpha-proteobacteria
Beyond Blastology, Cogoly: Phylogenies for orthology prediction
The Complex I assembly protein CI30 has been duplicated in the Fungi. This can explain the presence of a CIA30 homolog in Complex I-less S. pombe
In the eukaryotic evolution of Complex I, new subunits have been added “all over” the complex
Gabaldon et al, J. Mol. Biol 2005
The eukaryotic evolution of Complex I, in which individual subunits have been added to a growing complex, contrasts with prokaryotic evolution, in which separate multi-protein complexes appear to have been assembled (T. Friedrich).
An explanation for this contrast is the “operon” genome organization of prokaryotes, which facilitates the duplication of sets of interacting proteins.
COG0021 COG0213 COG2820
ribose phosphate metabolism (not cohesive at all)
COG0707 COG0769 COG0770 COG0771 COG0773 COG0796 COG0812 COG1181
peptidoglycan biosynthesis pathway (highly cohesive, but far from perfect)
Is this variation in subunits the exception or the rule for functional modules?
Very few functional modules are perfect; limited cohesiveness; functional units vs evolutionary units
Non-orthologous gene displacement / analogous proteins
Not specific to the “genome” age, but research into this topic has increased dramatically with the availability of complete genomes. (People would encounter “missing links” and start hypothesizing about what could fill the gap.)
First systematic analysis on M. genitalium (Koonin et al., Trends Genet. 1997)
The opposite of co-occurrence: anti-correlation / complementary patterns: predicting analogous enzymes
Genes with complementary phylogenetic profiles tend to have a similar biochemical function.
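A minimal sketch of how complementarity between two phylogenetic profiles could be scored. The score definition, the function name `complementarity` and the example profiles are assumptions for illustration, not the measure used in the cited work.

```python
from typing import Sequence


def complementarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of genomes carrying exactly one of two genes,
    among the genomes that carry at least one of them.

    a, b: presence (1) / absence (0) of each gene across the same genomes.
    1.0 means the genes never co-occur (fully complementary);
    0.0 means they always co-occur (or neither gene occurs anywhere).
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    either = sum(1 for x, y in zip(a, b) if x or y)
    exactly_one = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return exactly_one / either if either else 0.0


# Hypothetical profiles over six genomes.
gene_a = [1, 1, 0, 0, 1, 0]
gene_b = [0, 0, 1, 1, 0, 0]   # never co-occurs with gene_a
gene_c = [1, 1, 0, 0, 1, 1]   # mostly co-occurs with gene_a

print(complementarity(gene_a, gene_b))  # 1.0
print(complementarity(gene_a, gene_c))  # 0.25
```

A high score alone is weak evidence: two unrelated, patchily distributed genes can look complementary by chance, so candidate pairs would normally also be checked for genomic or functional context.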
Complementary patterns in thiamin biosynthesis predict analogous enzymes
Prediction of analogous enzymes is confirmed
(recent) Gene Duplication
• fate after duplication: neofunctionalization or subfunctionalization
• GO process / molecular function / cellular component
• Substrate vs catalytic site / mechanism
subfunctionalization: example in terms of protein complexes (=GO cellular component)
neofunctionalization: example in terms of protein complexes (=GO cellular component)
Sub vs neo in regulatory context
OLD VIEW
NEW VIEW
Moore and Purugganan 2005 b
An example of a metabolic pathway: Histidine Metabolism (including biosynthesis) in KEGG
Histidine Biosynthesis in EcoCyc
Pathway evolution:
How to evolve a complex thing when the intermediates don’t make sense? See the discussion regarding the evolution of the eye.
Pathway evolution occurs at two levels:
-which substrate will be turned into which product
-Get the proteins to catalyze the required reactions
Model of Horowitz (1945): “Retrograde evolution” (back propagation by gene duplication within the pathway)
1) Given a good “soup”, first evolve the enzyme for the last step of the pathway (the other intermediates are in the soup)
2) Secondly, as the substrate of the last step is the product of the preceding step, the enzymes need similar binding sites: duplicate the gene encoding the last step to evolve the last-minus-one step
3) Iterate step 2
Figure: Horowitz model of pathway evolution (labels: time; gene duplication, twice; enzyme; substrate; end product)
We have data!! (no time machine), but we can test whether homologous proteins tend to cluster in pathways.
Some pathways do display such clustering, e.g. tryptophan and histidine biosynthesis contain subsequent steps catalyzed by homologous proteins
Teichmann et al, Trends Biotechn. 2001
Homologous proteins are overrepresented at short distances within pathways, supporting the Horowitz model.
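A minimal sketch of the kind of test this implies: for each distance between steps, what fraction of enzyme pairs belong to the same family? It assumes strictly linear pathways and pre-assigned family labels. The function name and the toy data are hypothetical, and a real analysis would use distances in the metabolic network and compare against a random expectation.

```python
from collections import defaultdict
from itertools import combinations


def homolog_fraction_by_distance(pathways):
    """pathways: list of linear pathways, each an ordered list of family
    labels (one label per enzymatic step).
    Returns {distance: fraction of enzyme pairs at that distance that
    share a family label}."""
    total = defaultdict(int)
    homologous = defaultdict(int)
    for steps in pathways:
        for (i, fam_i), (j, fam_j) in combinations(enumerate(steps), 2):
            distance = j - i               # number of steps apart
            total[distance] += 1
            if fam_i == fam_j:
                homologous[distance] += 1
    return {d: homologous[d] / total[d] for d in sorted(total)}


# Toy data: two pathways, each with one pair of consecutive homologs.
pathways = [
    ["F1", "F1", "F2", "F3"],
    ["F4", "F5", "F5"],
]
print(homolog_fraction_by_distance(pathways))  # {1: 0.4, 2: 0.0, 3: 0.0}
```

An excess at distance 1 relative to larger distances is the pattern the Horowitz model would predict.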
Alternative theory of pathway evolution:
Jensen, 1976: Enzyme recruitment in evolution of new function
-Primordial enzymes were multifunctional (“substrate ambiguity”)
-Ordered pathways evolved from these enzymes by gene duplication followed by specialization (recruitment)
How many proteins are really multifunctional?
Example: finding the fructose-1,6-bisphosphatase in the Archaea
Stec B, Yang H, Johnson KA, Chen L, Roberts MF. MJ0109 is an enzyme that is both an inositol monophosphatase and the 'missing' archaeal fructose-1,6-bisphosphatase. Nat Struct Biol. 2000 Nov;7(11):1046-50.
A number of multifunctional enzymes are being discovered, but the question remains whether multifunctional enzymes played a larger role in early evolution
Structural assignments and sequence comparisons were used to show that 213 domain families constitute approximately 90% of the enzymes in the small-molecule metabolic pathways. Catalytic or cofactor-binding properties between family members are often conserved, while recognition of the main substrate with change in catalytic mechanism is only observed in a few cases of consecutive enzymes in a pathway. Recruitment of domains across pathways is very common, but there is little regularity in the pattern of domains in metabolic pathways. This is analogous to a mosaic in which a stone of a certain colour is selected to fill a position in the picture. (Teichmann et al., 2001)
Pathway evolution operates mainly by recruitment, not by Horowitz’ retrograde evolution. (Notice that this is not so surprising, given what we learned on day 2: substrate specificities are relatively volatile aspects of enzyme evolution, while catalytic function is much better conserved. The “conservation of substrate binding, evolution of catalytic function” argument is not really what one encounters in present-day evolution.)
This does not necessarily support the Jensen theory of substrate ambiguity.
Pathway duplication: multiple functionally interacting proteins are co-duplicated and together take their place in a new pathway.
Pathway duplication
Pathway duplication at the protein level: homologous (sometimes identical) proteins are used to catalyze a chain of similar reactions
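Figure: two parallel pathways catalyzed by homologous enzymes.
Propionate branch:
propionate + ATP + CoA -> propionyl-CoA + AMP + PPi (propionyl-CoA synthase)
propionyl-CoA + H2O + oxaloacetate -> 2-methylcitrate + CoA (2-methyl citrate synthase)
2-methylcitrate -> 2-methylisocitrate (aconitase + prpD)
2-methylisocitrate -> succinate + pyruvate (2-methyl isocitrate lyase)
Acetate branch:
acetate + ATP + CoA -> acetyl-CoA + AMP + PPi (acetyl-CoA synthase)
acetyl-CoA + H2O + oxaloacetate -> citrate + CoA (citrate synthase)
citrate -> isocitrate (aconitase)
isocitrate -> succinate + glyoxylate (isocitrate lyase)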
Pathway duplication between (methyl)citric acid metabolism and Amino-Acid biosynthesis (Lysine, Leucine)
Lys20 homologous to LeuA, not GltA
HacAB homologous to LeuCD, Acn
PH1722 homologous to icd, LeuB
Methods: define paraCOGs
1) Start from all COGs & NOGs
2) Align (Muscle) -> MSAs
3) Create HMM profiles (HHmake) -> HMMs
4) All vs. all profile-profile searches (HHsearch) -> raw output
5) Assign homology
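The last step, assigning homology, can be sketched as single-linkage clustering of COGs connected by sufficiently strong profile-profile hits. The input format (query, target, probability), the cut-off of 90 and the identifiers `COG_A` to `COG_D` are assumptions for illustration; the slides do not state the actual criterion.

```python
def cluster_homologs(cogs, hits, min_prob=90.0):
    """Group COGs into candidate paraCOGs by single linkage.

    cogs:     iterable of COG/NOG identifiers
    hits:     iterable of (query, target, probability) tuples, e.g. parsed
              from profile-profile search output (parsing not shown);
              every query/target must also appear in `cogs`
    min_prob: hypothetical cut-off; hits below it are ignored
    Returns a list of sorted clusters, ordered by their first member.
    """
    parent = {c: c for c in cogs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, target, prob in hits:
        if query != target and prob >= min_prob:
            root_q, root_t = find(query), find(target)
            if root_q != root_t:
                parent[root_t] = root_q

    clusters = {}
    for c in parent:
        clusters.setdefault(find(c), set()).add(c)
    return sorted((sorted(m) for m in clusters.values()), key=lambda m: m[0])


# Placeholder identifiers and scores, not real search results.
cogs = ["COG_A", "COG_B", "COG_C", "COG_D"]
hits = [
    ("COG_A", "COG_B", 99.1),
    ("COG_B", "COG_C", 95.0),
    ("COG_C", "COG_D", 40.0),   # below the cut-off: ignored
    ("COG_A", "COG_A", 100.0),  # self-hit: ignored
]
print(cluster_homologs(cogs, hits))
# [['COG_A', 'COG_B', 'COG_C'], ['COG_D']]
```

Single linkage is the simplest choice, but it can chain distant families together through one spurious hit, so a stricter cut-off or a reciprocal-hit requirement may be preferable.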
Methods: define functional modules
Functional module: primary building block of biomolecular systems, i.e. metabolic or signaling pathway or protein complex
1) Start from all COGs & NOGs and the STRING dataset
2) Functionally linked COG pairs, recalculated for genomic context links only (npf)
3) Clustering (CFinder) -> ‘rough’ functional modules
4) Iterative module subclustering (CFinder) -> specific functional modules
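CFinder is based on the clique percolation method: modules are unions of k-cliques that can be reached from one another through cliques sharing k-1 nodes. Below is a minimal brute-force sketch of that idea for small graphs. It is not CFinder's implementation, and the function name, the value of k and the toy graph are assumptions.

```python
from itertools import combinations


def clique_percolation(edges, k=3):
    """Return communities as sorted node lists.

    edges: iterable of (u, v) pairs, e.g. functionally linked COG pairs.
    Two k-cliques are adjacent if they share k-1 nodes; a community is
    the union of all cliques in one connected chain of adjacent cliques.
    Nodes that are in no k-clique are not assigned to any community.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate k-cliques by brute force (only feasible for small graphs).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(j)] = find(i)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(sorted(m) for m in communities.values())


# Toy graph: two triangles sharing an edge, plus a third triangle
# attached through a single link (D-E).
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("B", "D"), ("C", "D"),
    ("D", "E"),
    ("E", "F"), ("E", "G"), ("F", "G"),
]
print(clique_percolation(edges))
# [['A', 'B', 'C', 'D'], ['E', 'F', 'G']]
```

Raising k gives smaller, more tightly connected modules, which is one way to picture the iterative subclustering step. Unlike a plain partition, this method lets a protein belong to more than one module.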
Tracing the evolution of the NQR/RNF reductases
Duplication of NqrDE/RnfAE occurred prior to module duplication
Reconstruction of the evolution of the NQR-RNF reductases
• Sub-functionalization on the protein complex level
Redox-driven Na+-pump
Reductase of proteins involved in nitrogen-fixation
Pathway duplication is prevalent in signalling and transport pathways. (The evolution of the MAP kinase pathways: coduplication of interacting proteins leads to new signaling cascades. Caffrey DR, O'Neill LA, Shields DC. J Mol Evol 1999 Nov;49(5):567-82)
Pathway duplication in signaling pathways is:
1) Easy because one does not have to change the substrate specificity
2) Hard because one does not want too much crosstalk…
Is it one duplication of the entire pathway or stepwise duplication?
Pathway evolution scenarios