The slow road to the eukaryotic genome
Leo Lester, Andrew Meade, and Mark Pagel*
Summary
The eukaryotic genome is a mosaic of eubacterial and archaeal genes in addition to those unique to itself. The mosaic may have arisen as the result of two prokaryotes merging their genomes, or from genes acquired from an endosymbiont of eubacterial origin. A third possibility is that the eukaryotic genome arose from successive events of lateral gene transfer over long periods of time. This theory does not exclude the endosymbiont, but questions whether it is necessary to explain the peculiar set of eukaryotic genes. We use phylogenetic studies and reconstructions of ancestral first appearances of genes on the prokaryotic phylogeny to assess evidence for the lateral gene transfer scenario. We find that phylogenies advanced to support fusion can also arise from a succession of lateral gene transfer events. Our reconstructions of ancestral first appearances of genes reveal that the various genes that make up the eukaryotic mosaic arose at different times and in diverse lineages on the prokaryotic tree, and were not available in a single lineage. Successive events of lateral gene transfer can explain the unusual mosaic structure of the eukaryotic genome, with its content linked to the immediate adaptive value of the genes it acquired. Progress in understanding eukaryotes may come from identifying ancestral features such as the eukaryotic spliceosome that could explain why this lineage invaded, or created, the eukaryotic niche. BioEssays 28:57–64, 2006. © 2005 Wiley Periodicals, Inc.
Introduction
The phylogenetic placement of the eukaryotes among the
prokaryotes has been called ‘‘evolution’s sorest spot’’.(1)
Almost thirty years ago, Woese suggested a classification
system that divided life into three domains:(2) the Eukaryota,
the Eubacteria and the Archaea. Phylogenetic studies using
ribosomal markers and other essential genes gave support to
Woese’s view by showing that the domains were monophy-
letic,(3–5) and primordially duplicated genes placed the root
of all life within the eubacteria, leaving the archaea and
eukaryotes as sister taxa.(6,7)
Recent genomic studies have begun to complicate this
picture. As further genes have been sequenced, so the position
of the eukaryotes has been found to jump. Ribosomal RNA might
place the eukaryotes alongside the archaea, but other genes
make them sister to the eubacteria.(8) More generally, so-called
informational genes, those involved in essential housekeeping
duties, return topologies in which the eukaryotes and archaea
are sisters, while metabolic genes give trees which place
the eukaryotes closer to the eubacteria.(8,9) These alternative
phylogenies stem from the unusual nature of the eukaryotic
genome: it turns out to be a mosaic of the prokaryotic
domains.(10–12)
How this mosaic formed is fundamental to theories of the
origin and early evolution of the lineage that eventually gave
rise to the eukaryotes. One idea proposes that the mosaic
genome arose from an ancient fusion event between an
archaeon and a eubacterium,(10) possibly deriving from a
symbiotic relationship between the two. Whether or not this
early fusion ever occurred, a second idea links the mosaic
to the endosymbiotic origin of the mitochondria.(13,14) Theory
proposes that the endosymbiont was eubacterial(15)
and that, over time, many of its genes transferred to the
eukaryotic nucleus.(16) Whether genes were acquired from an
endosymbiont or some other source, it has been suggested
that the majority coded for metabolic capabilities,(17,18)
although more recent work shows that informational genes
of apparently eubacterial origin can also be found in the
eukaryotic nucleus.(19)
Either of the fusion hypotheses can explain the broad
features of the eukaryotic mosaic (its complements of
eubacterial and archaeal genes), but neither excludes
a third possibility: that myriad lateral gene transfer
events among the prokaryotes over long periods of time slowly
built up a unique lineage, one that we now recognise as
eukaryotic.(20) This slow-drip scenario does not doubt the
existence of the endosymbiont, but questions whether it is
necessary to invoke it to explain the nuclear mosaic.(17,20)
As part of an already functioning organism, many of the
endosymbiont’s functions may have been redundant or
unnecessary,(22) and it may simply have lost many of its genes.
School of Animal and Microbial Sciences, The University of Reading
University, UK.
Funding agency: Biotechnology and Biological Sciences Research
Council (UK); Grant numbers: 45/G14980 and 45/G19848 awarded to
Mark Pagel.
*Correspondence to: Mark Pagel, The University of Reading Uni-
versity, School of Animal and Microbial Sciences, Whiteknights, PO
Box 228, Reading RG6 6AJ, UK. E-mail: [email protected]
DOI 10.1002/bies.20344
Published online in Wiley InterScience (www.interscience.wiley.com).
BioEssays 28:57–64, © 2005 Wiley Periodicals, Inc. BioEssays 28.1 57
Problems and paradigms
Our interest here is to assess the evidence for the slow-drip
hypothesis, using phylogenetic methods applied to data on the
presence and absence of genes in both prokaryotes and
eukaryotes. We first show how successive events of lateral
gene transfer can produce both a mosaic genome and
phylogenetic patterns indistinguishable from those that fusion
arguments predict. We then use statistical models to
reconstruct probable first appearances of the metabolic and
informational genes that make up the eukaryotic mosaic. We
find that the number of genes that can be explained by fusion or
slow-drip theories (as opposed to being present ancestrally)
is small when compared to a typical prokaryotic genome. We
show that these genes first appeared in the prokaryotes at very
different times and in diverse lineages: they were not all
available in a single species for a fusion or endosymbiosis
event to transfer them to the eukaryote. We conclude with a
discussion of how the slow-drip view fits with our knowledge of
the origin of the peculiar features associated with eukaryotes
and with features of the mitochondrial proteome.
Fusion and a ring of life
Fusion provides an explanation for how the eukaryotic
genome acquired its unusual mosaic collection of genes. This
supposition, though, never received specific support until a
recent novel phylogenetic argument purporting to find not a
tree of life, but a ring of life.(23) Lake and Rivera(23) show that
when two genomes fuse to produce a new third species, a
peculiar distribution of phylogenetic trees is expected to arise
from resamplings of the original data. This set of trees has the
property that, in some, the fusion species clusters closest to
one of the putative donors and, in other trees in the set, it
clusters with the other donor species. Significantly, intermedi-
ate trees, in which the fusion species clusters with other
species that fall phylogenetically somewhere between the two
donors, are not expected. This means that the trees in the set
can be written as permutations of a cycle graph in which the
order of species is preserved.
Fig. 1 reproduces Lake and Rivera’s argument. For three
species (X, Y, Z), there are eight possible combinations of
the presence and absence of a gene. All these combinations
are possible but some are more likely than others depending
upon the phylogenetic relationships among the three species.
Now consider that species X and Y fuse their genomes to
produce a new species W. Whenever X or Y has a gene, W
receives it. Species Z is not part of the fusion. Of the eight
original combinations, only two are phylogenetically informative
about the relative position of W with respect to X or Y: those in
which exactly one of X or Y carries the gene, so that the four
species split two against two. One
of these favours a tree that places W next to X, the other a tree
in which W is placed next to Y. If the trees are aligned around
species W, they form a repeating pattern that can be described
by the ring shown in Fig. 1 (diagram iii). A graphical interpretation of this result
is that of a ring of life: W has received genes from both sides of
a phylogenetic ring that joins X, W and Y.
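The counting argument can be checked by brute force. The Python sketch below enumerates the eight presence/absence patterns for X, Y and Z, builds the fused genome as W = X OR Y, and keeps only the patterns that split the four species two against two, the only kind of binary site that can resolve an unrooted four-species tree. The function name is our own, and this illustrates the logic of Fig. 1 rather than reproducing Lake and Rivera's code.

```python
from itertools import product


def informative_patterns():
    """Return (pattern of X, Y, Z) and W's partner for every pattern that,
    after fusion, splits W, X, Y, Z two against two."""
    results = []
    for x, y, z in product((0, 1), repeat=3):
        w = x | y  # W inherits every gene present in either donor
        states = {"W": w, "X": x, "Y": y, "Z": z}
        if sum(states.values()) == 2:  # two present, two absent: a 2-2 split
            side = [n for n in states if states[n] == states["W"]]
            partner = next(n for n in side if n != "W")
            results.append(((x, y, z), partner))
    return results


for pattern, partner in informative_patterns():
    print(pattern, "-> W groups with", partner)
# (0, 1, 0) -> W groups with Y
# (1, 0, 0) -> W groups with X
```

The two survivors are the Y-only and X-only patterns, corresponding to sites i and ii of Fig. 1; every other pattern leaves W as a singleton or gives no split at all.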
Rivera and Lake(23) applied this logic to the analysis of gene
presence/absence data from three eubacterial, three archaeal
and two eukaryotic species, using their method of conditioned
reconstruction.(24) With genomes of unequal length, it is
impossible to say with certainty how similar or different two
species are. The reason is that, of the four categories of gene
presence/absence in two species, the proportion of genes in
the ‘both-genes-absent’ category is arbitrary, depending upon
the length of the longer genome. If one species had a
genome of 100 genes and two others had genomes of 75 genes each,
there would be up to 25 genes present in the species
with the longest genome that may not be present in either of
the genomes of the other two. These other two will look more
similar than perhaps they really are by virtue of being identical
on these arbitrary shared-absence sites. Lake and Rivera’s(24)
‘conditioned reconstruction’ algorithm chooses a genome
against which to condition all of the other genomes, thereby
giving a definable upper limit in any given data set to the
proportion of absent/absent sites.
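The arbitrariness of the ‘both-genes-absent’ category, and how restricting attention to one reference genome caps it, can be seen with toy gene sets. In this Python sketch genomes are sets of gene-family labels; conditioning is simplified to counting only the gene families present in a chosen reference genome (our reading of the idea, not a reimplementation of Lake and Rivera’s algorithm), and the genome contents are invented for illustration.

```python
def pair_counts(a, b, universe):
    """Tally (both present, only a, only b, both absent) for genomes a and b,
    counting only gene families in `universe`."""
    both = len(a & b & universe)
    only_a = len((a - b) & universe)
    only_b = len((b - a) & universe)
    neither = len(universe - a - b)  # the 'both-genes-absent' category
    return both, only_a, only_b, neither


s1 = set(range(75))        # two genomes of 75 gene families each
s2 = set(range(10, 85))

# Unconditioned: the absent/absent count depends on what else is in the data set.
print(pair_counts(s1, s2, set(range(100))))  # (65, 10, 10, 15)
print(pair_counts(s1, s2, set(range(200))))  # (65, 10, 10, 115)

# Conditioned on a reference genome: absent/absent can never exceed its size.
ref = set(range(50, 120))
print(pair_counts(s1, s2, ref))              # (25, 0, 10, 35)
```

Only the last entry changes between the first two calls; the shared-presence and single-presence counts are unaffected, which is why the absent/absent category is the problematic one.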
In two separate phylogenetic analyses using their conditioned
data, Rivera and Lake found sets of alternative
topologies as predicted from the fusion argument. In one, five
alternative topologies accounted for 96.3% of the trees
observed and, in the other, five alternative trees accounted
for 87.1% of the results. Higher proportions could be obtained
by collapsing some nodes. All the alternative topologies
could be aligned as in Fig. 1 to represent permutations of a
cycle graph, with some placing the eukaryotes closer to the
archaea, others closer to the eubacteria, and some placing
them in between the two domains. Rivera and Lake(23) conclude
that the three domains of life are connected not as a
phylogeny but as a ring of life in which fusion has caused genes
to flow into the eukaryotes from both prokaryote domains.
Figure 1. Drawn after Lake and Rivera (Fig. 3).(24) The table
shows the eight possible combinations of gene presence and
absence for the three species X, Y and Z. The eight
combinations arise with varying probabilities, labelled a to h,
which must sum to 1.0. W is a fusion of X and Y such that a gene
is present in W if it is present in either donor genome. Only two
patterns in the table are phylogenetically informative about the
relative position of W with respect to X or Y; these are denoted
i and ii. The phylogenies inferred from these two sites are
represented in the linear diagrams labelled i and ii respectively.
These linear diagrams do not show relationships in the way of a
conventional unrooted tree, but in terms of which species
comes out next to which. In i, W comes out next to Y, and Z next
to X; in ii, W is next to X. The remaining patterns yield
unresolved trees. The two resolved topologies that emerge
from a fusion can be visualised as a ring (diagram iii).
Lateral gene transfer and a ring of life
The strength of evidence that a ring-like tree structure provides
for fusion depends upon whether other processes could also
produce the predicted set of trees. Consider, in Fig. 1, that W’s
genome might have been shaped, not by a single fusion event,
but by successive events of lateral gene transfer from species
additional to those shown. Let a donor other than X contribute
one of the phylogenetically informative genes, or a donor other
than Y contribute the other. The trees characteristic of the ring
structure will still describe these species’ data even though no
fusion occurred. Phylogenetic similarity among closely related
species means that many will have similar complements of
genes, making them possible donors of the phylogenetically
informative genes.
A simple computer simulation shows that this argument can
be applied more generally. We simulated gene presence/
absence data for 2000 genes on a tree of eight taxa (Fig. 2A),
imagining them to consist of four eubacterial (B1–4) and four
archaeal (A1–4) species. Each of our species, therefore, had a
string of 2000 presence/absence codes, corresponding to
whether or not the species carried the gene at that locus. By
simulating the data, we avoid the problems of genomes of
unequal lengths for which the conditioned reconstruction
method was proposed. We inferred phylogenies using
standard maximum likelihood methods for binary data,
although simpler parsimony methods give the same results.
The simulated data returned the original tree.
We chose two of the simulated genomes, one from the
eubacterial domain (B4), the other an archaeon (A1), to act as
donors to form a hybrid ‘eukaryotic fusion’ genome (E)
comprising a complete hybrid of the two donors: if either or
both of the donors held a gene, then the hybrid did as well, just
as in the logic of Fig. 1. Next, we simulated evolution by
allowing the original species’ genomes to gain or lose genes
over a number of generations. A gene could only be gained by
a species if it was present in at least one other species; that is,
genes were not created de novo. Gains can, therefore, be
treated as events of lateral gene transfer. For the original eight
genomes, stabilizing selection was imposed so that mutations
(genes gained or lost) that moved the evolving genome away
from the simulated starting point were fixed at a lower
probability than mutations that moved it back. The genome
of the fusion species (E) was treated differently. At the
beginning of the generational simulations, we assigned it the
genome of one of its original donors (A1). We then imposed
directional selection on this genome such that newly acquired
genes that moved it towards the hybrid fusion genome were
fixed at a higher probability than were others. The ratios of
fixation probabilities for positive and negative mutations were
the same for all species.
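A stripped-down version of this generational scheme is sketched below in Python. It is an illustration under stated assumptions, not the simulation code used here: genomes are lists of 0/1 codes, one random locus per species is proposed to flip each generation, gains are allowed only if another species carries the gene (so every gain is a lateral transfer), and a change is fixed with probability `p_toward` if it matches the species’ target genome and `p_away` otherwise. The names `simulate` and `similarity`, the fixed number of generations (in place of running to equilibrium) and the three-species toy set-up are all hypothetical.

```python
import random


def simulate(start, targets, generations, p_toward, p_away, seed=0):
    """Evolve presence/absence genomes by gene gain (lateral transfer) and loss.

    start:    dict mapping species -> list of 0/1 codes (initial genomes)
    targets:  dict mapping species -> list of 0/1 codes the genome is pulled
              towards (its own start for stabilizing selection, the hybrid
              'fusion' genome for directional selection on E)
    p_toward: fixation probability of a change that matches the target
    p_away:   fixation probability of a change that moves away from it
    """
    rng = random.Random(seed)
    genomes = {sp: list(g) for sp, g in start.items()}  # copy; start is untouched
    names = list(genomes)
    n_loci = len(genomes[names[0]])
    for _ in range(generations):
        for sp in names:
            locus = rng.randrange(n_loci)
            current = genomes[sp][locus]
            if current == 0 and not any(
                genomes[other][locus] for other in names if other != sp
            ):
                continue  # no donor carries this gene: no de novo creation
            new = 1 - current
            p_fix = p_toward if new == targets[sp][locus] else p_away
            if rng.random() < p_fix:
                genomes[sp][locus] = new
    return genomes


def similarity(a, b):
    """Fraction of loci at which two presence/absence strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy set-up: two donors and a 'eukaryote' E that starts as a copy of A1.
setup_rng = random.Random(1)
n_genes = 200
A1 = [setup_rng.randint(0, 1) for _ in range(n_genes)]
B4 = [setup_rng.randint(0, 1) for _ in range(n_genes)]
hybrid = [a | b for a, b in zip(A1, B4)]  # fusion genome: union of donors
start = {"A1": A1, "B4": B4, "E": list(A1)}
targets = {"A1": A1, "B4": B4, "E": hybrid}
evolved = simulate(start, targets, generations=5000, p_toward=0.9, p_away=0.1)
print(similarity(start["E"], hybrid), similarity(evolved["E"], hybrid))
```

The simulations described above use eight background species plus E and stop once similarities settle; both are straightforward additions to this skeleton.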
We ran the generational simulation, allowing species to
gain or lose genes each generation, until an equilibrium
similarity was reached between the evolving genomes and
either their fixed starting points, or, in the case of the hybrid
fusion species, the fixed endpoint. This gave us evolved data,
influenced by lateral gene transfer, for each of the species.
With the hybrid fusion species E removed, the evolved data
reproduced the original tree. Analysing the simulated data
with the hybrid ‘eukaryote’ included produced six alternative
topologies that cumulatively account for 97% of the observed
trees (Fig. 2B). By aligning these six topologies around the
‘eukaryote’ species, it can be seen that they are permutations
of an apparently underlying cycle graph linking the eubacterial
(B), eukaryotic (E) and archaeal (A) species. All the topologies,
despite some phylogenetic uncertainty within ‘domains’,
conform to permutations of the basic B-E-A order, as shown
to the left of each topology. In some trees, the ‘eukaryotes’ are
ambiguously between the two ‘prokaryotic’ domains; in others,
they are clearly with either the ‘eubacteria’ (2.7%) or the
‘archaea’ (17.6%).
Figure 2. A: The tree used in the simulations of lateral gene
transfer. We imagine four eubacterial (B1–4) and four archaeal
(A1–4) species, giving a total of eight taxa. These correspond to
the labels in Fig. 1 as follows: B4 = X, and A1 = Y. The remaining
six species can be seen as various Zs: background species not
involved in the ‘fusion’. B: The trees arising from the simulation
of lateral gene transfer, and drawn to show the linear
permutations (after Rivera and Lake(23)). In addition to the
eight taxa in the tree in part A, there is now the single hybrid
eukaryotic species (here labelled E, but corresponding to Fig.
1’s W), giving a total of nine taxa. If our simulated data produce
a ring then we expect all the topologies to conform to
permutations of the basic B-E-A order. These permutations
are shown to the left of each topology and the cumulative
percentages of the topologies in the bootstrap sample to the
right. All six topologies are linear permutations of an underlying
cycle graph, despite some phylogenetic uncertainty within
‘domains’. Lateral gene transfer can produce cycle-graphs.
The simulations show that sets of trees conforming to
permutations of a cycle graph, as in Fig. 1, can arise solely
from a succession of lateral gene transfer events, and do not
require fusion. An objection might be that all that is shown by
these simulations is that we can impose directional selection
on a genome. But the deeper phylogenetic issue is that in any
data set in which there has been lateral gene transfer, species
will show affinities to more than one other species at the gene
level: lateral gene transfer produces conflicting phylogenetic
signals. If the conflicting signals in real data are divided among
other species, the sort of phylogenetic cycle graph that we
have produced can be observed in bootstrap samples.
Phylogenetic similarity among closely related species ensures
that many can act as donors and still give an apparently ring-
like result: stochastic variation means that there will always be
two that contribute more than others. No special mechanism is
required to explain species or lineages moving about in
phylogenetic trees. It is a consequence of conflicting phylogenetic
signal, which has many causes.
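The bookkeeping behind a summary like Fig. 2B (tallying bootstrap topologies, ranking them, accumulating percentages, and asking whether each is a linear permutation of the B-E-A ring) can be sketched in Python. Assumptions: each bootstrap tree is reduced to a left-to-right order of taxa as in the linear diagrams, and ‘conforming’ is read as every domain occupying one contiguous arc of the ring. The helper names and the five-taxon toy replicates are hypothetical and are not the data behind Fig. 2.

```python
from collections import Counter


def runs_around_ring(labels):
    """Collapse a linear order of domain labels into runs, treating the two
    ends as adjacent (the linear diagram is a cut through a ring)."""
    runs = [labels[0]]
    for lab in labels[1:]:
        if lab != runs[-1]:
            runs.append(lab)
    if len(runs) > 1 and runs[0] == runs[-1]:
        runs.pop()  # first and last runs meet across the cut
    return runs


def conforms(order, domain_of):
    """True if every domain occupies one contiguous arc of the ring."""
    labels = [domain_of[t] for t in order]
    return len(runs_around_ring(labels)) == len(set(labels))


def summarize(replicates, domain_of):
    """Rank bootstrap topologies by frequency, with cumulative percentages."""
    counts = Counter(tuple(r) for r in replicates)
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for order, k in counts.most_common():
        cumulative += 100.0 * k / total
        rows.append((order, k, round(cumulative, 1), conforms(order, domain_of)))
    return rows


domain_of = {"B1": "B", "B2": "B", "E": "E", "A1": "A", "A2": "A"}
replicates = (
    [["B1", "B2", "E", "A1", "A2"]] * 6    # E between the domains
    + [["E", "A1", "A2", "B1", "B2"]] * 3  # same ring, cut elsewhere
    + [["B1", "B2", "A1", "E", "A2"]] * 1  # E inside the archaea: breaks the ring
)
for row in summarize(replicates, domain_of):
    print(row)
```

Reading the order as a cut through a ring is what lets the second topology count as conforming even though E sits at one end of the linear diagram.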
Ancestral states
Even if lateral gene transfer can produce cycle-graphs of trees,
the eukaryote genome is unusual in having so many of its
genes seemingly derived from the eubacterial domain,(19) and this
is the feature often taken as evidence either of a fusion or an
endosymbiotic origin of the eukaryotic genome. How else
could the presence of so many genes be explained? Yet fusion
or endosymbiont theories must also explain why the many
genes available from a fusion or endosymbiotic event would be
retained, unless they had some immediate adaptive value.
Successive events of lateral gene transfer provide a plausible,
if pedestrian, mechanism: the many genes in eukaryotes of
apparently eubacterial origin are there because they have
been acquired over time for their adaptive value at the time of
acquisition, and not for some possible future function.
We cannot now assess the possible advantages these
genes conferred, but we can examine two issues that reflect on
the plausibility of the lateral gene transfer explanation. One of
these is to determine how many genes need explaining. The
eukaryotic genome is large and, although lateral gene transfer
is common,(25–27) we wish to know whether it is common
enough to explain the eukaryotic mosaic. The second issue is
related to the first. Whereas fusion theories for the mosaic
identify a single source, lateral gene transfer allows a multitude
of sources. We can use ancestral reconstructions to identify
the probable first appearances, on the prokaryotic tree, of
genes that are now found in the eukaryotic mosaic. These will
show whether they tend to be confined to a single lineage and
were all available at one time, or whether their appearances
are distributed throughout the prokaryotic tree.
The NCBI Clusters of Orthologous Groups, or COGs,
database(28) and the more recent dataset from Esser et al.(19)
can be used to investigate both these questions. The COGs
database records the presence or absence of 2597 sets of
orthologous proteins with informational or metabolic functions
in a large number of prokaryotic species plus several eukaryotes.
Esser et al. searched all the genes in the yeast (Saccharomyces
cerevisiae) genome, identifying 850 that had possible
homologues among the prokaryotes. The two numbers differ
because the COGs data set does not require that a COG
includes a eukaryote. We used both data sets to infer the
probable ancestral states of genes (present or absent) at each
of the nodes of a phylogeny of the prokaryotes and eukaryotes
(Fig. 3). Ancestral states were inferred from maximum
likelihood statistical methods, allowing for unequal rates of
gains and losses on the tree, and fitting a separate model to
each gene.(29)
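To make the kind of calculation involved concrete, here is a minimal Python sketch of likelihood-based ancestral state reconstruction for one gene under a two-state (absent/present) continuous-time model with separate gain and loss rates, using Felsenstein’s pruning algorithm. It is not BayesMultiState: the rates are supplied by hand rather than estimated for each gene by maximum likelihood, only the root is reconstructed, and the nested-list tree format is our own assumption.

```python
import math


def transition(gain, loss, t):
    """2x2 matrix P[i][j]: probability of state j after time t given state i
    (0 = absent, 1 = present) with gain rate `gain` and loss rate `loss`."""
    r = gain + loss
    e = math.exp(-r * t)
    return [
        [loss / r + gain / r * e, gain / r * (1 - e)],
        [loss / r * (1 - e), gain / r + loss / r * e],
    ]


def partials(node, gain, loss):
    """Pruning step: likelihood of the tip data below `node` for each state
    (0, 1) at `node`. A tip is an int (0 or 1); an internal node is a list of
    (child, branch_length) pairs."""
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in (0, 1)]
    out = [1.0, 1.0]
    for child, t in node:
        P = transition(gain, loss, t)
        child_L = partials(child, gain, loss)
        for s in (0, 1):
            out[s] *= sum(P[s][c] * child_L[c] for c in (0, 1))
    return out


def root_presence_probability(tree, gain, loss):
    """Posterior probability that the gene was present at the root, using
    the model's equilibrium frequencies as the prior on the root state."""
    prior = [loss / (gain + loss), gain / (gain + loss)]
    L = partials(tree, gain, loss)
    joint = [prior[s] * L[s] for s in (0, 1)]
    return joint[1] / sum(joint)


# ((present, present), (absent, present)), with branch lengths
tree = [([(1, 0.1), (1, 0.1)], 0.2), ([(0, 0.1), (1, 0.1)], 0.2)]
print(round(root_presence_probability(tree, gain=0.5, loss=1.0), 3))
```

Fitting a separate model to each gene, as described above, would wrap this likelihood (the prior-weighted sum of root partials) in a numerical optimizer over the gain and loss rates, and reconstructing internal nodes would add a second, downward pass.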
We first sought to identify the subset of genes in the
eukaryotes that are candidates for possible explanation either
by fusion or lateral gene transfer from the eubacteria. These
are genes that are found in both the eubacteria and in at least
one of the eukaryotes, but are inferred not to be ancestral
either to the archaea or to the common ancestor of the archaea
and eukaryotes. Our criteria ensured that genes inferred as not
ancestral were generally absent in all, or nearly all, extant
archaea. This may mean that we overestimate the size of the
candidate set for horizontal transfer. Nevertheless, it is
possible that some of the genes that we identify as absent
may have been ancestrally present but were later lost in all
archaea.
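The candidate criteria can be written as a simple filter. In this Python sketch the per-gene record and its field names are hypothetical, and ‘inferred not ancestral’ is taken, as an assumption, to mean that the reconstructed probability of absence at both the archaeal ancestor and the archaeal-eukaryotic ancestor reaches 0.7; the text does not state the exact threshold used for this step.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """Hypothetical per-gene summary; field names are illustrative only."""
    name: str
    in_eubacteria: bool        # present in at least one eubacterium
    n_eukaryotes: int          # number of sampled eukaryotes carrying the gene
    p_archaeal_root: float     # P(present) at the archaeal ancestor
    p_arch_euk_root: float     # P(present) at the archaea + eukaryote ancestor


def is_candidate(g: GeneRecord, threshold: float = 0.7) -> bool:
    """Found in eubacteria and at least one eukaryote, but inferred absent
    (with probability >= threshold) at both ancestral nodes."""
    inferred_absent = (1.0 - g.p_archaeal_root >= threshold
                       and 1.0 - g.p_arch_euk_root >= threshold)
    return g.in_eubacteria and g.n_eukaryotes >= 1 and inferred_absent


genes = [
    GeneRecord("g1", True, 2, 0.05, 0.10),   # eubacterial, missing in archaea
    GeneRecord("g2", True, 1, 0.90, 0.85),   # likely ancestral: excluded
    GeneRecord("g3", False, 2, 0.00, 0.00),  # no eubacterial copy: excluded
]
print([g.name for g in genes if is_candidate(g)])  # ['g1']
```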
Our procedure identifies 1100 orthologues from the COGs
database and 665 genes from the Esser et al. data. The
discrepancy in numbers probably arises because the COGs
data are constructed using more lenient sequence-similarity
rules than Esser et al. We also had two eukaryotes
(S. cerevisiae and Schizosaccharomyces pombe) in the
COGs data to match against the prokaryotes, compared to
just S. cerevisiae in the Esser et al. data set. Metabolic genes
predominate in both subsets (65% of the COGs and 60% of
the Esser sample), but large numbers of informational genes
have also apparently been gained from the eubacteria.
This accords with Esser et al.’s finding that eukaryotes, as
represented by yeast, have more genes of eubacterial
ancestry than archaeal ancestry.(19)
Problems and paradigms
60 BioEssays 28.1
Our first question concerned whether the number of genes
in the candidate sets calls for special explanations. In fact,
these numbers may suggest that lateral gene transfer has
played a smaller role in eubacterial evolution than is some-
times assumed. Given the timescales involved in the evolution
of eukaryotes, rates of lateral gene transfer would have to rise
little higher than 1 × 10⁻⁶ genes/year to account for 1100
genes. Taking a typical gene to be around 1 kb in length, this is
within the rate of 16 kb laterally transferred per million years
inferred for Escherichia coli.(30)
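As a worked check of this arithmetic, taking about 1 kb per gene as above:

```latex
\frac{1100\ \text{genes}}{1 \times 10^{-6}\ \text{genes yr}^{-1}} = 1.1 \times 10^{9}\ \text{yr},
\qquad
\frac{16\ \text{kb Myr}^{-1}}{1\ \text{kb gene}^{-1}} = 16\ \text{genes Myr}^{-1} = 1.6 \times 10^{-5}\ \text{genes yr}^{-1}.
```

So a rate of 1 × 10⁻⁶ genes/year sustained for about a billion years accounts for the 1100 candidate genes, and the E. coli estimate is roughly sixteen times that rate.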
To examine our second question, that of the phylogenetic
distribution of the candidate genes,we inferred thepoint of first
appearance of each gene on the tree in Fig. 3. In reconstruct-
ing first appearances on the tree, we adopt a liberal criterion,
requiring only a 70% confidence in the inference, well below
the conventional 95% criterion.(31) The numbers shown at
each node record the number of genes reconstructed to have
first appeared at that node: the first number corresponds to the
reconstructions for the COGs database and the second to
those from Esser et al. At the base of the tree, the COGs data
suggest that a greater proportion of the geneswere ancestral.
This almost certainly reflects the more lenient definition of a
homologue in that data set. By using a less-strict criterion for a
match between two genes, the COGs genes tend to be more
widely distributed phylogenetically, resulting in more being
reconstructed as present at the base.
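The 70% rule can be applied mechanically once posterior probabilities of presence are available at every node. In this Python sketch (an assumption about how such a rule could be implemented, since the text does not spell out the procedure), a node counts as a first appearance of a gene if the probability of presence there reaches the threshold but does not at its parent; counts are then tallied across genes. Node names and probabilities in the usage example are invented.

```python
from collections import Counter


def first_appearances(parent, p_present, threshold=0.7):
    """Nodes where a gene is first reconstructed as present.

    parent:    dict mapping node -> parent node (None for the root)
    p_present: dict mapping node -> posterior probability the gene is present
    A node qualifies if P(present) >= threshold there but not at its parent.
    """
    hits = []
    for node, p in p_present.items():
        if p < threshold:
            continue
        up = parent[node]
        if up is None or p_present[up] < threshold:
            hits.append(node)
    return hits


def tally(parent, genes, threshold=0.7):
    """Count first appearances per node across genes. Only gains are counted,
    so numbers at successive nodes must not be summed along a lineage."""
    counts = Counter()
    for p_present in genes.values():
        counts.update(first_appearances(parent, p_present, threshold))
    return counts


parent = {"root": None, "n1": "root", "t1": "n1", "t2": "n1", "t3": "root"}
genes = {
    "g1": {"root": 0.2, "n1": 0.8, "t1": 0.9, "t2": 0.95, "t3": 0.1},
    "g2": {"root": 0.1, "n1": 0.3, "t1": 0.75, "t2": 0.2, "t3": 0.9},
    "g3": {"root": 0.9, "n1": 0.9, "t1": 0.9, "t2": 0.9, "t3": 0.9},
}
print(tally(parent, genes))
```

A gene can register more than one first appearance (g2 above arises independently at t1 and t3), which matches counting gains node by node rather than once per gene. Lowering the threshold pushes first appearances towards the root, which is why a liberal criterion errs on the side of early origins.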
Despite these initial differences, the overall pattern in Fig. 3
is one of the gradual accumulation of genes over long periods
of time and in phylogenetically diverse lineages. The various
metabolic and informational functions that eukaryotes ac-
quired were not all invented in one lineage but arose, perhaps
in response to varying environmental demands, in a variety of
lineages. No single lineage carries a large proportion of the
candidate genes at any one time, and only a relatively small
number of genes is reconstructed to have been present early
in eubacterial evolution. This accords with analyses of whole
genomes showing that eukaryotes share genes with a wide
range of prokaryotes, with no single prokaryotic species
dominating.(19,32,33) In a few branches of the tree, compara-
tively large numbers of genes do arise. These tend to
correspond to a recent radiation of clades or groups of
species, suggesting that diversification into new niches
required new kinds of genetic functions.
The eubacteria are now highly genetically diverse, but logic
and the data give no reason to believe that they started out with
complex genomes and then diversified by a process of
sculpting away unnecessary or irrelevant tranches of genes,
even if specific species have undergone reductive evolution. In
combination with Fig. 3, this leaves fusion theories with an
awkward choice. For the fusion partner to have had enough
genes to explain the contemporary data, the event would have
had to have taken place near the tips of the prokaryotic tree.
But the tips of the tree extend back a few hundred million years,
not the 1.5 billion needed for the origin of the eukaryote
lineage.(34) If an earlier node is identified as the fusion partner,
then so much subsequent lateral gene transfer must be invoked
to complete the eukaryotic set of eubacterial genes that
the fusion event ceases to be a revolutionary point of origin.
Figure 3. Reconstructed first appearances of genes identified
as candidates for horizontal transfer to the eukaryotes (see text).
The tree is drawn to capture the main features of prokaryote
phylogeny supported in part and whole by several recently
published papers.(58–62) Different trees will alter specific details
but not the broad patterns. Gene presence/absence data were
taken both from the NCBI COGs database(28) and from Esser
et al.(19) The ancestral reconstruction programme used was
BayesMultiState.(29,31) Genes were identified as candidates for
horizontal transmission to the eukaryotes from the eubacteria if
they were present in the eubacteria and in at least one of the eukaryotes,
but were inferred not to be ancestral to the archaea or to
the common ancestor of the archaea and eukaryotes. This
procedure identifies 1100 orthologues from the COGs database
and 665 genes from the Esser et al. data. In reconstructing first
appearances on the tree, we adopt a liberal criterion, requiring
only a 70% confidence in the inference, well below the conventional
95% criterion.(31) The first number above each
node gives the number of genes reconstructed from the
COGs database to have first appeared at that node. The second
number gives the same quantity, this time drawn
from the Esser et al. dataset. Genes tend to appear gradually
throughout the tree and in diverse lineages: no one lineage has a
large proportion of the total genes. Our liberal reconstruction
criterion will tend to reconstruct genes earlier in prokaryote
evolution than a stricter criterion would. The tree does not
reconstruct the loss of genes, only the point at which new genes
are gained. For this reason, the numbers at sequential nodes
should not be added together to find the total number of genes for
a particular species. Nodes given the number zero signify
lineages in which there has been no net gain of new genes.
A potentially confounding influence in Fig. 3 is that the
genes reconstructed as first appearing near the tips may be so
highly divergent as not to be recognised as present in other
eubacterial species. If this were true, their first appearances
should be reconstructed earlier in the tree. This seems unlikely
because each of the genes that we reconstruct in Fig. 3 has
been identified as being present in yeast as well as in at least
one eubacterium. The Esser et al. data set, by providing
measures of sequence similarity between each yeast gene
and its eubacterial homologue, allows a test of this possibility.
Fig. 4 plots the average similarity scores for genes reconstructed
to make their first appearance at varying phylogenetic
distances from the root of the tree. If the rate of evolution does
confound the result, we would expect that the average
similarity should be lower for genes reconstructed higher up
the tree. The figure shows that, in contrast to this expectation,
the range of similarity scores for genes reconstructed the
furthest from the root is comparable to that for genes
reconstructed as appearing at the root of the tree.
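The comparison in Fig. 4 amounts to grouping genes by the path length from the root to their reconstructed first appearance and summarizing similarity scores within each group. A minimal Python sketch, assuming the input is a list of (path length, similarity) pairs, one per gene; the function name and the toy numbers are illustrative only and do not come from the Esser et al. data.

```python
from collections import defaultdict
from statistics import mean


def similarity_by_distance(records):
    """Group (path_length, similarity) pairs by path length from the root and
    return {path_length: (mean, minimum, maximum)} ordered by distance."""
    groups = defaultdict(list)
    for distance, score in records:
        groups[distance].append(score)
    return {d: (mean(s), min(s), max(s)) for d, s in sorted(groups.items())}


toy = [(0, 40.0), (0, 60.0), (3, 35.0), (3, 65.0), (7, 50.0)]
for d, (avg, low, high) in similarity_by_distance(toy).items():
    print(d, avg, low, high)
```

A mean falling steadily with path length would have pointed to rate-of-evolution artefacts; reporting the range alongside the mean mirrors the comparison made in the text.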
Another possibility is that these results are dependent on
the tree. The topology of the tree in Fig. 3 is conventional in
sharing many similarities to published phylogenies of the
prokaryotes. Cavalier-Smith has suggested that the eukar-
yotes and archaea should be placed on the actinobacterial
branch.(35) When we perform our ancestral reconstructions on
this tree, the same patterns of gradual accumulation still
emerge.
Conclusions
We find theoretical and empirical support for the notion that
successive events of lateral gene transfer over time, without
recourse to fusion or endosymbionts, can explain the broad
outlines of the eukaryotic mosaic. The presence of the
mitochondrion attests to endosymbiosis having occurred in
the eukaryotic lineage, and instances of transfer of genes from
the mitochondrion to the nucleus are well documented,(16,36)
but we do not find empirical evidence compelling us towards an
interpretation that relies upon such transfers to explain the
presence of the eubacterial fraction of genes in eukaryotes.
While it is true that the vast majority of the genes that code
for the mitochondrial proteome are found within the nu-
cleus,(37,38) comparatively few can be traced back to the
putative mitochondrial ancestor.(39,40) We compared a recent
list of the 750 genes that constitute the mitochondrial
proteome(38) to the 850 (nuclear) genes in the Esser et al.
dataset. Only 62 of these proteins had a homologue within the
prokaryotes, and, of those, only 51 had a homologue amongst
the eubacteria. The remaining genes in the mitochondrial
proteome are evidently eukaryotic inventions: most of the
endosymbiont’s genes have simply been lost.(17,21,40) This
may not be surprising. The transfer of genes from the
mitochondrion to the nucleus requires a sophisticated set of
controlling proteins and for each stage to retain its function-
ality.(16,39) The loss of genes from the mitochondrion might be
better understood as similar to the reductive evolution that
occurs in the genes of obligate parasites.(41)
The difference between the prokaryotes and eukaryotes in
terms of cell structure, genetics and even molecular biology is
so great that it was for long seen as the central divide in
biology.(42,43) There are no obvious transitional forms between
the two(44) and the gulf can at times seem insuperable,
demanding of some saltational event. Indeed, the sequential
evolution of prokaryotes to eukaryotes has long been
doubted,(45) even though no biologist now questions that all
life is related.(46) Such doubts were not misplaced: eukaryotes
are not a product of sequential evolution in the conventional
Darwinian sense; they are almost certainly a product of the
prokaryotic domains.(19,47) The absence of contemporary
mechanisms of lateral gene transfer in eukaryotes is some-
times taken as evidence that the eukaryotic hybrid could not
have arisen from events of this kind. But yeast are a derived
lineage and what matters is whether there could have been
Figure 4. The plot records the average sequence similarity
between genes in eubacteria and their yeast homologue for
each of the 665 genes in the Esser et al.(19) data, which
have been identified as candidates for lateral transfer from
the eubacteria to the eukaryotes. These are plotted against
the distance (path length) from the root of the tree to the node
where the gene is reconstructed to have first appeared among
the eubacteria. Although there is a trend for older genes
(reconstructed as having arisen nearer the root) to show higher
sequence similarity, this is an artefact caused by the second
group of genes from the left. Even genes reconstructed near
the tip of the tree can have high sequence similarity. Where a
gene is reconstructed to arise appears not to be confounded by
its rate of evolution.
Problems and paradigms
62 BioEssays 28.1
lateral gene transfer to the progenitor of all eukaryotes.
This supposition seems reasonable: lateral gene transfer
occurs in prokaryotes; phagocytosis, on the other hand,
seems not to.(48)
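Figure 4 plots two per-gene quantities against each other. As a rough sketch of how such axes and a trend check could be computed, the code below assumes a simple parent-pointer encoding of the tree and approximates "sequence similarity" as the fraction of identical residues over gap-free aligned columns. Both are assumptions: the caption does not state how similarity was measured. All function names are hypothetical, and the Pearson correlation is just one simple way to quantify a trend, not necessarily what was used for the figure.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical tree encoding (an assumption, not the authors' format):
# node name -> (parent name, length of the branch leading up to the parent).
# The root has parent None and branch length 0.0.
Tree = Dict[str, Tuple[Optional[str], float]]


def root_to_node_path_length(tree: Tree, node: str) -> float:
    """Sum branch lengths from `node` up to the root (the Figure 4 x-axis)."""
    total = 0.0
    current: Optional[str] = node
    while current is not None:
        parent, branch_length = tree[current]
        total += branch_length
        current = parent
    return total


def fraction_identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(x == y for x, y in compared) / len(compared)


def mean_identity(yeast_seq: str, bacterial_seqs: Sequence[str]) -> float:
    """Average identity of one yeast gene to each of its aligned eubacterial homologues."""
    return sum(fraction_identity(yeast_seq, s) for s in bacterial_seqs) / len(bacterial_seqs)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the paper.
    tree: Tree = {"root": (None, 0.0), "A": ("root", 0.2), "B": ("root", 0.5), "A1": ("A", 0.3)}
    depths = [root_to_node_path_length(tree, n) for n in ("A", "B", "A1")]
    sims = [mean_identity("MKLV-", ["MKLI-", "MRLV-"]), 0.4, 0.6]
    print(depths, sims, pearson_r(depths, sims))
```

Recomputing `pearson_r` after dropping a suspected cluster of genes is one simple way to see whether a trend like the one described in the caption is driven by a single subgroup rather than by the data as a whole.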
Understanding ‘eukaryote’ evolution partly rests on the
question of what is meant by their origin. Is endosymbiosis
the sine qua non of eukaryotes,(49) or do eukaryotes predate
the first endosymbiosis event?(42,50) Most theories that
attempt to describe the evolution of the eukaryotes rest on
the former supposition: some form of fusion is required
between prokaryotic cells, and it is the resultant symbiosis that
forms the eukaryotic lineage.(13,18) Even though there are no
extant primitively amitochondriate species,(51) mitochondria
cannot be seen as the diagnostic trait of eukaryotes, for many
extant taxa have lost them.(50,52) Whilst endosymbiosis
may have occurred very soon after the origin of
eukaryotes, it need not be seen as the process that formed
them.(50)
Modern ‘eukaryoteness’, then, is more than just a peculiar
genome and the presence of organelles. It is a set of traits that
is distinctive in its combination(48) and likely emerged not in a
singular event, but through the pulling together of many
different threads that only became available through time.
Individually, many of the genes that make the eukaryotic
mosaic distinctive are found in disparate taxa, spread across
the prokaryote phylogeny.(53–56) What requires a special explanation is not the presence of those genes, but why the eukaryote lineage seemingly invaded, or perhaps invented, a new ecological niche in which a mixed complement of archaeal, eubacterial and, eventually, many new eukaryotic genes and structures would be required. Recent evidence for an ancient origin of the unique and sophisticated eukaryotic spliceosome(57) opens one promising avenue of research into this most unusual lineage.
References
1. Martin W, Embley TM. 2004. Early evolution comes full circle. Nature 431:134–136.
2. Woese CR, Fox GE. 1977. Phylogenetic structure of the prokaryotic
domain: the primary kingdoms. Proc Natl Acad Sci USA 74:5088–5090.
3. Brown JR, Doolittle WF. 1995. Root of the universal tree of life based on
ancient aminoacyl-tRNA synthetase gene duplications. Proc Natl Acad
Sci USA 92:2441–2445.
4. Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ. 2001.
Universal trees based on large combined protein sequence data sets.
Nat Genet 28:281–285.
5. Daubin V, Gouy M, Perriere G. 2002. A phylogenomic approach to
bacterial phylogeny: evidence of a core of genes sharing a common
history. Genome Res 12:1080–1090.
6. Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. 1989. Evolutionary
relationship of archaebacteria, eubacteria and eukaryotes inferred from
phylogenetic trees of duplicated genes. Proc Natl Acad Sci USA 86:
9355–9359.
7. Gribaldo S, Cammarano P. 1998. The root of the universal tree of life
inferred from anciently duplicated genes encoding components of the
protein-targeting machinery. J Mol Evol 47:508–516.
8. Ribeiro S, Golding GB. 1998. The mosaic nature of the eukaryotic
nucleus. Mol Biol Evol 15:779–788.
9. Rivera MC, Jain R, Moore JE, Lake JA. 1998. Genomic evidence for two
functionally distinct gene classes. Proc Natl Acad Sci USA 95:6239–
6244.
10. Gupta R. 1997. Protein phylogenies and signature sequences: evolutionary relationships within prokaryotes and between prokaryotes and eukaryotes. Antonie van Leeuwenhoek 72:49–61.
11. Martin W. 1999. Mosaic bacterial chromosomes: a challenge en route to
a tree of genomes. BioEssays 21:99–104.
12. Katz LA. 2002. Lateral gene transfers and the evolution of eukaryotes:
theories and data. Int J Syst Evol Microbiol 52:1893–1900.
13. Martin W, Muller M. 1998. The hydrogen hypothesis for the first
eukaryote. Nature 392:37–41.
14. Lopez-Garcia P, Moreira D. 1999. Metabolic symbiosis at the origin of
eukaryotes. Trends Biochem Sci 24:88–93.
15. Margulis L. 1970. Origin of Eukaryotic Cells. New Haven: Yale University
Press.
16. Martin W, Herrmann RG. 1998. Gene transfer from organelles to the
nucleus: how much, what happens, and why? Plant Physiol 118:9–17.
17. Andersson SGE, Kurland CG. 1999. Origins of mitochondria and
hydrogenosomes. Curr Opin Microbiol 2:535–541.
18. Moreira D, Lopez-Garcia P. 1998. Symbiosis between methanogenic
archaea and δ-proteobacteria as the origin of eukaryotes: the syntrophic
hypothesis. J Mol Evol 47:517–530.
19. Esser C, Ahmadinejad N, Wiegand C, Rotte C, Sebastiani F, et al. 2004.
A genome phylogeny for mitochondria among α-Proteobacteria and a
predominantly eubacterial ancestry of yeast nuclear genes. Mol Biol Evol
21:1643–1660.
20. Doolittle WF. 1998. You are what you eat: a gene transfer ratchet could
account for bacterial genes in eukaryotic nuclear genomes. Trends
Genet 14:307–311.
21. Karlberg O, Canback B, Kurland CG, Andersson SGE. 2000. The dual
origin of the yeast mitochondrial proteome. Yeast 17:170–187.
22. Bapteste E, Gribaldo S. 2003. The genome reduction hypothesis and the
phylogeny of eukaryotes. Trends Genet 19:696–700.
23. Rivera MC, Lake JA. 2004. The ring of life provides evidence for a
genome fusion origin of eukaryotes. Nature 431:152–155.
24. Lake JA, Rivera MC. 2004. Deriving the genomic tree of life in the
presence of horizontal gene transfer: conditioned reconstruction. Mol
Biol Evol 21:681–690.
25. Lawrence J, Hendrickson H. 2003. Lateral gene transfer: when will
adolescence end? Mol Microbiol 50:739–749.
26. Brown J. 2001. Genomic and phylogenetic perspectives on the evolution
of prokaryotes. Syst Biol 50:497–512.
27. Garcia-Vallve S, Romeu A, Palau J. 2000. Horizontal gene transfer
in Bacterial and Archaeal complete genomes. Genome Res 10:1719–
1725.
28. Tatusov RL, Galperin MY, Natale DA, Koonin EV. 2000. The COG
database: a tool for genome-scale analysis of protein functions and
evolution. Nucleic Acids Res 28:33–36.
29. Pagel M, Meade A, Barker D. 2004. Bayesian estimation of ancestral character states on phylogenies. Syst Biol 53:673–684.
30. Ochman H, Lawrence J, Groisman EA. 2000. Lateral gene transfer and
the nature of bacterial innovation. Nature 405:299–304.
31. Pagel M. 1999. The maximum likelihood approach to reconstructing
ancestral character states of discrete characters on phylogenies. Syst
Biol 48:612–622.
32. Brown JR. 2003. Ancient horizontal gene transfer. Nat Rev Genet 4:
121–132.
33. Doolittle WF. 1999. Phylogenetic classification and the universal tree.
Science 284:2124–2128.
34. Javaux EJ, Knoll AH, Walter M. 2003. Recognising and interpreting the
fossils of early eukaryotes. Origins Life Evol B 33:75–94.
35. Cavalier-Smith T. 2002. The phagotrophic origin of eukaryotes and
phylogenetic classification of Protozoa. Int J Syst Evol Microbiol 52:297–354.
36. Gray MW, Lang BF, Cedergren R, Golding GB, Lemieux C, et al. 1998.
Genome structure and gene content in protist mitochondrial DNAs.
Nucleic Acids Res 26:865–878.
37. Prokisch H, Scharfe C, Camp DG II, Xiao W, David L, et al. 2004.
Integrative analysis of the mitochondrial proteome in yeast. PLoS Biol
2:795–804.
38. Sickmann A, Reinders J, Wagner Y, Joppich C, Zahedi R, et al. 2003.
The proteome of Saccharomyces cerevisiae mitochondria. Proc Natl
Acad Sci USA 100:13207–13212.
39. Gabaldon T, Huynen M. 2004. Shaping the mitochondrial proteome.
Biochim Biophys Acta 1659:212–220.
40. Gray MW, Burger G, Lang BF. 2001. The origin and early evolution of
mitochondria. Genome Biol 2:1018.1–1018.5.
41. Andersson SGE, Kurland CG. 1998. Reductive evolution of resident
genomes. Trends Microbiol 6:263–268.
42. Cavalier-Smith T. 1987. The origin of eukaryote and archaebacterial
cells. Ann N Y Acad Sci 503:17–54.
43. Mayr E. 1998. Two empires or three? Proc Natl Acad Sci USA 95:9720–
9723.
44. Doolittle WF. 1998. A paradigm gets shifty. Nature 392:15–16.
45. Darnell JE Jr. 1978. Implications of RNA-RNA splicing in evolution of
eukaryotic cells. Science 202:1257–1260.
46. Doolittle RF. 2000. Searching for the common ancestor. Res Microbiol
151:85–89.
47. Horiike T, Hamada K, Miyata D, Shinozawa T. 2004. The origin of
eukaryotes is suggested as the symbiosis of Pyrococcus into
γ-Proteobacteria by phylogenetic tree based on gene content. J Mol Evol
59:606–619.
48. Vellai T, Vida G. 1999. The origin of eukaryotes: the difference between
prokaryotic and eukaryotic cells. Proc Roy Soc Lond Ser B 266:1571–
1577.
49. Doolittle WF. 1999. Rethinking the origin of eukaryotes. Biol Bull
196:378–380.
50. Embley TM, Hirt RP. 1998. Early branching eukaryotes? Curr Opin Genet
Dev 8:624–629.
51. Cavalier-Smith T. 2002. Origins of the machinery of recombination and
sex. Heredity 88:125–141.
52. Williams BAP, Hirt RP, Lucocq JM, Embley TM. 2002. A mitochondrial
remnant in the microsporidian Trachipleistophora hominis. Nature
418:865–869.
53. Searcy DG, Hixon WG. 1991. Cytoskeletal origins in sulfur-metabolising
archaebacteria. BioSyst 25:1–11.
54. Sioud M, Baldacci G, Forterre P, Recondo A. 1987. Antitumour
drugs inhibit the growth of halophilic archaea. Eur J Biochem 169:
231–236.
55. Lowe J, van den Ent F, Amos LA. 2004. Molecules of the bacterial
cytoskeleton. Annu Rev Biophys Biomol Struct 33:177–198.
56. Ferat J, Michel F. 1993. Group II self-splicing introns in bacteria. Nature
364:358–361.
57. Collins L, Penny D. 2005. Complex spliceosomal organization ancestral
to extant eukaryotes. Mol Biol Evol 22:1053–1066.
58. Battistuzzi FU, Feijao A, Hedges SB. 2004. A genomic timescale of
prokaryote evolution: insights into the origin of methanogenesis,
phototrophy, and the colonization of land. BMC Evol Biol 4:44.
59. Wolf YI, Rogozin IB, Grishin NV, Koonin EV. 2002. Genome trees and the
tree of life. Trends Genet 18:472–479.
60. Gupta R, Griffiths E. 2002. Critical issues in bacterial phylogeny. Theor
Popul Biol 61:423–434.
61. Qi J, Wang B, Hao B. 2004. Whole proteome prokaryote phylogeny
without sequence alignment: a K-string composition approach. J Mol
Evol 58:1–11.
62. Dutilh BE, Huynen M, Bruno WJ, Snel B. 2004. The consistent
phylogenetic signal in genome trees revealed by reducing the impact
of noise. J Mol Evol 58:527–539.