Flanking regions of monomorphic microsatellite loci provide a new source of data
for plant species-level phylogenetics
Lars W. Chatrou (a), M. Pilar Escribano (b), Maria A. Viruel (b), Jan W. Maas (c), James E.
Richardson (d), José I. Hormaza (b)
a Nationaal Herbarium Nederland, Wageningen branch, and Wageningen UR,
Biosystematics Group, Generaal Foulkesweg 37, 6703 BL, the Netherlands. Email:
b Estación Experimental La Mayora-CSIC, Algarrobo-Costa, Málaga 29750, Spain.
Email: [email protected], [email protected], [email protected]
c Nationaal Herbarium Nederland, Utrecht branch, Heidelberglaan 2, 3584 CS Utrecht,
the Netherlands. Email: [email protected]
d Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, United
Kingdom. Email: [email protected]
Corresponding author:
Dr. Lars W. Chatrou
Nationaal Herbarium Nederland, Wageningen branch, and Wageningen UR,
Biosystematics Group
Generaal Foulkesweg 37
6703 BL Wageningen
The Netherlands
Phone: +31 – 317 – 483854
Fax: +31 – 317 – 484917
Email: [email protected]
Abstract
Well-resolved phylogenetic trees are essential for us to understand evolutionary
processes at the level of species. The degree of species-level resolution in the plant
phylogenetic literature is poor, however, largely due to the dearth of sufficiently
variable molecular markers.
Unlike the common genic approach to marker development, we generated DNA
sequences of monomorphic nuclear microsatellite flanking regions in a phylogenetic
study of Annona species (Annonaceae). The resulting data showed no evidence of
paralogy or allelic diversity that would confound attempts to reconstruct the species
tree. Microsatellite flanking regions are short, making them practical to use, yet have
astounding proportions of variable characters. They have 3.5-10-fold higher substitution
rates compared to two commonly used chloroplast markers, have no rate heterogeneity
among nucleotide positions, evolve in a clock-like fashion, and show no evidence of
saturation. These advantages are offset by the short length of the flanking regions,
resulting in similar numbers of parsimony informative characters to the chloroplast
markers.
The neutral evolution and high variability of flanking regions, together with the wide
availability of monomorphic microsatellite loci in angiosperms, are useful qualities for
species-level phylogenetics. The general methodology we present here facilitates the discovery of
phylogenetic markers in groups for which microsatellites have been developed.
Key words: microsatellite flanking regions, species-level phylogenetics, neutral
evolution.
1. Introduction
Increased focus on species-level phylogenetics in angiosperms has encouraged the
pursuit of molecular markers that are capable of resolving phylogenetic relationships at
lower taxonomic levels, i.e. have a mutation rate that is fast enough to produce
sufficient variation (Crawford and Mort, 2004). The need for such markers resonates
within the literature (Bailey et al., 2004; Choi et al., 2006; Crawford and Mort, 2004;
Whittall et al., 2006), as only a small percentage of the published species phylogenetic
trees in plants are fully resolved (Hughes et al., 2006).
Chloroplast markers have been an important source of data for plant phylogenetics.
Apparent advantages of chloroplast markers are the relative abundance of chloroplast DNA
in total plant DNA and its relatively conservative mutation rates, facilitating extraction
and amplification using conserved primer binding sites. Furthermore, chloroplast
markers are essentially single-copy. This avoids the reconstruction of erroneous
organismal phylogenies due to the application of paralogous gene copies, which may be
a problem when applying nuclear markers (Bailey et al., 2003; Baker et al., 2000;
Sanderson and Shaffer, 2002).
The features that simplify the application of cpDNA markers at the species level are,
however, traded off against less desirable qualities for organismal species-level
phylogenetics (Sang, 2002). Chloroplast markers generally evolve at rates that are too
slow to provide sufficient phylogenetically informative characters over recent time
spans (Richardson et al., 2001), even after considerable data collection (Perret et al.,
2003; Pirie et al., 2006). This is not to say that incompletely resolved phylogenies are
meaningless. As long as critical nodes are well-supported they can serve to pinpoint
biogeographical phenomena (e.g. Erkens et al., 2007b), or to falsify current
classification based on morphological characters (e.g. Shaw and Small, 2004). Only a few
papers with fairly large chloroplast data sets have generated reasonably well-resolved
and robustly supported species-level phylogenies (e.g. Clarkson et al., 2004).
Furthermore, chloroplast markers are uniparentally inherited, usually maternally in
angiosperms, and therefore provide only part of the evidence for the evolutionary
development of a lineage if hybridization and introgression have taken place (Chase et
al., 2003; Sunnucks, 2000; Vriesendorp and Bakker, 2005).
The search for useful markers for plant species-level systematics has predominantly
yielded markers from genic regions, or, in the case of noncoding DNA, at short
distances from genic regions. There is a growing body of literature on single- or low-
copy nuclear genes that provide sufficient informative characters and do not complicate
organismal phylogeny reconstruction with paralogous copies (Edwards et al., 2008;
Emshwiller and Doyle, 1999; Sang et al., 1997; Small et al., 2004; Whittall et al., 2006).
However, it has also become clear that rates of nucleotide substitution of these markers
may differ significantly among lineages, and even among closely related species
(Hughes et al., 2006). Therefore, attempts to extrapolate the utility of these markers for
resolving species-level relationships outside the taxonomic group for which they have
been developed may not succeed. These difficulties have led some researchers to
suggest that a universal approach should be abandoned in favour of a lineage-specific
one (Small et al., 2004).
However, an alternative to a gene-based approach to the development of variable
nuclear markers involves search strategies that focus on randomly amplified regions
throughout the genome (Bailey et al., 2004; Hughes et al., 2006). The high variability,
abundance, uniform and genome-wide distribution, and neutral evolution of one of
these, namely microsatellites (Ellegren, 2004), make them potentially useful at the
species level. However, their polymorphic nature brings about analytical problems,
related to the translation of allele sizes to distance-based characters that are susceptible
to incorrect homology assessment (Matsuoka et al., 2002; Primmer and Ellegren, 1998).
We avoid this drawback by only focusing on the nucleotide sequences of the flanking
regions alongside the microsatellite repeat region, not on the repeat region. A further
factor possibly complicating phylogeny reconstruction is the presence of multiple
alleles, i.e. of variation that does not necessarily have a one-to-one relationship to the
organismal phylogeny, for example because of incomplete lineage sorting. The
distinction between paralogous and orthologous microsatellite copies is less of a
complicating factor. Microsatellites, including the flanking regions, usually represent
unique and therefore orthologous loci (Sunnucks, 2000), although duplication events
affecting microsatellite loci have been reported (Antunes et al., 2006; Zhang and
Rosenberg, 2007).
Thus, an optimal microsatellite flanking region marker for plant species-level
phylogenetics has a rate of substitution that allows resolving shallower relationships, is
represented by orthologous copies, and is monomorphic within species, populations and
individuals. We present examples of such markers from the plant family Annonaceae,
and outline the potential for the broader applicability of this approach in other clades of
angiosperms. We have taken orthology-by-default as the starting point of our study,
further hypothesizing that a neutral, highly variable marker system such as
microsatellites, including the flanking regions, evolves at a fast enough rate to elucidate
relationships among species of Annona (Annonaceae). In this plant family, the
application of chloroplast sequence data has produced phylogenies that are poorly
resolved at the species level, despite the gathering of large amounts of data (Erkens et
al., 2007a; Mols et al., 2004). Annona is paraphyletic with respect to Rollinia, and the
two genera together comprise a clade of approximately 175 species. Species of Rollinia
were synonymised into Annona recently (Rainer, 2007). Here we provide the first
published phylogenetic support for this taxonomic decision, as the former species of
Rollinia (Annona cuspidata, A. herzogii, A. mucosa, A. neochrysocarpa) appear as a
well-supported clade within Annona from the analyses we present here. Although our
taxon sampling reflects one eighth of the species diversity in Annona, covering the
entire morphological diversity as well as the geographical distribution of the genus,
additional taxon sampling would be needed to confidently corroborate the inclusion of
Rollinia into Annona.
To assess the utility of microsatellite flanking regions, we need to address the following
issues: (1) Can we produce flanking region sequences that are monomorphic within
individuals? (2) Can we confirm the assumed orthology of the flanking region
sequences? (3) What is the transferability of microsatellite regions across species of
Annona and other Annonaceae? (4) What is the strength of the phylogenetic signal at
the species level?
2. Materials and Methods
2.1 Taxon sampling
For this study we sampled 24 species: 22 species of Annona and two species of Asimina
as outgroup species (Table 1). Richardson et al. (2004) have shown that Asimina is
sister to Annona. The samples of Annona represent the complete geographical range of
the genus, as well as the considerable morphological (particularly floral) variation.
2.2 Character sampling
Two chloroplast markers, rbcL and trnLF, were sequenced. These markers have
commonly been applied in phylogenetic analyses of Annonaceae (e.g. Couvreur et al.,
2008; Erkens et al., 2007a; Pirie et al., 2006). These markers are generally
considered to be useful at taxonomic levels above that of species. In a family-wide
analysis the relationships among nine species of Annona were fairly well resolved but
generally poorly supported, based on these two chloroplast markers only (Richardson et
al., 2004). The microsatellite loci were selected based on a screening with the first 15
microsatellite loci that were developed in cherimoya (Annona cherimola) (Escribano et
al., 2004). Seven of them (LMCH4, 5, 6, 9, 10, 11, and 14) produced amplification
bands in the eight species studied initially (Annona sp. nov., A. glabra, A. montana, A.
muricata, A. oligocarpa, A. reticulata, A. senegalensis, and Rollinia cuspidata [now
Annona cuspidata (Rainer, 2007)]). No amplification was obtained for two loci
(LMCH1 and LMCH13). For two additional loci (LMCH7 and LMCH8) amplification
was obtained only with A. montana and A. glabra. LMCH9 and LMCH10 were selected
for this study because they showed clear and monomorphic single-allele amplification
bands in all the species tested.
2.3 DNA extraction, PCR amplification and sequencing
Total genomic DNA was extracted following a protocol adapted from the CTAB
method (Doyle and Doyle, 1987), as described in Pirie et al. (2006). PCR conditions
and primers for the chloroplast markers were standard, and are identical to Pirie et al.
(2006). PCR products were purified using QIAquick PCR purification kits (Qiagen),
and sequenced with the PCR primers. PCR conditions and primers for the
microsatellites LMCH9 and LMCH10 are according to Escribano et al. (2004).
PCR products were resolved in 3% high resolution agarose (Metaphor, FMC
Bioproducts, Rockland, ME) gel electrophoresis in SB buffer at 5V/cm.
Sequencing reactions had a total volume of 10 µl and contained 0.5 µl DYEnamic ET
Terminator (Amersham Pharmacia Biotech), 3.5 µl ET Terminator dilution buffer
(Amersham Pharmacia Biotech), and 2-4 µl of DNA template, depending on the
concentration. Template concentration was assessed by gel electrophoresis through a
1.5% agarose gel using a molecular weight marker (Smart-Ladder, Eurogentec, Seraing,
Belgium). Sequencing products were purified in a Sephadex G-50, DNA grade
(Sigma-Aldrich, St. Louis, MO, USA), and analyzed on an automatic sequencer ABI
3730XL (Applied Biosystems).
2.4 Phylogenetic analysis
DNA sequences were edited in SeqMan 4.0 (DNAStar Inc., Madison, Wisconsin), and
aligned manually. After exclusion of ambiguous positions, the resulting alignment of
rbcL consisted of 1364 positions, trnLF 845 positions, LMCH9 117 positions, and
LMCH10 233 positions. Indels were coded following Simmons and Ochoterena (2000),
and resulted in 15 further characters (trnLF: 4, LMCH9: 3, LMCH10: 8).
Maximum parsimony analyses [Fitch parsimony (Fitch, 1971)] for each marker
separately were done applying heuristic searches, with 100,000 random addition
sequence replicates, saving maximally 100 trees per replicate, using TBR branch
swapping. The program PAUP* 4.0b10 (Swofford, 2000) was used for the phylogenetic
analyses. The concatenated data matrix of all markers was analyzed using the branch
and bound method, with furthest addition sequence, and the MulTrees option in effect.
Bootstrap resampling of the data matrix was used to assess support, with 1000 bootstrap
replicates for each bootstrap analysis. For each marker individually, full heuristic
searches were done of 100 random addition sequences, TBR, saving 100 trees each
time. For the concatenated matrix, each bootstrap resampled matrix was analyzed using
the branch and bound algorithm, with settings as above.
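The resampling step of the bootstrap can be illustrated with a minimal Python sketch. It covers only the drawing of characters (alignment columns) with replacement, not the tree searches, which were run in PAUP*. The function name, the toy alignment, and the random seed are hypothetical and serve illustration only.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudoreplicate of an alignment.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    Columns (characters) are drawn with replacement until the original
    number of characters is reached.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_chars = lengths.pop()
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}

# Toy data, invented for illustration only.
toy = {"sp1": "ACGTAC", "sp2": "ACGTTC", "sp3": "ATGTTC"}
replicate = bootstrap_alignment(toy, random.Random(1))
```

Each pseudoreplicate would then be analyzed with the same search settings as the original matrix, and the frequency of a clade among the replicate trees gives its bootstrap support.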
DNA substitution models for each data partition separately were identified using
ModelTest 3.04 (Posada and Crandall, 1998). Individual data partitions were optimized
onto the combined topology. Based on the model identified by ModelTest, a likelihood
ratio test (Felsenstein, 1988) was applied to test whether each data partition evolves
along all branches within the combined topology at a homogenous rate (molecular
clock). The difference in likelihoods of the tree topologies, with and without clock
constraint, was used to calculate the likelihood ratio test statistic, which is reported in
Table 2. Likelihood values were produced with PAUP* 4.0b10.
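As a hedged illustration of this test, the sketch below computes the statistic as twice the difference in log-likelihood and compares it to a chi-square distribution. The n - 2 degrees of freedom (n being the number of taxa) follow the usual convention for clock tests and are an assumption here, as the text does not state them. The log-likelihood values are hypothetical, and the chi-square tail probability uses a closed form that holds only for even degrees of freedom.

```python
import math

def clock_lrt(lnl_no_clock, lnl_clock, n_taxa):
    """Likelihood ratio test of the molecular clock.

    Returns (statistic, degrees of freedom, p-value). The statistic is
    2 * (lnL without clock - lnL with clock), compared to a chi-square
    distribution with n_taxa - 2 degrees of freedom (assumed convention).
    """
    stat = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_taxa - 2
    return stat, df, chi2_sf_even_df(stat, df)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical log-likelihoods for a 22-taxon data partition.
stat, df, p_value = clock_lrt(-300.0, -310.0, 22)
```

A p-value below the chosen significance level would lead to rejection of the clock for that partition.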
To test for congruence between the chloroplast data partition and the flanking region
data partition, we applied the incongruence length difference (ILD) test (Farris et al.,
1995a, b), implemented in PAUP* 4.0b10, using informative characters only, with 5000
replicates, and heuristic searches as described above. The chloroplast markers rbcL and
trnLF were combined into a single data partition, and we tested incongruence of the
plastid data partition with both flanking region markers separately. Incongruence
between the two flanking regions was tested as well. Statistics of the incongruence
length difference tests are given in Table 3.
Saturation plots were made by plotting corrected vs. uncorrected distances for all
possible species pairs, both of which are produced by PAUP*. Distances were corrected
applying the models of molecular evolution for each marker separately, as found with
Modeltest 3.06 (Posada and Crandall, 1998). These models are given in Table 2.
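The two quantities plotted against each other can be sketched for a single species pair as follows. This is an illustration under assumptions, not the PAUP* calculation itself: it uses the K80 correction (the model reported for LMCH9 later in the text), whereas the models applied to the other markers are those of Table 2, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def p_and_k80(seq1, seq2):
    """Return (uncorrected p-distance, K80-corrected distance).

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared      # proportion of transitions
    q = transversions / compared    # proportion of transversions
    # K80: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    k80 = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return p + q, k80
```

In an unsaturated marker the points for all species pairs fall close to a straight line. With increasing saturation the corrected distance grows faster than the uncorrected one, and for very divergent pairs the logarithm is undefined.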
Substitution rates were estimated using the program r8s (Sanderson, 2004), using
penalized likelihood as reconstruction method. To estimate branch lengths as accurately
as possible, sequences of each individual marker were optimized onto the combined
topology. Rates were calculated in absolute time (10^-9 substitutions/site/year) by
calibrating the crown node of Annona at 19.1 myr. Richardson et al. (2004) estimated
the crown node of Annona at 25.6 ± 3.8 myr. Unpublished results (Pirie et al., in
prep.), analysing more sequence data and calibrating with more fossils than
Richardson et al. (2004), have revised this age to 19.1 ± 2.0 myr. Reliable fossil data
for Annona are unavailable. Given the broad taxon and character sampling of the study
from which we derive this age, the quality of the fossils, and the small confidence
intervals, we consider this secondary calibration reliable.
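The conversion from branch lengths to absolute rates can be illustrated with simple arithmetic. The sketch below assumes a strict clock and divides a path length by the calibration age. It is a back-of-the-envelope simplification and not the penalized likelihood procedure of r8s, and the path length used is hypothetical.

```python
CROWN_AGE_YEARS = 19.1e6  # calibration used in the text for the Annona crown node

def rate_per_site_per_year(path_length, age_years=CROWN_AGE_YEARS):
    """Mean substitution rate along a path of given length (subs/site)."""
    if age_years <= 0:
        raise ValueError("age must be positive")
    return path_length / age_years

# Hypothetical crown-to-tip path length of 0.05 substitutions per site.
rate = rate_per_site_per_year(0.05)
rate_in_1e9_units = rate * 1e9  # expressed in 10^-9 substitutions/site/year
```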
3. Results and discussion
3.1 Monomorphism and allelic diversity
Monomorphic microsatellites, i.e. those with only a single allele for a locus, are
routinely discovered during the screening of microsatellite loci in plants (Squirrell et al.,
2003). Two out of 15 microsatellites developed for Annona cherimola (Escribano et al.,
2004), LMCH9 and LMCH10, meet this criterion, as they produced clear single
amplification bands after PCR. We produced LMCH9 and LMCH10 nucleotide
sequences for 22 species of Annona (Table 1). The two microsatellite loci both contain a
dinucleotide repeat region, as well as short 5’ and 3’ flanking regions.
PCR of these regions generally produced homogeneous bands, supporting our
assumption of orthology of the included flanking region sequences (Small et al., 2004).
Standard PCR conditions were adequate for obtaining amplification products, and no
cloning was required. The small size of the fragments enhances the ease of
amplification, which makes them especially useful when working with degraded DNA.
In the unusual case of double bands, fragment size similarity amongst different species
was easy to assess, and fragments were cut out from the high-resolution separation gel.
Sequencing of PCR products typically produced chromatograms indicative of
monomorphic loci, which could be interpreted unequivocally. Single nucleotide
polymorphisms (SNPs), i.e. identical polymorphisms that were present in both the
forward and reverse sequence, were hardly ever encountered. Three LMCH9 sequences
(Annona dumetorum, A. mucosa, and A. urbaniana) and two LMCH10 sequences (A.
bicolor and A. hypoglauca) contained 1-3 SNPs, causing a polymorphism frequency
between 0.4% and 1.7% for these five sequences. There was no overlap in positions at
which the polymorphism occurred between any of these species. These SNPs might
indicate the presence of multiple alleles. However, given the very low frequency of
SNPs, possible alleles were highly similar and their effect on the results of the
phylogenetic analyses was non-existent. Even if the SNPs did reflect allelic
variation, this would only cause problems for species-level phylogeny reconstruction if
the alleles coalesced at deeper phylogenetic levels, ancestral to the species
sampled here. Thus, despite the use of a highly variable marker, the careful selection
of monomorphic loci precluded the gathering of intra-specific polymorphisms that
would have rendered reconstruction of the species phylogeny problematic.
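The polymorphism frequencies quoted above are simple proportions, which the following sketch makes explicit. It assumes that polymorphic positions are scored with two-base IUPAC ambiguity codes and that the frequency is taken over the sequence length. Neither detail is stated in the text, and the example sequence is invented.

```python
HETEROZYGOUS_CODES = set("RYSWKM")  # IUPAC codes for two-base ambiguities

def polymorphism_frequency(sequence):
    """Fraction of nucleotide positions scored as two-base polymorphisms."""
    bases = [c for c in sequence.upper() if c != "-"]
    if not bases:
        raise ValueError("sequence has no nucleotide positions")
    n_poly = sum(c in HETEROZYGOUS_CODES for c in bases)
    return n_poly / len(bases)

# Hypothetical 233-position sequence with a single polymorphic site.
example = "A" * 100 + "R" + "C" * 132
freq_percent = 100 * polymorphism_frequency(example)
```

Under these assumptions, one polymorphic site in 233 positions gives about 0.4%, and two in 117 positions about 1.7%, which matches the range reported for the five sequences.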
3.2 Orthology
Alignment of the LMCH9 and LMCH10 flanking regions was straightforward. The 5’
flanking region of LMCH9 was only 30 bps and therefore only the 3’ flanking region
was included in the analyses. Both the 5’ and 3’ flanking region of LMCH10 were
included. The dinucleotide repeat regions were excluded from the analysis. The aligned
flanking regions of LMCH9 and LMCH10 comprise 117 and 233 characters,
respectively. Additionally, 11 indel characters were scored and included in the analyses
(Table 2).
Orthology of the flanking region sequences was supported by the similarity of
phylogenetic signal in the flanking regions and in an independent data source, viz.
chloroplast rbcL and trnL-F sequences. Visual inspection of bootstrap support values
for each marker separately (Fig. 1) revealed the absence of any well-supported
conflicting clades (bootstrap support ≥ 85%). The sister group relationship of the A.
glabra / A. senegalensis clade with a clade containing the former Rollinia species, as
reconstructed with LMCH9 sequences, is in conflict with the position of the former
clade after analysis of the other markers. However, the bootstrap support of 75% is only
moderate, and insufficient to consider the signal of LMCH9 to be different.
Furthermore, congruence of the combined chloroplast markers and each flanking region
was demonstrated using the parsimony-based incongruence length difference (ILD) test
(Table 3). Neither LMCH9 nor LMCH10 was significantly incongruent with the
chloroplast data partition at the 95% confidence level. Finally, incongruence of the two
flanking regions was clearly refuted.
3.3 Transferability
We were unable to amplify either of the two microsatellite loci for any species outside
Annona, not even for its sister genus Asimina (Richardson et al., 2004). It should be
noted that the clade to which Annona and Asimina belong is characterized by long
branches subtending generally species-rich clades, causing sister genera to be relatively
distant (Richardson et al., 2004). However, similar patterns of good amplification
success within a target group and poor success in non-congeneric species have been
reported in other plant and animal clades too (Fraser et al., 2005; Peakall et al., 1998;
Wilson et al., 2004). In Annona the limited transferability only poses problems with
regard to the rooting of the tree, as the flanking region sequences could be produced for
the entire ingroup. We predict that this would also be the case in other similar studies in
angiosperms. Datasets for phylogenetic analyses typically comprise multiple markers,
and can easily be designed to contain flanking region sequences as well as markers that
can be sequenced for a broader range of taxa. The latter sequences would ensure
appropriate rooting of phylogenetic trees and the former would provide many
informative characters at nodes within the ingroup. The potential for utilizing
microsatellite flanking regions for species-level phylogenetics in plants is fairly large, as
published monomorphic microsatellite sequences are available for species-rich tropical
groups, such as Begonia or Melaleuca, as well as temperate groups such as Pinus and
Primula (Squirrell et al., 2003), and could readily be scrutinized for their phylogenetic
utility. At the same time our results as well as the other reports on transferability
suggest that the usefulness of the methodology we describe here is limited to resolving
the shallow branches of the tree of life and will not contribute to taxonomically large
data sets (Chase et al., 2006; Driskell et al., 2004).
3.4 Phylogenetic utility
The addition of flanking region sequence data has a positive effect on the resolution of
the phylogenetic tree of Annona. The number of well-supported nodes increases
compared to the application of the chloroplast sequences only (Table 4), and the
simultaneous analysis of the four markers produced a single most-parsimonious tree,
generally with high bootstrap support for the nodes (Fig. 2).
The flanking region sequences are much more variable than the chloroplast markers, as
expressed by the higher percentage of both variable and parsimony informative
characters (Table 2). Also, mean substitution rates are higher for the flanking regions,
approximately 3.5 to 10-fold the rate of the chloroplast markers, although it should be
noted that the standard deviations of the substitution rates are large (Table 2). The
models of evolution of the flanking regions are simpler than those of the chloroplast
markers. Within each flanking region all sites evolve at the same rate, as the model
estimates showed the absence of substitution rate heterogeneity among the
positions (α = ∞). Moreover, substitutions accumulate linearly over time in the flanking
regions. For LMCH9, the molecular clock hypothesis was not rejected by the likelihood
ratio test (LRT) at any significance level (Table 2). For LMCH10, the molecular clock
hypothesis is just rejected at the 5% level, though not at the 2.5 % level. For both
chloroplast markers the molecular clock hypothesis was rejected (p < 0.001).
For each marker we plotted uncorrected pairwise distances against distances corrected
using models of molecular evolution (Table 2) as identified using ModelTest (Posada
and Crandall, 1998), to assess the occurrence of saturation (Fig. 3). The chloroplast
markers show initial saturation, as the graphs deflect from linearity but do not yet reach a
saturation plateau. In contrast, the flanking regions show no evidence of
multiple substitutions at nucleotide positions despite the higher overall substitution
rates. The first signs of saturation in the chloroplast markers might be attributable to the
sampling of only 22 out of 175 species of Annona. Increased taxon sampling would
likely reduce the phylogenetic distances among sequences, and consequently could
minimize the appearance of saturation. However, saturation in the four markers is
compared against the background of the same taxon sampling. The tentative conclusion
that the microsatellite flanking regions are less saturated than the chloroplast markers is
therefore warranted.
Due to the higher percentage of variable characters and the higher rate of substitution, it
would be reasonable to expect higher levels of homoplasy in the flanking regions,
simply because of the availability of four character states only for each nucleotide
position. First, superimposed substitutions at a nucleotide position would be supposed to
occur more readily, causing the saturation plot to deflect from linearity. In addition to
this hidden homoplasy, we would expect to see higher levels of ‘visible’ homoplasy, i.e.
the independent multiple origin of identical character states, as reflected in lower values
of the consistency index (CI). However, both the saturation plot (Fig. 3) and the CI
values (Table 2) show results differing from these expectations; there is not as much
saturation in the flanking regions as in the chloroplast markers, and CI values are
similar. The explanation must be sought in the characteristics of molecular evolution of
the flanking regions, notably the clock-like accumulation of substitutions, and the
absence of rate heterogeneity. Both characteristics are assumed by neutral molecular
evolution: a constant substitution rate over evolutionary lineages and over sites in DNA
sequences (Bromham and Penny, 2003). The even distribution of substitutions over the
nucleotide positions in the flanking regions allows the substitution rates to be higher
than for the chloroplast markers, while at the same time making the flanking regions
less prone to saturation. In addition, the K80 model of molecular evolution estimated for
LMCH9 is congruent with these indications of neutrality.
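To make the notion of 'visible' homoplasy concrete, the sketch below counts Fitch parsimony steps for each character on a given tree and derives an ensemble consistency index as the summed minimum number of steps divided by the summed observed number of steps. It is a simplified illustration: the nested-tuple tree format and function names are hypothetical, every symbol is treated as a character state, and all variable characters are included, so values need not match the CI reported by PAUP* in Table 2.

```python
def fitch_steps(tree, states):
    """Count Fitch parsimony steps for one character on a rooted binary tree.

    `tree` is a nested tuple of taxon names, e.g. (("a", "b"), ("c", "d"));
    `states` maps taxon name -> character state.
    Returns (state set at this node, number of steps below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    steps = left_steps + right_steps
    shared = left_set & right_set
    if shared:
        return shared, steps
    return left_set | right_set, steps + 1

def consistency_index(tree, alignment):
    """Ensemble CI = sum of minimum steps / sum of observed steps.

    Constant characters contribute zero to both sums.
    """
    taxa = list(alignment)
    n_chars = len(alignment[taxa[0]])
    minimum = observed = 0
    for i in range(n_chars):
        column = {t: alignment[t][i] for t in taxa}
        minimum += len(set(column.values())) - 1
        observed += fitch_steps(tree, column)[1]
    if observed == 0:
        raise ValueError("no variable characters")
    return minimum / observed
```

A CI of 1 means that every state arose only once on the tree. Values below 1 indicate that some states arose independently more than once.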
These characteristics of the flanking regions are noteworthy as they are in contrast to
findings in the literature on homoplasy in chloroplast markers. The correlation between
levels of homoplasy on the one hand, and substitution rates and/or levels of sequence
divergence on the other hand is often assumed without further testing, for instance to
rule out the possible deleterious effect of saturation in data partitions with low
substitution rates (e.g. Cronn et al., 2002; Zgurski et al., 2008). Such a positive
relationship between substitution rate and level of homoplasy, as expressed by the
consistency index, has been demonstrated for nucleotide substitution rates in chloroplast
genes (Graham and Olmstead, 2000), and even for rates of chloroplast indel characters
(Ingvarsson et al., 2003). Wortley et al. (2005) found that simulating an increase in the
substitution rate of rbcL, matK and ndhF soon resulted in a decrease in phylogenetic
resolution, probably because of saturation. This is likely to be related to the fact that the
evolution of these chloroplast genes is governed by mild functional constraints
(Savolainen et al., 2002). The absence of comparable correlations in microsatellite
flanking regions between substitution rate and homoplasy mirrors the neutral evolution
of microsatellites (Ellegren, 2004), which apparently is present at the nucleotide level in
the flanking regions too.
The clock-like behaviour of the flanking regions makes them a helpful tool for the
dating of divergences. An additional advantage is the availability of a nuclear data
partition, providing a more complete picture of species level phylogenies than based on
chloroplast markers only.
These advantages are offset by the drawback of the limited size of the flanking
regions. Despite the high percentage of variable and parsimony informative characters,
the absolute number of these characters is comparable to the chloroplast markers, the
latter being a more relevant characteristic for determining phylogenetic utility. Wortley
and Scotland (2006) fine-tuned this criterion by describing the minimum number of
parsimony informative character-state changes, as a measure of utility. For the markers
in this paper the difference between this measure and the number of parsimony
informative characters was negligible, as there was only a difference of 1 between the
two measures for trnLF and LMCH10.
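The distinction between variable and parsimony informative characters can be sketched as a simple count over alignment columns. The function name and the handling of gaps and missing data are assumptions made for illustration and need not match the settings used to produce Table 2.

```python
from collections import Counter

def count_variable_and_informative(alignment):
    """Return (number of variable, number of parsimony-informative characters).

    A character is variable if it shows at least two states, and
    parsimony informative if at least two states each occur in at
    least two taxa. Gaps and missing data ("-", "?", "N") are ignored.
    """
    seqs = list(alignment.values())
    variable = informative = 0
    for column in zip(*seqs):
        counts = Counter(c for c in column if c not in "-?N")
        if len(counts) >= 2:
            variable += 1
        if sum(n >= 2 for n in counts.values()) >= 2:
            informative += 1
    return variable, informative
```

Dividing these counts by the alignment length gives the percentages compared among markers, which is why a short but highly variable region can have a high percentage yet a modest absolute number of informative characters.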
3.5 Conclusion
To our knowledge, microsatellite flanking regions have only once been demonstrated to
be congruent with phylogenies based on other data, and subsequently been used in
species-level phylogenetics, viz. in cichlid fishes (Zardoya et al., 1996). Here we
present the first example of the utility of flanking regions for angiosperm phylogenetics.
Our data strongly suggest that the suitability of the flanking regions for resolving
species-level relationships is related to the neutral molecular evolution of these regions,
as exemplified by the substitution rate constancy, similar substitution rates over
nucleotide positions, and the lack of saturation. Given the large number of microsatellite
libraries that have been created for a broad range of species in a large number of
angiosperm genera, the potential for using a similar approach to that employed here for
discovering phylogenetically useful markers in problematic groups is high. In a survey
of the utilization of microsatellites for population genetic studies on individual species
Squirrell et al. (2003) highlighted 66 examples in angiosperms. Of the
studies considered in Table 3 of that publication an average of 17.7% of microsatellite
loci producing PCR products were monomorphic. Our study demonstrates that these
loci may represent a great untapped source of phylogenetically informative nuclear
neutral markers for plant species-level systematics. Obviously, on the basis of our
results we cannot accurately predict the phylogenetic utility of these markers in other
plant groups, as little is known about the molecular evolution of microsatellite flanking
regions in general. Nevertheless, it is likely that these molecular evolutionary patterns
will resemble those of Annona, given the frequency and distribution of the bulk of
microsatellites in the genome, and given the neutral evolution of the repeat regions
(Ellegren, 2004). Given the universal utility of our approach, microsatellite flanking
regions have the potential to become useful tools for resolving relationships amongst
recently diverged taxa.
Acknowledgments
The authors acknowledge financial support from the Spanish Ministry of Education
(Project Grants AGL2004-02290/AGR and AGL2007-60130/AGR). M.P.E. was
supported by a FPI grant of the Spanish Ministry of Education.
References
Antunes, A., Gharbi, K., Alexandrino, P., Guyomard, R., 2006. Characterization of
transferrin-linked microsatellites in brown trout (Salmo trutta) and Atlantic salmon
(Salmo salar). Mol. Ecol. Notes 6, 547-549.
Bailey, C.D., Carr, T.G., Harris, S.A., Hughes, C.E., 2003. Characterization of
angiosperm nrDNA polymorphism, paralogy, and pseudogenes. Mol. Phylogen. Evol.
29, 435-455.
Bailey, C.D., Hughes, C.E., Harris, S.A., 2004. Using RAPDs to identify DNA
sequence loci for species level phylogeny reconstruction: an example from Leucaena
(Fabaceae). Syst. Bot. 29, 4-14.
Baker, W.J., Hedderson, T.A., Dransfield, J., 2000. Molecular phylogenetics of
subfamily Calamoideae (Palmae) based on nrDNA ITS and cpDNA rps16 intron
sequence data. Mol. Phylogen. Evol. 14, 195-217.
Bromham, L., Penny, D., 2003. The modern molecular clock. Nat. Rev. Genet. 4, 216-
224.
Chase, M.W., Fay, M.F., Soltis, D.E., Soltis, P.S., Takahashi, K.T., Savolainen, V.,
2006. Simple phylogenetic tree searches easily "succeed" with large matrices of single
genes. Taxon 55, 573-578.
Chase, M.W., Knapp, S., Cox, A.V., Clarkson, J.J., Butsko, Y., Joseph, J., Savolainen,
V., Parokonny, A.S., 2003. Molecular systematics, GISH and the origin of hybrid taxa
in Nicotiana (Solanaceae). Ann. Bot. 92, 107-127.
Choi, H.-K., Luckow, M., Doyle, J.J., Cook, D., 2006. Development of nuclear gene-
derived molecular markers linked to legume genetic maps. Mol. Genet. Genomics 276,
56-70.
Clarkson, J.J., Knapp, S., Garcia, V.F., Olmstead, R.G., Leitch, A.R., Chase, M.W.,
2004. Phylogenetic relationships in Nicotiana (Solanaceae) inferred from multiple
plastid DNA regions. Mol. Phylogen. Evol. 33, 75-90.
Couvreur, T.L.P., Richardson, J.E., Sosef, M.S.M., Erkens, R.H.J., Chatrou, L.W.,
2008. Evolution of syncarpy and other morphological characters in African
Annonaceae: a posterior mapping approach. Mol. Phylogen. Evol. 47, 302-318.
Crawford, D.J., Mort, M.E., 2004. Single-locus molecular markers for inferring
relationships at lower taxonomic levels: observations and comments. Taxon 53, 631-
635.
Cronn, R.C., Small, R.L., Haselkorn, T., Wendel, J.F., 2002. Rapid diversification of
the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and
chloroplast genes. Amer. J. Bot. 89, 707-725.
Doyle, J.J., Doyle, J.L., 1987. A rapid DNA isolation procedure for small quantities of
fresh leaf tissue. Phytochem. Bull. 19, 11-15.
Driskell, A.C., Ané, C., Burleigh, J.G., McMahon, M.M., O'Meara, B.C., Sanderson,
M.J., 2004. Prospects for building the tree of life from large sequence databases.
Science 306, 1172-1174.
Edwards, C.E., Lefkowitz, D., Soltis, D.E., Soltis, P.S., 2008. Phylogeny of Conradina
and related southeastern scrub mints (Lamiaceae) based on GapC gene sequences. Int. J.
Plant Sci. 169, 579-594.
Ellegren, H., 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev.
Genet. 5, 435-445.
Emshwiller, E., Doyle, J.J., 1999. Chloroplast-expressed glutamine synthetase (ncpGS):
potential utility for phylogenetic studies with an example from Oxalis (Oxalidaceae).
Mol. Phylogen. Evol. 12, 310-319.
Erkens, R.H.J., Chatrou, L.W., Koek-Noorman, J., Maas, J.W., Maas, P.J.M., 2007a.
Classification of a large and widespread genus of Neotropical trees, Guatteria
(Annonaceae) and its three satellite genera Guatteriella, Guatteriopsis and
Heteropetalum. Taxon 56, 757-774.
Erkens, R.H.J., Chatrou, L.W., Maas, J.W., van der Niet, T., Savolainen, V., 2007b. A
rapid diversification of rainforest trees (Guatteria; Annonaceae) following dispersal
from Central into South America. Mol. Phylogen. Evol. 44, 399-411.
Escribano, P., Viruel, M.A., Hormaza, J.I., 2004. Characterization and cross-species
amplification of microsatellite markers in cherimoya (Annona cherimola Mill.,
Annonaceae). Mol. Ecol. Notes 4, 746-748.
Farris, J.S., Källersjö, M., Kluge, A.G., Bult, C., 1995a. Constructing a significance test
for incongruence. Syst. Biol. 44, 570-572.
Farris, J.S., Källersjö, M., Kluge, A.G., Bult, C., 1995b. Testing significance of
incongruence. Cladistics 10, 315-319.
Felsenstein, J., 1988. Phylogenies and quantitative characters. Annu. Rev. Ecol. Syst.
19, 445-471.
Fitch, W.M., 1971. Toward defining the course of evolution: minimum change for a
specified tree topology. Syst. Zool. 20, 406-416.
Fraser, L.G., McNeilage, M.A., Tsang, G.K., Harvey, C.F., De Silva, H., 2005. Cross-
species amplification of microsatellite loci within the dioecious, polyploid genus
Actinidia (Actinidiaceae). Theor. Appl. Genet. 112, 149-157.
Graham, S.W., Olmstead, R.G., 2000. Utility of 17 chloroplast genes for inferring the
phylogeny of the basal angiosperms. Amer. J. Bot. 87, 1712-1730.
Hughes, C.E., Eastwood, R.J., Bailey, C.D., 2006. From famine to feast? Selecting
nuclear DNA sequence loci for plant species-level phylogeny reconstruction. Philos.
Trans. R. Soc. Lond., Ser. B: Biol. Sci. 361, 211-225.
Ingvarsson, P.K., Ribstein, S., Taylor, D.R., 2003. Molecular evolution of insertions and
deletions in the chloroplast genome of Silene. Mol. Biol. Evol. 20, 1737-1740.
Matsuoka, Y., Mitchell, S.E., Kresovich, S., Goodman, M., Doebley, J., 2002.
Microsatellites in Zea - variability, patterns of mutations, and use for evolutionary
studies. Theor. Appl. Genet. 104, 436-450.
Mols, J.B., Gravendeel, B., Chatrou, L.W., Pirie, M.D., Bygrave, P., Chase, M.W.,
Kessler, P.J.A., 2004. Identifying clades in Asian Annonaceae: monophyletic genera in
the polyphyletic Miliuseae. Amer. J. Bot. 91, 590-600.
Peakall, R., Gilmore, S., Keys, W., Morgante, M., Rafalski, A., 1998. Cross-species
amplification of soybean (Glycine max) simple sequence repeats (SSRs) within the
genus and other legume genera: implications for the transferability of SSRs in plants.
Mol. Biol. Evol. 15, 1275-1287.
Perret, M., Chautems, A., Spichiger, R., Kite, G., Savolainen, V., 2003. Systematics and
evolution of tribe Sinningieae (Gesneriaceae): evidence from phylogenetic analyses of
six plastid DNA regions and nuclear ncpGS. Amer. J. Bot. 90, 445-460.
Pirie, M.D., Chatrou, L.W., Erkens, R.H.J., Maas, J.W., van der Niet, T., Mols, J.B.,
Richardson, J.E., 2005. Phylogeny reconstruction and molecular dating in four
Neotropical genera of Annonaceae: the effect of taxon sampling in age estimation. In:
Bakker, F.T., Chatrou, L.W., Gravendeel, B., Pelser, P.B. (Eds.), Plant species-level
systematics: new perspectives on pattern and process. A.R.G. Gantner Verlag, Ruggell,
Liechtenstein, pp. 149-174.
Pirie, M.D., Chatrou, L.W., Mols, J.B., Erkens, R.H.J., Oosterhof, J., 2006. 'Andean-
centred' genera in the short-branch clade of Annonaceae: testing biogeographical
hypotheses using phylogeny reconstruction and molecular dating. J. Biogeogr. 33, 31-
46.
Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution.
Bioinformatics 14, 817-818.
Primmer, C.R., Ellegren, H., 1998. Patterns of molecular evolution in avian
microsatellites. Mol. Biol. Evol. 15, 997-1008.
Rainer, H., 2007. Monographic studies in the genus Annona L. (Annonaceae): Inclusion
of the genus Rollinia A.St.-Hil. Ann. Naturhist. Mus. Wien, B 108, 191-205.
Richardson, J.E., Chatrou, L.W., Mols, J.B., Erkens, R.H.J., Pirie, M.D., 2004.
Historical biogeography of two cosmopolitan families of flowering plants: Annonaceae
and Rhamnaceae. Philos. Trans. R. Soc. Lond., Ser. B: Biol. Sci. 359, 1495-1508.
Richardson, J.E., Pennington, R.T., Pennington, T.D., Hollingsworth, P.M., 2001. Rapid
diversification of a species-rich genus of Neotropical rainforest trees. Science 293,
2242-2245.
Sanderson, M.J., 2004. r8s, version 1.70. Distributed by the author, Section of Evolution
and Ecology, University of California, Davis, USA.
Sanderson, M.J., Shaffer, H.B., 2002. Troubleshooting molecular phylogenetic
analyses. Annu. Rev. Ecol. Syst. 33, 49-72.
Sang, T., 2002. Utility of low-copy nuclear gene sequences in plant phylogenies. Crit.
Rev. Biochem. Mol. Biol. 37, 121-147.
Sang, T., Donoghue, M.J., Zhang, D., 1997. Evolution of alcohol dehydrogenase genes
in peonies (Paeonia): phylogenetic relationships of putative nonhybrid species. Mol.
Biol. Evol. 14, 994-1007.
Savolainen, V., Chase, M.W., Salamin, N., Soltis, D.E., Soltis, P.S., López, A.J.,
Fédrigo, O., Naylor, G.J.P., 2002. Phylogeny reconstruction and functional constraints
in organellar genomes: plastid atpB and rbcL sequences versus animal mitochondrion.
Syst. Biol. 51, 638-647.
Shaw, J., Small, R.L., 2004. Addressing the "hardest puzzle in American pomology:"
Phylogeny of Prunus sect. Prunocerasus (Rosaceae) based on seven noncoding
chloroplast DNA regions. Amer. J. Bot. 91, 985-996.
Simmons, M.P., Ochotorena, H., 2000. Gaps as characters in sequence-based
phylogenetic analysis. Syst. Biol. 49, 369-381.
Small, R.L., Cronn, R.C., Wendel, J.F., 2004. L.A.S. Johnson review no. 2. Use of
nuclear genes for phylogeny reconstruction in plants. Aust. Syst. Bot. 17, 145-170.
Squirrell, J., Hollingsworth, P.M., Woodhead, M., Russell, J., Lowe, A.J., Gibby, M.,
Powell, W., 2003. How much effort is required to isolate nuclear microsatellites from
plants? Mol. Ecol. 12, 1339-1348.
Sunnucks, P., 2000. Efficient genetic markers for population biology. Trends Ecol.
Evol. 15, 199-203.
Swofford, D.L., 2000. PAUP*. Phylogenetic Analysis Using Parsimony (* and other
methods), version 4.0b10. Sinauer Associates, Sunderland (MA).
Vriesendorp, B., Bakker, F.T., 2005. Reconstructing patterns of reticulate evolution in
angiosperms: what can we do? Taxon 54, 593-604.
Whittall, J.B., Medina-Marino, A., Zimmer, E.A., Hodges, S.A., 2006. Generating
single-copy nuclear gene data for a recent adaptive radiation. Mol. Phylogen. Evol. 39,
124-134.
Wilson, A.C.C., Massonnet, B., Simon, J.-C., Prunier-Leterme, N., Dolatti, L.,
Llewellyn, K.S., Figueroa, C.C., Ramirez, C.C., Blackman, R.L., Estoup, A., Sunnucks,
P., 2004. Cross-species amplification of microsatellite loci in aphids: assessment and
application. Mol. Ecol. Notes 4, 104-109.
Wortley, A.H., Rudall, P.J., Harris, D.J., Scotland, R.W., 2005. How much data are
needed to resolve a difficult phylogeny? Case study in Lamiales. Syst. Biol. 54, 697-709.
Wortley, A.H., Scotland, R.W., 2006. Determining the potential utility of datasets for
phylogeny reconstruction. Taxon 55, 431-442.
Zardoya, R., Vollmer, D.M., Craddock, C., Streelman, J.T., Karl, S., Meyer, A., 1996.
Evolutionary conservation of microsatellite flanking regions and their use in resolving
the phylogeny of cichlid fishes (Pisces: Perciformes). Proc. R. Soc. Lond., Ser. B: Biol.
Sci. 263, 1589-1598.
Zgurski, J.M., Rai, H.S., Fai, Q.M., Bogler, D.J., Francisco-Ortega, J., Graham, S.W.,
2008. How well do we understand the overall backbone of cycad phylogeny? New
insights from a large, multigene plastid data set. Mol. Phylogen. Evol. 47, 1232-1237.
Zhang, K., Rosenberg, N.A., 2007. On the genealogy of a duplicated microsatellite.
Genetics 177, 2109-2122.
Table 1. Species, voucher information and GenBank, NCBI accession numbers.
Species                             Geography              Voucher                        rbcL        trnLF       LMCH9       LMCH10
Asimina angustifolia A.Gray         USA                    Weerasooriya, A. s.n. (U)      DQ124939 b  AY841677 a  –           –
Asimina triloba (L.) Dunal          USA                    Chatrou, L.W. 276 (U)          AY743441 c  AY743460 c  –           –
Annona amazonica R.E.Fr.            Bolivia                Chatrou, L.W. 462 (U)          EU420853 a  EU420836 a  EU420768 a  EU420790 a
Annona bicolor Urb.                 Mexico                 Maas, P.J.M. 8381 (U)          EU420854 a  EU420837 a  EU420769 a  EU420791 a
Annona cornifolia A.St.-Hil.        Bolivia                Chatrou, L.W. 343 (U)          EU420855 a  –           EU420770 a  EU420792 a
Annona cuspidata (Mart.) H.Rainer   Guyana                 Jansen-Jacobs, M.J. 5957 (U)   EU420869 a  EU420851 a  EU420787 a  EU420809 a
Annona deminuta R.E.Fr.             Peru                   Rainer, H. 271 (WU)            EU420857 a  EU420839 a  EU420772 a  EU420794 a
Annona dumetorum R.E.Fr.            Dominican Republic     Maas, P.J.M. 8374 (U)          EU420856 a  EU420838 a  EU420771 a  EU420793 a
Annona glabra L.                    Neotropical / African  Chatrou, L.W. 467 (U)          AY841596 a  AY841673 a  EU420773 a  EU420795 a
Annona herzogii (R.E.Fr.) H.Rainer  Bolivia                Chatrou, L.W. 347 (U)          AY841656 a  AY841734 a  EU420788 a  EU420810 a
Annona holosericea Saff.            Honduras               Maas, P.J.M. 8445 (U)          EU420858 a  EU420840 a  EU420774 a  EU420796 a
Annona hypoglauca Mart.             Bolivia                Chatrou, L.W. 444 (U)          EU420859 a  EU420841 a  EU420775 a  EU420797 a
Annona montana Macfad.              Neotropical            Chatrou, L.W. 484 (U)          EU420860 a  EU420842 a  EU420776 a  EU420798 a
Annona mucosa Jacq.                 Peru                   Chatrou, L.W. 247 (U)          EU420870 a  EU420852 a  EU420789 a  EU420811 a
Annona muricata L.                  Neotropical            Chatrou, L.W. 468 (U)          AY743440 c  AY743459 c  EU420777 a  EU420799 a
Annona neochrysocarpa H.Rainer      Peru                   Pirie, M.D. 43 (U)             EU420868 a  EU420850 a  EU420786 a  EU420808 a
Annona oligocarpa R.E.Fr.           Ecuador                Maas, P.J.M. 8522 (U)          EU420861 a  EU420843 a  EU420778 a  EU420800 a
Annona pruinosa G.E.Schatz          Costa Rica             Chatrou, L.W. 77 (U)           EU420862 a  EU420844 a  EU420779 a  EU420801 a
Annona reticulata L.                Bolivia                Chatrou, L.W. 290 (U)          EU420863 a  EU420845 a  EU420780 a  EU420802 a
Annona scandens Diels               Bolivia                Chatrou, L.W. 365 (U)          EU420864 a  EU420846 a  EU420781 a  EU420803 a
Annona senegalensis Pers.           West African           Chatrou, L.W. 469 (U)          AY841597 a  AY841674 a  EU420782 a  EU420804 a
Annona squamosa L.                  Curaçao                van Proosdij, A.S.J. 1133 (U)  EU420865 a  EU420847 a  EU420783 a  EU420805 a
Annona symphyocarpa Sandw.          Guyana                 Ek, R.C. 1270 (U)              EU420866 a  EU420848 a  EU420784 a  EU420806 a
Annona urbaniana R.E.Fr.            Dominican Republic     Maas, P.J.M. 8392 (U)          EU420867 a  EU420849 a  EU420785 a  EU420807 a
a This study; b Erkens et al. (2007b); c Pirie et al. (2005)
Table 2. Statistics per marker on features of data and molecular evolution.
Marker  # nucl. chars.  # indel chars.  # and % variable chars.  # and % pars. inf. chars.  # most pars. trees  tree length  CI     RI
rbcL    1364            0               72 / 5.3 %               42 / 3.1 %                 24                  103          0.874  0.913
trnL-F  845             4               101 / 12.0 %             47 / 5.6 %                 66                  142          0.887  0.898
LMCH9   117             3               41 / 35.0 %              26 / 22.2 %                31                  55           0.873  0.932
LMCH10  233             8               69 / 29.6 %              42 / 18.0 %                189                 98           0.898  0.906

Marker  model     among-site rate variation  LRT statistic (2)  rate (10⁻⁹ substit. / site / year)
rbcL    GTR + Γ   0.1504                     58.95 a            0.3127 ± 0.2225
trnL-F  TIM + Γ   0.7028                     102.89 a           0.8682 ± 0.5258
LMCH9   K80       ∞                          16.37 b            2.962 ± 1.999
LMCH10  HKY       ∞                          33.26 c            3.065 ± 1.693
a p < 0.0001
b 0.6 < p < 0.7
c 0.03 < p < 0.04
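The derived figures in Table 2 can be recomputed from the raw counts, which also makes the rate comparison between marker types explicit. The sketch below is illustrative only and is not the authors' analysis pipeline: the dictionary layout and function names are hypothetical, percentages are taken relative to the number of nucleotide characters, and the chi-square check assumes that the likelihood ratio test of the molecular clock has n - 2 degrees of freedom with n = 22 ingroup taxa (df = 20) for the flanking regions. That degrees-of-freedom value is an assumption inferred from Table 1 and is not stated in the table itself.

```python
from math import exp, factorial

# Values transcribed from Table 2. "rate" is in 1e-9 substitutions/site/year.
TABLE2 = {
    "rbcL":   {"n_chars": 1364, "variable": 72,  "pars_inf": 42, "rate": 0.3127},
    "trnL-F": {"n_chars": 845,  "variable": 101, "pars_inf": 47, "rate": 0.8682},
    "LMCH9":  {"n_chars": 117,  "variable": 41,  "pars_inf": 26, "rate": 2.962},
    "LMCH10": {"n_chars": 233,  "variable": 69,  "pars_inf": 42, "rate": 3.065},
}


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal as in Table 2."""
    return round(100.0 * part / whole, 1)


def fold_difference(marker, reference):
    """Ratio of the point estimates of substitution rate (marker / reference)."""
    return TABLE2[marker]["rate"] / TABLE2[reference]["rate"]


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


if __name__ == "__main__":
    for name, row in TABLE2.items():
        print(name,
              percent(row["variable"], row["n_chars"]),
              percent(row["pars_inf"], row["n_chars"]))

    for flank in ("LMCH9", "LMCH10"):
        for plastid in ("rbcL", "trnL-F"):
            print(flank, "vs", plastid, round(fold_difference(flank, plastid), 2))

    ASSUMED_DF = 20  # assumption: 22 ingroup taxa minus 2
    for name, lrt in (("LMCH9", 16.37), ("LMCH10", 33.26)):
        print(name, round(chi2_sf_even_df(lrt, ASSUMED_DF), 4))
```

Under these assumptions the rate ratios fall between roughly 3.4 and 9.8, in line with the 3.5-10-fold range reported in the abstract, and the two p-values fall inside the brackets given in footnotes b and c. The ratios use point estimates only and ignore the large standard deviations listed in the table.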
Table 3. Statistics of incongruence length difference tests.
Partitions ILD p-value
chloroplast markers vs. LMCH9 0.0746
chloroplast markers vs. LMCH10 0.0856
LMCH9 vs. LMCH10 0.2160
Table 4. Effect of combining data partitions on the number of clades with bootstrap
support ≥ 85 %.
rbcL 10
trnLF 6
rbcL / trnLF 12
LMCH9 2
LMCH10 6
LMCH9 / LMCH10 9
rbcL / trnLF / LMCH9 13
trnLF / LMCH9 / LMCH10 12
rbcL / trnLF / LMCH9 / LMCH10 16
Figure legends
Figure 1. Maximum parsimony phylogram for each of the four markers. The total
number of most parsimonious trees from which the trees shown here were arbitrarily
chosen is given in Table 2. Thick grey branches indicate bootstrap support ≥ 85%, thick
black branches bootstrap support between 70% and 84%. Lack of flanking region
sequences for Asimina precluded outgroup rooting. For ease of comparison, trees of
LMCH9 and LMCH10 have been drawn rooted at the midpoint between the clade with
Annona muricata and the remainder of the ingroup species (as found in the plastid trees
rooted with Asimina). Horizontal bars indicate branch lengths of five steps.
Figure 2. Single most-parsimonious phylogram resulting from maximum parsimony
analysis of all data (rbcL, trnLF, LMCH9, and LMCH10) combined. Thick grey
branches indicate bootstrap support ≥ 85%, thick black branches bootstrap support
between 70% and 84%.
Figure 3. Saturation plots, displaying corrected versus uncorrected pairwise distances.
Distances were corrected using the models of molecular evolution given in Table 2.