Speciation history inferred from gene trees
L. Lacey Knowles
Department of Ecology and Evolutionary Biology
University of Michigan, Ann Arbor MI
Emphasis on multilocus data in phylogenetics and phylogeography…
• The good
• The bad
• The ugly
Utility of single locus data for inferences about speciation history??
Estimating population genetic parameters relevant to the process of species divergence
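[Figure: divergence model showing populations 1 and 2 at present, their ancestral population A, migration m, and divergence time T since speciation]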
Was speciation promoted by displacements into glacial refugia or by recolonization of sky islands during interglacials?
Was diversification inhibited or promoted during the Pleistocene?
• accurate and precise estimates of T are essential to evaluating the timing, and therefore the geographic setting, of species divergence
Parameterized model for making inferences about the divergence process
[Map: sky islands above 2500 m in MT, WY, and ID]
36 M. oregonensis
23 M. montanus
Divergence of M. oregonensis and M. montanus from the Rocky Mountains
Carstens & Knowles 2007, Mol. Ecol. 16:619-27.
5 anonymous nuclear loci
1 mitochondrial locus
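[Figure: divergence model with populations 1 and 2 at present, ancestral population A, migration m, and divergence time T]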
coalescent framework and multilocus versus single locus data set
4.9 x 10^5 to 2.0 x 10^6
estimate from average mtDNA genetic distance:
*same mutation rate used in the different approaches
divergence of gene lineages within the ancestral species
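Why a distance-based estimate can overshoot the species divergence time is easiest to see with numbers. The sketch below assumes a simple molecular clock (gene divergence time = d / 2μ) and, for a diploid nuclear locus, an expected ancestral coalescence time of 2Ne generations. The function names and every value are hypothetical and are not taken from Carstens & Knowles.

```python
def gene_divergence_time(distance, mu):
    """Time since two gene lineages diverged, from pairwise distance d and
    per-site mutation rate mu (same time unit as mu): t = d / (2 * mu)."""
    return distance / (2.0 * mu)


def species_divergence_time(distance, mu, ancestral_coalescence):
    """Subtract the expected time the gene lineages spent diverging inside
    the ancestral species (an assumed input) from the gene divergence time."""
    return max(gene_divergence_time(distance, mu) - ancestral_coalescence, 0.0)


# Hypothetical values, for illustration only.
d = 0.02                # average pairwise distance (substitutions per site)
mu = 1e-8               # mutations per site per generation
ne_ancestral = 100_000  # assumed ancestral effective population size

t_gene = gene_divergence_time(d, mu)
t_species = species_divergence_time(d, mu, 2 * ne_ancestral)
print(f"gene divergence: {t_gene:.0f} generations")
print(f"species divergence: {t_species:.0f} generations")
```

With these made-up inputs the gene lineages diverge well before the species do, which is the bias the coalescent framework is meant to account for.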
Assumed species tree of Poephila finches
Jennings & Edwards (2005) Evolution
[Figure: species tree of hecki and acuticauda (Long-tailed Finch) and cincta (Black-throated Finch) in Australia, with ancestral populations labeled ah and ahc]
Identified role of geographic barriers in a Pleistocene divergence of the grass finches
Bayesian Markov chain Monte Carlo (MCMC) method (Yang and Rannala)
- multiple independent loci
- estimates ancestral population sizes (present also)
- estimates population divergence times
- uses branch length information
- accounts for uncertainty in gene trees
Assumptions:
- “know” the species tree
- random mating
- no gene flow after population divergence
- free recombination among loci (not within)
Parameterized model for making inferences about the divergence process
Analysis of 30 anonymous nuclear loci
Jennings & Edwards (2005) Evolution
[Figure: species tree of hecki, acuticauda, and cincta with ancestral populations labeled ah and ahc]
Prior and posterior probability distributions (grey and black lines refer to analyses based on two different priors)
Increasing variance with decreasing number of loci
Estimating population genetic parameters relevant to the process of species divergence
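[Figure: divergence model with populations 1 and 2 at present, ancestral population A, migration m, and divergence time T]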
• The good
• The bad
• The ugly
Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa
Effects of sampling scheme:
contrast between sequencing single representatives per species versus multiple individuals per species
gene tree
species tree
Gene trees will not always match the species tree
• deep coalescence
Maddison 1997
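For the simplest case, three species with one gene copy each, the chance that a gene tree matches the species tree has a well-known closed form: 1 - (2/3)e^(-T), where T is the length of the internal species-tree branch in coalescent units. The sketch below assumes no gene flow and that T is already expressed in coalescent units (2Ne generations for a diploid population); the function names are hypothetical.

```python
import math


def concordance_probability(internal_branch):
    """P(gene tree topology matches a 3-species tree) = 1 - (2/3) * exp(-T),
    with T the internal branch length in coalescent units."""
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-internal_branch)


def deep_coalescence_probability(internal_branch):
    """P(the two sister-species lineages fail to coalesce on the internal
    branch), i.e. the coalescence is pushed deeper into the ancestor."""
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    return math.exp(-internal_branch)


for t in (0.1, 1.0, 10.0):
    print(t, round(concordance_probability(t), 3))
```

Short internal branches leave the three possible gene tree topologies almost equally likely, so discord is expected for recently diverged taxa.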
While there is a distribution of possible gene trees for a given species tree, the probability of each gene tree differs
low P(Gtree|Stree)
high P(Gtree|Stree)
Degnan & Salter (2005) Evolution
5 taxa
105 possible gene tree topologies
* The shape of this distribution will differ depending on the shape of the species tree
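The count of 105 topologies for 5 taxa follows from the standard formula for rooted, bifurcating, labeled trees, (2n - 3)!!. A minimal sketch (the function name is hypothetical):

```python
def rooted_topologies(n_taxa):
    """Number of rooted, bifurcating, labeled tree topologies: (2n - 3)!!"""
    if n_taxa < 2:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


for n in (3, 4, 5, 8):
    print(n, rooted_topologies(n))
```

The number grows quickly with the number of taxa, which is why the distribution of gene tree probabilities becomes hard to enumerate beyond a handful of species.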
Inferred history of species divergence differs among loci
Jennings & Edwards (2005) Evolution
Gene trees from 30 anonymous markers with single individual sequenced per species
Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa
Gene tree from one locus with 9 individuals sequenced in each of 8 different species
Multilocus data
concatenation
“THE history”
Arbitrary criteria
History of divergence based on single nucleotide difference
What is the true species tree?
Recently developed approaches for estimating the species tree (explicitly consider the process of gene lineage coalescence in the estimation of the history of species divergence)
Maddison & Knowles 2006
Edwards et al. 2007
Liu & Pearl 2007
gene tree
species tree
Gene tree from one locus with multiple individuals sequenced per species
discord
Extract the historical signal of species divergence, despite discord between the gene tree and species tree
Goal: estimate the species tree directly (as opposed to estimating a gene tree and equating that gene tree with the history of the species)
[Figure: gene tree with multiple gene copies sampled from species A, shown in discord with the species tree]
(1) minimize the number of deep coalescences
(2) shallowest divergence between species
Considers the process of lineage sorting, but the actual probabilities of incomplete lineage sorting are not quantified using a stochastic model
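The deep coalescence criterion can be sketched as a count of "extra" gene lineages on each species-tree branch, with the preferred species tree being the candidate that needs the fewest. This is an illustrative sketch, not the published implementation: trees are nested tuples, gene tree leaves are labeled by the species they were sampled from, and all function names are hypothetical.

```python
def leaf_species(node):
    """Set of species labels below a node; leaves are species-name strings."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_species(child) for child in node))


def species_branches(species_tree):
    """Clades (as species sets) for every species-tree branch except the root."""
    clades = []

    def visit(node, is_root):
        if not is_root:
            clades.append(leaf_species(node))
        if not isinstance(node, str):
            for child in node:
                visit(child, False)

    visit(species_tree, True)
    return clades


def lineages_leaving(gene_node, clade):
    """Gene lineages exiting the top of a species branch, going back in time."""
    if leaf_species(gene_node) <= clade:
        return 1  # this whole gene subtree coalesces within the clade
    if isinstance(gene_node, str):
        return 0  # a gene copy sampled outside the clade
    return sum(lineages_leaving(child, clade) for child in gene_node)


def deep_coalescences(gene_tree, species_tree):
    """Total extra lineages (beyond one per branch) implied by the gene tree."""
    return sum(max(lineages_leaving(gene_tree, clade) - 1, 0)
               for clade in species_branches(species_tree))


def best_species_tree(gene_tree, candidates):
    """Candidate species tree that minimizes the number of deep coalescences."""
    return min(candidates, key=lambda s: deep_coalescences(gene_tree, s))


# Two gene copies per species; copies group by species, then A with B.
gene_tree = ((("A", "A"), ("B", "B")), ("C", "C"))
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), ("A", ("B", "C"))]
print(best_species_tree(gene_tree, candidates))
```

Enumerating candidates only works for a few species; for larger problems a heuristic tree search would be needed, which is outside this sketch.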
STEM and BEST: Likelihood and Bayesian approaches that incorporate stochastic models of both nucleotide substitution and lineage sorting processes
Can the history of species divergence be recovered from a single gene tree?
[Figure: example alignment of simulated sequences; dots mark sites identical to the first sequence, TCCGGTGTCAAT]
Simulation workflow, replicates 1 to 500:
simulated species trees -> simulated gene trees -> simulated sequences -> reconstructed gene trees -> reconstructed species trees
infer species tree:
• shallowest divergence approach
• minimize the number of deep coalescences
Maddison & Knowles 2006
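The shallowest divergence approach can be sketched as two steps: take the smallest divergence between any pair of gene copies from two species as the distance between those species, then cluster the species on those distances. The clustering rule used here (joining the closest pair, single linkage) and all names and numbers are assumptions for illustration, not the published procedure.

```python
from itertools import combinations


def shallowest_matrix(copy_dist, species_of):
    """Minimum between-species divergence over all pairs of gene copies.
    copy_dist maps (copy_a, copy_b) -> divergence; species_of maps copy -> species."""
    shallowest = {}
    for (a, b), dist in copy_dist.items():
        sa, sb = species_of[a], species_of[b]
        if sa == sb:
            continue  # within-species comparisons carry no between-species signal
        key = frozenset((sa, sb))
        shallowest[key] = min(dist, shallowest.get(key, float("inf")))
    return shallowest


def cluster_species(shallowest, species):
    """Repeatedly join the two clusters with the smallest shallowest divergence.
    Requires a distance for every pair of species."""
    clusters = {frozenset([s]): s for s in species}  # members -> nested tuple

    def dist(x, y):
        return min(shallowest[frozenset((a, b))] for a in x for b in y)

    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters[x | y] = (clusters.pop(x), clusters.pop(y))
    return next(iter(clusters.values()))


# Hypothetical divergences among four gene copies from three species.
species_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
copy_dist = {("a1", "a2"): 0.01, ("a1", "b1"): 0.02, ("a2", "b1"): 0.05,
             ("a1", "c1"): 0.08, ("a2", "c1"): 0.07, ("b1", "c1"): 0.09}
tree = cluster_species(shallowest_matrix(copy_dist, species_of), ["A", "B", "C"])
print(tree)
```

Using the minimum rather than the average divergence reflects the idea that gene copies cannot diverge more recently than the species themselves, barring gene flow.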
simulated species trees
inferred species trees
accuracy assessment: number of partitions of the species in common between the original and inferred species trees (max = 5 for the 8-species trees)
500 replicate species trees of 8 species each
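A maximum of 5 for 8 species matches the number of non-trivial bipartitions of an unrooted binary tree (n - 3). The sketch below counts shared bipartitions under that reading; the nested-tuple tree encoding, the function names, and the example trees are assumptions for illustration.

```python
def all_leaves(node):
    """Set of taxon labels below a node; leaves are strings."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(all_leaves(child) for child in node))


def bipartitions(tree):
    """Non-trivial bipartitions of the tree, treated as unrooted. Each split is
    stored as the side that excludes a fixed reference taxon, so both trees
    being compared must contain the same taxa."""
    taxa = all_leaves(tree)
    reference = min(taxa)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if reference not in below else taxa - below
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def shared_partitions(true_tree, inferred_tree):
    """Accuracy measure: number of bipartitions the two trees have in common."""
    return len(bipartitions(true_tree) & bipartitions(inferred_tree))


true_tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
inferred = ((("A", "B"), ("C", "E")), (("D", "F"), ("G", "H")))
print(shared_partitions(true_tree, true_tree), shared_partitions(true_tree, inferred))
```

Swapping just two taxa between clades removes several shared partitions at once, which illustrates how sensitive this measure is to small topological changes.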
Goals: Examine a reasonable spectrum of topologies and branch lengths
simulated species trees
(500 species trees were simulated rather than choosing a single species tree & assessing how well it can be reconstructed with many simulation replicates)
• t = 100,000 (i.e., 1Ne); 500 replicate species trees
• t = 1,000,000 (i.e., 10Ne); 500 replicate species trees (*topologies of the two sets of trees are identical)
Determine how the extent of incomplete lineage sorting affects the ability to reconstruct species histories
Maddison & Knowles 2006
(1, 3, 9 or 27 gene trees representing unlinked loci simulated independently with either 1, 3, 9 or 27 gene sequences simulated for each locus per species)
simulated species trees
simulated gene trees
neutral coalescence (Ne = 100,000)
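How branch duration controls incomplete lineage sorting can be illustrated with a minimal neutral coalescent simulation inside a single population. The sketch assumes time is measured in coalescent units, so how it maps onto the "1Ne" and "10Ne" tree depths depends on a scaling convention not stated here; the function names and parameter values are hypothetical.

```python
import random


def lineages_remaining(n_lineages, duration, rng):
    """Simulate a neutral coalescent in one population and return how many
    lineages are left after `duration` (in coalescent units)."""
    k = n_lineages
    elapsed = 0.0
    while k > 1:
        # With k lineages, the wait to the next coalescence is exponential
        # with rate k * (k - 1) / 2.
        elapsed += rng.expovariate(k * (k - 1) / 2.0)
        if elapsed > duration:
            break
        k -= 1
    return k


def mean_extra_lineages(n_lineages, duration, replicates=10_000, seed=1):
    """Average number of lineages beyond one that fail to coalesce in time."""
    rng = random.Random(seed)
    total = sum(lineages_remaining(n_lineages, duration, rng) - 1
                for _ in range(replicates))
    return total / replicates


for duration in (0.5, 5.0):
    print(duration, mean_extra_lineages(9, duration))
```

Shorter branches leave more lineages uncoalesced, which is why shallower species trees are expected to show more deep coalescences.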
Accuracy affected by:
• increasing total sampling effort per species (either 1, 3, 9 or 27 sequences per species)
• increasing the number of individuals per locus versus the number of loci per species for a given sampling effort
Maddison & Knowles 2006
Number of deep coalescences
gene copies per locus    1Ne      10Ne
1                        7.6      1.8
3                        28.7     6.9
9                        63.2     14.7
27                       114.4    25.7
Lots of discord (i.e., our simulated data should reflect well the challenges faced when reconstructing evolutionary relationships near the species/population level)
Maddison & Knowles 2006
Average proportion of correct partitions (those in the inferred tree matching the true tree), 1 locus
[Figure: accuracy of the Deep Coalescents and Shallowest Divergence methods with 1, 3, 9, and 27 gene copies; panel a: total tree depth of 1Ne; panel b: total tree depth of 10Ne]
Values shown, in pairs: 0.26 0.27; 0.47 0.53; 0.59 0.60; 0.64 0.56; 0.76 0.73; 0.79 0.78; 0.80 0.79; 0.82 0.84
* Average accuracy greater as expected
gene trees retain some signal of phylogenetic history despite significant discord with species tree
0.60 is reasonably successful, given that the shared partition measure is sensitive to minor changes in tree structure (approximately equivalent to a single terminal taxon being out of place)
Maddison & Knowles 2006
[Figure: gene tree and species tree]
Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa
Gene tree from one locus with multiple individuals sequenced per species and very simple approach
• The good
• The bad
• The ugly
What would happen if more loci were considered?
Frequency distribution of species tree accuracy with increasing number of individuals
[Figure: proportion of trees versus tree accuracy (number of shared partitions with ‘true’ tree, 0 to 5); curves: random, 1 individual, 3 individuals, 9 individuals, 27 individuals]
Frequency distribution of species tree accuracy with increasing number of loci
[Figure: proportion of trees versus tree accuracy (number of shared partitions with ‘true’ tree, 0 to 5); curves: random, 1 locus, 3 loci, 9 loci, 27 loci]
Similar accuracy for a given sampling effort whether sampling multiple individuals or multiple loci, for recent divergence (t = 1Ne)
• The curve marked “random” shows the expected distribution of the accuracy measure in comparing two randomly simulated trees
Acknowledgements:
• Wayne Maddison
• Bryan Carstens (former postdoc)
Support: NSF (DEB 04-47224) & the University of Michigan