Post on 13-Jan-2016


Speciation history inferred from gene trees

L. Lacey Knowles

Department of Ecology and Evolutionary Biology

University of Michigan, Ann Arbor MI

knowlesl@umich.edu

Emphasis on multilocus data in phylogenetics and phylogeography…

• The good

• The bad

• The ugly

Utility of single-locus data for inferences about speciation history?

Estimating population genetic parameters relevant to the process of species divergence

[Figure: divergence model; labels 1, 2, A, m and T; time axis from speciation to Present]

Was speciation promoted by displacements into glacial refugia or recolonization of sky islands

during interglacials?

Was diversification inhibited or promoted during the Pleistocene?

• accurate & precise estimates of T are essential to evaluating when, and therefore in what geographic setting, species divergence occurred

Parameterized model for making inferences about the divergence process

[Map of the sky islands above 2500 m; states labeled: MT, WY, ID]

36 M. oregonensis

23 M. montanus

Divergence of M. oregonensis and M. montanus from the Rocky Mountains

Carstens & Knowles 2007, Mol. Ecol. 16:619-27.

5 anonymous nuclear loci

1 mitochondrial locus

[Figure: divergence model; labels 1, 2, A, m and T; time axis to Present]

coalescent framework and multilocus versus single locus data set

4.9 × 10^5 to 2.0 × 10^6

estimate from average mtDNA genetic distance:

*same mutation rate used in the different approaches

divergence of gene lineages within the ancestral species

Assumed species tree of Poephila finches

Jennings & Edwards (2005) Evolution

[Figure: species tree of hecki, acuticauda and cincta (Long-tailed Finch, Black-throated Finch) with a map of Australia; parameter labels: tahc-tah, tah, ah, ahc]

Identified role of geographic barriers in a Pleistocene divergence of the grass finches

Bayesian Markov chain Monte Carlo (MCMC) method (Yang and Rannala)

- multiple independent loci

- estimates ancestral population sizes (present also)

- estimates population divergence times

- uses branch length information

- accounts for uncertainty in gene trees

Assumptions:

- “know” the species tree

- random mating

- no gene flow after population divergence

- free recombination among loci (not within)

Parameterized model for making inferences about the divergence process

Analysis of 30 anonymous nuclear loci

Jennings & Edwards (2005) Evolution

[Figure: species tree of hecki, acuticauda and cincta; parameter labels: tahc-tah, tah, ah, ahc]

Prior and posterior probability distributions (grey and black lines refer to analyses based on two different priors)

Increasing variance with decreasing number of loci

Estimating population genetic parameters relevant to the process of species divergence

[Figure: divergence model; labels 1, 2, A, m and T; time axis to Present]

• The good

• The bad

• The ugly

Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa

Effects of sampling scheme:

contrast between sequencing single representatives per species versus multiple individuals per species

gene tree

species tree

Gene trees will not always match the species tree

• deep coalescence

Maddison 1997

While there is a distribution of possible gene trees for a given species tree, the probability of each gene tree differs

low P(Gtree|Stree)

high P(Gtree|Stree)

Degnan & Salter (2005) Evolution

5 taxa

105 possible gene tree topologies

* The shape of this distribution will differ depending on the shape of the species tree
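The count of 105 topologies for 5 taxa follows from the standard formula for rooted, bifurcating, labeled trees, (2n - 3)!!. A minimal sketch (the function name is illustrative, not from the talk):

```python
def num_rooted_topologies(n_taxa: int) -> int:
    """Number of rooted, bifurcating, labeled topologies: (2n - 3)!!"""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


for n in (3, 4, 5, 8):
    print(n, num_rooted_topologies(n))
```

For 8 species, the tree size used in the simulations later in the talk, the same formula gives 135,135 possible topologies.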

Inferred history of species divergence differs among loci

Jennings & Edwards (2005) Evolution

Gene trees from 30 anonymous markers with a single individual sequenced per species

Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa

Gene tree from one locus with 9 individuals sequenced in each of 8 different species

Multilocus data

concatenation

“THE history”

Arbitrary criteria

History of divergence based on single nucleotide difference

What is the true species tree?

Recently developed approaches for estimating the species tree (explicitly consider the process of gene lineage coalescence in the estimation of the history of species divergence)

Maddison & Knowles 2006; Edwards et al. 2007; Liu & Pearl 2007

gene tree

species tree

Gene tree from one locus with multiple individuals sequenced per species

discord

Extract the historical signal of species divergence, despite discord between the gene tree and species tree

Goal: estimate the species tree directly (as opposed to estimating a gene tree and equating that gene tree with the history of the species)

[Figure: gene tree (tips labeled by species) drawn within the species tree, showing discord]

(1) minimize the number of deep coalescences

(2) shallowest divergence between species

Considers the process of lineage sorting, but the actual probabilities of incomplete lineage sorting are not quantified using a stochastic model
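One way to picture the shallowest divergence idea is the sketch below. It assumes the divergence between two species is taken as the smallest divergence between any pair of their gene copies, and that species are then joined by simple single-linkage clustering. The clustering rule, the function name, and the toy distances are assumptions for illustration, not necessarily the procedure of Maddison & Knowles (2006).

```python
from itertools import combinations


def shallowest_divergence_tree(dist, species_of):
    """dist: {(copy_i, copy_j): divergence}; species_of: {copy: species}."""
    # Between-species distance = shallowest divergence among their gene copies.
    between = {}
    for (a, b), value in dist.items():
        sa, sb = species_of[a], species_of[b]
        if sa != sb:
            key = frozenset((sa, sb))
            between[key] = min(value, between.get(key, float("inf")))

    def cluster_dist(x, y):
        # Single linkage: closest pair of species across the two clusters.
        return min(between[frozenset((p, q))] for p in x[0] for q in y[0])

    # Each cluster is (set of species, nested-tuple subtree).
    clusters = [(frozenset([s]), s) for s in sorted(set(species_of.values()))]
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        clusters.remove(x)
        clusters.remove(y)
        clusters.append((x[0] | y[0], (x[1], y[1])))
    return clusters[0][1]


# Toy gene tree depths: copy a1 is closer to b1 than to a2 (discord).
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
dist = {
    ("a1", "b1"): 0.2, ("a1", "b2"): 0.5, ("b1", "b2"): 0.5,
    ("a1", "a2"): 1.0, ("a2", "b1"): 1.0, ("a2", "b2"): 1.0,
    ("a1", "c1"): 2.0, ("a2", "c1"): 2.0, ("b1", "c1"): 2.0, ("b2", "c1"): 2.0,
}
print(shallowest_divergence_tree(dist, species_of))
```

In the toy data species A is not monophyletic in the gene tree, yet A and B are still grouped first because their shallowest between-species divergence (0.2) is the smallest.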

STEM and BEST: Likelihood and Bayesian approaches that incorporate stochastic models of both nucleotide substitution and lineage sorting processes

Can the history of species divergence be recovered from a single gene tree?

[Figure: simulated sequence alignment]

simulated species trees (1 ... 500)

simulated gene trees

simulated sequences

reconstructed gene trees

reconstructed species trees

infer species tree:

• shallowest divergence approach

• minimize the number of deep coalescences

simulated species trees compared with inferred species trees

Maddison & Knowles 2006

Accuracy assessment: number of partitions of the species shared between the original and inferred species trees (max = 5 for the 8-species trees)
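A minimal sketch of a shared-partition count, assuming trees are written as nested tuples and that "partitions" means the non-trivial bipartitions of the species (an 8-species binary tree has 8 - 3 = 5 of them, matching the stated maximum). The function names and the example trees are hypothetical.

```python
def leaves(tree):
    """Set of tip labels below a node of a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaves(child) for child in tree))
    return frozenset([tree])


def partitions(tree):
    """Non-trivial bipartitions, each stored as the side lacking a reference taxon."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return
        clade = leaves(node)
        side = all_taxa - clade if ref in clade else clade
        # Skip trivial splits (a single tip against all the rest).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def shared_partitions(tree_a, tree_b):
    return len(partitions(tree_a) & partitions(tree_b))


true_tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
inferred = ((("A", "B"), ("C", "E")), (("D", "F"), ("G", "H")))
print(shared_partitions(true_tree, true_tree))
print(shared_partitions(true_tree, inferred))
```

Swapping just two species (D and E) between the halves of the tree drops the score from 5 to 2, which illustrates how sensitive the measure is to small changes in tree structure.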

500 replicate species trees of 8 species each

Goals: Examine a reasonable spectrum of topologies and branch lengths

simulated species trees

(500 species trees were simulated rather than choosing a single species tree & assessing how well it can be reconstructed with many simulation replicates)

• t = 100,000 (i.e., 1Ne); 500 replicate species trees

• t = 1,000,000 (i.e., 10Ne); 500 replicate species trees (*topologies of the two sets of trees are identical)

Determine how the extent of incomplete lineage sorting affects the ability to reconstruct species histories
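For intuition about why depth matters, the classic three-species result gives the probability that a gene tree matches the species tree as 1 - (2/3)e^(-T), where T is the internal branch length in coalescent units. The sketch below assumes the usual diploid scaling of 2Ne generations per coalescent unit; the slides express depth in units of Ne, and the internal branch is only part of the total tree depth, so the numbers are illustrative only.

```python
import math


def p_gene_tree_matches(t_generations, ne, generations_per_unit_factor=2):
    """P(gene tree topology = species tree) for 3 species, one copy each.

    t_generations: length of the internal species tree branch in generations.
    generations_per_unit_factor: 2 assumes diploid scaling (an assumption here).
    """
    T = t_generations / (generations_per_unit_factor * ne)
    return 1.0 - (2.0 / 3.0) * math.exp(-T)


ne = 100_000
for t in (0, 100_000, 1_000_000):
    print(t, round(p_gene_tree_matches(t, ne), 3))
```

With a zero-length internal branch all three topologies are equally likely (1/3); the probability of a match approaches 1 as the branch gets long relative to Ne.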

Maddison & Knowles 2006

(1, 3, 9 or 27 gene trees representing unlinked loci simulated independently with either 1, 3, 9 or 27 gene sequences simulated for each locus per species)

simulated species trees

simulated gene trees

neutral coalescence (Ne = 100,000)

Accuracy affected by:

• Increasing total sampling effort per species (either 1, 3, 9 or 27 sequences per species)

• Increasing the number of individuals per locus versus the number of loci per species for a given sampling effort

Maddison & Knowles 2006

Number of deep coalescences

gene copies per locus    1Ne      10Ne
1                        7.6      1.8
3                        28.7     6.9
9                        63.2     14.7
27                       114.4    25.7

Lots of discord (i.e., our simulated data should well reflect the challenges faced in reconstructing evolutionary relationships near the species/population level)

Maddison & Knowles 2006
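A minimal sketch of how deep coalescences can be counted for a gene tree fitted into a species tree, following the usual "extra lineages" idea (Maddison 1997): for each species tree branch, count the gene lineages that leave the top of the branch without having coalesced, minus one. Trees are nested tuples; the function names are hypothetical, and the sketch assumes the gene tree samples copies from both sides of the species tree root.

```python
def leaf_set(tree):
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(child) for child in tree))
    return frozenset([tree])


def species_branches(species_tree):
    """Set of species below every non-root branch of the species tree."""
    clades = []

    def walk(node, is_root):
        if not is_root:
            clades.append(leaf_set(node))
        if isinstance(node, tuple):
            for child in node:
                walk(child, False)

    walk(species_tree, True)
    return clades


def gene_edges(gene_tree, species_of):
    """(species below child, species below parent) for every gene tree edge."""
    edges = []

    def walk(node):
        here = frozenset(species_of[g] for g in leaf_set(node))
        if isinstance(node, tuple):
            for child in node:
                edges.append((walk(child), here))
        return here

    walk(gene_tree)
    return edges


def deep_coalescences(gene_tree, species_tree, species_of):
    edges = gene_edges(gene_tree, species_of)
    extra = 0
    for clade in species_branches(species_tree):
        # A lineage exits this branch if everything below it lies inside the
        # clade but its parent node also covers species outside the clade.
        crossing = sum(1 for child, parent in edges
                       if child <= clade and not parent <= clade)
        extra += max(crossing - 1, 0)
    return extra


species_tree = (("A", "B"), "C")
species_of = {"a": "A", "b": "B", "c": "C"}
print(deep_coalescences((("a", "b"), "c"), species_tree, species_of))
print(deep_coalescences((("a", "c"), "b"), species_tree, species_of))
```

The concordant gene tree needs no deep coalescences, while the discordant one needs a single extra lineage in the ancestor of A and B.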

[Figure: results for 1 locus with 1, 3, 9 or 27 gene copies; panel a: total tree depth of 1Ne; panel b: total tree depth of 10Ne]

Average proportion of correct partitions (those in the inferred tree matching the true tree)

gene trees retain some signal of phylogenetic history despite significant discord with species tree

* Average accuracy greater as expected

Accuracy values shown for 1 locus: 0.26 0.27; 0.47 0.53; 0.59 0.60; 0.64 0.56; 0.76 0.73; 0.79 0.78; 0.80 0.79; 0.82 0.84

0.60 is reasonably successful, given that the shared partition measure is sensitive to minor changes in tree structure (approximately equivalent to a single terminal taxon being out of place)

[Bar labels: Deep Coalescents and Shallowest Divergence, for each number of gene copies]

Maddison & Knowles 2006

gene tree

species tree

Estimating the history (order) of divergence events (i.e., the species tree) for recently derived taxa

Gene tree from one locus with multiple individuals sequenced per species and very simple approach

• The good

• The bad

• The ugly

What would happen if more loci were considered?

[Figure: Frequency distribution of species tree accuracy with increasing number of individuals; curves: random, 1 individual, 3 individuals, 9 individuals, 27 individuals]

[Figure: Frequency distribution of species tree accuracy with increasing number of loci; curves: random, 1 locus, 3 loci, 9 loci, 27 loci]

Axes: tree accuracy (number of shared partitions with ‘true’ tree, 0 to 5) versus proportion of trees

For recent divergence (t = 1Ne), accuracy is similar for a given sampling effort whether multiple individuals or multiple loci are sampled

• The curve marked “random” shows the expected distribution of the accuracy measure in comparing two randomly simulated trees

Acknowledgements:

• Wayne Maddison

• Bryan Carstens (former postdoc)

Support: NSF (DEB 04-47224) & the University of Michigan

knowlesl@umich.edu
