joaks-evolution-2014

An Improved Approximate-Bayesian Method for Estimating Shared Evolutionary History

Jamie R. Oaks1,2

1Department of Ecology and Evolutionary Biology, University of Kansas

2Department of Biology, University of Washington

June 21, 2014

Estimating shared history. J. Oaks, University of Washington

Processes of diversification

- Large-scale geological and climatic processes are important in biodiversification and community assembly

- Accounting for such processes will better our understanding of biodiversity

- We need methods for inferring evolutionary patterns predicted by historical events from contemporary populations


Community scale processes

We want to infer m and T given DNA sequence alignments X

[Figure: time axis, Time (kya), 0 to 500]


Divergence model choice

T = (T1, T2, T3)       model = 111    τ = {τ1}

T = (260, 260, 260)    model = 111    τ = {260}

T = (397, 260, 260)    model = 211    τ = {260, 397}

T = (260, 397, 260)    model = 121    τ = {260, 397}

T = (260, 260, 397)    model = 112    τ = {260, 397}

T = (260, 95, 397)     model = 123    τ = {260, 95, 397}

T = (T1, ..., TY)      model = m_i    τ = {τ1, ..., τ|τ|}

[Figure: for each model, the divergence times T1, T2, T3 and the divergence events τ1, τ2, τ3 marked on a time axis, Time (kya), 0 to 500]

Notation:

X  Sequence alignments

T  Divergence times

m  Divergence model

G  Gene trees

φ  Substitution parameters

Θ  Demographic parameters
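The relationship between T, τ, and the model label can be sketched in a few lines of Python. This is an illustration only, not code from msBayes: the function name `model_code` is hypothetical, and the order of the events within τ is simply taken as written on each slide.

```python
def model_code(divergence_times, events):
    """Return the model label for a vector of divergence times.

    Each pair's time is replaced by the 1-based index of the event it
    belongs to. `events` is the ordered list of unique times (tau).
    Labels are concatenated digits, so this sketch is only unambiguous
    for fewer than ten events.
    """
    if set(divergence_times) != set(events):
        raise ValueError("times and events must contain the same values")
    return "".join(str(events.index(t) + 1) for t in divergence_times)


# The five three-pair examples from the slides (times in kya).
examples = [
    ((260, 260, 260), [260]),
    ((397, 260, 260), [260, 397]),
    ((260, 397, 260), [260, 397]),
    ((260, 260, 397), [260, 397]),
    ((260, 95, 397), [260, 95, 397]),
]
for times, events in examples:
    print(times, "->", model_code(times, events))
```

The label depends on how the events are numbered, which is why τ is passed in explicitly rather than derived from T.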


The msBayes model

msBayes will often infer clustered divergences when divergences are random over millions of generations.

J. R. Oaks et al. (2013). Evolution 67: 991–1010. J. R. Oaks et al. (2014). arXiv:1402.6397 [q-bio.PE].

Objective:

Use principles of probability to extend the msBayes framework for improved estimation of shared evolutionary history



An improved method

Potential improvements:

1. Alternative priors on parameters that increase marginal likelihoods of rich models

2. Alternative approach to modeling the temporal distribution of divergences



$p(X) = \int_{\theta} p(X \mid \theta)\, p(\theta)\, d\theta$

[Figure: densities of the likelihood p(X | θ) and the prior p(θ) plotted against θ from 0 to 1]
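The effect of the prior on the marginal likelihood can be checked numerically. The sketch below is a toy example, not part of msBayes or dpp-msbayes: it assumes a binomial likelihood (3 successes in 30 trials, values chosen arbitrarily) and compares a uniform prior on (0, 1) with a narrower uniform prior on (0, 0.3), integrating with a simple midpoint rule.

```python
import math


def binomial_likelihood(theta, successes, trials):
    """p(X | theta) for a toy binomial data set."""
    return (math.comb(trials, successes)
            * theta ** successes
            * (1.0 - theta) ** (trials - successes))


def marginal_likelihood(prior_upper, successes=3, trials=30, n_grid=20000):
    """Integrate p(X | theta) p(theta) over a Uniform(0, prior_upper) prior.

    Uses the midpoint rule; the prior density is 1 / prior_upper on the
    interval and zero elsewhere.
    """
    width = prior_upper / n_grid
    prior_density = 1.0 / prior_upper
    total = 0.0
    for i in range(n_grid):
        theta = (i + 0.5) * width
        total += binomial_likelihood(theta, successes, trials) * prior_density * width
    return total


broad = marginal_likelihood(1.0)    # prior mass spread over all of (0, 1)
narrow = marginal_likelihood(0.3)   # prior mass concentrated near the likelihood
print("broad prior :", broad)
print("narrow prior:", narrow)
```

The broad prior spends most of its mass where the likelihood is close to zero, so its average likelihood is lower. The same logic applies to parameter-rich models under wide uniform priors.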





Prior on divergence models

- msBayes uses a discrete uniform prior on the number of divergence events

[Figure: (A) number of divergence models and (B) prior probability of each model, p(M_{|τ|,i}), plotted against the number of divergence events, |τ|]

Potential solution:

Place a flexible prior directly on the sample space of divergence models
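A short calculation shows why a uniform prior on the number of events is not uniform over models. The sketch below assumes that divergence models are counted as unordered partitions of the Y population pairs (integer partitions) and that prior mass within each |τ| is split evenly among its models; both are assumptions made for illustration, and the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_models(n_pairs, n_events):
    """Number of integer partitions of n_pairs into exactly n_events parts."""
    if n_pairs == 0 and n_events == 0:
        return 1
    if n_pairs <= 0 or n_events <= 0 or n_events > n_pairs:
        return 0
    # Either one part has size 1 (remove it), or all parts are >= 2
    # (subtract 1 from every part).
    return (count_models(n_pairs - 1, n_events - 1)
            + count_models(n_pairs - n_events, n_events))


def prior_per_model(n_pairs):
    """Prior probability of a single model for each number of events,
    under a discrete uniform prior on the number of events."""
    return {
        k: 1.0 / (n_pairs * count_models(n_pairs, k))
        for k in range(1, n_pairs + 1)
    }


priors = prior_per_model(22)
for k in (1, 5, 11, 22):
    print(k, count_models(22, k), priors[k])
```

Under these assumptions, the single one-event model and the single fully independent model each receive 1/22 of the prior mass, while each model with an intermediate number of events receives far less.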


New method: dpp-msbayes

- Replaced uniform priors on continuous parameters with gamma and beta distributions

- Dirichlet process prior (DPP) over all possible divergence models
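One common way to draw a partition from a Dirichlet process is the Chinese restaurant process. The sketch below shows that generic construction in Python; it is not the dpp-msbayes implementation, and the function names and the concentration values used are illustrative assumptions.

```python
import random


def sample_divergence_model(n_pairs, concentration, rng):
    """Assign each population pair to a divergence event.

    Pair i joins an existing event with probability proportional to the
    number of pairs already in it, or starts a new event with probability
    proportional to the concentration parameter.
    """
    assignments = []   # 1-based event index for each pair
    event_sizes = []   # number of pairs assigned to each event so far
    for _ in range(n_pairs):
        weights = event_sizes + [concentration]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(event_sizes):
            event_sizes.append(1)      # open a new divergence event
        else:
            event_sizes[choice] += 1   # join an existing event
        assignments.append(choice + 1)
    return assignments


def expected_number_of_events(n_pairs, concentration):
    """Prior mean number of events under the Chinese restaurant process."""
    return sum(concentration / (concentration + i) for i in range(n_pairs))


rng = random.Random(1)
for alpha in (0.1, 1.0, 10.0):
    model = sample_divergence_model(22, alpha, rng)
    print(alpha, expected_number_of_events(22, alpha), model)
```

Small concentration values favor models with few shared events, and large values favor many independent divergences, so a single parameter controls how the prior mass is spread over the space of models.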


dpp-msbayes: Simulation-based assessment

Simulate 50,000 datasets under three models

M_msBayes
- U-shaped prior on divergence models
- Uniform priors on continuous parameters

M_Ushaped
- U-shaped prior on divergence models
- Gamma priors on continuous parameters

M_DPP
- DPP prior on divergence models
- Gamma priors on continuous parameters

Analyze all datasets under each of the models


dpp-msbayes: Simulation results

[Figure: true probability of one divergence plotted against posterior probability of one divergence (both 0 to 1), with one panel per combination of data model (columns) and analysis model (rows). The first version shows M_msBayes and M_DPP; the second adds data models M_Uniform and M_Ushaped.]

J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].



dpp-msbayes: Simulation-based power analyses

- Simulate datasets in which all 22 divergence times are random
  - τ ∼ U(0, 0.5 MGA)
  - τ ∼ U(0, 1.5 MGA)
  - τ ∼ U(0, 2.5 MGA)
  - τ ∼ U(0, 5.0 MGA)
  - MGA = Millions of Generations Ago

- Simulate 1000 datasets for each τ distribution

- Analyze all 4000 datasets under models M_msBayes, M_Ushaped, and M_DPP
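The first step of this design, drawing the true divergence times, can be sketched as follows. The code only draws the 22 times per dataset; simulating sequence alignments from them is outside its scope. The function names, the random seed, and the variance-to-mean summary are assumptions made for illustration.

```python
import random
import statistics

UPPER_BOUNDS_MGA = [0.5, 1.5, 2.5, 5.0]   # upper limits of the four uniform distributions
N_PAIRS = 22                              # one divergence time per population pair
N_DATASETS = 1000                         # datasets per distribution


def draw_divergence_times(upper_mga, rng, n_pairs=N_PAIRS):
    """Draw independent divergence times from U(0, upper_mga)."""
    return [rng.uniform(0.0, upper_mga) for _ in range(n_pairs)]


def variance_to_mean(times):
    """Sample variance of the divergence times divided by their mean."""
    return statistics.variance(times) / statistics.mean(times)


rng = random.Random(2014)
true_times = {
    upper: [draw_divergence_times(upper, rng) for _ in range(N_DATASETS)]
    for upper in UPPER_BOUNDS_MGA
}

for upper, datasets in true_times.items():
    ratios = [variance_to_mean(times) for times in datasets]
    print(upper, "MGA: median variance/mean =", statistics.median(ratios))
```

Because every time is drawn independently, each simulated dataset has 22 distinct divergence events, so any inference of shared divergence in these analyses is spurious.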


dpp-msbayes: Power results

[Figure: density of the estimated number of divergence events (posterior mode, 1 to 21) under analysis models M_msBayes and M_DPP, with one column per simulation setting: τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), τ ∼ U(0, 5.0 MGA)]

[Figure: densities on a 0 to 1 x-axis (axis label not recovered) under analysis models M_msBayes, M_Ushaped, and M_DPP, with one column per simulation setting as above]


Empirical application

Did fragmentation of the Philippine Islands during inter-glacial rises in sea level promote diversification?


Empirical results: Philippine diversification

[Figure: posterior probability (0.0 to 0.5) of the number of divergence events (1 to 21), in two panels: msBayes and dpp-msbayes]



Conclusions

- New method for estimating shared evolutionary history shows improved
  1. Estimation of posterior uncertainty
  2. Model-choice accuracy
  3. Power to detect temporal variation across divergences
  4. Robustness to model violations

Caveats:

- Estimating a very rich model (600+ parameters for 22 taxa) using limited information from the data
  - Likely sensitive to prior assumptions
  - Be skeptical of strongly supported results


Recommendations

For Bayesian model choice, choose priors carefully

ABC model choice estimates should be accompanied by:

1. Simulation-based power analyses

2. Assessment of prior sensitivity


Future directions

- Full-likelihood Bayesian approach¹

- Full-phylogenetic framework

[Figure: divergence times T1, T2, T3 and event τ1 on a time axis, Time (kya), 0 to 500]

¹ J. Sukumaran (2012). PhD thesis. Lawrence, Kansas, USA: University of Kansas


Everything is on GitHub. . .

Software:

- dpp-msbayes: https://github.com/joaks1/dpp-msbayes

- PyMsBayes: https://github.com/joaks1/PyMsBayes

- ABACUS: Approximate BAyesian C UtilitieS. https://github.com/joaks1/abacus

Open-Science Notebook:

- msbayes-experiments: https://github.com/joaks1/msbayes-experiments



Acknowledgments

Ideas and feedback:

- Holder Lab

- KU Herpetology

- Melissa Callahan

Computation:

- KU ITTC

- KU Computing Center

- iPlant

Funding:

- NSF

- KU Grad Studies, EEB & BI

- SSB

- Sigma Xi

Photo credits:

- Rafe Brown, Cam Siler, & Jake Esselstyn

- FMNH Philippine Mammal Website:
  - D.S. Balete, M.R.M. Duya, & J. Holden

- PhyloPic!


Questions?

[email protected]



Causes of bias: Insufficient sampling

- Models with more parameter space are less densely sampled
  - Could explain bias toward small models in extreme cases
  - Predicts large variance in posterior estimates

- We explored empirical and simulation-based analyses with 2, 5, and 10 million prior samples, and estimates were very similar

[Figure: 95% HPD of D_T plotted against the number of prior samples (0 to 1e8): (A) unadjusted, (B) GLM-adjusted]



[Figure: density of the estimated number of divergence events (1 to 21) in three rows of panels, the second labeled M_Ushaped and the third M_DPP (label of the first row not recovered), with columns for τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), and a fourth setting whose label was not recovered]



[Figure: density of the estimated variance in divergence times (median), D̂_T, in three rows of four panels. The value of p(D̂_T < 0.01) printed on each panel is tabulated below, assuming the columns follow the same order of simulation settings as the other power figures.]

Analysis model               U(0, 0.5 MGA)   U(0, 1.5 MGA)   U(0, 2.5 MGA)   U(0, 5.0 MGA)
(first row, label missing)   1.0             0.999           0.996           0.637
M_Ushaped                    0.914           0.626           0.235           0.004
M_DPP                        0.002           0.0             0.0             0.0


Empirical results: Philippine diversification

[Figure: prior and posterior probability (0.0 to 0.5) of the number of divergence events (1 to 21), with columns for msBayes and dpp-msbayes]


