joaks-evolution-2014
TRANSCRIPT
An Improved Approximate-Bayesian Method for Estimating Shared Evolutionary History
Jamie R. Oaks1,2
1Department of Ecology and Evolutionary Biology, University of Kansas
2Department of Biology, University of Washington
June 21, 2014
Estimating shared history J. Oaks, University of Washington 1/24
Processes of diversification
- Large-scale geological and climatic processes are important in biodiversification and community assembly
- Accounting for such processes will better our understanding of biodiversity
- We need methods for inferring evolutionary patterns predicted by historical events from contemporary populations
Community scale processes
We want to infer m and T given DNA sequence alignments X
Divergence model choice
Example divergence models for three taxa (times in kya):
T = (260, 260, 260), model = 111, τ = {260}
T = (397, 260, 260), model = 211, τ = {260, 397}
T = (260, 397, 260), model = 121, τ = {260, 397}
T = (260, 260, 397), model = 112, τ = {260, 397}
T = (260, 95, 397), model = 123, τ = {260, 95, 397}
In general:
T = (T1, ..., TY), model = mi, τ = {τ1, ..., τ|τ|}
[Figure: divergence times T1, T2, T3 and divergence events τ on a time axis from 0 to 500 kya]
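The relationship between T, the model label, and τ in the examples above can be sketched in a few lines of Python. This is an illustration only, not code from msBayes: the function name `divergence_model` is hypothetical, and it assumes the events in τ are numbered in the order they are listed on the slide.

```python
def divergence_model(T, tau):
    """Return the 1-based divergence-event index of each taxon.

    T   -- divergence time of each taxon, e.g. (397, 260, 260)
    tau -- the distinct divergence events, in the order they are numbered
    """
    if set(T) != set(tau) or len(set(tau)) != len(tau):
        raise ValueError("tau must list each distinct time in T exactly once")
    return tuple(tau.index(t) + 1 for t in T)


# The five examples from the slide: (T, tau)
examples = [
    ((260, 260, 260), (260,)),
    ((397, 260, 260), (260, 397)),
    ((260, 397, 260), (260, 397)),
    ((260, 260, 397), (260, 397)),
    ((260, 95, 397), (260, 95, 397)),
]
for T, tau in examples:
    model = divergence_model(T, tau)
    # |tau| is the number of divergence events shared across the taxa
    print(T, "model =", "".join(map(str, model)), "|tau| =", len(tau))
```

The number of divergence events, |τ|, is simply the number of distinct values in T.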
Divergence model choice
X Sequence alignments
T Divergence times
m Divergence model
G Gene trees
φ Substitution parameters
Θ Demographic parameters
Bayesian model choice
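Full model:
$$p(T, G, \phi, \Theta \mid X, m_i) = \frac{p(X \mid T, G, \phi, \Theta, m_i)\, p(T, G, \phi, \Theta \mid m_i)}{p(X \mid m_i)}$$
$$p(X \mid m_i) = \int_{\theta_i} p(X \mid \theta_i, m_i)\, p(\theta_i \mid m_i)\, d\theta_i$$
$$p(m_i \mid X) = \frac{p(X \mid m_i)\, p(m_i)}{\sum_{j} p(X \mid m_j)\, p(m_j)}$$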
msBayes: Approximate Bayesian computation (ABC)
W. Huang et al. (2011). BMC Bioinformatics 12: 1. J. R. Oaks et al. (2013). Evolution 67: 991–1010.
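To make the ABC idea concrete, here is a minimal rejection-sampling sketch for model choice. It is a toy under stated assumptions, not the msBayes implementation: the two candidate models, the Gaussian "summary statistics", the tolerance, and all function names are hypothetical stand-ins for the coalescent simulations and summary statistics a real analysis would use.

```python
import math
import random


def simulate(model, rng, t_max=0.5, noise_sd=0.02, n_taxa=3):
    """Draw divergence times from the prior, then noisy summary statistics."""
    if model == "shared":
        # One divergence event shared by all taxa (model 111)
        times = [rng.uniform(0.0, t_max)] * n_taxa
    else:
        # Every taxon diverges independently (model 123)
        times = [rng.uniform(0.0, t_max) for _ in range(n_taxa)]
    return [rng.gauss(t, noise_sd) for t in times]


def abc_model_posterior(observed, n_samples=20000, tolerance=0.05, seed=1):
    """Approximate p(m | X) by the share of accepted prior samples per model."""
    rng = random.Random(seed)
    accepted = {"shared": 0, "independent": 0}
    for _ in range(n_samples):
        model = rng.choice(["shared", "independent"])  # uniform prior on models
        stats = simulate(model, rng)
        # Keep the sample only if its statistics are close to the observed ones
        if math.dist(stats, observed) < tolerance:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise RuntimeError("no samples accepted; increase tolerance or n_samples")
    return {m: n / total for m, n in accepted.items()}


posterior = abc_model_posterior([0.26, 0.26, 0.26])
print(posterior)
```

Because the integral in p(X | mi) is never evaluated directly, the approximation depends on how densely the prior samples cover each model's parameter space.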
The msBayes model
msBayes will often infer clustered divergences when divergences are random over millions of generations.
Objective:
Use principles of probability to extend the msBayes framework for improved estimation of shared evolutionary history
J. R. Oaks et al. (2013). Evolution 67: 991–1010. J. R. Oaks et al. (2014). arXiv:1402.6397 [q-bio.PE].
An improved method
Potential improvements:
1. Alternative priors on parameters that increase marginal likelihoods of rich models
2. Alternative approach to modeling the temporal distribution of divergences
J. R. Oaks et al. (2013). Evolution 67: 991–1010. J. R. Oaks et al. (2014). arXiv:1402.6397 [q-bio.PE].
$$p(X) = \int_{\theta} p(X \mid \theta)\, p(\theta)\, d\theta$$
[Figure: densities of the likelihood p(X | θ) and the prior p(θ) plotted against θ from 0 to 1]
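The effect of the prior on the marginal likelihood can be checked numerically. The sketch below is a toy and not the model from the talk: it assumes a binomial likelihood with a Beta prior on θ, and the function names and the numbers (50 successes in 100 trials) are hypothetical.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def marginal_likelihood(k, n, a, b, grid=20000):
    """Midpoint-rule estimate of p(X) = integral of p(X | theta) p(theta) d theta.

    Likelihood: Binomial(k | n, theta). Prior: Beta(a, b) on theta.
    """
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid  # midpoint of the i-th slice of [0, 1]
        log_lik = log_choose + k * math.log(theta) + (n - k) * math.log(1 - theta)
        log_prior = ((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
                     - log_beta(a, b))
        total += math.exp(log_lik + log_prior) / grid
    return total


vague = marginal_likelihood(k=50, n=100, a=1, b=1)       # uniform prior on theta
informed = marginal_likelihood(k=50, n=100, a=20, b=20)  # prior concentrated near 0.5
print(vague, informed)
```

The uniform prior spends most of its mass where the likelihood is close to zero, so the same data receive a lower marginal likelihood than under the prior concentrated near the likelihood peak.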
Prior on divergence models
- msBayes uses a discrete uniform prior on the number of divergence events
[Figure: (A) number of divergence models and (B) prior probability of each model, p(M_{|τ|,i}), plotted against the number of divergence events, |τ|]
Potential solution:
Place flexible prior directly on the sample space of divergence models
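The shape of this prior can be reproduced with a short counting sketch. It rests on two assumptions that are not stated on the slide: divergence models are counted as unordered partitions of the 22 taxa (integer partitions of 22), and the prior mass for each |τ| is split evenly among the models with that many events. The function name is hypothetical.

```python
def partitions_by_parts(n):
    """Return counts[k] = number of integer partitions of n into exactly k parts."""
    # Recurrence: p(m, k) = p(m - 1, k - 1) + p(m - k, k)
    p = [[0] * (n + 1) for _ in range(n + 1)]
    p[0][0] = 1
    for m in range(1, n + 1):
        for k in range(1, m + 1):
            p[m][k] = p[m - 1][k - 1] + p[m - k][k]
    return {k: p[n][k] for k in range(1, n + 1)}


n_taxa = 22
counts = partitions_by_parts(n_taxa)

# A uniform prior on |tau| gives each value 1/n_taxa, shared by all models
# that have that number of divergence events.
prior_per_model = {k: 1.0 / (n_taxa * c) for k, c in counts.items()}

for k in sorted(counts):
    print(k, counts[k], round(prior_per_model[k], 5))
```

Under these assumptions the single model with one divergence event and the single model with 22 events each receive prior probability 1/22, far more than any individual model with an intermediate number of events.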
New method: dpp-msbayes
- Replaced uniform priors on continuous parameters with gamma and beta distributions
- Dirichlet process prior (DPP) over all possible divergence models
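A Dirichlet process prior over divergence models can be sampled with the Chinese restaurant process. The sketch below is a generic illustration, not the dpp-msbayes code: the function names and the concentration parameter `alpha` are assumptions, and any hyperprior the software places on the concentration parameter is not represented.

```python
import random


def sample_divergence_model(n_taxa, alpha, rng):
    """Draw one divergence model via the Chinese restaurant process.

    Returns a list of 1-based event labels, one per taxon (e.g. [1, 1, 2]).
    """
    labels = []
    sizes = []  # sizes[j] = number of taxa already assigned to event j + 1
    for _ in range(n_taxa):
        # Join an existing event with weight equal to its size,
        # or start a new event with weight alpha.
        weights = sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(sizes):
            sizes.append(1)
        else:
            sizes[choice] += 1
        labels.append(choice + 1)
    return labels


def expected_num_events(n_taxa, alpha):
    """Prior mean of |tau|: the i-th taxon opens a new event w.p. alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n_taxa))


rng = random.Random(0)
model = sample_divergence_model(22, alpha=1.5, rng=rng)
print(model, "|tau| =", len(set(model)))
print(expected_num_events(22, alpha=1.5))
```

Small values of `alpha` favor models with few shared divergence events and large values favor many independent divergences, which is what makes the prior flexible.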
dpp-msbayes: Simulation-based assessment
Simulate 50,000 datasets under three models
MmsBayes
  - U-shaped prior on divergence models
  - Uniform priors on continuous parameters
MUshaped
  - U-shaped prior on divergence models
  - Gamma priors on continuous parameters
MDPP
  - DPP prior on divergence models
  - Gamma priors on continuous parameters
Analyze all datasets under each of the models
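One way to summarize such a simulation study is to bin the estimated posterior probabilities and compare each bin with how often the one-divergence model was actually true. The helper below is a hypothetical sketch of that bookkeeping; the function name and the input format are assumptions, and it is not taken from the analysis pipeline.

```python
def calibration_table(results, n_bins=10):
    """Bin estimated posterior probabilities and report the true frequency per bin.

    results -- iterable of (posterior_prob_one_event, truly_one_event) pairs,
               one pair per simulated dataset.
    Returns a list of (bin_lower, bin_upper, n_datasets, observed_frequency).
    """
    bins = [[0, 0] for _ in range(n_bins)]  # [n_datasets, n_true] per bin
    for prob, truth in results:
        idx = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += bool(truth)
    table = []
    for i, (n, n_true) in enumerate(bins):
        if n > 0:
            table.append((i / n_bins, (i + 1) / n_bins, n, n_true / n))
    return table


# Tiny made-up input, only to show the shape of the output
toy = [(0.05, False), (0.08, False), (0.95, True), (0.92, True), (0.97, False)]
for row in calibration_table(toy):
    print(row)
```

For a well-calibrated method, the observed frequency in each bin should be close to the posterior probabilities that define the bin.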
dpp-msbayes: Simulation results
[Figure: true probability of one divergence plotted against the posterior probability of one divergence (both from 0 to 1); panels arranged by data model (MmsBayes, MDPP, MUniform, MUshaped) and analysis model (MmsBayes, MDPP)]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
dpp-msbayes: Simulation-based power analyses
- Simulate datasets in which all 22 divergence times are random
  - τ ∼ U(0, 0.5 MGA)
  - τ ∼ U(0, 1.5 MGA)
  - τ ∼ U(0, 2.5 MGA)
  - τ ∼ U(0, 5.0 MGA)
  - MGA = Millions of Generations Ago
- Simulate 1000 datasets for each τ distribution
- Analyze all 4000 datasets under models MmsBayes, MUshaped, and MDPP
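The first step of this design, drawing the true divergence times, can be sketched as follows. Only the draw of τ is shown; simulating sequence data from those times is left out, and the function and constant names are hypothetical.

```python
import random

# Upper limits (in MGA) of the four uniform distributions on divergence times
UPPER_BOUNDS_MGA = [0.5, 1.5, 2.5, 5.0]


def simulate_divergence_times(upper, n_taxa=22, rng=None):
    """Draw an independent divergence time for every taxon from U(0, upper)."""
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0.0, upper) for _ in range(n_taxa)]


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(42)
for upper in UPPER_BOUNDS_MGA:
    times = simulate_divergence_times(upper, rng=rng)
    # Every taxon gets its own time, so the true number of events is n_taxa
    print(upper, len(set(times)), sample_variance(times))
```

Because each time is drawn independently from a continuous distribution, the true model in every simulated dataset has 22 distinct divergence events, so any support for a single shared event is spurious.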
dpp-msbayes: Power results
[Figure: densities of the estimated number of divergence events (mode) under MmsBayes and MDPP, one panel for each of τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), and τ ∼ U(0, 5.0 MGA)]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
dpp-msbayes: Power results
[Figure: densities of the posterior probability of one divergence under MmsBayes, MUshaped, and MDPP, one panel for each of τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), and τ ∼ U(0, 5.0 MGA)]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
Empirical application
Did fragmentation of Philippine Islands during inter-glacial rises in sea level promote diversification?
Empirical results: Philippine diversification
[Figure: posterior probability of the number of divergence events under msBayes and dpp-msbayes]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].
Conclusions
- New method for estimating shared evolutionary history shows improved:
  1. Estimation of posterior uncertainty
  2. Model-choice accuracy
  3. Power to detect temporal variation across divergences
  4. Robustness to model violations
Caveats:
- Estimating a very rich model (600+ parameters for 22 taxa) using limited information from the data
  - Likely sensitive to prior assumptions
  - Be skeptical of strongly supported results
Recommendations
For Bayesian model choice, choose priors carefully
ABC model choice estimates should be accompanied by:
1. Simulation-based power analyses
2. Assessment of prior sensitivity
Future directions
- Full-likelihood Bayesian approach [1]
- Full-phylogenetic framework
[Figure: divergence times T1, T2, T3 and divergence event τ1 on a time axis from 0 to 500 kya]
[1] J. Sukumaran (2012). PhD thesis. Lawrence, Kansas, USA: University of Kansas.
Everything is on GitHub...
Software:
- dpp-msbayes: https://github.com/joaks1/dpp-msbayes
- PyMsBayes: https://github.com/joaks1/PyMsBayes
- ABACUS: Approximate BAyesian C UtilitieS. https://github.com/joaks1/abacus
Open-Science Notebook:
- msbayes-experiments: https://github.com/joaks1/msbayes-experiments
Acknowledgments
Ideas and feedback:
- Holder Lab
- KU Herpetology
- Melissa Callahan
Computation:
- KU ITTC
- KU Computing Center
- iPlant
Funding:
- NSF
- KU Grad Studies, EEB & BI
- SSB
- Sigma Xi
Photo credits:
- Rafe Brown, Cam Siler, & Jake Esselstyn
- FMNH Philippine Mammal Website: D.S. Balete, M.R.M. Duya, & J. Holden
- PhyloPic!
Questions?
Causes of bias: Insufficient sampling
- Models with more parameter space are less densely sampled
  - Could explain bias toward small models in extreme cases
  - Predicts large variance in posterior estimates
- We explored empirical and simulation-based analyses with 2, 5, and 10 million prior samples, and estimates were very similar
[Figure: 95% HPD of D_T plotted against the number of prior samples; panel A: unadjusted, panel B: GLM-adjusted]
dpp-msbayes: Simulation results
[Figure: densities of the estimated number of divergence events (mode) under MmsBayes, MUshaped, and MDPP, one panel for each of τ ∼ U(0, 0.5 MGA), τ ∼ U(0, 1.5 MGA), τ ∼ U(0, 2.5 MGA), and τ ∼ U(0, 5.0 MGA)]
dpp-msbayes: Simulation results
[Figure: densities of the estimated variance in divergence times (median) under each analysis model and τ distribution]
p(D̂_T < 0.01) by analysis model and τ distribution:
Model      U(0, 0.5 MGA)  U(0, 1.5 MGA)  U(0, 2.5 MGA)  U(0, 5.0 MGA)
MmsBayes   1.0            0.999          0.996          0.637
MUshaped   0.914          0.626          0.235          0.004
MDPP       0.002          0.0            0.0            0.0
Empirical results: Philippine diversification
[Figure: prior and posterior probability of the number of divergence events under msBayes and dpp-msbayes]
J. R. Oaks (2014). arXiv:1402.6303 [q-bio.PE].