bayesian phylogenetics - biostatisticsthe one ‘true’ tree? methodswe’velearnedsofar...

28
Bayesian Phylogenetics: an introduction Marc A. Suchard [email protected] UCLA

Upload: others

Post on 17-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Bayesian Phylogenetics:an introduction

Marc A. [email protected]

UCLA

Page 2: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Who is this man?

How sure are you?

SISMID University of Washington Bayesian Phylogenetics

Page 3: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

The one ‘true’ tree?

Methods we’ve learned so fartry to find a single tree thatbest describes the dataHowever, they do not searcheverywhere, and it is difficultto find the “best” treeMany (gazillions of) trees maybe almost as good

SISMID University of Washington Bayesian Phylogenetics

Page 4: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Bayesian phylogenetics: general principle

Using Bayesian principles, we will search for and average over setsof plausible trees (weighted by their probability) instead a single“best” treeIn this method, the “space” that you search is limited by priorinformation and the data

The posterior distribution oftrees naturally translates intoprobability statements (anduncertainty) on aspects ofdirect scientific interest

I When did an evolutionaryevent happen?

I Are a subset of sequencesmore closely related?

The cost: we must formalizeour prior beliefs

SISMID University of Washington Bayesian Phylogenetics

Page 5: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Conditional probability: intuition

Philippe is a hipster.Philippe is a hipster

and rides a single-speed bike.

Which is more probable?

SISMID University of Washington Bayesian Phylogenetics

Page 6: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Conditional probability: intuition

Arbitrary events A (hipster) and B (bike) from sample space U

SISMID University of Washington Bayesian Phylogenetics

Page 7: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Bayes theorem

Definition of conditional probability in words:

probability(A and B) = probability(A given B)× probability(B)

In usual mathematical symbols:

p(A|B)p(B) = p(A,B) = p(B|A)p(A)

With a slight re-arrangement:

p(A|B) =p(B|A)p(A)

p(B)

“Just" a restatement of conditional probability

SISMID University of Washington Bayesian Phylogenetics

Page 8: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Bayes theorem

Integration (averaging) yields a marginal probability:

p(A) =

∫p(A,B)dB =

∫p(A|B)p(B)dB︸ ︷︷ ︸

over all possible values of B

probability(hipster) = probability(hipster and has bike) +probability(hipster and has no bike)

SISMID University of Washington Bayesian Phylogenetics

Page 9: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Conditional probability: pop quiz

What do you know about Thomas Bayes?Bayes theorem?

Some discussion points:Favorite game? Best buddies?

SISMID University of Washington Bayesian Phylogenetics

Page 10: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Bayes theorem for statistical inference

Unknown quantity θ (model parameters, scientific hypotheses)Prior p(θ) beliefs before observed data Y become availableConditional probability p(Y |θ) of the data given fixed θ – alsocalled the likelihood of YPosterior p(θ|Y ) beliefs:

p(θ|Y ) =p(Y |θ)p(θ)p(Y )

p(θ) and p(Y |θ) – easyp(Y ) =

∫p(Y |θ)p(θ)dθ – hard

SISMID University of Washington Bayesian Phylogenetics

Page 11: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Bayesian phylogenetic inference

SISMID University of Washington Bayesian Phylogenetics

Page 12: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Bayesian phylogenetic inference

SISMID University of Washington Bayesian Phylogenetics

Page 13: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Bayesian phylogenetic inference

Posterior:

p(θ|Y ) =p(Y |θ)p(θ)p(Y )

Trouble: p(Y ) is notcomputable – sum over allpossible treesFor N taxa: there are G(N) =(2N − 3)× (2N − 5)× · · · × 1

θ = (tree, substitution process)p(Y |θ) - continuous-timeMarkov chain process thatgives rise to sequences at tipsof tree

E.g., G(21) > 3× 1023

SISMID University of Washington Bayesian Phylogenetics

Page 14: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Priors

Strongest assumption: most parameters are separable, e.g. the treeis independent of the substitution processWeaker assumption: tree ∼ Coalescent processWeaker assumption: functional form on substitution parameters

Specialized priors as wellIf worried: check sensitivity

SISMID University of Washington Bayesian Phylogenetics

Page 15: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Posterior inference

Numerical (Monte Carlo) integration as a solution:

SISMID University of Washington Bayesian Phylogenetics

Page 16: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Markov chain Monte Carlo

Metropolis et al (1953) andHastings (1970) proposed astochastic integrationalgorithm that can explore vastparameter spacesAlgorithm generates a Markovchain that visits parametervalues (e.g., a specific tree)with frequency equal to theirposterior density / probability.Markov chain: random walkwhere the next step onlydepends on the currentparameter state

SISMID University of Washington Bayesian Phylogenetics

Page 17: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Metropolis-Hastings Algorithm

Each step in the Markov chain starts at its current state θ andproposes a new state θ? from an arbitrary proposal distributionq(·|θ) (transition kernel)θ? becomes the new state of the chain with probability:

R = min

p(θ?|Y )

p(θ|Y )× q(θ|θ?)q(θ?|θ)

= min

p(Y |θ?)p(θ?) / p(Y )

p(Y |θ)p(θ) / p(Y )× q(θ|θ?)q(θ?|θ)

= min

p(Y |θ?)p(θ?)

p(Y |θ)p(θ)× q(θ|θ?)q(θ?|θ)

Otherwise, θ remains the state of the chain

SISMID University of Washington Bayesian Phylogenetics

Page 18: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Posterior samplingWe repeat the process of proposing a new state,calculating the acceptance probability and eitheraccepting or rejecting the proposed move millions oftimes

Although correlated, the Markovchain samples are valid draws fromthe posterior; however . . .

Initial sampling (burn-in) is oftendiscarded due to correlation withchain’s starting point ( 6= posterior)

SISMID University of Washington Bayesian Phylogenetics

Page 19: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Transition Kernels

Often we propose changes to only a small # of dimensions in θ at atime (Metropolis-within-Gibbs)In phylogenetics, mixing (correlation) in continuous dimensions ismuch better (smaller) than for the treeSo, dominate approach has been keep-it-simple-stupid –alternatives exist and may become necessary:

I Gibbs sampler; slice sampler; Hamiltonian MC

SISMID University of Washington Bayesian Phylogenetics

Page 20: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Tree Transition Kernels

SISMID University of Washington Bayesian Phylogenetics

Page 21: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Posterior Summaries

For continuous θ, consider:posterior mean or median ≈MCMC sample average ormedianquantitative measures ofuncertainty, e.g. high posteriordensity interval

Credible Regions

Parameter (x)

The Bayesian equivalent of a confidence interval is called the highest posterior density (HPD) credible region. This is the

smallest region that contains 95% of the posterior probability.

lower 95% HPD limit

post

erio

r pr

obab

ility

di

stri

butio

n

upper 95% HPD limit

For trees, consider:scientifically interestingposterior probabilitystatement, e.g. the probabilityof monophyly ≈ MCMCsample proportion under whichhypothesis is true

SISMID University of Washington Bayesian Phylogenetics

Page 22: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Posterior Probabilities

SISMID University of Washington Bayesian Phylogenetics

Page 23: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Summarizing Trees

SISMID University of Washington Bayesian Phylogenetics

Page 24: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

MCMC Diagnostics: within a single chain

VisuallyinspectMCMCoutput traces

Measure au-tocorrelationwithin achain: theeffectivesample size(ESS)

SISMID University of Washington Bayesian Phylogenetics

Page 25: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

MCMC Diagnostics: across multiple chains

VisuallyinspectMCMCoutput traces

Comparing different chains → variance among and between chains

SISMID University of Washington Bayesian Phylogenetics

Page 26: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Improving Mixing

(Only if convergence diagnostics suggest a problem)

Run the chain longerUse a more parsimonious model(uninformative data)Change tuning parameters of transitionkernels to bring acceptance rates to 10% to70%

Use different transition kernels (consult anexpert)

SISMID University of Washington Bayesian Phylogenetics

Page 27: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Improving Mixing

SISMID University of Washington Bayesian Phylogenetics

Page 28: Bayesian Phylogenetics - BiostatisticsThe one ‘true’ tree? Methodswe’velearnedsofar trytofindasingletreethat bestdescribesthedata However,theydonotsearch everywhere,anditisdifficult

Why Bother being Bayesian?

In practice, we have almost no prior knowledge for the modelparameters. So, why bother with Bayesian inference?

Analysis provides directly interpretable probability statementsgiven the observed dataMCMC is a stochastic algorithm that (in theory) avoids gettingtrapped in local sub-optimal solutionsSearch space under Coalescent prior is astronomically “smaller”By numerically integrating over all possible trees, we obtainmarginal probability statements on hypotheses of scientific interest,e.g. specific branching events or population dynamics, avoiding bias

SISMID University of Washington Bayesian Phylogenetics