Maximum likelihood

Upload: garey-garrett

Post on 13-Dec-2015


Page 1: 14.4. Tue Introduction to models (Jarno)  16.4. Thu Distance-based methods (Jarno)  17.4. Fri ML analyses (Jarno)  20.4. Mon Assessing hypotheses

Maximum likelihood

Page 2

Schedule

14.4. Tue Introduction to models (Jarno)
16.4. Thu Distance-based methods (Jarno)
17.4. Fri ML analyses (Jarno)
20.4. Mon Assessing hypotheses (Jarno)
21.4. Tue Problems with molecular data (Jarno)
23.4. Thu Problems with molecular data (Jarno) Phylogenomics
24.4. Fri Search algorithms, visualization, and other computational aspects (Jarno)

Page 3

Maximum Likelihood

Maximum likelihood methods of phylogenetic inference evaluate a hypothesis about evolutionary history (the branching order and branch lengths of a tree) in terms of the probability that a proposed model of the evolutionary process and the hypothesised history (tree) would give rise to the data we observe.

Page 4

Likelihood of a hypothesis

The probability, P, of the data (D), given the hypothesis (H)

◦ L = P (D | H)

D: observed data (aligned sequences)

H: tree topology, branch lengths and model of evolution

Page 5

Likelihood or probability? What's the difference?

In statistical usage, a distinction is made depending on the roles of the outcome or parameter.

Probability is used when describing a function of the outcome given a fixed parameter value. For example, if a coin is flipped 10 times and it is a fair coin, what is the probability of it landing heads-up every time?

Likelihood is used when describing a function of a parameter given an outcome. For example, if a coin is flipped 10 times and it has landed heads-up 10 times, what is the likelihood that the coin is fair? [Wikipedia, article on likelihood]
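The two questions on this slide can be put side by side in R. This is only a small sketch using the built-in binomial density; the candidate values of p in `p_grid` are an illustrative choice, not something from the slides.

```r
# Probability: the parameter is fixed (fair coin, p = 0.5) and we ask about an outcome.
prob_ten_heads <- dbinom(10, size = 10, prob = 0.5)

# Likelihood: the outcome is fixed (10 heads in 10 flips) and we vary the parameter.
p_grid <- c(0.5, 0.8, 1.0)   # candidate values of p (illustrative choice)
lik <- dbinom(10, size = 10, prob = p_grid)

print(prob_ten_heads)                    # equals 0.5^10
print(data.frame(p = p_grid, L = lik))   # likelihood of each candidate p
```

The same function call is used both times; what changes is which argument is treated as known.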

Page 6

Maximum Likelihood

An optimality criterion (as is parsimony)

Given a model and data we can evaluate a tree

We can choose between trees based on the likelihood of a given tree

The tree(s) with the highest likelihood is the best

Page 7

[Figure: hierarchy of nucleotide substitution models.
Equal base frequencies: JC (single substitution type), K2P (2 substitution types), K3ST (3 substitution types), SYM (6 substitution types).
Variable base frequencies: F81 (single substitution type), HKY85 and F84 (2 substitution types), TrN (3 substitution types), GTR (6 substitution types).]

Page 8

Maximum Likelihood

Maximum likelihood estimates parameter values of an explicit model from observed data

Likelihood provides ways of evaluating models in terms of their log likelihoods

Different trees can also be evaluated for their fit to the data under a particular model (likelihood ratio tests of two trees after Kishino & Hasegawa)

Page 9

Likelihood function, example

Let's toss a coin ten times (n). It lands heads up 4 times (x) and tails up 6 times. What is the probability, p, of a head in a single toss?
◦ Compare: What is the likelihood of the data given the process?

Naturally, phat = x / n = 4 / 10 = 0.4

This is also the maximum likelihood estimate of p. Let's see why...

Page 10

Likelihood function, example

Coin toss is a binomial process:
◦ Pr(X = x | n, p) = C(n, x) * p^x * (1 - p)^(n - x), where C(n, x) = n! / (x! * (n - x)!)

Likelihood function then becomes:
◦ L(p | x, n) = C(n, x) * p^x * (1 - p)^(n - x)

Note: in the binomial formula X is the unknown, whereas in the likelihood function p is the unknown (because we have the data, the coin tosses).

Page 11

Likelihood function, example

The likelihood function can be solved analytically or using "brute force".

For example, the result for p = 0.4 is:
◦ L = 210 * 0.4^4 * 0.6^6 = 0.2508227
◦ logL = log(L) = -1.383009
◦ -logL = -log(L) = 1.383009

Analytically, the maximum of the function is the point where the derivative of the likelihood function is zero and the second derivative is negative.

Graphically...
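For the coin example the analytic route is short: the derivative of the log-likelihood, x/p - (n - x)/(1 - p), is zero at p = x/n. The sketch below checks this numerically with R's one-dimensional optimizer instead; the function name `loglik` is made up for this illustration.

```r
# Log-likelihood of p for x heads in n tosses (binomial model).
loglik <- function(p, x = 4, n = 10) dbinom(x, size = n, prob = p, log = TRUE)

# Numerical search for the maximum over the interval (0, 1).
fit <- optimize(loglik, interval = c(0, 1), maximum = TRUE)

p_hat <- fit$maximum        # should be close to x / n = 0.4
max_logL <- fit$objective   # should be close to log(0.2508227)
print(c(p_hat = p_hat, logL = max_logL))
```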

Page 12

Maximum Likelihood

[Figure: likelihood plotted against p. The peak of the curve is the maximum likelihood; the value of p at the peak is the maximum likelihood estimator of p.]

Page 13

Maximum Likelihood

[Figure: likelihood plotted against μ1, contrasting a precise estimate with an imprecise estimate.]

Page 14

R code

# Binomial likelihood and log-likelihood of p on a grid, from the formula
l <- function(x, n) {
  p <- seq(0, 1, 0.01)
  L <- choose(n, x) * p^x * (1 - p)^(n - x)
  data.frame(p = p, L = L, logL = log(L))
}
plot(l(4, 10)[, c(1, 3)], ylim = c(-30, 0), type = "l")

# The same log-likelihood curve using the built-in binomial density
l2 <- function(x = 4, n = 10) {
  p <- seq(0, 1, 0.01)
  data.frame(p = p, logL = dbinom(x, size = n, prob = p, log = TRUE))
}
plot(l2(), type = "l")

Page 15

R example result

plot(l2(), type="l")

Page 16

Likelihood format, why logs?

Why log likelihood?

L(0.99 | 4, 10) = 0.0000000002017251
logL(0.99 | 4, 10) = -22.324115

◦ When you multiply very small values together, the result is even smaller, and at some point the precision disappears (a restriction of computers)

◦ The same does not happen with log values:
L = 210 * 0.4^4 * 0.6^6 = 0.2508227
logL = log(210) + 4*log(0.4) + 6*log(0.6) = -1.383009
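The loss of precision is easy to provoke in R. In this sketch the 400 per-site likelihoods of 0.1 are made-up values, chosen only so that their product falls below the smallest number a double can hold.

```r
# 400 per-site likelihoods of 0.1 each (made-up values for illustration).
site_lik <- rep(0.1, 400)

# The product would be 1e-400, below the smallest positive double, so it becomes 0.
L <- prod(site_lik)

# The sum of logs keeps full precision.
logL <- sum(log(site_lik))

print(L)      # underflows to 0
print(logL)   # 400 * log(0.1)
```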

Page 17

DNA sequences

DNA sequences can be thought of as four-sided dice.

Thus, the previous coin example can be straightforwardly generalized to DNA sequences.

Page 18

Maximum likelihood tree reconstruction

1 CGAGAC
2 AGCGAC
3 AGATTA
4 GGATAG

What is the probability that unrooted Tree A (rather than another tree) could have generated the data shown under our chosen model?

[Figure: unrooted Tree A with tips 1, 2, 3 and 4.]

Page 19

Maximum likelihood tree reconstruction

1 CGAGA C
2 AGCGA C
3 AGATT A
4 GGATA G
(site j is the last column)

The likelihood for a particular site j is the sum of the probabilities of every possible reconstruction of ancestral states under a chosen model

[Figure: Tree A with the states of site j (C, C, A, G) at the tips and two unknown internal nodes (? ?), each of which can be A, C, G or T: 4 x 4 possibilities.]

Stationarity!

Page 20

Maximum likelihood tree reconstruction

1 CGAGA C
2 AGCGA C
3 AGATT A
4 GGATA G
(site j is the last column)

The likelihood for a particular site j is the sum of the probabilities of every possible reconstruction of ancestral states under a chosen model

  A C G T
A - α α α
C α - α α
G α α - α
T α α α -

[Figure: Tree A with the states of site j (C, C, A, G) at the tips and two unknown internal nodes (? ?), each of which can be A, C, G or T.]

Page 21

Maximum likelihood tree reconstruction

1 CGAG A C
2 AGCG A C
3 AGAT T A
4 GGAT A G
(site j is the fifth column)

The likelihood for a particular site j is the sum of the probabilities of every possible reconstruction of ancestral states under a chosen model

  A C G T
A - α α α
C α - α α
G α α - α
T α α α -

[Figure: Tree A with the states of site j (A, A, T, A) at the tips and two unknown internal nodes (? ?), each of which can be A, C, G or T.]

Page 22

Branch lengths also need to be estimated!

[Figure: tree with observed states A, C, C, C, G at the tips, unknown states x, y, z, w at the internal nodes, and branch lengths t1 to t8.]

P(A,C,C,C,G,x,y,z,w|T) = Prob(x) Prob(y|x,t6) Prob(A|y,t1) Prob(C|y,t2) Prob(z|x,t8) Prob(C|z,t3) Prob(w|z,t7) Prob(C|w,t4) Prob(G|w,t5)

ti are branch lengths (rate x time)
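The joint probability above is for one particular choice of x, y, z and w; the likelihood of the site is its sum over all 4^4 choices. A minimal R sketch of that sum follows. It assumes a Jukes-Cantor model with branch lengths measured in expected substitutions per site, and sets every branch length to 0.1; both are assumptions of this example, and `jc_prob` and `site_likelihood` are hypothetical helper names.

```r
# Jukes-Cantor transition probabilities for a branch of length t
# (t in expected substitutions per site; an assumption of this sketch).
jc_prob <- function(from, to, t) {
  e <- exp(-4 * t / 3)
  ifelse(from == to, 0.25 + 0.75 * e, 0.25 - 0.25 * e)
}

# Likelihood of one site: sum the joint probability over all 4^4 assignments
# of the internal states x, y, z, w. bl holds the branch lengths t1..t8.
site_likelihood <- function(bl) {
  states <- c("A", "C", "G", "T")
  g <- expand.grid(x = states, y = states, z = states, w = states,
                   stringsAsFactors = FALSE)
  joint <- 0.25 *                               # Prob(x), stationary frequency
    jc_prob(g$x, g$y, bl[6]) * jc_prob(g$y, "A", bl[1]) * jc_prob(g$y, "C", bl[2]) *
    jc_prob(g$x, g$z, bl[8]) * jc_prob(g$z, "C", bl[3]) *
    jc_prob(g$z, g$w, bl[7]) * jc_prob(g$w, "C", bl[4]) * jc_prob(g$w, "G", bl[5])
  sum(joint)
}

L_site <- site_likelihood(rep(0.1, 8))   # all branch lengths set to 0.1 (assumed)
print(L_site)
print(log(L_site))
```

Real software avoids the explicit enumeration (it grows as 4 to the number of internal nodes), but for a tree this small the brute-force sum is a direct reading of the formula.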

Page 23

Likelihood of a tree, example

Assume a Jukes-Cantor model (all nucleotide frequencies are equal). Further assume that the branch length is 0.1.

Then we can generate a so-called P-matrix from the Jukes-Cantor model's Q-matrix:

  A      C      G      T
A 0.9815 0.0062 0.0062 0.0062
C 0.0062 0.9815 0.0062 0.0062
G 0.0062 0.0062 0.9815 0.0062
T 0.0062 0.0062 0.0062 0.9815

These are the probabilities of a nucleotide changing to some other nucleotide (off-diagonal) or staying the same (diagonal) along the branch.
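One way to get a P-matrix from a Q-matrix is the matrix exponential P = exp(Q * branch length). The sketch below does this in base R through an eigendecomposition, which works here because the Jukes-Cantor Q is symmetric. The rate alpha = 0.0625 is an assumption: it is chosen because, with branch length 0.1, it gives entries close to the values 0.9815 and 0.0062 used in the two-taxon example. `p_matrix` is a hypothetical helper name.

```r
# Jukes-Cantor rate matrix Q: every off-diagonal rate is alpha.
# alpha = 0.0625 is an assumption of this sketch, not a value from the slides.
alpha <- 0.0625
nuc <- c("A", "C", "G", "T")
Q <- matrix(alpha, 4, 4, dimnames = list(nuc, nuc))
diag(Q) <- -3 * alpha   # rows of Q sum to zero

# P(len) = exp(Q * len), computed from the eigendecomposition of the symmetric Q.
p_matrix <- function(Q, len) {
  e <- eigen(Q, symmetric = TRUE)
  P <- e$vectors %*% diag(exp(e$values * len)) %*% t(e$vectors)
  dimnames(P) <- dimnames(Q)
  P
}

P <- p_matrix(Q, 0.1)
print(round(P, 4))

# Multiplying P by itself gives the P-matrix for a branch twice as long.
P2 <- P %*% P
print(round(P2, 4))
```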

Page 24

Likelihood, two taxon case

A: acct
B: gcct

L = (0.25 * 0.0062)^1 * (0.25 * 0.9815)^3 = 2.289932e-05

logL = log(L) = -10.68 (natural logarithm; log10(L) = -4.64)

For other branch lengths, the P matrix can be multiplied by itself k times. This gives the P matrix for a branch k times as long.

A branch length can be optimized by finding the value that maximizes the likelihood.
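Optimizing the branch length for the two sequences above can be sketched as a one-dimensional search. The example assumes the Jukes-Cantor model with the branch length d measured in expected substitutions per site, which may differ from the scaling used on the slides; the closed-form Jukes-Cantor distance is computed alongside as a cross-check.

```r
# Two aligned sequences from the example: 1 differing site, 3 identical sites.
n_diff <- 1
n_same <- 3

# Jukes-Cantor log-likelihood as a function of the branch length d
# (d in expected substitutions per site; an assumption of this sketch).
loglik_d <- function(d) {
  e <- exp(-4 * d / 3)
  p_same <- 0.25 + 0.75 * e
  p_diff <- 0.25 - 0.25 * e
  n_diff * log(0.25 * p_diff) + n_same * log(0.25 * p_same)
}

fit <- optimize(loglik_d, interval = c(1e-6, 5), maximum = TRUE)
d_hat <- fit$maximum

# Closed-form Jukes-Cantor distance for comparison (proportion of differing sites = 1/4).
d_jc <- -0.75 * log(1 - (4 / 3) * (n_diff / (n_diff + n_same)))

print(c(d_hat = d_hat, d_jc = d_jc, logL = fit$objective))
```

With only four sites the estimate is of course very uncertain; the point is only that the numerical maximum and the analytic distance agree.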

Page 25

Possibly a huge number of calculations

Depending on the software, each iteration of the tree optimization algorithm has to do the following for a given tree topology:
◦ Calculate the likelihood of the tree topology given the model and the observed data
◦ Estimate the optimal branch lengths

Page 26

Maximum likelihood tree reconstruction

The likelihood of Tree A is the product of the likelihoods at each site

The likelihood is usually evaluated by summing the logs of the likelihoods at each site (because the product of the probabilities is so small) and reported as the log likelihood of the full tree

The maximum likelihood tree is the one with the highest likelihood (it might not be Tree A, i.e. it could be another tree topology)
◦ Note: highest likelihood (largest value) = largest logL (closest to zero) = smallest -logL (closest to zero)

Page 27

Typical assumptions of ML substitution models

The probability of any change is independent of the prior history of the site (a Markov model)

Substitution probabilities do not change with time or over the tree (a homogeneous Markov process)

Change is time reversible, e.g. the expected amount of change from A to T is the same as from T to A (with equal base frequencies, the rates themselves are equal)

Page 28

Sophisticated way to weight your data

A model is always a simplification of what happens in nature
◦ Assumes evolution works parsimoniously

A given model will give more weight to certain changes over others

ML – an objective criterion for choosing one weighting scheme over another?

Page 29

A Bayesian Approach to Phylogenetics

Based largely on slides by Paul Lewis (www.eeb.uconn.edu)

Page 30

Bayesian inference in general

D will stand for Data

H will mean any one of a number of things:
◦ a discrete hypothesis
◦ a distinct model (e.g. JC, HKY, GTR, etc.)
◦ a tree topology
◦ one of an infinite number of continuous model parameter values (e.g. ts:tv rate ratio)

Page 31

A Bayesian approach compared to ML

In ML, we choose the hypothesis that gives the highest (maximized) likelihood to the data

The likelihood is the probability of the data given the hypothesis, L = P (D | H).

A Bayesian analysis expresses its results as the probability of the hypothesis given the data.
◦ this may be a more desirable way to express the result

Page 32

The posterior probability of a hypothesis

The posterior probability, P (H | D), is the probability of the hypothesis given the observations, or data (D)

The main feature in Bayesian statistics is that it takes into account prior knowledge of the hypothesis

Page 33

P (H | D) = P (D | H) * P (H) / P (D)

◦ P (H | D): posterior probability of hypothesis H
◦ P (D | H): likelihood of hypothesis
◦ P (H): prior probability of hypothesis
◦ P (D): probability of the data (a normalizing constant)

Page 34

Likelihood function is common

Both ML and Bayesian methods use the likelihood function
◦ In ML, free parameters are optimized, maximizing the likelihood
◦ In a Bayesian approach, free parameters have probability distributions, which are sampled.

Page 35

Coin-flipping example

Data D: 6 heads (out of 10 flips)

H = true underlying proportion of heads (the probability of coming up heads on any single flip)
◦ if H = 0.5, coin is perfectly fair
◦ if H = 1.0, coin always comes up heads (i.e. it is a trick coin)
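For this coin example, Bayes' rule can be applied directly on a grid of candidate values of H. The sketch below assumes a flat prior over the grid and a grid spacing of 0.01; both are choices made for illustration.

```r
# Candidate values of H on a grid (the spacing is an arbitrary choice).
H <- seq(0, 1, by = 0.01)

prior <- rep(1 / length(H), length(H))         # flat prior over the grid (assumed)
likelihood <- dbinom(6, size = 10, prob = H)   # P(D | H) for 6 heads in 10 flips

# Bayes' rule: posterior = likelihood * prior / P(D),
# where P(D) is the sum over all candidate hypotheses.
p_data <- sum(likelihood * prior)
posterior <- likelihood * prior / p_data

print(H[which.max(posterior)])   # H with the highest posterior probability
print(sum(posterior))            # posterior probabilities sum to 1
```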

Page 36

The Frequentist and the Bayesian

Frequentist (F): there exists a true probability H of getting heads; null hypothesis H0: H = 0.5
◦ Does the data reject the null hypothesis?

Bayesian (B): what is the range around 0.5 that we are willing to accept as being in the “fair coin” range?
◦ What is the probability that H is in this range?
◦ For the coin tossing example, we can calculate the probabilities exactly
◦ For more complex data, we need to explore the probability space: MCMC
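Both questions can be answered exactly for 6 heads in 10 flips. In the sketch below, the flat Beta(1, 1) prior and the “fair coin” range of 0.45 to 0.55 are assumptions chosen for illustration; neither is specified on the slides.

```r
x <- 6; n <- 10

# Frequentist: exact binomial test of the null hypothesis H0: H = 0.5.
p_value <- binom.test(x, n, p = 0.5)$p.value

# Bayesian: with a flat Beta(1, 1) prior (an assumption), the posterior of H
# is Beta(x + 1, n - x + 1). The "fair coin" range 0.45-0.55 is also assumed.
lower <- 0.45; upper <- 0.55
p_fair <- pbeta(upper, x + 1, n - x + 1) - pbeta(lower, x + 1, n - x + 1)

print(p_value)   # large p-value: the data do not reject H0
print(p_fair)    # posterior probability that H lies in the assumed range
```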

Page 37

Markov chain Monte Carlo

Page 38

How the MCMC works

Start somewhere
◦ That “somewhere” will have a likelihood associated with it
◦ Not the optimized, maximum likelihood

Randomly propose a new state
◦ If the new state has a better likelihood, the chain goes there
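The steps above can be sketched for the coin example with a basic Metropolis sampler. The rule used here for worse states (accept with probability equal to the ratio of posterior densities) is the standard Metropolis rule and is not spelled out on this slide; the flat prior, starting value, proposal width, run length and burn-in are all assumptions of this sketch.

```r
set.seed(1)

# Unnormalized log posterior of H for 6 heads in 10 flips with a flat prior
# on (0, 1) (the prior and all tuning values below are assumptions).
log_post <- function(h) {
  if (h <= 0 || h >= 1) return(-Inf)
  dbinom(6, size = 10, prob = h, log = TRUE)
}

n_gen <- 20000
chain <- numeric(n_gen)
current <- 0.1                                  # start somewhere (not the optimum)

for (i in seq_len(n_gen)) {
  proposal <- current + runif(1, -0.1, 0.1)     # symmetric random proposal
  log_ratio <- log_post(proposal) - log_post(current)
  # Better states are always accepted; worse states are accepted with
  # probability equal to the posterior ratio (Metropolis rule).
  if (log(runif(1)) < log_ratio) current <- proposal
  chain[i] <- current
}

kept <- chain[-(1:1000)]   # discard an initial burn-in
print(mean(kept))          # compare with the exact posterior mean 7/12
```

Because the chain sometimes moves to worse states, it samples the whole posterior distribution rather than climbing to the maximum and stopping there.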

Page 41

Target vs. proposal distributions

The target distribution is the posterior distribution of interest

The proposal distribution is used to decide where to go next; you have much flexibility here, and the choice affects the efficiency of the MCMC algorithm

Page 48

The Tradeoff

Pro: taking big steps helps in jumping from one “island” in the posterior density to another

Con: taking big steps often results in poor mixing

Solution: MCMCMC!

Page 49

Metropolis-coupled Markov chain Monte Carlo (MCMCMC, or MC3)

MC3 involves running several chains simultaneously (one “cold” and several “heated”)

The cold chain is the one that counts; the heated chains are “scouts”

A chain is heated by raising densities to a power less than 1.0 (values closer to 0.0 are warmer)
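What heating does to a posterior surface can be seen on a toy density. In this sketch the bimodal `target` density is made up, and the incremental heating scheme 1 / (1 + lambda * (i - 1)) with lambda = 0.2 is an assumption used for illustration, not something stated on the slides.

```r
# Toy bimodal posterior density with two well-separated "islands" (made up).
target <- function(x) 0.5 * dnorm(x, mean = -3, sd = 0.5) + 0.5 * dnorm(x, mean = 3, sd = 0.5)

# Heating powers for 4 chains; chain 1 is cold (power 1). The incremental
# scheme 1 / (1 + lambda * (i - 1)) and lambda = 0.2 are assumptions.
lambda <- 0.2
beta <- 1 / (1 + lambda * (0:3))

# Height of the valley (x = 0) relative to a peak (x = 3) in each chain.
valley_to_peak <- (target(0) / target(3))^beta

print(data.frame(chain = 1:4, power = beta, valley_to_peak = valley_to_peak))
```

The warmer the chain, the shallower the valley between the islands, which is why heated chains cross it more easily and can act as scouts for the cold chain.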

Page 52

Bayesian phylogenetics

Page 53

Marginal = taking into account all possible values for all parameters

Page 54

Sampling the chain

Record the position of the robot every 100 or 1000 steps (1000 represents more “thinning” than 100)

This sample will be autocorrelated, but not strongly if it is thinned appropriately (autocorrelation can be measured to assess this)

If using heated chains, only the cold chain is sampled

The marginal distribution of any parameter can be obtained from this sample
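The effect of thinning on autocorrelation can be measured with R's `acf`. Since no real MCMC output is available here, the sketch uses a simulated AR(1) series as a stand-in for a parameter trace; the autocorrelation 0.95, the chain length and the thinning interval are assumptions.

```r
set.seed(42)

# Stand-in for an MCMC trace: a strongly autocorrelated AR(1) series
# (phi = 0.95 and the chain length are assumptions of this sketch).
n_gen <- 100000
phi <- 0.95
trace <- as.numeric(stats::filter(rnorm(n_gen), filter = phi, method = "recursive"))

# Keep every 100th generation.
thinned <- trace[seq(100, n_gen, by = 100)]

# Lag-1 autocorrelation before and after thinning.
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
print(c(full = lag1(trace), thinned = lag1(thinned)))
```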

Page 57

Putting it all together

Start with random tree and arbitrary initial values for branch lengths and model parameters

Each generation consists of one of these (chosen at random):
◦ Propose a new tree (e.g. Larget-Simon move) and either accept or reject the move
◦ Propose (and either accept or reject) a new model parameter value

Every k generations, save tree topology, branch lengths and all model parameters (i.e. sample the chain)

After n generations, summarize sample using histograms, means, credible intervals, etc.

Page 59

Prior distributions

Page 60

Prior Distributions

For topologies: discrete Uniform distribution

For proportions: Beta(a,b) distribution
◦ flat when a=b=1
◦ peaked above 0.5 if a=b and both are greater than 1

For base frequencies: Dirichlet(a,b,c,d) distribution
◦ flat when a=b=c=d=1
◦ all base frequencies close to 0.25 if v=a=b=c=d and v large (e.g. 300)

For GTR model relative rates: Dirichlet(a,b,c,d,e,f) distribution
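The shapes of these priors can be checked in a few lines of R. In this sketch, Beta(5, 5) is an arbitrary example of a = b > 1, and `rdirichlet1` is a small hypothetical helper that draws one Dirichlet sample by normalizing gamma variates, since base R has no Dirichlet generator.

```r
x <- seq(0.01, 0.99, by = 0.01)

flat   <- dbeta(x, 1, 1)   # Beta(1,1): density is 1 everywhere
peaked <- dbeta(x, 5, 5)   # a = b > 1: symmetric peak at 0.5 (a = b = 5 assumed)

print(range(flat))
print(x[which.max(peaked)])

# One Dirichlet draw via normalized gamma variates (hypothetical helper).
set.seed(7)
rdirichlet1 <- function(a) { g <- rgamma(length(a), shape = a); g / sum(g) }

d_flat <- rdirichlet1(rep(1, 4))     # flat prior: frequencies can be very unequal
d_300  <- rdirichlet1(rep(300, 4))   # v = 300: all frequencies close to 0.25
print(round(d_flat, 3))
print(round(d_300, 3))
```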

Page 62

Prior Distributions

For other model parameters and branch lengths: Gamma(a,b) distribution
◦ Exponential(λ) equals Gamma(1, λ^-1) distribution
◦ Mean of Gamma(a,b) is ab (so mean of an Exponential(10) distribution is 0.1)
◦ Variance of a Gamma(a,b) distribution is ab^2 (so variance of an Exponential(10) distribution is 0.01)
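The relationship between the Exponential and Gamma distributions can be verified with R's density functions. Note that the Gamma(a,b) notation above uses b as a scale parameter, so the sketch passes it through the `scale` argument of `dgamma` rather than `rate`.

```r
lambda <- 10
x <- seq(0, 1, by = 0.05)

# Exponential(lambda) has the same density as Gamma(shape = 1, scale = 1 / lambda).
d_exp   <- dexp(x, rate = lambda)
d_gamma <- dgamma(x, shape = 1, scale = 1 / lambda)
print(all.equal(d_exp, d_gamma))

# Mean a*b and variance a*b^2 with a = 1 (shape) and b = 1 / lambda (scale).
a <- 1; b <- 1 / lambda
print(c(mean = a * b, variance = a * b^2))
```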

Page 64

The effect of priors

Flat (uninformative) priors mean that the posterior probability is directly proportional to the likelihood
◦ The value of H at the peak of the posterior distribution is equal to the MLE of H

Informative priors can have a strong effect on posterior probabilities
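For the coin data (6 heads in 10 flips) the effect of the prior can be worked out exactly, because a Beta prior combined with binomial data gives a Beta posterior. In this sketch the informative Beta(20, 20) prior is an arbitrary example of a strong prior belief in a fair coin, and `beta_mode` is a hypothetical helper name.

```r
x <- 6; n <- 10   # coin data: 6 heads in 10 flips

# Mode of a Beta(a, b) density (valid when a > 1 and b > 1).
beta_mode <- function(a, b) (a - 1) / (a + b - 2)

# A Beta(a, b) prior combined with binomial data gives a Beta(a + x, b + n - x) posterior.
mode_flat <- beta_mode(1 + x, 1 + n - x)            # flat Beta(1, 1) prior
mode_informative <- beta_mode(20 + x, 20 + n - x)   # Beta(20, 20) prior (assumed)

print(c(MLE = x / n, flat = mode_flat, informative = mode_informative))
```

With the flat prior the posterior peak coincides with the MLE; with the informative prior it is pulled toward 0.5.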

Page 65

10 important considerations

Page 66

Top 10 List (of important considerations)

1. Beware arbitrarily truncated priors
2. Branch length priors particularly important
3. Beware high posteriors for very short branch lengths
4. Partition with care (prefer fewer subsets)
5. MCMC run length should depend on number of parameters
6. Calculate how many times parameters were updated
7. Pay attention to parameter estimates
8. Run without data to explore prior
9. Run long and run often!
10. Future: model selection should include effects of priors

Page 69

Marshall, D.C., 2010. Cryptic failure of partitioned Bayesian phylogenetic analyses: lost in the land of long trees. Syst Biol 59, 108-117.


Page 77

To conclude

Bayesian methods are here to stay in phylogenetics

They are able to take into account uncertainty in parameter estimates

They are able to relax most assumptions, including rate homogeneity among branches
◦ Timing of divergence analyses

They are being heavily developed; new features and algorithms appear regularly