pinpointing uncertainty

Pinpointing UncertaintyPinpointing Uncertainty

Comparing competing Comparing competing phylogenetic hypothesesphylogenetic hypotheses - tests - tests

of two of two (or more)(or more) trees trees• Particularly useful techniques are those Particularly useful techniques are those

designed to allow evaluation of alternative designed to allow evaluation of alternative phylogenetic hypothesesphylogenetic hypotheses

• Several such tests allow us to determine if Several such tests allow us to determine if one tree is statistically significantly worse one tree is statistically significantly worse than another:than another:

Winning sites, Templeton, Kishino-Hasegawa, Winning sites, Templeton, Kishino-Hasegawa, parametric bootstrapping (SOWH)parametric bootstrapping (SOWH)

Shimodaira-Hasegawa, Approximately Unbiased Shimodaira-Hasegawa, Approximately Unbiased

• Tests are of the null hypothesis that the Tests are of the null hypothesis that the differences between two trees (A and B) are differences between two trees (A and B) are no greater than expected from sampling errorno greater than expected from sampling error

• The simplest ‘wining sites’ test sums the The simplest ‘wining sites’ test sums the number of sites supporting tree A over tree B number of sites supporting tree A over tree B and vice versa (those having fewer steps on, and vice versa (those having fewer steps on, and better fit to, one of the trees)and better fit to, one of the trees)

• Under the null hypothesis characters are Under the null hypothesis characters are equally likely to support tree A or tree B and a equally likely to support tree A or tree B and a binomial distribution gives the probability of binomial distribution gives the probability of the observed difference in numbers of winning the observed difference in numbers of winning sitessites

Tests of two trees

The Templeton testThe Templeton test

• Templeton’s test is a non-parametric Templeton’s test is a non-parametric Wilcoxon signed ranks test of the Wilcoxon signed ranks test of the differences in fits of characters to two differences in fits of characters to two treestrees

• It is like the ‘winning sites’ test but also It is like the ‘winning sites’ test but also takes into account the magnitudes of takes into account the magnitudes of differences in the support of characters differences in the support of characters for the two treesfor the two trees

Templeton’s test - an exampleTempleton’s test - an exampleS

eym

ou

ria

dae

Dia

dec

tom

orp

ha

Syn

apsi

da

Par

arep

tili

a

Cap

torh

inid

ae

Paleothyris

Claudiosaurus

Yo

un

gin

ifo

rmes

Arc

ho

sau

rom

orp

ha

Lep

ido

sau

rifo

rmes

Placodus

Eo

sau

rop

tery

gia

Ara

eosc

elid

ia

2

1

Recent studies of the relationships of turtles using morphological data have produced very different results with turtles grouping either within the parareptiles (H1) or within the diapsids (H2) the result depending on themorphologist This suggests there may be:- problems with the data- special problems with turtles- weak support for turtle relationships

The Templeton test was used to evaluate the trees and showed that the slightly longer H1 tree found in the constrained analyses was not significantly worse than the unconstrained H2 treeThe morphological data do not allow choice between H1 and H2

Parsimony analysis of the most recent data favoured H2However, analyses constrained by H2 produced trees that required only 3 extra steps (<1% tree length)

Kishino-Hasegawa testKishino-Hasegawa test• The Kishino-Hasegawa test is similar in using The Kishino-Hasegawa test is similar in using

differences in the support provided by differences in the support provided by individual sites for two trees to determine if individual sites for two trees to determine if the overall differences between the trees are the overall differences between the trees are significantly greater than expected from significantly greater than expected from random sampling errorrandom sampling error

• It is a parametric test that depends on It is a parametric test that depends on assumptions that the characters are assumptions that the characters are independent and identically distributed (the independent and identically distributed (the same assumptions underlying the statistical same assumptions underlying the statistical interpretation of bootstrapping)interpretation of bootstrapping)

• It can be used with parsimony and maximum It can be used with parsimony and maximum likelihood - implemented in PHYLIP and likelihood - implemented in PHYLIP and PAUP*PAUP*

Kishino-Hasegawa testIf the difference between trees (tree lengths or likelihoods) is attributable to sampling error, then characters will randomly support tree A or B and the total difference will be close to zeroThe observed difference is significantly greater than zero if it is greater than 1.95 standard deviations This allows us to reject the null hypothesis and declare the sub-optimal tree significantly worse than the optimal tree (p < 0.05)

Under the null hypothesis the mean of the differences in parsimony steps or likelihoods for each site is expected to be zero, and the distribution normal

From observed differences we calculate a standard deviation

Distribution of Step/Likelihood differences at each site

0

Sites favouring tree A Sites favouring tree B

Ex

pe

cte

d

M

ea

n

Kishino-Hasegawa testKishino-Hasegawa testCiliate SSUrDNA

Maximum likelihood tree

OchromonasSymbiodinium

ProrocentrumSarcocystis

TheileriaPlagiopyla nPlagiopyla f

Trimyema cTrimyema sCyclidium p

Cyclidium gCyclidium l

GlaucomaColpodiniumTetrahymena

ParameciumDiscophryaTrithigmostoma

OpisthonectaColpoda

DasytrichiaEntodinium

SpathidiumLoxophylum

HomalozoonMetopus c

Metopus pStylonychiaOnychodromous

OxytrichiaLoxodes

TracheloraphisSpirostomum

GruberiaBlepharismaanaerobic ciliates with hydrogenosomes

Parsimonious character optimization of the presence and absence of hydrogenosomes suggests four separate origins of within the ciliates

Questions - how reliable is this result?- in particular how well supported is the idea of multiple origins?- how many origins can we confidently infer?

Kishino-Hasegawa testKishino-Hasegawa testOchromonasSymbiodiniumProrocentrumSarcocystisTheileriaPlagiopyla nPlagiopyla fTrimyema cTrimyema sCyclidium pCyclidium gCyclidium lDasytrichiaEntodiniumLoxophylumHomalozoonSpathidiumMetopus cMetopus pLoxodesTracheloraphisSpirostomumGruberiaBlepharismaDiscophryaTrithigmostomaStylonychiaOnychodromousOxytrichiaColpodaParameciumGlaucomaColpodiniumTetrahymenaOpisthonecta

OchromonasSymbiodiniumProrocentrumSarcocystisTheileriaPlagiopyla nPlagiopyla fTrimyema cTrimyema sCyclidium p

Cyclidium gCyclidium l

HomalozoonSpathidium

DasytrichiaEntodinium

Loxophylum

Metopus cMetopus p

LoxodesTracheloraphisSpirostomumGruberiaBlepharismaDiscophryaTrithigmostomaStylonychiaOnychodromousOxytrichiaColpodaParameciumGlaucomaColpodiniumTetrahymenaOpisthonecta

Parsimony analyse with topological constraints found the shortest trees forcing hydrogenosomal ciliate lineages together, thereby reducingthe number of separate origins of hydrogenosomes

Two topological constraint trees

Each of the constrained parsimony trees were compared to the ML tree and the Kishino-Hasegawa test used to determine which of these trees were significantly worse than the ML tree

Kishino-Hasegawa testKishino-Hasegawa test

No. Constraint Extra Difference SignificantlyOrigins tree Steps and SD worse?4 ML +10 - -4 MP - -13 18 No3 (cp,pt) +13 -21 22 No3 (cp,rc) +113 -337 40 Yes3 (cp,m) +47 -147 36 Yes3 (pt,rc) +96 -279 38 Yes3 (pt,m) +22 -68 29 Yes3 (rc,m) +63 -190 34 Yes2 (pt,cp,rc) +123 -432 40 Yes2 (pt,rc,m) +100 -353 43 Yes2 (pt,cp,m) +40 -140 37 Yes2 (cp,rc,m) +124 -466 49 Yes2 (pt,cp)(rc,m) +77 -222 39 Yes2 (pt,m)(rc,cp) +131 -442 48 Yes2 (pt,rc)(cp,m) +140 -414 50 Yes1 (pt,cp,m,rc) +131 -515 49 Yes

Constrained analyses used to find most parsimonious trees with less than four separate origins of hydrogenosomesTested against ML treeTrees with 2 or 1 origin are all significantly worse than the ML treeWe can confidently conclude that there have been at least three separate origins of hydrogenosomes within the sampled ciliates

Test summary and results (simplified)

Problems with tests of treesProblems with tests of trees• To be statistically valid, the Kishino-To be statistically valid, the Kishino-

Hasegawa test should be of trees that are Hasegawa test should be of trees that are selected selected a prioria priori

• However, most applications have used trees However, most applications have used trees selected selected a posterioria posteriori on the basis of the on the basis of the phylogenetic analysisphylogenetic analysis

• Where we test the ‘best’ tree against some Where we test the ‘best’ tree against some other tree the KH test will be biased towards other tree the KH test will be biased towards rejection of the null hypothesisrejection of the null hypothesis

• Only if null hypothesis is not rejected will Only if null hypothesis is not rejected will result be safe from some unknown level of result be safe from some unknown level of biasbias

Problems with tests of treesProblems with tests of trees• The Shimodaira-Hasegawa test is a more The Shimodaira-Hasegawa test is a more

statistically correct technique for testing statistically correct technique for testing trees selected trees selected a posterioria posteriori and is and is implemented in PAUP*implemented in PAUP*

• However it requires selection of a set of However it requires selection of a set of plausible topologies - hard to give practical plausible topologies - hard to give practical adviceadvice

• Parametric bootstrapping (SOWH test) is an Parametric bootstrapping (SOWH test) is an alternative - but it is harder to implement and alternative - but it is harder to implement and may suffer from an opposite bias due to may suffer from an opposite bias due to model mis-specificationmodel mis-specification

• The Approximately Unbiased test The Approximately Unbiased test (implemented in CONSEL) may be the best (implemented in CONSEL) may be the best option currentlyoption currently

Problems with tests of treesProblems with tests of trees

Taxonomic CongruenceTaxonomic Congruence• Trees inferred from different data sets Trees inferred from different data sets

(different genes, morphology) should (different genes, morphology) should agree if they are accurateagree if they are accurate

• Congruence between trees is best Congruence between trees is best explained by their accuracyexplained by their accuracy

• Congruence can be investigated using Congruence can be investigated using consensus (and supertree) methodsconsensus (and supertree) methods

• Incongruence requires further work to Incongruence requires further work to explain or resolve disagreements explain or resolve disagreements

Reliability of Phylogenetic Reliability of Phylogenetic MethodsMethods

• Phylogenetic methods (e.g. parsimony, Phylogenetic methods (e.g. parsimony, distance, ML) can also be evaluated in distance, ML) can also be evaluated in terms of their general performance, terms of their general performance, particularly their:particularly their:consistency - approach the truth with more consistency - approach the truth with more

datadata

efficiency - how quickly (how much data)efficiency - how quickly (how much data)

robustness - sensitivity to violations of robustness - sensitivity to violations of assumptionsassumptions

• Studies of these properties can be Studies of these properties can be analytical or by simulationanalytical or by simulation


• There have been many arguments that There have been many arguments that ML methods are best because they have ML methods are best because they have desirable statistical properties, such as desirable statistical properties, such as consistencyconsistency

• However, ML does not always have However, ML does not always have these propertiesthese properties– if the model is wrong/inadequate if the model is wrong/inadequate

(fortunately this is testable to some extent)(fortunately this is testable to some extent)– properties not yet demonstrated for properties not yet demonstrated for

complex inference problems such as complex inference problems such as phylogenetic treesphylogenetic trees


• ““Simulations show that ML methods generally Simulations show that ML methods generally outperform distance and parsimony methods outperform distance and parsimony methods over a broad range of realistic conditions”over a broad range of realistic conditions”

Whelan et al. 2001 Whelan et al. 2001 Trends in GeneticsTrends in Genetics 17:262-27217:262-272

• But…But…• Most simulations cover a narrow range of Most simulations cover a narrow range of

very (unrealistically) simple conditionsvery (unrealistically) simple conditions– few taxa (typically just four!)few taxa (typically just four!)– few parameters (standard models - JC, K2P few parameters (standard models - JC, K2P

etc)etc)

Reliability of Phylogenetic Reliability of Phylogenetic MethodsMethods• Simulations with four taxa have shown:Simulations with four taxa have shown:

- Model based methods - distance and Model based methods - distance and maximum likelihood perform well when the maximum likelihood perform well when the model is accurate (not surprising!)model is accurate (not surprising!)

- Violations of assumptions can lead to Violations of assumptions can lead to inconsistency for all methods (inconsistency for all methods (a Felsenstein a Felsenstein zonezone) when branch lengths or rates are highly ) when branch lengths or rates are highly unequalunequal

- Maximum likelihood methods are quite robust Maximum likelihood methods are quite robust to violations of model assumptionsto violations of model assumptions

- Weighting can improve the performance of Weighting can improve the performance of parsimony (reduce the size of the Felsenstein parsimony (reduce the size of the Felsenstein zone)zone)


• However:However:- Generalising from four taxon simulations may be Generalising from four taxon simulations may be

dangerous as conclusions may not hold for more dangerous as conclusions may not hold for more complex casescomplex cases

- A few large scale simulations (many taxa) have A few large scale simulations (many taxa) have suggested that parsimony can be very accurate and suggested that parsimony can be very accurate and efficientefficient

- Most methods are accurate in correctly recovering Most methods are accurate in correctly recovering known phylogenies produced in laboratory studies known phylogenies produced in laboratory studies

• More realistic simulations are needed if they are More realistic simulations are needed if they are to help in choosing/understanding methodsto help in choosing/understanding methods

• You can try your own…You can try your own…

pinpointing uncertainty

Documents

tree b

trees tree lengths

parametric test

support of characters

longer h1 tree

ranks test

treestempletons test

suboptimal tree