
Upload: christine-quinn

Post on 12-Jan-2016



Why Models of Sequence Evolution Matter

Number of differences between each pair of taxa vs. genetic distance between those two taxa. The x-axis is a proxy for time since divergence between the two taxa.

Differences accumulate linearly with time for only a very short time after two taxa diverge.

Two pairs of taxa that have different divergence times may show a similar number of differences between them: this is saturation.
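The flattening of the curve can be illustrated with a minimal sketch. It assumes the Jukes-Cantor (JC69) model, which the slides do not name, so the model choice and the function name `expected_p_distance` are assumptions made for illustration only. Under JC69 the expected proportion of differing sites is p = 3/4 (1 - exp(-4d/3)), where d is the genetic distance in expected substitutions per site.

```python
import math

def expected_p_distance(d):
    """Expected proportion of differing sites under JC69 (an assumed model)
    for a genetic distance d in substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

# Hypothetical distances, chosen only to show the curve levelling off.
for d in (0.05, 0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:4.2f}  expected p = {expected_p_distance(d):.3f}")
```

At small d the two quantities are nearly equal. At large d the expected proportion approaches 0.75, so pairs with quite different divergence times end up with almost the same number of observed differences.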


Multiple Hits.

So even though there have been 4 substitutions, when we compare these two lineages we can detect only 3 differences.

Models of sequence evolution expect multiple hits.
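A small simulation makes the effect of multiple hits concrete. This is only a sketch: it compares a hypothetical ancestral sequence with one descendant rather than two lineages, and the sequence, seed, substitution count, and helper names (`evolve`, `count_differences`) are all invented for illustration.

```python
import random

BASES = "ACGT"

def evolve(seq, n_substitutions, rng):
    """Apply n_substitutions point substitutions at randomly chosen sites.
    Each substitution replaces the current base with a different one."""
    seq = list(seq)
    for _ in range(n_substitutions):
        site = rng.randrange(len(seq))
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
    return "".join(seq)

def count_differences(s1, s2):
    """Number of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(s1, s2))

rng = random.Random(1)      # arbitrary seed, for repeatability only
ancestor = "ACGT" * 10      # hypothetical 40-bp ancestral sequence
n_subs = 30                 # hypothetical number of substitutions
descendant = evolve(ancestor, n_subs, rng)
observed = count_differences(ancestor, descendant)
print(f"{n_subs} substitutions, {observed} observed differences")
```

The observed count can never exceed the number of substitutions, and it falls further behind as more substitutions land on sites that have already changed.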


Branch-length Information

Let’s assume a true tree.

A ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA
B ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT
C ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT
D AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC

Increase # replicates – keeps happening.

Increase # bp – happens with certainty.

This tree is 37 steps, and the true tree is 38 steps.
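The step counts can be checked with a minimal Fitch-parsimony sketch over the three possible unrooted four-taxon trees. The sequences are copied from the alignment above; the helper names `fitch_join` and `quartet_length` are hypothetical, and rooting each tree on its internal branch is just a convenience for the calculation.

```python
# Sequences copied from the four-taxon alignment above.
SEQS = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC",
}

def fitch_join(left, right):
    """Fitch rule for one node: intersection if non-empty (0 steps),
    otherwise union (1 step). Returns (state_set, steps)."""
    common = left & right
    return (common, 0) if common else (left | right, 1)

def quartet_length(seqs, pair1, pair2):
    """Parsimony length of the unrooted tree (pair1),(pair2),
    rooted on its internal branch for the calculation."""
    total = 0
    for i in range(len(seqs[pair1[0]])):
        s1, c1 = fitch_join({seqs[pair1[0]][i]}, {seqs[pair1[1]][i]})
        s2, c2 = fitch_join({seqs[pair2[0]][i]}, {seqs[pair2[1]][i]})
        _, c3 = fitch_join(s1, s2)
        total += c1 + c2 + c3
    return total

TOPOLOGIES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

lengths = {t: quartet_length(SEQS, *t) for t in TOPOLOGIES}
for (p1, p2), steps in lengths.items():
    print(f"({p1[0]},{p1[1]}),({p2[0]},{p2[1]}): {steps} steps")
```

With these sequences the shortest tree is the one that groups A with D, at 37 steps, and the next shortest has 38 steps, in line with the counts quoted above.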


Branch-length Information

Now let’s subject the sequences to a minimum-evolution (ME) search.

First, we need to convert the character-by-taxon matrix to a matrix of pairwise distances:

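        A        B        C        D
A     -----    0.400    0.400    0.575
B     0.572    -----    0.200    0.525
C     0.572    0.232    -----    0.475
D     1.091    0.903    0.752    -----

Above the diagonal: uncorrected p-distances (proportion of the 40 sites that differ). Below the diagonal: the corresponding distances corrected for multiple hits.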
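The conversion from sequences to distances can be sketched as follows. The p-distance is simply the proportion of differing sites. For the corrected values, the sketch assumes the Jukes-Cantor formula d = -3/4 ln(1 - 4p/3); the slides do not say which correction was used, but the values in the matrix agree with this formula to within rounding. The function names are hypothetical.

```python
import math
from itertools import combinations

# Sequences copied from the four-taxon alignment above.
SEQS = {
    "A": "ATCGAGCAGCCTGGGAGAGAGACTTATTTGACAAACGTAA",
    "B": "ATTGGGGAGTAGCGTAAACACTCTTATTTGACGAAATTAT",
    "C": "ATCGTGGGTTAGAGTAGAGACTCTCATTTGACGAAATTAT",
    "D": "AACGTGGCGAATAGTAGTCAAAAAATGTGTACCAGATTAC",
}

def p_distance(s1, s2):
    """Uncorrected proportion of sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def jc69_distance(p):
    """Jukes-Cantor correction (assumed model): d = -3/4 ln(1 - 4p/3).
    Only defined for p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for x, y in combinations(sorted(SEQS), 2):
    p = p_distance(SEQS[x], SEQS[y])
    print(f"{x}-{y}: p = {p:.3f}, JC69 = {jc69_distance(p):.3f}")
```

The correction grows with p: the A-D distance nearly doubles, while the B-C distance barely changes, which is how a model stretches the long branches back toward their true lengths.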


Branch-length Information

So in the simulation, the methods that are agnostic with respect to multiple hits (MP and ME using p-distances) incorrectly unite the long-branch taxa (A & D).
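The ME result with p-distances can be reproduced with a short sketch. It assumes ordinary least-squares (OLS) branch lengths, for which the total length of a four-taxon tree (i,j),(k,m) reduces to half the two within-pair distances plus a quarter of the four across-pair distances. The slides do not say which ME variant was used, so the criterion and the function names are assumptions.

```python
# Uncorrected p-distances taken from the distance matrix above.
P_DIST = {
    frozenset("AB"): 0.400, frozenset("AC"): 0.400, frozenset("AD"): 0.575,
    frozenset("BC"): 0.200, frozenset("BD"): 0.525, frozenset("CD"): 0.475,
}

def dist(x, y):
    """Look up the distance between taxa x and y in either order."""
    return P_DIST[frozenset((x, y))]

def ols_tree_length(pair1, pair2):
    """Sum of OLS branch lengths for the quartet (pair1),(pair2).
    OLS is an assumed criterion, not one stated in the slides."""
    (i, j), (k, m) = pair1, pair2
    within = dist(i, j) + dist(k, m)
    across = dist(i, k) + dist(i, m) + dist(j, k) + dist(j, m)
    return 0.5 * within + 0.25 * across

TOPOLOGIES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

scores = {t: ols_tree_length(*t) for t in TOPOLOGIES}
best = min(scores, key=scores.get)
for (p1, p2), length in scores.items():
    print(f"({p1[0]},{p1[1]}),({p2[0]},{p2[1]}): length = {length:.4f}")
print("shortest:", best)
```

Under this criterion the shortest tree pairs A with D, which is the long-branch grouping described above.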

ML can avoid long-branch attraction

Optimum tree is the true tree.

The branch-length estimates here are pretty lousy, but under these conditions they would converge on the true values with more data.


Long-branch attraction

Let’s assume that there is an A in both the short-branch taxa.

There are four possibilities for states at the other two terminals.

1) Both long-branch taxa could have A.

2 & 3) One of the long-branch taxa has a substitution to nucleotide X (= G, C, or T).

4) There could be substitutions to X1 and X2, one along each long branch.

If X1 ≠ X2 , the site is uninformative. If X1 = X2 , the site is misleading.
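How often each of the four cases arises depends on the length of the long branches. The sketch below puts rough numbers on this under several simplifying assumptions that are not in the slides: the Jukes-Cantor model, both long branches having the same length, and both starting from A. The branch lengths and function names are hypothetical.

```python
import math

def jc_same(t):
    """JC69 probability that a site shows the same base after branch length t."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)

def jc_specific_change(t):
    """JC69 probability of ending at one particular different base."""
    return 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)

def long_branch_patterns(t_long):
    """Probabilities of the four cases, assuming both long branches
    start from A and share the length t_long (simplifications)."""
    same, change = jc_same(t_long), jc_specific_change(t_long)
    return {
        "1: both A": same * same,
        "2 & 3: one changed": 2 * same * 3 * change,
        "4: X1 != X2 (uninformative)": 6 * change * change,
        "4: X1 == X2 (misleading)": 3 * change * change,
    }

for t_long in (0.1, 0.5, 1.0):   # hypothetical long-branch lengths
    print(f"long-branch length {t_long}")
    for case, prob in long_branch_patterns(t_long).items():
        print(f"  {case:30s} {prob:.4f}")
```

Under these assumptions one third of the case-4 sites are misleading, and the share of all sites that fall into case 4 rises as the long branches get longer.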


The Importance of Branch Lengths

The Fitch optimization state sets shown on the tree are {A} and {A,C,G}.

A large number of terminals have A, so this is a slowly evolving site.

Three reconstructions are possible at node 2:

C at node 2: a transversion to A along the short branch α, no change along β, and a change to G along γ.

G at node 2: a transition to A along the short branch α, no change along γ, and a change to C along β.

A at node 2: no change along the short branch α, a change to C along β, and a change to G along γ.


The Importance of Branch Lengths


ML can voice a preference here, where parsimony can’t. This is because ML accounts for branch lengths in calculating reconstruction probabilities.

No change along the short branch combined with changes along both long branches is more likely than a change along the short branch coupled with no change along one of the long branches.

All reconstructions are permitted and accounted for, but the reconstruction with A at node 2 (& node 1) contributes the most to the single-site likelihood.
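The weighting of reconstructions by branch length can be sketched numerically. The example below is a simplification, not the calculation from the slides: it assumes the Jukes-Cantor model, treats node 2 in isolation with A at the far end of the short branch and C and G at the ends of the two long branches, and uses invented branch lengths and names.

```python
import math

def jc_prob(start, end, t):
    """JC69 probability of observing `end` after branch length t, given `start`."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if start == end else 0.25 - 0.25 * e

# Hypothetical branch lengths: one short branch and two equally long ones.
SHORT_LEN = 0.02
LONG_LEN_TO_C = 0.6
LONG_LEN_TO_G = 0.6

def node2_contribution(state):
    """Contribution of one state at node 2 to the single-site likelihood,
    under the simplifications described in the lead-in."""
    return (0.25                                    # JC69 base frequency
            * jc_prob(state, "A", SHORT_LEN)
            * jc_prob(state, "C", LONG_LEN_TO_C)
            * jc_prob(state, "G", LONG_LEN_TO_G))

contrib = {state: node2_contribution(state) for state in "ACGT"}
site_likelihood = sum(contrib.values())
for state, value in contrib.items():
    print(f"{state} at node 2: {value:.6f} ({value / site_likelihood:.1%} of total)")
```

Every state contributes something, but with a short branch leading to A the reconstruction with A at node 2 dominates, whereas parsimony counts the A, C and G reconstructions as equally good.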