Distance methods: p distances and the least squares (LS) approach
Post on 22-Dec-2015
![Page 1: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/1.jpg)
![Page 2: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/2.jpg)
Distance methods: p distances and the least squares (LS) approach
![Page 3: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/3.jpg)
General concept of distance based methods
Two steps:
1. Compute a distance D(i,j) between any two sequences i and j.
2. Find the tree that agrees most with the distance table.
![Page 4: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/4.jpg)
Simplest distance: the “p” distance
SEQ1 AACAAGCG
SEQ2 AACGAGCA
There are 2 differences, so the distance = 2. The problem is that if you now have a longer pair of sequences
SEQ3 AACAAGCGCCCTCAGTCCGCTCGCACAA
SEQ4 AACGAGCACCCTCAGTCCGCTCGCACAA
the distance is still 2, but in fact the distance between 3 and 4 should be smaller than the distance between 1 and 2.
![Page 5: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/5.jpg)
Simplest distance: the “p” distance
SEQ1 AACAAGCG
SEQ2 AACGAGCA
There are 2 differences and the length is 8, so the distance is 2/8 = 0.25.
This is called the p distance: the proportion of sites at which the two sequences differ.
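The p distance can be sketched in a few lines of Python. This is only an illustration: the function name `p_distance` is hypothetical, and the sketch assumes the two sequences are already aligned, have the same length, and contain no gaps.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    # Count the positions where the two characters differ.
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


# The two sequence pairs from the slides.
print(p_distance("AACAAGCG", "AACGAGCA"))
print(p_distance("AACAAGCGCCCTCAGTCCGCTCGCACAA",
                 "AACGAGCACCCTCAGTCCGCTCGCACAA"))
```

The second pair has the same 2 differences spread over 28 sites, so its p distance is smaller than that of the first pair, as argued above.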
![Page 6: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/6.jpg)
Distance estimation
There are better and more accurate methods to compute the distance D(i,j) between any two sequences i and j. For example, one can take into account different probabilities between transitions and transversions…
![Page 7: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/7.jpg)
From a distance table to a tree
Each tree has branch lengths from which a “predicted” set of distances can be computed: d(i,j) (small d denotes the distance along the branches, unlike the observed pairwise distances D).
[Figure: three-taxon tree with branch lengths Human = 0.3, Chimp = 0.25, Gorilla = 0.41]
d(Human, Chimp) = 0.55
d(Human, Gorilla) = 0.71
d(Chimp, Gorilla) = 0.66
![Page 8: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/8.jpg)
From a distance table to a tree
The question is: can we find branch lengths so that the d’s are equal to the D’s?
[Figure: three-taxon tree with branch lengths X (Human), Z (Chimp) and Y (Gorilla)]
D(Human, Chimp) = 0.3
D(Human, Gorilla) = 0.4
D(Chimp, Gorilla) = 0.5
![Page 9: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/9.jpg)
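From a distance table to a tree
[Figure: three-taxon tree with branch lengths X (Human), Z (Chimp) and Y (Gorilla)]
D(Human, Chimp) = 0.3, so d(Human, Chimp) = X+Z = 0.3
D(Human, Gorilla) = 0.4, so d(Human, Gorilla) = X+Y = 0.4
D(Chimp, Gorilla) = 0.5, so d(Chimp, Gorilla) = Y+Z = 0.5
Subtracting the first equation from the second gives Y-Z = 0.1. Together with Y+Z = 0.5 this gives:
Y = 0.3
Z = 0.2
X = 0.1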
![Page 10: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/10.jpg)
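Is there always a solution?
[Figure: three-taxon tree with branch lengths X (Human), Z (Chimp) and Y (Gorilla)]
D(Human, Chimp) = D1
D(Human, Gorilla) = D2
D(Chimp, Gorilla) = D3
d(Human, Chimp) = X+Z = D1
d(Human, Gorilla) = X+Y = D2
d(Chimp, Gorilla) = Y+Z = D3
We get 3 independent linear equations with 3 variables: there’s always a solution! Note that a branch length can still come out negative when the distances violate the triangle inequality.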
![Page 11: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/11.jpg)
Ex.
[Figure: three-taxon tree with branch lengths X (Human), Z (Chimp) and Y (Gorilla)]
D(Human, Chimp) = D1
D(Human, Gorilla) = D2
D(Chimp, Gorilla) = D3
d(Human, Chimp) = X+Z = D1
d(Human, Gorilla) = X+Y = D2
d(Chimp, Gorilla) = Y+Z = D3
Show that for a 3-taxon tree there’s always a solution, and that it is given by:
X = 0.5(D1+D2-D3)
Y = 0.5(D2+D3-D1)
Z = 0.5(D1-D2+D3)
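The closed-form solution for three taxa can be checked numerically. The sketch below is illustrative only: the function name `three_taxon_branches` is hypothetical, and the inputs are the Human/Chimp/Gorilla distances used earlier in the slides.

```python
def three_taxon_branches(d1, d2, d3):
    """Branch lengths (X, Y, Z) solving X+Z=d1, X+Y=d2, Y+Z=d3."""
    x = 0.5 * (d1 + d2 - d3)
    y = 0.5 * (d2 + d3 - d1)
    z = 0.5 * (d1 - d2 + d3)
    return x, y, z


# D(Human, Chimp) = 0.3, D(Human, Gorilla) = 0.4, D(Chimp, Gorilla) = 0.5
x, y, z = three_taxon_branches(0.3, 0.4, 0.5)
print(round(x, 6), round(y, 6), round(z, 6))
# The predicted distances reproduce the observed ones (up to rounding).
print(round(x + z, 6), round(x + y, 6), round(y + z, 6))
```

Substituting the three expressions back into X+Z, X+Y and Y+Z returns D1, D2 and D3, which is the content of the exercise.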
![Page 12: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/12.jpg)
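Is there always a solution??
[Figure: unrooted four-taxon tree with leaves A, B, C, D and five branches labelled V, W, X, Y, Z]
5 variables, 6 equations: it might be that there’s no solution.
D(A, B) = 2   D(A, D) = 3
D(A, C) = 3   D(B, C) = 3
D(B, D) = 3   D(C, D) = 4
An example where the equations cannot all be met with positive branch lengths: v=w=x=y=z=1 solves the first 5 equations but not the sixth. The only exact fit gives A and B branches of length 1, C and D branches of length 2, and an internal branch of length 0.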
![Page 13: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/13.jpg)
Is there always a solution??
In real life, for n>3 sequences, there is almost never an exact solution.
One might try to find the “best” solution.
![Page 14: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/14.jpg)
Is there always a solution??
The simplest case where it might be that equations have no solution: two equations with 1 parameter
a = 2
a = 3
We want to find the “best” value of a: the one that comes closest to satisfying both equations.
![Page 15: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/15.jpg)
Is there always a solution??
Putting it another way:
a – 2 = 0
a – 3 = 0
Let’s put error terms in place of 0
a – 2 = e1
a – 3 = e2
Ideally, we want e1 and e2 to be as small as possible (e1=e2=0 would be the best).
![Page 16: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/16.jpg)
The least squares solution
a – 2 = e1
a – 3 = e2
We want the distance of the point (e1,e2) from (0,0) to be the smallest.
I.e., we want to find the “a” for which
$\sqrt{e_1^2 + e_2^2}$ is minimum.
![Page 17: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/17.jpg)
The least squares solution
The term $\sqrt{e_1^2 + e_2^2}$ reaches its minimum when the term $e_1^2 + e_2^2$ reaches its minimum.
So for:
a – 2 = e1
a – 3 = e2
we want to minimize: $(a-2)^2 + (a-3)^2$
![Page 18: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/18.jpg)
The least squares solution
The value of a that minimizes $(a-2)^2 + (a-3)^2$
also minimizes $2a^2 - 10a + 13$,
and therefore $2a^2 - 10a$,
and therefore $a^2 - 5a$.
$a^2 - 5a$ is a parabola that crosses the X axis at a=0 and a=5, and its minimum is at a=2.5.
![Page 19: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/19.jpg)
Is there always a solution???
So for the simplest case of two equations with 1 parameter
a = 2
a = 3
The “best” solution is a = 2.5 which makes sense.
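The same reasoning can be sketched in code. For equations of the form a = t, setting the derivative of the sum of squares to zero gives the mean of the targets. The function names below are hypothetical and the sketch only covers this one-parameter case.

```python
def sum_of_squares(a, targets):
    """Sum of squared errors e_k = a - t_k for a candidate value a."""
    return sum((a - t) ** 2 for t in targets)


def least_squares_constant(targets):
    """Value of a minimizing sum_of_squares(a, targets): the mean."""
    return sum(targets) / len(targets)


targets = [2, 3]  # the equations a = 2 and a = 3
best = least_squares_constant(targets)
print(best, sum_of_squares(best, targets))
# Moving away from the best value in either direction increases the error.
print(sum_of_squares(2.4, targets), sum_of_squares(2.6, targets))
```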
![Page 20: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/20.jpg)
Back to phylogeny
We have the D’s (“observed distances”), and we want to find the d’s (branches) that minimize the expression
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{n} (D_{ij} - d_{ij})^2$$
![Page 21: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/21.jpg)
Back to phylogeny
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{n} (D_{ij} - d_{ij})^2$$
For each tree topology we get a different Q. The least squares (LS) method searches for the tree with the lowest Q.
![Page 22: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/22.jpg)
Back to phylogeny
The general formula for LS
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (D_{ij} - d_{ij})^2$$
The w’s are weights that differ between different least squares methods.
![Page 23: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/23.jpg)
Back to phylogeny
w’s used
$w_{ij} = 1$: Cavalli-Sforza and Edwards (1967)
$w_{ij} = 1/D_{ij}^2$: Fitch and Margoliash (1967)
$w_{ij} = 1/D_{ij}$: Beyer et al. (1974)
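The weighted criterion can be sketched as a short function. This is an illustration under stated assumptions: the name `weighted_least_squares` and the `power` parameter are hypothetical, the weights are written as w = 1/D^power (so power 0, 1 and 2 give the three weightings listed above), the diagonal is skipped, and all off-diagonal observed distances are assumed to be positive.

```python
def weighted_least_squares(D, d, power=0):
    """Q = sum over i != j of w_ij * (D_ij - d_ij)**2, with w_ij = 1 / D_ij**power.

    D holds the observed distances, d the distances predicted by a tree.
    """
    n = len(D)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # the distance of a sequence to itself carries no information
            weight = 1.0 / D[i][j] ** power
            q += weight * (D[i][j] - d[i][j]) ** 2
    return q


# Toy two-sequence example: observed distance 2, predicted distance 3.
observed = [[0, 2], [2, 0]]
predicted = [[0, 3], [3, 0]]
for power in (0, 1, 2):
    print(power, weighted_least_squares(observed, predicted, power))
```

Because the double sum runs over ordered pairs, every pair of sequences is counted twice. This rescales Q but does not change which tree has the lowest Q.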
![Page 24: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/24.jpg)
Tree search
There are the general heuristic searches.
No branch-and-bound method published so far.
Problem was shown to be NP-complete.
![Page 25: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/25.jpg)
Minimum Evolution
The general formula for LS
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (D_{ij} - d_{ij})^2$$
Minimum Evolution (ME): for a given topology, the branch lengths are estimated using LS. But unlike LS, ME chooses the topology that results in the minimal sum of branch lengths.
![Page 26: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/26.jpg)
![Page 27: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/27.jpg)
The Newick tree format and the Neighbor Joining algorithm
![Page 28: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/28.jpg)
The Newick tree format is used to represent trees as strings
[Figure: three-taxon tree with leaves A, B and C]
In Newick format: (A,B,C)
![Page 29: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/29.jpg)
The Newick tree format is used to represent trees as strings
[Figure: four-taxon tree with leaves A, B, C and D, in which B and D are neighbours]
In Newick format: (A,C,(B,D)).
Each pair of parentheses () encloses a monophyletic group, and the commas separate the members of the corresponding group.
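The nesting rule can be illustrated with a small converter. This is a sketch, not a full Newick implementation: the function name `to_newick` is hypothetical, trees are assumed to be given as nested Python tuples of leaf names, and branch lengths are not handled. The trailing semicolon is the terminator that the standard format puts at the end of a tree.

```python
def to_newick(tree):
    """Convert a nested tuple of leaf names into a Newick string."""
    def render(node):
        if isinstance(node, str):
            return node  # a leaf is written as its name
        # A group is written as its members, comma-separated, in parentheses.
        return "(" + ",".join(render(child) for child in node) + ")"
    return render(tree) + ";"


print(to_newick(("A", "B", "C")))
print(to_newick(("A", "C", ("B", "D"))))
```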
![Page 30: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/30.jpg)
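Neighbor-joining is based on Star decomposition
[Figure: star decomposition of five taxa A to E. Starting from a star tree, the best pair to group together (shown in red) is joined at each step: first (C,B), then ((C,B),E).]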
![Page 31: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/31.jpg)
Neighbor-joining
The Neighbour Joining method is used for reconstructing phylogenetic trees. Both the tree topology and branch lengths are estimated. In each stage, the two nearest nodes of the tree (the term "nearest nodes" will be defined in the following paragraphs) are chosen and defined as neighbours in our tree. This is done recursively until all of the nodes are paired together.
![Page 32: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/32.jpg)
Neighbor-joining
The algorithm was originally published by Saitou and Nei (1987). In 1988 a correction to the paper was published by Studier & Keppler. The correction was related to the main theorem in the algorithm. Studier and Keppler also suggested a slight change to the algorithm which brought the time complexity down to O(n^3). We will first describe the original algorithm, and then elaborate on the changes made by Studier & Keppler.
![Page 33: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/33.jpg)
OTUs and HTUs
Reminder:
OTUs = operational taxonomic units, or in other words, the leaves of the tree.
HTUs = hypothetical taxonomic units, or in other words, the internal nodes of the tree.
![Page 34: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/34.jpg)
Neighbors, we are …
What are neighbours? Neighbours are defined as a pair of OTUs that are connected through a single internal node.
[Figure: four-taxon tree in which A and B share one internal node, and C and D share the other]
A and B are neighbours, C and D are neighbours, but A and C are not neighbours.
![Page 35: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/35.jpg)
Additive trees
In an additive tree, the distance matrix exactly reflects the tree:
[Figure: four-taxon tree with leaves A, B, C, D and internal nodes X and Y]
the distance between nodes A and B = the distance between nodes A and Y + the distance between nodes Y and B
![Page 36: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/36.jpg)
The NJ theorem: the NJ algorithm recovers the true tree, if the tree is additive.
Additive trees
![Page 37: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/37.jpg)
In the original article, Saitou and Nei defined the two nearest nodes as the pair of nodes that give the minimal sum of branches when placed in a tree.
NJ is an approximation of the Minimum evolution
![Page 38: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/38.jpg)
NJ notations:
First of all, some notation:
• D(i,j) is defined as the distance between leaves i and j (the observed distance which we have as an input from our distance matrix).
• L(x,y) is defined as the sum of branch lengths between node X and node Y. L is used as a notation for distances between internal nodes, or an internal node to a leaf.
![Page 39: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/39.jpg)
L(x,y) notation:
We distinguish between L(X,Y) and D(A,B). D’s are given as input to the algorithm, L’s should be inferred…
[Figure: four-taxon tree with leaves A, B, C, D and internal nodes X and Y]
![Page 40: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/40.jpg)
NJ step:
• In each round we consider joining every possible pair of leaves as neighbours, and evaluate the sum of branches for each resultant tree. This means we compare the sum of branches when 1 and 2 are joined as neighbours, denoted as S(1,2), to the sum of branches when 1 and 3 are joined as neighbours, S(1,3), and so on. We look for the pair i and j for which S(i,j) is minimal, where i and j denote numbers of leaves, and i<j.
• This is why NJ approximates ME (minimum evolution).
![Page 41: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/41.jpg)
Computing S(1,2)
How can we evaluate S(1,2) from the input (the distance matrix)?
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
![Page 42: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/42.jpg)
Computing S(1,2)
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
S(1,2) = L(1,X)+L(2,X)+L(X,Y)+L(Y,3)+L(Y,4)+L(Y,5)
The problem is that we don’t know the L’s. We only know the D’s…
![Page 43: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/43.jpg)
Computing S(1,2)
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
S(1,2) = L(1,X)+L(2,X)+L(X,Y)+L(Y,3)+L(Y,4)+L(Y,5)
Since our tree is additive, we can replace L(1,X)+L(2,X) with D(1,2):
S(1,2) = D(1,2)+L(X,Y)+L(Y,3)+L(Y,4)+L(Y,5)
![Page 44: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/44.jpg)
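Computing L(X,Y) in terms of the D’s
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
$$L(X,Y) = \frac{1}{2(N-2)}\left[\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) - (N-2)\big(L(1,X)+L(2,X)\big) - 2\sum_{i=3}^{N} L(i,Y)\right]$$
N denotes the number of leaves.
In the first sum, L(1,X) is counted N-2 times. In the second term, -L(1,X) is counted N-2 times. So L(1,X) is canceled out…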
![Page 45: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/45.jpg)
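Computing L(X,Y) in terms of the D’s
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
$$L(X,Y) = \frac{1}{2(N-2)}\left[\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) - (N-2)\big(L(1,X)+L(2,X)\big) - 2\sum_{i=3}^{N} L(i,Y)\right]$$
In the first sum, L(3,Y) is counted twice: once in D(1,3) and once in D(2,3). In the last term, -L(3,Y) is counted 2 times. So L(3,Y) is canceled out…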
![Page 46: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/46.jpg)
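Computing L(X,Y) in terms of the D’s
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
$$L(X,Y) = \frac{1}{2(N-2)}\left[\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) - (N-2)\big(L(1,X)+L(2,X)\big) - 2\sum_{i=3}^{N} L(i,Y)\right]$$
In the first sum, L(X,Y) is counted N-2 times in the D(1,k) terms and N-2 times in the D(2,k) terms.
So L(X,Y) is counted altogether 2(N-2) times. Dividing by 2(N-2) we get L(X,Y).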
![Page 47: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/47.jpg)
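Computing L(X,Y) in terms of the D’s
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
$$L(X,Y) = \frac{1}{2(N-2)}\left[\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) - (N-2)\big(L(1,X)+L(2,X)\big) - 2\sum_{i=3}^{N} L(i,Y)\right]$$
$$L(X,Y) = \frac{1}{2(N-2)}\left[\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) - (N-2)\,D(1,2) - 2\sum_{i=3}^{N} L(i,Y)\right]$$
We still have to replace the last term, $\sum_{i=3}^{N} L(i,Y)$, by the D’s.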
![Page 48: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/48.jpg)
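Computing L(X,Y) in terms of the D’s
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
$$\sum_{i=3}^{N} L(i,Y) = \frac{1}{N-3}\sum_{3 \le i < j \le N} D(i,j)$$
L(3,Y) is counted N-3 times in the sum on the right: once in D(3,4), once in D(3,5), and so on up to D(3,N).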
![Page 49: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/49.jpg)
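Computing L(X,Y) in terms of the D’s
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y]
$$L(X,Y) = \frac{1}{2(N-2)}\left[\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) - (N-2)\,D(1,2) - \frac{2}{N-3}\sum_{3 \le i < j \le N} D(i,j)\right]$$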
![Page 50: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/50.jpg)
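Back to S(1,2)
$$S(1,2) = D(1,2) + L(X,Y) + \sum_{i=3}^{N} L(Y,i)$$
$$= D(1,2) + \frac{1}{2(N-2)}\left[\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) - (N-2)\,D(1,2) - \frac{2}{N-3}\sum_{3 \le i < j \le N} D(i,j)\right] + \frac{1}{N-3}\sum_{3 \le i < j \le N} D(i,j)$$
$$= \frac{1}{2(N-2)}\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) + \frac{1}{2}D(1,2) + \frac{1}{N-2}\sum_{3 \le i < j \le N} D(i,j)$$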
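The final expression for S can be evaluated directly from a distance matrix. The sketch below is illustrative: the function name `nj_sum_of_branches` is hypothetical, leaves are numbered from 0 rather than 1, and the matrix is a made-up additive example for a four-leaf tree whose pendant branches have lengths 1, 2, 3, 4 and whose internal branch has length 5.

```python
def nj_sum_of_branches(D, a, b):
    """S(a,b): estimated sum of branches when leaves a and b are joined."""
    n = len(D)
    others = [k for k in range(n) if k not in (a, b)]
    # Sum over k of D(a,k) + D(b,k), for every leaf k other than a and b.
    to_pair = sum(D[a][k] + D[b][k] for k in others)
    # Sum of D(i,j) over all pairs i < j of the remaining leaves.
    among_others = sum(
        D[i][j] for pos, i in enumerate(others) for j in others[pos + 1:]
    )
    return to_pair / (2 * (n - 2)) + D[a][b] / 2 + among_others / (n - 2)


# Toy additive matrix: leaves 0 and 1 are neighbours, leaves 2 and 3 are neighbours.
D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(nj_sum_of_branches(D, 0, 1))  # 15.0, the total length of the true tree
print(nj_sum_of_branches(D, 0, 2))  # larger, since 0 and 2 are not neighbours
```

For this example the true neighbours give S = 1+2+3+4+5 = 15, while joining leaves 0 and 2 gives a larger value, so the pair (0,1) would be preferred.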
![Page 51: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/51.jpg)
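Finding the best neighbor
So, we compute S(1,2), S(1,3), … , S(4,5) and join the two leaves i and j for which S(i,j) is minimal.
Let’s assume that S(1,2) is minimal in round 1. We call the new node that joins 1 and 2, X.
[Figure: the star tree of leaves 1 to 5 before the join, and the tree after leaves 1 and 2 are joined at node X, with leaves 3, 4 and 5 attached to node Y]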
![Page 52: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/52.jpg)
Finding the best neighbor
For the next step of the algorithm, we need to create a distance table of size (N-1)x(N-1). Let 12 denote the new node that joins 1 and 2. We define:
$$D(12,j) = \frac{D(1,j) + D(2,j)}{2}$$
[Figure: the tree with leaves 1 and 2 joined at node X, and the reduced star tree with nodes 12, 3, 4 and 5]
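The reduction of the distance table can be sketched as follows. The function name `join_pair` is hypothetical, leaves are numbered from 0, and the choice to place the merged node in the last row and column is an arbitrary convention of this sketch. The matrix is a made-up additive example.

```python
def join_pair(D, a, b):
    """Return the (N-1)x(N-1) table after joining nodes a and b.

    The remaining nodes keep their relative order; the merged node comes last.
    """
    keep = [k for k in range(len(D)) if k not in (a, b)]
    # Distances among the untouched nodes are copied unchanged.
    reduced = [[D[i][j] for j in keep] for i in keep]
    # Distance from the merged node to each remaining node j:
    # the average of D(a,j) and D(b,j).
    merged = [(D[a][j] + D[b][j]) / 2 for j in keep]
    for row, value in zip(reduced, merged):
        row.append(value)
    reduced.append(merged + [0.0])
    return reduced


D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
for row in join_pair(D, 0, 1):
    print(row)
```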
![Page 53: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/53.jpg)
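Branch lengths
[Figure: five-leaf tree; leaves 1 and 2 join at node X, leaves 3, 4 and 5 join at node Y. Z denotes the group of all leaves other than 1 and 2.]
$$D(1,Z) = \frac{\sum_{i=3}^{N} D(1,i)}{N-2}, \qquad D(2,Z) = \frac{\sum_{i=3}^{N} D(2,i)}{N-2}$$
$$L(1,X) = \frac{D(1,2) + D(1,Z) - D(2,Z)}{2}$$
$$L(2,X) = \frac{D(1,2) + D(2,Z) - D(1,Z)}{2}$$
Only the branches in red, L(1,X) and L(2,X), are being computed.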
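These two branch-length formulas can be sketched in code. The function name `pendant_branches` is hypothetical, leaves are numbered from 0, and the matrix is a made-up additive example for a four-leaf tree with pendant branches 1, 2, 3, 4 and an internal branch of 5.

```python
def pendant_branches(D, a, b):
    """Lengths of the branches from nodes a and b to their new parent node X."""
    others = [k for k in range(len(D)) if k not in (a, b)]
    # Average distance from a (and from b) to all the remaining nodes:
    # these play the role of D(1,Z) and D(2,Z).
    d_a_rest = sum(D[a][k] for k in others) / len(others)
    d_b_rest = sum(D[b][k] for k in others) / len(others)
    length_a = (D[a][b] + d_a_rest - d_b_rest) / 2
    length_b = (D[a][b] + d_b_rest - d_a_rest) / 2
    return length_a, length_b


D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(pendant_branches(D, 0, 1))
print(pendant_branches(D, 2, 3))
```

On this additive example the function recovers the pendant branch lengths used to build the matrix, and the two lengths always add up to D(a,b).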
![Page 54: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/54.jpg)
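Branch lengths
[Figure: star tree of nodes 12, 3, 4 and 5, and the resolved tree in which 12 and 5 join at node X while 3 and 4 join at node Y]
If now (12) and (5) are joined, it is equivalent to joining (3) and (4). So we can already compute the branch lengths L((12),X), L(5,X), L(3,Y) and L(4,Y).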
![Page 55: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/55.jpg)
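Branch lengths
[Figure: tree in which 12 and 5 join at node X while 3 and 4 join at node Y]
$$L(X,Y) = \frac{1}{4}\left[D(12,3) + D(12,4) + D(5,3) + D(5,4) - 2D(12,5) - 2D(3,4)\right]$$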
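The internal branch formula for the last four nodes can be sketched as below. The function name `internal_branch` and the 0-based node numbering are assumptions of this sketch, and the matrix is a made-up additive example whose internal branch has length 5.

```python
def internal_branch(D, a, b, c, d):
    """Length of the internal branch separating {a, b} from {c, d}."""
    # Each of the four cross distances passes over the internal branch once.
    cross = D[a][c] + D[a][d] + D[b][c] + D[b][d]
    # Subtracting the within-pair distances twice removes all pendant branches.
    return (cross - 2 * D[a][b] - 2 * D[c][d]) / 4


D = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(internal_branch(D, 0, 1, 2, 3))
```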
![Page 56: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/56.jpg)
Complexity of computing S(1,2)
$$S(1,2) = \frac{1}{2(N-2)}\sum_{k=3}^{N}\big(D(1,k)+D(2,k)\big) + \frac{1}{2}D(1,2) + \frac{1}{N-2}\sum_{3 \le i < j \le N} D(i,j)$$
The last sum requires O(N^2) computations.
![Page 57: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/57.jpg)
Complexity of the original NJ algorithm
Computing each S(i,j) takes O(N^2) computations.
There are O(N^2) pairs (i,j) for which S(i,j) is computed, and O(N) joining steps.
Altogether, the algorithm is O(N^5).
![Page 58: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/58.jpg)
More things to know about the NJ algorithm
• Studier and Keppler introduced a way to reduce the complexity of the algorithm from O(N^5) to O(N^3).
• The NJ theorems were not presented.
• BioNJ is a close relative of NJ, but with a slightly better performance.
• NJ constraints.
![Page 59: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/59.jpg)
![Page 60: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/60.jpg)
The UPGMA tree building method and the phylogeny of Carnivores
![Page 61: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/61.jpg)
Minimum evolution
In minimum evolution branch lengths are computed by the LS method for each possible tree topology.
However, the criterion to choose among tree topologies is not the lowest sum-of-squares, but rather the minimum sum of branch lengths.
![Page 62: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/62.jpg)
Molecular clocks
Branch lengths measure the average number of replacements per position. A branch length is thus equal to the number of replacements per position per year, multiplied by the number of years.
Putting it another way:
d = rt
![Page 63: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/63.jpg)
Molecular clocks
Clearly, the time t from the root to the tips is the same for all sequences. However, the rate r can differ, and might depend on factors such as the DNA repair mechanisms, generation time, and much more.
[Figure: two trees, each with leaves Human and Mouse]
![Page 64: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/64.jpg)
Molecular clocks
A molecular clock is the assumption that the rate is approximately the same in all species. Clearly, this is not the general case, but it might be true, for example, when comparing very close species of ants. If the rate is the same, the distances from the root to each of the tips should be the same too.
[Figure: Human and Mouse tree drawn without a clock and with a clock]
![Page 65: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/65.jpg)
Two kinds of tree search methods
Methods like least-squares, maximum parsimony, minimum evolution and maximum likelihood have an explicit criterion which they try to maximize or minimize.
There are some other methods (UPGMA, WPGMA, NJ) that apply a direct algorithm that results in a tree. These methods are usually very fast, but their statistical justification is unclear. They are usually some kind of clustering algorithm.
![Page 66: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/66.jpg)
Ultrametric
Trees which satisfy a molecular clock are called ultrametric.
When trees are ultrametric it is very easy to estimate the LS branch lengths (Farris 1969a).
![Page 67: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/67.jpg)
UPGMA
UPGMA is one such direct method, receiving as input a distance matrix and giving as output an ultrametric tree.
It was suggested by Sokal and Michener (1958).
NOT TO BE USED, UNLESS YOU NEED A VERY FAST METHOD, AND YOU ARE SURE THE TREE IS ULTRAMETRIC!
![Page 68: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/68.jpg)
UPGMA
The algorithm:
Input: a distance matrix D which is symmetric, i.e., D(i,j)=D(j,i).
Variables: for each group of species we keep a number which indicates how many species are in the group. n(i) will indicate the number of species in group i. Initially, each group is a single sequence, with n=1.
![Page 69: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/69.jpg)
UPGMA
The algorithm:
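1. Find the i and j that have the smallest D(i,j).
2. Create a new group (ij) which has n(ij)=n(i)+n(j).
3. Connect i and j to a new node (which corresponds to the new group (ij)), placed at height D(i,j)/2. When i and j are single sequences, the two branches connecting i to (ij) and j to (ij) each have length D(i,j)/2. When i or j is itself a group, its branch has length D(i,j)/2 minus the height of that group's own node.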
![Page 70: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/70.jpg)
UPGMA
The algorithm:
4. Compute the distance between the new group and all other groups (except for i and j) by using:
$$D((ij),k) = \left(\frac{n(i)}{n(i)+n(j)}\right) D(i,k) + \left(\frac{n(j)}{n(i)+n(j)}\right) D(j,k)$$
![Page 71: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/71.jpg)
UPGMA
The algorithm:
5. Delete the columns and rows of the data (modified input) matrix that correspond to groups i and j, and add a column and row for group (ij).
6. Go to step 1, unless there is only 1 item left in the data matrix.
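Steps 1 to 6 can be sketched as a compact Python function. This is a minimal illustration, not a reference implementation: the function name `upgma` is hypothetical, the output is a Newick string with branch lengths, each branch length is taken as the height of the new node minus the height of the group below it, ties are broken arbitrarily, and the search for the minimum is the naive one. The example matrix is made up and ultrametric.

```python
def upgma(names, D):
    """Cluster a symmetric distance matrix with UPGMA; return a Newick string."""
    # Each active group: id -> (Newick text, number of species n, node height).
    groups = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active groups, keyed by (smaller id, larger id).
    dist = {(i, j): D[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(groups) > 1:
        # Step 1: find the closest pair of groups.
        i, j = min(dist, key=dist.get)
        d_ij = dist.pop((i, j))
        text_i, n_i, h_i = groups.pop(i)
        text_j, n_j, h_j = groups.pop(j)
        # Steps 2-3: the new node sits at height D(i,j)/2 above the leaves.
        height = d_ij / 2
        text = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        # Step 4: size-weighted average distance to every other group.
        for k in groups:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Step 5: i and j are gone; register the new group (ij).
        groups[next_id] = (text, n_i + n_j, height)
        next_id += 1
    # Step 6: stop when a single group is left.
    text, _, _ = next(iter(groups.values()))
    return text + ";"


names = ["A", "B", "C", "D"]
D = [
    [0, 2, 4, 8],
    [2, 0, 4, 8],
    [4, 4, 0, 8],
    [8, 8, 8, 0],
]
print(upgma(names, D))
```

In the resulting tree every leaf is at the same distance from the root, which is what makes the output ultrametric.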
![Page 72: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/72.jpg)
Complexity
O(n^3), because it takes O(n^2) to find the minimum D(i,j) in a matrix and you have n iterations of that.
However, we can keep a record of the smallest number in each row, and then finding the minimum goes down to O(n).
Thus, the overall time-complexity is O(n^2).
![Page 73: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/73.jpg)
An example
Distances based on immunological data of Sarich (1969).
![Page 74: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/74.jpg)
The players
Canis familiaris
Common name = Dog.
The species = familiaris.
Genus = Canis. [First letter always in capital]
Family = Canidae. [First letter always in capital]
Order = Carnivora. [First letter always in capital]
Class = Mammalia. [First letter always in capital]
Phylum = Chordata. [First letter always in capital]
Kingdom = Metazoa [= Multi-cellular organism. First letter always in capital]
![Page 75: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/75.jpg)
The players
Ursus americanus
Common name = bear.
The species = americanus.
Genus = Ursus.
Family = Ursidae.
Order = Carnivora.
![Page 76: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/76.jpg)
The players
Procyon lotor
Common name = raccoon.
The species = lotor.
Genus = Procyon.
Family = Procyonidae.
Order = Carnivora.
![Page 77: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/77.jpg)
The raccoon (דביבון)
• Reddish-brown above and black or greyish below.
• Bushy tail with 4-6 black or brown rings
• Black mask outlined in white
• Small ears
• The feet and forepaws are dexterous
![Page 78: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/78.jpg)
The raccoon (דביבון)
• Native to the southern part of the Canadian provinces and most of the United States
• Most common along stream edges, open forests and coastal marshes
![Page 79: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/79.jpg)
• Inhabit hollow trees and logs and often use the ground burrows of other animals for raising their young or for sleeping during the coldest part of the winter months.
• An average of 4-5 young are born in April-May; the mother at first carries them by the nape of the neck like a cat; they are weaned by late summer.
• Omnivorous, it feeds on grapes, nuts, grubs, crickets, small mammals, birds' eggs and nestlings.
• Often seen washing their food, the raccoon is actually feeling for matter that should be rejected as wetting the paws enhances its sense of feel.
• Winter is the raccoons’ greatest enemy when food is scarce.
HEBREW: Nape = “OREF” ; Grub = “Zachal” ; Nestling = “Gozal”
The raccoon (דביבון)
![Page 80: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/80.jpg)
The players
Mustela nivalis
Common name = weasel.
Order = Carnivora.
In Hebrew (Samor)
![Page 81: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/81.jpg)
The players
The color of the weasel is chocolate brown on its back side and white with brown spots on its underparts. The summer coat is about 1 cm in length. The winter coat, which is about 1.5 cm in length, turns to all white in northern populations and remains brown in the southern populations.
![Page 82: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/82.jpg)
The players
The body of the least weasel is long and slender, with a long neck, a flat, narrow head, and short limbs. It has large black eyes and large, round ears. Its feet have five toes with sharp claws. Breeding can occur throughout the year, but mostly takes place in the spring and late summer. Gestation lasts 34-37 days. Litters range from 1 to 7 young.
![Page 83: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/83.jpg)
The players
Litters tend to be larger in northern populations. Newborns weigh from 1.1 g to 1.7 g and are wrinkled, pink, naked, blind, and deaf. After 49-56 days, they reach their adult length. By week 6, the males are larger than the females. At 9-12 weeks family groups begin to break up, and at 12-15 weeks the weasels reach their adult mass.
![Page 84: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/84.jpg)
The players
The young spend their time play fighting and play mating. Weasels watch the movement of their prey before they attack. When they kill, they go for the neck of the victim.
Distribution:
Europe, northern Africa, Asia, North America; introduced to New Zealand
Diet:
Rodents, birds
![Page 85: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/85.jpg)
Weasel distribution
![Page 86: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/86.jpg)
The players
Phoca vitulina
Common name = Harbor seal.
Order = Carnivora.
In Hebrew: “Kelev-Yam”
![Page 87: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/87.jpg)
The players
Eumetopias jubatus
Common name = Steller sea lion.
Order = Carnivora.
![Page 88: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/88.jpg)
The players
In Hebrew: Arye-Yam
![Page 89: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/89.jpg)
The players
Felis catus
Common name = cat.
Order = Carnivora.
![Page 90: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/90.jpg)
The players
Pan troglodytes
Common name = chimpanzee.
Order = Primates.
![Page 91: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/91.jpg)
The distance table
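| | dog | bear | raccoon | weasel | seal | sea lion | cat | chimp |
|---|---|---|---|---|---|---|---|---|
| dog | 0 | 32 | 48 | 51 | 50 | 48 | 98 | 148 |
| bear | | 0 | 26 | 34 | 29 | 33 | 84 | 136 |
| raccoon | | | 0 | 42 | 44 | 44 | 92 | 152 |
| weasel | | | | 0 | 44 | 38 | 86 | 142 |
| seal | | | | | 0 | 24 | 89 | 142 |
| sea lion | | | | | | 0 | 90 | 142 |
| cat | | | | | | | 0 | 148 |
| chimp | | | | | | | | 0 |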
![Page 92: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/92.jpg)
![Page 93: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/93.jpg)
Starting tree
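The distance between seal and sea lion is 24, the smallest entry in the table, so these two taxa are joined first and each branch has a length of 12.
We call the parent node of seal and sea lion "ss".
• ss (height 12): seal (branch length 12), sea lion (branch length 12)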
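The first step can be sketched in a few lines of Python. This is only an illustration, assuming the distance table is stored as a dictionary keyed by pairs of taxon names; the names `rows`, `dist` and `closest_pair` are ours, not from the slides.

```python
# Upper triangle of the distance table, one row per taxon (diagonal omitted).
taxa = ["dog", "bear", "raccoon", "weasel", "seal", "sea lion", "cat", "chimp"]
rows = [
    [32, 48, 51, 50, 48, 98, 148],  # dog vs. bear ... chimp
    [26, 34, 29, 33, 84, 136],      # bear vs. raccoon ... chimp
    [42, 44, 44, 92, 152],          # raccoon
    [44, 38, 86, 142],              # weasel
    [24, 89, 142],                  # seal
    [90, 142],                      # sea lion
    [148],                          # cat vs. chimp
]

# Map each unordered pair (stored once, in table order) to its distance.
dist = {}
for a, row in enumerate(rows):
    for offset, d in enumerate(row):
        dist[(taxa[a], taxa[a + 1 + offset])] = d


def closest_pair(dist):
    """Return the pair of taxa with the smallest distance."""
    return min(dist, key=dist.get)


pair = closest_pair(dist)
branch_length = dist[pair] / 2  # the new parent node sits halfway between them
print(pair, branch_length)      # ('seal', 'sea lion') 12.0
```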
![Page 94: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/94.jpg)
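Removing the seal and sea lion rows and columns, and adding the ss row and column

| | dog | bear | raccoon | weasel | ss | cat | chimp |
|---|---|---|---|---|---|---|---|
| dog | 0 | 32 | 48 | 51 | ? | 98 | 148 |
| bear | | 0 | 26 | 34 | ? | 84 | 136 |
| raccoon | | | 0 | 42 | ? | 92 | 152 |
| weasel | | | | 0 | ? | 86 | 142 |
| ss | | | | | 0 | 89.5 | 142 |
| cat | | | | | | 0 | 148 |
| chimp | | | | | | | 0 |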
![Page 95: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/95.jpg)
Computing dog-ss distance
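| | dog | bear | raccoon | weasel | seal | sea lion | cat | chimp |
|---|---|---|---|---|---|---|---|---|
| dog | 0 | 32 | 48 | 51 | 50 | 48 | 98 | 148 |

$$D((ij),k) = \frac{n(i)}{n(i)+n(j)}\,D(i,k) + \frac{n(j)}{n(i)+n(j)}\,D(j,k)$$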
Here, i=seal, j=sea lion, k = dog.
n(i)=n(j)=1.
D(ss,dog) = 0.5D(sea lion,dog) + 0.5D(seal,dog) = 49.
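The update rule used here can be written as a small helper. This is a minimal sketch; the function name `merged_distance` and its argument names are hypothetical, and n(i), n(j) are taken to be the number of taxa in clusters i and j.

```python
def merged_distance(d_ik, d_jk, n_i, n_j):
    """Distance from the merged cluster (ij) to k.

    A weighted average of D(i,k) and D(j,k), with weights proportional
    to the cluster sizes n(i) and n(j).
    """
    total = n_i + n_j
    return (n_i / total) * d_ik + (n_j / total) * d_jk


# i = seal, j = sea lion, k = dog: D(seal,dog) = 50, D(sea lion,dog) = 48.
print(merged_distance(50, 48, 1, 1))  # 49.0
```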
![Page 96: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/96.jpg)
The new table. Starting second iteration…
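| | dog | bear | raccoon | weasel | ss | cat | chimp |
|---|---|---|---|---|---|---|---|
| dog | 0 | 32 | 48 | 51 | 49 | 98 | 148 |
| bear | | 0 | 26 | 34 | 31 | 84 | 136 |
| raccoon | | | 0 | 42 | 44 | 92 | 152 |
| weasel | | | | 0 | 41 | 86 | 142 |
| ss | | | | | 0 | 89.5 | 142 |
| cat | | | | | | 0 | 148 |
| chimp | | | | | | | 0 |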
![Page 97: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/97.jpg)
Starting tree
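The distance between bear and raccoon was 26, so each branch has a length of 13.
As before, ss is the parent node of seal and sea lion; we call the parent node of bear and raccoon "br".
• ss (height 12): seal (branch length 12), sea lion (branch length 12)
• br (height 13): bear (branch length 13), raccoon (branch length 13)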
![Page 98: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/98.jpg)
Computing br-ss distance
| | dog | bear | raccoon | weasel | ss | cat | chimp |
|---|---|---|---|---|---|---|---|
| ss | 49 | 31 | 44 | 41 | 0 | 89.5 | 142 |

Here, i = raccoon, j = bear, k = ss.
n(i) = n(j) = 1. D(br,ss) = 0.5D(bear,ss) + 0.5D(raccoon,ss) = 37.5.

$$D((ij),k) = \frac{n(i)}{n(i)+n(j)}\,D(i,k) + \frac{n(j)}{n(i)+n(j)}\,D(j,k)$$
![Page 99: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/99.jpg)
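The new table. Starting third iteration…

| | dog | br | weasel | ss | cat | chimp |
|---|---|---|---|---|---|---|
| dog | 0 | 40 | 51 | 49 | 98 | 148 |
| br | | 0 | 38 | 37.5 | 88 | 144 |
| weasel | | | 0 | 41 | 86 | 142 |
| ss | | | | 0 | 89.5 | 142 |
| cat | | | | | 0 | 148 |
| chimp | | | | | | 0 |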
![Page 100: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/100.jpg)
Starting tree
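The distance between br and ss was 37.5, so their parent node, brss, is placed at height 18.75. This is the distance from brss to the leaves. The branch from brss to ss has length 18.75 - 12 = 6.75, and the branch from brss to br has length 18.75 - 13 = 5.75.
• brss (height 18.75): ss (branch length 6.75), br (branch length 5.75)
• ss (height 12): seal (branch length 12), sea lion (branch length 12)
• br (height 13): bear (branch length 13), raccoon (branch length 13)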
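The branch-length arithmetic on this slide (node height minus child height) can be sketched as follows. The dictionaries `heights` and `children` and the function `branch_lengths` are illustrative names, not part of the slides; leaves are assumed to sit at height 0.

```python
# Height of every node built so far; leaves are at height 0.
heights = {
    "seal": 0.0, "sea lion": 0.0, "bear": 0.0, "raccoon": 0.0,
    "ss": 24 / 2,      # half the seal-sea lion distance
    "br": 26 / 2,      # half the bear-raccoon distance
    "brss": 37.5 / 2,  # half the br-ss distance
}
children = {
    "ss": ["seal", "sea lion"],
    "br": ["bear", "raccoon"],
    "brss": ["ss", "br"],
}


def branch_lengths(heights, children):
    """Length of each parent->child branch: parent height minus child height."""
    return {
        (parent, child): heights[parent] - heights[child]
        for parent, kids in children.items()
        for child in kids
    }


lengths = branch_lengths(heights, children)
print(lengths[("brss", "ss")], lengths[("brss", "br")])  # 6.75 5.75
```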
![Page 101: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/101.jpg)
Computing dog-(br-ss) distance
| | dog | br | weasel | ss | cat | chimp |
|---|---|---|---|---|---|---|
| dog | 0 | 40 | 51 | 49 | 98 | 148 |

Here, i = br, j = ss, k = dog.
n(i) = n(j) = 2. D(brss,dog) = 0.5D(br,dog) + 0.5D(ss,dog) = 44.5.

$$D((ij),k) = \frac{n(i)}{n(i)+n(j)}\,D(i,k) + \frac{n(j)}{n(i)+n(j)}\,D(j,k)$$
![Page 102: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/102.jpg)
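The new table. Starting fourth iteration…

| | dog | br-ss | weasel | cat | chimp |
|---|---|---|---|---|---|
| dog | 0 | 44.5 | 51 | 98 | 148 |
| br-ss | | 0 | 39.5 | 88.75 | 143 |
| weasel | | | 0 | 86 | 142 |
| cat | | | | 0 | 148 |
| chimp | | | | | 0 |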
![Page 103: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/103.jpg)
Starting tree
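The distance between br-ss and weasel was 39.5, so their parent node, wbrss, is placed at height 19.75. The branch from wbrss to br-ss thus has length 19.75 - 18.75 = 1.
• wbrss (height 19.75): br-ss, weasel
• br-ss (height 18.75): br, ss
• br (height 13): bear, raccoon
• ss (height 12): seal, sea lion
• All leaves are at height 0.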
![Page 104: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/104.jpg)
Computing dog-wbrss distance
| | dog | br-ss | weasel | cat | chimp |
|---|---|---|---|---|---|
| dog | 0 | 44.5 | 51 | 98 | 148 |

Here, i = br-ss, j = weasel, k = dog.
n(i) = 4, n(j) = 1. D(wbrss,dog) = 0.8D(br-ss,dog) + 0.2D(weasel,dog) = 44.5*8/10 + 51*2/10 = (356+102)/10 = 45.8

$$D((ij),k) = \frac{n(i)}{n(i)+n(j)}\,D(i,k) + \frac{n(j)}{n(i)+n(j)}\,D(j,k)$$
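One way to see why the weights are 4/5 and 1/5: with these weights, the cluster distance equals the plain average of the original distances from dog to every leaf in the cluster. The short check below illustrates this with the values from the original distance table; the variable names are ours.

```python
import math

# Original distances from dog to the five leaves of the cluster wbrss.
dog_to_leaf = {"bear": 32, "raccoon": 48, "seal": 50, "sea lion": 48, "weasel": 51}

# Plain average over all leaves of the cluster.
mean_over_leaves = sum(dog_to_leaf.values()) / len(dog_to_leaf)

# Weighted update from the slide: n(br-ss) = 4, n(weasel) = 1.
weighted = (4 / 5) * 44.5 + (1 / 5) * 51

# The two agree up to floating-point rounding (both are 45.8).
print(math.isclose(mean_over_leaves, weighted))
```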
![Page 105: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/105.jpg)
The new table. Starting fifth iteration…

| | dog | wbrss | cat | chimp |
|---|---|---|---|---|
| dog | 0 | 45.8 | 98 | 148 |
| wbrss | | 0 | 88.2 | 142.8 |
| cat | | | 0 | 148 |
| chimp | | | | 0 |
![Page 106: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/106.jpg)
Starting tree
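The distance between wbrss and dog was 45.8, so their parent node, dwbrss, is placed at height 22.9. The branch from dwbrss to wbrss thus has length 22.9 - 19.75 = 3.15.
• dwbrss (height 22.9): wbrss, dog
• wbrss (height 19.75): br-ss, weasel
• br-ss (height 18.75): br, ss
• br (height 13): bear, raccoon
• ss (height 12): seal, sea lion
• All leaves are at height 0.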
![Page 107: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/107.jpg)
The new table. Starting sixth iteration…

| | dwbrss | cat | chimp |
|---|---|---|---|
| dwbrss | 0 | 89.833 | 143.667 |
| cat | | 0 | 148 |
| chimp | | | 0 |
![Page 108: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/108.jpg)
Starting tree
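The distance between dwbrss and cat was 89.833, so their parent node, cdwbrss, is placed at height 44.9165. The branch from cdwbrss to dwbrss thus has length 44.9165 - 22.9 = 22.0165.
• cdwbrss (height 44.9165): dwbrss, cat
• dwbrss (height 22.9): wbrss, dog
• wbrss (height 19.75): br-ss, weasel
• br-ss (height 18.75): br, ss
• br (height 13): bear, raccoon
• ss (height 12): seal, sea lion
• All leaves are at height 0.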
![Page 109: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/109.jpg)
The new table. Starting the last iteration…

| | cdwbrss | chimp |
|---|---|---|
| cdwbrss | 0 | 144.2857 |
| chimp | | 0 |
![Page 110: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/110.jpg)
Starting tree
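The distance between cdwbrss and chimp was 144.2857, so THE ROOT is placed at height 72.14285. The branch from the root to cdwbrss thus has length 72.14285 - 44.9165 = 27.22635.
• root (height 72.14): cdwbrss, chimp
• cdwbrss (height 44.9165): dwbrss, cat
• dwbrss (height 22.9): wbrss, dog
• wbrss (height 19.75): brss, weasel
• brss (height 18.75): br, ss
• br (height 13): bear, raccoon
• ss (height 12): seal, sea lion
• All leaves are at height 0.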
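The whole procedure walked through above can be put together in a short Python sketch. It is one possible implementation of UPGMA, not the code behind the slides; the function name `upgma`, the `frozenset`-keyed table, and the bracketed cluster names are all choices made for this illustration.

```python
from itertools import combinations


def upgma(taxa, dist):
    """Return the merges (cluster_a, cluster_b, height) in the order they are made.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    size = {t: 1 for t in taxa}  # number of leaves in each current cluster
    d = dict(dist)               # working copy of the distance table
    merges = []
    while len(size) > 1:
        # 1. Find the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        new = f"({a},{b})"
        # 2. Size-weighted average distance from the new cluster to the rest.
        n_a, n_b = size[a], size[b]
        for k in size:
            if k not in (a, b):
                d[frozenset((new, k))] = (
                    n_a * d[frozenset((a, k))] + n_b * d[frozenset((b, k))]
                ) / (n_a + n_b)
        # 3. Replace the two clusters by their union.
        del size[a], size[b]
        size[new] = n_a + n_b
        merges.append((a, b, height))
    return merges


taxa = ["dog", "bear", "raccoon", "weasel", "seal", "sea lion", "cat", "chimp"]
rows = [  # upper triangle of the distance table, diagonal omitted
    [32, 48, 51, 50, 48, 98, 148],
    [26, 34, 29, 33, 84, 136],
    [42, 44, 44, 92, 152],
    [44, 38, 86, 142],
    [24, 89, 142],
    [90, 142],
    [148],
]
dist = {}
for i, row in enumerate(rows):
    for offset, value in enumerate(row):
        dist[frozenset((taxa[i], taxa[i + 1 + offset]))] = value

merges = upgma(taxa, dist)
for a, b, height in merges:
    print(f"join {a} and {b} at height {height:.4f}")
```

On the table above, the node heights come out as 12, 13, 18.75, 19.75, 22.9, about 44.9167 and about 72.1429, matching the slides up to rounding.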
![Page 111: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/111.jpg)
Problems with UPGMA, when the data is not clock-like
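Assume that this is the true tree: A (branch length 13) and B (branch length 4) share a parent node; C (branch length 4) and D (branch length 10) share a parent node; each of these two parent nodes is connected to the root by a branch of length 2.
Then, the "true" distance matrix is

| | A | B | C | D |
|---|---|---|---|---|
| A | | 17 | 21 | 27 |
| B | | | 12 | 18 |
| C | | | | 14 |
| D | | | | |

In this case, B and C will be clustered first, because their distance (12) is the smallest. This is wrong: in the true tree, B is the sister of A.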
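The failure can be reproduced numerically. The sketch below assumes the true tree has the shape ((A:13, B:4):2, (C:4, D:10):2), which is our reading of the branch lengths on the slide; it reproduces the distance matrix given there. All names in the code are illustrative.

```python
from itertools import combinations

# Assumed true tree: ((A:13, B:4):2, (C:4, D:10):2).
leaf_branch = {"A": 13, "B": 4, "C": 4, "D": 10}
parent = {"A": "AB", "B": "AB", "C": "CD", "D": "CD"}
internal_path = 2 + 2  # from node AB up to the root and down to node CD


def true_distance(x, y):
    """Sum of branch lengths on the path between leaves x and y."""
    d = leaf_branch[x] + leaf_branch[y]
    if parent[x] != parent[y]:
        d += internal_path
    return d


dist = {pair: true_distance(*pair) for pair in combinations("ABCD", 2)}
first_upgma_pair = min(dist, key=dist.get)

print(dist)              # distances AB=17, AC=21, AD=27, BC=12, BD=18, CD=14
print(first_upgma_pair)  # ('B', 'C'), although B's true sister is A
```

The distances are exact path lengths on the tree, so the error comes only from the unequal rates (the long branches leading to A and D), not from noise in the data.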
![Page 112: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/112.jpg)
Gene, Volume 397, Issues 1-2, 1 August 2007, Pages 76-83
![Page 113: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/113.jpg)
![Page 114: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/114.jpg)
Networks
A network is sometimes used to represent a tree in which recombination occurred.
[Figure: a network over the taxa a, b, c, d and e]
![Page 115: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/115.jpg)
![Page 116: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/116.jpg)
Known phylogenies
![Page 117: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/117.jpg)
Known phylogenies
The best way to test different methods of phylogenetic reconstruction is by using trees that are known to be true from other sources…
Problem: known phylogenies are very rare.
Examples of known phylogenies: laboratory animals and crop plants (and even those are often suspect). Also, their evolutionary rate is very slow…
![Page 118: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/118.jpg)
Known phylogenies
David Hillis and colleagues have created “experimental” phylogenies in the lab.
![Page 119: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/119.jpg)
Known phylogenies
The first paper (1992) analyzed phylogeny reconstruction based on restriction sites analysis.
![Page 120: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/120.jpg)
Known phylogenies
Later, bacteriophage T7 was used. It was subdivided into cultures in the presence of a mutagen. The final cultures were then sequenced, and the sequences were given as input to several phylogenetic reconstruction methods. The trees output by these methods were then compared to the true tree.
![Page 121: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/121.jpg)
Known phylogenies
In fact, they used restriction sites to infer the phylogeny, using MP, NJ, UPGMA and others.
All methods reconstructed the true tree.
![Page 122: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/122.jpg)
Known phylogenies
They also compared outputs of ancestral sequence reconstruction, using MP.
97.3% of the ancestral states were correctly reconstructed.
Encouraging!
![Page 123: Distance methods: p distances and the least squares (LS) approach](https://reader035.vdocument.in/reader035/viewer/2022062300/56649d795503460f94a5c467/html5/thumbnails/123.jpg)
Known phylogenies
Criticism:
(1) The true tree was very easy to infer, because it was well balanced and all the nodes were accompanied by numerous changes.
(2) Mutating using a single mutagen doesn’t reflect reality.