phylogenetic trees (2) lecture 12
Based on: Durbin et al Section 7.3, 7.8, Gusfield: Algorithms on Strings, Trees, and Sequences Section 17.
Recall: The Four Points Condition
Theorem: A set M of L objects is additive iff any subset of four objects can be labeled i,j,k,l so that:
$d(i,k) + d(j,l) = d(i,l) + d(k,j) \ge d(i,j) + d(k,l)$
We call {{i,j},{k,l}} the “split” of {i,j,k,l}.
The four points condition implies an O(L^4) algorithm to decide whether a set is additive. The most common methods for constructing trees for additive sets use neighbor joining methods, which we study next.
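A direct reading of the condition gives the O(L^4) test. The sketch below is ours, not from the lecture: it assumes distances are stored as a symmetric dict of dicts (a hypothetical representation), and it checks only the four points condition, using the fact that a quadruple satisfies it exactly when the two largest of its three pairwise sums are equal.

```python
from itertools import combinations

def from_pairs(pairs):
    """Build a symmetric dict-of-dicts matrix (zero diagonal) from {(i, j): d}."""
    d = {}
    for (i, j), w in pairs.items():
        d.setdefault(i, {i: 0})[j] = w
        d.setdefault(j, {j: 0})[i] = w
    return d

def satisfies_four_point(d, tol=1e-9):
    """O(L^4): examine every quadruple of objects."""
    for i, j, k, l in combinations(list(d), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if sums[2] - sums[1] > tol:   # the two largest sums must be equal
            return False
    return True

# Hypothetical example: distances in a tree with cherries (A,B) and (C,D),
# leaf edges A:1, B:4, C:1, D:4 and an internal edge of weight 1.
tree_d = from_pairs({("A", "B"): 5, ("A", "C"): 3, ("A", "D"): 6,
                     ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5})
print(satisfies_four_point(tree_d))   # True
```

Changing d(C,D) from 5 to 8 makes the largest sum unique, so the same function returns False for that matrix.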
Constructing additive trees: the neighbor joining problem
Let M be an additive set, let i, j be neighboring leaves in the implied tree, let k be their parent, and let m be any other vertex. The formula
$d(k,m) = \tfrac{1}{2}\,[d(i,m) + d(j,m) - d(i,j)]$
shows that we can compute the distances of k to all other leaves. This suggests the following method to construct a tree from a distance matrix:
1. Find neighboring leaves i, j in the tree.
2. Replace i, j by their parent k and recursively construct a tree T for the smaller set.
3. Add i, j as children of k in T.
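The parent formula translates directly into code. This is a minimal sketch that assumes the matrix is a symmetric dict of dicts and that i, j really are neighboring leaves; the example matrix is a hypothetical tree, not one from the slides.

```python
def parent_distances(d, i, j):
    """Distances from the parent k of neighboring leaves i, j to every other
    object m, via d(k,m) = (d(i,m) + d(j,m) - d(i,j)) / 2."""
    return {m: (d[i][m] + d[j][m] - d[i][j]) / 2 for m in d if m not in (i, j)}

# Hypothetical additive matrix of a tree with cherries (A,B) and (C,D):
# leaf edges A:1, B:4, C:1, D:4, and an internal edge of weight 1.
d = {
    "A": {"A": 0, "B": 5, "C": 3, "D": 6},
    "B": {"A": 5, "B": 0, "C": 6, "D": 9},
    "C": {"A": 3, "B": 6, "C": 0, "D": 5},
    "D": {"A": 6, "B": 9, "C": 5, "D": 0},
}
print(parent_distances(d, "A", "B"))  # {'C': 2.0, 'D': 5.0}
```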
Neighbor Finding
How can we find from distances alone a pair of nodes which are neighboring leaves (called “cherries”)?
Closest nodes aren’t necessarily cherries.
[Figure: a four-leaf tree on A, B, C, D illustrating that the closest pair need not be a cherry.]
Next we show one way to find neighbors from distances.
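A small numeric instance shows why closeness is not enough. The tree is our own hypothetical example: cherries (A,B) and (C,D) with leaf edges A:1, B:4, C:1, D:4 and an internal edge of weight 1, so the two short leaf edges make A and C the closest pair.

```python
from itertools import combinations

d = {
    "A": {"A": 0, "B": 5, "C": 3, "D": 6},
    "B": {"A": 5, "B": 0, "C": 6, "D": 9},
    "C": {"A": 3, "B": 6, "C": 0, "D": 5},
    "D": {"A": 6, "B": 9, "C": 5, "D": 0},
}
closest = min(combinations(sorted(d), 2), key=lambda p: d[p[0]][p[1]])
print(closest)  # ('A', 'C'): the closest pair, yet A and C are not cherries
```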
Neighbor Finding: Saitou & Nei method (1987)
Theorem (Saitou & Nei): Assume all edge weights are positive. If D(i,j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree.
Definitions:
For a leaf i, let $r_i = \sum_{u} d(i,u)$, summing over all L leaves.
For leaves i, j, let $D(i,j) = (L-2)\,d(i,j) - (r_i + r_j)$.
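The definitions above can be evaluated directly. This sketch assumes a symmetric dict-of-dicts matrix with zero diagonal; the example is the same hypothetical tree with cherries (A,B) and (C,D), where the closest pair (A,C) is not a cherry.

```python
from itertools import combinations

def saitou_nei_pair(d):
    """Return (pair minimizing D, all D values), where
    r_i = sum_u d(i,u) and D(i,j) = (L-2) d(i,j) - (r_i + r_j)."""
    L = len(d)
    r = {i: sum(d[i].values()) for i in d}
    D = {(i, j): (L - 2) * d[i][j] - (r[i] + r[j]) for i, j in combinations(d, 2)}
    return min(D, key=D.get), D

d = {
    "A": {"A": 0, "B": 5, "C": 3, "D": 6},
    "B": {"A": 5, "B": 0, "C": 6, "D": 9},
    "C": {"A": 3, "B": 6, "C": 0, "D": 5},
    "D": {"A": 6, "B": 9, "C": 5, "D": 0},
}
pair, D = saitou_nei_pair(d)
print(pair)  # ('A', 'B')
```

Here (A,B) and (C,D) tie at D = -24; `min` returns the first one, and both are cherries, as the theorem predicts.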
Saitou & Nei proof
Definitions: path(i,j) = the path from leaf i to leaf j; d(u,path(i,j)) = the distance in T from u to path(i,j).
[Figure: leaves i and j joined by path(i,j), and a vertex u at distance d(u,path(i,j)) from that path.]
Claim: $D(i,j) = -2\,d(i,j) - 2\sum_{u \neq i,j} d(u,\mathrm{path}(i,j))$
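Proof of Claim: Both $r_i$ and $r_j$ contain the term $d(i,j)$ (and $d(i,i) = d(j,j) = 0$), so
$D(i,j) = (L-2)\,d(i,j) - r_i - r_j$
$\quad = (L-2)\,d(i,j) - 2\,d(i,j) - \sum_{u \neq i,j} [d(i,u) + d(j,u)]$
$\quad = -2\,d(i,j) - \sum_{u \neq i,j} [d(i,u) + d(j,u) - d(i,j)]$
$\quad = -2\,d(i,j) - 2\sum_{u \neq i,j} d(u,\mathrm{path}(i,j))$,
where the last step uses $d(i,u) + d(j,u) - d(i,j) = 2\,d(u,\mathrm{path}(i,j))$: the paths from u to i and from u to j share the segment from u to path(i,j), and together they cover path(i,j) exactly once.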
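Saitou & Nei proof (cont.)
Thus, minimizing D(i,j) is equivalent to maximizing
$Q(i,j) = d(i,j) + \sum_{u \neq i,j} d(u,\mathrm{path}(i,j))$.
For a vertex i and an edge e: $N_i(e) = |\{u : e \text{ is on } \mathrm{path}(i,u)\}|$. Then:
$Q(i,j) = \sum_{e \in \mathrm{path}(i,j)} w(e) + \sum_{e \notin \mathrm{path}(i,j)} N_i(e)\,w(e)$
Note: If e' is a “leaf edge”, then w(e') is added exactly once to Q(i,j): either e' is on path(i,j), or the only leaf beyond it is its own leaf, so $N_i(e') = 1$.
[Figure: leaves i and j, an edge e outside path(i,j), and a leaf u in the rest of T.]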
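The edge-counting identity for Q(i,j) can be checked numerically. This is our own sketch on a hypothetical tree (cherries (A,B) and (C,D) hanging off internal vertices we call p and q); it computes Q once from distances, using $d(u,\mathrm{path}(i,j)) = (d(i,u) + d(j,u) - d(i,j))/2$, and once by summing over edges.

```python
# Hypothetical example tree (not from the slides): (vertex, vertex, weight).
EDGES = [("A", "p", 1), ("B", "p", 4), ("p", "q", 1), ("C", "q", 1), ("D", "q", 4)]
LEAVES = ["A", "B", "C", "D"]

ADJ = {}
for a, b, w in EDGES:
    ADJ.setdefault(a, []).append((b, w))
    ADJ.setdefault(b, []).append((a, w))

def reach(start, cut=None):
    """Distances from start to every vertex reachable without crossing edge `cut`."""
    dist, stack = {start: 0}, [start]
    while stack:
        x = stack.pop()
        for y, w in ADJ[x]:
            if y not in dist and {x, y} != cut:
                dist[y] = dist[x] + w
                stack.append(y)
    return dist

d = {i: reach(i) for i in LEAVES}

def q_from_distances(i, j):
    # Q(i,j) = d(i,j) + sum over u != i,j of d(u, path(i,j)).
    return d[i][j] + sum((d[i][u] + d[j][u] - d[i][j]) / 2
                         for u in LEAVES if u not in (i, j))

def q_from_edges(i, j):
    # w(e) once for e on path(i,j); N_i(e) * w(e) for every other edge.
    total = 0
    for a, b, w in EDGES:
        i_side = reach(i, cut={a, b})
        if j not in i_side:                 # removing e separates i from j
            total += w
        else:                               # N_i(e) = leaves beyond e, seen from i
            total += w * sum(1 for u in LEAVES if u not in i_side)
    return total

for i, j in [("A", "B"), ("A", "C"), ("C", "D")]:
    print(i, j, q_from_distances(i, j), q_from_edges(i, j))
```

Both methods agree on every pair, and the two cherries have the larger Q value, matching the minimal D found earlier.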
Saitou & Nei proof (cont.)
Let (see the figure below):
• path(i,j) = (i, ..., k, j).
• T1 = the subtree rooted at k. WLOG, T1 has at most L/2 leaves.
• T2 = T \ T1.
[Figure: leaves i and j, the vertex k on path(i,j), the subtree T1 containing the neighboring leaves i', j', and the rest of the tree T2.]
Assume, for contradiction, that Q(i,j) is maximized for i, j which are not neighboring leaves.
Let i', j' be any two neighboring leaves in T1. We will show that Q(i',j') > Q(i,j).
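Saitou & Nei proof (cont.)
[Figure: path(i,j) through k, the subtree T1 containing the neighboring leaves i', j', and T2.]
Proof that Q(i',j') > Q(i,j):
$Q(i,j) = \sum_{e \in \mathrm{path}(i,j)} w(e) + \sum_{e \notin \mathrm{path}(i,j)} N_i(e)\,w(e)$
$Q(i',j') = \sum_{e \in \mathrm{path}(i',j')} w(e) + \sum_{e \notin \mathrm{path}(i',j')} N_{i'}(e)\,w(e)$
Each leaf edge e adds w(e) both to Q(i,j) and to Q(i',j'), so we can ignore the contribution of leaf edges to both Q(i,j) and Q(i',j').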
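Saitou & Nei proof (end)
[Figure: path(i,j) through k, the subtree T1 containing the neighboring leaves i', j', and T2.]
Contribution of internal edges to Q(i,j) and to Q(i',j'):
Location of internal edge e    # times w(e) is added to Q(i,j)    # times w(e) is added to Q(i',j')
e ∈ path(i,j)                  1                                  N_i'(e) ≥ 2
e ∈ path(i',j)                 N_i(e) < L/2                       N_i'(e) ≥ L/2
e ∈ T \ path(i,i')             N_i(e)                             N_i'(e) = N_i(e)
Since i and j are not neighbors, path(i,j) contains at least one internal edge; its weight is positive and is counted once in Q(i,j) but at least twice in Q(i',j'). Every other internal edge contributes at least as much to Q(i',j') as to Q(i,j). Hence Q(i',j') > Q(i,j), contradicting the choice of i, j. QED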
A simpler neighbor finding method: Select an arbitrary node r.
[Figure: node r and its distance d(r,path(i,j)) to the path between leaves i and j.]
Claim (from final exam, Winter 02-3): Let i, j be such that d(r,path(i,j)) is maximized. Then i and j are neighboring leaves.
$d(r,\mathrm{path}(i,j)) = \tfrac{1}{2}\,[d(r,i) + d(r,j) - d(i,j)]$
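Because d(r,path(i,j)) is computable from distances alone, the claim yields a one-line neighbor finder. This sketch assumes a symmetric dict-of-dicts matrix and reuses our hypothetical tree with cherries (A,B) and (C,D).

```python
from itertools import combinations

def farthest_path_pair(d, r):
    """Pair i, j (both != r) maximizing d(r, path(i,j)),
    computed as (d(r,i) + d(r,j) - d(i,j)) / 2."""
    others = [x for x in d if x != r]
    return max(combinations(others, 2),
               key=lambda p: (d[r][p[0]] + d[r][p[1]] - d[p[0]][p[1]]) / 2)

d = {
    "A": {"A": 0, "B": 5, "C": 3, "D": 6},
    "B": {"A": 5, "B": 0, "C": 6, "D": 9},
    "C": {"A": 3, "B": 6, "C": 0, "D": 5},
    "D": {"A": 6, "B": 9, "C": 5, "D": 0},
}
print(farthest_path_pair(d, "A"))  # ('C', 'D')
```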
Neighbor Joining Algorithm
If L = 3, return the tree of the three vertices.
Set M to contain all leaves, and select a root r. Compute, for all i, j ≠ r, C(i,j) = (d(r,i) + d(r,j) − d(i,j))/2.
Iteration: Choose i, j such that C(i,j) is maximal. Create a new vertex k, and set
$d(i,k) = \tfrac{1}{2}\,[d(i,j) + d(i,r) - d(j,r)]$
$d(j,k) = d(i,j) - d(i,k)$
$d(k,m) = \tfrac{1}{2}\,[d(i,m) + d(j,m) - d(i,j)]$ for each vertex m
Remove i, j, and add k to M. Recursively construct a tree on the smaller set, then add i, j as children of k, at distances d(i,k) and d(j,k).
[Figure: the root r, the new vertex k at distance C(i,j) from r, and its children i and j.]
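The steps above can be sketched as follows. Assumptions (ours): the input is an additive, symmetric dict-of-dicts matrix on at least 3 objects with string names, r is one of the objects, and new vertices get the hypothetical names "k0", "k1", ...; the recursion is unrolled into a loop, and the L = 3 base case joins the last two vertices and r at one center vertex (a zero-length edge means the center coincides with that vertex).

```python
from itertools import combinations

def neighbor_join(dist, r):
    """Root-based neighbor joining; returns a list of (u, v, weight) edges."""
    d = {a: dict(row) for a, row in dist.items()}   # work on a copy
    M = [x for x in d if x != r]
    edges, n = [], 0
    while len(M) > 2:
        # C(i,j) = d(r, path(i,j)); a maximizing pair are neighbors.
        i, j = max(combinations(M, 2),
                   key=lambda p: (d[r][p[0]] + d[r][p[1]] - d[p[0]][p[1]]) / 2)
        k = f"k{n}"
        n += 1
        dik = (d[i][j] + d[i][r] - d[j][r]) / 2
        edges += [(i, k, dik), (j, k, d[i][j] - dik)]
        M = [x for x in M if x not in (i, j)]
        d[k] = {k: 0}
        for m in M + [r]:                            # d(k,m) for the survivors
            d[k][m] = d[m][k] = (d[i][m] + d[j][m] - d[i][j]) / 2
        M.append(k)
    # Base case: the last two vertices and r meet at one center vertex.
    a, b = M
    c = f"k{n}"
    dac = (d[a][b] + d[a][r] - d[b][r]) / 2
    edges += [(a, c, dac), (b, c, d[a][b] - dac),
              (r, c, (d[a][r] + d[b][r] - d[a][b]) / 2)]
    return edges

def tree_distance(edges, u, v):
    """Length of the path from u to v in the tree given by `edges`."""
    adj = {}
    for x, y, w in edges:
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    seen, stack = {u: 0}, [u]
    while stack:
        x = stack.pop()
        for y, w in adj[x]:
            if y not in seen:
                seen[y] = seen[x] + w
                stack.append(y)
    return seen[v]

# Hypothetical additive matrix (tree with cherries (A,B) and (C,D)).
D = {
    "A": {"A": 0, "B": 5, "C": 3, "D": 6},
    "B": {"A": 5, "B": 0, "C": 6, "D": 9},
    "C": {"A": 3, "B": 6, "C": 0, "D": 5},
    "D": {"A": 6, "B": 9, "C": 5, "D": 0},
}
edges = neighbor_join(D, "A")
print(edges)
```

Checking `tree_distance(edges, u, v) == D[u][v]` for every pair confirms that the reconstructed tree realizes the input distances.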
Complexity of Neighbor Joining Algorithm (using the simpler neighbor finding method)
Naive implementation:
Initialization: Θ(L^2) to compute d(r,i) and C(i,j) for all i, j ∈ M.
Each iteration: O(L^2) to find the maximal C(i,j); O(L) to compute {C(m,k) : m ∈ M} for the new node k.
Total of O(L^3).
[Figure: the root r, a vertex m, the new node k, and the value C(m,k).]
Complexity of Neighbor Joining Algorithm
Using a heap to store the C(i,j)'s:
Input: distance matrix D = d(i,j), and an arbitrary object r.
Initialization: Θ(L^2) to compute and heapify the C(i,j)'s in a heap H.
Each iteration:
O(log L) to find and delete the maximal C(i,j) from H.
O(L) to add the values {d(k,m)} to D, for all objects m.
O(L) to delete {d(m,i), d(m,j)} from D (for all m).
O(L log L) to delete {C(i,m), C(j,m)} from H and to add {C(k,m)} to H, for all objects m.
Total of O(L^2 log L).
(implementation details are omitted)
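One possible realization of the heap version is sketched below; it is ours, not the lecture's omitted implementation. Python's `heapq` is a min-heap without arbitrary deletion, so C values are stored negated and entries that mention a removed vertex are skipped when popped (lazy deletion) instead of being deleted eagerly. The heap still holds O(L^2) entries, so each pop or push costs O(log L) and the O(L^2 log L) total is preserved. Input assumptions and the hypothetical "k0", "k1", ... names are as in the previous sketch.

```python
import heapq
from itertools import combinations

def neighbor_join_heap(dist, r):
    """Root-based neighbor joining with the C(i,j) values in a heap."""
    d = {a: dict(row) for a, row in dist.items()}

    def C(a, b):
        return (d[r][a] + d[r][b] - d[a][b]) / 2

    alive = {x for x in d if x != r}
    heap = [(-C(a, b), a, b) for a, b in combinations(sorted(alive), 2)]
    heapq.heapify(heap)                         # Theta(L^2)
    edges, n = [], 0
    while len(alive) > 2:
        _, i, j = heapq.heappop(heap)           # O(log L)
        if i not in alive or j not in alive:
            continue                            # stale entry (lazy deletion)
        k = f"k{n}"
        n += 1
        dik = (d[i][j] + d[i][r] - d[j][r]) / 2
        edges += [(i, k, dik), (j, k, d[i][j] - dik)]
        alive -= {i, j}
        d[k] = {k: 0}
        for m in alive | {r}:                   # O(L) new distances
            d[k][m] = d[m][k] = (d[i][m] + d[j][m] - d[i][j]) / 2
        for m in alive:                         # O(L log L) new heap entries
            heapq.heappush(heap, (-C(k, m), k, m))
        alive.add(k)
    a, b = sorted(alive)                        # base case: meet at a center
    c = f"k{n}"
    dac = (d[a][b] + d[a][r] - d[b][r]) / 2
    edges += [(a, c, dac), (b, c, d[a][b] - dac),
              (r, c, (d[a][r] + d[b][r] - d[a][b]) / 2)]
    return edges

D = {
    "A": {"A": 0, "B": 5, "C": 3, "D": 6},
    "B": {"A": 5, "B": 0, "C": 6, "D": 9},
    "C": {"A": 3, "B": 6, "C": 0, "D": 5},
    "D": {"A": 6, "B": 9, "C": 5, "D": 0},
}
edges = neighbor_join_heap(D, "A")
print(edges)
```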
Some remarks on the Neighbor Joining Algorithm
Applicable also to matrices which are not additive.
Known to work well in practice (with the original neighbor finding method).
The algorithm and its variants are the most widely used distance-based algorithms today.
Next we'll learn a more efficient algorithm to construct trees from distances, which is based on ultrametric trees.
Ultrametric trees
Definition: An ultrametric tree is a rooted weighted tree all of whose leaves are at the same depth.
Basic property: Define the height of the leaves to be 0. Then edge weights can be represented by the heights of internal vertices.
[Figure: an ultrametric tree on leaves A, E, D, C, B. Internal-vertex heights: 8 (root), 5, 3, 3; leaves have height 0. Edge weights: 3, 3, 3, 3, 2, 5, 5, 3.]
Least Common Ancestor and distances in Ultrametric Tree
Let LCA(i,j) denote the least common ancestor of leaves i and j. Let height(LCA(i, j)) be its distance from the leaves, and dist(i,j) be the distance from i to j.
Observation: For any pair of leaves i, j in an ultrametric tree:
height(LCA(i,j)) = 0.5 dist(i,j).
   A  B  C  D  E
A  0  8  8  5  3
B     0  3  8  8
C        0  8  8
D           0  5
E              0
[Figure: the ultrametric tree on A, E, D, C, B with internal heights 8, 5, 3, 3.]
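The observation can be checked on the slide's tree. In this sketch the internal vertices get the hypothetical names "root", "u", "v", "w" (our encoding of the figure); each edge weight is taken as height(parent) minus height(child). The check confirms that every table entry equals height(LCA(i,j)) and that the path length between the two leaves is twice that value.

```python
PARENT = {"u": "root", "v": "root", "w": "u", "D": "u",
          "A": "w", "E": "w", "B": "v", "C": "v"}
HEIGHT = {"root": 8, "u": 5, "v": 3, "w": 3,
          "A": 0, "B": 0, "C": 0, "D": 0, "E": 0}

def ancestors(x):
    """x, its parent, its grandparent, ..., up to the root."""
    chain = [x]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def lca(i, j):
    above_i = set(ancestors(i))
    return next(a for a in ancestors(j) if a in above_i)

def dist(i, j):
    """Sum the edge weights height(parent) - height(child) along the path."""
    a, total = lca(i, j), 0
    for x in (i, j):
        while x != a:
            total += HEIGHT[PARENT[x]] - HEIGHT[x]
            x = PARENT[x]
    return total

for i, j in [("A", "E"), ("A", "D"), ("B", "C"), ("A", "B")]:
    print(i, j, HEIGHT[lca(i, j)], dist(i, j))
```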
Ultrametric Matrices
Definition: A distance matrix* U of dimension L×L is ultrametric iff for each 3 indices i, j, k:
U(i,j) ≤ max{U(i,k), U(j,k)}.
Example (in every triple, the largest value appears at least twice):
   j  k
i  9  6
j     9
Theorem: The following conditions are equivalent for an L×L distance matrix U:
1. U is an ultrametric matrix.
2. There is an ultrametric tree with L leaves such that for each pair of leaves i, j:
U(i,j) = height(LCA(i,j)) = ½ dist(i,j).
* Recall: a distance matrix is a symmetric matrix with positive non-diagonal entries and 0 diagonal entries, which satisfies the triangle inequality.
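Checking the definition directly takes O(L^3) time. This sketch assumes a symmetric dict-of-dicts matrix and checks only the three-index inequality, not the distance-matrix properties in the footnote; `from_upper` is a hypothetical helper for entering upper-triangle tables like the ones on these slides.

```python
from itertools import permutations

def is_ultrametric(U):
    """O(L^3): U(i,j) <= max(U(i,k), U(j,k)) for all distinct i, j, k."""
    return all(U[i][j] <= max(U[i][k], U[j][k])
               for i, j, k in permutations(list(U), 3))

def from_upper(rows):
    """Symmetric dict-of-dicts (zero diagonal) from upper-triangle rows."""
    U = {i: {i: 0} for i in rows}
    for i, row in rows.items():
        for j, v in row.items():
            U[i][j] = U[j][i] = v
    return U

# The matrix from the LCA slide.
U = from_upper({"A": {"B": 8, "C": 8, "D": 5, "E": 3},
                "B": {"C": 3, "D": 8, "E": 8},
                "C": {"D": 8, "E": 8},
                "D": {"E": 5},
                "E": {}})
print(is_ultrametric(U))  # True
```

Raising U(A,E) from 3 to 6 breaks the condition for the triple (A, E, D), since 6 > max{5, 5}.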
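Ultrametric tree ⇒ ultrametric matrix
Assume there is an ultrametric tree s.t. U(i,j) = ½dist(i,j) = height(LCA(i,j)).
Then U is an ultrametric matrix, by properties of least common ancestors in trees: for any three leaves i, j, k, two of the three pairwise LCAs coincide and lie above the third. For example, if LCA(j,k) lies below LCA(i,j), then
U(k,i) = U(j,i) ≥ U(k,j).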
Ultrametric matrix ⇒ ultrametric tree:
We start with two observations:
Definition: Let U be an L×L matrix, and let S ⊆ {1,...,L}. U[S] is the submatrix of U consisting of the rows and columns with indices from S.
Observation 1: U is ultrametric iff for every S ⊆ {1,...,L}, U[S] is ultrametric.
Observation 2: If U is ultrametric and max_{i,j} U(i,j) = M, then M appears in every row of U.
   j  k
i  ?  ?
j     M
One of the “?” must be M: U(j,k) = M ≤ max{U(i,j), U(i,k)}, and no entry exceeds M.
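Ultrametric matrix ⇒ ultrametric tree: proof by induction
U is an ultrametric matrix ⇒ U has an ultrametric tree: by induction on L, the size of U.
Basis: L = 1: U = (0) and T is a leaf i.
L = 2: T is a tree with two leaves i, j whose root is labeled U(i,j); e.g., for U(i,j) = 9:
   i  j
i  0  9
j     0
[Figure: a root labeled 9 with leaf children i and j.]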
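Induction step
Induction step: L > 2.
Let M = max_{i,j} U(i,j). Use the 1st row to split the set {1,…,L} into two subsets:
S1 = {i : U(1,i) = M},
S2 = {1,…,L} − S1
(note: 0 < |S_i| < L, since M appears in row 1 by Observation 2, while U(1,1) = 0 < M).
Example: row 1 of U is
   1  2  3  4  5
1  0  8  2  8  5
so M = 8, S1 = {2,4}, S2 = {1,3,5}.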
Induction step
By Observation 1, U[S1] and U[S2] are ultrametric.
By induction, there is a tree T1 for S1 with root labeled M1 ≤ M, and a tree T2 for S2 with root labeled M2 < M (M2 is the 2nd largest value in row 1; if M2 = 0 then T2 is a leaf).
Join T1 and T2 into T with a root labeled M.
[The construction when M1 = M]
[Figure: the root of T1, labeled M = M1, becomes the root of T; T2, whose root is labeled M2 < M, hangs from it by an edge of weight M − M2.]
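Correctness Proof
Need to prove: T is an ultrametric tree for U, i.e., U(i,j) is the label of the LCA of i and j in T.
If i and j are in the same subtree, this holds by induction.
Else the LCA of i and j is the root, labeled M (since they are in different subtrees).
Also, [U(1,i) = M and U(1,j) ≠ M] ⇒ U(i,j) = M: by the ultrametric condition, M = U(1,i) ≤ max{U(1,j), U(i,j)} and U(1,j) < M, so U(i,j) ≥ M; since M is the maximal entry, U(i,j) = M.
[Figure: the root labeled M with subtrees T1 (containing i) and T2 (containing 1 and j).]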
Complexity Analysis
Let f(L) be the time complexity for an L×L matrix.
f(1) ≤ f(2) = constant. For L > 2: Constructing S1 and S2: O(L). Let |S1| = k, |S2| = L − k.
Constructing T1 and T2: f(k) + f(L − k).
Joining T1 and T2 to T: constant.
Thus we have: f(L) ≤ max_k [f(k) + f(L − k)] + cL, 0 < k < L.
f(L) = cL^2 satisfies the above: for 0 < k < L, k^2 + (L − k)^2 ≤ 1 + (L − 1)^2, so f(L) ≤ c(1 + (L − 1)^2) + cL = c(L^2 − L + 2) ≤ cL^2 for L ≥ 2.
Need an appropriate data structure!
The condition U(i,j) ≤ max{U(i,k), U(j,k)} is easier to check than the 4 points condition. Therefore the theorem implies that ultrametric additive sets are easier to characterize than arbitrary additive sets.
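The induction proof is constructive, and the recursion can be sketched as follows. Assumptions (ours): U is an ultrametric symmetric dict-of-dicts, and a tree is represented either as a leaf name or as a hypothetical (label, [children]) tuple. Each call scans one row of U[S], which is the O(L) splitting step in the analysis above; `lca_label` is only a checking helper.

```python
def build_ultrametric_tree(U, S=None):
    """Recursive construction from the induction proof."""
    S = list(U) if S is None else S
    if len(S) == 1:
        return S[0]
    p = S[0]                                # plays the role of object 1
    M = max(U[p][x] for x in S)             # = max of U[S], by Observation 2
    S1 = [x for x in S if U[p][x] == M]
    S2 = [x for x in S if U[p][x] != M]     # contains p, since U(p,p) = 0
    T1 = build_ultrametric_tree(U, S1)
    T2 = build_ultrametric_tree(U, S2)
    if isinstance(T1, tuple) and T1[0] == M:    # M1 = M: reuse T1's root
        return (M, T1[1] + [T2])
    return (M, [T1, T2])

def leaves(T):
    return [T] if not isinstance(T, tuple) else [x for c in T[1] for x in leaves(c)]

def lca_label(T, a, b):
    """Label of the lowest vertex whose subtree contains both a and b."""
    while isinstance(T, tuple):
        child = next((c for c in T[1] if {a, b} <= set(leaves(c))), None)
        if child is None:
            return T[0]
        T = child
    return 0                                # a == b: leaves have height 0

def from_upper(rows):
    """Symmetric dict-of-dicts (zero diagonal) from upper-triangle rows."""
    U = {i: {i: 0} for i in rows}
    for i, row in rows.items():
        for j, v in row.items():
            U[i][j] = U[j][i] = v
    return U

U = from_upper({"A": {"B": 8, "C": 8, "D": 5, "E": 3},
                "B": {"C": 3, "D": 8, "E": 8},
                "C": {"D": 8, "E": 8},
                "D": {"E": 5},
                "E": {}})
T = build_ultrametric_tree(U)
print(T)
```

For the matrix of the LCA slide this rebuilds the ultrametric tree with internal labels 8, 5, 3, 3, and the label of every LCA equals the corresponding entry of U.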
Additive trees via Ultrametric trees
Recent (and more efficient) ways for constructing and identifying additive trees use ultrametric trees.
Idea: Reduce the problem to constructing trees by the “heights” of the internal nodes. For leaves i, j, U(i,j) represents the “height” of the common ancestor of i and j.
[Figure: the ultrametric tree on A, E, D, C, B with internal heights 8, 5, 3, 3.]
Farris transform of Weighted Trees to Ultrametric Trees
First we set the height of all leaves to 0, by transforming the weighted tree T to an ultrametric tree T’ as follows:
Step 1: Pick a node r as a root, and “hang” the tree at r.
[Figure: a weighted tree on leaves a, b, c, d with edge weights 2, 1, 3, 4, 2, shown unrooted and then hung at r = a.]
Transforming Weighted Trees to Ultrametric Trees
Step 2: Let $M = \max_i d(i,r)$. M is taken to be the height of T'. Label the root by M, and label each internal node j by M − d(r,j).
[Figure: the tree hung at r = a with M = 9; the root is labeled 9 and the two internal nodes are labeled 7 and 4.]
Transforming Weighted Trees to Ultrametric Trees
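Step 3 (and last): “Stretch” the edges of the leaves so that they are all at distance M from the root.
[Figure: M = 9. The leaf edges of a, b, c, d are stretched by 9, 6, 0, 2 respectively, giving leaf-edge weights 9, 7, 4, 4; the root and internal nodes keep the labels 9, 7, 4.]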
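Steps 1 to 3 can be reproduced on the slide's example. Our encoding is an assumption: the internal vertices get the hypothetical names "x" (joining a, b and the rest) and "y" (joining c and d), with weights that agree with the example's distance matrix. Since r = a is a leaf, in T' the root labeled M sits where a was, and a hangs below it by a stretched edge of weight 0 + 9.

```python
EDGES = [("a", "x", 2), ("b", "x", 1), ("x", "y", 3), ("y", "c", 4), ("y", "d", 2)]
LEAVES = ["a", "b", "c", "d"]

def distances_from(r):
    """Distances from r to every vertex of the tree (iterative DFS)."""
    adj = {}
    for u, v, w in EDGES:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist, stack = {r: 0}, [r]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist

def farris_transform(r):
    """Return M, the labels M - d(r,j) of the internal vertices, and the
    amount M - d(r,i) by which each leaf edge is stretched."""
    d = distances_from(r)                                  # Step 1: hang at r
    M = max(d[i] for i in LEAVES)                          # Step 2
    labels = {v: M - d[v] for v in d if v not in LEAVES}
    stretch = {i: M - d[i] for i in LEAVES}                # Step 3
    return M, labels, stretch

M, labels, stretch = farris_transform("a")
print(M, labels, stretch)
```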
Reconstructing the Weighted Tree from the Ultrametric Tree
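M = 9
The weight of an internal edge is the difference between the labels of its endpoints. The weight of the edge to leaf i is obtained by subtracting M − d(r,i) from its current weight.
[Figure: with M = 9, the leaf edges of b, a, d, c go from 7, 9, 4, 4 back to 1, 0, 2, 4 (subtracting 6, 9, 2, 0), recovering the original weighted tree, with leaf a at distance 0 from the root.]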
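The reconstruction rule can be applied mechanically. In this sketch the stretched tree T' of the example is encoded by us as a parent map; "root", "x" and "y" are hypothetical names for the vertices labeled 9, 7 and 4, leaves have label 0, and `D_R` is row r = a of the example's distance matrix.

```python
PARENT = {"a": "root", "x": "root", "b": "x", "y": "x", "c": "y", "d": "y"}
LABEL = {"root": 9, "x": 7, "y": 4}        # leaves have label (height) 0
D_R = {"a": 0, "b": 3, "c": 9, "d": 7}     # d(r, i) for r = a
M = 9

def reconstruct_weights(parent, label, d_r, M):
    weights = {}
    for v, p in parent.items():
        w = label[p] - label.get(v, 0)     # difference between endpoint labels
        if v in d_r:                       # leaf edge: undo the stretch
            w -= M - d_r[v]
        weights[(p, v)] = w
    return weights

print(reconstruct_weights(PARENT, LABEL, D_R, M))
```

The weight-0 edge to a reflects that, in the original tree, a is the vertex at which the tree was hung.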
Solving the Additive Tree Problem by the Ultrametric Problem: Outline
We solve the additive tree problem by reducing it to the ultrametric problem as follows:
Given an input matrix D = D(i,j) of distances:
1. Select an arbitrary object r as a root.
2. Transform D to a matrix U = U(i,j), where U(i,j) is the height of the LCA of i and j in the corresponding ultrametric tree T_U.
3. Construct the ultrametric tree T_U for U.
4. Reconstruct the additive tree T from T_U.
How U is constructed from D
U(i,j) should be the height of the least common ancestor of i and j in T_U, the ultrametric tree hung at r:
Thus, U(i,j) = M − d(r,m), where m is the LCA of i and j in the tree hung at r, and d(r,m) is computed by:
$d(r,m) = \tfrac{1}{2}\,[d(i,r) + d(j,r) - d(i,j)]$
[Figure: the tree hung at r = a, with the root labeled 9 and the internal nodes labeled 7 and 4.]
For r = a, i = b, j = c, we have:
U(b,c) = 9 − ½(3 + 9 − 8) = 7
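The same computation for every pair fills the whole matrix U. This is a minimal sketch assuming a symmetric dict-of-dicts D with zero diagonal; diagonal entries are set to 0 because, after stretching, every leaf has height 0.

```python
def ultrametric_entry(D, r, i, j):
    """U(i,j) = M - d(r,m), with m = LCA(i,j) in the tree hung at r and
    d(r,m) = (d(i,r) + d(j,r) - d(i,j)) / 2."""
    if i == j:
        return 0
    M = max(D[r].values())
    return M - (D[i][r] + D[j][r] - D[i][j]) / 2

D = {
    "a": {"a": 0, "b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "b": 0, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "c": 0, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6, "d": 0},
}
print(ultrametric_entry(D, "a", "b", "c"))  # 7.0
```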
The transformation D → U → T_U → T
D:
   a  b  c  d
a  0  3  9  7
b     0  8  6
c        0  6
d           0
U:
   a  b  c  d
a  0  9  9  9
b     0  7  7
c        0  4
d           0
[Figure: the weighted tree T (edge weights 2, 1, 3, 4, 2) and the ultrametric tree T_U (internal labels 9, 7, 4; M = 9).]
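The whole reduction can be run end to end on this example. The sketch is ours: it assumes an additive dict-of-dicts matrix with string names, uses the recursive construction from the induction proof for step 3, and names internal vertices "v0", "v1", ... (hypothetical names that must not clash with leaf names).

```python
from itertools import count

def additive_to_ultrametric(D, r):
    """Steps 1-2: U(i,j) = M - d(r, LCA(i,j)), with U(i,i) = 0."""
    M = max(D[r].values())
    U = {i: {j: 0 if i == j else M - (D[i][r] + D[j][r] - D[i][j]) / 2
             for j in D} for i in D}
    return M, U

def build(U, S):
    """Step 3: trees are leaf names or (label, [children]) tuples."""
    if len(S) == 1:
        return S[0]
    p = S[0]
    M = max(U[p][x] for x in S)
    S1 = [x for x in S if U[p][x] == M]
    S2 = [x for x in S if U[p][x] != M]
    T1, T2 = build(U, S1), build(U, S2)
    if isinstance(T1, tuple) and T1[0] == M:
        return (M, T1[1] + [T2])
    return (M, [T1, T2])

def to_additive(T, D, r, M):
    """Step 4: internal edge = label difference; leaf edge = label - (M - d(r,i))."""
    names, edges = count(), []

    def walk(node):                     # returns (vertex name, label)
        if not isinstance(node, tuple):
            return node, 0
        label, children = node
        v = f"v{next(names)}"
        for child in children:
            c, c_label = walk(child)
            w = label - c_label
            if c in D:                  # leaf: undo the stretch
                w -= M - D[r][c]
            edges.append((v, c, w))
        return v, label

    walk(T)
    return edges

D = {
    "a": {"a": 0, "b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "b": 0, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "c": 0, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6, "d": 0},
}
M, U = additive_to_ultrametric(D, "a")
T_U = build(U, list(U))
T = to_additive(T_U, D, "a", M)
print(T_U)
print(T)
```

The recovered edge weights are 2, 1, 3, 4, 2 plus a weight-0 edge to a, matching the tree T on the slide with a placed at the root.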