Distance-based methods
Xuhua Xia
http://dambe.bio.uottawa.ca
Xuhua Xia Slide 2
Lecture Outline
• Objectives in this lecture
  – Grasp the basic concepts of distance-based tree-building algorithms
  – Learn the least-squares criterion and the minimum evolution criterion and how to use them to construct a tree
• Distance-based methods
  – Genetic distance: generally defined as the number of substitutions per site.
    • JC69 distance
    • K80 distance
    • TN84 distance
    • F84 distance
    • TN93 distance
    • LogDet distance
  – Tree-building algorithms:
    • UPGMA
    • Neighbor-joining
    • Fitch-Margoliash
    • FastME
Xuhua Xia Slide 3
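Genetic Distances
• Genetic distances: Assuming a substitution model, we can obtain the genetic distance (i.e., difference) between two nucleotide or amino acid sequences, e.g.,
• JC69: $K_{JC69} = -\frac{3}{4}\ln\left(1 - \frac{4p}{3}\right)$
• K80: $K_{K80} = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$
• TN93: $D_{TN93} = 4\pi_T\pi_C\alpha_1 + 4\pi_A\pi_G\alpha_2 + 4\pi_R\pi_Y\beta$, with
  $\alpha_1 = \frac{-\ln\left(1 - \frac{\pi_Y P_1}{2\pi_T\pi_C} - \frac{Q}{2\pi_Y}\right) + \pi_R\ln\left(1 - \frac{Q}{2\pi_R\pi_Y}\right)}{2\pi_Y}$
  $\alpha_2 = \frac{-\ln\left(1 - \frac{\pi_R P_2}{2\pi_A\pi_G} - \frac{Q}{2\pi_R}\right) + \pi_Y\ln\left(1 - \frac{Q}{2\pi_R\pi_Y}\right)}{2\pi_R}$
  $\beta = -\frac{1}{2}\ln\left(1 - \frac{Q}{2\pi_R\pi_Y}\right)$
Here p is the proportion of sites that differ, P and Q are the proportions of sites with transitional and transversional differences, $P_1$ and $P_2$ are the proportions of T↔C and A↔G transitional differences, $\pi_i$ are the nucleotide frequencies with $\pi_R = \pi_A + \pi_G$ and $\pi_Y = \pi_T + \pi_C$, and $\alpha_1$, $\alpha_2$ and $\beta$ are rate parameters multiplied by time.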
Xuhua Xia Slide 4
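Calculation of $K_{JC69}$
$K_{JC69} = -\frac{3}{4}\ln\left(1 - \frac{4p}{3}\right)$
[Figure: an ancestral sequence AACGACGATCG diverging into Species 1 and Species 2, each separated from the ancestor by time t]
The total divergence time between Species 1 and Species 2 is 2t.
Sp1: AAG CCT CGG GGC CCT TAT TTT TTG
     ||  |   ||| ||| |   ||| ||| ||
Sp2: AAT CTC CGG GGC CTC TAT TTT TTT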
p = 6/24 = 0.25
K = 0.304099
Genetic distances are scaled to be the number of substitutions per site.
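The JC69 calculation above can be sketched in a few lines of Python. This is a minimal illustration that assumes gap-free, equal-length aligned sequences; the function names `p_distance` and `jc69_distance` are made up for this example and do not come from the lecture or from DAMBE.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ (assumes no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def jc69_distance(p):
    """JC69 distance K = -(3/4) ln(1 - 4p/3); undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# The two sequences from the slide, with the codon spacing removed.
sp1 = "AAG CCT CGG GGC CCT TAT TTT TTG".replace(" ", "")
sp2 = "AAT CTC CGG GGC CTC TAT TTT TTT".replace(" ", "")

p = p_distance(sp1, sp2)
k = jc69_distance(p)
print(f"p = {p:.2f}, K_JC69 = {k:.6f}")
```

The guard on `p >= 0.75` reflects the fact that the logarithm's argument becomes non-positive there, so the correction cannot be applied to saturated sequences.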
Xuhua Xia Slide 5
Numerical Illustration
Sp1: AAG CCT CGG GGC CCT TAT TTT TTG
     ||  |   ||| ||| |   ||| ||| ||
Sp2: AAT CTC CGG GGC CTC TAT TTT TTT
What are P and Q?
P = 4/24, Q = 2/24
$K_{K80} = -\frac{\ln(1 - 2P - Q)}{2} - \frac{\ln(1 - 2Q)}{4} = 0.31507864$
Comparison of distances:
p = 0.25
Poisson distance = -ln(1 - p) = 0.288
$K_{JC69}$ = 0.304099
$K_{K80}$ = 0.3150786
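The counts of P and Q and the comparison of distances can be reproduced with a short Python sketch. It assumes gap-free aligned sequences containing only A, C, G and T; the helper names `count_p_q` and `k80_distance` are illustrative, not part of the lecture.

```python
import math

PURINES = {"A", "G"}

def count_p_q(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional differences."""
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n

def k80_distance(P, Q):
    """K80 distance: -ln(1 - 2P - Q)/2 - ln(1 - 2Q)/4."""
    return -math.log(1 - 2 * P - Q) / 2 - math.log(1 - 2 * Q) / 4

sp1 = "AAG CCT CGG GGC CCT TAT TTT TTG".replace(" ", "")
sp2 = "AAT CTC CGG GGC CTC TAT TTT TTT".replace(" ", "")

P, Q = count_p_q(sp1, sp2)
p = P + Q                                   # total proportion of differences
poisson = -math.log(1 - p)                  # Poisson-corrected distance
jc69 = -0.75 * math.log(1 - 4 * p / 3)      # JC69 distance
k80 = k80_distance(P, Q)
print(f"p={p:.3f} Poisson={poisson:.3f} JC69={jc69:.6f} K80={k80:.7f}")
```

Each successive correction allows for more unobserved substitutions, which is why the values increase from p through the Poisson and JC69 distances to K80 for these data.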
Xuhua Xia Slide 6
Distance-based phylogenetic algorithms
Algorithm           Optimization   Assumes a molecular clock
UPGMA               Local          Yes
Neighbor-joining    Local          No
Minimum Evolution   Global         No
Fitch-Margoliash    Global         No
FastME              Global         No
Xuhua Xia Slide 7
A Star Tree (Completely Unresolved Tree)
[Figure: star tree in which Human, Chimpanzee, Gorilla, Orangutan and Gibbon all radiate from a single internal node]
Xuhua Xia Slide 8
Genetic Distance Matrix
Matrix of genetic distances (Dij):
          Human  Chimp  Gorilla  Orang  Gibbon
Human            0.015  0.045    0.143  0.198
Chimp                   0.030    0.126  0.179
Gorilla                          0.092  0.179
Orang                                   0.179
Gibbon
Xuhua Xia Slide 9
• Distance matrix:
          Human  Chimp  Gorilla  Orang  Gibbon
Human            0.015  0.045    0.143  0.198
Chimp                   0.030    0.126  0.179
Gorilla                          0.092  0.179
Orang                                   0.179
Gibbon
• Distances from the new cluster (hu-ch) to the remaining OTUs:
  D(hu-ch),go = (Dhu,go + Dch,go)/2 = 0.038
  D(hu-ch),or = (Dhu,or + Dch,or)/2 = 0.135
  D(hu-ch),gi = (Dhu,gi + Dch,gi)/2 = 0.189
• Reduced matrix:
          hu-ch  Gorilla  Orang  Gibbon
hu-ch            0.038    0.135  0.189
Gorilla                   0.092  0.179
Orang                            0.179
Gibbon
UPGMA
[Figure: the star tree is resolved step by step, first grouping Human with Chimp and then adding Gorilla]
((hu,ch),(go,or,gi))
(((hu,ch),go),(or,gi))
Xuhua Xia Slide 10
• Distance matrix:
          Human  Chimp  Gorilla  Orang  Gibbon
Human            0.015  0.045    0.143  0.198
Chimp                   0.030    0.126  0.179
Gorilla                          0.092  0.179
Orang                                   0.179
Gibbon
• Distances from the cluster (hu-ch-go) to the remaining OTUs:
  D(hu-ch-go),or = (Dhu,or + Dch,or + Dgo,or)/3 = 0.120
  D(hu-ch-go),gi = (Dhu,gi + Dch,gi + Dgo,gi)/3 = 0.185
• Reduced matrix:
          hu-ch-go  Orang  Gibbon
hu-ch-go            0.120  0.185
Orang                      0.179
Gibbon
• D(hu-ch-go-or),gi = (Dhu,gi + Dch,gi + Dgo,gi + Dor,gi)/4 = 0.184
UPGMA
[Figure: Orang and then Gibbon are added to the (hu-ch-go) cluster, giving the fully resolved tree]
((((hu,ch),go),or),gi)
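The clustering steps worked through above can be sketched as a small UPGMA routine in Python. This is only an illustration under stated assumptions: the function name `upgma`, the dictionary-of-pairs input format and the Newick formatting are choices made for this example, and ties between equally close clusters are broken arbitrarily. Because the code averages the original distances without rounding at each step, its node heights can differ from the slide values in the third or fourth decimal place.

```python
from itertools import combinations

def upgma(names, dist):
    """UPGMA on leaf-to-leaf distances.

    dist maps frozenset({a, b}) -> distance between leaves a and b.
    Returns a Newick string and the list of (cluster, node height) merges.
    """
    # Each cluster is a tuple of leaf names mapped to (newick text, node height).
    clusters = {(n,): (n, 0.0) for n in names}
    merges = []

    def avg(c1, c2):
        # Unweighted average of all leaf-to-leaf distances between two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        height = avg(c1, c2) / 2          # molecular clock: node sits halfway
        s1, h1 = clusters.pop(c1)
        s2, h2 = clusters.pop(c2)
        newick = f"({s1}:{height - h1:.4f},{s2}:{height - h2:.4f})"
        clusters[c1 + c2] = (newick, height)
        merges.append((c1 + c2, height))

    newick, _ = next(iter(clusters.values()))
    return newick + ";", merges

names = ["Human", "Chimp", "Gorilla", "Orang", "Gibbon"]
upper = [                      # upper triangle of the slide's distance matrix
    [0.015, 0.045, 0.143, 0.198],
    [0.030, 0.126, 0.179],
    [0.092, 0.179],
    [0.179],
]
dist = {
    frozenset((names[i], names[i + 1 + k])): d
    for i, row in enumerate(upper)
    for k, d in enumerate(row)
}

newick, merges = upgma(names, dist)
for cluster, height in merges:
    print(sorted(cluster), round(height, 4))
print(newick)
```

Keeping the original leaf-to-leaf distances and averaging over all cross-cluster pairs is equivalent to the size-weighted update used on the slides, and it avoids accumulating rounding error across steps.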
Xuhua Xia Slide 11
Phylogenetic Relationship from UPGMA
• Original matrix:
          Human  Chimp  Gorilla  Orang  Gibbon
Human            0.015  0.045    0.143  0.198
Chimp                   0.030    0.126  0.179
Gorilla                          0.092  0.179
Orang                                   0.179
Gibbon
• After joining Human and Chimp:
          hu-ch  Gorilla  Orang  Gibbon
hu-ch            0.038    0.135  0.189
Gorilla                   0.092  0.179
Orang                            0.179
Gibbon
• After joining Gorilla:
          hu-ch-go  Orang  Gibbon
hu-ch-go            0.120  0.185
Orang                      0.179
Gibbon
Xuhua Xia Slide 12
Branch Lengths
((hu,ch),(go,or,gi))
(((hu,ch),go),(or,gi))
((((hu,ch),go),or),gi)
Dhu-ch = 0.015
D(hu-ch),go = (Dhu,go + Dch,go)/2 = 0.038
D(hu-ch),or = (Dhu,or + Dch,or)/2 = 0.135
D(hu-ch),gi = (Dhu,gi + Dch,gi)/2 = 0.189
D(hu-ch-go),or = (Dhu,or + Dch,or + Dgo,or)/3 = 0.120
D(hu-ch-go),gi = (Dhu,gi + Dch,gi + Dgo,gi)/3 = 0.185
D(hu-ch-go-or),gi = (Dhu,gi + Dch,gi + Dgo,gi + Dor,gi)/4 = 0.184
((hu:0.0075,ch:0.0075),(go,or,gi))
(((hu:0.0075,ch:0.0075):0.0115,go:0.019),(or,gi))
((((hu:0.0075,ch:0.0075):0.0115,go:0.019):0.041,or:0.06):0.032,gi:0.092)
[Figure: UPGMA tree of Human, Chimp, Gorilla, Orang and Gibbon with node heights 0.0075, 0.019, 0.06 and 0.092]
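The branch-length arithmetic on this slide follows one rule: each node sits at half the average distance at which its two clusters were joined, and an internal branch is the difference between the heights of its two ends. The Python sketch below applies that rule to the rounded distances reported on the slide. It assumes the ladder-like tree obtained here, where every join adds one OTU to the growing cluster; the function names are illustrative.

```python
# Average distances at which clusters were joined, as reported on the slide.
join_distances = [
    ("hu-ch", 0.015),
    ("hu-ch-go", 0.038),
    ("hu-ch-go-or", 0.120),
    ("hu-ch-go-or-gi", 0.184),
]

def node_heights(joins):
    """Under a molecular clock, a node sits at half the joining distance."""
    return [(name, d / 2) for name, d in joins]

def internal_branches(heights):
    """Length of the branch above each cluster: parent height minus own height."""
    return [
        (child, parent_h - child_h)
        for (child, child_h), (_, parent_h) in zip(heights, heights[1:])
    ]

heights = node_heights(join_distances)
for name, h in heights:
    print(f"height of node {name}: {h:.4f}")
for name, length in internal_branches(heights):
    print(f"branch above {name}: {length:.4f}")
```

The terminal branch of each OTU added later (Gorilla, Orang, Gibbon) equals the height of the node at which it joins, since a leaf has height 0.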
Xuhua Xia Slide 13
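Final UPGMA Tree
[Figure: final UPGMA tree of Human, Chimp, Gorilla, Orang and Gibbon; node heights 0.092, 0.060, 0.019 and 0.0075 are drawn against a time axis labelled 19, 13, 8 and 6 MY]
((((hu:0.0075,ch:0.0075):0.0115,go:0.019):0.041,or:0.06):0.032,gi:0.092);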
Xuhua Xia Slide 14
Distance-based method
• Distance matrix
• Tree-building algorithms
  – UPGMA
  – Neighbor-joining
  – FastME
  – Fitch-Margoliash
• Criterion-based methods
  – Branch-length estimation
  – Tree-selection criterion
Xuhua Xia Slide 15
Branch Length Estimation
• For three OTUs, the branch lengths can be estimated directly
• For more than three OTUs, there are two commonly used methods for estimating branch lengths
  – The least-squares method
  – The Fitch-Margoliash method
• Don’t confuse the Fitch-Margoliash method of branch length estimation with the Fitch-Margoliash criterion of tree selection
• Illustration of the least-squares method of branch length estimation
Xuhua Xia Slide 16
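For three OTUs
     1      2      3
1           0.092  0.179
2                  0.179
3

     1      2      3
1           d12    d13
2                  d23
3

d12 = x1 + x2
d13 = x1 + x3
d23 = x2 + x3
[Figure: unrooted three-OTU tree in which branches x1, x2 and x3 connect OTUs 1, 2 and 3 to a single internal node]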
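With three OTUs there are three equations and three unknowns, so the system can be solved exactly: adding the first two equations and subtracting the third gives x1 = (d12 + d13 - d23)/2, and the other two branches follow by symmetry. A minimal Python sketch of that solution, using the distances from this slide (the function name `three_otu_branches` is illustrative):

```python
def three_otu_branches(d12, d13, d23):
    """Exact branch lengths for the unrooted three-OTU tree.

    Solves d12 = x1 + x2, d13 = x1 + x3, d23 = x2 + x3.
    """
    x1 = (d12 + d13 - d23) / 2
    x2 = (d12 + d23 - d13) / 2
    x3 = (d13 + d23 - d12) / 2
    return x1, x2, x3

x1, x2, x3 = three_otu_branches(d12=0.092, d13=0.179, d23=0.179)
print(f"x1 = {x1:.3f}, x2 = {x2:.3f}, x3 = {x3:.3f}")
```

Because the fit is exact, the estimated branch lengths reproduce all three observed distances with no residual, which is not generally true once a fourth OTU is added.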
Xuhua Xia Slide 17
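Least-squares method
[Figure: unrooted four-OTU tree ((1,2),(3,4)); terminal branches x1, x2, x3 and x4 lead to OTUs 1 to 4, and internal branch x5 separates (1,2) from (3,4)]
4
Sp1
Sp2 0.3
Sp3 0.4 0.5
Sp4 0.4 0.6 0.6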
4
Sp1
Sp2 d12
Sp3 d13 d23
Sp4 d14 d24 d34
Xuhua Xia Slide 18
Least-squares method
[Figure: unrooted four-OTU tree ((1,2),(3,4)) with terminal branches x1 to x4 and internal branch x5]
d’12 = x1 + x2
d’13 = x1 + x5+ x3
d’14 = x1 + x5 + x4
d’23 = x2 + x5 + x3
d’24 = x2 + x5 + x4
d’34 = x3 + x4
(d12 - d’12)^2 = [d12 - (x1 + x2)]^2
(d13 - d’13)^2 = [d13 - (x1 + x5 + x3)]^2
(d14 - d’14)^2 = [d14 - (x1 + x5 + x4)]^2
(d23 - d’23)^2 = [d23 - (x2 + x5 + x3)]^2
(d24 - d’24)^2 = [d24 - (x2 + x5 + x4)]^2
(d34 - d’34)^2 = [d34 - (x3 + x4)]^2
$SS = \sum_{i<j}^{n} (d_{ij} - d'_{ij})^2$
Least-squares method: find the xi values that minimize SS.
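To make the criterion concrete, the sketch below evaluates SS for the tree ((1,2),(3,4)). It uses the four-species distance matrix shown earlier in this lecture and, as a trial point, the branch lengths reported later in the lecture (x1 = 0.075, x2 = 0.225, x3 = 0.275, x4 = 0.325, x5 = 0.025). The `PATHS` table and the function name `sum_of_squares` are illustrative choices for this example.

```python
# Branches on the path between each pair of OTUs in the tree ((1,2),(3,4)).
PATHS = {
    (1, 2): ("x1", "x2"),
    (1, 3): ("x1", "x5", "x3"),
    (1, 4): ("x1", "x5", "x4"),
    (2, 3): ("x2", "x5", "x3"),
    (2, 4): ("x2", "x5", "x4"),
    (3, 4): ("x3", "x4"),
}

def sum_of_squares(d, x):
    """SS = sum over pairs of (observed distance - path length)^2."""
    return sum(
        (d[pair] - sum(x[branch] for branch in path)) ** 2
        for pair, path in PATHS.items()
    )

# Observed distances from the four-species matrix.
d = {(1, 2): 0.3, (1, 3): 0.4, (1, 4): 0.4,
     (2, 3): 0.5, (2, 4): 0.6, (3, 4): 0.6}

# Branch lengths reported in the lecture as the least-squares solution.
x_ls = {"x1": 0.075, "x2": 0.225, "x3": 0.275, "x4": 0.325, "x5": 0.025}
ss_ls = sum_of_squares(d, x_ls)

# Any other choice of branch lengths should fit no better; try one perturbation.
x_other = dict(x_ls, x5=0.05)
ss_other = sum_of_squares(d, x_other)
print(f"SS at reported solution: {ss_ls:.6f}; SS after changing x5: {ss_other:.6f}")
```

With six distances and only five branches, the fit is generally not exact, so SS stays above zero even at its minimum.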
Xuhua Xia Slide 19
Least-squares method
SS = [d12 - (x1 + x2)]^2 + [d13 - (x1 + x5 + x3)]^2 + [d14 - (x1 + x5 + x4)]^2
   + [d23 - (x2 + x5 + x3)]^2 + [d24 - (x2 + x5 + x4)]^2 + [d34 - (x3 + x4)]^2
Taking the partial derivative of SS with respect to each xi, we have
∂SS/∂x1 = -2 d12 + 6 x1 + 2 x2 - 2 d13 + 4 x5 + 2 x3 - 2 d14 + 2 x4
∂SS/∂x2 = -2 d12 + 2 x1 + 6 x2 - 2 d23 + 4 x5 + 2 x3 - 2 d24 + 2 x4
∂SS/∂x3 = -2 d13 + 2 x1 + 4 x5 + 6 x3 - 2 d23 + 2 x2 - 2 d34 + 2 x4
∂SS/∂x4 = -2 d14 + 2 x1 + 4 x5 + 6 x4 - 2 d24 + 2 x2 - 2 d34 + 2 x3
∂SS/∂x5 = -2 d13 + 4 x1 + 8 x5 + 4 x3 - 2 d14 + 4 x4 - 2 d23 + 4 x2 - 2 d24
Setting these partial derivatives to 0 and solving for xi, we have
x1 = d13/4 + d12/2 - d23/4 + d14/4 - d24/4
x2 = d12/2 - d13/4 + d23/4 - d14/4 + d24/4
x3 = d13/4 + d23/4 + d34/2 - d14/4 - d24/4
x4 = d14/4 - d13/4 - d23/4 + d34/2 + d24/4
x5 = - d12/2 + d23/4 - d34/2 + d14/4 + d24/4 + d13/4
Xuhua Xia Slide 20
Least-squares method
x1 = d13/4 + d12/2 - d23/4 + d14/4 - d24/4
x2 = d12/2 - d13/4 + d23/4 - d14/4 + d24/4
x3 = d13/4 + d23/4 + d34/2 - d14/4 - d24/4
x4 = d14/4 - d13/4 - d23/4 + d34/2 + d24/4
x5 = - d12/2 + d23/4 - d34/2 + d14/4 + d24/4 + d13/4
4
Sp1
Sp2 0.3
Sp3 0.4 0.5
Sp4 0.4 0.6 0.6
x1 = 0.075
x2 = 0.225
x3 = 0.275
x4 = 0.325
x5 = 0.025
[Figure: unrooted four-OTU tree ((1,2),(3,4)) with terminal branches x1 to x4 and internal branch x5]
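The same branch lengths can be obtained numerically rather than from the closed-form expressions. The sketch below writes the six path equations as a design matrix (one row per pair of OTUs, one column per branch) and solves the normal equations, which is equivalent to setting the partial derivatives of SS to zero. It is a plain-Python illustration with a small hand-written Gauss-Jordan solver; the names `solve` and `least_squares_branches` are made up for this example, and a linear-algebra library would normally be used instead.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares_branches(design, d):
    """Minimise SS = |d - design * x|^2 via the normal equations."""
    cols = range(len(design[0]))
    XtX = [[sum(row[i] * row[j] for row in design) for j in cols] for i in cols]
    Xtd = [sum(row[i] * di for row, di in zip(design, d)) for i in cols]
    return solve(XtX, Xtd)

# Columns: x1, x2, x3, x4, x5 for the tree ((1,2),(3,4)).
design = [
    [1, 1, 0, 0, 0],   # d12 = x1 + x2
    [1, 0, 1, 0, 1],   # d13 = x1 + x5 + x3
    [1, 0, 0, 1, 1],   # d14 = x1 + x5 + x4
    [0, 1, 1, 0, 1],   # d23 = x2 + x5 + x3
    [0, 1, 0, 1, 1],   # d24 = x2 + x5 + x4
    [0, 0, 1, 1, 0],   # d34 = x3 + x4
]
d = [0.3, 0.4, 0.4, 0.5, 0.6, 0.6]   # d12, d13, d14, d23, d24, d34

x = least_squares_branches(design, d)
for i, xi in enumerate(x, start=1):
    print(f"x{i} = {xi:.3f}")
```

The advantage of the matrix form is that only the `design` rows change when the topology or the number of OTUs changes, whereas the closed-form expressions are specific to this four-OTU tree.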
Xuhua Xia Slide 21
Minimum Evolution Criterion
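[Figure: the three possible unrooted topologies for four OTUs, ((1,2),(3,4)), ((1,3),(2,4)) and ((1,4),(2,3)), each with terminal branches x1 to x4 and internal branch x5]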
The minimum evolution (ME) criterion: The tree with the shortest TreeLen is the best tree.
$TreeLen = \sum_{i=1}^{2n-3} x_i$, where n is the number of OTUs.
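As an illustration of the ME criterion, the sketch below computes TreeLen for each of the three four-OTU topologies, using the least-squares branch-length expressions from the earlier slide with the OTU labels permuted to match each topology. The distances are those of the four-species example used earlier; the function names are illustrative.

```python
def ls_branch_lengths(dist, topology):
    """Least-squares branch lengths for the unrooted tree ((a,b),(c,e)).

    Uses the closed-form expressions given in the lecture for ((1,2),(3,4)),
    with leaves relabelled. Returns [x_a, x_b, x_c, x_e, x_internal].
    """
    (a, b), (c, e) = topology
    d = lambda i, j: dist[frozenset((i, j))]
    across = d(a, c) + d(a, e) + d(b, c) + d(b, e)   # pairs crossing the internal branch
    xa = d(a, b) / 2 + (d(a, c) + d(a, e) - d(b, c) - d(b, e)) / 4
    xb = d(a, b) / 2 - (d(a, c) + d(a, e) - d(b, c) - d(b, e)) / 4
    xc = d(c, e) / 2 + (d(a, c) + d(b, c) - d(a, e) - d(b, e)) / 4
    xe = d(c, e) / 2 - (d(a, c) + d(b, c) - d(a, e) - d(b, e)) / 4
    x_internal = across / 4 - (d(a, b) + d(c, e)) / 2
    return [xa, xb, xc, xe, x_internal]

def tree_length(dist, topology):
    """TreeLen = sum of all 2n - 3 branch lengths (5 branches for n = 4)."""
    return sum(ls_branch_lengths(dist, topology))

dist = {
    frozenset((1, 2)): 0.3, frozenset((1, 3)): 0.4, frozenset((1, 4)): 0.4,
    frozenset((2, 3)): 0.5, frozenset((2, 4)): 0.6, frozenset((3, 4)): 0.6,
}
topologies = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]

lengths = {t: tree_length(dist, t) for t in topologies}
for t, length in lengths.items():
    print(t, f"TreeLen = {length:.3f}")
```

For this particular matrix, ((1,2),(3,4)) and ((1,4),(2,3)) have the same tree length, while ((1,3),(2,4)) is longer and has a negative internal branch estimate, so the ME criterion rules out the third topology but does not by itself separate the other two.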