1
Dan Graur
Molecular Phylogenetics
2
3
4
5
6
Molecular phylogenetic approaches:
1. distance-matrix (based on distance measures)
2. character-state (based on character states)
3. maximum likelihood (based on both character states and distances)
7
DISTANCE-MATRIX METHODS
In the distance matrix methods, evolutionary distances (usually the number of nucleotide substitutions or amino-acid replacements between two taxonomic units) are computed for all pairs of taxa, and a phylogenetic tree is constructed by using an algorithm based on some functional relationships among the distance values.
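As a rough illustration of the first step, the Python sketch below computes a distance for every pair of aligned sequences. It makes two assumptions that the text leaves open: sites with a gap in either sequence are skipped, and the observed proportion of differing sites (p) is converted to an estimated number of substitutions per site with the Jukes & Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names are hypothetical.

```python
from itertools import combinations
from math import log


def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites, skipping sites with a gap ('-') in either sequence."""
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc_distance(x: str, y: str) -> float:
    """Jukes & Cantor estimate of substitutions per site (undefined when p >= 0.75)."""
    p = p_distance(x, y)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes & Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def distance_matrix(seqs: dict[str, str], dist=jc_distance) -> dict[tuple[str, str], float]:
    """Distance for every unordered pair of taxa, keyed by (name1, name2)."""
    return {(a, b): dist(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}
```

Multiplying a distance by 1,000 expresses it in substitutions per 1,000 sites.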
8
Multiple Alignment
Spinach   GCGGCTCA TCAGGTAGTT GGTG-G
Rice      GCGGCCCA TCAGGTAGTT GGTG-G
Mosquito  GCGTTCCA TC--CTGGTT GGTGTG
Monkey    GCGTCCCA TCAGCTAGTT GTTG-G
Human     GCGGCGCA TTAGCTAGTT GGTG-A
          ***   ** *    * *** * **
9
Distance Matrix*
            Spinach   Rice   Mosquito   Monkey   Human
Spinach     0.0       9      106        91       86
Rice                  0.0    118        122      122
Mosquito                     0.0        55       51
Monkey                                  0.0      3
Human                                            0.0
*Units: Numbers of nucleotide substitutions per 1,000 nucleotide sites
10
Distance Methods:
UPGMA
Neighbor-relations
Neighbor joining
11
UPGMA: Unweighted pair-group method with arithmetic means
12
UPGMA employs a sequential clustering algorithm, in which local topological relationships are identified in order of decreasing similarity, and the tree is built in a stepwise manner.
13
simple OTUs
14
composite OTU
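The following is a minimal Python sketch of UPGMA, not a reference implementation. At each step the two closest clusters (simple or composite OTUs) are merged into a composite OTU placed at half their distance, and the distance from the new composite OTU to each remaining cluster is the arithmetic mean of the distances between their simple OTUs, computed as a size-weighted average of the two merged clusters' distances. Ties are broken by list order (an arbitrary choice), and `upgma` and `EXAMPLE_DIST` are hypothetical names; the example values are those of the distance matrix for spinach, rice, mosquito, monkey, and human.

```python
def upgma(names, dist):
    """
    names: list of simple OTU labels.
    dist:  {(a, b): distance} for every unordered pair of simple OTUs.
    Returns (tree, heights): tree is a nested tuple; heights maps each composite OTU
    to the height of its node (half the distance between the clusters it joins).
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {name: 1 for name in names}  # number of simple OTUs in each cluster
    clusters = list(names)
    heights = {}
    while len(clusters) > 1:
        # Closest pair of current clusters (simple or composite OTUs).
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: d[frozenset(pair)],
        )
        new = (a, b)  # the new composite OTU
        heights[new] = d[frozenset((a, b))] / 2
        size[new] = size[a] + size[b]
        clusters = [c for c in clusters if c != a and c != b]
        for c in clusters:
            # Mean over all simple-OTU pairs = size-weighted mean of the two old distances.
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[new]
        clusters.append(new)
    return clusters[0], heights


EXAMPLE_DIST = {
    ("Spinach", "Rice"): 9, ("Spinach", "Mosquito"): 106, ("Spinach", "Monkey"): 91,
    ("Spinach", "Human"): 86, ("Rice", "Mosquito"): 118, ("Rice", "Monkey"): 122,
    ("Rice", "Human"): 122, ("Mosquito", "Monkey"): 55, ("Mosquito", "Human"): 51,
    ("Monkey", "Human"): 3,
}
tree, heights = upgma(["Spinach", "Rice", "Mosquito", "Monkey", "Human"], EXAMPLE_DIST)
```

With these distances, Monkey and Human are joined first (height 1.5), then Spinach and Rice (height 4.5), then Mosquito joins the Monkey-Human cluster (height 26.5).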
15
16
17
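UPGMA is guaranteed to recover the correct tree only if the distances are strictly ultrametric (for every three OTUs, the two largest pairwise distances are equal).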
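Ultrametricity can be checked directly. A minimal Python sketch, assuming distances are stored in a dictionary keyed by pairs of OTU names and using a small tolerance for floating-point values; `is_ultrametric` is a hypothetical name.

```python
from itertools import combinations


def is_ultrametric(names, dist, tol=1e-9):
    """True if, for every three OTUs, the two largest pairwise distances are equal (within tol)."""
    def d(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    for a, b, c in combinations(names, 3):
        _, middle, largest = sorted((d(a, b), d(a, c), d(b, c)))
        if abs(largest - middle) > tol:
            return False
    return True
```

The spinach-rice-mosquito distances in the matrix (9, 106, and 118) already violate the condition, so that matrix is not ultrametric.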
18
Neighborliness methods
The neighbors-relation method (Sattath & Tversky)
The neighbor-joining method (Saitou & Nei)
19
In an unrooted bifurcating tree, two OTUs are said to be neighbors if they are connected through a single internal node.
20
If we combine OTUs A and B into one composite OTU, then the composite OTU (AB) and the simple OTU C become neighbors.
21
Four-Point Condition (unrooted tree in which A and B are neighbors, and C and D are neighbors):
$$d(A,B) + d(C,D) < d(A,C) + d(B,D) = d(A,D) + d(B,C)$$
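For four OTUs the condition can be used to pick the two neighbor pairs: the correct pairing gives the smallest sum. A sketch, assuming distances stored in a dictionary keyed by pairs of OTU names. Because distances estimated from real data are rarely exactly additive, the two larger sums will usually not be exactly equal, so the hypothetical function `four_point_neighbors` simply returns the pairing with the smallest sum.

```python
def four_point_neighbors(a, b, c, d, dist):
    """Return the pairing of four OTUs with the smallest sum of within-pair distances,
    i.e. the neighbor pairs favored by the four-point condition, plus all three sums."""
    def D(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    sums = {
        ((a, b), (c, d)): D(a, b) + D(c, d),
        ((a, c), (b, d)): D(a, c) + D(b, d),
        ((a, d), (b, c)): D(a, d) + D(b, c),
    }
    best = min(sums, key=sums.get)
    return best, sums
```

For spinach, rice, monkey, and human (distances from the matrix) the three sums are 12, 213, and 208, so spinach-rice and monkey-human are the neighbor pairs.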
22
23
24
In distance-matrix methods, it is assumed:
Similarity = Kinship
25
From Similarity to Relationship
• Similarity = Relationship, only if genetic distances increase with divergence times (monotonic distances).
26
Similarities among OTUs can be due to:
• Ancestry:
  – Shared ancestral characters (plesiomorphies)
  – Shared derived characters (synapomorphies)
• Homoplasy:
  – Convergent events
  – Parallel events
  – Reversals
From Similarity to Relationship
27
28
Parsimony Methods:
Willi Hennig (1913-1976)
29
Occam’s razor
“Pluralitas non est ponenda sine necessitate.” (Plurality should not be posited without necessity.)
William of Occam or Ockham (ca. 1285-1349), English philosopher & Franciscan monk
Excommunicated by Pope John XXII in 1328. Officially rehabilitated by Pope Innocent VI in 1359.
30
MAXIMUM PARSIMONY METHODS
Maximum parsimony involves the identification of a topology that requires the smallest number of evolutionary changes to explain the observed differences among the OTUs under study.
In maximum parsimony methods, we use discrete character states, and the shortest pathway leading to these character states is chosen as the best or maximum parsimony tree.
Often two or more trees with the same minimum number of changes are found, so that no unique tree can be inferred. Such trees are said to be equally parsimonious.
31
            Site
____________________________________________
Sequences   1  2  3  4  5  6  7  8  9
____________________________________________
1           A  A  G  A  G  T  T  C  A
2           A  G  C  C  G  T  T  C  T
3           A  G  A  T  A  T  C  C  A
4           A  G  A  G  A  T  C  C  T
                        *     *     *
Invariant sites: 1, 6, and 8
32
Variant sites: 2, 3, 4, 5, 7, and 9
33
Uninformative variant sites: 2, 3, and 4
34
Informative sites: 5, 7, and 9
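The three categories can be assigned mechanically. A minimal Python sketch, assuming the usual rule for nucleotide data: a variant site is informative only if at least two character states each occur in at least two sequences. The function names are hypothetical.

```python
from collections import Counter


def classify_site(column) -> str:
    """'invariant', 'informative', or 'uninformative' (variant but not informative)."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"
    # Informative: at least two states, each shared by at least two sequences.
    if sum(n >= 2 for n in counts.values()) >= 2:
        return "informative"
    return "uninformative"


def classify_alignment(seqs):
    """Classify every column of equal-length aligned sequences."""
    return [classify_site(column) for column in zip(*seqs)]
```

Applied to the four example sequences, it labels sites 1, 6, and 8 invariant, sites 2, 3, and 4 uninformative, and sites 5, 7, and 9 informative.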
35
36
37
38
39
Inferring the maximum parsimony tree:
1. Identify all the informative sites.
2. For each possible tree, calculate the minimum number of substitutions at each informative site.
3. Sum up the number of changes over all the informative sites for each possible tree.
4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree.
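For four OTUs the procedure is small enough to write out in full. The Python sketch below follows the four steps, scoring each informative site on each of the three possible unrooted trees with Fitch's union-counting rule. Sequences are indexed from 0, so index 0 is sequence 1 in the example; the function names are hypothetical.

```python
from collections import Counter

# The three unrooted trees for four OTUs, written as pairs of neighbors (0-based indices).
TREES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def is_informative(column):
    """At least two character states, each present in at least two sequences."""
    return sum(n >= 2 for n in Counter(column).values()) >= 2


def fitch_join(x, y):
    """Intersection if non-empty (no change), otherwise union (one change)."""
    common = x & y
    return (common, 0) if common else (x | y, 1)


def changes_on_quartet(column, tree):
    """Minimum number of substitutions at one site on the unrooted tree ((a,b),(c,d))."""
    (a, b), (c, d) = tree
    left, n1 = fitch_join({column[a]}, {column[b]})
    right, n2 = fitch_join({column[c]}, {column[d]})
    _, n3 = fitch_join(left, right)
    return n1 + n2 + n3


def max_parsimony_quartet(seqs):
    informative = [col for col in zip(*seqs) if is_informative(col)]  # step 1
    lengths = {t: sum(changes_on_quartet(col, t) for col in informative)  # steps 2-3
               for t in TREES}
    shortest = min(lengths.values())
    return [t for t, n in lengths.items() if n == shortest], lengths  # step 4
```

For the example alignment the sketch returns ((0, 1), (2, 3)), grouping sequences 1 and 2 versus 3 and 4, with 4 changes at the informative sites against 5 and 6 for the other two trees.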
40
In the case of four OTUs, an informative site can only favor one of the three possible alternative trees.
Thus, the tree supported by the largest number of informative sites is the most parsimonious tree.
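Counting support directly gives the same answer for four OTUs. A sketch using 0-based sequence indices (index 0 is sequence 1 in the example) and hypothetical names: an informative site with the pattern xxyy favors the tree that groups the two identical pairs.

```python
from collections import Counter

TREES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def favored_tree(column):
    """For four OTUs, the tree grouping the two identical pairs of an xxyy site, else None."""
    for (a, b), (c, d) in TREES:
        if column[a] == column[b] and column[c] == column[d] and column[a] != column[c]:
            return ((a, b), (c, d))
    return None


def support_counts(seqs):
    """Number of informative sites favoring each of the three possible trees."""
    votes = Counter(favored_tree(col) for col in zip(*seqs))
    votes.pop(None, None)  # invariant and uninformative sites favor no tree
    return dict(votes)
```

For the example alignment, sites 5 and 7 favor ((0, 1), (2, 3)) and site 9 favors ((0, 2), (1, 3)), so the first tree is the most parsimonious.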
41
With more than 4 OTUs, an informative site may favor more than one tree, and the maximum parsimony tree may not necessarily be the one supported by the largest number of informative sites.
42
The informative sites that support the internal branches in the inferred tree are deemed to be synapomorphies.
All other informative sites are deemed to be homoplasies.
43
44
Parsimony is based solely on synapomorphies
45
46
Variants of Parsimony
Wagner-Fitch: Unordered. Character state changes are symmetric and can occur as often as necessary.
Camin-Sokal: Complete irreversibility.
Dollo: Partial irreversibility. Once a derived character is lost, it cannot be regained.
Weighted: Some changes are more likely than others.
Transversion: A type of weighted parsimony, in which transitions are ignored.
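The text does not say how weighted parsimony is computed. One standard option, swapped in here, is Sankoff's dynamic-programming algorithm, which accepts an arbitrary cost for each change of state. With unit costs it gives the same count as unordered parsimony; with transitions costed at zero it implements transversion parsimony. A sketch for a single site on a rooted bifurcating tree written as nested pairs; the cost functions and names are hypothetical.

```python
import math

STATES = "ACGT"
PURINES = {"A", "G"}


def unit_cost(x, y):
    """Every change costs 1 (unordered parsimony)."""
    return 0 if x == y else 1


def transversion_cost(x, y):
    """Transitions (A<->G, C<->T) cost nothing; only purine<->pyrimidine changes count."""
    return 0 if (x in PURINES) == (y in PURINES) else 1


def sankoff(tree, site, cost):
    """Minimum total cost of one site; tree: nested pairs of leaf keys, site: leaf -> state."""
    def costs(node):
        if not isinstance(node, tuple):
            # A leaf can only be in its observed state.
            return {s: 0 if s == site[node] else math.inf for s in STATES}
        left, right = costs(node[0]), costs(node[1])
        return {
            s: min(cost(s, t) + left[t] for t in STATES)
            + min(cost(s, t) + right[t] for t in STATES)
            for s in STATES
        }

    return min(costs(tree).values())
```

For a site with A, G, C, and T on the tree ((1, 2), (3, 4)), unit costs give 3 changes, whereas transversion parsimony gives 1 (a single purine-pyrimidine change).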
47
Fitch’s (1971) method for inferring nucleotides at internal nodes
48
Fitch’s (1971) method for inferring nucleotides at internal nodes
The set at an internal node is the intersection (∩) of the two sets at its immediate descendant nodes if the intersection is not empty.
The set at an internal node is the union (∪) of the two sets at its immediate descendant nodes if the intersection is empty.
When a union is required to form a nodal set, a nucleotide substitution at this position must be assumed to have occurred.
Number of unions = minimum number of substitutions
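Fitch's rule translates directly into a short recursion. A minimal Python sketch, assuming a rooted bifurcating tree written as nested pairs whose leaves are keys into a mapping from OTU to nucleotide; summing the union counts over all sites gives the total number of substitutions for the tree. Names are hypothetical, and only the counting pass of Fitch's method is shown, not the final assignment of nucleotides to internal nodes.

```python
def fitch_site(tree, site):
    """Fitch (1971) counting pass for one site on a rooted bifurcating tree.
    tree: nested 2-tuples whose leaves are keys of `site`; site: leaf key -> nucleotide.
    Returns (nodal set at the root, number of unions = minimum substitutions)."""
    if not isinstance(tree, tuple):
        return {site[tree]}, 0
    left_set, left_n = fitch_site(tree[0], site)
    right_set, right_n = fitch_site(tree[1], site)
    common = left_set & right_set
    if common:  # non-empty intersection: no substitution needed
        return common, left_n + right_n
    return left_set | right_set, left_n + right_n + 1  # union: one substitution


def tree_length(tree, seqs):
    """Sum of the minimum numbers of substitutions over all sites (leaves are sequence indices)."""
    return sum(fitch_site(tree, dict(enumerate(column)))[1] for column in zip(*seqs))
```

For the four example sequences and the tree ((0, 1), (2, 3)), the total is 10: 6 changes at the uninformative sites and 4 at the informative ones.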
49
Fitch’s (1971) method for inferring nucleotides at internal nodes
4 substitutions 3 substitutions
50
51
total number of substitutions in a tree = tree length
52
Searching for the maximum-parsimony tree
Number of OTUs    Number of possible rooted trees
2                 1
3                 3
4                 15
5                 105
6                 945
7                 10,395
8                 135,135
9                 2,027,025
10                34,459,425
15                213,458,046,676,875
20                8,200,794,532,637,891,559,375
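The entries in the table follow the closed form (2n - 3)! / [2^(n-2) (n - 2)!] for n OTUs. A short Python check (the function name is hypothetical); Python's arbitrary-precision integers reproduce even the 20-OTU value exactly.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """Number of possible rooted bifurcating trees for n OTUs: (2n - 3)! / (2^(n-2) (n - 2)!)."""
    if n < 2:
        raise ValueError("need at least two OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))
```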
53
Exhaustive = Examine all trees, get the best tree (guaranteed).
Branch-and-Bound = Examine some trees, get the best tree (guaranteed).
Heuristic = Examine some trees, get a tree that may or may not be the best tree.
54
Exhaustive
Descendant trees of tree 2
Ascendant tree 2
55
Branch-and-Bound
56
Branch-and-Bound
Obtain a tree by a fast method (e.g., the neighbor-joining method).
Compute the minimum number of substitutions (L) for this tree.
Use L as an upper bound.
Rationale: (1) The maximum parsimony tree must be either equal in length to L or shorter. (2) A descendant tree is either equal in length to or longer than its ascendant tree. Therefore, any tree whose length already exceeds L can be discarded together with all of its descendant trees.
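A compact Python sketch of branch-and-bound under Fitch parsimony, with several simplifying assumptions. Instead of starting from a neighbor-joining tree, the upper bound starts at infinity and is set by the first complete tree reached in a depth-first search. Taxa are added in input order, each to every branch of the current tree, so every new tree is a descendant tree of the one it came from. Each unrooted tree is written as (0, subtree), i.e., rooted on the branch leading to sequence 0; this is harmless because a Fitch length does not depend on where an unrooted tree is rooted. All names are hypothetical.

```python
import math


def fitch_length(tree, seqs):
    """Fitch tree length of a rooted bifurcating tree whose leaves are sequence indices."""
    def visit(node, i):
        if not isinstance(node, tuple):
            return {seqs[node][i]}, 0
        (s1, n1), (s2, n2) = visit(node[0], i), visit(node[1], i)
        common = s1 & s2
        return (common, n1 + n2) if common else (s1 | s2, n1 + n2 + 1)

    return sum(visit(tree, i)[1] for i in range(len(seqs[0])))


def insertions(tree, leaf):
    """Every tree obtained by attaching `leaf` to one branch of `tree` (its descendant trees)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def branch_and_bound(seqs):
    """Return (length, trees) for all maximum-parsimony trees of three or more sequences."""
    n = len(seqs)
    if n < 3:
        raise ValueError("need at least three sequences")
    best = {"length": math.inf, "trees": []}  # current upper bound and trees that reach it

    def search(subtree, next_leaf):
        length = fitch_length((0, subtree), seqs)
        if length > best["length"]:
            return  # prune: descendant trees can only be equal in length or longer
        if next_leaf == n:  # all taxa placed
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append((0, subtree))
            return
        for t in insertions(subtree, next_leaf):
            search(t, next_leaf + 1)

    search((1, 2), 3)
    return best["length"], best["trees"]
```

On the four example sequences used to illustrate informative sites, it returns length 10 and the single tree (0, (1, (2, 3))), which groups sequences 1 and 2 versus 3 and 4.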
57
Branch-and-Bound
58
Heuristic
59
60
61
Likelihood
• Example: Coin tossing
• Data: Outcome of 10 tosses: 6 heads + 4 tails
• Hypothesis: Binomial distribution
L = P(data|tree)
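The coin example can be made concrete. Assuming independent tosses with probability p of heads, L(p) = C(10, 6) p^6 (1 - p)^4. The sketch evaluates it on a grid of p values (a crude search chosen for simplicity; `likelihood` is a hypothetical name) and picks the maximum, which lands at the analytic estimate 6/10.

```python
from math import comb


def likelihood(p: float, heads: int = 6, tails: int = 4) -> float:
    """Binomial likelihood L = P(data | p) of observing `heads` heads and `tails` tails."""
    return comb(heads + tails, heads) * p**heads * (1 - p) ** tails


# Crude grid search over p = 0.00, 0.01, ..., 1.00 for the maximum-likelihood estimate.
grid = [i / 100 for i in range(101)]
p_hat = max(grid, key=likelihood)
```

At p = 0.6 the likelihood is about 0.251.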
62
LIKELIHOOD IN MOLECULAR PHYLOGENETICS
• The data are the aligned sequences.
• The model is the probability of change from one character state to another (e.g., the Jukes & Cantor one-parameter model).
• The parameters to be estimated are: Topology & Branch Lengths
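Under the Jukes & Cantor model all substitutions are equally likely, so the probability of change along a branch depends only on its length. A minimal Python sketch, assuming the branch length v is measured in expected substitutions per site (the function name is hypothetical): P_ii(v) = 1/4 + (3/4)e^(-4v/3), and P_ij(v) = 1/4 - (1/4)e^(-4v/3) for i ≠ j.

```python
from math import exp


def jc69_prob(i: str, j: str, v: float) -> float:
    """P_ij(v) under Jukes & Cantor; v = branch length in expected substitutions per site."""
    e = exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e
```

As v grows, every P_ij(v) approaches 1/4, the equal base frequencies assumed by the model.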
63
64
Background: Maximum Likelihood
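How to calculate the ML score for a tree:
Site:   1 ... j ... N
Seq x:  C ... GGACGTTTA ... C
Seq y:  C ... AGATCTCTA ... C
$$L = P(\text{data} \mid \text{tree})$$
$$\ln L = \ln L^{(1)} + \dots + \ln L^{(j)} + \dots + \ln L^{(N)} = \sum_{j=1}^{N} \ln L^{(j)}$$
The tree with the highest likelihood is chosen:
$$\max\left[P(\text{data} \mid \text{tree})\right]$$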
65
Background: Maximum Likelihood
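Calculate likelihood for a single site j given tree:
Tree: R is the root; A is an internal node whose descendants are B and C, connected to A by branches of lengths v_AB and v_AC.
$$L^{(j)} = \sum_{m \in S} \pi_m \, CL_R(m)$$
$$CL_A(i) = \left[\sum_{k \in S} P_{ik}(v_{AB})\, CL_B(k)\right] \left[\sum_{l \in S} P_{il}(v_{AC})\, CL_C(l)\right]$$
where $S = \{A, C, G, T\}$, $CL_X(i)$ is the conditional likelihood of the data below node X given state i at X, $P_{ik}(v)$ is the probability of change from state i to state k along a branch of length v, and $\pi_m$ is the frequency of nucleotide m.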
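The recursion above is Felsenstein's pruning algorithm and can be implemented in a few lines. A minimal Python sketch, assuming Jukes & Cantor transition probabilities with equal base frequencies (π_m = 1/4) and a tree written as nested tuples (left, v_left, right, v_right) with leaf names as strings; all names are hypothetical. Summing ln L(j) over sites gives ln L for the whole alignment.

```python
from math import exp, log

STATES = "ACGT"
PI = {s: 0.25 for s in STATES}  # equal base frequencies, as under Jukes & Cantor (assumption)


def p_jc(i, k, v):
    """Jukes & Cantor probability of state i becoming k along a branch of length v."""
    e = exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == k else 0.25 - 0.25 * e


def conditional_likelihoods(node, site):
    """CL_node(i) for each state i.
    node: a leaf name (str) or (left, v_left, right, v_right) for an internal node;
    site: leaf name -> observed nucleotide at this site."""
    if isinstance(node, str):
        return {i: 1.0 if i == site[node] else 0.0 for i in STATES}
    left, v_left, right, v_right = node
    cl_left = conditional_likelihoods(left, site)
    cl_right = conditional_likelihoods(right, site)
    return {
        i: sum(p_jc(i, k, v_left) * cl_left[k] for k in STATES)
        * sum(p_jc(i, l, v_right) * cl_right[l] for l in STATES)
        for i in STATES
    }


def site_likelihood(root, site):
    """L(j) = sum over m of pi_m * CL_R(m)."""
    cl_root = conditional_likelihoods(root, site)
    return sum(PI[m] * cl_root[m] for m in STATES)


def log_likelihood(root, alignment):
    """ln L = sum of ln L(j) over sites; alignment: leaf name -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        log(site_likelihood(root, {name: seq[j] for name, seq in alignment.items()}))
        for j in range(length)
    )
```

For two sequences joined at the root by branches of lengths 0.1 and 0.2, a site with A in both gives L(j) = (1/4)[1/4 + (3/4)e^(-0.4)], i.e., (1/4)P_AA(0.3), as expected when branch lengths add.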