Dan Graur

Molecular Phylogenetics

Upload: arnold-charles

Post on 12-Jan-2016

Molecular phylogenetic approaches:

1. distance-matrix (based on distance measures)

2. character-state (based on character states)

3. maximum likelihood (based on both character states and distances)

DISTANCE-MATRIX METHODS

In the distance matrix methods, evolutionary distances (usually the number of nucleotide substitutions or amino-acid replacements between two taxonomic units) are computed for all pairs of taxa, and a phylogenetic tree is constructed by using an algorithm based on some functional relationships among the distance values.

Multiple Alignment

GCGGCTCA TCAGGTAGTT GGTG-G  Spinach
GCGGCCCA TCAGGTAGTT GGTG-G  Rice
GCGTTCCA TC--CTGGTT GGTGTG  Mosquito
GCGTCCCA TCAGCTAGTT GTTG-G  Monkey
GCGGCGCA TTAGCTAGTT GGTG-A  Human
***   ** *    * *** * **
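As a rough illustration of how pairwise distances can be obtained from an alignment, the sketch below computes the uncorrected proportion of differing sites (the p-distance) for the five sequences shown above. The slides do not say which distance measure was used, so the p-distance, the gap handling, and the names `alignment` and `p_distance` are assumptions made for this example. The resulting numbers are not expected to match the distance matrix in the slides, which presumably comes from longer sequences.

```python
from itertools import combinations

# The five aligned sequences from the slide, with the block spacing removed.
alignment = {
    "Spinach":  "GCGGCTCATCAGGTAGTTGGTG-G",
    "Rice":     "GCGGCCCATCAGGTAGTTGGTG-G",
    "Mosquito": "GCGTTCCATC--CTGGTTGGTGTG",
    "Monkey":   "GCGTCCCATCAGCTAGTTGTTG-G",
    "Human":    "GCGGCGCATTAGCTAGTTGGTG-A",
}

def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping sites with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)

# Distances for all pairs of taxa (upper triangle only).
distances = {
    (x, y): p_distance(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}

for (x, y), d in distances.items():
    print(f"{x:8s} {y:8s} {d:.3f}")
```

In practice a correction for multiple substitutions at the same site would usually be applied before tree building.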

Distance Matrix**

           Spinach   Rice   Mosquito   Monkey   Human
Spinach      0.0       9      106        91       86
Rice                 0.0      118       122      122
Mosquito                      0.0        55       51
Monkey                                  0.0        3
Human                                            0.0

**Units: Numbers of nucleotide substitutions per 1,000 nucleotide sites

Distance Methods:

UPGMA

Neighbors-relation

Neighbor joining

UPGMA: Unweighted pair-group method with arithmetic means

UPGMA employs a sequential clustering algorithm, in which local topological relationships are identified in order of decreasing similarity, and the tree is built in a stepwise manner.
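The sequential clustering can be sketched as follows, using the five-taxon distance matrix from the slides. This is a minimal illustration rather than the author's implementation: the representation of trees as nested pairs and the names `dist`, `average_distance`, and `upgma` are assumptions made for the example.

```python
from itertools import combinations

names = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
# Upper triangle of the distance matrix (substitutions per 1,000 sites).
upper = {
    ("Spinach", "Rice"): 9, ("Spinach", "Mosquito"): 106,
    ("Spinach", "Monkey"): 91, ("Spinach", "Human"): 86,
    ("Rice", "Mosquito"): 118, ("Rice", "Monkey"): 122, ("Rice", "Human"): 122,
    ("Mosquito", "Monkey"): 55, ("Mosquito", "Human"): 51,
    ("Monkey", "Human"): 3,
}
dist = {frozenset(pair): d for pair, d in upper.items()}

def average_distance(members_a, members_b):
    """Arithmetic mean of all pairwise distances between two clusters."""
    total = sum(dist[frozenset((x, y))] for x in members_a for y in members_b)
    return total / (len(members_a) * len(members_b))

def upgma(otus):
    # Each cluster is (tree, members); a tree is an OTU name or a nested pair.
    clusters = [(name, [name]) for name in otus]
    joins = []  # distance at which each pair of clusters was joined
    while len(clusters) > 1:
        # Pick the two most similar clusters (smallest average distance).
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0][1], pair[1][1]))
        joins.append(average_distance(a[1], b[1]))
        # Replace the two clusters with one composite OTU.
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0], joins

tree, joins = upgma(names)
print(tree)
print(joins)
```

Each join distance divided by two gives the depth of the corresponding node under the equal-rates assumption.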

[Figures: UPGMA clustering steps; labels: simple OTUs, composite OTU]


UPGMA is guaranteed to give the correct tree only if the distances are strictly ultrametric, i.e., all OTUs are equidistant from the root.
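One way to test this requirement is the three-point condition: distances are ultrametric if, for every triplet of OTUs, the two largest of the three pairwise distances are equal. The sketch below applies that check to the distance matrix from the slides and to a small made-up ultrametric matrix. The function name `is_ultrametric`, the tolerance, and the toy matrix are assumptions for illustration.

```python
from itertools import combinations

def is_ultrametric(names, dist, tol=1e-9):
    """True if, in every triplet, the two largest distances are equal."""
    for a, b, c in combinations(names, 3):
        d = sorted([dist[frozenset((a, b))],
                    dist[frozenset((a, c))],
                    dist[frozenset((b, c))]])
        if abs(d[2] - d[1]) > tol:
            return False
    return True

# Distance matrix from the slides (substitutions per 1,000 sites).
taxa = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
slide_dist = {frozenset(k): v for k, v in {
    ("Spinach", "Rice"): 9, ("Spinach", "Mosquito"): 106,
    ("Spinach", "Monkey"): 91, ("Spinach", "Human"): 86,
    ("Rice", "Mosquito"): 118, ("Rice", "Monkey"): 122, ("Rice", "Human"): 122,
    ("Mosquito", "Monkey"): 55, ("Mosquito", "Human"): 51,
    ("Monkey", "Human"): 3,
}.items()}

# Hypothetical ultrametric distances: X and Y are each 6 units from Z.
toy_dist = {frozenset(k): v for k, v in {
    ("X", "Y"): 2, ("X", "Z"): 6, ("Y", "Z"): 6,
}.items()}

slide_result = is_ultrametric(taxa, slide_dist)
toy_result = is_ultrametric(["X", "Y", "Z"], toy_dist)
print(slide_result, toy_result)
```

The slide matrix fails the check (for example, Spinach, Rice, and Mosquito give 9, 106, and 118), so a UPGMA tree built from it should be read with that caveat in mind.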

Neighborliness methods

The neighbors-relation method (Sattath & Tversky)

The neighbor-joining method (Saitou & Nei)

In an unrooted bifurcating tree, two OTUs are said to be neighbors if they are connected through a single internal node.

If we combine OTUs A and B into one composite OTU, then the composite OTU (AB) and the simple OTU C become neighbors.

Four-Point Condition

[Figure: tree of four OTUs A, B, C, and D]

$$d(A,B) + d(C,D) < d(A,C) + d(B,D) = d(A,D) + d(B,C)$$
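The four-point condition suggests a simple test for neighbors among four OTUs: of the three ways to split them into two pairs, the split with the smallest sum of within-pair distances identifies the neighbor pairs. The sketch below applies this to four taxa from the distance matrix in the slides. The function name `neighbor_pairs` and the data layout are assumptions for illustration, and real data rarely satisfy the equality part of the condition exactly.

```python
def neighbor_pairs(a, b, c, d, dist):
    """Return the pairing of four OTUs with the smallest distance sum, plus all sums."""
    def dd(x, y):
        return dist[frozenset((x, y))]
    sums = {
        ((a, b), (c, d)): dd(a, b) + dd(c, d),
        ((a, c), (b, d)): dd(a, c) + dd(b, d),
        ((a, d), (b, c)): dd(a, d) + dd(b, c),
    }
    return min(sums, key=sums.get), sums

# Four taxa from the slide's distance matrix (substitutions per 1,000 sites).
dist = {frozenset(k): v for k, v in {
    ("Spinach", "Rice"): 9, ("Spinach", "Monkey"): 91, ("Spinach", "Human"): 86,
    ("Rice", "Monkey"): 122, ("Rice", "Human"): 122, ("Monkey", "Human"): 3,
}.items()}

best, sums = neighbor_pairs("Spinach", "Monkey", "Rice", "Human", dist)
print(best)
print(sums)
```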

In distance-matrix methods, it is assumed:

Similarity = Kinship

From Similarity to Relationship

• Similarity = Relationship, only if genetic distances increase with divergence times (monotonic distances).

Similarities among OTUs can be due to:

• Ancestry:
  – Shared ancestral characters (plesiomorphies)
  – Shared derived characters (synapomorphies)

• Homoplasy:
  – Convergent events
  – Parallel events
  – Reversals

Parsimony Methods:

Willi Hennig (1913-1976)

Occam’s razor

“Pluralitas non est ponenda sine necessitate.” (Plurality should not be posited without necessity.)

William of Occam or Ockham (ca. 1285-1349)
English philosopher & Franciscan monk

Excommunicated by Pope John XXII in 1328. Officially rehabilitated by Pope Innocent VI in 1359.

MAXIMUM PARSIMONY METHODS

Maximum parsimony involves the identification of a topology that requires the smallest number of evolutionary changes to explain the observed differences among the OTUs under study.

In maximum parsimony methods, we use discrete character states, and the shortest pathway leading to these character states is chosen as the best or maximum parsimony tree.

Often two or more trees with the same minimum number of changes are found, so that no unique tree can be inferred. Such trees are said to be equally parsimonious.

             Site
____________________________________________
Sequences    1  2  3  4  5  6  7  8  9
____________________________________________
1            A  A  G  A  G  T  T  C  A
2            A  G  C  C  G  T  T  C  T
3            A  G  A  T  A  T  C  C  A
4            A  G  A  G  A  T  C  C  T
                         *     *     *

Invariant sites: 1, 6, 8
Variant sites: 2, 3, 4, 5, 7, 9
Uninformative sites: 1, 2, 3, 4, 6, 8 (the invariant sites plus the variant sites 2, 3, and 4)
Informative sites (*): 5, 7, 9
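The site categories can be checked with a short script. It uses the usual definition of an informative site, namely one with at least two different nucleotides that are each present in at least two sequences. The slides do not spell this definition out, so it is stated here as an assumption, as are the names `sequences` and `classify_site`.

```python
from collections import Counter

# The four sequences from the table, one character per site.
sequences = ["AAGAGTTCA", "AGCCGTTCT", "AGATATCCA", "AGAGATCCT"]

def classify_site(column):
    """Label one alignment column as invariant, uninformative, or informative."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"
    # Informative: at least two states, each present in at least two sequences.
    shared = [state for state, n in counts.items() if n >= 2]
    return "informative" if len(shared) >= 2 else "uninformative"

# zip(*sequences) walks the alignment column by column; sites are numbered from 1.
classes = {i + 1: classify_site(col) for i, col in enumerate(zip(*sequences))}
informative = [site for site, label in classes.items() if label == "informative"]

print(classes)
print(informative)
```

Here "uninformative" is used only for variant sites; invariant sites are reported separately even though they are also uninformative for parsimony.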

Inferring the maximum parsimony tree:

1. Identify all the informative sites.

2. For each possible tree, calculate the minimum number of substitutions at each informative site.

3. Sum up the number of changes over all the informative sites for each possible tree.

4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree.

In the case of four OTUs, an informative site can only favor one of the three possible alternative trees.

Thus, the tree supported by the largest number of informative sites is the most parsimonious tree.
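For four OTUs, this counting rule can be sketched directly. The example reuses the four 9-site sequences from the earlier table and labels the three alternative trees by their neighbor pairs. The function name `supported_tree` and the tree labels are assumptions made for illustration.

```python
from collections import Counter

sequences = ["AAGAGTTCA", "AGCCGTTCT", "AGATATCCA", "AGAGATCCT"]

def supported_tree(column):
    """For four OTUs, return the tree supported by an informative site, else None."""
    s1, s2, s3, s4 = column
    if s1 == s2 and s3 == s4 and s1 != s3:
        return "((1,2),(3,4))"
    if s1 == s3 and s2 == s4 and s1 != s2:
        return "((1,3),(2,4))"
    if s1 == s4 and s2 == s3 and s1 != s2:
        return "((1,4),(2,3))"
    return None  # invariant or uninformative site

# Count, for each tree, the informative sites that favor it.
support = Counter(
    tree for tree in (supported_tree(col) for col in zip(*sequences)) if tree
)
best_tree = max(support, key=support.get)

print(support)
print(best_tree)
```

With these data, sites 5 and 7 favor ((1,2),(3,4)) and site 9 favors ((1,3),(2,4)).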

With more than 4 OTUs, an informative site may favor more than one tree, and the maximum parsimony tree may not necessarily be the one supported by the largest number of informative sites.

The informative sites that support the internal branches in the inferred tree are deemed to be synapomorphies.

All other informative sites are deemed to be homoplasies.

Parsimony is based solely on synapomorphies

Variants of Parsimony

Wagner-Fitch: Unordered. Character state changes are symmetric and can occur as often as necessary.

Camin-Sokal: Complete irreversibility.

Dollo: Partial irreversibility. Once a derived character is lost, it cannot be regained.

Weighted: Some changes are more likely than others.

Transversion: A type of weighted parsimony, in which transitions are ignored.

Fitch’s (1971) method for inferring nucleotides at internal nodes

The set at an internal node is the intersection (∩) of the two sets at its immediate descendant nodes if the intersection is not empty.

The set at an internal node is the union (∪) of the two sets at its immediate descendant nodes if the intersection is empty.

When a union is required to form a nodal set, a nucleotide substitution at this position must be assumed to have occurred.

number of unions = minimum number of substitutions
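The intersection/union rule translates into a short recursive function. The sketch below handles one site on a rooted bifurcating tree. Representing the tree as nested pairs of leaf names, and the name `fitch`, are assumptions made for this example.

```python
def fitch(tree, states):
    """Return (nodal set, number of unions) for one site.

    tree: a leaf name (str) or a pair (left, right);
    states: dict mapping leaf name -> nucleotide at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_unions = fitch(tree[0], states)
    right_set, right_unions = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra substitution is needed at this node.
        return common, left_unions + right_unions
    # Empty intersection: take the union and count one substitution.
    return left_set | right_set, left_unions + right_unions + 1

# One site with states A, T, A, T in OTUs 1-4, scored on two alternative trees.
site = {"1": "A", "2": "T", "3": "A", "4": "T"}
set_a, unions_a = fitch((("1", "2"), ("3", "4")), site)
set_b, unions_b = fitch((("1", "3"), ("2", "4")), site)

print(set_a, unions_a)
print(set_b, unions_b)
```

The tree that groups OTUs 1 and 3 needs one substitution at this site, whereas the tree that groups 1 and 2 needs two.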

[Figure: example trees requiring 4 substitutions and 3 substitutions]

total number of substitutions in a tree = tree length

Searching for the maximum-parsimony tree

Number of OTUs    Number of possible rooted trees
 2                1
 3                3
 4                15
 5                105
 6                945
 7                10,395
 8                135,135
 9                2,027,025
10                34,459,425
15                213,458,046,676,875
20                8,200,794,532,637,891,559,375
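The counts in the table follow the standard formula for rooted bifurcating trees, (2n - 3)!! = 3 × 5 × ... × (2n - 3) for n OTUs. The formula is not given in the slides and is added here as a known result. The function name `rooted_trees` is made up for this sketch.

```python
def rooted_trees(n):
    """Number of rooted bifurcating trees for n >= 2 OTUs: (2n - 3)!!."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

for n in (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20):
    print(n, f"{rooted_trees(n):,}")
```

The rapid growth of these numbers is what makes exhaustive searches impractical beyond a small number of OTUs.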

Exhaustive = Examine all trees, get the best tree (guaranteed).

Branch-and-Bound = Examine some trees, get the best tree (guaranteed).

Heuristic = Examine some trees, get a tree that may or may not be the best tree.

Exhaustive

[Figure labels: ascendant tree 2; descendant trees of tree 2]

Branch-and-Bound

Obtain a tree by a fast method (e.g., the neighbor-joining method).

Compute the minimum number of substitutions (L).

Turn L into an upper bound value.

Rationale:
(1) The maximum parsimony tree must be either equal in length to L or shorter.
(2) A descendant tree is either equal in length to or longer than its ascendant tree.
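A minimal sketch of this search is given below, using the four 9-site sequences from the earlier table. Several simplifications are assumptions of the example rather than part of the slides: trees are rooted nested pairs (Fitch lengths do not depend on where the root is placed), the upper bound comes from an arbitrary starting tree standing in for a neighbor-joining tree, and the names `tree_length`, `insertions`, and `branch_and_bound` are hypothetical.

```python
def fitch(tree, states):
    """Return (nodal set, number of substitutions) for one site (Fitch's rule)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

def tree_length(tree, seqs):
    """Total number of substitutions over all sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {k: s[j] for k, s in seqs.items()})[1]
               for j in range(n_sites))

def insertions(tree, leaf):
    """Yield every descendant tree made by attaching leaf to one branch of tree."""
    yield (tree, leaf)  # attach above the current (sub)tree
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)

def branch_and_bound(names, seqs, upper_bound):
    best = {"length": upper_bound, "trees": []}
    def extend(tree, remaining):
        length = tree_length(tree, seqs)
        if length > best["length"]:
            return  # descendants can only be as long or longer: prune
        if not remaining:
            if length < best["length"]:
                best["length"], best["trees"] = length, [tree]
            else:
                best["trees"].append(tree)
            return
        for t in insertions(tree, remaining[0]):
            extend(t, remaining[1:])
    extend((names[0], names[1]), names[2:])
    return best

seqs = {"1": "AAGAGTTCA", "2": "AGCCGTTCT", "3": "AGATATCCA", "4": "AGAGATCCT"}
# Upper bound L from an arbitrary starting tree (stand-in for a fast method).
L = tree_length((("1", "3"), ("2", "4")), seqs)
result = branch_and_bound(["1", "2", "3", "4"], seqs, L)
print(L, result["length"], result["trees"])
```

The trees returned are the different rootings of the same unrooted topology, in which OTUs 1 and 2 are neighbors.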

Heuristic

Likelihood

• Example: Coin tossing

• Data: Outcome of 10 tosses: 6 heads + 4 tails

• Hypothesis: Binomial distribution

L = P(data|tree)
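For the coin-tossing example, the likelihood of the data under a binomial model with heads probability p is L(p) = C(10, 6) p^6 (1 - p)^4. The sketch below evaluates it on a grid of p values and picks the maximum. The grid search and the names `binomial_likelihood` and `best_p` are assumptions chosen for simplicity.

```python
from math import comb

def binomial_likelihood(p, heads=6, tosses=10):
    """P(data | p) for the observed number of heads under a binomial model."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

# Evaluate the likelihood for p = 0.01, 0.02, ..., 0.99 and keep the best value.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=binomial_likelihood)

print(best_p, binomial_likelihood(best_p))
print(binomial_likelihood(0.5))
```

The maximum falls at the observed frequency of heads, 6/10, and a fair coin (p = 0.5) gives a somewhat lower likelihood for the same data.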

LIKELIHOOD IN MOLECULAR PHYLOGENETICS

• The data are the aligned sequences.

• The model is the probability of change from one character state to another (e.g., the Jukes & Cantor one-parameter model).

• The parameters to be estimated are: Topology & Branch Lengths

Background: Maximum Likelihood

How to calculate the ML score for a tree:

           1 ... j ... N
           ...
Seq x:     C ... GGACGTTTA ... C
Seq y:     C ... AGATCTCTA ... C
           ...

$$L = P(\text{data} \mid \text{tree})$$

$$\ln L = \ln L^{(1)} + \dots + \ln L^{(j)} + \dots + \ln L^{(N)}$$

$$\max\,[P(\text{data} \mid \text{tree})]$$

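Calculate the likelihood for a single site j given the tree:

[Figure: tree with root R; node A has descendants B and C, connected by branches of length $v_{AB}$ and $v_{AC}$]

$$L^{(j)} = \sum_{m \in S} \pi_m \, CL_R(m)$$

$$CL_A(i) = \left[\sum_{k \in S} P_{ik}(v_{AB})\, CL_B(k)\right] \left[\sum_{l \in S} P_{il}(v_{AC})\, CL_C(l)\right]$$

where $S = \{A, C, G, T\}$.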
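The recursion for the conditional likelihoods can be sketched in a few lines. The example below assumes the Jukes & Cantor model with branch lengths in expected substitutions per site, equal base frequencies of 1/4 at the root, and a made-up three-taxon tree. The node layout `(left, v_left, right, v_right)`, the leaf names, the branch lengths, and all function names are assumptions for illustration.

```python
import math

STATES = "ACGT"

def jc_prob(i, k, v):
    """Jukes-Cantor probability of going from state i to state k along a branch of length v."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == k else 0.25 - 0.25 * e

def cond_lik(node, site):
    """Conditional likelihoods CL_node(i) for every state i.

    node: a leaf name (str) or a tuple (left, v_left, right, v_right);
    site: dict mapping leaf name -> observed nucleotide.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == site[node] else 0.0 for s in STATES}
    left, v_left, right, v_right = node
    cl_left, cl_right = cond_lik(left, site), cond_lik(right, site)
    return {
        i: sum(jc_prob(i, k, v_left) * cl_left[k] for k in STATES)
           * sum(jc_prob(i, l, v_right) * cl_right[l] for l in STATES)
        for i in STATES
    }

def site_likelihood(root, site):
    """L(j): sum over root states m of pi_m * CL_root(m), with pi_m = 1/4."""
    cl_root = cond_lik(root, site)
    return sum(0.25 * cl_root[m] for m in STATES)

def log_likelihood(root, seqs):
    """ln L = sum of ln L(j) over all sites j."""
    n_sites = len(next(iter(seqs.values())))
    return sum(math.log(site_likelihood(root, {k: s[j] for k, s in seqs.items()}))
               for j in range(n_sites))

# Hypothetical tree ((x, y), z) with arbitrary branch lengths.
tree = (("x", 0.1, "y", 0.2), 0.05, "z", 0.3)
seqs = {"x": "GGACGTTTA", "y": "AGATCTCTA", "z": "GGATCTTTA"}
print(log_likelihood(tree, seqs))
```

A maximum likelihood search would repeat this calculation while varying the topology and the branch lengths and keep the combination with the highest ln L.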