distance matrix methods

Distance matrix methods calculate a measure of distance between each pair of species, then find a tree that predicts the observed set of distances.

Uploaded by gabby on 23-Feb-2016.


Format: PowerPoint (PPT) presentation.

TRANSCRIPT

Page 1: Distance matrix  methods

Distance matrix methods calculate a measure of distance between each pair of species, then find a tree that predicts the observed set of distances.

Page 2: Distance matrix  methods

Branch lengths and times

In distance matrix methods, branch lengths reflect the expected amount of evolution in different branches of the tree:

$$\text{branch length}_i = r_i \cdot t_i$$

where $r_i$ is the rate of evolution and $t_i$ the elapsed time along branch i.

Page 3: Distance matrix  methods

The least squares method

Observed matrix:

        A      B      C      D      E
A       0      D_AB   D_AC   D_AD   D_AE
B       D_AB   0      D_BC   D_BD   D_BE
C       D_AC   D_BC   0      D_CD   D_CE
D       D_AD   D_BD   D_CD   0      D_DE
E       D_AE   D_BE   D_CE   D_DE   0

Minimise the difference between the observed matrix of distances and the matrix of distances predicted by the tree.

Page 4: Distance matrix  methods

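The least squares method

Expected matrix (distances predicted by the tree):

        A      B      C      D      E
A       0      d_AB   d_AC   d_AD   d_AE
B       d_AB   0      d_BC   d_BD   d_BE
C       d_AC   d_BC   0      d_CD   d_CE
D       d_AD   d_BD   d_CD   0      d_DE
E       d_AE   d_BE   d_CE   d_DE   0

[Figure: an unrooted tree of species A to E. A joins the central node by a branch of length 0.08; the pair B (0.10) and D (0.07) joins the central node by an internal branch of 0.05; the pair C (0.05) and E (0.06) joins it by an internal branch of 0.03.]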

Page 5: Distance matrix  methods

The least squares method

[Figure: the same tree.]

Expected matrix:

        A      B      C      D      E
A       0
B              0
C                     0
D                            0
E                                   0

Page 6: Distance matrix  methods

The least squares method

[Figure: the same tree.]

Expected matrix: d_AB is the sum of the branch lengths on the path from A to B, 0.08 + 0.05 + 0.10 = 0.23.

        A      B      C      D      E
A       0      0.23
B              0
C                     0
D                            0
E                                   0

Page 7: Distance matrix  methods

The least squares method

[Figure: the same tree.]

Expected matrix, with every entry computed in the same way:

        A      B      C      D      E
A       0      0.23   0.16   0.20   0.17
B       0.23   0      0.23   0.17   0.24
C       0.16   0.23   0      0.20   0.11
D       0.20   0.17   0.20   0      0.21
E       0.17   0.24   0.11   0.21   0
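The expected matrix can be reproduced by summing branch lengths along the path between each pair of tips. This is a minimal sketch, assuming the topology implied by the later least squares equations (A on the central node, B paired with D, C paired with E); the internal node names X, Y and Z are hypothetical.

```python
from itertools import combinations

# Example tree as (node, node, branch length). Internal node names X, Y, Z
# are hypothetical; the topology is assumed from the later slides' equations.
EDGES = [
    ("A", "X", 0.08),  # v1
    ("B", "Y", 0.10),  # v2
    ("C", "Z", 0.05),  # v3
    ("D", "Y", 0.07),  # v4
    ("E", "Z", 0.06),  # v5
    ("X", "Z", 0.03),  # v6, internal branch to the (C, E) pair
    ("X", "Y", 0.05),  # v7, internal branch to the (B, D) pair
]


def path_length(edges, start, goal):
    """Sum the branch lengths on the unique path between two nodes of a tree."""
    neighbours = {}
    for u, v, length in edges:
        neighbours.setdefault(u, []).append((v, length))
        neighbours.setdefault(v, []).append((u, length))
    # Depth-first search; a tree has no cycles, so only the parent is skipped.
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, length in neighbours[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + length))
    raise ValueError(f"{goal!r} is not reachable from {start!r}")


def expected_matrix(edges, taxa):
    """Expected distance d_ij for every pair i < j of taxa."""
    return {(i, j): path_length(edges, i, j) for i, j in combinations(taxa, 2)}


d = expected_matrix(EDGES, "ABCDE")
for (i, j), value in d.items():
    print(f"d_{i}{j} = {value:.2f}")
```

Only the upper triangle is computed, since the matrix is symmetric and its diagonal is zero.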

Page 8: Distance matrix  methods

The least squares method

$$Q = \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} \left(D_{ij} - d_{ij}\right)^2$$

where $D_{ij}$ is the observed distance between species i and j, $d_{ij}$ the expected distance between species i and j, and $w_{ij}$ a weight. Q is a measure of the discrepancy between the observed and the expected matrix.

Page 9: Distance matrix  methods

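The least squares method

$$Q = \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} \left(D_{ij} - d_{ij}\right)^2$$

Distances can be weighted or not: the weight can be $w_{ij} = 1$ (unweighted), $w_{ij} = 1/D_{ij}^2$ or $w_{ij} = 1/D_{ij}$.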
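A minimal Python sketch of Q for the three weightings. The matrices are dictionaries keyed by species pairs, and the scheme names `unweighted`, `inverse_square` and `inverse` are hypothetical labels; off-diagonal observed distances are assumed to be positive so that the weights are defined. The small observed matrix in the usage lines is made-up illustrative input, not data from the slides.

```python
def least_squares_q(observed, expected, weighting="unweighted"):
    """Q = sum over pairs (i, j) of w_ij * (D_ij - d_ij) ** 2.

    observed, expected: dicts mapping a species pair (i, j), i < j, to a distance.
    weighting: 'unweighted' (w = 1), 'inverse_square' (w = 1/D**2) or 'inverse' (w = 1/D).
    """
    weights = {
        "unweighted": lambda D: 1.0,
        "inverse_square": lambda D: 1.0 / D ** 2,
        "inverse": lambda D: 1.0 / D,
    }
    w = weights[weighting]
    return sum(w(D) * (D - expected[pair]) ** 2 for pair, D in observed.items())


# Illustrative (made-up) observed distances against two expected values.
observed = {("A", "B"): 0.25, ("A", "C"): 0.15}
expected = {("A", "B"): 0.23, ("A", "C"): 0.16}
print(least_squares_q(observed, expected))  # about 0.0005
print(least_squares_q(observed, expected, "inverse"))
```

The weights are computed from the observed distances, so the inverse schemes give less influence to large distances, which are typically estimated less precisely.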

Page 10: Distance matrix  methods

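The least squares method

[Figure: the same tree with its branches labelled: v1 to v5 are the terminal branches of A to E, v6 is the internal branch leading to the pair C, E, and v7 the internal branch leading to the pair B, D.]

A handy variable is the path indicator

$$x_{ij,k} = \begin{cases} 1 & \text{if branch } k \text{ is on the path between species } i \text{ and } j \\ 0 & \text{if branch } k \text{ is not on the path between species } i \text{ and } j \end{cases}$$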

Page 11: Distance matrix  methods

The least squares method

[Figure: the same labelled tree.]

$x_{AB,1} = 1$

Page 12: Distance matrix  methods

The least squares method

[Figure: the same labelled tree.]

$x_{AB,1} = 1, \quad x_{AB,7} = 1$

Page 13: Distance matrix  methods

The least squares method

[Figure: the same labelled tree.]

$x_{AB,1} = 1, \quad x_{AB,7} = 1, \quad x_{AB,3} = 0$

Page 14: Distance matrix  methods

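The least squares method

$$Q = \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} \left(D_{ij} - d_{ij}\right)^2$$

Rewrite $d_{ij}$, the expected values, in terms of the branch lengths $v_k$:

$$d_{ij} = \sum_k x_{ij,k} v_k$$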
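The indicator $x_{ij,k}$ and the sum $d_{ij} = \sum_k x_{ij,k} v_k$ can be sketched in Python. As an assumption of this sketch, each branch is described by the set of species it separates from the rest (topology taken from the later equations), so a branch lies on the path between i and j exactly when it puts them on opposite sides. The branch lengths are those of the example tree.

```python
# Species on one side of each branch (the split the branch induces).
# Assumed topology: A on the central node, pairs (B, D) and (C, E).
SPLITS = {
    "v1": {"A"}, "v2": {"B"}, "v3": {"C"}, "v4": {"D"}, "v5": {"E"},
    "v6": {"C", "E"}, "v7": {"B", "D"},
}
LENGTHS = {"v1": 0.08, "v2": 0.10, "v3": 0.05, "v4": 0.07,
           "v5": 0.06, "v6": 0.03, "v7": 0.05}


def x(i, j, k):
    """x_ij,k: 1 if branch k is on the path between species i and j, else 0."""
    return int((i in SPLITS[k]) != (j in SPLITS[k]))


def expected_distance(i, j):
    """d_ij = sum over branches k of x_ij,k * v_k."""
    return sum(x(i, j, k) * LENGTHS[k] for k in SPLITS)


print([x("A", "B", k) for k in SPLITS])  # [1, 1, 0, 0, 0, 0, 1]
print(round(expected_distance("A", "B"), 2))  # 0.23
```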

Page 15: Distance matrix  methods

The least squares method

$$Q = \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} \left(D_{ij} - \sum_k x_{ij,k} v_k\right)^2$$

Page 16: Distance matrix  methods

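The least squares method

$$Q = \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} \left(D_{ij} - \sum_l x_{ij,l} v_l\right)^2$$

Differentiate Q with respect to a branch length $v_k$:

$$\frac{\partial Q}{\partial v_k} = -2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} \, x_{ij,k} \left(D_{ij} - \sum_l x_{ij,l} v_l\right)$$

and equate the derivative to zero.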

Page 17: Distance matrix  methods

The least squares method

For the unweighted case ($w_{ij} = 1$):

$$\frac{\partial Q}{\partial v_k} = -2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij,k} \left(D_{ij} - \sum_l x_{ij,l} v_l\right) = 0$$

Page 18: Distance matrix  methods

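The least squares method

For branch 1:

$$\frac{\partial Q}{\partial v_1} = -2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij,1} \left(D_{ij} - \sum_l x_{ij,l} v_l\right) = 0$$

Written in full for the five species A to E (i = 1, ..., 4 and j = i + 1, ..., 5), after dividing by -2:

$$\begin{aligned}
&x_{AB,1}\left(D_{AB} - \sum_l x_{AB,l} v_l\right) + x_{AC,1}\left(D_{AC} - \sum_l x_{AC,l} v_l\right) + x_{AD,1}\left(D_{AD} - \sum_l x_{AD,l} v_l\right) + x_{AE,1}\left(D_{AE} - \sum_l x_{AE,l} v_l\right) && (i = 1)\\
&+ x_{BC,1}\left(D_{BC} - \sum_l x_{BC,l} v_l\right) + x_{BD,1}\left(D_{BD} - \sum_l x_{BD,l} v_l\right) + x_{BE,1}\left(D_{BE} - \sum_l x_{BE,l} v_l\right) && (i = 2)\\
&+ x_{CD,1}\left(D_{CD} - \sum_l x_{CD,l} v_l\right) + x_{CE,1}\left(D_{CE} - \sum_l x_{CE,l} v_l\right) && (i = 3)\\
&+ x_{DE,1}\left(D_{DE} - \sum_l x_{DE,l} v_l\right) = 0 && (i = 4)
\end{aligned}$$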

Page 19: Distance matrix  methods

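The least squares method

[Figure: the same labelled tree.]

Values of $x_{ij,1}$ (branch v1 lies on the path from A to every other species, and on no other path):

        A      B      C      D      E
A       -      1      1      1      1
B              -      0      0      0
C                     -      0      0
D                            -      0
E                                   -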

Page 20: Distance matrix  methods

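The least squares method

$$\frac{\partial Q}{\partial v_1} = -2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij,1} \left(D_{ij} - \sum_l x_{ij,l} v_l\right) = 0$$

Substituting the values of $x_{ij,1}$, many terms are zero:

$$\begin{aligned}
&1\left(D_{AB} - \sum_l x_{AB,l} v_l\right) + 1\left(D_{AC} - \sum_l x_{AC,l} v_l\right) + 1\left(D_{AD} - \sum_l x_{AD,l} v_l\right) + 1\left(D_{AE} - \sum_l x_{AE,l} v_l\right)\\
&+ 0\left(D_{BC} - \sum_l x_{BC,l} v_l\right) + 0\left(D_{BD} - \sum_l x_{BD,l} v_l\right) + 0\left(D_{BE} - \sum_l x_{BE,l} v_l\right)\\
&+ 0\left(D_{CD} - \sum_l x_{CD,l} v_l\right) + 0\left(D_{CE} - \sum_l x_{CE,l} v_l\right)\\
&+ 0\left(D_{DE} - \sum_l x_{DE,l} v_l\right) = 0
\end{aligned}$$

        A      B      C      D      E
A       -      1      1      1      1
B              -      0      0      0
C                     -      0      0
D                            -      0
E                                   -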

Page 21: Distance matrix  methods

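The least squares method

$$\frac{\partial Q}{\partial v_1} = -2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij,1} \left(D_{ij} - \sum_l x_{ij,l} v_l\right) = 0$$

Keeping only the non-zero terms:

$$\left(D_{AB} - \sum_l x_{AB,l} v_l\right) + \left(D_{AC} - \sum_l x_{AC,l} v_l\right) + \left(D_{AD} - \sum_l x_{AD,l} v_l\right) + \left(D_{AE} - \sum_l x_{AE,l} v_l\right) = 0$$

[Figure: the same labelled tree.]

Expanding the sum for A and B (the path from A to B passes through v1, v7 and v2):

$$\sum_l x_{AB,l} v_l = 1 \cdot v_1 + 1 \cdot v_2 + 0 \cdot v_3 + 0 \cdot v_4 + 0 \cdot v_5 + 0 \cdot v_6 + 1 \cdot v_7$$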

Page 22: Distance matrix  methods

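The least squares method

$$\frac{\partial Q}{\partial v_1} = -2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij,1} \left(D_{ij} - \sum_l x_{ij,l} v_l\right) = 0$$

$$\left(D_{AB} - \sum_l x_{AB,l} v_l\right) + \left(D_{AC} - \sum_l x_{AC,l} v_l\right) + \left(D_{AD} - \sum_l x_{AD,l} v_l\right) + \left(D_{AE} - \sum_l x_{AE,l} v_l\right) = 0$$

[Figure: the same labelled tree.]

Expanding the sum for A and C (the path from A to C passes through v1, v6 and v3):

$$\sum_l x_{AC,l} v_l = 1 \cdot v_1 + 0 \cdot v_2 + 1 \cdot v_3 + 0 \cdot v_4 + 0 \cdot v_5 + 1 \cdot v_6 + 0 \cdot v_7$$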

Page 23: Distance matrix  methods

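The least squares method

$$\frac{\partial Q}{\partial v_1} = -2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij,1} \left(D_{ij} - \sum_l x_{ij,l} v_l\right) = 0$$

$$\left(D_{AB} - \sum_l x_{AB,l} v_l\right) + \left(D_{AC} - \sum_l x_{AC,l} v_l\right) + \left(D_{AD} - \sum_l x_{AD,l} v_l\right) + \left(D_{AE} - \sum_l x_{AE,l} v_l\right) = 0$$

Expanding all four sums and collecting terms:

$$D_{AB} + D_{AC} + D_{AD} + D_{AE} - 4v_1 - v_2 - v_3 - v_4 - v_5 - 2v_6 - 2v_7 = 0$$

Rearranging to

$$D_{AB} + D_{AC} + D_{AD} + D_{AE} = 4v_1 + v_2 + v_3 + v_4 + v_5 + 2v_6 + 2v_7$$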

Page 24: Distance matrix  methods

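The least squares method

$$\frac{\partial Q}{\partial v_1} = -2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{ij,1} \left(D_{ij} - \sum_l x_{ij,l} v_l\right) = 0$$

$$\left(D_{AB} - \sum_l x_{AB,l} v_l\right) + \left(D_{AC} - \sum_l x_{AC,l} v_l\right) + \left(D_{AD} - \sum_l x_{AD,l} v_l\right) + \left(D_{AE} - \sum_l x_{AE,l} v_l\right) = 0$$

$$D_{AB} + D_{AC} + D_{AD} + D_{AE} - 4v_1 - v_2 - v_3 - v_4 - v_5 - 2v_6 - 2v_7 = 0$$

Equation for $v_1$:

$$D_{AB} + D_{AC} + D_{AD} + D_{AE} = 4v_1 + v_2 + v_3 + v_4 + v_5 + 2v_6 + 2v_7$$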

Page 25: Distance matrix  methods

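The least squares method

The equation for $v_2$ follows mutatis mutandis:

$$\begin{aligned}
D_{AB} + D_{AC} + D_{AD} + D_{AE} &= 4v_1 + v_2 + v_3 + v_4 + v_5 + 2v_6 + 2v_7 && \text{(equation for } v_1\text{)}\\
D_{AB} + D_{BC} + D_{BD} + D_{BE} &= v_1 + 4v_2 + v_3 + v_4 + v_5 + 2v_6 + 3v_7 && \text{(equation for } v_2\text{)}
\end{aligned}$$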

Page 26: Distance matrix  methods

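The least squares method

The same reasoning gives one equation for each of the other branches, so there are seven linear equations in the seven unknowns $v_1, \dots, v_7$:

$$\begin{aligned}
D_{AB} + D_{AC} + D_{AD} + D_{AE} &= 4v_1 + v_2 + v_3 + v_4 + v_5 + 2v_6 + 2v_7 && (v_1)\\
D_{AB} + D_{BC} + D_{BD} + D_{BE} &= v_1 + 4v_2 + v_3 + v_4 + v_5 + 2v_6 + 3v_7 && (v_2)\\
D_{AC} + D_{BC} + D_{CD} + D_{CE} &= v_1 + v_2 + 4v_3 + v_4 + v_5 + 3v_6 + 2v_7 && (v_3)\\
D_{AD} + D_{BD} + D_{CD} + D_{DE} &= v_1 + v_2 + v_3 + 4v_4 + v_5 + 2v_6 + 3v_7 && (v_4)\\
D_{AE} + D_{BE} + D_{CE} + D_{DE} &= v_1 + v_2 + v_3 + v_4 + 4v_5 + 3v_6 + 2v_7 && (v_5)\\
D_{AC} + D_{AE} + D_{BC} + D_{BE} + D_{CD} + D_{DE} &= 2v_1 + 2v_2 + 3v_3 + 2v_4 + 3v_5 + 6v_6 + 4v_7 && (v_6)\\
D_{AB} + D_{AD} + D_{BC} + D_{CD} + D_{BE} + D_{DE} &= 2v_1 + 3v_2 + 2v_3 + 3v_4 + 2v_5 + 4v_6 + 6v_7 && (v_7)
\end{aligned}$$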
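These seven equations are the normal equations $(X^\top X)\,v = X^\top D$ of an ordinary least squares problem, where X holds the $x_{ij,k}$ values (one row per species pair, one column per branch). A minimal pure-Python sketch, assuming each branch is described by the set of species it separates (topology matching the equations above) and solving by Gaussian elimination. Feeding in the expected matrix of the example tree as if it were observed should recover the original branch lengths, which makes a convenient check.

```python
from itertools import combinations

TAXA = "ABCDE"
BRANCHES = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
# Species on one side of each branch; assumed topology: A on the central
# node, pairs (B, D) and (C, E), matching the equations above.
SPLITS = {"v1": {"A"}, "v2": {"B"}, "v3": {"C"}, "v4": {"D"}, "v5": {"E"},
          "v6": {"C", "E"}, "v7": {"B", "D"}}


def design_matrix():
    """One row per pair i < j, one column per branch k, entries x_ij,k."""
    pairs = list(combinations(TAXA, 2))
    X = [[int((i in SPLITS[k]) != (j in SPLITS[k])) for k in BRANCHES]
         for i, j in pairs]
    return pairs, X


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in reversed(range(n)):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def least_squares_branch_lengths(D):
    """Unweighted least squares: build and solve (X^T X) v = X^T D."""
    pairs, X = design_matrix()
    obs = [D[p] for p in pairs]
    m = len(BRANCHES)
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(m)] for a in range(m)]
    XtD = [sum(row[a] * value for row, value in zip(X, obs)) for a in range(m)]
    return XtX, dict(zip(BRANCHES, solve(XtX, XtD)))


# Expected distances of the example tree, used here as if they were observed.
D = {("A", "B"): 0.23, ("A", "C"): 0.16, ("A", "D"): 0.20, ("A", "E"): 0.17,
     ("B", "C"): 0.23, ("B", "D"): 0.17, ("B", "E"): 0.24,
     ("C", "D"): 0.20, ("C", "E"): 0.11, ("D", "E"): 0.21}
XtX, v = least_squares_branch_lengths(D)
print(XtX[0])  # coefficients of the equation for v1: [4, 1, 1, 1, 1, 2, 2]
print({k: round(val, 4) for k, val in v.items()})
```

Each row of `XtX` is the right-hand side of one equation above, and each entry of `XtD` the corresponding sum of observed distances.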

Page 27: Distance matrix  methods

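The least squares method: solving linear equations with matrices

Example system:

$$x + 2y = 4, \qquad 3x - 5y = 1$$

In matrix form, $AX = B$ with

$$A = \begin{pmatrix} 1 & 2 \\ 3 & -5 \end{pmatrix}, \qquad X = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad B = \begin{pmatrix} 4 \\ 1 \end{pmatrix}.$$

The inverse of A is

$$A^{-1} = \frac{1}{|A|} \begin{pmatrix} -5 & -2 \\ -3 & 1 \end{pmatrix}, \qquad |A| = 1 \cdot (-5) - 3 \cdot 2 = -11,$$

so

$$X = A^{-1} B = -\frac{1}{11} \begin{pmatrix} -5 & -2 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 1 \end{pmatrix} = -\frac{1}{11} \begin{pmatrix} -22 \\ -11 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix},$$

that is, x = 2 and y = 1.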
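The same 2 × 2 calculation in Python, using exact fractions from the standard library so that the result is not blurred by rounding. The function name and argument order are choices of this sketch.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f as X = A^{-1} B.

    For A = [[a, b], [c, d]], A^{-1} = (1/|A|) [[d, -b], [-c, a]], |A| = a*d - b*c.
    """
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("A is singular, so the system has no unique solution")
    x = (d * e - b * f) / det
    y = (-c * e + a * f) / det
    return x, y


x, y = solve_2x2(1, 2, 3, -5, 4, 1)  # x + 2y = 4, 3x - 5y = 1
print(x, y)  # 2 1
```

For the seven branch-length equations the same idea applies, but a general elimination method is more practical than writing out the inverse.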

Page 28: Distance matrix  methods

Clustering algorithms

Clustering methods have no optimality criterion; instead, they apply an algorithm that builds up a tree step by step.

Page 29: Distance matrix  methods

Clustering algorithms: UPGMA

UPGMA (Unweighted Pair Group Method with Arithmetic mean) assumes that evolutionary rates are the same in all lineages. It therefore produces an ultrametric tree, in which all species are the same distance from the root.

Page 30: Distance matrix  methods

Clustering algorithms: UPGMA

               dog     bear  raccoon   weasel     seal sea lion      cat   monkey
dog              0       32       48       51       50       48       98      148
bear            32        0       26       34       29       33       84      136
raccoon         48       26        0       42       44       44       92      152
weasel          51       34       42        0       44       38       86      142
seal            50       29       44       44        0       24       89      142
sea lion        48       33       44       38       24        0       90      142
cat             98       84       92       86       89       90        0      148
monkey         148      136      152      142      142      142      148        0

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

Page 31: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

[Figure: seal and sea lion joined at a node of height 12 (= 24/2).]

Page 32: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.

After lumping seal and sea lion into the group SS:

               dog     bear  raccoon   weasel       SS      cat   monkey
dog              0       32       48       51                98      148
bear            32        0       26       34                84      136
raccoon         48       26        0       42                92      152
weasel          51       34       42        0                86      142
SS                                                 0
cat             98       84       92       86                 0      148
monkey         148      136      152      142               148        0

Page 33: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

               dog     bear  raccoon   weasel       SS      cat   monkey
dog              0       32       48       51                98      148
bear            32        0       26       34                84      136
raccoon         48       26        0       42                92      152
weasel          51       34       42        0                86      142
SS                                                 0
cat             98       84       92       86                 0      148
monkey         148      136      152      142               148        0

Page 34: Distance matrix  methods

Clustering algorithms: UPGMA

Dog to SS: (50 + 48)/2 = 49.

               dog     bear  raccoon   weasel       SS      cat   monkey
dog              0       32       48       51       49       98      148
bear            32        0       26       34                84      136
raccoon         48       26        0       42                92      152
weasel          51       34       42        0                86      142
SS                                                 0
cat             98       84       92       86                 0      148
monkey         148      136      152      142               148        0

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

Page 35: Distance matrix  methods

Clustering algorithms: UPGMA

Bear to SS: (29 + 33)/2 = 31.

               dog     bear  raccoon   weasel       SS      cat   monkey
dog              0       32       48       51       49       98      148
bear            32        0       26       34       31       84      136
raccoon         48       26        0       42                92      152
weasel          51       34       42        0                86      142
SS                                                 0
cat             98       84       92       86                 0      148
monkey         148      136      152      142               148        0

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

Page 36: Distance matrix  methods

Clustering algorithms: UPGMA

               dog     bear  raccoon   weasel       SS      cat   monkey
dog              0       32       48       51       49       98      148
bear            32        0       26       34       31       84      136
raccoon         48       26        0       42       44       92      152
weasel          51       34       42        0       41       86      142
SS              49       31       44       41        0     89.5      142
cat             98       84       92       86     89.5        0      148
monkey         148      136      152      142      142      148        0

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.
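A compact Python sketch of the four UPGMA steps, run on the carnivore matrix. Storing distances in a dictionary keyed by `frozenset` pairs and naming groups by joining member names with `+` are choices of this sketch. Ties would be broken by whichever pair `min` meets first; the smallest distance is unique at every step for this matrix.

```python
from itertools import combinations

SPECIES = ["dog", "bear", "raccoon", "weasel", "seal", "sea lion", "cat", "monkey"]
ROWS = [
    [0, 32, 48, 51, 50, 48, 98, 148],
    [32, 0, 26, 34, 29, 33, 84, 136],
    [48, 26, 0, 42, 44, 44, 92, 152],
    [51, 34, 42, 0, 44, 38, 86, 142],
    [50, 29, 44, 44, 0, 24, 89, 142],
    [48, 33, 44, 38, 24, 0, 90, 142],
    [98, 84, 92, 86, 89, 90, 0, 148],
    [148, 136, 152, 142, 142, 142, 148, 0],
]


def upgma(names, rows):
    """Return the merges as (group_a, group_b, node_height)."""
    D = {frozenset((a, b)): rows[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names) if i < j}
    size = {name: 1 for name in names}
    groups = list(names)
    merges = []
    while len(groups) > 1:
        # 1. Find the pair of groups with the smallest distance.
        a, b = min(combinations(groups, 2), key=lambda p: D[frozenset(p)])
        # 2. The joining node sits at half their distance (ultrametric tree).
        height = D[frozenset((a, b))] / 2
        # 3. Lump a and b into a new group.
        new = f"{a}+{b}"
        size[new] = size[a] + size[b]
        # 4. Distance to every other group, weighted by the number of species.
        for c in groups:
            if c not in (a, b):
                D[frozenset((new, c))] = (size[a] * D[frozenset((a, c))]
                                          + size[b] * D[frozenset((b, c))]) / size[new]
        groups = [g for g in groups if g not in (a, b)] + [new]
        merges.append((a, b, height))
    return merges


for a, b, height in upgma(SPECIES, ROWS):
    print(f"join {a} | {b} at height {height:.3f}")
```

The weighted average in step 4 is what keeps UPGMA an unweighted method: every original species contributes equally to a group's distance, whatever order the groups were built in.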

Page 37: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

               dog     bear  raccoon   weasel       SS      cat   monkey
dog              0       32       48       51       49       98      148
bear            32        0       26       34       31       84      136
raccoon         48       26        0       42       44       92      152
weasel          51       34       42        0       41       86      142
SS              49       31       44       41        0     89.5      142
cat             98       84       92       86     89.5        0      148
monkey         148      136      152      142      142      148        0

Page 38: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

[Figure: seal and sea lion joined at height 12; raccoon and bear joined at height 13 (= 26/2).]

Page 39: Distance matrix  methods

Clustering algorithms: UPGMA

After lumping bear and raccoon into the group BR:

               dog       BR   weasel       SS      cat   monkey
dog              0       40       51       49       98      148
BR              40        0       38     37.5       88      144
weasel          51       38        0       41       86      142
SS              49     37.5       41        0     89.5      142
cat             98       88       86     89.5        0      148
monkey         148      144      142      142      148        0

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

Page 40: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

               dog       BR   weasel       SS      cat   monkey
dog              0       40       51       49       98      148
BR              40        0       38     37.5       88      144
weasel          51       38        0       41       86      142
SS              49     37.5       41        0     89.5      142
cat             98       88       86     89.5        0      148
monkey         148      144      142      142      148        0

Page 41: Distance matrix  methods

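Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

[Figure: the (seal, sea lion) node (height 12) and the (raccoon, bear) node (height 13) joined at height 18.75 (= 37.5/2), giving internal branches of 6.75 (= 18.75 - 12) and 5.75 (= 18.75 - 13).]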

Page 42: Distance matrix  methods

Clustering algorithms: UPGMA

After lumping BR and SS into the group BRSS (both contain two species, so, for example, dog to BRSS = (2 × 40 + 2 × 49)/4 = 44.5):

               dog     BRSS   weasel      cat   monkey
dog              0     44.5       51       98      148
BRSS          44.5        0     39.5    88.75      143
weasel          51     39.5        0       86      142
cat             98    88.75       86        0      148
monkey         148      143      142      148        0

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

Page 43: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

               dog     BRSS   weasel      cat   monkey
dog              0     44.5       51       98      148
BRSS          44.5        0     39.5    88.75      143
weasel          51     39.5        0       86      142
cat             98    88.75       86        0      148
monkey         148      143      142      148        0

Page 44: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

[Figure: the tree so far, with weasel added at height 19.75 (= 39.5/2).]

Page 45: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

               dog    BRSSW      cat   monkey
dog              0                98      148
BRSSW                     0
cat             98                 0      148
monkey         148               148        0

Dog to BRSSW: (4 × 44.5 + 1 × 51)/5, since BRSS contains 4 species and weasel 1.

Page 46: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

               dog    BRSSW      cat   monkey
dog              0     45.8       98      148
BRSSW         45.8        0
cat             98                 0      148
monkey         148               148        0

Dog to BRSSW: (4 × 44.5 + 1 × 51)/5 = 45.8, since BRSS contains 4 species and weasel 1.

Page 47: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

               dog    BRSSW      cat   monkey
dog              0     45.8       98      148
BRSSW         45.8        0     88.2    142.8
cat             98     88.2        0      148
monkey         148    142.8      148        0

Page 48: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

               dog    BRSSW      cat   monkey
dog              0     45.8       98      148
BRSSW         45.8        0     88.2    142.8
cat             98     88.2        0      148
monkey         148    142.8      148        0

Page 49: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

[Figure: the tree so far, with dog added at height 22.9 (= 45.8/2).]

Page 50: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

            BRSSWD      cat   monkey
BRSSWD           0
cat                       0      148
monkey                  148        0

BRSSWD to cat: (5 × 88.2 + 1 × 98)/6, since BRSSW contains 5 species and dog 1.

Page 51: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

            BRSSWD      cat   monkey
BRSSWD           0   89.833
cat         89.833        0      148
monkey                  148        0

BRSSWD to cat: (5 × 88.2 + 1 × 98)/6 = 89.833, since BRSSW contains 5 species and dog 1.

Page 52: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

            BRSSWD      cat   monkey
BRSSWD           0   89.833  143.667
cat         89.833        0      148
monkey     143.667      148        0

BRSSWD to monkey: (5 × 142.8 + 1 × 148)/6 = 143.667, since BRSSW contains 5 species and dog 1.

Page 53: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

            BRSSWD      cat   monkey
BRSSWD           0   89.833  143.667
cat         89.833        0      148
monkey     143.667      148        0

Page 54: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

[Figure: the tree so far, with cat added at height 44.917 (= 89.833/2); the branch from this node to the previous one is 22.017 (= 44.917 - 22.9).]

Page 55: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

           BRSSWDC   monkey
BRSSWDC          0
monkey                    0

BRSSWDC to monkey: (6 × 143.667 + 1 × 148)/7, since BRSSWD contains 6 species and cat 1.

Page 56: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.
3. Lump i and j into a new group.
4. Compute the distance between the new group and all other groups, weighting by the number of species in each group.

           BRSSWDC   monkey
BRSSWDC          0  144.286
monkey     144.286        0

BRSSWDC to monkey: (6 × 143.667 + 1 × 148)/7 = 144.286, since BRSSWD contains 6 species and cat 1.

Page 57: Distance matrix  methods

Clustering algorithms: UPGMA

1. Find the species i and j with the smallest distance.
2. Calculate the branch length between i and j.

[Figure: the complete UPGMA tree, with monkey joining at the root at height 72.143 (= 144.286/2); the branch from the root to the previous node is 27.226 (= 72.143 - 44.917).]
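In the ultrametric UPGMA tree, each branch length is the height of its upper node minus the height of its lower node, with tips at height 0; this is where internal branches such as 6.75 and 5.75 come from. A small Python sketch using the node heights of the example; the labels BRSSWDC and root for the last two nodes are hypothetical, chosen in the style of the other group names.

```python
# Node heights (half the joining distance) from the UPGMA example.
HEIGHTS = {
    "SS": 24 / 2,
    "BR": 26 / 2,
    "BRSS": 37.5 / 2,
    "BRSSW": 39.5 / 2,
    "BRSSWD": 45.8 / 2,
    "BRSSWDC": (539 / 6) / 2,  # 89.833... / 2
    "root": (1010 / 7) / 2,    # 144.2857... / 2
}
# Children of each internal node; tips are absent from HEIGHTS (height 0).
CHILDREN = {
    "SS": ["seal", "sea lion"],
    "BR": ["bear", "raccoon"],
    "BRSS": ["BR", "SS"],
    "BRSSW": ["BRSS", "weasel"],
    "BRSSWD": ["BRSSW", "dog"],
    "BRSSWDC": ["BRSSWD", "cat"],
    "root": ["BRSSWDC", "monkey"],
}


def branch_lengths(heights, children):
    """Branch length = height of the upper node - height of the lower node."""
    return {(parent, child): heights[parent] - heights.get(child, 0.0)
            for parent, kids in children.items() for child in kids}


for (parent, child), length in branch_lengths(HEIGHTS, CHILDREN).items():
    print(f"{parent} -> {child}: {length:.3f}")
```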

Page 58: Distance matrix  methods

Clustering algorithms: Neighbour-joining

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$ for every taxon x, where n is the number of taxa in the current matrix (here n = 8).

               dog     bear  raccoon   weasel     seal sea lion      cat   monkey
dog              0       32       48       51       50       48       98      148
bear            32        0       26       34       29       33       84      136
raccoon         48       26        0       42       44       44       92      152
weasel          51       34       42        0       44       38       86      142
seal            50       29       44       44        0       24       89      142
sea lion        48       33       44       38       24        0       90      142
cat             98       84       92       86       89       90        0      148
monkey         148      136      152      142      142      142      148        0
S_x           79.2     62.3     74.7     72.8     70.3     69.8    114.5    168.3

Page 59: Distance matrix  methods

Clustering algorithms: Neighbour-joining

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.

First entry of the M matrix: $M_{\text{dog,bear}} = 32 - 79.2 - 62.3 = -109.5$.

Page 60: Distance matrix  methods

Clustering algorithms: Neighbour-joining

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.

               dog     bear  raccoon   weasel     seal sea lion      cat   monkey
dog                 -109.50  -105.83  -101.00   -99.50  -101.00   -95.67   -99.50
bear       -109.50           -111.00  -101.17  -103.67   -99.17   -92.83   -94.67
raccoon    -105.83  -111.00           -105.50  -101.00  -100.50   -97.17   -91.00
weasel     -101.00  -101.17  -105.50            -99.17  -104.67  -101.33   -99.17
seal        -99.50  -103.67  -101.00   -99.17           -116.17   -95.83   -96.67
sea lion   -101.00   -99.17  -100.50  -104.67  -116.17            -94.33   -96.17
cat         -95.67   -92.83   -97.17  -101.33   -95.83   -94.33           -134.83
monkey      -99.50   -94.67   -91.00   -99.17   -96.67   -96.17  -134.83

The smallest value is $M_{\text{cat,monkey}} = -134.83$, so cat and monkey are joined first.
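Steps 1 and 2 for the first round, sketched in Python on the carnivore matrix. Row sums include the zero on the diagonal, which does not change them; the function names are choices of this sketch.

```python
SPECIES = ["dog", "bear", "raccoon", "weasel", "seal", "sea lion", "cat", "monkey"]
ROWS = [
    [0, 32, 48, 51, 50, 48, 98, 148],
    [32, 0, 26, 34, 29, 33, 84, 136],
    [48, 26, 0, 42, 44, 44, 92, 152],
    [51, 34, 42, 0, 44, 38, 86, 142],
    [50, 29, 44, 44, 0, 24, 89, 142],
    [48, 33, 44, 38, 24, 0, 90, 142],
    [98, 84, 92, 86, 89, 90, 0, 148],
    [148, 136, 152, 142, 142, 142, 148, 0],
]


def s_values(rows):
    """Step 1: S_x = (sum of the distances from x) / (n - 2)."""
    n = len(rows)
    return [sum(row) / (n - 2) for row in rows]


def m_values(rows, s):
    """Step 2: M_ij = D_ij - S_i - S_j for every pair i < j."""
    n = len(rows)
    return {(i, j): rows[i][j] - s[i] - s[j] for i in range(n) for j in range(i + 1, n)}


S = s_values(ROWS)
M = m_values(ROWS, S)
i, j = min(M, key=M.get)
print([round(s, 1) for s in S])
print(SPECIES[i], SPECIES[j], round(M[(i, j)], 2))  # cat monkey -134.83
```

Note that cat and monkey are not the closest pair in the raw matrix (seal and sea lion are); subtracting $S_i$ and $S_j$ favours pairs that are far from everything else, which is what lets neighbour-joining cope with unequal rates.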

Page 61: Distance matrix  methods

Clustering algorithms: Neighbour-joining

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.

Branch length from cat to the new node cm: $148/2 + (114.5 - 168.33)/2 \approx 47.08$.

Page 62: Distance matrix  methods

Clustering algorithms: Neighbour-joining

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.

Branch length from cat to the new node cm: $148/2 + (114.5 - 168.33)/2 \approx 47.08$.
Branch length from monkey to cm: $148/2 + (168.33 - 114.5)/2 \approx 100.92$.

Page 63: Distance matrix  methods

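Clustering algorithms: Neighbour-joining

[Figure: the eight taxa (cat, sea lion, seal, monkey, weasel, bear, raccoon and dog) arranged as a star.]

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.
4. Join the two taxa at this node and leave all other taxa attached to a central node in a star.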

Page 64: Distance matrix  methods

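Clustering algorithms: Neighbour-joining

[Figure: cat and monkey joined at the new node cm, with branch lengths 47.08 (cat) and 100.92 (monkey); the remaining taxa stay in a star.]

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.
4. Join the two taxa at this node and leave all other taxa attached to a central node in a star.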

Page 65: Distance matrix  methods

Clustering algorithms: Neighbour-joining

               dog     bear  raccoon   weasel     seal sea lion       cm
dog              0       32       48       51       50       48       49
bear            32        0       26       34       29       33
raccoon         48       26        0       42       44       44
weasel          51       34       42        0       44       38
seal            50       29       44       44        0       24
sea lion        48       33       44       38       24        0
cm

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.
4. Join the two taxa at this node and leave all other taxa attached to a central node in a star.
5. Create a new matrix, calculating the distance between the new node u and each other taxon x as $D_{ux} = (D_{ix} + D_{jx} - D_{ij})/2$.

Distance from dog to cm: $(98 + 148 - 148)/2 = 49$.

Page 66: Distance matrix  methods

Clustering algorithms: Neighbour-joining

               dog     bear  raccoon   weasel     seal sea lion       cm
dog              0       32       48       51       50       48       49
bear            32        0       26       34       29       33       36
raccoon         48       26        0       42       44       44       48
weasel          51       34       42        0       44       38       40
seal            50       29       44       44        0       24     41.5
sea lion        48       33       44       38       24        0       42
cm              49       36       48       40     41.5       42        0

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.
4. Join the two taxa at this node and leave all other taxa attached to a central node in a star.
5. Create a new matrix, calculating the distance between the new node u and each other taxon x as $D_{ux} = (D_{ix} + D_{jx} - D_{ij})/2$.

Distance from dog to cm: $(98 + 148 - 148)/2 = 49$.

Page 67: Distance matrix  methods

Clustering algorithms: Neighbour-joining

               dog     bear  raccoon   weasel     seal sea lion       cm
dog              0       32       48       51       50       48       49
bear            32        0       26       34       29       33       36
raccoon         48       26        0       42       44       44       48
weasel          51       34       42        0       44       38       40
seal            50       29       44       44        0       24     41.5
sea lion        48       33       44       38       24        0       42
cm              49       36       48       40     41.5       42        0
S_x           55.6       38     50.4     49.8     46.5     45.8     51.3

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix (now n = 7).

Page 68: Distance matrix  methods

Clustering algorithms: Neighbour-joining

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.

               dog     bear  raccoon   weasel     seal sea lion       cm
dog                  -61.60   -58.00   -54.40   -52.10   -53.40   -57.90
bear        -61.60            -62.40   -53.80   -55.50   -50.80   -53.30
raccoon     -58.00   -62.40            -58.20   -52.90   -52.20   -53.70
weasel      -54.40   -53.80   -58.20            -52.30   -57.60   -61.10
seal        -52.10   -55.50   -52.90   -52.30            -68.30   -56.30
sea lion    -53.40   -50.80   -52.20   -57.60   -68.30            -55.10
cm          -57.90   -53.30   -53.70   -61.10   -56.30   -55.10

The smallest value is $M_{\text{seal,sea lion}} = -68.30$, so seal and sea lion are joined next.

Page 69: Distance matrix  methods

Clustering algorithms: Neighbour-joining

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.

Branch length from seal to the new node ss: $24/2 + (46.5 - 45.8)/2 = 12.35$.
Branch length from sea lion to ss: $24/2 + (45.8 - 46.5)/2 = 11.65$.

Page 70: Distance matrix  methods

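Clustering algorithms: Neighbour-joining

[Figure: as before, with seal and sea lion now joined at a new node ss; cat and monkey remain joined at cm (branch lengths 47.08 and 100.92).]

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.
4. Join the two taxa at this node and leave all other taxa attached to a central node in a star.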

Page 71: Distance matrix  methods

Clustering algorithms: Neighbour-joining

1. Calculate $S_x = \left(\sum_y D_{xy}\right)/(n-2)$, where n is the number of taxa in the current matrix.
2. Calculate $M_{ij} = D_{ij} - S_i - S_j$ and select the pair with the smallest $M_{ij}$.
3. Create a node that joins this pair and calculate the branch lengths as $D_{ij}/2 + (S_i - S_j)/2$ for i and $D_{ij}/2 + (S_j - S_i)/2$ for j.
4. Join the two taxa at this node and leave all other taxa attached to a central node in a star.
5. Create a new matrix, calculating the distance between the new node u and each other taxon x as $D_{ux} = (D_{ix} + D_{jx} - D_{ij})/2$.

               dog     bear  raccoon   weasel       ss       cm
dog              0       32       48       51       37       49
bear            32        0       26       34       19       36
raccoon         48       26        0       42       32       48
weasel          51       34       42        0       29       40
ss              37       19       32       29        0    29.75
cm              49       36       48       40    29.75        0
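A sketch of the whole neighbour-joining loop in Python, following steps 1 to 5 on the carnivore matrix. Storing distances in a dictionary keyed by `frozenset` pairs and naming each new node by bracketing the pair it joins are choices of this sketch. The loop stops when three nodes remain, since these simply attach to one final central node.

```python
from itertools import combinations

SPECIES = ["dog", "bear", "raccoon", "weasel", "seal", "sea lion", "cat", "monkey"]
ROWS = [
    [0, 32, 48, 51, 50, 48, 98, 148],
    [32, 0, 26, 34, 29, 33, 84, 136],
    [48, 26, 0, 42, 44, 44, 92, 152],
    [51, 34, 42, 0, 44, 38, 86, 142],
    [50, 29, 44, 44, 0, 24, 89, 142],
    [48, 33, 44, 38, 24, 0, 90, 142],
    [98, 84, 92, 86, 89, 90, 0, 148],
    [148, 136, 152, 142, 142, 142, 148, 0],
]


def neighbour_joining(names, rows):
    """Return the joins as (i, j, new_node, branch_i, branch_j) and the last three nodes."""
    D = {frozenset((a, b)): rows[p][q]
         for p, a in enumerate(names) for q, b in enumerate(names) if p < q}

    def dist(a, b):
        return D[frozenset((a, b))]

    nodes = list(names)
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        # 1. S_x = (sum of distances from x) / (n - 2)
        S = {x: sum(dist(x, y) for y in nodes if y != x) / (n - 2) for x in nodes}
        # 2. Pair with the smallest M_ij = D_ij - S_i - S_j
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: dist(*pair) - S[pair[0]] - S[pair[1]])
        # 3. New node u and the branch lengths from u to i and to j
        d_ij = dist(i, j)
        branch_i = d_ij / 2 + (S[i] - S[j]) / 2
        branch_j = d_ij / 2 + (S[j] - S[i]) / 2
        u = f"({i},{j})"
        # 5. Distances from u to every other node: (D_ix + D_jx - D_ij) / 2
        for x in nodes:
            if x not in (i, j):
                D[frozenset((u, x))] = (dist(i, x) + dist(j, x) - d_ij) / 2
        # 4. u replaces i and j; the other nodes stay on the star's centre
        nodes = [x for x in nodes if x not in (i, j)] + [u]
        joins.append((i, j, u, branch_i, branch_j))
    return joins, nodes


joins, remaining = neighbour_joining(SPECIES, ROWS)
for i, j, u, bi, bj in joins:
    print(f"join {i} ({bi:.2f}) and {j} ({bj:.2f}) -> {u}")
print("last three nodes:", remaining)
```

With four nodes left, the two candidate pairings have equal $M_{ij}$ values, so which of them is printed as round 5 depends on the order in which `min` meets them.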

Page 72: Distance matrix  methods

Clustering algorithms: Neighbour-joining

Round 3: bear + raccoon

[Figure: bear and raccoon joined at a new node br; cm (cat and monkey, branch lengths 47.08 and 100.92) and ss (seal and sea lion) as before.]

Page 73: Distance matrix  methods

Clustering algorithms: Neighbour-joining

Round 4: (bear + raccoon) + dog

[Figure: br and dog joined at a new node brd; cm and ss as before.]

Page 74: Distance matrix  methods

Clustering algorithms: Neighbour-joining

Round 5: (cat + monkey) + weasel

[Figure: cm and weasel joined at a new node cmw; ss and brd as before.]

Page 75: Distance matrix  methods

Clustering algorithms: Neighbour-joining

Round 6: (seal + sea lion) + (bear + raccoon + dog)

[Figure: ss and bdr joined at a new node bdrss, which connects to cmw.]

Page 76: Distance matrix  methods

Clustering algorithms: Neighbour-joining

[Figure: the finished neighbour-joining tree, with internal nodes cm, ss, br, bdr, cmw and bdrss; cat and monkey have branch lengths 47.08 and 100.92.]

Page 77: Distance matrix  methods

Clustering algorithms: Neighbour-joining

[Figure: the neighbour-joining tree (first) and the UPGMA tree (second) for the same eight species: cat, sea lion, seal, monkey, weasel, bear, raccoon and dog.]