Distance between tree topologies

Post on 22-Dec-2015


Splits

[Figure: unrooted tree with leaves A, B, C, D, E, F, G, H]

{A}{BCDEFGH}
{B}{ACDEFGH}
{AB}{CDEFGH}
{C}{ABDEFGH}
{CD}{ABEFGH}
{ABCD}{EFGH}

Each split represents a branch, and there is a one-to-one correspondence between a tree topology and its list of all splits.
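The correspondence between branches and splits can be sketched in Python: removing one branch separates the leaves into two sets, and that pair of sets is the split. This is a minimal sketch, not a reference implementation. The edge list `EDGES`, the internal node names `u1` to `u6`, and the arrangement of E, F, G, H are assumptions made for illustration; only the splits {AB}, {CD} and {ABCD}{EFGH} come from the list above.

```python
from collections import defaultdict

# Hypothetical unrooted tree on leaves A-H; u1..u6 are made-up internal nodes.
EDGES = [
    ("A", "u1"), ("B", "u1"), ("C", "u2"), ("D", "u2"),
    ("u1", "u3"), ("u2", "u3"), ("u3", "u4"),
    ("u4", "u5"), ("u4", "u6"),
    ("E", "u5"), ("F", "u5"), ("G", "u6"), ("H", "u6"),
]

def splits_of(edges):
    """Return the set of splits (bipartitions of the leaf set), one per branch."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = frozenset(node for node, nbrs in adj.items() if len(nbrs) == 1)

    def leaves_behind(node, parent):
        # Leaves reachable from `node` without crossing back to `parent`.
        if node in leaves:
            return {node}
        found = set()
        for nxt in adj[node]:
            if nxt != parent:
                found |= leaves_behind(nxt, node)
        return found

    result = set()
    for u, v in edges:
        side = frozenset(leaves_behind(u, v))
        # Store a split as an unordered pair, so {R1}{R2} equals {R2}{R1}.
        result.add(frozenset([side, leaves - side]))
    return result

def show(split):
    """Format a split in the {..}{..} notation, smaller side first."""
    a, b = sorted(("".join(sorted(s)) for s in split), key=lambda s: (len(s), s))
    return "{%s}{%s}" % (a, b)

all_splits = splits_of(EDGES)
# A split is nontrivial when both sides contain at least two leaves.
internal = {s for s in all_splits if min(len(side) for side in s) > 1}
for s in sorted(internal, key=show):
    print(show(s))
```

Storing each split as a `frozenset` of two `frozenset`s makes the two orderings of the same split compare equal without any extra normalisation.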


Splits

Splits that correspond to external branches are trivial (found in all tree topologies), e.g. {A}{BCDEFGH}, {B}{ACDEFGH}, {C}{ABDEFGH}.

Splits that correspond to internal branches are the ones that determine the topology, e.g. {AB}{CDEFGH}, {CD}{ABEFGH}, {ABCD}{EFGH}.


Splits

For an unrooted binary (fully resolved) tree with n leaves, there are 2n-3 branches: n external branches and n-3 internal branches, hence n-3 nontrivial splits.
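As a worked instance of these counts, take the eight leaves A to H used in the example tree (n = 8):

```latex
\begin{aligned}
\text{branches} &= 2n - 3 = 2\cdot 8 - 3 = 13,\\
\text{external branches (trivial splits)} &= n = 8,\\
\text{internal branches (nontrivial splits)} &= n - 3 = 5.
\end{aligned}
```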


Shared internal branches

[Figure: two unrooted trees on leaves A, B, C, D, E, F, G, H]


Internal branches exist in one tree but not in the other

[Figure: two unrooted trees on leaves A, B, C, D, E, F, G, H]

Robinson-Foulds distance = 6


Robinson-Foulds distance

•The distance was suggested in: Robinson DF and Foulds LR (1981) Comparison of phylogenetic trees. Math Biosci. 53:131-147.

•For unrooted binary trees with n taxa, the minimum distance is 0 and the maximum is 2(n-3), reached when the two trees share no internal branch.

•The distance ignores branch lengths.

•Zero branches are not treated as multifurcations.

•Note that the splits {R1}{R2} and {R2}{R1} are identical.
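Counting the internal branches that exist in one tree but not in the other amounts to taking the symmetric difference of the two sets of nontrivial splits. The sketch below illustrates this in Python. The names `canonical` and `rf_distance` are hypothetical helpers, and the two 8-leaf trees are made up for illustration (they are not necessarily the trees drawn on the slides); each tree is given by one side of each of its internal splits.

```python
TAXA = "ABCDEFGH"

def canonical(side, taxa=TAXA):
    """Represent a split as an unordered pair of leaf sets,
    so that {R1}{R2} and {R2}{R1} compare equal."""
    side = frozenset(side)
    return frozenset([side, frozenset(taxa) - side])

def rf_distance(sides1, sides2, taxa=TAXA):
    """Number of nontrivial splits found in exactly one of the two trees.
    Branch lengths play no role."""
    s1 = {canonical(s, taxa) for s in sides1}
    s2 = {canonical(s, taxa) for s in sides2}
    return len(s1 ^ s2)  # symmetric difference

# Hypothetical trees, each listed by one side of its n-3 = 5 internal splits.
TREE1 = ["AB", "CD", "ABCD", "EF", "GH"]
TREE2 = ["AB", "CD", "ABE", "ABEF", "CDH"]

print(rf_distance(TREE1, TREE1))  # 0: identical topologies
print(rf_distance(TREE1, TREE2))  # 6: only {AB} and {CD} are shared
```

With n = 8 the largest possible value is 2(n-3) = 10, which would require the two trees to share no internal split at all.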


Kuhner-Felsenstein’s “branch score” distance

•The distance was suggested in: Kuhner MK and Felsenstein J (1994) A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459-468.

•The motivation is to extend RF distance so that it accounts ALSO for differences in branch lengths.

•The distance was used to evaluate performance of ML, NJ, and MP in simulations (distance between inferred tree and “true” tree).


Branch-Score (Bs) distance

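If a branch is found in both trees (shared split), its contribution to the distance is the square of the difference between the branch’s lengths in the two trees.

If a branch is found in only one tree, a branch of length 0 is assumed to exist in the other tree.

[Figure: two unrooted four-leaf trees on A, B, C, D. Tree 1 has external branch lengths x_a, x_b, x_c, x_d and internal branch length x_ab; tree 2 has external branch lengths y_a, y_b, y_c, y_d and internal branch length y_ac.]

$$Bs = (x_a - y_a)^2 + (x_b - y_b)^2 + (x_c - y_c)^2 + (x_d - y_d)^2 + (x_{ab} - 0)^2 + (y_{ac} - 0)^2$$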


Branch-Score (Bs) distance

Bs extends RF: setting every branch length to 1 gives Bs = RF.

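[Figure: the same two four-leaf trees on A, B, C, D, with branch lengths x_a, x_b, x_c, x_d, x_ab in tree 1 and y_a, y_b, y_c, y_d, y_ac in tree 2]

$$Bs = (x_a - y_a)^2 + (x_b - y_b)^2 + (x_c - y_c)^2 + (x_d - y_d)^2 + (x_{ab} - 0)^2 + (y_{ac} - 0)^2$$

$$RF = (1-1)^2 + (1-1)^2 + (1-1)^2 + (1-1)^2 + (1-0)^2 + (1-0)^2 = 2$$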
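The rule for Bs (squared difference for a shared branch, length 0 for a branch missing from one tree) and the unit-length special case can be sketched as follows. Each tree is a dictionary from a split label to a branch length. The function name `branch_score`, the label strings and the numeric lengths are assumptions chosen for illustration only.

```python
def branch_score(tree1, tree2):
    """Sum of squared branch-length differences over every split
    that occurs in at least one of the two trees."""
    total = 0.0
    for split in set(tree1) | set(tree2):
        # A split missing from a tree counts as a branch of length 0 there.
        total += (tree1.get(split, 0.0) - tree2.get(split, 0.0)) ** 2
    return total

# Tree 1 groups A with B; tree 2 groups A with C (made-up lengths).
x = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4, "AB|CD": 0.5}
y = {"A": 0.2, "B": 0.2, "C": 0.1, "D": 0.4, "AC|BD": 0.3}
print(branch_score(x, y))

# Same topologies with every branch length set to 1: the external branches
# cancel and each unshared internal branch contributes (1 - 0)^2 = 1.
x_unit = {split: 1.0 for split in x}
y_unit = {split: 1.0 for split in y}
print(branch_score(x_unit, y_unit))  # 2.0, the RF distance of the two trees
```

Using `dict.get(split, 0.0)` encodes the "length 0 in the other tree" convention directly, so shared and unshared branches go through the same line of code.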


Another look at the Bs distance

Consider an array of all possible splits for n taxa, (B1, B2, ..., BN).

Each tree can be represented by such an array, in which Bi = 0 if split i is not found in the tree, and Bi is the length of the corresponding branch if it is.

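The Bs distance between (B1, B2, ..., BN) and (B1’, B2’, ..., BN’) becomes

$$Bs = \sum_{i=1}^{N} (B_i - B_i')^2$$

Bs is therefore the squared Euclidean distance between the two arrays. Its square root is the ordinary Euclidean distance, which is a true distance (the triangle inequality holds); the squared form itself can violate the triangle inequality.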


Are these distances true distances?

Formally, a distance must have 3 properties:

•D(a,a)=0 for all a.

•D(a,b)=D(b,a) for all a,b (symmetry).

•D(a,c)<=D(a,b)+D(b,c) for all a,b,c (the triangle inequality).

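$$Bs = \sum_{i=1}^{N} (B_i - B_i')^2$$

Bs is the squared Euclidean distance between the two split arrays. It satisfies the first two properties, but a squared Euclidean distance can violate the triangle inequality. Its square root is the ordinary Euclidean distance, for which all three properties hold.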
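The three properties can be checked numerically on small split arrays. The sketch below is an illustration under stated assumptions: the arrays describe three hypothetical four-leaf trees with the same topology whose internal branch has length 1, 2 and 3, and the helper names `bs`, `root_bs` and `triangle_holds` are made up. It compares the sum of squares with its square root.

```python
import math

def bs(b1, b2):
    """Sum of squared differences between two split arrays."""
    return sum((p - q) ** 2 for p, q in zip(b1, b2))

def root_bs(b1, b2):
    """Square root of bs: the ordinary Euclidean distance."""
    return math.sqrt(bs(b1, b2))

def triangle_holds(d, a, b, c):
    """Check D(a,c) <= D(a,b) + D(b,c) for one triple."""
    return d(a, c) <= d(a, b) + d(b, c)

# Array order (assumed): A, B, C, D, AB|CD, AC|BD. A 0 means the split is absent.
t1 = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
t2 = [1.0, 1.0, 1.0, 1.0, 2.0, 0.0]
t3 = [1.0, 1.0, 1.0, 1.0, 3.0, 0.0]

print(bs(t1, t1) == 0.0, bs(t1, t2) == bs(t2, t1))  # True True
print(triangle_holds(bs, t1, t2, t3))       # False: 4 > 1 + 1
print(triangle_holds(root_bs, t1, t2, t3))  # True: 2 <= 1 + 1
```

A single failing triple is enough to show that a measure is not a true distance, whereas a passing triple only illustrates the property and does not prove it.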