1.2 Rooting the tree and giving length to branches


Posted on 19-Dec-2015


Rooted vs. unrooted trees

[Figure: a rooted tree and an unrooted tree on the same three leaves, labeled 1, 2 and 3.]


Rooted vs. unrooted

The position of the root does not affect the MP score.

Exercise: Draw all alternative rootings of the MP tree. Evaluate one of them, and show that the MP score does not change.


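More intuition why rooting does not change the score:

[Figure: gene number 1, option number 1. A tree with leaves s1, s4, s3, s2, s5 carrying the character states 1, 1, 1, 0, 0, with reconstructed states (0 or 1) at the internal nodes.]

The change always falls on the same branch, no matter where the root is positioned.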
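The invariance can also be checked numerically. The sketch below is a minimal illustration under stated assumptions. It takes a hypothetical unrooted topology for s1 to s5, because the figure's exact tree is not recoverable, and the character states 1, 1, 1, 0, 0 from the figure. It roots the tree on each of its 7 branches and scores every rooting with Fitch's small-parsimony algorithm. The names `EDGES`, `root_on_edge` and `fitch` are illustrative and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical unrooted topology ((s1, s4), s3, (s2, s5)); x, y, z are internal nodes
EDGES = [("s1", "x"), ("s4", "x"), ("x", "y"), ("s3", "y"),
         ("y", "z"), ("s2", "z"), ("s5", "z")]
STATES = {"s1": 1, "s4": 1, "s3": 1, "s2": 0, "s5": 0}  # gene 1 from the figure


def adjacency(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def root_on_edge(adj, u, v):
    """Place the root on branch (u, v); return the rooted tree as nested tuples."""
    def build(node, parent):
        children = [n for n in adj[node] if n != parent]
        if not children:
            return node  # a leaf
        return tuple(build(c, node) for c in children)
    return (build(u, v), build(v, u))


def fitch(tree, states):
    """Fitch's algorithm: return (state set, number of changes) for a rooted binary tree."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    (left_set, left_cost), (right_set, right_cost) = (fitch(t, states) for t in tree)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


adj = adjacency(EDGES)
scores = [fitch(root_on_edge(adj, u, v), STATES)[1] for u, v in EDGES]
print(scores)  # -> [1, 1, 1, 1, 1, 1, 1]: the same score for every rooting
```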


How can we root the tree? We want rooted trees!


Gorilla gorilla (gorilla)

Homo sapiens (human)

Pan troglodytes (chimpanzee)

Gallus gallus (chicken)


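Evaluate all 3 possible UNROOTED trees. Here "A, B | C, D" denotes the unrooted tree whose internal branch separates A and B from C and D:

Tree 1: Human, Chimp | Chicken, Gorilla

Tree 2: Human, Gorilla | Chimp, Chicken

Tree 3: Human, Chicken | Chimp, Gorilla

[Figure: one of the three trees is labeled the MP tree.]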
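Here the rooting method itself is left to the figures. A common approach, assumed in this sketch, is outgroup rooting. Chicken is the only non-primate in the set, so it serves as the outgroup and the root is placed on the branch leading to it. Tree 1 is used purely for illustration. `root_at_outgroup` and the internal node names x and y are hypothetical.

```python
from collections import defaultdict

# Tree 1 (Human, Chimp | Chicken, Gorilla) as an edge list; x and y are internal nodes
EDGES = [("Human", "x"), ("Chimp", "x"), ("x", "y"), ("Chicken", "y"), ("Gorilla", "y")]


def root_at_outgroup(edges, outgroup):
    """Put the root on the branch that connects the outgroup to the rest of the tree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def build(node, parent):
        children = [n for n in adj[node] if n != parent]
        return node if not children else tuple(build(c, node) for c in children)

    (attach,) = adj[outgroup]  # a leaf has exactly one neighbor
    return (outgroup, build(attach, outgroup))


print(root_at_outgroup(EDGES, "Chicken"))
# -> ('Chicken', (('Human', 'Chimp'), 'Gorilla'))
```

In the rooted result, human and chimpanzee are sister taxa, and gorilla joins them before the chicken lineage branches off.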


HOW MANY TREES


How many rooted trees

TR = “TREE ROOTED”: TR(n) is the number of rooted trees with n sequences. Written with nested parentheses:

N=2, TR(2) = 1: (a,b).

N=3, TR(3) = 3: ((a,b),c), ((a,c),b), ((b,c),a).

N=4, TR(4) = 15.

[Figure: all rooted trees for n = 2, 3 and 4, with leaves labeled a, b, c, d.]


How many rooted trees

TR = “TREE ROOTED”

[Figure: the tree (a,b) and the trees obtained by adding “c”, and then “d”, at each possible place.]

2 branches: 3 possible places to add “c”.

4 branches: 5 possible places to add “d”.

6 branches: 7 possible places to add “e”.

In each case the extra place is above the current root, which creates a new root.

The number of branches increases by 2 each time, so it forms an arithmetic series: 0, 2, 4, 6, 8, …. With $A(n) = A(1) + (n-1)d$, $A(1) = 0$ and $d = 2$, we get $A(n) = 2(n-1) = 2n-2$.


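Each time, the new leaf can be added in Br(n)+1 places, where Br(n) = 2n-2 is the number of branches of a rooted tree with n sequences. Therefore

$$TR(n+1) = TR(n)\,(Br(n)+1) = TR(n)\,(2n-1)$$

$$TR(5) = TR(4)\cdot 7 = TR(3)\cdot 5\cdot 7 = TR(2)\cdot 3\cdot 5\cdot 7 = 1\cdot 3\cdot 5\cdot 7$$

$$TR(n) = 1\cdot 3\cdot 5\cdot 7 \cdots (2n-3)$$

where TR(n) is the number of rooted trees with n sequences.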
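The recursion can be checked by brute force. This sketch builds every rooted tree exactly as described: the next leaf is attached to each branch, or above the root. It then compares the count with the product 1·3·5⋯(2n-3). Trees are nested tuples, and the function names are hypothetical.

```python
from math import prod


def attach_everywhere(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`,
    including the branch above the current root (which creates a new root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in attach_everywhere(right, leaf):
            yield (left, new_right)


def rooted_trees(leaves):
    """All rooted binary trees on `leaves` (at least 2), built by stepwise addition."""
    trees = [(leaves[0], leaves[1])]
    for leaf in leaves[2:]:
        trees = [new for old in trees for new in attach_everywhere(old, leaf)]
    return trees


def tr_product(n):
    """TR(n) = 1 * 3 * 5 * ... * (2n - 3)."""
    return prod(range(1, 2 * n - 2, 2))


for n in range(2, 7):
    # prints n, the enumerated count, and the formula: 1, 3, 15, 105, 945 in both columns
    print(n, len(rooted_trees("abcdefg"[:n])), tr_product(n))
```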

Recall that $n! = 1\cdot 2\cdot 3 \cdots n$ (n factorial). Multiplying and dividing by the even numbers $2\cdot 4\cdot 6\cdots(2n-4)$ gives

$$TR(n) = 1\cdot 3\cdot 5\cdots(2n-3) = \frac{1\cdot 2\cdot 3\cdot 4\cdots(2n-3)}{2\cdot 4\cdot 6\cdots(2n-4)} = \frac{(2n-3)!}{(2\cdot 1)(2\cdot 2)(2\cdot 3)\cdots(2(n-2))} = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

This product of odd numbers is the double factorial:

$$TR(n) = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} = (2n-3)!!$$
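A quick numerical check that the closed form and the double factorial agree. The function names are illustrative.

```python
from math import factorial, prod


def tr_closed_form(n):
    """TR(n) = (2n-3)! / (2^(n-2) * (n-2)!), for n >= 2."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def double_factorial(k):
    """k!! = k * (k-2) * (k-4) * ... down to 1 (k is odd here)."""
    return prod(range(k, 0, -2))


for n in range(2, 11):
    assert tr_closed_form(n) == double_factorial(2 * n - 3)
print([tr_closed_form(n) for n in range(2, 8)])  # -> [1, 3, 15, 105, 945, 10395]
```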


How many unrooted trees

Ex: Show that the number of unrooted trees is given by $1\cdot 3\cdot 5\cdots(2n-5)$, where n is the number of sequences.

Open questions: A closed formula does not exist, though a recursion formula does (Felsenstein 1987, Schroder 1870). There are further results on the asymptotic rate at which these numbers rise, on the number of tree shapes, etc.
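Here is a numerical check for the exercise; it is not a proof. An unrooted binary tree with k leaves has 2k-3 branches, and attaching leaf k+1 to any one of them gives a distinct tree. The sketch starts from the 3-leaf star and enumerates unrooted trees as edge lists. The internal-node names i0, i1, … are hypothetical bookkeeping.

```python
from math import prod


def unrooted_trees(leaves):
    """All unrooted binary trees on `leaves` (at least 3), as lists of edges."""
    trees = [[(leaves[0], "i0"), (leaves[1], "i0"), (leaves[2], "i0")]]  # 3-leaf star
    for k, leaf in enumerate(leaves[3:], start=1):
        new_node = f"i{k}"  # internal node created when this leaf is added
        next_trees = []
        for edges in trees:
            for idx, (u, v) in enumerate(edges):
                rest = edges[:idx] + edges[idx + 1:]
                # split branch (u, v) with the new node and hang the new leaf from it
                next_trees.append(rest + [(u, new_node), (new_node, v), (leaf, new_node)])
        trees = next_trees
    return trees


def tu_product(n):
    """1 * 3 * 5 * ... * (2n - 5), for n >= 3."""
    return prod(range(1, 2 * n - 4, 2))


for n in range(3, 8):
    trees = unrooted_trees("abcdefg"[:n])
    assert all(len(edges) == 2 * n - 3 for edges in trees)  # 2n-3 branches each
    print(n, len(trees), tu_product(n))
```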


HEURISTIC SEARCH


There are many trees...

We cannot go over all the trees. Instead, we will try to find the best tree heuristically. These are approximate solutions…


Finding the maximum is the same thing as finding the minimum

Say we have a computer procedure that, given a function, finds its minimum, and we want to find the maximum of a function f(x). We can simply find the minimum of -f(x): it is attained at the same x, and its value is minus the maximum of f(x).

Example:

f(0) = 3; f(1) = 7; f(2) = -5; f(3) = 0. Then max f(x) = 7 and argmax f(x) = 1.

-f(0) = -3; -f(1) = -7; -f(2) = 5; -f(3) = 0. Then min(-f(x)) = -7 and argmin(-f(x)) = 1.
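The same trick in code, as a minimal sketch. `minimize` is a hypothetical stand-in for the "procedure that finds the minimum", implemented here as a plain scan over a finite domain. The values are the ones from the example.

```python
def minimize(g, domain):
    """Stand-in minimizer: return (argmin, min) of g over a finite domain."""
    x = min(domain, key=g)
    return x, g(x)


def maximize(f, domain):
    """Find the maximum of f by minimizing -f."""
    x, neg_value = minimize(lambda x: -f(x), domain)
    return x, -neg_value


f = {0: 3, 1: 7, 2: -5, 3: 0}.__getitem__
print(maximize(f, range(4)))  # -> (1, 7)
```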

[Figure: hill climbing in tree space. The search starts from a tree with score 1700, whose neighbors score 1825, 1710, 1410 and 1695. The neighbors of the tree with score 1825 score 1828, 1910 and 1800. The climb ends at a tree with max score = 2900.]

Problem number 1: local maximum

[Figure: the search stops at a local max (score 2900), although the global max (score 3100) lies elsewhere; a tree with score 2100 is also shown.]


This algorithm is “greedy”: it seizes the first improvement it encounters.

One way to avoid local maxima is to start from many random starting points.
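A minimal sketch of greedy hill climbing with restarts. Tree space is replaced by a hypothetical one-dimensional landscape: position i has score SCORES[i], and its neighbors are i-1 and i+1. In a real search, `neighbors` would return, for example, the NNI neighbors of a tree, and `score` would be the negated parsimony score. The values echo the slides, but the layout is invented.

```python
import random

# Hypothetical landscape standing in for tree space (index = tree, value = score)
SCORES = [1700, 1825, 2900, 2100, 3100, 1410]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(SCORES)]


def score(i):
    return SCORES[i]


def hill_climb(start):
    """Greedy ascent: move to the first better neighbor; stop when none is better."""
    current = start
    while True:
        better = next((n for n in neighbors(current) if score(n) > score(current)), None)
        if better is None:
            return current  # a (possibly only local) maximum
        current = better


def climb_with_restarts(starts):
    """Climb from each starting point and keep the best maximum found."""
    return max((hill_climb(s) for s in starts), key=score)


print(score(hill_climb(0)))  # -> 2900, stuck on the local maximum
random_starts = random.sample(range(len(SCORES)), k=3)
print(score(climb_with_restarts(random_starts)))  # 3100 if a start lands near the global max
```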


Several options to define a neighbor.

[Figure: two ways of defining a neighbor, Option 1 and Option 2.]


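Nearest-neighbor interchange

[Figure: an internal branch separating subtrees A, B from C, D. Swapping subtrees across this branch gives the two other arrangements, A, D | B, C and A, C | B, D.]

Each internal branch defines two neighbors.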
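In code, NNI around one internal branch needs only the four subtrees hanging off that branch. In this sketch a quartet ((A, B), (C, D)) stands for the branch. A to D may be single leaves or whole subtrees, for example ("A", "E"). The function name is hypothetical.

```python
def nni_neighbors(quartet):
    """Return the two NNI neighbors of the arrangement ((a, b), (c, d)) around one internal branch."""
    (a, b), (c, d) = quartet
    return [((a, c), (b, d)),   # swap b and c
            ((a, d), (c, b))]   # swap b and d


print(nni_neighbors((("A", "B"), ("C", "D"))))
# -> [(('A', 'C'), ('B', 'D')), (('A', 'D'), ('C', 'B'))]
```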


How many neighbors do we check each time?

An unrooted tree of n taxa has 2n-3 branches. Of these, n are external (one per taxon), and only the n-3 internal branches are interesting, because NNI is possible only on internal branches. Each internal branch defines two neighbors, so the total number of neighbors in each NNI cycle is 2(n-3) = 2n-6.

[Figure: a tree with taxa A, B, C, D, E, with its internal and external branches marked.]
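The count can be checked on a small tree. Internal branches are those whose two endpoints both have degree greater than 1. The five-taxon topology below is hypothetical.

```python
from collections import Counter


def nni_neighbor_count(edges):
    """2 x (number of internal branches) for an unrooted tree given as an edge list."""
    degree = Counter(node for edge in edges for node in edge)
    internal = [(u, v) for u, v in edges if degree[u] > 1 and degree[v] > 1]
    return 2 * len(internal)


# Hypothetical 5-taxon tree with internal nodes x, y, z
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]
n = 5
print(len(edges), 2 * n - 3)                 # -> 7 7 (all branches)
print(nni_neighbor_count(edges), 2 * n - 6)  # -> 4 4 (NNI neighbors)
```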


I am greedy


Greedy variants

(1) Most greedy: Start searching your neighbors. If you find something better, move there and start the search again.

(2) Just greedy: Check ALL your neighbors. Move to the one with the highest score.

(3) Smart greedy: Try all NNIs of all trees that are tied for the best score.

There are many other variants of the greedy search that will not be discussed in this course.
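Variants (1) and (2) differ only in the rule for choosing the next move. In this sketch the starting tree T0 scores 1700, and its neighbors reuse the slide's scores 1710, 1825, 1410 and 1695. The neighbor order and the names T1 to T4 are hypothetical. With this order, the most greedy rule takes the first improvement it finds (1710), while the just greedy rule takes the best one (1825).

```python
# Hypothetical neighborhood: tree -> score, and T0's neighbors in the order they are examined
SCORE = {"T0": 1700, "T1": 1710, "T2": 1825, "T3": 1410, "T4": 1695}
NEIGHBORS = {"T0": ["T1", "T2", "T3", "T4"]}


def most_greedy_step(tree):
    """Variant (1): move to the first neighbor that is better, if any."""
    for cand in NEIGHBORS.get(tree, []):
        if SCORE[cand] > SCORE[tree]:
            return cand
    return None  # no better neighbor: a maximum


def just_greedy_step(tree):
    """Variant (2): look at ALL neighbors and move to the highest-scoring one, if it is better."""
    best = max(NEIGHBORS.get(tree, []), key=SCORE.get, default=None)
    if best is not None and SCORE[best] > SCORE[tree]:
        return best
    return None


print(most_greedy_step("T0"), just_greedy_step("T0"))  # -> T1 T2
```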


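SPR = SUBTREE PRUNING AND REGRAFTING

1. Choose a branch and cut it in two.

2. Remove the sticky end from one subtree.

3. Connect the remaining sticky end to one branch in the other subtree.

[Figure: a tree with taxa A, B, C, D, E is cut into two subtrees, and the pruned subtree is regrafted onto different branches of the other subtree.]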
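A compact way to experiment with SPR is its rooted analogue on nested-tuple trees. You prune any subtree, so that its parent node disappears and its sibling takes the parent's place. You then regraft it onto every branch of what remains, including the branch above the root. This differs from the unrooted operation on the slide. The helper names are hypothetical. Trees are put into a canonical child order so that equal trees compare equal.

```python
def canon(tree):
    """Canonical form: sort children so that (x, y) and (y, x) are the same tree."""
    if not isinstance(tree, tuple):
        return tree
    return tuple(sorted((canon(child) for child in tree), key=repr))


def subtrees(tree):
    yield tree
    if isinstance(tree, tuple):
        for child in tree:
            yield from subtrees(child)


def prune(tree, target):
    """Remove `target`; its parent node disappears and its sibling takes the parent's place."""
    if tree == target:
        return None
    if not isinstance(tree, tuple):
        return tree
    left, right = (prune(child, target) for child in tree)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def graft_everywhere(tree, piece):
    """Attach `piece` to every branch of `tree`, including the branch above the root."""
    yield (tree, piece)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in graft_everywhere(left, piece):
            yield (new_left, right)
        for new_right in graft_everywhere(right, piece):
            yield (left, new_right)


def spr_neighbors(tree):
    """All rooted trees one prune-and-regraft move away from `tree`."""
    result = set()
    for piece in subtrees(tree):
        if piece == tree:
            continue  # the whole tree cannot be pruned
        rest = prune(tree, piece)
        result.update(canon(new) for new in graft_everywhere(rest, piece))
    result.discard(canon(tree))
    return result


print(sorted(spr_neighbors((("A", "B"), "C")), key=repr))
# -> [('A', ('B', 'C')), ('B', ('A', 'C'))]
```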


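TBR = TREE BISECTION AND RECONNECTION

1. Choose a branch and cut it in two.

2. Remove the sticky end from both subtrees.

3. Reconnect the two remaining subtrees anywhere, i.e., with a new branch joining any branch of one subtree to any branch of the other.

[Figure: a tree with taxa A, B, C, D, E, F is bisected, and the two subtrees are reconnected through different pairs of branches.]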
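How many reconnections does one bisection allow? Once the sticky ends are removed, a subtree with k ≥ 2 taxa is an unrooted binary tree with 2k-3 branches, and a single taxon offers just one attachment point. Each TBR move picks one attachment point on each side. The sketch below counts these pairs; one of them restores the original tree. The function names and the 3 + 3 split are assumptions for illustration.

```python
def attachment_points(k):
    """Branches available for reconnection on a subtree with k taxa (k = 1: the taxon itself)."""
    return 1 if k == 1 else 2 * k - 3


def tbr_reconnections(k1, k2):
    """Ways to reconnect the two parts of one bisection (including the original tree)."""
    return attachment_points(k1) * attachment_points(k2)


# A 6-taxon tree cut into {A, B, C} and {D, E, F} (split assumed)
print(tbr_reconnections(3, 3))  # -> 9
print(tbr_reconnections(2, 4))  # -> 5
```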


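Sequential addition

1. Start with a 3-taxon tree.

2. Evaluate all possible additions of the next taxon, and keep the best one.

[Figure: taxa D and then E are added to a starting tree of A, B and C; red marks the best addition at each step.]

One can do rearrangements in each addition step to improve the resulting tree.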
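A minimal sketch of sequential addition with the MP score. Because the MP score does not depend on the root, the sketch uses rooted nested-tuple trees and scores them with Fitch's algorithm, summed over sites. The four-site alignment is hypothetical. Ties are broken by taking the first best addition found.

```python
def fitch(tree, column):
    """Fitch's algorithm for one site: (state set, number of changes)."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = (fitch(child, column) for child in tree)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)


def mp_score(tree, alignment):
    """Parsimony score = sum of Fitch changes over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
               for i in range(n_sites))


def attach_everywhere(tree, leaf):
    """Every way to add `leaf` to one branch of `tree` (including above the root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in attach_everywhere(right, leaf):
            yield (left, new_right)


def sequential_addition(alignment, order):
    """Start from the first 3 taxa; add each next taxon at its best position."""
    tree = ((order[0], order[1]), order[2])
    for taxon in order[3:]:
        tree = min(attach_everywhere(tree, taxon), key=lambda t: mp_score(t, alignment))
    return tree, mp_score(tree, alignment)


ALIGNMENT = {"A": "0011", "B": "0011", "C": "1100", "D": "1100", "E": "0000"}  # hypothetical
tree, score = sequential_addition(ALIGNMENT, ["A", "B", "C", "D", "E"])
print(tree, score)  # score 4: one change per variable site, the lowest possible here
```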


Star decomposition

1. Start with an n-taxon star tree.

2. In each step, find the best pair of taxa to separate from the star's root, and group them together.

[Figure: successive steps of star decomposition on taxa A, B, C, D, E; red marks the best pair to group together at each step.]

One can do rearrangements in each step to improve the resulting tree.
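A minimal sketch of star decomposition with the MP criterion. The slides do not say how a star, which is not binary, is scored. This sketch assumes a generalization of Fitch's algorithm to nodes with any number of children: keep the states that the most children share, and count one change for each child that lacks them. On binary nodes this reduces to Fitch. The five-taxon alignment is hypothetical. The search stops when three groups remain, which is a fully resolved unrooted tree.

```python
from collections import Counter
from itertools import combinations


def node_score(node, column):
    """Bottom-up pass for nodes with any number of children (Fitch when binary).
    Returns (state set, number of changes)."""
    if not isinstance(node, tuple):
        return {column[node]}, 0
    results = [node_score(child, column) for child in node]
    counts = Counter(state for states, _ in results for state in states)
    top = max(counts.values())
    states = {s for s, c in counts.items() if c == top}
    changes = sum(cost for _, cost in results) + len(node) - top
    return states, changes


def mp_score(tree, alignment):
    n_sites = len(next(iter(alignment.values())))
    return sum(node_score(tree, {t: s[i] for t, s in alignment.items()})[1]
               for i in range(n_sites))


def star_decomposition(alignment):
    groups = list(alignment)  # the star: every taxon hangs from the root
    while len(groups) > 3:
        best = None
        for i, j in combinations(range(len(groups)), 2):
            rest = [g for k, g in enumerate(groups) if k not in (i, j)]
            candidate = [(groups[i], groups[j])] + rest  # separate pair (i, j) from the root
            score = mp_score(tuple(candidate), alignment)
            if best is None or score < best[0]:
                best = (score, candidate)
        groups = best[1]
    tree = tuple(groups)
    return tree, mp_score(tree, alignment)


ALIGNMENT = {"A": "0011", "B": "0011", "C": "1100", "D": "1100", "E": "0000"}  # hypothetical
print(star_decomposition(ALIGNMENT))
```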


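Simulated Annealing

Another method to avoid local maxima.

The idea of simulated annealing is to relax the greediness by allowing downhill steps. For example, we pick one NNI neighbor at random. If it is uphill, we move there. If it is downhill, we move there with a certain probability p.

We can control the probability p. At the beginning of the search we allow p to be high; as the search progresses, we reduce p (i.e., make the search more greedy). The probability of accepting a move is

$$p(\Delta E, T) = \begin{cases} 1 & \text{if } \Delta E > 0 \\ e^{\Delta E / T} & \text{otherwise} \end{cases}$$

where ΔE is the change in score caused by the move (positive means uphill) and T is a "temperature" that is lowered as the search progresses.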
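A minimal sketch of the idea, using the acceptance rule above. The landscape is a hypothetical one-dimensional stand-in for tree space, in which the neighbors of i are i-1 and i+1. The starting temperature, the geometric cooling factor and the number of steps are illustrative choices, not values from the slides.

```python
import math
import random

# Hypothetical landscape standing in for tree space (index = tree, value = score)
SCORES = [1700, 1825, 2900, 2100, 3100, 1410]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(SCORES)]


def acceptance_probability(delta, temperature):
    """p = 1 for an uphill move (delta > 0), exp(delta / T) otherwise."""
    return 1.0 if delta > 0 else math.exp(delta / temperature)


def simulated_annealing(start, t0=1000.0, cooling=0.95, steps=500, rng=random):
    current = best = start
    temperature = t0
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))  # one random neighbor
        delta = SCORES[candidate] - SCORES[current]
        if rng.random() < acceptance_probability(delta, temperature):
            current = candidate
            if SCORES[current] > SCORES[best]:
                best = current
        temperature *= cooling  # lower T: the search becomes greedier
    return best


best = simulated_annealing(0, rng=random.Random(1))
print(SCORES[best])
```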


Branch and Bound


There are many trees...

We cannot go over all the trees, so we looked for the best tree with approximate solutions… But what if we want to make sure we find the global maximum?

There is a way that is more efficient than going over all possible trees. It is called BRANCH AND BOUND, a general technique in computer science that can be applied to phylogeny.


BRANCH AND BOUND

To exemplify the BRANCH AND BOUND (BNB) method, we will use an example not connected to evolution. Later, when the general BNB method is understood, we will see how to apply this method to finding the MP tree. We will present the shortest Hamiltonian path (SHP) problem.


THE SHP PROBLEM (adapted to Israel).

A guard has to visit n check-points on a map. The problem is to find the shortest route that goes through all the points, where the starting point may also be chosen freely.

Naïve approach (say for 5 points): there are 5 possible starting points. For each starting point there are 4 possible “next steps”. For each combination of starting point and first step, there are 3 possible second steps, and so on. Altogether there are 5*4*3*2*1 = 5! = 120 possible solutions.


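THE SHP TREE

[Figure: the search tree of all routes for 5 points. The first level holds the 5 possible starting points (1 to 5), each of which branches into the 4 remaining points, then into the 3 remaining points, and so on; only part of the tree is drawn.]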


THE SHP NAÏVE APPROACH

Each solution can be represented as a permutation:

(1,2,3,4,5) (1,2,3,5,4) (1,2,4,3,5) (1,2,4,5,3) (1,2,5,3,4) … We can go over the list and find the one giving the shortest route.
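The naïve approach fits in a few lines with `itertools.permutations`. The check-point coordinates are hypothetical, and distances are Euclidean (`math.dist`, Python 3.8+). Because the route may start at any point, all n! orderings are scored.

```python
import math
from itertools import permutations

# Hypothetical coordinates of 5 check-points, numbered as on the slide
POINTS = {1: (0, 0), 2: (2, 1), 3: (5, 0), 4: (1, 4), 5: (4, 3)}


def route_length(route, points):
    return sum(math.dist(points[a], points[b]) for a, b in zip(route, route[1:]))


def shortest_route_brute_force(points):
    """Score all n! permutations and keep the shortest route."""
    best = min(permutations(points), key=lambda r: route_length(r, points))
    return best, route_length(best, points)


route, length = shortest_route_brute_force(POINTS)
print(route, round(length, 2))
```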


However, for 15 points, for example, there are 15! = 1,307,674,368,000 permutations.

The number of solutions grows too fast (faster than exponentially).


THE SHP HEURISTIC APPROACH

Start from a random point and always go to the closest point not yet visited. This approach does not work so well…
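The nearest-point heuristic, sketched on four hypothetical points on a line: P1 to P4 at x = 0, 1, -1.5 and 3. Starting from P1, it greedily walks to P2, then to P4, and only then back to P3. That route has length 7.5, whereas the route P3, P1, P2, P4 has length 4.5.

```python
import math

# Hypothetical check-points on a line, chosen to show how the greedy choice can go wrong
POINTS = {"P1": (0, 0), "P2": (1, 0), "P3": (-1.5, 0), "P4": (3, 0)}


def route_length(route, points):
    return sum(math.dist(points[a], points[b]) for a, b in zip(route, route[1:]))


def nearest_point_route(points, start):
    """Greedy heuristic: from the current point, always go to the closest unvisited point."""
    route = [start]
    unvisited = set(points) - {start}
    while unvisited:
        here = points[route[-1]]
        nxt = min(unvisited, key=lambda p: math.dist(here, points[p]))
        route.append(nxt)
        unvisited.remove(nxt)
    return route


greedy = nearest_point_route(POINTS, "P1")
print(greedy, route_length(greedy, POINTS))            # -> ['P1', 'P2', 'P4', 'P3'] 7.5
print(route_length(["P3", "P1", "P2", "P4"], POINTS))  # -> 4.5
```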


Computation times

The question is how the computation time depends on n, the size of the problem.

In very good cases, the computation time scales linearly with n: it increases by a constant amount each time n increases by one.

In polynomial time, the computation time is a polynomial function of n, for example $CT(n) = 7n^2$.


No matter what polynomial function we have, an exponential function such as $2^n$ will overtake it for large enough n.
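A quick numerical check of this claim. The sketch scans downward from an arbitrary upper limit to find the point from which $2^n$ beats a given polynomial for good. This is a check on a finite range, not a proof, and the function name is hypothetical.

```python
def overtake_point(poly, limit=200):
    """Smallest n0 such that 2**n > poly(n) for every n from n0 up to `limit`
    (returns limit + 1 if 2**limit does not yet exceed poly(limit))."""
    n0 = limit + 1
    while n0 > 0 and 2 ** (n0 - 1) > poly(n0 - 1):
        n0 -= 1
    return n0


print(overtake_point(lambda n: 7 * n ** 2))  # -> 10  (2**9 = 512 < 567, 2**10 = 1024 > 700)
print(overtake_point(lambda n: n ** 3))      # -> 10
print(overtake_point(lambda n: n ** 5))      # -> 23
```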


NP-complete

Computer science theory shows that there is a class of problems that appear not to have polynomial-time solutions. All these NP-complete problems are equivalent, in the sense that if anyone ever finds a polynomial-time solution to one of them, all of them can be solved in polynomial time. Although it has never been proven that there is no polynomial-time solution to these problems (the biggest open question in computer science), most people believe this to be the case.


NP-hard

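There is another class of problems: the NP-hard problems, which are at least as hard as the NP-complete ones. No polynomial-time solution is known for them, and some NP-hard problems are harder still: even if the NP-complete problems could be solved in polynomial time, this would not necessarily give polynomial-time solutions to all NP-hard problems.

The SHP (finding the shortest route) is one such NP-hard problem!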


[Figure: an example tree with the nucleotides G, A, C, A, G at its nodes.]

Computing the parsimony score of a given tree is not NP-complete.

There are $4^{n-2}$ possible reconstructions, where n is the number of sequences and n-2 is the number of internal nodes (of an unrooted tree).

One could go over all $4^{n-2}$ possible assignments of characters to the internal nodes to find the MP score. However, we have previously shown that although this naïve solution is exponential, a linear-time algorithm exists.
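Both approaches can be compared on a small hypothetical example: an unrooted tree with 5 leaves carrying G, A, C, A, G. The letters come from the figure; the topology is assumed. The brute force scores all $4^{3} = 64$ assignments to the 3 internal nodes. The fast method, here Fitch's algorithm applied after rooting the tree on one branch, visits each node once.

```python
from collections import defaultdict
from itertools import product

# Hypothetical unrooted tree: internal nodes x, y, z; leaves s1..s5
EDGES = [("s1", "x"), ("s2", "x"), ("x", "y"), ("s3", "y"),
         ("y", "z"), ("s4", "z"), ("s5", "z")]
LEAVES = {"s1": "G", "s2": "A", "s3": "C", "s4": "A", "s5": "G"}
INTERNAL = ["x", "y", "z"]


def brute_force_score():
    """Try all 4^(n-2) assignments of nucleotides to the internal nodes."""
    best = None
    for assignment in product("ACGT", repeat=len(INTERNAL)):
        state = {**LEAVES, **dict(zip(INTERNAL, assignment))}
        changes = sum(state[u] != state[v] for u, v in EDGES)
        best = changes if best is None else min(best, changes)
    return best


def fitch_score():
    """Linear time: root on branch (x, y), then one bottom-up Fitch pass."""
    adj = defaultdict(list)
    for u, v in EDGES:
        adj[u].append(v)
        adj[v].append(u)

    def visit(node, parent):  # returns (state set, changes below node)
        children = [c for c in adj[node] if c != parent]
        if not children:
            return {LEAVES[node]}, 0
        (ls, lc), (rs, rc) = (visit(c, node) for c in children)
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)

    (ls, lc), (rs, rc) = visit("x", "y"), visit("y", "x")
    return lc + rc + (0 if ls & rs else 1)


print(brute_force_score(), fitch_score())  # -> 3 3
```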


BNB SOLUTION TO SHP

[Figure: the SHP search tree, explored depth-first.]

Shortest path found so far = 15.

At this node the partial route already has length 16: there is no point in checking the rest of its subtree, since every completion can only be longer.
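A minimal depth-first branch-and-bound sketch for the SHP. It uses the simplest bound, the one on the slide: any partial route that is already at least as long as the best complete route is abandoned. The coordinates are hypothetical, and distances are Euclidean.

```python
import math

# Hypothetical coordinates of 5 check-points, numbered as on the slide
POINTS = {1: (0, 0), 2: (2, 1), 3: (5, 0), 4: (1, 4), 5: (4, 3)}


def bnb_shortest_route(points):
    names = list(points)
    best = {"length": math.inf, "route": None}

    def extend(route, length):
        if length >= best["length"]:
            return  # bound: this partial route is already too long
        if len(route) == len(names):
            best["length"], best["route"] = length, route  # new shortest route
            return
        last = points[route[-1]]
        for nxt in names:  # branch: try every point not yet visited
            if nxt not in route:
                extend(route + [nxt], length + math.dist(last, points[nxt]))

    for start in names:
        extend([start], 0.0)
    return best["route"], best["length"]


print(bnb_shortest_route(POINTS))
```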


Back to finding the MP tree

Finding the MP tree is NP-hard…

BNB helps, although its running time is still exponential in the worst case…


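The MP search tree

[Figure: the search tree for MP is built by adding the sequences one at a time. Starting from the tree of sequences 1, 2 and 3, sequence 4 is added to branch 1; the resulting tree has 5 branches, and sequence 5 is added to branch 2.]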

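[Figure: the MP search tree with the MP score of every tree. The three-sequence tree scores 30. Adding sequence 4 gives three trees, scoring 43, 55 and 39. Adding sequence 5 gives five trees under each of them: 52, 54, 52, 53, 58 under 43; 61, 56, 59, 61, 69 under 55; and 53, 51, 42, 47, 47 under 39.]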

MP-BNB

The search tree is explored depth-first, from left to right. Adding a sequence can never decrease the MP score. Therefore, a tree whose score already reaches the best record cannot lead to a better complete tree, and its subtree is skipped.

The first complete tree reached scores 52: Best record = 52. The other trees under 43 (54, 52, 53, 58) do not improve on it.

The tree scoring 55 is already worse than the record of 52, so its five descendants (61, 56, 59, 61, 69) are never evaluated.

The tree scoring 39 is below the record, so its subtree is searched: 53 does not improve the record, 51 becomes the new record, then 42 becomes the new record, and 47 and 47 do not improve it.

Best TREE: MP score = 42. Total trees visited: 14.
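The walkthrough can be replayed in code. The scores are the ones from the figure, with the five-sequence trees assigned to their parents (43, 55, 39) as read from the figure. The (score, children) encoding and the function name are hypothetical.

```python
# (score, children) for every tree in the MP search tree from the slides
SEARCH_TREE = (30, [
    (43, [(52, []), (54, []), (52, []), (53, []), (58, [])]),
    (55, [(61, []), (56, []), (59, []), (61, []), (69, [])]),
    (39, [(53, []), (51, []), (42, []), (47, []), (47, [])]),
])


def mp_bnb(node, state):
    """Depth-first branch and bound; `state` holds the best record and a visit counter."""
    score, children = node
    state["visited"] += 1                 # this tree's score is evaluated
    if score >= state["best"]:
        return                            # bound: scores only grow as sequences are added
    if not children:
        state["best"] = score             # a complete tree that beats the record
        return
    for child in children:
        mp_bnb(child, state)


state = {"best": float("inf"), "visited": 0}
mp_bnb(SEARCH_TREE, state)
print(state)  # -> {'best': 42, 'visited': 14}
```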

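MP-BNB: an improvement

Evaluate all 3 trees with four sequences first (scores 43, 55 and 39), and search the subtree of the best one, 39, first. This finds the tree with score 42.

The “bound” after searching this subtree will be 42, so the subtrees under 43 and 55 need not be searched at all.

Total trees visited: 9.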
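Here is one way to implement the improvement. It assumes the rule "evaluate all children of a tree, then descend into them from the lowest score up" is applied at every level; the slides show it only at the first level. The scores are the same as in the figure, and the names are hypothetical.

```python
SEARCH_TREE = (30, [
    (43, [(52, []), (54, []), (52, []), (53, []), (58, [])]),
    (55, [(61, []), (56, []), (59, []), (61, []), (69, [])]),
    (39, [(53, []), (51, []), (42, []), (47, []), (47, [])]),
])


def mp_bnb_best_first(node, state):
    """Evaluate all children first, then search them in order of increasing score."""
    score, children = node
    if not children:
        state["best"] = min(state["best"], score)
        return
    state["visited"] += len(children)                 # every child tree is evaluated now
    for child in sorted(children, key=lambda c: c[0]):
        if child[0] >= state["best"]:
            break                                     # sorted: all remaining children are bounded too
        mp_bnb_best_first(child, state)


state = {"best": float("inf"), "visited": 1}          # the three-sequence tree counts as visited
mp_bnb_best_first(SEARCH_TREE, state)
print(state)  # -> {'best': 42, 'visited': 9}
```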