maximum likelihood (ml) parameter estimation with applications to inferring phylogenetic trees...

Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees. Comput. Genomics, lecture 7a. Presentation partially taken from Dan Geiger, modified by Benny Chor. Background reading: Durbin et al., Chapter 8.

Post on 19-Dec-2015



Maximum Likelihood (ML) Parameter Estimation

with applications to inferring phylogenetic trees

Comput. Genomics, lecture 7a

Presentation partially taken from Dan Geiger, modified by Benny Chor.

Background reading: Durbin et al Chapter 8.


Our Probabilistic Model (Reminder)

Now we know neither the states at the internal node(s) nor the edge parameters pe1, pe2, pe3.

[Figure: a tree with leaf sequences XXYXY, YXYXX and YYYYX, edge parameters pe1, pe2, pe3, and an unknown internal sequence ?????.]

A single edge is a fairly boring tree…
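A single edge has no internal nodes to average over, so the ML estimate has a closed form. Below is a minimal Python sketch. It assumes the edge joins two aligned two-state sequences and that each site changes independently with probability p. It also assumes the likelihood conditions on one endpoint, with no root distribution, as in the later slides. The function names are hypothetical.

```python
import math


def mismatch_count(seq_a: str, seq_b: str) -> int:
    """Number of sites where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def edge_log_likelihood(seq_a: str, seq_b: str, p: float) -> float:
    """log of p^k * (1 - p)^(n - k): k mismatches out of n sites (0 < p < 1)."""
    k = mismatch_count(seq_a, seq_b)
    n = len(seq_a)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def edge_ml_estimate(seq_a: str, seq_b: str) -> float:
    """Maximizer of p^k (1 - p)^(n - k): the fraction of mismatching sites."""
    return mismatch_count(seq_a, seq_b) / len(seq_a)


p_hat = edge_ml_estimate("XXYXY", "YXYXX")  # 2 mismatches out of 5 sites
```

Setting the derivative k/p - (n - k)/(1 - p) to zero gives p = k/n. For XXYXY versus YXYXX this is 2/5 = 0.4.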


Maximum Likelihood

Maximize the likelihood (over the edge parameters), while averaging over the states of the unknown internal node(s).

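[Figure: the same tree, with leaf sequences XXYXY, YXYXX, YYYYX, edge parameters pe1, pe2, pe3, and an unknown internal sequence ?????.]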
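In symbols, the objective above reads as follows. The notation is ours, not the slides': θ stands for all the edge parameters pe, and h ranges over assignments of states to the internal nodes.

$$\hat{\theta} = \arg\max_{\theta} \Pr(D \mid T, \theta) = \arg\max_{\theta} \sum_{h} \Pr(D, h \mid T, \theta)$$

The sum over h is the "averaging" over internal states. With two states and k internal nodes it has 2^k terms, so it should not be computed by direct enumeration.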


Maximum Likelihood (2)

Consider the phylogenetic tree to be a stochastic process.

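[Figure: a tree with a sequence at every node; the leaf sequences are labeled Observed and the internal sequences Unobserved.]

The probability of a transition from character a to character b along edge e is given by the parameter pe. In the two-state model this is pe if a ≠ b and 1 - pe if a = b.

Given the complete tree, the likelihood of the data is determined by the values of the pe's.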
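Below is a minimal Python sketch for one site that makes the complete-data likelihood concrete. It reads "complete tree" as one where every node's state is known, so nothing is averaged. It uses the two-state model and includes no factor for the root state, matching L = L_X + L_Y on the following slides. The tree encoding and function names are hypothetical.

```python
import math

# Hypothetical encoding of a fully labelled tree at one site:
# a node is (state, [(child_node, p_edge), ...]); leaves have no children.


def edge_prob(parent_state: str, child_state: str, p: float) -> float:
    """Two-state model: the state changes with probability p, stays with 1 - p."""
    return p if parent_state != child_state else 1.0 - p


def complete_likelihood(node) -> float:
    """Product of the transition probabilities over all edges below node."""
    state, children = node
    return math.prod(
        edge_prob(state, child[0], p) * complete_likelihood(child)
        for child, p in children
    )


leaf_x, leaf_y = ("X", []), ("Y", [])
tree = ("X", [(("X", [(leaf_x, 0.2), (leaf_y, 0.3)]), 0.1), (leaf_y, 0.4)])
```

For the example tree, the result is 0.9 · (0.8 · 0.3) · 0.4 = 0.0864. Once the internal states are hidden, the likelihood becomes a sum of such products over all of their assignments.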


Maximum Likelihood (3)

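We assume each site evolves independently of the others.

[Figure: the leaf sequences split into individual sites (columns), each evolving on the same tree.]

This allows us to decompose the likelihood of the data (the sequences at the leaves) into a product over sites, given the (same) tree and edge probabilities θ:

$$\Pr(D \mid \text{Tree}, \theta) = \prod_i \Pr(D^{(i)} \mid \text{Tree}, \theta)$$

where $D^{(i)}$ is the data at site i. This is the first key to an efficient DP algorithm for the tiny ML problem (Felsenstein, 1981).

We will now show how $\Pr(D^{(i)} \mid \text{Tree}, \theta)$ is computed efficiently.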
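The product decomposition can be checked directly against the definition. The minimal Python sketch below computes each site's likelihood by brute force, summing over all internal assignments, and then adds the per-site logarithms. It uses the two-state model with no root prior, matching L = L_X + L_Y on the following slides. The names and the tree encoding are hypothetical. Its cost grows as 2 to the number of internal nodes, which is what the DP in the following slides avoids.

```python
import itertools
import math

# Hypothetical encoding: a leaf is an int (index of its sequence);
# an internal node is a list of (child, p_edge) pairs.


def internal_nodes(node, out=None):
    """Collect the internal nodes (lists) of the tree in preorder."""
    if out is None:
        out = []
    if isinstance(node, list):
        out.append(node)
        for child, _ in node:
            internal_nodes(child, out)
    return out


def site_likelihood_brute_force(tree, column):
    """Sum, over every X/Y assignment to the internal nodes, of the product
    of edge probabilities. Exponential in the number of internal nodes."""
    internal = internal_nodes(tree)
    total = 0.0
    for states in itertools.product("XY", repeat=len(internal)):
        label = {id(n): s for n, s in zip(internal, states)}
        prob = 1.0
        for node in internal:
            for child, p in node:
                child_state = label[id(child)] if isinstance(child, list) else column[child]
                prob *= p if label[id(node)] != child_state else 1.0 - p
        total += prob
    return total


def log_likelihood(tree, sequences):
    """Sites are independent: log Pr(D) is the sum of per-site logs."""
    return sum(math.log(site_likelihood_brute_force(tree, column))
               for column in zip(*sequences))


cherry = [(0, 0.1), (1, 0.2)]  # root with two leaves, sequences 0 and 1
```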


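Computing the Likelihood

[Figure: a root with state X or Y and two subtrees, tree1 and tree2, attached by edges with parameters p1 and p2.]

Let T be a binary tree with subtrees T1 and T2. Let $L_X(D \mid T, \theta)$ be the likelihood of T with X at T's root, that is, the probability of the data at T's leaves given that T's root is in state X. Define $L_Y(D \mid T, \theta)$ similarly.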


Computing the Likelihood (2)

By the definition of likelihood (a sum over all assignments to the internal nodes, including the root),

$$L(D \mid T, \theta) = L_X(D \mid T, \theta) + L_Y(D \mid T, \theta)$$

This is the second key to an efficient DP algorithm for the tiny ML problem (Felsenstein, 1981).


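Computing $L_X(D \mid \text{Tree}, \theta)$

[Figure: a root labeled X with subtrees tree1 and tree2 on edges p1 and p2; the root of each subtree may be X or Y.]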


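Computing $L_X(D \mid \text{Tree}, \theta)$

$$L_X(D \mid \text{Tree}, \theta) = \bigl(L_X(D \mid \text{Tree}_1, \theta)\,(1-p_1) + L_Y(D \mid \text{Tree}_1, \theta)\,p_1\bigr) \cdot \bigl(L_X(D \mid \text{Tree}_2, \theta)\,(1-p_2) + L_Y(D \mid \text{Tree}_2, \theta)\,p_2\bigr)$$

Each factor sums over the two possible states at the root of one subtree: it either keeps the state X (probability 1 - p) or changes to Y (probability p). By symmetry,

$$L_Y(D \mid \text{Tree}, \theta) = \bigl(L_X(D \mid \text{Tree}_1, \theta)\,p_1 + L_Y(D \mid \text{Tree}_1, \theta)\,(1-p_1)\bigr) \cdot \bigl(L_X(D \mid \text{Tree}_2, \theta)\,p_2 + L_Y(D \mid \text{Tree}_2, \theta)\,(1-p_2)\bigr)$$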
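Here is a small worked instance, with numbers chosen only for illustration. Take a cherry whose left leaf is X and right leaf is Y, with p1 = 0.1 and p2 = 0.2. Use the leaf values $L_X = 1, L_Y = 0$ for an X leaf, and $L_X = 0, L_Y = 1$ for a Y leaf by symmetry. For $L_Y$, use the recursion obtained by swapping the roles of X and Y:

$$
\begin{aligned}
L_X &= (1 \cdot 0.9 + 0 \cdot 0.1)\,(0 \cdot 0.8 + 1 \cdot 0.2) = 0.9 \cdot 0.2 = 0.18 \\
L_Y &= (1 \cdot 0.1 + 0 \cdot 0.9)\,(0 \cdot 0.2 + 1 \cdot 0.8) = 0.1 \cdot 0.8 = 0.08 \\
L &= L_X + L_Y = 0.26
\end{aligned}
$$

This matches enumerating the two root states directly: 0.9 · 0.2 for root X plus 0.1 · 0.8 for root Y.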


The Dynamic Programming Algorithm

The algorithm starts from the leaves and proceeds up towards the root. For each subtree visited, it keeps both $L_X(D \mid \text{subtree}, \theta)$ and $L_Y(D \mid \text{subtree}, \theta)$. This enables computing each of the $L_X$ and $L_Y$ likelihoods with respect to T using 5 multiplications and 2 additions.


The Dynamic Programming Algorithm

The algorithm thus takes O(1) floating-point operations per internal node of the tree. If there are n leaves, the number of internal nodes is n-1, so the overall complexity is O(n) per site (O(nm) for m independent sites).


What About Initialization?

Well, this is easy. If T is a leaf that contains X, then $L_X(D \mid T, \theta) = 1$ and $L_Y(D \mid T, \theta) = 0$. (The case where T is a leaf that contains Y is left as a bonus assignment.)
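Together, the recursion and the initialization give the whole algorithm. Below is a minimal Python sketch for one site of the two-state model. The tree encoding and function names are hypothetical. The Y-leaf initialization is filled in by symmetry, and, as on the slides, there is no prior on the root state.

```python
import math

# Hypothetical encoding (not from the slides): a leaf is its state, "X" or
# "Y"; an internal node is a list of (child, p_edge) pairs.


def pruning(node):
    """Return (L_X, L_Y) for the subtree rooted at node, for one site."""
    if isinstance(node, str):
        # Initialization: an X leaf has L_X = 1, L_Y = 0; a Y leaf the reverse.
        return (1.0, 0.0) if node == "X" else (0.0, 1.0)
    lx, ly = 1.0, 1.0
    for child, p in node:
        cx, cy = pruning(child)
        lx *= cx * (1.0 - p) + cy * p  # root X: child root stays X or becomes Y
        ly *= cx * p + cy * (1.0 - p)  # root Y: child root becomes X or stays Y
    return lx, ly


def site_likelihood(tree):
    """L(D | T) = L_X + L_Y at the root (no prior on the root state)."""
    lx, ly = pruning(tree)
    return lx + ly


def log_likelihood(site_trees):
    """Independent sites: sum of per-site log-likelihoods.

    Each element is the same topology carrying that site's leaf states.
    """
    return math.fsum(math.log(site_likelihood(t)) for t in site_trees)
```

The loop multiplies over however many (child, p) pairs a node has. The same code therefore also runs on non-binary trees, with work proportional to the number of edges. Calling it on `[("X", 0.1), ("Y", 0.2)]` reproduces the cherry value 0.26.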


A Few More Question Marks

• What if the tree is not binary? Would it not affect the complexity?
• What if the tree is unrooted? One can show that the symmetry of the substitution probabilities implies that the likelihood is invariant under the choice of root.
• Numerical questions (underflow, stability).
• Non-binary alphabet.
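For the numerical question, one standard remedy is to rescale the pair (L_X, L_Y) at every node and carry the scale factors as a sum of logarithms. This technique is added here; the slides only raise the issue. Below is a minimal Python sketch for the two-state model, with a hypothetical tree encoding and function names.

```python
import math

# Hypothetical encoding: a leaf is "X" or "Y"; an internal node is a list
# of (child, p_edge) pairs.


def scaled_pruning(node):
    """Two-state pruning with per-node rescaling.

    Returns (lx, ly, log_scale); the true subtree likelihoods are
    lx * exp(log_scale) and ly * exp(log_scale). Assumes 0 < p < 1.
    """
    if isinstance(node, str):
        lx, ly = (1.0, 0.0) if node == "X" else (0.0, 1.0)
        return lx, ly, 0.0
    lx, ly, log_scale = 1.0, 1.0, 0.0
    for child, p in node:
        cx, cy, cs = scaled_pruning(child)
        lx *= cx * (1.0 - p) + cy * p
        ly *= cx * p + cy * (1.0 - p)
        log_scale += cs
    m = max(lx, ly)  # rescale so the larger of the two values becomes 1.0
    return lx / m, ly / m, log_scale + math.log(m)


def site_log_likelihood(tree):
    lx, ly, log_scale = scaled_pruning(tree)
    return math.log(lx + ly) + log_scale


def balanced_tree(depth, leaf, p):
    """Toy full binary tree with 2**depth identical leaves."""
    if depth == 0:
        return leaf
    sub = balanced_tree(depth - 1, leaf, p)
    return [(sub, p), (sub, p)]


big = balanced_tree(12, "X", 0.3)  # 4096 leaves
```

For `big`, the log-likelihood is a finite number far below the log of the smallest positive double (about -745). Taking exp of it therefore gives 0.0, and an unscaled pruning pass would likewise underflow to 0.0. Across sites, sum the per-site log-likelihoods, for example with `math.fsum`.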


From Two to Four States Model

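Maximize the likelihood (over the edge parameters), while averaging over the states of the unknown internal node(s).

But what do the edge probabilities mean now?

[Figure: a tree with DNA leaf sequences ACCGT, AAGTT and CGGCT, edge parameters pe1, pe2, pe3, and an unknown internal sequence ?????.]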


From Two to Four States Model (2)

So far, our models consisted of a “regular” tree in which, in addition, edges are assigned substitution probabilities.

For simplicity, we assumed our “DNA” has only two states, say X and Y. If edge e is assigned probability pe, this means that the probability of substitution (X ↔ Y) across e is pe.

Now a single pe can no longer express all 16 - 4 = 12 possible substitution probabilities (16 ordered pairs of nucleotides, minus the 4 pairs with no change).


From Two to Four States Model

The most general model has 12 independent parameters per edge, e.g. pe(C→A), pe(T→A), etc. It need not be symmetric.

Still, most popular models are symmetric and use far fewer parameters per edge. For example, the Jukes-Cantor substitution model assumes an equal substitution probability for every pair of distinct nucleotides (across each edge separately).


The Jukes-Cantor model (1969)

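Jukes-Cantor assume an equal probability of change:

[Figure: the four nucleotides G, A, T, C, with the same probability of change between every pair.]

Substitution probability matrix (rows: from, columns: to), with α the probability of each particular substitution:

$$\begin{array}{c|cccc} & A & C & G & T \\ \hline A & 1-3\alpha & \alpha & \alpha & \alpha \\ C & \alpha & 1-3\alpha & \alpha & \alpha \\ G & \alpha & \alpha & 1-3\alpha & \alpha \\ T & \alpha & \alpha & \alpha & 1-3\alpha \end{array}$$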


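Tiny ML on Four States: Like Before, Only More Cases

The same recursion handles DNA substitution models, amino-acid (AA) substitution models, and so on. The constant (work per node) depends on the alphabet size: with k states, each node takes O(k²) operations per site.

[Figure: a root in state G whose left and right subtrees may each have A, C, G or T at their roots; one highlighted term is P(G→C)·P_C(left subtree).]

$$P_G(\text{subtree}) = \Bigl\{\sum_{N \in \text{Nucleotides}} P(G \to N)\,P_N(\text{left})\Bigr\} \cdot \Bigl\{\sum_{N \in \text{Nucleotides}} P(G \to N)\,P_N(\text{right})\Bigr\}$$

Initialisation: $P_G(\text{leaf}) = 1$ if the leaf is G, and $0$ otherwise (likewise for A, C and T).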
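Below is a minimal Python sketch of the four-state recursion. It uses a Jukes-Cantor matrix on every edge, with α as the per-pair substitution probability, and keeps the two-state convention of no prior on the root state. The tree encoding and function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"


def jukes_cantor(alpha):
    """4x4 JC substitution matrix: 1 - 3*alpha on the diagonal, alpha elsewhere."""
    if not 0.0 <= alpha <= 1.0 / 3.0:
        raise ValueError("alpha must lie in [0, 1/3]")
    return [[1.0 - 3.0 * alpha if a == b else alpha for b in range(4)]
            for a in range(4)]


def pruning(node):
    """Return [P_A, P_C, P_G, P_T] for the subtree rooted at node (one site).

    Hypothetical encoding: a leaf is a nucleotide character; an internal
    node is a list of (child, P) pairs, P being that edge's 4x4 matrix.
    """
    if isinstance(node, str):
        # Initialisation: P_G(leaf) = 1 if the leaf is G, 0 otherwise.
        return [1.0 if n == node else 0.0 for n in NUCLEOTIDES]
    child_vectors = [(pruning(child), P) for child, P in node]
    # For each parent state g: product over children of
    # sum over N of P(g -> N) * P_N(child subtree).
    return [
        math.prod(sum(P[g][n] * vec[n] for n in range(4)) for vec, P in child_vectors)
        for g in range(4)
    ]


def site_likelihood(tree):
    """Sum over the root state (no root prior)."""
    return sum(pruning(tree))
```

The double loop over g and n is where the O(k²) per-node cost comes from. Swapping in another 4x4 matrix per edge, such as Kimura's, needs no other change.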


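Kimura’s K2P model (1980)

The Jukes-Cantor model does not take into account that transition rates (between the purines A↔G and between the pyrimidines C↔T) differ from transversion rates (A↔C, A↔T, C↔G, G↔T).

The Kimura 2-parameter model uses a different substitution matrix, with α the probability of a transition and β the probability of each particular transversion (rows: from, columns: to):

$$\begin{array}{c|cccc} & A & C & G & T \\ \hline A & 1-\alpha-2\beta & \beta & \alpha & \beta \\ C & \beta & 1-\alpha-2\beta & \beta & \alpha \\ G & \alpha & \beta & 1-\alpha-2\beta & \beta \\ T & \beta & \alpha & \beta & 1-\alpha-2\beta \end{array}$$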


Kimura’s K2P model (Cont)

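Using similar methods, this leads to (with α and β now the transition and transversion rates, and t the time):

$$S(t) = \begin{pmatrix} r_t & s_t & u_t & s_t \\ s_t & r_t & s_t & u_t \\ u_t & s_t & r_t & s_t \\ s_t & u_t & s_t & r_t \end{pmatrix}$$

Where:

$$s_t = \tfrac{1}{4}\bigl(1 - e^{-4\beta t}\bigr), \qquad u_t = \tfrac{1}{4}\bigl(1 + e^{-4\beta t} - 2e^{-2(\alpha+\beta)t}\bigr), \qquad r_t = 1 - 2s_t - u_t$$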
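The minimal Python sketch below evaluates S(t) from the formulas above, with rows and columns ordered A, C, G, T. The function name is hypothetical. Three properties are easy to check on it: at t = 0 it is the identity, with α = β it reduces to a Jukes-Cantor-style matrix (u_t = s_t), and it composes over time (S(t1)·S(t2) = S(t1 + t2)).

```python
import math


def k2p_matrix(alpha, beta, t):
    """Kimura 2-parameter substitution matrix S(t), rows/columns A, C, G, T.

    alpha: transition rate, beta: transversion rate (per unit time).
    """
    s = 0.25 * (1.0 - math.exp(-4.0 * beta * t))
    u = 0.25 * (1.0 + math.exp(-4.0 * beta * t)
                - 2.0 * math.exp(-2.0 * (alpha + beta) * t))
    r = 1.0 - 2.0 * s - u
    return [[r, s, u, s],
            [s, r, s, u],
            [u, s, r, s],
            [s, u, s, r]]
```

As t grows, every entry approaches 1/4, so the nucleotides become uniformly distributed regardless of the starting state.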


Additional Models

There are yet more involved DNA substitution models, accounting for further phenomena occurring in DNA.

Some of the models (like Jukes-Cantor, Kimura 2-parameter, and others) exhibit a “group-like” structure that helps analysis.

The most general of these is a matrix where all rates of change are distinct (12 parameters).

For AA (proteins), models typically have less structure.

Further discussion is out of scope for this course. Please refer to the Molecular Evolution course (life science).


Back to the 2 States Model

We showed an efficient solution to the tiny ML problem.

Now we want to solve the tiny AML problem efficiently.



Two Ways to Go

There are two ways to treat the internal nodes. The first averages over their states; this is the ML approach above. In the second version (maximizing over the states of the internal nodes), we are looking for the “most likely” ancestral states. This is called ancestral maximum likelihood (AML).

In some sense AML is “between” MP (having ancestral states) and ML (because the goal is still to maximize likelihood).

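Consider a given tree T with given edge parameters θ. This is our reading of "tiny", based on the DP slides, where both are fixed. The two versions then differ only in summing versus maximizing over the internal assignment h (notation ours):

$$\text{tiny ML: } \Pr(D \mid T, \theta) = \sum_{h} \Pr(D, h \mid T, \theta), \qquad \text{tiny AML: } h^{*} = \arg\max_{h} \Pr(D, h \mid T, \theta)$$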


Two Ways to Go

The tiny AML algorithm will be like Fitch's small MP algorithm: it goes up to the root, then back down to the leaves.



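Computing the Ancestral Likelihood

[Figure: the root of T, in state X or Y with subtrees tree1 and tree2 on edges p1 and p2, hangs from its father (state E) by an edge with parameter p.]

Let T be a binary tree with subtrees T1 and T2. Let $L_E(D \mid T, \theta)$ be the ancestral likelihood of T with E (X or Y) at the node of T's father. This is the largest probability, over all state assignments to T's internal nodes, of that assignment together with T's leaf data, counting the edge from T's father to T's root.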


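Computing the Ancestral Likelihood (2)

[Figure: T's father in state X, joined to T's root by an edge with parameter p; T's subtrees tree1 and tree2 hang on edges p1 and p2.]

By the definition of ancestral likelihood (maximizing over internal assignments), with p the parameter of the edge from T's father to T's root:

$$L_X(D \mid T, \theta) = \max\Bigl((1-p)\,L_X(D \mid \text{tree}_1, \theta)\,L_X(D \mid \text{tree}_2, \theta),\; p\,L_Y(D \mid \text{tree}_1, \theta)\,L_Y(D \mid \text{tree}_2, \theta)\Bigr)$$

The first term keeps T's root in state X; the second changes it to Y. This is the key to an efficient DP algorithm for the tiny AML problem (Pupko et al., 2000).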


Computing the Ancestral Likelihood (2)

Boundary conditions: At the leaves, $L_X(D \mid T, \theta) = 1-p$ if the leaf label is X, and $p$ otherwise (symmetrically for $L_Y$).

At the root: we pick the label E (X or Y) that maximizes $L_E(D \mid \text{tree}_1, \theta) \cdot L_E(D \mid \text{tree}_2, \theta)$.

We now go down the tree. At each node we pick the label that maximizes the likelihood, given the (already known) label of its father.

The total running time is O(n) per site.
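Below is a minimal Python sketch of the up-then-down procedure for one site of the two-state model. The tree encoding and function names are hypothetical. The root gets no prior, and ties are broken toward X. The upward pass records, for each internal node and each father state, which state of the node achieves the maximum. The downward pass then simply reads off those choices.

```python
import math

# Hypothetical encoding (not from the slides): a leaf is its label "X" or
# "Y"; an internal node is (name, [(child, p_edge), ...]).


def up(node, p, choice):
    """Upward pass: return {E: L_E(node)} for each father state E.

    p is the parameter of the edge from the father to node. For internal
    nodes, choice[name][E] records the state of node that achieves L_E.
    """
    if isinstance(node, str):
        # Boundary condition: 1 - p if the leaf label equals E, p otherwise.
        return {E: (1.0 - p) if node == E else p for E in "XY"}
    name, children = node
    child_L = [up(child, q, choice) for child, q in children]
    best, argbest = {}, {}
    for E in "XY":
        cand = {S: ((1.0 - p) if S == E else p) * math.prod(L[S] for L in child_L)
                for S in "XY"}
        argbest[E] = max(cand, key=cand.get)
        best[E] = cand[argbest[E]]
    choice[name] = argbest
    return best


def tiny_aml(tree):
    """Return (maximum ancestral likelihood, {internal node name: state})."""
    root_name, children = tree
    choice = {}
    child_L = [up(child, q, choice) for child, q in children]
    # At the root: pick E maximizing the product of the children's L_E.
    scores = {E: math.prod(L[E] for L in child_L) for E in "XY"}
    root_state = max(scores, key=scores.get)
    states = {root_name: root_state}

    def down(node, father_state):
        # Downward pass: the father's label is known, so read off the choice.
        if isinstance(node, str):
            return
        name, kids = node
        states[name] = choice[name][father_state]
        for kid, _ in kids:
            down(kid, states[name])

    for child, _ in children:
        down(child, root_state)
    return scores[root_state], states


tree = ("r", [(("v", [("X", 0.1), ("Y", 0.2)]), 0.3), ("Y", 0.4)])
```

On this three-leaf example, the upward pass gives root scores 0.126 · 0.4 = 0.0504 for X and 0.056 · 0.6 = 0.0336 for Y. The root is therefore labeled X, and the downward pass then labels v with X.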