Seminar: Kratsch & McHardy 2014, Bioinformatics 30(17), i527-i533

RidgeRace: ridge regression for continuous ancestral character estimation on phylogenetic trees

Presentation by Rosemary McCloskey

Christina Kratsch¹, Alice C. McHardy¹

¹Department for Algorithmic Bioinformatics, Heinrich Heine University

November 6, 2014

Kratsch & McHardy RidgeRace November 6, 2014 1 / 13


Ancestral reconstruction


phylogeny: binary tree representing evolutionary relationships between organisms

- leaves ⇔ observed/sampled taxa
- internal nodes ⇔ common ancestors

ancestral reconstruction: estimation of characteristics of unseen ancestral taxa

- discrete (e.g. DNA sequence)
- continuous (e.g. body weight)

http://topicpages.ploscompbiol.org/wiki/Ancestral_reconstruction


RidgeRace

Existing ancestral reconstruction algorithms:

assume traits evolve along the tree according to a particular model (e.g. Brownian motion)

assume fixed rates of evolution across some or all branches

use ancestral reconstruction only as a stepping stone to examine correlated traits

RidgeRace:

uses phylogenetic information only (no evolutionary model)

allows any rate on any branch

has ancestral reconstruction as its goal


Methods

Observed phenotypes are sums of contributions of each ancestral branch, plus the root.

$$y_4 = g_0 + g_a + g_b + g_c$$

Branch contributions are proportional to branch lengths.

$$g_a = l_a \beta_a$$
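A minimal Python sketch of this additive model, under stated assumptions: the toy tree, its node names (`u`, `x3`, `y4`, ...), the branch lengths and the β values are all hypothetical, chosen only so that leaf 4 is reached via branches a, b and c as in the slide's example.

```python
# Additive phenotype model: y_leaf = g_0 + sum of g_j = l_j * beta_j over the
# branches j on the path from the root to the leaf.
# Hypothetical toy tree: child node -> (parent node, branch name, branch length).
TREE = {
    "u":  ("root", "a", 0.5),
    "x3": ("u",    "b", 1.0),
    "y4": ("x3",   "c", 0.25),
    "y5": ("x3",   "d", 0.75),
    "y6": ("u",    "e", 2.0),
    "y7": ("root", "f", 1.5),
}

def path_branches(node):
    """Return (branch name, length) pairs on the path from the root down to node."""
    path = []
    while node != "root":
        parent, name, length = TREE[node]
        path.append((name, length))
        node = parent
    return path[::-1]

def phenotype(node, beta0, beta):
    """Root contribution g_0 = beta0 plus g_j = l_j * beta_j for each ancestral branch."""
    return beta0 + sum(length * beta[name] for name, length in path_branches(node))

# Hypothetical per-branch rates.
beta = {"a": 2.0, "b": -1.0, "c": 4.0, "d": 0.0, "e": 1.0, "f": -2.0}
print(phenotype("y4", beta0=10.0, beta=beta))  # 10 + 0.5*2 + 1.0*(-1) + 0.25*4 = 11.0
```

Because each β_j is a rate and l_j a length, a long branch can contribute a large change even with a modest rate, which is how the model allows "any rate on any branch".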


Methods

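Combining all $y_i$,

$$\vec{y} = L\vec{\beta},$$

where (defining $l_0 = 1$)

$$L_{ij} = \begin{cases} l_j & \text{if branch } j \text{ is ancestral to sample } i,\\ 0 & \text{otherwise.} \end{cases}$$

Optimize $\vec{\beta}$ via ridge regression:

$$\hat{\vec{\beta}} = \arg\min_{\vec{\beta}} \sum_i \left(y_i - (L\vec{\beta})_i\right)^2 + \lambda \sum_j \beta_j^2 .$$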

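$\lambda \ge 0$ sets the strength of the regularization penalty $\lambda \sum_j \beta_j^2$, which:

penalizes large $\beta_j$ much more than small ones, since it is quadratic (reduces complexity)

shrinks all $\beta_j$ toward zero, small ones included (reduces noise)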
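As a sketch (not the authors' implementation), L and the ridge estimate can be computed with NumPy using the standard closed form $\hat{\vec{\beta}} = (L^\top L + \lambda I)^{-1} L^\top \vec{y}$. The toy tree, the phenotypes and λ = 0.1 are made up. The sketch penalizes every β_j, including the root term β_0, following the objective exactly as written on the slide; the original implementation may treat β_0 differently.

```python
import numpy as np

# Hypothetical toy tree: branch name -> (parent node, child node, branch length).
BRANCHES = {
    "a": ("root", "u", 0.5),
    "b": ("u", "x3", 1.0),
    "c": ("x3", "y4", 0.25),
    "d": ("x3", "y5", 0.75),
    "e": ("u", "y6", 2.0),
    "f": ("root", "y7", 1.5),
}
BRANCH_ABOVE = {child: name for name, (_, child, _) in BRANCHES.items()}
COLUMNS = ["root"] + list(BRANCHES)  # column 0 is the root term (l_0 = 1)
LEAVES = ["y4", "y5", "y6", "y7"]

def design_row(node):
    """Row of L: l_j if branch j is ancestral to `node`, else 0; the root column is 1."""
    row = np.zeros(len(COLUMNS))
    row[0] = 1.0
    while node != "root":
        name = BRANCH_ABOVE[node]
        parent, _, length = BRANCHES[name]
        row[COLUMNS.index(name)] = length
        node = parent
    return row

def ridge_fit(L, y, lam):
    """Closed-form ridge estimate: solve (L^T L + lam * I) beta = L^T y."""
    return np.linalg.solve(L.T @ L + lam * np.eye(L.shape[1]), L.T @ y)

L = np.vstack([design_row(leaf) for leaf in LEAVES])
y = np.array([11.0, 9.5, 13.0, 7.0])  # made-up observed phenotypes for y4..y7
beta_hat = ridge_fit(L, y, lam=0.1)
for name, b in zip(COLUMNS, beta_hat):
    print(f"beta_{name}: {b:.3f}")
```

Here L has more columns than rows (seven parameters, four leaves), so plain least squares has no unique solution; the λ term makes $L^\top L + \lambda I$ invertible, and increasing λ shrinks the norm of $\hat{\vec{\beta}}$.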


Methods

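Calculate states at internal nodes from the estimated $\hat{\vec{\beta}}$, e.g.

$$\hat{x}_3 = \hat{\beta}_0 + l_a \hat{\beta}_a + l_b \hat{\beta}_b .$$

For all internal nodes $x_i$,

$$\hat{\vec{x}} = L' \hat{\vec{\beta}},$$

where

$$L'_{ij} = \begin{cases} l_j & \text{if } j \to i \text{ (branch } j \text{ lies on the path from the root to node } i\text{)},\\ 0 & \text{otherwise.} \end{cases}$$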
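A minimal sketch of this second step, assuming $\hat{\vec{\beta}}$ is already available: the β̂ values below are hypothetical placeholders rather than the output of a real fit, and the toy tree (branch a above node `u`, branch b above node `x3`) is made up so that the result reproduces the slide's formula for $\hat{x}_3$.

```python
import numpy as np

# Hypothetical toy tree: branch name -> (parent node, child node, branch length).
BRANCHES = {
    "a": ("root", "u", 0.5),
    "b": ("u", "x3", 1.0),
    "c": ("x3", "y4", 0.25),
    "d": ("x3", "y5", 0.75),
    "e": ("u", "y6", 2.0),
    "f": ("root", "y7", 1.5),
}
BRANCH_ABOVE = {child: name for name, (_, child, _) in BRANCHES.items()}
COLUMNS = ["root"] + list(BRANCHES)  # same column order as beta_hat
INTERNAL = ["root", "u", "x3"]

def ancestry_row(node):
    """Row of L': l_j for each branch j on the root-to-node path (j -> node); 1 for the root column."""
    row = np.zeros(len(COLUMNS))
    row[0] = 1.0
    while node != "root":
        name = BRANCH_ABOVE[node]
        parent, _, length = BRANCHES[name]
        row[COLUMNS.index(name)] = length
        node = parent
    return row

# Hypothetical estimates in COLUMNS order (beta_0, beta_a, ..., beta_f), standing in for a ridge fit.
beta_hat = np.array([10.0, 2.0, -1.0, 4.0, 0.0, 1.0, -2.0])
L_prime = np.vstack([ancestry_row(v) for v in INTERNAL])
x_hat = L_prime @ beta_hat  # x3 = beta_0 + l_a * beta_a + l_b * beta_b
print(dict(zip(INTERNAL, x_hat.tolist())))  # {'root': 10.0, 'u': 11.0, 'x3': 10.0}
```

L′ is built exactly like L, only for internal nodes instead of leaves, so no further fitting is needed once $\hat{\vec{\beta}}$ is known.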


Simulations

random trees of size 30, 100, 200, 300, 400, 500

phenotypic evolution by Brownian motion with σ² ∈ {0.5, 1, …, 5}

ancestral reconstruction with generalized least squares (GLS), maximum likelihood (ML), and RidgeRace

RidgeRace's reconstructions were comparable to those of GLS and ML.
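One replicate of a simulation in this spirit could look like the sketch below. Several pieces are assumptions rather than details from the paper: the tree generator (repeatedly splitting a random leaf, exponential branch lengths), a root state of 0, a fixed λ = 1, and mean absolute error at internal nodes as the score. The GLS and ML comparisons are not reproduced.

```python
import numpy as np

def random_tree(n_leaves, rng):
    """Hypothetical generator: start from the root and repeatedly split a random leaf in two.
    Node 0 is the root; a child's index is always larger than its parent's."""
    parent, length, leaves = [-1], [0.0], [0]
    while len(leaves) < n_leaves:
        node = leaves.pop(int(rng.integers(len(leaves))))
        for _ in range(2):
            parent.append(node)
            length.append(rng.exponential(1.0))  # assumed branch-length distribution
            leaves.append(len(parent) - 1)
    return np.array(parent), np.array(length)

def simulate_bm(parent, length, sigma2, rng, root_state=0.0):
    """Brownian motion on the tree: each branch adds an N(0, sigma2 * length) increment."""
    x = np.empty(len(parent))
    x[0] = root_state
    for v in range(1, len(parent)):  # parents always precede their children
        x[v] = x[parent[v]] + rng.normal(0.0, np.sqrt(sigma2 * length[v]))
    return x

def path_matrix(parent, length):
    """M[v, j] = l_j if the branch above node j lies on the root-to-v path; column 0 is the root (l_0 = 1)."""
    n = len(parent)
    M = np.zeros((n, n))
    M[:, 0] = 1.0
    for v in range(n):
        u = v
        while u != 0:
            M[v, u] = length[u]
            u = parent[u]
    return M

def one_replicate(n_leaves, sigma2, lam=1.0, seed=0):
    """Simulate one tree and trait, reconstruct internal states by ridge regression, return the mean absolute error."""
    rng = np.random.default_rng(seed)
    parent, length = random_tree(n_leaves, rng)
    x = simulate_bm(parent, length, sigma2, rng)
    is_leaf = np.ones(len(parent), dtype=bool)
    is_leaf[parent[1:]] = False  # every node that has a child is internal
    M = path_matrix(parent, length)
    L, L_prime = M[is_leaf], M[~is_leaf]
    beta_hat = np.linalg.solve(L.T @ L + lam * np.eye(M.shape[1]), L.T @ x[is_leaf])
    x_hat = L_prime @ beta_hat
    return float(np.mean(np.abs(x_hat - x[~is_leaf])))

for n_leaves in (30, 100):
    for sigma2 in (0.5, 5.0):
        print(n_leaves, sigma2, round(one_replicate(n_leaves, sigma2), 3))
```

Rows of `path_matrix` for leaves give L and rows for internal nodes give L′, so one matrix serves both the fitting and the reconstruction step.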


Ovarian cancer data

Hierarchical clustering of 325 ovarian cancer samples.

Reconstructed survival time; mapped mutations to ancestral nodes by parsimony.
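The slide only says mutations were mapped "by parsimony". Assuming the classic Fitch algorithm for a single present/absent mutation on a binary tree (an assumption, not confirmed here), the mapping could be sketched as follows; the tree and the observed states are hypothetical.

```python
# Fitch parsimony for one binary mutation status (0 = absent, 1 = present).
# Hypothetical toy tree: internal node -> (left child, right child); leaves carry observed states.
CHILDREN = {"root": ("n1", "s4"), "n1": ("s1", "n2"), "n2": ("s2", "s3")}
OBSERVED = {"s1": 0, "s2": 1, "s3": 1, "s4": 0}

def fitch_sets(node, sets):
    """Bottom-up pass: fill candidate state sets and return the number of changes needed below node."""
    if node in OBSERVED:
        sets[node] = {OBSERVED[node]}
        return 0
    left, right = CHILDREN[node]
    cost = fitch_sets(left, sets) + fitch_sets(right, sets)
    common = sets[left] & sets[right]
    if common:
        sets[node] = common
    else:
        sets[node] = sets[left] | sets[right]
        cost += 1  # the children disagree: one change is unavoidable here
    return cost

def fitch_assign(node, sets, states, parent_state=None):
    """Top-down pass: keep the parent's state when allowed, otherwise pick from the node's set."""
    if parent_state in sets[node]:
        states[node] = parent_state
    else:
        states[node] = min(sets[node])  # arbitrary but deterministic tie-break
    for child in CHILDREN.get(node, ()):
        fitch_assign(child, sets, states, states[node])

sets, states = {}, {}
n_changes = fitch_sets("root", sets)
fitch_assign("root", sets, states)
print(n_changes, states)
```

In this toy example a single mutation is placed on the branch leading to `n2`, the most recent common ancestor of the two samples that carry it.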


Good points

The good:

simple approach comparable in performance to more complex methods

ancestral reconstruction without assuming a particular model of evolution


Room for improvement

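choice of real data was a bit odd (a hierarchical clustering of cancer samples, not ancestral reconstruction on a phylogeny)

the restriction to balanced trees is very limiting, since many real phylogenies (e.g. human influenza A) are far from balanced: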

“The estimation of β might thus be biased if the depth of single leaf nodes is large compared with the rest of the tree. We therefore recommend RidgeRace for approximately balanced trees.”

Bush, Robin M., et al. “Effects of passage history and sampling bias on phylogenetic reconstruction of human influenza A evolution.” PNAS 97(13) (2000): 6974-6980.


Thank you!


Brownian motion

[Figure: average body mass vs. time, with values labelled 15 kg and 48 kg]

At each time step ∆t, the movement is drawn from a normal distribution with mean 0 and variance σ²∆t; then let ∆t → 0.
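The limiting construction can be checked numerically. This sketch sums Gaussian steps with variance σ²∆t and compares the spread after total time T with σ²T for two different step sizes; the starting value of 20, σ² = 0.5 and T = 50 are arbitrary choices, not values from the slide.

```python
import numpy as np

def brownian_paths(n_paths, n_steps, total_time, sigma2, x0, rng):
    """Approximate Brownian motion by summing Gaussian steps with mean 0 and variance sigma2 * dt."""
    dt = total_time / n_steps
    steps = rng.normal(0.0, np.sqrt(sigma2 * dt), size=(n_paths, n_steps))
    return x0 + np.cumsum(steps, axis=1)  # column k holds the value after k + 1 steps

rng = np.random.default_rng(1)
for n_steps in (10, 100):
    end = brownian_paths(20_000, n_steps, total_time=50.0, sigma2=0.5, x0=20.0, rng=rng)[:, -1]
    # Var(X_T) should be close to sigma2 * T = 25 for either step size.
    print(n_steps, round(end.mean(), 2), round(end.var(), 2))
```

Scaling the step variance with ∆t is what keeps Var(X_T) = σ²T fixed as the steps get finer; with a constant step variance σ², the spread would grow without bound as ∆t → 0.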
