seminar: kratsch & mchardy 2014 bioinformatics 30(17), i527-i533
TRANSCRIPT
RidgeRace: ridge regression for continuous ancestral character estimation on phylogenetic trees
Presentation by Rosemary McCloskey
Christina Kratsch (1), Alice C. McHardy (1)
(1) Department for Algorithmic Bioinformatics, Heinrich Heine University
November 6, 2014
Kratsch & McHardy RidgeRace November 6, 2014 1 / 13
Ancestral reconstruction
[Figure: phylogenetic tree with unknown ancestral states marked "?"]
phylogeny: binary tree representing evolutionary relationships between organisms
  - leaves ⇔ observed/sampled taxa
  - internal nodes ⇔ common ancestors
ancestral reconstruction: estimation of characteristics of unseen ancestral taxa
  - discrete (e.g. DNA sequence)
  - continuous (e.g. body weight)
http://topicpages.ploscompbiol.org/wiki/Ancestral reconstruction
RidgeRace
Existing ancestral reconstruction algorithms:
  - assume traits evolve along the tree according to a particular model (e.g. Brownian motion)
  - assume fixed rates of evolution across some or all branches
  - use ancestral reconstruction only as a stepping stone to examine correlated traits
RidgeRace:
  - uses phylogenetic information only (no evolutionary model)
  - allows any rate on any branch
  - has ancestral reconstruction as its goal
Methods
Observed phenotypes are sums of contributions of each ancestral branch, plus the root.
$y_4 = g_0 + g_a + g_b + g_c$
Branch contributions are proportional to branch lengths.
$g_a = l_a \beta_a$
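Substituting the second relation into the first gives the leaf phenotype directly in terms of per-branch coefficients. The expansion below is a worked restatement, assuming that branches b and c follow the same proportionality as branch a and that the root term is $g_0 = l_0 \beta_0$ with $l_0 = 1$:

```latex
y_4 = g_0 + g_a + g_b + g_c
    = \beta_0 + l_a \beta_a + l_b \beta_b + l_c \beta_c
```

Written this way, each leaf value is a linear function of the unknown $\beta$ terms, with the branch lengths as known coefficients.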
Methods
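Combining all $y_i$,
$\vec{y} = L\vec{\beta}$,
where (defining $l_0 = 1$),
$L_{i,j} = \begin{cases} l_j & \text{if branch } j \text{ is ancestral to sample } i \\ 0 & \text{otherwise.} \end{cases}$
Optimize $\beta$ via ridge regression:
$\hat{\beta} = \arg\min_{\vec{\beta}} \sum_i \left(y_i - (L\vec{\beta})_i\right)^2 + \lambda \sum_j \beta_j^2$.
$\lambda$ is the regularization penalty:
  - penalizes large $\beta_j$ more than small (reduces complexity)
  - shrinks small $\beta_j$ even closer to zero (reduces noise)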
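The matrix form above can be sketched in a few lines of NumPy. This is a minimal illustration, not the RidgeRace implementation: the toy tree, its node names, the phenotype values, and the helper names `design_matrix` and `ridge_fit` are all hypothetical, and the root coefficient is penalized along with the branch coefficients because the penalty is written as a sum over all j.

```python
import numpy as np

# Hypothetical toy tree: child -> (parent, branch length).
# "A", "B", "C" are leaves, "n1" is an internal node, "root" is the root.
TREE = {
    "n1": ("root", 1.0),
    "A": ("n1", 0.5),
    "B": ("n1", 1.5),
    "C": ("root", 2.0),
}

def design_matrix(tree, nodes):
    """One row per node. Column 0 is the root term (l_0 = 1); column j holds
    the length of branch j if it lies on the path from the root to the node."""
    branches = list(tree)  # each branch is identified by its child node
    M = np.zeros((len(nodes), len(branches) + 1))
    M[:, 0] = 1.0
    for i, node in enumerate(nodes):
        while node in tree:  # walk up until the root is reached
            parent, length = tree[node]
            M[i, branches.index(node) + 1] = length
            node = parent
    return M

def ridge_fit(L, y, lam):
    """Closed-form ridge solution (L^T L + lam * I)^{-1} L^T y."""
    p = L.shape[1]
    return np.linalg.solve(L.T @ L + lam * np.eye(p), L.T @ y)

LEAVES = ["A", "B", "C"]
L = design_matrix(TREE, LEAVES)
y = np.array([3.0, 5.0, 4.0])  # made-up leaf phenotypes
beta_hat = ridge_fit(L, y, lam=1.0)
print(beta_hat)
```

There are more coefficients (one per branch plus the root) than leaves, so unpenalized least squares would not have a unique solution; the $\lambda$ term is what makes the system solvable here.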
Methods
Calculate states at internal nodes from estimated $\hat{\beta}$.
$x_3 = \beta_0 + l_a\beta_a + l_b\beta_b$.
For all $x_i$,
$\hat{x} = L'\hat{\beta}$,
where
$L'_{ij} = \begin{cases} l_j & j \to i \\ 0 & \text{otherwise.} \end{cases}$
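The reconstruction step amounts to one more matrix product. The sketch below assumes a hypothetical tree with three leaves and two internal nodes (the root and one node `n1`), with the matrices written out by hand; the column order, branch lengths, and phenotype values are invented for illustration.

```python
import numpy as np

# Columns: root term (l_0 = 1), then branches leading to n1, A, B, C.
# Rows of L_leaves: leaves A, B, C. Rows of L_internal: root, n1.
L_leaves = np.array([
    [1.0, 1.0, 0.5, 0.0, 0.0],  # A: root term + branch n1 + branch A
    [1.0, 1.0, 0.0, 1.5, 0.0],  # B: root term + branch n1 + branch B
    [1.0, 0.0, 0.0, 0.0, 2.0],  # C: root term + branch C
])
L_internal = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],  # root: only the root term
    [1.0, 1.0, 0.0, 0.0, 0.0],  # n1: root term + branch n1
])

y = np.array([3.0, 5.0, 4.0])  # made-up leaf phenotypes
lam = 1.0                      # assumed regularization strength

# Ridge estimate of the coefficients from the leaves only.
p = L_leaves.shape[1]
beta_hat = np.linalg.solve(L_leaves.T @ L_leaves + lam * np.eye(p), L_leaves.T @ y)

# Ancestral states: x_hat = L' beta_hat.
x_hat = L_internal @ beta_hat
print(x_hat)
```

The only difference between the two matrices is which nodes the rows describe: the leaf matrix is used for fitting, the internal-node matrix for prediction.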
Simulations
  - random trees of size 30, 100, 200, 300, 400, 500
  - phenotypic evolution by Brownian motion with $\sigma^2 \in \{0.5, 1, \ldots, 5\}$
  - ancestral reconstruction with generalized least squares (GLS), maximum likelihood (ML), and RidgeRace
Result: RidgeRace performs comparably to GLS and ML.
Ovarian cancer data
Hierarchical clustering of 325 ovarian cancer samples.
Reconstructed survival time; mapped mutations to ancestral nodes by parsimony.
Good points
The good:
  - simple approach comparable in performance to more complex methods
  - ancestral reconstruction without assuming a particular model of evolution
Room for improvement
  - choice of real data was a bit odd (not ancestral reconstruction)
  - the limitation the authors acknowledge is very restrictive:
    "The estimation of β might thus be biased if the depth of single leaf nodes is large compared with the rest of the tree. We therefore recommend RidgeRace for approximately balanced trees."
Bush, Robin M., et al. "Effects of passage history and sampling bias on phylogenetic reconstruction of human influenza A evolution." PNAS 97.13 (2000): 6974-6980.
Thank you!
Brownian motion
At each time step $\Delta t$, the movement is drawn from a normal distribution with mean 0 and variance $\sigma^2 \Delta t$; then let $\Delta t \to 0$.
[Figure: average body mass plotted against time; labelled values 15 kg and 48 kg]
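A discretized version of this process is easy to simulate. The sketch below is an illustration under stated assumptions rather than the simulation used in the paper: the function name `simulate_bm`, the starting value, and the step count are hypothetical, and each step's variance is scaled by the step length so that the variance accumulated over one unit of time is $\sigma^2$.

```python
import numpy as np

def simulate_bm(x0, sigma2, total_time, n_steps, rng):
    """Simulate one Brownian-motion path of a trait along a single lineage.

    Each increment is normal with mean 0 and variance sigma2 * dt, so the
    variance accumulated over `total_time` is sigma2 * total_time.
    """
    dt = total_time / n_steps
    steps = rng.normal(0.0, np.sqrt(sigma2 * dt), size=n_steps)
    return np.concatenate(([x0], x0 + np.cumsum(steps)))

rng = np.random.default_rng(0)
# Hypothetical example: a trait starting at 15 (e.g. kg) evolving for 50 time units.
path = simulate_bm(x0=15.0, sigma2=1.0, total_time=50.0, n_steps=5000, rng=rng)
print(len(path), path[0])
```

Increasing `n_steps` for a fixed `total_time` corresponds to letting the step length shrink toward zero.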