taxon coverage pattern - university of tasmania · 2012-06-28 · 2 5 decisiveness definition: let...
TRANSCRIPT
![Page 1: Taxon coverage pattern - University of Tasmania · 2012-06-28 · 2 5 Decisiveness Definition: Let S be a collection of taxon sets. Then:!! •!!!!! •!!! T must be binary!!!!!](https://reader033.vdocument.in/reader033/viewer/2022050313/5f7577705d489f1bce1a6f7d/html5/thumbnails/1.jpg)
1
Phylomania November 10, 2011
n Joint work with
‘Lassoing’ a tree: Phylogenetic theory for sparse patterns of taxon coverage
Mike Steel
Michelle McMahon Michael Sanderson
Andreas Dress Katharina Huber
2
Talk Outline
n Part 1: Decisiveness n Part 2: Lassoing a tree
n Part 3: Quantifying LGT [if time?]
3
Part I: Partial taxon coverage
Group Taxa Loci % Missing
Citation
Metazoa 77 150 55 Dunn et al. 2008
Papilionoid legumes
2228 39 96 McMahon and Sanderson 2006
Asterales 4954 5 91 Smith et al. 2009
Eukaryotes 73060 13 92 Goloboff et al. 2009
4
Taxon coverage pattern
a x x b x x c x x d x e x
Note: this does not commit to any par4cular gene tree topologies, just the taxon sets.
Two taxon sets: {a,b,c,d} and {a,b,c,e}!
Gene1 Gene2
= {{a,b,c,d},{a,b,c,e}}
![Page 2: Taxon coverage pattern - University of Tasmania · 2012-06-28 · 2 5 Decisiveness Definition: Let S be a collection of taxon sets. Then:!! •!!!!! •!!! T must be binary!!!!!](https://reader033.vdocument.in/reader033/viewer/2022050313/5f7577705d489f1bce1a6f7d/html5/thumbnails/2.jpg)
2
5
Decisiveness
Definition: Let S be a collection of taxon sets. Then:!!•!!!!!!!!•!!!T must be binary!!!!!
S is decisive for a phylogenetic tree T provided T is uniquely determined by how it resolves each of the taxon sets in S.!!!!!!S is phylogenetically decisive for all trees if this holds for all choices of T.!
!i.e. if S= {Y1,…,Yk} then T is the only tree that displays T|Y1,….,T|Yk!
a x x b x x c x x d x e x
e
a b c
d b
a c d
e
There are 6, like that below, where the taxon coverage is decisive
...and 9, like that below, where it is not decisive
Of the 15 possible binary unrooted trees for this data set...
Not phylogenetically decisive (for all trees)
a x x x x b x x x c x x x d x x x e x x x
Phylogenetically decisive (for all trees)
8!
Testing whether taxon coverage is phylogenetically decisive (for all trees)
Theorem [S+Sanderson, 2010]!!n S is phylogenetically decisive for all trees if and only if: ! For each 4-way partition of the full taxon set, there are four taxa in
one of the taxon sets in S that intersect each block of the partition.!
Complexity? Manuel Bidirsky’s `No rainbow colouring’ problem! 8!
a x x b x x c x x d x e x
a�b�c �d�
e �
a�b�c �d�
e �
![Page 3: Taxon coverage pattern - University of Tasmania · 2012-06-28 · 2 5 Decisiveness Definition: Let S be a collection of taxon sets. Then:!! •!!!!! •!!! T must be binary!!!!!](https://reader033.vdocument.in/reader033/viewer/2022050313/5f7577705d489f1bce1a6f7d/html5/thumbnails/3.jpg)
3
9
A lower bound on number of loci for decisiveness for all trees
!Examples:
max col density (m/n) ! ! “min” num loci (k)!33% ! ! ! ! !27!80% ! ! ! ! !2!
Theorem: If a collection S of taxon sets of size n1, n2,…, nk is phylogenetically decisive for all trees with n leaves then: So, if each taxon set in S has size at most m then:
!
n j (n j "1)(n j " 2) # n(n "1)(n " 2)j=1
k
$
!
k " n(n #1)(n # 2)m(m #1)(m # 2)
" (n /m)3
p(x,j) = probability taxon x is present at loci j.
Modelling random taxon coverage
Choices made independently for k loci to get a pattern S of taxon coverage (n, pxj, k). Context: whole genome sequencing at “low” coverage that generates partial assemblies. Or complete genome sequences at deep phylogenetic depths where loss of homology is an issue in assembling data sets
Simplest model: Uniform coverage: pxj=p
Theorem For any rooted binary tree T with n leaves, with coverage probability p, the probability that a set S of k (random) taxon sets is phylogenetically decisive for T is at least 1 - ε if!
!
k "log (n # 2) /$( )#log(1# p3)
Moreover, if k is much less than this, then S is not (w.p. 1- ε) phylogenetically decisive!
~log n /!( )
p3
k !log n / 3
" log(1"!)#
$%
&
'(
" log(1" p3)
Three extensions Allow each edge in each T|Yi to be present only with probability q (else the edge is collapsed)!
p3! p3q
Variable coverage (VC): p (x ) ≥ p for all taxa x in!
clade C , and p (x ) ≥ p' for any taxon x not in C !
(where p ' <p ).
p3q! p2p 'q
Unrooted trees
p3! p4
![Page 4: Taxon coverage pattern - University of Tasmania · 2012-06-28 · 2 5 Decisiveness Definition: Let S be a collection of taxon sets. Then:!! •!!!!! •!!! T must be binary!!!!!](https://reader033.vdocument.in/reader033/viewer/2022050313/5f7577705d489f1bce1a6f7d/html5/thumbnails/4.jpg)
4
13
Decisiveness in real data (random sampling)
p n
Extension: Terraces!"(Science, 2011)###!
n Construct optimal tree for all the loci (MP, or ML/Bayesian with partitioned analysis) n Lemma: If T is an optimal tree, and Y1,…,Yk are the taxa sets
then ANY other tree that displays T|Y1,….,T|Yk is optimal.!n Leads to large ‘terraces’ of equally optimal trees.!
!Eg. Bouchenak-‐Khelladi et al. 2008 [Grasses] 298 taxa, 3 loci. 61 million trees
By removing 12 of 298 original taxa, terrace size reduced from 61 million trees to 1 tree.!!Theorem [MB]: All trees on a terrace belong to the same tree island
15
Talk Outline
n Part 1: Decisiveness n Part 2: Lassoing a tree
n Part 3: Quantifying LGT
16
NJ and all that
a b d
c f
e
a
b
c
d e
f ),( lTdNJ
),( lT Classic results (1960s)
llTTdd lTlT ʹ′=ʹ′=⇒= ʹ′ʹ′ &),(),(
Under patchy taxon sampling, reliable d(x,y) values may not be available for all pairs. !Example: There may be no locus that contains both x and y.!
![Page 5: Taxon coverage pattern - University of Tasmania · 2012-06-28 · 2 5 Decisiveness Definition: Let S be a collection of taxon sets. Then:!! •!!!!! •!!! T must be binary!!!!!](https://reader033.vdocument.in/reader033/viewer/2022050313/5f7577705d489f1bce1a6f7d/html5/thumbnails/5.jpg)
5
17
Do we need all the d-values? Given and
Does d determine T? l ?
Example: L = {ab, cd, ac, bd}
L !X2"
#$
%
&' L|),( lTdd =
!L = L "{ad} determines T and l!!L = {ab,ac,ad,bc,bd} determines neither
d |L determines T but not l.
a
b
c
d
18
Definition (Lassos)
n L lassos the branch lengths of T iff n L lassos the shape of T iff n L is a strong lasso for T if both hold.
We call the element of L the ‘chords’ of the lasso
L !X2"
#$
%
&'
lldd lTlT ʹ′=⇒= ʹ′ LL || ),(),(
TTdd lTlT ʹ′=⇒ʹ′= ʹ′ʹ′ LL || ),(),(
19
The graph (X, L) L = {ab, cd, ac, bd}
Lemma: v If L lassos the shape of T then (X,L) is
connected. v If L lassos the branch lengths of T then every
connected component of (X, L) has an odd cycle.
a
d
b
c
20
Question 1:
n Is every branch-length lasso a strong lasso?
b
a
c b'
c'
a'
![Page 6: Taxon coverage pattern - University of Tasmania · 2012-06-28 · 2 5 Decisiveness Definition: Let S be a collection of taxon sets. Then:!! •!!!!! •!!! T must be binary!!!!!](https://reader033.vdocument.in/reader033/viewer/2022050313/5f7577705d489f1bce1a6f7d/html5/thumbnails/6.jpg)
6
21
Is an branch-length lasso a strong lasso?
a a'
b c b'
c'
2 2
a a'
c b c'
b'
2 2
No!
22
Question 2: How small can lassos be?
Observation: Any branch-length lasso of T has size at least E(T). Theorem: If L lassos the shape of T then |L| >|E|-1. Why? Star tree T* If L lassos the shape of a star tree iff L= Binary tree T There exist topological lassos of size 4(n-3). Why? Can we do better?
X2!
"#
$
%&
23
Question 2: How small can lassos be?
Theorem Any binary tree T with n leaves has a strong lasso of size 2n-3 [Yushmanov, 1984] Moreover, T has a topological lasso of size 2n-4.
24
A key idea: Lassos as “Covers”
Lemma: If L lassos the shape or the branch lengths of (binary) T then L is a cover for T.
![Page 7: Taxon coverage pattern - University of Tasmania · 2012-06-28 · 2 5 Decisiveness Definition: Let S be a collection of taxon sets. Then:!! •!!!!! •!!! T must be binary!!!!!](https://reader033.vdocument.in/reader033/viewer/2022050313/5f7577705d489f1bce1a6f7d/html5/thumbnails/7.jpg)
7
25
“Pointed covers”
Theorem: If L is a pointed cover for T then L is a strong lasso for T.
Corollary: There are strong lassos for T of size 2n-3
26
The ‘Triplet Cover’ conjecture
Conjecture: If L contains a triplet cover for T then L lassos T’s shape
Lemma: If L contains a triplet cover for T then L lassos T’s branch lengths
{ab,ac,bc}! L
Theorem: If L contains a triplet cover for T and one for T’ then T=T’!
27
Finally…minimal branch length lassos
Theorem: M(T ) is a binary matroid if and only if T is a caterpillar tree
Lemma: M(T) is a matroid that determines the shape of T (i.e. )
Simple example: Star tree
M (T ) =M (T ')! T = T ' Example of element of M(T)for. n = 23 here is one:
M(T) := set of branch length lassos of T of minimal size.
Further details
n Sanderson, M.J., McMahon, M.M. and Steel, M. (2010). Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evolutionary Biology 10: 155
n Steel, M. and Sanderson, M.J. (2010). Characterizing phylogenetically decisive taxon coverage. Applied Mathematics Letters 23, 82-86.
n Sanderson, M.J., McMahon, M.M. and Steel, M. Terraces in phylogenetic tree space. Science 333: 448-450.
n Dress, A.W.M., Huber, K.T. and Steel, M. 'Lassoing' a phylogenetic tree (I): Basic properties, shellings, and covers. Journal of Mathematical Biology (in press).
n Lasso (II), (III) in prep.
The Annual New Zealand Phylogenetics Meeting Sun. 29th Jan. - Fri. 3rd Feb. 2012 ABaCASS Feb 4-12th 1st NZ phylogenetics workshop Feb 4-7.
www.math.canterbury.ac.nz/bio/events/south2012/