Taxon coverage pattern - University of Tasmania · 2012-06-28



Phylomania November 10, 2011

‘Lassoing’ a tree: Phylogenetic theory for sparse patterns of taxon coverage

Mike Steel

Joint work with Michelle McMahon, Michael Sanderson, Andreas Dress, Katharina Huber

Talk Outline

• Part 1: Decisiveness
• Part 2: Lassoing a tree
• Part 3: Quantifying LGT [if time?]

Part I: Partial taxon coverage

Group                  Taxa   Loci   % Missing   Citation
Metazoa                  77    150          55   Dunn et al. 2008
Papilionoid legumes    2228     39          96   McMahon and Sanderson 2006
Asterales              4954      5          91   Smith et al. 2009
Eukaryotes            73060     13          92   Goloboff et al. 2009

Taxon coverage pattern

     Gene1   Gene2
a      x       x
b      x       x
c      x       x
d      x
e              x

Two taxon sets: {a,b,c,d} and {a,b,c,e}

S = {{a,b,c,d},{a,b,c,e}}

Note: this does not commit to any particular gene tree topologies, just the taxon sets.


Decisiveness

Definition: Let S be a collection of taxon sets. Then:

• S is decisive for a phylogenetic tree T provided T is uniquely determined by how it resolves each of the taxon sets in S (T must be binary),
  i.e. if S = {Y1,…,Yk} then T is the only tree that displays T|Y1,…,T|Yk.
• S is phylogenetically decisive for all trees if this holds for all choices of T.

Example: the coverage pattern {{a,b,c,d},{a,b,c,e}} (a, b, c present at both loci; d and e at one locus each).

Of the 15 possible binary unrooted trees for this data set...

• There are 6 where the taxon coverage is decisive
• ...and 9 where it is not decisive

[Figure: one example tree of each kind, on leaves a, b, c, d, e]

So this pattern is not phylogenetically decisive (for all trees).

By contrast, the following pattern is phylogenetically decisive (for all trees):

a   x x x x
b   x x x
c   x x x
d   x x x
e   x x x
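The 6-versus-9 count for the two-gene pattern {{a,b,c,d},{a,b,c,e}} can be checked by brute force. The sketch below is not from the talk: it enumerates the 15 unrooted binary trees on five taxa (stored as rooted trees hanging from the first taxon), restricts each to every taxon set, and calls S decisive for T when no other binary tree has the same restrictions. All function names are hypothetical, and only binary trees are compared.

```python
from collections import Counter

def insert_leaf(tree, leaf):
    """Yield each tree obtained by attaching `leaf` to one edge of `tree`."""
    yield frozenset([tree, leaf])            # attach on the edge above `tree`
    if isinstance(tree, frozenset):          # internal node: recurse into both children
        left, right = tuple(tree)
        for t in insert_leaf(left, leaf):
            yield frozenset([t, right])
        for t in insert_leaf(right, leaf):
            yield frozenset([left, t])

def all_binary_trees(taxa):
    """All unrooted binary trees on `taxa`, each stored as a rooted tree hanging from taxa[0]."""
    trees = [frozenset(taxa[1:3])]
    for leaf in taxa[3:]:
        trees = [t for old in trees for t in insert_leaf(old, leaf)]
    return trees

def clusters(tree, found):
    """Add the leaf set below every internal node to `found`; return the leaf set of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    below = frozenset().union(*(clusters(child, found) for child in tree))
    found.add(below)
    return below

def restriction(tree, taxon_set):
    """Non-trivial splits of the tree restricted to `taxon_set` (this identifies T|Y)."""
    found = set()
    clusters(tree, found)
    Y = frozenset(taxon_set)
    return frozenset(
        frozenset([c & Y, Y - c])
        for c in found
        if len(c & Y) >= 2 and len(Y - c) >= 2
    )

def count_decisive(taxa, S):
    """Return (number of binary trees T for which S is decisive, number of binary trees)."""
    trees = all_binary_trees(taxa)
    signature = {t: tuple(restriction(t, Y) for Y in S) for t in trees}
    multiplicity = Counter(signature.values())
    decisive = sum(1 for t in trees if multiplicity[signature[t]] == 1)
    return decisive, len(trees)

print(count_decisive(list("abcde"), [set("abcd"), set("abce")]))  # (6, 15)
```

The number of trees grows super-exponentially with the number of taxa, so this is only practical for very small examples.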

Testing whether taxon coverage is phylogenetically decisive (for all trees)

Theorem [S+Sanderson, 2010]
• S is phylogenetically decisive for all trees if and only if: for each 4-way partition of the full taxon set, there are four taxa in one of the taxon sets in S that intersect each block of the partition.

Complexity? Manuel Bodirsky’s ‘No rainbow colouring’ problem

[Figure: the two-gene coverage pattern on taxa a, b, c, d, e, with two diagrams of the taxa]
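The 4-way partition condition can be tested directly on small examples. The following is a minimal brute-force sketch, not the authors' algorithm: it enumerates every partition of the taxon set into four non-empty blocks and checks that some taxon set meets all four. The names `is_canonical` and `decisive_for_all_trees` are hypothetical, and the four-locus family in the usage lines is an assumed example.

```python
from itertools import product

def is_canonical(labels):
    """True if `labels` uses blocks 0..3 in order of first appearance and uses all four.

    This picks exactly one labelling for each partition into four non-empty blocks.
    """
    highest = -1
    for lab in labels:
        if lab > highest + 1:
            return False
        highest = max(highest, lab)
    return highest == 3

def decisive_for_all_trees(taxa, S):
    """Check the 4-way partition condition by exhaustive enumeration (4**n labellings)."""
    taxa = sorted(taxa)
    for labels in product(range(4), repeat=len(taxa)):
        if not is_canonical(labels):
            continue
        block = dict(zip(taxa, labels))
        # Some taxon set must contain a taxon from each of the four blocks.
        if not any(len({block[x] for x in Y}) == 4 for Y in S):
            return False
    return True

two_genes = [set("abcd"), set("abce")]
four_genes = [set("abcd"), set("abce"), set("abde"), set("acde")]  # assumed example
print(decisive_for_all_trees("abcde", two_genes))   # False
print(decisive_for_all_trees("abcde", four_genes))  # True
```

For the two-gene pattern the partition {a,b}, {c}, {d}, {e} fails, since neither taxon set contains both d and e.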


A lower bound on number of loci for decisiveness for all trees

Theorem: If a collection S of taxon sets of size n1, n2,…, nk is phylogenetically decisive for all trees with n leaves then:

\[ \sum_{j=1}^{k} n_j (n_j - 1)(n_j - 2) \ge n(n-1)(n-2) \]

So, if each taxon set in S has size at most m then:

\[ k \ge \frac{n(n-1)(n-2)}{m(m-1)(m-2)} \ge (n/m)^3 \]

Examples:

max col density (m/n)    “min” num loci (k)
33%                      27
80%                      2

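The lower bound on the number of loci is simple arithmetic, sketched below. The function names are hypothetical. `min_loci_cubic` rounds the cruder bound (n/m)^3 up to an integer, which reproduces the two example values on the slide; `min_loci_exact` uses the sharper ratio for concrete n and m.

```python
from fractions import Fraction
from math import ceil

def min_loci_exact(n, m):
    """Smallest integer k with k >= n(n-1)(n-2) / (m(m-1)(m-2))."""
    return ceil(Fraction(n * (n - 1) * (n - 2), m * (m - 1) * (m - 2)))

def min_loci_cubic(density):
    """Smallest integer k with k >= (n/m)**3, where density = m/n."""
    return ceil((1 / Fraction(density)) ** 3)

print(min_loci_cubic(Fraction(1, 3)))  # 27
print(min_loci_cubic(Fraction(4, 5)))  # 2
print(min_loci_exact(30, 10))          # 34
```

Exact fractions are used so that a density of exactly 1/3 gives 27 rather than a rounding artefact.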
Modelling random taxon coverage

p(x,j) = probability taxon x is present at locus j.

Choices are made independently for k loci to get a pattern S of taxon coverage (n, pxj, k).

Context: whole genome sequencing at “low” coverage that generates partial assemblies, or complete genome sequences at deep phylogenetic depths where loss of homology is an issue in assembling data sets.

Simplest model: Uniform coverage: pxj = p

Theorem: For any rooted binary tree T with n leaves, with coverage probability p, the probability that a set S of k (random) taxon sets is phylogenetically decisive for T is at least 1 - ε if

\[ k \ge \frac{\log\left((n-2)/\epsilon\right)}{-\log(1-p^3)} \sim \frac{\log(n/\epsilon)}{p^3} \]

Moreover, if k is much less than this, then S is not (w.p. 1 - ε) phylogenetically decisive:

\[ k \le \frac{\log\left(\frac{n/3}{-\log(1-\epsilon)}\right)}{-\log(1-p^3)} \]

Three extensions

• Allow each edge in each T|Yi to be present only with probability q (else the edge is collapsed):
  p^3 → p^3 q
• Variable coverage (VC): p(x) ≥ p for all taxa x in clade C, and p(x) ≥ p' for any taxon x not in C (where p' < p):
  p^3 q → p^2 p' q
• Unrooted trees:
  p^3 → p^4
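To see how these substitutions change the number of loci, here is a hedged sketch that evaluates the sufficient bound k ≥ log((n-2)/ε) / (-log(1-r)), where r stands for whichever per-locus probability replaces p^3. The function name, the parameter name `r`, and all numerical values are illustrative assumptions. Keeping the factor (n-2) in the unrooted case is also an assumption, made only so the four cases are comparable.

```python
from math import ceil, log

def sufficient_loci(n, r, eps):
    """Smallest integer k with k >= log((n - 2) / eps) / -log(1 - r).

    r is the per-locus probability that takes the place of p**3 in the bound.
    """
    return ceil(log((n - 2) / eps) / -log(1 - r))

n, eps = 100, 0.05            # illustrative values, not from the talk
p, q, p_out = 0.5, 0.8, 0.3   # p_out plays the role of p' (hypothetical name)

bounds = {
    "uniform, rooted (p^3)": sufficient_loci(n, p**3, eps),
    "edges kept w.p. q (p^3 q)": sufficient_loci(n, p**3 * q, eps),
    "variable coverage (p^2 p' q)": sufficient_loci(n, p**2 * p_out * q, eps),
    "unrooted (p^4)": sufficient_loci(n, p**4, eps),
}
for label, k in bounds.items():
    print(f"{label}: k >= {k}")
```

Each substitution lowers the per-locus probability, so the required number of loci can only go up.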


Decisiveness in real data (random sampling)

[Figure: plot with axes p and n]

Extension: Terraces (Science, 2011)

• Construct optimal tree for all the loci (MP, or ML/Bayesian with partitioned analysis)
• Lemma: If T is an optimal tree, and Y1,…,Yk are the taxa sets, then ANY other tree that displays T|Y1,…,T|Yk is optimal.
• Leads to large ‘terraces’ of equally optimal trees.

E.g. Bouchenak-Khelladi et al. 2008 [Grasses]: 298 taxa, 3 loci, 61 million trees.

By removing 12 of 298 original taxa, terrace size reduced from 61 million trees to 1 tree.

Theorem [MB]: All trees on a terrace belong to the same tree island

Talk Outline

• Part 1: Decisiveness
• Part 2: Lassoing a tree
• Part 3: Quantifying LGT

NJ and all that

[Figure: a tree on leaves a, b, c, d, e, f shown twice, linked by the labels (T, l), d_{(T,l)} and NJ]

Classic results (1960s):

\[ d_{(T,l)} = d_{(T',l')} \Rightarrow T = T' \ \&\ l = l' \]

Under patchy taxon sampling, reliable d(x,y) values may not be available for all pairs.
Example: There may be no locus that contains both x and y.


Do we need all the d-values?

Given \( L \subseteq \binom{X}{2} \) and \( d = d_{(T,l)}|_L \): does d determine T? l?

[Figure: a tree T on leaves a, b, c, d]

Example: L = {ab, cd, ac, bd}
• d|L determines T but not l.
• L' = L ∪ {ad} determines T and l.
• L = {ab, ac, ad, bc, bd} determines neither.
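A small numerical check shows why L = {ab, cd, ac, bd} cannot fix the branch lengths. The sketch assumes T is the quartet tree ab|cd with pendant edge lengths la, lb, lc, ld and interior edge length e. These names and the numbers are hypothetical, since the original figure is not recoverable.

```python
def quartet_distances(la, lb, lc, ld, e):
    """Path-length distances between the leaves of the quartet tree ab|cd."""
    return {
        "ab": la + lb,
        "cd": lc + ld,
        "ac": la + e + lc,
        "ad": la + e + ld,
        "bc": lb + e + lc,
        "bd": lb + e + ld,
    }

L = ["ab", "cd", "ac", "bd"]

# Two different edge weightings of the same tree (assumed values).
d1 = quartet_distances(la=3, lb=3, lc=3, ld=3, e=1)
d2 = quartet_distances(la=4, lb=2, lc=2, ld=4, e=1)

same_on_L = all(d1[chord] == d2[chord] for chord in L)
same_on_L_plus_ad = same_on_L and d1["ad"] == d2["ad"]
print(same_on_L, same_on_L_plus_ad)  # True False
```

The two weightings agree on every chord of L, so d|L cannot tell them apart, while the extra chord ad separates them.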

Definition (Lassos)

Let \( L \subseteq \binom{X}{2} \).

• L lassos the branch lengths of T iff \( d_{(T,l)}|_L = d_{(T,l')}|_L \Rightarrow l = l' \)
• L lassos the shape of T iff \( d_{(T,l)}|_L = d_{(T',l')}|_L \Rightarrow T = T' \)
• L is a strong lasso for T if both hold.

We call the elements of L the ‘chords’ of the lasso

The graph (X, L)

L = {ab, cd, ac, bd}

[Figure: the graph (X, L) on vertices a, b, c, d]

Lemma:
• If L lassos the shape of T then (X,L) is connected.
• If L lassos the branch lengths of T then every connected component of (X, L) has an odd cycle.
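Both conditions in the lemma are easy to test on the graph (X, L). The sketch below uses breadth-first 2-colouring, relying on the standard fact that a connected graph contains an odd cycle exactly when it is not bipartite. The function name is hypothetical, and chords are written as two-character strings as on the slides. The conditions are necessary only, so passing both checks does not show that L is a lasso.

```python
from collections import deque

def lasso_graph_checks(X, L):
    """Return (is_connected, every_component_has_an_odd_cycle) for the graph (X, L)."""
    adj = {x: set() for x in X}
    for u, v in L:                      # a chord such as "ab" unpacks to ("a", "b")
        adj[u].add(v)
        adj[v].add(u)

    colour = {}
    components = 0
    all_have_odd_cycle = True
    for start in X:
        if start in colour:
            continue
        components += 1
        colour[start] = 0
        queue = deque([start])
        bipartite = True
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    bipartite = False   # same colour on both ends: odd cycle found
        all_have_odd_cycle = all_have_odd_cycle and not bipartite
    return components == 1, all_have_odd_cycle

print(lasso_graph_checks("abcd", ["ab", "cd", "ac", "bd"]))        # (True, False)
print(lasso_graph_checks("abcd", ["ab", "cd", "ac", "bd", "ad"]))  # (True, True)
```

The first set of chords forms a 4-cycle, which is connected but has no odd cycle, so by the lemma it cannot lasso the branch lengths.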

Question 1:

• Is every branch-length lasso a strong lasso?

[Figure: a tree on leaves a, b, c, a', b', c']

Is a branch-length lasso a strong lasso?

[Figure: two trees on leaves a, a', b, b', c, c', each annotated with the values 2, 2]

No!

Question 2: How small can lassos be?

Observation: Any branch-length lasso of T has size at least |E(T)|.

Theorem: If L lassos the shape of T then |L| > |E| - 1. Why?

Star tree T*: L lassos the shape of a star tree iff \( L = \binom{X}{2} \).

Binary tree T: There exist topological lassos of size 4(n-3). Why? Can we do better?

Question 2: How small can lassos be?

Theorem: Any binary tree T with n leaves has a strong lasso of size 2n-3 [Yushmanov, 1984].
Moreover, T has a topological lasso of size 2n-4.

A key idea: Lassos as “Covers”

Lemma: If L lassos the shape or the branch lengths of (binary) T then L is a cover for T.


“Pointed covers”

Theorem: If L is a pointed cover for T then L is a strong lasso for T.

Corollary: There are strong lassos for T of size 2n-3


The ‘Triplet Cover’ conjecture

Conjecture: If L contains a triplet cover for T then L lassos T’s shape

Lemma: If L contains a triplet cover for T then L lassos T’s branch lengths

{ab,ac,bc} ⊆ L

Theorem: If L contains a triplet cover for T and one for T’ then T=T’!

Finally…minimal branch length lassos

M(T) := set of branch length lassos of T of minimal size.

Simple example: Star tree

Example of an element of M(T) for n = 23:

[Figure: one element of M(T)]

Lemma: M(T) is a matroid that determines the shape of T (i.e. \( M(T) = M(T') \Rightarrow T = T' \))

Theorem: M(T) is a binary matroid if and only if T is a caterpillar tree

Further details

• Sanderson, M.J., McMahon, M.M. and Steel, M. (2010). Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evolutionary Biology 10: 155.
• Steel, M. and Sanderson, M.J. (2010). Characterizing phylogenetically decisive taxon coverage. Applied Mathematics Letters 23: 82-86.
• Sanderson, M.J., McMahon, M.M. and Steel, M. (2011). Terraces in phylogenetic tree space. Science 333: 448-450.
• Dress, A.W.M., Huber, K.T. and Steel, M. 'Lassoing' a phylogenetic tree (I): Basic properties, shellings, and covers. Journal of Mathematical Biology (in press).
• Lasso (II), (III) in prep.

The Annual New Zealand Phylogenetics Meeting Sun. 29th Jan. - Fri. 3rd Feb. 2012 ABaCASS Feb 4-12th 1st NZ phylogenetics workshop Feb 4-7.

www.math.canterbury.ac.nz/bio/events/south2012/