
Post on 13-Jan-2016


Page 1: A Separate Analysis Approach to the Reconstruction of   Phylogenetic Networks

A Separate Analysis Approach to the Reconstruction of Phylogenetic Networks

Luay Nakhleh
Department of Computer Sciences

UT Austin


Who’s Involved

– UT CS: Tandy Warnow, Luay Nakhleh
– UT BIO: Randy Linder
– UNM CS: Bernard Moret


Why Networks?

• Lateral gene transfer (LGT)
– Ochman estimated that 755 of 4,288 ORFs in E. coli were from at least 234 LGT events

• Hybridization
– Estimates suggest that as many as 30% of all plant lineages are the products of hybridization
– Fish
– Some frogs


Phylogenetic Networks

• Rooted, directed, acyclic graphs that actually model the evolutionary process

• “tree” nodes and “network” nodes

• Time constraints
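
The distinction between "tree" nodes and "network" nodes can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes a network is given as a list of (parent, child) edges and that a node with two or more parents is a network node while every other node is a tree node. The function name `classify_nodes` and the example network are hypothetical, and the time constraints are not modelled here.

```python
from collections import defaultdict
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def classify_nodes(edges):
    """Return (root, tree_nodes, network_nodes) for a rooted DAG.

    edges: iterable of (parent, child) pairs.
    Assumption: in-degree >= 2 marks a network node; all others are tree nodes.
    """
    parents = defaultdict(set)
    nodes = set()
    for parent, child in edges:
        parents[child].add(parent)
        nodes.update((parent, child))

    # static_order() raises CycleError when the graph contains a cycle.
    try:
        list(TopologicalSorter({n: parents.get(n, set()) for n in nodes}).static_order())
    except CycleError:
        raise ValueError("graph is not acyclic") from None

    roots = [n for n in nodes if not parents.get(n)]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    network_nodes = {n for n in nodes if len(parents.get(n, ())) >= 2}
    return roots[0], nodes - network_nodes, network_nodes


# Hypothetical network: leaf B descends from hybrid node h, which has parents x and y.
example_edges = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "h"),
    ("y", "h"), ("y", "C"),
    ("h", "B"),
]
root, tree_nodes, network_nodes = classify_nodes(example_edges)
```

In this example the only network node is `h`; every other node, including the root and the leaves, is reported as a tree node.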


Separate Analysis

• Analyze individual genes separately

• Reconcile the resulting phylogenies

• As opposed to combined analysis in which the datasets are combined (via concatenation) and the combined dataset is then analyzed
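
The difference between the two approaches shows up already in how the input data is prepared. The sketch below uses made-up toy alignments and a hypothetical helper `concatenate`; it only illustrates the data handling, not any tree or network inference.

```python
# Toy alignments (invented for illustration): taxon -> aligned sequence.
gene_I = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
gene_II = {"A": "GGC", "B": "GTC", "C": "GTC"}


def concatenate(*alignments):
    """Combined analysis input: join the per-gene sequences of each taxon."""
    taxa = set(alignments[0])
    if any(set(alignment) != taxa for alignment in alignments):
        raise ValueError("all alignments must cover the same taxa")
    return {taxon: "".join(a[taxon] for a in alignments) for taxon in sorted(taxa)}


# Combined analysis: one dataset, analyzed once.
combined = concatenate(gene_I, gene_II)

# Separate analysis: each gene is analyzed on its own and the resulting
# gene trees are reconciled afterwards.
separate = [gene_I, gene_II]
```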


SPR Distances Among Gene Trees

[Figure: gene trees on taxa A, B, C, D, E, with SPR distance 1]


Maddison’s Method

Given two gene datasets

• Construct two gene trees T1 and T2

• If SPR(T1,T2)=0
– Return a tree

• If SPR(T1,T2)=1
– Return a network with one reticulation event

Open problem: extend to reconstructing a network with m reticulation events
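
The case analysis above can be sketched with a brute-force check. This is not Maddison's or the authors' algorithm; it rests on the observation that two rooted trees differ by one SPR move exactly when some clade present in both trees, with the same internal structure, can be removed from both to leave identical trees. Assumptions: trees are rooted, on the same leaf set, written as nested tuples of leaf names, and regrafting above the root is allowed. All function names are hypothetical.

```python
def clusters(tree):
    """Set of clusters (leaf sets below each node) of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def restrict(cluster_set, keep):
    """Clusters of the tree induced on the leaves in `keep`."""
    return {c & keep for c in cluster_set if c & keep}


def moved_clades(t1, t2):
    """Clades whose pruning and regrafting turns t1 into t2 (may be empty)."""
    c1, c2 = clusters(t1), clusters(t2)
    leaves = max(c1, key=len)  # the root cluster contains every leaf
    result = []
    for clade in c1 & c2:
        rest = leaves - clade
        if (rest
                and restrict(c1, clade) == restrict(c2, clade)
                and restrict(c1, rest) == restrict(c2, rest)):
            result.append(clade)
    return result


def maddison_case(t1, t2):
    """Return which case of the method applies, plus candidate moved clades."""
    if clusters(t1) == clusters(t2):
        return "tree", []
    candidates = moved_clades(t1, t2)
    if candidates:
        return "one reticulation", candidates
    return "not handled", []


T1 = ((("A", "B"), "C"), ("D", "E"))
T2 = (("A", "B"), ("C", ("D", "E")))
status, candidates = maddison_case(T1, T2)
```

For `T1` and `T2` the status is "one reticulation", and the candidate list includes the clade {C}. Several clades can explain the same pair of trees, so the sketch returns all of them rather than choosing one; building the actual network from a chosen clade is left out.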


Challenges

(1) Computational

– Computing SPR distances is of unknown computational complexity (probably hard)


Solving the Computational Challenge

• Galled-networks: reticulation events are independent

• For two gene trees T1 and T2 on n leaves we can
– Decide whether SPR(T1,T2)=m in O(mn) time, and
– Construct network N from T1 and T2 in O(mn) time


Challenges

(2) Systematic

– Obtaining the correct gene trees in practice is very hard (due to missing data, inaccuracy of tree reconstruction methods, wrong assumptions, etc.)


Solving the Systematic Challenge: Our Method SpNet

Given the sequences of two genes I & II on a set of species

• Run MP or ML on gene I and obtain a set U1 of trees, represented by its consensus tree t1

• Run MP or ML on gene II and obtain a set U2 of trees, represented by its consensus tree t2

• Find binary trees T1 and T2 that refine t1 and t2, respectively, such that SPR(T1,T2)=1

• Build network N from T1 and T2
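
The notions of consensus tree and refinement used in these steps can be made concrete with clusters. The sketch below assumes the consensus is a strict consensus (the slides do not say which kind is used) and that trees are rooted nested tuples of leaf names; the function names and example trees are hypothetical. It does not search for the refinements T1 and T2 themselves.

```python
def clusters(tree):
    """Set of clusters (leaf sets below each node) of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def strict_consensus(trees):
    """Clusters shared by every tree in the set (assumed consensus type)."""
    return set.intersection(*(clusters(tree) for tree in trees))


def refines(binary_tree, consensus_clusters):
    """A tree refines the consensus if it keeps every consensus cluster."""
    return consensus_clusters <= clusters(binary_tree)


# Hypothetical set U1 of equally good trees for gene I.
U1 = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
]
t1 = strict_consensus(U1)  # leaves {A,B,C} unresolved, keeps {D,E}

candidate_ok = ((("B", "C"), "A"), ("D", "E"))   # resolves {A,B,C} differently
candidate_bad = (("A", "B"), ("C", ("D", "E")))  # breaks the cluster {A,B,C}
```

Here `refines(candidate_ok, t1)` holds even though `candidate_ok` is not in `U1`, which is the point of working with the consensus: any resolution of the unresolved part is a legal candidate when looking for a pair at SPR distance 1.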


SpNet: Running Time

• We have a linear-time algorithm for the single hybrid case (implementation and experimental results are available as well)

• We are working on the general case of an arbitrary number of reticulation events


Experimental Study

• Generated random networks on 10 and 20 taxa, with 0, 1, and 2 hybrids

• Evolved sequences under the GTR+Gamma model of evolution with invariant sites

• We studied the topological accuracy based on the splits defined by the model and inferred networks
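
One common way to turn split comparison into numbers is to report false negative and false positive rates. The slides do not state the exact measure used, so the sketch below is an assumption: it takes the splits of the model and inferred phylogenies as given sets and compares them. The helper names and the example splits are hypothetical.

```python
def split(side, taxa):
    """Canonical form of a bipartition: an unordered pair of leaf sets."""
    side = frozenset(side)
    return frozenset([side, frozenset(taxa) - side])


def split_error_rates(model_splits, inferred_splits):
    """Return (false negative rate, false positive rate) over two split sets."""
    model, inferred = set(model_splits), set(inferred_splits)
    false_negative = len(model - inferred) / len(model) if model else 0.0
    false_positive = len(inferred - model) / len(inferred) if inferred else 0.0
    return false_negative, false_positive


taxa = "ABCDE"
model = {split("AB", taxa), split("ABC", taxa)}
inferred = {split("AB", taxa), split("ABD", taxa)}
fn_rate, fp_rate = split_error_rates(model, inferred)
```

In this toy case one of the two model splits is missing from the inferred set and one of the two inferred splits is not in the model, so both rates are 0.5.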


Evaluation Criteria

• Detection Quality
– How often did the method infer the correct number of hybrids in the model phylogeny?

• Reconstruction Quality
– What is the topological accuracy of the inferred phylogeny?


Methods

• SpNet(i): Our method where we contract i edges

• NNet: The method of Bryant and Moulton

• NJ


Detection Quality of SpNet
Model Phylogeny: 20-taxon tree


Detection Quality of SpNet
Model Phylogeny: 20-taxon 1-hybrid network


Detection Quality of SpNet
Model Phylogeny: 20-taxon 2-hybrid network


Reconstruction Quality
Model Phylogeny: 20-taxon tree


Reconstruction Quality
Model Phylogeny: 20-taxon 1-hybrid network


Reconstruction Quality
Model Phylogeny: 20-taxon 1-hybrid network


Conclusions

• Considering a set of “good” trees rather than a single optimal tree is advantageous in network reconstruction

• Separate analysis approaches outperform combined analysis approaches


Ongoing research

• Using other techniques for obtaining unresolved trees (e.g., Bayesian analyses, bootstrapping, etc.)

• Detection vs. reconstruction – visualization and clustering techniques may also be useful (collaboration with St John)

• Refining unresolved networks

• DCM-like network reconstruction