carleton biology talk : march 2014
Post on 23-Aug-2014
@kcranstn
http://slideshare.net/kcranstn
Enabling science with the tree of life
Karen Cranston
National Evolutionary Synthesis Center (NESCent)
The tree of life provides a means for organizing
and explaining biodiversity data
Wiegmann et al. PNAS, 2011
What do we want from a Tree of Life?
❖ complete = contains all of biodiversity
❖ dynamic = continuously updated with new data
❖ available digitally = browse, query, download
Image: http://evolution.berkeley.edu
❖ Create a complete tree of life by synthesizing published phylogenetic data
❖ Provide tools for managing, synthesizing & sharing phylogenetic data
http://opentreeoflife.org
Synthetic science
❖ Novel methods & analysis tools
❖ Big data from existing data
Biodiversity Synthesis Center / Encyclopedia of Life
National Evolutionary Synthesis Center
Challenges
❖ Incongruence: How do we detect and use conflict between trees?
❖ Availability: What data do we have to construct a tree of life?
❖ Synthesis: How do we combine data across the tree of life?
What can we learn from conflict between trees?
aactgtcgcatgttgacg...
aattgtcg-atgttgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aac-gtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...
aactgtcgcatgtcgacg...
Phylogenetic inference
Many likely trees
Gene tree uncertainty
Single gene alignment
Bayesian phylogenetic inference
Input: sequence data + evolutionary model
Output = list of sampled phylogenies
[Figure: bar chart of sampled trees. x-axis: sampled trees (1 to 49); y-axis: probability (0 to 0.18)]
Number of times sampled ∝ probability
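The proportionality above can be sketched in a few lines of Python: a topology's estimated posterior probability is the fraction of the sample in which it appears. The Newick strings and their counts are toy data (assumptions, not from the talk), and the sketch assumes identical topologies are always written as identical strings.

```python
from collections import Counter


def topology_probabilities(sampled_trees):
    """Estimate each topology's probability as its share of the sample."""
    counts = Counter(sampled_trees)
    total = len(sampled_trees)
    return {tree: n / total for tree, n in counts.most_common()}


# Toy sample of 20 trees (hypothetical topologies and counts).
sample = (
    ["((1,2),(3,(4,5)));"] * 8
    + ["(1,((2,3),(4,5)));"] * 5
    + ["(1,(3,(2,(4,5))));"] * 4
    + ["(1,(3,((2,4),5)));"] * 3
)

probs = topology_probabilities(sample)
for tree, p in probs.items():
    print(f"{p:.2f}  {tree}")
```

A real analysis would first reduce each sampled tree to a canonical form so that the same topology written in a different order is not counted twice.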
Is there a stable backbone among the trees?
What taxa have unstable placement?
Summarize with agreement subtrees
[Figure: two examples, each showing four sampled trees on taxa 1 to 5 (probabilities 0.40, 0.25, 0.20, 0.15) alongside the same trees with taxon 2 removed; the panels are labeled Pr=1.00 and Pr=0.85]
Cranston, K.A. and B.H. Rannala. Summarizing a posterior distribution of phylogenies using agreement subtrees. Systematic Biology 2007: 56(4), pp. 578-590.
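The intuition behind summarizing with subtrees can be illustrated with a minimal sketch: prune one unstable taxon from every sampled tree and check how much probability the best remaining topology collects. This is not the algorithm from the cited paper. The trees (nested tuples) and their counts are toy assumptions, chosen so that taxon 2 is the only unstable taxon.

```python
from collections import Counter


def prune(tree, taxon):
    """Remove one leaf from a nested-tuple tree, collapsing single-child nodes."""
    if not isinstance(tree, tuple):
        return None if tree == taxon else tree
    kept = [c for c in (prune(child, taxon) for child in tree) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def canonical(tree):
    """Order-independent form, so ((1,2),3) and (3,(2,1)) compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return frozenset(canonical(child) for child in tree)


def best_topology_probability(trees, drop=None):
    """Share of the sample held by the most frequent topology, optionally after pruning."""
    if drop is not None:
        trees = [prune(t, drop) for t in trees]
    counts = Counter(canonical(t) for t in trees)
    return counts.most_common(1)[0][1] / len(trees)


# Toy sample: taxon 2 moves around a fixed backbone (1,(3,(4,5))).
sample = (
    [((1, 2), (3, (4, 5)))] * 8
    + [(1, ((2, 3), (4, 5)))] * 5
    + [(1, (3, (2, (4, 5))))] * 4
    + [(1, (3, ((2, 4), 5)))] * 3
)

before = best_topology_probability(sample)
after = best_topology_probability(sample, drop=2)
print(before, after)
```

In this toy sample, no full topology holds more than 40% of the trees, but once taxon 2 is pruned every tree agrees on the same four-taxon subtree.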
Multiple sequence alignments
Concatenate
Supermatrix
Species tree
Supertrees
Gene duplication
Coalescent
Gene trees
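The "concatenate" step that turns multiple sequence alignments into a supermatrix can be sketched as follows. The taxon names, sequences, and the use of "?" for missing genes are illustrative assumptions, not details from the talk.

```python
def concatenate(alignments):
    """Join per-gene alignments into one supermatrix, padding absent taxa with '?'."""
    taxa = sorted({name for aln in alignments for name in aln})
    matrix = {name: "" for name in taxa}
    for aln in alignments:
        # Every sequence within one gene alignment must have the same length.
        length = len(next(iter(aln.values())))
        if any(len(seq) != length for seq in aln.values()):
            raise ValueError("sequences within one alignment must have equal length")
        for name in taxa:
            matrix[name] += aln.get(name, "?" * length)
    return matrix


# Toy alignments (hypothetical): taxon B has no sequence for gene 2.
gene1 = {"A": "aactg", "B": "aattg", "C": "aac-g"}
gene2 = {"A": "tcgc", "C": "tcg-"}

supermatrix = concatenate([gene1, gene2])
for name, seq in supermatrix.items():
    print(name, seq)
```

The supermatrix is then analyzed as a single alignment, whereas the gene-tree route on the slide keeps each alignment separate.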
Phylogenomics of rice (Oryza)
820,000 BAC-end sequences for 9 diploid Oryza species
1720 gene fragments
2.4 million nucleotides
Cranston, K.A., B. Hurwitz, M.J. Sanderson, D. Ware, R.A. Wing, L. Stein. Phylogenomic analysis from deep BAC-end sequence libraries of rice. Systematic Botany, 35:3, 2010
What are the biological causes of gene tree incongruence in rice?
Do we need full genomes to answer these questions?
Phylogenomics of rice (Oryza)
Concatenated analysis
Gene trees in Oryza
❖ Gene tree methods: recover every possible topology
❖ Species tree methods: many clades not statistically significant
Cranston, K.A., B. Hurwitz, D. Ware, L. Stein, R.A. Wing. Species trees from highly incongruent gene trees in rice. Systematic Biology. 2009: doi: 10.1093/sysbio/syp054
Supermatrix topology
❖ Results suggest incomplete lineage sorting and hybridization / introgression in the evolutionary history of rice
What data do we have for creating a complete tree of life?
Gene tree signal in GenBank
How many trees can we build using all of the data in GenBank and how are those trees distributed across the tree of life?
All-vs-all BLAST at each NCBI taxonomy node
Sanderson, M.J., D.T. Boss, D. Chen, K.A. Cranston, and A. Wehe. The PhyLoTA Browser: Processing GenBank for molecular phylogenetics research. Systematic Biology 2008: 57(3).
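One way to turn all-vs-all BLAST results into sequence clusters is single-linkage clustering: any two sequences connected by a chain of significant hits land in the same cluster. The slide does not say which clustering rule the pipeline uses, so single linkage is an assumption here, and the sequence IDs and hit pairs are made up.

```python
def cluster_by_hits(sequence_ids, hits):
    """Group sequences into connected components of the hit graph (union-find)."""
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        parent[find(a)] = find(b)

    clusters = {}
    for s in sequence_ids:
        clusters.setdefault(find(s), set()).add(s)
    # Largest clusters first; ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical sequences at one taxonomy node and their significant hit pairs.
ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
significant_hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]

clusters = cluster_by_hits(ids, significant_hits)
print(clusters)
```

Each cluster with enough sequences is then a candidate for alignment and tree building.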
Arachis hypogaea
Arachis hypogaea subsp. fastigiata
Arachis hypogaea subsp. hypogaea
Arachis glabrata
subtree clusters
Arachis
All possible clusters, alignments and trees
[Figure: example sequence alignment]
❖ ~90,000 clusters, alignments, trees available for download
❖ data availability matrix at each NCBI node
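A data availability matrix can be pictured as a taxa-by-clusters table of 0s and 1s, where a 1 means the taxon has at least one sequence in that cluster. The sketch below is only an illustration: the cluster names and their membership are hypothetical, with taxon names borrowed from the Arachis example.

```python
def availability_matrix(clusters):
    """Build a presence/absence table: one row per taxon, one column per cluster."""
    taxa = sorted(set().union(*clusters.values()))
    names = sorted(clusters)
    rows = {t: [1 if t in clusters[c] else 0 for c in names] for t in taxa}
    return names, rows


# Hypothetical clusters at one taxonomy node.
toy_clusters = {
    "cluster_1": {"Arachis hypogaea", "Arachis glabrata"},
    "cluster_2": {"Arachis hypogaea"},
}

names, rows = availability_matrix(toy_clusters)
print("taxon".ljust(20), *names)
for taxon, row in rows.items():
    print(taxon.ljust(20), *row)
```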
❖ complete = contains all of biodiversity
❖ dynamic = continuously updated with new data
❖ available digitally = browse, query, download
http://opentreeoflife.org
Gordon Burleigh, Keith Crandall, Karl Gude, David Hibbett, Mark Holder, Laura Katz, Rick Ree, Stephen Smith, Doug Soltis, Tiffani Williams
Computer science, Systematics, Evolutionary theory, Computational biology, Bioinformatics, Journalism
Even if there were phylogenies for all sequence clusters in GenBank, they would represent only a small fraction of biodiversity
Two types of inputs
Phylogeny: highly resolved, computationally derived, limited coverage
Taxonomy: poorly resolved, manually curated, much more complete
~7000 trees from ~2600 studies
Phylografter: Rick Ree, Field Museum of Natural History
[Figure: page 3 of Wiegmann et al., PNAS Early Edition, including Fig. 1: combined molecular phylogenetic tree for Diptera (partitioned ML analysis in RAxML; circles mark bootstrap support >80%)]
~ 4% of all published phylogenetic trees
Stoltzfus et al. 2012
Trees generally published as pictures in PDFs
OpenTree Reference Taxonomy
[Figure: several source taxonomies combined, plus patch files for manual edits]
3,133,028 nodes and 2,559,835 ‘species’
Jonathan Rees, NESCent
How do we combine data to build and use a tree of life?
Novel datastore for synthesis
Treemachine: Stephen Smith, Cody Hinchliff, Joseph Brown, U Michigan
Jim Allman, NESCent
Manual synthesis based on all data
Automated synthesis based on limited data
Inputs: Published phylogenies
Taxonomies
• filter / weight input trees
• re-synthesize
• process feedback
• input new trees
synthetic tree of life
Improving the synthetic tree
❖ Branch lengths & divergence times
❖ Better synthesis using tree metadata
❖ Community engagement
❖ data deposition & curation
❖ feedback & annotation
Moving beyond a single tree
❖ Detecting conflict and coverage
❖ Visualization
❖ Enabling custom synthesis
❖ Building out to other tools & resources
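Detecting conflict between two rooted trees can be sketched with a standard compatibility check: two clades conflict when they share some taxa but neither contains the other. This is a minimal illustration, not the project's implementation. It assumes both trees have the same leaf set, and the example trees (nested tuples) are hypothetical.

```python
def clades(tree):
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def conflicts(tree_a, tree_b):
    """List clade pairs that overlap without one containing the other."""
    return [
        (a, b)
        for a in clades(tree_a)
        for b in clades(tree_b)
        if a & b and not (a <= b or b <= a)
    ]


# Two hypothetical trees that disagree on where taxon 2 belongs.
tree_a = ((1, 2), (3, (4, 5)))
tree_b = (1, ((2, 3), (4, 5)))

found = conflicts(tree_a, tree_b)
for a, b in found:
    print(sorted(a), "conflicts with", sorted(b))
```

Clades that appear in neither list of conflicts, such as (4,5) here, are the parts of the two trees that agree.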
Open Tree of Life
What can we do with a tree of life?
[Figure: example sequence alignment]
Image: Zephyris at the English language Wikipedia
10 million years
24 million years
Acer macrophyllum
Betula lutea
Aesculus glabra
Tilia americana
Ulmus rubra
Leaf patterns image from Walls RL: American Journal of Botany 2011, 98(2):244-253.
Acer macrophyllum
Betula alleghaniensis
Aesculus glabra
Tilia americana
Ulmus rubra
Stoltzfus, A., Lapp, H., Matasci, N., … Cranston, K.A., ... & Jordan, G. (2013). Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC bioinformatics, 14(1), 158.
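The "species list in, tree out" idea can be sketched in two steps: resolve each name to an accepted name, then prune a large tree down to the requested species. This is not the Phylotastic API. The synonym map is taken from the two species lists above (Betula lutea becomes Betula alleghaniensis), while the large tree, including its topology and the extra species, is an arbitrary placeholder and not a claim about real relationships.

```python
def resolve(names, synonyms):
    """Replace each name with its accepted name when a synonym is known."""
    return [synonyms.get(name, name) for name in names]


def prune_to(tree, keep):
    """Keep only the listed leaves of a nested-tuple tree, collapsing single-child nodes."""
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    kept = [c for c in (prune_to(child, keep) for child in tree) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def to_newick(tree):
    """Write a nested-tuple tree as a Newick string without the trailing semicolon."""
    if not isinstance(tree, tuple):
        return tree.replace(" ", "_")
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Placeholder "big tree" (hypothetical topology, for illustration only).
big_tree = (
    (("Acer macrophyllum", "Aesculus glabra"), "Tilia americana"),
    (("Betula alleghaniensis", "Quercus alba"), "Ulmus rubra"),
    "Pinus strobus",
)

requested = ["Acer macrophyllum", "Betula lutea", "Aesculus glabra",
             "Tilia americana", "Ulmus rubra"]
synonyms = {"Betula lutea": "Betula alleghaniensis"}

accepted = resolve(requested, synonyms)
subtree = prune_to(big_tree, set(accepted))
newick = to_newick(subtree) + ";"
print(newick)
```

Without the name-resolution step, Betula lutea would silently drop out of the result because the tree only knows the accepted name.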
Collaborative data collection
Validation of datasets
Search & download across datasets
Get tree
What can we do with a tree of life?
University of Alberta: Bruce Rannala
University of Arizona: Michael Sanderson
NESCent: Jonathan Rees, Jim Allman