Synthesising disparate data resources to obtain composite estimates of geophylogeny
DESCRIPTION
Invited talk to the 2nd BioVeL workshop, Gothenburg, Sweden, 10 May 2012
TRANSCRIPT
SYNTHESISING DISPARATE DATA RESOURCES TO OBTAIN COMPOSITE ESTIMATES OF GEOPHYLOGENY
Rutger Vos
A simple assignment?
Refine a tree for the Primates with taxonomic and systematic data
Add divergence dates
Add occurrence data
Visualize the result
Use public web services
Actually not so easy…
The Tree of Life Web Service
Using PhyloWS we traversed the Tree of Life and built a local, semantically annotated copy of the Primate clade
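A minimal sketch of the traversal idea, in Python rather than the Perl used for the talk. It assumes a hypothetical `fetch_children` callable that stands in for one request to the tree service; the actual PhyloWS calls are not shown in the slides, so a small in-memory dictionary plays the role of the remote service here.

```python
from collections import deque
from typing import Callable, Dict, List


def copy_clade(root: str,
               fetch_children: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Walk a clade breadth-first and store each node's children locally."""
    local_tree: Dict[str, List[str]] = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node in local_tree:
            continue  # guard against the service returning a node twice
        children = list(fetch_children(node))
        local_tree[node] = children
        queue.extend(children)
    return local_tree


# Stand-in for the remote service (hypothetical): a tiny fragment of the clade.
TOY_SERVICE = {
    "Primates": ["Strepsirrhini", "Haplorrhini"],
    "Haplorrhini": ["Tarsiiformes", "Simiiformes"],
}


def toy_fetch(node: str) -> List[str]:
    """Return the children of a node, or an empty list for a tip."""
    return TOY_SERVICE.get(node, [])


clade = copy_clade("Primates", toy_fetch)
for parent, children in clade.items():
    print(parent, "->", children)
```

Passing the fetch function in as an argument keeps the traversal independent of any one service, so the same loop could be pointed at a real client once its request format is known.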
Adding taxonomic metadata
Using the uBio PhyloWS service we enhanced our tree with further taxonomic annotations and links, and expanded some genera
Fetching additional tree data
Using the TreeBASE PhyloWS service we fetched additional data to resolve the tree further using a “supertree” approach
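The slides do not say which supertree method was used. As an illustration only, the sketch below shows the matrix coding step of one widely used method, Matrix Representation with Parsimony (MRP): every non-root clade in every source tree becomes one binary character, with "1" for taxa inside the clade, "0" for other taxa in that tree, and "?" for taxa missing from that tree. Trees are written as nested tuples, and the function names are hypothetical.

```python
def leaves(tree):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree, is_root=True):
    """Yield the tip set of every internal node except the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield frozenset(leaves(tree))
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(source_trees):
    """Code each clade of each source tree as one column of 0/1/? states."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    matrix = {taxon: "" for taxon in all_taxa}
    for tree in source_trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    matrix[taxon] += "?"   # taxon not sampled in this tree
                elif taxon in clade:
                    matrix[taxon] += "1"   # taxon is a member of the clade
                else:
                    matrix[taxon] += "0"   # taxon is in the tree, outside the clade
    return matrix


# Two small, partially overlapping source trees.
tree_a = (("Homo", "Pan"), "Gorilla")
tree_b = (("Homo", "Gorilla"), "Pongo")

matrix = mrp_matrix([tree_a, tree_b])
for taxon, row in matrix.items():
    print(f"{taxon:8s} {row}")
```

The resulting matrix would then be handed to a parsimony program to search for the supertree; that search is the computationally heavy part and is not attempted here.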
Computing node ages
The TimeTree PhyloWS service allowed us to anchor molecular (i.e. relative) node ages on absolute dates
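One simple way to picture the anchoring step: if the relative ages are proportional to time, a single node with a known absolute date fixes the scale for every other node. The sketch below assumes exactly that (one calibration point, linear scaling), which may be simpler than what was done for the talk. The node names and ages are illustrative values, not real primate divergence dates.

```python
def calibrate(relative_ages, calibration_node, calibration_age_mya):
    """Rescale relative node ages so the calibration node gets its known age."""
    relative = relative_ages[calibration_node]
    if relative <= 0:
        raise ValueError("the calibration node needs a positive relative age")
    scale = calibration_age_mya / relative
    return {node: age * scale for node, age in relative_ages.items()}


# Illustrative relative ages (root = 1.0) and one dated node, in millions of years.
relative = {"root": 1.0, "nodeA": 0.5, "nodeB": 0.25}
absolute = calibrate(relative, "nodeA", 40.0)
print(absolute)
```

With several dated nodes the scale factors will generally disagree, and a rate-smoothing or relaxed-clock method is needed instead of a single multiplier.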
Adding occurrence data
Using the GBIF XML API, we then fetched occurrence records for the species in our tree
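For readers who want to try this step today, the sketch below uses GBIF's current JSON occurrence search rather than the XML API used in 2012, so it is a substitute and not the code behind the talk. It only builds the request URL and pulls coordinates out of a response; the sample payload is hand-written for illustration and is not a real GBIF record.

```python
import json
from urllib.parse import urlencode

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_url(scientific_name, limit=20):
    """Build a search URL for georeferenced records of one species."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",
        "limit": limit,
    }
    return GBIF_SEARCH + "?" + urlencode(params)


def extract_points(payload):
    """Collect (species, latitude, longitude) from a decoded search response."""
    points = []
    for record in payload.get("results", []):
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is not None and lon is not None:
            points.append((record.get("species"), lat, lon))
    return points


# Hand-written sample in the shape of a search response (not real data).
sample = json.loads("""
{"results": [
  {"species": "Pan troglodytes", "decimalLatitude": -4.5, "decimalLongitude": 29.6},
  {"species": "Pan troglodytes"}
]}
""")

url = occurrence_url("Pan troglodytes")
points = extract_points(sample)
print(url)
print(points)
```

Records without coordinates are skipped rather than treated as errors, since occurrence data are often incomplete.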
Visualizing the result
Implementation
Except for GBIF, all services return NeXML and implement PhyloWS
Semantic annotations using RDFa
Glued together with Perl
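To give a feel for what the glue code has to handle, here is a minimal sketch that reads the edges of a tree from a NeXML document using Python's standard library (the talk itself used Perl). The sample document is a pared-down fragment written for this example; it has not been validated against the NeXML schema and omits the RDFa-style `meta` annotations.

```python
import xml.etree.ElementTree as ET

NEXML_NS = {"nex": "http://www.nexml.org/2009"}

# Simplified, hand-written NeXML fragment (an assumption for illustration).
SAMPLE = """<nex:nexml xmlns:nex="http://www.nexml.org/2009"
           xmlns="http://www.nexml.org/2009" version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1">
      <node id="n1" root="true"/>
      <node id="n2" otu="t1"/>
      <node id="n3" otu="t2"/>
      <edge id="e1" source="n1" target="n2" length="1.0"/>
      <edge id="e2" source="n1" target="n3" length="1.0"/>
    </tree>
  </trees>
</nex:nexml>"""


def read_edges(xml_text):
    """Return (source, target, length) tuples, naming tips by their OTU label."""
    root = ET.fromstring(xml_text)
    labels = {otu.get("id"): otu.get("label")
              for otu in root.findall(".//nex:otu", NEXML_NS)}
    # Map tip node ids to taxon labels; internal nodes carry no otu attribute.
    tips = {node.get("id"): labels[node.get("otu")]
            for node in root.findall(".//nex:node", NEXML_NS)
            if node.get("otu")}
    edges = []
    for edge in root.findall(".//nex:edge", NEXML_NS):
        target = edge.get("target")
        edges.append((edge.get("source"),
                      tips.get(target, target),
                      float(edge.get("length"))))
    return edges


edges = read_edges(SAMPLE)
print(edges)
```

Because the services share one return format, a single reader like this can serve every step of the pipeline except the GBIF one.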
Challenges
Although some services have the same API, no GUI exists to chain them together
No web services for computationally intensive steps
Data and metadata are messy and sparse
Conclusions
The tree of life can be covered with all sorts of metadata (taxonomic, molecular, biogeographic, paleontological), viewable in different ways
However, standards are still incompletely defined and not consistently adhered to
Shameless plug: PhyloTastic
A web service to extract subsets of taxa from megatrees and annotate them
Deliverable of the first HIP hackathon, at NESCent, in June 2012
Acknowledgements