Synthesising disparate data resources to obtain composite estimates of geophylogeny

Upload: rutger-vos

Post on 18-Dec-2014

DESCRIPTION

Invited talk to the 2nd BioVeL workshop, Gothenburg, Sweden, 10 May 2012

TRANSCRIPT

Page 1: Synthesising disparate data resources to obtain composite estimates of geophylogeny

SYNTHESISING DISPARATE DATA RESOURCES TO OBTAIN COMPOSITE ESTIMATES OF GEOPHYLOGENY

Rutger Vos

Page 2: Synthesising disparate data resources to obtain composite estimates of geophylogeny

A simple assignment?

Refine a tree for the Primates with taxonomic and systematic data

Add divergence dates

Add occurrence data

Visualize the result

Use public web services

Page 3: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Actually not so easy…

Page 4: Synthesising disparate data resources to obtain composite estimates of geophylogeny

The Tree of Life Web Service

Using PhyloWS we traversed the Tree of Life and built a local, semantically annotated copy of the Primate clade
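
The traversal on this slide can be pictured as a breadth-first walk that asks for the children of one node at a time and records the result locally. The sketch below is a minimal illustration under stated assumptions: `fetch_children` and the `CHILDREN` table are hypothetical stand-ins for a remote tree service, and no actual PhyloWS request is made.

```python
from collections import deque

# Stand-in for a remote tree service: maps a node name to its child names.
# The names are a small, hand-picked part of the primate classification,
# not data fetched from the Tree of Life.
CHILDREN = {
    "Primates": ["Strepsirrhini", "Haplorhini"],
    "Haplorhini": ["Tarsiiformes", "Simiiformes"],
    "Simiiformes": ["Platyrrhini", "Catarrhini"],
}


def fetch_children(name):
    """Hypothetical lookup; a real client would issue one web request here."""
    return CHILDREN.get(name, [])


def copy_clade(root):
    """Breadth-first traversal that records each node's parent locally."""
    parent_of = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in fetch_children(node):
            if child not in parent_of:  # guard against repeated nodes
                parent_of[child] = node
                queue.append(child)
    return parent_of


local_copy = copy_clade("Primates")
```

Storing a parent pointer per node keeps the local copy easy to extend later, for example with annotations keyed on the same node names.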

Page 5: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Adding taxonomic metadata

Using the uBio PhyloWS service we enhanced our tree with further taxonomic annotations and links, and expanded some genera

Page 6: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Fetching additional tree data

Using the TreeBASE PhyloWS service we fetched additional data to resolve the tree further using a “supertree” approach
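
The slide does not say which supertree method was used. One widely used option is matrix representation with parsimony (MRP), and the sketch below shows only its matrix-coding step, assuming source trees are given as nested tuples. Each clade of each source tree becomes one column: `1` for taxa inside the clade, `0` for taxa in that tree but outside the clade, `?` for taxa the tree does not sample. The two source trees are illustrative and the function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree, is_root=True):
    """Yield the leaf set of every internal node except the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield leaves(tree)
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(source_trees):
    """Code each clade as one column: 1 inside, 0 outside, ? if taxon absent."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    rows = {taxon: "" for taxon in all_taxa}
    for tree in source_trees:
        sampled = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in sampled:
                    rows[taxon] += "?"
                else:
                    rows[taxon] += "1" if taxon in clade else "0"
    return rows


# Two small, illustrative source trees with partly overlapping taxa.
tree_a = ((("Homo", "Pan"), "Gorilla"), "Pongo")
tree_b = (("Homo", "Pan"), "Macaca")
matrix = mrp_matrix([tree_a, tree_b])
```

The resulting matrix would then be handed to a parsimony program to search for the supertree, which is a computationally heavy step not shown here.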

Page 7: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Computing node ages

The TimeTree PhyloWS service allowed us to anchor molecular (i.e. relative) node ages on absolute dates
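
In its simplest form, anchoring relative ages on absolute dates is a linear rescaling: pick one node whose absolute age is known and multiply every relative age by the same factor. The sketch below assumes a single calibration point and a constant rate, which is a strong simplification. The node names and all numbers are hypothetical and do not come from TimeTree.

```python
def calibrate(relative_ages, calibration_node, absolute_age):
    """Rescale relative node ages so one node matches a known absolute age."""
    reference = relative_ages[calibration_node]
    if reference <= 0:
        raise ValueError("calibration node needs a positive relative age")
    factor = absolute_age / reference
    return {node: age * factor for node, age in relative_ages.items()}


# Hypothetical relative ages (root scaled to 1.0) and a hypothetical
# calibration of 50 million years for the root; neither comes from TimeTree.
relative = {"root": 1.0, "node_a": 0.5, "node_b": 0.2}
absolute = calibrate(relative, "root", 50.0)
```

With several calibration points the scaling is no longer a single factor, and dedicated dating software would be the more appropriate tool.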

Page 8: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Adding occurrence data

Using the GBIF XML API, we then fetched occurrence records for the species in our tree
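
As a rough illustration of this step, the sketch below builds an occurrence query for one species and pulls coordinate pairs out of a decoded response. It assumes GBIF's JSON occurrence search endpoint rather than the XML API used in the talk, so it is not the original implementation. The helper names are hypothetical and the sample response is hand-written with invented values; no request is sent.

```python
from urllib.parse import urlencode

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_url(scientific_name, limit=20):
    """Build a search URL for georeferenced records of one species."""
    query = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",
        "limit": limit,
    }
    return GBIF_SEARCH + "?" + urlencode(query)


def coordinates(response):
    """Extract (latitude, longitude) pairs from a decoded JSON response."""
    points = []
    for record in response.get("results", []):
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is not None and lon is not None:
            points.append((lat, lon))
    return points


url = occurrence_url("Pan troglodytes", limit=5)

# Hand-written stand-in for a decoded response; the values are invented.
sample = {"results": [
    {"decimalLatitude": 1.5, "decimalLongitude": 30.25},
    {"decimalLatitude": None, "decimalLongitude": None},
]}
points = coordinates(sample)
```

Records without coordinates are skipped because they cannot be placed on a map alongside the tree.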

Page 9: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Visualizing the result

Page 10: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Implementation

Except for GBIF, all services:

return NeXML

implement PhyloWS

Semantic annotations using RDFa

Glued together with Perl
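
To give a feel for what consuming NeXML looks like, the sketch below reads taxon labels and their attached `meta` annotations from a small document. The talk's glue code was written in Perl; Python is used here purely for illustration. The XML fragment is hand-written and simplified (it has not been validated against the NeXML schema), and the `dc:source` annotation is a made-up example.

```python
import xml.etree.ElementTree as ET

NEXML_NS = {"nex": "http://www.nexml.org/2009"}

# Simplified, hand-written fragment; not validated against the NeXML schema.
DOCUMENT = """\
<nex:nexml xmlns:nex="http://www.nexml.org/2009"
           xmlns="http://www.nexml.org/2009" version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens">
      <meta id="m1" property="dc:source" content="hypothetical example"/>
    </otu>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
</nex:nexml>
"""


def read_otus(xml_text):
    """Map each OTU label to its property/content annotations."""
    root = ET.fromstring(xml_text)
    annotations = {}
    for otu in root.iterfind(".//nex:otu", NEXML_NS):
        metas = otu.findall("nex:meta", NEXML_NS)
        annotations[otu.get("label")] = {
            m.get("property"): m.get("content") for m in metas
        }
    return annotations


otus = read_otus(DOCUMENT)
```

Because every service except GBIF returns the same format, one reader of this kind could in principle serve all of them.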

Page 11: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Challenges

Although some services have the same API, no GUI exists to chain them together

No web services for computationally intensive steps

Data and metadata are messy and sparse

Page 12: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Conclusions

The tree of life can be annotated with many kinds of metadata (taxonomic, molecular, biogeographic, paleontological) and viewed in different ways

However, standards are still incompletely defined and inconsistently adhered to

Page 13: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Shameless plug: PhyloTastic

A web service to extract subsets of taxa from megatrees and annotate them

Deliverable of the first HIP hackathon, at NESCent, in June 2012
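
The core operation described on this slide, extracting a subset of taxa from a large tree, amounts to pruning: drop every tip that was not requested and collapse internal nodes left with a single child. The sketch below illustrates that idea on a nested-tuple tree. It is not the PhyloTastic implementation; the `prune` function and the small tree standing in for a megatree are hypothetical.

```python
def prune(tree, keep):
    """Return the subtree induced by the names in keep, or None if empty."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (prune(child, keep) for child in tree)
            if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # collapse a node left with a single child
    return tuple(kept)


# Small illustrative tree standing in for a megatree.
megatree = ((("Homo", "Pan"), "Gorilla"), ("Macaca", "Papio"))
subset = prune(megatree, {"Homo", "Gorilla", "Papio"})
```

Here `subset` keeps the branching order of the original tree for the three requested taxa, which is what makes a pruned megatree usable as a topology for downstream annotation.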

Page 14: Synthesising disparate data resources to obtain composite estimates of geophylogeny

Acknowledgements