phyloinformatics and the semantic web
DESCRIPTION
Departmental seminar to the University of Bath, 21 March 2011.TRANSCRIPT
![Page 1: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/1.jpg)
Phyloinformatics and the Semantic Web
Rutger Vos
![Page 2: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/2.jpg)
Outline
• What is phyloinformatics and why should you care?
• How we got here and where we are now• How the semantic web can help• Projects that apply the semantic web to
phyloinformatics• Examples of linked data• Where to next
![Page 3: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/3.jpg)
What is Phyloinformatics?
Phylogenetics:“The systematic study of organism relationships based on evolutionary similarities and differences.”
Informatics:“The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
![Page 4: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/4.jpg)
Why should you care?
Firstly, “Nothing in evolution makes sense except in the light
of phylogeny”
Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile?
But if that doesn’t convince you…
![Page 5: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/5.jpg)
As a consumer of phylogenetic data
The “New Biology” is coming:“Major advances will take place via integration and
synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009)
Presumably, this will involve retrieving and classifying.
![Page 6: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/6.jpg)
As a consumer of phylogenetic data
Or maybe for you phylogeny is simply a nuisance:– Functional prediction– Comparative analysis– Ortholog finding– Etc.
But it would still be nice to have that out of the way painlessly…
![Page 7: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/7.jpg)
As a producer of phylogenetic data
• Many journals require proper storage of data described in a manuscript.
• Funding agencies require dissemination and sharing of research results.
![Page 8: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/8.jpg)
The Past
• Everything was closed:– Idiosyncratic,
private data– “pay-walls”–Closed source
softwareNo accessible publishing medium
![Page 9: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/9.jpg)
The Present
Science is opening up:–Open data–Open access
publishing–Open source software
Publishing is now accessible to everyone, online
![Page 10: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/10.jpg)
Our current nightmare
Documents, documents everywhere
![Page 11: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/11.jpg)
The current web makes sense to us
![Page 12: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/12.jpg)
But not to a machine
![Page 13: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/13.jpg)
What was informatics again?
“The sciences concerned with gathering, manipulating, storing,
retrieving, and classifying recorded information.”
![Page 14: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/14.jpg)
![Page 15: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/15.jpg)
This is too hard
• O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman and A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.
![Page 16: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/16.jpg)
Let’s delegate that
![Page 17: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/17.jpg)
Instead of linked documents
![Page 18: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/18.jpg)
A web of linked concepts
![Page 19: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/19.jpg)
Concepts connected by statements
![Page 20: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/20.jpg)
Concepts are defined in ontologies“An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain,
and may be used to describe the domain.”
![Page 21: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/21.jpg)
Expressing concepts in data syntax
![Page 22: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/22.jpg)
Concepts are linked
A triple is a statementsubject predicate object
Linked by statements called “triples”
Any part of a triple may have to be uniquely identifiable. For this we use URLs.
![Page 23: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/23.jpg)
An applied example
Triple 1Subject: <http://example.org/data/tree1>Predicate: <http://example.org/terms/hasLikelihood>Object: 2342.323i.e. -lnL(tree1) = 2342.323
Triple 2Subject: <http://example.org/data/tree2>Predicate: <http://example.org/terms/hasLikelihood>Object: 2341.184i.e. -lnL(tree2) = 2341.184
![Page 24: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/24.jpg)
What’s the better tree?
• The ontology defines what a likelihood is and how to compare negative log likelihoods.
• Hence, automated reasoning can conclude that tree2 is the better tree.
![Page 25: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/25.jpg)
URLs for phylogeneticsPhyloWS doesn’t just provide an anchor to identify
phylogenetic data, it also enables searching and retrieval.
![Page 26: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/26.jpg)
The EvoInfo “stack”
![Page 27: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/27.jpg)
TreeBASE
![Page 28: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/28.jpg)
External links
Taxon
Taxonvariant
Study
![Page 29: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/29.jpg)
A simple example
TreeBASE maps to uBio using skos:closeMatch...
…and uBio to ToL using gla:mapping
![Page 30: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/30.jpg)
Another Example, UniProt sequences
TreeBASE stores NCBI taxonomy
identifiers
Standard tools can rewrite
these linkout URLs
Result is a corresponding list of UniProt records
![Page 31: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/31.jpg)
Another Example, Geocoding
TreeBASE uses DarwinCore for lat/lon annotations
![Page 32: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/32.jpg)
Many online data repositories
![Page 33: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/33.jpg)
Challenges
• Fragile: many services offline in Japan• Data gets bigger and bigger• Many concepts not yet in ontologies• Many data still “locked in” in publications
![Page 34: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/34.jpg)
The Future
![Page 35: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/35.jpg)
The cloud
• Software will be run on a number of “virtual” platforms (Amazon, Google apps, Yahoo)
• Data will be stored in the cloud (Big Table, FreeBase)
![Page 36: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/36.jpg)
Interpreting locked in knowledge
• Text and images meant for humans are being processed by machines. Examples:– Taxon name mining
(BHL)– Gene name and function
mining– Tree figure processing– Automated annotation
![Page 37: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/37.jpg)
Summary
• Phyloinformatics is moving from closed to open to linked data
• Concepts and syntax are increasingly formalized and machine readable
• Automated queries across integrated resources will enable synthetic research
• Still lots to do to deploy these technologies and unlock legacy data
![Page 38: Phyloinformatics and the Semantic Web](https://reader035.vdocument.in/reader035/viewer/2022070303/549295d3ac7959092e8b4662/html5/thumbnails/38.jpg)
Acknowledgements
Thank you for your attention!Also, many thanks to:
The Pagel lab at UoR
The EvoInfo groupVal TannenWayne MaddisonWilliam PielHilmar LappArlin Stoltzfus