Improving discovery in Life Sciences Linked Open Data Cloud
Ali Hasnain
Agenda
Problem
State of the art
Research questions and hypothesis
Preliminary results
Approach
Evaluation plan
Reflections
Lessons Learned
Semantic Model
Cataloguing and Linking
Query Federation
Knowledge Publishing
We want to query the content, not the source
Proteins
Molecules
Genes
Diseases
M: the way linked data is organized still forces us to look up data by its location, not its content! But those who turn to linked data don't want to query PDB; they want to learn more about proteins, or genes, etc.
-> Our first task is to catalogue the concepts that are relevant in these various datasets. Providing common access to data is the first pillar of the bridge that crosses the valley of death.
A Linked Life Sciences Roadmap
Proteins
Molecules
Genes
Diseases
:Protein, :Molecule, :Gene, :Disease
Uniprot, PDB, Pfam, PROSITE, ProDom, Uniref, UniPark, Dailymed, DrugBank, ChemBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, Homogene, MGI, Diseasome, SIDER
M: when data is catalogued, we can discover new links by cross-referencing with existing datasets.
-> Once we identify these concepts, how do we actually query them together?
Related Work
BLOOMS: an initiative for finding schema-level links (Ontology Alignment and Wikipedia Characterization).
Ontology alignment approaches cannot be applied as they:
often require an ontology as a starting point.
do not attempt to link beyond the label or concept name.
do not take into consideration domain matching approaches.
Ontology alignment techniques are more suited when data has been structured as a hierarchy.
The Silk Framework. The VoID vocabulary. The SameAs.org service.
2- Possible Solutions
To assemble queries over multiple graphs at multiple endpoints, either vocabularies and ontologies are reused, or translation maps between different terminologies are created (a posteriori integration).
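A translation map of this kind can be pictured as a lookup from one shared concept to the class names that individual datasets use for it. The following is only a minimal sketch under stated assumptions: the IRIs, the map contents and the helper name expand_concept are hypothetical placeholders, not taken from the catalogue described in this talk.

```python
# Hypothetical translation map: shared concept -> class IRIs that individual
# datasets might use for it. Every IRI here is a made-up placeholder.
TRANSLATION_MAP = {
    ":Molecule": [
        "<http://example.org/datasetA/Compound>",
        "<http://example.org/datasetB/Drug>",
    ],
}


def expand_concept(variable, concept, translation_map=TRANSLATION_MAP):
    """Return a SPARQL group pattern that matches any mapped class for `concept`."""
    classes = translation_map.get(concept)
    if not classes:
        # No mapping known: keep the original triple pattern unchanged.
        return f"{{ {variable} a {concept} }}"
    # One UNION branch per dataset-specific class.
    branches = [f"{{ {variable} a {cls} }}" for cls in classes]
    return " UNION ".join(branches)


print(expand_concept("?mol", ":Molecule"))
```

The point of the sketch is that the datasets keep their own terminology and the mapping is applied afterwards, at query time.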
A priori vs. a posteriori integration
Hypothesis
"Given a heterogeneous Life Sciences Linked Open Data corpus, an active Compendium containing concepts from distinct endpoints, and the properties connecting these concepts, can be (partially) leveraged to achieve "a posteriori" integration."
Methodology
Cataloguing and Linking Life Sciences LOD Cloud. Ali Hasnain, Ronan Fox, Stefan Decker and Helena F. Deus. 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW), 8-12 October 2012, Galway, Ireland.
Describing DataSets- an Extract from Catalogue
Query Engine- BioFed
[Figure: BioFed query engine. Labels: SPARQL 1.1 input query (?mol a :Molecule); Query Engine; Query Transformation Rules (CONSTRUCTs); transformed queries Query1, Query2, Query3, Query4; endpoints Chebi, DrugBank, PubChem, DailyMed; concepts Compound, Drug, Molecule, Substance.]
-> But where is the magic? How do you rewrite the rules? Do we write those constructs manually?
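To make the idea of a CONSTRUCT-based transformation rule concrete, here is a hedged sketch that generates one transformed query per endpoint from a small rules table. The endpoint URLs, the class IRIs, the common prefix and the pairing of a local class with an endpoint are all assumptions made for illustration; this is not BioFed's actual rule set or code.

```python
from string import Template

# Hypothetical rules table: endpoint URL -> local class assumed to stand in
# for the shared :Molecule concept. URLs and IRIs are placeholders.
RULES = {
    "http://example.org/chebi/sparql": "<http://example.org/chebi/Compound>",
    "http://example.org/drugbank/sparql": "<http://example.org/drugbank/Drug>",
}

# Each rule re-labels endpoint-specific instances as the shared concept.
CONSTRUCT_TEMPLATE = Template(
    "PREFIX : <http://example.org/common#>\n"
    "CONSTRUCT { ?mol a :Molecule }\n"
    "WHERE { SERVICE <$endpoint> { ?mol a $local_class } }"
)


def transformed_queries(rules=RULES):
    """Build one CONSTRUCT query per endpoint from the rules table."""
    return {
        endpoint: CONSTRUCT_TEMPLATE.substitute(
            endpoint=endpoint, local_class=local_class
        )
        for endpoint, local_class in rules.items()
    }


for endpoint, query in transformed_queries().items():
    print(endpoint)
    print(query)
    print()
```

Generating the queries from a table is one possible answer to the question above: the rules table could in principle be filled from a catalogue rather than written by hand, although the slides do not say how it is actually done.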
BioFed- Query Engine
BioFed: Federated Query Processing over Life Sciences Linked Open Data. Ali Hasnain, Qaiser Mehmood, Syeda Sana e Zainab, Muhammad Saleem, Claude Warren and Stefan Decker. JBMS-2015 (under review)
FedViz System Architecture
FedViz is an online application that gives biologists a flexible visual interface for formulating and executing both federated and non-federated SPARQL queries.
It translates the visually assembled queries into their SPARQL equivalent and executes them using a query engine (FedX, BioFed).
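As a rough illustration of what translating a visually assembled query into SPARQL could involve, the sketch below turns a list of drawn edges into a SELECT query. The edge representation, the function name visual_query_to_sparql and the example predicates are hypothetical; FedViz's real internal model is not described in these slides, and prefix declarations are omitted for brevity.

```python
def visual_query_to_sparql(edges, limit=10):
    """Turn drawn edges into a SPARQL SELECT query string.

    `edges` is a list of (subject, predicate, object) strings, where terms
    starting with "?" are variables. Prefix declarations are left out.
    """
    # Collect variables in first-seen order so the projection is stable.
    variables = []
    for subject, _, obj in edges:
        for term in (subject, obj):
            if term.startswith("?") and term not in variables:
                variables.append(term)

    patterns = "\n".join(f"  {s} {p} {o} ." for s, p, o in edges)
    projection = " ".join(variables) if variables else "*"
    return f"SELECT {projection}\nWHERE {{\n{patterns}\n}}\nLIMIT {limit}"


# Two edges a user might draw: a typed node and a link to a second node.
example_edges = [
    ("?drug", "a", ":Drug"),
    ("?drug", ":targets", "?protein"),
]
print(visual_query_to_sparql(example_edges, limit=5))
```

The resulting string would then be handed to whichever query engine is configured.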
ReVeaLD: Real-time Visual Explorer and Aggregator of Linked Data
ReVeaLD: A user-driven domain specific interactive search platform for biomedical research. Maulik R. Kamdar, Dimitris Zeginis, Ali Hasnain, Stefan Decker and Helena F. Deus. Journal of Biomedical Informatics
Linked Biomedical DataSpace
Catalogue Results
Linking Results
BioFed Evaluation- Dataset Used
Evaluating BioFed simple queries
Evaluating BioFed complex queries
Conclusion and Future Work
Cataloguing and Linking: a Compendium of Life Sciences datasets
BioFed and FedViz as practical applications
Improving the Catalogue: adding statistical, latency and similar information for customised and targeted query processing.
Evaluating BioFed against other available query engines, including Anapsid and Splendid.