improving discovery in life sciences linked open data cloud


Improving discovery in Life Sciences Linked Open Data Cloud
Ali Hasnain

Agenda
Problem
State of the art
Research questions and hypothesis
Preliminary results
Approach
Evaluation plan
Reflections
Lessons Learned

Semantic Model

Cataloguing and Linking

Query Federation

Knowledge Publishing

We want to query the content, not the source

Proteins

Molecules

Genes

Diseases

M: the way linked data is organized still forces us to look up data by its location, not its content! But those who turn to linked data don't want to query PDB; they want to learn more about proteins, genes, etc.

-> Our first task is to catalogue the concepts that are relevant in these various datasets. Providing common access to data is the first pillar of the bridge that crosses the valley of death.

A Linked Life Sciences Roadmap

Proteins

Molecules

Genes

Diseases

Concepts: :Protein, :Molecule, :Gene, :Disease

Datasets: UniProt, PDB, Pfam, PROSITE, ProDom, UniRef, UniParc, DailyMed, DrugBank, ChEMBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, HomoloGene, MGI, Diseasome, SIDER

M: when data is catalogued, we can discover new links by cross-referencing with existing datasets

-> once we identify these concepts, how do we actually query them together?

Related Work
BLOOMS: an initiative for finding schema-level links (Ontology Alignment and Wikipedia Characterization).

Ontology alignment approaches cannot be applied as they:
often require an ontology as a starting point.
do not attempt to link beyond the label or concept name.
do not take into consideration domain matching approaches.
Ontology alignment techniques are more suited when data has been structured as a hierarchy.

The Silk Framework. The VoID vocabulary. The SameAs.org service.


2- Possible Solutions
To assemble queries over multiple graphs at multiple endpoints, either vocabularies and ontologies are reused (a priori integration), or translation maps between different terminologies are created (a posteriori integration).

A priori vs. a posteriori integration
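The translation-map option can be pictured as a lookup table between terminologies. The sketch below is illustrative only: the prefixes and class names (`cat:Molecule`, `drugbank:Drug`, and so on) are hypothetical placeholders, not the identifiers used in the actual catalogue.

```python
# Hypothetical translation map: one shared concept -> the class that each
# dataset uses for it. All identifiers are placeholders for illustration.
TRANSLATION_MAP = {
    "cat:Molecule": {
        "drugbank": "drugbank:Drug",
        "chebi": "chebi:Compound",
    },
    "cat:Protein": {
        "uniprot": "uniprot:Protein",
    },
}


def translate(concept, dataset):
    """Return the dataset-specific class for a shared concept, or None."""
    return TRANSLATION_MAP.get(concept, {}).get(dataset)


def datasets_for(concept):
    """List the datasets known to describe a shared concept."""
    return sorted(TRANSLATION_MAP.get(concept, {}))
```

Because the map is built after the datasets are published, a new dataset can be added by extending the table, without asking its publisher to change vocabulary.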


Hypothesis

"Given a heterogeneous Life Sciences Linked Open Data corpus, an active Compendium containing concepts from distinct endpoints and properties connecting these concepts can be (partially) leveraged to achieve "a posteriori" integration"

Methodology

Cataloguing and Linking Life Sciences LOD Cloud. Ali Hasnain, Ronan Fox, Stefan Decker and Helena F. Deus. 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW), 8-12 October 2012, Galway, Ireland

Describing Datasets: an Extract from the Catalogue

Query Engine- BioFed

[Figure: BioFed query engine. Endpoints: ChEBI, DrugBank, PubChem, DailyMed. Input: a SPARQL 1.1 query such as "?mol a :Molecule". The query engine applies query transformation rules (CONSTRUCTs) to produce the transformed queries (Query1 to Query4) sent to the endpoints. Source-level classes shown: Compound, Drug, Molecule, Substance.]
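The query transformation step, where a pattern such as `?mol a :Molecule` is turned into one transformed query per endpoint, can be sketched as follows. This is a minimal illustration under stated assumptions, not BioFed's implementation: the endpoint URLs, the namespace, and the pairing of each endpoint with a source class are all hypothetical.

```python
# Hypothetical namespace for the shared catalogue concepts.
SHARED_NS = "http://example.org/catalogue#"

# Hypothetical rules: shared class -> list of (endpoint URL, source class).
RULES = {
    ":Molecule": [
        ("http://example.org/chebi/sparql", "<http://example.org/chebi/Compound>"),
        ("http://example.org/drugbank/sparql", "<http://example.org/drugbank/Drug>"),
    ],
}


def rewrite(variable, shared_class):
    """Turn `variable a shared_class` into one CONSTRUCT query per endpoint.

    Each query matches instances of the endpoint's own class and returns
    them typed with the shared class.
    """
    queries = {}
    for endpoint, source_class in RULES.get(shared_class, []):
        queries[endpoint] = (
            f"PREFIX : <{SHARED_NS}>\n"
            f"CONSTRUCT {{ {variable} a {shared_class} }} "
            f"WHERE {{ {variable} a {source_class} }}"
        )
    return queries
```

Calling `rewrite("?mol", ":Molecule")` yields two query strings, one per hypothetical endpoint; the results of both can then be merged because they share the `:Molecule` type.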

-> But where is the magic? How do you rewrite the rules? Do we write those constructs manually?

BioFed- Query Engine

BioFed: Federated Query Processing over Life Sciences Linked Open Data

BioFed: Federated Query Processing over Life Sciences Linked Open Data. Ali Hasnain, Qaiser Mehmood, Syeda Sana e Zainab, Muhammad Saleem, Claude Warren and Stefan Decker. JBMS-2015 (under review)

FedViz System Architecture


FedViz is an online application that provides biologists with a flexible visual interface to formulate and execute both federated and non-federated SPARQL queries.

It translates visually assembled queries into their SPARQL equivalents and executes them using a query engine (FedX, BioFed).
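To give a feel for what translating a visually assembled query into SPARQL involves, the sketch below serialises a list of edges into a query string, wrapping edges that target a remote endpoint in a SPARQL 1.1 `SERVICE` clause. It is an assumption-laden illustration, not FedViz's actual translation: the edge format, the function name `to_sparql`, and the example URLs are all hypothetical.

```python
def to_sparql(edges):
    """Serialise visual-query edges into a SPARQL SELECT string.

    edges: list of (subject, predicate, object, endpoint) tuples, where
    endpoint is a URL string, or None for a non-federated pattern.
    """
    variables = []
    patterns = []
    for s, p, o, endpoint in edges:
        # Collect each distinct variable once, in order of appearance.
        for term in (s, o):
            if term.startswith("?") and term not in variables:
                variables.append(term)
        triple = f"{s} {p} {o} ."
        if endpoint:
            # Federated pattern: delegate it to the remote endpoint.
            triple = f"SERVICE <{endpoint}> {{ {triple} }}"
        patterns.append(triple)
    body = "\n  ".join(patterns)
    return f"SELECT {' '.join(variables)} WHERE {{\n  {body}\n}}"
```

A query with no endpoint on any edge comes out as a plain, non-federated SELECT; adding an endpoint to an edge is the only change needed to make that pattern federated.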

ReVeaLD: Real-time Visual Explorer and Aggregator of Linked Data

ReVeaLD: A user-driven domain specific interactive search platform for biomedical research. Maulik R. Kamdar, Dimitris Zeginis, Ali Hasnain, Stefan Decker and Helena F. Deus. Journal of Biomedical Informatics


Linked Biomedical DataSpace

Catalogue Results

Linking Results

BioFed Evaluation- Dataset Used

Evaluating BioFed simple queries

Evaluating BioFed complex queries

Conclusion and Future Work
Cataloguing and Linking: Compendium of Life Sciences Datasets

BioFed and FedViz as Practical Applications

Improving the catalogue by adding statistical, latency and similar information for customised and targeted query processing.

Evaluating BioFed against other available query engines, including ANAPSID and SPLENDID.
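One way such catalogue statistics could support targeted query processing is source selection: only endpoints that describe the requested class are queried, fastest first. The sketch below assumes a hypothetical per-endpoint record (`EndpointStats`) with made-up fields; it is not a description of how BioFed stores or uses this information.

```python
from dataclasses import dataclass


@dataclass
class EndpointStats:
    url: str
    classes: frozenset   # classes the catalogue lists for this endpoint
    latency_ms: float    # hypothetical measured response latency


def select_sources(stats, wanted_class, max_latency_ms=None):
    """Return URLs of endpoints describing wanted_class, fastest first."""
    relevant = [s for s in stats if wanted_class in s.classes]
    if max_latency_ms is not None:
        # Optionally drop endpoints that respond too slowly.
        relevant = [s for s in relevant if s.latency_ms <= max_latency_ms]
    return [s.url for s in sorted(relevant, key=lambda s: s.latency_ms)]
```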
