Paul Groth
http://www.few.vu.nl/~pgroth/
@pgroth
VU University Amsterdam
Convergence Meeting: Semantic Interoperability for Clinical Research & Patient Safety in Europe
We are all doing this many times…
Pfizer
AZ
GSK
Merck
n
The Problem
Open PHACTS objective
Platform
Standards
Apps
API
Partners
Associate Partners
Sequenomics
ChEMBL
DrugBank
Gene Ontology
WikiPathways
UniProt
ChemSpider
UMLS
ConceptWiki
ChEBI
TrialTrove
GVKBio
GeneGo
TR Integrity
“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
“What is the selectivity profile of known p38 inhibitors?”
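The first question combines three filters: target membership in a pathway, assay type, and a potency threshold that only makes sense once values reported in different units are normalised. A minimal Python sketch over toy in-memory records; the record fields, identifiers and the small unit table are hypothetical stand-ins (a real deployment would draw units from a vocabulary such as QUDT and run the query against the triple store), and "assayed in only functional assays" is read here as "consider only functional-assay results".

```python
from dataclasses import dataclass

# Hypothetical conversion factors to molar; a real system would take these from QUDT.
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "μM": 1e-6, "nM": 1e-9}


@dataclass
class Activity:
    compound: str    # hypothetical compound identifier
    target: str      # hypothetical target identifier
    assay_type: str  # e.g. "functional" or "binding"
    value: float
    unit: str


def to_molar(value: float, unit: str) -> float:
    """Normalise a concentration to molar so thresholds compare like with like."""
    return value * TO_MOLAR[unit]


def potent_functional_hits(activities, pathway_targets, threshold_molar=1e-6):
    """Compounds with a functional-assay potency below the threshold on a pathway target."""
    return sorted({
        a.compound
        for a in activities
        if a.target in pathway_targets
        and a.assay_type == "functional"
        and to_molar(a.value, a.unit) < threshold_molar
    })


# Toy data, not real measurements.
data = [
    Activity("cmpd-A", "T1", "functional", 250, "nM"),
    Activity("cmpd-B", "T1", "binding", 10, "nM"),
    Activity("cmpd-C", "T2", "functional", 5, "uM"),
    Activity("cmpd-D", "T9", "functional", 1, "nM"),
]
print(potent_functional_hits(data, {"T1", "T2"}))  # ['cmpd-A']
```

Without the unit step, comparing 250 (nM) against 5 (uM) directly would rank the weaker compound as more potent.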
Open PHACTS Explorer
PharmaTrek
ChemBioNavigator
Utopia Documents
Semantic interoperability approach
Principles
• Respect data providers
• Make it easy for application developers
Semantic interoperability approach
Semantic Resources – Data sets
814,535,923 triples
Semantic Resources – Mappings
18 Million Mappings
Semantic resources – Summary
• Types of semantic resources
– RDF datasets
– Mappings
– Terminologies
  • MeSH, UMLS, NCIM
– Hierarchies are essential
  • E.g. Target Ontology, Gene Ontology, Enzyme classification
  • Class reasoning is essential
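Class reasoning matters because a target annotated with a specific class is missed by an exact-match query for its parent class (for example, "oxidoreductase inhibitors" should include inhibitors of a dehydrogenase). A minimal Python sketch, assuming the hierarchy fits in memory as a child-to-parents dictionary; the class names and target annotations are illustrative placeholders. In a triple store the same effect comes from following rdfs:subClassOf transitively, for example with the SPARQL 1.1 property path `rdfs:subClassOf*`.

```python
from collections import deque

# Toy hierarchy (child -> parents), loosely modelled on an enzyme classification.
# The class names are illustrative placeholders, not identifiers from a real ontology.
SUBCLASS_OF = {
    "oxidoreductase": ["enzyme"],
    "dehydrogenase": ["oxidoreductase"],
    "alcohol dehydrogenase": ["dehydrogenase"],
    "oxidase": ["oxidoreductase"],
    "kinase": ["enzyme"],
}


def subclasses(root, subclass_of):
    """Return root plus every class that is a direct or indirect subclass of it."""
    # Invert the child -> parents edges into parent -> children.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    # Breadth-first walk down the hierarchy from the root.
    seen, queue = {root}, deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Targets annotated at different depths of the hierarchy (hypothetical).
TARGET_CLASS = {"t1": "alcohol dehydrogenase", "t2": "oxidase", "t3": "kinase"}

wanted = subclasses("oxidoreductase", SUBCLASS_OF)
print(sorted(t for t, cls in TARGET_CLASS.items() if cls in wanted))  # ['t1', 't2']
```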
Methodology for semantic integration
1. Define use cases
2. Data providers create RDF with VoID headers
3. Create mappings
– between the dataset and known datasets (instance level)
– an index for text-to-URL conversion
4. Ingest RDF into the data cache (i.e. a triple store)
5. Define access paths to core concepts in the data
6. Extend or create SPARQL queries for API calls
7. Publish API calls
It's easy to integrate, but difficult to integrate well
Adoption of standards
• Basic Semantic Web standards
– SPARQL 1.1, RDF(S), SKOS
• Dataset descriptions
– Vocabulary of Interlinked Datasets (VoID)
– VoID linkset descriptions
• QUDT (Quantities, Units, Dimensions and Types)
• Provenance
– W3C PROV, PAV, Nanopublications
• BioPortal
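A VoID linkset description states which two datasets a set of mappings connects and which predicate the links use, so a mapping service can, for example, load or trust each linkset as a unit. A minimal Python sketch, assuming skos:exactMatch links and placeholder example.org URIs; provenance (e.g. PROV or PAV terms) would normally be attached to the linkset as well but is omitted here.

```python
def void_linkset(linkset_uri, subjects_dataset, objects_dataset, pairs):
    """Describe skos:exactMatch links between two datasets as a VoID linkset, then list them."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{linkset_uri}> a void:Linkset ;",
        f"    void:subjectsTarget <{subjects_dataset}> ;",
        f"    void:objectsTarget <{objects_dataset}> ;",
        "    void:linkPredicate skos:exactMatch ;",
        f"    void:triples {len(pairs)} .",
        "",
    ]
    # One mapping triple per (subject, object) pair.
    lines += [f"<{s}> skos:exactMatch <{o}> ." for s, o in pairs]
    return "\n".join(lines) + "\n"


out = void_linkset(
    "http://example.org/linkset/a-to-b",
    "http://example.org/dataset/a",
    "http://example.org/dataset/b",
    [("http://example.org/a/1", "http://example.org/b/7"),
     ("http://example.org/a/2", "http://example.org/b/9")],
)
print(out)
```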
Tooling
• Infrastructure
– Linked Data API
– BridgeDb: identifier-to-identifier mapping
– ConceptWiki: text-to-identifier mapping and curation
– ChemSpider: chemistry registration and services
– Triple store: Virtuoso Professional Edition
• Data
– VoID descriptions and HTTP and FTP sites
– GitHub for data conversion scripts
– Recommend Turtle as RDF syntax (friendly for scripting)
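The identifier-to-identifier mapping role listed above can be illustrated with a union-find structure that groups identifiers declared equivalent across sources. This is a hypothetical minimal sketch, not the BridgeDb API; identifiers are modelled as (source, local id) pairs, and every mapping is treated as a transitive equivalence, a simplification since real mappings may need to be distinguished by type and provenance.

```python
class IdentifierMapper:
    """Groups identifiers that name the same entity across data sources (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, x):
        """Return the representative of x's group, registering x if unseen."""
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving
            x = self._parent[x]
        return x

    def add_mapping(self, a, b):
        """Record that identifiers a and b refer to the same entity."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[ra] = rb

    def map_id(self, identifier, target_source):
        """All known identifiers from target_source equivalent to identifier."""
        root = self._find(identifier)
        return sorted(i for i in list(self._parent)
                      if i[0] == target_source and self._find(i) == root)


# Hypothetical identifiers: (source, local id).
m = IdentifierMapper()
m.add_mapping(("chembl", "C1"), ("drugbank", "D1"))
m.add_mapping(("drugbank", "D1"), ("chemspider", "S1"))
print(m.map_id(("chembl", "C1"), "chemspider"))  # [('chemspider', 'S1')]
```

Because the groups are transitive, a ChEMBL-to-ChemSpider answer is found even though only ChEMBL-to-DrugBank and DrugBank-to-ChemSpider mappings were loaded.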
Quality assurance of the semantic resources
• Provenance everywhere
• Validation
– ChemSpider Validation and Standardization Platform (CVSP) for flagging chemical representation issues
• Curation
– High-quality chemical names and synonyms
– Curation interfaces for terminologies (ConceptWiki)
• Report data quality issues to data providers
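Reporting issues back to providers needs checks that can run over every data load. As one hypothetical example (not a description of the project's validation tooling), the sketch below flags identifiers that an exact-match linkset maps to more than one target. Since skos:exactMatch is symmetric and transitive, such a one-to-many mapping implies the targets are themselves equivalent, which may signal duplicate records or a mapping error worth reporting.

```python
from collections import defaultdict


def ambiguous_mappings(pairs):
    """Return source identifiers mapped to more than one target identifier.

    Each pair is (source_id, target_id) from one exact-match linkset.
    """
    targets = defaultdict(set)
    for source, target in pairs:
        targets[source].add(target)
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}


# Hypothetical linkset content.
pairs_example = [
    ("chembl:C1", "drugbank:D1"),
    ("chembl:C2", "drugbank:D2"),
    ("chembl:C2", "drugbank:D9"),
]
print(ambiguous_mappings(pairs_example))  # {'chembl:C2': ['drugbank:D2', 'drugbank:D9']}
```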
Semantic interoperability issues
1. Do not underestimate infrastructure
2. APIs are important
– Allow for tuning of SPARQL queries
– Make it easy for developers
3. Ontologies: requirements vs. recommendations
4. Modeling is hard
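When applications call an API instead of sending SPARQL, the platform owns the query text and can tune it centrally, while developers only supply identifiers and parameters. A minimal Python sketch, assuming a hypothetical `ex:` vocabulary and a made-up "compound pharmacology" call; it is not the Linked Data API configuration used by the project. Validating the IRI before substitution also keeps clients from injecting SPARQL.

```python
import re

# Hypothetical query behind an API call; the ex: vocabulary is a placeholder.
# Keeping the SPARQL server-side lets maintainers tune it without touching clients.
TEMPLATE = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex: <http://example.org/vocab#>
SELECT ?activity ?target WHERE {{
  <{uri}> skos:exactMatch ?compound .
  ?activity ex:onCompound ?compound ;
            ex:onTarget ?target .
}}
LIMIT {limit}"""

# Excludes whitespace and the characters SPARQL forbids inside an IRI reference <...>.
_IRI = re.compile(r"https?://[^\s<>\"{}|\\^`]+")


def compound_pharmacology_query(uri: str, limit: int = 100) -> str:
    """Build the SPARQL for a hypothetical 'compound pharmacology' API call."""
    if not _IRI.fullmatch(uri):
        raise ValueError(f"not an acceptable IRI: {uri!r}")
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    return TEMPLATE.format(uri=uri, limit=limit)


print(compound_pharmacology_query("http://example.org/compound/42", limit=10))
```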
Open PHACTS Information
• http://www.openphacts.org
• [email protected]
• @Open_PHACTS
• Publications
– Overview paper: Williams, A.J., Harland, L., Groth, P., Pettifer, S., Chichester, C., Willighagen, E.L., Evelo, C.T., Blomberg, N., Ecker, G., Goble, C., Mons, B.: Open PHACTS: Semantic interoperability for drug discovery. Drug Discovery Today. 17, 1188–1198 (2012).
– Technical approach: Gray, A.J.G., Groth, P., Loizou, A., et al.: Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. (2012).