![Page 1: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/1.jpg)
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project
Alasdair J G Gray
www.alasdairjggray.co.uk
@gray_alasdair
http://c745.r45.cf2.rackcdn.com/img/2009/lens_filter_coasters.jpg
![Page 2: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/2.jpg)
Open PHACTS Use Case
“Let me compare MW, logP and PSA for launched inhibitors of human & mouse oxidoreductases”
• Chemical Properties (ChemSpider)
• Launched drugs (Drugbank)
• Human => Mouse (Homologene)
• Protein Families (Enzyme)
• Bioactivity Data (ChEMBL)
• … other info (Uniprot/Entrez etc.)
21/05/2014 Brighton Seminar 2
![Page 3: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/3.jpg)
Pre-competitive Informatics
Literature PubChem
Genbank Patents Databases
Downloads
Data Integration, Data Analysis, Firewalled Databases
Repeat @ each company
Lowering industry firewalls: pre-competitive informatics in drug discovery. Nature Reviews Drug Discovery (2009) 8, 701-708. doi:10.1038/nrd2944
A single, shared solution.
Funded under
• IMI: 2011-14
• ENSO: 2014-16
![Page 4: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/4.jpg)
Open PHACTS Discovery Platform
Drug Discovery Platform
Apps
Domain API
Interactive responses
Production quality integration platform
Method Calls
![Page 5: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/5.jpg)
(April 2013 – March 2014)
15.8 million total hits
API Hits
![Page 6: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/6.jpg)
An “App Store”?
http://www.openphactsfoundation.org/apps.html
Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium
MOE Collector Cytophacts Utopia Garfield SciBite
KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna
![Page 7: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/7.jpg)
Drug
Disease
Pathway
Target
https://dev.openphacts.org/
Linked Data API
![Page 8: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/8.jpg)
OPS Discovery Platform
[Architecture diagram. Components shown: Apps; Linked Data API (RDF/XML, TTL, JSON); Domain Specific Services; Core Platform; Semantic Workflow Engine; Data Cache (Virtuoso Triple Store); Identity Resolution Service; Identifier Management Service; Chemistry Registration Normalisation & Q/C; Indexing. Data sources are labelled Db, RDF, Nanopub and VoID, and grouped as Public Content, Commercial, Public Ontologies and User Annotations. Example inputs: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”.]
![Page 9: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/9.jpg)
Platform Interaction
![Page 10: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/10.jpg)
Provenance
![Page 11: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/11.jpg)
Multiple Identities
Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws/
P12047
X31045
P12047
GB:29384
RS_2353
Are these the same thing?
![Page 12: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/12.jpg)
Gleevec® = Imatinib Mesylate
Drugbank ChemSpider PubChem
Imatinib
Mesylate
Imatinib Mesylate
YLMAHDNUQAMNNX-UHFFFAOYSA-N
![Page 13: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/13.jpg)
![Page 14: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/14.jpg)
![Page 15: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/15.jpg)
Multiple Links: Different Reasons
Link: skos:closeMatch
Reason: non-salt form
Link: skos:exactMatch
Reason: drug name
![Page 16: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/16.jpg)
Strict Relaxed
Analysing Browsing
Dynamic Equality
skos:exactMatch(InChI)
![Page 17: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/17.jpg)
Strict Relaxed
Analysing Browsing
Dynamic Equality
skos:closeMatch(Drug Name)
skos:closeMatch(Drug Name)
skos:exactMatch(InChI)
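One way to read the two Dynamic Equality slides is that a lens is a set of accepted link justifications: a strict lens follows only skos:exactMatch links justified by InChI, while a relaxed lens also follows skos:closeMatch links justified by drug name. The Python sketch below illustrates that reading only. The `Link` class, the `LENSES` table, `apply_lens` and the record labels are all hypothetical and are not taken from the Open PHACTS code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    predicate: str      # e.g. "skos:exactMatch"
    justification: str  # why the link holds, e.g. "InChI"


# Hypothetical lens definitions: each lens lists the
# (predicate, justification) pairs it is willing to follow.
LENSES = {
    "strict": {("skos:exactMatch", "InChI")},
    "relaxed": {
        ("skos:exactMatch", "InChI"),
        ("skos:closeMatch", "Drug Name"),
    },
}


def apply_lens(links, lens_name):
    """Return only the links that the named lens accepts."""
    accepted = LENSES[lens_name]
    return [l for l in links if (l.predicate, l.justification) in accepted]


# Placeholder records, not real database identifiers.
links = [
    Link("chemspider:A", "pubchem:A", "skos:exactMatch", "InChI"),
    Link("drugbank:B", "chemspider:A", "skos:closeMatch", "Drug Name"),
    Link("drugbank:B", "pubchem:A", "skos:closeMatch", "Drug Name"),
]

strict_links = apply_lens(links, "strict")
relaxed_links = apply_lens(links, "relaxed")
```

Under the strict lens only the InChI-justified link survives, which suits analysis; under the relaxed lens all three links are followed, which suits browsing.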
![Page 18: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/18.jpg)
Initial Connectivity
Datasets 37
Linksets 104
Links 7,096,712
Justifications 7
![Page 19: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/19.jpg)
Compound Information
![Page 20: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/20.jpg)
Genes == Proteins?
BRCA1
Breast cancer type 1 susceptibility protein
http://en.wikipedia.org/wiki/File:Protein_BRCA1_PDB_1jm7.png
http://en.wikipedia.org/wiki/File:BRCA1_en.png
![Page 21: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/21.jpg)
Proceed with Caution!
![Page 22: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/22.jpg)
Co-reference Computation
Rules ensure
• Unrestricted transitivity within conceptual type
• Restrict crossing conceptual types
Based on justifications
Provenance captured
[Diagram: conceptual types Target, Protein and Gene, connected by associations with multiplicities 0..* and 0..1]
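The co-reference rules on this slide can be illustrated with a small graph traversal. The Python sketch below makes several assumptions that the slide does not state: links are treated as symmetric, a justification named "gene-protein" marks a link that crosses conceptual types, and a chain of links may cross types at most once. The identifiers and the `co_referents` function are invented for illustration. The chain of links used to reach each identifier is kept as its provenance.

```python
from collections import deque

# (source, target, justification); all identifiers are made up.
LINKS = [
    ("uniprot:P1", "chembl_target:T1", "protein"),
    ("chembl_target:T1", "drugbank_target:D1", "protein"),
    ("uniprot:P1", "ensembl:G1", "gene-protein"),   # crosses conceptual type
    ("ensembl:G1", "ncbigene:N1", "gene"),
    ("ncbigene:N1", "uniprot:P2", "gene-protein"),  # would be a second crossing
]

# Assumption: these justifications mark a link between conceptual types.
CROSSING = {"gene-protein"}


def co_referents(start, links, max_crossings=1):
    """Map each reachable IRI to (crossings used, provenance path)."""
    adjacency = {}
    for s, t, j in links:
        adjacency.setdefault(s, []).append((t, j))
        adjacency.setdefault(t, []).append((s, j))

    best = {start: (0, [])}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        used, path = best[node]
        for nxt, just in adjacency.get(node, []):
            crossings = used + (1 if just in CROSSING else 0)
            if crossings > max_crossings:
                continue  # rule: restrict crossing conceptual types
            # Keep the route that uses the fewest type crossings.
            if nxt not in best or crossings < best[nxt][0]:
                best[nxt] = (crossings, path + [(node, nxt, just)])
                queue.append(nxt)
    return best


result = co_referents("uniprot:P1", LINKS)
same_type_only = co_referents("uniprot:P1", LINKS, max_crossings=0)
```

Within a conceptual type the traversal is unrestricted, so `drugbank_target:D1` is reached through two protein links. `uniprot:P2` is excluded because reaching it would need a second gene-protein crossing.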
![Page 23: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/23.jpg)
Initial Connectivity
Datasets 37
Linksets 104
Links 7,096,712
Justifications 7
![Page 24: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/24.jpg)
Inferred Connectivity
Datasets 37
Linksets 883
Links 17,383,846
Justifications 7
![Page 25: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/25.jpg)
BridgeDb
![Page 26: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/26.jpg)
http://ops.rsc.org/OPS45975 http://ops.rsc.org/OPS45978
has_isotopically_unspecified_parent [CHEMINF:000459]
has OPS normalized counterpart [CHEMINF:000458]
http://ops.rsc.org/OPS45991
is_tautomer_of[chebi:is_tautomer_of]
http://ops.rsc.org/OPS45987
has_stereoundefined_parent [CHEMINF:000456]
http://ops.rsc.org/OPS45981
Lenses
![Page 27: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/27.jpg)
OPS Discovery Platform
[Architecture diagram, repeated: Apps; Linked Data API (RDF/XML, TTL, JSON); Domain Specific Services; Core Platform; Semantic Workflow Engine; Data Cache (Virtuoso Triple Store); Identity Resolution Service; Identifier Management Service; Chemistry Registration Normalisation & Q/C; Indexing; data sources labelled Db, RDF, Nanopub and VoID.]
![Page 28: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/28.jpg)
Query Expansion
Query Q with lens L1:
GRAPH <http://rdf.chemspider.com> {
  cw:979b545d-f9a9 cheminf:logd ?logd .
}
Query Expander Service: Q, L1 => Q’
Identity Mapping Service (BridgeDb): cw:979b545d-f9a9, L1 => [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945]
Profiles
Mappings
Expanded pattern in Q’:
?iri cheminf:logd ?logd .
FILTER (?iri = cw:979b545d-f9a9 || ?iri = cs:2157 || ?iri = chembl:1280 || ?iri = db:db00945 )
Can also be achieved through UNION
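The FILTER and UNION forms of query expansion can be sketched as two string rewrites. This Python sketch is illustrative only: `expand_with_filter` and `expand_with_union` are hypothetical helpers, the list of co-referent IRIs is passed in directly rather than fetched from a mapping service, and plain string replacement stands in for real SPARQL parsing.

```python
def expand_with_filter(pattern, iri, equivalents, var="?iri"):
    """Swap `iri` for a variable and constrain it with a FILTER."""
    rewritten = pattern.replace(iri, var)
    condition = " || ".join(f"{var} = {e}" for e in equivalents)
    return f"{rewritten}\nFILTER ({condition})"


def expand_with_union(pattern, iri, equivalents):
    """Repeat the pattern once per co-referent IRI, joined by UNION."""
    branches = ["{ " + pattern.replace(iri, e) + " }" for e in equivalents]
    return "\nUNION\n".join(branches)


# Values taken from the slide; the mapping lookup itself is not modelled.
pattern = "cw:979b545d-f9a9 cheminf:logd ?logd ."
mappings = ["cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945"]

filter_form = expand_with_filter(pattern, "cw:979b545d-f9a9", mappings)
union_form = expand_with_union(pattern, "cw:979b545d-f9a9", mappings)
```

Both forms ask for the same bindings of `?logd`; they differ only in how the set of equivalent IRIs is presented to the triple store.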
![Page 29: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/29.jpg)
Experiment
Is it feasible to use a stand-off mapping service?
• Baselines (no external call):
– “Perfect” URIs
– Linked data querying
• Expansion approaches (external service call):
– FILTER by Graph
– UNION by Graph
C. Y. A. Brenninkmeijer, C. A. Goble, A. J. G. Gray, P. T. Groth, A. Loizou, S. Pettifer: Including Co-referent URIs in a SPARQL Query. COLD 2013.
http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf
![Page 30: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/30.jpg)
“Perfect” URI Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    chembl_mol:m1280 cheminf:mw ?mw .
  }
}
![Page 31: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/31.jpg)
Linked Data Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    ?chemblid cheminf:mw ?mw .
  }
  cs:2157 skos:exactMatch ?chemblid .
}
![Page 32: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/32.jpg)
Queries
Drawn from Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)
![Page 33: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/33.jpg)
![Page 34: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/34.jpg)
Experiment Data
Data: 167,783,592 triples
Mappings: 2,114,584 triples
Lenses: 1
![Page 35: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/35.jpg)
Average execution times
![Page 36: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/36.jpg)
Average execution times
[Chart not reproduced; annotated values: 0.01 and 8]
![Page 37: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/37.jpg)
Q6: Target Pharmacology
![Page 38: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/38.jpg)
Conclusions
• Computing co-reference advantageous
– Requires fewer raw linksets
– Larger coverage across datasets
• Rules ensure control
– Genes can equal proteins
– Compounds never equal proteins
• Provenance captured throughout
![Page 39: Scientific Lenses over Linked Data: Identity Management in the Open PHACTS project](https://reader034.vdocument.in/reader034/viewer/2022051815/53ff82ee8d7f7249088b4644/html5/thumbnails/39.jpg)
Conclusions
• Query expansion slower in general
– Due to separate service call
– Difference below human perception
– UNION faster than FILTER on Virtuoso
• Stand-off mappings feasible
• Infrastructure can support lenses
Strict Relaxed
Analysing Browsing