Including Co-referent URIs in a SPARQL Query
DESCRIPTION
Linked data relies on instance-level links between potentially differing representations of concepts in multiple datasets. However, in large, complex domains such as pharmacology, the inter-relationship of data instances needs to take into account the context (e.g. task, role) of the user and the assumptions they want to apply to the data. Such context is not taken into account in most linked data integration procedures. In this paper we argue that dataset links should be stored in a stand-off fashion, thus enabling different assumptions to be applied to the data links during query execution. We present the infrastructure developed for the Open PHACTS Discovery Platform to enable this and show through evaluation that the incurred performance cost is below the threshold of user perception.
http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf
TRANSCRIPT
Including Co-referent URIs in a SPARQL Query
Christian Y A Brenninkmeijer, Carole Goble, Alasdair J G Gray, Paul Groth,
Antonis Loizou, and Steve Pettifer
www.openphacts.org @open_phacts @gray_alasdair
Multiple Identities
Andy Law's Third Law: “The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”
http://bioinformatics.roslin.ac.uk/lawslaws.html
22/10/2013 COLD 2013 2
P12047 X31045 P12047 GB:29384 RS_2353
Are these the same thing?
Gleevec® = Imatinib Mesylate
DrugBank, ChemSpider, PubChem
Imatinib, Mesylate, Imatinib Mesylate, YLMAHDNUQAMNNX-UHFFFAOYSA-N
Multiple Links: Different Reasons
Link: skos:closeMatch, Reason: non-salt form
Link: skos:exactMatch, Reason: drug name
Strict Relaxed
Analysing Browsing
Dynamic Equality
skos:exactMatch(InChI)
Strict Relaxed
Analysing Browsing
Dynamic Equality
skos:closeMatch(Drug Name)
skos:closeMatch(Drug Name)
skos:exactMatch(InChI)
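The strict/relaxed slider can be read as choosing which link justifications count as equality. The following is a minimal Python sketch of that idea, not the Open PHACTS implementation: the `Link` record, the lens names, and the placeholder URIs are all assumptions made for illustration, and only a single hop of links is followed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    predicate: str  # e.g. "skos:exactMatch"
    reason: str     # justification stored alongside the link


# Hypothetical stand-off link set; these URIs are placeholders, not real identifiers.
LINKS = [
    Link("cs:imatinib-mesylate", "pubchem:imatinib-mesylate", "skos:exactMatch", "InChI"),
    Link("cs:imatinib-mesylate", "drugbank:imatinib", "skos:closeMatch", "Drug Name"),
]

# A lens lists the justifications the user accepts as equality (assumed names).
LENSES = {
    "strict": {"InChI"},
    "relaxed": {"InChI", "Drug Name"},
}


def equivalents(uri, lens):
    """Return the URIs treated as equal to `uri` under the given lens (one hop only)."""
    allowed = LENSES[lens]
    found = {uri}
    for link in LINKS:
        if link.reason not in allowed:
            continue
        if link.source == uri:
            found.add(link.target)
        elif link.target == uri:
            found.add(link.source)
    return found


if __name__ == "__main__":
    print(sorted(equivalents("cs:imatinib-mesylate", "strict")))
    print(sorted(equivalents("cs:imatinib-mesylate", "relaxed")))
```

Because the links live outside the data, switching from analysing to browsing only changes the lens argument; the datasets themselves are untouched.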
Open PHACTS Discovery Platform
Drug Discovery Platform
Apps
Domain API
Interactive responses
Production quality integration platform
Method Calls
Integration Approach
• Data kept in original model
• Data cached in central triple store
• API call translated to SPARQL query
• Query expressed in terms of original data
OPS Discovery Platform
[Architecture diagram]
Apps
Linked Data API (RDF/XML, TTL, JSON)
Domain Specific Services
Core Platform: Semantic Workflow Engine; Identity Resolution Service; Identifier Management Service; Chemistry Registration, Normalisation & Q/C; Indexing
Data Cache (Virtuoso Triple Store)
Data sources (RDF, Nanopub, Db, VoID): Public Content, Commercial, Public Ontologies, User Annotations
Example inputs: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
Platform Interaction
1. Resolve user input:
– User enters search text
– Resolve to a URI for concept
2. Request data for URI
– Expand URI to equivalent for each dataset
– Run resulting SPARQL query
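The two steps above can be sketched as a small pipeline. This Python sketch is illustrative only: the lookup tables stand in for the platform's resolution and mapping services, the search text is a placeholder, and the function names are hypothetical. Only the URIs are taken from the slides.

```python
# Hypothetical lookup tables standing in for the platform's services.
# The search text is a placeholder; the URIs are the ones shown on the slides.
CONCEPT_INDEX = {"example compound": "cw:979b545d-f9a9"}
EQUIVALENT_URIS = {
    "cw:979b545d-f9a9": ["cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945"],
}


def resolve(search_text):
    """Step 1: resolve user-entered text to a concept URI."""
    return CONCEPT_INDEX[search_text.strip().lower()]


def expand(concept_uri):
    """Step 2a: expand the concept URI to the equivalent URI in each dataset."""
    return EQUIVALENT_URIS.get(concept_uri, [concept_uri])


def request_data(search_text, build_query, run_query):
    """Chain the steps; query construction and execution are passed in by the caller."""
    uris = expand(resolve(search_text))
    return run_query(build_query(uris))


if __name__ == "__main__":
    # Stand-ins: join the URIs into one string and echo it instead of querying a store.
    print(request_data("Example compound", " ".join, lambda query: query))
```

Passing `build_query` and `run_query` in as arguments keeps the sketch independent of any particular expansion strategy or triple store.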
Query Expansion
Original query pattern:
GRAPH <http://rdf.chemspider.com> {
  cw:979b545d-f9a9 cheminf:logd ?logd .
}
Expanded query pattern:
GRAPH <http://rdf.chemspider.com> {
  ?iri cheminf:logd ?logd .
  FILTER (?iri = cw:979b545d-f9a9 || ?iri = cs:2157 || ?iri = chembl:1280 || ?iri = db:db00945)
}
Identity Mapping Service
(BridgeDB)
Query Expander Service
Profiles
Mappings
Query Expander Service: Q, L1 → Q’
Identity Mapping Service: cw:979b545d-f9a9, L1 → [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945]
Can also be achieved through UNION
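As a rough illustration of the two expansion styles, the Python sketch below rewrites a single triple pattern for a list of equivalent URIs, once with FILTER and once with UNION. The function names are hypothetical, the list of equivalents is assumed to have already been returned by the mapping service, and prefix declarations are omitted.

```python
def expand_with_filter(predicate, obj, equivalents, graph):
    """Replace the subject with ?iri and restrict ?iri to the equivalent URIs."""
    condition = " || ".join(f"?iri = {uri}" for uri in equivalents)
    return (
        f"GRAPH <{graph}> {{\n"
        f"  ?iri {predicate} {obj} .\n"
        f"  FILTER ({condition})\n"
        f"}}"
    )


def expand_with_union(predicate, obj, equivalents, graph):
    """Repeat the triple pattern once per equivalent URI and UNION the branches."""
    branches = "\n  UNION\n".join(
        f"  {{ {uri} {predicate} {obj} . }}" for uri in equivalents
    )
    return f"GRAPH <{graph}> {{\n{branches}\n}}"


if __name__ == "__main__":
    uris = ["cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945"]
    graph = "http://rdf.chemspider.com"
    print(expand_with_filter("cheminf:logd", "?logd", uris, graph))
    print(expand_with_union("cheminf:logd", "?logd", uris, graph))
```

Both forms match the same subjects; which one runs faster depends on the triple store's optimiser.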
Experiment
Is it feasible to use a stand-off mapping service?
• Baselines (no external call):
– “Perfect” URIs
– Linked data querying
• Expansion approaches (external service call):
– FILTER by Graph
– UNION by Graph
“Perfect” URI Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    chembl_mol:m1280 cheminf:mw ?mw .
  }
}
Linked Data Baseline
WHERE {
  GRAPH <chemspider> {
    cs:2157 cheminf:logp ?logp .
  }
  GRAPH <chembl> {
    ?chemblid cheminf:mw ?mw .
  }
  cs:2157 skos:exactMatch ?chemblid .
}
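The difference between the two baselines can be mimicked on a tiny in-memory quad set. This Python sketch is only an analogy for how the patterns join: the property values are placeholders rather than real data, `None` stands in for the default graph, and no SPARQL engine is involved.

```python
# (graph, subject, predicate, object); the objects are placeholder strings, not real values.
QUADS = {
    ("chemspider", "cs:2157", "cheminf:logp", "LOGP_VALUE"),
    ("chembl", "chembl_mol:m1280", "cheminf:mw", "MW_VALUE"),
    (None, "cs:2157", "skos:exactMatch", "chembl_mol:m1280"),
}


def objects(graph, subject, predicate):
    """All objects for a fixed graph, subject and predicate."""
    return sorted(o for g, s, p, o in QUADS if (g, s, p) == (graph, subject, predicate))


def perfect_uri_baseline():
    """Both dataset-specific URIs are written directly into the query."""
    return [
        (logp, mw)
        for logp in objects("chemspider", "cs:2157", "cheminf:logp")
        for mw in objects("chembl", "chembl_mol:m1280", "cheminf:mw")
    ]


def linked_data_baseline():
    """The ChEMBL URI is found by following a skos:exactMatch triple stored with the data."""
    return [
        (logp, mw)
        for chembl_id in objects(None, "cs:2157", "skos:exactMatch")
        for logp in objects("chemspider", "cs:2157", "cheminf:logp")
        for mw in objects("chembl", chembl_id, "cheminf:mw")
    ]


if __name__ == "__main__":
    print(perfect_uri_baseline() == linked_data_baseline())
```

Neither baseline calls an external service. The second one costs an extra join because it has to follow the link triple first.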
Queries
Drawn from Open PHACTS API:
1. Simple compound information (1)
2. Compound information (1)
3. Compound pharmacology (M)
4. Simple target information (1)
5. Target information (1)
6. Target pharmacology (M)
Datasets and Links
Data: 167,783,592 triples
Mappings: 2,114,584 triples
Lenses: 1
Average execution times
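An average execution time of this kind could be measured with a harness along the following lines. This is a generic Python sketch: the number of repetitions and warm-up runs are assumptions, since the slides do not state the measurement protocol, and `run` stands for any callable that executes one query.

```python
import statistics
import time


def average_execution_time(run, repetitions=10, warmup=2):
    """Mean wall-clock time of `run()` in seconds, after discarding warm-up runs."""
    for _ in range(warmup):
        run()  # warm caches; these runs are not timed
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


if __name__ == "__main__":
    # Stand-in workload; a real measurement would send a query to the triple store.
    print(average_execution_time(lambda: sum(range(10_000))))
```

For the expansion approaches, `run` would need to include the call to the mapping service so that its cost is counted.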
Conclusions
• Query expansion slower in general
– Due to separate service call
– Difference below human perception
– UNION faster than FILTER on Virtuoso
• Stand-off mappings feasible
• Infrastructure can support lenses
Strict Relaxed
Analysing Browsing