Using SPARQL to Query BioPortal Ontologies and Metadata

Manuel Salvadores, Matthew Horridge, Paul R. Alexander, Ray W. Fergerson, Mark A. Musen, and Natasha F. Noy
ISWC 2012, Boston, US
Stanford Center for Biomedical Informatics Research (BMIR), Stanford University
sparql.bioontology.org
Tuesday, November 13, 2012

Upload: manuelso

Post on 08-May-2015


DESCRIPTION

BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other languages, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF-based serializations of all these ontologies and their metadata at sparql.bioontology.org.

TRANSCRIPT

Page 6: Using SPARQL to Query BioPortal Ontologies and Metadata

The main entry points to BioPortal data are the REST APIs.

Through the REST services we cannot answer queries that require fine-grained access to the data.

SPARQL provides finer-grained data access.

Challenges, opportunities and lessons learned with sparql.bioontology.org

Page 8: Using SPARQL to Query BioPortal Ontologies and Metadata

Our SPARQL endpoint is different from others because our data are primarily ontologies themselves and not data about individuals.

Still, the lessons learned apply to other domains. We have to deal with:

• performance
• scalability
• heterogeneity
• query articulation

Page 15: Using SPARQL to Query BioPortal Ontologies and Metadata

Challenges, opportunities and lessons learned with sparql.bioontology.org

1. Retrieval of common attributes and how simple reasoning can help.

2. Complexity of query articulation when targeting complex OWL constructions.

3. Best practices in using an open shared endpoint:

I. Selective queries work better.

II. Be careful with large result sets. Paginate.

III. How the client reads matters.

Page 19: Using SPARQL to Query BioPortal Ontologies and Metadata

BioPortal Data

• Ontology Content
    • OBO Format
    • Rich Release Format (RRF)
    • OWL

• Ontology Metadata

• Mapping Data

RDF → Triple Store → SPARQL

Page 20: Using SPARQL to Query BioPortal Ontologies and Metadata

RDF - Ontology Metadata

[Diagram of the ontology metadata graph. Nodes: ontology/1353, ontology/46896, ontology/46116, ontology/42122 and <http://bioportal.bioontology.org/ontologies/SNOMED>. Node types: meta:VirtualOntology and omv:Ontology (version). Edges: meta:hasVersion (three occurrences) and meta:hasDataGraph. Attributes shown: name, date, format, (....)]

Page 21: Using SPARQL to Query BioPortal Ontologies and Metadata

RDF - Mappings

Mapping:

<http://purl.bioontology.org/mapping/2767e8e0-001b-012e-749f-005056bd0010>
    maps:has_process_info <.../procinfo/2008-04-23-38138> ;
    maps:comment "Manual mappings between Mouse anatomy and NCIT." ;
    maps:relation skos:closeMatch ;
    maps:target <http://purl.org/obo/owl/MA#MA_0001096> ;
    maps:source <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Olfactory_Nerve> ;
    maps:source_ontology_id <http://bioportal.bioontology.org/ontologies/1032> ;
    maps:target_ontology_id <http://bioportal.bioontology.org/ontologies/1000> ;
    a maps:One_To_One_Mapping .

<http://purl.bioontology.org/mapping/nonloom/procinfo/2008-04-23-38138>
    maps:date "2008-04-23T19:21:45Z"^^xsd:dateTime ;
    maps:mapping_source "Organization" ;
    maps:mapping_source_contact_info "http://www.nlm.nih.gov" ;
    maps:mapping_source_name "NLM" ;
    maps:mapping_source_site <http://www.nlm.nih.gov> ;
    maps:mapping_type "Manual" ;
    maps:submitted_by 38138 .

Noy, N.F., Griffith, N., Musen, M.A.: Collecting community-based mappings in an ontology repository. In: International Semantic Web Conference. pp. 371–386 (2008)

Page 25: Using SPARQL to Query BioPortal Ontologies and Metadata

Common Attributes in BP Ontologies

• taxonomies: rdfs:subClassOf
• preferred labels, synonyms, definitions: rdfs:subPropertyOf

Page 28: Using SPARQL to Query BioPortal Ontologies and Metadata

BP Taxonomies

Almost every ontology in BioPortal uses rdfs:subClassOf to record class hierarchies.

We offer rdfs:subClassOf reasoning to collect hierarchy closures. The reasoning is backward-chained and off by default.
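To make "hierarchy closure" concrete, here is a minimal local sketch in Python. It does not call the endpoint and does not show how reasoning is switched on there. It only illustrates what the closure of rdfs:subClassOf contains for one class. The class names and the edge list are made up for illustration.

```python
from collections import defaultdict


def superclass_closure(sub_class_of, start):
    """Return every class reachable from `start` by following
    (child, parent) rdfs:subClassOf edges any number of times."""
    parents = defaultdict(set)
    for child, parent in sub_class_of:
        parents[child].add(parent)

    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:  # also protects against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical asserted edges: only direct parents are stored.
EDGES = [
    ("ex:MalignantHyperthermia", "ex:MuscleDisease"),
    ("ex:MuscleDisease", "ex:Disease"),
    ("ex:Disease", "ex:Thing"),
]

CLOSURE = superclass_closure(EDGES, "ex:MalignantHyperthermia")
```

Without reasoning, a query for the superclasses of a class returns only its asserted parents. With the closure, the indirect ancestors are returned as well.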

Page 29: Using SPARQL to Query BioPortal Ontologies and Metadata

BP Taxonomies: 2 use cases and their challenges

• partial traversal
• hierarchies with mappings

Page 30: Using SPARQL to Query BioPortal Ontologies and Metadata

hierarchies and mappings

With mappings one can continue browsing a taxonomy beyond the boundaries of a single ontology.

Example: "malignant hyperthermia" in the Human Disease Ontology.
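The idea of crossing ontology boundaries can be sketched locally as a graph walk that follows both subclass edges and mappings. This is an illustration under assumptions, not the queries used in the talk: the term identifiers, the dictionaries and the function name are all hypothetical.

```python
from collections import deque


def reachable_terms(term, sub_class_of, mappings):
    """Breadth-first walk that follows subclass edges and, where a term
    has a mapping, hops to the mapped term in another ontology."""
    seen = {term}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for nxt in sub_class_of.get(node, []) + mappings.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(term)
    return seen


# Hypothetical data: one term in each of two ontologies, linked by a mapping.
SUB_CLASS_OF = {
    "doid:malignant_hyperthermia": ["doid:muscle_disease"],
    "other:malignant_hyperthermia": ["other:hyperthermia"],
}
MAPPINGS = {
    "doid:malignant_hyperthermia": ["other:malignant_hyperthermia"],
}

TERMS = reachable_terms("doid:malignant_hyperthermia", SUB_CLASS_OF, MAPPINGS)
```

With an empty mappings dictionary the walk stays inside the first ontology. With the mapping it also reaches the parents of the mapped term.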

Page 34: Using SPARQL to Query BioPortal Ontologies and Metadata

partial traversal

Some applications need to traverse the hierarchy for a fixed number of steps.
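One way to express a fixed number of steps without any reasoning is to chain one rdfs:subClassOf triple pattern per step. The sketch below only builds the query text. The function name, the variable naming scheme and the example URIs are assumptions for illustration, and the query shown on the original slide may have looked different.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def fixed_depth_query(class_uri, graph_uri, steps):
    """Build a SPARQL query that walks exactly `steps` rdfs:subClassOf
    edges upwards from `class_uri`, binding ?c1 .. ?cN along the way."""
    if steps < 1:
        raise ValueError("steps must be at least 1")

    patterns = []
    previous = f"<{class_uri}>"
    for i in range(1, steps + 1):
        patterns.append(f"  {previous} rdfs:subClassOf ?c{i} .")
        previous = f"?c{i}"

    variables = " ".join(f"?c{i}" for i in range(1, steps + 1))
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        f"SELECT {variables}\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n" + "\n".join(patterns) + "\n}"
    )


# Hypothetical class and graph URIs.
QUERY = fixed_depth_query("http://example.org/A", "http://example.org/graph", 2)
```

Each extra step adds one join, so the depth is explicit in the query and the endpoint never has to compute a full closure.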

Page 38: Using SPARQL to Query BioPortal Ontologies and Metadata

BP preferred labels, synonyms and definitions

34 ontologies record preferred labels, synonyms and definitions using their own predicates.

When ontology authors upload ontologies into BioPortal, they have to specify which predicates represent these attributes.

Page 39: Using SPARQL to Query BioPortal Ontologies and Metadata

BP preferred labels, synonyms and definitions

We provide uniform access to these properties by linking the different ontology-specific properties to the standard SKOS properties using rdfs:subPropertyOf.

We assert these links in a graph named “globals”.

[Diagram: user-defined preferred-label, alternative-label and definition predicates on one side; the SKOS properties skos:prefLabel, skos:altLabel and skos:definition on the other. rdfs:label also appears in the diagram.]

Page 40: Using SPARQL to Query BioPortal Ontologies and Metadata

By including the rdfs:subPropertyOf links in “globals”, we do not need to know which property NIF-RTH uses in order to retrieve its preferred labels.
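The effect of the rdfs:subPropertyOf links can be illustrated locally. In this sketch the "globals" graph is modelled as a list of (sub-property, super-property) pairs, and asking for skos:prefLabel also matches any predicate declared as its sub-property. The predicate nif:hypotheticalPrefLabel and the data are invented, and only one level of sub-properties is resolved.

```python
def values_with_subproperties(triples, globals_links, prop):
    """Return (subject, value) pairs for `prop` and for every predicate
    that `globals_links` declares as a direct sub-property of `prop`."""
    accepted = {prop} | {sub for sub, sup in globals_links if sup == prop}
    return [(s, o) for s, p, o in triples if p in accepted]


# Hypothetical content of the "globals" graph.
GLOBALS = [
    ("nif:hypotheticalPrefLabel", "skos:prefLabel"),
    ("nif:hypotheticalSynonym", "skos:altLabel"),
]

# Hypothetical ontology triples that use the ontology's own predicates.
TRIPLES = [
    ("nif:term1", "nif:hypotheticalPrefLabel", "Neuron"),
    ("nif:term1", "nif:hypotheticalSynonym", "Nerve cell"),
    ("nif:term2", "skos:prefLabel", "Axon"),
]

LABELS = values_with_subproperties(TRIPLES, GLOBALS, "skos:prefLabel")
```

The caller only names skos:prefLabel. Which ontology-specific predicate carried each label is resolved through the links.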

Page 45: Using SPARQL to Query BioPortal Ontologies and Metadata

Complex Query Articulation

RDF Turtle Serialization

:x owl:equivalentClass [ a owl:Class ; owl:unionOf ( :Class0 :Class1 :Class2 ) ] .

Functional Syntax

EquivalentClasses( :x ObjectUnionOf( :Class0 :Class1 :Class2 ) )

RDF Model Representation

x      owl:equivalentClass  Anon0
Anon0  rdf:type             owl:Class
Anon0  owl:unionOf          Anon1
Anon1  rdf:first            Class0
Anon1  rdf:rest             Anon2
Anon2  rdf:first            Class1
Anon2  rdf:rest             Anon3
Anon3  rdf:first            Class2
Anon3  rdf:rest             rdf:nil
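The RDF model above shows why such queries are hard to articulate: one line of functional syntax becomes a chain of blank nodes. As a local illustration, the following Python sketch walks that rdf:first / rdf:rest chain over an in-memory list of triples to recover the union members. The triple encoding and function names are assumptions made for this example.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# The nine triples of the RDF model representation, with blank nodes named.
TRIPLES = [
    (":x", "owl:equivalentClass", "_:anon0"),
    ("_:anon0", "rdf:type", "owl:Class"),
    ("_:anon0", "owl:unionOf", "_:anon1"),
    ("_:anon1", RDF_FIRST, ":Class0"),
    ("_:anon1", RDF_REST, "_:anon2"),
    ("_:anon2", RDF_FIRST, ":Class1"),
    ("_:anon2", RDF_REST, "_:anon3"),
    ("_:anon3", RDF_FIRST, ":Class2"),
    ("_:anon3", RDF_REST, RDF_NIL),
]


def list_members(index, head):
    """Follow rdf:rest from `head` until rdf:nil, collecting rdf:first."""
    members = []
    node = head
    while node != RDF_NIL:
        members.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return members


def union_members(triples, cls):
    """Return the classes in the owl:unionOf that `cls` is equivalent to.
    Assumes each (subject, predicate) pair has a single object."""
    index = {(s, p): o for s, p, o in triples}
    anonymous_class = index[(cls, "owl:equivalentClass")]
    return list_members(index, index[(anonymous_class, "owl:unionOf")])


MEMBERS = union_members(TRIPLES, ":x")
```

A SPARQL query has to perform the same walk with triple patterns, which is what makes lists of unknown length awkward to query.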

Page 46: Using SPARQL to Query BioPortal Ontologies and Metadata

obo:VO_0000001 a owl:Class ;
    rdfs:label "vaccine" ;
    rdfs:seeAlso "MeSH: D014612" ;
    obo:IAO_0000115 "A vaccine is a processed (...) " ;
    obo:IAO_0000116 "Many vaccines are developed (...) " ;
    obo:IAO_0000117 "YH, BP, BS, MC, LC, XZ, RS" ;
    rdfs:subClassOf obo:OBI_0000047 ;
    owl:equivalentClass [
        a owl:Class ;
        owl:intersectionOf (
            obo:OBI_0000047
            [ a owl:Restriction ;
              owl:onProperty obo:BFO_0000085 ;
              owl:someValuesFrom [
                  a owl:Class ;
                  owl:intersectionOf (
                      obo:VO_0000278
                      [ a owl:Restriction ;
                        owl:onProperty obo:BFO_0000054 ;
                        owl:someValuesFrom obo:VO_0000494 ]
                  )
              ]
            ]
            [ a owl:Restriction ;
              owl:onProperty obo:OBI_0000312 ;
              owl:someValuesFrom obo:VO_0000590 ]
        )
    ] .

Example of a relatively complex OWL construction from the Vaccine Ontology

Page 50: Using SPARQL to Query BioPortal Ontologies and Metadata

Best practices in using a shared SPARQL endpoint

selective queries work better

control size of result sets - pagination

how clients read matters


Page 53: Using SPARQL to Query BioPortal Ontologies and Metadata

selective queries work better

[The example queries on this slide were images and did not survive extraction. The only remaining annotation is "for each ?p".]
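The queries from this slide are not available, so the following is only one plausible reading of the "for each ?p" annotation: instead of a single query with an unbound predicate, the client issues one smaller query per known predicate. The function name, the query shape and the URIs are assumptions for illustration and are not taken from the talk.

```python
def per_predicate_queries(graph_uri, predicates, limit=1000):
    """Build one selective query per predicate, each with the predicate
    bound, rather than a single query over an unbound ?p."""
    return [
        f"SELECT ?s ?o FROM <{graph_uri}> "
        f"WHERE {{ ?s <{predicate}> ?o }} LIMIT {limit}"
        for predicate in predicates
    ]


# Hypothetical graph and predicate URIs.
QUERIES = per_predicate_queries(
    "http://example.org/graph",
    ["http://example.org/p1", "http://example.org/p2"],
    limit=10,
)
```

Each generated query binds the predicate, which gives the endpoint a much narrower pattern to match than one that leaves subject, predicate and object all open.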

Page 55: Using SPARQL to Query BioPortal Ontologies and Metadata

control size of result sets - pagination

while len(results) == LIMIT:
    OFFSET += LIMIT
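The loop on the slide can be fleshed out as follows. This is a sketch under assumptions: `run_query` stands for whatever function the client uses to send a query and get back a list of rows, and a fake in-memory endpoint is used here so the example runs without network access. The base query includes ORDER BY so that successive pages are consistent.

```python
def fetch_all(run_query, base_query, limit=1000):
    """Fetch a large result set page by page using LIMIT and OFFSET.
    Stops as soon as a page comes back shorter than `limit`."""
    results = []
    offset = 0
    while True:
        page = run_query(f"{base_query} LIMIT {limit} OFFSET {offset}")
        results.extend(page)
        if len(page) < limit:
            break
        offset += limit
    return results


def make_fake_endpoint(rows):
    """Hypothetical stand-in for an endpoint: slices `rows` according to
    the LIMIT and OFFSET found in the query text and records each call."""
    calls = []

    def run_query(query):
        calls.append(query)
        parts = query.split()
        limit = int(parts[parts.index("LIMIT") + 1])
        offset = int(parts[parts.index("OFFSET") + 1])
        return rows[offset:offset + limit]

    return run_query, calls


RUN_QUERY, CALLS = make_fake_endpoint(list(range(25)))
ROWS = fetch_all(RUN_QUERY, "SELECT ?s WHERE { ?s ?p ?o } ORDER BY ?s", limit=10)
```

With 25 rows and a page size of 10 the client sends three requests, each of which is cheap for a shared endpoint to serve.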

Page 56: Using SPARQL to Query BioPortal Ontologies and Metadata

how clients read matters

Use libraries that parse the result set on demand.

Test case: retrieval of all preferred labels from NCBI Taxonomy (500K solutions).

[Bar chart: output size in MB (axis 0 to 130), XML vs JSON.]

[Bar chart: parsing time in seconds (axis 0 to 60) for JSON+Python, JSON+CJSON, XML+Jena ARQ and XML+Sesame.]
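As an illustration of parsing on demand, the sketch below reads a SPARQL XML result set incrementally with the Python standard library, yielding one solution at a time instead of building the whole document in memory. It is not one of the client stacks measured on the slide. The sample document is made up, and in practice `stream` would be the HTTP response body.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
NS = "{http://www.w3.org/2005/sparql-results#}"

SAMPLE = b"""<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="label"/></head>
  <results>
    <result><binding name="label"><literal>Homo sapiens</literal></binding></result>
    <result><binding name="label"><literal>Mus musculus</literal></binding></result>
  </results>
</sparql>"""


def iter_solutions(stream):
    """Yield one dict per <result>, parsed incrementally from `stream`."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == NS + "result":
            row = {}
            for binding in elem.findall(NS + "binding"):
                value = binding[0]  # the <uri>, <literal> or <bnode> child
                row[binding.get("name")] = value.text
            yield row
            elem.clear()  # release the children of the processed result


SOLUTIONS = list(iter_solutions(io.BytesIO(SAMPLE)))
```

Because the function is a generator, a caller can process each solution as it arrives and stop early without parsing the rest of the response.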

Page 57: Using SPARQL to Query BioPortal Ontologies and Metadata

Using SPARQL to Query BioPortal Ontologies and Metadata

Manuel Salvadores, Matthew Horridge, Paul R. Alexander, Ray W. Fergerson, Mark A. Musen, and Natalya F. Noy

Stanford Center for Biomedical Informatics Research, Stanford University, US

{manuelso,matthew.horridge,palexander,ray.fergerson,musen,noy}@stanford.edu

Abstract. BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other languages, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF based serializations of all these ontologies and their metadata at sparql.bioontology.org. This dataset contains 203M triples, representing both content and metadata for the 300+ ontologies; and 9M mappings between terms. This endpoint can be queried with SPARQL which opens new usage scenarios for the biomedical domain. This paper presents lessons learned from having redesigned several applications that today use this SPARQL endpoint to consume ontological data.

Keywords: Ontologies, SPARQL, RDF, Biomedical, Linked Data

1 SPARQL In Use In BioPortal: Overview of Opportunities and Challenges

Ontology repositories act as a gateway for users who need to find ontologies for their applications. Ontology developers submit their ontologies to these repositories in order to promote their vocabularies and to encourage inter-operation. In biomedicine, cultural heritage, and other domains, many of the ontologies and vocabularies are extremely large, with tens of thousands of classes.

In our laboratory, we have developed BioPortal, a community-based ontology repository for biomedical ontologies [11]. Users can publish their ontologies to BioPortal, submit new versions, browse the ontologies, and access the ontologies and their components through a set of REST services. BioPortal provides search across all ontologies in its collection, a repository of automatically and manually generated mappings between classes in different ontologies, ontology reviews, new term requests, and discussions generated by the ontology users in the community. BioPortal contains metadata about each ontology and its versions as well as mappings between terms in different ontologies.

Undefined 1 (2009) 1–5, IOS Press

BioPortal as a Dataset of Linked Biomedical Ontologies and Terminologies in RDF.

Manuel Salvadores a,*, Paul R. Alexander a, Mark A. Musen a and Natalya F. Noy a

a Stanford Center for Biomedical Informatics Research, Stanford University, US. E-mail: {manuelso, palexander, musen, noy}@stanford.edu

*Corresponding author. E-mail: [email protected].

Abstract. BioPortal is a repository of biomedical ontologies, the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other formats, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF version of all these ontologies at http://sparql.bioontology.org. This dataset contains 190M triples, representing both metadata and content for the 300 ontologies. We use the metadata that the ontology authors provide and simple RDFS reasoning in order to provide dataset users with uniform access to key properties of the ontologies, such as lexical properties for the class names and provenance data. The dataset also contains 9.8M cross-ontology mappings of different types, generated both manually and automatically, which come with their own metadata.

Keywords: biomedical ontologies, BioPortal, RDF, linked data

1. Introduction

In our laboratory, we have developed BioPortal, a community-based ontology repository for biomedical ontologies [20,1]. Users can publish their ontologies to BioPortal, submit new versions, browse the ontologies, and access the ontologies and their components through a set of REST services, SPARQL and de-referenceable URIs.

Over the past four years, as BioPortal grew in popularity, research institutions and corporations have used our REST APIs extensively. The use of the REST services has experienced outstanding growth in 2011. The average number of hits per month grew from 3M hits in 2010 to 122M hits in 2011. Our users have incorporated these services in applications that perform drug surveillance, gene annotation, enrichment and classification of scientific literature, and other tasks. In December 2011, we released a public SPARQL endpoint, http://sparql.bioontology.org, to provide direct access to our datasets in RDF. We had numerous requests from users for the SPARQL endpoint, which would enable them to query and analyze the data in much more precise and application-specific ways than our set of REST APIs allowed.

This paper describes the Linked Data aspects of the BioPortal’s ecosystem and the structure of our linked datasets in RDF. In addition, we describe the process that we used to transform different ontology formats into RDF and the mappings between ontologies. We describe several issues with using the shared SPARQL endpoint elsewhere [10]. This discussion includes the details on retrieving common attributes from multiple ontologies, articulating complex queries, and the lessons that we have learned on the best practices of using a shared SPARQL endpoint.

2. Biomedical Ontologies in BioPortal

Researchers and practitioners in the Semantic Web normally deal with two types of data: (1) ontologies, vocabularies or TBoxes; and (2) instance data or simply data. It is important to clarify that BioPortal’s content is almost exclusively ontologies and related artifacts. By contrast, most other datasets of the Linked

0000-0000/09/$00.00 © 2009 – IOS Press and the authors. All rights reserved

Page 58: Using SPARQL to Query BioPortal Ontologies and Metadata

Conclusions

• Our use of SPARQL is different from many other use cases because our data are primarily ontologies themselves and not data about individuals.

• SPARQL and a small amount of reasoning can be particularly powerful in providing easy access to common attributes.

• Exposing OWL through a SPARQL endpoint poses a number of challenges.

• There are challenges in running a shared open SPARQL endpoint. We can overcome these challenges if we encourage developers to conform to a set of simple best practices.

Page 59: Using SPARQL to Query BioPortal Ontologies and Metadata

Thank you

Questions
