SPINE Ontology as the backbone of content discovery
Taxonomy Bootcamp London, June 9th, 2021
Synaptica: 25 years of innovation
Kiri Aikman, Head of New Content
Royal Pharmaceutical Society
twitter.com/kiriaik
Jonathan Stott, Technical Architect
Royal Pharmaceutical Society
Dave Clarke, Founder
Synaptica
twitter.com/DavidClarkeBlog
Outline
Kiri Aikman
1. RPS and Pharmaceutical Press
2. SPINE
3. PhP Requirements
Dave Clarke
1. Graphite and GraphDB
2. Working in OWL versus SKOS
3. Selecting Best Fit Namespaces and developing custom PhP pharma-specific properties
4. SPINE’s High Level Knowledge Model
Jonathan Stott
1. Managing Multilingual Concept Labels
2. Supporting PhP’s Established Proprietary UIDs
3. Integrating with Content Management, Tagging and Search
4. Future Roadmap
The Royal Pharmaceutical Society
The Royal Pharmaceutical Society (RPS) is a world leader in the safe use of medicines with a mission to put pharmacy at the forefront of healthcare.
Pharmaceutical Press (PhP), the knowledge business of RPS, provides trusted evidence-based resources to support healthcare professionals in their daily practice.
SPINE
SPINE is an ontology of medicinal substances designed and developed by PhP.
The ontology classifies medicinal substances, including drugs, excipients, poisons, herbals, etc., and describes their classifications, therapeutic uses, chemical and physical properties using semantic relationships, properties and nomenclature.
SPINE enables PhP to better structure and align their content, improve editorial efficiencies, remove risks associated with legacy systems and deliver more accurate search results.
SPINE High Level Requirements
1. Support a legacy ontology in OWL
2. Create new Substance class ontology in OWL
3. Link out to independent schemes for synonyms and street names
4. Multilingual, but each language may have independent IRIs and properties
5. Link out to many third-party vocabularies
6. Support proprietary publisher’s UID format
Synaptica
Synaptica helps people to organize, categorize and discover enterprise knowledge using taxonomies, ontologies and knowledge graphs.
Synaptica produce a range of taxonomy and ontology management software solutions, including Graphite used in this case study.
Graphite
SPINE is managed in Graphite, a taxonomy and ontology system produced by Synaptica.
PhP required a system that integrated class-based ontologies with controlled vocabulary taxonomies.
Graphite was selected to define the SPINE semantic schema and terminologies and manage editorial workflows and governance.
GraphDB
Graphite stores its ontologies and taxonomies as linked data in GraphDB, an RDF graph database produced by Ontotext.
By storing the PhP ontology in an RDF graph database, the ontology is expressed using open industry standards, making the data portable and accessible through high-performance standard query languages like SPARQL.
PhP also utilize GraphDB plugins to deliver real-time event notifications to consuming systems.
[Diagram: GraphDB, an RDF triplestore, serving on-demand SPARQL queries and extracts, while event triggers push out real-time notifications]
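The two access patterns on this slide, on-demand queries and pushed change notifications, can be illustrated with a toy in-memory store. This is only a sketch: it does not use GraphDB, SPARQL, or any Ontotext plugin API, and the `TinyTripleStore` class, its methods, and the `ex:` identifiers are all hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Toy stand-in for an RDF triplestore (hypothetical, not GraphDB)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()
        self._listeners: List[Callable[[str, Triple], None]] = []

    def subscribe(self, listener: Callable[[str, Triple], None]) -> None:
        # Consuming systems register a callback to be told about changes.
        self._listeners.append(listener)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        if triple not in self._triples:
            self._triples.add(triple)
            # Push a notification to every subscriber for each new triple.
            for listener in self._listeners:
                listener("added", triple)

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> List[Triple]:
        # None acts as a wildcard, like a variable in a triple pattern.
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TinyTripleStore()
events: List[Tuple[str, Triple]] = []
store.subscribe(lambda kind, triple: events.append((kind, triple)))

store.add("ex:Perindopril", "rdf:type", "ex:Substance")
store.add("ex:Perindopril", "rdfs:label", "Perindopril")
store.add("ex:Perindopril", "rdfs:label", "Perindopril")  # duplicate: no event

labels = store.match(s="ex:Perindopril", p="rdfs:label")
```

Here `labels` holds the single label triple, and `events` records one notification per distinct triple added. A real deployment would replace `match` with SPARQL queries and the callback list with whatever notification mechanism the database provides.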
Choosing Namespaces, Classes and Predicates
Many KOS use a mixture of classes and properties from different open data sources, the most common being OWL and SKOS.

Class Hierarchies
Business Rules: Rigorous set-theoretic class-subclass structures. Every sub-class inherits properties of parent. Leaf-node subclasses may contain a set of named individuals.
Use Cases: Formal classification schemes intended for machine inferencing, reasoning and object-oriented programming.
Inference Bearing: Yes
Semantic Schema: OWL
Resource Types: owl:Class, owl:NamedIndividual
Hierarchical Relationships: rdfs:subClassOf
Instance-Class Relationship: rdf:type

Categorical Hierarchies
Business Rules: Similar to OWL classes and subclasses, SKOS supports a transitive hierarchy of categories and subcategories. SKOS does not distinguish between categories and individuals.
Use Cases: Less formal schemes capable of some level of inferencing such as for use with auto categorization.
Inference Bearing: Yes
Semantic Schema: SKOS
Resource Types: skos:Concept
Hierarchical Relationships: skos:broaderTransitive, skos:narrowerTransitive
Instance-Class Relationship: NA

Topical Hierarchies
Business Rules: Non-transitive hierarchies. Topical taxonomy structures mixing abstract and concrete categorical concepts and named individuals.
Use Cases: Granular subject indexing and navigational taxonomies.
Inference Bearing: No
Semantic Schema: SKOS
Resource Types: skos:Concept
Hierarchical Relationships: skos:broader, skos:narrower
Instance-Class Relationship: NA
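The difference between inference-bearing (transitive) and non-transitive hierarchies can be made concrete with a short sketch. The parent links below are invented for illustration and are not SPINE's actual hierarchy; the function names are hypothetical too.

```python
from typing import Dict, Set

# Direct parent links, as might be asserted with rdfs:subClassOf or
# skos:broader. The names are invented for illustration only.
parents: Dict[str, Set[str]] = {
    "ACE inhibitor": {"Antihypertensive"},
    "Antihypertensive": {"Cardiovascular drug"},
    "Cardiovascular drug": {"Substance"},
}


def direct_parents(node: str) -> Set[str]:
    """Non-transitive reading: only the asserted parent links count."""
    return set(parents.get(node, set()))


def all_ancestors(node: str) -> Set[str]:
    """Transitive reading: follow parent links until none remain."""
    found: Set[str] = set()
    stack = list(parents.get(node, set()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, set()))
    return found


print(sorted(direct_parents("ACE inhibitor")))
print(sorted(all_ancestors("ACE inhibitor")))
```

Under the transitive reading a system can infer that anything tagged with the leaf also belongs to every ancestor, which is what makes class and categorical hierarchies usable for reasoning and auto-categorization. A topical hierarchy only supports the first function.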
Controlling Vocabulary in OWL and SKOS
Out of the box SKOS supports controlled vocabulary business rules such as all concepts in a scheme must have a unique preferred label (within any particular language).
OWL does not mandate label uniqueness but OWL ontologies can be designed to enforce it.

OWL
Primary Entity Label: rdfs:label
Label Uniqueness Enforcement: Not required but may optionally be enforced through axioms or business application logic.
Language Notation: All string literals are language typed. <rdfs:label xml:lang="en">My Class</rdfs:label>

SKOS
Primary Entity Label: skos:prefLabel
Synonymous Labels: skos:altLabel, skos:hiddenLabel
Label Uniqueness Enforcement: Preferred Labels must be unique (disambiguated) within a scheme for any given language. Alternative and Hidden labels are not uniqueness enforced.
Language Notation: All string literals are language typed. <skos:prefLabel xml:lang="en">My Concept</skos:prefLabel>
Multilingual Labels: One concept (URI) may have many pref and/or alt labels in many languages, but one concept may only have one prefLabel per language.
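The two preferred-label rules described here (one prefLabel per concept per language, and no two concepts sharing a prefLabel in the same language within a scheme) are the kind of check that can live in business application logic. The sketch below is one possible way to express them; the function name and sample data are hypothetical, and comparing labels case-insensitively is an assumption rather than something the slides specify.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (concept URI, language tag, preferred label); sample data is invented.
PrefLabel = Tuple[str, str, str]


def check_pref_labels(labels: List[PrefLabel]) -> List[str]:
    """Return human-readable violations of the two preferred-label rules."""
    problems: List[str] = []
    per_concept: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    per_scheme: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    for uri, lang, text in labels:
        per_concept[(uri, lang)].append(text)
        # Assumption: labels differing only by case count as the same label.
        per_scheme[(lang, text.casefold())].append(uri)

    # Rule 1: a concept may have only one preferred label per language.
    for (uri, lang), texts in sorted(per_concept.items()):
        if len(texts) > 1:
            problems.append(f"{uri} has {len(texts)} prefLabels in '{lang}'")

    # Rule 2: a preferred label must be unique within the scheme per language.
    for (lang, _), uris in sorted(per_scheme.items()):
        distinct = sorted(set(uris))
        if len(distinct) > 1:
            problems.append(f"label shared in '{lang}' by {', '.join(distinct)}")

    return problems


sample: List[PrefLabel] = [
    ("ex:c1", "en", "Aspirin"),
    ("ex:c1", "fr", "Aspirine"),
    ("ex:c2", "en", "aspirin"),        # clashes with ex:c1 in English
    ("ex:c3", "en", "Paracetamol"),
    ("ex:c3", "en", "Acetaminophen"),  # second English prefLabel for ex:c3
]
problems = check_pref_labels(sample)
```

With this sample, `problems` reports one violation of each rule. Alternative and hidden labels are deliberately left out of the check, since they are not uniqueness enforced.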
PhP Best Fit Namespaces, Classes and Predicates

OWL
Class Entity Type: owl:Class
Instance Entity Type: owl:NamedIndividual
Primary Entity Label: rdfs:label
Hierarchical Relationships: rdfs:subClassOf

PhP
RPS Domain Specific Associative Relationships: Has Street Name, Has Synonym, Has INN, Has BAN, Has ATC, etc.
RPS Domain Specific Data Properties: Has Chemical Name, Summary, etc.

PhP determined OWL was a better fit than SKOS because:
1. Legacy ontologies had already been developed in OWL
2. OWL is widely adopted in the health and life sciences community
But neither OWL nor SKOS alone could describe the complex set of multilingual terminology surrounding each class in the ontology.
PhP extended the ontology within its own Namespace using properties and relationships to define the semantics of their specialist knowledge domain.
High Level Knowledge Model

Core SPINE ontology
Substances: chemical substances, drugs, excipients, poisons, herbals etc. OWL class hierarchy leading to named individuals with unique preferred labels, publishing metadata, and custom data properties for chemical formula, chemical name, molecular weight, and derivative substance relationships.
Has Street Name: Street Names, multilingual with source attribution
Has Synonym: Synonyms, multilingual with source attribution

External classification schemes (example selection)
Has INN: INN concept scheme
Has BAN: BAN concept scheme
Has ATC: ATC concept scheme
Has CAS: CAS Numbers concept scheme
In Martindale Category: Martindale Categories concept scheme
Semantic View of KOS
Ontology = Schema (Class Types, Property Types, Relationship Types) + Taxonomy (Specific Concepts, Classes & Named Individuals)
An Ontology comprises a semantic Schema of class, property and relationship types, plus a Taxonomy of specific concepts, classes and named individuals.
By selecting class, property and relationship types from internal and open data resources like SKOS, OWL, DCT, PROV, etc., one can design a semantic schema for any kind of KOS, from the simplest glossary to the most complex ontology.
Realizing the PhP Model within Graphite
STEP 1
Adopt and curate an extensible predicate library of in-house and external linked open data namespaces, and class, property, and relationship types.
Graphite comes pre-loaded with many open data schema resources, to which the new RPS schema was added.
[Screenshot: generic non-PhP screen example]
STEP 2
Design the specific semantic schema for the core SPINE ontology as well as all related reference terminologies.
Specific KOS schemes can be designed in minutes by adopting schema elements from the library of class, property and relationship types.
[Screenshot: generic non-PhP screen example]
STEP 3
Populate the ontology’s semantic schema with the Taxonomy of specific named classes and concepts.
Drag-and-drop editability, batch editing, create concepts, build hierarchies and mappings between schemes.
Import RDF taxonomies, Excel files and flat lists.
[Screenshot: generic non-PhP screen example]
Full Substance Class Record
“Perindopril” is an owl:NamedIndividual with an rdf:type link to its parent “Substance”, which is an owl:Class.
“Perindopril” has numerous data properties, including its chemical name and summary.
Links out to external schemes like CAS and internal entities like derivative substances and molecular formula.
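As a rough illustration of how such a record might look as RDF statements, the sketch below lists a handful of triples for the individual. The `rdf:`, `rdfs:` and `owl:` namespace IRIs are the standard ones; everything under `https://example.org/spine/` is a placeholder, since the slides do not give PhP's namespace or property IRIs, and the data values are deliberately left as placeholders rather than real chemical data.

```python
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
PHP = "https://example.org/spine/"  # placeholder, not PhP's real namespace

Triple = Tuple[str, str, str]

perindopril = PHP + "perindopril"

record: List[Triple] = [
    (PHP + "Substance", RDF + "type", OWL + "Class"),
    (perindopril, RDF + "type", OWL + "NamedIndividual"),
    (perindopril, RDF + "type", PHP + "Substance"),
    (perindopril, RDFS + "label", '"Perindopril"@en'),
    # Data properties: hypothetical property names, placeholder values.
    (perindopril, PHP + "hasChemicalName", '"(chemical name)"'),
    (perindopril, PHP + "summary", '"(summary text)"'),
    # Link out to an entry in an external scheme (placeholder identifier).
    (perindopril, PHP + "hasCAS", PHP + "cas/(id)"),
]


def to_ntriples_like(triples: List[Triple]) -> str:
    """Render one statement per line: IRIs in angle brackets, literals as-is."""
    def term(value: str) -> str:
        return value if value.startswith('"') else f"<{value}>"
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples_like(record))
```

The output resembles N-Triples but is not guaranteed to be valid against that syntax, because the placeholder identifiers are not real IRIs.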
Taxonomic Challenges
After developing the categorical ontology, PhP tackled several taxonomic challenges associated with the need to support multiple substance name terminologies that are curated by internal and external sources in multiple languages.
These terminologies were established within Graphite as independent taxonomy schemes that were linked to the master substance class ontology.
Traditional controlled vocabulary principles were employed with enhancements to handle special challenges.
Managing Multilingual Concept Labels
Synonyms are separate individuals each with a URI so that we can mark metadata such as a reference source.
Official names (INNs) share a source, so one individual with multiple labels will suffice.
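The two modelling choices above can be sketched as simple data structures: each synonym is its own individual carrying a source, while an official name is a single individual with one source and a label per language. The class names, field names and sample values below are hypothetical and are not taken from the SPINE schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Synonym:
    """A synonym modelled as its own individual so it can carry metadata."""
    uri: str
    label: str
    language: str
    source: str


@dataclass
class OfficialName:
    """An official name: one individual, one shared source, many labels."""
    uri: str
    source: str
    labels: Dict[str, str] = field(default_factory=dict)  # language -> label


@dataclass
class Substance:
    uri: str
    label: str
    synonyms: List[Synonym] = field(default_factory=list)
    inn: Optional[OfficialName] = None


def names_for(substance: Substance, language: str) -> List[str]:
    """Collect every synonym and official name available in one language."""
    names = [s.label for s in substance.synonyms if s.language == language]
    if substance.inn and language in substance.inn.labels:
        names.append(substance.inn.labels[language])
    return names


# Invented sample data, for illustration only.
substance = Substance(uri="ex:substance/1", label="Example substance")
substance.synonyms.append(Synonym("ex:synonym/1", "Example synonym", "en", "Source A"))
substance.synonyms.append(Synonym("ex:synonym/2", "Exemple de synonyme", "fr", "Source B"))
substance.inn = OfficialName("ex:inn/1", "Official list", {"en": "examplium", "fr": "exemplium"})

french_names = names_for(substance, "fr")
```

Giving each synonym its own URI costs more individuals, but it is what allows a different reference source to be recorded per synonym.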
Supporting RPS’s Established Proprietary UIDs
Concepts and classes in an RDF ontology must have a unique identifier. Out of the box Graphite supports a few formats including GUIDs, which are globally unique 32-character alpha-numeric IDs.
These standard formats would not work because RPS already had a proprietary ID format that was embedded in customer databases.
The solution involved PhP and Synaptica engineers building endpoints and bi-directional APIs to integrate Graphite with PhP’s internal ID generator.
[Diagram: the Graphite Ontology calls the ID Manager API to GET the next ID from the PhP ID Repository]
Every time a new concept or class is created via the Graphite UI, the ID Manager API is invoked to GET the next available PhP ID. Graphite then builds the full HTTP URI for the concept, terminating with its PhP ID.
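The flow just described, fetching the next ID and then building a URI that ends with it, can be sketched as follows. The real ID Manager API, the proprietary ID format and the base URI are not given in the slides, so `InMemoryIdManager`, the `ID000041`-style format and `https://example.org/spine/` are all stand-ins.

```python
import itertools
from typing import Iterator


class InMemoryIdManager:
    """Stand-in for the ID Manager API; a real one would call PhP's service."""

    def __init__(self, start: int = 1) -> None:
        self._counter: Iterator[int] = itertools.count(start)

    def get_next_id(self) -> str:
        # Hypothetical format: the actual proprietary ID format is not given.
        return f"ID{next(self._counter):06d}"


BASE_URI = "https://example.org/spine/"  # placeholder base, not the real one


def mint_concept_uri(id_manager: InMemoryIdManager, base_uri: str = BASE_URI) -> str:
    """Fetch the next ID and build the concept's full HTTP URI ending in it."""
    return base_uri + id_manager.get_next_id()


ids = InMemoryIdManager(start=41)
first = mint_concept_uri(ids)
second = mint_concept_uri(ids)
```

Keeping ID allocation behind a single service means every concept created in the ontology editor receives an identifier from the same sequence that customer databases already depend on.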
How Ontology Powers Search
Future Roadmap
1. Widening the scope to include other publications and add new properties
2. Restructuring other content streams into SPINE
3. Utilising Graphite for other structured content / controlled vocabularies such as side-effects
4. Enabling additional capabilities such as content discovery
Thank You!