spine ontology as the backbone of content discovery

SPINE Ontology as the backbone of content discovery. Taxonomy Bootcamp London, June 9th, 2021. Synaptica, 25 years of innovation. Kiri Aikman, Head of New Content, Royal Pharmaceutical Society, [email protected], https://twitter.com/kiriaik. Jonathan Stott, Technical Architect, Royal Pharmaceutical Society, [email protected]. Dave Clarke, Founder, Synaptica, [email protected], https://twitter.com/DavidClarkeBlog.


SPINE
Ontology as the backbone of content discovery

Taxonomy Bootcamp London
June 9th, 2021

Kiri Aikman
Head of New Content
Royal Pharmaceutical Society
[email protected]
https://twitter.com/kiriaik

Jonathan Stott
Technical Architect
Royal Pharmaceutical Society
[email protected]

Dave Clarke
Founder
Synaptica
[email protected]
https://twitter.com/DavidClarkeBlog

Outline

Kiri Aikman
1. RPS and Pharmaceutical Press
2. SPINE
3. PhP Requirements

Dave Clarke
1. Graphite and GraphDB
2. Working in OWL versus SKOS
3. Selecting Best Fit Namespaces and developing custom PhP pharma-specific properties
4. SPINE’s High Level Knowledge Model

Jonathan Stott
1. Managing Multilingual Concept Labels
2. Supporting PhP’s Established Proprietary UIDs
3. Integrating with Content Management, Tagging and Search
4. Future Roadmap

The Royal Pharmaceutical Society

The Royal Pharmaceutical Society (RPS) is a world leader in the safe use of medicines with a mission to put pharmacy at the forefront of healthcare.

Pharmaceutical Press (PhP), the knowledge business of RPS, provides trusted evidence-based resources to support healthcare professionals in their daily practice.


SPINE

SPINE is an ontology of medicinal substances designed and developed by PhP.

The ontology classifies medicinal substances, including drugs, excipients, poisons, herbals, etc., and describes their classifications, therapeutic uses, chemical and physical properties using semantic relationships, properties and nomenclature.

SPINE enables PhP to better structure and align their content, improve editorial efficiencies, remove risks associated with legacy systems and deliver more accurate search results.


SPINE High Level Requirements

1. Support a legacy ontology in OWL
2. Create new Substance class ontology in OWL
3. Link out to independent schemes for synonyms and street names
4. Multilingual, but each language may have independent IRIs and properties
5. Link out to many third-party vocabularies
6. Support proprietary publisher’s UID format

Synaptica

Synaptica helps people to organize, categorize and discover enterprise knowledge using taxonomies, ontologies and knowledge graphs.

Synaptica produce a range of taxonomy and ontology management software solutions, including Graphite used in this case study.

Graphite

SPINE is managed in Graphite, a taxonomy and ontology system produced by Synaptica.

PhP required a system that integrated class-based ontologies with controlled vocabulary taxonomies.

Graphite was selected to define the SPINE semantic schema and terminologies and manage editorial workflows and governance.

GraphDB

Graphite stores its ontologies and taxonomies as linked data in GraphDB, an RDF graph database produced by Ontotext.

By storing the PhP ontology in an RDF graph database the ontology is expressed using open industry standards, making the data portable and accessible using high performance standard query languages like SPARQL.
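As a rough illustration of the on-demand query route, the sketch below prepares a SPARQL 1.1 Protocol request using only Python's standard library and flattens a result in the standard SPARQL JSON results format. The endpoint URL, the example IRIs and the sample response are placeholders invented for illustration; they are not PhP's actual endpoint or data, and the request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; a real GraphDB repository URL would differ.
ENDPOINT = "https://example.org/repositories/spine"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?substance ?label WHERE {
  ?substance rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Prepare a SPARQL 1.1 Protocol GET request (it is not sent here)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows(result_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one plain dict per solution."""
    data = json.loads(result_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# Made-up response, shaped like the standard SPARQL JSON results format.
SAMPLE = json.dumps({
    "head": {"vars": ["substance", "label"]},
    "results": {"bindings": [{
        "substance": {"type": "uri", "value": "https://example.org/spine/0001"},
        "label": {"type": "literal", "xml:lang": "en", "value": "Perindopril"},
    }]},
})

request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.get_header("Accept"))
print(rows(SAMPLE))
```

Because the query and result formats are open standards, the same two helpers would work against any conforming SPARQL endpoint, which is the portability point made above.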

PhP also utilize GraphDB plugins to deliver real-time event notifications to consuming systems.

Diagram: GraphDB, an RDF triplestore, supports on-demand queries and extracts via SPARQL, and event triggers that push out real-time notifications.

Choosing Namespaces, Classes and Predicates

Business Rules
Class Hierarchies: Rigorous set-theoretic class-subclass structures. Every sub-class inherits properties of parent. Leaf-node subclasses may contain a set of named individuals.
Categorical Hierarchies: Similar to OWL classes and subclasses, SKOS supports a transitive hierarchy of categories and subcategories. SKOS does not distinguish between categories and individuals.
Topical Hierarchies: Non-transitive hierarchies. Topical taxonomy structures mixing abstract and concrete categorical concepts and named individuals.

Use Cases
Class Hierarchies: Formal classification schemes intended for machine inferencing, reasoning and object-oriented programming.
Categorical Hierarchies: Less formal schemes capable of some level of inferencing such as for use with auto categorization.
Topical Hierarchies: Granular subject indexing and navigational taxonomies.

Inference Bearing
Class Hierarchies: Yes
Categorical Hierarchies: Yes
Topical Hierarchies: No

Semantic Schema
Class Hierarchies: OWL
Categorical Hierarchies: SKOS
Topical Hierarchies: SKOS

Resource Types
Class Hierarchies: owl:Class, owl:NamedIndividual
Categorical Hierarchies: skos:Concept
Topical Hierarchies: skos:Concept

Hierarchical Relationships
Class Hierarchies: rdfs:subClassOf
Categorical Hierarchies: skos:broaderTransitive, skos:narrowerTransitive
Topical Hierarchies: skos:broader, skos:narrower

Instance-Class Relationship
Class Hierarchies: rdf:type
Categorical Hierarchies: NA
Topical Hierarchies: NA

Many KOS use a mixture of classes and properties from different open data sources, the most common being OWL and SKOS.
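To make the "inference bearing" distinction concrete, here is a minimal sketch in plain Python that treats statements as (subject, predicate, object) tuples. The `ex:` names are illustrative strings rather than SPINE content, and the closure function stands in for what a reasoner would do; it is not how Graphite or GraphDB implement inference.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
TRIPLES = {
    ("ex:ACEInhibitor", "rdfs:subClassOf", "ex:Drug"),
    ("ex:Drug", "rdfs:subClassOf", "ex:Substance"),
    ("ex:Perindopril", "rdf:type", "ex:ACEInhibitor"),
    ("ex:Hypertension", "skos:broader", "ex:Cardiology"),
    ("ex:Cardiology", "skos:broader", "ex:Medicine"),
}


def direct(node, predicate, triples):
    """Objects asserted directly for `node` and `predicate` (one hop only)."""
    return {o for s, p, o in triples if s == node and p == predicate}


def closure(node, predicate, triples):
    """All nodes reachable from `node` by following `predicate` repeatedly."""
    seen, stack = set(), [node]
    while stack:
        for parent in direct(stack.pop(), predicate, triples):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(individual, triples):
    """Asserted rdf:type values plus every superclass via rdfs:subClassOf."""
    types = direct(individual, "rdf:type", triples)
    for cls in set(types):
        types |= closure(cls, "rdfs:subClassOf", triples)
    return types


# Class hierarchy: membership propagates up the subclass chain.
print(sorted(inferred_types("ex:Perindopril", TRIPLES)))
# Topical hierarchy: skos:broader is read one hop at a time.
print(sorted(direct("ex:Hypertension", "skos:broader", TRIPLES)))
# The transitive reading (skos:broaderTransitive) is the closure of those hops.
print(sorted(closure("ex:Hypertension", "skos:broader", TRIPLES)))
```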

Controlling Vocabulary in OWL and SKOS

Primary Entity Label
OWL: rdfs:label
SKOS: skos:prefLabel

Synonymous Labels
SKOS: skos:altLabel, skos:hiddenLabel

Label Uniqueness Enforcement
OWL: Not required but may optionally be enforced through axioms or business application logic.
SKOS: Preferred Labels must be unique (disambiguated) within a scheme for any given language. Alternative and Hidden labels are not uniqueness enforced.

Language Notation
OWL: All string literals are language typed. <rdfs:label xml:lang="en">My Class</rdfs:label>
SKOS: All string literals are language typed. <skos:prefLabel xml:lang="en">My Concept</skos:prefLabel>

Multilingual Labels
One concept (URI) may have many pref and/or alt labels in many languages, but one concept may only have one prefLabel per language.

Out of the box SKOS supports controlled vocabulary business rules such as all concepts in a scheme must have a unique preferred label (within any particular language).

OWL does not mandate label uniqueness but OWL ontologies can be designed to enforce it.
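The two labelling rules described on this slide can be checked with very little business logic, which is roughly what "enforced through business application logic" could look like on the OWL side. The sketch below is one possible check in plain Python; the concept identifiers and labels are invented, and comparing labels case-insensitively is an assumption rather than something the slides specify.

```python
from collections import defaultdict

# (concept URI, language tag, preferred label); all values are illustrative.
PREF_LABELS = [
    ("ex:c1", "en", "Aspirin"),
    ("ex:c1", "fr", "Aspirine"),
    ("ex:c2", "en", "Aspirin"),      # clashes with ex:c1 in English
    ("ex:c3", "en", "Ibuprofen"),
    ("ex:c3", "en", "Ibuprofenum"),  # second English prefLabel on one concept
]


def label_clashes(pref_labels):
    """Return (language, label) pairs claimed by more than one concept."""
    owners = defaultdict(set)
    for concept, lang, label in pref_labels:
        # Assumption: labels are compared case-insensitively.
        owners[(lang, label.casefold())].add(concept)
    return {key: found for key, found in owners.items() if len(found) > 1}


def multiple_pref_labels(pref_labels):
    """Return (concept, language) pairs carrying more than one prefLabel."""
    labels = defaultdict(set)
    for concept, lang, label in pref_labels:
        labels[(concept, lang)].add(label)
    return {key: found for key, found in labels.items() if len(found) > 1}


print(label_clashes(PREF_LABELS))
print(multiple_pref_labels(PREF_LABELS))
```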

PhP Best Fit Namespaces, Classes and Predicates

OWL
Class Entity Type: owl:Class
Instance Entity Type: owl:NamedIndividual
Primary Entity Label: rdfs:label
Hierarchical Relationships: rdfs:subClassOf

PhP
RPS Domain Specific Associative Relationships: Has Street Name, Has Synonym, Has INN, Has BAN, Has ATC, etc.
RPS Domain Specific Data Properties: Has Chemical Name, Summary, etc.

PhP determined OWL was a better fit than SKOS because:

1. Legacy ontologies had already been developed in OWL

2. OWL is widely adopted in the health and life sciences community

But neither OWL nor SKOS alone could describe the complex set of multilingual terminology surrounding each class in the ontology.

PhP extended the ontology within its own Namespace using properties and relationships to define the semantics of their specialist knowledge domain.
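As a sketch of what extending an ontology within its own namespace can look like, the snippet below generates Turtle declarations for a handful of custom properties. The namespace IRI and the camel-case local names (`hasStreetName`, `hasINN` and so on) are hypothetical renderings of the relationship names listed on this slide; the slides do not give the actual identifiers.

```python
# Hypothetical namespace IRI; the real PhP namespace is not given in the slides.
RPS = "https://example.org/rps/schema#"

# Hypothetical local names derived from the labels on the slide.
OBJECT_PROPERTIES = ["hasStreetName", "hasSynonym", "hasINN", "hasBAN", "hasATC"]
DATA_PROPERTIES = ["hasChemicalName", "summary"]


def declare(local_name: str, owl_type: str) -> str:
    """Return one Turtle statement typing a property in the rps: namespace."""
    return f"rps:{local_name} a owl:{owl_type} ."


def schema_turtle() -> str:
    """Assemble prefix declarations followed by the property declarations."""
    lines = [
        f"@prefix rps: <{RPS}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
    ]
    lines += [declare(name, "ObjectProperty") for name in OBJECT_PROPERTIES]
    lines += [declare(name, "DatatypeProperty") for name in DATA_PROPERTIES]
    return "\n".join(lines)


print(schema_turtle())
```

Associative relationships map naturally to owl:ObjectProperty (they point at other resources) and descriptive fields to owl:DatatypeProperty (they hold literal values), which is the split assumed here.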

High Level Knowledge Model

Core SPINE ontology

Substances: chemical substances, drugs, excipients, poisons, herbals etc. OWL class hierarchy leading to named individuals with unique preferred labels, publishing metadata, and custom data properties for chemical formula, chemical name, molecular weight, and derivative substance relationships.

Street Names: multilingual with source attribution. Linked from Substances by Has Street Name.

Synonyms: multilingual with source attribution. Linked from Substances by Has Synonym.

External classification schemes (example selection)

INN: Concept scheme, linked by Has INN
BAN: Concept scheme, linked by Has BAN
ATC: Concept scheme, linked by Has ATC
CAS Numbers: Concept scheme, linked by Has CAS
Martindale Categories: Concept scheme, linked by In Martindale Category

Semantic View of KOS

Ontology = Schema (Class Types, Property Types, Relationship Types) + Taxonomy (Specific Concepts, Classes & Named Individuals)

An Ontology comprises a semantic Schema of class, property and relationship types, plus a Taxonomy of specific concepts, classes and named individuals.

By selecting class, property and relationship types from internal and open data resources like SKOS, OWL, DCT, PROV, etc., one can design a semantic schema for any kind of KOS, from the simplest glossary to the most complex ontology.
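The "schema plus taxonomy" view can be pictured as a small data structure in which the schema decides which statements the taxonomy may contain. The classes below are a hypothetical illustration of that idea in plain Python, not a description of Graphite's internal model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Schema:
    """The semantic schema: which types a KOS is allowed to use."""
    class_types: frozenset[str]
    property_types: frozenset[str]
    relationship_types: frozenset[str]


@dataclass
class Ontology:
    """A schema plus a taxonomy of statements that must conform to it."""
    schema: Schema
    taxonomy: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Accept a statement only if its predicate is part of the schema."""
        allowed = self.schema.property_types | self.schema.relationship_types
        if predicate not in allowed:
            raise ValueError(f"{predicate} is not part of this schema")
        self.taxonomy.append((subject, predicate, obj))


# A very small glossary-style schema assembled from SKOS terms.
glossary = Ontology(Schema(
    class_types=frozenset({"skos:Concept"}),
    property_types=frozenset({"skos:prefLabel", "skos:definition"}),
    relationship_types=frozenset(),
))
glossary.add("ex:excipient", "skos:prefLabel", "Excipient")
print(glossary.taxonomy)
```

Adding `skos:broader` to `relationship_types` would turn the same glossary into a hierarchical taxonomy, which mirrors the point that one library of types can yield anything from a glossary to a full ontology.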

Realizing the PhP Model within Graphite

STEP 1

Adopt and curate an extensible predicate library of in-house and external linked open data namespaces, and class, property, and relationship types

Graphite comes pre-loaded with many open data schema resources, to which the new RPS schema was added.

Generic non-PhP screen example

Realizing the PhP Model within Graphite

STEP 2

Design the specific semantic schema for the core SPINE ontology as well as all related reference terminologies

Specific KOS Schemes can be designed in minutes by adopting Schema elements from the library of class, property and relationship types

Generic non-PhP screen example

Realizing the PhP Model within Graphite

STEP 3

Populate the ontology’s semantic schema with the Taxonomy of specific named classes and concepts

Drag-and-drop editability, batch editing, create concepts, build hierarchies and mappings between schemes

Import RDF taxonomies, Excel files and flat lists

Generic non-PhP screen example

Full Substance Class Record

“Perindopril” is an owl:NamedIndividual with an rdf:type link to its parent “Substance” which is an owl:Class

“Perindopril” has numerous data properties including its chemical name and summary

Links out to external schemes like CAS and internal entities like derivative substances and molecular formula
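For readers who have not seen such a record outside a UI, the sketch below renders a comparable record as Turtle text. The prefixes, IRIs and property names are placeholders, and the literal values are deliberately left as placeholder text rather than real chemical data; only the shape of the record follows the description above.

```python
# Placeholder prefixes and IRIs; the slides do not give SPINE's real ones.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rps": "https://example.org/rps/schema#",
    "spine": "https://example.org/spine/",
}


def literal(text: str, lang: str = "en") -> str:
    """Format a language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def record(subject: str, pairs: list[tuple[str, str]]) -> str:
    """Render one subject and its predicate-object pairs as a Turtle block."""
    body = " ;\n".join(f"    {p} {o}" for p, o in pairs)
    return f"{subject}\n{body} ."


perindopril = record("spine:perindopril", [
    ("a", "owl:NamedIndividual"),          # it is a named individual ...
    ("a", "spine:Substance"),              # ... typed by the Substance class
    ("rdfs:label", literal("Perindopril")),
    ("rps:hasChemicalName", literal("(chemical name goes here)")),
    ("rps:summary", literal("(editorial summary goes here)")),
    ("rps:hasCAS", "spine:cas-placeholder"),  # link out to an external scheme
])

header = "\n".join(f"@prefix {k}: <{v}> ." for k, v in PREFIXES.items())
print(header + "\n\n" + perindopril)
```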

Taxonomic Challenges

After developing the categorical ontology, PhP tackled several taxonomic challenges associated with the need to support multiple substance name terminologies that are curated by internal and external sources in multiple languages.

These terminologies were established within Graphite as independent taxonomy schemes that were linked to the master substance class ontology.

Traditional controlled vocabulary principles were employed with enhancements to handle special challenges.

Managing Multilingual Concept Labels

Synonyms are separate individuals, each with its own URI, so that we can record metadata such as a reference source against each one.

Official names (INNs) share a source, so one individual with multiple labels will suffice.
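The two modelling patterns can be contrasted in a few lines of Python. This is a hypothetical sketch: the base IRI, identifiers and the "illustrative source" value are invented, and paracetamol is used purely as a familiar example rather than as SPINE data.

```python
from dataclasses import dataclass, field

BASE = "https://example.org/spine/"  # placeholder base IRI


@dataclass(frozen=True)
class Synonym:
    """One synonym = one individual, so it can carry its own source."""
    uri: str
    label: str
    lang: str
    source: str


@dataclass
class OfficialName:
    """One individual per official name; labels are keyed by language tag."""
    uri: str
    source: str
    labels: dict[str, str] = field(default_factory=dict)

    def add_label(self, lang: str, label: str) -> None:
        """Add a label, allowing at most one per language."""
        if lang in self.labels:
            raise ValueError(f"{self.uri} already has a label for {lang!r}")
        self.labels[lang] = label


# A single INN individual: one shared source, several language labels.
inn = OfficialName(uri=BASE + "inn/0001", source="WHO INN")
inn.add_label("en", "paracetamol")
inn.add_label("fr", "paracétamol")
inn.add_label("la", "paracetamolum")

# Synonyms: separate individuals, each with its own URI and source.
synonyms = [
    Synonym(BASE + "syn/0001", "acetaminophen", "en", source="USAN"),
    Synonym(BASE + "syn/0002", "APAP", "en", source="illustrative source"),
]

print(inn)
print(synonyms)
```

The trade-off is the one stated above: per-synonym individuals cost more URIs but let each name carry its own provenance, whereas names that share a single source can sit on one individual as plain language-tagged labels.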

Supporting RPS’s Established Proprietary UIDs

Concepts and classes in an RDF ontology must have a unique identifier. Out of the box Graphite supports a few formats including GUIDs, which are globally unique 32-character alpha-numeric IDs.

These standard formats would not work because RPS already had a proprietary ID format that was embedded in customer databases.

The solution involved PhP and Synaptica engineers building endpoints and bi-directional APIs to integrate Graphite with PhP’s internal ID generator.

Diagram: the Graphite Ontology calls the ID Manager API, which issues a GET next ID request to the PhP ID Repository.

Every time a new concept or class is created via the Graphite UI the ID Manager API is invoked to GET the next available PhP ID. Graphite then builds the full HTTP URI for the concept terminating with its PhP ID.
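The flow just described can be sketched as follows. Everything here is an assumption made for illustration: the ID repository is simulated in memory rather than called over HTTP, and the eight-digit zero-padded ID format and the base URI are invented, since the slides do not describe PhP's actual ID format or API.

```python
import itertools
import threading


class IdRepository:
    """In-memory stand-in for PhP's internal ID generator (format is hypothetical)."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self) -> str:
        """Hand out the next ID; the lock keeps concurrent callers from colliding."""
        # In the system described above this would be an HTTP GET to the ID Manager API.
        with self._lock:
            return f"{next(self._counter):08d}"


def mint_uri(base: str, repo: IdRepository) -> str:
    """Build the full HTTP URI for a new concept, terminating with its ID."""
    return base.rstrip("/") + "/" + repo.next_id()


repo = IdRepository(start=1001)
first = mint_uri("https://example.org/spine/", repo)
second = mint_uri("https://example.org/spine/", repo)
print(first)
print(second)
```

Keeping ID allocation in one place, rather than letting the editing tool generate its own identifiers, is what allows new concepts to stay consistent with IDs already embedded in customer databases.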

How Ontology Powers Search

Future Roadmap

1. Widening the scope to include other publications and add new properties

2. Restructuring other content streams into SPINE

3. Utilising Graphite for other structured content / controlled vocabularies such as side-effects

4. Enabling additional capabilities such as content discovery


Thank You!
