Vocabulary registries and services Doug Tudhope Hypermedia Research Unit University of Glamorgan Ecoterm, FAO, Rome, Oct 2009


Page 1

Vocabulary registries and services

Doug Tudhope

Hypermedia Research Unit

University of Glamorgan

Ecoterm, FAO, Rome, Oct 2009

Page 2

Presentation (acknowledging Kora Golub, UKOLN, on TRSS)

1. JISC Terminology Registry Scoping Study (TRSS)
– Architecture options
– Use cases
– Some major registry projects briefly reviewed
– Issues (governance)

2. Terminology services at Glamorgan
– SOAP based services
– HTTP based services
– List of work on services

Page 3

1. TRSS Project context

• UK JISC funded: Terminology Registry Scoping Study (TRSS)
– Background: JISC 2006 review on Terminology Services and Technology
http://www.jisc.ac.uk/media/documents/programmes/capital/terminology_services_and_technology_review_sep_06.pdf

• Partners
– UKOLN (Kora Golub, PI)
– University of Glamorgan (Doug Tudhope)
– Non-funded: OCLC Office of Research, USA

• TRSS project 2008, published July 2009
http://www.ukoln.ac.uk/projects/trss

• TRSS final report
http://www.jisc.ac.uk/media/documents/programmes/sharedservices/trss-report-final.pdf

Page 4

Overall approach

• Relatively short 6-month timescale

• Review previous and current projects and documentation

• Consultation with key services, projects and executives across digital library, research and learning domains

– 28 responses collected

Page 5

TRSS final report

• Many of the actual recommendations in the report are UK specific.

However, the report includes more general material:

• discussion of types of registries, their scope and architecture, standards, governance

• review of functionality and use cases

• review of some KOS registry initiatives and implementations

• review of KOS metadata, with some recommendations on an expanded metadata set

Page 6

Definitions

• Terminologies
– Controlled vocabularies are often referred to as terminologies with regard to registries and web services

• Terminology services
– Web services: return/apply vocabularies and their content

• Terminology registry
– lists, describes, and points to sets of vocabularies
– can hold vocabulary information (member terms, concepts and relationships) and provide terminology services, for both human inspection and m2m access

Page 7

Architecture

• Option 1: Registry provides metadata for each vocabulary and links to vocabulary owner/provider

• Option 2: Registry provides metadata on (and links to) any available terminology services

• Option 3: Registry provides access to vocabulary content (by downloading or providing access to vocabulary’s concepts, terms and relationships)

• The three options are orthogonal (independent) facets which can be combined
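Since the options are independent facets that can be combined, one way to picture them is as combinable flags. The sketch below is illustrative only: the class and member names are invented here and do not come from the TRSS report.

```python
from enum import Flag, auto


class RegistryOption(Flag):
    """Hypothetical names for the three architecture options."""
    VOCABULARY_METADATA = auto()  # Option 1: metadata and links to owner/provider
    SERVICE_METADATA = auto()     # Option 2: metadata on terminology services
    VOCABULARY_CONTENT = auto()   # Option 3: access to concepts, terms, relationships


def describe(options):
    """List the names of the options a given registry combines."""
    return [opt.name for opt in RegistryOption if opt in options]


# A registry offering only Option 1, and one combining all three.
metadata_only = RegistryOption.VOCABULARY_METADATA
full_registry = (RegistryOption.VOCABULARY_METADATA
                 | RegistryOption.SERVICE_METADATA
                 | RegistryOption.VOCABULARY_CONTENT)

print(describe(metadata_only))
print(describe(full_registry))
```

Modelling the options as flags rather than as a single choice reflects the point that a registry may offer any combination of them.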

Page 8

Collected use cases (from literature and respondents) under general headings of TR functionality

• Creation, modification and maintenance (Option 3)

• Acquisition and publication (Options 1, 3)

• Cataloguing: Indexing/classification/annotation (Options 2, 3)

• Integration (Options 2, 3)
– Including mapping, merging and semantic interoperability

• Access, search and discovery (Options 1, 2, 3)
– Both at vocabulary and concept/service level

• Use (Options 2, 3)
– terminology service providing support for a wider application

• Archiving and preservation (Option 3)
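The pairing of use cases with architecture options listed above can be held as a simple lookup. This is only a sketch: the dictionary records the option numbers shown on the slide, and the function name is invented for illustration. It does not interpret whether a use case needs all or just one of its listed options.

```python
# Use-case headings mapped to the option numbers the slide associates with them.
USE_CASE_OPTIONS = {
    "Creation, modification and maintenance": {3},
    "Acquisition and publication": {1, 3},
    "Cataloguing": {2, 3},
    "Integration": {2, 3},
    "Access, search and discovery": {1, 2, 3},
    "Use": {2, 3},
    "Archiving and preservation": {3},
}


def use_cases_for(option):
    """Return the use-case headings associated with one option number."""
    return [name for name, opts in USE_CASE_OPTIONS.items() if option in opts]


print(use_cases_for(1))
```

On this reading, Option 1 is associated with two of the seven headings and Option 3 with all of them.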

Page 9

Basic rationale for a TR in immediate JISC context

• The main rationale for the near-term recommendation of the report (Option 1) is to provide a service that assists discovery of existing vocabularies, or of the most recent version of a given vocabulary.

• Several TRSS respondents and many use cases describe variants of a scenario, involving a user from a particular subject domain looking to see if a vocabulary with certain properties already exists.

• This may be for purposes of supporting access to a new repository or collection (via search and browse services). It may be to assist the design of a new vocabulary by first looking to see if anything similar already exists.

Page 10

Need for metadata

• The features of a vocabulary that afford discovery vary (widely) according to the user’s search criteria.

• The user may have a rough idea of a particular vocabulary's title. The user may require a vocabulary covering a particular subject domain (to a greater or lesser degree of specificity). It may be critical that the vocabulary is free to use. It may be important that the vocabulary be available in a particular language. The depth or breadth of topic coverage may be an issue, etc.

• To assist discovery a rich set of metadata should be available for the vocabulary.
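A minimal sketch of how such metadata could drive discovery, assuming a registry holds one record per vocabulary. The record fields, the function name and the three sample vocabularies are all made up for illustration; they are not taken from any real registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyRecord:
    """Hypothetical, much reduced metadata record for one vocabulary."""
    title: str
    subjects: frozenset
    languages: frozenset
    free_to_use: bool


def discover(records, title_hint=None, subject=None, language=None, free_only=False):
    """Return titles of records that satisfy every criterion that was supplied."""
    hits = []
    for rec in records:
        if title_hint and title_hint.lower() not in rec.title.lower():
            continue  # rough idea of the title does not match
        if subject and subject not in rec.subjects:
            continue  # subject domain not covered
        if language and language not in rec.languages:
            continue  # not available in the requested language
        if free_only and not rec.free_to_use:
            continue  # licence is a hard requirement
        hits.append(rec.title)
    return hits


# Invented sample data.
RECORDS = [
    VocabularyRecord("Example Soil Thesaurus",
                     frozenset({"agriculture", "soil science"}),
                     frozenset({"en", "fr"}), True),
    VocabularyRecord("Example Crop Classification",
                     frozenset({"agriculture"}), frozenset({"en"}), False),
    VocabularyRecord("Example Marine Glossary",
                     frozenset({"oceanography"}), frozenset({"en", "es"}), True),
]

print(discover(RECORDS, subject="agriculture", free_only=True))
```

Each search criterion corresponds to a metadata element, which is why a sparse metadata set limits the kinds of discovery a registry can support.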

Page 11

Some existing TRs (for details see TRSS report)

• Taxonomy Warehouse

– Option 1, interactive access

– claims to host more than 670 taxonomies (73 subject domains) from 288 publishers in 39 languages

• CENDI Terminology Locator

– Option 1, interactive access

– Points to terminology resources of CENDI federal science research agencies, spanning agriculture to medicine to the environment

Page 12

…Existing TRs…

• Lexaurus Bank (originated as BECTA Vocabulary Bank)
– Options 1, 2, 3, interactive and m2m access
– supports creating, editing and maintenance of educational vocabularies supporting UK National Curriculum

• BioPortal and OBO Foundry
– Options 1, 2, 3, interactive and m2m access
– US OBO – over 60 life-science ontologies
– UK BioPortal – search and browsing access to its ontologies and experimental data

Page 13

…Existing TRs…

• FAO KOS Registry
– Options 1, 2, 3, interactive access (and m2m access to Agrovoc)
– Holds over 90 KOS, in areas related to agriculture and administration

• NERC Data Grid's Vocabulary Server
– Options 1, 2, 3, m2m access
– The British Oceanographic Data Centre (BODC) has a TR which supports interoperability of scientific datasets in 43 international data centres
– with more than 100 vocabularies

Page 14

…Existing TRs

• NSDL registry
– Options 1 and 3, interactive access
– SKOS-based TR, with an integrated metadata registry
– 29 vocabularies, mainly educational so far

• OCLC's Terminology Services Pilot
– Options 1, 2, 3, interactive and m2m access
– Current vocabularies held include FAST, GSAFD, LC AC SH, LCSH, MeSH, TGM

And also various broadly related initiatives, including

• eXtended MetaData Registry (XMDR)

• ISO/IEC 11179 Metadata Registries family of standards

• JISC IE Service Registry (IESR)

• JISC IE Metadata Schema Registry (IEMSR)

• Species 2000 and Catalog of Life

Page 15

Metadata

Review of KOS metadata, including from

• NKOS Registry 1998

• NKOS Registry 2001

• CENDI

• Ecoterm (Environmental Terminology and KOS)

• Food and Agriculture Organization (FAO) of UN

• Hodge et al. 2007 (10th OFMR)

• National Science Digital Library Registry

• ISO 11179 (Information Technology - Metadata registries (MDR))

• OCLC Terminology Services

• SPECTRUM Terminology Bank

• Taxonomy Warehouse

• Vocman (Becta Vocabulary Bank)

and taking into consideration Ontology Metadata Vocabulary (OMV)

Page 16

Metadata – proposed extended metadata set

(details in TRSS report – interested in feedback)

1 General information
– Vocabulary name, author or editor, type etc.

2 Scope and usage
– Subjects covered, purpose, rating etc.

3 Characteristics
– Type of terms, relationships etc.

4 Terms and conditions
– Availability etc.

5 Provider
– Contact name etc.
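To make the five groups concrete, the sketch below arranges the elements named on this slide into a nested record and reports which are missing. The field identifiers are paraphrases of the slide's wording, the choice of "core" elements is an assumption made purely for illustration, and the sample record is invented; the full element set is in the TRSS report.

```python
# The five groups from the slide, with the example elements it names.
GROUPS = {
    "general": ["name", "author_or_editor", "type"],
    "scope_and_usage": ["subjects", "purpose", "rating"],
    "characteristics": ["term_types", "relationships"],
    "terms_and_conditions": ["availability"],
    "provider": ["contact_name"],
}

# ASSUMPTION: which elements are "core" is not stated on the slide.
CORE = {
    ("general", "name"),
    ("scope_and_usage", "subjects"),
    ("terms_and_conditions", "availability"),
    ("provider", "contact_name"),
}


def missing_fields(record):
    """Return (group, field) pairs that are absent or empty in the record."""
    missing = []
    for group, fields in GROUPS.items():
        values = record.get(group, {})
        for name in fields:
            if not values.get(name):
                missing.append((group, name))
    return missing


def has_core_metadata(record):
    """True when none of the assumed core elements is missing."""
    return not (CORE & set(missing_fields(record)))


# Invented sample record with only some elements filled in.
sample = {
    "general": {"name": "Example Soil Thesaurus", "type": "thesaurus"},
    "scope_and_usage": {"subjects": ["soil science"]},
    "terms_and_conditions": {"availability": "free"},
    "provider": {"contact_name": "A. Curator"},
}

print(missing_fields(sample))
print(has_core_metadata(sample))
```

Separating a small core from optional elements is one way to balance a rich metadata set against the effort asked of vocabulary providers.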

Page 17

Governance

• Includes both Technical and Content governance

Content governance varies with architecture Option and may include:

• Validation of correctness of content

• Maintaining vocabulary representations supported according to appropriate versions of standards

• Versioning of the vocabulary intellectual content

• Need for selection of vocabularies?
– process/criteria for evaluating whether to accept offered vocabularies
– Reviewing metadata returned by vocabulary owners

• Promotion of the TR and its services

• Education and training in the resources and services.

• Governance emerged as a concern if content is held in the registry (Option 3). This is one of the reasons for the short-term recommendation of Option 1 for a general vocabulary situation.

Page 18

Issues

• Metadata set core/optional for TR?

• cost/benefit in how rich a metadata set to recommend – a richer set might be more useful but deter vocabulary providers

• Metadata for terminology services?

• Relationship with ontology and language community registries?

• When is Option 3 feasible?
– e.g. considering governance issues
– It may be easier for well-defined, coherent communities

Page 19

2. Terminology Services

can be applied at all stages of the search process. Services include resolving search terms to controlled vocabulary, disambiguation services, offering browsing access, offering mapping between vocabularies, query expansion, query reformulation, combined search and browsing. These can be applied as immediate elements of the end-user interface or can underpin services behind the scenes, according to context.

JISC review on Terminology Services and Technologies, 2006

Potential for SKOS-based programmatic services

Page 20

SKOS Services at Glamorgan

We took as a starting point a subset of

• SKOS API (Application Program Interface), a deliverable of SWAD-Europe Thesaurus Activity
http://www.w3.org/2001/sw/Europe/reports/thes
designed to provide programmatic access to SKOS vocabularies

• Our focus is on the functionality of the services, which could be implemented via various lower level protocols

Issues

• How to package the functionality, what are common patterns of use?

• How to implement the services in different lower level protocols?

Page 21

SKOS Web Service and Client Applications

[Diagram: SKOS Web Service accessed by SKOS Client Applications, namely a Windows based client application and web browser based components (‘widgets’)]

Page 22

SKOS Services: possible examples

Web Service Client

• GetTopmostConcepts

• GetConceptSchemes

• GetConcept

• GetAllConceptRelatives

• GetAllConceptsByPath

• GetConceptsMatchingKeyword

• ExpandConcept

Given a string (cove), GetConcept finds matches in the controlled vocabularies of all SKOS concept schemes registered with the server.

Shows an example of a match with the ‘entry vocabulary’ of effective synonyms (eg bays) for different SKOS schemes

Display details of selected concept.

Here illustrating the semantic expansion service returning ‘semantically close’ concepts to cove
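The two behaviours described above, matching a string against preferred and entry-vocabulary labels and then returning semantically close concepts, can be sketched over a toy in-memory scheme. Everything here is an assumption made for illustration: the concept identifiers, labels and relationships are invented, the function names are not the service's own, and plain hop counting stands in for whatever closeness measure the real expansion service applies.

```python
from collections import deque

# Invented toy scheme: preferred label, entry terms, and SKOS-style links.
CONCEPTS = {
    "ex:bays": {"pref": "bays", "alt": ["cove", "inlets"],
                "broader": ["ex:coastal"], "narrower": [], "related": ["ex:harbours"]},
    "ex:coastal": {"pref": "coastal features", "alt": [],
                   "broader": [], "narrower": ["ex:bays", "ex:beaches"], "related": []},
    "ex:beaches": {"pref": "beaches", "alt": ["strands"],
                   "broader": ["ex:coastal"], "narrower": [], "related": []},
    "ex:harbours": {"pref": "harbours", "alt": ["ports"],
                    "broader": [], "narrower": [], "related": ["ex:bays"]},
}


def match_keyword(text):
    """Find concepts whose preferred or entry-vocabulary label equals the text."""
    needle = text.strip().lower()
    return [cid for cid, c in CONCEPTS.items()
            if needle in (label.lower() for label in [c["pref"]] + c["alt"])]


def expand(concept_id, max_hops=1):
    """Breadth-first walk over broader/narrower/related links, up to max_hops."""
    distance = {concept_id: 0}
    queue = deque([concept_id])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not travel further from this concept
        for relation in ("broader", "narrower", "related"):
            for neighbour in CONCEPTS[current][relation]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
    del distance[concept_id]  # the starting concept is not its own expansion
    return distance


hits = match_keyword("cove")
print(hits)
print(expand(hits[0], max_hops=2))
```

In this toy data the search string "cove" matches only through an entry term, and expansion then reaches neighbouring concepts by following the scheme's relationships.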

Page 23

SKOS Client - Widgets

[Screenshot: widgets for Concept Schemes, Concept Search, Concept Details and Concept Expansion]

Page 24

Current work

Semantic Tools for Archaeology Resources (STAR) research project

English Heritage thesauri converted to SKOS

SKOS based terminology services
– Browsing
– Query expansion
– Others have used them in research projects (DELOS and ArcheoTools)

http://hypermedia.research.glam.ac.uk/kos/terminology_services/

Recently developed URL based web service call interface for SKOS services in ongoing JISC tag suggestion project
– Fast, scalable, platform neutral
– JSON data structures returned
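As a rough picture of what a URL based call returning JSON involves on the client side, the sketch below builds a request URL and reads labels out of a response. The base URL, the operation and parameter names, and the shape of the JSON are all hypothetical; the actual interface is documented at the project pages and may differ in every detail. No network request is made.

```python
import json
from urllib.parse import urlencode

# HYPOTHETICAL endpoint, used only to show the pattern.
BASE_URL = "https://example.org/terminology"


def build_call(operation, **params):
    """Compose a GET URL: operation in the path, arguments in the query string."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{operation}?{query}"


def preferred_labels(payload):
    """Extract preferred labels from a JSON response of the assumed shape."""
    data = json.loads(payload)
    return [concept["prefLabel"] for concept in data.get("concepts", [])]


# An invented response body standing in for what a service might return.
SAMPLE_RESPONSE = '{"concepts": [{"uri": "ex:bays", "prefLabel": "bays"}]}'

url = build_call("getConceptsMatchingKeyword", keyword="cove", scheme="ex:places")
print(url)
print(preferred_labels(SAMPLE_RESPONSE))
```

Because the call is an ordinary URL and the reply is plain JSON, any platform with an HTTP client can consume such a service, which fits the "platform neutral" point above.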

Related KOS-based web services (non-exhaustive list)
http://hypermedia.research.glam.ac.uk/kos/terminology_services/links/

Page 25

Contact Information

Doug Tudhope

School of Computing

University of Glamorgan

Pontypridd CF37 1DL

Wales, UK

[email protected]

http://hypermedia.research.glam.ac.uk/