Fisheries Linked Open Data - Claudio Baldassarre

Upload: semantic-web-company

Post on 27-Jan-2015

DESCRIPTION

NPOs and NGOs increasingly act as open data providers for stakeholders such as citizens, enterprises and communities. Linked open data has become a key concept for meeting several demands of information professionals, for instance interoperability and accessibility of data, multilinguality and harmonisation of metadata. The open data value chain is changing from a rather simple chain into a more complex network of data streams, which produces new revenue models and more differentiated roles; linked open data plays a central role in this development. This webinar covers the use of linked open data and controlled vocabularies in the specific environments in which NGOs and NPOs work. It gives an overview of the underpinning motivation and concepts that drive the concrete use cases presented.

TRANSCRIPT

Page 1: Fisheries Linked Open Data - Claudio Baldassare

HARMONIZATION AND INTERLINKING OF FISHERY REFERENCE TERMINOLOGIES

Fisheries Linked Open Data

Claudio Baldassarre

Page 2: Fisheries Linked Open Data - Claudio Baldassare

Outline

• Fisheries Linked Open Data
• Harmonization
• Interlinked Domains
• Application Scenarios
• FLOD Consumer Applications
• FLOD as Master Data Management
• Objectives, Challenges and Current Status

Page 3: Fisheries Linked Open Data - Claudio Baldassare

Fisheries Linked Open Data

A core of code lists that are references for statistical reports or data dissemination (e.g. yearbooks, web portals).

The codes are associated with terms (and translations) to provide controlled vocabularies.
• Fishing gears (ISSCFG), e.g. purse seines: 01.1.0
• Fishing vessels (ISSCFV), e.g. purse seiners: 02.1.0
• Fishing areas (FAO), e.g. western Mediterranean: 37.1
• Marine species (ASFIS), e.g. yellowfin tuna: YFT

A dense network of cross-domain relationships, e.g.
• Sovereignty of a country over an Exclusive Economic Zone
• Participation of a country in fishing agreements

Serves fishery communities of practice inside and outside FAO: statisticians, marine scientists, content managers.

Page 4: Fisheries Linked Open Data - Claudio Baldassare

Purse Seines and Purse Seiners

Page 5: Fisheries Linked Open Data - Claudio Baldassare

Harmonization

FLOD

Fishery Statistics

Supports statisticians in Fisheries Division to aggregate catch statistics from regional to global level.

Names:
• en: Yellow Fin Tuna
• es: Rabil
• fr: Albacore
• lt: Thunnus albacares

Codes:
• asfis: YFT
• taxonomic: 1750102610
• worms: 127027
• aquamaps: 22833
• fishbase: 22833
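The harmonization shown on this slide can be read as a lookup across code lists: given a code in one list, find the same species in another. The following is a minimal sketch of that idea, not FLOD's actual implementation. The record layout and the function names (`build_index`, `translate_code`) are assumptions made for illustration; the code values are the ones listed on the slide.

```python
# Hypothetical in-memory record; the code values are those shown on the slide.
YELLOWFIN_CODES = {
    "asfis": "YFT",
    "taxonomic": "1750102610",
    "worms": "127027",
    "aquamaps": "22833",
    "fishbase": "22833",
}


def build_index(records):
    """Map every (code list, code) pair to the record that carries it."""
    index = {}
    for record in records:
        for scheme, code in record.items():
            index[(scheme, code)] = record
    return index


def translate_code(index, scheme, code, target_scheme):
    """Return the equivalent code in target_scheme, or None if unknown."""
    record = index.get((scheme, code))
    if record is None:
        return None
    return record.get(target_scheme)


INDEX = build_index([YELLOWFIN_CODES])
print(translate_code(INDEX, "asfis", "YFT", "worms"))
```

A statistician holding regional data keyed on one code list could use such an index to aggregate against another list without re-keying the data by hand.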

Page 6: Fisheries Linked Open Data - Claudio Baldassare

Interlinked Domains

[Diagram: FLOD and its interlinked domains: Land Geo-Politics, Marine Geo-Politics, Fishery Technique, Fishery Legislation, Fishery Vessels, Fishery Statistics]

Enables users to formulate complex requests leveraging cross-domain connections:

Amount of fish caught in 2008 in the Danish Exclusive Economic Zone by vessels that practice fishing with traps?

Catch statistics reported in FAO subdivisions intersecting the marine areas under Danish sovereignty?

Countries interested in the expiration of the legal agreements involving FAO fishing area 18 ending in year 2012?

Driving competency question: all deep-water species that are members of family x and family y, are critically endangered, predominantly feed on prey species z, and occur in this LME, but only between latitudes A and B and longitudes C and D.
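Questions of this kind are answered by following links from one domain to the next. The sketch below walks such a chain (country, marine area under its sovereignty, intersecting fishing area, catch record) over a tiny in-memory set of triples. Everything in it is an assumption for illustration: the predicate names, the identifiers and the toy data are made up and do not reflect the FLOD vocabulary or real statistics.

```python
# Toy triples (subject, predicate, object); all names are hypothetical.
TRIPLES = {
    ("country:DNK", "hasSovereigntyOver", "eez:DNK"),
    ("eez:DNK", "intersects", "area:A"),
    ("eez:DNK", "intersects", "area:B"),
    ("eez:OTHER", "intersects", "area:C"),
    ("catch:1", "reportedIn", "area:A"),
    ("catch:2", "reportedIn", "area:C"),
    ("catch:3", "reportedIn", "area:B"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the data."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def catches_in_waters_of(country):
    """Catch records reported in areas intersecting the country's marine areas."""
    results = set()
    for eez in objects(country, "hasSovereigntyOver"):
        for area in objects(eez, "intersects"):
            results |= subjects("reportedIn", area)
    return sorted(results)


print(catches_in_waters_of("country:DNK"))  # ['catch:1', 'catch:3']
```

Against a real endpoint the same traversal would normally be expressed as a single SPARQL query with one triple pattern per hop.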

Page 7: Fisheries Linked Open Data - Claudio Baldassare

Application Scenarios

[Diagram: FLOD and its interlinked domains: Land Geo-Politics, Marine Geo-Politics, Fishery Technique, Fishery Legislation, Fishery Vessels, Fishery Statistics]

Reallocate species catch statistics based on geospatial information, and fishing agreements.

Generate landing pages populated with data from remote open linked datasets.

Enhance search by exploiting network of connections in FLOD.

Document retrieval driven by contextual information.


Page 8: Fisheries Linked Open Data - Claudio Baldassare

FLOD Portal: Harmonization Exposed

Search for reference terms

Display alternative codes from harmonized code lists

Display translation for the controlled term

Display meta information on data provenance (i.e. rights holder and publisher)

All data are exposed through the FLOD SPARQL endpoint

Multilingual auto completion to avoid spelling errors
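Since the portal reads its data from a SPARQL endpoint, a term search can be sketched as a plain SPARQL Protocol request. The example below only builds the request and does not send it. The endpoint URL is a placeholder, not the real FLOD address, and the query shape (matching on `rdfs:label`) is an assumption about how labels might be searched.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL; the real FLOD endpoint address is not given in the slides.
ENDPOINT = "http://example.org/flod/sparql"


def label_query(term, lang="en"):
    """Build a SPARQL query that finds entities whose label contains term."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?entity ?label WHERE {\n"
        "  ?entity rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}" && CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "}\n"
        "LIMIT 20"
    )


def build_request(term, lang="en"):
    """Wrap the query in an HTTP GET request asking for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": label_query(term, lang)})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request("tuna")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against whichever endpoint `ENDPOINT` points to.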

Page 9: Fisheries Linked Open Data - Claudio Baldassare

FLOD Portal: Network of Publications

Display a list of FLOD individuals annotating this publication.

Display the occurrence of the user query in a specific page of the publication.

Display provenance information for this publication.

All data are exposed through the FLOD SPARQL endpoint

Multilingual auto completion to avoid spelling errors

Page 10: Fisheries Linked Open Data - Claudio Baldassare

Enrich User Information Context

Detect FLOD entities in the web page browsed by the user

Enrich the content with data from the FLOD SPARQL endpoint retrieved through the hyperlinks

Provides casual users with an alternative to searching the portal or the SPARQL endpoint
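The enrichment step can be pictured as dictionary-based entity spotting: known labels are found in the page text and turned into hyperlinks to the corresponding entities. This is a minimal sketch under that assumption; the slides do not describe how the content enricher actually detects entities. The URIs are placeholders, and the labels are the examples used earlier in the deck.

```python
import re

# Hypothetical label-to-URI dictionary; the URIs are placeholders.
ENTITIES = {
    "purse seines": "http://example.org/flod/gear/01.1.0",
    "purse seiners": "http://example.org/flod/vessel/02.1.0",
    "western Mediterranean": "http://example.org/flod/area/37.1",
}


def annotate(text):
    """Wrap every known label found in plain text in a link to its entity."""
    # Longest labels first, so a longer label wins over a shorter one it contains.
    labels = sorted(ENTITIES, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    lookup = {label.lower(): uri for label, uri in ENTITIES.items()}

    def link(match):
        uri = lookup[match.group(0).lower()]
        return f'<a href="{uri}">{match.group(0)}</a>'

    return pattern.sub(link, text)


print(annotate("Purse seiners operate in the western Mediterranean."))
```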

Page 11: Fisheries Linked Open Data - Claudio Baldassare

SPREAD: Time Series Spatial Reallocation

Retrieves all Exclusive Economic Zones where a country is allowed to fish.

Retrieves reference species codes from regional code lists

Retrieves spatial intersection of fishing areas

All data are stored in the FLOD SPARQL endpoint

Retrieves fishing rights based on fishing agreements
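One way to combine these inputs is to split a reported catch across the zones that intersect the reporting area, keeping only the zones where the country holds fishing rights. The sketch below does this in proportion to the overlap area. That proportional rule is an assumption chosen for illustration; the slides do not state which reallocation rule SPREAD applies. Zone names and numbers are made up.

```python
def reallocate(catch_tonnes, intersections, allowed_zones):
    """Split a catch figure across zones in proportion to overlap area.

    intersections: {zone: overlap area with the reporting fishing area}
    allowed_zones: zones where the reporting country holds fishing rights
    """
    eligible = {
        zone: area
        for zone, area in intersections.items()
        if zone in allowed_zones and area > 0
    }
    total = sum(eligible.values())
    if total == 0:
        return {}
    return {zone: catch_tonnes * area / total for zone, area in eligible.items()}


# Made-up example: zone C overlaps the area but the country has no rights there.
shares = reallocate(
    100.0,
    {"zone:A": 30.0, "zone:B": 10.0, "zone:C": 60.0},
    {"zone:A", "zone:B"},
)
print(shares)  # {'zone:A': 75.0, 'zone:B': 25.0}
```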

Page 12: Fisheries Linked Open Data - Claudio Baldassare

Smart Time Series

Detect FLOD entities in catch time series (i.e. species, water areas, country)

Leverage the FLOD network to associate geo-referenced data with geographic entities (e.g. water areas)

Generate a KML model including references to the entity URIs found in each statistical record

Map time series records on Google Maps
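A statistical record with a position and an entity URI can be serialized as a KML placemark for display on a map. The following sketch uses only the Python standard library. The function name, the choice of a point geometry and the decision to carry the entity URI in the `description` element are assumptions; the slides do not describe the KML layout actually generated.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def record_to_kml(name, entity_uri, lon, lat):
    """Serialize one record as a KML document with a single point placemark."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    # The entity URI is carried in the description (an illustrative choice).
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = entity_uri
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML expects longitude first, then latitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Placeholder URI and coordinates.
print(record_to_kml("YFT catch record", "http://example.org/flod/area/37.1", 5.0, 40.0))
```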

Page 13: Fisheries Linked Open Data - Claudio Baldassare

FLOD as Master Data Management

[Diagram: MDM and FLOD, related through Principles, Features and Benefits]

Features
• Managing multiple vocabularies/classifications and their cross mappings
• Multilingual services
• Import/export routines supporting multiple formats
• Integration with existing tools and systems through open APIs/web services
• Governance (data ownership and update workflows)

Master Data Management is a wide area of technological investigation in FAO, aimed at identifying a toolkit that enables the management and maintenance of reference data at corporate level.

The FLOD project inherits the definition of principles and the benefits of MDM, and develops MDM features through the adoption of semantic technologies.

Page 14: Fisheries Linked Open Data - Claudio Baldassare

Vocabularies and their Cross Mappings

Objectives
• Align reference terminologies of fishing gears, vessels, marine species and fishing areas.
• Link individuals of fishing gears, vessels, marine species, fishing and administrative sea areas, geo-political territories, legal and governmental entities.

Challenges
• Evolve from a hierarchical structure to a data model design that enables accurate alignment and linking despite the heterogeneous granularity of the classifications.

Current status
• FLOD is designed by architecting modules of part-whole, constituency, collection, and other reusable ontology engineering patterns.
• Ingestion workflows are in place from structured sources of terminologies and relationships to generate linked datasets.
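The granularity challenge can be illustrated with a part-whole relation: when one classification records a fine-grained code and another only knows a coarser one, the fine code is rolled up along its part-of chain until a shared level is found. This is a sketch of that idea only, not the FLOD ontology pattern itself. The `PART_OF` table and function names are assumptions; the codes follow the dotted FAO area notation used earlier in the deck.

```python
# Hypothetical part-of links between fishing area codes (child -> parent).
PART_OF = {"37.1.1": "37.1", "37.1": "37"}


def ancestors(code):
    """Return the chain of wholes that contain code, nearest first."""
    chain = []
    while code in PART_OF:
        code = PART_OF[code]
        chain.append(code)
    return chain


def align(code, coarse_codes):
    """Return the closest code at or above code that the coarser list knows."""
    for candidate in [code] + ancestors(code):
        if candidate in coarse_codes:
            return candidate
    return None


print(align("37.1.1", {"37"}))  # 37
```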

Page 15: Fisheries Linked Open Data - Claudio Baldassare

Multilingual Services

Objectives
• Associate the name(s) of the reference terminologies with the variety of lexicalizations in local usage.
• Track the evolution of lexicalizations over time.

Challenges
• A descriptive semantic model for lexicalizations that supports selecting the appropriate name(s) in the user's information context.

Current status
• FLOD uses the widely known rdfs:label property with language metadata; for further lexicalizations or name changes it defers to the source system from which the names originate.
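With language-tagged labels, choosing a name for a given user comes down to walking an ordered list of preferred languages and falling back to a default. A minimal sketch, assuming labels are available as (text, language) pairs; the function name and the fallback policy are illustrative, and the sample labels are the ones shown on the harmonization slide.

```python
# (text, language tag) pairs, as they might come from rdfs:label values.
LABELS = [("Yellow Fin Tuna", "en"), ("Rabil", "es"), ("Albacore", "fr")]


def pick_label(labels, preferred, fallback="en"):
    """Return the label in the first preferred language that is available."""
    by_lang = {}
    for text, lang in labels:
        by_lang.setdefault(lang, text)  # keep the first label per language
    for lang in list(preferred) + [fallback]:
        if lang in by_lang:
            return by_lang[lang]
    return None


print(pick_label(LABELS, ["it", "es"]))  # Rabil
```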

Page 16: Fisheries Linked Open Data - Claudio Baldassare

Import/Export Routines

Objectives
• Streamline workflows of data import and conversion to RDF.
• Record and store versions of imported data.
• Support selective maintenance operations targeting specific datasets.

Challenges
• A framework for ETL operations for system administrators with little or no knowledge of linked datasets.
• Keep data sources and linked datasets synchronized on a regular basis, with control over versioning.

Current status
• Semi-automatic processes read reference terminologies and cast them into datasets through FLOD ontology modules.
• Each dataset receives a timestamp as explicit metadata at its creation.
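An import routine of this kind can be sketched as a function that turns code list rows into N-Triples and stamps the dataset with its creation time. The namespace below is a placeholder, and the use of `dcterms:created` for the timestamp is an assumption; the slides do not say which property FLOD uses. The sample row is the purse seines example from earlier in the deck.

```python
from datetime import datetime, timezone

BASE = "http://example.org/flod/"  # placeholder namespace, not FLOD's own
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_CREATED = "http://purl.org/dc/terms/created"  # assumed timestamp property
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def literal(text, lang=None, datatype=None):
    """Format an N-Triples literal with an optional language tag or datatype."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    if lang:
        return f'"{escaped}"@{lang}'
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


def to_ntriples(dataset, rows, created=None):
    """Convert (code, label, lang) rows into N-Triples with a dataset timestamp."""
    created = created or datetime.now(timezone.utc)
    stamp = literal(created.isoformat(), datatype=XSD_DATETIME)
    lines = [f"<{BASE}{dataset}> <{DCT_CREATED}> {stamp} ."]
    for code, label, lang in rows:
        lines.append(
            f"<{BASE}{dataset}/{code}> <{RDFS_LABEL}> {literal(label, lang=lang)} ."
        )
    return "\n".join(lines)


print(to_ntriples("gear", [("01.1.0", "purse seines", "en")]))
```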

Page 17: Fisheries Linked Open Data - Claudio Baldassare

Integration with Existing Tools and Systems

Objectives
• Homogeneously search remote reference terminologies hosted in a combination of DBs and KBs.
• Empower search engines with query expansion based on the knowledge available in FLOD.
• Expose knowledge base content through an RDF-agnostic API.

Challenges
• Define a top-level domain ontology that contains the reference super concepts to address the reference terminologies.
• Implement a mechanism to cast user terms as individuals or concepts in the top-level ontology.

Current status
• An implementation of OpenSearch with a semantic extension is being prepared to enable query services on top of the FLOD SPARQL endpoint.
• The FLOD portal includes spelling support to convert user query terms into references to FLOD individuals.
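Spelling support of this kind can be approximated by fuzzy matching a user term against the known labels and returning the matching individual. The sketch below uses `difflib` from the Python standard library. It is an illustration only: the portal's actual matching method is not described in the slides, the URIs are placeholders, and the similarity cutoff is an arbitrary choice.

```python
import difflib

# Hypothetical label-to-individual table; the URIs are placeholders.
INDIVIDUALS = {
    "yellowfin tuna": "http://example.org/flod/species/YFT",
    "purse seines": "http://example.org/flod/gear/01.1.0",
    "western mediterranean": "http://example.org/flod/area/37.1",
}


def resolve(term, cutoff=0.8):
    """Return the individual whose label is closest to term, or None."""
    matches = difflib.get_close_matches(
        term.lower(), list(INDIVIDUALS), n=1, cutoff=cutoff
    )
    return INDIVIDUALS[matches[0]] if matches else None


print(resolve("yelowfin tuna"))
```

A stricter cutoff reduces wrong matches between near-identical labels at the cost of rejecting more misspellings.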

Page 18: Fisheries Linked Open Data - Claudio Baldassare

Governance

Objectives
• Model the roles and activities of the data providers with explicit reference to owner, publisher, rights holder, right to update, and system of provenance.
• Inject the roles and activities of governance into the Import/Export maintenance routine.

Challenges
• Identify licensing schemas that can drive the modeling activity.

Current status
• Round tables on governance have been started in contexts where the data providers participate actively.
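The governance objectives suggest a small record attached to each dataset, which the import routine consults before changing anything. The following is a hypothetical sketch of that coupling. The field names mirror the roles listed above, while the class, the function and the permission rule are assumptions rather than a description of FLOD.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Governance:
    """Governance metadata for one dataset (hypothetical structure)."""
    owner: str
    publisher: str
    rights_holder: str
    provenance_system: str
    may_update: frozenset = field(default_factory=frozenset)


def run_import(dataset, governance, actor, importer):
    """Run importer on dataset only if actor holds the right to update it."""
    if actor not in governance.may_update:
        raise PermissionError(f"{actor} has no right to update {dataset}")
    return importer(dataset)


# Placeholder names throughout.
gov = Governance(
    owner="Owner Org",
    publisher="Publisher Org",
    rights_holder="Rights Org",
    provenance_system="source-system",
    may_update=frozenset({"alice"}),
)
print(run_import("gears", gov, "alice", lambda name: f"imported {name}"))
```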

Page 19: Fisheries Linked Open Data - Claudio Baldassare

Conclusions

FLOD represents a source of harmonized reference data and controlled terms for applications of statistics and marine science.

FLOD is consumed by applications that need to aggregate either data instances or documentary resources relevant to users' needs (e.g. search or context).

The FLOD approach to maintaining linked reference datasets leads to a decentralized maintenance effort under the responsibility of the respective data owners.

The FLOD roadmap is at a point where a robust maintenance framework and a scalable operational infrastructure are recommended.

Where the FLOD approach to maintaining code lists and controlled terms proves successful, it can provide good practice and recommendations for corporate Master Data Management.

Page 20: Fisheries Linked Open Data - Claudio Baldassare

Acknowledgements

The iMarine project supports the development of:
• FLOD Web Portal
• SPREAD geospatial reallocation engine
• FLOD content enricher

iMarine: http://www.i-marine.eu/