Fisheries Linked Open Data - Claudio Baldassarre
Post on 27-Jan-2015
HARMONIZATION AND INTERLINKING OF FISHERY REFERENCE TERMINOLOGIES
Fisheries Linked Open Data
Claudio Baldassarre
Outline
• Fisheries Linked Open Data
• Harmonization
• Interlinked Domains
• Application Scenarios
• FLOD Consumer Applications
• FLOD as Master Data Management
• Objectives, Challenges and Current Status
Fisheries Linked Open Data
A core of code lists that are references for statistical reports or data dissemination (e.g. yearbooks, web portals).
The codes are associated with terms (and translations) to provide controlled vocabularies.
• Fishing gears (ISSCGF), ex: purse seines - 01.1.0
• Fishing vessels (ISSCVF), ex: purse seiners - 02.1.0
• Fishing areas (FAO), ex: western Mediterranean - 37.1
• Marine species (ASFIS), ex: yellow fin tuna - YFT
A dense network of cross-domain relationships.
• e.g. Sovereignty of a Country over an Exclusive Economic Zone
• e.g. Participation of a Country in fishing agreements
Serves fishery communities of practice inside and outside FAO: Statisticians, Marine Scientists, Content Managers
Purse Seines and Purse Seiners
Harmonization
FLOD
Fishery Statistics
Supports statisticians in Fisheries Division to aggregate catch statistics from regional to global level.
Names: en: Yellow Fin Tuna, es: Rabil, fr: Albacore, lt: Thunnus albacares
Codes: asfis: YFT, taxonomic: 1750102610, worms: 127027, aquamaps: 22833, fishbase: 22833
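The harmonized codes listed for yellowfin tuna can be read as one record keyed by coding scheme. The following is a minimal sketch of looking up the alternative codes from any one known code; the dictionary layout and the function name `alternative_codes` are assumptions made for illustration, not FLOD's data model.

```python
# Illustrative only: the record mirrors the yellowfin tuna codes on the slide.
# The dictionary layout and the function name are assumptions, not FLOD's model.
YELLOWFIN = {
    "asfis": "YFT",
    "taxonomic": "1750102610",
    "worms": "127027",
    "aquamaps": "22833",
    "fishbase": "22833",
}

def alternative_codes(record, known_scheme, known_code):
    """Return the codes in every other scheme if (known_scheme, known_code) matches."""
    if record.get(known_scheme) != known_code:
        return {}
    return {scheme: code for scheme, code in record.items() if scheme != known_scheme}

codes = alternative_codes(YELLOWFIN, "asfis", "YFT")
print(codes["worms"])
```

A statistician holding a regional report coded in one scheme could use such a lookup to aggregate it with reports coded in another.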
Interlinked Domains
[Diagram: FLOD interlinks the domains Land Geo-Politics, Marine Geo-Politics, Fishery Technique, Fishery Legislation, Fishery Vessels and Fishery Statistics]
Enables users to formulate complex requests leveraging cross-domain connections:
Amount of fish caught in 2008 in Danish Exclusive Economic Zone by vessels that practice fishing with traps?
Catch statistics reported in FAO subdivisions intersecting the marine areas under Danish sovereignty?
Countries interested in the expiration of the legal agreements involving FAO fishing area 18 ending in year 2012?
Driving Competency Question: all deep-water species member of family x and family y that are critically endangered and predominately feed on prey species z that occur in this LME but only between latitudes A and B, and longitudes C and D
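A request such as the catch in the Danish Exclusive Economic Zone by trap-fishing vessels could be phrased as a SPARQL query that joins the geo-political, vessel, gear and statistics domains. The sketch below only builds the query text; the `ex:` namespace and every property name in it are hypothetical placeholders, since the slides do not give the FLOD ontology terms.

```python
from string import Template

# Hypothetical vocabulary: the ex: namespace and every property name below are
# placeholders for illustration, not the actual FLOD ontology.
QUERY = Template("""\
PREFIX ex: <http://example.org/flod#>
SELECT (SUM(?amount) AS ?total)
WHERE {
  ?eez    ex:underSovereigntyOf ex:$country .
  ?record ex:reportedIn ?eez ;
          ex:year $year ;
          ex:caughtBy ?vessel ;
          ex:amount ?amount .
  ?vessel ex:usesGear ex:$gear .
}
""")

def build_catch_query(country, year, gear):
    """Fill the template for one country, year and gear type."""
    return QUERY.substitute(country=country, year=int(year), gear=gear)

print(build_catch_query("Denmark", 2008, "Traps"))
```

Each triple pattern crosses one domain boundary, which is what makes the question answerable in a single query rather than by joining separate code lists by hand.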
Application Scenarios
Reallocate species catch statistics based on geospatial information, and fishing agreements.
Generate landing pages populated with data from remote open linked datasets.
Enhance search by exploiting network of connections in FLOD.
Document retrieval driven by contextual information.
FLOD Portal: Harmonization Exposed
Search for reference terms
Display alternative codes from harmonized code lists
Display translation for the controlled term
Display meta information on data provenance (i.e. rights holder and publisher)
All data are exposed through the FLOD SPARQL endpoint
Multilingual auto completion to avoid spelling errors
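Multilingual auto completion can be pictured as a prefix match over the labels of every language at once. This is a minimal sketch under that assumption; the slides do not describe how the portal implements it, and the index layout and function name are hypothetical.

```python
# Labels taken from the yellowfin tuna example; the index layout and the
# function name are assumptions made for this sketch.
LABELS = [
    ("en", "Yellow Fin Tuna", "YFT"),
    ("es", "Rabil", "YFT"),
    ("fr", "Albacore", "YFT"),
]

def autocomplete(prefix, labels=LABELS):
    """Return (language, label, code) entries whose label starts with prefix."""
    needle = prefix.casefold()
    return [entry for entry in labels if entry[1].casefold().startswith(needle)]

print(autocomplete("rab"))
```

Because every suggestion carries the reference code, the user's selection resolves to the same term whichever language was typed.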
FLOD Portal: Network of Publications
Display a list of FLOD individuals annotating this publication.
Display the occurrence of the user query in a specific page of the publication.
Display provenance information for this publication.
All data are exposed through the FLOD SPARQL endpoint
Multilingual auto completion to avoid spelling errors
Enrich User Information Context
Detect FLOD entities in the web page browsed by the user
Enrich the content with data from the FLOD SPARQL endpoint, retrieved through the hyperlinks
Provides casual users with an alternative to searching the portal or the SPARQL endpoint
SPREAD: Time Series Spatial Reallocation
Retrieves all Exclusive Economic Zones where a Country is allowed to fish.
Retrieves reference species codes from regional code lists
Retrieves spatial intersection of fishing areas
All data are stored in the FLOD SPARQL endpoint
Retrieves fishing rights based on fishing agreements
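Once the spatial intersections are known, a reported catch can be split across the intersecting zones. The slides do not say how SPREAD weights the split; the sketch below assumes the simplest rule, proportional to overlap area, and uses made-up zone names and figures.

```python
def reallocate(catch, overlaps):
    """Split one catch value across zones in proportion to overlap area.

    catch: reported quantity for one fishing area.
    overlaps: mapping of zone name -> area shared with that fishing area.
    """
    total = sum(overlaps.values())
    if total <= 0:
        raise ValueError("no overlapping area to reallocate into")
    return {zone: catch * area / total for zone, area in overlaps.items()}

# Made-up figures, for illustration only.
shares = reallocate(100.0, {"zone_a": 30.0, "zone_b": 10.0})
print(shares)
```

Fishing agreements could then be applied as a filter on which zones are eligible before the split is computed.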
Smart Time Series
Detect FLOD entities in catch time series (i.e. species, water areas, country)
Leverage the network of FLOD to associate geo-referential data with geographic entities (e.g. water areas)
Generate a KML model including references to the entity URIs found in each statistical record
Map time series records on Google Maps
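Generating a KML model for one statistical record might look like the following sketch, which uses only the Python standard library. The record name, the coordinates and the entity URI are placeholders, and storing the URI in the placemark description is an assumption; the slides do not say where the references are kept.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def record_to_kml(name, lon, lat, entity_uri):
    """Build a one-placemark KML document that carries the entity URI."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    # Assumption: the URI is stored in the description element.
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = entity_uri
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinates are written as longitude,latitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

# Hypothetical record: the URI and the coordinates are placeholders.
doc = record_to_kml("YFT catch", 5.0, 40.0, "http://example.org/flod/species/YFT")
print(doc)
```

A map viewer that loads the resulting file can show each record at its location while the URI links back to the reference entity.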
FLOD as Master Data Management
MDM
FLOD
Principles
Features
• Managing Multiple Vocabularies/Classifications and their Cross Mappings
• Multilingual Services
• Import/Export Routines Supporting Multiple Formats
• Integration with Existing Tools and Systems through open APIs/Web services
• Governance (data ownership and update workflows)
Benefits
Master Data Management is a wide area of technological investigation in FAO to identify a toolkit that enables the management/maintenance of reference data at corporate level.
The FLOD project inherits the principles and benefits of MDM, and develops MDM features by adopting semantic technologies.
Vocabularies and their Cross Mappings
Objectives
• Align reference terminologies of fishing gears, vessels, marine species and fishing areas.
• Link individuals of fishing gears, vessels, marine species, fishing and administrative sea areas, geo-political territories, legal and governmental entities.
Challenges
• Evolve from a hierarchical structure to a data model design enabling accurate alignment and linking capabilities with respect to the heterogeneous granularity of the classifications.
Current status
• FLOD is designed by architecting modules of part-whole, constituency, collection, and other reusable ontology engineering patterns.
• Ingestion workflows are in place from structured sources of terminologies and relationships to generate linked datasets.
Multilingual Services
Objectives
• Associate the name(s) of the reference terminologies with the variety of lexicalizations in local usage.
• Track the evolution of lexicalizations over time.
Challenges
• A descriptive semantic model for lexicalizations that supports selecting the appropriate name(s) for the user's information context.
Current status
• FLOD implements the widely known rdfs:label property with language metadata; it defers information on further lexicalizations or name changes to the source system from which the names originate.
Import/Export Routines
Objectives
• Streamline workflows of data import and conversion to RDF.
• Record and store versions of imported data.
• Support selective maintenance operations targeting specific datasets.
Challenges
• A framework for ETL operations for system administrators with little or no knowledge of linked datasets.
• Keep data sources and linked datasets synchronized on a regular basis, with control over versioning.
Current status
• Semi-automatic processes read reference terminologies and cast them into datasets through FLOD ontology modules.
• Each dataset receives a timestamp as explicit metadata at its creation.
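The import step can be sketched as casting code list rows into RDF statements and stamping the dataset with its creation time. The example below emits N-Triples lines; the base namespace is hypothetical, and the choice of the Dublin Core `created` property for the timestamp is an assumption, as the slides do not name the property FLOD uses.

```python
from datetime import datetime, timezone

# Hypothetical namespace; FLOD's real URIs are not given in the slides.
BASE = "http://example.org/flod/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumption: the dataset timestamp is recorded with dcterms:created.
DCT_CREATED = "http://purl.org/dc/terms/created"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

def rows_to_ntriples(dataset, rows, created=None):
    """Turn (code, label, lang) rows into N-Triples lines plus a dataset timestamp."""
    created = created or datetime.now(timezone.utc)
    lines = [
        f'<{BASE}{dataset}> <{DCT_CREATED}> "{created.isoformat()}"^^<{XSD_DATETIME}> .'
    ]
    for code, label, lang in rows:
        # Escape backslashes and double quotes inside the literal.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{BASE}{dataset}/{code}> <{RDFS_LABEL}> "{escaped}"@{lang} .')
    return lines

# The gear code and label come from the slides; the date is an arbitrary example.
stamp = datetime(2015, 1, 27, tzinfo=timezone.utc)
for line in rows_to_ntriples("gear", [("01.1.0", "purse seines", "en")], created=stamp):
    print(line)
```

Keeping the timestamp on the dataset rather than on each term makes it possible to store several imported versions side by side.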
Integration with Existing Tools and Systems
Objectives
• Homogeneously search remote reference terminologies hosted in a combination of DBs and KBs.
• Empower search engines with query expansion based on the knowledge available in FLOD.
• Expose knowledge base content through an RDF-agnostic API.
Challenges
• Define a top-level domain ontology that contains the reference super concepts to address the reference terminologies.
• Implement a mechanism to cast user terms as individuals or concepts in the top-level ontology.
Current status
• An implementation of OpenSearch with a semantic extension is being prepared to enable query services on top of the FLOD SPARQL endpoint.
• The FLOD portal includes spelling support to convert user query terms into references to FLOD individuals.
Governance
Objectives
• Model the role and activity of the data providers with explicit reference to owner, publisher, rights holder, right to update, and system of provenance.
• Inject the roles and activities of governance into the Import/Export maintenance routines.
Challenges
• Identify licensing schemas that can drive the modeling activity.
Current status
• Round tables on governance have been started in contexts where the data providers participate actively.
Conclusions
FLOD represents a source of harmonized reference data and controlled terms for applications of statistics and marine science.
FLOD is consumed by applications that need to aggregate either data instances or document resources relevant to users' needs (e.g. search or context).
The FLOD approach to maintaining linked reference datasets leads to a decentralized maintenance effort, under the responsibility of each respective data owner.
The FLOD roadmap is at a point where a robust maintenance framework and a scalable operational infrastructure are recommended.
Where the FLOD approach to maintaining code lists and controlled terms proves successful, it can provide good practice and recommendations to corporate Master Data Management.
Acknowledgements
The iMarine project supports the development of:
• FLOD Web Portal
• SPREAD geospatial reallocation engine
• FLOD content enricher
iMarine: http://www.i-marine.eu/