agINFRA: A data infrastructure to support agricultural scientific communities
Andreas Drakos, University of Alcala
EGI-APARSEN workshop, Amsterdam, 4-6 March 2014
Our project
In agINFRA we will:
share agricultural research…
…over a data e-infrastructure
Agricultural research data
• Primary data:
  – Structured, e.g. datasets as tables
  – Digitized: images, videos, etc.
• Secondary data (elaborations, e.g. a dendrogram)
• Provenance information, incl. authors, their organizations and projects
• Methods and procedures followed
• Reports, including papers
• Secondary documents, e.g. training resources
• Metadata about the above
• Social data, tags, ratings, etc.
agINFRA values: scientific data must be
A | Open | Must be open and interlinked | NOT subject to barriers, based on standard formats, and avoiding data silos caused by lack of interrelatedness and ad-hoc APIs.
B | Meaningful | Must be meaningful through explicit semantics | Reusing the semantics already provided in mature terminologies and ontologies that are exposed and interlinked through the Web.
C | Reliable | Must be reliable, traceable and accessible | Any kind of research object can be stored in the data infrastructure, and there are NO barriers to expressing relations between these objects to capture the context of research activities.
D | Actionable | Must be actionable via services that empower research | Data is not useful without flexible and adaptable services that allow researchers to act on the data in the ways they need.
There is a lot of data
[Figure: content provider paths to registering as a data source]
• Content provider with CMS that supports sharing (e.g. OAI-PMH, RSS, ...): register as data source
• Content provider with CMS that does not support sharing (e.g. proprietary DB): (meta)data export in proprietary format & mapping to a known one, ingestion in sharing-compliant tool, register as data source
• Content provider with unorganised collection (e.g. listed at Web site or in DVD-ROM): chooses sharing-compliant tool, register as data source
Diagram labels: hosted over agINFRA, computed over agINFRA
[Figure: from content providers to the (meta)data aggregator]
• Content providers share (meta)data, e.g. through OAI-PMH
• Indexed & available through CIARD RING
• (Meta)data aggregator
Diagram labels: computed over agINFRA, hosted over agINFRA, served through agINFRA
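How an aggregator might read what a provider shares through OAI-PMH can be sketched as follows. This is a minimal illustration and not agINFRA's harvester: the base URL, the sample response and the function names are assumptions made for the example. Only the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH 2.0 protocol and Dublin Core.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Soil survey dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is exclusive: it replaces every argument except verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [t.text for t in rec.iter(f"{{{DC_NS}}}title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    # An empty or missing token means the list is complete.
    return records, (token or None)


records, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL over HTTP and repeat until no resumption token is returned. The network call is left out here so the sketch stays self-contained.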
[Figure: diagram labels only: computed over agINFRA, hosted over agINFRA…]
Actors over the infrastructure
[Figure: agINFRA architecture diagram. Components: Registry of Datasets and APIs; Productivity Tools; Registry of vocabularies and tools; LOD Vocabularies; agINFRA RDF vocabularies; agINFRA LOD KOSs; data sources; collections; APIs; Information services; Grid jobs; Grid workflows; Public REST APIs; Cloud / SaaS tools]
Actors over the infrastructure
• Data providers
• Information systems providers
• Researchers
• Policy makers
• Developers
• Taxonomists
[Figure: the actors placed around the agINFRA architecture diagram (registries, vocabularies, information services, Grid jobs and workflows, public REST APIs, Cloud / SaaS tools)]
An existing data community
• CIARD: a global community movement to make agricultural research information and knowledge publicly accessible to all
  – http://www.ciard.net
agINFRA 2nd Review Meeting, 13th of December 2013
A core registry service
• CIARD RING (Routemap to Information Nodes and Gateways)
  – global registry to give access to any kind of information sources pertaining to agricultural research for development
  – principal tool created through CIARD to allow information providers to register their services in various categories and facilitate discovery of sources of agriculture-related information across the world
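The registry idea (providers register services under categories, others discover them) can be illustrated with a small in-memory sketch. It does not reflect the CIARD RING data model or API: the `DataSource` fields, the `Registry` class and all example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """One registered service (hypothetical fields, chosen for illustration)."""
    name: str
    endpoint: str
    category: str   # e.g. "document repository", "learning repository"
    protocol: str   # e.g. "OAI-PMH", "RSS"


class Registry:
    """Toy registry: register services, then discover them by category or protocol."""

    def __init__(self):
        self._sources = {}

    def register(self, source):
        # Keyed by endpoint, so registering the same endpoint again updates the entry.
        self._sources[source.endpoint] = source

    def find(self, category=None, protocol=None):
        hits = [
            s for s in self._sources.values()
            if (category is None or s.category == category)
            and (protocol is None or s.protocol == protocol)
        ]
        return sorted(hits, key=lambda s: s.name)


registry = Registry()
registry.register(DataSource("Example repository", "http://example.org/oai",
                             "document repository", "OAI-PMH"))
registry.register(DataSource("Example courses", "http://example.org/feed",
                             "learning repository", "RSS"))
oai_sources = [s.name for s in registry.find(protocol="OAI-PMH")]
```

Keying entries by endpoint is one possible design choice. It keeps repeated registration of the same service from creating duplicates.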
New agINFRA RING
RING data registry usage scenario 1
• Data aggregators registering their data providers to CIARD RING
  – asking directly to be registered there (AGRIS)
  – federating own smaller registries (GLN)
RING data registry usage scenario 2
• New data providers using agINFRA cloud tools can be automatically registered to CIARD RING
  – cloud-hosted AgriDrupal or AgriOceanDSpace instances for document repositories
  – cloud-hosted agLR instances for learning repositories
• agINFRA Cloud hosting services
  – In collaboration with other cloud communities (e.g. OKEANOS/GRNET)
  – In collaboration with CHAIN-REDS project etc.
Data provider scenario 1
• Data provider in need of hosting & storage of small-scale CMS
• Use a cloud-hosted CMS: sets up own CMS instance
[Figure: agINFRA architecture diagram (registries, vocabularies, information services, Grid jobs and workflows, public REST APIs, Cloud / SaaS tools)]
Data provider scenario 2
• Data provider in need of large-scale hosting & replication CMS
• Requests space/accounts in large-scale CMS
[Figure: agINFRA architecture diagram (registries, vocabularies, information services, Grid jobs and workflows, public REST APIs, Cloud / SaaS tools)]
A semantic backbone for agINFRA
• to help all data providers declare, publish & link their metadata properties and value spaces
  – Publishing their KOSs using VocBench and their metadata vocabularies using Neologism
  – Linking them to existing vocabularies, e.g. AGROVOC for KOSs, Dublin Core for metadata
• guidelines & tools to support data providers in adopting such a LOD framework
  – e.g. LODE-BD recommendations
• to provide an entry point to existing relevant vocabularies
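Linking a local KOS concept to an existing vocabulary can be sketched as a handful of RDF statements. The snippet below emits N-Triples using SKOS properties. The local concept URI is hypothetical, the AGROVOC identifier is used for illustration only and should be checked against AGROVOC itself, and the helper names are assumptions. Tools such as VocBench would normally produce this kind of output.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def concept_triples(concept_uri, label, lang, exact_match_uri):
    """Describe one local concept and link it to an equivalent external concept."""
    return [
        f"<{concept_uri}> <{RDF_TYPE}> <{SKOS}Concept> .",
        f'<{concept_uri}> <{SKOS}prefLabel> "{escape_literal(label)}"@{lang} .',
        f"<{concept_uri}> <{SKOS}exactMatch> <{exact_match_uri}> .",
    ]


triples = concept_triples(
    "http://example.org/kos/maize",             # hypothetical local concept
    "Maize",
    "en",
    "http://aims.fao.org/aos/agrovoc/c_12332",  # illustrative AGROVOC concept URI
)
print("\n".join(triples))
```

Once such statements are published, any other system can refer to the local concept by its URI and follow the `skos:exactMatch` link to the shared vocabulary.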
Exposing to the e-infrastructure scenario
• Data provider hosting CMS at own or external/commercial infrastructure
• Interested to expose (meta)data to e-infrastructure
[Figure: agINFRA architecture diagram (registries, vocabularies, information services, Grid jobs and workflows, public REST APIs, Cloud / SaaS tools)]
agINFRA LOD layer usage scenario 1
• A data owner wants to share their data as Linked Data
• The data owner uses non-LOD vocabularies and KOSs and wants to publish them as LOD and link them to existing vocabularies
• agINFRA offers tools for publishing vocabularies and KOSs
Once the vocabularies are published, all metadata and all concepts have URIs and can be referenced by any other system
agINFRA LOD layer usage scenario 2
• Once KOSs are published, all metadata and all concepts have URIs and can be referenced by any other system
• Data aggregators like AGRIS and GLN can create mash-ups between their core data and other agricultural data types (e.g. germplasm, soil maps, statistics, …) by using the LOD semantic backbone as a crosswalk between metadata formalizations and concepts in different vocabularies
Example: LOD-based mash-ups in AGRIS
[Figure: AGRIS bibliographic metadata (topic, thematic metadata, geographic metadata, scientific names, journal) linked to external sources (AGRIS Journals RDF store, DBpedia, FAO Country Profiles, FAO Fisheries, WorldBank indicators by country) to obtain info on journal, info on topic, info on country, specific indicators on country, and info on species]
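The crosswalk idea behind such mash-ups can be sketched with plain dictionaries: a record's field values are mapped to shared URIs, and those URIs are used to look up information held elsewhere. Everything below is hypothetical (the URIs, the field names, the external entries and the `enrich` function). It shows only the join and not how AGRIS implements it.

```python
# Hypothetical crosswalk from (field, local value) to a shared LOD URI.
CROSSWALK = {
    ("country", "Kenya"): "http://example.org/lod/country/KE",
    ("species", "Oreochromis niloticus"): "http://example.org/lod/species/nile-tilapia",
}

# Hypothetical external datasets, keyed by the same shared URIs.
EXTERNAL = {
    "http://example.org/lod/country/KE": [
        {"source": "country profiles", "kind": "info on country"},
        {"source": "indicators by country", "kind": "specific indicators on country"},
    ],
    "http://example.org/lod/species/nile-tilapia": [
        {"source": "fisheries data", "kind": "info on species"},
    ],
}


def enrich(record, crosswalk, external, fields=("country", "species")):
    """Attach external information reachable through the crosswalk to a record."""
    enriched = dict(record, related=[])
    for field in fields:
        uri = crosswalk.get((field, record.get(field)))
        if uri is None:
            continue  # no shared concept known for this value
        for item in external.get(uri, []):
            enriched["related"].append({"field": field, "uri": uri, **item})
    return enriched


record = {"title": "Tilapia farming in Kenya", "country": "Kenya",
          "species": "Oreochromis niloticus"}
mashup = enrich(record, CROSSWALK, EXTERNAL)
```

In a Linked Data setting the dictionaries would be replaced by published vocabularies and queries against remote stores. The shared URI plays the same role as the join key here.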
Workflow architecture
[Figure: metadata aggregation workflow, to be ported on the Grid]
Components: Ariadne harvester; Filtering component; Identification and de-duplication component (get unique ID); Transformation component (store metadata in JSON); Link checking component; PostProcessing/Enrichment component
Stores and outputs: File system (DC, IEEE LOM, MODS XML); MySQL; File system (XMLs); Duplicates; Records with Broken Links
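The stages named in the workflow (filtering, identification and de-duplication, transformation to JSON, link checking) can be chained as in the sketch below. It is a simplified illustration under stated assumptions, not the agINFRA components. The record fields and the hash-based ID are assumptions, and the link checker is passed in as a function so that no network access is needed.

```python
import hashlib
import json


def record_id(record):
    """Derive a stable ID from the normalised title and URL (an assumed key)."""
    key = record["title"].strip().lower() + "|" + record["url"].strip()
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def run_pipeline(records, is_reachable):
    """Return (kept_json, duplicates, broken) for a list of harvested records."""
    seen = set()
    kept, duplicates, broken = [], [], []
    for rec in records:
        # Filtering: skip records that lack the mandatory fields.
        if not rec.get("title") or not rec.get("url"):
            continue
        # Identification and de-duplication: one unique ID per distinct record.
        rid = record_id(rec)
        if rid in seen:
            duplicates.append(rec)
            continue
        seen.add(rid)
        # Transformation: attach the ID to the record before serialising.
        doc = {"id": rid, **rec}
        # Link checking: set aside records whose URL does not resolve.
        if not is_reachable(rec["url"]):
            broken.append(doc)
            continue
        kept.append(json.dumps(doc, sort_keys=True))
    return kept, duplicates, broken


harvested = [
    {"title": "Maize yields", "url": "http://example.org/a"},
    {"title": "maize yields ", "url": "http://example.org/a"},   # duplicate
    {"title": "", "url": "http://example.org/b"},                # filtered out
    {"title": "Soil map", "url": "http://example.org/broken"},   # broken link
]
# Stub checker for the example; a real one would issue an HTTP request.
kept, duplicates, broken = run_pipeline(harvested, lambda url: "broken" not in url)
```

Because each record is processed independently apart from the set of seen IDs, the per-record work could be split into batches. That is one reason a workflow of this shape is a candidate for Grid jobs.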
Thank you!
Questions