D4Science: An e-Infrastructure for Facilitating Fisheries and Aquaculture Resource Management
Pasquale Pagano
National Research Council of Italy
22nd International CODATA, 24-27 October 2010
Cape Town (South Africa)
www.d4science.eu
www.d4science.eu | D4Science | 22nd International CODATA, Cape Town, 24-27 October 2010
Assumptions
Consolidated facts:
Very rich applications and data collections are currently maintained by a multitude of authoritative providers
Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, …
Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), …
Several standards are adopted in the same domain
Societal observations
• A rich variety of protocols, models, and formats
  • creates barriers to the usage of resources
  • dramatically delays new exploitation patterns
Technical observations
• Heterogeneity of protocols, models, and formats increases load
• Load increases failures
D4Science Vision
D4Science objectives:
hide heterogeneity, i.e. abstract over differences in location, protocol, and model;
embrace heterogeneity, i.e. allow for multiple locations, protocols, and models;
Technical goals
no bottlenecks: scale no less than the interfaced resources
no outages: keep failures partial and temporary
autonomicity: system reacts and recovers
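The goals "hide heterogeneity" and "keep failures partial and temporary" can be illustrated with a small sketch. This is not gCube code: the `Provider` type, the `federated_search` function, and the provider names are hypothetical, chosen only to show one caller-facing interface over several sources where a failing source reduces the result instead of aborting the request.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical provider type: any callable that returns records or raises.
Provider = Callable[[str], List[str]]


def federated_search(
    query: str, providers: Dict[str, Provider]
) -> Tuple[List[str], List[str]]:
    """Query every provider behind one interface.

    A failing provider is recorded and skipped, so the failure stays partial.
    """
    results: List[str] = []
    failed: List[str] = []
    for name, search in providers.items():
        try:
            results.extend(search(query))
        except Exception:
            # Remember the provider so a later retry can make the failure temporary.
            failed.append(name)
    return results, failed


def unreachable(query: str) -> List[str]:
    """Simulates a provider that is currently down."""
    raise ConnectionError("provider unreachable")


providers: Dict[str, Provider] = {
    "repo_a": lambda q: [f"a:{q}"],
    "repo_b": unreachable,
    "repo_c": lambda q: [f"c:{q}"],
}

results, failed = federated_search("tuna", providers)
print(results)  # records from the providers that answered
print(failed)   # names of the providers that did not
```

The list of failed providers is returned rather than hidden, so a caller (or a monitoring component) can decide when to retry them.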
From a testbed to a production ecosystem
[Timeline figure: Oct. ’04, Nov. ’07, Jan. ’08, Oct. ’09, Dec. ’09, Sept. ’11]
From a testbed to a production ecosystem
[Figure: functionality (vertical axis) against the timeline Oct. ’04, Nov. ’07, Jan. ’08, Oct. ’09, Dec. ’09, Sept. ’11; labels: gLite, gCube]
Infrastructure Exploitation
Production: more than 500 autonomic Web Services
Nodes
• 30 Nodes: CNR, NKUA, ESA, FAO, UNIBASEL
• 29 Nodes: CNR, NKUA, FAO, UNIBASEL
Collections
• 25 Data: EEA, MERIS, AATSR; 69 Metadata: es, ISO19115, eiDB
• 15 Data: AquaMaps, Fact Sheets, Country Maps; 28 Metadata: FARM_dc, aquamaps
Functionality
• Integration with gPod
• Geographical and text search
• Search by metadata
• Personal workspace
• Object annotation
• Report generation
• Map generation
• Time series management
gCube as a Digital Library System
"A Digital Library System is a possibly distributed system that collects, manages and preserves for the long term rich digital content, and offers to its user communities specialised functionality on that content, of measurable quality and according to codified policies"
[The Digital Library Reference Model]
The gCube data infrastructure enabling framework provides DL functionality by:
• Federating existing digital content maintained in a variety of tailored repository systems
• Supporting the generation of new digital content by exploiting heterogeneous computational platforms
• Providing discovery and access capabilities on diversely described and modeled digital content
gCube as an e-Infrastructure ecosystem enabling framework
By bridging a number of well-established systems and standards from various domains, including high-energy physics, biodiversity, and fishery and aquaculture resources management, gCube realises an e-Infrastructure ecosystem.
How does it work?
Why is sharing through VREs key?
Through the VRE, groups of users have controlled access to distributed data and services integrated under a personalised interface.
Why is sharing through VREs key?
A Virtual Research Environment (VRE) supports cooperative activities:
• Metadata cleaning, enrichment, and transformation by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies
• Process refinement and showcase implementation (restricted to a set of users)
• Data assessment (required to make data publicly exploitable by VO members)
• Expert-user validation of products generated through data elaboration or simulation
Why is sharing through VREs key?
The integrated environment of a VRE makes available a set of functionality to support and perform research activities:
• the ability to integrate heterogeneous data and services
• the ability to process information on demand, ingesting the results
• the ability to share data and processes with other users
• the ability to customize collections of information
• the ability to store user actions and exploit them for further use
• the ability to aggregate relevant information into ad-hoc information sources and keep them updated
Building Virtual Research Environments
VRE Facilities
• A virtual desktop to organize the working environment (Workspace)
• A virtual live document to describe research results (Report Management)
• Tools supporting specific tasks (Species Maps Generation, Time Series Management)
Services shown in the figure: Search, Annotation, Visualisation, Storage, Transformation, …
Workspace
A collaboration-oriented suite providing:
• seamless access and organisation facilities on a rich array of objects (e.g. Information Objects, Queries, Files, Templates)
• mediation between external world objects, systems and infrastructures (import/export/publishing)
• support for common file manager features (drag & drop, contextual menu)
• support for an effective rich object sharing facility
Species Distribution Maps Generation
AquaMaps is an application*
• tailored to predict global distributions of marine species
• initially designed for marine mammals and subsequently generalised to marine species
• that generates color-coded species range maps using a grid of half-degree latitude and longitude blocks
• by interfacing several databases and repository providers
* Algorithm by Kaschner et al. 2006
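The half-degree blocks mentioned above can be pictured as a regular global grid. The sketch below maps a coordinate to a grid cell; the row and column numbering (row 0 at the northern edge, column 0 at 180° W) is an assumption made for illustration and is not the cell coding used by AquaMaps itself.

```python
CELL_SIZE = 0.5                   # degrees per cell side (half-degree blocks)
N_ROWS = int(180 / CELL_SIZE)     # 360 rows from north to south
N_COLS = int(360 / CELL_SIZE)     # 720 columns from west to east


def cell_index(lat: float, lon: float) -> tuple:
    """Return (row, col) of the half-degree cell containing a coordinate.

    Hypothetical convention: row 0 touches the North Pole, column 0 starts
    at longitude -180. Points on the last edge fall into the last cell.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    row = min(int((90.0 - lat) / CELL_SIZE), N_ROWS - 1)
    col = min(int((lon + 180.0) / CELL_SIZE), N_COLS - 1)
    return row, col


# Total number of cells on the globe, land and sea together.
total_cells = N_ROWS * N_COLS

# A point just off Cape Town (approximate coordinates).
cape_town_cell = cell_index(-33.9, 18.4)
```

A map is then a value per cell, which is why the computation cost grows with the number of cells multiplied by the number of species.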
Species Distribution Maps Generation
AquaMaps execution is based on the gCube Ecological Niche Modelling Suite, which allows the extrapolation of known species occurrences
◦ to determine environmental envelopes (species tolerances)
◦ to predict future distributions by matching species tolerances against local environmental conditions (e.g. climate change and sea pollution)
Very large volume of input and output data: HSPEC native range 56,468,301; HSPEC suitable range 114,989,360
Very large number of computations: one multispecies map computed on 6,188 half-degree cells (over 170k) and 2,540 species requires 125 million computations (Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center)
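Matching species tolerances against local conditions can be sketched with a trapezoidal envelope per environmental parameter, a shape commonly described for this family of models. The code is an illustration only, not the gCube Ecological Niche Modelling Suite: the `Envelope` class, the parameter names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Envelope:
    """Tolerance of a species for one environmental parameter."""
    minimum: float
    preferred_min: float
    preferred_max: float
    maximum: float

    def suitability(self, value: float) -> float:
        """0 outside the tolerated range, 1 in the preferred range, linear between."""
        if value < self.minimum or value > self.maximum:
            return 0.0
        if value < self.preferred_min:
            return (value - self.minimum) / (self.preferred_min - self.minimum)
        if value > self.preferred_max:
            return (self.maximum - value) / (self.maximum - self.preferred_max)
        return 1.0


def cell_suitability(envelopes: Dict[str, Envelope], conditions: Dict[str, float]) -> float:
    """Combine per-parameter scores for one cell by multiplying them."""
    score = 1.0
    for name, envelope in envelopes.items():
        score *= envelope.suitability(conditions[name])
    return score


# Made-up envelopes for an illustrative species.
species = {
    "temperature_c": Envelope(10.0, 18.0, 26.0, 30.0),
    "salinity_psu": Envelope(30.0, 33.0, 36.0, 38.0),
}

warm_cell = {"temperature_c": 22.0, "salinity_psu": 35.0}  # inside preferred range
edge_cell = {"temperature_c": 14.0, "salinity_psu": 35.0}  # tolerated, not preferred
cold_cell = {"temperature_c": 5.0, "salinity_psu": 35.0}   # outside tolerance

scores = [cell_suitability(species, c) for c in (warm_cell, edge_cell, cold_cell)]
```

Running one such evaluation per cell and per species is what makes the workload large enough to benefit from distributed execution.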
Time Series Management
Offers a set of tools to manage capture statistics
• Supports the complete TS lifecycle
• Supports validation, curation, and analysis
• Provides support for data reallocation
• Produces uniform data sets
Time Series
Offers a set of tools to operate on capture statistics
• Support for multiple key families
• Filtering, grouping, and aggregation
• Union
• Mining
Automatically produces provenance information
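The operations listed above (filtering, grouping, aggregation, union) and the automatic provenance trail can be sketched as follows. This is a toy model, not the gCube Time Series API: the `TimeSeries` class, its method names, and the capture figures are all invented for illustration.

```python
from typing import Dict, List, Optional


class TimeSeries:
    """Rows of capture statistics plus the list of operations that produced them."""

    def __init__(self, rows: List[Dict], provenance: Optional[List[str]] = None):
        self.rows = list(rows)
        self.provenance = list(provenance or [])

    def union(self, other: "TimeSeries") -> "TimeSeries":
        return TimeSeries(
            self.rows + other.rows,
            self.provenance + other.provenance + ["union"],
        )

    def filter(self, **equals) -> "TimeSeries":
        kept = [r for r in self.rows if all(r[k] == v for k, v in equals.items())]
        label = ", ".join(f"{k}={v!r}" for k, v in equals.items())
        return TimeSeries(kept, self.provenance + [f"filter({label})"])

    def aggregate(self, group_by: str, measure: str) -> "TimeSeries":
        """Group rows by one key and sum one measure per group."""
        totals: Dict = {}
        for row in self.rows:
            totals[row[group_by]] = totals.get(row[group_by], 0.0) + row[measure]
        rows = [{group_by: key, measure: total} for key, total in totals.items()]
        step = f"aggregate(by={group_by}, sum={measure})"
        return TimeSeries(rows, self.provenance + [step])


# Made-up capture statistics from two hypothetical sources.
series_a = TimeSeries(
    [
        {"country": "ZA", "species": "hake", "year": 2008, "tonnes": 120.0},
        {"country": "ZA", "species": "sardine", "year": 2008, "tonnes": 80.0},
    ],
    ["load(source_a)"],
)
series_b = TimeSeries(
    [
        {"country": "ZA", "species": "hake", "year": 2009, "tonnes": 130.0},
        {"country": "NA", "species": "hake", "year": 2009, "tonnes": 50.0},
    ],
    ["load(source_b)"],
)

result = series_a.union(series_b).filter(country="ZA").aggregate("species", "tonnes")
```

Because every operation returns a new series with an extended trail, `result.provenance` records how the final figures were derived without any extra work by the user.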
Report Management
A collaboration-oriented suite providing:
• template-oriented, feature-rich and flexible document format definition
• effective and infrastructure-integrated report compilation (drag & drop workspace items)
• collaborative and distributed editing (workspace based)
• standard-based report materialisation (HTML, OpenXML)
VREs, Workspaces and Reports in Action
gCube and Humanities: the gMan case
JISC - King’s College London
• Look at new ways of integrating existing data resources for Classics and add services so that research work based on integrated resources can be published
Data sources
• The Heidelberger Gesamtverzeichnis (HGV) der griechischen Papyrusurkunden Aegyptens, a collection of metadata records for 55,000 Greek papyri from Egypt.
• Projet Volterra, a database of Roman legal texts, and associated metadata, from various sources (epigraphic, papyrological, or literary), currently in the low tens of thousands but very much in progress.
• The Inscriptions of Aphrodisias (InsAph), a corpus of about 2,000 ancient Greek inscriptions from the Roman city of Aphrodisias in Asia Minor, including transcribed texts and metadata marked up using EpiDoc TEI, as well as images of the physical objects.
Main functionality
• cross-collection search
• workspace
• annotation
• report creation
Early results in “AHM 2009 Phil. Trans. A special issue”
VRE Summary
D4Science approach:
• Heterogeneous resources are accessible in a common ecosystem of resources
  • regardless of their locations, technologies, and protocols
• Different communities have access to different views
  • according to the conditions under which the sharing can occur
• Each community can define its own virtual research environment to satisfy specific needs
  • for a limited timeframe and at no cost for the providers of the resources
• Several virtual research environments can coexist
  • without interfering with each other, even when competing for the same resources
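The idea that a VRE is a view over shared resources, with access controlled per community, can be sketched as a small data model. The `VRE` class, the user names, and the resource identifiers are hypothetical and do not correspond to gCube types.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class VRE:
    """A view: it references shared resources, it does not copy them."""
    name: str
    members: FrozenSet[str]
    resources: FrozenSet[str]

    def can_access(self, user: str, resource: str) -> bool:
        # Access requires membership and that the resource is part of this view.
        return user in self.members and resource in self.resources


# Two coexisting VREs defined over one pool of resources (all names invented).
species_maps = VRE(
    name="species-maps",
    members=frozenset({"alice", "bob"}),
    resources=frozenset({"occurrence-db", "map-generator"}),
)
capture_stats = VRE(
    name="capture-statistics",
    members=frozenset({"carol"}),
    resources=frozenset({"occurrence-db", "time-series-store"}),
)

# The same resource appears in both views without either VRE owning it.
shared = species_maps.resources & capture_stats.resources
```

Retiring a VRE in this model only discards the view, which matches the point that environments exist for a limited timeframe at no cost to the resource providers.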
Conclusions
Facts
Very rich services and data collections are currently maintained by a multitude of authoritative providers
Several standards are adopted in the same domain
Interoperability approaches are key to exploit such richness
D4Science offers a variety of patterns, tools, and solutions
to interconnect
• Heterogeneous digital content
• Heterogeneous repository systems
• Heterogeneous computation platforms
with a rich set of free-to-use tailored services
• to decrease the cost of adoption
• to reduce the time to market of new ideas
• to deal with the plethora of standards
Supported Standards
• WS-*, WSRF, WS-BPEL
• JDL, JSDL, Glue Schema (part)
• X-*
• DC, TEI, ISO etc.
• JSR (several)
• GSI-Security, XACML, SAML
• OpenSearch
• OGC related
Comply with: OAI-PMH, OAI-ORE
Supported Standards
WSRF Specifications:
• WS-ResourceProperties (WSRF-RP)
• WS-ResourceLifetime (WSRF-RL)
• WS-ServiceGroup (WSRF-SG)
• WS-BaseFaults (WSRF-BF)
JSR:
• 168: Simple Portlets
• 286: 168 update
• 160: JMX
WSN Specifications:
• WS-BaseNotification
• WS-Topics
• (WS-BrokeredNotification)
• …
WS-* Standards:
• SOAP
• WSDL
• WS-Addressing
• …
ISO:
• ISO3166 countries
• ISO4217 currencies
• ISO19115 geo-location
• …
X-*:
• XML
• XSD
• XSL
• XSLT
• XPath
• XQuery
OGC:
• Web Coverage Processing Service
• Web Coverage Service
• Web Feature Service
• Web Map Context
• Web Map Service
• Web Map Tile Service
• Web Processing Service
• Web Service Common
OGF Standard:
• Glue Schema (2)
…
Comply with: OAI-PMH, OAI-ORE
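As an illustration of what compliance with OAI-PMH means for a harvesting client, the sketch below builds a `ListRecords` request. The verb and the `metadataPrefix`, `set`, and `from` arguments are part of the OAI-PMH protocol; the base URL is a placeholder, and the helper name `list_records_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec        # selective harvesting by set
    if from_date is not None:
        params["from"] = from_date      # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not a real D4Science or gCube address.
url = list_records_url("https://repository.example.org/oai", from_date="2010-10-24")
print(url)
```

Any repository exposing this protocol answers the same request shape, which is what lets a federating infrastructure harvest many providers with one client.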
Find us
www.gcube-system.org
www.d4science.eu
Donatella Castelli, D4Science-II Project [email protected]
Pasquale Pagano, D4Science-II Technical [email protected]
Thank You For Your Attention