TRANSCRIPT
www.eoscpilot.eu
The European Open Science Cloud for Research pilot project is funded by the
European Commission, DG Research & Innovation under contract no. 739563
Chairs:
Hermann Lederer (MPCDF) and Volker Beckmann (CNRS)
Panelists
Carlos Oscar Sorzano (CNB, CSIC, ES) - CryoEM
Werner Kutsch (ICOS RI) – ENVRI/ERFI
Andreas Rietbrock (U. Liverpool) - EPOS/VERCE
Erik van den Bergh (EMBL) - EGA Life Sciences Datasets
Rob van der Meer (Astron) - LOFAR data
Parallel Session - Room 211/212
Value and Challenges of Federated Open Science
Carlos Oscar S. Sorzano
Instruct Image Processing Center
http://i2pc.es
Science Demonstrator: Cryo-Electron Microscopy
Federated Open Science Challenges
Solution: Reproducible JSON
[
    {
        "object.className": "ProtImportMicrographs",
        "object.id": "1",
        "filesPath": "/data/movie_?????.mrc"
    },
    {
        "object.className": "ProtPreprocessMicrographs",
        "object.id": "2",
        "inputMicrographs": "1.outputMicrographs",
        "doDownsample": true,
        "downsamplingFactor": 2
    },
    {
        "object.className": "ProtEstimateCTF",
        "object.id": "3",
        "inputMicrographs": "2.outputMicrographs",
        "minDefocus": 0.5,
        "maxDefocus": 4
    }
]
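Each step in the JSON workflow points at its input through a reference of the form "<id>.<outputName>". As a minimal sketch of how such a description could be read back and ordered for re-execution, the snippet below parses the three steps and sorts them by their references. It is not the loader of any particular package: the helper names `dependencies` and `execution_order`, and the rule used to recognise a reference, are assumptions made for illustration.

```python
import json

# The three-step workflow from the slide, as valid JSON.
WORKFLOW_JSON = """
[
  {"object.className": "ProtImportMicrographs", "object.id": "1",
   "filesPath": "/data/movie_?????.mrc"},
  {"object.className": "ProtPreprocessMicrographs", "object.id": "2",
   "inputMicrographs": "1.outputMicrographs",
   "doDownsample": true, "downsamplingFactor": 2},
  {"object.className": "ProtEstimateCTF", "object.id": "3",
   "inputMicrographs": "2.outputMicrographs",
   "minDefocus": 0.5, "maxDefocus": 4}
]
"""


def dependencies(step, known_ids):
    """Return the ids of steps referenced as '<id>.<outputName>' (assumed convention)."""
    deps = set()
    for key, value in step.items():
        if key.startswith("object.") or not isinstance(value, str):
            continue
        prefix, sep, _ = value.partition(".")
        if sep and prefix in known_ids:
            deps.add(prefix)
    return deps


def execution_order(steps):
    """Sort steps so that each one runs after the steps it reads from."""
    by_id = {s["object.id"]: s for s in steps}
    pending = {sid: dependencies(s, by_id.keys()) for sid, s in by_id.items()}
    order = []
    while pending:
        ready = sorted(sid for sid, deps in pending.items() if deps <= set(order))
        if not ready:
            raise ValueError("cyclic or unresolved step references")
        for sid in ready:
            order.append(sid)
            del pending[sid]
    return [by_id[sid]["object.className"] for sid in order]


steps = json.loads(WORKFLOW_JSON)
order = execution_order(steps)
print(order)
```

Because the ordering is derived from the references rather than from the position in the file, a shuffled copy of the same JSON would yield the same execution order.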
Challenges ahead
Integration challenges
Workflow description:
• Common Workflow Language
• Business Process Execution Language (BPEL)
• Yet Another Workflow Language (YAWL)
• Apache Taverna
• Galaxy
• Knime
• …
Ontology support:
• EDAM ontology
• Ontology coverage
Open science challenges
• Driving force to report process
• Driving force for programs to migrate
Data description:
• Minimum Information (MIBBI, MIAME, MIAPE, …
Science Demonstrator ENVRI/ERFI
DEMONSTRATOR:
Focus on the dynamics of greenhouse gases, aerosols and clouds and their role in radiative
forcing; interoperability between observations and climate modeling; cooperation between
environmental research infrastructures.
Improvement of data integration services based on metadata ontologies; model-data
integration using HPC; petascale data movement; innovative services to compile and
compare model output from different sources, especially semi-automatic spatiotemporal
scale conversion.
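Spatiotemporal scale conversion can be illustrated with its simplest temporal case: averaging sub-daily observations into daily means so they can be compared with model output at daily resolution. This is only a sketch under stated assumptions. The function name `to_daily_means`, the choice of an arithmetic mean, and the sample values are hypothetical and are not taken from the demonstrator.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def to_daily_means(observations):
    """Average (timestamp, value) pairs into one value per calendar day."""
    buckets = defaultdict(list)
    for timestamp, value in observations:
        buckets[timestamp.date()].append(value)
    # Sort by day so the result is in chronological order.
    return {day: mean(values) for day, values in sorted(buckets.items())}


# Hypothetical CO2 mole fractions (ppm) sampled twice a day.
obs = [
    (datetime(2017, 11, 27, 0), 410.0),
    (datetime(2017, 11, 27, 12), 412.0),
    (datetime(2017, 11, 28, 0), 409.0),
    (datetime(2017, 11, 28, 12), 411.0),
]
daily = to_daily_means(obs)
for day, value in daily.items():
    print(day.isoformat(), value)
```

A real conversion would also have to record which aggregation was applied, since that is exactly the information the Reusability challenge asks to be cited alongside the derived dataset.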
FAIR CHALLENGES:
Findability: Matching metadata ontologies between NetCDF-CF and in-situ metadata; data
quality indicators.
Accessibility: Automated access routines between the RI repositories. For fully open data this
is not immediately problematic, but it might require an analysis of the resources and APIs needed.
Interoperability: APIs, service integration, large data transfers, where to do the processing (and how to
document it?).
Reusability: How to cite and persistently identify scale-changed datasets? How to transfer
knowledge of the data versions used?
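The Findability item, matching in-situ metadata against NetCDF-CF, can be pictured as a crosswalk lookup. The sketch below maps station-side variable labels to CF standard names and reports the labels that have no match. The in-situ labels, the crosswalk table and the function `match_variables` are hypothetical. Only the CF standard names on the right-hand side are real vocabulary entries.

```python
# Hypothetical crosswalk from in-situ variable labels to CF standard names.
CROSSWALK = {
    "co2": "mole_fraction_of_carbon_dioxide_in_air",
    "ch4": "mole_fraction_of_methane_in_air",
    "t_air": "air_temperature",
}


def match_variables(in_situ_labels, crosswalk=CROSSWALK):
    """Split labels into those with a CF standard name and those without."""
    matched, unmatched = {}, []
    for label in in_situ_labels:
        key = label.strip().lower()  # tolerate case and whitespace differences
        if key in crosswalk:
            matched[label] = crosswalk[key]
        else:
            unmatched.append(label)
    return matched, unmatched


matched, unmatched = match_variables(["CO2", "ch4", "aerosol_number"])
print(matched)
print(unmatched)
```

The unmatched list is the useful output here: it shows where ontology coverage is missing and a mapping still has to be agreed between the infrastructures.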
ENVRI Radiative Forcing Integration
Organisations & Contacts: Werner Kutsch, Alex Vermeulen (ICOS ERIC), Ari Asmi (ENVRIplus), Paolo Laj (ACTRIS), Stefan Kindermann, IS-ENES2 (DKRZ), Sylvie Joussaume, Sébastien Denvil, IS-ENES2 (IPSL)
Computational Seismology
3D waveform modelling in 3D media
The VERCE platform
Andreas Rietbrock & Federica Magnoni
Alessandro Spinuso, Andre Gemud, Rafiq Saleh, Emanuele Casarotti
EPOS Computational Earth Sciences
Towards Inverse Modeling
Towards inverse modeling: Misfit calculation
[Diagram labels: HPC Server, Cloud / EOSC; Data Download (FDSN); Simulated Synthetics; Data/Synt Processing; Misfit Analysis; Provenance Validation and Monitoring (generates W3C-PROV)]
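The misfit step compares recorded data with simulated synthetics. One common choice, shown here only as an illustration and not necessarily the measure used on the VERCE platform, is the least-squares waveform misfit, half the time-integrated squared difference between the two traces. The function name `l2_misfit` and the sample traces are hypothetical, and both traces are assumed to be already processed to the same sampling interval and length.

```python
def l2_misfit(observed, synthetic, dt):
    """Least-squares waveform misfit: 0.5 * sum((s - d)^2) * dt."""
    if len(observed) != len(synthetic):
        raise ValueError("traces must have the same number of samples")
    return 0.5 * dt * sum((s - d) ** 2 for d, s in zip(observed, synthetic))


# Hypothetical traces sampled every 0.5 s.
observed = [0.0, 1.0, 0.0, -1.0]
synthetic = [0.0, 0.5, 0.0, -0.5]
misfit = l2_misfit(observed, synthetic, dt=0.5)
print(misfit)
```

In an inversion, this scalar is what the model update tries to reduce, which is why the data and the synthetics must pass through identical processing before they are compared.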
Agile Data Intensive Framework
dispel4py: a Python library used to describe abstract workflows for
distributed data-intensive applications.
Support for composition: single components may be defined by
having their own internal workflows.
Workflows described in dispel4py can be automatically executed
in numerous parallel environments.
Docker containers are available, supporting multiple execution environments (MPI,
SharedMemory) and integration with other workflow systems (e.g. Pegasus).
MPI
Deployed on local Clouds (MAP-REDUCE streaming model)
• dispel4py.org
• solution is not supported in OCCI
Supported by EOSC Demonstrator and new EU project DARE
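The ideas of an abstract streaming workflow and of composition, where one component has its own internal workflow, can be sketched in plain Python with generators. This deliberately does not use the dispel4py API. The names `producer`, `square`, `keep_even`, `compose` and `square_then_filter` are invented for the illustration, and the sketch runs sequentially rather than on MPI or shared memory.

```python
def producer(n):
    """Source element: emit the integers 0..n-1 one at a time."""
    for i in range(n):
        yield i


def square(stream):
    """Processing element: square each item as it arrives."""
    for x in stream:
        yield x * x


def keep_even(stream):
    """Processing element: pass on only even items."""
    for x in stream:
        if x % 2 == 0:
            yield x


def compose(source, *elements):
    """Chain processing elements; data flows through them item by item."""
    stream = source
    for element in elements:
        stream = element(stream)
    return stream


def square_then_filter(stream):
    """A composite element defined by its own internal workflow."""
    return compose(stream, square, keep_even)


result = list(compose(producer(6), square_then_filter))
print(result)
```

The workflow description (which elements are connected to which) is kept separate from how it is run, which is the property that lets a framework map the same description onto different parallel environments.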
EGA Life science datasets
A third-party dataset (GoNL project) as use case
Reproduction of the original pipeline
Production of an updated pipeline
Containerized versions of both pipelines using Nextflow
Test both pipelines on the use case dataset
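Testing both pipelines on the same dataset implies some way of comparing their outputs. The sketch below assumes, purely for illustration, that each pipeline yields a list of variant calls keyed by (chromosome, position, reference allele, alternate allele), and it reports how many calls are shared. The slides do not state the output format or the comparison metric, so the function `concordance` and the toy calls are hypothetical.

```python
def concordance(calls_a, calls_b):
    """Summarise the overlap between two sets of calls."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared calls over all distinct calls.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy calls, not real data: (chromosome, position, ref, alt).
original = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")]
updated = [("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"), ("chr2", 900, "T", "C")]
report = concordance(original, updated)
print(report)
```

Running the same comparison on two runs of the reproduced original pipeline would give a baseline for how much variation to expect before any difference is attributed to the pipeline update.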
EOSC pilot LOFAR SD
EOSCpilot Stakeholder meeting
28 November 2017, Brussels
Rob van der Meer, ASTRON
ASTRON is part of the Netherlands Organisation for Scientific Research (NWO)
EOSC Pilot LOFAR SD
Excellent Science
• Reduce and analyse radio astronomy data:
• From antenna signal to visibilities to images
Main Challenges
• Large volume, complex (multi-step analysis)
• Power users need compute at their data because of volume
• Inexperienced users need guidance with parameters and data sets
• Make it work across platforms and data centers
The Prefactor pipeline in CWL
Approaches to solution
• implementation of Common Workflow Language (CWL) based pipelines
• first results on ‘prefactor pipeline’
• make them deployable as Singularity containers.
• run on various systems (data centers)
• Pilot project on SURFsara HPC cloud
• Pilot on beta phase of HTP cluster in February 2018
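To make the CWL and container bullets concrete, here is a rough sketch of what a single containerised step looks like as a CWL v1.0 CommandLineTool. CWL documents may be written as JSON, so the snippet builds one from a Python dictionary. It is not a step of the prefactor pipeline: the `echo` command, the input name `message` and the image `debian:stable-slim` are placeholders chosen for illustration. CWL names the container image through `DockerRequirement`. Some runners can execute such images with Singularity instead of Docker (cwltool, for example, has a `--singularity` option).

```python
import json

# Placeholder tool description; not taken from the prefactor pipeline.
tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": ["echo"],
    # Container image the step should run in (placeholder image name).
    "hints": {"DockerRequirement": {"dockerPull": "debian:stable-slim"}},
    "inputs": {
        "message": {"type": "string", "inputBinding": {"position": 1}},
    },
    # Capture standard output as the step's result file.
    "stdout": "output.txt",
    "outputs": {"out": {"type": "stdout"}},
}

cwl_text = json.dumps(tool, indent=2)
print(cwl_text)
```

Because the description carries both the command and the image it runs in, the same file can be handed to different data centers without reinstalling the software stack at each site.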