parallel session - room 211/212 value and challenges of federated … · docker containers...


Post on 28-May-2020


Page 1: Parallel Session - Room 211/212 Value and Challenges of Federated … · Docker containers available supporting multiple execution environments (MPI, SharedMemory) and the integration

www.eoscpilot.eu

The European Open Science Cloud for Research pilot project is funded by the European Commission, DG Research & Innovation under contract no. 739563

Chairs:

Hermann Lederer (MPCDF) and Volker Beckmann (CNRS)

Panelists

Carlos Oscar Sorzano (CNB, CSIC, ES) - CryoEM

Werner Kutsch (ICOS RI) - ENVRI/ERFI

Andreas Rietbrock (U. Liverpool) - EPOS/VERCE

Erik van den Bergh (EMBL) - EGA Life Sciences Datasets

Rob van der Meer (Astron) - LOFAR data

Parallel Session - Room 211/212

Value and Challenges of Federated Open Science


Carlos Oscar S. Sorzano

Instruct Image Processing Center

http://i2pc.es

Science Demonstrator: Cryo-Electron Microscopy

Federated Open Science Challenges


Public data repositories


Solution: Reproducible JSON

[
  {
    "object.className": "ProtImportMicrographs",
    "object.id": "1",
    "filesPath": "/data/movie_?????.mrc"
  },
  {
    "object.className": "ProtPreprocessMicrographs",
    "object.id": "2",
    "inputMicrographs": "1.outputMicrographs",
    "doDownsample": true,
    "downsamplingFactor": 2
  },
  {
    "object.className": "ProtEstimateCTF",
    "object.id": "3",
    "inputMicrographs": "2.outputMicrographs",
    "minDefocus": 0.5,
    "maxDefocus": 4
  }
]
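A minimal Python sketch of how such a workflow description could be read back and ordered for re-execution. It assumes (this is an inference from the slide, not a documented rule) that a string value of the form "N.outputName" refers to an output of the step whose "object.id" is N. The function name execution_order is hypothetical.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

WORKFLOW_JSON = """
[
  {"object.className": "ProtImportMicrographs", "object.id": "1",
   "filesPath": "/data/movie_?????.mrc"},
  {"object.className": "ProtPreprocessMicrographs", "object.id": "2",
   "inputMicrographs": "1.outputMicrographs",
   "doDownsample": true, "downsamplingFactor": 2},
  {"object.className": "ProtEstimateCTF", "object.id": "3",
   "inputMicrographs": "2.outputMicrographs",
   "minDefocus": 0.5, "maxDefocus": 4}
]
"""


def execution_order(steps):
    """Return step ids ordered so that every producer runs before its consumers."""
    ids = {step["object.id"] for step in steps}
    graph = {}
    for step in steps:
        deps = set()
        for key, value in step.items():
            # Skip the step's own metadata and any non-string parameter.
            if key.startswith("object.") or not isinstance(value, str):
                continue
            # Assumed convention: "<producer id>.<output name>".
            producer, sep, _output = value.partition(".")
            if sep and producer in ids:
                deps.add(producer)
        graph[step["object.id"]] = deps
    return list(TopologicalSorter(graph).static_order())


steps = json.loads(WORKFLOW_JSON)
order = execution_order(steps)
class_by_id = {s["object.id"]: s["object.className"] for s in steps}
for step_id in order:
    print(step_id, class_by_id[step_id])
```

Because the file path "/data/movie_?????.mrc" also contains a dot, the sketch only treats a value as a dependency when the part before the first dot matches a known step id.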


Challenges ahead

Integration challenges

Workflow description:

• Common Workflow Language

• Business Process Execution Language (BPEL)

• Yet Another Workflow Language (YAWL)

• Apache Taverna

• Galaxy

• KNIME

• …

Ontology support:

• EDAM ontology

• Ontology coverage

Open science challenges

• Driving force to report process

• Driving force for programs to migrate

Data description:

• Minimum Information (MIBBI, MIAME, MIAPE,


Challenges ahead


Science Demonstrator ENVRI/ERFI


DEMONSTRATOR:

Focus on the dynamics of greenhouse gases, aerosols and clouds and their role in radiative forcing; interoperability between observations and climate modeling; cooperation between environmental research infrastructures.

Improvement of data integration services based on metadata ontologies; model-data integration by use of HPC; petascale data movement; innovative services to compile and compare model output from different sources, especially semi-automatic spatiotemporal scale conversion.
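As an illustration of what a spatial scale conversion step can look like, here is a minimal Python sketch that coarsens a regular 2D grid by block averaging. The slide does not say which conversion method the demonstrator uses; block averaging and the function name coarsen are assumptions chosen only to make the idea concrete.

```python
def coarsen(grid, factor):
    """Average non-overlapping factor x factor blocks of a regular 2D grid.

    Assumption: the grid is a list of equally long rows whose dimensions
    are exact multiples of `factor`.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = coarsen(fine, 2)
print(coarse)
```

A real conversion between model output and in-situ observations would also have to handle grid geometry, missing values and the time axis, which this sketch leaves out.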

FAIR CHALLENGES:

Findability: Matching of metadata ontologies between NetCDF-CF and in-situ metadata; data quality indicators.

Accessibility: Automated access routines between the RI repositories. For fully open data this is not immediately problematic, but it might require analysis of the resources and APIs needed.

Interoperability: APIs, service integration, large data transfers, where to do processing (how to document?).

Reusability: How to cite and persistently identify scale-changed datasets? How to transfer knowledge of the data versions used?

ENVRI Radiative Forcing Integration

Organisations & Contacts: Werner Kutsch, Alex Vermeulen (ICOS ERIC), Ari Asmi (ENVRIplus), Paolo Laj (ACTRIS), Stefan Kindermann, IS-ENES2 (DKRZ), Sylvie Joussaume, Sébastien Denvil, IS-ENES2 (IPSL)


Computational Seismology

3D waveform modelling in 3D media

The VERCE platform

Andreas Rietbrock & Federica Magnoni

Alessandro Spinuso, Andre Gemud, Rafiq Saleh, Emanuele Casarotti


EPOS Computational Earth Sciences


Towards Inverse Modeling

HPC Server Cloud / EOSC


Towards inverse modeling: Misfit calculation

Misfit Analysis

Data/Synt Processing

Simulated Synthetics

Data Download (FDSN)

Provenance Validation and Monitoring

Generates W3C-PROV
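The misfit step compares processed observed data with simulated synthetics. The slide does not state which misfit measure is used, so the Python sketch below assumes a plain least-squares waveform misfit between two equally sampled traces; the function names are hypothetical.

```python
def l2_misfit(observed, synthetic):
    """Least-squares waveform misfit: 0.5 * sum((observed - synthetic)^2).

    Assumption: both traces are already processed to the same sampling
    and time window, so they can be compared sample by sample.
    """
    if len(observed) != len(synthetic):
        raise ValueError("traces must have the same number of samples")
    return 0.5 * sum((o - s) ** 2 for o, s in zip(observed, synthetic))


def total_misfit(trace_pairs):
    """Sum the misfit over (observed, synthetic) pairs, e.g. one per station."""
    return sum(l2_misfit(obs, syn) for obs, syn in trace_pairs)


observed = [0.0, 1.0, 0.0, -1.0]
synthetic = [0.0, 0.5, 0.0, -0.5]
print(l2_misfit(observed, synthetic))
```

In an inversion, this scalar is what successive simulations try to reduce by updating the model.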


Agile Data Intensive Framework

Python library used to describe abstract workflows for distributed data-intensive applications.

Support for composition: single components may be defined by their own internal workflows.

Workflows described in dispel4py can be executed automatically in numerous parallel environments.

Docker containers are available, supporting multiple execution environments (MPI, SharedMemory) and integration with other workflow systems (e.g. Pegasus)

MPI

Deployed on local Clouds (MAP-REDUCE streaming model)

• dispel4py.org

• solution is not supported in OCCI

• by OCCI

Supported by EOSC Demonstrator and new EU project DARE
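The idea of an abstract streaming workflow whose components can themselves be workflows can be sketched in plain Python. This does not use the dispel4py API; the names compose, detrend, scale and peak_amplitude are hypothetical and only illustrate composition over a data stream.

```python
from functools import reduce


def compose(*stages):
    """Chain stages so each one consumes the stream produced by the previous one."""
    def pipeline(stream):
        return reduce(lambda current, stage: stage(current), stages, stream)
    return pipeline


def detrend(stream):
    """Remove the mean from every trace in the stream."""
    for trace in stream:
        mean = sum(trace) / len(trace)
        yield [x - mean for x in trace]


def scale(factor):
    """Build a stage that multiplies every sample by `factor`."""
    def stage(stream):
        for trace in stream:
            yield [x * factor for x in trace]
    return stage


def peak_amplitude(stream):
    """Reduce every trace to its largest absolute sample."""
    for trace in stream:
        yield max(abs(x) for x in trace)


# A composite component: it is itself a small workflow ...
preprocess = compose(detrend, scale(0.5))
# ... and is used as a single stage of a larger workflow.
workflow = compose(preprocess, peak_amplitude)

result = list(workflow([[1.0, 2.0, 3.0], [0.0, 4.0]]))
print(result)
```

Because the stages are generators, traces flow through one at a time, and the same abstract description could in principle be mapped onto different execution back ends.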


EGA Life science datasets

A third-party dataset (GoNL project) as use case

Reproduction of the original pipeline

Production of an updated pipeline

Containerized versions of both pipelines using Nextflow

Test both pipelines on the use case dataset


EOSC pilot LOFAR SD

EOSCpilot Stakeholder meeting

28 November 2017, Brussels

Rob van der Meer, ASTRON


ASTRON is part of the Netherlands Organisation for Scientific Research (NWO)


EOSC Pilot LOFAR SD

Excellent Science

• Reduce and analyse radio astronomy data:

• From antenna signal to visibilities to images

Main Challenges

• Large volume, complex (multi-step analysis)

• Power users need compute at their data because of volume

• Inexperienced users need guidance with parameters and data sets

• Make it work across platforms and data centers


EOSC Pilot LOFAR SD

Approaches to solution

• implementation of Common Workflow Language (CWL) based pipelines

• first results on ‘prefactor pipeline’


The Prefactor pipeline in CWL


Approaches to solution

• implementation of Common Workflow Language (CWL) based pipelines

• first results on ‘prefactor pipeline’

• make them deployable as Singularity containers.

• run on various systems (data centers)

• Pilot project on SURFsara HPC cloud

• Pilot on beta phase of HTP cluster in February 2018

EOSC Pilot LOFAR SD