Science Demonstrator Panel Session 3 on Physics and Astrophysics

Post on 06-Jul-2020

TRANSCRIPT

Page 1: Science Demonstrator Panel Session 3 on on Physics and … › sites › default › files › sd-session3.pdf · 2019-01-07 · The Science Demonstrator 4 The European Open Science

Science Demonstrator Panel Session 3 on Physics and Astrophysics

PROMINENCE Andrew Lahiff

www.eoscpilot.eu

The European Open Science Cloud for Research pilot project is funded by the European Commission, DG Research & Innovation under contract no. 739563

The Science Challenge

Goals: Overcome the computational bottleneck of using local clusters by using EOSC services

● Local clusters often too small to meet peak demands

● Building and compiling for different machines is costly and delays science output, and can produce different results

● Use of commercial clouds where appropriate can be cost effective for a community

● Principle of “Submit Globally, Run Globally”

The Science Demonstrator

Planned Work:

● Spin up ‘clusters-on-demand’ on any available cloud

● Containerise HPC applications

● Submit those applications to the cluster

● Investigate the performance impact of running HPC codes on non-optimised hardware

Unplanned Work:

● Get the results from running the applications back to the user

● Integration with an AAI system

Successes

Outcomes:

● Ability to run containerised HTC & MPI jobs across multiple clouds by leveraging HTCondor and Infrastructure Manager (IM)

● Integrated with Identity and Access Management (IAM) service

● Ran successfully on several EGI FedCloud sites as well as commercial clouds

● Tested with Docker, Singularity and udocker containers

● Successfully moved data from EGI to Ceph resources at STFC

We have successfully run these with a few groups within the fusion community, and they would be quite willing to adopt this as a service.

Issues

Solved:

● Running HPC applications across multiple clouds

Remaining:

● Very limited FedCloud resources provided to SDs

● Access to FedCloud requires a VO & X.509 certificates

● No AAI integration between compute & storage

Lessons Learned

What would I choose to do differently in this science demonstrator?

● More focus on data movement; we were not expecting to have time to do it

● More focus on AAI; initially perceived as taking too long

● More regular interactions with service providers - they are really supportive

Things I would change in EOSC Ecosystem

● Better and more consistent documentation, high level and technical

● Better integration between existing services and tools

○ We suspect many SDs have solved the same problem separately, and it should not be up to them

● Less procedure focussed, more agile in responsiveness

FAIRifying eWaterCycle and SWITCH-ON

Michael R. Crusoe

The Science Challenge

➔ Central to the science of hydrology is the localised nature of the medium through which water flows.

➔ This fact leads to a large number of hydrological models, each made specifically for a certain region (catchment), severely hindering re-use and reproducibility.

➔ This in turn is leading to a “crisis of reproducibility” in Hydrology.

The Science Demonstrator

➔ This science demonstrator seeks to create a fully FAIR Hydrological forecasting system, combining local and global models.

➔ With this we showcase how data as well as software can be made FAIR in Hydrology.

Successes

➔ Using a combination of the CWL standard for workflows, Cylc, and Docker software containers, we were able to create a fully reproducible version of the eWaterCycle forecast.

➔ Output data is stored to OneData, and available for analysis in a notebook environment, as well as visualization in a web application.
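As a rough illustration of how a CWL step can pin its software environment to a Docker image, the sketch below builds a minimal CWL `CommandLineTool` description as a Python dictionary and serialises it to JSON (CWL documents are normally written in YAML, and JSON is valid YAML). The image name `example/hydro-model:latest`, the command `run_model`, and the `forcing` and `forecast` identifiers are hypothetical placeholders, not the actual eWaterCycle workflow; Cylc scheduling is not shown.

```python
import json


def make_forecast_tool(docker_image: str, command: str) -> dict:
    """Return a minimal CWL v1.0 CommandLineTool description as a dict.

    All names used here (image, command, input and output ids) are
    hypothetical and chosen only for illustration.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": [command],
        # Pin the software environment to a container image.
        "hints": {"DockerRequirement": {"dockerPull": docker_image}},
        "inputs": {
            "forcing": {
                "type": "File",
                "inputBinding": {"position": 1},
            }
        },
        "outputs": {
            "forecast": {
                "type": "File",
                "outputBinding": {"glob": "forecast.nc"},
            }
        },
    }


tool = make_forecast_tool("example/hydro-model:latest", "run_model")
cwl_text = json.dumps(tool, indent=2)
print(cwl_text)
```

Saved to a file, a description like this could be handed to a CWL runner such as `cwltool` together with a job file that names the `forcing` input.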

Issues

➔ FAIRness (or lack thereof) of input data blocked progress. Attempts to improve FAIRness of datasets from third parties led to nothing.

➔ If input data is not Open Data, this greatly hinders what can be done in terms of processing this data. For example, automated checking of data availability and quality is not possible, and caching of data is problematic.

➔ Only very low-level services seem to be part of EOSC at this time (VMs, clusters, etc); but our needs are for higher level services

➔ OneData system is a really nice concept, but is missing some key usability features. Also needs to be available out-of-the-box on platforms used.

➔ Running the High-resolution forecast is still work in progress...

Lessons Learned

➔ FAIRness of data you are not the author of cannot be improved by technical means.

➔ EOSC is in need of a coherent set of high level services aimed at researchers.

◆ File sharing (Dropbox for large data sets), or location aware workflow scheduling

◆ Execution services for standards-based workflows

◆ Authentication and Authorization services

◆ Persistent storage (with identifiers like DOIs)

➔ Support for researchers in the form of Data stewards, Research Software Engineers (RSEs), and other roles, is as important as the (compute) services offered.

LOFAR Hanno Holties


The Science Challenge

● LOFAR: distributed research instrument

● Data centers NL, DE, PL

● 7.5 petabyte per year

● Science processing by community

● Data transfers 10 – 100 TB

● Scale towards exabytes for Square Kilometre Array
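To put these volumes in perspective, the following back-of-the-envelope sketch estimates how long transfers of 10 and 100 TB would take and what average data rate 7.5 petabyte per year corresponds to. The sustained link speed of 10 Gbit/s is an assumption made purely for illustration and does not come from the slides; decimal units (1 TB = 10^12 bytes) are assumed as well.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # bytes, decimal units (assumption)
PB = 10**15  # bytes, decimal units (assumption)


def transfer_days(size_bytes: float, gbit_per_s: float) -> float:
    """Days needed to move size_bytes at a sustained rate given in Gbit/s."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return size_bytes / bytes_per_s / SECONDS_PER_DAY


def average_rate_gbit(bytes_per_year: float) -> float:
    """Average data rate in Gbit/s for a given yearly volume (365-day year)."""
    return bytes_per_year * 8 / (365 * SECONDS_PER_DAY) / 1e9


ASSUMED_LINK_GBIT = 10.0  # hypothetical sustained throughput, not from the slides

for size_tb in (10, 100):
    days = transfer_days(size_tb * TB, ASSUMED_LINK_GBIT)
    print(f"{size_tb} TB at {ASSUMED_LINK_GBIT} Gbit/s: {days:.2f} days")

print(f"7.5 PB/year averages {average_rate_gbit(7.5 * PB):.2f} Gbit/s")
```

Under these assumptions even the largest quoted transfer fits within about a day on a dedicated link, which suggests the harder problem is sustaining such rates between archive and compute sites rather than the raw volume alone.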

The Science Demonstrator

Create portable processing workflow services

● Port a set of pipeline definitions to the Common Workflow Language

● Deploy and run software in containers

● Demonstrate successful runs on EOSC compute platforms

Provide access to LOFAR data in accordance with FAIR principles

● Register LOFAR data in a FAIR data repository

● Assess FAIRness of LOFAR data services

Integration with federated AAI

Successes

Demonstrated portable deployment and running of LOFAR pipelines on EOSC infrastructure

● Docker, Singularity, uDocker

● Common Workflow Language

● Laptop, HPC Cloud, Grid, HTDP

Proof of Concept web portal for pipelines to run on archived data

Explored metadata services to enhance FAIR sharing

● Inventory of metadata to associate with data repositories

● POC import from archive database in Virtuoso RDF store

Explored collaborations in COmanage for integration in federated AAI

Issues

POC services still requiring significant work to reach maturity

● Demonstration of processing at scale using CWL

● Integrate data staging from archive

● Registration/ingest of processed data

● Community data portal

Transparent integration of storage & compute infrastructure for data-intensive research at petabyte scale

Standard support for Singularity container deployment on systems

GPU support for container deployments

Lessons Learned

● What are the main things you would change in your science demonstrator?

Improve on application of standards and building on existing or emerging (not community specific) frameworks.

● What are the main things you would change in the EOSC ecosystem?

Remove/hide personal X509 certificate dependencies

General, if possible standardised, support for container deployment & data analysis workflow services

Transparent integration of storage & compute infrastructure for data-intensive research at petabyte scale (and higher)

VisIVO Science Demonstrator Eva Sciacca / INAF


The Science Challenge

Challenge: space missions and ground-based facilities produce massive volumes of data and the ability to collect and store them is increasing at a higher pace than the ability to analyze them.

Goal: integrate astrophysical multiwavelength surveys and visual analytics techniques within the EOSCPilot e-infrastructure to identify star formation regions in our galaxy, the Milky Way.

The Science Demonstrator

A number of data visualization tools already exist, yet none of them integrates access to data providers, IVOA standards, analysis of 2D images, catalogue source properties, 3D spectral datacubes, and cloud integration for complex computational tasks (massive spectral energy fitting).

The VisIVO Science Demonstrator offers the possibility to design and implement a visual analytics technical solution in the EOSCPilot ecosystem.

The VisIVO Science Demonstrator now offers an integrated solution for visualization, including:

● Services for collaborative portals;

● Visualization and data exploration;

● A number of key components such as workflow applications and data analysis.

Successes

[Figure: architecture diagram. Components: VisIVO ViaLactea Cloud Gateway; EGI Check-in proxy service; Cloud Archive (Data & Metadata); VisIVO SD VA; VLKB Data Resources and Services; Cloud Computing (EGI Fed Cloud)]

VisIVO can connect to EOSC cloud computing infrastructure through the ViaLactea Cloud Gateway (based on the WS-PGRADE/gUSE portal framework). The gateway has been integrated with:

- the EGI Check-in service, to enable the connection from the federated Identity Providers

- the EGI Federated Cloud, to expand the computing capabilities by making use of a dedicated virtual appliance stored in the EGI Applications Database

The archiving services have been deployed within the EGI Federated Cloud to ensure FAIR access to the survey data and related metadata.

Issues

❖ During the development we found some bugs in the gateway framework for the connection with the EGI Federated Cloud and the EGI Check-in. The problems related to the connection with the cloud have been solved by fixing some configuration details and upgrading some software libraries, while the problems related to the connection with the federated log-in services have been solved by modifying/customizing the portal plugin of the gateway.

❖ There has been a delay on the VM configuration for deploying the data resources needed by the science demonstrator due to CESNET-MetaCloud technical problems.

➢ Within EOSC it would be useful to plan measures for ensuring e-infrastructure reliability.

❖ The problems and issues encountered have been tackled and solved thanks to the suggestions of shepherds, EGI staff and technical people involved in the development of the technologies and software employed by the Science Demonstrator. We have also profited from the experiences, results and knowledge of other previous science demonstrators (e.g. EPOS-VERCE for the gateway services and the connection with EGI FedCloud and the EGI-CheckIn).

Lessons Learned

❖ Many services and service catalogues suitable for EOSC are already available. We have so far evaluated mainly the EGI service catalogue.

➢ EOSC should provide an interface to help evaluate each EOSC service based on previous experiences, user ratings and tutorial material.

➢ The scientific service enablers should be trained in the configuration of the specific EOSC services they would like to employ and should be informed of new services that could impact their scientific community.

❖ All the technical and configuration details related to the use of the EOSC e-infrastructure should be completely hidden from the end users, since astrophysicists would like to focus only on the scientific results.

HEP Data Preservation Jamie Shiers / CERN

The Science Challenge

See https://eoscpilot.eu/science-demos/high-energy-physics.

Goal: combine existing services to demonstrate something equivalent to the CERN Open Data Portal (http://opendata.cern.ch/) in the EOSC.

1. A Trustworthy Digital Repository (incl. PIDs)

2. A Scalable Digital Library (incl. DOIs)

3. “Software and environment” preservation

Target: 100TB AND Open Access! (isn’t this what Open Data means?)

● The size of the dataset of a single LEP experiment (1989 - 2000)

● <1 per mil of current LHC data
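The two figures above can be combined into a lower bound on the LHC data volume: if 100TB is less than one per mil of it, the total must exceed roughly 100 PB. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes), which the slides do not specify:

```python
TB = 10**12  # bytes, decimal units (assumption)
PB = 10**15  # bytes, decimal units (assumption)


def implied_lower_bound(sample_bytes: int, per_mille: int = 1) -> int:
    """Total volume implied if sample_bytes is per_mille thousandths of it.

    Because the slide says "<1 per mil", the true total is larger than
    the value returned here.
    """
    return sample_bytes * 1000 // per_mille


target = 100 * TB
bound = implied_lower_bound(target)
print(f"Implied LHC data volume: more than {bound // PB} PB")
```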

N.B. long-term (multi-decade) aspects OUT OF SCOPE of this SD!

The Science Demonstrator

In High Energy Physics (HEP), Long-Term Data Preservation (LTDP) is built on 3 pillars:

1. “Bit preservation” (= TDRs)

2. Documentation preservation (= Digital Libraries)

3. Software + environment preservation (= CernVM + CVMFS)

Such services run in production at CERN and many other labs, and are offered by EOSC and its forerunners

Presented at iPRES 2016 - all DISCIPLINE AGNOSTIC

Consistent with RDA Data Fabric paper: “Recommendations for Implementing a Virtual Layer for Management of the Complete Life Cycle of Scientific Data”

Successes

1. Documentation: successfully uploaded file(s) to EUDAT B2SHARE (Invenio-based) test instance (Q1)

2. Software: successfully installed in RAL CVMFS instance (Q1)

3. Data: some data files transferred to CINES TDR (Q4) (decision Q2/Q3 to reduce scope considerably from initial 100TB)

Issues

It took a long time to find a site prepared to host the data

Open Access was flagged as an issue early on (and never really solved)

Quite some debate about what a TDR can (and should not) do

In particular, FORMAT CONVERSION - not just of (binary) data but also documentation formats (EXPERTS need to check the results)

This is of particular importance in HEP but I suspect generally true

Despite MASSIVE ENTHUSIASM and support from shepherds and others (CINES) the demonstrator did not achieve its objectives

Lessons Learned

The goals of the SD still remain valid - and are surely of relevance to the EOSC itself, as well as other disciplines

The clock should probably not have started until we were able to identify SERVICE INSTANCES for all services required

● Possible / probable conflict with EOSC Pilot timelines

Is “Open Access” (to the world) really possible without tokens, certificates, registration etc?

● This does not mean no monitoring or accounting

● It works in production in the CERN Open Data portal! (“exact same” technologies!)