TRANSCRIPT
Science Demonstrator Panel Session 3 on Physics and Astrophysics
PROMINENCE Andrew Lahiff
www.eoscpilot.eu
The European Open Science Cloud for Research pilot project is funded by the European Commission, DG Research & Innovation under contract no. 739563
The Science Challenge
Goals: Overcome the computational bottleneck of using local clusters by using EOSC services
● Local clusters often too small to meet peak demands
● Building and compiling for different machines is costly and delays science output, and can produce different results
● Use of commercial clouds where appropriate can be cost effective for a community
● Principle of “Submit Globally, Run Globally”
The Science Demonstrator
Planned Work:
● Spin up ‘clusters-on-demand’ on any available cloud
● Containerise HPC applications
● Submit those applications to the cluster
● Investigate the performance impact of running HPC codes on non-optimised hardware
Unplanned Work:
● Get the results from running the applications back to the user
● Integration with an AAI system
Successes
Outcomes:
● Ability to run containerised HTC & MPI jobs across multiple clouds by leveraging HTCondor and Infrastructure Manager (IM)
● Integrated with Identity and Access Management (IAM) service
● Ran successfully on several EGI FedCloud sites as well as commercial clouds
● Tested with Docker, Singularity and udocker containers
● Successfully moved data from EGI to Ceph resources at STFC
We have successfully run these with a few groups within the fusion community, and they would be quite willing to adopt this as a service.
Issues
Solved:
● Running HPC applications across multiple clouds
Remaining:
● Very limited FedCloud resources provided to SDs
● Access to FedCloud requires a VO & X.509 certificates
● No AAI integration between compute & storage
Lessons Learned
What would I choose to do differently in this science demonstrator?
● More focus on data movement; we were not expecting to have time to do it
● More focus on AAI; initially perceived as taking too long
● More regular interactions with service providers - they are really supportive
Things I would change in EOSC Ecosystem
● Better and more consistent documentation, high level and technical
● Better integration between existing services and tools
○ We suspect many SDs have solved the same problem separately, and it should not be up to them
● Less procedure focussed, more agile in responsiveness
FAIRifying eWaterCycle and SWITCH-ON
Michael R. Crusoe
The Science Challenge
➔ Central to the science of hydrology is the localised nature of the medium through which water flows.
➔ This fact leads to a large number of hydrological models, each made specifically for a certain region (catchment), which severely hinders re-use and reproducibility.
➔ This in turn is leading to a “crisis of reproducibility” in Hydrology.
The Science Demonstrator
➔ This science demonstrator seeks to create a fully FAIR Hydrological forecasting system, combining local and global models.
➔ With this we showcase how data as well as software can be made FAIR in Hydrology.
Successes
➔ Using a combination of the CWL standard for workflows, Cylc, and Docker software containers, we were able to create a fully reproducible version of the eWaterCycle forecast.
➔ Output data is stored to OneData, and available for analysis in a notebook environment, as well as visualization in a web application.
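For readers unfamiliar with CWL, the following is a minimal sketch of what a containerised tool description can look like. It is not the eWaterCycle workflow: the image name, the executable and the output file name are hypothetical placeholders. CWL documents may be written as JSON as well as YAML, so the sketch builds the description as a Python dictionary and serialises it.

```python
import json

# Hypothetical example: the image name, command and file names below do NOT
# come from the eWaterCycle project; they only show the shape of a CWL tool
# description that runs inside a Docker container.
tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": ["run_model"],  # hypothetical executable in the image
    "hints": {
        "DockerRequirement": {
            "dockerPull": "example/hydro-model:1.0"  # hypothetical image
        }
    },
    "inputs": {
        "config": {
            "type": "File",
            "inputBinding": {"position": 1},
        }
    },
    "outputs": {
        "forecast": {
            "type": "File",
            "outputBinding": {"glob": "forecast.nc"},  # hypothetical output
        }
    },
}


def write_tool(path: str) -> None:
    """Write the tool description to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(tool, handle, indent=2)


if __name__ == "__main__":
    print(json.dumps(tool, indent=2))
```

A file written by `write_tool` could then be handed to a CWL runner, which pulls the named image and runs the command inside it. Pinning the image and describing inputs and outputs explicitly is what makes such a step repeatable on another machine.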
Issues
➔ FAIRness (or lack thereof) of input data blocked progress. Attempts to improve FAIRness of datasets from third parties led to nothing.
➔ If input data is not Open Data, this greatly hinders what can be done in terms of processing this data. For example, data availability and quality cannot be checked automatically, and caching the data is problematic.
➔ Only very low-level services seem to be part of EOSC at this time (VMs, clusters, etc); but our needs are for higher level services
➔ The OneData system is a really nice concept, but is missing some key usability features. It also needs to be available out-of-the-box on the platforms used.
➔ Running the High-resolution forecast is still work in progress...
Lessons Learned
➔ FAIRness of data you are not the author of cannot be improved by technical means.
➔ EOSC is in need of a coherent set of high level services aimed at researchers.
◆ File sharing (Dropbox for large data sets), or location aware workflow scheduling
◆ Execution services for standards-based workflows
◆ Authentication and Authorization services
◆ Persistent storage (with identifiers like DOIs)
➔ Support for researchers in the form of Data stewards, Research Software Engineers (RSEs), and other roles, is as important as the (compute) services offered.
LOFAR Hanno Holties
The Science Challenge
● LOFAR: distributed research instrument
● Data centers NL, DE, PL
● 7.5 petabytes per year
● Science processing by community
● Data transfers 10 – 100 TB
● Scale towards exabytes for Square Kilometre Array
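To put the quoted volumes in perspective, here is a back-of-envelope sketch. The 7.5 PB per year and 10 to 100 TB figures are from the slide; the decimal units, the 365-day year and the link speeds are assumptions chosen only for illustration.

```python
# Back-of-envelope figures for the data volumes quoted on the slide.
# Assumptions (not from the slide): decimal units (1 TB = 1e12 bytes),
# a 365-day year, and hypothetical sustained link speeds.

SECONDS_PER_YEAR = 365 * 24 * 3600
TB = 10**12
PB = 10**15


def average_rate_gbit_per_s(bytes_per_year: float) -> float:
    """Average sustained rate, in Gbit/s, needed to move a yearly volume."""
    return bytes_per_year * 8 / SECONDS_PER_YEAR / 1e9


def transfer_hours(size_bytes: float, link_gbit_per_s: float) -> float:
    """Hours to move size_bytes over a link at full sustained speed."""
    return size_bytes * 8 / (link_gbit_per_s * 1e9) / 3600


if __name__ == "__main__":
    rate = average_rate_gbit_per_s(7.5 * PB)
    print(f"7.5 PB/year average: {rate:.2f} Gbit/s")
    for size_tb in (10, 100):
        for link in (1, 10):  # hypothetical link speeds in Gbit/s
            hours = transfer_hours(size_tb * TB, link)
            print(f"{size_tb} TB at {link} Gbit/s: {hours:.1f} h")
```

Under these assumptions, 7.5 PB per year corresponds to roughly 1.9 Gbit/s sustained around the clock, and a single 100 TB transfer takes about 22 hours on a fully used 10 Gbit/s link. Real transfers would be slower because of protocol overhead and shared links.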
The Science Demonstrator
Create portable processing workflow services
● Port a set of pipeline definitions to the Common Workflow Language
● Deploy and run software in containers
● Demonstrate successful runs on EOSC compute platforms
Provide access to LOFAR data in accordance with FAIR principles
● Register LOFAR data in a FAIR data repository
● Assess FAIRness of LOFAR data services
Integration with federated AAI
Successes
Demonstrated portable deployment and running of LOFAR pipelines on EOSC infrastructure
● Docker, Singularity, uDocker
● Common Workflow Language
● Laptop, HPC Cloud, Grid, HTDP
Proof of Concept web portal for pipelines to run on archived data
Explored metadata services to enhance FAIR sharing
● Inventory of metadata to associate with data repositories
● POC import from archive database in Virtuoso RDF store
Explored collaborations in COmanage for integration in federated AAI
Issues
POC services still requiring significant work to reach maturity
● Demonstration of processing at scale using CWL
● Integrate data staging from archive
● Registration/ingest of processed data
● Community data portal
Transparent integration of storage & compute infrastructure for data-intensive research at petabyte scale
Standard support for Singularity container deployment on systems
GPU support for container deployments
Lessons Learned
● What are the main things you would change in your science demonstrator?
Improve on application of standards and building on existing or emerging (not community specific) frameworks.
● What are the main things you would change in the EOSC ecosystem?
Remove/hide personal X509 certificate dependencies
General, if possible standardised, support for container deployment & data analysis workflow services
Transparent integration of storage & compute infrastructure for data-intensive research at petabyte scale (and higher)
VisIVO Science Demonstrator Eva Sciacca / INAF
The Science Challenge
Challenge: space missions and ground-based facilities produce massive volumes of data and the ability to collect and store them is increasing at a higher pace than the ability to analyze them.
Goal: integrate astrophysical multiwavelength surveys and visual analytics techniques within the EOSCPilot e-infrastructure to identify star formation regions in our galaxy, the Milky Way.
The Science Demonstrator
A number of data visualization tools already exist, yet none of them integrates access to data providers, IVOA standards, analysis of 2D images, catalogue source properties and 3D spectral datacubes, and cloud integration for complex computational tasks (massive spectral energy fitting).
The VisIVO Science Demonstrator offers the possibility to design and implement
a visual analytics technical solution in the EOSCPilot ecosystem.
The VisIVO Science Demonstrator now offers an integrated solution for visualization, including:
● Services for collaborative portals;
● Visualization and data exploration;
● A number of key components such as workflow applications and data
analysis.
Successes
[Architecture diagram. Components shown: VisIVO SD VA; VisIVO ViaLactea Cloud Gateway; EGI Check-in proxy service; Cloud Archive (Data & Metadata); VLKB Data Resources and Services; Cloud Computing (EGI Fed Cloud)]
VisIVO is enabled to connect to EOSC Cloud Infrastructure computing through the ViaLactea Cloud Gateway (based on the WS-PGRADE/gUSE portal framework). The gateway has been integrated with:
- the EGI Check-in service to enable the connection from the federated Identity Providers
- EGI Federated Cloud to expand the computing capabilities, making use of a dedicated virtual appliance stored in the EGI Applications Database
The archiving services have been deployed within the EGI Federated Cloud to ensure FAIR access to the survey data and related metadata.
Issues
❖ During the development we found some bugs in the gateway framework for the connection with the EGI Federated Cloud and the EGI Check-in. The problems related to the connection with the cloud have been solved by fixing some configuration details and upgrading some software libraries, while the problems related to the connection with the federated log-in services have been solved by modifying/customizing the portal plugin of the gateway.
❖ There has been a delay on the VM configuration for deploying the data resources needed by the science demonstrator due to CESNET-MetaCloud technical problems.
➢ Within EOSC it would be useful to plan measures for ensuring e-infrastructure reliability.
❖ The problems and issues encountered have been tackled and solved thanks to the suggestions of shepherds, EGI staff and technical people involved in the development of the technologies and software employed by the Science Demonstrator. We have also profited from the experiences, results and knowledge of other previous science demonstrators (e.g. EPOS-VERCE for the gateway services and the connection with EGI FedCloud and the EGI-CheckIn).
Lessons Learned
❖ Many services and service catalogues suitable for EOSC are already available. We have so far evaluated mainly the EGI service catalogue.
➢ EOSC should provide an interface to help evaluate each EOSC service based on previous experiences, user ratings and tutorial material.
➢ The scientific service enablers should be trained for the configuration of the specific EOSC services they would like to employ and should be informed of new services that would potentially impact their scientific community.
❖ All the technical and configuration details related to the use of the EOSC e-infrastructure should be completely hidden from the end users, since astrophysicists would like to focus only on the scientific results.
HEP Data Preservation Jamie Shiers / CERN
The Science Challenge
See https://eoscpilot.eu/science-demos/high-energy-physics.
Goal: combine existing services to demonstrate something equivalent to the CERN Open Data Portal (http://opendata.cern.ch/) in the EOSC.
1. A Trustworthy Digital Repository (incl. PIDs)
2. A Scalable Digital Library (incl. DOIs)
3. “Software and environment” preservation
Target: 100TB AND Open Access! (isn’t this what Open Data means?)
● The size of the dataset of a single LEP experiment (1989 - 2000)
● <1 per mil of current LHC data
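The two figures above together imply a lower bound on the LHC data volume at the time of the talk. A small sketch of that arithmetic follows; the only assumption beyond the slide is decimal units (1 PB = 1000 TB).

```python
# Scale implied by the slide: the 100 TB target is stated to be less than
# 1 per mil of current LHC data.
# Assumption (not from the slide): decimal units, 1 PB = 1000 TB.

TARGET_TB = 100  # target volume quoted on the slide
PER_MILLE = 1    # "<1 per mil": the target is under 1/1000 of the total


def implied_minimum_total_tb(part_tb: int, per_mille: int) -> int:
    """If part_tb is under per_mille/1000 of a total, the total exceeds this."""
    return part_tb * 1000 // per_mille


if __name__ == "__main__":
    lower_bound_tb = implied_minimum_total_tb(TARGET_TB, PER_MILLE)
    print(f"Implied LHC data volume: more than {lower_bound_tb // 1000} PB")
```

In other words, the statement only holds if the LHC data volume exceeds 100 PB, which shows how modest the 100 TB target was by comparison.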
N.B. long-term (multi-decade) aspects OUT OF SCOPE of this SD!
The Science Demonstrator
In High Energy Physics (HEP), Long-Term Data Preservation (LTDP) is built on 3 pillars:
1. “Bit preservation” (= TDRs)
2. Documentation preservation (= Digital Libraries)
3. Software + environment preservation (= CernVM + CVMFS)
Such services run in production at CERN and many other labs, and are offered by EOSC and its forerunners
Presented at iPRES 2016 - all DISCIPLINE AGNOSTIC
Consistent with RDA Data Fabric paper: “Recommendations for Implementing a Virtual Layer for Management of the Complete Life Cycle of Scientific Data”
Successes
1. Documentation: successfully uploaded file(s) to EUDAT B2SHARE (Invenio-based) test instance (Q1)
2. Software: successfully installed in RAL CVMFS instance (Q1)
3. Data: some data files transferred to CINES TDR (Q4) (decision Q2/Q3 to reduce scope considerably from initial 100TB)
Issues
It took a long time to find a site prepared to host the data
Open Access was flagged as an issue early on (and never really solved)
Quite some debate about what a TDR can (and should not) do
In particular, FORMAT CONVERSION - not just of (binary) data but also documentation formats (EXPERTS need to check the results)
This is of particular importance in HEP but I suspect generally true
Despite MASSIVE ENTHUSIASM and support from shepherds and others (CINES) the demonstrator did not achieve its objectives
Lessons Learned
The goals of the SD still remain valid - and are surely of relevance to the EOSC itself, as well as other disciplines
The clock should probably not have started until we were able to identify SERVICE INSTANCES for all services required
● Possible / probable conflict with EOSC Pilot timelines
Is “Open Access” (to the world) really possible without tokens, certificates, registration etc?
● This does not mean no monitoring or accounting
● It works in production in the CERN Open Data portal! (“exact same” technologies!)