Page 1: PETRAIII/EuXFEL data archiving

PETRAIII/EuXFEL data archiving

Sergey Yakubov, Martin Gasthuber (@desy.de) / DESY-IT

Geneva, June 5, 2019

Page 2: PETRAIII/EuXFEL data archiving


Page 3: PETRAIII/EuXFEL data archiving


DESY Campus Hamburg – many more communities

[campus map] PETRA III – synchrotron radiation source (highest brilliance); FLASH I+II – VUV & soft X-ray free-electron laser; European XFEL – X-ray free-electron laser, atomic structure & fs dynamics of complex matter; plus MPI-SD, CHyN, HARBOR, CXNS, NanoLab, CWS

Page 4: PETRAIII/EuXFEL data archiving


sources of data

• 3 active accelerators on-site (all photon science) – Petra III, FLASH and EuXFEL

• currently 30 active experimental areas (called beamlines) - operated in parallel

• more in preparation

• Petra IV (future) – expect 10^4–10^5 × more (raw) data - not all to be stored

• FLASH21+

• majority of generated data is analyzed within a few months (access cools down afterwards)

• have two independent copies asap (raw & calibration data, e.g. for EuXFEL)


Page 5: PETRAIII/EuXFEL data archiving


DESY datacenter - resources interacting with ARCHIVER

data processing resources before archiving

• HPC cluster – 500 nodes, 30,000 cores, large InfiniBand fabric (growing)

• GPFS – 30 building blocks, 30PB, all InfiniBand connected (growing)

• BeeGFS - 3PB, InfiniBand connected

• LHC computing - Analysis Facility + Tier-2, 1000 nodes, 30,000 cores (growing)

• ~40% more resources outside the datacenter (mostly at experimental areas)

current archiving capabilities

• dCache - 6 large instances, 35PB capacity, >120 building blocks, Tape gateway

• Tape – 2 × SL8500 (15,000 slots), 25 × LTO8, 8 × LTO6, >80 PB capacity


Page 6: PETRAIII/EuXFEL data archiving


data life cycle as of today - from the cradle to the grave

• new archive service connected to the ‘Core-FS’ and/or downstream of dCache, to fit seamlessly into the existing workflow

• this scenario will most likely use the fully automated (API/CLI) archive system interface (sketched below)
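
A minimal sketch of what such a fully automated hand-off could look like, assuming a hypothetical `archiver` CLI; the tool name, flags and metadata file layout are placeholders, not an existing DESY interface:

```python
#!/usr/bin/env python3
"""Sketch: automated hand-off from the Core-FS/dCache stage to an archive
service via a CLI. Tool name and flags are hypothetical."""
import subprocess
from pathlib import Path

def archive_run(run_dir: Path, beamline: str) -> str:
    """Package one finished run directory and submit it for archiving."""
    result = subprocess.run(
        ["archiver", "ingest",
         "--beamline", beamline,
         "--metadata", str(run_dir / "run_metadata.json"),
         str(run_dir)],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()  # assumed to print the new archive object ID

if __name__ == "__main__":
    object_id = archive_run(Path("/core-fs/p3/raw/run_0042"), beamline="P11")
    print("archived as", object_id)
```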


Page 7: PETRAIII/EuXFEL data archiving



• end user workflows (3) – scientific data and user

• admin workflow – service integration & planning; configuration based on site+community data policy and contracts between site and community

• SIP == DIP (AIP should allow using sequential media efficiently)

• Archival Storage - this is where the ‘hybrid’ comes in (modelled in the sketch below):

○ replication (horizontal)

○ multi-tiering (vertical) - similar to HSMs

○ instances should run on distributed sites

• Archival Storage == instances of bit-stream preservation

• Data Management + Ingest + Access == core of archive instance
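
A minimal Python sketch of that hybrid structure, with illustrative names only (no real service interface): replication spreads copies horizontally over distributed bit-stream-preservation instances, while each instance migrates objects vertically through HSM-like tiers:

```python
"""Sketch of the 'hybrid' Archival Storage: horizontal replication across
sites, vertical multi-tiering within a site. All names are illustrative."""
from dataclasses import dataclass, field

@dataclass
class BitStreamInstance:
    site: str
    tiers: list[str]                    # fast to cold, e.g. ["disk", "tape"]
    objects: dict[str, str] = field(default_factory=dict)  # object id -> tier

    def ingest(self, object_id: str) -> None:
        self.objects[object_id] = self.tiers[0]   # land on the fastest tier

    def migrate_down(self, object_id: str) -> None:
        i = self.tiers.index(self.objects[object_id])
        self.objects[object_id] = self.tiers[min(i + 1, len(self.tiers) - 1)]

@dataclass
class ArchivalStorage:
    replicas: list[BitStreamInstance]   # horizontal: distributed sites

    def store(self, object_id: str) -> None:
        for instance in self.replicas:  # every instance gets a copy
            instance.ingest(object_id)

storage = ArchivalStorage([
    BitStreamInstance("DESY", ["disk", "tape"]),
    BitStreamInstance("cooperating-lab", ["object-store", "tape"]),
])
storage.store("SIP-2019-0001")          # SIP == DIP; copy lands on both sites
```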

Page 8: PETRAIII/EuXFEL data archiving


end user workflow 1

• individual scientist archiving important work (e.g. publication, partial analysis results, …) – DOI required

• key metrics

• Single archive size: average 10-100 GB

• Files in archive: average 10,000

• Total archive size per user: 5 TB

• Duration: 5-10 years

• Ingest rates: 10-100 MB/s (more is better)

• encryption: not required, nice to have

• browser based interaction (authentication, data transfers, metadata query/ingest)

• cli tools usable for data ingest

• metadata query

• starting from a single string input (like Google search) - interactive/immediate selection response

• change QoS - e.g. number of replicas after re-evaluating the ‘value’ of that data

• DOI generated (like e.g. Zenodo) for durable external references (see sketch below)

• mobile devices (tablet, phone, …) (tools + protocols) should not be excluded


individual scientist – managing private scientific data (self-generated and self-managed)
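
A sketch of the two central workflow-1 interactions, the Google-style single-string query and DOI minting, against a hypothetical REST endpoint; URL, routes and JSON fields are assumptions, not a defined interface:

```python
"""Sketch: workflow 1 - single-string metadata search and DOI minting.
Endpoint URL, routes and response fields are assumptions."""
import requests

BASE = "https://archive.example.desy.de/api/v1"   # hypothetical service

def search(query: str) -> list[dict]:
    """Single string in (like a Google search), immediate hit list out."""
    r = requests.get(f"{BASE}/search", params={"q": query}, timeout=10)
    r.raise_for_status()
    return r.json()["hits"]

def mint_doi(archive_id: str) -> str:
    """Ask the service for a durable external reference (Zenodo-style DOI)."""
    r = requests.post(f"{BASE}/archives/{archive_id}/doi", timeout=10)
    r.raise_for_status()
    return r.json()["doi"]

for hit in search("lysozyme SAXS 2019"):
    print(hit["archive_id"], hit["title"], mint_doi(hit["archive_id"]))
```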

Page 9: PETRAIII/EuXFEL data archiving


end user workflow 2

• beamline (experimental station) specific + experiment specific, medium size and rate

• key size parameters

• Single archive size: average 5 TB

• Files in archive: average 150,000

• Total archive size per beamline: 400 TB, doubles every year

• Duration: 10 years

• Ingest rates: 1-2 GB/s

• encryption: not required

• 3rd-party copy - ‘gather’ all data from various primary storage systems - controlled from a single point (see sketch below)

• local (to site) data transport should be RDMA based and operate (efficiently) on networks faster than 10 Gb/s

• data encryption in transit not required

• API + CLI for seamless automation - e.g. API manifested as a REST API

• CLI on Linux; API should support the platforms in use (focus on Linux but incl. Windows ;-)

• MetaData

• other methods (e.g. referencing/finding through experiment management services) used in addition


beamline manager – mix of automated and experiment specific/manual archive interaction
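
A sketch of the single-point-of-control third-party copy: the archive service pulls directly from the primary storage systems instead of routing data through the client. The endpoint and request fields are assumptions:

```python
"""Sketch: workflow 2 - submitting a 'gather' (third-party copy) job.
Endpoint and JSON fields are assumptions, not a defined interface."""
import requests

BASE = "https://archive.example.desy.de/api/v1"   # hypothetical service

def submit_gather_job(beamline: str, sources: list[str]) -> str:
    """The service pulls from the primary systems itself (third-party copy);
    no data flows through this client."""
    r = requests.post(f"{BASE}/ingest-jobs", json={
        "beamline": beamline,
        "sources": sources,              # e.g. GPFS/BeeGFS/dCache exports
        "transport": "rdma-preferred",   # local transfers should use RDMA
    }, timeout=10)
    r.raise_for_status()
    return r.json()["job_id"]

job_id = submit_gather_job("P11", [
    "gpfs://core-fs/p3/raw/run_0042",
    "dcache://dcache.desy.de/petra3/calib/run_0042",
])
print("gather job submitted:", job_id)
```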

Page 10: PETRAIII/EuXFEL data archiving


end user workflow 3

• large collaboration or site managing and controlling archive operations on behalf of (all experiments) - all automated and large scale

• all inherited from previous workflow - except the manual part - all interactions automated

• key size parameters

• Single archive size: average 400 TB

• Files in archive: average 25,000

• Total archive size per beamline: 10s PB, doubles every year

• Duration: 10 years

• Ingest rates: 3-10 GB/s - averaged over 1-3 hours

• encryption: not required

• bulk recall - planned re-analyses require bulk restore operations at decent rates (50% of ingest rate) to feed the compute engines

• async notification from archive on reaching certain states (e.g. data accepted and stored), to be recorded in external DBs (see sketch below)


Integrated data archiving for large standardized beamline/facility experiments
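
A sketch of the async-notification side: a small callback receiver recording archive state changes in an external DB. The payload shape is an assumption, and sqlite3 merely stands in for the external DB:

```python
"""Sketch: workflow 3 - consuming async state notifications from the archive
and recording them in an external DB. Payload shape is an assumption."""
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

db = sqlite3.connect("archive_states.db")
db.execute("CREATE TABLE IF NOT EXISTS states (archive_id TEXT, state TEXT)")

class NotificationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        event = json.loads(body)   # assumed: {"archive_id": ..., "state": ...}
        db.execute("INSERT INTO states VALUES (?, ?)",
                   (event["archive_id"], event["state"]))
        db.commit()
        self.send_response(204)    # acknowledge, no body
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), NotificationHandler).serve_forever()
```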

Page 11: PETRAIII/EuXFEL data archiving


site manager & administrative workflows

• create and configure core archive and related bit-stream-preservation instances

• based on site and community data policies + contracts with the community

• create ‘archive profiles’ determining operation modes and limits (everything that could generate costs ;-) - see the sketch below

• e.g. this includes trade-offs between costs and data resiliency (probability of data loss)

• select appropriate ‘bit-stream-preservation’ instances and the hierarchy among them (e.g. replication)

• set up further admin and end user accounts and their roles (authorizations)

• delegation of limited admin tasks by group admins of community/groups

• configure/set up AAI - e.g. a local IdP

• wide range of authentication methods usable (besides local site ones) – X.509, OpenID, eduGAIN, … - more is better

• used to authenticate and to be usable in ACL-like authorization settings (the identity or DN)

• multiple authentications mapped to a single ‘identity’

• set up role-based model (identity → roles → archive profile)


integration, setup and control - workflow derived requirements
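
A sketch of how such archive profiles and the identity → roles → profile mapping could be expressed; every field name and number here is illustrative, not a defined schema:

```python
"""Sketch: 'archive profiles' with cost/resiliency trade-offs and a
role-based mapping onto them. All names and numbers are illustrative."""

ARCHIVE_PROFILES = {
    "petra3-raw": {
        "replicas": 2,                  # trade-off: cost vs. data-loss risk
        "bit_stream_instances": ["DESY-tape", "remote-lab-tape"],
        "max_ingest_rate_gbps": 16,     # ~2 GB/s, cf. workflow 2
        "retention_years": 10,
        "quota_tb": 400,
    },
    "scientist-private": {
        "replicas": 1,
        "bit_stream_instances": ["DESY-disk"],
        "max_ingest_rate_gbps": 1,      # ~100 MB/s, cf. workflow 1
        "retention_years": 10,
        "quota_tb": 5,
    },
}

# role-based model: identity -> roles -> archive profile
ROLE_TO_PROFILE = {"beamline-manager": "petra3-raw",
                   "scientist": "scientist-private"}

def profile_for(identity: str, roles: list[str]) -> dict:
    for role in roles:                  # first matching role wins (illustrative)
        if role in ROLE_TO_PROFILE:
            return ARCHIVE_PROFILES[ROLE_TO_PROFILE[role]]
    raise PermissionError(f"{identity}: no role grants an archive profile")
```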

Page 12: PETRAIII/EuXFEL data archiving


site manager & administrative workflows

• deployment scenarios (instance architectures)

• deploy main services and esp. the metadata store/query (Data Management + Ingest + Access in OAIS terms)

○ locally

○ in the cloud (using remote services and storage/handling hardware for MD operations)

• create/attach bit-stream-preservation layer (Archival Storage in OAIS terms) - see the sketch below

○ local only

○ remote only

○ tiered - local and remote (e.g. remote tape) - remote could be a ‘cooperating lab’, public cloud, …

• (streaming) protocol to transfer data between tiers should support efficient and secure ‘wide area’ transfers

• deployment based on open standards / an open source version preferable

○ avoid vendor lock-in, assure long-term viability, benefit from wide community support

○ subscribing to paid support not excluded

○ a commercial version not excluded either (depending on the licensing model, exit strategy, etc.)


Deployment models/business models
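
The three attachment variants, written out as declarative deployment descriptors a site manager might feed to some deployment tooling; the structure and field names are assumptions:

```python
"""Sketch: the bit-stream-preservation deployment variants as declarative
descriptors. Structure and field names are assumptions."""

DEPLOYMENTS = {
    "local-only": {
        "core_services": "on-site",      # DM + Ingest + Access (OAIS terms)
        "archival_storage": [{"kind": "tape", "where": "DESY"}],
    },
    "remote-only": {
        "core_services": "cloud",        # MD store/query run remotely
        "archival_storage": [{"kind": "object-store", "where": "public-cloud"}],
    },
    "tiered": {
        "core_services": "on-site",
        "archival_storage": [
            {"kind": "disk", "where": "DESY"},             # fast local tier
            {"kind": "tape", "where": "cooperating-lab"},  # cold remote tier
        ],
        # inter-tier streaming must be efficient and secure over the WAN
        "tier_transport": {"protocol": "https-streaming", "encrypted": True},
    },
}
```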

Page 13: PETRAIII/EuXFEL data archiving


left over…

• life cycle of archive objects (not bound to a single access session) - create, fill with (meta)data, close (data becomes immutable), query - see the sketch below

• archive objects could be related to existing ones - e.g. containing new versions of derived data

• all data access should be ‘stream’ based

○ no random access (within a file) is required

○ recalls of pre-selected files out of a single archive object

○ network protocol ‘firewall friendly’ - e.g. http*-based
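
A sketch of that object life cycle as a toy client-side class (hypothetical and in-memory; a real service would stream data and persist it):

```python
"""Sketch: archive-object life cycle - create, fill with (meta)data, close
(immutable afterwards), then query/recall. Toy in-memory stand-in."""

class ArchiveObject:
    def __init__(self, related_to: str | None = None):
        self.files: dict[str, bytes] = {}
        self.metadata: dict[str, dict] = {}
        self.related_to = related_to   # e.g. a new version of derived data
        self.closed = False

    def add(self, name: str, payload: bytes, **meta) -> None:
        if self.closed:
            raise RuntimeError("archive object is immutable after close()")
        self.files[name] = payload     # stream-based in a real service
        self.metadata[name] = meta

    def close(self) -> None:
        self.closed = True             # from now on: query and recall only

    def recall(self, names: list[str]) -> dict[str, bytes]:
        """Recall pre-selected files without restoring the whole object."""
        return {n: self.files[n] for n in names}

obj = ArchiveObject()
obj.add("run_0042.h5", b"...", detector="Eiger", run=42)
obj.close()
print(list(obj.recall(["run_0042.h5"])))
```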

• Billing

○ any ‘non-local’ deployment requires billing services and methods (obviously), separated into service and storage costs (at least)

○ external storage resources - long-term predictable costs/contracts preferred (less ‘pay as you go’)

○ per-user and per-group billing (a user may be a member of several groups and groups might be nested)

• encryption - in all cases ‘nice to have’ - expecting issues with local ‘key management’ services

○ pre/post en-/decryption of data in motion and/or at rest is a valid alternative

• (Meta)Data formats - see the query sketch below

○ no special data formats (known to the archive service) required, thus no format conversions (without user interaction) required

○ metadata needs to be exportable/importable to new/updated instances

○ metadata query engine should handle binary, string, integer and date/time types
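
A sketch of a metadata record and a typed comparison over the four value classes the query engine is asked to handle (binary, string, integer, date/time); schema and query syntax are illustrative:

```python
"""Sketch: typed metadata record and query over binary, string, integer and
date/time values. Schema and query syntax are illustrative."""
from datetime import datetime

record = {
    "archive_id": "SIP-2019-0001",             # string
    "run_number": 42,                          # integer
    "taken_at": datetime(2019, 6, 5, 14, 30),  # date/time
    "detector_mask": b"\x00\xff\x00\xff",      # binary
}

def matches(rec: dict, field: str, op: str, value) -> bool:
    """Minimal typed comparison; a real engine would index these fields."""
    ops = {"==": lambda a, b: a == b,
           ">=": lambda a, b: a >= b,
           "<=": lambda a, b: a <= b}
    return ops[op](rec[field], value)

assert matches(record, "run_number", ">=", 40)
assert matches(record, "taken_at", "<=", datetime(2019, 7, 1))
assert matches(record, "detector_mask", "==", b"\x00\xff\x00\xff")
```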


other thoughts, requirements and options