dCache: An Overview
Paul Millar
on behalf of the dCache team
Nordic Data Management Workshop
Oslo, Norway; 2019-02-27
https://indico.cern.ch/event/779913/
eXtreme DataCloud is co-funded by the Horizon2020 Framework Program – Grant Agreement 777367
Copyright © Members of the XDC Collaboration, 2017-2020
dCache: An Overview | | 2019-02-27 | 2
Scientific data challenges
● Volume
● Fast ingest
● Chaotic access
● Sharing data
● Access control
● Persistence & long-term archival
● Immutability
Fast Analysis: NFS 4.1/pNFS
High Speed Data Ingest
Wide Area Transfers (Globus Online, FTS) by GridFTP, HTTP
Interactive analysis & Sharing
Data management & workflow control (Rucio, Kafka, SSE)
● HERA
● Tevatron
● WLCG
● Belle II
● LOFAR
● CTA
● IceCube
● EU-XFEL
● Petra3
● DUNE
● And many more ...
Flexibility that works …
● Supports many authentication schemes: username+password, X.509, Kerberos and OpenID-Connect:
  ● Integrates with existing infrastructure + pluggable for flexibility,
  ● Users have same rights, irrespective of how they authenticate.
● Supports delegated authorisation, using Macaroons.
● Multiple protocols: (Grid)FTP, HTTP/WebDAV, SRM, xrootd, NFS v4.1/pNFS and dcap.
  ● Using different protocols, users will see the same data.
dCache innovations: Storage Events
Storage events: the problems
Upload a file
OK
Delete a file
OK
Catalogue: Rucio/LFC/…
Are these files on disk?
no, no, no, …
Stage files from tape
Request queued
Are these files on disk?
no, no, no, …
Are these files on disk?
no, YES, no, …
…
9000 stats per second!
A new approach: storage events
Subscribe to events
OK
Something happened #1
Something happened #2
Something happened #3
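The contrast between the polling slide and the subscription slide can be sketched with a toy, in-memory model. This is not dCache code: `ToyStorage`, `is_on_disk`, `subscribe` and `stage` are hypothetical names invented for illustration. The sketch counts how many "is it on disk?" queries a polling client issues, compared with how many notifications a subscriber receives for the same work.

```python
from typing import Callable, Dict, List


class ToyStorage:
    """In-memory stand-in for a storage system (hypothetical, not dCache's API)."""

    def __init__(self, names: List[str]) -> None:
        self.on_disk: Dict[str, bool] = {name: False for name in names}
        self.queries = 0  # number of "is this file on disk?" requests served
        self.subscribers: List[Callable[[str], None]] = []

    def is_on_disk(self, name: str) -> bool:
        self.queries += 1
        return self.on_disk[name]

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def stage(self, name: str) -> None:
        """Simulate a stage from tape completing, then notify subscribers."""
        self.on_disk[name] = True
        for callback in self.subscribers:
            callback(name)


def poll(storage: ToyStorage, names: List[str]) -> int:
    """Polling client: after each staged file, re-ask about every pending file."""
    pending = list(names)
    for name in names:  # files arrive on disk one at a time
        storage.stage(name)
        pending = [n for n in pending if not storage.is_on_disk(n)]
    return storage.queries


def listen(storage: ToyStorage, names: List[str]) -> List[str]:
    """Event client: subscribe once, then simply collect notifications."""
    received: List[str] = []
    storage.subscribe(received.append)
    for name in names:
        storage.stage(name)
    return received
```

With 100 files staged one at a time, the polling client in this model issues 100 + 99 + … + 1 queries, while the subscriber receives one notification per file and issues no queries at all.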
New solutions to old problems:
Upload
OK
Delete
OK
Rucio
Subscribe …
OK
File uploaded
File deleted
Stage files
Request queued
OK
File #16 on disk
Subscribe …
● User- and internally triggered events:
  ● Data uploaded
  ● Data deleted/renamed/moved
  ● Tape flush/stage operations
● Uses: update catalogue, metadata extraction, data normalisation, build derived data, …
● Two event systems:
  ● Site integration (Kafka)
  ● Per user events (SSE/inotify)
(DEMO :-)
dCache innovations: Distributed storage & Data Lakes
Data Lakes: distributed resources
● dCache has over a decade of production use as a data lake:
  ● NDGF is a distributed dCache, spread over five countries.
  ● AGLT2 is a distributed dCache, spread over two campuses.
● dCache can already provide protocol-based QoS; e.g., cache data for NFS access, read remotely for HTTP/GridFTP.
● Currently building new testbed to demonstrate existing solutions and improve upon them:
  Hamburg → Zeuthen (RTT: ~5 ms); Hamburg → Moscow (RTT: ~70 ms)
● Adding ability to provide cached data when detached:
  A “satellite” can offer data if disconnected from the rest of dCache.
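To see why the round-trip times matter, consider a client that issues small requests strictly one after another. As a rough upper bound (an illustrative simplification that ignores server time, bandwidth and pipelining, and is not a measurement from the testbed), each request costs at least one round trip:

```python
def max_sequential_requests_per_second(rtt_ms: float) -> float:
    """Upper bound on strictly sequential requests when each needs one round trip."""
    return 1000.0 / rtt_ms


# RTT values are the approximate figures quoted for the testbed links.
for link, rtt_ms in [("Hamburg -> Zeuthen", 5.0), ("Hamburg -> Moscow", 70.0)]:
    rate = max_sequential_requests_per_second(rtt_ms)
    print(f"{link}: at most {rate:.0f} sequential requests per second")
```

Under these assumptions the ~5 ms link allows about 200 sequential requests per second and the ~70 ms link about 14, which is one reason to serve latency-sensitive access from a local cache.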
Data Lakes: cloud bursting
● dCache stores data in either a local filesystem or as objects within a Ceph cluster.
● Two new developments:
  ● Storing data within an S3 endpoint
  ● Dynamic pools: just start a dCache pool and that capacity becomes usable.
● Together, these support the cloud bursting use-case:
  ● As cloud capacity “comes online”, either due to load (cloud burst) or due to resources being cheap (Amazon grants), start a dCache pool.
  ● Jobs can run “in the cloud” with dCache taking care of any data movement.
dCache innovations: Delegated Authorisation with Macaroons
Macaroons: delegated authorisation
Photo by Alan Cleaver (CC-BY)
Example use: community portals / BOINC
1. Request data
2. Request a macaroon
3. Request data directly from dCache
[Diagram labels: User Database; dCache. Messages shown: GET, 307, GET]
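A minimal sketch of the redirect a portal might send in this flow. It assumes the macaroon is handed to dCache as an `authz` query parameter on the data URL; treat that as an assumption to check against the dCache documentation. The host name, path and token value are placeholders, and `build_redirect` is a hypothetical helper, not part of any portal or dCache API.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_redirect(data_url: str, macaroon: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a 307 redirect to a macaroon-bearing URL."""
    parts = urlsplit(data_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("authz", macaroon))  # assumed parameter name
    location = urlunsplit(parts._replace(query=urlencode(query)))
    return 307, {"Location": location}


# Placeholder URL and token, for illustration only.
status, headers = build_redirect(
    "https://dcache.example.org/data/run42.h5", "MDAxPLACEHOLDER"
)
print(status, headers["Location"])
```

The user's client then follows the redirect and fetches the data directly from dCache, so the portal never proxies the bytes.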
Example use: ad-hoc sharing
1. Request a macaroon
2. Send to colleague (e.g. via email)
3. Use macaroon
[Diagram labels: GET/PUT/DELETE; dCache]
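A macaroon can be passed on and narrowed along the way because of its chained-HMAC construction: each caveat is folded into the signature, so any holder can add restrictions, but nobody without the secret key can remove them. The sketch below illustrates only that idea. It is a simplification, not the libmacaroons wire format or dCache's implementation, and the caveat strings are illustrative.

```python
import hashlib
import hmac
from typing import List, Tuple

Macaroon = Tuple[str, List[str], bytes]  # (identifier, caveats, signature)


def _mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


def mint(secret: bytes, identifier: str) -> Macaroon:
    """Only the service holding `secret` can mint a macaroon."""
    return identifier, [], _mac(secret, identifier.encode())


def add_caveat(macaroon: Macaroon, caveat: str) -> Macaroon:
    """Any holder can attenuate: the old signature becomes the new HMAC key."""
    identifier, caveats, signature = macaroon
    return identifier, caveats + [caveat], _mac(signature, caveat.encode())


def verify(secret: bytes, macaroon: Macaroon) -> bool:
    """Recompute the chain from the secret and compare signatures.

    This checks integrity only; the service must still evaluate each caveat.
    """
    identifier, caveats, signature = macaroon
    expected = _mac(secret, identifier.encode())
    for caveat in caveats:
        expected = _mac(expected, caveat.encode())
    return hmac.compare_digest(expected, signature)
```

Stripping a caveat from a received token leaves a signature that no longer matches the recomputed chain, so the token is rejected.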
dCache Workshop: 2019-05-21 to 2019-05-22
● Located in Madrid, Spain.
● Learn more about latest developments in dCache
● Opportunity to discuss issues directly with dCache developers
● Share stories with dCache admins
● Help shape the future direction of dCache.
https://indico.desy.de/indico/event/22170/
The take-home message
● dCache is advanced storage software for data-intensive science.
● dCache:
  ● has decades of production use throughout the world,
  ● provides scalable resources, used by many scientific disciplines,
  ● offers innovative solutions that help drive the next generation of scientific discovery.
Backup slides
dCache 101: Motivation
● Data never fits into a single server
  ● Multiple servers
  ● Off-load to tape
● Growing number of client hosts
  ● Mainframe vs Linux cluster
● Control over hardware/OS selection
  ● Better tender offers
  ● Use and enhance local expertise
dCache 101: Design
● Single-rooted namespace, distributed data
● Client talks to namespace for metadata operations only
● Bandwidth and performance grow with number of data servers
● Standard clients (OS native or experiment)
● Some data can be offloaded to tape
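The split between namespace and data servers can be illustrated with a toy model. Everything here is hypothetical (class names, pool names, paths, and the round-robin placement done by the client). It only shows the shape of the design: the namespace answers metadata questions, and the bytes move to and from whichever data server holds the file.

```python
import itertools
from typing import Dict, List, Tuple


class Pool:
    """A data server: stores file contents, knows nothing about paths."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blobs: Dict[int, bytes] = {}


class Namespace:
    """Single-rooted namespace: maps a path to (file id, pool name); holds no data."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, str]] = {}
        self._ids = itertools.count(1)

    def create(self, path: str, pool_name: str) -> int:
        file_id = next(self._ids)
        self._entries[path] = (file_id, pool_name)
        return file_id

    def locate(self, path: str) -> Tuple[int, str]:
        return self._entries[path]


class Client:
    def __init__(self, namespace: Namespace, pools: List[Pool]) -> None:
        self.namespace = namespace
        self.pools = {pool.name: pool for pool in pools}
        self._next_pool = itertools.cycle(pools)  # naive placement, a simplification

    def write(self, path: str, data: bytes) -> str:
        pool = next(self._next_pool)
        file_id = self.namespace.create(path, pool.name)  # metadata operation
        pool.blobs[file_id] = data                        # data goes to the pool
        return pool.name

    def read(self, path: str) -> bytes:
        file_id, pool_name = self.namespace.locate(path)  # metadata operation
        return self.pools[pool_name].blobs[file_id]       # data comes from the pool
```

Because file contents never pass through the namespace, adding pools adds bandwidth without adding load to the metadata path, at least in this simplified picture.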
Processing data without user credentials / BOINC
2. Request a macaroon
3. Add caveats
4. Request data directly from dCache
[Diagram messages: GET, 307, GET]
What are macaroons good for? HTTP 3rd party copies
1. Request copy
2. Request a macaroon
3. Add caveats
4. COPY with embedded macaroon
5. GET with macaroon
[Diagram label: FTS]
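Step 4 could look roughly like the request built below. The sketch assumes the HTTP third-party-copy convention used in the WLCG community, in which a pull-mode COPY request names the remote side in a `Source` header and any header prefixed with `TransferHeader` is forwarded, minus the prefix, on the resulting GET. Host names and the token are placeholders; `build_tpc_pull` and `headers_for_source` are hypothetical helpers. The code only builds a description of the request and sends nothing.

```python
from typing import Dict

PREFIX = "TransferHeader"


def build_tpc_pull(destination_url: str, source_url: str, source_macaroon: str) -> Dict[str, object]:
    """Describe a pull-mode COPY sent to the destination storage system."""
    return {
        "method": "COPY",
        "url": destination_url,
        "headers": {
            "Source": source_url,
            # Forwarded to the source as "Authorization: Bearer ..." (assumed convention).
            PREFIX + "Authorization": f"Bearer {source_macaroon}",
        },
    }


def headers_for_source(copy_headers: Dict[str, str]) -> Dict[str, str]:
    """Headers the destination would place on its GET to the source (step 5)."""
    return {
        name[len(PREFIX):]: value
        for name, value in copy_headers.items()
        if name.startswith(PREFIX)
    }
```

Embedding a caveat-restricted macaroon this way means the transfer service does not need to delegate the user's full credential to the storage system.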
What are macaroons good for? Enforcing catalogue permissions
1. Request access to data
2. Request a macaroon
3. Add caveats
4. Access data
[Diagram label: Rucio]
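Steps 2 and 3 might be sketched as follows. The sketch assumes that a dCache macaroon is requested with a POST carrying the content type `application/macaroon-request` and a JSON body listing caveats and a validity period, and that `path:` and `activity:` caveats use the syntax shown. Treat all of these as assumptions to check against the dCache documentation. For brevity the caveats are requested when the macaroon is created rather than appended afterwards, and `macaroon_request` is a hypothetical helper that builds the request without sending it.

```python
import json
from typing import Dict, List


def macaroon_request(path: str, activities: List[str], validity: str = "PT1H") -> Dict[str, object]:
    """Describe a request for a macaroon limited to one path and a set of activities."""
    body = {
        "caveats": [
            f"path:{path}",                       # assumed caveat syntax
            "activity:" + ",".join(activities),   # assumed caveat syntax
        ],
        "validity": validity,  # ISO 8601 duration
    }
    return {
        "method": "POST",
        "headers": {"Content-Type": "application/macaroon-request"},
        "body": json.dumps(body),
    }


# Illustrative path: the catalogue grants read-only access to one dataset.
request = macaroon_request("/experiment/dataset-1", ["DOWNLOAD", "LIST"])
print(request["body"])
```

In this pattern the catalogue decides who may access what, and the storage system only has to honour the caveats baked into the token.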
Comparison: it’s what industry is doing…
Comparison: it’s what Open-Source is doing…
dCache Storage Events: Kafka
[Diagram labels: billing; Log; created; staged]
dCache Server-Sent Events (SSE)
● Based on HTTP v1.1
● HTML 5 standard
  Support for many languages and web-browsers
● Initially adding support for inotify events
  (it’s how Linux does namespace notification)
● Plan to add:
  ● Locality change notification: flush, stage, …
  ● Transfer-related events
  ● QoS changes
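For orientation, the SSE wire format is simple enough to parse by hand: a stream of `field: value` lines, with a blank line ending each event. The sketch below parses such a stream from any iterable of text lines. The event name and JSON payload in the usage example are made up for illustration and are not dCache's actual event schema.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per event from a text/event-stream line sequence."""
    event_type = "message"          # default event type
    data: List[str] = []
    last_id: Optional[str] = None   # persists across events
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the event if it carried any data.
            if data:
                yield {"event": event_type, "data": "\n".join(data), "id": last_id}
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value


# Made-up stream for illustration.
stream = [
    "event: inotify\n",
    'data: {"mask": ["IN_CREATE"]}\n',
    "id: 7\n",
    "\n",
]
for event in parse_sse(stream):
    print(event["event"], event["data"])
```

In practice the lines would come from a long-lived HTTP response, and the last seen `id` can be sent back on reconnect so the server can replay missed events.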
Cheat sheet: Kafka vs SSE
                           Kafka                    SSE
Standard …                 Component                Protocol
What events does it see?   dCache internal events   Controlled
Main benefit               Easy integration         Built-in security
“Catch-up” storage         Memory & disk            Memory-only (currently)
Target audience            Site-level integration   Events for users
EOSC-Pilot demonstrator: EU-XFEL data ingest
[Diagram labels: new RAW data; extract metadata; create derived data; store derived file; update catalog; metadata catalog]
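The demonstrator's flow can be read as an event handler: a "new RAW data" event triggers metadata extraction, creation of derived data, storage of the derived file and a catalogue update. The sketch below mirrors those steps with stand-in functions. All names, the dictionaries standing in for storage and catalogue, and the "derived" computation are hypothetical placeholders, not the EOSC-Pilot code.

```python
from typing import Dict


def extract_metadata(path: str, raw: bytes) -> Dict[str, object]:
    """Placeholder metadata: just the path and size."""
    return {"path": path, "size": len(raw)}


def create_derived(raw: bytes) -> bytes:
    """Placeholder for real processing: keep the first 16 bytes."""
    return raw[:16]


def on_new_raw_data(path: str, storage: Dict[str, bytes], catalogue: Dict[str, dict]) -> str:
    """Handle one 'new RAW data' event and return the derived file's path."""
    raw = storage[path]
    metadata = extract_metadata(path, raw)          # extract metadata
    derived_path = path + ".derived"
    storage[derived_path] = create_derived(raw)     # create and store derived file
    catalogue[path] = {**metadata, "derived": derived_path}  # update catalogue
    return derived_path
```

Driving such a handler from storage events means the workflow starts as soon as the data lands, without scanning the namespace for new files.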
Rucio demonstrator: automated replication with SSE
[Diagram labels: Upload; New data; SSE; Rucio; Third-party copy]
Future directions
● Complete SSE inotify support in dCache.
● Add additional events, based on initial feedback.
● Further explore automated data workflow (EU-XFEL use case).
● Work with Rucio team to explore SSE integration.
● Work with dCache sites to deploy storage events in production.