Advanced topic: The SRM protocol and the StoRM implementation
Ezio Corso (EGRID Project, ICTP)
TRANSCRIPT
Advanced topic on data management

I will briefly describe how the classic SE works, highlighting its design points and their consequences for file security: POSIX-like ACL access to files from the GRID.
I will then talk about the SRM protocol: its origin in allowing tape resources to be accessed from the GRID, with particular attention to the design differences with the classic SE; then SRM's transition to an interface for disk storage resources, and the differences with tape-based systems.
I will finally talk about StoRM: an SRM implementation that allows POSIX-like ACL access.
I. Classic SE
Classic SE

It allows disk resources to be accessed from the GRID. What makes a machine into an SE? Three components are needed:
A component that publishes and tells the GRID that it is an available storage resource.
The usual framework for authentication: GSI.
A component that actually moves the files around: the characterizing feature!
Classic SE

Component that allows the GRID to be aware of its presence, i.e. to be included in the GRID information system.
An LDAP server publishes information about the SE, organised according to the GlueSchema: specifically by the GlueSEUniqueID entity.
Information describing the SE, such as its name and the listening port of the service.
Information specific to each VO that the SE is serving, such as the local path to the file-holding directory, available space, etc.
Part of the information is updated dynamically, especially that concerning the disk space available and the disk space occupied. This is done through the LDAP providers found in /opt/lcg/libexec: the providers periodically run scripts which update the dynamic information.
Finally, the rest of the grid information system periodically polls the information made available by the SEs present there.
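To make the published information concrete, here is a minimal sketch: a hypothetical LDIF fragment of the kind an SE's LDAP server might publish, parsed with plain Python. The attribute names follow Glue conventions, but the exact attribute set and all values are illustrative assumptions, not a real GlueSchema record.

```python
# Hypothetical GlueSchema-style LDIF fragment; attribute names and values
# are illustrative only, not taken from a real SE.
SAMPLE_LDIF = """\
dn: GlueSEUniqueID=storage.egrid.it,mds-vo-name=local,o=grid
GlueSEUniqueID: storage.egrid.it
GlueSEName: EGRID:classic-se
GlueSEPort: 2811
GlueSESizeTotal: 2000
GlueSESizeFree: 750
"""

def parse_ldif_entry(text):
    """Parse a single-entry LDIF fragment into an attribute dictionary."""
    entry = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(": ")
        entry[key] = value
    return entry

entry = parse_ldif_entry(SAMPLE_LDIF)
# Attributes such as the free space are the dynamic part: the kind of value
# the LDAP information providers refresh periodically.
free_mb = int(entry["GlueSESizeFree"])
```

The static attributes (name, port) change rarely; the dynamic ones (free/used space) are the ones the provider scripts rewrite on each run.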
Classic SE

User authentication: Grid Security Infrastructure (GSI).
Core of the GLOBUS 2.4 libraries: used by the service in charge of moving files around! i.e. /opt/globus/lib/libglobus_gsi_credential_gcc32dbg.so.0, /opt/globus/lib/libglobus_gsi_proxy-core_gcc32dbg.so.0, etc.
Set of scripts run by cron jobs to manage pool accounts:
/opt/edg/sbin/edg-mkgridmap creates a gridmap file by reading a local configuration file that specifies the sources of allowed credentials, from an LDAP server or a specific file.
/opt/edg/sbin/lcg-expiregridmapdir removes the mapping to local credentials when a grid user is no longer working on that machine.
/opt/edg/sbin/edg-fetch-crl retrieves revocation lists of invalid certificates.
Classic SE

Component that carries out the functionality of moving files around the GRID.
In general it is just any implementation of a transport protocol that supports GSI!
GridFTP is the most common. RFIO. Anything that somebody comes up with, as long as it is GSI-enabled: it is just a matter of who will adopt it and use it!
Classic SE

GridFTP: essentially an FTP server extended/optimized for large data transfers:
Parallel streams for speed.
Checkpoints during file transfers, for later resuming.
Authentication through GSI certificates instead of user name + password.
Classic SE

Central point: it is FTP! A user can do whatever an FTP client allows to be done!
There is no separation between what can be done from the grid and the actual transport protocol.
There is no explicit, separate list of file manipulation operations that can be done from the grid!
There is no uniform view of the possible file manipulations: they are tied to the underlying transport protocol!
Depending on the protocol, you may not have the same functionality.
For the same functionality the specific protocol must be used: it may not be possible to access all SEs seamlessly!
Classic SE

Compare with CEs, which have an LRMS interface to forked jobs or to batch jobs. It is an abstraction layer over the kinds of computations that can be done. LRMS may not be a great protocol (gLite CEs are somewhat different)… yet it is an attempt to introduce an abstraction.
Classic SE

A more serious consequence of the lack of abstraction is how to apply POSIX ACL-like control on files from the grid. It is left up to the transport protocol!
For GridFTP: it is FTP modified for GSI. FTP allows file manipulation compatible with the underlying Unix filesystem permissions. If grid control on files is needed, it is the underlying filesystem that must be carefully managed:
Map users to specific local accounts, not pool accounts: each grid user can then be controlled individually once it gets into the machine.
Partition local accounts into especially created groups, reflecting the data access patterns.
A carefully crafted directory tree guides data access.
So a grid user with no access rights to a file is stopped, because the GridFTP server gets stopped in its tracks by the local filesystem!
Classic SE

In any case the proposed solution is problematic, because data may be present in several SEs:
Users must have the same UID across all SEs.
The directory structure must be replicated/synchronised across all SEs.
Users must be supplied with tools to manage permissions coherently across all SEs.
Classic SE

Central point: the GRID lacked the concept of access control within the same VO. It was only possible to find it when passing to the local machine, which had the means to enforce it: users + group membership.
Security therefore is set up behind the scenes, at the implementation level! No GRID concept involved! No GRID abstraction available to:
Express fine-grained authorization.
Express what can be accessed.
Check GRID credentials.
Classic SE

VOMS proxies and GridFTP: VOMS allows roles and groups to be defined, and therefore allows fine tuning of who the GRID user is. It is up to the system receiving these detailed credentials to decide what local resources to use.
For the SE there is still the same problem of explicitly listing what these resources are: the dependency on the transport protocol remains, as stated.
II. The SRM protocol
The SRM protocol

Storage Resource Manager protocol: originally devised to allow grid access to tape-based resources that had a disk area acting as a cache.
Staging of files: a request for a file arrives. If it is in the cache it is returned right away; otherwise it is first fetched from tape, copied to disk, and then returned. The system takes care of consistency between cache and tapes. This is needed to offset the latency due to the robotic arm switching tapes.
The SRM protocol

SRM was designed to handle that tape/disk-cache scenario from the GRID.
1. The presence of the cache area introduces the concept of file type:
Volatile: files get written in the cache and the system then removes them automatically after a lifetime expires.
Permanent: files that get into the cache are not removed automatically by the system.
Durable: files do have a lifetime that may expire, but the system does not remove them and instead sends an e-mail notification to the user.
The SRM protocol

2. File staging introduces the concept of asynchronous calls to get or put a file:
An SRM request is issued to get a file.
The server replies immediately, without waiting for staging to complete.
The server returns a Request Token, which the client uses to periodically poll the request's status.
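The request/token/poll cycle can be sketched as follows. This is a toy model, not the real SRM wire protocol: the class names, the token format, and the status strings are assumptions made for illustration; a real server would be staging the file from tape between polls.

```python
import itertools
import time
from urllib.parse import urlparse

class FakeSRMServer:
    """Toy stand-in for an SRM endpoint (illustrative, not the real API)."""
    def __init__(self):
        self._requests = {}
        self._ids = itertools.count(1)

    def prepare_to_get(self, surl):
        # The server answers immediately with a request token,
        # without waiting for staging to complete.
        token = f"req-{next(self._ids)}"
        self._requests[token] = {"polls_left": 2, "surl": surl}
        return token

    def status_of_get_request(self, token):
        req = self._requests[token]
        if req["polls_left"] > 0:  # file still being "staged"
            req["polls_left"] -= 1
            return "SRM_REQUEST_INPROGRESS", None
        # Once staged, hand back a TURL pointing at the cached copy.
        path = urlparse(req["surl"]).path
        return "SRM_SUCCESS", "gridftp://storage.egrid.it:2110" + path

def fetch_turl(server, surl, poll_interval=0.01):
    """Client side: issue the request, then poll with the returned token."""
    token = server.prepare_to_get(surl)
    while True:
        status, turl = server.status_of_get_request(token)
        if status == "SRM_SUCCESS":
            return turl
        time.sleep(poll_interval)
```

The key point the sketch captures: the client never blocks on staging; it holds only a token and decides for itself how often to poll.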
The SRM protocol

3. The cache area also introduces a partition of the file namespace:
The tape must store files: there have to be names that uniquely identify a file on tape!
The cache area must serve files: it may return a path to fetch the file on disk that is different from the name that uniquely identifies the file on tape, and it can easily support different fetching mechanisms… that is, different transport protocols!
SRM reflects this distinction in the concepts of SURLs and TURLs:
SURL: Storage URL. A name that identifies a grid file in SRM storage: it is what the GRID sees!
srm://storage.egrid.it:8334/old-stocks/NYSE.txt
TURL: Transfer URL. A name that identifies a transport protocol and the path to fetch the file: it is how the GRID moves the file around!
gridftp://storage.egrid.it:2110/home/ecorso/examples/2005/data.txt
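The SURL/TURL distinction is visible in the URL scheme alone, which a client can inspect before deciding what to do. A minimal sketch, assuming the scheme set shown here (the list of transfer schemes is an illustrative assumption, not an exhaustive one):

```python
from urllib.parse import urlparse

def classify(url):
    """Tell a SURL (the grid name of a file) apart from a TURL
    (a transport protocol plus the path to fetch the file)."""
    scheme = urlparse(url).scheme
    if scheme == "srm":
        return "SURL"
    # Illustrative set of transfer schemes; a real deployment
    # advertises whichever GSI-enabled protocols it supports.
    if scheme in ("gridftp", "gsiftp", "rfio"):
        return "TURL"
    raise ValueError(f"unknown scheme: {scheme}")

surl = "srm://storage.egrid.it:8334/old-stocks/NYSE.txt"
turl = "gridftp://storage.egrid.it:2110/home/ecorso/examples/2005/data.txt"
```

A client only ever quotes SURLs to the SRM server; only after receiving a TURL does it know which transfer client (GridFTP, RFIO, …) to invoke.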
The SRM protocol

Central point: SRM introduces an abstraction that separates the transfer protocol from the file operation itself. Although introduced to handle the cache area, it also solves the classic SE issues: it decouples file operations from the transfer protocol!
The SRM protocol

Direct consequence: SRM servers do not move files in and out of GRID storage! They only return TURLs! It is up to the SRM client, once it gets a TURL, to call a GridFTP/RFIO/etc. client for moving files!
SRM acts only as a broker for file management requests! Transfer is decoupled from data presentation!
The SRM protocol

Extra features and concepts in the protocol:
The big issue of not running out of space during a large file transfer. The system is used by the HEP community to store/manage huge amounts of data from the LHC. SRM therefore introduced a space management and reservation interface.
The SRM protocol

It distinguishes three types of reserved disk space:
Volatile: will be freed by the system as soon as its lifetime expires.
Permanent: will not be freed by the system.
Durable: will not be freed, but the user that allocated it will be warned.
Space type and file type cannot be mixed in arbitrary ways: Permanent space is able to host all three types of files, while Volatile space can only host Volatile files.
The general way of working: a space request is made; the server returns a SpaceToken; all subsequent SRM calls made by the client pass on the token; the SRM server keeps track of tokens and recognises the allocated space.
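The SpaceToken workflow and the space-type/file-type constraints can be sketched together. The class and method names are illustrative, not the SRM API; the talk only fixes two rows of the compatibility table (Permanent space hosts everything, Volatile space only Volatile files), so the row for Durable space below is an assumption of this sketch.

```python
import itertools

# File types each space type may host. Only the Permanent and Volatile
# rows are stated in the talk; the Durable row is an assumption.
ALLOWED = {
    "Permanent": {"Volatile", "Durable", "Permanent"},
    "Durable": {"Volatile", "Durable"},  # assumption, see lead-in
    "Volatile": {"Volatile"},
}

class SpaceManager:
    """Toy broker: reserve space, get a SpaceToken back, quote the token
    on later calls. Illustrative only, not the real SRM interface."""
    def __init__(self):
        self._spaces = {}
        self._ids = itertools.count(1)

    def reserve_space(self, space_type, size_mb):
        token = f"space-{next(self._ids)}"
        self._spaces[token] = {"type": space_type, "free": size_mb}
        return token

    def put_file(self, token, file_type, size_mb):
        space = self._spaces[token]  # the server recognises the token
        if file_type not in ALLOWED[space["type"]]:
            raise ValueError(f"{file_type} file not allowed in {space['type']} space")
        if size_mb > space["free"]:
            raise ValueError("not enough reserved space")
        space["free"] -= size_mb
```

Because every later call quotes the token, the transfer can never silently spill outside the reserved area: the space check happens at request time, not mid-transfer.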
The SRM protocol

The protocol calls: Data Transfer Functions. A misnomer… no data is moved by an SRM server!
srmPrepareToPut, srmPrepareToGet: for putting a file into GRID storage or getting one out.
srmStatusOfPutRequest, srmStatusOfGetRequest: for polling!
They work on SURLs!
The SRM protocol

The protocol calls: cache area management.
srmExtendFileLifeTime: for extending the lifetime of volatile files.
srmRemoveFiles: to remove permanent files.
srmReleaseFiles, srmPutDone: to force early lifetime expiry.
The SRM protocol

The protocol calls: directory functions to manage files on tape.
srmRmdir, srmMkdir, srmRm, srmLs. They work on SURLs!
The SRM protocol

The protocol calls: space management functions.
srmReserveSpace, srmReleaseSpace, srmGetSpaceMetaData. A SpaceToken is returned and used with all the Data Transfer Functions.
III. SRM applied to disk storage!
SRM applied to disk storage!

SRM addresses the issues of the classic SE: it is natural to use it also for disk resources. There was also another important driving force for its adoption: many facilities were in place for LHC analysis of data coming from the experiments' production centres. These facilities had high-performance storage solutions in place, employing parallel disk file systems such as GPFS and Lustre. With the advent of GRID technologies it became necessary to adapt the existing installations to the GRID.
SRM applied to disk storage!

The context of operation is now different: no tape with a cache in between. In general all concepts are kept, with slight semantic adjustments:
The SURL/TURL distinction is kept: it decouples the transfer protocol from data presentation, as stated.
The three file types are kept: some files may be copied and live just for a certain amount of time.
Space reservation is kept: it is an important functionality.
Directory functions are kept.
SRM applied to disk storage!

Some compromises:
The asynchronous nature of srmPrepareToGet, srmPrepareToPut and srmCopy remains, although with no staging it no longer makes sense.
The SpaceType distinction makes less sense: arguably the whole disk can be seen as permanent space, and so allows all three file types, akin to tapes, which are permanent by their nature.
The releasing of files and lifetime extension remain for volatile files; srmRemoveFiles, meant for managing cache files, does not make sense.
IV. StoRM SRM implementation
StoRM SRM implementation

Result of a collaboration between:
INFN - Grid.IT Project, from the Physics community
ICTP - EGRID Project: to build a pilot national grid facility for research in Economics and Finance (www.egrid.it)
StoRM SRM implementation

StoRM's implementation of SRM 2.1.1 is meant to meet three important requirements from the Physics community:
Large volumes of data straining disk resources: Space Reservation is paramount.
Boosted performance for data management: direct POSIX I/O calls.
Security on data as expressed by VOMS: strategic integration with VOMS proxies.
StoRM SRM implementation

EGRID requirements:
Data comes from Stock Exchanges, with very strict, legally binding disclosure policies: POSIX-like ACL access from the GRID environment.
Promiscuous file access: the existing file organisation on disk must be seamlessly available from the grid, and files entering from the grid must blend seamlessly with the existing file organisation. Very challenging; probably only partly achievable!
StoRM: a disk-based storage resource manager that allows for controlled access to files. A major opportunity for low-level intervention during implementation.
StoRM SRM implementation

How StoRM solves POSIX-like ACL access from the GRID: all file requests are brokered through the SRM protocol. When StoRM receives an SRM request for a file, it asks the policy source for the access rights of the given grid credentials to the given SURL. The check is made at the grid credential level, not at the local user level as before! And it is done on a grid view of the file, as identified by the SURL!
StoRM SRM implementation

The only part of the implementation outside of the protocol is the policy source: a GRID service that is able to formulate/express physical access rules to resources. StoRM leverages the grid's LogicalFileCatalogue (LFC) as its policy source. The LFC is intended for Logical Names, so StoRM stretches its use; still, it is very GRID-friendly: it is not a proprietary solution!
It would be better to have this explicitly in the SRM protocol: SRM 2.1.1 does have some permission functions (srmSetPermission, srmReassignToUser, srmCheckPermission), but their expressive power is weak, and in the next version of the protocol they will be re-addressed.
StoRM SRM implementation

A last note: physical enforcement is done through just-in-time ACL setup.
Initially no file has any ACL set up: no user can access the files.
The local Unix account corresponding to the grid credentials is determined.
An ACL granting the requested access is set up for that local user.
The ACL is removed when the file is no longer needed.
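The just-in-time cycle amounts to wrapping each authorized access in a grant/revoke pair of POSIX ACL operations. A minimal sketch that only builds the `setfacl` command lines (it does not execute them, so it can be inspected safely); the user name, path, and the read/write permission mapping are illustrative assumptions, and the real StoRM internals may differ:

```python
def grant_commands(local_user, path, access):
    """Build the setfacl invocation for a just-in-time grant.
    The access->permission-bits mapping here is illustrative."""
    perms = {"read": "r--", "write": "rw-"}[access]
    return [["setfacl", "-m", f"u:{local_user}:{perms}", path]]

def revoke_commands(local_user, path):
    """Build the setfacl invocation that drops the ACL entry again
    once the file is no longer needed."""
    return [["setfacl", "-x", f"u:{local_user}", path]]

# Example cycle for a hypothetical pool account and file:
grant = grant_commands("egrid001", "/storage/old-stocks/NYSE.txt", "read")
revoke = revoke_commands("egrid001", "/storage/old-stocks/NYSE.txt")
```

Because no ACL exists outside the grant/revoke window, the local filesystem enforces exactly the decision that the policy source made at the SURL level.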
Advanced topic on data management
Thank-you!