storage resource managers: essential components for the grid arie shoshani staff:

29
Computing Sciences Directorate, L B N L 1 SC 2003 Storage Resource Managers: Storage Resource Managers: Essential Components for the Essential Components for the Grid Grid Arie Shoshani Arie Shoshani Staff: Staff: Alex Sim, Junmin Gu, Alex Sim, Junmin Gu, Alex Romosan, Viji Natarajan Alex Romosan, Viji Natarajan Scientific Data Management Group Scientific Data Management Group Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory http://sdm.lbl.gov/srm http://sdm.lbl.gov/srm

Upload: olga-marsh

Post on 30-Dec-2015

25 views

Category:

Documents


0 download

DESCRIPTION

Storage Resource Managers: Essential Components for the Grid Arie Shoshani Staff: Alex Sim, Junmin Gu, Alex Romosan, Viji Natarajan Scientific Data Management Group Lawrence Berkeley National Laboratory http://sdm.lbl.gov/srm. Outline. What are Storage Resource Managers - Motivation - PowerPoint PPT Presentation

TRANSCRIPT

Computing Sciences Directorate, L B N L

1 SC 2003

Storage Resource Managers:Storage Resource Managers:Essential Components for the GridEssential Components for the Grid

Arie ShoshaniArie Shoshani

Staff:Staff:

Alex Sim, Junmin Gu,Alex Sim, Junmin Gu,

Alex Romosan, Viji NatarajanAlex Romosan, Viji Natarajan

Scientific Data Management GroupScientific Data Management Group

Lawrence Berkeley National LaboratoryLawrence Berkeley National Laboratory

http://sdm.lbl.gov/srmhttp://sdm.lbl.gov/srm

Computing Sciences Directorate, L B N L

2 SC 2003

OutlineOutline

• What are Storage Resource Managers - MotivationWhat are Storage Resource Managers - Motivation

• General Analysis Scenario and the use of SRMsGeneral Analysis Scenario and the use of SRMs

• SRM functionalitySRM functionality

• Real examples of working SRMsReal examples of working SRMs

• Advantages of using SRMsAdvantages of using SRMs

• Conclusions and Future WorkConclusions and Future Work

Computing Sciences Directorate, L B N L

3 SC 2003

MotivationMotivation

• Grid architecture needs to include reservation & Grid architecture needs to include reservation & scheduling of:scheduling of:• Compute resources• Storage resources• Network resources

• Storage Resource Managers (SRMs) role in the Storage Resource Managers (SRMs) role in the data grid architecturedata grid architecture• Shared storage resource allocation & scheduling• Especially important for data intensive applications• Often files are archived on a mass storage system (MSS)• large scientific collaborations (100’s of clients) –

opportunities for file sharing• File replication and caching may be used• Need to support non-blocking (asynchronous) requests

Computing Sciences Directorate, L B N L

4 SC 2003

Types of SRMsTypes of SRMs

• Types of storage resource managersTypes of storage resource managers• Disk Resource Manager (DRM)

• Manages one or more disk resources• Tape Resource Manager (TRM)

• Manages access to a tertiary storage system (e.g. HPSS)• Hierarchical Resource Manager (HRM=TRM + DRM)

• An SRM that stages files from tertiary storage into its disk cache

• SRMs and File transfersSRMs and File transfers• SRMs DO NOT perform file transfer• SRMs DO invoke file transfer service if needed

(GridFTP, FTP, HTTP, …)• SRMs DO monitor transfers and recover from failures

• TRM: from/to MSS• DRM: from/to network

Computing Sciences Directorate, L B N L

5 SC 2003

A A multi-filemulti-file request to a request to aDisk Resource ManagerDisk Resource Manager

...client

FileTransfer Service

DiskCache

filetransfer requests network

DRMDiskCache

...

DiskCache

FileTransfer Service

multi-filerequest

fileaccess

client

Tape System

Client-SRMCommunication

Computing Sciences Directorate, L B N L

6 SC 2003

Accessing Remote Accessing Remote Storage Resource ManagersStorage Resource Managers

Tape SystemDisk

Cache

filetransfer requests network

DRMDiskCache

...

DiskCache

multi-filerequest

fileaccess

...client client

DRMHRM

SRM-SRMCommunication

Computing Sciences Directorate, L B N L

7 SC 2003

General Analysis ScenarioGeneral Analysis Scenario

MSS

RequestExecuter

Storage Resource Manager

Metadatacatalog

Replicacatalog

NetworkWeatherService

logicalquery

network

clientclient ...

RequestInterpreter

requestplanning

A set oflogical files

Execution plan and site-specific

files

Client’s site

...Disk

Cache

DiskCache

ComputeEngine

DiskCache

Compute Resource Manager

Storage Resource Manager

ComputeEngine

DiskCache

Requests fordata placement andremote computation

Site 2Site 1 Site N

Storage Resource Manager

Storage Resource Manager

Compute Resource Manager

result files

ExecutionDAG

::Uniform SRM InterfaceUniform SRM Interface

Computing Sciences Directorate, L B N L

8 SC 2003

SRM is a ServiceSRM is a Service(OGSA, CORBA, C++, Java, …)(OGSA, CORBA, C++, Java, …)

• SRM functionalitySRM functionality• Manage space

• Negotiate and assign space to users• Manage “lifetime” of spaces

• Manage files on behalf of a user• Pin files in storage till they are released• Manage “lifetime” of files• Manage action when pins expire (depends on file types)

• Manage file sharing• Policies on what should reside on a storage resource at any one time• Policies on what to evict when space is needed

• Get files from remote locations when necessary• Purpose: to simplify client’s task

• Manage multi-file requests• A brokering function: queue file requests, pre-stage when possible

• Provide grid access to/from mass storage systems• HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE (Jlab), Castor

(CERN), MSS (NCAR), …

Computing Sciences Directorate, L B N L

9 SC 2003

SRM works with other SRMsSRM works with other SRMsas well as legacy systemsas well as legacy systems

by using GridFTPby using GridFTP

DRM

Disk Cache

Disk Cache

Disk Cache

Disk Cache

BerkeleyBerkeleyChicago Livermore

HRMGridFTPGridFTP GridFTPFTP

Disk Cache

RequestInterpreter

RequestManager

DRM GridFTP

client

server server server server

Logical Request

Data Path

Control path

Legend:

Computing Sciences Directorate, L B N L

10 SC 2003

Tomcat servlet engine

Tomcat servlet engine

MCSMetadata Cataloguing Services

MCSMetadata Cataloguing Services

RLSReplica Location Services

RLSReplica Location Services

SOAP

RMI

MyProxyserver

MyProxyserver

MCS client

RLS client

MyProxy client

GRAMgatekeeper

GRAMgatekeeper

CASCommunity Authorization Services

CASCommunity Authorization Services

CAS client

NCAR-MSSMass Storage System

HPSSHigh PerformanceStorage System

HPSSHigh PerformanceStorage System

DRMStorage Resource

Management

DRMStorage Resource

Management

HRMStorage Resource

Management

HRMStorage Resource

Management

HRMStorage Resource

Management

HRMStorage Resource

Management

HRMStorage Resource

Management

HRMStorage Resource

Management

gridFTP

gridFTP

gridFTPserver

gridFTPserver

gridFTPserver

gridFTPserver

gridFTPserver

gridFTPserver

gridFTPserver

gridFTPserver

openDAPgserver

openDAPgserver

gridFTPStripedserver

gridFTPStripedserver

LBNL

LLNL

USC-ISI

NCAR

ORNL

ANL

DRMStorage Resource

Management

DRMStorage Resource

Management

disk

disk

disk

disk

Earth System GridEarth System Grid

Computing Sciences Directorate, L B N L

11 SC 2003

Uniformity of Interface Uniformity of Interface Compatibility of SRMsCompatibility of SRMs

SRM SRM SRM

Enstore JASMine

ClientUSER/APPLICATIONS

Grid Middleware

SRM

DCache

SRM

CASTOR

SRM

DiskCache

Computing Sciences Directorate, L B N L

12 SC 2003

Where do SRMs belongWhere do SRMs belongin the Grid architecture?in the Grid architecture?

ComputeSystems

Networks

OtherStorage

systems

StorageResourceManager

ComputeResource

Management

General DataDiscoveryServices

CommunityAuthorization

Services

Application-Specific Data

Discovery Services

StorageManagement(Brokering)

ComputeScheduling(Brokering)

Data Filtering orTransformation

Services

DatabaseManagement

Services

RequestInterpretationand Planning

Services

File TransferService(GridFTP)

DataTransportServices

Monitoring/AuditingServices

Workflow orRequest

ManagementServices

Consistency Services(e.g., Update Subscription,Versioning, Master Copies)

DataFederationServices

RE

SO

UR

CE

:

CO

LLE

CT

I VE

1:

GE

NE

RA

LS

ER

VIC

ES

FO

RC

OO

RD

INA

TIN

GM

ULT

I PLE

RE

SO

UR

CE

S

CO

LLE

CT

IVE

2:

SE

RV

ICE

SS

PE

CIF

IC T

OA

PP

LIC

AT

ION

DO

MA

IN O

RV

IRTU

AL

OR

G.

ResourceMonitoring/

Auditing

FA

BR

ICC

ON

NE

CTI

VIT

Y

CommunicationProtocols (e.g.,TCP/IP stack)

Authentication andAuthorization

Protocols (e.g., GSI)

Data Filtering orTransformation

Services

CO

LL

EC

TI V

E

This figure based on theGrid Architecture paper by Globus Team

Mass StorageSystem(HPSS)

Computing Sciences Directorate, L B N L

13 SC 2003

SRMs provide a brokering serviceSRMs provide a brokering serviceby supporting by supporting multi-filemulti-file requests requests

ComputeSystems

Networks

OtherStorage

systems

StorageResourceManager

ComputeResource

Management

CommunityAuthorization

Services

Application-Specific Data

Discovery Services

StorageManagement(Brokering)

ComputeScheduling(Brokering)

Data Filtering orTransformation

Services

DatabaseManagement

Services

RequestInterpretationand Planning

Services

File TransferService(GridFTP)

DataTransportServices

Monitoring/AuditingServices

Workflow orRequest

ManagementServices

Consistency Services(e.g., Update Subscription,Versioning, Master Copies)

DataFederationServices

RE

SO

UR

CE

:S

HA

RIN

G S

ING

LER

ES

OU

RC

ES

CO

LLE

CT

I VE

1:

GE

NE

RA

LS

ER

VIC

ES

FO

RC

OO

RD

INA

TIN

GM

ULT

I PLE

RE

SO

UR

CE

S

CO

LLE

CT

IVE

2:

SE

RV

ICE

SS

PE

CIF

IC T

OA

PP

LIC

AT

ION

DO

MA

IN O

RV

IRTU

AL

OR

G.

ResourceMonitoring/

Auditing

FA

BR

ICC

ON

NE

CTI

VIT

Y

CommunicationProtocols (e.g.,TCP/IP stack)

Authentication andAuthorization

Protocols (e.g., GSI)

CO

LL

EC

TI V

E

This figure based on theGrid Architecture paper by Globus Team

Mass StorageSystem(HPSS)

General DataDiscoveryServices

Data Filtering orTransformation

Services

Computing Sciences Directorate, L B N L

14 SC 2003

DataMover: SRMs use in ESG and PPDGDataMover: SRMs use in ESG and PPDG for Robust Muti-file replicationfor Robust Muti-file replication

HRM-COPY(thousands of files)

SRM-GET (one file at a time)

GridFTP GET (pull mode)

stage filesarchive files

Network transfer

Get listof filesFrom directory

Recovers from file transfer failures

Anywhere

DiskCache

DataMover(Command-line Interface)

HRM(performs writes)

LBNL/ORNL

DiskCache

HRM(performs reads)

NCAR

NCAR-MSS

Recovers from staging failures

Recovers from archiving failures

Web-basedFile

MonitoringTool

Computing Sciences Directorate, L B N L

15 SC 2003

Concepts: Types of FilesConcepts: Types of Files

• Volatile: temporary files with a lifetime guaranteeVolatile: temporary files with a lifetime guarantee• Files are “pinned” and “released”• Files can be removed by SRM when released or when

lifetime expires

• PermanentPermanent• No lifetime• Files can only be removed by creator (owner)

• Durable: files with a lifetime that CANNOT be Durable: files with a lifetime that CANNOT be removed by SRMremoved by SRM• Files are “pinned” and “released”• Files can only be removed by creator (owner)• If lifetime expires – invoke administrative action (e.g. notify

owner, archive and release)

Computing Sciences Directorate, L B N L

16 SC 2003

Concepts: Types of SpacesConcepts: Types of Spaces

• TypesTypes• Volatile

• Space can be reclaimed by SRM when lifetime expires• durable

• Space can be reclaimed by SRM only if it does NOT contain files• Can choose to archive files and release space

• Permanent• Space can only be released by owner or administrator

• Assignment of files to spacesAssignment of files to spaces• Files can only be assigned to spaces of the same type

• Spaces can be reservedSpaces can be reserved• No limit on number of spaces• Space reference handle is returned to client• Total space of each type are subject to SRM and/or VO policies

• Default spacesDefault spaces• Files can be put into SRM spaces without explicit reservation• Defaults are not visible to client

• Compacting spaceCompacting space• Release all unused space – space that has no files or files whose

lifetime expired

Computing Sciences Directorate, L B N L

17 SC 2003

Concepts: Directory ManagementConcepts: Directory Management

• Usual unix semanticsUsual unix semantics• srmLs, srmMkdir, srmMv, srmRm, srmRmdir

• A single directory for all file typeA single directory for all file type• No directories for each type• File assignment to types is virtual• File can be placed in SRM-managed directories by

maitaining mapping to client’s directory

• Access control servicesAccess control services• Support owner/group/world permission

• Can only be assigned by owner• When file requested by user, SRM should check permission

with source site

Computing Sciences Directorate, L B N L

18 SC 2003

Examples of Directory StructuresExamples of Directory Structures(user defined)(user defined)

D1

D3D2

D4

F2 (P)

F4 (P) F5 (D)

F1 (D) F3 (V)

D1

D3D2D4

F1 (V) F2 (V) F3 (V) F4 (D) F5 (D) F6 (D) F7 (P) F8 (P)

(1) Mixed file types (2) By file type

• Supported function: ChangeFileType

• Advantage of (1): no need to move files when file types are changed

Computing Sciences Directorate, L B N L

19 SC 2003

Concepts: Space ReservationsConcepts: Space Reservations

• NegotiationNegotiation• Client asks for space: C-guaranteed, MaxDesired• SRM return: S-guaranteed <= C-guaranteed,

best effort <= MaxDesired

• Type of spaceType of space• Can be specified• Subject to limits per client (SRM or VO policies)• Default: volatile

• LifetimeLifetime• Negotiated: C-lifetime requested• SRM return: S-lifetime <= C-lifetime

• Reference handleReference handle• SRM returns space reference handle• User can provide: srmSpaceTokenDescription to recover handles

Computing Sciences Directorate, L B N L

20 SC 2003

Concepts: Transfer Protocol NegotiationConcepts: Transfer Protocol Negotiation

• NegotiationNegotiation• Client provides an ordered list• SRM return: highest possible protocol it supports

• ExampleExample• Protocols list: bbftp, gridftp, ftp• SRM returns: gridftp

• AdvantagesAdvantages• Easy to introduce new protocols• User controls which protocol to use• Default – SRM policy choice

• How it is returned?How it is returned?• The protocol of the Transfer URL (TURL)• Example: bbftp://dm.slac.edu/temp/run11/File678.txt

Computing Sciences Directorate, L B N L

21 SC 2003

Concepts: Multi-file requestsConcepts: Multi-file requests

• Can srmRequestToGet multiple filesCan srmRequestToGet multiple files• Required: Files URLs• Optional: space file type, space handle, Protocol list• Optional: total retry time

• Provide: Site URL (SURL)Provide: Site URL (SURL)• URL known externally – e.g. in Rep Catalogs• e.g. srm://sleepy.lbl.gov:4000/tmp/foo-123

• Get back: transfer URL (TURL)Get back: transfer URL (TURL)• Path can be different that in SURL – SRM internal mapping• Protocol chosen by SRM• e.g. gridftp://dm.lbl.gov:4000/home /level1/foo-123

• Managing request queueManaging request queue• Allocate space according to policy, system load, etc.• Bring in as many files as possible• Provide information on each file brought in or pinned• Bring additional files as soon as files are released• Support file streaming

Computing Sciences Directorate, L B N L

22 SC 2003

SRM functionalitySRM functionality

• Space reservationSpace reservation• Negotiate and assign space to users• Manage “lifetime” of spaces• Release and compact space

• File managementFile management• Assign space for putting files into SRM• Pin files in storage when requested till they are released• Manage “lifetime” of files• Manage action when pins expire (depends on file types)

• Get files from remote locations when necessaryGet files from remote locations when necessary• Purpose: to simplify client’s task• srmCopy: in “pull” and “push” modes

Computing Sciences Directorate, L B N L

23 SC 2003

SRM functionality (Cont’d)SRM functionality (Cont’d)

• Space management policies and file sharingSpace management policies and file sharing• Policies on what should reside on a storage resource at any one

time• Policies on what to evict when space is needed• Share files to avoid getting them from remote locations

• Manage multi-file requestsManage multi-file requests• Queues file requests, pre-stage when possible

• Status functionsStatus functions• Files: lifetime remaining, what’s available locally• Requests: what files are available (needed in lieu of callbacks)• Request summary: for progress report• Space metadata: space in use, space available, lifetime

• Provide grid access to/from mass storage systemsProvide grid access to/from mass storage systems• HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE (Jlab),

Castor (CERN), MSS (NCAR), SE (RAL) …

Computing Sciences Directorate, L B N L

24 SC 2003

SRM MethodsSRM Methods

File Movementsrm(Prepare)Get:srm(Prepare)Put:srmReplicate: Lifetime managementsrmReleaseFiles:srmPutDone:srmExtendFileLifeTime:

Terminate/resumesrmAbortRequest:srmAbortFilesrmSuspendRequest:srmResumeRequest: 

Space managementsrmReserveSpacesrmReleaseSpacesrmUpdateSpacesrmCompactSpace:srmGetCurrentSpace: FileType managementsrmChangeFileType:

Status/metadatasrmGetRequestStatus:srmGetFileStatus:srmGetRequestSummary:srmGetRequestID:srmGetFilesMetaData:srmGetSpaceMetaData:

Computing Sciences Directorate, L B N L

25 SC 2003

Summary: advantages of using SRMsSummary: advantages of using SRMs

• Synchronization between storage resourcesSynchronization between storage resources• Pinning file, releasing files• Allocating space dynamically on as “needed basis”

• Insulate clients from storage and network system failuresInsulate clients from storage and network system failures• Transient MSS failure• Network failures• Interruption of large file transfers

• Facilitate file sharingFacilitate file sharing• Eliminate unnecessary file transfers

• Support “streaming model”Support “streaming model”• Use space allocation policies by SRMs: no reservations needed• Use explicit release by client for reuse of space

• Control number of concurrent file transfersControl number of concurrent file transfers• From/to MSS – avoid flooding MSS and thrashing• From/to network – avoid flooding and packet loss

Computing Sciences Directorate, L B N L

26 SC 2003

Web-Based File Monitoring ToolWeb-Based File Monitoring Tool

Shows:-Files already transferred- Files during transfer- Files to be transferred

Also shows foreach file:-Source URL-Target URL-Transfer rate

Computing Sciences Directorate, L B N L

27 SC 2003

File tracking helps to identify File tracking helps to identify bottlenecksbottlenecks

Shows that archiving is the bottleneck

Computing Sciences Directorate, L B N L

28 SC 2003

File tracking shows recovery from transient File tracking shows recovery from transient failuresfailures

Total:45 GBs

Computing Sciences Directorate, L B N L

30 SC 2003

Ongoing and Future WorkOngoing and Future Work

• Ongoing workOngoing work• Developing Standard SRM interfaces

• Particle Physics Data Grid (PPDG) project• LBNL, TJNAF, FNAL

• European Data Grid (EDG) project• WP2 - data management• WP5 – mass storage

• Deployment• LBNL, BNL, ORNL, TJNAF, FNAL, CERN, (SE-England)

• Use of SRM by other agents• Storage Resource Broker (SDSC) calling HRM to Stage files from HPSS• GridFTP invoking HRM

• New Spec completed (SRM V2.1)• directory management • File/directory file movement• dynamic space management

• Future workFuture work• Access authorization – community access service (CAS)• “On-demand” space allocation, accounting, and charging• Replica management – invoke SRMs and RLS as a single service• Request executer (e.g. DAGMAN) to invoke SRMs• SRMs over NeST (Network STorage)