
INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

DAGs with data placement nodes: the “shish-kebab” jobs

Francesco Prelz, Enzo Martelli, INFN Milano


JRA1 All Hands meeting, Brno, 20-22 June 2005


Summary

• Why should we bother to schedule data jobs?

• Fundamental ingredients of data jobs:
– Quoting Ian Bird, the SRM functionality foreseen in LCG is:

V1.1 + space management, pin/unpin, etc.
Not the full set of V2.1
V3 not required
CMS still to confirm agreement with this set

– Should any additional low-level interface be considered?

• What interaction with matchmaking?
– We consider these scenarios:

A job needing to reserve space (for output) on a given tactical (or even strategic) SE, and to release it at the end.

A job needing to pre-stage a file from a mass-storage system, and/or to keep the file pinned until the end of execution.

– Should anything else be considered?


The fundamental concept

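• Stage-in

• Execute the job

• Stage-out

[Figure: this sequence extended into a chain of nodes: Allocate space for input & output data → Stage-in → Execute the job → Stage-out → Release any temporary space used. The nodes are labelled as either “Data Placement Jobs” or “Computational Jobs”.]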
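The chain in the figure is a plain DAG in which every node has exactly one parent. As a minimal sketch, assuming a hypothetical Python representation (the `Node` and `NodeKind` types and the helper names are illustrative only, not part of gLite or Condor DAGMan), the shish-kebab job can be written as an ordered list of typed nodes from which the parent → child edges follow directly. Classifying allocation, stage-in, stage-out and release as data placement nodes is our reading of the figure.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class NodeKind(Enum):
    DATA_PLACEMENT = "data placement"
    COMPUTATIONAL = "computational"


@dataclass(frozen=True)
class Node:
    name: str
    kind: NodeKind


def shish_kebab_chain() -> list[Node]:
    """Return the nodes of one shish-kebab job in execution order.

    Data placement nodes are skewered around a single computational node.
    """
    dp, comp = NodeKind.DATA_PLACEMENT, NodeKind.COMPUTATIONAL
    return [
        Node("Allocate space for input & output data", dp),
        Node("Stage-in", dp),
        Node("Execute the job", comp),
        Node("Stage-out", dp),
        Node("Release any temporary space used", dp),
    ]


def edges(chain: list[Node]) -> list[tuple[str, str]]:
    """Parent -> child dependencies of a linear chain."""
    return [(a.name, b.name) for a, b in zip(chain, chain[1:])]


if __name__ == "__main__":
    for parent, child in edges(shish_kebab_chain()):
        print(f"{parent} -> {child}")
```

Run as a script, it prints the four parent -> child dependencies of the chain.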


Just a few more details

[Figure: the nodes Match-making, Allocate space for input & output data, Stage-in, Execute the job and Stage-out, annotated with the questions below.]

• Match-making: should we deal with multiple matches?

• For how long? File pinning should probably be renewed.

• How does the executable find the files?
– Always via POSIX, relative to the CWD, with a mapping that is known in advance and applied by the sites?
– Should the mapping be carried with the job?

• Where? Or: when should it occur?

• Files should be secured to “strategic” storage, but how hard should we try to move them to their final destination?
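The second option for finding files, carrying the mapping with the job, could look roughly like this. A minimal sketch, assuming a hypothetical `FileMapping` object shipped alongside the job that maps logical file names to paths relative to the job's working directory; the class, its `resolve` method and the `lfn:` names used below are illustrative only.

```python
from __future__ import annotations

import os
from pathlib import Path


class FileMapping:
    """Hypothetical logical-name -> relative-path mapping shipped with a job.

    The executable only opens POSIX paths relative to its CWD; the mapping
    tells it which local name holds which logical file.
    """

    def __init__(self, entries: dict[str, str]) -> None:
        for rel in entries.values():
            p = Path(rel)
            # Reject absolute paths and ".." so files stay inside the CWD.
            if p.is_absolute() or ".." in p.parts:
                raise ValueError(f"mapping must be relative to CWD: {rel!r}")
        self._entries = dict(entries)

    def resolve(self, logical_name: str, cwd: str | None = None) -> Path:
        """Return the POSIX path the executable should open."""
        base = Path(cwd if cwd is not None else os.getcwd())
        return base / self._entries[logical_name]
```

For example, `FileMapping({"lfn:/grid/vo/run1/events.dat": "input/events.dat"}).resolve("lfn:/grid/vo/run1/events.dat", cwd="/scratch/job1")` yields `/scratch/job1/input/events.dat`. Whether sites apply such a mapping or the job carries it is exactly the open question above; the sketch only shows the second variant.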


SRM APIs

• srmPrepareToGet
– IN: arrayOfFileRequest, userRequestDescription; OUT: requestToken, returnStatus
– Purpose: pins a file if the SRM already has it; otherwise the SRM allocates space, copies the file from its archive or a remote location, and pins it.
– userRequestDescription: size of space required, lifetime, etc.; requestToken: needed for further requests such as “extendLifeTime”.

• srmStatusOfGetRequest
– IN: requestToken; OUT: returnStatus
– Purpose: tells us when files are available and pinned.

• srmExtendFileLifeTime
– IN: requestToken, siteURL, newLifeTime; OUT: returnStatus
– Purpose: extends the lifetime of a file accessible via a URL.

• srmReserveSpace
– IN: sizeOfTotalSpaceRequired, lifetimeOfSpaceToReserve; OUT: spaceToken, returnStatus
– Purpose: allocates space with a lifetime policy; spaceToken is needed for further requests.

• srmReleaseSpace
– IN: spaceToken; OUT: returnStatus
– Purpose: releases space previously allocated.
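All of the SRM calls listed above follow the same token-based pattern: a request returns a token, and later calls (status, lifetime extension, release) refer back to it. The mock below is a minimal in-memory sketch of that pattern. It is not an SRM client (real SRM endpoints are web services): the Python method names merely mirror the operation names, and the shortened argument lists and lower-case status strings are simplified assumptions, not SRM status codes.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class MockSRM:
    """In-memory stand-in for an SRM endpoint (illustration only)."""

    disk: set = field(default_factory=set)      # SURLs already on disk
    archive: set = field(default_factory=set)   # SURLs only in archive/remote
    pins: dict = field(default_factory=dict)    # requestToken -> {SURL: lifetime}
    requests: dict = field(default_factory=dict)  # requestToken -> status
    spaces: dict = field(default_factory=dict)  # spaceToken -> (bytes, lifetime)
    _ids: itertools.count = field(default_factory=itertools.count)

    def srmReserveSpace(self, size_total: int, lifetime: int) -> tuple[str, str]:
        token = f"space-{next(self._ids)}"
        self.spaces[token] = (size_total, lifetime)
        return token, "ok"

    def srmPrepareToGet(self, surls: list[str], request_desc: dict) -> tuple[str, str]:
        """Pin files already on disk; otherwise copy them in first, then pin."""
        token = f"get-{next(self._ids)}"
        if any(s not in self.disk and s not in self.archive for s in surls):
            self.requests[token] = "failed"
            return token, "failed"
        self.disk.update(surls)  # simulated copy from archive or remote location
        lifetime = request_desc.get("lifetime", 3600)
        self.pins[token] = {s: lifetime for s in surls}
        self.requests[token] = "pinned"
        return token, "queued"

    def srmStatusOfGetRequest(self, request_token: str) -> str:
        return self.requests.get(request_token, "unknown")

    def srmExtendFileLifeTime(self, request_token: str, site_url: str,
                              new_life_time: int) -> str:
        files = self.pins.get(request_token, {})
        if site_url not in files:
            return "failed"
        files[site_url] = new_life_time
        return "ok"

    def srmReleaseSpace(self, space_token: str) -> str:
        return "ok" if self.spaces.pop(space_token, None) is not None else "failed"
```

In the mock, srmPrepareToGet completes at once, so a following srmStatusOfGetRequest already reports "pinned"; a real SRM may keep the request queued while it copies the file in.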


APIs used in each node

• Allocate space for input & output data: srmReserveSpace (either directly or via the reservation framework).

• Stage-in: srmPrepareToGet, wait, then srmStatusOfGetRequest. The SRM pins the file if it already has it; otherwise it allocates space, copies the file and pins it, so the preceding allocation may be avoided.

• File pinning: srmExtendFileLifeTime.

• Release any temporary space used: srmReleaseSpace (either directly or via the reservation framework).
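The stage-in node is a request followed by a wait. A minimal polling sketch, assuming the SRM operations are available as plain Python callables (hypothetical bindings) that report the simplified status strings `"pinned"` and `"failed"`; the doubling backoff, the 60-second cap and the default timeout are arbitrary choices, not values from the SRM specification.

```python
from __future__ import annotations

import time
from typing import Callable


def stage_in(prepare_to_get: Callable[[list[str], dict], tuple[str, str]],
             status_of_get_request: Callable[[str], str],
             surls: list[str],
             lifetime: int = 3600,
             poll_interval: float = 1.0,
             timeout: float = 600.0,
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Request pinning, then poll until the files are pinned.

    Returns the request token (needed later for srmExtendFileLifeTime).
    Raises RuntimeError on failure and TimeoutError past the deadline.
    """
    token, status = prepare_to_get(surls, {"lifetime": lifetime})
    if status == "failed":
        raise RuntimeError(f"srmPrepareToGet failed for {token}")
    waited = 0.0
    while True:
        status = status_of_get_request(token)
        if status == "pinned":
            return token
        if status == "failed":
            raise RuntimeError(f"stage-in failed for {token}")
        if waited >= timeout:
            raise TimeoutError(f"files not pinned after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval
        poll_interval = min(poll_interval * 2, 60.0)  # exponential backoff, capped
```

Passing `sleep` in as a parameter keeps the loop testable without real delays; the returned token is what a later srmExtendFileLifeTime call in the file-pinning step would refer to.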


How should pinning and reservation be renewed in the job flow?

• Should we add more ad-hoc machinery, as was done for proxy renewal?

• It is probably worth generalising the renewal solution so that it can renew the allocation of various reservable resources.

• We are studying how to integrate an architecture for resource reservation (see T. Ferrari/E. Ronchieri's talk).
– We'll need to resolve the renewal issues in that context.

• Should we have a different approach just for data matchmaking jobs? How?
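The generalised renewal suggested above means one mechanism for every reservable resource instead of per-resource machinery like proxy renewal. A minimal sketch of that idea, assuming each resource is wrapped in a hypothetical `Lease` with a `renew` callback (for a pinned file this might wrap srmExtendFileLifeTime; for a proxy, the existing renewal call); the `Renewer` class, its fixed safety margin and the convention that `renew` returns `None` on failure are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Lease:
    """Any reservable resource with a finite lifetime (proxy, pin, space)."""
    name: str
    expires_at: float                        # absolute time in seconds
    renew: Callable[[float], float | None]   # now -> new expiry, or None on failure


class Renewer:
    """A single generic renewal loop shared by all lease types."""

    def __init__(self, margin: float) -> None:
        self.margin = margin                 # renew this many seconds before expiry
        self.leases: list[Lease] = []

    def add(self, lease: Lease) -> None:
        self.leases.append(lease)

    def tick(self, now: float) -> list[str]:
        """Renew leases close to expiry; return the names that failed."""
        failed = []
        for lease in self.leases:
            if lease.expires_at - now > self.margin:
                continue                     # not due yet
            new_expiry = lease.renew(now)
            if new_expiry is None:
                failed.append(lease.name)
            else:
                lease.expires_at = new_expiry
        return failed
```

The renewer only reports failed renewals; what the job flow should do when, for example, a pin cannot be extended is left to the caller.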


Agreement Service Architecture

[Figure: Agreement Initiators, Agreement Offer, Storage/Computing/Network Agreement Service.]


Just a DAG? Really a DAG?

[Figure: the nodes Match-making, Stage-in, Execute job, Stage-out, File pinning and Release any temporary space used, annotated with the failure cases below.]

• Stage-in: this can also fail; what do we do? First release any temporary space used, then go back to match-making.

• Stage-out: this should likely be skipped in case of job failure, but we should not forget to release any temporary space used.

• File pinning: oh, this can fail, too!
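The failure paths on this slide are what break the DAG model: after a stage-in failure the flow releases space and loops back to match-making, which is a cycle. A minimal sketch of the flow as an explicit state machine, with the success or failure of each step supplied by a stub; the `State` names, the retry limit `max_matches`, treating a failed stage-out as a job failure, and marking the job FAILED when the final release fails are all assumptions for illustration, and file-pinning failures are not modelled.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    MATCH = auto()
    STAGE_IN = auto()
    EXECUTE = auto()
    STAGE_OUT = auto()
    RELEASE = auto()
    DONE = auto()
    FAILED = auto()


def run(outcomes: dict[State, list[bool]], max_matches: int = 3) -> list[State]:
    """Drive one job through the flow and return every state visited.

    outcomes[s] gives success (True) or failure (False) of successive
    attempts at state s; unlisted states, or exhausted lists, succeed.
    """
    attempts = {s: iter(v) for s, v in outcomes.items()}
    trace: list[State] = []
    state, matches, retry, job_ok = State.MATCH, 0, False, True
    while state not in (State.DONE, State.FAILED):
        trace.append(state)
        ok = next(attempts.get(state, iter(())), True)
        if state is State.MATCH:
            matches += 1
            state = State.STAGE_IN if ok else State.FAILED
        elif state is State.STAGE_IN:
            retry = not ok                   # failed: release, then re-match
            state = State.EXECUTE if ok else State.RELEASE
        elif state is State.EXECUTE:
            job_ok = ok                      # job failure skips stage-out
            state = State.STAGE_OUT if ok else State.RELEASE
        elif state is State.STAGE_OUT:
            job_ok = ok                      # a failed stage-out fails the job
            state = State.RELEASE
        elif state is State.RELEASE:
            if retry and matches < max_matches:
                retry, state = False, State.MATCH  # the back-edge that breaks the DAG
            elif retry:
                state = State.FAILED
            else:
                # On the final release, a failed srmReleaseSpace marks the job
                # FAILED here; which status it should really get is open.
                state = State.DONE if (job_ok and ok) else State.FAILED
    trace.append(state)
    return trace
```

For instance, `run({State.STAGE_IN: [False]})` visits MATCH, STAGE_IN, RELEASE and then MATCH again before reaching DONE.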


More details about Match-making

What data attributes should contribute to the rank?
• Currently: the number of close (administratively local) files.

• Should prefetch time estimates contribute? Is srmGetReqEstTime going to be available?

• Should the possibility of remote access be taken into account? Estimated size and number of accesses, if remote file access is allowed?


What data attributes should contribute to the requirements?
• This is the same as asking: should we allow a match to occur only after some independent data movement actions have been taken?
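Today the rank counts close files only; the bullets above ask whether prefetch estimates and remote access should also count, and whether requirements should force data movement before a match. A minimal sketch of both, assuming a hypothetical `SiteView` record per candidate site and arbitrary linear weights (with both weights at zero the rank reduces to the current close-file count). The prefetch estimate would come from srmGetReqEstTime only if that call becomes available; the `eligible` helper shows how a strict "all inputs close" requirement would delay any match until the data has been moved.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SiteView:
    """What the matchmaker might know about one candidate site (hypothetical)."""
    name: str
    close_files: int                          # inputs on an administratively local SE
    prefetch_estimate_s: float | None = None  # e.g. from srmGetReqEstTime, if available
    remote_bytes: int = 0                     # estimated bytes read via remote access


def rank(site: SiteView, w_prefetch: float = 0.0, w_remote: float = 0.0) -> float:
    """Close-file count, optionally penalised by prefetch time and remote reads."""
    score = float(site.close_files)
    if site.prefetch_estimate_s is not None:
        score -= w_prefetch * site.prefetch_estimate_s
    score -= w_remote * site.remote_bytes / 1e9  # penalty per GB read remotely
    return score


def eligible(site: SiteView, n_inputs: int, require_all_close: bool) -> bool:
    """Requirements check: optionally demand that every input is already close."""
    return not require_all_close or site.close_files >= n_inputs


def best_match(sites: list[SiteView], n_inputs: int,
               require_all_close: bool = False, **weights: float) -> SiteView | None:
    """Highest-ranked eligible site, or None if nothing matches."""
    candidates = [s for s in sites if eligible(s, n_inputs, require_all_close)]
    return max(candidates, key=lambda s: rank(s, **weights), default=None)
```

A linear penalty is only one possible choice; the weights would have to be tuned against real prefetch and remote-access costs.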


Other details

• What should be the status of a job that failed to release space?

• OK, but?

• And who should be told about this?


Non-conclusive questions...

• Did we get a reasonable view of the non-SRM-v1.1 functions that are going to be available?

• We will be test-driving the generic reservation framework, applied to storage.

• This will require applying some renewal/extension semantics: should they be added ad hoc?

• Handling job flows with data seems to require capabilities beyond a DAG.

• Should we be implementing a state machine? A shell? Any other idea?


References

• SRM V1 API: http://sdm.lbl.gov/srm-wg/doc/SRM.Joint.Functional.Design.Jan2002.pdf

• SRM V2 API: http://sdm.lbl.gov/srm-wg/doc/SRM.spec.v2.1.1.html