A JSDL Applications Repository and Data Staging Portal: Some New Parameter Sweep Developments and Data Transfer Requirements

David Meredith, STFC e-Science Centre, Daresbury Laboratory, UK

Geoff Williams, Oxford University Computing Lab, UK

What it does:

• Aim: Provide an easy way to access computing resources, execute installed applications, and stage/move data between different remote file systems (e.g. onto and off Grid resources).

• Browse application templates in the repository according to ‘categories of interest’ (e.g. bioinformatics, tutorials/examples, physics).

• Templates fully describe all of the requirements of an application for execution (‘ready to run’ applications that provide a starting point for new users).

• Users benefit from immediate access to the expertise, artefacts and configuration captured in application description templates (e.g. published and shared by domain-experts).

• Select, load, modify / tweak, save as personal template.

• Browse and perform file operations on different file systems (currently SRB, GridFTP, FTP, SFTP). List, upload, download, delete, rename.

• Recursive data copy between different file systems.

• Execute application and stage data in one action.

Example Use Case

(NGS Applications Repository)

Applications are described using the middleware-agnostic Job Submission Description Language (JSDL GUI editor).

Encourages a community to form around a “best practice” approach (OGF) and aids interoperability.

Middleware-specific dependencies are added at run time: the JSDL is converted into a middleware-specific scheme (e.g. RSL).
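
The run-time conversion step can be sketched as follows. This is a minimal illustration, not the portal's converter: the function name `jsdl_to_rsl` is hypothetical, only four POSIX elements are mapped, and the RSL output is a simplified Globus-style string with no quoting or escaping.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of JSDL 1.0 and its POSIX extension.
NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

# Sample document built around the gnuplot application used in these slides.
JSDL_DOC = """
<jsdl:JobDefinition xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
                    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl:ApplicationName>gnuplot</jsdl:ApplicationName>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/usr/local/bin/gnuplot</jsdl-posix:Executable>
        <jsdl-posix:Argument>control.txt</jsdl-posix:Argument>
        <jsdl-posix:Input>input.dat</jsdl-posix:Input>
        <jsdl-posix:Output>output1.png</jsdl-posix:Output>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""


def jsdl_to_rsl(jsdl_text):
    """Hypothetical helper: map a JSDL POSIX application to a simplified RSL string."""
    root = ET.fromstring(jsdl_text.strip())
    app = root.find(".//posix:POSIXApplication", NS)
    if app is None:
        raise ValueError("no POSIXApplication element found")

    def text(tag):
        node = app.find("posix:" + tag, NS)
        return node.text.strip() if node is not None and node.text else None

    args = [a.text.strip() for a in app.findall("posix:Argument", NS) if a.text]
    relations = [
        ("executable", text("Executable")),
        ("arguments", " ".join(args) or None),
        ("stdin", text("Input")),
        ("stdout", text("Output")),
        ("stderr", text("Error")),
    ]
    # Emit only the relations that have a value in the JSDL document.
    return "&" + "".join(f"({key}={value})" for key, value in relations if value)


rsl = jsdl_to_rsl(JSDL_DOC)
print(rsl)
```

Keeping the mapping in one place like this is what lets the stored templates stay middleware-agnostic: a different target scheme would only need a different converter.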

Ali Anjomshoaa, Fred Brisard, Michel Drescher, Donal K. Fellows, William Lee, An Ly, Steve McGough, Darren Pulsipher, Andreas Savva, Chris Smith

JSDL 1.0 is an OGF standard

JSDL 1.0 is published as GFD-R-P.56 – http://www.ggf.org/gf/docs/?final

<jsdl:Application>
  <jsdl:ApplicationName>gnuplot</jsdl:ApplicationName>
  <jsdl-posix:POSIXApplication>
    <jsdl-posix:Executable>/usr/local/bin/gnuplot</jsdl-posix:Executable>
    <jsdl-posix:Argument>control.txt</jsdl-posix:Argument>
    <jsdl-posix:Input>input.dat</jsdl-posix:Input>
    <jsdl-posix:Output>output1.png</jsdl-posix:Output>
  </jsdl-posix:POSIXApplication>
</jsdl:Application>
<jsdl:Resources> ….

JSDL

1. An XML Schema language for describing the requirements of computational jobs for submission to Grids.

2. Is agnostic of middleware - no dependencies on Globus, WSRF, gLite (means portal can be generic and not tied to any particular set of Grid technologies).

3. JSDL documents can be validated against the JSDL and JSDL POSIX XSD schemas to ensure their correctness.
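
As a rough illustration of what such checking buys, the sketch below runs a few structural checks with the Python standard library. It is not XSD validation (that needs a schema-aware library and the published schema files). The function name `check_jsdl` is hypothetical, and treating Executable as required follows the pseudo-schema in these slides rather than a reading of the schema itself.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def check_jsdl(text):
    """Hypothetical helper: return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != f"{{{JSDL_NS}}}JobDefinition":
        problems.append("root element is not jsdl:JobDefinition")
    if root.find(f"{{{JSDL_NS}}}JobDescription") is None:
        problems.append("missing jsdl:JobDescription")
    # Assumption taken from the slides' pseudo-schema: Executable has no '?' marker.
    for app in root.iter(f"{{{POSIX_NS}}}POSIXApplication"):
        if app.find(f"{{{POSIX_NS}}}Executable") is None:
            problems.append("POSIXApplication has no Executable")
    return problems


GOOD = f"""
<jsdl:JobDefinition xmlns:jsdl="{JSDL_NS}" xmlns:p="{POSIX_NS}">
  <jsdl:JobDescription>
    <jsdl:Application>
      <p:POSIXApplication><p:Executable>/bin/date</p:Executable></p:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""
BAD = GOOD.replace("<p:Executable>/bin/date</p:Executable>", "")

print(check_jsdl(GOOD))
print(check_jsdl(BAD))
```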

Input fields are pre-configured / filled out.

Fields are taken from the JSDL and JSDL-POSIX extension schemas.

POSIXApplication is a JSDL extension. It defines standard POSIX elements.

– stdin, stdout, stderr

– Working directory

– Command line arguments

– Environment variables

<POSIXApplication>

<Executable ... />

<Input ... />?

<Output ... />?

<Error ... />?

<WorkingDirectory ... />?

</POSIXApplication>

Pre configured Job Detail

<jsdl1:Environment name="TMP">/tmp</jsdl1:Environment>
<jsdl1:Environment name="NGSMODULES">envVarValue1</jsdl1:Environment>
…..

Pre configured Environment Variables

Paste and parse command line arguments (space and/or line separated values)

<jsdl1:Argument>fasta34</jsdl1:Argument>
<jsdl1:Argument>-H</jsdl1:Argument>
<jsdl1:Argument>humanDNA2.input</jsdl1:Argument>
<jsdl1:Argument>/var/data/bioinformatics/..</jsdl1:Argument>
<jsdl1:Argument>S</jsdl1:Argument>

Pre configured Command Line Arguments
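
The paste-and-parse step can be sketched as below: a pasted block is split on spaces and line breaks and each token becomes one Argument element. The function name `arguments_to_elements` is hypothetical, and this simple split does not handle quoted arguments that contain spaces (Python's `shlex.split` would be one option for that).

```python
import xml.etree.ElementTree as ET

POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def arguments_to_elements(pasted):
    """Hypothetical helper: one jsdl-posix Argument element per pasted token."""
    elements = []
    # str.split() with no separator splits on any run of whitespace,
    # so space separated and line separated values are treated alike.
    for token in pasted.split():
        element = ET.Element(f"{{{POSIX_NS}}}Argument")
        element.text = token
        elements.append(element)
    return elements


pasted = "fasta34 -H\nhumanDNA2.input   S"
argument_texts = [e.text for e in arguments_to_elements(pasted)]
print(argument_texts)
```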

Named file systems used to declare mount points on the consuming system.

File system names are referenced throughout the portal (and JSDL doc) for substituting mount points.

Changes to a FS mount point will be updated automatically throughout the portal/JSDL.

Used when specifying path info e.g. locations to files/dirs, stage data locations etc.

<jsdl:FileSystem name="WORKINGDIR">
  <jsdl:MountPoint>/home/ngs0024/myScratchDir</jsdl:MountPoint>
</jsdl:FileSystem>
<jsdl:FileSystem name="DataDir">
  <jsdl:MountPoint>/home/ngs0024/myDataDir</jsdl:MountPoint>
</jsdl:FileSystem>
…
<jsdl1:Output filesystemName="WORKINGDIR">fasta.out</jsdl1:Output>

Pre configured Named File Systems
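
The substitution of mount points for file system names can be sketched as a simple lookup. The function name `resolve` and the dictionary layout are assumptions made for illustration; the mount points are the ones shown in the example above.

```python
import posixpath


def resolve(mount_points, filesystem_name, relative_path):
    """Hypothetical helper: turn (file system name, relative path) into a full path."""
    if filesystem_name not in mount_points:
        raise KeyError(f"unknown file system name: {filesystem_name!r}")
    # Strip a leading slash so the path always stays below the mount point.
    return posixpath.join(mount_points[filesystem_name], relative_path.lstrip("/"))


mount_points = {
    "WORKINGDIR": "/home/ngs0024/myScratchDir",
    "DataDir": "/home/ngs0024/myDataDir",
}

before = resolve(mount_points, "WORKINGDIR", "fasta.out")

# Changing the mount point in one place changes every path that refers to it.
mount_points["WORKINGDIR"] = "/scratch/ngs0024"
after = resolve(mount_points, "WORKINGDIR", "fasta.out")
print(before, after)
```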

JSDL Parameter Sweep Extensions http://forge.gridforum.org/sf/projects/jsdl-wg

1. A common requirement is to select a job and submit it ‘10, 50, 300’ times, each time making some modifications to the ‘original/master’ JSDL (e.g. arguments, parameters, output directory, input file).

2. The JSDL + PS extensions allow you to group the master JSDL with the required modifications (i.e. which JSDL fields require sweeping):

• Saves writing multiple separate JSDL docs.

• A swept value can be any value within the JSDL document itself.

• A swept value can be any value within a named file that is referenced by the JSDL (e.g. an input file).

• Actually yields multiple separate jobs (rather than solely parameter sweeps).

Recently submitted for public comment at OGF24, Sept.

1. Nest <Sweep> elements within a JSDL doc.

2. The <Assignment> identifies which set of <Parameters> should be swept / iterated using the given sweep <Function>.

3. <Parameter> + <Function> are abstract (can define different implementations as required).

4. Spec v1 Parameters:

• <DocumentNode>, <TemplateFile>

5. Spec v1 Functions:

• <Values>, <LoopInteger>, <LoopDouble>
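
The three v1 functions can be pictured as generators of value lists. The sketch below is an interpretation, not the specification: the Python names, the inclusive end bound and the `exceptions` argument are assumptions made for illustration.

```python
import math


def values(*items):
    """Explicit enumeration, in the spirit of <Values>."""
    return list(items)


def loop_integer(start, end, step=1, exceptions=()):
    """Integer loop in the spirit of <LoopInteger>; the end bound is assumed inclusive."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [i for i in range(start, end + 1, step) if i not in exceptions]


def loop_double(start, end, step):
    """Floating point loop in the spirit of <LoopDouble>; the end bound is assumed inclusive."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Compute the number of steps up front instead of accumulating rounding error.
    count = max(0, math.floor((end - start) / step + 1e-9) + 1)
    return [start + i * step for i in range(count)]


print(values("a", "b"))
print(loop_integer(1, 10, 3, exceptions=(4,)))
print(loop_double(0.0, 1.0, 0.5))
```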

JSDL Sweep Overview

[Figure labels: JSDL + PS; JSDL; PS ext]

Select + give new values for <App> element

Two arguments are swept, yielding 3 separate jobs

(1, 4, again), (2, 5, again), (3, 7, again)

Basic Example - Modify the command line using Values and a Loop
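
The basic example can be sketched as follows. The master argument list is hypothetical (the slide does not show it), and arguments are addressed by list position here, whereas a real document selects nodes inside the JSDL. The lockstep pairing of the two value lists is inferred from the three jobs listed above.

```python
def expand_lockstep(master_args, assignments):
    """Hypothetical helper: one job per step, replacing the swept positions in lockstep.

    assignments maps an argument position to the list of values it takes.
    """
    lengths = {len(vals) for vals in assignments.values()}
    if len(lengths) != 1:
        raise ValueError("lockstep sweeps need value lists of equal length")

    jobs = []
    for step in range(lengths.pop()):
        args = list(master_args)  # copy, so the master stays untouched
        for position, vals in assignments.items():
            args[position] = str(vals[step])
        jobs.append(args)
    return jobs


master = ["0", "0", "again"]  # hypothetical master command line arguments
jobs = expand_lockstep(master, {0: [1, 2, 3], 1: [4, 5, 7]})
for job in jobs:
    print(job)
```

Each returned list corresponds to one separate job, which is why a sweep of three steps yields three submissions rather than one.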

Portal Implementation

Select which values require sweeping

Portal Implementation

Build sweep:

Identify which parameter

Define function values

Note: more interface work is required (e.g. uploading a .csv file of values).

JSDL + JSDL POSIX

PS Extensions

Part 2: Data Staging / File Transfer (Portal is VFS client)

Portal is a single interface to different remote file systems (FTP, SRB, GridFTP, SFTP). Browse and perform file operations (upload, download, delete, list, rename).

Select files/dirs and ‘copy to opposite host’

Copy data between these different file systems.

Manual Copy Data Between Different File Systems

Compile data (spread over different file systems)

Copy data to target URI (e.g. SRB or wherever)

List of data from across the Grid that should be copied to the consuming system

Before job: src URI

After job: tgt URI

JSDL does not mandate the protocol / URI format.

Data is staged relative to named file systems.

<jsdl:DataStaging>
  <jsdl:FileName>Mg.psf</jsdl:FileName>
  <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>false</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://ngs.rl.ac.uk:2811/apps/Siesta_mpi/…</jsdl:URI>
  </jsdl:Source>
</jsdl:DataStaging>

Specify Data Staging Requirements of an Application
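
How a consumer might read a staging element can be sketched as below: a Source URI means the file is copied in before the job, a Target URI means it is copied out afterwards. The function name `describe_staging` and the `example.org` URI are hypothetical and stand in for real values.

```python
import xml.etree.ElementTree as ET

NS = {"jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl"}

# Hypothetical staging element; the URI is a placeholder, not a real endpoint.
STAGING = """
<jsdl:DataStaging xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:FileName>Mg.psf</jsdl:FileName>
  <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>false</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://example.org:2811/data/Mg.psf</jsdl:URI>
  </jsdl:Source>
</jsdl:DataStaging>
"""


def describe_staging(xml_text):
    """Hypothetical helper: summarise one DataStaging element as a dictionary."""
    element = ET.fromstring(xml_text.strip())

    def text(path):
        node = element.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "file_name": text("jsdl:FileName"),
        "filesystem": text("jsdl:FilesystemName"),
        "stage_in_from": text("jsdl:Source/jsdl:URI"),  # copied before the job
        "stage_out_to": text("jsdl:Target/jsdl:URI"),   # copied after the job
    }


staging = describe_staging(STAGING)
print(staging)
```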

SRB/ FTP

SFTP/ GSIFTP

Portal + *VFS client

File operations (list, upload, download, delete, rename)

Bit pipe (byte IO stream)

Authentication tokens (un/pw, x509)

Auth tokens only in memory on one server.

Self contained.

Piping bytes via portal server is not ideal (bottleneck, single point of failure, concurrency issues).

Current Data Staging / File Transfer Implementation (VFS)

Single interface to different remote file systems (SRB, GsiFTP, FTP, SFTP).

*VFS: Apache Commons VFS
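
The ‘bit pipe’ amounts to reading from one stream and writing to another in chunks, with every byte passing through the portal server. The sketch below uses in-memory streams as stand-ins for the remote file systems; it does not use Apache Commons VFS, and the name `pipe_bytes` is hypothetical.

```python
import io


def pipe_bytes(source, target, chunk_size=64 * 1024):
    """Copy a readable byte stream to a writable one in fixed-size chunks."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        target.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for a file on one remote system and a file on another.
src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
copied = pipe_bytes(src, dst)
print(copied)
```

Because the loop runs on the portal host, the cost of each transfer grows with file size on that one machine, which is the bottleneck noted above.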

SRB/ FTP

SFTP/ GSIFTP

Portal + VFS client

VFS clients

JMS QUEUE

Required / Suggested Architecture for Data Staging / File Transfer Service

File operations (list, upload, download, delete, rename)

Bit pipe (byte IO stream)

Authentication tokens (un/pw, x509)

Move file transfers to a different server (farm) to increase bandwidth and concurrency (large transfers).

Passing auth tokens around in messages (strong security required).

Development effort / testing.
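
The queue-based design can be sketched in miniature with a standard-library queue standing in for the JMS queue and threads standing in for the VFS client servers. All names here are hypothetical, the `fake_copy` function does no real I/O, and the sketch deliberately leaves out the authentication tokens that a real service would have to protect.

```python
import queue
import threading


def run_transfers(requests, copy, workers=3):
    """Hypothetical helper: hand (source, target) requests to a pool of workers."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            request = jobs.get()
            if request is None:  # sentinel: no more work for this worker
                jobs.task_done()
                return
            try:
                outcome = copy(request)
            except Exception as exc:  # record the failure instead of killing the worker
                outcome = exc
            with lock:
                results[request] = outcome
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for request in requests:
        jobs.put(request)
    for _ in threads:
        jobs.put(None)
    jobs.join()
    for thread in threads:
        thread.join()
    return results


def fake_copy(request):
    """Stand-in for a real transfer; only reports what it would do."""
    source, target = request
    return f"copied {source} -> {target}"


requests = [(f"sftp://a.example/f{i}", f"srb://b.example/f{i}") for i in range(5)]
results = run_transfers(requests, fake_copy)
print(len(results))
```

The portal only enqueues requests in this arrangement, so the byte streaming happens on the workers and can be scaled by adding more of them.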

Parameter Sweeping

• Portal role is the JSDL Producer (author/persist JSDL with sweep extensions for applications), not the JSDL Consumer.

• The JSDL Consumer's role is to enact the JSDL+PS, e.g. create, submit and stage data for ‘1000’ jobs. This is the responsibility of OGSA BES, middleware and SAGA….

• But middleware / SAGA support for PS extensions is not yet available. May have to devise a ‘hack’ in the meantime?

File Transfer / Data Staging Service

• Need to support large data transfers by moving byte streaming to dedicated servers/services.

• Will have to pass security tokens from portal to staging service (looking at WSS Username and Certificate profiles).

• Here to explore this requirement (also for facilities work?) and investigate solutions….

Summary