TRANSCRIPT
WS-PGRADE: Supporting parameter sweep applications in
workflows
Péter Kacsuk, Krisztián Karóczkai, Gábor Hermann, Gergely Sipos, and József Kovács
MTA SZTAKI
Content
• Motivations
– Lessons learnt from P-GRADE portal
– Lessons learnt from CancerGrid
• Workflow concept of gUSE/WS-PGRADE
• Parameter sweep support of gUSE
– CancerGrid
• Executing PS nodes of gUSE workflows in desktop grids
• Conclusions
Popularity of P-GRADE portal
• It has been used in many EGEE and EGEE-related VOs:
– GILDA, VOCE, SEE-GRID, BalticGrid, BioInfoGrid, EGRID, etc.
• It has been used in many national grids:
– UK NGS, Grid-Ireland, Turkish Grid, Croatian Grid, Grid Malaysia, etc.
• It has been used as the GIN VO Resource Testing Portal
• It became OSS at the beginning of January 2008:
https://sourceforge.net/projects/pgportal/
Download of OSS P-GRADE portal
828 downloads so far
Lessons learnt from P-GRADE portal
• Popular because it provides
– Easy-to-use but powerful workflow system (graphical editor, wf manager, etc.)
– Easy-to-use parameter sweep concept support
– Easy-to-use MPI program execution support
– Grid virtualization:
• Multi-grid/multi-VO access mechanism for LCG-2, gLite, GT2 and GT4
Introducing three levels of parallelism
• Parallel execution inside a workflow node
– Each job can be a parallel program
• Parallel execution among workflow nodes
– Multiple jobs run in parallel
• Parameter study execution of the workflow
– Multiple instances of the same workflow run with different data files
Parameter study workflow
[Figure: a parameter study workflow. A Generator grid job (GEN) generates the input parameter space; parameter sweep grid jobs (SEQ) run the simulation (this could be any workflow); a Collector grid job (COLL) evaluates the results of the simulation.]
3-phase PS execution in P-GRADE portal
First phase: executing once all the Generators
Second phase: executing all generated eWorkflows in parallel
Last phase: executing once all the Collectors
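The three phases above can be sketched as a small Python program. This is an illustrative model only, not gUSE's actual API: the `generator`, `eworkflow`, and `collector` functions are hypothetical placeholders standing in for the real grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor

def generator(n):
    """Phase 1 (runs once): produce the input parameter space."""
    return [{"param": i} for i in range(n)]

def eworkflow(params):
    """Phase 2: one workflow instance per parameter set (placeholder simulation)."""
    return params["param"] ** 2

def collector(results):
    """Phase 3 (runs once): evaluate the results of all eWorkflows."""
    return sum(results)

def run_ps_workflow(n):
    param_sets = generator(n)                 # first phase: Generators once
    with ThreadPoolExecutor() as pool:        # second phase: eWorkflows in parallel
        results = list(pool.map(eworkflow, param_sets))
    return collector(results)                 # last phase: Collectors once

print(run_ps_workflow(4))  # 0 + 1 + 4 + 9 = 14
```

Note that the collector cannot start until `pool.map` has drained every eWorkflow result, which mirrors the barrier between the second and last phases.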
CancerGrid workflow needs more
• Usage of generators and collectors at any node of the WF, without any ordering restrictions
• Usage of PS execution at node level at any node of the WF, without any ordering restrictions
CancerGrid workflow needs more
[Figure: the CancerGrid workflow. Two Generator jobs fan out into branches whose nodes run x1, xN (N = 30K) or N×M (N×M = 3 million) times. With N = 30K and M = 100, this means about 0.5 year of execution time.]
Solution of the problem
• We need an environment where the user can develop and execute such a workflow
• The environment should contain a broker that decides where to execute the nodes
– MPI nodes on SG clusters
– Nodes with very short execution time on local resources
– Seq. nodes with small number of invocations at SGs
– Seq. nodes called many times at DGs
• Such an environment for SGs is:
– gUSE: provides a high-level service set based middleware
– WS-PGRADE: provides a workflow user interface
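The brokering rule listed above can be sketched as a simple routing function. This is an assumption-laden illustration: the real gUSE broker (Unibroker) is a service, and the node attributes and target names used here are invented for the example, not its actual interface.

```python
def route(node):
    """Pick an execution target for one workflow node (illustrative rule)."""
    if node["mpi"]:
        return "service-grid-cluster"   # MPI nodes on SG clusters
    if node["runtime_sec"] < 1:
        return "local-resource"         # very short jobs stay on local resources
    if node["invocations"] > 10_000:
        return "desktop-grid"           # seq. nodes called many times -> DGs
    return "service-grid"               # seq. nodes with few invocations -> SGs

# A CancerGrid-style node: sequential, 5-minute runtime, 3 million invocations.
node = {"mpi": False, "runtime_sec": 300, "invocations": 3_000_000}
print(route(node))  # desktop-grid
```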
gUSE and WS-PGRADE
• gUSE (grid User Support Environment)
– is a grid virtualization environment
– exposes the grid as a workflow
– enables the execution of workflows simultaneously in many grids no matter what their middleware is
• WS-PGRADE is the user interface to support
– Editing, configuring, publishing workflows (as grid applications)
[Figure: the gUSE architecture in three layers.
Presentation layer: the Graphical User Interface, WS-PGRADE.
gUSE layer (high-level, user-centric services): Workflow interpreter, Workflow storage, File storage, Application repository, Logging, gUSE information system, gUSE broker (Unibroker), and Submitters.
Grid layer (low-level middleware services): gLite resources, Globus resources, BOINC resources, Web services and relational databases.]
PS workflow concept of WS-PGRADE
• Any node of the workflow can be:
– PS job
– Generator
– Collector
• There are two kinds of relationship between input files of PS nodes:
– Cross product
– Dot product
Workflow Graph Overview in WS-PGRADE
[Figure: a workflow graph. Each node (a job, a service call (WS, legacy), or an embedded workflow) has input ports and output ports.]
The Workflow Editor as it appears to the user
Configuring the Workflow
[Figure: a workflow graph whose external input ports carry m, n, h and *K files.]
• Specify the number of input files on external input ports.
• A Generator job produces multiple data files on its output port within one job submission step.
• Specify the Dot or Cross product relation of input ports to define the number of job submissions.
• Specify a job to be a Collector by defining a Gathering input port; the job's execution is postponed until all input files have arrived at that port.
Legend: Cross Product, Dot Product
Animation: the number of generated output files
[Figure: animation of file counts flowing through the workflow. Ports carry m, n, h and K files; intermediate nodes produce m*n and h*K files; a cross-product node is submitted m*n*h*K times; a dot-product node is submitted S = max(m*n, h*K) times.]
• The Generator job runs h times and each run generates K files on the output port.
• In case of cross product, a separate job submission is generated for each possible input file combination.
• In case of dot product, the job is submitted with input files having a common index number on each input port.
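The submission counts from the animation can be checked with a short calculation. The `max()` rule for dot product follows the slide's S = max(m*n, h*K) formula (the assumption being that the shorter port's file list is reused up to the longer one's length); the concrete values of m, n, h, K below are picked only for illustration.

```python
def submissions_cross(*port_sizes):
    """Cross product: one submission per combination of input files."""
    count = 1
    for size in port_sizes:
        count *= size
    return count

def submissions_dot(*port_sizes):
    """Dot product: submissions follow the slide's S = max(...) rule."""
    return max(port_sizes)

m, n, h, K = 2, 3, 4, 5
print(submissions_cross(m * n, h * K))  # (m*n) * (h*K) = 6 * 20 = 120
print(submissions_dot(m * n, h * K))    # max(m*n, h*K) = max(6, 20) = 20
```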
The user concern
• I have a large workflow containing:
– Sequential nodes to be executed once
– Sequential nodes to be executed many times (PS)
– MPI nodes to be executed once
– MPI nodes to be executed many times (PS)
• I want to execute this workflow as fast as possible using as many resources as possible
[Figure: the CancerGrid workflow again (nodes running x1, xN with N = 30K, and N×M = 3 million times), with its nodes mapped to different resources: execution in the EDGeS VO of EGEE, execution in the private DG of the CancerGrid project, execution on a local resource, and execution as a Web Service, connected through a Global DEG and several Local DEGs.]
Putting everything together
[Figure: gUSE connects the WS-PGRADE Application Repository to a University DG, a Volunteer DG, and Service Grids (EGEE, OSG).]
• gUSE/WS-PGRADE provides transparent access to SGs/DGs.
Family of P-GRADE products and their use
• P-GRADE
– Parallelizing applications for clusters and grids
• P-GRADE portal
– Creating simple workflow and parameter sweep applications for grids
• P-GRADE/GEMLCA portal
– Creating workflow applications using legacy codes and community codes from a repository
• gUSE/WS-PGRADE
– Creating complex workflow and parameter sweep applications to run on clusters, service grids and desktop grids
– Creating workflow applications using embedded workflows, legacy codes and community workflows from the workflow repository
Conclusions
• gUSE and WS-PGRADE solve all the limitation problems of the P-GRADE portal:
– The implementation of gUSE is highly scalable; it can be distributed on a cluster or even across different grid sites.
– Stress tests show that it can simultaneously serve thousands of jobs (it currently manages ~100,000 jobs in CancerGrid).
– Its workflow concept is much more expressive than in the P-GRADE portal (recursive workflows, generic PS support, etc.).
– WS-PGRADE provides two user interfaces:
• Developer (creates and exports WFs into the WF repository of gUSE)
• End-user (imports and executes WFs from the WF repository)
– gUSE provides grid virtualization at workflow level: nodes of a WF can be executed by
• Web services, local resources, service grids and desktop grids (see the EDGeS project)