06/08/10 PBS, LSF and ARC integration Zoltán Farkas [email protected] MTA SZTAKI LPDS

06/08/10

PBS, LSF and ARC integration

Zoltán Farkas

MTA SZTAKI LPDS


Outline

•Introduction
•Requirements
•PBS and LSF
•ARC
•Architecture of the P-GRADE Portal runtime layer
•PBS/LSF integration
•ARC integration
•Summary


Introduction

•P-GRADE Portal already supported gLite and Globus
•ETHZ requirements:
  •Make use of local PBS clusters
  •Make use of local LSF clusters (Brutus)
  •Occasionally make use of ARC grid resources
•All of this should be integrated within P-GRADE Portal


PBS (and LSF)

•Portable Batch System (PBS)
•Load Sharing Facility (LSF)
•Schedule users' jobs on a cluster
•Users log in interactively to a submission node
•Users run different commands:
  •qsub (bsub): submit
  •qstat (bjobs): status
  •qdel (bkill): abort

[Figure: a submission node, a scheduler node and five cluster nodes.]
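
The q* and b* commands above differ mostly in their names and in the state codes they report, so portal-side scripts can hide them behind one small table. The Python sketch below is illustrative only: `scheduler_command`, `portal_state` and the portal state names are hypothetical, and the codes listed are the common qstat `job_state` letters and bjobs `STAT` values (the PBS set varies between flavours).

```python
import shlex

# Hypothetical table: scheduler -> portal action -> command template run on
# the submission node. Only the plain invocations from the slide are used;
# real scripts usually add queue, resource and output options.
_COMMANDS = {
    "PBS": {"submit": "qsub {arg}", "status": "qstat {arg}", "cancel": "qdel {arg}"},
    # Here bsub reads the job script (and its #BSUB directives) from stdin.
    "LSF": {"submit": "bsub < {arg}", "status": "bjobs {arg}", "cancel": "bkill {arg}"},
}

# Scheduler state codes mapped onto hypothetical portal states.
# C is Torque's "completed", F is PBS Pro's "finished".
_STATES = {
    "PBS": {"Q": "queued", "W": "queued", "T": "queued", "H": "suspended",
            "R": "running", "E": "running",
            "C": "finished", "F": "finished"},
    "LSF": {"PEND": "queued",
            "PSUSP": "suspended", "USUSP": "suspended", "SSUSP": "suspended",
            "RUN": "running", "DONE": "finished", "EXIT": "failed"},
}


def scheduler_command(scheduler: str, action: str, arg: str) -> str:
    """Build the shell command for `action` ("submit", "status" or "cancel").

    `arg` is the job script for "submit" and the job ID otherwise. It is
    shell-quoted because the line is executed by a remote shell over SSH.
    """
    return _COMMANDS[scheduler][action].format(arg=shlex.quote(arg))


def portal_state(scheduler: str, code: str) -> str:
    """Translate a qstat job_state letter or a bjobs STAT value."""
    return _STATES[scheduler].get(code.strip(), "unknown")


if __name__ == "__main__":
    print(scheduler_command("PBS", "submit", "job.sh"))  # qsub job.sh
    print(scheduler_command("LSF", "submit", "job.sh"))  # bsub < job.sh
    print(scheduler_command("LSF", "cancel", "12345"))   # bkill 12345
    print(portal_state("LSF", "EXIT"))                   # failed
```

Note that a PBS "finished" state says nothing about the job's exit status, and a job that has already been purged typically no longer appears in qstat output at all, so a real observer script has to handle both cases separately.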


ARC

•Advanced Resource Connector
•Complete grid middleware with:
  •Information system
  •Command-line clients with an integrated broker
  •Data management stack (GridFTP)
•Usable through client programs:
  •Job description: xRSL
  •ngsub: submit
  •ngstat: status update
  •ngkill: cancel
  •ngget: get results
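
The ng* tools listed above map one-to-one onto the portal's job operations. A hedged Python sketch that builds their command lines follows; the `-f` option (read the job description from an xRSL file) and the wording of the submission message that `parse_job_id` searches for ("...jobid: <URL>") are assumptions about the classic NorduGrid client and should be checked against the installed version. The function names are hypothetical.

```python
import re
from typing import List, Optional

# Portal action -> ARC client tool that takes a job ID as its argument.
_NG_TOOLS = {"status": "ngstat", "cancel": "ngkill", "get": "ngget"}

# Assumption: ngsub reports a new job as "... jobid: <job URL>".
_JOBID = re.compile(r"jobid:\s*(\S+)", re.IGNORECASE)


def ng_command(action: str, target: str) -> List[str]:
    """Return the argv for one step of an ARC job's life cycle.

    `target` is the xRSL file for "submit" and the job ID (a URL) otherwise.
    """
    if action == "submit":
        # Assumption: "-f" makes ngsub read the job description from a file.
        return ["ngsub", "-f", target]
    return [_NG_TOOLS[action], target]


def parse_job_id(ngsub_output: str) -> Optional[str]:
    """Extract the job ID from ngsub's output, or None if none is found."""
    match = _JOBID.search(ngsub_output)
    return match.group(1) if match else None
```

The argv lists can be passed directly to `subprocess.run`, which avoids shell quoting issues on the portal machine.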


P-GRADE Portal Architecture

•Workflow Editor-related components
•Portlet-related components
•Workflow data storage
•Execution layer
  •See next slide!


[Figure: the P-GRADE Portal machine, showing the Apache Tomcat servlet container with the GridSphere portal framework and the P-GRADE Portal portlets, the Workflow Editor servlet and Workflow Editor client, DAGMan, the common workflow and job execution scripts, the Globus, EGEE and PBS scripts, and the P-GRADE Portal's filesystem holding user workflow data; the targets are a Globus grid, an EGEE grid and PBS clusters.]


LSF and PBS integration I.

•Principal idea:
  •Users configure remote SSH connections to submission nodes through the Settings portlet
  •Connections are established using SSH key pairs
  •Established connections are reused to minimize the number of SSH connection attempts
•Connections are kept on a:
  •per-user,
  •per-resource basis
  → a given user's connections are not accessible to other users
  → different resources use different connections
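
The reuse and isolation rules above amount to a cache keyed by the pair (user, resource). A minimal Python sketch, assuming a hypothetical `connect` factory that opens the actual SSH session with the user's private key; `ConnectionPool` and its methods are illustrative names, not the portal's own classes.

```python
import threading
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (portal user, resource name)


class ConnectionPool:
    """Keeps at most one live connection per (user, resource) pair.

    `connect` is a hypothetical factory, e.g. one that opens an SSH session to
    the resource's submission node with that user's private key. The pool only
    decides when a new connection is needed and never shares it across keys.
    """

    def __init__(self, connect: Callable[[str, str], object]) -> None:
        self._connect = connect
        self._conns: Dict[Key, object] = {}
        self._lock = threading.Lock()

    def get(self, user: str, resource: str) -> object:
        """Return the cached connection, opening one only on first use."""
        key = (user, resource)
        with self._lock:  # simple, but serialises connection set-up
            conn = self._conns.get(key)
            if conn is None:
                conn = self._connect(user, resource)
                self._conns[key] = conn
            return conn

    def drop(self, user: str, resource: str) -> None:
        """Forget a connection (e.g. after it broke) so the next get() reconnects."""
        with self._lock:
            self._conns.pop((user, resource), None)
```

Because the key contains the user, one user's connection is never handed to another; because it contains the resource, each cluster gets its own connection.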


LSF and PBS integration II.

[Figure: the portal machine keeps a separate connection pool for each user (User 1, User 2), each with its own private key (PRIV); the pools connect to LSF resources 1 to 3 and PBS resources 1 and 2, which hold the corresponding public keys (PUB).]


LSF and PBS integration III.

•Job preparation:
  •wkf_pre_LSF.sh: prepare the job and wrapper, collect files
  •wkf_pre_PBS.sh: prepare the job and wrapper, collect files
•Job execution:
  •wkf_LSF.sh: submit and observe the job using b* commands
  •wkf_PBS.sh: submit and observe the job using q* commands
  •Wrappers:
    •LSF_fake.sh: handle generator and collector jobs, run the executable
    •PBS_fake.sh: handle generator and collector jobs, run the executable
•Job post-processing:
  •No real task (wkf_post_LSF.sh and wkf_post_PBS.sh)
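
Every back-end follows the same three-phase naming pattern (wkf_pre_<X>.sh, wkf_<X>.sh, wkf_post_<X>.sh, with X = LSF or PBS here), which suggests how the execution layer can plug in a new back-end by supplying three scripts. A minimal Python sketch of a dispatcher built on that convention; `phase_scripts`, `run_job` and the rule that a failed phase skips the remaining ones are assumptions, not taken from the P-GRADE sources.

```python
import subprocess
from typing import Callable, List


def phase_scripts(backend: str) -> List[str]:
    """Pre, execution and post script names for a back-end (e.g. "LSF", "PBS")."""
    return [f"wkf_pre_{backend}.sh", f"wkf_{backend}.sh", f"wkf_post_{backend}.sh"]


def run_script(path: str) -> int:
    """Default runner: execute a script with sh and return its exit code."""
    return subprocess.run(["sh", path], check=False).returncode


def run_job(backend: str, run: Callable[[str], int] = run_script) -> bool:
    """Run the three phases in order; stop at the first non-zero exit code."""
    for script in phase_scripts(backend):
        if run(script) != 0:
            return False  # assumption: a failed phase aborts the remaining ones
    return True
```

The `run` parameter keeps the dispatcher independent of how scripts are actually launched, which also makes it easy to test with a fake runner.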


LSF and PBS integration features

•Full parameter study (PS) support
•Much quicker response times than grid middleware
•Support for any kind of executable


ARC integration I.

•Very similar to the EGEE support
•An ARC client stack has to be installed on the P-GRADE Portal machine
•Users gain access with X.509 proxy certificates
•Two possible ways of selecting resources:
  •the user specifies the target cluster
  •the cluster is selected by the client's integrated broker


ARC integration II.

•Job preparation: wkf_pre_nordugrid.sh
  •Wrapper script preparation
  •Generator-related cleanups (as needed)
  •Autogenerator-related file uploads (as needed)
•Job execution: wkf_nordugrid.sh
  •xRSL prepared based on job properties
  •Job submission and management using ng* commands
  •Wrapper script: manage generator and collector jobs if needed
•Job post-processing: wkf_post_nordugrid.sh
  •No real task to perform
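
Preparing xRSL from job properties is essentially string rendering: one "&" followed by parenthesised attribute relations. A minimal Python sketch, assuming the job's properties reduce to an executable, arguments, input and output file names, stdout/stderr names and an optional job name; `build_xrsl` is a hypothetical helper, and values containing double quotes are simply rejected rather than escaped.

```python
from typing import Optional, Sequence


def _q(value: str) -> str:
    """Quote an xRSL string value; embedded double quotes are rejected."""
    if '"' in value:
        raise ValueError(f"double quote in xRSL value: {value!r}")
    return '"' + value + '"'


def build_xrsl(executable: str,
               arguments: Sequence[str] = (),
               input_files: Sequence[str] = (),
               output_files: Sequence[str] = (),
               stdout: str = "stdout.txt",
               stderr: str = "stderr.txt",
               job_name: Optional[str] = None) -> str:
    """Render job properties as a single xRSL conjunction."""
    rels = [f"(executable={_q(executable)})"]
    if arguments:
        rels.append("(arguments=" + " ".join(_q(a) for a in arguments) + ")")
    if input_files:
        # Empty source URL: the client uploads the file from the local directory.
        rels.append("(inputFiles=" + "".join("(" + _q(f) + ' "")' for f in input_files) + ")")
    if output_files:
        # Empty destination URL: the file stays on the cluster for ngget.
        rels.append("(outputFiles=" + "".join("(" + _q(f) + ' "")' for f in output_files) + ")")
    rels.append(f"(stdout={_q(stdout)})")
    rels.append(f"(stderr={_q(stderr)})")
    if job_name:
        rels.append(f"(jobName={_q(job_name)})")
    return "&" + "".join(rels)
```

For example, `build_xrsl("wrapper.sh", ["a"], ["wrapper.sh", "in.dat"], ["out.dat"], job_name="demo")` yields `&(executable="wrapper.sh")(arguments="a")(inputFiles=("wrapper.sh" "")("in.dat" ""))(outputFiles=("out.dat" ""))(stdout="stdout.txt")(stderr="stderr.txt")(jobName="demo")`.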


ARC integration features

•Full PS support
•Option to select the execution resource
•Support for any kind of executable
•Multi-node job support
•Option to specify runTimeEnvironment attributes
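
Resource selection, multi-node jobs and runtime environments all surface as extra xRSL relations. A hedged Python sketch using the xRSL attributes `cluster`, `count` and `runTimeEnvironment`; `resource_relations` is a hypothetical helper whose output would be appended to a job's base description, and leaving `cluster` unset is meant to leave the choice to the client's broker.

```python
from typing import Optional, Sequence


def _q(value: str) -> str:
    """Quote an xRSL string value; embedded double quotes are rejected."""
    if '"' in value:
        raise ValueError(f"double quote in xRSL value: {value!r}")
    return '"' + value + '"'


def resource_relations(cluster: Optional[str] = None,
                       count: int = 1,
                       runtime_envs: Sequence[str] = ()) -> str:
    """Render the optional resource-related relations of an ARC job."""
    rels = []
    if cluster is not None:
        # User-selected execution resource; omit it to let the broker choose.
        rels.append(f"(cluster={_q(cluster)})")
    if count > 1:
        # Number of slots requested for a multi-node job.
        rels.append(f"(count={count})")
    for rte in runtime_envs:
        # One relation per required runtime environment.
        rels.append(f"(runTimeEnvironment={_q(rte)})")
    return "".join(rels)
```

With the defaults the function returns an empty string, so a job description built without these options is left unchanged.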


Summary

•PBS, LSF and ARC integration was relatively simple thanks to the pluggable architecture of P-GRADE Portal
•However, the devil is in the details:
  •SSH connection sharing and limits on parallel connections
  •Proper LSF job cancellation
  •…
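
Sharing one SSH connection among many commands runs into server-side caps such as OpenSSH's `MaxSessions` (sessions per network connection) and `MaxStartups` (concurrent unauthenticated connections). A minimal Python sketch of one mitigation, assuming the portal simply bounds how many commands it runs at once per resource; `SessionLimiter` and the limit value are hypothetical.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class SessionLimiter:
    """Bounds how many commands run at once per resource.

    `max_sessions` is a hypothetical portal setting, chosen to stay below the
    limits configured on the submission node's sshd.
    """

    def __init__(self, max_sessions: int) -> None:
        self._max = max_sessions
        self._sems: Dict[str, threading.BoundedSemaphore] = {}
        self._lock = threading.Lock()

    def _semaphore(self, resource: str) -> threading.BoundedSemaphore:
        with self._lock:  # create each resource's semaphore exactly once
            if resource not in self._sems:
                self._sems[resource] = threading.BoundedSemaphore(self._max)
            return self._sems[resource]

    @contextmanager
    def session(self, resource: str) -> Iterator[None]:
        """Block until a slot for `resource` is free; hold it while the body runs."""
        with self._semaphore(resource):
            yield
```

A caller would wrap each remote command in `with limiter.session("brutus"): ...`; commands beyond the limit wait instead of being refused by sshd.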