Page 1: OSG Area Coordinator’s Report: Workload Management Maxim Potekhin BNL 631-344-3621 potekhin@bnl.gov

OSG Area Coordinator’s Report: Workload Management

Maxim Potekhin, BNL

potekhin@bnl.gov


Overview: Workload Management

• Current Initiatives: extensions of the Panda job aggregation and submission service, in particular scalability and security enhancements
  - Finalizing work on the Panda Pilot Factory, which should help improve scalability; deployment at BNL will follow.
  - Integration work on MyProxy/GUMS/glexec: deployment at BNL is being worked on; testing on the EGEE platform is planned for the near future. Transparent pilot code will be developed for both OSG and EGEE.
  - Integration work done with the new version of WS-GRAM: a web server acts as a wrapper for the WS-GRAM gatekeeper and can be cloned as necessary and load balanced, allowing a significant boost in throughput.

• Accomplishments Since Last Report
  - glexec-enabled Panda pilot tested, with VOMS proxies cached on the FNAL MyProxy server
  - Improved user-level access control in Panda
  - Panda has been, and continues to be, used successfully in production and user analyses

• Issues / Concerns
  - Current priorities in the OSG Workload Management effort continue to be scalability and security.
  - EGEE interoperability re: glexec?
  - To further improve scalability of Panda operations across multiple operating regions and VOs, the capability to partition the system across multiple instances, while retaining coherent overall system management and monitoring, is being implemented.
  - This is being done in tandem with the introduction of high-availability load balancers to improve the scalability of individual instances, in particular the primary instance at BNL. This enables a single Panda service to span several physical servers, transparently offering better scaling and fault tolerance.
  - Security activities include establishing a facility-level (as opposed to Panda-based) audit trail in the pilot-based job submission environment (see the glexec/MyProxy items above).
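As a rough illustration of the partitioning and load-balancing approach described above, the following Python sketch routes each request by VO to one of several instances and then rotates across the servers behind that instance, skipping any marked as down. All class, server, and VO names here are hypothetical and are not taken from Panda; the actual Panda deployment and its load balancers may work quite differently.

```python
from itertools import cycle


class PandaInstance:
    """Hypothetical model of one service instance backed by several servers."""

    def __init__(self, name, servers):
        self.name = name
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)  # round-robin order over all servers

    def mark_down(self, server):
        """Exclude a server from rotation (e.g. after a failed health check)."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a known server to rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick_server(self):
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError(f"no healthy servers in instance {self.name!r}")


class Dispatcher:
    """Hypothetical front end: partitions traffic by VO across instances."""

    def __init__(self, default):
        self.default = default  # instance used for VOs with no explicit mapping
        self.by_vo = {}

    def assign(self, vo, instance):
        self.by_vo[vo] = instance

    def route(self, vo):
        """Return (instance name, server) chosen for a request from this VO."""
        instance = self.by_vo.get(vo, self.default)
        return instance.name, instance.pick_server()


if __name__ == "__main__":
    # Assumed example topology: a primary instance with three cloned servers
    # and a second instance serving one particular VO.
    primary = PandaInstance("primary", ["srv-a", "srv-b", "srv-c"])
    secondary = PandaInstance("secondary", ["srv-x"])
    dispatcher = Dispatcher(default=primary)
    dispatcher.assign("vo-two", secondary)

    print(dispatcher.route("vo-one"))
    print(dispatcher.route("vo-one"))
    primary.mark_down("srv-c")        # simulate a failed server
    print(dispatcher.route("vo-one"))  # rotation skips the failed server
    print(dispatcher.route("vo-two"))
```

Keeping the per-VO mapping separate from the per-instance rotation mirrors the two concerns named in the slide: partitioning across instances, and scaling and fault tolerance within a single instance.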


WMS in WBS

WBS | Task Information | In Charge | Finish Date
4.1.2.1 | Deliver phase 1 improvements into OSG 1.0 | Wenaus | 12/07/07
4.1.9 | Support security effort in facility (including GUMS) | Wenaus, Potekhin | 09/30/08
4.2.1 | Support OSG VOs in building, deploying and operating Workload Management Systems (WMS) that are based on just-in-time job scheduling, and the integration of tools used by these WMS into the VDT | Potekhin | 09/30/08
4.2.1.1 | Deliver phase 1 into OSG 1.0 | Potekhin, Popescu | 17/03/08?
4.2.1.2 | Deliver phase 2 into OSG 1.2 | Potekhin, Caballero, Popescu | 06/07/08
4.2.2 | Manage the allocation of compute and storage resources allocated to the OSG-ET by OSG sites and/or external resource providers | Potekhin | 09/30/08
4.2.3 | Operate and support the hardware upon which the WMS service for OSG VO is instantiated | Ernst, Popescu | 09/30/08
4.2.4 | Operate and support the WMS service for the OSG VO | Potekhin, Caballero, Popescu | 09/30/08
4.3.3.1 | Job submission, execution and management | Green | 09/30/08