F.Pacini - Milan - 8 May, 2001 - n° 1

Results of Meeting on Workload Manager Components Interaction

DataGrid WP1

F. Pacini

[email protected]

Summary

General Decisions

General LB Model

Interactions between UI and RB

Interactions between UI and LB

Interactions between RB and JSS

General Decisions (1/4)

[Diagram showing the Workload Manager components: UI, RB, LB, GK, JSS]

General Decisions (2/4)

SUBMITTED: this state is generated by the UI, just after it has assigned the Job ID and just before it submits the job to the RB => UI assigns a unique Job ID (dg_jobID) (using e.g. hostname+time+PID) and logs this event

CHKPT: this is a system-level checkpoint of jobs running on a CE, independent of Application Checkpointing

All parties are strongly invited to use the above Job State Diagram to identify the status of a job in the system!
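
The Job ID assignment mentioned for the SUBMITTED state can be sketched as follows. This is only an illustration of the hostname+time+PID recipe; the function name `make_dg_job_id` and the underscore-separated layout are assumptions, not the format agreed by WP1.

```python
import os
import socket
import time


def make_dg_job_id(hostname=None, timestamp=None, pid=None):
    """Build a job ID from hostname, submission time and process ID.

    The "host_time_pid" layout is an assumption made for illustration.
    Two submissions from the same process within the same second would
    collide, so a real implementation would need a finer-grained field.
    """
    hostname = hostname or socket.gethostname()
    timestamp = int(time.time()) if timestamp is None else int(timestamp)
    pid = os.getpid() if pid is None else pid
    return f"{hostname}_{timestamp}_{pid}"


# Fixed inputs make the result reproducible.
job_id = make_dg_job_id("ui.example.org", 989312400, 4242)
print(job_id)
```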

General Decisions (3/4)

UI will only perform syntax checking on class-ad attributes: that they exist and are written in the correct format (i.e. <attribute name> = <expression>). UI will NOT perform any semantic checks on the attribute values. UI will assign default values to some mandatory attributes, if needed and when possible.

UI will be able to contact different RB’s and LB’s according to the lists contained in local conf files.

UI will be able to use different conf files, as specified by the user at UI start-up (e.g. Start_UI -config <filename>).

UI will provide a command for the creation of a proxy (such as the Globus grid-proxy-init command).

JSS will have a proxy repository. CESNET will give details on this.
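
The syntax-only check on class-ad attributes described above could look roughly like this. It is a minimal sketch: the exact attribute grammar, the function name `parse_job_description` and the default value used for `RetryCount` are assumptions made for illustration.

```python
import re

# Assumed attribute syntax: a name made of letters, digits and
# underscores, an equals sign, then a non-empty expression.
ATTRIBUTE_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(\S.*?)\s*$")

# Hypothetical defaults for mandatory attributes.
DEFAULTS = {"RetryCount": "3"}


def parse_job_description(text):
    """Return a dict of attributes, raising ValueError on bad syntax."""
    attributes = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        match = ATTRIBUTE_RE.match(line)
        if match is None:
            raise ValueError(f"line {number}: expected <name> = <expression>")
        name, expression = match.groups()
        attributes[name] = expression  # the value is NOT interpreted
    # Fill in defaults only for attributes the user left out.
    for name, value in DEFAULTS.items():
        attributes.setdefault(name, value)
    return attributes


job_ad = parse_job_description('Executable = "/bin/hostname"\n\nStdOutput = "out.txt"')
print(job_ad)
```

Expressions are stored as raw strings on purpose: any semantic check on the values is left to the components downstream of the UI.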

General Decisions (4/4)

The need for a security mechanism for interactions between UI and RB/LB has been highlighted. Parties will perform a kind of handshake, and a secure channel will be established for their communications. CESNET is going to provide an API encapsulating this mechanism for LB, and will also provide a “how-to” to support UI-RB communications.

If the handshake is successful, RB will send to UI the address of “its” LB.

Both RB and LB will provide C/C++ API’s to be used by UI, which also cover network communications. Since UI will use Python for PM9 (at least!), the SWIG tool (http://www.swig.org) will be used to wrap these API’s. NOTE: to get the maximum benefit from SWIG, it is strongly recommended to comply with the ANSI C/C++ standards.

LB requires some modifications to be applied to the Globus job-manager. CESNET will take care of this.

General LB Model

We have agreed on the LB model proposed by CESNET in their doc Logging and Bookkeeping Service, Rev. 1.35

This model is based on a PUSH mechanism, where all the actors within the Workload Manager will push logging info using the defined API’s

Some modifications are needed to the Globus job-manager in order to use these API’s (e.g. RSL schema modification to take into account the dg_jobID). CESNET will take care of this.

The events proposed by CESNET for logging will be analysed by all parties to build a final agreed set

Interactions between UI and RB (1/8)

The Job Submission UI contacts the RB when the following commands are issued by the user:

dg-job-submit

dg-list-job-match

dg-job-cancel

Interactions between UI and RB (2/8)

dg-job-submit

dg-job-submit <jdl_file> [-resource res_id] [-notify e_mail_address]

The core information flowing from the UI to the RB consists of a job class-ad built from the job description file

There is a minimal set of mandatory attributes to be specified in the class-ad to be sent to RB (ref. UI Man Pages)

The user will be uniquely identified via his/her CertificateSubject (=> UserID no longer needed)

UI builds the dg_jobID also using the RB and related LB addresses, for later M&C purposes

Interactions between UI and RB (3/8)

If the dg-job-submit has been issued with the “-resource” option, then the job-ad contains the attribute:

ResourceID = res_id

and the RB shall submit the job to the resource identified by “res_id” without going through the match-making process.

A new attribute RetryCount will be added to the class-ad to allow the user to specify the number of retries in case the specified resource is temporarily unavailable. If the user does not provide this attribute, the UI will fill it in with a default value.
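
The handling of the “-resource” option and of RetryCount can be sketched as below. The attribute names ResourceID and RetryCount come from the decisions above; the helper names, the dict representation of the job-ad and the default of 3 retries are assumptions made for illustration.

```python
DEFAULT_RETRY_COUNT = 3  # hypothetical default; the agreed value is not stated


def build_job_ad(job_ad, res_id=None):
    """UI side: return a copy of job_ad completed for submission."""
    ad = dict(job_ad)
    if res_id is not None:
        # Set only when the user passed "-resource res_id".
        ad["ResourceID"] = res_id
    # Keep the user's RetryCount if given, otherwise use the default.
    ad.setdefault("RetryCount", DEFAULT_RETRY_COUNT)
    return ad


def needs_matchmaking(job_ad):
    """RB side: matchmaking is skipped when a resource is named."""
    return "ResourceID" not in job_ad


direct = build_job_ad({"Executable": "/bin/date"}, res_id="example-resource")
brokered = build_job_ad({"Executable": "/bin/date", "RetryCount": 5})
print(needs_matchmaking(direct), needs_matchmaking(brokered))
```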

Interactions between UI and RB (4/8)

If the dg-job-submit has been issued with the “-notify” option, then the job class-ad contains the attribute:

UserContact = e_mail_address

For PM9, the following scheme will be used to notify the user of job status changes:

the RB shall send an e-mail notification to e_mail_address when the matchmaking process has finished and the job is ready to be submitted to JSS (READY status)

the JSS shall send an e-mail notification to e_mail_address when the job starts running on the CE (RUNNING status)

the RB shall send an e-mail notification to e_mail_address when the job has finished (ABORTED or DONE status)
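
The notification scheme above amounts to a small lookup from job status to the component responsible for the e-mail. A minimal sketch, assuming the job class-ad is available as a dict; the names `NOTIFIER_BY_STATE` and `notification_for` are hypothetical.

```python
# Which component notifies the user on entering each job status (PM9 scheme).
NOTIFIER_BY_STATE = {
    "READY": "RB",
    "RUNNING": "JSS",
    "ABORTED": "RB",
    "DONE": "RB",
}


def notification_for(job_ad, new_state):
    """Return (component, address) if an e-mail is due, else None."""
    address = job_ad.get("UserContact")  # present only with "-notify"
    component = NOTIFIER_BY_STATE.get(new_state)
    if address is None or component is None:
        return None
    return component, address


ad = {"Executable": "/bin/date", "UserContact": "user@example.org"}
print(notification_for(ad, "RUNNING"))
```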

Interactions between UI and RB (5/8)

dg-list-job-match

dg-list-job-match <jdl_file>

The core information flowing from the UI to the RB consists of a job class-ad built from the job description file

There is a minimal set of mandatory attributes to be specified in the class-ad to be sent to RB (ref. UI Man Pages)

The user will be uniquely identified via his/her CertificateSubject (=> UserID no longer needed)

Interactions between UI and RB (6/8)

In this case the RB does not submit the job but only searches for resources compatible with the input job class-ad. The RB will send back to the UI a list of suitable resources, each identified by its Resource ID, which is composed of the Globus Gatekeeper contact string plus a queue name (if any).

Interactions between UI and RB (7/8)

dg-job-cancel

dg-job-cancel <jobID1 ... jobIDn | -all>

The core information flowing from the UI to the RB consists of a list of dg_jobID’s

The user will be uniquely identified via his/her CertificateSubject (=> UserID no longer needed)

According to the RB address provided in the dg_jobID’s, the UI will contact the relevant RB’s with the cancellation request

Interactions between UI and RB (8/8)

If the dg-job-cancel command has been issued with the “-all” input parameter, no dg_jobID’s are available a priori.

For PM9, the UI will contact all the RB’s listed in the conf file currently in use, asking for the cancellation of all jobs owned by the requesting user.
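
The two cancellation cases (explicit dg_jobID’s versus “-all”) can be sketched as a routing step on the UI side. The structured `JobID` tuple below is a hypothetical stand-in for the real dg_jobID, whose exact layout is not given here; it only assumes that the RB and LB addresses can be recovered from the ID.

```python
from collections import defaultdict
from typing import NamedTuple


class JobID(NamedTuple):
    """Hypothetical structured view of a dg_jobID."""
    unique: str       # e.g. derived from hostname+time+PID
    rb_address: str   # RB the job was submitted to
    lb_address: str   # LB used by that RB


def plan_cancellation(job_ids, configured_rbs, cancel_all=False):
    """Map each RB address to what should be cancelled there.

    With cancel_all, every RB in the conf file is asked to cancel all of
    the user's jobs (value "ALL"); otherwise IDs are grouped by their RB.
    """
    if cancel_all:
        return {rb: "ALL" for rb in configured_rbs}
    plan = defaultdict(list)
    for job_id in job_ids:
        plan[job_id.rb_address].append(job_id.unique)
    return dict(plan)


ids = [JobID("a", "rb1", "lb1"), JobID("b", "rb2", "lb2"), JobID("c", "rb1", "lb1")]
print(plan_cancellation(ids, ["rb1", "rb2", "rb3"]))
print(plan_cancellation([], ["rb1", "rb2", "rb3"], cancel_all=True))
```

The same grouping would apply to status and logging queries, keyed on `lb_address` instead of `rb_address`.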

Interactions between UI and LB (1/5)

The Job Submission UI contacts the LB when the following commands are issued by the user:

dg-job-status

dg-get-logging-info

Interactions between UI and LB (2/5)

dg-job-status

dg-job-status <jobID1 ... jobIDn | -all> [-full]

The core information flowing from the UI to the LB consists of a list of dg_jobID’s and the required information level

The user will be uniquely identified via his/her CertificateSubject (=> UserID no longer needed)

According to the LB address provided in the dg_jobID’s, the UI will contact the relevant LB’s with the status request

Interactions between UI and LB (3/5)

If the dg-job-status command has been issued with the “-all” input parameter, no dg_jobID’s are available a priori.

For PM9, the UI will contact all the LB’s listed in the conf file currently in use, asking for the status of all jobs owned by the requesting user.

An agreement has been reached on the set of information to be returned in short and full mode. Details in UI Man Pages.

Interactions between UI and LB (4/5)

dg-get-logging-info

dg-get-logging-info <jobID1 ... jobIDn | -all> [-from T1] [-to T2] [-full]

The core information flowing from the UI to the LB consists of a list of dg_jobID’s, the required information level and the time interval

The user will be uniquely identified via his/her CertificateSubject (=> UserID no longer needed)

According to the LB address provided in the dg_jobID’s, the UI will contact the relevant LB’s with the status request

Interactions between UI and LB (5/5)

If the dg-get-logging-info command has been issued with the “-all” input parameter, no dg_jobID’s are available a priori.

For PM9, the UI will contact all the LB’s listed in the conf file currently in use, asking for the status of all jobs owned by the requesting user.

An agreement has been reached on the set of information to be returned in short and full mode. Details in UI Man Pages.

CESNET will extend some API’s in order to take into account the time interval as a query filter
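
The time-interval filter behind the “-from T1” and “-to T2” options can be illustrated with a plain in-memory query. This is not the CESNET API; the `Event` record and the `filter_events` function are hypothetical, and treating both bounds as inclusive is an assumption.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    job_id: str
    timestamp: datetime
    description: str


def filter_events(events, job_ids=None, t_from=None, t_to=None):
    """Select events for the given jobs within [t_from, t_to].

    A bound left as None is treated as open; job_ids=None means all jobs.
    """
    selected = []
    for event in events:
        if job_ids is not None and event.job_id not in job_ids:
            continue
        if t_from is not None and event.timestamp < t_from:
            continue
        if t_to is not None and event.timestamp > t_to:
            continue
        selected.append(event)
    return selected


events = [
    Event("job-a", datetime(2001, 5, 8, 9, 0), "submitted"),
    Event("job-a", datetime(2001, 5, 8, 10, 0), "running"),
    Event("job-b", datetime(2001, 5, 8, 11, 0), "submitted"),
]
selected = filter_events(events, job_ids={"job-a"}, t_from=datetime(2001, 5, 8, 9, 30))
print([event.description for event in selected])
```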

Interactions between RB and JSS (1/2)

The core information flowing from the RB to the JSS (via a JSS API) consists of a job class-ad where the RB has properly inserted the following attributes:

GKContactString

QueueName (if any)

If the UserContact attribute is present in the received class-ad, the JSS will interpret this as a request to send e-mail notifications when needed

For PM9, JSS will transform this class-ad into the Condor Submission File. This file shall contain the dg_jobID to be passed to the GK in the RSL expression

For PM9, JSS is also responsible for maintaining the mapping between dg_jobID and Condor-G ID
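
The mapping between dg_jobID and Condor-G ID that JSS has to maintain can be sketched as a two-way lookup table. The class name `JobIdMap` and the example ID strings are hypothetical, and persistence across JSS restarts is left out.

```python
class JobIdMap:
    """Two-way lookup between dg_jobID and Condor-G ID."""

    def __init__(self):
        self._to_condor = {}
        self._to_dg = {}

    def register(self, dg_job_id, condor_id):
        """Record a new pair; each ID may appear in one pair only."""
        if dg_job_id in self._to_condor or condor_id in self._to_dg:
            raise ValueError("ID already registered")
        self._to_condor[dg_job_id] = condor_id
        self._to_dg[condor_id] = dg_job_id

    def condor_id(self, dg_job_id):
        return self._to_condor[dg_job_id]

    def dg_job_id(self, condor_id):
        return self._to_dg[condor_id]


id_map = JobIdMap()
id_map.register("example-dg-job-id", "example-condor-id")
print(id_map.dg_job_id("example-condor-id"))
```

The reverse direction is the one needed when a status change is detected on the Condor-G side and has to be reported under the dg_jobID.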

Interactions between RB and JSS (2/2)

For PM9, JSS will inspect the Condor-G logfile to detect job status transitions

After job completion, JSS will notify the RB of this event.

Assuming that RB and JSS will run on the same machine, their communications will be based on local sockets.
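
A job-completion notification over a local socket can be illustrated with a connected socket pair. This is only a sketch: both ends live in one process here, whereas a real RB and JSS would be separate processes, and the one-line message format is an assumption.

```python
import socket

# socketpair() returns two connected local sockets in a single process,
# standing in for the JSS end and the RB end of the channel.
jss_end, rb_end = socket.socketpair()

# Hypothetical one-line message format: "<event> <dg_jobID>\n".
jss_end.sendall(b"JOB_DONE example-job-id\n")
jss_end.close()

# The RB reads until the JSS closes its end of the connection.
chunks = []
while True:
    data = rb_end.recv(1024)
    if not data:
        break
    chunks.append(data)
rb_end.close()

event, job_id = b"".join(chunks).decode().split()
print(event, job_id)
```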