Post on 26-Dec-2015

Workflow Management in Condor

Gökay Gökçay

DAGMan Meta-Scheduler

• The Directed Acyclic Graph Manager (DAGMan) is a meta-scheduler for Condor jobs. DAGMan submits batch jobs in a predefined order and processes their results.

• DAGMan reads the Condor log file generated by each Condor job to find out which jobs are unsubmitted, submitted, or complete.

• DAGMan also guarantees that a DAG is recoverable, even if the machine running DAGMan goes down during execution.
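The log-based status tracking described above can be pictured with a minimal Python sketch. It assumes the classic Condor user log layout, where each event starts with a three-digit event code and the job ID in parentheses (000 = submitted, 001 = executing, 005 = terminated). The `node_clusters` mapping from node names to cluster numbers and the sample log lines are hypothetical; real DAGMan keeps this bookkeeping itself and handles many more event types.

```python
import re
from typing import Dict, Iterable, Optional

# Event codes assumed from the classic Condor user log format.
SUBMIT, EXECUTE, TERMINATED = "000", "001", "005"

# Event header, e.g. "000 (012.000.000) 12/26 10:00:01 Job submitted from host: ..."
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")


def node_states(log_lines: Iterable[str],
                node_clusters: Dict[str, Optional[int]]) -> Dict[str, str]:
    """Classify each DAG node as 'unsubmitted', 'submitted', or 'complete'.

    node_clusters is a hypothetical mapping from node name to the Condor
    cluster the node was submitted as, or None if it was never submitted.
    """
    cluster_state: Dict[int, str] = {}
    for line in log_lines:
        match = EVENT_RE.match(line)
        if match is None:
            continue  # event bodies and "..." terminators have no header
        code, cluster = match.group(1), int(match.group(2))
        if code == TERMINATED:
            # A real tool would also read the exit code to tell success from failure.
            cluster_state[cluster] = "complete"
        elif code in (SUBMIT, EXECUTE) and cluster_state.get(cluster) != "complete":
            cluster_state[cluster] = "submitted"
    return {
        node: cluster_state.get(cluster, "unsubmitted") if cluster is not None else "unsubmitted"
        for node, cluster in node_clusters.items()
    }


if __name__ == "__main__":
    # Illustrative log excerpt, not real output.
    log = [
        "000 (001.000.000) 12/26 10:00:00 Job submitted from host: <10.0.0.1:9618>",
        "...",
        "005 (001.000.000) 12/26 10:01:00 Job terminated.",
        "...",
        "000 (002.000.000) 12/26 10:01:05 Job submitted from host: <10.0.0.1:9618>",
        "...",
    ]
    print(node_states(log, {"A": 1, "B": 2, "C": None}))
    # → {'A': 'complete', 'B': 'submitted', 'C': 'unsubmitted'}
```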

DAG File Example

    # Filename: diamond.dag
    Job A A.condor
    Job B B.condor
    Job C C.condor
    Job D D.condor
    PARENT A CHILD B C
    PARENT B C CHILD D
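To show how the PARENT/CHILD lines turn into a valid submission order, here is a minimal Python sketch that parses the JOB and PARENT ... CHILD lines of diamond.dag and orders the nodes with Kahn's topological sort. It illustrates the dependency semantics only, not DAGMan's implementation; it ignores every other DAG keyword and assumes each node appears on a JOB line.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def parse_dag(text: str) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Return (node names in file order, parent -> children map)."""
    nodes: List[str] = []
    children: Dict[str, Set[str]] = {}
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        # The example mixes "Job" and "PARENT", so compare keywords case-insensitively.
        keyword = words[0].upper()
        if keyword == "JOB":
            nodes.append(words[1])
            children.setdefault(words[1], set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for parent in words[1:split]:
                children.setdefault(parent, set()).update(words[split + 1:])
    return nodes, children


def submission_order(text: str) -> List[str]:
    """Order nodes so every parent comes before its children (Kahn's algorithm)."""
    nodes, children = parse_dag(text)
    indegree = {node: 0 for node in nodes}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1
    ready = deque(node for node in nodes if indegree[node] == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for kid in sorted(children[node]):  # sorted only to make the output deterministic
            indegree[kid] -= 1
            if indegree[kid] == 0:
                ready.append(kid)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: not a valid DAG")
    return order


DIAMOND = """\
# Filename: diamond.dag
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""

if __name__ == "__main__":
    print(submission_order(DIAMOND))  # → ['A', 'B', 'C', 'D']
```

B and C do not depend on each other, so a scheduler is free to run them at the same time; the linear order is just one valid schedule.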

Submitting the DAG to Condor

• In order to guarantee recoverability, the DAGMan program itself is run as a Condor job.

• Submit with “condor_submit_dag diamond.dag”.

• This script generates the diamond.dag.condor.sub CondorCommandFile for the DAG and submits it to Condor.
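Because each node's progress is recorded in the job log, a DAGMan that is restarted after a crash can rebuild its state and continue instead of starting over. A minimal Python sketch of the resume step, assuming the node states have already been recovered from the logs and the dependencies are given as a hypothetical child-to-parents map: a node is ready when it was never submitted and all of its parents are complete.

```python
from typing import Dict, List, Set


def ready_nodes(parents: Dict[str, Set[str]], states: Dict[str, str]) -> List[str]:
    """Nodes that can be submitted now: unsubmitted, with every parent complete.

    `parents` maps each node to its parent nodes; `states` maps nodes to
    'complete' or 'submitted' (missing nodes count as unsubmitted).
    """
    return sorted(
        node
        for node, deps in parents.items()
        if states.get(node, "unsubmitted") == "unsubmitted"
        and all(states.get(p) == "complete" for p in deps)
    )


if __name__ == "__main__":
    # Diamond DAG after a crash: A and B finished, C was never submitted.
    parents = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    states = {"A": "complete", "B": "complete"}
    print(ready_nodes(parents, states))  # → ['C']
```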

Essentials

• Prepare jobs: Each CondorCommandFile can submit only one job; multi-job clusters (multiple queue lines) are not supported. The log= line in every CondorCommandFile must point to the same Condor log file; otherwise, DAGMan will not see the log entries for every job in the DAG.

• Write the DAG file: Write the DAG file so that its JOB entries refer to the CondorCommandFiles you wrote in the previous step.

• Submit the DAG: Finally, submit the DAG file from the previous step using the condor_submit_dag script.
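The rules in the first step are easy to check before calling condor_submit_dag. The Python sketch below is a hypothetical pre-flight check, not part of Condor: it verifies that each CondorCommandFile has exactly one queue statement that queues a single job, and that all files name the same log file. It treats submit files as plain `key = value` lines and ignores macros, line continuations, and other submit-language features.

```python
from typing import Dict, List


def check_submit_files(files: Dict[str, str]) -> List[str]:
    """Return problems found in a {filename: submit-file text} mapping."""
    problems: List[str] = []
    logs: Dict[str, str] = {}
    for name, text in files.items():
        queue_lines: List[str] = []
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # blank line or comment
            if line.split()[0].lower() == "queue":
                queue_lines.append(line)
            elif "=" in line:
                key, value = (part.strip() for part in line.split("=", 1))
                if key.lower() == "log":
                    logs[name] = value
        if len(queue_lines) != 1:
            problems.append(f"{name}: expected one queue line, found {len(queue_lines)}")
        elif len(queue_lines[0].split()) > 1 and queue_lines[0].split()[1] != "1":
            problems.append(f"{name}: queue statement submits more than one job")
        if name not in logs:
            problems.append(f"{name}: no log= line")
    if len(set(logs.values())) > 1:
        problems.append(f"log files differ: {sorted(set(logs.values()))}")
    return problems


if __name__ == "__main__":
    # Hypothetical submit files for illustration.
    files = {
        "A.condor": "executable = a.sh\nlog = diamond.log\nqueue\n",
        "B.condor": "executable = b.sh\nlog = other.log\nqueue 3\n",
    }
    for problem in check_submit_files(files):
        print(problem)
    # → B.condor: queue statement submits more than one job
    # → log files differ: ['diamond.log', 'other.log']
```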

Complications

• Setup, cleanup, or interpretation of a node, handled by scripts (e.g. decompression, compression, serialization)

• Throttling (too many scripts)

• Unreliable applications or subsystems
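The last two complications are usually handled by capping how many things run at once and by retrying nodes that fail. DAGMan has its own controls for this (for example the RETRY keyword in the DAG file and the -maxjobs, -maxpre, and -maxpost options of condor_submit_dag); the Python sketch below only simulates the two ideas with a thread pool limited to `max_running` workers and a fixed retry count. `run_node`, `max_running`, and `retries` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_with_retries(node: str, run_node: Callable[[str], bool], retries: int) -> bool:
    """Try a node once, then up to `retries` more times while it keeps failing."""
    for _ in range(retries + 1):
        if run_node(node):
            return True
    return False


def run_throttled(nodes: List[str], run_node: Callable[[str], bool],
                  max_running: int = 2, retries: int = 1) -> Dict[str, bool]:
    """Run independent nodes with at most `max_running` of them in flight at once."""
    with ThreadPoolExecutor(max_workers=max_running) as pool:
        outcomes = list(pool.map(lambda n: run_with_retries(n, run_node, retries), nodes))
    return dict(zip(nodes, outcomes))


if __name__ == "__main__":
    attempts: Dict[str, int] = {}

    def fake_run(node: str) -> bool:
        # Hypothetical stand-in for a job or script: "flaky" fails once, "bad" always fails.
        attempts[node] = attempts.get(node, 0) + 1
        if node == "bad":
            return False
        return node != "flaky" or attempts[node] >= 2

    print(run_throttled(["good", "flaky", "bad"], fake_run))
    # → {'good': True, 'flaky': True, 'bad': False}
```

The sketch treats the nodes as independent; inside a DAG the throttle would apply only to nodes whose parents have already finished.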

Stork

• Stork is an emerging Condor technology for managing data placement.

• Stork provides a fault tolerant framework for scheduling data allocation and data transfer jobs. The architecture is modular and extensible, with support for many popular storage systems and data transfer protocols.

• Modules: ftp, gsiftp (GridFTP), http, nest (Condor NeST network storage), srb (SDSC Storage Resource Broker), csrm (Castor SRM), srm (dCache SRM), unitree (NCSA UniTree), diskrouter
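Each transfer names its source and destination as URLs, and the protocol in each URL decides which module is involved. A minimal Python sketch of that lookup, assuming the module names above double as URL schemes (as in the `file:` and `nest://` URLs used later in this talk) and that local `file:` paths are also accepted; the `STORK_MODULES` table and `transfer_protocols` function are hypothetical, not Stork's code.

```python
from typing import Tuple
from urllib.parse import urlparse

# Protocol modules listed on the Stork slide; "file" for local paths is an assumption.
STORK_MODULES = {"ftp", "gsiftp", "http", "nest", "srb", "csrm", "srm",
                 "unitree", "diskrouter", "file"}


def transfer_protocols(src_url: str, dest_url: str) -> Tuple[str, str]:
    """Return (source, destination) protocols, rejecting any with no known module."""
    pair = (urlparse(src_url).scheme, urlparse(dest_url).scheme)
    for proto in pair:
        if proto not in STORK_MODULES:
            raise ValueError(f"no Stork module for protocol {proto!r}")
    return pair


if __name__ == "__main__":
    print(transfer_protocols("file:/tmp/stork/process.results.out",
                             "nest://turkey.cs.wisc.edu/1.dat"))
    # → ('file', 'nest')
```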

Condor submit file

    $ cat process.condor
    universe = vanilla
    executable = /bin/sort
    arguments = /tmp/stork/index.html /tmp/stork/classad-talk.ps
    output = /tmp/stork/process.results.out
    error = process.results.err
    log = process.results.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    notification = never
    queue

Using Stork with Condor DAGMan

    $ cat transfer.stork
    [
      dap_type = transfer;
      src_url = "file:/tmp/stork/process.results.out";
      dest_url = "nest://turkey.cs.wisc.edu/1.dat";
      alt_protocols = "gsiftp-nest";
      log = "transfer.log";
    ]
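A Stork job description such as transfer.stork is a ClassAd: `name = value` attributes separated by semicolons inside square brackets. When many transfers are needed, generating these files can help. This Python sketch uses a hypothetical `stork_ad` helper that quotes string values and leaves the attributes named in `bare` unquoted, so that `dap_type = transfer` comes out as in the example above.

```python
from typing import Dict, FrozenSet


def stork_ad(attrs: Dict[str, str],
             bare: FrozenSet[str] = frozenset({"dap_type"})) -> str:
    """Render attributes as a Stork-style ClassAd; names in `bare` stay unquoted."""
    lines = ["["]
    for name, value in attrs.items():
        if name in bare:
            rendered = value
        else:
            # Escape backslashes and double quotes inside ClassAd string literals.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"  {name} = {rendered};")
    lines.append("]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(stork_ad({
        "dap_type": "transfer",
        "src_url": "file:/tmp/stork/process.results.out",
        "dest_url": "nest://turkey.cs.wisc.edu/1.dat",
        "alt_protocols": "gsiftp-nest",
        "log": "transfer.log",
    }))
```

Run as a script, this prints the same attributes as transfer.stork, one per line.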

    $ cat stork-condor.dag
    DATA INPUT1 alt_protocol.stork
    DATA INPUT2 transfer_ftp-file.stork
    JOB PROCESS process.condor
    DATA OUTPUT transfer.stork
    PARENT INPUT1 INPUT2 CHILD PROCESS
    PARENT PROCESS CHILD OUTPUT
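In stork-condor.dag, DATA nodes are handled by Stork and JOB nodes by Condor, while the PARENT/CHILD lines order them exactly as in an ordinary DAG. The Python sketch below (hypothetical, not DAGMan's code) groups the nodes into waves whose members could run concurrently and tags each node with the scheduler that would run it. It assumes only DATA, JOB, and PARENT ... CHILD lines appear and that every parent is declared.

```python
from typing import Dict, List, Set, Tuple


def waves(dag_text: str) -> List[List[Tuple[str, str]]]:
    """Group nodes into dependency levels, tagging each as 'stork' or 'condor'."""
    kind: Dict[str, str] = {}
    parents: Dict[str, Set[str]] = {}
    for raw in dag_text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        keyword = words[0].upper()
        if keyword in ("DATA", "JOB"):
            kind[words[1]] = "stork" if keyword == "DATA" else "condor"
            parents.setdefault(words[1], set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(words[1:split])
    done: Set[str] = set()
    result: List[List[Tuple[str, str]]] = []
    while len(done) < len(kind):
        # A node joins the next wave once all of its parents are done.
        wave = sorted(n for n in kind if n not in done and parents[n] <= done)
        if not wave:
            raise ValueError("cycle detected: not a valid DAG")
        result.append([(n, kind[n]) for n in wave])
        done.update(wave)
    return result


STORK_CONDOR_DAG = """\
DATA INPUT1 alt_protocol.stork
DATA INPUT2 transfer_ftp-file.stork
JOB PROCESS process.condor
DATA OUTPUT transfer.stork
PARENT INPUT1 INPUT2 CHILD PROCESS
PARENT PROCESS CHILD OUTPUT
"""

if __name__ == "__main__":
    for number, wave in enumerate(waves(STORK_CONDOR_DAG), start=1):
        print(number, wave)
    # → 1 [('INPUT1', 'stork'), ('INPUT2', 'stork')]
    # → 2 [('PROCESS', 'condor')]
    # → 3 [('OUTPUT', 'stork')]
```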

Thanks For Listening

Questions?
