
Workflow Management in Condor Gökay Gökçay





DAGMan Meta-Scheduler

• The Directed Acyclic Graph Manager (DAGMan) is a meta-scheduler for Condor jobs. DAGMan is responsible for submitting batch jobs in a predefined order and processing their results.

• DAGMan reads the Condor log file generated by each Condor job to find out which jobs are unsubmitted, submitted, or complete.

• DAGMan also guarantees that a DAG is recoverable, even if the machine running DAGMan goes down during execution.
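As a rough illustration of the bookkeeping described above, the Python sketch below rebuilds each node's state from a list of events and then picks the nodes that may be submitted next. The (node, event) tuples and the helper names are hypothetical stand-ins for the real Condor user log, whose format is not shown in this talk; the point is only that a restarted DAGMan can recover state from the log and never resubmit a node that already completed.

```python
# Node states named on the slide; the event names used below are hypothetical.
UNSUBMITTED, SUBMITTED, COMPLETE = "unsubmitted", "submitted", "complete"


def state_from_events(nodes, events):
    """Replay (node, event) pairs, a simplified stand-in for the Condor user log."""
    state = {node: UNSUBMITTED for node in nodes}
    for node, event in events:
        if event == "submit":
            state[node] = SUBMITTED
        elif event == "terminate":
            state[node] = COMPLETE
    return state


def ready_nodes(parents, state):
    """Return, sorted, the unsubmitted nodes whose parents have all completed."""
    return sorted(
        node
        for node, deps in parents.items()
        if state[node] == UNSUBMITTED and all(state[p] == COMPLETE for p in deps)
    )


# Diamond DAG: A runs before B and C, which both run before D.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
# Suppose DAGMan restarts after A has finished and B has been submitted.
state = state_from_events(parents, [("A", "submit"), ("A", "terminate"), ("B", "submit")])
print(ready_nodes(parents, state))  # ['C']
```

After the restart only C is eligible: A is already complete, B is still in the queue, and D must wait for both B and C.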


DAG File Example

# Filename: diamond.dag
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
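The ordering that diamond.dag encodes can be checked with a short Python sketch: it parses the JOB and PARENT ... CHILD lines and derives one valid submission order with Kahn's topological sort. This only illustrates the dependency semantics, not how DAGMan itself is implemented; parse_dag and topological_order are hypothetical names.

```python
DIAMOND = """\
# Filename: diamond.dag
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""


def parse_dag(text):
    """Return (jobs, edges): jobs maps node -> submit file, edges holds (parent, child) pairs."""
    jobs, edges = {}, set()
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        keyword = words[0].upper()  # the example mixes "Job" and "PARENT", so ignore case
        if keyword == "JOB":
            jobs[words[1]] = words[2]
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for parent in words[1:split]:
                for child in words[split + 1:]:
                    edges.add((parent, child))
    return jobs, edges


def topological_order(jobs, edges):
    """Kahn's algorithm; ties are broken alphabetically so the result is deterministic."""
    indegree = {job: 0 for job in jobs}
    children = {job: [] for job in jobs}
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)
    ready = sorted(job for job, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
        ready.sort()
    if len(order) != len(jobs):
        raise ValueError("cycle detected: not a DAG")
    return order


jobs, edges = parse_dag(DIAMOND)
print(topological_order(jobs, edges))  # ['A', 'B', 'C', 'D']
```

The sketch serializes B and C alphabetically; since neither depends on the other, a scheduler is free to run them at the same time.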


Submitting the DAG to Condor

• To guarantee recoverability, the DAGMan program itself runs as a Condor job.

• Submit the DAG with “condor_submit_dag diamond.dag”.

• This script generates diamond.dag.condor.sub, the CondorCommandFile for the DAG, and submits it to Condor.


Essentials

• Prepare Jobs: Each CondorCommandFile can submit only one job; multi-job clusters (multiple queue lines) are not supported. The log= entry of every CondorCommandFile must point to the same Condor log file; otherwise, DAGMan will not see the Condor log entries for every job in the DAG.

• Write the DAG File: Write the DAG file so that its JOB entries refer to the CondorCommandFiles written in the previous step.

• Submit the DAG: Finally, submit the DAG file from the previous step with the condor_submit_dag script.
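A minimal pre-submission check for the two "Prepare Jobs" rules might look like the following, assuming submit files in the simple key = value form used in this talk. parse_submit and check_dag_jobs are hypothetical helpers; only a bare queue and queue N are recognized, and other queue forms are out of scope.

```python
def parse_submit(text):
    """Return (settings, job_count) for a simple 'key = value' submit file."""
    settings, jobs = {}, 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0].lower() == "queue":
            # A bare 'queue' queues one job; 'queue N' queues N (other forms not handled).
            jobs += int(words[1]) if len(words) > 1 and words[1].isdigit() else 1
        elif "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip().lower()] = value.strip()
    return settings, jobs


def check_dag_jobs(submit_files):
    """Check the slide's rules: one job per submit file and one shared log file."""
    problems, logs = [], set()
    for name, text in submit_files.items():
        settings, jobs = parse_submit(text)
        if jobs != 1:
            problems.append(f"{name}: queues {jobs} jobs, expected exactly 1")
        logs.add(settings.get("log"))
    if None in logs:
        problems.append("at least one submit file has no log= entry")
    if len(logs) > 1:
        problems.append(f"log= differs between submit files: {sorted(map(str, logs))}")
    return problems


a = "executable = a.sh\nlog = diamond.log\nqueue\n"
b = "executable = b.sh\nlog = other.log\nqueue 2\n"
for problem in check_dag_jobs({"A.condor": a, "B.condor": b}):
    print(problem)
```

Here B.condor breaks both rules: it queues two jobs and writes to a different log file than A.condor.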


Complications

• Setup, cleanup, or interpretation of a node (scripts), e.g., decompression, compression, serialization

• Throttling (too many scripts)

• Unreliable applications or subsystems
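In DAGMan these complications are commonly addressed with the SCRIPT PRE/POST and RETRY keywords in the DAG file and with condor_submit_dag's -maxpre, -maxpost and -maxjobs throttles; the exact set of options depends on the Condor version in use. The Python sketch below only renders such lines. node_lines, submit_dag_command and the script names decompress.sh and compress.sh are hypothetical.

```python
def node_lines(name, submit_file, pre=None, post=None, retry=0):
    """Render DAG-file lines for one node with optional PRE/POST scripts and retries."""
    lines = [f"JOB {name} {submit_file}"]
    if pre:
        lines.append(f"SCRIPT PRE {name} {pre}")    # setup, e.g. decompress inputs
    if post:
        lines.append(f"SCRIPT POST {name} {post}")  # cleanup, e.g. compress outputs
    if retry:
        lines.append(f"RETRY {name} {retry}")       # rerun an unreliable node
    return lines


def submit_dag_command(dag_file, max_pre=None, max_post=None, max_jobs=None):
    """Build a condor_submit_dag argument list that throttles scripts and jobs."""
    argv = ["condor_submit_dag"]
    for flag, value in (("-maxpre", max_pre), ("-maxpost", max_post), ("-maxjobs", max_jobs)):
        if value is not None:
            argv += [flag, str(value)]
    return argv + [dag_file]


print("\n".join(node_lines("B", "B.condor", pre="decompress.sh", post="compress.sh", retry=3)))
print(" ".join(submit_dag_command("diamond.dag", max_pre=2, max_jobs=10)))
```

Keeping the throttles on the command line rather than in the DAG file lets the same DAG be submitted with different limits.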


Stork

• Stork is an emerging Condor technology for managing data placement.

• Stork provides a fault-tolerant framework for scheduling data allocation and data transfer jobs. The architecture is modular and extensible, with support for many popular storage systems and data transfer protocols.

• Modules: ftp, gsiftp (GridFTP), http, nest (Condor NeST network storage), srb (SDSC Storage Resource Broker), csrm (CASTOR SRM), srm (dCache SRM), unitree (NCSA UniTree), diskrouter
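To make the modular design concrete, here is a sketch that builds a transfer request shaped like the transfer.stork file from this talk and rejects URL schemes that have no module in the list above. Treating the URL scheme as the module name is an assumption of this sketch (file: is accepted for local paths, as in the talk's example), and transfer_request is a hypothetical helper, not part of Stork.

```python
from urllib.parse import urlparse

# Module names from the slide; mapping URL schemes onto them is an assumption.
MODULES = {"ftp", "gsiftp", "http", "nest", "srb", "csrm", "srm", "unitree", "diskrouter"}


def transfer_request(src_url, dest_url, log="transfer.log"):
    """Render a Stork-style transfer request, refusing schemes with no listed module."""
    for url in (src_url, dest_url):
        scheme = urlparse(url).scheme
        if scheme != "file" and scheme not in MODULES:
            raise ValueError(f"no module listed for scheme {scheme!r}")
    return "\n".join([
        "[",
        "  dap_type = transfer;",
        f'  src_url = "{src_url}";',
        f'  dest_url = "{dest_url}";',
        f'  log = "{log}";',
        "]",
    ])


print(transfer_request("file:/tmp/stork/process.results.out",
                       "nest://turkey.cs.wisc.edu/1.dat"))
```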


Condor submit file

$ cat process.condor
universe = vanilla
executable = /bin/sort
arguments = /tmp/stork/index.html /tmp/stork/classad-talk.ps
output = /tmp/stork/process.results.out
error = process.results.err
log = process.results.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
notification = never
queue


Using Stork with Condor DAGMan

$ cat transfer.stork
[
 dap_type = transfer;
 src_url = "file:/tmp/stork/process.results.out";
 dest_url = "nest://turkey.cs.wisc.edu/1.dat";
 alt_protocols = "gsiftp-nest";
 log = "transfer.log";
]

$ cat stork-condor.dag
DATA INPUT1 alt_protocol.stork
DATA INPUT2 transfer_ftp-file.stork
JOB PROCESS process.condor
DATA OUTPUT transfer.stork
PARENT INPUT1 INPUT2 CHILD PROCESS
PARENT PROCESS CHILD OUTPUT
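In stork-condor.dag the DATA nodes name Stork request files (*.stork) and the JOB node names a Condor submit file, while the PARENT/CHILD lines order them. The Python sketch below records each node's kind and groups the nodes into stages, where every node in a stage depends only on earlier stages. parse_nodes and stages are hypothetical names, and the sketch submits nothing.

```python
STORK_DAG = """\
DATA INPUT1 alt_protocol.stork
DATA INPUT2 transfer_ftp-file.stork
JOB PROCESS process.condor
DATA OUTPUT transfer.stork
PARENT INPUT1 INPUT2 CHILD PROCESS
PARENT PROCESS CHILD OUTPUT
"""


def parse_nodes(text):
    """Return (nodes, parents): node -> (kind, file) and node -> set of parent nodes."""
    nodes, parents = {}, {}
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        keyword = words[0].upper()
        if keyword in ("JOB", "DATA"):
            nodes[words[1]] = (keyword, words[2])
            parents.setdefault(words[1], set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(words[1:split])
    return nodes, parents


def stages(parents):
    """Group nodes so that each stage depends only on nodes in earlier stages."""
    done, result = set(), []
    while len(done) < len(parents):
        stage = sorted(n for n, ps in parents.items() if n not in done and ps <= done)
        if not stage:
            raise ValueError("cycle detected: not a DAG")
        result.append(stage)
        done.update(stage)
    return result


nodes, parents = parse_nodes(STORK_DAG)
for stage in stages(parents):
    print([(name, nodes[name][0]) for name in stage])
```

The two input transfers share the first stage, so they can proceed concurrently before the PROCESS job starts; the OUTPUT transfer runs last.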


Thanks For Listening

Questions?