A Guide to DAGMan
A Guide to the DAGMan (7.0) “Specification”
Information provided by the folks at Condor
WARNING!!! This presentation lacks images
DAGMan
• “DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor”
• Manages dependencies between compute and data jobs at a high level
What this means to us?
• Provides users a simple way to denote simple dependencies between jobs
An Example
# Filename: aBoringExample.dag
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D
# Filename: a.condor
Executable = foo
# Memory is expressed in megabytes, so no unit is written
Requirements = Memory >= 32
Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
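The Queue 150 line in a.condor creates 150 processes in one cluster, and Condor numbers them 0 through 149 when it substitutes $(Process). A minimal Python sketch of that substitution follows; the helper name expand_process_macro is made up for illustration and is not part of Condor.

```python
def expand_process_macro(template, queue_count):
    """Return the name each queued process would get.

    Processes in a cluster are numbered 0 .. queue_count - 1, and that
    number replaces every occurrence of $(Process) in the template.
    """
    return [template.replace("$(Process)", str(i)) for i in range(queue_count)]


# Hypothetical helper applied to the Input line of a.condor
inputs = expand_process_macro("in.$(Process)", 150)
print(len(inputs), inputs[0], inputs[-1])  # 150 in.0 in.149
```

So each of the 150 runs of foo reads its own in.N file and writes its own out.N and err.N files.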
Nodes
• A node is composed of
  – A “cluster” of compute or data jobs defined by one Condor or Stork description file respectively
    • A group of executions defined by one queue command (e.g. 150 instances of the same program)
  – (optionally) associated pre or post scripts
• Only one cluster can be defined per submit file for use with DAGMan
Directed Links
• Simple Dependencies
  – Tells Condor that child nodes cannot be executed until all of their parents have completed successfully
• No complex relationships / dependencies can be given to DAGMan
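To make the parent/child rule concrete, here is a rough Python sketch that reads the lines of aBoringExample.dag and groups the nodes into "waves" that could run at the same time. It is only an illustration of the dependency rule, not how DAGMan itself is implemented, and the function names parse_dag and execution_waves are invented for this example.

```python
DAG_TEXT = """\
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""


def parse_dag(text):
    """Return (nodes, parents); parents maps each node to the set of its parents."""
    nodes = []
    parents = {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] == "JOB":
            nodes.append(words[1])
            parents.setdefault(words[1], set())
        elif words[0] == "PARENT":
            split_at = words.index("CHILD")
            for child in words[split_at + 1:]:
                parents.setdefault(child, set()).update(words[1:split_at])
    return nodes, parents


def execution_waves(nodes, parents):
    """Group nodes so that each wave only depends on earlier waves."""
    done = set()
    waves = []
    while len(done) < len(nodes):
        ready = [n for n in nodes if n not in done and parents[n] <= done]
        if not ready:
            raise ValueError("cycle detected: the graph is not acyclic")
        waves.append(ready)
        done.update(ready)
    return waves


nodes, parents = parse_dag(DAG_TEXT)
print(execution_waves(nodes, parents))  # [['A'], ['B', 'C'], ['D']]
```

A runs first, B and C become eligible together once A is finished, and D waits for both B and C.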
Specification (the basics)
JOB / DATA
  {JOB | DATA} jobName jobDescFile.condor [DONE] [DIR WD]
SCRIPT
  SCRIPT {PRE|POST} jobName scriptName.sh [arguments]
PARENT..CHILD
  PARENT p1 [p2 …] CHILD c1 [c2 …]
RETRY
  RETRY jobName numRetries [UNLESS-EXIT value]
Others: priority, category, vars, maxjobs, abort-dag-on, config (see documentation or feel free to ask)
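The RETRY line is easiest to understand as a loop. The Python sketch below assumes the usual reading: a failed node is run again up to numRetries more times, and retrying stops early if the node exits with the UNLESS-EXIT value. The function run_with_retry and the fake node are hypothetical; DAGMan does this internally and exposes no such function.

```python
def run_with_retry(run_node, num_retries, unless_exit=None):
    """Run a node, retrying on failure in the way RETRY describes.

    run_node is any callable returning an exit code (0 means success).
    Returns (final_exit_code, attempts_made).
    """
    attempts = 0
    while True:
        attempts += 1
        code = run_node()
        if code == 0:
            return code, attempts  # success, nothing to retry
        if unless_exit is not None and code == unless_exit:
            return code, attempts  # UNLESS-EXIT value seen, give up now
        if attempts > num_retries:
            return code, attempts  # retries used up


# Hypothetical node that fails twice and then succeeds
exit_codes = iter([1, 1, 0])
print(run_with_retry(lambda: next(exit_codes), num_retries=3))  # (0, 3)
```

With RETRY jobName 3, a node that always fails would therefore be attempted four times in total under this reading.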
Other Features
• When a DAG is submitted, a submit description file is produced
  – Optionally use this file to build a hierarchy of dags (dags within dags)
• Can monitor progress by watching myFile.dag.dagman.out
• Job Recovery
  – If a node fails, DAGMan produces a new “rescue” dag
  – Can be used to restart the DAG at the nodes where failure occurred
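One way to picture the recovery DAG is as a copy of the original DAG in which the nodes that already finished are marked with the [DONE] option from the JOB line syntax, so only the remaining nodes run on resubmission. The Python sketch below works under that assumption; make_rescue_dag is an invented name and this is not DAGMan's own code.

```python
def make_rescue_dag(dag_text, completed):
    """Return DAG text in which each completed node's JOB/DATA line ends in DONE."""
    out = []
    for line in dag_text.splitlines():
        words = line.split()
        is_node_line = len(words) >= 3 and words[0] in ("JOB", "DATA")
        if is_node_line and words[1] in completed and "DONE" not in words[3:]:
            line = line.rstrip() + " DONE"
        out.append(line)
    return "\n".join(out) + "\n"


original = """\
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""

# Assume A and B finished before C failed
rescue = make_rescue_dag(original, completed={"A", "B"})
print(rescue, end="")
```

In the printed result, A and B carry DONE while C and D are unchanged, so a resubmission would pick up at C.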