TRANSCRIPT
An overview of Torque/Moab queuing
Topics
ARC topology
Authentication
Architecture of the queuing system
Workflow
Job scripts
Some queuing strategies
Network Topology
ARC Authentication
Accounts
Your account is your VT PID
Your password is your VT PID password
Contact 4help to change your password
Architecture
Resource Manager - Torque
Scheduler - Moab
Allocation Manager - Gold
Account Requests
To request an account:
http://www.arc.vt.edu/arc/UserAccounts.php
System X accounts:
https://portal.arc.vt.edu/allocation/alloc_request.html
To add users to a Hat/Project for System X, the PI should email [email protected] to ask to have that person added
Queue Architecture
Resource Manager
Torque (Tera-scale Open-source Research and QUEue manager)
Branch of OpenPBS
2 parts:
pbs_mom
Daemon on each compute node
Handles job start-up and keeps track of the node's state
pbs_server
Server that jobs are submitted to; keeps track of all nodes and jobs
Moab Scheduler
Takes state information from the resource manager and then schedules jobs to run
"The Brains"
Implements and manages:
Scheduling policies
Dynamic priorities
Reservations
Fairshare
Allocation Manager
Gold
Keeps track of cpu-hours
Workflow
From the queuing system's point of view:
When a scheduling interval starts
Moab asks pbs_server for the state of the nodes and of any jobs
Moab attempts to schedule any eligible jobs if there are enough resources free
Moab tells pbs_server to start any jobs that can be started
pbs_server contacts the pbs_mom on the first node assigned to the job (that pbs_mom is called the mother superior)
The mother superior executes the job script submitted by the user
When a pbs heartbeat happens
pbs_server contacts each pbs_mom and asks for the status of its node
Workflow
From a user's point of view:
Submit a job script to the queuing system
Wait for the job to be scheduled and run
Get the results
The Queue
The queue is divided into 3 subqueues:
Active – running
Eligible – idle, but waiting to run
Blocked – idle, held, or deferred
Blocked Jobs
A job can be "blocked" for several reasons:
Requested resources not available
Reserved nodes offline
User already has the maximum number of eligible jobs in the queue
User places an intentional hold
Moab supports four distinct types of holds: user, system, batch, and deferred
Job Scripts
The job script has a few definitions to inform the queuing system of your job requirements and who you are
It also includes environment variables and the commands to run your application
Script Definitions
Walltime request:
#PBS -lwalltime=hh:mm:ss
CPU request:
For System X: #PBS -lnodes=X:ppn=2 (X nodes with 2 processors per node)
For Cauldron: #PBS -lncpus=X (X cores)
Script Definitions
Which queue you want to use:
#PBS -q <queue name>
Queues available now:
System X OS X partition: production_q
System X Linux partition: linux_q
Cauldron: cauldron_q
Inferno2: inferno2_q
Ithaca: ithaca_q
Ithaca parallel Matlab: pmatlab_q
Script Definitions
Some information about who you are:
Your submission group: #PBS -W group_list=<group>
For System X it is tcf_user; for Cauldron it is sgiusers
Type `groups` when logged into a head node to check that you belong to the group of the machine you wish to submit to
Your cpu-hour hat: #PBS -A <hat>
On Cauldron it is sgim0000; System X users were told their hat in their welcome letters
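As a sketch of the group check above, a small helper (the name `has_group` is hypothetical) that tests whether a group name appears in the space-separated list that `groups` prints:

```shell
# Check whether a group name appears in a space-separated group list,
# as printed by `groups` on a head node.
has_group() {
  # $1 = group to look for, $2 = space-separated list of groups
  printf '%s\n' $2 | grep -qx "$1"
}

# Example: has_group sgiusers "$(groups)" && echo "ok to submit to Cauldron"
```

Running this before `qsub` avoids submitting a job that will simply be rejected for a wrong group_list.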
Job Script Template
#!/bin/bash
#PBS -lwalltime=01:00:00
#PBS -lncpus=8
#PBS -q cauldron_q
#PBS -W group_list=sgiusers
#PBS -A sgim0000
Job Script
After the PBS definitions, put in the commands to start your job
There are example job scripts in /apps/doc(s)
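Putting the template together with an application command, a minimal sketch of a complete Cauldron job script; `my_app` is a hypothetical stand-in for your binary. It is written to a file here so it can be inspected before submission:

```shell
# Sketch of a complete Cauldron job script (application name is hypothetical).
cat > myjob.sh <<'EOF'
#!/bin/bash
#PBS -lwalltime=01:00:00
#PBS -lncpus=8
#PBS -q cauldron_q
#PBS -W group_list=sgiusers
#PBS -A sgim0000

# Torque starts the script in your home directory; move to where you submitted
cd "$PBS_O_WORKDIR"

# Run the application, capturing its output (hypothetical command)
./my_app > my_app.log 2>&1
EOF
```

Submit it with `qsub ./myjob.sh`.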
Running Your Job
Use qsub to submit your job to the queue:
qsub ./jobscript
To check on your job's status:
qstat -a <queue name>
showq -p <partition name> (partitions: OSX, LINUX, or CAULDRON)
checkjob <job id number>
cstat (on Cauldron)
To delete a job, use qdel:
qdel <job id number>
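Torque reports a full job id such as 176956.queue.arc-int.vt.edu when you submit, while the commands above are shown taking just the numeric part. A tiny helper (hypothetical name) to extract it:

```shell
# Extract the numeric job number from a full Torque job id.
job_number() {
  printf '%s\n' "${1%%.*}"   # strip everything from the first dot onward
}

# Example: qdel "$(job_number 176956.queue.arc-int.vt.edu)"
```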
Check Status
To display jobs currently in the queue:
-bash-3.1$ showq -p LINUX
active jobs------------------------
JOBID USERNAME STATE PROCS REMAINING STARTTIME
176882 jalemkul Running 24 23:46:56 Mon Aug 2 07:11:24
176885 jalemkul Running 24 1:01:37:59 Mon Aug 2 09:02:27
176889 jalemkul Running 24 1:02:21:27 Mon Aug 2 09:45:55
176918 kmsong Running 44 6:14:25:16 Mon Aug 2 16:49:44
176897 kmsong Running 88 15:17:01:30 Tue Aug 3 11:25:58
5 active jobs 118 of 118 processors in use by local jobs (100.00%)
50 of 59 nodes active (84.75%)
eligible jobs----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
0 eligible jobs
blocked jobs-----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
176956 kmsong Idle 112 33:08:00:00 Tue Aug 3 15:15:13
1 blocked job
Total jobs: 6
Check Status
With qstat:
-bash-3.1$ qstat linux_q
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
176882.queue yt42_md1 jalemkul 1229:52: R linux_q
176885.queue yt42_md3 jalemkul 1185:20: R linux_q
176889.queue yt42_md2 jalemkul 1168:07: R linux_q
176897.queue DNS kmsong 00:00:00 R linux_q
176918.queue Re1200_2sec kmsong 1828:24: R linux_q
176956.queue LDNS kmsong 0 Q linux_q
Note: status is given in the S column: R – running, Q – queued
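Saved qstat output like the listing above can be post-processed with standard tools; for example, counting jobs per state (the sample lines and the helper name are illustrative):

```shell
# A few sample lines in Torque's qstat layout, saved for the demonstration;
# the state (R or Q) is the second-to-last column.
cat > qstat_sample.txt <<'EOF'
176882.queue yt42_md1 jalemkul 1229:52: R linux_q
176897.queue DNS      kmsong   00:00:00 R linux_q
176956.queue LDNS     kmsong   0        Q linux_q
EOF

# Count the jobs whose state column matches $1 in file $2.
count_state() {
  awk -v s="$1" '$(NF-1) == s { n++ } END { print n + 0 }' "$2"
}

count_state R qstat_sample.txt   # → 2
```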
checkjob -v
-bash-3.1$ checkjob -v 176956
job 176956 (RM job '176956.queue.arc-int.vt.edu')
AName: LDNS
State: Idle
Creds: user:kmsong group:tcf_user account:engr1003 class:linux_q qos:sysx_qos
WallTime: 00:00:00 of 33:08:00:00
SubmitTime: Tue Aug 3 15:15:13
(Time Queued Total: 1:23:40:11 Eligible: 00:00:19)
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 112
Total Requested Nodes: 56
Req[0] TaskCount: 112 Partition: ALL
NodeAccess: SINGLEJOB
TasksPerNode: 2
UMask: 0000
OutputFile: sysx2.arc-int.vt.edu:/home/kmsong/Turb_channel/Simulation/Re600/176103.queue.arc-int.vt.edu/LDNS.o176956
ErrorFile: sysx2.arc-int.vt.edu:/home/kmsong/Turb_channel/Simulation/Re600/176103.queue.arc-int.vt.edu/LDNS.e176956
BypassCount: 305
Partition List: LINUX,SHARED
SrcRM: SystemX DstRM: SystemX DstRMJID: 176956.queue.arc-int.vt.edu
Submit Args: -l walltime=800:00:00 -l nodes=56:ppn=2 -Wgroup_list -Aengr1003 -NLDNS -q linux_q -I
Flags: INTERACTIVE
Attr: INTERACTIVE,checkpoint
StartPriority: 200
PE: 112.00
NOTE: job violates constraints for partition OSX (partition OSX not in job partition mask)
Node Availability for Partition LINUX --------
available for 2 tasks - n[925,951-958]
rejected for State - n[833-1024]
NOTE: job req cannot run in dynamic partition LINUX now (insufficient procs available: 18 < 112)
NOTE: job violates constraints for partition CAULDRON (partition CAULDRON not in job partition mask)
NOTE: job violates constraints for partition INFERNO2 (partition INFERNO2 not in job partition mask)
NOTE: job violates constraints for partition TT (partition TT not in job partition mask)
NOTE: job violates constraints for partition PECOS (partition PECOS not in job partition mask)
NOTE: job violates constraints for partition ITHACA (partition ITHACA not in job partition mask)
BLOCK MSG: job 176956 violates active SOFT MAXJOB limit of 2 for class linux_q user (Req: 1 InUse: 2) (recorded at last scheduling iteration)
Queuing Strategies
Queue early, queue often
Queue your jobs up! You can't run jobs if they aren't in the queue
Don't wait for the queue to get smaller before submitting; the job will wait either way, and smaller jobs may be backfilled
Have an accurate walltime
Accurate walltimes help the scheduler backfill smaller jobs in between runs of larger jobs, but only if doing so won't affect the start time of the next job
Try to queue large jobs before downtimes
If you have a large job that can never seem to have enough cpus available, queue it up before a downtime
Queue Strategies
The command `showbf` shows cpus available right now, and for how long
showstart gives the estimated start time of a job
checkjob -v also reports scheduling details for a specific job
Checkpointing
If your code does checkpointing, you can exploit backfill by queuing jobs to fill the small gaps, even if they may not run to completion
Checkpointing is a good idea in general, in case of hardware failure
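A checkpoint-friendly backfill job might look like the sketch below; `my_app`, its `--restart` flag, and checkpoint.dat are hypothetical, and the short, accurate walltime is what makes the job a backfill candidate:

```shell
# Sketch of a checkpoint-aware backfill job script (application name,
# restart flag, and checkpoint file name are all hypothetical).
cat > backfill_job.sh <<'EOF'
#!/bin/bash
#PBS -lwalltime=02:00:00
#PBS -lncpus=8
#PBS -q cauldron_q
#PBS -W group_list=sgiusers
#PBS -A sgim0000

cd "$PBS_O_WORKDIR"
if [ -f checkpoint.dat ]; then
  ./my_app --restart checkpoint.dat   # resume where the last run stopped
else
  ./my_app                            # fresh start
fi
EOF
```

Resubmitting the same script repeatedly lets each short run pick up where the previous one left off.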
showbf
-bash-3.1$ showbf
Partition Tasks Nodes StartOffset Duration StartDate
--------- ----- ----- ------------ ------------ --------------
ALL 146 43 00:00:00 INFINITY 09:29:01_08/10
OSX 4 2 00:00:00 INFINITY 09:29:01_08/10
LINUX 62 31 00:00:00 INFINITY 09:29:01_08/10
PECOS 8 1 00:00:00 INFINITY 09:29:01_08/10
ITHACA 72 9 00:00:00 INFINITY 09:29:01_08/10
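Output like the above is easy to parse if a script needs the free task count for one partition (the sample file and helper name are illustrative):

```shell
# Sample lines in showbf's layout, saved for the demonstration.
cat > showbf_sample.txt <<'EOF'
ALL       146    43  00:00:00  INFINITY  09:29:01_08/10
LINUX      62    31  00:00:00  INFINITY  09:29:01_08/10
EOF

# Print the Tasks column for partition $1 from file $2.
avail_tasks() {
  awk -v p="$1" '$1 == p { print $2; exit }' "$2"
}

avail_tasks LINUX showbf_sample.txt   # → 62
```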
showstart
-bash-3.1$ showstart 177165
job 177165 requires 64 procs for 12:00:00
Estimated Rsv based start in 8:47:53 on Tue Aug 10 18:11:40
Estimated Rsv based completion in 20:47:53 on Wed Aug 11 06:11:40
Best Partition: OSX
showstart
-bash-3.1$ showstart 64@12:00:00
job 64@12:00:00 requires 64 procs for 12:00:00
Estimated Rsv based start in 8:44:19 on Tue Aug 10 18:11:40
Estimated Rsv based completion in 20:44:19 on Wed Aug 11 06:11:40
Best Partition: OSX
Documentation
Torque/PBS and Moab scheduler and job submission documentation:
http://www.clusterresources.com/pages/resources/documentation.php