TRANSCRIPT
An overview of Torque/Moab queuing
Topics
ARC topology
Authentication
Architecture of the queuing system
Workflow
Job scripts
Some queuing strategies
Network Topology
ARC Authentication
Accounts
Your account is your VT PID
Your password is your VT PID password
Contact 4help to change your password
Architecture
Resource Manager - Torque
Scheduler - Moab
Allocation Manager - Gold
Account Requests
To request an account:
http://www.arc.vt.edu/arc/UserAccounts.php
System X accounts:
https://portal.arc.vt.edu/allocation/alloc_request.html
To add users to a Hat/Project for System X, the PI should email [email protected] to ask to have that person added
Queue Architecture
Resource Manager
Torque (Tera-scale Open-source Research and QUEue manager)
Branch of OpenPBS
2 parts:
pbs_mom
Daemon on each compute node
Handles job start-up and keeps track of the node's state
pbs_server
Server that jobs are submitted to; keeps track of all nodes and jobs
Moab Scheduler
Takes state information from the resource manager and then schedules jobs to run
"The Brains"
Implements and manages:
Scheduling policies
Dynamic priorities
Reservations
Fairshare
Allocation Manager
Gold
Keeps track of cpu-hours
Workflow
From the queuing system's point of view:
When a scheduling interval starts
Moab asks pbs_server for the state of the nodes and of any jobs
Moab attempts to schedule any eligible jobs if there are enough resources free
Moab tells pbs_server to start any jobs that can be started
pbs_server contacts the pbs_mom on the first node assigned to the job (that pbs_mom is called the mother superior)
The mother superior executes the job script submitted by the user
When a pbs heartbeat happens
pbs_server contacts each pbs_mom and asks for the status of its node
Workflow
From a user's point of view:
Submit a job script to the queuing system
Wait for the job to be scheduled and run
Get the results
The Queue
The queue is divided into 3 subqueues:
Active – running
Eligible – idle, but waiting to run
Blocked – idle, held, or deferred
Blocked Jobs
A job can be "blocked" for several reasons:
Requested resources not available
Reserved nodes offline
User already has the maximum number of eligible jobs in the queue
User places an intentional hold
Moab supports four distinct types of holds: user, system, batch, and deferred
Job Scripts
The job script has a few definitions to inform the queuing system of your job requirements and who you are
It also includes environment variables and the commands to run your application
Script Definitions
Walltime request:
#PBS -lwalltime=hh:mm:ss
CPU request:
For System X: #PBS -lnodes=X:ppn=2 (X nodes with 2 processors per node)
For Cauldron: #PBS -lncpus=X (X cores)
Script Definitions
Which queue you want to use:
#PBS -q <queue name>
Queues available now:
System X OS X partition: production_q
System X Linux partition: linux_q
Cauldron: cauldron_q
Inferno2: inferno2_q
Ithaca: ithaca_q
Ithaca parallel Matlab: pmatlab_q
Script Definitions
Some information about who you are:
Your submission group: #PBS -W group_list=<group>
For System X it is tcf_user; for Cauldron it is sgiusers
Type `groups` when logged into a head node to check that you belong to the group of the machine you wish to submit to
Your cpu-hour hat: #PBS -A <hat>
On Cauldron it is sgim0000; System X users were told their hat in their welcome letters
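As a sketch of the group check above, a small helper (the name `has_group` is hypothetical) that tests whether a group name appears in the space-separated list that `groups` prints:

```shell
# Check whether a group name appears in a space-separated group list,
# as printed by `groups` on a head node.
has_group() {
  # $1 = group to look for, $2 = space-separated list of groups
  printf '%s\n' $2 | grep -qx "$1"
}

# Example: has_group sgiusers "$(groups)" && echo "ok to submit to Cauldron"
```

Running this before `qsub` avoids submitting a job that will simply be rejected for a wrong group_list.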
Job Script Template
#!/bin/bash
#PBS -lwalltime=01:00:00
#PBS -lncpus=8
#PBS -q cauldron_q
#PBS -W group_list=sgiusers
#PBS -A sgim0000
Job Script
After the PBS definitions, put in the commands to start your job
There are example job scripts in /apps/doc(s)
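Putting the template together with an application command, a minimal sketch of a complete Cauldron job script; `my_app` is a hypothetical stand-in for your binary. It is written to a file here so it can be inspected before submission:

```shell
# Sketch of a complete Cauldron job script (application name is hypothetical).
cat > myjob.sh <<'EOF'
#!/bin/bash
#PBS -lwalltime=01:00:00
#PBS -lncpus=8
#PBS -q cauldron_q
#PBS -W group_list=sgiusers
#PBS -A sgim0000

# Torque starts the script in your home directory; move to where you submitted
cd "$PBS_O_WORKDIR"

# Run the application, capturing its output (hypothetical command)
./my_app > my_app.log 2>&1
EOF
```

Submit it with `qsub ./myjob.sh`.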
Running Your Job
Use qsub to submit your job to the queue:
qsub ./jobscript
To check on your job's status:
qstat -a <queue name>
showq -p <partition name> (partitions: OSX, LINUX, or CAULDRON)
checkjob <job id number>
cstat (on Cauldron)
To delete a job, use qdel:
qdel <job id number>
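Torque reports a full job id such as 176956.queue.arc-int.vt.edu when you submit, while the commands above are shown taking just the numeric part. A tiny helper (hypothetical name) to extract it:

```shell
# Extract the numeric job number from a full Torque job id.
job_number() {
  printf '%s\n' "${1%%.*}"   # strip everything from the first dot onward
}

# Example: qdel "$(job_number 176956.queue.arc-int.vt.edu)"
```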
Check Status
To display jobs currently in the queue:
-bash-3.1$ showq -p LINUX
active jobs------------------------
JOBID USERNAME STATE PROCS REMAINING STARTTIME
176882 jalemkul Running 24 23:46:56 Mon Aug 2 07:11:24
176885 jalemkul Running 24 1:01:37:59 Mon Aug 2 09:02:27
176889 jalemkul Running 24 1:02:21:27 Mon Aug 2 09:45:55
176918 kmsong Running 44 6:14:25:16 Mon Aug 2 16:49:44
176897 kmsong Running 88 15:17:01:30 Tue Aug 3 11:25:58
5 active jobs 118 of 118 processors in use by local jobs (100.00%)
50 of 59 nodes active (84.75%)
eligible jobs----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
0 eligible jobs
blocked jobs-----------------------
JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
176956 kmsong Idle 112 33:08:00:00 Tue Aug 3 15:15:13
1 blocked job
Total jobs: 6
Check Status
With qstat:
-bash-3.1$ qstat linux_q
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
176882.queue yt42_md1 jalemkul 1229:52: R linux_q
176885.queue yt42_md3 jalemkul 1185:20: R linux_q
176889.queue yt42_md2 jalemkul 1168:07: R linux_q
176897.queue DNS kmsong 00:00:00 R linux_q
176918.queue Re1200_2sec kmsong 1828:24: R linux_q
176956.queue LDNS kmsong 0 Q linux_q
Note: status is given in the S column: R – running, Q – queued
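Saved qstat output like the listing above can be post-processed with standard tools; for example, counting jobs per state (the sample lines and the helper name are illustrative):

```shell
# A few sample lines in Torque's qstat layout, saved for the demonstration;
# the state (R or Q) is the second-to-last column.
cat > qstat_sample.txt <<'EOF'
176882.queue yt42_md1 jalemkul 1229:52: R linux_q
176897.queue DNS      kmsong   00:00:00 R linux_q
176956.queue LDNS     kmsong   0        Q linux_q
EOF

# Count the jobs whose state column matches $1 in file $2.
count_state() {
  awk -v s="$1" '$(NF-1) == s { n++ } END { print n + 0 }' "$2"
}

count_state R qstat_sample.txt   # → 2
```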
checkjob -v
-bash-3.1$ checkjob -v 176956
job 176956 (RM job '176956.queue.arc-int.vt.edu')
AName: LDNS
State: Idle
Creds: user:kmsong group:tcf_user account:engr1003 class:linux_q qos:sysx_qos
WallTime: 00:00:00 of 33:08:00:00
SubmitTime: Tue Aug 3 15:15:13
(Time Queued Total: 1:23:40:11 Eligible: 00:00:19)
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 112
Total Requested Nodes: 56
Req[0] TaskCount: 112 Partition: ALL
NodeAccess: SINGLEJOB
TasksPerNode: 2
UMask: 0000
OutputFile: sysx2.arc-int.vt.edu:/home/kmsong/Turb_channel/Simulation/Re600/176103.queue.arc-int.vt.edu/LDNS.o176956
ErrorFile: sysx2.arc-int.vt.edu:/home/kmsong/Turb_channel/Simulation/Re600/176103.queue.arc-int.vt.edu/LDNS.e176956
BypassCount: 305
Partition List: LINUX,SHARED
SrcRM: SystemX DstRM: SystemX DstRMJID: 176956.queue.arc-int.vt.edu
Submit Args: -l walltime=800:00:00 -l nodes=56:ppn=2 -Wgroup_list -Aengr1003 -NLDNS -q linux_q -I
Flags: INTERACTIVE
Attr: INTERACTIVE,checkpoint
StartPriority: 200
PE: 112.00
NOTE: job violates constraints for partition OSX (partition OSX not in job partition mask)
Node Availability for Partition LINUX --------
available for 2 tasks - n[925,951-958]
rejected for State - n[833-1024]
NOTE: job req cannot run in dynamic partition LINUX now (insufficient procs available: 18 < 112)
NOTE: job violates constraints for partition CAULDRON (partition CAULDRON not in job partition mask)
NOTE: job violates constraints for partition INFERNO2 (partition INFERNO2 not in job partition mask)
NOTE: job violates constraints for partition TT (partition TT not in job partition mask)
NOTE: job violates constraints for partition PECOS (partition PECOS not in job partition mask)
NOTE: job violates constraints for partition ITHACA (partition ITHACA not in job partition mask)
BLOCK MSG: job 176956 violates active SOFT MAXJOB limit of 2 for class linux_q user (Req: 1 InUse: 2) (recorded at last scheduling iteration)
Queuing Strategies
Queue early, queue often
Queue your jobs up! You can't run jobs if they aren't in the queue
Don't wait for the queue to get smaller before submitting; the job will wait either way, and smaller jobs may be backfilled
Have an accurate walltime
Accurate walltimes help the scheduler backfill smaller jobs in between runs of larger jobs, but only if doing so won't affect the start time of the next job
Try to queue large jobs before downtimes
If you have a large job that can never seem to have enough cpus available, queue it up before a downtime
Queue Strategies
The command `showbf` shows cpus available right now, and for how long
showstart gives the estimated start time of a job
checkjob -v also reports scheduling details for a specific job
Checkpointing
If your code does checkpointing, you can exploit backfill by queuing jobs to fill the small gaps, even if they may not run to completion
Checkpointing is a good idea in general, in case of hardware failure
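A checkpoint-friendly backfill job might look like the sketch below; `my_app`, its `--restart` flag, and checkpoint.dat are hypothetical, and the short, accurate walltime is what makes the job a backfill candidate:

```shell
# Sketch of a checkpoint-aware backfill job script (application name,
# restart flag, and checkpoint file name are all hypothetical).
cat > backfill_job.sh <<'EOF'
#!/bin/bash
#PBS -lwalltime=02:00:00
#PBS -lncpus=8
#PBS -q cauldron_q
#PBS -W group_list=sgiusers
#PBS -A sgim0000

cd "$PBS_O_WORKDIR"
if [ -f checkpoint.dat ]; then
  ./my_app --restart checkpoint.dat   # resume where the last run stopped
else
  ./my_app                            # fresh start
fi
EOF
```

Resubmitting the same script repeatedly lets each short run pick up where the previous one left off.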
showbf
-bash-3.1$ showbf
Partition Tasks Nodes StartOffset Duration StartDate
--------- ----- ----- ------------ ------------ --------------
ALL 146 43 00:00:00 INFINITY 09:29:01_08/10
OSX 4 2 00:00:00 INFINITY 09:29:01_08/10
LINUX 62 31 00:00:00 INFINITY 09:29:01_08/10
PECOS 8 1 00:00:00 INFINITY 09:29:01_08/10
ITHACA 72 9 00:00:00 INFINITY 09:29:01_08/10
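Output like the above is easy to parse if a script needs the free task count for one partition (the sample file and helper name are illustrative):

```shell
# Sample lines in showbf's layout, saved for the demonstration.
cat > showbf_sample.txt <<'EOF'
ALL       146    43  00:00:00  INFINITY  09:29:01_08/10
LINUX      62    31  00:00:00  INFINITY  09:29:01_08/10
EOF

# Print the Tasks column for partition $1 from file $2.
avail_tasks() {
  awk -v p="$1" '$1 == p { print $2; exit }' "$2"
}

avail_tasks LINUX showbf_sample.txt   # → 62
```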
showstart
-bash-3.1$ showstart 177165
job 177165 requires 64 procs for 12:00:00
Estimated Rsv based start in 8:47:53 on Tue Aug 10 18:11:40
Estimated Rsv based completion in 20:47:53 on Wed Aug 11 06:11:40
Best Partition: OSX
showstart
-bash-3.1$ showstart 64@12:00:00
job 64@12:00:00 requires 64 procs for 12:00:00
Estimated Rsv based start in 8:44:19 on Tue Aug 10 18:11:40
Estimated Rsv based completion in 20:44:19 on Wed Aug 11 06:11:40
Best Partition: OSX
Documentation
Torque/PBS and Moab scheduler and job submission documentation:
http://www.clusterresources.com/pages/resources/documentation.php