TRANSCRIPT
Copyright © 2010 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.
Rajiv Jaisankar
Technical Specialist
Altair APAC
PBS Professional Administration training
Chapter One: Understanding PBS Professional
Chapter One
What is PBS Professional?
History of PBS Professional
PBS Works Online Store
PBS Professional Documentation
Altair Global Offices & Technical Support
Broad Hardware and Operating System Support
Supported MPI Libraries
PBS Professional Components & Roles
What is PBS Professional?
Workload management solution that maximizes the efficiency and utilization of high-performance computing (HPC) resources and improves job turnaround.
Robust Workload Management
• Floating flex-based licenses
• Scalability, with flexible queues
• Job arrays
• User and administrator interface
• Job suspend/resume
• Application checkpoint/restart
• Automatic file staging
• Accounting logs
• Access control lists
Advanced Scheduling Algorithms
• Resource-based scheduling
• Preemptive scheduling
• Optimized node sorting
• Enhanced job placement
• Advance & standing reservations
• Cycle harvesting across workstations
• Scheduling across multiple complexes
• Network topology scheduling
• Manages both batch and interactive work
Reliability, Availability and Scalability
• Server failover feature
• Automatic job recovery
• Provides system monitoring
• Provides integration with MPI solutions
• Tested to manage 1,000,000+ jobs per day
History of PBS Professional
1993-97: Developed for NASA to replace NQS
2000: Veridian formed commercial version of PBS; Released PBS Professional 5.0
2003: Altair acquired PBS Professional technology and engineering; Released PBS Professional 5.3
2004: Released PBS Professional 5.4
2005: Released PBS Professional 7.0 and 7.1
2006: Released PBS Professional 8.0
2007: Released PBS Professional 9.0 and 9.1
2008: Released PBS Professional 9.2
2008: Released PBS Professional 10.0
2009: Released PBS Professional 10.1
2009: Released PBS Professional 10.2
2010: Released PBS Professional 10.4
2010: Released PBS Professional 11.0
2011: Released PBS Professional 11.1
Broad Hardware & Operating System Support
• AMD-Linux and Windows
• Intel-Linux and Windows
• IBM AIX on Power
• IBM Linux on Power
• HP-UX on Itanium 2
• Cray X2, XT, XT3, XT4, XT5, and XT6
• SGI Altix ICE, XI, and UV
• SUN Solaris on SPARC
• Windows 7, XP, Vista, Server 2003, and Server 2008
• Red Hat Enterprise 4, 5, and 6
• SLES 9, 10, and 11
Note: For a detailed list of supported systems & OS please refer to the latest release notes
Supported MPI Libraries
Currently supported MPI libraries integrated with PBS:
• MPICH 1.2.5, 1.2.6 on Linux 2.4 on x86, AMD64, EM64T, Itanium2
• MPICH 1.2.5, 1.2.6 on Linux 2.6 on x86, AMD64, EM64T
• MPICH 1.2.7 on x86 Linux
• MPICH-GM on Linux
• Intel MPI 2.0.22 on Linux
• MPICH2 1.0.3, 1.0.5, 1.0.7 on Linux
• IBM POE on AIX 5.x and 6.x, including HPS support
• HP MPI 1.08.03 on HP-UX 11 on Itanium 2
• HP MPI 2.0.0 on Linux 2.4 & 2.6 on x86, AMD64, EM64T, Itanium 2
• LAM/MPI 6.5.9/7.0.6/7.1.1 on Linux 2.4/2.6 on x86, AMD64, EM64T, Itanium 2
• SGI MPI (MPT) on Linux on Altix / Itanium 2/x86_64 and XE
• SGI MPI (MPT) over Infiniband
• MVAPICH 1.2.7/2.0 on Linux
• OpenMPI 1.4.2 on Linux
PBS Professional Components & Roles
Batch Server
- referred to as the PBS Server
- central focus for a PBS complex
- routes job to compute host *
- processes all PBS related commands *
- provides the basic batch services *
- server maintains its own server and queue settings *
- daemon executes as pbs_server.bin
Scheduler
- referred to as the PBS Scheduler
- queries list of running and queued jobs from the PBS Server *
- queries queue, server, and node properties *
- queries resource consumption and availability from the PBS MOM *
- sorts available jobs according to local scheduling policies
- determines which job is eligible to run next
- daemon executing as pbs_sched
MOM
- referred to as the PBS MOM
- executes jobs at request of PBS Scheduler
- monitors resource usage of running jobs
- enforces resource limits on jobs
- reports system resource limits, configuration *
- daemon executing as pbs_mom
* This information is for debugging purposes only. It may change in future releases and should not be relied upon.
Complex Configurations
Single Execution System
Server
Scheduler
MOM
All 3 PBS components on a single host.
Complex Configurations, cont.
Multiple Execution System
Front End System
MOM
MOM
MOM
Server
Scheduler
Note: The PBS Server machine may be a different architecture (UNIX/Linux) from the execution hosts
A PBS complex can be either UNIX/Linux or Windows, but not both.
Chapter Two - Installation of PBS Professional
Chapter Two
Pre-Installation
Basic Installation
Post-Installation
PBS Installed Directory Structure
Post Installation – PBS Configuration File
How does the PBS init script determine which services to invoke?
• The init script reads the configuration file: “/etc/pbs.conf”
• Format of a pbs.conf file:
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=1
PBS_START_MOM=1
PBS_START_SCHED=1
PBS_SERVER=traintb16
PBS_DATA_SERVICE_USER=pbsuser01
0 will prevent init from starting or stopping the daemon
1 will have init start or stop the daemon
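As a rough illustration of how the start flags could be read, the sketch below lists the daemons whose PBS_START_* flag is set to 1. The function name pbs_enabled_daemons and the temporary sample file are hypothetical and not part of PBS, and the real init script may parse the file differently.

```shell
#!/bin/bash
# Sketch only: list the daemons whose PBS_START_* flag is 1 in a
# pbs.conf-style file. pbs_enabled_daemons is an illustrative name.

pbs_enabled_daemons() {
    local conf_file="$1"
    local line key value
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines and comment lines.
        case "$line" in
            ''|'#'*) continue ;;
        esac
        key="${line%%=*}"
        value="${line#*=}"
        case "$key" in
            PBS_START_SERVER|PBS_START_MOM|PBS_START_SCHED)
                if [ "$value" = "1" ]; then
                    echo "${key#PBS_START_}"
                fi
                ;;
        esac
    done < "$conf_file"
}

# Demo with a sample file (assumed values: a server host that runs no MOM).
sample_conf="$(mktemp)"
cat > "$sample_conf" <<'EOF'
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=1
PBS_START_MOM=0
PBS_START_SCHED=1
PBS_SERVER=traintb16
EOF
pbs_enabled_daemons "$sample_conf"
rm -f "$sample_conf"
```

On a real host the same function could be pointed at /etc/pbs.conf instead of the sample file.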
File System and File Transfer
Sites will need to determine how users will access data files
• Most common file sharing methods used by PBS customers:
• NFS Network File System (most widely used)
• GFS Global File System
What method of file copy will be used?
• rcp remote copy (default used by PBS)
• scp secure copy
• cp Linux/Unix copy
User’s PBS Environment
Delivery of STDOUT/STDERR files
• PBS should be able to copy user’s STDOUT and STDERR files to the appropriate directory without password challenge
Stage input/output files
• Users may need to import/export files related to the job before/after execution
Users’ Data Transfer
• Users should be able to transfer data without having to supply a password (e.g. rcp/scp)
Users must have a valid account
• Users should be able to log onto execution host(s) and should have a valid username and group
Altair LM-X License Management
PBS Professional 11.0 is now licensed through the Altair License Management System (ALM), which is based on X-Formation’s LM-X license management system
Altair’s ALM package for PBS can be downloaded from:
https://secure.altair.com/UserArea/
We recommend that Altair’s ALM be installed and configured before installing PBS Professional v11.0
For additional information on Altair’s ALM refer to the Altair License Manager System 11.0 Installation and Operations Guide
Chapter Three - PBS Administration
Chapter Three
• Process flow of a PBS job
• PBS installed directory structure
• Directory structure of $PBS_HOME
• Directory structure of $PBS_EXEC
• Understanding the PBS configuration file
• Manually starting and stopping PBS daemons
• Impact of PBS daemons restarts on running jobs
• Network ports used by PBS
• Status of PBS complex
Process Flow of a PBS Job – User Level
1. User submits job
2. PBS Server returns a job ID
3. PBS Scheduler requests a list of resources from the Server *
4. PBS Scheduler sorts all the resources and jobs *
5. PBS Scheduler informs PBS Server which host(s) the job can run on *
6. PBS Server pushes job script to execution host(s)
7. PBS MOM executes job script
8. PBS MOM periodically reports resource usage back to PBS Server *
9. When job is completed PBS MOM kills the job script
10. PBS Server de-queues job from PBS complex
[Diagram: PBS Server, PBS Scheduler, and HOST A, HOST B, HOST C; job 6.traintb16 is matched on ncpus, mem, and host, and runs on HOST A]
Note: * This information is for debugging purposes only. It may change in future releases.
PBS Installed Directory Structure
PBS Professional software is installed in two separate directories
• $PBS_EXEC “/opt/pbs/default” contains:
- PBS daemons
- Libraries
- Man pages
- Support tools
- Administrator and user PBS commands
• $PBS_HOME “/var/spool/PBS” contains:
- PBS daemon configurations
- PBS daemon logs
- Other various file-related directories
PBS Directory Structure - PBS_HOME
Directory structure of $PBS_HOME
PBS_HOME
• daemon configuration directories: server_priv, mom_priv, sched_priv
• daemon log directories: server_logs, mom_logs, sched_logs
• misc directories/files: spool, undelivered, checkpoint, aux, pbs_environment, pbs_version, datastore
This information is for debugging purposes only. It may change in future releases.
PBS Directory Structure - PBS_EXEC
Directory structure of $PBS_EXEC
PBS_EXEC
• bin, sbin: binaries of PBS daemons and user/admin PBS commands
• lib, man, include: libraries, manual pages, and header files
• etc, tcltk, unsupported, python, pgsql
This information is for debugging purposes only. It may change in future releases.
Directory Structure of $PBS_HOME/server_priv
Detailed structure of $PBS_HOME/server_priv *
Name           Description
jobs           directory containing users’ job scripts
accounting     directory containing daily accounting logs
server.lock    PBS server PID lock file
hooks          directory containing custom hook definitions
tracking       PBS license related file
usedlic        PBS license related file
prov_tracking  OS provisioning directory
db_password    database password - encrypted
svrlive        used for failover configuration
* This information is for debugging purposes only. It may change in future releases.
PBS Configuration File – pbs.conf
PBS installs a configuration file “pbs.conf” located in “/etc/” directory. This configuration file is used by PBS to determine:
• Which daemons to start/stop
• What PBS server to communicate with
• What file copy mechanism to use
Each server/scheduler, execution, and client host has a pbs.conf file installed
Refer to Administrator’s Guide; Chapter 13; Section 13.1.3; pages 715-716 for a complete listing of configuration file variables
Default contents of pbs.conf:
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=1
PBS_START_MOM=1
PBS_START_SCHED=1
PBS_SERVER=hostname.domain
PBS_DATA_SERVICE_USER=pbsuser01
PBS Configuration File – pbs.conf, cont.
How pbs.conf differs between the PBS Server and PBS MOM hosts:
PBS SERVER HOST
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=1
PBS_START_MOM=0
PBS_START_SCHED=1
PBS_SERVER=traintb16
PBS_DATA_SERVICE_USER=pbsuser01
PBS EXECUTION HOST
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=0
PBS_START_MOM=1
PBS_START_SCHED=0
PBS_SERVER=traintb16
Note: Only 1 active instance of a PBS Server and PBS Scheduler can be running within a PBS complex
PBS Configuration File – pbs.conf, cont.
The variable PBS_START_<daemon> sets which daemon should be allowed to start when the “/etc/init.d/pbs” script runs.
For example:
/etc/pbs.conf:
PBS_EXEC=/opt/pbs/default
PBS_HOME=/var/spool/PBS
PBS_START_SERVER=1
PBS_START_MOM=0
PBS_START_SCHED=1
PBS_SERVER=traintb16
PBS_DATA_SERVICE_USER=pbsuser01
This is the expected behavior when executing “/etc/init.d/pbs start”:
• pbs_server daemon will be invoked
• pbs_mom daemon will not be invoked
• pbs_sched daemon will be invoked
Starting/Stopping PBS Using Start/Stop Script
Starting/stopping PBS
• Why use start/stop script?
- Vnode definitions are created only when the start script is used; they are not created when the daemons are started manually
- Vnode definitions are required if PBS is to manage cpusets on a machine
- The pbs_mom daemon on the Altix and the Cray must be started via the start script
- Using the pbs start/stop script to stop PBS will preserve jobs (the server gets a ‘qterm -t quick’)
• Location of start/stop script (Linux)
/etc/init.d/pbs start
Status of PBS Complex
Use qstat -Bf to view the status of a PBS complex
Server: traintb16
    server_state = Active
    server_host = traintb16.prog.altair.com
    scheduling = True
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
    default_queue = workq
    log_events = 511
    mail_from = adm
    query_other_jobs = True
    resources_default.ncpus = 1
    default_chunk.ncpus = 1
    scheduler_iteration = 600
    FLicenses = 33
    resv_enable = True
    node_fail_requeue = 310
    max_array_size = 10000
    pbs_license_info = 7788@localhost
    pbs_license_min = 1
    pbs_license_max = 2147483647
    pbs_license_linger_time = 3600
    license_count = Avail_Global:32 Avail_Local:1 Used:0 High_Use:0
    pbs_version = PBSPro_11.0.0.103450
    eligible_time_enable = False
    max_concurrent_provision = 5
Manually Starting/Stopping PBS Daemons
Manually starting/stopping PBS daemons
• PBS Server
- Start: $PBS_EXEC/sbin/pbs_server
- Stop: $PBS_EXEC/bin/qterm -t [quick|delay|immediate]
• PBS Scheduler
- Start: $PBS_EXEC/sbin/pbs_sched
- Stop: $PBS_EXEC/bin/qterm -s
- Stop: kill -INT <pbs_sched_pid>
• PBS MOM
- Start: $PBS_EXEC/sbin/pbs_mom
- Stop: $PBS_EXEC/bin/qterm -m (this will shut down all the MOMs)
- Stop: kill -INT <pbs_mom_pid>
Network Ports Used By PBS Daemons
UNIX/Linux network ports
Daemon          Port Number  Protocol  Connection
pbs             15001        TCP       Client/Scheduler to Server
pbs_server      15001        UDP       Server to MOM via RPP
pbs_mom         15002        TCP       MOM to/from Server
pbs_resmon      15003        TCP       MOM resource requests
pbs_resmon      15003        UDP       MOM resource requests
pbs_sched       15004        TCP       PBS Scheduler
pbs_mom_globus  15005        TCP       MOM Globus
pbs_mom_globus  15006        TCP       MOM Globus resource requests
pbs_mom_globus  15006        UDP       MOM Globus resource requests
Chapter Four - Job Management
Chapter Four
Defining a Job Script
Types of Jobs
Submitting Jobs
Process Flow of a PBS Job
Querying PBS Jobs
Setting Job Attributes
Requesting Job Resources
Default Job Attributes
Order of Default Resources Assigned to Jobs
Job Exit Codes
Defining a Job Script
What is a job script?
• A file that contains a set of instructions to execute a series of commands. Also known as a “batch job”.
Example of a job script:
#!/bin/bash
sleep 5
/home/altair/scripts/optistruct -cpu 2 handlebar.fem
The first line names the shell interpreter; the remaining lines are the commands to run.
Submitting Jobs - Using “qsub”
Submitting a job script to PBS
• Using “qsub” command
Usage: qsub <job_attributes/resources> <job_script>
Example: qsub -l select=1:ncpus=1 test_script
• If the job is accepted by PBS, a job identifier is returned. This job identifier is comprised of the job number and the submitted server host name:
0.traintb16
Note:
- If a job is rejected it will not return a job identifier, but it will increment the job ID
- Largest possible job ID is 7 digits: 9,999,999. Once reached it will reset to zero
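The job identifier format and the rollover rule can be sketched in shell as follows. The helper names job_seq, job_server, and next_job_seq are hypothetical; they only mirror the description in this section and are not PBS commands.

```shell
#!/bin/bash
# Sketch only: helpers for identifiers of the form <number>.<server>.

# Job number: everything before the first dot.
job_seq() { echo "${1%%.*}"; }

# Server name: everything after the first dot.
job_server() { echo "${1#*.}"; }

# Next job number under the rollover rule in the note above:
# after 9,999,999 the counter resets to zero.
next_job_seq() {
    local seq="$1"
    # 10# forces base 10 so that leading zeros are not read as octal.
    echo $(( (10#$seq + 1) % 10000000 ))
}

echo "$(job_seq 0.traintb16) $(job_server 0.traintb16)"
```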
Requesting Job Resources – Built in Resources
Resource  Description
arch      System architecture
cput      Amount of CPU time used by the job for all the processes on all the chunks
mem       Amount of physical memory allocated to a job
ncpus     Number of processors requested for a job
walltime  Time requested for the job to run
Note: For complete listing refer to PBS Reference Documentation Guide pages 336-340
Types of Jobs
There are two types of PBS jobs
• Batch Job
- A script that contains commands or tasks to execute site specific applications
• Interactive Job
- Runs like a batch job, but when it runs, the user’s terminal input and output are connected to the execution host; similar to a login session
- Allows users to debug a job script
- Allows users to verify that a new application runs properly
Setting Job Attributes – Using PBS Directives
Job attributes can be set in 2 different ways:
• Method 1: on the qsub command line
qsub -N <job_name> <job_script>
• Method 2: within a job script as a PBS directive
#!/bin/bash
#PBS -N test_run_01
#PBS -l select=4:ncpus=4:mem=16GB
#PBS -l place=scatter
#PBS -j oe
#PBS -o /home/pbsuser01/OUTPUTS

optistruct -ncpu 2 handlebar.fem
Note:
- PBS expects the directives to begin on the second line and to continue on consecutive lines. PBS stops processing directives at the first executable line; comment lines are ignored.
- Command line arguments will override PBS directives.
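The scanning rule in the note can be approximated with a short awk filter. This is a sketch under stated assumptions, not how qsub itself is implemented: the function name list_pbs_directives is hypothetical, and blank lines are assumed to be skipped in the same way as comment lines.

```shell
#!/bin/bash
# Sketch only: print the #PBS directives that precede the first
# executable line of a job script. list_pbs_directives is an
# illustrative name, not a PBS tool.

list_pbs_directives() {
    awk '
        NR == 1 && /^#!/ { next }    # shebang line
        /^#PBS[ \t]/ { sub(/^#PBS[ \t]+/, ""); print; next }
        /^[ \t]*$/ { next }          # blank line (assumed to be skipped)
        /^#/ { next }                # ordinary comment line
        { exit }                     # first executable line ends the scan
    ' "$1"
}

# Demo: a directive placed after the first command is not reported.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
#!/bin/bash
#PBS -N test_run_01
#PBS -l select=4:ncpus=4:mem=16GB
#PBS -j oe

optistruct -ncpu 2 handlebar.fem
#PBS -l place=scatter
EOF
list_pbs_directives "$demo_script"
rm -f "$demo_script"
```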
Requesting Job Resources – Understanding Resources
What are job resources?
• Applications sometimes need certain types and amounts of system resources such as:
- memory
- ncpus
- scratch space
• During job submission, required resources can be requested
How can these resources be requested within PBS?
• PBS defines these resources as chunks or as job-wide resources
What are “chunks”?
• set of resources that are allocated as a unit to a job
• smallest set of resources that are allocated to a job
• for example: ncpus, mem
• requested in a “select” statement:
qsub -l select=<#>:ncpus=<#>:mem=<#>
What are “job-wide resources”?
• resources that are associated with the entire job
• for example: placement of jobs, walltime
Requesting Job Resources – Using Chunks & Select
Requesting resources in chunks
• Resources which are to be allocated as a unit to a job
- Smallest set of resources to be allocated to a single job
- Host/Vnode level request
Syntax: qsub -l select=[N:]chunk[+[N:]chunk ...]
For example:
1. Job requesting 3 chunks with 2 CPUs per chunk:
qsub -l select=3:ncpus=2
2. Job requesting 2 chunks with 1 CPU and 10GB of memory each, plus 3 chunks with 2 CPUs and 8GB of memory each:
qsub -l select=2:ncpus=1:mem=10gb+3:ncpus=2:mem=8gb
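To make the arithmetic behind a select statement concrete, the sketch below totals the chunks and CPUs that a specification asks for. The function name select_totals is hypothetical. It assumes that a missing chunk count means one chunk and that a chunk without ncpus counts as one CPU, which is the default_chunk.ncpus value shown elsewhere in this training.

```shell
#!/bin/bash
# Sketch only: total the chunks and CPUs requested by a select
# specification such as 2:ncpus=1:mem=10gb+3:ncpus=2:mem=8gb.

select_totals() {
    echo "$1" | awk -F'[+]' '{
        chunks = 0; cpus = 0
        for (i = 1; i <= NF; i++) {
            n = split($i, parts, ":")
            count = 1; ncpus = 1; first = 1
            # A leading number is the chunk count N.
            if (parts[1] ~ /^[0-9]+$/) { count = parts[1]; first = 2 }
            for (j = first; j <= n; j++) {
                if (parts[j] ~ /^ncpus=/) {
                    sub(/^ncpus=/, "", parts[j])
                    ncpus = parts[j]
                }
            }
            chunks += count
            cpus += count * ncpus
        }
        print "chunks=" chunks " ncpus=" cpus
    }'
}

select_totals "3:ncpus=2"
select_totals "2:ncpus=1:mem=10gb+3:ncpus=2:mem=8gb"
```

The second call corresponds to example 2: two single-CPU chunks plus three dual-CPU chunks, that is five chunks and eight CPUs in total.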
Requesting Job Resources – Job Placement
Placing jobs on hosts/Vnodes
• Users can specify how their multi-node job is placed within a PBS complex based on the resources requested
• Place statement controls how the job is placed on the hosts/vnodes from which resources may be allocated for the job
• Using the “place” statement:
Usage: qsub -l place=[<type>][:<sharing>][:<group>]
Example: qsub -l select=1:ncpus=2:mem=100MB -l place=pack script
Type     Value       Description
type     free        place job on any vnode(s), including hosts
         pack        all chunks will be taken from one host
         scatter     each chunk is allocated to a separate host
sharing  excl        only this job uses the vnodes chosen
         shared      this job can share the vnodes chosen
group    <resource>  chunks will be grouped according to a resource
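A submission wrapper could sanity-check a place value before calling qsub. The sketch below accepts only the values listed in the table, joined with colons as in place=pack:shared, and assumes the grouping part is written as group=<resource>. The function name valid_place is hypothetical, and the check does not enforce ordering or reject repeated parts.

```shell
#!/bin/bash
# Sketch only: accept a place specification built from the values in
# the table above. valid_place is an illustrative name, not a PBS tool.

valid_place() {
    local spec="$1" part
    [ -n "$spec" ] || return 1
    local IFS=':'
    # Split on colons and check each part against the known values.
    for part in $spec; do
        case "$part" in
            free|pack|scatter|excl|shared|group=?*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

if valid_place "pack:excl"; then
    echo "pack:excl looks valid"
fi
```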
Requesting Job Resources – Job Wide Resources
Requesting job-wide limits
• Resources that are requested outside a select statement
- Such as walltime, or cput
• Requesting resources at server or queue level
• Resources that are not tied to specific host(s)/vnode(s)
For example:
qsub -l select=1:ncpus=1:mem=100MB -l walltime=01:00:00 myscript
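When comparing walltime requests in scripts it can help to convert them to seconds. The sketch below assumes the HH:MM:SS form used in the example; the function name walltime_to_seconds is hypothetical, and other formats are not handled.

```shell
#!/bin/bash
# Sketch only: convert a walltime given as HH:MM:SS into seconds.

walltime_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that values such as 08 are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_to_seconds "01:00:00"
```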
Requesting Job Resources – SMP Jobs
SMP jobs are meant to run on a single execution host
Submitting an SMP PBS job
qsub -l select=x:ncpus=x -l place=pack
Note: all chunks will be placed on a single host
Additional options
• Place a job on a host that already has a job running on it
qsub -l select=1:ncpus=2 -l place=pack:shared
• Place a job on a host on which no other jobs are running and make that host exclusive to it
qsub -l select=1:ncpus=2 -l place=pack:excl
Requesting Job Resources – MPI Jobs
MPI jobs run on multiple hosts, using an MPI application
PBS has tightly integrated wrapper scripts for various MPI implementations
• Allows PBS to track spawned MPI processes
• More accurate tracking of all resources being consumed across all the hosts
• Accurately record CPU accounting utilization on all nodes
• Accurately enforce requested job limits
• Automatically "clean up" stray MPI processes on all nodes
• Require no changes other than wrapping
1777 ? Ss 0:00 /opt/pbs/default/sbin/pbs_mom
1779 ? Ss 0:00 \_ -bash
1810 ? S 0:00 \_ /bin/sh /var/spool/PBS_10.4.0.101257/mom_priv/jobs/1746.rhel5.lab.altair.com.SC
1812 ? S 0:00 \_ /opt/mpich2-install/bin/mpirun -f /var/spool/PBS_10.4.0.101257/aux/1746.rhel5.lab.altair.com /usr/local/gromacs_mpich2-1.3.2p1/bi
1813 ? S 0:00 \_ /opt/mpich2-install/bin/hydra_pmi_proxy --control-port rhel54:37470 --demux poll --pgid 0 --proxy-id 0
1814 ? R 0:14 \_ /usr/local/gromacs_mpich2-1.3.2p1/bin/mdrun -f /test/bench/d.dppc/grompp.mdp -c /test/bench/d.dppc/conf.gro -p /test/benc
Requesting Job Resources – Submitting MPI Jobs
Method 1
• Request: 4-way MPI job with 2 CPUs and 2GB memory per MPI task, with one MPI task per host, where each host has 2 CPUs and 2 GB memory
qsub -l select=4:ncpus=2:mem=2GB -l place=scatter
• Variable $PBS_NODEFILE contains list of vnodes
VnodeA
VnodeB
VnodeC
VnodeD
• Sample of an MPI job script
#!/bin/bash
#PBS -l select=4:mem=2GB:mpiprocs=2
#PBS -l place=scatter

mpirun -np 8 -mem 8GB file
Requesting Job Resources – Submitting MPI Jobs, cont.
Method 2
• Request: 4-way MPI job with 2 CPUs and 2GB memory per MPI task; request up to 4 hosts, where each host has 4 CPUs and 4 GB memory
qsub -l select=4:ncpus=2:mem=2GB -l place=free
• Variable $PBS_NODEFILE contains list of vnodes
VnodeA
VnodeB
• Sample of an MPI job script
#!/bin/bash
#PBS -l select=4:mem=2GB:mpiprocs=2
#PBS -l place=free

mpirun -np 8 -mem 8GB file
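Inside a job script, the file named by $PBS_NODEFILE can be inspected to see what was allocated. The sketch below counts the entries and the distinct names in such a file, assuming one name per line as in the listings above. The function name nodefile_summary is hypothetical.

```shell
#!/bin/bash
# Sketch only: summarise a node file with one vnode or host name per line.

nodefile_summary() {
    local nodefile="$1"
    local entries hosts
    # Count non-empty lines, then count distinct names.
    entries=$(grep -c . "$nodefile")
    hosts=$(sort -u "$nodefile" | grep -c .)
    echo "entries=$entries hosts=$hosts"
}

# Inside a running job the variable is set by PBS; elsewhere this is a no-op.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    nodefile_summary "$PBS_NODEFILE"
fi
```

A job script might use the entry count as the value passed to mpirun -np, so that the process count follows the allocation instead of being hard-coded.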
Requesting Job Resources - Boolean Resources
A resource that can be requested as true or false
To request chunks that have the resource ‘optistruct’, the qsub request line would be:
qsub -l select=1:ncpus=1:optistruct=true
The scheduler will only place this job on vnodes that have the resource “optistruct” set to “true”
If a boolean resource is requested as job-wide, e.g.:
qsub -l select=1:ncpus=1 -l optistruct=true
PBS will check if it is available at the server or queue level – not vnode/host level
Default Job Attributes
PBS includes default values for resources that the user doesn’t specify during job submission
The following are resource defaults assigned to a job:
• default_chunk.ncpus=1
• resources_default.ncpus=1
• resources_default.walltime=<5 years>
Note: Root and managers can specify additional default resources
Querying Jobs – Using “qstat”
To show a list of current PBS jobs’ status
• Using “qstat” command
Usage: qstat <-a, -n, -s, -1, -w>
Example: qstat
Job id           Name             User        Time Use S Queue
---------------- ---------------- ----------- -------- - -----
6.traintb16      test_script      pbsuser01   00:00:00 R workq
7.traintb16      jobA             pbsuser02   00:00:00 R workq
8.traintb16      test_2           pbsuser04   0        Q workq
9.traintb16      test_script      pbsuser01   00:00:00 R workq
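The default listing is easy to post-process. The sketch below counts jobs per state from text in the layout shown above, reading standard input so that it can be tried on saved output. The function name count_by_state is hypothetical, and the parsing assumes two header lines followed by six whitespace-separated columns per job.

```shell
#!/bin/bash
# Sketch only: count jobs per state from default qstat-style output on stdin.

count_by_state() {
    # Skip the two header lines; the state is the second-to-last column.
    awk 'NR > 2 && NF >= 6 { count[$(NF-1)]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Demo using the sample listing from this section.
count_by_state <<'EOF'
Job id           Name             User        Time Use S Queue
---------------- ---------------- ----------- -------- - -----
6.traintb16      test_script      pbsuser01   00:00:00 R workq
7.traintb16      jobA             pbsuser02   00:00:00 R workq
8.traintb16      test_2           pbsuser04   0        Q workq
9.traintb16      test_script      pbsuser01   00:00:00 R workq
EOF
```

For the sample listing this reports one queued job (Q 1) and three running jobs (R 3). On a live system the input would come from a pipe such as qstat | count_by_state.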
Note: If a job was deleted or completed then it can no longer be listed via qstat unless the PBS complex has enabled the job history functionality
Querying Jobs – Additional qstat Options
-a job name, session id, # nodes req, # ncpus req, req’d mem, req’d time, and elapsed time
                                                           Req'd  Req'd   Elap
Job ID         Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
8.traintb16    pbsuser0 workq    test_scrip   6556   1   8     --    -- R 00:07
-s same as option -a, but with comments
                                                           Req'd  Req'd   Elap
Job ID         Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
8.traintb16    pbsuser0 workq    test_scrip   5556   1   8     --    -- R 00:07
   Job run at Wed Jul 05 at 14:48 on (traintb16:ncpus=8)
-n same as option -a, but indicates which execution vnode(s) the job is running on
                                                           Req'd  Req'd   Elap
Job ID         Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
8.traintb16    pbsuser0 workq    test_scrip   5556   1   8     --    -- R 00:07
   traintb16/0
Note:
- Adding the option “-1” outputs each entry on a single line instead of wrapping around
- Using “-w” shows the full output of individual fields
Querying Jobs – Job States
State Description
Q Job is queued waiting for execution
R Job is running
S Job is suspended
E Job is exiting after execution
H Job is held or put on hold
W Job is waiting for its requested execution time or has been delayed 30 minutes because stage-in failed
T Job in transition is being moved between states
F Jobs that have finished; regardless if completed successfully or not
M Jobs that have moved to another PBS complex
Job Attributes - Viewing Job Attributes
To view job attributes that were assigned to a particular job, use the qstat command.
Usage: qstat -f <job_id>
Example: qstat -f 1.traintb16
Job Id: 1.traintb16
Job_Name = sleep_job
Job_Owner = [email protected]
resources_used.cpupercent = 0
resources_used.cput = 00:00:00
resources_used.mem = 1028kb
resources_used.ncpus = 1
resources_used.vmem = 18440kb
resources_used.walltime = 00:00:00
job_state = R
queue = workq
server = traintb16
Checkpoint = u
ctime = Tue May 5 17:49:09 2010
Error_Path = traintb16.prog.altair.com:/home/pbsuser01/boo/sleep_job.e1
exec_host = traintb16/0
exec_vnode = (traintb16:ncpus=1)
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Tue May 5 17:49:09 2010
Output_Path = traintb16.prog.altair.com:/home/pbsuser01/boo/sleep_job.o1
Priority = 0
qtime = Tue May 5 17:49:09 2010
Rerunable = True
Job Attributes - Viewing Job Attributes, cont.
Resource_List.ncpus = 1
Resource_List.nodect = 1
Resource_List.place = pack
Resource_List.select = 1:ncpus=1
stime = Tue May 5 17:49:11 2010
session_id = 11535
jobdir = /home/pbsuser01
substate = 42
Variable_List = PBS_O_HOME=/home/pbsuser01,PBS_O_LANG=en_US.UTF-8,
PBS_O_LOGNAME=pbsuser01,
PBS_O_PATH=/home/pbsuser01/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X
11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mi
t/sbin:/opt/pbs/default/bin:/opt/pbs/default/sbin,
PBS_O_MAIL=/var/spool/mail/pbsuser01,PBS_O_SHELL=/bin/bash,
PBS_O_HOST=traintb16.prog.altair.com,
PBS_O_WORKDIR=/home/pbsuser01/boo,PBS_O_SYSTEM=Linux,PBS_O_QUEUE=workq
comment = Job run at Tue May 05 at 17:49 on (traintb16:ncpus=1)
etime = Tue May 5 17:49:09 2010
Submit_arguments = -l select=1:ncpus=1 my_script
Note: Running as root or PBS Manager will output additional information
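Single attributes can be pulled out of the full listing with a small filter. The sketch below reads text in the "name = value" layout shown above from standard input and prints the value of one attribute. The function name qstat_attr is hypothetical, and long values that wrap onto continuation lines (such as Variable_List) are not rejoined.

```shell
#!/bin/bash
# Sketch only: print the value of one attribute from qstat -f style text
# supplied on stdin. Continuation lines of wrapped values are ignored.

qstat_attr() {
    local attr="$1"
    awk -v name="$attr" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)      # drop leading indentation
            sep = index(line, " = ")      # first separator on the line
            if (sep > 0 && substr(line, 1, sep - 1) == name) {
                print substr(line, sep + 3)
                exit
            }
        }'
}

# Demo with a few lines copied from the listing above.
printf '%s\n' 'Job Id: 1.traintb16' '    Job_Name = sleep_job' '    job_state = R' \
    | qstat_attr job_state
```

On a live system the input would come from a pipe such as qstat -f <job_id> | qstat_attr job_state.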
Querying Jobs – Using “tracejob”
Using tracejob to obtain comprehensive information about a job
• Using “tracejob” command
Usage: tracejob -n<days> <job id>
Example: tracejob -n4 0.traintb16
Job: 0.traintb16
05/05/2010 17:43:35 S enqueuing into workq, state 1 hop 1
05/05/2010 17:43:35 S Job Queued at request of [email protected], owner = [email protected], job name = sleep_job, queue = workq
05/05/2010 17:45:08 L Considering job to run
05/05/2010 17:45:08 S Job Run at request of [email protected] on exec_vnode (traintb16:ncpus=1)
05/05/2010 17:45:08 M Started, pid = 11491
05/05/2010 17:45:10 S Job Modified at request of [email protected]
05/05/2010 17:45:10 L Job run
05/05/2010 17:45:14 M task 00000001 terminated
05/05/2010 17:45:14 M Terminated
05/05/2010 17:45:15 S Obit received momhop:1 serverhop:1 state:4 substate:42
05/05/2010 17:45:15 S Exit_status=0 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3056kb resources_used.ncpus=1 resources_used.vmem=39392kb resources_used.walltime=00:00:07
05/05/2010 17:45:15 M task 00000001 cput= 0:00:00
05/05/2010 17:45:15 M traintb16 cput= 0:00:00 mem=3056kb
05/05/2010 17:45:15 M Obit sent
05/05/2010 17:45:15 M copy file request received
05/05/2010 17:45:15 M staged 2 items out over 0:00:00
05/05/2010 17:45:15 M delete job request received
05/05/2010 17:45:15 S dequeuing from workq, state 5
05/05/2010 17:45:15 M kill_job
05/05/2010 17:45:15 M work proc outstanding
S = Server
L = Scheduler
M = MOM
Note: Information is taken from the server, scheduler, and MOM logs (local to that machine) for the past 24 hours
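Because every tracejob line carries the S, L, or M source letter in its third column, the output can be narrowed to one daemon with a one-line filter. The function name tracejob_by_source is hypothetical, and the sketch assumes the date, time, letter layout shown in the sample output.

```shell
#!/bin/bash
# Sketch only: keep the tracejob lines written by one daemon.
# The first argument is the source letter: S, L, or M.

tracejob_by_source() {
    local letter="$1"
    awk -v src="$letter" '$3 == src'
}

# Demo with three lines taken from the sample output; prints the two L lines.
printf '%s\n' \
    '05/05/2010 17:45:08 L Considering job to run' \
    '05/05/2010 17:45:08 M Started, pid = 11491' \
    '05/05/2010 17:45:10 L Job run' \
    | tracejob_by_source L
```

On a live system the input would come from a pipe such as tracejob -n4 0.traintb16 | tracejob_by_source M.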
Querying Jobs – Deleting Jobs
To delete jobs that are listed under qstat
• Using “qdel” command
Usage: qdel <job id>
Example: qdel 0.traintb16
To delete a job from the server regardless of the job’s state
Usage: qdel -W force <job id>
Example: qdel -W force 0.traintb16
Note: Users can delete only their own jobs, unless the user’s name is in the managers list
Querying Jobs – Finished Job History
To view only jobs that have been deleted, moved, or finished
• qstat -H
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
80.traintb16     sleep5           pbsuser01        00:00:00 F workq
81.traintb16     sleep5           pbsuser01        00:00:00 F workq
82.traintb16     sleep5           pbsuser01        00:00:00 F workq
83.traintb16     sleep5           pbsuser01        00:00:00 F workq
To view all jobs, regardless of state
• qstat -x
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
80.traintb16     sleep5           pbsuser01        00:00:00 F workq
81.traintb16     sleep5           pbsuser01        00:00:00 F workq
82.traintb16     sleep5           pbsuser01        00:00:00 F workq
83.traintb16     sleep5           pbsuser01        00:00:00 F workq
84.traintb16     sleep5           pbsuser01               0 Q workq
85.traintb16     sleep5           pbsuser01        00:00:00 R workq
Note: The PBS Server attribute job_history_enable must be set to True in order to use these options
Querying jobs – Estimated start time/start order
PBS can estimate the start time and start order of jobs using the qstat -T option
• New column: Est Start Time
• Job ids are displayed in the order of estimated start time
$ qstat -T
traintb16:
                                                                            Est
                                                             Req'd  Req'd   Start
Job ID          Username  Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------- -------- ---------- ------ --- --- ------ ----- - -----
159.traintb16   pbsuser01 workq    STDIN        4302   1   2     -- 00:05 R    --
164.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 13:36
165.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 13:36
160.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 14:41
161.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 14:41
162.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 15:46
163.traintb16   pbsuser01 workq    STDIN          --   1   1     -- 01:05 Q 15:46
Note: The sorted job ids are NOT determined by the PBS Scheduler.
Querying Jobs – Re-Queuing Jobs
To re-queue a running job
• Using “qrerun” command
Usage: qrerun <job id>
Example: qrerun 0.traintb16
To re-queue a job even if that job’s execution host is not reachable
Usage: qrerun -W force <job id>
Example: qrerun -W force 0.traintb16
Note: Only root or PBS managers can perform this operation
Job Exit Codes
The exit code from a batch job is a standard Unix termination status, the same sort of number you get in a shell script from checking the "$?" variable after executing a command.
Typically, exit code 0 (zero) means successful completion.
Codes 1-127 are typically generated by the job itself calling exit() with a non-zero value to terminate itself and indicate an error.
Exit codes in the range 129-255 represent jobs terminated by Unix "signals". Each type of signal has a number, and what's reported as the job exit code is the signal number plus 128. Signals can arise from within the process itself (as for SEGV) or be sent to the process by some external agent (such as the batch control system).
The specific meanings of the signal numbers are platform-dependent.
Exit codes < 0 are set by PBS.
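The ranges above can be applied mechanically when reading an Exit_status value. The following is a minimal sketch; classify_exit is a hypothetical helper name, and the value 128 (which the ranges above do not cover) is simply reported as a job error.

```bash
#!/bin/bash
# classify_exit is a hypothetical helper that applies the exit code ranges
# described above to the integer passed as $1.
classify_exit() {
    local code=$1
    if [ "$code" -lt 0 ]; then
        echo "set by PBS"
    elif [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -gt 128 ]; then
        # Signal number is the exit code minus 128.
        echo "killed by signal $(( code - 128 ))"
    else
        echo "job exited with error $code"
    fi
}

classify_exit 137   # prints: killed by signal 9
classify_exit -3    # prints: set by PBS
```

The meaning of a negative value can then be looked up in the table of PBS exit codes.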
Job Exit Codes, cont.
# Name Description
0 JOB_EXEC_OK Job execution was successful
-1 JOB_EXEC_FAIL1 Job execution failed, before files, no retry
-2 JOB_EXEC_FAIL2 Job execution failed, after files, no retry
-3 JOB_EXEC_RETRY Job execution failed, do retry
-4 JOB_EXEC_INITABT Job aborted on MOM initialization
-5 JOB_EXEC_INITRST Job aborted on MOM init, checkpoint, no migrate
-6 JOB_EXEC_INITRMG Job aborted on MOM init, checkpoint, ok migrate
-7 JOB_EXEC_BADREST Job restart failed
-8 JOB_EXEC_GLOBUS_INIT_RETRY Initialization of globus job failed, do retry
-9 JOB_EXEC_GLOBUS_INIT_FAIL Initialization of globus job failed, no retry
-10 JOB_EXEC_FAILUID Invalid UID/GID for job
-11 JOB_EXEC_RERUN Job rerun
-12 JOB_EXEC_CHKP Job was checkpointed and killed
-13 JOB_EXEC_FAIL_PASSWORD Job failed due to a bad password
-14 JOB_EXEC_RERUN_ON_SIS_FAIL Job was re-queued or deleted due to communication failure between 1st head node and a sister node
PBS Accounting Records – Log Information
PBS accounting logs contain information about job statistics such as:
• Owner, queue, start time, end time, execution host, resources requested, exit status, and resources used
Accounting logs are stored on the machine where the pbs_server daemon is running
• Location: $PBS_HOME/server_priv/accounting
– A new log file is created every day; file name format: [YYYYMMDD]
The accounting logs are only accessible by root
The accounting logs can be parsed by the “pbs-report” script
PBS Accounting Records – Details of Accounting Log Entry
syntax: date-time; record_type; id_string; message_text
date-time      Date and time stamp. Format: mm/dd/yyyy hh:mm:ss
record_type    Single character indicating type of record
id_string      Job, reservation or reservation-job identifier
message_text   Contains detailed information for the job or reservation
Sample of accounting log entry:
05/05/2010 17:45:15;E;0.traintb16;user=pbsuser01 group=users jobname=sleep_job queue=workq ctime=1241559815 qtime=1241559815 etime=1241559815 start=1241559910 exec_host=traintb16/0 exec_vnode=(traintb16:ncpus=1) Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack Resource_List.select=1:ncpus=1 session=11491 end=1241559915 Exit_status=0 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=3056kb resources_used.ncpus=1 resources_used.vmem=39392kb resources_used.walltime=00:00:07
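Because a record is four semicolon-separated fields and message_text is a list of key=value pairs, single values can be pulled out with awk. The following is a minimal sketch, assuming values contain no spaces; acct_field is a hypothetical helper name.

```bash
#!/bin/bash
# acct_field is a hypothetical helper: it reads accounting records on stdin
# and prints the value of the key named by $1 from message_text (field 4).
# Assumption: values contain no embedded spaces.
acct_field() {
    awk -F';' -v key="$1" '{
        n = split($4, pairs, " ")
        for (i = 1; i <= n; i++) {
            eq = index(pairs[i], "=")          # split at the first "=" only
            if (eq > 0 && substr(pairs[i], 1, eq - 1) == key) {
                print substr(pairs[i], eq + 1)
            }
        }
    }'
}

# Shortened copy of the sample record.
record='05/05/2010 17:45:15;E;0.traintb16;user=pbsuser01 queue=workq Exit_status=0 resources_used.walltime=00:00:07'
printf '%s\n' "$record" | acct_field resources_used.walltime
```

Against a real log, the input could first be narrowed to end-of-job records, for example with grep ';E;' on the day's file.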
Using Accounting Records: Using “pbs-report”
To parse information from the accounting logs, use the pbs-report script located in the $PBS_EXEC/sbin directory
Information obtained from pbs-report helps sites determine how much work was done by PBS jobs during a specified time period
Sample output of pbs-report:
PBS Pro Cluster Accounting Summary Statistics
-----------------------------------------
Report from Thu Sept 15 2010 00:00:00 to Thu Sept 17 2010 12:13:32
             # of  Total      Total            Average
Username     jobs  CPU Time   Wall Time  Efcy. Wait Time  Muda
------------ ----- ---------- ---------- ----- ---------- -----
TOTAL          132          0     618322 0.000       2108 0.000
pbsuser01      127          0     616328 0.000       2191 0.000
pbsuser02        5          0       1994 0.000          4 0.000
Minimum          5          0       1994 0.000          4 0.000
Maximum        127          0     616328 0.000       2191 0.000
Mean            66          0     309161 0.000       1097 0.000
Deviation       61          0     307167 0.000       1093 0.000
Median           5          0       1994 0.000          4 0.000
Job Set Summary
                                                Standard
                Minimum    Maximum       Mean  Deviation     Median
             ---------- ---------- ---------- ---------- ----------
CPU time              0          0          0          0          0
Wall time             0      78616       4684      17559         60
Wait time             0      67778       2108       2070          2
Suspend time          0          0          0          0          0
Note: All times displayed in seconds.
Moving Jobs Between Queues
Users can move jobs from one queue to another local queue by using the qmove command
• Using “qmove” command
Usage: qmove <new_queue> <job_id>
Example: qmove small_queue 0.traintb16
Jobs can also be moved to another PBS complex
Example: qmove small_queue@traintb02 0.traintb16
Note:
• Running or suspended jobs cannot be moved
• Use qstat -H to view a job that was moved
• You must specify the full job ID, including the server name (job_id.server), for qstat
Holding and Releasing Jobs
Users can put a hold on their jobs, so that PBS will not schedule them for execution
• Using “qhold” command
Usage: qhold <job_id>
Example: qhold 0.traintb16
To release a held job, to allow PBS to consider it for execution:
• Using “qrls” command
Usage: qrls <job_id>
Example: qrls 0.traintb16
Deferring Job Execution
Users can specify a date/time for their job to be eligible for execution
Usage: qsub -a date_time
date_time [[[[CC]YY]MM]DD]hhmm[.SS]
Example: qsub -a 201008281645 my_script
Note: Deferred jobs will be marked with the “W” wait state
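A date_time value relative to the current time can be computed rather than typed by hand. The following is a minimal sketch, assuming GNU date is available (the -d "@epoch" form is not portable to BSD date); defer_stamp is a hypothetical helper name, and the qsub command is only printed, not run.

```bash
#!/bin/bash
# defer_stamp is a hypothetical helper: it converts epoch seconds ($1) to
# the CCYYMMDDhhmm form accepted by qsub -a, using the local time zone.
# Assumes GNU date.
defer_stamp() {
    date -d "@$1" +%Y%m%d%H%M
}

# Example: defer a job until two hours from now (the command is only printed).
when=$(defer_stamp $(( $(date +%s) + 7200 )))
echo "qsub -a $when my_script"
```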
Specifying Email Notifications
Users can specify what type of email notification they want, depending on job status
The default is only to notify the user when the job is aborted or terminated
Using qsub command with the following options, users can set their own notification:
Usage: qsub -m <a|b|e|n>
Example: qsub -m abe
Options Description
a job is aborted (default)
b job has begun
e job has finished execution
n do not send any email
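The a, b, and e options can be combined, while n stands on its own. A wrapper script could check an option string before passing it to qsub. The following is a minimal sketch of such a check; valid_mail_opts is a hypothetical helper name.

```bash
#!/bin/bash
# valid_mail_opts is a hypothetical helper: it returns success if $1 is a
# usable qsub -m argument, i.e. either "n" alone or a non-empty
# combination of the letters a, b, and e.
valid_mail_opts() {
    case "$1" in
        n)        return 0 ;;
        "")       return 1 ;;
        *[!abe]*) return 1 ;;   # any character other than a, b, e
        *)        return 0 ;;
    esac
}

if valid_mail_opts abe; then
    echo "qsub -m abe my_script"
fi
```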
Chapter Five - Site Specific Configurations
Chapter Five
Preserving job history
Prologue/epilogue scripts
PBS redundancy and failover
Preserving Job History - Concept
By default, once a job has been de-queued from a PBS complex, the job’s history is no longer retrievable using qstat
To enable the job history feature, use qmgr:
Qmgr: set server job_history_enable = True
• preserves job attributes
• preserves job resources requested and used
The default preservation time frame is 14 days; to change it:
Qmgr: set server job_history_duration = <time>
• <time> : [[hours:]minutes:]seconds[.milliseconds]
To view job history:
• View all job ids, past and present: qstat -x (can be combined with -f, -a, -n, or -s)
• View only jobs that were finished, moved, or deleted: qstat -H (can be combined with -f, -n, or -s)
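Since the duration is given as hours:minutes:seconds, a retention period stated in days has to be converted first. The following is a minimal sketch; history_duration is a hypothetical helper name, and the qmgr commands are only printed so they can be reviewed before being run by a manager.

```bash
#!/bin/bash
# history_duration is a hypothetical helper: it converts a whole number of
# days ($1) into the hours:minutes:seconds form used by job_history_duration.
history_duration() {
    echo "$(( $1 * 24 )):00:00"
}

# Print (not run) the qmgr commands that keep job history for 30 days.
echo "qmgr -c \"set server job_history_enable = True\""
echo "qmgr -c \"set server job_history_duration = $(history_duration 30)\""
```

With this conversion the 14-day default corresponds to 336:00:00.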
Prologue & Epilogue Scripts
Sites can be set up to run custom scripts before jobs are executed or after each job is finished or terminated
• These scripts can perform tasks such as network file staging for site-specific applications, file cleanup after a job has completed, or writing additional information to the user’s job output after completion
• These scripts are known as:
  • Prologue Script: executed on the primary execution host before the job is run
    Located in: $PBS_HOME/mom_priv/prologue
  • Epilogue Script: executed on the primary execution host after the job is run
    Located in: $PBS_HOME/mom_priv/epilogue
• Each execution host has its own prologue and epilogue script
• Each script runs only on the primary execution host of a multinode job
• Each script runs as root
• A timeout period can be set in $PBS_HOME/mom_priv/config:
$prologalarm <seconds>
Prologue & Epilogue Scripts – Sequence of Events
Start of a Job (Prologue)
1. Licenses are obtained
2. Files are staged in if needed
3. $TMPDIR is created
4. The prologue script is executed
5. The PBS job script is executed
End of a Job (Epilogue)
1. The PBS job script finishes
2. The job’s cpusets are destroyed
3. The epilogue script is run
4. The obit is sent to the PBS server
5. Any file stageout takes place, including STDOUT and STDERR
6. Files staged in or out are removed
7. PBS job files are deleted
8. FLEX licenses are returned to the pool
Prologue & Epilogue Scripts – Sample Prologue Script
Prologue Script – reordering the vnodes in the PBS_NODEFILE
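#!/bin/bash
# Prologue argument $1 is the job ID; the node file is named after it.
PBS_NODEFILE="/var/spool/PBS/aux/$1"
# Total number of entries in the node file.
lines=$(wc -l < "$PBS_NODEFILE")
# Distinct hosts in order of appearance (assumes entries are grouped by host).
nodes=$(uniq "$PBS_NODEFILE")
nodect=$(echo $nodes | wc -w)
# Number of round-robin passes over the host list.
loops=$(( lines / nodect ))
nodefile=""
for (( times = 0; times < loops; times++ )); do
    nodefile="$nodefile$nodes "
done
# Write one host per line back to the node file.
echo $nodefile | tr " " "\n" > "$PBS_NODEFILE"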
Prologue & Epilogue Scripts – Sample Epilogue Script
Epilogue Script – cleaning up files/directories
#!/bin/sh
# $Id: epilogue,v 3.3 2006/07/27 20:48:36
# $1 = job id
# $2 = user name
# $3 = group name
# $4 = job name
# $5 = session id
# $6 = requested resource limits
# $7 = resources used
# $8 = queue name
# $9 = account string
# $10 = exit code from job
UNIX95=XPG4; export UNIX95
jobid=$1
jobname=$4
user=$2
sid=$5
if [ -z "$jobid" -o -z "$jobname" -o -z "$user" ]; then
    echo "`basename $0`: No arguments: exiting."
    exit 1
fi
# Defining a marker for utilization later.
state=/tmp/cleanup${jobid}
# Define the source location
src=/scratch/`hostname`/$user/$jobname-`echo $jobid | cut -d. -f1`
if [ -d $src -a ! -f $state ]; then
    touch $state
    if [ -x $src/pbs-cleanup ]; then
        if [ `whoami` != $user ]; then
            su - $user -c "$src/pbs-cleanup"
        else
            $src/pbs-cleanup
        fi
    fi
    if [ $? -eq 0 ]; then
        cd /
        rm -rf $src
        rmdir `dirname $src` 2>/dev/null
    fi
    rm -f $state
fi
until [ ! -f $state ]; do
    sleep 5
done
exit 0
PBS Redundancy and Failover - Concept
PBS provides the capability for a backup PBS Server to assume the workload of a failed Primary Server
• Primary Server: the main PBS server
• Secondary Server: usually inactive, but starts up when the primary fails
Requirements for a PBS failover configuration:
• Primary and secondary servers must run on two separate host machines
• Both servers and all the execution hosts must have the same PBS version
• Both servers must be the same architecture (same binary)
• Both servers must be able to communicate with each other and all the execution hosts
• The primary and secondary servers must share the same PBS_HOME directory
• PBS_HOME directory should be on a file system that is not local to either of the server hosts
• Root/administrator must have full read/write access to PBS_HOME
Note: It is not advisable to have a MOM running on either server host
PBS Redundancy and Failover – Setting Up
Configuring Failover on the Primary Server
1. Install PBS on the primary server’s host
2. Check whether PBS is able to run jobs on execution hosts
3. If the test passes move the $PBS_HOME directory to a shared file system
4. Check whether PBS is able to run jobs on execution hosts using the new directory
5. If the test passes shut down the pbs_server and pbs_sched daemons
6. Configure the /etc/pbs.conf file to include the following settings:
PBS_PRIMARY=<primary_host>
PBS_SECONDARY=<secondary_host>
PBS_SERVER=<short name for primary host>
7. The primary server is configured to run the scheduler:
PBS_START_MOM=0
PBS_START_SCHED=1
8. Start the PBS daemons by executing: /etc/init.d/pbs start
PBS Redundancy and Failover – Setting Up, cont.
Configuring Failover on the Secondary Server
1. Install PBS on the secondary server’s host
2. Mount the shared file system that holds the primary’s $PBS_HOME directory, so that both servers use the same $PBS_HOME
3. Configure the /etc/pbs.conf file to include the following settings:
PBS_PRIMARY=<primary_host>
PBS_SECONDARY=<secondary_host>
PBS_SERVER=<short name for primary_host>
4. Since only one instance of the PBS scheduler can be running, only the primary server is configured to run it; the secondary will not run it
PBS_START_MOM=0
PBS_START_SCHED=0
5. Start the PBS daemons by executing: /etc/init.d/pbs start
PBS Redundancy and Failover – Setting Up, cont.
Configuring Failover on Execution and Client Hosts
1. Install PBS on each execution host
2. On each execution host, configure the /etc/pbs.conf file to include the following parameters:
PBS_PRIMARY=<primary_host>
PBS_SECONDARY=<secondary_host>
PBS_SERVER=<short name for primary host>
3. Install the client commands on each client host
4. On each client host, configure the /etc/pbs.conf file to include the following parameters:
PBS_PRIMARY=<primary_host>
PBS_SECONDARY=<secondary_host>
PBS_SERVER=<short name for primary host>
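The pbs.conf settings differ only slightly between the primary server, the secondary server, and the execution or client hosts, so they can be generated from one template. The following is a minimal sketch; failover_conf is a hypothetical helper name, the host names are placeholders, and the output is only printed rather than written to /etc/pbs.conf.

```bash
#!/bin/bash
# failover_conf is a hypothetical helper that prints the failover-related
# pbs.conf lines for one role.
# $1 = role (primary, secondary, or client), $2 = primary host,
# $3 = secondary host.
failover_conf() {
    local role=$1 primary=$2 secondary=$3
    echo "PBS_PRIMARY=$primary"
    echo "PBS_SECONDARY=$secondary"
    echo "PBS_SERVER=${primary%%.*}"   # short name of the primary host
    case "$role" in
        primary)   echo "PBS_START_MOM=0"; echo "PBS_START_SCHED=1" ;;
        secondary) echo "PBS_START_MOM=0"; echo "PBS_START_SCHED=0" ;;
    esac
}

# Placeholder host names, for illustration only.
failover_conf secondary pbs1.example.com pbs2.example.com
```

Execution and client hosts use the "client" role here, which prints only the three server-location lines.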
PBS Redundancy and Failover – Behavior
What type of communication occurs between the primary and secondary servers when the daemons are running?
• The secondary server will periodically attempt to connect to the primary server
• The primary server will send a “handshake” every few seconds to the secondary server
• Running “qstat -Bf” will show which of the two servers is active; look at the “server_host” line
What happens when the secondary server becomes active?
• PBS will send an email from the account defined in the server’s “mail_from” attribute stating that a failover has occurred
• The secondary server will attempt to communicate with the primary’s scheduler
  • If it cannot communicate, the secondary server will launch its own scheduler process
• The secondary server will inform all PBS MOMs that it is the active server
How does a failover impact PBS users?
• Users will not notice when a failover occurs
• When a user runs a PBS command such as qstat, the command will try to connect to the primary server first. If that fails, it will try the secondary server.
  • If the secondary responds to the command, a local file is created so this process doesn’t repeat every time that user sends PBS commands
  • This file is removed after the primary becomes active
Chapter Six: Limiting Resource Usage
Chapter Six
Concept
Terminology
Attributes
Users
Groups
Resource Usage: Concept
PBS allows sites to set up separate resource limits for individual users or groups, generic users or groups, and the total used by all users
Different kinds of resource limits can be set:
• total number of jobs that can run in a PBS complex
• total number of jobs a single user can run (named or generic)
• total number of jobs a group can run (named or generic)
• maximum amount of a resource that a user can request per job
• maximum amount of a resource that a group can request per job
• total number of jobs that can be queued
• total number of jobs that a user can have in a queue
• total number of jobs that a group can have in a queue
Limit attributes are set within the qmgr utility
• at server level
• at queue level
PBS managers and operators can set limit attributes
Resource Usage: Terminology
User Limits          limit-spec       Description
All users            o:PBS_ALL        A limit for the total amount of resources allocated to all users combined
Generic users        u:PBS_GENERIC    A limit for any single user
An individual user   u:<username>     A limit for a named user
Group Limits         limit-spec       Description
Generic groups       g:PBS_GENERIC    A limit for any group
An individual group  g:<groupname>    A limit for a named group
Note: <limit-spec> is case-sensitive
Resource Usage: Attributes
Resource limit attributes
<limit attribute>             Description
max_run                       Maximum number of jobs allowed to be running
max_run_soft                  Soft limit on the number of jobs allowed to be running
max_run_res.<resource>        Maximum amount of the specified resource that can be allocated to running jobs
max_run_res_soft.<resource>   Soft limit on the amount of the specified resource that can be allocated to running jobs
max_queued                    Maximum number of jobs allowed in a queue
max_queued_res.<resource>     Total amount of the specified resource that can be allocated to queued or running jobs
Syntax
• Server level
set server <limit_attribute> += "[<limit-spec>=<value>]"
• Queue level
set queue <queue_name> <limit_attribute> += "[<limit-spec>=<value>]"
Resource Usage: Users
Limit the total number of running jobs for all users within a PBS complex to 4 jobs
• set server max_run = "[o:PBS_ALL=4]"
Limit the number of running jobs for each user to 4 jobs
• set server max_run = "[u:PBS_GENERIC=4]"
Limit the number of running jobs for user “pbsuser01” to 4 jobs
• set server max_run += "[u:pbsuser01=4]"
Limit the TOTAL number of running jobs for all users to 7; however, allow user “pbsuser01” to run 5
• set server max_run += "[o:PBS_ALL=7], [u:pbsuser01=5]"
Generic users = 3; user “pbsuser01” = 2; user “pbsuser02” = 5
• set server max_run += "[u:PBS_GENERIC=3], [u:pbsuser01=2], [u:pbsuser02=5]"
Resource Usage – Groups
Limit the number of running jobs for any single group within a PBS complex to 4 jobs
• set server max_run = "[g:PBS_GENERIC=4]"
Limit the number of running jobs for a named group, opti, to 4 jobs
• set server max_run += "[g:opti=4]"
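When many user and group limits have to be set, the qmgr directives can be generated instead of typed. The following is a minimal sketch for server-level limits; limit_cmd is a hypothetical helper name, and the directives are only printed.

```bash
#!/bin/bash
# limit_cmd is a hypothetical helper that prints a qmgr directive adding one
# limit at server level.
# $1 = limit attribute, $2 = limit-spec, $3 = value.
limit_cmd() {
    printf 'set server %s += "[%s=%s]"\n' "$1" "$2" "$3"
}

limit_cmd max_run u:pbsuser01 4
limit_cmd max_run g:opti 4
```

A manager could then apply a reviewed directive with qmgr -c, for example: qmgr -c "$(limit_cmd max_run g:opti 4)".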
Chapter Seven - Job Attributes & Selective Query
Chapter Seven
Altering requested job resources
Handling output and error files
Job’s staging and execution directory
File staging
Sending messages to PBS jobs
Sending signals to PBS jobs
Selective job querying
Job dependencies
Moving jobs between queues
Holding and releasing jobs
Deferring job execution
Specifying email notifications
Exercises
Altering Requested Job Resources – Using “qalter”
A job’s requested resources can be changed even after the job has been submitted
• Using “qalter” command:
Usage: qalter -l <resource_name>=<new_value> <job_id>
Example: qalter -l select=1:ncpus=3 0.traintb16
Can a job’s requested resources be altered once that job has started execution?
• Yes, but only certain types of resources
Note: Managers and Operators can grant more resources even if job has started
Resource   Before Execution   After Execution
cputime    YES                YES, smaller amount
walltime   YES                YES
ncpus      YES                NO
memory     YES                NO
Handling Output and Error Files
Users can control how their output and error files are handled when their jobs complete
• Can be defined on the qsub command line or as a PBS directive
• By default files are copied using rcp; scp can be configured
Option #1: Specifying the path/filename of STDOUT/STDERR
• -o <path><filename>
• -e <path><filename>
Option #2: Where to retain STDOUT/STDERR files
Options   Description
-k e      STDERR to be retained in job’s staging/execution directory
-k o      STDOUT to be retained in job’s staging/execution directory
-k oe     Both files to be retained in job’s staging/execution directory
-k n      Neither file is retained
Note: Option #1 and #2 cannot be mixed together
If the .O and .E files cannot be copied back, they are retained on the execution host in the directory $PBS_HOME/undelivered
Job’s Staging and Execution Directory
The default job staging and execution directory is the user’s home directory
jobdir = <user’s home directory>
An alternative is to have PBS create a unique directory for each job; this is done by using the sandbox attribute
Usage: qsub -W sandbox=<HOME|PRIVATE>
Where:
HOME      user’s home directory; default
PRIVATE   PBS will create a job-specific directory
• The PRIVATE directory name has the form:
pbs.<job_id.server_name>.<id_string>
Example: pbs.21.traintb16.x8z
Job’s Staging and Execution Directory, cont.
If using sandbox=PRIVATE:
• jobdir = /home/pbsuser01/pbs.17.traintb16.x8z
• .O and .E will be copied to the directory from which the job was submitted with qsub
• after the job is completed the PRIVATE ($jobdir) directory is deleted
If using sandbox=PRIVATE with the -k oe option:
• jobdir = /home/pbsuser01/pbs.17.traintb16.x8z
• .O and .E will remain in the $jobdir directory
• after the job is completed the PRIVATE ($jobdir) directory is deleted, including the .O and .E
File Staging
Input/Output File Staging
• Users can specify which files/directories are copied onto the execution host before their job executes. This is known as STAGE IN.
• Users can specify which files/directories are returned to the submission host or a specified directory after the job completes. This is known as STAGE OUT.
• After a job is completed, all stage-in and stage-out files are removed from the execution host.
Command line input argument:
qsub -W stagein=<execution_path>@<storage_host>:<storage_path>
qsub -W stageout=<execution_path>@<storage_host>:<storage_path>
PBS Directive:
#PBS -W stagein=<execution_path>@<storage_host>:<storage_path>
#PBS -W stageout=<execution_path>@<storage_host>:<storage_path>
Here <execution_path> is the file on the execution host, and <storage_host>:<storage_path> is where the file is copied from (stage in) or to (stage out).
Note: By default PBS uses rcp for file copying; scp can be used instead. Walltime is not charged while files are staged in or out.
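A staging entry has three parts: the path on the execution host, the storage host, and the path on the storage host. The following is a minimal sketch that assembles them, assuming the PBS form execution_path@storage_host:storage_path; stage_spec is a hypothetical helper name, the host and file names are placeholders, and the qsub command is only printed.

```bash
#!/bin/bash
# stage_spec is a hypothetical helper that joins the three parts of a
# staging entry, assuming the form execution_path@storage_host:storage_path.
# $1 = path on the execution host, $2 = storage host,
# $3 = path on the storage host.
stage_spec() {
    printf '%s@%s:%s\n' "$1" "$2" "$3"
}

# Print (not run) a submission that stages one file in and one file out.
# Host and file names are placeholders.
stage_in=$(stage_spec input.dat filesrv /data/input.dat)
stage_out=$(stage_spec result.dat filesrv /data/result.dat)
echo "qsub -W stagein=$stage_in -W stageout=$stage_out my_script"
```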
Sending Messages to PBS Jobs – Using “qmsg”
String messages can be sent to a job’s output (.O) or error (.E) file
Why?
• To have external events recorded in the job’s files
• Useful for administrators to notify a job that system events occurred where that job was running
• Using “qmsg” command:
Output file: qmsg -O "<msg>" <job_id>
Error file: qmsg -E "<msg>" <job_id>
Note: If neither the “O” nor the “E” flag is specified, the message is sent to the error file
Sending Signals to PBS Jobs - Concept
Why send a signal?
• To force a program to take a specific action
Most commonly used signals:
Signal Description
SIGHUP Hangs up the program process
SIGTERM Terminates the program process
SIGINT Interrupts the program process
SIGKILL Kills now regardless of the state of the program
suspend Suspends a job process
resume Resumes a job process
Sending Signals to PBS Jobs – Using “qsig”
Sending a signal
• Using “qsig” command
Usage: qsig -s <signal> <job_id>
Example: qsig -s suspend 0.traintb16
         qsig -s resume 0.traintb16
Note: Here, <signal> can be either the name of the signal or its corresponding unsigned number.
Selective Job Querying – Using “qselect”
Using qstat will output the status of all current jobs
The qselect command can return a list of job IDs that meet specific criteria
Usage: qselect -<option>
Option  Value                         Description
-N      <name>                        Job name
-q      <queue>                       Queue name
-s      <job state>                   Job states R, Q, etc.
-u      <user name>                   User name
-H                                    Finished or moved jobs
-l      <res>.OP.<value>              By resources
-t      <sub_option>.OP.<date_time>   By time attribute
OP      Description
.eq.    equal to
.ne.    not equal to
.ge.    greater than or equal to
.gt.    greater than
.le.    less than or equal to
.lt.    less than
sub_option  time_attribute   Description
a           Execution_Time   time after which the job is eligible to run
c           ctime            job creation time
e           etime            time the job became eligible to run
g           eligible_time    accrued eligible time
m           mtime            modification time
q           qtime            job queued time
s           stime            job start time
Selective Job Querying – Using “qselect”, cont.
Examples:
• To find job IDs of jobs belonging to a particular user:
qselect -u user01
• To find job IDs of running jobs that have requested more than 4 ncpus:
qselect -s R -l ncpus.gt.4
• To show the status of running jobs only, wrap qselect in qstat:
qstat `qselect -s R`
• To delete all jobs in a PBS complex, wrap qselect in qdel:
qdel `qselect`
• To list all jobs in a PBS complex including finished or moved jobs:
qselect -x
• To list jobs whose start time falls within a time window:
qselect -ts.gt.09251200 -ts.lt.09251500
Note: Using qselect without any options outputs all job IDs
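Because qselect with no matching jobs prints nothing, and with no options prints every job ID, it can be worth reviewing a selection before handing it to qdel. The following is a minimal sketch of a dry-run wrapper; delete_selected is a hypothetical helper name, and it only prints the qdel command it would run.

```bash
#!/bin/bash
# delete_selected is a hypothetical wrapper: it passes its arguments to
# qselect and prints the qdel command that would delete the matching jobs,
# or a notice if nothing matched. Nothing is deleted.
delete_selected() {
    local ids
    ids=$(qselect "$@")
    if [ -z "$ids" ]; then
        echo "no jobs matched"
    else
        echo "qdel" $ids    # unquoted on purpose: one job ID per word
    fi
}
```

For example, delete_selected -u user01 -s Q would show the qdel command for that user's queued jobs.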
Job Dependencies - Concept
Users have the ability to specify dependencies between their jobs, such as:
• Specify order of execution
• Execute the next job only if the previous job finished
• Place jobs on hold until a particular job starts or completes
Using “qsub” command
Usage: qsub -W depend=<type>:<arg_list> <job_script>
Example: qsub -W depend=afterok:1.traintb16 my_script
To find out if a job has dependencies: qstat -f <jobid>
job_state = H
depend: afterok:[email protected]
Note: Jobs that request a dependency will be placed in the “H” state
Job Dependencies – Dependency Types
Dependency Type          Description
after:<arg_list>         Job may be scheduled for execution after all jobs in <arg_list> have started execution
afterok:<arg_list>       Job may be scheduled for execution only after all jobs in <arg_list> have terminated with no errors
afternotok:<arg_list>    Job may be scheduled for execution only after all jobs in <arg_list> have terminated with errors
afterany:<arg_list>      Job may be scheduled for execution after all jobs in <arg_list> have terminated with or without errors
before:<arg_list>        Jobs in <arg_list> may begin execution once this job has begun execution
beforeok:<arg_list>      Jobs in <arg_list> may begin execution once this job terminates without errors
beforenotok:<arg_list>   Jobs in <arg_list> may begin execution once this job terminates execution with errors
beforeany:<arg_list>     Jobs in <arg_list> may begin execution once this job terminates execution, with or without errors
on:<count>               Job may be scheduled for execution after <count> dependencies on other jobs have been satisfied
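When a job depends on several others, the job IDs in <arg_list> are joined with colons. The following is a minimal sketch that builds the depend argument; depend_arg is a hypothetical helper name, and the qsub command is only printed.

```bash
#!/bin/bash
# depend_arg is a hypothetical helper: $1 = dependency type, remaining
# arguments = job IDs. The job IDs are joined with colons to form <arg_list>.
depend_arg() {
    local type=$1
    shift
    local IFS=:              # "$*" joins arguments with the first IFS character
    echo "depend=$type:$*"
}

# Print (not run) a submission that waits for two jobs to finish without errors.
echo "qsub -W $(depend_arg afterok 1.traintb16 2.traintb16) my_script"
```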
Chapter Eight - PBS Server & Site Configurations
Chapter Eight
Viewing and setting server, queue, and vnode attributes
Server log information
Creating a backup of the PBS environment
Exercises
Viewing PBS Server Configuration – Using “qmgr”
PBS Administrators can use the PBS utility “$PBS_EXEC/bin/qmgr” to view and modify PBS server, queue and vnode attributes.
• The qmgr “print server” command prints out the commands to re-create server and queue settings. The values shown below are the defaults.
Default queue settings
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
Default server settings
set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server pbs_license_info = 7788@localhost
set server pbs_license_min = 1
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 3600
set server license_count = "Avail_Global:32 Avail_Local:1 Used:0 High_Use:0"
set server eligible_time_enable = False
set server max_concurrent_provision = 5
Qmgr Commands
Helpful qmgr commands
• List of qmgr commands and PBS version: qmgr: help
• Print out commands to re-create server/queue: qmgr: print server|queue @default
• Print server/queue attributes and their values: qmgr: list server|queue @default
• Print attributes and values of a specific queue: qmgr: list queue <queue_name>
• Print out commands to re-create named queue: qmgr: print queue <queue_name>
• To delete a queue: qmgr: delete queue <queue_name>
• Print out commands to re-create vnodes: qmgr: print nodes @default
• Print attributes and values of a specific vnode: qmgr: list node <node_name>
• To set the value of an attribute: qmgr: set server|queue|node <attribute>
• To unset the value of an attribute: qmgr: unset server|queue|node <attribute>
• To create a new queue or vnode: qmgr: create queue|node
PBS Server – Understanding PBS Server Attributes
Setting server attributes allows PBS Administrators to specify who can submit jobs, how many jobs can be running, resource limits (min, max, available, and default), reservations, access control list (acl), etc.
Three levels of privilege: User, Operator, and Manager. Managers have greatest privilege.
• All users can list or print attributes.
• Operators can additionally set or unset attribute values.
• Managers can additionally create or delete queues and vnodes
PBS server daemon must be running in order to execute the qmgr utility.
Any changes made to server attributes via qmgr go into effect as soon as they are entered; the pbs_server daemon does not need to be restarted.
PBS Server - Server Configuration Attributes
Attribute Description
scheduling Specifies whether or not the scheduler will schedule jobs. T|F
default_queue Queue to which jobs are sent when users don’t specify a target queue. This is set to ‘workq’ by the install script.
log_events Specifies which events are logged by the server.
mail_from Username from which server sends mail. Default: “adm”
query_other_jobs Specifies whether users can query other users’ job stats. T|F
resources_default.ncpus Default value for ncpus assigned a given job if not requested at qsub
default_chunk.ncpus Default value for ncpus per chunk
scheduler_iteration Time between non-event-driven scheduling iterations
resv_enable Enables/disables requesting reservations
node_fail_requeue Time the server waits for the primary execution vnode to come back up before it re-queues or deletes the vnode’s jobs
max_array_size Maximum number of subjobs allowed in a job array
eligible_time_enable Controls whether a job’s eligible_time attribute is used as its starving time
max_concurrent_provision The maximum number of vnodes allowed to be in the process of being provisioned
PBS Queues – Understanding Queues
PBS uses a resource-based scheduling system, where submitted jobs are held in a container waiting for execution.
This container is known as a “queue”.
There are two types of queues: Execution and Route
• Execution queue – holds jobs that are waiting for execution or running
• Route queue – routes jobs to either an execution queue or another route queue
Queues can be set up with attributes such as:
• Number of jobs running
• Max queued
• Resources available
• Which users/groups/hosts have access
PBS comes with a predefined default execution queue: workq
PBS Queues – Attributes of an Execution Queue
PBS administrators use the PBS “qmgr” utility to view, modify, and delete queues
To view the attributes of queue workq:
Qmgr: list queue workq
Queue workq                      <- Name of queue
    queue_type = Execution       <- Type of queue
    total_jobs = 0               <- Number of jobs in queue
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
                                 <- Number of jobs in each state
    enabled = True               <- Whether queue accepts new jobs
    started = True               <- Whether queue’s jobs can be run
PBS Queues – Creating an Execution Queue
Only PBS Administrators can create and delete queues
To print out the commands to recreate queue workq:
Qmgr: print queue workq
create queue workq                        <- Creation of a given queue
set queue workq queue_type = Execution    <- Indicates what type of queue
set queue workq enabled = True            <- True|False: jobs can be enqueued
set queue workq started = True            <- True|False: jobs can be scheduled for execution
PBS Queues – Creating an Execution Queue, cont
Creating a new queue named “my_queue”:
1. create queue my_queue
   <- Naming and creating the new queue
2. set queue my_queue queue_type = Execution
   <- Defining this queue as an Execution (or Route) queue
3. set queue my_queue enabled = TRUE
   <- Setting the enabled attribute to True allows jobs to be enqueued
4. set queue my_queue started = TRUE
   <- Setting the started attribute to True allows jobs to run from this queue
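The four steps above can be collected into a small script. This is only a sketch: queue_cmds is a hypothetical helper (not part of PBS) that prints the qmgr directives shown on this slide. Nothing reaches the server unless its output is fed to qmgr by a Manager.

```shell
#!/bin/sh
# queue_cmds is a hypothetical helper, not a PBS command.
# It prints the qmgr directives that create and open a queue.
#   $1 = queue name
#   $2 = queue type, Execution or Route (defaults to Execution)
queue_cmds() {
    name=$1
    type=${2:-Execution}
    printf 'create queue %s\n' "$name"
    printf 'set queue %s queue_type = %s\n' "$name" "$type"
    printf 'set queue %s enabled = True\n' "$name"
    printf 'set queue %s started = True\n' "$name"
}

# Preview the directives for the queue used on this slide.
queue_cmds my_queue
```

Since qmgr reads directives from standard input, a Manager could apply the result with: queue_cmds my_queue | qmgr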
PBS Queues - Execution Queue Attributes
Attribute Description
max_queuable Maximum number of jobs allowed in queue
max_running Maximum number of jobs allowed to be running
resources_default.<res_name> Default resource assigned to a job if that resource is not specified via qsub command
resources_max.<res_name> Maximum amount of a resource that a job may request and still be allowed into this queue
resources_min.<res_name> Minimum amount of a resource request for jobs that are allowed into this queue
resources_available.<res_name> Maximum amount of resource allowed to be used by all running jobs in this queue
PBS Queues - Why Use Multiple Execution Queues?
Why would a PBS complex have multiple queues instead of a single queue?
• Having multiple queues could help with the following:
• Various types of applications
• Access by different users, groups, or hosts
• Long, medium, or short running jobs
• Different architectures
• Various resources
• Assigning a dedicated queue to a host/vnode
• Peering jobs to another PBS complex
PBS Queues – Setting Access Control on Queues
Queues can be configured so that only certain users, groups, or hosts can submit jobs to a particular queue.
• This functionality is called an Access Control List – “ACL”
• There are 3 types of access level <acl_type>:
  “user” a list of users who are allowed to enqueue jobs
  “group” a list of groups who are allowed to enqueue jobs
  “host” a list of hosts that are allowed to enqueue jobs
To set an ACL on a queue:
1. Enable the ACL functionality for that queue:
set queue <queue_name> acl_<acl_type>_enable = True
2. Assign the list of UNIX/Linux users, groups, or hosts that will have access:
set queue <queue_name> acl_<acl_type>s += "<list of users, groups, or hosts>"
3. To restrict a user, use the minus operator symbol:
set queue <queue_name> acl_<acl_type>s = “- <user>”
PBS Queues – Creating a Routing Queue
Routing queues route jobs to an execution queue or to another routing queue
• How can a routing queue be beneficial?
• Allows users to submit to one queue instead of specifying at qsub
• Destination queues can be set up by ACL or resource restrictions
• Jobs can be routed to another PBS complex
To create a routing queue named “routeq”:
1. create queue routeq
2. set queue routeq queue_type = Route
3. set queue routeq route_destinations += "my_queue"
   - List of execution or route queues to be routed to
   - Comma-separated
4. set queue routeq enabled = True
5. set queue routeq started = True
PBS Queues - Routing Queue Attributes
Routing queues may also be configured with queue attributes such as:
• route_lifetime
• max_queuable
• resources_max
• resources_min
• access control list (ACL)
To prevent users from submitting jobs directly to an execution queue (thus bypassing the route queue), you can set the following attribute:
Usage: set queue <queue_name> from_route_only = True
To assign multiple execution queues as “route_destinations”:
Usage: set queue <queue_name> route_destinations += "queue1, queue2, queue3"
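A route queue with several destinations, each closed to direct submission, could be sketched as follows. route_cmds is a hypothetical helper that only prints qmgr directives; the queue names are the placeholders from this slide.

```shell
#!/bin/sh
# route_cmds is a hypothetical helper, not a PBS command.
#   $1     = name of the route queue
#   $2...  = destination queues, in routing order
route_cmds() {
    rq=$1
    shift
    # Join the remaining arguments with commas (subshell keeps IFS local).
    dests=$(IFS=,; printf '%s' "$*")
    printf 'create queue %s\n' "$rq"
    printf 'set queue %s queue_type = Route\n' "$rq"
    printf 'set queue %s route_destinations += "%s"\n' "$rq" "$dests"
    printf 'set queue %s enabled = True\n' "$rq"
    printf 'set queue %s started = True\n' "$rq"
    # Stop users from bypassing the route queue.
    for q in "$@"; do
        printf 'set queue %s from_route_only = True\n' "$q"
    done
}

route_cmds routeq queue1 queue2 queue3
```

The destination queues are assumed to exist already; the sketch does not create them.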
PBS Queues – Assigning Queue Priorities
Queues can be assigned a priority level between -1024 and +1023
• By default a new queue has a priority level set to 0
• Setting a non-default priority level serves two functions:
1) PBS Scheduler sorts the queues from high to low using this priority level for job sorting
2) Enables queue to be an Express Queue (by default, priority >= 150)
   • useful in determining which job to preempt when using Preemptive Scheduling
Usage: set queue <queue_name> priority = <value>
Example: set queue my_queue priority = 100
PBS Queues - How Updates Affect Jobs
Any modifications made via the qmgr utility take place immediately and do not require the pbs_server daemon to be restarted
Certain types of attributes will affect those jobs already queued but not running
Using qmgr to delete a queue that has jobs enqueued or running is not allowed
Alternative Methods:
• Stop new jobs from being enqueued by setting enabled = False, then let the queue drain its existing jobs
• If waiting for the queue to drain is not an option
- Option 1: use qdel to delete the jobs
- Option 2: use qmove to move jobs to a different queue; or to another PBS complex
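The alternatives above can be written out as a dry run. drain_plan is a hypothetical helper that only prints the commands an administrator might run; it executes nothing. It assumes qselect -q is used to list the jobs in the queue.

```shell
#!/bin/sh
# drain_plan is a hypothetical helper, not a PBS command.
# It prints (does not run) the commands for emptying a queue.
#   $1 = queue to drain
#   $2 = optional destination queue; if omitted, the jobs are deleted
drain_plan() {
    src=$1
    dest=${2:-}
    # Step 1: stop accepting new jobs.
    printf 'qmgr -c "set queue %s enabled = False"\n' "$src"
    # Step 2: move the remaining jobs, or delete them.
    if [ -n "$dest" ]; then
        printf 'qselect -q %s | xargs qmove %s\n' "$src" "$dest"
    else
        printf 'qselect -q %s | xargs qdel\n' "$src"
    fi
}

drain_plan workq my_queue
```

Review the printed commands before running them by hand; running jobs may need to finish or be handled separately before the queue can be deleted.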
PBS Queues – Queue Status
To obtain status of all the queues within a PBS complex: qstat -Q[f]
Output of: qstat -Q
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
workq 0 0 yes yes 0 0 0 0 0 0 Exec
Output of: qstat -Qf
Queue: workq
    queue_type = Execution
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
    resources_assigned.ncpus = 0
    resources_assigned.nodect = 0
    enabled = True
    started = True
PBS Nodes - Understanding PBS Vnodes
What is a host?
• An instance of a single OS running
• A machine
What is a PBS MOM?
• Executes the job script
• Reports back to the server when the job is completed
• Enforces some job resource limits
• Can manage multiple vnodes
• Tracks job resource usage
What are vnodes?
• An abstract object representing a set of resources which form a usable part of a machine
  - Can be one of the following: host, nodeboard, or blade
• A single host can be made up of multiple vnodes
PBS Nodes - Viewing Existing Vnodes
There are two methods to view the list of vnodes and their attributes in a PBS complex
Method 1
Within qmgr: list nodes @default
Node traintb16
    Mom = traintb16
    Port = 15002
    pbs_version = PBSPro_11.0.0.103450
    ntype = PBS
    state = free
    pcpus = 1
    resources_available.arch = linux
    resources_available.host = traintb16
    resources_available.mem = 1027124kb
    resources_available.ncpus = 1
    resources_available.vnode = traintb16
    resources_assigned.mem = 0kb
    resources_assigned.ncpus = 0
    resources_assigned.netwins = 0
    resources_assigned.vmem = 0kb
    resv_enable = True
    sharing = default_shared
Method 2
Using pbsnodes -av at command line
traintb16
    Mom = traintb16
    Port = 15002
    pbs_version = PBSPro_11.0.0.103450
    ntype = PBS
    state = free
    pcpus = 1
    resources_available.arch = linux
    resources_available.host = traintb16
    resources_available.mem = 1027124kb
    resources_available.ncpus = 1
    resources_available.vnode = traintb16
    resources_assigned.mem = 0kb
    resources_assigned.ncpus = 0
    resources_assigned.netwins = 0
    resources_assigned.vmem = 0kb
    resv_enable = True
    sharing = default_shared
PBS Nodes – Setting Vnode Attributes
Attribute Description
comment Assign a comment
max_running Maximum number of jobs that can run on this vnode
priority Vnodes can be sorted by a priority level
state Shows or sets the state of the vnode. Useful for setting a vnode’s state to online/offline
queue Associate a queue to a vnode
sharing Defines whether more than one job at a time can use this vnode's resources.
resources_available.<res> List of resource amounts available on this vnode. If not explicitly set, amount shown is that reported by pbs_mom running on the vnode.
PBS Nodes – Using “pbsnodes”
Use “pbsnodes” to obtain a detailed listing of all the hosts or vnodes in a PBS complex
Usage: pbsnodes <options>
Example: pbsnodes -a
Options Description
-a List all hosts and their attributes
-av List all vnodes and their attributes
-l Lists all hosts or vnodes with state=DOWN or state=OFFLINE
PBS Nodes – Output of “pbsnodes –a”
pbsnodes -a
traintb16
Mom = traintb16
Port = 15002
pbs_version = PBSPro_11.0.0.103450
ntype = PBS
state = free
pcpus = 1
resources_available.arch = linux
resources_available.host = traintb16
resources_available.mem = 1027124kb
resources_available.ncpus = 1
resources_available.vnode = traintb16
resources_assigned.mem = 0kb
resources_assigned.ncpus = 0
resources_assigned.netwins = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
PBS Server - Server Log Information
Server logs are stored on the host where the pbs_server daemon is running
• Location: $PBS_HOME/server_logs
• A new log file is created every day
  - File name format: [YYYYMMDD]
The logging level is configurable using qmgr utility
Usage: set server log_events = <value>
Where <value> can be between 0 and 2047
- 0 nothing is logged
- 511 default log level
- 2047 everything is logged; useful for debugging hooks
Note: When changing the server’s log_events it is not necessary to restart the pbs_server daemon
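The values 511 (0x1ff) and 2047 (0x7ff) suggest that log_events is a bitmask. The sketch below rests on that assumption, and on the further assumption that the four-digit event code in a log entry (for example 0002 or 0040) is the hexadecimal bit for that event class; neither is stated on this slide. event_logged is a hypothetical helper.

```shell
#!/bin/sh
# event_logged is a hypothetical helper, not a PBS command.
# ASSUMPTION: log_events is a bitmask, and a log entry's event code is the
# hexadecimal bit of its event class.
#   $1 = log_events value (decimal)
#   $2 = event code as it appears in a log entry (hex digits, e.g. 0040)
# Returns success if an event with that code would pass the mask.
event_logged() {
    mask=$1
    code=$(( 0x$2 ))
    [ $(( mask & code )) -ne 0 ]
}

for code in 0002 0040 0400; do
    if event_logged 511 "$code"; then
        echo "$code passes log_events = 511"
    else
        echo "$code is filtered out by log_events = 511"
    fi
done
```

Under these assumptions, code 0400 only appears once the mask is raised above the default, for example to 2047.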
PBS Server – Details of Server Log Entry
date-time date and time stamp, format: mm/dd/yyyy hh:mm:ss
event_code numerical code for type of event
server_name name of the Server which logged the message
object_type type of object which the message is about:
Svr=server
Que=queue
Job=job
Req=request
Fil=file
Node=vnode
Hook=hooks
object_name name of the specific object
message_text text of the log message
syntax: date-time;event_code;server_name;object_type;object_name;message_text
Sample of Server log entry:
09/14/2010 08:17:31;0002;Server@trainhp01;Svr;Log;Log opened
09/14/2010 08:17:45;0002;Server@trainhp01;Node;traintb16.prog.altair.com;node up
09/14/2010 08:18:36;0040;Server@trainhp01;Svr;traintb16;Scheduler sent command 3
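Because the fields are separated by semicolons, a log entry can be split with standard tools. A minimal sketch using awk; log_field is a hypothetical helper, and the sample line is taken from this slide.

```shell
#!/bin/sh
# log_field is a hypothetical helper, not a PBS command.
# It reads log entries on stdin and prints one field from each.
#   $1 = field number: 1 date-time, 2 event_code, 3 server_name,
#        4 object_type, 5 object_name, 6 message_text
log_field() {
    awk -F';' -v n="$1" '{ print $n }'
}

line='09/14/2010 08:17:45;0002;Server@trainhp01;Node;traintb16.prog.altair.com;node up'
printf '%s\n' "$line" | log_field 4    # object_type
printf '%s\n' "$line" | log_field 5    # object_name
printf '%s\n' "$line" | log_field 6    # message_text
```

A message that itself contains a semicolon would be cut short by this simple split, so treat field 6 as approximate.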
Backing up Server, Queue & Vnode Settings
PBS Administrators can safely back up their qmgr settings at the command line:
1. Output the server and queue settings:
qmgr -c "print server" > server_queue_settings
2. This command will print all attributes for all vnodes:
qmgr -c "print node @default" > vnodal_settings
3. This command will print all attributes for all hooks:
qmgr -c "print hook" > hook_definitions
To restore settings:
qmgr < <input_file>
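The three backup commands can be wrapped so they always run together. backup_pbs is a hypothetical helper; the target directory is whatever the administrator chooses, and the file names are the ones used on this slide.

```shell
#!/bin/sh
# backup_pbs is a hypothetical helper, not a PBS command.
# It saves the three qmgr dumps from this slide into one directory.
#   $1 = directory to write the backup files into (created if missing)
backup_pbs() {
    dir=$1
    mkdir -p "$dir" || return 1
    qmgr -c "print server"        > "$dir/server_queue_settings" || return 1
    qmgr -c "print node @default" > "$dir/vnodal_settings"       || return 1
    qmgr -c "print hook"          > "$dir/hook_definitions"      || return 1
}
```

A possible call, with a date-stamped directory name chosen only for illustration: backup_pbs "/var/tmp/pbs_backup_$(date +%Y%m%d)". Each saved file can later be fed back with qmgr < <input_file>.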
Chapter Nine - PBS MOM Configuration
Chapter Nine
What is the PBS MOM?
Directory structure of $PBS_HOME/mom_priv
Contents of $PBS_HOME/mom_priv/jobs
Configuration parameters
Enforcing resource limits
Restricting user logins
Checkpoint and restart
MOM log information
Details of MOM logs
Exercises
What is the PBS MOM?
The PBS MOM is the component responsible for monitoring and executing PBS jobs, as well as the following:
• Reports resource usage
• Enforces resource usage limits
• Notifies the server when the job has finished
• Executes prologue/epilogue script
Each execution host (MOM) has its own configuration file
• Located in $PBS_HOME/mom_priv/config
• Provides several types of runtime information
  - Access control
  - Static resource names and values
  - External resources provided by a program to be run on request via a shell script
• Each parameter is on a separate line and component parts are separated by white space
• Default contents of mom_priv/config:
  $clienthost traintb16
  $restrict_user_maxsysid 499
Directory Structure of $PBS_HOME/mom_priv *
mom_priv
    config      Configuration file
    jobs        When jobs are running the job script is placed in this directory
    vnodemap    List of vnodes in a PBS complex
    mom.lock    MOM pid lock file
* This information is for debugging purposes only. It may change in future releases.
Contents of $PBS_HOME/mom_priv/jobs *
-rw------- 1 root root 3427 Jun 10 00:40 2.traintb16.JB
-rwx------ 1 pbsuser01 users 22 Jun 10 00:40 2.traintb16.SC
drwx------ 2 root root 4096 Jun 10 00:40 2.traintb16.TK
For each job running on a given host, PBS creates 2 files and 1 directory in the mom_priv/jobs directory
<job_id>.<server_name>.JB Contains job information such as resources used
<job_id>.<server_name>.SC User job script
<job_id>.<server_name>.TK Directory containing that job’s task
* This information is for debugging purposes only. It may change in future releases.
PBS MOM – MOM Configuration Parameters
Note: After modifying the MOM’s config file, a SIGHUP must be sent to that pbs_mom daemon so that it rereads the file
Parameter Description
$clienthost List of hosts allowed to connect to MOM
$cputmult Factor to adjust CPU time used by each job
$ideal_load Declares the ideal mark for load on a vnode
$max_load Declares the high water mark for load on a vnode
$kbd_idle Enables idle workstation cycle harvesting
$logevent Determines the kind of information logged to MOM logs
$max_check_poll Maximum time between polling cycles
$min_check_poll Minimum time between polling cycles
$prologalarm Timeout period for prologue/epilogue script
$restricted List of hosts that are allowed to connect to MOM without needing a privileged port
$restrict_user Controls whether normal users without a job running can log into the host
$restrict_user_maxsysid Any user with UID less than this value is exempt from $restrict_user
$suspendsig Alternative signal to suspend job instead of SIGSTOP
$usecp Tells MOM to use cp instead of rcp/scp for stdout/err file transfers
$wallmult Factor used to adjust walltime usage by a job
$tmpdir Specifies location of job scratch directory
$jobdir_root Location of job-specific staging and execution directories
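Sending the SIGHUP mentioned in the note above might look like the sketch below. It assumes that mom.lock (the MOM pid lock file in mom_priv) holds the daemon's PID on its first line, which this deck does not state. mom_pid and reload_mom are hypothetical helpers.

```shell
#!/bin/sh
# mom_pid and reload_mom are hypothetical helpers, not PBS commands.
# ASSUMPTION: the first line of mom_priv/mom.lock is the pbs_mom PID.
#   $1 = PBS home directory (defaults to $PBS_HOME)
mom_pid() {
    head -n 1 "${1:-$PBS_HOME}/mom_priv/mom.lock"
}

# Ask pbs_mom to reread mom_priv/config.
reload_mom() {
    pid=$(mom_pid "$1") || return 1
    kill -HUP "$pid"
}
```

reload_mom would normally be run as root on the execution host after editing mom_priv/config.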
PBS MOM – Enforcing Resource Limits
Each MOM can be configured to enforce job resource limits by setting the $enforce parameter in the mom_priv/config file
Attribute              Type     Description                                                Default
average_cpufactor      float    Modifies cpuaverage; ncpus limit multiplier                1.025
average_percent_over   int      Modifies cpuaverage; percentage over ncpus limit to allow  50
average_trialperiod    int      Modifies cpuaverage; minimum walltime before enforcement   120s
cpuaverage             boolean  Enforce this limit                                         off
cpuburst               boolean  Enforce this limit                                         off
delta_cpufactor        float    Modifies cpuburst; ncpus limit multiplier                  1.5
delta_percent_over     int      Modifies cpuburst; percentage over the limit to allow      50
delta_weightup         float    Modifies cpuburst; weighting when average is moving up     0.4
delta_weightdown       float    Modifies cpuburst; weighting when average is moving down   0.1
mem                    boolean  Enforces each job’s memory limit                           off
PBS MOM - Restricting User Logins
PBS Professional can be configured to kill user-owned processes when that user does not have a job running on that host through PBS
• To configure this functionality, add the following parameter to the $PBS_HOME/mom_priv/config file:
$restrict_user on
Note: When this feature is turned on, all processes belonging to any users who log onto that execution host will be terminated, thus kicking them off
• To create a list of users who are allowed to log in when this feature is enabled:
$restrict_user_exceptions userA, userB, userC
Note: Up to 10 user names are allowed
• To restrict users whose user ID is greater than a specified number:
$restrict_user_maxsysid <number>
PBS MOM – Checkpoint and Restart
PBS administrators can use their own site-defined external checkpoint facility
• This is useful on systems that don’t support OS-level checkpointing
• Provided by application or other external means
Site-specific checkpointing is configured in the MOM configuration file mom_priv/config by using the $action parameter and an action
Action                 Argument                        Description
checkpoint             TIME_OUT !SCRIPT_PATH ARGS[…]   Specifies that the script in SCRIPT_PATH is run and the job is left running
checkpoint_abort       TIME_OUT !SCRIPT_PATH ARGS[…]   Specifies that the script in SCRIPT_PATH is run and the job is terminated
restart                TIME_OUT !SCRIPT_PATH ARGS[…]   Specifies the script to be used to restart the job
$restart_background    true|false                      Specifies how the job is restarted
$restart_transmogrify  true|false                      Controls how MOM launches the restart script/program
PBS MOM - MOM Log Information
Each execution host has its own MOM log files
• Location: $PBS_HOME/mom_logs
• A new log file is created every day
- file name format: [YYYYMMDD]
The logging level is configurable in $PBS_HOME/mom_priv/config
Usage: $logevent <value>
Where <value> can be between 0 and 0xffffffff
- 0 Nothing is logged
- 0xffffffff All information is logged
Note: When changing the log event a SIGHUP to the pbs_mom daemon signals it to reread the mom_priv/config file
PBS MOM – Details of MOM Log Entry
date-time field Date and time stamp. Format: mm/dd/yyyy hh:mm:ss
event_code Numerical code for type of event
pbs_daemon pbs_mom
object_type Type of object which the message is about:
Svr=server
Que=queue
Job=job
Req=request
Fil=file
Node=vnode
object_name Name of the specific object
message_text Text of the log message
syntax: date-time;event_code;pbs_daemon;object_type;object_name;message_text
Sample of MOM log entry:
09/14/2010 11:35:55;0008;pbs_mom;Job;1.traintb16;Started, pid = 24073
09/14/2010 11:36:01;0080;pbs_mom;Job;1.traintb16;task 00000001 terminated
09/14/2010 11:36:01;0008;pbs_mom;Job;1.traintb16;Terminated
Chapter Ten - PBS Scheduler Configuration
Chapter Ten
What is the PBS scheduler?
Directory Structure of $PBS_HOME/sched_priv
Default behavior of the scheduler
Scheduler configuration file
Default scheduling parameters
Scheduler log information
What is the PBS Scheduler?
What is the PBS scheduler?
• The PBS daemon that is responsible for enforcing site policy, by choosing the order in which jobs are run, and on what resources
• The scheduler provides various scheduling policies such as:
- First in First Out (FIFO)
- Sort jobs based on multiple resources (high to low or low to high)
- Sort nodes based on resources or priority level
- Sort queues based on priority level or by qstat –Q output order
- Allow jobs from higher priority queues to be eligible to run first
- Allow jobs to move between two or more PBS complexes
- Allow jobs to run in a dedicated time space
- Enforce fair portions of a site’s resources and usage
Directory Structure of $PBS_HOME/sched_priv *
sched_priv
    sched_config      Scheduler configuration file
    dedicated_time    Specifies dedicated time
    holidays          Lists holidays to be treated as “non-primetime”
    resource_group    Specifies relative percentages between fairshare entities
    sched_out         Debug messages
    sched.lock        pbs_sched pid lock file
* This information is for debugging purposes only. It may change in future releases.
Default Behavior of the Scheduler
What events happen within a scheduling cycle?
1. Server will send list of MOM resources to the Scheduler
2. Scheduler will sort all the resources based on default scheduling policies
3. Scheduler will sort queue(s)
- If one or more queues have priority attribute set then sort based on queue priority
- If no queue priority is set, the queues are taken in arbitrary order or in qstat -Q output order
- If a queue’s priority is set to 150 or higher jobs from this queue will be eligible for execution first
- If a queue’s priority is set to 150 or higher and preemption is enabled, then preemptive scheduling will be enforced, allowing jobs from this queue to preempt other jobs
4. Scheduler will sort the jobs from the first queue
- Jobs are sorted based on when they were enqueued
- If a job has been marked “starving” and if the help_starving_jobs scheduling policy is turned on, it will move that job up in sort priority
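Step 3 (sorting queues by priority) can be illustrated on sample data. This is only a simplified sketch of the ordering rule described above, not the scheduler's actual algorithm. order_queues is a hypothetical helper; its input is one "name priority" pair per line, and the express threshold of 150 is the default quoted on this slide.

```shell
#!/bin/sh
# order_queues is a hypothetical helper, not a PBS command.
# Reads "queue_name priority" lines on stdin, sorts them from high to low
# priority, and tags queues at or above the express threshold.
#   $1 = express-queue threshold (defaults to 150)
order_queues() {
    sort -k2,2nr |
    awk -v express="${1:-150}" '{
        tag = ($2 + 0 >= express + 0) ? "express" : "normal"
        print $1, $2, tag
    }'
}

# Sample queues; the names and priorities are made up for illustration.
order_queues <<'EOF'
workq 0
expressq 200
my_queue 100
EOF
```

With this sample input the queue with priority 200 is listed first and tagged as express; the other two follow in priority order.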
Using the sched_config file
Parameter format:
name: value [prime | non_prime | all | none]
• Primetime and non-primetime period are set in the sched_priv/holidays file
• Must send a “kill -HUP <pbs_sched_pid>” in order for the Scheduler to re-read the configuration file
• Any modifications may affect not only queued jobs but also running jobs
Name       Description                                                                                    Notes
name       Name of the scheduler parameter                                                                non-changeable
value      Type: string, string array, integer, boolean, time                                             case-sensitive
prime      Applies only to primetime period                                                               case-sensitive
non_prime  Applies only to non-primetime period                                                           case-sensitive
all        Applies to both primetime and non-primetime periods; default if prime/non_prime is not specified  case-sensitive
none       Not used
Default Scheduling Parameters – sched_config
Parameter                      Value                            Parameter                      Value
round_robin:                   false                            load_balancing:                false
by_queue:                      true                             smp_cluster_dist:              pack
strict_ordering:               false                            #unknown_shares:               10
help_starving_jobs:            true                             fairshare_usage_res:           cput
max_starve:                    24:00:00                         fairshare_entity:              euser
backfill:                      true                             half_life:                     24:00:00
backfill_prime:                false                            sync_time:                     1:00:00
prime_exempt_anytime_queues:   false                            #fairshare_enforce_no_shares:  true
#prime_spill:                  1:00:00                          preemptive_sched:              true
primetime_prefix:              p_                               preempt_queue_prio:            150
nonprimetime_prefix:           np_                              preempt_prio:                  "express_queue, normal_jobs"
#job_sort_key:                 "cput LOW"                       preempt_order:                 "SCR"
node_sort_key:                 "sort_priority HIGH"             preempt_sort:                  min_time_since_start
sort_queues:                   true                             dedicated_prefix:              ded
resources:                     "ncpus, mem, arch, host, vnode"  log_filter:                    3328
#sched_cycle_length:           20:00:00
PBS Scheduler – Scheduler Log Information
Scheduler logs are stored on the machine where the pbs_sched daemon is running (default)
• Location: $PBS_HOME/sched_logs
• A new log file is created every day
  - File name format: [YYYYMMDD]
The logging level is configurable in $PBS_HOME/sched_priv/sched_config:
Usage: log_filter: <value>
Where <value> can be between 0 and 4095
• 0 Means to log everything
• 3328 Default value
• 4095 Log nothing
Note: When changing the scheduler log filter it is necessary to do a kill -HUP on the pbs_sched pid
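Unlike the server's log_events, log_filter works in the opposite direction: 0 logs everything and 4095 logs nothing. One reading consistent with these values is that each set bit suppresses one event class. Under that assumption (not stated on this slide), the sketch below shows which bits the default 3328 sets. filtered_bits is a hypothetical helper.

```shell
#!/bin/sh
# filtered_bits is a hypothetical helper, not a PBS command.
# ASSUMPTION: log_filter is a bitmask in which each set bit suppresses
# one class of scheduler log events.
#   $1 = log_filter value (decimal, 0..4095)
# Prints the set bits, lowest first, separated by spaces.
filtered_bits() {
    filter=$1
    bit=1
    out=""
    while [ "$bit" -le 2048 ]; do
        if [ $(( filter & bit )) -ne 0 ]; then
            out="${out:+$out }$bit"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$out"
}

filtered_bits 3328
```

For 3328 the set bits are 256, 1024 and 2048, so on this reading only those three classes are suppressed by default.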
PBS Scheduler – Details of Scheduler Log Entry
date-time field Date and time stamp. Format: mm/dd/yyyy hh:mm:ss
event_code Numerical code for type of event
pbs_daemon pbs_sched
object_type Type of object which the message is about:
Svr=server
Que=queue
Job=job
Req=request
Fil=file
Node=vnode
object_name Name of the specific object
message_text Text of the log message
syntax: date-time;event_code;pbs_daemon;object_type;object_name;message_text
Sample of scheduler log entry:
09/14/2010 16:48:36;0080;pbs_sched;Req;;Starting Scheduling Cycle
09/14/2010 16:48:36;0080;pbs_sched;Req;;Leaving Scheduling Cycle
09/14/2010 21:45:47;0002;pbs_sched;Svr;Log;Log closed
Chapter Twelve - Scheduling Custom Resources
Chapter Twelve
Custom Resources
Resource Types
Resource Flags
Understanding the resourcedef file
Different examples of using custom resources
• Host/vnode level resource
• Boolean resource
• Server level resource
• Queue level resource
• Query execution hosts
• Query FLEXlm server
Scheduling Resources - Custom Resources
The PBS Scheduler supports arbitrary resources, e.g. to track disk space, or application licenses
Limiting resource usage for users, groups, queues, and vnodes influences the order in which jobs are started
Resources may be tracked in two ways:
• Internally by PBS: resources which are consumed by PBS jobs only
• External scripts: resources which might be consumed by PBS jobs and/or outside of PBS
Resources can exist at various levels
• Host (vnode) level
• Server and queue level
Resource matching
• Via arithmetic comparison for number and size type resources
• Via string matching for Boolean and string resources
Scheduling Resources – Resource Types
Data Type     Description                                            Consumable/Non-consumable
boolean       • defined at vnode level                               non-consumable
              • used within a select statement
float         • values [+-] 0-9 [[0-9] …][.][[0-9]…]                 consumable or non-consumable
long          • values 0-9[[0-9]…]                                   consumable or non-consumable
size          • number of bytes or words                             consumable or non-consumable
string        • string value                                         non-consumable
string_array  • multiple string values separated by comma            non-consumable
time          • maximum time period that resource can be used        consumable or non-consumable
              • format: [hh:mm:ss[.ms]]
Scheduling Resources – Resource Flags
Flag       Description                                             Consumable/Non-consumable
h          • host level resource, static or dynamic                non-consumable
           • used within select statement
n          • host level resource, “n” means static                 consumable
           • must also use flag “h”
f          • host level resource                                   consumable @ 1st vnode
           • must also use flag “h”
q          • server level resource                                 consumable
           • queue level resource
<no flag>  • server level resource, no flag means dynamic          non-consumable
           • queue level resource
i          • invisible
           • users cannot request or qalter this resource
           • users cannot view the value using qstat -f
r          • read only
           • users cannot request or qalter this resource
           • users can view the value using qstat -f
Scheduling Resources – resourcedef
Custom resources are defined in: $PBS_HOME/server_priv/resourcedef
• File needs to be created manually
• Permissions must be set to 644
Format: resource_name type=<resource type> flag=<flag>
Sample of resourcedef:
optistruct type=long flag=hn
motionsolve type=boolean flag=h
radioss type=long flag=q
jobtype type=string
scratch type=size flag=h
gwu type=long
If a custom resource is removed, running jobs that requested that resource are purged when the PBS Server daemon is restarted
Note: Any modifications to the resourcedef file require pbs_server to be restarted
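As a way to check one's reading of the resourcedef format, the following Python sketch parses lines of the form `resource_name type=<resource type> flag=<flag>`. It is not a PBS tool; the function name `parse_resourcedef` and the choice to report a missing flag as an empty string are assumptions for illustration.

```python
# Illustrative sketch only; not a PBS tool.
def parse_resourcedef(text):
    """Parse resourcedef-style lines into {name: {'type': ..., 'flag': ...}}."""
    resources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, *fields = line.split()
        entry = {"type": None, "flag": ""}
        for field in fields:
            key, _, value = field.partition("=")
            if key in entry:
                entry[key] = value
        resources[name] = entry
    return resources


# The sample definitions used throughout this chapter.
SAMPLE = """\
optistruct type=long flag=hn
motionsolve type=boolean flag=h
radioss type=long flag=q
jobtype type=string
scratch type=size flag=h
gwu type=long
"""

if __name__ == "__main__":
    for name, entry in parse_resourcedef(SAMPLE).items():
        # Flag "h" marks a host level resource; otherwise server/queue level.
        level = "host" if "h" in entry["flag"] else "server/queue"
        print(name, entry["type"], level)
```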
Custom Resource – Unset Resources
Where resources are not set at the host, server, or queue level, PBS assigns default values based on the type of resource
Host/Vnode Level:
Resource Type | Unset Resource | Request Value
boolean | False | False
float | 0.0 | 0.0
long | 0 | 0
size | 0 | 0
string | "" | No match value
string_array | "" | No match value
time | 0:00 | 0:00
Server/Queue Level:
• Numerical resources = infinite
Unset custom resources can be treated as infinite at any level (host, server, or queue) by setting the scheduler parameter: resource_unset_infinite
Custom Resource – Host/vnode-Level
Create a custom resource to be applied at the vnode level to indicate how much of that resource is available at a given time
• Define the custom resource in resourcedef:
optistruct type=long flag=hn
• Set the value of the custom resource in qmgr:
set node traintb01 resources_available.optistruct=2
set node traintb02 resources_available.optistruct=0
• Add the custom resource to sched_config file:
resources: "ncpus, mem, arch, host, vnode, optistruct"
• Request the custom resource:
qsub -l select=1:ncpus=2:optistruct=1
Custom Resource – Boolean Resource
Create a custom resource to be applied at the vnode level. This custom resource will indicate whether or not that resource is available on a given vnode
• Define the custom resource in resourcedef:
motionsolve type=boolean flag=h
• Set the value of the custom resource using qmgr:
set node traintb01 resources_available.motionsolve=true
set node traintb02 resources_available.motionsolve=false
• Add the custom resource to sched_config file:
resources: "ncpus, mem, arch, host, vnode, motionsolve"
• Request the custom resource:
qsub -l select=1:ncpus=2:motionsolve=true
Custom Resource – Server Level Resource
Create a custom resource to be applied at the server, to track how much of that resource is available globally at a given time
• Define the custom resource within resourcedef:
radioss type=long flag=q
• Set the value of the custom resource using qmgr:
set server resources_available.radioss=8
• Add the custom resource to sched_config file:
resources: "ncpus, mem, arch, host, vnode, radioss"
• Request the custom resource:
qsub -l select=1:ncpus=2 -l radioss=1
Custom Resource – Queue Level Resource
Create a custom resource to be applied at the queue, to control whether or not a job can be enqueued based on whether the job requests this resource
• Define the custom resource within resourcedef:
jobtype type=string
• Set the value of the custom resource using qmgr:
set queue radioss resources_available.jobtype=radioss
set queue radioss resources_min.jobtype=radioss
set queue radioss resources_max.jobtype=radioss
set queue radioss resources_default.jobtype=" "
• Add the custom resource to sched_config file:
resources: "ncpus, mem, arch, host, vnode, jobtype"
• Request the custom resource:
qsub -l select=1:ncpus=2 -l jobtype=radioss
Custom Resource – Query Vnodes
Create a custom resource to query vnodes using a call-out script
• Define the custom resource within resourcedef:
scratch type=size flag=h
• Add the custom resource to sched_config file:
resources: "ncpus, mem, arch, host, vnode, scratch"
mom_resources: "scratch"
• Set the path to the script name in mom_priv/config file:
scratch !/usr/local/bin/scratch.pl
• Request the custom resource:
qsub -l select=1:ncpus=2:scratch=1GB
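The slides do not show the contents of the call-out script. As a rough idea of what such a script might do, here is a Python sketch (the course's script is written in Perl) that reports free space as a size value. The directory used, the function name `free_scratch_kb`, and the assumption that the script reports one value in kilobytes on standard output are illustrative, not taken from the course material.

```python
#!/usr/bin/env python
# Illustrative sketch only; the course's scratch.pl is not shown in the slides.
import shutil
import tempfile

# Hypothetical scratch location: the system temp directory is used here as a
# stand-in; a site would point this at its real scratch file system.
SCRATCH_DIR = tempfile.gettempdir()


def free_scratch_kb(path):
    """Return the free space under `path` as a size string in kilobytes."""
    free_bytes = shutil.disk_usage(path).free
    return "%dkb" % (free_bytes // 1024)


if __name__ == "__main__":
    # Assumption: the caller reads a single value from standard output.
    print(free_scratch_kb(SCRATCH_DIR))
```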
Custom Resource – Query FLEXlm Server
Create a custom resource to query the FLEXlm server to determine if enough FLEX tokens are available for execution
• Define the custom resource within resourcedef:
gwu type=long
• Add the custom resource to sched_config file:
resources: "ncpus, mem, arch, host, vnode, gwu"
• Set the path to the script in sched_config file:
server_dyn_res: "gwu !/var/spool/altair/scripts/lmstat"
• Request the custom resource:
qsub -l select=1:ncpus=2 -l gwu=50
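The script referenced by server_dyn_res is not shown in the slides. The Python sketch below illustrates one way such a script might compute the number of free tokens from license-status text. The regular expression assumes a usage line of the form "Users of <feature>: (Total of N licenses issued; Total of M licenses in use)", and the sample numbers are made up; the function name `available_tokens` is hypothetical.

```python
# Illustrative sketch only; the course's lmstat wrapper script is not shown.
import re

# Assumed shape of the relevant license-status line:
#   Users of gwu:  (Total of 100 licenses issued;  Total of 20 licenses in use)
USAGE_RE = re.compile(
    r"Users of (?P<feature>\S+):\s+\(Total of (?P<issued>\d+) licenses? issued;"
    r"\s+Total of (?P<used>\d+) licenses? in use\)"
)


def available_tokens(status_text, feature):
    """Return issued minus in-use tokens for `feature`, or 0 if not found."""
    for match in USAGE_RE.finditer(status_text):
        if match.group("feature") == feature:
            return int(match.group("issued")) - int(match.group("used"))
    return 0


if __name__ == "__main__":
    # Made-up sample text standing in for real license server output.
    sample = ("Users of gwu:  (Total of 100 licenses issued;  "
              "Total of 20 licenses in use)")
    # Assumption: the scheduler reads a single number from standard output.
    print(available_tokens(sample, "gwu"))
```

With a request such as `-l gwu=50`, the scheduler would compare 50 against the number this kind of script reports.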
Chapter Eleven - Various Scheduling Policies
Chapter Eleven
Job priorities in PBS
Sorting queues
Helping starving jobs
Eligible time
Backfill
Strict ordering
True FIFO
Preemptive scheduling
Hard & soft limits
Sorting jobs
Tunable formula
Round robin
SMP cluster scheduling
Sort execution hosts
Placement sets
Primetime & non-primetime
Dedicated time
Fairshare
Peer scheduling
Advance reservations
Exercises
Chapter Fifteen – Hooks
Chapter Fifteen
Concept
Hook commands
Setting up a custom hook
Viewing hook definitions
Exporting hook contents
Exercises
Hooks - Concept
What are hooks?
• Custom call-out executables that give more precise control over submitting jobs
• Written in the Python programming language
• Example applications of hooks:
- Allow/disallow enqueueing jobs based on user/group ID, amount of requested resources, timeframe
- Allow/disallow modifying job attributes of already-submitted jobs
- Allow/disallow moving jobs to another execution queue or PBS complex
- Allow/disallow requesting an advance/standing reservation
- Look up 3rd party database for credentials
• To view hook logging information within the server logs, the server log_events attribute should be set to: 2047
• Only root can create hooks
Hooks - Commands
Hook commands in qmgr:
Command | Description
list hook <hook name> | List a hook’s attributes and their values
print hook <hook name> | Print a hook’s creation commands
create hook <hook name> | Create a new hook
set hook <hook name> <attribute name> = <value> | Set a hook’s attribute
unset hook <hook name> <attribute name> | Unset a hook’s attribute
import hook <hook name> <content-type> <content-encoding> <input file>|- | Import a hook’s Python script file
export hook <hook name> <content-type> <content-encoding> <output file>|- | Export a hook’s Python script to a file
delete hook <hook name> | Remove a hook and its definition
Hooks - Adding a Hook
Steps to add a hook, using qmgr:
1. Add the hook name
• first character must be alphabetic
Qmgr: create hook <hook_name>
2. Set the type of trigger event
• can have multiple events associated with a single hook, using "+="
Qmgr: set hook <hook_name> event = <event_name>
<event_name> | Description
queuejob | To allow/disallow enqueueing a job into a queue
modifyjob | To allow/disallow modifying job attributes
resvsub | To allow/disallow reservation requests by users
movejob | To allow/disallow moving jobs to another queue or PBS complex
runjob | To allow modifications of running jobs
provision | To provision a host
Hooks - Adding a Hook, cont.
3. Specify the path and name of the Python script
Qmgr: import hook <hook_name> application/x-python <content-encoding> <path/filename>
<content-encoding> values:
• default (7bit)
• base64
Note: when importing a hook, PBS will try to evaluate the script. If it cannot, it will report the information at the command line and in the server logs
Additional options:
4. Relative order of hook execution; default = 1 (highest level)
Qmgr: set hook <hook_name> order = <n>
5. Specify a timeout value for hook execution; default = 30 seconds
Qmgr: set hook <hook_name> alarm = <n>
Hooks - Adding a Hook, cont.
6. Enable or disable a particular hook; default = true
Qmgr: set hook <hook_name> enabled = <Boolean>
Note: The pbs_server daemon does not need to be restarted for a hook to be active
Hooks - Viewing Hook Information
Printing hook creation commands:
Qmgr: print hook hookA
create hook hookA
set hook hookA type = site
set hook hookA enabled = true
set hook hookA event = '""'
set hook hookA user = pbsadmin
set hook hookA alarm = 30
set hook hookA order = 1
import hook hookA application/x-python base64 -
Listing hook attributes:
Qmgr: list hook hookA
Hook hookA
    type = site
    enabled = true
    event = ""
    user = pbsadmin
    alarm = 30
    order = 1
Hooks – Exporting Hook Contents
Reasons to use the export command:
• To view the current script content
• To make a backup of the Python script
• To make modifications to the Python script
To export a hook’s Python script to a file:
Qmgr: export hook <hook_name> application/x-python <content-encoding> <path/filename>
Note: if an output file is not specified, the script is written to stdout
To back up hook information:
qmgr -c "print hook <hook_name>" > hook_file
Exercises
Reject a job that doesn’t specify a walltime
(event = queuejob)
Prevent users from altering any of their job attributes once submitted
(event = modifyjob)
Prevent users from requesting a Reservation
(event = resvsub)
Prevent users from moving their job to another queue or to another PBS complex
(event = movejob)
Exercise – queuejob Hook
Objective: To reject jobs at submission time that do not request walltime resource; the Python script is already provided
Prerequisites: Disable any existing hooks
PBS Administrator Tasks:
1. Use qmgr to create a hook called queuejob
2. Set the event as queuejob
3. The Python script is located in /root/hook_scripts/queuejob.py
4. Leave the default attribute values as they are
PBS User Task:
5. Submit job without requesting any walltime resource
Observation:
• When submitting a job without requesting walltime resource, what, if any, message appears at the command line?
• Was the job enqueued or was it rejected?
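The provided queuejob.py is not reproduced in these slides. The sketch below shows what a queuejob hook of this kind might look like. It relies on the PBS hook interface as commonly documented (`pbs.event()`, `job.Resource_List`, `accept()`, `reject()`), which should be checked against the hooks documentation for the installed PBS version. The function name `require_walltime` and the rejection message are hypothetical, and the import guard exists only so the logic can be exercised outside PBS.

```python
# Illustrative sketch only; the course's queuejob.py is not shown in the slides.
try:
    import pbs  # provided by PBS when the script runs as a hook
except ImportError:
    pbs = None  # lets the logic below be exercised outside PBS


def require_walltime(event):
    """Reject the job carried by `event` unless it requests walltime."""
    if event.job.Resource_List["walltime"] is None:
        # Hypothetical message text shown to the submitting user.
        event.reject("Job rejected: please request walltime "
                     "(for example -l walltime=01:00:00)")
    else:
        event.accept()


if pbs is not None:
    # Inside PBS: act on the event that triggered this hook.
    require_walltime(pbs.event())
```

Keeping the decision in a small function makes it possible to try the logic with a stand-in event object before importing the script into a hook.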
Exercise – modifyjob hook
Objective: To disallow users from qaltering any of their submitted jobs’ attributes/resources; the Python script is already provided
Prerequisites: Disable any existing hooks
PBS Administrator Tasks:
1. Using qmgr create a hook called modifyjob
2. Set the event as modifyjob
3. The Python script is located in /root/hook_scripts/modifyjob.py
4. Leave the default attribute values as they are
PBS User Task:
5. Submit job
6. qalter any one of a job’s attributes
Observation:
• Was the user able to qalter the job?
• If not, what error message, if any, was output?
Exercise – resvsub hook
Objective: To prevent users from requesting reservations
Prerequisites: Disable any existing hooks
PBS Administrator Tasks:
1. Using qmgr create a hook called resvsub
2. Set the event as resvsub
3. The Python script is located in /root/hook_scripts/resvsub.py
4. Leave the default attribute values as they are
PBS User Task:
5. Request a reservation using pbs_rsub
Observation:
• Was the user able to request a reservation?
• If not, what error message, if any, was output?
Exercise – movejob hook
Objective: Prevent users from moving their jobs to another queue or PBS complex
Prerequisites: Disable any existing hooks
Should have at least 2 active queues
PBS Administrator Tasks:
1. Using qmgr create a hook called movejob
2. Set the event as movejob
3. The Python script is located in /root/hook_scripts/movejob.py
4. Leave the default attribute values as they are
PBS User Task:
5. Qsub a job that should remain queued; not ready for execution
6. Using qmove command, try to move it to another queue
Observation:
• When trying to qmove, did it move that job where you requested?
• If not, what error message, if any, was output?
Chapter Sixteen – Miscellaneous
Chapter Sixteen
PBS user and administrator commands
PBS_EXEC/etc directory
PBS_EXEC/unsupported/pbs_diag *
PBS_EXEC/unsupported/pbs_dtj *
Re-Installation of PBS Professional
* These scripts are not supported.
PBS User & Administrator Commands
User Command | Purpose | Administrator Command | Purpose
nqs2pbs | Convert from NQS | pbs-report | Report job statistics
pbs_rdel | Delete Reservation | pbs_hostid | Report host identifier
pbs_rstat | Status Reservation | pbs_hostn | Report host name(s)
pbs_rsub | Submit Reservation | pbs_probe | PBS diagnostic tool
pbsdsh | PBS distributed shell | pbs_rcp | File transfer tool
qalter | Alter job | pbs_tclsh | TCL with PBS API
qdel | Delete job | pbsfs | Show fairshare usage
qhold | Hold a job | pbsnodes | Node manipulation
qmove | Move job | printjob | Report job details
qmsg | Send message to job | qdisable | Disable a queue
qorder | Reorder jobs | qenable | Enable a queue
qrls | Release hold on job | qmgr | Manager interface
qselect | Select jobs by criteria | qrerun | Re-queue running job
qsig | Send signal to job | qrun | Manually start a job
qstat | Status job, queue, server | qstart | Start a queue
qsub | Submit a job | qstop | Stop a queue
tracejob | Report job history | qterm | Shut down PBS
$PBS_EXEC/etc directory
The directory $PBS_EXEC/etc contains backup copies of PBS configuration files, such as the following, in case you ever need to revert to the default configuration:
Filename | Description
pbs_dedicated | Dedicated time file
pbs_holidays | Holidays file
pbs_init.d | PBS init run script
pbs_postinstall | PBS postinstall script
pbs_resource_group | Fairshare resource group file
pbs_sched_config | Scheduler configuration file
$PBS_EXEC/unsupported directory – pbs_diag
The pbs_diag* script is an interactive script that collects information from PBS configuration files and job-related history
Information that is collected:
• qmgr settings for server, queues, and nodes
• pbs_probe information about file permissions
• pbs.conf master configuration information
• pbsnodes node configuration/state information
• qstat information about current state of the queues and server
• information about existing reservations
• pbs_hostn name resolution information
• operating system version information
• server, scheduler, and mom configuration files
• tracejob and logging information for jobs specified by the user
• server, scheduler, and mom logs for dates specified by the user
• cpuset configuration information and current state if on a cpuset-aware system
• vnode definition files
• FLEXlm license server status
* pbs_diag is not supported.
$PBS_EXEC/unsupported directory – pbs_dtj
pbs_dtj* (Distributed TraceJob) is a command that enables a user to gather tracejob information from ALL of the nodes where a PBS Professional job ran
By default, the script uses rsh to connect to the nodes, although it will check the pbs.conf file to see if PBS_SCP is set, and use ssh in that case
Usage: pbs_dtj <option>
Option | Description
-u <username> | Specify a user name under which to run
-r <rcommand> | Override the rsh/ssh settings in pbs.conf
-n | Number of days of log files to query
* pbs_dtj is not supported.
Re-installation of PBS Professional
Procedure to re-install PBS Professional on either the server or an execution host:
1. Shut down any PBS daemons running on that host:
/etc/init.d/pbs stop
2. Verify the PBS daemons are no longer running:
ps -ef | grep pbs
3. Obtain the appropriate PBS rpm package name:
rpm -qa | grep pbs
4. Remove the PBS rpm package:
rpm -e pbs-11.0.0.103450
5. Remove the directories $PBS_HOME and $PBS_EXEC
6. Remove the file /etc/pbs.conf
7. Remove the file /etc/init.d/pbs
Refer to the Installation part of the PBS Professional Installation and Upgrade Guide for the complete installation procedure
Chapter Seventeen - Troubleshooting
Chapter Seventeen
pbs_probe
pbs_hostn
Using pbs_probe
If a site has a post-installation issue, running the pbs_probe command may help identify the cause and possible fix
Options:
-v verbose mode
-f fix mode (checks & fixes directory permissions)
Using the pbs_probe command returns the following information:
====== System Information =======
sysname=Linux
nodename=traintb16
release=2.6.22.5-31-default
version=#1 SMP 2007/09/21 22:29:00 UTC
machine=i686
=== No PBS Infrastructure Problems Detected ===
Using pbs_hostn
If a PBS site has a hostname resolution issue, using the pbs_hostn command will help identify the problem
The command reports the results from the gethostbyname and gethostbyaddr library calls
Example: pbs_hostn -v traintb16
primary name: traintb16.prog.altair.com (from gethostbyname())
aliases: traintb16
address length: 4 bytes
address: 204.235.21.130 (33554559 dec) name: traintb16.prog.altair.com
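The same forward and reverse lookups can be tried from Python's standard `socket` module, which can be handy for comparing results on a host where name resolution is in doubt. This sketch is not how pbs_hostn itself is implemented; the function name `lookup` and the shape of the returned dictionary are choices made for illustration.

```python
# Illustrative sketch only; this is not how pbs_hostn itself is implemented.
import socket


def lookup(host):
    """Forward-resolve `host`, then reverse-resolve each address found."""
    primary, aliases, addresses = socket.gethostbyname_ex(host)
    reverse = {}
    for addr in addresses:
        try:
            reverse[addr] = socket.gethostbyaddr(addr)[0]
        except OSError:
            reverse[addr] = None  # no reverse entry for this address
    return {
        "primary": primary,
        "aliases": aliases,
        "addresses": addresses,
        "reverse": reverse,
    }
```

Calling `lookup("traintb16")` on the training host would return the primary name, aliases, and addresses, plus the name each address maps back to. A mismatch between the forward and reverse names is a typical sign of a resolution problem.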
Conclusion - Survey Monkey
Please help us by filling out a quick online survey about this training class
The web link is bookmarked under the Bookmarks pull-down menu in Firefox
Please make sure you click on "SUBMIT" when finished
THANK YOU