TRANSCRIPT
High Performance Computing with the Cluster "Elwetritsch"
Focus: SLURM
Course instructor: Dr. Josef Schüle, RHRK
RHRK Seminar
Batch System SLURM
Advanced Topics
• Specifying Resource Requirements
• Multi-core Jobs
• Job Arrays
(Image source: IBM)
Batch System Concepts
• Cluster: Tightly connected computers, called nodes.
• Job: Execution of user-defined work flows by the batch system.
• Jobslot: Smallest execution unit. In practice, a jobslot corresponds to a CPU core.
• Queue or Partition: Organisational unit for the batch system. Used for jobs in projects.
• Scheduler: Controls users' jobs and the resource manager. Defines job priorities and starts and stops jobs accordingly.
Jobs are submitted according to their resource requirements; the scheduler selects the queue.
You do not have to care which node guarantees 64 GB of memory and is available for 10 hours. Just specify these requirements; the scheduler will do the rest.
Jobs are scheduled according to priorities and resource availability.
If you request 1 TB and there is already a job running on the single node with 1 TB, you have to wait until that job has finished, no matter how high your priority is.
Priority is reduced while a job is running.
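For example, the 64 GB / 10 hour requirement mentioned above can be expressed directly at submission (a minimal sketch; my_job.sh is a placeholder script name):
sbatch --mem=64000 -t 10:00:00 my_job.sh    # 64 GB memory, 10 hours run time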
Batch Model at RHRK
Nodes are shared among users. Up to 16 non-parallel jobs may share a node with 16 cores. The requested number of cores is guaranteed, but memory buses, network connections, etc. are shared and may influence the run time of jobs.
Without an active project, only a small subset of the cluster can be used.
The idle queue utilizes the whole cluster, but jobs in that queue have the lowest possible priority and will be suspended in favor of jobs with higher priority.
Batch Model at RHRK
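As a sketch of using the idle queue described above (the partition name "idle" is an assumption here and may differ on Elwetritsch):
sbatch -p idle -t 60 -n 1 my_job.sh    # lowest priority, may be suspended at any time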
Options for job submission are best collected inside a job script.
sbatch -t 30 -N 1 -J TestJob --wrap="sleep 5"
Submits a job that sleeps for 5 seconds, using 1 node, for a maximum of 30 minutes. Or:
Job Scripts
#!/bin/bash
#SBATCH -J TestJob
#SBATCH -N 1
#SBATCH -o TestJob-%j.out
#SBATCH -e TestJob-%j.err
#SBATCH -t 30
#SBATCH --mail-type=END
echo "Executing on $HOSTNAME"
sleep 5
Job Scripts
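Assuming the script above is saved as TestJob.sh, it can be submitted and monitored with standard SLURM commands (a short illustration):
sbatch TestJob.sh        # submit; prints the assigned job ID
squeue -u $USER          # list your pending and running jobs
scancel <jobid>          # cancel a job if necessary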
Specifies characteristics a host must have to match the resource requirements. Typical selection scenarios:
• need of 8 cores: --ntasks=8 or -n 8
• specific (CPU) model: --constraint=XEON_E5_2640v3
• at least 60 GB memory: --mem=60000 (but the job may not use more than that)
Useful command:
• retrieve a list of all resources:
sinfo -o "%15N %4c %7m %46f %10G"
Resource Requirements
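The selection options above can be combined in a single submission; a minimal sketch (my_program is a placeholder):
sbatch -n 8 -C XEON_E5_2640v3 --mem=60000 -t 30 --wrap "my_program"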
Typical resources:
• Memory: sbatch --mem 32000
• Hard disk: not selectable in sbatch, only by node
• Device: sbatch --gres=gpu:1
• License: sbatch -L ansys
Useful commands:
• Retrieve a list of common resources: elwe.rhrk.uni-kl.de/elwetritsch/ressourcen.shtml
• Find out resource limits: sinfo -O nodelist:9,memory:10 -N | uniq
• Find hosts having a resource: sinfo -n node235 -O memory:10,disk:10
Resource Requirements
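As a sketch, a job script combining the device and license resources above might look like this (program name and sizes are illustrative):
#!/bin/bash
#SBATCH -t 60              # 60 minutes
#SBATCH -n 1               # one task
#SBATCH --gres=gpu:1       # one GPU
#SBATCH -L ansys           # one ANSYS license
#SBATCH --mem=32000        # 32 GB memory
./my_gpu_program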
Specifies the locality of a parallel job. If omitted, the required job slots for the job are allocated from the available set of job slots.
Syntax:
• -N 1 (--nodes=1): all job slots must be on one host
• -N 4 --ntasks-per-node=2: pack 2 job slots on each of the 4 assigned hosts; this should be accompanied by a constraint on the host type
Resource Requirements, parallel
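A sketch of the packed layout described above, combined with a host-type constraint so that all four nodes are identical (my_parallel_job.sh is a placeholder):
sbatch -N 4 --ntasks-per-node=2 -C XEON_E5_2670 -t 30 my_parallel_job.sh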
An empty file system in RAM, created for a job. The RAM file system requires an increase of the requested memory.
Usage:
#SBATCH --ramdisk=20M    # 20 MB; with the suffix G, e.g. 20G, gigabytes
Accessing the disk in a job:
cd ${SLURM_JOB_RAMDISK}
Resource Requirements, RAM Disk
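A minimal sketch of a job using the RAM disk option above; note that the requested memory has to cover the program plus the RAM disk (sizes and program name are illustrative):
#!/bin/bash
#SBATCH -t 30
#SBATCH -n 1
#SBATCH --ramdisk=20G            # 20 GB RAM disk
#SBATCH --mem=24000              # memory for the program plus the 20 GB RAM disk
cd ${SLURM_JOB_RAMDISK}          # work inside the RAM file system
./my_program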
For parallel jobs on clusters with heterogeneous hosts.
Specifies that all processes of a parallel job must run on hosts of the same type (load balancing).
Syntax:
-C XEON_E5_2670
--constraint=XEON_E5_2670
Other constraints:
• Infiniband network (-C IB)
• Hyperthreading off (-C HT:off)
• SSD (-C SSD)
• -C mpi64
Resource Requirements - constraint
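Several constraints can be combined in one request; the & operator for requiring multiple features is standard SLURM syntax, although the exact feature combinations available on Elwetritsch are an assumption here:
sbatch -C "XEON_E5_2670&IB" -N 2 --ntasks-per-node=16 -t 30 my_mpi_job.sh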
All job slots run on one host with shared memory. OpenMP or multi-threading is used to occupy all cores.
Default behaviour on Elwetritsch:
CPU binding to all assigned cores. As some nodes have hyper-threading enabled, you might get two cores for one job slot (or use -C HT:off).
Notes:
• Use as many threads as cores have been assigned.
• Too many threads result in context switches and create a high load.
Multi-core Jobs - SMP
Example:
sbatch -t 10 -n 1 --cpus-per-task 4 --wrap SMP_JOB
Multi-core Jobs - SMP
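Expanded into a job script, the same SMP request can set the thread count from the allocation so that the number of threads matches the number of assigned cores (a sketch; SMP_JOB stands for the program above):
#!/bin/bash
#SBATCH -t 10
#SBATCH -n 1
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # one thread per assigned core
./SMP_JOB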
MPI = Message Passing Interface. Jobs consist of many tasks that communicate, typically on more than one compute node. Communication is via Infiniband (fast) or Ethernet (very slow).
MPI support on Elwetritsch:
• Open MPI: open source product (free for all users)
• Intel MPI: commercial product (University members only)
• IBM Platform MPI: commercial and community edition (all users)
Multi-core Jobs - MPI
Pure Open MPI version:
module load openmpi/latest
sbatch -t 10 -N 2 --ntasks-per-node=16 \
  -L ompi -C XEON_E5_2670 --mem=100 \
  --wrap "mpirun my_program"
Threaded Open MPI:
module load openmpi/latest
sbatch -t 10 -N 2 -n 4 \
  --ntasks-per-node=2 --cpus-per-task=8 \
  -L ompi -C XEON_E5_2670 --mem=100 \
  --wrap "mpirun -x OMP_NUM_THREADS=8 my_program"
Multi-core Jobs - Open MPI
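In the threaded example above, 2 nodes with 2 MPI tasks per node and 8 threads per task occupy 16 cores on each node, 32 cores in total; -n 4 is simply the total number of MPI tasks.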
Hybrid jobs combine SMP and MPI: on each node SMP is used, between nodes MPI is used. Optionally, a master thread assigns work to MPI tasks.
Multi-core Jobs - Hybrid
#!/bin/bash
# the line above is required as the first line
#SBATCH -p your_project            # select a project, or insert # in the first column
#SBATCH --mail-type=END            # want a mail notification at end of job
#SBATCH -J jobname                 # name of the job
#SBATCH -o jobname.%j.out          # output: %j expands to jobid
#SBATCH -e jobname.%j.err          # error: %j expands to jobid
#SBATCH -C XEON_E5_2670            # selects only nodes of that type
#SBATCH -C IB                      # request Infiniband
#SBATCH -L ompi                    # request a license for openmpi
#SBATCH --nodes=2                  # requesting 2 nodes (identical to -N 2)
#SBATCH --ntasks=4                 # requesting 4 MPI tasks (identical to -n 4)
#SBATCH --ntasks-per-node=2        # 2 MPI tasks will be started per node
#SBATCH --cpus-per-task=3          # each MPI task starts 3 OpenMP threads
module purge                       # clean all module versions
module load openmpi/your-version   # load your openmpi version
mpirun -x "OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK" ./your_program
OpenMPI - job script
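For this script the allocation amounts to 2 nodes × 2 MPI tasks × 3 OpenMP threads = 12 cores in total; assuming the script is saved as hybrid_job.sh, submission is simply:
sbatch hybrid_job.sh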
Thank You
High Performance Computing on Elwetritsch
SLURM