
Page 1:

High Performance Computing with the Cluster „Elwetritsch“

Focus: SLURM

Course instructor: Dr. Josef Schüle, RHRK

RHRK Seminar

Page 2:

Batch System SLURM

Advanced Topics

• Specifying Resource Requirements

• Multi-core Jobs

• Job Arrays

Image Source: IBM

Page 3:

Batch System Concepts
• Cluster: Tightly connected computers, called nodes.
• Job: Execution of user-defined work flows by the batch system.
• Jobslot: Smallest execution unit. In practice, a jobslot corresponds to a CPU core.
• Queue or Partition: Organisational unit for the batch system. Used for jobs in projects.
• Scheduler: Controls the users' jobs and the resource manager. Defines job priorities and starts and stops jobs accordingly.
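
As an illustration only (a minimal sketch, not an official Elwetritsch example; the partition name my_project and the script work.sh are placeholders), these concepts show up in a job submission roughly like this:

#!/bin/bash
#SBATCH -p my_project        # queue/partition: organisational unit the job is accounted in
#SBATCH -n 4                 # job slots: 4 CPU cores
#SBATCH -t 60                # time limit the scheduler uses to plan the job
./work.sh                    # the job: the user-defined work flow run by the batch system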

Page 4:

Jobs are submitted according to resource requirements; the scheduler selects the queue.

You do not have to care which node guarantees 64 GB of memory and is available for 10 hours. Just specify these requirements; the scheduler will do the rest.
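
For example, the 64 GB / 10 hour requirement above could be expressed as follows (a sketch; my_job.sh is a placeholder for your job script):

sbatch --mem=64000 -t 10:00:00 my_job.sh    # 64 GB of memory, 10 hours wall time; the scheduler picks a matching node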

Jobs are scheduled according to priorities and resource availability.

If you request 1 TB of memory and a job is already running on the single node with 1 TB, you have to wait until that job has finished, no matter how high your priority is.

Priority is reduced while a job is running.

Batch Model at RHRK

Page 5:

Nodes are shared among users: up to 16 non-parallel jobs may share a node with 16 cores. The requested number of cores is guaranteed, but memory buses, network connections, … are shared and may influence the run time of jobs.

Without an active project, only a small subset of the cluster can be used.

The idle queue utilizes the whole cluster, but jobs in that queue have the lowest possible priority and will be suspended in favor of jobs with higher priority.

Batch Model at RHRK

Page 6:

Options for job submission are best collected inside a job script.

sbatch -t 30 -N 1 -J TestJob --wrap="sleep 5"

Submit a job that sleeps for 5 seconds, using 1 node, for a maximum of 30 minutes. Or, as a job script (next page):

Job Scripts

Page 7:

#!/bin/bash
#SBATCH -J TestJob
#SBATCH -N 1
#SBATCH -o TestJob-%j.out
#SBATCH -e TestJob-%j.err
#SBATCH -t 30
#SBATCH --mail-type=END
echo "Executing on $HOSTNAME"
sleep 5

Job Scripts

Page 8:

Specifies characteristics a host must have to match the resource requirements. Typical selection scenarios (a combined example follows below):
• need for 8 cores: --ntasks=8 or -n 8
• a specific (CPU) model: --constraint=XEON_E5_2640v3
• at least 60 GB of memory: --mem=60000 (but the job may not use more than that)

Useful command:
• retrieve a list of all resources:
  sinfo -o "%15N %4c %7m %46f %10G"
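
A sketch combining these selections into one submission (./my_program is a placeholder; adjust the constraint to a CPU model listed by sinfo):

sbatch -n 8 --constraint=XEON_E5_2640v3 --mem=60000 -t 120 \
       --wrap "./my_program"    # 8 cores, a specific CPU model, 60 GB of memory, 2 hours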

Resource Requirements

Page 9:

Typical resources (a job-script example follows below):
Memory:    sbatch --mem 32000
Hard disk: not selectable via sbatch, only indirectly via the chosen node
Device:    sbatch --gres=gpu:1
License:   sbatch -L ansys

Useful commands:
Retrieve a list of common resources:

elwe.rhrk.uni-kl.de/elwetritsch/ressourcen.shtml

Find out resource limits: sinfo -O nodelist:9,memory:10 -N | uniq

Show the resources of a specific host: sinfo -n node235 -O memory:10,disk:10
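
A hedged job-script sketch requesting one of these resources, here a GPU (./gpu_prog and the memory value are placeholders):

#!/bin/bash
#SBATCH -J GPUTest
#SBATCH -t 60                # 60 minutes
#SBATCH --gres=gpu:1         # one GPU as a generic resource
#SBATCH --mem=32000          # 32 GB of memory
./gpu_prog                   # placeholder for a GPU-enabled program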

Resource Requirements

Page 10:

Specifies the locality of a parallel job. If omitted, the required job slots are allocated from the available set of job slots anywhere in the cluster.

Syntax:
-N 1 (--nodes=1)
  all job slots must be on one host
-N 4 --ntasks-per-node=2
  pack 2 job slots on each of the 4 assigned hosts; this should be accompanied by a constraint on the host type
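
Two hedged examples of these locality options (./parallel_program is a placeholder; the constraint is only needed in the multi-node case):

sbatch -t 30 -N 1 -n 8 --wrap "srun ./parallel_program"
    # all 8 job slots on a single host
sbatch -t 30 -N 4 --ntasks-per-node=2 -C XEON_E5_2670 --wrap "srun ./parallel_program"
    # 2 job slots on each of 4 hosts of the same type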

Resource Requirements, parallel

Page 11:

An empty file system in RAM, created for a job. The RAM file system requires an increase of the requested memory.

Usage:
#SBATCH --ramdisk=20M    # 20 MB; with a G suffix (e.g. 20G) in GB

Accessing the disk in a job:
cd ${SLURM_JOB_RAMDISK}
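
A hedged job-script sketch using the RAM disk (input.dat, results.dat and ./my_program are placeholders; the 20 GB of RAM disk are added to the memory request):

#!/bin/bash
#SBATCH -t 60
#SBATCH --ramdisk=20G                 # 20 GB RAM file system, created empty for this job
#SBATCH --mem=24000                   # memory request covers the RAM disk plus the program itself
cd ${SLURM_JOB_RAMDISK}               # change into the RAM file system
cp ${SLURM_SUBMIT_DIR}/input.dat .    # stage input data into the fast file system
${SLURM_SUBMIT_DIR}/my_program        # run the program inside the RAM disk
cp results.dat ${SLURM_SUBMIT_DIR}/   # copy results back before the job (and the RAM disk) disappears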

Resource Requirements, RAM Disk

Page 12:

For parallel jobs on clusters with heterogeneous hosts.

Specifies that all processes of a parallel job must run on hosts of the same type (load balancing).

Syntax:
-C XEON_E5_2670
--constraint=XEON_E5_2670

Other constraints:
Infiniband network (-C IB)
Hyperthreading off (-C HT:off)
SSD (-C SSD)
-C mpi64
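
A hedged one-line example combining two constraints with & (standard SLURM AND syntax; ./my_program is a placeholder):

sbatch -t 30 -n 16 --constraint="XEON_E5_2670&IB" --wrap "srun ./my_program"
    # 16 job slots on hosts of one CPU type that also have Infiniband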

Resource Requirements - constraint

Page 13:

All job slots run on one host with shared memory; OpenMP or multi-threading is used to occupy all cores.

Default behaviour on Elwetritsch:
CPU binding to all assigned cores.
As some nodes have hyper-threading enabled, you might get two (logical) cores for one job slot (or use -C HT:off).

Notes:
Use as many threads as cores have been assigned.
Too many threads result in context switches and create a high load.

Multi-core Jobs - SMP

Page 14:

Example:

sbatch -t 10 -n 1 --cpus-per-task 4 --wrap "SMP_JOB"
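
The same request written as a hedged job script, with the thread count tied to the assigned cores (./smp_job is a placeholder):

#!/bin/bash
#SBATCH -t 10
#SBATCH -n 1                                    # one task ...
#SBATCH --cpus-per-task=4                       # ... with 4 cores on a single host
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}   # use exactly as many threads as cores assigned
./smp_job                                       # placeholder for the multi-threaded program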

Multi-core Jobs - SMP

Page 15:

MPI = Message Passing Interface.
Jobs consist of many tasks that communicate, typically on more than one compute node.
Communication via Infiniband (fast) or Ethernet (very slow).

MPI support on Elwetritsch:
Open MPI: open source product (free for all users)
Intel MPI: commercial product (university members only)
IBM Platform MPI: commercial and community edition (all users)

Multi-core Jobs - MPI

Page 16:

Pure Open MPI version:

module load openmpi/latest
sbatch -t 10 -N 2 --ntasks-per-node=16 \
       -L ompi -C XEON_E5_2670 --mem=100 \
       --wrap "mpirun my_program"

Threaded Open MPI:

module load openmpi/latest
sbatch -t 10 -N 2 -n 4 \
       --ntasks-per-node=2 --cpus-per-task=8 \
       -L ompi -C XEON_E5_2670 --mem=100 \
       --wrap "mpirun -x OMP_NUM_THREADS=8 my_program"

Multi-core Jobs - Open MPI

Page 17:

Hybrid jobs combine SMP and MPI.
On each node SMP is used; between nodes MPI is used.
Optionally, a master thread assigns work to the MPI tasks.

Multi-core Jobs - Hybrid

Page 18:

#!/bin/bash                        # required in first line
#SBATCH -p your_project            # select a project, or insert # in the first column
#SBATCH --mail-type=END            # mail notification at the end of the job
#SBATCH -J jobname                 # name of the job
#SBATCH -o jobname.%j.out          # output: %j expands to the jobid
#SBATCH -e jobname.%j.err          # error: %j expands to the jobid
#SBATCH -C XEON_E5_2670            # select only nodes of that type
#SBATCH -C IB                      # request Infiniband
#SBATCH -L ompi                    # request a license for Open MPI
#SBATCH --nodes=2                  # request 2 nodes (identical to -N 2)
#SBATCH --ntasks=4                 # request 4 MPI tasks (identical to -n 4)
#SBATCH --ntasks-per-node=2        # 2 MPI tasks are started per node
#SBATCH --cpus-per-task=3          # each MPI task starts 3 OpenMP threads
module purge                       # clean all module versions
module load openmpi/your-version   # load your openmpi version
mpirun -x "OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK" ./your_program

OpenMPI - job script

Page 19:


Thank You

High Performance Computing on Elwetritsch

SLURM