using lisa - uva · 2012-06-18 · login nodes and batch nodes lisa consists of - 2 login nodes...

Using Lisa

Willem Vermin

[email protected]

Senior consultant at SARA

www.sara.nl

The Lisa cluster

Q: What is Lisa?

A: Lisa is a compute cluster: 512 nodes, 4480 cores.

Q: Operating system?

A: Linux.

Q: Software?

A: Standard Debian (the distribution Ubuntu is based on), plus a few hundred extra packages.

Q: Can I use Lisa?

A: Affiliates of the UvA can use Lisa for free.

Login nodes and batch nodes

Lisa consists of

- 2 login nodes (lisa.sara.nl)
- 500+ batch nodes for running jobs
- GPU-equipped cluster: 8 nodes, 16 GPUs

All nodes share the same home file system and system file systems (SLOW). All nodes are equipped with a scratch disk (FAST).

Contents of this tutorial

What is a job

Modules

Software available

Job scripts: create, submit

Status of jobs

Efficient usage of the system

Logging in to the Lisa system

Log in to the system: lisa.sara.nl (for this course: gpu.sara.nl)

Login: sdemo001 .. sdemo0050
Password: see printout

Change password: passwd

What is a job

A job consists of two parts:

1. the part describing what kind of job this is:
   - amount of wallclock time needed
   - number of nodes needed
   - some extras
2. the part describing what this job should do:
   - a shell script

In the example below, the #PBS lines form the first part and the remaining shell commands form the second:

    #PBS -lnodes=1:mem24gb
    #PBS -lwalltime=200

    date
    cd $HOME/workdir
    echo "3 + 4" | bc
    echo "end of job"
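Because a job file is just these two parts joined together, it can also be produced by a program. The Python sketch below is illustrative only: build_job_script is a hypothetical helper, not part of any Lisa tool, and it assumes every resource request is written on its own `#PBS -l` line.

```python
def build_job_script(resources, commands):
    """Return the text of a PBS job script (hypothetical helper).

    resources: values for '#PBS -l' lines, e.g. "nodes=1:mem24gb" (part 1)
    commands:  shell commands the job should run (part 2)
    """
    # Part 1: one '#PBS -l...' directive per resource request
    header = ["#PBS -l" + r for r in resources]
    # Part 2: the shell script itself, after a blank separator line
    return "\n".join(header + [""] + list(commands)) + "\n"


# The example job from this slide
script = build_job_script(
    ["nodes=1:mem24gb", "walltime=200"],
    ["date", "cd $HOME/workdir", 'echo "3 + 4" | bc', 'echo "end of job"'],
)
print(script, end="")
```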

How to submit a job

Create this file, called 'job1':

    #PBS -lnodes=1
    #PBS -lwalltime=200

    date
    echo "3 + 4" | bc
    echo "end of job"

Here -lnodes=1 is the number of nodes, and -lwalltime=200 means the job can take 200 seconds walltime.

Type this to submit the job:

    qsub job1

File systems on Lisa

- Home file system: NFS, 200 GB per user, accessible from all nodes, SLOW

- /scratch file system ($TMPDIR): local disk, 70-240 GB, accessible only on the node itself, cleaned after the job

- /archive file system: for storing large amounts of seldom-used data, accessible from the login nodes, slow

Archive file system

Location (for user elvis): /archive/elvis

Consists of disk and tapes.

For storage of seldom-used data. Do not use it to store many small files; tar them first. Example, where data is a directory:

    tar zcvf /archive/elvis/data.tgz data

See https://www.sara.nl/systems/lisa/filesystems
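Packing a directory into one compressed file before it goes to the archive can also be done from Python with the standard tarfile module. A minimal sketch: archive_directory is a hypothetical helper, and the paths in the usage comment are examples only.

```python
import os
import tarfile


def archive_directory(src_dir, archive_path):
    """Pack src_dir into a single gzip-compressed tar file, like
    'tar zcvf archive_path src_dir' (hypothetical helper)."""
    name = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores members as "data/...", not as absolute paths
        tar.add(src_dir, arcname=name)
    return archive_path


# Example (hypothetical paths for user elvis):
# archive_directory("data", "/archive/elvis/data.tgz")
```

One archive file instead of thousands of small ones is exactly what the disk-and-tape archive handles well.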

Software available

Complete Unix suite:

    grep awk perl python gcc gfortran java ...

If you are missing something, let us know: [email protected]

Many extra packages, see http://sara.nl/systems/lisa/software

These extra packages are made available by the modules mechanism.

If you need something extra, let us know: [email protected]

Modules: what and why?

What:

- define the environment for using or developing software

Why:

- installing in a standard place (/usr/local/bin) requires root permission, which is a disaster when combined with flaky installation scripts

- what to do when more than one version is required?

- different packages with the same name

- and more ...

Modules

Purpose: make software available by defining the environment (PATH, ...)

Example: type

    plink
    module list
    module load plink
    module list
    plink
    module unload plink
    module list
    plink
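The effect of module load and module unload on PATH can be pictured as below. This is a simplification: the real module command also sets other variables (for example library and manual paths), and load_module, unload_module and the bin directory are hypothetical. It illustrates why, in the exercise above, plink can only be found while its module is loaded.

```python
import os


def load_module(env, bin_dir):
    """Hypothetical model of 'module load': put the package's bin
    directory in front of PATH (returns a new environment dict)."""
    new_env = dict(env)
    old_path = env.get("PATH", "")
    new_env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return new_env


def unload_module(env, bin_dir):
    """Hypothetical model of 'module unload': take that directory out of PATH."""
    new_env = dict(env)
    kept = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != bin_dir]
    new_env["PATH"] = os.pathsep.join(kept)
    return new_env


# shutil.which("plink", path=env["PATH"]) only finds plink after load_module.
```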

Module commands

    module load name        # activate module name
    module unload name      # deactivate module name
    module list             # which modules are active
    module avail ['xyz*']   # which modules are available
    module display name     # show contents of module

Name examples:

    plink                   # the default, in general the newest version
    plink/1.02              # request a specific version
    openmpi                 # openmpi for Intel compilers
    openmpi/gnu             # openmpi for GNU compilers
    openmpi/intel/1.3       # specific version for Intel compilers
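The general rule that a bare name such as plink selects the newest version can be sketched as follows. Assumptions: module names have the form name/version with dotted numeric versions, and resolve_module and the version lists are hypothetical. Sites can pin a different default (the openmpi example picks a default by compiler, not by version), so this models only the general rule.

```python
def resolve_module(name, available):
    """Hypothetical model of default-version selection.

    An exact name such as "plink/1.02" is returned as-is; a bare "plink"
    resolves to the newest "plink/<version>" in `available`.
    """
    if name in available:
        return name
    candidates = [m for m in available if m.startswith(name + "/")]
    if not candidates:
        raise LookupError("no module called " + name)

    def version_key(module):
        # numeric comparison per component, so 1.10 is newer than 1.9
        version = module.rsplit("/", 1)[1]
        return tuple(int(part) for part in version.split("."))

    return max(candidates, key=version_key)


print(resolve_module("plink", ["plink/1.02", "plink/1.9", "plink/1.10"]))  # plink/1.10
```

Comparing versions as number tuples rather than strings matters: as strings, "1.9" would sort after "1.10".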

Most used module commands

Developing code with the Intel compilers:

    module load c fortran

and the libraries for use with the Intel compilers:

    module load fftw3 mkl

In general, to run a program compiled with the Intel compilers, the corresponding module must also be loaded:

    module load fortran
    ./intelfortranprog

Compiler wrappers

On Lisa, when you call for example gcc, a wrapper is called instead. It takes care of the -L and -I flags, which are generated automatically, so you only need to specify the -l flag:

    module load fortran fftw3
    ifort myprog.f90 -lfftw3

The compiler wrappers are implemented as a module. If undesired:

    module unload compilerwrappers
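A rough model of what such a wrapper adds to the command line, assuming (hypothetically) that each loaded module lives under one root directory with include/ and lib/ subdirectories. The actual Lisa wrappers may derive the paths differently; wrap_compile and /opt/fftw3 are made-up names.

```python
def wrap_compile(command, module_roots):
    """Hypothetical wrapper model: return `command` with -I and -L flags
    added for every loaded module root, so the user only supplies -l."""
    extra = []
    for root in module_roots:
        extra.append("-I" + root + "/include")   # where the headers live
        extra.append("-L" + root + "/lib")       # where the libraries live
    # generated flags go right after the compiler name
    return [command[0]] + extra + list(command[1:])


print(wrap_compile(["ifort", "myprog.f90", "-lfftw3"], ["/opt/fftw3"]))
```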

Create job scripts

A job script is a simple text file, normally created using an editor.

When many jobs are needed:

- create a script (written in bash, R, matlab, python, ...) that creates and submits lots of jobs
- use disparm
- use array jobs

Typical job script

    #PBS -lnodes=1:mem24gb:cores8
    #PBS -lwalltime=5:00:00

    module load fortran
    cd $HOME/workdir
    echo "start of this job"
    some-command some parameters
    echo "end of this job"

Meaning of the options:

    -lnodes=n           : request n nodes
    :mem24gb            : nodes must have 24 GB memory
    :cores8             : nodes must contain 8 cores
    :cores12            : nodes must contain 12 cores
    -lwalltime=5:00:00  : job cannot take more than 5 hours walltime
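The -lnodes and -lwalltime values follow a simple syntax, decoded by the sketch below. It assumes walltime is given either as plain seconds (like 200 in the first example) or as [[HH:]MM:]SS; parse_nodes and walltime_seconds are hypothetical helpers written for illustration.

```python
def parse_nodes(spec):
    """Hypothetical helper: split a nodes request such as
    "1:mem24gb:cores8" into the node count and the required properties."""
    count, *properties = spec.split(":")
    return int(count), properties


def walltime_seconds(spec):
    """Hypothetical helper: convert "200" or "5:00:00" to seconds."""
    seconds = 0
    for field in spec.split(":"):   # hours, minutes, seconds (leading ones optional)
        seconds = seconds * 60 + int(field)
    return seconds


print(parse_nodes("1:mem24gb:cores8"))   # (1, ['mem24gb', 'cores8'])
print(walltime_seconds("5:00:00"))       # 18000
```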

What happens after qsub

1. the #PBS lines are evaluated
2. a copy of the job script is made
3. the job is put in the job queue
4. when the requested nodes are available:
   a. the nodes are allocated exclusively to the job
   b. the copied job script is started on the first node, as if a login is done
   c. the job ends when the job script ends, or when the wallclock limit is exceeded
   d. stdout and stderr are written to the directory from which the job was submitted

Following your jobs

    showq [-u loginname]
    qstat [-u loginname]

What happens on the nodes (use the job number from showq/qstat):

    pbs_jobmonitor 6651183

To log in on the node:

    pbs_joblogin 6651183

When will my job start

Maui scheduler:

- first in, first out with backfill
- priority adapted according to fair share: group and user
- measures to prevent monopolization of the system
- see https://www.sara.nl/systems/shared/usage/maui-explained

How to favor my job

Specify a sane walltime. Example: your program will run for one hour; specify -lwalltime=1:30:00. With backfill, a job can only be started early in an idle gap if its requested walltime fits into that gap, so a realistic walltime helps and an excessive one can delay the start.

How to get my work done in the minimum amount of time

- use all cores in your node
- see above
- use an efficient program
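The advice "one hour of run time, ask for 1:30:00" amounts to adding about 50% to the expected run time. A small sketch of that rule of thumb; the 50% margin is taken from the example, not a scheduler requirement, and padded_walltime is a hypothetical name.

```python
import math


def padded_walltime(expected_seconds, margin=0.5):
    """Hypothetical helper: return an -lwalltime value (H:MM:SS) that
    adds `margin` (50% by default) to the expected run time."""
    total = math.ceil(expected_seconds * (1 + margin))
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print("-lwalltime=" + padded_walltime(3600))   # -lwalltime=1:30:00
```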

Deleting a job

Find the job number using showq or qstat, then type:

    qdel jobnumber

Efficient jobs

Efficiency: finish your computations in the minimum amount of wall clock time

- use all computing power in a node (8 or 12 cores)
- try to arrange that your jobs are scheduled early

Monitor your jobs:

    pbs_jobmonitor jobnumber
    pbs_joblogin jobnumber

Inefficient job: 2 active processes

    Tasks: 201 total, 3 running, 198 sleeping, 0 stopped, 0 zombie
    Cpu(s): 56.8%us, 9.2%sy, 6.2%ni, 27.6%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 16473116k total, 3117048k used, 13356068k free, 32k buffers
    Swap: 3999736k total, 43420k used, 3956316k free, 1954080k cached

      PID USER     PR NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
    19689 wva      20  0   560m  472m  107m R  102  2.9   1:18.99 Alpino.bin
    19671 wva      20  0   565m  476m  107m R  100  3.0   1:28.93 Alpino.bin
      625 wva      20  0  13380  1484  1244 S    0  0.0   0:00.00 bash
      794 wva      20  0  13392  1120   860 S    0  0.0   0:00.00 bash
      816 wva      20  0  34764  4832  2136 S    0  0.0   0:00.26 ssh
      818 wva      20  0  47036   16m  3988 S    0  0.1   0:00.99 python2.7
      819 wva      20  0  46424   15m  3988 S    0  0.1   0:01.01 python2.7
      820 wva      20  0  46632   16m  3988 S    0  0.1   0:01.06 python2.7
      821 wva      20  0  47220   16m  3988 S    0  0.1   0:01.34 python2.7
      822 wva      20  0  52352   21m  3988 S    0  0.1   0:01.58 python2.7
      823 wva      20  0  49604   19m  3988 S    0  0.1   0:01.56 python2.7
      824 wva      20  0  48432   17m  3988 S    0  0.1   0:01.27 python2.7
      825 wva      20  0  48004   17m  3988 S    0  0.1   0:01.33 python2.7
      826 wva      20  0  36588  7856  3144 S    0  0.0   0:02.66 python2.7
    19670 wva      20  0  11104  1420  1192 S    0  0.0   0:00.00 Alpino
    19688 wva      20  0  11104  1420  1192 S    0  0.0   0:00.00 Alpino

Efficient job?

    Tasks: 239 total, 2 running, 237 sleeping, 0 stopped, 0 zombie
    Cpu(s): 44.1%us, 1.7%sy, 0.0%ni, 53.8%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 24733148k total, 13307084k used, 11426064k free, 40k buffers
    Swap: 3999736k total, 24212k used, 3975524k free, 502784k cached

      PID USER     PR NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
    20459 qyin     20  0  12.0g   11g   864 R  101 50.7   5353:21 Yin
    20307 qyin     20  0  13380  1484  1244 S    0  0.0   0:00.00 bash
    20458 qyin     20  0  13384   896   644 S    0  0.0   0:00.00 bash

Efficient MPI job

    Tasks: 195 total, 9 running, 186 sleeping, 0 stopped, 0 zombie
    Cpu(s): 61.6%us, 1.4%sy, 0.0%ni, 36.7%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 24735448k total, 1188560k used, 23546888k free, 44k buffers
    Swap: 3999736k total, 21420k used, 3978316k free, 889296k cached

      PID USER     PR NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
    11767 sdebeer  20  0   179m   17m  5720 R  102  0.1 134:45.59 md_mpi
    11761 sdebeer  20  0   179m   19m  7088 R  100  0.1 134:50.98 md_mpi
    11762 sdebeer  20  0   179m   18m  7168 R  100  0.1 134:49.04 md_mpi
    11763 sdebeer  20  0   179m   19m  7284 R  100  0.1 134:51.89 md_mpi
    11765 sdebeer  20  0   179m   18m  6736 R  100  0.1 134:49.32 md_mpi
    11768 sdebeer  20  0   179m   17m  5940 R  100  0.1 134:39.08 md_mpi
    11764 sdebeer  20  0   179m   18m  6576 R   98  0.1 134:45.93 md_mpi
    11766 sdebeer  20  0   179m   17m  5692 R   98  0.1 134:49.17 md_mpi
    11598 sdebeer  20  0  13380  1476  1236 S    0  0.0   0:00.00 bash
    11749 sdebeer  20  0  13404   960   688 S    0  0.0   0:00.00 bash
    11755 sdebeer  20  0  52772  2296  1624 S    0  0.0   0:00.59 mpiexec

Efficient?

    Tasks: 244 total, 2 running, 242 sleeping, 0 stopped, 0 zombie
    Cpu(s): 42.3%us, 4.2%sy, 0.0%ni, 53.2%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 24733148k total, 1244256k used, 23488892k free, 32k buffers
    Swap: 3999736k total, 21808k used, 3977928k free, 868152k cached

      PID USER     PR NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
     9097 kazaryan 20  0   430m  116m  3428 R 1194  0.5 366:23.28 dscf_smp
     9058 kazaryan 20  0   9152  1388  1112 S    0  0.0   0:00.00 dscf
    19258 kazaryan 20  0  13448  1556  1248 S    0  0.0   0:00.02 bash
    19669 kazaryan 20  0  13456  1076   752 S    0  0.0   0:00.00 bash
    27769 kazaryan 20  0   9580  1840  1136 S    0  0.0   0:00.48 jobex

Efficient: a number of processes running in parallel

    Tasks: 204 total, 9 running, 195 sleeping, 0 stopped, 0 zombie
    Cpu(s): 60.8%us, 14.6%sy, 9.8%ni, 14.7%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 24735448k total, 856716k used, 23878732k free, 32k buffers
    Swap: 3999736k total, 21516k used, 3978220k free, 520852k cached

      PID USER     PR NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
    11599 msnoek   20  0  33216  6008  1716 R  102  0.0   1254:59 FKMC3d_smart_en
    11600 msnoek   20  0  33216  6004  1716 R  100  0.0   1254:44 FKMC3d_smart_en
    11602 msnoek   20  0  33216  6008  1716 R  100  0.0   1254:38 FKMC3d_smart_en
    11603 msnoek   20  0  33216  6004  1712 R  100  0.0   1254:38 FKMC3d_smart_en
    11604 msnoek   20  0  33216  6004  1712 R  100  0.0   1254:23 FKMC3d_smart_en
    11605 msnoek   20  0  33216  6000  1712 R  100  0.0   1254:56 FKMC3d_smart_en
    11601 msnoek   20  0  33216  6008  1716 R   98  0.0   1254:20 FKMC3d_smart_en
    11606 msnoek   20  0  33216  6004  1712 R   98  0.0   1254:37 FKMC3d_smart_en
    11412 msnoek   20  0  13396  1500  1240 S    0  0.0   0:00.00 bash
    11598 msnoek   20  0  13404   924   648 S    0  0.0   0:00.00 bash
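One way to read these snapshots is to add up the %CPU column, where 100% corresponds to one fully busy core. The sketch below, a hypothetical helper named busy_cores, does this for text copied from top, so the result can be compared with the 8 or 12 cores of a node. It assumes the default top column layout shown here, with %CPU as the ninth column.

```python
def busy_cores(top_text):
    """Hypothetical helper: estimate how many cores are kept busy from a
    'top' snapshot, as the sum of the %CPU column divided by 100."""
    total = 0.0
    in_table = False
    for line in top_text.splitlines():
        fields = line.split()
        if fields[:2] == ["PID", "USER"]:
            in_table = True              # process rows follow this header
            continue
        if in_table and len(fields) >= 12:
            total += float(fields[8])    # %CPU is the ninth column
    return total / 100
```

Applied to the snapshots above, this gives about 2 busy cores for the inefficient job, about 8 for the MPI job and for the parallel job just shown, and about 12 for the dscf_smp job.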

How to create such a nice job?

Principle of a multi-process job

    #PBS -lnodes=1:cores8 -lwalltime=1:00:00

    cd $HOME/workdir
    some_program 1 >out1 2>err1 &
    some_program 2 >out2 2>err2 &
    some_program 3 >out3 2>err3 &
    some_program 4 >out4 2>err4 &
    some_program 5 >out5 2>err5 &
    some_program 6 >out6 2>err6 &
    some_program 7 >out7 2>err7 &
    some_program 8 >out8 2>err8 &
    wait

The 8 processes run in the background; wait waits until all background processes have ended.
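The same "&" plus wait pattern can be written with Python's subprocess module, for readers who prefer to drive their tasks from Python. A minimal sketch: run_in_parallel is a hypothetical helper, and the commands in the usage comment are placeholders for some_program.

```python
import os
import subprocess


def run_in_parallel(commands, workdir):
    """Hypothetical helper: start all commands at once (like '&'), then
    wait for all of them (like 'wait'). Output of task i goes to out<i>
    and err<i> in workdir. Returns the exit codes in command order."""
    running = []
    for i, cmd in enumerate(commands, start=1):
        out = open(os.path.join(workdir, f"out{i}"), "w")
        err = open(os.path.join(workdir, f"err{i}"), "w")
        running.append((subprocess.Popen(cmd, stdout=out, stderr=err), out, err))
    codes = []
    for proc, out, err in running:
        codes.append(proc.wait())        # blocks until this task has finished
        out.close()
        err.close()
    return codes


# Hypothetical usage, 8 tasks for an 8-core node:
# run_in_parallel([["some_program", str(k)] for k in range(1, 9)], ".")
```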

Multi-line commands in background

    cd $TMPDIR
    (
      cp $HOME/input1 .
      my_program input1 > output1
      cp output1 $HOME
    ) &
    (
      cp $HOME/input2 .
      my_program input2 > output2
      cp output2 $HOME
    ) &
    # ....
    wait

Hmm... lots of lines to write.

Automatic generation of jobs

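You can use any language (bash, C, Python, Perl, ...) to generate job scripts and submit them. For example, in Python:

    #!/usr/bin/env python3
    # Generate 100 job scripts, each running 8 instances of myprog
    # (one per core), and submit them.
    import os

    for i in range(100):
        with open("tmpjob", "w") as f:
            print("#PBS -lnodes=1 -lwalltime=1:00:00", file=f)
            print("#PBS -N job" + str(i), file=f)      # job name
            print("cd $HOME/workdir", file=f)
            for j in range(8):
                sk = str(8 * i + j)                     # unique parameter number
                print("./myprog parm" + sk + " >out." + sk +
                      " 2>err." + sk + " &", file=f)
            print("wait", file=f)                       # wait for all 8 tasks
        # os.system("qsub tmpjob")   # uncomment to really submit
        os.system("cat tmpjob")      # dry run: only show the job script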

Array jobs

Submit the same job many times:

    qsub -t 4-23 job

The same job will be submitted 20 times; the job numbers will look like 6788900-4 .. 6788900-23.

In the job, the environment variable PBS_ARRAYID is available, here ranging from 4 to 23.

Deleting all jobs:

    qdel 6788900
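Inside the job, PBS_ARRAYID is typically used to pick one input out of many. A minimal Python sketch, assuming the job was submitted with qsub -t 4-23 and that the hypothetical list `parameters` holds one entry per array task; parameter_for_this_task is likewise a hypothetical name.

```python
import os


def parameter_for_this_task(parameters, first_id=4):
    """Hypothetical helper: return the entry of `parameters` for this
    array task. With 'qsub -t 4-23', PBS_ARRAYID runs from 4 to 23, so
    subtracting first_id gives a 0-based index (0 to 19)."""
    array_id = int(os.environ["PBS_ARRAYID"])
    return parameters[array_id - first_id]
```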

Using the scratch disk

The home file system is terribly slow. If a job accesses the home file system frequently, everybody on the system suffers!

Remedy:

    cp infile $TMPDIR          # copy input files to scratch
    cd $TMPDIR
    myprog infile outfile      # let your program read from scratch
    cp outfile $HOME/datadir   # copy output file to home

Note: the scratch disk is cleaned after the job.
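The same stage-in, compute, stage-out sequence as a Python function, for jobs driven from Python. It is a sketch under the assumptions stated in its docstring; run_on_scratch and its arguments are hypothetical, and scratch defaults to $TMPDIR as on Lisa.

```python
import os
import shutil
import subprocess


def run_on_scratch(infile, datadir, command, outname="outfile", scratch=None):
    """Hypothetical helper: copy input to scratch, run there, copy the
    result back home.

    infile:  input file on the (slow) home file system
    datadir: directory in $HOME that receives the output file
    command: program plus arguments, run with scratch as working directory
    outname: name of the output file the program writes in scratch
    scratch: scratch directory; defaults to $TMPDIR
    """
    scratch = scratch or os.environ["TMPDIR"]
    shutil.copy(infile, scratch)                          # cp infile $TMPDIR
    subprocess.run(command, cwd=scratch, check=True)      # cd $TMPDIR; myprog ...
    shutil.copy(os.path.join(scratch, outname), datadir)  # cp outfile $HOME/datadir
    return os.path.join(datadir, outname)
```

Copy the result back before the job ends, since the scratch disk is cleaned afterwards.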

DISPARM

What? A string server.

Why? To facilitate managing a large number of jobs.

How? https://www.sara.nl/systems/lisa/software/disparm

- create a file with strings to be used as parameters (my_parmfile)
- module load disparm
- disparm -i my_parmfile -p my_pool

Now the file my_pool is filled with the lines from my_parmfile.

disparm -n gets one line from the pool and marks this line.

disparm -r marks the previously extracted line as ready.

Disparm example

Create a parameter file. For example:

    seq 1000 > myparms

Create a pool file:

    module load disparm
    disparm -c -i myparms -p mypool

Try it:

    disparm -n -p mypool
    echo $DISPARM_VALUE
    disparm -r -p mypool
    disparm -n -p mypool
    echo $DISPARM_VALUE

Disparm example 2

    disparm -s -p mypool
    disparm -n -p mypool
    disparm -r -p mypool
    disparm -s -p mypool

Disparm ensures that every line is produced once, and that a line marked as 'ready' will not be produced again.

Advanced: using the -m flag, you can specify that disparm may produce the same line more than once. This is useful in case a process runs into the time limit or another mishap.
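To make these rules concrete, here is a tiny in-memory model of a parameter pool. It is not disparm and does not use its file format or command-line interface; ParameterPool and its methods are hypothetical, and the -m behaviour is modelled loosely as "lines handed out but never marked ready may be handed out again".

```python
class ParameterPool:
    """Hypothetical in-memory model of the pool rules (not disparm)."""

    def __init__(self, lines, allow_repeat=False):
        self.todo = list(lines)          # never handed out yet
        self.busy = []                   # handed out, not yet marked ready
        self.ready = set()
        self.allow_repeat = allow_repeat  # loose analogue of the -m flag

    def get_line(self):
        """Like 'disparm -n': return a line, or None if nothing is left."""
        if self.todo:
            line = self.todo.pop(0)
            self.busy.append(line)
            return line
        if self.allow_repeat and self.busy:
            line = self.busy.pop(0)      # retry a line whose worker may have died
            self.busy.append(line)
            return line
        return None

    def mark_ready(self, line):
        """Like 'disparm -r': this line is never handed out again."""
        if line in self.busy:
            self.busy.remove(line)
        self.ready.add(line)
```

Without allow_repeat, a line that was handed out but never marked ready is simply lost; with it, another worker can pick the line up, at the price of possibly processing it twice.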

Disparm example 3

A job using disparm:

    #PBS -lnodes=1 -lwalltime=1:00:00
    module load disparm
    ncores=`sara-get-num-cores`                # use all cores available
    for ((i=1; i<=ncores; i++)) ; do
      (
        # run myprogram 10 times after each other in each thread
        for ((j=1; j<=10; j++)) ; do
          disparm -n -p mypool                 # get new line
          if [ "$DISPARM_RC" != "OK" ]; then   # check if it looks ok
            break
          fi
          myprogram $DISPARM_VALUE             # call my program
          disparm -r -p mypool                 # mark line as ready
        done
      ) &
    done
    wait

Submit 20 jobs:

    qsub -t 1-20 disparmjob

See https://www.sara.nl/systems/lisa/software/disparm

Disparm summary

- use all cores in all kinds of nodes
- only one, relatively simple, job script required
- no job-generating script necessary
- automatic load balancing possible

Summary

- The Lisa system
- jobs: what
- software
- module environment
- efficient jobs
  - use as many cores as possible
  - create many efficient jobs
  - disparm

Thank you for your attention

[email protected] Willem Vermin

https://www.sara.nl/systems/lisa