
Page 1

High Performance Computing with the Cluster „Elwetritsch“ - II

Course instructor : Dr. Josef Schüle, RHRK

RHRK-Seminar

Page 2

Overview

Course I

• Login to the cluster: SSH, RDP / NX
• Desktop environments: GNOME (default), other desktops
• Linux basics: terminal / shell, file systems
• Tips & tricks

Course II

• Cluster basics: hardware, software, brainware
• Batch system: submitting jobs, monitoring / accounting
• AHRP: projects, working on MOGON / Mainz

Page 3

Overview

Course III

• Batch system SLURM: array jobs
• Multi-core jobs: SMP, MPI, hybrid
• Job resources: accelerators, file systems

Further questions or ideas for the course? Send an e-mail to [email protected]

Page 4

Cluster Basics

Hardware

• Nodes

• Network

• File systems

Software

• System Software

• Modules

• Batch system

Brainware

Page 5

Hardware

Nodes:
• login nodes (headx): interactively available
• compute nodes: not interactively available
• special nodes: different GPU versions and numbers, Xeon Phi 5110P, 1 TB RAM

Networks:
• InfiniBand network: QDR
• Ethernet network: 1/10 Gbit

File systems (see the usage sketch below):
• /scratch: Fraunhofer BeeGFS (322 TB)
• /home: NetApp NFS for HOME (10 TB)
• /work: small files, on request
• /tmp: short-time temporary storage

https://elwe.rhrk.uni-kl.de/elwetritsch/ressourcen.shtml
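A minimal sketch of the intended usage: stage data to the fast BeeGFS /scratch, compute there, and copy results back to /home. The per-user directory layout under /scratch is an assumption, not taken from these slides:

  WORKDIR=/scratch/$USER/myjob          # hypothetical per-user scratch directory
  mkdir -p "$WORKDIR"
  cp ~/input.dat "$WORKDIR"             # stage input from the small /home
  cd "$WORKDIR"
  ./my_program input.dat > output.dat   # compute on the fast scratch file system
  cp output.dat ~/                      # copy results back to /home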

Page 6

Software (1)

System software: Scientific Linux 7.x, periodic updates

Module software: installed from source -> /software; available in several versions -> software modules; command: module (see the example below). Is something missing? Send an e-mail to [email protected]

Batch system: SLURM; jSLURM – graphical user interface
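Typical use of the module command; the package name and version below are examples, not a listing of what is actually installed:

  module avail            # list all available software modules
  module load gcc/7.3     # load one version of a package (example name)
  module list             # show the currently loaded modules
  module unload gcc/7.3   # remove a module from the environment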

Page 7

Software (2)

Compilers & tools:
• Intel Cluster Suite
• Portland PGI
• GNU Compiler Collection
• CUDA Toolkit & GPU Programming SDK
• TotalView parallel debugger
• various developer environments

Libraries:
• Intel Math Kernel Library (MKL)
• AMD Core Math Library (ACML)

MPI:
• Intel MPI
• Open MPI
• IBM Platform MPI

Page 8

Brainware

Competence in using the cluster at its best is important for HPC. Don't hesitate to contact us and ask questions (hotline). Direct personal questions may be posed to Dr. Josef Schüle, Dr. Markus Hillenbrand or Sven Daxinger

... if you need help with:
• installing programs
• selecting the best programs
• compiling your programs
• using the batch system (selecting parameters and resources)
• developing your own software (OpenMP, MPI, CUDA)
• writing scripts to support your work

Page 9

Batch System

Job Management

• submission

• query

• ending

Monitoring

Accounting

jSLURM

Page 10

Wording (1)

Cluster: a number of servers seen as a unit. By bundling compute power, resources and load are balanced. Commands: sinfo, sacct, sprio, sshare

Job: a work unit executed by the batch system; typically a command or a command sequence with options and arguments. The batch system schedules it for execution, controls it and logs it. Commands: squeue, sbatch
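As a minimal illustration of a job: sbatch can wrap a single command (--wrap is a standard SLURM option), and squeue then shows the job pending (PD) or running (R):

  sbatch --wrap="hostname"   # submit a one-command job
  squeue -u $USER            # query the state of your own jobs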

Page 11

Wording (2)

Jobslot: the smallest execution unit. On Elwetritsch a job slot corresponds to a CPU core; the batch system organizes the usage of the servers. Commands: sinfo, squeue, rhrk-showqos

States for jobs: PEND, RUN, DONE, EXIT, WAIT, PSUSP, USUSP, SSUSP, POST_DONE, POST_ERR. Command: sacct

Queue: after submission, jobs are sorted into waiting queues according to their resource requirements; they remain waiting until all required resources are available. Commands: squeue, sbatch

Scheduling: the algorithm to select and start the next jobs. Fairshare scheduling: each user gets a fair amount of the cluster resources, according to project priority.

Page 12

SLURM - overview

Command    Purpose
salloc     request an interactive job allocation (rhrk-launch is preferred)
sbatch     submit a batch script
scancel    cancel a job or job step
scontrol   manage jobs and query information
sinfo      retrieve information about partitions, reservations and nodes
sprio      show job priorities
srun       initiate job steps from within a job
sshare     display fair-share information for each user
sstat      status information about a running job
sacct      accounting information (rhrk-accounting is preferred)
sacctmgr   query accounting information
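A minimal sketch of a batch script for sbatch; the resource values are placeholders and must be adapted to the actual job:

  #!/bin/bash
  #SBATCH --job-name=example    # name shown by squeue
  #SBATCH --ntasks=1            # number of job slots (CPU cores)
  #SBATCH --time=00:10:00       # wall-clock limit hh:mm:ss
  #SBATCH --mem=1G              # memory request
  srun hostname                 # initiate the job step

Submit, monitor and, if necessary, cancel it:

  sbatch job.sh
  squeue -u $USER
  scancel <jobid>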

Page 15

Accounting

Cockpit / command line:

sacct [-l]

rhrk-accounting -s
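sacct also accepts time-range and format options; a sketch (the field list is only an example):

  sacct -l                          # long listing
  sacct --starttime 2019-07-01 \
        --format=JobID,JobName,Elapsed,State,MaxRSS   # selected fields for jobs since a date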

Page 16

jSLURM - GUI for SLURM

Developed at RHRK

report errors

further improvements?

more applications?

Page 17

AHRP

General

• history

• aims

• tasks

Projects

Interaction with MOGON

Page 18

Overview

Founded in April 2010 by TU Kaiserslautern and Johannes Gutenberg Universität Mainz.

Aims:
• coordinating HPC activities
• provisioning of state-of-the-art HPC resources for research in RLP

Tasks:
• install, maintain and purchase resources in common
• coordinated teaching and support for HPC
• provisioning of 15% of the resources for research to all in RLP, load balancing among the sites
• application-based requests for resources

Page 19

Projects

Three steps: fill in the form, wait for the review, follow the instructions you are sent.

Project sizes:
• XS: < 5 NE / month
• S: < 30 NE / month
• M: < 100 NE / month
• L: < 500 NE / month
• XL: > 500 NE / month

NE = CPU cores * h / 1000
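A worked example of the NE formula (the job numbers are hypothetical): a project running jobs on 32 cores for a total of 500 hours in one month consumes 32 * 500 / 1000 = 16 NE, i.e. a size S project (< 30 NE / month).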

Link: http://www.ahrp.info

Page 20

Interaction with Mogon

Mogon I in Mainz consists of AMD systems with 64 cores: more cores in total, but each core is less powerful, and source code has to be recompiled.

The interaction will again be realized with a common batch system (previously LSF).

Usage: the premise is an AHRP project; send an e-mail to [email protected] with a short reasoning.

Page 21


Thank You

High Performance Computing on Elwetritsch

Part II