Introduction to Scientific Computing
Shubin Liu, Ph.D.
Renaissance Computing Institute (RENCI)
University of North Carolina at Chapel Hill


Page 1: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

Introduction to Scientific Computing

Shubin Liu, Ph.D.
Renaissance Computing Institute (RENCI)
University of North Carolina at Chapel Hill

Page 2: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 2

Agenda
• Introduction to High-Performance Computing
• Hardware Available
  – Servers, storage, file systems, etc.
• How to Access
• Programming Tools Available
  – Compilers & debugger tools
  – Utility libraries
  – Parallel computing
• Scientific Packages Available
• Job Management
• Hands-on Exercises

Page 3: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 3

Course Goals

• An introduction to high-performance computing and UNC Research Computing
• Available Research Computing hardware facilities
• Available software packages and serial/parallel programming tools and utilities/libraries
• How to efficiently make use of Research Computing facilities on campus

Page 4: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 4

Pre-requisites
• An account on the Emerald cluster
• UNIX Basics

Getting started: http://help.unc.edu/?id=5288

Intermediate: http://help.unc.edu/?id=5333

vi Editor: http://help.unc.edu/?id=152

Customizing: http://help.unc.edu/?id=208

Shells: http://help.unc.edu/?id=5290

ne Editor: http://help.unc.edu/?id=187

Security: http://help.unc.edu/?id=217

Data Management: http://help.unc.edu/?id=189

Scripting: http://help.unc.edu/?id=213

HPC Application: http://help.unc.edu/?id=4176

Page 5: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 5

About Us
• ITS
  – http://its.unc.edu
  – Physical locations: 401 West Franklin Street; 211 Manning Drive
  – 12 Divisions: IT Infrastructure and Operations; Research Computing; Teaching and Learning; Technology Planning and Special Projects; Telecommunications; User Support and Engagement; Office of the Vice Chancellor; Communications; Enterprise Applications; Enterprise Data Management; Financial Planning and Human Resources; Information Security
• RENCI
  – http://www.renci.org/
  – Anchor site: 100 Europa Drive, Suite 540, Chapel Hill
  – A number of virtual sites on the campuses of Duke, NCSU, and UNC-Chapel Hill, and regional facilities across the state
  – Mission: to foster multidisciplinary collaborations; to enable advancements in science, industry, education, the humanities and the arts; to provide technical leadership and expertise; to work hand-in-hand with businesses and communities to utilize advanced technologies

Page 6: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 6

About Us
• Where are we, who are we, and what do we do?
  – ITS Manning: 211 Manning Drive
  – Website: http://www.renci.org/unc/computing/
  – Groups: Infrastructure, Engagement, User Support

Page 7: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 7

About Myself
• Ph.D. in Chemistry, UNC-CH

• Currently Senior Computational Scientist @ RENCI Engagement Center at UNC Chapel Hill

• Responsibilities:

– Support Comp Chem/Phys/Material Science software

– Support Programming (FORTRAN/C/C++) tools, code porting, parallel computing, etc.

– Conduct research and engagement projects on Computational Chemistry

• DFT theory and concept

• Systems in biological and material science

Page 8: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 8

About You

• Name, department, interest?
• Any prior experience with high-performance computing?
• What do you expect to use the Research Computing facilities for?
• What do you expect from this training course?

Page 9: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 9

What is Scientific Computing?
• Simply put:
  – To use high-performance computing (HPC) facilities to solve real scientific problems.
• From Wikipedia:
  – Scientific computing (or computational science) is the field of study concerned with constructing mathematical models and numerical solution techniques and using computers to analyze and solve scientific and engineering problems. In practical use, it is typically the application of computer simulation and other forms of computation to problems in various scientific disciplines.

Page 10: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 10

What is Scientific Computing?

[Diagram: three views of scientific computing. From the scientific-discipline viewpoint, it sits at the intersection of the engineering sciences, natural sciences, computer science, and applied mathematics. From the operational viewpoint, an application layer rests on theory/model, algorithm, and hardware/software layers. From the computing perspective, scientific computing draws on high-performance computing and parallel computing.]

Page 11: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 11

What is HPC?
• Computing resources which provide more than an order of magnitude more computing power than current top-end workstations or desktops – generic, widely accepted.
• HPC ingredients:
  – large capability computers (fast CPUs)
  – massive memory
  – enormous (fast & large) data storage
  – highest-capacity communication networks (Myrinet, 10 GigE, InfiniBand, etc.)
  – specifically parallelized codes (MPI, OpenMP)
  – visualization

Page 12: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 12

Why HPC?

• What are the three-dimensional structures of all of the proteins encoded by an organism's genome, and how does structure influence function, both spatially and temporally?
• What patterns of emergent behavior occur in models of very large societies?
• How do massive stars explode and produce the heaviest elements in the periodic table?
• What sort of abrupt transitions can occur in Earth's climate and ecosystem structure?
• How do these occur and under what circumstances? If we could design catalysts atom-by-atom, could we transform industrial synthesis?
• What strategies might be developed to optimize management of complex infrastructure systems?
• What kind of language processing can occur in large assemblages of neurons?
• Can we enable integrated planning and response to natural and man-made disasters that prevent or minimize the loss of life and property?

– From the NSF program solicitation "HPC System Acquisition: Towards a Petascale Computing Environment for Science and Engineering", 2006
  http://www.nsf.gov/pubs/2005/nsf05625/nsf05625.htm

Page 13: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 13

Single-CPU LINPACK vs. peak performance (units: MFLOPS)

Machine                               LINPACK    Peak
Intel Pentium 4 (2.53 GHz)               2355    5060
NEC SX-6/1 (1 proc., 2.0 ns)             7575    8000
HP rx5670 Itanium2 (1 GHz)               3528    4000
IBM eServer pSeries 690 (1300 MHz)       2894    5200
Cray SV1ex-1-32 (500 MHz)                1554    2000
Compaq ES45 (1000 MHz)                   1542    2000
AMD Athlon MP1800+ (1530 MHz)            1705    3060
Intel Pentium III (933 MHz)               507     933
SGI Origin 2000 (300 MHz)                 533     600
Intel Pentium II Xeon (450 MHz)           295     450
Sun UltraSPARC (167 MHz)                  237     333

MFLOPS: measure of performance
Reference: http://performance.netlib.org/performance/html/linpack.data.col0.html

Page 14: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 14

TOP500

• A list of the 500 most powerful computer systems over the world
• Established in June 1993
• Compiled twice a year (June & November)
• Uses the LINPACK benchmark code (solving the linear algebra equation Ax = b)
• Organized by world-wide HPC experts, computational scientists, manufacturers, and the Internet community
• Homepage: http://www.top500.org

Page 15: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 15

TOP500: June 2007 (Top 5 plus UNC's Topsail; Rmax / Rpeak in GFLOPS)

1.  DOE/NNSA/LLNL, United States, 2005: BlueGene/L, eServer Blue Gene Solution, 131,072 procs, IBM. Rmax 280,600 / Rpeak 367,000
2.  Oak Ridge National Laboratory, United States, 2006: Jaguar, Cray XT4/XT3, 23,016 procs, Cray Inc. Rmax 101,700 / Rpeak 119,350
3.  NNSA/Sandia National Laboratories, United States, 2006: Red Storm, Sandia/Cray Red Storm, Opteron 2.4 GHz dual core, 26,544 procs, Cray Inc. Rmax 101,400 / Rpeak 127,411
4.  IBM Thomas J. Watson Research Center, United States, 2005: BGW, eServer Blue Gene Solution, 40,960 procs, IBM. Rmax 91,290 / Rpeak 114,688
5.  Stony Brook/BNL, New York Center for Computational Sciences, United States, 2007: New York Blue, eServer Blue Gene Solution, 36,864 procs, IBM. Rmax 82,161 / Rpeak 103,219
25. University of North Carolina, United States, 2007: Topsail, PowerEdge 1955, 2.33 GHz, Cisco/Topspin InfiniBand, 4,160 procs, Dell. Rmax 28,770 / Rpeak 38,821.1

Units in GFLOPS (1 GFLOPS = 1,000 MFLOPS)

Page 16: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 16

Shared memory – single address space. All processors have access to a pool of shared memory. (Examples: chastity/zephyr, happy/yatta, cedar/cypress, sunny.) Methods of memory access: bus and crossbar.

Distributed memory – each processor has its own local memory. Message passing must be used to exchange data between processors. (Examples: Baobab, the new Dell cluster.)

[Diagram: in the shared-memory architecture, all CPUs sit on a bus connected to one memory; in the distributed-memory architecture, each CPU has its own local memory (M) and the CPUs are connected by a network.]

Shared/Distributed-Memory Architecture

Page 17: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 17

What is a Beowulf Cluster?

• A Beowulf system is a collection of personal computers constructed from commodity-off-the-shelf hardware components interconnected with a system-area-network and configured to operate as a single unit, parallel computing platform (e.g., MPI), using an open-source network operating system such as LINUX.

• Main components:
  – PCs running the LINUX OS
  – Inter-node connection with Ethernet, Gigabit, Myrinet, InfiniBand, etc.
  – MPI (Message Passing Interface)

Page 18: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 18

LINUX Beowulf Clusters

Page 19: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 19

What is Parallel Computing ?

• Concurrent use of multiple processors to process data
  – Running the same program on many processors
  – Running many programs on each processor

Page 20: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 20

Advantages of Parallelization

• Cheaper, in terms of price/performance ratio
• Faster than equivalently expensive uniprocessor machines
• Handles bigger problems
• More scalable: the performance of a particular program may be improved by execution on a large machine
• More reliable: in theory, if processors fail we can simply use others

Page 21: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 21

Catch: Amdahl's Law

Speedup = 1 / (s + p/n)

where s is the serial (non-parallelizable) fraction of the work, p = 1 − s is the parallelizable fraction, and n is the number of processors.
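A quick illustration (the numbers are chosen here for illustration and are not from the slide): if a code is 90% parallelizable (s = 0.1, p = 0.9), then on n = 16 processors the speedup is at most 1/(0.1 + 0.9/16) ≈ 6.4, and even with unlimited processors it can never exceed 1/s = 10.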

Page 22: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 22

Parallel Programming Tools

• Shared-memory architecture
  – OpenMP
• Distributed-memory architecture
  – MPI, PVM, etc.

Page 23: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 23

OpenMP
• An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism
• What does OpenMP stand for?
  – Open specifications for Multi Processing, via collaborative work between interested parties from the hardware and software industry, government, and academia
• Comprised of three primary API components:
  – Compiler directives
  – Runtime library routines
  – Environment variables
• Portable:
  – The API is specified for C/C++ and Fortran
  – Implemented on multiple platforms, including most Unix platforms and Windows NT
• Standardized:
  – Jointly defined and endorsed by a group of major computer hardware and software vendors
  – Expected to become an ANSI standard later (?)

Page 24: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 24

OpenMP Example (FORTRAN)

      PROGRAM HELLO
      INTEGER NTHREADS, TID, OMP_GET_NUM_THREADS,
     +        OMP_GET_THREAD_NUM

C     Fork a team of threads, giving each its own copy of TID
!$OMP PARALLEL PRIVATE(TID)

C     Obtain and print the thread id
      TID = OMP_GET_THREAD_NUM()
      PRINT *, 'Hello World from thread = ', TID

C     Only the master thread does this
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
      END IF

C     All threads join the master thread and disband
!$OMP END PARALLEL

      END
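One way to build and run this example (a sketch, assuming the Intel compiler on cedar and the C shell; the file name hello_omp.f and the thread count are made up for illustration, and the flags are the ones listed on the Parallel Computing slide later in the deck):

  ifort -openmp -o hello_omp hello_omp.f    (use xlf with -qsmp=omp on happy)
  setenv OMP_NUM_THREADS 4
  ./hello_omp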

Page 25: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 25

The Message Passing Model

• Parallelization scheme for distributed memory.
• Parallel programs consist of cooperating processes, each with its own memory.
• Processes send data to one another as messages.
• Messages can be passed around among compute processes.
• Messages may have tags that may be used to sort messages.
• Messages may be received in any order.

Page 26: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 26

MPI: Message Passing Interface
• Message-passing model
• Standard (specification)
  – Many implementations (almost every vendor has one)
  – MPICH and LAM/MPI from the public domain are the most widely used
  – GLOBUS MPI for grid computing
• Two phases:
  – MPI-1: traditional message passing
  – MPI-2: remote memory, parallel I/O, and dynamic processes
• Online resources
  – http://www-unix.mcs.anl.gov/mpi/index.htm
  – http://www-unix.mcs.anl.gov/mpi/mpich/
  – http://www.lam-mpi.org/
  – http://www.mpi-forum.org
  – http://www-unix.mcs.anl.gov/mpi/tutorial/learning.html

Page 27: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 27

A Simple MPI Code

C version:

#include "mpi.h"
#include <stdio.h>

int main( int argc, char **argv )
{
    MPI_Init( &argc, &argv );
    printf( "Hello world\n" );
    MPI_Finalize();
    return 0;
}

FORTRAN version:

      include 'mpif.h'
      integer myid, ierr, numprocs

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

      write(*,*) 'Hello from ', myid
      write(*,*) 'Numprocs is ', numprocs
      call MPI_FINALIZE(ierr)

      end
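A sketch of how these might be built and launched on the Emerald cluster, pulling together commands that appear on later slides (the file name hello.c, the queue name, and the 4-CPU count are only examples):

  ipm add pgi mpich
  mpicc -O -o hello hello.c
  bsub -q par_24h_4c -n 4 -a mpich mpirun.lsf ./hello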

Page 28: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 28

Other Parallelization Models

• VIA: Virtual Interface Architecture – standards-based cluster communications.
• PVM: a portable message-passing programming system, designed to link separate host machines to form a "virtual machine" that is a single, manageable computing resource. It is largely an academic effort, and there has not been much development since the 1990s.
• BSP: Bulk Synchronous Parallel model, a generalization of the widely researched PRAM (Parallel Random Access Machine) model.
• Linda: a concurrent programming model from Yale, with the primary concept of "tuple space".
• HPF: High Performance Fortran (supported by PGI's pghpf), a standard parallel programming language for shared- and distributed-memory systems.

Page 29: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 29

Research Computing Servers @ UNC-CH

• IBM P690 – SMP, 32 CPUs, happy/yatta
• SGI Altix 3700 – SMP, 128 CPUs, cedar/cypress
• AMD & Xeon LINUX cluster – distributed memory, 352 CPUs, Emerald (the old Baobab)
• Dell LINUX cluster – distributed memory, 4160 CPUs, Topsail

Page 30: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 30

IBM P690 (Regatta)
- IBM pSeries 690 Model 6C4, Power4+ Turbo, 32 × 1.7 GHz processors
- 32 CPUs, shared memory
- 128 GB memory
- 217 GFLOPS
- 8 × 146.8 GB local disk drives
- Access to 4 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /netscr
- OS: IBM AIX 5.3, Maintenance Level 04
- Login node: happy.isis.unc.edu
- Compute node: yatta.isis.unc.edu
- Will be retired any time now

Page 31: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 31

SGI Altix 3700

• Servers for Scientific Applications such as Gaussian, Amber, and custom code

• Login node: cedar.isis.unc.edu
• Compute node: cypress.isis.unc.edu
• Cypress: SGI Altix 3700bx2 – 128 Intel Itanium2 processors (1600 MHz), each with 16 KB L1 data cache, 16 KB L1 instruction cache, 256 KB L2 cache, 6 MB L3 cache, and 4 GB of shared memory (512 GB total memory)
• Two 70 GB SCSI system disks as /scr

Page 32: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 32

SGI Altix 3700
• Cedar: SGI Altix 350 – 8 Intel Itanium2 processors (1500 MHz), each with 16 KB L1 data cache, 16 KB L1 instruction cache, 256 KB L2 cache, 4 MB L3 cache, and 1 GB of shared memory (8 GB total memory); two 70 GB SATA system disks
• RHEL 3 with ProPack 3, Service Pack 3
• No AFS (HOME & pkg space) access
• Scratch disks: /netscr, /nas, /scr

Page 33: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 33

Emerald Cluster

• Dual AMD Athlon 48-CPU, 1.4 GHz, 1GB/CPU memory, Myrinet connection

• Dual AMD Athlon 20-CPU, 1.6 GHz, 1GB/CPU memory, Myrinet connection

• IBM BladeCenter, 50-CPU, 2.4 GHz, 1.25GB/CPU memory, Gigabit ethernet connection

• IBM BladeCenter, 156-CPU, 2.8 GHz, 1GB/CPU memory, Gigabit ethernet connection

• 2 login nodes: IBM BladeCenter, one Xeon 2.4 GHz with 2.5 GB RAM and one Xeon 2.8 GHz with 2.5 GB RAM
• Login: emerald.isis.unc.edu
• Access to 7 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
• OS: Red Hat Enterprise Linux 3.0
• TOP500: 395th place in the June 2003 release

Page 34: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 34

Page 35: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 35

Dec. 2006 Baobab/Emerald Usage

[Charts: December 2006 Baobab/Emerald usage, broken down by user and by queue (queues such as bat_96h, bat_24h, patrons, bat_14d, par_48h_8c, par_96h_4c, bat_30d, bat_1h, par_12h_32c, par_24h_8c, par_24h_4c).]

Page 36: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 36

New Dell LINUX Cluster, Topsail
• 520 dual nodes (4160 CPUs), Xeon (EM64T)
• 3.6 GHz, 2 MB L2 cache, 2 GB memory per CPU
• InfiniBand inter-node connection
• Not AFS mounted; not open to the general public
• Access based on peer-reviewed proposals
• HPL: 6.252 teraflops; 74th in the June 2006 TOP500 list, 104th in the November 2006 list, and 25th in the June 2007 list (after upgrade)

Page 37: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 37

Original Topsail

• Compute nodes:
  – 1024 CPUs
  – 3.6 GHz Intel EM64T
  – 2 MB L2 cache
  – 4 GB memory
  – 90 nm technology
  – 800 MHz FSB

Page 38: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 38

NEW Topsail
• Intel 5300 processor series, 65 nm technology
• 28.77 teraflops after upgrade

Page 39: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 39

New TopsailChip/Block Diagram

Page 40: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 40

Benchmark: Latency

Page 41: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 41

Benchmark: Bandwidth

Page 42: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 42

Benchmarks: GAMESS

Page 43: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 43

File Systems
• AFS (Andrew File System): AFS is a distributed network file system that enables files from any AFS machine across the campus to be accessed as easily as files stored locally.
  – Serves as the ISIS HOME for all users with an ONYEN
  – Limited quota: 250 MB for most users [type "fs lq" to view]
  – Current production version: openafs-1.3.8.6
  – Files backed up daily [ ~/OldFiles ]
  – Directory/file tree: /afs/isis/home/o/n/onyen
    • For example: /afs/isis/home/m/a/mason, where "mason" is the ONYEN of the user
  – Accessible from chastity/zephyr, happy/yatta, baobab
  – But not from cedar/cypress or the new Dell cluster
  – Recommended: compile and run I/O-intensive jobs on /scr or /netscr
  – More info: http://help.unc.edu/?id=215#d0e24

Page 44: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 44

Basic AFS Commands

• To add or remove packages
  – ipm add pkg_name, ipm remove pkg_name
• To find out space quota/usage
  – fs lq
• To see and renew AFS tokens (read/write-able)
  – tokens, klog
• Over 200 packages installed in AFS pkg space
  – /afs/isis/pkg/
• More info available at
  – http://its.unc.edu/dci/dci_components/afs/

A short example session follows below.
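For instance, a typical session using only the commands above might look like this (the package names are just examples):

  fs lq                  (check home-directory quota and usage)
  tokens                 (list current AFS tokens)
  klog                   (obtain a fresh read/write token)
  ipm add pgi mpich      (subscribe to the Portland Group compilers and MPICH)
  ipm remove mpich       (unsubscribe from MPICH when no longer needed)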

Page 45: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 45

Data Storage
• Local scratch: /scr
  – Cedar/cypress: 4 × 70 GB SCSI system disks
  – Chastity/zephyr: 550 GB Fibre-Channel RAID array
  – Happy/yatta: 8 × 36.4 GB disk drives
  – For running jobs and temporary data storage; not backed up
• Network Attached Storage (NAS): /nas/uncch, /netscr
  – 7 TB of NetApp NAS RAID array used for scratch space, mounted as /nas and /scr
  – For running jobs and temporary data storage; not backed up
  – Shared by all login and compute nodes (cedar/cypress, chastity/zephyr, happy/yatta, baobab)
• Mass Storage (MS)
  – Never run jobs using files in ~/ms (compute nodes do not have ~/ms access)
  – Mounted for long-term data storage on all scientific computing servers' login nodes as ~/ms ($HOME/ms)

Page 46: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 46

Subscription of Services
• Have an ONYEN ID

– The Only Name You’ll Ever Need

• Eligibility: Faculty, staff, postdoc, and graduate students

• Go to http://onyen.unc.edu

Page 47: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 47

Subscription of Services

Page 48: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 48

Access to Servers

• To cedar
  – ssh cedar.isis.unc.edu
• To Emerald
  – ssh baobab.isis.unc.edu
• To Topsail
  – ssh topsail.unc.edu

Page 49: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 49

Programming Tools
• Compilers
  – FORTRAN 77/90/95
  – C/C++
• Utility Libraries
  – BLAS, LAPACK, FFTW, SCALAPACK
  – IMSL, NAG
  – NetCDF, GSL, PETSc
• Parallel Computing
  – OpenMP
  – PVM
  – MPI (MPICH, LAM/MPI, MPICH-GM)

Page 50: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 50

Compilers: SMP machines
• Cedar/Cypress – SGI Altix 3700, 128 CPUs
  – 64-bit Intel compilers, versions 8.0, 8.1, and 9.0, in /opt/intel
    • FORTRAN 77/90/95: ifort/ifc/efc
    • C/C++: icc/ecc
  – 64-bit GNU compilers
    • FORTRAN 77: f77/g77
    • C and C++: gcc/cc and g++/c++
• Happy/Yatta – IBM P690, 32 CPUs
  – XL FORTRAN 77/90 8.1.0.3: xlf, xlf90
  – C and C++ for AIX 6.0.0.4: xlc, xlC

Page 51: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 51

Compilers: LINUX Cluster
• Absoft ProFortran compilers
  – Package name: profortran
  – Current version: 7.0
  – FORTRAN 77 (f77): Absoft FORTRAN 77 compiler version 5.0
  – FORTRAN 90/95 (f90/f95): Absoft FORTRAN 90/95 compiler version 3.0
• GNU compilers
  – Package name: gcc
  – Current version: 3.4.3
  – FORTRAN 77 (g77/f77): 3.4.3
  – C (gcc): 3.4.3
  – C++ (g++/c++): 3.4.3
• Intel compilers
  – Package names: intel_fortran, intel_CC
  – Current version: 8.1
  – FORTRAN 77/90 (ifc): Intel LINUX compiler versions 8.0, 8.1, 9.0
  – C/C++ (icc): Intel LINUX compiler versions 8.0, 8.1, 9.0
• Portland Group compilers
  – Package name: pgi
  – Current version: 5.2
  – FORTRAN 77 (pgf77): The Portland Group, Inc. pgf77 5.2-4
  – FORTRAN 90 (pgf90): The Portland Group, Inc. pgf90 5.2-4
  – High Performance FORTRAN (pghpf): The Portland Group, Inc. pghpf 5.2-4
  – C (pgcc): The Portland Group, Inc. pgcc 5.2-4
  – C++ (pgCC): The Portland Group, Inc. pgCC 5.2-4

Page 52: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 52

LINUX Compiler Benchmark
(columns: Absoft ProFortran 90, Intel FORTRAN 90, Portland Group FORTRAN 90, GNU FORTRAN 77; rank in parentheses)

Molecular Dynamics (CPU time)   4.19 (4)   2.83 (2)   2.80 (1)   2.89 (3)
Kepler (CPU time)               0.49 (1)   0.93 (2)   1.10 (3)   1.24 (4)
Linpack (CPU time)              98.6 (4)   95.6 (1)   96.7 (2)   97.6 (3)
Linpack (MFLOPS)                182.6 (4)  183.8 (1)  183.2 (3)  183.3 (2)
LFK (CPU time)                  89.5 (4)   70.0 (3)   68.7 (2)   68.0 (1)
LFK (MFLOPS)                    309.7 (3)  403.0 (2)  468.9 (1)  250.9 (4)
Total rank                      20         11         12         17

• For reference only. Note that performance is code- and compilation-flag-dependent. For each benchmark, three identical runs were performed; the best CPU timing of the three is listed in the table. Optimization flags: Absoft -O, Portland Group -O4 -fast, Intel -O3, GNU -O.

Page 53: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 53

Profilers & Debuggers

• SMP machines
  – Happy: dbx, prof, gprof
  – Cedar: gprof
• LINUX cluster
  – PGI: pgdebug, pgprof, gprof
  – Absoft: fx, xfx, gprof
  – Intel: idb, gprof
  – GNU: gdb, gprof

Page 54: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 54

Utility Libraries
• Mathematical libraries
  – IMSL, NAG, etc.
• Scientific computing
  – Linear algebra
    • BLAS, ATLAS
    • EISPACK
    • LAPACK
    • SCALAPACK
  – Fast Fourier transform: FFTW
  – The GNU Scientific Library: GSL
  – Utility libraries: netCDF, PETSc, etc.

Page 55: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 55

Utility Libraries
• SMP machines
  – Happy/Yatta: ESSL (Engineering and Scientific Subroutine Library), -lessl
    • BLAS
    • LAPACK
    • EISPACK
    • Fourier transforms, convolutions and correlations, and related computations
    • Sorting and searching
    • Interpolation
    • Numerical quadrature
    • Random number generation
    • Utilities

Page 56: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 56

Utility Libraries
• SMP machines
  – Cedar/Cypress: MKL (Intel Math Kernel Library) 8.0; link with
    -L/opt/intel/mkl721/lib/64 -lmkl -lmkl_lapack -lsolver -lvml -lguide
    » BLAS
    » LAPACK
    » Sparse solvers
    » FFT
    » VML (Vector Math Library)
    » Random-number generators

An example link command follows below.
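As an illustration, a Fortran program calling BLAS/LAPACK routines might be linked against MKL on cedar roughly as follows (a sketch; myprog.f is a hypothetical file name, and the library path and flags are the ones quoted above):

  ifort -O -o myprog myprog.f -L/opt/intel/mkl721/lib/64 -lmkl -lmkl_lapack -lguide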

Page 57: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 57

Utility Libraries for Emerald Cluster
• Mathematical libraries
  – IMSL
    • The IMSL libraries are a comprehensive set of mathematical and statistical functions
    • From Visual Numerics, http://www.vni.com
    • Functions include: optimization, FFTs, interpolation, differential equations, correlation, regression, time series analysis, and many more
    • Available in FORTRAN and C
    • Package name: imsl
    • Required compiler: Portland Group compiler, pgi
    • Installed in AFS ISIS package space, /afs/isis/pkg/imsl
    • Current default version 4.0; latest version 5.0
    • To subscribe to IMSL, type "ipm add pgi imsl"
    • To compile a C code, code.c, using IMSL:
      pgcc -O $CFLAGS code.c -o code.x $LINK_CNL_STATIC

Page 58: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 58

• Mathematical libraries
  – NAG
    • NAG produces and distributes numerical, symbolic, statistical, visualisation and simulation software for the solution of problems in a wide range of applications in such areas as science, engineering, financial analysis and research
    • From the Numerical Algorithms Group, http://www.nag.co.uk
    • Functions include: optimization, FFTs, interpolation, differential equations, correlation, regression, time series analysis, multivariate factor analysis, linear algebra, random number generators
    • Available in FORTRAN and C
    • Package name: nag
    • Available platforms: SGI IRIX, SUN Solaris, IBM AIX, LINUX
    • Installed in AFS ISIS package space, /afs/isis/pkg/nag
    • Current default version 6.0
    • To subscribe to NAG, type "ipm add nag"

Utility Libraries for Emerald Cluster

Page 59: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 59

Utility Libraries for Emerald Cluster

• Scientific libraries
  – Linear algebra
    • BLAS, LAPACK, LAPACK90, LAPACK++, ATLAS, SPARSE-BLAS, SCALAPACK, EISPACK, FFTPACK, LANCZOS, HOMPACK, etc.
    • Source code downloadable from the website: http://www.netlib.org/liblist.html
    • Compiler dependent
    • BLAS and LAPACK available in AFS ISIS package space for all 4 compilers: gcc, profortran, intel, and pgi
    • SCALAPACK available for the pgi and intel compilers
    • Assistance available if other versions are needed

Page 60: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 60

• Scientific libraries
  – Other libraries: not fully implemented yet, so please be cautious and patient when using them
    • FFTW http://www.fftw.org/
    • GSL http://www.gnu.org/software/gsl/
    • NetCDF http://www.unidata.ucar.edu/software/netcdf/
    • NCO http://nco.sourceforge.net/
    • HDF http://hdf.ncsa.uiuc.edu/hdf4.html
    • OCTAVE http://www.octave.org/
    • PETSc http://www-unix.mcs.anl.gov/petsc/petsc-as/
    • ……
  – If you think more libraries are of broad interest, please recommend them to us

Utility Libraries for Emerald Cluster

Page 61: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 61

Parallel Computing
• SMP machines:
  – OpenMP
    • Compilation:
      – Use the "-qsmp=omp" flag on happy
      – Use the "-openmp" flag on cedar
    • Environment variable setup:
      – setenv OMP_NUM_THREADS n
  – MPI
    • Compilation:
      – Use the "-lmpi" flag on cedar
      – Use MPI-capable compilers, e.g., mpxlf, mpxlf90, mpcc, mpCC
  – Hybrid (OpenMP and MPI): do both! (Example command lines follow below.)
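Putting the flags above into concrete command lines, a rough sketch (the file name prog.f90 and the thread count are hypothetical; the compiler names are the ones listed on the compilers slides):

  ifort -openmp -o prog prog.f90        (OpenMP on cedar)
  setenv OMP_NUM_THREADS 8              (then run ./prog with 8 threads)
  xlf90 -qsmp=omp -o prog prog.f90      (OpenMP on happy/yatta)
  ifort -o prog prog.f90 -lmpi          (MPI on cedar)
  mpxlf90 -o prog prog.f90              (MPI on happy/yatta)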

Page 62: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 62

Parallel Computing With Emerald Cluster

• Setup

  MPI implementation:            MPICH (package: mpich)     LAM/MPI (package: mpi-lam)
  Vendor \ language              F77  F90  C    C++         F77  F90  C    C++
  GNU compilers                  √    –    √    √           √    –    √    √
  Absoft ProFortran compilers    √    √    √    √           √    √    √    √
  Portland Group compilers       √    √    √    √           √    √    √    √
  Intel compilers                √    √    √    √           √    √    √    √

Page 63: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 63

• Setup

  Vendor              Package name              FORTRAN 77   FORTRAN 90   C      C++
  GNU                 gcc                       g77          –            gcc    g++
  Absoft ProFortran   profortran                f77          f95          –      –
  Portland Group      pgi                       pgf77        pgf90        pgcc   pgCC
  Intel               intel_fortran, intel_CC   ifc          ifc          icc    icc

  Commands for parallel MPI compilation (mpich or mpi-lam): mpif77, mpif90, mpicc, mpiCC

Parallel Computing With Emerald Cluster

Page 64: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 64

• Setup – AFS packages to be "ipm add"-ed
  – Notice the order: the compiler is always added first
  – Add ONLY ONE compiler into your environment

  Compiler            MPICH                                  LAM/MPI
  GNU                 ipm add gcc mpich                      ipm add gcc mpi-lam
  Absoft ProFortran   ipm add profortran mpich               ipm add profortran mpi-lam
  Portland Group      ipm add pgi mpich                      ipm add pgi mpi-lam
  Intel               ipm add intel_fortran intel_CC mpich   ipm add intel_fortran intel_CC mpi-lam

Parallel Computing With Emerald Cluster

Page 65: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 65

• Compilation
  – To compile an MPI Fortran 77 code, code.f, into an executable, exec:
      % mpif77 -O -o exec code.f
  – For a Fortran 90/95 code, code.f90:
      % mpif90 -O -o exec code.f90
  – For a C code, code.c:
      % mpicc -O -o exec code.c
  – For a C++ code, code.cc:
      % mpiCC -O -o exec code.cc

Parallel Computing With Emerald Cluster

Page 66: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 66

Scientific Packages
• Available in AFS package space
• To subscribe to a package, type "ipm add pkg_name", where "pkg_name" is the name of the package. For example, "ipm add gaussian"
• To remove it, type "ipm remove pkg_name"
• All packages are installed in the /afs/isis/pkg/ directory. For example, /afs/isis/pkg/gaussian
• Categories of scientific packages include:
  – Quantum chemistry
  – Molecular dynamics
  – Material science
  – Visualization
  – NMR spectroscopy
  – X-ray crystallography
  – Bioinformatics
  – Others

Page 67: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 67

Scientific Package: Quantum Chemistry
(Software / Package Name / Platforms / Current Version / Parallel)

ABINIT abinit IRIX/LINUX 4.3.3 YES (MPI)

ADF adf LINUX 2002.02 Yes (PVM)

Cerius2 cerius2 IRIX/LINUX 4.10 Yes (MPI)

GAMESS gamess IRIX/LINUX 2001.9.6 Yes (MPI)

Gaussian gaussian IRIX/LINUX 03C02 Yes (OpenMP)

MacroModel macromodel IRIX 7.1 No

MOLFDIR molfdir IRIX 2001 NO

Molpro molpro IRIX/LINUX 2002.6 Yes (MPI)

NWChem nwchem IRIX/LINUX 4.7 Yes (MPI)

MaterialStudio materisalstudio LINUX 3.2 Yes (MPI)

CPMD cpmd IRIX/LINUX 3.0 YES (MPI)

ACES2 aces2 IRIX 4.1.2 No

Page 68: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 68

Scientific Package: Molecular Dynamics

Software Package Name Platforms Current Version Parallel

Amber amber IRIX/LINUX 8.0 MPI

NAMD/VMD namd,vmd IRIX/LINUX 2.5 MPI

Gromacs gromcs IRIX/LINUX 3.2.1 MPI

InsightII insightII IRIX 2000.3 --

MacroModel macromodel IRIX 7.1 --

PMEMD pmemd IRIX/LINUX 3.0.0 MPI

Quanta quanta IRIX 2005 MPI

Sybyl sybyl IRIX/LINUX 7.1 --

CHARMM charmm IRIX 3.0B1 MPI

TINKER tinker LINUX 4.2 --

O o IRIX 9.0.7 --

Page 69: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 69

Molecular & Scientific Visualization
(Software / Package Name / Platforms / Current Version)

AVS avs IRIX 5.6

AVS Express Avs-express IRIX 6.2

Cerius2 cerius2 IRIX/LINUX 4.9

DINO dino IRIX 0.8.4

ECCE ecce IRIX 2.1

GaussView gaussian IRIX/LINUX/AIX 3.0.9

GRASP grasp IRIX 1.3.6

InsightII insightII IRIX/LINUX 2000.3

MOIL-VIEW Moil-view IRIX 9.1

MOLDEN molden IRIX/LINUX 4.0

MOLKEL molkel IRIX 4.3

MOLMOL molmol IRIX 2K.1

MOLSCRIPT molscript IRIX 2.1.2

MOLSTAR molstar IRIX/LINUX 1.0

Page 70: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 70

Molecular & Scientific Visualization

Software Package Name Platforms Current Version

MOVIEMOL moviemol IRIX 1.3.1

NBOView nbo IRIX/LINUX 5.0

QUANTA quanta IRIX/LINUX 2005

RASMOL rasmol IRIX/LINUX/AIX 2.7.3

RASTER3D raster3d IRIX/LINUX 2.7c

SPARTAN spartan IRIX 5.1.3

SPOCK spock IRIX 1.7.0p1

SYBYL sybyl IRIX/LINUX 7.1

VMD vmd IRIX/LINUX 1.8.2

XtalView xtalview IRIX 4.0

XMGR xmgr IRIX 4.1.2

GRACE grace IRIX/LINUX 5.1.2

IMAGEMAGICK Imagemagick IRIX/LINUX/AIX 6.2.1.3

GIMP gimp IRIX/LINUX/AIX 1.0.2

XV xv IRIX/LINUX/AIX 3.1.0a

Page 71: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 71

NMR & X-Ray Crystallography
(Software / Package Name / Platforms / Current Version)

CNSsolve cnssolve IRIX/LINUX 1.1

AQUA aqua IRIX/LINUX 3.2

BLENDER blender IRIX 2.28a

BNP bnp IRIX/LINUX 0.99

CAMBRIDGE cambridge IRIX 5.26

CCP4 ccp4 IRIX/LINUX 4.2.2

CNX cns IRIX/LINUX 2002

FELIX felix IRIX/LINUX 2004

GAMMA gamma IRIX 4.1.0

MOGUL mogul IRIX/LINUX 1.0

Phoelix phoelix IRIX 1.2

TURBO turbo IRIX 5.5

XPLOR-NIH Xplor_nih IRIX/LINUX 2.11.2

XtalView xtalview IRIX 4.0

Page 72: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 72

Scientific Package: Bioinformatics

Software Package Name Platforms Current Version

BIOPERL bioperl IRIX 1.4.0

BLAST blast IRIX/LINUX 2.2.6

CLUSTALX clustalx IRIX 8.1

EMBOSS emboss IRIX 2.8.0

GCG gcg LINUX 11.0

Insightful Miner iminer IRIX 3.0

Modeller modeller IRIX/LINUX 7.0

PISE pise LINUX 5.0a

SEAVIEW seaview IRIX/LINUX 1.0

AUTODOCK autodock IRIX 3.05

DOCK dock IRIX/LINUX 5.1.1

FTDOCK ftdock IRIX 1.0

HEX hex IRIX 2.4

Page 73: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 73

Packages on Cedar/Cypress
• No access to AFS packages
• A separate pool of packages is installed at /opt
• Available packages include:
  – Amber
  – CPMD
  – Gaussian
  – GROMACS
  – MOLPRO
  – NAMD
  – NWChem
  – PMEMD

Page 74: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 74

Why Do We Need Job Management Systems?

• "Whose job you run, in addition to when and where it is run, may be as important as how many jobs you run!"
• Effectively optimizes the utilization of resources
• Effectively optimizes the sharing of resources
• Often referred to as resource management software, queuing systems, job management systems, etc.

Page 75: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 75

Job Management Tools

• PBS – Portable Batch System
  – Open-source product developed at NASA Ames Research Center
• DQS – Distributed Queuing System
  – Open-source product developed by SCRI at Florida State University
• LSF – Load Sharing Facility
  – Commercial product from Platform Computing, already deployed on the UNC-CH ITS computing servers
• Codine/Sun Grid Engine
  – Commercial version of DQS from Gridware, Inc., now owned by Sun
• Condor
  – A restricted-source "cycle stealing" product from the University of Wisconsin
• Others too numerous to mention

Page 76: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 76

Operation of LSF

[Diagram: a job submitted with "bsub app" on the submission host (LIM, Batch API) is forwarded to the master host (MLIM, MBD), placed in a queue, and dispatched to an execution host (SBD, child SBD, RES), where the user job runs; load information flows from the other hosts back to the master.
LIM – Load Information Manager; MLIM – Master LIM; MBD – Master Batch Daemon; SBD – Slave Batch Daemon; RES – Remote Execution Server.]

Page 77: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 77

Cool Things about LSF
• It provides user access to dynamic load balancing, load sharing, and job queuing.
• It includes LSF JobScheduler, LSF Make, LSF Global Intelligence (which allows for complex analysis of system activity data), LSF MultiCluster, and Platform HPC.
• Platform HPC includes the Parallel Application Manager (PAM) and integration with a large set of numerical and scientific computing applications.
• Users of the cluster can define what resources they need for a given problem (number of CPUs, special software licenses, CPU time, disk space, memory, etc.) and let the resource management facilities of LSF determine where and when the job should run.
• If the cluster becomes overloaded, Platform LSF acts as a "traffic controller", ensuring that work continues to flow without a system crashing.
• For programs written with MPI calls, Platform HPC provides parallel application management (PAM) and scripts that integrate with the MPI libraries and binaries.
• These features manage the execution of the code and provide housekeeping services, such as assigning CPUs to the program when it starts and graceful termination in the event of an error.

Page 78: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 78

Common LSF Commands
• lsid
  – A good LSF command to start with
• lshosts/bhosts
  – Show all of the nodes that the LSF system is aware of
• bsub
  – Submits a job, interactively or in batch, using the LSF batch scheduling and queue layer of the LSF suite
• bjobs
  – Displays information about a recently run job. You can use the -l option to view a more detailed accounting
• bqueues
  – Displays information about the batch queues. Again, the -l option will display a more thorough description
• bkill <job ID#>
  – Kills the job with job ID number #
• bhist -l <job ID#>
  – Displays historical information about jobs. A "-a" flag displays information about both finished and unfinished jobs
• bpeek -f <job ID#>
  – Displays the stdout and stderr output of an unfinished job with a job ID of #
• bhpart
  – Displays information about host partitions
• bstop
  – Suspends an unfinished job
• bswitch
  – Switches unfinished jobs from one queue to another

A typical submit-and-monitor sequence follows below.
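As an illustration using only the commands above (a sketch; the queue name comes from a later slide, and "my_batch_job" and the job ID 12345 are made up):

  bsub -q week my_batch_job     (submit; LSF replies with a job ID, e.g., 12345)
  bjobs -l 12345                (detailed status of the job)
  bpeek -f 12345                (follow its stdout/stderr while it runs)
  bhist -l 12345                (history once it has finished)
  bkill 12345                   (or kill it if something goes wrong)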

Page 79: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 79

More about LSF

• Type "jle" to check job efficiency
• Type "bqueues" for all queues on one cluster/machine (-m); type "bqueues -l queue_name" for more info about the queue named "queue_name"
• Type "busers" for user job slot limits
• Specific to Baobab:
  – cpufree – to check how many free/idle CPUs are available on Baobab
  – pending – to check how many jobs are still pending

Page 80: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 80

LSF Clusters
• At ITS UNC-CH, we have three clusters
  – coral: Emerald/Baobab LINUX cluster
  – fleet: happy/yatta, sunny, etc.
  – conifers: cedar/cypress
• One cannot submit jobs across LSF clusters! Each cluster is self-contained.

Page 81: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 81

LSF Queues
• The LSF queues implement our job scheduling and control policies. Their names reflect the characteristics of the jobs each queue accepts: the job type, the run time, and the number of CPUs requested.
• There are three job types:
  – batch (serial)
  – interactive
  – parallel

Page 82: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 82

LSF Queues for the fleet/conifers Clusters

Queue     Description
int       Interactive jobs
now       Preemptive debugging queue, 10-minute wall-clock limit, 2 CPUs
week      Default queue, one-week wall-clock limit, up to 32 CPUs/user
month     Long-running serial-job queue, one-month wall-clock limit, up to 4 jobs per user
staff     ITS Research Computing staff queue
manager   For use by LSF administrators

Page 83: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 83

Run Jobs on the LSF fleet/conifers Clusters

• Jobs to the interactive queue
    bsub -q int -m cedar -Ip my_interactive_job
• Serial jobs
    bsub -q week -m cypress my_batch_job
• Parallel OpenMP jobs
    setenv OMP_NUM_THREADS 4
    bsub -q week -n 4 -m cypress my_parallel_job
• Parallel MPI jobs
    bsub -q week -n 4 -m cypress mpirun -np 4 my_parallel_job

Page 84: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 84

LSF Queues on Emerald

QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP

patrons 90 Open:Active - - - - 0 0 0 0

int 80 Open:Active 16 2 - - 1 0 1 0

now 70 Open:Active - 2 - - 0 0 0 0

week 50 Open:Active - 32 - - 13 0 13 0

month 40 Open:Active 32 4 - - 4 0 4 0

staff 30 Open:Active - - - - 0 0 0 0

manager 20 Open:Active - - - - 0 0 0 0

idle 10 Open:Active - - - - 9 0 9 0

Page 85: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 85

LSF Queues on Emerald

$ bqueues -l idle

QUEUE: idle -- jobs that may be preempted by jobs from the patrons queue

PARAMETERS/STATISTICS
PRIO NICE STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SSUSP USUSP RSV
10   0    Open:Active -   -    -    -    8     0    8   0     0     0

SCHEDULING PARAMETERS
          r15s r1m r15m ut pg io ls it tmp swp mem
loadSched -    -   -    -  -  -  -  -  -   -   -
loadStop  -    -   -    -  -  -  -  -  -   -   -

          gm_ports
loadSched -
loadStop  -

SCHEDULING POLICIES: NO_INTERACTIVE

USERS: all
HOSTS: donors/
POST_EXEC: /opt/lsf/etc/post_exec
RES_REQ: select[type==any] same[model]
JOB_STARTER: /opt/lsf/common/etc/job_starter.pl

Page 86: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 86

Peculiars of the Baobab Cluster (LSF sciclus Cluster)

CPU type             Resources (-R)
AMD Athlon 1.4 GHz   athlon14
AMD Athlon 1.6 GHz   athlon16
Xeon 2.4 GHz         xeon24,blade,lammpi
Xeon 2.8 GHz         xeon28,blade,lammpi

Parallel job submission (esub, -a)   Wrapper
lammpi                               lammpirun_wrapper
mpich                                mpichp4_wrapper

Note that the -R and -a flags are mutually exclusive in one command line.

Page 87: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 87

Run Jobs on the Emerald LINUX Cluster

• Interactive jobs
    bsub -q int_8h -R athlon14 -Ip my_interactive_job
• The syntax for submitting a serial job is:
    bsub -q queuename -R resources executable
  – For example:
    bsub -q bat_96h -R blade my_executable
• To run an MPICH parallel job on AMD Athlon machines with, say, 4 CPUs:
    bsub -q par_24h_4c -n 4 -a mpich mpirun.lsf my_par_job
• To run a LAM/MPI parallel job on IBM BladeCenter machines with, say, 4 CPUs:
    bsub -q par_24h_4c -n 4 -a lammpi mpirun.lsf my_par_job

Page 88: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 88

Final Friendly Reminders

• Never run jobs on login nodes
  – Login nodes are for file management, coding, compilation, etc., only
• Never run jobs outside LSF
  – Fair sharing
• Never run jobs on your AFS ISIS home or ~/ms; run them on /scr, /netscr, or /nas instead
  – Slow I/O response, limited disk space
• Move your data to mass storage after jobs finish, and remove all temporary files on the scratch disks
  – Scratch disks are not backed up; efficient use of limited resources
  – Old files will be deleted automatically without notification

Page 89: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 89

Online Resources

• Get started with Research Computing:
  http://www.unc.edu/atn/hpc/getting_started/index.shtml?id=4196
• Programming tools:
  http://www.unc.edu/atn/hpc/programming_tools/index.shtml
• Scientific packages:
  http://www.unc.edu/atn/hpc/applications/index.shtml?id=4237
• Job management:
  http://www.unc.edu/atn/hpc/job_management/index.shtml?id=4484
• Benchmarks:
  http://www.unc.edu/atn/hpc/performance/index.shtml?id=4228
• High-performance computing:
  http://www.beowulf.org
  http://www.top500.org
  http://www.linuxhpc.org
  http://www.supercluster.org/

Page 90: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 90

Online Training Resources
• Ohio Supercomputer Center (OSC)
  http://www.osc.edu/hpc/training/
• Texas Advanced Computing Center (TACC)
  http://www.tacc.utexas.edu/services/training/
• Maui High Performance Computing Center (MHPCC)
  http://www.mhpcc.edu/training/tutorials/
• National Center for Supercomputing Applications (NCSA)
  http://www.ncsa.uiuc.edu/Divisions/eot/Training/
• Lawrence Livermore National Laboratory (LLNL)
  http://www.llnl.gov/computing/hpc/training/
• National Energy Research Scientific Computing Center (NERSC)
  http://www.nersc.gov/nusers/services/training/classes/
• University of Minnesota Supercomputing Institute (UMSI)
  http://www.msi.umn.edu/tutorial/
• San Diego Supercomputer Center (SDSC)
  http://www.sdsc.edu/user_services/training/

Page 91: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 91

QUESTIONS & COMMENTS?

Please direct comments/questions about research computing to

E-mail: [email protected]

Please direct comments/questions pertaining to this presentation to

E-Mail: [email protected]


Page 92: Introduction to Scientific Computing Shubin Liu, Ph.D. Renaissance Computing Institute (RENCI) University of North Carolina at Chapel Hill

9/17/2007 Scientific Computing @ UNC 92

Hands-on Exercises

• If you haven't done so yet
  – Subscribe to the Research Computing services
  – Access chastity, happy, baobab, etc. via SecureCRT or X-Win32
  – Create a working directory for yourself on /netscr or /scr
  – Get to know basic AFS and UNIX commands
  – Get to know the Baobab Beowulf cluster
• Compile one serial and one parallel (MPI) code on Baobab
• Get familiar with basic LSF commands
• Get to know the packages available in AFS package space
• Submit jobs via LSF using a serial or parallel queue