Introduction to High Performance Computing
Advanced Research Computing, September 9, 2015
Outline
• What constitutes high performance computing (HPC)?
• When to consider HPC resources
• What kinds of problems are typically solved?
• What are the components of HPC?
• What resources are available?
• Overview of HPC resources at Virginia Tech
Should I Pursue HPC?
• Are local resources insufficient to meet your needs?
  – Very large jobs
  – Very many jobs
  – Large data
• Do you have national collaborators?
  – Share projects between different entities
  – Convenient mechanisms for data sharing
Who Uses HPC?
• >2 billion core-hours allocated
• 1400 allocations
• 350 institutions
• 32 research domains

[Chart: usage by research domain]
• Physics (91): 19%
• Molecular Biosciences (271): 17%
• Astronomical Sciences (115): 13%
• Atmospheric Sciences (72): 11%
• Materials Research (131): 9%
• Chemical, Thermal Sys (89): 8%
• Chemistry (161): 7%
• Scientific Computing (60): 2%
• Earth Sci (29): 2%
• Training (51): 2%
Learning Curve
• Linux: command-line interface
• Scheduler: shares resources among multiple users
• Parallel computing:
  – Need to parallelize code to take advantage of a supercomputer's resources
  – Third-party programs or libraries make this easier
Popular Software Packages
• Molecular dynamics: Gromacs, LAMMPS
• CFD: OpenFOAM, Ansys
• Finite elements: deal.II, Abaqus
• Chemistry: VASP, Gaussian
• Climate: CESM
• Bioinformatics: Mothur, QIIME, mpiBLAST
• Numerical computing/statistics: R, MATLAB
• Visualization: ParaView, VisIt, EnSight
What is Parallel Computing?
Parallel Computing 101
• Parallel computing: the use of multiple processors or computers working together on a common task
  – Each processor works on its section of the problem
  – Processors can exchange information (a code sketch of this exchange follows the figure below)
[Figure: a 2D grid of the problem to be solved, with x and y axes, divided into four areas; CPU #1 through CPU #4 each work on one area and exchange boundary data with their neighbors]
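To make the figure concrete, here is a minimal sketch in C with MPI (the message-passing library that appears later in this talk) of how each process can own one strip of the grid and exchange boundary values with its neighbors. The strip size, values, and one-dimensional layout are illustrative assumptions, not code from the talk.

    /* halo.c: one strip of the grid per MPI rank, with boundary exchange.
     * Compile: mpicc halo.c -o halo    Run: mpirun -np 4 ./halo */
    #include <mpi.h>
    #include <stdio.h>

    #define NLOCAL 100                      /* grid points owned by each rank */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Local strip plus one "ghost" cell on each end for neighbor data. */
        double u[NLOCAL + 2];
        for (int i = 0; i < NLOCAL + 2; i++)
            u[i] = rank;                    /* each rank fills its own section */

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* The "exchange" arrows: send edge values, receive into ghost cells. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d received %.0f from the left and %.0f from the right\n",
               rank, u[0], u[NLOCAL + 1]);
        MPI_Finalize();
        return 0;
    }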
Why Do Parallel Computing?
• Limits of single-CPU computing:
  – performance
  – available memory
  – I/O rates
• Parallel computing allows one to:
  – solve problems that don't fit on a single CPU
  – solve problems that can't be solved in a reasonable time
• We can solve larger problems, faster, and more cases (see the note on speedup below)
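One standard way to quantify "faster" (not on the slide, but a common rule of thumb): Amdahl's law bounds the speedup S from p processors when a fraction s of the work is inherently serial:

    S(p) = 1 / (s + (1 - s) / p)

For example, with s = 0.05 and p = 64, S ≈ 15.4; even a small serial fraction caps the achievable speedup (at 1/s = 20 here, no matter how many processors are added).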
Parallelism is the New Moore’s Law
• Power and energy efficiency impose a key constraint on the design of microarchitectures
• Clock speeds have plateaued
• Hardware parallelism is increasing rapidly to make up the difference
What does a modern supercomputer look like?
Essential Components of HPC
• Supercomputing resources
• Storage
• Visualization
• Data management
• Network infrastructure
• Support
Terminology
• Core: a computational "unit"
• Socket: a single CPU ("processor"); includes roughly 4–15 cores
• Node: a single "computer"; includes roughly 2–8 sockets
• Cluster: a single "supercomputer" consisting of many nodes
• GPU: graphics processing unit, attached to some nodes; general-purpose GPUs (GPGPUs) can be used to speed up certain kinds of codes
• Xeon Phi: Intel's product name for its GPU competitor; also called "MIC"
Shared vs. Distributed Memory
Shared memory:
• All processors have access to a pool of shared memory
• Access times vary from CPU to CPU in NUMA systems
• Example: SGI UV, CPUs on the same node
Distributed memory:
• Memory is local to each processor
• Data is exchanged by message passing over a network
• Example: clusters with single-socket blades
(An OpenMP sketch of the shared-memory model follows the diagram below.)
[Diagram: shared memory, with several processors (P) attached to one Memory; distributed memory, with each processor (P) holding its own local memory (M) and connected to the others by a Network]
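As a shared-memory counterpart to the MPI sketch above, here is a minimal OpenMP example in C: threads split the iterations of one loop over a single array in shared memory, rather than passing messages. The array size and arithmetic are illustrative assumptions.

    /* openmp_sum.c: shared-memory parallelism with OpenMP.
     * Compile: gcc -fopenmp openmp_sum.c -o openmp_sum */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];                /* one array, shared by all threads */
        double sum = 0.0;

        /* Iterations are divided among threads; "reduction" combines
         * each thread's partial sum safely at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            sum += a[i];
        }

        printf("sum = %.1f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }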
Multi-core Systems
• Current processors place multiple processor cores on a die
• Communication details are increasingly complex:
  – Cache access
  – Main memory access
  – QuickPath / HyperTransport socket connections
  – Node-to-node connection via network
[Diagram: multi-core nodes, each with cores sharing local memory, connected to other nodes by a network]
Accelerator-based Systems
• Calculations are made in both CPUs and GPUs
• No longer limited to single-precision calculations
• Load balancing is critical for performance
• Requires specific libraries and compilers (CUDA, OpenCL; see the sketch below)
• Co-processor from Intel: MIC (Many Integrated Core)
[Diagram: accelerated nodes, each pairing CPUs and memory with a GPU, connected to one another by a network]
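As a sketch of this model, here is a loop offloaded to a GPU with OpenACC directives in C (OpenACC also appears in the trends table below); with CUDA or OpenCL, the data movement and kernel launch would be written out explicitly. The vector size and values are illustrative assumptions.

    /* saxpy_acc.c: offloading a loop to a GPU with OpenACC.
     * Compile (PGI/NVIDIA compilers): pgcc -acc saxpy_acc.c -o saxpy_acc */
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        /* The compiler copies x and y to the GPU, runs the loop there,
         * and copies y back when the region ends. */
        #pragma acc parallel loop copyin(x) copy(y)
        for (int i = 0; i < N; i++)
            y[i] = 2.0f * x[i] + y[i];

        printf("y[0] = %.1f\n", y[0]);     /* expect 4.0 */
        return 0;
    }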
HPC Trends
Architecture   Code
Single core    Serial
Multicore      OpenMP, Pthreads
GPU            CUDA, OpenACC
Cluster        MPI
How are accelerators different?

                   Intel Xeon E5-2670   Intel Xeon Phi 5110P   Nvidia Tesla K20X
                   (CPU)                (MIC)                  (GPU)
Cores              8                    60                     14 SMX
Logical cores      16                   240                    2,688 CUDA cores
Frequency          2.60 GHz             1.05 GHz               0.74 GHz
GFLOPs (double)    333                  1,010                  1,317
Memory             64 GB                8 GB                   6 GB
Memory B/W         51.2 GB/s            320 GB/s               250 GB/s
Batch Submission Process
[Diagram: the user connects to the login node via ssh and submits a job with qsub; the job waits in a queue on the master node until the scheduler assigns it to compute nodes, where it runs, e.g., via mpirun -np <#> ./a.out. A sample job script follows.]
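As a concrete illustration, here is a minimal PBS-style job script of the kind submitted with qsub, in the spirit of the commands shown above. The queue name, resource limits, and module name are assumptions for illustration, not ARC's actual settings; check the ARC documentation linked at the end for real values.

    #!/bin/bash
    #PBS -l nodes=2:ppn=16       # request 2 nodes with 16 cores each (illustrative)
    #PBS -l walltime=01:00:00    # wall-clock limit of 1 hour
    #PBS -q normal_q             # hypothetical queue name
    #PBS -N my_mpi_job           # job name

    cd $PBS_O_WORKDIR            # run from the directory where qsub was invoked
    module load mpi              # hypothetical module name; load your MPI stack
    mpirun -np 32 ./a.out        # launch 32 MPI processes (2 nodes x 16 cores)

Submitting with "qsub job.sh" places the job in the queue; qstat reports its state until the scheduler starts it on the compute nodes.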
ARC Overview
Advanced Research Computing
• Unit within the Office of the Vice President for Information Technology
• Provides centralized resources for:
  – Research computing
  – Visualization
• Staff to assist users
• Website: http://www.arc.vt.edu
Goals
• Advance the use of computing and visualization in VT research
• Centralize resource acquisition, maintenance, and support for the research community
• Provide support to facilitate usage of resources and minimize barriers to entry
• Enable and participate in research collaborations between departments
Personnel
• Associate VP for Research Computing: Terry Herdman
• Director, HPC: Vijay Agarwala
• Director, Visualization: Nicholas Polys
• Computational Scientists:
  – Justin Krometis
  – James McClure
  – Brian Marshall
  – Srinivas Yarlanki
  – Srijith Rajamohan
Personnel (Continued)
• System Administrators:
  – Tim Rhodes
  – Chris Snapp
  – Brandon Sawyers
• Business Manager: Alana Romanella
• User Support GRAs: Umar Kalim, Saeed Izadi, Sangeetha Srinivasa
Compute Resources
• Ithaca (beginners, MATLAB): 79 nodes, each with 8 cores and 24 GB (2× Intel Nehalem); 10 double-memory nodes
• HokieOne (shared, large memory): 82 nodes, each with 6 cores and 32 GB (Intel Westmere); 2.6 TB shared memory
• HokieSpeed (GPGPU): 201 nodes, each with 12 cores and 24 GB (2× Intel Westmere); 402 Tesla C2050 GPUs
• BlueRidge (large-scale CPU, MIC): 408 nodes, each with 16 cores and 64 GB (2× Intel Sandy Bridge); 260 Intel Xeon Phi, 4 K40 GPUs, 18 128 GB nodes
• NewRiver (large-scale, data intensive): 134 nodes, each with 24 cores and 128 GB (2× Intel Haswell); 8 K80 GPGPUs, 16 "big data" nodes, 24 512 GB nodes, 2 3 TB nodes
Computational Resources
• NewRiver: scalable CPU, data intensive. Available August 2015. Theoretical peak 152.6 TFlops/s; 134 nodes; 3,288 cores (24 per node); 8 Nvidia K80 GPUs; 34.4 TB memory (5.3 GB* per core, 128 GB* per node)
• BlueRidge: scalable CPU or MIC. Available March 2013. Theoretical peak 398.7 TFlops/s; 408 nodes; 6,528 cores (16 per node); 260 Intel Xeon Phi, 8 Nvidia K40 GPUs; 27.3 TB memory (4 GB* per core, 64 GB* per node)
• HokieSpeed: GPU. Available Sept 2012. Theoretical peak 238.2 TFlops/s; 201 nodes; 2,412 cores (12 per node); 408 Nvidia C2050 GPUs; 5.0 TB memory (2 GB per core, 24 GB per node)
• HokieOne: shared memory. Available Apr 2012. Theoretical peak 5.4 TFlops/s; 492 cores (nodes and cores/node N/A* for this shared-memory system); no accelerators; 2.62 TB memory (5.3 GB per core)
• Ithaca: beginners, MATLAB. Available Fall 2009. Theoretical peak 6.1 TFlops/s; 79 nodes; 632 cores (8 per node); no accelerators; 2 TB memory (3 GB* per core, 24 GB* per node)
Visualization Resources
• VisCube: 3D immersion environment with three 10′ by 10′ walls and a floor of 1920×1920 stereo projection screens
• DeepSix: six tiled monitors with a combined resolution of 7680×3200
• ROVR Stereo Wall
• AISB Stereo Wall
Getting Started with ARC
• Review ARC's system specifications and choose the right system(s) for you
  – Specialty software
• Apply for an account online at the Advanced Research Computing website
• When your account is ready, you will receive confirmation from ARC's system administrators
Resources
• ARC Website: http://www.arc.vt.edu
• ARC Compute Resources & Documentation: http://www.arc.vt.edu/hpc
• New Users Guide: http://www.arc.vt.edu/newusers
• Frequently Asked Questions: http://www.arc.vt.edu/faq
• Linux Introduction: http://www.arc.vt.edu/unix
Thank you
Questions?