Introduction to High Performance Computing
Advanced Research Computing, September 9, 2015
Outline
• What constitutes high performance computing (HPC)?
• When to consider HPC resources
• What kinds of problems are typically solved?
• What are the components of HPC?
• What resources are available?
• Overview of HPC resources at Virginia Tech
Should I Pursue HPC?
• Are local resources insufficient to meet your needs?
  – Very large jobs
  – Very many jobs
  – Large data
• Do you have national collaborators?
  – Share projects between different entities
  – Convenient mechanisms for data sharing
Who Uses HPC?
• >2 billion core-hours allocated
• 1400 allocations
• 350 institutions
• 32 research domains

[Chart: usage by research domain]
• Physics (91): 19%
• Molecular Biosciences (271): 17%
• Astronomical Sciences (115): 13%
• Atmospheric Sciences (72): 11%
• Materials Research (131): 9%
• Chemical, Thermal Sys (89): 8%
• Chemistry (161): 7%
• Scientific Computing (60): 2%
• Earth Sci (29): 2%
• Training (51): 2%
Learning Curve
• Linux: command-line interface
• Scheduler: shares resources among multiple users
• Parallel computing:
  – Need to parallelize code to take advantage of a supercomputer's resources
  – Third-party programs or libraries make this easier
Popular Software Packages
• Molecular dynamics: Gromacs, LAMMPS
• CFD: OpenFOAM, Ansys
• Finite elements: deal.II, Abaqus
• Chemistry: VASP, Gaussian
• Climate: CESM
• Bioinformatics: Mothur, QIIME, mpiBLAST
• Numerical computing/statistics: R, MATLAB
• Visualization: ParaView, VisIt, EnSight
What is Parallel Computing?
Parallel Computing 101
• Parallel computing: the use of multiple processors or computers working together on a common task
  – Each processor works on its section of the problem
  – Processors can exchange information (a code sketch of this exchange follows the figure below)
[Figure: a 2D grid of the problem to be solved, with x and y axes, divided into four areas; CPU #1 through CPU #4 each work on one area and exchange boundary data with their neighbors]
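To make the figure concrete, here is a minimal sketch in C with MPI (the message-passing library that appears later in this talk) of how each process can own one strip of the grid and exchange boundary values with its neighbors. The strip size, values, and one-dimensional layout are illustrative assumptions, not code from the talk.

    /* halo.c: one strip of the grid per MPI rank, with boundary exchange.
     * Compile: mpicc halo.c -o halo    Run: mpirun -np 4 ./halo */
    #include <mpi.h>
    #include <stdio.h>

    #define NLOCAL 100                      /* grid points owned by each rank */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Local strip plus one "ghost" cell on each end for neighbor data. */
        double u[NLOCAL + 2];
        for (int i = 0; i < NLOCAL + 2; i++)
            u[i] = rank;                    /* each rank fills its own section */

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* The "exchange" arrows: send edge values, receive into ghost cells. */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d received %.0f from the left and %.0f from the right\n",
               rank, u[0], u[NLOCAL + 1]);
        MPI_Finalize();
        return 0;
    }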
Why Do Parallel Computing?
• Limits of single-CPU computing:
  – performance
  – available memory
  – I/O rates
• Parallel computing allows one to:
  – solve problems that don't fit on a single CPU
  – solve problems that can't be solved in a reasonable time
• We can solve larger problems, faster, and more cases (see the note on speedup below)
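One standard way to quantify "faster" (not on the slide, but a common rule of thumb): Amdahl's law bounds the speedup S from p processors when a fraction s of the work is inherently serial:

    S(p) = 1 / (s + (1 - s) / p)

For example, with s = 0.05 and p = 64, S ≈ 15.4; even a small serial fraction caps the achievable speedup (at 1/s = 20 here, no matter how many processors are added).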
Parallelism is the New Moore’s Law
• Power and energy efficiency impose a key constraint on the design of microarchitectures
• Clock speeds have plateaued
• Hardware parallelism is increasing rapidly to make up the difference
What does a modern supercomputer look like?
Essential Components of HPC
• Supercomputing resources
• Storage
• Visualization
• Data management
• Network infrastructure
• Support
Terminology
• Core: a computational "unit"
• Socket: a single CPU ("processor"); includes roughly 4–15 cores
• Node: a single "computer"; includes roughly 2–8 sockets
• Cluster: a single "supercomputer" consisting of many nodes
• GPU: graphics processing unit, attached to some nodes; general-purpose GPUs (GPGPUs) can be used to speed up certain kinds of codes
• Xeon Phi: Intel's product name for its GPU competitor; also called "MIC"
Shared vs. Distributed Memory
Shared memory:
• All processors have access to a pool of shared memory
• Access times vary from CPU to CPU in NUMA systems
• Example: SGI UV, CPUs on the same node
Distributed memory:
• Memory is local to each processor
• Data is exchanged by message passing over a network
• Example: clusters with single-socket blades
(An OpenMP sketch of the shared-memory model follows the diagram below.)
[Diagram: shared memory, with several processors (P) attached to one Memory; distributed memory, with each processor (P) holding its own local memory (M) and connected to the others by a Network]
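As a shared-memory counterpart to the MPI sketch above, here is a minimal OpenMP example in C: threads split the iterations of one loop over a single array in shared memory, rather than passing messages. The array size and arithmetic are illustrative assumptions.

    /* openmp_sum.c: shared-memory parallelism with OpenMP.
     * Compile: gcc -fopenmp openmp_sum.c -o openmp_sum */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];                /* one array, shared by all threads */
        double sum = 0.0;

        /* Iterations are divided among threads; "reduction" combines
         * each thread's partial sum safely at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            sum += a[i];
        }

        printf("sum = %.1f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }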
Multi-core Systems
• Current processors place multiple processor cores on a die
• Communication details are increasingly complex:
  – Cache access
  – Main memory access
  – QuickPath / HyperTransport socket connections
  – Node-to-node connection via network
[Diagram: multi-core nodes, each with cores sharing local memory, connected to other nodes by a network]
Accelerator-based Systems
• Calculations are made in both CPUs and GPUs
• No longer limited to single-precision calculations
• Load balancing is critical for performance
• Requires specific libraries and compilers (CUDA, OpenCL; see the sketch below)
• Co-processor from Intel: MIC (Many Integrated Core)
[Diagram: accelerated nodes, each pairing CPUs and memory with a GPU, connected to one another by a network]
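As a sketch of this model, here is a loop offloaded to a GPU with OpenACC directives in C (OpenACC also appears in the trends table below); with CUDA or OpenCL, the data movement and kernel launch would be written out explicitly. The vector size and values are illustrative assumptions.

    /* saxpy_acc.c: offloading a loop to a GPU with OpenACC.
     * Compile (PGI/NVIDIA compilers): pgcc -acc saxpy_acc.c -o saxpy_acc */
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        /* The compiler copies x and y to the GPU, runs the loop there,
         * and copies y back when the region ends. */
        #pragma acc parallel loop copyin(x) copy(y)
        for (int i = 0; i < N; i++)
            y[i] = 2.0f * x[i] + y[i];

        printf("y[0] = %.1f\n", y[0]);     /* expect 4.0 */
        return 0;
    }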
HPC Trends
Architecture   Code
Single core    Serial
Multicore      OpenMP, Pthreads
GPU            CUDA, OpenACC
Cluster        MPI
How are accelerators different?

                   Intel Xeon E5-2670   Intel Xeon Phi 5110P   Nvidia Tesla K20X
                   (CPU)                (MIC)                  (GPU)
Cores              8                    60                     14 SMX
Logical cores      16                   240                    2,688 CUDA cores
Frequency          2.60 GHz             1.05 GHz               0.74 GHz
GFLOPs (double)    333                  1,010                  1,317
Memory             64 GB                8 GB                   6 GB
Memory B/W         51.2 GB/s            320 GB/s               250 GB/s
Batch Submission Process
[Diagram: the user connects to the login node via ssh and submits a job with qsub; the job waits in a queue on the master node until the scheduler assigns it to compute nodes, where it runs, e.g., via mpirun -np <#> ./a.out. A sample job script follows.]
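As a concrete illustration, here is a minimal PBS-style job script of the kind submitted with qsub, in the spirit of the commands shown above. The queue name, resource limits, and module name are assumptions for illustration, not ARC's actual settings; check the ARC documentation linked at the end for real values.

    #!/bin/bash
    #PBS -l nodes=2:ppn=16       # request 2 nodes with 16 cores each (illustrative)
    #PBS -l walltime=01:00:00    # wall-clock limit of 1 hour
    #PBS -q normal_q             # hypothetical queue name
    #PBS -N my_mpi_job           # job name

    cd $PBS_O_WORKDIR            # run from the directory where qsub was invoked
    module load mpi              # hypothetical module name; load your MPI stack
    mpirun -np 32 ./a.out        # launch 32 MPI processes (2 nodes x 16 cores)

Submitting with "qsub job.sh" places the job in the queue; qstat reports its state until the scheduler starts it on the compute nodes.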
ARC Overview
Advanced Research Computing
• Unit within the Office of the Vice President for Information Technology
• Provides centralized resources for:
  – Research computing
  – Visualization
• Staff to assist users
• Website: http://www.arc.vt.edu
Goals
• Advance the use of computing and visualization in VT research
• Centralize resource acquisition, maintenance, and support for the research community
• Provide support to facilitate usage of resources and minimize barriers to entry
• Enable and participate in research collaborations between departments
Personnel
• Associate VP for Research Computing: Terry Herdman
• Director, HPC: Vijay Agarwala
• Director, Visualization: Nicholas Polys
• Computational Scientists:
  – Justin Krometis
  – James McClure
  – Brian Marshall
  – Srinivas Yarlanki
  – Srijith Rajamohan
Personnel (Continued)
• System Administrators:
  – Tim Rhodes
  – Chris Snapp
  – Brandon Sawyers
• Business Manager: Alana Romanella
• User Support GRAs: Umar Kalim, Saeed Izadi, Sangeetha Srinivasa
Compute Resources
• Ithaca (beginners, MATLAB): 79 nodes, each with 8 cores and 24 GB (2× Intel Nehalem); 10 double-memory nodes
• HokieOne (shared, large memory): 82 nodes, each with 6 cores and 32 GB (Intel Westmere); 2.6 TB shared memory
• HokieSpeed (GPGPU): 201 nodes, each with 12 cores and 24 GB (2× Intel Westmere); 402 Tesla C2050 GPUs
• BlueRidge (large-scale CPU, MIC): 408 nodes, each with 16 cores and 64 GB (2× Intel Sandy Bridge); 260 Intel Xeon Phi, 4 K40 GPUs, 18 128 GB nodes
• NewRiver (large-scale, data intensive): 134 nodes, each with 24 cores and 128 GB (2× Intel Haswell); 8 K80 GPGPUs, 16 "big data" nodes, 24 512 GB nodes, 2 3 TB nodes
Computational Resources
• NewRiver: scalable CPU, data intensive. Available August 2015. Theoretical peak 152.6 TFlops/s; 134 nodes; 3,288 cores (24 per node); 8 Nvidia K80 GPUs; 34.4 TB memory (5.3 GB* per core, 128 GB* per node)
• BlueRidge: scalable CPU or MIC. Available March 2013. Theoretical peak 398.7 TFlops/s; 408 nodes; 6,528 cores (16 per node); 260 Intel Xeon Phi, 8 Nvidia K40 GPUs; 27.3 TB memory (4 GB* per core, 64 GB* per node)
• HokieSpeed: GPU. Available Sept 2012. Theoretical peak 238.2 TFlops/s; 201 nodes; 2,412 cores (12 per node); 408 Nvidia C2050 GPUs; 5.0 TB memory (2 GB per core, 24 GB per node)
• HokieOne: shared memory. Available Apr 2012. Theoretical peak 5.4 TFlops/s; 492 cores (nodes and cores/node N/A* for this shared-memory system); no accelerators; 2.62 TB memory (5.3 GB per core)
• Ithaca: beginners, MATLAB. Available Fall 2009. Theoretical peak 6.1 TFlops/s; 79 nodes; 632 cores (8 per node); no accelerators; 2 TB memory (3 GB* per core, 24 GB* per node)
Visualization Resources
• VisCube: 3D immersion environment with three 10′ by 10′ walls and a floor of 1920×1920 stereo projection screens
• DeepSix: six tiled monitors with a combined resolution of 7680×3200
• ROVR Stereo Wall
• AISB Stereo Wall
Getting Started with ARC
• Review ARC's system specifications and choose the right system(s) for you
  – Specialty software
• Apply for an account online at the Advanced Research Computing website
• When your account is ready, you will receive confirmation from ARC's system administrators
Resources
• ARC Website: http://www.arc.vt.edu
• ARC Compute Resources & Documentation: http://www.arc.vt.edu/hpc
• New Users Guide: http://www.arc.vt.edu/newusers
• Frequently Asked Questions: http://www.arc.vt.edu/faq
• Linux Introduction: http://www.arc.vt.edu/unix
Thank you
Questions?