introduction to hpc and arc resources - … · compute&resources system’ usage’ nodes...

Introduction to ARC Systems

Advanced Research Computing October 6, 2017

Before We Start

• Sign in
• Request an account if necessary
• Windows users:
  – Download PuTTY: search Google for “PuTTY”, open the first result, and save putty.exe to the Desktop
  – Alternative: ETX at newriver.arc.vt.edu

Today’s goals

• Introduce ARC
• Give an overview of HPC today
• Give an overview of VT-HPC resources
• Familiarize the audience with interacting with VT-ARC systems

Why HPC?

Should I Pursue HPC?

• Necessity: Are local resources insufficient to meet your needs?
  – Very large jobs
  – Very many jobs
  – Large data
• Convenience: Do you have collaborators?
  – Share projects between different entities
  – Convenient mechanisms for data sharing

Parallelism is the New Moore’s Law

• Power and energy efficiency impose a key constraint on the design of micro-architectures
• Clock speeds have plateaued
• Hardware parallelism is increasing rapidly to make up the difference

Research in HPC is Broad

• Earthquake Science and Civil Engineering
• Molecular Dynamics
• Nanotechnology
• Plant Science
• Storm modeling
• Epidemiology
• Particle Physics
• Economic analysis of phone network patterns
• Brain science
• Analysis of large cosmological simulations
• DNA sequencing
• Computational Molecular Sciences
• Neutron Science
• International Collaboration in Cosmology and Plasma Physics


Who Uses HPC?

• >2 billion core-hours allocated
• 1400 allocations
• 350 institutions
• 32 research domains
• By research domain (chart):
  – Physics (91): 19%
  – Molecular Biosciences (271): 17%
  – Astronomical Sciences (115): 13%
  – Atmospheric Sciences (72): 11%
  – Materials Research (131): 9%
  – Chemical, Thermal Sys (89): 8%
  – Chemistry (161): 7%
  – Scientific Computing (60): 2%
  – Earth Sci (29): 2%
  – Training (51): 2%

Popular Software Packages

• Molecular Dynamics: Gromacs, LAMMPS, NAMD, Amber
• CFD: OpenFOAM, Ansys, Star-CCM+
• Finite Elements: Deal II, Abaqus
• Chemistry: VASP, Gaussian, PSI4, QCHEM
• Climate: CESM
• Bioinformatics: Mothur, QIIME, MPIBLAST
• Numerical Computing/Statistics: R, Matlab, Julia
• Visualization: ParaView, VisIt, Ensight

Learning Curve

• Linux: Command-line interface
• Scheduler: Shares resources among multiple users
• Parallel Computing:
  – Need to parallelize code to take advantage of a supercomputer’s resources
  – Third-party programs or libraries make this easier
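The point about parallelizing code can be made concrete with a small sketch. The example below uses Python’s standard-library concurrent.futures module to spread independent tasks across the cores of a single node; `simulate` is a hypothetical stand-in for real work, and spanning several nodes would instead need a message-passing library such as MPI. It is an illustration under those assumptions, not ARC-specific guidance.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def simulate(seed: int) -> float:
    """Hypothetical stand-in for one independent, CPU-bound task."""
    return sum(math.sin(seed * k) for k in range(10_000))


def run_serial(seeds):
    """Run every task one after another on a single core."""
    return [simulate(s) for s in seeds]


def run_parallel(seeds, workers=4):
    """Run tasks in up to `workers` separate processes at once."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with run_serial.
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    seeds = list(range(16))
    assert run_parallel(seeds) == run_serial(seeds)
    print("serial and parallel results match")
```

Because the tasks never exchange data, this “embarrassingly parallel” pattern needs no communication between processes; codes whose parts must share data need the shared-memory or message-passing approaches covered under Hardware & Terminology.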

Hardware & Terminology

Essential Components of HPC

• Supercomputing resources
• Storage
• Visualization
• Data management
• Network infrastructure
• Support


Terminology

• Core: A computational “unit”
• Socket: A single CPU (“processor”). Includes roughly 4-16 cores.
• Node: A single “computer”. Includes roughly 2-8 sockets.
• Cluster: A single “supercomputer” consisting of many nodes.
• GPU: Graphics processing unit. Attached to some nodes. General-purpose GPUs (GPGPUs) can be used to speed up certain kinds of codes.

Blade : Rack : System

• 1 node: 2 × 8 cores = 16 cores
• 1 chassis: 10 nodes = 160 cores
• 1 rack (frame): 4 chassis = 640 cores
• System: 10 racks = 6,400 cores
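The core counts above multiply level by level. A minimal Python sketch of that arithmetic, using the example figures from this slide; reading “2 x 8 cores” as 2 sockets of 8 cores each is an assumption based on the Terminology definitions.

```python
# Example hierarchy from the slide; real systems differ (see Compute Resources).
SOCKETS_PER_NODE = 2     # assumed meaning of "2 x 8 cores"
CORES_PER_SOCKET = 8
NODES_PER_CHASSIS = 10
CHASSIS_PER_RACK = 4
RACKS_PER_SYSTEM = 10


def core_counts():
    """Return total cores per node, chassis, rack, and system."""
    node = SOCKETS_PER_NODE * CORES_PER_SOCKET
    chassis = node * NODES_PER_CHASSIS
    rack = chassis * CHASSIS_PER_RACK
    system = rack * RACKS_PER_SYSTEM
    return node, chassis, rack, system


print(core_counts())  # (16, 160, 640, 6400)
```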

Shared vs. Distributed memory

Shared memory:
• All processors have access to a pool of shared memory
• Access times vary from CPU to CPU in NUMA systems
• Example: SGI UV, CPUs on the same node

Distributed memory:
• Memory is local to each processor
• Data exchange by message passing over a network
• Example: Clusters with single-socket blades

(Figure: processors P sharing one memory vs. processors P, each with its own memory M, connected by a network.)
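A minimal sketch of the distributed-memory idea, assuming a single machine stands in for a cluster: each process below works only on its own slice of the data and returns its partial result as a message through a queue. On real clusters this exchange typically uses MPI over the network (for example via mpi4py); Python’s multiprocessing module is used here only so the example runs anywhere. The function names are hypothetical.

```python
from multiprocessing import Process, Queue


def worker(rank, chunk, results):
    """Compute on local data only, then send the partial result as a message."""
    results.put((rank, sum(chunk)))


def distributed_sum(data, nprocs=4):
    # Give each process its own slice: no process sees another's data.
    chunks = [data[r::nprocs] for r in range(nprocs)]
    results = Queue()
    procs = [Process(target=worker, args=(r, chunks[r], results)) for r in range(nprocs)]
    for p in procs:
        p.start()
    # Receive one message per process before joining, which avoids the
    # deadlock that can occur when joining a process with unread queue data.
    partials = dict(results.get() for _ in range(nprocs))
    for p in procs:
        p.join()
    return sum(partials.values())


if __name__ == "__main__":
    print(distributed_sum(list(range(1000))))  # 499500
```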

Multi-core systems

• Current processors place multiple processor cores on a die
• Communication details are increasingly complex:
  – Cache access
  – Main memory access
  – QuickPath / HyperTransport socket connections
  – Node-to-node connection via network

(Figure: multi-core sockets, each with its own memory, connected by a network.)

Accelerator-based Systems

• Calculations made in both CPUs and GPUs
• No longer limited to single-precision calculations
• Load balancing critical for performance
• Requires specific libraries and compilers (CUDA, OpenCL)
• Co-processor from Intel: MIC (Many Integrated Core)

(Figure: GPUs, each with its own memory, connected by a network.)
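To illustrate why load balancing matters when CPUs and GPUs share a calculation, the sketch below does a simple static split: work is divided in proportion to each device’s relative speed so that all devices finish at about the same time. The device names and speed figures are made up for the example; real codes may measure throughput at run time or balance work dynamically instead.

```python
def split_work(n_items, speeds):
    """Statically divide n_items among devices in proportion to relative speed.

    `speeds` maps a device name to an integer relative throughput
    (hypothetical values, e.g. from a short benchmark run).
    """
    total = sum(speeds.values())
    shares = {dev: n_items * s // total for dev, s in speeds.items()}
    # Integer division can leave a few items over; give them to the fastest device.
    fastest = max(speeds, key=speeds.get)
    shares[fastest] += n_items - sum(shares.values())
    return shares


print(split_work(1000, {"cpu": 1, "gpu0": 4, "gpu1": 4}))
# {'cpu': 111, 'gpu0': 445, 'gpu1': 444}
```

With a CPU four times slower than each GPU, the CPU receives roughly a ninth of the items; an even three-way split would leave both GPUs idle while the CPU finished its share.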

Advanced Research Computing

• Unit within the Office of the Vice President of Information Technology
• Provides centralized resources for:
  – Research computing
  – Visualization
• Staff to assist users
• Website: http://www.arc.vt.edu

ARC Goals

• Advance the use of computing and visualization in VT research
• Centralize resource acquisition, maintenance, and support for the research community
• Provide support to facilitate usage of resources and minimize barriers to entry
• Enable and participate in research collaborations between departments

Personnel

• Associate VP for Research Computing: Terry Herdman
• Director, Visualization: Nicholas Polys
• Computational Scientists:
  – John Burkardt
  – Justin Krometis
  – James McClure
  – Srijith Rajamohan
  – Bob Settlage
• Software Engineer: Nathan Liles

Personnel (Continued)

• System Administrators (UAS):
  – Josh Akers
  – Matt Strickler
• Solutions Architects:
  – Brandon Sawyers
  – Chris Snapp
• Business Manager: Alana Romanella
• User Support GRAs: Mai Dahshan, Ahmed Ibrahim, TBA

Compute Resources

• NewRiver (GPGPU, data intensive): 165 nodes
  – Node description: 126 × 24 cores, 128 GB (Haswell); 39 × 28 cores, 512 GB (Broadwell)
  – Special features: 78 P100 GPU; 8 K80 GPU; 16 “big data” nodes; 24 512 GB nodes; 2 3 TB nodes
• Cascades (large-scale CPU): 196 nodes
  – Node description: 32 cores, 128 GB (Broadwell)
  – Special features: 8 K80 GPU; 2 3 TB nodes
• DragonsTooth (single-node): 96 nodes
  – Node description: 24 cores, 256 GB (Haswell)
  – Special features: OpenStack on 24 nodes
• BlueRidge (large-scale CPU): 408 nodes
  – Node description: 16 cores, 64 GB (Sandy Bridge)
  – Special features: 4 K40 GPU; 18 128 GB nodes
• Huckleberry (deep learning): 14 nodes
  – Node description: 2× IBM Power8 (Minsky), 256 GB
  – Special features: 96 P100 GPU with NVLink

Storage Resources

• Home: long-term storage of files
  – File system: Qumulo; total size: 219 TB; maximum usage: 640 GB per user; data lifespan: unlimited
• Group: shared storage for research groups
  – File system: GPFS; total size: 1.0 PB; maximum usage: 10 TB per group; data lifespan: unlimited
• Work: fast I/O, temporary storage
  – File system: GPFS; total size: 1.1 PB; maximum usage: 14 TB per user; data lifespan: 120 days
• Archive: long-term storage for infrequently accessed files
  – File system: LTFS; total size: unlimited; maximum usage: unlimited

Storage within a Job

• Local scratch
  – File system: local disk (hard drives)
  – Environment variable: $TMPDIR
  – Per-user maximum: size of node hard drive
  – Data lifespan: length of job
  – Available on: compute nodes
• Memory (tmpfs): very fast I/O
  – File system: memory (RAM)
  – Environment variable: $TMPFS
  – Per-user maximum: size of node memory
  – Data lifespan: length of job
  – Available on: compute nodes
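A common pattern is to copy inputs to fast node-local storage at the start of a job, work there, and copy results back to persistent storage before the job ends, since data in $TMPDIR lasts only for the length of the job. The Python sketch below assumes the scheduler sets $TMPDIR as described above and falls back to the system temporary directory otherwise; the function names and file paths are hypothetical. Passing `var="TMPFS"` would use the memory-backed area instead.

```python
import os
import shutil
import tempfile
from pathlib import Path


def scratch_dir(var: str = "TMPDIR") -> Path:
    """Return the job's scratch directory from the named environment variable.

    Falls back to the system temp directory so the sketch also runs outside a job.
    """
    return Path(os.environ.get(var) or tempfile.gettempdir())


def stage_in(src: Path, var: str = "TMPDIR") -> Path:
    """Copy an input file into scratch and return its local path."""
    dst = scratch_dir(var) / src.name
    shutil.copy2(src, dst)
    return dst


def stage_out(local: Path, dest_dir: Path) -> Path:
    """Copy a result from scratch back to persistent storage."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dst = dest_dir / local.name
    shutil.copy2(local, dst)
    return dst


if __name__ == "__main__":
    # Hypothetical persistent location standing in for Home, Group, or Work storage.
    with tempfile.TemporaryDirectory() as persistent:
        src = Path(persistent) / "input.txt"
        src.write_text("example input\n")

        local = stage_in(src)                                    # 1. stage in
        result = local.with_name("output.txt")
        result.write_text(local.read_text().upper())             # 2. compute on scratch
        saved = stage_out(result, Path(persistent) / "results")  # 3. stage out
        print(saved.read_text(), end="")
```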

Visualization Resources

• VisCube: 3D immersion environment with three 10′ by 10′ walls and a floor of 1920×1920 stereo projection screens
• DeepSix: Six tiled monitors with a combined resolution of 7680×3200
• ROVR Stereo Wall
• AISB Stereo Wall

Getting Started with ARC

• Review ARC’s system specifications and choose the right system(s) for you
  – Specialty software
• Apply for an account online at the Advanced Research Computing website
• When your account is ready, you will receive confirmation from ARC’s system administrators

Allocation System: Goals

• Track projects that use ARC systems and document how resources are being used
• Ensure that computational resources are allocated appropriately based on needs:
  – Research: Provide computational resources for your research lab
  – Instructional: System access for courses or other training events

Allocation Eligibility

To qualify for an allocation, you must meet at least one of the following:
• Be a Ph.D.-level researcher (post-docs qualify)
• Be an employee of Virginia Tech and the PI for research computing
• Be an employee of Virginia Tech and the co-PI for a research project led by a non-VT PI

Allocation Application Process

• Create a research project in the ARC database
• Add grants and publications associated with the project
• Create an allocation request using the web-based interface
• Allocation review may take several days
• Users may be added to run jobs against your allocation once it has been approved

Resources

• ARC Website: http://www.arc.vt.edu
• ARC Compute Resources & Documentation: http://www.arc.vt.edu/computing
• ARC Software: http://www.arc.vt.edu/software
• New Users Guide: http://www.arc.vt.edu/newusers
• Frequently Asked Questions: http://www.arc.vt.edu/faq
• Linux Introduction: http://www.arc.vt.edu/unix

Thank you

Questions?
