Emergence of GPU systems and clusters for general purpose high performance computing

ITCS 4145/5145 April 3, 2012 © Barry Wilkinson

Graphics processing units (GPUs) for high performance computing

• Single computers with GPU cards

• GPU clusters

• GPU grids (geographically distributed resources used collectively, see http://coitweb.uncc.edu/~abw/gridcourse/index.html)

• GPU clouds

With the advent of graphics processing units (GPUs) for scientific high performance computing, GPUs can be incorporated into systems, greatly increasing their compute capability.

Many if not most high performance clusters use GPUs for HPC

Fastest computer systems in the world

[Figure: list of the fastest systems from http://top500.org/, with callouts "Uses NVIDIA GPUs", "Japanese", and "Chinese"]

Tianhe-1A (was #1, now #2) has 7168 NVIDIA M2050 GPUs

coit-grid01.uncc.edu to coit-grid07.uncc.edu cluster

[Diagram: nodes coit-grid01, coit-grid02, coit-grid03, coit-grid04, coit-grid05, coit-grid06, and coit-grid07, with a switch]

• Login from on-campus or off-campus: use coit-grid01.uncc.edu

• Login from within the campus only (marked on part of the cluster in the diagram)

• All users' home directories are on coit-grid05 (NFS)

• coit-grid06: NVIDIA Tesla GPU (C2050, 448-core Fermi)

• coit-grid07: NVIDIA Tesla GPU (C2050, 448-core Fermi); GPU server with an X5560 2.8 GHz quad-core Xeon processor, NVIDIA 2050 GPU, and 12 GB main memory

GPU servers grid06 and grid07 are for HPC GPU programming. We will also use the Windows machines in lab 335, which have mid-level (48-core) NVIDIA cards.

Graphics Processing Units (GPUs): Brief History

[Timeline, 1970 to 2010, in the order shown on the slide]

• Atari 8-bit computer text/graphics chip

• IBM PC Professional Graphics Controller card

• S3 graphics cards: single-chip 2D accelerator

• OpenGL graphics API

• Hardware-accelerated 3D graphics

• DirectX graphics API

• Playstation

• GPUs with programmable shading: NVIDIA GeForce 3 (2001)

• General-purpose computing on graphics processing units (GPGPU)

• GPU computing

Source of information: http://en.wikipedia.org/wiki/Graphics_Processing_Unit

NVIDIA products

NVIDIA Corp. is the leader in GPUs for high performance computing:

[Timeline, 1993 to 2010, in the order shown on the slide. Source: http://en.wikipedia.org/wiki/GeForce]

• Established by Jen-Hsun Huang, Chris Malachowsky, and Curtis Priem

• NV1

• GeForce 1

• GeForce 2 series

• GeForce FX series

• GeForce 8 series: GeForce 8800 (G80), NVIDIA's first GPU with general purpose processors

• GeForce 200 series: GTX260/275/280/285/295

• GeForce 400 series: GTX460/465/470/475/480/485

• Tesla: C870, S870, C1060, S1070, C2050, …; the Tesla 2050 GPU has 448 thread processors

• Quadro

• Fermi

• Kepler (2011)

• Maxwell (2013)

GPU performance gains over CPUs

[Figure: GFLOPS (0 to 1400) versus date (9/22/2002 to 7/27/2009) for NVIDIA GPUs (NV30, NV40, G70, G80, GT200, T12) and Intel CPUs (3GHz Dual Core P4, 3GHz Core2 Duo, 3GHz Xeon Quad, Westmere)]

Source: © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009, ECE 498AL Spring 2010, University of Illinois, Urbana-Champaign

CPU-GPU architecture evolution

Co-processors: a very old idea that appeared in the 1970s and 1980s, with floating point co-processors attached to microprocessors that did not then have floating point capability.

These co-processors simply executed floating point instructions that were fetched from memory.

Around the same time, interest grew in providing hardware support for displays, especially with the increasing use of graphics and PC games.

This led to graphics processing units (GPUs) attached to the CPU to create the video display.

[Diagram: early design, showing CPU, memory, graphics card, and display]

Birth of general purpose programmable GPU

Dedicated pipeline (late 1990s to early 2000s)

By the late 1990s, graphics chips needed to support 3-D graphics, especially for games, and graphics APIs such as DirectX and OpenGL.

Graphics chips generally had a pipeline structure, with individual stages performing specialized operations and the final stage loading the frame buffer for display.

Individual stages may have access to graphics memory for storing intermediate computed data.

[Diagram: pipeline of input stage, vertex shader stage, geometry shader stage, rasterizer stage, and pixel shading stage, ending at the frame buffer; stages have access to graphics memory]

GeForce 6 Series Architecture

(2004-5) From GPU Gems 2, Copyright 2005 by NVIDIA Corporation

General-Purpose GPU designs

High performance pipelines call for high-speed (IEEE) floating point operations.

People tried to use GPU cards to speed up scientific computations.

This is known as GPGPU (general-purpose computing on graphics processing units). It is difficult to do with specialized graphics pipelines, but possible.

By the mid-2000s, it was recognized that individual stages of the graphics pipeline could be implemented by a more general purpose processor core (although with a data-parallel paradigm).

GPU design for general high performance computing

2006: First GPU for general high performance computing as well as graphics processing, the NVIDIA G80 chip/GeForce 8800 card.

Unified processors that could perform vertex, geometry, pixel, and general computing operations.

Could now write programs in C rather than graphics APIs.

Single-instruction, multiple-thread (SIMT) programming model.

Evolving GPU design: NVIDIA Fermi architecture

(announced Sept 2009)

• Data-parallel single instruction, multiple data operation ("stream" processing)

• Up to 512 cores ("stream processing engines", SPEs), organized as 16 streaming multiprocessors (SMs), each having 32 SPEs

• 3 GB or 6 GB GDDR5 memory

• Many innovations including L1/L2 caches, unified device memory addressing, ECC memory, …

• First implementation: Tesla 20 series (single-chip C2050/2070, 4-chip S2050/2070), a 3 billion transistor chip. The number of cores is limited by power considerations; the C2050 has 448 cores.
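The core counts above follow from simple arithmetic: 16 groups of 32 cores give 512, and 448 corresponds to 14 groups of 32. The sketch below reproduces that arithmetic and then asks the CUDA runtime how many multiprocessors the installed GPU reports. The helper name `fermi_core_count` and the constant `kCoresPerFermiSM` are illustrative, and the figure of 14 enabled multiprocessors on the C2050 is an assumption taken from NVIDIA's published specifications, not from these slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption (not stated on the slide): each Fermi multiprocessor of
// compute capability 2.0 contains 32 cores.
const int kCoresPerFermiSM = 32;

// Hypothetical helper: total cores for a given number of enabled multiprocessors.
int fermi_core_count(int sm_count) {
    return sm_count * kCoresPerFermiSM;
}

int main(void) {
    printf("Full Fermi design, 16 multiprocessors: %d cores\n", fermi_core_count(16));
    printf("Tesla C2050, 14 multiprocessors:       %d cores\n", fermi_core_count(14));

    // Ask the CUDA runtime what the first installed GPU actually reports.
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0) {
        printf("No CUDA-capable device found.\n");
        return 0;
    }

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("Could not read device properties.\n");
        return 1;
    }

    printf("Device 0: %s, compute capability %d.%d, %d multiprocessors\n",
           prop.name, prop.major, prop.minor, prop.multiProcessorCount);

    // The 32-cores-per-multiprocessor figure is only applied to compute
    // capability 2.0 devices; other architectures use different counts.
    if (prop.major == 2 && prop.minor == 0) {
        printf("Estimated cores: %d\n", fermi_core_count(prop.multiProcessorCount));
    }
    return 0;
}
```

Compiled with `nvcc`, this should print the two computed totals on any machine and the device line only where a CUDA-capable GPU is present.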

Fermi Streaming Multiprocessor (SM)

* Whitepaper: NVIDIA's Next Generation CUDA Compute Architecture: Fermi, NVIDIA, 2009

CUDA (Compute Unified Device Architecture)

• Architecture and programming model, introduced by NVIDIA in 2007

• Enables GPUs to execute programs written in C.

• Within C programs, call SIMT "kernel" routines that are executed on the GPU.

• A CUDA syntax extension to C identifies a routine as a kernel.

• Very easy to learn, although getting the highest possible execution performance requires an understanding of the hardware architecture.
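As a minimal sketch of these points, the program below adds two vectors on the GPU. The `__global__` qualifier is the CUDA extension that marks a routine as a kernel, and the `<<<blocks, threads>>>` syntax launches it from ordinary C code. The names `vec_add` and `vec_add_on_gpu`, the vector length, and the choice of 256 threads per block are illustrative assumptions, not taken from the course materials.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks a kernel: called from the CPU (host), executed on the GPU.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    // Every thread runs this same code; its index selects one element.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may contain spare threads
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies inputs to the GPU, launches the kernel,
// and copies the result back. Returns 0 on success, -1 on any CUDA error.
int vec_add_on_gpu(const float *a, const float *b, float *c, int n) {
    size_t bytes = n * sizeof(float);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    int status = -1;

    if (cudaMalloc((void **)&d_a, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_b, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_c, bytes) == cudaSuccess &&
        cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {

        int threads_per_block = 256;  // illustrative choice
        int blocks = (n + threads_per_block - 1) / threads_per_block;
        vec_add<<<blocks, threads_per_block>>>(d_a, d_b, d_c, n);

        // The device-to-host copy waits for the kernel to finish.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }

    // cudaFree accepts NULL, so this is safe after a partial failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return status;
}

int main(void) {
    const int n = 1000;
    float a[n], b[n], c[n];
    for (int i = 0; i < n; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    if (vec_add_on_gpu(a, b, c, n) != 0) {
        fprintf(stderr, "CUDA error (is a CUDA-capable GPU available?)\n");
        return 1;
    }
    printf("c[10] = %.1f\n", c[10]);  // a[10] + b[10]
    return 0;
}
```

The kernel body contains no loop: the launch creates at least `n` threads, and each one handles a single element, which is the SIMT style the slide refers to. Tuning the block size for a particular GPU is where knowledge of the hardware architecture starts to matter.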

UNC-C CUDA Teaching Center

2010: NVIDIA Corp. selected UNC-Charlotte Department of Computer Science to be a CUDA Teaching Center, kindly providing GPU equipment and TA support.

2011: NVIDIA kindly provided 50 GTX 480 GPU cards valued at $15,000 as continuing support for the CUDA Teaching Center.

Our course materials are posted on NVIDIA’s corporate site next to those from Stanford, and other top schools.

http://developer.nvidia.com/cuda-training

Questions
