
Workshop 9: General purpose computing using GPUs: Developing a hands-on undergraduate course on CUDA programming

SIGCSE 2011 - The 42nd ACM Technical Symposium on Computer Science Education

Wednesday March 9, 2011, 7:00 pm - 10:00 pm

Dr. Barry Wilkinson, University of North Carolina Charlotte

Dr. Yaohang Li, Old Dominion University

SIGCSE 2011 Workshop 9 intro.ppt © 2010 B. Wilkinson Modification date: Feb 22, 2011


Agenda


GPU performance gains over CPUs

[Chart: GFLOPS (0 to 1400) against date (9/22/2002 to 7/27/2009), NVIDIA GPU versus Intel CPU. GPU points: NV30, NV40, G70, G80, GT200, T12. CPU points: 3GHz Dual Core P4, 3GHz Core2 Duo, 3GHz Xeon Quad, Westmere.]

Source: © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009, ECE 498AL Spring 2010, University of Illinois, Urbana-Champaign

Emergence of GPU systems for General Purpose High Performance Computing

GPUs have developed from graphics cards into a platform for HPC

GPUs are being designed with that application in mind

Very significant performance improvements on scientific code

http://www.hpcwire.com/blogs/New-China-GPGPU-Super-Outruns-Jaguar-105987389.html

A hot topic to teach

http://www.nvidia.com/object/cuda_courses_and_map.html

Taught at Illinois, Stanford, MIT, Harvard, Duke, Chapel Hill, UNC-C, …

Taught at graduate level and now moving into undergraduate level

GPU Course for High Performance Computing

Concerned with using Graphics Processing Units (GPUs) for high performance computing

Not graphics. A programming course.

Uses CUDA (Compute Unified Device Architecture), an architecture and programming model introduced by NVIDIA in 2007

C-based. Easy to learn.

NVIDIA products

NVIDIA Corp. is the leader in GPUs for high performance computing:

[Timeline, 1993 to 2010. Source: http://en.wikipedia.org/wiki/GeForce]

Established by Jen-Hsun Huang, Chris Malachowsky, Curtis Priem

GeForce: NV1, GeForce 1, GeForce 2 series, GeForce FX series, GeForce 8 series, GeForce 200 series (GTX260/275/280/285/295), GeForce 400 series (GTX460/465/470/475/480/485)

GeForce 8800 (G80): NVIDIA's first GPU with general purpose processors

Quadro

Tesla: C870, S870, C1060, S1070, C2050, …

Tesla 2050 GPU has 448 thread processors

Architectures: Fermi, Kepler (2011), Maxwell (2013)

CUDA


Programming Model

GPUs were historically designed for creating image data for displays.

This involves manipulating picture elements (pixels), often applying the same operation to each pixel.

SIMD (Single Instruction Multiple Data) model: an efficient mode of operation in which the same operation is done on each data element at the same time.

GPUs use a thread version of SIMD called Single Instruction Multiple Thread (SIMT).


GPU’s SIMT Programming Model

GPUs use very lightweight threads to achieve high parallel performance and to hide memory latency

Multiple threads each execute the same instruction sequence.

Very large number of threads (10,000’s) possible on GPUs.

Threads are mapped onto the available processors on the GPU (100’s of processors), all executing the same program sequence

More on the programming model shortly
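To make the SIMT idea concrete, here is a minimal host-side sketch in plain C rather than CUDA. It assumes an element-wise vector addition as the workload; the names vec_add_thread and launch_vec_add are hypothetical. Every "thread" runs the same function and differs only in its thread ID. In CUDA the function would be a __global__ kernel, the ID would come from blockIdx.x * blockDim.x + threadIdx.x, and the loop below would be replaced by a kernel launch in which the threads run in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one thread: the same instruction sequence for every thread,
   applied to the element selected by the thread ID. */
static void vec_add_thread(size_t tid, size_t n,
                           const float *a, const float *b, float *c)
{
    if (tid < n) {              /* guard: more threads than elements may be launched */
        c[tid] = a[tid] + b[tid];
    }
}

/* Hypothetical stand-in for a kernel launch. Here the threads are stepped
   through one after another; on a GPU they would execute concurrently. */
void launch_vec_add(size_t num_threads, size_t n,
                    const float *a, const float *b, float *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t tid = 0; tid < num_threads; tid++) {
        vec_add_thread(tid, n, a, b, c);
    }
}
```

Launching more threads than there are elements is harmless here because of the bounds check, which mirrors the usual practice when the thread count is rounded up to a multiple of the block size.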


Programming applications using SIMT model

Matrix operations -- very amenable to SIMT

•Same operations done on different elements of matrices

Some “embarrassingly” parallel computations such as Monte Carlo calculations

•Monte Carlo calculations use random selections that are independent of each other

Data manipulations

•Some sorting can be done quite efficiently
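As an illustration of why Monte Carlo calculations suit this model, the sketch below estimates pi in plain C by counting random points that fall inside a quarter circle. It is an assumed example, not code from the workshop: the function names, the small linear congruential generator, and the per-thread seeding scheme are all illustrative choices and are not a statistically rigorous way to split random streams. Each call to count_hits is independent of the others, so on a GPU each one could be handled by its own thread, with only the final sum needing coordination.

```c
#include <assert.h>
#include <stdint.h>

/* Small 64-bit linear congruential generator (illustrative choice).
   Returns a uniform double in [0, 1) built from the top 53 bits of the state. */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

/* Work of one "thread": draw points in the unit square and count
   how many land inside the quarter circle of radius 1. */
static long count_hits(uint64_t seed, long samples)
{
    uint64_t state = seed;
    long hits = 0;
    for (long i = 0; i < samples; i++) {
        double x = next_uniform(&state);
        double y = next_uniform(&state);
        if (x * x + y * y <= 1.0) {
            hits++;
        }
    }
    return hits;
}

/* Run the independent pieces one after another and combine the counts.
   On a GPU the pieces would run in parallel, followed by a sum. */
double estimate_pi(int num_threads, long samples_per_thread)
{
    assert(num_threads > 0 && samples_per_thread > 0);
    long total_hits = 0;
    for (int t = 0; t < num_threads; t++) {
        /* Assumed seeding scheme: a different odd-constant multiple per thread. */
        uint64_t seed = (uint64_t)(t + 1) * 0x9E3779B97F4A7C15ULL;
        total_hits += count_hits(seed, samples_per_thread);
    }
    return 4.0 * (double)total_hits / ((double)num_threads * (double)samples_per_thread);
}
```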

Computer system used for workshop at UNC-Charlotte

coit-grid01.uncc.edu to coit-grid06.uncc.edu, connected through a switch

coit-grid01-4: each has dual Xeon processors (3.4 GHz), 8 GB main memory

coit-grid05: four quad-core Xeon processors (2.93 GHz), 64 GB main memory, 1.2 TB disk

All users’ home directories on coit-grid05 (NFS)

coit-grid06: NVIDIA Tesla GPU (448-core Fermi)

System to log onto first: coit-grid01

Only available directly from on campus

Guest accounts on computer systems

Account details consist of an account name and an ssh password.

Log on first to coit-grid01 and then to coit-grid06

Files needed for hands-on sessions provided in each account.

More details in hands-on session write-ups

Use PuTTY or WinSCP on Windows, connecting to coit-grid01.uncc.edu


User interface for accessing the systems, forwarding X11 graphics

[Screenshot of the client PC desktop, with these windows:]

Xterm running on client PC, logged onto coit-grid06.uncc.edu

Xclock running on client PC

Xclock running on coit-grid01.uncc.edu

Xclock running on coit-grid06.uncc.edu

To make sure all X servers are running

Not needed for workshop

WinSCP running on client PC, connected to coit-grid01.uncc.edu


Heat distribution problem (Solving Laplace’s equation)

Fireplace

Simple implementation

800 x 800 points, 50000 iterations

Speed-up = 16.57

Graphics forwarded to client computer (PC)
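One common way to solve the heat distribution problem is Jacobi iteration: every interior point is repeatedly replaced by the average of its four neighbours while the boundary points stay fixed. The sequential C sketch below shows that update under this assumption; it is not the workshop's code, and the names jacobi_step and solve_laplace are hypothetical. On a GPU, each interior point of a step would typically be computed by its own thread, since all points in a step read only the previous step's values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Row-major index of point (i, j) in an n x n grid. */
#define IDX(i, j, n) ((size_t)(i) * (size_t)(n) + (size_t)(j))

/* One Jacobi step: each interior point of out becomes the average of the
   four neighbours of the same point in in. Boundary points are not written. */
static void jacobi_step(const double *in, double *out, int n)
{
    for (int i = 1; i < n - 1; i++) {
        for (int j = 1; j < n - 1; j++) {
            out[IDX(i, j, n)] = 0.25 * (in[IDX(i - 1, j, n)] + in[IDX(i + 1, j, n)]
                                      + in[IDX(i, j - 1, n)] + in[IDX(i, j + 1, n)]);
        }
    }
}

/* Runs the given number of steps in place on grid. The caller supplies
   scratch, a second n x n buffer. Boundary values in grid are held fixed. */
void solve_laplace(double *grid, double *scratch, int n, int iterations)
{
    assert(grid != NULL && scratch != NULL && n >= 3 && iterations >= 0);
    size_t bytes = (size_t)n * (size_t)n * sizeof(double);
    memcpy(scratch, grid, bytes);        /* give scratch the fixed boundary values */
    for (int it = 0; it < iterations; it++) {
        jacobi_step(grid, scratch, n);   /* read old values, write new ones */
        memcpy(grid, scratch, bytes);    /* new values become the old ones */
    }
}
```

As a small check, a 9 x 9 grid with one edge held at 100 and the other three at 0 converges to 25 at the centre point, which follows from symmetry. Copying the buffer after every step keeps the sketch simple; swapping two pointers would avoid the copy.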


N Body problem


Video

Questions

Next

Basic CUDA programming

Intro to 1st hands-on session
