a brief history of gpgpu - computer sciencecs.unc.edu/xcms/wpfiles/50th-symp/harris.pdf · a brief...

Mark Harris, Chief Technologist, GPU Computing

UNC Ph.D. 2003

A BRIEF HISTORY OF GPGPU

General-Purpose computation on Graphics Processing Units

THE FIRST GPGPU: IKONAS RDS-3000

  1978: Nick England & Mary Whitton founded Ikonas Graphics Systems

  Tim Van Hook wrote microcode for solid modeling, ray tracing (SIGGRAPH ’86)

  From a 1985 Video: “All computation is taking place in the Adage 3000 Display”

http://www.virhistory.com/ikonas/ikonas.html

UNC PIXEL PLANES AND PIXELFLOW

  Procedural textures on Pixel Planes 5 (Rhodes et al. 1992)

PixelFlow

  100,000+ 100MHz 8-bit processors

  Early real-time programmable shading (Olano/Lastra ’98)

  Kedem et al. (’98) used it for Unix password cracking

[Figure: cover of the Proceedings of the 1992 Symposium on Interactive 3D Graphics, Cambridge, Massachusetts, 29 March - 1 April 1992]

GEFORCE 1-3: THE DAWN OF GPGPU (’99-’01)

  GeForce 256: First “GPU”

  GeForce 3: First programmable GPU

  Vertex Shaders – programmable vertex transforms, 32-bit float

  Data-dependent, configurable texturing + register combiners

  Enabled early GPGPU results

  Hoff (1999) -- Voronoi diagrams on NVIDIA TNT2

  Larsen & McAllister (2001): first GPU matrix multiplication (8-bit)

Rumpf & Strzodka (2001): first GPU PDEs (diffusion, image segmentation)

  NVIDIA SDK Game of Life, Shallow Water (Greg James, 2001)

PHYSICALLY BASED SIMULATION ON GEFORCE 3

 Approximate simulation of natural phenomena

 Boiling liquid,

 fluid convection,

 chemical reaction-diffusion

  Inaccurate due to low GPU precision

“Physically-Based Visual Simulation on Graphics Hardware”. Harris, Coombe, Scheuermann, and Lastra. Graphics Hardware 2002

NAMING A TREND

 Let’s name this thing that people are doing!

 I coined “GPGPU” and created a home page in November 2002

 A home on the web to collect research and resources

  Interest grew quickly: launched GPGPU.org August 2003

“Application of graphics hardware to non-graphics applications”

“General computations on graphics hardware”

“Exploiting special-purpose hardware for alternative purposes”

GEFORCE FX (2003): FLOATING POINT PIXELS

  True Programmability enabled broader simulation research

  Ray Tracing (Purcell, 2002), Photon Maps (Purcell, 2003)

Radiosity (Carr et al., 2003 & Coombe et al., 2004)

  PDE solvers

  Red-black Gauss-Seidel (Harris et al., 2003)

  Conjugate gradient (Bolz et al. 2003, Krueger et al. 2003)

Multigrid (Goodnight et al. 2003)

  Physically-based simulation

  Fluid and cloud simulation: (Krueger et al. 2003, Harris et al. 2003)

  Cloth simulation (Green, 2003)

  Ice crystal formation (Kim and Lin, 2003)

  FFT (Moreland and Angel, 2003)

  High-level language: Brook for GPUs (Buck et al. 2004)

GPU CLOUD SIMULATION

 My Ph.D. Dissertation: visually realistic cloud simulation on GPUs

 2D & 3D Incompressible Navier-Stokes fluid

 Thermodynamics (latent heat, diffusion)

 Water condensation / evaporation

 Light scattering simulation for rendering

 Programmed in OpenGL with pixel shaders

“Real-Time Cloud Simulation and Rendering”. Mark Harris. Ph.D. Dissertation, U. of North Carolina, 2003

CUDA AND THE G80 GPU (2006)

 First GPU arch. and software platform designed for computing

 Dedicated computing mode – threads rather than pixels/vertices

 General, byte-addressable memory architecture

 First C/C++ language and compiler for GPUs

 CUDA C++ defines a minimally extended subset of C++ with parallelism

 2007 began a massive surge in GPGPU development

 Not just graphics PhDs
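
The "threads rather than pixels/vertices" model can be illustrated with a small CPU-side emulation. This is a sketch in standard C++, not CUDA syntax and not code from the talk: the names saxpy_thread and launch_saxpy are hypothetical, and the nested loop only stands in for a grid of thread blocks that a GPU would run concurrently. Each logical thread derives a global index from its block and thread numbers and handles one array element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread function: the thread with global index i updates
// element i of y = a * x + y. On a GPU the index would be derived from
// built-in block and thread coordinates; here it is passed in explicitly.
inline void saxpy_thread(std::size_t i, float a,
                         const std::vector<float>& x, std::vector<float>& y) {
    if (i < x.size()) {  // guard: the grid may contain more threads than elements
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical launcher: walks a 1D grid of blocks * threads_per_block
// logical threads one after another (a GPU would run them in parallel).
inline void launch_saxpy(std::size_t blocks, std::size_t threads_per_block,
                         float a, const std::vector<float>& x,
                         std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < threads_per_block; ++thread) {
            saxpy_thread(block * threads_per_block + thread, a, x, y);
        }
    }
}
```

Calling launch_saxpy(2, 4, 2.0f, x, y) on five-element vectors runs eight logical threads, three of which fall outside the data and do nothing. The point of the sketch is that the programmer writes the per-element function and an index calculation, with no triangles, textures, or render targets involved.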

ACCELERATING DISCOVERIES

USING A SUPERCOMPUTER POWERED BY 3,000 TESLA PROCESSORS, UNIVERSITY OF ILLINOIS SCIENTISTS PERFORMED THE FIRST ALL-ATOM SIMULATION OF THE HIV VIRUS AND DISCOVERED THE CHEMICAL STRUCTURE OF ITS CAPSID — “THE PERFECT TARGET FOR FIGHTING THE INFECTION.” WITHOUT GPUS, THE SUPERCOMPUTER WOULD NEED TO BE 5X LARGER FOR SIMILAR PERFORMANCE.

FROM HPC TO ENTERPRISE DATACENTERS

Government, Supercomputing, Finance, Higher Ed, Oil & Gas, Consumer Web

Air Force Research Laboratory

Naval Research Laboratory

Tokyo Institute of Technology

MACHINE LEARNING USING DEEP NEURAL NETWORKS

[Figure labels: Input, Result]

Hinton et al., 2006; Bengio et al., 2007; Bengio & LeCun, 2007; Lee et al., 2008; 2009

“Visual Object Recognition Using Deep Convolutional Neural Networks”. Rob Fergus (New York University / Facebook)
http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php#2985

GPU-ACCELERATED DEEP LEARNING

START-UPS

Image Detection

Face Recognition

Gesture Recognition

Video Search & Analytics

Speech Recognition & Translation

Image and Video Understanding

Recommendation Engines

Indexing & Search

COMMON PROGRAMMING APPROACHES

Across Heterogeneous Architectures

x86

Libraries

Programming Languages

Compiler Directives

AmgX cuBLAS

UNIFIED MEMORY: DRAMATICALLY LOWER DEVELOPER EFFORT

Past Developer View: System Memory, GPU Memory

Developer View With Unified Memory: Unified Memory

PARALLELISM IN MAINSTREAM LANGUAGES

  Enable more programmers to write parallel software

  Give programmers the choice of language to use

  Parallel computing support in key languages

C

FUTURE C++: PARALLEL STL

  Complete set of parallel primitives: for_each, sort, reduce, scan, etc.

  ISO C++ committee voted unanimously to accept as official tech. specification working draft

N3960 Technical Specification Working Draft: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4352.html

Prototype: https://github.com/n3554/n3554

std::vector<int> vec = ...

// previous standard sequential loop
std::for_each(vec.begin(), vec.end(), f);

// explicitly sequential loop
std::for_each(std::seq, vec.begin(), vec.end(), f);

// permitting parallel execution
std::for_each(std::par, vec.begin(), vec.end(), f);
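
The slide spells the policies std::seq and std::par, as in the draft. In C++17 as published, the same idea appears as std::execution::seq and std::execution::par in the header <execution>. The following is a minimal sketch under that assumption; the helper names square_all, sum_all, and parallel_stl_example are hypothetical and not from the talk. Depending on the toolchain, a parallel backend library may need to be linked, and an implementation is allowed to run these calls sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical helper: square every element in place.
// Each invocation writes only its own element, so there are no data races
// even if the implementation spreads the invocations over several threads.
inline void square_all(std::vector<int>& vec) {
    std::for_each(std::execution::par, vec.begin(), vec.end(),
                  [](int& x) { x *= x; });
}

// Hypothetical helper: parallel sum. std::reduce may regroup the additions,
// which is safe here because integer addition is associative and commutative.
inline long long sum_all(const std::vector<int>& vec) {
    return std::reduce(std::execution::par, vec.begin(), vec.end(), 0LL);
}

// Small usage example with checks.
inline void parallel_stl_example() {
    std::vector<int> v{1, 2, 3, 4};
    square_all(v);
    assert((v == std::vector<int>{1, 4, 9, 16}));
    assert(sum_all(v) == 30);
}
```

The policy argument only grants permission to parallelize. The caller remains responsible for making the element function safe to run concurrently, which is why the lambda above touches nothing but its own element.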

PARTING WORDS OF WISDOM

  Stand Up!

  “Keep it narrow and doable” – Fred Brooks

  “Write a little bit every day” – Fred Brooks

  “If you measure it, you can improve it” – Jen-Hsun Huang

  “But you have to measure the right thing!”

1999: ADVENT OF THE GPU

NVIDIA GeForce 256

Coined the term “Graphics Processing Unit”

“A single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second.”

Register Combiners

configurable multipass shading

Beginning of GPU programmability

[Figure: register combiner diagram]

Register Set (after Texture Fetch): Texture 0, Texture 1, Fragment Color, Specular Color, Fog Color/Factor, Spare 0

General Combiner 0: 4 RGB Inputs, 4 Alpha Inputs, 3 RGB Outputs, 3 Alpha Outputs

General Combiner 1: 4 RGB Inputs, 4 Alpha Inputs, 3 RGB Outputs, 3 Alpha Outputs

Final Combiner: 6 RGB Inputs, 1 Alpha Input

EARLY PC & WORKSTATION GRAPHICS

Rasterizer, texture unit, z-buffer, frame buffer

Fixed-point math, fixed-function interpolation / texturing

Lengyel et al. (1990) – Robot motion planning

Use the rasterizer to fill Minkowski sum polygons

HP 9000 TurboSRX Workstation

Hoff (1999) -- Voronoi diagrams on NVIDIA TNT2

Render cones – the rasterizer and z-buffer compute the Voronoi diagram
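
The cone-rendering idea can be sketched in software. Assume that drawing a cone with its apex at a site writes, at every pixel, a depth equal to that pixel's distance from the site. A standard "keep the nearest" depth test then leaves each pixel colored by its closest site, which is the Voronoi diagram. The following is a CPU emulation in C++, not Hoff's implementation; Site and voronoi_by_depth_test are hypothetical names, and distances are measured from pixel centers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Site { float x, y; };  // site position in pixel coordinates

// Returns a width * height "color buffer" (row-major) in which each pixel
// holds the index of its nearest site, or -1 if no sites were given.
inline std::vector<int> voronoi_by_depth_test(const std::vector<Site>& sites,
                                              int width, int height) {
    assert(width > 0 && height > 0);
    const std::size_t n = static_cast<std::size_t>(width) * height;
    std::vector<float> depth(n, std::numeric_limits<float>::infinity());
    std::vector<int> color(n, -1);

    // "Draw" one cone per site over the whole image.
    for (int id = 0; id < static_cast<int>(sites.size()); ++id) {
        for (int py = 0; py < height; ++py) {
            for (int px = 0; px < width; ++px) {
                const float dx = (px + 0.5f) - sites[id].x;
                const float dy = (py + 0.5f) - sites[id].y;
                const float z = std::sqrt(dx * dx + dy * dy);  // cone depth here
                const std::size_t idx = static_cast<std::size_t>(py) * width + px;
                if (z < depth[idx]) {  // depth test: keep the nearest fragment
                    depth[idx] = z;
                    color[idx] = id;
                }
            }
        }
    }
    return color;
}
```

On graphics hardware the inner two loops are the rasterizer's job and the comparison is the fixed-function z-test, so the only per-site work left to the application is submitting the cone geometry.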
