a brief history of gpgpu - computer sciencecs.unc.edu/xcms/wpfiles/50th-symp/harris.pdf · a brief...
TRANSCRIPT
4
THE FIRST GPGPU: IKONAS RDS-3000 1978: Nick England & Mary Whitton founded Ikonas Graphics Systems
Tim Van Hook wrote microcode for solid modeling, ray tracing (SIGGRAPH ’86)
From a 1985 Video: “All computation is taking place in the Adage 3000 Display”
http://www.virhistory.com/ikonas/ikonas.html
5
UNC PIXEL PLANES AND PIXELFLOW Procedural textures on Pixel Planes 5 (Rhodes et al. 1992)
PixelFlow
100,000+ 100MHz 8-bit processors
Early real-time programmable shading (Olano/Lastra ‘98)
Kedem et al. (‘98) used for unix password cracking
Proceedings 1992 Symposium on Interactive 3D Graphics
Cambridge, Massachusetts 29 March - 1 April 1992
Program Co-Chairs
Marc Levoy Edwin E. Catmull Stanford University Pixar
Symposium Chair
David Zeltzer MIT Media Laboratory
Sponsored by the following organizations:
Office of Naval Research National Science Foundation
USA Ballistic Research Laboratory Hewlett-Packard Silicon Graphics
Sun Microsystems MIT Media Laboratory
In Cooperation with ACM SIGGRAPH
6
GEFORCE 1-3: THE DAWN OF GPGPU (‘99-’01) GeForce 256: First “GPU”
GeForce 3: First programmable GPU
Vertex Shaders – programmable vertex transforms, 32-bit float
Data-dependent, configurable texturing + register combiners
Enabled early GPGPU results
Hoff (1999) -- Voronoi diagrams on NVIDIA TNT2
Larsen &McAllister (2001): first GPU matrix multiplication (8-bit)
Rumpf & Strzodka (2001): first GPU PDEs (diffusion, image segmentation)
NVIDIA SDK Game of Life, Shallow Water (Greg James, 2001)
7
PHYSICALLY BASED SIMULATION ON GEFORCE 3 Approximate simulation of natural phenomena
Boiling liquid,
fluid convection,
chemical reaction-diffusion
Inaccurate due to low GPU precision
“Physically-Based Visual Simulation on Graphics Hardware”. Harris, Coombe, Scheuermann, and Lastra. Graphics Hardware 2002
8
NAMING A TREND
Let’s name this thing that people are doing!
I coined “GPGPU” and created home page November 2002
home on the web to collect research / resources
Interest grew quickly: launched GPGPU.org August 2003
“Application of graphics hardware to non-graphics applications” “General computations on graphics hardware” “Exploiting special-purpose hardware for alternative purposes”
9
GEFORCE FX (2003) : FLOATING POINT PIXELS
True Programmability enabled broader simulation research
Ray Tracing (Purcell, 2002), Photon Maps (Purcell, 2003)
Radiosity (Carr et al., 2003 & Coombe et al., 2004)
PDE solvers
Red-black Gauss-Seidel (Harris et al., 2003)
Conjugate gradient (Bolz et al. 2003, Krueger et al. 2003)
Multigrid (Goodnight et al. 2003)
Physically-based simulation
Fluid and cloud simulation: (Krueger et al. 2003, Harris et al. 2003)
Cloth simulation (Green, 2003)
Ice crystal formation (Kim and Lin, 2003)
FFT (Moreland and Angel, 2003)
High-level language: Brook for GPUs (Buck et al. 2004)
10
GPU CLOUD SIMULATION My Ph.D. Dissertation: visually realistic cloud simulation on GPUs
2D & 3D Incompressible Navier-Stokes fluid
Thermodynamics (latent heat, diffusion)
Water condensation / evaporation
Light scattering simulation for rendering
Programmed in OpenGL with pixel shaders
“Real-Time Cloud Simulation and Rendering”. Mark Harris Ph.D. Dissertation U. of North Carolina. 2003
11
CUDA AND THE G80 GPU (2006)
First GPU arch. and software platform designed for computing
Dedicated computing mode – threads rather than pixels/vertices
General, byte-addressable memory architecture
First C/C++ language and compiler for GPUs
CUDA C++ defines minimally extended subset of C++ with parallelism
2007 began a massive surge in GPGPU development
Not just graphics PhDs
12
ACCELERATING DISCOVERIES
USING A SUPERCOMPUTER POWERED BY 3,000 TESLA PROCESSORS, UNIVERSITY OF ILLINOIS SCIENTISTS PERFORMED THE FIRST ALL-ATOM SIMULATION OF THE HIV VIRUS AND DISCOVERED THE CHEMICAL STRUCTURE OF ITS CAPSID — “THE PERFECT TARGET FOR FIGHTING THE INFECTION.” WITHOUT GPUS, THE SUPERCOMPUTER WOULD NEED TO BE 5X LARGER FOR SIMILAR PERFORMANCE.
13
FROM HPC TO ENTERPRISE DATACENTERS
Government Supercomputing Finance Higher Ed Oil & Gas Consumer Web
Air Force Research Laboratory
Naval Research Laboratory
Tokyo Institute of Technology
14
MACHINE LEARNING USING DEEP NEURAL NETWORKS
Input Result
Hinton et al., 2006; Bengio et al., 2007; Bengio & LeCun, 2007; Lee et al., 2008; 2009
Visual Object Recognition Using Deep Convolutional Neural Networks Rob Fergus (New York University / Facebook) http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php#2985
15
GPU-ACCELERATED DEEP LEARNING
START-UPS Image Detection
Face Recognition
Gesture Recognition
Video Search & Analytics
Speech Recognition & Translation
Image and Video Understanding
Recommendation Engines
Indexing & Search
16
COMMON PROGRAMMING APPROACHES Across Heterogeneous Architectures
x86
Libraries
Programming Languages
Compiler Directives
AmgX cuBLAS
/
17
Unified Memory DRAMATICALLY LOWER DEVELOPER EFFORT
Past Developer View
System Memory
GPU Memory
Developer View With Unified Memory
Unified Memory
18
PARALLELISM IN MAINSTREAM LANGUAGES
Enable more programmers to write parallel software
Give programmers the choice of language to use
Parallel computing support in key languages
C
19
FUTURE C++: PARALLEL STL
Complete set of parallel primitives: for_each, sort, reduce, scan, etc.
ISO C++ committee voted unanimously to accept as official tech. specification working draft
N3960 Technical Specification Working Draft: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4352.html Prototype: https://github.com/n3554/n3554
std::vector<int> vec = ... // previous standard sequential loop std::for_each(vec.begin(), vec.end(), f); // explicitly sequential loop std::for_each(std::seq, vec.begin(), vec.end(), f); // permitting parallel execution std::for_each(std::par, vec.begin(), vec.end(), f);
20
PARTING WORDS OF WISDOM
Stand Up!
“Keep it narrow and doable” – Fred Brooks
“Write a little bit every day” – Fred Brooks
“If you measure it, you can improve it” – Jen-Hsun Huang
“But you have to measure the right thing!”
21
1999: ADVENT OF THE GPU
NVIDIA GeForce 256
Coined the term “Graphics Processing Unit”
“A single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second.”
Register Combiners
configurable multipass shading
Beginning of GPU programmability
Spare 0
Fragment Color
Texture Fetch
General Combiner 0
4 RGB Inputs
Texture 0
Texture 1
Fog Color/Factor
Reg
iste
r Set
6 RGB Inputs
Specular Color
4 Alpha Inputs
3 RGB Outputs 3 Alpha Outputs
General Combiner 1
4 RGB Inputs 4 Alpha Inputs
3 RGB Outputs 3 Alpha Outputs
Final Combiner 1 Alpha Input
Specular Color
22
EARLY PC & WORKSTATION GRAPHICS Rasterizer, texture unit, z-buffer, frame buffer
Fixed-point math, fixed-function interpolation / texturing
Lengyel et al. (1990) – Robot motion planning
Use rasterizer to fill minkowski sum polygons
HP 9000 TurboSRX Workstation
Hoff (1999) -- Voronoi diagrams on NVIDIA TNT2
Render cones – rasterizer and z-buffer compute voronoi diagram