cs 179: gpu programming -...
TRANSCRIPT
CS 179: GPU Programming
Lecture 1: Introduction
Images: http://en.wikipedia.org
http://www.pcper.com
http://northdallasradiationoncology.com/
GPU Gems (Nvidia)
Administration
Covered topics:
• (GP)GPU computing/parallelization
• C++ CUDA (parallel computing platform)
TAs:
• [email protected] for set submission and extension requests
• Aadyot Bhatnagar([email protected])
• Tyler Port ([email protected])
Website:
• http://courses.cms.caltech.edu/cs179/
Overseeing Instructor:
• Al Barr ([email protected])
Class time:• ANB 107, MWF 3:00 PM
• Recitations on Fridays
Course Requirements
Fill out this survey: https://goo.gl/forms/RZiUFBGYs2GKYEFA2
Fill out this when2meet for office hours ASAP: https://www.when2meet.com/?6806202-GXLXT
Homework:• 6 weekly assignments
• Each worth 10% of grade
Final project:• 4-week project
• 40% of grade total
P/F Students must receive at least 60% on every
assignment AND the final project
Homework
Due on Wednesdays before class (3PM)
First set out April 4th, due April 11th
Collaboration policy:• Discuss ideas and strategies freely, but all code must be
your own
• Do not look up prior years solutions or reference
solution code from github without prior TA approval
Office Hours: Located in ANB 104• Times: TBA (will be announced before first set is out)
Extensions• Ask a TA for one if you have a valid reason
Projects
Topic of your choice• We will also provide many options
Teams of up to 2 people• 2-person teams will be held to higher
expectations
Requirements• Project Proposal
• Progress report(s) and Final Presentation
• More info later…
Machines
Primary GPU machines available• Currently being setup. You will receive a user account after
emailing [email protected]
• Titan: titan.cms.caltech.edu (SSH and Mosh available)
• Haru: haru.cms.caltech.edu
• Maki: maki.caltech.edu
Secondary machines
• mx.cms.caltech.edu
• minuteman.cms.caltech.edu
• These use your CMS login
• NOTE: Not all assignments work on these machines
Change your password from the temp one we send you• Use passwd command
Machines
Alternative: Use your own machine:• Must have an NVIDIA CUDA-capable GPU
• Virtual machines won’t work• Exception: Machines with I/O MMU virtualization and
certain GPUs
• Special requirements for:• Hybrid/optimus systems
• Mac/OS X
Setup guide on the website is outdated. Do not
follow 2016 instructions
The CPU
The “Central Processing Unit”
Traditionally, applications use CPU for primary
calculations• General-purpose capabilities
• Established technology
• Usually equipped with 8 or less powerful cores
• Optimal for concurrent processes but not large scale
parallel computations
Wikimedia commons: Intel_CPU_Pentium_4_640_Prescott_bottom.jpg
The GPU
The "Graphics Processing Unit"
Relatively new technology designed for parallelizable problems• Initially created specifically for graphics
• Became more capable of general computations
GPUs – The Motivation
Raytracing:
for all pixels (i,j):Calculate ray point and direction in 3d spaceif ray intersects object:calculate lighting at closest objectstore color of (i,j) Superquadric Cylinders, exponent 0.1, yellow glass balls, Barr, 1981
EXAMPLE
Add two arrays• A[ ] + B[ ] -> C[ ]
On the CPU:
float *C = malloc(N * sizeof(float));for (int i = 0; i < N; i++)C[i] = A[i] + B[i];return C;
This operates sequentially… can we do better?
A simple problem…
• On the CPU (multi-threaded, pseudocode):
(allocate memory for C)Create # of threads equal to number of cores on processor (around 2, 4, perhaps 8)(Indicate portions of A, B, C to each thread...)
...
In each thread,For (i from beginning region of thread)C[i] <- A[i] + B[i]//lots of waiting involved for memory reads, writes, ...Wait for threads to synchronize...
This is slightly faster – 2-8x (slightly more with other tricks)
A simple problem…
• How many threads? How does performance scale?
• Context switching: • The action of switching which thread is being processed
• High penalty on the CPU
• Not an issue on the GPU
A simple problem…
• On the GPU:
(allocate memory for A, B, C on GPU)Create the “kernel” – each thread will perform one (or a few) additions
Specify the following kernel operation:
For all i‘s (indices) assigned to this thread:C[i] <- A[i] + B[i]
Start ~20000 (!) threadsWait for threads to synchronize...
GPU: Strengths Revealed
• Emphasis on parallelism means we have lots of cores
• This allows us to run many threads simultaneously with
no context switches
GPU Computing: Step by Step
• Setup inputs on the host (CPU-accessible memory)
• Allocate memory for outputs on the host
• Allocate memory for inputs on the GPU
• Allocate memory for outputs on the GPU
• Copy inputs from host to GPU
• Start GPU kernel (function that executed on gpu)
• Copy output from GPU to host
NOTE: Copying can be asynchronous, and unified memory
management is available
Indexing
Can get a block ID and thread ID within the block:Unique thread ID!
https://cs.calvin.edu/courses/cs/374/CUDA/CUDA-Thread-Indexing-Cheatsheet.pdf
https://en.wikipedia.org/wiki/Thread_block
GPUs – Brief History
• Initially based on graphics focused
fixed-function pipelines• Pre-set functions, limited options
http://gamedevelopment.tutsplus.com/articles/the-end-of-
fixed-function-rendering-pipelines-and-how-to-move-on--
cms-21469
Source: Super Mario 64, by Nintendo
GPUs – Brief History
• Shaders• Could implement one’s own functions!
• GLSL (C-like language)
• Could “sneak in” general-purpose programming!
• Vulkan/OpenCL is the modern multiplatform general purpose GPU
compute system, but we won’t be covering it in this course
http://minecraftsix.com/glsl-shaders-mod/
GPUs – Brief History
“General-purpose computing on GPUs” (GPGPU)
• Hardware has gotten good enough to a point where it’s basically
having a mini-supercomputer
CUDA (Compute Unified Device Architecture)
• General-purpose parallel computing platform for NVIDIA GPUs
Vulkan/OpenCL (Open Computing Language)
• General heterogenous computing framework
Both are accessible as extensions to various languages
• If you’re into python, checkout Theano, pyCUDA.