programming with cuda ws 08/09 lecture 1 tue, 21 oct, 2008
TRANSCRIPT
Programming with CUDA
WS 08/09
Lecture 1
Tue, 21 Oct, 2008
Organization
Two lectures per week
– Tuesdays: 4pm-5pm, Ernst-Abbe-Platz 2 (room 3517)
– Thursdays: 4pm-6pm, Carl-Zeiss-Strasse (room 125)
One exercise session per week
– Tuesdays: 5pm-6pm, Ernst-Abbe-Platz 2 (room 3517)
Organization
People
– Waqar Saleem (me)
  http://theinf2.informatik.uni-jena.de/People/Waqar+Saleem.html
– Jens K. Müller
  http://theinf2.informatik.uni-jena.de/People/Jens+K_+Mueller.html
Office hours
– Wednesdays: 2pm-4pm, Ernst-Abbe-Platz 2 (room 3311)
Organization
Reference Material
– www.nvidia.com/object/cuda_education.html
  links to university courses on CUDA
– www.nvidia.com/object/cuda_develop.html
  documentation, programming guide
Lecture slides will be made available after each lecture on the course website
– http://theinf2.informatik.uni-jena.de/For+Students/CUDA.html
Organization
Two-part course
Part 1: before Christmas
– Present and learn about CUDA
– Form student groups
– Groups choose/are assigned projects
Part 2: after Christmas
– Groups work on and present their projects
The Bacardi Algorithm (courtesy of Elena Andreeva)
Bacardi  Bacar  Wacar  Waqar
GPGPU (Intro)
GPGPU
GPU
– Graphics Processing Unit
– Handles values of pixels displayed on screen
  Highly parallel computation
  Optimized for parallel computations
GPGPU
GP
– General Purpose computing on GPU
– Many non-graphics applications can be parallelized
  Can then be ported to a GPU implementation
GPGPU
General info
– http://www.gpgpu.org/
– http://en.wikipedia.org/wiki/GPGPU
Variants
– GPGP
– GP²
Why GPU? (GPU vs. CPU)
Specialized for rendering
– Highly parallel, compute-intensive application
– Multiple cores, high memory bandwidths
Why GPU? (GPU vs. CPU)
More transistors for data processing, fewer for
– Flow control: same program runs for each data element
– Data caching: one arithmetic-intensive program, many data elements
So, why now?
Previously
– Needed specialized graphics APIs
– GPU DRAM had easy read but limited write capability
Now, CUDA: Compute Unified Device Architecture
– Minimal extension to C
– GPU DRAM read & write
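To make "minimal extension to C" and "GPU DRAM read & write" concrete, here is a rough sketch of what a small CUDA routine might look like. It is an illustration only, not material from the lecture: the names `scale` and `scale_on_gpu`, the block size of 256 threads and the scaling operation itself are all assumptions chosen for the example. The only non-C pieces are the `__global__` qualifier, the built-in thread index variables and the `<<<blocks, threads>>>` launch syntax.

```cuda
#include <stddef.h>
#include <cuda_runtime.h>

// Kernel (hypothetical example): the same program runs once per data
// element, with one GPU thread handling one element.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                       // the grid may contain more threads than elements
        data[i] = data[i] * factor;  // reads and writes GPU DRAM
}

// Host helper (hypothetical name): copies the array to the GPU, runs the
// kernel on it, and copies the result back into host_data.
// Returns 0 on success and -1 on any CUDA error.
int scale_on_gpu(float *host_data, int n, float factor)
{
    if (n <= 0)
        return 0;  // nothing to do

    float *dev_data = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaMalloc((void **)&dev_data, bytes) != cudaSuccess)
        return -1;
    if (cudaMemcpy(dev_data, host_data, bytes,
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev_data);
        return -1;
    }

    int threads = 256;                         // threads per block (assumed value)
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n elements
    scale<<<blocks, threads>>>(dev_data, n, factor);
    if (cudaGetLastError() != cudaSuccess) {   // catches an invalid launch
        cudaFree(dev_data);
        return -1;
    }

    // This copy waits for the kernel to finish before reading the result.
    cudaError_t err = cudaMemcpy(host_data, dev_data, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
    return err == cudaSuccess ? 0 : -1;
}
```

Compiled with nvcc on a machine with a CUDA ready card, calling `scale_on_gpu(values, 4, 2.0f)` on an array holding 1, 2, 3, 4 should leave it holding 2, 4, 6, 8. Apart from the copy to and from the device, the host code is ordinary C.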
CUDA
Some setup issues
CUDA ready cards
– http://www.nvidia.com/object/cuda_learn_products.html
– CUDA ready machine available in pool
CUDA can be run in emulation mode
– Will install CUDA in emu mode on pool PCs
– Repeat at home on own machines
Some setup issues
Sign up for pool access
– Usernames need to be known to grant access to CUDA ready machine
All for today
Next time
– Finalize course/people websites
– CUDA programming model
– Install CUDA on pool PCs
See you next week!