Introduction to CUDA, Geek Camp Singapore 2011
This presentation is for Geek Camp Singapore 2011, 1st October.
![Page 1: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/1.jpg)
INTRODUCTION TO CUDA Prepared for Geek Camp Singapore 2011
Raymond Tay
![Page 2: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/2.jpg)
THE FREE LUNCH IS OVER – HERB SUTTER
![Page 3: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/3.jpg)
WE NEED TO THINK BEYOND MULTI-CORE CPUS … WE NEED TO THINK MANY-CORE GPUS
…
![Page 4: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/4.jpg)
NVIDIA GPUS FLOPS
FLOPS: floating-point operations per second. A measure of how many floating-point operations a GPU can perform each second. More is better.
GPUs beat CPUs
![Page 5: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/5.jpg)
NVIDIA GPUS MEMORY BANDWIDTH
NVIDIA's GPUs contain massively parallel processors, so high memory bandwidth plays a big role in keeping them supplied with data for high-performance computing.
GPUs beat CPUs
![Page 6: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/6.jpg)
GPU VS CPU
CPU
• Optimised for low-latency access to cached data sets
• Control logic for out-of-order and speculative execution
GPU
• Optimised for data-parallel, throughput computation
• Architecture tolerant of memory latency
• More transistors dedicated to computation
![Page 7: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/7.jpg)
I DON’T KNOW C/C++, SHOULD I LEAVE?
Relax, there's no need to fret.
Your Brain Asks: Wait a minute, why should I learn the C/C++ SDK?
CUDA Answers: Efficiency!!!
![Page 8: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/8.jpg)
WHAT DO I NEED TO BEGIN WITH CUDA?
An NVIDIA CUDA-enabled graphics card, e.g. one based on the Fermi architecture
![Page 9: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/9.jpg)
HOW DOES CUDA WORK
1. Copy input data from CPU memory to GPU memory
2. Load GPU program and execute, caching data on chip for performance
3. Copy results from GPU memory to CPU memory
Steps 1 and 3 move data between CPU memory and GPU memory across the PCI bus.
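The three steps above map onto a handful of CUDA runtime calls. The following is only a minimal sketch: the kernel `double_elements` and the wrapper `double_on_gpu` are hypothetical names made up for illustration (they are not from the talk), the block size of 256 is an arbitrary choice, `n` is assumed to be greater than zero, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void double_elements(float *data, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // threads past the end of the array do nothing
        data[i] *= 2.0f;
}

// Hypothetical host wrapper that walks through the three steps.
// Assumes n > 0; error checks are omitted for brevity.
void double_on_gpu(float *host_data, unsigned int n)
{
    size_t bytes = n * sizeof(float);
    float *device_data = NULL;
    cudaMalloc((void **)&device_data, bytes);

    // 1. Copy input data from CPU memory to GPU memory.
    cudaMemcpy(device_data, host_data, bytes, cudaMemcpyHostToDevice);

    // 2. Load the GPU program and execute it, with enough blocks to cover n.
    unsigned int block_size = 256;  // arbitrary choice for this sketch
    unsigned int grid_size = (n + block_size - 1) / block_size;
    double_elements<<<grid_size, block_size>>>(device_data, n);

    // 3. Copy results from GPU memory back to CPU memory.
    //    This blocking copy also waits for the kernel to finish.
    cudaMemcpy(host_data, device_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(device_data);
}
```

Calling `double_on_gpu(values, count)` from ordinary host code overwrites `values` with the doubled results; the file would be compiled with `nvcc`.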
![Page 10: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/10.jpg)
EXAMPLE: BLOCK CYPHER
CPU Program

    void host_shift_cypher(unsigned int *input_array, unsigned int *output_array,
                           unsigned int shift_amount, unsigned int alphabet_max,
                           unsigned int array_length)
    {
        // Shift every element in turn, wrapping around past alphabet_max.
        for (unsigned int i = 0; i < array_length; i++)
        {
            unsigned int shifted = input_array[i] + shift_amount;
            if (shifted > alphabet_max)
            {
                shifted = shifted % (alphabet_max + 1);
            }
            output_array[i] = shifted;
        }
    }

    int main() {
        // Setup of the arrays and parameters is omitted on the slide.
        host_shift_cypher(input_array, output_array, shift_amount, alphabet_max, array_length);
    }

GPU Program

    __global__ void shift_cypher(unsigned int *input_array, unsigned int *output_array,
                                 unsigned int shift_amount, unsigned int alphabet_max,
                                 unsigned int array_length)
    {
        // One thread per element: compute this thread's global index.
        unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
        if (tid >= array_length)
            return;  // surplus threads in the last block do nothing
        unsigned int shifted = input_array[tid] + shift_amount;
        if (shifted > alphabet_max)
            shifted = shifted % (alphabet_max + 1);
        output_array[tid] = shifted;
    }

    int main() {
        // Setup is omitted on the slide; input_array and output_array must point to GPU memory.
        // Round the block count up so that every element gets a thread.
        dim3 dimGrid((array_length + block_size - 1) / block_size);
        dim3 dimBlock(block_size);
        shift_cypher<<<dimGrid, dimBlock>>>(input_array, output_array, shift_amount, alphabet_max, array_length);
    }
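Each GPU thread works out its own element from `threadIdx.x + blockIdx.x * blockDim.x`, so the launch has to supply at least `array_length` threads in total. A small sketch of the ceiling division that picks the block count follows; `blocks_needed` is a hypothetical helper invented here, not part of the talk or of the CUDA API.

```cuda
// Hypothetical helper: the smallest number of blocks such that
// blocks * block_size >= array_length. Assumes block_size > 0.
// __host__ __device__ lets the same function compile for CPU and GPU code.
__host__ __device__ inline unsigned int blocks_needed(unsigned int array_length,
                                                      unsigned int block_size)
{
    return (array_length + block_size - 1) / block_size;
}
```

For example, 1000 elements with 256 threads per block need 4 blocks, which is 1024 threads. The last 24 threads have no element to process, so a kernel should compare its index against `array_length` before touching memory.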
![Page 11: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/11.jpg)
EXAMPLE: VECTOR ADDITION

    // CUDA CODE
    __global__ void VecAdd(const float* A, const float* B, float* C, unsigned int N)
    {
        // One thread per element; the bounds check guards the last block.
        unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < N)
            C[i] = A[i] + B[i];
    }

    // C CODE
    void VecAdd(const float* A, const float* B, float* C, unsigned int N)
    {
        for (unsigned int i = 0; i < N; ++i)
            C[i] = A[i] + B[i];
    }
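The CUDA version of `VecAdd` still needs host code to launch it and to report failures. The sketch below is one possible way to do that, not the author's code: the wrapper `vec_add_on_gpu` is a hypothetical name, 256 threads per block is an arbitrary choice, and `N` is assumed to be greater than zero. It uses only standard CUDA runtime calls and returns the first error it meets.

```cuda
#include <cuda_runtime.h>

// Kernel from the slide: one thread per output element.
__global__ void VecAdd(const float *A, const float *B, float *C, unsigned int N)
{
    unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

// Hypothetical host wrapper: A, B and C are ordinary CPU arrays of length N.
// Returns cudaSuccess, or the first CUDA error encountered. Assumes N > 0.
cudaError_t vec_add_on_gpu(const float *A, const float *B, float *C, unsigned int N)
{
    size_t bytes = N * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;

    cudaError_t err = cudaMalloc((void **)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        unsigned int threads = 256;  // arbitrary choice for this sketch
        unsigned int blocks = (N + threads - 1) / threads;
        VecAdd<<<blocks, threads>>>(dA, dB, dC, N);
        err = cudaGetLastError();    // reports launch failures
    }
    if (err == cudaSuccess) err = cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer is a no-op, so this is safe on every path.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

A caller can pass the returned code to `cudaGetErrorString` to print a readable message when something goes wrong.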
![Page 12: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/12.jpg)
DEBUGGER
CUDA-GDB
• Based on GDB
• Linux
• Mac OS X
Parallel Nsight
• Plugin inside Visual Studio
![Page 13: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/13.jpg)
VISUAL PROFILER & MEMCHECK
Profiler
• Microsoft Windows
• Linux
• Mac OS X
• Analyze performance
CUDA-MEMCHECK
• Microsoft Windows
• Linux
• Mac OS X
• Detect memory access errors
![Page 14: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/14.jpg)
WHERE’S CUDA AT IN 2011?
60,000 researchers use it to aid drug discovery
470 universities teach CUDA
![Page 15: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/15.jpg)
WHERE’S CUDA AT IN 2011? (PART 2..)
NVIDIA Show Case (1000+ applications)
![Page 16: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/16.jpg)
ADDITIONAL RESOURCES
CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)
CUDA Tools & Ecosystem (http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)
CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)
NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)
GPGPU (http://gpgpu.org)
CUDA By Example (http://tegradeveloper.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0), by Jason Sanders & Edward Kandrot
GPU Computing Gems Emerald Edition (http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/0123849888/), Editor in Chief: Prof Hwu Wen-Mei
![Page 17: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/17.jpg)
CUDA LIBRARIES
Visit this site http://developer.nvidia.com/cuda-tools-ecosystem#Libraries
Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV, GPU AI-Tree Search, GPU AI-Path Finding
A lot of the libraries are hosted on Google Code. There are many more gems in there too!
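To give a feel for what a library such as Thrust saves you, here is a hedged sketch of element-wise vector addition with no hand-written kernel. The helper name `thrust_vec_add` is hypothetical and the inputs are assumed to have equal length; the Thrust calls themselves (`device_vector`, `transform`, `plus`, `copy`) are standard parts of the library.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>
#include <vector>

// Hypothetical helper: returns a + b, computed element-wise on the GPU.
// Assumes a and b have the same length.
std::vector<float> thrust_vec_add(const std::vector<float> &a,
                                  const std::vector<float> &b)
{
    // Constructing a device_vector from host iterators copies CPU -> GPU.
    thrust::device_vector<float> d_a(a.begin(), a.end());
    thrust::device_vector<float> d_b(b.begin(), b.end());
    thrust::device_vector<float> d_c(a.size());

    // Thrust generates and launches the kernel for us.
    thrust::transform(d_a.begin(), d_a.end(), d_b.begin(), d_c.begin(),
                      thrust::plus<float>());

    // Copy the result GPU -> CPU.
    std::vector<float> c(d_c.size());
    thrust::copy(d_c.begin(), d_c.end(), c.begin());
    return c;
}
```

Memory allocation, transfers and launch configuration are all handled by the library here, which is the main reason to reach for it before writing a kernel by hand.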
![Page 18: Introduction to cuda geek camp singapore 2011](https://reader033.vdocument.in/reader033/viewer/2022051212/55892d03d8b42a31388b459c/html5/thumbnails/18.jpg)
THANK YOU @RaymondTayBL