![Page 1: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/1.jpg)
CUDA & OpenCV
![Page 2: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/2.jpg)
ESIEE
- Engineering school
- 2 campuses
  - Noisy-le-Grand (east of Paris)
  - Amiens (north of France)
- About 30 student clubs/associations
- 1500 students
- 5-year program
  - 3 years of common courses (maths, physics, mechanics, electronics, computer science, management)
  - 2 years of specialisation
![Page 3: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/3.jpg)
ESIEE:
- ESIEE-engineering (Paris)
  - Computer Science
  - Electronic and Microelectronic Systems
  - Embedded Systems
  - Telecommunications and Signal Processing
  - Computer network architect
  - Digital design 3D
  - Electronic system for transportation
- ESIEE-Amiens
  - Electronic and sustainable Development
  - Production systems
  - Telecommunications and computer networks
  - Building trade energy
![Page 4: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/4.jpg)
Summary
- Introduction
- CUDA
- OpenCV + GPU support
- Results
- Tips & tricks
![Page 5: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/5.jpg)
Introduction
![Page 6: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/6.jpg)
About OpenCV
- Open Computer Vision library
- Image processing, feature detection, object detection, video analysis, machine learning, ...
- Started by Intel (1999), now an open source project (Willow Garage)
- OOP support since 2.0
- CUDA support since September 2010
![Page 7: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/7.jpg)
About CUDA
- Compute Unified Device Architecture
- GPGPU technology (General-Purpose computing on Graphics Processing Units)
- Extension of the C language
- Windows, Linux, Mac OS
- Previous methods:
  - Shading languages for real-time rendering
  - CTM
  - BrookGPU
- Then CUDA, since 2007
![Page 8: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/8.jpg)
![Page 9: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/9.jpg)
![Page 10: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/10.jpg)
Motivations
- Massively parallel architecture
  => SIMT (Single Instruction, Multiple Threads)
  => High-speed linear algebra
- Cheap and accessible to everyone
- 'Easy' to set up and to program
- But limited (see later)
![Page 11: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/11.jpg)
Motivations
Benchmark:
![Page 12: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/12.jpg)
CUDA
![Page 13: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/13.jpg)
Vocabulary
- Host & devices
- What is a kernel?
- What is a thread?
![Page 14: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/14.jpg)
Kernel hierarchy
- Grid
- Block
- Thread
![Page 15: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/15.jpg)
![Page 16: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/16.jpg)
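Kernel creation
- Definition: __type__ void myKernel(params), where __type__ is a function qualifier:
  - __host__ : called from the host, runs on the host
  - __global__ : called from the host, runs on the device (this is the qualifier of a kernel)
  - __device__ : called from the device, runs on the device
- Call:
  - Define block dimensions: dim3 nbThread(x, y, z)
  - Define grid dimensions: dim3 nbBlock(x, y, 1)
  - myKernel<<<nbBlock, nbThread>>>(params)
  - Dimensions are limited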
![Page 17: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/17.jpg)
In the kernels
- uint3 blockIdx: block index in the grid
- uint3 threadIdx: thread index in the block
- dim3 blockDim: number of threads in each block
- __syncthreads(): wait until all the threads of the block reach this point
![Page 18: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/18.jpg)
Data management
- Allocation
  - cudaMalloc(&source, size)
- Transfer
  - cudaMemcpy(destination, source, size, direction)
  - direction: cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost
- Deallocation
  - cudaFree(source)
![Page 19: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/19.jpg)
Memory type
![Page 20: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/20.jpg)
Memory type

| Name | Prefix | R/W | Fast/slow | Size | Accessibility | Lifetime |
|---|---|---|---|---|---|---|
| global | __device__ | RW | slow | Some GBytes | all | application |
| constant | __constant__ | RO | fast | 64 KB/mp | all | application |
| texture | | RO | | | all | application |
| shared | __shared__ | RW | fast when no bank conflict | 16 KB/mp | one block | block |
| register | (default) | RW | fast | 8-16 KB/mp | one thread | thread |
| local | | RW | | | one thread | thread |
![Page 21: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/21.jpg)
Other CUDA functions
- Error handling:
  - cudaError_t code = cudaGetLastError()
  - const char* cudaGetErrorString(cudaError_t code)
- Synchronisation:
  - cudaThreadSynchronize() (deprecated as of CUDA 4.0 in favour of cudaDeviceSynchronize())
- Mathematical functions
  - sqrt, exp, cos, floor, ...
- See the CUDA documentation:
  - http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_Toolkit_Reference_Manual.pdf
  - http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
![Page 22: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/22.jpg)
example
![Page 23: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/23.jpg)
Limits
- Memory
  - Size
  - Bandwidth
- Not efficient for all algorithms
- Only for Nvidia cards
- Language limitations
![Page 24: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/24.jpg)
OpenCV & CUDA support
![Page 25: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/25.jpg)
Presentation:
- OpenCV 2.2 or 2.3
- Set the WITH_CUDA flag in CMake
- Requirements:
  - CUDA toolkit 4.0 (OpenCV 2.3)
  - CUDA toolkit 3.2 (OpenCV 2.2)
  - G++ or Visual Studio 2008/2010
  - Nvcc: http://sbel.wisc.edu/Courses/ME964/2008/Documents/nvccCompilerInfo.pdf
  - NPP library: http://developer.nvidia.com/npp
- More information: see the OpenCV website, gpu section
![Page 26: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/26.jpg)
Presentation:
- Based on GpuMat
  ⇒ similar to cv::Mat on the CPU
- Documentation: http://opencv.itseez.com/modules/gpu/doc/gpu.html
- Possibilities:
  - geometrical image transforms
  - color conversion
  - corner detection
  - filter engine
  - histograms
  - feature detection
  - ...
![Page 27: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/27.jpg)
Performance:
See the performance sample in the OpenCV directory
![Page 28: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/28.jpg)
Results :
OpenCV + my own kernels
![Page 29: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/29.jpg)
Problem & solution
Problem:
Impossible to include CUDA and OpenCV in the same file => conflict?
Solution:
![Page 30: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/30.jpg)
Hardware / software
- Hardware
  - CPU: Intel Pentium 4, 3.6 GHz
  - GPU: GeForce GT 240
    - Compute capability: 1.2
    - 12 MPs x 8 CUDA cores/MP
    - 512 threads max per block
    - 65 535 blocks max per grid dimension
    - Clock speed: 1.34 GHz
- Software
  - CUDA 4.0
  - Visual Studio 2008
  - OpenCV 2.3
![Page 31: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/31.jpg)
mask generation
GPU CPU
![Page 32: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/32.jpg)
apply mask
GPU CPU
![Page 33: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/33.jpg)
Kernel calls
- Fix the number of threads per block (512 at most)
- Deduce the number of blocks
![Page 34: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/34.jpg)
Kernel representation
![Page 35: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/35.jpg)
benchmark
![Page 36: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/36.jpg)
tips & tricks
![Page 37: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/37.jpg)
Create a project for CUDA
- Installation, all you need is here: http://developer.nvidia.com/cuda-toolkit-40
- Include:
  - <cuda_runtime.h>
  - <cuda.h>
  - NVIDIA_gpu_Toolkit\CUDA\version\inc
- Library:
  - cuda.lib
  - cudart.lib
  - NVIDIA_gpu_Toolkit\CUDA\version\lib\Win32
- Ignore library:
  - libcmt.lib
  - libcmtd.lib
![Page 38: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/38.jpg)
Configure Visual Studio
- .cu files
- Nvcc configuration:
  - custom build rules
- Syntax highlighting:
  - Copy usertype.dat from C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.0\C\doc\syntax_highlighting\visual_studio_8
  - to C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE
![Page 39: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/39.jpg)
Some tips
- Be careful with CMake!
- In most cases GpuMat::step != Mat::step
- Performance measurement
- GpuMat <=> Mat:
  - GpuMat gpu_image = GpuMat(cpu_image)
  - or gpu_image.upload(cpu_image)
  - Mat cpu_image = gpu_image
![Page 40: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/40.jpg)
Optimisation
- Warps of 32 threads
  - if/else
- Kernel calls are asynchronous
  - cudaMemcpy()
  - cudaThreadSynchronize()
- Avoid data transfers CPU => GPU
  - See the "bandwidth test" sample
  - Overlap data transfers
![Page 41: CUDA & OpenCV - Cybernetics · Presentation : OpenCV 2. 2 or 2.3 Set WITH_CUDA flag in Cmake Requirement : CUDA toolkit 4.0(OpenCV 2.3) CUDA toolkit 3.2 (OpenCV 2.2) G++ or Visual](https://reader033.vdocument.in/reader033/viewer/2022052519/5f0f864e7e708231d44497a4/html5/thumbnails/41.jpg)
References
- http://en.wikipedia.org/wiki/CUDA
- http://developer.nvidia.com/nvidia-gpu-computing-documentation
  - NVIDIA CUDA C Programming Guide
  - CUDA API Reference Manual
- CUDA tutorial by Cyril Zeller
- Optimizing CUDA, Nvidia tutorial
- Developpez.com (fr)
  - Une introduction à CUDA